Heights in Diophantine Geometry

The first half of the book is devoted to the general theory of heights and its applications,
including a complete, detailed proof of the celebrated subspace theorem of W. M. Schmidt.
The second part deals with abelian varieties, the Mordell–Weil theorem and Faltings’s
proof of the Mordell conjecture, ending with a self-contained exposition of Nevanlinna
theory and the related famous conjectures of Vojta. The book concludes with a
comprehensive list of references. It is destined to be a definitive reference book on modern
diophantine geometry, bringing a new standard of rigor and elegance to the field.

Professor ENRICO BOMBIERI is a professor of Mathematics at the Institute of Advanced
Study, Princeton.

Dr WALTER GUBLER is a lecturer in Mathematics at the University of Dortmund.


New Mathematical Monographs

Editorial Board

Béla Bollobás, University of Memphis


William Fulton, University of Michigan
Frances Kirwan, Mathematical Institute, University of Oxford
Peter Sarnak, Princeton University
Barry Simon, California Institute of Technology

For information about Cambridge University Press mathematics publications visit
http://publishing.cambridge.org/stm/mathematics/
HEIGHTS IN
DIOPHANTINE GEOMETRY

Enrico Bombieri
Institute of Advanced Study, Princeton

Walter Gubler
University of Dortmund
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521846158

© Cambridge University Press 2006

This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.

First published in print format 2006

ISBN-13 978-0-511-14061-7 eBook (NetLibrary)
ISBN-10 0-511-14061-4 eBook (NetLibrary)

ISBN-13 978-0-521-84615-8 hardback
ISBN-10 0-521-84615-3 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of urls
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface page xi
Terminology xv

1. Heights 1
1.1. Introduction 1
1.2. Absolute values 1
1.3. Finite-dimensional extensions 5
1.4. The product formula 9
1.5. Heights in projective and affine space 15
1.6. Heights of polynomials 21
1.7. Lower bounds for norms of products of polynomials 29
1.8. Bibliographical notes 33

2. Weil heights 34
2.1. Introduction 34
2.2. Local heights 35
2.3. Global heights 39
2.4. Weil heights 42
2.5. Explicit bounds for Weil heights 45
2.6. Bounded subsets 54
2.7. Metrized line bundles and local heights 57
2.8. Heights on Grassmannians 66
2.9. Siegel’s lemma 72
2.10. Bibliographical notes 80

3. Linear tori 82
3.1. Introduction 82
3.2. Subgroups and lattices 82
3.3. Subvarieties and maximal subgroups 88
3.4. Bibliographical notes 92


4. Small points 93
4.1. Introduction 93
4.2. Zhang’s theorem 93
4.3. The equidistribution theorem 101
4.4. Dobrowolski’s theorem 107
4.5. Remarks on the Northcott property 117
4.6. Remarks on the Bogomolov property 120
4.7. Bibliographical notes 123

5. The unit equation 125


5.1. Introduction 125
5.2. The number of solutions of the unit equation 126
5.3. Applications 140
5.4. Effective methods 146
5.5. Bibliographical notes 149

6. Roth’s theorem 150


6.1. Introduction 150
6.2. Roth’s theorem 152
6.3. Preliminary lemmas 156
6.4. Proof of Roth’s theorem 163
6.5. Further results 170
6.6. Bibliographical notes 174

7. The subspace theorem 176


7.1. Introduction 176
7.2. The subspace theorem 177
7.3. Applications 181
7.4. The generalized unit equation 186
7.5. Proof of the subspace theorem 197
7.6. Further results: the product theorem 226
7.7. The absolute subspace theorem and the Faltings–Wüstholz
theorem 227
7.8. Bibliographical notes 230

8. Abelian varieties 231


8.1. Introduction 231
8.2. Group varieties 232
8.3. Elliptic curves 240
8.4. The Picard variety 246

8.5. The theorem of the square and the dual abelian variety 252
8.6. The theorem of the cube 257
8.7. The isogeny multiplication by n 263
8.8. Characterization of odd elements in the Picard group 265
8.9. Decomposition into simple abelian varieties 267
8.10. Curves and Jacobians 268
8.11. Bibliographical notes 282

9. Néron–Tate heights 283


9.1. Introduction 283
9.2. Néron–Tate heights 284
9.3. The associated bilinear form 289
9.4. Néron–Tate heights on Jacobians 294
9.5. The Néron symbol 301
9.6. Hilbert’s irreducibility theorem 314
9.7. Bibliographical notes 326

10. The Mordell–Weil theorem 328


10.1. Introduction 328
10.2. The weak Mordell–Weil theorem for elliptic curves 329
10.3. The Chevalley–Weil theorem 335
10.4. The weak Mordell–Weil theorem for abelian varieties 341
10.5. Kummer theory and Galois cohomology 344
10.6. The Mordell–Weil theorem 349
10.7. Bibliographical notes 351

11. Faltings’s theorem 352


11.1. Introduction 352
11.2. The Vojta divisor 356
11.3. Mumford’s method and an upper bound for the height 359
11.4. The local Eisenstein theorem 360
11.5. Power series, norms, and the local Eisenstein theorem 362
11.6. A lower bound for the height 371
11.7. Construction of a Vojta divisor of small height 376
11.8. Application of Roth’s lemma 381
11.9. Proof of Faltings’s theorem 387
11.10. Some further developments 391
11.11. Bibliographical notes 400

12. The abc-conjecture 401


12.1. Introduction 401
12.2. The abc-conjecture 402
12.3. Belyĭ’s theorem 411
12.4. Examples 416
12.5. Equivalent conjectures 424
12.6. The generalized Fermat equation 435
12.7. Bibliographical notes 442

13. Nevanlinna theory 444


13.1. Introduction 444
13.2. Nevanlinna theory in one variable 444
13.3. Variations on a theme: the Ahlfors–Shimizu characteristic 457
13.4. Holomorphic curves in Nevanlinna theory 465
13.5. Bibliographical notes 477

14. The Vojta conjectures 479


14.1. Introduction 479
14.2. The Vojta dictionary 480
14.3. Vojta’s conjectures 483
14.4. A general abc-conjecture 488
14.5. The abc-theorem for function fields 498
14.6. Bibliographical notes 513

Appendix A. Algebraic geometry 514


A.1. Introduction 514
A.2. Affine varieties 514
A.3. Topology and sheaves 518
A.4. Varieties 521
A.5. Vector bundles 525
A.6. Projective varieties 530
A.7. Smooth varieties 536
A.8. Divisors 544
A.9. Intersection theory of divisors 551
A.10. Cohomology of sheaves 563
A.11. Rational maps 574
A.12. Properties of morphisms 577
A.13. Curves and surfaces 581
A.14. Connexion to complex manifolds 583

Appendix B. Ramification 586


B.1. Discriminants 586
B.2. Unramified field extensions 591
B.3. Unramified morphisms 598
B.4. The ramification divisor 599

Appendix C. Geometry of numbers 602


C.1. Adeles 602
C.2. Minkowski’s second theorem 608
C.3. Cube slicing 615

References 620
Glossary of notation 635
Index 643
Preface

Diophantine geometry, the study of equations in integer and rational numbers, is
one of the oldest subjects of mathematics and possibly the most popular part of
number theory, for the professional mathematician and the amateur alike. Cer-
tainly, one of its main attractions is that, far from being a disconnected assembly
of isolated results, it provides glimpses of a view which hints at a well-organized
underlying structure.
Diophantine equations are of course determined by the underlying algebraic equa-
tions and therefore their associated algebraic geometry, obtained by dropping the
condition that the solutions must be integers or rational numbers, plays a big role in
their study. However, algebraic geometry is already not an easy subject. A pioneer
and one of the founding fathers of algebraic geometry, the German mathematician
Max Noether, after seeing the theory of algebraic curves with its elegance, simplic-
ity, and also depth of results, and comparing it with the collection of the existing
examples of algebraic surfaces at the time, for which nothing comparable could be
found, used to say that algebraic curves were created by God and algebraic sur-
faces by the Devil. Only later, with the development of new tools, in particular
the introduction of cohomological and topological methods, the theory of surfaces
and higher-dimensional varieties over a field found a satisfactory status.
Of special importance for arithmetic was the development of algebraic geometry
over fields of positive characteristic and p-adic fields, since the study of
polynomial congruences leads very naturally to such problems. The next big
step, the study of varieties over general rings (in contrast to fields), was done by
Grothendieck in his monumental construction of the theory of schemes. This pro-
vided the basic setting for the study of diophantine equations from a geometric
point of view. Bits and pieces of a theory were provided at an early stage (Weil’s
proof of the Mordell–Weil theorem is possibly the first example) and Weil’s theory
of heights, with its good arithmetic and geometric properties, was for a long time
the main tool. However, the development of a consistent theory was hindered by
two major obstacles.
An algebraic curve X over, for example, the ring Z of rational integers is, from
the point of view of schemes, a two-dimensional object, an arithmetic surface, endowed
with a morphism f : X → Spec(Z). Ideally, we would like to find an
analogue of the classical theory of algebraic surfaces which applies in this arithmetic
setting.
This can be done only to some extent. First of all, global results require working
with complete varieties, and a first problem was to compactify Spec(Z) and de-
velop a good intersection theory for divisors. This step was brilliantly solved by
Arakelov, using adeles and introducing metrics on the “fibre at infinity.” Arakelov’s
work can be regarded as the start of a beautiful new theory, aptly named “arith-
metic geometry.” As an example, in arithmetic geometry the theory of heights is a
special chapter of the much more precise arithmetic intersection theory.
Arakelov’s theory did not solve all problems and major questions remain. In the
“horizontal direction” given by the base Spec(Z), infinitesimal methods are no
longer at our disposal and genuine new difficulties, with no counterpart in the
classical theory, do appear. This is one of the major stumbling blocks for further
progress. Thus at the present stage we may take a view half-way towards Max
Noether’s view: Arithmetic surfaces were also created by God, but their study
encounters devilish difficulties.
Today, there are already good books devoted to the subject, and we can mention
here Lang’s [169], Serre’s [277], the more expository but very comprehensive ac-
count of Lang’s [171] and Hindry and Silverman’s [153]. So, why a new book on
diophantine geometry?
As is often the case, this book grew from introductory lectures at the graduate
level, given over a decade ago at the Scuola Normale Superiore di Pisa and the
Mathematisches Forschungsinstitut of the Eidgenössische Technische Hochschule
in Zürich. An advanced knowledge of algebra or algebraic geometry was not a
prerequisite of the courses. Thus the subject was developed mainly through clas-
sical lines, namely the theory of varieties over fields of characteristic 0 insofar
as algebraic geometry was concerned, and the theory of heights for the number
theoretic aspects.
Already with the initial rough notes, embracing the view that in order to learn tools
it is best to use them in practice, it was decided to keep mathematical rigor as a
strict requirement, supplying references whenever needed and making a clear dis-
tinction between a proof and a plausible argument. Examples, including unusual
ones, and advanced sections in which deeper aspects of the theory were either
developed or described, were included whenever possible. Rather than including
this type of material as “exercises” at various levels of difficulty, often disguising
good research papers as exercises, it was decided to include proofs and extended
comments also for them. However, in the time needed to put the original material
together, the subject matter continued to advance at a fast pace, whence the need
for inclusion of additional interesting material, as well as substantial revisions of
what had been done before.

In the final product, this book is basically divided into three parts. Chapters 1 to
7 develop the elementary theory of heights and its applications to the diophantine
geometry of subvarieties of the split torus $\mathbb{G}_m^n$, including applications to diophantine
approximation with proofs of Roth’s theorem and Schmidt’s subspace theorem
and some unusual applications.
Chapters 8 to 11 deal with abelian varieties and the diophantine geometry of their
subvarieties, ending with a detailed proof of Faltings’s celebrated theorem estab-
lishing Mordell’s conjecture for curves, following Vojta’s proof as simplified in
[29]. However, we felt that a proper treatment of Faltings’s big theorem, namely
his proof of Lang’s conjectures about rational points on subvarieties of abelian va-
rieties, was best done in the context of arithmetic geometry and with regrets we
limited ourselves on this matter only to a few comments about the theorem itself
and to some of its applications.
Chapters 12 to 14 are more speculative and at times straddle the borderline be-
tween diophantine geometry and arithmetic geometry. Chapter 12 deals with the
so-called abc-conjecture over number fields, including a complete proof of Belyĭ’s
theorem and its application to Elkies’s theorem, various examples, concluding
with a finiteness result for the generalized Fermat equation, due to Darmon and
Granville. Chapter 13, which is largely self contained, is an exposition of the
classical Nevanlinna theory, with proofs of the first and second main theorems of
Nevanlinna, and also Cartan’s extension of them to the theory of meromorphic
curves. Its purpose is to motivate the final Chapter 14 dealing with the well-known
Vojta conjectures, which have spurred a great deal of work in the field.
Proofs are usually given in full detail, but of course it was not feasible to de-
velop all algebra and algebraic geometry from scratch and they tend to be fairly
condensed at times. To alleviate this, Appendix A summarizes all concepts of al-
gebraic geometry needed in this book and Appendix B gathers the necessary facts
about ramification in number theory and algebraic geometry. Both are provided
with complete references to standard books and should help the reader in under-
standing which notions and notations we use. Finally Appendix C contains an
account of Minkowski’s geometry of numbers, with proofs, at least to the extent
we need in this book.
Some sections in this book appear in small print. Their meaning is simply that
they can be omitted in a first reading, either because they require more advanced
knowledge of algebra and geometry, or because they deal with side topics not
appearing elsewhere in the book. At the end of every chapter, the reader will find
some bibliographical notes, containing both historical comments and references
to additional literature. However, in no way do these references pretend to be
complete and they only represent our personal choices for additional reading.

This book does not represent an introduction to diophantine geometry, nor a com-
plete treatment of the theory of heights. Neither do we strive for maximum gener-
ality, and most of the book is concerned only with a number field as ground field,
dealing only marginally with the function field case and even less with ground
fields of positive characteristic. Also, we do not extend the theory to semiabelian
varieties or non-split commutative linear groups, which are also quite important
and lead to delicate questions.
The whole theory of effective diophantine approximation, and Baker’s theory of
logarithmic forms, are missing entirely from this book and relegated to a few com-
ments at the end of Chapter 5. This is not due to a perception of lack of importance
of the subject. Rather, an adequate treatment of the topic would have required a
second large volume for this already large book.
The same can be said for arithmetic geometry, which no doubt deserves an ad-
vanced monograph by itself, also for the arithmetic theory of elliptic curves and
abelian varieties, and for the arithmetic theory of modular functions and its appli-
cations to diophantine problems.
Our goal in writing this book was to provide, in addition to the existing literature,
a wide selection of topics in the subject, containing foundational material with
complete proofs, numerous examples, and additional material viewed as a bridge
between the classical theory and arithmetic geometry proper. A fair portion of this
book is meant to be accessible to a reader with only a basic course in algebra and
algebraic geometry, but even the specialist in the field should be able to find interesting
material in it. We made no serious attempt to reach completeness about the
history of the subject; moreover, the referenced material (we never quote from secondary
sources) is for this very reason mostly from literature in the English and French
languages. Finally, although we attempted to put together a comprehensive bib-
liography, in no way do we pretend it to be complete. We apologize in advance
for the inevitable omissions in our bibliography, regarding priorities and precursor
works.
At the end of the book the reader will find an index of mathematical names in
lexicographic order and an index of notations ordered by page number. The vanity
index (index of authors mentioned in the text) has been omitted.
Terminology

We try to use standard terminology, but for convenience of the reader we gather
here some of the most frequently used notation and conventions.
In set theory, A ⊂ B means that A is a subset of B. In particular, A may be
equal to B. If this case is excluded, then we write $A \subsetneq B$. The complement of A
in B is denoted by B \ A as we reserve − for algebraic purposes. We denote the
number of elements of A by |A| (possibly ∞). The identity map is id.
A quasi-compact topological space is characterized by the Heine–Borel property
for open coverings. In this book, a compact space is quasi-compact and Hausdorff.
We denote by N the set of natural numbers with 0 included and Z is the ring of
rational integers. Then Q, R and C are the fields of rational, real, and complex
numbers. A positive number means x > 0, but we use $\mathbb{R}_+$ for the non-negative
real numbers. The Kronecker symbol $\delta_{ij}$ is 0 for i ≠ j and 1 for i = j.
The real (resp. imaginary) part of a complex number z is denoted by $\Re z$ (resp.
$\Im z$) and $\bar{z}$ is complex conjugation.
The floor function $\lfloor x \rfloor$, defined for x ∈ R, is the largest rational integer ≤ x. The
ceiling function $\lceil x \rceil$ denotes the smallest rational integer ≥ x.
The real functions on X are denoted by $\mathbb{R}^X$. For $f, g \in \mathbb{R}^X$, the Landau symbol
f = O(g) means |f(x)| ≤ Cg(x) for some unspecified positive constant C.
If we want to emphasize the dependence of C on parameters ε, L, . . . we write
$f = O_{\varepsilon,L,\ldots}(g)$. As a special case, f = O(1) means that f is a bounded function
on X. We also use, with the same meaning, the equivalent Vinogradov’s symbol
$f \ll g$ and $f \ll_{\varepsilon,L,\ldots} g$. The symbol $g \gg f$ is interpreted as $f \ll g$.
If X is a topological space and f, g are defined on a subset Y with an accumu-
lation point x , then f = O(g) for y → x means that |f (y)| ≤ Cg(y) holds for
all y ∈ Y contained in a neighbourhood of x . If this is true for all C > 0 (with
neighbourhoods depending on C ), then we use the Landau symbol f = o(g) for
y → x . The asymptotic relation f ∼ g for y → x means that f − g = o(|g|).
The Landau symbols and the Vinogradov symbol must be used with caution in
presence of parameters, not just because the constant involved may depend on pa-
rameters, but especially because the neighbourhood in which the inequality holds
will also depend on the parameters, an easily overlooked fact.


In number theory, we use GCD(a, b) for the greatest common divisor of a and
b. As usual, a|b means that a divides b. The number of primes up to x is π(x).
The group of multiplicative units of a commutative ring R with identity is denoted
by $R^\times$. We use the symbol $V^*$ to denote the dual of a vector space V. Rings and
algebras are always assumed to be associative, fields are always commutative. If
the rings have an identity, then we assume that ring homomorphisms send 1 to 1.
The ideal generated by g1 , . . . , gm is denoted by [g1 , . . . , gm ]. The characteristic
of a field K is char(K) and we write Fq for the finite field with q elements.
The ring of polynomials in the variable x with coefficients in K is denoted by
K[x]. A monic polynomial has highest coefficient 1. The minimal polynomial
of an algebraic number α over a field is assumed to be monic and its degree is
the degree of α , denoted by deg(α). If we consider the minimal polynomial
over Z (or any factorial ring), then we replace monic by the assumption that the
coefficients are coprime. We use $\mathbf{x}$ to denote a vector with entries $x_i$, thus $K[\mathbf{x}]$
is the ring of polynomials in the variables $x_i$. By $\overline{K}$ we denote a choice of an
algebraic closure of the field K.
For the terminology used in algebraic geometry, the reader is referred to
Appendix A.
The numbering in this book is by chapter (appendices in capitals), section, and
statement, in progressive order. Equations are numbered separately by chapter (ap-
pendices in capitals) and statement in progressive order, with the label enclosed in
parentheses. References to equations not occurring on the same page or the pre-
ceding page also give the page numbers; the first example is: (A.13) on page 558,
occurring on page 15.
1 HEIGHTS

1.1. Introduction

This chapter contains preliminary material on absolute values and the elementary
theory of heights on projective varieties. Most of this material is quite standard,
although we have included some of the finer results on classical heights which are
not usually treated in other texts.
In Section 1.2 we start with absolute values, and places are introduced as equiv-
alence classes of absolute values. The definitions of residue degree and ramifica-
tion index are given, as well as their basic properties and behaviour with respect
to finite degree extensions. In Sections 1.3 and 1.4 we introduce normalized ab-
solute values and the all-important product formula in number fields and function
fields. Section 1.5 contains the definition of the absolute Weil height in projective
spaces, the characterization of points with height 0, and a general form of Liou-
ville’s inequality in diophantine approximation. Section 1.6 studies the height of
polynomials and Mahler’s measure and proves Gauss’s lemma and its counterpart
at infinity, Gelfond’s lemma. Section 1.7, which can be omitted in a first read-
ing, elaborates further on various comparison results about heights and norms of
polynomials, including an interesting result of Per Enflo on $\ell^1$-norms.
The presentation of the material in this chapter is self contained with the excep-
tion of Section 1.2, where the basic facts about absolute values are quoted from
standard reference books (N. Bourbaki [47], Ch.VI, S. Lang [173], Ch.XII, and N.
Jacobson [157], Ch.IX).

1.2. Absolute values

Definition 1.2.1. An absolute value on a field K is a real valued function $|\ |$ on
K such that:

(a) |x| ≥ 0 and |x| = 0 if and only if x = 0.
(b) |xy| = |x| |y|.
(c) |x + y| ≤ |x| + |y| (triangle inequality).

1.2.2. The trivial absolute value is equal to 1 except at 0. If an absolute value
satisfies instead of the triangle inequality (c) the stronger condition

(c′) |x + y| ≤ max{|x|, |y|},

then it is called non-archimedean. If (c′) fails to hold for some x, y ∈ K, then
the absolute value is called archimedean. The distance of x, y ∈ K is |x − y|.
This metric induces a topology on K. In the non-archimedean case, we have an
ultrametric distance and (c′) is called the ultrametric triangle inequality. If two
absolute values define the same topology, they are called equivalent.
Proposition 1.2.3. Two absolute values $|\ |_1$, $|\ |_2$ are equivalent if and only if
there is a positive real number s such that
$$|x|_1 = |x|_2^s$$
for x ∈ K.
Proof: See [157], Th.9.1 or [173], Prop.XII.1.1.
1.2.4. A place v is an equivalence class of non-trivial absolute values. By | |v
we denote an absolute value in the equivalence class determined by the place v .
If the field L is an extension of K and v is a place of K, we write w|v for a
place w of L if and only if the restriction to K of any representative of w is a
representative of v , and say that w extends v and, equivalently, that w lies over
v . We also employ the notation w|v (that is, w divides v ), motivated by the fact
that non-archimedean places in number fields correspond to prime ideals.
The completion of K with respect to the place v is an extension field Kv with a
place w such that:

(a) w|v .
(b) The topology of Kv induced by w is complete.
(c) K is a dense subset of Kv in the above topology.

The completion exists and is unique up to isometric isomorphisms ([157], Th.9.7
or [173], Prop.XII.2.1). By abuse of notation, we shall denote the unique place w
also by v.
Example 1.2.5. If the field is Q , then there is only one archimedean place ∞ on
Q , given by the ordinary absolute value | |. We also write | |∞ for this absolute
value (cf. [157], Th.9.4).
For a prime p we have the p-adic absolute value $|\ |_p$ determined as follows. Let
m/n ∈ Q be a rational number and write it in the form
$$\frac{m}{n} = p^a \, \frac{m'}{n'},$$
where m′, n′ are integers coprime with p. Then we set
$$\left|\frac{m}{n}\right|_p = p^{-a}.$$
In fact, it suffices to define $|\ |_p$ by the conditions
$$|q|_p = \begin{cases} 1 & \text{for primes } q \neq p, \\ \frac{1}{p} & \text{if } q = p. \end{cases}$$
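The definition of the p-adic absolute value can be checked numerically. The following Python sketch is an illustration added here, not part of the original text; the names `padic_valuation` and `padic_abs` are hypothetical choices. It computes the exponent a by stripping powers of p from the numerator and denominator of a rational number, and the loop at the end tests multiplicativity (b) and the ultrametric inequality (c′) on a few sample values.

```python
from fractions import Fraction

def padic_valuation(x: Fraction, p: int) -> int:
    """Exponent a with x = p**a * m'/n', where m', n' are coprime to p (x != 0)."""
    if x == 0:
        raise ValueError("the valuation of 0 is not a finite integer")
    a, num, den = 0, x.numerator, x.denominator
    while num % p == 0:   # powers of p in the numerator raise a
        num //= p
        a += 1
    while den % p == 0:   # powers of p in the denominator lower a
        den //= p
        a -= 1
    return a

def padic_abs(x, p: int) -> Fraction:
    """p-adic absolute value |x|_p = p**(-a), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(x, p))

if __name__ == "__main__":
    print(padic_abs(Fraction(50, 3), 5))   # 50/3 = 5**2 * (2/3), so this prints 1/25
    samples = [Fraction(50, 3), Fraction(7, 10), Fraction(-25), Fraction(1, 125)]
    for x in samples:
        for y in samples:
            # (b) multiplicativity
            assert padic_abs(x * y, 5) == padic_abs(x, 5) * padic_abs(y, 5)
            # (c') ultrametric triangle inequality
            assert padic_abs(x + y, 5) <= max(padic_abs(x, 5), padic_abs(y, 5))
```

Exact rational arithmetic is used so that the checks are not affected by floating-point rounding.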
The p-adic absolute values so defined give us a set of inequivalent representatives
for all non-archimedean places on $\mathbb{Q}$ ([157], Th.9.5). The field $\mathbb{Q}_p$ of p-adic
numbers is the completion of $\mathbb{Q}$ with respect to the place p. The compact subset
$\mathbb{Z}_p$ of p-adic integers is the closure of $\mathbb{Z}$ in $\mathbb{Q}_p$ (for compactness, see [47], Ch.VI,
§5, no.1, Prop.2). On the other hand, the completion of $\mathbb{Q}$ with respect to the
archimedean place ∞ is $\mathbb{R}$. In full generality, we have the following well-known
Theorem of Ostrowski:
Theorem 1.2.6. The only complete archimedean fields are R and C .
Proof: [157], §9.5 or [47], Ch.VI, §6, no.4, Th.2. 
Proposition 1.2.7. Let K be a field which is complete relative to an absolute
value $|\ |_v$ and let L be a finite-dimensional extension field of K. Then there is
a unique extension of $|\ |_v$ to an absolute value $|\ |_w$ of L. For any x ∈ L the
equation
$$|x|_w = |N_{L/K}(x)|_v^{1/[L:K]}$$
holds, where $N_{L/K}$ is the norm from L to K. Moreover, the field L is complete
with respect to $|\ |_w$.
Proof: [157], Th.s 9.8, 9.9, 9.12 or [47], Ch.VI, §8, no.7, Prop.10.
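The norm formula of Proposition 1.2.7 can be made concrete for a quadratic extension. The sketch below is an added illustration under stated assumptions: K is the p-adic field, L is obtained by adjoining a square root of an integer d that is assumed not to be a square in K (the caller must ensure this), and elements of L with rational coordinates are written a + b√d, so that the norm is a² − d b². The function names `ord_p`, `ord_w`, `abs_w` and `multiply` are hypothetical. The code works with the additive valuation ord_w(x) = ord_p(N(x))/2, which corresponds to $|x|_w = p^{-\mathrm{ord}_w(x)}$.

```python
from fractions import Fraction

def ord_p(x: Fraction, p: int) -> int:
    """p-adic valuation of a nonzero rational number."""
    if x == 0:
        raise ValueError("the valuation of 0 is not a finite integer")
    a, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        a += 1
    while den % p == 0:
        den //= p
        a -= 1
    return a

def ord_w(a, b, d: int, p: int) -> Fraction:
    """Valuation of a + b*sqrt(d), computed from the norm a^2 - d*b^2.

    Assumes d is not a square in the p-adic field, so the extension has degree 2.
    """
    a, b = Fraction(a), Fraction(b)
    norm = a * a - d * b * b
    return Fraction(ord_p(norm, p), 2)   # divide by [L : K] = 2

def abs_w(a, b, d: int, p: int) -> float:
    """|a + b*sqrt(d)|_w = |N(a + b*sqrt(d))|_p ** (1/2), as a float."""
    return float(p) ** (-float(ord_w(a, b, d, p)))

def multiply(x, y, d: int):
    """Product of x = (a, b) and y = (c, e), each standing for a + b*sqrt(d)."""
    (a, b), (c, e) = x, y
    return (a * c + d * b * e, a * e + b * c)

if __name__ == "__main__":
    # 2 is not a square modulo 5, hence not a square in the 5-adic field.
    print(ord_w(5, 0, 2, 5))   # prints 1: the valuation of 5 itself is unchanged
    # 5 is not a square in the 5-adic field; sqrt(5) has valuation 1/2.
    print(ord_w(0, 1, 5, 5))   # prints 1/2
```

In the first case the values of ord_w on nonzero elements are integers, in the second case half-integers occur; this is the difference between an unramified and a ramified extension in the sense of 1.2.9.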
Remark 1.2.8. Clearly, the preceding proposition implies that there is a unique
extension to an absolute value on the algebraic closure of K . Note however that
the last clause of this proposition need not hold for infinite-dimensional extensions;
a well-known example is an algebraic closure of the p -adic field Qp (cf. S. Bosch,
U. Güntzer, and R. Remmert [43], 3.4.3).
1.2.9. Let K be a field with a non-archimedean place v and let L be a finite-dimensional
field extension of K. Assume that w is a place of L with w|v. The
ring
$$R_v := \{x \in K \mid |x|_v \le 1\}$$
is called the valuation ring of v. The definition is obviously independent of the
representative $|\ |_v$ of v. $R_v$ is a local ring with unique maximal ideal
$\mathfrak{m}_v := \{x \in K \mid |x|_v < 1\}$.
The residue field k(v) is defined by $R_v/\mathfrak{m}_v$. The quotient map $R_v \to k(v)$,
$x \mapsto \bar{x}$ is called the reduction. Applying it to coefficients, it extends to polynomials
and to power series.

The residue degree $f_{w/v}$ of L/K in w is the dimension of k(w) over k(v). Let
$|\ |_w$ be an absolute value representing w and $|\ |_v$ the restriction of $|\ |_w$ to K.
The value group $|K^\times|_v$ is a multiplicative subgroup of $|L^\times|_w$ and its index is
called the ramification index $e_{w/v}$ of w in v.
The place v is called discrete if the value group $|K^\times|_v$ is cyclic. Then $\mathfrak{m}_v$ is a
principal ideal and any principal generator is called a local parameter.

The following result is the very useful Hensel’s lemma.

Lemma 1.2.10. Let K be complete with respect to a non-archimedean place v.
Let $F(t) \in R_v[t]$ be a monic polynomial with reduction $\bar{F}(t) = g(t)h(t)$ for
some monic coprime polynomials $g(t), h(t) \in k(v)[t]$. Then there exist monic
polynomials $G(t), H(t) \in R_v[t]$ with $F(t) = G(t)H(t)$ and $\bar{G}(t) = g(t)$,
$\bar{H}(t) = h(t)$.
Proof: For discrete valuations, we refer to [157], §9.11. The general case is proved
in [43], 3.3.4.
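A familiar special case of Hensel’s lemma is the lifting of a simple root: if a monic polynomial F with integer coefficients has a root r modulo p at which the derivative does not vanish modulo p, then the reduction of F factors as (t − r) times a polynomial coprime to t − r, and the lemma yields a root of F in the p-adic integers. The Python sketch below is an added illustration, not the proof referred to above; it uses Newton's iteration, one power of p at a time, and the name `hensel_lift_root` is a hypothetical choice. It requires Python 3.8 or later for modular inverses via `pow`.

```python
def hensel_lift_root(coeffs, r, p, k):
    """Lift a simple root r of F modulo p to a root of F modulo p**k.

    coeffs = [c0, c1, ..., cn] are the integer coefficients of
    F(t) = c0 + c1*t + ... + cn*t**n.
    """
    def F(x, m):
        return sum(c * pow(x, i, m) for i, c in enumerate(coeffs)) % m

    def dF(x, m):
        return sum(i * c * pow(x, i - 1, m)
                   for i, c in enumerate(coeffs) if i > 0) % m

    if F(r, p) != 0:
        raise ValueError("r is not a root of F modulo p")
    if dF(r, p) == 0:
        raise ValueError("r is not a simple root of F modulo p")

    modulus = p
    for _ in range(k - 1):
        modulus *= p
        # F'(r) is a unit modulo p, hence invertible modulo every power of p.
        inverse = pow(dF(r, modulus), -1, modulus)
        # Newton step: the new r is congruent to the old one modulo the
        # previous power of p and is a root modulo the current one.
        r = (r - F(r, modulus) * inverse) % modulus
    return r

if __name__ == "__main__":
    # F(t) = t^2 - 2 reduces to (t - 3)(t + 3) modulo 7.
    root = hensel_lift_root([-2, 0, 1], 3, 7, 3)
    print(root)                      # prints 108
    print((root * root - 2) % 7**3)  # prints 0
```

With the lifted root R modulo 7^k, the factors G(t) = t − R and H(t) = t + R approximate the factorization of t² − 2 over the 7-adic integers to that precision.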
Proposition 1.2.11. Let L/K and w|v be as in 1.2.9.
(a) The residue degree and the ramification index do not change if we pass to
completions.
(b) The product of the residue degree and the ramification index is at most
[L : K], with equality if v is discrete and K is complete relative to v .
Proof: [157], §9.10 or [173], Prop.s XII.4.2, XII.6.1, and §XII.5. 
1.2.12. A number field is a finite-dimensional field extension of Q . The ring
of algebraic integers of K is denoted by OK . Now let L be a locally compact
field containing a number field K as a dense subset and assume that the topology
is not discrete. Then it follows that L is complete because it is locally compact.
The classification of non-discrete locally compact fields is well known and tells
us that there is a place v of K such that L is the completion of K with respect
to v . Moreover, if L is connected, then L is isomorphic to R or C or to a finite
extension of Qp . The closure of OK in L coincides with the valuation ring Rv
of L.

On the other hand, every completion of a number field with respect to a non-
archimedean place is a finite extension of Qp , hence locally compact. For details,
we refer to [47], Ch.VI, §9, no.3, Th.1. The following result of Artin and Whaples
is called the approximation theorem:
Theorem 1.2.13. Let $|\ |_1, \ldots, |\ |_n$ be inequivalent non-trivial absolute values on
a field K. Then for $x_1, \ldots, x_n \in K$ and ε > 0 there is x ∈ K such that
$$|x - x_k|_k < \varepsilon \qquad (k = 1, \ldots, n).$$
Proof: [157], §9.2 or [173], Th.XII.1.2.
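For the field of rational numbers and finitely many p-adic absolute values, the approximation theorem with integer targets reduces to the Chinese remainder theorem: a congruence x ≡ x_k modulo p_k^e forces $|x - x_k|_{p_k} \le p_k^{-e}$. The Python sketch below is an added illustration of this special case only; it does not treat the archimedean absolute value or non-integer targets, and the name `approximate` is a hypothetical choice. It requires Python 3.8 or later.

```python
def approximate(targets, e):
    """Return an integer x with x ≡ x_p (mod p**e) for every entry of targets.

    targets maps distinct primes p to integer targets x_p.
    Then |x - x_p|_p <= p**(-e) for each p.
    """
    x, modulus = 0, 1
    for p, x_p in targets.items():
        q = p ** e
        # Choose t so that x + modulus * t ≡ x_p (mod q); modulus is a product
        # of powers of other primes, hence invertible modulo q.
        t = ((x_p - x) * pow(modulus, -1, q)) % q
        x += modulus * t
        modulus *= q
    return x

if __name__ == "__main__":
    x = approximate({2: 1, 3: 5, 5: -2}, 4)
    # x is simultaneously close to 1 (2-adically), 5 (3-adically), -2 (5-adically)
    print((x - 1) % 2**4, (x - 5) % 3**4, (x + 2) % 5**4)   # prints 0 0 0
```

Given ε > 0, choosing e with $p_k^{-e} < \varepsilon$ for every k gives the inequality of the theorem in this special case.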

1.3. Finite-dimensional extensions

Let K be a field with a fixed non-trivial absolute value $|\ |_v$.

Proposition 1.3.1. Let L be a finite-dimensional field extension of K generated
by a single element ξ. If f(t) is the monic minimal polynomial of ξ over K and
$$f(t) = f_1^{k_1}(t) \cdots f_r^{k_r}(t)$$
is its decomposition into different irreducible monic factors $f_j(t) \in K_v[t]$, then
for each j there is an injective homomorphism
$$\iota : L \longrightarrow K_j := K_v[t]/(f_j(t))$$
of field extensions over K, given by $\xi \mapsto t$. There is a unique extension $|\ |_j$ of the
absolute value of $K_v$ to $K_j$. The absolute values $|\ |_j$ are pairwise inequivalent.
Moreover, $K_j$ is the completion of L with respect to $|\ |_j$ and the embedding ι.
For any absolute value $|\ |_w$ extending $|\ |_v$ to L, there is a unique j ∈ {1, . . . , r}
such that the restriction of $|\ |_j$ to L is equal to $|\ |_w$.
Proof: Proposition 1.2.7 leads to the unique extension | |j of the absolute value of
Kv . The map ι is a well-defined homomorphism of field extensions over K and
the image of ι is dense in Kj . Hence Kj is the completion of L with respect to
| |j . If the restrictions of | |j and | |k are equivalent, then we have an isometric
isomorphism of Kj onto Kk, leaving Kv fixed. Therefore, the images of ξ have
to be roots of the same irreducible factor of f (t) in Kv [t], yielding j = k . Let
| |w be an absolute value on L extending | |v . The closure of K in Lw can be
identified with Kv . Now ξ generates a finite-dimensional subfield of Lw over
Kv which is complete by Proposition 1.2.7, therefore this subfield is Lw itself.
Also ξ must be a root of some fj , hence Lw is isomorphic to Kj over Kv .
Moreover, we can assume that L is fixed under this isomorphism. Then it is clear
from Proposition 1.2.7 that | |w is equal to the restriction of | |j to L. □
Corollary 1.3.2. If L is a finite-dimensional separable field extension of K , then
\[ \sum_{w|v} [L_w : K_v] = [L : K], \]
where the sum ranges over all places w of L with w|v .


Proof: By the primitive element theorem (see N. Jacobson [156], §4.14), there
is an element ξ of L which generates L over K . Proposition 1.3.1 implies the
formula. □

Remark 1.3.3. If the extension is not separable, we still have
$\sum_{w|v} [L_w : K_v] \le [L : K]$. If L is generated by a single element over K, this is
clear from Proposition 1.3.1. For the general case, we use induction on the degree.
Definition 1.3.4. The number [Lw : Kv ] is called the local degree of L/K in w .
Corollary 1.3.5. Let L/K be a finite-dimensional Galois extension with Galois
group G = Gal(L/K) and let | |w0 , | |w be absolute values on L extending
| |v . Then there is an element σ ∈ G with
|x|w = |σ(x)|w0 for x ∈ L.
The completions Lw and Lw0 are isomorphic over Kv . However, they need not
be isomorphic over L.
Proof: As in the proof of Corollary 1.3.2, there is an element ξ of L with L =
K(ξ). If f (t) is the minimal polynomial of ξ over K , then Lw is obtained by
adjoining a root of fj (t) to Kv in a fixed splitting field of f over Kv , where
fj (t) is an irreducible factor of f (t) in Kv [t]. Since L is a Galois extension, all
roots of f are contained in Lw , therefore Lw = Lw0 as a field. Then the absolute
values | |w0 and | |w correspond to embeddings ι0 and ι of L into Lw0 over
K . There is a unique ρ ∈ Gal(Lw0 /Kv ) with ι = ρ ◦ ι0 , given by ι0 (ξ) → ι(ξ).
If | | is the unique absolute value of Lw0 extending the one of Kv and if σ is the
unique element of G with ρ ◦ ι0 = ι0 ◦ σ , then
|x|w = |ι(x)| = |ρ ◦ ι0 (x)| = |ι0 ◦ σ(x)| = |σ(x)|w0 for x ∈ L. □
1.3.6. Let K be a field with a fixed non-trivial absolute value | |v . We consider
a finite-dimensional separable extension field L of K and a place w of L with
w|v . For any x ∈ L, we define
\[ \|x\|_w := |N_{L_w/K_v}(x)|_v \]
and
\[ |x|_w := |N_{L_w/K_v}(x)|_v^{1/[L:K]}. \]
We know from Proposition 1.2.7 that the restriction of $|N_{L_w/K_v}(\cdot)|_v^{1/[L_w:K_v]}$ to L is
a representative of w extending | |v . The obvious inequality [Lw : Kv ] ≤ [L : K]
implies that | |w is an absolute value representing w . If v is not archimedean or
[Lw : Kv ] = 1, we have that ‖ ‖w is also an absolute value representing w . On
the other hand, if v is archimedean and [Lw : Kv ] = 2, we have Lw = C and
Kv = R . Assume that the restriction of | |v to Q is the ordinary absolute value;
then ‖ ‖w is not an absolute value because the triangle inequality is not satisfied.
Lemma 1.3.7. Let x ∈ K \ {0} and y ∈ L \ {0}. With the notation above
\[ \sum_{w|v} \log |x|_w = \log |x|_v, \]
\[ \sum_{w|v} \log \|y\|_w = \log |N_{L/K}(y)|_v. \]
Proof: Corollary 1.3.2 implies the first statement. There is an element ξ of L with
L = K(ξ). With the notation of Proposition 1.3.1, we have k1 = · · · = kr = 1
and an isomorphism
\[ L \otimes_K K_v \longrightarrow \prod_{j=1}^{r} K_v[t]/(f_j(t)) \]
of Kv -algebras, given by $\xi \mapsto (t)_{j=1,\dots,r}$ (this is a form of the Chinese
remainder theorem). By Proposition 1.3.1 we get
\[ N_{L/K}(y) = \prod_{w|v} N_{L_w/K_v}(y), \]
proving the second claim. □


1.3.8. If K is a number field, the archimedean absolute values of K are deter-
mined by the embeddings σ : K −→ C of K into the complex numbers. There
are exactly [K : Q] such embeddings. An embedding σ is said to be real if σ(K)
is in the real subfield R of C , and complex otherwise. If σ is a complex
embedding, composition with complex conjugation yields a conjugate embedding σ̄ ,
and it is clear that σ and σ̄ determine the same archimedean absolute value.
Conversely, if σ and σ′ are two embeddings of K in C determining the same absolute
value, we have σ′ = σ or σ′ = σ̄ . All this is immediate from Proposition 1.3.1,
because K has a primitive element over Q .
The completion of K at an archimedean place is isometric to either R or C . Ac-
cordingly, the set of archimedean places is subdivided into real places and com-
plex places.
Example 1.3.9. Let p be an odd prime and K = Q(ζ) with ζ a primitive p th root of
unity. Our goal in this example is to determine all extensions of an absolute value of Q to
K.
The minimal polynomial of ζ over Q is given by
\[ f(t) = t^{p-1} + t^{p-2} + \cdots + 1, \]
which is proved by applying Eisenstein’s criterion to f (t + 1) .
To begin with, we determine the extensions of the ordinary absolute value | |∞ of Q to
K . The irreducible factors of f (t) in R[t] have degree 2 . By Proposition 1.3.1, there are
exactly (p − 1)/2 extensions of | |∞ ; all archimedean absolute values of K are associated to
the (p − 1)/2 pairs of complex conjugate embeddings of K and the local degree is equal
to 2 .
Next, we consider the extensions of the non-archimedean absolute value associated to a
prime number q . Suppose first that q ≠ p . We need to decompose f (t) into irreducible
factors over Qq . There is a smallest number r ≥ 1 with $p \mid q^r - 1$ , determined by the
property that $\mathbb{F}_{q^r}$ is the smallest field of characteristic q containing a non-trivial p th root
of unity (note also that, by Fermat’s little theorem, r | p − 1 ). In that case, the field $\mathbb{F}_{q^r}$
contains all p th roots of unity. Hence f (t) is a product of (p − 1)/r distinct irreducible factors
in Fq [t] , each of degree r . The same is true in Qq [t] by Hensel’s lemma (see Lemma 1.2.10).
We conclude again by Proposition 1.3.1 that there are exactly (p − 1)/r extensions of | |q to K
and the local degrees are equal to r . It is obvious that ζ remains a unit in any completion
of K with respect to such an absolute value and its representative in the residue field is also
a non-trivial primitive p th root of unity. By Proposition 1.2.11, the residue degree is equal
to r and the ramification index is 1 .
It remains to consider the place p . As before, Eisenstein’s criterion shows that the polyno-
mial f (t) is irreducible in Qp [t] . Then there is only one extension | |v of | |p to K and
the local degree is p − 1 . The minimal polynomial of ζ − 1 over Qp is f (t + 1) and so
$N_{K_v/\mathbb{Q}_p}(\zeta - 1) = p$ . Proposition 1.2.7 implies
\[ |\zeta - 1|_v = p^{-1/(p-1)}. \]
By Proposition 1.2.11, the ramification index is equal to p − 1 and the residue degree is
equal to 1 .
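As a rough computational companion to Example 1.3.9, the following Python sketch tabulates how a rational prime q behaves in K = Q(ζ). The names `multiplicative_order` and `cyclotomic_places` are illustrative choices and not notation from the text. The code only encodes the conclusions of the example (for q ≠ p the number r is the order of q modulo p, and the place p is totally ramified); it assumes that p is an odd prime and q is a prime, and does not check this.

```python
def multiplicative_order(q: int, p: int) -> int:
    """Smallest r >= 1 with q**r == 1 (mod p); requires p not dividing q."""
    if q % p == 0:
        raise ValueError("q must not be divisible by p")
    r, power = 1, q % p
    while power != 1:
        power = (power * q) % p
        r += 1
    return r


def cyclotomic_places(p: int, q: int) -> dict:
    """Places of Q(zeta_p) over the prime q, following Example 1.3.9."""
    if q == p:
        # Totally ramified: one place, ramification index p - 1, residue degree 1.
        return {"places": 1, "local_degree": p - 1, "e": p - 1, "f": 1}
    r = multiplicative_order(q, p)
    # Unramified: (p - 1)/r places, each of local degree r.
    return {"places": (p - 1) // r, "local_degree": r, "e": 1, "f": r}


for p, q in [(7, 2), (7, 3), (7, 13), (7, 7)]:
    info = cyclotomic_places(p, q)
    # Corollary 1.3.2: the local degrees add up to [K : Q] = p - 1.
    assert info["places"] * info["local_degree"] == p - 1
    print(p, q, info)
```

In each case the product e · f equals the local degree, in line with Proposition 1.2.11.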
1.3.10. In the final part of this section, we handle finite-dimensional field extensions without
separability assumptions. It turns out that it suffices to adjust the exponents in the normal-
ization 1.3.6. Since we focus almost exclusively on number fields, the reader may skip the
rest of this section in a first reading.
Let K be a field with absolute values | |v and let L/K be a finite-dimensional field
extension. Our goal is to generalize Proposition 1.3.1 describing the extensions of | |v to
the field L .
Since L ⊗K Kv is a finite-dimensional Kv -algebra, the structure theorem of commutative
artinian rings ([157], Th.7.13) gives uniquely determined ideals Rj , which are local
Kv -algebras with maximal ideals mj and such that
\[ L \otimes_K K_v = \prod_{j=1}^{r} R_j. \tag{1.1} \]

We have natural embeddings of L and Kv into the residue field Kj = Rj /mj of Rj . By
Proposition 1.2.7, there is a unique extension | |j of | |v to Kj . Clearly, L is dense in
Kj , whence Kj is the completion of L with respect to this absolute value.
Proposition 1.3.11. The restrictions of | |j , j = 1, . . . , r , to L are all extensions of | |v
to absolute values on L and they are pairwise inequivalent.
Proof: Suppose the restrictions to L of | |j and | |k are equivalent. Then there is an
isomorphism $\varphi : K_j \xrightarrow{\ \sim\ } K_k$ which is the identity on L and Kv . Let ϕj be the canonical
isomorphism of L ⊗K Kv onto Kj . Then $\varphi = \varphi_k \circ \varphi_j^{-1}$ (check on L and Kv ), which
is possible only if j = k . This proves the last clause of our claim.
Let | |w be an extension of | |v to L . The closure of K in Lw will be identified with
Kv . Since L is finite dimensional over K , it follows that LKv is a closed subfield of Lw ,
from which we conclude that LKv = Lw . By the universal property of the tensor product,
there is a homomorphism of L ⊗K Kv onto Lw . The kernel of this homomorphism is a
maximal ideal, hence equal to $m_j \times \prod_{k \neq j} R_k$ for some j . Therefore, Kj is isomorphic to
Lw as a Kv -algebra, and by Proposition 1.2.7 we infer that it is in fact an isometry. This
shows that the restriction of | |j to L is | |w , as we wanted. □
1.3.12. Now we are ready to define the correct normalizations of the absolute values as
above. We have seen that the places w of L with w|v are in one-to-one correspondence
with the local Kv -algebras in (1.1), thus we may write
\[ L \otimes_K K_v = \prod_{w|v} T_w \tag{1.2} \]
and identify the residue field of Tw with the completion Lw . For y ∈ L , we set
\[ \|y\|_w := |N_{L_w/K_v}(y)|_v^{[T_w : L_w]} \]
and
\[ |y|_w := \|y\|_w^{1/[L:K]}. \]
With these modifications the analogue of Lemma 1.3.7 still holds, namely:
Lemma 1.3.13. If x ∈ K \ {0} and y ∈ L \ {0} , then
\[ \sum_{w|v} \log |x|_w = \log |x|_v, \tag{1.3} \]
\[ \sum_{w|v} \log \|y\|_w = \log |N_{L/K}(y)|_v. \tag{1.4} \]
Proof: Formula (1.4) is a trivial consequence of Proposition 1.2.7 and (1.2). If we set y = x
in (1.4), then (1.3) follows immediately from (1.2). □

1.4. The product formula

The product formula over Q may be stated and proved as a consequence of the
factorization of a non-zero rational number into a product of primes and a unit. In
spite of its simplicity and essentially trivial nature, it plays a fundamental role and
its importance cannot be overstated. The fact that it involves all places, including
the places at ∞ , means that, from the geometrical point of view, we are dealing
with a complete variety. In the case considered here, the general fibre of the variety
is a point and everything is quite simple. However, the best interpretation of the
product formula and its generalizations is found in the framework of Arakelov
theory.
1.4.1. Let K be a field and MK be a set of non-trivial inequivalent absolute values
on K such that the set
\[ \{\, |\ |_v \in M_K \;:\; |x|_v \neq 1 \,\} \]
is finite for any x ∈ K\{0}. We identify the elements of MK with the
corresponding places and say that MK satisfies the product formula if
\[ \prod_{v \in M_K} |x|_v = 1 \]
for any x ∈ K\{0}.


We shall also refer to
\[ \sum_{v \in M_K} \log |x|_v = 0 \]
as the product formula for x ≠ 0.

If L/K is a finite-dimensional extension and MK is a set of places with associated
normalized absolute values satisfying the product formula, we obtain a set of
places ML consisting of representatives | |w of w|v for v ∈ MK , normalized as
in 1.3.6 and 1.3.12.
Proposition 1.4.2. The set of places ML so normalized again satisfies the product
formula.
Proof: Let x ∈ L× . We need to check that |x|w ≠ 1 only for finitely many
w ∈ ML . Since x is algebraic over K , we have
\[ x^n + a_{n-1} x^{n-1} + \cdots + a_0 = 0 \tag{1.5} \]
for suitable ai ∈ K . By assumption, we have |ai |v ∈ {0, 1} up to finitely many
v ∈ MK . Since there are only finitely many w ∈ ML lying over a given v
(Corollary 1.3.2), we have |ai |w ≤ 1, up to finitely many w ∈ ML . Clearly, there
are only finitely many archimedean places in MK and hence in ML . Thus it is
enough to consider non-archimedean w ∈ ML and then the ultrametric inequality
applied to (1.5) shows that |x|w ≤ 1 whenever all coefficients |ai |w ≤ 1. The
same argument applied to 1/x completes the proof that |x|w = 1 up to finitely
many w ∈ ML .
Once this is done, it is immediate from Lemmas 1.3.7 and 1.3.13 that the normalized
set of absolute values on L satisfies the product formula. □
1.4.3. By Example 1.2.5, we get
MQ := {| |p | p prime number or p = ∞},
normalized as follows. If p = ∞ , then | |p is the ordinary absolute value on Q ,
and, if p is prime, then the absolute value is the p -adic absolute value on Q , with
|p|p = 1/p .
Let K be a number field and let MK be the associated set of places and normal-
ized absolute values, obtained from the above construction applied to the extension
K/Q .
Proposition 1.4.4. If K is a number field, MK satisfies the product formula.
Proof: By the above discussion, we can assume that K is equal to Q and it is
obviously enough to show the product formula for a prime number x :
\[ \prod_{p \in M_{\mathbb{Q}}} |x|_p = |x|_x \, |x|_\infty = \frac{1}{x} \cdot x = 1. \]
□
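The product formula over Q is easy to check by machine. The following Python sketch uses exact rational arithmetic; the helper names `prime_factors`, `padic_abs` and `product_over_places` are illustrative and not taken from the text. Only the primes dividing the numerator or the denominator can contribute a factor different from 1, so the product is a finite one.

```python
from fractions import Fraction


def prime_factors(n: int) -> set:
    """Set of primes dividing the positive integer n (trial division)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes


def padic_abs(x: Fraction, p: int) -> Fraction:
    """Normalized p-adic absolute value on Q, with |p|_p = 1/p."""
    if x == 0:
        return Fraction(0)
    num, den, order = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        order += 1
    while den % p == 0:
        den //= p
        order -= 1
    return Fraction(1, p) ** order


def product_over_places(x: Fraction) -> Fraction:
    """Product of |x|_v over all places v of Q, for x != 0."""
    result = abs(x)  # contribution of the archimedean place
    for p in prime_factors(abs(x.numerator)) | prime_factors(x.denominator):
        result *= padic_abs(x, p)
    return result


# Every non-zero rational number has product 1 over all places.
assert product_over_places(Fraction(-84, 55)) == 1
```

Working with `Fraction` rather than floating point keeps the check exact.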
In this book, whenever we talk about MK of a number field it will always be
the set so constructed from MQ . This is important for our normalizations, which
we repeat: If p = ∞ , then | |p is the ordinary absolute value on Q , and, if p is
prime, then the absolute value is the p -adic absolute value on Q , with |p|p = 1/p .
In either case, we have
\[ |x|_v := |N_{K_v/\mathbb{Q}_p}(x)|_p^{1/[K:\mathbb{Q}]} \tag{1.6} \]
for x ∈ K and v|p .
As an application of our previous considerations in this chapter, we prove the fol-
lowing refinement of Theorem 1.2.13 for a number field K , called the strong
approximation theorem:
Theorem 1.4.5. Let (| |v )v∈S be representatives for a finite set S of non-archimedean
places of the number field K , let xv ∈ Kv for every v ∈ S , and let
ε > 0. Then there is x ∈ K with |x − xv |v < ε for all v ∈ S and |x|v ≤ 1 for
all non-archimedean v ∉ S .
Proof: There is no loss of generality in assuming that xv ∈ K , because, by def-
inition of completion, K is dense in Kv . By Proposition 1.2.3, we may also
assume that the absolute values extend the p -adic absolute values | |p from Ex-
ample 1.2.5. By Corollary 1.3.2, there are only finitely many places lying over a
natural prime number, hence we may enlarge S to the set of all places lying over a
finite set S0 of prime numbers, taking xv = 0 at every new place v introduced by
doing so. For any x ∈ K × , Proposition 1.4.2 shows that |x|v ≠ 1 only for finitely
many places v . Now take x to be the approximation to xv (v ∈ S ) obtained from
Theorem 1.2.13 with ε = 1. Then there is a finite set S1 of prime numbers, disjoint
from S0 , such that |x|v = 1 for all places v of K which do not lie over
S0 ∪ S1 . By the Chinese remainder theorem, there is m ∈ Z with |m − 1|p < δ
for p ∈ S0 and |m|p < δ for p ∈ S1 . If we choose δ > 0 sufficiently small, then
the approximation mx satisfies the conclusion of Theorem 1.4.5. □
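The Chinese remainder step at the end of this proof can be made concrete over Z. The sketch below uses a hypothetical helper `crt_multiplier`: for an exponent k it builds an integer m with m ≡ 1 modulo p^k for p ∈ S0 and m ≡ 0 modulo p^k for p ∈ S1, so that |m − 1|p ≤ p^(−k) and |m|p ≤ p^(−k) respectively. Taking k large enough that p^(−k) < δ for all primes involved gives an m as in the proof. The code assumes that S0 and S1 are disjoint lists of primes and needs Python 3.8 or later (for `math.prod` and modular inverses via `pow`).

```python
from math import prod


def crt_multiplier(s0, s1, k: int) -> int:
    """Integer m with m = 1 mod p**k for p in s0 and m = 0 mod p**k for p in s1."""
    zero_part = prod(p ** k for p in s1)  # m is a multiple of this number
    one_part = prod(p ** k for p in s0)   # modulus for the congruences m = 1
    # zero_part is invertible modulo one_part because s0 and s1 are disjoint.
    u = pow(zero_part, -1, one_part)
    return zero_part * u


m = crt_multiplier([2, 3], [5, 7], 3)
# m is p-adically close to 1 on S0 = {2, 3} and close to 0 on S1 = {5, 7}.
assert all((m - 1) % p ** 3 == 0 for p in (2, 3))
assert all(m % p ** 3 == 0 for p in (5, 7))
```

Solving a single congruence modulo the product of the prime powers is equivalent to solving the system, since the moduli are pairwise coprime.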
1.4.6. Function fields (see A.4.11) are also important examples in diophantine geometry,
where the product formula holds and we devote the rest of this section to its discussion. In
order to understand the background from algebraic geometry, the reader may consult the
material on divisors in Sections A.8 and A.9. Since the focus of this book is mostly on
number fields, we may skip the proofs in a first reading.

Let X be a projective irreducible variety over a field K and let us fix an ample line bundle
L (see A.6.10). We denote by deg(Z) the degree of a cycle Z with respect to L . Since
the function field does not change by passing to the normalization (see A.12.6), we may and
shall assume that X is regular in codimension 1 (see A.8.10). For any prime divisor Z of
X , the local ring OX,Z is a discrete valuation ring and the valuation of f ∈ K(X)× is the
order of f at Z . The latter is denoted by ordZ (f ) . Since the degree of a principal divisor
is 0 , we have
\[ \sum_{Z} \operatorname{ord}_Z(f) \deg(Z) = 0. \tag{1.7} \]
We normalize the absolute value corresponding to ordZ by
\[ |f|_Z := c^{\operatorname{ord}_Z(f) \deg(Z)}, \]
where c is some fixed number, 0 < c < 1 .
Proposition 1.4.7. The absolute values | |Z , Z prime divisor of X , are not trivial,
inequivalent, and satisfy the product formula.
Proof: Since OX,Z is a discrete valuation ring, the absolute value | |Z is not trivial. Let
Y, Z be different prime divisors. We may think of X as embedded in projective space.
There is a hyperplane not containing Y, Z . Let H be the corresponding very ample divisor
on X and let n ∈ N be so large that nH + Y is very ample (see A.6.10). By the same
argument as above, there is an effective divisor H′ not containing Y and Z such that
\[ H' = nH + Y + \operatorname{div}(f) \]
for some f ∈ K(X)× . We have |f |Z = 1 and $|f|_Y = c^{-\deg(Y)}$ . We conclude that | |Y
and | |Z are inequivalent. The product formula is a consequence of (1.7). □
Example 1.4.8. Let C be an irreducible projective curve over K . The curve is regular
if and only if it is normal (see A.8.10). So let C be regular (note however that this does
not mean that C is smooth, see A.13.3). Then C is a regular model for the function field
K(C) . The prime divisors of C are in one-to-one correspondence with orbits of points
under Gal(K̄/K) (see A.2.7). The order at a prime divisor is the order at any point in the
associated orbit.
This example fits with the preceding considerations if L = O([P0 ]) for some point P0 ∈
C(K) (and L is of course automatically ample). The product formula follows from the
fact that any rational function f ∈ K(C)× has the same number of zeros and poles.
1.4.9. On the function field K(X) , we shall always use the set of absolute values consid-
ered above. We denote it by MX to emphasize the role of the model X . Obviously, the
choice of the constant c is irrelevant. Usually, we shall choose c = 1/e .
Lemma 1.4.10. The following statements hold:

(a) Any finitely generated field over K is a function field of an irreducible projective
normal variety over K .
(b) Two irreducible varieties are birationally equivalent over K if and only if they have
isomorphic function fields over K .
(c) If L is a finite-dimensional extension of the function field K(X) of an irreducible
projective variety X over K , then there are an irreducible projective normal
variety Y over K , and a finite surjective morphism ϕ : Y −→ X , such that
L ≅ K(Y ) and the inclusion K(X) ⊂ L corresponds to $\varphi^\sharp$ : K(X) −→ K(Y ) .
(d) In (c), there is a distinguished choice for Y called the normalization of X in
L , uniquely characterized up to isomorphisms by the following property: Given a
dominant morphism ϕ′ : Y′ → X of an irreducible normal K -variety Y′ to X
and a homomorphism ρ : K(Y ) → K(Y′ ) over K(X) , then there is a unique
dominant morphism ψ : Y′ → Y with $\psi^\sharp$ = ρ .
Proof: Let L be a finitely generated field over K with generators x1 , . . . , xn . Let I be
the kernel of the homomorphism
\[ K[t_1, \dots, t_n] \longrightarrow L \]
given by ti ↦ xi . Denote the corresponding closed subvariety of $\mathbb{A}^n_K$ by X . Since I is
prime, X is an irreducible variety. Obviously, K(X) is isomorphic to L . The closure X̄
of X in $\mathbb{P}^n_K$ is a projective variety. The normalization of X̄ is a projective normal variety
(see A.12.7) with function field L (see A.12.6, A.12.7). This proves (a).
For (b), see A.11.4.
To prove (c) and (d), a generalization of the construction in A.12.6 leads to the normalization
of X in L . If X were affine, then we would take the integral closure of K[X] in L . For
any variety X , we glue the normalizations of the affine open charts to get the normalization
of X (see A. Grothendieck [135], 6.3.9). The morphism from the normalization to X is
finite ([135], 6.3.10) and hence projectivity of X implies projectivity of the normalization
(see A.12.7). □

Remark 1.4.11. Note that a curve regular in codimension 1 is regular and hence deter-
mined up to isomorphism by its function field. For a higher-dimensional function field
K(X) , there is no canonical choice for the model X . Even if X is smooth, we may blow
up a point to get another smooth model X′ for K(X) and it is clear that MX ≠ MX′ .
Hence we always fix a model when dealing with higher-dimensional function fields.
If L/K(X) is a finite extension, then Lemma 1.4.10 (d) shows that the normalization
ϕ : Y → X of X in L is a canonical model for the function field L . Let ϕ′ : Y′ → X
be any finite surjective morphism of an irreducible projective variety Y′ onto X with
K(Y′ ) = L and with K(X) → L equal to $(\varphi')^\sharp$ . We claim that MY = MY′ .
We first show that we may replace Y′ by its normalization $\widetilde{Y}'$ (in L ) without changing the
set of places. Indeed, the normalization morphism π : $\widetilde{Y}'$ → Y′ is finite and birational.
Since $\widetilde{Y}'$ and Y′ are projective, the valuative criterion of properness (see A.11.10) shows
that π induces an isomorphism outside of subsets of codimension ≥ 2 in $\widetilde{Y}'$ and Y′ ,
hence $M_{\widetilde{Y}'} = M_{Y'}$ . So we may assume Y′ normal and then Lemma 1.4.10 (d) yields a
unique dominant morphism ψ : Y′ → Y factoring through ϕ . The morphism ψ is proper
(see A.6.15) and has finite fibres, hence ψ is finite (see A.12.4). Since K(Y′ ) = K(Y ) =
L , the morphism ψ is also birational (Lemma 1.4.10 (b)) and the same argument as above
shows that ψ is an isomorphism in codimension 1 , hence MY = MY′ .
We see that the set of places of L is well determined by X and in the following examples
we will show that MY is the set of places of L extending the places of MX .

Example 1.4.12. Let us consider a finite-dimensional field extension of the function field
K(C) of an irreducible projective regular curve C over the ground field K . By Lemma
1.4.10, there is an irreducible projective regular curve C′ over K and a morphism ϕ :
C′ → C such that the extension corresponds to the extension K(C′ )/K(C) induced by
ϕ . We know from the above that the order at a closed irreducible subset Z of dimension 0
induces an absolute value | |v ∈ MC . Since C is regular, Cartier divisors can be identified
with Weil divisors (cf. A.8.21) and $\varphi^*(Z)$ is well defined. We have
\[ \varphi^*(Z) = \sum_{Z'} m_{Z'} Z', \]
where Z′ ranges over all irreducible closed subsets of C′ lying over Z and where mZ′
denotes the multiplicity in Z′ . Note that for f ∈ K(C)× we have
$\operatorname{ord}_{Z'}(f) = m_{Z'} \operatorname{ord}_Z(f)$ .
Thus Z′ induces a place v′ on C′ with v′ |v . Its ramification index and residue degree are
\[ e_{v'/v} = m_{Z'}, \qquad f_{v'/v} = [K(Z') : K(Z)]. \]
The projection formula for proper intersection products (see W. Fulton [125], Prop.2.5(c))
gives
\[ Z . \varphi_*(C') = \varphi_*(\varphi^*(Z)), \]
hence
\[ [K(C') : K(C)] = \sum_{Z'} m_{Z'} [K(Z') : K(Z)] = \sum_{v'} e_{v'/v} f_{v'/v}. \]
By Remark 1.3.3 and Proposition 1.2.11, we see that all places v′ dividing v are induced
by the “fibre points” Z′ and the local degree satisfies
\[ [K(C')_{v'} : K(C)_v] = m_{Z'} [K(Z') : K(Z)]. \]
Example 1.4.13. In order to extend the above example to higher dimensions, we use the
language of schemes. Let us consider a finite-dimensional extension of a function field
K(X) . By Lemma 1.4.10, we may identify this extension with an extension K(X′ )/K(X)
induced by a finite surjective morphism ϕ : X′ → X of irreducible projective varieties
over K and regular in codimension 1 .
Let Z be a prime divisor on X with corresponding place v . To study the places v′ of X′
with v′ |v , we may assume that X , and hence X′ , are affine. The fibre over the generic
point ζ of Z is the affine scheme
\[ \varphi^{-1}(\zeta) = \operatorname{Spec}\bigl( K[X'] \otimes_{K[X]} K(\zeta) \bigr), \]
where K(ζ) is the residue field of OX,ζ . By the structure theorem of finite-dimensional
algebras ([157], Th.7.13), we have
\[ K[X'] \otimes_{K[X]} K(\zeta) \cong \prod_{\xi \in \varphi^{-1}(\zeta)} \mathcal{O}_{\varphi^{-1}(\zeta),\xi}. \]
Note that $K[X'] \otimes_{K[X]} \mathcal{O}_{X,\zeta}$ is a finitely generated module over the discrete valuation
ring OX,ζ . Since it is also torsion free (as a subring of K(X′ ) ), it is free of rank [K(X′ ) :
K(X)] . We conclude that
\[ [K(X') : K(X)] = \sum_{\xi \in \varphi^{-1}(\zeta)} [\mathcal{O}_{\varphi^{-1}(\zeta),\xi} : K(\zeta)]. \]
Clearly, the order at any point ξ ∈ ϕ−1 (ζ) yields a place v′ of K(X′ ) with v′ |v . Indeed,
denoting by tζ a local parameter in OX,ζ , we have
\[ \operatorname{ord}_\xi(f) = \operatorname{ord}_\xi(t_\zeta)\, \operatorname{ord}_\zeta(f) \]
for any f ∈ K(X)× . Moreover, we verify that
\[ e_{v'/v} = \operatorname{ord}_\xi(t_\zeta), \qquad f_{v'/v} = [K(\xi) : K(\zeta)]. \]
Now, using $\mathcal{O}_{\varphi^{-1}(\zeta),\xi} = \mathcal{O}_{X',\xi} / (t_\zeta)$ , we see that
\[ [\mathcal{O}_{\varphi^{-1}(\zeta),\xi} : K(\zeta)] = e_{v'/v} f_{v'/v}. \]
Therefore, as in the preceding example, we conclude that all places of K(X′ ) dividing v
are induced by points of ϕ−1 (ζ) .
Moreover, let L be an ample line bundle on X . Then the absolute values are normalized
by
\[ |f|_v := c^{\operatorname{ord}_v(f) \deg_L(v)} \qquad (v \in M_{K(X)},\ f \in K(X)) \]
to satisfy the product formula for some constant c . If we use the normalizations from 1.3.6
on K(X′ ) and the equation
\[ \deg_{\varphi^* L}(w) = [K(w) : K(v)] \deg_L(v) \]
obtained from the projection formula (A.13) on page 558, then the above implies
\[ |g|_w = c_1^{\operatorname{ord}_w(g) \deg_{\varphi^* L}(w)} \qquad (w \in M_{K(X')},\ g \in K(X')) \]
for $c_1 := c^{1/[K(X'):K(X)]}$ . Note that $\varphi^* L$ is ample (cf. A.12.7) and hence the
normalizations on K(X′ ) fit with 1.4.6 for the new constant c1 .

1.5. Heights in projective and affine space

1.5.1. We denote by $\overline{\mathbb{Q}}$ a choice of an algebraic closure of Q . Let us consider
the projective space $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ with standard global homogeneous coordinates x = (x0 :
x1 : · · · : xn ). Let P ∈ $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ . We now define a function, called height, on
algebraic points of $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ , which may be considered as a measure of the “algebraic
complication” needed to describe P . This is a fundamental notion at the basis of
diophantine geometry.
Let P be a point of $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ represented by a homogeneous non-zero vector x with
coordinates in a number field K . Then we set
\[ h(x) := \sum_{v \in M_K} \max_j \log |x_j|_v. \]

Lemma 1.5.2. h(x) is independent of the choice of K .
Proof: Let L be another number field containing the coordinates x0 , . . . , xn of
x . We can assume that K ⊂ L. Then
\[ \sum_{w \in M_L} \max_j \log |x_j|_w = \sum_{v \in M_K} \sum_{w|v} \max_j \log |x_j|_w. \]
Our claim now follows from the first formula of Lemma 1.3.7. □
Lemma 1.5.3. h(x) is independent of the choice of coordinates.
Proof: Let y be another coordinate vector. By the preceding lemma, we may
assume that x0 , . . . , xn , y0 , . . . , yn ∈ K . There is λ ∈ K , λ ≠ 0, with y = λx ,
hence
\[ h(y) = \sum_{v \in M_K} \max_j \log |y_j|_v = \sum_{v \in M_K} \log |\lambda|_v + \sum_{v \in M_K} \max_j \log |x_j|_v. \]
Thus we get h(y) = h(x) by the product formula. □


These two lemmas show that the height so defined depends only on the point P
and not on the choice of coordinates of a homogeneous vector representing P .
Definition 1.5.4. We call h(x) the absolute logarithmic height (briefly, height)
of P and we denote it by h(P ). We also use the multiplicative height
$H(P) = e^{h(P)}$ .
Example 1.5.5. If the coordinates x0 , . . . , xn of P ∈ $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ can be chosen in Q , we
can assume that they are integers and that x0 , . . . , xn have no common factors. If
we take such a representative for the coordinates of P , then the non-archimedean
places give no contribution to the height, and we obtain
\[ h(P) = \max_j \log |x_j|_\infty. \]
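The recipe of Example 1.5.5 translates directly into a few lines of Python. This is a minimal sketch for points with rational coordinates only; the function names `primitive_integer_vector` and `height_rational_point` are illustrative. The coordinates are first scaled to coprime integers, which is harmless by Lemma 1.5.3, and the height is then the logarithm of the largest absolute value.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, log


def primitive_integer_vector(coords):
    """Scale rational homogeneous coordinates to integers without common factor."""
    fracs = [Fraction(c) for c in coords]
    if all(f == 0 for f in fracs):
        raise ValueError("the zero vector does not represent a projective point")
    # Clear denominators with their least common multiple.
    common_den = reduce(lambda a, b: a * b // gcd(a, b),
                        (f.denominator for f in fracs), 1)
    ints = [int(f * common_den) for f in fracs]
    # Divide out the greatest common divisor of the resulting integers.
    g = reduce(gcd, ints)
    return [n // g for n in ints]


def height_rational_point(coords) -> float:
    """h(P) = log max_j |x_j| for a primitive integer representative of P."""
    return log(max(abs(n) for n in primitive_integer_vector(coords)))


# (1/2 : 3/4 : 5) = (2 : 3 : 20), so the height is log 20.
assert primitive_integer_vector([Fraction(1, 2), Fraction(3, 4), 5]) == [2, 3, 20]
# Rescaling the coordinates does not change the height.
assert abs(height_rational_point([2, 4, 6]) - height_rational_point([1, 2, 3])) < 1e-12
```

For coordinates in a general number field one would need the full sum over all places, which this sketch does not attempt.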

1.5.6. A similar notion holds for affine space. Let $\mathbb{A}^n_{\overline{\mathbb{Q}}}$ be the affine space of
dimension n over $\overline{\mathbb{Q}}$ , together with the usual embedding in $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ given by
\[ P = (x_1, \dots, x_n) \longmapsto (1 : x_1 : \cdots : x_n); \]
then we define h(P ) as the height of the image of P .
1.5.7. It should always be clear from the context whether we are dealing with
points in affine or projective space, and there should be no problem in using the
same notation h for heights in affine or projective space.
In performing local calculations, it proves to be convenient to introduce the function
\[ \log^+ t = \max(0, \log t) \]
on the positive real numbers, extended by log+ 0 = 0. Then it is immediate that
the height on affine space is given by
\[ h(x_1, \dots, x_n) = \sum_{v \in M_K} \max_j \log^+ |x_j|_v. \]
As a special case, the height of an algebraic number α is
\[ h(\alpha) = \sum_{v \in M_K} \log^+ |\alpha|_v \]
with K any number field with α ∈ K .
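For K = Q the sum over places can be evaluated term by term. The following Python sketch, with illustrative helper names `prime_factors`, `exponent` and `affine_height`, computes the height of a point of affine space with rational coordinates from the local contributions. A prime p contributes only if it divides some denominator, and in that case the local term is the largest power of p occurring in a denominator, times log p.

```python
from fractions import Fraction
from math import log


def prime_factors(n: int) -> set:
    """Set of primes dividing the positive integer n (trial division)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes


def exponent(n: int, p: int) -> int:
    """Largest e with p**e dividing the positive integer n."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e


def affine_height(point) -> float:
    """Sum over the places v of Q of max_j log+ |x_j|_v (point must be non-empty)."""
    xs = [Fraction(x) for x in point]
    # Archimedean place: max_j log+ |x_j| = log max(1, max_j |x_j|).
    total = log(float(max(1, *(abs(x) for x in xs))))
    # Non-archimedean places: only primes dividing a denominator contribute.
    primes = set().union(*(prime_factors(x.denominator) for x in xs))
    for p in primes:
        total += max(exponent(x.denominator, p) for x in xs) * log(p)
    return total


# h(a/b) = log max(|a|, |b|) for a fraction in lowest terms.
assert abs(affine_height([Fraction(-7, 12)]) - log(12)) < 1e-12
```

The result agrees with the height of (1 : x1 : · · · : xn) computed from a primitive integer representative, as the definition in 1.5.6 requires.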


1.5.8. Since any point in projective space admits a homogeneous representative
with one coordinate equal to 1, it is clear that the height so defined is never
negative. The next result, Kronecker’s theorem, characterizes the case of equality.
Theorem 1.5.9. The height of ζ ∈ $\overline{\mathbb{Q}}^\times$ is 0 if and only if ζ is a root of unity.
Proof: Let K be a number field and let ζ ∈ K × . If ζ is a root of unity, then its
absolute values are all equal to 1, and hence its height is 0.
Conversely, assume h(ζ) = 0. Then |ζ|v ≤ 1 for every v ∈ MK ; in particular, ζ
is an algebraic integer (for a formal argument, see the successive Remark 1.5.11).
Let d be the degree of ζ and let $\boldsymbol{\zeta} = (\zeta_1, \dots, \zeta_d)$ be a full set of conjugates of ζ .
Now consider, for every positive integer m , the elementary symmetric functions
$s_i(\boldsymbol{\zeta}^m)$, i = 0, . . . , d, of $\zeta_1^m, \dots, \zeta_d^m$ . Since ζ is an algebraic integer, we have
$s_i(\boldsymbol{\zeta}^m) \in \mathbb{Z}$ for every m .
Since |ζj |v = 1 for every j and v , and since $s_i(\boldsymbol{\zeta}^m)$ is the sum of $\binom{d}{i}$ terms each
of which is a product of factors not exceeding 1 in absolute value, it is now clear
that
\[ \sum_{i=0}^{d} |s_i(\boldsymbol{\zeta}^m)| \le \sum_{i=0}^{d} \binom{d}{i} = 2^d. \]
There are only finitely many possibilities for the vector of such symmetric
functions, and by Dirichlet’s pigeon-hole principle there are two integers m , n with
m > n and with the same vector of symmetric functions.
Obviously, this is the same as saying that $\boldsymbol{\zeta}^m = \pi(\boldsymbol{\zeta}^n)$ for some permutation π
of {1, . . . , d} and by iterating this relation we find that $\zeta_i^{m^k} = \zeta_{\pi^k(i)}^{n^k}$ . If we take
k such that $\pi^k$ is the identity, we conclude that $\zeta^{m^k - n^k} = 1$ with $m^k - n^k > 0$.

1.5.10. We recall here some basic facts about S -integers and S -units in a number
field K . Let S ⊂ MK be a finite set of places, which includes the set S∞ of all
archimedean places of K . An element x ∈ K is an S -integer if |x|v ≤ 1 for
v ∉ S . The S -integers of K form a subring OS,K of K . The units in OS,K are
called the S -units of K and form a group US,K . An element x ∈ OS,K is an
S -unit if and only if |x|v = 1 for all v ∉ S .
Remark 1.5.11. An easy application of the non-archimedean triangle inequality
and Gauss’s lemma (see Lemma 1.6.3) shows that an S∞ -integer is the same as
an algebraic integer in K . If S = S∞ , we simply talk about the integers and the
units of the number field K . The units of K are the algebraic integers x ∈ K
with norm NK/Q (x) = ±1, as we see from writing the norm as a product of
conjugates of x .
1.5.12. We consider the following homomorphism
\[ \phi : U_{S,K} \longrightarrow \mathbb{R}^{|S|}, \qquad x \longmapsto (\log |x|_v)_{v \in S} \]
of groups. By taking the logarithm of the product formula, we see that the image
of φ is contained in the hyperplane $\sum_{v \in S} y_v = 0$, y ∈ $\mathbb{R}^{|S|}$ . By Kronecker’s
theorem, the kernel of φ is the group µK of roots of unity in K . This is part of
Dirichlet’s unit theorem:
Theorem 1.5.13. Let S be as in 1.5.10. The image of φ is a lattice of maximal
rank |S| − 1 in the hyperplane $\sum_{v \in S} y_v = 0$. Hence $U_{S,K} \cong \mu_K \times \mathbb{Z}^{|S|-1}$ .

We will not prove this result here, and refer instead to W. Narkiewicz [215], Th.3.6,
or S. Lang [172], p.104.
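The simplest instance of these notions is K = Q with S consisting of ∞ and finitely many primes. The Python sketch below, with illustrative names `padic_order`, `is_s_unit` and `log_embedding`, tests whether a rational number is an S-unit and evaluates the map φ of 1.5.12. For an S-unit the entries of φ(x) add up to 0, as the product formula predicts; here the S-units are the numbers ±∏ p^(a_p) with p ranging over the primes in S, in agreement with the rank |S| − 1.

```python
from fractions import Fraction
from math import log


def padic_order(x: Fraction, p: int) -> int:
    """ord_p(x) for a non-zero rational number x."""
    num, den, order = abs(x.numerator), x.denominator, 0
    while num % p == 0:
        num //= p
        order += 1
    while den % p == 0:
        den //= p
        order -= 1
    return order


def is_s_unit(x: Fraction, primes) -> bool:
    """True if x != 0 and |x|_p = 1 for every prime p not in `primes`."""
    if x == 0:
        return False
    num, den = abs(x.numerator), x.denominator
    for p in primes:
        while num % p == 0:
            num //= p
        while den % p == 0:
            den //= p
    return num == 1 and den == 1


def log_embedding(x: Fraction, primes):
    """phi(x) = (log|x|_inf, (log|x|_p) for p in primes), with |p|_p = 1/p."""
    return [log(float(abs(x)))] + [-padic_order(x, p) * log(p) for p in primes]


x = Fraction(-9, 8)
assert is_s_unit(x, [2, 3])
# The image of an S-unit lies in the hyperplane where the coordinates sum to 0.
assert abs(sum(log_embedding(x, [2, 3]))) < 1e-12
```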
1.5.14. The Segre embedding
\[ \mathbb{P}^n_{\overline{\mathbb{Q}}} \times \mathbb{P}^m_{\overline{\mathbb{Q}}} \longrightarrow \mathbb{P}^{(n+1)(m+1)-1}_{\overline{\mathbb{Q}}} \]
is given by
\[ (x, y) \longmapsto x \otimes y := (x_i y_j), \]
where the pairs ij are, for example, ordered lexicographically (see A.6.4). An
easy calculation shows that
\[ h(x \otimes y) = h(x) + h(y), \]
using
\[ \max_{ij} |x_i y_j|_v = \max_i |x_i|_v \cdot \max_j |y_j|_v \]
for every v .
This notion extends in an obvious fashion to finite products of projective spaces
with an arbitrary number of factors.
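The additivity of the height under the Segre embedding can be checked numerically for points with integer coordinates. In the Python sketch below the names `height` and `segre` are illustrative; the height of an integer vector is computed as log(max |x_j| / gcd), which amounts to passing to a primitive representative as in Example 1.5.5.

```python
from functools import reduce
from math import gcd, log


def height(ints) -> float:
    """Height of the projective point with integer coordinates `ints` (not all 0)."""
    g = reduce(gcd, ints)
    return log(max(abs(n) for n in ints) / g)


def segre(x, y):
    """Coordinates (x_i * y_j) of the Segre image, in lexicographic order."""
    return [xi * yj for xi in x for yj in y]


x, y = [2, -3, 5], [7, 4]
assert abs(height(segre(x, y)) - (height(x) + height(y))) < 1e-12

# Non-primitive representatives give the same answer.
x, y = [4, 6], [10, 15]
assert abs(height(segre(x, y)) - (height(x) + height(y))) < 1e-12
```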
Proposition 1.5.15. If P_1, …, P_r are points of $\mathbb{A}^n_{\overline{\mathbb{Q}}}$, then
$$h(P_1 + \cdots + P_r) \le h(P_1) + \cdots + h(P_r) + \log r.$$
Proof: Let $x^{(k)}$ be coordinate vectors of P_k, k = 1, …, r, which we assume to
be in a suitable number field K. Then
$$h(P_1 + \cdots + P_r) = \sum_{v\in M_K} \max_j \log^+ |x_j^{(1)} + \cdots + x_j^{(r)}|_v .$$
If v is not archimedean, then
$$|x_j^{(1)} + \cdots + x_j^{(r)}|_v \le \max_k |x_j^{(k)}|_v .$$
If instead v is archimedean, we use the triangle inequality for the ordinary absolute
value getting
$$|x_j^{(1)} + \cdots + x_j^{(r)}|_v \le |r|_v \max_k |x_j^{(k)}|_v .$$
Then Lemma 1.3.7 implies
$$\sum_{v|\infty} \log |r|_v = \log r$$
leading to
$$h(P_1 + \cdots + P_r) \le \log r + \sum_{v\in M_K} \max_{j,k} \log^+ |x_j^{(k)}|_v .$$
The obvious fact
$$\max_{j,k} \log^+ |x_j^{(k)}|_v \le \sum_k \max_j \log^+ |x_j^{(k)}|_v$$
concludes the proof. □

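
For rational numbers the inequality of Proposition 1.5.15 can be verified directly. This is a small sketch under the assumption that the points lie in A^1(Q), where h(p/q) = log max(|p|, |q|) for a reduced fraction p/q; `h` and `bound_holds` are names chosen here for illustration.

```python
import math
from fractions import Fraction

def h(alpha):
    """Logarithmic height of a rational number written in lowest terms."""
    alpha = Fraction(alpha)
    return math.log(max(abs(alpha.numerator), alpha.denominator))

def bound_holds(points, tol=1e-12):
    """Check h(P_1 + ... + P_r) <= h(P_1) + ... + h(P_r) + log r."""
    total = sum(Fraction(p) for p in points)
    rhs = sum(h(p) for p in points) + math.log(len(points))
    return h(total) <= rhs + tol

if __name__ == "__main__":
    print(bound_holds([Fraction(1, 2), Fraction(1, 3)]))
    # Equality case of 1.5.16: all summands equal to the same root of unity.
    print(h(Fraction(3)), math.log(3))
```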


1.5.16. The following considerations show that the inequality in Proposition 1.5.15 cannot
be improved upon in general.
Let α1 , . . . , αr be algebraic numbers. By the preceding Proposition 1.5.15, we have
h(α1 + · · · + αr ) ≤ h(α1 ) + · · · + h(αr ) + log r.
Now suppose that equality occurs for some r ≥ 2 . Looking at the proof above, we must
have
$$\log^+ |\alpha_1 + \cdots + \alpha_r|_v = \log^+\bigl(|r|_v \max_k |\alpha_k|_v\bigr) = \log |r|_v + \log^+ \max_k |\alpha_k|_v$$
for any archimedean prime v. This is equivalent to the two conditions
$$\max_k |\alpha_k|_v \ge 1$$
and
$$|r|_v \max_k |\alpha_k|_v = |\alpha_1 + \cdots + \alpha_r|_v .$$
Hence α_1 = · · · = α_r and we conclude directly
$$h(\alpha_1 + \cdots + \alpha_r) = h(r\alpha_1) \le \log r + \sum_{v\in M_K} \log^+ |\alpha_1|_v = \log r + h(\alpha_1).$$

On the other hand, the equality assumption implies h(rα1 ) = log r + rh(α1 ) , hence
h(α1 ) = 0 because r ≥ 2 . Thus by Kronecker’s theorem in 1.5.9, α1 is a root of unity
and we get h(rα1 ) = h(r) = log r .
Another example yielding almost equality in Proposition 1.5.15 is obtained taking α_i =
l/(l + a_i) with 1 ≤ a_i ≤ N and distinct a_i and with l = N!t + 1, t any positive
integer. Then the numbers l, l + a_1, …, l + a_r are coprime in pairs and an easy calculation
shows that h(α_i) = log(l + a_i) and $h(\sum_i \alpha_i) = \sum_i \log(l + a_i) + \log(\sum_i \alpha_i)$. Hence
h(α_1 + · · · + α_r) > h(α_1) + · · · + h(α_r) + log r − ε for sufficiently large t.

The following result, quite useful in practice, expresses the fact that the height is
invariant by Galois conjugation.
Proposition 1.5.17. Let P be a point of affine or projective space with coordinates
(x_j) in $\overline{\mathbb{Q}}$. If σ ∈ Gal($\overline{\mathbb{Q}}$/Q) and if the point σ(P) is given by the coordinates
(σ(x_j)), then h(P) = h(σ(P)).

Proof: We choose a finite-dimensional Galois extension K of Q containing all
coordinates. Let | |_p be an element of M_Q and | | be an extension to an absolute
value of K. The composition of | | and σ is again an extension. Thus we have
an action of Gal(K/Q) on the absolute values of K extending | |_p. Therefore,
σ permutes the extensions and we have
$$\sum_{v|p} \max_j \log |x_j|_v = \sum_{v|p} \max_j \log |\sigma(x_j)|_v . \qquad\square$$


Lemma 1.5.18. If α ∈ K \ {0}, and λ ∈ Q, then h(α^λ) = |λ| · h(α). In
particular, h(1/α) = h(α).
Proof: If λ ≥ 0, the result is clear by definition of height. Thus we need only
consider λ = −1. For any absolute value | |_v of K, we have
$$\log |\alpha|_v = \log^+ |\alpha|_v - \log^+ |1/\alpha|_v .$$
If we sum over v, the left-hand side is 0 by the product formula and the right-hand
side equals h(α) − h(1/α). □
1.5.19. Let S ⊂ M_K be a finite set of places. For α ∈ K \ {0}, we have
$$\sum_{v\in S} \log |\alpha|_v \le h(\alpha).$$
If we use 1/α instead of α, then the preceding lemma shows that
$$\sum_{v\in S} \log |\alpha|_v \ge -h(\alpha).$$
This proves the so-called fundamental inequality
$$-h(\alpha) \le \sum_{v\in S} \log |\alpha|_v \le h(\alpha). \tag{1.8}$$


1.5.20. Now let L be a finite-dimensional field extension of K and consider a
finite set S of places w ∈ M_L. A classical problem of diophantine approximation
is that of approximating an element α ∈ L by elements β ∈ K, at all places
w ∈ S. Classically, this is done with absolute values normalized relative to the
field K rather than L, i.e. with ‖ ‖_w as in 1.3.6. In order to emphasize that this
normalization depends on v ∈ M_K, we shall use the notation ‖ ‖_{w,K}, hence for
x ∈ L we have
$$\|x\|_{w,K} = |N_{L_w/K_v}(x)|_v .$$
With this normalization relative to K, the fundamental inequality applied to α − β
gives
$$-h(\alpha - \beta) \le \sum_{w\in S} \log \|\alpha - \beta\|_{w,K}^{1/[L:K]} \le h(\alpha - \beta)$$
under the assumption α ≠ β. Now applying Proposition 1.5.15 we find
Liouville’s inequality:

Theorem 1.5.21. If α ∈ L and β ∈ K with α ≠ β, then
$$(2H(\alpha)H(\beta))^{-[L:K]} \le \prod_{w\in S} \|\alpha - \beta\|_{w,K} \le (2H(\alpha)H(\beta))^{[L:K]} .$$

The left-hand side inequality is a general formulation of the familiar Liouville
inequality in diophantine approximation.
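
To see what Theorem 1.5.21 says in the most familiar situation, the sketch below assumes K = Q, L = Q(√2), α = √2 and S consisting of one real place of L, so that ‖x‖_{w,K} is the ordinary absolute value. It also uses H(√2) = √2 (the minimal polynomial x² − 2 has Mahler measure 2) and H(p/q) = max(|p|, |q|). The lower bound then reads |√2 − p/q| ≥ (2√2 · max(|p|, |q|))^{−2}. All function names are illustrative.

```python
import math
from fractions import Fraction

ALPHA = math.sqrt(2.0)
H_ALPHA = math.sqrt(2.0)  # assumed value of H(sqrt 2), see the lead-in

def H_rational(beta):
    """Multiplicative height of a rational number in lowest terms."""
    beta = Fraction(beta)
    return max(abs(beta.numerator), beta.denominator)

def liouville_lower_bound(beta, degree=2):
    """(2 H(alpha) H(beta))**(-[L:K]) with [L:K] = 2."""
    return (2.0 * H_ALPHA * H_rational(beta)) ** (-degree)

def convergents(count):
    """First continued-fraction convergents p/q of sqrt 2: 1, 3/2, 7/5, ..."""
    p, q = 1, 1
    for _ in range(count):
        yield Fraction(p, q)
        p, q = p + 2 * q, p + q

if __name__ == "__main__":
    for beta in convergents(8):
        distance = abs(ALPHA - float(beta))
        print(beta, distance, liouville_lower_bound(beta))
```

Even the best rational approximations, the convergents, stay above the bound, with a gap that remains within a constant factor.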
1.5.22. Heights can be introduced in any field with a product formula. We indicate the
necessary changes. Let F be a field with a set MF of non-trivial inequivalent absolute
values, satisfying the product formula. This field F will play the role of Q in our previous
considerations.
Let K/F be a finite-dimensional field extension and consider all places w with w|v for
some v ∈ MF , together with corresponding absolute values | |w normalized as in 1.3.12.
Then the set MK of such absolute values satisfies the product formula, because of (1.4)
on page 9. As before, this yields a non-negative height on PnF , independent of the choice
of coordinates. On the other hand, Kronecker’s theorem does not hold in general, as the
following example shows.
Example 1.5.23. Let K be a field and let F = K(X) be the function field of an irre-
ducible projective variety X over K, which is regular in codimension 1 (see 1.4.9). Let
P ∈ P^n(F), hence P = (f_0 : · · · : f_n) for certain rational functions f_i on X. Then
$$h(P) = -\sum_Z \deg(Z) \min_j \operatorname{ord}_Z(f_j),$$
where Z ranges over all prime divisors and the degree is with respect to a fixed ample class.
In particular, the height of a rational function f ∈ K(X)^× is
$$h(f) = h((1 : f)) = -\sum_Z \deg(Z) \min(0, \operatorname{ord}_Z(f)).$$
Thus h(f) = 0 if and only if f has no poles. By h(f) = h(f^{−1}), this is equivalent to
div(f) = 0.
If X is normal, a function without poles is regular (R. Hartshorne [148], Proposition
I.6.3A), hence constant on the irreducible components of $X_{\overline{K}}$. We conclude that in this
case h(f) = 0 if and only if f is locally constant on X (use A.6.15).

1.6. Heights of polynomials

Definition 1.6.1. The height of a polynomial
$$f(t_1, \dots, t_n) = \sum_{j_1,\dots,j_n} a_{j_1 \dots j_n} t_1^{j_1} \cdots t_n^{j_n} = \sum_{j} a_j t^j$$
with coefficients in a number field K is the quantity
$$h(f) = \sum_{v\in M_K} \log |f|_v ,$$
where
$$|f|_v := \max_j |a_j|_v \tag{1.9}$$
is the Gauss norm for any place v.



Proposition 1.6.2. Let f (t1 , . . . , tn ) and g(s1 , . . . , sm ) be polynomials in differ-
ent sets of variables. Then
h(f g) = h(f ) + h(g).
Proof: Note that the height of a polynomial is equal to the height of the vector of
coefficients in appropriate projective space. Then the claim follows from 1.5.14.

We will need estimates for h(f g) in terms of h(f ) and h(g), without assuming
different sets of variables for f and g . For finite places we have Gauss’s lemma.
Lemma 1.6.3. If v is not archimedean, then |f g|v = |f |v |g|v .
Proof: The inequality |f g|v ≤ |f |v |g|v is immediate because v is not archimedean.
Let us assume first that f(t) and g(t) are polynomials in one variable t. We denote by
$$c_j = \sum_{j=k+l} a_k b_l$$
the coefficient of t^j in f(t)g(t). Without loss of generality, we can assume that |f|_v = 1, |g|_v = 1.
Suppose |fg|_v < 1. Let j be the smallest index with |a_j|_v = 1. Since |c_j|_v < 1
and |a_k|_v < 1 for k < j, we get |b_0|_v < 1. Now we apply the above formula
for the coefficient c_{j+l} and conclude |b_l|_v < 1 by induction. This contradiction
proves the lemma in the one-variable case. For several variables, let d be an integer
larger than the degree of fg. The Kronecker substitution
$$x_j = t^{d^{\,j-1}} \qquad (j = 1, \dots, n)$$
reduces the problem to the one-variable case. □
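
Gauss’s lemma can be observed concretely for integer polynomials and a p-adic absolute value. In the sketch below (an illustration with hypothetical helper names) the Gauss norm |f|_p equals p^{−m}, where m is the smallest p-adic valuation of a non-zero coefficient, so the lemma says that these exponents add under multiplication.

```python
def vp(n, p):
    """p-adic valuation of a non-zero integer n."""
    n = abs(n)
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

def gauss_exponent(coeffs, p):
    """Exponent m with |f|_p = p**(-m): minimal valuation of the coefficients."""
    return min(vp(c, p) for c in coeffs if c != 0)

def multiply(f, g):
    """Product of two polynomials given as coefficient lists (low degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

if __name__ == "__main__":
    f = [6, 3, 12]   # 6 + 3t + 12t^2
    g = [9, 0, 3]    # 9 + 3t^2
    for p in (2, 3):
        print(p, gauss_exponent(multiply(f, g), p),
              gauss_exponent(f, p) + gauss_exponent(g, p))
```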
1.6.4. Gauss’s lemma applies to every non-archimedean absolute value of a field.
The archimedean case is more complicated and will be handled below.
If f (t1 , . . . , tn ) is a polynomial with complex coefficients, we define |f |∞ as in
(1.9), namely the maximum of the euclidean absolute value | | of the coefficients
of f .
Another very useful quantity in studying polynomials is the Mahler measure
$$M(f) := \exp\Bigl( \int_{\mathbb{T}^n} \log |f(e^{i\theta_1}, \dots, e^{i\theta_n})| \, d\mu_1 \cdots d\mu_n \Bigr),$$
where we have abbreviated T for the unit circle {e^{iθ} | 0 ≤ θ < 2π} equipped with
the standard measure dµ = (1/2π)dθ. Its main advantage is the multiplicativity
property
$$M(fg) = M(f)M(g).$$
Let f(t) = a_d t^d + · · · + a_0 be a polynomial with complex coefficients and factorization
$$f(t) = a_d (t - \alpha_1) \cdots (t - \alpha_d).$$
Now we note that the mean value of log |t − α| on the unit circle is log^+ |α|. In
fact, for |α| > 1 the function log |t − α| is harmonic in the unit disk, therefore its
mean value on the unit circle is its value at the centre, namely log |α| = log^+ |α|.
If instead |α| < 1, the function $\log |1 - \overline{\alpha} t|$ is harmonic in the unit disk and
coincides with log |t − α| on the unit circle, while its value at the centre is 0, that
is log^+ |α|. Finally the case |α| = 1 is deduced by continuity.

We have shown that log M(t − α) = log^+ |α|. If we combine this with the
multiplicativity property of the Mahler measure, we obtain Jensen’s formula:

Proposition 1.6.5.
$$\log M(f) = \log |a_d| + \sum_{j=1}^{d} \log^+ |\alpha_j| .$$
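
Jensen’s formula can be compared with the defining integral numerically. The following sketch assumes a one-variable polynomial without roots on the unit circle, so that an equally spaced Riemann sum over T converges quickly; the sample polynomial 2(t − 3)(t − 1/2) = 2t² − 7t + 3 and the function names are chosen only for illustration.

```python
import cmath
import math

def log_mahler_numeric(coeffs, samples=4096):
    """Mean of log|f| over the unit circle; coeffs[k] is the coefficient of t**k."""
    total = 0.0
    for m in range(samples):
        z = cmath.exp(2j * math.pi * m / samples)
        value = sum(c * z ** k for k, c in enumerate(coeffs))
        total += math.log(abs(value))
    return total / samples

def log_mahler_jensen(lead, roots):
    """Right-hand side of Jensen's formula: log|a_d| + sum of log^+|alpha_j|."""
    return math.log(abs(lead)) + sum(math.log(max(1.0, abs(a))) for a in roots)

if __name__ == "__main__":
    print(log_mahler_numeric([3, -7, 2]))
    print(log_mahler_jensen(2, [3, 0.5]))
```

Both values should agree with log 6, since only the root 3 lies outside the unit disk.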

The following result shows the connexion between the Mahler measure and the
height and gives a bound for the absolute norm of an algebraic number.

Proposition 1.6.6. Let α ∈ $\overline{\mathbb{Q}}$ and let f be the minimal polynomial of α over Z.
Then
$$\log M(f) = \deg(\alpha)\, h(\alpha).$$
In particular
$$\log |N_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\alpha)| \le \deg(\alpha)\, h(\alpha).$$
Proof: Let d = deg(α) and write
$$f(t) = a_d t^d + \cdots + a_0 .$$
We choose a number field K which contains α and is a Galois extension over
Q, with Galois group G. Then the list (σα)_{σ∈G} contains every conjugate of α
exactly [K : Q]/d times. Gauss’s lemma gives
$$|a_d|_v \prod_{\sigma\in G} \max(1, |\sigma\alpha|_v)^{d/[K:\mathbb{Q}]} = 1 \tag{1.10}$$
for any non-archimedean v ∈ M_K.

We have
$$[K:\mathbb{Q}]\, h(\alpha) = \sum_{v\in M_K} \sum_{\sigma\in G} \log^+ |\sigma\alpha|_v \qquad \text{(by Proposition 1.5.17)}$$
$$= \sum_{v|\infty} \sum_{\sigma\in G} \log^+ |\sigma\alpha|_v - \frac{[K:\mathbb{Q}]}{d} \sum_{v\nmid\infty} \log |a_d|_v \qquad \text{(by (1.10))}$$
$$= \frac{[K:\mathbb{Q}]}{d} \sum_{v|\infty} \Bigl( \log |a_d|_v + \sum_{j=1}^{d} \log^+ |\alpha_j|_v \Bigr),$$
where in the last step we have used the product formula and collected the elements
σα into the conjugates α_j, j = 1, …, d, of α. By Jensen’s formula, this proves
the first claim.
By the second formula of Lemma 1.3.7, we also have
$$\log |N_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\alpha)| \le \sum_{v|\infty} \sum_{j=1}^{d} \log^+ |\alpha_j|_v$$
and the second claim follows from the preceding computation. □



The following lemma is useful in estimates. Let f(t) = a_d t^d + · · · + a_0 be a
polynomial of degree d with complex coefficients, and for 1 ≤ p < ∞ denote by
ℓ_p(f) the norm
$$\ell_p(f) := \Bigl( \sum_{j=0}^{d} |a_j|^p \Bigr)^{1/p} .$$
For p = ∞, we set ℓ_∞(f) = max |a_j| = |f|_∞.
Lemma 1.6.7. If f(t) is as above, then M(f) ≤ ℓ_1(f). Moreover
$$\binom{d}{\lfloor d/2 \rfloor}^{-1} \ell_\infty(f) \le M(f) \le \ell_2(f) \le (d+1)^{1/2}\, \ell_\infty(f).$$
Proof: The first inequality is obvious from the definition of M(f) and the pointwise
bound |f(e^{iθ})| ≤ ℓ_1(f) on T.
Next, by convexity, we get
$$M(f) \le \Bigl( \int_{\mathbb{T}} |f(e^{i\theta})|^2 \, d\mu \Bigr)^{1/2} .$$
By Parseval’s formula, the right-hand side equals
$$\ell_2(f) = \Bigl( \sum_{j=0}^{d} |a_j|^2 \Bigr)^{1/2} \le (d+1)^{1/2}\, \ell_\infty(f).$$

Finally, we remark that
$$\Bigl| \frac{a_{d-r}}{a_d} \Bigr| = \Bigl| \sum_{j_1 < \cdots < j_r} \alpha_{j_1} \cdots \alpha_{j_r} \Bigr| ,$$
hence
$$|a_{d-r}| \le \binom{d}{r} |a_d| \prod_{j=1}^{d} \max(1, |\alpha_j|).$$
By Jensen’s formula, we conclude that
$$|a_{d-r}| \le \binom{d}{r} M(f). \qquad\square$$
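
The chain of inequalities in Lemma 1.6.7 can be tested on polynomials whose roots are known in advance. In this sketch the polynomial is built from a leading coefficient and a list of roots, and M(f) is computed from Jensen’s formula; the helper names are illustrative and not taken from the text.

```python
import math

def from_roots(lead, roots):
    """Coefficients (low degree first) of lead * prod (t - r)."""
    coeffs = [complex(lead)]
    for r in roots:
        new = [0j] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k + 1] += c      # contribution of t * c * t^k
            new[k] -= r * c      # contribution of -r * c * t^k
        coeffs = new
    return coeffs

def mahler(lead, roots):
    """M(f) = |a_d| * prod max(1, |alpha_j|), by Jensen's formula."""
    return abs(lead) * math.prod(max(1.0, abs(r)) for r in roots)

def check_lemma(lead, roots, tol=1e-9):
    """Verify M <= l1 and l_inf / binom(d, d//2) <= M <= l2 <= sqrt(d+1) * l_inf."""
    coeffs = from_roots(lead, roots)
    d = len(coeffs) - 1
    l1 = sum(abs(c) for c in coeffs)
    l2 = math.sqrt(sum(abs(c) ** 2 for c in coeffs))
    linf = max(abs(c) for c in coeffs)
    m = mahler(lead, roots)
    return (m <= l1 + tol
            and linf / math.comb(d, d // 2) <= m + tol
            and m <= l2 + tol
            and l2 <= math.sqrt(d + 1) * linf + tol)

if __name__ == "__main__":
    print(from_roots(2, [3, 0.5]))   # 2t^2 - 7t + 3
    print(check_lemma(2, [3, 0.5]))
```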
The following consequence, Northcott’s theorem, is very important.
Theorem 1.6.8. There are only finitely many algebraic numbers of bounded
degree and bounded height.
Proof: Let α be algebraic of degree d and height h(α) ≤ log H. Let f(t) =
a_d t^d + · · · + a_0 be the minimal polynomial of α over Z. By Proposition 1.6.6,
we have M(f) ≤ H^d. Also, Lemma 1.6.7 shows that max |a_i| ≤ 2^d M(f).
Therefore, the coefficients of f are bounded by (2H)^d. Since there are d + 1
integer coefficients for each f, they give rise to not more than $(2\lfloor (2H)^d \rfloor + 1)^{d+1}$
distinct polynomials f. Since each f has d roots, the number of algebraic numbers
of degree d and height at most log H is at most
$$d\,\bigl(2\lfloor (2H)^d \rfloor + 1\bigr)^{d+1} \le (5H)^{d^2+d} . \qquad\square$$
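
In degree d = 1 the finiteness statement and the bound from the proof can be made completely explicit. The sketch below, with a hypothetical function name, lists all rational numbers of multiplicative height at most H and compares their number with (5H)^{d²+d} = (5H)² .

```python
from fractions import Fraction

def rationals_of_bounded_height(H):
    """All rationals p/q with max(|p|, q) <= H after reduction to lowest terms."""
    found = set()
    for q in range(1, H + 1):
        for p in range(-H, H + 1):
            # Fraction reduces p/q; reduction can only lower max(|p|, q).
            found.add(Fraction(p, q))
    return found

if __name__ == "__main__":
    for H in (1, 2, 5, 10):
        count = len(rationals_of_bounded_height(H))
        print(H, count, (5 * H) ** 2)
```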

For later use, we prove here a result of K. Mahler [188], which gives a bound for
the discriminant in terms of the Mahler measure.
Proposition 1.6.9. Let f(x) = a_d x^d + · · · + a_0 be a polynomial with real or
complex coefficients, with roots α_1, …, α_d. Let
$$D = a_d^{2d-2} \prod_{i>j} (\alpha_i - \alpha_j)^2$$
be its discriminant. Then
$$|D| \le d^d M(f)^{2d-2} .$$
In particular, if f(x) is the minimal polynomial over Z of an algebraic number ξ
of degree d, it holds
$$\frac{1}{d} \log |D| \le \log d + (2d - 2)\, h(\xi).$$
Proof: We write D as the product of $a_d^{2d-2}$ and the square of a Vandermonde
determinant (see B.1.10 and Remark B.1.5) and estimate the determinant using
Hadamard’s inequality,∗ obtaining
$$|D| = |a_d|^{2d-2} \left| \det \begin{pmatrix} 1 & \alpha_1 & \dots & \alpha_1^{d-1} \\ 1 & \alpha_2 & \dots & \alpha_2^{d-1} \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_d & \dots & \alpha_d^{d-1} \end{pmatrix} \right|^2 \le |a_d|^{2d-2} \prod_{i=1}^{d} \sum_{j=0}^{d-1} |\alpha_i|^{2j} .$$
The right-hand side of this inequality does not exceed
$$|a_d|^{2d-2}\, d^d \prod_{i=1}^{d} \max(1, |\alpha_i|)^{2d-2}$$
and the first statement follows from Jensen’s formula, Proposition 1.6.5.
The second statement is also clear from Proposition 1.6.6. □
Lemma 1.6.10. Let f(t_1, …, t_n) be a polynomial with complex coefficients and
partial degrees d_1, …, d_n. Then
$$\prod_{j=1}^{n} (d_j + 1)^{-1/2}\, M(f) \le \ell_\infty(f) \le \prod_{j=1}^{n} \binom{d_j}{\lfloor d_j/2 \rfloor} M(f).$$
Proof: The same proof as in Lemma 1.6.7 holds for the inequality on the left. We
prove the other assertion by induction on n. We can write uniquely
$$f(t_1, \dots, t_n) = \sum_{j=0}^{d_n} f_j(t_1, \dots, t_{n-1})\, t_n^j$$
for certain polynomials f_j(t_1, …, t_{n−1}). By definition, it holds
$$\log M(f) = \int_{\mathbb{T}^{n-1}} \log M\bigl( f(e^{i\theta_1}, \dots, e^{i\theta_{n-1}}, t) \bigr) \, d\mu_1 \cdots d\mu_{n-1}$$
and this is not smaller than
$$\int_{\mathbb{T}^{n-1}} \log \max_j |f_j(e^{i\theta_1}, \dots, e^{i\theta_{n-1}})| \, d\mu_1 \cdots d\mu_{n-1} - \log \binom{d_n}{\lfloor d_n/2 \rfloor}$$
by Lemma 1.6.7, which in turn is not smaller than
$$\max_j \int_{\mathbb{T}^{n-1}} \log |f_j(e^{i\theta_1}, \dots, e^{i\theta_{n-1}})| \, d\mu_1 \cdots d\mu_{n-1} - \log \binom{d_n}{\lfloor d_n/2 \rfloor} .$$
We conclude that
$$M(f) \ge \binom{d_n}{\lfloor d_n/2 \rfloor}^{-1} \max_j M(f_j)$$
and the induction hypothesis implies the claim. □

∗ Hadamard’s inequality states that a determinant of a real matrix is majorized by the product of the
euclidean lengths of its rows. Geometrically, it says that the volume of a parallelepiped generated
by real vectors v_1, …, v_n of given length is maximal when the vectors v_i are pairwise orthogonal,
which is quite easy to prove. The result also holds for complex matrices. Hadamard’s proof of 1893 can
be found, among an interesting analysis of extremal cases with entries ±1 (the so-called Hadamard’s
matrices), in [143]. The result was known much earlier to Lord Kelvin and was proved by T. Muir in
1885, see [209], p.32.
As remarked above, Lemma 1.6.7 leads to Gelfond’s lemma:
Lemma 1.6.11. Let f_1, …, f_m be complex polynomials in n variables and set
f := f_1 · · · f_m. Then
$$2^{-d} \prod_{j=1}^{m} \ell_\infty(f_j) \le \ell_\infty(f) \le 2^{d} \prod_{j=1}^{m} \ell_\infty(f_j),$$
where d is the sum of the partial degrees of f.

Proof: Let $(d_1^{(j)}, \dots, d_n^{(j)})$ be the partial degrees of f_j. By carrying out the
multiplication, we see that
$$\ell_\infty(f) \le C \prod_{j=1}^{m} \ell_\infty(f_j)$$
with
$$C = \prod_{j=1}^{m-1} \prod_{k=1}^{n} \bigl( 1 + d_k^{(j)} \bigr) \le 2^d .$$
In the other direction, Lemma 1.6.10 implies
$$\prod_{j=1}^{m} \ell_\infty(f_j) \le \Biggl( \prod_{j=1}^{m} \prod_{k=1}^{n} \binom{d_k^{(j)}}{\lfloor d_k^{(j)}/2 \rfloor} \Biggr) \Biggl( \prod_{k=1}^{n} \Bigl( 1 + \sum_{j=1}^{m} d_k^{(j)} \Bigr)^{1/2} \Biggr) \ell_\infty(f).$$
The next lemma completes the proof.

Lemma 1.6.12. Let a ≤ A, b ≤ B and d be natural numbers. Then
$$\binom{A}{a} \binom{B}{b} \le \binom{A+B}{a+b} \qquad \text{and} \qquad \binom{d}{\lfloor d/2 \rfloor} (d+1)^{1/2} \le 2^d .$$
Proof: The first statement is a trivial consequence of the identity
$$(1 + t)^A (1 + t)^B = (1 + t)^{A+B} .$$
For the second claim (which also follows from a straightforward application of
Stirling’s formula), we proceed by induction. The inequality is obviously satisfied
for d = 0 and d = 1. Set $C_d := \binom{d}{\lfloor d/2 \rfloor} (d+1)^{1/2}$ and let m ∈ N; then
$$C_{2m+1}/C_{2m} = 2 \Bigl( 1 - \frac{1}{2m+2} \Bigr)^{1/2} < 2$$
and
$$C_{2m+2}/C_{2m} = 4 \Bigl( 1 - \frac{1}{(2m+2)^2} \Bigr)^{1/2} < 4.$$
The induction hypothesis implies the second statement. □
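
Gelfond’s lemma and the second inequality of Lemma 1.6.12 can both be checked with exact integer arithmetic. The sketch below restricts to one-variable polynomials with integer coefficients, where the sum of the partial degrees of f is simply its degree; the function names are chosen here for illustration.

```python
import math

def multiply(f, g):
    """Product of two coefficient lists (low degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def linf(f):
    """Maximum absolute value of the coefficients."""
    return max(abs(c) for c in f)

def gelfond_holds(factors):
    """Check 2^-d * prod linf(f_j) <= linf(f) <= 2^d * prod linf(f_j)."""
    product = [1]
    for f in factors:
        product = multiply(product, f)
    d = len(product) - 1
    norms = math.prod(linf(f) for f in factors)
    return norms <= 2 ** d * linf(product) and linf(product) <= 2 ** d * norms

def central_binomial_bound(d):
    """Squared form of binom(d, d//2) * sqrt(d + 1) <= 2^d."""
    return math.comb(d, d // 2) ** 2 * (d + 1) <= 4 ** d

if __name__ == "__main__":
    f1 = [1, 3, 3, 1]      # (1 + t)^3
    f2 = [1, -3, 3, -1]    # (1 - t)^3
    print(multiply(f1, f2), gelfond_holds([f1, f2]))
    print(all(central_binomial_bound(d) for d in range(200)))
```

The example (1 + t)³(1 − t)³ = (1 − t²)³ shows how cancellation can make the norm of a product much smaller than the product of the norms, while staying inside the factor 2^d.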


Gelfond’s and Gauss’s lemma together with Lemma 1.3.7 give us
Theorem 1.6.13. Let f_1, …, f_m be polynomials in n variables with coefficients
in $\overline{\mathbb{Q}}$ and let d be the sum of the partial degrees of f := f_1 · · · f_m. Then
$$-d \log 2 + \sum_{j=1}^{m} h(f_j) \le h(f) \le d \log 2 + \sum_{j=1}^{m} h(f_j).$$

Remark 1.6.14. For the upper bound, only the sum d of the partial degrees of
f_1 · · · f_{m−1} does matter. In fact, the proof of Gelfond’s lemma shows
$$|f|_v \le \Biggl( \prod_{j=1}^{m-1} \prod_{k=1}^{n} \bigl| 1 + d_k^{(j)} \bigr|_v \Biggr) \prod_{j=1}^{m} |f_j|_v \le |2|_v^{d} \prod_{j=1}^{m} |f_j|_v$$
for any archimedean place v of a number field containing all the coefficients. Then
$$h(f) \le \sum_{j=1}^{m} h(f_j) + \sum_{j=1}^{m-1} \sum_{k=1}^{n} \log\bigl( 1 + d_k^{(j)} \bigr) \le \sum_{j=1}^{m} h(f_j) + d \log 2,$$
which is often important for applications.


1.6.15. We conclude this section by mentioning an interesting question raised by D.H.
Lehmer ([180], p.476), known today as the Lehmer conjecture (in Lehmer’s paper, this
was addressed as a problem rather than a conjecture). If α ≠ 0 is algebraic with minimal
polynomial f, the Mahler measure of α is M(α) := M(f). By Proposition 1.6.6, we
have M(α) = H(α)^{deg(α)}. Now the question raised by Lehmer is whether there is an
absolute constant c such that M(α) ≥ c > 1 for α ∈ $\overline{\mathbb{Q}}^{\times}$ not a root of unity. Alternatively,
h(α) ≥ c′/d for some absolute constant c′ > 0, where d = deg(α).
The last inequality in the proof of Lemma 1.6.7 with r = 0 or d shows that h(α) ≥
(log 2)/d unless α is a unit. The same argument also shows that
$$h(\alpha) \ge -\log 2 + \frac{1}{d} \log \ell_\infty(f)$$
for all α. In particular, h(α) ≥ (log 2)/d if ℓ_∞(f) ≥ 2^{d+1}. Since there are only
finitely many polynomials f of degree d with integer coefficients and with ℓ_∞(f) < 2^{d+1}, by
Kronecker’s theorem 1.5.9 we deduce that there is c(d) > 0 such that any α of degree d,
not a root of unity, satisfies h(α) ≥ c(d). Thus in studying Lehmer’s problem we may
assume that d is arbitrarily large.
The algebraic number α with minimal polynomial
$$x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1$$
has M(α) = 1.17628081825991 . . . and is conjectured to yield the infimum of the Mahler
measure of an algebraic number.†

If f is not reciprocal (a reciprocal polynomial f(z) satisfies f(z) = ±z^{deg(f)} f(1/z)), a
nice theorem by C.J. Smyth [287] states that the minimum of M(α) occurs for the cubic
number with minimal equation x³ − x − 1. This non-reciprocal number α is about α =
1.32471795724474 . . . . In the general case, for large d we have E. Dobrowolski’s theorem
[90]
$$M(\alpha) \ge 1 + c \Bigl( \frac{\log\log d}{\log d} \Bigr)^{3},$$
following from Theorem 4.4.1.

† This polynomial already appears in Lehmer’s paper loc. cit., with a slightly different numerical
value which we have corrected here.
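
The two numerical values quoted in 1.6.15 can be reproduced with a few lines of code. The sketch relies on an assumption that is standard but not proved in the text: each of the two polynomials has exactly one root outside the closed unit disk, and that root is real and larger than 1. By Jensen’s formula the Mahler measure is then this root, which bisection on the interval [1, 2] locates. The names below are illustrative.

```python
LEHMER = [1, 1, 0, -1, -1, -1, -1, -1, 0, 1, 1]   # x^10 + x^9 - x^7 - ... + x + 1
SMYTH = [1, 0, -1, -1]                            # x^3 - x - 1

def evaluate(coeffs, x):
    """Horner evaluation; coefficients are listed from the highest degree down."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value

def real_root_above_one(coeffs, lo=1.0, hi=2.0, steps=200):
    """Bisection for the root in (lo, hi), assuming a sign change on the interval."""
    assert evaluate(coeffs, lo) < 0 < evaluate(coeffs, hi)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if evaluate(coeffs, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(real_root_above_one(LEHMER))
    print(real_root_above_one(SMYTH))
```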

1.7. Lower bounds for norms of products of polynomials

We elaborate here further on the question of lower bounds for norms of products of poly-
nomials. The interesting question is to obtain lower bounds which are proportional to the
product of the norms, as in Gelfond’s lemma of the preceding section. It turns out that for
certain natural norms the constants involved in such lower bounds depend only on the de-
grees of the polynomials, not on the number of variables. This section will not be needed at
other places of the book.
1.7.1. Let us denote by ℓ_p(f) the ℓ_p-norm of the coefficients of a complex polynomial f.
We shall prove that for p = 1 and p = 2 the ℓ_p-norm has the properties mentioned above.
This result extends to all p, 1 ≤ p < ∞, but we will not prove this extension here. The
more difficult case p = 1 is due to Enflo, who used it in his work on invariant subspaces of
bounded operators in Banach spaces.
Theorem 1.7.2. Let d, e ∈ N. Then there is a constant C(d, e) > 0 such that
$$\ell_1(fg) \ge C(d, e)\, \ell_1(f)\, \ell_1(g)$$
for complex polynomials f, g of degree d, e in several variables.
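Proof: (H.L. Montgomery) For k ∈ N, we define
$$C(d, e, k) := \inf \frac{\ell_1(P^k Q)}{\ell_1(P)^k\, \ell_1(Q)},$$
where the infimum ranges over all homogeneous polynomials P, Q of degree d, e.
We shall use in the sequel Euler’s formula
$$\sum_j t_j \frac{\partial f}{\partial t_j} = d f$$
for a homogeneous polynomial f ∈ C[t_1, …, t_n] of degree d, and the formula
$$\sum_j \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr) = \sum_j \ell_1\Bigl( t_j \frac{\partial f}{\partial t_j} \Bigr) = d\, \ell_1(f).$$
Both are proved directly by looking at each monomial in f.
Lemma 1.7.3. The following two estimates hold:
$$C(d, 0, k+1) \ge C(d-1, dk, 1)\, C(d, 0, k) \qquad \text{if } d \ge 1,$$
$$C(d, e, k) \ge \frac{e}{2kd + e}\, C(d, e-1, k+1) \qquad \text{if } e \ge 1.$$

Proof: Let f be homogeneous of degree d. We compute
$$\ell_1\Bigl( \frac{\partial}{\partial t_j} f^{k+1} \Bigr) = (k+1)\, \ell_1\Bigl( \frac{\partial f}{\partial t_j} f^k \Bigr)$$
$$\ge (k+1)\, C(d-1, dk, 1)\, \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr) \ell_1(f^k)$$
$$\ge (k+1)\, C(d-1, dk, 1)\, C(d, 0, k)\, \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr) \ell_1(f)^k .$$
Summing over j, we find
$$(k+1)\, d\, \ell_1(f^{k+1}) \ge (k+1)\, C(d-1, dk, 1)\, C(d, 0, k)\, d\, \ell_1(f)^{k+1}$$
proving the first statement.

In a similar fashion, for f and g homogeneous of degrees d and e, we also have
$$C(d, e-1, k+1)\, \ell_1(f)^{k+1}\, \ell_1\Bigl( \frac{\partial g}{\partial t_j} \Bigr) \le \ell_1\Bigl( f^{k+1} \frac{\partial g}{\partial t_j} \Bigr)$$
$$= \ell_1\Bigl( f \frac{\partial}{\partial t_j}(f^k g) - k f^k g \frac{\partial f}{\partial t_j} \Bigr)$$
$$\le \ell_1(f)\, \ell_1\Bigl( \frac{\partial}{\partial t_j}(f^k g) \Bigr) + k\, \ell_1(f^k g)\, \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr).$$
Summing over j, we obtain
$$C(d, e-1, k+1)\, e\, \ell_1(f)^{k+1}\, \ell_1(g) \le (dk + e)\, \ell_1(f)\, \ell_1(f^k g) + kd\, \ell_1(f^k g)\, \ell_1(f)$$
$$= (2kd + e)\, \ell_1(f)\, \ell_1(f^k g).$$
After cancelling a factor ℓ_1(f), we get the claim. □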
The proof of Enflo’s theorem is now easy. There is no loss of generality in assuming that
f and g are homogeneous polynomials. We order the triples (d, e, k) lexicographically.
Proceeding by induction, we prove that C(d, e, k) > 0 . If d = 0 or k = 0 , then
C(d, e, k) = 1 . So let us assume that d > 0, k > 0 . By Lemma 1.7.3, it is enough
to show the claim for a smaller triple, and we are done. This gives Theorem 1.7.2, with
C(d, e) := C(d, e, 1) > 0 . 

1.7.4. The double induction in the proof of the theorem is very expensive for the final
estimates. Let us compute some of the constants so obtained. We define recursively Γ(d, e, k)
as follows
$$\Gamma(d, e, k) = \begin{cases} 1 & \text{if } d = 0 \text{ or } k = 0 \\ \Gamma(d-1, d(k-1), 1)\, \Gamma(d, 0, k-1) & \text{if } e = 0 \text{ and } dk \ne 0 \\ \dfrac{e}{2kd+e}\, \Gamma(d, e-1, k+1) & \text{if } dek \ne 0 \end{cases}$$
and hence Γ(d, e, k) ≤ C(d, e, k). For example

Γ(d, 0, 1) = 1                                   C(d, 0, 1) = 1
Γ(1, 1, 1) = 1/3                                 C(1, 1, 1) = 1/2
Γ(2, 2, 1) = 1/34020
Γ(3, 3, 1) = 1/(3.840584... × 10^95)
Γ(4, 4, 1) = 1/(2.089942... × 10^13529)
Γ(5, 5, 1) = 1/(6.562189... × 10^19906418)

and the computer took too much time for Γ(6, 6, 1).
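
The recursion for Γ(d, e, k) translates directly into code. The following is a minimal sketch using exact rational arithmetic and memoization; the function name `gamma` is chosen here and the three branches follow the three cases of the definition in 1.7.4.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def gamma(d, e, k):
    """Lower bound Gamma(d, e, k) for C(d, e, k), following the recursion of 1.7.4."""
    if d == 0 or k == 0:
        return Fraction(1)
    if e == 0:
        # First estimate of Lemma 1.7.3, with k replaced by k - 1.
        return gamma(d - 1, d * (k - 1), 1) * gamma(d, 0, k - 1)
    # Second estimate of Lemma 1.7.3.
    return Fraction(e, 2 * k * d + e) * gamma(d, e - 1, k + 1)

if __name__ == "__main__":
    print(gamma(1, 1, 1))   # 1/3
    print(gamma(2, 2, 1))   # 1/34020
```

The first two table entries are small enough to confirm by hand; the later entries grow so quickly that exact evaluation becomes expensive, as the text remarks.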
1.7.5. The solution for the case p = 2 uses the concept of hypercube representation
of a polynomial. The usual way of writing a homogeneous polynomial of degree d in n
variables is to represent it in the form
$$f(t_1, \dots, t_n) = \sum_{i_1 + \cdots + i_n = d} a_{i_1 \dots i_n} t_1^{i_1} \cdots t_n^{i_n} .$$
The sum here runs over the lattice points in the hyperplane i_1 + · · · + i_n = d of the
n-dimensional cube 0 ≤ i_ν ≤ d, ν = 1, …, n. Note that the number of lattice points in this
cube is (d + 1)^n, growing exponentially in n for fixed d.
There is another way of writing the same polynomial, namely
$$f(t_1, \dots, t_n) = \frac{1}{d!} \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} \frac{\partial^d f}{\partial t_{i_1} \cdots \partial t_{i_d}}\, t_{i_1} \cdots t_{i_d} ;$$
we define this as the hypercube representation of f, since now the sum is indexed by the
lattice points of the d-dimensional cube 1 ≤ i_δ ≤ n, δ = 1, …, d. The number of lattice
points in this cube is n^d, which grows polynomially in n for fixed d.
The hypercube representation of a polynomial is very convenient if we want to study
polynomials of low degree in a large number of variables.
1.7.6. Define for p ≥ 1
$$[f]_p := \frac{1}{d!} \Biggl( \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} \Bigl| \frac{\partial^d f}{\partial t_{i_1} \cdots \partial t_{i_d}} \Bigr|^p \Biggr)^{1/p} .$$
If we compare this norm with the ℓ_p-norm of the coefficients, simple combinatorics lead to
$$\Bigl( \frac{1}{d!} \Bigr)^{1 - \frac{1}{p}} \ell_p(f) \le [f]_p \le \ell_p(f). \tag{1.11}$$
1.7.7. Let d, e ∈ N. A shuffle of type (d, e) is a pair (K, L), where K and L are disjoint
subsets of {1, …, d + e} of cardinality d and e. The set of shuffles of type (d, e) will
be denoted by sh(d, e). Its cardinality is equal to $\binom{d+e}{d}$. For x = (x_1, …, x_{d+e}) ∈
[0, 1]^{d+e}, we define $x_K := (x_{k_1}, \dots, x_{k_d})$, where {k_1, …, k_d} = K and k_1 < · · · < k_d.

Let k_p(d, e) be the largest constant such that
$$[fg]_p \ge k_p(d, e)\, [f]_p\, [g]_p$$
holds for all homogeneous polynomials f, g of degree d, e. Moreover, we define c_p(d, e)
as the largest constant for which
$$\Bigl\| \sum_{(K,L)\in sh(d,e)} F(x_K)\, G(x_L) \Bigr\|_p \ge c_p(d, e)\, \|F\|_p\, \|G\|_p$$
holds for all symmetrical functions F ∈ L^p([0, 1]^d), G ∈ L^p([0, 1]^e), with ‖ ‖_p denoting
the L^p-norm.
Lemma 1.7.8. The constants c_p(d, e) and k_p(d, e) are related by
$$c_p(d, e) = \binom{d+e}{d} k_p(d, e).$$

Proof: Let f(t_1, …, t_n) be a homogeneous polynomial of degree d and let F be the
symmetrical step function on [0, 1)^d given by
$$F(x_1, \dots, x_d) = n^{d/p}\, \frac{1}{d!}\, \frac{\partial^d f}{\partial t_{i_1} \cdots \partial t_{i_d}}$$
for $\frac{i_1 - 1}{n} \le x_1 < \frac{i_1}{n}, \dots, \frac{i_d - 1}{n} \le x_d < \frac{i_d}{n}$. Also, let g(t_1, …, t_n) be a homogeneous
polynomial of degree e and define G in the same way as F.
Then we verify that
$$[f]_p = \|F\|_p , \qquad [g]_p = \|G\|_p , \qquad [fg]_p = \frac{d!\, e!}{(d+e)!} \Bigl\| \sum_{(K,L)\in sh(d,e)} F(x_K)\, G(x_L) \Bigr\|_p .$$
The rest of the proof is an approximation argument. Consider the discretization i/n, i =
1, …, n of [0, 1]; given continuous F, G on [0, 1]^d and [0, 1]^e, we approximate F, G
by step functions as above and construct corresponding polynomials f, g. As n → ∞,
these functions are dense in L^p([0, 1]^d) and L^p([0, 1]^e). □
Proposition 1.7.9. The constant c_2(d, e) is
$$c_2(d, e) = \binom{d+e}{d}^{1/2} .$$

Proof: Let F, G be symmetrical L²-functions as in 1.7.7. Then
$$\Bigl\| \sum_{(K,L)\in sh(d,e)} F(x_K)\, G(x_L) \Bigr\|_2^2 = \sum_{(K,L)\in sh(d,e)}\ \sum_{(K',L')\in sh(d,e)} \int_{[0,1]^{d+e}} F(x_K)\, G(x_L)\, \overline{F(x_{K'})\, G(x_{L'})} \, dx$$
and this is equal to
$$\binom{d+e}{d} \|F\|_2^2\, \|G\|_2^2 + \sum_{(K,L)\ne(K',L')} \int_{[0,1]^{d+e}} F(x_K)\, \overline{G(x_{L'})}\ \overline{F(x_{K'})}\, G(x_L) \, dx.$$
The integral is not negative, as we verify as follows. We have
$$\int_{[0,1]^{d+e}} F(x_K)\, \overline{G(x_{L'})}\ \overline{F(x_{K'})}\, G(x_L) \, dx = \int_{[0,1]^{d+e}} F(x_{K\cap K'}, x_{K\cap L'})\, \overline{G(x_{K\cap L'}, x_{L\cap L'})} \times \overline{F(x_{K\cap K'}, x_{L\cap K'})}\, G(x_{L\cap K'}, x_{L\cap L'}) \, dx_{K\cap K'}\, dx_{K\cap L'}\, dx_{L\cap K'}\, dx_{L\cap L'} .$$
Now we integrate first with respect to $dx_{K\cap L'}\, dx_{L\cap K'}$. By Fubini’s theorem, we obtain
$$\Bigl| \int F(x_{K\cap K'}, z)\, \overline{G(z, x_{L\cap L'})} \, dz \Bigr|^2 .$$
This proves non-negativity and $c_2(d, e) \ge \binom{d+e}{e}^{1/2}$.
The choices
$$F(x_1, \dots, x_d) = \cos(2\pi x_1) \cdots \cos(2\pi x_d),$$
$$G(x_1, \dots, x_e) = \sin(2\pi x_1) \cdots \sin(2\pi x_e),$$
also show, by orthogonality, that the constant $\binom{d+e}{e}^{1/2}$ is sharp. □
Corollary 1.7.10. Let f, g be complex polynomials of degree d, e. Then:
(a) $k_2(d, e) = \binom{d+e}{e}^{-1/2}$.
(b) $\ell_2(fg) \ge \frac{1}{\sqrt{(d+e)!}}\, \ell_2(f)\, \ell_2(g)$.

Proof: We may suppose that f, g are homogeneous. The first claim follows from
Lemma 1.7.8 and Proposition 1.7.9. The second follows from (a) and (1.11) on page 31. □
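
Part (b) of Corollary 1.7.10 can be tested exactly for integer polynomials by squaring both sides, which avoids square roots. The sketch below works with one-variable polynomials given as coefficient lists whose last entry is non-zero (an assumption of the example, so that the list length determines the degree); the function names are illustrative.

```python
import math

def multiply(f, g):
    """Product of two coefficient lists (low degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def l2_squared(f):
    """Square of the l_2-norm of the coefficients."""
    return sum(c * c for c in f)

def corollary_b_holds(f, g):
    """Check (d+e)! * l2(fg)^2 >= l2(f)^2 * l2(g)^2 in exact arithmetic."""
    d, e = len(f) - 1, len(g) - 1
    return (math.factorial(d + e) * l2_squared(multiply(f, g))
            >= l2_squared(f) * l2_squared(g))

if __name__ == "__main__":
    print(corollary_b_holds([1, -1], [1, 1]))              # equality: 2 * 2 = 2 * 2
    print(corollary_b_holds([1, 3, 3, 1], [1, -3, 3, -1]))
```

For f = 1 − t and g = 1 + t the two sides coincide, so the constant in (b) cannot be improved for d = e = 1.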

1.8. Bibliographical notes

The material in the first five sections of this chapter is quite standard and was
mainly taken from S. Lang [169] and J.-P. Serre [277]. However, the reader must
be warned that our normalization for absolute values does not always agree with
the normalization used by other authors. The rationale for our normalization is
that the degree [K : Q] does not appear in the first formula in Lemma 1.3.7, and
therefore it is absent in the definition of the absolute logarithmic height. This leads
to formulas invariant by field extensions.
The proof of Gelfond’s lemma in Section 1.6 follows K. Mahler [187], where
the important Mahler’s height is introduced. The inequality $M(f) \le \|f\|_{L^2(\mathbb{T})}$
appearing in the proof of Lemma 1.6.7 can be found in E. Landau [164], Satz 443,
with a somewhat different proof.
Section 1.7 is mostly from B. Beauzamy, E. Bombieri, P. Enflo, and H.L.
Montgomery [18].
2 WEIL HEIGHTS

2.1. Introduction

In this chapter we study heights from a geometric point of view.


We begin with the important Section 2.2 introducing local Weil heights associated
to Cartier divisors on a projective variety X, and studying their properties. These
considerations are given here only for projective varieties, where the treatment is
simpler.
Section 2.3 studies global Weil heights and their equivalence classes up to bounded
functions.
In Section 2.4, we study the height on a projective variety induced by the height in
the ambient projective space and in particular we prove the important Northcott’s
theorem on the finiteness of the number of points of bounded degree and bounded
height in a fixed projective space.
These three sections are very important for the handling of heights in diophantine
geometry and are required from Chapter 9 onwards.
In Section 2.5, which contains new material, the notion of presentation of a pro-
jective variety is introduced and explicit comparison theorems for the heights of
a variety X in two different projective embeddings are given, in terms of presen-
tations of these embeddings. This section may be skipped in a first reading. It
will be used only partially in Section 11.7 and implicitly in questions dealing with
effectivity.
Sections 2.6 and 2.7 extend the results obtained on local and global Weil heights
to the associated heights of locally bounded metrized line bundles on a complete
variety. They will be also used in the second half of the book.
Section 2.8 studies heights on Grassmann varieties and their properties. We need
it only for Section 2.9, where we state the important Siegel’s lemma in a strong
form, as a consequence of Minkowski’s geometry of numbers. For a quick tour, the
reader may take from the last two sections only the elementary version of Siegel’s
lemma over Z given in 2.9.1 and its Corollary 2.9.2 over number fields, where the
constants are not made explicit, but which is quite often enough for applications.

2.2. Local heights

The reader should be familiar with the concept of Cartier divisors and its connexion
to meromorphic sections of line bundles, as in A.8.
In this section we introduce local heights associated to Cartier divisors on a projec-
tive variety X . However, in order to define them properly we need additional data
beyond the divisor D itself, namely a realization O(D) = O(D+ ) ⊗ O(−D− )
with base-point-free line bundles O(D± ) coming with given sets of generating
global sections. The set of Cartier divisors equipped with these additional data
forms a monoid, and the local heights so defined behave functorially with respect
to this monoid. This removes the need of working modulo bounded functions when
studying Weil heights, a point of crucial importance for applications because it al-
lows precise estimates.
2.2.1. Let K be a field and let us fix an absolute value | | on K . Let X be a
projective variety over K , which for simplicity we assume here to be irreducible.
Let D be a Cartier divisor on X with associated line bundle O(D) and meromor-
phic section sD . For construction of O(D) and sD , see A.8.18. Note that the
associated Cartier divisor D(sD ) of sD is equal to D .
There are base-point-free line bundles L, M on X such that O(D) ≅ L ⊗ M^{−1}
(cf. A.6.10 (a)). Now choose generating global sections s0 , . . . , sn of L and
t0 , . . . , tm of M , and call the data
D = (sD ; L, s; M, t)
a presentation of the Cartier divisor D .
2.2.2. For P ∉ supp(D), we define
$$\lambda_D(P) := \max_k \min_l \log \Bigl| \frac{s_k}{t_l s_D}(P) \Bigr| .$$
We use the notation tl sD for tl ⊗sD and sk /s for sk ⊗(s )−1 . Hence sk /(tl sD )
is a rational function on X .
We call λD (P ) the local height of P relative to the presentation D and, by abuse
of language, relative to D . In fact, it depends on the choice of sD as well as
on L, M and their generating sections. The local height is a real-valued function
defined outside of the support of the divisor D .
Example 2.2.3. Let f be a non-zero rational function on X with Cartier divisor
D := D(f). Then O(D) = O_X and f is a meromorphic section of
O(D). Thus there is a local height λ_f relative to D, given by the presentation
(f; O_X, 1; O_X, 1). For P ∉ supp(D), we have
$$\lambda_f(P) = -\log |f(P)|.$$
If g is another non-zero rational function on X, then λ_{fg} = λ_f + λ_g and
λ_{f^{−1}} = −λ_f.
2.2.4. Let D_1 and D_2 be Cartier divisors with presentations
$$\mathcal{D}_i = (s_{D_i}; L_i, \mathbf{s}_i; M_i, \mathbf{t}_i)$$
and local heights λ_{D_i}. Then $\mathbf{s}_1 \mathbf{s}_2 = (s_{1k} s_{2k'})$, $\mathbf{t}_1 \mathbf{t}_2 = (t_{1l} t_{2l'})$ are generating
global sections of L_1 ⊗ L_2, M_1 ⊗ M_2, and we define λ_{D_1+D_2} as the local height
relative to the presentation
$$\mathcal{D}_1 + \mathcal{D}_2 = (s_{D_1} s_{D_2}; L_1 \otimes L_2, \mathbf{s}_1 \mathbf{s}_2; M_1 \otimes M_2, \mathbf{t}_1 \mathbf{t}_2)$$
of the divisor D_1 + D_2. It is obvious that with this presentation we have
$$\lambda_{D_1+D_2}(P) = \lambda_{D_1}(P) + \lambda_{D_2}(P)$$
for P ∈ X, P ∉ supp(D_1) ∪ supp(D_2).
2.2.5. If λ_D is a local height with presentation (s_D; L, s; M, t), then λ_{−D} is
defined by the presentation $(s_D^{-1}; M, \mathbf{t}; L, \mathbf{s})$ and we have
$$\lambda_{-D}(P) = -\lambda_D(P)$$
for P ∈ X \ supp(D). With these operations, the space of local heights is an
abelian group.
2.2.6. Another important operation on presentations is the pull-back. If
$$\mathcal{D} = (s_D; L, \mathbf{s}; M, \mathbf{t})$$
is a presentation of D on X and π : Y → X is a dominant morphism of
irreducible projective varieties over K, then
$$\pi^* \mathcal{D} = (\pi^* s_D; \pi^* L, \pi^* \mathbf{s}; \pi^* M, \pi^* \mathbf{t})$$
is a presentation of π*D. We have λ_{π*D}(P) = λ_D(π(P)) for every P ∈ Y such
that π(P) ∉ supp(D). More generally, this works for a morphism π : Y → X
of irreducible projective varieties such that π(Y) is not contained in supp(D).

We consider an affine variety U over K .


Lemma 2.2.7. Let h_j ∈ K[U], j = 1, ..., N, be without common zero in U.
Then the ideal generated by the functions h_j is equal to K[U].
Proof: Choose a closed embedding U → A^n_K and let I(U) be the ideal of U in
K[t_1, ..., t_n]. The K-algebra K[U] can be identified with K[t_1, ..., t_n]/I(U).
Let I be the inverse image of the ideal generated by h_1, ..., h_N under the projection
$$K[t_1, \dots, t_n] \longrightarrow K[t_1, \dots, t_n]/I(U).$$
We claim that I is equal to K[t_1, ..., t_n]. In fact, the polynomials in I have no
common zero, and our claim follows from Hilbert's Nullstellensatz in A.2.2. □
Definition 2.2.8. The set E ⊂ U (K) is bounded in U if for any f ∈ K[U ] the
function |f | is bounded on E .
Lemma 2.2.9. Let {f_1, ..., f_N} be generators of K[U] as a K-algebra. If
$$\sup_{P \in E} \max_{j=1,\dots,N} |f_j(P)| < \infty$$
holds, then E is bounded.


Proof: Let f ∈ K[U]. Then we can write f = p(f_1, ..., f_N) with p a polynomial
with coefficients in K. Let C be the number of monomials in p and let d be the
degree of p. We define
$$\delta := \begin{cases} 1 & \text{if the absolute value is archimedean,} \\ 0 & \text{otherwise.} \end{cases}$$
Then, with |p| the Gauss norm of p from 1.6.3, we find
$$\sup_{P \in E} |f(P)| \le C^{\delta} \, |p| \, \max\Bigl(1, \sup_{P \in E} \max_{j=1,\dots,N} |f_j(P)|\Bigr)^{d} < \infty, \tag{2.1}$$
concluding the proof. □
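As a numerical illustration of the bound (2.1), the following Python sketch treats the archimedean case (δ = 1) for the affine plane, where the generators f_1, f_2 are simply the coordinate functions. The polynomial p, the finite sample E and the helper names `evaluate` and `bound_2_1` are assumptions chosen for the example; they do not come from the text.

```python
from itertools import product
from math import prod

# Hypothetical example: p(x, y) = 3*x^2*y - 5*y + 2, stored as
# {exponent tuple: coefficient}; the generators f_1, f_2 are the coordinates x, y.
p = {(2, 1): 3.0, (0, 1): -5.0, (0, 0): 2.0}

def evaluate(poly, point):
    """Evaluate the polynomial at a point given as a tuple of floats."""
    return sum(c * prod(x ** e for x, e in zip(point, exps))
               for exps, c in poly.items())

def bound_2_1(poly, sup_generators):
    """Right-hand side of (2.1) for the usual absolute value (delta = 1)."""
    C = len(poly)                                     # number of monomials of p
    gauss_norm = max(abs(c) for c in poly.values())   # Gauss norm |p|
    d = max(sum(exps) for exps in poly)               # total degree of p
    return C * gauss_norm * max(1.0, sup_generators) ** d

# E: a finite sample of points of the plane with |x|, |y| <= 1.5
grid = [-1.5, -0.5, 0.0, 0.5, 1.5]
E = list(product(grid, repeat=2))
sup_generators = max(max(abs(t) for t in P) for P in E)
lhs = max(abs(evaluate(p, P)) for P in E)
rhs = bound_2_1(p, sup_generators)
print(lhs, rhs, lhs <= rhs)
```

The bound is far from sharp in this example, which is typical: only its shape, a constant depending on p times a power of the size of the generators, matters in the sequel.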


Lemma 2.2.10. If {U_l} is a finite affine open covering of the affine K-variety U
and if E is bounded in U, then there are bounded subsets E_l of U_l such that
$$E = \bigcup_l E_l.$$

Proof: It is enough to prove the claim for a refinement of {U_l}. Hence we can
assume that there are regular functions h_l on U such that U_l = {x ∈ U | h_l(x) ≠ 0},
see A.2.10. By Lemma 2.2.7 there are regular functions g_l on U such that
$$\sum_l g_l h_l = 1.$$
If C is the cardinality of the covering and δ is as before, then
$$\inf_{P \in E} \max_l |h_l(P)| \ge C^{-\delta} \Bigl( \sup_{P \in E} \max_l |g_l(P)| \Bigr)^{-1} > 0. \tag{2.2}$$
We define
$$E_l := \{P \in E \mid |h_l(P)| = \max_k |h_k(P)|\}.$$
Obviously, E_l ⊂ U_l(K) and E = ⋃_l E_l. Let f_1, ..., f_N be a set of generators
of K[U]. Then f_1, ..., f_N, 1/h_l are generators of K[U_l]. By Lemma 2.2.9, it is
enough to show that |1/h_l| is bounded on E_l. In fact, the bound
$$\sup_{P \in E_l} |1/h_l(P)| \le C^{\delta} \sup_{P \in E} \max_k |g_k(P)| < \infty \tag{2.3}$$
follows from (2.2). □


Theorem 2.2.11. Let X be a projective variety over K and let 𝒟, 𝒟' be two
presentations of the Cartier divisor D. Then
$$|\lambda_{\mathcal{D}} - \lambda_{\mathcal{D}'}| \le \gamma$$
for some constant γ < ∞.
Proof: By 2.2.4, we see that λ_𝒟 − λ_𝒟' is a local height relative to the presentation
𝒟 − 𝒟' of the zero divisor. Therefore, the left-hand side of the inequality extends
to a well-defined real function on X. Moreover, it is enough to show the claim for
D = 0 and 𝒟' = (1; L, 1; M, 1). Then 𝒟 has the form (1; L, 𝐬; L, 𝐭). We need
to find γ as above such that
$$-\gamma \le \max_k \min_l \log \left| \frac{s_k}{t_l}(P) \right| \le \gamma.$$
To this end, it suffices to obtain only the right-hand side of this inequality, because we
can interchange the roles of 𝐬 and 𝐭.
Now choose a closed embedding of X into P^N_K with standard coordinates (x_0 :
··· : x_N), let U_i be the affine open subset {x ∈ X | x_i ≠ 0} of X, and let U_il
be the affine open subset {x ∈ U_i | t_l(x) ≠ 0}. The restrictions of g_kl := s_k/t_l
to U_il are regular functions. The functions f_ij := x_j/x_i, j = 0, ..., N, generate
K[U_i] as a K-algebra (see A.2.10). Then define sets E_i by
$$E_i := \{P \in X(K) \mid |x_i(P)| = \max_j |x_j(P)|\}.$$
It is clear that, if P ∈ E_i, we have
$$\max_j |f_{ij}(P)| = 1, \tag{2.4}$$
hence E_i is bounded in U_i (Lemma 2.2.9). Thus we can apply Lemma 2.2.10 to
U_i, E_i and the covering {U_il}, obtaining bounded subsets E_il of U_il such that
E_i = ⋃_l E_il and
$$\sup_{P \in E_{il}} \max_k |g_{kl}(P)| < \infty.$$
Since the sets E_il cover X(K), we get the claim. □


2.2.12. Since Hilbert's Nullstellensatz is effective, the constant γ in Theorem
2.2.11 is effectively computable in terms of the presentations 𝒟 and 𝒟'. An
effective version of the Nullstellensatz can be found in D. Masser and G. Wüstholz
[195], Th.IV.
Remark 2.2.13. For the purpose of giving a precise meaning to the words “effectively
computable,” we need a closer look at the bounds in the results above.
In Lemma 2.2.9, there are finitely many elements p_a ∈ K and d ∈ N such that
$$\sup_{P \in E} |f(P)| \le \max_a |p_a| \, \max\Bigl(1, \sup_{P \in E} \max_j |f_j(P)|\Bigr)^{d}.$$
The elements p_a may be chosen to be the coefficients of the polynomial p in (2.1)
if the absolute value is not archimedean, while in the archimedean case it suffices
to add to this list C times the coefficients of the list, where C is the number of coefficients.
Note also that the list of elements pa so obtained and the degree d depend only on the
geometric data (U, f, f1 , . . . , fN ) , but not on E , nor on the absolute value.
In the situation of Lemma 2.2.10, the bound of f ∈ K[U_l] is again of the same type,
namely
$$\sup_{P \in E_l} |f(P)| \le \max_m |p_m| \, \max\Bigl(1, \sup_{P \in E} \max_j |f_j(P)|\Bigr)^{d},$$
where again f_1, ..., f_N are generators of K[U], and where the finitely many elements p_m ∈ K
and d depend only on geometric data (f, the covering, generators) but not on E, the
absolute value | |, or the decomposition {E_l}. This is clear by applying the above result
to f ∈ K[U_l] with generators f_1, ..., f_N, 1/h_l and then again to every g_l ∈ K[U] in
(2.3).
If we apply these remarks to the proof of Theorem 2.2.11 and use (2.4), then we may choose
$$\gamma = \max_m \log^+ |p_m|$$

for a certain finite set of elements pm ∈ K , independent of the absolute value | | and
determined exclusively in terms of geometric data.

2.3. Global heights

In this section, starting from the local heights previously defined, we consider the
case in which K is a number field and define global heights.
2.3.1. Let X be an irreducible projective variety defined over K .
We consider a Cartier divisor D on X with presentation
D = (sD ; L, s; M, t).
Let F be a number field with K ⊂ F ⊂ K̄ and let P ∈ X(F) \ supp(D). For
v ∈ M_F, we define the local height
$$\lambda_D(P, v) := \max_k \min_l \log \left| \frac{s_k}{t_l s_D}(P) \right|_v$$
using our normalizations from 1.3.6. Let p ∈ M_Q be the restriction of v to Q
and let | |_u be an absolute value on K̄ such that the restriction to F is equivalent
to | |_v and such that the restriction to Q is equal to | |_p. The existence of | |_u
follows from Proposition 1.3.1. Then
$$\lambda_D(P, v) = \frac{[F_v : \mathbb{Q}_p]}{[F : \mathbb{Q}]} \, \lambda_D(P, u),$$
where λD (P, u) is the local height relative to the absolute value | |u from 2.2.2.
This allows us to apply the results from Section 2.2 to λD (P, v).
Example 2.3.2. The hyperplane {x_0 = 0} in P^n_K has the presentation
$$\mathcal{D} = (x_0; O_{\mathbb{P}^n}(1), x_0, \dots, x_n; O_{\mathbb{P}^n}, 1).$$
For P ∈ P^n(F) with x_0(P) ≠ 0 and v ∈ M_F the corresponding local height is
$$\lambda_D(P, v) = \max_k \log \left| \frac{x_k}{x_0}(P) \right|_v$$
and, by the product formula applied to x_0(P), the height of P becomes
$$h(P) = \sum_{v \in M_F} \lambda_D(P, v).$$

This explains the name local height. This notion will be extended later to arbitrary
divisors.
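To make Example 2.3.2 concrete in the simplest case F = Q, the following Python sketch computes the local heights max_k log |x_k/x_0|_v at the archimedean place and at the relevant primes, and compares their sum with the usual height log max |x_i| of coprime integer coordinates. The function names and the sample point are assumptions made for the illustration; only rational coordinates with x_0 ≠ 0 are handled.

```python
from fractions import Fraction
from math import gcd, log

def prime_factors(n):
    """Set of primes dividing the non-zero integer n."""
    n, primes, p = abs(n), set(), 2
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes

def p_adic_abs(x, p):
    """Normalized p-adic absolute value of a non-zero Fraction x."""
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num, v = num // p, v + 1
    while den % p == 0:
        den, v = den // p, v - 1
    return float(p) ** (-v)

def local_height(coords, v):
    """max_k log |x_k / x_0|_v; v is a prime or "inf" (usual absolute value)."""
    ratios = [x / coords[0] for x in coords if x != 0]
    if v == "inf":
        return max(log(abs(float(t))) for t in ratios)
    return max(log(p_adic_abs(t, v)) for t in ratios)

def sum_of_local_heights(coords):
    """Sum over the finitely many places where a local height can be non-zero."""
    coords = [Fraction(x) for x in coords]
    primes = set()
    for x in coords:
        if x != 0:
            primes |= prime_factors(x.numerator) | prime_factors(x.denominator)
    return sum(local_height(coords, v) for v in ["inf", *sorted(primes)])

def height(coords):
    """log max |x_i| after scaling to coprime integer coordinates."""
    coords = [Fraction(x) for x in coords]
    den = 1
    for x in coords:
        den = den * x.denominator // gcd(den, x.denominator)
    ints = [int(x * den) for x in coords]
    g = 0
    for i in ints:
        g = gcd(g, i)
    return log(max(abs(i) for i in ints) // g)

P = [6, 4, -15]                              # x_0(P) = 6 is non-zero
print(sum_of_local_heights(P), height(P))    # the two values should agree
```

For this point the non-zero contributions come from the places ∞, 2 and 3, while the local height at 5 vanishes because the ratio 1 = x_0/x_0 dominates there.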
2.3.3. We go back to the general case in 2.3.1. Let λ_D be a local height relative to
the presentation 𝒟 = (s_D; L, 𝐬; M, 𝐭) of a Cartier divisor D on X. For P ∈ X
there are s_j and t_l such that s_j(P) ≠ 0, t_l(P) ≠ 0. Therefore, we can find a non-zero
meromorphic section s of O(D) such that P is not contained in the support
of the Cartier divisor D(s). Then 𝒟(s) = (s; L, 𝐬; M, 𝐭) is a presentation of
D(s) and we have
$$\lambda_{D(s)} = \lambda_D + \lambda_f,$$
where f is the rational function s/s_D. If F is a finite extension K ⊂ F ⊂ K̄
such that P ∈ X(F), the local height λ_{D(s)}(P, v) is finite for any v ∈ M_F,
because P is not in the support of D(s). Then we define the global height of P
relative to λ := λ_D by
$$h_\lambda(P) := \sum_{v \in M_F} \lambda_{D(s)}(P, v).$$

The next result justifies the definition and the name global height.
Proposition 2.3.4. The global height hλ is independent of the choices of F and
of the section s .
Proof: By Lemma 1.3.7, the global height is independent of F . Its independence
from the choice of s can be verified as follows. Let t be another non-zero meromorphic
section of O(D) with P ∉ supp(D(t)). Then 2.2.4 and 2.2.5 show
that
$$\lambda_{D(s)}(P, v) - \lambda_{D(t)}(P, v) = \lambda_{s/t}(P, v)$$
for any v ∈ M_F. On the other hand, the product formula shows that the global
height of P relative to λ_{s/t} is 0, proving the claim. □
Remark 2.3.5. As an immediate consequence the global height relative to the
natural local height of a non-zero rational function is identically 0. It is also clear
that the map λ ↦ h_λ is a group homomorphism.
Theorem 2.3.6. Let λ, λ' be local heights relative to Cartier divisors D, D' with
D − D' a principal divisor. Then h_λ − h_λ' is a bounded function.
Proof: By Remark 2.3.5, we can assume D = D' = 0 and λ' = 0, hence we
need only to show that h_λ is a bounded function for any local height relative to
the zero divisor. Theorem 2.2.11 and Remark 2.2.13 give us a family {γ_v}_{v∈M_K}
of non-negative real numbers, almost all 0, such that
$$|\lambda(P, u)| \le \gamma_v$$
for any P ∈ X and any place u on K̄ with u|v. As before, let F be a finite
extension K ⊂ F ⊂ K̄ such that P ∈ X(F). By 2.3.1, we obtain
$$|\lambda(P, w)| \le \frac{[F_w : \mathbb{Q}_p]}{[F : \mathbb{Q}]} \, \gamma_v$$
for any w ∈ M_F, which divides v ∈ M_K and p ∈ M_Q. By Corollary 1.3.2, we
have
$$\sum_{w \mid v} [F_w : K_v] = [F : K]$$
and we get
$$|h_\lambda(P)| \le \sum_{w \in M_F} |\lambda(P, w)| \le \sum_{v \in M_K} \frac{[K_v : \mathbb{Q}_p]}{[K : \mathbb{Q}]} \, \gamma_v < \infty. \qquad \square$$

2.3.7. There is an isomorphism of the group of Cartier divisors modulo principal
Cartier divisors onto Pic(X), given by cl(D) ↦ cl(O(D)). Let us denote the
real functions on X by R^X and the subspace of bounded functions by O(1). Let
c ∈ Pic(X) and choose a Cartier divisor D with c = cl(O(D)) and a local height
λ relative to D. By Theorem 2.3.6, the image h_c of h_λ under the projection
$$\mathbb{R}^X \longrightarrow \mathbb{R}^X / O(1)$$
is independent of the choice of D and λ . A representative of hc is called a height
function associated to c .
In other words, an isomorphism class of line bundles determines a real-valued
height function up to bounded functions. We note however that considering only
equivalence classes of heights modulo bounded functions, as propounded by Weil,
has attractive functorial properties but also the great disadvantage of
throwing away the finer properties of heights needed to prove the deeper theorems
of diophantine geometry. A better point of view is offered in the next sections.
Theorem 2.3.8. The map
$$h : \operatorname{Pic}(X) \longrightarrow \mathbb{R}^X / O(1),$$
given by c ↦ h_c, is a homomorphism. If ϕ : Y → X is a morphism of irreducible
projective varieties over K, then
$$h_{\varphi^* c} = h_c \circ \varphi$$
for any c ∈ Pic(X).
Proof: The first claim follows from Remark 2.3.5 and Theorem 2.3.6. The second
one is an immediate consequence of 2.2.6. 
It is quite trivial, but important, to remark that a base-point-free line bundle
always has a non-negative height function. A more general result is the following.
Proposition 2.3.9. Let D be an effective Cartier divisor on X. Then there is a
local height λ relative to D such that λ(P, u) ≥ 0 for any P ∉ supp(D) and for any
place u of K.
Proof: There are base-point-free line bundles L, M on X such that O(D) ≅
L ⊗ M^{-1}. Choose generating global sections t_0, ..., t_l of M. We can complete
s_D t_0, ..., s_D t_l to a family s_0, ..., s_k of generating global sections of L. The
local height given by the presentation
$$\mathcal{D} = (s_D; L, \mathbf{s}; M, \mathbf{t})$$
is non-negative outside of the support of D. □
2.3.10. The results of Sections 2.2 and 2.3 extend immediately to varieties which
are not necessarily irreducible. Here we must be careful to require that all mero-
morphic sections considered are invertible, i.e. not identically 0 on any irreducible
component of X . For the functorial property of 2.2.6, we must assume that no ir-
reducible component of Y is mapped into the support of D , in order to guarantee
a well-defined pull-back of the Cartier divisor.
2.3.11. We may introduce global heights for any field with product formula as long
as we work with properly normalized absolute values (see 1.3.6 for a perfect field
and 1.3.12 in general). Then all results of this section continue to hold.
2.3.12. We may also replace the ground field K by K̄. Then all geometric data,
such as varieties, morphisms, line bundles, and sections, are defined over a sufficiently
large number field K and there is no problem in considering global
heights relative to the ground field K. Since the global height does not depend
on the ground field, it also makes sense to consider it as a global height over the
algebraically closed field K̄.

2.4. Weil heights

In this section we consider global heights given by a morphism of a projective
variety to a projective space. In fact, we will see that any global height is the
difference of two such Weil heights. We will formulate Theorem 2.3.8 and Northcott's
theorem in terms of Weil heights. The results are based on the previous
sections.
Let X be a projective variety over Q̄.
Definition 2.4.1. Let ϕ : X → P^n_Q̄ be a morphism over Q̄. The Weil height of
P ∈ X(Q̄) relative to ϕ is defined by h_ϕ(P) := h ∘ ϕ(P), with h the usual
height on P^n_Q̄.

2.4.2. If ψ : X → P^m_Q̄ is another morphism over Q̄, the join ϕ#ψ is the
morphism
$$X \longrightarrow \mathbb{P}^{(n+1)(m+1)-1}_{\overline{\mathbb{Q}}}, \quad x \longmapsto (\varphi_j(x) \psi_k(x)),$$
with the lexicographic ordering on pairs (j, k).
It may be viewed as the composition of the graph morphism G(ψ) : X → X × P^m_Q̄,
the product map ϕ × id : X × P^m_Q̄ → P^n_Q̄ × P^m_Q̄, and the Segre embedding
P^n_Q̄ × P^m_Q̄ → P^{(n+1)(m+1)−1}_Q̄ (cf. A.6.4).
Remark 2.4.3. If ϕ is a closed embedding, then ϕ#ψ is a closed embedding.
In order to prove this claim, note that G(ψ) is always a closed embedding (see
A. Grothendieck [134], Cor.5.4.3). If ϕ is a closed embedding, then ϕ × id is a
closed embedding ([134], Prop.4.3.1). The Segre embedding is also a closed em-
bedding (cf. A.6.4). Since the composition of closed embeddings remains a closed
embedding ([134], Prop.4.2.5), we conclude that ϕ#ψ is a closed embedding.

The following proposition formalizes a remark already made in 1.5.14 about the
height in Segre embeddings.
Proposition 2.4.4. If ϕ : X → P^n_Q̄ and ψ : X → P^m_Q̄ are morphisms over Q̄,
then
$$h_{\varphi \# \psi} = h_\varphi + h_\psi.$$
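The additivity under joins can be checked numerically for rational points. In the following Python sketch the lists x and y stand for ϕ(P) and ψ(P), given by integer homogeneous coordinates; the sample values and the function names `height` and `join` are assumptions made for the illustration, and only points with coordinates in Q are covered.

```python
from functools import reduce
from math import gcd, log

def height(coords):
    """Height on P^n(Q) of a point with integer homogeneous coordinates."""
    g = reduce(gcd, coords, 0)          # gcd of all coordinates, positive
    return log(max(abs(x) for x in coords) // g)

def join(x, y):
    """Coordinates (x_j * y_k), ordered lexicographically in (j, k)."""
    return [xj * yk for xj in x for yk in y]

x = [6, 4, -15]    # stands for phi(P) in P^2(Q)
y = [10, 21]       # stands for psi(P) in P^1(Q)
print(height(join(x, y)), height(x) + height(y))   # the two values should agree
```

Over Q the equality reduces to two elementary facts: the products of two coprime coordinate vectors are again coprime, and the largest absolute value of a product x_j y_k is the product of the largest absolute values.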
2.4.5. We claim that every Weil height may be viewed as a global height in the
sense of Section 2.3. There is a linear form ℓ = ℓ_0 x_0 + ··· + ℓ_n x_n, which does
not vanish identically on any irreducible component of X. Then it follows from
Example 2.3.2 and 2.2.6 that h_ϕ is the global height relative to the presentation
$$\varphi^* \bigl( \ell; O_{\mathbb{P}^n_{\overline{\mathbb{Q}}}}(1), x_0, \dots, x_n; O_{\mathbb{P}^n_{\overline{\mathbb{Q}}}}, 1 \bigr).$$

2.4.6. Conversely, we can write every global height as a difference of two Weil
heights. Let hλ be the global height relative to the presentation
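$$\mathcal{D} = (s; L, s_0, \dots, s_n; M, t_0, \dots, t_m).$$
We consider the morphisms
$$\varphi : X \to \mathbb{P}^n_{\overline{\mathbb{Q}}}, \quad x \mapsto (s_0(x) : \cdots : s_n(x))$$
and
$$\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}, \quad x \mapsto (t_0(x) : \cdots : t_m(x))$$
as in A.6.8. Then it follows from the independence of h_λ from s and 2.4.5 that
$$h_\lambda = h_\varphi - h_\psi.$$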
2.4.7. Note that in 2.4.6 we may even assume that ϕ and ψ are closed embeddings
into projective spaces. This follows from Remark 2.3.5 and Proposition 2.4.4,
choosing any closed embedding θ of X into some projective space over Q and
replacing ϕ , ψ by ϕ#θ , ψ#θ .
Theorem 2.4.8. If ϕ : X → P^n_Q̄ and ψ : X → P^m_Q̄ are morphisms over Q̄ with
ϕ*O_{P^n}(1) ≅ ψ*O_{P^m}(1), then h_ϕ − h_ψ is a bounded function.
Proof: Using 2.4.5, this is a reformulation of Theorem 2.3.6. □
Our next result is the general version of Northcott’s theorem, which is both sim-
ple and fundamental.
Theorem 2.4.9. Let X be a projective variety defined over the number field K
and let h_c be a height function associated to an ample class c ∈ Pic(X). Then
the set
$$\{P \in X(\overline{K}) \mid h_c(P) \le C, \ [K(P) : K] \le d\}$$
is finite for any constants C, d ∈ R.
Proof: There is m ∈ N such that mc is very ample. By Theorem 2.3.8, m h_c
is a height function associated to mc. Therefore, we can assume without loss
of generality that c is very ample. By Theorem 2.4.8, it is enough to prove the
statement for X = P^n_Q̄ and c = cl(O_{P^n}(1)), i.e. for the standard height on P^n_Q̄.
Let U := {x_j ≠ 0} be a standard affine subset of P^n_Q̄. We have to show that there
are only finitely many points P in U(Q̄) with h(P) ≤ C and [K(P) : K] ≤
d. The height of P ∈ U is an upper bound for the heights of the coordinates.
Therefore, the case n = 1 implies the general statement. This is Theorem 1.6.8,
ending the proof. □
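The simplest instance of Northcott's theorem, namely K = Q, d = 1 and X = P^1, can be made completely explicit. The following Python sketch lists the points of P^1(Q) whose multiplicative height H = exp(h) is at most a bound B; the function name and the choice of representatives are assumptions made for the illustration, and the general case of bounded degree is not covered.

```python
from math import gcd

def points_of_bounded_height(B):
    """Points (a : b) of P^1(Q) with H(a : b) = max(|a|, |b|) <= B.

    Each point is listed once, by its representative with coprime integer
    coordinates and b > 0, together with the point (1 : 0).
    """
    points = [(1, 0)]
    for b in range(1, B + 1):
        for a in range(-B, B + 1):
            if gcd(a, b) == 1:
                points.append((a, b))
    return points

for B in (1, 2, 3):
    print(B, len(points_of_bounded_height(B)))
```

The list is finite for every B because only finitely many integer pairs satisfy max(|a|, |b|) ≤ B, which is the elementary content of the theorem in this case.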
Remark 2.4.10. Clearly, we may also introduce Weil heights for any field with
product formula and all results above remain true with the exception of Northcott’s
theorem. We may use Example 1.5.23 as a counterexample if the field is infinite.
Example 2.4.11. The following example shows that Weil heights in the geometric case may
be interpreted in terms of intersection theory, as a degree function. This is conceptually very
important, because it allows us to use the intuition and methods of algebraic geometry in
dealing with heights.
The corresponding result in the arithmetic case lies much deeper and requires intersection
theory in the setting of arithmetic algebraic geometry (see Example 2.7.20).
Let X be an irreducible regular projective variety over an arbitrary field K , and let deg be
the degree of cycles corresponding to a fixed embedding of X into a projective space PnK .
By Proposition 1.4.7, we have a canonical set of absolute values on K(X) satisfying the
product formula. A point P ∈ PnK (K(X)) is given by coordinates f0 , . . . , fn ∈ K(X) .
Let ϕ be the rational map
$$X \dashrightarrow \mathbb{P}^n_K, \quad x \longmapsto \varphi(x) = (f_0(x) : \cdots : f_n(x)).$$
Let x_0, ..., x_n be the coordinates of P^n_K, viewed as global sections of O_{P^n}(1). Choose
j ∈ {0, ..., n} such that x_j|_{ϕ(X)} ≠ 0. Then the vector (f_0, ..., f_n) is proportional to
$$(\varphi^* x_0 / \varphi^* x_j, \dots, \varphi^* x_n / \varphi^* x_j) \in K(X)^{n+1}$$
and we may assume that they are equal. By Example 1.5.23, we have
$$h(P) = -\sum_Z \min_{i=0,\dots,n} \operatorname{ord}_Z(f_i) \deg Z = \sum_Z \Bigl( \operatorname{ord}_Z(\varphi^* x_j) - \min_{i=0,\dots,n} \operatorname{ord}_Z(\varphi^* x_i) \Bigr) \deg Z,$$
where the sums range over all prime divisors Z of X . By the valuative criterion of proper-
ness (cf. A.11.10), the domain U of ϕ has a complement of codimension at least 2 . The
local ring associated to a prime divisor was introduced in A.8.7. By choosing a trivialization
of (ϕ|U )∗ OPn (1) at a generic point of Z , we may view ϕ∗ (xi ) as regular functions in Z .
Therefore, we have
$$\min_{i=0,\dots,n} \operatorname{ord}_Z(\varphi^* x_i) = 0$$
and thus
$$h(P) = \sum_Z \operatorname{ord}_Z(\varphi^* x_j) \deg Z.$$
Since X \ U is of codimension at least 2, the restriction map induces an isomorphism
$$\operatorname{Pic}(X) \xrightarrow{\ \sim\ } \operatorname{Pic}(U)$$
(because on a regular variety Cartier divisors and Weil divisors can be identified, cf. A.8.21).
So it makes sense to view (ϕ|_U)*O_{P^n}(1) as an element of Pic(X), which we simply
denote by ϕ*O_{P^n}(1). It follows that
$$h(P) = \deg \varphi^* O_{\mathbb{P}^n}(1),$$
where the right-hand side denotes the degree of any divisor of a non-zero meromorphic
section of ϕ*O_{P^n_K}(1) (see A.9.26). If Y is a projective variety over K(X) and ι : Y →
P^n_{K(X)} is a closed embedding over K(X) into projective space, then P ∈ Y(K(X))
induces a rational map
$$\varphi : X \dashrightarrow Y$$
as above, and we have
$$h_\iota(P) = \deg \varphi^* O_Y(1),$$
where O_Y(1) is the pull-back of O_{P^n}(1) to Y.

2.5. Explicit bounds for Weil heights

This is a somewhat technical section, the reading of which can be omitted at first. Its ulti-
mate purpose is to give a meaning to the phrase “effectively computable,” which otherwise
would be only a hollow claim, devoid of true mathematical significance.
The main tool in this section is the concept of presentation of a closed embedding of a
projective variety in projective space. The basic idea can be described as follows. Let
X → P^n_Q̄ be a projective algebraic variety over the algebraically closed field Q̄, embedded
in projective space P^n_Q̄. It is well known that every rational function on X is then induced
by restriction of a rational function in the ambient space P^n_Q̄. On the other hand, we often
need to compare situations relative to different embeddings. The point of view taken in this
section is therefore the following. Since we are dealing with the function field Q̄(X) of X,
we are allowed to choose a hypersurface in P^{r+1}_Q̄ as a birational model. The homogeneous
coordinate ring S is the quotient of a polynomial ring by a principal ideal. This allows us to
introduce a height in S . Thus fixing this choice gives us a reference description of elements
in S .
This being done, we consider an arbitrary closed embedding X → P^n_Q̄. Then a presentation
of the embedding X → P^n_Q̄, relative to the reference ring S, consists in expressing the
rational functions (xi /xj )|X as elements of S . Now the problem of comparing heights
relative to different embeddings can be solved by comparing corresponding presentations.
This leads to very explicit comparison estimates for heights. The details are as follows.
2.5.1. Let X be an irreducible projective variety over Q̄ of dimension r. There is a
Q̄-morphism
$$\pi : X \longrightarrow \mathbb{P}^{r+1}_{\overline{\mathbb{Q}}}$$
such that X is mapped birationally onto a hypersurface (cf. A.11.5 and A.11.6). We
denote by z_0, ..., z_{r+1} the standard coordinates of P^{r+1}_Q̄. Then we may assume that the
hypersurface is given by an irreducible homogeneous polynomial f of degree d of the form
$$f(z_0, \dots, z_{r+1}) = f_0 + f_1 z_{r+1} + \cdots + f_{d-1} z_{r+1}^{d-1} + z_{r+1}^{d},$$
where f_i ∈ Q̄[z_0, ..., z_r] is homogeneous of degree d − i, f(0, ..., 0, 1) ≠ 0 and d is
the degree of X with respect to π*O_{P^{r+1}}(1) (cf. A.11.7). This situation is fixed for the
whole section.
2.5.2. Let S be the homogeneous coordinate ring of π(X). We have
$$S = \overline{\mathbb{Q}}[z_0, \dots, z_{r+1}]/I,$$
where I is the homogeneous ideal generated by f. Let z̄_i be the image of z_i in S (0 ≤
i ≤ r + 1) and note that z̄_{r+1} is integral over Q̄[z̄_0, ..., z̄_r]. The variables z̄_0, ..., z̄_r
are algebraically independent, because the transcendence degree of Q̄(π(X)) = Q̄(X) is
r (cf. A.4.11). By abuse of notation, we denote them again by z_0, ..., z_r. The minimal
polynomial of z̄_{r+1} over the polynomial ring Q̄[z_0, ..., z_r] is equal to f(z_0, ..., z_r, ·),
since
$$0 = f_0 + f_1 \bar{z}_{r+1} + \cdots + f_{d-1} \bar{z}_{r+1}^{\,d-1} + \bar{z}_{r+1}^{\,d}. \tag{2.5}$$
The elements 1, z̄_{r+1}, ..., z̄_{r+1}^{d−1} form a basis of S over Q̄[z_0, ..., z_r] and so we have an
isomorphism of Q̄-vector spaces
$$S \xrightarrow{\ \sim\ } \{p \in \overline{\mathbb{Q}}[z_0, \dots, z_{r+1}] \mid \deg_{z_{r+1}}(p) < d\}.$$
By means of this map, we define the height of an element of S as the height of the corre-
sponding polynomial.
2.5.3. For l ∈ N, there are uniquely determined q_lj ∈ Q̄[z_0, ..., z_r] (j = 0, ..., d − 1)
such that
$$\bar{z}_{r+1}^{\,l} = \sum_{j=0}^{d-1} q_{lj} \, \bar{z}_{r+1}^{\,j}. \tag{2.6}$$
The polynomials q_lj are homogeneous of degree l − j (elements of negative degree are 0),
and q_lj = δ_lj for 0 ≤ l ≤ d − 1, where δ_lj is Kronecker's symbol. We may now assume
that l ≥ d. Equation (2.5) shows
$$\bar{z}_{r+1}^{\,l} = -\sum_{k=0}^{d-1} f_k \, \bar{z}_{r+1}^{\,k+l-d} = -\sum_{j=0}^{d-1} \sum_{k=0}^{d-1} f_k \, q_{k+l-d,j} \, \bar{z}_{r+1}^{\,j},$$
leading to the recursive formula
$$q_{lj} = -\sum_{k=0}^{d-1} f_k \, q_{k+l-d,j}, \tag{2.7}$$
where j = 0, ..., d − 1. Let F be a number field containing the coefficients of f_0, ..., f_{d−1}
and for v ∈ M_F define δ_v to be 1 if v is archimedean and 0 otherwise. The recursion
(2.7) yields a bound
$$|q_{lj}|_v \le \left| \binom{d+r+1}{r+1} \right|_v^{\delta_v} |f|_v \max_{l' = l-d, \dots, l-1} |q_{l'j}|_v$$
for the Gauss norms. Here we have used that f_k has $\binom{d-k+r}{r}$ summands and
$$\sum_{k=0}^{d} \binom{d-k+r}{r} = \binom{d+r+1}{r+1}. \tag{2.8}$$
By induction we obtain
$$|q_{lj}|_v \le \left| \binom{d+r+1}{r+1} \right|_v^{(l-d)\delta_v} |f|_v^{\,l-d+1} \tag{2.9}$$
and thus Lemma 1.3.7 leads to
$$h(q_{lj}) \le (l-d+1) \, h(f) + (l-d) \log \binom{d+r+1}{r+1}.$$
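The recursion (2.7) is easy to run in a toy case. The following Python sketch takes r = 0 and sets z_0 = 1, so that the f_k are rational numbers and the q_lj are the coefficients of z^l reduced modulo the monic polynomial f. It also checks the archimedean bound (2.9), in which the binomial coefficient equals d + 1 for r = 0. The sample polynomial and the function names are assumptions made for the illustration; the recursion itself is purely formal and does not use the irreducibility of f.

```python
from fractions import Fraction

def reduction_table(f, L):
    """Return q with q[l][j] as in (2.6) for 0 <= l <= L, via recursion (2.7).

    f = [f_0, ..., f_{d-1}] lists the lower coefficients of the monic polynomial.
    """
    d = len(f)
    # q_lj is Kronecker's symbol for l < d
    q = [[Fraction(int(l == j)) for j in range(d)] for l in range(min(d, L + 1))]
    for l in range(d, L + 1):
        q.append([-sum(f[k] * q[k + l - d][j] for k in range(d)) for j in range(d)])
    return q

def reduce_power(f, l):
    """Coefficients of z^l modulo f, by repeated multiplication by z (a cross-check)."""
    d = len(f)
    coeffs = [Fraction(1)] + [Fraction(0)] * (d - 1)          # z^0
    for _ in range(l):
        top = coeffs[-1]                                      # coefficient of z^(d-1)
        coeffs = [Fraction(0)] + coeffs[:-1]                  # multiply by z
        coeffs = [c - top * fk for c, fk in zip(coeffs, f)]   # replace z^d using f
    return coeffs

f = [5, -2, 0]                  # f(z) = 5 - 2z + z^3, so d = 3
q = reduction_table(f, 10)
print(q[3], q[4])               # coefficients of z^3 and z^4 modulo f

# Archimedean bound (2.9) with r = 0: |q_lj| <= (d+1)^(l-d) * |f|^(l-d+1)
d, gauss_norm = len(f), max(1, max(abs(c) for c in f))
bound_holds = all(abs(q[l][j]) <= (d + 1) ** (l - d) * gauss_norm ** (l - d + 1)
                  for l in range(d, 11) for j in range(d))
print(bound_holds)
```

Here the Gauss norm of f includes the leading coefficient 1, which is why the maximum with 1 is taken.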

Let ϕ : X → P^n_Q̄ be a closed embedding over Q̄ and let x_0, ..., x_n be the standard
coordinates of P^n_Q̄. Let p be a vector with entries p_i ∈ S, i = 0, ..., n, homogeneous of
degree d(p).
Definition 2.5.4. The vector p is said to be a presentation of ϕ if the following conditions
are satisfied:
(a) If l ∈ {0, ..., n} and x_l|_X ≠ 0, then we have p_l ≠ 0.
(b) If l is as in (a) and i ∈ {0, ..., n}, then
$$\frac{p_i}{p_l} = \left. \frac{x_i}{x_l} \right|_X$$
in Q̄(X).
2.5.5. The number d(p) is called the degree of p . Consider the vector whose entries
are given by all the coefficients of p0 , . . . , pn . The height of the corresponding point in
appropriate projective space is called the height of the presentation, denoted by h(p) .
The existence of a presentation of ϕ is an obvious consequence of Q(X) = Q(π(X)) .
Lemma 2.5.6. Let ϕ_j : X → P^{n_j}_Q̄, j = 1, ..., k, be closed embeddings over Q̄ with
presentations p^{(j)} and let n := (n_1 + 1) ··· (n_k + 1) − 1. Then the join ϕ_1 # ··· # ϕ_k
gives a closed embedding
$$\varphi : X \longrightarrow \mathbb{P}^n_{\overline{\mathbb{Q}}}, \quad P \longmapsto (\varphi_{1 i_1}(P) \cdots \varphi_{k i_k}(P))_{i_j \in \{0, \dots, n_j\}}.$$
It has a presentation p defined by
$$p_i := p^{(1)}_{i_1} \cdots p^{(k)}_{i_k} \qquad (i_j \in \{0, \dots, n_j\})$$
of degree d(p) = d(p^{(1)}) + ··· + d(p^{(k)}) and height
$$h(p) \le \sum_{j=1}^{k} h(p^{(j)}) + r \sum_{j=1}^{k-1} \log\bigl(6 + 6 \, d(p^{(j)})/r\bigr) + C \cdot (k-1)$$
with
$$C = (d-1) h(f) + d(d+r+1).$$
Proof: By Remark 2.4.3, ϕ is a closed embedding. Also, p is a presentation of ϕ of degree
$$d(p) = d(p^{(1)}) + \cdots + d(p^{(k)}).$$
To prove the estimate for the height, by induction we may assume k = 2. We have the
decomposition
$$p^{(j)}_{i_j} = \sum_{m=0}^{d-1} p^{(j)}_{i_j m} \, \bar{z}_{r+1}^{\,m} \qquad (j = 1, 2),$$
whence
$$p_i = \Bigl( \sum_{m_1=0}^{d-1} p^{(1)}_{i_1 m_1} \bar{z}_{r+1}^{\,m_1} \Bigr) \Bigl( \sum_{m_2=0}^{d-1} p^{(2)}_{i_2 m_2} \bar{z}_{r+1}^{\,m_2} \Bigr).$$
Equation (2.6) leads to the decomposition
$$p_i = \sum_{m=0}^{d-1} p_{im} \, \bar{z}_{r+1}^{\,m}$$
with
$$p_{im} := \sum_{m_1 + m_2 = m} p^{(1)}_{i_1 m_1} p^{(2)}_{i_2 m_2} + \sum_{l=d}^{2d-2} \Bigl( \sum_{\substack{m_1 + m_2 = l \\ m_1, m_2 \le d-1}} p^{(1)}_{i_1 m_1} p^{(2)}_{i_2 m_2} \Bigr) q_{lm}.$$
Let F be a number field extension of Q containing the coefficients of f_0, ..., f_{d−1} and of
all p^{(j)}_{i_j m_j}, j = 1, 2, and for v ∈ M_F define δ_v as in 2.5.3. Then we verify
$$|p_{im}|_v \le |B|_v^{\delta_v} \, |p^{(1)}_{i_1}|_v \, |p^{(2)}_{i_2}|_v \max_{l=d,\dots,2d-2} (1, |q_{lm}|_v), \tag{2.10}$$
where B is an upper bound for
$$\sum_{m_1=0}^{m} \binom{r + d(p^{(1)}) - m_1}{r} + \sum_{l=d}^{2d-2} \sum_{m_1=l-d+1}^{d-1} \binom{r + d(p^{(1)}) - m_1}{r} \binom{r+l-m}{r}.$$
Thereby we have used the fact that the number of monomials of degree D in r + 1 variables
is equal to $\binom{r+D}{D}$. We use the estimates
$$\binom{r + d(p^{(1)}) - m_1}{r} \le \frac{1}{r!} \bigl( r + d(p^{(1)}) \bigr)^r < \bigl( 3 + 3 \, d(p^{(1)})/r \bigr)^r$$
and
$$\binom{r+l-m}{r} \le 2^{r+l-m}$$
to conclude that
$$B := d \, 2^{r+2d} \bigl( 3 + 3 \, d(p^{(1)})/r \bigr)^r$$
is such an upper bound. From (2.9) and (2.10), we get
$$h(p) \le h(p^{(1)}) + h(p^{(2)}) + \log B + (d-1) h(f) + (d-2) \log \binom{r+d+1}{d+1}.$$
With the above value for B, we have
$$h(p) \le h(p^{(1)}) + h(p^{(2)}) + r \log\bigl(6 + 6 \, d(p^{(1)})/r\bigr) + C$$
with
$$C := (d-1) h(f) + d(d+r+1). \qquad \square$$

Remark 2.5.7. If we work over a fixed number field K and with an irreducible reduced
projective variety, then the constructions in 2.5.1 to 2.5.3 can be done over K and every
K-morphism to projective space has a presentation defined over K . Moreover, Lemma
2.5.6 remains valid.

2.5.8. We use the following notation. For a multi-index α = (α_0, ..., α_N) ∈ N^{N+1}, we
set
$$|\alpha| := \alpha_0 + \cdots + \alpha_N$$
and for x = (x_0, ..., x_N) ∈ Q̄^{N+1} we define
$$x^\alpha := x_0^{\alpha_0} \cdots x_N^{\alpha_N}.$$
If Y is a closed subvariety of P^N_Q̄, we denote by 𝒥_Y the ideal sheaf of Y.

Proposition 2.5.9. Let ϕ : X → P^n_Q̄, ψ : X → P^m_Q̄ be closed embeddings over Q̄, with
corresponding presentations p, q. We assume
$$\varphi^* O_{\mathbb{P}^n}(1) \cong \psi^* O_{\mathbb{P}^m}(1).$$
There is a positive integer k_ψ such that if k ≥ k_ψ, then
$$H^1\bigl( \mathbb{P}^m_{\overline{\mathbb{Q}}}, \mathcal{J}_{\psi X} \otimes O_{\mathbb{P}^m}(k) \bigr) = 0.$$
If k ≥ k_ψ and χ(k) := dim H^0(X, ψ*O_{P^m}(k)) and P ∈ X, then
$$h_\varphi(P) - h_\psi(P) \le (n+1)\chi(k) \Bigl( h(p) + h(q) + r \log(6 + 6 d(p)/r) + r \log(6 + 6 d(q)/r) + \frac{1}{k} \log((n+1)\chi(k)) + C \Bigr),$$
where
$$C := (d-1) h(f) + d(d+r+1).$$
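Proof: In order to understand the proof, the reader should be familiar with some basic facts
from cohomology of sheaves, as given in A.10.
The existence of k_ψ is a well-known result (see A.10.27). There is a short exact sequence
$$0 \longrightarrow \mathcal{J}_{\psi X} \longrightarrow O_{\mathbb{P}^m} \longrightarrow \psi_* O_X \longrightarrow 0$$
of coherent sheaves on P^m_Q̄. Tensoring with O_{P^m}(k) yields a short exact sequence
$$0 \longrightarrow \mathcal{J}_{\psi X} \otimes O_{\mathbb{P}^m}(k) \longrightarrow O_{\mathbb{P}^m}(k) \longrightarrow (\psi_* O_X) \otimes O_{\mathbb{P}^m}(k) \longrightarrow 0.$$
The projection formula gives
$$(\psi_* O_X) \otimes O_{\mathbb{P}^m}(k) \cong \psi_* \psi^* O_{\mathbb{P}^m}(k),$$
as we have verified in Example A.10.20. Using A.10.25, the first terms of the long exact
cohomology sequence are
$$0 \to H^0\bigl( \mathbb{P}^m_{\overline{\mathbb{Q}}}, \mathcal{J}_{\psi X} \otimes O_{\mathbb{P}^m}(k) \bigr) \to H^0\bigl( \mathbb{P}^m_{\overline{\mathbb{Q}}}, O_{\mathbb{P}^m}(k) \bigr) \to H^0\bigl( X, \psi^* O_{\mathbb{P}^m}(k) \bigr) \to H^1\bigl( \mathbb{P}^m_{\overline{\mathbb{Q}}}, \mathcal{J}_{\psi X} \otimes O_{\mathbb{P}^m}(k) \bigr) \to \cdots.$$
The last cohomology group is 0 by the choice of k, and we infer that the map
$$H^0\bigl( \mathbb{P}^m_{\overline{\mathbb{Q}}}, O_{\mathbb{P}^m}(k) \bigr) \longrightarrow H^0\bigl( X, \psi^* O_{\mathbb{P}^m}(k) \bigr)$$
is surjective. By assumption, the invertible sheaves ϕ*O_{P^n}(1) and ψ*O_{P^m}(1) are isomorphic
and we may identify them. Let x = (x_0 : ··· : x_n) and y = (y_0 : ··· : y_m) be the
standard coordinates of P^n_Q̄ and P^m_Q̄, and choose B ⊂ {β ∈ N^{m+1} | |β| = k} such that
$$\bigl( y^\beta|_X \bigr)_{\beta \in B}$$
is a basis of H^0(X, ψ*O_{P^m}(k)). There are uniquely determined a_{iβ} ∈ Q̄ such that
$$x_i^k|_X = \sum_{\beta \in B} a_{i\beta} \, y^\beta|_X. \tag{2.11}$$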
Let P ∈ X(Q̄). Choose a number field F containing x_i(P), y_j(P) and a_{iβ} for all i, j, β.
By Proposition 2.4.4, we obtain, for the k-fold join ψ^{(k)} := ψ# ··· #ψ, the equation
$$k(h_\varphi(P) - h_\psi(P)) = \sum_{v \in M_F} \log \max_i |x_i^k(P)|_v - h_{\psi^{(k)}}(P) = \sum_{v \in M_F} \log \max_i |x_i^k(P)|_v - \sum_{v \in M_F} \log \max_{|\beta| = k} |y^\beta(P)|_v.$$
By (2.11), the triangle inequality and Lemma 1.3.7, we deduce
$$h_\varphi(P) - h_\psi(P) \le \frac{1}{k} h(a) + \frac{1}{k} \log \chi(k), \tag{2.12}$$
where a is the matrix (a_{iβ}) and h(a) is the height of the matrix viewed as a point in
the appropriate projective space.
We take the ratio of equations (2.11) with indices i and l and deduce, using the definition
of presentation, that
$$\sum_{\beta \in B} a_{l\beta} \, p_i^k q^\beta = \sum_{\beta \in B} a_{i\beta} \, p_l^k q^\beta \qquad (i, l \in \{0, \dots, n\}). \tag{2.13}$$
Conversely, assume that (a_{iβ}) is a non-trivial solution of this equation. Then we have
$$(x_i|_X)^k \sum_{\beta \in B} a_{l\beta} \, (y|_X)^\beta = (x_l|_X)^k \sum_{\beta \in B} a_{i\beta} \, (y|_X)^\beta.$$
Let i be such that x_i|_X is not identically 0. Then the last displayed equation shows that
the rational function on X defined by
$$g := \frac{\sum_{\beta \in B} a_{i\beta} \, y^\beta|_X}{x_i^k|_X}$$
does not depend on the index i. We claim that g is constant. To prove this, it suffices
to show that g is a regular function (use that X is projective and A.6.15). Indeed, since
x_0, ..., x_n generate O_{P^n}(1), we see that for any point P ∈ X(Q̄), there is an index i
such that x_i(P) ≠ 0, hence g is regular at P, as asserted.
This proves that the space of solutions of (2.13) is spanned by the matrix a = (a_{iβ}) given
by (2.11).
Our next task is to estimate h(a) . Since a scalar factor does not change the height, we may
estimate the height of any non-trivial solution of (2.13). By Lemma 2.5.6, we have a natural
presentation of ϕ(k) #ψ (k) in terms of p and q . The elements pki qβ of S are entries of
that presentation. The decomposition

d−1
pki qβ = cβij z jr+1
j=0

with uniquely determined cβij ∈ Q[z0 , . . . , zr ] leads to the system of equations


 
cβij alβ − cβlj aiβ = 0, (2.14)
β∈B β∈B

with i, l ∈ {0, . . . , n} and we have j ∈ {0, . . . , d − 1} .


52 WEIL HEIGHTS


Let cβij = cβijα z0α 0 · · · zrα r , so that the coefficients cβijα of the polynomials cβij
form a matrix c with
h(c) ≤ k (h(p) + h(q) + r log(6 + 6d(p)/r) + r log(6 + 6d(q)/r) + C) (2.15)
again by Lemma 2.5.6. Moreover, (2.14) is equivalent to the linear system of equations

(cβijα alβ − cβljα aiβ ) = 0
β∈B

indexed by i, l, j, α and unknowns aiβ . Let A denote the matrix associated to this linear
system; its entries are either 0 or ±cβijα . The number of unknowns is (n + 1)|B| and,
as remarked before, the space of solutions has dimension 1 . Therefore, the rank R of the
matrix A is
R = (n + 1)χ(k) − 1.
Let $A'$ be an $R \times (R+1)$ submatrix of $A$ of full rank $R$. Since $A$ and $A'$ have the
same kernel, we look for a non-zero solution $a$ of $A' \cdot a = 0$. We consider $a$ as a vector
$(a_0,\dots,a_R)$ and $A'$ as a matrix of the form $(A'_{\mu\nu})_{\mu\in\{1,\dots,R\},\,\nu\in\{0,\dots,R\}}$. Obviously, the
vector $a$ with $\rho$th entry
\[
a_\rho := (-1)^{\rho+1} \det\bigl(A'_{\mu\nu}\bigr)_{\mu\in\{1,\dots,R\},\,\nu\in\{0,\dots,R\}\setminus\{\rho\}}
\]
is a non-zero solution of $A' \cdot a = 0$. The estimate
\[
\max_{\rho=0,\dots,R} |a_\rho|_v \le |R!|_v^{\delta_v}\, \max_{\beta,i,j,\alpha} |c_{\beta ij\alpha}|_v^{R}
\]

and (2.15) lead to


h(a) ≤ Rk (h(p) + h(q) + r log(6 + 6d(p)/r) + r log(6 + 6d(q)/r) + C) + log(R!).
By (2.12) and the definition of R , we now get

\[
h_\varphi(P) - h_\psi(P) \le (n+1)\chi(k)\Bigl( h(p) + h(q) + r\log(6 + 6d(p)/r) + r\log(6 + 6d(q)/r) + C + \frac{1}{k}\log\bigl((n+1)\chi(k)\bigr) \Bigr),
\]
proving the proposition.
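The cofactor construction used in the proof above can be tried out on a small matrix. The following is a minimal sketch, assuming exact rational entries; the helper names `det` and `cofactor_kernel` are illustrative and do not come from the text.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant of a square matrix via Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    sign, result = 1, Fraction(1)
    for col in range(n):
        # Find a row with a non-zero pivot in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return sign * result


def cofactor_kernel(a):
    """For an R x (R+1) matrix a, return the vector whose rho-th entry is
    (-1)^(rho+1) times the determinant of a with column rho removed."""
    rows, cols = len(a), len(a) + 1
    assert all(len(row) == cols for row in a), "matrix must be R x (R+1)"
    kernel = []
    for rho in range(cols):
        minor = [[row[nu] for nu in range(cols) if nu != rho] for row in a]
        kernel.append((-1) ** (rho + 1) * det(minor))
    return kernel


example = [[1, 2, 3], [4, 5, 6]]
solution = cofactor_kernel(example)
# Every row of the matrix annihilates the cofactor vector.
residuals = [sum(x * y for x, y in zip(row, solution)) for row in example]
```

Each entry is a determinant of size R, which is the origin of the factor R! and of the R-th power of the largest coefficient in the archimedean estimate.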
The upper bound in Proposition 2.5.9 is quite explicit except for χ(k) . The next lemmas
will handle this problem.
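Lemma 2.5.10. Let $\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}$ be a closed embedding over $\overline{\mathbb{Q}}$ with presentation $q$ and
let $k \ge k_\psi$, $\chi(k)$ be as in Proposition 2.5.9. Then
\[
\chi(k) \le \binom{kd(q) + r + 1}{r+1} - \binom{kd(q) - d + r + 1}{r+1}.
\]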
Proof: Let $y_0,\dots,y_m$ be the standard coordinates of $\mathbb{P}^m_{\overline{\mathbb{Q}}}$. We have seen in the proof of
Proposition 2.5.9 that the linear map
\[
H^0\bigl(\mathbb{P}^m_{\overline{\mathbb{Q}}}, O_{\mathbb{P}^m}(k)\bigr) \longrightarrow H^0\bigl(X, \psi^* O_{\mathbb{P}^m}(k)\bigr)
\]
is surjective. Choose $B \subset \{\beta \in \mathbb{N}^{m+1} \mid |\beta| = k\}$ such that
\[
\bigl(y^\beta|_X\bigr)_{\beta\in B}
\]
is a basis of $H^0(X, \psi^* O_{\mathbb{P}^m}(k))$. The above monomials are linearly independent if and
only if the polynomials $(q^\beta)_{\beta\in B}$ are linearly independent, by definition of a presentation.
Therefore
\[
\chi(k) \le \dim S_{kd(q)} = \binom{kd(q) + r + 1}{r+1} - \binom{kd(q) - d + r + 1}{r+1},
\]
because by 2.5.2 the space $S_{kd(q)}$ is isomorphic to the vector space of homogeneous polynomials
$p(z_0,\dots,z_{r+1})$ of degree $kd(q)$ satisfying $\deg_{z_{r+1}}(p) < d$ and because of (2.9)
on page 47.
The next result is a slight generalization of a result of Mumford. Note that by base change
A.10.28, we may just as well work over the field of complex numbers. For details and a
proof of this lemma, we refer to A. Bertram, L. Ein, and R. Lazarsfeld [22].
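Lemma 2.5.11. Let $Y$ be an irreducible smooth closed subvariety of $\mathbb{P}^m_{\overline{\mathbb{Q}}}$ defined over $\overline{\mathbb{Q}}$
and let $c := \min\bigl(1 + \dim(Y), \operatorname{codim}(Y, \mathbb{P}^m_{\overline{\mathbb{Q}}})\bigr)$. Then
\[
H^i\bigl(\mathbb{P}^m_{\overline{\mathbb{Q}}}, \mathcal{J}_Y \otimes O_{\mathbb{P}^m}(k)\bigr) = 0
\]
for $i \ge 1$ and $k \ge c\,(\deg(Y) - 1) - \dim(Y)$.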


Lemma 2.5.12. If $\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}$ is a closed embedding over $\overline{\mathbb{Q}}$ with presentation $q$, then
\[
\deg(\psi X) \le d(q)^r\, d.
\]
Proof: The Hilbert polynomial of $\psi X$ has degree $r$ and its leading coefficient is
$\deg(\psi X)/r!$, see A.10.33. For large $k$, the Hilbert polynomial at $k$ equals the left-hand
side of the inequality in Lemma 2.5.10. On the other hand, the right-hand side of that
inequality is also a polynomial of degree $r$ in $k$, with leading coefficient
\[
\frac{d(q)^r}{(r+1)!}\bigl((r+1) + \dots + 1\bigr) - \frac{d(q)^r}{(r+1)!}\bigl((r+1-d) + \dots + (1-d)\bigr) = \frac{d(q)^r\, d}{r!}.
\]
The conclusion follows.
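The binomial bound of Lemma 2.5.10 and the leading coefficient used in the proof of Lemma 2.5.12 can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, `dq` stands for d(q), and the second binomial coefficient is taken to be zero when its upper entry would be negative (an assumption made so that the formula also covers very small degrees).

```python
from fractions import Fraction
from math import comb, factorial


def chi_bound(k, dq, d, r):
    """Right-hand side of Lemma 2.5.10: C(k*dq+r+1, r+1) - C(k*dq-d+r+1, r+1)."""
    top = k * dq + r + 1
    # Assumption: treat the second binomial as 0 when its upper entry is negative.
    return comb(top, r + 1) - comb(max(top - d, 0), r + 1)


def monomial_count(degree, d, r):
    """Count monomials of the given degree in z_0..z_{r+1} whose exponent
    in z_{r+1} is smaller than d, by summing over that exponent."""
    return sum(comb(degree - j + r, r) for j in range(min(d, degree + 1)))


def leading_coefficient(dq, d, r):
    """Leading coefficient d(q)^r * d / r! of the bound, as a polynomial in k."""
    return Fraction(dq ** r * d, factorial(r))


# Sample values chosen only for illustration.
dq, d, r = 3, 4, 2
small = chi_bound(1, dq, d, r)
ratio = Fraction(chi_bound(10 ** 6, dq, d, r), (10 ** 6) ** r)
```

For these sample values the bound equals 18k^2 + 2, so the ratio to k^r approaches the leading coefficient 18 as k grows.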
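2.5.13. Now we summarize the results of this section. Let $X$ be a smooth irreducible
projective variety over $\overline{\mathbb{Q}}$. Let $r = \dim(X)$ and let $\pi : X \to \mathbb{P}^{r+1}_{\overline{\mathbb{Q}}}$ be a morphism over
$\overline{\mathbb{Q}}$, mapping $X$ birationally onto a hypersurface given by a homogeneous polynomial $f$ of
degree $d$, as in 2.5.1.
Assume that $\varphi : X \to \mathbb{P}^n_{\overline{\mathbb{Q}}}$ and $\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}$ are closed embeddings over $\overline{\mathbb{Q}}$ with
$\varphi^* O_{\mathbb{P}^n}(1) \cong \psi^* O_{\mathbb{P}^m}(1)$ and corresponding presentations $p$ and $q$. We also assume
$d(q) \ge 1$.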
Theorem 2.5.14. For each $P \in X$, we have
\[
h_\varphi(P) - h_\psi(P) \le C_1 (n+1)\, d(q)^{r^2+r} \Bigl( h(p) + h(q) + r\log(6 + 6d(p)/r) + r\log(6 + 6d(q)/r) + \log(n+1) + C_2 \Bigr)
\]
with
\[
C_1 = d\,\frac{(d+1)^r (r+1)^r}{r!}, \qquad C_2 = (d-1)h(f) + d(d+r+1) + r + 1.
\]

Proof: Let $k := d\,(r+1)\,d(q)^r$; then $k \ge k_\psi$ by Lemmas 2.5.11 and 2.5.12. We have
\[
\chi(k) \le \binom{kd(q) + r + 1}{r+1} - \binom{kd(q) + r + 1 - d}{r+1} \le d\binom{kd(q) + r}{r} \le \frac{d}{r!}\,\bigl(kd(q) + r\bigr)^r,
\]
where the first step comes from Lemma 2.5.10 and the second step uses (2.8) on page 47.
From the definition of $k$, this implies
\[
\chi(k) \le d\,\frac{(d+1)^r (r+1)^r}{r!}\, d(q)^{r^2+r}.
\]
An easy majorization shows that $k^{-1}\log\chi(k) \le r+1$, and the result follows easily from
Proposition 2.5.9.
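The chain of inequalities in this proof can be evaluated for concrete parameters. The following sketch computes k, the constant C_1 and the four successive bounds; the function names and the sample values of d, r and d(q) are illustrative assumptions, not data from the text.

```python
from fractions import Fraction
from math import comb, factorial, log


def theorem_constants(d, r, dq):
    """Return k = d (r+1) d(q)^r and C_1 = d (d+1)^r (r+1)^r / r!."""
    k = d * (r + 1) * dq ** r
    c1 = Fraction(d * (d + 1) ** r * (r + 1) ** r, factorial(r))
    return k, c1


def bound_chain(d, r, dq):
    """Successive upper bounds for chi(k) appearing in the proof."""
    k, c1 = theorem_constants(d, r, dq)
    big = k * dq
    step0 = comb(big + r + 1, r + 1) - comb(big + r + 1 - d, r + 1)
    step1 = d * comb(big + r, r)
    step2 = Fraction(d * (big + r) ** r, factorial(r))
    step3 = c1 * dq ** (r * r + r)
    return [step0, step1, step2, step3]


# Sample parameters chosen only for illustration.
d, r, dq = 3, 2, 2
k, c1 = theorem_constants(d, r, dq)
chain = bound_chain(d, r, dq)
# The "easy majorization": log(chi(k)) / k stays below r + 1.
majorization = log(float(chain[-1])) / k
```

With these values the chain reads 7885, 8103, 8214, 13824, and the last entry is C_1 times d(q)^(r^2+r).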

2.6. Bounded subsets

In order to show that a Weil height is determined by the isomorphism class of a
line bundle, we have introduced in Section 2.2 bounded sets in affine varieties.
In this section, we extend this concept first to arbitrary varieties and thereafter to
several absolute values. Implicitly, this was already used in the proof of Theorem
2.2.11, which now becomes more transparent. This section is used in 2.7 to study
locally bounded metrics on line bundles. Moreover, it is basic for proving the
Chevalley–Weil theorem in Section 10.3.
2.6.1. Let $K$ be a field and let us fix an embedding $K \subset \overline{K}$ into an algebraic
closure. For the moment, we fix an absolute value $|\ |$ on $\overline{K}$.
We have defined in 2.2.8 bounded subsets of U (K) for an affine variety U . We
extend this notion to an arbitrary variety X over K .

Definition 2.6.2. A subset $E \subset X(K)$ is called bounded in $X$, if there is a finite
covering $\{U_i\}_{i\in I}$ of $X$ by affine open subsets, and sets $E_i$ with $E_i \subset U_i(K)$,
such that $E_i$ is bounded in $U_i$ and $E = \bigcup_{i\in I} E_i$.

Remark 2.6.3. If $E$ is bounded in $X$, Lemma 2.2.10 shows that for any finite
covering $\{U_i\}_{i\in I}$ of $X$ by affine open subsets there is a subdivision
\[
E = \bigcup_{i\in I} E_i, \qquad E_i \subset U_i(K),
\]
such that each $E_i$ is bounded in $U_i$.
It is easy to prove that the image of a bounded set under a morphism is again
bounded. Moreover, if Y is a closed subvariety of X and E ⊂ Y (K), then E is
bounded in Y if E is bounded in X . The details of the proofs will be left to the
reader.

Example 2.6.4. In this example, we assume that K is locally compact with re-
spect to | | (for example, the completion of a number field with respect to a place,
cf. 1.2.12). We consider on X(K) the topology induced locally by open balls with
respect to closed embeddings into affine spaces and the maximum norm. Then the
topology is locally compact and independent of the embeddings. It depends only
on the place v represented by | | and is called the v -topology on X(K). A subset
E of X(K) is bounded in X if and only if E is relatively compact in X(K).
In order to prove this statement, suppose first that E is bounded in X . By defini-
tion, E may be covered by finitely many closed balls in affine spaces. Since these
balls are compact, we conclude that E is relatively compact.
On the other hand, let E be relatively compact in X. Passing to the closure, we
may assume E compact. Then E may be covered by finitely many open balls in
affine spaces. By Lemmas 2.2.9 and 2.2.10, E is bounded in X .

Example 2.6.5. The set $\mathbb{P}^n(K)$ is bounded in projective space $\mathbb{P}^n_K$. We can use
the affine covering $X_i := \{x \in \mathbb{P}^n_K \mid x_i \neq 0\}$, $i \in \{0,\dots,n\}$, and the decomposition
$E_i := \{x \in \mathbb{P}^n_K \mid |x_i| = \max_{j=0,\dots,n} |x_j|\}$ of $E$. By Remark 2.6.3, the set of
K-rational points is bounded in any projective variety. Implicitly, we have already
used these facts in the proof of Theorem 2.2.11.
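The decomposition in Example 2.6.5 can be made concrete: a point is assigned to the chart whose coordinate has maximal absolute value, and in that chart all affine coordinates have absolute value at most 1. The sketch below assumes rational homogeneous coordinates and the ordinary absolute value on Q; the function name `max_coordinate_chart` is hypothetical.

```python
from fractions import Fraction


def max_coordinate_chart(x):
    """Return (i, affine) where i is an index with |x_i| maximal and
    affine is the vector of quotients x_j / x_i in the chart x_i != 0."""
    i = max(range(len(x)), key=lambda j: abs(x[j]))
    if x[i] == 0:
        raise ValueError("homogeneous coordinates must not all vanish")
    affine = [Fraction(xj) / Fraction(x[i]) for xj in x]
    return i, affine


point = [Fraction(1, 3), Fraction(-7), Fraction(2)]
index, affine = max_coordinate_chart(point)
```

Since every quotient is bounded by 1 in absolute value, each piece of the decomposition is bounded in its affine chart, whatever the original point was.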

Proposition 2.6.6. If $X$ is a complete variety over $K$, then $X(K)$ is bounded
in $X$. More generally, the inverse image of a bounded subset under a proper
morphism remains bounded.
Proof: By Chow's lemma in A.9.37, there is a projective variety $Y$ over $K$ and
a surjective birational morphism $Y \to X$. Using Remark 2.6.3 and Example
2.6.5, $X(K)$ has to be bounded in $X$.
More generally, let $\varphi : X' \to X$ be a proper morphism of arbitrary varieties over
$K$ and let $E \subset X(K)$ be bounded in $X$. By Chow's lemma again, the proof is
reduced to the case of a projective morphism and hence to $X' = X \times \mathbb{P}^n_K$ with
$\varphi$ the first projection. By Remark 2.6.3, we may further assume $X$ affine and
then the same arguments as in Example 2.6.5 prove that $\varphi^{-1}(E)$ is bounded in
$X \times \mathbb{P}^n_K$.

Remark 2.6.7. It is trivial that any subset of a bounded subset is bounded again.
However, we may not pass from $X$ to an open subset. For example, the set $E =
\{x \in \mathbb{P}^n(K) \mid x_0 \neq 0\}$ is bounded in $\mathbb{P}^n_K$ by Example 2.6.5 but it is certainly
not bounded in the affine space $\{x \in \mathbb{P}^n_K \mid x_0 \neq 0\}$. Thus the notion of bounded
subset is not a local one and some care is needed when using it.

Definition 2.6.8. A real function $f$ on a K-variety $X$ is called locally bounded
if $f(E)$ is bounded for every bounded $E$ in $X$.

Remark 2.6.9. The locally bounded functions on $X$ form an $\mathbb{R}$-algebra. Proposition
2.6.6 shows that a real function on a complete K-variety is locally bounded
if and only if it is bounded.
2.6.10. To apply this later on to the theory of heights, we need a generalization to
several absolute values. Let $M_K$ be a set of places on $K$. For every $v \in M_K$, an
absolute value $|\ |_v$ is fixed in the equivalence class of $v$. We assume that
\[
\{v \in M_K \mid |\alpha|_v \neq 1\}
\]
is finite for every $\alpha \in K \setminus \{0\}$. Let $M$ be a set of places on $\overline{K}$. We assume that
every $u \in M$ restricts to a $v \in M_K$ and we denote by $|\ |_u$ the unique extension
of $|\ |_v$ to an absolute value representing $u$.
Definition 2.6.11. Let $U$ be an affine K-variety and let $(E^u)_{u\in M}$ be a family of
subsets of $U(K)$. The family is said to be M-bounded in $U$ if for any $f \in K[U]$
the quantity
\[
C_v(f) := \sup_{u\in M,\ u|v}\ \sup_{P\in E^u} |f(P)|_u
\]
is finite for every $v \in M_K$ and $C_v(f) > 1$ for only finitely many $v$.


If M has only one element, then E ⊂ U (K) is M-bounded in U if and only if E
is bounded in U in the sense of Definition 2.2.8.
Remark 2.6.12. We note that Lemmas 2.2.9 and 2.2.10 extend to the situation
with several absolute values instead of one. Indeed, if we replace in their formula-
tions E by a family (E u )u∈M of subsets and “bounded” by “M-bounded,” then
these statements continue to hold.
The straightforward generalization of the proofs is left to the reader. Implicitly,
this was already used in the proof of Theorem 2.2.11.
Definition 2.6.13. Still under the hypothesis of 2.6.10, let $X$ be a K-variety and
let $(E^u)_{u\in M}$ be a family of subsets of $X(K)$. The family is called M-bounded
in $X$ if there is a finite covering $\{U_i\}_{i\in I}$ of $X$ by affine open subsets and, for any
$u \in M$, a decomposition
\[
E^u = \bigcup_{i\in I} E_i^u, \qquad E_i^u \subset U_i(K),
\]
such that $(E_i^u)_{u\in M}$ is M-bounded in $U_i$ in the sense of Definition 2.6.11 for every
$i \in I$.

Remark 2.6.14. If we make the same changes in Remark 2.6.3 as proposed in
Remark 2.6.12, then Remark 2.6.3 still holds. If $M$ has only one element, then
$E \subset X(K)$ is M-bounded if and only if $E$ is bounded in the sense of Definition
2.6.2.

Definition 2.6.15. A subset $E \subset X(K)$ is called M-bounded in $X$ if the constant
family $(E)_{u\in M}$ is M-bounded.

Example 2.6.16. The set Pn (K) is M-bounded in PnK . The same covering and
the same decomposition as in Example 2.6.5 work. Note that the old Ei now
depend on u ∈ M . Even by starting with a single set, we have to work with
families.
Proposition 2.6.17. A complete K-variety is M-bounded. More generally, the
inverse image of an M-bounded family of subsets under a proper morphism is
M-bounded.
Proof: We leave it to the reader to make the obvious adjustments in the proof of
Proposition 2.6.6. 
Definition 2.6.18. A real function $f$ on $X \times M$ is called locally M-bounded if,
for any M-bounded family $(E^u)_{u\in M}$ in $X$, there is for every $v \in M_K$ a non-negative
real number $\gamma_v$, with $\gamma_v \neq 0$ only for finitely many $v \in M_K$, such that
for all $u \in M$ with $v|u$ we have
\[
|f(E^u, u)| \le \gamma_v .
\]

Example 2.6.19. If $a$ is a regular function on an affine variety, then the function
$(x, u) \mapsto |a(x)|_u$ is not necessarily locally bounded because infinitely many
bounds may be different from 0. But, if $a$ is nowhere vanishing, then the function
$(x, u) \mapsto \log|a(x)|_u$ is locally bounded. This is the typical situation where we
apply this notion.

2.7. Metrized line bundles and local heights

This section contains additional material for local and global heights. It will be
used only in Sections 9.5 and 9.6 and may be skipped in a first reading. In Sections
2.2–2.5, we have studied Weil heights coming from morphisms to projective space.
However, as remarked by A. Néron and as we will see in 9.5, the canonical height
functions on abelian varieties are not of this shape. To deal with this situation,
a local height is associated to every locally bounded metric of a line bundle on
an arbitrary variety. We will extend the results from Sections 2.2 and 2.3 to this
framework.
Let $K$ be a field with a fixed embedding into its algebraic closure $\overline{K}$. For the
moment, we fix an absolute value $|\ |$ on $\overline{K}$. From Definition 2.7.12 on, we
deal with several absolute values. In order to define global heights, we will assume
in 2.7.16–2.7.19 that the product formula holds.

Definition 2.7.1. Let $L$ be a line bundle on the K-variety $X$. A metric on $L$ is
a norm $\|\ \|$ on every fibre $L_x$, $x \in X$, i.e. a real function not identically zero
such that
\[
\|\lambda v\| = |\lambda| \cdot \|v\| \qquad (\lambda \in K,\ v \in L_x).
\]
The pair $(L, \|\ \|)$ is called a metrized line bundle, usually denoted by $\overline{L}$. The
metric is said to be locally bounded if $\log\|s\|$ is locally bounded on $U$ for every
open subset $U$ of $X$ and every nowhere vanishing section $s \in L(U)$.
Example 2.7.2. Let $f$ be a regular function on $X$. Then almost by definition
$\log|f|$ is locally bounded on $\{x \in X \mid f(x) \neq 0\}$. The trivial metric on $O_X$ is
characterized by $\|1\| = |1|$. We conclude that the trivial metric is locally bounded.

2.7.3. Let $\overline{L} = (L, \|\ \|)$, $\overline{M} = (M, \|\ \|')$ be metrized line bundles on the K-variety
$X$. Then the tensor product $\overline{L} \otimes \overline{M}$ is the metrized line bundle $(L \otimes M, \|\ \|)$,
where the metric on $L \otimes M$ is given by
\[
\|v \otimes w\| := \|v\| \cdot \|w\|' \qquad (v \in L_x,\ w \in M_x)
\]
for any $x \in X$. Two metrized line bundles on $X$ are called isometric if there
is an isomorphism which is fibrewise an isometry. The isometry classes of metrized line
bundles on $X$ form a group $\widehat{\operatorname{Pic}}(X)$. The identity is $O_X$ with the trivial metric
and the inverse of $\overline{L}$ is $(L^{-1}, \|\ \|^{-1})$. The locally bounded metrized line bundles
obviously form a subgroup.
For a morphism $\varphi : X' \to X$ of varieties over $K$, we define the pull-back
$\varphi^*(\overline{L}) = (\varphi^* L, \|\ \|)$ as the metrized line bundle on $X'$ with metric on $\varphi^*(L)$
characterized by $\|\varphi^*(s)\| = \|s\| \circ \varphi$ for any open subset $U$ of $X$ and $s \in L(U)$.
The pull-back induces a group homomorphism from $\widehat{\operatorname{Pic}}(X)$ to $\widehat{\operatorname{Pic}}(X')$. By
Remark 2.6.3, the pull-back of a locally bounded metrized line bundle remains
locally bounded.
Example 2.7.4. On $O_{\mathbb{P}^n_K}(1)$, we have the standard metric characterized by
\[
\|\ell(x)\| = \frac{|\ell(x)|}{\max_{j=0,\dots,n} |x_j|}
\]
for any linear form $\ell$ in the coordinates $x_0,\dots,x_n$. If $|\ |$ is archimedean, we
often use the Fubini–Study metric
\[
\|\ell(x)\|_2 = \frac{|\ell(x)|}{\bigl(\sum_{j=0}^{n} |x_j|^2\bigr)^{1/2}} .
\]
We claim that both metrics are locally bounded. Let $s$ be a nowhere vanishing
section of $O_{\mathbb{P}^n_K}(1)$ over an open subset $U$ of $\mathbb{P}^n_K$ and let $E$ be bounded in $U$.
By Remark 2.6.3, we may assume that $U$ is contained in a standard open subset
$\{x \in \mathbb{P}^n_K \mid x_j \neq 0\}$. Then $s = f x_j$ for an invertible regular function $f$ on $U$.
By definition, $\log|f|$ is bounded on $E$. For $\log\|x_j\|$ (resp. $\log\|x_j\|_2$), we have
the upper bound 0. A lower bound is easily obtained by using $|x_i/x_j|$ bounded
on $E$ for every $i \in \{0,\dots,n\}$. This proves the claim.
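The two metrics of Example 2.7.4 are easy to compare numerically at an archimedean place. The sketch below assumes complex coordinates with the usual absolute value; the function names are illustrative. It also shows that both norms are unchanged when the homogeneous coordinates are rescaled.

```python
import math


def linear_form(ell, x):
    """Evaluate the linear form with coefficient vector ell at x."""
    return sum(c * t for c, t in zip(ell, x))


def standard_norm(ell, x):
    """|ell(x)| divided by the maximum of the |x_j|."""
    return abs(linear_form(ell, x)) / max(abs(t) for t in x)


def fubini_study_norm(ell, x):
    """|ell(x)| divided by the Euclidean length of x."""
    return abs(linear_form(ell, x)) / math.sqrt(sum(abs(t) ** 2 for t in x))


ell = (1, 0, 0)            # the linear form x_0
x = (3, 4, 0)
scaled = tuple((2 - 1j) * t for t in x)

std, fs = standard_norm(ell, x), fubini_study_norm(ell, x)
std_scaled = standard_norm(ell, scaled)
fs_scaled = fubini_study_norm(ell, scaled)
```

Because the maximum of the |x_j| lies between the Euclidean length divided by sqrt(n+1) and the Euclidean length itself, the logarithms of the two norms differ by at most (1/2) log(n+1), which is one way to see that both metrics are locally bounded together.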
Proposition 2.7.5. Every line bundle on an arbitrary variety over K admits a
locally bounded metric.
Proof: For simplicity, we first prove the claim for a projective variety X . We may
assume that our line bundle L is generated by global sections s0 , . . . , sm because
every element in Pic(X) is the difference of two very ample ones (see A.6.10).
Then the morphism $\varphi : X \to \mathbb{P}^m_K$, given by $\varphi(x) = (s_0(x) : \dots : s_m(x))$,
satisfies $\varphi^* O_{\mathbb{P}^m}(1) = L$. The pull-back of the standard metric (cf. Example
2.7.4) is locally bounded. For further reference, we note that
\[
\|s_i(x)\| = \min_j \left| \frac{s_i}{s_j}(x) \right| . \tag{2.16}
\]
Now let $L$ be any line bundle on an arbitrary variety $X$ over $K$. We cover $X$
by finitely many affine trivializations $\{U_i\}_{i=1,\dots,m}$ of $L$. Let $s_i$ be a nowhere
vanishing section of $L$ over $U_i$. A first try to define the metric of $L$ on $U_j$ would
be formula (2.16) with $j$ restricted to the ones with $x \in U_j$. Clearly, $\log\|s_i\|$
would be bounded from above by 0 on $U_i$. But poles of $s_j$ along $U_i \setminus U_j$ would
make it impossible to give a lower bound on every bounded subset of $U_i$. The
following smoothing process avoids this problem.
Let xi1 , . . . , xini be coordinates on Ui . Note first that xi1 , . . . , xini , xj1 , . . . , xjnj
are coordinates on Ui ∩ Uj . This is clear by realizing Ui ∩ Uj as the closed affine
subvariety of Ui × Uj given by intersection with the diagonal (which is closed by
definition of a variety).
For notational simplicity, we add $x_{i0} = 1$ to the coordinates on every $U_i$. We
consider the transition function $g_{ji} = s_j/s_i$ of $L$ on $U_i \cap U_j$. Because $g_{ji}$ may
be written as a polynomial in the coordinates $x_i, x_j$, we have a constant $C_1$ and
$r \in \mathbb{N}$ such that
\[
|g_{ji}(x)| \le C_1 \max_{k,l}\bigl(|x_{ik}|, |x_{jl}|\bigr)^r \le C_1 \max_k |x_{ik}|^r \cdot \max_l |x_{jl}|^r \tag{2.17}
\]
for every $x \in U_i \cap U_j$. Note that we may choose $C_1$ and $r$ independently of $i, j$.
Now we set
\[
\|s_i(x)\| = \min_j \max_l \bigl|g_{ij}(x)\, x_{jl}^r\bigr| \tag{2.18}
\]
for every $x \in U_i$. Again $j$ ranges over all elements of $\{1,\dots,m\}$ with $x \in U_j$.
If we use the cocycle rule $g_{hj} = g_{hi}\, g_{ij}$, it is clear that
\[
\|s_h(x)\| = |g_{hi}(x)| \cdot \|s_i(x)\|
\]
for every $x \in U_h \cap U_i$. Hence (2.18) characterizes a well-defined metric of $L$ on
$X$.

To prove that this metric is locally bounded, let $E$ be a bounded subset of an open
subset $U$ of $X$ and let $s$ be a nowhere vanishing section of $L$ over $U$. By Remark
2.6.3, we may assume that $E \subset U_i$ for some $i \in \{1,\dots,m\}$. By definition, it is
clear that $\log|s/s_i|$ is bounded on $E$. Since
\[
\log\|s\| = \log\left|\frac{s}{s_i}\right| + \log\|s_i\|
\]
on $E$, it follows that we may assume $U = U_i$, $s = s_i$. There is a constant $C_2$
such that
\[
\max_{k=0,\dots,n_i} |x_{ik}| \le C_2 \tag{2.19}
\]
for every $x \in E$. This leads to the upper bound
\[
\log\|s_i(x)\| \le \log\max_l \bigl|g_{ii}(x)\, x_{il}^r\bigr| \le r\log C_2 .
\]
Using $x_{i0} = x_{j0} = 1$, $g_{ij}\, g_{ji} = 1$, formulas (2.17) and (2.19), we get
\[
\max_l \bigl|g_{ij}(x)\, x_{jl}^r\bigr| \ge \frac{1}{C_1}\Bigl(\max_k |x_{ik}|\Bigr)^{-r} \ge \frac{1}{C_1 C_2^r}
\]
leading to the lower bound
\[
\log\|s_i(x)\| \ge -\log C_1 - r\log C_2
\]
on $E$. This proves the claim.
Remark 2.7.6. It is sometimes useful to impose additional requirements on the
metrics. In the archimedean case, it is often convenient to require that the functions
$\|s\|$ be $C^\infty$ for every open subset $U$ and every nowhere vanishing $s \in L(U)$. In
this case, we should work with the metric $\|\ \|_2$ on $O_{\mathbb{P}^n_K}(1)$, because the standard
metric is not differentiable.
For a non-archimedean absolute value $|\ |$ on $K$ with place $u$, it is natural to
assume that $\|s(x)\| \in |K(x)|$ for every $x \in X$. Moreover, we may assume that
the functions $\|s\|$ are continuous with respect to the $u$-topology on $U(K)$ defined
in Example 2.6.4.
All the results above remain valid for this kind of metrics. In particular, such
metrics always exist on any line bundle. In the non-archimedean case, the metric
constructed in the proof of Proposition 2.7.5 has these additional properties. In the
archimedean case, the existence of a C ∞ -metric follows from a partition of unity
argument.
In the non-archimedean case, continuity is not so important because the u -topology
is totally disconnected. If K is not locally compact, then continuity does not nec-
essarily imply local boundedness. At any rate, the only relevant property for our
purposes is that the metrics we use are locally bounded.

2.7.7. In the situation of Example 2.7.4, we consider the presentation
\[
\mathcal{D} = \bigl(\ell(x);\ O_{\mathbb{P}^n_K}(1), x_0,\dots,x_n;\ O_{\mathbb{P}^n_K}, 1\bigr)
\]
of the hyperplane $D = \operatorname{div}(\ell(x))$. Then the local height $\lambda_{\mathcal{D}}$ is given by
\[
\lambda_{\mathcal{D}}(P) = \max_k \log\left|\frac{x_k}{\ell(x)}\right|
\]
for any $P = x \notin \operatorname{supp}(D)$ (as in Example 2.3.2). We conclude that
\[
\lambda_{\mathcal{D}}(P) = -\log\|\ell(x)\|
\]
depends only on $D$ and the standard metric $\|\ \|$ on $O(D) = O_{\mathbb{P}^n}(1)$. This
is used in what follows to generalize the concept of local heights, replacing the
presentation by a suitable locally bounded metric on $O(D)$.
2.7.8. A Néron divisor $\widehat{D}$ on the K-variety $X$ is a Cartier divisor $D$ on $X$
with a locally bounded metric on the line bundle $O(D)$. The corresponding locally
bounded metrized line bundle is denoted by $O(\widehat{D})$. Note that $D$ induces
a canonical meromorphic section $s_D$ of $O(D)$ and we have $D = D(s_D)$ (cf.
A.8.18).
The Néron divisors on $X$ form a group with composition law
\[
\widehat{D} + \widehat{E} = \bigl(D + E,\ O(\widehat{D}) \otimes O(\widehat{E})\bigr).
\]
Let $\varphi : X' \to X$ be a morphism of K-varieties such that no irreducible component
of $X'$ is mapped into $\operatorname{supp}(D)$. Then the pull-back $\varphi^*(D)$ is a well-defined
Cartier divisor on $X'$ (see A.8.26) and we define the pull-back $\varphi^*(\widehat{D})$ as the
Néron divisor
\[
\varphi^*(\widehat{D}) = \bigl(\varphi^*(D),\ \varphi^* O(\widehat{D})\bigr).
\]
Definition 2.7.9. The local height associated to the Néron divisor $\widehat{D} = (D, \|\ \|)$
on the variety $X$ is given by
\[
\lambda_{\widehat{D}}(P) := -\log\|s_D(P)\|, \qquad P \in X \setminus \operatorname{supp}(D).
\]
Proposition 2.7.10. Let $\widehat{D} = (D, \|\ \|)$ be a Néron divisor on the K-variety $X$.
(a) If $\widehat{E}$ is a Néron divisor on $X$, then
\[
\lambda_{\widehat{D}+\widehat{E}}(P) = \lambda_{\widehat{D}}(P) + \lambda_{\widehat{E}}(P), \qquad P \notin \operatorname{supp}(D) \cup \operatorname{supp}(E).
\]
(b) If $\varphi : X' \to X$ is a K-morphism such that no irreducible component of
the K-variety $X'$ is mapped into $\operatorname{supp}(D)$, then
\[
\lambda_{\widehat{D}} \circ \varphi(P') = \lambda_{\varphi^*(\widehat{D})}(P'), \qquad P' \in X' \setminus \varphi^{-1}\operatorname{supp}(D).
\]
(c) If $f$ is a rational function on $X$, not identically zero on any irreducible
component, and $\widehat{D(f)}$ denotes the Néron divisor with trivial metric on
$O(D(f)) = O_X$, then
\[
\lambda_{\widehat{D(f)}}(P) = -\log|f(P)|, \qquad P \notin \operatorname{supp}(D).
\]
(d) If $\|\ \|'$ is another locally bounded metric on $O(D)$, then
\[
\lambda_{(D, \|\ \|)}(P) - \lambda_{(D, \|\ \|')}(P) = \log\rho,
\]
where $\rho$ is the norm of $1 \in \Gamma(X, O_X)$ with respect to the locally bounded
metric $\|\ \|'/\|\ \|$.
Proof: (a)–(c) are immediate from the definitions and (d) follows from (a).
Proposition 2.7.11. Let $\mathcal{D} = (s_D; L, \mathbf{s}; M, \mathbf{t})$ be a presentation of the Cartier
divisor $D = D(s_D)$ (cf. 2.2.1). Then there is a unique locally bounded metric on
$O(D) = L \otimes M^{-1}$ given on a local section $s$ by
\[
\|s(x)\| = \min_k \max_l \left| \frac{s\, t_l}{s_k}(x) \right| .
\]
The local height $\lambda_{\mathcal{D}}$ (cf. 2.2.2) is equal to the local height $\lambda_{(D, \|\ \|)}$ with respect
to the Néron divisor $(D, \|\ \|)$.
Proof: By linearity, we may assume that $O(D) = L$ and $t_0,\dots,t_m = 1$. Then
the metric is the pull-back of the standard metric on $O_{\mathbb{P}^n_K}(1)$ and we apply (2.16)
on page 59.
Again, we generalize our considerations to several absolute values. Let us use the
same assumptions and notation as in 2.6.10.
Definition 2.7.12. An M-metric on a line bundle $L$ is a family $(\|\ \|_u)_{u\in M}$
such that $\|\ \|_u$ is a metric with respect to the absolute value $|\ |_u$ satisfying the
following compatibility condition:
For every $x \in X$ and $u_1, u_2 \in M$ with the same restriction to $K(x)$, the equation
$\|\ \|_{u_1} = \|\ \|_{u_2}$ holds on $L_x(K(x))$.
An M-metric is called locally bounded if for every nowhere vanishing section $s$
of $L$ on an open subset $U$ of $X$, the function $(x, u) \mapsto \log\|s(x)\|_u$ on $U \times M$
is locally M-bounded.
2.7.13. Now a Néron divisor $\widehat{D}$ is a Cartier divisor $D$ with a locally bounded M-metric
on $O(D)$. All results from 2.7.2 to 2.7.11 hold in this context, replacing
metrics by M-metrics. We leave the details to the reader. Proposition 2.7.10 (d)
and Proposition 2.6.17 give our main result:
Theorem 2.7.14. Let $D$ be a Cartier divisor on a complete variety $X$ over $K$
and let $(\|\ \|_u)_{u\in M}$, $(\|\ \|'_u)_{u\in M}$ be locally M-bounded metrics on $O(D)$ giving
rise to the local heights
\[
\lambda_{\widehat{D}}(P, u) := -\log\|s_D(P)\|_u, \qquad \lambda_{\widehat{D}'}(P, u) := -\log\|s_D(P)\|'_u .
\]
For $v \in M_K$, there are constants $\gamma_v \in \mathbb{R}$, with $\gamma_v \neq 0$ only for finitely many $v$,
such that
\[
|\lambda_{\widehat{D}'}(P, u) - \lambda_{\widehat{D}}(P, u)| \le \gamma_v
\]
for all $P \in X \setminus \operatorname{supp}(D)$.
2.7.15. We apply these results to a non-zero rational function $f$ on an irreducible
regular projective variety $X$ over $K$. Then we have
\[
\operatorname{div}(f) = \sum_{j=1}^{n} m_j Y_j
\]
for prime divisors $Y_j$. By Proposition 2.3.9, there is a non-negative local height $\lambda_j$
relative to $Y_j$. For any place $u$ on $K$, $\exp(-\lambda_j(P, u))$ measures the $u$-distance
from $P \in X(K)$ to $Y_j$. If we choose the local height $\lambda_j$ relative to a presentation
as in Proposition 2.3.9, then this distance is even continuous with respect
to the $u$-topology on $X(K)$ defined as in Example 2.6.4. By the generalization
of Proposition 2.7.10 and Theorem 2.7.14, there is a family $(\gamma_v)_{v\in M_K}$ of non-negative
real numbers, with $\gamma_v = 0$ for all but finitely many $v$, such that
\[
\sum_{j=1}^{n} m_j \lambda_j(P, u) - \gamma_v \le -\log|f(P)|_u \le \sum_{j=1}^{n} m_j \lambda_j(P, u) + \gamma_v
\]
for any $u \in M$, $u|v \in M_K$, and any $P \in X(K) \setminus \operatorname{supp}(\operatorname{div}(f))$. This is Weil's
theorem of decomposition.
2.7.16. In order to define global heights, we assume that the absolute values of
$M_K$ satisfy the product formula (cf. 1.4). Let $F/K$ be a subextension of $\overline{K}/K$
and denote by $M_F$ the set of places $w$ of the field $F$ with $w|v$ for some $v \in M_K$.
We normalize the absolute value $|\ |_w$ as in 1.3.6 and 1.3.12. Then for $u \in M$ we
have
\[
|\ |_w = |\ |_u^{[T_w : K_v]/[F:K]},
\]
where $T_w$ is the completion $F_w$ if $F/K$ is separable.
2.7.17. Let $\overline{L}$ be a locally bounded M-metrized line bundle on the K-variety $X$.
Our goal is to define the associated global height function $h_{\overline{L}}$.
Let $P \in X(\overline{K})$. We choose a finite subextension $F/K$ of $\overline{K}/K$ with $P \in
X(F)$. For every $w \in M_F$, let us choose any $u \in M$ with $u|w$. Then we
consider the $|\ |_w$-norm
\[
\|\ \|_w := \|\ \|_u^{[T_w : K_v]/[F:K]}
\]
on the fibre $L_P(F)$. By the compatibility condition in the definition of an M-metric,
this norm is independent of the choice of $u$.
There is an invertible meromorphic section $s$ of $L$ with $P \notin \operatorname{supp}(D(s))$
(because there is an open dense trivialization of $L$ in a neighborhood of $P$). Then
the M-metric on $L$ yields a Néron divisor $\widehat{D(s)}$ and we set
\[
\lambda_{\widehat{D(s)}}(P, w) := -\log\|s(P)\|_w, \qquad h_{\overline{L}}(P) := \sum_{w\in M_F} \lambda_{\widehat{D(s)}}(P, w).
\]
By (1.4) on page 9 (resp. Lemma 1.3.7), the definition of $h_{\overline{L}}(P)$ is independent of
the choice of $F$. From the product formula and Proposition 2.7.10 (c), it is clear
that $h_{\overline{L}}(P)$ does not depend on the choice of $s$.
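The independence of the section rests on the product formula: replacing s by fs changes each local height by -log|f(P)|_w, and these contributions sum to zero. The sketch below checks this for K = Q with the usual normalized absolute values; this choice of field and the function names are assumptions made for the illustration.

```python
import math
from fractions import Fraction


def prime_factorization(n):
    """Factor a positive integer by trial division; returns {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def log_absolute_values(f):
    """Return {place: log|f|_v} for a non-zero rational f, over the
    archimedean place 'inf' and all primes dividing f."""
    f = Fraction(f)
    if f == 0:
        raise ValueError("f must be non-zero")
    logs = {"inf": math.log(float(abs(f)))}
    for p, e in prime_factorization(abs(f.numerator)).items():
        logs[p] = logs.get(p, 0.0) - e * math.log(p)   # |p^e|_p = p^(-e)
    for p, e in prime_factorization(f.denominator).items():
        logs[p] = logs.get(p, 0.0) + e * math.log(p)
    return logs


logs = log_absolute_values(Fraction(-12, 35))
total = sum(logs.values())
```

The total vanishes up to rounding, so the sum of the local corrections does not affect the global height.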
Proposition 2.7.18. The global height functions above have the following properties:
(a) $h_{\overline{L}}$ depends only on the isometry class of $\overline{L}$.
(b) If $\overline{L}_1$, $\overline{L}_2$ are locally bounded M-metrized line bundles on $X$, then
$h_{\overline{L}_1 \otimes \overline{L}_2} = h_{\overline{L}_1} + h_{\overline{L}_2}$.
(c) Let $\varphi : X' \to X$ be a morphism of K-varieties. Then $h_{\varphi^* \overline{L}} = h_{\overline{L}} \circ \varphi$.
(d) Let $\mathcal{D}$ be a presentation of a Cartier divisor $D$ and let $\widehat{D}$ be the associated
Néron divisor from Proposition 2.7.11. Then the global height with respect
to $\mathcal{D}$ in 2.3.3 is equal to $h_{O(\widehat{D})}$.
(e) If $X$ is complete (or more generally M-bounded), then $h_{\overline{L}}$ does not depend
on the choice of the locally bounded M-metric up to bounded functions.
Proof: Property (a) is obvious and (b), (c) follow from Proposition 2.7.10. Claim
(d) is an immediate consequence of Proposition 2.7.11 and (e) follows from
Theorem 2.7.14 or from Proposition 2.7.10 (d).
Remark 2.7.19. By the generalization of Proposition 2.7.5 to several absolute
values, every line bundle L admits a locally bounded M-metric. On complete
varieties, it follows from Proposition 2.7.18 that the corresponding global height
depends only on the isomorphism class of L up to bounded functions. Moreover,
Theorem 2.3.8 holds for arbitrary complete varieties X, Y over K .

In practice, metrics often arise from integral models, as it will transpire in the
following example which assumes that the reader is familiar with the basics of
scheme theory. As the example plays no further role in this book, the reader may
skip it without problems.
Example 2.7.20. Let $R$ be a Dedekind domain with quotient field $K$. For every maximal
ideal $\wp_v$ of $R$, we fix a discrete valuation $|\ |_v$ on $K$ with $\wp_v = \{x \in R \mid |x|_v < 1\}$.
Again, $M$ is the set of absolute values on $\overline{K}$ extending those of $M_K$.

We consider a line bundle $\mathcal{L}$ on a flat proper reduced scheme $\mathcal{X}$ over $R$. The generic fibre
$X = \mathcal{X}_K$ is a complete variety over $K$ with line bundle $L := \mathcal{L}_K$. The goal is to describe
the natural M-metric $\|\ \|_{\mathcal{L}}$ on $L$ induced from $\mathcal{L}$.
Let $x \in X$. Then there is a finite subextension $F/K$ of $\overline{K}/K$ with $x \in X(F)$. The integral
closure $R_F$ of $R$ in $F$ is again a Dedekind domain ([157], Th.10.7). By the valuative
criterion of properness ([148], Th.II.4.7), there is a unique morphism $\bar{x} : \operatorname{Spec}(R_F) \to \mathcal{X}$
mapping the generic point $\{0\}$ to $x$. Let $u \in M$ restrict on $F$ to the valuation $w$
with corresponding prime ideal $\wp_w$. There is a local nowhere vanishing section $s$ of $\mathcal{L}$ in
$\bar{x}(\wp_w)$. Then $s$ is also defined and non-zero in $x$ (because $\bar{x}(\wp_w)$ is in the closure of $x$ in
$\mathcal{X}$) and we set
\[
\|s(x)\|_{\mathcal{L},u} := 1. \tag{2.20}
\]
If we replace $s$ by another local nowhere vanishing section $s'$ in $\bar{x}(\wp_w)$, then $s'/s$ is a
unit in the localization of $R_F$ in $\wp_w$ (consider the stalk of the sheaf of sections of $\mathcal{L}$ in
$\bar{x}(\wp_w)$). We conclude that (2.20) determines a well-defined M-metric on $L$.
To prove that this M-metric is locally bounded, let $(E^u)_{u\in M}$ be an M-bounded family
in an open subset $U$ of $X$ and let $s \in L(U)$ be nowhere vanishing. We may cover $\mathcal{X}$
by finitely many affine trivializations $\mathcal{U}_i$ of $\mathcal{L}$ with nowhere vanishing $s_i \in \mathcal{L}(\mathcal{U}_i)$. For
$u \in M$, we define
\[
E_i^u := \{x \in E^u \mid \bar{x}(\wp_w) \in \mathcal{U}_i\},
\]
using the notation from above. Clearly, $E_i^u$ is contained in the generic fibre $U_i$ of $\mathcal{U}_i$ and
we have $E^u = \bigcup_i E_i^u$. For $x \in E_i^u$, we have $\|s_i(x)\|_{\mathcal{L},u} = 1$ and hence
\[
\log\|s(x)\|_{\mathcal{L},u} = \log\left|\frac{s}{s_i}(x)\right|_u .
\]
So it is enough to prove that $(E_i^u)_{u\in M}$ is M-bounded in $U \cap U_i$. The coordinates on $U \cap U_i$
are given by the coordinates of $U$ and $U_i$ (cf. proof of Proposition 2.7.5). Clearly, the
coordinates of $U$ are M-bounded on $(E_i^u)_{u\in M}$. Let $x_1,\dots,x_n$ be a set of coordinates on
$\mathcal{U}_i$ over $R$. Then they are also coordinates of $U_i$ and $|x_j|_u \le 1$ on $E_i^u$ for $j = 1,\dots,n$
(using $\bar{x}(\wp_w) \in \mathcal{U}_i$). By the generalization of Lemma 2.2.10, $(E_i^u)_{u\in M}$ has to be M-bounded
in $U \cap U_i$, proving that our M-metric is locally bounded.
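Let $s$ be an invertible meromorphic section of $L$. The M-metric $\|\ \|_{\mathcal{L}}$ induces a Néron
divisor $\widehat{D}$ on $D = \operatorname{div}(s)$ and a local height
\[
\lambda_{\widehat{D}}(x, u) := -\log\|s(x)\|_{\mathcal{L},u}
\]
for $x \in X \setminus \operatorname{supp}(D)$ and $u \in M$.
Now let $x \in X(K) \setminus \operatorname{supp}(D)$ and let $u \in M$ with $u|v \in M_K$. We will give a geometric
interpretation of $\lambda_{\widehat{D}}(x, u)$ in terms of intersection multiplicities. By renormalization, we
may assume that $\log|K^\times| = \mathbb{Z}$.
There is a similar intersection theory of Cartier divisors on $\mathcal{X}$ with cycles, as described in
Section A.9 (cf. [125], 20.2). By flatness, we easily deduce that $X$ is dense in $\mathcal{X}$ and that
$s$ extends uniquely to an invertible meromorphic section $s_{\mathcal{L}}$ of $\mathcal{L}$. On the other hand, the
closure $\overline{\{x\}}$ of $x$ in $\mathcal{X}$ is a one-dimensional prime cycle on $\mathcal{X}$. Since $x \notin \operatorname{supp}(D)$, the
proper intersection product
\[
D(s_{\mathcal{L}}).\overline{\{x\}} = \operatorname{div}\bigl(s_{\mathcal{L}}|_{\overline{\{x\}}}\bigr)
\]
is a well-defined zero-dimensional cycle $Z$ on $\mathcal{X}$. For $v \in M_K$, let $Z_v$ be the part of $Z$
contained in the fibre $\mathcal{X}_{\wp_v}$, which is a variety over the residue field, and let $\deg_v(Z)$ be the
degree of $Z_v$. We claim that
\[
\lambda_{\widehat{D}}(x, u) = \deg_v\bigl(D(s_{\mathcal{L}}).\overline{\{x\}}\bigr).
\]
Indeed, by the projection formula, we have
\[
\deg_v\bigl(D(s_{\mathcal{L}}).\overline{\{x\}}\bigr) = \deg_v\bigl(\bar{x}^* D(s_{\mathcal{L}}).\operatorname{Spec}(R_F)\bigr).
\]
Using Proposition 2.7.10, we get
\[
\lambda_{\widehat{D}}(x, u) = \lambda_{\bar{x}^* \widehat{D}}(\{0\}, u),
\]
thus we may assume $\mathcal{X} = \operatorname{Spec}(R_F)$, $\mathcal{L} = O_{\mathcal{X}}$. Then $s \in K^\times$ and by our normalization
of $|\ |_v$, we get
\[
\deg_v(\operatorname{div}(s_{\mathcal{L}})) = -\log|s|_v = \lambda_{\widehat{D}}(x, u),
\]
as claimed.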

2.8. Heights on Grassmannians

Let $M$ denote the set of absolute values on $\overline{\mathbb{Q}}$ extending $M_{\mathbb{Q}}$.
There are other choices for the height on $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ (see Example 2.7.4 and Remark
2.7.6). In Arakelov theory, the following height is more natural.
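Definition 2.8.1. For $x \in \overline{\mathbb{Q}}^{n+1}$ and $u \in M$, we set
\[
H_u(x) := \begin{cases}
\max_j |x_j|_u & \text{if } u \text{ is non-archimedean}, \\[4pt]
\Bigl(\sum_{j=0}^{n} |x_j|_u^2\Bigr)^{1/2} & \text{if } u \text{ is archimedean}.
\end{cases}
\]
Let $F \subset \overline{\mathbb{Q}}$ be a number field. Then, for $x \in F^{n+1}$ and $w \in M_F$, we define
\[
H_w(x) := H_u(x)^{[F_w : \mathbb{Q}_p]/[F : \mathbb{Q}]},
\]
where $p \in M_{\mathbb{Q}}$ and $u \in M$ are such that $w|p$ and $u|w$.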
2.8.2. Now we define the Arakelov height for $P \in \mathbb{P}^n_{\overline{\mathbb{Q}}}(F)$ with representative
$x \in F^{n+1}$ by
\[
h_{Ar}(P) := \sum_{w\in M_F} \log H_w(x).
\]
As always, the multiplicative Arakelov height is defined by $H_{Ar}(P) := \exp(h_{Ar}(P))$.



2.8.3. Again by the product formula, $h_{Ar}(P)$ is independent of the choice of the
representative $x$. It follows from Corollary 1.3.2 that the definition of $h_{Ar}(P)$ does
not depend on the extension $F$, i.e. $h_{Ar}$ is a well-defined function on $\mathbb{P}^n_{\overline{\mathbb{Q}}}$. It is
easy to see that $h_{Ar}$ differs from the old height $h$ by a bounded function on $\mathbb{P}^n_{\overline{\mathbb{Q}}}$.
In the sense of 2.3.7, they are equivalent.
The preceding remark also follows from Proposition 2.7.18, because hAr is the
global height associated to the locally bounded metrized line bundle on OPn (1)
using the Fubini–Study metrics at the archimedean places and the standard metrics
at the non-archimedean places (cf. 2.7.4).
2.8.4. Let W be an m-dimensional subspace of Q̄^n. The m-th exterior power
∧^m W is a one-dimensional subspace of ∧^m Q̄^n. Therefore, we may view W as
a point P_W of the projective space P(∧^m Q̄^n). The latter may be identified with
projective space of dimension \binom{n}{m} − 1 by using standard coordinates.
Definition 2.8.5. hAr (W ) := hAr (PW ) is called the Arakelov height of W .
Definition 2.8.6. Let A be an n × m matrix of rank m with entries in Q̄. Then the
Arakelov height h_Ar(A) of A is defined as the Arakelov height of the subspace
of Q̄^n spanned by the columns of A.
Remark 2.8.7. We can be quite explicit in defining h_Ar(A). Let I ⊂ {1, . . . , n}
with |I| = m. We denote by A_I the m × m submatrix of A formed with the
i-th rows, i ∈ I, of A. Then the point in P(∧^m Q̄^n) corresponding to A is given
by the coordinates det(A_I), where I ranges over all subsets of {1, . . . , n} of
cardinality m. Let F ⊂ Q̄ be a number field containing all entries of A and set
\[
H_u(A) :=
\begin{cases}
\max_I |\det(A_I)|_u & \text{if } u \text{ is non-archimedean,} \\
\Bigl(\sum_I |\det(A_I)|_u^{2}\Bigr)^{1/2} & \text{if } u \text{ is archimedean.}
\end{cases}
\]
For w ∈ M_F, we now set
\[ H_w(A) = H_u(A)^{[F_w:\mathbb{Q}_p]/[F:\mathbb{Q}]}, \]
where p ∈ M_Q and u ∈ M are such that w|p and u|w. Then we have
\[ h_{\mathrm{Ar}}(A) = \sum_{w \in M_F} \log H_w(A). \]

It is also clear that hAr (AG) = hAr (A) for any invertible m × m matrix G with
entries in Q .
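The recipe of Remark 2.8.7 can be sketched in a few lines for the special case of an integer matrix over Q. This is only an illustration under stated assumptions: the function names are hypothetical, the matrix is assumed to have integer entries and full column rank, and the code uses the fact that over Q the non-archimedean factors multiply to the reciprocal of the gcd of the maximal minors, while the archimedean factor is the euclidean norm of the vector of minors.

```python
from itertools import combinations
from math import gcd, sqrt


def det(rows):
    """Determinant of a square integer matrix by Laplace expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        minor = [r[:j] + r[j + 1:] for r in rows[1:]]
        total += (-1) ** j * rows[0][j] * det(minor)
    return total


def maximal_minors(A):
    """All m x m minors det(A_I) of an n x m matrix A, with I running over row subsets."""
    n, m = len(A), len(A[0])
    return [det([A[i] for i in I]) for I in combinations(range(n), m)]


def arakelov_height(A):
    """Multiplicative Arakelov height H_Ar(A) of an integer n x m matrix of rank m over Q."""
    minors = maximal_minors(A)
    g = 0
    for d in minors:
        g = gcd(g, abs(d))
    if g == 0:
        raise ValueError("matrix does not have full column rank")
    # Product over primes of max_I |det(A_I)|_p equals 1/g; the real place gives the L2 norm.
    return sqrt(sum(d * d for d in minors)) / g


# The column span of A and of A*G (G invertible) has the same height.
A = [[1, 0], [0, 1], [1, 1]]
AG = [[2, 0], [0, 1], [2, 1]]  # A times diag(2, 1)
```

Here `arakelov_height(A)` and `arakelov_height(AG)` agree up to rounding, in line with the invariance h_Ar(AG) = h_Ar(A) noted above.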
Proposition 2.8.8. Let u ∈ M be an archimedean place. Then
\[ H_u(A) = |\det(A^{*}A)|_u^{1/2}, \]
where A^* = \overline{A}^t is the adjoint.

Proof: This follows from the well-known Binet formula
\[ \det(A^{*}A) = \sum_I |\det(A_I)|^{2}. \]

For completeness, we give the proof for the case of complex matrices. Let u :
Cm −→ Cn be the corresponding linear map and let u∗ be the adjoint map. We
have
∧m (u∗ ) ◦ ∧m (u) = ∧m (u∗ ◦ u).
In the canonical basis, the matrix of ∧^m(u) (resp. ∧^m(u^*)) has only one column
(resp. one row) and its entries are det(A_I) (resp. the complex conjugates of
det(A_I)), where I ranges over all subsets of {1, . . . , n} of cardinality m. The
matrix of ∧^m(u^* ◦ u) is a 1 × 1 matrix with entry det(A^* A), proving Binet's formula.
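Binet's formula is easy to check by hand on small examples. The following sketch does so for a real integer matrix, where the adjoint is just the transpose; the helper names are illustrative and not taken from the text.

```python
from itertools import combinations


def det(rows):
    """Determinant of a square integer matrix by Laplace expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        minor = [r[:j] + r[j + 1:] for r in rows[1:]]
        total += (-1) ** j * rows[0][j] * det(minor)
    return total


def gram(A):
    """The m x m matrix A^t A of an n x m real matrix A."""
    n, m = len(A), len(A[0])
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(m)]
            for i in range(m)]


def binet_sides(A):
    """Return (det(A^t A), sum over I of det(A_I)^2) for an n x m matrix A."""
    n, m = len(A), len(A[0])
    minors = [det([A[i] for i in I]) for I in combinations(range(n), m)]
    return det(gram(A)), sum(d * d for d in minors)


left, right = binet_sides([[1, 2], [3, 4], [5, 6]])
```

For this 3 × 2 matrix the three minors are −2, −4, −2, so both sides equal 24.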
Remark 2.8.9. Let A be an n × m matrix of rank m with entries in Q̄ and let
B, C be complementary submatrices of type n × m_1 and n × m_2, respectively.
For any u ∈ M, we have
\[ H_u(A) \le H_u(B)\, H_u(C) \]
and thus
\[ h_{\mathrm{Ar}}(A) \le h_{\mathrm{Ar}}(B) + h_{\mathrm{Ar}}(C). \]
For a non-archimedean u, this follows easily from Laplace's expansion. If instead
u is archimedean, it is a consequence of Proposition 2.8.8 and Fischer's inequality
\[
\det \begin{pmatrix} B^{*}B & B^{*}C \\ C^{*}B & C^{*}C \end{pmatrix}
\le \det(B^{*}B)\,\det(C^{*}C)
\]
from linear algebra (see e.g. L. Mirsky [206], Th.13.5.5), which extends Hadamard's
inequality.
Proposition 2.8.10. Let W be an m-dimensional subspace of Q̄^n and let W^⊥
be its annihilator in the dual (Q̄^n)^* = Q̄^n. Then h_Ar(W^⊥) = h_Ar(W).
Proof: Let V be the vector space Q̄^n. Any element x of ∧^m(V) defines a linear
map ψ(x) : y → x ∧ y from ∧n−m V to ∧n (V ), or in other words an element
ϕ(x) of ∧n (V ) ⊗ ∧n−m (V ∗ ). The map
ϕ : ∧m (V ) −→ ∧n (V ) ⊗ ∧n−m (V ∗ )
is an isomorphism, which maps each element of the canonical basis of ∧m (V ) to
± an element of the canonical basis of ∧n (V ) ⊗ ∧n−m (V ∗ ). The line ∧m (W )
is mapped by ϕ to the line ∧n (V ) ⊗ ∧n−m (W ⊥ ), since, for any non-zero x
in ∧m (W ), the kernel of ψ(x) is the subspace of ∧n−m (V ) generated by the
elements of the form w ∧z , with w ∈ W and z ∈ ∧n−m−1 (V ). We conclude that
the coordinates of ∧m (W ) in P(∧m (V )) are, up to a sign, equal to the coordinates
of ∧n−m (W ⊥ ) in P(∧n−m (V ∗ )). This proves the claim. 

Definition 2.8.11. Let A be an m × n matrix of rank m with entries in Q . Then


the Arakelov height of A is defined by
hAr (A) := hAr (At ),
where At is the transpose and hAr (At ) is the Arakelov height as in Definition
2.8.6.
Corollary 2.8.12. With A as in 2.8.11, the Arakelov height of the space of solu-
tions of A · x = 0 equals hAr (A).
Proof: Note that the rows of A form a basis of
{A · x = 0}⊥
and use Proposition 2.8.10. 
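Corollary 2.8.12 can be illustrated over Q on a single equation. In the sketch below the kernel basis is written down by hand (it is an assumption of the example, and the code checks it) and the helper names are hypothetical. The Plücker coordinates of the row space and of the solution space agree up to sign and order, so the two heights coincide.

```python
from itertools import combinations
from math import gcd, isclose, sqrt


def det(rows):
    """Determinant of a square integer matrix by Laplace expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        minor = [r[:j] + r[j + 1:] for r in rows[1:]]
        total += (-1) ** j * rows[0][j] * det(minor)
    return total


def plucker(B):
    """Maximal minors of an n x m integer matrix B whose columns span a subspace of Q^n."""
    n, m = len(B), len(B[0])
    return [det([B[i] for i in I]) for I in combinations(range(n), m)]


def height_from_plucker(coords):
    """Multiplicative Arakelov height over Q of the point with the given integer coordinates."""
    g = 0
    for c in coords:
        g = gcd(g, abs(c))
    return sqrt(sum(c * c for c in coords)) / g


A = [[1, 2, 3]]                             # the equation x1 + 2*x2 + 3*x3 = 0
kernel_basis = [[-2, -3], [1, 0], [0, 1]]   # columns chosen by hand to span the solutions
A_t = [list(col) for col in zip(*A)]        # rows of A written as columns

# Sanity check that each chosen column really solves A x = 0.
for col in zip(*kernel_basis):
    assert all(sum(a * x for a, x in zip(row, col)) == 0 for row in A)

row_height = height_from_plucker(plucker(A_t))
kernel_height = height_from_plucker(plucker(kernel_basis))
```

Both heights equal the square root of 14 in this example.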
Another interesting and important property, which we give without proof, is a
theorem of W.M. Schmidt [271], Ch.I, Lemma 8A, and independently of T.
Struppeck and J.D. Vaaler [294]. We do not need it in the sequel.
Theorem 2.8.13. Let V, W be subspaces of Q̄^n. Then
\[ h_{\mathrm{Ar}}(V + W) + h_{\mathrm{Ar}}(V \cap W) \le h_{\mathrm{Ar}}(V) + h_{\mathrm{Ar}}(W). \]
2.8.14. The following local considerations hold more generally over the completion
Q̄_u with respect to u ∈ M. Note that this field is algebraically closed ([43],
Prop.3.4.1/3). For any ϕ ∈ Hom((Q̄_u)^n, (Q̄_u)^m) and u ∈ M, we define a dual
local height by
\[ H_u^{*}(\varphi) := \sup\{H_u(\varphi(x)) \mid x \in (\overline{\mathbb{Q}}_u)^{n},\ H_u(x) \le 1\}. \]
If we identify ϕ with an M × N matrix A = (a_kl) (so that M = m and N = n), we find
that, if u is not archimedean, we have
\[ H_u^{*}(A) = \max_{k,l} |a_{kl}|_u. \]
If instead u is archimedean, let A^* denote the adjoint of A and let
\[ 0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N \]
denote the eigenvalues of the positive semi-definite matrix A^* A. Using standard
facts about operator norms, we get
\[ H_u^{*}(A) = |\lambda_N|_u^{1/2}. \]
Definition 2.8.15. Let ψ ∈ PGL(n + 1, Q̄_u) and let A be a representative of ψ
in GL(n + 1, Q̄_u). The distorsion factor of ψ is
\[ \eta_u(\psi) = H_u^{*}(A)\, H_u^{*}(A^{-1}). \]

2.8.16. Let u ∈ M. Now define for x, y ∈ (Q̄_u)^{n+1} \ {0} the projective distance
of x, y with respect to u to be
\[ \delta_u(x, y) = \frac{H_u(x \wedge y)}{H_u(x)\, H_u(y)}. \]
This gives a map
\[ \delta_u : \mathbb{P}^{n}(\overline{\mathbb{Q}}_u) \times \mathbb{P}^{n}(\overline{\mathbb{Q}}_u) \longrightarrow [0, 1]. \]
Definition 2.8.17. If F ⊂ Q̄ is a number field and x, y ∈ F^{n+1}, then for w ∈
M_F we define
\[ \delta_w(x, y) = \delta_u(x, y)^{[F_w:\mathbb{Q}_p]/[F:\mathbb{Q}]}, \]
where p ∈ M_Q and u ∈ M are such that w|p and u|w.
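For a prime p and vectors with rational coordinates, the projective distance of 2.8.16 can be computed exactly. The sketch below is an illustration only: the function names are hypothetical and it is restricted to the p-adic absolute value on Q, normalized by |p|_p = 1/p.

```python
from fractions import Fraction
from itertools import combinations


def padic_abs(q, p):
    """The p-adic absolute value |q|_p of a rational number q, with |p|_p = 1/p."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    v = 0
    num, den = q.numerator, q.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def local_height(x, p):
    """H_p(x) = max_j |x_j|_p for a non-zero rational vector x."""
    return max(padic_abs(c, p) for c in x)


def wedge(x, y):
    """Coordinates x_i*y_j - x_j*y_i (i < j) of the exterior product of x and y."""
    return [x[i] * y[j] - x[j] * y[i] for i, j in combinations(range(len(x)), 2)]


def delta(x, y, p):
    """Projective distance delta_p(x, y) = H_p(x ^ y) / (H_p(x) * H_p(y))."""
    return local_height(wedge(x, y), p) / (local_height(x, p) * local_height(y, p))


x, y, z = (1, 0), (1, 5), (1, 25)
```

With p = 5 one finds δ(x, y) = δ(y, z) = 1/5 and δ(x, z) = 1/25, an instance of the ultrametric inequality (2.21); rescaling x does not change the distances.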

Proposition 2.8.18. For u ∈ M , the projective distance δu is a metric on Pn (Qu ).


Proof: It is clear that δu (P, Q) = δu (Q, P ) and δu (P, Q) = 0 if and only if
P = Q. It remains to verify the triangle inequality.
Consider first the case of a non-archimedean u . By homogeneity and continuity,
it is enough to show that
Hu (x ∧ z) ≤ max(Hu (x ∧ y), Hu (y ∧ z)) (2.21)
for x, y, z ∈ F n+1 with Hu (x) = Hu (y) = 1 and Hu (z) ≤ 1, where F ⊂ Qu
is a number field. We argue by contradiction, hence we assume
Hu (x ∧ z) > max(Hu (x ∧ y), Hu (y ∧ z)); (2.22)
in particular, we have
Hu (y ∧ z) < Hu (x ∧ z) ≤ Hu (z). (2.23)
Let ν ∈ F be such that H_u(z) = |ν|_u. Then (2.23) shows that the reductions of y
and ν^{-1} z are linearly dependent over the residue field of F. This clearly implies
that there are µ ∈ F and z′ ∈ F^{n+1} such that
\[ z = \mu y + z', \]
with |µ|_u = H_u(z) and H_u(z′) < H_u(z).
We have x ∧ z′ = x ∧ z − µ x ∧ y, and H_u(µ x ∧ y) < H_u(x ∧ z) by equation
(2.22). Therefore, the ultrametric triangle inequality for H_u shows that
\[ H_u(x \wedge z') = H_u(x \wedge z) > 0. \tag{2.24} \]
Moreover, it is obvious that y ∧ z′ = y ∧ z. This and (2.24) show that, if (2.22)
holds for x, y, z, it also holds for x, y, z′, for some z′ with H_u(z′) < H_u(z).
By repeating this process, and noting that the absolute value | |_u is discrete on
F, we can also make H_u(z′) arbitrarily small. This contradicts (2.24) and proves
(2.21).
If instead u is archimedean, we may work in C, with the usual euclidean absolute
value | |. An easy calculation shows that for x, y ∈ C^{n+1} \ {0} we have
\[ \delta_u(x, y) = \Bigl(1 - \frac{|\langle x, y\rangle|^{2}}{\|x\|^{2}\,\|y\|^{2}}\Bigr)^{1/2}, \]
where ⟨ , ⟩ is the standard scalar product in C^{n+1} with norm ‖ ‖. We need to
show that
\[ (1 - |\langle x, z\rangle|^{2})^{1/2} \le (1 - |\langle x, y\rangle|^{2})^{1/2} + (1 - |\langle y, z\rangle|^{2})^{1/2} \tag{2.25} \]
whenever ‖x‖ = ‖y‖ = ‖z‖ = 1.
Replacing x and y by e^{iθ} x and e^{iφ} y for suitable θ and φ, we may assume
that the scalar products ⟨x, y⟩ and ⟨y, z⟩ are real numbers. We identify C^{n+1}
with R^{2n+2} equipped with the euclidean scalar product ⟨ , ⟩_R = Re⟨ , ⟩; then
it is enough to prove (2.25) in real euclidean space R^{2n+2}, with the standard
euclidean scalar product. Let u, v, w be the three vectors in R^{2n+2} corresponding
to x, y, z , and let α = ∠(u, v), β = ∠(v, w), γ = ∠(u, w) be the angles
formed by these vectors. Since the scalar product of two vectors of length 1 is the
cosine of their angle, the inequality to be proven becomes
| sin γ| ≤ | sin α| + | sin β|.
We may also assume that 0 ≤ α, β ≤ π/2, because for our purposes we are free to
replace v and w by −v and −w . The spherical distance now gives γ ≤ α + β .
If α + β > π/2, we have sin α + sin β > 1, and the inequality to be shown is
trivial. If instead α + β ≤ π/2, then
| sin γ| ≤ sin(α + β) = cos β sin α + cos α sin β ≤ sin α + sin β. 

We note without proof the following useful inequality.


Proposition 2.8.19. Suppose u ∈ M, P, Q ∈ P^n(Q̄_u) and ψ ∈ PGL(n + 1, Q̄_u). Then
\[ \eta_u(\psi)^{-1}\,\delta_u(P, Q) \le \delta_u(\psi(P), \psi(Q)) \le \eta_u(\psi)\,\delta_u(P, Q). \]

For the proof, we refer to K.K. Choi and J.D. Vaaler [66].
2.8.20. Let O be the point (1 : 0) on P^1_Q̄. The Arakelov height of a point
P ∈ P^1_Q̄(F), P ≠ O, where F ⊂ Q̄ is a number field, is now given by the elegant formula
\[ h_{\mathrm{Ar}}(P) = \sum_{w \in M_F} \log \frac{1}{\delta_w(P, O)}. \]

We give an application of the projective metrics so introduced. We have the
projective Liouville inequality:
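Theorem 2.8.21. Let F ⊂ Q̄ be a number field and let α, β be different elements
of F. Let 𝜶 = (1 : α) and 𝜷 = (1 : β) be the corresponding points on P^1_Q̄. Then
\[ \prod_{w \in M_F} \delta_w(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \bigl(H_{\mathrm{Ar}}(\boldsymbol{\alpha})\,H_{\mathrm{Ar}}(\boldsymbol{\beta})\bigr)^{-1}. \]
In particular, for every w ∈ M_F it holds
\[ \delta_w(\boldsymbol{\alpha}, \boldsymbol{\beta}) \ge \bigl(H_{\mathrm{Ar}}(\boldsymbol{\alpha})\,H_{\mathrm{Ar}}(\boldsymbol{\beta})\bigr)^{-1}. \]
Proof: Note that H_w(𝜶 ∧ 𝜷) = |α − β|_w and then the claim follows from the
product formula.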
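The identity of Theorem 2.8.21 can be checked numerically for F = Q. The sketch below is illustrative and rests on stated assumptions: the function names are hypothetical, only rational α ≠ β are handled, and the product is taken over the real place and the finitely many primes dividing a numerator or denominator of α, β or α − β (all other places contribute the factor 1).

```python
from fractions import Fraction
from math import isclose, sqrt


def prime_divisors(n):
    """Set of primes dividing the integer n (empty for n = 0 or n = +-1)."""
    n = abs(n)
    primes = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes


def padic_abs(q, p):
    """The p-adic absolute value |q|_p of a rational number q, with |p|_p = 1/p."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    v = 0
    num, den = q.numerator, q.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def arakelov_height_point(alpha):
    """H_Ar of the point (1 : alpha) for rational alpha = a/b in lowest terms."""
    alpha = Fraction(alpha)
    return sqrt(alpha.numerator ** 2 + alpha.denominator ** 2)


def delta_product(alpha, beta):
    """Product over all places of Q of delta_w((1 : alpha), (1 : beta))."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    diff = alpha - beta
    if diff == 0:
        raise ValueError("alpha and beta must be different")
    # Archimedean place: L2 local heights.
    result = abs(diff) / sqrt((1 + alpha ** 2) * (1 + beta ** 2))
    primes = set()
    for q in (alpha, beta, diff):
        primes |= prime_divisors(q.numerator) | prime_divisors(q.denominator)
    # Non-archimedean places: max-norm local heights.
    for p in primes:
        local = padic_abs(diff, p) / (
            max(1, padic_abs(alpha, p)) * max(1, padic_abs(beta, p)))
        result *= float(local)
    return result


lhs = delta_product(Fraction(1, 2), Fraction(2, 3))
rhs = 1 / (arakelov_height_point(Fraction(1, 2)) * arakelov_height_point(Fraction(2, 3)))
```

For α = 1/2 and β = 2/3 both sides come out as 1/√65, up to floating-point rounding.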

2.9. Siegel’s lemma

In 1929 C.L. Siegel, in the course of his work on diophantine approximation and
transcendency, formally stated what is known today as Siegel's lemma.
Lemma 2.9.1. Let a_ij, i = 1, . . . , M, j = 1, . . . , N be rational integers, not
all 0, bounded by B and suppose that N > M. Then the homogeneous linear
system
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= 0 \\
\cdots \qquad\qquad & \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= 0
\end{aligned}
\]
has a solution x_1, . . . , x_N in rational integers, not all 0, bounded by
\[ \max_i |x_i| \le \bigl\lfloor (NB)^{\frac{M}{N-M}} \bigr\rfloor. \]

Siegel’s lemma and its numerous variants have become a fundamental tool in dio-
phantine approximation and transcendency.
Proof: The M × N matrix (a_ij) is denoted by A. Clearly, we may assume that
no row is identically 0. For a positive integer k, let us consider the set
\[ T := \{x \in \mathbb{Z}^{N} \mid 0 \le x_i \le k,\ i = 1, \dots, N\}. \]
We denote by S_m^+ the sum of the positive entries in the m-th row of A, and
similarly by S_m^- the sum of the negative entries. Then for x ∈ T and y := Ax we
have
\[ k S_m^{-} \le y_m \le k S_m^{+}. \]
Let
\[ T' := \{y \in \mathbb{Z}^{M} \mid k S_m^{-} \le y_m \le k S_m^{+},\ m = 1, \dots, M\}. \]
Writing B_m := max_n |a_mn|, we have S_m^+ − S_m^- ≤ N B_m and we conclude that
the set T′ has at most ∏_m (N k B_m + 1) elements. Now we choose k so that T
has more elements than T′, namely
\[ \prod_m (N k B_m + 1) < (k + 1)^{N}. \tag{2.26} \]
If we choose k to be the integer part of ∏_m (N B_m)^{1/(N−M)} and use
N k B_m + 1 < N B_m (k + 1), then we easily verify that inequality (2.26) is fulfilled.
By Dirichlet's pigeon-hole principle, there are two different points x′, x″ ∈ T
with Ax′ = Ax″. The point x := x′ − x″ is a solution of Ax = 0 in integers,
with max_n |x_n| ≤ k.
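The pigeon-hole argument of the proof is constructive, although exponentially slow. The following sketch follows it literally for small systems; the function names are hypothetical, zero rows are rejected rather than removed, and the integer part of the root is computed with exact integer arithmetic to avoid rounding issues.

```python
from itertools import product


def integer_root_floor(value, n):
    """Largest integer k >= 0 with k**n <= value."""
    k = 0
    while (k + 1) ** n <= value:
        k += 1
    return k


def siegel_solution(A):
    """Return (x, k): a non-zero integer solution of A x = 0 with max |x_i| <= k.

    A is an M x N integer matrix with N > M and no zero row; k is the integer
    part of prod_m (N * B_m)^(1/(N-M)), as in the proof of Lemma 2.9.1.
    """
    M, N = len(A), len(A[0])
    if N <= M:
        raise ValueError("need more unknowns than equations")
    if any(all(a == 0 for a in row) for row in A):
        raise ValueError("remove zero rows first, as in the proof")
    bound_product = 1
    for row in A:
        bound_product *= N * max(abs(a) for a in row)
    k = integer_root_floor(bound_product, N - M)
    seen = {}
    # Enumerate the box T = {0, ..., k}^N and look for two points with the same image.
    for x in product(range(k + 1), repeat=N):
        y = tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)
        if y in seen:
            return [xi - oi for xi, oi in zip(x, seen[y])], k
        seen[y] = x
    raise AssertionError("unreachable by the pigeon-hole argument")


solution, k = siegel_solution([[1, 2, 3]])
```

The search visits up to (k + 1)^N points, so this is only meant for toy examples; it illustrates why the lemma is an existence statement rather than an algorithm.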
Corollary 2.9.2. Let K be a number field of degree d contained in C with | |
the usual absolute value on C. Let M, N ∈ N, 0 < M < N. Then there are
positive constants C_1, C_2 such that for any non-zero M × N matrix A with entries
a_mn ∈ O_K, there is x ∈ O_K^N \ {0} with A · x = 0 and
\[ H(x) \le C_1\,(C_2 N B)^{\frac{M}{N-M}}, \]
where B := sup_{σ,m,n} |σ(a_mn)| with σ ranging over the embeddings of K into
C.
Proof: Let ω_1, . . . , ω_d be a Z-basis of the ring of algebraic integers O_K. The
entries of A may be written in the form
\[ a_{mn} = \sum_{j=1}^{d} a_{mn}^{(j)}\,\omega_j, \qquad a_{mn}^{(j)} \in \mathbb{Z}. \tag{2.27} \]
Using x_n = \sum_{k=1}^{d} x_n^{(k)} ω_k, we get
\[
(A \cdot x)_m = \sum_{n=1}^{N} \sum_{j,k=1}^{d} a_{mn}^{(j)}\,\omega_j \omega_k\, x_n^{(k)}
= \sum_{l=1}^{d} \Bigl( \sum_{n=1}^{N} \sum_{j,k=1}^{d} a_{mn}^{(j)}\, b_{jk}^{(l)}\, x_n^{(k)} \Bigr) \omega_l,
\]
where ω_j ω_k = \sum_{l=1}^{d} b_{jk}^{(l)} ω_l. Let A′ be the (Md) × (Nd) matrix
\[ A' := \Bigl( \sum_{j=1}^{d} a_{mn}^{(j)}\, b_{jk}^{(l)} \Bigr) \]
with rows indexed by (m, l), columns indexed by (n, k), and let y ∈ Z^{Nd} be the
vector (x_n^{(k)}). Then the Siegel lemma above gives a non-zero integer solution y
of A′ · y = 0 with
\[ H(y) \le \Bigl( N d^{2} \max_{m,n,j} |a_{mn}^{(j)}| \max_{j,k,l} |b_{jk}^{(l)}| \Bigr)^{\frac{M}{N-M}}. \]
Let σ range over the d = [K : Q] different embeddings of K into C. By
conjugating (2.27) with all σ and using that (σ(ω_j)) is an invertible d × d matrix
(because the square of the determinant is the discriminant, see B.1.14 and Remark
B.1.15), we get
\[ \max_j |a_{mn}^{(j)}| \le C_2' \max_\sigma |\sigma(a_{mn})| \]
for a suitable positive constant C_2′. Then x_n = \sum_{k=1}^{d} x_n^{(k)} ω_k yields
H(x) ≤ C_1 H(y) and using C_2 := C_2′ d^2 max_{j,k,l} |b_{jk}^{(l)}|, we get the claim.

2.9.3. The goal of this section is to give an improved version of Siegel’s lemma for
a given number field K of degree d . Often, the simple version above is sufficient
for applications and the reader may skip the remaining part of this section in a first
reading.
After stating the result in Theorem 2.9.4, we give a series of immediate corollaries,
which are quite useful in practice; even in the case K = Q , we get an improve-
ment of the elementary form of Siegel’s lemma proved above. The proof of the
generalized Siegel’s lemma will be done in several steps. First, we recall for com-
pleteness some basic results in the geometry of numbers, referring to Appendix
C for the proofs. Then we use Minkowski's second main theorem to construct a
“small” basis in the range of an N × M matrix A of rank M. In order to find a
“small” basis of solutions of our original equation Ax = 0, we apply this result
to the matrix A′ whose columns are formed by any basis of solutions. As A and
A′ have the same Arakelov height, we will get the generalized Siegel's lemma.
Finally, we give a relative version where the entries are in a finite extension F of
K but the solutions are required to be in K .

We now state Siegel’s lemma over a number field K of degree d and discriminant
DK/Q , in the form given by E. Bombieri and J.D. Vaaler [35].
Theorem 2.9.4. Let A be an M × N matrix of rank M with entries in K. Then
the K-vector space of solutions of Ax = 0 has a basis x_1, . . . , x_{N−M}, contained
in O_K^N, such that
\[ \prod_{l=1}^{N-M} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-M}{2d}}\, H_{\mathrm{Ar}}(A). \]

Remark 2.9.5. Here H(x) is the multiplicative homogeneous height, so that we
consider x as a point in P^{N−1}(K). Hence there is no deep information contained
in the statement that we can choose our solutions in O_K^N, because we can replace
any solution by a scalar multiple without changing the height.
Remark 2.9.6. The Arakelov height of an M × N matrix of rank M has been
introduced in Definition 2.8.11. We recall that, if W is the subspace of K N
spanned by the rows of A , then HAr (A) is the multiplicative Arakelov height of
the line ∧M W in the projective space P(∧M K N ). The difference with the usual
height consists only in using the L2 -local height, instead of the L∞ -local height,
at the archimedean places.
Sometimes it is not practical to assume always that A has maximal rank M. This
can be obviated as follows. For any M × N matrix A of rank R with entries in K,
let W be the subspace of K^N spanned by the rows of A and define
\[ H_{\mathrm{Ar}}^{\mathrm{row}}(A) := H_{\mathrm{Ar}}(W) = H_{\mathrm{Ar}}(\wedge^{R} W), \]
where the line ∧^R W is viewed as a point of the projective space P(∧^R K^N).
Corollary 2.9.7. Let A be an M × N matrix over K of rank R. Then there is a
basis x_1, . . . , x_{N−R} of the kernel of A, contained in O_K^N, such that
\[ \prod_{l=1}^{N-R} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-R}{2d}}\, H_{\mathrm{Ar}}^{\mathrm{row}}(A). \]
Proof: There is an R × N submatrix A′ of rank R. Then A and A′ have the same
kernel and the same Arakelov height. The result follows by applying Theorem
2.9.4 to A′.
2.9.8. If A_m is the m-th row of A, then Remark 2.8.9 shows that
\[ H_{\mathrm{Ar}}^{\mathrm{row}}(A) \le \prod_m H_{\mathrm{Ar}}(A_m), \]
where m ranges over R linearly independent rows of A. We denote by H(A)
the multiplicative height of the matrix A as a point of P_K^{NM−1}. Then the obvious
inequality
\[ H_{\mathrm{Ar}}(A_m) \le \sqrt{N}\, H(A) \]
proves
Corollary 2.9.9. With the same assumptions as in Corollary 2.9.7, it holds that
\[ \prod_{l=1}^{N-R} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-R}{2d}}\, \bigl(\sqrt{N}\, H(A)\bigr)^{R}. \]
In particular, there is a non-zero solution x ∈ O_K^N of Ax = 0 with
\[ H(x) \le |D_{K/\mathbb{Q}}|^{\frac{1}{2d}}\, \bigl(\sqrt{N}\, H(A)\bigr)^{\frac{R}{N-R}}. \]
If we compare with the original Siegel's lemma in the case K = Q, we have
improved the result by replacing the factor N by √N in the final estimate.
2.9.10. The proof of Theorem 2.9.4 uses geometry of numbers over the adeles,
and we shall refer to Appendix C for details. Here we will limit ourselves to basic
definitions and results.
Let Kv be the completion of the number field K with respect to the place v ∈ MK
and let v|p ∈ MQ . Then Kv is a locally compact group (see 1.2.12) with Haar
measure uniquely determined up to a scalar. We normalize this Haar measure as
follows:
(a) if v is non-archimedean, β_v denotes the Haar measure on K_v normalized
so that
\[ \beta_v(R_v) = |D_{K_v/\mathbb{Q}_p}|_p^{1/2}, \]
where R_v is the valuation ring of K_v and D_{K_v/Q_p} is the discriminant;
(b) if Kv = R , then βv is the ordinary Lebesgue measure;
(c) if Kv = C , then βv is twice the ordinary Lebesgue measure.
2.9.11. Let N be a positive integer. For every archimedean v ∈ MK , let Sv be
a non-empty convex, symmetric, open subset of KvN . By symmetric, we mean
Sv = −Sv . For each non-archimedean v ∈ MK , let Sv be a Kv -lattice in KvN ,
namely a non-empty compact and open Rv -submodule of KvN . We assume that
S_v = R_v^N for all but finitely many v. Then the set
\[ \Lambda := \{x \in K^{N} \mid x \in S_v \text{ for every non-archimedean } v\} \]
is a K-lattice in K^N, that is a finitely generated O_K-module, which generates
K^N as a vector space (cf. C.2.6). Moreover, the image Λ_∞ of Λ under the
canonical embedding of K^N into E_∞ := ∏_{v|∞} K_v^N is an R-lattice in E_∞
(cf. C.2.7). This is the familiar notion of a lattice, meaning that Λ_∞ is a
discrete subgroup of the R-vector space E_∞ and that E_∞/Λ_∞ is compact.
Definition 2.9.12. The n-th successive minimum of the non-empty convex,
symmetric, open subset S_∞ := ∏_{v|∞} S_v of E_∞ with respect to the lattice Λ_∞ is
\[ \lambda_n := \inf\{t > 0 \mid tS_\infty \text{ contains } n \text{ } K\text{-linearly independent vectors of } \Lambda_\infty\}. \]
The adelic version of Minkowski's second theorem is the following (cf. Theorem C.2.11).
Theorem 2.9.13. The successive minima satisfy
\[ (\lambda_1 \lambda_2 \cdots \lambda_N)^{d} \prod_{v \in M_K} \beta_v(S_v) \le 2^{dN}. \]

2.9.14. For the proof of Siegel's lemma 2.9.4 we choose the sets S_v as follows.
First, let Q_v^N be the unit cube in K_v^N of volume 1 with respect to the Haar
measure β_v. Explicitly, this is given by
\[
Q_v^{N} :=
\begin{cases}
\max_n \|x_n\|_v < \tfrac{1}{2} & \text{if } v \text{ is real,} \\
\max_n \|x_n\|_v < \tfrac{1}{2\pi} & \text{if } v \text{ is complex,} \\
\max_n \|x_n\|_v \le 1 & \text{if } v \text{ is non-archimedean,}
\end{cases}
\]
using the normalization from 1.3.6 with respect to K/Q. Let A be an N × M
matrix of rank M with entries in K. Then we set
\[ S_v := \{y \in K_v^{M} \mid Ay \in Q_v^{N}\}. \]
If v is archimedean, then S_v is a non-empty, convex, symmetric, bounded open
subset of K_v^M; with respect to the injective map x ↦ Ax, the image of S_v is a
linear slice of the cube Q_v^N. If instead v is non-archimedean, then it is easy to
show, as we will verify below, that S_v is a K_v-lattice in K_v^M.
In order to apply Minkowski’s second theorem, we need to estimate volumes from
below.
Proposition 2.9.15. With the notation of 2.9.14 and for an archimedean v ∈ M_K,
it holds
\[ \beta_v(S_v) \ge \|\det(A^{*}A)\|_v^{-1/2}, \]
where A^* = \overline{A}^t is the transpose conjugate of A.
Proof: If v is real, this is Theorem C.3.8 for n_1 = · · · = n_N = 1. If v is complex,
we write A = U + iV and y = u + iv for real U, V, u, v. We identify K_v^M with
R^{2M} by means of y = u + iv and we proceed similarly with K_v^N = R^{2N}. Then
the map y ↦ Ay is given by the real 2N × 2M matrix
\[ A' = \begin{pmatrix} U & -V \\ V & U \end{pmatrix} \]
and also
\[ Q_v^{N} = \Bigl\{(u, v) \in \mathbb{R}^{2N} \Bigm| u_j^{2} + v_j^{2} < \frac{1}{2\pi}\Bigr\}. \]
Now we apply Theorem C.3.8 with n_1 = · · · = n_N = 2 to get
\[ \beta_v(S_v) \ge \det(A'^{\,t} A')^{-1/2}. \]
Since A ↦ A′ is a ring homomorphism from the complex N × M matrices to the
real 2N × 2M matrices, we conclude that
\[ \det(A'^{\,t} A') = \det((A^{*}A)') = \det(A^{*}A)^{2}, \]
thereby proving the claim.


The corresponding result for non-archimedean v ∈ MK is more precise, uses the
discriminant DKv /Qp (see B.1.14), and is easier to prove. We have:
Proposition 2.9.16. Let v be a non-archimedean place of K lying over the prime
p. Then, with the notation of 2.9.14, it holds
\[ \beta_v(S_v) = |D_{K_v/\mathbb{Q}_p}|_p^{M/2}\, \Bigl(\max_I \|\det(A_I)\|_v\Bigr)^{-1}, \]
where I ranges over all subsets of {1, . . . , N} of cardinality M and A_I is the
M × M matrix formed by the i-th rows of A with i ∈ I.
Proof: Choose a subset J of cardinality M such that ‖det(A_J)‖_v is maximal.
Without loss of generality, we may assume that J = {1, . . . , M}. Then W =
A A_J^{-1} has the form
\[ W = \begin{pmatrix} I_M \\ W' \end{pmatrix}, \]
where I_M is the M × M unit matrix and W′ is an (N − M) × M matrix. We
have
\[ \|\det(W_I)\|_v \le 1 \]
for any subset I of cardinality M, because ‖det(A_J)‖_v was chosen to be
maximal. Therefore, applying this with I = {1, . . . , l − 1, l + 1, . . . , M, M + j}, we
see that
\[ \|w_{M+j,l}\|_v = \|\det(W_I)\|_v \le 1. \]
This means that all entries of W are in the valuation ring R_v and proves
\[ A_J S_v = \{y' \in K_v^{M} \mid W y' \in Q_v^{N}\} = R_v^{M}. \]
Under the linear transformation y = A_J^{-1} y′ on K_v^M, the volume transforms by a
factor ‖det(A_J)‖_v^{-1} (cf. C.1.3). Hence
\[ \beta_v(S_v) = \|\det(A_J)\|_v^{-1}\, \beta_v(R_v^{M}) = \|\det(A_J)\|_v^{-1}\, |D_{K_v/\mathbb{Q}_p}|_p^{M/2}, \]
completing the proof.


Remark 2.9.17. The proof of the above proposition shows immediately that Sv
is a Kv -lattice in KvM for every non-archimedean v and Sv = RvM for all but
finitely many v .

We are now ready to prove:


Proposition 2.9.18. Let A be an N × M matrix of rank M with entries in K.
Then the image of A has a basis x_1, . . . , x_M with
\[ \prod_{m=1}^{M} H(x_m) \le \Bigl(\frac{2}{\pi}\Bigr)^{\frac{Ms}{d}} |D_{K/\mathbb{Q}}|^{\frac{M}{2d}}\, H_{\mathrm{Ar}}(A), \]
where s is the number of complex places of K.
Proof: We apply Minkowski's second theorem from 2.9.13 to the sets S_v defined
in 2.9.14. Then Propositions 2.9.15, 2.9.16, 2.8.8, show that
\[ \prod_v \beta_v(S_v) \ge \Bigl(\prod_{v \nmid \infty} |D_{K_v/\mathbb{Q}_p}|_p^{M/2}\Bigr) H_{\mathrm{Ar}}(A)^{-d}. \]
As shown in (C.4) on page 606, we have
\[ \prod_{v \nmid \infty} |D_{K_v/\mathbb{Q}_p}|_p = |D_{K/\mathbb{Q}}|^{-1}, \]
whence
\[ \prod_v \beta_v(S_v) \ge |D_{K/\mathbb{Q}}|^{-M/2}\, H_{\mathrm{Ar}}(A)^{-d}. \]
Now Minkowski's second theorem yields
\[ \lambda_1 \cdots \lambda_M \le 2^{M}\, |D_{K/\mathbb{Q}}|^{\frac{M}{2d}}\, H_{\mathrm{Ar}}(A). \tag{2.28} \]
It remains to connect the successive minima with the basis we want to find. With
our specific sets S_v, we form the K-lattice Λ as in 2.9.11 and identify Λ with its
image Λ_∞ in E_∞. Let y ∈ K^M be a lattice point in λS for some λ > 0 and let
x := Ay. Then the definition of S = ∏_{v|∞} S_v gives max_n ‖x_n‖_v < λ/2 if v is
real, max_n ‖x_n‖_v < λ^2/(2π) if v is complex, and max_n ‖x_n‖_v ≤ 1 if v is not
archimedean. Thus we have
\[ H(Ay) < \frac{\lambda}{2} \Bigl(\frac{2}{\pi}\Bigr)^{\frac{s}{d}}. \tag{2.29} \]
There are linearly independent lattice points y_1, . . . , y_M ∈ K^M such that y_m ∈
λ_m S, for m = 1, . . . , M. Then (2.28) and (2.29) prove what we want, with
x_m = A y_m.
Proof of Theorem 2.9.4: Let A′ be an N × (N − M) matrix whose columns form
a basis of the kernel of A. Clearly, A′ has rank N − M and the image of A′ is the
same as the kernel of A. By Corollary 2.8.12, A and A′ have the same Arakelov
height. Now Proposition 2.9.18 applied to A′ gives a basis x_1, . . . , x_{N−M} of the
kernel of A such that
\[ \prod_{l=1}^{N-M} H(x_l) \le \Bigl(\frac{2}{\pi}\Bigr)^{\frac{(N-M)s}{d}} |D_{K/\mathbb{Q}}|^{\frac{N-M}{2d}}\, H_{\mathrm{Ar}}(A). \]
Since 2/π < 1, this completes the proof of Theorem 2.9.4.
Finally, we give a relative version of Siegel’s lemma.
Theorem 2.9.19. Let K be a number field of degree d and discriminant D_{K/Q}
and let F be a finite-dimensional field extension of K of degree r := [F : K].
Let A be an M × N matrix with entries in F and assume rM < N. Then there
exist N − rM K-linearly independent vectors x_l ∈ O_K^N such that
\[ A x_l = 0, \qquad l = 1, 2, \dots, N - rM \]
and
\[ \prod_{l=1}^{N-rM} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-rM}{2d}} \prod_{i=1}^{M} H_{\mathrm{Ar}}(A_i)^{r}, \]
where A_i is the i-th row of A.
Proof: Let ω_1, . . . , ω_r be a basis of F/K. For the entries of A = (a_mn), we have
\[ a_{mn} = \sum_{j=1}^{r} a_{mn}^{(j)}\,\omega_j \]
for uniquely determined a_mn^{(j)} ∈ K. Let A^{(j)} be the M × N matrix with entries
a_mn^{(j)}. Then for x ∈ K^N, the equation Ax = 0 is equivalent to the system of
equations A^{(j)} x = 0, for j = 1, . . . , r. Let A′ denote the associated rM × N
matrix. Then we are looking for solutions of A′x = 0, x ∈ K^N. Let σ_1, . . . , σ_r
be the distinct embeddings of F into K̄ over K and let Ω be the r × r block matrix
with entries the M × M matrices Ω_ij = σ_i(ω_j) I_M, where I_M is the M × M unit
matrix. Thus Ω is an rM × rM matrix built up by r^2 blocks of M × M matrices.
By Remark B.1.15, we have D_{F/K} = det(σ_i(ω_j))^2, whence Ω is invertible and
its inverse is again formed by r^2 blocks of multiples of I_M. From our definitions,
we also see that
\[ A'' := \begin{pmatrix} \sigma_1 A \\ \vdots \\ \sigma_r A \end{pmatrix} = \Omega A'. \tag{2.30} \]
We apply Corollary 2.9.7 to A′. If R denotes the rank of A′ (and hence also of
A″), we get a basis x_1, . . . , x_{N−R} of the kernel of A′ over K, contained in O_K^N,
such that
\[ \prod_{l=1}^{N-R} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-R}{2d}}\, H_{\mathrm{Ar}}^{\mathrm{row}}(A'). \]
By (2.30) and the fact that Ω is a non-singular matrix, it is also clear that A′ and
A″ have the same kernels and
\[ H_{\mathrm{Ar}}^{\mathrm{row}}(A') = H_{\mathrm{Ar}}^{\mathrm{row}}(A''). \]
By 2.9.8, we conclude that
\[ \prod_{l=1}^{N-R} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-R}{2d}} \prod_i H_{\mathrm{Ar}}(A''_i), \]
where i ranges over R linearly independent rows of A″. We rearrange our basis
x_l by increasing height. Then
\[
\prod_{l=1}^{N-rM} H(x_l) \le \Bigl(\prod_{l=1}^{N-R} H(x_l)\Bigr)^{\frac{N-rM}{N-R}}
\le |D_{K/\mathbb{Q}}|^{\frac{N-rM}{2d}} \prod_{i=1}^{rM} H_{\mathrm{Ar}}(A''_i).
\]
The theorem follows easily using H_Ar(σ_j A_i) = H_Ar(A_i).

2.10. Bibliographical notes

An account of the geometric theory of heights can be found in Lang’s monograph


[169]. A. Weil [324] was the first in his thesis in 1927 to study heights in a geo-
metric setting and their functorial properties, and Siegel used general heights asso-
ciated to ample divisors on curves in his work on integral points on curves. Weil’s
treatment again is based on Hilbert’s Nullstellensatz (an effective version of the
Nullstellensatz is in [195], Th.IV).
Our exposition differs from Lang’s in several respects. We use systematically the
notion of presentation of a Cartier divisor, so to emphasize how local heights are
determined not just by the divisor itself but in fact from other data. Global heights

are treated first only in the case of projective varieties, using the join operation
on projective embeddings and the fact that very ample line bundles generate the
Picard group to obtain the group structure on global heights. This explicit approach
to heights is less elegant than a more abstract one, but is sufficiently constructive
to be usable for the explicit estimates in Section 2.5, which appear to be new.
The general statement of Northcott’s theorem is in a basic foundational paper by
A. Weil [328], following earlier work of D.G. Northcott [227], [228]. The general
theory of local heights is due to A. Néron [218]. We present it in Section 2.7, but
instead of using his quasi-functions, we use the modern language of metrized line
bundles promoted by Arakelov theory.
A systematic theory of heights for subspaces of linear spaces was developed for
the first time by W.M. Schmidt in [263]. Our exposition is mainly based on [35].
Propositions 2.8.8 and 2.8.10 were communicated to us by Oesterlé. The distorsion
factor 2.8.15 is introduced in E. Bombieri, A.J. Van der Poorten, and J.D. Vaaler
[37]; the projective distance is of course much older.
Already in 1909 A. Thue [298] used the pigeon-hole principle to find small
integer solutions to linear systems of equations. C.L. Siegel [283] gave the bound
(NB)^{M/(N−M)}, but it turns out that the cleaner bound ⌊(NB)^{M/(N−M)}⌋
requires only a minor modification in his proof, see A. Baker's monograph [14]. The
improvement ⌊(√N B)^{M/(N−M)}⌋ obtained here seems to be of no consequence
for the applications we will make in this book.
Minkowski’s second theorem in the adelic setting is due to R.B. McFeat [198]. If
we ask for solutions in Q, then we can get a bound independent of the discriminant
DK/Q in Theorem 2.9.4, see D. Roy and J.L. Thunder [247], [248] (note that this
cannot be done for solutions in the field K ). This is quite important in some cases,
see the bibliographical notes to Chapter 7.
3 LINEAR TORI

3.1. Introduction

This short chapter contains simple but basic material about the algebraic group
G_m^n over a field K of characteristic 0. Rather than developing a theory in a
more general context (as will be done in Section 8.2) and deriving our results as
special cases, our aim in this section has been to give a down-to-earth elementary
treatment of G_m^n, its subgroups and maximal subgroups of subvarieties of G_m^n.
In order to read this chapter, the reader should be familiar with basic concepts
about varieties as provided by the first sections of Appendix A. We will also apply
Siegel’s lemma in 2.9.4 over Q to get effective bounds for certain normalization
matrices. The theory of linear tori will be used in Chapter 4.

3.2. Subgroups and lattices

3.2.1. As an affine variety, we identify G := G_m^n with the Zariski open subset
\[ x_1 x_2 \cdots x_n \ne 0 \]
of affine space A^n_K, with the obvious multiplication
(x1 , x2 , . . . , xn ) · (y1 , y2 , . . . , yn ) = (x1 y1 , x2 y2 , . . . , xn yn ).
The element 1n = (1, 1, . . . , 1) is the identity of the group structure.
3.2.2. An algebraic subgroup of G is a Zariski closed subgroup, and a linear
torus H is an algebraic subgroup which is geometrically irreducible. Note that
we require that an algebraic subgroup is always defined over the fixed ground field
K of characteristic 0. We will later show that any algebraic subgroup is defined
by polynomials with coefficients in Q (Corollary 3.2.15).
Let H_1, H_2 be algebraic subgroups of G_m^{n_1} and G_m^{n_2}. Then ϕ : H_1 → H_2 is
called a homomorphism (of algebraic subgroups) if ϕ is a morphism of algebraic
varieties which is also a group homomorphism. In Corollary 3.2.8, we prove that
a linear torus in G of dimension r is isomorphic to G_m^r.
A torus coset is simply a coset gH of a linear torus H of positive dimension.

A torsion coset is a coset εH , where H is a linear torus and ε a torsion point


in G , i.e. a point of finite order in G . The linear torus H may be trivial, hence
a torsion point is the simplest example of a torsion coset. For a torus coset (resp.
torsion coset), we do not assume that g (resp. ε) is a K -rational point, hence a
torus coset or a torsion coset need not be defined over K .
3.2.3. For i ∈ Z^n, we abbreviate x^i = x_1^{i_1} · · · x_n^{i_n}. Let e_1 = (1, 0, . . . , 0)^t,
. . . , e_n = (0, 0, . . . , 1)^t be column vectors (t is the transpose), which we identify
with the usual basis of Z^n. Let A be an n × n matrix with columns Ae_i =
(a_{1i}, . . . , a_{ni})^t ∈ Z^n for i = 1, . . . , n and let ϕ_A : G −→ G be the map defined
by
\[ \varphi_A(x) := (x^{Ae_1}, \dots, x^{Ae_n}) = (x_1^{a_{11}} \cdots x_n^{a_{n1}}, \dots, x_1^{a_{1n}} \cdots x_n^{a_{nn}}). \]
Then we have ϕ_{AB} = ϕ_B ◦ ϕ_A, making it clear that, if det(A) = ±1, then ϕ_A
is an isomorphism with inverse ϕ_{A^{-1}}. The group of such matrices is denoted by
GL(n, Z) and SL(n, Z) is the subgroup with det(A) = 1.
Definition 3.2.4. An isomorphism ϕA is classically called a monoidal trans-
formation.
3.2.5. If a = a_1 e_1 + · · · + a_n e_n ∈ Z^n, we have
\[ x^{Aa} = (x^{Ae_1})^{a_1} \cdots (x^{Ae_n})^{a_n} = \varphi_A(x)^{a}. \]
If we apply this with A^{-1}, Aa, in place of A, a, we find
\[ x^{a} = x^{A^{-1}Aa} = \varphi_{A^{-1}}(x)^{Aa}. \tag{3.1} \]
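The composition rule ϕ_{AB} = ϕ_B ◦ ϕ_A and the inverse ϕ_{A^{-1}} are easy to try out on rational points of G. The sketch below is an illustration with hypothetical function names; points are tuples of non-zero rationals and matrices are lists of rows with integer entries.

```python
from fractions import Fraction


def monomial(x, a):
    """x^a = x_1^{a_1} * ... * x_n^{a_n} for a point x with non-zero rational coordinates."""
    result = Fraction(1)
    for xi, ai in zip(x, a):
        result *= Fraction(xi) ** ai
    return result


def phi(A, x):
    """phi_A(x) = (x^{A e_1}, ..., x^{A e_n}), where A e_i is the i-th column of A."""
    n = len(A)
    return [monomial(x, [A[k][i] for k in range(n)]) for i in range(n)]


def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 1], [0, 1]]
A_inv = [[1, -1], [0, 1]]
B = [[2, 1], [1, 1]]
x = [Fraction(2), Fraction(3)]

composed = phi(B, phi(A, x))     # phi_B applied after phi_A
direct = phi(matmul(A, B), x)    # phi_{AB}
```

For this choice both `composed` and `direct` equal (24, 12), and `phi(A_inv, phi(A, x))` returns the original point (2, 3).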

3.2.6. Let Λ be a subgroup of Z^n. We say that Λ is a lattice if it is a subgroup
of rank n. If Λ is a subgroup, it spans a linear space V_Λ := Λ ⊗_Z R ⊂ R^n.
Then Λ̃ = V_Λ ∩ Z^n is a subgroup which contains Λ as a subgroup of finite index
ρ(Λ) := [Λ̃ : Λ]. The subgroup is called primitive if ρ(Λ) = 1.
It is easy to see that the subgroup Λ determines an algebraic subgroup
\[ H_\Lambda := \{x \in G \mid x^{\lambda} = 1 \ \ \forall \lambda \in \Lambda\} \]
of G . The following result describes the structure of HΛ as a direct product of
algebraic subgroups F and MΛ , which means that multiplication gives an iso-
morphism F × MΛ → HΛ of algebraic subgroups.
Proposition 3.2.7. Let Λ be a subgroup of Zn of rank n − r . Then HΛ is an
algebraic subgroup of G of dimension r , which is the direct product of F and
MΛ , where F is a finite algebraic subgroup of order ρ(Λ) and MΛ ⊂ HΛ is a
linear torus equal to the connected component of the identity of HΛ .
Proof: By the theorem of elementary divisors (N. Bourbaki [49], Ch.VII, §4, no.3,
Th.1), there is a basis b_1, . . . , b_n of Z^n and elements λ_1, . . . , λ_{n−r} ∈ Z \ {0}
such that λ_1 b_1, . . . , λ_{n−r} b_{n−r} is a basis of Λ. Using a monoidal transformation
to change coordinates, we may assume that b_1, . . . , b_n is the standard basis. Then
H_Λ = F × G_m^r with
\[ F = \{x \in \mathbb{G}_m^{n-r} \mid x_1^{\lambda_1} = 1, \dots, x_{n-r}^{\lambda_{n-r}} = 1\} \]
and all assertions are easy to prove.
The following two corollaries are immediate from Proposition 3.2.7 and its proof.
Corollary 3.2.8. For a subgroup $\Lambda$ of $\mathbb{Z}^n$ of rank $n - r$, the following properties
are equivalent:
(a) $H_\Lambda$ is a linear torus;
(b) $H_\Lambda$ is isomorphic to $\mathbb{G}_m^r$;
(c) $H_\Lambda$ is irreducible;
(d) $\Lambda$ is primitive.
Corollary 3.2.9. If $\Lambda \subset \Lambda'$ are subgroups of $\mathbb{Z}^n$ of the same rank $n - r$ and
if $H_\Lambda = F M_\Lambda$ is a direct product decomposition as in Proposition 3.2.7, then
$M_\Lambda = M_{\Lambda'}$ and there is a direct product $H_{\Lambda'} = F' M_{\Lambda'}$ in the sense of Proposition
3.2.7 such that $F' \subset F$.
For the purpose of obtaining effective estimates, we need to bound the entries of
the matrix $A$ used in the proof of the preceding proposition. For the rest of this
chapter, $\|x\|$ will denote the $\ell^1$-norm of a vector $x$. For a matrix $A$, we denote
by $\|A\|$ the maximum of the $\ell^1$-norms of its columns.
Proposition 3.2.10. Let $\Lambda$ be a subgroup of $\mathbb{Z}^n$ of rank $n - r$ and suppose that
$\Lambda$ has $n - r$ independent vectors of norm at most $d$. Then there are a finite
subgroup $\Phi$ of $\mathbb{G}_m^{n-r}$ and a matrix $A \in SL(n, \mathbb{Z})$ with $\|A\| \le n^3 d^{n-r}$ and
$\|A^{-1}\| \le n^{2n-1} d^{(n-1)^2}$, such that $\varphi_A(\Phi \times \mathbb{G}_m^r) = H_\Lambda$.

We use a lemma of Mahler, which gives control of a basis of a lattice from the
knowledge of a maximal set of independent vectors.
Lemma 3.2.11. Let $M$ be a subgroup of $\mathbb{Z}^n$ of rank $m$ and let $\lambda_1, \dots, \lambda_m \in M$
be independent vectors with norm at most $d$, generating a subgroup $\Lambda \subset M$. For
$i = 1, \dots, m$, let $V_i$ be the real span of $\lambda_1, \dots, \lambda_i$ and define $M_i = M \cap V_i$,
hence $M = M_m$. Then:
(a) $[M : \Lambda] \le d^m$;
(b) there are $v_1, \dots, v_m \in M$ such that for every $i$ the vectors $v_1, \dots, v_i$
form a basis of $M_i$ and $\|v_i\| \le md$.
Proof: Since $M_i$ is a discrete subgroup of $\mathbb{R}^n$, it has a positive $\ell^1$-distance from
$V_{i-1}$ and there is $v_i = t_1 \lambda_1 + \dots + t_i \lambda_i \in M_i$ of minimal distance to $V_{i-1}$ with
$t_k \in \mathbb{R}$. We may replace $v_i$ by any vector in $v_i + M_{i-1}$, hence we may assume
$|t_k| \le \frac{1}{2}$ for $k = 1, \dots, i - 1$. We note that $v_i$ and $t_i \lambda_i$ have the same distance
to $V_{i-1}$, hence we must have $0 < |t_i| \le 1$. Therefore, we have
\[ \|v_i\| \le \|\lambda_1\| + \dots + \|\lambda_i\| \le md. \]
The vectors $v_i$, $i = 1, \dots, m$, form the required basis of $M$.
For the assertion about the index, we may assume that $M = \widetilde{\Lambda}$. Then $[M : \Lambda] =
\rho(\Lambda)$. The vectors $\lambda_j$, $j = 1, \dots, m$, have norm bounded by $d$ and generate $\Lambda$ as
a lattice in $V_m$. A fortiori, these vectors have euclidean length bounded by $d$ and
span a parallelepiped in $V_m$ of volume at most $d^m$. Thus $\rho(\Lambda) \le d^m$.
Remark 3.2.12. The proof yields the better bound $\|v_i\| \le \max(1, i/2)\, d$.
Proof of Proposition 3.2.10: Let $H = H_\Lambda$ and let $\lambda_i \in \Lambda$, $i = 1, \dots, n - r$, be
independent vectors of norm at most $d$. Define
\[ M := \{x \in \mathbb{Z}^n \mid \lambda_i^t \cdot x = 0, \ i = 1, \dots, n - r\}, \]
where $\lambda^t$ is the transpose of $\lambda$. Then $M$ is a primitive subgroup of $\mathbb{Z}^n$ of rank $r$.
By Siegel’s lemma (use Theorem 2.9.4 and 2.9.8), there are independent vectors
$x_1, \dots, x_r$ in $M$ with height
\[ H(x_1) \cdots H(x_r) \le d^{n-r}. \]
Since $H(x_i) \ge 1$ ($x_i \ne 0$ has integer coordinates), we get a fortiori $\|x_i\| \le
n d^{n-r}$ for $i = 1, \dots, r$.
For some choice of unit vectors, $x_1, \dots, x_r$ and $e_{h_1}, \dots, e_{h_{n-r}}$ are independent
in $\mathbb{Z}^n$. By Lemma 3.2.11, we obtain a basis $v_1, \dots, v_n$ of $\mathbb{Z}^n$ such that $\|v_i\| \le
n^2 d^{n-r}$ and moreover $v_1, \dots, v_r$ is a basis of $M$, because $M$ is primitive.
The matrix $A = (v_n, \dots, v_1)^t$ has determinant $\pm 1$, satisfies $\|A\| \le n^3 d^{n-r}$,
and we have
\[ A\Lambda = (\Lambda_0, \underbrace{0, 0, \dots, 0}_{r \text{ times}})^t \]
with $\Lambda_0$ a lattice in $\mathbb{Z}^{n-r}$. Hence $H_{\Lambda_0}$ is a finite subgroup of $\mathbb{G}_m^{n-r}$ and
$\varphi_A(H_{\Lambda_0} \times \mathbb{G}_m^r) = H_\Lambda$ by (3.1) on page 83. Replacing $v_1$ by $-v_1$, we may assume
$\det(A) = 1$.
Finally, $A^{-1}$ is the matrix whose entries are the cofactors of $A$. By Hadamard’s
inequality (see footnote in the course of the proof of Proposition 1.6.9), these
entries are majorized by the product of the norms of $n - 1$ row vectors of $A$, hence
by $(n^2 d^{n-r})^{n-1}$. This gives $\|A^{-1}\| \le n^{2n-1} d^{(n-1)^2}$.
3.2.13. Let X be a Zariski closed subvariety of G . We say that an algebraic
subgroup H of G is maximal in X if H ⊂ X and H is not contained in a larger
algebraic subgroup in X .
Proposition 3.2.14. Let $X$ be a Zariski closed subvariety of $G$, defined by
polynomial equations $f_i(x) := \sum_\lambda a_{i,\lambda} x^\lambda = 0$ ($i = 1, \dots, m$) and let $L_i$ be the set
of exponents appearing in the monomials in $f_i$. Let $H$ be a maximal algebraic
subgroup of $G$ contained in $X$. Then $H = H_\Lambda$, where $\Lambda$ is generated by vectors
of type $\lambda_i - \lambda_i'$ with $\lambda_i, \lambda_i' \in L_i$, for $i = 1, \dots, m$.
Proof: We may suppose that $X$ is not $G$, otherwise there is nothing to prove. Let
$\overline{K}$ be an algebraic closure of $K$. The restriction of the monomial $x^\lambda$ to $H$ is a
character $\chi_\lambda$ of $H$ with values in $\overline{K}^\times$. For any such character $\chi$, define
\[ L_{i,\chi} = \{\lambda \mid \lambda \in L_i, \ \chi_\lambda = \chi\}. \]
Since $H \subset X$, we have linear relations
\[ \sum_\chi \Big( \sum_{\lambda \in L_{i,\chi}} a_{i,\lambda} \Big) \chi = 0. \]
By Artin’s theorem on linear independence of characters ([173], Ch.VIII, Th.4.1),
or directly using the Vandermonde determinant, this must be a trivial relation and
hence
\[ \sum_{\lambda \in L_{i,\chi}} a_{i,\lambda} = 0 \tag{3.2} \]
for every $i$, $\chi$. By definition of $L_{i,\chi}$, the group $H$ is contained in the subgroup
given by the system of equations
\[ x^\lambda = x^{\lambda'}, \qquad \lambda, \lambda' \in L_{i,\chi} \]
for varying $i$, $\chi$. Conversely, by (3.2) this subgroup is contained in $X$ and hence
coincides with $H$ because $H$ is maximal in $X$.
The following consequence shows that algebraic subgroups of G always deter-
mine a subgroup of Zn .
Corollary 3.2.15. Every algebraic subgroup H of G is of type HΛ for some
subgroup Λ of Zn .
Proof: It suffices to apply the above Proposition 3.2.14 choosing X = H . 
Corollary 3.2.16. Let H be an algebraic subgroup of G . Then H is a smooth
variety over K . If H is irreducible, then H is a linear torus.
Proof: By Corollary 3.2.15, there is a subgroup $\Lambda$ of $\mathbb{Z}^n$ with $H = H_\Lambda$. By
Proposition 3.2.7, we see that $H_{\overline{K}}$ is smooth and hence $H$ is smooth by A.7.14.
Moreover, Corollary 3.2.8 proves the last claim.
Another consequence is the converse of 3.2.3.
Proposition 3.2.17. If $\varphi : \mathbb{G}_m^n \to \mathbb{G}_m^m$ is a homomorphism of linear tori with
coordinates $x$ and $y$, then there are $\mu_1, \dots, \mu_m \in \mathbb{Z}^n$ with
\[ \varphi(x) = (x^{\mu_1}, \dots, x^{\mu_m}). \]
In particular, any automorphism of $\mathbb{G}_m^n$ is a monoidal transformation.
Proof: It suffices to prove the claim for $m = 1$, that is for a character $\varphi : \mathbb{G}_m^n \to
\mathbb{G}_m$. Consider its graph $X \subset \mathbb{G}_m^n \times \mathbb{G}_m \cong \mathbb{G}_m^{n+1}$, namely the locus of points
$(x, \varphi(x))$. We denote by $(x, y)$ the standard coordinates on $\mathbb{G}_m^{n+1} = \mathbb{G}_m^n \times
\mathbb{G}_m$. Then $X$ is a linear torus of codimension 1 and a fortiori coincides with its
maximal algebraic subgroup. By Corollaries 3.2.15 and 3.2.8, $X$ is defined by a
non-trivial single monomial equation $(x, y)^\lambda = 1$, with $\lambda \in \mathbb{Z}^{n+1}$. This equation
is identically satisfied by setting $y = \varphi(x)$, therefore
\[ \varphi(x) = x^\mu \]
for some $\mu \in \mathbb{Q}^n$. Since $\varphi$ is a morphism, we must have $\mu \in \mathbb{Z}^n$, and the result
follows.
Proposition 3.2.18. Let $\varphi : H_1 \to H_2$ be a homomorphism of algebraic
subgroups $H_1$, $H_2$ of $\mathbb{G}_m^n$ and $\mathbb{G}_m^m$. Then $\varphi(H_1)$ is an algebraic subgroup of $\mathbb{G}_m^m$.
Proof: It is enough to prove the claim for $K$ algebraically closed, because
Corollary 3.2.15 shows that every algebraic subgroup is defined by polynomials with
coefficients in $\mathbb{Q}$. We have to prove that $\varphi(H_1)$ is Zariski closed in $H_2$. By
Proposition 3.2.7, we may assume that $H_1$ is a linear torus and so we may
suppose that $H_1 = \mathbb{G}_m^n$ (Corollary 3.2.8). Replacing $H_2$ by the Zariski closure of
$\varphi(H_1)$, which is a linear torus of dimension $r$ and hence isomorphic to $\mathbb{G}_m^r$, it
remains to show that a dominant homomorphism $\varphi : \mathbb{G}_m^n \to \mathbb{G}_m^m$ of linear tori is
surjective.
By Proposition 3.2.17, there are $\mu_1, \dots, \mu_m \in \mathbb{Z}^n$ with $\varphi(x) = (x^{\mu_1}, \dots, x^{\mu_m})$.
Let $B$ be the matrix with columns $\mu_1, \dots, \mu_m$; then, generalizing 3.2.3, we write $\varphi(x) =
\varphi_B(x)$. Since $\varphi(\mathbb{G}_m^n)$ is dense, it is immediate that $B$ has rank $m$. By
linear algebra, there are $R \in GL(m, \mathbb{Q})$ and $S \in GL(n, \mathbb{Q})$ with $B = RES$,
where $E = (I_m, 0, \dots, 0)$ with $I_m$ the unit matrix of rank $m$. If we had
$R \in GL(m, \mathbb{Z})$ and $S \in GL(n, \mathbb{Z})$, then this would immediately lead to
surjectivity of $\varphi_B = \varphi_R \circ \varphi_E \circ \varphi_S$. In general, the argument gives a $k \in \mathbb{Z} \setminus \{0\}$
such that for all $y \in \mathbb{G}_m^m$ there is an $x \in \mathbb{G}_m^n$ with $\varphi_B(x) = y^k$, which is enough
to prove surjectivity.
Theorem 3.2.19. The following statements hold:
(a) The map $\Lambda \mapsto H_\Lambda$ is a bijection between subgroups of $\mathbb{Z}^n$ and algebraic
subgroups of $G$.
(b) Let $\Lambda$, $M$ be subgroups of $\mathbb{Z}^n$. Then $H_\Lambda H_M = H_{\Lambda \cap M}$ and $H_\Lambda \cap H_M =
H_{\Lambda + M}$.
Proof of (a): Corollary 3.2.15 shows that this map is surjective. In order to prove
that it is also injective, we argue as follows. Suppose $\Lambda$, $M$ are two subgroups of
$\mathbb{Z}^n$ and that $H_\Lambda = H_M$. On this group we have $x^\lambda = 1$ for $\lambda \in \Lambda$ and $x^\mu = 1$ for $\mu \in M$,
therefore $H_\Lambda \subset H_{\Lambda + M}$. Since the reverse inclusion is obvious, we deduce that
$H_\Lambda = H_{\Lambda + M}$. Now Proposition 3.2.7 shows that $\mathrm{rank}(\Lambda) = \mathrm{rank}(\Lambda + M)$ and
$\rho(\Lambda) = \rho(\Lambda + M)$, which clearly implies that $\Lambda = \Lambda + M$. Thus $M$ is a subgroup
of $\Lambda$, and equality follows by symmetry.
Proof of (b): Using multiplication as a homomorphism and Proposition 3.2.18,
we conclude that HΛ HM is an algebraic subgroup of G . Thus HΛ HM is the
smallest algebraic subgroup containing both HΛ and HM . The correspondence
between subgroups of Zn and algebraic subgroups of G reverses inclusion rela-
tions. Since Λ ∩ M is the largest subgroup contained in both Λ and M , we have
the first assertion of (b). The proof of the second assertion is the same, because the
intersection of two algebraic subgroups of G is again an algebraic subgroup of G .


3.3. Subvarieties and maximal subgroups

In this section we study algebraic subgroups and their cosets contained in a closed
subvariety X of G := Gnm defined over a number field K .
3.3.1. We say that $X$ is defined by polynomials of degree at most $d$, if $X$ is
the set of zeros of a finite collection of polynomials $f_i(x)$ of degree at most $d$,
with coefficients in $\overline{K}$. The essential degree $\delta(X)$ of $X$ is the minimum integer
$d \ge 1$, such that $X$ is defined by polynomials of degree at most $d$.
Note that, if $X$ is defined over $K$ by polynomials of degree at most $d$, then it is
also defined by polynomials of degree at most $d$ with coefficients in $K$. In fact,
let $K' \supset K$ be a field of definition for the polynomials $f_i$; by considering the
traces $\mathrm{Tr}(\omega f_i(x))$ with $\omega \in K'$, we obtain a set of equations defined over $K$ (see
A.4.13).
Proposition 3.3.2. Let $\varphi_A$ be a monoidal transformation or more generally any
homomorphism $\varphi_A : \mathbb{G}_m^m \to \mathbb{G}_m^n$ induced by a matrix $A$ as in 3.2.3. Then
\[ \delta\Big( \bigcup_{i=1}^k X_i \Big) \le k \max_{i=1,\dots,k} \delta(X_i), \]
\[ \delta\Big( \bigcap_{i=1}^k X_i \Big) \le \max_{i=1,\dots,k} \delta(X_i), \]
\[ \delta(\varphi_A^{-1}(X)) \le \|A\| \, \delta(X). \]
Moreover, if $X$ is irreducible of degree $d$ (i.e. its closure in $\mathbb{P}^n_K$ has degree $d$),
then $\delta(X) \le d$.
Proof: If $X_i$ is defined by polynomials $f_{ij}(x)$, $j \in J_i$, then $\bigcup_i X_i$ is defined by
polynomials
\[ f_{1j_1}(x) \cdots f_{kj_k}(x) \qquad (j_i \in J_i). \]
This proves the first inequality. The second inequality is obvious and the third
follows easily from the definition of $\|A\|$ given before Proposition 3.2.10.
For the last statement, it suffices to consider all linear cones of dimension $n - 1$
over $X$, and note that each cone, of codimension 1 in $\mathbb{A}^n$ and of degree at most
$d$, is defined by a single polynomial equation of the same degree.
3.3.3. If $X$ is defined by polynomials of degree at most $d \ge 1$ and has pure
dimension $r = \dim(X)$, by Bézout’s theorem (see [125], Ex.8.4.6) every
irreducible component of $X$ has degree at most $d^{n-r}$ and in particular is defined by
polynomials of degree at most $d^{n-r}$.
Proposition 3.3.4. The number of algebraic subgroups $H$ of $G$ with $\delta(H) \le d$
does not exceed $(4ed)^{n^2}$. If $H$ is such a subgroup and $H'$ is an algebraic
subgroup of $H$ of finite index, then $\delta(H') \le nd$ and $H/H'$ is a finite group of
order at most $d^n$.
Proof: Let $H$ be an algebraic subgroup of $G$ of dimension $r$ defined by
polynomials of degree at most $d$. Then Proposition 3.2.14 shows that $H$ is defined
by monomial equations $x^\lambda = 1$ with $\lambda = (\lambda_1, \dots, \lambda_n)$, a vector which is the
difference of two vectors with non-negative components and norm at most $d$; in
particular, the norm of $\lambda$ is bounded by $\|\lambda\| \le d$. By Corollary 3.2.15, $H = H_\Lambda$
with $\Lambda$ a subgroup of $\mathbb{Z}^n$ of rank $n - r$, generated by elements $\lambda$ as above. Let
also $H' = H_M$, so that $\Lambda \subset M$ has finite index in $M$.
By Lemma 3.2.11 and Corollary 3.2.9, we have $|H/H'| = [M : \Lambda] \le d^n$ and $M$
has a basis of vectors of norm at most $nd$, hence $H'$ is defined by polynomials
of degree at most $nd$. Moreover, $\Lambda$ has a basis $v_i$ ($i = 1, \dots, n - r$) such that
$\|v_i\| \le nd$. The number of such vectors does not exceed
\[ 2^n \binom{nd + n}{n} \le 2^n \frac{(2nd)^n}{n!} < (4ed)^n, \]
because $n! > (n/e)^n$. It follows from this that the number of subgroups $\Lambda$ does
not exceed $(4ed)^{n^2}$.
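The counting step at the end of this proof can be checked numerically for small parameters. The sketch below counts integer vectors of $\ell^1$-norm at most $nd$ by brute force and compares the count with the three successive bounds $2^n\binom{nd+n}{n}$, $2^n(2nd)^n/n!$ and $(4ed)^n$. The function names are illustrative, and the brute-force enumeration is only feasible for very small $n$ and $d$.

```python
from itertools import product
from math import comb, e, factorial

def count_l1_ball(n, radius):
    """Count integer vectors in Z^n with l1-norm at most radius, by brute force."""
    return sum(
        1
        for v in product(range(-radius, radius + 1), repeat=n)
        if sum(abs(c) for c in v) <= radius
    )

def bound_chain(n, d):
    """Return the exact count for radius n*d and the three successive upper bounds."""
    exact = count_l1_ball(n, n * d)
    binomial_bound = 2 ** n * comb(n * d + n, n)
    factorial_bound = 2 ** n * (2 * n * d) ** n / factorial(n)
    final_bound = (4 * e * d) ** n
    return exact, binomial_bound, factorial_bound, final_bound

exact, binomial_bound, factorial_bound, final_bound = bound_chain(2, 2)
chain_holds = exact <= binomial_bound <= factorial_bound < final_bound
```

For $n = 2$, $d = 2$ the exact count is 41, against the bounds 60, 128 and $(8e)^2 \approx 473$, so the estimate is comfortable but not sharp.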
Proposition 3.3.5. Let $X$ be a closed subvariety of $G$. Then every algebraic
subgroup $H \subset X$ is contained in a maximal algebraic subgroup $H' \subset X$.
Proof: Without loss of generality, we may assume that every algebraic subgroup
$H_1$ with $H \subset H_1 \subset X$ has the same dimension as $H$, say $r$, and that $r < n$.
In order to prove the proposition, we need to show that there is no infinite chain
$H_1 \subsetneq H_2 \subsetneq H_3 \subsetneq \cdots$ of algebraic subgroups contained in the subvariety $X$.
By Theorem 3.2.19, we have $H_i = H_{\Lambda_i}$ for certain subgroups $\Lambda_i$ of $\mathbb{Z}^n$, of rank
$n - r$, satisfying $\Lambda_1 \supsetneq \Lambda_2 \supsetneq \Lambda_3 \supsetneq \cdots$. The intersection $M = \bigcap \Lambda_i$ is a
subgroup of rank strictly less than $n - r$ (note that $[\Lambda_i : \Lambda_{i+1}] \ge 2$ for every $i$),
therefore the corresponding subgroup $H_M$ has dimension at least $r + 1$.
Let $H'$ be the Zariski closure of $\bigcup H_i$. Then $H'$ is an algebraic subgroup of $G$
and it is the smallest algebraic subgroup containing all subgroups $H_i$. By Theorem
3.2.19, we conclude that $H' = H_M$, because $M$ is the largest subgroup contained
in every $\Lambda_i$. Since each $H_i \subset X$, the Zariski closure of $\bigcup H_i$ is also contained
in $X$ and we conclude that $H_1 \subset H_M \subset X$ with $\dim(H_M) > \dim(H_1)$. This is
a contradiction.
The same argument proves:
Proposition 3.3.6. The torsion points of an algebraic subgroup $H$ of $G$ are
Zariski dense in $H$.
Proof: By Corollary 3.2.15, there is a subgroup $\Lambda$ of $\mathbb{Z}^n$ with $H = H_\Lambda$. Let
$H_i = \{x \in H \mid x_1^{i!} = 1, \dots, x_n^{i!} = 1\}$. Then $\bigcup H_i$ is the set of torsion points
of $H$. By Theorem 3.2.19, we have $H_i = H_{\Lambda_i}$ with $\Lambda_i = i! \cdot \mathbb{Z}^n + \Lambda$, and
$M = \bigcap \Lambda_i$ is $\Lambda$ using the theorem of elementary divisors ([49], Ch.VII, §4, no.3,
Th.1). As noted in the proof of the preceding proposition, the Zariski closure of
$\bigcup H_i$ is $H_M$.
Definition 3.3.7. Let X ⊂ G be a closed subvariety of G defined by polynomials
of degree at most d . We define

X∗ = X − {all torsion cosets ⊂ X} ,

X◦ = X − {all torus cosets ⊂ X} .

Theorem 3.3.8. Let $X \subset G$ be a closed subvariety of $G$ defined by polynomials
of degree at most $d \ge 1$. Then:
(a) Every torsion coset in $X$ is contained in a maximal torsion coset $\varepsilon H \subset X$.
(b) If $\varepsilon H$ is a maximal torsion coset in $X$, then $H$ is defined by polynomials
of degree at most $nd$, and hence the number of such linear tori does not
exceed $(4end)^{n^2}$.
(c) For any linear torus $H$ the number of maximal torsion cosets $\varepsilon H \subset X$ is
finite and bounded in terms of $d$ and $n$ alone.
In particular, $X^*$ is a Zariski open subset of $X$.
Proof of (a) and (b): Statement (a) is immediate by considering dimensions.
To prove (b), we may assume that $\varepsilon = 1$ (note that replacing $X$ by $\varepsilon^{-1} X$ requires
a base change from $K$ to $K(\varepsilon)$, but using the remark in 3.3.1 we get a set of
defining equations over $K$). Let $\widetilde{H}$ be the maximal subgroup of $\mathbb{G}_m^n$ with $H \subset
\widetilde{H} \subset X$. Then $\dim(H) = \dim(\widetilde{H})$ (otherwise $H$ would not be maximal) and
Proposition 3.2.14 shows that $\widetilde{H}$ is defined by polynomials of degree at most $d$.
It follows that $H$ is the connected component of the identity of $\widetilde{H}$ and (b) follows
from Proposition 3.3.4.
The proof of (c) is harder and will be postponed to Chapter 4.
Theorem 3.3.9. Let $X$ be a closed subvariety of $G$ defined over $K$ by
polynomials of degree at most $d \ge 1$. Then:
(a) Every torus coset contained in $X$ is contained in a maximal torus coset
$gH \subset X$.
(b) If $gH$ is a maximal torus coset in $X$, then $H$ is defined by polynomials
of degree at most $nd$, and hence the number of such linear tori does not
exceed $(4end)^{n^2}$.
(c) For any non-trivial linear torus $H$ of dimension $r$, the union
\[ X(H) := \bigcup_{gH \subset X} gH \]
of all torus cosets $gH \subset X$ has the following structure. There is $A \in
SL(n, \mathbb{Z})$ such that
\[ X(H) = \varphi_A(\widetilde{X(H)} \times \mathbb{G}_m^r), \]
where $\widetilde{X(H)}$ is a closed subvariety of $\mathbb{G}_m^{n-r}$ defined over $K$ by
polynomials of degree bounded by $n^3 \delta(H)^{n-r} d$.
In particular, $X^\circ$ is a Zariski open subset of $X$.


Proof: The proof of (a) and (b) is the same as in the previous theorem. For the
proof of (c), let $\Lambda$ define the linear torus $H$, so that $\Lambda$ is a primitive subgroup
of $\mathbb{Z}^n$ of rank $n - r$ (Corollary 3.2.8). By Proposition 3.2.14, $\Lambda$ is generated by
vectors of norm bounded by $\delta(H)$.
By Proposition 3.2.10, we may assume that $H = 1_{n-r} \times \mathbb{G}_m^r$ and replace $X$ by
$\widetilde{X} := \varphi_A^{-1}(X)$. By Proposition 3.3.2, $\widetilde{X}$ is defined over $K$ by polynomials of
degree at most $n^3 \delta(H)^{n-r} d$.
Let $\widetilde{f}_i(x_1, \dots, x_n) = 0$ be a set of defining equations for $\widetilde{X}$. To say that $gH \subset \widetilde{X}$
means that
\[ \widetilde{f}_i(g_1, \dots, g_{n-r}, x_{n-r+1}, \dots, x_n) = 0 \]
is identically satisfied in $x_{n-r+1}, \dots, x_n$. This yields a finite set of polynomial
equations in $g_1, \dots, g_{n-r}$, again of degree at most $n^3 \delta(H)^{n-r} d$, defining a closed
subvariety $\widetilde{X(H)} \subset \mathbb{G}_m^{n-r}$ with the desired properties. This proves (c).
Remark 3.3.10. The proof gives a matrix $A \in SL(n, \mathbb{Z})$ with $\|A\| \le n^3 \delta(H)^{n-r}$
and $\|A^{-1}\| \le n^{2n-1} \delta(H)^{(n-1)^2}$.

3.4. Bibliographical notes

The material in this chapter follows closely the treatment given in Schmidt’s paper
[273], with several additions. For the general theory of linear algebraic groups, we
refer to A. Borel [41] and M. Demazure and P. Gabriel [85].
4 SMALL POINTS

4.1. Introduction

In this chapter, we study the distribution of algebraic points of small height on
subvarieties of $\mathbb{G}_m^n$.
The general case is handled by Zhang’s theorem and its uniform version, dis-
cussed in Section 4.2, which presupposes knowledge of the results in the previous
Chapter 3. In Section 4.3, we give an alternative non-constructive proof using an
equidistribution theorem.
The special case in which the subvariety coincides with the ambient variety Gm is
a fascinating question of D.H. Lehmer, and we shall prove in Section 4.4
Dobrowolski’s well-known theorem, which provides a good lower bound for the
height of algebraic numbers which are not roots of unity.
In this circle of ideas, we shall also give a proof of a result of F. Amoroso and R.
Dvornicich providing an absolute positive lower bound for the height of algebraic
numbers, not roots of unity, in abelian extensions of Q .
In Section 4.5, we give examples of infinite algebraic extensions of number fields
which have only finitely many numbers of bounded height as in Northcott’s theo-
rem. In a similar way, we discuss in Section 4.6 infinite algebraic extensions of Q
such that the height has a positive lower bound outside of the roots of unity.

4.2. Zhang’s theorem

In 1992 S. Zhang obtained a surprising result on the height of algebraic points on
curves in $\mathbb{G}_m^n$. He showed that for any curve $C \subset \mathbb{G}_m^n$ there is a positive lower
bound for the height of non-torsion algebraic points in $C$. This was new even in
the simplest case of the equation $x + y = 1$. Later, he extended this to general
subvarieties of $\mathbb{G}_m^n$.
As we shall see in Section 5.2, there are interesting applications to the problem of
obtaining good upper bounds for the number of solutions of the unit equation.

We identify Gm with the affine line punctured at the origin, together with the
usual multiplication. Definitions and notation will be as in Chapter 3, with the
additional assumption that K is a number field. Note also that the statement of
Theorem 3.3.8, (c) has not yet been proved at this stage.
The standard height of $x = (x_1, \dots, x_n) \in \mathbb{G}_m^n$ is
\[ \hat{h}(x) := \sum_{i=1}^n h(x_i) \]
with $h$ the absolute Weil height of algebraic numbers. The height $\hat{h}$ has the
following properties:
(a) Homogeneity and symmetry: $\hat{h}(P^m) = |m| \cdot \hat{h}(P)$ for $m \in \mathbb{Z}$.
(b) Non-degeneracy: $\hat{h}(P) = 0$ if and only if $P$ is a torsion point.
(c) Triangle inequality: $\hat{h}(PQ^{-1}) \le \hat{h}(P) + \hat{h}(Q)$.
(d) Finiteness: There are only finitely many points $P \in \mathbb{G}_m^n$ such that $[\mathbb{Q}(P) :
\mathbb{Q}]$ and $\hat{h}(P)$ are both bounded.
These properties are clear from the corresponding properties of the Weil height,
with (b) and (d) following from the theorems of Kronecker and Northcott (see
1.5.9, 2.4.9). Thus $\hat{h}(PQ^{-1})$ is a translation invariant semidistance $d(P, Q)$ on
$\mathbb{G}_m^n$ and actually a translation invariant distance on $\mathbb{G}_m^n/\mathrm{tors}$.

Proposition 4.2.1. Let $\varphi_A : \mathbb{G}_m^n \xrightarrow{\ \sim\ } \mathbb{G}_m^n$ be a monoidal transformation
determined by $A \in GL(n, \mathbb{Z})$. Then the height $\hat{h} \circ \varphi_A$ is equivalent to the height $\hat{h}$, in
the sense that there are two positive constants $c_1$, $c_2$ such that
\[ c_1 \hat{h}(x) \le \hat{h}(\varphi_A(x)) \le c_2 \hat{h}(x) \]
for every $x \in \mathbb{G}_m^n$.
Proof: We have, with the notation of 3.2.3 and 3.2.10,
\[ \hat{h}(\varphi_A(x)) = \sum_{i=1}^n h(x^{Ae_i}) \le \sum_{i=1}^n \sum_{j=1}^n |a_{ji}| \, h(x_j) \le n \|A\| \, \hat{h}(x). \]
This proves the second half of the inequality, with $c_2 = n\|A\|$.
Now we apply this inequality with $\varphi_{A^{-1}}(x)$ in place of $x$ and by replacing $A$ by
$A^{-1}$, we obtain the first half of the inequality, with $c_1 = 1/(n\|A^{-1}\|)$.
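The two-sided bound of Proposition 4.2.1 can be illustrated on rational points, where the Weil height is elementary: for a nonzero rational $a/b$ in lowest terms, $h(a/b) = \log\max(|a|, |b|)$. The sketch below is restricted to this special case; the matrix, the sample point and the function names are illustrative assumptions, not data from the text.

```python
from fractions import Fraction
from math import log

def weil_height(q):
    """Absolute logarithmic Weil height of a nonzero rational (Fraction keeps lowest terms)."""
    q = Fraction(q)
    return log(max(abs(q.numerator), abs(q.denominator)))

def standard_height(x):
    """Standard height on G_m^n: the sum of the Weil heights of the coordinates."""
    return sum(weil_height(c) for c in x)

def phi(A, x):
    """Monoidal transformation: the i-th coordinate is the product of x_j^{a_ji} over j."""
    n = len(A)
    image = []
    for col in range(n):
        value = Fraction(1)
        for row in range(n):
            value *= Fraction(x[row]) ** A[row][col]
        image.append(value)
    return image

def column_norm(A):
    """Maximum of the l1-norms of the columns of A."""
    n = len(A)
    return max(sum(abs(A[row][col]) for row in range(n)) for col in range(n))

# Illustrative data (assumed): a matrix of determinant 1 and its inverse.
A = [[2, 1], [1, 1]]
A_inv = [[1, -1], [-1, 2]]
n = len(A)
c2 = n * column_norm(A)            # upper constant n * ||A||
c1 = 1 / (n * column_norm(A_inv))  # lower constant 1 / (n * ||A^{-1}||)

x = [Fraction(2, 3), Fraction(5, 7)]
h_x = standard_height(x)
h_image = standard_height(phi(A, x))
bounds_hold = c1 * h_x <= h_image <= c2 * h_x
```

Here $\varphi_A(x) = (20/63, 10/21)$, so the height grows from $\log 21$ to $\log 1323$, a factor of about 2.4, well inside the interval $[1/6, 6]$ given by the constants.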
A torsion coset in X is said to be maximal in X if it is not contained in a larger
torsion coset in X . A similar definition applies to torus cosets in X .
We have Zhang’s theorem:

Theorem 4.2.2. Let X be a closed subvariety of Gnm defined over a number field
K and let X ∗ be the complement in X of the union of all torsion cosets εH ⊂ X .
Let fi ∈ K[x] be a set of polynomials of degree at most d defining X . Then:
(a) The number of maximal torsion cosets in X is finite and bounded in terms
of d and n alone. Moreover, every maximal torsion coset has the form
ζH , where the orders of the torsion points ζ and the essential degrees of
the linear tori H are also bounded in terms of d and n .
(b) The height of points P ∈ X ∗ has a positive lower bound, depending on n ,
d , [K : Q], and max h(fi ).
The results (a) and (b) are effective.

There is a slightly different uniform version for torus cosets.


Theorem 4.2.3. Let $\hat{h}$ be the standard height on $\mathbb{G}_m^n$, with the associated
semidistance $d(P, Q) = \hat{h}(PQ^{-1})$. Let $X$ be a closed subvariety of $\mathbb{G}_m^n$ defined
over a number field $K$ by polynomials of degree at most $d$, and let $X^\circ$ be the
complement in $X$ of all its torus cosets $gH \subset X$, $\dim(H) \ge 1$. Then:
(a) X ◦ is Zariski open in X , and X \ X ◦ is defined by polynomials of degree
bounded in terms of n and d alone.
(b) There are a positive constant γ(d, n) and a positive integer N (d, n), de-
pending only on d and n , with the following property: Let Q ∈ Gnm . Then
{ P ∈ X ◦ | d(P, Q) ≤ γ(d, n)}
is a finite set of cardinality at most N (d, n). Moreover, for every point P
in this set, the estimate [K(P, Q) : K(Q)] ≤ N (d, n) holds.
Remark 4.2.4. The closed set X \ X ◦ can be effectively determined.
To see this, note that $X \setminus X^\circ$ is the union of the subvarieties $X(H)$, where $H$
runs over all non-trivial tori that can appear in maximal torus cosets $gH \subset X$. By
Theorem 3.3.9 (c) and Remark 3.3.10, we can find $A \in SL(n, \mathbb{Z})$, with entries
bounded in terms of $d$ and $n$, such that $X(H) = \varphi_A(Y \times \mathbb{G}_m^r)$, where $r =
\dim(H) \ge 1$ and $Y$ is defined by polynomials of degree bounded in terms of $d$
and $n$. The proof there also shows that their height is bounded in terms of the
height of the polynomial equations used to define $X$. By Northcott’s theorem (see
Theorem 2.4.9), $Y$ can be effectively determined.
Remark 4.2.5. The constants γ(d, n) and N (d, n) are effective and hence the
finite set of points in (b) can be effectively determined for every Q ∈ Gnm .
Remark 4.2.6. There is no uniform version of Theorem 4.2.2. For example, if $a$,
$b$ are not roots of unity or 0, the equation $1 + ax + by = 0$ in $\mathbb{G}_m^2$ has a
non-torsion solution $\xi = (a^{-1}\rho, b^{-1}\rho^2)$ with $\rho$ a primitive cubic root of unity. We
have $\hat{h}(\xi) = h(a) + h(b) > 0$, and we can make it arbitrarily small by choosing
$a$ and $b$, for example $a = b = 2^{1/m}$ with $m \to \infty$.
Remark 4.2.7. D. Zagier [336] has obtained the optimal lower bound
\[ \hat{h}(\xi) \ge \frac{1}{2} \log\Big( \frac{1 + \sqrt{5}}{2} \Big) \]
for non-torsion solutions $\xi$ of $1 + x + y = 0$. Equality is attained for $x$ or $y$, a
primitive 10th root of unity, and this minimum is isolated.

We need the following result:


Lemma 4.2.8. Let $f(x_1, \dots, x_n)$ be a polynomial with integer coefficients, of
degree $d$ and height $H(f)$ and let $p > e \binom{d+n}{n} H(f)$ be a prime number. Let $\xi =
(\xi_1, \dots, \xi_n)$ be an algebraic point with $f(\xi_1, \dots, \xi_n) = 0$ and $f(\xi_1^p, \dots, \xi_n^p) \ne 0$.
Then $\hat{h}(\xi) \ge 1/(pd)$.
Proof: We may assume that the coefficients of $f$ have no common divisor greater
than 1. Let $K$ be a number field containing all coordinates $\xi_i$.
By Fermat’s little theorem, we have
\[ f^p(x_1, \dots, x_n) = f(x_1^p, \dots, x_n^p) + p \, g(x_1, \dots, x_n), \]
where $g(x_1, \dots, x_n) \in \mathbb{Z}[x_1, \dots, x_n]$ has degree at most $pd$. Since by hypothesis
we have $f(\xi_1, \dots, \xi_n) = 0$, we get
\[ f(\xi_1^p, \dots, \xi_n^p) = -p \, g(\xi_1, \dots, \xi_n). \tag{4.1} \]
For any $\zeta \in K \setminus \{0\}$ the product formula yields
\[ \sum_{v \in M_K} \log |\zeta|_v = 0. \]
We apply this with $\zeta := f(\xi_1^p, \dots, \xi_n^p)$ and estimate terms as follows.
If $v | p$, we have by (4.1) and the fact that $g$ has integer coefficients
\[ \log |\zeta|_v = \log |p \, g(\xi_1, \dots, \xi_n)|_v \le \log |p|_v + pd \sum_{i=1}^n \log^+ |\xi_i|_v + \log |f|_v, \]
because the Gauss norm equals 1 (the coefficients of $f$ have no non-trivial
common divisor and $v$ is not archimedean).
At the other places $v$, we have, with $\varepsilon_v = [K_v : \mathbb{R}]/[K : \mathbb{Q}]$ if $v$ is archimedean
and $\varepsilon_v = 0$ if $v$ is not archimedean,
\[ \log |\zeta|_v = \log |f(\xi_1^p, \dots, \xi_n^p)|_v \le pd \sum_{i=1}^n \log^+ |\xi_i|_v + \log |f|_v + \varepsilon_v \log \binom{d+n}{n}, \]
because the number of monomials in $f$ does not exceed $\binom{d+n}{n}$.
Summing over all $v \in M_K$ and using $\sum_{v|p} \log |p|_v = -\log p$ from Lemma
1.3.7, we infer
\[ 0 = \sum_{v \in M_K} \log |\zeta|_v \le -\log p + pd \, \hat{h}(\xi) + h(f) + \log \binom{d+n}{n}. \]
Since $\log p > 1 + h(f) + \log \binom{d+n}{n}$ by the choice of $p$, this gives $pd \, \hat{h}(\xi) \ge 1$.
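The Fermat congruence used at the start of this proof, $f^p = f(x_1^p, \dots, x_n^p) + p\,g$ with $g$ an integer polynomial of degree at most $pd$, can be verified mechanically. The following is a minimal sketch, assuming polynomials are stored as dictionaries mapping exponent tuples to integer coefficients; this representation and the function names are illustrative choices.

```python
from collections import defaultdict

def poly_mul(f, g):
    """Multiply two polynomials given as {exponent tuple: integer coefficient}."""
    product = defaultdict(int)
    for mf, cf in f.items():
        for mg, cg in g.items():
            product[tuple(a + b for a, b in zip(mf, mg))] += cf * cg
    return {m: c for m, c in product.items() if c != 0}

def poly_pow(f, p):
    """Compute f^p by repeated multiplication (adequate for small p)."""
    n = len(next(iter(f)))
    result = {(0,) * n: 1}
    for _ in range(p):
        result = poly_mul(result, f)
    return result

def frobenius_substitute(f, p):
    """Return f(x_1^p, ..., x_n^p): every exponent is multiplied by p."""
    return {tuple(p * a for a in m): c for m, c in f.items()}

def fermat_quotient(f, p):
    """Return g with f^p = f(x^p) + p*g; raise if some coefficient is not divisible by p."""
    difference = defaultdict(int, poly_pow(f, p))
    for m, c in frobenius_substitute(f, p).items():
        difference[m] -= c
    g = {}
    for m, c in difference.items():
        if c % p != 0:
            raise ValueError(f"coefficient of {m} is not divisible by {p}")
        if c != 0:
            g[m] = c // p
    return g

# Illustrative example (assumed): f = 1 + 2x - 3xy, of degree d = 2, with p = 5.
f = {(0, 0): 1, (1, 0): 2, (1, 1): -3}
g = fermat_quotient(f, 5)
degree_of_g = max(sum(m) for m in g)
```

In this example the coefficient of $x^5$ in $g$ is $(2^5 - 2)/5 = 6$, and the degree of $g$ does not exceed $pd = 10$, in line with the statement used in the proof.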

The following consequence of Lemma 4.2.8 is due to W.M. Schmidt.


Corollary 4.2.9. Let $f(x)$ be as before and let $m$ be a positive integer all of
whose prime factors are greater than $e \binom{d+n}{n} H(f)$. Suppose $f(\xi) = 0$. Then
either $f(\xi^m) = 0$ or $\hat{h}(\xi) \ge 1/(md)$.
Proof: The easy proof is by induction on the number of prime factors of $m$, writing
$m = pm'$. If $f(\xi^{m'}) \ne 0$, we apply the corollary inductively with $m'$ in place of
$m$. If instead $f(\xi^{m'}) = 0$, we apply Lemma 4.2.8 to the point $\xi^{m'}$ in place of $\xi$
and note that $\hat{h}(\xi^{m'}) = m' \hat{h}(\xi)$.
4.2.10. Proof of Theorem 4.2.2 (b) and the following partial result of 4.2.2 (a):
4.2.2 (a′) The union of torsion cosets of $X$ is contained in finitely many maximal
torsion cosets of the form $\zeta M$, where the orders of the torsion points $\zeta$ and the
essential degrees of the linear tori $M$ are bounded in terms of $d$, $n$, $[K : \mathbb{Q}]$, and
$\max h(f_i)$.
We say that a quantity is controlled if it is a function of $d$, $n$, $[K : \mathbb{Q}]$, and
$\max h(f_i)$, which, in principle, can be made explicit.
We prove (a′) and (b) simultaneously by induction on the rank $n$ of the ambient
linear torus. The claim is trivial for $n = 0$. Now let $n \ge 1$. If $X = \mathbb{G}_m^n$, we
have nothing to prove. So we may assume that there is a non-zero $f_k$ in our list of
defining polynomials. Let us consider
\[ f(x) := \prod_\sigma f_k(\sigma x) = \sum_m a_m x^m, \]
where $\sigma$ ranges over all embeddings of $K$ in $\overline{\mathbb{Q}}$. It is a polynomial of degree at
most $D = [K : \mathbb{Q}] d$ with coefficients in $\mathbb{Q}$. Let $q$ be the product of all primes
$p \le e \binom{D+n}{n} H(f)$. Let $\xi \in X$. We apply Corollary 4.2.9 with $m = qj + 1$,
for $j = 0, \dots, \binom{D+n}{n} - 1$. If $f(\xi^{qj+1}) \ne 0$ for some $j$, we obtain a controlled
positive lower bound for $\hat{h}(\xi)$. If instead $f(\xi^{qj+1}) = 0$ for every $j$, we get
\[ \sum_m a_m (\xi^m)^{qj+1} = 0. \]
We view this as a homogeneous linear system with coefficients $(\xi^m)^{qj}$ and
unknowns $a_m \xi^m$, so that its determinant must be 0. This is a Vandermonde
determinant, and looking at its factorization we see that
\[ \xi^{qm} = \xi^{qm'} \]
for some $m \ne m'$. Hence $\xi$ belongs to a subgroup $H$ of $\mathbb{G}_m^n$ defined by
polynomials of degree at most $qD$.
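The exponents $qj + 1$ are chosen so that Corollary 4.2.9 applies: each is coprime to $q$, so all of its prime factors exceed the threshold $e\binom{D+n}{n}H(f)$. The sketch below makes this explicit for small parameters; the function names and the sample values of $D$, $n$ and $H(f)$ are illustrative assumptions.

```python
from math import comb, e, floor, prod

def primes_up_to(bound):
    """Return all primes p <= bound using a simple sieve."""
    if bound < 2:
        return []
    is_prime = [True] * (bound + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(bound ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, bound + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def prime_factors(m):
    """Return the set of prime factors of a positive integer m by trial division."""
    factors, p = set(), 2
    while p * p <= m:
        while m % p == 0:
            factors.add(p)
            m //= p
        p += 1
    if m > 1:
        factors.add(m)
    return factors

def exponents_for_corollary(D, n, height):
    """Return (threshold, q, [q*j + 1 for j = 0, ..., binom(D+n, n) - 1])."""
    count = comb(D + n, n)
    threshold = e * count * height
    q = prod(primes_up_to(floor(threshold)))
    return threshold, q, [q * j + 1 for j in range(count)]

# Illustrative parameters (assumed): D = 2, n = 1, H(f) = 1.
threshold, q, exponents = exponents_for_corollary(D=2, n=1, height=1)
all_admissible = all(p > threshold for m in exponents for p in prime_factors(m))
```

With these parameters the threshold is $3e \approx 8.15$, so $q = 2 \cdot 3 \cdot 5 \cdot 7 = 210$ and the exponents are 1, 211 and 421. The number of exponents equals the maximal number $\binom{D+n}{n}$ of monomials, which is what makes the linear system square.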
Let $H = FM$ be a direct product decomposition as in Proposition 3.2.7 with $M$
a linear torus of rank $r$ and $F$ a finite abelian group, and let $\xi \in \gamma M$, $\gamma \in F$.
By Proposition 3.3.4, the essential degree of $M$ and the order of $F$ are controlled.
Let $\zeta \in F$ with $\xi \in \zeta M$. Since the order of $\zeta$ is controlled, we conclude that
\[ X' := \zeta^{-1} X \cap M \]
is defined over the extension $K' := K(\zeta)/K$, also of controlled degree. Now we
bring $M$ in normal form using Proposition 3.2.10. There is a monoidal
transformation $\varphi_A$ with controlled $\|A\|$ and $\|A^{-1}\|$ such that $\varphi_{A^{-1}}(M) = 1_{n-r} \times \mathbb{G}_m^r$.
Then $Y := \varphi_{A^{-1}}(X') \cap (1_{n-r} \times \mathbb{G}_m^r)$ is given by the polynomials $f_i(\zeta \varphi_A(1_{n-r} \times
y))$ as a subvariety of $\mathbb{G}_m^r$. They have controlled degree and controlled height.
Since $r < n$, induction proves that $Y \setminus Y^* = \bigcup_l \zeta_l M_l$ for torsion points $\zeta_l$, of
controlled number and order, and for linear tori $M_l$ of controlled essential degree.
Moreover, the height has a controlled positive lower bound on $Y^*$. Using
Proposition 4.2.1 and $\hat{h}(\xi) = \hat{h}(\zeta^{-1}\xi)$, we see that either $\hat{h}(\xi)$ has a controlled positive
lower bound or $\xi$ is contained in some $\zeta \varphi_A(1_{n-r} \times \zeta_l M_l) \subset X$. Clearly, the
latter is a torsion coset of the form $\zeta_\mu M_\mu$, where the torsion point $\zeta_\mu$ has
controlled order and the linear torus $M_\mu$ has controlled essential degree. The number
of such torsion cosets is also controlled. It remains to show that
\[ X \setminus X^* = \bigcup_\mu \zeta_\mu M_\mu. \]
Let $\xi \in X \setminus X^*$. Then $\xi$ is contained in a torsion coset. By Proposition 3.3.6,
there is a sequence $(\xi_n)$ of torsion points in $X$ converging to $\xi$. Because of the
positive lower bound outside the $\bigcup \zeta_\mu M_\mu$, we conclude that all $\xi_n$ and hence $\xi$ are
contained in $\bigcup \zeta_\mu M_\mu$. This proves the claim.
4.2.11. Proof of Theorem 4.2.3: Note that Theorem 4.2.3 (a) is part of Theorem
3.3.9, hence it remains to show (b). Now it will be convenient to say that a quantity
is controlled if it admits a bound depending only on d and n and which, in prin-
ciple, can be made explicit. However, no effort will be made here to give explicit
calculations for such bounds. To prove (b), it is enough to show the existence of
controlled constants γ > 0 and N > 0 such that the set
Eγ := {P ∈ X ◦ | ĥ(P ) ≤ γ}
has at most N points. Then the first claim follows immediately by applying trans-
lation with Q−1 and the last statement is also clear because conjugation over
K(Q) does not change X and heights.
We proceed by induction on $n$. By convention, $\mathbb{G}_m^0$ is the trivial torus $\{1\}$, thus
the claim is obvious for $n = 0$. For the induction step, we may assume that $X$
is a proper subvariety of $\mathbb{G}_m^n$. Let $\mathcal{M}$ be again the set of monomials of degree
bounded by $d$ and let $r := |\mathcal{M}| = \binom{d+n}{n}$. For $s \in \{1, \dots, r\}$, consider the $s \times r$
matrix
\[ A_s(x) := \begin{pmatrix} x_1^m \\ \vdots \\ x_s^m \end{pmatrix}_{m \in \mathcal{M}}, \]
where $x_1, \dots, x_s$ are points of $\mathbb{G}_m^n$. Let $Y_s$ be the closed subvariety of $(\mathbb{G}_m^n)^s$,
given by the vanishing of all $s \times s$ minors of the matrix $A_s(x)$.
Note that Ys is universal in the sense that it is defined over Q and depends only
on d , n , and s . If we apply Theorem 4.2.2 (a  ) and (b), then the constants depend
only on d and n . Hence the number of maximal torsion cosets in Ys is controlled
and they have the form εH , where the order of the torsion point ε and the essential
degree of H are also controlled. Moreover, the height has a controlled positive
lower bound on Ys∗ .
Let γ be a sufficiently small positive constant, which is controlled and which
we will define precisely later in the course of the proof. For the moment, we
just assume that sγ is smaller than the above lower bound on Ys∗ for every s =
0, . . . , r . Clearly, we may assume that Eγ is not empty. Now let us choose

$$s := 1 + \max_{\xi_1, \dots, \xi_r \in E_\gamma} \operatorname{rank} A_r(\xi_1, \dots, \xi_r).$$
Note that the rank of $A_r$ is at most $r - 1$ on $X^r$ since any equation of $X$ yields
a linear relation among the columns of the matrix, hence $2 \le s \le r$.
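The rank drop on $X^r$ can be checked by hand in a small case. The following is a minimal sketch, assuming the hypothetical curve $x_1 x_2 = 2$ in $\mathbb{G}_m^2$ and $d = 2$ (neither choice comes from the text); the helper names are illustrative only. It builds the matrix of monomial values row by row and computes its rank exactly over $\mathbb{Q}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def monomials(n, d):
    """Exponent vectors m in N^n with |m| <= d."""
    return [m for m in product(range(d + 1), repeat=n) if sum(m) <= d]

def evaluate(point, m):
    """Value of the monomial x^m at a point with rational coordinates."""
    value = Fraction(1)
    for coordinate, exponent in zip(point, m):
        value *= Fraction(coordinate) ** exponent
    return value

def matrix_A(points, n, d):
    """One row per point, one column per monomial of degree <= d."""
    mons = monomials(n, d)
    return [[evaluate(p, m) for m in mons] for p in points]

def rank(rows):
    """Rank by exact Gaussian elimination over Q."""
    rows = [row[:] for row in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

n, d = 2, 2
r = len(monomials(n, d))  # number of monomials, equal to comb(d + n, n)

# Hypothetical example: r points on the curve X given by x1 * x2 = 2.
points_on_X = [(Fraction(t), Fraction(2, t)) for t in range(1, r + 1)]
# Points on x2 = x1^3, which satisfy no equation of degree <= 2.
other_points = [(Fraction(t), Fraction(t ** 3)) for t in range(1, r + 1)]

print(r, rank(matrix_A(points_on_X, n, d)), rank(matrix_A(other_points, n, d)))
```

On the curve, the column of the monomial $x_1 x_2$ is twice the column of the constant monomial, which is the linear relation among columns that forces the rank below $r$.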
We fix $\xi_1^*, \dots, \xi_{s-1}^* \in E_\gamma$ such that $A_{s-1}(\xi_1^*, \dots, \xi_{s-1}^*)$ has maximal rank $s-1$.
Our rank condition implies that
$$S^* := \xi_1^* \times \cdots \times \xi_{s-1}^* \times \mathbb{G}_m^n$$
is not contained in $Y_s$, otherwise the Laplace expansion of the $s \times s$ minors of
the matrix $A_s(\xi_1^*, \dots, \xi_{s-1}^*, x_s)$ with respect to the last row would show that all
$(s-1) \times (s-1)$ minors of $A_{s-1}(\xi_1^*, \dots, \xi_{s-1}^*)$ vanish. Hence the projection
ps to the last factor does not map S ∗ ∩ Ys onto Gnm . For any ξ ∈ Eγ , the point
(ξ ∗1 , . . . , ξ ∗s−1 , ξ) is contained in a maximal torsion coset εH . Now we fix the
torsion coset εH and also some ξ ∗s ∈ Eγ with (ξ ∗1 , . . . , ξ ∗s ) ∈ εH .
We have seen that $p_s(S^* \cap \varepsilon H)$ is properly contained in $\mathbb{G}_m^n$. Let $(x/\varepsilon)^\lambda$, $\lambda \in \Lambda$,
be a set of defining equations of $\varepsilon H$ of controlled degree. Then $S^* \cap \varepsilon H$ is defined
by setting $x_i = \xi_i^*$, $i = 1, \dots, s-1$ in these equations. We conclude that
$$p_s(S^* \cap \varepsilon H) = \Bigl\{ x \in \mathbb{G}_m^n \Bigm| x^{\lambda_s} = \varepsilon^{\lambda} \prod_{i=1}^{s-1} (\xi_i^*)^{-\lambda_i},\ \lambda = (\lambda_1, \dots, \lambda_s) \in \Lambda \Bigr\} = \xi_s^* H',$$
where $H'$ is the algebraic subgroup of $\mathbb{G}_m^n$ given by the equations $x^{\lambda_s} = 1$, $\lambda \in \Lambda$.
Let $M$ be the connected component of the identity of $H'$. By Proposition
3.2.7 and Proposition 3.3.4, the essential degree of $M$ is controlled and there are
torsion points $\rho_j$ of $\mathbb{G}_m^n$, whose number and orders are also controlled, such that
$$H' = \bigcup_j \rho_j M.$$
We conclude that
$$p_s(S^* \cap \varepsilon H) = \bigcup_j \eta_j M$$

with $\eta_j = \xi_s^* \rho_j$. Now let $X' := \eta_j^{-1} X \cap M$. Since $M$ is a linear torus of
dimension $n' < n$, the idea is to apply our induction hypothesis to $X'$ in $M$. The
only obstacle is that $M$ has not the normal form $\mathbb{G}_m^{n'}$. As noted in 4.2.10, this can
be obviated by using Proposition 3.2.10 to bring $M$ in normal form, and Proposition
4.2.1 to control the change in height induced by the monoidal transformation.
Thus the induction step gives controlled positive constants $\gamma'$ and $N'$ such that
the number of points $\xi' \in (X')^\circ$ with $\hat{h}(\xi') \le \gamma'$ is bounded by $N'$.
For $\xi \in E_\gamma$ with $(\xi_1^*, \dots, \xi_{s-1}^*, \xi)$ contained in our fixed maximal torsion coset
$\varepsilon H$ of $Y_s$, we have $\xi \in p_s(S^* \cap \varepsilon H)$, thus $\xi$ is contained in a torus coset $\eta_j M$.
Then $\xi' := \eta_j^{-1} \xi \in X'$ and
$$\hat{h}(\xi') \le \hat{h}(\eta_j) + \hat{h}(\xi) = \hat{h}(\xi_s^*) + \hat{h}(\xi) \le 2\gamma.$$
Finally, we may assume $\gamma \le \gamma'/2$. By the inductive step, we conclude that the
number of $\xi \in E_\gamma \cap \eta_j M$ with $(\xi_1^*, \dots, \xi_{s-1}^*, \xi) \in \varepsilon H$ is bounded by $N'$. Since
the number of torus cosets $\eta_j M$ is bounded by a controlled $N''$, there are at most
$N' N''$ points $\xi \in E_\gamma$ with $(\xi_1^*, \dots, \xi_{s-1}^*, \xi) \in \varepsilon H$.
If $\xi$ ranges over $E_\gamma$, then we have seen that the points
$$(\xi_1^*, \dots, \xi_{s-1}^*, \xi)$$
are contained in various maximal torus cosets $\varepsilon H$ of $Y_s$. For every such $\varepsilon H$, we
fix $\xi_s^* \in E_\gamma$ with $(\xi_1^*, \dots, \xi_s^*) \in \varepsilon H$ and apply our preceding procedure. If $N'''$
is the controlled number of such maximal torsion cosets, then $E_\gamma$ contains at most
$N = N' N'' N'''$ points. Since $N$ is controlled, this proves the induction step. □
4.2.12. Completion of the proof of Theorem 4.2.2 (a): By Theorem 3.3.8 (b),
we already know that every maximal torsion coset in X has the form εH with
δ(H) ≤ nd .
In proving Zhang’s theorem 4.2.2 (a), it suffices to deal with the maximal torsion
points of X , namely maximal torsion cosets of dimension 0. To see this, we use
Theorem 3.3.9 as follows. We have to find all maximal torsion cosets εH ⊂ X .
Assume that dim(H) ≥ 1. Then εH is also a torus coset. By Theorem 3.3.8
(b), we can fix the linear torus H . Now, using Theorem 3.3.9 (c), the maximal
torus cosets εH in X are in one-to-one correspondence with the maximal torsion


 . Since X(H)
points of X(H)  is defined by polynomials of degree bounded in
terms of δ(X) and n , our claim follows.
Thus, by our preceding considerations, it suffices to deal only with maximal tor-
sion points. A maximal torsion point ε cannot belong to a torus coset gH ⊂ X ,
otherwise we could take $g = \varepsilon$ and $\varepsilon$ would not be maximal. Hence every maximal
torsion point belongs to $X^\circ$. Since $\varepsilon$ is a torsion point it has semidistance
0 from the identity 1, because $d(\varepsilon, 1) = \hat{h}(\varepsilon) = 0$. By Theorem 4.2.3 (b), the
number of such points ε and their order are at most N (d, n), as asserted, proving
what we want for maximal torsion points. 
4.2.13. Completion of the proof of Theorem 3.3.8 (c): This is immediate from
Theorem 4.2.2 (a) in the case of number fields. In general, we argue as follows.
Let εi Hi be a list of all maximal torsion cosets in X . They are defined over Q
(Corollary
 3.2.15). Now consider the Zariski closure Y of the union of all cosets
εi Hi . Then Y ⊂ X , thus all εi Hi are maximal torsion cosets of Y . Since Y
is defined over Q , there are only finitely many εi Hi . 

4.3. The equidistribution theorem

Another approach is due to L. Szpiro, E. Ullmo, and S. Zhang [296], and Yu.
F. Bilu [24]. The idea is that points of small height under the action of Galois
conjugation tend to be equidistributed with respect to a suitable measure.
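For $a$ in the multiplicative group $\mathbb{C}^\times$ of $\mathbb{C}$, let $\delta_a$ be the usual Dirac measure at
$a$ and for $\xi \in \overline{\mathbb{Q}}^\times$ let
$$\overline{\delta}_\xi = \frac{1}{[\mathbb{Q}(\xi) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi) \to \mathbb{C}} \delta_{\sigma\xi}$$
be the probability measure supported at all complex conjugates of $\xi$, with equal
mass at each point.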
In order to understand the following considerations, we recall some basic facts
from functional analysis and measure theory.
Let X be a locally compact Hausdorff space and let Cc (X) be the space of
complex-valued continuous functions on X with compact support, endowed with
the supremum norm. Then the Riesz representation theorem says that for every
continuous linear functional $\Lambda$ on this normed space there exists a unique complex
regular Borel measure $\mu$ such that
$$\Lambda(f) = \int_X f \, d\mu, \qquad f \in C_c(X).$$
Moreover, the operator norm of Λ equals |µ|(X), where |µ| is the total variation
of the measure µ. For details, we refer to W. Rudin [252], Th.6.19.
It is also clear that every complex regular Borel measure µ on X yields a contin-
uous linear functional on Cc (X). Therefore, the space of complex regular Borel
measures on X is the dual of Cc (X) in the sense of functional analysis, and we
denote it by $C_c(X)^*$. The weak-* topology on $C_c(X)^*$ is the coarsest topology
on $C_c(X)^*$ such that, for every $f \in C_c(X)$, the linear functional $\mu \mapsto \int_X f \, d\mu$
is continuous. The Banach–Alaoglu theorem (see W. Rudin [251], 3.15) says that
the unit ball
$$\bigl\{ \mu \in C_c(X)^* \bigm| |\mu|(X) \le 1 \bigr\}$$
is weak-* compact.
In what follows, we apply these concepts to X = C× .
We have the following result of Bilu:
Theorem 4.3.1. Let $(\xi_i)_{i \in \mathbb{N}}$ be an infinite sequence of distinct non-zero algebraic
numbers such that $h(\xi_i) \to 0$ as $i \to \infty$. Then the sequence $(\overline{\delta}_{\xi_i})_{i \in \mathbb{N}}$ converges
in the weak-* topology to the uniform probability measure $\mu_{\mathbb{T}} := d\theta/(2\pi)$ on the
unit circle $\mathbb{T} := \{e^{i\theta} \mid 0 \le \theta < 2\pi\}$ in $\mathbb{C}$.
Proof: By the weak-* compactness of the unit ball in Cc (C× )∗ , it is enough to
show that any convergent subsequence of the sequence (δ ξi )i∈N has limit µT .
Thus we may assume that the measures δ ξi converge in the weak-* topology to a
Borel measure µ, and we have to show that µ = µT .
Let µ be a weak-* limit of the measures δ ξi . Let a0i and di be the leading
coefficient and degree of a minimal equation for ξi . Since the ξi are distinct,
Northcott’s theorem in 1.6.8 shows that di → ∞ .
As in 1.5.7, $\log^+$ is the maximum of 0 and $\log$. By Propositions 1.6.5 and 1.6.6
and the hypothesis $h(\xi_i) \to 0$, we have
$$h(\xi_i) = \frac{1}{d_i} \log |a_{0i}| + \frac{1}{d_i} \sum_\sigma \log^+ |\sigma\xi_i| \to 0 \tag{4.2}$$
as $i \to \infty$; this implies $\log |a_{0i}| = o(d_i)$ and $\sum_\sigma \log^+ |\sigma\xi_i| = o(d_i)$, where $\sigma$
ranges over all embeddings of $\mathbb{Q}(\xi_i)$ into $\mathbb{C}$.
By weak-* convergence we deduce
$$\frac{1}{d_i} \sum_\sigma f(\sigma\xi_i) \log^+ |\sigma\xi_i| \to \int_{\mathbb{C}^\times} f(z) \log^+ |z| \, d\mu(z)$$
for any continuous function $f(z)$ with compact support in $\mathbb{C}^\times$. Thus (4.2) shows
that
$$\int_{\mathbb{C}^\times} f(z) \log^+ |z| \, d\mu(z) = 0$$
and $\mu$ must be supported in the unit disk $|z| \le 1$. Since $h(1/\xi_i) = h(\xi_i) \to 0$,
working with the sequence $(1/\xi_i)_{i \in \mathbb{N}}$ we deduce in a similar fashion that $\mu$ is
supported in $|z| \ge 1$. Thus any limit measure $\mu$ has support in the unit circle $\mathbb{T}$.
By weak-* convergence again, we conclude that $\mu$ is a probability measure.
Let $D_i$ be the discriminant of a minimal equation for $\xi_i$. By Proposition 1.6.9, we
have
$$\frac{1}{d_i} \log |D_i| \le \log d_i + (2d_i - 2) h(\xi_i).$$
Therefore, we get
$$0 \le \log |D_i| = (2d_i - 2) \log |a_{0i}| + \sum_{\sigma \ne \sigma'} \log |\sigma\xi_i - \sigma'\xi_i| = o(d_i^2). \tag{4.3}$$

By (4.2), (4.3), we easily deduce that $\mu$ is a continuous measure, in other words
the measure of a point is 0. If not, there are a point $a \in \mathbb{T}$ and a constant $c > 0$
such that, for any $\varepsilon > 0$, there are at least $c d_i$ conjugates $\sigma\xi_i$ with
$$|\sigma\xi_i - a| < \varepsilon/2, \tag{4.4}$$
as soon as $i$ is large enough along the sequence for which $\mu_i \to \mu$. The contribution
to (4.3) of $\sigma, \sigma'$ verifying (4.4) is then $\le -c^2 \log(1/\varepsilon) d_i^2$. The remaining
terms contribute not more than $O(d_i^2) + 2d_i \sum_\sigma \log^+ |\sigma\xi_i| = O(d_i^2)$. This contradicts
(4.3).
We claim that the energy integral
$$I(\mu) := \int_{\mathbb{T}^2} \log \frac{1}{|x - y|} \, d\mu(x) \, d\mu(y) \le 0 \tag{4.5}$$
is not positive. To see this, let $\phi(x)$ be a positive continuous non-decreasing function
on $[0, 1]$ such that $\phi(x) = 0$ for $x \le 1/2$ and $\phi(x) = 1$ for $x \ge 1$, and set
$\phi_\varepsilon(x) = \phi(x/\varepsilon)$, where $0 < \varepsilon < 1$.
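We have already observed that $\log |a_{0i}| = o(d_i)$, hence (4.3) shows that
$$\lim_{i \to \infty} \frac{1}{d_i^2} \sum_{\sigma \ne \sigma'} \log \frac{1}{|\sigma\xi_i - \sigma'\xi_i|} = 0.$$
A fortiori, we get
$$\limsup_{i \to \infty} \frac{1}{d_i^2} \sum_{\sigma \ne \sigma'} \phi_\varepsilon(|\sigma\xi_i - \sigma'\xi_i|) \log \frac{1}{|\sigma\xi_i - \sigma'\xi_i|} \le 0,$$
because $\log(1/t) > 0$ if $\phi_\varepsilon(t) < 1$, while $\phi_\varepsilon(t) \le 1$ always. By weak-* convergence,
we infer that
$$\int_{\mathbb{T}^2} \phi_\varepsilon(|x - y|) \log \frac{1}{|x - y|} \, d\mu(x) \, d\mu(y) \le 0$$
for $0 < \varepsilon \le 1$. Since $\mu$ is a continuous measure, the diagonal has measure 0 and
monotone convergence shows that (4.5) holds.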
By appealing to well-known results of potential theory (see e.g. E. Hille [152],
vol.II, §16), it is known that the energy integral is minimized by a unique regular
probability measure µT on T. By symmetry, µT must be the Haar measure on
T . It is well known that I(µT ) = 0, indeed exp(−I(µT )) is the logarithmic
capacity, or transfinite diameter, of T, which is equal to 1. This and (4.5) prove
that µ = µT . 
Another way of concluding the proof, which in its discrete version is Bilu’s argu-
ment, is by Fourier analysis. Let µ ∗ ν be the convolution of two regular Borel
measures on $\mathbb{T}$, that is the unique regular Borel measure determined by
$$\int_{\mathbb{T}} f(x) \, d(\mu * \nu)(x) = \int_{\mathbb{T}^2} f(xy) \, d\mu(x) \, d\nu(y).$$
We consider the Fourier coefficients
$$\hat{\mu}(n) = \int_{\mathbb{T}} x^{-n} \, d\mu(x).$$
Clearly, we have $\widehat{(\mu * \nu)}(n) = \hat{\mu}(n) \hat{\nu}(n)$ for $n \in \mathbb{Z}$. We apply this with our limit
measure $\mu$ and with $\nu$ equal to the composition of $\mu$ with the inversion $x \mapsto x^{-1}$
on $\mathbb{T}$. Then the energy integral can be written as
$$I(\mu) = -\int_{\mathbb{T}} \log |1 - x| \, d(\mu * \nu)(x).$$
Since $\mu$ is a real measure, we see that
$$\hat{\nu}(n) = \overline{\hat{\mu}(n)}.$$
The $n$-th Fourier coefficient of $-\log |1 - e^{i\theta}|$ is 0 if $n = 0$, and $1/(2|n|)$ if $n \ne 0$
(expand $-\log(1 - z)$ in Taylor series), so the energy integral is
$$I(\mu) = \sum_{n \ne 0} \frac{|\hat{\mu}(n)|^2}{2|n|} \ge 0.$$
Equality holds only if $\hat{\mu}(n) = 0$ for $n \ne 0$, hence (4.5) proves that $\mu$ is the
uniform measure on $\mathbb{T}$ (if a regular Borel measure has all its Fourier coefficients
0, it must be the zero measure). □
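The Fourier coefficients of $-\log |1 - e^{i\theta}|$ used in this argument can be checked numerically. The sketch below is only an illustration: it uses a midpoint rule (an assumption of this example, chosen because the midpoints avoid the logarithmic singularity at $\theta = 0$), and the function name is hypothetical.

```python
import math

def fourier_coefficient(n, samples=20_000):
    """Midpoint-rule approximation of (1/2pi) * integral over [0, 2pi] of
    -log|1 - e^{it}| * e^{-int} dt."""
    total = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * (k + 0.5) / samples      # midpoints never hit t = 0
        kernel = -math.log(2.0 * math.sin(t / 2.0))  # |1 - e^{it}| = 2 sin(t/2) on (0, 2pi)
        total += kernel * math.cos(n * t)            # the kernel is even, so the sine part vanishes
    return total / samples

for n in range(4):
    expected = 0.0 if n == 0 else 1.0 / (2 * abs(n))
    print(n, fourier_coefficient(n), expected)
```

The computed values should be close to 0 for $n = 0$ and to $1/(2|n|)$ otherwise, up to the discretization error of the quadrature.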
Remark 4.3.2. We have stated Bilu’s theorem with the condition that the algebraic
numbers ξi are all distinct. The proof of the theorem shows that the only thing
that matters here is that di → ∞ . By the assumption h(ξi ) → 0 and Kronecker’s
theorem in 1.5.9, we may relax the hypothesis of the theorem to h(ξi ) → 0 and
the condition that no root of unity in the sequence (ξi )i∈N is repeated infinitely
often.
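As a concrete illustration of Theorem 4.3.1, consider the hypothetical sequence $\xi_d = 2^{1/d}$ (chosen for this sketch, not taken from the text). Its conjugates are the roots of $x^d - 2$, so $h(\xi_d) = (\log 2)/d \to 0$, and the Galois averages of a compactly supported continuous function should approach its integral against $\mu_{\mathbb{T}}$. The probe function below equals $(\operatorname{Re} z)^2$ on the unit circle, whose integral against $\mu_{\mathbb{T}}$ is $1/2$.

```python
import cmath
import math

def conjugates(d):
    """Complex conjugates of 2**(1/d), i.e. the roots of x^d - 2."""
    radius = 2.0 ** (1.0 / d)
    return [radius * cmath.exp(2j * math.pi * k / d) for k in range(d)]

def height(d):
    """h(2**(1/d)) = (1/d) * sum of log+ |conjugate| (the leading coefficient is 1)."""
    return sum(max(0.0, math.log(abs(z))) for z in conjugates(d)) / d

def probe(z):
    """Continuous, supported in e^-1 <= |z| <= e, equal to (Re z)^2 on the unit circle."""
    bump = max(0.0, 1.0 - abs(math.log(abs(z))))
    return bump * z.real ** 2

def galois_average(d):
    """Integral of the probe function against the Galois orbit measure of 2**(1/d)."""
    return sum(probe(z) for z in conjugates(d)) / d

for d in (3, 30, 300, 3000):
    print(d, height(d), galois_average(d))
```

As $d$ grows, the height tends to 0 and the averages approach $1/2$, in line with the equidistribution statement.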
4.3.3. Now we give a second proof of Zhang’s theorem 4.2.2 (b). In the form given
here, this proof is non-constructive (it depends on compactness arguments), hence
it gives only the existence of a positive lower bound, but not an explicit dependence
on n , d , [K : Q], and max h(fi ).
We proceed by induction on $n$, the claim being trivial for $n = 0$.
Suppose we have an infinite sequence of distinct points $\xi_i \in X^*$ with $\hat{h}(\xi_i) \to 0$.
We begin by mimicking the construction at the beginning of Section 4.3 by
defining the probability measure
$$\overline{\delta}_\xi = \frac{1}{[\mathbb{Q}(\xi) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi) \to \mathbb{C}} \delta_{\sigma\xi}$$
associated to the Galois orbit of $\xi$, and considering a weak-* limit measure $\mu$
of the sequence $(\overline{\delta}_{\xi_i})_{i \in \mathbb{N}}$, as in the proof of Theorem 4.3.1. The same argument
given there then shows that $\mu$ is supported in $\mathbb{T}^n$.
For any non-trivial character $\chi(x) = x_1^{m_1} \cdots x_n^{m_n}$ of $(\mathbb{C}^\times)^n$, consider the associated
sequence $(\chi(\xi_i))_{i \in \mathbb{N}}$. Clearly, $h(\chi(\xi_i)) \le (\max |m_j|) \, \hat{h}(\xi_i) \to 0$.
Case I: For every non-trivial character $\chi$, the sequence $(\chi(\xi_i))_{i \in \mathbb{N}}$ ultimately consists
of distinct elements.
In this case, we claim that
$$\int_{\mathbb{T}^n} \chi(x) \, d\mu = 0. \tag{4.6}$$
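We prove this as follows. By Theorem 4.3.1, the measure $\mu_\chi$ determined by the
sequence $(\chi(\xi_i))_{i \in \mathbb{N}}$ is the uniform measure on $\mathbb{T} = \chi(\mathbb{T}^n)$. Let us fix $c > 0$
and let $f \in C_c(\mathbb{C}^\times)$ be the identity $f(x) = x$ in the neighborhood $\bigl|\log |x|\bigr| < c$
of $\mathbb{T}$. Then the function
$$f_\chi(x) = f(x_1^{m_1}) \cdots f(x_n^{m_n})$$
has compact support in $(\mathbb{C}^\times)^n$ and coincides with $\chi(x)$ in a neighborhood of $\mathbb{T}^n$.
By weak-* convergence, we have
$$\int_{\mathbb{T}^n} \chi(x) \, d\mu = \int_{(\mathbb{C}^\times)^n} f_\chi(x) \, d\mu = \lim_{i \to \infty} \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C}} f_\chi(\sigma\xi_i). \tag{4.7}$$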

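We would like to replace $f_\chi(x)$ by $f(\chi(x))$ in the last sum, but this step requires
justification, because $f(\chi(x))$ is not compactly supported in $(\mathbb{C}^\times)^n$. Let $m = \max |m_j|$
and $M = \max |f|$. By definition of $f$, we have $f_\chi(x) = f(\chi(x))$ if
$\bigl|\log |x_j|\bigr| \le c/m$ for every $j$, and $|f_\chi(x)| \le M^n$ in any case.
We have
$$\frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C}} f_\chi(\sigma\xi_i) = \frac{1}{[\mathbb{Q}(\chi(\xi_i)) : \mathbb{Q}]} \sum_{\tau : \mathbb{Q}(\chi(\xi_i)) \to \mathbb{C}} f(\tau\chi(\xi_i)) + \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C}} \bigl(f_\chi(\sigma\xi_i) - f(\chi(\sigma\xi_i))\bigr).$$
A typical summand in the last sum is 0 unless $\bigl|\log |\sigma\xi_{ij}|\bigr| > c/m$ for some $j$,
and in any case does not exceed $M + M^n$. Thus the last sum does not exceed
$$\sum_{j=1}^{n} \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\substack{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C} \\ |\log |\sigma\xi_{ij}|| > c/m}} (M + M^n) \le \sum_{j=1}^{n} \frac{1}{[\mathbb{Q}(\xi_{ij}) : \mathbb{Q}]} \sum_{\substack{\sigma : \mathbb{Q}(\xi_{ij}) \to \mathbb{C} \\ |\log |\sigma\xi_{ij}|| > c/m}} (M + M^n).$$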

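Since $\hat{h}(\xi_i) \to 0$, by (4.2) and Remark 4.3.2, this tends to 0 as
$i \to \infty$, and proves
$$\begin{aligned}
\lim_{i \to \infty} \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C}} f_\chi(\sigma\xi_i) &= \lim_{i \to \infty} \frac{1}{[\mathbb{Q}(\chi(\xi_i)) : \mathbb{Q}]} \sum_{\tau : \mathbb{Q}(\chi(\xi_i)) \to \mathbb{C}} f(\tau\chi(\xi_i)) \\
&= \int_{\mathbb{T}} x \, d\mu_\chi(x) && \text{(by weak-* convergence)} \\
&= 0. && \text{(by Theorem 4.3.1)}
\end{aligned}$$
In view of (4.7), this proves (4.6).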
As in the proof of Bilu’s theorem, it is clear that µ is a probability measure.
Now the characters χ(x) restricted to Tn form an orthonormal basis of L2 (Tn ),
whence (4.6) shows that the restriction of µ to Tn is the uniform measure on Tn ,
because they have the same Fourier coefficients.
In particular, Tn is contained in the union of the conjugates of X over Q . Since
torsion points are Zariski dense in Gnm (Proposition 3.3.6), this contradicts the
assumption that X is a proper algebraic subvariety of Gnm .
Case II: There is a non-trivial character χ such that the sequence (χ(ξ i ))i∈N has
an element ε0 occurring infinitely many times.
Since h(χ(ξ i )) → 0, we have h(ε0 ) = 0 and ε0 is a root of unity by Kronecker’s
theorem in 1.5.9. Let ε be a torsion point such that χ(ε) = ε0 and replace X by
ε−1 X and {ξ i } by {ε−1 ξ i }. Now (ε−1 X)∗ = ε−1 (X ∗ ) and multiplication by
a torsion point does not change the height; therefore, there is no loss of generality
in assuming that $\varepsilon_0 = 1$. Further, going to an infinite subsequence of the sequence
$(\xi_i)_{i \in \mathbb{N}}$ if needed, we may also assume that there is a torsion point $\varepsilon'$ such that
$\{\varepsilon' \xi_i\}$ is contained in the connected component of the identity of the kernel of
$\chi$, say $H$. Now $H$ is a proper subtorus of $\mathbb{G}_m^n$ and we may replace $X$, $\mathbb{G}_m^n$ by
$\varepsilon' X \cap H$ and $H$, and then use induction. □
Remark 4.3.4. As it stands, this proof does not lead to an effective form of
Theorem 4.2.2. However, it is not difficult to show that there is a lower bound
depending only on d , n , [K : Q], and max h(fi ) for a set of defining equations
of X .

We verify this as follows:
Let us fix $d$, $n$, $[K : \mathbb{Q}]$, and $H$. By Northcott's theorem (see Theorem 2.4.9),
there are only finitely many polynomials in $n$ variables, of degree at most $d$, with
coefficients in $K$ and height at most $H$.
Therefore, the set of varieties X in PnK , defined by polynomials of degree at most
d , with coefficients in K and height at most H, is a finite set. Now the required
lower bound depending only on the fixed data is the minimum of the lower bound
obtained for a given variety X , when X varies over this finite set.
The following immediate consequence of Theorem 4.3.1 is worth noting. We recall
that an algebraic number ξ is called totally real if Q(ξ) has only real embeddings
into C .
Corollary 4.3.5. Let $\kappa > 0$. If $\xi \ne 0$ is algebraic, not a root of unity, and has at
least $\kappa \deg(\xi)$ real conjugates, then $h(\xi) \ge c(\kappa) > 0$ with $c(\kappa)$ independent of
$\xi$.
In particular, totally real algebraic numbers other than 0 and ±1 have height
bounded below by an absolute positive constant.
Remark 4.3.6. A. Schinzel [258] obtained the sharp lower bound
$$h(\xi) \ge \frac{1}{2} \log \left( \frac{1 + \sqrt{5}}{2} \right) = 0.2406059\ldots$$
for $\xi$ ranging over all totally real algebraic numbers $\ne 0, \pm 1$, with minimum
attained for $\xi = \pm(1 \pm \sqrt{5})/2$.
Further examples of totally real numbers η with small height can be obtained by
noticing that, if ξ is totally real, then η − η −1 = ξ yields a totally real η of degree
not exceeding $2 \deg(\xi)$. C.J. Smyth [288] used this process, starting with $\xi_0 = 1$,
to construct a sequence of totally real numbers $\xi_1 = (1 + \sqrt{5})/2, \dots$ of small
height, with $h(\xi_n)$ accumulating at $\lambda = 0.2732831\ldots$.
Moreover, he proved that the heights of totally real numbers are dense in the inter-
val (λ, ∞). It is conceivable that λ is the smallest limit point of h(ξ) for totally
real ξ .
The minimum above is isolated, and subsequently Smyth [288] determined the
first four smallest values of h(ξ) for totally real ξ . They are attained at the points
ξ1 , ξ2 , ξ3 , and 2 cos(2π/7).
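The numerical value in Schinzel's bound and the first step of Smyth's process can be reproduced directly. This is a small sketch with illustrative function names; it only treats the quadratic case, where the height is half the sum of $\log^+$ of the absolute values of the two roots of a monic integer quadratic.

```python
import math

def height_of_monic_quadratic_root(root, other_root):
    """h = (1/2) * (log+|root| + log+|other_root|) for a monic integer quadratic."""
    return 0.5 * sum(max(0.0, math.log(abs(z))) for z in (root, other_root))

def smyth_step(xi):
    """Larger root eta of eta - 1/eta = xi, i.e. of eta^2 - xi*eta - 1 = 0."""
    return (xi + math.sqrt(xi * xi + 4.0)) / 2.0

xi_1 = smyth_step(1.0)    # starting from xi_0 = 1 gives (1 + sqrt 5)/2
conjugate = -1.0 / xi_1   # the two roots of x^2 - x - 1 have product -1
schinzel_bound = 0.5 * math.log((1.0 + math.sqrt(5.0)) / 2.0)

print(xi_1, height_of_monic_quadratic_root(xi_1, conjugate), schinzel_bound)
```

The height of $\xi_1$ agrees with the constant $0.2406059\ldots$ of Remark 4.3.6, as expected since the minimum is attained there.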

4.4. Dobrowolski’s theorem

In this section, we prove the following theorem of Dobrowolski alluded to in
1.6.15. We have:
Theorem 4.4.1. Let $\alpha$ be an algebraic number of degree $d$, not a root of unity or
0. Then
$$h(\alpha) \ge \frac{c}{d} \left( \frac{\log\log(3d)}{\log(3d)} \right)^3$$
for an absolute constant $c > 0$.

We begin with a simple lemma. We assume throughout that $\alpha \ne 0$ has degree $d$.
Let $\alpha_1, \dots, \alpha_d$ denote a full set of conjugates of $\alpha$. We have:
Lemma 4.4.2. We may assume that $\alpha$ is a unit and $\mathbb{Q}(\alpha) = \mathbb{Q}(\alpha^p)$ for every
prime $p$ not dividing $d$. In particular, $\alpha^p$ has degree $d$ and $\alpha_1^p, \dots, \alpha_d^p$ is a full
set of conjugates of $\alpha^p$. Moreover, the algebraic integers $\alpha_i^p$, for varying $i$ and
$p$, $p = 1$ or a prime not dividing $d$, are all distinct.
Proof: If α is not a unit, we have the easy estimate h(α) ≥ (log 2)/d (see 1.6.15),
which is stronger than the lower bound stated in Dobrowolski’s theorem.
Let $\xi = \alpha^p$ and $K = \mathbb{Q}(\xi)$. The polynomial $x^p - \xi$ is irreducible over a finitely
generated field $K$, unless $\xi = \eta^p$ for some $\eta \in K$. This is an old result by
Abel and is a special case of the well-known theorem* of Vahlen–Capelli on the
reducibility of the polynomial $x^m - a$ (see L. Rédei [239], Th.427, Th.428 or [49],
Ch.V, §11, ex.5). A simple direct proof is as follows. The roots of $x^p - \xi = 0$ are
$\lambda \xi^{1/p}$ with $\lambda$ a $p$-th root of unity. If $x^p - \xi$ were reducible over $K$ with a monic
irreducible factor $g(x)$ of degree $s < p$, then looking at the last coefficient of
$g(x)$ we would have $\lambda' \alpha^s \in K$ for some $p$-th root of unity $\lambda'$. There is an integer
$m$ with $ms \equiv 1 \bmod p$, hence $\eta := (\lambda')^m \alpha \in K$ and $\xi = \eta^p$, as asserted.
Now suppose that $p$ is not a divisor of $d$. Let $s := [\mathbb{Q}(\alpha) : K]$. We cannot have
$s = p$, because $p$ does not divide $d$. Hence the polynomial $x^p - \xi$ is reducible
over $K$, and $\alpha^p = \xi = \eta^p$ with $\eta \in K$. In particular, $h(\alpha) = h(\eta)$. If we had
$s > 1$, then $\deg(\eta) < \deg(\alpha)$, and the theorem for $\alpha$ would follow from the
theorem in lower degree. Thus we may assume $s = 1$, which is the first statement
of the lemma.
Finally, assume that $\alpha_i^p = \alpha_j^q$ for two distinct primes $p$ and $q$ not dividing
$d$. By the first part of the lemma, we have that $\alpha_1^q, \dots, \alpha_d^q$ is a permutation of
$\alpha_1^p, \dots, \alpha_d^p$, thus $\alpha_i^q = \alpha_{\sigma(i)}^p$ for some permutation $\sigma$ of $\{1, \dots, d\}$. This yields,
if $m$ is the order of $\sigma$,
$$\begin{aligned}
\alpha_i^{q^m} = (\alpha_i^q)^{q^{m-1}} &= (\alpha_{\sigma(i)}^p)^{q^{m-1}} \\
&= (\alpha_{\sigma(i)}^q)^{p q^{m-2}} = (\alpha_{\sigma^2(i)}^p)^{p q^{m-2}} \\
&= \cdots = (\alpha_{\sigma^{m-1}(i)}^q)^{p^{m-1}} = (\alpha_{\sigma^m(i)}^p)^{p^{m-1}} = \alpha_i^{p^m},
\end{aligned}$$
hence $\alpha_i^{q^m - p^m} = 1$ and $\alpha$ is a root of unity, which was excluded from the beginning. □

* The theorem states that $x^m - a$ is reducible over a field $K$ of characteristic 0 if and only if
$a = b^p$ for some $b \in K$ and a prime $p$ with $p \mid m$, or $a = -4b^4$ and $4 \mid m$.
Our next step is the construction of a polynomial F (x) ∈ Z[x] of degree at most
D , vanishing at α to order at least m . Here D and m are large parameters (in
the end going to ∞ ) and we want some control on the height of F (x).
Lemma 4.4.3. Let $\alpha$ be an algebraic number of degree $d \ge 2$. Let us fix $\varepsilon$
with $0 < \varepsilon < 1$ and suppose that $dm \le (1 - \varepsilon)D$. Then there is a polynomial
$F(x) \in \mathbb{Z}[x]$ of degree at most $D$, not identically 0, vanishing at $\alpha$ to order at
least $m$ and such that
$$h(F) \le \frac{dm^2}{D - dm} \left( \log \frac{D}{m} + 1 \right) + \frac{Ddm}{D - dm} \, h(\alpha) + o(D)$$
as $D \to \infty$.
Proof: This follows from Siegel's lemma, Corollary 2.9.7, noting that in our case
the matrix $A$ is
$$A = \left( \binom{j}{h} \alpha^{j-h} \right)$$
with $m$ rows indexed by $h$, $0 \le h < m$, and $D + 1$ columns indexed by $j$,
$j = 0, 1, \dots, D$. Using the easy bound
$$\log \binom{a}{b} \le b \left( \log \frac{a}{b} + 1 \right)$$
we get $h(A_h) \le m(\log(D/m) + 1) + D h(\alpha)$ for the $h$-th row of $A$. □
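The "easy bound" on binomial coefficients used above is a consequence of $\binom{a}{b} \le a^b/b! \le (ea/b)^b$. A brute-force check over a small range, offered only as a sanity test and not as a proof, can be written as follows (function names are illustrative):

```python
import math

def log_binomial(a, b):
    """log of the binomial coefficient C(a, b)."""
    return math.log(math.comb(a, b))

def easy_bound(a, b):
    """Right-hand side b * (log(a/b) + 1) of the bound."""
    return b * (math.log(a / b) + 1.0)

def bound_holds(max_a):
    """Check log C(a, b) <= b (log(a/b) + 1) for all 1 <= b <= a <= max_a."""
    return all(log_binomial(a, b) <= easy_bound(a, b)
               for a in range(1, max_a + 1)
               for b in range(1, a + 1))

print(bound_holds(200))
```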
4.4.4. In what follows, $\alpha$ will satisfy the conditions of Lemma 4.4.2. Since $\alpha$ is a
unit, it is an algebraic integer (see Remark 1.5.11). Let
$$f(x) = \prod_{i=1}^{d} (x - \alpha_i) \in \mathbb{Z}[x]$$
be its minimal polynomial. For every prime $p$ not dividing $d$ let
$$f_p(x) = \prod_{i=1}^{d} (x - \alpha_i^p)$$
and let $e_p$ be the multiplicity of $f_p$ as a factor of $F(x)$. Note that by Lemma
4.4.2 the polynomials $f$, $f_p$ are irreducible in $\mathbb{Z}[x]$ and are all distinct. It follows,
denoting by $\prod'$ a product over primes not dividing $d$, that
$$F(x) = f(x)^m \prod\nolimits'_{p} f_p(x)^{e_p} \, G(x)$$
for some $G(x) \in \mathbb{Z}[x]$ with $G(\alpha^p) \ne 0$ for every prime $p$ not dividing $d$. Moreover,
by Lemma 4.4.2, we have $f(\alpha^p) \ne 0$ and $f_q(\alpha^p) \ne 0$ if $p \ne q$.

Proof of Dobrowolski's theorem: By the remarks in 1.6.15, we may assume that $d$
is large.
We differentiate $e_p$ times the polynomial $F(x)$, divide by $e_p!$, and specialize $x$
to $\alpha^p$, obtaining an algebraic integer
$$\eta := f(\alpha^p)^m f_p'(\alpha^p)^{e_p} \prod\nolimits'_{q \ne p} f_q(\alpha^p)^{e_q} \, G(\alpha^p) = \frac{1}{e_p!} \left( \frac{d}{dx} \right)^{e_p} F(\alpha^p).$$
By 4.4.4, this is a non-zero algebraic integer of degree at most $d$.
On the other hand
$$f(x^p) \equiv f(x)^p \bmod p \, \mathbb{Z}[x],$$
whence, specializing $x$ to $\alpha$, the algebraic integer $f(\alpha^p)$ is divisible by $p$. Thus
the norm $N_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\eta)$ is a non-zero rational integer divisible by $p^{dm}$.
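The congruence $f(x^p) \equiv f(x)^p \bmod p\,\mathbb{Z}[x]$ holds for every integer polynomial and every prime $p$, and it is easy to test coefficientwise. The sketch below uses plain coefficient lists (constant term first); the polynomial $x^2 - x - 1$ is an arbitrary illustrative choice, not one taken from the text.

```python
def poly_mul(a, b):
    """Product of integer polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_pow(a, e):
    """e-th power of an integer polynomial by repeated multiplication."""
    result = [1]
    for _ in range(e):
        result = poly_mul(result, a)
    return result

def substitute_power(a, p):
    """Coefficient list of f(x^p)."""
    out = [0] * ((len(a) - 1) * p + 1)
    for i, ai in enumerate(a):
        out[i * p] = ai
    return out

def frobenius_congruence_holds(f, p):
    """True if every coefficient of f(x^p) - f(x)^p is divisible by p."""
    lhs = substitute_power(f, p)
    rhs = poly_pow(f, p)
    return all((u - v) % p == 0 for u, v in zip(lhs, rhs))

f = [-1, -1, 1]  # x^2 - x - 1, an illustrative choice
print([frobenius_congruence_holds(f, p) for p in (2, 3, 5, 7)])
```

For a composite exponent the congruence can fail, for instance with $f = x + 1$ and exponent 4, which is why the argument works with primes.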
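An upper bound for the norm comes from the expression of $\eta$ in terms of $F(x)$
just given and Lemma 4.4.3. We claim that
$$\left| N_{\mathbb{Q}(\alpha)/\mathbb{Q}} \left( \frac{1}{e_p!} \left( \frac{d}{dx} \right)^{e_p} F(\alpha^p) \right) \right| \le \left( (D + 1) \binom{D}{e_p} H(F) \right)^{d} H(\alpha)^{dpD}, \tag{4.8}$$
where $H(\alpha) = \exp(h(\alpha))$ is the multiplicative height.
To see this, note first that
$$H \left( \frac{1}{e_p!} \left( \frac{d}{dx} \right)^{e_p} F \right) \le \binom{D}{e_p} H(F). \tag{4.9}$$
Next, for any polynomial $G \in \overline{\mathbb{Q}}[x]$ of degree $D$ and any $\beta \in \overline{\mathbb{Q}}$, we have
$$H(G(\beta)) \le (D + 1) H(G) H(\beta)^D. \tag{4.10}$$
Next, it is clear that
$$H(\alpha^p) = H(\alpha)^p. \tag{4.11}$$
Then the stated bound (4.8) for the norm follows from the last statement of Proposition
1.6.6 and (4.9), (4.10), and (4.11).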
We compare with the lower bound $p^{dm}$, take logarithms, divide by $d$, and use
Lemma 4.4.3 to estimate $h(F)$, obtaining
$$m \log p \le e_p \left( \log \frac{D}{e_p} + 1 \right) + \frac{dm^2}{D - dm} \left( \log \frac{D}{m} + 1 \right) + \left( pD + \frac{Ddm}{D - dm} \right) h(\alpha) + o(D).$$

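We define $\gamma \le 1 - \varepsilon$ and $\gamma_p$ by $m = \gamma D/d$ and $e_p = \gamma_p D/d$. Then the inequality
above simplifies to
$$\frac{\gamma \log p}{d} \le \frac{\gamma_p}{d} \left( \log \frac{d}{\gamma_p} + 1 \right) + \frac{\gamma^2}{(1 - \gamma)d} \left( \log \frac{d}{\gamma} + 1 \right) + \left( p + \frac{\gamma}{1 - \gamma} \right) h(\alpha) + o(1), \tag{4.12}$$
with the $o(1)$ term going to 0 as $D \to \infty$.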
It remains to optimize inequality (4.12) by choosing a range for the prime p , the
parameter γ < 1, and an optimal γp . In what follows, we shall assume that d is
large and deal with estimates asymptotic with respect to d .
It is clearly convenient to make sure that $\gamma_p$ is as small as possible, and to this end
we note that
$$\sum\nolimits'_{p} \gamma_p \le 1,$$
because each $f_p^{e_p}$ divides $F$ and $f_p$ has degree $d$. In particular, if our set of
primes consists of the primes in an interval $[Y_0, Y]$ not dividing $d$, and $N > 0$ is
the number of such primes, there exists $p \in [Y_0, Y]$ such that $p$ does not divide $d$
and $\gamma_p \le 1/N$. Since $x(\log(d/x) + 1)$ increases with $x$ for $0 < x < d$, we see
that (4.12) can be replaced by
$$\frac{\gamma \log p}{d} \le \frac{1}{Nd} (\log(Nd) + 1) + \frac{\gamma^2}{(1 - \gamma)d} \left( \log \frac{d}{\gamma} + 1 \right) + \left( p + \frac{\gamma}{1 - \gamma} \right) h(\alpha) + o(1).$$
Now we let $D \to \infty$, getting rid of the $o(1)$ term, and deduce a fortiori
$$\frac{\gamma \log Y_0}{d} \le \frac{1}{Nd} (\log(Nd) + 1) + \frac{\gamma^2}{(1 - \gamma)d} \left( \log \frac{d}{\gamma} + 1 \right) + \left( Y + \frac{\gamma}{1 - \gamma} \right) h(\alpha), \tag{4.13}$$
with $N$ the number of primes, not dividing $d$, in $[Y_0, Y]$.
In order to optimize (4.13), we consider the parameters Y0 , Y , γ as functions of
d → ∞.
We begin by estimating $N$. The number of primes dividing $d$ is less than
$\log d / \log 2$, therefore the prime number theorem (H. Koch [162], Theorem
1.7.3) shows that
$$N = \pi(Y) - \pi(Y_0) + O(\log d) \sim \frac{Y}{\log Y},$$
provided $Y_0 = o(Y)$ and $\log d = o(Y/\log Y)$ as $Y \to \infty$, which we shall
assume. We do not want to choose $Y$ too large (in fact, $Y = o(d)$ will suffice),
nor $Y_0$ too small, and it is quite reasonable to choose $Y_0 = Y/\log Y$, ensuring
that $\log Y_0 \sim \log Y$, and
$$\log Y = o(\log d), \tag{4.14}$$

ensuring that log(Y d) ∼ log d . Moreover, we do not want γ to be too small, and
in fact we shall need
log(1/γ) = o(log d) (4.15)
as $d \to \infty$. Then (4.13) becomes, as $Y \to \infty$,
$$\frac{(1 - o(1)) \gamma \log Y}{d} \le (1 + o(1)) \frac{\log Y}{Yd} \log d + (1 + o(1)) \frac{\gamma^2}{d} \log d + (1 + o(1)) Y h(\alpha).$$
We rewrite this as
$$h(\alpha) \ge (1 - o(1)) \left( \frac{\gamma \log Y}{Yd} - \frac{(\log Y)(\log d)}{Y^2 d} - \frac{\gamma^2 \log d}{Yd} \right)$$
and proceed to optimize the two remaining parameters $\gamma$ and $Y$, as functions of
$d$. Optimization with respect to $\gamma$ gives
$$\gamma = \frac{\log Y}{2 \log d},$$
which is compatible with condition (4.15), yielding
$$h(\alpha) \ge (1 - o(1)) \left( \frac{(\log Y)^2}{4Yd \log d} - \frac{(\log Y)(\log d)}{Y^2 d} \right).$$
Finally, optimization with respect to $Y$ occurs with
$$Y \sim \frac{4(\log d)^2}{\log\log d},$$
which is compatible with condition (4.14), and we conclude with
$$h(\alpha) \ge (1 - o(1)) \frac{1}{8d} \left( \frac{\log\log d}{\log d} \right)^3$$
as $d \to \infty$. This proves Dobrowolski's theorem. □
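For the reader who wants to see where the constant $1/8$ comes from, the two optimization steps can be written out as follows. This is a worked sketch of the leading terms only; the auxiliary name $\Phi$ is introduced here for convenience and does not appear in the text.

```latex
\begin{align*}
\Phi(\gamma, Y) &:= \frac{\gamma \log Y}{Yd} - \frac{(\log Y)(\log d)}{Y^2 d} - \frac{\gamma^2 \log d}{Yd}, \\
\frac{\partial \Phi}{\partial \gamma} &= \frac{\log Y - 2\gamma \log d}{Yd} = 0
  \iff \gamma = \frac{\log Y}{2 \log d}, \\
\Phi\!\left(\frac{\log Y}{2 \log d},\, Y\right)
  &= \frac{(\log Y)^2}{4Yd \log d} - \frac{(\log Y)(\log d)}{Y^2 d}.
\end{align*}
% With Y = 4 (log d)^2 / log log d one has log Y ~ 2 log log d, hence
\begin{align*}
\frac{(\log Y)^2}{4Yd \log d}
  &\sim \frac{4 (\log\log d)^2 \cdot \log\log d}{16\, d\, (\log d)^3}
   = \frac{1}{4d} \left( \frac{\log\log d}{\log d} \right)^3, \\
\frac{(\log Y)(\log d)}{Y^2 d}
  &\sim \frac{2 (\log\log d)(\log d)(\log\log d)^2}{16\, d\, (\log d)^4}
   = \frac{1}{8d} \left( \frac{\log\log d}{\log d} \right)^3,
\end{align*}
% and the difference of the two leading terms is (1/(8d)) (log log d / log d)^3.
```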
Remark 4.4.5. The constant 1/8 given here is far from being optimal for the
method given here. A more careful evaluation in Lemma 4.4.3 yields easily the
constant 1, as in E. Dobrowolski’s original paper [90].

The remaining part of this section deals with further results about Lehmer’s conjecture and
may be skipped in a first reading. We will use it only in Section 4.6. A natural extension of
Lehmer’s conjecture to the higher dimensional case is:
Conjecture 4.4.6. There is $c(n) > 0$ with the following property. Let $\alpha_1, \dots, \alpha_n$ be
multiplicatively independent non-zero algebraic numbers. Then
$$h(\alpha_1) \cdots h(\alpha_n) \ge \frac{c(n)}{[\mathbb{Q}(\alpha_1, \dots, \alpha_n) : \mathbb{Q}]}.$$
For $n = 1$, this reduces to Lehmer's conjecture.
A significant extension of Dobrowolski’s theorem has been obtained by F. Amoroso and S.
David [9] in this context. They prove:
Theorem 4.4.7. There is a positive constant $c'(n)$ with the following property. Let $\alpha$ be
as in 4.4.6 and let $D = [\mathbb{Q}(\alpha_1, \dots, \alpha_n) : \mathbb{Q}]$. Then
$$h(\alpha_1) \cdots h(\alpha_n) \ge \frac{c'(n)}{D} (\log(3D))^{-n\kappa(n)},$$
where $\kappa(n) = (n + 1)(n + 1)!^n - 1$.
In fact, they prove this result in the more precise form in which the degree D is replaced
by the smallest degree ωQ (α) of a hypersurface defined over Q containing the point α .
As a corollary, we obtain the validity of Lehmer’s conjecture for any α not a root of unity
such that Q(α) is a Galois extension of Q .
We will not prove this result here and instead refer the interested reader to the original paper
of Amoroso and David.
4.4.8. Recall that a field extension K/Q is called abelian if it is a Galois extension with
an abelian Galois group. We conclude this section with a nice result of F. Amoroso and R.
Dvornicich [11], which provides a uniform positive lower bound for the height of algebraic
numbers, not a root of unity or 0 , in abelian extensions of Q .
Theorem 4.4.9. Let $K/\mathbb{Q}$ be an abelian extension and let $\alpha \in K$, $\alpha$ not a root of unity
or 0. Then
$$h(\alpha) \ge \frac{\log(5/2)}{10}.$$
Remark 4.4.10. Amoroso and Dvornicich obtain the more precise lower bound log(5)/12
and give an example with height log(7)/12 .
4.4.11. For m ≥ 3 let ζm be a primitive m th root of unity and denote by Cm = Q(ζm )
the m th cyclotomic field of degree ϕ(m) , and by Om the ring of integers of Cm . By
the Kronecker–Weber theorem (see L.C. Washington [322], Th.14.1), any finite abelian
extension K of Q is contained in a cyclotomic extension of Q . Thus, in proving 4.4.9,
there is no loss of generality in assuming that K = Cm for some m .
Lemma 4.4.12. Let K be a number field and let w be a non-archimedean place of K .
Then, for any α ∈ K \ {0} , there exists an algebraic integer β ∈ K \ {0} , such that αβ
is an algebraic integer and
|β|w = 1/ max(1, |α|w ).
Proof: Let $S_0$ be the set of non-archimedean places $v$ of $K$ for which $|\alpha|_v > 1$ and set
$\xi_v = 1/\alpha$. If, moreover, $w \notin S_0$, set $\xi_w = 1$. Define $S = S_0 \cup \{w\}$. By the strong
approximation theorem (see Theorem 1.4.5), for any $\varepsilon > 0$, there is $\beta \in K \setminus \{0\}$ such
that
$$|\beta - \xi_v|_v < \varepsilon$$
for every $v \in S$, and also $|\beta|_v \le 1$ for a non-archimedean $v$ not in $S$.
Since we are dealing with ultrametric absolute values, by definition of $S_0$ and $\xi_v$ we see
that, if $\varepsilon$ is sufficiently small, we have $|\beta|_v = |1/\alpha|_v \le 1$ for $v \in S_0$, and $|\beta|_w = 1$
if $w \notin S_0$. Hence $|\beta|_w = \min(1, |1/\alpha|_w) = 1/\max(1, |\alpha|_w)$ and also $|\beta|_v \le 1$ and
$|\alpha\beta|_v \le 1$ for $v \in S$. If instead $v \notin S$ is a non-archimedean place, we have $|\alpha|_v \le 1$ by
definition of $S_0$, and again $|\beta|_v \le 1$ and $|\alpha\beta|_v \le 1$ for a non-archimedean $v$ not in $S$.
Hence $\beta$ and $\alpha\beta$ are both algebraic integers, as noted in Definition 1.5.10. □
Lemma 4.4.13. Let $p$ be a rational prime. Then there exists a non-trivial $\sigma = \sigma_p \in \mathrm{Gal}(C_m/\mathbb{Q})$
with the following property:
(a) If $\mathrm{GCD}(p, m) = 1$, then $p$ divides $\gamma^p - \sigma\gamma$ for any $\gamma \in O_m$.
(b) If $\mathrm{GCD}(p, m) = p$, then $p$ divides $\gamma^p - \sigma\gamma^p$ for any $\gamma \in O_m$. Moreover, if
$\sigma\gamma^p = \gamma^p$, there exists an $m$-th root of unity $\zeta$ such that $\zeta\gamma$ is contained in the
proper cyclotomic subfield $C_{m/p}$ of $C_m$.
Proof: We recall that the ring of integers of Cm is Z[ζm ] (see [322], Ch.1, Prop.1.2). Thus
we can write γ = f (ζm ) for some f ∈ Z[x] . Suppose first that p does not divide m . Let
$\sigma \in \mathrm{Gal}(C_m/\mathbb{Q})$ be defined by $\sigma\zeta_m = \zeta_m^p$. Then
$$\gamma^p \equiv f(\zeta_m^p) \equiv f(\sigma\zeta_m) \equiv \sigma\gamma \bmod p,$$
proving (a).
If instead $p$ divides $m$, we argue as follows. The Galois group $\mathrm{Gal}(C_m/C_{m/p})$ is cyclic
of order $p$ or $p - 1$ according as $p^2$ divides $m$ or not. Let $\sigma$ be a generator; then $\sigma\zeta_m = \lambda_p \zeta_m$
for some primitive $p$-th root of unity $\lambda_p$. A similar calculation as before yields
$$\gamma^p \equiv f(\zeta_m^p) \equiv f(\sigma\zeta_m^p) \equiv \sigma\gamma^p \bmod p,$$
which is the first part of statement (b). Finally, if $\sigma\gamma^p = \gamma^p$, we have $\sigma\gamma = \lambda_p^a \gamma$ for some
integer $a$. It follows that $\sigma(\gamma/\zeta_m^a) = \gamma/\zeta_m^a$ and $\gamma/\zeta_m^a$ belongs to the fixed field $C_{m/p}$. □
Proof of Theorem 4.4.9: Let α ∈ Cm \ {0} , α not 0 or a root of unity. Let p be a prime.
Since h(α) = h(ζα) for any root of unity ζ , replacing α by ζα we may also assume that
ζα is never contained in a proper cyclotomic subfield of Cm .
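Case I: $p$ does not divide $m$. Let $\sigma$ be the element of $\mathrm{Gal}(C_m/\mathbb{Q})$ as in Lemma 4.4.13 and let $v$ be any place dividing $p$. By Lemma 4.4.12, there is an algebraic integer $\beta \in C_m$ such that $\alpha\beta$ is an algebraic integer and $|\beta|_v = 1/\max(1, |\alpha|_v)$. Then by Lemma 4.4.13 we have
\[ |(\alpha\beta)^p - \sigma(\alpha\beta)|_v \le |p|_v, \qquad |\beta^p - \sigma\beta|_v \le |p|_v. \tag{4.16} \]
Let us write $\eta := \alpha^p - \sigma\alpha$. By the ultrametric inequality, the bounds (4.16) and the identity
\[ \alpha^p - \sigma\alpha = \beta^{-p}\bigl((\alpha\beta)^p - \sigma(\alpha\beta) + (\sigma\beta - \beta^p)\sigma\alpha\bigr), \]
we have
\[
\begin{aligned}
|\eta|_v &= |\beta|_v^{-p}\,\bigl|(\alpha\beta)^p - \sigma(\alpha\beta) + (\sigma\beta - \beta^p)\sigma\alpha\bigr|_v \\
&\le |\beta|_v^{-p} \max\bigl(|(\alpha\beta)^p - \sigma(\alpha\beta)|_v,\ |\beta^p - \sigma\beta|_v\,|\sigma\alpha|_v\bigr) \\
&\le |p|_v\,|\beta|_v^{-p} \max(1, |\sigma\alpha|_v) \\
&= |p|_v \max(1, |\alpha|_v)^p \max(1, |\sigma\alpha|_v).
\end{aligned}
\]
Moreover, $\eta \ne 0$ because $\alpha$ is not a root of unity. Otherwise, by induction on $i$, we would have $\alpha^{p^i} - \sigma^i\alpha = 0$ and, taking $i$ to be the order of $\sigma$ in $\mathrm{Gal}(C_m/\mathbb{Q})$, $\alpha$ would be a root of unity, which was excluded by hypothesis.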
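Now we apply the product formula to $\eta$. Let as usual $\varepsilon_v = 0$ if $v$ is not archimedean and $\varepsilon_v = [(C_m)_v : \mathbb{Q}_v]/[C_m : \mathbb{Q}]$ if $v \mid \infty$. If $v \mid p$, we have shown that
\[ \log|\eta|_v \le \log|p|_v + p\log^+|\alpha|_v + \log^+|\sigma\alpha|_v. \]
If instead $v$ does not divide $p$, we have trivially
\[ \log|\eta|_v \le p\log^+|\alpha|_v + \log^+|\sigma\alpha|_v + \varepsilon_v\log 2. \]
Hence summing over all places and using the product formula, we get
\[
\begin{aligned}
0 &= \sum_v \log|\eta|_v \\
&\le \sum_{v \mid p} \log|p|_v + \sum_v \bigl(p\log^+|\alpha|_v + \log^+|\sigma\alpha|_v + \varepsilon_v\log 2\bigr) \\
&= -\log p + p\,h(\alpha) + h(\sigma\alpha) + \log 2 = -\log(p/2) + (p+1)\,h(\alpha).
\end{aligned}
\]
Therefore, in Case I, we have
\[ h(\alpha) \ge \frac{\log(p/2)}{p+1}. \]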
Case II: $p$ divides $m$. We proceed in a similar way as in Case I, working now with $\eta := \alpha^p - \sigma\alpha^p$. The application of Lemma 4.4.13, proceeding as before but using this time the identity
\[ \alpha^p - \sigma\alpha^p = \beta^{-p}\bigl((\alpha\beta)^p - \sigma(\alpha\beta)^p + (\sigma\beta^p - \beta^p)\sigma\alpha^p\bigr), \]
shows that, if $v \mid p$, then
\[ |\eta|_v \le |p|_v \max(1, |\alpha|_v)^p \max(1, |\sigma\alpha|_v)^p. \]
If instead $v$ does not divide $p$, we have the trivial estimate
\[ \log|\eta|_v \le p\log^+|\alpha|_v + p\log^+|\sigma\alpha|_v + \varepsilon_v\log 2. \]
Note that $\eta \ne 0$, otherwise Lemma 4.4.13 shows that there would be a root of unity $\zeta \in C_m$ such that $\zeta\alpha \in C_{m/p}$, which was excluded at the start. Hence the application of the product formula yields, much in the same way as in the preceding case, the inequality
\[ h(\alpha) \ge \frac{\log(p/2)}{2p}. \]
Theorem 4.4.9 follows by considering the prime $p = 5$.
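The choice $p = 5$ can be checked numerically. The following Python sketch (the function names are illustrative and not taken from the text) tabulates the Case I bound $\log(p/2)/(p+1)$ and the Case II bound $\log(p/2)/(2p)$ for a few small primes. Since the estimate has to hold whether or not $p$ divides $m$, only the smaller of the two can be used, and among the primes tried it is largest at $p = 5$.

```python
import math


def case_one_bound(p):
    """Lower bound for h(alpha) from Case I (p does not divide m)."""
    return math.log(p / 2) / (p + 1)


def case_two_bound(p):
    """Lower bound for h(alpha) from Case II (p divides m)."""
    return math.log(p / 2) / (2 * p)


def uniform_bound(p):
    """Bound valid in both cases: the smaller of the two."""
    return min(case_one_bound(p), case_two_bound(p))


primes = [3, 5, 7, 11, 13, 17, 19]
best_prime = max(primes, key=uniform_bound)

for p in primes:
    print(p, round(case_one_bound(p), 5), round(case_two_bound(p), 5))
print("best prime:", best_prime, "bound:", uniform_bound(best_prime))
```

For $p = 5$ the uniform bound equals $\log(5/2)/10$, the constant that appears in Corollary 4.4.14.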
We denote by a bar complex conjugation on $C_m$, namely the automorphism determined by $\zeta_m \mapsto \zeta_m^{-1}$.
Corollary 4.4.14. Let $\gamma$ be an algebraic integer in $C_m$. If $\gamma/\bar\gamma$ is not a root of unity, it holds that
\[ \frac{1}{\varphi(m)} \log\bigl|N_{C_m/\mathbb{Q}}(\gamma)\bigr| \ge \frac{\log(5/2)}{10}, \]
where $\varphi$ is the Euler $\varphi$-function.
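Proof: By Theorem 4.4.9, Lemma 1.3.7, and the product formula, we have
\[
\begin{aligned}
\frac{\log(5/2)}{10} &\le \sum_v \log^+|\gamma/\bar\gamma|_v = \sum_{v \nmid \infty} \log^+|\gamma/\bar\gamma|_v \\
&\le -\sum_{v \nmid \infty} \log|\bar\gamma|_v = \sum_{v \mid \infty} \log|\bar\gamma|_v \\
&= \frac{1}{\varphi(m)} \log\bigl|N_{C_m/\mathbb{Q}}(\gamma)\bigr|.
\end{aligned}
\]
This proves what we want.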


The next result, apart from the numerical constant, is a well-known result of C.J. Smyth
[287].
Theorem 4.4.15. Let $\alpha \ne 0$ be an algebraic integer of degree $d$ and assume that $\alpha^{-1}$ is not a conjugate of $\alpha$. Then
\[ h(\alpha) \ge \frac{c}{d} \]
for an absolute constant $c$.
Remark 4.4.16. The result of Smyth shows that the optimal constant $c$ is $c = \log(\theta_0)$, where $\theta_0 > 1$ is the smallest Pisot–Vijayaraghavan number, namely the real root of the cubic equation $x^3 - x - 1 = 0$. The method of Smyth is based on techniques of complex function theory and is quite different from the algebraic method followed here.
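To give a feeling for the size of Smyth's constant, here is a minimal Python sketch that approximates $\theta_0$ by Newton's method and then evaluates $\log\theta_0$. The function name, the starting value and the tolerance are choices made for this illustration only.

```python
import math


def smallest_pisot_number(tolerance=1e-15, max_steps=100):
    """Approximate the real root of x^3 - x - 1 = 0 by Newton's method."""
    x = 1.5  # starting guess to the right of the root
    for _ in range(max_steps):
        step = (x**3 - x - 1) / (3 * x**2 - 1)
        x -= step
        if abs(step) < tolerance:
            break
    return x


theta0 = smallest_pisot_number()
smyth_constant = math.log(theta0)
print(theta0, smyth_constant)
```

The computed values are approximately $\theta_0 \approx 1.3247$ and $\log\theta_0 \approx 0.2812$.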
Proof: We have already remarked in 1.6.15 that such an estimate holds, with the constant
c = log 2 , if α is not an algebraic integer. Thus we may assume that α is an algebraic
integer of degree d ≥ 2 .
Let $f(x) \in \mathbb{Z}[x]$ be the minimal polynomial of $\alpha$ and let $p$ be a prime number. We set $\gamma := f(\zeta_p)$ and apply Corollary 4.4.14 to the algebraic integer $\gamma$. To this end, we need to verify that $\gamma/\bar\gamma$ is not a root of unity, at least for $p$ large enough.
Suppose the contrary. Then $\gamma/\bar\gamma$ must be a root of unity in $C_p$, hence $f(\zeta_p) = \pm\zeta_p^j f(\zeta_p^{-1})$ for some integer $j$, which we may assume to be in the range $-1 \le j \le p - 2$. It follows that
\[ \zeta_p^d f(\zeta_p) = \pm\zeta_p^j f^*(\zeta_p), \]
where $f^*(x) = x^d f(x^{-1})$ is the reciprocal polynomial of $f$. Consider the polynomial
\[ g(x) = x^{\max(d-j,0)} f(x) \mp x^{\max(j-d,0)} f^*(x). \]
Then $g(x)$ is not identically $0$ unless $j = d$ and $f(x) = \pm f^*(x)$, which is excluded because $\alpha^{-1}$ is not a conjugate of $\alpha$.
Clearly, $g(x)$ has degree at most $d + |d - j| \le \max(2d + 1, p - 2)$. On the other hand, $g(x)$ has degree at least $p - 1$ because $\zeta_p$ is a root of $g(x)$. This is a contradiction if $p \ge 2d + 3$.
By Corollary 4.4.14, we deduce
\[ \frac{1}{p-1} \log\bigl|N_{C_p/\mathbb{Q}}(f(\zeta_p))\bigr| \ge \frac{\log(5/2)}{10}. \]
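Finally, by 1.6.15 and Proposition 1.6.6, we have
\[
\begin{aligned}
\lim_{p\to\infty} \frac{1}{p-1} \log\bigl|N_{C_p/\mathbb{Q}}(f(\zeta_p))\bigr|
&= \lim_{p\to\infty} \frac{1}{p-1} \sum_{h=1}^{p-1} \log\bigl|f(e^{2\pi i h/p})\bigr| \\
&= \int_0^1 \log\bigl|f(e^{2\pi i\theta})\bigr|\, d\theta \\
&= \log M(f) = d\,h(\alpha).
\end{aligned}
\]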
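The limit used in the last step can be observed numerically. The sketch below is an illustration only: the test polynomial $x^3 - x - 1$ and the function name are assumptions made here, not part of the proof. It evaluates the average $\frac{1}{p-1}\sum_{h=1}^{p-1}\log|f(e^{2\pi i h/p})|$ for a few primes $p$. For this polynomial the only root outside the unit circle is $\theta_0 \approx 1.3247$, so the averages should approach $\log M(f) = \log\theta_0 \approx 0.2812$.

```python
import cmath
import math


def average_log_on_roots_of_unity(coefficients, p):
    """Return (1/(p-1)) * sum_{h=1}^{p-1} log|f(exp(2*pi*i*h/p))|.

    `coefficients` lists f from the leading coefficient down, so that
    Horner's rule can be applied directly.
    """
    total = 0.0
    for h in range(1, p):
        z = cmath.exp(2j * math.pi * h / p)
        value = 0j
        for c in coefficients:
            value = value * z + c
        total += math.log(abs(value))
    return total / (p - 1)


f = [1, 0, -1, -1]  # x^3 - x - 1, which has no roots on the unit circle
for p in (11, 101, 1009):
    print(p, average_log_on_roots_of_unity(f, p))
```

The convergence is slow (the error is of order $1/p$ here), which is why the proof only needs the limit and not an explicit rate.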

4.5. Remarks on the Northcott property

In this section, we consider only sets of algebraic numbers contained in a fixed algebraic closure $\overline{\mathbb{Q}}$.

4.5.1. We say that a set $A$ of algebraic numbers has the Northcott property (N) if for every positive real number $T$ the set
\[ A(T) = \{\alpha \in A \mid h(\alpha) \le T\} \]
is finite. The Northcott theorem states that the set of all algebraic numbers of degree at most $d$ has property (N) (see Theorem 1.6.8).
We may ask if property (N) holds for other interesting sets. For example, does it hold for the field $\mathbb{Q}^{(d)}$, the composite field of all number fields of degree at most $d$ over $\mathbb{Q}$? Although this question remains open in general, we shall show that this is the case if $d = 2$. More generally, we show that property (N) holds for the maximal abelian subfield of $\mathbb{Q}^{(d)}$.
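The simplest instance of property (N) is $A = \mathbb{Q}$, where $h(a/b) = \log\max(|a|, |b|)$ for a fraction in lowest terms. The following Python sketch, whose function name and integer parameter `bound` are assumptions of this illustration, lists $A(T)$ for $T = \log(\texttt{bound})$ and shows that the set is finite.

```python
import math


def rationals_with_height_at_most(bound):
    """Rationals a/b in lowest terms with max(|a|, |b|) <= bound.

    Each rational is stored as a pair (a, b) with b >= 1 and gcd(a, b) = 1.
    Since h(a/b) = log max(|a|, |b|), this is A(T) for A = Q, T = log(bound).
    """
    found = {(0, 1)}  # the number 0, of height 0
    for b in range(1, bound + 1):
        for a in range(1, bound + 1):
            if math.gcd(a, b) == 1:
                found.add((a, b))
                found.add((-a, b))
    return found


for bound in (1, 2, 3, 10):
    print(bound, len(rationals_with_height_at_most(bound)))
```

The count grows quadratically in `bound`, but it is finite for every fixed height bound, which is all that property (N) asks for.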

4.5.2. Let $K$ be a number field and denote by $K^{(d)}$ the compositum of all extension fields $F/K$ of degree at most $d$ over $K$. Then $K^{(d)}$ is normal over $K$. We also denote by $K^{(d)}_{ab}$ the compositum of all finite abelian extensions $L/K$ with $K \subset L \subset K^{(d)}$. Since the compositum of two finite abelian extensions is again a finite abelian extension, $K^{(d)}_{ab}$ is the union of all finite abelian extensions of $K$ contained in $K^{(d)}$. In particular, $K^{(d)}_{ab}/K$ is a Galois extension, and it is the maximal abelian subfield of $K^{(d)}$. If $d \ge 2$, the fields $K^{(d)}$ and $K^{(d)}_{ab}$ have infinite degree over $K$.
We recall that a Galois extension F/K is called of exponent dividing n ∈ N if the order
of every element of Gal(F/K) divides n .
If $L$ is a finite extension of $K$ with $[L : K] \le d$ and Galois closure $F$, then $\mathrm{Gal}(F/K)$ embeds in the symmetric group on $d$ letters, so $[F : K]$ divides $d!$ and hence $F$ has exponent dividing $d!$. Since the compositum of Galois extensions of exponent dividing $n$ is a Galois extension of exponent dividing $n$, we conclude that $K^{(d)}$ and hence also $K^{(d)}_{ab}$ are Galois extensions of exponent dividing $d!$.

In the following result, we will prove that the local degrees of $K^{(d)}/K$ are bounded. This is a motivation to consider the Northcott property (N) for the field $K^{(d)}$, which is an open problem. However, we will prove in Theorem 4.5.4 that $K^{(d)}_{ab}$ has property (N).
Proposition 4.5.3. Let $v$ be any place of $M_K$ and let $w$ be an extension of $v$ to $K^{(d)}$ and let $K_v$ and $K^{(d)}_w$ be the corresponding completions. Then the local degree $[K^{(d)}_w : K_v]$ is bounded in terms of $d$ and $[K : \mathbb{Q}]$ alone, independently of $v$, $w$.
Proof: It is enough to consider non-archimedean places. Let us fix an algebraic closure $\Omega_v$ of $K_v$ and let $p$ be the residue characteristic of $v$. By results of M. Krasner [163], the number of subextensions of $\Omega_v/K_v$ is precisely known. We only use here that the number of extensions of degree at most $d$ is finite and bounded only in terms of $d$ and $[K_v : \mathbb{Q}_p]$. Therefore, the degree of their compositum is bounded only in terms of $d$ and $[K_v : \mathbb{Q}_p] \le [K : \mathbb{Q}]$. Since $K^{(d)}_w$ may be embedded in such a compositum, the result follows.
Theorem 4.5.4. Property (N) holds for the field $K^{(d)}_{ab}$, for any $d \ge 2$.
Corollary 4.5.5. The field $K^{(2)}$ has property (N).
Proof: Obvious, because $K^{(2)} = K^{(2)}_{ab}$.
Corollary 4.5.6. For any $m \ge 2$, the field $\mathbb{Q}(\sqrt[m]{1}, \sqrt[m]{2}, \sqrt[m]{3}, \dots)$ has property (N).
Proof: Let $K = \mathbb{Q}(\sqrt[m]{1})$. Then each field $K(\sqrt[m]{a})$ is of degree at most $m$ and abelian over $K$. Therefore, their compositum $F = \mathbb{Q}(\sqrt[m]{1}, \sqrt[m]{2}, \sqrt[m]{3}, \dots)$ is abelian over $K$ and a subfield of $K^{(m)}_{ab}$. By Theorem 4.5.4, $K^{(m)}_{ab}$ has the Northcott property and the same holds for its subfield $F$.
Proof of Theorem 4.5.4: In what follows, we abbreviate $D = d!$. We may enlarge the number field $K$, hence we may suppose that $K$ contains the field $\mathbb{Q}(\sqrt[D]{1})$ generated by roots of unity of order $D$. Let us fix a positive real number $T$ and let $\alpha \in K^{(d)}_{ab}$ satisfy $h(\alpha) \le T$. As a subfield of an abelian field, $L = K(\alpha)$ is automatically a finite abelian extension of $K$. By 4.5.2, $L/K$ has exponent dividing $D$.
Let $p$ be a prime, unramified in $K$ and let $v$ be a place of $K$ above $p$. For the following considerations, the reader is assumed to be familiar with the notation and results from ramification theory developed in B.2.18 and B.2.19. Let $e = e_{w/v}$ be the ramification index of a place $w$ of $L$ lying over $v$. Since $\mathrm{Gal}(L/K)$ operates transitively on these places, $e$ does not depend on the choice of $w$. If $p > d$, then our remarks on the exponent show that $p$ does not divide the order of $\mathrm{Gal}(L/K)$. Hence $w$ will be tamely ramified over $v$ and the inertia group of $w$ over $v$ is cyclic of order $e$ (see B.2.18 (d), (e)), proving that $e$ divides $D$.
Now let $\theta = p^{1/e}$ for some choice of the root, and consider the field $L(\theta)$ with a place $u$ lying over $w$. As a compositum of two abelian extensions of exponent dividing $D$, we note that $L(\theta)/K$ is also abelian of exponent dividing $D$. Using the theory of Eisenstein polynomials (see J.-P. Serre [276], Ch.I, §6) and $v$ unramified over $p$, we deduce that the ramification index of $u|_{K(\theta)}$ over $v$ is $e$ and the residue degree is $1$. By Abhyankar's lemma ([215], Cor.4, p.236), this and $u|_{K(\theta)}$ tamely ramified over $v$ imply that $u$ is unramified over $w$. By Proposition 1.2.11, we conclude $e_{u/v} = e$.
Let $I \subset \mathrm{Gal}(L(\theta)/K)$ be the inertia group of $u$ over $v$, a group of order $e$. Since $L(\theta)/K$ is abelian, all the inertia groups above $v$ are equal to $I$. Define $U$ as the fixed field of $I$. Then $U$ is normal over $K$ and $U/K$ is unramified over $v$ (see B.2.18 (d)). By
Galois theory, $[L(\theta) : U] = |I| = e$. Since $u|_U$ is unramified over $p$, we see again by the theory of Eisenstein polynomials that $u|_{U(\theta)}$ has ramification index $e = [U(\theta) : U]$ over $u|_U$, proving in particular that $U(\theta) = L(\theta)$. It follows that $\alpha \in U(\theta)$ and we may write
\[ \alpha = \beta_0 + \beta_1\theta + \cdots + \beta_{e-1}\theta^{e-1}, \qquad \beta_i \in U. \]
The conjugates of $\theta$ over $U$ are $\zeta^r\theta$, where $\zeta$ is a primitive $e$th root of unity and $r = 0, 1, \dots, e - 1$. Therefore, the trace $\mathrm{Tr}_{U(\theta)/U}(\theta^j)$ vanishes if $j$ is not a multiple of $e$ and equals $e$ if $j = 0$. Hence
\[ \beta_j = \frac{1}{e}\,\mathrm{Tr}_{U(\theta)/U}(\alpha\theta^{-j}) = \frac{1}{e\,p^{j/e}} \sum_{r=0}^{e-1} \alpha_r\zeta^{-rj}, \]
where $\alpha_r$ are certain conjugates of $\alpha$. Note that Proposition 1.5.17 yields $h(\alpha_r\zeta^{-rj}) = h(\alpha) \le T$ for $0 \le r \le e - 1$. By a standard inequality about the height of a sum (see Proposition 1.5.15), we find
\[ h(\beta_j p^{j/e}) \le \log e + \sum_r h(\alpha_r) + \log e \le 2\log D + DT. \tag{4.17} \]
As before, let $u$ be any place of $U(\theta) = L(\theta)$ above $v$ and use the same letter to denote the associated discrete valuation normalized by $u(L(\theta)^\times) = \mathbb{Z}$. Since $\beta_j \in U$, we have that $u(\beta_j)$ is divisible by $e$. Suppose now $1 \le j \le e - 1$. Then $u(p^{j/e}) = j$ is not divisible by $e$, whence $u(\beta_j p^{j/e}) \ne 0$. This shows that $|u(\beta_j p^{j/e})| \ge u(p^{1/e}) = 1$.
Let us abbreviate $\gamma = \beta_j p^{j/e}$ and suppose that $\gamma \ne 0$. Letting $\delta_u$ be the local degree $\delta_u := [U(\theta)_u : \mathbb{Q}_p]$, the choice of our normalizations in 1.3.6 leads to
\[ \bigl|\log|\gamma|_u\bigr| \ge -\frac{1}{e}\log|p|_u = \frac{\delta_u \log p}{e\,[U(\theta) : \mathbb{Q}]}. \]
Thus we have
\[ 2h(\gamma) = h(\gamma) + h(\gamma^{-1}) \ge \sum_{u \mid v} \bigl|\log|\gamma|_u\bigr| \ge \frac{1}{e\,[U(\theta) : \mathbb{Q}]} \Bigl(\sum_{u \mid v} \delta_u\Bigr) \log p. \]
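By Corollary 1.3.2, we have $\sum_{u \mid v} \delta_u = [K_v : \mathbb{Q}_p]\,[U(\theta) : K] \ge [U(\theta) : K]$. We conclude that, if $\gamma \ne 0$, then
\[ 2h(\gamma) \ge \frac{1}{e\,[K : \mathbb{Q}]} \log p. \]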
Comparing with (4.17), we derive that either $\beta_j = 0$ or
\[ \log p \le 2e\,[K : \mathbb{Q}]\,(2\log D + DT). \]
Let $S$ be the set of rational primes containing all prime divisors of the discriminant $D_{K/\mathbb{Q}}$ and all primes $p \le \exp(2e\,[K : \mathbb{Q}]\,(2\log D + DT))$. If $v \in M_K$ is lying over a prime $p \notin S$, then B.2.13 shows that $v$ is unramified over $p$. Hence our considerations above yield that we must have $\beta_j = 0$ for $1 \le j \le e - 1$. This means that the algebraic number $\alpha$ lies in $U$, which is an abelian extension of $K$ of exponent dividing $D$ and unramified over $v$. Hence $K(\alpha)$ is unramified above any $p \notin S$. This implies that $K(\alpha)$ is of bounded degree over $K$, as we will show in Example 10.5.11. Here, we give a direct argument based on Hermite's discriminant theorem: Recall that a cyclic extension is a Galois extension with cyclic Galois group. Writing $\mathrm{Gal}(K(\alpha)/K)$ as a direct product of cyclic groups of order dividing $D$, we see that $K(\alpha)$ is the compositum of cyclic extensions of $K$ of degree at
most $D$, each unramified outside $S$. On the other hand, the power to which a prime divides the discriminant of a number field of bounded degree is itself bounded (use Theorem B.2.12 and Corollary 1.3.2). Hence the discriminants of these cyclic extensions of $K$ are bounded. We conclude by Hermite's discriminant theorem in B.2.14 that there are only finitely many such cyclic fields. Hence there are only finitely many distinct fields $K(\alpha)$ and, since $\alpha$ has bounded height, Northcott's theorem, as in 1.6.8, shows that only finitely many possibilities for $\alpha$ can occur.

4.6. Remarks on the Bogomolov property

Again, we always work inside a fixed algebraic closure $\overline{\mathbb{Q}}$ of $\mathbb{Q}$.


4.6.1. We say that a set A of algebraic numbers has the Bogomolov property (B) if there
exists a positive real number T0 such that A(T0 ) consists of all roots of unity in A . We
have already seen in Theorem 4.4.9 that the infinite cyclotomic extension of Q generated
by all roots of unity has property (B). Another example of a field with property (B) is the
field of totally real numbers, see Corollary 4.3.5.
We will give an extension of Bilu’s theorem in 4.3.1 and its Corollary 4.3.5 to a p -adic
setting and deduce from this some new cases of infinite algebraic field extensions with
property (B).
4.6.2. For simplicity, we shall consider here only normal extensions $L$ of $\mathbb{Q}$. Given such an extension, we denote by $S(L)$ the set of rational primes $p$ such that $L$ may be embedded in some finite extension $L_p$ of $\mathbb{Q}_p$. We may also assume that the closure of $L$ in $L_p$ is again $L_p$, in which case, since $L$ is normal, the residual degree $f_p$ and ramification index $e_p$ of the extension $L_p/\mathbb{Q}_p$ do not depend on the given embedding (using Corollary 1.3.5).
Theorem 4.6.3. If $S(L)$ is not empty, then the field $L$ has property (B). More precisely
\[ \liminf_{\alpha \in L} h(\alpha) \ge \frac{1}{2} \sum_{p \in S(L)} \frac{\log p}{e_p\,(p^{f_p} + 1)}. \tag{4.18} \]

Remark 4.6.4. The lim inf is with respect to the directed system of finite subsets of L .
If the sum on the right-hand side of (4.18) diverges, then L has property (N). Thus the
question arises whether there are infinite extensions L where this occurs. We have been
unable to find such examples, and we consider it unlikely that this can occur for an infinite
extension.
By Proposition 4.5.3, $S(\mathbb{Q}^{(d)})$ is the set of all prime numbers for every $d \in \mathbb{N}$.
Example 4.6.5. Let us say that a non-zero algebraic number α is totally p -adic if the
rational prime p splits completely in the field Q(α) , meaning that all local degrees of
places over p are 1 . Then the field L of all totally p -adic algebraic numbers is normal
and p ∈ S(L) . Hence L has the Bogomolov property. This may be considered as the
p -adic analog of results of Schinzel and Smyth for totally real algebraic numbers alluded to
in Remark 4.3.6.
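To make the bound (4.18) concrete, the following Python sketch evaluates its right-hand side for given local data. The function name and the input format, a list of triples $(p, e_p, f_p)$, are assumptions made for this illustration. Since every term of the sum is positive, restricting to a finite subset of $S(L)$ still gives a valid lower bound.

```python
import math


def bogomolov_lower_bound(local_data):
    """Right-hand side of (4.18) for the given primes.

    `local_data` is an iterable of triples (p, e_p, f_p), one for each
    prime p in S(L) that is taken into account.
    """
    return 0.5 * sum(
        math.log(p) / (e * (p**f + 1)) for p, e, f in local_data
    )


# Totally 2-adic numbers: 2 lies in S(L) with e_2 = f_2 = 1.
print(bogomolov_lower_bound([(2, 1, 1)]))

# Hypothetical field in which 2, 3 and 5 all split completely.
print(bogomolov_lower_bound([(2, 1, 1), (3, 1, 1), (5, 1, 1)]))
```

For the totally 2-adic field this gives $\log 2/6 \approx 0.1155$ as a lower bound for $\liminf h(\alpha)$.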
Example 4.6.6. Let $p_1, \dots, p_m$ be distinct rational primes and let $L$ be the field of all totally $p$-adic algebraic numbers for $p = p_1, \dots, p_m$. Then it is clear that $p_i \in S(L)$ for $i = 1, \dots, m$. We can show that in this field $L$, we have
\[ \liminf_{\alpha \in L} h(\alpha) \le \sum_{i=1}^{m} \frac{\log p_i}{p_i - 1}. \]

This shows that the lower bound given by (4.18) is of the correct order of magnitude, insofar
as the contribution of primes with fp = ep = 1 is concerned. We will not prove this result
here, and refer instead to E. Bombieri and U. Zannier [40].
4.6.7. Proof of Theorem 4.6.3: We shall prove a general lower bound for the height of an
algebraic number, of which Theorem 4.6.3 will be an easy corollary.

Let $K$ be a Galois extension of $\mathbb{Q}$, let $\alpha \in K \setminus \{0\}$, and denote by $\alpha_1, \alpha_2, \dots, \alpha_d$ a full set of conjugates over $\mathbb{Q}$, satisfying a minimal equation
\[ a_d x^d + a_{d-1} x^{d-1} + \cdots + a_0 = 0 \]
over $\mathbb{Z}$, of discriminant $D$.
Fix a rational prime $p$ and denote by $v$ an extension to $K$, of residue degree $f_p$ and ramification index $e_p$, of the usual valuation $-\log|\ |_p$ in $\mathbb{Q}_p$. By reordering the conjugates, we may assume
\[ v(\alpha_1) \ge \cdots \ge v(\alpha_r) \ge 0 > v(\alpha_{r+1}) \ge \cdots \ge v(\alpha_d). \]
By Gauss's lemma in 1.6.3 applied to
\[ a_d \prod_{i=1}^{d} (x - \alpha_i) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_0, \]
we find
\[ v(a_d) + \sum_{i=1}^{d} \min(0, v(\alpha_i)) = \min_i v(a_i), \]
from which it follows that
\[ v(a_d) = -\sum_{i=r+1}^{d} v(\alpha_i), \tag{4.19} \]
because the coefficients $a_i$ are integers without common factors.

In order to evaluate $v(D)$ from below, we consider first the contribution to the product coming from terms with $v(\alpha_j) < 0$. We have
\[ v\Bigl(\prod_{j=r+1}^{d} \prod_{i=1}^{j-1} (\alpha_i - \alpha_j)\Bigr) \ge \sum_{j=r+1}^{d} (j - 1)\,v(\alpha_j), \]
yielding the lower bound
\[ v(D) \ge (2d - 2)\,v(a_d) + 2\sum_{i<j\le r} v(\alpha_i - \alpha_j) + 2\sum_{j=r+1}^{d} (j - 1)\,v(\alpha_j). \]
We substitute (4.19) in the right-hand side of this inequality and obtain from the formula for the discriminant in Proposition 1.6.9 the inequality
\[ v(D) \ge 2\sum_{i<j\le r} v(\alpha_i - \alpha_j) - 2\sum_{j=r+1}^{d} (d - j)\,v(\alpha_j). \tag{4.20} \]
Consider now the reductions of $\alpha_i$, $i \le r$, modulo the maximal ideal of the valuation ring of $v$. They are elements of the finite field $\mathbb{F}_q$ with $q = p^{f_p}$. For $x \in \mathbb{F}_q$, let $N_x$ be the number of conjugates $\alpha_i$ with reduction $x$. Suppose $i < j \le r$. If $\alpha_i$ and $\alpha_j$ have the same reduction, we have $v(\alpha_i - \alpha_j) > 0$, hence $v(\alpha_i - \alpha_j) \ge 1/e_p$, and otherwise we have $v(\alpha_i - \alpha_j) \ge 0$; note that the number of pairs $(i, j)$ with $i < j$ and such that $\alpha_i$ and $\alpha_j$ have the same reduction $x$ is $N_x(N_x - 1)/2$. If instead $j > r$, we have $v(\alpha_j) < 0$, hence $v(\alpha_j) \le -1/e_p$.
In view of these remarks, we deduce from (4.20) that
\[ v(D) \ge \frac{1}{e_p} \sum_{x \in \mathbb{F}_q} N_x(N_x - 1) + \frac{1}{e_p}(d - r)(d - r - 1). \tag{4.21} \]
A more elegant formulation of (4.21) is obtained by defining the reduction of an element with negative valuation to be $\infty$. With this convention, we have
\[ N_\infty = d - r, \qquad \sum_{x \in \mathbb{F}_q \cup \{\infty\}} N_x = d. \]
Therefore, introducing the normalized variance
\[ V_p(\alpha; K) := \frac{1}{d^2} \sum_{x \in \mathbb{F}_q \cup \{\infty\}} \Bigl(N_x - \frac{d}{q + 1}\Bigr)^2, \]
we rewrite (4.21) as
\[ v(D) \ge \frac{d^2}{e_p}\Bigl(V_p(\alpha; K) + \frac{1}{q + 1}\Bigr) - \frac{d}{e_p}. \tag{4.22} \]
This estimate is useful only in the range $q < d$, but, since $D$ is a non-zero rational integer, we have $v(D) \ge 0$ in any case. Thus from (4.22) it follows that
\[ \log|D| \ge d^2 \sum_{q<d} \frac{1}{e_p}\Bigl(V_p(\alpha; K) + \frac{1}{q + 1} - \frac{1}{d}\Bigr) \log p, \tag{4.23} \]
where the sum ranges over all primes $p$ with $q = p^{f_p} < d$. On the other hand, by Proposition 1.6.9, we have
\[ \log|D| \le d\log d + (2d - 2)\,d\,h(\alpha). \tag{4.24} \]
Combining (4.23) and (4.24), we finally obtain:
Theorem 4.6.8. Let $K$ be a Galois extension of $\mathbb{Q}$. For a non-archimedean place $v$ of $K$ lying over the rational prime $p$ let $f_p$ and $e_p$ be the residue degree and ramification index of $v$ over $p$ and write $q := p^{f_p}$. Let $\alpha \in K \setminus \{0\}$ be of degree $d$. Then
\[ h(\alpha) \ge -\frac{\log d}{2d - 2} + \frac{d}{2d - 2} \sum_{q<d} \frac{1}{e_p}\Bigl(V_p(\alpha; K) + \frac{1}{q + 1} - \frac{1}{d}\Bigr) \log p. \]
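The passage from (4.21) to (4.22) rests on the identity $\sum_x N_x(N_x - 1) = d^2\bigl(V_p(\alpha; K) + \tfrac{1}{q+1}\bigr) - d$, where $x$ runs over $\mathbb{F}_q \cup \{\infty\}$. The following Python sketch checks it in exact arithmetic for one made-up distribution of the conjugates among the residue classes. The function names and the sample counts are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def normalized_variance(counts, q):
    """V_p(alpha; K) for counts N_x indexed by the q + 1 points of F_q and infinity."""
    assert len(counts) == q + 1
    d = sum(counts)
    mean = Fraction(d, q + 1)
    return sum((Fraction(n) - mean) ** 2 for n in counts) / d**2


def check_rewriting(counts, q):
    """Return e_p times the right-hand sides of (4.21) and of (4.22)."""
    d = sum(counts)
    left = sum(n * (n - 1) for n in counts)
    right = d**2 * (normalized_variance(counts, q) + Fraction(1, q + 1)) - d
    return left, right


# Hypothetical data: q = 3, so the classes are 0, 1, 2 and infinity, and d = 8.
print(check_rewriting([5, 0, 2, 1], q=3))
```

Both entries of the printed pair agree, and the variance term is what rewards an uneven distribution of the conjugates among the residue classes.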
4.6.9. Completion of the proof of Theorem 4.6.3: For α ∈ L , we apply Theorem 4.6.8 with
K = L . Since Vp (α; K) ≥ 0 in any case and we may restrict the sum to finitely many
primes p ∈ S(L) , the proof is completed by noting that, by Northcott’s theorem from 1.6.8,
in any infinite sequence of distinct algebraic numbers of bounded height, the degrees must
go to ∞ . Thus d → ∞ if we want to estimate lim inf h(α) in L . 

Remark 4.6.10. Theorem 4.6.8 implies an equidistribution theorem for elements of an infinite sequence $(\alpha_n)$ of algebraic numbers with height tending to $0$. In particular, for any sequence $(\alpha_n)$ along which $h(\alpha_n) \to 0$, we have that, if $p$ is unramified in the Galois closure of $\alpha_n$, then $q := p^{f_p} \to \infty$ and
\[ \frac{1}{\deg^2(\alpha_n)} \sum_{x \in \mathbb{F}_q \cup \{\infty\}} \Bigl(N_x - \frac{\deg(\alpha_n)}{q + 1}\Bigr)^2 \log p \to 0. \tag{4.25} \]
This may be regarded as an analog of Bilu's theorem from 4.3.1. See also R. Rumely [254] for related results in a $p$-adic and adelic setting.

4.7. Bibliographical notes

Theorem 4.2.2 (a) in the special case of a linear equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 1$ is quite old, the prototype going back to a theorem of H.B. Mann [191].
If n = 2, Lang [169], p.201, attributes it to Y. Ihara, J.-P. Serre, and J. Tate, and
gives Tate’s proof. Explicit results are in J.H. Conway and A.J. Jones [71] and
R. Dvornicich and U. Zannier [94]. The methods in these papers are based on
studying the action of the absolute Galois group of Q on torsion points of high
order. The heuristic argument behind such methods is that, if the order of torsion
points in X is unbounded, then X contains the closure, in the usual complex
topology, of some non-trivial analytic subgroup of Tn . The Zariski closure of this
analytic subgroup then provides a non-trivial linear torus contained in X .
H.P. Schlickewei [262] proved that the number of non-degenerate solutions of $a_1x_1 + \cdots + a_nx_n = 1$ in roots of unity is at most $2^{4(n+1)!}$ for arbitrary complex numbers $a_1, \dots, a_n$.
Shou-Wu Zhang’s results can be found in a series of papers [339], [340], [341].
Theorem 4.2.3 is due to E. Bombieri and U. Zannier [38], building on earlier ideas
of D. Zagier [336] and W.M. Schmidt [272]. Lemma 4.2.8 is inspired by [90]. The
inductive proof in [38] yields extraordinarily small values for $\gamma(d, n)$ expressed by towers of exponentials of length $n$, but W.M. Schmidt [273] later obtained explicit values for $\gamma(d, n)$ and $N(d, n)$ requiring only double exponentials. Much better lower bounds, with a dependence on $d$ in $\gamma(d, n)$ of type $d^{-c(n)}$, have been obtained by S. David and P. Philippon [83], with deeper methods of arithmetic geometry beyond the scope of this book.
Further results going beyond Zagier’s lower bound in Remark 4.2.7 can be found
in C. Doche [91]. The proof of Theorem 4.3.1 is a modification of an argument of
Bilu [24] and the alternative argument is a suggestion of J. Bourgain.
Dobrowolski's theorem has been slightly improved by R. Louboutin [183], who obtains a constant 9/4 instead of the constant 1/8 given here, by a different method. The argument given here can also be refined to give the same constant 9/4, by using the full force of Siegel's lemma in 2.9.4 (including the use of successive minima).
The higher dimensional version of Dobrowolski’s theorem is due to Amoroso and
David [9] (see also [10] for a correction and further results).
The presentation of the Amoroso–Dvornicich theorem and its application to
Smyth’s theorem follows closely [11]. A relative version of this result has been
obtained by Amoroso and Zannier in [12].
The remarks about the Northcott property and the Bogomolov property can be
found in a paper of Bombieri and Zannier [40].
5 THE UNIT EQUATION

5.1. Introduction

Let $K$ be a number field. A classical and important problem is that of determining the units $u$ of $K$ such that $1 - u$ is also a unit. More generally, let $\Gamma$ be a finitely generated subgroup of $K^\times \times K^\times$. The unit equation in $\Gamma$ is the equation

x+y =1

to be solved with (x, y) ∈ Γ . A basic result, going back to Siegel, Mahler, and
Lang, asserts that this equation has only finitely many solutions. In Section 5.2,
we shall give a complete proof of this result based on the uniform Zhang theorem
in 4.2.3 and obtain a uniform bound for the number of solutions. This is applied
in Section 5.3 to give an upper bound for the number of integer solutions of the
Thue–Mahler equation and of a hyperelliptic equation.
The important problem of finding explicit upper bounds for the height of solutions
of a unit equation requires different methods. In Section 5.4, we give just some
results. We refer to A. Baker’s monograph [14] and J.-P. Serre [277], §8.3, for an
approach using Baker’s theory of linear forms in logarithms.
We may also consider a linear torus G over a field of characteristic 0, a finitely
generated subgroup Γ of G and study the set C ∩ Γ , where C is a geometrically
irreducible algebraic curve in G . Lang proved that, if C ∩ Γ is an infinite set,
then C is a translation of a subtorus of G . Lang conjectured and Liardet proved
that the same conclusion holds if we replace Γ by its division group, that is the
group Γ consisting of all points y ∈ G such that y n ∈ Γ for some n (we use
multiplicative notation in G ). We will give a sketch of an effective version of this
theorem in the special case when the field is a number field, see Theorem 5.4.5.
Similar statements can be made for G a commutative algebraic group with no Ga
components (that is, a semiabelian variety) and replacing C by a subvariety X
of G , but they are far more difficult to prove; indeed, even the simplest case of a
curve in an abelian variety turns out to be equivalent to Mordell’s conjecture. The
latter will be proved in Chapter 11 and for the semiabelian case we refer to the
bibliographical notes in Section 11.11.

5.2. The number of solutions of the unit equation

We have the following nice result of F. Beukers and H.P. Schlickewei [23]:

Theorem 5.2.1. There are absolute computable constants $C_1$, $C_2$ with the following property. Let $\Gamma$ be a subgroup of $\overline{\mathbb{Q}}^\times \times \overline{\mathbb{Q}}^\times$ with $\mathrm{rank}_{\mathbb{Q}}(\Gamma) = r < \infty$, where $\mathrm{rank}_{\mathbb{Q}}(\Gamma)$ is the maximum number of multiplicatively independent elements in $\Gamma$. Then the equation
\[ x + y = 1, \qquad (x, y) \in \Gamma \]
has at most $C_1 \cdot C_2^r$ solutions.
This result improves bounds $C^{r^2}$ and $(Cr)^r$ previously obtained by Schlickewei and Schmidt. Beukers and Schlickewei give the values $C_1 = C_2 = 256$.
5.2.2. It is an interesting problem to determine the maximum number of solutions of the equation $x + y = 1$ with $(x, y)$ in a group $\Gamma$ of rank $r$. In this vein, we may remark the following. Suppose $\Gamma$ is a subgroup of $K^\times \times K^\times$, where $K^\times$ is the multiplicative group of a number field $K$. If we take cosets in $\Gamma/\mathrm{tors}$ of the subgroup of fourth powers, we are led to finding $K$-rational points on curves $ax^4 + by^4 = 1$, which have genus 3.
It has been conjectured by L. Caporaso, J. Harris, and B. Mazur [56] that the number of $K$-rational points on a curve of genus $g \ge 2$ is bounded solely in terms of $K$ and $g$. Since we have $4^r$ cosets, this argument suggests that perhaps $C_2 \le 4$.
5.2.3. In many applications the group $\Gamma$ is the group $(U_{S,K})^2$, where $S$ is a finite set of places, containing all infinite places, of a number field $K$ and $U_{S,K}$ is the group of units of the ring of $S$-integers of $K$. By Dirichlet's unit theorem from 1.5.13, $U_{S,K}$ is finitely generated of rank $|S| - 1$ and it is possible to determine effectively a set of generators of $U_{S,K}$.
We will give below some examples of Γ with a large number of solutions of the
unit equation.
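Before turning to large examples, it may help to see the smallest non-trivial case worked out by machine. The Python sketch below is an illustration under stated assumptions: it takes $K = \mathbb{Q}$ and $S = \{\infty, 2, 3\}$, so that $U_{S,K} = \{\pm 2^a 3^b\}$ has rank 2 and $\Gamma = (U_{S,K})^2$ has rank 4, and it searches only the box $|a|, |b| \le$ `max_exp`. The function name and the search bound are choices made here; a finite search of this kind lists solutions but does not by itself prove that none lie outside the box.

```python
from fractions import Fraction
from itertools import product


def s_unit_solutions(primes=(2, 3), max_exp=4):
    """Solutions of x + y = 1 with x, y of the form +-prod(p^e), |e| <= max_exp."""
    exponents = range(-max_exp, max_exp + 1)
    units = set()
    for exps in product(exponents, repeat=len(primes)):
        u = Fraction(1)
        for p, e in zip(primes, exps):
            u *= Fraction(p) ** e
        units.add(u)
        units.add(-u)
    # x = 1 is skipped automatically because y = 0 is not a unit.
    return sorted((x, 1 - x) for x in units if (1 - x) in units)


solutions = s_unit_solutions()
print(len(solutions))
for x, y in solutions:
    print(x, "+", y, "= 1")
```

The search returns 21 solutions, all coming from the four relations $1 + 1 = 2$, $1 + 2 = 3$, $1 + 3 = 4$ and $1 + 8 = 9$. This is far below the bound $256^{5}$ that Theorem 5.2.1 gives for a group of rank 4.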
Example 5.2.4. The following simple argument yields an example of a subgroup of $\mathbb{Q}^\times \times \mathbb{Q}^\times$ with a large number of solutions.
Let $N \ge 2$ be a positive integer. Let $M$ be the number of positive integers up to $x$ whose prime factors do not exceed $x^{1/N}$. It is clear that
\[ M \ge \frac{\pi(x^{1/N})^N}{N!}. \]
Since $\pi(y) > y/\log y$ for $y \ge 17$ (see B. Rosser and L. Schoenfeld [245], Th.1, Cor.1, p.69) and $N! \le \frac{1}{2}N^N$, we see that $M > 2x/(\log x)^N$ if $x \ge 17^N$. Consider the $M^2$ sums $n' + n''$, where $n', n'' \le x$ are positive integers whose prime factors do not exceed $x^{1/N}$. Since $n' + n'' \le 2x$, one sum must occur at least $M^2/(2x) > 2x/(\log x)^{2N}$
times. In other words, there is an integer $b$ such that the equation $n' + n'' = b$ has at least $2x/(\log x)^{2N}$ solutions.
It follows that, if $\Gamma$ is the subgroup of $\mathbb{Q}^\times \times \mathbb{Q}^\times$ generated on each factor by all primes up to $x^{1/N}$ and by $b$, then the unit equation in $\Gamma$ has at least $2x/(\log x)^{2N}$ solutions, provided $x \ge 17^N$.
This group $\Gamma$ has rank $r$ equal to either $2\pi(x^{1/N})$ or $2\pi(x^{1/N}) + 2$, hence we have $r \sim 2Nx^{1/N}/\log x$ as $x^{1/N}$ tends to $\infty$. If we make the asymptotically optimal choice
\[ N = \left\lfloor \frac{\log x}{2\log\log x} - \frac{\log x}{2(\log\log x)^2} \right\rfloor, \]
we verify that the number of solutions is at least
\[ \frac{x}{(\log x)^{2N}} = e^{(c + o(1))\sqrt{r/\log r}} \]
with $c = \sqrt{2}/e$.
Example 5.2.5. Consider the equation $au + bv = 1$ for non-zero algebraic numbers $a$, $b$ to be solved with $(u, v) \in \Gamma$. This may be reduced to the unit equation by enlarging the range of solutions to the group generated by $\Gamma$ and $(a, b)$. This procedure will be used later.
Here, we are interested directly in the equation $ax^m + by^m = 1$ for varying $m$, corresponding to a group $\Gamma = (x, y)^{\mathbb{Z}}$ of rank $1$. We want to find $a$, $b$, $x$, $y$ such that it has the maximum number of solutions for $m \in \mathbb{Z}$.
We may assume that $m = 0$ is a solution. Suppose that $m = 1$ is also a solution, so the equation becomes $(y - 1)x^m + (1 - x)y^m - (y - x) = 0$. Here we must exclude $x = 1$, $y = 1$, and $x = y$, which correspond to degenerate cases.
If we fix two other solutions, say $m_1$ and $m_2$, we can eliminate $y$ and obtain an equation for $x$. In general, this leads to pairs $(x, y)$ such that we have four solutions, namely $m = 0, 1, m_1, m_2$.
Note however that there are special cases. If $m_1 = 2$, the equation degenerates into $(x - 1)(y - 1)(x - y) = 0$, so $m_1 = 2$ must be excluded. Also, if $m_1 = 3$, the values $m_2 = 4, 5, 6, 7, 9$ must be excluded, because they lead to degenerate cases or a group $\Gamma$ of rank $0$.
However, taking $m_1 = 4$ and $m_2 = 6$ gives the equation $x^6 + x^5 + 2x^4 + 3x^3 + 2x^2 + x + 1 = 0$. For any root $\xi$ of this equation, we see that taking $\eta = -1/(1 + \xi + \xi^3)$, which is another root of the same equation, we have
\[ \frac{\eta - 1}{\eta - \xi}\,\xi^m + \frac{1 - \xi}{\eta - \xi}\,\eta^m = 1 \]
for the six values $m = 0, 1, 4, 6, 13, 52$.
Other examples are obtained by letting the Galois group of the equation (which is of order 6, generated by $\xi \mapsto 1/\xi$ and $\xi \mapsto \eta$) act on $\Gamma = (\xi, \eta)^{\mathbb{Z}}$, and also going to a division group. It is conceivable that 6 is the maximum number of solutions and that any group of rank 1 with six solutions is obtained in this way.
The above problem is closely connected to finding zeros of a linear recurrence: Let $u_{m+1} = Au_m + Bu_{m-1} + Cu_{m-2}$ be a linear recurrence of the third order, which we assume non-degenerate in the sense that the roots $\beta_i$ ($i = 1, 2, 3$) of the associated characteristic equation $x^3 - Ax^2 - Bx - C = 0$ are distinct and non-zero. We may consider the recurrence also in the negative direction by solving for $u_{m-2}$. Then the general solution of the recurrence is given by

\[ u_m = C_1\beta_1^m + C_2\beta_2^m + C_3\beta_3^m \qquad (m \in \mathbb{Z}). \]

Let $a = -C_1/C_3$, $b = -C_2/C_3$, $x = \beta_1/\beta_3$ and $y = \beta_2/\beta_3$. Then solving the equation $ax^m + by^m = 1$ in the group $\Gamma = (x, y)^{\mathbb{Z}}$ of rank 1 is equivalent to finding the zeros of the recurrence $\{u_m \mid m \in \mathbb{Z}\}$.

Example 5.2.6. For a prime $p$, consider the cyclotomic field $C_p = \mathbb{Q}(\sqrt[p]{1})$ and the corresponding unit equation. Here we choose $\Gamma = U \times U$, where $U$ is the group of units in the ring of algebraic integers of $C_p$.
If $u + v = 1$ and $u$, $v$ are not real, then $\bar u + \bar v = 1$ is another solution of the unit equation. By Kronecker's theorem in 1.5.9, $\varepsilon := \bar u/u$ and $\varepsilon' := \bar v/v$ are roots of unity in $C_p$. Solving the system $u + v = 1$, $\varepsilon u + \varepsilon' v = 1$, we get $u = (\varepsilon' - 1)/(\varepsilon' - \varepsilon)$, $v = (1 - \varepsilon)/(\varepsilon' - \varepsilon)$. Conversely, given distinct roots of unity $\varepsilon$, $\varepsilon'$ in $C_p$, not equal to $1$, we obtain a solution $u$, $v$ of the unit equation. Thus the number of non-real solutions of the unit equation in $C_p$ is $(p - 1)(p - 2)$.
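The construction in Example 5.2.6 is easy to reproduce in floating-point arithmetic. The Python sketch below is a numerical illustration, not a proof: the function name is hypothetical, the pairs $(\varepsilon, \varepsilon')$ are taken among the $p$th roots of unity different from $1$ (which is what gives the count $(p - 1)(p - 2)$), and the code does not verify that $u$ and $v$ are units. It checks that $u + v = 1$ and that $\bar u = \varepsilon u$, $\bar v = \varepsilon' v$.

```python
import cmath
import math


def nonreal_unit_equation_solutions(p):
    """Quadruples (eps, eps_prime, u, v) from distinct p-th roots of unity != 1."""
    roots = [cmath.exp(2j * math.pi * k / p) for k in range(1, p)]
    solutions = []
    for i, eps in enumerate(roots):
        for j, eps_prime in enumerate(roots):
            if i == j:
                continue
            u = (eps_prime - 1) / (eps_prime - eps)
            v = (1 - eps) / (eps_prime - eps)
            solutions.append((eps, eps_prime, u, v))
    return solutions


solutions = nonreal_unit_equation_solutions(7)
print(len(solutions))  # (p - 1)(p - 2) pairs for p = 7
worst = max(abs(u + v - 1) for _, _, u, v in solutions)
print("largest deviation of u + v from 1:", worst)
```

Each pair $(\varepsilon, \varepsilon')$ is recovered from $(u, v)$ as $(\bar u/u, \bar v/v)$, so distinct pairs give distinct solutions.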

Example 5.2.7. The number of solutions of the unit equation in the maximal real subfield $K_p$ of $C_p$ is much larger. A computer search using cyclotomic units produced three solutions for $K_5$, 42 solutions for $K_7$, 570 solutions for $K_{11}$, 1830 solutions for $K_{13}$, 11 700 solutions for $K_{17}$, and 28 398 solutions for $K_{19}$.

Example 5.2.8. The following example gives an equation $u + v = 1$ with at least 2532 solutions $u, v \in U$, for a certain group $U$ of rank $5$. Let $K = \mathbb{Q}(\alpha)$ with $\alpha$ the real root $\alpha > 1$ of the Lehmer equation

\[ x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1 = 0. \]

This equation has another real root $1/\alpha$ and eight non-real roots all of absolute value $1$; we shall refer to the map $\alpha \mapsto 1/\alpha$ as real conjugation in $\mathbb{Q}(\alpha)$. The Mahler measure of $\alpha$ is $M(\alpha) = \alpha = 1.17628081825991\ldots$, and it is widely conjectured to be the infimum of the Mahler measure of an algebraic number, not a root of unity: the so-called Lehmer conjecture (see 1.6.15).
The group $U$ of units of $K$ has rank 5: $U = \{\pm 1\} \times \langle \alpha,\ 1 - \alpha,\ 1 + \alpha,\ 1 + \alpha + \alpha^2,\ 1 + \alpha - \alpha^3 \rangle$. Now an extensive computer search for solutions of the corresponding unit equation produced a remarkable total of 2532 solutions.
The following is a plot of the 2532 points $(\log|u|, \log|u'|)$, where $u$ is a real unit and $u'$ is the real conjugate of $u$.
[Figure: scatter plot of the 2532 points $(\log |u|, \log |u'|)$; both axes run from $-15$ to $15$.]

The proof of Theorem 5.2.1 is obtained by means of a Padé approximation method, which originates in the work of Thue, Siegel, and Baker.

Lemma 5.2.9. Let $f(x) \in K[[x]]$ be a formal power series with coefficients in a field $K$. Let $L, M$ be positive integers. Then there are polynomials $P(x) \in K[x]$, $Q(x) \in K[x]$ of degrees at most $L$ and $M$ and with $Q$ not identically 0, such that

\[ P(x) - Q(x) f(x) = x^{L+M+1} R(x) \tag{5.1} \]

for some formal power series $R(x) \in K[[x]]$.


The quotient P (x)/Q(x) is uniquely determined and is called the (L, M )-Padé
approximant of f (x).

Proof: Equation (5.1) is equivalent to solving a system of $L + M + 1$ homogeneous linear equations in $L + M + 2$ unknowns, namely the coefficients of the polynomials $P$ and $Q$. This proves the existence of a non-trivial solution $P(x)$, $Q(x)$, and non-triviality implies that $Q$ is not identically 0: if $Q = 0$, then $P(x) = x^{L+M+1} R(x)$ has degree at most $L$, so $P = 0$ as well. In order to show uniqueness, if $\tilde P(x)$, $\tilde Q(x)$ is another solution, then $\tilde Q(x) P(x) - Q(x) \tilde P(x)$ is a polynomial of degree at most $L + M$ divisible by $x^{L+M+1}$, hence is identically 0. □
We are interested in the special case in which $f(x) = (1 - x)^n$, $n \in \mathbb{N}$. Clearly, we may assume $n \ge L + 1$, otherwise the $(L, M)$-Padé approximant is $(1 - x)^n$.

Theorem 5.2.10. Let $L, M, N$ be non-negative integers and define polynomials

\[ Q_{L,M,N}(x) = \sum_{j=0}^{M} \binom{N+j}{N} \binom{L+M-j}{L} x^j, \]
\[ P_{L,M,N}(x) = (-x)^L \, Q_{N,L,M}\Bigl(1 - \frac{1}{x}\Bigr), \]
\[ R_{L,M,N}(x) = (-1)^L \, Q_{L,N,M}(1 - x) \]

in $\mathbb{Z}[x]$, of respective degree $M$, $L$, and $N$. Then we have the polynomial identity

\[ P_{L,M,N}(x) - (1 - x)^{L+N+1} Q_{L,M,N}(x) = x^{L+M+1} R_{L,M,N}(x), \tag{5.2} \]

hence the rational function $P_{L,M,N}(x)/Q_{L,M,N}(x)$ is the $(L, M)$-Padé approximant of $(1 - x)^{L+N+1}$.
Moreover, the $\ell_1$-norm of the vector of coefficients of $Q_{L,M,N}(x)$ is

\[ \ell_1(Q_{L,M,N}) = Q_{L,M,N}(1) = \binom{L+M+N+1}{M}. \]

Finally, we have the identities, with $'$ denoting the derivative,

\[ (P_{L,M,N}(x))' = -(L + M + N + 1) P_{L-1,M,N}(x), \]
\[ ((1 - x)^{L+N+1} Q_{L,M,N}(x))' = -(L + M + N + 1)(1 - x)^{L+N} Q_{L-1,M,N}(x), \]
\[ (x^{L+M+1} R_{L,M,N}(x))' = -(L + M + N + 1) x^{L+M} R_{L-1,M,N}(x). \]
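The statements of Theorem 5.2.10 lend themselves to a direct machine check for small parameters. The following sketch is an illustration and not part of the original argument: it represents polynomials as lists of integer coefficients (lowest degree first), builds $Q$, $P$, $R$ from the binomial formulas of the theorem, and tests identity (5.2). All helper names are hypothetical.

```python
from math import comb

# Polynomials are lists of integer coefficients, lowest degree first.

def add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def power(a, k):
    out = [1]
    for _ in range(k):
        out = mul(out, a)
    return out

def scale(a, c):
    return [c * ai for ai in a]

def trim(a):
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def deriv(a):
    return trim([i * a[i] for i in range(1, len(a))] or [0])

def Q(L, M, N):
    return [comb(N + j, N) * comb(L + M - j, L) for j in range(M + 1)]

def P(L, M, N):
    # (-x)^L Q_{N,L,M}(1 - 1/x) = (-1)^L sum_j c_j x^(L-j) (x - 1)^j
    total = [0]
    for j, c in enumerate(Q(N, L, M)):
        term = mul([0] * (L - j) + [1], power([-1, 1], j))
        total = add(total, scale(term, c))
    return trim(scale(total, (-1) ** L))

def R(L, M, N):
    # (-1)^L Q_{L,N,M}(1 - x)
    total = [0]
    for j, c in enumerate(Q(L, N, M)):
        total = add(total, scale(power([1, -1], j), c))
    return trim(scale(total, (-1) ** L))

def identity_holds(L, M, N):
    # Checks P - (1 - x)^(L+N+1) Q == x^(L+M+1) R, i.e. identity (5.2).
    lhs = add(P(L, M, N),
              scale(mul(power([1, -1], L + N + 1), Q(L, M, N)), -1))
    rhs = mul([0] * (L + M + 1) + [1], R(L, M, N))
    return trim(lhs) == trim(rhs)

print(all(identity_holds(L, M, N)
          for L in range(4) for M in range(4) for N in range(4)))
```

The same helpers can be used to compare `sum(Q(L, M, N))` with `comb(L + M + N + 1, M)` and `deriv(P(L, M, N))` with `-(L + M + N + 1)` times `P(L - 1, M, N)`.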
Proof: We have the following classical transformation of a hypergeometric integral, due to Kummer (A. Erdélyi, W. Magnus, F. Oberhettinger, and F. Tricomi [100], I, (29), p.106):

\begin{align*}
\int_0^1 t^M (t - 1)^N (t - x)^L \, dt
&= x^{L+M+1} \Bigl( \int_0^1 + \int_1^{1/x} \Bigr) u^M (xu - 1)^N (u - 1)^L \, du \\
&= x^{L+M+1} \int_0^1 u^M (u - 1)^L (xu - 1)^N \, du \\
&\quad + (-1)^N (1 - x)^{L+N+1} \int_0^1 v^N (1 - v)^L (1 - (1 - x)v)^M \, dv,
\end{align*}

where we have performed the changes of variables $t = xu$ in the first equation, and $xu = 1 - (1 - x)v$ in the second equation.
This hypergeometric identity determines explicitly the $(L, M)$-Padé approximant of $(1 - x)^{L+N+1}$. We define polynomials $P$, $Q$, $R$ of precise degrees $L$, $M$, $N$ by means of

\begin{align*}
P(x) &= \int_0^1 t^M (1 - t)^N (t - x)^L \, dt, \\
Q(x) &= \int_0^1 v^N (1 - v)^L (1 - (1 - x)v)^M \, dv, \\
R(x) &= (-1)^L \int_0^1 u^M (1 - u)^L (1 - xu)^N \, du.
\end{align*}

Then our identity is

\[ P(x) - (1 - x)^{L+N+1} Q(x) = x^{L+M+1} R(x), \tag{5.3} \]

showing, by checking degrees, that $P(x)/Q(x)$ is the $(L, M)$-Padé approximant of $(1 - x)^{L+N+1}$.
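By a familiar evaluation of Euler's beta integral, we have

\begin{align*}
Q(x) &= \int_0^1 v^N (1 - v)^L ((1 - v) + xv)^M \, dv \\
&= \sum_{j=0}^{M} \binom{M}{j} \Bigl( \int_0^1 v^{N+j} (1 - v)^{L+M-j} \, dv \Bigr) x^j \\
&= \sum_{j=0}^{M} \binom{M}{j} \frac{(N + j)! \, (L + M - j)!}{(L + M + N + 1)!} \, x^j \tag{5.4} \\
&= D^{-1} \sum_{j=0}^{M} \binom{N+j}{N} \binom{L+M-j}{L} x^j,
\end{align*}

where we have abbreviated

\[ D = \frac{(L + M + N + 1)!}{L! \, M! \, N!}. \tag{5.5} \]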
We define $(P_{L,M,N}, Q_{L,M,N}, R_{L,M,N}) := (DP, DQ, DR)$. Note that by (5.4) and (5.5) the polynomial $Q_{L,M,N}$ has positive integral coefficients. In particular, the $\ell_1$-norm of $Q_{L,M,N}$ is $DQ(1)$, hence

\[ \ell_1(Q_{L,M,N}) = D \int_0^1 v^N (1 - v)^L \, dv = \binom{L+M+N+1}{M}. \]
The uniqueness of Padé approximants now can be used to obtain relations between
two Padé approximants associated to a triple (L, M, N ) and to a permutation.

If in (5.3) we make the change of variable $x \mapsto 1 - x$, we verify that

\[ (P_{L,M,N}(1 - x),\ Q_{L,M,N}(1 - x),\ R_{L,M,N}(1 - x)) = (-1)^L (P_{L,N,M}(x),\ R_{L,N,M}(x),\ Q_{L,N,M}(x)), \]

while making the change of variable $x \mapsto 1/x$, we find

\[ (x^N R_{L,M,N}(1/x),\ x^M Q_{L,M,N}(1/x),\ x^L P_{L,M,N}(1/x)) = ((-1)^{N+L} P_{N,M,L}(x),\ Q_{N,M,L}(x),\ (-1)^{N+L} R_{N,M,L}(x)). \]

By composing these changes of variable and permuting $(L, M, N)$, we infer

\[ P_{L,M,N}(x) = (-x)^L \, Q_{N,L,M}\Bigl(1 - \frac{1}{x}\Bigr), \qquad R_{L,M,N}(x) = (-1)^L \, Q_{L,N,M}(1 - x). \]

We easily see that differentiating (5.2) yields an $(L - 1, M)$-Padé approximant of $(1 - x)^{L+N}$, attached to the triple $(L - 1, M, N)$. Thus the final identities follow by uniqueness of Padé approximants. This completes the proof. □
We need another important property of a triple (P, Q, R) as in Theorem 5.2.10.

Proposition 5.2.11. Let $(P, Q, R)$ be a triple as in Theorem 5.2.10. Then any linear combination

\[ \alpha P(x) - \beta (1 - x)^{L+N+1} Q(x) - \gamma x^{L+M+1} R(x) \]

with complex coefficients is either identically 0, in which case $\alpha = \beta = \gamma$, or is a polynomial with only simple roots outside $\{0, 1, \infty\}$.

Proof: Consider the rational function

\[ \varphi(x) := \frac{x^{L+M+1} R(x)}{P(x)} = 1 - \frac{(1 - x)^{L+N+1} Q(x)}{P(x)} \]

and the associated covering $\varphi : \mathbb{P}^1 \to \mathbb{P}^1$.
By Theorem 5.2.10, $P, Q, R$ have exact degree $L, M, N$ and do not vanish at 0 or 1. Now let $e_x$ be the ramification index of $\varphi$ at the point $x$. Then

\begin{align*}
e_0 &= L + M + 1 && \text{(because } P(0) \ne 0,\ R(0) \ne 0\text{)} \\
e_1 &= L + N + 1 && \text{(because } P(1) \ne 0,\ Q(1) \ne 0\text{)} \\
e_\infty &= M + N + 1 && \text{(because } \deg(P) = L,\ \deg(R) = N\text{)}
\end{align*}

and in any case $e_x \ge 1$ and $\deg(\varphi) \le L + M + N + 1$. Therefore, Hurwitz's theorem B.4.6 yields

\begin{align*}
-2 &= \deg(\varphi) \cdot (-2) + (e_0 - 1) + (e_1 - 1) + (e_\infty - 1) + \sum_{x \notin \{0,1,\infty\}} (e_x - 1) \\
&= -2 + 2(L + M + N + 1 - \deg(\varphi)) + \sum_{x \notin \{0,1,\infty\}} (e_x - 1) \\
&\ge -2.
\end{align*}

Equality must hold, hence $\deg(\varphi) = L + M + N + 1$ (thus $P, Q, R$ are pairwise coprime) and $\varphi$ is unramified outside $0, 1, \infty$.
Let $\lambda \in \mathbb{C} \setminus \{0, 1\}$ be such that

\[ \alpha P(x) - \beta (1 - x)^{L+N+1} Q(x) - \gamma x^{L+M+1} R(x) = (x - \lambda)^2 S(x) \]

for some polynomial $S(x)$. By identity (5.2), we get

\[ (\alpha - \beta) P(x) - (\gamma - \beta) x^{L+M+1} R(x) = (x - \lambda)^2 S(x). \]

Note that $P(\lambda) \ne 0$, because $P(x)$ and $R(x)$ have no common zeros. Dividing by $P(x)$, we find

\[ -(\gamma - \beta) \varphi(x) = (x - \lambda)^2 \frac{S(x)}{P(x)} - (\alpha - \beta). \]

Since $\varphi$ is unramified at $\lambda$ and $P(\lambda) \ne 0$, it follows that $\gamma = \beta$. Hence $\lambda$ is a multiple root of $(\alpha - \beta) P(x)$. Since $P(\lambda) \ne 0$, we conclude that $\alpha = \beta$. □
Proof of Theorem 5.2.1. Preliminary lemmas: We need two lemmas. In what
follows, a, b, x1 , x2 , . . . will denote algebraic numbers.

Lemma 5.2.12. Suppose that $a x_1 + b x_2 = c$, $a' x_1 + b' x_2 = c'$ and $ab' \ne a'b$. Then we have

\[ h(x) \le \log 2 + h((a : b : c)) + h((a' : b' : c')). \]

Proof: By Cramer's rule, we have

\[ x_1 = \frac{cb' - c'b}{ab' - a'b}, \qquad x_2 = \frac{ac' - a'c}{ab' - a'b}. \]

Hence

\begin{align*}
h(x) &= h((ab' - a'b : cb' - c'b : ac' - a'c)) \\
&= \sum_v \log \max(|ab' - a'b|_v, |cb' - c'b|_v, |ac' - a'c|_v) \\
&\le \log 2 + \sum_v \log \max(|a|_v, |b|_v, |c|_v) + \sum_v \log \max(|a'|_v, |b'|_v, |c'|_v) \\
&= \log 2 + h((a : b : c)) + h((a' : b' : c')). \qquad □
\end{align*}

Corollary 5.2.13. Suppose that $x_1 + x_2 = 1$ and $y_1 + y_2 = 1$, with non-zero $x_1$, $x_2$, $y_1$, $y_2$, and $x \ne y$. Then
\[ h(x) \le \log 2 + h(y x^{-1}). \]
Proof: Use Lemma 5.2.12 with $a = 1$, $b = 1$ and $a' = y_1/x_1$, $b' = y_2/x_2$. □
The next lemma is the key to the proof.
Lemma 5.2.14. Suppose that $x_1 + x_2 = 1$, $y_1 + y_2 = 1$, with non-zero $x_1$, $x_2$, $y_1$, $y_2$. Let $n \ge 2$ be an integer. Then
\[ h(x) \le \kappa + \frac{1}{n - 1} h(y x^{-2n}) \]
for an absolute constant $\kappa$.
Remark 5.2.15. We may take $\kappa = \log 42$.
Proof: Let $L, M, N \ge 1$ be positive integers. By Theorem 5.2.10, we have
\[ a \, x_1^{L+M} + b \, x_2^{L+N} = c \tag{5.6} \]
with
\[ a := x_1 R_{L,M,N}(x_1), \quad b := x_2 Q_{L,M,N}(x_1), \quad c := P_{L,M,N}(x_1). \]
Another, and obvious, relation is
\[ a' x_1^{L+M} + b' x_2^{L+N} = 1 \tag{5.7} \]
with
\[ a' := y_1 x_1^{-L-M}, \quad b' := y_2 x_2^{-L-N}. \]
Now we define a condition $C(L, M, N)$ by
\[ ab' \ne a'b. \tag{$C(L, M, N)$} \]
We claim that either $C(L, M, N)$ or $C(L - 1, M, N)$ holds. Suppose $C(L, M, N)$ does not hold. This is the same as saying that
\[ f_{L,M,N}(x) := x^{L+M+1} R_{L,M,N}(x) - \frac{y_1}{y_2} (1 - x)^{L+N+1} Q_{L,M,N}(x) \]
vanishes at $x = x_1$. By Proposition 5.2.11, $x_1$ must be a simple zero of $f_{L,M,N}(x)$, that is $f'_{L,M,N}(x_1) \ne 0$. Differentiating and using the last identities in Theorem 5.2.10, we see that
\[ f_{L-1,M,N}(x_1) = -\frac{1}{L + M + N + 1} f'_{L,M,N}(x_1) \ne 0, \]
proving our claim.
Therefore, either equations (5.6) and (5.7) are linearly independent, or the same equations, but now with parameters $(L - 1, M, N)$ in place of $(L, M, N)$, are linearly independent; this second alternative is the same as saying that
\[ a'' x_1^{L+M} + b'' x_2^{L+N} = c'' \tag{5.8} \]
with
\[ a'' := R_{L-1,M,N}(x_1), \quad b'' := Q_{L-1,M,N}(x_1), \quad c'' := P_{L-1,M,N}(x_1), \]
and equation (5.7) are linearly independent.
Now we specialize $(L, M, N) = (n, n, n)$. Suppose first that equations (5.6) and (5.7) are linearly independent. Then Lemma 5.2.12 shows that
\[ 2n \, h(x) = h((1 : x_1^{2n} : x_2^{2n})) \le \log 2 + h((a : b : c)) + h((a' : b' : 1)). \tag{5.9} \]

Let us write $P, Q, R$ for $P_{L,M,N}$, $Q_{L,M,N}$, $R_{L,M,N}$. In order to estimate $h((a : b : c))$, we note that the formulas for $P, Q, R$ in Theorem 5.2.10, together with the equations $x_1 + x_2 = 1$ and $1 - 1/x_1 = -x_2/x_1$, give

\begin{align*}
\log^+ |Q(x_1)|_v &\le \log^+ \left| \binom{L+M+N+1}{M} \right|_v + M \log^+ |x_1|_v, \\
\log^+ |R(x_1)|_v &\le \log^+ \left| \binom{L+M+N+1}{N} \right|_v + N \log^+ |x_2|_v, \\
\log^+ |P(x_1)|_v &\le \log^+ \left| \binom{L+M+N+1}{L} \right|_v + L \max(\log^+ |x_1|_v, \log^+ |x_2|_v).
\end{align*}

In the special case $(L, M, N) = (n, n, n)$ we consider here, this gives

\begin{align*}
h((a : b : c)) &\le \log \binom{3n+1}{n} + \sum_v \max((n + 1) \log^+ |x_1|_v, (n + 1) \log^+ |x_2|_v) \\
&= \log \binom{3n+1}{n} + (n + 1) h(x).
\end{align*}

By definition
\[ h((a' : b' : 1)) = h(y x^{-2n}), \]
and, in view of (5.9), we deduce
\[ 2n \, h(x) \le \log 2 + \log \binom{3n+1}{n} + (n + 1) h(x) + h(y x^{-2n}), \]
hence
\[ h(x) \le \frac{1}{n - 1} \log \Bigl( 2 \binom{3n+1}{n} \Bigr) + \frac{1}{n - 1} h(y x^{-2n}). \tag{5.10} \]
If instead equations (5.6) and (5.7) are linearly dependent, equations (5.7) and (5.8) must be linearly independent. The same calculation as before now shows that
\[ h(x) \le \frac{1}{n} \log \Bigl( 2 \binom{3n}{n} \Bigr) + \frac{1}{n} h(y x^{-2n}), \]
which is better than (5.10). Thus (5.10) holds in any case.
The maximum of $\frac{1}{n-1} \log \bigl( 2 \binom{3n+1}{n} \bigr)$ occurs for $n = 2$ and equals $\log 42$. This proves the lemma. □
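The value $\kappa = \log 42$ comes from maximizing the constant term $\frac{1}{n-1}\log\bigl(2\binom{3n+1}{n}\bigr)$ of (5.10) over integers $n \ge 2$. A short numerical sketch (an illustration only, with the hypothetical helper name `kappa_term`, and scanning only a finite range of $n$) shows the maximum at $n = 2$:

```python
from math import comb, log

def kappa_term(n):
    """Constant term of (5.10): log(2 * binom(3n+1, n)) / (n - 1)."""
    return log(2 * comb(3 * n + 1, n)) / (n - 1)

values = {n: kappa_term(n) for n in range(2, 200)}
best_n = max(values, key=values.get)
print(best_n, values[best_n], log(42))
```

For $n = 2$ the term is $\log(2 \cdot \binom{7}{2}) = \log 42$, and the computed values decrease from there over the scanned range.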
5.2.16. Continuation of the proof of Theorem 5.2.1: Let $\Gamma$ be a finitely generated subgroup of $\overline{\mathbb{Q}}^\times \times \overline{\mathbb{Q}}^\times$, of rank $r$. Let $\Gamma_{\mathrm{tors}}$ be its subgroup of torsion elements. Then $\Gamma/\Gamma_{\mathrm{tors}}$ is a free abelian group of rank $r$, which we may identify with $\mathbb{Z}^r$.
Let $Z$ be the set of solutions of $x + y = 1$ in $\Gamma$ and let $Z_0$ be its image in $\mathbb{Z}^r$ under the projection $\Gamma \to \mathbb{Z}^r$. We claim that
\[ |Z| \le 2 |Z_0|. \tag{5.11} \]
Indeed, elements of $Z$ with the same image in $Z_0$ can be written as $(a\varepsilon, b\zeta)$ with $a$, $b$ fixed and $\varepsilon$ and $\zeta$ roots of unity. Consider the triangle in the complex plane with vertices at $0$, $a\varepsilon$ and $1$. Then the equation $a\varepsilon + b\zeta = 1$ shows that its sides have length $1$, $|a|$, $|b|$.
There are at most two such triangles (intersect a circle of radius $|a|$ and centre 0 and a circle of radius $|b|$ and centre 1), showing that the projection of $Z$ onto $Z_0$ is at most two-to-one.

We define a norm $\|\cdot\|$ on $\mathbb{R}^r$ as follows. Let $x = (x_1, x_2) \in \Gamma$ be any representative of $u \in \mathbb{Z}^r \cong \Gamma/\Gamma_{\mathrm{tors}}$, and set
\[ \|u\| = \hat h(x) = h(x_1) + h(x_2); \]
this is well defined because changing $x_1$ or $x_2$ by a root of unity does not change their height. Next, we extend this to $\mathbb{Q}^r$ by setting $\|\lambda u\| = |\lambda| \cdot \|u\|$, which is consistent with the definition of $\|\cdot\|$ because $h(x^\lambda) = |\lambda| \cdot h(x)$ for $\lambda \in \mathbb{Q}$. Finally, we extend this to $\mathbb{R}^r$ by continuity.
The triangle inequality $\|u + v\| \le \|u\| + \|v\|$ is clear from the properties of the standard height, and we want to show that it is a norm. This requires a little proof.
We know that $\|\cdot\|$ is positive on $\mathbb{Q}^r \setminus 0$, hence not negative on $\mathbb{R}^r$, but it is not yet clear that it remains positive on $\mathbb{R}^r \setminus \mathbb{Q}^r$. Indeed, this is not a general fact as can be seen from the example $\|(u, v)\| = |u - \alpha v|$ with a real irrational $\alpha$.
The argument that $\|\cdot\|$ is a norm is due to Cassels and runs as follows.
Consider the subspace $V_0 := \{u \in \mathbb{R}^r \mid \|u\| = 0\}$ of $V := \mathbb{R}^r$. Then $\|\cdot\|$ induces a norm on $V/V_0$ by $\|u + V_0\| = \|u\|$. By orthogonal projection with respect to the euclidean structure on $\mathbb{R}^r$, we identify $W := V/V_0$ with $V_0^\perp$. It is clear that $B_t := \{u \in \mathbb{R}^r \mid \|u\| \le t\}$ is a closed, convex, symmetric set. If $\|\cdot\|$ were not a norm on $\mathbb{R}^r$, the set $B_t$ would be a cylinder over $\{w \in W \mid \|w\| \le t\}$.
Since all norms on $W$ are equivalent, Minkowski's first theorem in C.2.19 would give infinitely many lattice points in $B_t$, and hence infinitely many elements $x = (x_1, x_2) \in \Gamma$ with $\hat h(x) \le t$. Since $\Gamma$ is finitely generated, this would contradict Northcott's theorem in 1.6.8. Hence $\|\cdot\|$ is a norm with associated ball $B_t$ of radius $t$.
It is clear that

\[ \tfrac{1}{2} \hat h(x) \le \max(h(x_1), h(x_2)) \le h(x) \le \hat h(x). \]

In view of this inequality, Lemma 5.2.14 shows that for $u, v \in Z_0$ and any integer $n \ge 2$ we have

\[ \|u\| \le 2\kappa + \frac{2}{n - 1} \|v - 2nu\|. \tag{5.12} \]

In the same way, Corollary 5.2.13 shows that

\[ \|u\| \le \log 4 + 2 \|v - u\| \quad \text{if } u \ne v. \tag{5.13} \]

The idea behind the last two displayed inequalities is the following.
For a vector $u \in \mathbb{R}^r$ let $\nu(u) = u/\|u\|$ be the associated unit vector with respect to the norm $\|\cdot\|$. Suppose that the vectors $\nu(u)$ and $\nu(v)$ are nearly the same, so that $u$ and $v$ point about in the same direction. If $\|v\|$ is much larger than $\|u\|$, then we can find an integer $n$ such that $\|v - 2nu\|$ is small compared with $n \|u\|$, and now (5.12) can be used to get an upper bound for $\|u\|$.
The details are quite simple. Let $\varepsilon > 0$ be a small positive constant and let $u, v \in Z_0$ be two points with

\[ \|\nu(v) - \nu(u)\| \le \varepsilon, \qquad \|v\| \ge \|u\|. \]


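Let $n = \lfloor \|v\| / (2\|u\|) \rfloor$, so that $0 \le \|v\| - 2n\|u\| < 2\|u\|$. If $n \ge 2$, then (5.12) gives
\begin{align*}
\|u\| &\le 2\kappa + \frac{2}{n - 1} \|v - 2nu\| \\
&= 2\kappa + \frac{2}{n - 1} \bigl\| \|v\| \cdot \nu(v) - 2n \|u\| \cdot \nu(u) \bigr\| \\
&\le 2\kappa + \frac{2}{n - 1} (\|v\| - 2n\|u\|) + \frac{4n}{n - 1} \|u\| \cdot \|\nu(v) - \nu(u)\| \\
&\le 2\kappa + \frac{4 + 4n\varepsilon}{n - 1} \|u\|.
\end{align*}
We take $\varepsilon = \frac{1}{10}$ and note that $(4 + 4n\varepsilon)/(n - 1) \le \frac{1}{2}$ if $n \ge 45$. In this case the above chain of inequalities yields $\|u\| \le 2\kappa + \frac{1}{2}\|u\|$ and $\|u\| \le 4\kappa$. If instead $1 \le n < 45$, we note that $\|v\| - 2n\|u\| < 2\|u\|$, hence $\|v\| \le 90\|u\|$.
We have shown:

Lemma 5.2.17. Let $u, v \in Z_0$ and suppose that $\|\nu(v) - \nu(u)\| \le \frac{1}{10}$ and $4\kappa < \|u\| \le \|v\|$. Then
\[ \|u\| \le \|v\| \le 90 \|u\|. \]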

5.2.18. Conclusion of the proof of Theorem 5.2.1: Let us call a solution $x \in \Gamma$ of $x_1 + x_2 = 1$ large if $\hat h(x) = h(x_1) + h(x_2) \ge \max(4\kappa, 5)$ and small otherwise.
The counting of large solutions is done in two steps, first by providing an upper bound for the number of points $u \in Z_0$ such that $H \le \|u\| \le AH$ and lying in a fixed cone
\[ C(\varepsilon; a) := \{w \in \mathbb{R}^r \mid \|\nu(w) - a\| \le \varepsilon\}, \]
and then by covering all of $\mathbb{R}^r$ by means of finitely many cones $C(\varepsilon; a_i)$.
For the first step we use (5.13). Suppose we have two distinct points $u, v \in Z_0 \cap C(\varepsilon; a)$ with

\[ \max(4\kappa, 5) < \|u\| \le \|v\| \le (1 + \delta) \|u\|. \]

Then (5.13) gives

\begin{align*}
\|u\| &\le \log 4 + 2 \|v - u\| \\
&= \log 4 + 2 \bigl\| \|v\| \cdot \nu(v) - \|u\| \cdot \nu(u) \bigr\| \\
&\le \log 4 + 2 (\|v\| - \|u\|) + 2 \|u\| \cdot \|\nu(v) - \nu(u)\| \\
&\le \log 4 + (2\delta + 4\varepsilon) \|u\|.
\end{align*}

If we take for example $\delta = \frac{1}{4}$ and $\varepsilon = \frac{1}{20}$, we obtain $\|u\| \le (10/3) \log 4 < 5$, contradicting the assumption $\|u\| > \max(4\kappa, 5)$. Thus we have a gap principle
\[ \|v\| > \tfrac{5}{4} \|u\|. \]

Suppose we have $m$ large solutions in a cone $C(\frac{1}{20}; a)$, say $u_i \in Z_0 \cap C(\frac{1}{20}; a)$ with $\max(4\kappa, 5) < \|u_1\| \le \|u_2\| \le \cdots$. Then $\|\nu(u_m) - \nu(u_1)\| \le \frac{1}{10}$, and hence, by Lemma 5.2.17, we have $\|u_m\| \le 90 \|u_1\|$. On the other hand, the preceding gap principle shows that $\|u_m\| \ge (\frac{5}{4})^{m-1} \|u_1\|$. Hence $m - 1 \le \log 90 / \log(5/4) < 21$ and, by (5.11), we cannot have more than 42 large solutions with image in any given cone $C(\frac{1}{20}; a)$.
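The numerical constants in Lemma 5.2.17 and in the gap principle are elementary to confirm. The following sketch is an illustration only; the variable names are hypothetical. It uses exact rationals where equality matters (the threshold $n \ge 45$) and floating point for the logarithms.

```python
from fractions import Fraction
from math import floor, log

# Lemma 5.2.17: with eps = 1/10, (4 + 4*n*eps)/(n - 1) <= 1/2 first holds at n = 45.
eps = Fraction(1, 10)
threshold = min(n for n in range(2, 1000)
                if (4 + 4 * n * eps) / (n - 1) <= Fraction(1, 2))

# Gap principle: ||u|| <= log 4 + (2*delta + 4*eps) ||u|| with delta = 1/4, eps = 1/20
# gives ||u|| <= log 4 / (1 - 2*delta - 4*eps) = (10/3) log 4.
delta, eps_cone = Fraction(1, 4), Fraction(1, 20)
gap_bound = log(4) / float(1 - 2 * delta - 4 * eps_cone)

# Cone count: (5/4)^(m - 1) <= 90 forces m - 1 <= log 90 / log(5/4).
max_m = floor(log(90) / log(5 / 4)) + 1

print(threshold, gap_bound, max_m, 2 * max_m)
```

Here `2 * max_m` accounts for the factor 2 in (5.11), which is how the count of 42 large solutions per cone arises.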

We need one more lemma:

Lemma 5.2.19. Let $\|\cdot\|$ be a norm on $\mathbb{R}^r$. Let $E$ be a subset of the ball $B_t := \{x \in \mathbb{R}^r \mid \|x\| \le t\}$ of radius $t$. Then for any $\varepsilon > 0$, we can cover $E$ with $(1 + 2t/\varepsilon)^r$ translates, all centred on the set $E$, of the ball $B_\varepsilon$.
Proof: Indeed, consider a maximal set $\mathcal{E}$ of non-overlapping balls of radius $\varepsilon/2$ with centres on $E$. Since they are contained in a ball of radius $t + \varepsilon/2$ and they are disjoint, their number does not exceed the ratio of the volumes of $B_{t+\varepsilon/2}$ and $B_{\varepsilon/2}$, namely $(1 + 2t/\varepsilon)^r$.
On the other hand, doubling the radius we obtain a covering of $E$. Otherwise, if $x^*$ is a point of $E$ not covered in this way, the ball $\|x - x^*\| \le \varepsilon/2$, which is centred on $E$, would be disjoint from the balls of $\mathcal{E}$ and $\mathcal{E}$ would not be maximal. □
In our case, taking $\varepsilon = \frac{1}{20}$, we infer from Lemma 5.2.19 that we can cover all of $\mathbb{R}^r$ with not more than $41^r$ cones $C(\frac{1}{20}; a)$.
We have already shown that any such cone determines at most 42 large solutions and we conclude that the total number of large solutions does not exceed $42 \cdot 41^r$.
It remains to give a bound for the number of small solutions, and this is a consequence of Theorem 4.2.3. We apply this theorem with $d = 1$ and $n = 2$, and deduce that there are two constants $\gamma = \gamma(1, 2) > 0$ and $N = N(1, 2) < \infty$ such that, for any $a, b \in \overline{\mathbb{Q}}^\times$, we have at most $N$ solutions $x = (x_1, x_2) \in \overline{\mathbb{Q}}^\times \times \overline{\mathbb{Q}}^\times$ of
\[ a x_1 + b x_2 = 1, \qquad \hat h(x) \le \gamma. \]

Let $\overline{\Gamma}$ be the division group of $\Gamma$, namely

\[ \overline{\Gamma} := \bigl\{ \alpha \in \overline{\mathbb{Q}}^\times \times \overline{\mathbb{Q}}^\times \mid \alpha^n \in \Gamma \text{ for some } n \ge 1 \bigr\}. \]

By Lemma 5.2.19 applied to $E = B_t \cap \mathbb{Q}^r$, there are at most $(1 + 2t/\gamma)^r$ translates of the ball $B_\gamma$, all centred at rational points and covering $B_t$.
This means that we can find points $(a_i, b_i) \in \overline{\Gamma}$, numbering not more than $(1 + 2t/\gamma)^r$, such that every $x = (x_1, x_2) \in \Gamma$ with $x_1 + x_2 = 1$ and $\hat h(x) \le t$ can be written, for some $i$, as $x_1 = a_i \xi$, $x_2 = b_i \eta$ with $a_i \xi + b_i \eta = 1$ and $h(\xi) + h(\eta) \le \gamma$. Since there are at most $N$ such $(\xi, \eta)$, we deduce that the number of solutions $x$ in question does not exceed $N \cdot (1 + 2t/\gamma)^r$.
We can take $t = \max(4\kappa, 5)$. Hence the number of small solutions does not exceed $N \cdot (1 + 2\max(4\kappa, 5)/\gamma)^r$. Thus the total number of solutions does not exceed
\[ 42 \cdot 41^r + N \cdot (1 + 2\max(4\kappa, 5)/\gamma)^r. \]
This completes the proof of Theorem 5.2.1 if Γ is finitely generated. In the general
case, Γ is the union of its finitely generated subgroups (of rank at most r ). Since
the above upper bound depends only on the rank, this proves the claim. 

5.3. Applications

The importance of the generalized unit equation stems from the fact that many
diophantine problems can be reduced to it. In this section, we review some of the
most interesting applications.
5.3.1. Let $K$ be a number field, $S$ a finite set of places of $K$ containing all places at infinity, let $O_{S,K}$ denote the ring of $S$-integers of $K$, and let $U_{S,K}$ be the group of units of $O_{S,K}$. Let also $F(x, y) \in O_{S,K}[x, y]$ be homogeneous of degree $r \ge 3$ and assume that $F$ has at least three non-proportional linear factors in a factorization over $\overline{K}$.
The Thue–Mahler equation is the equation $F(x, y) \in U_{S,K}$, to be solved with $x, y \in O_{S,K}$.
In 1909, using a new method based on diophantine approximation, Thue proved
that the equation F (x, y) = m , with F (x, y) ∈ Z[x, y] and with three non-
proportional linear factors over C has only finitely many solutions in integers (see
6.2.1 for the argument). Through the work of Siegel and Mahler, this was extended
to equations in number fields to be solved in S -integers and to the more general
Thue–Mahler equation, with the proviso of considering equivalent two solutions
differing only by multiplication by an S -unit. We have:

Theorem 5.3.2. The number of equivalence classes of solutions $(x, y) \in O_{S,K}^2$ of the Thue–Mahler equation $F(x, y) \in U_{S,K}$ does not exceed $C_1 \cdot C_2^{12 \binom{r}{3} |S|}$, where $C_1$ and $C_2$ are the constants introduced in Theorem 5.2.1.
Remark 5.3.3. With a different method J.-H. Evertse [109] has obtained the improved bound $(5 \cdot 10^6 r)^{|S|}$, which shows a much better dependence on $r$. The example $x^r + a(x - y)(2x - y) \cdots (rx - y) = 1$, with solutions $(1, j)$ with $j = 1, \ldots, r$, shows that already with $|S| = 1$ we may have $r$ solutions of a Thue equation $F(x, y) = 1$ of degree $r$.
On the other hand, L. Caporaso, J. Harris, and B. Mazur [56], and later D.
Abramovich [2], have obtained some evidence for the conjecture that the num-
ber of K-rational points on a curve C of genus at least 2, defined over a number
field K , admits a bound depending only on the genus of the curve and the degree
of the number field K ; in particular, their conjecture implies that there should be
a bound for the number of solutions of a Thue equation depending only on r and
the degree of K .
Proof: The following argument, which reduces a Thue–Mahler equation to a unit
equation, goes back to Siegel.
We may assume that $F(x, y) = a_0 x^r + a_1 x^{r-1} y + \cdots + a_r y^r$ has degree $r$ in $x$ and leading coefficient $a_0 = 1$. This step is not really essential to the proof, but makes things a little simpler. To verify this assertion, choose any solution $(x_0, y_0)$ of the Thue–Mahler equation $F(x, y) \in U_{S,K}$. Let $A$ be the matrix
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]
with
\begin{align*}
a &= F(x_0, y_0)^{-1} (a_0 x_0^{r-1} + a_1 x_0^{r-2} y_0 + \cdots + a_{r-1} y_0^{r-1}), \\
b &= F(x_0, y_0)^{-1} a_r y_0^{r-1}, \\
c &= -y_0, \\
d &= x_0.
\end{align*}
Then $\det(A) = 1$ and $A$ has entries in $O_{S,K}$. Therefore, the Thue–Mahler equation $F(x, y) \in U_{S,K}$ is equivalent to the other Thue–Mahler equation
\[ G(x, y) := F(x_0, y_0)^{-1} F(dx - by, -cx + ay) \in U_{S,K} \]
with leading coefficient $G(1, 0) = 1$.
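The normalization step can be tried on a concrete form. The sketch below is an illustration under simplifying assumptions: it works over $\mathbb{Q}$ with exact rationals, takes the sample form $F = 2x^3 + y^3$ with the solution $(x_0, y_0) = (1, -1)$ of $F = 1$, and uses hypothetical helper names. It checks that $\det(A) = 1$ and $G(1, 0) = 1$.

```python
from fractions import Fraction

def eval_form(coeffs, x, y):
    """Evaluate F(x, y) = sum_i coeffs[i] * x^(r-i) * y^i."""
    r = len(coeffs) - 1
    return sum(c * x ** (r - i) * y ** i for i, c in enumerate(coeffs))

def normalizing_matrix(coeffs, x0, y0):
    """Entries (a, b, c, d) of the matrix A built from a solution (x0, y0)."""
    x0, y0 = Fraction(x0), Fraction(y0)
    r = len(coeffs) - 1
    F0 = eval_form(coeffs, x0, y0)
    a = sum(coeffs[i] * x0 ** (r - 1 - i) * y0 ** i for i in range(r)) / F0
    b = coeffs[r] * y0 ** (r - 1) / F0
    return a, b, -y0, x0

def G(coeffs, x0, y0, x, y):
    """G(x, y) = F(x0, y0)^(-1) * F(d x - b y, -c x + a y)."""
    a, b, c, d = normalizing_matrix(coeffs, x0, y0)
    F0 = eval_form(coeffs, Fraction(x0), Fraction(y0))
    return eval_form(coeffs, d * x - b * y, -c * x + a * y) / F0

coeffs = [2, 0, 0, 1]          # F(x, y) = 2x^3 + y^3 (sample form)
a, b, c, d = normalizing_matrix(coeffs, 1, -1)
print(a * d - b * c, G(coeffs, 1, -1, 1, 0))
```

For this sample the matrix is $a = 2$, $b = 1$, $c = 1$, $d = 1$, so that $G(x, y) = F(x - y, -x + 2y)$.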
Let $\alpha_1, \alpha_2, \alpha_3$ be three distinct roots of $F(x, 1) = 0$ over $\overline{K}$, and define:
(a) $K' = K(\alpha_1, \alpha_2, \alpha_3)$;
(b) $S'$ the set of places of $K'$ over $S$, $O_{S',K'}$ the ring of $S'$-integers of $K'$, and $U_{S',K'}$ the group of units of $O_{S',K'}$.
The field $K'$ has degree at most $r(r - 1)(r - 2)$ over $K$ and $|S'| \le r(r - 1)(r - 2)|S|$ (use Corollary 1.3.2). The group $U_{S',K'}$ has rank $|S'| - 1$ by Dirichlet's unit theorem (see Theorem 1.5.13). Now we define $\Gamma$ to be the group of pairs $(u, v)$ with $u, v \in U_{S',K'}$; it is clear that $\Gamma$ has rank $s$ not exceeding $2(|S'| - 1) \le 2r(r - 1)(r - 2)|S| - 2$.
Since $F$ has leading coefficient 1, all roots $\alpha_1, \ldots, \alpha_r$ of the equation $F(x, 1) = 0$ are integral over $O_{S,K}$. For a solution $(x, y) \in O_{S,K}^2$ of the Thue–Mahler equation, each factor $x - \alpha_i y$ is integral over $O_{S,K}$ and the product is an $S$-unit. Hence the factors $x - \alpha_i y$ are $S'$-units for $i = 1, 2, 3$.
On the other hand, the three linear forms $x - \alpha_i y$, $i = 1, 2, 3$, must be linearly dependent, and in fact a linear relation is
\[ \frac{\alpha_2 - \alpha_3}{\alpha_2 - \alpha_1} \cdot \frac{x - \alpha_1 y}{x - \alpha_3 y} + \frac{\alpha_3 - \alpha_1}{\alpha_2 - \alpha_1} \cdot \frac{x - \alpha_2 y}{x - \alpha_3 y} = 1. \]
This is an equation of type $Au + Bv = 1$ with $(u, v) \in \Gamma$. We extend $\Gamma$ to a new group $\Gamma'$ of rank at most $s + 1$ by adding a new generator $(A, B)$ and apply Theorem 5.2.1. Then the number of solutions $(u, v)$ does not exceed $C_1 \cdot C_2^{s+1}$. Conversely, $(u, v)$ determines $(x, y)$ up to multiplication by a scalar, and it follows that $(u, v)$ determines at most one equivalence class of solutions $(x, y)$ of the Thue–Mahler equation $F(x, y) \in U_{S,K}$. □
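The linear relation among the three forms $x - \alpha_i y$ used in this reduction is a formal identity, so it can be tested with any distinct values in place of the roots. The sketch below is an illustration only: it substitutes integers for $\alpha_1, \alpha_2, \alpha_3$ (an assumption made purely to allow exact arithmetic) and evaluates $Au + Bv$; the function name is hypothetical.

```python
from fractions import Fraction

def siegel_relation(a1, a2, a3, x, y):
    """Return A*u + B*v for the relation among x - a_i*y (integer inputs)."""
    l1, l2, l3 = x - a1 * y, x - a2 * y, x - a3 * y
    A = Fraction(a2 - a3, a2 - a1)
    B = Fraction(a3 - a1, a2 - a1)
    u, v = Fraction(l1, l3), Fraction(l2, l3)
    return A * u + B * v

print(siegel_relation(1, 2, 3, 5, 7))  # 1
```

For the sample input one finds $A = -1$, $B = 2$, $u = 1/8$, $v = 9/16$, and indeed $-1/8 + 9/8 = 1$.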
5.3.4. Another equation which can be treated by similar methods is the hyperelliptic equation
\[ b y^2 = a_0 x^r + a_1 x^{r-1} + \cdots + a_r \]
with coefficients in $O_{S,K}$, $b \ne 0$, to be solved with $(x, y) \in O_{S,K}$. For its treatment, the reader is required to have some basic knowledge of algebraic number theory. We recall that the class group of a number field is the group of fractional ideals in $K$ modulo the principal fractional ideals.
There is little loss in generality if we assume that $b = 1$ and that $f(x) := a_0 x^r + a_1 x^{r-1} + \cdots + a_r$ has no multiple roots. In fact, we can always write $b f(x) = F(x) H(x)^2$ with $F, H \in O_{S,K}[x]$ and $F(x)$ without multiple roots. This yields the equation
\[ Y^2 = F(X), \]
where $Y = by/H(x)$ and $X = x$. Thus $Y \in K$ if $X \in O_{S,K}$ and since $Y$ is integral over $O_{S,K}$, which is integrally closed, we see that $Y \in O_{S,K}$ too.

Theorem 5.3.5. Let $f(x) = a_0 x^r + a_1 x^{r-1} + \cdots + a_r$ be a polynomial of degree $r \ge 3$ with no multiple roots and coefficients in $O_{S,K}$. Let $\omega(D_f)$ be the number of distinct prime ideals of $K$ dividing the discriminant $D_f$ of $f(x)$ which are not contained in $S$. Also let $\alpha_1, \alpha_2, \alpha_3$ be three roots of $f(x)$ and let
\[ K' = K\bigl(a_0^{(r-1)/2}, \alpha_1, \alpha_2, \alpha_3\bigr). \]
Finally, suppose that the Sylow 2-subgroup of the class group of the ring of integers $O_{K'}$ of the field $K'$ is generated by $\nu$ elements. Then the equation
\[ y^2 = f(x) \]
has at most $C_1 \cdot (2C_2)^{16[K':K](|S| + \omega(D_f)) + 16\nu}$ solutions $(x, y) \in O_{S,K}$ with $y \ne 0$, where $C_1$ and $C_2$ are the constants introduced in Theorem 5.2.1.
Proof: We use a method of Siegel to reduce the equation to a finite number of unit equations. We consider first the special case in which:

(a) the polynomial $f(x)$ is monic;
(b) $f(x)$ has three roots $\alpha_i$, $i = 1, 2, 3$, in $K$;
(c) the discriminant $D_f$ is a unit in $O_{S,K}$;
(d) the ring $O_{S,K}$ is a unique factorization domain.

We write $f(x) = (x - \alpha_1)(x - \alpha_2)(x - \alpha_3) h(x)$. By (a), (b), and Gauss's lemma in 1.6.3, we have $\alpha_1, \alpha_2, \alpha_3 \in O_{S,K}$ and $h(x) \in O_{S,K}[x]$.
Let $x \in O_{S,K}$ be a solution of $y^2 = f(x)$ with $f(x) \ne 0$. We claim that the principal ideals $[x - \alpha_1]$, $[x - \alpha_2]$, $[x - \alpha_3]$, $[h(x)]$ in $O_{S,K}$ are pairwise coprime. The first three ideals are coprime because $\alpha_i - \alpha_j \in [x - \alpha_i, x - \alpha_j]$ and $\alpha_i - \alpha_j$ divides $D_f$, which is a unit by assumption (c). A similar argument applies to $x - \alpha_i$ and $h(x)$, noting that by (a) all roots are integral over $O_{S,K}$ and working with a factorization of $h(x)$ in the splitting field of $f$.
Now the ideal equation
\[ [x - \alpha_1] \, [x - \alpha_2] \, [x - \alpha_3] \, [h(x)] = [y]^2 \]
shows that
\[ [x - \alpha_i] = \mathfrak{y}_i^2, \quad i = 1, 2, 3 \]
for some ideal $\mathfrak{y}_i$ of $O_{S,K}$. By assumption (d), we conclude that the square root of $[x - \alpha_i]$ must be a principal ideal. Thus we can write
\[ x - \alpha_i = u_i y_i^2 \]
with $u_i \in U_{S,K}$ and $y_i \in O_{S,K}$, for $i = 1, 2, 3$. Note also that we can take $u_1$, $u_2$, and $u_3$ modulo squares. By Dirichlet's unit theorem in 1.5.13, the group of units $U_{S,K}$ of $O_{S,K}$ is the direct product of a cyclic torsion group and a free abelian group of rank $|S| - 1$. Therefore, we need not consider more than $8^{|S|}$ triples $(u_1, u_2, u_3)$. Eliminating $x$ from these equations we find
\[ u_i y_i^2 - u_j y_j^2 = \alpha_j - \alpha_i \quad \text{for } i, j = 1, 2, 3 \text{ and } i \ne j. \tag{5.14} \]
Let $F = K(\sqrt{u_1}, \sqrt{u_2}, \sqrt{u_3})$ and let $S'$ be the set of places of $F$ lying over $S$. In the field $F$, the equations (5.14) factorize as
\[ (\sqrt{u_i} \, y_i - \sqrt{u_j} \, y_j)(\sqrt{u_i} \, y_i + \sqrt{u_j} \, y_j) = \alpha_j - \alpha_i \]
with $\sqrt{u_i} \, y_i \pm \sqrt{u_j} \, y_j \in O_{S',K'}$, while $\alpha_j - \alpha_i \in U_{S',K'}$ by (c). Thus $v_{ij} := \sqrt{u_i} \, y_i - \sqrt{u_j} \, y_j$ is a unit in $O_{S',K'}$ for $i, j = 1, 2, 3$ and $i \ne j$. On the other hand, we have identically
\[ (v_{12}/v_{13}) + (v_{23}/v_{13}) = 1. \]
Now $(v_{12}/v_{13}, v_{23}/v_{13}) \in U_{S',K'} \times U_{S',K'}$, which has rank $2|S'| - 2$. According to Theorem 5.2.1, this equation has at most $C_1 \cdot C_2^{2|S'| - 2}$ solutions. Also $|S'| \le [F : K] \, |S| \le 8 |S|$.
Hence let us fix $(v_{12}/v_{13}, v_{23}/v_{13})$, so that we can write $v_{ij} = c_{ij} w$ with the $\{c_{ij}\}$ having not more than $C_1 \cdot C_2^{2|S'| - 2}$ possibilities. We have
\begin{align*}
c_{ij}^{-1} (\alpha_j - \alpha_i) &= c_{ij}^{-1} (u_i y_i^2 - u_j y_j^2) \\
&= c_{ij}^{-1} \bigl( u_i y_i^2 - (\sqrt{u_i} \, y_i - c_{ij} w)^2 \bigr) \\
&= 2 \sqrt{u_i} \, y_i w - c_{ij} w^2.
\end{align*}
For a given $i$ we have two distinct values $j, k \ne i$ and $c_{ij} \ne c_{ik}$. Hence
\[ c_{ij}^{-1} (\alpha_j - \alpha_i) - c_{ik}^{-1} (\alpha_k - \alpha_i) = (c_{ik} - c_{ij}) w^2. \]

This determines $w^2$ uniquely and hence $w$ up to sign. Once $w$ is given, $y_i$ and $x$ are uniquely determined.
If we take into account the number of triples $(u_1, u_2, u_3)$ and the number of sets $\{c_{ij}\}$, we conclude that the number of solutions of the equation $y^2 = f(x)$ in $O_{S,K}$ with $y \ne 0$ does not exceed

\[ 2 C_1 \cdot 8^{|S|} (C_2)^{2|S'| - 2} \le C_1 \cdot (2C_2)^{16|S|}. \]

To complete the proof in the general case, we have only to enlarge the field $K$ and the set $S$ so that our assumptions (a) to (d) are verified.
For (a), it suffices to add $a_0^{(r-1)/2}$ to the field $K$.
For (b), it suffices to add $\alpha_i$, $i = 1, 2, 3$, to the field $K$.
For (c), it suffices to add to $S$ the set $S_2$ of places $v$ for which $\mathrm{ord}_v(D_f) > 0$.
This gives us an extension $K'$ of $K$ (of degree $[K' : K] \le 2r(r - 1)(r - 2)$) and a new set $S_3$, the places of $K'$ lying over $S$ and $S_2$. Thus we may assume that (a), (b), (c) are satisfied.
For (d), we use:
Proposition 5.3.6. Let $K$ be a number field. Then we can find a finite set of places $S$ of $K$ such that for any finite set of places $T \subset M_K$ with $T \supset S$, the ring $O_{T,K}$ is a principal ideal domain and hence a unique factorization domain.
Proof: The ring of integers OK is not necessarily a unique factorization domain.
Since the class group of a number field is finite (see e.g. [172], Ch.V or [162], Th.
2.7.1), there are ideals I1 , . . . , Ir in OK forming a finite set of representatives for
the class group of OK . Let M1 , . . . , Mn be a finite set of prime ideals of OK ,
containing all maximal ideals dividing at least one of the ideals Ij , j = 1, . . . , r .
The set
\[ M := \bigcap_{j=1}^{n} (O_K \setminus M_j) \]
is multiplicatively closed.
Let $R$ be the localization of $O_K$ in $M$. The ring $R$ is again a Dedekind domain ([156], Th.10.4). In order to show that $R$ is a unique factorization domain, it is enough to prove that any maximal ideal $\mathfrak{m}$ in $R$ is principal. The maximal ideal $\mathfrak{M} := \mathfrak{m} \cap O_K$ of $O_K$ generates $\mathfrak{m}$. Moreover, $\mathfrak{M}$ is equivalent to a product of ideals $M_j$, because they generate the class group of $O_K$. Therefore, $\mathfrak{m}$ is also equivalent to a product of ideals $R M_j$.
Since $R M_j = R$ for every $j$, we see that $\mathfrak{m}$ is generated by a single element. We conclude that $R$ is a principal ideal domain. This proves what we want, with $S$ the set of places determined by the prime ideals dividing the ideals $I_j$. □
This proves a slightly weaker version of the theorem, in which ν is the number
of distinct prime ideals dividing a set of ideals which generate the class group of
K . For the more precise statement, we need two observations. First, note that in
place of (d) it suffices that the 2-primary part of the class group of OK is trivial.
Second, as shown by Landau in 1907, every ideal class contains prime ideals. This
follows from the general form of Dirichlet’s density theorem, a particular case of
which states that the set of prime ideals p of OK in a given ideal class of the class
group CK is a set of positive natural density 1/|CK | (see e.g. [215], Ch.VII, §2,
Prop.7.10, Cor.4).
Hence in the proof of the preceding proposition we can take all ideals Ij to be
prime ideals. This also shows that we can take ν to be the cardinality of a set of
generators of the 2-subgroup of the class group, completing the proof. 

Remark 5.3.7. The usefulness of Theorem 5.3.5 in applications is somewhat limited by the presence of the factor $(2C_2)^{16\nu}$ in the given bound.
At present, the only general estimate we have on $\nu$ is comparable with the logarithm of the full class number of $K'$, and hence this factor is comparable with a power of the class number of $K'$, which in turn is comparable with a power of the height $H(f)$ of the polynomial $f$. Really useful estimates would be of order $H(f)^\varepsilon$ for any fixed $\varepsilon > 0$. Note however that this will be the case if $f(x)$ has three roots in $K$ and either $r$ is odd or $a_0$ is a square in $K$, since then $K' = K$ and $\nu$ will be independent of $f$.

Remark 5.3.8. The so-called superelliptic equation $y^m = f(x)$ with $m \ge 3$ can be treated pretty much in the same way, with a reduction to a Thue–Mahler equation, of Fermat type, of degree $m$. We leave the details to the reader.

5.4. Effective methods

A discussion of the unit equation would not be complete without mentioning its effective so-
lution obtained by Baker’s method or by the so-called Thue–Siegel principle. In its simplest
formulation, everything follows from:
Theorem 5.4.1. Let $K$ be a number field and $\Gamma$ a finitely generated subgroup of $K^\times$. Let
$0 < \varepsilon \le 1$ and $v \in M_K$. There is an effectively computable function $C(K, \Gamma, v, \varepsilon)$ such
that every solution of the diophantine inequality
$$|1 - \gamma|_v \le H(\gamma)^{-\varepsilon}, \qquad \gamma \in \Gamma$$
satisfies $h(\gamma) \le C(K, \Gamma, v, \varepsilon)$.
Remark 5.4.2. The best bounds for C(K, Γ, v, ε) are obtained via Baker’s theory of lin-
ear forms in logarithms in many variables, as in A. Baker and G. Wüstholz [16] in the
archimedean case and Kunrui Yu [335] in the non-archimedean case, see also Y. Bugeaud
[53] and Y. Bugeaud and M. Laurent [54].

A self-contained proof of Theorem 5.4.1, obtained with quite a different method (the so-
called Thue–Siegel method) is in [31] and E. Bombieri and P.B. Cohen [32]. The special
case A = 1 of the estimate in [32], which holds in the non-archimedean case, yields the
following completely explicit result.
We define $\rho(x) \ge e^5$ to be the solution of $\rho/(\log\rho)^5 = x$ if $x > e^5 5^{-5}$ and $\rho(x) = e^5$
otherwise, and denote by $\xi_1, \dots, \xi_t$ a set of generators of $\Gamma/\mathrm{tors}$. Let $K$ be a number
field of degree $d$, and $v$ a non-archimedean place of $K$ dividing the rational prime $p$,
with residue class degree $f_v$. For $D_v^* := \max(1, d/(f_v \log p))$, define $h'(x)$ to be the
modified height $h'(x) = \max(h(x), 1/D_v^*)$ and $H'(x) = \exp(h'(x))$. Finally, let
$$C = 66\, p^{f_v} (D_v^*)^6 \quad\text{and}\quad Q = (2t\rho(C/\varepsilon))^t \prod_{i=1}^{t} h'(\xi_i).$$
Then any solution $\gamma \in \Gamma$ of $|1 - \gamma|_v < H'(\gamma)^{-\varepsilon}$ has height bounded by
$$h(\gamma) \le 16\, p^{f_v} \rho(C/\varepsilon)\, Q \max\bigl(1, 4 p^{f_v} Q\bigr).$$

5.4.3. It is easy to see how Theorem 5.4.1 can be used to solve effectively the unit equation
$x + y = 1$ in the group $\Gamma = U_{S,K}$ of $S$-units of $K$. We give a quick sketch of the
argument.
Since $y$ is an $S$-unit, we have $\sum_{v \in S} \log^+ |y|_v = h(y)$ and also $\sum_{v \in S} \log |y|_v = 0$
by the product formula, because $\log |y|_v = 0$ for $v \notin S$. Subtracting, the negative parts
$\min(0, \log |y|_v)$ sum to $-h(y)$ over the $|S|$ places of $S$. Therefore, there is $v \in S$ such
that
$$\log |y|_v \le -\frac{1}{|S|}\, h(y),$$
which is the same as
$$|1 - x|_v = |y|_v \le H(y)^{-1/|S|}.$$
Moreover, it is clear that $H(x) = H(1 - y) \le 2H(y)$ (use Proposition 1.5.15), hence
$$|1 - x|_v \le 2^{1/|S|} H(x)^{-1/|S|}. \tag{5.15}$$

If $H(x) \le 4$, we have the desired bound. If instead $H(x) > 4$, it is immediate from (5.15)
that
$$|1 - x|_v \le H(x)^{-1/(2|S|)},$$
hence in any case Theorem 5.4.1 yields
$$h(x) \le \max\{\log 4,\ C(K, \Gamma, v, 1/(2|S|))\}.$$
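
The pigeon-hole step in 5.4.3 can be checked numerically in a toy case. The following Python sketch is only an illustration under stated assumptions: it takes $K = \mathbb{Q}$ and $S = \{\infty, 2, 3\}$ (our choice, not taken from the text), enumerates the $S$-units $x = \pm 2^a 3^b$ with $|a|, |b|$ below a hypothetical search bound, keeps those for which $y = 1 - x$ is again an $S$-unit, and tests whether some $v \in S$ satisfies $\log |y|_v \le -h(y)/|S|$. All function names are ours.

```python
from fractions import Fraction
from math import log

PRIMES = (2, 3)  # finite places of the illustrative set S = {infinity, 2, 3}

def strip_primes(n):
    """Remove all factors 2 and 3 from a positive integer."""
    for p in PRIMES:
        while n % p == 0:
            n //= p
    return n

def is_s_unit(y):
    """True if the rational number y has the form +-2^a 3^b."""
    return (y != 0 and strip_primes(abs(y.numerator)) == 1
            and strip_primes(y.denominator) == 1)

def log_abs(y, p):
    """log |y|_p for a prime p, or the ordinary log |y| when p is None."""
    if p is None:
        return log(float(abs(y)))
    e, num, den = 0, abs(y.numerator), y.denominator
    while num % p == 0:
        num //= p
        e += 1
    while den % p == 0:
        den //= p
        e -= 1
    return -e * log(p)

def height(y):
    """Logarithmic height of a rational number in lowest terms."""
    return log(max(abs(y.numerator), y.denominator))

def unit_equation_solutions(bound):
    """Pairs (x, y) of S-units with x + y = 1 and exponents at most bound."""
    sols = []
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            for sign in (1, -1):
                x = sign * Fraction(2) ** a * Fraction(3) ** b
                if is_s_unit(1 - x):
                    sols.append((x, 1 - x))
    return sols

def pigeonhole_holds(y, tol=1e-9):
    """Check min over v in S of log|y|_v <= -h(y)/|S| (up to rounding)."""
    places = (None,) + PRIMES
    return min(log_abs(y, v) for v in places) <= -height(y) / len(places) + tol
```

A call such as `all(pigeonhole_holds(y) for _, y in unit_equation_solutions(6))` runs the check on every solution found within the search box. The search box itself is not an effective bound; producing one is exactly what Theorem 5.4.1 is for.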
Remark 5.4.4. Lang’s proof of the theorem stated in the introduction is obtained by using
Hurwitz’s genus formula to show that, for large $m$, a component of the pull-back of the
curve $C \subset \mathbb{G}_m^n$ by the isogeny $x \mapsto x^m$ necessarily has large genus. Then we conclude
by an application of Siegel’s well-known finiteness theorem on integral points on curves.
(For curves over a number field, see Theorem 7.3.9 and Remark 7.3.10. For the general case,
see [169], Ch.8, Th.2.4.)

We give here a sketch of a different argument, for the case when the curve is defined over a
number field, because it reduces the proof to the statement of Theorem 5.4.1 rather than the
ineffective Siegel theorem. We prove:
Theorem 5.4.5. Let $C$ be a geometrically irreducible closed curve in $\mathbb{G}_m^n$, defined over a
number field $K$, not a translate of a subtorus of $\mathbb{G}_m^n$, and let $\Gamma$ be any finitely generated
subgroup of $\mathbb{G}_m^n(\overline{K})$. Then $C \cap \Gamma$ is an effectively computable finite set.
Proof: By using the projections $x \mapsto (x_i, x_j)$ onto $\mathbb{G}_m^2$, we easily reduce the problem to
the case $n = 2$ and where $C$ is given by the equation $f(x, y) = 0$.
Since $\Gamma$ is finitely generated, there is a number field $L$ and a finite set $S \subset M_L$ containing
all archimedean places such that $\Gamma \subset (O_{S,L}^\times)^2$. Replacing $K$ by $L$ and enlarging $\Gamma$, we
may assume that $\Gamma = \Gamma_1 \times \Gamma_1$, where $\Gamma_1 \subset O_{S,K}^\times$ is finitely generated.
For any $(\alpha, \beta) \in \Gamma$, the affine multiplicative height satisfies
$$H((\alpha, \beta)) := \prod_{v \in M_K} \max(1, |\alpha|_v, |\beta|_v) \le \max_{v \in S} \max(1, |\alpha|_v, |\beta|_v)^{|S|}. \tag{5.16}$$

Now we let $(\alpha, \beta)$ range over $C \cap \Gamma$ and we want to get an effective upper bound for
the height. Replacing $\alpha$ by $\alpha^{-1}$ or $\beta$ by $\beta^{-1}$, which does not affect the standard height
$h(\alpha) + h(\beta)$, and replacing $C$ by its image under the corresponding automorphism, we may
assume that the maximum in (5.16) is attained at $v \in S$ with $\min(|\alpha|_v, |\beta|_v) \ge 1$. Now
consider the polynomial $f(x, y) = \sum a_{ij} x^i y^j$ and order terms as
$$|a_{pq} \alpha^p \beta^q|_v \ge |a_{rs} \alpha^r \beta^s|_v \ge \dots$$
Since $f(\alpha, \beta) = 0$, the two largest terms must be of the same order of magnitude, hence
$$p \log |\alpha|_v + q \log |\beta|_v = r \log |\alpha|_v + s \log |\beta|_v + O(1) \ge i \log |\alpha|_v + j \log |\beta|_v + O(1) \tag{5.17}$$
for all monomials $x^i y^j$ appearing in $f(x, y)$. Note that we may restrict our attention to the
set of $(\alpha, \beta) \in C \cap \Gamma$ having the above properties with respect to the fixed absolute value
$v \in S$ and with $|a_{pq} \alpha^p \beta^q|_v \ge |a_{rs} \alpha^r \beta^s|_v$ as the largest terms in $f$ for fixed $(p, q), (r, s)$.
Here and in the following, the Landau (and Vinogradov) symbols are with respect to this
set. In particular, $(p, q)$ and $(r, s)$ must be linearly independent if $H((\alpha, \beta))$ is large,
because of (5.16) and $\min(\log |\alpha|_v, \log |\beta|_v) \ge 0$.

Consider now another monomial $x^i y^j$. We have $(i, j) = A_{ij} \cdot (p, q) + B_{ij} \cdot (r, s)$ for
certain rational numbers $A_{ij}$, $B_{ij}$, with denominators dividing $D = |ps - qr| \ge 1$, hence
$DA_{ij}, DB_{ij} \in \mathbb{Z}$. By (5.17), we deduce
$$\begin{aligned}
p \log |\alpha|_v + q \log |\beta|_v &\ge i \log |\alpha|_v + j \log |\beta|_v + O(1) \\
&= (A_{ij} p + B_{ij} r) \log |\alpha|_v + (A_{ij} q + B_{ij} s) \log |\beta|_v + O(1) \\
&= (A_{ij} + B_{ij})(p \log |\alpha|_v + q \log |\beta|_v) + O(1).
\end{aligned}$$
Therefore, if $H((\alpha, \beta))$ is large enough, we must have either $A_{ij} + B_{ij} = 1$ or
$A_{ij} + B_{ij} \le 1 - 1/D$. Now we define $I$ to be the set of all pairs $(i, j)$ such that $A_{ij} + B_{ij} = 1$.
Thus we have for $(i, j) \in I$ an equation
$$x^i y^j = (x^p y^q)(x^{r-p} y^{s-q})^{B_{ij}}. \tag{5.18}$$
If instead $(i, j) \notin I$, then $A_{ij} + B_{ij} \le 1 - 1/D$ implies that
$$\log |\alpha^i \beta^j|_v \le (1 - 1/D) \log |\alpha^p \beta^q|_v + O(1). \tag{5.19}$$

We abbreviate $X = x^p y^q$, $Y = x^r y^s$ and note that $x^{Di} y^{Dj}$ are monomials in $X$, $Y$.
Then we have by (5.18)
$$f(x^D, y^D) = X^D R(Y/X) + \sum_{(i,j) \notin I} a_{ij} x^{Di} y^{Dj}, \tag{5.20}$$
where
$$R(t) = \sum_{(i,j) \in I} a_{ij} t^{DB_{ij}}.$$

Now we specialize $(\xi, \eta)$ with $(\xi^D, \eta^D) = (\alpha, \beta)$ and correspondingly write $\Xi$, $\mathrm{H}$ for the
specializations of $X$ and $Y$ (here $\mathrm{H}$ is a capital eta, not to be confused with the height $H$).
For $(i, j) \notin I$, the bound (5.19) yields
$$|\xi^{Di} \eta^{Dj}|_v = O(|\Xi|_v^{D-1}).$$
Since $f(\xi^D, \eta^D) = f(\alpha, \beta) = 0$, from (5.20) and (5.16), we find in case of $pq \ne 0$ that
$$|R(\mathrm{H}/\Xi)|_v \ll |\Xi|_v^{-1} \le \max(|\alpha|_v, |\beta|_v)^{-1/D} \le H((\alpha, \beta))^{-1/(|S|D)}.$$
A similar exponential bound follows easily from the definition of $(p, q)$ if $pq = 0$. By
(5.17) again, $|\mathrm{H}/\Xi|_v$ is bounded and bounded away from 0, and from the last displayed
equation we see that
$$|1 - \zeta^{-1} \mathrm{H}/\Xi|_v \ll H((\alpha, \beta))^{-c}$$
for some root $\zeta$ of $R(t)$ and some $c > 0$.
Finally, $\zeta^{-1} \mathrm{H}/\Xi$ belongs to the finitely generated group $\Gamma_2$ obtained by adding $\zeta$ to
the division group $\{P \in \mathbb{G}_m^2 \mid P^D \in \Gamma_1\}$ of order $D$ of $\Gamma_1$. It is also clear that
$H((\alpha, \beta))^{-c} \ll H(\zeta^{-1} \mathrm{H}/\Xi)^{-\kappa}$ for some $\kappa > 0$. Thus we may apply Theorem 5.4.1
to $\Gamma_2$ and conclude that $H(\alpha^{r-p} \beta^{s-q})$ is bounded. By Northcott’s theorem in 1.6.8,
$\alpha^{r-p} \beta^{s-q}$ belongs to a finite set, hence we have shown that $(\alpha, \beta)$ belongs to a finite
union of effectively computable torus cosets in $\mathbb{G}_m^2$. Since $C$ is geometrically irreducible
and not a torus coset by hypothesis, the intersection of $C$ with these torus cosets is finite
and effectively computable.
For comparison with this argument, see also the proof of Theorem 7.4.7.

5.5. Bibliographical notes

Special cases of the unit equation appear in the work of Siegel [283] and Mahler
[187], and finiteness of the number of solutions is proved through a reduction to
a finite set of Thue equations. The first general formulation in a geometric setting
was done by S. Lang in 1960 [166]. S. Lang’s conjectured extension [167] to the
division group of a finitely generated group was later proved by P. Liardet [182].
Theorem 5.2.1 is the culmination of a long series of successive improvements in
counting the number of solutions of unit equations. Our proof follows [23] quite
closely. K.K. Choi has shown, in an unpublished note, that we can take C1 = 30,
C2 = 70 in this theorem. The first such bound depending only on the rank was
obtained, for the generalized S -unit equation in a number field, by J.-H. Evertse
[103] in a paper which sparked much research in this area.
The argument in Example 5.2.4 is due to D. Zagier and simplifies a more precise
calculation due to P. Erdős, C.L. Stewart, and R. Tijdeman [101], which leads to a
better value of the constant c. Example 5.2.5 is due to J. Berstel and is mentioned
in Beukers and Schlickewei [23], where it is shown that such an equation has at
most 61 solutions. The remark in Example 5.2.6 on complex solutions of the unit
equation in a cyclotomic field is due to H.W. Lenstra; this applies in a more general
setting, notably CM -fields.
The reduction of a Thue equation to a unit equation is in [283], Zweiter Teil, §1.
Siegel studies the unit equation by taking cosets of units modulo high powers,
thereby reducing it to a finite set of equations $ax^r + by^r = c$, for which he had
independently proved finiteness of the number of integral solutions using
diophantine approximation methods.
Theorem 5.3.2 is due to Evertse [103]. Uniform polynomial bounds in r were
independently obtained in [30] and, with a sharper result and better proof, in [109].
The reduction of a hyperelliptic equation to a unit equation is in a two-page paper
[282] by C.L. Siegel in 1926, published under the pseudonym X.
6 ROTH’S THEOREM

6.1. Introduction

The Liouville inequality in 1.5.21, or its projective version in 2.8.21, while simple
and useful, does not tell the real truth about how well we can approximate algebraic
numbers by algebraic numbers in a fixed field K .
In 1909 A. Thue obtained the first improvement on Liouville’s theorem about ap-
proximation of algebraic numbers by rational numbers. He proved the following
result:
Thue’s theorem: Let $\alpha$ be a real algebraic number of degree $d \ge 3$ and let
$\varepsilon > 0$. Then there are only finitely many rational numbers $p/q$, $q \ge 1$, such that
$$\left| \alpha - \frac{p}{q} \right| \le \frac{1}{q^{\frac{d}{2} + 1 + \varepsilon}}.$$

As a consequence of this theorem, Thue proved that the Thue equation
$$F(x, y) = m,$$
where F ∈ Z[x, y] is homogeneous of degree d with at least three non-proportional
linear factors over C , has only finitely many solutions in integers x, y , for every
fixed non-zero integer m .
The main drawback of Thue’s theorem is that it is ineffective, in the sense that
no bound can be placed a priori on the height of the rational approximations
p/q . Loosely speaking, this is due to the fact that no procedure is given to de-
cide whether a solution exists with height above a given constant. Since Thue’s
method obtains information on solutions assuming that one solution is known, this
leads in the end to ineffectivity.
On the other hand, many questions in number theory can be reduced to questions
of diophantine approximation as above, so that the result of Thue, even with its
inherent ineffectivity, definitely is of considerable importance.
Thue’s theorem went through various successive improvements. First of all, aside
from the fairly obvious extension to approximations in general number fields,

Siegel showed that the exponent $d/2 + 1$ can be replaced by
$$\min_{s \in \mathbb{N}} \left( s + \frac{d}{s + 1} \right) < 2\sqrt{d}.$$
The fact that this exponent is of order o(d) rather than d turned out to be quite
important in Siegel’s proof of the finiteness of the number of integral points on a
curve of genus g ≥ 1, in treating the case g = 1 (for the case g ≥ 2, Siegel had
to develop a corresponding method dealing with simultaneous approximations). A
little later, Mahler developed the same method over the p -adic numbers, thereby
obtaining as an application the finiteness of the number of solutions of a Thue
equation in S -integers rather than ordinary integers; again, this had significant
applications to other problems in number theory. However, all these extensions of
Thue’s theorem also suffer from the same problem of ineffectivity.
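
As a small sanity check on Siegel’s exponent, the following Python sketch computes $\min_s (s + d/(s+1))$ exactly and compares it with $2\sqrt{d}$ by comparing squares, so no floating point is involved. It is only an illustration: the function names are ours, and we assume that $s$ ranges over the integers $s \ge 1$ (allowing $s = 0$ would not change the minimum for $d \ge 2$).

```python
from fractions import Fraction

def siegel_exponent(d):
    """Exact value of min over integers s >= 1 of s + d/(s + 1)."""
    return min(Fraction(s) + Fraction(d, s + 1) for s in range(1, d + 1))

def thue_exponent(d):
    """Thue's exponent d/2 + 1 (without the epsilon)."""
    return Fraction(d, 2) + 1

def below_two_sqrt_d(d):
    """Exact test of siegel_exponent(d) < 2*sqrt(d), by comparing squares."""
    e = siegel_exponent(d)
    return e * e < 4 * d
```

For $d = 3$ both exponents equal $5/2$, while for larger $d$ Siegel’s exponent grows like $\sqrt{d}$ rather than $d$, which is the point made in the text.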
Thue’s method depended on an auxiliary construction with polynomials in two
variables. It was expected that a similar construction in m variables would yield
further drastic improvements, and this was explored by Siegel and Schneider in
the 1930s. They could not deal with one crucial point of the construction, namely
the non-vanishing of the auxiliary polynomial when evaluated at a special point.
It was only in 1955 that Roth was able to overcome this stumbling block, thereby
obtaining the sharp exponent 2 in place of d/2 + 1 in Thue’s theorem.
In Section 6.2, we start by proving finiteness of integer solutions for the Thue
equation, then we formulate Roth’s theorem for number fields with respect to a
finite set of places and we sketch the proof in the case K = Q with respect to the
ordinary absolute value.
In Section 6.3, we introduce the index of a polynomial as a measure for its van-
ishing in a given point. It is used together with Wronskian techniques to prove
Roth’s lemma, the crucial point for the non-vanishing of the auxiliary polynomial
mentioned above. Section 6.4 is reserved for the proof of Roth’s theorem.
In Section 6.5, we prove Vojta’s generalization with moving targets, we give quan-
titative results for the number of exceptional good approximations, and we mention
the Cugiani–Mahler theorem. This section provides us with additional informa-
tion, which may be omitted in a first reading.
For this chapter, the reader should be familiar with the results about Siegel’s lemma
in Section 2.9. However, we do not need here the most sophisticated formulations,
for example the dependence on the discriminant of the number field will be irrele-
vant and we may deduce Roth’s theorem also from a relative version of the easier
Corollary 2.9.2. Roth’s theorem will be rather important in the sequel, it will be
used and generalized in Chapters 7 and 14 and Roth’s lemma is a tool in the proof
of Schmidt’s subspace theorem in Chapter 7 and in the proof of Mordell’s conjec-
ture in Chapter 11.

6.2. Roth’s theorem

This section states Roth’s theorem and gives a sketch of the proof.
6.2.1. In order to see how diophantine approximation can be applied to diophan-
tine equations, we start with the argument that Thue’s theorem implies finiteness
of integer solutions of the Thue equation (see the introduction in 6.1 for the state-
ments, and see Theorem 5.3.2 for a generalization and a quantitative result).
The argument is by contradiction. First, we assume $F$ irreducible. We use the
decomposition
$$F\left( \frac{x}{y}, 1 \right) = a_d \left( \frac{x}{y} - \alpha_1 \right) \cdots \left( \frac{x}{y} - \alpha_d \right) = \frac{m}{y^d}$$
into linear factors. If there are infinitely many integer solutions $(x_n, y_n)$ of the
Thue equation, then $|y_n| \to \infty$ and we may assume, by passing to a subsequence,
that $x_n/y_n$ tends to a zero $\alpha_j$. As the other factors are bounded away from 0, we
get infinitely many solutions of $|x/y - \alpha_j| \le C|y|^{-d}$ for some constant $C > 0$.
Since $d \ge 3$, this contradicts Thue’s theorem.
In general, let F1 , . . . , Fr be the non-constant irreducible polynomials in Z[x, y]
dividing F . By a linear change of coordinates, we may assume that y is not
a divisor of F . Dirichlet’s box principle gives finitely many divisors mj of m
such that the system of equations Fj (x, y) = mj (j = 1, . . . , r) has infinitely
many solutions (xn , yn ). The above argument shows that xn /yn approaches a
zero of every Fj (x, 1) and hence r = 1. Since F has at least three different
linear factors, we get deg(F1 ) ≥ 3. Now the irreducible case considered above
leads to a contradiction with the initial assumption of infinitely many solutions of
F1 (x, y) = m1 .
6.2.2. Now we give Lang’s general formulation of Roth’s theorem over number
fields; the same statement for the rational field Q belongs to D. Ridout.
We use the notions introduced in Chapter 1, 1.4.3: For a place v of a number field
K , we denote by | |v the normalized absolute value (as in (1.6) on page 11) to
get the product formula. As usual, we denote again by | |v its extension to the
completion Kv . The absolute exponential height H(α) = eh(α) for an algebraic
number is as defined in 1.5.7.
Theorem 6.2.3. Let $K$ be a number field with a finite set $S$ of places. For each
$v \in S$ let $\alpha_v \in K_v$ be $K$-algebraic. Let $\kappa > 2$. Then there are only finitely many
$\beta \in K$ such that
$$\prod_{v \in S} \min(1, |\beta - \alpha_v|_v) \le H(\beta)^{-\kappa}.$$

The classical theorem of Roth is the special case K = Q and S = {∞}, so that
| |v is the ordinary absolute value in R .

Remark 6.2.4. Theorem 6.2.3 is ineffective in the sense that the proof does not
give an upper bound for $H(\beta)$. However, it does give an upper bound for the
number of solutions $\beta$, see 6.5.3.

A refinement of Theorem 6.2.3, in which we allow $\alpha_v$ to vary with $\beta$, will be
mentioned later in Section 6.5.
6.2.5. Theorem 6.2.3 makes sense also if we allow $\alpha_v = \infty$, just by replacing
the meaningless $|\infty - \beta|_v$ by $|1/\beta|_v$. This can be seen by applying a linear
transformation $T(x) = (ax + b)/(cx + d)$, $a, b, c, d \in \mathbb{Z}$, such that the $T(\alpha_v)$ are
all finite and applying the theorem with $T(\alpha_v)$ and $T(\beta)$. Since $H(T(\beta)) \asymp H(\beta)$,
our claim follows.
A more elegant way of dealing with this consists in working on the projective line
P1 rather than the affine line A1 , replacing the affine v -adic distance |αv − β|v
with the projective v -adic distance δv (αv , β) introduced in 2.8.16, and the height
H(β) with the exponential Arakelov height on P1 .
6.2.6. Consider the special case $K = \mathbb{Q}$, $S = \{\infty, p\}$ with $\alpha_\infty = \infty$, $\alpha_p = \alpha$
an algebraic integer in $\mathbb{Q}_p$ and $\beta = n \in \mathbb{Z}$. Then $|1/n|_\infty = 1/H(n)$, therefore
Theorem 6.2.3 implies that
$$|\alpha - n|_p < |n|^{-1-\varepsilon}$$
has only finitely many solutions in integers $n$, for every fixed $\varepsilon > 0$.
6.2.7. The following application of Ridout’s form of Roth’s theorem is due to K. Mahler
[185]. Let $g(k)$ be the smallest number such that every positive integer is a sum of at most
$g(k)$ positive integral $k$th powers. It is a classical theorem of Lagrange that $g(2) = 4$.
Waring stated the empirical theorem that $g(3) = 9$, $g(4) = 19$, and so on, and Hilbert
proved in general $g(k) < \infty$. As noted by J.A. Euler (son of Leonhard Euler, see L.E.
Dickson [89], p.717), the number $\lfloor (3/2)^k \rfloor 2^k - 1$ requires $\lfloor (3/2)^k \rfloor - 1$ powers $2^k$ and
$2^k - 1$ powers $1^k$ for its representation, making it clear that $g(k) \ge 2^k + \lfloor (3/2)^k \rfloor - 2$.
It turned out, after the researches of Hardy, Littlewood, and Vinogradov on the problem,
that the number of $k$th powers needed to represent large integers was substantially less than
the above lower bound for $g(k)$, thus reducing the problem of determining $g(k)$ to a finite
calculation for every $k$. Through the work of Dickson, Pillai, Rubugunday, and Niven, this
eventually led to a complete solution for $k \ge 6$ of the original Waring’s problem, in which
there were two possible answers, namely $g(k) = 2^k + \lfloor (3/2)^k \rfloor - 2$ if
$\lceil (3/2)^k \rceil - (3/2)^k \ge (3/4)^k$, and another more complicated result otherwise (see
W.J. Ellison [99] for a detailed account and references).
Let us apply Theorem 6.2.3 choosing $S = \{\infty, 2, 3\}$, $\alpha_\infty = 1$, $\alpha_2 = \infty$, $\alpha_3 = 0$, and
$\beta = 3^k/(n \cdot 2^k)$ with $n = \lceil (3/2)^k \rceil$. We have (using the remark in 6.2.5): $|\alpha_2 - \beta|_2 \le 2^{-k}$,
$|\alpha_3 - \beta|_3 = 3^{-k} |n|_3^{-1}$, and $H(\beta) \ge 3^k |n|_3$. After an easy simplification, we find that the
inequality
$$|\alpha_\infty - \beta|_\infty = |1 - 3^k/(n \cdot 2^k)| < 2^k 3^k |n|_3 \left( 3^k |n|_3 \right)^{-2-\varepsilon}$$
has only finitely many solutions $k$. If we multiply by $n$, we verify a fortiori that
$$\left\lceil \left( \frac{3}{2} \right)^k \right\rceil - \left( \frac{3}{2} \right)^k \ge 3^{-\varepsilon k}$$
for all but finitely many positive integers $k$, for any fixed $\varepsilon > 0$. If we take
$\varepsilon = \log(4/3)/\log 3$ and use the above solution of Waring’s problem, we deduce Mahler’s
theorem that $g(k) = 2^k + \lfloor (3/2)^k \rfloor - 2$ for all sufficiently large integers $k$. Due to the
ineffectiveness of Roth’s theorem, it remains an open problem to determine an effective $k_0$ such
that this result holds for $k \ge k_0$.
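
The quantities in 6.2.7 are easy to compute exactly with integer arithmetic. The Python sketch below is a hedged illustration, not part of the argument: it evaluates the formula $2^k + \lfloor (3/2)^k \rfloor - 2$, counts the powers needed for J.A. Euler’s number (which is smaller than $3^k$, so only $1^k$ and $2^k$ can be used), and tests the criterion read here as $\lceil (3/2)^k \rceil - (3/2)^k \ge (3/4)^k$. That reading of the criterion, and all function names, are our assumptions.

```python
def floor_three_halves(k):
    """floor((3/2)^k), computed exactly."""
    return 3 ** k // 2 ** k

def g_formula(k):
    """The conjectural value 2^k + floor((3/2)^k) - 2."""
    return 2 ** k + floor_three_halves(k) - 2

def euler_number(k):
    """J.A. Euler's number floor((3/2)^k) * 2^k - 1."""
    return floor_three_halves(k) * 2 ** k - 1

def powers_needed_below_3k(n, k):
    """Least number of k-th powers for n < 3^k: as many 2^k as possible, then 1's."""
    twos, ones = divmod(n, 2 ** k)
    return twos + ones

def criterion_holds(k):
    """Exact test of ceil((3/2)^k) - (3/2)^k >= (3/4)^k, cleared of denominators."""
    ceil_val = -((-3 ** k) // 2 ** k)
    return 4 ** k * ceil_val >= 3 ** k * (2 ** k + 1)
```

Because the test is exact, it can be run for as many values of $k$ as desired; what no finite computation provides is the effective $k_0$ discussed above.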
6.2.8. The proof given here can be easily axiomatized to obtain the result for more general
fields of characteristic 0, for example function fields of characteristic 0 (see [169], Ch.7).
On the other hand, Roth’s theorem does not hold in function fields of characteristic $p$. The
following example is due to Mahler.
Let $K = k(t)$, where $k = \overline{\mathbb{F}}_p$ is an algebraic closure of the finite field $\mathbb{F}_p$ with $p$ elements.
Let $|\ |_v$ be the absolute value on $K$ such that $|t|_v = c^{-1}$, with $c > 1$. The completion
$K_v$ can be identified with the field $k((t))$ of formal Laurent series
$$\sum_{m=h}^{\infty} a_m t^m, \qquad h \in \mathbb{Z}$$
in the uniformizing parameter $t$, with
$$\left| a_h t^h + a_{h+1} t^{h+1} + \cdots \right|_v = c^{-h}$$
provided $a_h \ne 0$.
Let $q = p^a$ and consider the Artin–Schreier equation
$$x^q - x + t = 0$$
and the associated finite separable extension $E = K[x]/(x^q - x + t)$ of degree $q$. There
is an extension $w$ of $v$ to $M_E$ such that $E_w = K_v$, with a solution $\alpha$ of
$$\alpha^q - \alpha + t = 0$$
given by
$$\alpha = t + t^q + t^{q^2} + t^{q^3} + \cdots.$$
If we take $\beta = t + t^q + \cdots + t^{q^m}$, we have $H(\beta) = c^{q^m}$ and $|\alpha - \beta|_v = c^{-q^{m+1}}$,
whence
$$|\alpha - \beta|_v = H(\beta)^{-[E:K]}.$$
Therefore in this case we cannot get any sharpening over the obvious Liouville inequality
and Roth’s theorem does not hold as soon as $q \ge 3$.
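
Mahler’s example can be checked by direct computation in $\mathbb{F}_p[t]$. The Python sketch below is an illustration only, with names of our own choosing: polynomials are stored as dictionaries from exponents to coefficients modulo $p$, and for the truncation $\beta = t + t^q + \cdots + t^{q^m}$ it computes $\beta^q - \beta + t$ by plain repeated multiplication, without assuming the Frobenius identity. The expected outcome is the single monomial $t^{q^{m+1}}$, which is the order to which $\beta$ agrees with $\alpha$.

```python
def poly_mul(f, g, p):
    """Product of sparse polynomials {exponent: coefficient} over F_p."""
    out = {}
    for i, a in f.items():
        for j, b in g.items():
            out[i + j] = (out.get(i + j, 0) + a * b) % p
    return {e: c for e, c in out.items() if c}

def poly_add(f, g, p):
    """Sum of sparse polynomials over F_p."""
    out = dict(f)
    for e, c in g.items():
        out[e] = (out.get(e, 0) + c) % p
    return {e: c for e, c in out.items() if c}

def poly_pow(f, n, p):
    """n-th power by repeated multiplication (n is small here)."""
    result = {0: 1}
    for _ in range(n):
        result = poly_mul(result, f, p)
    return result

def truncation(q, m):
    """beta = t + t^q + ... + t^(q^m)."""
    return {q ** i: 1 for i in range(m + 1)}

def artin_schreier_residual(p, q, m):
    """beta^q - beta + t in F_p[t], for beta = truncation(q, m)."""
    beta = truncation(q, m)
    minus_beta = {e: (-c) % p for e, c in beta.items()}
    return poly_add(poly_add(poly_pow(beta, q, p), minus_beta, p), {1: 1}, p)
```

For instance, `artin_schreier_residual(3, 3, 2)` works with $\beta = t + t^3 + t^9$ over $\mathbb{F}_3$.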

6.2.9. We first give a sketch of the main steps in the proof of Roth’s theorem in the
simplest possible case, namely $K = \mathbb{Q}$ and $S = \{\infty\}$, so that $|\ |_v$ is the ordinary
euclidean absolute value. There is only one $\alpha$ to worry about. Suppose we have
infinitely many rational approximations $p/q$ to $\alpha$ such that
$$\left| \alpha - \frac{p}{q} \right| \le q^{-\kappa}.$$

Then, for any positive integer $m$ and any large constants $L$ and $M$, we can find $m$ rational
approximations to $\alpha$, namely $p_j/q_j$, $j = 1, \dots, m$, with $\log q_1 > L$ and also
$$\log q_{j+1} > M \log q_j, \qquad j = 1, \dots, m - 1.$$
Such a sequence of approximations will be said to be $(L, M)$-independent.


Step I: The auxiliary construction at the algebraic point.
We abbreviate $x = (x_1, \dots, x_m)$. Construct a polynomial $P(x) \in \mathbb{Z}[x]$ with
partial degrees $d_1, \dots, d_m$, vanishing to a (weighted) high order at $(\alpha, \dots, \alpha)$.
The degrees $d_j$ are chosen so that the quantities $d_j \log q_j$ are nearly the same, for
$j = 1, \dots, m$. Vanishing to high order means vanishing of $\Delta P$ at $(\alpha, \dots, \alpha)$
with $\Delta$ any differential operator of order $(i_1, \dots, i_m)$ with
$$\frac{i_1}{d_1} + \cdots + \frac{i_m}{d_m} < t.$$
Here $t$ is a parameter, which we want to be large.
Carrying out this step is done by an application of the pigeon-hole principle, say
Siegel’s lemma. The number of equations is asymptotic to $V_m(t)\, d_1 \cdots d_m$, with
$V_m(t)$ the volume of the region
$$\mathcal{V}_m(t) := \{x \in \mathbb{R}^m \mid x_1 + \cdots + x_m \le t,\ 0 \le x_j \le 1 \text{ for } j = 1, \dots, m\}.$$
The number of coefficients is asymptotic to $d_1 \cdots d_m$, and, if $[\mathbb{Q}(\alpha) : \mathbb{Q}]\, V_m(t) \le
1 - \delta$, we can find such a polynomial $P$ with integer coefficients bounded by
$C^{d_1 + \cdots + d_m}$, for a suitable constant $C$ depending only on $H(\alpha)$, $m$, and $\delta$.
Step II: Non-vanishing at the rational point.
Show that the polynomial P constructed in step I, or a suitable derivative of it
of rather small order, does not vanish at the rational point (p1 /q1 , . . . , pm /qm ),
provided the points pi /qi are (L, M )-independent and L, M are sufficiently
large.
This was the difficult step that, before Roth’s work, we could do only for m = 1
or 2. This step uses in an essential way the hypothesis that the approximations are
(L, M )-independent.
Step III: The upper bound.
Since $P$ vanishes to high order at the algebraic point $(\alpha, \dots, \alpha)$, the Taylor
expansion at $(\alpha, \dots, \alpha)$ shows that, if $|p_j/q_j - \alpha| < q_j^{-\kappa}$, then
$$|P(p_1/q_1, \dots, p_m/q_m)| \le C'^{\,d_1 + \cdots + d_m} \max_j q_j^{-\kappa t d_j}$$
for a constant $C'$ depending only on $\alpha$, $m$, and $\delta$.


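Step IV: The Liouville lower bound.
Since $P$ has integer coefficients and does not vanish at the rational point, the number
$q_1^{d_1} \cdots q_m^{d_m} P(p_1/q_1, \dots, p_m/q_m)$ is a non-zero integer, so $P$ is bounded away from 0 as
$$q_1^{-d_1} \cdots q_m^{-d_m} \le |P(p_1/q_1, \dots, p_m/q_m)|.$$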

Step V: Comparison of the upper and lower bounds.
We have chosen
$$d_1 \log q_1 \sim \cdots \sim d_m \log q_m$$
and since $C'^{\sum d_j}$ is negligible with respect to $q_j^{d_j}$ (because $q_j > e^L$), comparison
of the upper and lower bounds shows that
$$\kappa t \le m + O(1/M);$$
the constant involved in the $O(\ )$ symbol depends on $\alpha$, $m$, and $\delta$.
Thus, as $M$ tends to $\infty$, we find $\kappa \le m/t$ provided $[\mathbb{Q}(\alpha) : \mathbb{Q}]\, V_m(t) \le 1 - \delta$.
A simple probability estimate shows that, if we choose $t = (\frac12 - \varepsilon)m$, then $V_m(t)$
tends to 0 as $m$ increases, so that this choice of $t$ is admissible for large $m$. This
gives $\kappa \le 1/(\frac12 - \varepsilon)$ leading to a contradiction for $\varepsilon > 0$ sufficiently small, and
Roth’s theorem follows.

6.3. Preliminary lemmas

In this section we prove several preliminary results needed for the proof of Roth’s
theorem. It will also be convenient to use the notation and formalism already
developed in Chapter 1.
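6.3.1. We abbreviate $x = (x_1, \dots, x_m)$, $\alpha_v = (\alpha_{v1}, \dots, \alpha_{vm})$ and write, for
multi-indices $\mathbf{m} = (m_1, \dots, m_m)$ and $\mu = (\mu_1, \dots, \mu_m)$,
$$\binom{\mathbf{m}}{\mu} = \prod_{j=1}^{m} \binom{m_j}{\mu_j}$$
and
$$\partial_\mu = \frac{1}{\mu_1! \cdots \mu_m!} \left( \frac{\partial}{\partial x_1} \right)^{\mu_1} \cdots \left( \frac{\partial}{\partial x_m} \right)^{\mu_m}.$$
We have
$$\partial_\mu x^{\mathbf{m}} = \binom{\mathbf{m}}{\mu} x^{\mathbf{m} - \mu}. \tag{6.1}$$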
6.3.2. We work with polynomials in $F[x_1, \dots, x_m]$ for a field $F$, vanishing to
high order at a point. For $P \in F[x_1, \dots, x_m]$ and positive weights $d = (d_1, \dots, d_m)$,
we define the index of $P$ at a point $\alpha = (\alpha_1, \dots, \alpha_m)$ to be
$$\operatorname{ind}(P; d; \alpha) = \min_\mu \left\{ \frac{\mu_1}{d_1} + \cdots + \frac{\mu_m}{d_m} \;\middle|\; \partial_\mu P(\alpha) \ne 0 \right\}.$$
The following properties hold:

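(a) $\operatorname{ind}(P + Q; d; \alpha) \ge \min(\operatorname{ind}(P; d; \alpha), \operatorname{ind}(Q; d; \alpha))$;
(b) $\operatorname{ind}(PQ; d; \alpha) = \operatorname{ind}(P; d; \alpha) + \operatorname{ind}(Q; d; \alpha)$;
(c) $\operatorname{ind}(\partial_\mu P; d; \alpha) \ge \operatorname{ind}(P; d; \alpha) - \dfrac{\mu_1}{d_1} - \dfrac{\mu_2}{d_2} - \cdots - \dfrac{\mu_m}{d_m}$.
By means of the Taylor expansion at $\alpha$, we see easily that $\operatorname{ind}(P; d; \alpha) = \infty$ if
and only if $P = 0$. Together with (a) and (b), this says that the index is a valuation.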
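
The index can be computed directly from its definition for polynomials with rational coefficients. The following Python sketch is a minimal illustration under our own conventions (none of the names come from the text): a polynomial is a dictionary from exponent tuples $J$ to coefficients $p_J$, the Taylor coefficient $\partial_\mu P(\alpha)$ is evaluated through formula (6.1), and the index is the least weighted order $\mu_1/d_1 + \cdots + \mu_m/d_m$ with non-vanishing coefficient.

```python
from fractions import Fraction
from itertools import product
from math import comb

def taylor_coeff(P, mu, alpha):
    """Value of the normalized derivative of order mu of P at alpha, via (6.1)."""
    total = Fraction(0)
    for J, c in P.items():
        if all(j >= u for j, u in zip(J, mu)):
            term = Fraction(c)
            for j, u, a in zip(J, mu, alpha):
                term *= comb(j, u) * Fraction(a) ** (j - u)
            total += term
    return total

def index(P, d, alpha):
    """Weighted index of P at alpha; None stands for +infinity (P = 0)."""
    if not P:
        return None
    degs = [max(J[i] for J in P) for i in range(len(d))]
    orders = product(*(range(n + 1) for n in degs))
    return min(sum(Fraction(u, w) for u, w in zip(mu, d))
               for mu in orders if taylor_coeff(P, mu, alpha) != 0)

def multiply(P, Q):
    """Product of two polynomials in the dictionary representation."""
    out = {}
    for J, a in P.items():
        for K, b in Q.items():
            key = tuple(j + k for j, k in zip(J, K))
            out[key] = out.get(key, 0) + a * b
    return {J: c for J, c in out.items() if c != 0}
```

With $P = (x_1 - 1)^2$, $Q = x_2 - 1$, weights $d = (4, 2)$ and $\alpha = (1, 1)$, property (b) can be tested by comparing `index(multiply(P, Q), d, alpha)` with the sum of the two individual indices.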
6.3.3. It is convenient to introduce the set
$$\mathcal{V}_m(t) := \{x \mid x_1 + \cdots + x_m \le t,\ 0 \le x_j \le 1\}$$
and its volume $V_m(t) = \operatorname{vol}(\mathcal{V}_m(t))$. For $t \in \mathbb{R}_+^n$, we define
$$V_m(t) := \sum_{i=1}^{n} V_m(t_i).$$

Lemma 6.3.4. Let $\alpha_i$, $i = 1, \dots, n$, be points $\alpha_i = (\alpha_{i1}, \dots, \alpha_{im})$ with
coordinates $\alpha_{ij}$ in a finite extension $F/K$ of a number field $K$, of degree $r = [F : K]$.
Suppose that $t \in \mathbb{R}_+^n$ satisfies
$$r V_m(t) < 1.$$
Then, for all sufficiently large integers $d_1, \dots, d_m$, there is $P \in K[x_1, \dots, x_m]$,
not identically 0 and with partial degrees at most $d_1, \dots, d_m$, such that:
(a) the index is bounded below by $\operatorname{ind}(P; d; \alpha_i) \ge t_i$ for $i = 1, \dots, n$;
(b) the height of $P$ is bounded by
$$h(P) \le \frac{r}{1 - rV_m(t)} \sum_{i=1}^{n} \sum_{j=1}^{m} V_m(t_i)\, (h(\alpha_{ij}) + \log 2 + o(1))\, d_j$$
as $d_j \to \infty$ for $j = 1, \dots, m$.

Proof: We abbreviate $I = (i_1, \dots, i_m)$, $J = (j_1, \dots, j_m)$, set $P(x) = \sum_J p_J x^J$
and consider the set of equations
$$\partial_I P(\alpha_k) = 0 \quad\text{for}\quad \frac{i_1}{d_1} + \cdots + \frac{i_m}{d_m} < t_k; \qquad k = 1, \dots, n.$$
This is a linear system in the coefficients $p_J$ of $P$, which we want to solve
non-trivially in $K$; the coefficients of the linear system are in the field $F$, of degree
$[F : K] = r$. The number $N$ of unknowns is $N = (d_1 + 1) \cdots (d_m + 1) \sim
d_1 \cdots d_m$ as all $d_j \to \infty$, while the number $M$ of equations is asymptotically
$M \sim V_m(t)\, d_1 \cdots d_m$, because the number of lattice points $(i_1/d_1, \dots, i_m/d_m)$
in $\mathcal{V}_m(t)$ is asymptotic to $V_m(t)\, d_1 \cdots d_m$. In order to verify this, let $Z$ be
this number. If we associate to each lattice point $(i_1/d_1, \dots, i_m/d_m)$ the
parallelepiped $i_\nu/d_\nu \le x_\nu \le (i_\nu + 1)/d_\nu$, $\nu = 1, \dots, m$, we immediately see that
$V_m(t)\, d_1 \cdots d_m \le Z$.

For an upper bound for $Z$, we note that, if $(i_1/d_1, \dots, i_m/d_m)$ is a lattice point
in $\mathcal{V}_m(t)$, then
$$\frac{i_1 + 1}{d_1} + \cdots + \frac{i_m + 1}{d_m} \le t + \frac{1}{d_1} + \cdots + \frac{1}{d_m}$$
and $i_\nu + 1 \le d_\nu + 1$. It follows that, if we rescale $\mathcal{V}_m(t)$ by a factor $1 +
\max(1, t^{-1})(1/d_1 + \cdots + 1/d_m)$, then the rescaled domain contains all
parallelepipeds associated to lattice points in $\mathcal{V}_m(t)$. Hence
$$V_m(t)\, d_1 \cdots d_m \le Z \le V_m(t) \left( 1 + \max(1, t^{-1}) \left( \frac{1}{d_1} + \cdots + \frac{1}{d_m} \right) \right)^{m} d_1 \cdots d_m,$$
thus showing that $Z \sim V_m(t)\, d_1 \cdots d_m$ as $d_j \to \infty$.
In particular, we have $N > rM$ if $rV_m(t) < 1$ and $d_j \to \infty$. By (6.1) on
page 156, the matrix of coefficients has entries
$$A = \left( \binom{J}{I} (\alpha_k)^{J - I} \right)$$
with rows indexed by $(I, k)$ and columns indexed by $J$.
We apply Siegel’s lemma as given in Theorem 2.9.19. In our case, it suffices to
produce only one small solution to our system of equations. Theorem 2.9.19 and
2.9.8 imply that, if $N > rM$, there is a solution $x$ such that
$$H(x) \le |D_K|^{1/2r} \Bigl( \prod_{(I,k)} \sqrt{N}\, H(A_{(I,k)}) \Bigr)^{r/(N - rM)}, \tag{6.2}$$
where $A_{(I,k)}$ is the $(I, k)$-th row of $A$.


For fixed $(I, k)$, we estimate the height of the corresponding row vector $A_{(I,k)}$ of
$A$ as follows. The vector $A_{(I,k)}$ has entries $\binom{J}{I} (\alpha_k)^{J - I}$, and hence
$$H(A_{(I,k)}) \le \prod_{j=1}^{m} (2H(\alpha_{kj}))^{d_j}.$$
This bound is independent of $I$.
As noted before, for fixed $k$ we have about $V_m(t_k)\, d_1 \cdots d_m$ possibilities for $I$.
Thus the product of the heights of the rows of $A$ is bounded by
$$\prod_{k=1}^{n} \prod_{j=1}^{m} \Bigl( (d_j + 1)^{V_m(t_k)/2} \bigl( 2H(\alpha_{kj}) \bigr)^{d_j V_m(t_k)} \Bigr)^{(1 + o(1))\, d_1 \cdots d_m}.$$
The conclusion follows from (6.2), noting that the term $|D_K|^{1/2r}$ and the terms
$(d_j + 1)^{V_m(t_k)/2}$ are negligible with respect to $2^{d_j V_m(t_k)}$, so they do not affect the
asymptotics in the final estimate.

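Lemma 6.3.5. If $0 \le \varepsilon \le \frac12$, then
$$V_m\left( \left( \tfrac12 - \varepsilon \right) m \right) \le e^{-6m\varepsilon^2}.$$
Proof: We use a familiar method of probability theory. We set $\chi(x) = 1$ if $x < 0$
and $0$ if $x > 0$. Since $\chi(x) < e^{-\lambda x}$ for every $\lambda > 0$, we have
$$\begin{aligned}
V_m\left( \left( \tfrac12 - \varepsilon \right) m \right) &= \int_{-1/2}^{1/2} \cdots \int_{-1/2}^{1/2} \chi(x_1 + \cdots + x_m + m\varepsilon)\, dx_1 \cdots dx_m \\
&\le \int_{-1/2}^{1/2} \cdots \int_{-1/2}^{1/2} e^{-\lambda (m\varepsilon + \sum x_j)}\, dx_1 \cdots dx_m \\
&= \left( \int_{-1/2}^{1/2} e^{-\lambda(\varepsilon + x)}\, dx \right)^{m} \\
&= \exp(-mU(\lambda))
\end{aligned}$$
with
$$U(\lambda) = \varepsilon\lambda - \log\left( \frac{\sinh(\lambda/2)}{\lambda/2} \right).$$
It is possible to show that this estimate is quite precise, in the sense that
$$\log V_m\left( \left( \tfrac12 - \varepsilon \right) m \right) = -m \max_\lambda U(\lambda) + O(\log m)$$
uniformly in $\varepsilon$. For our purposes, it suffices to note that
$$\frac{\sinh(u)}{u} = 1 + \frac{u^2}{3!} + \frac{u^4}{5!} + \cdots \le 1 + \frac{u^2}{6} + \frac{(u^2/6)^2}{2!} + \cdots = e^{u^2/6},$$
giving $\log\bigl( \sinh(u)/u \bigr) \le u^2/6$. If we choose $\lambda = 12\varepsilon$, we get what we want,
since then $U(12\varepsilon) \ge 12\varepsilon^2 - (6\varepsilon)^2/6 = 6\varepsilon^2$.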
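
The bound of Lemma 6.3.5 can be compared with exact values of $V_m(t)$. The Python sketch below is an illustration under our own assumptions: it uses the classical inclusion-exclusion formula $V_m(t) = \frac{1}{m!} \sum_{0 \le k \le t} (-1)^k \binom{m}{k} (t - k)^m$ for the volume of a slice of the unit cube (this formula is not taken from the text), evaluated in exact rational arithmetic. The helper `admissible_m` is a hypothetical convenience that returns one value of $m$ for which $r\, e^{-6m\varepsilon^2} < 1$, as needed in Step V.

```python
from fractions import Fraction
from math import comb, exp, factorial, floor, log

def volume(m, t):
    """V_m(t): volume of {x in [0,1]^m : x_1 + ... + x_m <= t}, exactly."""
    t = Fraction(t)
    if t <= 0:
        return Fraction(0)
    if t >= m:
        return Fraction(1)
    total = sum((-1) ** k * comb(m, k) * (t - k) ** m
                for k in range(floor(t) + 1))
    return total / factorial(m)

def lemma_bound(m, eps):
    """Right-hand side exp(-6 m eps^2) of Lemma 6.3.5."""
    return exp(-6 * m * float(eps) ** 2)

def admissible_m(r, eps):
    """An integer m with r * exp(-6 m eps^2) < 1 (r >= 1, eps > 0)."""
    return floor(log(r) / (6 * float(eps) ** 2)) + 1
```

For example, `volume(2, Fraction(4, 5))` is the area of a right triangle with legs $4/5$, and comparing `volume(m, (Fraction(1, 2) - eps) * m)` with `lemma_bound(m, eps)` shows how much room the lemma leaves for small $m$.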
6.3.6. The simplest way to achieve Step II in the proof of Roth’s theorem is by
means of Roth’s lemma:
Lemma 6.3.7. Let $P(x_1, \dots, x_m)$ be a polynomial in $m$ variables, with partial
degrees at most $d_1, \dots, d_m$ with $d_i \ge 1$, with coefficients in $\overline{\mathbb{Q}}$ and not identically
0. Let $(\xi_1, \dots, \xi_m) \in \overline{\mathbb{Q}}^m$ and let $0 < \sigma \le \frac12$. Suppose that:
(a) the weights $d_1, \dots, d_m$ are rapidly decreasing, namely
$$d_{j+1}/d_j \le \sigma;$$
(b) the point $(\xi_1, \dots, \xi_m)$ has components with large height, in the sense that
$$\min_j d_j h(\xi_j) \ge \sigma^{-1} (h(P) + 4md_1).$$
Then
$$\operatorname{ind}(P; d; \xi) \le 2m\, \sigma^{1/2^{m-1}}.$$

Remark 6.3.8. The constant $2m$ appearing in the conclusion of the lemma is
not optimal but its actual value is of little importance here.
Proof: The proof is by induction on m . If the polynomial P is the product of two
polynomials in disjoint sets of variables, it is easy to obtain an upper bound for the
index if we have an upper bound for the index of the factors. The point is that, even
if P does not split in such a fashion, a suitable generalized Wronskian of P does.
If the degrees di form a rapidly decreasing sequence, we can get sufficient control
on the order of derivatives involved (it is here that we use hypothesis (a)), and the
induction works by splitting one variable at a time. The details are as follows.
6.3.9. If $m = 1$, the bound
$$\operatorname{ind}(P; d_1; \xi_1)\, d_1 h(\xi_1) \le h(P) + d_1 \log 2$$
follows from Proposition 1.6.5 and the fact that $(x_1 - \xi_1)^{\operatorname{ind}(P) d_1}$ is a factor of $P$.
This gives Roth’s lemma for $m = 1$ with the better constants $\log 2$ and 1 in place
of 4 and 2.
In order to perform the splitting of the Wronskian, we write $P$ in the form
$$P = \sum_{j=0}^{s} f_j(x_1, \dots, x_{m-1})\, g_j(x_m),$$
where $s \le d_m$ and where the $f_j$’s, and similarly the $g_j$’s, are linearly independent
polynomials defined over $\overline{\mathbb{Q}}$.

We recall the Wronskian criterion for linear independence.


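Proposition 6.3.10. Let $K$ be a field of characteristic 0 and let $x_1, \dots, x_m$ be
algebraically independent over $K$. Let $\varphi_j \in K[x_1, \dots, x_m]$, $j = 1, \dots, n$, be
$n$ polynomials. Then $\varphi_1, \dots, \varphi_n$ are linearly independent over $K$ if and only if
some generalized Wronskian
$$W_{\mu_1, \dots, \mu_n}(x_1, \dots, x_m) := \det \begin{pmatrix}
\partial_{\mu_1} \varphi_1 & \partial_{\mu_1} \varphi_2 & \dots & \partial_{\mu_1} \varphi_n \\
\partial_{\mu_2} \varphi_1 & \partial_{\mu_2} \varphi_2 & \dots & \partial_{\mu_2} \varphi_n \\
\vdots & \vdots & & \vdots \\
\partial_{\mu_n} \varphi_1 & \partial_{\mu_n} \varphi_2 & \dots & \partial_{\mu_n} \varphi_n
\end{pmatrix}$$
with $|\mu_i| = \mu_{1i} + \mu_{2i} + \cdots + \mu_{mi} \le i - 1$ is not identically 0.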
Proof: If $\varphi_1,\dots,\varphi_n$ are linearly dependent over $K$, then all generalized
Wronskians vanish. Indeed, let
\[ c_1\varphi_1 + c_2\varphi_2 + \dots + c_n\varphi_n = 0 \]
be a linear dependence relation among the $\varphi_j$. If we apply the differential
operators $\partial_{\mu_i}$ ($i = 1,\dots,n$) to this relation, we obtain a homogeneous linear system in
the coefficients $c_j$ and its determinant must vanish, proving what we want.
If instead $\varphi_1,\dots,\varphi_n$ are linearly independent over $K$, we proceed as follows.
Consider the Kronecker substitution $(x_1, x_2, \dots, x_m) \mapsto (t, t^{d}, \dots, t^{d^{m-1}})$, where
$t$ is a new indeterminate. This maps monomials in $x_1,\dots,x_m$ into powers of $t$
and is injective on the set of monomials with partial degrees strictly less than $d$. It
follows that, if $d$ is larger than the partial degrees of the $\varphi_j$’s, then $\varphi_1,\dots,\varphi_n$ are
linearly independent over $K$ if and only if the polynomials
\[ \Phi_j(t) := \varphi_j(t, t^{d}, \dots, t^{d^{m-1}}) \]
are linearly independent over $K$. By Wronski’s well-known result, this is the case
if and only if the Wronskian
\[ W(t) := \det\big( (d/dt)^{i-1}\,\Phi_j \big)_{i,j=1,\dots,n} \tag{6.3} \]
is not identically 0. We have, for certain universal polynomials $a_{\mu,i}(t; d, m) \in \mathbb{Q}[t]$,
\[ (d/dt)^{i-1}\,\Phi_j = \sum_{|\mu| \le i-1} a_{\mu,i}(t; d, m)\, \partial_\mu \varphi_j(t, \dots, t^{d^{m-1}}) \]
and substituting into (6.3) we see that $W(t)$ is a linear combination of generalized
Wronskians $W_{\mu_1,\dots,\mu_n}(t, t^{d}, \dots, t^{d^{m-1}})$ with $|\mu_i| \le i - 1$. Since $W(t)$ is not
identically 0, some generalized Wronskian does not vanish identically.
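The injectivity claim for the Kronecker substitution used in this proof can be checked by brute force in small cases. The following Python sketch is only an illustration: the helper names `kronecker_exponent` and `is_injective` are hypothetical and not part of the text. It sends the exponent vector $(e_1,\dots,e_m)$ of a monomial to the exponent $e_1 + e_2 d + \dots + e_m d^{m-1}$ of $t$, and it also shows that injectivity can fail once a partial degree reaches $d$.

```python
from itertools import product

def kronecker_exponent(exponents, d):
    """Exponent of t after substituting x_i -> t**(d**(i-1)) in the monomial
    x_1**e_1 * ... * x_m**e_m (hypothetical helper, for illustration only)."""
    return sum(e * d**i for i, e in enumerate(exponents))

def is_injective(m, d):
    """Brute-force check that monomials in m variables with all partial
    degrees strictly less than d are sent to distinct powers of t."""
    images = [kronecker_exponent(e, d) for e in product(range(d), repeat=m)]
    return len(images) == len(set(images))

print(is_injective(3, 4))                # True: exponents are base-d digits
print(kronecker_exponent((1, 0, 2), 4))  # 33, since 1 + 0*4 + 2*16 = 33
# With a partial degree equal to d the map is no longer injective:
print(kronecker_exponent((4, 0), 4) == kronecker_exponent((0, 1), 4))  # True
```

The check amounts to uniqueness of base-$d$ expansions, which is why the bound "strictly less than $d$" on the partial degrees is needed.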
6.3.11. We return to the proof of 6.3.7. Because of linear independence, applying
the above proposition shows that there are two Wronskians
\[ U(x_1,\dots,x_{m-1}) := \det(\partial_{\mu_i} f_j)_{i,j=0,\dots,s} \]
and
\[ V(x_m) := \det(\partial_{\nu} g_j)_{\nu,j=0,\dots,s}, \]
which are not identically 0; here we have $\mu_i = (\mu_{1i},\dots,\mu_{m-1,i})$ and $|\mu_i| \le s \le d_m$.
We multiply the two determinants and obtain
\[ W(x_1,\dots,x_m) := \det(\partial_{\mu_i,\nu} P) = U(x_1, x_2, \dots, x_{m-1})\,V(x_m). \]
Since $d_{j+1}/d_j \le \frac{1}{2}$ for every $j$, we have
\[ d_1 + \dots + d_m \le 2d_1. \]
The partial degrees of $U$ and $V$ are bounded by $((s+1)d_1,\dots,(s+1)d_{m-1})$
and $(s+1)d_m$; we have
\[ h(U) + h(V) = h(W) \le (s+1)\big(h(P) + 4d_1\big). \tag{6.4} \]
Only the last display requires some explanation. We have $h(W) = h(U) + h(V)$
because $U$ and $V$ involve disjoint sets of variables, see Proposition 1.6.2. We
estimate $h(W)$ as follows. Expanding the determinant by means of the Laplace
expansion ($\pi$ runs over permutations) and then applying Lemma 1.3.7, we get
\[ h(W) \le \sum_v \max_\pi \log \Big| \prod_{i=0}^{s} \partial_{\mu_i,\pi(i)} P \Big|_v + \log((s+1)!). \]
Then Gauss’s lemma in 1.6.3 and Gelfond’s lemma in 1.6.11 lead to
\[ h(W) \le \sum_v \max_\pi \sum_{i=0}^{s} \log \big| \partial_{\mu_i,\pi(i)} P \big|_v
 + (s+1)(d_1 + d_2 + \dots + d_m)\log 2 + \log((s+1)!). \]
Now (6.1) on page 156 and $d_1 + \dots + d_m \le 2d_1$ yield
\[ h(W) \le \sum_{i=0}^{s} \big(h(P) + (d_1 + d_2 + \dots + d_m)\log 2\big)
 + (s+1)\big[(d_1 + d_2 + \dots + d_m)\log 2 + \log(d_m + 1)\big]
 < (s+1)\big(h(P) + 4d_1\big), \]
where in the last step we also used $\log(d_m + 1) \le d_m \le \frac{1}{2} d_1$. This proves (6.4).
We obtain a lower bound for ind(W ) by expanding the determinant for W , using
properties 6.3.2 (a), (b), and (c) of the index to estimate from below the index of
W in terms of the index of a typical term in the expansion.
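In what follows, we abbreviate $\mathrm{ind}(\cdot)$ for $\mathrm{ind}(\cdot; d; \xi)$. By 6.3.2 (c), we have the
estimate
\[ \mathrm{ind}(\partial_{\mu,\nu} P) \ge \mathrm{ind}(P) - \frac{\mu_1}{d_1} - \dots - \frac{\mu_{m-1}}{d_{m-1}} - \frac{\nu}{d_m}
 \ge \mathrm{ind}(P) - \frac{d_m}{d_{m-1}} - \frac{\nu}{d_m}
 \ge \mathrm{ind}(P) - \frac{\nu}{d_m} - \sigma. \]
Moreover, since the index is never negative, we can improve this to
\[ \mathrm{ind}(\partial_{\mu,\nu} P) \ge \max\Big( \mathrm{ind}(P) - \frac{\nu}{d_m},\, 0 \Big) - \sigma. \tag{6.5} \]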
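By 6.3.2 (a), (b), inequality (6.5), and expanding $W$ by means of the Laplace
expansion, we get (here $\pi$ runs over permutations of $0,\dots,s$)
\[ \begin{aligned}
\mathrm{ind}(W) &\ge \min_\pi \sum_{i=0}^{s} \mathrm{ind}(\partial_{\mu_i,\pi(i)} P) \\
 &\ge \min_\pi \sum_{i=0}^{s} \Big( \max\Big( \mathrm{ind}(P) - \frac{\pi(i)}{d_m},\, 0 \Big) - \sigma \Big) \\
 &= \sum_{i=0}^{s} \Big( \max\Big( \mathrm{ind}(P) - \frac{i}{d_m},\, 0 \Big) - \sigma \Big) \\
 &\ge (s+1) \min\Big( \tfrac{1}{2}\,\mathrm{ind}(P),\, \tfrac{1}{2}\,\mathrm{ind}(P)^2 \Big) - (s+1)\sigma.
\end{aligned} \tag{6.6} \]
Here the last step comes from the easy inequality
\[ \sum_{i=0}^{s} \max\Big( t - \frac{i}{s},\, 0 \Big) \ge (s+1) \min\Big( \tfrac{1}{2}\,t,\, \tfrac{1}{2}\,t^2 \Big). \]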
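The easy inequality just used can be tested numerically. The sketch below is an illustration only, under the assumption $s \ge 1$ and $t \ge 0$; the function names are hypothetical. It evaluates both sides in exact rational arithmetic on a grid of values of $t$; a grid check is of course no substitute for a proof.

```python
from fractions import Fraction

def lhs(t, s):
    """Sum over i = 0..s of max(t - i/s, 0), in exact rational arithmetic."""
    return sum(max(t - Fraction(i, s), 0) for i in range(s + 1))

def rhs(t, s):
    """(s + 1) * min(t/2, t**2/2)."""
    return (s + 1) * min(t / 2, t * t / 2)

def holds_on_grid(max_s=12, steps=40, t_max=3):
    """Test lhs >= rhs for s = 1..max_s and t = k/steps in [0, t_max]."""
    for s in range(1, max_s + 1):
        for k in range(steps * t_max + 1):
            t = Fraction(k, steps)
            if lhs(t, s) < rhs(t, s):
                return False
    return True

print(holds_on_grid())
# Equality occurs at t = 1: both sides equal (s + 1)/2.
print(lhs(Fraction(1), 3), rhs(Fraction(1), 3))
```

For $t \ge 1$ the left-hand side equals $(s+1)(t - \frac12)$, which explains why the minimum switches from $\frac12 t^2$ to $\frac12 t$ at $t = 1$.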

Next, we obtain an upper bound for ind(W ) by noting that


ind(W ) = ind(U ) + ind(V ) (6.7)

and using Roth’s lemma inductively on the number of variables to estimate ind(U )
and ind(V ).
Suppose we have proved Roth’s lemma for polynomials in l < m variables. We
apply the inductive assumption of Roth’s lemma to U and V but with (s + 1)dj
in place of dj . In view of the bounds obtained in (6.4) on page 161 for h(U ) and
h(V ), the hypotheses of Roth’s lemma are satisfied. Therefore, we obtain
\[ \mathrm{ind}(U) \le 2(m-1)(s+1)\,\sigma^{1/2^{m-2}}, \qquad \mathrm{ind}(V) \le (s+1)\,\sigma \]
(use the better bound given in 6.3.9 for the case $m = 1$ when estimating $\mathrm{ind}(V)$).
We insert these two estimates in (6.7) obtaining an upper bound for $\mathrm{ind}(W)$, and
compare with the lower bound (6.6). This gives
\[ \min\big(\mathrm{ind}(P),\, \mathrm{ind}(P)^2\big) \le 4(m-1)\,\sigma^{1/2^{m-2}} + 4\sigma. \]
In any case $\mathrm{ind}(P) \le m$, hence the preceding bound may be simplified to
\[ \mathrm{ind}(P)^2 \le 4m(m-1)\,\sigma^{1/2^{m-2}} + 4m\sigma \le 4m^2\,\sigma^{1/2^{m-2}}. \]
This completes the induction and the proof of Roth’s lemma. 

6.4. Proof of Roth’s theorem

The general case of Roth’s theorem is proved along similar lines as outlined in
Section 6.2, but the presence of several places means that we can only compare
approximations which have a similar behaviour at each place $v$. This creates
additional complications.
We shall prove the following statement:
Theorem 6.4.1. Let $K$ be a number field with a finite set $S$ of places. Let $F$ be a
finite-dimensional extension of $K$ and, for $v \in S$, let $\alpha_v \in F$. We extend $|\ |_v$ to
an absolute value $|\ |_{v,K}$ of $F$. Then for any $\kappa > 2$, there are only finitely many
$\beta \in K$ such that
\[ \prod_{v \in S} \min(1, |\beta - \alpha_v|_{v,K}) \le H(\beta)^{-\kappa}. \tag{6.8} \]
This statement implies Roth’s theorem in 6.2.3 by embedding $\overline{K}$ into $\overline{K_v}$ for
each $v \in S$, extending $|\ |_v$ to $\overline{K_v}$ and using the restriction $|\ |_{v,K}$ to the subfield
$F = K(\{\alpha_v \mid v \in S\}) \subset \overline{K}$. In fact, Theorem 6.4.1 is equivalent to Roth’s
theorem; we leave the proof to the reader, since the converse implication will not
be used here.
Proof of Theorem 6.4.1: We assume that (6.8) is satisfied for infinitely many β ∈
K and obtain a contradiction at the end.
We need a reduction, due to Mahler, which allows us to restrict our considerations
to approximations with similar behaviour at every place v ∈ S .

6.4.2. Step 0: Approximation classes.

We define
\[ \Lambda(\beta) := \prod_{v \in S} \min(1, |\beta - \alpha_v|_{v,K}). \]
A $\beta \in K$ for which $\Lambda(\beta) < 1$ is said to be a non-trivial approximation.

Consider, for a non-trivial $\beta$, the vector
\[ \big( \log \min(1, |\beta - \alpha_v|_{v,K}) / \log \Lambda(\beta) \big)_{v \in S}. \]
This is a point in the $|S|$-dimensional unit cube lying on the hyperplane where
the sum of the coordinates is 1. We partition this cube by means of a grid of
semi-open subcubes of side $1/N$ where $N$ is a positive integer, and classify $\beta$
according to the subcube containing the corresponding vector. The set of
non-trivial approximations $\beta$ determining the same subcube is called an approximation
class. The quantity $1/N$ is the size of the approximation class.
Let $\lambda := (\lambda_v)_{v \in S}$ be the south-west corner of a subcube, namely the point
\[ \lambda = \big( \lfloor N x_v \rfloor / N \big)_{v \in S} \]
with $x$ any point in the subcube. Then we denote by $Q(\lambda)$ the corresponding
subcube and by $C(\lambda; N)$ the approximation class of size $1/N$ determined by $Q(\lambda)$.
For every $\beta \in C(\lambda; N)$ and $v \in S$, we have by definition
\[ \Lambda(\beta)^{\lambda_v + \frac{1}{N}} < \min(1, |\beta - \alpha_v|_{v,K}) \le \Lambda(\beta)^{\lambda_v}. \tag{6.9} \]
Note also that
\[ 1 - \frac{|S|}{N} \le \sum_{v \in S} \lambda_v \le 1, \tag{6.10} \]
because $Q(\lambda)$ always contains a point $x$ with $\sum_{v \in S} x_v = 1$ if $C(\lambda; N)$ is not empty.
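To make the classification concrete, here is a small Python sketch that computes the south-west corner $\lambda$ of the subcube attached to an approximation. Everything in it is an assumption made for illustration: the function name `approximation_class`, the labels of the places, and the sample local values standing in for $\min(1, |\beta - \alpha_v|_{v,K})$. Floating-point logarithms are used, so a vector lying very close to a face of the grid could be misclassified.

```python
import math
from fractions import Fraction

def approximation_class(local_values, N):
    """local_values: dict v -> min(1, |beta - alpha_v|_v), each in (0, 1],
    with product < 1 (a non-trivial approximation).
    Returns the corner lambda = (floor(N * x_v) / N)_v as exact fractions."""
    log_Lambda = sum(math.log(a) for a in local_values.values())
    if log_Lambda >= 0:
        raise ValueError("trivial approximation: Lambda(beta) = 1")
    corner = {}
    for v, a in local_values.items():
        x_v = math.log(a) / log_Lambda      # coordinate in the unit cube
        corner[v] = Fraction(math.floor(N * x_v), N)
    return corner

# Hypothetical data: three places, the third one contributing nothing.
sample = {"v1": 1e-6, "v2": 1e-3, "v3": 1.0}
lam = approximation_class(sample, N=10)
print(lam)                 # corner (3/5, 3/10, 0)
print(sum(lam.values()))   # 9/10, between 1 - |S|/N = 7/10 and 1, as in (6.10)
```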
Lemma 6.4.3. The number of approximation classes of size $1/N$ determined by
non-trivial approximations does not exceed
\[ \binom{N + |S|}{|S|} < 2^{N + |S|}. \]
Proof: Consider an approximation class $C(\lambda; N)$. Then $n_v := N\lambda_v$ is a
non-negative integer and, by (6.10), we have
\[ \sum_{v \in S} n_v \le N. \]
The number of solutions of this inequality is $\binom{N + |S|}{|S|}$.
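The count in this proof is easy to confirm by enumeration for small parameters. The sketch below (with a hypothetical function name) counts the tuples $(n_v)_{v \in S}$ of non-negative integers with $\sum n_v \le N$ and compares the result with the binomial coefficient and with the cruder bound $2^{N+|S|}$.

```python
from itertools import product
from math import comb

def count_corners(N, size_S):
    """Number of tuples of size_S non-negative integers with sum <= N."""
    return sum(1 for n in product(range(N + 1), repeat=size_S) if sum(n) <= N)

for N, size_S in [(4, 1), (5, 2), (6, 3)]:
    print(N, size_S, count_corners(N, size_S), comb(N + size_S, size_S), 2 ** (N + size_S))
```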

6.4.4. Choosing independent solutions.


Since by hypothesis $\Lambda(\beta) \le H(\beta)^{-\kappa}$ has infinitely many solutions, the preceding
lemma shows that for any $N$ there is an approximation class $C(\lambda; N)$ containing
infinitely many such $\beta$’s.
Let $\beta_1,\dots,\beta_m$ be elements of $K$ and let $M \ge 2$. We say that the $\beta_j$’s are
$(L, M)$-independent if $h(\beta_1) \ge L$ and $h(\beta_{j+1}) \ge M h(\beta_j)$ for $j = 1,\dots,m-1$.
By Northcott’s theorem in 1.6.8, any infinite sequence in K contains an infinite
subsequence of (L, M )-independent elements, therefore for any N and L, M
we can find an infinite subsequence of (L, M )-independent elements belonging to
a fixed approximation class C(λ; N ).
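The extraction of an $(L, M)$-independent subsequence is a greedy procedure on heights. The following sketch illustrates it on a finite list of made-up heights; the function name and the sample data are hypothetical. In the text the sequence is infinite and Northcott’s theorem guarantees that the heights are unbounded, so the greedy choice never gets stuck, whereas on a finite list it may run out of candidates.

```python
def independent_subsequence(heights, L, M, m):
    """Greedily pick m heights with h_1 >= L and h_{j+1} >= M * h_j.
    Returns the chosen heights, or None if the finite input is exhausted."""
    chosen = []
    for h in sorted(heights):
        threshold = L if not chosen else M * chosen[-1]
        if h >= threshold:
            chosen.append(h)
            if len(chosen) == m:
                return chosen
    return None

heights = [1.5, 3.0, 4.0, 9.0, 10.0, 25.0, 80.0]   # hypothetical values of h(beta)
print(independent_subsequence(heights, L=3.0, M=2.0, m=4))   # [3.0, 9.0, 25.0, 80.0]
print(independent_subsequence(heights, L=3.0, M=2.0, m=5))   # None
```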
6.4.5. Step I: The auxiliary polynomial.
Let $D$ be a large real number, which in the end will tend to $\infty$, and choose
\[ d_j = \lfloor D/h(\beta_j) \rfloor, \qquad j = 1,\dots,m. \]
Let $t = (t_v)_{v \in S}$ with $t_v = (\frac{1}{2} - \varepsilon)m$ and let $\alpha_v := (\alpha_v,\dots,\alpha_v) \in F^m$,
$\beta = (\beta_1,\dots,\beta_m) \in K^m$. We also abbreviate $r := [F : K]$.

We want to apply Lemma 6.3.4 to this situation. By Lemma 6.3.5, we have
\[ r V_m(t) = r\,|S|\,V_m\big( (\tfrac{1}{2} - \varepsilon) m \big) < r\,|S|\,e^{-6m\varepsilon^2} \le \tfrac{1}{2} \]
provided
\[ m > \frac{\log(2r|S|)}{6\varepsilon^2}, \]
which we shall assume for the rest of this section. Thus the hypothesis of Lemma
6.3.4 is verified. We estimate (b) of Lemma 6.3.4 by noting that $rV_m(t)/(1 - rV_m(t)) \le 1$.
In this way we obtain a non-trivial polynomial $P$ with coefficients
in $K$, partial degrees at most $d_1,\dots,d_m$, such that $\mathrm{ind}(P; d; \alpha_v) \ge (\frac{1}{2} - \varepsilon)m$
for $v \in S$, and
\[ h(P) \le \Big( \sum_{v \in S} \sum_{j=1}^{m} (h(\alpha_v) + \log 2)/h(\beta_j) \Big) D + o(D) \tag{6.11} \]
as $D \to \infty$. If we define
\[ C_1 := |S| \big( \max_{v \in S} h(\alpha_v) + \log 2 \big), \tag{6.12} \]
we obtain for large $D$ the bound
\[ h(P) \le 2C_1 D/L. \tag{6.13} \]
6.4.6. Step II: Application of Roth’s lemma.
We would like $P$ to have the additional property that $P(\beta) \ne 0$. In order to do
this, we apply Roth’s lemma to show that the polynomial $P$ so constructed does
not vanish too much at $\beta$ if $\beta$ is $(L, M)$-independent and $L$, $M$ are large, and
then work with a suitable derivative of $P$ rather than $P$ itself. The details are as
follows.
Let $0 < \sigma \le \frac{1}{2}$. By Roth’s lemma in 6.3.7, we have
\[ \mathrm{ind}(P; d; \beta) \le 2m\,\sigma^{1/2^{m-1}} \]
provided $d_{j+1}/d_j \le \sigma$ and $d_j h(\beta_j) \ge \sigma^{-1}(h(P) + 4md_1)$.
In our case $d_j = \lfloor D/h(\beta_j) \rfloor \sim D/h(\beta_j)$ and $h(\beta_{j+1}) \ge M h(\beta_j)$, therefore the
first condition $d_{j+1}/d_j \le \sigma$ is verified if
\[ M \ge 2\sigma^{-1} \]
and $D$ is large enough, which we shall suppose. Similarly, using $d_j h(\beta_j) \sim D$,
$d_1 \le D/h(\beta_1) \le D/L$ and (6.13) we see that the condition
$d_j h(\beta_j) \ge \sigma^{-1}(h(P) + 4md_1)$ is verified for large $D$ if
\[ D \ge \sigma^{-1}\,\frac{2C_1 + 5m}{L}\,D, \]
which is so if $L \ge (2C_1 + 5m)\sigma^{-1}$.
We conclude that, if $M \ge 2\sigma^{-1}$, $L \ge (2C_1 + 5m)\sigma^{-1}$ and $D$ is large enough,
we have
\[ \mathrm{ind}(P; d; \beta) \le 2m\,\sigma^{1/2^{m-1}}. \]
We choose $\sigma = \varepsilon^{2^{m-1}}$ and deduce that there is $\mu$ such that $\partial_\mu P(\beta) \ne 0$ and
\[ \sum_{j=1}^{m} \frac{\mu_j}{d_j} \le 2m\varepsilon. \]
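To get a feeling for the size of the parameters in Steps I and II, the following sketch evaluates $m$, $\sigma = \varepsilon^{2^{m-1}}$ and the thresholds $M \ge 2\sigma^{-1}$, $L \ge (2C_1 + 5m)\sigma^{-1}$ for sample data. The function name and the sample values of $\varepsilon$, $r$, $|S|$ and $C_1$ are assumptions made only for illustration. Since $\sigma$ underflows immediately in floating point, the code works with logarithms, and it takes for $m$ the smallest integer exceeding $\log(2r|S|)/(6\varepsilon^2)$.

```python
import math

def step_two_parameters(eps, r, size_S, C1):
    """Return (m, log sigma, log of minimal M, log of minimal L, index bound)
    for sigma = eps**(2**(m-1)); natural logarithms throughout."""
    ratio = math.log(2 * r * size_S) / (6 * eps**2)
    m = math.floor(ratio) + 1                       # smallest integer > ratio
    log_sigma = 2 ** (m - 1) * math.log(eps)        # log of eps**(2**(m-1))
    log_M_min = math.log(2) - log_sigma             # M >= 2 / sigma
    log_L_min = math.log(2 * C1 + 5 * m) - log_sigma  # L >= (2*C1 + 5*m) / sigma
    index_bound = 2 * m * math.exp(log_sigma / 2 ** (m - 1))  # equals 2*m*eps
    return m, log_sigma, log_M_min, log_L_min, index_bound

m, log_sigma, log_M_min, log_L_min, index_bound = step_two_parameters(
    eps=0.1, r=2, size_S=1, C1=math.log(2))
print(m)            # 24
print(log_sigma)    # about -1.93e7, so sigma is astronomically small
print(index_bound)  # about 4.8 = 2 * m * eps
```

Even for these modest inputs the required $L$ and $M$ have roughly eight million decimal digits, which illustrates why the argument gives no usable bound on heights.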

Now recall that by construction we have $\mathrm{ind}(P; d; \alpha_v) \ge (\frac{1}{2} - \varepsilon)m$, for $v \in S$.
Let $Q = \partial_\mu P$. We have, with $C_1$ given by (6.12):

Lemma 6.4.7. Suppose that $\beta_1,\dots,\beta_m$ are $(L, M)$-independent with
\[ m > \log(2r|S|)/(6\varepsilon^2) \quad\text{and}\quad L \ge (2C_1 + 5m)\,\varepsilon^{-2^{m-1}}, \quad M \ge 2\varepsilon^{-2^{m-1}}. \]
Then, for every sufficiently large $D$, there is a polynomial $Q \in K[x_1,\dots,x_m]$,
with partial degrees at most $d_j = \lfloor D/h(\beta_j) \rfloor$, such that:
(a) $\mathrm{ind}(Q; d; \alpha_v) \ge (\frac{1}{2} - 3\varepsilon)m$ for $v \in S$;
(b) $Q(\beta) \ne 0$;
(c) $h(Q) \le 4C_1 D/L$.
Proof: We have already verified statement (b), and 6.3.2 (c) implies (a). In order
to prove (c), it suffices to note (use (6.1) on page 156) that the differential operator
$\partial_\mu$ increases coefficients by not more than $2^{d_1 + \dots + d_m} < 4^{d_1}$. Then use (6.13).
6.4.8. Step III: The upper bound.
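We begin by giving an upper bound for $\log|Q(\beta)|_v$, for each place $v$. As usual,
we abbreviate $\log^+ t = \max(0, \log t)$ for $t \ge 0$, and we also define
\[ \varepsilon_v := \begin{cases} [K_v : \mathbb{Q}_v]/[K : \mathbb{Q}] & \text{if } v \text{ is archimedean} \\ 0 & \text{if } v \text{ is non-archimedean.} \end{cases} \]
If $v \notin S$, we estimate $\log|Q(\beta)|_v$ trivially, namely
\[ \log|Q(\beta)|_v \le \log|Q|_v + \sum_{j=1}^{m} \big( \log^+|\beta_j|_v + \varepsilon_v\,o(1) \big)\, d_j. \tag{6.14} \]
Here the term $o(1)$ tends to 0 as $d_j \to \infty$.

If instead $v \in S$, we expand $Q$ in Taylor series with center $\alpha_v$, obtaining
\[ Q(\beta) = \sum_{j} \partial_j Q(\alpha_v)\,(\beta_1 - \alpha_v)^{j_1} \cdots (\beta_m - \alpha_v)^{j_m}, \]
and estimate each term in the Taylor expansion. Now note that by Lemma 6.4.7
(a) we have
\[ \partial_j Q(\alpha_v) = 0 \quad\text{if}\quad \frac{j_1}{d_1} + \dots + \frac{j_m}{d_m} < \big( \tfrac{1}{2} - 3\varepsilon \big) m ; \tag{6.15} \]
and also a direct estimate yields
\[ \log|\partial_k Q(\alpha_v)|_{v,K} \le \log|Q|_v + \sum_{j=1}^{m} \Big( \log^+|\alpha_v|_{v,K}\,(d_j - k_j) + \varepsilon_v (\log 2 + o(1))\, d_j \Big). \tag{6.16} \]
We have the easily verified inequality
\[ \log|a - b|_{v,K} \le -\log^+ \frac{1}{|a - b|_{v,K}} + \log^+|a|_{v,K} + \log^+|b|_{v,K} + \varepsilon_v \log 2 \]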
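(the first term on the right-hand side suffices if $|a - b|_{v,K} \le 1$, while the remaining
terms take care of the case $|a - b|_{v,K} > 1$), hence using (6.16) we find
\[ \log\Big| \partial_k Q(\alpha_v) \prod_{j=1}^{m} (\beta_j - \alpha_v)^{k_j} \Big|_{v,K}
 \le -\sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} + \log|Q|_v
 + \sum_{j=1}^{m} \big( \log^+|\beta_j|_v + \log^+|\alpha_v|_{v,K} + (\log 4 + o(1))\varepsilon_v \big)\, d_j. \]
Now we can estimate $\log|Q(\beta)|_v$:
\[ \begin{aligned}
\log|Q(\beta)|_v &= \log\Big| \sum_{k} \partial_k Q(\alpha_v) \prod_{j=1}^{m} (\beta_j - \alpha_v)^{k_j} \Big|_{v,K} \\
 &\le \max_{k} \log\Big| \partial_k Q(\alpha_v) \prod_{j=1}^{m} (\beta_j - \alpha_v)^{k_j} \Big|_{v,K} + \varepsilon_v \sum_{j=1}^{m} \log(d_j + 1) \\
 &\le -\min{}' \Big( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Big) + \log|Q|_v \\
 &\qquad + \sum_{j=1}^{m} \big( \log^+|\beta_j|_v + \log^+|\alpha_v|_{v,K} + (\log 4 + o(1))\varepsilon_v \big)\, d_j,
\end{aligned} \tag{6.17} \]
where $\min'$ means that the minimum is taken over $(k_1,\dots,k_m)$ not as in (6.15),
that is with
\[ \frac{k_1}{d_1} + \dots + \frac{k_m}{d_m} \ge \big( \tfrac{1}{2} - 3\varepsilon \big) m. \tag{6.18} \]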

We put together (6.14) and (6.17), note that $\sum_v \varepsilon_v = 1$, and find
\[ \sum_{v \in M_K} \log|Q(\beta)|_v \le -\sum_{v \in S} \min{}' \Big( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Big)
 + h(Q) + \sum_{j=1}^{m} \Big( h(\beta_j) + \sum_{v \in S} \log^+|\alpha_v|_{v,K} + \log 4 + o(1) \Big)\, d_j. \]
Lemma 6.4.7 (c) provides an upper bound for $h(Q)$, and we have already observed
that $\sum_j d_j \le 2d_1 \le 2D/L + o(D/L)$. Also $h(\beta_j)d_j \sim D$ and $h(\beta_j) \ge L$. Hence
the last displayed inequality simplifies to
\[ \sum_{v \in M_K} \log|Q(\beta)|_v \le -\sum_{v \in S} \min{}' \Big( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Big)
 + \Big( m + \frac{C_2}{L} \Big) D + o(D) \tag{6.19} \]
with for example $C_2 = 4C_1 + 2\log 4 + 2|S| \max_{v \in S} \log^+|\alpha_v|_{v,K}$.
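It remains to estimate the minimum in the last displayed inequality. It is here that
we use the fact that the approximations $\beta_j$ are of similar type. Let $C(\lambda; N)$ be the
approximation class of the approximations $\beta_j$. Recall that
\[ \Lambda(\beta_j) := \prod_{v \in S} \min(1, |\beta_j - \alpha_v|_{v,K}). \]
By the hypothesis (6.8) on page 163 and by (6.9) on page 164, we have
\[ \lambda_v\,\kappa\,h(\beta_j) \le \lambda_v \log\frac{1}{\Lambda(\beta_j)} \le \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} ; \]
therefore, using $h(\beta_j)d_j \sim D$, we get
\[ \begin{aligned}
\sum_{v \in S} \min{}' \Big( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Big)
 &\ge \sum_{v \in S} \min{}' \Big( \sum_{j=1}^{m} \lambda_v\,\kappa\,h(\beta_j)\,k_j \Big) \\
 &= \kappa \Big( \sum_{v \in S} \lambda_v \Big) \min{}' \Big( \sum_{j=1}^{m} h(\beta_j)\,d_j\,\frac{k_j}{d_j} \Big) \\
 &\sim D\kappa \Big( \sum_{v \in S} \lambda_v \Big) \min{}' \Big( \sum_{j=1}^{m} \frac{k_j}{d_j} \Big).
\end{aligned} \]
Now (6.10) on page 164 gives $\sum_{v \in S} \lambda_v \ge 1 - |S|/N$ and (6.18) gives
\[ \min{}' \Big( \sum_{j=1}^{m} k_j/d_j \Big) \ge \big( \tfrac{1}{2} - 3\varepsilon \big) m. \]
We substitute in the last displayed inequality and find
\[ \sum_{v \in S} \min{}' \Big( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Big)
 \ge \kappa \Big( 1 - \frac{|S|}{N} \Big) \big( \tfrac{1}{2} - 3\varepsilon \big) mD + o(D). \tag{6.20} \]
Finally, we substitute (6.20) into (6.19), obtaining the desired majorization
\[ \sum_{v \in M_K} \log|Q(\beta)|_v \le -\kappa \Big( 1 - \frac{|S|}{N} \Big) \big( \tfrac{1}{2} - 3\varepsilon \big) mD + \Big( m + \frac{C_2}{L} \Big) D + o(D). \tag{6.21} \]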
6.4.9. Step IV: The lower bound.

Since $Q(\beta) \ne 0$ is in $K$, the lower bound for $Q(\beta)$ is given in a most elegant
way by the product formula
\[ \sum_{v \in M_K} \log|Q(\beta)|_v = 0. \]

6.4.10. Step V: Comparison of the upper and lower bounds.

Comparison of the upper bound (6.21) and the product formula in 6.4.9 yields
(after division by $mD$ and letting $D \to \infty$)
\[ -\kappa \Big( 1 - \frac{|S|}{N} \Big) \big( \tfrac{1}{2} - 3\varepsilon \big) + 1 + \frac{C_2}{mL} \ge 0, \]
which (assuming $\frac{1}{2} > 3\varepsilon$) we may rewrite as
\[ \kappa \le \Big( 1 + \frac{C_2}{L} \Big) \Big( 1 - \frac{|S|}{N} \Big)^{-1} \big( \tfrac{1}{2} - 3\varepsilon \big)^{-1}. \tag{6.22} \]
The right-hand side of this inequality tends to 2 as $\varepsilon \to 0$, $N \to \infty$ and $L \to \infty$,
contradicting $\kappa > 2$.
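The limit behaviour of the right-hand side of (6.22) can be observed numerically. In the sketch below the function name and all sample values (in particular $C_2 = 10$ and $|S| = 3$) are hypothetical; the point is only that the bound decreases to 2 from above, so that for any fixed $\kappa > 2$ it eventually drops below $\kappa$.

```python
def kappa_bound(C2, L, size_S, N, eps):
    """Right-hand side of (6.22); requires N > |S| and 0 < eps < 1/6."""
    return (1 + C2 / L) / ((1 - size_S / N) * (0.5 - 3 * eps))

# Let L and N grow and eps shrink simultaneously.
bounds = [kappa_bound(C2=10.0, L=10.0**k, size_S=3, N=10 ** (k + 1), eps=10.0 ** -(k + 1))
          for k in range(1, 6)]
for b in bounds:
    print(b)
# First parameter choice (index k) for which kappa = 2.1 is contradicted:
print(next(k for k, b in enumerate(bounds, start=1) if b < 2.1))
```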

For the reader’s convenience, we state in which order these various parameters
need to be chosen. First, note that the constants C1 and C2 depend only on the
given data K, F, (αv )v∈S in Theorem 6.4.1. Arguing by contradiction, we assume
that there are infinitely many solutions β of (6.8) on page 163 for some κ > 2.
We choose $\varepsilon > 0$ so small that $(\frac{1}{2} - 3\varepsilon)^{-1} < \kappa$. Then we fix $m$, $L$, $M$ so that the
assumptions of Lemma 6.4.7 are satisfied. Moreover, we may assume that L, N
are so large that (6.22) above is not satisfied. There is one approximation class
of size 1/N containing infinitely many solutions β of (6.8) on page 163, and
in particular containing m (L, M )-independent solutions β1 , . . . , βm . Once all
these data have been fixed the new parameter D is introduced, assumed very large,
and the polynomial Q satisfying (a) to (c) of Lemma 6.4.7 is constructed. This
leads to the asymptotic inequality (6.21) and, letting D → ∞ , to (6.22), which is
the desired contradiction. 

6.5. Further results

We describe here some additional results which complement Roth’s theorem.

6.5.1. As noted by Vojta, it is possible to give a refinement of Theorem 6.2.3 in which
we allow the point $\alpha$ to vary with $\beta$; this is referred to as Roth’s theorem with moving
targets (because the target $\alpha$ of the approximation $\beta$ is allowed to change with $\beta$). We
have:

Theorem 6.5.2. Let $K$ be a number field and let $S$ be a finite subset of $M_K$. Let $F$ be a
finite extension of $K$ and for each $v \in S$, extend $|\ |_v$ to an absolute value $|\ |_{v,K}$ of $F$.
Let $\kappa > 2$. Then we cannot have an infinite sequence of points $(\alpha_j, \beta_j)$ such that:
(a) $\alpha_j = (\alpha_{vj})_{v \in S} \in F^{|S|}$, $\beta_j \in K$;
(b) $1 + \sum_{v \in S} h(\alpha_{vj}) = o(h(\beta_j))$ as $j \to \infty$;
(c) $\prod_{v \in S} \min(1, |\beta_j - \alpha_{vj}|_{v,K}) \le H(\beta_j)^{-\kappa}$.

Proof: We replace $\alpha$ by $\alpha_j$ and define $\alpha_v = (\alpha_{v1},\dots,\alpha_{vm})$ in the preceding arguments.
Now (6.11) on page 166 will hold with the new upper bound
\[ \Big( \sum_{v \in S} \sum_{j=1}^{m} (h(\alpha_{vj}) + \log 2)/h(\beta_j) \Big) D + o(D). \]
By our assumption (b) and by going to a subsequence, we may assume that
\[ \sum_{v \in S} \sum_{j=1}^{m} (h(\alpha_{vj}) + \log 2)/h(\beta_j) \le m/L \]
and the proof goes through as before.


6.5.3. Theorem 6.2.3 and Theorem 6.5.2 can be made quantitative in the sense that we can
bound the number of solutions β in Theorem 6.2.3 and the length of the sequence (αj , βj )
in Theorem 6.5.2; in the case of Theorem 6.5.2, we must replace the o(h(βj )) appearing
in condition (b) by an explicit quantity δ(κ)h(βj ), where δ(κ) → 0 sufficiently fast as
κ → 2.
The proof of quantitative statements of this type utilizes in an essential way a strong gap
principle, which provides exponential growth for the sequence of heights of solutions in a
fixed approximation class. A simple statement is as follows:

Theorem 6.5.4. Let $K$ be a number field and $S$ be a finite subset of $M_K$. Let $F$ be a
finite-dimensional extension of $K$ and, for $v \in S$, let $\alpha_v \in F$. We extend $|\ |_v$ to an
absolute value $|\ |_{v,K}$ of $F$. Let $\beta, \beta' \in K$ be different elements in the same approximation
class of size $1/N$ as defined in 6.4.2, such that $\Lambda(\beta) \le H(\beta)^{-\kappa}$, $\Lambda(\beta') \le H(\beta')^{-\kappa}$ and
$h(\beta') \ge h(\beta)$. Then
\[ h(\beta') \ge -\log 4 + \Big( \Big( 1 - \frac{|S|}{N} \Big) \kappa - 1 \Big) h(\beta). \]

Remark 6.5.5. If $N$ is so large that $c := \big( 1 - \frac{|S|}{N} \big) \kappa - 1 > 1$, we obtain
$h(\beta') \ge c\,h(\beta) - \log 4$. Thus the sequence of logarithmic heights of solutions grows at least in
geometric progression, whence the name ‘strong gap principle’.
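The growth forced by the strong gap principle can be illustrated by iterating the lower bound $h \mapsto c\,h - \log 4$. The sketch below uses hypothetical names and sample values ($\kappa = 3$, $|S| = 2$, $N = 20$, starting height 10); it only tabulates the lower bounds that Theorem 6.5.4 gives for successive distinct solutions of increasing height in one approximation class, and it says nothing about whether such solutions exist.

```python
import math

def gap_constant(kappa, size_S, N):
    """c = (1 - |S|/N) * kappa - 1, as in Remark 6.5.5."""
    return (1 - size_S / N) * kappa - 1

def height_lower_bounds(h0, kappa, size_S, N, steps):
    """Iterate h -> c*h - log 4 starting from h0."""
    c = gap_constant(kappa, size_S, N)
    if c <= 1:
        raise ValueError("N must be so large that c > 1")
    bounds = [h0]
    for _ in range(steps):
        bounds.append(c * bounds[-1] - math.log(4))
    return bounds

print(gap_constant(3.0, 2, 20))                      # about 1.7
print(height_lower_bounds(10.0, 3.0, 2, 20, steps=5))
```

The sequence increases geometrically as soon as the starting height exceeds $\log 4/(c - 1)$, the fixed point of the iteration.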
Proof: For $v \in S$, we have $|\alpha_v - \beta'|_{v,K} < 1$ if and only if $|\alpha_v - \beta|_{v,K} < 1$ (since $\beta$, $\beta'$
are in the same approximation class). Passing from $S$ to $S' = \{v \in S \mid |\alpha_v - \beta|_{v,K} < 1\}$,
the quantities in the theorem do not change and the lower bound gets even better. So we
may assume that $|\alpha_v - \beta|_{v,K} < 1$ and $|\alpha_v - \beta'|_{v,K} < 1$ for $v \in S$. By (6.9) on page 164,
we have for $v \in S$
\[ \begin{aligned}
\log|\beta - \beta'|_v &= \log|(\alpha_v - \beta') - (\alpha_v - \beta)|_v \\
 &\le \max\big( \log|\alpha_v - \beta'|_{v,K},\, \log|\alpha_v - \beta|_{v,K} \big) + \varepsilon_v \log 2 \\
 &\le -\min\big( \kappa\lambda_v h(\beta'),\, \kappa\lambda_v h(\beta) \big) + \varepsilon_v \log 2 \\
 &= -\kappa\lambda_v h(\beta) + \varepsilon_v \log 2.
\end{aligned} \]
Now we sum over $v \in S$ and, using (6.10) on page 164, we find
\[ \sum_{v \in S} \log|\beta - \beta'|_v \le -\Big( 1 - \frac{|S|}{N} \Big) \kappa\,h(\beta) + \log 2. \]
Finally, the fundamental inequality (1.8) on page 20 and Proposition 1.5.15 give (note that
$\beta \ne \beta'$)
\[ \sum_{v \in S} \log|\beta - \beta'|_v \ge -h(\beta - \beta') \ge -h(\beta) - h(\beta') - \log 2 \]
and the result follows from the last two displayed inequalities.
Lemma 6.5.6. Let $K$ be a number field, $S$ a finite subset of $M_K$ and $F$ a finite extension
of $K$. Let $\kappa > 2$ and let $\alpha_v \in F$ for $v \in S$. Let $N$ be an integer so large that
$c := \big( 1 - \frac{|S|}{N} \big) \kappa - 1 > 1$ and let $X > \log 16/(c - 1)$ and $A > 1$. Then the inequality
\[ \prod_{v \in S} \min(1, |\beta - \alpha_v|_{v,K}) \le H(\beta)^{-\kappa} \]
has at most $\big\lceil \log A / \log\frac{c+1}{2} \big\rceil \binom{N + |S|}{|S|}$ solutions $\beta$ with $h(\beta) \in (X, AX]$.
Proof: The interval $(X, AX]$ is contained in the union of $\big\lceil \log A / \log\frac{c+1}{2} \big\rceil$ intervals, each
of type $\big( (\frac{c+1}{2})^k X,\, (\frac{c+1}{2})^{k+1} X \big]$.
By the pigeon-hole principle, if there were more than $\big\lceil \log A / \log\frac{c+1}{2} \big\rceil \binom{N + |S|}{|S|}$
approximations $\beta$ with height $h(\beta) \in (X, AX]$, we would find an interval of type $(Y, \frac{c+1}{2} Y]$
with $Y > (\log 16)/(c - 1)$, and more than $\binom{N + |S|}{|S|}$ approximations $\beta$, such that
$h(\beta) \in (Y, \frac{c+1}{2} Y]$. Now, using Lemma 6.4.3, another application of the pigeon-hole principle
would give two elements $\beta$ and $\beta'$ with $h(\beta), h(\beta') \in (Y, cY - \log 4]$ and in the same
approximation class $C(\lambda; N)$. This contradicts Remark 6.5.5.
6.5.7. It is now clear how to obtain a bound for the number of solutions of the inequality in
Theorem 6.4.1. Let $\varepsilon > 0$ be such that $(\frac{1}{2} - 3\varepsilon)^{-1} < \kappa$, let $m = \lceil \log(2r|S|)/(6\varepsilon^2) \rceil$,
$L \ge m(C_1 + 5)\,\varepsilon^{-2^{m-1}}$ and $M = 2\varepsilon^{-2^{m-1}}$ (as required in Lemma 6.4.7). We choose
$L$, $N$ so large that we contradict (6.22) on page 170. Then we cannot find $m$ solutions
$\beta_j$ ($j = 1,\dots,m$) that are $(L, M)$-independent and belong to a fixed approximation class
$C(\lambda; N)$.
Consider all solutions $\beta \in K$ of
\[ \prod_{v \in S} \min(1, |\beta - \alpha_v|_{v,K}) \le H(\beta)^{-\kappa} \]
in a fixed approximation class $C(\lambda; N)$ and group them in subsets $N_0, N_1, \dots, N_j, \dots$
as follows.
The first subset N0 contains all such solutions with h(β) ≤ L .
The other sets Nj are defined inductively: Suppose Nl is given for l ≤ j ; then Nj+1 is
obtained by first taking a solution βj+1 with smallest height and not in any Nl with l ≤ j ,
and then defining Nj+1 as the set of solutions β with h(βj+1 ) ≤ h(β) < M h(βj+1 ) . If
such a βj+1 does not exist, then Nl is empty for l > j .
The sequence $\beta_1, \beta_2, \dots$ of solutions so constructed is $(L, M)$-independent and belongs
to a fixed approximation class $C(\lambda; N)$. Since by hypothesis (6.22) on page 170 is not
verified, and since (6.22) on page 170 was obtained on the assumption that we had $m$
$(L, M)$-independent solutions with $m$, $L$, $M$ satisfying the hypothesis of Lemma 6.4.7, we see that
this sequence has fewer than $m$ elements. It follows that all solutions in $C(\lambda; N)$ lie in the
union of the sets $N_i$, $i = 0,\dots,m'$ for some $m' < m$. By the proof of Lemma 6.5.6, we
see that each $N_j$ contains at most $\big\lceil \log M / \log\frac{c+1}{2} \big\rceil$ elements. Moreover, Lemma 6.4.3
shows that the number of approximation classes does not exceed $\binom{N + |S|}{|S|}$.
Now we subdivide solutions into:

(a) very small solutions, namely those with
\[ h(\beta) \le \log 16/(c - 1) \quad\text{with}\quad c = \Big( 1 - \frac{|S|}{N} \Big) \kappa - 1 > 1; \]
(b) small solutions, those with log 16/(c − 1) < h(β) ≤ L ;
(c) large solutions, those with h(β) > L .

A bound for the number of very small solutions can be obtained by Northcott’s theorem
in 1.6.8.
A bound for the number of small solutions can be obtained by appealing to Lemma 6.5.6
for the interval $(\log 16/(c - 1), L]$, obtaining
\[ \Big\lceil \log L \Big/ \log\frac{c+1}{2} \Big\rceil \binom{N + |S|}{|S|}. \]
Note also that in this case we automatically have a bound for the height.
We cannot give a bound for the height of large solutions, but the above considerations show
that their number does not exceed
\[ m \Big\lceil \log M \Big/ \log\frac{c+1}{2} \Big\rceil \binom{N + |S|}{|S|}. \]
In order to get an idea of the size of this bound, we set $\delta = \min(\kappa - 2, 1)$. Then the above
quantity is majorized by
\[ (2r|S|)^{c_1 \delta^{-2}}\,(c_2/\delta)^{|S|} \tag{6.23} \]
for certain absolute constants $c_1$, $c_2$. For $|S| = 1$, this is due to H. Davenport and K.F.
Roth [79] with the bound $\exp(70 r^2 \delta^{-2})$.
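As a rough illustration of how large the count of large solutions is, the following sketch evaluates $m \lceil \log M / \log\frac{c+1}{2} \rceil \binom{N+|S|}{|S|}$ for sample data. The function name and the inputs are hypothetical, $m$ is taken to be the smallest integer exceeding $\log(2r|S|)/(6\varepsilon^2)$, and the code does not check that $N$ is large enough to contradict (6.22); it merely evaluates the formula. Since $M = 2\varepsilon^{-2^{m-1}}$ is far too large to store, only $\log M$ is computed, and for very small $\varepsilon$ even that overflows a float.

```python
import math

def large_solution_bound(eps, r, size_S, kappa, N):
    """Evaluate m * ceil(log M / log((c+1)/2)) * binom(N+|S|, |S|)
    with M = 2 * eps**(-2**(m-1)), working with log M only."""
    if eps <= 0 or 3 * eps >= 0.5 or 1 / (0.5 - 3 * eps) >= kappa:
        raise ValueError("need (1/2 - 3*eps)**(-1) < kappa")
    c = (1 - size_S / N) * kappa - 1
    if c <= 1:
        raise ValueError("need N so large that c > 1")
    m = math.floor(math.log(2 * r * size_S) / (6 * eps**2)) + 1
    log_M = math.log(2) + 2 ** (m - 1) * math.log(1 / eps)
    per_block = math.ceil(log_M / math.log((c + 1) / 2))
    return m * per_block * math.comb(N + size_S, size_S)

value = large_solution_bound(eps=0.05, r=2, size_S=1, kappa=3.0, N=10)
print(value)   # an integer of about 32 decimal digits for these inputs
```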
6.5.8. The bound for the number of large solutions obtained by the above method is rather
large as $\delta \to 0$. It is possible to improve this bound drastically by replacing Roth’s lemma
by more sophisticated results, notably Dyson’s lemma in several variables or Faltings’s
product theorem (see Theorem 7.6.4). For example, an improved bound obtained using
Dyson’s lemma is
\[ \big( \log(2r|S|/\delta) \big)^{c_3}\,(c_4/\delta)^{|S|+4} \]
for some absolute (and not very large) constants $c_3$, $c_4$. This is superior to (6.23) above.
6.5.9. No refinement of Roth’s theorem is known replacing κ > 2 by 2 + f (h(β)) , where
f (t) → 0 as t → ∞ . Heuristic arguments based on the analogy between diophantine
approximation and Nevanlinna theory suggest that f (t) = (1 + o(1))(log t)/t should be
admissible for this purpose (see Remark 13.2.25 and S. Lang and H. Trotter [175]).

On the other hand, something can still be said if the function f (t) tends to 0 sufficiently
slowly as t → ∞ . We have the Cugiani–Mahler theorem which we state without proof
(see E Bombieri and A.J. van der Poorten [36] for details):
Theorem 6.5.10. Let $K$ be a number field and let $S$ be a finite subset of $M_K$. For each
$v \in S$ let $\alpha_v \in F \cap K_v$, where $F$ is a finite extension of $K$ of degree $r$. Let also
\[ f(t) = 6 \sqrt{\log r}\; \sqrt[4]{\frac{\log\log(t + \log 4)}{\log(t + \log 4)}}. \]
Let $(\beta_j)$ be the sequence of solutions in $K$ of
\[ \prod_{v \in S} \min(1, |\beta_j - \alpha_v|_{v,K}) < (4H(\beta_j))^{-2 - f(h(\beta_j))}, \]
ordered by strictly increasing height. Then either the sequence $(\beta_j)$ is finite or
\[ \limsup_{j \to \infty} \frac{h(\beta_{j+1})}{h(\beta_j)} = \infty. \]

6.6. Bibliographical notes

For further information about diophantine approximation in positive characteristic,
see M. Kim, D.S. Thakur, and J.F. Voloch [161].
The problem of finding a refined Roth’s theorem as in 6.5.9 is still open. The first
partial result was obtained in 1958 by Cugiani for approximations in $\mathbb{Q}$, with
\[ f(t) = C(r)/\sqrt{\log\log t} \]
for a suitable function $C(r)$ of the degree $r$ of the algebraic number $\alpha$ (he also
gives a value for C(r)). A little later M. Cugiani [74] extended this result to
approximations in an arbitrary number field K . After the appearance of Cugiani’s
paper for the case of K = Q , K. Mahler [186], Appendix B, independently
extended the earlier result of Cugiani to the case of a general number field K .
Nowadays, this is referred to as the Cugiani–Mahler theorem.
More recently, alternative ways to Roth’s method for dealing with the difficult
point of the non-vanishing of the polynomial at the special point have been
obtained by H. Esnault and E. Viehweg [102] in their $m$-dimensional version of
Dyson’s lemma, and by G. Faltings [114] with his product theorem (see Theorem
7.6.4) and its quantitative versions due independently to J.-H. Evertse [106], R.G.
Ferretti [119], and G. Rémond [241]. We have not considered in this chapter these
very important improvements of Roth’s lemma and we have limited ourselves to a
rather elementary and explicit treatment along Roth’s original line.
Roth’s theorem with moving targets is proved in P. Vojta [315] and later M. Ru
and P. Vojta [250] extended the theorem to a version of Schmidt’s subspace
theorem with moving targets. Vojta’s argument, inspired by N. Steinmetz’s paper
on Nevanlinna’s second theorem with moving targets [290] (see also the
Bibliographical Notes in Chapter 13), obtains the result as a consequence of Schmidt’s
subspace theorem (see Theorem 7.2.2). The simple direct proof outlined here
appears to be new.
For the improved bound in 6.5.8 for the case $|S| = 1$, with explicit values of $c_3$,
$c_4$, see [36], Th.2 and the Note at the end of the proof. A similar result was
independently obtained by H. Luckhardt [184] combining Dyson’s lemma with
methods of mathematical logic. This was extended to several places by R. Gross
[132].
The improved function f (t) in Theorem 6.5.10 given here is obtained in [36],
Th.3. The improvements mentioned in 6.5.8 and 6.5.10 stem from replacing Roth’s
lemma with the deeper Dyson’s lemma in several variables of Esnault and Viehweg,
loc. cit.; the versions of Faltings’s product theorem mentioned above would suffice
as well. Except for this point, the structure of the proofs follows rather closely the
arguments in [79] and [74].
7 THE SUBSPACE THEOREM

7.1. Introduction

This chapter deals with Schmidt’s far-reaching extension of Roth’s theorem to
systems of inequalities in linear forms. This is not a routine generalization; entirely
new difficulties appear in the course of the proof, which Schmidt resolved by
introducing new ideas from Minkowski’s geometry of numbers.
In the case of systems of inequalities, it is possible to have infinitely many
solutions. However, even then a finiteness theorem still holds, in the sense that
solutions are contained in finitely many proper linear subspaces of the ambient space.
This paves the way for applying induction arguments.
As is the case for Thue’s and Roth’s theorems, again Schmidt’s subspace theorem
is ineffective in the sense that no bound can be placed a priori on the height of the
finitely many linear spaces which contain the solutions. At any rate, it remains a
very flexible tool with wide applicability in many questions; the reader will find
some unusual applications of the subspace theorem in this chapter.
It is also possible, as for Roth’s theorem, to give an effective bound for the num-
ber of linear spaces containing the solutions. This requires rather sophisticated
methods beyond the scope of this book and will not be done here.
An extension of Schmidt’s theorem with a formulation allowing a finite set of
places, entirely analogous to Ridout’s and Lang’s generalizations of Roth’s
theorem, was later obtained by Schlickewei. This is quite important in applications.
Section 7.2 contains several equivalent formulations of the subspace theorem.
Then in Section 7.3 we consider applications of the subspace theorem related to
diophantine approximation and we give an alternative proof of Siegel’s theorem on
integral points, due to Corvaja and Zannier. In Section 7.4 we apply the subspace
theorem to the generalized unit equation. As a consequence we give Schmidt’s
solution of the norm-form equation, Laurent’s solution of the Lang conjecture for
tori and a nice result of Corvaja and Zannier from elementary number theory.
The proof of the subspace theorem, in the general form obtained by Schlickewei,
is given in full in Section 7.5. It is quite involved, first developing the “Roth
machinery” and then using new ideas of Schmidt, with further simplifications by
Evertse.
In the last two sections, we describe, without proofs, further important results. We
begin with Faltings’s product theorem in Section 7.6, which is a very important
geometric alternative to Roth’s lemma. In Section 7.7 we describe a deep
extension of the subspace theorem obtained by Faltings and Wüstholz, dealing with
inequalities determined by forms of arbitrary degree; the finiteness statement
becomes that the solutions are contained in a proper algebraic subvariety of the
ambient space. Their method, quite different from Schmidt’s, uses deep tools from
algebraic geometry such as the theory of semi-stable bundles, Faltings’s product
theorem and an induction argument. Somewhat surprisingly, it was later shown
by Evertse and Ferretti that the Faltings–Wüstholz theorem can also be obtained
directly by an ingenious application of the subspace theorem.
In this chapter, the reader is assumed to be familiar with Chapter 6. For Section
7.4 it is also helpful to know the basic results of Chapters 3 and 5.

7.2. The subspace theorem

We begin with Schlickewei’s and Evertse’s formulation [108] of Schmidt’s
subspace theorem over number fields, in a projective setting.
7.2.1. We follow the notions introduced in Chapter 1 and used in Chapter 6: for
a place v of a number field K, we denote by $|\ |_v$ the absolute value normalized
(as in (1.6) on page 11) so that the product formula holds. As usual, we denote again by
$|\ |_v$ its extension to the completion $K_v$. The multiplicative height $H(\alpha) = e^{h(\alpha)}$
of an algebraic number is as defined in 1.5.7. The multiplicative projective height
$H(x)$ on algebraic points of $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ is defined as in 1.5.4. For $x \in K^{n+1}$, let
$|x|_v := \max_i |x_i|_v$.
Theorem 7.2.2. Let K be a number field, $S \subset M_K$ a finite set of places, $n \in \mathbb{N}$
and $\varepsilon > 0$. For every $v \in S$, let $\{L_{v0}, \dots, L_{vn}\}$ be a linearly independent set
of linear forms in the variables $X_0, \dots, X_n$ with K-algebraic coefficients in $K_v$.
Then there are finitely many hyperplanes $T_1, \dots, T_h$ of $\mathbb{P}^n_K$ such that the set of
solutions $x \in \mathbb{P}^n_K(K)$ of
\[ \prod_{v \in S} \prod_{i=0}^{n} \frac{|L_{vi}(x)|_v}{|x|_v} < H(x)^{-n-1-\varepsilon} \]
is contained in $T_1 \cup \dots \cup T_h$.
Remark 7.2.3. The condition that Lvi has K -algebraic coefficients in Kv , rather
than in some finite extension of Kv , is not restrictive. We may even reduce every-
thing to the case where the coefficients are in K , by the following argument. As
in Theorem 6.4.1, we may replace the completions Kv by a field of definition F
for all forms $L_{vi}$, using suitable extensions of $|\ |_v$ to the number field F. We may
assume that F is a finite Galois extension of K. Then use $L_{ui} = L_{vi}$ for the place
$u \in M_F$ corresponding to the chosen extension of v to F. For the other places
$w \mid v$ ($w \in M_F$), there is $\sigma \in \mathrm{Gal}(F/K)$ with $w = u \circ \sigma^{-1}$ (Corollary 1.3.5)
and we set $L_{wi} := \sigma(L_{vi})$. The subspace theorem for F and this set of linear
forms implies the subspace theorem in the relative setting, because the left-hand
sides are the same for $x \in K^{n+1}$.
Note that the field of definition of the subspaces can always be assumed to be K,
since we may replace $T_j$ by the linear span of all solutions $x \in \mathbb{P}^n_K(K)$ contained
in $T_j$.
7.2.4. A very important refinement of the theorem above is a quantitative version in
which we also control the number of hyperplanes needed to contain all solutions.
The first result of this type was obtained by Schmidt and the best bounds are in
J.-H. Evertse [108] and J.-H. Evertse and H.P. Schlickewei [111]. These bounds
are quite sharp and uniform with respect to the number field K, which is essential
in many applications; an example is Theorem 7.4.1. Here we will limit ourselves
to the qualitative statement above, referring to the original papers [108], [111], for
statements and proofs of the quantitative versions.
The proof will be given in Section 7.5. The following affine version of the subspace
theorem is worth noting. We denote by $O_{S,K}$ the ring of S-integers (see 1.5.10).
Corollary 7.2.5. Let K, S, $L_{vi}$ be as before with S containing all archimedean
places and let $\varepsilon > 0$. Then there are finitely many linear subspaces $T_1, \dots, T_h$ of
$K^{n+1}$ such that the set of S-integral solutions $x \in O_{S,K}^{n+1} \setminus \{0\}$ of
\[ \prod_{v \in S} \prod_{i=0}^{n} |L_{vi}(x)|_v < H(x)^{-\varepsilon} \]
is contained in $T_1 \cup \dots \cup T_h$.
Proof: Since $x \in O_{S,K}^{n+1} \setminus \{0\}$, we have $|x|_v \le 1$ for every $v \notin S$ and therefore
\[ H(x) = \prod_{v \in M_K} |x|_v \le \prod_{v \in S} |x|_v. \]
Hence we see that every S-integral solution in Corollary 7.2.5 induces a projective
solution in Theorem 7.2.2. We conclude that Corollary 7.2.5 follows from
Theorem 7.2.2.
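The step from the affine to the projective inequality can be spelled out as follows. This is only a sketch of the bookkeeping, for an S-integral solution $x$ of the inequality in Corollary 7.2.5, using the bound $\prod_{v \in S} |x|_v \ge H(x)$ from the proof:

```latex
\[
\prod_{v\in S}\prod_{i=0}^{n}\frac{|L_{vi}(x)|_v}{|x|_v}
  = \Bigl(\prod_{v\in S}|x|_v\Bigr)^{-(n+1)}
    \prod_{v\in S}\prod_{i=0}^{n}|L_{vi}(x)|_v
  < H(x)^{-(n+1)}\,H(x)^{-\varepsilon}
  = H(x)^{-n-1-\varepsilon}.
\]
```

Thus the class of $x$ in $\mathbb{P}^n_K(K)$ satisfies the inequality of Theorem 7.2.2 with the same $\varepsilon$.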
Theorem 7.2.6. The affine subspace theorem as in Corollary 7.2.5 implies the
subspace theorem as in Theorem 7.2.2.
Proof: Since the subspace theorem for a set of places S implies a fortiori the
subspace theorem for a subset $S' \subset S$, we may assume that S is so large that it
contains all the archimedean places and also that $O_{S,K}$ is a principal ideal domain
(see Proposition 5.3.6). Now let $x \in K^{n+1} \setminus \{0\}$ verify the projective diophantine
inequality in the statement of Theorem 7.2.2 and let X be the fractional ideal
$X = \sum_i x_i O_{S,K}$ of the ring $O_{S,K}$. Then, if $v \notin S$, we have
\[ |x|_v = \max_i |x_i|_v = \max_{\xi \in X} |\xi|_v. \tag{7.1} \]
On the other hand, we have $X = (\delta)$ for some $\delta \in K^\times$. Therefore, from equation
(7.1) we infer that
\[ \max_i |x_i|_v = |\delta|_v \]
for every place $v \notin S$. Hence, setting $x' := \delta^{-1} x$, we have
\[ |x'|_v = \max_i |x'_i|_v = 1 \]
for $v \notin S$, whence
\[ \prod_{v \in S} |x'|_v = H(x') = H(x) \]
and also $x' \in O_{S,K}^{n+1} \setminus \{0\}$. Now $x'$ is an affine solution of the inequality in
Corollary 7.2.5. This shows the equivalence of the projective and affine diophantine
inequalities in Theorem 7.2.2 and Corollary 7.2.5, completing the proof.
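The claim that the rescaled point is an affine solution can be checked directly. The following is a sketch, writing $x' = \delta^{-1}x$ for the rescaled point of the proof and using that the left-hand side of the projective inequality does not change under scaling while $\prod_{v \in S} |x'|_v = H(x')$:

```latex
\[
\prod_{v\in S}\prod_{i=0}^{n}|L_{vi}(x')|_v
  = \Bigl(\prod_{v\in S}|x'|_v\Bigr)^{n+1}
    \prod_{v\in S}\prod_{i=0}^{n}\frac{|L_{vi}(x')|_v}{|x'|_v}
  < H(x')^{\,n+1}\,H(x')^{-n-1-\varepsilon}
  = H(x')^{-\varepsilon}.
\]
```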
Example 7.2.7. Roth’s theorem in 6.2.3 is the special case n = 1 of the subspace
theorem. In the form given in this book, it states that, given K and S as above
and given K-algebraic $\alpha_v \in K_v$ for $v \in S$, the inequality
\[ \prod_{v \in S} \min(1, |\beta - \alpha_v|_v) < H(\beta)^{-2-\varepsilon} \tag{7.2} \]
has only finitely many solutions $\beta \in K$. Note that, by splitting the solutions of
(7.2) into finitely many subsets according to the places $v \in S$ for which
$|\beta - \alpha_v|_v < 1$, we see immediately that (7.2) is equivalent to the statement that the
solutions to
\[ \prod_{v \in S^*} |\beta - \alpha_v|_v < H(\beta)^{-2-\varepsilon}, \qquad |\beta - \alpha_v|_v < 1 \quad (v \in S^*) \tag{7.3} \]
form a finite set, for $S^*$ any subset of S.
Let $L_{v0} = X_0$, $L_{v1} = X_1 - \alpha_v X_0$, and take $x = (1 : \beta) \in \mathbb{P}^1(K)$. Then
\[ \frac{|L_{v0}(x)|_v}{|x|_v} \cdot \frac{|L_{v1}(x)|_v}{|x|_v} = |\beta - \alpha_v|_v \max(1, |\beta|_v)^{-2}. \]
Now if $|\beta - \alpha_v|_v < 1$ we have $|\beta|_v < 1 + |\alpha_v|_v$ and $\max(1, |\beta|_v)$ is bounded
above independently of $\beta$. Therefore, if (7.3) holds, we also have
\[ \prod_{v \in S} \frac{|L_{v0}(x)|_v}{|x|_v} \cdot \frac{|L_{v1}(x)|_v}{|x|_v} < C H(x)^{-2-\varepsilon} \]
for some constant C. Now by Northcott’s theorem in 1.6.8, we have
$C < H(x)^{\varepsilon/2} = H(\beta)^{\varepsilon/2}$ except for finitely many $\beta \in K$. Thus we may apply the
subspace theorem with n = 1 and $\varepsilon/2$ in place of $\varepsilon$ and conclude that the solutions
$\beta$ of (7.3) form a finite set. Roth’s theorem follows.

Next we give a stronger formulation of the subspace theorem, due to P. Vojta [307],
which is of importance in some applications.

Definition 7.2.8. Let F be a field. A set {L1 , . . . , Lm } of linear forms in the ring
F [X0 , . . . , Xn ] is said to be in general position if any subset of cardinality not
exceeding n + 1 is linearly independent over F .

Theorem 7.2.9. Let K be a number field and $S \subset M_K$ be a finite subset of
places of K. For every $v \in S$, let $\{L_{v0}, \dots, L_{vm_v}\}$ be a set of linear forms
in $K_v[X_0, \dots, X_n]$ in general position and with K-algebraic coefficients. Let
$\varepsilon > 0$. Then there are finitely many hyperplanes $T_1, \dots, T_h$ of $\mathbb{P}^n_K$ such that the
set of projective solutions $x \in \mathbb{P}^n_K(K)$ of
\[ \prod_{v \in S} \prod_{i=0}^{m_v} \frac{|L_{vi}(x)|_v}{|x|_v} < H(x)^{-n-1-\varepsilon} \]
is contained in $T_1 \cup \dots \cup T_h$.
Proof: We note first that we may assume $m_v \ge n$. This is clear by extending the
set of linear forms $L_{v0}, \dots, L_{vm_v}$ with suitable standard coordinates to a basis of
the space of linear forms in case $m_v < n$. Next, we reduce to the case $m_v = n$.
By partitioning the set of solutions x into finitely many classes, we may assume
after a permutation of the forms $L_{vi}$ that
\[ |L_{v0}(x)|_v \le |L_{v1}(x)|_v \le \dots \le |L_{vm_v}(x)|_v. \]
We keep only the forms $L_{vi}$ with $i \le n$ and remove the forms with $i > n$,
obtaining a new set $\tilde{L}_{vi}$ of linear forms with $\tilde{L}_{vi} = L_{vi}$ for $i \le n$. The forms
$L_{vi}$ with $i \le n$ form a basis of the space of all linear forms, hence
\[ |x|_v \ll \max_{i \le n} |L_{vi}(x)|_v = |L_{vn}(x)|_v \le |L_{vi}(x)|_v \]
for $i > n$. Therefore, on each subset of solutions we have
\[ \prod_{v \in S} \prod_{i=0}^{n} \frac{|\tilde{L}_{vi}(x)|_v}{|x|_v} \le C \prod_{v \in S} \prod_{i=0}^{m_v} \frac{|L_{vi}(x)|_v}{|x|_v} < C\, H(x)^{-n-1-\varepsilon} \]
for some constant C independent of x. As in Example 7.2.7, the constant C can
be removed by passing to a smaller $\varepsilon$, hence the result we want follows from the
subspace theorem in 7.2.2 applied to the set of forms $\{\tilde{L}_{v0}, \dots, \tilde{L}_{vn}\}$, $v \in S$.

7.3. Applications

There are several consequences of the subspace theorem in the theory of diophan-
tine approximation of algebraic numbers, and we examine here a few of them. At
the end of this section, we present a proof of Siegel’s theorem on integral points
relying on the subspace theorem. For this, the reader should be familiar with the
theory of algebraic curves as provided by Appendix A.13.
7.3.1. A well-known theorem of Dirichlet states that, if $1, \alpha_1, \dots, \alpha_n$ are real numbers,
then for every positive integer N there is a point $x \in \mathbb{Z}^{n+1} \setminus \{0\}$ with
$\max_{1 \le i \le n} |x_i| \le N$ such that
\[ |x_0 + \alpha_1 x_1 + \dots + \alpha_n x_n| \le N^{-n}. \]
The easy proof is obtained by applying the pigeon-hole principle to
\[ \{\alpha_1 x_1 + \dots + \alpha_n x_n \pmod 1 \mid x_i = 0, \dots, N\}, \]
or by geometry of numbers, applying Minkowski’s first theorem in C.2.19 to the
symmetric convex body of volume $2^{n+1}$ given by
\[ |X_0 + \alpha_1 X_1 + \dots + \alpha_n X_n| \le N^{-n}, \qquad |X_i| \le N, \quad i = 1, \dots, n. \]
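Dirichlet's statement is easy to test numerically for small n and N. The sketch below is a plain exhaustive search, not the pigeon-hole argument itself; the function name `dirichlet_point` and the choice of test values are illustrative assumptions, and floating-point arithmetic is used throughout.

```python
from itertools import product
from math import sqrt


def dirichlet_point(alphas, N):
    """Search all (x1, ..., xn) != 0 with |xi| <= N and pick the integer x0
    that makes |x0 + alpha_1 x1 + ... + alpha_n xn| as small as possible.

    Returns (error, (x0, x1, ..., xn)) for the best point found.
    """
    n = len(alphas)
    best = None
    for xs in product(range(-N, N + 1), repeat=n):
        if not any(xs):
            continue  # skip x1 = ... = xn = 0
        s = sum(a * x for a, x in zip(alphas, xs))
        x0 = -round(s)  # nearest integer to -s
        err = abs(x0 + s)
        if best is None or err < best[0]:
            best = (err, (x0, *xs))
    return best


if __name__ == "__main__":
    N = 5
    err, x = dirichlet_point([sqrt(2), sqrt(3)], N)
    # Dirichlet's theorem predicts err <= N**-2 for n = 2.
    print(x, err, err <= N ** -2)
```

The search costs on the order of $(2N+1)^n$ steps, so it is only meant for very small cases.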

The following result, which is equivalent to Roth’s original theorem if n = 1,
shows that Dirichlet’s theorem is essentially sharp for linear forms with real
algebraic coefficients.
Theorem 7.3.2. Let $\alpha_0, \dots, \alpha_n$ be algebraic numbers. Then for every $\varepsilon > 0$ the
inequality
\[ 0 < |\alpha_0 x_0 + \dots + \alpha_n x_n| \le H(x)^{-n-\varepsilon} \tag{7.4} \]
has only finitely many solutions $x \in \mathbb{Z}^{n+1}$.
Remark 7.3.3. The non-vanishing of the linear combination of the algebraic num-
bers α0 , . . . , αn is essential here. The statement of the theorem in this form makes
it applicable without assuming a linear independence condition, which is useful in
certain cases.
Remark 7.3.4. The preceding theorem can be easily extended to systems of
homogeneous linear diophantine inequalities, the simplest example being
\[ |x_0| \le N, \qquad |\alpha_i x_0 - x_i| \le N^{-1/n-\varepsilon} \quad (i = 1, \dots, n). \]
The solubility of this for real $\alpha_i$, $\varepsilon = 0$, and arbitrary N is a classical result
about simultaneous diophantine approximation, easily proved by Minkowski’s first
theorem. For the general case, see W.M. Schmidt [268], Ch.VI, §2.
Proof of Theorem 7.3.2: The proof is by induction on n. If n = 0, there is nothing
to prove. We may assume that $\alpha_i \ne 0$ for every i. We apply the subspace theorem
in 7.2.2 and Remark 7.2.3 with $K = \mathbb{Q}$, $v = \infty$, $L_{v0} = \alpha_0 X_0 + \dots + \alpha_n X_n$,
$L_{vi} = X_i$, $i = 1, \dots, n$, obtaining that the set X of solutions is contained in
finitely many rational linear subspaces $T_i \subset \mathbb{P}^n_{\mathbb{Q}}$. Let T be one of these subspaces.
Then it provides a linear relation
\[ \sum_{j=0}^{n} A_j X_j = 0, \]
which we may assume to hold on the set X (by partitioning). Without loss of
generality, we may suppose $A_n \ne 0$. It follows that
\[ 0 < |\alpha_0 x_0 + \dots + \alpha_n x_n| = |\beta_0 x_0 + \dots + \beta_{n-1} x_{n-1}| \]
with $\beta_i = \alpha_i - \alpha_n A_i / A_n$, and a fortiori
\[ 0 < |\beta_0 x_0 + \dots + \beta_{n-1} x_{n-1}| \le H(x)^{-n-\varepsilon} \le H(x')^{-n+1-\varepsilon} \]
with $x' = (x_0, \dots, x_{n-1})$. Now the induction hypothesis applies and we obtain
finitely many possibilities for $x'$. Moreover, since $\alpha_n \ne 0$, given $x'$ there are only
finitely many possibilities for $x_n$, and there are only finitely many possibilities for
the subspace T. Hence the set X is finite.

The following corollary is due to Schmidt, [268], Ch.VIII, Th.9A.


Corollary 7.3.5. Let $\alpha \in \mathbb{C}$ be algebraic, let d be a positive integer, and let
$\varepsilon > 0$. Then there are only finitely many complex algebraic numbers $\xi$ of degree
at most d such that
\[ |\alpha - \xi| \le H(f_\xi)^{-d-1-\varepsilon}, \]
where $H(f_\xi)$ is the height of the minimal polynomial of $\xi$.
Proof: We may assume that $\xi$ is not a conjugate of $\alpha$. Let
\[ f_\xi(X) = x_d X^d + \dots + x_0 \]
be the minimal polynomial of $\xi$ over $\mathbb{Z}$, with coprime coefficients. Then
$f_\xi(\alpha) \ne 0$ and using the mean-value theorem we get
\[ 0 < |x_0 + x_1 \alpha + \dots + x_d \alpha^d| = |f_\xi(\alpha)| \ll_\alpha |\alpha - \xi|\, H(f_\xi) \le H(f_\xi)^{-d-\varepsilon} = H(x)^{-d-\varepsilon}. \]
By the preceding theorem, there are only finitely many solutions x to this inequality
(by a familiar argument already used at the end of Example 7.2.7, the constant
involved in the $\ll_\alpha$ symbol is irrelevant here).
Remark 7.3.6. Note that $H(f_\xi) \asymp H(\xi)^{\deg(\xi)}$, which follows from
Proposition 1.6.6 and Lemma 1.6.7.
7.3.7. It is an interesting open problem to find the best exponent $\kappa_d$ for which the inequality
\[ |\alpha - \xi| \le c\, H(f_\xi)^{-\kappa_d + \varepsilon} \]
has infinitely many real algebraic solutions ξ of degree at most d , for every fixed ε > 0 and
every real α not an algebraic number of degree at most d , for some constant c depending
on α and ε . If d = 1 and α is irrational, Dirichlet’s theorem shows that κ1 = 2 (even
with ε = 0 ). If d = 2 and α is not rational or a quadratic irrational, a difficult theorem
of H. Davenport and W.M. Schmidt [80] shows that κ2 = 3 (even with ε = 0 ). However,
their method breaks down if d ≥ 3 and it remains completely unclear what the correct
result should be.
In this connexion, it is instructive to consider the not unrelated problem of approximating
real numbers by algebraic integers of degree at most d . The first non-trivial case is d = 2 ,
where we can easily show (see H. Davenport and W.M. Schmidt [81]) that the inequality
\[ |\alpha - \xi| \le c\, H(f_\xi)^{-2} \]
has infinitely many solutions in algebraic integers $\xi$ of degree at most 2 for some constant
c depending on $\alpha$, provided $\alpha$ is irrational. The next case d = 3 turned out to be quite
interesting. Davenport and Schmidt proved that, if $\alpha$ is not algebraic of degree at most 2,
then the inequality
\[ |\alpha - \xi| \ll_\alpha H(f_\xi)^{-(3+\sqrt{5})/2} \]
has infinitely many solutions with $\xi$ an algebraic integer of degree at most 3. The exponent
$(3 + \sqrt{5})/2$ looked somewhat strange and the authors commented, “We have no reason to
think that the exponents in these theorems are best possible.”
Thus it was a great surprise when D. Roy [246] constructed a real transcendental number $\alpha$
and a constant c > 0 such that
\[ |\alpha - \xi| \ge c\, H(f_\xi)^{-(3+\sqrt{5})/2} \]
for every algebraic integer $\xi$ of degree at most 3.

7.3.8. We conclude this section with an application of Corollary 7.2.5, which illus-
trates its power. It is an alternative direct proof, due to P. Corvaja and U. Zannier
[72], of a famous theorem of Siegel on integral points on curves. The standard
proof uses diophantine approximation (see 14.3.5) on the Jacobian and Roth’s the-
orem (see [277], §7.3, for details).
Let C be a geometrically irreducible affine curve over a number field K and let
S be a finite set of places containing the archimedean places. We assume that C
is given as a closed subvariety of $\mathbb{A}^n_K$. An S-integral point of C is a point of C
with $O_{S,K}$-integral coordinates. This notion depends on the embedding of C into
affine space.
Let $\pi : \tilde{C}_{\mathrm{aff}} \to C$ be the normalization of C. We extend the affine curve $\tilde{C}_{\mathrm{aff}}$
to a smooth projective curve $\tilde{C}$, which is unique up to isomorphism (see A.13.2,
A.13.3). The points in $\tilde{C} \setminus \tilde{C}_{\mathrm{aff}}$ are called the points of C at $\infty$.
Now we are ready to state Siegel’s theorem on integral points.


Theorem 7.3.9. If C has at least three distinct points at ∞ , then C has only
finitely many S -integral points.
Remark 7.3.10. The usual version of Siegel’s theorem is stronger, requiring the condition
of at least three distinct points at $\infty$ only if $\tilde{C}$ has genus 0. We briefly sketch how it may
be deduced from Theorem 7.3.9.

We may assume C smooth, as we will show at the beginning of the proof of Theorem 7.3.9.
If $\tilde{C}$ has positive genus g, we may take, after replacing K by a larger number field, an
unramified covering of $\tilde{C}$ to obtain a new curve $\tilde{C}'$ with at least three distinct points at $\infty$,
with $C'$ the inverse image of C. In order to show existence, we note that the first homology
group $H_1(\tilde{C}_{\mathrm{an}}, \mathbb{Z})$ is a free abelian group of rank 2g. By standard techniques from the
theory of covering spaces (see Section 12.3), we may easily construct a normal subgroup of
the fundamental group of finite index $\ge 3$ and hence a finite unramified covering of $\tilde{C}_{\mathrm{an}}$
of degree $\ge 3$. By Theorem 12.3.12, the covering is defined over a finite extension of K
and leads to our desired unramified covering $\pi : \tilde{C}' \to \tilde{C}$ of degree $\ge 3$; hence $C'$ has at
least 3 points at $\infty$.
Moreover, the Chevalley–Weil theorem as in Theorem 10.3.11 shows that rational points on
C lift to rational points over a finite extension $K'$ of K. By enlarging S, we may assume
that $\pi$ extends to an unramified finite morphism over $O_{S,K}$ and the valuative criterion of
properness ([148], Th. II.4.7) shows that S-integral points on C lift to $S'$-integral points
on $C'$ for $S'$ the set of places of $K'$ lying over S. Thus we can apply the above theorem
to $C'$ to deal with arbitrary curves of positive genus.
Proof of Theorem 7.3.9: To begin with, we show that, by increasing the set S , we
may assume that C is normal and therefore smooth.
Let R be the ring of regular functions on C . By a base change to a larger number
field, we may assume that the K -rational points of C lift to K -rational points of
the normalization (using birationality from A.13.2). The S -integral points on C
are those at which all coordinate functions x1 , . . . , xn take values in OS,K . Now
let $f \in K(C)$ be integral over R. Then f satisfies some equation
\[ f^N + \sum_{j=1}^{N} p_j(x) f^{N-j} = 0, \]
where $p_j(X) \in K[X]$. By enlarging S, we may assume that $p_j(X) \in O_{S,K}[X]$.
Since OS,K is integrally closed in K , we see that f continues to take values in
OS,K at the S -integral points of C . Thus adding f to the coordinate ring R
preserves S -integrality (possibly by enlarging S ). Therefore, by enlarging S we
may replace R by its integral closure, which is what we needed to prove.
We denote by $\tilde{C}$ the associated smooth projective curve. Clearly, we may assume
now that $C \subset \tilde{C}$. Let $Q_1, \dots, Q_r$ be the distinct points at $\infty$ on C, which we
may assume to be K-rational by enlarging K again. Let N be a large integer to
be chosen later and let V be the space of rational functions $\varphi$ in $K(C)^\times$ such
that $\mathrm{div}(\varphi) \ge -N([Q_1] + \dots + [Q_r])$, that is with poles of order at most N at
the points $Q_i$. We include also $\varphi = 0$ in V. By the Riemann–Roch theorem in
A.13.5, V is a K-vector space of dimension
\[ d = \dim(V) \ge Nr + 1 - g, \tag{7.5} \]
where g is the genus of $\tilde{C}$. Let $\varphi = \{\varphi_1, \dots, \varphi_d\}$ be a basis of V. Since $\varphi_j$
is regular on C we may assume, by enlarging S if needed, that $\varphi_j(P) \in O_{S,K}$
whenever P is an S-integral point in C(K).
Now let $(P_\nu)_{\nu \in \mathbb{N}}$ be a sequence of S-integral points on C. Let $v \in M_K$. Since
$\tilde{C}$ is projective, $\tilde{C}(K_v)$ is compact for the v-adic topology (use Example 2.6.4);
hence, going to a subsequence if needed, we may assume that for $v \in S$ the points
$P_\nu$ converge v-adically to a point $P^v \in \tilde{C}(K_v)$. We now write $S = S' \cup S''$,
where $S'$ is the set of places v for which $P^v \in \{Q_1, \dots, Q_r\}$.
For the other places $v \in S''$, we note that $|\varphi_j(P_\nu)|_v$ is uniformly bounded,
because the points $P^v$ are in C and the functions $\varphi_j$ are regular on C.
For a place $v \in S'$, we define a filtration
\[ V = W_{v1} \supset W_{v2} \supset \cdots \]
by setting
\[ W_{vj} = \{\varphi \in V \mid \mathrm{ord}_{P^v}(\varphi) \ge j - 1 - N\}. \]
We have $\dim(W_{vj}/W_{v,j+1}) \le 1$ (as in Lemma 8.10.9), hence $\dim(W_{vj}) \ge d - j + 1$.
We now pick a basis for $W_{vd}$ and complete it successively to bases of
the vector spaces $W_{v,d-1}, W_{v,d-2}, \dots, W_{v1}$. This gives us a basis $w_{v1}, \dots, w_{vd}$
of V such that
\[ \mathrm{ord}_{P^v}(w_{vj}) \ge j - 1 - N \quad \text{for } j = 1, \dots, d. \]
Expressing the $w_{vj}$ in terms of the basis $\varphi$ of V, we obtain d independent linear
forms $L_{vj}(\varphi)$ defined over K (because $v \in S'$ and $P^v \in \tilde{C}(K)$) with
\[ \mathrm{ord}_{P^v}(L_{vj}(\varphi)) \ge j - 1 - N \quad \text{for } j = 1, \dots, d. \tag{7.6} \]
Finally, we define independent linear forms for $v \in S''$ by setting $L_{vj}(\varphi) = \varphi_j$,
for $j = 1, \dots, d$.
For $v \in S'$, let $\lambda_v$ be a local parameter at $P^v$, i.e. $\mathrm{ord}_{P^v}(\lambda_v) = 1$ (recall that
$P^v \in \tilde{C}(K)$ because of the definition of $S'$). Then, since the sequence $(P_\nu)_{\nu \in \mathbb{N}}$
converges v-adically to $P^v$ for $v \in S'$, we deduce from (7.6) that
\[ |L_{vj}(\varphi(P_\nu))|_v \ll |\lambda_v(P_\nu)|_v^{\,j-1-N} \]
for $v \in S'$, $j = 1, \dots, d$ and $\nu$ sufficiently large.
For $v \in S''$, we have that $|L_{vj}(\varphi(P_\nu))|_v$ is bounded, because of the definition of
$S''$. Therefore, by multiplying these inequalities we get
\[ \prod_{v \in S} \prod_{j=1}^{d} |L_{vj}(\varphi(P_\nu))|_v \ll \prod_{v \in S'} |\lambda_v(P_\nu)|_v^{\,d(d-2N-1)/2} \tag{7.7} \]
for $\nu$ sufficiently large. On the other hand, the $\varphi_j(P_\nu)$ are S-integers; hence
$|\varphi_j(P_\nu)|_v \le 1$ for $v \notin S$, and also $|\varphi_j(P_\nu)|_v$ is bounded for $v \in S''$. For
$v \in S'$ and $\nu$ sufficiently large, we have $|\varphi_j(P_\nu)|_v \ll |\lambda_v(P_\nu)|_v^{-N}$ because the
maximum order of pole of $\varphi_j$ at $P^v$ is at most N. This shows that
\[ H((\varphi_1(P_\nu) : \dots : \varphi_d(P_\nu))) \ll \prod_{v \in S'} |\lambda_v(P_\nu)|_v^{-N}. \tag{7.8} \]
By combining (7.7) and (7.8), we infer that
\[ \prod_{v \in S} \prod_{j=1}^{d} |L_{vj}(\varphi(P_\nu))|_v \ll H((\varphi_1(P_\nu) : \dots : \varphi_d(P_\nu)))^{-d(d-2N-1)/(2N)} \]
always for $\nu$ sufficiently large. Since, by hypothesis, $r \ge 3$, inequality (7.5) yields
$d - 2N - 1 \ge (rN + 1 - g) - (2N + 1) > 0$ for $N > g$. Moreover, we
may assume that $H((\varphi_1(P_\nu) : \dots : \varphi_d(P_\nu))) \to \infty$ because otherwise the ratios
$\varphi_j(P_\nu)/\varphi_1(P_\nu)$ would belong to a finite set independent of $\nu$ by Northcott’s
theorem from 1.6.8. Hence the points $P_\nu$ would belong to a finite set, which is
what we want to prove. Thus we may apply the subspace theorem as in Corollary 7.2.5
with any fixed $0 < \varepsilon < d(d - 2N - 1)/(2N)$, concluding that the
S-integer points $(\varphi_1(P_\nu), \dots, \varphi_d(P_\nu))$ lie in a finite union of linear subspaces of
$K^d$. Since the functions $\varphi_j$ are linearly independent on the curve $\tilde{C}$, these linear
subspaces cannot contain $\varphi(C)$ (at least if there are infinitely many distinct points
$P_\nu$) and must have a finite intersection with $\varphi(C)$. Hence, in any case, the points
$P_\nu$ belong to a finite set.
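The exponents in (7.7) and in the inequality obtained by combining (7.7) and (7.8) can be checked as follows. This is only a sketch of the bookkeeping: the symbol $e$ is an auxiliary abbreviation not used in the text, $S'$ denotes the set of places where the limit point is one of the $Q_i$, and $H$ stands for $H((\varphi_1(P_\nu) : \dots : \varphi_d(P_\nu)))$.

```latex
\[
\sum_{j=1}^{d} (j-1-N) \;=\; \frac{d(d-1)}{2} - dN \;=\; \frac{d(d-2N-1)}{2},
\]
\[
e := \frac{d(d-2N-1)}{2N} > 0
\quad\Longrightarrow\quad
\prod_{v\in S'} |\lambda_v(P_\nu)|_v^{\,d(d-2N-1)/2}
  = \Bigl(\prod_{v\in S'} |\lambda_v(P_\nu)|_v^{-N}\Bigr)^{-e}
  \ll H^{-e}.
\]
```

The last step uses (7.8) in the form $\prod_{v \in S'} |\lambda_v(P_\nu)|_v^{-N} \gg H$ together with the positivity of $e$, which is where the hypothesis $r \ge 3$ enters.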

7.4. The generalized unit equation

In Chapter 5 we examined in some detail the unit equation x + y = 1 and its
applications, in particular to the finiteness of solutions of significant classes of
diophantine equations in two variables. The following theorem of J.-H. Evertse,
H.P. Schlickewei and W.M. Schmidt [112] is an extension of Theorem 5.2.1 to
several variables. We will not prove this theorem but we state it here because of its
importance.
Theorem 7.4.1. Let K be a field of characteristic 0, let $K^\times$ be its multiplicative
group, and let $\Gamma$ be a subgroup of $(K^\times)^n$ of finite $\mathbb{Q}$-rank r. Let X be the set of
solutions of
\[ x_1 + \dots + x_n = 1 \]
such that $(x_1, \dots, x_n) \in \Gamma$ and no proper subsum of $x_1 + \dots + x_n$ vanishes.
Then X is a finite set of cardinality at most
\[ |X| \le e^{(6n)^{3n}(r+1)}. \]

A remarkable feature of this result is the uniformity of the bound for the number of
solutions, which, as in Theorem 5.2.1, is simply exponential in the rank of Γ and
independent of the field K. This theorem is quite difficult to prove. We prove
now a weaker, but still very useful, version of Theorem 7.4.1, due independently to
Schlickewei and van der Poorten (see the bibliography in [260]) and J.-H. Evertse
[104].
Theorem 7.4.2. Let K be a number field and let S be a finite set of places of K
containing all archimedean places, with group of S-units $U_{S,K}$. Let X be the set
of solutions of
\[ x_1 + \dots + x_n = 1 \]
such that $(x_1, \dots, x_n) \in (U_{S,K})^n$ and no proper subsum of $x_1 + \dots + x_n$
vanishes. Then X is a finite set.
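To make the statement concrete, here is a small experiment for $K = \mathbb{Q}$, n = 2 and S consisting of the archimedean place together with the primes 2 and 3. It is a sketch only: it enumerates S-units with exponents in a bounded box (the bound `max_exp` is an assumption of the experiment, not part of the theorem), so it can suggest finiteness but does not prove it. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def s_units(primes, max_exp):
    """All rationals +-p1^e1 * ... * pk^ek with |ei| <= max_exp."""
    units = set()
    for exps in product(range(-max_exp, max_exp + 1), repeat=len(primes)):
        u = Fraction(1)
        for p, e in zip(primes, exps):
            u *= Fraction(p) ** e
        units.add(u)
        units.add(-u)
    return units


def unit_equation_solutions(primes, max_exp):
    """Pairs (x, y) of S-units from the box with x + y = 1."""
    units = s_units(primes, max_exp)
    return sorted((x, 1 - x) for x in units if (1 - x) in units)


if __name__ == "__main__":
    for bound in (2, 4, 6):
        sols = unit_equation_solutions((2, 3), bound)
        # The count stops growing once the box is large enough.
        print(bound, len(sols))
```

For n = 2 every solution is automatically non-degenerate, since a vanishing proper subsum would force a coordinate to be 0.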
Corollary 7.4.3. Let X be the set of solutions of
\[ x_1 + \dots + x_n = 1 \]
such that $(x_1, \dots, x_n) \in (U_{S,K})^n$. Then there is a finite set $F \subset U_{S,K}$ such that
every $x \in X$ has at least one coordinate in F.
Proof of corollary: Clear by induction on n and Theorem 7.4.2.
Proof of theorem: We follow U. Zannier [337]. The proof is by induction on n ,
the case n = 1 being obvious.
We say that a solution $x \in X$ of the S-unit equation $x_1 + \dots + x_n = 1$ is
non-degenerate if no proper subsum of $x_1 + \dots + x_n$ vanishes. We partition X into
finitely many subsets according to the set of indices $j_v$ ($v \in S$) such that
\[ j_v = \min\{\, j \mid |x_j|_v = \max_i |x_i|_v \,\} \]
and then it suffices to prove the result for each subset.
Let us fix one of these subsets, say $X'$, and define linear forms $L_{vj}$ by $L_{vj} = X_j$
if $j \ne j_v$ and $L_{vj_v} = X_1 + \dots + X_n$. Since $L_{vj_v}(x) = x_1 + \dots + x_n = 1$ and
since $|x_{j_v}|_v = |x|_v$, we have for $v \in S$
\[ \prod_{j=1}^{n} |L_{vj}(x)|_v = \prod_{j \ne j_v} |x_j|_v = |x|_v^{-1} \prod_{j=1}^{n} |x_j|_v. \]

On the other hand, we have for every j
\[ \prod_{v \in S} |x_j|_v = 1 \]
because $x_j \in U_{S,K}$ (use the product formula in 1.4.3). Therefore, from the last
two displayed equations, we infer that
\[ \prod_{v \in S} \prod_{j=1}^{n} |L_{vj}(x)|_v = \prod_{v \in S} |x|_v^{-1}. \]
Moreover, we have
\[ H(x) = \prod_{v \in M_K} |x|_v = \prod_{v \in S} |x|_v \]
because each coordinate $x_j$ is an S-unit, and we conclude that
\[ \prod_{v \in S} \prod_{j=1}^{n} |L_{vj}(x)|_v = H(x)^{-1}. \]

Thus we can apply Corollary 7.2.5 with $\varepsilon = 1$ and obtain that the solutions $x \in X'$
lie in a finite union of proper linear subspaces of $K^n$. Now we partition $X'$
into finitely many subsets, such that in a typical subset $X''$ we have a relation
$\sum_j a_j x_j = 0$. We may suppose, after a permutation of coordinates, that $a_n \ne 0$.
Eliminating $x_n$ and using the equation $x_1 + \dots + x_n = 1$, we then obtain an
equation
\[ b_1 x_1 + \dots + b_{n-1} x_{n-1} = 1 \]
with $b_j \in K$, not all 0. By removing vanishing subsums from the above relation,
we end up with a new relation $\sum_{i \in I} b_i x_i = 1$, where no proper subsum of
$\sum_{i \in I} b_i x_i$ vanishes. Thus by partitioning once again $X''$ into finitely many
subsets, it suffices to deal with one such relation. We enlarge S to a new finite set $S'$,
where each $b_i$ ($i \in I$) is an $S'$-unit, and we obtain a non-degenerate solution y
in $S'$-units, $y_i = b_i x_i$, of the equation
\[ \sum_{i \in I} y_i = 1. \]
Since $|I| \le n - 1$, the induction hypothesis shows that the $b_i x_i$ ($i \in I$), and
hence the coordinates $x_i$ ($i \in I$) themselves belong to a finite set. Thus we have
proved that the inductive hypothesis implies that there is a finite set $\Phi$ such that
any solution $x \in X$ has at least one coordinate in $\Phi$. Let $x_{i_0} = c$, $c \in \Phi$,
be one of these relations. Then $c \ne 1$ because x is a non-degenerate solution.
By enlarging S to a new set $S''$ so that $1 - c$ becomes an $S''$-unit and setting
$z_i = (1 - c)^{-1} x_i$ for $i \ne i_0$, we have that each $z_i$ is an $S''$-unit and that
$(z_i)_{i \ne i_0}$ yields a non-degenerate solution of
\[ \sum_{i \ne i_0} z_i = 1. \]
Thus we can apply induction again to conclude that all remaining coordinates $x_i$
also belong to a finite set.
7.4.4. Theorem 7.4.2 is a very powerful tool and we devote the next few paragraphs
to some of its consequences. The first application is to the so-called norm-form
equation, a generalization of the Thue equation (for irreducible polynomials). For
simplicity, we shall consider here only norm-form equations over Z .
Let $\omega_1, \dots, \omega_n \in \overline{\mathbb{Q}}$ and let $K = \mathbb{Q}(\omega_1, \dots, \omega_n)$, $d = [K : \mathbb{Q}]$. Let L(X) be
the linear form
\[ L(X) = \sum_{j=1}^{n} \omega_j X_j, \]
let S be the set of distinct embeddings $\sigma : K \to \mathbb{C}$, and define
\[ L_\sigma(X) = \sum_{j=1}^{n} \sigma(\omega_j) X_j. \]
The norm-form associated to L(X) is
\[ N_{K/\mathbb{Q}}(L(X)) = \prod_{\sigma \in S} L_\sigma(X). \]
It is a homogeneous polynomial of degree d in the n variables $X_1, \dots, X_n$,
with rational coefficients. The corresponding norm-form equation is the equation
$N_{K/\mathbb{Q}}(L(X)) = c$ with $c \in \mathbb{Q}^\times$, to be solved with integral $x \in \mathbb{Z}^n$.
A more intrinsic way of looking at norm-form equations is as follows. Define
\[ M = L(\mathbb{Z}^n) = \sum_{j=1}^{n} \omega_j \mathbb{Z}. \]
Then M is a free $\mathbb{Z}$-module in K of rank $\le d$. Hence we may just as well
consider the equation $N_{K/\mathbb{Q}}(\mu) = c$ for $\mu \in M$, where M is an arbitrary finitely
generated $\mathbb{Z}$-module in K. Moreover, we may always assume that $\omega_1, \dots, \omega_n$
are linearly independent over $\mathbb{Z}$.
We first recall some results from algebraic number theory. A $\mathbb{Z}$-submodule M
of K of rank n = d is called a full module. An order is a full module of K
which is also a subring of K containing 1. For an order O, we denote by $O^{\times,+}$ the
set of elements of O of norm 1. By A.I. Borevich and I.R. Shafarevich [42], Ch.II, §2, Th.4, it is a
subgroup of the group of units $O^\times$, clearly of index 1 or 2. Since the ring O is
finitely generated as a $\mathbb{Z}$-module, O is contained in $O_K$. We conclude that every
order has finite index in the maximal order $O_K$.
Dirichlet’s unit theorem extends to orders O, saying that $O^\times$ is a finitely generated
abelian group of rank r + s − 1, where r (resp. s) is the number of real (resp.
complex) places of K ([42], Ch.II, §4, Th.5).
An obvious instance in which a norm-form equation may have infinitely many
solutions is when M is a full module. Then the set of $m \in K$ such that $mM \subset M$
is an order O in K ([42], Ch.II, §2, Th.3). Since by definition OM = M, if
the norm-form equation $N_{K/\mathbb{Q}}(\mu) = c$ has a solution $\mu_0$, it is immediate that
$\mu_0 O^{\times,+}$ is also a family of solutions, necessarily infinite if K is not $\mathbb{Q}$ or
imaginary quadratic (by the generalization of Dirichlet’s unit theorem).
The same phenomenon may occur, even if n < d, if there are a proper subfield
$K' \subset K$, a full module $M'$ of $K'$ with associated order $O'$, and an element
$\alpha \in K^\times$ such that $\alpha M' \subset M$. If $\mu' \in M'$, we have $\alpha\mu' \in M$ and
\[ N_{K/\mathbb{Q}}(\alpha\mu') = N_{K/\mathbb{Q}}(\alpha)\, N_{K'/\mathbb{Q}}(\mu')^{[K:K']}, \]
hence if there is a solution $\mu_0 = \alpha\mu_0'$ with $\mu_0' \in M'$, we see that $\mu_0 (O')^{\times,+}$ is a
set of solutions if $[K : K']$ is odd, and $\mu_0 (O')^\times$ (thus allowing for units of norm
−1) if $[K : K']$ is even. Again, we get an infinite set of solutions if $K'$ is not $\mathbb{Q}$
or imaginary quadratic. If $[K : K']$ is even and $(O')^{\times,+}$ is of index 2 in $(O')^\times$,
the set $\mu_0 (O')^\times$ splits as
\[ \mu_0 (O')^\times = \mu_0 (O')^{\times,+} \cup \mu_0 \eta (O')^{\times,+} \]
with $\eta \in (O')^\times$ of norm $N_{K'/\mathbb{Q}}(\eta) = -1$.
A set of solutions $\mu_0 (O')^{\times,+}$ or $\mu_0 \eta (O')^{\times,+}$ (which may occur only if $[K : K']$
is even) obtained by such a procedure is a family of solutions associated to a pair
$(M', \mu_0)$, where $M'$ is a full module in a subfield $K' \subset K$ with associated order
$O'$ and where $\mu_0$ is a solution of the norm-form equation with $\mu_0 \in \alpha M'$ for a
suitable $\alpha \in K^\times$ with $\alpha M' \subset M$.
Example 7.4.5. A typical example is $L(X) = X_1 + \sqrt{2} X_2 + \sqrt{3} X_3$, $K = \mathbb{Q}(\sqrt{2}, \sqrt{3})$.
Then the associated module $M = \mathbb{Z} + \sqrt{2}\mathbb{Z} + \sqrt{3}\mathbb{Z}$ has rank 3 and
is not full in K. We are looking for an infinite family of solutions associated to
$(M', \mu_0)$. The corresponding subfield $K'$ has to be $\mathbb{Q}(\sqrt{6})$, $\mathbb{Q}(\sqrt{3})$, or $\mathbb{Q}(\sqrt{2})$.
Clearly, we have
\[ K' \mu_0 \subset \mathbb{Q}M = \mathbb{Q} + \sqrt{2}\mathbb{Q} + \sqrt{3}\mathbb{Q} \]
for such a family of solutions. We conclude that the family of solutions is contained in
\[ M_{K'} = \sqrt{2}\mathbb{Z} + \sqrt{3}\mathbb{Z}, \quad \mathbb{Z} + \sqrt{3}\mathbb{Z} \quad \text{or} \quad \mathbb{Z} + \sqrt{2}\mathbb{Z}. \]
The second and the third module are already full modules in the corresponding $K'$
and for the first we have the full module $M' = 2\mathbb{Z} + \sqrt{6}\mathbb{Z}$ in $\mathbb{Q}(\sqrt{6})$ such that
$\frac{1}{\sqrt{2}} M' = M_{K'}$. The restrictions of the norm-form equation $N_{K/\mathbb{Q}}(x) = c$ to the
modules $M_{K'}$ have the form
\[ (2X_2^2 - 3X_3^2)^2 = c, \qquad (X_1^2 - 3X_3^2)^2 = c, \qquad (X_1^2 - 2X_2^2)^2 = c. \]
In order to have an infinite family of solutions, it is necessary that $c = d^2$ for some
$d \in \mathbb{Q}^\times$ and every solution of one of the Pell equations
\[ 2X_2^2 - 3X_3^2 = \pm d, \qquad X_1^2 - 3X_3^2 = \pm d, \qquad X_1^2 - 2X_2^2 = \pm d \]
induces an infinite family of solutions of the original norm-form equation. (We
recall that a Pell equation is of the form $X^2 - aY^2 = b$ for some square free
integer a and any $b \in \mathbb{Z} \setminus \{0\}$. Since $2X_2^2 - 3X_3^2 = 6(X_2 + X_3)^2 - (2X_2 + 3X_3)^2$,
the first equation is also of this type. The procedure to solve Pell’s equation
effectively is well known; we refer to [42], Ch.II, §7.)
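The example can be explored numerically. The sketch below rests on a closed formula for the norm form of $X_1 + \sqrt{2}X_2 + \sqrt{3}X_3$ that was expanded by hand for this illustration (pairing conjugates gives $(a^2 + 2b^2 - 3c^2)^2 - 8a^2b^2$); it is not taken from the text, so the code also compares it with a floating-point product over the four embeddings. All function names are hypothetical.

```python
from itertools import product
from math import sqrt


def norm_form(a, b, c):
    """Norm of a + b*sqrt(2) + c*sqrt(3) from Q(sqrt2, sqrt3) to Q
    (hand-expanded closed form, an assumption of this sketch)."""
    return (a * a + 2 * b * b - 3 * c * c) ** 2 - 8 * a * a * b * b


def norm_form_numeric(a, b, c):
    """Same norm as a floating-point product over the four sign choices."""
    result = 1.0
    for s2, s3 in product((1, -1), repeat=2):
        result *= a + s2 * sqrt(2) * b + s3 * sqrt(3) * c
    return result


def pell_family(k_max):
    """Pairs (x, y) with x + y*sqrt(2) = (1 + sqrt(2))^k for k = 1..k_max,
    so that x^2 - 2y^2 = (-1)^k."""
    x, y = 1, 0
    out = []
    for _ in range(k_max):
        x, y = x + 2 * y, x + y  # multiply by 1 + sqrt(2)
        out.append((x, y))
    return out


if __name__ == "__main__":
    # Every member of the Pell family solves the norm-form equation with c = 1.
    for x, y in pell_family(6):
        print((x, y, 0), norm_form(x, y, 0))
```

Setting one variable to 0 recovers the three restricted forms of the example, for instance `norm_form(0, b, c)` equals $(2b^2 - 3c^2)^2$.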

The natural conjecture is then that the solutions of a non-degenerate norm-form
equation $N_{K/\mathbb{Q}}(L(X)) = c$ consist of finitely many families of solutions. This
was answered in the affirmative by W.M. Schmidt [267] in 1972, as a consequence
of his subspace theorem.
Theorem 7.4.6. Let M be a finitely generated Z-module in K . Then the set of
solutions of the norm-form equation NK/Q (µ) = c , µ ∈ M, consists of the union
of finitely many families of solutions.
Proof: The proof is by verifying the theorem first if M is a full module in K
(which is automatically the case if d = 1) and then proceeding by induction on
the degree d = [K : Q].
Suppose first that rank(M) = d , hence M is a full module in K with associated
order O. Every solution µ0 determines a family of solutions µ0 O×,+ and to
prove the theorem we need to show that the number of distinct families so obtained
is finite. To see this, note that the group U of units of O of norm 1 has finite index
in the group of all units of K (by the Dirichlet unit theorem for orders). A classical
result of Minkowski shows that for any $\xi \in K$ there is a unit $\eta$ such that
\[ |\sigma(\xi\eta)| \le C\, |N_{K/\mathbb{Q}}(\xi)|^{1/d} \tag{7.9} \]
for $\sigma$ running over all embeddings $\sigma : K \to \mathbb{C}$, for some constant C
depending only on K. A direct simple proof runs as follows (cf. [215], Ch.3,
Lemma 3.5). Replacing $\xi$ by $n\xi$ for suitable $n \in \mathbb{Z}$, we may assume that $\xi$
is a non-zero algebraic integer. Let $\omega_1, \dots, \omega_d$ be an integral basis of $O_K$ and let
$B = \lfloor |N_{K/\mathbb{Q}}(\xi)|^{1/d} \rfloor$. Consider all numbers $\omega_1 x_1 + \dots + \omega_d x_d$ with integer $x_i$
and $0 \le x_i \le B$. There are $(B + 1)^d > |N_{K/\mathbb{Q}}(\xi)|$ such numbers, while there are
$|N_{K/\mathbb{Q}}(\xi)|$ residue classes (mod $\xi O_K$). Therefore, by the pigeon-hole principle
there are two such numbers in the same residue class, with their difference equal to
$\alpha\xi$ for some $\alpha \in O_K \setminus \{0\}$. Then $|\sigma(\alpha\xi)| \le d \max_i |\sigma(\omega_i)|\, B$ and it follows
that $N_{K/\mathbb{Q}}(\alpha)$ is bounded independently of $\xi$. Thus there are only finitely many
possibilities for $\alpha$ up to multiplication by units, i.e. $\alpha = \beta\eta$, where $\beta$ belongs to
a finite set and $\eta$ is a unit. Then $|\sigma(\eta\xi)| \le d \max_{i,\beta} |\sigma(\beta^{-1}\omega_i)|\, B$, proving what
we want.
Therefore, since U has finite index in the group of all units, the same conclusion as
in (7.9) holds with η ∈ U, provided we replace C by a larger constant which may
depend on the order O. Moreover, there is an integer b > 0 such that $bM \subset O_K$.
We conclude that in any given family of solutions there is an element $\mu_0$ with
height bounded in terms of $N_{K/\mathbb{Q}}(\mu_0) = c$ and b (using (7.9) at archimedean
places and $|\mu_0|_v \le |b|_v^{-1}$ for non-archimedean places). By Northcott's theorem in
1.6.8, the number of families is finite, as asserted. This proves the theorem if M
is a full module in K.
Suppose now that n = rank(M) < d. Let F be a Galois closure of K with
Galois group G and let H be the subgroup of G fixing K. Then we identify
{σ : K → F} with a set S of representatives in G of G/H.
Since M has rank n < d = |S| as a Z-module and S acts trivially on Z, there is
a relation of linear dependence in F
\[ \sum_{\sigma \in S} a_\sigma\, \sigma(\mu) = 0 \qquad (\mu \in M). \]
Some coefficients $a_\sigma$ may be 0 and we select a relation
\[ \sum_{\sigma \in S'} a_\sigma\, \sigma(\mu) = 0 \qquad (\mu \in M) \tag{7.10} \]
for a subset $S' \subset S$, where now $a_\sigma \ne 0$.


After these preliminaries, we take a sufficiently large finite set $S \subset M_F$ containing
the archimedean places such that $\sigma(M) \subset O_{S,F}$ for every σ and moreover c
and the coefficients $a_\sigma$ ($\sigma \in S'$) are S-units. Then the equation $N_{K/\mathbb{Q}}(\mu) = c$
implies that $\sigma(\mu) \in O_{S,F}^{\times}$ for every σ and we conclude that
\[ a_\sigma\, \sigma(\mu) \in O_{S,F}^{\times} \qquad (\sigma \in S'). \]
Therefore, by applying Corollary 7.4.3 to (7.10), we infer that there is a finite set
F such that any solution µ in our given class satisfies
\[ \frac{\tau(\mu)}{\sigma(\mu)} = f\,\frac{a_\sigma}{a_\tau} \tag{7.11} \]
for a suitable choice (depending on µ) of f ∈ F and $\sigma, \tau \in S'$, $\sigma \ne \tau$.
This gives us a further subdivision of solutions into finitely many subclasses in
which f, σ, and τ remain fixed. Let us fix such a subclass $X'$ and an element
$\alpha \in X'$. Then by (7.11) we have
\[ \tau(\alpha^{-1}\mu) = \sigma(\alpha^{-1}\mu) \]
for every $\mu \in X'$. By applying $\sigma^{-1}$ and setting $g = \sigma^{-1}\tau$, we obtain
\[ g(\alpha^{-1}\mu) = \alpha^{-1}\mu \]
for a certain $g \notin H$. It follows from this that $\alpha^{-1}\mu$ lies in a proper subfield $K'$ of
K, namely the fixed field of the group $\langle H, g \rangle$ generated by H and g. Therefore,
if $M'$ is the Z-module generated by the elements $\alpha^{-1}\mu$ ($\mu \in X'$), we have that
$M' \subset K'$ and $\alpha M' \subset M$. For $\mu' \in M'$, we have
\[ N_{K/\mathbb{Q}}(\alpha\mu') = N_{K/\mathbb{Q}}(\alpha)\, N_{K'/\mathbb{Q}}(\mu')^{[K:K']}, \]
and the norm-form equation $N_{K/\mathbb{Q}}(\alpha\mu') = c$ is equivalent to the new norm-form
equation $N_{K'/\mathbb{Q}}(\mu') = \varepsilon$, with ε = 1 if $[K : K']$ is odd and ε = ±1 otherwise.
Since $K'$ is a proper subfield of K, the inductive hypothesis applies to $K'$ and
$X'$ is indeed the union of finitely many families of solutions. Since there are only
finitely many subclasses $X'$ to consider, the result follows.
The following theorem is due to M. Laurent [178].
Theorem 7.4.7. Let Γ be a finitely generated subgroup of $(\overline{\mathbb{Q}}^{\times})^n$ and let Σ be
any subset of Γ. Then the Zariski closure of Σ in $\mathbb{G}_m^n$ is a finite union of translates
of algebraic subgroups of $\mathbb{G}_m^n$.
Remark 7.4.8. In his paper Laurent proves the stronger statement, previously
conjectured by Lang, in which the subgroup Γ is a subgroup of C× of finite Q -
rank. This was obtained from the above theorem by using additional arguments
from Kummer theory. A proof of the stronger result is also immediate if in the
argument below we use Theorem 7.4.1, rather than Corollary 7.4.3.

Proof of Theorem 7.4.7: Let V be an irreducible component of the Zariski closure
of Σ and let $f_1(X) = \cdots = f_r(X) = 0$ be a set of polynomials defining V set
theoretically. Let
\[ f(X) = \sum_{i=0}^{I} c_i X^{a_i} \qquad (c_i \ne 0) \]
be one of these polynomials $f_i$. We take $S \subset M_K$ to be a finite set of places such
that $\Gamma \subset (U_{S,K})^n$ (it suffices to do so for a set of generators) and also $c_i \in U_{S,K}$
for every i, and apply Corollary 7.4.3 to the equation
\[ -\sum_{i=1}^{I} \frac{c_i}{c_0}\, g^{a_i - a_0} = 1 \qquad (g \in V \cap \Sigma). \]

Then we conclude that there are $\lambda_1, \dots, \lambda_J \in U_{S,K}$ such that
\[ V \cap \Sigma \subset \bigcup_{i=1}^{I} \bigcup_{j=1}^{J} \{X^{a_i} - \lambda_j X^{a_0} = 0\}. \]
Therefore, passing to the Zariski closures, we infer that
\[ V \subset \bigcup_{i=1}^{I} \bigcup_{j=1}^{J} \{X^{a_i} - \lambda_j X^{a_0} = 0\}. \]
Hence there are i, j ≥ 1 such that $X^{a_i} - \lambda_j X^{a_0} = 0$ on V, because V is
irreducible. Now we can eliminate the monomial $X^{a_0}$ from f(X), obtaining a
polynomial $\tilde{f}(X)$ in which the number of monomials has decreased by at least
1 and we may replace f = 0 by $X^{a_i} - \lambda_j X^{a_0} = \tilde{f} = 0$ in the set of defining
equations for V. Proceeding in this way, we see that V is defined by binomial
equations of type $X^a - \lambda X^b = 0$; hence V is a translate of an algebraic subgroup
of $\mathbb{G}_m^n$.
7.4.9. Our final application of the subspace theorem is a theorem in arithmetic,
due to Corvaja and Zannier [73], see also [337], p.50.
Theorem 7.4.10. Let $S \subset M_{\mathbb{Q}}$ be a finite set of places including the place at ∞,
and let ε > 0. Then the set Σ of integer pairs (u, v) with prime factors only from
S and such that
\[ \mathrm{GCD}(u - 1, v - 1) \ge \max(|u|, |v|)^{\varepsilon} \]
is contained in the union of a finite set and finitely many proper algebraic subgroups
of $\mathbb{G}_m^2$. In particular, the subset of pairs (u, v) ∈ Σ with u and v
multiplicatively independent S-units is a finite set.

As a special case, we have:


Corollary 7.4.11. Let a > b > c ≥ 1 be positive integers and let P(a, b, c) be
the greatest prime factor of (ab + 1)(ac + 1). Then P(a, b, c) → ∞ as a → ∞.
Proof of corollary: We argue by contradiction. Suppose there is an infinite set
of triples (a, b, c) with P(a, b, c) ≤ P. Let S be the set of all primes p ≤ P,
including also ∞, and set u = ab + 1, v = ac + 1. Then u, v are S-units, u > v
and GCD(u − 1, v − 1) ≥ a.
Thus Theorem 7.4.10 applies and we deduce that there are infinitely many u, v
verifying an equation $u^m = v^n$ (see Corollary 3.2.15). Since u > v are positive
integers, we may assume that 1 ≤ m < n are positive and coprime, whence $u = t^n$,
$v = t^m$ for some integer t. Now $\mathrm{GCD}((t^m - 1)/(t - 1), (t^n - 1)/(t - 1)) = 1$
(otherwise there is a prime p such that the reduction of both polynomials (mod p)
has the zero t, which is impossible for coprime m, n). Therefore, we have
\[ a \le \mathrm{GCD}(u - 1, v - 1) = t - 1 < u^{1/n} \le (ab + 1)^{1/2} < a \]
because b < a. This is a contradiction, concluding the proof.
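The two arithmetic facts used in this proof can be explored numerically. The sketch below is an illustration only, with helper names (`greatest_prime_factor`, `min_P`) of our own choosing. It checks the classical identity $\mathrm{GCD}(t^m - 1, t^n - 1) = t^{\gcd(m,n)} - 1$, which for coprime m, n gives the value t − 1 used above, and it tabulates the smallest value of P(a, b, c) over all admissible b, c for a few values of a.

```python
from math import gcd

def greatest_prime_factor(n):
    """Largest prime dividing n >= 2, by trial division (adequate for small n)."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    return n if n > 1 else largest

def min_P(a):
    """Minimum over a > b > c >= 1 of the greatest prime factor of (ab+1)(ac+1); needs a >= 3."""
    return min(greatest_prime_factor((a * b + 1) * (a * c + 1))
               for b in range(2, a) for c in range(1, b))

# gcd(t^m - 1, t^n - 1) = t^gcd(m, n) - 1, checked on a small range
for t in range(2, 8):
    for m in range(1, 7):
        for n in range(1, 7):
            assert gcd(t**m - 1, t**n - 1) == t**gcd(m, n) - 1

print([(a, min_P(a)) for a in (3, 5, 10, 20, 40)])
```

The printed table gives only finite evidence; the corollary itself is ineffective and says nothing about the rate of growth.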

Proof of Theorem 7.4.10: We may assume that S contains at least one non-archimedean
place, |v| ≥ |u| and v ≠ 1. Let d be the denominator of the fraction
(u − 1)/(v − 1) in its lowest terms, thus $d \le 2|v|^{1-\varepsilon}$ because
$\mathrm{GCD}(u - 1, v - 1) \ge \max(|u|, |v|)^{\varepsilon}$ by hypothesis. We define $z_j \in \mathbb{Q}$ and
$c_j \in \mathbb{Z}$ for j ∈ N by
\[ z_j := u^{j-1}\,\frac{u - 1}{v - 1} = \frac{c_j}{d}. \]
We have the approximation
\[ \frac{1}{v - 1} = \sum_{r=1}^{h} \frac{1}{v^r} + O\bigl(|v|^{-h-1}\bigr) \]
whence, multiplying by $u^{j-1}(u - 1)$, we obtain
\[ \Bigl| z_j + \sum_{r=1}^{h} \frac{u^{j-1}}{v^r} - \sum_{r=1}^{h} \frac{u^j}{v^r} \Bigr| = O\bigl(|u|^j |v|^{-h-1}\bigr). \tag{7.12} \]
We shall view such an inequality as providing a small value of a linear form in
independent variables corresponding to $z_j$, $u^{j-1}/v^r$, $u^j/v^r$ and apply the affine
version Corollary 7.2.5 of the subspace theorem to a suitable set of linear forms,
which will include the linear forms associated to the inequalities (7.12).
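The estimate (7.12) comes from the tail of a geometric series, and the quantity inside the absolute value can be written in closed form as $u^{j-1}(u-1)/(v^h(v-1))$. The following sketch, with function names of our own choosing and a hypothetical sample pair (u, v), verifies this in exact rational arithmetic and compares it with the explicit bound $4|u|^j|v|^{-h-1}$, valid for |v| ≥ 2 and u ≠ 0.

```python
from fractions import Fraction

def lhs_7_12(u, v, j, h):
    """z_j + sum_r u^(j-1)/v^r - sum_r u^j/v^r, the quantity inside |...| in (7.12)."""
    z_j = Fraction(u ** (j - 1) * (u - 1), v - 1)
    first = sum(Fraction(u ** (j - 1), v ** r) for r in range(1, h + 1))
    second = sum(Fraction(u ** j, v ** r) for r in range(1, h + 1))
    return z_j + first - second

def exact_remainder(u, v, j, h):
    """Closed form u^(j-1) (u-1) / (v^h (v-1)) of the same quantity."""
    return Fraction(u ** (j - 1) * (u - 1), v ** h * (v - 1))

# Hypothetical sample: S-units for S = {infinity, 2, 3}
u, v, j, h = 2 ** 3 * 3, -(3 ** 5), 2, 4
assert lhs_7_12(u, v, j, h) == exact_remainder(u, v, j, h)
print(float(abs(lhs_7_12(u, v, j, h))), 4 * abs(u) ** j / abs(v) ** (h + 1))
```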
Let k be a positive integer to be chosen later and let n = k + (k + 1)h. We consider
linear forms $L_{\nu i}$, indexed by i = 1, ..., n and ν ∈ S, in variables $W_1, \dots, W_k$
and $Y_{jr}$, j = 0, ..., k, r = 1, ..., h, defined as follows. It will be notationally
convenient to define X to be the vector
\[ X = (X_1, \dots, X_n) := (W_1, \dots, W_k, Y_{01}, \dots, Y_{0h}, \dots, Y_{k1}, \dots, Y_{kh}). \]
For i = 1, ..., k and ν = ∞ we set
\[ L_{\infty i} := W_i + \sum_{r=1}^{h} Y_{i-1,r} - \sum_{r=1}^{h} Y_{ir}. \]
For $(\nu, i) \notin \{(\infty, 1), \dots, (\infty, k)\}$ we define instead
\[ L_{\nu i} := X_i. \]
Obviously, for each ν ∈ S the linear forms $L_{\nu 1}, \dots, L_{\nu n}$ are linearly independent.
Now define for a pair (u, v) ∈ Σ the point $x = (x_1, \dots, x_n)$ to be
\[ x := d v^h \bigl( z_1, \dots, z_k, v^{-1}, \dots, v^{-h}, u v^{-1}, \dots, u v^{-h}, \dots, u^k v^{-1}, \dots, u^k v^{-h} \bigr). \]
Then, for i > k, we have $x_i = d u^a v^b$, for suitable integers a and b, hence $x_i$
equals d times an S-unit. Since $L_{\nu i}(x) = x_i$, we easily deduce that
\[ \prod_{\nu \in S} |L_{\nu i}(x)|_\nu \le d \qquad \text{for } i > k. \]
Therefore, we have
\[ \prod_{\nu \in S} \prod_{i=1}^{n} |L_{\nu i}(x)|_\nu \le d^{\,n-k} \prod_{\nu \in S} \prod_{i=1}^{k} |L_{\nu i}(x)|_\nu = d^{\,n-k} \prod_{i=1}^{k} |L_{\infty i}(x)|_\infty \prod_{\nu \in S \setminus \{\infty\}} \prod_{i=1}^{k} |x_i|_\nu. \tag{7.13} \]
For i ≤ k, we have $x_i = d v^h z_i \in v^h \mathbb{Z}$, whence
\[ \prod_{\nu \in S \setminus \{\infty\}} |x_i|_\nu \le |v|^{-h}. \tag{7.14} \]
Moreover, by (7.12) we have $|L_{\infty i}(x)| = O\bigl(d\,|u|^i |v|^{-1}\bigr)$ for i = 1, ..., k.
Putting this estimate and (7.14) in (7.13), we get
\[ \prod_{\nu \in S} \prod_{i=1}^{n} |L_{\nu i}(x)|_\nu = O\bigl( d^{\,n} |u|^{k(k+1)/2} |v|^{-hk-k} \bigr). \]
Recalling that $d \le 2|v|^{1-\varepsilon}$ and |u| ≤ |v|, from the last displayed inequality we
obtain
\[ \prod_{\nu \in S} \prod_{i=1}^{n} |L_{\nu i}(x)|_\nu = O\bigl( |v|^{h + k(k+1)/2 - \varepsilon n} \bigr). \tag{7.15} \]
Finally, each $x_i$ is an integer and we have $\max |x_i| \le 2d|v|^{h+k} < 4|v|^{h+k+1}$,
hence $H(x) \le 4|v|^{h+k+1}$. In view of (7.15), we conclude that
\[ \prod_{\nu \in S} \prod_{i=1}^{n} |L_{\nu i}(x)|_\nu \ll_{h,k} H(x)^{-\delta} \]
with δ = (εn − h − k(k + 1)/2)/(h + k + 1) provided that δ > 0. If we take
for example k ≥ 2/ε and $h \ge k^2 + 1$, it is clear that we get δ > 0. The subspace
theorem in Corollary 7.2.5 now proves that the vectors x all lie on a certain finite
union of subspaces of $\mathbb{Q}^n$. Let $\sum a_i x_i = 0$ be one of these subspaces. If we
substitute the value of $x_i$ in such an equation, we find a non-trivial equation of
type
\[ \frac{f(u)}{v - 1} + \frac{g(u, v)}{v^h} = 0, \]
for some polynomials f and g. The rational function $f(X)/(Y - 1) + g(X, Y)/Y^h$
cannot vanish identically on $\mathbb{G}_m^2$, otherwise Y − 1 would divide f(X), yielding
f = 0, and then g = 0 too, a contradiction. Thus the points (u, v), which belong
to a finitely generated subgroup of $\mathbb{G}_m^2$, are also on a finite union of curves of type
$Y^h f(X) + g(X, Y)(Y - 1) = 0$. By Laurent's theorem in 7.4.7, we conclude that
the set Σ lies in a finite union of translates of algebraic subgroups of $\mathbb{G}_m^2$.
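The positivity of δ for the choice k ≥ 2/ε, h ≥ k² + 1 made in the paragraph above is an elementary computation with n = k + (k + 1)h. A minimal sketch in exact arithmetic follows; the names `delta` and `smallest_admissible` are ours, and the sample values of ε are arbitrary.

```python
from fractions import Fraction
from math import ceil

def delta(eps, k, h):
    """delta = (eps*n - h - k(k+1)/2) / (h + k + 1) with n = k + (k+1)h."""
    n = k + (k + 1) * h
    return (eps * n - h - Fraction(k * (k + 1), 2)) / (h + k + 1)

def smallest_admissible(eps):
    """Smallest integers with k >= 2/eps and h >= k^2 + 1."""
    k = ceil(2 / eps)
    return k, k * k + 1

for eps in (Fraction(1), Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
    k, h = smallest_admissible(eps)
    print(eps, k, h, float(delta(eps, k, h)))
```

Since ε(k + 1) > 2 and h > k(k + 1)/2 under these choices, the numerator exceeds h − k(k + 1)/2 > 0, which is what the loop confirms on examples.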
In order to complete the proof of the theorem, it suffices to show that, if a translate
of such an algebraic subgroup contains infinitely many points in Σ, then it is
already a subgroup. Let $X^m Y^n = c$ be an equation defining such a translate
(use Corollary 3.2.15). Let (u, v) ∈ Σ and write g = GCD(u − 1, v − 1).
Then from $u^m v^n = c$ we must have c ≡ 1 (mod g). Since by hypothesis
$g \ge \max(|u|, |v|)^{\varepsilon}$, it is now clear that, if such a translate contains infinitely many
points of Σ, we must have c = 1. This proves the first conclusion of the theorem.
For the second conclusion, it is obvious that, if u and v are multiplicatively independent,
then the pair (u, v) does not lie in any proper algebraic subgroup of $\mathbb{G}_m^2$
(use Corollary 3.2.15), and the result follows from the first part of the theorem.
7.5. Proof of the subspace theorem

We follow here the classical proof of Schmidt, with important modifications intro-
duced by H.P. Schlickewei [261] and J.-H. Evertse [108] in order to cover the case
of arbitrary number fields and allow a finite set of places.

7.5.1. By Theorem 7.2.6, it is sufficient to prove the subspace theorem in its affine
form Corollary 7.2.5. The proof is by contradiction.
The first step in the proof consists in following, as far as possible, the blueprint
provided by the proof of Roth’s theorem. Here a major new difficulty appears,
namely the non-vanishing of the auxiliary construction cannot be done at a single
point and it requires the consideration of n independent points. This gives only a
rather weak version of the subspace theorem.
The conclusion of the proof is to show how to go from n independent points to one
point only. This is where we need new ideas. Schmidt’s original method consisted
in applying the weaker form of the subspace theorem to a new set of linear forms,
obtained by taking exterior products of the given linear forms, and using geometry
of numbers to deduce the strong version of the subspace theorem. This part of
Schmidt’s proof has been substantially simplified by Evertse [108] and we will
follow his exposition here, with some further simplifications, because our goal is
only the easier qualitative form of the subspace theorem. As formally stated in the
next article, we will not keep track of constants depending only on K , S and the
linear forms Lvi . This will allow substantial simplifications in the exposition.

7.5.2. Before starting the proof of the subspace theorem, we need some notation.
We write

\[ x_h = (x_{h0}, \dots, x_{hn}) \quad (h = 1, \dots, m), \qquad X = (x_1, \dots, x_m) \]

and similarly for linear forms and vectors of linear forms

\[ L_v(x) = (L_{v0}(x), \dots, L_{vn}(x)) \qquad (v \in S). \]

If $i = (i_0, \dots, i_n)$ is an (n + 1)-tuple of non-negative integers, we also write

\[ |i| = i_0 + \cdots + i_n, \qquad x^i = x_0^{i_0} \cdots x_n^{i_n}, \qquad i! = i_0! \cdots i_n!. \]

We will also need to use some elementary exterior algebra and our notation will be
as follows. Let V be a vector space over a field F, of dimension n + 1, with basis
$e_i$ (i = 0, ..., n). For k = 1, ..., n, we equip the exterior power $\wedge^k V$ with the
standard basis
\[ e_{i_1} \wedge \cdots \wedge e_{i_k} \]
with $i_1 < i_2 < \cdots < i_k$, in lexicographic order.
We extend the standard scalar product $x \cdot y = \sum_{j=0}^{n} x_j y_j$ on V to $\wedge^k V$ by
Laplace's identity
\[ (x_1 \wedge \cdots \wedge x_k) \cdot (y_1 \wedge \cdots \wedge y_k) = \det\bigl((x_i \cdot y_j)\bigr)_{i,j=1,\dots,k} \tag{7.16} \]
and by multilinearity. Obviously, this is just the standard scalar product on $\wedge^k V$
with respect to the standard basis $(e_{i_1} \wedge \cdots \wedge e_{i_k})_{i_1 < \cdots < i_k}$.
Similarly as in Riemannian geometry, we define a K-linear ∗-operator on the
exterior algebra $\wedge V := \bigoplus_k \wedge^k V$ by setting
\[ (e_{i_1} \wedge \cdots \wedge e_{i_k})^{*} := (-1)^{\operatorname{sign}\pi}\, e_{j_0} \wedge \cdots \wedge e_{j_{n-k}}, \]
where $\pi = (j_0, \dots, j_{n-k}, i_1, \dots, i_k)$ is a permutation of {0, ..., n}. For
$\omega \in \wedge^k V$, we have
\[ \omega^{**} = (-1)^{k(n+1-k)}\, \omega. \]
We use it only for k = 1 and k = n, where it is trivial. Note also that ∗ is an
isometry with respect to the scalar product. We have the Laplace expansion
\[ x_0 \cdot (x_1 \wedge \cdots \wedge x_n)^{*} = \det(x_0, \dots, x_n), \tag{7.17} \]
which we will use several times.
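Both Laplace's identity (7.16) and the Laplace expansion (7.17) can be checked on integer vectors. The sketch below is an illustration under assumptions of ours: vectors are Python lists, elements of $\wedge^k V$ are stored as dictionaries indexed by increasing index tuples, and the sign in the ∗-operator is read as the signature of the permutation π, so that the basis element omitting $e_j$ is sent to $(-1)^j e_j$. All function names are hypothetical.

```python
from itertools import combinations

def det(rows):
    """Determinant of a square integer matrix, by expansion along the first row."""
    if not rows:
        return 1
    return sum((-1) ** j * rows[0][j] * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j in range(len(rows)))

def wedge(vectors, dim):
    """Coordinates of v1 ^ ... ^ vk in the standard basis: the k x k minors."""
    k = len(vectors)
    return {I: det([[v[i] for i in I] for v in vectors])
            for I in combinations(range(dim), k)}

def dot(w1, w2):
    """Standard scalar product of two elements of the same exterior power."""
    return sum(w1[I] * w2[I] for I in w1)

def star_top(w, dim):
    """Star of an element of wedge^n V (dim = n + 1): the basis element
    omitting e_j is sent to (-1)^j e_j."""
    out = [0] * dim
    for I, c in w.items():
        (j,) = set(range(dim)) - set(I)
        out[j] = (-1) ** j * c
    return out

def laplace_identity_holds(xs, ys, dim):
    """Check (7.16) for two k-tuples of vectors."""
    gram = [[sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]
    return dot(wedge(xs, dim), wedge(ys, dim)) == det(gram)

def laplace_expansion_holds(x0, xs, dim):
    """Check (7.17): x0 . (x1 ^ ... ^ xn)* = det(x0, ..., xn)."""
    s = star_top(wedge(xs, dim), dim)
    return sum(a * b for a, b in zip(x0, s)) == det([x0] + xs)

print(laplace_identity_holds([[1, 2, 3, 0], [0, 1, 4, -2]], [[2, 0, 1, 1], [3, -1, 0, 5]], 4))
print(laplace_expansion_holds([2, -1, 5], [[1, 2, 3], [0, 1, 4]], 3))
```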
A similar notation will be used with regards to linear forms. If L(x) = a · x =
a0 x0 + · · · + an xn , then operations on L will be understood as operating on the
vector a .
Constants $C_0, C_1, \dots$ denote unspecified constants depending only on the given
forms $L_{vi}$ and K (hence in particular on n, S and a field F of definition for the
linear forms), but we will not make any special effort in giving explicit values for
them. We shall also use the Vinogradov symbol ≪ and the Landau symbol O(·)
in the same way, namely up to constants depending only on the given forms $L_{vi}$
and K. However, all estimates will be uniform in the parameters ε, m, and the
weights $d_i$.
7.5.3. Step 0: Approximation classes and approximation domains.
We denote by F a finite extension of K, which is a field of definition for all
forms $L_{vi}$ (v ∈ S, i = 0, ..., n) (recall that they have algebraic $K_v$-coefficients)
and set r = [F : K]. As in Theorem 6.4.1, let $|\ |_{v,K}$ be an extension of the
absolute value $|\ |_v$ to F such that $|L_v(x)|_v = |L_v(x)|_{v,K}$ for every $x \in K^{n+1}$.
In order to simplify the exposition, the reader could assume that F = K (see
Remark 7.2.3). We will work however in full generality, with the forms $L_{vi}$ having
coefficients in $F \cap K_v$.
The following step is not really necessary for the proof but it leads to cleaner
estimates. Since we can always enlarge the set S, we may and shall assume that S
contains the archimedean places and that the ring $O_{S,K}$ is a principal ideal domain
(see Proposition 5.3.6) and, by the proof of Theorem 7.2.6, it suffices to deal only
with S-integral vectors x satisfying the additional condition
\[ \prod_{v \in S} |x|_v = H(x). \]
Such vectors will be called primitive.
Thus we begin by assuming that there is an infinite set of S-integral primitive
solutions $x \in O_{S,K}^{n+1} \setminus \{0\}$ of
\[ \prod_{v \in S} \prod_{i=0}^{n} |L_{vi}(x)|_{v,K} < H(x)^{-\varepsilon}. \tag{7.18} \]
If $x = (x_0, \dots, x_n) \in O_{S,K}^{n+1}$ is primitive, clearly ux is again primitive whenever
u is an S-unit. Moreover, if x is a solution of the basic inequality (7.18), then
ux is again a solution. Such solutions form an equivalence class. The following
lemma shows that in every equivalence class there is an element with small affine
height, as defined in 1.5.7. In order to distinguish it from the projective height, we
set
\[ h_{\mathrm{aff}}(x) := h((1, x)). \]
Lemma 7.5.4. There is a positive constant $C_0$, depending only on S and K, with
the following property. Let $x = (x_0, \dots, x_n) \in O_{S,K}^{n+1}$ be a primitive point. Then
there is an S-unit u in $O_{S,K}$ such that
\[ h(x) \le h_{\mathrm{aff}}(ux) \le h(x) + C_0. \]
Proof: The inequality $h(x) \le h_{\mathrm{aff}}(ux)$ is obvious because h(ux) = h(x).
Now suppose for example that $x_0 \ne 0$. Since $x_0$ is a non-zero S-integer, the
product formula shows that
\[ \sum_{v \in S} \log |x_0|_v = -\sum_{v \notin S} \log |x_0|_v \ge 0. \]
By Dirichlet's unit theorem (see Theorem 1.5.13), the points $(\log |u|_v)_{v \in S}$, $u \in U_{S,K}$,
form a lattice Λ in the real subspace $\sum_{v \in S} t_v = 0$ of $\mathbb{R}^{|S|}$. Therefore,
there is an S-unit u such that $\log |u x_0|_v = \log |u|_v + \log |x_0|_v \ge -C$ for v ∈ S,
with C a positive constant depending only on the lattice Λ. Since ux is primitive,
we deduce that
\[ h(ux) = \sum_{v \in S} \log |ux|_v \ge \sum_{v \in S} \Bigl\{ -C + \max_{i=0,\dots,n} \log^+ |u x_i|_v \Bigr\} = -|S|\,C + h_{\mathrm{aff}}(ux). \]
The lemma follows by taking $C_0 = |S|\,C$.
Corollary 7.5.5. Every equivalence class of primitive solutions of the basic inequality
(7.18) contains an element x such that for every v ∈ S we have
\[ h_{\mathrm{aff}}(L_v(x)) \le h(x) + C_1. \]
In particular, we have for $L_{vi}(x) \ne 0$
\[ \bigl| \log |L_{vi}(x)|_{v,K} \bigr| \le r\,h(x) + r\,C_1. \tag{7.19} \]
Proof: Since the forms $L_{vi}$ (i = 0, ..., n) are linearly independent, we have for
$w \in M_F$
\[ \bigl| \log |L_v(x)|_w - \log |x|_w \bigr| \le \gamma_w, \]
with $\gamma_w$ depending only on the forms $L_{vi}$ and equal to 0 up to finitely many w.
We conclude that $h_{\mathrm{aff}}(L_v(x))$ and $h_{\mathrm{aff}}(x)$ differ only by a bounded quantity and
an application of Lemma 7.5.4 yields the first claim. Now let w be the place of F
with w|v. Then Proposition 1.2.7 shows that
\[ |\ |_{v,K} = |\ |_w^{[F:K]/[F_w:K_v]} \]
and the claim follows from the fundamental inequality in (1.8) on page 20 and
[F : K] = r.
7.5.6. As in the proof of Roth's theorem, it is necessary to consider only solutions
x for which the factors $|L_{vi}(x)|_{v,K}$, for each (v, i), have a similar behaviour
compared with H(x). In other words, we want
\[ \bigl( h(x)^{-1} \log |L_{vi}(x)|_{v,K} \bigr)_{v \in S,\ i=0,\dots,n} \tag{7.20} \]
to be nearly constant along our set of primitive solutions. If $L_{vi}(x) = 0$ for some
(v, i), then the solution x lies in the hyperplane defined by the linear form $L_{vi}$.
Hence such solutions satisfy the conclusion of the theorem to be proved. Thus,
by the preceding corollary, we may and shall assume that for all (v, i) we have
$L_{vi}(x) \ne 0$ and that (7.19) holds.
The next step consists in splitting solutions into finitely many approximation classes,
as it was done in the proof of Roth's theorem. Since we are not interested here in
counting the number of approximation classes, it will suffice to note that, given any
infinite set of solutions of inequality (7.18), and given N > 0 (which will be taken
very large), then by (7.19) there exists a cube in $\mathbb{R}^{(n+1)|S|}$ of edge size 1/N, with
north-east corner at a point $(c_{vi}) \in [-2r, 2r]^{(n+1)|S|}$, containing (7.20) for an infinite
subset of the given set of primitive solutions. (The point of the constant 2r
is to swallow the contribution of $rC_1$ in (7.19) as soon as h(x) is large enough.)
Thus we have an infinite primitive approximation class of primitive solutions of
inequality (7.18), consisting of solutions satisfying
\[ \Bigl( c_{vi} - \frac{1}{N} \Bigr) h(x) \le \log |L_{vi}(x)|_{v,K} \le c_{vi}\, h(x) \tag{7.21} \]
for each pair (v, i). By this inequality and (7.18) on page 199, we necessarily have
\[ \sum_{v \in S} \sum_{i=0}^{n} c_{vi} \le -\varepsilon + (n + 1)|S|/N < -\frac{\varepsilon}{2} \tag{7.22} \]
provided N > 2(n + 1)|S|/ε, which we shall suppose henceforth.


Next, we are going to apply geometry of numbers in the context of Sections C.1
and C.2. We recall that $K_{\mathbb{A}}$ denotes the adele ring of K. Let $R_v$ be the discrete
valuation ring in the completion $K_v$. For v ∈ S and any Q ≥ 1, we define the
v-adic approximation domain $\Pi_v(Q)$ of level Q to be the parallelepiped in $K_v^{n+1}$
given by
\[ \Pi_v(Q) := \bigl\{ \xi_v \in K_v^{n+1} \bigm| |L_{vi}(\xi_v)|_{v,K} \le Q^{c_{vi}} \ \ \forall i = 0, \dots, n \bigr\}. \]
For archimedean v, this is a compact convex symmetric subset of $K_v^{n+1}$. For
non-archimedean v, it is easy to see that $\Pi_v(Q)$ is a $K_v$-lattice. The approximation
domain is defined in $K_{\mathbb{A}}^{n+1}$ by
\[ \Pi(Q) = \prod_{v \in S} \Pi_v(Q) \times \prod_{v \notin S} R_v^{n+1}. \]
Then Π(Q) is similar to the domains considered in Minkowski's second theorem
(see Theorem C.2.11). The only difference is that the archimedean factors $\Pi_v(Q)$
are closed instead of open. With the same definition for the successive minima
$\lambda_1, \dots, \lambda_{n+1}$ of Π(Q) as before, it is trivial to check that Minkowski's second
theorem also holds for closed domains such as Π(Q). Recall that we use the volume
with respect to the following measures: Let $\beta_v$ and β be the Haar measures on
$K_v$ and $K_{\mathbb{A}}$ respectively, as defined and normalized in C.1.9, and denote by $\beta_v^{n+1}$
and $\beta^{n+1}$ the corresponding measures on $K_v^{n+1}$ and $K_{\mathbb{A}}^{n+1}$.
Lemma 7.5.7. Let $\Delta_v = \det(L_{vi})$ be the determinant of the linear forms
$L_{v0}, \dots, L_{vn}$ and let p be the rational prime with v|p. Then with d = [K : Q]
we have:
(a) If v ∉ S, then
\[ \beta_v^{n+1}(\Pi_v(Q)) = \beta_v(R_v)^{n+1} = |D_{K_v/\mathbb{Q}_p}|_p^{(n+1)/2}. \]
(b) If v ∈ S is not archimedean, then
\[ \beta_v^{n+1}(\Pi_v(Q)) = |D_{K_v/\mathbb{Q}_p}|_p^{(n+1)/2}\, |\Delta_v|_{v,K}^{-d}\, Q^{d \sum_i c_{vi}}. \]
(c) If $K_v = \mathbb{R}$, then
\[ \beta_v^{n+1}(\Pi_v(Q)) = 2^{n+1}\, |\Delta_v|_{v,K}^{-d}\, Q^{d \sum_i c_{vi}}. \]
(d) If $K_v = \mathbb{C}$, then
\[ \beta_v^{n+1}(\Pi_v(Q)) = (2\pi)^{n+1}\, |\Delta_v|_{v,K}^{-d}\, Q^{d \sum_i c_{vi}}. \]
(e) Let r, s be the number of real and complex places. Then
\[ \beta^{n+1}(\Pi(Q)) = 2^{r(n+1)} (2\pi)^{s(n+1)}\, |D_{K/\mathbb{Q}}|^{-(n+1)/2} \Bigl( \prod_{v \in S} |\Delta_v|_{v,K}^{-d} \Bigr) Q^{d \sum_v \sum_i c_{vi}}. \]
Proof: Statement (a) is obvious from the definition of $\beta_v$ and, similarly, we would
get (b), (c), (d) if $\Pi_v(Q)$ is the cuboid $\{ |\xi_{vi}|_v \le Q^{c_{vi}} \mid i = 0, \dots, n \}$. In general,
a transformation of coordinates $\xi_{vi} = L_{vi}(\xi_v)$ is necessary to get $\Pi_v(Q)$ from
the cuboid. Then the volume changes by a factor $|\det(L_{vi})^{-1}|_{v,K}^{d} = |\Delta_v|_{v,K}^{-d}$ (use
C.1.3) proving (b)–(d). Finally (e) follows from (C.4) on page 606.
Corollary 7.5.8. The volume of an approximation domain is
\[ \beta^{n+1}(\Pi(Q)) \ll Q^{-d\varepsilon/2}. \]
Proof: Clear from the preceding lemma and (7.22).
We summarize the results obtained so far as:
Lemma 7.5.9. Suppose the affine subspace theorem in 7.2.5 is false for ε > 0.
Let d = [K : Q], let F be a finite extension of K, which is a field of definition
for all forms $L_{vi}$, and define r = [F : K]. Then there are real numbers $c_{vi}$
(v ∈ S, i = 0, ..., n) with $|c_{vi}| \le 2r$ and
\[ \sum_{v \in S} \sum_{i=0}^{n} c_{vi} \le -\frac{\varepsilon}{2}, \]
and an infinite set X of points of $K^{n+1}$, with the following properties:
(a) Distinct subsets of n elements of X span distinct linear subspaces of $K^{n+1}$.
(b) Let ι be the diagonal embedding $\iota : K^{n+1} \to K_{\mathbb{A}}^{n+1}$. Then for x ∈ X
we have
\[ \iota(x) \in \Pi(H(x)). \]
(c) $\beta^{n+1}(\Pi(Q)) \ll Q^{-d\varepsilon/2}$.
Proof: Since we assume the falsity of the subspace theorem, there is an infinite
set of primitive solutions of inequality (7.18) on page 199 verifying statement (a).
Noting that any infinite subset of this set continues to verify (a) and that there must
be an approximation class containing such an infinite subset, we have (b) for x in
this approximation class in view of (7.21) on page 200. Finally, we have already
verified (c) in Corollary 7.5.8.
7.5.10. We shall show that given ε > 0 and the real numbers $c_{vi}$ the conclusion
of the lemma cannot hold, thereby proving the subspace theorem. The proof is
articulated in two separate parts.
The first part is a natural extension of the Roth machinery to our higher-dimensional
situation. However, to make it work we need to assume an additional
condition, namely that Π(Q) contains n linearly independent points $\iota(y_i)$ (i =
1, ..., n). Then we show that the conclusion of Lemma 7.5.9 cannot hold. Note
that if n = 1 this condition is automatically satisfied (we take $y_1 = x$), but if
n ≥ 2 this appears to be a very restrictive condition.
The second part of the proof is quite different in nature and uses geometry of num-
bers to show, starting with any set of linear forms Lvi for which the conclusion
of Lemma 7.5.9 holds, that there exists a new set of linear forms for which the
same conclusion still holds and moreover the additional condition of existence of
independent points is also verified. Since this contradicts the result in the first part,
the subspace theorem follows.
Before stating the precise result to be proved in the first part of the proof, we need
a definition and two simple results. We assume that the subspace theorem is false,
hence Lemma 7.5.9 is in force for the set X . Let Π(Q) be the approximation
domain of level Q associated to the set {cvi }.
Definition 7.5.11. The K-vector space V(Q) is the linear subspace of $K^{n+1}$
spanned by the points $x \in K^{n+1}$ such that ι(x) ∈ Π(Q). The rank of Π(Q) is
the dimension of the K-vector space V(Q).
Lemma 7.5.12. Let $X = \{x_\nu\}$ and define $Q_\nu = H(x_\nu)$. Then for all but finitely
many $x_\nu \in X$ we have
\[ 1 \le \operatorname{rank}(\Pi(Q_\nu)) \le n. \]
Proof: By Lemma 7.5.9, we have $\iota(x_\nu) \in \Pi(Q_\nu)$. Thus $V(Q_\nu)$ is not the 0
space and $\operatorname{rank}(\Pi(Q_\nu)) \ge 1$.
In order to prove $\operatorname{rank}(\Pi(Q_\nu)) \le n$, we use geometry of numbers. Let
$\lambda_1, \dots, \lambda_{n+1}$ be the successive minima of $\Pi(Q_\nu)$ and let $\iota(x^{(i)})$ be linearly independent
points defining the successive minima $\lambda_i$. Then $V(Q_\nu)$ is the space spanned by
the points $x^{(i)}$ such that $\lambda_i \le 1$ and
\[ \operatorname{rank}(\Pi(Q_\nu)) = \max_i \{\, i \mid \lambda_i \le 1 \,\}. \]
On the other hand, the lower bound in Minkowski's second theorem (see Theorem
C.2.11) shows that
\[ \lambda_{n+1}^{n+1} \ge \lambda_1 \cdots \lambda_{n+1} \gg \beta^{n+1}(\Pi(Q_\nu))^{-1/d}, \]
hence Corollary 7.5.8 yields $\lambda_{n+1} \gg Q_\nu^{\varepsilon/(2n+2)}$ leading to $\lambda_{n+1} > 1$ for large
$Q_\nu$. By Northcott's theorem in 2.4.9, we get the claim.
For the following main result of the first part of the proof, we no longer assume
the falsity of the subspace theorem. We will return to the indirect proof only in the
second part, starting in Step VIII.
Theorem 7.5.13. Let S be a finite set of places on K containing the archimedean
places. For v ∈ S, let $L_{v0}, \dots, L_{vn}$ be independent linear forms on $K^{n+1}$ with
coefficients in a finite extension F/K. Let $c_{vi} \in \mathbb{R}$ (v ∈ S, i = 0, ..., n) with
$\sum_{v,i} c_{vi} < 0$ defining an approximation domain Π(Q) of level Q as in 7.5.6.
Then {V(Q) | Q ≥ 1, rank(Π(Q)) = n} is a finite set.

We break the proof of this theorem into several steps, trying to imitate the
arguments given in Chapter 6 for Roth's theorem. We fix 0 < ε ≤ 1 with
$\sum_{v,i} c_{vi} \le -\varepsilon/2$. The cardinality of the above set of subspaces may be expressed
in terms of K, S, $L_{vi}$, ε (see [108], Th.C), but we restrict our attention to the
qualitative result.
7.5.14. Step I: The auxiliary polynomial.
In the proof of Roth's theorem, we start by constructing a polynomial in several
variables vanishing to high weighted order at points $(\alpha_v, \dots, \alpha_v) \in K_v^m$ with
v ∈ S. We begin here in a similar way, but here it proves to be convenient to work
with multihomogeneous polynomials. So we need to develop some notation first.
For an m-tuple of positive integers $d = (d_1, \dots, d_m)$, we denote by P(d) the
K-vector space of multihomogeneous polynomials $P(x_1, \dots, x_m)$ of degree $d_h$
in the block of variables $x_h$. By Example A.6.13, we can identify
\[ P(d) = \Gamma(\mathbb{P}, O_{\mathbb{P}}(d)), \qquad \mathbb{P} := \underbrace{\mathbb{P}^n_K \times \cdots \times \mathbb{P}^n_K}_{m \text{ factors}}. \]
Let $I = (i_1, \dots, i_m)$ with $i_h = (i_{h0}, \dots, i_{hn})$. Using 6.3.1, we define
\[ \partial_I := \frac{1}{i_1!} \cdots \frac{1}{i_m!} \Bigl( \frac{\partial}{\partial x_1} \Bigr)^{i_1} \cdots \Bigl( \frac{\partial}{\partial x_m} \Bigr)^{i_m} = \partial_{i_1} \cdots \partial_{i_m}. \]
Similarly as in the homogeneous case, the normalizations yield that
\[ f(x) = \sum_I \partial_I f(0)\, x_1^{i_1} \cdots x_m^{i_m} \]
for any polynomial or more generally any power series f.
Let $L(x) = (L_0(x), \dots, L_n(x))$ be linearly independent linear forms over F.
Then, given P ∈ P(d) and the differential operator $\partial_I$, we can write
\[ \partial_I P(x_1, \dots, x_m) = \sum_J a(L; J; I)\, L(x_1)^{j_1} \cdots L(x_m)^{j_m} \]
with $j_h = (j_{h0}, \dots, j_{hn})$, $J = (j_1, \dots, j_m)$, and a(L; J; I) ∈ F. By homogeneity,
we also have $|j_h| = d_h - |i_h|$ for h = 1, ..., m.
If $V = (v_1, \dots, v_m)$, we will write $V_i$ for the vector
\[ V_i = (v_{1i}, \dots, v_{mi}). \]
Also, it will prove to be convenient to use the notation
\[ (V/d) := \frac{|v_1|}{d_1} + \cdots + \frac{|v_m|}{d_m}. \]
Lemma 7.5.15. Let 0 < η ≤ 2/(n + 1), r := [F : K] and
\[ m \ge \frac{4}{(n + 1)(n + 2)\eta^2} \log\bigl(2(n + 1) r |S|\bigr). \]
Then there are constants $C_2$, $C_3$ depending only on the forms $L_{vi}$ and K such
that for $d_1, \dots, d_m$ sufficiently large, there is a non-zero polynomial P ∈ P(d)
with
\[ h(P) \le C_2 |d|, \qquad h((a(L_v; J; I))) \le C_3 |d| \quad (v \in S). \]
Moreover, $a(L_v; J; I) = 0$ for v ∈ S, whenever J and I satisfy
\[ (J_i/d) \le \frac{m}{n + 1} - 2m\eta \quad \text{or} \quad (J_i/d) \ge \frac{m}{n + 1} + 2nm\eta \]
for some i = 0, ..., n and
\[ (I/d) \le m\eta. \]

Here and in the following, the lower bound for $d_1, \dots, d_m$ may depend on the
given data including η, m, and so on.
Proof: We give here a proof which parallels the argument used in Lemma 6.3.4
in the construction of the auxiliary polynomial. We are interested here only in
asymptotics for $d_1 \to \infty, \dots, d_m \to \infty$ for fixed m.
The dimension of the vector space P(d) is the number of m-tuples $J = (j_1, \dots, j_m)$
with non-negative integer components such that $(|j_1|, \dots, |j_m|) = d$.
Since prescribing $j_{hi}$ for i = 0, ..., n − 1 with the sum not exceeding $d_h$ determines
$j_{hn}$, we have $\dim(P(d)) \sim (d_1 \cdots d_m)^n V_0$ with
\[ V_0 = \prod_{h=1}^{m} \int_0^1 \cdots \int_0^1 \chi_{[0,1]}\Bigl( \sum_{i=0}^{n-1} x_{hi} \Bigr)\, dx_{h0} \cdots dx_{h,n-1} \]
and $\chi_{[a,b]}$ the characteristic function of the interval [a, b]. The integral equals
$(n!)^{-1}$ and we obtain
\[ V_0 = (n!)^{-m}. \tag{7.23} \]
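The count behind (7.23) can be made explicit: the number of (n + 1)-tuples of non-negative integers with sum $d_h$ is the binomial coefficient $\binom{d_h + n}{n}$, so the dimension of P(d) is a product of such binomials and its leading term is $(d_1 \cdots d_m)^n (n!)^{-m}$. The sketch below, with function names of our own choosing, compares the exact dimension with a brute-force count and with the leading term.

```python
from itertools import product
from math import comb, factorial, prod

def dim_P(d, n):
    """Exact dimension of the space of multihomogeneous polynomials of multidegree d
    in m blocks of n + 1 variables."""
    return prod(comb(dh + n, n) for dh in d)

def dim_P_bruteforce(d, n):
    """Same count, by listing (j_0, ..., j_{n-1}) with sum <= d_h for each block."""
    def count(dh):
        return sum(1 for j in product(range(dh + 1), repeat=n) if sum(j) <= dh)
    return prod(count(dh) for dh in d)

def ratio_to_leading_term(d, n):
    """dim P(d) divided by (d_1 ... d_m)^n (n!)^(-m); tends to 1 as all d_h grow."""
    return dim_P(d, n) * factorial(n) ** len(d) / prod(d) ** n

print(dim_P((3, 2), 2), dim_P_bruteforce((3, 2), 2))
print([ratio_to_leading_term((D, D), 2) for D in (10, 100, 1000)])
```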
Let us fix v ∈ S and compute, for η > 0, an asymptotic upper bound for the
number of linear conditions (with coefficients in F) imposed on an element P of
P(d) by the vanishing of the coefficients $a(L_v; J; 0)$ whenever
\[ (J_i/d) \le \frac{m}{n + 1} - m\eta \tag{7.24} \]
for some fixed i, for example i = 0. The number of such m-tuples J is asymptotic
to $(d_1 \cdots d_m)^n V$ with
\[ V = \int_0^1 \cdots \int_0^1 \chi_{[0,\, \frac{m}{n+1} - m\eta]}\Bigl( \sum_{h=1}^{m} x_{h0} \Bigr) \prod_{h=1}^{m} \chi_{[0,\, 1 - x_{h0}]}\Bigl( \sum_{i=1}^{n-1} x_{hi} \Bigr) \prod_{h=1}^{m} \prod_{i=0}^{n-1} dx_{hi}. \]
Therefore, we have
\[ V = \int_0^1 \cdots \int_0^1 \chi_{[0,\, \frac{m}{n+1} - m\eta]}\Bigl( \sum_{h=1}^{m} x_{h0} \Bigr) \prod_{h=1}^{m} \frac{(1 - x_{h0})^{n-1}}{(n - 1)!}\, dx_{10} \cdots dx_{m0}. \]
In order to obtain a good upper bound for V, we proceed as in Lemma 6.3.5 noting
the majorization
\[ \chi_{[a,b]}(x) \le e^{\lambda (b - x)}, \]
valid for λ ≥ 0. This decouples the variables $x_{h0}$ and we get
\[ V \le \Bigl( e^{\lambda/(n+1) - \lambda\eta} \int_0^1 e^{-\lambda x}\, \frac{(1 - x)^{n-1}}{(n - 1)!}\, dx \Bigr)^{m} \]
for any λ ≥ 0. Suppose now that 0 < λ ≤ n + 4. By expanding the exponential
into a MacLaurin series and integrating term by term we obtain
\[ \int_0^1 e^{-\lambda x}\, \frac{(1 - x)^{n-1}}{(n - 1)!}\, dx = \sum_{k=0}^{\infty} \frac{(-\lambda)^k}{(n + k)!} \le \frac{1}{n!} \Bigl\{ 1 - \frac{\lambda}{n + 1} + \frac{\lambda^2}{(n + 1)(n + 2)} \Bigr\} < \frac{1}{n!} \exp\Bigl( -\frac{\lambda}{n + 1} + \frac{\lambda^2}{(n + 1)(n + 2)} \Bigr). \]
Hence from the last two displayed equations we conclude that
\[ V < \frac{1}{(n!)^m}\, e^{-m\lambda\eta + m\lambda^2/((n+1)(n+2))} \]
provided 0 < λ ≤ n + 4. If we choose for example λ = η(n + 1)(n + 2)/2 with
η ≤ 2/(n + 1), we get
\[ V < \frac{1}{(n!)^m}\, e^{-(n+1)(n+2)\eta^2 m/4}. \tag{7.25} \]
Now we apply the relative Siegel lemma in 2.9.19 to find a non-zero P with small
coefficients satisfying (7.24) for some i = 0, . . . , n. The calculation is entirely
analogous to the proof of Lemma 6.3.4. The final result is that, if
1
r(n + 1)|S| V /V0 ≤ , (7.26)
2
then h(P ) ≤ C2 |d|, for d1 , . . . , dm sufficiently large and with C2 depending
only on the forms Lvi and K .
Applying a differential operator ∂I to P increases the height by not more than
(log 2)|d|, proving h(∂I P ) ≤ C2 |d|. Then a bound h((a(Lv ; J; I))) ≤ C3 |d| is
obtained looking at (∂I P )(Mv (x)), where Mv ◦ Lv (x) = x .
7.5. Proof of the subspace theorem 207

We note the vanishing of a(Lv ; J; I) whenever


(Ji /d) ≤ m/(n + 1) − 2mη
for some i = 0, . . . , n and for every I with (I/d) ≤ mη .
For (I/d) ≤ mη, the vanishing of a(Lv ; J; I) in the complementary range
\[ (J_i/d) \ge \frac{m}{n+1} + 2nm\eta \quad \text{for some } i \]
is automatic, because a(P ; J; I) ≠ 0 only if (Ji /d) > m/(n + 1) − 2mη for every i. In view of the condition |jh| ≤ dh, this implies that J verifies
\[ (J_{i_0}/d) < \sum_{i=0}^{n} (J_i/d) - \Big(\frac{m}{n+1} - 2m\eta\Big) n \le m - \frac{mn}{n+1} + 2mn\eta = \frac{m}{n+1} + 2mn\eta \]
for every i0, hence J is outside of the complementary range defined above.
It remains to verify condition (7.26). By (7.23) on page 205 and (7.25) this is verified as soon as m is so large that
\[ r(n+1)|S|\, e^{-(n+1)(n+2)\eta^2 m/4} \le \frac{1}{2}, \]
which is our initial assumption in the lemma.
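Solving the last inequality for m gives m ≥ 4 log(2(n + 1)r|S|)/((n + 1)(n + 2)η²). The sketch below computes the smallest admissible integer m; the function names are hypothetical, and r and |S| are passed as plain integers `r` and `s`.

```python
import math

def minimal_m(n, r, s, eta):
    """Smallest integer m with r*(n+1)*s*exp(-(n+1)*(n+2)*eta^2*m/4) <= 1/2."""
    threshold = 4.0 * math.log(2 * (n + 1) * r * s) / ((n + 1) * (n + 2) * eta ** 2)
    return max(1, math.ceil(threshold))

def condition_holds(n, r, s, eta, m):
    """Direct check of the inequality, without solving for m."""
    return r * (n + 1) * s * math.exp(-(n + 1) * (n + 2) * eta ** 2 * m / 4.0) <= 0.5

# Example: n = 2, r = 1, |S| = 1, eta = 0.1 gives m = 60.
print(minimal_m(2, 1, 1, 0.1))
```

The example illustrates how quickly m grows as η decreases, since the threshold is proportional to η⁻².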
7.5.16. Step II: A generalization of Roth’s lemma.
In the higher-dimensional setting of the subspace theorem, the non-vanishing of
the auxiliary polynomial cannot be obtained at a single point (x1 , . . . , xm ) and
has to be replaced by the vanishing on a product V1 × · · · × Vm , where the factors
Vh are K -vector subspaces of K n+1 , all of dimension n . Accordingly, the notion
of index of a multihomogeneous polynomial has to be changed as follows.
Let M1 , . . . , Mm ∈ Q[x0 , . . . , xn ] be non-zero linear forms, let M =
(M1 , . . . , Mm ), and let d = (d1 , . . . , dm ) be an m -tuple of positive numbers.
Definition 7.5.17. Denote by I(t; d; M) the ideal in Q[x1 , . . . , xm ] generated by all monomials $M_1(x_1)^{j_1} \cdots M_m(x_m)^{j_m}$ with
\[ (j/d) \ge t. \]
Then for a multihomogeneous polynomial P, the index of P with respect to M and weights d is
\[ \operatorname{ind}(P; d; M) := \sup\{t \ge 0 \mid P \in I(t; d; M)\}. \]
7.5.18. For n = 1, Mj = x1 −αj x0 (j = 1, . . . , m) and Q := P (1, x1 , . . . , 1, xm),
it is clear that ind(P ; d; M) is the same as ind(Q; d; α) defined in 6.3.2. Clearly,
the properties from 6.3.2 extend to the multihomogeneous case.
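In the case n = 1 of 7.5.18, the index can be read off from the expansion of Q in powers of (x_h − α_h): it is the smallest weighted degree j_1/d_1 + · · · + j_m/d_m among the terms that actually occur. A minimal sketch follows, assuming the polynomial is already given by its expansion coefficients; the dictionary representation and the function name are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def index(terms, d):
    """Index of a polynomial given as {(j_1, ..., j_m): coefficient}, where the key
    lists the exponents of (x_1 - alpha_1), ..., (x_m - alpha_m), with weights d."""
    weights = [
        sum(Fraction(jh, dh) for jh, dh in zip(j, d))
        for j, coeff in terms.items()
        if coeff != 0
    ]
    if not weights:
        raise ValueError("the index of the zero polynomial is not defined here")
    return min(weights)

# Q = (x1 - a1)^2 + (x1 - a1)(x2 - a2)^3 with weights d = (4, 6):
# the two terms have weighted degrees 2/4 and 1/4 + 3/6, so the index is 1/2.
print(index({(2, 0): 1, (1, 3): 1}, (4, 6)))
```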

Now we can state the generalized Roth’s lemma:

Lemma 7.5.19. Let P (x1 , . . . , xm ) ∈ Q[x1 , . . . , xm ] be a multihomogeneous polynomial, not identically 0, with partial degrees at most d1 , . . . , dm . Let M = (M1 , . . . , Mm ) be non-zero linear forms in Q[x0 , . . . , xn ], and let 0 < σ ≤ 1/2.
Suppose that:
(a) the degrees d1 , . . . , dm are rapidly decreasing, namely
\[ d_{j+1}/d_j \le \sigma \quad (j = 1, \dots, m-1); \]
(b) the linear forms Mj have large height, namely
\[ \min_j d_j h(M_j) \ge n\sigma^{-1}(h(P) + 4m d_1). \]
Then
\[ \operatorname{ind}(P; d; M) \le 2m\sigma^{1/2^{m-1}}. \]
Proof: The idea is to specialize, in each group of variables xh , all variables
except two to 0 (which we may relabel as (xh0 , xh1 )) and apply Roth’s lemma
from 6.3.7. However, we must make sure that the specialized polynomial is not
identically 0 and also we must be able to compare index and heights before and
after specialization.
We may assume n ≥ 2 since if n = 1 this is simply Roth’s lemma in a homoge-
neous setting. Let F be a number field containing the coefficients of P and M.
Let b = (b0 , . . . , bn ) ∈ F^{n+1} \ {0} and suppose for simplicity that b0 ≠ 0. Then
\[ h(b) \le \sum_{v\in M_F}\sum_{i=1}^{n} \log(\max(|b_0|_v, |b_i|_v)) \le n \max_{i=1,\dots,n} h((b_0, b_i)). \]

Thus, after relabeling the variables (xj0 , . . . , xjn ) we may assume that the linear
forms
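\[ M_j(x_j) = b_{j0}x_{j0} + \cdots + b_{jn}x_{jn} \]
specialize under xji = 0 (i = 2, . . . , n) to
\[ \widetilde{M}_j(x_j) = b_{j0}x_{j0} + b_{j1}x_{j1} \]
with
\[ h(M_j) \ge h(\widetilde{M}_j) \ge \frac{1}{n}\, h(M_j). \]
Since bj0 ≠ 0, we can write uniquely
\[ P = \sum_{\mathbf{j}} c(\mathbf{j})\, M_1(x_1)^{j_1} \cdots M_m(x_m)^{j_m}\, q_{\mathbf{j}}(x_1, \dots, x_m), \tag{7.27} \]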

where the coefficients qj are polynomials in the variables xj = (xj1 , . . . , xjm ).
This makes it plain that in computing the index of P we may restrict ourselves to
decompositions as in (7.27).
After removing from P the highest factor $x_{12}^k$ dividing it, we specialize x12 = 0, obtaining a new polynomial P∗, not identically 0. Since the coefficients of P∗ are a subset of the set of coefficients of P, we certainly have
\[ h(P^*) \le h(P). \]
Moreover, by the uniqueness of the decomposition (7.27), if $x_{12}^k$ divides P, then every qj in (7.27) is divisible by $x_{12}^k$, and it follows that
\[ \operatorname{ind}(P^*; M|_{x_{12}=0}; d) = \operatorname{ind}(P; M; d). \]
Proceeding step-by-step in this way, we eventually arrive at a multihomogeneous polynomial $\widetilde{P}(\widetilde{x}_1, \dots, \widetilde{x}_m)$, in the variables $\widetilde{x}_j = (x_{j0}, x_{j1})$, not identically 0, with multidegree $\widetilde{r} \le d$ componentwise, such that
\[ h(\widetilde{P}) \le h(P), \quad \operatorname{ind}(\widetilde{P}; \widetilde{M}; d) = \operatorname{ind}(P; M; d), \quad h(\widetilde{M}_j) \ge h(M_j)/n \]
for j = 1, . . . , m.
We apply Roth’s lemma in 6.3.7 to the polynomial $\widetilde{P}$, with $\widetilde{M}$ in place of ξ (this is due to the fact that we work here in a homogeneous setting). Since $d_j h(\widetilde{M}_j) \ge n^{-1} d_j h(M_j)$, and $h(\widetilde{P}) \le h(P)$, condition (b) of Roth’s lemma is verified as soon as
\[ \min_j d_j h(M_j) \ge n\sigma^{-1}(h(P) + 4m d_1). \]
Therefore
\[ \operatorname{ind}(P; M; d) = \operatorname{ind}(\widetilde{P}; \widetilde{M}; d) \le 2m\sigma^{1/2^{m-1}}. \]
7.5.20. Step III: The height of V (Q).
As in Section 2.8, we define the height of a non-zero subspace V of Q by h(V ) :=
h(b1 ∧ · · · ∧ bk ), where b1 , . . . , bk is a basis of V . In the Roth case n = 1 we
have dim(V (Q)) = 1 and V (Q) is generated by x , hence
h(V (Q)) = h(x) = log(Q).
The goal of this key step is to show that a result of comparable strength still holds
if n ≥ 2.
Lemma 7.5.21. Let rank(Π(Q)) = n, let $c_{\max} = \max_{vi} c_{vi}$, and suppose that log(Q) ≥ C4 ε^{−1}. There is a linear space W, independent of Π(Q) and ε, such that either V (Q) = W or
\[ (4r|S|)^{-1}\varepsilon \log(Q) - C_5 \le h(V(Q)) \le n c_{\max}|S| \log(Q) + C_6. \]
Proof: Let $y^{(1)}, \dots, y^{(n)}$ be a basis of V (Q) with $\iota(y^{(i)}) \in \Pi(Q)$. Then V (Q) is given by an equation
\[ y^{(1)} \wedge \cdots \wedge y^{(n)} \wedge x = 0. \tag{7.28} \]
Hence
\[ h(V(Q)) = h(y^{(1)} \wedge \cdots \wedge y^{(n)}) \le \sum_{i=1}^{n} h(y^{(i)}) + C_6. \tag{7.29} \]

We also have for v ∈ S, using $\iota(y^{(i)}) \in \Pi(Q)$ and that the forms Lvi (i = 0, . . . , n) are linearly independent,
\[ |y^{(i)}|_{v,K} \ll |L_v(y^{(i)})|_{v,K} \le \max_j Q^{c_{vj}} \le Q^{c_{\max}}, \]
while $|y^{(i)}|_v \le 1$ for v ∉ S. In view of (7.29), we get the upper bound for h(V (Q)).
The proof of the lower bound is more intricate. For the linear forms
\[ \widehat{L}_{vk} = (L_{v0} \wedge \cdots \wedge L_{v,k-1} \wedge L_{v,k+1} \wedge \cdots \wedge L_{vn})^* \quad (k = 0, \dots, n) \]
we set
\[ D_{vk} = \widehat{L}_{vk}\big((y^{(1)} \wedge \cdots \wedge y^{(n)})^*\big). \]
Since the ∗-operator is an isometry and by Laplace’s identity (7.16) on page 198 we have
\[ D_{vk} = \det\big(L_{vi}(y^{(j)})\big)_{i\in\{0,\dots,n\}\setminus\{k\};\ j\in\{1,\dots,n\}}. \]
Therefore, expanding the determinant and using $|L_{vi}(y^{(j)})|_{v,K} \le Q^{c_{vi}}$, we find
\[ |D_{vk}|_{v,K} \le \max(1, |n!|_v) \max_\pi \prod_{i\ne k} |L_{vi}(y^{(\pi(i))})|_{v,K} \le \max(1, |n!|_v)\, Q^{-c_{vk} + \sum_i c_{vi}}, \tag{7.30} \]
where π ranges over all bijective mappings π : {0, . . . , n} \ {k} → {1, . . . , n}.
Now suppose for the time being that there is an |S|-tuple {iv} such that
\[ \sum_{v\in S} c_{v i_v} \ge -\frac{\varepsilon}{4}, \qquad D_{v i_v} \ne 0 \quad (v \in S). \tag{7.31} \]
Then by (7.30) we get
\[ 0 < \prod_{v\in S} |D_{v i_v}|_{v,K} \le n!\, Q^{-\sum_v c_{v i_v} + \sum_v \sum_i c_{vi}} \le n!\, Q^{\frac{\varepsilon}{4} - \frac{\varepsilon}{2}} = n!\, Q^{-\frac{\varepsilon}{4}}. \tag{7.32} \]
Since $0 \ne D_{v i_v} \in F$, the fundamental inequality in (1.8) on page 20 yields
\[ \sum_{v\in S} \log |D_{v i_v}|_{v,K} \ge -r \sum_{v\in S} h(D_{v i_v}) = -r \sum_{v\in S} h\big(\widehat{L}_{v i_v}((y^{(1)} \wedge \cdots \wedge y^{(n)})^*)\big) \ge -r|S|\, h(V(Q)) - C_7. \tag{7.33} \]
By (7.32) and (7.33), we get
\[ -r|S|\, h(V(Q)) - C_7 \le -\frac{\varepsilon}{4} \log(Q) + \log(n!), \]
thereby proving the lower bound estimate for h(V (Q)).
It remains to verify the preceding assumption on the existence of the |S|-tuple
{iv }. Hence let us suppose that this is not the case. Let
\[ I_v = \{i \in \{0, \dots, n\} \mid D_{vi} \ne 0\}. \]
The system of equations
\[ \widehat{L}_{vi}(y) = 0 \quad (i \in \{0, \dots, n\} \setminus I_v) \]
has the non-trivial solution y = (y(1) ∧ · · · ∧ y(n) )∗ ∈ K n+1 \ {0}. However,
this system of equations depends only on the linear forms Lvi and not on Π(Q).
Hence let us fix a non-trivial solution w ∈ K n+1 \ {0} of this system (if we deal
with an empty system, any non-zero vector in K n+1 will work here) and let W
be the K -vector space W = {x ∈ K n+1 | w · x = 0}. We will show that, if
log(Q) is sufficiently large, then V (Q) ⊂ W , hence V (Q) = W because they
have the same dimension n, which is the alternative conclusion of the lemma.
In order to prove this, we deduce from Laplace’s expansion (7.17) on page 198 the identity
\[ x \cdot w = \det(L_v)^{-1} \sum_{i=0}^{n} (-1)^i L_{vi}(x)\, \widehat{L}_{vi}(w) \]
for any x and w. Moreover, in our case we have $\widehat{L}_{vi}(w) = 0$ for i ∉ Iv, hence
\[ x \cdot w = \det(L_v)^{-1} \sum_{i\in I_v} (-1)^i L_{vi}(x)\, \widehat{L}_{vi}(w). \tag{7.34} \]
Now for each v ∈ S choose jv such that
\[ c_{v j_v} = \max_{i\in I_v} c_{vi} \]
and note that, if x ∈ V (Q) \ W with ι(x) ∈ Π(Q), then we have
\[ |L_{vi}(x)|_{v,K} \le Q^{c_{vi}} \le Q^{c_{v j_v}} \quad (i \in I_v). \]
Thus for v ∈ S and every such x we have a bound
\[ |x \cdot w|_v \ll Q^{c_{v j_v}}. \tag{7.35} \]
On the other hand, since jv ∈ Iv our assumption of the non-existence of a good |S|-tuple as in (7.31) shows that
\[ \sum_{v\in S} c_{v j_v} < -\varepsilon/4. \]
Then, by the product formula, (7.34), (7.35), and the fundamental inequality (1.8) on page 20, we find
\[ 1 = \prod_{v\in M_K} |x \cdot w|_v = \prod_{v\in S} |x \cdot w|_v \prod_{v\notin S} |x \cdot w|_v \le \prod_{v\in S} |x \cdot w|_v \prod_{v\notin S} |w|_v \ll Q^{\sum_{v\in S} c_{v j_v}} \ll Q^{-\varepsilon/4}. \]
Therefore, log(Q) ≪ ε^{−1}, contradicting, for large C4, the hypothesis log(Q) ≥ C4 ε^{−1} of the lemma.
7.5.22. Step IV: Application of the generalized Roth’s lemma.
Here we combine Lemma 7.5.15 and the generalized Roth’s lemma from 7.5.19,
obtaining a polynomial vanishing to high order at Lv , v ∈ S , but not identically
0 when restricted to V1 × · · · × Vm , where the Vh s are suitable linear subspaces
of K n+1 .
We choose parameters as follows
\[ m \ge \frac{4}{(n+1)(n+2)\eta^2} \log(2(n+1)[F:K]\,|S|), \qquad \sigma = (\eta/4)^{2^{m-1}}, \qquad \eta \le \frac{1}{n+1}. \]
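The choice of σ is tailored to the bound of Lemma 7.5.19: with σ = (η/4)^{2^{m−1}} one has 2mσ^{1/2^{m−1}} = 2m · η/4 = mη/2. The following sketch verifies this with exact rational arithmetic; the function names are hypothetical and η is assumed to be rational.

```python
from fractions import Fraction

def sigma(eta, m):
    """sigma = (eta/4)^(2^(m-1)), computed exactly for rational eta."""
    return (Fraction(eta) / 4) ** (2 ** (m - 1))

def roth_index_bound(eta, m):
    """The bound 2*m*sigma^(1/2^(m-1)), which simplifies to 2*m*(eta/4) = m*eta/2."""
    return 2 * m * (Fraction(eta) / 4)

eta, m = Fraction(1, 4), 3
print(sigma(eta, m))             # 1/65536
print(roth_index_bound(eta, m))  # 3/8
```

Even for small m the value of σ is extremely small, which reflects how rapidly the degrees d_j must decrease in hypothesis (a) of Lemma 7.5.19.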
We prove Theorem 7.5.13 by contradiction. Then there is a sequence Qν ≥ 1 with rank(Πv (Qν )) = n and V (Qν ) ≠ V (Qµ ) for µ ≠ ν. We may assume
that V (Qν ) omits the exceptional subspace W from Lemma 7.5.21. Going to a
subsequence (again denoted by Qν ), we may assume that log(Qν ) → ∞ at an
arbitrarily fast rate. Indeed, if this were not the case, then V (Qν ) would have
bounded height and, by Northcott’s theorem in 2.4.9, there would be only finitely
many spaces V (Qν ). Hence we may and shall assume that
\[ \log(Q_1) \ge C, \qquad \log(Q_{j+1}) \ge 2\sigma^{-1} \log(Q_j) \tag{7.36} \]
for every j and any given constant C , which may depend on the given parameters.
Since rank(Π(Qν )) = n , the vector space V (Qν ) is defined by a single equation
bν0 x0 + · · · + bνn xn = 0
and we denote by Mν the associated linear form, thus (7.28) on page 209 shows
h(V (Qj )) = h(Mj )
for every j. Now we take dj = ⌊D/ log Qj⌋ (j = 1, . . . , m) and apply Lemma
7.5.15. This gives us, for large D , a certain non-zero polynomial P of multidegree
d. We claim that P and M = (M1 , . . . , Mm ) satisfy the hypotheses of the
generalized Roth lemma in 7.5.19.
In order to verify (a) and (b) of Lemma 7.5.19 we appeal to Lemma 7.5.21.
Condition (a), in view of our choice of the weights dj, follows from (7.36) for D sufficiently large, which we will assume from now on.
For condition (b), we note first that, if log(Q1) ≥ C4 ε^{−1}, we have by Lemma 7.5.21 the estimate
\[ d_j h(M_j) = d_j h(V(Q_j)) \ge \lfloor D/\log(Q_j) \rfloor \big((4r|S|)^{-1}\varepsilon \log(Q_j) - C_5\big) \ge C_8\, \varepsilon D. \]
On the other hand, the bound for h(P) from Lemma 7.5.15 and the just verified dj+1 ≤ σdj yield
\[ n\sigma^{-1}(h(P) + 4m d_1) \le n\sigma^{-1}(C_2 + 4m)(D/\log(Q_1))(1 + \sigma + \cdots + \sigma^{m-1}) \ll \sigma^{-1} m D/\log(Q_1), \]
which is negligible with respect to εD if log(Q1) ≥ C9 (εσ)^{−1} m with C9 large enough. This proves that condition (b) is satisfied for large log(Q1).
Therefore, Lemma 7.5.19 yields
ind(P ; d; M) ≤ mη/2.
It follows that there is I = (i1 , . . . , im ) with
(I/d) ≤ mη/2
such that ∂I P does not vanish identically on the product space
V (Q1 ) × · · · × V (Qm ).
7.5.23. Step V: Non-vanishing at a small point of V (Q1 ) × · · · × V (Qm ).
What we really want is the non-vanishing of a derivative of P at a point Y =
(y1 , . . . , ym ) with yh of small height, say comparable with Qh . The next easy
lemma allows us to do so from the information we have gathered so far.
Lemma 7.5.24. Let k be a field of characteristic 0 and x1 , . . . , xN algebraically independent over k. Let f (x1 , . . . , xN ) ∈ k[x1 , . . . , xN ] be a non-zero polynomial of degree at most ej in xj, and let B > 0. Then there are rational integers zj and ij (j = 1, . . . , N ), with
\[ |z_j| \le B, \qquad 0 \le i_j \le e_j/B \]
for j = 1, . . . , N, such that for i = (i1 , . . . , iN ) we have
\[ \partial_i f(z_1, \dots, z_N) \ne 0. \]
Proof: By induction on N. If N = 1, it suffices to note that f cannot be divisible by $\{\prod_{|b|\le B}(x_1 - b)\}^{\lfloor e_1/B \rfloor + 1}$, which has degree strictly greater than e1. The general case is a straightforward induction.
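The case N = 1 of the lemma can be illustrated by a direct search. The sketch below is an illustration only: the coefficient-list representation and the function names are assumptions, and ordinary derivatives are used in place of the normalized operators ∂_i, which does not affect non-vanishing in characteristic 0.

```python
from math import floor

def derivative(coeffs):
    """Derivative of a polynomial given by its coefficients, lowest degree first."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Horner evaluation; the empty list represents the zero polynomial."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value

def find_nonvanishing(coeffs, B):
    """Return (z, i) with |z| <= B, 0 <= i <= e/B and f^(i)(z) != 0,
    where e = len(coeffs) - 1 bounds the degree of the non-zero polynomial f."""
    e = len(coeffs) - 1
    bound = floor(B)
    f = list(coeffs)
    for i in range(floor(e / B) + 1):
        for z in range(-bound, bound + 1):
            if evaluate(f, z) != 0:
                return z, i
        f = derivative(f)
    raise AssertionError("excluded by the lemma for a non-zero polynomial")

# f = x^2 (x - 1)^2 (x + 1)^2 = x^6 - 2x^4 + x^2 vanishes to order 2 at -1, 0, 1.
print(find_nonvanishing([0, 0, 1, 0, -2, 0, 1], 1))  # (-1, 2)
```

In the example, f and f′ vanish at every integer of absolute value at most 1, so a derivative of order 2 is needed, in agreement with the degree count in the proof.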
The general idea in applying this lemma is to consider many points which are linear
combinations of a basis y(i) with small coefficients and show that the auxiliary
polynomial we have constructed cannot vanish at every such point. However, we
must be sure that the height of the polynomial evaluated at the point does not
increase more than O(|d|). This means that the size B of the coefficients must
be kept bounded and then we would have too few points at our disposal to meet
our goal. On the other hand, applying a differential operator ∂I only increases
the height by O(|d|), and ∂i f cannot vanish at a point for every i unless f is
identically 0. Thus, if we vary not only the choice of the point but also vary the
polynomial by applying a differential operator, we can prove what we want without
increasing the height too much in the process.
The details are as follows. Let $y_h^{(1)}, \dots, y_h^{(n)} \in K^{n+1}$ be linearly independent points with $\iota(y_h^{(l)}) \in \Pi(Q_h)$, which is possible because Π(Qh) has rank n. We write $z_h = (z_{h1}, \dots, z_{hn})$, $Z = (z_1, \dots, z_m)$, and $Y_h = (y_h^{(1)}, \dots, y_h^{(n)})$. Then the polynomial
\[ R(Z) := (\partial_I P)(z_1 \cdot Y_1, \dots, z_m \cdot Y_m) \]
is not identically 0 because ∂I P does not vanish identically on V (Q1) × · · · × V (Qm). Clearly, R(Z) has degree at most dh in the block of variables zh. Hence by the previous lemma there are a point $Z' \in \mathbb{Z}^{mn}$, and rational integers jhl (h = 1, . . . , m, l = 1, . . . , n), such that
\[ |z'_{hl}| \le B, \qquad 0 \le j_{hl} \le d_h/B \]
and
\[ (\partial_J R)(Z') \ne 0. \]
Since ∂J R is a linear combination of derivatives $\partial_{I'} P$ evaluated at the point
\[ X' = (x'_1, \dots, x'_m) := \Big(\sum_{l=1}^{n} z'_{1l}\, y_1^{(l)}, \dots, \sum_{l=1}^{n} z'_{ml}\, y_m^{(l)}\Big), \]
there is a differential operator $\partial_{I'}$ with $I' = I + I^*$ for some $I^* = (i_1^*, \dots, i_m^*)$ with $|i_h^*| \le n d_h/B$, such that
\[ (\partial_{I'} P)(X') \ne 0. \]
Hence $(I'/d) \le \frac{m\eta}{2} + \frac{mn}{B}$ and if we choose B = 2n/η we see that
\[ (I'/d) \le m\eta. \tag{7.37} \]
We write $T(X) := \partial_{I'} P(X)$.
Lemma 7.5.25. Let m and 0 < η ≤ 1/(n + 1) be given, with
\[ m \ge \frac{4}{(n+1)(n+2)\eta^2} \log(2(n+1)[F:K]\,|S|). \]
Let $\sigma = (\eta/4)^{2^{m-1}}$ and let Qh (h = 1, . . . , m) be such that rank(Π(Qh)) = n and, for a certain constant C10,
\[ \log(Q_1) \ge C_{10}(\varepsilon\sigma)^{-1} m, \qquad \log(Q_{h+1}) \ge 2\sigma^{-1} \log(Q_h) \quad (h = 1, \dots, m-1). \]
For h = 1, . . . , m, let $y_h^{(l)}$ (l = 1, . . . , n) be a basis of V (Qh) such that $\iota(y_h^{(l)}) \in \Pi(Q_h)$. Then there are a non-zero multihomogeneous polynomial T (X) ∈ K[X], of multidegree majorized by d componentwise, and rational integers $z'_{hl}$ with
\[ |z'_{hl}| \le 2n/\eta \]
with the following properties. Let $x'_h = \sum_{l=1}^{n} z'_{hl}\, y_h^{(l)}$. Then:
(a) $T(X') \ne 0$.
(b) $h(T) \ll d_1$.
(c) If v ∈ S and $T(X) = \sum_J a(L_v; J)\, L_v(x_1)^{j_1} \cdots L_v(x_m)^{j_m}$, then
\[ a(L_v; J) = 0 \quad \text{unless} \quad \Big| (J_i/d) - \frac{m}{n+1} \Big| \le 2nm\eta \ \text{ for every } i. \]
(d) $h((a(L_v; J))) \ll d_1$ for every v ∈ S.
Proof: Statement (a) follows from the construction of X′. Also, (b) and (d) follow from h(P) ≪ |d| (see Lemma 7.5.15) and |d| ≪ d1. Finally, we note that a(Lv ; J) is non-zero if and only if a(Lv ; J; I′) is non-zero. Hence (c) follows from (7.37) and Lemma 7.5.15.
7.5.26. Step VI: Conclusion of the proof of Theorem 7.5.13.
The proof of Theorem 7.5.13 is now easy and follows the blueprint of the proof of
Roth’s theorem. On the one hand, we have by the product formula and T (X′) ≠ 0 that
\[ \sum_{v\in M_K} \log |T(X')|_v = 0. \]
On the other hand, $|x'_h|_v \le 1$ for v ∉ S and hence
\[ \sum_{v\in M_K} \log |T(X')|_v \le \sum_{v\in S} \log |T(X')|_v + \sum_{v\notin S} \log |T|_v. \]

By a change of coordinates, it is easy to check that


 
max |a(Lv ; J)|v,K = log |T |v + O(d1 )
J
v∈S v∈S

and hence the last three displayed formulas and Lemma 7.5.25 lead to
\[ 0 \le \sum_{v\in S} \log \max_{a(L_v;J)\ne 0} \big| L_v(x'_1)^{j_1} \cdots L_v(x'_m)^{j_m} \big|_{v,K} + O(d_1). \]
For v ∈ S and a(Lv ; J) ≠ 0, using
\[ d_h \log(Q_h) = D + o(D), \qquad |z'_{hl}| \le 2n/\eta, \qquad |L_{vi}(y_h^{(l)})|_{v,K} \le Q_h^{c_{vi}}, \]
and Lemma 7.5.25 (c), we have for D sufficiently large
\begin{align*}
\log \big| L_v(x'_1)^{j_1} \cdots L_v(x'_m)^{j_m} \big|_{v,K}
&= \sum_{h=1}^{m}\sum_{i=0}^{n} j_{hi} \log |L_{vi}(x'_h)|_{v,K} \\
&\le \sum_{h=1}^{m}\sum_{i=0}^{n} j_{hi} \{O(1) + \max_{hl} \log |z'_{hl}|_v + c_{vi} \log(Q_h)\} \\
&= \sum_{h=1}^{m}\sum_{i=0}^{n} c_{vi}\, \frac{j_{hi}}{d_h}\, d_h \log(Q_h) + O(\log(1/\eta)\, d_1) \\
&= D \sum_{i=0}^{n} c_{vi}\, (J_i/d) + O(\log(1/\eta)\, D/\log(Q_1)) \\
&= \sum_{i=0}^{n} c_{vi}\, \frac{m}{n+1}\, D + O(m\eta D) + O(\log(1/\eta)\, D/\log(Q_1)).
\end{align*}
Therefore, putting together the last two sequences of inequalities and recalling the definition of ε at the beginning of the proof of Theorem 7.5.13, we deduce
\begin{align*}
0 &\le \sum_{v\in S}\sum_{i=0}^{n} c_{vi}\, \frac{m}{n+1}\, D + O(m\eta D) + O(\log(1/\eta)\, D/\log(Q_1)) \\
&\le -\frac{\varepsilon/2}{n+1}\, m D + O(m\eta D) + O(\log(1/\eta)\, D/\log(Q_1)).
\end{align*}
Hence, dividing by mD, we conclude that
\[ 0 \le -\frac{\varepsilon/2}{n+1} + O(\eta) + O\Big(\frac{\log(1/\eta)}{m \log(Q_1)}\Big). \]
Since we are allowed to take Q1 arbitrarily large, we get 0 ≤ −ε/(2n+2)+O(η), a contradiction if η is positive and small. This completes the proof of Theorem 7.5.13.
7.5.27. Step VII: A general strategy and Evertse’s lemma.
In order to provide some motivation for what follows, we begin by describing in a
special case Schmidt’s original strategy for the proof of the subspace theorem.
Consider the simplest case, namely n = 2, K = Q , S = {∞} and three linear
forms L1 , L2 , L3 . Let λ1 , λ2 , λ3 be the three successive minima of Π(Qν ). If
λ1 ≤ λ2 ≤ 1, the rank is 2 and Theorem 7.5.13 can be applied. If instead
λ1 ≤ 1 < λ2 ≤ λ3, the rank is 1. By Minkowski’s second theorem, we have
\[ \beta^3(\Pi(Q_\nu))^{-1} \ll \lambda_1\lambda_2\lambda_3 \ll \beta^3(\Pi(Q_\nu))^{-1}. \]
The volume of Π(Qν) is relatively small, namely
\[ \beta^3(\Pi(Q_\nu)) \ll Q_\nu^{-\varepsilon/2}. \]
Consider now the new linear forms $L_{ij} = L_i \wedge L_j$ in the space $V = \mathbb{R}^3 \wedge \mathbb{R}^3 \cong \mathbb{R}^3$. There is a naturally associated parallelepiped Π2(Qν) in V, determined by the forms Lij and the set of exponents cij = ci + cj.
Let $x^{(1)}, x^{(2)}, x^{(3)}$ determine the successive minima of Π(Qν). Then the vectors $x^{(ij)} = x^{(i)} \wedge x^{(j)}$ in V are linearly independent and
\[ |L_{ij}(x^{(pq)})| \ll \lambda_p \lambda_q\, Q_\nu^{c_{ij}}. \]
From this, it is plain that the successive minima λ1,2 ≤ λ1,3 ≤ λ2,3 of Π2(Q) (it is convenient to use this indexing here) are majorized by
\[ \lambda_{p,q} \ll \lambda_p \lambda_q. \]
Since β³(Π2(Qν)) is of the same order as β³(Π(Qν))², Minkowski’s second theorem shows that
\[ (\lambda_1\lambda_2\lambda_3)^2 \ll \prod \lambda_{p,q} \ll (\lambda_1\lambda_2\lambda_3)^2. \]
This estimate, in conjunction with the above upper bound for λp,q, yields
\[ \lambda_p\lambda_q \ll \lambda_{p,q} \ll \lambda_p\lambda_q. \]
This is a special case of a general theorem by Mahler on the so-called compound
convex bodies.
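The relation between the minima of Π(Qν) and those of its compound can be made explicit in a toy case. The sketch below assumes, purely for illustration, that the forms L_i are the coordinate functions, so that the parallelepiped is the box |x_i| ≤ a_i with respect to the lattice Z³; the function names are hypothetical. In this degenerate case the relation λ_{p,q} ≍ λ_pλ_q holds with equality.

```python
from itertools import combinations
from math import isclose, prod

def box_minima(a):
    """Successive minima of the box {|x_i| <= a_i} with respect to Z^n:
    the j-th minimum is the j-th smallest of the numbers 1/a_i."""
    return sorted(1.0 / ai for ai in a)

def compound_box_minima(a):
    """Minima of the second compound box {|X_ij| <= a_i * a_j}, i < j,
    with respect to the lattice spanned by the vectors e_i ^ e_j."""
    return sorted(1.0 / (ai * aj) for ai, aj in combinations(a, 2))

a = [8.0, 0.5, 0.25]
lam = box_minima(a)                # [0.125, 2.0, 4.0]
lam2 = compound_box_minima(a)      # [0.25, 0.5, 8.0]
pairs = sorted(p * q for p, q in combinations(lam, 2))
print(lam2 == pairs)                           # True
print(isclose(prod(lam2), prod(lam) ** 2))     # True
```

The product of the three minima times the volume 8·a_1a_2a_3 of the box equals 2³ here, which is consistent with the range allowed by Minkowski’s second theorem.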
If λ2 β³(Π(Qν)) ≥ C for some sufficiently large positive constant C, Minkowski’s second theorem yields λ1,3 ≤ 1 for Qν sufficiently large. Then Theorem 7.5.13 will be applicable to Π2(Qν), with the conclusion that the points $x^{(p)} \wedge x^{(q)}$ belong to finitely many linear subspaces of R³ ∧ R³. From this, we can deduce that the points $x^{(1)}$ also belong to finitely many linear subspaces of R³. This will prove what we want except in the relatively narrow range 1 < λ2 < Cβ³(Π(Qν))^{−1}.
One more idea is needed to complete the proof. In order to deal with the re-
maining range, Schmidt deforms the parallelepiped Π2 (Qν ) by stretching it but
keeping fixed the point x(1) , controlling the change in the successive minima us-
ing a lemma of Davenport. The new successive minima can then be brought in a
range to which we can apply Theorem 7.5.13, concluding the argument.
7.5.28. A clever simplification of Schmidt’s original proof was later found by Ev-
ertse, by means of a simple lemma bypassing Schmidt’s use of Mahler’s results on
successive minima of compound convex bodies, as well as Davenport’s lemma on
successive minima of parallelepipeds after stretching.
Evertse’s lemma, given here in a slightly simplified form paying no attention to
numerical constants, is the following:
Lemma 7.5.29. Let K be a number field, S a finite set of places of K containing all the archimedean places and let $x^{(1)}, \dots, x^{(n+1)}$ be a basis of a K-vector space V. Further, for v ∈ S, let Lv0 , . . . , Lvn be linearly independent linear forms on V with coefficients in Kv, and let µvj be real numbers such that
\[ 0 < \mu_{v1} \le \mu_{v2} \le \cdots \le \mu_{v,n+1}, \qquad \big| L_{vk}(x^{(j)}) \big|_v \le \mu_{vj} \]
for all k, j. Then there are vectors
\[ v^{(1)} = x^{(1)}, \qquad v^{(i)} = \sum_{j=1}^{i-1} \xi_{ij}\, x^{(j)} + x^{(i)} \]
with ξij ∈ OS,K (1 ≤ j < i ≤ n + 1), bijective maps πv : {1, . . . , n + 1} → {0, . . . , n}, and a positive constant C such that for v ∈ S and i, j = 1, . . . , n + 1 it holds
\[ \big| L_{v,\pi_v(i)}(v^{(j)}) \big|_v \le \begin{cases} C \min(\mu_{vi}, \mu_{vj}) & \text{if } v \text{ is archimedean,} \\ \min(\mu_{vi}, \mu_{vj}) & \text{if } v \text{ is not archimedean.} \end{cases} \]
The constant C depends only on K, S, and the set of linear forms Lvi.
Proof: By induction on n, the case n = 0 being trivial. Now suppose n ≥ 1 and
that the lemma holds for n − 1 in place of n .
Let us fix v ∈ S and let V′ be the K-vector subspace of V with basis $x^{(i)}$ (i = 1, . . . , n). Then the linear system
\[ \sum_{k=0}^{n} L_{vk}(x^{(j)})\, \alpha_{vk} = 0 \quad (j = 1, \dots, n) \]
has a non-trivial solution $(\alpha_{v0}, \dots, \alpha_{vn}) \in K_v^{n+1}$. We define πv(n + 1) to be any index for which
\[ \big| \alpha_{v,\pi_v(n+1)} \big|_v = \max_i |\alpha_{vi}|_v. \]
Obviously, $\alpha_{v,\pi_v(n+1)} \ne 0$. Hence, setting
\[ \beta_{vi} = -\alpha_{vi}/\alpha_{v,\pi_v(n+1)}, \]
we have
\[ L_{v,\pi_v(n+1)}(x^{(j)}) = \sum_{k\ne\pi_v(n+1)} \beta_{vk}\, L_{vk}(x^{(j)}), \qquad |\beta_{vk}|_v \le 1. \tag{7.38} \]
The restrictions of the forms Lvi to V′ yield a set of linear forms of rank n and the restriction of $L_{v,\pi_v(n+1)}$ to V′ is linearly dependent on the restrictions of the remaining linear forms. Hence the restrictions to V′ of the linear forms Lvk, k ≠ πv(n + 1), are linearly independent over Kv. By the induction hypothesis, there are bijective maps πv from {1, . . . , n} to {0, . . . , n}\πv(n+1), and vectors
\[ v^{(1)} = x^{(1)}, \qquad v^{(i)} = \sum_{j=1}^{i-1} \xi_{ij}\, x^{(j)} + x^{(i)} \]
for i = 1, . . . , n, with ξij ∈ OS,K, such that for i, j = 1, . . . , n and all v ∈ S we have
\[ \big| L_{v,\pi_v(i)}(v^{(j)}) \big|_v \le \begin{cases} C' \min(\mu_{vi}, \mu_{vj}) & \text{if } v \text{ is archimedean,} \\ \min(\mu_{vi}, \mu_{vj}) & \text{if } v \text{ is not archimedean.} \end{cases} \tag{7.39} \]
By definition of µvj and (7.38), this estimate continues to hold if i = n + 1 and j = 1, . . . , n. Thus to complete the proof it suffices to show there are ξn+1,j ∈ OS,K such that, setting
\[ v^{(n+1)} = \sum_{j=1}^{n} \xi_{n+1,j}\, v^{(j)} + x^{(n+1)}, \]
we have for i = 1, . . . , n and v ∈ S
\[ \big| L_{v,\pi_v(i)}(v^{(n+1)}) \big|_v \le \begin{cases} C \mu_{vi} & \text{if } v \text{ is archimedean,} \\ \mu_{vi} & \text{if } v \text{ is not archimedean.} \end{cases} \]
Indeed, for i = n + 1 this follows as above from (7.38).
Since the linear forms $L_{v,\pi_v(i)}$, i ≠ n + 1, are linearly independent on V′, which is also generated by $v^{(1)}, \dots, v^{(n)}$, we can solve the linear system
\[ L_{v,\pi_v(i)}(x^{(n+1)}) = \sum_{j=1}^{n} \gamma_{vj}\, L_{v,\pi_v(i)}(v^{(j)}) \quad (i \ne n+1) \tag{7.40} \]
with γvj ∈ Kv.
Now note that OS,K is a lattice in $\prod_{v\in S} K_v$, meaning that it is a discrete subgroup with compact quotient. This follows as in the proof of Proposition C.2.6, which is the special case where S is the set of archimedean places. Hence the lattice has a bounded fundamental domain and thus there is a vector $(\xi_{n+1,1}, \dots, \xi_{n+1,n}) \in O_{S,K}^{n}$ such that for v ∈ S we have
\[ |\xi_{n+1,j} + \gamma_{vj}|_v \le A_v, \]
with Av bounded only in terms of K and S. Moreover, we may assume that Av = 1 for non-archimedean v by the following simple argument. There is a non-zero m ∈ OS,K, depending only on K and S, such that $|m|_v \le \min(1, A_v^{-1})$ for v ∤ ∞ (v ∈ S) (for example, a sufficiently high power of the product of the rational primes p with v|p). Then, applying the argument to $m^{-1}\gamma_{vi}$ and multiplying by m afterwards, we obtain the result with Av = 1 for v ∤ ∞ and Av replaced by |m|v Av at the infinite places. Hence, setting
\[ v^{(n+1)} = x^{(n+1)} + \sum_{j=1}^{n} \xi_{n+1,j}\, v^{(j)} \]
we infer from (7.40) and the inductive step (7.39) the estimate
\begin{align*}
\big| L_{v,\pi_v(i)}(v^{(n+1)}) \big|_v
&= \Big| L_{v,\pi_v(i)}(x^{(n+1)}) + \sum_{j=1}^{n} \xi_{n+1,j}\, L_{v,\pi_v(i)}(v^{(j)}) \Big|_v \\
&= \Big| \sum_{j=1}^{n} (\gamma_{vj} + \xi_{n+1,j})\, L_{v,\pi_v(i)}(v^{(j)}) \Big|_v \\
&\le \begin{cases} |n|_v A_v C' \mu_{vi} & \text{if } v \text{ is archimedean,} \\ \mu_{vi} & \text{if } v \text{ is not archimedean,} \end{cases}
\end{align*}
for i = 1, . . . , n.
7.5.30. Step VIII: Application of the Grassmann algebra.
Let Π(Q) be an approximation domain associated to the forms Lvi and parame-
ters cvi as in Step 0, and suppose that R := rank(Π(Q)) is such that 1 ≤ R ≤ n .
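Then R is determined by λR ≤ 1 < λR+1 and, as already noted in the proof of Lemma 7.5.12, we have
\[ \lambda_{n+1} \gg Q^{\varepsilon/(2n+2)}. \]
Now we define k to be the smallest integer in the interval [R, n] such that the quotient λk /λk+1 is minimal. Since
\[ \frac{\lambda_k}{\lambda_{k+1}} \le \Big( \prod_{j=R}^{n} \frac{\lambda_j}{\lambda_{j+1}} \Big)^{\frac{1}{n+1-R}} = \Big( \frac{\lambda_R}{\lambda_{n+1}} \Big)^{\frac{1}{n+1-R}} \le \lambda_{n+1}^{-\frac{1}{n+1-R}}, \]
we have
\[ \frac{\lambda_k}{\lambda_{k+1}} \ll Q^{-\varepsilon/\{2(n+1)n\}}. \tag{7.41} \]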
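The selection of k and the geometric-mean estimate behind (7.41) are easy to check on sample data. In the sketch below the list `lam` holds λ_1, . . . , λ_{n+1} in non-decreasing order; the function names and the sample values are illustrative assumptions.

```python
def choose_k(lam, R):
    """Smallest k in [R, n] minimizing lam_k / lam_{k+1}, where lam[j-1] = lambda_j
    and n + 1 = len(lam)."""
    n = len(lam) - 1
    ratios = {k: lam[k - 1] / lam[k] for k in range(R, n + 1)}
    best = min(ratios.values())
    return min(k for k, q in ratios.items() if q == best)

def geometric_mean_bound(lam, R):
    """(lambda_R / lambda_{n+1})^(1/(n+1-R)), the geometric mean of the n+1-R quotients."""
    n = len(lam) - 1
    return (lam[R - 1] / lam[n]) ** (1.0 / (n + 1 - R))

lam = [0.5, 0.9, 3.0, 4.0, 40.0]   # n = 4, rank R = 2 since lambda_2 <= 1 < lambda_3
k = choose_k(lam, 2)
print(k)                                               # 4
print(lam[k - 1] / lam[k] <= geometric_mean_bound(lam, 2))  # True
```

The minimum of the quotients cannot exceed their geometric mean, and since λ_R ≤ 1 the geometric mean is in turn at most λ_{n+1}^{−1/(n+1−R)}.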
As usual, we write
\[ \varepsilon_v = \begin{cases} [K_v : \mathbb{R}]/[K : \mathbb{Q}] & \text{if } v \text{ is archimedean,} \\ 0 & \text{otherwise.} \end{cases} \]
Then we define
\[ \mu_{vj} = \lambda_j^{\varepsilon_v} \quad (j = 1, \dots, n+1). \]
Since $\sum_{v|\infty} \varepsilon_v = 1$ (Corollary 1.3.2), we have
\[ \prod_{v|\infty} \mu_{vj} = \lambda_j, \qquad \mu_{vj} = 1 \ \text{ if } v \nmid \infty,\ v \in S. \tag{7.42} \]
Let $x^{(j)} \in O_{S,K}^{n+1}$ (j = 1, . . . , n + 1) be linearly independent points such that $\iota(x^{(j)})$ determine the successive minima of Π(Q). Then
\[ \big| L_{vi}(x^{(j)}) \big|_{v,K} \le \mu_{vj}\, Q^{c_{vi}} \]
for v ∈ S, i = 0, . . . , n and j = 1, . . . , n + 1 (recall that the absolute values are normalized so that for v|∞ we have $|ta|_v = t^{\varepsilon_v}|a|_v$ for t > 0). We apply Evertse’s lemma to this situation and infer that there are $v^{(i)} \in O_{S,K}^{n+1}$ (i = 1, . . . , n + 1) such that
\[ v^{(1)} = x^{(1)}, \qquad v^{(i)} = \sum_{j=1}^{i-1} \xi_{ij}\, x^{(j)} + x^{(i)} \]
and for each v ∈ S there is a bijection πv : {1, . . . , n + 1} → {0, . . . , n} such that
\[ \big| L_{v,\pi_v(i)}(v^{(j)}) \big|_{v,K} \le \begin{cases} C_{11} \min(\mu_{vi}, \mu_{vj})\, Q^{c_{v,\pi_v(i)}} & \text{if } v|\infty, \\ Q^{c_{v,\pi_v(i)}} & \text{if } v \nmid \infty, \end{cases} \tag{7.43} \]
for i, j = 1, . . . , n + 1.
Now we pass to the Grassmann algebra of order n + 1 − k , where k , as defined
at the beginning of this article, verifies (7.41). We abbreviate
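\begin{align*}
\mathbf{i} &= (i_1, \dots, i_{n+1-k}), \quad \text{where } \{i_1 < i_2 < \cdots < i_{n+1-k}\}, \\
L_{v,\pi_v(\mathbf{i})} &= L_{v,\pi_v(i_1)} \wedge L_{v,\pi_v(i_2)} \wedge \cdots \wedge L_{v,\pi_v(i_{n+1-k})}, \\
v^{(\mathbf{i})} &= v^{(i_1)} \wedge v^{(i_2)} \wedge \cdots \wedge v^{(i_{n+1-k})}, \\
c_{v,\pi_v(\mathbf{i})} &= c_{v,\pi_v(i_1)} + c_{v,\pi_v(i_2)} + \cdots + c_{v,\pi_v(i_{n+1-k})}, \\
\lambda_{\mathbf{i}} &= \prod_{\nu=1}^{n+1-k} \lambda_{i_\nu}, \qquad \mu_{v\mathbf{i}} = \prod_{\nu=1}^{n+1-k} \mu_{v,i_\nu}.
\end{align*}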

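The linear forms $L_{v,\pi_v(\mathbf{i})}$ are linearly independent and so are the points $v^{(\mathbf{i})}$. By (7.43) and the Laplace identity (7.16) on page 198, it is immediate that, for some constant C12, we have for every $\mathbf{i}$ and $\mathbf{j}$ and v ∈ S
\[ \big| L_{v,\pi_v(\mathbf{i})}(v^{(\mathbf{j})}) \big|_{v,K} \le \begin{cases} C_{12}\, \mu_{v\mathbf{i}}\, Q^{c_{v,\pi_v(\mathbf{i})}} & \text{if } v|\infty, \\ Q^{c_{v,\pi_v(\mathbf{i})}} & \text{if } v \nmid \infty. \end{cases} \tag{7.44} \]
Moreover, if $\mathbf{i} = (k+1, k+2, \dots, n+1)$ but $\mathbf{j} \ne (k+1, k+2, \dots, n+1)$, Evertse’s lemma shows that we can do a little better, namely
\[ \big| L_{v,\pi_v(\mathbf{i})}(v^{(\mathbf{j})}) \big|_{v,K} \le C_{12} \cdot (\mu_{vk}/\mu_{v,k+1}) \cdot \mu_{v\mathbf{i}}\, Q^{c_{v,\pi_v(\mathbf{i})}} \quad \text{if } v|\infty, \tag{7.45} \]
because in this case j1 ≤ k, hence
\[ \mu_{v\mathbf{j}} \le \mu_{vk}\, \mu_{v,k+2} \cdots \mu_{v,n+1} = (\mu_{vk}/\mu_{v,k+1}) \cdot \mu_{v\mathbf{i}}. \]
Now we can prove: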
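Lemma 7.5.31. Let S(Q) be the symmetric convex domain in $\wedge^{n+1-k}(K_{\mathbb{A}}^{n+1})$ defined in obvious notation by
\begin{align*}
\big| L_{v,\pi_v(\mathbf{i})}(X) \big|_{v,K} &\le C_{12}\, \mu_{v\mathbf{i}}\, Q^{c_{v,\pi_v(\mathbf{i})}} && v|\infty,\ \mathbf{i} \ne (k+1, \dots, n+1), \\
\big| L_{v,\pi_v(\mathbf{i})}(X) \big|_{v,K} &\le C_{12} \cdot (\mu_{vk}/\mu_{v,k+1}) \cdot \mu_{v\mathbf{i}}\, Q^{c_{v,\pi_v(\mathbf{i})}} && v|\infty,\ \mathbf{i} = (k+1, \dots, n+1), \\
\big| L_{v,\pi_v(\mathbf{i})}(X) \big|_{v,K} &\le Q^{c_{v,\pi_v(\mathbf{i})}} && v \in S,\ v \nmid \infty, \\
|X|_v &\le 1 && v \notin S.
\end{align*}
Let λj(S(Q)) be the successive minima of S(Q). Then we have
\begin{align*}
\lambda_j(S(Q)) &\le 1 && \text{if } j < \tbinom{n+1}{k}, \\
\lambda_j(S(Q)) &> C_{13}\, Q^{\varepsilon/\{2n(n+1)\}} && \text{if } j = \tbinom{n+1}{k}.
\end{align*}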

 
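Proof: The first part of the thesis of the lemma (namely, for $j < \binom{n+1}{k}$) is obvious from (7.44) and (7.45), since they provide independent points in S(Q). For the second part, we apply Minkowski’s second theorem, which gives
\[ 1 \ll \Big( \prod_{j=1}^{\binom{n+1}{k}} \lambda_j(S(Q)) \Big)\, \beta^{\binom{n+1}{k}}(S(Q))^{1/d} \ll 1, \tag{7.46} \]
where the constants involved depend only on K, S, and n. By Lemma 7.5.7 and (7.42)
\begin{align*}
\beta^{\binom{n+1}{k}}(S(Q))^{1/d}
&\ll \prod_{v\in S} \Big\{ \frac{\mu_{vk}}{\mu_{v,k+1}} \prod_{\mathbf{i}} \mu_{v\mathbf{i}}\, Q^{c_{v,\pi_v(\mathbf{i})}} \Big\} \\
&\ll \frac{\lambda_k}{\lambda_{k+1}} \prod_{\mathbf{i}} \lambda_{\mathbf{i}}\; Q^{\sum_v \sum_{\mathbf{i}} c_{v,\pi_v(\mathbf{i})}} \\
&= \frac{\lambda_k}{\lambda_{k+1}}\, (\lambda_1 \cdots \lambda_{n+1})^{\binom{n}{k}}\, Q^{\binom{n}{k} \sum_v \sum_i c_{v,\pi_v(i)}}.
\end{align*}
By Minkowski’s second theorem, Lemma 7.5.7, and (7.41) on page 220 this may be bounded by
\[ \ll \frac{\lambda_k}{\lambda_{k+1}} \Big( \lambda_1 \cdots \lambda_{n+1}\, \beta^{n+1}(\Pi(Q))^{1/d} \Big)^{\binom{n}{k}} \ll \frac{\lambda_k}{\lambda_{k+1}} \ll Q^{-\varepsilon/\{2n(n+1)\}}. \]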
 
We have already noticed that λj (S(Q)) ≤ 1 if j < n+1 k+1 , hence the second part
of the thesis follows from the left-hand side of (7.46). 
7.5.32. Step IX: Proof of the subspace theorem.
We apply Theorem 7.5.13 as follows. Let (Qν ) be an unbounded family such that
we have approximation domains Π(Qν ) as in Step 0 with rank(Π(Qν )) = R ,
where 1 ≤ R < n (the case R = n being already covered by Theorem 7.5.13).
By going to a subfamily, we may assume that the parameter k defined at the be-
ginning of Step VIII (see 7.5.30) is constant along this family.
With µvi as in Step VIII (relative now to Q = Qν), we note that there is a constant C14 > 0 such that
\[ Q_\nu^{-C_{14}} \le \mu_{vi} \le Q_\nu^{C_{14}} \]
for every v|∞ and every i (of course, µvi = 1 if v ∤ ∞). The argument is the same as for the proof of Corollary 7.5.5. Clearly, it suffices to verify the corresponding result for the successive minima of Π(Qν). By 7.5.6 and Lemma 7.5.7, we have
\[ \beta^{n+1}(\Pi(Q_\nu)) \gg Q_\nu^{d \sum_{vi} c_{vi}} \ge Q_\nu^{-2rd(n+1)|S|} \]
and hence, by Minkowski’s second theorem, it suffices to obtain a lower bound for
the first minimum. Since the linear forms Lv0 , . . . , Lvn are linearly independent,
we have for every v ∈ S and x ∈ K n+1 \ {0} with ι(x) ∈ Π(Qν )
log |x|v ≤ max log |Lvi (x)|v,K + C15 ≤ cvi log(Qν ) + C15
i

for some constant C15. Since cvi ≤ 2r, from this it follows $h_{\mathrm{aff}}(x_i) \ll \log(Q_\nu)$ and
\[ -C_{16} \log(Q_\nu) \le \log \max_i |L_{vi}(x)|_{v,K} \]

for large Qν (use (1.8) on page 20). If we apply this with x such that ι(x) deter-
mines the first successive minimum of Π(Qν ) and with v|∞, then the right-hand
side is bounded by log |λ1 |v + cvi log(Qν ), proving what we want.
Once this observation has been made, we proceed as we did in defining approximation classes and, going once more to a subfamily, we may assume that, given any small positive number γ > 0, there are bounded real numbers $d_{v\mathbf{i}}$ (v|∞) such that
\[ -\gamma\varepsilon_v + d_{v\mathbf{i}} < \log(C_{12}\, \mu_{v\mathbf{i}})/\log(Q_\nu) \le d_{v\mathbf{i}} \quad \text{if } \mathbf{i} \ne (k+1, \dots, n+1) \]
and
\[ -\gamma\varepsilon_v + d_{v\mathbf{i}} < \log(C_{12}\, (\mu_{vk}/\mu_{v,k+1})\, \mu_{v\mathbf{i}})/\log(Q_\nu) \le d_{v\mathbf{i}} \quad \text{if } \mathbf{i} = (k+1, \dots, n+1). \]
If v ∈ S and v ∤ ∞, we set $d_{v\mathbf{i}} = 0$.
If Πk(Qν) is the parallelepiped in $\wedge^{n+1-k}(K_{\mathbb{A}}^{n+1})$ defined by
\begin{align*}
\big| L_{v,\pi_v(\mathbf{i})}(X) \big|_{v,K} &\le Q_\nu^{c_{v,\pi_v(\mathbf{i})} + d_{v\mathbf{i}}} && (v \in S), \\
|X|_{v,K} &\le 1 && (v \notin S),
\end{align*}
then it is obvious that S(Qν) ⊂ Πk(Qν) and in particular λj(S(Qν)) ≥ λj(Πk(Qν)) for every j. On the other hand, the volume of Πk(Qν) does not increase too much if γ is small, in fact Lemma 7.5.7 yields
\[ \beta^{\binom{n+1}{k}}(\Pi_k(Q_\nu))^{1/d} \ll \beta^{\binom{n+1}{k}}(S(Q_\nu))^{1/d}\, Q_\nu^{\binom{n+1}{k}\gamma}. \]
Therefore, if we take for example $\gamma = \varepsilon/\{4n(n+1)\binom{n+1}{k}\}$, by Minkowski’s second theorem as in the proof of Lemma 7.5.31, we still find, for large Qν, that
\begin{align*}
\lambda_j(\Pi_k(Q_\nu)) &\le 1 && \text{if } j < \tbinom{n+1}{k}, \\
\lambda_j(\Pi_k(Q_\nu)) &> C_{17}\, Q_\nu^{\varepsilon/\{4n(n+1)\}} && \text{if } j = \tbinom{n+1}{k}.
\end{align*}
Hence
\[ \operatorname{rank}(\Pi_k(Q_\nu)) = \binom{n+1}{k} - 1 \]
as soon as Qν is sufficiently large, which we may suppose.
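We have
\begin{align*}
\sum_{v\in S}\sum_{\mathbf{i}} d_{v\mathbf{i}}
&\le \sum_{v|\infty} \Big( \frac{1}{\log(Q_\nu)} \Big\{ \log\frac{\mu_{vk}}{\mu_{v,k+1}} + \sum_{\mathbf{i}} \log(C_{12}\, \mu_{v\mathbf{i}}) \Big\} + \gamma\varepsilon_v \Big) \\
&= \frac{1}{\log(Q_\nu)} \Big\{ \log\frac{\lambda_k}{\lambda_{k+1}} + \sum_{\mathbf{i}} \log \lambda_{\mathbf{i}} + O(1) \Big\} + \gamma \\
&= \frac{1}{\log(Q_\nu)} \Big\{ \log\frac{\lambda_k}{\lambda_{k+1}} + \binom{n}{k} \log(\lambda_1 \cdots \lambda_{n+1}) + O(1) \Big\} + \gamma.
\end{align*}
Therefore, using (7.41) on page 220 and again Minkowski’s second main theorem together with Lemma 7.5.7, we get
\[ \sum_{v\in S}\sum_{\mathbf{i}} d_{v\mathbf{i}} \le -\frac{\varepsilon}{2(n+1)n} - \binom{n}{k} \sum_{vi} c_{vi} + \gamma + \frac{O(1)}{\log(Q_\nu)}. \]
Now we note that
\[ \sum_{\mathbf{i}} \sum_{v\in S} \big( c_{v,\pi_v(\mathbf{i})} + d_{v\mathbf{i}} \big) = \sum_{v\mathbf{i}} c_{v,\pi_v(\mathbf{i})} + \sum_{v\mathbf{i}} d_{v\mathbf{i}} = \binom{n}{k} \sum_{vi} c_{vi} + \sum_{v\mathbf{i}} d_{v\mathbf{i}}. \]
Thus the above and our choice of γ gives
\[ \sum_{\mathbf{i}} \sum_{v\in S} \big( c_{v,\pi_v(\mathbf{i})} + d_{v\mathbf{i}} \big) \le -\frac{\varepsilon}{2(n+1)n} \Big( 1 - \frac{1}{2\binom{n+1}{k}} \Big) + \frac{O(1)}{\log(Q_\nu)}, \]
which is negative for Qν sufficiently large.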


Now we are able to apply Theorem 7.5.13 to this situation, concluding that the
vector spaces V (Πk (Qν )) form a finite set.
Let $v^{(\mathbf{i})} = v^{(i_1)} \wedge \cdots \wedge v^{(i_{n+1-k})}$ be the points constructed in 7.5.30. Obviously, they are linearly independent over K. Those with $\mathbf{i} \ne (k+1, \dots, n+1)$ belong to V (Πk(Qν)), hence form a basis of V (Πk(Qν)).

Lemma 7.5.33. Let V be a vector space over K of dimension n + 1 and let W
be a subspace of dimension k . Let x(1) , . . . , x(k) be a basis of W and extend
it to a basis x(1) , . . . , x(n+1) of V . Let W′ denote the subspace of ∧n+1−k V
generated by the x(i) with i ≠ (k + 1, . . . , n + 1). Then W′ is independent of the choice of the
basis and is uniquely determined by W .
Proof: Recall that ∧n+1−k V is the dual of ∧k V with respect to the non-degenerate
pairing ⟨ω, ω′⟩ = ω ∧ ω′ .
Since W′ is the annihilator W⊥ of ∧k W = Kx(1) ∧ · · · ∧ x(k) with respect to
this pairing, W′ is well defined by W . Since the Grassmann coordinate ∧k W =
(W′)⊥ determines W uniquely, we get the claim. □
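
A small example may help to visualize the lemma. The following sketch uses a hypothetical choice (V = K^3 with standard basis e_1, e_2, e_3 and W = K e_1, so n + 1 = 3 and k = 1); it is an illustration only and plays no role in the proof:

```latex
% Take x^{(1)} = e_1, x^{(2)} = e_2, x^{(3)} = e_3. The multi-indices i = (i_1 < i_2) give
\[
x^{(1,2)} = e_1\wedge e_2,\qquad x^{(1,3)} = e_1\wedge e_3,\qquad x^{(2,3)} = e_2\wedge e_3 .
\]
% Pairing each of them with the generator e_1 of \wedge^1 W:
\[
(e_1\wedge e_2)\wedge e_1 = 0,\qquad (e_1\wedge e_3)\wedge e_1 = 0,\qquad
(e_2\wedge e_3)\wedge e_1 = e_1\wedge e_2\wedge e_3 \neq 0 .
\]
% Hence the annihilator of \wedge^1 W is W' = K\,e_1\wedge e_2 \oplus K\,e_1\wedge e_3,
% spanned exactly by the x^{(i)} with i \neq (2,3) = (k+1,\dots,n+1).
```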
Now we can finish the proof of the subspace theorem.
We argue by contradiction, hence we get an infinite set X = {xν } as in Lemma
7.5.9. By Northcott’s theorem in 2.4.9, Qν := H(xν ) is unbounded. By Lemma
7.5.12 , we may assume that rank(Πv (Qν )) ≤ n. If the rank is equal to n for in-
finitely many Qν , then Theorem 7.5.13 contradicts statement (a) of Lemma 7.5.9.
So we may assume that rank(Πv (Qν )) < n for all ν and we may apply our above
considerations in Step IX.
By Lemma 7.5.33, the K -vector space V (Πk (Qν )) determines uniquely the K -
vector space Wk (Qν ) of dimension k generated by v(i) (i = 1, . . . , k). Since
the spaces V (Πk (Qν )) form a finite set, the associated spaces Wk (Qν ) also form
a finite set. Since k ≥ rank(Π(Qν )), we conclude that xν is a linear combination
of the vectors in K n+1 , denoted by x(1) , . . . , x(k) , determining the first k successive
minima. By construction, this holds also for v(1) , . . . , v(k) and hence the
points xν belong to finitely many proper subspaces of K n+1 , contradicting statement
(a) of Lemma 7.5.9. Since the hypothesis of Lemma 7.5.9 was the falsity of
the subspace theorem, the subspace theorem must hold.

7.6. Further results: the product theorem

In this section we give a quick review, without proofs, of important progress in this area.
7.6.1. In his landmark paper [114], G. Faltings introduced a completely new approach to
study the index of a multihomogeneous polynomial at a point. We describe now the simplest
version of Faltings’s basic result, the product theorem.
We work in a product
\[ \mathbb{P}_K := \mathbb{P}^{n_1}_K \times \cdots \times \mathbb{P}^{n_m}_K \]
of projective spaces, over an algebraically closed field K of characteristic 0 and with
sections
f ∈ Γ(P, OP (d1 , . . . , dm ))
associated to the ample line bundle OP (d1 , . . . , dm ) (here the degrees d1 , . . . , dm are
positive integers). Recall that f may be identified with a multihomogeneous polynomial,
homogeneous of degree dh in the variables xh = (xh0 , . . . , xhn h ) (see Example A.6.13).
7.6.2. For x ∈ PK , we define the index of f in x with respect to the weights d by
\[ \operatorname{ind}(f; d; x) := \min\{(I/d) \mid \partial_I f(x) \neq 0\}, \]
where I = (i1 , . . . , im ) with ih ranging over N^{n_h} for h = 1, . . . , m and where (I/d)
and ∂I are defined as in 7.5.14.
This notion extends the definition of the index in 6.3.2 in the following way. For a polynomial
F ∈ K[x1 , . . . , xm ] with partial degrees at most dh , we may consider the multihomogenization
\[ f(x_{10}, x_{11}; \dots; x_{m0}, x_{m1}) := x_{10}^{d_1}\cdots x_{m0}^{d_m}\, F(x_{11}/x_{10}, \dots, x_{m1}/x_{m0}) \]
of multidegree d . By passing from x ∈ A^m_K to the multiprojective space (P1K)^m , it is
clear that the index of F in x with respect to the weight d as defined in 6.3.2 is the same
as ind(f ; d; x) .
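
To see the multihomogenization at work, here is a minimal worked case. The polynomial F, the weights d = (1, 1) and the reading (I/d) = i_1/d_1 + i_2/d_2 are assumptions made for this illustration (the precise conventions are those of 7.5.14 and 6.3.2, which are not restated here):

```latex
% Hypothetical example with m = 2, d = (1,1) and F(x_1, x_2) = x_1 - x_2.
\[
f(x_{10}, x_{11}; x_{20}, x_{21})
= x_{10}\,x_{20}\Bigl(\frac{x_{11}}{x_{10}} - \frac{x_{21}}{x_{20}}\Bigr)
= x_{11}x_{20} - x_{10}x_{21},
\]
% which is bihomogeneous of bidegree (1,1).
% At the affine point x = (0,0), i.e. at ((1:0),(1:0)) in P^1 x P^1:
\[
F(0,0) = 0,\quad \frac{\partial F}{\partial x_1}(0,0) = 1 \neq 0;
\qquad
f = 0,\quad \frac{\partial f}{\partial x_{11}} = x_{20} = 1 \neq 0 .
\]
% Under the assumed convention (I/d) = i_1/d_1 + i_2/d_2, both computations give index 1/d_1 = 1.
```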
7.6.3. Let σ ≥ 0 . Faltings’s product theorem gives information on the geometry of the set
Zσ of PK on which ind(f ; d; x) ≥ σ . Since Zσ is the zero set of the multihomogeneous
polynomials ∂I f ((I/d) < σ) , it is a closed subvariety of PK .

Now we can state a simple version of Faltings’s product theorem.


Theorem 7.6.4. Let K be an algebraically closed field of characteristic 0 . Let m, n1 , . . . ,
nm be positive integers, P = P^{n_1} × · · · × P^{n_m} . Then, for every ε > 0 , there is C > 0 ,
depending on ε and m, n1 , . . . , nm , with the following property. Suppose:

(a) d1 > · · · > dm are rapidly decreasing positive integers, namely dh /dh+1 ≥ C
for h = 1, . . . , m − 1 .
(b) f ∈ Γ(PK , OP (d1 , . . . , dm )) \ {0} .
(c) For some σ ≥ 0 , Z is an irreducible component of both Zσ and Zσ+ε (with
respect to the weight vector d ).

Then:

(i) Z is a product of closed subvarieties Zi of P^{n_i} , i.e. Z = Z1 × · · · × Zm .
(ii) The degrees deg(Zi ) are bounded in terms of ε and n1 + · · · + nm only.
(iii) If K = Q̄ , the varieties Zi admit presentations pi in the sense of Section 2.5 such
that
\[ d_1 h(p_1) + \cdots + d_m h(p_m) \ll h(f) + d_1, \]
where the implied constant depends on ε and n1 , . . . , nm .

We will not prove this important result here, referring to Faltings’s paper [114], to the article
of M. van der Put [303] for a simple proof of (i) and (ii), and to the versions with explicit
good constants in Evertse [106], Ferretti [119], and Rémond [241].
Remark 7.6.5. Part (iii) of the thesis of this theorem is best stated taking for h(Zi ) a more
intrinsic notion of height, rather than the hand-made height through presentations. This is
done by Faltings in [114], by defining the height as an intersection number of arithmetic
cycles in P^{n_i} . Another definition is by taking the height of the Chow point defining Zi ;
this second definition is equivalent to Faltings’s, up to a simple uniformly bounded error
term (see J.-B. Bost, H. Gillet, C. Soulé [45], Sec.4.3).
7.6.6. The product theorem is used as follows. Let N be an integer with N > dim(P) and let
σ > 0 . Assume that ind(f ; d; x) ≥ σ at some x ∈ P . Then there exists a chain
\[ P = Z_1 \supset Z_2 \supset \cdots \supset Z_N \ni x \]
with Zi an irreducible component of Z_{iσ/N} . Since each Zi is irreducible, the dimension
drops every time we have Zi ≠ Zi+1 and it follows from N > dim(P) that Zi = Zi+1
for some i . Taking ε = σ/N , we may apply the product theorem and deduce that Zi is a
product variety Zi = Zi1 × · · · × Zim . Obviously, we must have Ziµ ≠ P^{n_µ} for some µ
and the study of the vanishing of f on Zi is reduced, by projecting to some linear subspace
of P^{n_µ} , to the study of the vanishing in a multiprojective space with smaller dimension
than that of P . Then we apply induction.
It turns out that this inductive procedure is much more efficient than Roth’s method using
Wronskians, where we lose a square root every time we increase m by 1 . For example,
proving Roth’s lemma using the product theorem allows us to replace σ^{1/2^{m−1}} by the much
better σ^{1/m} (with minor changes for the other constants, see [106], Th.3). In quantitative
results, this has the effect of replacing doubly exponential bounds by simply exponential
bounds.
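
The gap between the two exponents grows quickly with m. The small table below is only an illustration of the arithmetic, tabulating the two exponents of σ mentioned above for the first few values of m:

```latex
% Exponents of \sigma for small m: Wronskian method (1/2^{m-1}) versus product theorem (1/m).
\[
\begin{array}{c|ccccc}
m & 1 & 2 & 3 & 4 & 5\\ \hline
1/2^{m-1} & 1 & 1/2 & 1/4 & 1/8 & 1/16\\
1/m & 1 & 1/2 & 1/3 & 1/4 & 1/5
\end{array}
\]
% For m \ge 3 we have m < 2^{m-1}, hence 1/m > 1/2^{m-1}; for 0 < \sigma < 1 this gives
% \sigma^{1/m} < \sigma^{1/2^{m-1}}.
```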

7.7. The absolute subspace theorem and the Faltings–Wüstholz theorem

It is possible to obtain quantitative versions of the subspace theorem, in which we control
the number of subspaces containing all solutions of height exceeding a certain bound, in the
same way as was the case with the Davenport–Roth refinement of Roth’s theorem (see (6.23)
on page 173, 6.5.8). Evertse and Schlickewei [111] have obtained a remarkably strong result
of this type, which in view of its strong uniformity with respect to its dependence on the
field K and the set of places S has proved to be a powerful tool in applications (Theorem
7.4.1 is an example).
7.7.1. Let K be a number field with a finite set S of places. For v ∈ S , let Lv0 , . . . , Lvn
be linearly independent linear forms in x0 , . . . , xn with coefficients in K̄ . We assume that
the coefficients of the linear forms Lv0 , . . . , Lvn are contained in a field extension of K
of degree at most D and that HAr (Lvi ) ≤ H . We denote by | |v,K̄ an extension of | |v
to K̄ .

Now we can state the absolute subspace theorem of Evertse and Schlickewei (see [111],
Th.3.1, for explicit constants and a proof).
Theorem 7.7.2. Let ε > 0 . Then there are proper linear subspaces T1 , . . . , Th of PnK
with h bounded in terms of n , |S| , D and ε , with the property that the set of solutions
x ∈ Pn (K̄) of
\[
\prod_{v\in S}\prod_{i=0}^{n}\max_{\sigma\in\operatorname{Gal}(\overline{K}/K)}
\frac{|L_{vi}(\sigma x)|_{v,\overline{K}}}{|\sigma x|_{v,\overline{K}}}
\le H(x)^{-n-1-\varepsilon}\prod_{v\in S}|\det(L_{vi})|_{v,\overline{K}}
\]
and with H(x) bounded below by a constant given in terms of n, ε, H is contained in
T1 ∪ · · · ∪ Th .
7.7.3. It is a natural question to obtain an extension of the subspace theorem in which
linear forms are replaced by homogeneous forms of higher degree, and more generally
by functions measuring a distance from an arbitrary algebraic variety. An answer to this
question was obtained by G. Faltings and G. Wüstholz in 1994, in an innovative paper
[117], which, as a special case, also gave a completely new proof of the subspace theorem.
We describe their result, beginning with the linear case. Let K ⊂ F be number fields and
let S be a finite set of places of F . For each w ∈ S , we fix a finite index set Iw , non-zero
linear forms Lwi ∈ F [x0 , . . . , xn ] and cwi ≥ 0 (i ∈ Iw ) .
Let V = Γ(PnK , OPn (1)) and VF = V ⊗K F . For every w ∈ S and any positive
real number p , we consider the subspace of VF generated by the linear forms for which
cwi ≥ p . In this way we get a finite chain of subspaces
\[ V_F = W^0 \supsetneq W^1 \supsetneq \cdots \supsetneq W^e \supsetneq W^{e+1} = 0 \]
of VF . Let pwj be the minimum of the cwi when i runs over the indices given by the
generators of W j , for j = 0, . . . , e . We also put pw0 = 0 if W 0 is not generated by the
forms Lwi .
For a subspace W of V , we define an invariant
\[
\mu(W) := \sum_{w\in S}\sum_{j=0}^{e} p_{wj}\,\frac{\dim\bigl((W^j\cap W_F)/(W^{j+1}\cap W_F)\bigr)}{\dim(W)}.
\]
The set of filtrations so obtained on V is jointly semistable if for each non-zero proper
subspace W ⊂ V we have µ(W ) ≤ µ(V ) .
Then the Faltings–Wüstholz theorem is as follows. We state it with the normalizations
used in this book; for the proof and for details of the following remarks, we refer to [117].

Theorem 7.7.4. Assume that the linear forms Lwi (w ∈ S, i ∈ Iw ) define a jointly
semistable filtration on V and that µ(V ) > 1 . Then the number of points x ∈ Pn (K)
with
\[ \frac{|L_{wi}(x)|_w}{|x|_w} < H(x)^{-c_{wi}} \quad (w \in S,\ i \in I_w) \]
is finite.

7.7.5. A more general theorem where the linear forms Lwi need not be jointly semistable
is then obtained by considering the first non-trivial step W in the Harder–Narasimhan fil-
tration of V . This is the unique subspace W of V characterized by the property that
(µ(W ), dim(W )) is maximal with respect to the lexicographic order.
Let P∗ (V ) denote the projective space of one-dimensional quotient spaces of V . The
conclusion now is that if µ(W ) > 1 then there are only finitely many x ∈ P∗ (V )(K) \
P∗ (V /W ) such that

\[ \frac{|L_{wi}(x)|_w}{|x|_w} < H(x)^{-c_{wi}} \quad (w \in S,\ i \in I_w). \]

It is not difficult to deduce from this Schmidt’s subspace theorem.

7.7.6. The Faltings–Wüstholz theorem can be applied to the study of a system of inequali-
ties
\[ |f_{wi}(x)|_w < H(x)^{-c_{wi}} \quad (w \in S,\ i \in I_w), \]
where now fwi ∈ F [x0 , . . . , xn ] are homogeneous forms of any degree. One may assume
that they have all the same degree r and then one associates to this the corresponding
linear forms obtained by a Segre embedding Pn → PN using all monomials of degree r ,
see 1.5.14. Although the results obtained in this way probably are not optimal, they are
usually stronger than those obtained by a straightforward application of Schmidt’s subspace
theorem.

7.7.7. The computation of the invariants µ(V ) and the verification of the semistability
condition is not easy. R.G. Ferretti [120] has considered more generally replacing the forms
fwi , which define hypersurfaces in Pn , by projective subvarieties of Pn , and has shown
how to compute the associated invariants using the Chow form associated to a subvariety of
Pn .
Finally, in an interesting paper J.-H. Evertse and R.G. Ferretti [110] have been able to
combine this point of view with the absolute subspace theorem in 7.7.2, obtaining a rather
strong absolute version of the general Faltings–Wüstholz theorem. A new idea in their paper
is the use of a more general type of Segre embedding, which is chosen in an optimal way
so as to produce the best exponents. In particular, they derive the Faltings–Wüstholz
theorem from the original subspace theorem of Schmidt and from their analysis
of generalized Segre embeddings.

7.8. Bibliographical notes

The exposition in Sections 7.2 and 7.3 follows to a large extent material gleaned
from J.-H. Evertse’s expository paper [105]. The presentation of Siegel’s theorem
and Section 7.4 follow closely Zannier [337], Ch.II–IV.
The proof of the subspace theorem in the present form is the result of many years
of step-by-step progress. The first step towards it was W.M. Schmidt’s paper [264]
of 1967, in which he solved the problem in the case n = 2 and K = Q . In that
paper the role of geometry of numbers emerges clearly through the use of Mahler’s
theorems on successive minima of polar bodies. However, the extension to the
general case required control of all successive minima and this was done only in
1970 in [266], when Schmidt introduced the tool of the Grassmann algebra. This
was followed by Schlickewei’s generalization with several places and a general
number field K .
A new direction began with Schmidt’s extension of the Davenport–Roth theorem
to the multidimensional case in [269]. This line of research culminated in the ab-
solute subspace theorem 7.7.2 of Evertse and Schlickewei [111]. The remarkable
uniformity of their result with respect to fields of definition and the set of places S
has proven to be essential in applications. Essential ingredients in the proof of the
absolute subspace theorem are an absolute version of geometry of numbers (with
a corresponding absolute Siegel lemma) found by Roy and Thunder [247], [248],
a precise gap principle for the sequence of solutions, and a precise quantitative
version of Faltings’s product theorem.
Vojta in [307] gives a succinct account of the subspace theorem stressing certain
analogies with the work of L. Ahlfors [7] on meromorphic curves. A version
of the subspace theorem allowing “moving targets,” similar to Theorem 6.5.2 in
Roth’s case, is in Ru and Vojta [250].
The paper by Faltings and Wüstholz [117] gave a deep geometric extension of
the subspace theorem, using new methods quite independent of Schmidt’s. The
main tools are Faltings’s fundamental product theorem and the introduction of the
Harder–Narasimhan filtration in order to be able to apply probabilistic methods for
the construction of the auxiliary polynomial.
The proof of Theorem 7.5.13 is patterned after Evertse in [108], with several sim-
plifications because we do not keep track of constants. The rest of the proof is
modeled after Evertse’s treatment of the rational case, see [105]. For Faltings’s
product theorem, we also recommend the illuminating review of [117] by J.-H.
Evertse in [107].
8 ABELIAN VARIETIES

8.1. Introduction

This chapter contains fundamental preparatory material on abelian varieties and
Jacobians of algebraic curves. Abelian varieties are defined as complete, geometrically
irreducible, and geometrically reduced group varieties. The main properties
of abelian varieties are obtained using the seesaw principle, the theorem of the
cube, and the theorem of the square.
Classically, the theorem of the cube is proved directly using the theory of the
Jacobian of a curve, then deducing the existence of the Picard variety, and the
theorem of the square. On the other hand, once we have the Picard variety, it
is easy to deduce both the theorem of the square and the theorem of the cube.
Thus our philosophy will be to take for granted the existence of the Picard variety,
borrowed as a result about representable functors from algebraic geometry, and
deduce from this the basic theorems we need.
Section 8.2 contains preliminary material on group varieties, limited however to
what is needed to develop the theory of abelian varieties. Section 8.3 deals with el-
liptic curves, including the well-known method for obtaining a Weierstrass model
via the Riemann–Roch theorem.
The next three sections deal with the Picard variety and the theorems of the square
and the cube. Section 8.7 studies the basic isogeny defined by multiplication
by n . Section 8.8 contains the characterization of odd elements in the Picard
variety. Section 8.9 studies the factorization of an abelian variety into simple
abelian varieties and proves Poincaré’s complete reducibility theorem. Section
8.10 deals with the construction of the Jacobian of a curve and gives the main prop-
erties of the theta divisor, needed for the proof of the Mordell conjecture (Faltings’s
theorem).
In order to read this chapter, the reader should have some knowledge of algebraic
geometry as provided by Appendix A. At the end of almost every section, we give
a complex analytic description of the results. These expositions are rather sketchy
and will not be used anywhere else in the book, but they might be useful for readers
with a background in complex analysis (see also Section A.14).

8.2. Group varieties

Let K be a field and K̄ an algebraic closure of K . We assume that all occurring
varieties and morphisms are defined over K .
After the basic definitions, we state and prove the constancy lemma. If a mor-
phism ϕ defined on a product X × Y is constant on one fibre X × {y0 }, then
completeness of X implies that ϕ is constant on every X ×{y}. Abelian varieties
are complete group varieties. As applications of the constancy lemma, we get the
striking facts that abelian varieties are commutative and that every morphism of
abelian varieties is a translation of a homomorphism.
Another important tool is the use of translations to prove that a generic property of
a group variety holds everywhere. In this way, we will show that abelian varieties
are smooth, that the dimension formula and other properties hold for homomor-
phisms and that the tangent bundle is trivial. Another important result is that a
rational map to an abelian variety is a morphism at all smooth points. Finally,
complex abelian varieties are biholomorphic to complex tori equipped with posi-
tive definite Riemann forms.
In this section, the reader is assumed to be familiar with complete varieties (see
Section A.6) and with the concept of smoothness (see Section A.7).
Definition 8.2.1. A variety G with morphisms
\[
m : G \times G \longrightarrow G,\ (x, y) \longmapsto xy \quad \text{(multiplication)}, \qquad
\iota : G \longrightarrow G,\ x \longmapsto x^{-1} \quad \text{(inverse)},
\]
and with an element ε ∈ G(K) is called a group variety (over K ) if G(K) is a
group with multiplication, inverse, identity induced by m, ι, ε.
If G1 , G2 are group varieties with multiplications m1 , m2 , then a morphism ϕ :
G1 → G2 with ϕ ◦ m1 = m2 ◦ (ϕ × ϕ) is called a homomorphism of group
varieties. If there is also a homomorphism ψ : G2 → G1 such that ψ ◦ ϕ and
ϕ ◦ ψ are both the identity, then ϕ is called an isomorphism of group varieties. If
G1 = G2 , then a homomorphism (resp. isomorphism) is called an endomorphism
(resp. automorphism) as usual.
A closed subvariety of G , whose K -rational points form a subgroup of G(K), is
a group variety. We say that it is a closed subgroup of G .
Example 8.2.2. The linear tori studied in Chapter 3 are commutative affine group
varieties. In this chapter, we want to perform a similar study for the following
objects:
Definition 8.2.3. An abelian variety is a geometrically irreducible and geometri-
cally reduced complete group variety.
A homomorphism of abelian varieties is nothing other than a homomorphism
of group varieties. An abelian subvariety B of an abelian variety A is a geometrically
irreducible and geometrically reduced closed subgroup of A . As B is closed
in A , it is again an abelian variety.
Example 8.2.4. Let Mn denote the set of n × n -matrices with entries in K .
The identification of A^{n^2}(K) with Mn , together with addition, makes the latter
into an irreducible affine group variety over K . The determinant is a morphism
det : Mn → A1K , so we have an affine open irreducible subvariety GL(n)K ,
defined as the complement of the vanishing locus of the determinant (cf. Example
A.3.12). It is immediately seen that GL(n)K and matrix multiplication form a
group variety. All closed subgroups, as for example the special linear group
SL(n)K := {a ∈ GL(n)K | det(a) = 1}
or the upper triangular matrices, are affine group varieties.
Remark 8.2.5. We mention some general facts about the structure of group varieties not
used in the book. Every affine group variety is isomorphic to a closed subgroup of GL(n)
(cf. W.C. Waterhouse [323], 3.4). We will not consider here the theory of affine group
varieties and we refer the reader to the literature.
Let G be an irreducible group variety over a perfect field K . By a theorem of Chevalley (cf.
S. Bosch, W. Lütkebohmert, and M. Raynaud [44], Th.9.2.1), there is a smallest irreducible
affine closed subgroup H and an abelian variety A such that we have an exact sequence
0 −→ H −→ G −→ A −→ 0.
To study general group varieties we have to understand both affine group varieties and
abelian varieties. Since the trivial group variety A0K is the only complete geometrically
irreducible affine variety (see A.6.15 (d)), no other affine group variety is an abelian variety.

Next, we come to the constancy lemma:


Lemma 8.2.6. Let X, Y, Z be varieties such that X is complete and both X and
Y are geometrically irreducible. If f : X × Y → Z is a morphism such that
f (X × {y0 }) = {z0 }
for some y0 ∈ Y and z0 ∈ Z , then f (X × {y}) is a point for every y ∈ Y .
Proof: By base change, we may assume that K is algebraically closed. Let U be
an open affine neighborhood of z0 . The image C of f −1 (Z\U ) by the projection
X × Y → Y is closed because X is complete. Then V : = Y \C is an open
neighborhood of y0 and, for any y ∈ V , we have a morphism X → U , given by
x → f (x, y). Since X is complete and irreducible and U is affine, the morphism
has to be constant for any y ∈ V , with image f (x0 , y) for any choice of a point x0 ∈ X
(use A.6.15).
Now note that
\[
S := \{y \in Y \mid |f(X \times \{y\})| = 1\} = \bigcap_{x_1, x_2 \in X} \{y \in Y \mid f(x_1, y) = f(x_2, y)\}
\]
is closed in Y . Since it contains the non-empty open subset V of Y and since Y
is irreducible, we conclude that S = Y , proving our claim. □
Example 8.2.7. The affine line A1K is not a complete variety, because xy = 1
is a closed subvariety of A1K × A1K , while its projection on the second factor is
A1K \{0}, which is not closed in A1K . Now consider the morphism f : A1K ×
A1K → A1K , given by (x, y) → xy . Then f satisfies the hypothesis of the con-
stancy lemma in 8.2.6, namely
f (A1K × {0}) = {0}.
This shows that the constancy lemma in 8.2.6 does not hold for non-complete X : indeed, f (A1K × {y}) = A1K is not a point for any y ≠ 0.
Corollary 8.2.8. Let X, Y be geometrically irreducible varieties with at least one
K -rational point. We assume that X is complete. A morphism f : X × Y −→ G
of a product into a group variety G factorizes as f (x, y) = g(x)h(y), for suitable
morphisms g : X −→ G and h : Y −→ G .
Proof: We choose a point y0 ∈ Y (K) and define g : X −→ G by g(x) =
f (x, y0 ). The morphism F : X × Y −→ G defined by F (x, y) := g(x)−1 f (x, y)
satisfies F (X × {y0 }) = {ε}, where ε is the identity of G . Now the constancy
lemma in 8.2.6 shows that F (X × {y}) is a point, say h(y), for every y ∈ Y ,
and f (x, y) = g(x)h(y). In order to verify that h is a K -morphism, note that
h = g(x0 )−1 f (x0 , ·) for any x0 ∈ X(K). □
Corollary 8.2.9. Let ϕ : A → G be a morphism of the abelian variety A into
the group variety G . Then the map
ψ : A −→ G, a −→ ϕ(a)ϕ(εA )−1
is a homomorphism of group varieties.
Proof: Apply the constancy lemma (see Lemma 8.2.6) with f : A × A → G ,
given by
(x, y) → ψ(x)ψ(y)ψ(xy)−1 ,
and with y0 and z0 the identity elements εA , εG of A and G . We conclude that
the restriction of f to A×{y} is a constant map for every y . Since f ({εA }×A) =
{εG }, we deduce that f is constant, with image the identity of G . 
Corollary 8.2.10. An abelian variety is commutative.
Proof: By Corollary 8.2.9, the inverse map ι is a homomorphism. This is equiva-
lent to commutativity. 
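
For the reader's convenience, the equivalence used in this proof can be spelled out. The following is a sketch of the standard group-theoretic argument, written multiplicatively for arbitrary elements x, y of the group:

```latex
% In any group (xy)^{-1} = y^{-1}x^{-1}. If \iota is a homomorphism, then \iota(xy) = \iota(x)\iota(y), i.e.
\[
y^{-1}x^{-1} = (xy)^{-1} = x^{-1}y^{-1} \quad \text{for all } x, y .
\]
% Replacing x, y by x^{-1}, y^{-1} turns this into yx = xy, which is commutativity.
% Conversely, if the group is commutative, then (xy)^{-1} = y^{-1}x^{-1} = x^{-1}y^{-1}.
% Corollary 8.2.9 applies to \varphi = \iota because \iota(\varepsilon) = \varepsilon,
% so that \psi(a) = \iota(a)\iota(\varepsilon)^{-1} = \iota(a).
```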

8.2.11. This allows us to change conventions. From now on, we write an abelian
variety additively, hence
m(x, y) = x + y,
ι(x) = −x,
and the identity is denoted by 0. For a ∈ A , the morphism τa (x) : = x + a is
called translation by a.

For n ∈ Z, we denote by [n] the endomorphism of A , which is multiplication by
n . The kernel of [n] is denoted by A[n]. It is the n-torsion subgroup of A . We will
also use these notations for any abelian group.
Proposition 8.2.12. A geometrically reduced group variety is smooth.
Proof: By base change and using A.7.14, we may assume K algebraically closed.
By A.7.16, there is an open dense smooth subset U . As above, we can define left-
and right-translation by a point of the group variety. They are automorphisms and
so the left-translation of U is also smooth. If we vary the left-translations, then we
get an open cover of the group variety, proving the claim. 
Proposition 8.2.13. For a group variety G over K , the following conditions are
equivalent:
(a) G is connected;
(b) G is geometrically connected (i.e. GK̄ is connected);
(c) G is irreducible;
(d) G is geometrically irreducible.
In particular, a connected complete geometrically reduced group variety over K
is an abelian variety.
Proof: First, we note that a K -variety with at least one K -rational point is con-
nected if and only if it is geometrically connected (use A. Grothendieck [137],
Prop.4.5.13). This proves equivalence of (a) and (b). Every irreducible variety is
connected. So it remains to prove that (b) implies (d). We may assume that K is
algebraically closed and G connected. By Proposition 8.2.12, G is smooth and
therefore G is the disjoint union of its irreducible components (cf. A.7.14). We
conclude that G is irreducible. 
8.2.14. Let ϕ : G −→ H be a homomorphism of group varieties. Then the image
im(ϕ) is a closed subgroup of H ([85], Ch.II, §5, Prop.5.1).
8.2.15. There are more problems in handling ker(ϕ) . The main difficulty is that, if we
insist in defining ker(ϕ) as a group variety, then we lose the main formalism associated
to group homomorphisms. This is due to the appearance of nilpotent elements in ker(ϕ) ,
i.e. ker(ϕ) need not be reduced. For example, the support of the kernel of the Frobenius
homomorphism x → x^p on the multiplicative group Gm over Fp contains only 1 , and
therefore cannot be distinguished from the kernel of the identity map. These difficulties
disappear in the context of group schemes. For details, we refer to I.R. Shafarevich [280],
Ch.V, 4.2. The natural way is to define ker(ϕ) as the cartesian product G ×H Spec(K) in
the category of schemes with respect to the Cartesian diagram
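\[
\begin{array}{ccc}
\ker(\varphi) & \longrightarrow & \operatorname{Spec}(K) \\
\downarrow & & \downarrow{\scriptstyle \varepsilon_H} \\
G & \stackrel{\varphi}{\longrightarrow} & H
\end{array}
\]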

where εH is the map of Spec(K) to the neutral element of H . Then ker(ϕ) is a closed
subscheme such that its K -rational points form a group. On the other hand, ker(ϕ) need
not be reduced and so is not necessarily a group variety. Working with group schemes is
therefore the natural way of overcoming these obstacles, leading to a coherent theory. The
more elementary classical theory in the framework of group varieties is adequate only for
separable maps. On the other hand, a famous result of Cartier states that every group scheme
in characteristic 0 is reduced, i.e. is a group variety (see [85], Ch.II, §6, no.1).
Since the fact that ker(ϕ) is in general only a group scheme plays no role in this text, we
only consider
ker(ϕ) := {x ∈ G(K) | ϕ(x) = εH }
as a closed subgroup of G , unless specified otherwise.
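
The Frobenius example mentioned above can be made explicit. The computation below is a sketch, assuming the usual description of Gm over Fp as the spectrum of the ring Fp[x, x^{-1}]; it shows where the nilpotent elements come from:

```latex
% The scheme-theoretic kernel of x \mapsto x^p on G_m is cut out by the equation x^p = 1.
% In characteristic p the binomial coefficients \binom{p}{j}, 0 < j < p, vanish, so
\[
x^p - 1 = (x - 1)^p \quad \text{in } \mathbb{F}_p[x, x^{-1}] .
\]
% The coordinate ring of the kernel is therefore
\[
\mathbb{F}_p[x, x^{-1}] / \bigl((x-1)^p\bigr),
\]
% in which t = x - 1 satisfies t^p = 0 but t \neq 0: a non-zero nilpotent element.
% The only point of this scheme is x = 1, which is all that the set-theoretic kernel records.
```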

Next, we consider the dimension theorem of group varieties.


Theorem 8.2.16. Let ϕ : G → H be a surjective homomorphism of irreducible
group varieties. Then
dim(G) = dim(H) + dim(ker(ϕ)).
Proof: Since all fibres are isomorphic to ker(ϕ), this follows from the dimension
theorem of varieties (cf. A.12.1). 
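
As a quick consistency check of the dimension formula, one may take the determinant from Example 8.2.4. This is an illustration only; it assumes that the target A1K \ {0} of the determinant is regarded as the group variety GL(1)K = Gm:

```latex
% det : GL(n)_K -> G_m is a surjective homomorphism of irreducible group varieties with kernel SL(n)_K.
% GL(n)_K is open in affine n^2-space, and SL(n)_K is the hypersurface det = 1, so
\[
\dim \mathrm{GL}(n)_K = n^2, \qquad \dim \mathbb{G}_m = 1, \qquad \dim \mathrm{SL}(n)_K = n^2 - 1,
\]
% and the formula dim(G) = dim(H) + dim(ker(\varphi)) reads
\[
n^2 = 1 + (n^2 - 1).
\]
```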
Proposition 8.2.17. Let ϕ : G → H be a surjective homomorphism of irre-
ducible group varieties. Then ϕ is flat. Moreover, if dim(G) = dim(H), then ϕ
is finite and |ker(ϕ)| is equal to the separable degree of the field extension K(G)
over K(H).
Proof: By generic flatness (cf. A.12.13), there is an open dense subset U of
G such that ϕ|U is flat. Of course, any translate of U is as good as U itself.
Assuming for a moment K algebraically closed, we may cover G by translates of
U . This proves flatness of ϕ . If K is not algebraically closed, then we perform
base change to K̄ . However, we have to work in the category of schemes to ensure
that a morphism over K is flat if and only if its base change to K̄ is flat (cf. [137],
Prop.2.5.1).
If dim(G) = dim(H), then there is an open dense subset U′ of H such that ϕ
induces a finite map U := ϕ−1 (U′) → U′ whose fibres have cardinality equal
to the separable degree of K(G) over K(H) (cf. A.12.9). Also, this cardinality
equals |ker(ϕ)|. Again, we assume that K is algebraically closed to cover G by
translates of U , proving finiteness of ϕ . If K is not algebraically closed, then we
use base change to K̄ and the fact that a morphism over K is finite if and only if
its base change to K̄ is finite (cf. [137], Prop.2.7.1). □
8.2.18. A rational curve is a curve birational to P1K . A variety X is called ra-
tionally connected if any two points in X(K) may be connected by a rational
curve over K . It follows from the constancy lemma in 8.2.6 that abelian varieties
do not contain rational curves (cf. Corollary 8.2.20 below). In particular, a mor-
phism X −→ A of X into an abelian variety A contracts the rational curves of
X to points. It follows that any morphism of a rationally connected variety, such
as projective space Pn , into an abelian variety is constant.
Proposition 8.2.19. Any morphism f : P1K −→ G of the projective line into a
group variety is constant.
Proof: Let (x0 : x1 ) be homogeneous coordinates on P1K . The map s : P1K ×
A1K −→ P1K given by s((x0 : x1 ), y) = (x0 : (x1 + x0 y)) is a morphism. Now
let f : P1K −→ G be a morphism of P1K into the group variety G . We apply
Corollary 8.2.8 to the composition
\[ \mathbb{P}^1_K \times \mathbb{A}^1_K \stackrel{s}{\longrightarrow} \mathbb{P}^1_K \stackrel{f}{\longrightarrow} G \]
and obtain that f ◦ s factorizes as f (s(x, y)) = g(x)h(y) for two suitable mor-
phisms g : P1K −→ G , h : A1K −→ G .
We set first y = 0, note that s(x, 0) = x to get g(x) = f (x)h(0)−1 . Thus
f (s(x, y)) = f (x)h(0)−1 h(y).
Next we set x = ∞ , note that s(∞, y) = ∞ to get
f (∞) = f (∞)h(0)−1 h(y).
This shows that h(y) = h(0), so h is a constant map and f (s(x, y)) = f (x).
Finally we set x = 0, note that s(0, y) = y , and find f (y) = f (0). 
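
The three evaluations of s used in this proof are easy to verify by hand. The following sketch assumes the conventions 0 = (1 : 0) and ∞ = (0 : 1) for the homogeneous coordinates (x0 : x1), which are the ones consistent with the identities s(x, 0) = x, s(∞, y) = ∞ and s(0, y) = y quoted in the proof:

```latex
% In the affine coordinate t = x_1/x_0 on the chart x_0 \neq 0, the map s is translation by y:
\[
s\bigl((1 : t),\, y\bigr) = (1 : t + y), \qquad \text{i.e. } t \longmapsto t + y ,
\]
% so s(x, 0) = x and s(0, y) = (1 : y), the point with coordinate y.
% The point at infinity is fixed by every translation:
\[
s\bigl((0 : 1),\, y\bigr) = (0 : 1 + 0 \cdot y) = (0 : 1).
\]
```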
Corollary 8.2.20. Let U be an open set of the projective line P1K . Then any
morphism f : U −→ A of U into an abelian variety is constant.
Proof: By the valuative criterion of properness (cf. A.11.10), f extends to a
morphism from P1K to A . By Proposition 8.2.19, we get the claim. 
Theorem 8.2.21. Let ϕ : X ⇢ G be a rational map of a smooth variety X
to a group variety G and let Umax be the domain of ϕ . Then every irreducible
component of X \ Umax is of codimension 1.
Proof: By base change and A.11.9, we may assume K algebraically closed. By
A.7.14, we may assume that X is irreducible. Consider the rational map Φ :
X × X ⇢ G , given by Φ(x, y) := ϕ(x)ϕ(y)−1 . It is clear that the restriction of
Φ to the diagonal ∆ is constant equal to the identity ε of G . First, we prove that
ϕ is defined in x ∈ X if and only if Φ is defined in (x, x). If ϕ is defined in x,
then Φ is obviously defined in (x, x) and Φ(x, x) = ε . This proves also that Φ
is defined and constant ε on an open dense subset of the diagonal. Conversely, we
assume that Φ is defined in (x, x). The above shows that Φ(x, x) = ε . Since Φ
is defined in an open neighbourhood W of (x, x), there is an open neighbourhood
V of x such that {x} × V ⊂ W . Let y ∈ Umax ∩ V . Then we define ϕ(x) :=
Φ(x, y)ϕ(y). This defines a morphism even in an open neighbourhood of x and
agrees with the given ϕ on Umax . This proves x ∈ Umax .
By A.11.8, Φ is defined in (x, x) if and only if f ↦ f ◦ Φ maps OG,ε to
OX×X,(x,x) . The latter is equivalent to the condition that for every f ∈ OG,ε \
{0}, the poles of div(f ◦ Φ) omit (x, x) (cf. A.8.21). Given f ∈ OG,ε , let Pf
be the divisor of poles of f ◦ Φ . We have seen above that Φ is defined on an
open subset of the diagonal, hence every irreducible component of ∆ ∩ Pf has
codimension 1 in ∆ (cf. A.8.24). We conclude that the complement of Umax is
equal to the union of all ∆ ∩ Pf projected to X . This proves the claim. 
Corollary 8.2.22. A rational map from a smooth variety to an abelian variety is a
morphism.
Proof: Let ϕ : X  A be a rational map with domain Umax . By the valuative
criterion of properness (cf. A.11.10), X \ Umax has codimension at least 2. Then
Umax = X follows by appealing to Theorem 8.2.21. 
Remark 8.2.23. Next, we prove that the differential of multiplication on a group
variety is given by addition. This is quite easy to see for a complex abelian variety
given analytically by V /Λ , where Λ is a lattice in the complex vector space V (see
8.2.27 below). Then V may be seen as the tangent space at 0 and the differential
of addition on V /Λ is just addition on V .
Proposition 8.2.24. Let m : G × G −→ G be multiplication of a smooth group
variety G . Then the differential of m at ε is the map TG,ε ⊕ TG,ε −→ TG,ε given
by addition of tangent vectors.
Proof: Note that TG×G,(ε,ε) = TG,ε ⊕ TG,ε (cf. A.7.17). For ∂ ∈ TG,ε , we have
dm(∂, 0) = dm ◦ dι(∂),
where ι : G → G × G, g → (g, ε). Since dm ◦ dι = d(m ◦ ι) is the identity
map, we conclude that dm(∂, 0) = ∂ . In the same way, we prove dm(0, ∂)= ∂ .
By linearity of dm , this gives the claim. 
Corollary 8.2.25. Let G be a smooth group variety and for n ∈ Z, let [n] :
G −→ G be the morphism x ↦ x^n. Then the differential of [n] at ε is the
endomorphism of TG,ε given by multiplying tangent vectors by n.
Proof: For ∂ ∈ TG,ε and for the diagonal morphism ∆ : G → G × G, we have
d[2](∂) = d(m ◦ ∆)(∂) = dm ◦ d∆(∂) = dm(∂, ∂) = 2∂,
proving the case n = 2. The general case follows by induction with the same
argument.
Proposition 8.2.26. Let G be an irreducible smooth group variety. Then the tan-
gent bundle TG on G is a trivial vector bundle of rank equal to dim(G).
Proof: Let ∂ε ∈ TG,ε . By translation, we extend ∂ε to a vector field ∂ on G .
More precisely, let τx (y) := yx be right translation on G and let ∂x (f ) :=
∂ε (f ◦ τx ) for any x ∈ G and f ∈ OG,x . Standard arguments for derivatives
show that ∂ is a vector field on G . Clearly, linearly independent tangent vectors
in ε extend to vector fields, which are linearly independent in every fibre. This
proves the claim. 
8.2.27. We review here the analytic description of an abelian variety. For details and proofs,
we refer to D. Mumford [212], Ch.I. Let A be an abelian variety over C endowed with its
complex analytic structure (cf. A.14). Then A is a compact complex manifold. Let V :=
T0 A. The kernel of exp : V → A is a lattice Λ in V and we get an isomorphism V/Λ → A
of complex Lie groups. By Proposition 8.2.26, we may identify V with H^0(A, TA) =
H^0(A, Ω^1_A)^∗, i.e. with the dual of the space of global holomorphic 1-forms. Then we may
identify the first homology group H1(A, Z) with Λ by using the period map ω ↦ ∫_γ ω
for γ ∈ H1(A, Z).
Conversely, let Λ be a lattice in a finite-dimensional complex vector space V . Then the
compact complex manifold T := V /Λ is called a complex torus. A hermitian form H on
V is called a Riemann form for T if E := Im H is an integer valued alternating bilinear
form on Λ . Note that an alternating bilinear form E on Λ with values in Z is the imaginary
part of a Riemann form if and only if E(iv, iw) = E(v, w) for all v, w ∈ V . We get the
Riemann form by H(v, w) := E(iv, w) + iE(v, w) . The torus T is an abelian variety if
and only if there is a positive definite Riemann form for T . Moreover, this is equivalent for
T to be the complex space associated to a complex algebraic variety.

To give an idea of the Riemann form, let L be a line bundle on the complex torus T .
Here in these analytical remarks, this always means a holomorphic line bundle. Then the
cohomology group H^k(T, Z) can be identified with the group of alternating Z-valued
k-forms on Λ. This is clear for k = 1 and follows from the isomorphism ⋀^k H^1(T, Z) →
H^k(T, Z) induced by cup product. It is easy to see that multiplication by i is an isometry
with respect to the alternating bilinear form corresponding to the Chern class c1(L) ∈
H^2(T, Z) and so it may be viewed as the imaginary part of a unique Riemann form H.
Then H is positive definite if and only if L is ample. Note that two line bundles have the
same Chern class (and hence the same Riemann form) if and only if they are algebraically
equivalent (use A.14.8).
An analytic homomorphism ϕ : T = V/Λ → T' = V'/Λ' of complex tori is the quotient
map of the linear map dϕ : V → V' of tangent spaces, using that dϕ(Λ) ⊂ Λ' is induced
from
Λ = H1(T, Z) → Λ' = H1(T', Z), γ ↦ ϕ ◦ γ.
Conversely, every linear map ψ : V → V' with ψ(Λ) ⊂ Λ' induces an analytic
homomorphism ϕ : T → T' with dϕ = ψ.

8.3. Elliptic curves

By Proposition 8.2.26, the cotangent bundle of an abelian variety over the field K
is trivial. Thus an abelian variety of dimension 1 has genus 1, i.e. is an elliptic
curve. In this section we prove the converse statement, namely that an elliptic
curve has a group structure and is an abelian variety. By Corollary 8.2.9, this
group structure is unique up to translations.
Elliptic curves are a major tool in arithmetic and play a role also in other parts of
mathematics. The interested reader may consult the monographs by D. Husemöller
[155] or J. Silverman [284] for a deeper study of the subject. We assume the reader
to be familiar with the theory of algebraic curves as provided by Section A.13.
Definition 8.3.1. An elliptic curve over K is a geometrically irreducible smooth
projective curve E of genus g(E) = 1 defined over K , equipped with a rational
point P0 ∈ E(K).
8.3.2. Note that geometrically irreducible is the same as irreducible by A.7.14. Let
E be an elliptic curve over K and let D be a divisor on EK of degree deg(D) >
0. The space of global sections Γ(EK , O(D)) may be realized as the subspace

L(D) := {f ∈ K(EK )× | div(f ) ≥ −D} ∪ {0}

in K(EK), using the homomorphism s ↦ s/s_D. By the Riemann–Roch theorem
in A.13.5, we have
dimK L(D) = deg(D), (8.1)
hence the corresponding linear system |DK| has dimension deg(D) − 1. It follows
that two distinct points on E are never rationally equivalent over K.

Let us fix a base point P0 ∈ E(K). For two points P1, P2 ∈ E(K), let D :=
[P1] + [P2] − [P0]. Thus deg(D) = 1 and L(D) is one-dimensional, generated
by a function f, unique up to multiplication by a scalar. By construction, this
function has a pole divisor majorized by [P1] + [P2]. If P0 ∉ {P1, P2}, then f
has pole divisor [P1] + [P2] and vanishes at P0 and at exactly one other point P3,
which is the unique point rationally equivalent to [P1] + [P2] − [P0]. This makes
sense even if P1 or P2 equals P0. Thus we get a well-defined composition law
on E by (P1, P2) ↦ P1 + P2 := P3.
We should distinguish carefully between addition of points P1, P2 on E and of
the corresponding divisors [P1], [P2]. Remembering that Pic0(EK) is the group
of rational equivalence classes of divisors of degree 0 (see A.9.40), we get an
additive map
E −→ Pic0(EK), P ↦ cl([P] − [P0]).
Formula (8.1) shows easily that this map is bijective. We shall give in 8.3.4 an
equation for E as a plane cubic curve, defined over K . This will lead in 8.3.6 to a
geometric interpretation of the group operations and in particular the inverse map
will be shown to be a morphism. In 8.3.7, we deduce explicit formulas for our
composition law in terms of the affine coordinates of the plane cubic curve. They
will be postulated in the addition law 8.3.8. We shall use them in 8.3.9 to prove
that the composition law is a morphism. This will prove:
Proposition 8.3.3. If the group structure on an elliptic curve E over K with base
point P0 ∈ E(K) is given by the bijective map
E −→ Pic0(EK), P ↦ cl([P] − [P0]),
then E is an abelian variety defined over K .
8.3.4. We give here the classical argument showing that E has a model given by
a smooth cubic curve.
In the same way as in 8.3.2, we realize Γ(E, O(D)) explicitly by
L(D) := {f ∈ K(E)× | div(f ) ≥ −D} ∪ {0}
for any divisor D on E . If deg(D) > 0, then the Riemann–Roch theorem in
A.13.5 again shows that L(D) has dimension deg(D).
We have an ascending chain of K -vector spaces
L([P0 ]) ⊂ L(2[P0 ]) ⊂ L(3[P0 ]) ⊂ L(4[P0 ]) ⊂ L(5[P0 ]) ⊂ L(6[P0 ])
and the j th member has dimension j .
Obviously, 1 is a basis of L([P0 ]). Since P0 is defined over K , there are x, y ∈
K(E) such that 1, x is a basis of L(2[P0 ]) and 1, x, y is a basis of L(3[P0 ]).
Looking at the order of pole at P0, it is clear that 1, x, y, x^2 is a basis of L(4[P0])
and 1, x, y, x^2, xy is a basis of L(5[P0]). Moreover, x^3, y^2 ∈ L(6[P0]). This
gives us seven elements 1, x, y, x^2, xy, x^3, y^2 spanning L(6[P0]). They must
be linearly dependent over K and we obtain a linear relation
c0 + c1 x + c2 y + c3 x^2 + c4 xy + c5 x^3 + c6 y^2 = 0
with coefficients in K. By the above, c5 and c6 are different from zero, so that
we may normalize c5 = −1. If we replace x by x/c6 and y by y/(c6)^2 and then
multiply by (c6)^3, we get a relation of the form
y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6 (8.2)
with ai ∈ K.
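The normalization step can be sanity-checked with exact rational arithmetic. The sketch below is an illustration only: it assumes c5 = −1, uses the rescaled coordinates X = c6·x and Y = (c6)^2·y, and the function names and the sample values of the ci are hypothetical choices, not taken from the text.

```python
from fractions import Fraction as F

def weierstrass_coefficients(c0, c1, c2, c3, c4, c6):
    """Return (a1, a2, a3, a4, a6) obtained from the linear relation
    c0 + c1*x + c2*y + c3*x^2 + c4*x*y - x^3 + c6*y^2 = 0
    in the rescaled coordinates X = c6*x, Y = c6^2*y."""
    return (c4, -c3 * c6, c2 * c6, -c1 * c6**2, -c0 * c6**3)

def satisfies_weierstrass(a, X, Y):
    """Check Y^2 + a1*X*Y + a3*Y == X^3 + a2*X^2 + a4*X + a6 exactly."""
    a1, a2, a3, a4, a6 = a
    return Y**2 + a1 * X * Y + a3 * Y == X**3 + a2 * X**2 + a4 * X + a6

# Arbitrary sample coefficients (assumptions for the demo), with c5 = -1.
c1, c2, c3, c4, c6 = F(2), F(-3), F(1, 2), F(5), F(7, 3)
# Pick an arbitrary pair (x, y) and choose c0 so that the relation holds there.
x, y = F(3, 4), F(-2, 5)
c0 = -(c1 * x + c2 * y + c3 * x**2 + c4 * x * y - x**3 + c6 * y**2)

a = weierstrass_coefficients(c0, c1, c2, c3, c4, c6)
X, Y = c6 * x, c6**2 * y
print(satisfies_weierstrass(a, X, Y))
```

Because the check is done at a pair (x, y) forced onto the original relation, it only confirms the bookkeeping of the rescaling, not smoothness or irreducibility of the resulting cubic.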
Since deg(3[P0 ]) = 3 = 2g(E) + 1, the divisor 3[P0 ] is very ample (cf. A.13.7).
Hence the basis of L(3[P0 ]) corresponding to 1, x, y induces a closed embedding
of E into P2K (cf. Remark A.6.11). We know by (8.2) that the image of E is
contained in the projective curve with Weierstrass equation
x_0 x_2^2 + a_1 x_0 x_1 x_2 + a_3 x_0^2 x_2 = x_1^3 + a_2 x_0 x_1^2 + a_4 x_0^2 x_1 + a_6 x_0^3
in the homogeneous coordinates (x0 : x1 : x2 ) of P2K .
It is an easy matter to prove that the curve defined above is geometrically irre-
ducible, hence it gives a projective model of E as a smooth plane cubic curve.
Note also that the rational functions x = x1 /x0 and y = x2 /x0 are nothing
else than the two functions x, y defined before, hence the affine form (8.2) of the
Weierstrass equation describes the affine curve E ∩ {x0 ≠ 0}. The only point of
E outside this part is the point (0 : 0 : 1) ∈ P2K , corresponding to P0 ∈ E . It is
easily seen that, in this model, P0 is an inflexion point of E .
Remark 8.3.5. If char(K) ≠ 2, then replacing y by (1/2)(y − a1 x − a3) leads
to a Weierstrass equation with a1 = a3 = 0. Then the Jacobi criterion shows
that a Weierstrass equation describes a smooth curve C in P2K if and only if the
discriminant of the cubic polynomial x^3 + a2 x^2 + a4 x + a6 is not zero (see
Proposition 10.2.3 for the argument). By the genus formula
g(C) = (1/2)(deg(C) − 1)(deg(C) − 2)
for a smooth plane curve (cf. A.13.4), this is an elliptic curve. If char(K) ≠ 3,
then a further linear transformation leads to the well-known Weierstrass normal
form
y^2 = 4x^3 − g2 x − g3
of the elliptic curve. For generalizations and details, see [284].
8.3.6. Now we go back to arbitrary characteristic. We describe more explicitly
the group structure of the abelian group E , beginning by proving that the inverse
operation is a morphism.

Consider the rational equivalence relation


[P1 ] + [P2 ] + [P3 ] ∼ 3[P0 ] (8.3)
on EK . This relation is equivalent to the geometric statement that the points
P1 , P2 , P3 are the three intersection points, counted with multiplicity, of a straight
line with E. We verify this as follows. The lines in P2K are just the divisors
of the global sections of O_{P2K}(1) and, by construction, the restriction of this line
bundle to E is isomorphic to O(3[P0]). First, we assume that [P1] + [P2] +
[P3] ∼ 3[P0]. Then there exists s' ∈ Γ(EK, O(3[P0])) with div(s') = [P1] +
[P2] + [P3] (cf. A.8.21). By construction of the embedding E → P2K, there
is s ∈ Γ(P2K, O_{P2K}(1)) with s' = s|E. Then the line ℓ = div(s) is the line
through the three points Pi. Indeed, by definition of proper intersection product
(cf. A.9.20), we have
ℓ.E = div(s|E) = div(s') = [P1] + [P2] + [P3].
The converse statement is proved in the same way by reversing the previous argu-
ment.
The zero element of E is the point P0 = (0 : 0 : 1).
The inverse P2 := −P1 of a point P1 ∈ E is characterized by the rational
equivalence relation [P1 ] + [P2 ] ∼ 2[P0 ], which can be rewritten as the special
case
[P0 ] + [P1 ] + [P2 ] ∼ 3[P0 ]
of (8.3).
It follows that P0, P1, P2 are on a straight line and in fact, noting that P0 = (0 : 0 :
1), we see that, if P1 ≠ P0, then P2 is the residual finite intersection of E with
the vertical line in the (x, y)-plane going through P1. If (x1, y1) are the affine
coordinates of P1, then, using (8.2), the affine coordinates (x2, y2)
of P2 are given by
x2 = x1,
y2 = −a1 x1 − a3 − y1.
Thus the inverse map is an automorphism of the affine part of E defined over
K . On the other hand, a rational map of a smooth projective curve is always a
morphism (cf. A.11.10). We conclude that the above restriction extends to an
automorphism of E . This requires that 0 is mapped to 0, hence the inverse map
is a morphism on E defined over K .
8.3.7. We study here addition on the elliptic curve. By 8.3.6, it is enough to con-
struct
P3 = − (P1 + P2 ).
The point P3 is characterized by the rational equivalence relation (8.3). As we
have seen in 8.3.6, P3 is the third intersection point of the line ℓ through P1 and
P2 with E, taking this line to be the tangent line to E at P1 if P1 = P2.
If P1 ≠ P0 and P2 ∉ {P0, −P1}, then the third intersection point of the line
through P1, P2 with E is contained in the (x, y)-plane. Let y = ax + b be
the equation for this line. We eliminate y in (8.2), obtaining a cubic
equation for x, with two known solutions x1, x2. This equation has the form
x^3 − (a^2 + a1 a − a2) x^2 + terms of lower degree = 0.
The third solution x3 is determined by the trace x1 + x2 + x3 = a^2 + a1 a − a2.
Since P1 + P2 = −P3, applying the inverse as in 8.3.6 we obtain:
Proposition 8.3.8 (Addition Law). Let E be the elliptic curve in normal form

y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6.

Then the origin O of the group E is the unique point at infinity and the group law
+ is defined as follows. Let P1 = (x1, y1), P2 = (x2, y2) be two finite points on
E and set
a = (y2 − y1)/(x2 − x1) if x1 ≠ x2,
a = (3(x1)^2 + 2 a2 x1 + a4 − a1 y1)/(2 y1 + a1 x1 + a3) if x1 = x2,
b = y1 − a x1.

Then:

(a) The inverse of P1 is given by −P1 = (x1, −a1 x1 − a3 − y1).

(b) If x2 = x1 and y2 = −a1 x1 − a3 − y1, then P1 + P2 = O.

(c) Otherwise, we have P1 + P2 = (a^2 + a1 a − a2 − x1 − x2, −(a + a1)(a^2 +
a1 a − a2 − x1 − x2) − a3 − b).
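The addition law translates directly into exact arithmetic over Q. The following is a minimal sketch, not a reference implementation: the representation of O as None, the tuple layout (a1, a2, a3, a4, a6) and the function names are assumptions made for illustration, and the sample curve y^2 + y = x^3 − x with the point (0, 0) is chosen only because its multiples stay small.

```python
from fractions import Fraction

O = None  # the point at infinity, i.e. the origin of the group

def on_curve(P, curve):
    """Check y^2 + a1*x*y + a3*y == x^3 + a2*x^2 + a4*x + a6 for a finite point."""
    if P is O:
        return True
    a1, a2, a3, a4, a6 = curve
    x, y = P
    return y * y + a1 * x * y + a3 * y == x**3 + a2 * x * x + a4 * x + a6

def neg(P, curve):
    """Part (a): -P = (x, -a1*x - a3 - y)."""
    if P is O:
        return O
    a1, _, a3, _, _ = curve
    x, y = P
    return (x, -a1 * x - a3 - y)

def add(P1, P2, curve):
    """Parts (b) and (c) of the addition law, with O as neutral element."""
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    a1, a2, a3, a4, a6 = curve
    x1, y1 = P1
    x2, y2 = P2
    if x1 == x2 and y2 == -a1 * x1 - a3 - y1:
        return O  # part (b): P2 = -P1
    if x1 != x2:
        a = Fraction(y2 - y1) / (x2 - x1)  # slope of the chord
    else:
        # slope of the tangent line at P1 = P2
        a = Fraction(3 * x1 * x1 + 2 * a2 * x1 + a4 - a1 * y1) / (2 * y1 + a1 * x1 + a3)
    b = y1 - a * x1
    x3 = a * a + a1 * a - a2 - x1 - x2
    y3 = -(a + a1) * x3 - a3 - b
    return (x3, y3)

curve = (0, 0, 1, -1, 0)   # y^2 + y = x^3 - x
P = (0, 0)
P2 = add(P, P, curve)
P3 = add(P, P2, curve)
print(P2, P3, on_curve(P3, curve))
print(add(add(P, P2, curve), P3, curve) == add(P, add(P2, P3, curve), curve))
```

The last line compares (P + 2P) + 3P with P + (2P + 3P), a single instance of the associative law rather than a proof of it.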

The addition law and the associative law for an elliptic curve in Weierstrass form
can be seen visually as in the following picture:

[Figure: the addition law and the associative law on y^2 = x^3 − x, with marked
points P, Q, R, P+Q, Q+R and −(P+Q+R).]


8.3.9. The addition law shows that addition is a rational map. In order to finish
the proof of Proposition 8.3.3, it remains to show that + is a morphism. By
A.11.9, we may assume that K is algebraically closed. In a first step, we prove
that translation τ_Q by Q ∈ E is a morphism. We may assume Q ≠ O. By the
formulas in Proposition 8.3.8, τ_Q is a rational map which restricts to a morphism
E \ {O, Q, −Q} → E \ {Q, O, Q + Q}. Since every rational map between
projective smooth curves extends to a morphism (cf. A.11.10), we get a morphism
τ'_Q : E → E, which agrees with τ_Q on E \ {O, Q, −Q}. It remains to prove
that τ'_Q = τ_Q. For R ∈ E, we get τ'_Q ◦ τ'_R = τ'_{Q+R}. In particular, every τ'_Q is
an isomorphism with inverse τ'_{−Q}. We conclude that τ'_Q maps {O, Q, −Q} onto
{Q, Q + Q, O}. For any R ∉ {O, Q, −Q, Q + Q, −Q − Q}, we have
τ'_R(τ'_Q(Q)) = τ'_{Q+R}(Q) = τ'_Q(τ'_R(Q)) = τ'_Q(Q + R) = Q + Q + R.
This excludes τ'_Q(Q) = Q immediately. On the other hand, we know τ'_R(O) ∈
{O, R, R + R}. Hence τ'_Q(Q) = O is only possible if Q + Q = O. This proves
τ'_Q(Q) = Q + Q = τ_Q(Q).
The equation
τ'_Q(−Q) = O = τ_Q(−Q)
is proved in a similar fashion. Thus, using that τ'_Q is a bijection, we conclude
that τ'_Q(O) = Q = τ_Q(O). We have handled all exceptions, thereby proving that
τ'_Q = τ_Q.

Next, we prove that addition is a morphism. The formulas in 8.3.8 show that
addition is a rational map m, which is a morphism outside of
Z := {(P, P) | P ∈ E} ∪ {(P, −P) | P ∈ E} ∪ (E × {O}) ∪ ({O} × E).
For (P, Q) ∈ Z, there are R, S ∈ E such that (P + R, Q + S) ∉ Z. Since
translations are morphisms by our above considerations, we see that
τ_{−R−S} ◦ m ◦ (τ_R × τ_S)
is a morphism in a neighbourhood of (P, Q) and agrees with + everywhere. This
proves that + is a morphism.
8.3.10. Complex analytically, an elliptic curve is biholomorphic to C/Λ for a lattice Λ in
C (cf. 8.2.27). In dimension 1 the converse is true, i.e. every one-dimensional complex
torus is biholomorphic to an abelian variety: The imaginary part of a Riemann form must
be an integer multiple of the alternating bilinear form E0 given in the following way. Let
λ1 , λ2 be a positively oriented Z -basis of Λ ; then E0 is characterized by
v ∧ w = E0 (v, w)λ1 ∧ λ2 (v, w ∈ C).
A Riemann form is positive definite if and only if its imaginary part is a negative multiple of
E0 . Note however that in higher dimensions a complex torus need not be an abelian variety
(in fact, this is the case for a general complex torus).
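As a small numerical illustration of the sign convention, one may take the lattice Λ = Z[i] with the positively oriented basis λ1 = 1, λ2 = i (an assumption made only for this sketch; the function names are likewise hypothetical). With E := −E0 and H(v, w) := E(iv, w) + iE(v, w) as in 8.2.27, the form H comes out as the standard hermitian form on C.

```python
def E0(v, w):
    """Coefficient of λ1 ∧ λ2 in v ∧ w for the basis λ1 = 1, λ2 = i."""
    return v.real * w.imag - v.imag * w.real

def E(v, w):
    """A negative multiple of E0 (here simply -E0)."""
    return -E0(v, w)

def H(v, w):
    """Riemann form built from its imaginary part: H = E(iv, w) + i*E(v, w)."""
    return complex(E(1j * v, w), E(v, w))

v, w = complex(2, -3), complex(1, 4)
print(E0(complex(1, 0), complex(0, 1)))  # value of E0 on the basis (λ1, λ2)
print(H(v, w) == v * w.conjugate())      # H is the standard hermitian form
print(H(v, v))                           # real and positive for v != 0
```

Replacing E by +E0 in this sketch flips the sign of H(v, v), which is the one-dimensional content of the positivity statement above.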
The description of the elliptic curve determined by C/Λ is done quite explicitly by means
of the Weierstrass ℘-function associated to the lattice Λ, namely
℘(z) := 1/z^2 + Σ_{ω ∈ Λ\{0}} ( 1/(z − ω)^2 − 1/ω^2 ).

It is a Λ-periodic meromorphic function on C and has double poles at the lattice points.
It satisfies the first-order differential equation
℘'(z)^2 = 4℘(z)^3 − g2 ℘(z) − g3,
where the coefficients g2, g3 are given by
g2 := 60 Σ_{ω ∈ Λ\{0}} 1/ω^4,    g3 := 140 Σ_{ω ∈ Λ\{0}} 1/ω^6.

The coefficients g2, g3 are examples of Eisenstein series.

The map z ↦ (℘(z), ℘'(z)) is biholomorphic from C/Λ onto the elliptic curve with affine
Weierstrass equation y^2 = 4x^3 − g2 x − g3. This map is also an isomorphism of groups.
For further details, we refer to [284].
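The coefficients g2 and g3 can be approximated by truncating the lattice sums. The sketch below assumes the particular lattice Λ = Z[i] and sums over a square box of side 2N + 1; both the choice of lattice and the truncation are illustrative assumptions, and the function names are hypothetical. For this lattice the symmetry ω ↦ iω forces g3 = 0, which the truncated sum reproduces up to rounding.

```python
def lattice_points(N):
    """Non-zero points m + n*i of Z[i] with |m|, |n| <= N."""
    return [complex(m, n)
            for m in range(-N, N + 1)
            for n in range(-N, N + 1)
            if (m, n) != (0, 0)]

def g2_g3(N=60):
    """Truncated sums g2 = 60 * sum(1/w^4) and g3 = 140 * sum(1/w^6)."""
    pts = lattice_points(N)
    g2 = 60 * sum(1 / w**4 for w in pts)
    g3 = 140 * sum(1 / w**6 for w in pts)
    return g2, g3

g2, g3 = g2_g3()
print(g2.real, abs(g2.imag), abs(g3))
```

The sums converge only like 1/N^2 and 1/N^4 respectively, so this is suitable for a plausibility check rather than for high-precision work.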

8.4. The Picard variety

Elliptic curves are the only standard explicit examples of abelian varieties, because
higher-dimensional abelian varieties can be defined only by means of a very large
number of equations, and little can be understood on abelian varieties by looking
directly at these equations.
In this respect, the cubic model of an elliptic curve is rather special and not rep-
resentative of the general situation. However, abelian varieties are ubiquitous in
algebraic geometry and they occur most naturally, through the Picard variety, in
the parametrization of families of divisor classes on a variety. For singular vari-
eties, it is better to use the Picard group instead of divisor classes (cf. Section A.8
and A.9.18).
This section is devoted to fundamental facts about Picard varieties. Here, the
reader is assumed to be familiar with the basic properties of the Picard group as
provided by A.5.16, with Section A.8 about divisors and with the concept of al-
gebraic equivalence for line bundles (see end of Section A.9). We fix the ground
field K and an algebraic closure K̄.
8.4.1. If ϕ : X −→ Y is a morphism of varieties over K and y ∈ Y , then the
fibre of ϕ over y is denoted by Xy . The pull-back of c ∈ Pic(X) to the fibre
Xy is denoted by cy . It is an element of Pic(Xy ). Note that Xy and cy are only
defined over K(y). Often, we identify X with Xy using the map x → (x, y). Of
course, this is only defined over K(y).
In the following, we consider c ∈ Pic(X × Y ) and the fibres with respect to the
projections p1 , p2 onto the factors. For x ∈ X, y ∈ Y , we have
cy = c|X×{y} ∈ Pic(XK(y) ), cx = c|{x}×Y ∈ Pic(YK(x) ).

The next result is called the seesaw principle:


Theorem 8.4.2. Let X be a geometrically irreducible smooth complete variety
over K and Y an irreducible smooth variety over K . Let c ∈ Pic(X × Y ) and
suppose that there is a dense open subset U of Y such that cy = 0 for all y ∈ U .
Then c is equal to the pull-back of an element of Pic(Y ) by p2 .
Proof: By the semicontinuity theorem (see [148], Th.III.12.8 for the projective
case and A. Grothendieck [136], 7.7, or [212], II, §5, for the proper case), there
is an open dense subset V of Y with H^0(X × V, ±c) ≠ 0 and thus we can
find corresponding non-zero global sections s+ , s− . Now s+ ⊗ s− is a regular
function on X × V . Since X is complete and geometrically irreducible, this
regular function has to be constant on every fibre over V (use A.6.15). By passing
to a smaller V , we may assume that div(s+ ⊗ s− ) = 0 on X × V . Therefore
div(s± ) = 0 and hence the restriction of c to X × V is trivial (cf. A.8.18). Here
and in what follows, we use the fact that on a smooth variety we may identify
Cartier- and Weil-divisors (cf. A.8.21).
Let s be an invertible meromorphic section of c . Then there is a rational function
f ∈ K(X × Y )× such that div (f ) and div (s) are equal on X × V . Therefore,
their difference div(f )− div (s) is supported in X × Z , where Z is a closed sub-
variety of codimension 1 in Y . If Z1 , . . . , Zr are the irreducible components
of Z , then X × Z1 , . . . , X × Zr are the irreducible components of X × Z
(cf. A.4.11). Therefore the divisor div(f )− div (s) is a linear combination of
the X × Zi ( i = 1, . . . , r ), i.e. a pull-back of a divisor on Y . This proves the
claim. 
Remark 8.4.3. The seesaw principle holds even without smoothness assumptions
(see [212], II, §5). We often use the seesaw principle in the following form.
Corollary 8.4.4. Let X, Y be smooth varieties over K and assume that Y is ir-
reducible and that X is complete and geometrically irreducible. Let c ∈ Pic(X ×
Y ) with cy = 0 for all y in an open dense subset of Y and with cx = 0 for some
x ∈ X(K). Then c = 0.
Proof: By Theorem 8.4.2, we have c = p2∗(c') for some c' ∈ Pic(Y). Now
consider the closed embedding ιx : Y −→ X × Y, y ↦ (x, y), mapping Y
isomorphically onto the fibre over x. Since p2 ◦ ιx is the identity map on Y, we
get
c' = ιx∗ p2∗(c') = cx = 0.
This also proves that c = 0.
Corollary 8.4.5. Let A be an abelian variety over K, let pi be the ith projection
of A × A onto A, and let m be addition as usual. The following conditions are
equivalent for c ∈ Pic(A):

(a) m∗ (c) = p∗1 (c) + p∗2 (c);


(b) τa∗ (c) = c for all a ∈ A .

If (a) and (b) are satisfied, then [−1]∗ (c) = −c .

Note that identity (b) takes place in Pic(AK(a) ) after identifying A with A × {a}
as usual.
Proof: The equivalence is a consequence of
(m∗ (c) − p∗1 (c) − p∗2 (c)) |A×{a} = τa∗ (c) − c
and the seesaw principle from 8.4.4. If we pull-back equation (a) by the morphism
A −→ A × A, a −→ (a, −a),
then we get [−1]∗ (c) = −c . 
8.4.6. Let X be an irreducible smooth complete variety defined over a field K .
Recall that Pic0 (X) denotes the subgroup of Pic(X) consisting of those classes
represented by line bundles algebraically equivalent to the trivial line bundle (cf.
A.9.35). A fundamental result is that the group Pic0 (X) is canonically an abelian
variety. In what follows, we would like to describe this algebraic structure on
Pic0 (X).
We assume X(K) not empty and we fix a point P0 on X , which will play the role
of a base point. By A.7.14, we know that X is geometrically irreducible. Let T
be an irreducible variety. We say that c ∈ Pic(X ×T ) is a subfamily of Pic0 (X)
parametrized by T if:

(a) ct ∈ Pic0 (XK(t) ) for any t ∈ T ;


(b) cP0 = 0 ∈ Pic(T ).

By the seesaw principle in 8.4.2 and Remark 8.4.3, c is uniquely determined by


the family (ct )t∈T and by condition (b).

The following fundamental result due to Poincaré gives the existence of a universal
family p and parameter space B , such that any subfamily c parametrized by any
T is obtained by an appropriate pull-back of p . We denote by idX the identity
map of X .
Theorem 8.4.7. There is a subfamily p of Pic0(X), parametrized by an irreducible
smooth complete variety B, with the following universal property. For
any subfamily c of Pic0(X), parametrized by an irreducible variety T, there is a
unique morphism ϕ : T → B with (idX × ϕ)∗(p) = c.

B is called the Picard variety of X and p is the Poincaré class. If (B', p') is
another such pair, then there are morphisms ϕ : B' → B, ϕ' : B → B' such that
p' = (idX × ϕ)∗(p) and p = (idX × ϕ')∗(p'). Since (idX × (ϕ ◦ ϕ'))∗(p) = p,
we conclude ϕ ◦ ϕ' = idB by uniqueness. Interchanging the role of (B, p) and
(B', p'), we notice that ϕ is an isomorphism. In this sense, the pair (B, p) is
uniquely determined.
8.4.8. The proof of Theorem 8.4.7 is beyond the scope of this book. We advise the reader
to accept this fundamental but difficult result of algebraic geometry. For those with a solid
background in algebraic geometry, we give some references to deduce the theorem from the
existence of the Picard scheme. By a theorem of Murre and Oort, Pic(X) is representable
by a scheme P ic(X) for any proper scheme X over K (cf. [44], Th.8.2.3). Then P ic(X)
with this scheme structure is called the Picard scheme of X . If X is a smooth irreducible
variety over K with base point P0 ∈ X(K), there is p̃ ∈ Pic(X × P ic(X)) with
p̃_{P0} = 0 satisfying the following universal property: For any scheme T over K and
any c ∈ Pic(X × T) with c_{P0} = 0, there is a unique morphism ϕ : T → P ic(X)
with (idX × ϕ)∗(p̃) = c (cf. [44], Prop.8.2.4, and use [136], Prop.7.8.6, to check the
assumptions).
By A. Grothendieck [133], Th.2.1 and Cor.3.2, the connected component B of P ic(X)
containing 0 together with its induced reduced scheme structure is a smooth complete
variety over K. Note that the Picard scheme itself need not be smooth if char(K) ≠ 0.
By the universal property of p̃, we easily deduce that the formation of the Picard scheme
P ic(X) and the class p̃ is compatible with base change to field extensions of K. Since
B is an irreducible smooth variety containing 0, we conclude that the restriction of p̃ to
X × B is a subfamily of Pic0(X) parametrized by B.
We claim that the restriction p of p̃ to B satisfies the hypothesis of our theorem. It is
enough to show that for a subfamily c of Pic0(X) parametrized by an irreducible variety
T, the morphism ϕ : T → P ic(X) from the universal property of p̃ factors through
the open and closed subscheme B of P ic(X). Since T and B are both irreducible, it is
enough to show that ϕ(t) ∈ B(K) for some t ∈ T(K). By the universal property again,
the point ϕ(t) is characterized by c_t = p_{ϕ(t)} ∈ Pic0(X_{K(t)}). By definition of algebraic
equivalence, there is a subfamily d of Pic0(X_{K(t)}) parametrized by an irreducible smooth
variety S over K(t) such that d_{s1} = 0 and d_{s2} = c_t for some s1, s2 ∈ S(K(t)).
Now, by the universal property and compatibility with base change, there is a morphism
ψ : S → P ic(X)_{K(t)} of schemes over K with (idX × ψ)∗(p_{K(t)}) = d. Then ψ(s1) = 0
and ψ(s2) = c_t in P ic(X)(K(t)). Therefore, ψ maps S into B and hence c_t ∈ B(K).

8.4.9. The following result will show that the F-rational points of the Picard
variety may be identified with Pic0(XF) for any extension F/K. In particular, the set
of points of the Picard variety corresponds to Pic0(X_K̄). We denote the Picard
variety of X cursively by P ic0(X) to distinguish it from the subgroup Pic0(X)
of Pic(X).
In the classical geometric setting of an algebraically closed field this caution is
not necessary. However, taking for example an elliptic curve over a field which
is not algebraically closed, it is easy to choose a divisor of degree 0 which is not
invariant under Gal(K̄/K) and hence not defined over K, making it clear that
the Picard variety has more points than Pic0(X).
Let p be the Poincaré class on X × P ic0 (X). As a byproduct of 8.4.8, we obtain
the following result. If the reader has not followed the remarks there, he should
accept the corollary as well and skip its proof.
Corollary 8.4.10. Let F be an extension field of K:
(a) By base change, we have Pic(X) ⊂ Pic(XF ).
(b) P ic0 (XF ) = P ic0 (X)F and its Poincaré class is obtained from p by base
change to F .
(c) P ic0 (X)(F ) = Pic0 (XF ) by identifying b with pb .
Proof: We have seen that the formation of the Picard scheme P ic(X) is compatible with
base change. This proves (a) immediately. Because Pic0 (X) is smooth with K -rational
point 0 , we know that P ic0 (X)F is connected (use A.7.14), thus equal to the connected
component P ic0 (XF ) of P ic(X) endowed with its induced reduced scheme structure. By
8.4.8, also the formation of the universal class p̃ on X × P ic(X) is compatible with base
change. As the Poincaré class is the restriction of p̃, this proves (b) completely. Finally, (c)
follows immediately from (b).
Remark 8.4.11. By the seesaw principle as in Corollaries 8.4.4 and 8.4.10 (c), the
Poincaré class p is uniquely characterized by the conditions:
(a) pc = c for any c ∈ P ic0 (X);
(b) pP0 = 0.

Note that, in the situation of Theorem 8.4.7, the morphism ϕ is given by
ϕ(t) = c_t = p_{ϕ(t)}.
This is clear by restriction of (idX × ϕ)∗(p) = c to the fibre X × {t} and then
using the rule (f ◦ g)∗ = g∗ ◦ f∗ to show that
c_t = (idX × ϕ)∗(p)|X×{t} = (idX × ϕ(t))∗(p) = p_{ϕ(t)} = ϕ(t).
Such arguments will be often used in the sequel to deduce identities in the Picard
group.
Theorem 8.4.12. Together with its canonical group structure induced by tensor
product of line bundles, P ic0 (X) is an abelian variety over K .
Proof: It is enough to show that B := P ic0(X) is a group variety (using B
smooth and Remark 8.2.13). Let p1, p2 be the canonical projections of X × B × B
onto X × B. For c := p1∗(p) + p2∗(p) and a, b ∈ B, the restriction of c to the
fibre X × {a} × {b} is equal to a + b (identifying X × {a} × {b} with X).
In order to see this, note that the restriction of p∗1 (p) is equal to the restriction of
p to X × {a} and then use Remark 8.4.11. Since c is a subfamily of Pic0 (X)
parametrized by B × B , there is a unique morphism m : B × B → B with
(idX × m)∗ (p) = c . By Remark 8.4.11, we obtain
m(a, b) = pm(a,b) = c(a,b) = a + b
and so addition is a morphism. Let ι : B → B be the unique morphism with
(idX × ι)∗ (p) = −p. We get similarly
ι(b) = pι(b) = −pb = −b
and so the inverse is also a morphism. 

We summarize our results in:


Theorem 8.4.13. Let X be an irreducible smooth complete variety over K and
let P0 ∈ X(K) be a base point of X . Then the group Pic0 (XK ) has a unique
structure as an abelian variety over K , called the Picard variety and denoted by
P ic0 (X), with the properties:

(a) There is p ∈ Pic(X × P ic0 (X)) such that pb = b for b ∈ P ic0 (X) and
pP0 is trivial.
(b) For any subfamily c of Pic0 (X) parametrized by an irreducible variety T
over K , the set-theoretic map
T −→ P ic0 (X), t −→ ct
is actually a morphism over K .

The uniquely determined class p is called the Poincaré class.


8.4.14. Let $X'$ also be a complete smooth variety over $K$ with base point $P_0' \in X'(K)$ and let $\varphi : X \to X'$ be a morphism such that $\varphi(P_0) = P_0'$. Then the
map
\[ \hat{\varphi} : \mathit{Pic}^0(X') \longrightarrow \mathit{Pic}^0(X), \quad c' \longmapsto \varphi^*(c') \]
is a homomorphism of abelian varieties called the dual map of $\varphi$.
To prove this, we remark first that the pull-back of the Poincaré class $p'$ of $X'$ to
$X \times \mathit{Pic}^0(X')$ under $\varphi \times \mathrm{id}_{\mathit{Pic}^0(X')}$ is a subfamily of $\mathrm{Pic}^0(X)$ parametrized by
$\mathit{Pic}^0(X')$. For $c' \in \mathit{Pic}^0(X')$, the restriction of that pull-back to $X \times \{c'\}$ is
equal to $\varphi^*(c')$ by Theorem 8.4.13 (a). We conclude by Theorem 8.4.13 (b) that
the dual map is a morphism. Corollary 8.2.9 shows that it is a homomorphism.
Actually, it is characterized by
\[ (\mathrm{id}_X \times \hat{\varphi})^*(p) = (\varphi \times \mathrm{id}_{\mathit{Pic}^0(X')})^*(p'). \]
8.4.15. At the end, we describe the situation complex analytically. For details, we refer to P.
Griffiths and J. Harris [130], pp.326–332. Let X be an irreducible proper smooth complex
variety endowed with its structure as a compact connected complex manifold (cf. A.14).
The considerations hold more generally for a connected compact Kähler manifold. As in
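Example A.10.10, the transition functions $(g_{\alpha\beta})$ of a line bundle $L$ on $X$ may be viewed
as a Čech cocycle with values in $\mathcal{O}_X^\times$ and we may identify $\mathrm{Pic}(X)$ with $H^1(X, \mathcal{O}_X^\times)$.
The exponential map induces a short exact sequence
\[ 0 \longrightarrow \mathbb{Z}_X \longrightarrow \mathcal{O}_X \xrightarrow{\ \exp\ } \mathcal{O}_X^\times \longrightarrow 0, \]
where $\mathbb{Z}_X$ is the sheaf associated to the constant presheaf $\mathbb{Z}$ on $X$. The beginning of the
associated long exact cohomology sequence is
\[ 0 \longrightarrow \mathbb{Z} \longrightarrow \mathbb{C} \longrightarrow \mathbb{C}^\times \longrightarrow H^1(X, \mathbb{Z}) \longrightarrow H^1(X, \mathcal{O}_X) \longrightarrow H^1(X, \mathcal{O}_X^\times) \xrightarrow{\ c_1\ } H^2(X, \mathbb{Z}) \longrightarrow \cdots. \]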
The map c1 gives the Chern class of line bundles. A line bundle is algebraically equiv-
alent to 0 if and only if its Chern class is 0 (cf. [130], p.462). If we use the canonical
isomorphism
\[ H^1(X, \mathcal{O}_X) \cong H^{0,1}(X) \]
arising from the Dolbeault complex, we conclude that the Picard variety is biholomorphic
to the complex torus H 0,1 (X)/H 1 (X, Z) .

8.5. The theorem of the square and the dual abelian variety

In Section 8.4 we have defined the Picard variety P ic0 (X) of an irreducible smooth com-
plete variety X with K -rational base point. Now X is always an abelian variety A over
the field K and the base point is the origin. Then the associated Picard variety is called the
dual abelian variety, denoted by $\hat{A}$.
The theorem of the square says that for any c ∈ Pic(A) the point $\varphi_c(a) := \tau_a^*(c) - c$ is in
$\hat{A}$ and additive in a ∈ A. Over C, it is quite clear that the translated $\tau_a^*(c)$ is algebraically
equivalent to c using a path from 0 to a for the deformation. In the special case of an
elliptic curve E with origin P0 and divisor D we have
τP∗ (D) ∼ D − [P ] + [P0 ]
and the theorem of the square is evident from [P ] − [P0 ] algebraically equivalent to 0 and
[P + Q] ∼ [P ] + [Q] − [P0 ] . The theorem of the square in the general case will be obtained
from our results about the Picard variety.
As a consequence of the theorem of the square, we will prove that an abelian variety is
always projective. If c is ample, we will see that $\varphi_c$ is surjective and has finite kernel, thus
$\hat{A}$ has the same dimension as A.

At the end, we will mention a direct construction of the dual abelian variety, biduality and
the complex analytic situation. These considerations are not essential for the sequel of the
book.
Theorem 8.5.1. Let c ∈ Pic(A) and a ∈ A . Then ϕc (a) := τa∗ (c)−c ∈ P ic0 (A)(K(a))
and ϕc : A → P ic0 (A) is a homomorphism of abelian varieties over K .
Proof: Let $p_i$ be the $i$th projection of $A \times A$ onto $A$ and consider
\[ c' := m^*(c) - p_1^*(c) - p_2^*(c) \]
on $A \times A$. We already remarked in the proof of Corollary 8.4.5 that
\[ c'|_{A\times\{a\}} = \tau_a^*(c) - c \]
for $a \in A$. Thus $\varphi_c(a) \in \mathrm{Pic}^0(A_{K(a)}) = \mathit{Pic}^0(A)(K(a))$ by the definition of algebraic
equivalence and Corollary 8.4.10. Since $c'|_{\{0\}\times A} = 0$, $c'$ is a subfamily of $\mathrm{Pic}^0(A)$
parametrized by $A$. Theorem 8.4.13 shows that $\varphi_c$ is a morphism of varieties defined over
$K$. Since $\varphi_c(0)$ is trivial, the map is a homomorphism of abelian varieties (Corollary
8.2.9).
The next statement is the theorem of the square:
Theorem 8.5.2. For $a, b \in A$, we have
\[ \tau_{a+b}^*(c) + c = \tau_a^*(c) + \tau_b^*(c). \]
Proof: Immediate from Theorem 8.5.1, by subtracting 2c on both sides.
Theorem 8.5.3. Let b ∈ Pic(A) such that ϕb = 0 . Then for any ample c ∈ Pic(A) ,
there is some a ∈ A with
b = τa∗ (c) − c.

See [212], II.8, Th.1 for a cohomological proof of this key fact.
Remark 8.5.4. The kernel of ϕc gives much information about c . If c is ample, then the
kernel is finite. We will prove a partial converse of this statement, which we will use in
Section 8.10 in the proof that the Θ -divisor on the Jacobian is ample. On the other hand,
ker(ϕc ) = A if c ∈ Pic0 (A) . These statements about the kernel will be proved next and
afterwards we shall sketch a construction of the dual abelian variety.
Proposition 8.5.5. A class $c \in \mathrm{Pic}(A)$ is ample if and only if $\ker(\varphi_c)$ is finite and in
addition $H^0(A, nc) \neq 0$ for some positive integer $n$.
Proof: Assume that $c$ is ample. Let $B$ be the connected component of the closed subgroup
$\ker(\varphi_c)$ containing $0$. For any $b \in B$, we have
\[ \tau_b^*(c) = c, \]
and hence
\[ [-1]^*(c|_B) = -c|_B \]
by Corollary 8.4.5. Since
\[ 0|_B = c|_B + [-1]^*(c|_B) \]
is ample, $B$ has to be the trivial abelian subvariety $\{0\}$ (use A.6.15). Therefore $\ker(\varphi_c)$
is finite. Choose $n$ so large that $nc$ is very ample. This gives $H^0(A, nc) \neq 0$.
In the other direction, we may assume that $H^0(A, c) \neq 0$, i.e. there is an effective divisor
$D$ such that $O(D)$ is in the class of $c$. The next lemma shows that $c$ is ample.
Lemma 8.5.6. Let D be an effective divisor on A and suppose that the subgroup
{a ∈ A | τa∗ (D) = D} is finite. Then D is ample on A .
Proof: Note that $D$ is ample if and only if $D_{\overline{K}}$ is ample on $A_{\overline{K}}$ (cf. Remark A.6.12);
hence we may assume K algebraically closed. The proof then proceeds by proving first
that the linear system |2D| is base-point free and defines a morphism ϕ of A into some
projective space. Then we prove that ϕ is a finite morphism and the conclusion comes by
pull-back. The details are as follows.
Let $a, b \in A$. If $b$ is in the support of the effective divisor
\[ E_a := \tau_a^*(D) + \tau_{-a}^*(D), \]
then $b + a$ or $b - a$ is in the support of $D$. (The reader need not worry about pull-back
of divisors because it is done with respect to an isomorphism. In this case the pull-back
is just given by the inverse images of the components, keeping the multiplicities.) For any
given $b \in A$ we can always find a point $a \notin (D - b) \cup (b - D)$, i.e. $b \notin \mathrm{supp}(E_a)$.
Then by the theorem of the square in 8.5.2 the effective divisor $E_a$ is an element of $|2D|$.
Thus the linear system |2D| is base-point free hence, by A.6.8, it defines a morphism
ϕ : A −→ PnK .
The morphism ϕ is proper (cf. A.6.15 (b), (e)). Let F be an irreducible component of any
fibre. All elements of |2D| are pull-backs of hyperplanes by the definition of ϕ . Now for
any a ∈ A either F is contained in the support of Ea or F ∩ supp(Ea ) = ∅ , hence we
can find $a \in A$ such that $F$ and the support of $E_a$ are disjoint, namely
\[ a \notin \mathrm{supp}(D) - F. \]
Let Z be an irreducible component of D . Then Z − F is an irreducible closed subset of
A not containing a . We conclude that Z − F is of codimension 1 . Now note that for any
b ∈ F we have
Z − F = Z − b,
whence it follows that Z is invariant by translations in F − F . Therefore, the same is
true for D instead of Z . By assumption, this is only possible for dim(F ) = 0 and we
conclude that ϕ has finite fibres. Thus, since ϕ is a proper morphism, it must also be finite
(cf. A.12.4). Finally, recalling that the pull-back of an ample class by a finite morphism is
again ample (see A.12.7), we see that 2D is ample. 
Corollary 8.5.7. An abelian variety is projective.
Proof: Let U be an affine open subset of A containing 0. We may assume that dim(A) ≥
1 . Let Z1 , . . . , Zr be the irreducible components of A \ U . Enlarging them, we may
assume that Z1 , . . . , Zr are prime divisors. In order to see this note that the complement
of a divisor in an affine smooth variety is affine (use A.2.10). Setting
\[ D := \sum_{i=1}^{r} Z_i, \]

the subgroup B := {a ∈ A | τa∗ (D) = D} is closed and for b ∈ B , we have U + b = U .


Since 0 ∈ U , we have
B ⊂ U.
As a complete variety, B must be finite (see A.6.15 (d)). Lemma 8.5.6 shows that D is
ample, hence A is projective. 
8.5.8. The situation is even better than as described above. On an abelian variety, any ample
divisor D is rationally equivalent to an effective divisor. As we have seen in the proof of
Lemma 8.5.6, the linear system |2D| is base-point free. In the case of elliptic curves, it is
obvious that 3D is very ample. The striking fact is that this remains true in general. We
will not need these results in this book and we only refer to [212] for a proof.

Proposition 8.5.9. For b ∈ Pic(A), the following statements are equivalent:

(a) b ∈ Pic0 (A).


(b) ker(ϕb ) = A .
(c) For every ample c ∈ Pic(A), there is a ∈ A such that b = τa∗ (c) − c .
(d) There is an ample c ∈ Pic(A) such that b = τa∗ (c) − c for some a ∈ A .
Proof: First, we prove that (a) ⇒ (b). By Corollary 8.4.10, we may assume
that K is algebraically closed. Let
ϕ : A × P ic0 (A) −→ P ic0 (A)
be the map given by
(a, b) −→ τa∗ (b).
We will prove below that it is a morphism. For T := A × P ic0 (A), consider
c := (m × idP ic0 (A) )∗ (p) ∈ Pic(A × T ),
where m denotes the addition morphism as usual. Note that the restriction of
m×idP ic0 (A) to A×{a}×{b} is given by τa ×{b}, by identifying A×{a}×{b}
with A . By Remark 8.4.11 and the rule (f ◦ g)∗ = g ∗ ◦ f ∗ , we get

\[ c|_{A\times\{a\}\times\{b\}} = \tau_a^*(b) \]
and similarly
\[ c|_{\{0\}\times T} = p. \]
Let us denote by p2 the projection of A × T onto T . The subfamily c − p∗2 (p) of
Pic0 (A) parametrized by T induces a morphism T → P ic0 (A), which is equal
to ϕ (cf. Theorem 8.4.13). Since ϕ(A × {0}) = 0, the constancy lemma in 8.2.6
shows τa∗ (b) = b for all a ∈ A . This proves the claim.
The implication (b) ⇒ (c) is Theorem 8.5.3. The existence of an ample class is
assured by Corollary 8.5.7 and so (d) is a consequence of (c). The implication (d)
⇒ (a) is part of Theorem 8.5.1. 
Definition 8.5.10. The Picard variety $\mathit{Pic}^0(A)$ is called the dual abelian variety
of $A$ and will be denoted by $\hat{A}$.
Corollary 8.5.11. The dual abelian variety $\hat{A}$ has the same dimension as $A$.
Proof: There is an ample $c \in \mathrm{Pic}(A)$ by Corollary 8.5.7. By Proposition 8.5.9 and
Corollary 8.4.10, the homomorphism $\varphi_c : A \to \hat{A}$ is surjective. By Proposition
8.5.5, the kernel is finite, therefore the dimension theorem in 8.2.16 proves the
claim.
8.5.12. For $K$ algebraically closed, we sketch a construction of the dual abelian variety
$\hat{A}$ independently of the existence of the Picard scheme. For details, see [212], Ch.III, §13.
First, the theorem of the square is proved, i.e. $\varphi_c$ is a homomorphism of abstract groups
for any $c \in \mathrm{Pic}(A)$. By Corollary 8.5.7, there is an ample $c \in \mathrm{Pic}(A)$. Let $B$ be the
connected component of $\ker(\varphi_c)$. We can prove that $\ker(\varphi_c)$ is closed and hence $B$ is an
abelian variety, $c|_B$ is ample and $\tau_b^*(c) = c$ for any $b \in B$. Corollary 8.4.5 shows that
$[-1]^*(c)|_B = -c|_B$ and since $[-1]$ is an automorphism, this is ample. Therefore
\[ 0|_B = c|_B + (-c)|_B \]
is ample (cf. A.6.10 (c)) and so $B = \{0\}$ (see A.6.15).
Next, we show that ϕc is surjective. Thus the dual abelian variety is the quotient of A by a
finite group scheme with underlying space ker(ϕc ) . If the characteristic of K is zero, then
this group scheme is the closed subgroup on ker(ϕc ) .

As suggested by the name, we have the following fact (needed only in the example below):

Theorem 8.5.13. Let p be the Poincaré class of A . Then A is the dual abelian variety of
P ic0 (A) and the Poincaré class of P ic0 (A) is the pull-back of p by the isomorphism
\[ \mathit{Pic}^0(A) \times A \longrightarrow A \times \mathit{Pic}^0(A), \quad (b, a) \longmapsto (a, b). \]

For a proof, see [212], Ch.III, §13.


Example 8.5.14. If $E$ is an elliptic curve with origin $P_0$, then we have seen in 8.3 that
$P \mapsto \mathrm{cl}([P] - [P_0])$ is an isomorphism $E \to \mathit{Pic}^0(E)$ of groups. It is also a morphism
of varieties as it is induced by the subfamily $\mathrm{cl}(\Delta - \{P_0\} \times C - C \times \{P_0\})$ of $\mathrm{Pic}^0(E)$
parametrized by $E$, where $\Delta$ is the diagonal. Applying the same consideration to the dual
elliptic curve $\hat{E} = \mathit{Pic}(E)$ and using Theorem 8.5.13, we conclude that $E$ is canonically
isomorphic to $\hat{E}$ as an abelian variety. Thus $E$ is self dual with Poincaré class on $E \times E$
induced by $\mathrm{cl}(\Delta - \{P_0\} \times C - C \times \{P_0\})$.
8.5.15. Here we assume that K = C . Let T = V /Λ be a complex torus. By 8.2.27, every
complex abelian variety is biholomorphic to a complex torus. We describe the results of
this section from the point of view of complex analysis. For proofs and further details, we
refer to [212], Ch.I and especially II.9.
The pull-back of a line bundle L on T to V with respect to the quotient map π : V −→ T
is trivial as every line bundle on V is trivial (Cousin’s theorem) and L may be obtained as
the quotient of V × C by an action of Λ of the form
Tλ (v, z) = (v + λ, eλ (v) · z) (λ ∈ Λ, v ∈ V ). (8.4)
The eλ (z) ∈ C× is obtained by the canonical isomorphism of the fibres of π ∗ L over v
and v + λ .
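The theorem of Appell–Humbert says that the trivialization $V \times \mathbb{C}$ of $\pi^*L$ may be normalized
in the following way: Let $H$ be the Riemann form on $V$ with imaginary part $E$
induced by the Chern class of $L$ (cf. 8.2.27). Then there is a unique map $\alpha : \Lambda \to \mathbb{T} :=
\{z \in \mathbb{C} \mid |z| = 1\}$ with
\[ \alpha(\lambda_1 + \lambda_2) = e^{i\pi E(\lambda_1,\lambda_2)}\,\alpha(\lambda_1)\,\alpha(\lambda_2) \quad (\lambda_1, \lambda_2 \in \Lambda) \tag{8.5} \]
such that $L$ is isomorphic to the quotient of $V \times \mathbb{C}$ by the action of $\Lambda$ given by
\[ e_\lambda(v) := \alpha(\lambda)\, e^{\pi H(v,\lambda) + \frac{\pi}{2} H(\lambda,\lambda)} \tag{8.6} \]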
in (8.4). Conversely, every Riemann form H and every α : Λ → T with (8.5) give rise
to a line bundle L on V given as the quotient of V × C by the action of Λ induced by
(8.6) in (8.4), and H is the Riemann form associated to c1 (L) . We have seen in 8.4.15
that Pic0 (T ) is the kernel of the Chern class map c1 : Pic(T ) → H 2 (T, Z) . By 8.2.27,
we may identify H 2 (T, Z) with the alternating Z -valued bilinear forms on Λ and c1 (L)
corresponds to E . Hence cl(L) ∈ Pic0 (T ) if and only if its Riemann form H = 0 . In this
case, identity (8.5) shows that α is a homomorphism of groups. Hence we may identify
Hom(T, C× ) with Pic0 (T ) .
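Let $a \in T$ and let $L' := \tau_a^* L$. Clearly, $L$ and $L'$ have the same Chern class and hence $H$
is also the Riemann form of $L'$. For any $v \in V$ with $\pi(v) = a$, we compute easily that
\[ \gamma_v(\lambda) := \alpha'(\lambda)/\alpha(\lambda) = e^{2\pi i E(v,\lambda)} \quad (\lambda \in \Lambda), \]
where $\alpha'$ is the map attached to $L'$ by the theorem of Appell–Humbert.
Note that $\gamma_v \in \mathrm{Hom}(\Lambda, \mathbb{C}^\times)$ represents $\varphi_L(a) := \tau_a^* L \otimes L^{-1}$. Obviously, we have
$\gamma_{v+w} = \gamma_v \gamma_w$, proving the theorem of the square.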
If L is ample, then T is an abelian variety A , and H is positive definite (cf. 8.2.27). Hence
every homomorphism Λ → R has the form E(·, v) for suitable v ∈ V . We conclude
that every homomorphism α : Λ → T has the form α = exp(2πiE(·, v)) . Hence the
line bundle in Pic0 (A) corresponding to α is isomorphic to ϕL (a) = τa∗ L ⊗ L−1 for
a = π(v) proving Theorem 8.5.3.
For any line bundle L on T , we have ker(ϕL ) = Λ⊥ /Λ , where
Λ⊥ := {v ∈ V | E(v, Λ) ⊂ Z}.
By Proposition C.2.4, ker(ϕL ) is finite if and only if Λ⊥ is a lattice. The latter is easily
shown to be equivalent to H being non-degenerate (i.e. the matrix of H has full rank).
In particular, if L is ample, then ker(ϕL ) is finite. On the other hand, we have seen that
cl(L) ∈ Pic0 (T ) if and only if E = 0 (using that H is determined by E ). Clearly, this is
equivalent to Λ⊥ = V , i.e. to ker(ϕL ) = T .

8.6. The theorem of the cube

We first deduce some elementary facts for involutions and quadratic functions on
an abelian group and then prove the theorem of the cube. The theorem of the cube
states that the pull-back of a fixed line bundle on an abelian variety is a quadratic
function in the morphism. It will be deduced from the theorem of the square. In
fact, both statements are easily seen to be equivalent. The theorem of the cube will
be a fundamental tool in the study of canonical heights on abelian varieties given in
the next chapter. We conclude this section by sketching a complex analytic proof
of the theorem of the cube using theta functions.
8.6.1. Let M be an abelian group with an involution *, i.e. a linear map
M −→ M, x −→ x∗
with
(x∗ )∗ = x
for any x ∈ M . An element x of M is called even (resp. odd), if x∗ = x
(resp. x∗ = −x ). The even (resp. odd) elements form a subgroup of M . The
intersection of these two subgroups is equal to the 2-torsion points of M . For
x ∈ M , we are looking for a decomposition x = x+ + x− into an even part x+
and an odd part x− . Such x+ (resp. x− ) is determined up to 2-torsion.
Lemma 8.6.2. Let x ∈ M . Then 2x has a decomposition into an even part and
an odd part. If the subgroup of odd elements is divisible by 2, then x has also
such a decomposition.
Proof: The decomposition 2x = (x + x∗ ) + (x − x∗ ) does the job. Divisibility by
2 means that we have an odd z with 2z = x − x∗ . Let x+ := x − z and x− := z .
We have to show that x+ is even. This follows from
(x+ )∗ = x∗ − z ∗ = (x − 2z) + z = x+ ,
completing the proof. 
Example 8.6.3. Let A be an abelian variety (or more generally any group variety)
over $K$. Then we consider the involution $c \mapsto [-1]^* c$ on the abelian group
$\mathrm{Pic}(A)$. Hence a line bundle is called even (resp. odd) if we have $[-1]^* L \cong L$
(resp. $[-1]^* L \cong L^{\otimes(-1)}$).
Proposition 8.6.4. On every abelian variety, there is an even very ample line
bundle.
Proof: By Corollary 8.5.7, there is a very ample c ∈ Pic(A). Then c + [−1]∗ c is
very ample (cf. A.6.10 (c)) and even. 
8.6.5. Let q : M → N be a set-theoretic map of abelian groups. If the function
b : M × M −→ N, (x, y) −→ q(x + y) − q(x) − q(y)
is bilinear, then q is called a quadratic function with associated bilinear form
b . Obviously, b is symmetric. A quadratic form is a quadratic function which is
homogeneous of degree 2 with respect to multiplication by integers. The quadratic
functions form an abelian group. Moreover, we have an involution given by
q ∗ (x) := q(−x).
By Lemma 8.6.2 and its proof, we have a canonical decomposition of $2q$ into an
even element $Q$ and an odd element $L$ given by
\[ Q(x) := q(x) + q(-x) \]
and
\[ L(x) := q(x) - q(-x). \]


The even Q is called the associated quadratic form of q . Note that q(0) = 0 and
hence b(x, −x) = −Q(x). We get Q(x) = b(x, x) and so Q(x) is homogeneous
of degree 2. Similarly, L is called the associated linear form of q . An easy
calculation shows that L is linear.
Lemma 8.6.6. Let q : M → N be a quadratic function and let n ∈ Z . Then
\[ q(nx) = \frac{n^2 + n}{2}\, q(x) + \frac{n^2 - n}{2}\, q(-x) \]
for all x ∈ M .
Proof: We proceed by induction on |n|. The result is clear for |n| ≤ 1. Let b be
the bilinear form associated to q . Then
0 = b(nx, x) + b(nx, −x)
gives
0 = q((n + 1)x) − 2q(nx) + q((n − 1)x) − q(x) − q ∗ (x).
By the induction hypothesis, we get easily the claim. 
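The identity of Lemma 8.6.6 can be checked numerically in the simplest case. The following is only a sketch under stated assumptions: the groups M = N = Z and the function `sample_q` (a quadratic form plus a linear form) are hypothetical choices made for illustration and do not come from the text.

```python
def bilinear_part(q, x, y):
    """b(x, y) = q(x + y) - q(x) - q(y), the form attached to q in 8.6.5."""
    return q(x + y) - q(x) - q(y)


def lemma_866_rhs(q, n, x):
    """Right-hand side of Lemma 8.6.6.

    n^2 + n and n^2 - n are always even, so integer division is exact.
    """
    return (n * n + n) // 2 * q(x) + (n * n - n) // 2 * q(-x)


def sample_q(x):
    # Hypothetical example on M = N = Z: the quadratic form 3x^2
    # plus the linear form 5x, so that b(x, y) = 6xy is bilinear.
    return 3 * x * x + 5 * x


def check_lemma_866(q, bound=6):
    """Compare q(n*x) with the formula for all |n|, |x| <= bound."""
    return all(
        q(n * x) == lemma_866_rhs(q, n, x)
        for n in range(-bound, bound + 1)
        for x in range(-bound, bound + 1)
    )
```

Calling `check_lemma_866(sample_q)` compares both sides on a small grid of values; it is a sanity check of the formula, not a substitute for the induction argument.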
Corollary 8.6.7. A quadratic function is even (resp. odd) if and only if it is a
quadratic form (resp. homogeneous of degree 1).
Proof: Immediate from Lemma 8.6.6. 
Example 8.6.8. Let M := (Z/2Z)2 and N := Z/2Z. Consider the function
q : M → N given by q(x) = 0 if and only if x = 0. Then q is an odd quadratic
function which is not linear.
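Since the groups in Example 8.6.8 are finite, its three claims can be verified by brute force. The sketch below encodes M = (Z/2Z)^2 as pairs of integers mod 2; the helper names are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Elements of M = (Z/2Z)^2; N = Z/2Z is represented by the integers 0 and 1.
M = list(product(range(2), repeat=2))


def add(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


def neg(x):
    return ((-x[0]) % 2, (-x[1]) % 2)


def q(x):
    """q(x) = 0 if and only if x = 0, as in Example 8.6.8."""
    return 0 if x == (0, 0) else 1


def b(x, y):
    """Associated form b(x, y) = q(x + y) - q(x) - q(y) in Z/2Z."""
    return (q(add(x, y)) - q(x) - q(y)) % 2


# b is symmetric, so additivity in the first argument gives bilinearity.
is_bilinear = all(
    b(add(x, y), z) == (b(x, z) + b(y, z)) % 2
    for x, y, z in product(M, repeat=3)
)
is_odd = all(q(neg(x)) == (-q(x)) % 2 for x in M)
is_linear = all(q(add(x, y)) == (q(x) + q(y)) % 2 for x, y in product(M, repeat=2))
```

Here `is_bilinear` and `is_odd` come out true while `is_linear` is false, because q((1,0) + (0,1)) = 1 whereas q((1,0)) + q((0,1)) = 0 in Z/2Z.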
8.6.9. For k ∈ N \ {0} and I ⊂ {1, . . . , k}, the homomorphism
SI : M k −→ M

is defined by

\[ S_I(x_1, \dots, x_k) := \sum_{i \in I} x_i. \]
In particular, $S_\emptyset$ is identically 0.
Lemma 8.6.10. Let $q : M \to N$ be a quadratic function and let $k$ be an integer.
If $k \geq 3$, then we have for $x \in M^k$:
\[ \sum_{I \subset \{1,\dots,k\}} (-1)^{|I|}\, q(S_I(x)) = 0. \]

Proof: We proceed by induction on $k$. If we formulate the bilinearity of $b$ in terms
of $q$, then we get
\[ q(x + y + z) - q(x + z) - q(x + y) - q(y + z) + q(x) + q(y) + q(z) = 0. \]
This proves the claim for $k = 3$. For $k > 3$, we obtain
\begin{align*}
\sum_{I \subset \{1,\dots,k\}} (-1)^{|I|}\, q(S_I(x))
&= \sum_{I \subset \{1,\dots,k-1\}} (-1)^{|I|}\, q(S_I(x)) - \sum_{J \subset \{1,\dots,k-1\}} (-1)^{|J|}\, q(S_J(x) + x_k) \\
&= - \sum_{J \subset \{1,\dots,k-1\}} (-1)^{|J|}\, b(S_J(x), x_k) - \sum_{J \subset \{1,\dots,k-1\}} (-1)^{|J|}\, q(x_k).
\end{align*}
The first sum equals 0, by induction applied to the linear and hence quadratic
function $b(\cdot, x_k)$. The second sum is 0 by carrying out the identity $(1 - 1)^{k-1} = 0$.
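The alternating sum in Lemma 8.6.10 is easy to evaluate by machine. The sketch below assumes M = Z^2 and N = Z with a hypothetical quadratic function `q` (a quadratic form plus a linear form); none of these choices come from the text.

```python
from itertools import combinations


def q(x):
    # Hypothetical quadratic function on M = Z^2 with values in N = Z:
    # the quadratic form x1^2 + 3*x1*x2 plus the linear form 5*x2.
    x1, x2 = x
    return x1 * x1 + 3 * x1 * x2 + 5 * x2


def partial_sum(vectors, index_set):
    """S_I(x): the sum of the vectors x_i with i in I (zero for empty I)."""
    return (
        sum(vectors[i][0] for i in index_set),
        sum(vectors[i][1] for i in index_set),
    )


def alternating_sum(q, vectors):
    """Sum over all I in {0, ..., k-1} of (-1)^{|I|} * q(S_I(x))."""
    k = len(vectors)
    total = 0
    for size in range(k + 1):
        for index_set in combinations(range(k), size):
            total += (-1) ** size * q(partial_sum(vectors, index_set))
    return total
```

For three or more vectors the sum vanishes, as the lemma predicts. For two vectors it reduces to the bilinear form b(x_1, x_2), which is in general non-zero, so the hypothesis k ≥ 3 cannot be dropped.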
We apply the above to the abelian group M = Mor(X, A) of morphisms from X
to A and to the abelian group N = Pic(X).
Theorem 8.6.11. Let X be a variety over the field K and let A be an abelian
variety over K with c ∈ Pic(A). Then the map of Mor(X, A) into Pic(X) given
by ϕ → ϕ∗ (c) is quadratic.

Let $k \geq 3$ and $X = A^k$ with $i$th projection $p_i$ onto $A$. For $I \subset \{1, \dots, k\}$, we
have
\[ S_I(p_1, \dots, p_k) = \sum_{i \in I} p_i. \]
Lemma 8.6.10 shows that the theorem implies
\[ \sum_{I \subset \{1,\dots,k\}} (-1)^{|I|}\, S_I(p_1, \dots, p_k)^*(c) = 0. \]
For $k = 3$, this equation
\[ \sum_{I \subset \{1,2,3\}} (-1)^{|I|} \Bigl(\sum_{i \in I} p_i\Bigr)^{*}(c) = 0 \tag{8.7} \]
is called the theorem of the cube.


Proof of Theorem 8.6.11: Let $\varphi_1, \varphi_2, \varphi_3 \in \mathrm{Mor}(X, A)$ and let
\[ \Phi : X \longrightarrow A^3, \quad x \longmapsto (\varphi_1(x), \varphi_2(x), \varphi_3(x)). \]
We pull back equation (8.7) to $X$ using $\Phi$, getting bilinearity as in the proof of
Lemma 8.6.10. So it is enough to prove the theorem of the cube. Let $c'$ be the
left-hand side of (8.7). For $a, b, c \in A$, we observe
\[ c'|_{\{a\}\times\{b\}\times A} = \tau_{a+b}^*(c) - \tau_a^*(c) - \tau_b^*(c) + c \]
and this is zero by the theorem of the square (see Theorem 8.5.2). In the same
way, the restriction of $c'$ to $\{a\} \times A \times \{c\}$ is trivial. By the seesaw principle (see
Corollary 8.4.4), $c'|_{\{a\}\times A\times A}$ is trivial for any $a \in A$. Now $c'|_{A\times\{b\}\times\{c\}}$ is also
trivial and, by the seesaw principle again, $c'$ is trivial.
Remark 8.6.12. We proved the theorem of the cube using the theorem of the square. The
latter was an immediate consequence of the existence of the Picard variety. Another way
is to prove directly the so-called theorem of the cube for varieties (for a proof, see [212],
Ch.II, §6):
Let X, Y, Z be geometrically irreducible and geometrically reduced varieties over K with
X, Y complete and let c ∈ Pic(X × Y × Z) . If there are x0 ∈ X(K), y0 ∈ Y (K)
and z0 ∈ Z(K) such that the restrictions of c to {x0 } × Y × Z, X × {y0 } × Z and
X × Y × {z0 } are all zero, then c = 0 .
The theorem of the cube is an immediate consequence of the theorem of the cube for vari-
eties (use the above for c ).
8.6.13. First, we give a short introduction into theta functions (for details, cf. S. Lang
[168], Ch.VI, Ch.VIII) and then we use them to prove the theorem of the cube complex
analytically. Let Λ be a lattice in the complex vector space V and let L : V × Λ → C, J :
Λ → C be maps with L(v, λ) linear in v ∈ V . A holomorphic (resp. meromorphic) theta
function for Λ of type (L, J) is a holomorphic (resp. meromorphic) function θ : V → C
satisfying
θ(v + λ) = exp(2πiL(v, λ) + 2πiJ(λ)) · θ(v) (v ∈ V, λ ∈ Λ).
It is easy to see that E(λ, µ) := L(λ, µ) − L(µ, λ) (λ, µ ∈ Λ) is the imaginary part of
a Riemann form H . If b is a symmetric bilinear form on V , l is a linear form on V , and
c ∈ C , then
θ(v) := exp(2πib(v, v) + 2πil(v) + c)
is a theta function for Λ of type L(v, λ) := 2b(v, λ), J(λ) := l(λ) + b(λ, λ) . These are
exactly the meromorphic theta functions with trivial divisor and thus they are called trivial
theta functions.
A normalized holomorphic (resp. meromorphic) theta function with Riemann form $H$
and quadratic character $\alpha$ (i.e. a function $\alpha : \Lambda \to \mathbb{T}$ satisfying (8.5) in 8.5.15) is a
holomorphic (resp. meromorphic) function $\theta : V \to \mathbb{C}$ with
\[ \theta(v + \lambda) = \alpha(\lambda) \cdot \exp\Bigl(\pi H(v, \lambda) + \frac{\pi}{2} H(\lambda, \lambda)\Bigr) \cdot \theta(v) \tag{8.8} \]
for all v ∈ V, λ ∈ Λ . Note that it is a theta function and H is indeed the Riemann
form constructed above. For a theta function, there is always a normalized theta function
$\tilde{\theta}$ associated to $\theta$ such that $\theta/\tilde{\theta}$ is a trivial theta function. Note that $\tilde{\theta}$ is unique up to a
multiplicative non-zero constant.
For example, let $Z$ be a symmetric complex $g \times g$ matrix with $\operatorname{Im} Z$ positive definite and
let $\Lambda$ be the lattice $\mathbb{Z}^g + Z \cdot \mathbb{Z}^g$ in $\mathbb{C}^g$. Then the Riemann bilinear relations (cf. [130],
p.306) say that this is equivalent to $V/\Lambda$ being an abelian variety. Then
\[ H(v, w) := v^t (\operatorname{Im} Z)^{-1}\, \overline{w} \tag{8.9} \]
is a positive definite Riemann form for $\Lambda$. The Riemann theta function for $\Lambda$ is the entire
function defined by the Fourier series
\[ \theta(v, Z) = \sum_{k \in \mathbb{Z}^g} \exp(\pi i\, k^t Z k + 2\pi i\, k^t v). \]
It is an easy matter to verify that $\theta(\cdot, Z)$ is a theta function for $\Lambda$ of type
\[ L(v, m + Zn) = -n^t v, \qquad J(m + Zn) = -\frac{1}{2}\, n^t Z n. \]
The associated Riemann form is indeed the H from (8.9). The normalized theta function
associated to θ(·, Z) is

& πi t −1
θ(v, Z) = exp v (Z) v · θ(v)
2
with quadratic character α(m + Zn) = exp(πimt n) .
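The type (L, J) of the Riemann theta function stated above can be tested numerically when g = 1. The sketch below is an illustration under assumptions: it truncates the Fourier series at a hypothetical cutoff `terms`, and the sample values of v and Z (written `tau`) used with it are arbitrary, subject only to Im(tau) > 0.

```python
import cmath
import math


def theta(v, tau, terms=12):
    """Truncated Riemann theta series for g = 1.

    Sums exp(pi*i*k^2*tau + 2*pi*i*k*v) over |k| <= terms; the tail is
    negligible for moderate Im(v) because Im(tau) > 0.
    """
    return sum(
        cmath.exp(math.pi * 1j * k * k * tau + 2 * math.pi * 1j * k * v)
        for k in range(-terms, terms + 1)
    )


def multiplier(v, tau, m, n):
    """exp(2*pi*i*L(v, lam) + 2*pi*i*J(lam)) for lam = m + tau*n,

    with L(v, m + tau*n) = -n*v and J(m + tau*n) = -n^2*tau/2.
    Note that m does not enter: theta is 1-periodic.
    """
    return cmath.exp(2j * math.pi * (-n * v) + 2j * math.pi * (-0.5 * n * n * tau))


def functional_equation_error(v, tau, m, n):
    """|theta(v + lam) - multiplier * theta(v)| for lam = m + tau*n."""
    lam = m + n * tau
    return abs(theta(v + lam, tau) - multiplier(v, tau, m, n) * theta(v, tau))
```

For instance, with `v = 0.2 + 0.1j` and `tau = 0.3 + 1.1j` the error returned for small integers m and n is at the level of floating-point rounding, which agrees with the stated type.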
Let L be a line bundle on the complex torus T := V /Λ . By the theorem of Appell–
Humbert (cf. 8.5.15), the trivialization V × C of π ∗ L may be normalized by a Riemann
form H and a quadratic character α : Λ → T . Then a global (resp. meromorphic) section
of L lifts to a normalized holomorphic (resp. meromorphic) theta function for Λ . This
induces a one-to-one correspondence between global (resp. meromorphic) sections of L
and normalized holomorphic (resp. meromorphic) theta functions with Hermitian form H
and quadratic character α . If L is ample or equivalently H positive definite (cf. 8.2.27),
then V /Λ is an abelian variety A and the Riemann–Roch theorem for abelian varieties
states that
\[ \dim \Gamma(A, L) = \sqrt{\det\bigl(E(\lambda_j, \lambda_k)\bigr)}, \]
where (λj ) is a Z -basis for the lattice Λ .
We conclude this section by proving the theorem of the cube complex analytically. Let
A = V /Λ be an abelian variety together with a line bundle L . We choose a trivialization
V × C of π ∗ L as in the theorem of Appell–Humbert. Since A is algebraic, there is a non-
zero meromorphic section of L inducing a normalized meromorphic theta function θ with
Riemann form H and quadratic character α . Then we consider the meromorphic function
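\[ f(v_1, v_2, v_3) := \frac{\theta(v_1 + v_2 + v_3)\,\theta(v_1)\,\theta(v_2)\,\theta(v_3)}{\theta(v_1 + v_2)\,\theta(v_1 + v_3)\,\theta(v_2 + v_3)} \]
on $V \times V \times V$. It corresponds to a meromorphic section of
\[ \bigotimes_{I \subset \{1,2,3\}} \Bigl(\sum_{i \in I} p_i\Bigr)^{*} L^{(-1)^{|I|}}. \]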

For λ1 , λ2 , λ3 ∈ Λ , the identity (8.8) implies immediately the equation


f (v1 + λ1 , v2 + λ2 , v3 + λ3 ) = f (v1 , v2 , v3 )
using the facts that $\alpha$ is a homomorphism and $H$ is bilinear. We conclude that the automorphy
factor $e_{(\lambda_1,\lambda_2,\lambda_3)}$ of the relevant line bundle above is trivial and hence this line
bundle is also trivial. This proves the theorem of the cube complex analytically.

8.7. The isogeny multiplication by n

Let A be an abelian variety over the field K . We assume that the reader is familiar
with the basic notions and results of Section A.12. In this section we study the
endomorphism [n] : A −→ A given by multiplication by n ∈ Z on the abelian
variety A . The main result is Proposition 8.7.2 dealing with its kernel A[n]. This
endomorphism plays a fundamental role in the study of abelian varieties, both
geometrically and from the arithmetic point of view. The arithmetic significance
of this isogeny is made clear by the construction of the Néron–Tate height in the
next chapter and by its role in the proof of the Mordell–Weil theorem, which will
be treated in Chapter 10.
Over the complex numbers, it is a classical fact that

\[ A[n] \cong (\mathbb{Z}/n\mathbb{Z})^{2\dim(A)}. \]

In fact, we may identify the complex manifold $A$ with $\mathbb{C}^d/\Lambda$, where $\Lambda$ is a lattice
and $d$ is the complex dimension of $A$; multiplication by $n$ on $A$ is induced by
ordinary multiplication by n on Cd . If we forget the complex structure, we get an
isomorphism with the 2d -dimensional real manifold (R/Z)2d with multiplication
by n induced by the usual multiplication by n on R2d , and the result becomes
obvious. If char(K) is coprime to n , then we will see below that the situation is
the same.

[Figure: The isogeny [3] on C/Λ and its nine-point kernel, marked with dots; the origin is labelled (0,0).]
However, all this changes quite drastically if $p = \mathrm{char}(K)$ divides $n$. Then the map $[p]$ is
always inseparable and its reduced kernel $A[p]$ has cardinality $p^a$, with $a$ an integer
$0 \leq a \leq \dim(A)$. All values of $a$ in this range may occur (the case $a = 0$ is
the so-called supersingular case). As it is mentioned in 8.2.15, it is better to give
the kernel of [p] the structure of a (non-reduced) group scheme, but this will not
be treated here.
Proposition 8.7.1. Let $c \in \mathrm{Pic}(A)$ and $n \in \mathbb{Z}$. Then
\[ [n]^*(c) = \frac{n^2 + n}{2}\, c + \frac{n^2 - n}{2}\, [-1]^* c. \]
In particular, we have $[n]^*(c) = n^2 c$ if $c$ is even and $[n]^*(c) = nc$ if $c$ is odd.
Proof: By Theorem 8.6.11, the function
\[ q : \mathbb{Z} \longrightarrow \mathrm{Pic}(A), \quad n \longmapsto [n]^* c \]
is quadratic. Then the claim follows from Lemma 8.6.6.
Proposition 8.7.2. Let $n \in \mathbb{Z} \setminus \{0\}$. Then $[n]$ is a finite flat surjective morphism
of degree $n^{2\dim(A)}$. The separable degree of $[n]$ equals the number of points of
any fibre. If $\mathrm{char}(K) \nmid n$, then $[n]$ is an étale morphism and
\[ A[n] \cong (\mathbb{Z}/n\mathbb{Z})^{2\dim(A)}. \]
If $p = \mathrm{char}(K)$ divides $n$, then $[n]$ is not separable.
Proof: Let g be the dimension of A . By Corollary 8.5.7, there is an ample
c ∈ Pic(A). The restriction of [n]∗ (c) to A[n] is trivial. Since [−1] is an auto-
morphism, [−1]∗ (c) is also ample. Proposition 8.7.1 shows that [n]∗ (c) is ample
and so is the restriction to A[n]. Therefore, A[n] has to be finite. Now the dimen-
sion theorem from 8.2.16 and Proposition 8.2.17 show that [n] is a surjective finite
flat morphism, whose fibres have cardinality equal to the separable degree of [n].
In order to compute its degree, we use intersection theory of divisors (see A.9).
There is a very ample even line bundle L on A (cf. Proposition 8.6.4). By A.9.18,
we may assume that L = O(D) for a divisor D on A . By the projection formula
(cf. A.9.24), we have the following identity of g -fold intersection numbers
\[ [n]^*(D) \cdots [n]^*(D) = \deg[n] \; D \cdots D. \]
By Proposition 8.7.1, we have $[n]^*(D) \sim n^2 D$, where $\sim$ denotes rational equivalence
of divisors (cf. A.8.11). Noting that $D \cdots D = \deg(X) \neq 0$ (see A.9.33),
we deduce
\[ n^{2g} = \deg[n]. \]
We know that the differential $d[n]$ is multiplication by $n$ on the tangent space at
0 (cf. Corollary 8.2.25).
If $\mathrm{char}(K) \nmid n$, then we see by a translation argument that $d[n]$ induces an isomorphism
on tangent spaces. By Proposition B.3.5, the morphism $[n]$ is étale and
hence separable (cf. A.12.19). We have seen that the number of points of $A[n]$
is equal to the separable degree of $[n]$. This means $|A[n]| = n^{2g}$. For any $m \mid n$,
it follows that the subgroup $A[m]$ of $A[n]$ has $m^{2g}$ elements. By the theory of
finite abelian groups, $A[n]$ is isomorphic to $(\mathbb{Z}/n\mathbb{Z})^{2g}$.
If p = char(K)|n , then the differential d[n] vanishes at 0. Hence d[n] van-
ishes everywhere by a translation argument. Since a separable dominant mor-
phism is generically étale (cf. A.12.19), Proposition B.3.5 shows that [n] is not
separable. 
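Over the complex numbers, the count |A[n]| = n^{2g} can be illustrated on the real torus (R/Z)^{2g} described at the start of this section. The sketch below is a toy enumeration under assumptions: it searches a finite grid of rational points (the parameter `grid` is hypothetical and should be a multiple of n) and keeps those killed by multiplication by n.

```python
from fractions import Fraction
from itertools import product


def torsion_points(n, g, grid):
    """Points x of (R/Z)^(2g) with coordinates k/grid such that n*x = 0.

    If grid is a multiple of n, every n-torsion point lies on the grid,
    so the returned list is all of the kernel of multiplication by n.
    """
    coords = [Fraction(k, grid) for k in range(grid)]
    return [
        point
        for point in product(coords, repeat=2 * g)
        if all((n * c) % 1 == 0 for c in point)
    ]
```

For example, `torsion_points(3, 1, grid=12)` returns the nine points of the kernel of [3] on a one-dimensional complex torus, matching the figure in this section. Running it with n = 6 and with the divisors m = 2 and m = 3 shows the subgroup counts m^{2g} used in the last step of the proof.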

8.8. Characterization of odd elements in the Picard group

Let A be an abelian variety. All varieties are assumed to be defined over the field
K.
Recall from Example 8.6.3 that we have a canonical involution of Pic(A) defining
even and odd elements. First, we prove that Pic0 (A) is a divisible subgroup and
thus we get a decomposition into even and odd parts on the Picard group. Another
application is the beautiful result that the classes in the Picard group algebraically
equivalent to 0 are precisely the odd classes. Note that this is quite evident for an
elliptic curve E with origin P0 . Then a divisor class D is algebraically equivalent
to 0 if and only if deg(D) = 0. On the other hand, 8.3.6 yields [−1]∗ D ∼
2 deg(D)[P0 ] − D , proving that D is odd if and only if deg(D) = 0. At the end,
we will prove that the Poincaré class of an abelian variety is even.
Proposition 8.8.1. If c ∈ Pic(A) and r ∈ Z \ {0} with rc ∈ Pic0 (A), then
c ∈ Pic0 (A).
Proof: Obviously, we have rϕc = ϕrc and the latter is equal to zero by Proposi-
tion 8.5.9 (a), (b). Theorem 8.5.1 tells us that ϕc is a homomorphism of abelian
varieties and so ϕc = 0 (Proposition 8.7.2). Using once more Proposition 8.5.9
(a), (b), this implies that c ∈ Pic0 (A). 
Corollary 8.8.2. Let $c \in \mathrm{Pic}(A)$. Then there are an odd element $c_-$ and an even
element $c_+$ of $\mathrm{Pic}(A)$ such that $c = c_- + c_+$. The element $c_-$ is determined
only up to 2-torsion elements in $\mathrm{Pic}(A)$.
Proof: This follows from 8.6.1, Lemma 8.6.2 and Proposition 8.8.1.
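The uniqueness statement in Corollary 8.8.2 can be checked directly from the definitions. The following is a sketch under the assumption that $c$ is called even if $[-1]^*c = c$ and odd if $[-1]^*c = -c$; the primed symbols are hypothetical names for a second decomposition of the same class.

```latex
\begin{align*}
c = c_- + c_+ = c_-' + c_+'
  &\implies t := c_- - c_-' = c_+' - c_+ , \\
[-1]^* t = -t \ \text{(odd part)}, \quad [-1]^* t = t \ \text{(even part)}
  &\implies 2t = 0 .
\end{align*}
```

So two admissible choices of the odd part differ by an element killed by $2$.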
Theorem 8.8.3. If $c \in \mathrm{Pic}(A)$, then $[-1]^* c - c \in \mathrm{Pic}^0(A)$. Moreover, the
following statements are equivalent:

(a) $c$ is odd.
(b) For any variety $X$, the map $\mathrm{Mor}(X, A) \to \mathrm{Pic}(X)$, given by
$\varphi \mapsto \varphi^*(c)$, is linear.
(c) If $p_i$ is the $i$th projection of $A \times A$ onto $A$, then
\[ (p_1 + p_2)^*(c) = p_1^*(c) + p_2^*(c). \]
(d) $\tau_a^*(c) = c$ for all $a \in A$.
(e) $c \in \mathrm{Pic}^0(A)$.
(f) For all ample $c' \in \mathrm{Pic}(A)$, there is an $a \in A$ such that $c = \tau_a^*(c') - c'$.
(g) There is an ample $c' \in \mathrm{Pic}(A)$ such that $c = \tau_a^*(c') - c'$ for some $a \in A$.
Proof: The equivalence of (d), (e), (f), and (g) is Proposition 8.5.9. By Corollary
8.4.5, (c) and (d) are equivalent. In order to show that (c) ⇒ (b), we choose
$\varphi_1, \varphi_2 \in \mathrm{Mor}(X, A)$ and define $\varphi$ to be the morphism of $X$ into $A \times A$ given
by $\varphi(x) := (\varphi_1(x), \varphi_2(x))$. Pulling back identity (c) by $\varphi$, we obtain
\[ (\varphi_1 + \varphi_2)^*(c) = \varphi_1^*(c) + \varphi_2^*(c), \]
which is statement (b). We note that (b) ⇒ (a) is trivial.
Next, we show that $[-1]^*(c) - c \in \mathrm{Pic}^0(A)$. For $a \in A$, we have
$[-1] \circ \tau_a = \tau_{-a} \circ [-1]$, proving
\[ \tau_a^*([-1]^*(c)) - [-1]^*(c) = [-1]^*(\tau_{-a}^*(c) - c). \tag{8.10} \]
Since $\tau_{-a}^*(c) - c \in \mathrm{Pic}^0(A)$ (Theorem 8.5.1), (8.10) is equal to $c - \tau_{-a}^*(c)$ by
the implication (e) ⇒ (a). By the theorem of the square in 8.5.2, the latter equals
$\tau_a^*(c) - c$. So we have proved
\[ \tau_a^*([-1]^*(c)) - [-1]^*(c) = \tau_a^*(c) - c. \]
The implication (d) ⇒ (e) shows $[-1]^* c - c \in \mathrm{Pic}^0(A)$.
Finally, we prove (a) ⇒ (e). Let $c$ be an odd element of $\mathrm{Pic}(A)$. Then
\[ -2c = [-1]^*(c) - c \in \mathrm{Pic}^0(A) \]
and so $c \in \mathrm{Pic}^0(A)$ (Proposition 8.8.1).

Theorem 8.8.4. Let $\widehat{A}$ be the dual abelian variety with corresponding Poincaré
class $p \in \mathrm{Pic}(A \times \widehat{A})$. Then $p$ is even.
Proof: Let $b \in \widehat{A}$. By Remark 8.4.11 and Theorem 8.8.3, we have
\[ ([-1]^*(p))|_{A \times \{b\}} = [-1]^*(p|_{A \times \{-b\}}) = [-1]^*(-b) = b. \]
Since
\[ ([-1]^*(p))|_{\{0\} \times \widehat{A}} = [-1]^*(p|_{\{0\} \times \widehat{A}}) = 0, \]
we get $[-1]^*(p) = p$ by Remark 8.4.11.
8.9. Decomposition into simple abelian varieties

We assume in this section that A is an abelian variety over the field K .


Recall from algebra that a module is called completely reducible if it is a direct sum of
irreducible modules. Abelian varieties have a similar property due to Poincaré where the
decomposition into a product of simple abelian varieties is now up to isogenies. The alge-
braic proof is based on the concept of dual abelian varieties from Section 8.5. The analytic
proof sketched at the end replaces duality by the use of a positive Riemann form. As an
application of Poincaré’s complete reducibility theorem and our description of the kernel of
multiplication by an integer from the previous section, we deduce that the group of homo-
morphisms is torsion free.
This section is only a side line in our book and not used anywhere else, but is of major
importance in the theory of abelian varieties. Our presentation is just the beginning of a
wonderful theory representing homomorphisms on the Tate modules (cf. [212], Ch.IV).
Definition 8.9.1. A surjective homomorphism of abelian varieties of the same dimension is
called an isogeny.
8.9.2. By the dimension theorem in 8.2.16, a surjective homomorphism is an isogeny if and
only if the kernel is finite. If c ∈ Pic(A) is ample, then ϕc is an isogeny by Corollary
8.5.11 and its proof. For n ∈ Z \ {0} , it follows from Proposition 8.7.2 that [n] is an
isogeny.

The next result is called Poincaré’s complete reducibility theorem.


Theorem 8.9.3. Let $B$ be an abelian subvariety of $A$. Then there is an abelian subvariety $C$
of $A$ such that the restriction of addition gives an isogeny
\[ m : B \times C \to A. \]
Proof: By Corollary 8.5.7, there is an ample $c \in \mathrm{Pic}(A)$. Denote by $\iota : B \to A$ the
inclusion and let $\hat\iota : \widehat{A} \to \widehat{B}$ be the dual map, as in 8.4.14. By definition, we have
\[ \hat\iota \circ \varphi_c|_B = \varphi_{\iota^*(c)}. \]
Since $\iota^*(c)$ is ample on $B$, by what was remarked in 8.9.2 we have that $\varphi_{\iota^*(c)}$ is an
isogeny, hence $\varphi_{\iota^*(c)}$ has finite kernel. Let $C$ be the kernel of $\hat\iota \circ \varphi_c$. The above shows
that $B \cap C$ is finite, whence $m : B \times C \to A$ has finite kernel. The dimension theorem
in 8.2.16 applied to $\hat\iota \circ \varphi_c$ gives
\[ \dim(C) + \dim(\widehat{B}) = \dim(A) \]
(remember that $\hat\iota \circ \varphi_c$ is onto because its restriction to $B$ is an isogeny). By Corollary 8.5.11,
we have $\dim(A) = \dim(\widehat{A})$ and $\dim(\widehat{B}) = \dim(B)$. Therefore $m : B \times C \to A$ is
a homomorphism of abelian varieties of the same dimension, with finite kernel. By the
dimension theorem again, we conclude that the homomorphism is onto $A$, i.e. an isogeny.

Definition 8.9.4. An abelian variety $B \neq \{0\}$ is called simple if $\{0\}$ and $B$ are its only
abelian subvarieties.
Corollary 8.9.5. There are simple abelian subvarieties B1 , . . . , Br of A such that addi-
tion gives an isogeny m : B1 × · · · × Br −→ A .
Proof: Proceed by induction on the dimension and apply Theorem 8.9.3. 
Proposition 8.9.6. Let A1 , A2 be abelian varieties over K and let Hom(A1 , A2 ) be the
set of homomorphisms from A1 to A2 . Then, with respect to addition of homomorphisms,
the group Hom(A1 , A2 ) is a torsion-free abelian group.
Proof: Let us assume that we have $m \in \mathbb{Z} \setminus \{0\}$ and $\varphi \in \mathrm{Hom}(A_1, A_2)$ such that
$[m] \circ \varphi = 0$. Let $l$ be a prime different from $\operatorname{char}(K)$ with $l \nmid m$. For $r \in \mathbb{N}$, the
restriction of $\varphi$ induces a homomorphism $A_1[l^r] \to A_2[l^r]$. Proposition 8.7.2 shows that
\[ A_i[l^r] \cong (\mathbb{Z}/l^r\mathbb{Z})^{2\dim(A_i)}. \]
Since $[m] \circ \varphi = 0$ and $l \nmid m$, we conclude
\[ \varphi|_{A_1[l^r]} = 0. \]
Now let $B$ be a simple abelian subvariety of $A_1$. The above applied to $B$ instead of $A_1$
shows that $\varphi$ vanishes on infinitely many points of $B$. Therefore the restriction of $\varphi$ to $B$
is zero (take the closure of the vanishing points and use that $B$ is simple). Corollary 8.9.5
shows that $\varphi = 0$.
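The step from $[m] \circ \varphi = 0$ to the vanishing of $\varphi$ on the $l^r$-torsion uses only that $m$ is invertible modulo $l^r$. A minimal sketch, assuming $l \nmid m$ and writing $u$ and $k$ (hypothetical names) for integers with $um = 1 + k l^r$:

```latex
\begin{align*}
x \in A_1[l^r] &\implies [l^r]\varphi(x) = \varphi([l^r]x) = 0 , \\
[um]\,\varphi(x) &= \varphi(x) + [k]\bigl([l^r]\varphi(x)\bigr) = \varphi(x) , \\
\varphi(x) &= [um]\,\varphi(x) = [u]\bigl([m]\varphi(x)\bigr) = [u](0) = 0 .
\end{align*}
```

Letting $r$ grow produces the infinitely many vanishing points used in the last part of the proof.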
Remark 8.9.7. One can even prove that $\mathrm{Hom}(A_1, A_2)$ is a free abelian group of rank
at most $4\dim(A_1)\dim(A_2)$. It can also be shown that, for an isogeny $\varphi :
A_1 \to A_2$, there is an isogeny $\psi : A_2 \to A_1$ such that $\psi \circ \varphi$ is equal to multiplication
by some non-zero integer. With this in mind, one shows that the factors in Corollary 8.9.5
are uniquely determined up to isogeny and renumbering. For the proof of these results, we
refer to [212], Ch.IV, §19, or to [204], §12.
8.9.8. Complex analytically, A may be identified with a complex torus V /Λ with a positive
definite Riemann form H on V for Λ (cf. 8.2.27). An abelian subvariety B of A is given
by a subspace W of V such that Λ ∩ W is a lattice in W . Then B = W/(Λ ∩ W ) .
Now Poincaré’s complete reducibility theorem may be proved complex analytically in the
following way. Let $W^\perp$ be the orthogonal complement with respect to $H$. As $H$ is not
degenerate (i.e. its matrix has full rank), the same holds for $E := \operatorname{Im} H$. We may view $E$
as a non-degenerate alternating bilinear form on the Q -vector space Λ ⊗ Q and hence the
orthogonal complement of Λ ∩ W in Λ ⊗ Q with respect to E has dimension equal to
2(dim(V ) − dim(W )) . It is easily checked that the orthogonal complement is spanned
by Λ ∩ W ⊥ , where ⊥ is first meant with respect to E . Now for a complex subspace
of V , the orthogonal complements with respect to E and H are the same. We conclude
that Λ ∩ W ⊥ is a lattice in W ⊥ . Let C be the abelian subvariety W ⊥ /(Λ ∩ W ⊥ ) . As
(Λ ∩ W ) ⊕ (Λ ∩ W ⊥ ) is a sublattice of Λ (using H positive definite), we get an isogeny
B × C → A . The kernel has cardinality equal to the index of the lattices.

8.10. Curves and Jacobians

In this section, we assume that the reader is familiar with the Riemann–Roch
theorem of curves from A.13.5 and with related cohomological arguments
provided by Section A.10. Let $C$ be an irreducible smooth projective curve over
a field $K$ of genus $g \ge 1$ with base point $P_0 \in C(K)$. By A.7.14, the existence
of a $K$-rational point ensures that $C$ is geometrically irreducible.

Definition 8.10.1. The Picard variety of C is called the Jacobian variety of C .

We denote the Jacobian by $J$. By A.9.40, $J$ is equal as a group to the rational
equivalence classes of divisors of degree $0$ on $C_{\overline{K}}$. For every intermediate field
$K \subset L \subset \overline{K}$, Corollary 8.4.10 shows that the $L$-rational points of $J$ may be
identified with the rational equivalence classes defined over $L$. The definition also
makes sense for $g = 0$, but then the Jacobian variety is $\{0\}$.

8.10.2. The goal of this section is to deduce the properties of the Jacobian variety
from our previous results for Picard varieties. However, this is not the historical
approach, where Weil and Chow constructed the Jacobian directly from products
of C . We first sketch Weil’s approach in 8.10.3 and then Chow’s approach in
8.10.4. Although both constructions are more elementary than the use of the Picard
variety (which involves some machinery of algebraic geometry), we should note that
they do not immediately give the universal property of the Picard variety. In the
logic of this book, our development of the theory does not follow the historical
approach.
In 8.10.5, we briefly describe the well-known analytic construction of the Jacobian
making it clear that the holomorphic differential forms on C and J are the ‘same’.
This result holds generally for any base field K , but we skip the algebraic proof
here because it is best proved in the language of schemes (see J.S. Milne [205]).
As an immediate consequence, we shall see that the Jacobian is g -dimensional.
On J , there is an important divisor Θ given as the (g − 1)-fold sum of the image
of C . The goal is to prove that Θ is ample and that the isogeny ϕΘ (cf. 8.5.1)
is an isomorphism, i.e. the Jacobian is self-dual. These facts will have important
diophantine consequences in the following chapters because we can endow J(K)
with a canonical inner product which, at least in the number field case, allows us
to work with a Euclidean norm to count rational points.
After some introductory lemmas for divisors on curves, which follow from the
Riemann–Roch theorem, we prove in Proposition 8.10.13 that C may be seen
as a closed subvariety of J . Then we study in 8.10.14–8.10.21 the interrelation
between the Poincaré classes of C and J leading to the self-duality of J and to the
ampleness of Θ in Theorem 8.10.22. In 8.10.23, we proceed complex analytically
to construct a canonical Riemann form for the Jacobian and we give Riemann’s
theorem relating the theta divisor with Riemann’s theta function.

8.10.3. The first construction goes back to Jacobi, in the analytic setting. The construction
over fields of arbitrary characteristic was obtained by A. Weil, as sketched below.
We assume that $K$ is algebraically closed. Let $S_r$ be the symmetric group on $r$ elements.
The group $S_r$ acts on $C^r$ by permuting factors. By considering symmetric functions, we
show that $C^{(r)} := C^r / S_r$ is a smooth variety of dimension $r$, which clearly parametrizes
the effective divisors on $C$ of degree $r$.
Now consider the special case $r = g$. By the Riemann–Roch theorem (see A.13.6), if $D$
is any divisor of degree $g$, the linear system $|D|$ has dimension $\dim H^1(C, \mathcal{O}(D)) \ge 0$
and therefore is not empty. Generally, its dimension will be $0$ (see Lemma 8.10.10 below
for details); if the dimension is larger than $0$, $D$ is called a special divisor. Note that,
by Roch’s part of the Riemann–Roch theorem (the duality), $D$ is special if and only if the
linear system $|K_C - D|$, where $K_C$ denotes a canonical divisor of $C$, is not empty. For
example, if $g = 2$ we see that $D$ is special if and only if it is a canonical divisor. The study
of special divisors for curves of high genus is highly non-trivial, and we refer the reader who
wants to learn more on this topic to E. Arbarello, M. Cornalba, P. Griffiths, and J. Harris [13],
for example. We denote by $U_{(g)}$ the Zariski open subset of $C^{(g)}$ of non-special
effective divisors of degree $g$ (see Lemma 8.10.11 below).
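The dimension count behind this paragraph is a one-line consequence of Riemann–Roch. As a sketch, write $h^i(D) := \dim H^i(C, \mathcal{O}(D))$ (a shorthand not used in the text), let $K_C$ be a canonical divisor, and recall that $\dim |D| = h^0(D) - 1$:

```latex
\begin{align*}
h^0(D) - h^1(D) &= \deg(D) + 1 - g = 1 \qquad (\deg D = g), \\
\dim |D| &= h^0(D) - 1 = h^1(D) \ge 0 , \\
h^1(D) &= h^0(K_C - D) \qquad \text{(duality)} .
\end{align*}
```

In particular $D$ is special exactly when $h^0(K_C - D) > 0$, which matches the criterion stated above.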
Recall that $P_0$ is the base point on $C$ and let $D_1, D_2$ be two points on $U_{(g)}$. The divisor
$D_1 + D_2 - gP_0$ has degree $g$ and if it were not special it would give a well-defined third
point $D_3$ on $U_{(g)}$, namely the unique effective divisor in $|D_1 + D_2 - gP_0|$. We show next
that there is an open dense subset $V_{(g)}$ of $U_{(g)}$ such that the addition law $D_1 \oplus D_2 = D_3$
induces a well-defined morphism
\[ V_{(g)} \times V_{(g)} \to U_{(g)}. \]

This is in fact a so-called birational group law on $C^{(g)}$, where the sum $D_3$ of
$D_1, D_2 \in V_{(g)}$ is given by the rational equivalence
\[ D_1 + D_2 - gP_0 \sim D_3 \ge 0. \]

Now Weil gives a general process which attaches to every birational group law on a smooth
variety $X$ a group variety $G$ and a birational map $\varphi$ of $X$ into $G$ with $\varphi(ab) = \varphi(a)\varphi(b)$
whenever the left-hand side is defined. Moreover, $G$ is uniquely defined up to canonical
isomorphisms. So, in our case, we get a commutative group variety $J$. It can be identified
with $\mathrm{Pic}^0(C)$ in such a way that the birational map $C^{(g)} \dashrightarrow J$ is equal to the morphism
$j_{(g)} : C^{(g)} \to \mathrm{Pic}^0(C)$ given by
\[ j_{(g)}(P_1, \dots, P_g) = \mathrm{cl}([P_1] + \cdots + [P_g] - g[P_0]). \]

Since $C^{(g)}$ is complete, the same is true for $J$ (see A.6.15 (c)) and so $J$ is an abelian
variety of dimension $g$.
It is a point of historical significance that this last part of the construction required, and in
fact motivated, the concept of abstract variety in the sense of Weil. It still remained to see
that the Jacobian so constructed was indeed a projective variety.

8.10.4. Somewhat later Chow gave a different construction which obtained J directly as a
projective variety. Again, we assume for simplicity K algebraically closed. Chow bypassed
the difficulty created by special divisors by constructing first not the Jacobian itself, but
rather a certain projective bundle over it. The idea is the following.
Consider $C^{(n)}$ with $n \ge 2g - 1$. Let $D$ be an effective divisor of degree $n$; then the Serre
duality in A.10.29 gives
\[ \dim H^1(C, \mathcal{O}(D)) = 0, \]
therefore $|D|$ is a projective space of dimension precisely $n - g$, which is a subvariety of
$C^{(n)}$. Conversely, every closed subvariety of $C^{(n)}$ isomorphic to $\mathbb{P}^{n-g}$ is equal to a linear
system $|D|$ of an effective divisor $D$ of degree $n$. Just note that any line in $C^{(n)}$ joins
rationally equivalent divisors by the very definition of rational equivalence. Thus the variety
parametrizing the projective subspaces of $C^{(n)}$ of dimension $n - g$ may be identified with
the rational equivalence classes of effective divisors on $C$ of degree $n$. By the Riemann–Roch
theorem, every divisor of degree $n$ is rationally equivalent to an effective divisor. By
means of the map $D \mapsto D - n[P_0]$, the above parameter variety may be identified with
$J = \mathrm{Pic}^0(C)$.
In fact, the subvarieties of a projective variety X of fixed degree and fixed dimension form
an algebraic family (Chow, Van der Waerden, Cayley); constructive proofs can be obtained
via elimination theory and the theory of Cayley–Chow form. This suffices to give J the
structure of a projective variety and the only problem is to show that it is smooth, which
Chow proved by a direct calculation.
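The numerical input to Chow's construction in 8.10.4 can be made explicit. The following sketch assumes Serre duality in the form $h^1(D) = h^0(K_C - D)$ and $\deg K_C = 2g - 2$, with $h^i(D) := \dim H^i(C, \mathcal{O}(D))$ as a shorthand that does not appear in the text:

```latex
\begin{align*}
\deg(K_C - D) &= 2g - 2 - n \le -1 \implies h^0(K_C - D) = 0 \qquad (n \ge 2g - 1), \\
h^1(D) &= h^0(K_C - D) = 0 , \\
h^0(D) &= \deg(D) + 1 - g + h^1(D) = n + 1 - g , \\
\dim |D| &= h^0(D) - 1 = n - g .
\end{align*}
```

Since $g \ge 1$ gives $n + 1 - g \ge g \ge 1$, the same count shows that every divisor of degree $n$ is rationally equivalent to an effective one.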
8.10.5. In the context of complex geometry there is a more familiar construction of the
Jacobian variety (for details consult [130]).
Let $\gamma$ be a 1-cycle on $C$. Then $\int_\gamma \omega$ is a linear functional on the holomorphic 1-forms
$\omega$ on $C$, whose value depends only on the homology class of $\gamma$ in $H_1(C, \mathbb{Z})$. Thus we
obtain a homomorphism
\[ H_1(C, \mathbb{Z}) \to H^0(C, \Omega^1_C)^* \]
of the homology group $H_1(C, \mathbb{Z})$ into the dual $H^0(C, \Omega^1_C)^*$ of the space of holomorphic
1-forms on $C$. This embeds $H_1(C, \mathbb{Z})$ as a lattice in $H^0(C, \Omega^1_C)^*$. Then the complex
torus
\[ J = H^0(C, \Omega^1_C)^* / H_1(C, \mathbb{Z}) \]
realizes the Jacobian variety complex analytically (cf. 8.10.23 for details). We have an
embedding
\[ j : C \to J, \qquad P \mapsto \int_{\gamma_P}, \]
where $\gamma_P$ is any path connecting the base point $P_0$ with $P$. The value $j(P)$ is independent
of the choice of the path. Independently of the choice of the base point $P_0$, we have a
homomorphism
\[ \mathrm{Pic}^0(C) \to J, \qquad \sum_{i=1}^n ([P_i] - [Q_i]) \mapsto \sum_{i=1}^n (j(P_i) - j(Q_i)). \]

Abel’s theorem gives the injectivity and the Jacobi inversion theorem the surjectivity of
this homomorphism. There is a natural isomorphism of $H^0(J, \Omega^1_J)$ onto the dual of the
tangent space $T_{J,0}$ (use Proposition 8.2.26). Pull-back induces an isomorphism
\[ H^0(J, \Omega^1_J) \xrightarrow{\ \sim\ } H^0(C, \Omega^1_C). \tag{8.11} \]
Remark 8.10.6. This isomorphism holds for any base field $K$. More precisely,
let $J = \mathrm{Pic}^0(C)$ be the Jacobian of $C$ and let us consider the map $j : C \to J$,
given by $P \mapsto \mathrm{cl}([P] - [P_0])$. It follows from the theory of Picard varieties that $j$
is a morphism of varieties over $K$. In fact, let $\Delta$ be the diagonal in $C \times C$ and let
$p_1, p_2$ be the projections of $C \times C$ onto $C$; then $\mathrm{cl}(\Delta) - p_1^*\mathrm{cl}([P_0]) - p_2^*\mathrm{cl}([P_0])$
is a subfamily of $\mathrm{Pic}^0(C)$ parametrized by $C$. By Theorem 8.4.13, we conclude
that $j$ is a morphism. Then $j$ is the $j_{(1)}$ in Weil’s approach 8.10.3 and $j$ is the
same as the analytic $j$ in 8.10.5. It can be proved that $j^*$ gives the isomorphism
(8.11) (use [44], Th.8.4.1 and Prop. 8.4.2, or [205], Prop.2.2).
Corollary 8.10.7. The Jacobian variety of C has dimension g .
Proof: By Proposition 8.2.26, the tangent bundle $T_J$ is a trivial vector bundle of
rank $\dim(J)$. By duality, the same holds for the cotangent bundle. Now recall that
the only regular functions on an irreducible complete variety over an algebraically
closed field are the constants (use A.6.15 (b), (d)). Since $J$ is geometrically
reduced, compatibility of cohomology and base change holds (see A.10.28). This
shows that $H^0(J, \mathcal{O}_J) = K$ and hence
\[ \dim H^0(J, \Omega^1_J) = \dim(J) \cdot \dim H^0(J, \mathcal{O}_J) = \dim(J). \]
The claim follows using the isomorphism (8.11).
Remark 8.10.8. An important role on the Jacobian variety $J$ is played by the theta
divisor
\[ \Theta := \underbrace{j(C) + \cdots + j(C)}_{g-1}. \]
We shall show below that $\Theta$ is indeed a divisor on $J$. In order to deduce
elementary properties of $j$ and of the theta divisor, we need the following three
lemmas. For a divisor $D$ and a line bundle $L$, it is convenient to use the notation
$L(D) := L \otimes \mathcal{O}(D)$ and similarly for the sheaf of sections.
Lemma 8.10.9. Assume that the ground field K is algebraically closed. Let L be
a line bundle on C . Then for every P ∈ C , we have
dim Γ(C, L(−[P ])) ≥ dim Γ(C, L) − 1.
Equality holds if and only if P is not a base point of L.
Proof: Let $s_P$ be the canonical global section of $\mathcal{O}([P])$ corresponding to the
divisor $[P]$ on $C$. Then we identify $\Gamma(C, L(-[P]))$ with a subspace of $\Gamma(C, L)$
using the injective map $s \mapsto s \otimes s_P$. Clearly, $\Gamma(C, L(-[P]))$ is the kernel of the
evaluation map
\[ \Gamma(C, L) \to K, \qquad s \mapsto s(P) \]
at $P$. This map is surjective if and only if $P$ is not a base point of $L$ (cf. A.5.20).
A fancier way to deduce this is the use of the skyscraper sheaf $K_P$ at $P$ (see
Example A.10.20). Taking global sections in the short exact sequence
\[ 0 \to \mathcal{L}(-[P]) \to \mathcal{L} \to K_P \to 0, \]
where $\mathcal{L}$ is the sheaf of sections of $L$, we get the same result. Finally, using the
dimension formula from linear algebra, we get the claim.
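The last sentence of the proof is the rank-nullity formula applied to evaluation at $P$. As a sketch, with $\mathrm{ev}_P$ a hypothetical name for the evaluation map $\Gamma(C, L) \to K$, $s \mapsto s(P)$:

```latex
\begin{align*}
\dim \Gamma(C, L(-[P])) &= \dim \ker(\mathrm{ev}_P)
   = \dim \Gamma(C, L) - \operatorname{rank}(\mathrm{ev}_P), \\
\operatorname{rank}(\mathrm{ev}_P) &\in \{0, 1\}, \qquad
\operatorname{rank}(\mathrm{ev}_P) = 1 \iff P \text{ is not a base point of } L .
\end{align*}
```

Both the inequality and the equality criterion of Lemma 8.10.9 can be read off from these two lines.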
Lemma 8.10.10. Let $L$ be a line bundle on $C$ and let $r \in \{1, \dots, \dim \Gamma(C, L)\}$.
Then there is a dense open subset $U$ of $C^r$ such that
\[ \dim \Gamma\Bigl(C_{\overline{K}}, L_{\overline{K}}\Bigl(-\sum_{j=1}^r [P_j]\Bigr)\Bigr) = \dim \Gamma(C, L) - r \]
for all $(P_1, \dots, P_r) \in U$.


Proof: A repeated application of Lemma 8.10.9 gives us a sequence of distinct
points $P_1, \dots, P_r$ such that
\[ \dim \Gamma\Bigl(C_{\overline{K}}, L_{\overline{K}}\Bigl(-\sum_{j=1}^r [P_j]\Bigr)\Bigr) = \dim H^0(C, L) - r. \]
On the right-hand side, we have used that forming cohomology groups is
compatible with base change (see A.10.28). Fix a basis $s_1, \dots, s_n$ of $\Gamma(C, L)$ and let
$Q_1, \dots, Q_r$ be distinct points of $C$. We have an exact sequence
\[ 0 \to \mathcal{L}_{\overline{K}}\Bigl(-\sum_{j=1}^r [Q_j]\Bigr) \to \mathcal{L}_{\overline{K}} \to \bigoplus_{j=1}^r \overline{K}_{Q_j} \to 0 \]
of sheaves on $C_{\overline{K}}$, where $\mathcal{L}$ is the sheaf of sections of $L$ and $\overline{K}_{Q_j}$ is the skyscraper
sheaf at $Q_j$. The first part of the long exact cohomology sequence (see A.10.22)
reads as
\[ 0 \to H^0\Bigl(C_{\overline{K}}, \mathcal{L}_{\overline{K}}\Bigl(-\sum_{j=1}^r [Q_j]\Bigr)\Bigr) \to H^0(C_{\overline{K}}, \mathcal{L}_{\overline{K}}) \to \overline{K}^r. \]
The last map is evaluation at $Q_1, \dots, Q_r$. It is surjective if and only if
\[ \det\bigl(s_i(Q_j)\bigr)_{i \in I;\, j = 1, \dots, r} \neq 0 \]
for some $I \subset \{1, \dots, n\}$ of cardinality $r$. Let $U$ be the set of points in $C^r$
satisfying this last condition. Obviously, $U$ is an open subset of $C^r$. By the
equivalence above, the points in $U$ satisfy the conclusion of the lemma. Finally,
we have seen at the beginning of the proof that $(P_1, \dots, P_r) \in U$. Hence $U$ is
not empty and is an open dense subset of the irreducible smooth variety $C^r$ (cf.
A.4.11).
Lemma 8.10.11. Let $r \in \{1, \dots, g\}$. Then
\[ U_r := \Bigl\{ (P_1, \dots, P_r) \in C^r \Bigm| \forall\, i \neq j \Rightarrow P_i \neq P_j,\ \dim \Gamma\Bigl(C_{\overline{K}}, \mathcal{O}\Bigl(\sum_{j=1}^r [P_j]\Bigr)\Bigr) = 1 \Bigr\} \]
is an open dense subset of $C^r$.
Proof: By Lemma 8.10.10 and its proof applied to $L = \Omega^1_C$, the set of $(P_1, \dots, P_r)$
with
\[ \dim H^0\Bigl(C_{\overline{K}}, \Omega^1_{C_{\overline{K}}}\Bigl(-\sum_{j=1}^r [P_j]\Bigr)\Bigr) = g - r \]
and $P_i \neq P_j$ for all $i \neq j$ is an open dense subset of $C^r$. By the Riemann–Roch
theorem in A.13.5, this set is $U_r$.
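The appeal to Riemann–Roch in this proof amounts to the following computation. It is a sketch under the usual identification $H^0(C, \Omega^1_C(-D)) \cong H^0(C, \mathcal{O}(K_C - D))$ for a canonical divisor $K_C$, with $D := \sum_{j=1}^r [P_j]$ and $h^0$ a shorthand for the dimension of the space of global sections (neither $D$ nor $h^0$ is notation from the text):

```latex
\begin{align*}
h^0(D) - h^0(K_C - D) &= \deg(D) + 1 - g = r + 1 - g , \\
h^0(K_C - D) = g - r &\iff h^0(D) = (r + 1 - g) + (g - r) = 1 .
\end{align*}
```

So the condition on differentials and the condition $\dim \Gamma(C_{\overline{K}}, \mathcal{O}(D)) = 1$ defining $U_r$ cut out the same set.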
Remark 8.10.12. For any $r \in \mathbb{N}$, we have a map
\[ j_r : C^r \to J, \qquad (P_1, \dots, P_r) \mapsto \mathrm{cl}\Bigl(\sum_{j=1}^r [P_j] - r[P_0]\Bigr). \]
Since $j = j_1$ and addition on $J$ are both morphisms (see Remark 8.10.6), we
easily deduce that $j_r$ is a morphism. Note that its image is closed because $C^r$ is
complete (see A.6.15 (b)). Let $a := j_r(P_1, \dots, P_r)$, then the fibre over $a$ is
\[ j_r^{-1}(a) = \Bigl\{ (Q_1, \dots, Q_r) \in C^r \Bigm| \sum_{j=1}^r [Q_j] \sim \sum_{j=1}^r [P_j] \Bigr\}. \]
Let $r \in \{1, \dots, g\}$ and $(P_1, \dots, P_r) \in U_r$ (as introduced in Lemma 8.10.11).
Then the fibre over $a$ is obtained by permuting the entries, namely
\[ j_r^{-1}(j_r(P_1, \dots, P_r)) = \{ (P_{\pi(1)}, \dots, P_{\pi(r)}) \mid \pi \in S_r \}. \]
By the dimension theorem in A.12.1, we conclude that $\dim j_r(C^r) = r$. In
particular, Corollary 8.10.7 implies that the morphism $j_g$ is surjective. Moreover,
$\Theta = j_{g-1}(C^{g-1})$ is indeed a divisor.
Proposition 8.10.13. The map $j : C \to J$, $P \mapsto \mathrm{cl}([P] - [P_0])$, is a closed
embedding.
Proof: We may assume that $K$ is algebraically closed (cf. A.6.12). Since $g \ge 1$,
two points of $C_{\overline{K}}$ are rationally equivalent if and only if they are equal (use the
Riemann–Roch theorem or give a direct proof). Hence $j$ is one-to-one. We claim
that $dj$ induces an injective map between tangent spaces. In order to prove this, it
is enough to show that the dual is surjective between cotangential spaces. We have
seen that for any $a \in J$, a cotangential vector in $a$ extends canonically to a global
section of $\Omega^1_J$ (Proposition 8.2.26). By the isomorphism (8.11), it
is enough to show that the evaluation map $\Gamma(C, \Omega^1_C) \to T^*_{C,P}$ is surjective for
$P \in C$. Note that the kernel of the evaluation map is $\Gamma(C, \Omega^1_C(-[P]))$. By
injectivity of $j$, we know that $\Gamma(C, \mathcal{O}([P]))$ has dimension $1$. By the Riemann–Roch
theorem in A.13.5, we conclude that the kernel is $(g - 1)$-dimensional and
hence the evaluation map is surjective.
Since $J$ is a projective variety (Corollary 8.5.7), we have a closed embedding
$J \to \mathbb{P}^n_K$. In order to prove that $j$ is a closed embedding, we have to show that the
linear system corresponding to the induced map $C \to \mathbb{P}^n_K$ separates points and
tangent vectors ([148], Remark II.7.8). The first (resp. second) condition follows
by injectivity of $j$ (resp. $dj$).
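The dimension of the kernel used in the proof of Proposition 8.10.13 follows from Riemann–Roch applied to the degree one divisor $[P]$. As a sketch, with $h^0$ denoting the dimension of the space of global sections (a shorthand not used in the text) and $\mathrm{ev}_P$ a hypothetical name for the evaluation map on differentials:

```latex
\begin{align*}
h^0(\mathcal{O}([P])) - h^0(\Omega^1_C(-[P])) &= \deg([P]) + 1 - g = 2 - g , \\
h^0(\mathcal{O}([P])) = 1 &\implies h^0(\Omega^1_C(-[P])) = g - 1 , \\
\operatorname{rank}(\mathrm{ev}_P) &= g - (g - 1) = 1 = \dim T^*_{C,P} .
\end{align*}
```

Here $\dim \Gamma(C, \Omega^1_C) = g$ is used in the last line, so the evaluation map is onto the one-dimensional cotangent space.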
8.10.14. As a divisor on $J$, we also consider
\[ \Theta^- := [-1]^*\Theta = \underbrace{-j(C) - \cdots - j(C)}_{g-1}. \]

In $\mathrm{Pic}(J)$, we use $\theta := \mathrm{cl}(\mathcal{O}(\Theta))$ and $\theta^- := [-1]^*\theta$. For $a \in J$, we set
$j_a := \tau_{-a} \circ j$, i.e. $j_a$ is the map $C \to J$ given by $j_a(P) = j(P) - a$.
The pull-back of a divisor $D'$ with respect to a morphism $\varphi : X \to X'$ of
irreducible smooth varieties over $K$ is well defined as a divisor if $\varphi(X)$ is not
contained in the support of $D'$ (see A.8.26). In this case, viewing $D'$ as a Cartier
divisor on $X'$ locally given on $U_\alpha$ by a rational function $f_\alpha$, the pull-back $\varphi^*(D')$
is given on $\varphi^{-1}(U_\alpha)$ by the rational function $f_\alpha \circ \varphi$. Note that $\varphi^*(D')$ is
well defined in $CH^1(X)$ for any divisor $D'$ on $X'$ (see A.9.18). If $\varphi$ is an
isomorphism (as $[-1]$ is in the cases above) and if $D'$ is a prime divisor, then
$\varphi^*(D') = \varphi^{-1}(D')$.
Proposition 8.10.15. Assume $K$ algebraically closed. For all $(P_1, \dots, P_g) \in C^g$,
we have the rational equivalence relation
\[ \sum_{i=1}^g [P_i] \sim j_a^*(\Theta^-) \]
of divisors on $C$, where $a := j_g(P_1, \dots, P_g)$.

If $K$ is not algebraically closed, then Corollary 8.4.10 (a) shows that the rational
equivalence holds over any field of definition for $P_1, \dots, P_g$.
Proof: The idea is to show that the intersection of $j(C)$ and $\Theta^- + a$ is transverse
for generic $(P_1, \dots, P_g)$. Then the proposition will follow in the generic case
from our first step below. An application of the theorem of the square will lead to
the general case.
First step: For $(P_1, \dots, P_g) \in C^g$ with $\dim \Gamma(C, \mathcal{O}(\sum_{i=1}^g [P_i])) = 1$, we have
\[ (\Theta^- + a) \cap j(C) = \{ j(P_1), \dots, j(P_g) \}. \]
To prove the first step, we note that for $Q \in C$, we have $j(Q) \in \Theta^- + a$ if and
only if there are $Q_1, \dots, Q_{g-1} \in C$ such that
\[ [Q] - [P_0] \sim (g - 1)[P_0] - \sum_{i=1}^{g-1} [Q_i] + \sum_{i=1}^{g} [P_i] - g[P_0]. \]
This is equivalent to
\[ [Q] + \sum_{i=1}^{g-1} [Q_i] \sim \sum_{i=1}^{g} [P_i]. \]
By assumption, the linear system $|\sum_{i=1}^g [P_i]|$ consists only of $\sum_{i=1}^g [P_i]$. This
proves immediately the first step.
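The passage between the two displayed equivalences in the first step is pure bookkeeping with the multiples of $[P_0]$. Written out as a sketch in the notation of the proof:

```latex
\begin{align*}
[Q] - [P_0]
  &\sim (g-1)[P_0] - \sum_{i=1}^{g-1} [Q_i] + \sum_{i=1}^{g} [P_i] - g[P_0] \\
  &= -[P_0] - \sum_{i=1}^{g-1} [Q_i] + \sum_{i=1}^{g} [P_i] , \\
[Q] + \sum_{i=1}^{g-1} [Q_i] &\sim \sum_{i=1}^{g} [P_i] .
\end{align*}
```

Since the linear system on the right contains only one effective divisor, $Q$ must be one of the $P_i$, which is the claim of the first step.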
Second step: For $1 \le r \le g$ and $(P_1, \dots, P_r) \in U_r$ (defined in Lemma 8.10.11),
the differential $dj_r : T_{C^r,(P_1,\dots,P_r)} \to T_{J,j_r(P_1,\dots,P_r)}$ is injective.
As in the proof of Proposition 8.10.13, it is enough to show that the dual map is
surjective. The kernel of the dual map
\[ T^*_{J,j_r(P_1,\dots,P_r)} \to T^*_{C^r,(P_1,\dots,P_r)} \cong \bigoplus_{i=1}^r T^*_{C,P_i} \]
may be identified again with
\[ \bigcap_{i=1}^r \Gamma(C, \Omega^1_C(-[P_i])) = \Gamma(C, \Omega^1_C(-[P_1] - \cdots - [P_r])) \]
using (8.11) and using that regular functions on $C$ are constant (see
A.6.15 (b), (d)). By the Riemann–Roch theorem in A.13.5 (cf. proof of Lemma
8.10.11), we conclude that the kernel has dimension $g - r$. Since $\dim(J) = g$
(Corollary 8.10.7), this proves the second claim.
Third step: For generic $(P_1, \dots, P_g) \in C^g$, the intersection of $\Theta^- + a$ and $j(C)$
is transverse.
For the definition of transverse intersection, see Example A.9.22. Generic means
that it holds in an open dense subset of $C^g$. Thus we may assume $(P_1, \dots, P_g) \in U_g$.
From the first step, we get
\[ \{ j(P_1), \dots, j(P_g) \} = (\Theta^- + a) \cap j(C). \]
We have to omit the singular part of $\Theta^- + a$, i.e. we do not want
$j(P_i) \in \Theta^-_{\mathrm{sing}} + a$ for any $i \in \{1, \dots, g\}$. This incidence is equivalent to
\[ j(P_1) + \cdots + j(P_{i-1}) + j(P_{i+1}) + \cdots + j(P_g) \in \Theta_{\mathrm{sing}}. \]
Clearly, these are closed conditions of codimension $\ge 1$ on $C^g$. So we may
assume that $j(C)$ intersects $\Theta^- + a$ only in the smooth locus. It remains to show
that the tangent spaces $T_{j(C),j(P_i)}$ and $T_{\Theta^- + a,j(P_i)}$ span $T_{J,j(P_i)}$. Without loss
of generality, we may assume $i = 1$.
We consider the morphism
\[ \varphi : C^g \to J, \qquad (P_1, \dots, P_g) \mapsto j(P_1) - j(P_2) - \cdots - j(P_g). \]
For $\mathbf{P} := (P_1, \dots, P_g) \in U_g$, we claim that $d\varphi_{\mathbf{P}}$ is injective. Again, it is enough
to show surjectivity of the dual map
\[ (\delta\varphi)_{\mathbf{P}} : T^*_{J,\varphi(\mathbf{P})} \to T^*_{C^g,\mathbf{P}} \cong \bigoplus_{i=1}^g T^*_{C,P_i}. \]
We may identify the left-hand side with $\Gamma(C, \Omega^1_C)$ (via the isomorphism (8.11)
and Proposition 8.2.26). For $\omega \in \Gamma(C, \Omega^1_C)$, we have $(\delta j)_P(\omega) = \omega(P)$
and Corollary 8.2.25 leads easily to $(\delta(-j))_P(\omega) = -\omega(P)$ for any $P \in C$. This
shows
\[ (\delta\varphi)_{\mathbf{P}}(\omega) = (\omega(P_1), -\omega(P_2), \dots, -\omega(P_g)) \in \bigoplus_{i=1}^g T^*_{C,P_i} \]
and, as in the second step, we deduce that $(d\varphi)_{\mathbf{P}}$ is one-to-one. By dimensionality
reasons, $(d\varphi)_{\mathbf{P}}$ is an isomorphism for all $\mathbf{P} \in U_g$.

We note that $(dj)_{P_1} : T_{C,P_1} \xrightarrow{\ \sim\ } T_{j(C),j(P_1)}$ by Proposition 8.10.13. As the claim
has to be proved only generically, we may assume that $(P_2, \dots, P_g) \in U_{g-1}$. The
second step shows that
\[ d(-j_{g-1}) : T_{C^{g-1},(P_2,\dots,P_g)} \xrightarrow{\ \sim\ } T_{\Theta^-,\,-j_{g-1}(P_2,\dots,P_g)}. \]
We have also used that we map to the smooth locus of $\Theta^-$. We apply $(d\varphi)_{\mathbf{P}}$ to
the canonical direct sum decomposition
\[ T_{C^g,(P_1,\dots,P_g)} \cong T_{C,P_1} \oplus T_{C^{g-1},(P_2,\dots,P_g)} \]
from A.7.17. These four isomorphisms lead to the interior direct sum decomposition
\[ T_{J,\varphi(\mathbf{P})} = T_{j(C) - j_{g-1}(P_2,\dots,P_g),\,\varphi(\mathbf{P})} \oplus T_{\Theta^- + j(P_1),\,\varphi(\mathbf{P})}. \]
The third claim now comes by applying the isomorphism $d\tau_{j_{g-1}(P_2,\dots,P_g)}$.
Fourth step: Proof of the proposition for generic $(P_1, \dots, P_g) \in C^g$.
We choose a generic $(P_1, \dots, P_g)$ on $C^g$. By the third step, we know that the
intersection of $\Theta^- + a$ and $j(C)$ is transverse, hence all components in the
intersection product have multiplicity one (Example A.9.22). By the first step, we
get
\[ (\Theta^- + a) . j(C) = \sum_{i=1}^g j(P_i). \]
If we identify $C$ with $j(C)$, then it follows from the definitions that $j_a^*(\Theta^-)$
corresponds to the above intersection product. This proves the fourth step.
Fifth step: Proof of the proposition for all $(P_1, \dots, P_g) \in C^g$.
We note first that for two dense open subsets $U, V$ of $J$, we have $J = U - V$.
This follows easily from the fact that the intersection of $U$ and $a + V$ is not empty
for all $a \in J$. By the fourth step and since $j_g$ is a surjective closed morphism,
there is an open dense subset $U$ of $J$ such that
\[ \sum_{i=1}^g [P_i] \sim j_a^*(\Theta^-) \]
for all $P_1, \dots, P_g \in C$ with $a = j_g(P_1, \dots, P_g) \in U$. Now let $(P_1, \dots, P_g)$ be
any point in $C^g$. Since $J = U + U - U$, there are $x, y, z \in U$ with $a = x + y - z$.
The map $j_g$ is surjective (cf. Remark 8.10.12), hence we have $(Q_1, \dots, Q_g) \in C^g$
mapping to $x$. Let $D_x := \sum_{i=1}^g [Q_i]$. We define divisors $D_y$ and $D_z$ on $C$ in a
similar way. The theorem of the square from 8.5.2 shows that
\[ \tau_{z-x-y}^*(\Theta^-) \sim \tau_{-x}^*(\Theta^-) + \tau_{-y}^*(\Theta^-) - \tau_{-z}^*(\Theta^-). \]
This proves
\[ j_a^*(\Theta^-) \sim j^* \circ \tau_{z-x-y}^*(\Theta^-) \sim j_x^*(\Theta^-) + j_y^*(\Theta^-) - j_z^*(\Theta^-). \]
Since the claim is true for $x, y, z$, we conclude
\[ j_a^*(\Theta^-) \sim D_x + D_y - D_z. \]
The relation
\[ j_g(P_1, \dots, P_g) = a = x + y - z \]
implies
\[ \sum_{i=1}^g [P_i] \sim D_x + D_y - D_z, \]
proving the proposition.
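The two set-theoretic identities used in the fifth step can be spelled out as follows. This is a sketch that uses only the irreducibility of $J$, so that two dense open subsets always meet; the lower-case letters are hypothetical names for the chosen points.

```latex
\begin{align*}
a \in J &\implies U \cap (a + V) \neq \emptyset
        \implies u = a + v \ (u \in U,\ v \in V)
        \implies a = u - v \in U - V , \\
J = U - U,\quad x \in U &\implies a - x = y - z \ (y, z \in U)
        \implies a = x + y - z \in U + U - U .
\end{align*}
```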
Corollary 8.10.16. For all $(P_1, \dots, P_g) \in U_g$ and $a := j_g(P_1, \dots, P_g)$, we have
\[ \sum_{i=1}^g [P_i] = j_a^*(\Theta^-) \]
as an identity of divisors.
Proof: We may assume, by base change, that $K$ is algebraically closed. By
Proposition 8.10.15, we have
\[ \sum_{i=1}^g [P_i] \sim j_a^*(\Theta^-). \]
Both sides are effective divisors on $C$. By assumption, the linear system
$|\sum_{i=1}^g [P_i]|$ is zero-dimensional, proving the claim.
Corollary 8.10.17. For $a \in J = \mathrm{Pic}^0(C)$, we have
\[ j_a^*(\theta^-) - j^*(\theta^-) = a. \]
Proof: By base change and Corollary 8.4.10, we may assume $K$ algebraically
closed. Then the claim follows from surjectivity of $j_g$ (cf. Remark 8.10.12) and
Proposition 8.10.15.
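The short proof of Corollary 8.10.17 unpacks as follows. This is a sketch that assumes $K$ algebraically closed, picks $(P_1, \dots, P_g)$ with $j_g(P_1, \dots, P_g) = a$ by surjectivity of $j_g$, and applies Proposition 8.10.15 twice, the second time to the point $(P_0, \dots, P_0)$, which maps to $0$:

```latex
\begin{align*}
a &= \mathrm{cl}\Bigl(\sum_{i=1}^g [P_i] - g[P_0]\Bigr) , \\
j_a^*(\theta^-) &= \mathrm{cl}\Bigl(\sum_{i=1}^g [P_i]\Bigr) , \qquad
j^*(\theta^-) = j_0^*(\theta^-) = \mathrm{cl}\bigl(g[P_0]\bigr) , \\
j_a^*(\theta^-) - j^*(\theta^-) &= \mathrm{cl}\Bigl(\sum_{i=1}^g [P_i] - g[P_0]\Bigr) = a .
\end{align*}
```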
There are two Poincaré classes in the context of Jacobians. One is the Poincaré
class $p_C \in \mathrm{Pic}(C \times J)$, the other is the Poincaré class $p_J \in \mathrm{Pic}(J \times \widehat{J})$, where
$\widehat{J}$ is the dual abelian variety of $J$. In the final part of this section, we shall study
relations between these Poincaré classes and the class $\theta \in \mathrm{Pic}(J)$ of the theta
divisor. This will lead us to the self-duality of the Jacobian.
Proposition 8.10.18. Let $\Delta$ denote the diagonal in $C \times C$. Then
\[ (\mathrm{id}_C \times j)^*(p_C) = \mathrm{cl}(\mathcal{O}(\Delta - C \times \{P_0\} - \{P_0\} \times C)). \]
Proof: By the characterization of the Poincaré class in Remark 8.4.11, we get for
$P \in C$:
\[ (\mathrm{id}_C \times j)^*(p_C)|_{C \times \{P\}} = p_C|_{C \times \{j(P)\}} = \mathrm{cl}(\mathcal{O}([P] - [P_0])). \]
Since the restriction of $\mathcal{O}(\Delta - C \times \{P_0\} - \{P_0\} \times C)$ to $C \times \{P\}$ is in the same
class, we get the claim by the seesaw principle in Corollary 8.4.4 (noting that the
restrictions to $\{P_0\} \times C$ of both classes are $0$).
Proposition 8.10.19. Let $m : J \times J \to J$ be addition and let $p_1, p_2$ be the
projections of $J \times J$ onto the corresponding factor. For
$c := m^*\theta^- - p_1^*\theta^- - p_2^*\theta^- \in \mathrm{Pic}(J \times J)$, we have
\[ (j \times \mathrm{id}_J)^*(c) = -p_C. \]
Proof: For $a \in J = \mathrm{Pic}^0(C)$, we have $(p_C)_a = a$. By Corollary 8.10.17, we
conclude that
\[ (p_C)_a = j^* \circ \tau_{-a}^*(\theta^-) - j^*\theta^-. \]
We have already seen in the proof of Corollary 8.4.5 that
\[ c|_{J \times \{a\}} = \tau_a^*(\theta^-) - \theta^-. \]
By pull-back of the identity to $C$, we obtain
\[ ((j \times \mathrm{id}_J)^*(c))_a = j^* \circ \tau_a^*(\theta^-) - j^*\theta^-. \]
This is equal to $(p_C)_{-a} = -a = -(p_C)_a$. Since the restriction of $c$ to $\{0\} \times J$
is $0$, we get
\[ (j \times \mathrm{id}_J)^*(c)|_{\{P_0\} \times J} = 0. \]
As $p_C|_{\{P_0\} \times J} = 0$ as well, the claim follows from the seesaw principle (see
Corollary 8.4.4).
Proposition 8.10.20. Let $\varphi_\theta, \varphi_{\theta^-}$ be the morphisms $J \to \hat{J}$ introduced in 8.5.1 and let $c := m^*\theta^- - p_1^*\theta^- - p_2^*\theta^- \in \mathrm{Pic}(J \times J)$, as in Proposition 8.10.19. Then
\[ (\mathrm{id}_J \times \varphi_{\theta^-})^*(p_J) = (\mathrm{id}_J \times \varphi_\theta)^*(p_J) = c. \]
Proof: Let $a \in J$. Then
\[ \varphi_{\theta^-}(a) = \tau_a^*(\theta^-) - \theta^- \]
by definition. By Theorem 8.4.13, we get
\[ (\mathrm{id}_J \times \varphi_{\theta^-})^*(p_J)\big|_{J \times \{a\}} = p_J\big|_{J \times \{\varphi_{\theta^-}(a)\}} = \tau_a^*(\theta^-) - \theta^-. \]
The latter is equal to $c|_{J \times \{a\}}$ (see proof of Proposition 8.10.19) and so we obtain, by the seesaw principle in 8.4.4, that
\[ (\mathrm{id}_J \times \varphi_{\theta^-})^*(p_J) = c. \]
Replacing $\theta^-$ by $\theta$, we get in the same way
\[ (\mathrm{id}_J \times \varphi_\theta)^*(p_J) = m^*\theta - p_1^*\theta - p_2^*\theta. \]
If we apply $[-1]^*$, then the left-hand side of the equation does not change, because $\varphi_\theta$ is a homomorphism (see Theorem 8.5.1) and $p_J$ is even (see Theorem 8.8.4). Instead, the pull-back of the right-hand side is equal to $c$. Our assertion follows. □
8.10.21. We summarize here our findings.
Given a curve $C$ of genus $g \ge 1$ with base point $P_0 \in C(K)$, there is a natural embedding $j$ of $C$ into the Jacobian variety. By 8.5.1, we have a dual homomorphism $\hat{j} : \hat{J} \to J$. The theta divisor is defined by
\[ \Theta := \underbrace{j(C) + \cdots + j(C)}_{g-1 \text{ times}} \]
and the corresponding class in $\mathrm{Pic}(J)$ is denoted by $\theta$. Let $\theta^- := [-1]^*\theta$ and let $\varphi_\theta : J \to \hat{J}$ be the natural morphism introduced in Theorem 8.5.1. There are three canonical morphisms from $J \times J$ onto $J$, namely addition $m$, first projection $p_1$ and second projection $p_2$. The pull-back of the Poincaré class $p_J \in \mathrm{Pic}(J \times \hat{J})$ by $\mathrm{id}_J \times \varphi_\theta$ is equal to the class
\[ c := m^*\theta^- - p_1^*\theta^- - p_2^*\theta^- \]
and it follows from the proof of Proposition 8.10.20 that
\[ c = m^*\theta - p_1^*\theta - p_2^*\theta. \]
The next theorem shows that $\varphi_\theta$ is an isomorphism. We can identify $J$ and $\hat{J}$ by this natural isomorphism, so it makes sense to consider $J$ as self-dual and $c$ as the Poincaré class of $J$.
Theorem 8.10.22. The map $\varphi_\theta$ is an isomorphism of $J$ onto $\hat{J}$ whose inverse is $-\hat{j}$. Moreover, $\theta$ is ample.
Proof: We first show $-\hat{j} \circ \varphi_\theta = \mathrm{id}_J$. For $a \in J$, we have
\[ -\hat{j} \circ \varphi_\theta(a) = -j^*\bigl(\tau_a^*(\theta) - \theta\bigr). \]
Since $\tau_a^*(\theta) - \theta \in \hat{J} = \mathrm{Pic}^0(J)$ is odd (Theorem 8.8.3), we may replace $-j^*$ on the right-hand side by $j^* \circ [-1]^*$, leading to
\[ -\hat{j} \circ \varphi_\theta(a) = j^* \circ \tau_{-a}^*(\theta^-) - j^*(\theta^-) = j_a^*(\theta^-) - j^*(\theta^-) = a \]
by Corollary 8.10.17. We get $-\hat{j} \circ \varphi_\theta = \mathrm{id}_J$. In particular, $\varphi_\theta$ is injective. Since $J$ and $\hat{J}$ have the same dimension (Corollary 8.5.11), the dimension theorem in 8.2.16 proves that $\varphi_\theta$ is also surjective, proving the first claim. By Proposition 8.5.5, $\theta$ has to be ample. □

To complete the picture we show that $\Theta^-$ is a translation of $\Theta$. Let $K_C$ be an effective canonical divisor of $C$. It can be viewed as a point of $C^{(2g-2)}$ (cf. 8.10.3). The image $k := j^{(2g-2)}(K_C) \in J$ is independent of the choice of $K_C$.

Proposition 8.10.23. With the above notation, we have
\[ \Theta^- = \tau_k^*(\Theta). \]
Proof: Let $a \in \Theta$, then there is an effective divisor $D \in C^{(g-1)}$ mapping to $a$. By the Riemann–Roch theorem in A.13.5
\[ \dim \Gamma(C, O(D)) - \dim \Gamma(C, \Omega^1(-D)) = 0. \]
Thus there is an effective divisor $D'$ of degree $g - 1$ such that $D + D'$ is a canonical divisor, say $K_C$. Therefore
\[ j^{(2g-2)}(K_C) - j^{(g-1)}(D') = j^{(g-1)}(D) = a. \]
This proves
\[ \Theta^- \supset \tau_k^*(\Theta). \]
Both are prime divisors and so they are equal. □

8.10.24. We end this chapter by interpreting the situation complex analytically (for details, cf. [130]). We claimed in 8.10.5 that the Jacobian variety $J$ is given by the complex torus
\[ H^0(C, \Omega^1_C)^* / H_1(C, \mathbb{Z}). \tag{8.12} \]
Note first that we have a complex isomorphism $H^{0,1}(C) \xrightarrow{\sim} H^0(C, \Omega^1)^*$ given by mapping the $(0,1)$-form $\rho$ to the linear functional $\int_C \cdot \wedge \rho$. By Serre duality in A.10.29 and 8.4.15, we conclude that (8.12) is indeed the Jacobian variety.
We give a direct argument that (8.12) is a complex abelian variety. The intersection number induces a canonical alternating $\mathbb{Z}$-valued bilinear form $E$ on $H_1(C, \mathbb{Z})$ given by $E(\gamma_1, \gamma_2) := \gamma_2 \cdot \gamma_1$. It is enough to show that $E$ is the imaginary part of a positive definite Hermitian form on $H^{0,1}(C)$. By Poincaré duality, we have an isomorphism $H_1(C, \mathbb{R}) \xrightarrow{\sim} H^1_{dR}(C, \mathbb{R})$ by mapping the cycle $\gamma$ to $\eta_\gamma$ characterized by
\[ \int_\gamma \eta = \int_C \eta \wedge \eta_\gamma \qquad (\eta \in H^1_{dR}(C, \mathbb{R})). \]
The lattice $H_1(C, \mathbb{Z})$ is realized in $H^{0,1}(C)$ by mapping $\gamma$ to the projection $\eta_\gamma^{0,1}$ of $\eta_\gamma$ with respect to the Hodge decomposition. Now recall that intersection products of cycles correspond via Poincaré duality to wedge products of forms (cf. [130], p.59), i.e.
\[ \gamma \cdot \delta = \int_C \eta_\gamma \wedge \eta_\delta. \]
Then it is easily seen that $E$ is the imaginary part of the positive definite Hermitian form
\[ H(\rho, \mu) = -2i \int_C \rho \wedge \bar{\mu} \tag{8.13} \]
on $H^{0,1}(C)$.

Topologically, the compact Riemann surface $C$ is characterized by its $g$ holes and there is a basis $\gamma_1, \dots, \gamma_{2g}$ of $H_1(C, \mathbb{Z})$ such that the intersection matrix has the form
\[ (\gamma_j \cdot \gamma_k) = \begin{pmatrix} 0 & I_g \\ -I_g & 0 \end{pmatrix}, \tag{8.14} \]
where $I_g$ is the $g \times g$ identity matrix (to see this, choose $\gamma_1, \dots, \gamma_g$ “around” the holes and $\gamma_{g+1}, \dots, \gamma_{2g}$ “through” the holes). We choose a basis $\omega_1, \dots, \omega_g$ of $H^0(C, \Omega^1_C)$ such that the period matrix has the form
\[ \Bigl( \int_{\gamma_k} \omega_j \Bigr)_{j=1,\dots,g;\ k=1,\dots,2g} = (I_g \ \ Z). \]
Then $\omega_1, \dots, \omega_g$ give rise to complex coordinates on the torus $J$, which we identify with $\mathbb{C}^g/\Lambda$ such that $\gamma_1, \dots, \gamma_g$ correspond to the standard basis. Since $H$ is a positive definite Riemann form, we easily deduce that $Z$ is a symmetric $g \times g$ matrix with $\operatorname{Im} Z$ positive definite. We have $\Lambda = \mathbb{Z}^g + Z \cdot \mathbb{Z}^g$ as in the example of 8.6.13. Moreover, the Riemann form in (8.13) equals the one in (8.9) on page 262. By (8.14) and 8.6.13 again, we deduce that the Riemann theta function
\[ \theta(v, Z) = \sum_{k \in \mathbb{Z}^g} \exp\bigl(\pi i\, k^t Z k + 2\pi i\, k^t v\bigr) \]
corresponds to a global section $s$ of a line bundle $L$ on $J$, with Riemann form $H$ and with $\dim \Gamma(J, L) = 1$. Riemann’s theorem (cf. [130], p.338) states that the theta divisor $\Theta$ from 8.10.8 is equal to $\mathrm{div}(s)$ up to translation.
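
The statement that the theta series defines a section of a line bundle on the torus can be checked numerically in the simplest case. The following is a minimal sketch, assuming genus $g = 1$ (so that $Z$ is a single complex number $\tau$ with positive imaginary part) and a truncated series; the function name `riemann_theta_g1`, the cutoff and the sample values are illustrative choices, not taken from the text. It verifies periodicity under the lattice vector 1 and the quasi-periodicity $\theta(v + \tau) = \exp(-\pi i \tau - 2\pi i v)\,\theta(v)$ under the lattice vector $\tau$.

```python
import cmath
import math


def riemann_theta_g1(v: complex, tau: complex, cutoff: int = 12) -> complex:
    """Truncated genus-one theta series.

    Sums exp(pi*i*k^2*tau + 2*pi*i*k*v) over |k| <= cutoff.
    The series converges only for Im(tau) > 0.
    """
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half-plane")
    total = 0j
    for k in range(-cutoff, cutoff + 1):
        total += cmath.exp(1j * math.pi * (k * k * tau + 2 * k * v))
    return total


# Sample data (assumed for illustration): tau = i, v a generic point.
tau = 1j
v = 0.3 + 0.1j
theta_v = riemann_theta_g1(v, tau)

# Translation by the lattice vector 1 leaves theta unchanged.
period_defect = abs(riemann_theta_g1(v + 1, tau) - theta_v)

# Translation by the lattice vector tau multiplies theta by an explicit factor.
factor = cmath.exp(-1j * math.pi * (tau + 2 * v))
quasi_defect = abs(riemann_theta_g1(v + tau, tau) - factor * theta_v)

print(period_defect, quasi_defect)
```

Both defects should be at the level of floating-point rounding, since the neglected terms of the series decay like $\exp(-\pi k^2 \operatorname{Im}\tau)$.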

8.11. Bibliographical notes

The presentation of the material is as in the classical treatises of S. Lang [165] and
Mumford [212], to which our exposition owes a great deal. We have also used
the modern survey articles of J.S. Milne [204], [205] at several places. The reader interested in going deeper into the complex analytic theory of abelian varieties may consult Griffiths and Harris [130].
The theory of abelian varieties over an arbitrary field was initiated by A. Weil
[327], motivated by his proof of the Riemann hypothesis, where we also find his
construction of the Jacobian. Chow’s construction is given in W.L. Chow [67].
An important part of the theory, which we have left aside, is the study of ℓ-adic representations and Tate groups, for which we refer to Mumford [212].
On another point, while Weierstrass models for elliptic curves are very useful and
simple to study, it is very difficult to describe abelian varieties of dimension > 1
by homogeneous equations. This leads to a deep study of theta functions for which
the reader may consult D. Mumford’s Tata lectures [214].
9 NÉRON–TATE HEIGHTS

9.1. Introduction

We have already seen the advantages of Weil’s normalized height $h(x)$ on $\mathbb{G}_m$ compared with more naive definitions: it is homogeneous of degree 1, it is not negative, and torsion points on $\mathbb{G}_m$ are precisely the points of height 0 (Kronecker’s theorem in 1.5.9). In particular, the height defines a distance function on $\mathbb{G}_m(\mathbb{Q})/\mathrm{tors}$. The heights associated to divisors of varieties studied in Chapter 2 retain similar properties only if we consider them up to bounded functions.
Working, as Weil did, only in the associated equivalence class is formally pleasing
because of its functorial properties, but the price paid is that Weil’s equivalence
relation on heights is too coarse for some of the most important applications, such
as the proof of Faltings’s theorem.
It was a fundamental discovery of Néron that Weil’s equivalence classes of heights
associated to divisors on abelian varieties contain a unique representative with
all the nice functorial properties of Weil’s equivalence classes. Néron’s original
construction gave much more, namely a ‘best model’ (the Néron model) for abelian
varieties over a number field, and a decomposition of the height as a sum of local
heights, much in the same way as it happens for the height on Gm . The Néron
heights obtained in this way can be decomposed as a sum of a quadratic function
and a linear function, which is an extremely useful property in applications.
For many applications, it is enough to have the global Néron height. An elementary
proof of the existence of a normalized height associated to a divisor class on an
abelian variety was then found by Tate. This turns out to be sufficient for our
purposes in this book.
In Sections 9.2 and 9.3 we construct the Néron–Tate heights using Tate’s limit
argument and study the associated bilinear form. In Section 9.4 we make a deeper
study of the Néron–Tate height on Jacobians, in particular the height associated to
the Poincaré class, which will be needed in the proof of Faltings’s theorem, given
in Chapter 11. We prove Mumford’s formula for the height h∆ associated to the
diagonal ∆ on the product C × C of a curve C with itself, and then deduce
Mumford’s gap principle for rational points on a curve.


In Section 9.5, which is based on Section 2.7, we describe Néron’s canonical lo-
cal heights. This complements the picture and will not be used anywhere else in
this book. In Section 9.6, we use the theory of heights to deduce and extend a
result of Sprindžuk on the distribution of poles and, as an application, we give
a generalization of an old theorem of Runge leading in turn to a new proof of
Hilbert’s irreducibility theorem, based on the theory of local heights. This section
depends on additional material requiring more knowledge from the reader and may
be skipped in a first reading.
Chapters 2 (at least Sections 2.2–2.4) and 8 are the prerequisites for reading this
chapter.

9.2. Néron–Tate heights

Let K be a field with product formula and let A be an abelian variety over K .
All varieties and morphisms are assumed to be defined over K .
9.2.1. Let $X$ be a complete variety over $K$. By Theorem 2.3.8, we have the height homomorphism
\[ \mathbf{h} : \mathrm{Pic}(X) \longrightarrow \mathbb{R}^{X(K)}/O(1), \]
which associates to $c$ the equivalence class of heights $\mathbf{h}_c$.
In general, there is no canonical height function associated to $c \in \mathrm{Pic}(X)$: height functions are only determined up to bounded functions. But on an abelian variety, there is a canonical choice $\hat{h}_c$ of a height function in any class $\mathbf{h}_c$, characterized by good behaviour with respect to the group operation. The reader is assumed to be familiar with the notions and results from Section 8.6.
By the theorem of the cube in 8.6.11, for every $c \in \mathrm{Pic}(A)$ we have a quadratic function
\[ \mathrm{Mor}(X, A) \longrightarrow \mathrm{Pic}(X), \qquad \varphi \longmapsto \varphi^*(c). \tag{9.1} \]
Note that the decomposition $c = c_+ + c_-$ of $c$ into an even part $c_+$ and an odd part $c_-$ (Corollary 8.8.2) gives a decomposition of our quadratic function into a quadratic form $\varphi \mapsto \varphi^*(c_+)$, hence with the homogeneity property
\[ (n\varphi)^*(c_+) = n^2 \varphi^*(c_+) \]
(see Proposition 8.7.1), and into a linear form $\varphi \mapsto \varphi^*(c_-)$ (see Theorem 8.8.3). The composite of the height homomorphism and the quadratic function in (9.1) is a quadratic function
\[ q : \mathrm{Mor}(X, A) \longrightarrow \mathbb{R}^{X(K)}/O(1), \qquad \varphi \longmapsto \mathbf{h}_{\varphi^*(c)}. \]
We conclude that $q = q_+ + q_-$ for the quadratic form $q_+(\varphi) := \mathbf{h}_{\varphi^*(c_+)}$ and the linear form $q_-(\varphi) := \mathbf{h}_{\varphi^*(c_-)}$. Since 2 is invertible in the abelian group $\mathbb{R}^{X(K)}/O(1)$, this decomposition is unique, in contrast to $c = c_+ + c_-$, which is unique only up to 2-torsion in $\mathrm{Pic}(X)$. In terms of 8.6.5, $2q_+$ (resp. $2q_-$) is the associated quadratic (resp. linear) form of $q$.
9.2.2. Let $h_{c_\pm}$ be an arbitrary height function in the class $\mathbf{h}_{c_\pm}$. For any integer $n$, we have $n^2 \mathbf{h}_{c_+} = \mathbf{h}_{[n]^*(c_+)}$ and $n \mathbf{h}_{c_-} = \mathbf{h}_{[n]^*(c_-)}$. By Theorem 2.3.8, there is a constant $C(n)$ such that for every $a \in A$
\[ |h_{c_+}(na) - n^2 h_{c_+}(a)| \le C(n) \]
and
\[ |h_{c_-}(na) - n\, h_{c_-}(a)| \le C(n). \]
These conditions serve to choose a canonical height function. We can do it in the following abstract situation.
9.2.3. Let $\mathcal{N}$ be a multiplicatively closed subset of $\mathbb{R}$ (resp. $\mathbb{R}_+$) acting on a set $S$ by means of a map such that $n(mx) = (nm)x$ for $x \in S$. A function $h : S \longrightarrow \mathbb{R}$ is quasi-homogeneous of degree $d \in \mathbb{N}$ (resp. $d \in \mathbb{R}_+$) for $\mathcal{N}$ if for $n \in \mathcal{N}$ there is a positive constant $C(n)$ such that
\[ |h(nx) - n^d h(x)| \le C(n) \quad \text{for every } x \in S, \tag{9.2} \]
and is homogeneous of degree $d$ for $\mathcal{N}$ if $h(nx) = n^d h(x)$.

The example we have in mind is the following: $S = A(K)$, $h = h_c$, $\mathcal{N} = \mathbb{Z}$ and the action of $n$ is multiplication by $n$ in the abelian group $A(K)$. Then we have seen in 9.2.2 that $h_{c_+}$ and $h_{c_-}$ are quasi-homogeneous of degree 2 and 1.
Lemma 9.2.4. Let $\mathcal{N}$ act on the set $S$ as before and let $h : S \longrightarrow \mathbb{R}$ be quasi-homogeneous of degree $d > 0$. If $\mathcal{N}$ has an element of absolute value $> 1$, then there is a unique homogeneous function $\hat{h} : S \longrightarrow \mathbb{R}$ of degree $d$ for $\mathcal{N}$ such that $h - \hat{h}$ is bounded.
Proof: Assume for a moment that a homogeneous function $\hat{h}$ of degree $d$ for $\mathcal{N}$ exists, with $h - \hat{h}$ bounded. Then for $x \in S$ and $n \in \mathcal{N}$, we have
\[ \hat{h}(x) = \lim_{|n| \to \infty} n^{-d}\, \hat{h}(nx) = \lim_{|n| \to \infty} n^{-d}\, h(nx), \]
because $h - \hat{h}$ is bounded. This proves uniqueness and gives us an idea of how to show existence. Apparently, in order for this argument to succeed, we need $C(n) = o(n^d)$, a condition we do not want to impose a priori. On the other hand, noting that $h(mnx) = h(m(nx))$ allows us to get control of $C(mn)$ in terms of $C(m)$ and $C(n)$. This is enough for proving the existence of the limit if we stay with a suitable subsequence, and this suffices for the proof. The details are as follows.
Let us fix $m \in \mathcal{N}$, $m > 1$. For a positive integer $r$, estimate (9.2) with $n = m$ and $m^{r-1}x$ in place of $x$ gives
\[ |h(m^r x) - m^d h(m^{r-1} x)| \le C(m), \]
whence
\begin{align*}
|h(m^r x) - m^{rd} h(x)| &= \Bigl| \sum_{i=1}^{r} \bigl( m^{d(i-1)} h(m^{r-i+1} x) - m^{di} h(m^{r-i} x) \bigr) \Bigr| \\
&\le \sum_{i=1}^{r} m^{d(i-1)} \cdot \bigl| h(m^{r-i+1} x) - m^d h(m^{r-i} x) \bigr| \\
&\le \frac{m^{dr} - 1}{m^d - 1}\, C(m).
\end{align*}
(To be precise, we should always use $x$ instead of $m^0 \cdot x$.) Replacing $x$ by $m^s x$ for any $s \in \mathbb{N}$, we get
\[ |h(m^{r+s} x) - m^{rd} h(m^s x)| \le \frac{m^{dr} - 1}{m^d - 1} \cdot C(m) \]
and we conclude that
\[ |m^{-(r+s)d} h(m^{r+s} x) - m^{-sd} h(m^s x)| \le \frac{C(m)}{(m^d - 1)\, m^{ds}} \tag{9.3} \]
for every $r, s \in \mathbb{N}$. This shows that
\[ \bigl( m^{-sd} h(m^s x) \bigr)_{s \in \mathbb{N}} \]
is a Cauchy sequence, and we denote by $\hat{h}(x)$ its limit. Using (9.3) for $s = 0$, $r \to \infty$, we get
\[ |\hat{h}(x) - h(x)| \le \frac{C(m)}{m^d - 1}. \]
If we use again (9.2) with $m^s x$ in place of $x$ and $n \in \mathcal{N}$, we obtain
\[ \hat{h}(nx) = \lim_{s \to \infty} m^{-sd} \bigl( h(m^s n x) - n^d h(m^s x) + n^d h(m^s x) \bigr) = n^d\, \hat{h}(x). \]
We have proved the existence of a homogeneous function $\hat{h}$ of degree $d$ for $\mathcal{N}$, such that $\hat{h} - h$ remains bounded. □
If we combine 9.2.1–9.2.4, then we obtain canonical global height functions asso-
ciated to every class of Pic(A). This procedure is called Tate’s limit argument.
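
Tate's limit argument can be tried out on a toy example before applying it to heights. The sketch below is illustrative only: the set $S$ is the real line, the multiplicative set consists of the positive integers acting by multiplication, and the quasi-homogeneous function $h(x) = x^2 + \sin x$ of degree 2 is an assumption chosen so that the homogeneous limit is visibly $x^2$. The name `tate_limit` and its parameters are hypothetical.

```python
import math


def tate_limit(h, x, degree=2, m=2, steps=20):
    """Approximate the homogeneous function attached to h.

    Returns m^(-steps*degree) * h(m^steps * x), one term of the Cauchy
    sequence used in the proof of Lemma 9.2.4 (with s = steps).
    """
    return h((m ** steps) * x) / float(m ** (steps * degree))


def h(x):
    # Toy quasi-homogeneous function of degree 2:
    # |h(n*x) - n^2 * h(x)| = |sin(n*x) - n^2 * sin(x)| <= n^2 + 1.
    return x * x + math.sin(x)


approx_at_3 = tate_limit(h, 3.0)
approx_at_1_5 = tate_limit(h, 1.5)

# The limit function x -> x^2 is homogeneous of degree 2: f(2x) = 4 f(x).
print(approx_at_3, 4 * approx_at_1_5)
```

The bounded perturbation $\sin x$ is divided by $m^{2s}$ and disappears in the limit, which is exactly how the bounded ambiguity of a Weil height is removed.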

Corollary 9.2.5. Let $c \in \mathrm{Pic}(A)$ and let $c = c_+ + c_-$ be a decomposition into an even part $c_+$ and an odd part $c_-$. Then the classes $\mathbf{h}_{c_\pm}$ are independent of the choice of the decomposition. There is a unique homogeneous height function $\hat{h}_{c_\pm}$ in the class $\mathbf{h}_{c_\pm}$, of degree 2 in the $+$ case and degree 1 in the $-$ case.

Definition 9.2.6. The height function $\hat{h}_c := \hat{h}_{c_+} + \hat{h}_{c_-}$ is called the Néron–Tate height associated to $c$.

All the formalism in Chapter 2 involving heights on abelian varieties is now true for Néron–Tate heights as exact equations, and not just up to bounded functions. More precisely:
Theorem 9.2.7. The Néron–Tate heights on abelian varieties have the following properties:

(a) The map
\[ \hat{h} : \mathrm{Pic}(A) \longrightarrow \mathbb{R}^{A(K)}, \qquad c \longmapsto \hat{h}_c \]
is a homomorphism.
(b) If $\varphi : A \longrightarrow B$ is a homomorphism of abelian varieties, then
\[ \hat{h}_{\varphi^*(c)} = \hat{h}_c \circ \varphi \]
for any $c \in \mathrm{Pic}(B)$.
(c) Let $c \in \mathrm{Pic}(A)$ be even. If $c$ is base-point free or ample, then $\hat{h}_c \ge 0$.
Proof: For (a), we have to prove the identity
\[ \hat{h}_c + \hat{h}_{c'} = \hat{h}_{c + c'} \]
for $c, c' \in \mathrm{Pic}(A)$, while for (b) we need
\[ \hat{h}_{\varphi^*(c)} = \hat{h}_c \circ \varphi \]
for $c \in \mathrm{Pic}(B)$.
It is enough to prove them for odd or even $c$. Both sides of either identity are then homogeneous and in the same class, by Theorem 2.3.8. Corollary 9.2.5 shows that they are equal. This proves (a) and (b).
To prove (c), we may assume that $c$ is base-point free. For if $c$ is ample, then there is an $m \in \mathbb{N}$ such that $mc$ is very ample. In particular, $mc$ is base-point free and by (a), we have $m\hat{h}_c = \hat{h}_{mc}$. So let $c$ be base-point free and even. There is a morphism $\varphi$ of $A$ into some projective space such that $\varphi^* O_{\mathbb{P}^n_K}(1)$ is in the class of $c$ (see A.6.8) and hence $h_\varphi$ is in the class $\mathbf{h}_c$ (see 2.4.5). We have seen in the proof of Lemma 9.2.4 that
\[ \hat{h}_c(a) = \lim_{n \to \infty} n^{-2} h_\varphi(na) \]
for any $a \in A$. Since $h_\varphi$ is a non-negative function, the same holds for $\hat{h}_c$. □
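
For a concrete instance of the limit in part (c), one can take an elliptic curve and the height of the $x$-coordinate, which belongs to an even base-point free class. The sketch below is a hedged illustration and not part of the text: it assumes the curve $y^2 = x^3 - 2$ over $\mathbb{Q}$ with the point $P = (3, 5)$, uses the standard duplication formula for the $x$-coordinate, and follows the subsequence $n = 2^s$. The function names are hypothetical.

```python
from fractions import Fraction
import math


def double_x(x: Fraction) -> Fraction:
    """x-coordinate of 2P on y^2 = x^3 - 2, given x = x(P).

    Specialization of x(2P) = (x^4 - 2a x^2 - 8b x + a^2) / (4(x^3 + a x + b))
    to a = 0, b = -2. The denominator never vanishes on rational points,
    because x^3 = 2 has no rational solution.
    """
    return (x**4 + 16 * x) / (4 * (x**3 - 2))


def naive_height(x: Fraction) -> float:
    """Weil height of x = p/q in lowest terms: log max(|p|, |q|)."""
    return math.log(max(abs(x.numerator), abs(x.denominator)))


def tate_approximations(x: Fraction, steps: int) -> list:
    """Return [4^(-s) * h(x(2^s P)) for s = 0, ..., steps]."""
    values = []
    for s in range(steps + 1):
        values.append(naive_height(x) / 4**s)
        if s < steps:
            x = double_x(x)
    return values


approx = tate_approximations(Fraction(3), 6)  # P = (3, 5)
for s, value in enumerate(approx):
    print(s, value)
```

Every term is a non-negative number, so the limit is non-negative as well, and successive terms differ by at most a constant times $4^{-s}$, in line with estimate (9.3).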

Theorem 9.2.8. The Néron–Tate height $\hat{h}_c$ is the unique quadratic function in the class $\mathbf{h}_c$. Moreover, $2\hat{h}_{c_+}$ is the associated quadratic form and $2\hat{h}_{c_-}$ is the associated linear form.

Proof: First, we note that the function
\[ b(a, a') := \hat{h}_c(a + a') - \hat{h}_c(a) - \hat{h}_c(a') \]
is bilinear in $a, a'$. This follows easily from the theorem of the cube in 8.6.11, using Theorem 9.2.7. The associated quadratic and linear forms are given by
\[ \hat{h}_c(a) \pm \hat{h}_c(-a) = \hat{h}_{c \pm [-1]^* c}(a) = 2\hat{h}_{c_\pm}(a), \]
again by Theorem 9.2.7.
It remains to prove uniqueness. By definition, the quadratic function is determined up to bounded functions. Hence the same is true for the associated quadratic (or linear) form. Corollary 9.2.5 shows that they are unique. □
Remark 9.2.9. As observed by several authors, the Tate construction given before is not limited to abelian varieties. Let $X$ be a projective variety over $K$ and let $\varphi : X \to X$ be a morphism over $K$. Assume that we have $c \in \mathrm{Pic}(X)$ and $k, l \in \mathbb{Z}$, $|k| > |l|$, such that
\[ l\varphi^*(c) = kc. \tag{9.4} \]
Then we claim that there is a unique height function $\hat{h}_\varphi$ in the class $\mathbf{h}_c$ such that
\[ l\,\hat{h}_\varphi(\varphi(x)) = k\,\hat{h}_\varphi(x). \]
As a special case, the Néron–Tate height on an abelian variety for an even or odd class is obtained by taking $\varphi = [m]$ for some $m \in \mathbb{N}$, $m \ge 2$.
In order to prove the claim, we may assume $l \ne 0$, because if $l = 0$ clearly we have $\hat{h}_\varphi = 0$. We consider the semigroup $\mathcal{N} := \{\lambda^r \mid r \in \mathbb{N}\}$, where $\lambda := k/l$. Then $\mathcal{N}$ acts on the set $S := X$ by means of $\lambda^r \cdot x := \varphi^r(x)$, where $\varphi^r = \varphi \circ \varphi \circ \cdots \circ \varphi$ is the composition of $\varphi$ with itself $r$ times. Then (9.4) proves that a height function $h_c$ is quasi-homogeneous of degree 1 for $\mathcal{N}$, that is, $h_c(\varphi(x)) = \lambda\, h_c(x) + O(1)$ (use Theorem 2.3.8). By Lemma 9.2.4, we get the claim for
\[ \hat{h}_\varphi(x) := \lim_{r \to \infty} \lambda^{-r} h_c(\varphi^r(x)). \]
The same proof as for Theorem 9.2.7(c) shows that $\hat{h}_\varphi$ is a non-negative function if $c$ is base-point free or ample.
Theorem 9.2.10. If $K$ is a number field and $c$ is ample, then $\hat{h}_\varphi(x) = 0$ if and only if $x$ is preperiodic, i.e. the sequence $x, \varphi(x), \varphi^2(x), \dots$ contains only finitely many distinct points.
Proof: If $l = 0$, then a positive multiple of $c$ is very ample and 0 (use A.6.10). Thus $X$ is finite (use A.6.15) and the claim follows from $\hat{h}_\varphi = 0$. So we may assume $l \ne 0$.
Suppose that $x$ is preperiodic. Then the sequence $(\varphi^r(x))$ contains only finitely many distinct elements, making it plain that
\[ \hat{h}_\varphi(x) = \lim_{r \to \infty} \lambda^{-r} h_c(\varphi^r(x)) = 0. \]
Suppose that $\hat{h}_\varphi(x) = 0$. Then
\[ \hat{h}_\varphi(\varphi^r(x)) = \lambda^r\, \hat{h}_\varphi(x) = 0 \]
for every $r$, therefore
\[ |h_c(\varphi^r(x))| \le |\hat{h}_\varphi(\varphi^r(x))| + C(\varphi) = C(\varphi) \]
is bounded independently of $r$. On the other hand, we have $\varphi^r(x) \in K(x)$. Hence the points $\varphi^r(x)$ have degree at most $[K(x) : \mathbb{Q}]$ and bounded height. By Northcott’s theorem in 2.4.9 they form a finite set and $x$ is preperiodic for $\varphi$. □
Example 9.2.11. The following special case is quite interesting, because of its connexions with dynamical systems.
Let $f$ be a non-constant rational function on $\mathbb{P}^1_K$. We assume that $f$, as a morphism $f : \mathbb{P}^1_K \to \mathbb{P}^1_K$, has degree $n \ge 2$. We deduce the relation
\[ \mathrm{cl}(f^* O_{\mathbb{P}^1}(1)) = n \cdot \mathrm{cl}(O_{\mathbb{P}^1}(1)) \in \mathrm{Pic}(\mathbb{P}^1_K) \cong \mathbb{Z} \]
and Remark 9.2.9 shows that there is a unique real-valued function $\hat{h}_f$ on $\mathbb{P}^1_K$ with:
(a) $|\hat{h}_f(x) - h(x)| \le C(f)$ for some constant $C(f)$ independent of $x$;
(b) $\hat{h}_f(f(x)) = n\,\hat{h}_f(x)$.
Remark 9.2.12. Note that Theorem 9.2.10 extends Kronecker’s theorem from 1.5.9. Indeed, if $f(x) = x^n$ in the situation of Example 9.2.11, we have $\hat{h}_f(x) = h(x)$, where $h$ is the Weil height. The preperiodic points of $f$ are $0$, $\infty$ and the roots of unity.
In the next section, we shall prove a counterpart of Kronecker’s theorem for abelian varieties, namely Theorem 9.3.5, proved along similar lines.
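
The canonical height of a rational map can be approximated directly from its defining limit. The following is a minimal sketch under stated assumptions: points are finite rational numbers (the point at infinity is left out), the Weil height of $p/q$ in lowest terms is $\log \max(|p|, |q|)$, and the two maps $x \mapsto x^2$ and $x \mapsto x^2 - 1$ are sample choices of degree 2. The helper names are hypothetical.

```python
from fractions import Fraction
import math


def weil_height(x: Fraction) -> float:
    """Weil height of a rational number p/q in lowest terms."""
    return math.log(max(abs(x.numerator), abs(x.denominator)))


def canonical_height_approx(f, degree: int, x: Fraction, steps: int) -> float:
    """Return degree^(-steps) * h(f^steps(x)), an approximation to h_f(x)."""
    for _ in range(steps):
        x = f(x)
    return weil_height(x) / degree**steps


def square(x: Fraction) -> Fraction:
    return x * x


def square_minus_one(x: Fraction) -> Fraction:
    return x * x - 1


# For f(x) = x^2 every term of the sequence already equals h(3/2) = log 3.
power_map_value = canonical_height_approx(square, 2, Fraction(3, 2), 8)

# 0 -> -1 -> 0 -> ... is a cycle for x^2 - 1, so 0 is preperiodic.
preperiodic_value = canonical_height_approx(square_minus_one, 2, Fraction(0), 9)

# 2 -> 3 -> 8 -> 63 -> ... escapes, so the canonical height is positive.
wandering_value = canonical_height_approx(square_minus_one, 2, Fraction(2), 8)

print(power_map_value, preperiodic_value, wandering_value)
```

The three outputs illustrate Remark 9.2.12 and Theorem 9.2.10 respectively: the power map reproduces the Weil height, a preperiodic point has height 0, and a wandering point does not.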

9.3. The associated bilinear form

Let A be an abelian variety over a field K with product formula.


9.3.1. Let $M$ be an abelian group and let $b$ be a real-valued symmetric bilinear form on $M$. We have in mind the example $M = A(K)$ and a certain bilinear form associated to a Néron–Tate height. The kernel of $b$ is the abelian group
\[ N := \{x \in M \mid b(x, y) = 0 \text{ for every } y \in M\}. \]
Then $b$ induces a symmetric bilinear form $\bar{b}$ on $\bar{M} := M/N$ and the kernel of $\bar{b}$ is zero. Since $b$ is real valued, $\bar{M}$ is torsion free and all torsion elements of $M$ are contained in $N$. We conclude that
\[ \bar{M} \longrightarrow \bar{M}_{\mathbb{R}} := \bar{M} \otimes_{\mathbb{Z}} \mathbb{R}, \qquad m \longmapsto m \otimes 1 \]
is injective. Let $M'$ be a finitely generated subgroup of $\bar{M}$. The restriction of $\bar{b}$ to the free abelian group $M'$ extends uniquely to a bilinear form $b'$ on $M'_{\mathbb{R}}$. Let $\bar{M}_{\mathbb{Q}} = \bar{M} \otimes_{\mathbb{Z}} \mathbb{Q}$. An easy argument shows that $M'_{\mathbb{Q}} \subset \bar{M}_{\mathbb{Q}}$ and so $M'_{\mathbb{R}} \subset \bar{M}_{\mathbb{R}}$. Since $\bar{M}_{\mathbb{R}}$ is the union of all $M'_{\mathbb{R}}$ and the bilinear forms $b'$ coincide on overlappings by uniqueness, we have a unique extension of $\bar{b}$ to a bilinear form $b_{\mathbb{R}}$ on $\bar{M}_{\mathbb{R}}$.

9.3.2. We would like the bilinear form $b_{\mathbb{R}}(x, y)$ to determine a scalar product and an associated norm $\|x\|^2 = b_{\mathbb{R}}(x, x)$ on $\bar{M}_{\mathbb{R}}$.
To this end, it is of course necessary that $\bar{b}(x, x) > 0$ for every $x \in \bar{M} \setminus \{0\}$. Suppose this is the case. By clearing denominators, $b(x, x) > 0$ for $x \in \bar{M}_{\mathbb{Q}} \setminus \{0\}$; therefore, by continuity we also have $b_{\mathbb{R}}(x, x) \ge 0$ for $x \in \bar{M}_{\mathbb{R}} \setminus \{0\}$.
Note however that this is not enough for $b_{\mathbb{R}}$ to be positive definite, as is seen in the following example: if $\alpha$ is a transcendental number in $\mathbb{R}$, then the quadratic form in $\mathbb{R}^2$ given by $q(x) := (x_1 - \alpha x_2)^2$ is positive semidefinite. We have $q(\alpha, 1) = 0$, but $q(x) > 0$ whenever $x \in \mathbb{Q}^2 \setminus \{0\}$, because $\alpha$ is transcendental. Thus some care is needed if we want to obtain a scalar product from the bilinear form $b$.
Lemma 9.3.3. With the notation and assumptions of 9.3.1, the bilinear form $b_{\mathbb{R}}$ is positive definite if and only if for every finitely generated subgroup $M'$ of $\bar{M}$ and for every $C > 0$ the set
\[ \{x \in M' \mid b_{\mathbb{R}}(x, x) \le C\} \]
is finite.
Proof: We may assume that $\bar{M}$ is finitely generated. Since $\bar{M}$ is torsion-free, it is a lattice in $\bar{M}_{\mathbb{R}}$. If $b_{\mathbb{R}}$ is a scalar product, then there are only finitely many lattice points in a bounded set (Proposition C.2.4). This proves the result in one direction.
Conversely, assume that $b_{\mathbb{R}}$ is not positive definite. We may assume that $b_{\mathbb{R}}$ is positive semidefinite. Otherwise, the set $\{x \in \bar{M} \mid b_{\mathbb{R}}(x, x) \le C\}$ is clearly infinite. There is a $y \in \bar{M}_{\mathbb{R}} \setminus \{0\}$ such that $b_{\mathbb{R}}(y, y) = 0$. For $b_{\mathbb{R}}$ positive semidefinite, the Cauchy–Schwarz inequality is valid. Therefore, $y$ is in the kernel of $b_{\mathbb{R}}$. By construction, the restriction of $b_{\mathbb{R}}$ to $\bar{M} \times \bar{M}$ has trivial kernel and hence $y \notin \bar{M}_{\mathbb{Q}}$.
Choose a basis $x_1, \dots, x_r$ of $\bar{M}$. It is also a basis of $\bar{M}_{\mathbb{R}}$. For any $n \in \mathbb{N}$, there is a $y_n \in \bar{M}$ such that the coordinates of $y_n - ny$ are in the interval $[0, 1]$. The elements $y_n - ny$ are contained in the compact cube
\[ \Bigl\{ \sum_{i=1}^{r} \alpha_i x_i \;\Big|\; 0 \le \alpha_i \le 1 \Bigr\}, \]
while on the other hand
\[ b_{\mathbb{R}}(y_n, y_n) = b_{\mathbb{R}}(y_n - ny, y_n - ny), \]
because $ny$ lies in the kernel of $b_{\mathbb{R}}$. Since $b_{\mathbb{R}}$ is continuous, it is bounded on that cube, say by $C$. Since $y \notin \bar{M}_{\mathbb{Q}}$, the set $\{y_n \mid n \in \mathbb{N}\}$ is infinite and contained in
\[ \{x \in \bar{M} \mid b_{\mathbb{R}}(x, x) \le C\}. \]
This proves the lemma. □
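
The finiteness criterion of the lemma can be watched failing on the example of 9.3.2. The sketch below is an illustration under assumptions that are not in the text: the lattice is $\mathbb{Z}^2$, the number $\alpha$ is taken to be $\pi$ in floating point, and lattice points are counted inside growing boxes. For the semidefinite form $(x_1 - \alpha x_2)^2$ the count keeps growing with the box, while for the positive definite form $x_1^2 + x_2^2$ it stabilizes. Function names are hypothetical.

```python
import math


def count_semidefinite(alpha: float, bound: float, box: int) -> int:
    """Count integer points with |x2| <= box and (x1 - alpha*x2)^2 <= bound."""
    count = 0
    radius = math.sqrt(bound)
    for x2 in range(-box, box + 1):
        center = alpha * x2
        # Only integers x1 near alpha*x2 can satisfy the inequality.
        lo = math.floor(center - radius) - 1
        hi = math.ceil(center + radius) + 1
        for x1 in range(lo, hi + 1):
            if (x1 - center) ** 2 <= bound:
                count += 1
    return count


def count_definite(bound: float, box: int) -> int:
    """Count integer points in the box with x1^2 + x2^2 <= bound."""
    return sum(
        1
        for x1 in range(-box, box + 1)
        for x2 in range(-box, box + 1)
        if x1 * x1 + x2 * x2 <= bound
    )


semidefinite_counts = [count_semidefinite(math.pi, 1.0, box) for box in (10, 100, 1000)]
definite_counts = [count_definite(1.0, box) for box in (10, 100)]
print(semidefinite_counts, definite_counts)
```

The unbounded growth in the first list reflects the infinite set $\{y_n\}$ constructed in the proof, with $y = (\alpha, 1)$ as the isotropic vector.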

9.3.4. Now we apply these considerations to $M = A(K)$. The reader should be familiar with our notions about quadratic functions introduced in 8.6.5. Let $c \in \mathrm{Pic}(A)$ and let $b$ be the bilinear form associated to the quadratic function $\hat{h}_c$. The associated quadratic form and hence $b$ itself depend only on $c_+$ by Theorem 9.2.8. Hence we may assume that $c$ is even. In view of 9.3.1, we get a symmetric bilinear form $b_{\mathbb{R}}$ on $\bar{M}_{\mathbb{R}}$.
Assume that $c$ is also ample. Then $\hat{h}_c$ is a non-negative function by Theorem 9.2.7 and hence $b_{\mathbb{R}}$ is positive semidefinite as in 9.3.2. There are now two problems: first, we would like to know the kernel $N$ of $b$ and, second, we would like to have at our disposal the necessary and sufficient condition of Lemma 9.3.3 for a scalar product. The latter is satisfied if there are only finitely many $L$-rational points of bounded height relative to $c$ for any finite field extension $L/K$. By Northcott’s theorem in 2.4.9, this holds for a number field.
Now we assume that the condition of Lemma 9.3.3 holds. Our goal is to determine $N$. Let $x \in A$ be a point with $\hat{h}_c(x) = 0$. Then for every integer $n$ we have $\hat{h}_c(nx) = n^2\, \hat{h}_c(x) = 0$, hence the set $\{nx \mid n \in \mathbb{Z}\}$ is finite. By the pigeon-hole principle, there will be two distinct integers $m$, $n$ such that $nx = mx$; hence $x$ is a torsion point. We have already seen in 9.3.1 that the torsion elements of $M$ are contained in the kernel $N$, and hence $N$ is the torsion subgroup of $M = A(K)$.

Theorem 9.3.5. Let $K$ be a number field and let $c$ be ample and even. Then $\hat{h}_c$ vanishes exactly on the torsion subgroup of $A(K)$. Moreover, there is a unique scalar product $\langle\ ,\ \rangle$ on the abelian group $A(K) \otimes_{\mathbb{Z}} \mathbb{R}$ such that
\[ \hat{h}_c(x) = \langle x \otimes 1, x \otimes 1 \rangle \]
for every $x \in A(K)$.

Proof: This follows from the preceding discussion in 9.3.4, because $\bar{M}_{\mathbb{R}} = \bar{M} \otimes_{\mathbb{Z}} \mathbb{R}$ is canonically isomorphic to $M \otimes_{\mathbb{Z}} \mathbb{R}$, as $N$ is the torsion subgroup of $M = A(K)$. □
In the next result, we relate b to the Néron–Tate height of the Poincaré class. We
use the definitions and notations of 8.4–8.6.

Proposition 9.3.6. Let $c \in \mathrm{Pic}(A)$ and let $b$ be the symmetric bilinear form associated to $\hat{h}_c$. Moreover, let $\delta \in \mathrm{Pic}(A \times \hat{A})$ be the Poincaré class of $A$ and let $\varphi_c : A \longrightarrow \hat{A}$ be the homomorphism of Theorem 8.5.1. Then
\[ b(a, a') = \hat{h}_\delta(a, \varphi_c(a')) \]
for every $a, a' \in A(K)$.



Proof: By definition
\begin{align*}
b(a, a') &= \hat{h}_c(a + a') - \hat{h}_c(a) - \hat{h}_c(a') \\
&= \hat{h}_c \circ \tau_{a'}(a) - \hat{h}_c(a) - \hat{h}_c(a').
\end{align*}
For the moment, let us keep $a'$ fixed and view the above as functions of $a$. By Theorem 2.3.8, we conclude that
\[ \hat{h}_c + b(\cdot, a') \]
is a representative in the class $\mathbf{h}_{\tau_{a'}^*(c)}$. Then the representative above is a quadratic function too, being a sum of a quadratic function and a linear form. Now Theorem 9.2.8 shows that
\[ \hat{h}_{\tau_{a'}^*(c)}(a) = \hat{h}_c(a) + b(a, a') \]
and hence by Theorem 9.2.7 we get
\[ b(a, a') = \hat{h}_{\varphi_c(a')}(a). \]
It follows that it is enough to prove
\[ \hat{h}_{c'} = \hat{h}_\delta(\cdot, c') \tag{9.5} \]
for $c' \in \hat{A} = \mathrm{Pic}^0(A)$.
On the other hand, the point $c'$ is the pull-back of $\delta$ to $A \times \{c'\}$ (see Theorem 8.4.13), hence (9.5) holds up to a bounded function on $A(K)$ (by Theorem 2.3.8). In order to get equality, by Theorem 9.2.8 it is enough to show that $\hat{h}_\delta$ is bilinear. If $a \in A$, then applying Theorem 9.2.7 to the homomorphism $\varphi : A \longrightarrow A \times \hat{A}$ given by $\varphi(a) = (a, 0)$ and using $\varphi^*(\delta) = 0$ (Theorem 8.4.13), we get
\[ \hat{h}_\delta(a, 0) = \hat{h}_{\varphi^*(\delta)}(a) = 0. \]
In the same way we verify that
\[ \hat{h}_\delta(0, a') = 0 \]
for $a' \in \hat{A}$. We conclude that the bilinear form associated to the quadratic function $\hat{h}_\delta$, evaluated at $((a, 0), (0, a'))$, is equal to $\hat{h}_\delta(a, a')$. This proves bilinearity. □
From the proof, we also obtain
Corollary 9.3.7. With the notation of the proof of Proposition 9.3.6, we have
\[ \hat{h}_{\tau_{a'}^*(c)}(a) = \hat{h}_c(a) + b(a, a') \]
and
\[ \hat{h}_{c'} = \hat{h}_\delta(\cdot, c'). \]
Corollary 9.3.8. Let $c' \in \mathrm{Pic}^0(A)$ and let $c \in \mathrm{Pic}(A)$ be an even ample class. Then
\[ \hat{h}_{c'} = O\bigl(\hat{h}_c^{1/2}\bigr). \]
Proof: Let $a \in A(K)$. Corollary 9.3.7 shows that
\[ \hat{h}_{c'}(a) = \hat{h}_\delta(a, c'). \]
By 8.9.2, there is an element $a'$ of $A$ such that $c' = \varphi_c(a')$. Let $b$ be the bilinear form associated to $\hat{h}_c$. Then, applying Proposition 9.3.6, we conclude that
\[ \hat{h}_{c'}(a) = b(a, a'). \]
We have seen in 9.3.4 that $b$ induces a symmetric bilinear form on $A(K) \otimes_{\mathbb{Z}} \mathbb{R}$, which is positive semidefinite because $\hat{h}_c$ is a non-negative function (use 9.3.2 and Theorem 9.2.7). So we can apply the Cauchy–Schwarz inequality to get
\[ |\hat{h}_{c'}(a)|^2 \le b(a, a) \cdot b(a', a') = 4\,\hat{h}_c(a) \cdot \hat{h}_c(a'), \]
where we used $b(x, x) = 2\hat{h}_c(x)$ for the even class $c$. This proves the claim. □
Remark 9.3.9. Let $X$ be a projective smooth variety and fix an ample class $c \in \mathrm{Pic}(X)$. If $c' \in \mathrm{Pic}^0(X)$, then
\[ h_{c'} = O(|h_c|^{1/2} + 1). \]
The proof involves the basic functorial properties of Picard varieties from Section 8.4 to reduce the claim to the case of abelian varieties, where it follows from Corollary 9.3.8. By base change, we may assume that $X$ has a base point $P_0 \in X(K)$. Let $A := \mathrm{Pic}^0(X)$ and let $p_X$, $p_A$ be the Poincaré classes of $X$ and $A$. The pull-back of $p_X$ with respect to the isomorphism $A \times X \to X \times A$, obtained by switching the entries, is denoted by $p_X^t$. Then $p_X^t$ is a subfamily of $\mathrm{Pic}^0(A)$ parametrized by $X$, which follows from $(p_X)_{P_0} = 0$, $(p_X)_0 = 0$ (see Remark 8.4.11) and from the definition of algebraic equivalence. By Theorem 8.4.7 applied to $A$ instead of $X$, there is a unique morphism $\varphi : X \to \hat{A}$ with $(\mathrm{id}_A \times \varphi)^*(p_A) = p_X^t$. We may view $c'$ as point $a \in A(K)$. For $c'' := (p_A)_a \in \mathrm{Pic}^0(\hat{A})$, we get
\[ \varphi^*(c'') = \bigl((\mathrm{id}_A \times \varphi)^* p_A\bigr)\big|_{\{a\} \times X} = p_X^t\big|_{\{a\} \times X} = c', \]
where the last step was by Remark 8.4.11. Hence $h_{c'} = \hat{h}_{c''} \circ \varphi + O(1)$ by Theorem 2.3.8.
Let $\tilde{c}$ be an ample even line bundle on $\hat{A}$, whose existence is guaranteed by Proposition 8.6.4. Then there is $n \in \mathbb{N}$ such that $nc - \varphi^*(\tilde{c})$ is base-point free (see A.6.10). Hence $\hat{h}_{\tilde{c}} \circ \varphi = O(|h_c| + 1)$ by Theorem 2.3.8 and Proposition 2.3.9. We conclude that it is enough to prove the claim for $c''$, $\tilde{c}$ on the abelian variety $\hat{A}$ instead of $c'$, $c$ and this is just Corollary 9.3.8.

Corollary 9.3.10. Let $X$ be a projective smooth variety over $K$, let $c \in \mathrm{Pic}(X)$ be ample and let $c' \in \mathrm{Pic}(X)$ be algebraically equivalent to $c$. Then
\[ h_{c'} = h_c + O(|h_c|^{1/2} + 1). \]

9.4. Néron–Tate heights on Jacobians

Let $C$ be an irreducible smooth projective curve of genus $g > 0$ over a field $K$ with product formula. By base change, we may assume that $C$ has a $K$-rational base point $P_0$ (see 1.5.22). We denote the Jacobian of $C$ by $J$ and identify $J$ with its dual $\hat{J}$ as in 8.10.21 and Theorem 8.10.22. Then the Poincaré class $\delta$ corresponds to
\[ \delta = m^*\theta - p_1^*\theta - p_2^*\theta \in \mathrm{Pic}^0(J \times J), \]
where $\theta$ is the class of the theta divisor from 8.10.8, $m$ is the sum homomorphism and $p_i$ is the projection of $J \times J$ onto the $i$th factor.

Proposition 9.4.1. The Néron–Tate height $\hat{h}_\delta : J(K) \times J(K) \longrightarrow \mathbb{R}$ is a symmetric positive semidefinite bilinear form.

Proof: Using the above identification of $J$ and $\hat{J}$ by $\varphi_\theta$, Proposition 9.3.6 asserts that $\hat{h}_\delta$ is the symmetric bilinear form associated to the quadratic function $\hat{h}_\theta$. Let $\Delta : J \longrightarrow J \times J$ be the diagonal homomorphism. Proposition 8.7.1 shows $[2]^*\theta = 3\theta + \theta^-$ where $\theta^- = [-1]^*\theta$ as usual. Hence
\begin{align*}
\Delta^*\delta &= (m \circ \Delta)^*\theta - (p_1 \circ \Delta)^*\theta - (p_2 \circ \Delta)^*\theta \\
&= [2]^*\theta - 2\theta \\
&= \theta + \theta^-.
\end{align*}
For $a \in J(K)$, we conclude
\[ \hat{h}_\delta(a, a) = \hat{h}_{\theta + \theta^-}(a) \]
by Theorem 9.2.7 again. Since $\theta$ is ample (see Theorem 8.10.22), $\theta^- = [-1]^*\theta$ is ample. Therefore, $\theta + \theta^-$ is an ample even class and the corresponding Néron–Tate height is a non-negative function (Theorem 9.2.7 (c)), proving the claim. □

9.4.2. In the light of Proposition 9.4.1, we will use the following notation for $a, a' \in J(K)$:
\begin{align*}
\langle a, a' \rangle &:= \hat{h}_\delta(a, a'), \\
|a| &:= \hat{h}_\delta(a, a)^{1/2} = \hat{h}_{\theta + \theta^-}(a)^{1/2}.
\end{align*}
The symmetric positive semidefinite bilinear form $\langle\ ,\ \rangle$ is called the canonical form of $J$. For a divisor $D$, it is also convenient to use the notation $h_D$ for a height function associated to the isomorphism class of $O(D)$.

Proposition 9.4.3. Let ∆ be the diagonal in C × C and let j : C −→ J be the


natural embedding from 8.10.8. Then Mumford’s formula for P, Q ∈ C(K) is
1 1
h∆ (P, Q) = |j(P )|2 + |j(Q)|2 − j(P ), j(Q)
2g 2g
1  1 
− hθ−θ− (j(P )) − h − (j(Q)) + O(1).
2g 2g θ−θ
Proof: We abbreviate $z = j(P)$, $w = j(Q)$. By what was proved in 8.10.18–8.10.21, we get
\[
(j \times j)^*(\delta) = \mathrm{cl}\bigl(O(C \times \{P_0\} + \{P_0\} \times C - \Delta)\bigr).
\]
By Theorem 2.3.8, we get
\[
h_\Delta(P,Q) = h_{C\times\{P_0\}}(P,Q) + h_{\{P_0\}\times C}(P,Q) - \langle z, w\rangle + O(1). \tag{9.6}
\]
By Theorem 2.3.8 again, we have
\[
h_{C\times\{P_0\}}(P,Q) = h_{p_2^*([P_0])}(P,Q) = h_{[P_0]}(Q) + O(1) \tag{9.7}
\]
and similarly
\[
h_{\{P_0\}\times C}(P,Q) = h_{[P_0]}(P) + O(1). \tag{9.8}
\]
Now Proposition 8.10.15 shows that $g[P_0]$ is in the class $j^*(\theta^-)$ and Theorem 2.3.8 implies
\[
h_{[P_0]}(P) = \frac{1}{g}\hat h_{\theta^-}(z) + O(1)
 = \frac{1}{2g}|z|^2 - \frac{1}{2g}\hat h_{\theta-\theta^-}(z) + O(1).
\]
Substituting this in (9.7), (9.8) and putting the results into (9.6), we get Mumford's formula.
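The final substitution in this proof can be spelled out as follows. This is only a sketch of the bookkeeping, in the notation $z = j(P)$, $w = j(Q)$ of the proof; it uses the additivity of the Néron–Tate height in the class, which gives $\hat h_{\theta^-} = \tfrac12 \hat h_{\theta+\theta^-} - \tfrac12 \hat h_{\theta-\theta^-}$ because $2\theta^- = (\theta+\theta^-) - (\theta-\theta^-)$.

```latex
\begin{align*}
h_\Delta(P,Q)
 &= h_{[P_0]}(Q) + h_{[P_0]}(P) - \langle z, w\rangle + O(1)
   && \text{by (9.6), (9.7), (9.8)} \\
 &= \Bigl(\tfrac{1}{2g}|w|^2 - \tfrac{1}{2g}\hat h_{\theta-\theta^-}(w)\Bigr)
  + \Bigl(\tfrac{1}{2g}|z|^2 - \tfrac{1}{2g}\hat h_{\theta-\theta^-}(z)\Bigr)
  - \langle z, w\rangle + O(1) \\
 &= \tfrac{1}{2g}|z|^2 + \tfrac{1}{2g}|w|^2 - \langle z, w\rangle
  - \tfrac{1}{2g}\hat h_{\theta-\theta^-}(z) - \tfrac{1}{2g}\hat h_{\theta-\theta^-}(w) + O(1).
\end{align*}
```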
Remark 9.4.4. Since $\theta - \theta^-$ is an odd class and $\theta + \theta^-$ is an even ample class, we get by Corollary 9.3.8
\[
h_\Delta(P,Q) = \frac{1}{2g}|j(P)|^2 + \frac{1}{2g}|j(Q)|^2 - \langle j(P), j(Q)\rangle + O\bigl(|j(P)| + |j(Q)| + 1\bigr).
\]

As shown by Mumford, this formula has some rather interesting consequences for
curves of genus g ≥ 2.
Proposition 9.4.5. Assume that $C$ has genus $g \ge 2$ and let $\cos\alpha \in (\tfrac{1}{g}, 1)$, $\varepsilon > 0$. Then there is a constant $B = B(C, P_0, \varepsilon) > 0$ such that for any pair of points $P, Q \in C(K)$ one of the following four possibilities occurs:

(a) $P = Q$;
(b) $\langle j(P), j(Q)\rangle < (\cos\alpha)\cdot|j(P)|\cdot|j(Q)|$;
(c) $\min(|j(P)|, |j(Q)|) \le B$;
(d) $(2g\cos\alpha - 1 - \varepsilon)\,\min(|j(P)|, |j(Q)|) \le \max(|j(P)|, |j(Q)|)$.
Proof: We may assume that (a), (b) do not hold. Then we need to prove that either (c) or (d) holds. Again, we abbreviate $z = j(P)$, $w = j(Q)$ and assume $|z| \ge |w|$. By Remark 9.4.4, we have
\[
\langle z, w\rangle + h_\Delta(P,Q) = \frac{1}{2g}|z|^2 + \frac{1}{2g}|w|^2 + O(|z| + 1).
\]
Since $P \ne Q$ and $\Delta$ is an effective divisor, we may assume by Proposition 2.3.9 that $h_\Delta(P,Q) \ge 0$. Using also the negation of (b), we conclude
\[
(\cos\alpha)\,|z|\,|w| \le \frac{1}{2g}|z|^2 + \frac{1}{2g}|w|^2 + O(|z| + 1).
\]
We may also assume that $|z| \ge 1$. We set $r := |z|/|w|$ and, dividing the preceding inequality by $|z|\,|w|$, we find
\[
\cos\alpha \le \frac{1}{2g}\Bigl(r + \frac{1}{r}\Bigr) + O\Bigl(\frac{1}{|w|}\Bigr).
\]
We multiply the last inequality by $2g$, note that $1/r \le 1$ and find
\[
2g\cos\alpha - 1 - O\Bigl(\frac{1}{|w|}\Bigr) \le r = \frac{|z|}{|w|}.
\]
If we choose $B$ sufficiently large, then either (c) or (d) holds.
Corollary 9.4.6. With the notation of Proposition 9.4.5, let $P$, $Q$ be points on $C$. Then, if $j(P) - j(Q)$ is in the kernel of $\langle\ ,\ \rangle$, either $P = Q$ or $|j(P)| = |j(Q)| \le B$. In particular, if $j(P) - j(Q)$ is a torsion point and $|j(Q)| > B$, then $P = Q$.
Proof: It is an immediate consequence of the preceding Proposition 9.4.5. 
9.4.7. Our next goal is to count rational points of $C$ (still assuming $g \ge 2$), using a procedure similar to that used in the proof of Theorem 5.2.1. As in Section 9.3, the canonical bilinear form $b = \hat h_\delta$ extends to a symmetric positive semidefinite bilinear form on $J(K) \otimes_{\mathbb{Z}} \mathbb{R}$. Let $N_{\mathbb{R}}$ be its kernel; then the extended form induces a scalar product on $E := (J(K) \otimes_{\mathbb{Z}} \mathbb{R})/N_{\mathbb{R}}$, again denoted by $\langle\ ,\ \rangle$. By Corollary 9.4.6 above, the map
\[
i : C(K) \to E, \qquad P \mapsto j(P) \otimes 1 + N_{\mathbb{R}},
\]
is one-to-one on the subset of points $P$ such that $|j(P)| > B$.
Definition 9.4.8. A point P ∈ C(K) is called small if |j(P )| ≤ B , otherwise it
is called large.

9.4.9. We may choose (and fix) $0 < \alpha < \pi/2$ and $\varepsilon > 0$ such that $\cos\alpha \in (\tfrac{1}{g}, 1)$ and $\lambda := 2g\cos\alpha - 1 - \varepsilon > 1$.
In the euclidean space $E$, we have the following geometric interpretation of Proposition 9.4.5. If $P, Q$ are different large points such that $i(P)$ and $i(Q)$ enclose an angle $\le \alpha$ and if $|j(P)| \le |j(Q)|$, then $\lambda|j(P)| \le |j(Q)|$. This shows that we have gaps between points on $C$ pointing in approximately the same direction.
Let us consider the cone
\[
T := \{x \in E \mid \langle x, a\rangle \ge \cos(\alpha/2)\cdot|x|\cdot|a|\}
\]
with center $0$, angle $\alpha/2$ and axis through $a \in E$. We order the large points in $C(K)$ mapping to $T$ in a sequence $Q_0, Q_1, \dots$ such that
\[
B < |j(Q_0)| \le |j(Q_1)| \le |j(Q_2)| \le \dots .
\]
The above shows $|j(Q_k)| \ge \lambda^k |j(Q_0)|$ for every $k$. For $H > B$, let $n_T(H)$ be the number of large points $Q \in C(K)$ mapping to $T$ with $|j(Q)| \le H$. We get
\[
n_T(H) \le \Bigl\lceil \frac{\log(H/B)}{\log\lambda} \Bigr\rceil .
\]
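The bound for $n_T(H)$ follows from the gap estimate by a short computation, sketched here. The symbol $n$ is an abbreviation for $n_T(H)$ introduced only for this sketch, and the case $n \ge 1$ is assumed, so that the points $Q_0, \dots, Q_{n-1}$ are exactly those counted.

```latex
\begin{align*}
\lambda^{\,n-1} B \;<\; \lambda^{\,n-1}\,|j(Q_0)| \;\le\; |j(Q_{n-1})| \;\le\; H
 \quad&\Longrightarrow\quad
 n - 1 \;<\; \frac{\log(H/B)}{\log\lambda} \\
 \quad&\Longrightarrow\quad
 n \;\le\; \Bigl\lceil \frac{\log(H/B)}{\log\lambda} \Bigr\rceil ,
\end{align*}
```

where the last step uses that $n - 1$ is an integer strictly smaller than $\log(H/B)/\log\lambda$.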
9.4.10. The above bound for nT (H) is uniform with respect to T (for fixed
angle) and yields a counting of all large K -rational points of C mapping to T .
This can be used, in some circumstances, to count all large points in C(K) with
bounded height. To this end, it is necessary to assume that J(K) is a finitely gen-
erated group. Another possibility consists in fixing a priori a finitely generated
subgroup Γ of J(K), and consider only the subset of large points P ∈ C(K) for
which j(P ) ∈ Γ . The question whether we can take J(K) for such a group Γ
can then be examined independently. As we shall see later in the chapter dedicated
to the Mordell–Weil theorem, J(K) is finitely generated if K is a number field
or K is a function field over a finite field.
Thus we fix a subgroup Γ of J(K) of finite rank r := rankQ (Γ), where the rank
is the maximum number of Z -linearly independent elements in Γ .
We associate to Γ the finite-dimensional real vector subspace EΓ spanned by the
image of Γ in E . It is clear that dim(EΓ ) ≤ r .
For x ∈ E \ {0}, we set ν(x) = x/|x|. Then ν maps cones to spherical caps and
we get a bound for the minimal number of cones needed to cover EΓ by Lemma
5.2.19. From the proof of Lemma 5.2.19, we deduce immediately the following
result leading to a little sharpening of the bound for the number of large points.

Lemma 9.4.11. Let $\|\ \|$ be a norm on $\mathbb{R}^r$ and let $\rho > 0$. If $x_1, \dots, x_n \in \mathbb{R}^r$ have norm $1$ and if $\|x_i - x_j\| > \rho$ for all $i \ne j$, then $n \le (1 + 2/\rho)^r$.

Proposition 9.4.12. For $\rho = 2\sin(\alpha/2)$ and $r = \mathrm{rank}_{\mathbb{Q}}(\Gamma)$, the number $n_\Gamma(H)$ of large points $Q \in C(K)$ with $j(Q) \in \Gamma$ and $|j(Q)| \le H$ does not exceed
\[
n_\Gamma(H) \le \Bigl\lceil \frac{\log(H/B)}{\log\lambda} \Bigr\rceil \cdot (1 + 2/\rho)^r .
\]
In particular, $n_\Gamma(H) \ll \log H$.
Proof: For any $k \in \mathbb{N}$, we count the number of $Q \in C(K)$ with $\lambda^k B < |j(Q)| \le \lambda^{k+1} B$. By 9.4.9, the angle between two such points $Q, Q'$ is $> \alpha$. We conclude that $|\nu(Q) - \nu(Q')| > \rho$ for $\rho := 2\sin(\alpha/2)$. By Lemma 9.4.11, there are at most $(1 + 2/\rho)^r$ such points. Now the interval $(B, H]$ may be covered by $\lceil \log(H/B)/\log\lambda \rceil$ such intervals, proving the claim.
9.4.13. Still assuming $g \ge 2$, we may choose $\alpha = \pi/6$ and $\varepsilon > 0$ such that $\lambda = 2g\cos\alpha - 1 - \varepsilon \ge 2$ and $\rho = 2\sin(\pi/12) = \sqrt{2 - \sqrt{3}} > \tfrac{1}{2}$. With this choice of parameters, we may summarize our findings about Mumford's gap principle:
Theorem 9.4.14. Let $C$ be an irreducible smooth projective curve of genus $g \ge 2$ over $K$ with base point $P_0 \in C(K)$ leading to a closed embedding $j$ of $C$ into the Jacobian variety $J$ and let $\Gamma$ be a subgroup of $J$ of finite $\mathbb{Q}$-rank $r$. Then there is a constant $B > 0$ depending only on $C$ and $P_0$ with the following properties:

(a) If we choose any cone $T$ in $E$ with center $0$ and angle $\alpha/2$ (see 9.4.9) and if we order $\{Q \in C \mid j(Q) \in \Gamma,\ |j(Q)| > B,\ i(Q) \in T\}$ by increasing norm, then $|j(Q_{n+1})| \ge 2|j(Q_n)|$ for every $n \in \mathbb{N}$.
(b) For $H > B$, the number $n_\Gamma(H)$ of points $Q \in C$ with $j(Q) \in \Gamma$, $B < |j(Q)| \le H$ is bounded by
\[
n_\Gamma(H) \le \Bigl\lceil \frac{\log(H/B)}{\log 2} \Bigr\rceil \cdot 5^r .
\]
(c) In particular, $n_\Gamma(2H) - n_\Gamma(H) \le 2 \cdot 5^r$.
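The numerical choices behind this theorem are easy to check by machine. The following Python sketch is an illustration only: the function names `gap_parameters` and `mumford_bound` and the sample value `eps=1e-3` are assumptions made here, not notation from the text. It verifies that $\alpha = \pi/6$ gives $\lambda \ge 2$ for $g = 2$, that $\rho = 2\sin(\pi/12) = \sqrt{2-\sqrt{3}} > 1/2$, that $1 + 2/\rho < 5$, and it evaluates the bound of part (b).

```python
import math


def gap_parameters(g, eps=1e-3):
    """Return (alpha, lam, rho) for alpha = pi/6 (eps is a hypothetical sample value)."""
    alpha = math.pi / 6
    lam = 2 * g * math.cos(alpha) - 1 - eps   # lambda = 2g cos(alpha) - 1 - eps
    rho = 2 * math.sin(alpha / 2)             # chord length for the angle alpha
    return alpha, lam, rho


def mumford_bound(H, B, r):
    """Evaluate ceil(log(H/B) / log 2) * 5**r, the bound of part (b); needs H > B > 0."""
    if not (H > B > 0):
        raise ValueError("expected H > B > 0")
    return math.ceil(math.log2(H / B)) * 5 ** r


alpha, lam, rho = gap_parameters(g=2)
print(lam, rho, 1 + 2 / rho)
print(mumford_bound(10.0, 1.0, r=1))
```

Since $2g\cos(\pi/6) - 1$ increases with $g$, the check for $g = 2$ covers every genus $g \ge 2$.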

The question of bounding the number of small points requires different considerations. In
Chapter 5 we used a uniform version of Zhang’s theorem, as at the end of the proof of
Theorem 5.2.1. Another way of dealing with the problem is to apply a finiteness result, as
in Northcott’s theorem in 2.4.9. This approach leads to the following Northcott condition
already used in 4.5.1:
Definition 9.4.15. The field K satisfies the Northcott property (N) if {α ∈ K | h(α) ≤
R} is finite for every R > 0 .
Proposition 9.4.16. If K satisfies (N), then for any projective variety X over K and any
ample class c ∈ Pic(X) , the set {x ∈ X(K) | hc (x) ≤ H} is finite for every H > 0 .

Proof: Since hc is determined up to bounded functions, this property is well defined. Now
the proof is the same as our deduction of Northcott’s theorem in 2.4.9 from 1.6.8. 
By definition, $|\ |^2$ is a height function with respect to an ample class, hence:
Proposition 9.4.17. The Northcott property (N) for a field K implies that the number of
small K -rational points of C is finite.
Remark 9.4.18. If $K$ satisfies (N), then we see as in 9.3.4 that the kernel of $\langle\ ,\ \rangle$ on $J(K)$ is the torsion subgroup of $J(K)$. By Lemma 9.3.3, $\langle\ ,\ \rangle$ is a scalar product on $E_{J(K)} = J(K) \otimes_{\mathbb{Z}} \mathbb{R}$. Thus, if $\Gamma$ is a subgroup of $J(K)$, we get $\dim(E_\Gamma) = \mathrm{rank}_{\mathbb{Q}}(\Gamma)$.
9.4.19. Faltings' theorem in 11.1.1 says that $n_{J(K)}(H)$ is in fact bounded whenever $K$ is a number field. On the other hand, there are fields $K$ satisfying (N) for which $J(K)$ is finitely generated and Mumford's bound $n_{J(K)}(H) \ll \log H$ in Proposition 9.4.12 is best possible. The following example is due to Serre [277], p.80.
Example 9.4.20. Let K be a finite field with q elements and let X be an irreducible
smooth projective curve. There is a natural set MK (X ) of absolute values on the function
field $K(X)$, satisfying the product formula (see 1.4.6–1.4.9). A point $P \in \mathbb{P}^n(K(X))$ corresponds to a rational map $\varphi : X \dashrightarrow \mathbb{P}^n_K$ defined over $K$. But $\varphi$ is always defined in codimension 1 (by the valuative criterion of properness and smoothness of $X$, see A.11.10), hence is a morphism. To $\varphi$ we attach the line bundle $L = \varphi^* O_{\mathbb{P}^n}(1)$ generated by the global sections $s_j := \varphi^*(x_j)$, $0 \le j \le n$. Example 2.4.11 shows that $h(P) = \deg_L(X)$.
We claim that the field K(X) has the Northcott property (N). To prove that, let H > 0
and $f \in K(X)$ with $h(f) \le H$. By Example 1.5.23, we have
\[
h(f) = -\sum_{Z} \deg(Z)\,\min(0, \mathrm{ord}_Z(f)),
\]

where Z ranges over all prime divisors of X . It is easy to see that over a finite field, only
finitely many prime divisors of bounded degree can occur. We conclude that the number
of pole divisors of all rational functions $f$ with $h(f) \le H$ is finite. By A.8.22, $f$ is determined up to a locally constant function by its pole divisor. Since $\overline{K} \cap K(X)$ is a finite field, we get the claim.
Let C be a geometrically irreducible smooth projective curve of genus g ≥ 2 over K . We
may assume that X is geometrically irreducible and that there is a point P ∈ C(K(X)) \
C(K) . This can certainly be achieved by a finite base extension of K(X) (extending K
also), which is again a function field of an irreducible normal projective curve by Lemma
1.4.10. The points of a normal curve are regular (see A.8.5) and the finite field K is perfect,
hence the curve has to be smooth again (see A.7.12 and A.7.13).
Denote the Jacobian of C by J . By Remark 10.6.5, J(K(X)) is a finitely generated
abelian group. Thus the assumptions of Proposition 9.4.12 are satisfied.
We recall the definition of the Frobenius map over $K$. For a variety $Y$ over $K$, it is the $K$-morphism $F : Y \to Y$ given, in affine coordinates $x$ on a chart, by $x \mapsto x^q$. For a line bundle $L$ over $Y$, we have $F^*(L) \cong L^{\otimes q}$, which is easily proved by considering transition functions. Moreover, for every $K$-morphism $\psi$, we have $F \circ \psi = \psi \circ F$. In particular, $F : J \to J$ is a homomorphism of abelian varieties, defined over $K$. By Theorem 9.2.7, we get
\[
|F(z)|^2 = \hat h_{\theta+\theta^-}(F(z)) = \hat h_{F^*(\theta+\theta^-)}(z) = q|z|^2 \tag{9.9}
\]
for any $z \in J(K(X))$. Note here that both the Jacobian and the theta divisor are compatible with base change to $K(X)$ (use Corollary 8.4.10).
By (9.9), for $P_n := F^n(P) \in C(K(X))$, we have $|j(P_n)|^2 = q^n |j(P)|^2$. Since $P \notin C(K)$, the points $P_n$ are pairwise different. Here we have used that the $K(X)$-rational fixed points of a Frobenius power have coordinates in a finite field extension of $K$ and hence in $K$ (as $K$ is algebraically closed in $K(X)$ by A.4.11). Moreover, we note that $|j(P)| \ne 0$, which is clear from Remark 9.4.18 because $j(P) \notin J(\overline{K})$ and hence is not a torsion point (Proposition 8.7.2). This gives $n_{J(K(X))}(H) \gg \log H$ as desired.
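The logarithmic lower bound can be made explicit by counting the points of the Frobenius orbit. The following lines are a sketch under the assumptions of this example; the abbreviation $M := \max(B, |j(P)|)$ is introduced here only for the computation.

```latex
\begin{align*}
B < |j(P_n)| = q^{n/2}\,|j(P)| \le H
 \quad&\Longleftrightarrow\quad
 \frac{2\log\bigl(B/|j(P)|\bigr)}{\log q} \;<\; n \;\le\; \frac{2\log\bigl(H/|j(P)|\bigr)}{\log q}, \\
 \#\{\,n \ge 0 \;:\; B < |j(P_n)| \le H\,\}
 \;&\ge\; \frac{2\log(H/M)}{\log q} - 1
 \qquad (H \ge M).
\end{align*}
```

Since the points $P_n$ are pairwise different, this many large points have norm at most $H$, which matches the order of magnitude $\log H$ of the upper bound in Proposition 9.4.12.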
Example 9.4.21. The following example gives an explicit construction of a curve for which we have $n_\Gamma(H) \gg \log H$. The reader will verify that it is really not different in nature from the preceding example.
Let $K$ be a finite field with $q$ elements, let $n$ be a positive integer and let $a_{ij}(t) \in K(t)$, $i, j = 0, 1, 2$, be such that
\[
\sum_{i,j=0}^{2} a_{ij}(t)\, x_i x_j^{q^n} = 0
\]
defines a smooth plane curve $C$ of degree $q^n + 1$ in $\mathbb{P}^2_{K(t)}$ with homogeneous coordinates $(x_0 : x_1 : x_2)$. By the Jacobian criterion in A.7.15, this happens if $\det(a_{ij}(t))$ is not identically $0$. The curve $C$ has genus $g(C) = q^n(q^n - 1)/2$ (see A.13.4) and in particular $g(C) \ge 2$ if $q^n \ge 3$. Let $P$ be a point on $C$ rational over $K(t)$. It is easy to see that the tangent line to $C$ at $P$ has intersection multiplicity $q^n$ with $C$ at $P$; therefore, since $C$ has degree $q^n + 1$, it intersects $C$ residually in a single point $P'$, again rational over $K(t)$. Thus, starting with a point $P_0$, we obtain by the above tangent process a sequence of points $P_1, P_2, \dots$ all defined over $K(t)$. In general, the sequence obtained in this way is infinite and we have
\[
m \ll \log(h(P_m)) \ll m.
\]

An explicit example is the curve (now in affine coordinates $x, y$)
\[
x^{q+1} + t x y^q - y^{q+1} = 1.
\]
The associated equation $\alpha^{q+1} + t\alpha - 1 = 0$ has a solution with a formal continued fraction expansion
\[
\alpha = [0; t, t^q, t^{q^2}, t^{q^3}, \dots].
\]
It is now easy to see, using classical elementary properties of continued fractions (see for example G.H. Hardy and E.M. Wright [145] or O. Perron [233]), that, if we set $(x_{-1}, y_{-1}) = (1, 0)$, $(x_0, y_0) = (0, 1)$, $(x_1, y_1) = (1, t)$, and in general
\[
x_{m+1} = t^{q^m} x_m + x_{m-1}, \qquad y_{m+1} = t^{q^m} y_m + y_{m-1},
\]
then
\[
[0; t, t^q, \dots, t^{q^{m-1}}] = \frac{x_m}{y_m}
\]
and
\[
x_m^{q+1} + t x_m y_m^q - y_m^{q+1} = (-1)^{m+1}.
\]
Thus we obtain a polynomial solution for every odd $m$, and in fact for any $m$ if we are in characteristic $2$. Moreover, it is the same as the geometric construction above, i.e. for $P_0 = (1, 0)$, we get $P_m = (x_{2m-1}, y_{2m-1})$ for all $m \in \mathbb{N}$. We leave the details to the reader.
This gives an example of an affine curve (in fact, a Thue equation) over K[t] , of genus
q(q − 1)/2 , with infinitely many integral points over K[t] , occurring precisely at the rate
predicted by Mumford’s bound.
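The identity for the continued fraction convergents in this example can be tested by direct computation. The following Python sketch is an illustration under simplifying assumptions: it takes $q = p$ prime, so that $K = \mathbb{F}_p$, and represents polynomials in $t$ as coefficient lists (index = degree). All function names are hypothetical helpers introduced here.

```python
def trim(a):
    """Drop trailing zero coefficients in place and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a


def add(a, b, p):
    """Sum of two polynomials over F_p."""
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])


def mul(a, b, p):
    """Product of two polynomials over F_p (schoolbook multiplication)."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)


def power(a, n, p):
    """n-th power of a polynomial over F_p."""
    result = [1]
    for _ in range(n):
        result = mul(result, a, p)
    return result


def convergents(p, count):
    """Return [(x_0, y_0), ..., (x_count, y_count)] for the recurrence with q = p."""
    prev, cur = ([1], []), ([], [1])      # (x_{-1}, y_{-1}) = (1, 0), (x_0, y_0) = (0, 1)
    out = [cur]
    for m in range(count):
        a = [0] * (p ** m) + [1]          # partial quotient t^(q^m)
        nxt = (add(mul(a, cur[0], p), prev[0], p),
               add(mul(a, cur[1], p), prev[1], p))
        prev, cur = cur, nxt
        out.append(cur)
    return out


def thue_value(x, y, p):
    """Evaluate x^(q+1) + t*x*y^q - y^(q+1) in F_p[t] for q = p."""
    t = [0, 1]
    first = power(x, p + 1, p)
    second = mul(t, mul(x, power(y, p, p), p), p)
    third = [(-c) % p for c in power(y, p + 1, p)]
    return add(add(first, second, p), third, p)


for m, (x, y) in enumerate(convergents(3, 3)):
    print(m, thue_value(x, y, 3))
```

In this representation the constant $-1$ appears as `[p - 1]`, so the printed values alternate between `[2]` and `[1]` for $p = 3$, in agreement with the sign $(-1)^{m+1}$.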

9.5. The Néron symbol

We give here Néron’s decomposition of the Néron–Tate height into canonical local heights.
This will lead to the Néron symbol on arbitrary smooth complete varieties, namely a pairing
between a divisor and a disjoint zero-dimensional cycle, both assumed to be algebraically
equivalent to 0 . This section is based on Section 2.7 and provides additional material not
essential to the rest of the book. Instead of working with local heights, we adopt the equiv-
alent concept of metrized line bundles promoted by Arakelov theory with much success.
Examples are given at the end and in particular we will describe quite explicitly the theory
when applied to Tate’s elliptic curves.
Let $K$ be a field with a given set of places $M_K$. For every place $v \in M_K$, we fix an absolute value $|\ |_v$ on $K$ in the equivalence class determined by $v$. We assume that $\{v \in M_K \mid |\alpha|_v \ne 1\}$ is finite for every $\alpha \in K \setminus \{0\}$. We also fix an embedding $K \subset \overline{K}$ and denote by $M$ a set of places of $\overline{K}$ such that every $u \in M$ restricts to a $v \in M_K$. We denote by $|\ |_u$ the unique extension of $|\ |_v$ to an absolute value in the equivalence class of $u$.

9.5.1. Let $L$ be a line bundle on a complete $K$-variety $X$. For the moment, we fix a place $u$, which is the reference for boundedness and metrics. Let $\|\ \|$, $\|\ \|'$ be locally bounded metrics on $L$. By Proposition 2.6.6, the norm of the constant section $1$ with respect to the metric $\|\ \|/\|\ \|'$ of $O_X$ is a bounded function $\rho$ and we set
\[
d(\|\ \|, \|\ \|') := \sup_{x \in X} |\log\rho(x)|.
\]
Obviously, $d$ is a distance function on the space of locally bounded metrics on $L$. The existence of such metrics was shown in Proposition 2.7.5. By choosing one such metric $\|\ \|'$, the map $\|\ \| \mapsto \log(\|\ \|/\|\ \|')$ is a non-canonical isometry onto the Banach space of bounded real functions on $X$ with supremum norm.

Lemma 9.5.2. The distance $d$ on the space of locally bounded metrics of $L$ satisfies:
(a) If $n \in \mathbb{Z}$, then $d(\|\ \|^{\otimes n}, {\|\ \|'}^{\otimes n}) = |n| \cdot d(\|\ \|, \|\ \|')$.
(b) If $\varphi : X' \to X$ is a morphism, then $d(\varphi^*\|\ \|, \varphi^*\|\ \|') \le d(\|\ \|, \|\ \|')$.
Proof: These properties follow immediately from the definitions.

9.5.3. We pick up the case of a dynamical system as in 9.2.9. Let $\varphi : X \to X$ be a morphism of a complete $K$-variety $X$. We fix $k, l \in \mathbb{Z}$, $|k| > |l|$, and we consider a line bundle $L$ on $X$ with an isomorphism
\[
\theta : \varphi^*(L)^{\otimes l} \longrightarrow L^{\otimes k}. \tag{9.10}
\]

Theorem 9.5.4. There is a unique locally bounded $M$-metric $(\|\ \|_u)_{u \in M}$ on $L$ satisfying
\[
\|\ \|_u^{\otimes k} \circ \theta = (\varphi^*\|\ \|_u)^{\otimes l}. \tag{9.11}
\]
Proof: Let us fix $u \in M$ as in 9.5.1 and let $B$ be the Banach space of locally bounded $u$-metrics on $L$. Note that a $u$-metric $\|\ \|$ on $L^{\otimes k}$ induces a metric $\|\ \|^{1/k}$ on $L$ characterized by $\|s(x)\|^{1/k} = \bigl(\|s^{\otimes k}(x)\|\bigr)^{1/k}$ for every local section $s$ of $L$. We consider the map
\[
\Phi : B \to B, \qquad \|\ \| \mapsto \bigl((\varphi^*\|\ \|)^{\otimes l} \circ \theta^{-1}\bigr)^{1/k}.
\]
By Lemma 9.5.2, $\Phi$ is a contraction with factor $\lambda = |l/k| < 1$. Banach's fixed point theorem in 11.5.15 yields that $\Phi$ has a unique fixed point, meaning that there is a unique locally bounded $u$-metric $\|\ \|_u$ on $L$ satisfying (9.11). To prove that $(\|\ \|_u)_{u \in M}$ is a locally bounded $M$-metric, we choose any locally bounded $M$-metric $(\|\ \|_{0,u})_{u \in M}$ of $L$. Then the proof of Banach's fixed point theorem gives $\|\ \|_u = \lim_{n \to \infty} \Phi^n(\|\ \|_{0,u})$ and a detailed analysis of the occurring estimates proves
\[
d(\|\ \|_{0,u}, \|\ \|_u) \le \frac{1}{1 - \lambda} \cdot d\bigl(\|\ \|_{0,u}, \Phi(\|\ \|_{0,u})\bigr).
\]
The right-hand side is $M$-bounded. We conclude that $(\|\ \|_u / \|\ \|_{0,u})_{u \in M}$ is a locally bounded $M$-metric and hence the same is true for $(\|\ \|_u)_{u \in M}$.
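The fixed point argument of this proof can be imitated in a toy model. The Python sketch below rests on assumptions that are not in the text: $X$ is replaced by a finite set $\{0, \dots, n-1\}$, the map $\varphi$ by an index list `phi`, the line bundle by the trivial one, a metric by the function $g = \log\|1\|$, and the isomorphism $\theta$ by a list `c` of values $\log|\theta|$. Under these assumptions, equation (9.11) reads $k\,g(x) + c(x) = l\,g(\varphi(x))$, and the operator $\Phi$ becomes $g \mapsto (l\,g\circ\varphi - c)/k$.

```python
def make_operator(phi, c, k, l):
    """Return the toy version of Phi: g -> (l * g(phi(x)) - c(x)) / k."""
    def Phi(g):
        return [(l * g[phi[x]] - c[x]) / k for x in range(len(g))]
    return Phi


def sup_dist(g, h):
    """Supremum distance between two functions on the finite set."""
    return max(abs(a - b) for a, b in zip(g, h))


def fixed_point(Phi, g0, tol=1e-13, max_iter=10_000):
    """Iterate Phi from g0 until successive iterates differ by less than tol."""
    g = g0
    for _ in range(max_iter):
        nxt = Phi(g)
        if sup_dist(g, nxt) < tol:
            return nxt
        g = nxt
    raise RuntimeError("iteration did not converge")


# Hypothetical sample data: a self-map of {0, 1, 2, 3} and arbitrary values for c.
phi = [1, 2, 0, 0]
c = [0.3, -1.0, 2.0, 0.5]
k, l = 4, 1
Phi = make_operator(phi, c, k, l)

g0 = [0.0] * 4
g = fixed_point(Phi, g0)
lam = abs(l / k)
print(g)
print(sup_dist(g0, g), sup_dist(g0, Phi(g0)) / (1 - lam))
```

The second printed line compares both sides of the a priori estimate $d(g_0, g) \le d(g_0, \Phi(g_0))/(1 - \lambda)$ that the proof uses to obtain $M$-boundedness.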
Remark 9.5.5. We denote the $M$-metric in Theorem 9.5.4 by $(\|\ \|_{\theta,u})_{u \in M}$. If $\theta'$ is another isomorphism in (9.10), then $\theta' \circ \theta^{-1}$ is an automorphism of $L^{\otimes k}$ given by multiplication with a nowhere vanishing regular function $a$. By A.6.15, $a$ is constant on every irreducible component of $X_{\overline{K}}$. From (9.11), we get
\[
|a|_u^{-1} \cdot \bigl(\|\ \|_{\theta,u}^{\otimes k} \circ \theta'\bigr) = (\varphi^*\|\ \|_{\theta,u})^{\otimes l}.
\]
Uniqueness in Theorem 9.5.4 implies $\|\ \|_{\theta',u} = |a|_u^{1/(l-k)}\, \|\ \|_{\theta,u}$.
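The exponent $1/(l-k)$ can be checked by a direct computation. The lines below are a sketch under the simplifying assumption that $a$ is a single constant (for instance if $X$ is geometrically irreducible); the positive constant $c$ is an auxiliary symbol introduced only here.

```latex
\begin{align*}
\theta' = a\,\theta, \quad \|\ \|' := c\,\|\ \|_{\theta,u}
 \;&\Longrightarrow\;
 {\|\ \|'}^{\otimes k} \circ \theta' = c^{k}\,|a|_u\,(\varphi^*\|\ \|_{\theta,u})^{\otimes l},
 \qquad
 (\varphi^*\|\ \|')^{\otimes l} = c^{l}\,(\varphi^*\|\ \|_{\theta,u})^{\otimes l}, \\
 c^{k}\,|a|_u = c^{l}
 \;&\Longleftrightarrow\;
 c = |a|_u^{1/(l-k)} .
\end{align*}
```

So $c\,\|\ \|_{\theta,u}$ satisfies (9.11) for $\theta'$ exactly for this value of $c$, and uniqueness identifies it with the metric attached to $\theta'$.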
9.5.6. Usually, we cannot hope for a canonical isomorphism θ . We clarify the role of θ by
introducing rigidifications.
Let X be an irreducible smooth complete variety over K with base point P0 ∈ X(K) .
A rigidification of a line bundle L on X is the choice of an element ν ∈ LP 0 (K) . An
isomorphism of rigidified line bundles is an isomorphism of the underlying line bundles
mapping the rigidification of the first to the rigidification of the latter.
The tensor product of two rigidified line bundles is defined by
(L1 , ν1 ) ⊗ (L2 , ν2 ) = (L1 ⊗ L2 , ν1 ⊗ ν2 ).
The pull-back with respect to a base point preserving morphism $\psi : X' \to X$ of irreducible smooth varieties over $K$ is defined by
\[
\psi^*(L, \nu) = (\psi^*(L), \psi^*(\nu))
\]
using the canonical homomorphism $\psi^* : L_{P_0} \to \psi^*(L)_{P_0'}$, where $P_0'$ denotes the base point of $X'$. The advantage of rigidifications is that they make relations in the Picard group concrete. If $(L, \nu)$, $(L', \nu')$ are rigidified line bundles with $L \cong L'$, then there is a unique isomorphism $\theta : L \to L'$ with $\theta(\nu) = \nu'$. Existence is obvious and, as any other isomorphism is given by multiplication with a nowhere vanishing regular function $\lambda$, uniqueness is clear from the fact that $\lambda$ has to be constant (use that $X$ is geometrically irreducible (cf. A.7.14) and then completeness in A.6.15). By A.4.11, we get $\lambda \in K$.
Recall that on an abelian variety, we always use 0 as a base point. The next result gives us
canonical M -metrics on rigidified line bundles of an abelian variety.
Theorem 9.5.7. For every rigidified line bundle $(L, \nu)$ of an abelian variety $A$ over $K$, there is a unique locally bounded $M$-metric $(\|\ \|_{\nu,u})_{u \in M}$ satisfying the properties:

(a) An isomorphism of rigidified line bundles is an isometry.
(b) $\|\ \|_{\nu_1,u} \otimes \|\ \|_{\nu_2,u} = \|\ \|_{\nu_1 \otimes \nu_2,u}$ $(u \in M)$.
(c) With the rigidification $1$ of $O_A$, it holds $\|\ \|_{1,u} = |\ |_u$ $(u \in M)$.
(d) If $\psi : A' \to A$ is a homomorphism of abelian varieties over $K$, then $\psi^*\|\ \|_{\nu,u} = \|\ \|_{\psi^*\nu,u}$.
Proof: First, we prove that the theorem holds for all even line bundles. We fix an integer $m \in \mathbb{Z}$, $|m| \ge 2$. By Proposition 8.7.1 and 9.5.6, there is a unique isomorphism
\[
\theta : [m]^*(L, \nu) \xrightarrow{\ \sim\ } (L, \nu)^{\otimes m^2} \tag{9.12}
\]
of rigidified line bundles. We apply Theorem 9.5.4 with $\varphi = [m]$ to get a unique locally bounded $M$-metric $(\|\ \|_{\nu,u})_{u \in M}$ on $L$ satisfying
\[
\|\ \|_{\nu,u}^{\otimes m^2} \circ \theta = [m]^*\|\ \|_{\nu,u} \qquad (u \in M). \tag{9.13}
\]
Using elementary properties of line bundles, it is easy to show that (a)–(d) hold.
To prove uniqueness, assume that every rigidified even line bundle $(L, \nu)$ on $A$ is endowed with a locally bounded $M$-metric $(\|\ \|_{\nu,u})_{u \in M}$ satisfying (a)–(d). For any integer $m$, $|m| \ge 2$, property (d) implies
\[
[m]^*\|\ \|_{\nu,u} = \|\ \|_{[m]^*\nu,u} \qquad (u \in M). \tag{9.14}
\]
Using $\theta$ as in (9.12) and (a), (b), we get
\[
\|\ \|_{[m]^*\nu,u} = \|\ \|_{\nu^{\otimes m^2},u} \circ \theta = \|\ \|_{\nu,u}^{\otimes m^2} \circ \theta. \tag{9.15}
\]
Note that (9.14) and (9.15) give identity (9.13), which characterizes the metric uniquely. This proves uniqueness.
Similarly, we show existence and uniqueness for odd line bundles replacing the tensor power $m^2$ by $m$. An arbitrary line bundle $L$ is isomorphic to the tensor product of an even line bundle $L_+$ and an odd line bundle $L_-$. We use this factorization, which is unique up to $2$-torsion in the Picard group (Corollary 8.8.2), to get well-defined canonical $M$-metrics $(\|\ \|_{\nu,u})_{u \in M}$ on $L$ with the required properties.

Remark 9.5.8. The theorem gives canonical metrics for rigidified line bundles on all
abelian varieties simultaneously. We have seen in the proof that uniqueness already fol-
lows for all rigidified line bundles on a given abelian variety from (a), (b) and the special
case ψ = [m] of (d), where m ∈ Z, |m| ≥ 2 is fixed.
Remark 9.5.9. Note that $\|\nu\|_{\nu,u} = 1$. This follows from uniqueness because the locally bounded $M$-metrics $(\|\nu\|_{\nu,u}^{-1}\, \|\ \|_{\nu,u})_{u \in M}$ also satisfy (a)–(d) of Theorem 9.5.7. If $\nu'$ is another rigidification on $L$, then
\[
\|\ \|_{\nu',u} = |\nu/\nu'|_u \cdot \|\ \|_{\nu,u} \qquad (u \in M).
\]
Hence the $M$-metric $(\|\ \|_{\nu,u})_{u \in M}$ is canonically determined by $L$ up to $(|a|_u)_{u \in M}$ for some $a \in K^\times$. The corresponding canonical local height will be canonically determined by the divisor up to $(\log|a|_u)_{u \in M}$ for some $a \in K^\times$.
We define an $M$-constant to be a family $(\gamma_v)_{v \in M}$ of real numbers with $\gamma_v \ne 0$ only for finitely many $v \in M$. Hence the canonical local height above is determined by the divisor up to an $M$-constant.
9.5.10. In order to eliminate this indeterminacy, we introduce first some notation. Let $f$ be a non-zero rational function on a smooth irreducible variety $X$. Recall that $Z_0(X_{\overline{K}})$ denotes the group of zero-dimensional cycles on the base change $X_{\overline{K}}$. We denote here by $[P]$ the cycle associated to $P \in X$. For $Z = \sum_j n_j [P_j] \in Z_0(X_{\overline{K}})$ with $\mathrm{supp}(Z) \cap \mathrm{supp}(\mathrm{div}(f)) = \emptyset$, we define
\[
f(Z) = \prod_j f(P_j)^{n_j} \in \overline{K}^\times.
\]
If $\lambda_D$ is a local height with respect to a Néron divisor $D$ on $X$, it is natural to extend it additively to zero-dimensional cycles, i.e.
\[
\lambda_D(Z, u) := \sum_j n_j \lambda_D(P_j, u)
\]
for all $Z = \sum_j n_j [P_j] \in Z_0(X_{\overline{K}})$ with $\mathrm{supp}(Z) \cap \mathrm{supp}(D) = \emptyset$ and $u \in M$.

By $B_0(X)$, we denote the subgroup of $Z_0(X_{\overline{K}})$ which is the kernel of the degree map. In the next result, we will make use of the fact that the indeterminacy of the canonical local heights from Remark 9.5.9 cancels by restriction to $B_0(X)$.
We also make use of the pull-back $\tau_a^*$ of cycles with respect to translation. Pull-back of cycles is defined with respect to flat morphisms, but here for an isomorphism it is simply the inverse of push-forward, meaning that $\tau_a^*(Y) = Y - a$ for any prime cycle.
Theorem 9.5.11. Let $A$ be an abelian variety over $K$. For $u \in M$, a divisor $D$ on $A$ and $Z \in B_0(A)$ with $\mathrm{supp}(D) \cap \mathrm{supp}(Z) = \emptyset$, there is a pairing $(D, Z)_u \in \mathbb{R}$ called the Néron symbol, which is uniquely characterized by the following properties:

(a) If $D, D'$ are divisors with $(\mathrm{supp}(D) \cup \mathrm{supp}(D')) \cap \mathrm{supp}(Z) = \emptyset$, then
\[
(D + D', Z)_u = (D, Z)_u + (D', Z)_u.
\]
(b) If $Z, Z' \in B_0(A)$ with $\mathrm{supp}(D) \cap (\mathrm{supp}(Z) \cup \mathrm{supp}(Z')) = \emptyset$, then
\[
(D, Z + Z')_u = (D, Z)_u + (D, Z')_u.
\]
(c) If $f \in K(A)^\times$ with $\mathrm{supp}(\mathrm{div}(f)) \cap \mathrm{supp}(Z) = \emptyset$, then
\[
(\mathrm{div}(f), Z)_u = -\log|f(Z)|_u.
\]
(d) If $a \in A$, then
\[
(\tau_a^*(D), \tau_a^*(Z))_u = (D, Z)_u.
\]
(e) For every $P_0 \in A \setminus \mathrm{supp}(D)$, the function
\[
(A \setminus \mathrm{supp}(D)) \times M \to \mathbb{R}, \qquad (P, u) \mapsto (D, [P] - [P_0])_u
\]
is locally $M$-bounded.
Proof: We endow $O(D)$ with a rigidification $\nu$. Then Theorem 9.5.7 gives a canonical locally bounded $M$-metric $\|\ \|_{\nu,u}$ on $O(D)$ and we denote the corresponding Néron divisor by $\widehat{D}^\nu$. It gives rise to a local height $\lambda_{\widehat{D}^\nu}$ on $A$, which we extend to $Z_0(A_{\overline{K}})$ as in 9.5.10. For $Z \in B_0(A)$ with $\mathrm{supp}(D) \cap \mathrm{supp}(Z) = \emptyset$, we set
\[
(D, Z)_u := \lambda_{\widehat{D}^\nu}(Z, u).
\]
We have seen in Remark 9.5.9 that $\lambda_{\widehat{D}^\nu}$ depends on the rigidification $\nu$ by an additive $M$-constant. Since $\deg(Z) = 0$, we conclude that $(D, Z)_u$ does not depend on the rigidification.
Properties (a), (c), (e) follow immediately from Theorem 9.5.7, and (b) is obvious from the definition of the local height on $Z_0(A_{\overline{K}})$. In order to prove (d), let $\nu, \nu_a$ be rigidifications on $O(D)$ and $O(\tau_a^* D)$. Lemma 9.5.13 below shows that $\tau_a^*\|\ \|_{\nu,u} = C_u \cdot \|\ \|_{\nu_a,u}$ for a constant $C_u > 0$. As we have seen above, $M$-constants do not influence the pairing and (d) follows from
\[
(\tau_a^*(D), \tau_a^*(Z))_u = \lambda_{\tau_a^*\widehat{D}^\nu}(\tau_a^*(Z), u) = \lambda_{\widehat{D}^\nu}\bigl((\tau_a)_* \circ \tau_a^*(Z), u\bigr) = \lambda_{\widehat{D}^\nu}(Z, u).
\]
To prove uniqueness, let $(\ ,\ )'$ be another pairing with properties (a)–(e). We consider
\[
[D, Z]_u := (D, Z)_u - (D, Z)'_u.
\]
By (c), it depends only on the rational equivalence class cl(D) of D . Now we need the
following moving lemma:
Lemma 9.5.12. Given a Cartier divisor $D$ and finitely many points $P_1, \dots, P_g$ on a projective variety over an infinite field, there is always a Cartier divisor $D'$ such that $D - D'$ is a principal Cartier divisor with
\[
\{P_1, \dots, P_g\} \cap \mathrm{supp}(D') = \emptyset.
\]
Proof: Every Cartier divisor on a projective variety is, up to a principal Cartier divisor, equal to the difference of two very ample divisors (see A.6.10(a), A.8.16). So we may assume $D$ very ample giving rise to a closed embedding $\iota$ into projective space. Then $K$ infinite allows us to choose a hyperplane $H$ omitting $P_1, \dots, P_g$ and we get the claim for $D' := \iota^*(H)$.
Continuation of the proof of Theorem 9.5.11: Note first that in the whole section we have
implicitly assumed that K is infinite. Otherwise, no place on K would exist. The moving
lemma allows us to define the pairing [D, Z]u for all divisors D and all Z ∈ B0 (A)
without assuming $\mathrm{supp}(D) \cap \mathrm{supp}(Z) = \emptyset$. By the theorem of the square in 8.5.2, we have
\[
m(\tau_a^*(D) - D) \sim \tau_{ma}^*(D) - D
\]
for every $m \in \mathbb{Z}$ and hence (d) proves
\[
m[\tau_a^*(D) - D, Z]_u = [D, (\tau_{ma})_*(Z) - Z]_u. \tag{9.16}
\]
For a moment, we assume that $Z = [P] - [P_0]$ for $P, P_0 \in A \setminus \mathrm{supp}(D)$. By (e), the right-hand side of (9.16) is a locally $M$-bounded function of $b = ma$ on the open subset
\[
U_D := \{b \in A \mid \mathrm{supp}(D) \cap \{P + b, P_0 + b\} = \emptyset\}.
\]
By the moving lemma, we may replace $D$ by finitely many divisors $D' \sim D$ such that the open subsets $U_{D'}$ cover $A$. Independence of $D$ and Remark 2.6.14 prove that the right-hand side of (9.16) is a locally $M$-bounded function of $b = ma$ on the whole $A$. Since $A$ is an $M$-bounded set (Proposition 2.6.17), we conclude that the left-hand side of (9.16) is an $M$-bounded function of $m$ and $a$. Letting $m \to \infty$ and taking into account (a), we deduce that
\[
[\tau_a^*(D), Z]_u = [D, Z]_u.
\]
By linearity (b) of the pairing in $Z$, this identity holds for all $Z \in B_0(A)$. From (d), we deduce
\[
[D, (\tau_a)_*(Z) - Z]_u = 0.
\]
Thus $[D, \cdot\,]_u$ vanishes on the subgroup of $B_0(A)$ generated by the cycles of the form $(\tau_a)_*(Z) - Z$ for varying $a \in A$ and $Z \in B_0(A)$. By induction, it is easily seen that this subgroup is equal to the kernel of the homomorphism
\[
S : B_0(A) \to A, \qquad \sum_j m_j [P_j] \mapsto \sum_j m_j P_j.
\]

In order to prove uniqueness, it is enough to show that [D, [P ] − [0]]u = 0 for every
P ∈ A . For m ∈ Z , the cycle
m ([P ] − [0]) − ([mP ] − [0])
has degree 0 and is in the kernel of S . By our considerations above and (b), we get
m[D, [P ] − [0]]u = [D, [mP ] − [0]]u . (9.17)
With the same arguments as for (9.16), we deduce that the right-hand side of (9.17) is an
M -bounded function of Q = mP . Hence the left-hand side of (9.17) is an M -bounded
function of m and P and as m → ∞ this is possible only if [D, [P ] − [P0 ]]u = 0 . This
proves uniqueness. 
Lemma 9.5.13. Let $(L, \nu)$ be a rigidified line bundle on the abelian variety $A$ over $K$, let $a \in A$ and let $\nu_a$ be a rigidification on $\tau_a^*(L)$. For $u \in M$, there is a constant $C_u > 0$ such that
\[
\tau_a^*\|\ \|_{\nu,u} = C_u \cdot \|\ \|_{\nu_a,u}.
\]
Proof: In order to compare the two metrics, we consider as in 9.5.1 the function $\rho$ on $A$ given as the norm of $1$ with respect to the metric $\tau_a^*\|\ \|_{\nu,u} / \|\ \|_{\nu_a,u}$ on $O_A$. We have to prove that
\[
\omega_{L,a}(P) := \log\rho(P) - \log\rho(0)
\]
vanishes identically in $P \in A$. By Remark 9.5.9, it is clear that $\omega_{L,a}$ does not depend on the choice of the rigidifications $\nu, \nu_a$. From Theorem 9.5.7, we easily deduce the following properties:

(a) $\omega_{L,a}$ depends only on the isomorphism class of $L$.
(b) $\omega_{L_1 \otimes L_2, a} = \omega_{L_1,a} + \omega_{L_2,a}$.
(c) If $\varphi : A' \to A$ is a homomorphism of abelian varieties and $a' \in A'$, then $\omega_{\varphi^* L, a'} = \omega_{L,\varphi(a')} \circ \varphi$.

The main point is to prove that $\omega_{L,a}$ is also linear in $a$. The theorem of the cube in 8.6.11 gives us a unique isomorphism
\[
\bigotimes_{I \subset \{1,2,3\}} \Bigl(\sum_{i \in I} p_i\Bigr)^* (L, \nu)^{(-1)^{|I|}} \xrightarrow{\ \sim\ } (O_{A^3}, 1) \tag{9.18}
\]
of rigidified line bundles on $A^3$.
Now replacing $(L, \nu)$ by the metrized line bundle $\overline{L}^\nu := (L, \|\ \|_{\nu,u})$ and $(O_{A^3}, 1)$ by $O_{A^3}$ with the trivial metric, we see that this is an isometry by Theorem 9.5.7. Then by pull-back with respect to the morphism $P \mapsto (P, a, a')$, we get an isometry
\[
\tau_{a+a'}^*(\overline{L}^\nu) \otimes \tau_a^*(\overline{L}^\nu)^{-1} \otimes \tau_{a'}^*(\overline{L}^\nu)^{-1} \otimes \overline{L}^\nu \xrightarrow{\ \sim\ } O_A^C \tag{9.19}
\]
in the theorem of the square. Here, $O_A^C$ is endowed with a constant metric $\|1\|_C := C$ for some constant $C > 0$. The constant arises from the fact that the trivial contributions from the theorem of the cube may have non-trivial metrics. By Theorem 9.5.7, a similar isometry as in (9.19) holds, provided we replace the metrics by the canonical ones on $\tau_a^*(L)$, $\tau_{a'}^*(L)$ and $\tau_{a+a'}^*(L)$ with respect to some rigidifications. This proves immediately

(d) $\omega_{L,a+a'} = \omega_{L,a} + \omega_{L,a'}$

because the constants do not influence (d).

In order to prove that $\omega_{L,a}$ vanishes identically, it is enough to consider separately the even and odd cases. Suppose first that $L$ is even. Then from $[m]^* L \cong L^{\otimes m^2}$ we infer that
\[
m^2 \omega_{L,a} = \omega_{[m]^* L, a} = \omega_{L,ma} \circ [m] = m \cdot \omega_{L,a} \circ [m].
\]
Because the metrics in question are bounded, it is clear that $\omega_{L,a}$ is a bounded function and, letting $m \to \infty$, we conclude that $\omega_{L,a} = 0$.
If instead $L$ is odd, then there is a unique isomorphism
\[
(p_1 + p_2)^*(L, \nu) \otimes p_1^*(L, \nu)^{-1} \otimes p_2^*(L, \nu)^{-1} \xrightarrow{\ \sim\ } (O_{A \times A}, 1)
\]
of rigidified line bundles (Theorem 8.8.3). By pull-back with respect to the morphism $P \mapsto (P, a)$, we get an isometry
\[
\tau_a^* \overline{L}^\nu \otimes (\overline{L}^\nu)^{-1} \xrightarrow{\ \sim\ } O_A^C
\]
as above. This proves directly that $\omega_{L,a} = 0$.


Corollary 9.5.14. If $K$ satisfies the product formula and $L$ is a line bundle on $A$, then the Néron–Tate height $\hat h_L$ is equal to the global height associated to $(L, \|\ \|_\nu)$ (see 2.7.17), where $\|\ \|_\nu$ is the canonical $M$-metric on $L$ with respect to any rigidification $\nu$.
Proof: By (9.18) and Proposition 2.7.18, we see that $h_{(L, \|\ \|_\nu)}$ is a quadratic function in the height class $h_L$. By Theorem 9.2.8, we get the claim.

Remark 9.5.15. It is obvious that the Néron symbol is compatible with extension of the base field. So it makes sense to work over an algebraic closure $\overline{K}$. Note however that the Néron symbol does not extend to arbitrary complete smooth varieties $X$ over $K$; this happens only if $D$ belongs to the group $B^1(X)$ of divisors algebraically equivalent to zero.
Theorem 9.5.16. Let $X$ be an irreducible smooth complete variety over $K$ with base point $P_0$. For every line bundle $L$ on $X$ algebraically equivalent to $0$ and rigidification $\nu$, there is a unique locally bounded $M$-metric $(\|\ \|_{\nu,u})_{u \in M}$ on $L$ satisfying the following properties:

(a) An isomorphism of rigidified line bundles is an isometry.
(b) $\|\ \|_{\nu_1,u} \otimes \|\ \|_{\nu_2,u} = \|\ \|_{\nu_1 \otimes \nu_2,u}$ $(u \in M)$.
(c) On $O_X$ with rigidification $1$, it holds $\|\ \|_{1,u} = |\ |_u$ $(u \in M)$.
(d) If $\varphi : X' \to X$ is a morphism of irreducible smooth complete varieties over $K$ mapping the base point of $X'$ to the base point of $X$, then $\varphi^*\|\ \|_{\nu,u} = \|\ \|_{\varphi^*\nu,u}$.
Proof: Let $A := \mathrm{Pic}^0(X)$ and $\widehat{A} := \mathrm{Pic}^0(A)$. By the theory of Picard varieties (Theorem 8.4.7), there is a unique morphism $\psi : X \to \widehat{A}$ with $(\mathrm{id}_A \times \psi)^* p_A = p_X^t$, where $p_X^t$ is the pull-back of $p_X$ to $A \times X$ with respect to $(a, P) \mapsto (P, a)$. For $c = \mathrm{cl}(L) \in A$, we have
\[
\psi^*\bigl(p_A|_{\{c\} \times \widehat{A}}\bigr) = p_X|_{X \times \{c\}} = c.
\]
By Theorem 9.5.7, the rigidified line bundles in the class $p_A|_{\{c\} \times \widehat{A}}$ have canonical metrics and we use pull-back with respect to $\psi$ to get a well-defined locally bounded $M$-metric $(\|\ \|_{\nu,u})_{u \in M}$ on $L$. From the properties of Picard varieties developed in Section 8.4, we easily deduce (a)–(d). By applying Theorem 9.5.7 to odd line bundles, as we have done in its proof, and noting in addition that odd is the same as algebraically equivalent to $0$ (Theorem 8.8.3), we get uniqueness by construction.
Theorem 9.5.17. Let X be an irreducible smooth complete variety over K and let $u \in M$. For $D \in B^1(X)$ and $Z \in B_0(X)$ with $\mathrm{supp}(D) \cap \mathrm{supp}(Z) = \emptyset$, there is a Néron symbol $(D, Z)_u \in \mathbb{R}$ uniquely determined by the following properties:

(a) $(D, Z)_u$ is bilinear in D and Z.
(b) If $f \in K(X)^\times$ with $\mathrm{supp}(\mathrm{div}(f)) \cap \mathrm{supp}(Z) = \emptyset$, then $(\mathrm{div}(f), Z)_u = -\log|f(Z)|_u$.
(c) If $\varphi : X' \to X$ is a morphism of irreducible smooth complete varieties over K and $Z' \in B_0(X')$ with $\varphi(\mathrm{supp}(Z')) \cap \mathrm{supp}(D) = \emptyset$, then $(\varphi^*(D), Z')_u = (D, \varphi_*(Z'))_u$.
(d) For $P_0 \in X \setminus \mathrm{supp}(D)$, the function $P \mapsto (D, [P] - [P_0])_u$ is locally M-bounded on $(X \setminus \mathrm{supp}(D)) \times M$.
Proof: We choose a base point $P_0 \in X$ and we endow O(D) with a rigidification ν. By Theorem 9.5.16, we get a canonical Néron divisor $\hat D_\nu = (D, \|\ \|_\nu)$ and we set
$$(D, Z)_u = \lambda_{\hat D_\nu}(Z, u).$$
Since deg(Z) = 0 , it is clear that (D, Z)u does not depend on the choices of base point
and rigidification. Now we easily deduce (a)–(d) from Theorem 9.5.16.
For uniqueness, we proceed as in the proof of Theorem 9.5.11. We get a pairing [D, Z]u
with properties (a),(c), and (d), which is defined for all D ∈ B 1 (X), Z ∈ B0 (X) and
which depends only on the rational equivalence class of D . By Theorem 9.5.11, the re-
striction of the Néron symbol to abelian varieties must be unique, i.e. [D, Z]u = 0 for
abelian varieties. In general, we have seen in the proof of Theorem 9.5.16 that D is ra-
tionally equivalent to the pull-back of a divisor algebraically equivalent to 0 on the dual
abelian variety of P ic0 (X) . We conclude from (c) and the above that [D, Z]u = 0 . This
proves uniqueness. 
9.5.18. Let $X, X'$ be irreducible smooth complete varieties over K. We consider a divisor E on $X \times X'$ called a correspondence. For $P \in X$ such that $\{P\} \times X' \not\subset \mathrm{supp}(E)$ (resp. $P' \in X'$ with $X \times \{P'\} \not\subset \mathrm{supp}(E)$), we define a divisor on $X'$ (resp. X) by
$$E(P) := (p_2)_*\bigl(E.(\{P\} \times X')\bigr)$$
and
$${}^tE(P') := (p_1)_*\bigl(E.(X \times \{P'\})\bigr),$$
where $p_1, p_2$ are the projections of $X \times X'$ onto the factors and where we use the proper intersection product from A.9.20. By linearity, we extend these operations to zero-dimensional cycles. Almost by definition, the resulting divisor has to be algebraically equivalent to 0 if the zero-dimensional cycle has degree 0.
Theorem 9.5.19. Let $Z \in B_0(X)$, $Z' \in B_0(X')$ with $\mathrm{supp}(Z) \times \mathrm{supp}(Z')$ disjoint from supp(E). Then the reciprocity law
$$(E(Z), Z')_u = ({}^tE(Z'), Z)_u$$
holds for every $u \in M$.
Proof: We deal first with the case where X and $X'$ are abelian varieties. Let $P_j$ be an irreducible component of Z. For the canonical meromorphic section $s_E$ of O(E), we have
$$E(P_j) = (p_2)_*\bigl(\mathrm{div}(s_E|_{\{P_j\}\times X'})\bigr).$$
Let $(\|\ \|_{j,u})_{u\in M}$ be the canonical M-metric of $O(E(P_j))$ with respect to a rigidification. If $Z = \sum_j m_j[P_j]$ and $Z' = \sum_k m'_k[P'_k]$, then the proof of Theorem 9.5.17 shows that
$$(E(Z), Z')_u = -\sum_{j,k} m_j m'_k \log\|s_{E(P_j)}(P'_k)\|_{j,u}. \tag{9.20}$$
By Theorem 9.5.7 and Lemma 9.5.13, the pull-back of a canonical metric $\|\ \|_{E,u}$ on O(E) with respect to the morphism $P' \mapsto (P_j, P')$ is equal to $\|\ \|_{j,u}$ times a positive constant $C_u$. The latter does not influence (9.20) because $\deg(Z') = 0$, thus we obtain
$$(E(Z), Z')_u = -\sum_{j,k} m_j m'_k \log\|s_E(P_j, P'_k)\|_{E,u}.$$
This is completely symmetric in Z and $Z'$, proving the case of abelian varieties.


The general case can be reduced to the previous special case of abelian varieties by appealing to the theory of the Picard variety. First, we choose base points $P_0 \in X$ and $P'_0 \in X'$ such that $\{P_0\} \times \mathrm{supp}(Z')$ and $\mathrm{supp}(Z) \times \{P'_0\}$ are both disjoint from supp(E). The proper intersection product $E.(X \times \{P'_0\})$ (resp. $E.(\{P_0\} \times X')$) induces a well-defined divisor Y on X (resp. $Y'$ on $X'$). We define the divisors $Y \times X'$ and $X \times Y'$ on $X \times X'$ by linearity in the components. Both have support disjoint from $\mathrm{supp}(Z) \times \mathrm{supp}(Z')$. The class c of $E - Y \times X' - X \times Y'$ in $\mathrm{Pic}(X \times X')$ is a subfamily of $\mathrm{Pic}^0(X)$ parametrized by $X'$. By Theorem 8.4.7, there is a unique morphism $\varphi' : X' \to A := \mathrm{Pic}^0(X)$ with
$$(\mathrm{id}_X \times \varphi')^*(p_X) = c.$$
On the other hand, there is a unique morphism $\varphi : X \to \hat A$ with $(\mathrm{id}_A \times \varphi)^*(p_A) = p_X^t$. Therefore, we conclude that
$$(\varphi \times \varphi')^*(p_A^t) = c.$$
By the moving lemma in 9.5.12, there is a divisor Γ on $\hat A \times A$ with class $p_A^t \in \mathrm{Pic}(\hat A \times A)$ and with support disjoint from $\varphi(\mathrm{supp}(Z)) \times \varphi'(\mathrm{supp}(Z'))$. Then $E_\Gamma := (\varphi \times \varphi')^*(\Gamma)$ is well defined as a divisor (see A.8.26). By construction, there is a rational function f on $X \times X'$ with
$$E = E_\Gamma + Y \times X' + X \times Y' + \mathrm{div}(f),$$
and the support of div(f) is disjoint from $\mathrm{supp}(Z) \times \mathrm{supp}(Z')$. It is easily seen from 9.5.18 that
$$(Y \times X')(Z) = 0, \qquad (X \times Y')(Z) = 0$$
and
$$\mathrm{div}(f)(Z) = \mathrm{div}(f(Z, \cdot)).$$
From these identities and the similar ones for $Z'$, we deduce the claim for the correspondences $Y \times X'$, $X \times Y'$ and div(f). Hence we may assume that $E = E_\Gamma$. We claim that
$$(\varphi')^*\bigl(\Gamma(\varphi_* Z)\bigr) = E(Z), \qquad \varphi^*\bigl({}^t\Gamma(\varphi'_* Z')\bigr) = {}^tE(Z') \tag{9.21}$$
hold. By Theorem 9.5.17(c) and the case of abelian varieties considered above, this immediately proves the reciprocity law.
It remains to prove the two identities in (9.21). We begin by proving the first identity. By linearity, we may assume that Z = [P] with $P \in X$. The projection formula for proper intersection products (see [125], Prop.2.3) yields
$$E(P) = (p_2)_*\bigl(E.(\{P\} \times \mathrm{id}_{X'})_*(X')\bigr) = (\{P\} \times \mathrm{id}_{X'})^* E$$
and a similar identity for $\Gamma(\varphi(P))$. We deduce
$$(\varphi')^*\bigl(\Gamma(\varphi(P))\bigr) = (\varphi')^*(\{\varphi(P)\} \times \mathrm{id}_A)^*(\Gamma) = (\{P\} \times \mathrm{id}_{X'})^*(\varphi \times \varphi')^*(\Gamma) = E(P),$$
proving what we want. The argument for the second identity is essentially the same.
Corollary 9.5.20. For an irreducible smooth projective curve C over K and $u \in M$, there is a unique pairing $(D, D')_u \in \mathbb{R}$, well defined for divisors $D, D'$ of degree 0 with $\mathrm{supp}(D) \cap \mathrm{supp}(D') = \emptyset$, satisfying the following properties:

(a) It is bilinear.
(b) If $f \in K(C)^\times$, then $(\mathrm{div}(f), D')_u = -\log|f(D')|_u$.
(c) It is symmetric, i.e. $(D, D')_u = (D', D)_u$.
(d) For $P_0 \notin \mathrm{supp}(D)$, the map $(P, u) \mapsto (D, P - P_0)_u$ is locally M-bounded on $(C \setminus \mathrm{supp}(D)) \times M$.
Proof: For existence, we use the Néron symbol. Clearly, it satisfies (a), (b), (d). The diagonal ∆ is a correspondence on $C \times C$ and $\Delta(D) = D$, ${}^t\Delta(D') = D'$. By the reciprocity law in 9.5.19, we get
$$(D, D')_u = (\Delta(D), D')_u = ({}^t\Delta(D'), D)_u = (D', D)_u,$$
proving (c).
To prove uniqueness, we consider again the difference $[\ ,\ ]_u$ of two such pairings. It is clear that $[D, D']_u$ depends only on the rational equivalence classes of D and $D'$. Let $m \in \mathbb{Z}$ and $P, P_0 \in C$. By the Riemann–Roch theorem in A.13.5, there is an effective divisor $D^+$ with
$$D^+ - g[P_0] \sim m([P] - [P_0]),$$
where g is the genus of C. Because $D^+$ is an effective divisor of degree g, property (d) implies that
$$[D, m([P] - [P_0])]_u = [D, D^+ - g[P_0]]_u$$
is a bounded function of (P, m). In fact, we have to use the moving lemma as in the proof of Theorem 9.5.11 to get boundedness on the whole C. Using (a) and letting m → ∞, we get $[D, [P] - [P_0]]_u = 0$, proving the claim.
Example 9.5.21. We consider the special case K = C with the usual absolute value $|\ |_u$. Let L be a line bundle on the complex abelian variety A. We claim that a metric $\|\ \|$ on L is a canonical metric with respect to a suitable rigidification if and only if $\|\ \|$ is a $C^\infty$-metric with harmonic first Chern form. For details about the differential geometric tools, we refer to [130].
To prove the claim, note first that the first Chern form determines the metric up to multiplication with a positive constant. For a divisor D on A, there is a unique harmonic representative $\omega_D$ of the first Chern class of O(D) and we choose a $C^\infty$-metric $\|\ \|$ on O(D) with first Chern form $\omega_D$ ([130], p.148). If $Z = \sum_{j=1}^r m_j[P_j]$ is a zero-dimensional cycle of degree 0 with $\mathrm{supp}(D) \cap \mathrm{supp}(Z) = \emptyset$, then we set
$$(D, Z)_u := -\sum_{j=1}^r m_j \log\|s_D(P_j)\|.$$
Using deg(Z) = 0, it is clear that the pairing does not depend on the choice of $\|\ \|$. Obviously, the pairing fulfills properties (a)–(c) of Theorem 9.5.11. Using the fact that the harmonic forms on a complex torus are the same as the translation invariant forms, we immediately deduce (d) as well. Finally, (e) follows from smoothness of the function $P \mapsto (D, [P] - [P_0])_u$ on $A \setminus \mathrm{supp}(D)$. We conclude that $(\ ,\ )_u$ is the Néron symbol.
Let $\|\ \|'$ be the canonical metric of O(D) with respect to a rigidification. From the proof of Theorem 9.5.11, it is clear that
$$(D, [P] - [0])_u = \log\|s_D(0)\|' - \log\|s_D(P)\|'$$
for every $P \notin \mathrm{supp}(D)$. This shows easily that $\|\ \| = \|\ \|'$ up to multiplication with a positive constant.
Example 9.5.22. Now let K be a field with a complete discrete absolute value $|\ |_v$ normalized by $\log|K^\times|_v = \mathbb{Z}$. It has a unique extension to an absolute value $|\ |_u$ of $\overline{K}$ (see Proposition 1.2.7). Let A be an abelian variety over K with good reduction in v, i.e. there is a proper smooth scheme $\mathcal{A}$ over the discrete valuation ring $R_v$ with $\mathcal{A}_K = A$, called the Néron model. Note also that $\mathcal{A}$ is a group scheme unique up to isomorphism (cf. 10.3.9).
We claim that a metric $\|\ \|$ on a line bundle L of A is a canonical metric with respect to some rigidification if and only if there is a line bundle $\mathcal{L}$ on $\mathcal{A}$ with $L = \mathcal{L}_K$ and $\|\ \| = \|\ \|_{\mathcal{L}}$ (Example 2.7.20).
For the proof, we note first that the restriction map $\mathcal{L} \mapsto \mathcal{L}_K$ gives an isomorphism from $\mathrm{Pic}(\mathcal{A})$ onto Pic(A). The inverse is induced by the map $D \mapsto \overline{D}$ of divisors taking Zariski closures of components. To see this, use that the special fibre of $\mathcal{A}$ over the residue field is irreducible and therefore every divisor supported in the special fibre is rationally equivalent to 0.
For $L = \mathcal{L}_K$ with rigidification ν, we set
$$\|\ \|'_{\nu,u} := \|\nu\|_{\mathcal{L}}^{-1} \cdot \|\ \|_{\mathcal{L}}.$$
It is easily checked that our metric $\|\ \|'_{\nu,u}$ on L satisfies (a)–(d) of Theorem 9.5.7, where we allow in (d) only endomorphisms of A. Because it is also bounded (Example 2.7.20), Remark 9.5.8 yields that the metric $\|\ \|'_{\nu,u}$ is the canonical metric $\|\ \|_{\nu,u}$ from Theorem 9.5.7. This proves our claim.
Remark 9.5.23. For an archimedean place u, we may always reduce to Example 9.5.21 by base change to C. Hence canonical metrics $\|\ \|_u$ are always $C^\infty$.
For a non-archimedean place $u \in M$, canonical metrics $\|\ \|_u$ are always continuous with respect to the u-topology and satisfy $\|L(\overline{K})\|_u = |\overline{K}|_u$. This follows from Tate's limit argument in the proof of Theorem 9.5.4 and Remark 2.7.6. In the case of good reduction in a discrete valuation, Example 9.5.22 shows that even $\|L(K)\|_u = |K|_u$ holds. However, the following example will show that this is not in general the case. Note also that $|\overline{K}|_u = \bigcup_{n\in\mathbb{N}\setminus\{0\}} |K|_u^{1/n}$ (see Lemma 11.5.2).
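Example 9.5.24. Every complex elliptic curve is biholomorphic to a complex torus $\mathbb{C}/(\mathbb{Z} + \mathbb{Z}\tau)$, $\mathrm{Im}\,\tau > 0$. The map $\zeta = \exp(2\pi i z)$ gives an analytic group isomorphism to the Tate uniformization $\mathbb{C}^\times/q^{\mathbb{Z}}$, $q := e^{2\pi i \tau}$. As in 8.6.13, we consider the theta function
$$\theta(z, \tau) = \sum_{n=-\infty}^{\infty} e^{\pi i n^2 \tau + 2\pi i n z}.$$
It transforms into
$$\theta(\zeta, q) = \sum_{n=-\infty}^{\infty} \tilde{q}^{\,n^2} \zeta^n, \tag{9.22}$$
where $\tilde{q} = e^{\pi i \tau}$. As an infinite product, it has the form
$$\theta(\zeta, q) = \prod_{n=1}^{\infty} (1 - \tilde{q}^{\,2n}) \cdot \prod_{n=1}^{\infty} (1 + \tilde{q}^{\,2n-1}\zeta) \cdot \prod_{n=1}^{\infty} (1 + \tilde{q}^{\,2n-1}\zeta^{-1}),$$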
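which shows that $\theta(\cdot, q)$ has only simple zeros and they are precisely in $\zeta = -\tilde{q} \cdot q^{\mathbb{Z}}$ (for details, see K. Chandrasekharan [62], Ch.V, Th.6). Let $p : \mathbb{C}^\times \to \mathbb{C}^\times/q^{\mathbb{Z}}$ be the quotient map and let P be the 2-torsion point $p(-\tilde{q})$. Then $p^*O([P])$ is trivial and may be identified with $\mathbb{C}^\times \times \mathbb{C}$ such that the canonical global section $s_P$ of O([P]) pulls back to $\zeta \mapsto (\zeta, \theta(\zeta, q))$ (use 8.6.13).
Conversely, O([P]) may be obtained as the quotient of the trivial bundle by a $\mathbb{Z}$-action
$$\mathbb{Z} \times (\mathbb{C}^\times \times \mathbb{C}) \longrightarrow (\mathbb{C}^\times \times \mathbb{C}), \qquad (k; \zeta, v) \mapsto (q^k\zeta, e_k(\zeta, q) \cdot v).$$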
To determine the cocycle $e_k(\zeta, q)$, note that
$$\theta(q^k\zeta, q) = \sum_{n=-\infty}^{\infty} \tilde{q}^{\,n^2+2kn}\zeta^n = \tilde{q}^{-k^2} \sum_{n=-\infty}^{\infty} \tilde{q}^{\,(n+k)^2}\zeta^n = \tilde{q}^{-k^2}\zeta^{-k}\,\theta(\zeta, q)$$
and hence $e_k(\zeta, q) = \tilde{q}^{-k^2}\zeta^{-k}$. Note that O([P]) gets rigidified by choosing ν corresponding to 1 in the fibre over 0 of the trivial bundle. For $m \in \mathbb{Z}$, $[m]^*O([P])$ is given by the cocycle
$$e_{km}(\zeta^m, q) = \tilde{q}^{-k^2m^2}\zeta^{-km^2} = e_k(\zeta, q)^{m^2}.$$
The right-hand side is the cocycle of $O(m^2[P])$, hence O([P]) is even and we have also checked $[m]^*O([P]) \cong O(m^2[P])$. By our rigidifications, the isomorphism is uniquely determined. To determine the canonical metric $\|\ \|_\nu$, we compute the positive function $\mu := \|1\|_\nu$. In fact, 1 is a section of $p^*O([P]) = \mathbb{C}^\times \times \mathbb{C}$, but, locally, it may also be viewed as a section of O([P]) determining the metric completely. Compatibility of the fibres over ζ and $q^k\zeta$ leads to
$$\mu(q^k\zeta) = |e_{-k}(q^k\zeta, q)| \cdot \mu(\zeta). \tag{9.23}$$
On the other hand, $[m]^*\|\ \|_\nu = \|\ \|_\nu^{\otimes m^2}$ gives
$$\mu(\zeta^m) = \mu(\zeta)^{m^2}.$$
To find the canonical metric, we may start with any metric, i.e. with a $\mu_0$ satisfying (9.23). For $|q| \le |\zeta| \le 1$, we choose $\mu_0(\zeta) = |\zeta|^{1/2}$ and we extend $\mu_0$ by (9.23) to $\mathbb{C}^\times$, leading to a continuous function. Then the proof of Theorem 9.5.4 gives
$$\mu(\zeta) = \lim_{m\to\infty} \mu_0(\zeta^m)^{1/m^2}.$$
For $m \in \mathbb{Z}$, we choose $k(m) \in \mathbb{Z}$ with $\zeta_m := \zeta^m/q^{k(m)}$ satisfying $|q| < |\zeta_m| \le 1$. Then (9.23) leads to
$$\mu(\zeta) = \lim_{m\to\infty} |e_{-k(m)}(\zeta^m, q)|^{1/m^2} = \lim_{m\to\infty} |q|^{-\frac{k(m)^2}{2m^2}}\,|\zeta|^{\frac{k(m)}{m}}.$$
We have $k(m) = m \log|\zeta|/\log|q| + O(1)$, thus proving
$$\mu(\zeta) = |q|^{-\frac{1}{2}(\log|\zeta|/\log|q|)^2} \cdot |\zeta|^{\log|\zeta|/\log|q|} = |\zeta|^{\frac{\log|\zeta|}{2\log|q|}}. \tag{9.24}$$
It is easy to check that µ satisfies (9.23) and therefore (9.24) holds for all $\zeta \in \mathbb{C}^\times$.
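The identities of this example lend themselves to a numerical sanity check. The following is a minimal sketch, not part of the original text: the function names (`theta`, `theta_product`, `cocycle`, `mu`), the truncation orders and the sample values of $\tilde q$ and ζ are all illustrative assumptions. It compares the series (9.22) with the infinite product, and tests the transformation law of θ, the compatibility relation (9.23) and the relation $\mu(\zeta^m) = \mu(\zeta)^{m^2}$ for the closed form (9.24).

```python
import cmath
import math

def theta(zeta, qt, terms=12):
    """Truncated series (9.22): sum of qt**(n*n) * zeta**n, where qt stands for q-tilde."""
    return sum(qt ** (n * n) * zeta ** n for n in range(-terms, terms + 1))

def theta_product(zeta, qt, terms=60):
    """Truncated infinite product representation of theta."""
    result = 1.0 + 0.0j
    for n in range(1, terms + 1):
        result *= 1 - qt ** (2 * n)
        result *= 1 + qt ** (2 * n - 1) * zeta
        result *= 1 + qt ** (2 * n - 1) / zeta
    return result

def cocycle(k, zeta, qt):
    """Cocycle e_k(zeta, q) = qt**(-k*k) * zeta**(-k)."""
    return qt ** (-k * k) * zeta ** (-k)

def mu(zeta, q):
    """Closed form (9.24): |zeta| ** (log|zeta| / (2 log|q|))."""
    return abs(zeta) ** (math.log(abs(zeta)) / (2 * math.log(abs(q))))

# Illustrative sample values (assumptions): |qt| < 1 and q = qt**2.
qt = 0.3 * cmath.exp(0.4j)
q = qt * qt
zeta = 1.3 * cmath.exp(0.7j)

for k in (1, 2, -1):
    lhs = theta(q ** k * zeta, qt)
    rhs = cocycle(k, zeta, qt) * theta(zeta, qt)
    print(k, abs(lhs - rhs) / abs(rhs))  # relative error of the transformation law
```

The truncation orders are chosen generously for $|\tilde q| = 0.3$; for $|\tilde q|$ close to 1 many more terms would be needed.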
We owe to Tate the counterpart of these considerations for the non-archimedean case. Let K be a field endowed with a complete (discrete) absolute value $|\ |$. For every q with $|q| < 1$, the analytic torus $\overline{K}^\times/q^{\mathbb{Z}}$ is isomorphic as an analytic group to Tate's elliptic curve $E_q$ given by
$$y^2 + xy = x^3 + a_4 x + a_6,$$
where $a_4, a_6$ are convergent power series in q given by
$$a_4 = -5\sum_{n=1}^{\infty} \frac{n^3 q^n}{1 - q^n}, \qquad a_6 = -\frac{1}{12}\sum_{n=1}^{\infty} \frac{(7n^5 + 5n^3)\,q^n}{1 - q^n}.$$
The isomorphism $\overline{K}^\times/q^{\mathbb{Z}} \to E_q$ is given by
$$x(\zeta, q) = \sum_{n=-\infty}^{\infty} \frac{q^n\zeta}{(1 - q^n\zeta)^2} - 2\sum_{n=1}^{\infty} \frac{n q^n}{1 - q^n},$$
$$y(\zeta, q) = \sum_{n=-\infty}^{\infty} \frac{q^{2n}\zeta^2}{(1 - q^n\zeta)^3} + \sum_{n=1}^{\infty} \frac{n q^n}{1 - q^n},$$
and it is also an isomorphism of $\mathrm{Gal}(\overline{K}/K)$-modules. Note that the reduction of $E_q$ is given by $y^2 + xy = x^3$, a case called split multiplicative reduction. In fact, exactly the elliptic curves with split multiplicative reduction have such a Tate uniformization. For details of all these facts, we refer to P. Roquette [244] or J.H. Silverman ([286], Ch.5, Th.3.1, Th.5.3). For a fixed root $\tilde{q} = \sqrt{q}$, we define θ by (9.22) and all considerations above remain valid. In particular, the canonical metric $\|\ \|_\nu$ is determined by (9.24).
For example, we assume that $|\ |$ is normalized by $\log|K^\times| = \mathbb{Z}$ and that $q = \pi^{2k}$ for a local parameter π. For $\tilde{q} := \pi^k$, the 2-torsion point $P = p(-\tilde{q})$ is K-rational and hence L = O([P]) is defined over K. Let $Q := p(\pi)$ and let $L_Q$ be the fibre over Q. Then (9.24) leads to
$$\log\|L_Q(K)\|_\nu = \frac{1}{4k} + \mathbb{Z}.$$

9.6. Hilbert’s irreducibility theorem

A famous theorem of Hilbert asserts that, if $f(x, y) \in K[x, y]$ is an irreducible polynomial over a number field K, then for infinitely many $\xi \in K$ the polynomial $f(x, \xi)$, obtained by specializing y to ξ, is also irreducible over K. A field K with this property is called Hilbertian.
In this section we give a proof of this theorem using the theory of heights developed here and in Chapter 2.
9.6.1. For the reader's convenience, we recall here from Chapter 2 some definitions and simple facts on local heights.
Let D be a Cartier divisor on a projective variety X defined over a number field K and let $\mathcal{D}$ be a presentation of D as defined in 2.2.1
$$\mathcal{D} = (s_D; L_+, \mathbf{s}; L_-, \mathbf{t}),$$
where $O(D) = L_+ \otimes L_-^{-1}$, $L_+$ and $L_-$ are base-point free, $\mathbf{s}$ and $\mathbf{t}$ are generating global sections of $L_+$ and $L_-$ respectively, and $s_D$ is the meromorphic section of O(D) defining the Cartier divisor D.
As always, we normalize absolute values on number fields as in 1.3.6. This normalization is not possible for absolute values on an algebraic closure $\overline{K}$ of the number field K. Let M be the set of places on $\overline{K}$. For $u \in M$, we normalize $|\ |_u$ so that if its restriction to K is equivalent to $|\ |_v$, $v \in M_K$, and if p is the restriction of v to $M_{\mathbb{Q}}$, then $|\ |_u$ restricted to $\mathbb{Q}$ is $|\ |_p$. In what follows, we reserve the notation u to indicate absolute values $|\ |_u$ with $u \in M$ and v, w to indicate absolute values in finite subextensions of $\overline{K}/K$.
For $u \in M$, we have associated local heights (see Section 2.2) on X given by
$$\lambda_{\mathcal{D}}(P, u) = \log \max_k \min_l \left| \frac{s_k}{s_D\, t_l}(P) \right|_u.$$
If F is a finite subextension of $\overline{K}/K$ with $F \supset K(P)$ and $|\ |_u$ restricts on F to an absolute value equivalent to $|\ |_w$ with $w \in M_F$, then
$$\lambda_{\mathcal{D}}(P, w) = \frac{[F_w : \mathbb{Q}_p]}{[F : \mathbb{Q}]}\,\lambda_{\mathcal{D}}(P, u)$$
and for $v \in M_K$ the sum
$$\sum_{w|v,\ w \in M_F} \lambda_{\mathcal{D}}(P, w)$$
does not depend on the choice of F, as in Lemma 1.3.7.
For notational simplicity, in what follows we abbreviate
$$\varepsilon_w = \frac{[F_w : \mathbb{Q}_p]}{[F : \mathbb{Q}]}.$$

We have seen in Theorem 2.2.11 and Remark 2.2.13 (see also Theorem 2.7.14 for such a result in a more general context) that, if $\lambda_{\mathcal{D}}$ and $\lambda_{\mathcal{D}'}$ are two local heights attached to different presentations of the same Cartier divisor D, then for u|v with $v \in M_K$ we have
$$|\lambda_{\mathcal{D}}(\cdot, u) - \lambda_{\mathcal{D}'}(\cdot, u)| \le \gamma_v$$
for some constant $\gamma_v = \max_m \log^+|p_m|_v$ for finitely many $p_m \in K$, depending only on the geometric data of the presentations. In particular, $\gamma_v = 0$ for all but finitely many $v \in M_K$.
For F a finite subextension of $\overline{K}/K$ such that $F \supset K(P)$, the sum
$$h_{\mathcal{D}}(P) := \sum_{w \in M_F} \lambda_{\mathcal{D}}(P, w)$$
is independent of the choice of F and is called the global Weil height attached to $\mathcal{D}$. By Theorem 2.3.6, it is uniquely defined as a quasi-function independent of the presentation, i.e. up to a uniformly bounded quantity.
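To make the decomposition of a global height into local heights concrete, here is a minimal sketch in the simplest situation, which is an assumption of this illustration and not taken from the text: X = P^1 over K = Q, the divisor D = [∞], and local heights $\log^+|x|_v$ at the archimedean place and at the primes p. With F = Q all factors $\varepsilon_w$ equal 1. The helper names are hypothetical. The sum of the local heights of x = a/b in lowest terms recovers the familiar value log max(|a|, |b|).

```python
from fractions import Fraction
import math

def prime_factors(n):
    """Set of primes dividing the nonzero integer n (trial division)."""
    n = abs(n)
    primes = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes

def p_adic_valuation(x, p):
    """ord_p of a nonzero rational number x."""
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def local_height(x, place):
    """log^+ |x|_v, where place is the string 'inf' or a prime number p."""
    if x == 0:
        return 0.0
    if place == "inf":
        return max(0.0, math.log(float(abs(x))))
    # |x|_p = p ** (-ord_p(x)), so log|x|_p = -ord_p(x) * log p.
    return max(0.0, -p_adic_valuation(x, place) * math.log(place))

def global_height(x):
    """Sum of the local heights; only 'inf' and primes dividing the denominator contribute."""
    places = ["inf"] + sorted(prime_factors(x.denominator))
    return sum(local_height(x, v) for v in places)

print(global_height(Fraction(6, 35)), math.log(35))
```

Only finitely many places contribute, which mirrors the fact that the constants $\gamma_v$ above vanish for all but finitely many v.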
We state now the main result of this section.
Theorem 9.6.2. Let C be a smooth irreducible projective curve defined over a number field K and let $f : C \to \mathbb{P}^1$ be a surjective rational function defined over K. Suppose $Q \in f^{-1}(\infty)$ is a pole of f and let $\mathcal{Q}$ be a presentation of [Q], with a corresponding family of local heights $\lambda_{\mathcal{Q}}(\cdot, u)$, $u \in M$.
Then there is a family of real numbers $(\Delta_v)_{v\in M_K}$ with $\Delta_v \ge 0$ and $\Delta_v = 0$ for all but finitely many $v \in M_K$, depending only on C, f and the presentation $\mathcal{Q}$, with the following property.
For $P \in C(\overline{K}) \setminus f^{-1}(\infty)$ and any finite subextension F of $\overline{K}/K$ with $F \supset K(P, Q)$, we have
$$\sum_{v\in M_K}\ \sum_{\substack{w\in M_F,\ w|v \\ \lambda_{\mathcal{Q}}(P,w) > \varepsilon_w\Delta_v}} \log^+|f(P)|_w = -\frac{\mathrm{ord}_Q(f)}{\deg(f)}\,h(f(P)) + O\bigl(\sqrt{h(f(P))} + 1\bigr).$$

Remark 9.6.3. In intuitive terms, if $|f(P)|_v > e^{\Delta_v}$, then P is “close” to a pole Q of f. The meaning of the theorem is that, as v varies, each pole of f is approached by P with a probability proportional to the order of the pole.

To prove this theorem, we need first the following result:

Lemma 9.6.4. Let $\lambda, \lambda'$ be local heights relative to divisors $D, D'$ with disjoint support, say given by presentations of D and $D'$. Then there is a family of real numbers $c_v \ge 0$ with $c_v = 0$ for all but finitely many $v \in M_K$, depending only on the presentations, such that for $u|v \in M_K$ the following bound holds:
$$\min\bigl(|\lambda(P, u)|, |\lambda'(P, u)|\bigr) \le c_v.$$

The lemma may be easily generalized to a complete variety and to a family of local heights
relative to Néron divisors assuming that the intersection of all supports is empty.
Proof of lemma: Let us consider the open subsets $U := C \setminus \mathrm{supp}(D)$ and $U' := C \setminus \mathrm{supp}(D')$. Since $U \cup U' = C$ and C is M-bounded (see Proposition 2.6.17), there are M-bounded families $(E_u)_{u\in M}$ and $(E'_u)_{u\in M}$ of subsets of U and $U'$, respectively, with $C(\overline{K}) = E_u \cup E'_u$ (see 2.6.3, 2.6.14). We note that $s_D$ is a nowhere vanishing section over U. Since $\lambda(P, u) = -\log\|s_D(P)\|_u$ for a locally bounded metric on O(D) (Proposition 2.7.11), we conclude that
$$\sup_{u\in M,\ u|v}\ \sup_{P\in E_u} |\lambda(P, u)|$$
is finite for all $v \in M_K$ and 0 up to finitely many. Working with a presentation, this may also be deduced directly from another subdivision of U and $E_u$ using the remarks 2.6.3, 2.6.14. Similarly, $\lambda'$ is locally M-bounded on $(E'_u)_{u\in M}$ and the claim follows.
Proof of Theorem 9.6.2: By finite base change, we may assume that all poles of f are K-rational and that C is geometrically irreducible. Indeed, there is a finite Galois extension E/K such that the irreducible components of $C_{\overline{K}}$ are given by conjugates $(B^\sigma)_{\sigma\in R}$ for a geometrically irreducible curve B over E with $Q \in B(E)$ (see Example A.4.15). Then the theorem for $Q \in B$ proves the theorem for $Q \in C$, as $\lambda_{\mathcal{Q}}$ is M-bounded on the conjugates $B^\sigma \ne B$ (by Lemma 9.6.4).
Since the sum and pull-back of local heights remain a local height relative to suitable presentations (see 2.2.4 and 2.2.6), we see that
$$\log^+|f(P)|_u + \mathrm{ord}_Q(f)\,\lambda_{\mathcal{Q}}(P, u)$$
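is a local height relative to the effective divisor
$$D := f^*[\infty] + \mathrm{ord}_Q(f)[Q] = -\sum_{Q' \in f^{-1}(\infty)\setminus\{Q\}} \mathrm{ord}_{Q'}(f)[Q'].$$
The preceding lemma applied to D and $D' := -\mathrm{ord}_Q(f)[Q]$ shows that for $u|v \in M_K$ we also have
$$\min\bigl(-\mathrm{ord}_Q(f)\,\lambda_{\mathcal{Q}}(P, u),\ \log^+|f(P)|_u + \mathrm{ord}_Q(f)\,\lambda_{\mathcal{Q}}(P, u)\bigr) \le c_v.$$
For $\Delta_v := -c_v/\mathrm{ord}_Q(f)$ and for a finite subextension F of $\overline{K}/K$ with $F \supset K(P)$, this leads to
$$\sum_{v\in M_K}\ \sum_{\substack{w\in M_F,\ w|v \\ \lambda_{\mathcal{Q}}(P,w) > \varepsilon_w\Delta_v}} \log^+|f(P)|_w = -\mathrm{ord}_Q(f) \sum_{v\in M_K}\ \sum_{\substack{w\in M_F,\ w|v \\ \lambda_{\mathcal{Q}}(P,w) > \varepsilon_w\Delta_v}} \lambda_{\mathcal{Q}}(P, w) + O(1) = -\mathrm{ord}_Q(f)\,h_{\mathcal{Q}}(P) + O(1). \tag{9.25}$$
Since $f^*[\infty] - \deg(f)[Q] \in \mathrm{Pic}^0(C)$ (see A.9.40 and Corollary 8.4.10), Corollary 9.3.10 yields
$$h_{\mathcal{Q}}(P) = \frac{1}{\deg(f)}\,h(f(P)) + O\bigl(\sqrt{h(f(P))} + 1\bigr).$$
If we insert this in (9.25), we get the claim.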
9.6.5. In 1887, C. Runge ([255], p.432) proved that, if G(x, y) is an irreducible polynomial
in Z[x, y] of degree d and if the homogeneous part Gd (x, y) of degree d of G is not pro-
portional to a power of an irreducible polynomial in Z[x, y] , then G(x, y) has only finitely
many zeros (x, y) ∈ Z × Q . Runge used this theorem to give a more precise criterion for
G to have only finitely many integer zeros (see [255], p.434). Runge’s method depends
on the construction of rational approximations to algebraic functions of one variable and is
quite explicit and applicable in practice.
Here we give an extension of Runge’s theorem, as a consequence of Theorem 9.6.2 and
Lemma 9.6.4.
Theorem 9.6.6. Let C be a smooth irreducible projective curve defined over a number field K. Let $f : C \to \mathbb{P}^1$ be a surjective rational function on C defined over K. Let $P \in C$ such that $f(P) \in O_{K(f(P)),S}$ for a finite subset $S \subset M_{K(f(P))}$ containing all the archimedean places and satisfying the basic condition
$$[K(P) : K] < |S|^{-1}\,\frac{\deg(f)}{\max_Q |\mathrm{ord}_Q(f)|\,[K(Q) : K]}, \tag{9.26}$$
where the maximum is taken over all poles of f. Then the set of such points P is finite and effectively computable.
Remark 9.6.7. It is a simple exercise, which we leave to the reader, to deduce Runge’s
theorem from Theorem 9.6.6 by taking f = x , K = Q and S = {∞} .
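The basic condition (9.26) is a purely numerical inequality, so it can be evaluated mechanically once the pole data of f are known. The following minimal sketch is an illustration only; the function names and the encoding of poles as pairs (|ord_Q(f)|, [K(Q):K]) are assumptions made here, not notation from the text.

```python
from fractions import Fraction

def degree_bound(deg_f, poles, size_S):
    """Right-hand side of (9.26).

    poles is a list of pairs (|ord_Q(f)|, [K(Q):K]), one pair per pole Q of f.
    """
    worst = max(order * degree for order, degree in poles)
    return Fraction(deg_f, size_S * worst)

def condition_holds(deg_P, deg_f, poles, size_S):
    """True if [K(P):K] = deg_P satisfies the strict inequality (9.26)."""
    return deg_P < degree_bound(deg_f, poles, size_S)

# Degree 3, three simple K-rational poles, |S| = 1: the bound equals 3,
# so points P with [K(P):K] equal to 1 or 2 satisfy the condition.
print(degree_bound(3, [(1, 1), (1, 1), (1, 1)], 1))
# Degree 3 with a single triple pole: the bound equals 1, so no P qualifies.
print(condition_holds(1, 3, [(3, 1)], 1))
```

Exact rational arithmetic is used because (9.26) is a strict inequality and the boundary case matters.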
Proof of theorem: For $u \in M$, $\log^+|f(P)|_u$ is a local height relative to
$$f^*[\infty] = -\sum_{Q\in f^{-1}(\infty)} \mathrm{ord}_Q(f)[Q].$$
By Theorem 2.2.11 and Remark 2.2.13, there are constants $c_v \ge 0$, with $c_v = 0$ for all but finitely many $v \in M_K$, such that
$$\Bigl|\log^+|f(P)|_u + \sum_{Q\in f^{-1}(\infty)} \mathrm{ord}_Q(f)\,\lambda_{\mathcal{Q}}(P, u)\Bigr| \le c_v \tag{9.27}$$
for u|v. Since $\eta := f(P) \in O_{K(\eta),S}$, there is $v_\eta \in S$ with
$$\log^+|\eta|_{v_\eta} \ge |S|^{-1} h(\eta). \tag{9.28}$$
Let F be as usual a sufficiently large finite Galois subextension of $\overline{K}/K$, which is a field of definition for all points in $f^{-1}(\eta)$ and $f^{-1}(\infty)$. Fix once and for all $w \in M_F$ and $u \in M$ with $u|w|v_\eta|v \in M_K$. By (9.27) and (9.28), we get
$$\Bigl|\sum_{Q\in f^{-1}(\infty)} \mathrm{ord}_Q(f)\,\lambda_{\mathcal{Q}}(P, u)\Bigr| \ge \varepsilon_{v_\eta}^{-1}\,|S|^{-1} h(\eta) - c_v.$$
Let $(\Delta_v)_{v\in M_K}$ be a family as in Theorem 9.6.2 working for all presentations $\mathcal{Q}$. We see that there is $Q \in f^{-1}(\infty)$ such that
$$\lambda_{\mathcal{Q}}(P, w) > \varepsilon_w \Delta_v \tag{9.29}$$
as soon as
$$h(\eta) > \varepsilon_{v_\eta}\,|S|\,(c_v + \deg(f)\Delta_v).$$
The right-hand side is bounded by an effective constant independent of the choice of P, S, and $v_\eta$ (use (9.26)), hence (9.29) holds as soon as h(η) is sufficiently large, which we shall suppose henceforth.
Now let σ be an automorphism of F/K. We write $w'$, $Q'$ for $w \circ \sigma$ and $\sigma Q$, and note that we may choose the presentations of the poles of f and associated local heights so that they are compatible with the action of σ. Since the action of Gal(F/K) on the places of F extending v is transitive (see Corollary 1.3.5), we have $\varepsilon_w = \varepsilon_{w'}$ and
$$\lambda_{\mathcal{Q}'}(\sigma P, w') = \lambda_{\mathcal{Q}}(P, w) > \varepsilon_{w'}\Delta_v.$$
It is now clear that, if $\{P_1, \dots, P_r\}$, $\{Q_1, \dots, Q_s\}$ are a full set of conjugates of P and Q over K, then
$$\sum_{i=1}^{r} \sum_{j=1}^{s} \sum_{\substack{w'\in M_F,\ w'|v \\ \lambda_{\mathcal{Q}_j}(P_i, w') > \varepsilon_{w'}\Delta_v}} \log^+|\eta|_{w'} \ \ge \sum_{w'\in M_F,\ w'|v} \log^+|\eta|_{w'} \ \ge\ \log^+|\eta|_{v_\eta},$$
where the last step comes from Lemma 1.3.7. Noting that $\mathrm{ord}_{Q_j}(f) = \mathrm{ord}_Q(f)$ for every j and applying Theorem 9.6.2, we conclude with (9.28) that
$$|S|^{-1} h(\eta) \le rs\,\frac{|\mathrm{ord}_Q(f)|}{\deg(f)}\,h(\eta) + O\bigl(\sqrt{h(\eta)} + 1\bigr).$$
By Galois theory, $r = [K(P) : K]$ and $s = [K(Q) : K]$, but since we have no information about which pole Q is involved here we must take the maximum of $s\,|\mathrm{ord}_Q(f)|$ over all poles of f. We conclude that h(η) is bounded by an effective constant and the theorem follows from Northcott's theorem in 1.6.8.
9.6.8. Theorem 9.6.6 can be applied directly to obtain irreducibility results about the fibres of f. For example, if $K = \mathbb{Q}$, $S = \{\infty\}$, and all poles of f are simple and defined over $\mathbb{Q}$, we get $[\mathbb{Q}(P) : \mathbb{Q}] = \deg(f)$ for $P \in f^{-1}(n)$, for all but finitely many integers $n \in \mathbb{Z}$. Hence there are deg(f) conjugates of P, proving the irreducibility of the fibre.
Note that some condition about K and the poles of f must be imposed if we want to get such a finiteness result. The example of the curve with affine equation $y = x^2$ and f = y shows that the finiteness result does not hold if we omit the condition of simplicity of the poles at ∞. The example of the curve with affine equation $x^2 - 2y^2 = 1$ and f = y, with the associated Pell equation, shows that the finiteness result does not hold even if f has simple poles at ∞ defined over an extension of $\mathbb{Q}$.
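The Pell example can be made explicit. The sketch below is an illustration, with a hypothetical function name: it generates positive integer solutions of $x^2 - 2y^2 = 1$ from the fundamental solution (3, 2) by the standard recurrence $(x, y) \mapsto (3x + 4y,\ 2x + 3y)$. Each solution gives an integer n = y whose fibre $f^{-1}(n)$ contains the rational points $(\pm x, n)$, so the fibre of the degree 2 map f = y is not irreducible over $\mathbb{Q}$, and this happens for infinitely many n.

```python
def pell_solutions(count):
    """First `count` positive integer solutions (x, y) of x^2 - 2*y^2 = 1."""
    x, y = 3, 2  # fundamental solution
    solutions = []
    for _ in range(count):
        solutions.append((x, y))
        # Multiplication by the fundamental unit 3 + 2*sqrt(2).
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return solutions

for x, y in pell_solutions(5):
    print(f"n = {y}: rational points ({x}, {y}) and ({-x}, {y}) lie in the fibre over n")
```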

The reader will perceive the connexion of such a finiteness theorem with Siegel’s finiteness
theorem on integral points on curves as in Theorem 7.3.9, and indeed a proof of the finite-
ness result can be obtained by using the full force of Siegel’s theorem. However, Siegel’s
theorem is ineffective. It is therefore of some interest that Theorem 9.6.6 is strong enough
to prove a general effective version of Hilbert’s irreducibility theorem.
Theorem 9.6.9. Let C be a smooth irreducible projective curve defined over a number
field K and let f : C → P1 be a surjective rational function on C , also defined over
K . Then for all n ∈ N except for a set of natural density 0 , the divisor f ∗ [n] is a prime
divisor over K .
Remark 9.6.10. Recall that a subset $M \subset \mathbb{N}$ is said to have natural density ρ if the following limit exists and equals ρ:
$$\rho = \lim_{x\to\infty} \frac{|\{n \in M \mid n \le x\}|}{|\{n \in \mathbb{N} \mid n \le x\}|}.$$
As will be clear from the proof, this theorem is effective in the following sense. There are effectively computable quantities r, κ(B, r), and an effectively computable polynomial $P(x_1, \dots, x_{r-1}) \in K[x_1, \dots, x_{r-1}] \setminus \{0\}$, depending on the set of ramification points of f, such that, if $1 \le b_i \le B$ with $b_i \in \mathbb{N}$, $n \ge \kappa(B, r)$, and $P(b_1, \dots, b_{r-1}) \ne 0$, then at least one of the divisors $f^*[n]$, $f^*[n + b_i]$, $i = 1, \dots, r - 1$, is a prime divisor over K. In particular, the least $n \in \mathbb{N}$ such that $f^*[n]$ is a prime divisor over K is effectively computable.
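As a small illustration of natural density in the setting of Theorem 9.6.9, consider the curve $y = x^2$ over $\mathbb{Q}$ with f = y (the example from 9.6.8). For an integer n ≥ 1 the divisor $f^*[n]$ consists of the points $(\pm\sqrt{n}, n)$, so it is a prime divisor over $\mathbb{Q}$ exactly when n is not a perfect square. The sketch below, with hypothetical function names and counting only n ≥ 1 (which does not affect the density), computes the proportion of exceptional n up to a bound; it behaves like $1/\sqrt{x}$ and tends to 0.

```python
import math

def fibre_is_prime(n):
    """For y = x^2 and f = y: f^*[n] is prime over Q iff n >= 1 is not a perfect square."""
    r = math.isqrt(n)
    return r * r != n

def exceptional_density(x):
    """Proportion of n in {1, ..., x} for which f^*[n] is not a prime divisor."""
    bad = sum(1 for n in range(1, x + 1) if not fibre_is_prime(n))
    return bad / x

for bound in (100, 10_000, 1_000_000):
    print(bound, exceptional_density(bound))
```

So the exceptional set is infinite, in line with 9.6.8, yet it has natural density 0, in line with Theorem 9.6.9.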

Dèbes and Zannier [88] gave a proof of Hilbert’s irreducibility theorem relying on G -
functions. They used a clever trick by considering generic fractional linear transforms to
omit ramification. Here, we put this in the context of algebraic geometry by working on an
auxiliary curve to which we can apply directly Theorem 9.6.6. Before we come to the proof
of Theorem 9.6.9, we need a couple of lemmas.
Lemma 9.6.11. For $i = 0, \dots, N$, let $C_i$ be a smooth geometrically irreducible curve over a field F of characteristic 0 and let $f_i : C_i \to \mathbb{P}^1_F$ be a surjective rational function of degree $d_i$. We consider the fibre product $g : C \to \mathbb{P}^1_F \times \mathbb{A}^N_F$ of the morphisms
$$g_i : C_i \times \mathbb{A}^N_F \to \mathbb{P}^1_F \times \mathbb{A}^N_F, \qquad (x_i, t) \mapsto (f_i(x_i) - t_i, t) \qquad (i = 0, \dots, N),$$
where we set $t_0 := 0$. Let $R_i$ be the set of points in $\mathbb{P}^1_F \setminus \{\infty\}$ over which $f_i$ ramifies. Then the following properties hold:

(a) g is a flat finite morphism of degree $d_0 \cdots d_N$;
(b) C is an irreducible variety over F;
(c) All fibres $C_t$ of C over $\mathbb{A}^N_F$ with respect to $p_2 \circ g$ are geometrically connected projective curves;
(d) If $t \in \mathbb{A}^N_F$ with $t_i - t_j \notin R_i - R_j$ for all $i \ne j$ in $\{0, \dots, N\}$ (such t form an open dense subset), then a singular point $(x_0, \dots, x_N) \in C_t$ has two components $x_i, x_j$, which are ramified over ∞ with respect to $f_i$ and $f_j$, respectively.

Proof of lemma: A surjective morphism of an irreducible variety onto a smooth curve is


always flat ([148], Prop.III.9.7). So fi is flat and gi is obtained by base change and com-
position with an isomorphism. Hence every gi and also the fibre product g are flat ([148],
Prop.III.9.2), at least if we understand the fibre product in the sense of schemes. We will
show below that C is indeed a variety.
Every gi is finite (same reason as for flatness) and hence the fibre product g is finite. Thus
the fibres are finite morphisms gt : Ct → P1F (t) for every t ∈ AN F , proving that Ct is a
projective curve (see A.12.7).
Let t be a point as in (d). Since every Ri is finite, it is clear that such ts form an open
dense subset of ANF . Let x = (x0 , . . . , xN ) ∈ Ct with at most one xi ramified over ∞ .
Even if the xi lie over finite points, our assumption on t implies that at most one xi is
ramified.
In any case, the fibre product of the morphisms gj omitting the ramified gi is smooth at the
point (x0 , . . . , xi−1 , xi+1 , . . . , xN ) , whence x is a smooth point of the map C → Ci ×AN F
obtained by base change ([148], Prop.III.10.1(b)). Since Ci is a smooth curve, we conclude
that the second projection p2 of Ci × AN F is a smooth morphism (again by base change)
and hence x is a smooth point of the composition C → AN F ([148], Prop.III.10.1(c)). In
particular, it is a smooth point of Ct , proving (d).
Since p2 ◦ g is open as a flat morphism (see A.12.13), such points are dense in C (for
varying t ) and hence the fibre product C (in the sense of schemes) has an open dense part,
which is reduced. We conclude that C is a variety.
Let t ∈ AN F be still as in (d). By flatness of g , every irreducible component of Ct covers
P1F (t) . In order to show that Ct is geometrically irreducible, it is enough to show that
Ct := gt−1 (P1F \ {∞}) is geometrically irreducible. By (d), we know that the latter is a
smooth variety and so it is enough to show that Ct is geometrically connected (see A.7.14).
To prove this, we may argue complex analytically. If F is a number field (as in our appli-
cation), this is obvious by using an embedding of F into C (see Section A.14). In general,
9.6. Hilbert’s irreducibility theorem 321

we use the Lefschetz principle: Ct is defined over a finitely generated subfield of F , which
may be embedded into C .

Let Rt := N i=0 (Ri − ti ) . Then the same arguments as in the proof of (d) show that
gt : (Ct )an → P1an \ {∞} is a finite ramified covering as in 12.3.8, and outside of gt−1 (Rt )
we get a topological covering (see 12.3.9) of degree d0 · · · dN (every fibre has exactly
d0 · · · dN points by definition of the fibre product). We choose x = (x0 , . . . , xN ) ∈
(Ct )an with y = g(x) ∈ Rt . To prove that (Ct )an is connected, it is enough to show that
there is a path connecting x with any other fibre point over y . We may just change one
entry at the time and, for notational simplicity, we change x0 to x0 ∈ g0−1 (y) .
The fundamental group π1 (P1an \ (R0 ∪ {∞}), y) is generated by the loops (σz )z∈R 0 ,
where σz is the loop starting in the base point y , then passing to a small neighbourhood of
z , turning in a small positive circle around z and returning the same way back to y . This
is a consequence of van Kampen’s theorem and is proved in the case |R0 | = 2 in 12.6.1;
the general case is by induction. We may assume that σz omits Rt and ∞ .
By the theory of covering spaces, π1 (P1an \ (R0 ∪ {∞}), y) operates transitively on the
fibre g0−1 (y) (see 12.3.5), hence there is γ ∈ π1 (P1an \ (R0 ∪ {∞}), y) such that the lift
γ0 to (C0 )an with starting point x0 ends in x0′ . Now we may assume that γ is a product
of σz s (repetition possible). Then γ does not pass through Rt ∪ {∞} and we choose a
gi -lift γi in (Ci )an with starting point xi . It is enough to show that the lift of σz ends
in xi for i = 1, . . . , N . Since (Ci )an is a topological covering over a neighbourhood
of z (as gi does not ramify over z , otherwise ti − t0 = ti ∈ Ri − R0 ), we conclude
that the lift of σz has xi as end point (consider first the turn around z ). Therefore, the
path γ = (γ0 , . . . , γN ) starts in x and ends in (x0′ , x1 , . . . , xN ) . This proves that Ct is
geometrically irreducible for every t as in (d).
Since we may do this for the generic point ξ of $\mathbb{A}^N_F$, we conclude that the generic fibre
$C_\xi$ of $C \to \mathbb{A}^N_F$ is a geometrically irreducible curve over $F(\mathbb{A}^N_F)$. We conclude that
C is irreducible and hence we get (b). By Zariski’s main theorem (note that F (AN F ) is
algebraically closed in the function field of C by geometric irreducibility (see A.4.11) and
hence we may apply [136], Cor.4.3.10), we conclude that the fibres Ct are geometrically
connected for every t ∈ AN F , proving (c).

In the course of the proof, we have also seen that the generic fibre of g has degree d0 · · · dN ,
thus proving the remaining part of (a). Here, we use the invariance of the degree with respect
to base change. 
Remark 9.6.12. Alternatively, we give a more geometric way of deducing geometric con-
nectedness from (d) without using the theory of covering spaces. As before, it is enough
to show that Ct is geometrically connected for any t as in (d). We may assume F alge-
braically closed.
By induction on N and passing to normalizations, it suffices to consider the case N = 1 .
For t ∉ R1 − R0 , the curve
f1 (x1 ) − f0 (x0 ) = t
in C0 × C1 is Ct . We consider the rational map
$$F : C_0 \times C_1 \dashrightarrow \mathbb{P}^1_F, \qquad (x_0, x_1) \mapsto f_1(x_1) - f_0(x_0),$$
which is a morphism outside $Q := f_0^{-1}(\infty) \times f_1^{-1}(\infty)$. By a suitable sequence of
blowing-ups in points over Q , we replace C0 × C1 by a smooth surface X with a blowing-up
morphism φ : X → C0 × C1 such that the rational map $\widetilde{F} := F \circ \varphi$ is a morphism ([148],
Example II.7.17.3, Cor.V.5.4). Then $\widetilde{F}$ has a factorization
$$X \xrightarrow{\ \psi\ } \Gamma \xrightarrow{\ h\ } \mathbb{P}^1_F,$$
with ψ a proper surjective morphism with connected fibres and h a finite morphism (by the
Stein factorization in A.12.8). Using the universal property of normalizations (see Lemma
1.4.10), we may replace Γ by its normalization, hence we may assume that Γ is a smooth
irreducible projective curve.
In order to complete the proof, we need to show that h is an isomorphism, because then the
fibre Xt is connected and hence the same is true for Ct = φ(Xt ) .
Suppose this is not the case. Then Hurwitz’s theorem in B.4.6 yields easily that h is ramified
over at least two points of P1F . We choose z ∈ Γ with h(z) ≠ ∞ such that h
ramifies in z . The fibre ψ −1 (z) is a curve in X . The complement of the exceptional divi-
sor E := φ−1 (Q) is isomorphic to (C0 ×C1 )\Q , hence we may identify x ∈ ψ −1 (z)\E
with (x0 , x1 ) ∈ (C0 × C1 ) \ Q . Since f1 (x1 ) − f0 (x0 ) = h(ψ(x)) = h(z) , the chain
rule yields
(∂f0 /∂ξ0 )(x0 ) = (∂f1 /∂ξ1 )(x1 ) = 0 (9.30)
for local parameters ξ0 , ξ1 at x0 , x1 and since the set of points (x0 , x1 ) ∈ C0 × C1
satisfying (9.30) is finite, we conclude that ψ −1 (z) ⊂ E .
On the other hand, recall that the intersection quadratic form determined by the components
of a complete contractable curve E on a smooth surface is negative definite (see D. Mum-
ford [210], p.6). All points of Γ are algebraically equivalent (see A.9.40) and hence the
fibres of ψ are numerically equivalent, meaning in particular that the self-intersection of
ψ −1 (z) in X is 0 (see A.9.38). This is a contradiction, completing the proof.
9.6.13. In order to deal with the poles of gt on the fibre Ct , we generalize the situation of
Lemma 9.6.11 a little bit.
Let B, C0 , . . . , CN be smooth irreducible projective curves over F and let gj : Cj → B
be surjective morphisms for j = 0, . . . , N . Then similarly as in Lemma 9.6.11, it follows
that the fibre product C0 ×B · · ·×B CN is a projective curve. We assume for simplicity that
it is geometrically irreducible and we denote by $C'$ its normalization. Let $g'_j : C' \to C_j$
and $g' : C' \to B$ be the canonical morphisms induced by the normalization morphism.
Let $Q \in C'$ and let $Q_j := g'_j(Q)$. We denote the multiplicity of Q in the fibre with
respect to $g'$ by $m_Q(g')$. Similarly, $m_j$ denotes the multiplicity of $Q_j$ in the $g_j$-fibre.
With $\pi_j$ a local parameter at $Q_j$, we have
$$g_j = u_j \pi_j^{m_j}$$
for some unit $u_j \in \mathcal{O}_{C_j,Q_j}$. We define $L_j$ to be the Galois closure of the field
$F(Q_j, u_j(Q_j)^{1/m_j})$; clearly, it is independent of the choice of the uniformizing parameter $\pi_j$.
Similarly, we define $L_Q$ with respect to a decomposition of $g'$ at Q , always taking the
Galois closure in a fixed algebraic closure $\overline{F}$.
Lemma 9.6.14. Under the hypothesis of 9.6.13, $m_Q(g')$ is the least common multiple of
the multiplicities $m_0, \ldots, m_N$ and $L_Q$ is contained in the compositum of $L_0, \ldots, L_N$ in
$\overline{F}$.
Proof: It is enough to prove the claim for N = 1 , the general case is easily obtained by
induction. By base change to Q = (Q0 , Q1 ) , we may assume Q0 , Q1 to be F -rational.
We would like to argue complex analytically to work in a chart around Q ∈ C0 × C1 with
coordinates (z0 , z1 ) , where the fibre product is given by the equation $u_0 z_0^{m_0} = u_1 z_1^{m_1}$.
This would not change the relevant multiplicities and we could resolve the singularity at
(0, 0) in the chart to get the normalization over Q . However, to also get our claim about the
fields Lj , the argument should be carried out over F . This will be done in the framework of
formal geometry (see [148], Section II.9), meaning that we will work over an infinitesimal
neighbourhood of Q considering formal power series with coefficients in F instead of
convergent complex power series. We will use that the residue field and the local parameter
of the m -adic completion of a local ring remain the same as for the original local ring,
hence the multiplicities and Lj , LQ may be calculated in an infinitesimal neighbourhood.
The formal completion of $X := C_0 \times_B \cdots \times_B C_N$ along Q is $\operatorname{Spec}(\widehat{\mathcal{O}}_{X,Q})$, obtained
as the $\mathfrak{m}_Q$-adic completion of the local ring $\mathcal{O}_{X,Q}$ (see [148], Example II.9.3). Note that
$\widehat{\mathcal{O}}_{C_j,Q_j}$ is regular ([148], Th.I.5.4.A) and hence is isomorphic to the ring of formal
power series $F[[\pi_j]]$ by the Cohen structure theorem ([148], Th.I.5.5A). This yields easily
$$\widehat{\mathcal{O}}_{X,Q} \cong F[[\pi_0, \pi_1]] / \left( u_0 \pi_0^{m_0} - u_1 \pi_1^{m_1} \right).$$
There is a finite succession of blowing-ups in points over Q realizing the normalization $C'$
in a neighbourhood over Q . This may be performed formal analytically, which also follows
from the fact that the integral closure of $\widehat{\mathcal{O}}_{X,Q}$ in its ring of fractions is isomorphic to the
completion of the integral closure of $\mathcal{O}_{X,Q}$ (see O. Zariski and P. Samuel [338], Ch.VIII,
Th.33). Hence we may replace X by the affine curve in $\mathbb{A}^2_F$ given by
$$A_0 x_0^{m_0} = A_1 x_1^{m_1},$$
where $A_j := u_j(Q_j)$. Indeed, they have the same infinitesimal neighbourhood at Q ,
respectively at (0, 0) . In order to verify this it suffices to look at the power series expansion
of $(1 + z)^{1/m_i}$ in F [[x]] for any z ∈ xF [[x]] . Thus we may replace $u_j$ by $A_j$ by passing
to a new local parameter, which may be identified with $x_j$.
We have a singularity at (0, 0) if and only if $m_i \ge 2$ for i = 0, 1 . This is a singularity
of a very simple type, which can be resolved locally in a single step by a local monoidal
transformation. We may assume $2 \le m_0 \le m_1$. Let $k = \mathrm{GCD}(m_0, m_1)$ and write
$m_i = k\mu_i$, i = 0, 1 . Then there are positive integers $a_i$ such that $\mu_0 a_0 - \mu_1 a_1 = 1$ and
we define $y_0, y_1$ by
$$x_0 = y_0^{\mu_1} y_1^{a_0}, \qquad x_1 = y_0^{\mu_0} y_1^{a_1},$$
hence
$$y_0 = x_0^{-a_1} x_1^{a_0}, \qquad y_1 = x_0^{\mu_0} x_1^{-\mu_1}.$$
The strict transform of X by this transformation has a local equation
$$A_0 y_1^{k} = A_1.$$
Therefore, we get a resolution of the singularity at (0, 0) into k non-singular points located
at the points $(0, y_1)$ with $y_1^k = A_1/A_0$. Now Q corresponds to one of the points $(0, y_1)$,
where $\pi := y_0$ is a local parameter. Using this correspondence, we see that
$$g' = u_0 x_0^{m_0} = u_0\, y_1^{a_0 m_0}\, \pi^{m_0 \mu_1}.$$
Hence $m_Q(g') = k\mu_0\mu_1$ proving the first claim. Moreover
$$u_0(Q)\, y_1^{a_0 m_0}(Q) = A_0 (A_1/A_0)^{\mu_0 a_0} = A_0^{-\mu_1 a_1} A_1^{\mu_0 a_0}.$$
Noting that
$$\left( A_0^{-a_1 \mu_1} A_1^{a_0 \mu_0} \right)^{1/(k\mu_0\mu_1)} = A_0^{-a_1/m_0} A_1^{a_0/m_1}$$
we see that $L_Q \subset L_{Q_0} L_{Q_1}$, proving the second claim.
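The exponent bookkeeping in this proof can be checked mechanically. The following Python sketch is only an illustration under our own naming (`monoidal_exponents` and `check_resolution` do not occur in the text): it picks the smallest positive a0, a1 with µ0 a0 − µ1 a1 = 1 by direct search, then verifies that the substitution x0 = y0^µ1 y1^a0, x1 = y0^µ0 y1^a1 leaves the equation A0 y1^k = A1 after cancelling the common power of y0 and y1, and that the resulting multiplicity kµ0µ1 is the least common multiple of m0 and m1.

```python
from math import gcd

def monoidal_exponents(m0, m1):
    """Return (k, mu0, mu1, a0, a1) with m_i = k*mu_i and mu0*a0 - mu1*a1 = 1."""
    k = gcd(m0, m1)
    mu0, mu1 = m0 // k, m1 // k
    a0 = 1
    # Smallest positive a0 for which a1 = (mu0*a0 - 1)/mu1 is a positive integer.
    while (mu0 * a0 - 1) % mu1 != 0 or (mu0 * a0 - 1) // mu1 < 1:
        a0 += 1
    a1 = (mu0 * a0 - 1) // mu1
    return k, mu0, mu1, a0, a1

def check_resolution(m0, m1):
    """Verify the exponent identities of the proof and return m_Q(g')."""
    k, mu0, mu1, a0, a1 = monoidal_exponents(m0, m1)
    # Exponents of (y0, y1) in x0^m0 and in x1^m1 after the substitution.
    left = (m0 * mu1, m0 * a0)
    right = (m1 * mu0, m1 * a1)
    assert left[0] == right[0] == k * mu0 * mu1   # same power of y0 on both sides
    assert left[1] - right[1] == k                # what remains is A0 * y1^k = A1
    assert k * mu0 * mu1 == m0 * m1 // gcd(m0, m1)  # the multiplicity is lcm(m0, m1)
    return k * mu0 * mu1

print(check_resolution(4, 6))
```

For m0 = 4, m1 = 6 one gets k = 2, µ0 = 2, µ1 = 3, a0 = 2, a1 = 1, and the multiplicity is 12.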
9.6.15. Let $t \in \mathbb{A}^N_F$ be as in Lemma 9.6.11(d) and assume in addition that t is F -rational.
Let $C'_t$ be the normalization of $C_t$. Our goal is to apply Runge’s theorem to the canonical
morphism $f_t : C'_t \to \mathbb{P}^1_F$ given as the composition of $g_t$ with the normalization morphism.
So we need an estimate for $-\mathrm{ord}_Q(f_t)\,[F(Q) : F]$, where Q is any pole of $f_t$.
Lemma 9.6.16. Suppose the hypothesis of 9.6.15 holds and let d := deg(f ) . Then
$$-\mathrm{ord}_Q(f_t)\,[F(Q) : F] \le d^d.$$
Proof: We will apply Lemma 9.6.14 for Cj := C , B := P1F , gj := gj,t and hence
$C' = C'_t$. We note first that the canonical images Qj ∈ Cj of Q are all poles of f .
Let us choose J ⊂ {0, . . . , N } such that, for every i = 0, . . . , N , there is exactly one
j ∈ J with Qj conjugate to Qi . For conjugates, we refer to Example A.4.14. Note that
−ordQ i (gi ) is equal to the multiplicity mi of the pole Qi with respect to f . Similarly,
the field Li defined in 9.6.13 is the same with respect to f and with respect to gi , since f
and gi differ only by a constant. Obviously, mi and Li do not depend on the conjugation
class of Qi . Now Lemma 9.6.14 yields
$$-\mathrm{ord}_Q(f_t)\,[F(Q) : F] \le \prod_{j \in J} \left( m_j\,[L_j : F] \right).$$
Using the Euler ϕ -function and $d_i := [F(Q_i) : F]$, we easily deduce
$$[L_i : F] \le d_i!\, m_i\, \varphi(m_i) \le d_i!\, m_i \max(m_i - 1, 1).$$
The displayed inequalities lead to
$$-\mathrm{ord}_Q(f_t)\,[F(Q) : F] \le \prod_{j \in J} d_j!\, m_j^2 \max(m_j - 1, 1).$$
On the other hand, Example 1.4.12 shows
$$\sum_{j \in J} m_j d_j \le d.$$
By an easy induction on d , this gives
$$\prod_{j \in J} d_j!\, m_j^2 \max(m_j - 1, 1) \le d^d,$$
completing the proof.
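The final inequality of this proof is purely combinatorial, so it can be tested by machine for small d. The sketch below is an illustration only and not the induction used in the text: the names `term` and `max_product` are ours, and the search ranges over all finite families of pairs (m_j, d_j) of positive integers with Σ m_j d_j ≤ d, which is slightly more than the families that can actually arise from poles.

```python
from math import factorial

def term(m, d):
    """The factor d! * m^2 * max(m - 1, 1) attached to one index j."""
    return factorial(d) * m * m * max(m - 1, 1)

def max_product(budget):
    """Largest product of term(m_j, d_j) over families with sum m_j*d_j <= budget."""
    best = {0: 1}  # the empty product
    for n in range(1, budget + 1):
        value = best[n - 1]  # families that do not use the full budget n
        for m in range(1, n + 1):
            for d in range(1, n // m + 1):
                # Spend m*d on one factor, fill the rest optimally.
                value = max(value, term(m, d) * best[n - m * d])
        best[n] = value
    return best[budget]

for d in range(1, 13):
    assert max_product(d) <= d ** d
```

The bound is attained for d = 2 (a single pole with m = 2, d_j = 1 gives 4 = 2^2) and has ample room for larger d.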


Proof of Theorem 9.6.9: We first reduce the proof to the case of a geometrically irreducible
curve. The irreducible components Ci of $C_{\overline{K}}$ are defined over a finite Galois extension L
of K and they are all conjugate with respect to Gal(L/K) . If Theorem 9.6.9 is known
for the restriction fi : Ci → P1L of f , then $f_i^*[n]$ is a prime divisor over L , for n ∈ N
outside a set of natural density 0 . But then $f^*[n] = \sum_i f_i^*[n]$ has to be a prime divisor
over K by considering the Galois action. So we may assume C geometrically irreducible.
For r ≥ 2 , let $g : C \to \mathbb{P}^1_K \times \mathbb{A}^{r-1}_K$ be the fibre product considered in Lemma 9.6.11 with
Ci := C and fi := f for i = 0, . . . , N := r − 1 . Let R be the set of points in P1K \ {∞}
over which f ramifies and define t0 := 0 . We set
$$P(t_1, \ldots, t_{r-1}) := \prod_{z \in R - R}\ \prod_{i=0}^{r-2}\ \prod_{j=i+1}^{r-1} (t_i - t_j - z),$$
hence the points t considered in Lemma 9.6.11(d) form the Zariski open dense subset
$U := \{P \neq 0\}$.
Clearly, P is invariant under conjugation and hence defined over K . Note also that the
proof of Lemma 9.6.11 shows that the fibre $g_t : C_t \to \mathbb{P}^1_{K(t)}$ has degree $d^r$. To get an
irreducible smooth projective curve, we will replace $C_t$ by its normalization denoted by
$C'_t$. By Lemma 9.6.11(d), they are isomorphic over $\mathbb{P}^1_{K(t)} \setminus \{\infty\}$. The induced morphism
$f_t : C'_t \to \mathbb{P}^1_{K(t)}$ has also degree $d^r$ since the normalization morphism is birational (see
A.13.2).
Let $B \in \mathbb{N}$ and let $\mathcal{B}$ denote the box
$$\mathcal{B} = \{ t \mid 1 \le t_i \le B,\ i = 1, \ldots, r-1 \}.$$
The number of points $b = (b_1, \ldots, b_{r-1}) \in \mathbb{N}^{r-1}$ in the box $\mathcal{B}$ is $B^{r-1}$, while trivially
the number of those which verify the further condition P (b) = 0 is at most $cB^{r-2}$, for
some constant c depending only on P . Now we apply Theorem 9.6.6 to $f_b : C'_b \to \mathbb{P}^1_K$,
for b in the set
$$H(B) := \{ b \in \mathcal{B} \mid P(b) \neq 0 \}.$$
By Lemma 9.6.16, we have
$$\max_{Q}\, |\mathrm{ord}_Q(f_b)|\,[K(Q) : K] \le d^d, \qquad (9.31)$$
where d := deg(f ) and Q ranges over the poles of $f_b$. Now
$$f_b^{-1}(n) = \{ (P_0, \ldots, P_{r-1}) \mid P_i \in f^{-1}(n + b_i),\ i = 0, \ldots, r-1 \}$$
(we set $b_0 = 0$). Obviously
$$[K(P_0, \ldots, P_{r-1}) : K] \le \prod_{i=0}^{r-1} [K(P_i) : K].$$
Note also that $\deg(f_b) = d^r$. Therefore, using (9.31) and the last displayed inequality,
Theorem 9.6.6 applied with S the set of archimedean places of K yields
$$\prod_{i=0}^{r-1} [K(P_i) : K] \ge [K : \mathbb{Q}]^{-1} d^{\,r-d} \qquad (9.32)$$

for all n ≥ κ(B, r) , where κ(B, r) may depend on B , r , and the geometric data.
If $f^*[n + b_i]$ is not a prime divisor over K , then it has an irreducible component Y with
[K(Y ) : K] ≤ d/2 (see Example 1.4.12) and hence every $P_i \in f^{-1}(n + b_i)$ satisfies
$[K(P_i) : K] \le d/2$. Thus by (9.32) the number s of indices i for which $f^*[n + b_i]$ is not
a prime divisor over K satisfies
$$(d/2)^s d^{\,r-s} \ge [K : \mathbb{Q}]^{-1} d^{\,r-d},$$
hence $[K : \mathbb{Q}]\, d^d \ge 2^s$ and finally
$$s \le (d \log d + \log([K : \mathbb{Q}]))/\log 2.$$
Now let n ≥ κ(B, r) and suppose that there are at least ρB integers m ∈ (n, B + n]
such that $f^*[m]$ is not a prime divisor over K . Choosing entries of the form b = m − n ,
we get at least $(\rho B)^{r-1} - cB^{r-2}$ points b ∈ H(B) . We choose r > (d log d +
log([K : Q]))/ log 2 . Then the above shows that no such points b ∈ H(B) exist, hence
$$\rho \le (c/B)^{1/(r-1)}.$$
For B sufficiently large, this yields zero density for the set of integers m with $f^*[m]$ not
a prime divisor.
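To get a feeling for the sizes involved in the last step, here is a small Python sketch. It is an illustration only: the function names are ours, and the constant c and the threshold κ(B, r) of the proof are not computed, so `density_bound` simply evaluates (c/B)^{1/(r−1)} for values supplied by the caller. The condition r > (d log d + log[K : Q])/ log 2 is tested in the equivalent integer form 2^r > [K : Q] d^d to avoid rounding.

```python
def max_bad_indices(d, degree_K=1):
    """Largest integer s with 2^s <= [K:Q] * d^d, i.e. the bound on s in the proof."""
    bound = degree_K * d ** d
    s = 0
    while 2 ** (s + 1) <= bound:
        s += 1
    return s

def minimal_r(d, degree_K=1):
    """Smallest r >= 2 with 2^r > [K:Q] * d^d."""
    return max(2, max_bad_indices(d, degree_K) + 1)

def density_bound(c, B, r):
    """The upper bound (c/B)^(1/(r-1)) for rho; c is assumed to be given."""
    return (c / B) ** (1.0 / (r - 1))

print(max_bad_indices(3), minimal_r(3))
```

For d = 3 and K = Q one has d^d = 27, so at most 4 of the divisors can fail to be prime and r = 5 suffices.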
Remark 9.6.17. If we replace εw by [Tw : Kv ]/[F : K] according to our normalizations
in 1.3.12 then Theorem 9.6.2 and its proof hold for any field K with product formula.
Moreover, in Runge’s theorem, we can still say that the height of such points is effectively
bounded by a constant c . In the proof, some care is needed in case of a finite characteristic.
For F , we choose a sufficiently large finite normal extension and the argument shows that
we may replace [K(P ) : K] and [K(Q) : K] in the basic condition of Theorem 9.6.6 by
their separable degrees.
We claim that the same arguments as in the proof of Theorem 9.6.9 show that a field K
with product formula and of characteristic 0 is Hilbertian.
To see this, we choose a non-archimedean v ∈ MK and α ∈ K such that log |α|v is
above the bound c in Runge’s theorem. Every n ∈ N coprime to the residue characteristic
of v is a unit in Rv and hence $|\alpha|_v = |\alpha + n|_v \ge e^c$. We conclude that h(α + n) ≥ c .
We apply Runge’s theorem in the proof of Theorem 9.6.9 to the fibres $f_b^{-1}(\alpha + n)$ and
with S such that α is an S -unit. In the same way as in the number field case, we deduce
that the set of n ∈ N coprime to char(k(v)) and with $f^*[\alpha + n]$ not a prime divisor over K
has natural density 0 in N .
In fact, R. Weissauer [329] proved by different methods from model theory that every field
with product formula is Hilbertian.

9.7. Bibliographical notes

Large parts of this chapter are borrowed from the books of Lang [169] and Serre
[277]. The additional remarks 9.2.9–9.2.12 on dynamical systems and further
information may be found in G.S. Call and J.H. Silverman [55] and J.H. Silverman
[285]. The results from Section 9.4 are from D. Mumford [211]. Finally, Section
9.5 is due to Néron [218], see also [169], Ch.11.
As mentioned in the introduction, Néron obtained in [217] a best model for abelian
varieties A over the quotient field of a discrete valuation ring R . This Néron
model is a smooth group scheme over R with generic fibre A . It is proper over R
if and only if A has good reduction. A detailed account for Néron models may be
found in the book of S. Bosch, W. Lütkebohmert, and M. Raynaud [44].
In [218], an interpretation of the Néron symbol in terms of intersection multiplic-
ities on the Néron model is given as in Example 9.5.22. For the singular case,
we refer to [169], Ch.11, §5. Over the complex numbers, Néron [218] has also
expressed the Néron symbol in terms of theta functions. For explicit formulas
of Néron’s canonical local height on elliptic curves, we refer to Silverman [286],
Ch.VI, §3, §4.
Beilinson and Bloch have generalized the Néron pairing to a pairing (Y, Z)u for
disjoint cycles Y and Z algebraically equivalent to 0 on the smooth complete
variety X satisfying dim(Y ) + dim(Z) = dim(X) + 1 (see A. Beilinson [20], S.
Bloch [27], where there are further generalizations and related conjectures). For a
general approach of canonical local and global heights of subvarieties with respect
to line bundles, the reader is referred to [141].
The literature on Hilbert’s irreducibility theorem is quite ample and we refer to
Lang [169], Ch.9 for a treatment of rather general cases. A critical analysis of both
early and modern works on the subject is in Schinzel’s monograph on polynomials
[259]. Very general results were obtained by Weissauer [329] using methods of
non-standard analysis and logic; proofs on classical lines were given by M. Fried
[124].
Theorem 9.6.2 is due essentially to V.G. Sprindžuk [289], who used quite different
techniques related to the theory of G -functions. The proof given here is in [28],
with the corrections provided by P. Dèbes [86]. See also P. Dèbes [87] and P. Dèbes
and U. Zannier [88]. The proof of Runge’s theorem 9.6.6 follows an argument in
[28]. The proof of Theorem 9.6.9 is a modification of [88] avoiding the theory of
G -functions and using translations, rather than the general Möbius transformations
of [88] and [124], in order to deal with Hilbert subsets of N .
10 THE MORDELL–WEIL THEOREM

10.1. Introduction

The main content of this chapter is the proof of the Mordell–Weil theorem, namely
the finite generation of the group of rational points of an abelian variety defined
over a number field.
The finiteness of the rank of the group of rational points on an elliptic curve E
defined over Q was proved by L.J. Mordell in his celebrated paper [207]. Mordell
worked with the elliptic curve given by a quartic equation y 2 = a0 x4 +. . .+a4 and
used its parametrization by means of Jacobi elliptic functions and theta functions.
It was by no means obvious at the time how to extend this result to elliptic curves
over number fields and to abelian varieties, and this was done by Weil in his famous
thesis [324].
A. Weil [325] also realized that, in the case of elliptic curves, it was somewhat
simpler to work with a Weierstrass model rather than the quartic equation used
by Mordell, replacing the addition and duplication formulas of elliptic functions
used by Mordell by rational functions on the curve; since then this has become
the standard elementary approach to the Mordell–Weil theorem for elliptic curves
over a field.
The basic structure of the proof, which remains unchanged until now, is in two
stages. The first stage consists in proving the so-called weak Mordell–Weil the-
orem, namely the finiteness of A(K)/φA(K) for some non-trivial isogeny φ of
the abelian variety A , usually taken to be [m], namely multiplication by an inte-
ger m ≥ 2. In the second stage, we use a Fermat descent argument to complete
the proof. The explicit approach by Mordell using elliptic functions, even with
Weil’s simplifications, is not practical enough to be carried out explicitly on ellip-
tic curves when dealing with multiplication by a general m , let alone on abelian
varieties, which we do not quite know how to describe by means of a useful ex-
plicit set of equations. Thus in the general case we follow a more abstract point of
view, culminating in the systematic use of Galois cohomology in the proof.

In this chapter, we shall follow a mid-course, beginning in Section 10.2 with the
naive proof of the weak Mordell–Weil theorem for an elliptic curve over a number
field.
Section 10.3 contains a detailed proof of the important Chevalley–Weil theorem
on unramified morphisms, with its standard application (Corollary 10.3.13) that,
for an abelian variety A over a number field K , the extension $K(\tfrac{1}{m} A(K))/K$ is
finite.
This leads in Section 10.4 to an alternative proof of the weak Mordell–Weil the-
orem through the Kummer pairing, and its standard interpretation in Galois co-
homology is the content of Section 10.5. We give also an extension of the weak
Mordell–Weil theorem suitable to function fields of transcendence degree 1. The
final short Section 10.6 concludes the proof of the Mordell–Weil theorem by means
of the Fermat descent.
The Mordell–Weil theorem as proved here is ineffective. Indeed, as yet no general
method is known for finding generators for the Mordell–Weil group A(K) of an
abelian variety A over a number field K . The ineffectiveness arises from the fact
that no procedure is known for finding representatives in A(K) of the finite group
A(K)/φA(K), due to our inability to decide whether a homogeneous space for A
has a rational point over K and, if so, to find an algorithm to produce such a point.
The question appears to be extraordinarily deep, even in the case of elliptic curves
over Q , and represents one of the most interesting open diophantine problems at
the time of writing this book.
For Section 10.2, the reader is assumed to be familiar with the basics of elliptic
curves presented in 8.3. For the other sections, we recommend first reading the
whole of Chapter 8. The proof of the Mordell–Weil theorem in 10.6 uses also the
fundamental properties of the Néron–Tate height from 9.2 and 9.3.

10.2. The weak Mordell–Weil theorem for elliptic curves

In this section we prove the finiteness of E(K)/2E(K), for an elliptic curve E
over a field K of characteristic char(K) ≠ 2.
10.2.1. We have seen in 8.3.4 that E may be viewed as a plane curve in P2K , given
in standard affine coordinates by
y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6
for some ai ∈ K . Replacing y by $y - \tfrac{1}{2}(a_1 x + a_3)$ (which is allowed because
char(K) ≠ 2), we may assume that a1 = a3 = 0. Therefore, after this simplification,
the affine part of E has equation
y 2 = (x − α1 ) (x − α2 ) (x − α3 )
with $\alpha_i \in \overline{K}$, i = 1, 2, 3.

10.2.2. The intersection of E with the line P2K \ A2K in P2K is a divisor 3O , and
the point O = (0 : 0 : 1) is an inflexion point of E , which is taken as the identity
of the group multiplication. The affine part of E is simply E \ {O}, and in what
follows it will prove to be notationally convenient to write a point P ∈ E \ {O}
as P = (x, y) in terms of its affine coordinates.
Proposition 10.2.3. Let α1 , α2 , α3 ∈ K and f (x) := (x−α1 ) (x−α2 ) (x−α3 ).
Then the plane curve X in P2K , given in affine coordinates by y 2 = f (x), is an
elliptic curve over K if and only if the discriminant
$$D_f := \prod_{i \neq j} (\alpha_i - \alpha_j)$$
of f is not 0.
Proof: It is easy to show that X is irreducible. By the Jacobi criterion A.7.15,
(x, y) ∈ X is a singular point if and only if
$$\frac{\partial}{\partial x}\left(y^2 - f(x)\right) = 0 \quad \text{and} \quad \frac{\partial}{\partial y}\left(y^2 - f(x)\right) = 0,$$
hence if and only if f ′(x) = 0 and 2y = 0. But y = 0 is equivalent to f (x) = 0,
while f and f ′ have a common zero if and only if the discriminant Df of f
vanishes.
In the affine neighborhood {y ≠ 0} of the point at infinity O , a defining equation
for X is given by the polynomial
g(z, x) := z − (x − zα1 ) (x − zα2 ) (x − zα3 )
and since $\frac{\partial g}{\partial z}(0, 0) = 1$, the point O is always smooth.
We conclude that X is smooth if and only if Df ≠ 0. In this case, the genus
formula for plane curves in A.13.4 yields that X is an elliptic curve.
Under the assumptions of 10.2.1, we describe now the morphism [2] explicitly.
Let (x, y) be the standard affine coordinates of E . We have a group structure on
E given by the zero element O at infinity and we have the geometric description
of addition and inverse given in 8.3.6–8.3.7.
Let P = (x0 , y0 ) ∈ E(K); then −2P is equal to the third intersection point of
the tangent at P with E . If 2P = −2P = O , this tangent is vertical, proving:
Proposition 10.2.4. The group E[2] of 2-torsion points of E consists of the iden-
tity element O and the points (αi , 0), i = 1, 2, 3, of order 2.
10.2.5. Let P = (x0 , y0 ) ∈ E(K) and suppose that P is not a 2-torsion point.
The tangent line at P has equation
y = ax + b (10.1)
with a, b determined as in 8.3.8, namely a = f ′(x0 )/(2y0 ) and b = y0 − ax0 .
In order to determine the x -coordinate of the third intersection point −2P , we
eliminate y from (10.1) and the equation y 2 = f (x), obtaining the equation
(ax + b)2 − (x − α1 ) (x − α2 ) (x − α3 ) = 0. (10.2)
The polynomial on the left has a zero at x0 of multiplicity at least 2 (it is 3 if P
is a torsion point of order 3 on E ), which accounts for two solutions. The third
solution x1 is the x -coordinate of −2P . Hence factoring the left-hand side of
(10.2) into linear factors yields the identity
(ax + b)2 − (x − α1 ) (x − α2 ) (x − α3 ) = −(x − x0 )2 (x − x1 )
of cubic polynomials in x . We specialize x to αi , and find
(aαi + b)2 = −(αi − x0 )2 (αi − x1 )
and
$$x_1 - \alpha_i = \left( \frac{a\alpha_i + b}{x_0 - \alpha_i} \right)^{2} \qquad (10.3)$$
for i = 1, 2, 3. In affine coordinates, by (10.1), we have 2P = (x1 , −ax1 − b).
Suppose now that αi ∈ K for i = 1, 2, 3. Then the preceding equation (10.3)
shows that x1 − αi is a square in K , for i = 1, 2, 3. The following result gives
the converse to this statement.
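The duplication formulas of 10.2.5 are easy to test on an example. The Python sketch below is an illustration only: the curve y² = x(x − 5)(x + 5) over Q, the point P = (−4, 6) and the function names are our choices, not taken from the text. It computes the tangent y = ax + b at P, obtains x1 by comparing the coefficients of x² in the factorization of the left-hand side of (10.2), and checks (10.3) together with the fact that each x1 − αi is a rational square.

```python
from fractions import Fraction
from math import isqrt

def is_rational_square(q):
    """True if the rational number q is the square of a rational number."""
    if q < 0:
        return False
    n, d = q.numerator, q.denominator  # Fraction keeps n/d in lowest terms
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d

def duplicate(alphas, P):
    """Return (2P, a, b) for an affine point P with y0 != 0 on y^2 = prod(x - alpha_i)."""
    a1, a2, a3 = alphas
    x0, y0 = P
    f_prime = (x0 - a2) * (x0 - a3) + (x0 - a1) * (x0 - a3) + (x0 - a1) * (x0 - a2)
    a = f_prime / (2 * y0)          # slope of the tangent, as in 10.2.5
    b = y0 - a * x0
    # Coefficient of x^2 in (ax+b)^2 - f(x) = -(x-x0)^2 (x-x1):
    x1 = a * a + (a1 + a2 + a3) - 2 * x0
    return (x1, -a * x1 - b), a, b

alphas = (0, 5, -5)
P = (Fraction(-4), Fraction(6))
(x1, y1), a, b = duplicate(alphas, P)
print(x1, y1)  # coordinates of 2P
for alpha in alphas:
    assert x1 - alpha == ((a * alpha + b) / (P[0] - alpha)) ** 2  # identity (10.3)
    assert is_rational_square(x1 - alpha)
```

Here a = 23/12, b = 41/3 and 2P = (1681/144, −62279/1728), with x1 − αi equal to (41/12)², (31/12)², (49/12)² for αi = 0, 5, −5.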
Lemma 10.2.6. Under the hypotheses of 10.2.1, suppose that α1 , α2 , α3 ∈ K .
Let (x1 , y1 ) be the affine coordinates of a point Q ∈ E(K), Q ≠ O . Then Q ∈
2E(K) if and only if x1 − αi is a square in K for i = 1, 2, 3.
Proof: If Q ∈ 2E(K), then x1 − αi is a square in K by (10.3). On the other
hand, let $x_1 - \alpha_i = u_i^2$, where ui ∈ K is determined up to sign. Consider the
system of equations
$$u_i = \frac{a\alpha_i + b}{c - \alpha_i}, \qquad i = 1, 2, 3$$
in the unknowns a, b , c . We substitute $\alpha_i = x_1 - u_i^2$, getting after clearing
denominators the three equations
$$(ax_1 + b) - u_1 c - u_1^2 a = -(x_1 - u_1^2)\,u_1$$
$$(ax_1 + b) - u_2 c - u_2^2 a = -(x_1 - u_2^2)\,u_2$$
$$(ax_1 + b) - u_3 c - u_3^2 a = -(x_1 - u_3^2)\,u_3.$$

We view this as an inhomogeneous linear system in the unknowns ax1 + b , −c ,
−a. The determinant is the Vandermonde determinant (u2 − u1 )(u3 − u1 )(u3 −
u2 ), which is a factor of
$$(u_2^2 - u_1^2)\,(u_3^2 - u_1^2)\,(u_3^2 - u_2^2) = (\alpha_1 - \alpha_2)\,(\alpha_1 - \alpha_3)\,(\alpha_2 - \alpha_3),$$
whose square equals Df up to sign, hence is not 0 by Proposition 10.2.3.
Since x1 and u1 , u2 , u3 are in K , solving the system by Cramer’s rule shows
that ax1 + b, −c, −a are in K . Since the morphism [2] is surjective, we know
a priori that there is a point $P = (x_0, y_0) \in E(\overline{K})$ such that Q = 2P . For
a suitable choice of u1 , u2 , u3 ∈ K , we conclude from (10.3) that x0 = c and
a = f ′(x0 )/(2y0 ), b = y0 − ax0 as in 10.2.5. Hence P is K -rational. This
proves the claim.
Remark 10.2.7. The numbers ui are determined only up to sign, hence we have
eight possible choices for the triple (u1 , u2 , u3 ). The corresponding rational points
P give the four division points for which 2P has first coordinate x1 . Since Q and
−Q have the same first coordinate x1 and are distinct, the eight solutions we have
found are the eight division points for which 2P = ±Q.
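The proof of Lemma 10.2.6 is constructive, and Remark 10.2.7 describes what the eight sign choices produce. The following Python sketch works this out over Q; it is an illustration under our own naming (`halves`, `double`, `rational_sqrt`), and it solves the Vandermonde system by Newton divided differences rather than by Cramer's rule as in the text. For each triple (u1, u2, u3) it recovers (a, b, c), forms P = (c, ac + b) and keeps P if 2P = Q.

```python
from fractions import Fraction
from itertools import product
from math import isqrt

def rational_sqrt(q):
    """Non-negative rational square root of q, or None if q is not a square in Q."""
    if q < 0:
        return None
    rn, rd = isqrt(q.numerator), isqrt(q.denominator)
    if rn * rn != q.numerator or rd * rd != q.denominator:
        return None
    return Fraction(rn, rd)

def double(alphas, P):
    """2P for an affine point P with y0 != 0 on y^2 = prod(x - alpha_i)."""
    a1, a2, a3 = alphas
    x0, y0 = P
    a = ((x0 - a2) * (x0 - a3) + (x0 - a1) * (x0 - a3) + (x0 - a1) * (x0 - a2)) / (2 * y0)
    b = y0 - a * x0
    x1 = a * a + a1 + a2 + a3 - 2 * x0
    return (x1, -a * x1 - b)

def halves(alphas, Q):
    """All rational points P with 2P = Q, found via the linear system of Lemma 10.2.6."""
    x1, _ = Q
    roots = [rational_sqrt(x1 - alpha) for alpha in alphas]
    if any(r is None for r in roots):
        return []  # some x1 - alpha_i is not a square, so Q is not in 2E(Q)
    found = []
    for signs in product((1, -1), repeat=3):
        u = [s * r for s, r in zip(signs, roots)]
        rhs = [-(x1 - ui * ui) * ui for ui in u]
        # Interpolate A + B*u + C*u^2 through (u_i, rhs_i); (A, B, C) = (a*x1+b, -c, -a).
        d12 = (rhs[1] - rhs[0]) / (u[1] - u[0])
        d23 = (rhs[2] - rhs[1]) / (u[2] - u[1])
        C = (d23 - d12) / (u[2] - u[0])
        B = d12 - C * (u[0] + u[1])
        A = rhs[0] - B * u[0] - C * u[0] * u[0]
        a, c = -C, -B
        b = A - a * x1
        P = (c, a * c + b)
        if double(alphas, P) == Q:  # the other sign choices give 2P = -Q
            found.append(P)
    return found

alphas = (0, 5, -5)
Q = (Fraction(1681, 144), Fraction(-62279, 1728))
print(halves(alphas, Q))
```

On the curve y² = x(x − 5)(x + 5) with Q = 2 · (−4, 6), four of the eight sign choices return a point P with 2P = Q, one of them being (−4, 6); the other four give the negatives of these points, which double to −Q.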
10.2.8. Next, we consider addition. Let P1 , P2 , P3 ∈ E(K) such that P1 + P2 +
P3 = O and Pi ≠ O , i = 1, 2, 3. Let y = ax + b be the line through P1 , P2 , P3
and let (xi , yi ) be the affine coordinates of the points Pi . Equation (10.2) has
roots x1 , x2 , x3 , giving
(ax + b)2 − (x − α1 ) (x − α2 ) (x − α3 ) = (x1 − x) (x2 − x) (x3 − x). (10.4)
As in the proof of (10.3) we set x = αi and get
(aαi + b)2 = (x1 − αi ) (x2 − αi ) (x3 − αi ) (10.5)
for i = 1, 2, 3.

This gives us evidence for a group homomorphism $\varphi_i : E(K) \longrightarrow K^{\times}/K^{\times 2}$
given by $(x, y) \longmapsto x - \alpha_i \bmod K^{\times 2}$, where $K^{\times 2}$ denotes the squares in $K^{\times} =
K \setminus \{0\}$. However, this is defined only for x ≠ αi . If we proceed as before but
with P1 = (αi , 0), then differentiating (10.4) at the point αi yields the equation
−(αi − αj ) (αi − αk ) = −(x2 − αi ) (x3 − αi ), (10.6)
where j, k are the remaining two indices. Now, as we will verify in a moment, we
obtain a homomorphism by setting
$$\varphi = (\varphi_1, \varphi_2, \varphi_3) : E(K) \longrightarrow (K^{\times}/K^{\times 2})^3$$
with
$$\varphi_i(P) := \begin{cases} 1 & \text{if } P = O, \\ x - \alpha_i \bmod K^{\times 2} & \text{if } P = (x, y),\ x \neq \alpha_i, \\ (\alpha_i - \alpha_j)(\alpha_i - \alpha_k) \bmod K^{\times 2} & \text{if } P = (\alpha_i, 0), \end{cases}$$
where again j, k denote the two other indices.
Lemma 10.2.9. The map ϕ : E(K) −→ (K × /K ×2 )3 is a group homomorphism
with kernel 2E(K).
Proof: By (10.5), (10.6), and commutativity we have
ϕi (P1 )ϕi (P2 )ϕi (P3 ) = 1 (10.7)
provided the points P1 , P2 , P3 are different from O and at most one is equal to
the 2-torsion point (αi , 0).
If P1 = O , then P3 = −P2 and P2 and P3 have the same x-coordinate, and
(10.7) becomes obvious. It remains to consider the case in which Pi =
(αi , 0) for i = 1, 2, 3. In this case (10.7) follows immediately from the definition.
Since ϕi (P ) = ϕi (−P ) = 1/ϕi (P ), it follows that ϕi is a group homomor-
phism. By Lemma 10.2.6, ker(ϕ) = 2E(K). 
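For K = Q the group Q×/Q×2 can be handled by squarefree integer representatives, which makes Lemma 10.2.9 easy to test numerically. The sketch below is an illustration only: the representation of classes, the function names and the example curve y² = x(x − 5)(x + 5) with P = (−4, 6) and the 2-torsion point T = (0, 0) are our choices. It evaluates ϕ on P, T, P + T and 2P and compares the results.

```python
from fractions import Fraction

def squarefree_class(q):
    """Squarefree integer representing a nonzero rational q in Q*/Q*^2."""
    n = q.numerator * q.denominator  # same class as q, since n = q * denominator^2
    sign = -1 if n < 0 else 1
    n = abs(n)
    rep, p = 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent % 2:
            rep *= p
        p += 1
    return sign * rep * n  # what is left of n is 1 or a prime

def phi(alphas, P):
    """The triple phi(P) of 10.2.8; P is None for the point O."""
    if P is None:
        return (1, 1, 1)
    x = P[0]
    out = []
    for i, alpha in enumerate(alphas):
        others = [alphas[j] for j in range(3) if j != i]
        value = x - alpha if x != alpha else (alpha - others[0]) * (alpha - others[1])
        out.append(squarefree_class(Fraction(value)))
    return tuple(out)

def multiply(c1, c2):
    """Componentwise product of two classes in (Q*/Q*^2)^3."""
    return tuple(squarefree_class(Fraction(a * b)) for a, b in zip(c1, c2))

def add_distinct(alphas, P1, P2):
    """P1 + P2 for affine points with different x-coordinates (chord construction)."""
    (x1, y1), (x2, y2) = P1, P2
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    x3 = a * a + sum(alphas) - x1 - x2  # compare the x^2 coefficients in (10.4)
    return (x3, -(a * x3 + b))

alphas = (0, 5, -5)
P = (Fraction(-4), Fraction(6))
T = (Fraction(0), Fraction(0))
S = add_distinct(alphas, P, T)
assert phi(alphas, S) == multiply(phi(alphas, P), phi(alphas, T))
print(phi(alphas, P), phi(alphas, T), phi(alphas, S))
```

One finds ϕ(P) = (−1, −1, 1), ϕ(T) = (−1, −5, 5) and ϕ(P + T) = (1, 5, 5), while the point 2P = (1681/144, −62279/1728) is mapped to the trivial class, in accordance with ker(ϕ) = 2E(K).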
Proposition 10.2.10. Let R be a unique factorization domain with quotient field
K . Assume that char(K) ≠ 2 and that the group of units R× in R is finitely
generated. Let E be the elliptic curve given by
y 2 = (x − α1 ) (x − α2 ) (x − α3 ),
where α1 , α2 , α3 are distinct elements of R . Then
$$|E(K)/2E(K)| \le 4^{r} \cdot 2^{\sum_{i<j} \omega(\alpha_j - \alpha_i)},$$
where ω(α) denotes the number of distinct prime factors of α ∈ R \ {0} and
where r is the dimension of the F2 -vector space R× /R×2 .
Proof: By Lemma 10.2.9, we have a homomorphism ϕ : E(K) −→ (K × /K ×2 )3
with kernel 2E(K). We need to estimate the cardinality of the image.
Let S be a set of representatives of the primes of R and let P ∈ E(K) \ {O}
with affine coordinates (x, y). For i = 1, 2, 3, there are bi ∈ K, ui ∈ R× and ai
a product of distinct primes of S such that
x − αi = b2i ui ai . (10.8)
It follows from
y 2 = (x − α1 ) (x − α2 ) (x − α3 )
that the primes of the denominator of x occur with even multiplicity. Therefore, ai
is coprime to the denominator of bi and, substituting (10.8) into the last equation,
we see that a1 a2 a3 and u1 u2 u3 are squares in R . Hence there are c1 , c2 , c3 ∈ R ,
pairwise coprime and product of distinct primes of S , such that
a1 = c2 c3 , a2 = c1 c3 , a3 = c1 c2 .
Let π be a prime of S dividing ci . Since aj and the denominator of bj are
coprime, π divides the numerator of x − αj for j ≠ i and it follows that π
divides αj − αk , where j, k are the other two indices. Therefore, the number of
possibilities for π is bounded by ω(αj − αk ) and there are at most
$$2^{\sum_{i<j} \omega(\alpha_j - \alpha_i)}$$
such triples (c1 , c2 , c3 ).
The image of R× in K × /K ×2 is isomorphic to $\mathbb{F}_2^r$, restricting the range of ui
to $2^r$ possibilities. We also know that u1 u2 u3 is a square, hence the number of
triples (u1 , u2 , u3 ) is bounded by $4^r$.
Finally, it is easily seen that the points of ϕ(E[2]) may be represented by (u1 c2 c3 ,
u2 c3 c1 , u3 c1 c2 ) for admissible choices of ui and cj . 
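For R = Z the bound of Proposition 10.2.10 is straightforward to evaluate: the units are ±1, so r = 1, and ω counts distinct prime divisors. The Python sketch below is an illustration with our own function names and example curves; it only evaluates the right-hand side of the inequality and makes no attempt to compute E(Q)/2E(Q) itself.

```python
def omega(n):
    """Number of distinct prime factors of a nonzero integer n."""
    n = abs(n)
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

def two_descent_bound(alphas, r=1):
    """4^r * 2^(sum of omega(alpha_j - alpha_i), i < j); r = 1 corresponds to R = Z."""
    a1, a2, a3 = alphas
    exponent = omega(a2 - a1) + omega(a3 - a1) + omega(a3 - a2)
    return 4 ** r * 2 ** exponent

print(two_descent_bound((0, 5, -5)))  # y^2 = x(x - 5)(x + 5)
print(two_descent_bound((0, 1, -3)))  # y^2 = x(x - 1)(x + 3)
```

For y² = x(x − 5)(x + 5) the differences are 5, −5, −10, giving the bound 4 · 2⁴ = 64; for y² = x(x − 1)(x + 3) the differences 1, −3, −4 give 4 · 2² = 16.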
The following corollary is a special case of the weak Mordell–Weil theorem for
elliptic curves:
Corollary 10.2.11. Let E be an elliptic curve over a number field K with 2-
torsion also defined over K . Then E(K)/2E(K) is finite.
Proof: The ring of integers OK is not necessarily a unique factorization domain.
However, by Proposition 5.3.6, we can find a finite set of places S in K such that
for any finite set of places T ⊂ MK with T ⊃ S , the ring R of T -integers in
K is a unique factorization domain. Its group of units R× is finitely generated by
Dirichlet’s unit theorem in 1.5.13. We have seen in 10.2.1 and Proposition 10.2.4
that E is K -isomorphic to an elliptic curve of the form required in Proposition
10.2.10, because we can always enlarge the ring R so as to ensure that every
αi ∈ R . The result now follows from Proposition 10.2.10. 
10.2.12. If the 2-torsion of E is not defined over the number field K , we can
still prove the finiteness of E(K)/2E(K) by an additional argument. Of course,
a proof is obtained by performing a base change to a finite extension L of K over
which the 2-torsion becomes rational, then prove the finiteness of E(L)/2E(L),
then prove that E(L) is a finitely generated abelian group by Fermat descent, and
conclude, by general results about abelian groups, that the subgroup E(K) is also
finitely generated of rank not exceeding the rank of E(L).
Here we give a simple-minded direct proof of this result.
Lemma 10.2.13. Let A be an abelian variety defined over a field K and let L be
a finite separable extension of K . Let m be a positive integer and suppose that
A(L)/mA(L) is a finite group. Then A(K)/mA(K) is a finite group.
Proof: Let $d = [L : K]$ and let $\delta$ be a positive integer such that we can write $d = d_0 d_1$ with $d_0 \mid m^{\delta-1}$ and $\mathrm{GCD}(d_1, m) = 1$. Let also $E$ be a set of representatives of $A(L)/mA(L)$ in $A(L)$.
The group $A(L)/m^\delta A(L)$ is again a finite group, since a set of representatives for it is contained in the finite set
\[ E(\delta) = E + mE + \dots + m^{\delta-1}E. \]
Let $F$ be the Galois closure of $L$ over $K$, let $G = \mathrm{Gal}(F/K)$, let $H$ be the subgroup of $G$ of index $d$ fixing $L$, and denote by $R$ a full set of representatives for the left-cosets of $H$ in $G$.
Let $x \in A(K) \subset A(L)$. Then we have
\[ x - m^\delta y \in E(\delta) \]
for some $y \in A(L)$. We apply the automorphisms $\sigma \in R$ to this equation and deduce
\[ dx - m^\delta z \in E'(\delta), \]
where $z := \sum_{\sigma} \sigma y$ and
\[ E'(\delta) := \sum_{\sigma \in R} \sigma E(\delta). \]
Clearly, $z \in A(K)$ because any element $\tau \in \mathrm{Gal}(F/K)$ permutes the left-cosets of $H$. Since $d_0$ divides $m^{\delta-1}$, we may divide by $d_0$, getting
\[ d_1 x - m(m^{\delta-1}/d_0)z \in A(K) \cap \tfrac{1}{d_0}E'(\delta) \]
and $A(K) \cap \frac{1}{d_0}E'(\delta)$ is still a finite set (use Proposition 8.7.2).
Finally, since $d_1$ and $m$ are coprime, the euclidean algorithm produces integers $u$ and $v$ such that $d_1 u - mv = 1$. After multiplication by $u$, it follows that
\[ x - m\bigl((m^{\delta-1}/d_0)uz - vx\bigr) \in A(K) \cap u\,\tfrac{1}{d_0}E'(\delta). \]
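To see how the integers in this proof fit together, here is a small numerical instance. The values $d = 6$, $m = 4$ are chosen purely for illustration and are not taken from the text.

```latex
% Illustrative values only: d = [L:K] = 6, m = 4.
\[
  d = 6 = d_0 d_1, \qquad d_0 = 2, \quad d_1 = 3, \quad \delta = 2
  \quad (d_0 \mid m^{\delta-1} = 4, \ \gcd(d_1, m) = 1).
\]
% Dividing  dx - m^\delta z = 6x - 16z  by d_0 = 2:
\[
  d_1 x - m\,(m^{\delta-1}/d_0)\,z = 3x - 8z .
\]
% Bezout relation d_1 u - m v = 1 with u = 3, v = 2:
\[
  3 \cdot 3 - 4 \cdot 2 = 1, \qquad
  u\,(3x - 8z) = 9x - 24z = x - 4\,(6z - 2x)
  = x - m\bigl((m^{\delta-1}/d_0)\,u z - v x\bigr).
\]
```

So in this instance every $x \in A(K)$ is congruent modulo $4A(K)$ to an element of the finite set $A(K) \cap 3 \cdot \tfrac{1}{2}E'(2)$, as in the last line of the proof.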
Another proof of this lemma more in line with the Kummer theory will be given
in Lemma 10.4.3. If we combine the lemma with Corollary 10.2.11, we conclude
with the weak Mordell–Weil theorem for elliptic curves.
Theorem 10.2.14. Let E be an elliptic curve defined over a number field K .
Then E(K)/2E(K) is finite.

10.3. The Chevalley–Weil theorem

The reader is assumed to be familiar with the concepts of bounded sets from Sec-
tion 2.6 and ramification from Section A.12 and Appendix B. To fix the ideas, let
us consider for the moment an unramified finite morphism ϕ : Y → X of pro-
jective varieties over a field K . For P ∈ Y, Q := ϕ(P ) and a discrete valuation
v of K , we may ask the question whether the field extension K(P )/K(Q) is
unramified at all places over v . This is not true in general, but the local form of
the theorem of Chevalley–Weil states that the discriminant of the completions of
K(P )/K(Q) divides an element of the valuation ring independent of P .
This occupies us in the first part of this section, and then we globalize the statement
to families of places. In particular, if K is a number field, then we will see that
K(P )/K(Q) is unramified over all but finitely many places v .
Together with Hermite’s theorem, this will lead us to the finiteness of $K(\frac{1}{m}A(K))/K$ for an abelian variety $A$ over $K$ and $m \in \mathbb{Z} \setminus \{0\}$.
10.3.1. Let $K$ be a field with a non-archimedean absolute value $|\ |$ on the algebraic closure $\overline{K} \supset K$. We will first prove a local version of the Chevalley–Weil theorem with respect to this absolute value, under fairly general conditions. We need to introduce first some notation:
Recall that, for $Q \in X(\overline{K})$, the residue field $K(Q)$ is the intermediate field of $\overline{K}/K$ generated by the coordinates of $Q$ (in any affine chart). Let $R_Q$ be the valuation ring of the restriction of $|\ |$ to $K(Q)$. The valuation ring of the restriction of $|\ |$ to $K$ is denoted by $R$, and completions with respect to $|\ |$ will be denoted by a hat $\widehat{\ \ }$.
Let $\varphi : Y \to X$ be a morphism of $K$-varieties and let $P \in Y(\overline{K})$ with image $Q \in X(\overline{K})$. Since $\varphi$ is defined over $K$, we have $K(Q) \subset K(P)$ and $R_Q = K(Q) \cap R_P$.
We define the local discriminant
\[ \widehat{d}_{P/Q} := \bigl\{ \det\bigl(\mathrm{Tr}_{\widehat{K(P)}/\widehat{K(Q)}}(a_i b_j)\bigr) \bigm| a_1, \dots, a_d, b_1, \dots, b_d \in \widehat{R}_P \bigr\}, \]
where we have abbreviated $d$ for the local degree $d_P := [\widehat{K(P)} : \widehat{K(Q)}]$. If $R$ is a discrete valuation ring, then this agrees with the discriminant $d_{\widehat{R}_P/\widehat{R}_Q}$ from B.1.14.
Lemma 10.3.2. The discriminant $\widehat{d}_{P/Q}$ is an ideal in $\widehat{R}_Q$.
Proof: Replacing $a_1$ by $\lambda a_1$ for any $\lambda \in \widehat{R}_Q$, it is evident that
\[ \widehat{R}_Q\, \widehat{d}_{P/Q} \subset \widehat{d}_{P/Q}. \]
To prove the claim, it is enough to show that the trace of any $a \in \widehat{R}_P$ is in $\widehat{R}_Q$. This follows from the fact that a subset $S$ of a valuation ring $R$ is an ideal if and only if $RS \subset S$. Let $f(t) = t^m + a_{m-1}t^{m-1} + \dots + a_0$ be the minimal polynomial of $a$ over $\widehat{K(Q)}$. By transitivity of traces, it is clear that $\mathrm{Tr}_{\widehat{K(P)}/\widehat{K(Q)}}(a)$ is a $\mathbb{Z}$-multiple of $a_{m-1}$. Completeness of $\widehat{K(Q)}$ ensures that all conjugates of $a$ have the same absolute value (use Proposition 1.2.7) and hence $f(t) \in \widehat{R}_Q[t]$. This proves $\mathrm{Tr}_{\widehat{K(P)}/\widehat{K(Q)}}(a) \in \widehat{R}_Q$ and the claim.
Now we state the local Chevalley–Weil theorem. Briefly, it means that for an
unramified morphism, the extension K(P )/K(Q) cannot be too ramified.
Proposition 10.3.3. Let us fix an embedding of the field $K$ into $\overline{K}$ and let $|\ |$ be a non-archimedean absolute value on $\overline{K}$. Let $\varphi : Y \to X$ be a finite unramified morphism of $K$-varieties and $E$ be a bounded set in $X$. Then there is $\alpha \in R \setminus \{0\}$ such that $\alpha \in \widehat{d}_{P/Q}$ whenever $P \in Y(\overline{K})$, with $Q := \varphi(P) \in E$.
Proof: It is known that an unramified morphism is locally equal to a closed embedding followed by a standard étale morphism (see A.12.17). This means that for every $y \in Y$ there is an affine neighborhood $V$ of $y$ and an affine neighborhood $U$ of $\varphi(y)$ such that $\varphi|_V = \psi \circ i$, where $i : V \to W$ is a closed embedding and $\psi : W \to U$ is a standard étale morphism. To recall the latter, let $A := K[U]$. Then there is a monic polynomial $f(t) \in A[t]$ and an affine variety $W'$ with coordinate ring $K[W'] = A[t]/fA[t]$ such that $W$ is an open subset of $W'$ and the formal derivative $f'(t)$ is a unit on $W$.
Since the finite morphism $\varphi$ is proper (see A.12.4), Proposition 2.6.6 shows that $\varphi^{-1}(E)$ is bounded in $Y$. There are finitely many $U_j$, $V_j$ of the above form covering $X$ and $Y$, respectively. By Remark 2.6.3, there is a decomposition of $\varphi^{-1}(E)$ into bounded sets $E_j$ of $V_j$. Hence it is enough to prove the following claim:
Let $\psi : W \to U$ be a standard étale morphism as above and let $E'$ be a bounded subset of $W$. Then there is $\alpha \in R \setminus \{0\}$ such that $\alpha \in \widehat{d}_{P/\varphi(P)}$ for all $P \in E'$.
Let
\[ f(t) = t^d + a_1 t^{d-1} + \dots + a_d, \quad a_i \in A. \]
By boundedness, there is $a \in R \setminus \{0\}$ such that
\[ \max_{i=1,\dots,d}\ \sup_{P \in E'} |a_i(\varphi(P))| \le |a|^{-1}. \]
Note that, if the restriction of $|\ |$ to $K$ is trivial, then $|\ |$ is trivial on $\overline{K}$ (use Proposition 1.2.7). Let $P \in E'$ and let $Q := \varphi(P)$. We have a natural surjection of $K[W]$ onto the residue field $K(P)$. Moreover, we have a commutative diagram
\[ \begin{array}{ccc} K[U] & \twoheadrightarrow & K(Q) \\ \downarrow & & \downarrow \\ K[W'] & \twoheadrightarrow & K(P) \end{array} \]
of natural ring homomorphisms. Therefore, $K(P)$ is generated by the image $t_P$ of $t$ as a $K(Q)$-algebra. Since $t_P$ is a zero of the polynomial
\[ f_Q(t) := t^d + a_1(Q)t^{d-1} + \dots + a_d(Q), \]
the element $\xi := at_P \in K(P)$ is a zero of
\[ g_Q(t) := t^d + a a_1(Q) t^{d-1} + \dots + a^d a_d(Q). \]
By the choice of $a$, we have $g_Q(t) \in R_Q[t]$ and
\[ |\xi|^d \le \max_{i=1,\dots,d} |a|^i |a_i(Q)|\, |\xi|^{d-i} \le \max\bigl(1, |\xi|^{d-1}\bigr). \]
It follows that $|\xi| \le 1$, i.e. $\xi \in R_P$. Let $g_\xi$ be the minimal polynomial of $\xi$ over $\widehat{K(Q)}$. It is a monic polynomial and there is a monic polynomial $h \in \widehat{K(Q)}[t]$ such that $g_Q = g_\xi h$. By Gauss’s lemma (see Lemma 1.6.3), we have $g_\xi, h \in \widehat{R}_Q[t]$. The elements
\[ 1, \xi, \dots, \xi^{d-1} \]
form a basis of the $\widehat{K(Q)}$-vector space $\widehat{K(P)}$, which is contained in $\widehat{R}_P$. It follows that the discriminant $D_{g_\xi} \in \widehat{d}_{P/Q}$. By B.1.13 and Proposition 1.2.7, we have
\[ |D_{g_\xi}| = \bigl| N_{\widehat{K(P)}/\widehat{K(Q)}}(g_\xi'(\xi)) \bigr| = |g_\xi'(\xi)|^d. \tag{10.9} \]
For derivatives, the following identities hold
\[ g_\xi'(\xi)h(\xi) = g_Q'(\xi) = a^{d-1} f_Q'(t_P) = a^{d-1} f'(P). \tag{10.10} \]
By assumption, $f'$ is a unit on $W$ and hence $(f')^{-1}$ is bounded on $E'$. Using (10.9) and (10.10), the same holds for $D_{g_\xi}^{-1}$ and hence there is an $\alpha \in R \setminus \{0\}$ with $|D_{g_\xi}| \ge |\alpha|$. Using $D_{g_\xi} \in \widehat{d}_{P/Q}$, we get the claim.
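The identities (10.10) used at the end of the proof follow from a rescaling of $f_Q$ and the product rule. The following sketch spells this out, using only the polynomials $f_Q$, $g_Q$, $g_\xi$, $h$ and the element $\xi = a t_P$ defined in the proof.

```latex
% g_Q is the rescaling of f_Q by a:
\[
  g_Q(t) = a^d f_Q(t/a)
  \quad\Longrightarrow\quad
  g_Q'(t) = a^{d-1} f_Q'(t/a),
  \qquad
  g_Q'(\xi) = a^{d-1} f_Q'(t_P).
\]
% Product rule for g_Q = g_\xi h, using g_\xi(\xi) = 0:
\[
  g_Q'(\xi) = g_\xi'(\xi)\,h(\xi) + g_\xi(\xi)\,h'(\xi) = g_\xi'(\xi)\,h(\xi).
\]
```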
10.3.4. In order to give the global version of the Chevalley–Weil theorem, we need an extension of the above to several absolute values. Let $M_K$ be a set of non-archimedean places on $K$.
For every $v \in M_K$, we choose an absolute value $|\ |_v$ in the equivalence class $v$. All the considerations below are independent of these choices. We assume that, for every $\alpha \in K \setminus \{0\}$, the set $\{v \in M_K \mid |\alpha|_v \neq 1\}$ is finite. For example, think of the standard absolute values on a number field or a function field. Let $M$ be a set of places on $\overline{K}$ such that the restriction map $M \to M_K$ is onto. By Proposition 1.2.8, every absolute value on $K$ extends to an absolute value on $\overline{K}$. For $u \in M$, we denote by $|\ |_u$ the absolute value in the equivalence class of $u$ extending $|\ |_v$, where $v$ is the image of $u$ in $M_K$ (here we use Proposition 1.2.3). For $u \in M$, the definitions of 10.3.1 carry over. Thus the local discriminant is now denoted by $\widehat{d}^{\,u}_{P/Q}$. For $v \in M_K$, let $R_v$ be the corresponding valuation ring in $K$.

The global Chevalley–Weil theorem has the following form:
Theorem 10.3.5. Under the hypothesis above, assume that $\varphi : Y \to X$ is a finite unramified morphism of $K$-varieties and that $(E^u)_{u \in M}$ is an $M$-bounded family in $X$. Then for any $v \in M_K$, there is a non-zero $\alpha_v \in R_v$ such that $\alpha_v \in \widehat{d}^{\,u}_{P/Q}$ whenever $u \in M$ with $u|v$ and $P \in Y(\overline{K})$ with $Q := \varphi(P) \in E^u$. Moreover, we can choose $\alpha_v = 1$ for all but finitely many $v \in M_K$.
Proof: Use the proof of Proposition 10.3.3 and keep track of the dependence on u .
Note, by the definition of boundedness, that we may choose a and α depending
only on v (and not on the particular choice of u|v ). The details are left to the
reader. 
The global Chevalley–Weil theorem for discrete valuations may be stated in terms of more familiar discriminants:
Corollary 10.3.6. Let $M_K$ be a set of discrete valuations of the field $K$, satisfying the finiteness condition of 10.3.4. Let $\varphi : Y \to X$ be an unramified finite $K$-morphism of complete $K$-varieties. Then there are a finite subset $S$ of $M_K$ and for any $v \in S$ a non-zero element $\alpha_v$ of the maximal ideal in $R_v$, such that for any $P \in Y(\overline{K})$, $Q := \varphi(P)$ and any place $w_0$ of $K(Q)$ with $w_0|v$, the following statements hold:
(a) $K(P)/K(Q)$ is unramified over $w_0$ if $v \notin S$;
(b) $\alpha_v \in d^{w_0}_{P/Q}$ if $v \in S$.
Here $d^{w_0}_{P/Q}$ is the discriminant of $\overline{R}_{w_0}$ over $R_{w_0}$, where $R_{w_0}$ is the valuation ring of $w_0$ in $K(Q)$ and $\overline{R}_{w_0}$ is its integral closure in $K(P)$.
Proof: To prove the claim, we may assume $Y$ irreducible and $\varphi$ surjective (since $\varphi$ is a closed map, A.12.4). By Proposition 2.6.17, $X(\overline{K})$ is $M$-bounded in $X$, where $M$ is the set of places on $\overline{K}$ extending those of $M_K$. Now the result follows from the remarks below, Theorem 10.3.5 and the decomposition
\[ d^{w_0}_{P/Q}\, \widehat{R}_{w_0} = \prod_{w} \widehat{d}^{\,w}_{P/Q} \]
of B.1.21, where $w$ ranges over all places in $M_{K(P)}$ with $w|w_0$. Note that the extension $K(P)/K(Q)$ is separable because $\varphi$ is unramified. The number of places $w$ lying over $w_0$ is bounded by $[K(P) : K(Q)]$ (see Remark 1.3.3) and this is uniformly bounded (by the maximum of the $d = \deg(f)$ occurring in the proof of Proposition 10.3.3). Finally note that $K(P)/K(Q)$ is unramified over $w_0$ if and only if $1 \in d^{w_0}_{P/Q}$, see B.2.13.
Now we can state the global version of the Chevalley–Weil theorem for number
fields.
Theorem 10.3.7. Let $K$ be a number field and let $\varphi : Y \to X$ be an unramified finite morphism of $K$-varieties. If $X$ is complete, then there is a non-zero $\alpha \in O_K$ such that for any $P \in Y(\overline{K})$ and $Q := \varphi(P)$ the discriminant $d_{P/Q}$ of $O_{K(P)}$ over $O_{K(Q)}$ contains $\alpha$.
Proof: By B.1.20, we have in the notation of Corollary 10.3.6
\[ d_{P/Q} = \bigcap_{w_0} \bigl( d^{w_0}_{P/Q} \cap O_{K(Q)} \bigr), \]
where $w_0$ ranges over all non-archimedean places of $K(Q)$. Now we apply Corollary 10.3.6 with $M_K$ the set of non-archimedean places of $K$. If $v \in S$, then we may choose $\alpha_v \in O_K$. Note that in a Dedekind domain the product or intersection of coprime ideals is the same. Since $d^{w_0}_{P/Q} \cap O_{K(Q)}$ is a power of the prime ideal of $O_{K(Q)}$ corresponding to $w_0$, we conclude that
\[ \alpha := \prod_{v \in S} \alpha_v \]
satisfies the claim.
Example 10.3.8. Let us consider the morphism $\varphi : \mathbb{G}_m \to \mathbb{G}_m$ over $\mathbb{Q}$ given by $x \mapsto x^2$. Then $\varphi$ is unramified and finite. Now let $d$ be a square-free integer, $|d| \ge 2$. If we view $d$ as a point $Q$ in $\mathbb{G}_m$ with coordinate $x = d$, then $P = \pm\sqrt{d}$ and $K(P) = \mathbb{Q}(\sqrt{d})$ is a quadratic extension of $K(Q) = \mathbb{Q}$. By Example B.1.16, we have
\[ d_{\mathbb{Q}(\sqrt{d})/\mathbb{Q}} = \begin{cases} d & \text{if } d \equiv 1 \bmod 4, \\ 4d & \text{if } d \equiv 2, 3 \bmod 4. \end{cases} \]
We conclude that $d_{P/Q}$ is not bounded on integer valued points. However, this is not in contrast with the Chevalley–Weil theorem because integer valued points do not form a bounded set in $\mathbb{G}_m$. If we view $\varphi$ as a map $\mathbb{G}_m \to \mathbb{P}^1_{\mathbb{Q}}$, then the above points form a bounded set in $\mathbb{P}^1_{\mathbb{Q}}$, but now $\varphi$ is no longer a finite map.
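A few concrete values make the unboundedness visible. The sample integers below are chosen here for illustration; the discriminants are computed from the case formula of the example.

```latex
% Sample square-free d (illustrative choices) and the discriminant of Q(sqrt d)/Q:
\[
\begin{array}{c|c|c}
  d & d \bmod 4 & d_{\mathbb{Q}(\sqrt{d})/\mathbb{Q}} \\ \hline
  5  & 1 & 5   \\
  -7 & 1 & -7  \\
  2  & 2 & 8   \\
  3  & 3 & 12  \\
  -5 & 3 & -20
\end{array}
\]
```

Since every prime dividing the discriminant ramifies in $\mathbb{Q}(\sqrt{d})$, letting $d$ run through the primes already gives extensions $K(P)/K(Q)$ ramified at infinitely many different places, so no single non-zero $\alpha$ can lie in all $d_{P/Q}$.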

Finally, we apply the Chevalley–Weil theorem to an abelian variety and the mor-
phism multiplication by m . In 10.3.9 and in Proposition 10.3.10, we relate the
notion of good reduction to the discriminants dP/Q introduced before. Here, the
reader should be familiar with schemes. The main reference is [44]. We shall
not need these results in the further course of this book, and the reader may move
immediately to Theorem 10.3.11.
10.3.9. Let $A$ be an abelian variety over the field $K$ and let $R_v$ be a discrete valuation ring in $K$. We say that $A$ has good reduction in $v$ if there is a proper smooth scheme $\mathcal{A}$ over $R_v$ such that we may identify the generic fibre $\mathcal{A}_K$ with $A$.
Assume also that $A$ has good reduction in $v$ and let $\mathcal{Y}$ be any smooth scheme over $R_v$. A morphism $\mathcal{Y}_K \to A$ over $K$ induces a rational map $\mathcal{Y} \dashrightarrow \mathcal{A}$, defined on the points of codimension 1 by the valuative criterion of properness (see [148], Theorem II.4.7). By the analogue of Theorem 8.2.21 for group schemes, it is a morphism. Then $\mathcal{A}$ is unique up to isomorphisms which extend the identity and is called the Néron model of $A$. For $m \in \mathbb{Z} \setminus \{0\}$, the morphism multiplication by $m$ extends to $\mathcal{A}$, as well as the group structure. It can be shown that this extension $[m]_{\mathcal{A}}$ is flat ([44], 7.3.2). If $m$ is not divisible by the characteristic of the residue field relative to $v$, then $[m]_{\mathcal{A}}$ is fibre-wise étale (by Proposition 8.7.2) and it is even an étale morphism ([44], 7.3.2).

The local Chevalley–Weil theorem for abelian varieties may be stated in the following
precise form:
Proposition 10.3.10. Let $A$ be an abelian variety over $K$, let $m \in \mathbb{Z} \setminus \{0\}$ and let $v$ be a discrete valuation on $K$. If $A$ has good reduction in $v$ and if the characteristic of the residue field does not divide $m$, then for any $P \in A(\overline{K})$, the extension $K(P)/K([m]P)$ is unramified at all places lying over $v$.
Proof: By base change, we may assume that $Q := [m]P$ is $K$-rational. Let $w$ be a place of $K(P)$ with $w|v$ and valuation ring $R_w$. By 10.3.9 and the valuative criterion of properness ([148], Theorem II.4.7), $P$ extends to an $R_w$-valued point of the scheme $\mathcal{A}$ over $R_v$. Then Proposition B.3.6 yields the claim.
The following combination of the global Chevalley–Weil theorem for number
fields and of Hermite’s theorem on discriminants of number fields is most im-
portant for applications. This is usually called the Chevalley–Weil theorem.
Theorem 10.3.11. Let $K$ be a number field, $\overline{K}$ an algebraic closure of $K$, and let $\varphi : Y \to X$ be a finite unramified morphism of $K$-varieties. If $X$ is complete, then there is a number field $L$, $K \subset L \subset \overline{K}$ such that $P \in Y(L)$ for any $P \in Y(\overline{K})$ with $\varphi(P) \in X(K)$.
Proof: We may assume $Y$ irreducible. Let $P \in Y(\overline{K})$ with $Q := \varphi(P) \in X(K)$. By Theorem 10.3.7, there is $\alpha \in O_K$, independent of $P$, such that $\alpha \in d_{P/Q}$. Using B.2.13, we conclude that $K(P)/K$ is unramified outside the finite set of places dividing $\alpha$. By Hermite’s theorem (see Corollary B.2.15), there are only finitely many possibilities for $K(P)$. This proves the claim.
10.3.12. Let $A$ be an abelian variety over $K$. For a non-zero integer $m$, we denote by $\frac{1}{m}A(K)$ the subset $[m]^{-1}A(K)$ of $A(\overline{K})$. For $S \subset A(\overline{K})$, the field $K(S)$ is the smallest intermediate field $K \subset L \subset \overline{K}$ with $S \subset A(L)$.

The Chevalley–Weil theorem for abelian varieties states the following:

Corollary 10.3.13. Let $A$ be an abelian variety defined over a number field $K$. Then $[K(\frac{1}{m}A(K)) : K] < \infty$.
Proof: By Proposition 8.7.2, the morphism [m] is finite and étale. The claim is
now obvious from Theorem 10.3.11. 

10.4. The weak Mordell–Weil theorem for abelian varieties

Let A be an abelian variety over the field K . The goal of this section is the proof
of the following statement, known as the weak Mordell–Weil theorem.
Theorem 10.4.1. Let A be an abelian variety over a number field K and let m
be a positive integer. Then A(K)/mA(K) is finite.
10.4.2. To begin with we need to introduce some notation. As usual, $\mathrm{Gal}(L/K)$ denotes the Galois group of an intermediate field extension $K \subset L \subset \overline{K}$. Let $g \in \mathrm{Gal}(L/K)$ and let $X$ be a variety over $K$. We view a point $x \in X(L)$ as belonging to some affine chart, with affine coordinates in $L$. Applying $g^{-1}$ to the coordinates, we get a well-defined point $x^g \in X(L)$. Clearly, $x^{gh} = (x^g)^h$ and hence we have an action of the Galois group on $X(L)$. If $\varphi : X \to Y$ is a morphism over $K$, then $\varphi(x^g) = \varphi(x)^g$. If $F$ denotes the fixed field of $\mathrm{Gal}(L/K)$, then $x \in X(F)$ is equivalent to $x^g = x$ for every $g \in \mathrm{Gal}(L/K)$.
In particular, if $X$ is an abelian variety $A$, we have $(ma)^g = ma^g$ and $(a+b)^g = a^g + b^g$ for $a, b \in A(L)$ and $m \in \mathbb{Z}$.

Recall that A[m] denotes the group of m -torsion points of A . The next statement
is contained in 10.2.13. We give an alternative proof using methods of Kummer
theory.
Lemma 10.4.3. Let L be a finite Galois extension of K and let m ∈ Z \ {0}. If
A(L)/mA(L) is finite, then A(K)/mA(K) is finite.
Proof: The inclusion A(K) ⊂ A(L) induces a homomorphism
A(K)/mA(K) −→ A(L)/mA(L)
of abelian groups. Let N be its kernel. It is enough to show that N is finite.
Choose a system of representatives in $A(K)$ for $N$. For each representative $a$, choose $b_a \in A(L)$ such that $a = mb_a$. Consider an element $g \in \mathrm{Gal}(L/K)$ and define
\[ \lambda_a(g) := b_a^g - b_a. \]
By 10.4.2, we have
\[ m\lambda_a(g) = (mb_a)^g - mb_a = a^g - a. \]
By $K$-rationality of $a$, this is zero. Using our system of representatives, the rule $a \mapsto \lambda_a$ defines a map from $N$ to the set of maps
\[ \mathrm{Gal}(L/K) \to A[m]. \]
$N$ will be finite if the map is injective and the range is finite. The latter statement follows from Proposition 8.7.2. In order to prove the former, let us suppose that $\lambda_a = \lambda_{a'}$ for representatives $a, a'$. We have
\[ b_a^g - b_a = b_{a'}^g - b_{a'} \]
and hence
\[ (b_a - b_{a'})^g = b_a - b_{a'} \]
for every $g$, or equivalently $b_a - b_{a'} \in A(K)$ (by 10.4.2). Therefore, by applying $[m]$ we get $a - a' \in mA(K)$, and hence $a = a'$ since both are representatives of the same class.
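The argument also yields an explicit, if crude, bound. The following is a sketch that uses only the injection of $N$ into the set of maps $\mathrm{Gal}(L/K) \to A[m]$ and the finiteness of $A[m]$; the bound itself is not stated in the text.

```latex
% N injects into the set of maps Gal(L/K) -> A[m]:
\[
  |N| \;\le\; \#\{\text{maps } \mathrm{Gal}(L/K) \to A[m]\}
      \;=\; |A[m]|^{\,|\mathrm{Gal}(L/K)|}
      \;=\; |A[m]|^{\,[L:K]} .
\]
% Kernel times image for A(K)/mA(K) -> A(L)/mA(L):
\[
  |A(K)/mA(K)| \;\le\; |N| \cdot |A(L)/mA(L)|
      \;\le\; |A[m]|^{\,[L:K]} \cdot |A(L)/mA(L)| .
\]
```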
10.4.4. An important step in the proof of the weak Mordell–Weil theorem is the
generalization of some aspects of Kummer theory to abelian varieties.
Let $m \in \mathbb{Z} \setminus \{0\}$ be not divisible by $\mathrm{char}(K)$ and assume that $A[m] \subset A(K)$. We denote the separable algebraic closure of $K$ in $\overline{K}$ by $K^s$. For $a \in A(K)$, there is $b \in A(K^s)$ such that $a = mb$ (using $[m]$ unramified from Proposition 8.7.2, every such $b \in A(\overline{K})$ is in $A(K^s)$). If $g \in \mathrm{Gal}(K^s/K)$, then we define
\[ \langle a, g \rangle := b^g - b. \]
By 10.4.2, we have $\langle a, g \rangle \in A[m]$.
Let $a' \in A(K)$ and $b' \in A(K^s)$ with $a' = mb'$, then
\[ (b + b')^g - (b + b') = (b^g - b) + \bigl((b')^g - b'\bigr). \]
This shows that $\langle a, g \rangle$ is independent of the choice of $b$ (choose $b' \in A[m]$ and use that $b' \in A(K)$ by assumption). Moreover, we see that $\langle\ ,\ \rangle$ is linear in the first variable.
The map
\[ \langle\ ,\ \rangle : A(K) \times \mathrm{Gal}(K^s/K) \to A[m] \]
is called the Kummer pairing. The right-kernel of $\langle\ ,\ \rangle$ is defined by
\[ \{ g \in \mathrm{Gal}(K^s/K) \mid \langle a, g \rangle = 0 \text{ for every } a \in A(K) \} \]
and the left-kernel is defined in a similar fashion by
\[ \{ a \in A(K) \mid \langle a, g \rangle = 0 \text{ for every } g \in \mathrm{Gal}(K^s/K) \}. \]
As in Corollary 10.3.13, let $K(\frac{1}{m}A(K))$ be the smallest intermediate field $K \subset L \subset \overline{K}$ such that any $b \in A(\overline{K})$ with $mb \in A(K)$ is rational over $L$.
Proposition 10.4.5. The Kummer pairing is bilinear, with left-kernel $mA(K)$ and right-kernel the subgroup $\mathrm{Gal}(K^s/K(\frac{1}{m}A(K)))$ of $\mathrm{Gal}(K^s/K)$.
Proof: Let $g, g' \in \mathrm{Gal}(K^s/K)$. Using the notation and arguments of 10.4.2 and 10.4.4, we have
\[ \langle a, gg' \rangle = b^{gg'} - b = (b^g - b)^{g'} + b^{g'} - b. \]
Since $\langle a, g \rangle$ is $K$-rational by assumption, we get
\[ \langle a, gg' \rangle = \langle a, g \rangle + \langle a, g' \rangle. \]
This proves linearity in the second variable and thus $\langle\ ,\ \rangle$ is bilinear (see 10.4.4).
For $a \in mA(K)$, choose $b \in A(K)$ such that $a = mb$. By $K$-rationality of $b$, we have
\[ \langle a, g \rangle = b^g - b = 0 \]
for every $g \in \mathrm{Gal}(K^s/K)$. Conversely, let $a$ be in the left-kernel. For any $b \in A(K^s)$ with $a = mb$, we have
\[ 0 = \langle a, g \rangle = b^g - b. \]
Since this is true for every $g \in \mathrm{Gal}(K^s/K)$ and since $K$ is the fixed field of the Galois group, we conclude $b \in A(K)$ (see 10.4.2). So the left-kernel is equal to $mA(K)$.
Obviously, $\mathrm{Gal}(K^s/K(\frac{1}{m}A(K)))$ is contained in the right-kernel $H$. On the other hand, let $g$ be an element of the right-kernel. For $b \in A(K^s)$ with $mb \in A(K)$, we have $b^g = b$. It follows that the restriction of $g$ to the residue field $K(b)$ is equal to the identity, hence the same is true for the restriction of $g$ to $K(\frac{1}{m}A(K))$. This proves $H \subset \mathrm{Gal}(K^s/K(\frac{1}{m}A(K)))$. We conclude that equality holds.
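The additivity in the second variable rests on a short computation, written out here as a sketch. It uses only the rules $x^{gh} = (x^g)^h$ and $(a+b)^g = a^g + b^g$ from 10.4.2 and the assumption $A[m] \subset A(K)$.

```latex
% Insert and subtract b^{g'}; the action of g' is additive:
\[
  b^{gg'} - b
  = (b^{g})^{g'} - b^{g'} + b^{g'} - b
  = (b^{g} - b)^{g'} + (b^{g'} - b).
\]
% b^g - b = <a,g> lies in A[m], which is contained in A(K), so g' fixes it:
\[
  (b^{g} - b)^{g'} = \langle a, g \rangle^{g'} = \langle a, g \rangle
  \quad\Longrightarrow\quad
  \langle a, gg' \rangle = \langle a, g \rangle + \langle a, g' \rangle .
\]
```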
Remark 10.4.6. It follows from Proposition 10.4.5 that the right-kernel is a closed normal subgroup of $\mathrm{Gal}(K^s/K)$. By Galois theory, $K(\frac{1}{m}A(K))$ is a Galois extension of $K$. By the same Proposition 10.4.5, we conclude that the Kummer pairing induces a non-degenerate pairing
\[ \bigl(A(K)/mA(K)\bigr) \times \mathrm{Gal}\bigl(K(\tfrac{1}{m}A(K))/K\bigr) \to A[m] \]
(i.e. left- and right-kernel are zero). Thus in order to prove the finiteness of the group $A(K)/mA(K)$, it is enough to show that $\mathrm{Gal}(K(\frac{1}{m}A(K))/K)$ is finite.
Proof of Theorem 10.4.1: By Lemma 10.4.3 and Proposition 8.7.2, we may assume that $A[m] \subset A(K)$. Since here $K$ is a number field, we see that $K(\frac{1}{m}A(K))/K$ is finite by Corollary 10.3.13. As we have seen in Remark 10.4.6, this is enough to prove the weak Mordell–Weil theorem.

10.5. Kummer theory and Galois cohomology

In this section, we supplement the previous sections with additional information. The reader may skip it on a first reading because the results are of minor importance in this book. First, we recall Kummer theory of algebraic field extensions. For completeness, we give the classical interpretation of the Kummer pairing in 10.4.4 in terms of Galois cohomology. This is essential for a deeper understanding of the group $A(K)/mA(K)$, going beyond its finiteness. Then we use Kummer theory to give a generalization of the proof of the weak Mordell–Weil theorem working also for a curve over an algebraically closed field.
10.5.1. Let us recall the basic facts from Kummer theory.
Let $K$ be a field and $m \in \mathbb{N}$, $m > 0$. Assume that the group $\mu_m(K)$ of $m$th roots of unity in $K$ has $m$ elements (hence $m$ is not divisible by $\mathrm{char}(K)$). A finite-dimensional extension $L/K$ is called abelian of exponent $m$ if $L/K$ is a Galois extension and $\mathrm{Gal}(L/K)$ is abelian and if the least common multiple of the orders of the elements of $\mathrm{Gal}(L/K)$ is $m$. For $S \subset K^\times$, we denote by $K(S^{1/m})$ the smallest subfield of $K^s$ containing $\{\alpha \in K^s \mid \alpha^m \in S\}$ and $K$. Moreover, we write $K^{\times m}$ for the subgroup of $m$th powers in $K^\times$ and, more generally, for any subgroup $H$ of $K^\times$, we will write $H^m$ for the group of $m$th powers in $H$.

Theorem 10.5.2. There is an inclusion preserving bijection of the set of subgroups $H$ of $K^\times$ containing $K^{\times m}$ onto the set of abelian extensions $L/K$ of exponent dividing $m$ with $L \subset \overline{K}$, given by $H \mapsto L := K(H^{1/m})$. The inverse map is $L \mapsto H := (L^\times)^m \cap K^\times$. There is a well-defined pairing
\[ \langle\ ,\ \rangle : H \times \mathrm{Gal}(L/K) \to \mu_m(K), \]
given by $\langle \alpha, g \rangle := g(\beta)/\beta$, where $\beta \in L$ with $\alpha = \beta^m \in H$ and $g \in \mathrm{Gal}(L/K)$. The map
\[ \mathrm{Gal}(K(H^{1/m})/K) \to \mathrm{Hom}(H/K^{\times m}, \mu_m(K)), \quad g \mapsto \langle \cdot, g \rangle \]
is an isomorphism of groups. The index of $K^{\times m}$ in $H$ is equal to $[L : K]$ (possibly $\infty$).
For a proof the reader may consult [49], Ch.5,§11.8.
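The smallest non-trivial instance of the theorem may help to fix ideas. The choice $K = \mathbb{Q}$, $m = 2$ and the subgroup $H$ generated by $2$ and the squares is made here for illustration only.

```latex
% Illustrative instance: K = Q, m = 2, mu_2(Q) = {+1, -1} has 2 elements.
\[
  H = \langle 2 \rangle \cdot \mathbb{Q}^{\times 2},
  \qquad
  L = \mathbb{Q}(H^{1/2}) = \mathbb{Q}(\sqrt{2}),
  \qquad
  [H : \mathbb{Q}^{\times 2}] = 2 = [L : \mathbb{Q}].
\]
% Gal(L/Q) = {1, sigma} with sigma(sqrt 2) = - sqrt 2; pairing on alpha = 2, beta = sqrt 2:
\[
  \langle 2, 1 \rangle = \frac{\sqrt{2}}{\sqrt{2}} = 1,
  \qquad
  \langle 2, \sigma \rangle = \frac{\sigma(\sqrt{2})}{\sqrt{2}} = -1 .
\]
```

Here $g \mapsto \langle \cdot, g \rangle$ sends $\sigma$ to the non-trivial homomorphism $H/\mathbb{Q}^{\times 2} \to \{\pm 1\}$, in agreement with the isomorphism of the theorem.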


10.5.3. We recall here some facts from Galois cohomology (see J.-P. Serre [278],§2 for details). Let $G$ be a profinite group, namely a projective limit of finite groups with the topology induced from the product topology of the finite discrete groups, hence $G$ is compact. A discrete right-$G$-module $M$ is an abelian group endowed with the discrete topology and with a continuous right-$G$-action. Let us view $\mathbb{Z}$ as a discrete right-$G$-module with trivial $G$-action (i.e. $n^g = n$). Then we denote the set of $G$-homomorphisms of $\mathbb{Z}$ into $M$ by $H^0(G, M)$. This is a left-exact covariant functor on the category of discrete right-$G$-modules and we can form the higher cohomology groups $H^i(G, M)$. There is a down-to-earth description of these $G$-modules. For example, $H^0(G, M)$ is equal to the fixed points of $M$ under the action of $G$ and $H^1(G, M)$ may be viewed as the set of continuous derivations of $G$ with values in $M$ modulo inner derivations. Here, a derivation is a function $d : G \to M$, satisfying Leibniz’s rule
\[ d(xy) = (dx)^y + dy \]
(the left action of $G$ on $M$ is viewed as trivial). In particular, for $a \in M$, $d_a(x) := a^x - a$ is a derivation. Obviously, it is a continuous derivation called an inner derivation.
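That an inner derivation satisfies Leibniz’s rule is a one-line check. The sketch below writes the right action as an exponent, $a \mapsto a^x$, and uses only $a^{xy} = (a^x)^y$ and additivity of the action.

```latex
% Insert and subtract a^y:
\[
  d_a(xy) = a^{xy} - a
          = (a^{x})^{y} - a^{y} + a^{y} - a
          = (a^{x} - a)^{y} + (a^{y} - a)
          = (d_a x)^{y} + d_a y .
\]
```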

10.5.4. We return to the case when A is an abelian variety over the field K . Here m ∈
Z \ {0} is not divisible by char(K) .
We note that the Galois group $\mathrm{Gal}(K^s/K)$ is a profinite group. By 10.4.4, we have a short exact sequence of $\mathrm{Gal}(K^s/K)$-modules
\[ 0 \to A[m] \to A(K^s) \xrightarrow{[m]} A(K^s) \to 0. \]
Here $A(K^s)$ has the discrete topology. By 10.4.2, the fixed part of $A(K^s)$ and $A[m]$ under the action of $\mathrm{Gal}(K^s/K)$ are $A(K)$ and $A(K)[m]$ respectively. The first part of the long exact cohomology sequence reads then
\[ 0 \to A(K)[m] \to A(K) \to A(K) \to H^1(\mathrm{Gal}(K^s/K), A[m]) \]
\[ \to H^1(\mathrm{Gal}(K^s/K), A(K^s)) \xrightarrow{[m]} H^1(\mathrm{Gal}(K^s/K), A(K^s)), \]
where the last map is multiplication by $m$. For $a \in A(K)$, a closer look at the definition of the map $\delta : a \mapsto \delta_a$ shows that $\delta_a$ is the following derivation: Choose $b_a \in A(K^s)$ such that $mb_a = a$, and set
\[ \delta_a(g) = b_a^g - b_a, \quad g \in \mathrm{Gal}(K^s/K). \]
Since $a$ is $K$-rational and $[m]$ is defined over $K$, $\delta_a(g)$ is indeed an element of $A[m]$. It is easily seen that $\delta_a$ is a continuous derivation. We get a short exact sequence
\[ 0 \to A(K)/mA(K) \to H^1(\mathrm{Gal}(K^s/K), A[m]) \to H^1(\mathrm{Gal}(K^s/K), A(K^s))[m] \to 0 \]
called the Kummer sequence. If $A[m] \subset A(K)$, then
\[ H^1(\mathrm{Gal}(K^s/K), A[m]) = \mathrm{Hom}(\mathrm{Gal}(K^s/K), A[m]) \]
(the action of the Galois group is trivial, use Leibniz’s rule). Hence the Kummer sequence may be viewed as an analogue of the Kummer pairing when we do not assume $A[m] \subset A(K)$.
10.5.5. A basic step in the proof of the weak Mordell–Weil theorem is the finiteness of $K(\frac{1}{m}A(K))/K$. This was shown in Corollary 10.3.13, using the theorems of Chevalley–Weil and Hermite. Next, we give an alternative proof of the finiteness, which is also valid when $K$ is the function field of a curve. Here $m \in \mathbb{N} \setminus \{0\}$ is always an integer not divisible by $\mathrm{char}(K)$ and we also assume that the field $K$ contains all $m$th roots of unity.
By Kummer theory (see Theorem 10.5.2), there is a unique maximal abelian extension $K^{\mathrm{ab}}/K$ of exponent dividing $m$, contained in $\overline{K}$.
Let $v$ be a valuation on $K$, i.e. there is a non-archimedean absolute value $|\ |$ on $K$ with $v = -\log |\ |$. We assume $v$ to be a non-trivial valuation and we will usually identify $v$ with the corresponding place of $K$.
Let $K_v^{\mathrm{nr}}/K$ be the maximal subextension of $K^{\mathrm{ab}}/K$, which is unramified over $v$ (see B.2.8). By Kummer theory, we have the corresponding subgroup $H_v := (K_v^{\mathrm{nr}})^{\times m} \cap K^\times$ of $K^\times$.
Lemma 10.5.6. Suppose that the characteristic of the residue field of $v$ does not divide $m$. Then
\[ H_v = \{\alpha \in K^\times \mid v(\alpha) \in mv(K^\times)\}. \]
Proof: Let $\alpha = \beta^m \in H_v$ with $\beta \in K_v^{\mathrm{nr}}$. Since $K(\beta)/K$ is unramified over $v$, $m$ divides $v(\alpha)$ in the value group of $v$ (by Lemma B.2.6). On the other hand, let $\alpha \in K^\times$ be with $m \mid v(\alpha)$ in the value group of $v$. Again by Lemma B.2.6, $K(\alpha^{1/m})$ is unramified over $K$, whence $K(\alpha^{1/m}) \subset K_v^{\mathrm{nr}}$. This proves $\alpha \in H_v$ and the claim.
Remark 10.5.7. Let $M$ be a set of non-trivial valuations on $K$. An extension $L/K$ is said to be $M$-unramified if $L/K$ is unramified over all $v \in M$. The unique maximal element $K_M^{\mathrm{nr}}/K$ in the set of $M$-unramified subextensions of $K^{\mathrm{ab}}/K$ is given by
\[ K_M^{\mathrm{nr}} := \bigcap_{v \in M} K_v^{\mathrm{nr}}. \]
Suppose that none of the residue characteristics of $v \in M$ divides $m$. By Lemma 10.5.6, we get
\[ H_M := (K_M^{\mathrm{nr}})^{\times m} \cap K^\times = \{\alpha \in K^\times \mid v(\alpha) \in mv(K^\times) \text{ for every } v \in M\}. \]
If $K$ does not contain all $m$th roots of unity, we proceed as follows. Let $L/K$ be the extension generated by the $m$th roots of unity. We denote by $K_M^{\mathrm{nr}}$ the maximal subextension of $L^{\mathrm{ab}}/K$, which is $M$-unramified. If none of the residue characteristics of $v \in M$ divides $m$, then Lemma B.2.6 shows that $L/K$ is unramified over $v \in M$ and hence $L$ is contained in $K_M^{\mathrm{nr}} = L^{\mathrm{nr}}_{M_L}$ (see Proposition B.2.3), where $M_L$ is the set of valuations on $L$ restricting to $M$ in $K$.

10.5.8. Next, we need an analogue of the class group of a number field. We assume that $M$ satisfies the following finiteness condition
(F) For every $\alpha \in K^\times$, we have $v(\alpha) = 0$ up to finitely many $v \in M$.
An $M$-divisor is a finite formal sum
\[ \sum n_v [v], \]
where $n_v \in \mathbb{Z}$ and $v \in M$. In other words, it is an element of the free abelian group $\mathrm{Div}_M(K)$ on $M$. For $f \in K^\times$, we define
\[ \mathrm{div}_M(f) := \sum_{v \in M} v(f)\,[v]. \]
By assumption (F), this is a finite sum. The set $\{\mathrm{div}_M(f) \mid f \in K^\times\}$ forms a subgroup of $\mathrm{Div}_M(K)$. The quotient of $\mathrm{Div}_M(K)$ by this subgroup is called the $M$-class group of $K$ and will be denoted by $\mathrm{Cl}_M(K)$.
For the next claim, we need the subgroup
\[ U_M := \{\alpha \in K^\times \mid v(\alpha) = 0 \text{ for every } v \in M\} \]
of $K^\times$, which is the kernel of the homomorphism $\mathrm{div}_M$.
Proposition 10.5.9. Let $m \in \mathbb{N} \setminus \{0\}$, let $K$ be a field containing all $m$th roots of unity and let $M$ be a set of valuations on $K$, satisfying the finiteness condition (F) from 10.5.8 and with residue characteristics not dividing $m$. Then
\[ [K_M^{\mathrm{nr}} : K] = [U_M : U_M^m] \cdot |\mathrm{Cl}_M(K)[m]| \]
(possibly $\infty$).
Proof: By Kummer theory (see Theorem 10.5.2) and Remark 10.5.7, we have
\[ [K_M^{\mathrm{nr}} : K] = [H_M : K^{\times m}]. \]
For every $\alpha \in H_M$ there is an element $a \in \mathrm{Div}_M(K)$ with $\mathrm{div}_M(\alpha) = ma$ (see Remark 10.5.7). The map $\alpha \mapsto a$ yields a surjective homomorphism
\[ H_M \to \mathrm{Cl}_M(K)[m]. \]
Obviously, $K^{\times m}$ is in the kernel. Let us consider the group homomorphism
\[ \varphi : H_M/K^{\times m} \to \mathrm{Cl}_M(K)[m] \]
and let $\alpha$ be in its kernel. There is $f \in K^\times$ such that
\[ \mathrm{div}_M(\alpha) = m\,\mathrm{div}_M(f) = \mathrm{div}_M(f^m), \]
showing that $\alpha/f^m \in U_M$. It follows that we have a natural exact sequence
\[ 0 \to U_M/U_M^m \to H_M/K^{\times m} \to \mathrm{Cl}_M(K)[m] \to 0 \]
and this proves the claim.
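A small instance shows how the formula is used in practice. The choice $K = \mathbb{Q}$, $m = 2$, with $M$ the set of $p$-adic valuations for odd primes $p$, is made here for illustration and is not an example from the text; it satisfies (F), the residue characteristics are odd, and $\mathbb{Q}$ contains the square roots of unity.

```latex
% Illustrative instance: K = Q, m = 2, M = { v_p : p an odd prime }.
\[
  U_M = \{\pm 2^k \mid k \in \mathbb{Z}\},
  \qquad
  [U_M : U_M^2] = 4
  \quad (\text{classes of } 1,\ -1,\ 2,\ -2).
\]
% Every M-divisor is div_M of a product of powers of odd primes:
\[
  \mathrm{Cl}_M(\mathbb{Q}) = 0
  \quad\Longrightarrow\quad
  [K_M^{\mathrm{nr}} : \mathbb{Q}] = 4 \cdot 1 = 4,
  \qquad
  K_M^{\mathrm{nr}} = \mathbb{Q}(\sqrt{-1}, \sqrt{2}).
\]
```

This agrees with the familiar fact that the quadratic fields unramified at all odd primes are $\mathbb{Q}(\sqrt{-1})$, $\mathbb{Q}(\sqrt{2})$ and $\mathbb{Q}(\sqrt{-2})$.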
Lemma 10.5.10. Let $K$ be a field with a set $M$ of discrete valuations and let $m$ be a positive integer not divisible by any of the residue characteristics of $v \in M$. We denote by $M_L$ the set of valuations on an extension $L/K$ restricting to $M$. If $[L^{\mathrm{nr}}_{M_L} : L] < \infty$ for every finite extension $L/K$, then $[K^{\mathrm{nr}}_{M \setminus S} : K] < \infty$ for every finite subset $S \subset M$.
Proof: By Remark 10.5.7, we may assume that $K$ contains all $m$th roots of unity. Let $\pi_v$ be a local parameter of $v$ and let $L$ be the field extension of $K$ generated by $\{\pi_v^{1/m} \mid v \in S\}$. Then $L/K$ is a finite extension. It is obvious that, if $w \in M_L$, $w|v \in S$, and $\alpha \in K^\times$, then $m$ belongs to the value group of $w$ and that $m$ divides $v(\alpha)$. By Lemma B.2.6, $L(\alpha^{1/m})/L$ is unramified over $w$. Therefore, $L(H_{M \setminus S}^{1/m})$ is unramified over $w$ (see Definition B.2.8). By Kummer theory (see Theorem 10.5.2) and Proposition B.2.4, this is true also for $w \in M_L$, $w \mid v \notin S$. Since $L(H_{M \setminus S}^{1/m})$ has exponent dividing $m$, we conclude that $L(H_{M \setminus S}^{1/m}) \subset L^{\mathrm{nr}}_{M_L}$. The claim follows from
\[ K^{\mathrm{nr}}_{M \setminus S} = K(H_{M \setminus S}^{1/m}) \subset L(H_{M \setminus S}^{1/m}) \subset L^{\mathrm{nr}}_{M_L}. \]
Example 10.5.11. Let K be a number field, M := MK the canonical set of places, and S
a finite subset of M containing all archimedean places. In such a situation, L/K is called
unramified outside S, if the extension is (M \ S) -unramified. The group ClM \S (K)
is the class group of the S -integers, which is finite ([162], Theorem 2.7.1). UM \S is
the group of S -units, which is finitely generated by Dirichlet’s unit theorem in 1.5.13.
Therefore, $U^m_{M \setminus S}$ is of finite index in $U_{M \setminus S}$. This proves the finiteness of $[K^{\mathrm{nr}}_{M \setminus S} : K]$.
Indeed, we may enlarge S to include all places where the residue characteristic divides m .
By Remark 10.5.7, we may assume that K contains all m th roots of unity. Then the result
follows from Proposition 10.5.9.
Example 10.5.12. Let k(X) be a function field of a projective geometrically irreducible
smooth variety X over the field k and let M = MX be its set of discrete valuations, as
in 1.4.6. If S is a finite subset of M (i.e. finitely many prime divisors), then A.9.18 shows
that
ClM \S (K) ∼= Pic(X \ S)
and
UM \S = {f ∈ k(X)× | supp(div(f )) ⊂ S}.
Here, we identify S with the union of its prime divisors. Assume that S = ∅ . Then
ClM (K) is isomorphic to the Picard group of X . We claim that UM ∼ = k× . By A.8.21,
it is clear that f ∈ UM is regular on X . Using geometric irreducibility and A.6.15, we
conclude that f is constant. By A.4.11, k has to be algebraically closed in k(X) proving
f ∈ k× .
Now let X = C be an irreducible projective smooth curve of genus g over an algebraically
closed field k . We assume that the positive integer m is not divisible by char(k) . Then
we claim that
$$[k(C)^{\mathrm{nr}}_M : k(C)] \le m^{2g}.$$
We note that $\mathrm{Pic}(C)[m]$ is a subset of the Jacobian variety $J$ of $C$. By Proposition 8.7.2,
we have
$$|J(k(C))[m]| = m^{2g}.$$
The claim now follows from Proposition 10.5.9 and [k× : k×m ] = 1 .
By the claim above and Lemma 10.5.10, $k(C)^{\mathrm{nr}}_{M \setminus S} / k(C)$ is finite for every finite subset
$S \subset M$.
Proposition 10.5.13. Let $A$ be an abelian variety over the field $K$ with $A[m] \subset A(K)$ for
a positive integer $m$ not divisible by $\mathrm{char}(K)$. Then, given a set $M$ of discrete valuations
on $K$ satisfying (F) from 10.5.8, there is a finite subset $S$ of $M$ such that
$K(\tfrac{1}{m} A(K)) \subset K^{\mathrm{nr}}_{M \setminus S}$.
Proof: By assumption (F) and $m \ne 0$ in $K$, we may assume that none of the residue
characteristics of $v \in M$ divides $m$. Thus we may assume that $K$ contains all $m$th roots
of unity (use Remark 10.5.7). By Remark 10.4.6, the Galois group $\mathrm{Gal}(K(\tfrac{1}{m} A(K))/K)$
is abelian of exponent dividing $m$ and thus $K(\tfrac{1}{m} A(K)) \subset K^{\mathrm{ab}}$. The morphism $[m]$
on $A$ is finite and unramified (see Proposition 8.7.2). By Corollary 10.3.6, there is a finite
$S \subset M$ such that $K(P)/K$ is unramified outside $S$ for any $P \in A(\overline{K})$ with
$Q = [m]P \in A(K)$. This proves the claim.
Now we are ready to prove a more general version of the weak Mordell–Weil theorem.
Theorem 10.5.14. Let K be a number field or a function field of an irreducible curve over
an algebraically closed field. For any abelian variety A over K and any positive integer
m not divisible by char(K) , the quotient A(K)/mA(K) is finite.
Proof: In the function field case, we may assume that the curve is projective and smooth
(see A.13.2, A.13.3). By a finite base change, we may also assume that K contains all
$m$th roots of unity and that $A[m] \subset A(K)$ (see Lemma 10.4.3, Lemma 1.4.10). Now use
Examples 10.5.11, 10.5.12 and Proposition 10.5.13 to show that $K(\tfrac{1}{m} A(K))$ is a finite
extension of $K$. By Remark 10.4.6, this proves the claim.
Remark 10.5.15. If K is a number field or a function field of a curve over an algebraically
closed field, then we may choose S in Proposition 10.5.13 as follows. First, note that M
denotes then the set of standard non-archimedean places of K . By Proposition 10.3.10, we
may choose
S = {v ∈ M | v(m) ≥ 1 or A has bad reduction}.

Obviously, the first condition is only necessary in the number field case. Note also that S
is finite (see [44], 1.4.3).

10.6. The Mordell–Weil theorem

We have the Mordell–Weil theorem:


Theorem 10.6.1. If A is an abelian variety over a number field K , then A(K)
is a finitely generated abelian group.

In order to prove this fundamental theorem we need first the Fermat descent:
Lemma 10.6.2. Let $G$ be an abelian group and let $m \ge 2$ be a positive integer.
Let also $\|\ \|$ be a real function on $G$ satisfying
$$\|x - y\| \le \|x\| + \|y\|, \qquad \|mx\| = m\|x\|$$
for any $x, y \in G$. Assume that $S$ is a set of representatives for $G/mG$, bounded
relative to $\|\ \|$ by a constant $C$. Then for any $x \in G$, there is a decomposition
$$x = \sum_{i=0}^{l} m^i y_i + m^{l+1} z,$$
where $y_i \in S$ and where $z \in G$ satisfies $\|z\| \le C + 1$. In particular, $G$ is
generated by elements in the ball
$$\{x \in G \mid \|x\| \le C + 1\}.$$
Proof: There are $y_0 \in S$, $x_0 \in G$ such that $x = y_0 + m x_0$. We have
$$\|x_0\| \le \frac{1}{m}(C + \|x\|).$$
Proceeding by induction, there are $y_l \in S$, $x_l \in G$ such that $x_{l-1} = y_l + m x_l$
and
$$\|x_l\| \le \Big(\sum_{i=1}^{l+1} \frac{1}{m^i}\Big) \cdot C + \frac{1}{m^{l+1}} \|x\|.$$
We choose $l$ so large that $\|x\| \le m^{l+1}$ and set $z := x_l$, getting
$$\|z\| \le \frac{1}{m-1}\, C + 1 \le C + 1.$$
Moreover, we have
$$x = y_0 + m y_1 + \dots + m^l y_l + m^{l+1} z,$$
which proves the first claim. The second claim is a trivial consequence of the
first.
Remark 10.6.3. Fermat’s famous proof that $x^4 + y^4 = z^2$ has no non-trivial
integer solution proceeds by constructing a smaller non-trivial solution from
any given non-trivial integer solution, thereby leading to a contradiction (see H.
Edwards [97], §1.5). This was Fermat’s original descente infinie. Its structure
is indeed very similar to the proof of Lemma 10.6.2, where we start with a group
element $x = x_0$ and produce new group elements $x_1, x_2, \dots$ which get smaller
and smaller, ending with $z = x_l$ of norm $\le C + 1$.
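The descent of Lemma 10.6.2 can be traced numerically in a toy case. The following minimal sketch assumes $G = \mathbb{Z}$ with $\|x\| = |x|$ and an arbitrary set of representatives of $\mathbb{Z}/m\mathbb{Z}$; the function names `descent` and `rebuild` and this choice of group are illustrative assumptions, not part of the text.

```python
def descent(x, m, reps):
    """Toy Fermat descent in G = Z with norm |x| (an illustrative assumption).

    Returns (ys, z) such that x == sum(m**i * ys[i]) + m**len(ys) * z,
    every ys[i] lies in reps, and |z| <= C + 1 with C = max |s| over reps.
    """
    by_residue = {s % m: s for s in reps}
    if len(by_residue) != m:
        raise ValueError("reps must contain a representative of every class mod m")
    ys, cur, power = [], x, 1
    while True:
        y = by_residue[cur % m]   # y_l in S with x_{l-1} = y_l + m * x_l
        ys.append(y)
        cur = (cur - y) // m      # exact division, since cur - y is a multiple of m
        power *= m                # now power == m**(l+1)
        if abs(x) <= power:       # stop as soon as |x| <= m**(l+1)
            return ys, cur


def rebuild(ys, z, m):
    """Recompute x from the decomposition returned by descent()."""
    return sum(m**i * y for i, y in enumerate(ys)) + m**len(ys) * z
```

For instance, `descent(-1000, 3, (0, 1, 2))` returns a list of digits $y_i$ and a remainder $z$ from which `rebuild` recovers $-1000$. The stopping rule is the one used in the proof, so the bound $|z| \le C + 1$ holds for any admissible set of representatives.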
10.6.4. Proof of Theorem 10.6.1: Choose an integer $m \ge 2$. The weak Mordell–Weil
theorem in 10.4.1 gives the finiteness of $A(K)/mA(K)$. By Proposition
8.6.4, there is an even ample $c \in \mathrm{Pic}(A)$. By Theorem 9.3.5, the assumptions
of Lemma 10.6.2 for $\|\ \| := \hat{h}_c^{1/2}$ on $G := A(K)$ are satisfied. Therefore
Lemma 10.6.2 shows that the group $A(K)$ is generated by a bounded set. Finally,
Northcott’s theorem in 2.4.9 shows that $A(K)$ is finitely generated.
Remark 10.6.5. More generally, the Mordell–Weil theorem is true for fields
finitely generated over the prime field. We give below the argument for the func-
tion field K of an irreducible curve over a finite field. For the higher-dimensional
case and further generalizations, which include Néron’s proof of Severi’s theorem
of the base for curves on an algebraic surface, see [169], Ch. 6.
We give the proof of the Mordell–Weil theorem for K = k(C) , where k is a finite field
and C is an irreducible curve over k . By a finite base change using Lemma 1.4.10, we
may assume that k contains all m th roots of unity and that A[m] ⊂ A(K) . Here, we have
used that a subgroup of a finitely generated abelian group is again finitely generated.
Note that Ck is birational to a disjoint finite union of irreducible smooth projective curves
Cj (see A.13.2, A.13.3) and it is clear that k(C) is a subfield of any k(Cj ) . Since Cj is
defined over a finite field, we may reduce to the case where C is a geometrically irreducible
smooth projective curve with a k -rational base point.
Then we note that the weak Mordell–Weil theorem holds for K = k(C) . Indeed the
same proof as in Theorem 10.5.14 applies with the only exception that we have no longer
[k× : k×m ] = 1 but this index is trivially finite.
The same proof as in 10.6.4 then shows that the Mordell–Weil theorem holds for K =
k(C) . It suffices to remark that Northcott’s theorem holds by Proposition 9.4.16 because
the Northcott property (N) is valid for K (see Example 9.4.20).

10.7. Bibliographical notes

Our presentation in Section 10.2 is an expansion of J.W.S. Cassels’s elegant
elementary account [61] of the proof of the weak Mordell–Weil theorem for elliptic
curves.
C. Chevalley and A. Weil [65] considered in their original paper a finite unrami-
fied covering of a plane curve. Later, A. Weil [326] extended the Chevalley–Weil
theorem to higher-dimensional varieties. The proofs are always based on Weil’s
decomposition theorem. Our presentation is a little bit more general than the usual
presentations by working first with arbitrary valuations.
Vojta ([307], Th.5.1.6) gave a generalization of the Chevalley–Weil theorem to
ramified coverings, where the ramification is measured by the counting function
of the ramification divisor. Hence the ramification of K(P )/K(ϕ(P )) may be
unbounded along the ramification of the covering ϕ . For a generalization of the
Chevalley–Weil theorem to maps of one-dimensional orbifolds, we refer to H.
Darmon [76]. The idea is that the boundedness of the original Chevalley–Weil
theorem still holds if we allow modest ramification. This subject will be touched
upon in the proof of the finiteness of solutions of the generalized Fermat equation
in Section 12.6.
The deduction of the weak Mordell–Weil theorem from Kummer theory and from
the Chevalley–Weil theorem is fairly standard, see for example Serre [277], §4.2.
The exposition of Section 10.5 follows Lang [169], Ch.6, where the reader will
also find generalizations to higher-dimensional function fields. The generalization
of the Mordell–Weil theorem to finitely generated fields over the prime field is due
to A. Néron [216].
As mentioned in the introduction, the Mordell–Weil theorem is not effective. For
effective upper bounds of the order of the group A(K)/mA(K) and its relation
to the rank of A(K), see M. Hindry and J.H. Silverman [153], Th.C.1.9.
11 FALTINGS’S THEOREM

11.1. Introduction

This chapter contains a detailed proof of Faltings’s theorem:

Theorem 11.1.1. Let C be a geometrically irreducible smooth projective curve
of genus g ≥ 2, defined over a number field K . Then the number of K -rational
points of C is finite.

The assumption g ≥ 2 is crucial: the theorem fails for $C = \mathbb{P}^1_K$ and for elliptic
curves of positive rank. We may easily dispense with the other assumptions on
C . If C is any curve over the number field K , then A.13.2 and A.13.3 show the
existence of a finite extension L/K such that CL is birational to a finite disjoint
union of geometrically irreducible smooth projective curves Cj over L and we
may apply Faltings’s theorem to Cj (L) assuming that the genus of Cj is ≥ 2.
The above theorem was conjectured by Mordell (for the rational field K = Q )
at the end of his paper [207] proving the finite generation of the group of rational
points of an elliptic curve defined over Q . Its function field analogue was proved
by H. Grauert [129] in 1965 (the important earlier paper by Yu.I. Manin [189],
which claimed a proof of this result, was much later recognized to contain a serious
gap, which was eventually corrected by R. Coleman [69] in 1990). The Mordell
conjecture was at last proved by G. Faltings [113] in 1983, as a consequence of his
proofs of the Tate conjecture and the Shafarevich conjecture. He used Arakelov
theory on moduli spaces; we refer to the books of G. Faltings and G. Wüstholz
[116] and L. Szpiro [295] for details.
A completely new proof was then given by P. Vojta, first in the function field case
[308] and then in the arithmetic case [310]. In this chapter, we shall give Vojta’s
proof with the simplifications given in [29]. In view of the complicated proof, it
is worthwhile to give an outline of the basic ideas behind it.
The precursor of Vojta’s proof goes back to Mumford’s paper [211] of 1965, with
the results, which have been already described and proved in detail in Theorem
9.4.14. There we show that the height h∆ on C × C can be expressed, up to
bounded quantities, in terms of Néron–Tate heights on the Jacobian J of C . It
then follows, by the quadraticity of heights on abelian varieties, that

h∆ (P, Q) = (quadratic form) + (linear form) + O(1).

Here we see directly that the quadratic form in the right-hand side of this equation
is indefinite, if the genus g is at least 2.
On the other hand, since ∆ is an effective curve on C × C , the height h∆ (P, Q)
is bounded below, away from the diagonal. This puts strong restrictions on the
pair (P, Q) because it means that P and Q, considered as lattice points in the
euclidean space J(K) ⊗ R , can never be too close to each other with respect to
the positive definite inner product determined by the canonical form. A simple
geometric argument now shows that the values of the height of rational points
on C , arranged in increasing order, grow at least exponentially. This is in sharp
contrast with the quadratic growth of rational points on elliptic curves, and shows
that rational points on curves of genus at least 2 are much harder to come by.
Mumford’s argument works over any field and is not limited to zero characteristic.
Since there are examples of curves of genus at least 2 over a function field of
positive characteristic having infinitely many rational points with height increasing
at an exponential rate, as shown here in 9.4.19 and the following examples, some
new idea was needed to attack the Mordell conjecture along Mumford’s line.
This new idea was provided by Vojta. We may look at other divisors than the
diagonal, and more precisely the set of divisors D for which the quadratic part of
the height hD (P, Q) is indefinite. If C is a general curve (in particular smooth),
the Néron–Severi group of numerical equivalence classes of divisors on the surface
C × C is generated by {P0 } × C , C × {P0 } (here P0 is any point on C ) and
the diagonal ∆ , so we have to deal with a divisor D numerically equivalent to
$l\,\{P_0\} \times C + m\,C \times \{P_0\} + n\,\Delta$ (see [130], pp. 285–286). If $(l+n)(m+n) < g^2 n^2$,
the quadratic form associated to $h_D$ is indefinite, and, moreover, for large $k$, the
Riemann–Roch theorem shows that the linear system $|kD|$ is not empty and of
positive dimension if $g n^2 < (l + n)(m + n)$. Thus, replacing $D$ by a divisor
in the rational equivalence class of kD , we may suppose that D is effective and
therefore hD is bounded below away from the support of D . Hence the idea is to
choose the parameters l, m, n so as to force the quadratic form to be very negative
at (P, Q), and then use the fact that hD is bounded below to conclude that P and
Q have bounded height.
The new problem we face here is the fact that the choice of l, m, n depends on
the ratio of the heights of P and Q, so that we need to show that not only hD is
bounded below, but also that this lower bound has a sufficiently good uniformity
with respect to the divisor D . This difficulty was overcome by Vojta using the
arithmetic intersection theory [126] and the arithmetic Riemann–Roch theorem
obtained by H. Gillet and C. Soulé in [126] and [127].
As Vojta’s paper clearly shows, this idea is overly simple as such and there is
one more big obstacle to overcome. The difficulty is that the lower bound fails if
(P, Q) belongs to the support of the effective divisor D . In order to obtain a small
lower bound, the divisor D must be defined locally by means of equations with
small height, and there is little room to move it away from (P, Q). In characteristic
zero, by an appropriate use of derivations, we see that this is not too serious a
difficulty unless the divisor D goes through (P, Q) with very high multiplicity.
Note that in positive characteristic the argument using derivations fails (as it must),
and it is here that characteristic zero becomes part of the proof.
This situation is reminiscent of a familiar point, which occurs in many proofs in
diophantine approximation and transcendence theory, namely the non-vanishing at
specific points of functions arising from auxiliary constructions. In the classical
case, there are various techniques at our disposal: Roth’s lemma, which is arith-
metic in nature, the algebro-geometric Dyson’s lemma [102], and the powerful and
flexible product theorem of Faltings [114] (see Theorem 7.6.4).
In our situation, application of any of these methods requires the important con-
dition that the ratio of the heights of Q and P be sufficiently large in order to
conclude that the divisor D cannot go through (P, Q) with high multiplicity.
More precisely, Vojta uses a suitable extension of Dyson’s lemma for the product
of two curves and shows that, if $(l + n)(m + n)$ is sufficiently close to $g n^2$ and
(l + n)/(m + n) is sufficiently small, then any effective D as above does not
vanish too much at (P, Q), thereby completing the proof.
Faltings, in his solution of the Lang conjecture for subvarieties of abelian varieties,
uses his product theorem. In our particular situation, it shows that either (P, Q)
belongs to a finite union of proper product subvarieties of C × C , and in particular
P or Q belong to a finite set, or again D does not vanish too much at (P, Q).
The paper [29] simplified Vojta’s proof by showing that a direct application of
Roth’s lemma also suffices for obtaining the required small vanishing of D at
(P, Q). The proof uses only the elementary theory of heights developed in Chap-
ters 1 and 2 and replaces the difficult arithmetic Riemann–Roch theorem in Vojta’s
proof by the algebro-geometric Riemann–Roch theorem on the surface C × C and
by the classical Siegel lemma.
This chapter is organized as follows. In Section 11.2 we study Vojta divisors and
the associated heights. The short Section 11.3 extends Mumford’s method, already
presented in Chapter 9, to Vojta divisors, getting the required upper bound for the
height.
The short Section 11.4 gives a simple proof of a local version of Eisenstein’s the-
orem on the coefficients of Taylor series of algebraic functions, which is used to
control derivations.
Section 11.5 introduces norms on certain spaces of power series and proves a gen-
eralization of Gauss’s lemma in 1.6.3. These results are used to give a less ad
hoc proof of the Eisenstein theorem, which is now derived as an application of
Banach’s fixed point theorem. Our goal in including this material here has been
to provide the reader with additional information, which may be useful in other
contexts.
Section 11.6 obtains the crucial lower bound for the height, in terms of the height
of a set of defining equations of the Vojta divisor and of its multiplicity or, more
precisely, index at (P, Q). Sections 11.7 and 11.8 construct a Vojta divisor of
small height, apply Roth’s lemma and show that the Vojta divisor has low multi-
plicity at (P, Q). Section 11.9 compares the upper and lower bounds so obtained
and completes the proof of Faltings’s theorem outlined before.
We conclude this chapter by describing in Section 11.10, without proofs, two fur-
ther important results. The first is Faltings’s big theorem [114], [115] dealing
with rational points of subvarieties of abelian varieties, and we give two applica-
tions. The second result, conjectured by Bogomolov in the case of curves, deals
with small points on subvarieties of abelian varieties and is due to S.W. Zhang
[342] and L. Szpiro, E. Ullmo, and S.W. Zhang [296]. The much easier corre-
sponding results for small points on subvarieties of a linear torus were treated in
some detail in Chapter 4. This section may be skipped in a first reading.
In the proof of Faltings’s theorem, we have assumed for simplicity that C has a
point P0 defined over K , and made P0 as part of our data; for example, we embed
C into its Jacobian by means of the Albanese map P → cl([P ] − [P0 ]). Thus
certain constants appearing in the proof will depend a priori not only on the curve
C , but also on the choice of P0 . This can be avoided by fixing instead a divisor D0
of small degree and small height (for example a suitable canonical divisor), and
working with the map P → cl((deg D0 )[P ] − D0 ) instead of the Albanese map,
at the cost of introducing additional complications in the application of Mumford’s
method, to get an upper bound for the height (for an application, see E. Bombieri,
A. Granville, and J. Pintz [33]).
The finiteness result of Faltings’s theorem is ineffective, in the sense that it pro-
vides no upper bound on the height of solutions. However, we can get bounds for
the number of solutions. An examination of the proof shows that solutions can
be viewed as the union of two sets, namely small solutions, for which an explicit
bound for the height can be given, and large solutions. In order to study large
solutions, we work in a euclidean space $\mathbb{R}^r$, where $r$ is the rank of the Mordell–Weil
group $J(C)(K)$ of the Jacobian of $C$. The group $J(C)(K)/\mathrm{tors}$ is a lattice in
$\mathbb{R}^r$ and the euclidean metric induces the Néron–Tate height on this lattice. Now
the result of the Vojta construction is that any two large solutions in C(K) either
have comparable height within a constant factor, or determine two vectors at an
angle of at least say 40°, which suffices for proving finiteness in any cone with
center O and opening 40°. However, just to start Vojta’s method we need at least
two solutions in a cone and if there is only one solution we cannot say anything on
the height. Of course, in this case we obtain finiteness for free and a good bound
for the number of solutions.
We have chosen not to give explicit bounds for the constants c , C1 , C2 , . . . and
also other constants involved in the symbols ≪ and O( ), which appear in the
course of the proof. However, all such constants are effectively computable and,
at the end of Section 11.9, we mention some explicit bounds for the number of
solutions.
This chapter is based on Chapters 2, 8, 9, and 10.

11.2. The Vojta divisor

Let C be an irreducible smooth projective curve of genus g over the field K with
a K -rational point. Obviously, this is no restriction for the proof of Faltings’s
theorem. Thus we begin by fixing a point P0 ∈ C(K). In fact, this section is
devoted to purely geometric properties of certain divisors on C × C , and we never
use the assumption that K is a number field.
By ∆ we denote the diagonal of C × C and for simplicity of notation we shall
also write
$$\Delta' := \Delta - \{P_0\} \times C - C \times \{P_0\}.$$
We study here properties of divisors on C × C , which are expressed as linear
combinations of the divisors {P0 } × C , C × {P0 }, and ∆′ . It is worth noting
that, since we are in characteristic 0, for a general curve C , these three divisors
generate the full group of divisors of C × C up to algebraic equivalence, hence the
apparently special situation considered here is in fact typical for the general case.
Lemma 11.2.1. The following table gives the intersection numbers:

               {P0} × C    C × {P0}    ∆′
{P0} × C          0           1         0
C × {P0}          1           0         0
∆′                0           0        −2g


Proof: In order to compute the intersection numbers, we may assume that $K$ is
algebraically closed, see A.9.25. The identities
$$(\{P_0\} \times C) \cdot (C \times \{P_0\}) = 1, \qquad (\{P_0\} \times C) \cdot \Delta = 1$$
were shown in Example A.9.23. Next, we show $(\{P_0\} \times C) \cdot (\{P_0\} \times C) = 0$. In
fact, the divisor $\{P_0\} \times C$ is algebraically equivalent to $\{P_1\} \times C$ for any point
$P_1 \in C(K)$ (cf. A.9.40), therefore A.9.38 shows
$$(\{P_0\} \times C) \cdot (\{P_0\} \times C) = (\{P_0\} \times C) \cdot (\{P_1\} \times C).$$
Since $\{P_0\} \times C$ and $\{P_1\} \times C$ have disjoint supports if $P_1 \ne P_0$, we must have
$(\{P_0\} \times C) \cdot (\{P_1\} \times C) = 0$, proving what we want.
We claim that ∆ · ∆ = 2 − 2g . By symmetry and linearity, this will complete the
intersection table. Let N∆ be the normal bundle of ∆ in C × C . By Proposition
A.9.19, we have
∆ · ∆ = deg N∆ .
The pullback of N∆ under the diagonal map is isomorphic to the tangent bundle
on C . The degree of the latter equals
− deg (ΩC/K ) = 2 − 2g
by the Riemann–Roch theorem on C (cf. A.13.6). 
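The passage from the intersection numbers of $\Delta$ to the table for $\Delta' = \Delta - \{P_0\} \times C - C \times \{P_0\}$ is pure bilinear algebra and can be checked mechanically. The sketch below takes as input only the numbers established in the proof ($F_1 \cdot F_2 = 1$, $F_i \cdot \Delta = 1$, $F_i \cdot F_i = 0$, $\Delta \cdot \Delta = 2 - 2g$, with $F_1 = \{P_0\} \times C$ and $F_2 = C \times \{P_0\}$); the names `gram_matrix`, `pair`, `table` and `vojta_self_intersection` are hypothetical helpers introduced for illustration.

```python
def gram_matrix(g):
    """Intersection numbers in the basis (F1, F2, Delta), for genus g."""
    return [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 2 - 2 * g]]


def pair(u, v, g):
    """Intersection number of two divisor classes given by coefficient triples."""
    gram = gram_matrix(g)
    return sum(u[i] * gram[i][j] * v[j] for i in range(3) for j in range(3))


F1, F2 = (1, 0, 0), (0, 1, 0)
DELTA_PRIME = (-1, -1, 1)          # Delta' = Delta - F1 - F2


def table(g):
    """Intersection table for (F1, F2, Delta'), by bilinearity."""
    basis = (F1, F2, DELTA_PRIME)
    return [[pair(u, v, g) for v in basis] for u in basis]


def vojta_self_intersection(d1, d2, d, g):
    """Self-intersection of V = d1*F1 + d2*F2 + d*Delta'."""
    v = tuple(d1 * a + d2 * b + d * c for a, b, c in zip(F1, F2, DELTA_PRIME))
    return pair(v, v, g)
```

Under these assumptions `table(g)` reproduces the table of Lemma 11.2.1, and bilinearity gives $V \cdot V = 2 d_1 d_2 - 2 g d^2$ for $V = d_1 F_1 + d_2 F_2 + d \Delta'$.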
Definition 11.2.2. For $d_1, d_2, d \in \mathbb{N}$, the divisor
$$V := d_1\,\{P_0\} \times C + d_2\,C \times \{P_0\} + d\,\Delta'$$
is called a Vojta divisor.
11.2.3. We are interested in expressing the divisor V as a difference of two well-
chosen, very ample divisors on C × C , in order to calculate an associated height.
Fix N ≥ 2g + 1 for the rest of this chapter. By A.13.7, the divisor N [P0 ] is very
ample on C . Let
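$$\varphi_{N[P_0]} : C \longrightarrow \mathbb{P}^n_K$$
be the corresponding closed embedding with $O(N[P_0]) \cong \varphi_{N[P_0]}^* O_{\mathbb{P}^n}(1)$ (see for
instance Remark A.6.11). It follows easily (see [134], Prop. 4.3.1) that the product
$\varphi_{N[P_0]} \times \varphi_{N[P_0]}$ gives a closed embedding
$$\psi : C \times C \longrightarrow \mathbb{P}^n_K \times \mathbb{P}^n_K$$
and we have
$$O_{C \times C}(\delta_1 N \{P_0\} \times C + \delta_2 N\, C \times \{P_0\}) \cong \psi^* O_{\mathbb{P}^n \times \mathbb{P}^n}(\delta_1, \delta_2). \tag{11.1}$$
For notational convenience, we shall abbreviate throughout $O(d) := O_{\mathbb{P}^m}(d)$
and $O(\delta_1, \delta_2) := O_{\mathbb{P}^n \times \mathbb{P}^n}(\delta_1, \delta_2)$.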
Lemma 11.2.4. For integers δ1 , δ2 ≥ 1, the divisor δ1 N {P0 } × C + δ2 N C ×
{P0 } is very ample.
Proof: Since O(δ1 , δ2 ) is very ample (see A.6.13), this follows from (11.1). 
Lemma 11.2.5. If $M$ is a sufficiently large integer, then
$$B := M(\{P_0\} \times C + C \times \{P_0\}) - \Delta'$$
is a very ample divisor on $C \times C$.
Proof: This follows from Lemma 11.2.4 and A.6.10 (a). 
11.2.6. We fix such an M once for all. Let
$$\phi_B : C \times C \longrightarrow \mathbb{P}^m_K$$
be a corresponding closed embedding such that $\phi_B^* O_{\mathbb{P}^m}(1) \cong O(B)$ (cf. Remark
A.6.11). The coordinates of $\mathbb{P}^m_K$ and $\mathbb{P}^n_K \times \mathbb{P}^n_K$ will be denoted by $y$ and $(x, x')$.
We consider $C \times C$ as a closed subvariety of $\mathbb{P}^m_K$ or $\mathbb{P}^n_K \times \mathbb{P}^n_K$, as the case may
be, without mentioning the closed embeddings $\phi_B$ or $\psi$. Because they are defined
with a basis of global sections, no coordinate $x_j$, $x'_j$, or $y_i$ vanishes identically on
$C \times C$. Let us consider the condition:
(V1) δ1 := (d1 + M d)/N and δ2 := (d2 + M d)/N are positive integers.
By adding to d1 and d2 integers bounded by N , (V1) will be satisfied. This is
only a technical condition of minor import. Then we get a decomposition
V = (δ1 N {P0 } × C + δ2 N C × {P0 }) − dB
of the Vojta divisor V into the difference of two very ample divisors (Lemma
11.2.4 and Lemma 11.2.5). It follows from A.10.38 that for sufficiently large
δ1 , δ2 , d the following condition is satisfied:
(V2) The first cohomology groups of $J_{\psi(C \times C)}(\delta_1, \delta_2)$ and $J_{\phi_B(C \times C)}(d)$ vanish.
Here, as usual, JX denotes the ideal sheaf of a closed subvariety X and F(δ1 , δ2 ),
F(d) denote the tensor product of a sheaf F with the corresponding standard very
ample sheaf of a multiprojective space.
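Condition (V1) and the decomposition $V = (\delta_1 N \{P_0\} \times C + \delta_2 N\, C \times \{P_0\}) - dB$ amount to bookkeeping with coefficients, which the following sketch makes explicit. Divisors are represented by coefficient triples with respect to $\{P_0\} \times C$, $C \times \{P_0\}$ and $\Delta' = \Delta - \{P_0\} \times C - C \times \{P_0\}$; the function names and the particular way of adjusting $d_1, d_2$ are assumptions made for illustration.

```python
def adjust(d1, d2, d, M, N):
    """Add integers in [0, N] to d1 and d2 so that (V1) holds."""
    def fix(di):
        di += (-(di + M * d)) % N      # make di + M*d divisible by N
        if di + M * d == 0:            # delta must be a positive integer
            di += N
        return di
    return fix(d1), fix(d2)


def deltas(d1, d2, d, M, N):
    """Return (delta1, delta2) of condition (V1), or raise if (V1) fails."""
    n1, n2 = d1 + M * d, d2 + M * d
    if n1 % N or n2 % N or n1 <= 0 or n2 <= 0:
        raise ValueError("condition (V1) is not satisfied")
    return n1 // N, n2 // N


def decomposition_holds(d1, d2, d, M, N):
    """Check V + d*B == delta1*N*F1 + delta2*N*F2 coefficientwise."""
    delta1, delta2 = deltas(d1, d2, d, M, N)
    V = (d1, d2, d)
    B = (M, M, -1)                     # B = M*(F1 + F2) - Delta'
    target = (delta1 * N, delta2 * N, 0)
    return all(V[k] + d * B[k] == target[k] for k in range(3))
```

With the hypothetical values $M = 4$, $N = 5$ and $(d_1, d_2, d) = (5, 7, 3)$, `adjust` replaces $(d_1, d_2)$ by $(8, 8)$, giving $\delta_1 = \delta_2 = 4$.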
Now the long exact cohomology sequence shows that the natural maps
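$$\psi^* : \Gamma(\mathbb{P}^n_K \times \mathbb{P}^n_K, O(\delta_1, \delta_2)) \longrightarrow \Gamma(C \times C, \psi^* O(\delta_1, \delta_2))$$
and
$$\phi_B^* : \Gamma(\mathbb{P}^m_K, O(d)) \longrightarrow \Gamma(C \times C, O(dB))$$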
are surjective (use Example A.10.20, A.10.22, and A.10.25). This is the property
we shall use in the following lemma.
Lemma 11.2.7. Suppose that $V$ is a Vojta divisor satisfying (V1) and (V2). For
any global section $s$ of $O(V)$, there are polynomials $F_i(x, x')$, $i = 0, \dots, m$,
bihomogeneous of bidegree $(\delta_1, \delta_2)$, such that
$$s = \big(F_i(x, x')/y_i^d\big)\big|_{C \times C} \tag{11.2}$$
for $i = 0, \dots, m$.
Conversely, assume that $F_i(x, x')$, $i = 0, \dots, m$, are bihomogeneous polynomials
of bidegree $(\delta_1, \delta_2)$, satisfying
$$F_i(x, x')/y_i^d = F_j(x, x')/y_j^d \tag{11.3}$$
on $C \times C$ for every $i, j$. Then there is a unique global section $s$ of $O(V)$ such
that (11.2) is valid for every $i$.
Proof: Let $s$ be a global section of $O(V)$. Then
$$s \otimes (y_i^d|_{C \times C})$$
is a global section of the line bundle associated to the divisor
$$V + dB = \delta_1 N \{P_0\} \times C + \delta_2 N\, C \times \{P_0\}.$$
By (11.1) on page 357, we get bihomogeneous polynomials $F_i(x, x')$ of bidegree
$(\delta_1, \delta_2)$ satisfying (11.2). Conversely, assume that $F_i(x, x')$, $i = 0, \dots, m$, are
bihomogeneous polynomials of bidegree $(\delta_1, \delta_2)$ fulfilling (11.3) on $C \times C$ for
every $i, j$. Then we define a meromorphic section of $O(V)$ by formula (11.2).
This is independent of the choice of $i \in \{0, \dots, m\}$ by (11.3). Note that the poles
of $s$ are contained in the subvariety $y_i = 0$. Since $y_0, \dots, y_m$ have no common
zero, $s$ is a global section (see A.8.21).

11.3. Mumford’s method and an upper bound for the height

Let $C$ be an irreducible projective smooth curve of genus $g \ge 1$ over a field $K$
with product formula and let us fix $P_0 \in C(K)$ as in Section 11.2. The Jacobian
variety of $C$, defined and studied in Section 8.10, is denoted by $J$. We have the
closed embedding
$$j : C \longrightarrow J, \qquad P \longmapsto \mathrm{cl}([P] - [P_0]),$$
which is defined over $K$, since $P_0 \in C(K)$. Let $\langle\ ,\ \rangle$ be the canonical form of
$J$ as defined in 9.4.2; it is a symmetric positive semidefinite form on $J(K)$. The
corresponding seminorm is denoted by $|\ |$; more explicitly, we have $|a|^2 = \langle a, a \rangle$
for $a \in J(K)$.
The next proposition generalizes Mumford’s formula (Proposition 9.4.3) to the
Vojta divisor V . It will be used as an upper bound for a height function hV (P, Q).

Proposition 11.3.1. Let $P, Q \in C(K)$ and $z = j(P)$, $w = j(Q)$. Then
$$h_V(P, Q) = \frac{d_1}{2g}|z|^2 + \frac{d_2}{2g}|w|^2 - d\,\langle z, w \rangle + d_1 O(|z|) + d_2 O(|w|) + (d_1 + d_2 + d + 1)\,O(1).$$
Proof: We have seen in the proof of Proposition 9.4.3 that
$$h_{\Delta'}(P, Q) = -\langle z, w \rangle + O(1)$$
and
$$h_{\{P_0\} \times C}(P, Q) = \frac{1}{2g}|z|^2 - \frac{1}{2g}\hat{h}_{\theta - \theta^-}(z) + O(1),$$
with a similar formula for $h_{C \times \{P_0\}}(P, Q)$. Therefore
$$h_V(P, Q) = \frac{d_1}{2g}|z|^2 + \frac{d_2}{2g}|w|^2 - d\,\langle z, w \rangle - \frac{d_1}{2g}\hat{h}_{\theta - \theta^-}(z) - \frac{d_2}{2g}\hat{h}_{\theta - \theta^-}(w) + (d_1 + d_2 + d + 1)\,O(1).$$
Since $\theta - \theta^- \in \mathrm{Pic}^0(J)$ (see Theorem 8.8.3) and since, by the proof of Proposition
9.4.1, $\theta + \theta^-$ is an even ample class with $\langle a, a \rangle = \hat{h}_{\theta + \theta^-}(a)$ for $a \in J$, we get
the claim by Corollary 9.3.8.
Remark 11.3.2. By the preceding Proposition 11.3.1, there is a natural quadratic
form on $J(K) \times J(K)$ associated to $h_V(P, Q)$, namely
$$\frac{d_1}{2g}|z|^2 + \frac{d_2}{2g}|w|^2 - d\,\langle z, w \rangle.$$
Using the Cauchy–Schwarz inequality, this form is indefinite if and only if
$d_1 d_2 < g^2 d^2$.
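The indefiniteness criterion $d_1 d_2 < g^2 d^2$ can be illustrated in a toy model. The sketch below assumes the rank-one situation in which $z$ and $w$ are real numbers and $\langle z, w \rangle = zw$, and it takes $d_1, d_2 > 0$ and $d \ge 0$; the function names and the choice of test point $(\sqrt{d_2}, \sqrt{d_1})$ are illustrative assumptions.

```python
import math


def form(z, w, d1, d2, d, g):
    """The quadratic form d1/(2g) z^2 + d2/(2g) w^2 - d z w for real z, w."""
    return d1 * z * z / (2 * g) + d2 * w * w / (2 * g) - d * z * w


def is_indefinite(d1, d2, d, g):
    """Criterion of the remark: indefinite iff d1*d2 < g^2 * d^2."""
    return d1 * d2 < g * g * d * d


def witness(d1, d2):
    """Point where the form equals sqrt(d1*d2) * (sqrt(d1*d2)/g - d)."""
    return math.sqrt(d2), math.sqrt(d1)
```

At the point returned by `witness` the form takes the value $\sqrt{d_1 d_2}\,(\sqrt{d_1 d_2}/g - d)$, which is negative exactly when $d_1 d_2 < g^2 d^2$. Since the form is positive at $(1, 0)$, this exhibits the change of sign in the toy model.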

11.4. The local Eisenstein theorem

In this section we give a quick proof of a local version of a well-known theorem
of Eisenstein on the coefficients of the Taylor series expansion of an algebraic
function of one variable.
Theorem 11.4.1. Let $K$ be a field of characteristic 0 complete with respect to
an absolute value $|\ |$ and let $p(x, t) \in K[x, t]$ be a polynomial in two variables
with partial degrees at most $d$. Let $\xi$ be an algebraic function of $x$ such that
$p(x, \xi(x)) = 0$. We suppose that $\xi(0) \in K$, $|\xi(0)| \le 1$ and
$$\frac{\partial p}{\partial t}(0, \xi(0)) \ne 0.$$
Then the Taylor series expansion $\xi(x) = \sum_{k=0}^{\infty} a_k x^k$ has coefficients in $K$ and
the following bound holds for $l \ge 1$:
$$|a_l| \le C^l \left( |p| \Big/ \left| \frac{\partial p}{\partial t}(0, \xi(0)) \right| \right)^{2l-1},$$
where $|p| = \max |\text{coefficients of } p|$ is the Gauss norm of $p$, and
$$C = \begin{cases} 1 & \text{in the non-archimedean case} \\ |8(d+1)^7| & \text{in the archimedean case.} \end{cases}$$

Proof: Suppose first that $|\ |$ is not archimedean. We abbreviate $p_t = \partial p/\partial t$ and
$\partial_l = (1/l!)(\frac{d}{dx})^l$. By Leibniz’s formula
$$\partial_l(p(x, \xi(x))) = \sum_{i,j} p_{ij} \sum \partial_{l_0} x^i\, \partial_{l_1}\xi \cdots \partial_{l_j}\xi,$$
where the inner sum is over all solutions of $l_0 + \dots + l_j = l$; the $p_{ij}$ are the
coefficients of the polynomial $p$. The sum of the terms with $l_\lambda = l$ and $\lambda \ne 0$
is simply $p_t \cdot \partial_l \xi$; since $\partial_l(p(x, \xi(x)))$ is identically 0 and the absolute value is
ultrametric, we get
$$|p_t(0, \xi(0))| \cdot |\partial_l \xi(0)| \le |p| \max |\partial_{l_1}\xi(0)| \cdots |\partial_{l_j}\xi(0)|,$$
where max runs over $l_1 + \dots + l_j \le l$ with each $l_\lambda < l$. On the other hand,
by hypothesis $p_t(0, \xi(0)) \ne 0$ and we also have $|\xi(0)| \le 1$. This means that in
the last displayed inequality we need only consider products in which each $l_\lambda$ is at
least 1; noting that $a_l = \partial_l \xi(0)$ and $|p_t(0, \xi(0))| \le |p|$, we get Theorem 11.4.1
by induction.
We treat the case in which | | is archimedean in a different fashion. We assume
that | | is normalized as usual. Let us abbreviate ξ (l) = ( dx
d l
) ξ . By induction on
l we establish that there is a polynomial ql (x, t) such that
ql (x, ξ(x))
ξ (l) (x) = −
pt (x, ξ(x))2l−1
for l ≥ 1 . We have q1 = px and
ql+1 = (ql )x p2t − (ql )t pt px + (2l − 1)ql (ptt px − pxt pt ) . (11.4)
Note that the partial degree of $q_l$ with respect to x (resp. t) is bounded by l(2d − 1) − d
(resp. l(2d − 2) + 2 − d). For two polynomials f, g in any number of
variables, we have |fg| ≤ N · |f| · |g|, where N is the number of monomials in
g. Thus, from the recurrence (11.4), we estimate the Gauss norm $|q_{l+1}|$ by
\[ |q_{l+1}| \le A\, |p|^2\, |q_l| , \]
where
\[ A = d^4 (d+1)^2 \big(\deg_x(q_l) + \deg_t(q_l)\big) + 2(2l-1)\, d^6 (d+1) \le 8(d+1)^7\, l . \]
Noting that $|q_1| \le d\,|p|$, this yields
\[ |q_l| \le (l-1)!\; 8^{l-1} (d+1)^{7l-6}\, |p|^{2l-1} . \]
The required estimate for $a_l = \partial_l \xi(0)$ follows from
\[ \partial_l \xi(x) = -\frac{1}{l!}\, \frac{q_l(x, \xi(x))}{p_t(x, \xi(x))^{2l-1}} \]
and $|q_l(0, \xi(0))| \le |q_l|\,(\deg_t(q_l) + 1)$. This proves the theorem. □
In the next section, which is not essential for the understanding of this chapter,
we give a more conceptual proof of a slightly stronger result, valid in the case of
several variables and also in any characteristic.

11.5. Power series, norms, and the local Eisenstein theorem

In this section K is a field complete with respect to an absolute value | | . We shall obtain
here some simple but useful facts about norms of polynomials and power series, and deduce,
by an application of Banach’s fixed point theorem, a more general version of Theorem
11.4.1 covering the case of an algebraic function of several variables over a field of arbitrary
characteristic.
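11.5.1. Let us fix n and let x = (x1 , . . . , xn ) . The ring of formal power series in x is
denoted by K[[x]] . For
\[ f(x) := \sum_{\alpha \in \mathbb{N}^n} a_\alpha x^\alpha \in K[[x]] \]
and r = (r1 , . . . , rn ) with ri > 0 , we define
\[ \|f\|_r := \sum_{\alpha \in \mathbb{N}^n} |a_\alpha|\, r^\alpha . \]
Then
\[ K\langle r^{-1}x \rangle := \{ f \in K[[x]] \mid \|f\|_r < \infty \} \]
is a Banach algebra over K , complete with respect to the norm $\|\ \|_r$, satisfying
$\|fg\|_r \le \|f\|_r \cdot \|g\|_r$. The spectral norm of f is defined by
\[ \rho_r(f) := \inf_{k \in \mathbb{N}\setminus\{0\}} \|f^k\|_r^{1/k} . \]
It follows easily from $\|fg\|_r \le \|f\|_r \cdot \|g\|_r$ that
\[ \rho_r(f) = \lim_{k \to \infty} \|f^k\|_r^{1/k} . \]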

If r = (1, . . . , 1) , we shall omit the suffix r in what follows. It is notationally convenient
to study only the case in which r = (1, . . . , 1) . If each ri belongs to the value group
of K , then we can find elements zi such that |zi | = ri and replacing (x1 , . . . , xn ) by
(x1 /z1 , . . . , xn /zn ) renormalizes the situation to the case in which r = (1, . . . , 1) .
However, this need not be the case in general, even if we assume K to be algebraically
closed. The following simple lemma shows that this difficulty disappears if we go to a
suitable extension L/K of K and a suitable extension of | | :
Lemma 11.5.2. Given r > 0 , there is always a field extension L/K and an extension of
| | to L such that r is in the value group of this extension. In the non-archimedean case,
the value group of $\overline{K}$ is given by
\[ |\overline{K}^{\times}| = \{ |\alpha|^{1/m} \mid \alpha \in K^{\times},\ m \in \mathbb{N} \setminus \{0\} \}. \]
Proof: We begin by proving the second claim. The inclusion ⊃ is clear. Now let $\beta \in \overline{K}^{\times}$
be a zero of the polynomial
\[ f(t) := a_n t^n + a_{n-1} t^{n-1} + \cdots + a_0 \]
with coefficients in K , not all 0. Since f (β) = 0 and | | is ultrametric, there must be two
distinct indices i , j such that
\[ |a_i \beta^i| = |a_j \beta^j| = \text{maximum}. \]
This yields
\[ |\beta| = |a_i/a_j|^{1/(j-i)} , \]
completing the proof of the second claim. The first claim is a consequence of the second
one and of [47], Ch.VI, §10, no.1, Prop.1. In the archimedean case, we do not need an
extension by Ostrowski's theorem (see Theorem 1.2.6). □
In view of this lemma, we see that going to a field extension we may always renormalize
everything so that r = (1, . . . , 1) .
Lemma 11.5.3. If the absolute value on K is not archimedean, then
\[ \rho_r(f) = \sup_{\alpha \in \mathbb{N}^n} |a_\alpha|\, r^\alpha . \]
Moreover, $\rho_r$ is multiplicative, i.e. $\rho_r(fg) = \rho_r(f)\rho_r(g)$ for $f, g \in K\langle r^{-1}x \rangle$.


Proof: By Lemma 11.5.2, it suffices to deal with the case r = (1, . . . , 1) , hence we drop
the suffix r in what follows.
Let us denote the right-hand side of the claim by m(f ) . We shall see below that m is a
multiplicative norm. In particular, m is power multiplicative
\[ m(f^k) = m(f)^k \]
for all positive integers k . Since $m(f) \le \|f\|$, we conclude that m(f ) ≤ ρ(f ) . In order
to prove m(f ) ≥ ρ(f ) , we may assume that f is a polynomial. Now for a polynomial f
in n variables of degree d , we have
\[ \|f\| \le (d+1)^n\, m(f). \]
Applying this to the polynomial $f^k$, we get
\[ \|f^k\|^{1/k} \le (kd+1)^{n/k}\, m(f) \]
and, letting k → ∞ , we deduce ρ(f ) ≤ m(f ) , completing the proof of the first statement.
Finally, we prove that m is multiplicative. This is an extension of Gauss’s lemma (see
Lemma 1.6.3) from polynomials to formal power series. The proof is immediate, applying
Gauss’s lemma to arbitrarily large truncations of the power series.
Another argument is as follows. We may suppose m(f ) = m(g) = 1 and we need to
prove that m(f g) = 1 . Let R be the valuation ring of | | , and k(v) be its residue field.
Reduction of the coefficients modulo the maximal ideal gives a homomorphism
\[ \pi : R\langle x \rangle \longrightarrow k(v)[x]. \]
Since k(v)[x] is an integral domain, we have π(f g) = π(f )π(g) ≠ π(0) . This proves
m(f g) = 1 , hence m is multiplicative. □
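The multiplicativity of the Gauss norm can be checked numerically in a small case. The following is only an illustrative sketch, assuming K = Q with the 3-adic absolute value and r = 1; the helper names `vp`, `gauss_norm` and `poly_mul` and the two sample polynomials are hypothetical choices made for this example, not part of the text.

```python
from fractions import Fraction

def vp(q, p):
    """p-adic valuation of a non-zero rational number q."""
    q = Fraction(q)
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def gauss_norm(coeffs, p):
    """m(f) = max |a_k|_p over the non-zero coefficients of f."""
    return max(Fraction(p) ** (-vp(a, p)) for a in coeffs if a != 0)

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += Fraction(a) * Fraction(b)
    return h

p = 3
f = [Fraction(3), Fraction(1, 3), Fraction(9)]    # 3 + x/3 + 9x^2
g = [Fraction(1, 9), Fraction(2), Fraction(27)]   # 1/9 + 2x + 27x^2
print(gauss_norm(f, p), gauss_norm(g, p), gauss_norm(poly_mul(f, g), p))
```

Here the norm of the product equals the product of the norms, as Lemma 11.5.3 predicts; with an archimedean absolute value the analogous maximum of coefficients is in general only submultiplicative up to a constant.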
Lemma 11.5.4. Assume that $r \in |\overline{K}^{\times}|^n$ and let P (r) denote the closed polydisk
\[ P(r) := \{ x \in \overline{K}^n \mid |x_i| \le r_i ,\ i = 1, \ldots, n \}. \]
Then for $f \in K\langle r^{-1}x \rangle$ it holds
\[ \rho_r(f) = \sup_{x \in P(r)} |f(x)| \]
and there is x ∈ P (r) with $\rho_r(f) = |f(x)|$.


Proof: Again, we may assume that r = (1, . . . , 1) and that K is algebraically closed. Let
\[ s(f) := \sup_{x \in P} |f(x)|. \]
The function s is a norm on $K\langle x \rangle$ satisfying
\[ s(f^k) = s(f)^k \]
for all positive integers k . Since $s \le \|\ \|$, we conclude s(f ) ≤ ρ(f ) .
First, we conclude the proof in the non-archimedean case. The argument is the same as at
the end of proof of the preceding Lemma 11.5.3. We may assume that ρ(f ) = 1 and need
to prove s(f ) = 1 .
It is an easy consequence of Gauss's lemma in 1.6.3 that the residue field k(v) is algebraically
closed. By Lemma 11.5.3, the coefficients of f have absolute value at most 1 and
π(f ) is not 0, thus there is $\pi(x) \in k(v)^n$ with π(f )(π(x)) ≠ π(0) . Now the claim is a
consequence of |f (x)| = 1 .
It remains to prove ρ(f ) ≤ s(f ) and the existence of x ∈ P with |f (x)| = s(f ) in the
archimedean case. By Ostrowski’s theorem in 1.2.6, we have K = C . The existence of x
follows from continuity of f and the compactness of P .

By the Cauchy inequalities for $f(x) = \sum a_\alpha x^\alpha$, we have $|a_\alpha| \le s(f)$. This is well
known for one variable and follows in the general case by induction on the number of
variables. In order to prove the desired inequality, we may assume that f is a polynomial
in n variables, of degree d . Then Cauchy's inequality yields a fortiori
\[ \|f\| \le (d+1)^n\, s(f) \]
and the claim follows as in the proof of Lemma 11.5.3. □
Definition 11.5.5. The algebra $K\{r^{-1}x\}$ of strictly convergent power series is the
completion of $K\langle r^{-1}x \rangle$ with respect to the spectral norm $\rho_r$.
Proposition 11.5.6. The completion $K\{r^{-1}x\}$ of $K\langle r^{-1}x \rangle$ is:
(a) If | | is not archimedean, then
\[ K\{r^{-1}x\} = \Big\{ \sum a_\alpha x^\alpha \in K[[x]] \;\Big|\; \lim_{|\alpha| \to \infty} |a_\alpha|\, r^\alpha = 0 \Big\} \]
and $\rho_r$ is the multiplicative Gauss norm, i.e.
\[ \rho_r\Big( \sum a_\alpha x^\alpha \Big) = \max_{\alpha \in \mathbb{N}^n} |a_\alpha|\, r^\alpha . \]
(b) If K = C , then $\mathbb{C}\{r^{-1}x\}$ is equal to the set of continuous complex functions on
P (r) , which are analytic in the interior and $\rho_r$ is the supremum norm on P (r) .
(c) If K = R , then R{r−1 x} is the Banach subalgebra of C{r−1 x} given by the
functions that have Taylor series at 0 with real coefficients.
Proof: The non-archimedean case follows easily from Lemma 11.5.3. If K = C , then
Lemma 11.5.4 and Weierstrass's theorem show that $\mathbb{C}\{r^{-1}x\}$ consists of continuous
functions on P (r) , which are analytic in the interior, and that $\rho_r$ is the supremum norm on
P (r) .
Conversely, let f be a continuous function on P (r) , which is analytic in the interior.
Clearly, we may assume r1 = · · · = rn = 1 . Let $\sum a_\alpha x^\alpha$ be the Taylor series of f
at 0 . By the Cauchy formula and continuity of f , we have
\[ a_\alpha = \left( \frac{1}{2\pi i} \right)^{n} \int_{|y_1|=1} \cdots \int_{|y_n|=1} \frac{f(y)}{y_1^{\alpha_1+1} \cdots y_n^{\alpha_n+1}} \, dy_1 \cdots dy_n . \]
We conclude that $\sum a_\alpha e^{i\alpha\cdot t}$ is the Fourier series of the periodic function
$f(e^{it_1}, \ldots, e^{it_n})$, $t_i \in \mathbb{R}$. For $\beta \in \mathbb{N}^n$, let us consider the partial sum
\[ s_\beta(x) = \sum_{\alpha_1 \le \beta_1} \cdots \sum_{\alpha_n \le \beta_n} a_\alpha x^\alpha . \]
By the generalization of Fejér's theorem to several variables (cf. [343], Th.XVII.1.20), the
arithmetic means
\[ \sigma_\gamma(e^{it_1}, \ldots, e^{it_n}) = \frac{1}{\gamma_1+1} \cdots \frac{1}{\gamma_n+1} \sum_{\beta_1 \le \gamma_1} \cdots \sum_{\beta_n \le \gamma_n} s_\beta(e^{it_1}, \ldots, e^{it_n}) \]
converge uniformly to $f(e^{it_1}, \ldots, e^{it_n})$ for γ1 , . . . , γn → ∞ . The maximum modulus
principle shows that the polynomials $\sigma_\gamma(z_1, \ldots, z_n)$ converge uniformly to $f(z_1, \ldots, z_n)$
on max |zj | ≤ 1 . This proves the claim for K = C .
If K = R , then it is clear that uniform limits of real power series remain real. Conversely,
if $f \in \mathbb{C}\{r^{-1}x\}$ has a real Taylor series at 0 , then the above Fejér polynomials
$\sigma_\gamma(z_1, \ldots, z_n)$ have real coefficients, proving $f \in \mathbb{R}\{r^{-1}x\}$. □
11.5.7. Let $f(x) = \sum a_\alpha x^\alpha \in K\{r^{-1}x\}$ and let x0 ∈ P (r) if | | is not archimedean,
and x0 be in the interior of the polydisk P (r) if | | is archimedean. We define
\[ r_0 := \begin{cases} r & \text{if } |\ | \text{ is not archimedean} \\ r - |x_0| = (r_1 - |x_{01}|, \ldots, r_n - |x_{0n}|) & \text{if } |\ | \text{ is archimedean.} \end{cases} \]
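With the Taylor coefficients of f (x) at x0 defined by
\[ \partial_\alpha f(x_0) := \sum_{\beta} \binom{\alpha+\beta}{\alpha}\, a_{\alpha+\beta}\, x_0^{\beta} \]
(here, as usual, $\binom{\alpha+\beta}{\alpha} = \prod_i \binom{\alpha_i+\beta_i}{\alpha_i}$), we have the Taylor series
\[ f(x + x_0) = \sum_{\alpha} \partial_\alpha f(x_0)\, x^\alpha . \tag{11.5} \]
There is a unique element g(x) of $K\{r_0^{-1}x\}$ such that
\[ f(x + x_0) = g(x) \]
for every $x \in P(r_0)$. If K is a field of characteristic 0 , we have
\[ \partial_\alpha f(x_0) = \frac{1}{\alpha_1! \cdots \alpha_n!}\, \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}(x_0), \]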

which is the familiar formula for the coefficients of the Taylor expansion, and justifies the
notation ∂α for them. The Taylor series in the non-archimedean case cannot be used for
the purpose of analytic continuation, contrary to what happens if K = C .
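As a small sanity check of the binomial formula for the Taylor coefficients in 11.5.7, here is a sketch in one variable with exact rational arithmetic. The function names `taylor_coeffs_at` and `evaluate` and the sample polynomial are assumptions made for this illustration only.

```python
from fractions import Fraction
from math import comb

def taylor_coeffs_at(a, x0):
    """Coefficients d_k f(x0) = sum_j C(k+j, k) a[k+j] x0^j of f(x + x0),
    for the polynomial f(x) = sum_j a[j] x^j (one variable)."""
    n = len(a)
    return [
        sum((comb(k + j, k) * a[k + j] * x0 ** j for j in range(n - k)), Fraction(0))
        for k in range(n)
    ]

def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficient list at x."""
    return sum((c * x ** k for k, c in enumerate(coeffs)), Fraction(0))

a = [Fraction(1), Fraction(-2), Fraction(0), Fraction(5)]   # 1 - 2x + 5x^3
x0 = Fraction(3, 2)
shifted = taylor_coeffs_at(a, x0)
print(shifted)
```

The list `shifted` should describe the same function as f(x + x0); since only binomial coefficients appear and no division by factorials, the same recipe makes sense in any characteristic, which is the point of defining the coefficients this way.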

If | | is not archimedean, we have $\rho_r(f) = \rho_r(g)$ (by Lemma 11.5.4), therefore

Proposition 11.5.8. In the non-archimedean case the norm $\rho_r$ on $K\{r^{-1}x\}$ is invariant
by the translation $x \mapsto x + x_0$, for any x0 ∈ P (r) .

For convenience, we now state the Cauchy inequalities in terms of the norm $\rho_r$:

Proposition 11.5.9. If $f(x) = \sum a_\alpha x^\alpha \in K\{r^{-1}x\}$, then
\[ |a_\alpha| \le \rho_r(f)\, r^{-\alpha} \]
for every α .
Proof: If | | is not archimedean, this follows from Lemma 11.5.3. If instead | | is
archimedean then, as noted before, the result in one variable is classical and due to Cauchy,
and in general follows by induction on the number of variables. 

Corollary 11.5.10. In the notation of 11.5.7, it holds
\[ |\partial_\alpha f(x_0)| \le \rho_r(f)\, r_0^{-\alpha} . \]
Proof: Clear from Proposition 11.5.8, Proposition 11.5.9, and the Taylor series expansion
(11.5) around x0 . 
A related estimate is Schwarz’s lemma:

Proposition 11.5.11. Let $f \in K\{r^{-1}x\}$ vanish to order k at 0 . Then for 0 < t ≤ 1 it
holds
\[ \rho_{tr}(f) \le t^k \rho_r(f). \]
Proof: In the non-archimedean case, the claim follows directly from Lemma 11.5.3. If
instead | | is archimedean, we may assume that K = C . By Lemma 11.5.4, there is
$x^* \in P(tr)$ with $|f(x^*)| = \rho_{tr}(f)$. Since f vanishes to order k at 0 , the function
$g(\zeta) = \zeta^{-k} f(\zeta x^*/t)$ is an element of K{ζ} . By the maximum principle, we have
\[ t^{-k} \rho_{tr}(f) = |g(t)| \le \max_{|\zeta|=1} |g(\zeta)| = \max_{|\zeta|=1} |f(\zeta x^*/t)|, \]
and the result follows from Lemma 11.5.4. □


Proposition 11.5.12. Let x = (x1 , . . . , xn ) , y = (y1 , . . . , ym ) , let $f \in K\{r^{-1}x\}$. For
j = 1, . . . , n , let $g_j \in K\{s^{-1}y\}$ with $\rho_s(g_j) \le r_j$. Then f (g1 , . . . , gn ) is an element of
$K\{s^{-1}y\}$ with
\[ \rho_s(f(g_1, \ldots, g_n)) \le \rho_r(f). \]
Proof: Again, by the rescaling described at the end of 11.5.2, we may assume r = (1, . . . , 1)
and s = (1, . . . , 1) . If f is a polynomial, then f (g1 , . . . , gn ) ∈ K{y} . In general, f is a
uniform limit of polynomials on the closed polydisk P and the claim follows from the fact
that ρ is the supremum norm on P . 

11.5.13. Given a polynomial p(x, t) in n + 1 variables with coefficients in the complete
field K and p(0, 0) = 0, $\frac{\partial p}{\partial t}(0, 0) \neq 0$, we know by the implicit function theorem that,
at least in characteristic 0, the equation p(x, ξ) = 0 has a unique solution ξ = ξ(x) , with
ξ(0) = 0 , in a neighborhood of 0 . In fact, ξ(x) has a Taylor series expansion
\[ \xi(x) = \sum a_\alpha x^\alpha \]
convergent in a neighborhood of 0 .

We give here a proof of the local Eisenstein theorem for a strictly convergent power
series p(x, t) . Our argument is as for the implicit function theorem in calculus, but nonethe-
less we give here a detailed proof.

Theorem 11.5.14. Let $p(x, t) \in K\{R^{-1}x, S^{-1}t\}$. Suppose that
\[ p(0, 0) = 0 \quad \text{and} \quad \frac{\partial p}{\partial t}(0, 0) \neq 0. \]
There is a unique ξ(x) ∈ K[[x]] with the properties
\[ p(x, \xi(x)) = 0, \qquad \xi(0) = 0. \]
Let
\[ B := S \left| \frac{\partial p}{\partial t}(0, 0) \right| \rho_{R,S}(p)^{-1} \]
and let $r < B^2$ in the non-archimedean case, resp. $r = \frac{1}{|32|} B^2$ in the archimedean case.
Then $\xi(x) \in K\{(rR)^{-1}x\}$ and
\[ \sup_{x \in P(rR)} |\xi(x)| \le \begin{cases} BS & \text{if } |\ | \text{ is not archimedean} \\ \frac{1}{|16|} BS & \text{if } |\ | \text{ is archimedean.} \end{cases} \]

Existence and uniqueness of ξ will follow from an application of Banach’s fixed point
theorem, and we will also get a polydisk P on which ξ is strictly convergent and an upper
bound for the supremum norm of ξ on P . Then Cauchy’s inequalities will give the desired
estimate for the Taylor coefficients aα .
We first recall Banach’s fixed point theorem:
Theorem 11.5.15. Let (X, d) be a complete metric space and let ϕ : X −→ X be a
contractive map, i.e. there is θ < 1 such that
d (ϕ(x), ϕ(y)) ≤ θ · d(x, y) for all x, y ∈ X.
Then ϕ has a unique fixed point.
Proof: We start with an arbitrary point x0 of X . Then we apply ϕ iteratively to define
$x_n := \varphi^n(x_0)$. The triangle inequality shows
\[ d(x_{k+l}, x_k) \le \sum_{n=0}^{l-1} d(x_{k+n+1}, x_{k+n}) \le \sum_{n=0}^{l-1} \theta^{k+n} d(x_1, x_0) \le \frac{\theta^k}{1-\theta} \cdot d(x_1, x_0). \]
Hence we have a Cauchy sequence converging to x ∈ X and it follows from continuity of
ϕ that x is a fixed point. Uniqueness is clear for a contraction. 
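The iteration in this proof, together with the estimate θ^k/(1 − θ) · d(x1, x0) obtained by letting l → ∞, can be illustrated on a toy example. This is only a sketch under the assumption X = [0, 1] with the usual distance and ϕ = cos, which maps X into itself and is contractive with θ = sin 1 by the mean value theorem; the helper name `iterate` is hypothetical.

```python
import math

def iterate(phi, x0, steps):
    """Return the list [x0, phi(x0), phi(phi(x0)), ...] with steps + 1 entries."""
    xs = [x0]
    for _ in range(steps):
        xs.append(phi(xs[-1]))
    return xs

theta = math.sin(1.0)              # Lipschitz constant of cos on [0, 1]
xs = iterate(math.cos, 0.0, 200)   # x_n = phi^n(x_0)
fixed = xs[-1]                     # numerical approximation of the fixed point

# A priori bound for the distance from x_k to the fixed point.
def a_priori_bound(k):
    return theta ** k / (1.0 - theta) * abs(xs[1] - xs[0])

print(fixed, abs(xs[10] - fixed), a_priori_bound(10))
```

The bound is far from sharp for this particular map, but it only uses the first step d(x1, x0) and the contraction constant, which is exactly what the proof of Theorem 11.5.14 exploits.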
Proof of Theorem 11.5.14: We indicate by a dot the partial derivative $\frac{\partial}{\partial t}$, as in $\dot f := \frac{\partial f}{\partial t}$.
By the Cauchy inequalities (see Proposition 11.5.9), we have
\[ B := S\, |\dot p(0, 0)|\, \rho_{R,S}(p)^{-1} \le 1. \]
If the absolute value is archimedean, we may assume that | | is the usual euclidean absolute
value. Let
g(x, t) := t − ṗ(0, 0)−1 p(x, t).
We shall apply Banach’s fixed point theorem to the map
\[ \varphi : f(x) \longmapsto g(x, f(x)). \]
We have to choose an appropriate set X of power series and a metric d(x, y) in X which
satisfy the hypothesis of the Banach fixed point theorem. The set X will depend on positive
parameters r1 , . . . , rn , s to be chosen later, which at this moment are required to satisfy
rj /Rj ≤ s/S ≤ 1 and, in the archimedean case, the stronger condition rj /Rj ≤ s/S ≤
1/4 . As usual, we set r := (r1 , . . . , rn ) . We define
X = X(r, s) := {f ∈ K{r−1 x} | ρr (f ) ≤ s}.
The set X is a closed subset of K{r−1 x} , hence X is complete with respect to the supre-
mum norm ρr in the polydisk P (r) . We shall determine the parameters r, s in such a way
that ϕ is a contraction on X with respect to the distance function d(f1 , f0 ) = ρr (f1 −f0 ) .
In order to achieve this, we may always assume, after a suitable field extension as in Lemma
11.5.2, that rj , R, s, S ∈ |K| . This has the advantage that the spectral norm of the occurring
strictly convergent power series equals the maximum norm on the corresponding polydisk
(see Lemma 11.5.4). At the end of the argument for applying Banach’s fixed point theorem,
we will go back to the original K to be sure that ξ(x) will have coefficients in K .
First, we claim that for t, t0 ∈ K with |t| ≤ s, |t0 | ≤ s , we have
ρr (g(x, t) − g(x, t0 )) ≤ θ |t − t0 |, (11.6)
where
\[ \theta := \begin{cases} \frac{s}{S} B^{-1} & \text{in the non-archimedean case} \\ \frac{8s}{S} B^{-1} & \text{in the archimedean case.} \end{cases} \]
Suppose first that the absolute value is archimedean. By Ostrowski’s theorem in 1.2.6, we
may assume K = C . For x ∈ P (r) , we have
\[ g(x, t) - g(x, t_0) = \int_{t_0}^{t} \dot g(x, u)\, du, \tag{11.7} \]

where the integral is on the line segment from t0 to t ; note that if |t0 |, |t| ≤ s then |u| ≤ s
follows by convexity, and also ġ is well defined in any closed polydisk in the interior of
P (R, S) . By construction, we have ġ(0, 0) = 0 . We apply Schwarz’s lemma from 11.5.11
and Cauchy’s inequality (see Corollary 11.5.10), getting (note that rj /Rj ≤ s/S ≤ 1/4
and B ≤ 1 )
\[ \rho_{r,s}(\dot g) \le \frac{2s}{S}\, \rho_{\frac{S}{2s} r,\, \frac{S}{2}}(\dot g) \le \frac{4s}{S^2}\, \rho_{(R,S)}(g) \le \frac{4s}{S}\,(1 + B^{-1}) \le \frac{8s}{S}\, B^{-1} . \]
Using this bound in (11.7), we get (11.6).
In the non-archimedean case, let us introduce the operator $D_t$ acting on power series in t
by $f(t) \mapsto \frac{1}{t}(f(t) - f(0))$.
For x ∈ P (r) and |t| ≤ s , |t0 | ≤ s , an easy calculation shows
\[ |g(x, t) - g(x, t_0)| \le \rho_{(r,s)}(D_t g)\, |t - t_0|, \]
whence Schwarz's lemma in 11.5.11 implies
\[ |g(x, t) - g(x, t_0)| \le \frac{s}{S}\, \rho_{(R,S)}(D_t g)\, |t - t_0|. \tag{11.8} \]
We have
\[ \rho_{(R,S)}(D_t g) \le S^{-1} \rho_{(R,S)}(g) \le B^{-1} . \]
Now (11.6) follows from this inequality and (11.8).
By Proposition 11.5.12, ϕ maps X to $K\{r^{-1}x\}$. The condition θ < 1 , as needed for the
application of Banach's fixed point theorem, follows from (11.6) if s/S < B or s/S < B/8
according as we are in the non-archimedean or archimedean case.
In order to complete the proof, we need to check that ϕ maps X into itself for r, s suffi-
ciently small. This means checking that
|g(x, t)| ≤ s
for x ∈ P (r) and |t| ≤ s .
We begin by noting that, since g(0, 0) = 0 , Schwarz’s lemma from 11.5.11 yields
\[ \sup_{x \in P(r)} |g(x, 0)| \le \lambda \sup_{x \in P(R)} |g(x, 0)| \le \lambda S B^{-1}, \tag{11.9} \]
where λ = max{rj /Rj } . Suppose first that | | is archimedean. By (11.6) and (11.9) we
have
\[ |g(x, t)| \le \theta s + |g(x, 0)| \le \theta s + \lambda S B^{-1} . \tag{11.10} \]
In the non-archimedean case, (11.6) and the ultrametric inequality give
\[ |g(x, t)| \le \max(\theta s, |g(x, 0)|) \le \max\big(\theta s, \lambda S B^{-1}\big). \tag{11.11} \]
Now let r, s be given by
\[ \frac{r_1}{R_1} = \cdots = \frac{r_n}{R_n} = a B^2, \qquad \frac{s}{S} = b B, \]
where we choose for example any 0 < a = b < 1 in the non-archimedean case, and
a = 1/32 and b = 1/16 in the archimedean case. Thus we see from (11.10) and (11.11)
that
\[ \rho_{(r,s)}(g(x, t)) \le s, \qquad \theta < 1 \]
hold. We apply Banach's fixed point theorem, obtaining a unique $\xi \in K\{r^{-1}x\}$ with
\[ p(x, \xi(x)) = 0, \qquad \rho_r(\xi) \le s. \]
This proves the existence of a solution ξ(x) satisfying the bound stated in the theorem.
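The arithmetic behind the archimedean choice a = 1/32, b = 1/16 can be verified mechanically. The sketch below assumes a single radius R (the case n = 1) and treats B, R, S as exact rationals; the function name `archimedean_parameters` is hypothetical and the code merely evaluates the right-hand side of (11.10) together with θ.

```python
from fractions import Fraction

def archimedean_parameters(B, R, S):
    """Evaluate r, s, theta and the right-hand side of (11.10)
    for the choice r/R = B^2/32 and s/S = B/16 (archimedean case)."""
    a, b = Fraction(1, 32), Fraction(1, 16)
    r = a * B ** 2 * R
    s = b * B * S
    theta = 8 * (s / S) / B          # theta = (8s/S) B^{-1}
    lam = r / R                      # lambda = max r_j / R_j
    bound = theta * s + lam * S / B  # theta*s + lambda*S*B^{-1}
    return r, s, theta, bound

for B in (Fraction(1), Fraction(1, 2), Fraction(1, 10)):
    r, s, theta, bound = archimedean_parameters(B, Fraction(3), Fraction(5))
    print(B, theta, bound == s)
```

For every admissible B ≤ 1 this gives θ = 1/2 and a bound exactly equal to s, so ϕ maps X into itself and is a contraction, as claimed.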
It is also clear that ξ(0) = 0 . In fact, since ϕ maps the closed set
X0 := {f ∈ X | f (0) = 0}
into itself, we may apply Banach’s fixed point theorem to X0 as well. Uniqueness of the
fixed point shows that ξ ∈ X0 , hence ξ(0) = 0 .
We leave the proof of the uniqueness of ξ as a formal power series as an exercise for the
reader. For example, we can apply once more the Banach fixed point theorem, in the space
K[[x]] of formal power series with the norm
\[ |f| = \max_{a_\alpha \neq 0} e^{-|\alpha|} , \]
to the set X := {f ∈ K[[x]] | f (0) = 0} , with the same map ϕ . □


The next statement is a sharper form of the local Eisenstein theorem for a polynomial.
Corollary 11.5.16. Let p ∈ K[x, t] with p(0, 0) = 0 and $\frac{\partial p}{\partial t}(0, 0) \neq 0$. Denote by |p|
the Gauss norm of p . Let d1 , . . . , dn+1 be the partial degrees of p and let
\[ D := \prod_{j=1}^{n+1} (d_j + 1). \]
Then there is a unique
\[ \xi = \sum_{\alpha \in \mathbb{N}^n} a_\alpha x^\alpha \in K[[x]] \]
with ξ(0) = 0 and p(x, ξ(x)) = 0 . For $\alpha \in \mathbb{N}^n$, we have
\[ |a_\alpha| \le C^{|\alpha|} \cdot \left( \frac{|p|}{\left| \frac{\partial p}{\partial t}(0, 0) \right|} \right)^{2|\alpha|-1} , \]
where
\[ C = \begin{cases} 1 & \text{in the non-archimedean case} \\ |32 D^2| & \text{in the archimedean case.} \end{cases} \]
Proof: This follows from Theorem 11.5.14 and the Cauchy inequalities in 11.5.9, taking
(R, S) = (1, . . . , 1) . In the non-archimedean case, we have ρ(p) = |p| . In the
archimedean case, the claim is a consequence of ρ(p) ≤ |D| · |p| . □

11.5.17. The bound given by the local Eisenstein theorem is sharp in the non-archimedean
case, and not far from the truth in the archimedean case.
Consider the polynomial p(x, t) = x − 2at + t² , where a ≠ 0 is a parameter satisfying
|a| ≤ 1 . The formal power series solution with ξ(0) = 0 is given by
\[ \xi(x) = a - a\sqrt{1 - x/a^2} = \sum_{j=1}^{\infty} (-1)^{j-1} \binom{1/2}{j}\, a^{1-2j} x^j . \]
It is easy to see that $(-1)^{j-1}\, 2^{2j-1} \binom{1/2}{j}$ is a positive integer for j ≥ 1 .
Suppose that | | is not archimedean, with valuation ring R and residue field k(v) , and
suppose that |2| = 1 . Then
\[ |\dot p(0, 0)|/|p| = \frac{|2a|}{\max(1, |2a|)} = |a|, \]
and it follows that
\[ |\partial_j \xi(0)| = \left( \frac{|p|}{\left| \frac{\partial p}{\partial t}(0, 0) \right|} \right)^{2j-1} \]
whenever $\left| \binom{1/2}{j} \right| = 1$. This indeed happens for infinitely many values of j if |2| = 1 .
Suppose this is not the case. We choose a ∈ K with |a| = 1 and look at the reduction
π(ξ(x)) of ξ(x) ∈ R[[x, a−1 ]] modulo the maximal ideal of R . Then this reduction would
be a polynomial in x and a−1 , contradicting the fact that the polynomial x − 2π(a)t + t2
is irreducible over k(v)(x) .
The same example shows that, apart from the value of the numerical constant C , the bound
given in the local Eisenstein theorem is also sharp in the archimedean case.
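For this example the fixed point map of the proof of Theorem 11.5.14 is explicit: with g(x, t) = t − ṗ(0, 0)⁻¹ p(x, t) one gets ϕ(f) = (x + f²)/(2a). The following sketch iterates ϕ on truncated power series over Q and compares with the closed form a_j = C_{j−1} (2a)^{1−2j}, where C_{j−1} = (1/j)·binom(2j−2, j−1) is a Catalan number (this rewriting of the binomial formula above is a standard identity). The function names and the sample value a = 1/3 are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def mul_trunc(f, g, n):
    """Product of two power series (coefficient lists), truncated at degree n."""
    h = [Fraction(0)] * (n + 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j > n:
                break
            h[i + j] += fi * gj
    return h

def solve_by_iteration(a, n):
    """Iterate f -> (x + f^2) / (2a) from f = 0; after n >= 1 steps the
    coefficients up to degree n agree with those of xi."""
    f = [Fraction(0)] * (n + 1)
    for _ in range(n):
        sq = mul_trunc(f, f, n)
        sq[1] += 1                      # add the monomial x
        f = [c / (2 * a) for c in sq]
    return f

def closed_form(a, j):
    """a_j = Catalan(j-1) * (2a)^(1-2j) for j >= 1."""
    catalan = comb(2 * j - 2, j - 1) // j
    return Fraction(catalan) / (2 * a) ** (2 * j - 1)

a = Fraction(1, 3)
coeffs = solve_by_iteration(a, 8)
print(coeffs[1:4])
```

Each iteration fixes one more coefficient, because ϕ(f) − ξ = (f − ξ)(f + ξ)/(2a) and f + ξ has no constant term. With the usual absolute value on Q, the computed coefficients can also be compared with the bound of Corollary 11.5.16, where D = (1 + 1)(2 + 1) = 6 and C = 32·D² = 1152.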

11.6. A lower bound for the height

Let C be an irreducible projective smooth curve over a number field K and let
us fix P0 ∈ C(K). Let V be a Vojta divisor satisfying (V1) and (V2). We are
interested in getting a lower bound for the height hV (P, Q), where P, Q ∈ C(K).
This is obtained by means of Lemma 11.6.5 and Lemma 11.6.7. The first lemma
gives an explicit lower bound in terms of the Taylor coefficients of local coordinates
for C viewed as algebraic functions of a uniformizing parameter of C at P or Q.
The second lemma applies the local Eisenstein theorem to bound these Taylor
coefficients. The reader is assumed to be familiar with the concept of tangent
spaces provided by A.7.
11.6.1. Let ∂, ∂′ be non-zero vectors in the tangent space of C at P, Q. As in
11.5.7, we abbreviate
\[ \partial_i := \frac{1}{i!}\, \partial^i , \qquad \partial'_i := \frac{1}{i!}\, \partial'^{\,i} . \]
Any differential operator on $O_{C \times C,(P,Q)}$ of degree k with values in K(P, Q) is
a homogeneous polynomial of degree k in the variables ∂, ∂′ with coefficients in
K(P, Q). In fact, they act on K(C × C). The advantage of the normalizations
above is that Leibniz's rule has an easier form. If f1 , . . . , fr are rational functions
on C , then
\[ \partial_i(f_1 \cdots f_r) = \sum \partial_{i_1} f_1 \cdots \partial_{i_r} f_r , \]
where the sum ranges over all (i1 , . . . , ir ) with i1 + · · · + ir = i. This is easily
proved by induction on r , starting with r = 2.
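The normalized Leibniz rule is easy to test on polynomials, where ∂_i = (1/i!)(d/dx)^i can be computed directly. The sketch below uses one variable and exact rationals; the helper names and the three sample polynomials are hypothetical choices for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def derivative(f):
    """Derivative of a polynomial given as a coefficient list (lowest degree first)."""
    return [k * c for k, c in enumerate(f)][1:]

def normalized_derivative_at(f, i, x):
    """(1/i!) (d/dx)^i f evaluated at x."""
    for _ in range(i):
        f = derivative(f)
    return sum((c * x ** k for k, c in enumerate(f)), Fraction(0)) / factorial(i)

def poly_mul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def leibniz_sum(fs, i, x):
    """Sum of d_{i_1} f_1 ... d_{i_r} f_r over all (i_1, ..., i_r) with sum i."""
    total = Fraction(0)
    for idx in product(range(i + 1), repeat=len(fs)):
        if sum(idx) == i:
            term = Fraction(1)
            for f, k in zip(fs, idx):
                term *= normalized_derivative_at(f, k, x)
            total += term
    return total

fs = [[1, 2], [0, 1, 3], [2, 0, 0, 1]]   # 1 + 2x, x + 3x^2, 2 + x^3
prod_poly = poly_mul(poly_mul(fs[0], fs[1]), fs[2])
x = Fraction(1, 2)
print(normalized_derivative_at(prod_poly, 4, x), leibniz_sum(fs, 4, x))
```

Both printed values should agree; note that no multinomial coefficients appear in the sum, which is the simplification the normalization buys.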
11.6.2. Let s ∈ Γ(C × C, O(V )) \ {0}. A pair $(i_1^*, i_2^*) \in \mathbb{N}^2$ is called admissible
if and only if
\[ \partial_{i_1^*} \partial'_{i_2^*} s(P, Q) \neq 0 \]
and
\[ \partial_{i_1} \partial'_{i_2} s(P, Q) = 0 \]
whenever $i_1 \le i_1^*$, $i_2 \le i_2^*$ and $(i_1, i_2) \neq (i_1^*, i_2^*)$. In order to make sense of the
above formulas, we should choose a trivialization of O(V ) in (P, Q). It is also
clear that admissibility is independent of the choice of the trivialization and the
choice of ∂, ∂′. By 11.2.6
\[ \xi_{ij} := (x_i/x_j)|_C , \qquad \xi'_{ij} := (x'_i/x'_j)|_C \]
are well-defined non-zero rational functions on C for i, j = 0, . . . , n. We also
write $\xi_j$ for the vector with components $\xi_{ij}$, i = 0, . . . , n, and similarly for $\xi'_{j'}$.
11.6.3. We are going to choose an explicit height function relative to O(V ).
Choose a finite extension L/K such that P, Q ∈ C(L). We use the decompo-
sition
V = (δ1 N {P0 } × C + δ2 N C × {P0 }) − dB
of V as a difference of two very ample divisors (see 11.2.6). Now we have generating
sections $x^h x'^{h'}$ ($|h| = \delta_1$, $|h'| = \delta_2$) of $O(\delta_1 N \{P_0\} \times C + \delta_2 N\, C \times \{P_0\})$
and $y^i$ ($|i| = d$) of O(dB). With respect to the presentation (see 2.2.1)
\[ \Big( s_V ;\ O(\delta_1 N \{P_0\} \times C + \delta_2 N\, C \times \{P_0\}),\ x^h x'^{h'} ;\ O(dB),\ y^i \Big), \]
we have the global height function (see 2.4.6)
\[ h_V(P, Q) := \sum_{v \in M_L} \max_{h,h'} \min_{i} \log \left| \frac{x^h x'^{h'}}{y^i}(P, Q) \right|_v
= \sum_{v \in M_L} \max_{j,j'} \min_{i} \log \left| \frac{x_j^{\delta_1}\, x'^{\,\delta_2}_{j'}}{y_i^d}(P, Q) \right|_v . \]
Note that the vectors x(P ), x′(Q) and y(P, Q) are only defined up to a multiple.
By the product formula, hV (P, Q) is well-defined; it is the difference of two Weil
heights, the first given by the closed embedding
\[ (P, Q) \longmapsto \Big( x^h(P)\, x'^{h'}(Q) \Big)_{|h| = \delta_1,\ |h'| = \delta_2} \]
and the second given by the closed embedding
\[ (P, Q) \longmapsto \big( y^i(P, Q) \big)_{|i| = d} . \]
11.6.4. From Lemma 11.2.7, we get bihomogeneous polynomials $F_i(x, x')$ of
bidegree (δ1 , δ2 ) with
\[ s = \big( F_i(x, x')/y_i^d \big)\big|_{C \times C} , \qquad i = 0, \ldots, m. \]
Let h(F) be the height of the point in appropriate projective space whose coordinates
are given by all the coefficients of F0 , . . . , Fm . For v ∈ MK , let $j_v$ be the
index j for which $|\xi_{j0}(P)|_v$ is largest and similarly $j'_v$ for $|\xi'_{j0}(Q)|_v$.
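Lemma 11.6.5. Let s be a non-zero global section of O(V ) and let $(i_1^*, i_2^*)$ be
admissible for s at (P, Q). With the notation introduced above, we have
\[
\begin{aligned}
h_V(P, Q) \ge{} & - h(F) - n \log\big( (\delta_1 + n)(\delta_2 + n) \big) \\
& - \sum_{v \in M_L} \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu} \log |\partial_{i_\lambda} \xi_{\nu j_v}(P)|_v \\
& - \sum_{v \in M_L} \max_{\{i'_\lambda\}} \sum_{\lambda} \max_{\nu} \log |\partial'_{i'_\lambda} \xi'_{\nu j'_v}(Q)|_v \\
& - (\delta_1 + \delta_2 + i_1^* + i_2^*) ,
\end{aligned}
\]
where $\{i_\lambda\}$ and $\{i'_\lambda\}$ run over all partitions of $i_1^*$ and $i_2^*$.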
Proof: We fix trivializations of OPn (1) at P and Q and of OPm (1) at (P, Q). By
tensor product, this gives trivializations of all the bundles in question, in particular
of V by 11.2.3 and 11.2.6. In the following, we use the trivializations without
mention. For example s is viewed as a regular function at (P, Q).
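We have seen in 11.6.3 that
\[ h_V(P, Q) = - \sum_{v} \max_{i} \min_{j,j'} \log \left| \frac{y_i^d}{x_j^{\delta_1}\, x'^{\,\delta_2}_{j'}}(P, Q) \right|_v . \]
Assume that $x_j(P)\, x'_{j'}(Q)\, y_i(P, Q) \neq 0$. Only such i, j, j′ are of importance in
the above formula. By admissibility and Leibniz's rule, we have
\[ \left( \frac{y_i^d}{x_j^{\delta_1}\, x'^{\,\delta_2}_{j'}}\, \partial_{i_1^*} \partial'_{i_2^*} s \right)(P, Q) = \partial_{i_1^*} \partial'_{i_2^*} \left( \frac{y_i^d}{x_j^{\delta_1}\, x'^{\,\delta_2}_{j'}}\, s \right)(P, Q) \]
and the right-hand side equals
\[ \partial_{i_1^*} \partial'_{i_2^*} F_i(\xi_j, \xi'_{j'})(P, Q). \]
Using $(\partial_{i_1^*} \partial'_{i_2^*} s)(P, Q) \neq 0$ and the product formula, we get
\[ h_V(P, Q) = - \sum_{v} \max_{i} \min_{j,j'} \log \big| \partial_{i_1^*} \partial'_{i_2^*} F_i(\xi_j, \xi'_{j'})(P, Q) \big|_v . \]
The number of monomials of $F_i$ is bounded by $\binom{\delta_1+n}{n}\binom{\delta_2+n}{n} \le (\delta_1 + n)^n (\delta_2 + n)^n$.
We conclude that
\[
\begin{aligned}
h_V(P, Q) \ge{} & - h(F) - n \log\big( (\delta_1 + n)(\delta_2 + n) \big) \\
& - \sum_{v} \min_{j} \max_{|l| = \delta_1} \log \big| \partial_{i_1^*} \xi_j^{\,l}(P) \big|_v \\
& - \sum_{v} \min_{j'} \max_{|l'| = \delta_2} \log \big| \partial'_{i_2^*} \xi'^{\,l'}_{j'}(Q) \big|_v .
\end{aligned}
\]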

Now, since for each v we take the minimum with respect to j and j′ , we may
take instead $j = j_v$ and $j' = j'_v$. This remark is quite important in what follows.
Let us consider $\log |\partial_{i_1^*} \xi_j^{\,l}(P)|_v$ for v ∈ ML . By Leibniz's rule, we have
\[ \partial_{i_1^*} \xi_j^{\,l} = \sum \prod_{\nu=0}^{n} \prod_{\mu=1}^{l_\nu} \partial_{i_{\mu\nu}} \xi_{\nu j} , \]
where $\sum_{\mu\nu} i_{\mu\nu} = i_1^*$. Since the total number of pairs µν equals δ1 , the number
of possibilities for $i_{\mu\nu}$ equals $\binom{\delta_1 + i_1^* - 1}{i_1^*} \le 2^{\delta_1 + i_1^*}$. We are interested in the case
in which $j = j_v$ and $j' = j'_v$. We note that, since $|\xi_{j_v 0}(P)|_v$ is the largest
$|\xi_{j0}(P)|_v$, we have $|\xi_{\nu j_v}(P)|_v = |\xi_{\nu 0}(P)/\xi_{j_v 0}(P)|_v \le 1$ for every ν , allowing
us to get rid of the terms with $i_{\mu\nu} = 0$ in estimating derivatives by means of
Leibniz's rule; the same remark of course applies to $|\xi'_{j'_v}|_v$.
This gives the bound
\[ \log \big| \partial_{i_1^*} \xi_{j_v}^{\,l}(P) \big|_v \le \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu} \log |\partial_{i_\lambda} \xi_{\nu j_v}(P)|_v + (\delta_1 + i_1^*)\, \varepsilon_v \log 2, \]
where $\varepsilon_v = [L_v : \mathbb{R}]/[L : \mathbb{Q}]$ if v|∞ and $\varepsilon_v = 0$ otherwise, and where $\{i_\lambda\}$
runs over all partitions of $i_1^*$. An entirely analogous estimate holds for the sum
involving $\xi'_{j'}$, and Lemma 11.6.5 follows. □
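The two counting estimates used in this proof can be checked by brute force for small parameters. The sketch below enumerates the tuples of non-negative indices directly; the function name `count_index_choices` and the tested ranges are assumptions made for this illustration.

```python
from itertools import product
from math import comb

def count_index_choices(num_factors, total):
    """Number of tuples of non-negative integers of length num_factors
    whose entries sum to total (ways to distribute the order of derivation)."""
    return sum(
        1
        for t in product(range(total + 1), repeat=num_factors)
        if sum(t) == total
    )

for delta in range(1, 5):        # plays the role of delta_1
    for i_star in range(0, 5):   # plays the role of i_1^*
        exact = count_index_choices(delta, i_star)
        print(delta, i_star, exact, comb(delta + i_star - 1, i_star), 2 ** (delta + i_star))
```

The enumeration reproduces the binomial coefficient binom(δ1 + i1* − 1, i1*), which is bounded by 2^(δ1 + i1*); the same kind of check applies to binom(δ + n, n) ≤ (δ + n)^n for δ, n ≥ 1.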
11.6.6. Our next task consists in majorizing the sums appearing in Lemma 11.6.5;
this we do by an application of the local Eisenstein theorem. Let us fix a non-
constant f ∈ K(C). For any P ∈ C(K) which is neither a pole of f nor a
zero of df , the function ζ = f − f (P ) is a local uniformizer at P (i.e. a local
parameter in the local ring). Therefore the completion of OC,P with respect to its
maximal ideal is isomorphic to K(P )[[ζ]] (by the Cohen structure theorem, see
[148], Th.I.5.5A). Moreover, we may differentiate with respect to ζ .

By assumption and A.4.11, K(C) is a finite extension of K(f ). Choose $g_{ij}(x, t) \in K[x, t]$
such that $g_{ij}(f, t) \in K(f)[t]$ is a minimal polynomial of $\xi_{ij}$ over K(f ).
Since char(K) = 0, we have $\frac{\partial}{\partial t} g_{ij}(f, \xi_{ij}) \neq 0$ in K(C) (irreducible polynomials
are separable). Let $\deg(g_{ij})$ be the total degree of $g_{ij}(x, t)$.
Let us denote by Z the finite subset of C(K) consisting of:
(a) all zeros of xj , for j = 0, . . . , n;
(b) all poles of f ;
(c) the support of div(df );
(d) the zeros of $\frac{\partial}{\partial t} g_{ij}(f, \xi_{ij})$, for i, j = 0, . . . , n.
We are going to apply the local Eisenstein theorem to the polynomials
\[ p_{ij}(x, t) := g_{ij}(x + f(P),\ t + \xi_{ij}(P)) \]
for any P ∉ Z . Note that $p_{ij}(0, 0) = 0$ and $\frac{\partial}{\partial t} p_{ij}(0, 0) \neq 0$.
Let L/K be a finite extension with P ∈ C(L). For v ∈ ML , we have
\[ |p_{ij}|_v \le |C_1|_v^{\eta_v}\, \max\big(1, |f(P)|_v, |\xi_{ij}(P)|_v\big)^{\deg(g_{ij})}\, |g_{ij}|_v , \tag{11.12} \]
where $C_1 \le (\deg(g_{ij}) + 1)^2\, 2^{\deg(g_{ij})}$ and
\[ \eta_v = \begin{cases} 1 & \text{if } v \text{ is archimedean} \\ 0 & \text{if } v \text{ is not archimedean.} \end{cases} \]
Since $\xi_{ij}$ is regular at P , we get from the local Eisenstein theorem in 11.4.1
\[ |\partial_k \xi_{ij}(P)|_v \le |C_2|_v^{k \eta_v} \left( \frac{|p_{ij}|_v}{\left| \frac{\partial}{\partial t} p_{ij}(0, 0) \right|_v} \right)^{2k-1} \tag{11.13} \]
for k ≥ 1 and $\partial_k = (1/k!)(\partial/\partial\zeta)^k$, with for example $C_2 = \max 32(\deg(g_{ij}) + 1)^4$
(using the sharper estimate in Corollary 11.5.16). Now we relate the sum in
Lemma 11.6.5 with the canonical form on the Jacobian.
Lemma 11.6.7. If P ∉ Z , then
\[ \sum_{v \in M_L} \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu} \log |\partial_{i_\lambda} \xi_{\nu j_v}(P)|_v \ll i_1^* \big( |j(P)|^2 + 1 \big), \]
where the maximum runs over all partitions $\{i_\lambda\}$ of $i_1^*$. The constant implied in
the symbol ≪ is independent of P and $i_1^*$.
Proof: We denote by a dot differentiation with respect to t , as in $\dot f = \frac{\partial f}{\partial t}$.
It is clear that (f, ξij ) may be viewed as the first two affine coordinates of a
morphism ϕ from C into some projective space. Since N [P0 ] is ample on C ,
there is k ∈ N such that O(kN [P0 ]) ⊗ ϕ∗ O(−1) is very ample (see A.6.10). By
Theorem 2.3.8, this proves
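\[ h\big((1 : f(P) : \xi_{ij}(P))\big) \le h_\varphi(P) \ll h_{N[P_0]}(P) + 1. \tag{11.14} \]
From (11.13), we have
\[
\sum_{v} \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu} \log |\partial_{i_\lambda} \xi_{\nu j_v}(P)|_v
\le i_1^* \log(C_2) + \sum_{v} \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu}\, (2 i_\lambda - 1) \left( \log^+ |p_{\nu j_v}|_v + \log^+ \left| \frac{1}{\dot p_{\nu j_v}(0, 0)} \right|_v \right).
\]
By (11.12) and (11.14), the right-hand side is bounded by
\[
2 i_1^*\, h(g) + 2 i_1^* \sum_{ij} \sum_{v} \log^+ \frac{1}{|\dot g_{ij}(f(P), \xi_{ij}(P))|_v}
+ 2 i_1^* \log(C_2 C_1) + 2 i_1^*\, C_3 \big( h_{N[P_0]}(P) + 1 \big),
\]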
where h(g) is the height of the vector formed with the coefficients of all polyno-
mials gij and 1, and where C3 is the largest constant involved in (11.14) times
the maximum of all $\deg(g_{ij})$. Now the claim follows from
\[ h_{N[P_0]}(P) = \frac{N}{2g}\, |j(P)|^2 + O(|j(P)|) + O(1), \]
which is a consequence of Proposition 11.3.1 applied to the Vojta divisor N {P0 }×
C , and from the fact that, by (11.14) and h(a) = h(1/a), we have
\[
\begin{aligned}
\sum_{ij} \sum_{v \in M_L} \log^+ \frac{1}{|\dot g_{ij}(f(P), \xi_{ij}(P))|_v}
&= \sum_{ij} h\left( \frac{1}{\dot g_{ij}(f(P), \xi_{ij}(P))} \right) \\
&= \sum_{ij} h\big( \dot g_{ij}(f(P), \xi_{ij}(P)) \big) \\
&\ll h\big((1 : f(P) : \xi_{ij}(P))\big) \ll h_{N[P_0]}(P) + 1.
\end{aligned}
\]
This completes the proof. □

11.7. Construction of a Vojta divisor of small height

In this section, we prove the crucial Lemma 11.7.3, which gives the existence
of a section of O(V ), with V a Vojta divisor, of small height. The argument
is fairly standard. The space of sections of O(V ) is presented as a subspace,
given by linear relations with small height, of a vector space with a standard basis.
The Riemann–Roch theorem shows that this subspace has large dimension, and
the existence of a small section follows by Siegel’s lemma or, equivalently, by
geometry of numbers.
Let C be an irreducible projective smooth curve of genus g over a number field K
and let us fix P0 ∈ C(K). We shall use the notation of Section 11.2, in particular
V will be a Vojta divisor satisfying (V1) and (V2).

Lemma 11.7.1. The following holds


dim Γ(C × C, ψ ∗ O(δ1 , δ2 )) = (N δ1 + 1 − g) (N δ2 + 1 − g)
and, for d1 + d2 > 4g − 4
dim Γ(C × C, O(V )) ≥ d1 d2 − gd2 + O(d1 + d2 ).
Proof: By our assumptions in 11.2.3, the divisors δi N [P0 ] are very ample on
C . By the Riemann–Roch theorem (see Theorem A.13.5) on the curve C and
N ≥ 2g + 1, we obtain
dim Γ(C, O(δi N [P0 ])) = N δi + 1 − g,
for i = 1, 2. Since O(δ1 N {P0 } × C + δ2 N C × {P0 }) is the pull-back of
O(δ1 , δ2 ) (see (11.1) on page 357), we get
dim Γ(C × C, ψ ∗ O(δ1 , δ2 )) ≥ (N δ1 + 1 − g)(N δ2 + 1 − g).
The right-hand side is the dimension of the subspace of Γ(C × C, ψ ∗ O(δ1 , δ2 ))
generated by tensor products s1 ⊗s2 , where si is a global section of OPn(δi N [P0 ]).
By (V2), it follows that any global section of O(δ1 N {P0 } × C + δ2 N C × {P0 })
is the restriction of a section of O(δ1 , δ2 ) (see 11.2.6, end). Because of
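\[ \Gamma(\mathbb{P}^n_K \times \mathbb{P}^n_K, O(\delta_1, \delta_2)) \cong \Gamma(\mathbb{P}^n_K, O_{\mathbb{P}^n}(\delta_1)) \otimes \Gamma(\mathbb{P}^n_K, O_{\mathbb{P}^n}(\delta_2)) \]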
(see Example A.6.13), the above-mentioned subspace equals Γ(C × C, ψ ∗ O(δ1 ,
δ2 )) and we get the first claim.
For the proof of the second claim, we need the Riemann–Roch theorem on the
surface C × C . First note that H = {P0 } × C + C × {P0 } is ample (from Lemma
11.2.4) and that
V · H = d1 + d 2
(from Lemma 11.2.1). Let KC be a canonical divisor on C ; then KC×C :=
KC × C + C × KC is a canonical divisor on C × C , numerically equivalent to
(2g − 2)H . The former follows immediately from TC×C = TC ⊕ TC . In order
to verify the latter, we may assume that we are working with varieties over an
algebraically closed field (see A.9.25). A canonical divisor KC of the curve C ,
of genus g , has degree 2g − 2 (see A.13.6), hence it is algebraically equivalent to
(2g − 2)[P0 ] because the degree characterizes algebraic equivalence of divisors on
a curve (see A.9.40). Therefore, (2g − 2)H is algebraically equivalent to KC×C
(use A.9.36) and a fortiori numerically equivalent to it (see A.9.41). If
d1 + d2 > KC×C · H = 4g − 4,
we have (KC×C − V ) · H < 0. Therefore
dim Γ(C × C, O(KC×C − V )) = 0,

because an effective divisor cannot have a negative intersection number with the
ample divisor H . By the Riemann–Roch theorem in A.13.9, we get
\[ \dim\Gamma(C\times C,O(V)) \ge \tfrac12\,V\cdot(V-K_{C\times C}) + 1 + p_a(C\times C), \]
where $p_a(C\times C)$ is the arithmetic genus. Again by Lemma 11.2.1, we get
\[ V\cdot V = 2d_1d_2 - 2gd^2. \]
Together with
\[ V\cdot K_{C\times C} = (2g-2)\,V\cdot H = (2g-2)(d_1+d_2), \]
we get the second claim.
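For the reader's convenience, the arithmetic behind the second claim can be spelled out as follows. This is only a worked restatement of the three displayed facts (the Riemann–Roch inequality, $V\cdot V = 2d_1d_2-2gd^2$ and $V\cdot K_{C\times C}=(2g-2)(d_1+d_2)$), with the constant $1+p_a(C\times C)$ absorbed in the $O$-term.

```latex
\begin{align*}
\dim\Gamma(C\times C,O(V))
 &\ge \tfrac12\bigl(V\cdot V - V\cdot K_{C\times C}\bigr) + 1 + p_a(C\times C)\\
 &= \tfrac12\bigl(2d_1d_2 - 2gd^2 - (2g-2)(d_1+d_2)\bigr) + 1 + p_a(C\times C)\\
 &= d_1d_2 - gd^2 - (g-1)(d_1+d_2) + 1 + p_a(C\times C)\\
 &= d_1d_2 - gd^2 + O(d_1+d_2).
\end{align*}
```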
11.7.2. We need now another assumption for the parameters of the Vojta divisor.
(V3) $d_1+d_2>4g-4$ and $d_1d_2-gd^2>\gamma\,d_1d_2$ for some $\gamma>0$.
Here, γ is independent of d1 , d2 , d . As we have seen at the beginning of A.11.6,
we may map C × C by a birational morphism onto a hypersurface of degree D in
P3K . We denote the projective coordinates in P3K by z and we may assume that
the polynomial giving the hypersurface has the same form as in A.11.7 or in 2.5.1.
All presentations (see 2.5.4) will refer to this set up.

Now we are ready to construct a Vojta divisor with small height.


Lemma 11.7.3. There are two positive constants C4 , C5 independent of d1 , d2 ,
d and γ with the following property. Let V be a Vojta divisor satisfying (V1), (V2),
(V3), and d1 , d2 ≥ C4 /γ . Then there is a non-zero global section s of O(V ) such
that the polynomials F0 , . . . , Fm in Lemma 11.2.7 may be chosen with
h(F) ≤ C5 (d1 + d2 )/γ.
Proof: The idea is to apply Siegel’s lemma to get a section of small height. Thus
we have to transfer the equations in Lemma 11.2.7 into a linear system of equations
with coefficients in K . Thereby, we must be careful not to increase the height of
the linear system too much.
We consider C as a curve in PnK of degree N (via the closed embedding ϕN [P0 ] )
and we may also assume, by a linear change of coordinates, that the projection
p(x) = (x0 : x1 : x2 ) maps C by a birational morphism into P2K . The purpose
of this projection is to reduce the number of linear equations to be considered
in the application of Siegel’s lemma; this is a rather important step in our proof.
Moreover, we may also assume that p(C) is explicitly given by a homogeneous polynomial
\[ f(x_0,x_1,x_2) = a_0 + a_1x_2 + \dots + a_{N-1}x_2^{N-1} + x_2^N \]
with $a_i\in K[x_0,x_1]$ homogeneous polynomials of degree $N-i$ (see A.11.5–A.11.7 for details).

Now let U be the subspace of Γ(C × C, O(V)) consisting of sections of the form
\[ s = y_i^{-d}F_i(x_0,x_1,x_2;x_0',x_1',x_2')\big|_{C\times C} \]
for i = 0, . . . , m and bihomogeneous polynomials $F_i$ of bidegree $(\delta_1,\delta_2)$ with the additional restriction that
\[ \deg_{x_2}F_i<N,\qquad \deg_{x_2'}F_i<N. \]
We may assume $\delta_i\ge N$; then the number of possible monomials so involved in $F_i$ is
\[ N^2\Bigl(\delta_1-\frac{N-3}{2}\Bigr)\Bigl(\delta_2-\frac{N-3}{2}\Bigr). \]
Since f is irreducible over K(x0 , x1 ), we see that the restrictions to C × C of the
above monomials are linearly independent over K. By Lemma 11.2.7, the vector space Γ(C × C, O(V)) may be identified with
\[ W := \bigl\{(F_i)\in\Gamma(C\times C,O(\delta_1,\delta_2))^{m+1} \mid (F_i)\ \text{satisfies (11.2) on page 358}\bigr\}. \]
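The monomial count used for U can be checked by brute force. The following is a minimal sketch, not part of the original argument: it counts monomials of degree δ in x0, x1, x2 with exponent of x2 below N, compares with N(δ − (N − 3)/2), and computes the codimension against the dimension (Nδ1 + 1 − g)(Nδ2 + 1 − g) from Lemma 11.7.1. The function names are illustrative only.

```python
from itertools import product


def count_monomials(delta: int, N: int) -> int:
    """Count monomials x0^a x1^b x2^c with a + b + c = delta and c < N."""
    # a is determined by b and c, so it suffices to count pairs (b, c).
    return sum(1 for b, c in product(range(delta + 1), range(N))
               if b + c <= delta)


def twice_closed_form(delta: int, N: int) -> int:
    """Return 2 * N * (delta - (N - 3)/2), kept in integer arithmetic."""
    return N * (2 * delta - (N - 3))


def codim_single_copy(delta1: int, delta2: int, N: int, g: int) -> int:
    """(N*delta1 + 1 - g)(N*delta2 + 1 - g) minus the number of monomials in U."""
    full = (N * delta1 + 1 - g) * (N * delta2 + 1 - g)
    return full - count_monomials(delta1, N) * count_monomials(delta2, N)


if __name__ == "__main__":
    # The closed form holds whenever delta >= N.
    for N in range(3, 8):
        for delta in range(N, N + 10):
            assert 2 * count_monomials(delta, N) == twice_closed_form(delta, N)
    # The codimension grows linearly in delta1 + delta2 (here N = 5, g = 2).
    print([codim_single_copy(d, d, 5, 2) for d in (7, 8, 9)])
```

For N = 5 and g = 2 the successive codimensions differ by a constant amount, which is the linear growth O(δ1 + δ2) claimed for the codimension.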

The above considerations and Lemma 11.7.1 show that


\[ \operatorname{codim}\bigl(U^{m+1},\Gamma(C\times C,O(\delta_1,\delta_2))^{m+1}\bigr) = O(\delta_1+\delta_2); \]
therefore, we do not lose too many sections if we restrict our attention to $U^{m+1}\cap W$. More precisely, Lemma 11.7.1 shows that
\begin{align*}
\dim(U^{m+1}\cap W) &\ge \dim(W) - \operatorname{codim}\bigl(U^{m+1},\Gamma(C\times C,O(\delta_1,\delta_2))^{m+1}\bigr)\\
 &\ge d_1d_2 - gd^2 - O(d_1+d_2). \tag{11.15}
\end{align*}
Note also that $d = O(d_1+d_2)$ by (V3), whence $\delta_1+\delta_2 = O(d_1+d_2)$.
Let $(F_i)\in U^{m+1}\cap W$, that is
\[ y_0^d\,F_i(x_0,x_1,x_2;x_0',x_1',x_2') = y_i^d\,F_0(x_0,x_1,x_2;x_0',x_1',x_2'). \tag{11.16} \]
Let p be a presentation of the morphism $\phi_B$, hence
\[ p\in K[x_0,x_1,x_2;x_0',x_1',x_2']^{m+1}, \]
where all entries are bihomogeneous of the same bidegree $(d_1(p),d_2(p))$ with
\[ \deg_{x_2}p_i<N,\qquad \deg_{x_2'}p_i<N \]
and
\[ (y_0:\dots:y_m) = (p_0(x;x'):\dots:p_m(x;x')) \]
on C × C. Then $(d_1(p),d_2(p))$ is by definition the bidegree of this presentation. The height h(p) is the height of the vector of coefficients of all polynomials $p_i$.

We consider equation (11.16) as a linear system, where the unknowns are the coefficients $a_i=(a_{i\beta\beta'})$ of the polynomials
\[ F_i(x_0,x_1,x_2;x_0',x_1',x_2') = \sum_{\beta,\beta'} a_{i\beta\beta'}\,x^{\beta}x'^{\beta'}. \]
Our goal is to transform (11.16) into a linear system with coefficients in K. We may replace $y_0$ and $y_i$ by $p_0$ and $p_i$. Then, using the relation $f(x_0,x_1,x_2)=0$ to reduce the exponents in $x_2$, $x_2'$ to degree strictly less than N, we transform (11.16) into a linear system
\[ \sum_{\alpha,\alpha'}L_{0\alpha\alpha'}(a_i)\,x^{\alpha}x'^{\alpha'} = \sum_{\alpha,\alpha'}L_{i\alpha\alpha'}(a_0)\,x^{\alpha}x'^{\alpha'} \tag{11.17} \]
for i = 0, . . . , m, where the coefficients $L_{i\alpha\alpha'}(a_j)$ are linear forms with unknowns the vector $a_j$ of coefficients of the polynomial $F_j$. Here the indices $\alpha$, $\alpha'$ range over
\begin{gather*}
\alpha=(\alpha_0,\alpha_1,\alpha_2),\qquad \alpha'=(\alpha_0',\alpha_1',\alpha_2'),\\
|\alpha|=\delta_1+d\cdot d_1(p),\qquad |\alpha'|=\delta_2+d\cdot d_2(p),\\
\alpha_2<N,\qquad \alpha_2'<N.
\end{gather*}
It remains to compute a bound for the height of the linear forms $L_{i\alpha\alpha'}$. This is a routine procedure, which we have already used in Chapter 2 in analyzing the height of presentations, specifically in Lemma 2.5.6. More precisely, let q be the presentation
\[ q = \bigl(x_0^{\beta_0}x_1^{\beta_1}x_2^{\beta_2}\,x_0'^{\beta_0'}x_1'^{\beta_1'}x_2'^{\beta_2'}\bigr),\qquad |\beta|=\delta_1,\ |\beta'|=\delta_2,\ \beta_2<N,\ \beta_2'<N. \]
We consider the new presentation
\[ (p_i^d\,q_{\beta\beta'}),\qquad i=0,\dots,m \]
with $\beta$, $\beta'$ as above. Using the equations $f(x_0,x_1,x_2)=0$ and $f(x_0',x_1',x_2')=0$, we express the restriction to C × C of monomials appearing with exponents greater than or equal to N as linear combinations of monomials in $x_0,\dots,x_2'$ involving only $x_2$ and $x_2'$ with exponents strictly less than N. In this way we obtain the new presentation
\[ \Bigl(\sum_{\alpha,\alpha'}L_{i\alpha\alpha',\beta\beta'}\,x^{\alpha}x'^{\alpha'}\Bigr), \]
where the $L_{i\alpha\alpha',\beta\beta'}$ are the coefficients of the linear form $L_{i\alpha\alpha'}(a)$.
Now the same proof as in Lemma 2.5.6 for bigraded presentations of morphisms shows that the height of the linear forms $L_{i\alpha\alpha'}(a)$ is bounded by
\[ d\,h(p)+h(q)+o(d) = d\,h(p)+o(d). \]

Here we have used our special model and F ∈ U to get h(q) = 0, otherwise the upper bound would have been only $\ll d_1+d_2+d$, not sufficient for the applications we have in mind.
From (11.17) we have to solve the linear system
\[ L_{0\alpha\alpha'}(a_i) = L_{i\alpha\alpha'}(a_0) \]
with i = 0, . . . , m and $\alpha$, $\alpha'$ as above. The number of unknowns is
\[ (m+1)N^2\Bigl(\delta_1-\frac{N-3}{2}\Bigr)\Bigl(\delta_2-\frac{N-3}{2}\Bigr) \le (m+1)N^2\delta_1\delta_2. \]
By (11.15) on page 379 and assumption (V3), the dimension of the space of solutions is bounded below by
\[ d_1d_2-gd^2-O(d_1+d_2) \ge \gamma\,d_1d_2-O(d_1+d_2). \]
There is a constant $C_4>0$ such that this is bounded below by $\frac{\gamma}{2}d_1d_2$ for $d_1,d_2\ge C_4/\gamma$. Therefore, by Siegel’s lemma in Corollary 2.9.9 (even the simple version in Corollary 2.9.2 is enough if we replace the height h(p) by $\log\max_{\sigma,i}|\sigma(p_i)|$), there is a solution yielding an $F=(F_i)\in U^{m+1}\cap W$ with
\[ h(F) \le 2(m+1)N^2\,\frac{\delta_1\delta_2}{\gamma\,d_1d_2}\,\bigl(h(p)\,d+\log(\delta_1)+\log(\delta_2)+o(d)\bigr). \]
Since $d_1d_2>gd^2$, we easily get
\[ \frac{\delta_1\delta_2\,d}{d_1d_2}=O(d_1+d_2)\qquad\text{and}\qquad \frac{\delta_1\delta_2\log(\delta_i)}{d_1d_2}=O\bigl(\sqrt{d_1+d_2}\,\log(d_1+d_2)\bigr), \]
proving our claim.
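The two final estimates can be checked directly. The sketch below assumes the relation $N\delta_i=d_i+Md$ (the one used in the proof of Lemma 11.8.5, where $N\delta_i-Md=d_i$), together with $g\ge1$ and $d_1,d_2\ge1$; the implied constants then depend only on N, M and g.

```latex
\begin{align*}
N^2\delta_1\delta_2 &= (d_1+Md)(d_2+Md) = d_1d_2 + Md(d_1+d_2) + M^2d^2,\\
d^2 &< \frac{d_1d_2}{g}, \qquad d < \sqrt{d_1d_2} \le \frac{d_1+d_2}{2},\\
\frac{\delta_1\delta_2}{d_1d_2} &\le \frac{1}{N^2}\Bigl(1+\frac{M^2}{g} + M\,\frac{d(d_1+d_2)}{d_1d_2}\Bigr),\\
\frac{d(d_1+d_2)}{d_1d_2} &< \frac{d_1+d_2}{\sqrt{d_1d_2}}
   = \sqrt{\frac{d_1}{d_2}}+\sqrt{\frac{d_2}{d_1}} \le 2\sqrt{d_1+d_2},\\
\frac{d^2(d_1+d_2)}{d_1d_2} &< \frac{d_1+d_2}{g}.
\end{align*}
```

Hence $\delta_1\delta_2/(d_1d_2)=O(\sqrt{d_1+d_2})$, and multiplying by $\log\delta_i=O(\log(d_1+d_2))$ gives the second estimate. Multiplying the third line by d and using the second and fifth lines gives $\delta_1\delta_2\,d/(d_1d_2)=O(d_1+d_2)$.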

11.8. Application of Roth’s lemma

Let C be an irreducible projective smooth curve over the number field K and let P0 ∈ C(K). With the notation introduced in Section 11.2, let V be a Vojta divisor satisfying (V1) and (V2). Let $F_i(x,x')$, i = 0, . . . , m, denote bihomogeneous polynomials of bidegree $(\delta_1,\delta_2)$, describing a non-trivial global section s of O(V) as in Lemma 11.2.7, hence
\[ s = F_i(x,x')/y_i^d\,\big|_{C\times C} \]
for i = 0, . . . , m. We are looking for an upper bound of the admissible pair $(i_1^*,i_2^*)$ in the point (P, Q) ∈ (C × C)(K), defined in 11.6.2. The idea is to project down to $\mathbb{P}^1_K\times\mathbb{P}^1_K$ to get a bihomogeneous polynomial instead of s and then apply Roth’s lemma to that polynomial. In 11.8.2–11.8.5, we describe the push-down of $s\,y_i^d$ and show that it has properties similar to those of s. Lemma 11.8.6, the goal of this section, is the application of Roth’s lemma.

11.8.1. For the reader’s convenience, we make the following simplifications: By a change of coordinates, we may assume that the projection
\[ p:\mathbb{P}^n_K\longrightarrow\mathbb{P}^2_K,\qquad x\longmapsto(x_0:x_1:x_2) \]
is well defined on C and maps C birationally onto its image. Since C has degree N in $\mathbb{P}^n_K$, we may assume that p(C) is given by an irreducible polynomial
\[ f(x_0,x_1,x_2)=a_0+a_1x_2+\dots+a_{N-1}x_2^{N-1}+x_2^N, \]
where $a_i\in K[x_0,x_1]$ is homogeneous of degree N − i. Additionally, we may assume that none of the coordinates $x_0,\dots,x_n$ vanishes identically on C. We also consider the finite morphism $\pi:C\to\mathbb{P}^1_K$ given by $\pi(x)=(x_0:x_1)$, of degree N. For details, we refer to A.11.5–A.11.7.
Since p is well defined on C and f vanishes at $(x_0(P):x_1(P):x_2(P))$, either $x_0(P)\ne0$ or $x_1(P)\ne0$. Without loss of generality, we may assume $x_0(P)\ne0$. On $\{x\in C\mid x_0\ne0\}$, we use the affine coordinates $\xi_1,\dots,\xi_n$, where
\[ \xi_j := x_j/x_0\,|_C,\qquad j=1,\dots,n. \]
Since $f(x_0,x_1,x_2)$ is a monic polynomial in $x_2$, we see that $\xi_2$ is integral over $K[\xi_1]$. Similarly, we may assume that $\xi_j$ is integral over $K[\xi_1]$ for j ≥ 3. For simplicity, we may also assume $x_0(Q)\ne0$. The affine coordinates on $\mathbb{P}^n_K\times\mathbb{P}^n_K$ are denoted by $(\xi,\xi')$.

In order to understand the following, it is convenient for the reader to be familiar with the intersection theory of divisors, as in Section A.9.
Lemma 11.8.2. There is a bihomogeneous polynomial $G_i\in K[x_0,x_1,x_0',x_1']$ of bidegree $(N^2\delta_1,N^2\delta_2)$ such that
\[ (\pi\times\pi)_*\operatorname{div}(F_i|_{C\times C})=\operatorname{div}(G_i). \]
Here π × π denotes the natural morphism $C\times C\to\mathbb{P}^1_K\times\mathbb{P}^1_K$ induced by π.
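Proof: Since π × π is finite, $(\pi\times\pi)_*\operatorname{div}(F_i|_{C\times C})$ is an effective divisor on $\mathbb{P}^1_K\times\mathbb{P}^1_K$, and is not 0. Recall that $\operatorname{Pic}(\mathbb{P}^1_K\times\mathbb{P}^1_K)\cong\mathbb{Z}\times\mathbb{Z}$, with elements represented by O(m, n) (cf. A.9.28). Therefore, we have a bihomogeneous polynomial $G_i\in K[x_0,x_1,x_0',x_1']$ such that $\operatorname{div}(G_i)$ is equal to that divisor (see Example A.6.13). Denoting the bidegree of $G_i$ by $(k_1,k_2)$ and by $P_1$ a point of $\mathbb{P}^1_K(K)$, we have a rational equivalence
\[ \operatorname{div}(G_i)\sim k_1\,\{P_1\}\times\mathbb{P}^1_K+k_2\,\mathbb{P}^1_K\times\{P_1\}. \]
Using Lemma 11.2.1 for $\mathbb{P}^1_K\times\mathbb{P}^1_K$, we have
\[ k_2=\operatorname{div}(G_i)\cdot(\{P_1\}\times\mathbb{P}^1_K). \]
By the projection formula, the right-hand side equals
\[ \operatorname{div}(F_i|_{C\times C})\cdot(H\times\mathbb{P}^n_K)|_{C\times C}, \]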

where H denotes a hyperplane in $\mathbb{P}^n_K$. Now from (11.1) on page 357 it follows that the divisor $\operatorname{div}(F_i|_{C\times C})$ is rationally equivalent to $\delta_1N\{P_0\}\times C+\delta_2N\,C\times\{P_0\}$, while $N\{P_0\}\times C$ is in the class of $H\times\mathbb{P}^n_K|_{C\times C}$. Using Lemma 11.2.1 for C × C, we get $k_2=N^2\delta_2$. Similarly, we have $k_1=N^2\delta_1$.
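Remark 11.8.3. Let Norm denote the norm with respect to the field extension $K(C\times C)/K(\mathbb{P}^1_K\times\mathbb{P}^1_K)$. Then we may choose
\[ G_i(\xi_1,\xi_1') = \operatorname{Norm}(F_i(\xi,\xi')). \tag{11.18} \]
In order to see this, note that
\begin{align*}
\operatorname{div}(\operatorname{Norm}(F_i(\xi,\xi'))) &= (\pi\times\pi)_*\operatorname{div}(F_i(\xi,\xi'))\\
 &= \operatorname{div}(G_i(x,x')) - (\pi\times\pi)_*\operatorname{div}\bigl(x_0^{\delta_1}x_0'^{\delta_2}|_{C\times C}\bigr)
\end{align*}
by A.9.11. Since π has degree N and $\operatorname{div}(x_0|_C)=\pi^*([\infty])$ on C, the projection formula for proper intersections ([125], Prop.2.3) yields $\pi_*(\operatorname{div}(x_0|_C))=N[\infty]$. This is equal to $\operatorname{div}(x_0^N)$ on $\mathbb{P}^1_K$ and a similar identity holds for $x_0'$. This easily proves (11.18).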
Lemma 11.8.4.
\[ h(G_i)\le N^2h(F_i)+O(\delta_1+\delta_2). \]
Proof: Using 11.8.1, it is clear that $1,\xi_2,\dots,\xi_2^{N-1}$ form a basis of K(C) over $K(\xi_1)$. There are $a_{jk}\in K(\xi_1)$, k = 0, . . . , N − 1, such that
\[ \xi_j=\sum_{k=0}^{N-1}a_{jk}\,\xi_2^k,\qquad j=3,\dots,n, \]
and a similar relation holds for $\xi_j'$.
There are polynomials $b(x_0,x_1)$, $b_{jk}(x_0,x_1)\in K[x_0,x_1]$, all homogeneous of the same degree $N'$, such that $a_{jk}=b_{jk}/b$; then we have a presentation p of the closed embedding $C\to\mathbb{P}^n_K$ given by
\[ p_j(x_0,x_1,x_2)=\begin{cases} b\,x_0^{N-1} & \text{if } j=0,\\[2pt] b\,x_0^{N-2}x_j & \text{if } j=1,2,\\[2pt] \displaystyle\sum_{k=0}^{N-1}b_{jk}\,x_0^{N-k-1}x_2^k & \text{if } j\ge3.\end{cases} \]
The relations
\[ \frac{p_j}{p_0}=\frac{x_j}{x_0}\Big|_{C\times C} \]
are obvious, hence p is indeed a presentation of degree $N''=N'+N-1$. By Lemma 2.5.6, any monomial in $\xi_1,\dots,\xi_n$ of degree ≤ δ1 has the form
\[ \frac{1}{b(\xi_1)^{\delta_1}}\sum_{\substack{k_1+k_2\le N''\delta_1\\ k_2<N}}c_{k_1k_2}\,\xi_1^{k_1}\xi_2^{k_2} \]

with $c_{k_1k_2}\in K$ and $h(c)\le\delta_1h(p)+O(\delta_1)=O(\delta_1)$. We get a similar expression for a monomial in $\xi_1',\dots,\xi_n'$ of degree ≤ δ2. By Proposition 1.6.2, any monomial in $\xi$, $\xi'$ of bidegree ≤ (δ1, δ2) has the form
\[ \frac{1}{b(\xi_1)^{\delta_1}\,b(\xi_1')^{\delta_2}}\sum_{\substack{k_1+k_2\le N''\delta_1\\ k_2<N}}\ \sum_{\substack{k_1'+k_2'\le N''\delta_2\\ k_2'<N}}c_{k_1k_2k_1'k_2'}\,\xi_1^{k_1}\xi_1'^{k_1'}\xi_2^{k_2}\xi_2'^{k_2'} \tag{11.19} \]
with $h(c)\le O(\delta_1+\delta_2)$. Using the very definition of the height of a polynomial, it becomes clear that $F_i(\xi,\xi')$ may be written in the form (11.19) with
\[ h(c)\le h(F_i)+O(\delta_1+\delta_2). \tag{11.20} \]
Similar considerations also apply to the polynomials $\xi_2^{k}\xi_2'^{k'}F_i(\xi,\xi')$ instead of $F_i(\xi,\xi')$; we get expressions as in (11.19) with bounds as in (11.20).

The computation of the norm is done using the basis $\xi_2^{k}\xi_2'^{k'}$, $0\le k,k'<N$, of K(C × C) over $K(\xi_1,\xi_1')$. Since we may assume $G_i(\xi_1,\xi_1')=\operatorname{Norm}(F_i(\xi,\xi'))$ (cf. Remark 11.8.3), it is the determinant of an $N^2\times N^2$ matrix A with entries
\[ A_{\mu\nu}(\xi_1,\xi_1')\in K(\xi_1,\xi_1') \]
whose numerators $B_{\mu\nu}\in K[\xi_1,\xi_1']$ have degree of order $O(\delta_1+\delta_2)$ and height bounded by $h(F_i)+O(\delta_1+\delta_2)$. So far, we have given the argument for the height, but if we follow the arguments carefully then we see that
\[ |B_{\mu\nu}|_v\le C_v^{\delta_1+\delta_2}\,|F_i|_v \]
and $C_v=1$ for all but finitely many $v\in M_K$. Therefore, there are $B_1,B_2\in K[\xi_1,\xi_1']$ of degree $O(\delta_1+\delta_2)$ such that
\[ G_i=B_1/B_2 \]
and, with a new $C_v$ still such that $C_v=1$ for all but finitely many $v\in M_K$, we have
\[ |B_1|_v\le C_v^{\delta_1+\delta_2}\,|F_i|_v^{N^2} \]
for all $v\in M_K$. This shows
\[ h(B_1)\le N^2h(F_i)+O(\delta_1+\delta_2). \]
Then appealing to Theorem 1.6.13 we get
\[ h(G_i)\le N^2h(F_i)+O(\delta_1+\delta_2). \]
This proves the claim.
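The description of the norm as a determinant can be illustrated in a toy setting. The following is only a sketch under simplifying assumptions: the base field is Q rather than K(ξ1, ξ1′), there is a single generator with monic minimal polynomial f, and the basis is 1, x, . . . , x^(N−1). The function names are illustrative and not taken from the text.

```python
from fractions import Fraction


def mult_matrix(elem, f):
    """Matrix of multiplication by `elem` on Q[x]/(f) in the basis 1, x, ..., x^(N-1).

    `f` lists c_0, ..., c_{N-1} for the monic polynomial
    x^N + c_{N-1} x^(N-1) + ... + c_0; `elem` lists its N coefficients.
    """
    n = len(f)
    images = []
    cur = [Fraction(c) for c in elem]            # elem * x^0
    for _ in range(n):
        images.append(cur)
        top = cur[-1]                             # coefficient of x^(N-1)
        shifted = [Fraction(0)] + cur[:-1]        # multiply by x
        # x^N is replaced by -(c_0 + c_1 x + ... + c_{N-1} x^(N-1)).
        cur = [s - top * Fraction(c) for s, c in zip(shifted, f)]
    # Column k of the matrix holds the coordinates of elem * x^k.
    return [[images[k][r] for k in range(n)] for r in range(n)]


def determinant(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    n, det = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return det


def norm(elem, f):
    """Norm of `elem` from Q[x]/(f) down to Q, as a determinant."""
    return determinant(mult_matrix(elem, f))


if __name__ == "__main__":
    print(norm([3, 1], [-2, 0]))        # norm of 3 + sqrt(2)
    print(norm([0, 1, 0], [-2, 0, 0]))  # norm of the cube root of 2
```

In the proof above the same construction is applied with the $N^2$ basis elements $\xi_2^{k}\xi_2'^{k'}$, which is why the entries of F_i enter the determinant with total degree $N^2$ and the height bound carries the factor $N^2$.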
Lemma 11.8.5. There is a bihomogeneous polynomial $E(x_0,x_1,x_0',x_1')$, of bidegree $(Nd_1,Nd_2)$, with the following properties:

(a) $(\pi\times\pi)_*(\operatorname{div}(s))=\operatorname{div}(E)$.

(b) If $(j_1^*,j_2^*)$ is an admissible pair of E in (π(P), π(Q)), then there is an admissible pair $(i_1^*,i_2^*)$ of s in (P, Q) such that $i_1^*\le j_1^*$, $i_2^*\le j_2^*$.

(c) $h(E)\le N^2h(F)+O(d_1+d_2+d)$.


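Proof: Similarly as in the proof of Lemma 11.8.2, there are $k_1$, $k_2$ with
\[ (\pi\times\pi)_*(\operatorname{div}(y_i|_{C\times C}))\sim k_1(\{P_1\}\times\mathbb{P}^1_K)+k_2(\mathbb{P}^1_K\times\{P_1\}). \]
Note that
\[ \operatorname{div}(y_i|_{C\times C})\sim B=M(\{P_0\}\times C+C\times\{P_0\})-\Delta \]
by definition of $\phi_B$. Similarly as in the proof of Lemma 11.8.2, we get $k_1=k_2=NM$. Using
\[ \operatorname{div}(s)=\operatorname{div}(F_i|_{C\times C})-d\,\operatorname{div}(y_i|_{C\times C}) \]
and Lemma 11.8.2, we see that
\begin{align*}
(\pi\times\pi)_*\operatorname{div}(s) &\sim N(N\delta_1-Md)(\{P_1\}\times\mathbb{P}^1_K)+N(N\delta_2-Md)(\mathbb{P}^1_K\times\{P_1\})\\
 &\sim Nd_1(\{P_1\}\times\mathbb{P}^1_K)+Nd_2(\mathbb{P}^1_K\times\{P_1\}).
\end{align*}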
Hence we have a global section of $O_{\mathbb{P}^1\times\mathbb{P}^1}(Nd_1,Nd_2)$ with divisor equal to the left-hand side. This global section may be viewed as a bihomogeneous polynomial $E(x_0,x_1,x_0',x_1')$ of bidegree $(Nd_1,Nd_2)$, proving (a) (see Example A.6.13).
It is clear that there is a hyperplane $\ell(y)=0$ of $\mathbb{P}^m_K$ disjoint from the fibre $(\pi\times\pi)^{-1}(\pi(P),\pi(Q))$. As in Lemma 11.2.7, there is a bihomogeneous polynomial $F(x,x')$ of bidegree $(\delta_1,\delta_2)$ with
\[ s=F(x,x')/\ell(y)^d\,\big|_{C\times C}. \]
Again as in Lemma 11.8.2 and Remark 11.8.3, there is a bihomogeneous polynomial $G(x_0,x_1,x_0',x_1')$ of bidegree $(N^2\delta_1,N^2\delta_2)$ with
\[ (\pi\times\pi)_*\operatorname{div}(F|_{C\times C})=\operatorname{div}(G). \]
From the proof of (a), we get a bihomogeneous polynomial $H(x_0,x_1,x_0',x_1')$ of bidegree (NM, NM) such that
\[ (\pi\times\pi)_*\operatorname{div}(\ell(y)|_{C\times C})=\operatorname{div}(H). \]
We may assume that
\[ EH^d=G. \]
Since the hyperplane $\{\ell(y)=0\}$ does not meet the fibre over (π(P), π(Q)), we conclude that H does not vanish at (π(P), π(Q)). By Leibniz’s rule, $(j_1^*,j_2^*)$ is an admissible pair for G in (π(P), π(Q)). As in Remark 11.8.3, we may assume
\[ G(\xi_1,\xi_1')=\operatorname{Norm}(F(\xi,\xi')). \]
Integral elements form a ring and so $F(\xi,\xi')$ is integral over $K[\xi_1,\xi_1']$ using 11.8.1. By passing to the minimal polynomial of $F(\xi,\xi')$, we find a polynomial $a\in K[\xi,\xi']$ such that
\[ a(\xi,\xi')F(\xi,\xi')=G(\xi,\xi'). \]

This follows from the fact that the last coefficient of the minimal polynomial equals the norm up to a sign. By Leibniz’s rule, we see that
\[ \partial_{i_1}\partial_{i_2}F(P,Q)\ne0 \]
for some $i_1\le j_1^*$, $i_2\le j_2^*$. Since $\ell(y)$ does not vanish at (P, Q), an admissible pair for F in (P, Q) is the same as an admissible pair for s in (P, Q). This proves (b).
Let $H_i$ be the bihomogeneous polynomial of bidegree (NM, NM) such that
\[ (\pi\times\pi)_*\operatorname{div}(y_i|_{C\times C})=\operatorname{div}(H_i). \]
Then we may assume
\[ EH_i^d=G_i \]
as above. By Theorem 1.6.13, we have
\[ h(H_i^d)=O(d) \]
and
\[ h(E)+h(H_i^d)=h(G_i)+O(d_1+d_2+d). \]
Together with Lemma 11.8.4, we conclude
\[ h(E)\le N^2h(F_i)+O(d_1+d_2+d), \]
proving (c).
Lemma 11.8.6. There is a constant $C_6>0$, independent of $d_1$, $d_2$, d, and γ, such that for $0<\varepsilon\le1/\sqrt2$, for any Vojta divisor satisfying (V1), (V2), (V3), with
\[ d_2\ge C_4/\gamma,\qquad d_2/d_1\le\varepsilon^2 \]
and for any P, Q ∈ C(K) with
\[ \min\bigl(d_1h_{N[P_0]}(P),\,d_2h_{N[P_0]}(Q)\bigr)\ge C_6\,\frac{d_1}{\gamma\varepsilon^2} \tag{11.21} \]
there exists a global section s of O(V) with an admissible pair $(i_1^*,i_2^*)$ in (P, Q) such that
\[ h(F)\le C_5\,\frac{d_1+d_2}{\gamma},\qquad \frac{i_1^*}{d_1}+\frac{i_2^*}{d_2}\le4N\varepsilon. \]
Proof: By Lemma 11.7.3, there is a non-zero global section s of O(V) with
\[ h(F)\le C_5\,\frac{d_1+d_2}{\gamma}. \]
We apply 11.8.1–11.8.5 to this s. The goal is to use Roth’s lemma in 6.3.7 for the polynomial $E(\xi_1,\xi_1')$. By Lemma 11.8.5, the partial degrees of E are bounded by $r_1:=Nd_1$ and $r_2:=Nd_2$, respectively, and
\[ h(E)+8r_1\ll\frac{d_1}{\gamma} \tag{11.22} \]
(using $d_1\ge d_2$). On the other hand, we have
\[ \min\{r_1h(\pi(P)),\,r_2h(\pi(Q))\}=N\min\{d_1h_{N[P_0]}(P),\,d_2h_{N[P_0]}(Q)\}+O(d_1) \tag{11.23} \]
using the elementary fact
\[ h_{N[P_0]}=h\circ\pi+O(1) \]
following from 11.2.3 and Theorem 2.3.8. Using (11.22) and (11.23), we easily find a constant $C_6>0$ such that (11.21) implies
\[ \varepsilon^{-2}\,(h(E)+8r_1)\le\min\{r_1h(\pi(P)),\,r_2h(\pi(Q))\}. \]
Hence the assumptions in Roth’s lemma are satisfied and we get an admissible pair $(j_1^*,j_2^*)$ for E in (π(P), π(Q)) with
\[ \frac{j_1^*}{r_1}+\frac{j_2^*}{r_2}\le4\varepsilon. \]
Now the claim follows from Lemma 11.8.5 (b).
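To make the last step explicit: Lemma 11.8.5 (b) gives an admissible pair of s with $i_1^*\le j_1^*$ and $i_2^*\le j_2^*$, and $r_k=Nd_k$ for k = 1, 2, so the bound from Roth’s lemma transfers as follows.

```latex
\[
\frac{i_1^*}{d_1}+\frac{i_2^*}{d_2}
 \;\le\; \frac{j_1^*}{d_1}+\frac{j_2^*}{d_2}
 \;=\; N\Bigl(\frac{j_1^*}{r_1}+\frac{j_2^*}{r_2}\Bigr)
 \;\le\; 4N\varepsilon .
\]
```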

11.9. Proof of Faltings’s theorem

Let C be an irreducible projective smooth curve of genus g ≥ 2, defined over a number field K, with a point P0 defined over K. The next result is Vojta’s theorem.
Theorem 11.9.1. There are constants C7, C8, depending only on C and P0, with the following property: Let P, Q ∈ C(K) and z = j(P), w = j(Q). Then one of
\[ |z|\le C_7,\qquad |w|\le C_8|z|,\qquad \langle z,w\rangle\le\tfrac34\,|z|\,|w| \]
holds.
Remark 11.9.2. The constant $\frac34$ has no special significance and can be replaced by any constant in $(\frac{1}{\sqrt g},1]$; what matters here is that it is strictly less than 1. This follows from the proof.
Proof: We consider P, Q ∈ C(K) with |z| ≥ C7, |w| ≥ C8 |z| for large constants C7, C8 to be determined later. Since the set Z defined in 11.6.6 is finite and effectively determinable, we may assume that the constants C7 and C8 are so large that P, Q ∉ Z. Suppose that V is a Vojta divisor satisfying (V1), (V2), (V3), and $d_1,d_2\ge C_4/\gamma$ (cf. 11.2.6 and 11.7.2). Then Proposition 11.3.1, Lemma 11.6.5, Lemma 11.6.7 and Lemma 11.7.3 show that, for a positive constant C9 depending only on C and P0, we have
\begin{align*}
-C_9\cdot\Bigl(\frac{d_1+d_2}{\gamma}&+i_1^*|z|^2+i_2^*|w|^2+i_1^*+i_2^*\Bigr)\\
 &\le\frac{d_1}{2g}|z|^2+\frac{d_2}{2g}|w|^2-d\,\langle z,w\rangle+O(d_1|z|+d_2|w|+d_1+d_2).
\end{align*}


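Now we also assume that there is an ε, $0<\varepsilon\le1/\sqrt2$, with
\[ d_2/d_1\le\varepsilon^2 \tag{11.24} \]
and
\[ \min\bigl(d_1h_{N[P_0]}(P),\,d_2h_{N[P_0]}(Q)\bigr)\ge C_6\,\frac{d_1}{\gamma\varepsilon^2} \tag{11.25} \]
as in Lemma 11.8.6. Applying this lemma, we get
\begin{align*}
-C_9\cdot\Bigl(2\,\frac{d_1}{\gamma}&+4N\varepsilon d_1|z|^2+4N\varepsilon d_2|w|^2\Bigr)\\
 &\le\frac{d_1}{2g}|z|^2+\frac{d_2}{2g}|w|^2-d\,\langle z,w\rangle+O(d_1|z|+d_2|w|+d_1). \tag{11.26}
\end{align*}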
For a small positive number γ0 < 1 and D ∈ N, we choose
\[ d_1=\sqrt{g+\gamma_0}\,\frac{D}{|z|^2}+O(1),\qquad d_2=\sqrt{g+\gamma_0}\,\frac{D}{|w|^2}+O(1) \]
and
\[ d=\frac{D}{|z|\,|w|}+O(1). \]
The O(1) terms are for small adjustments so that $d_1$, $d_2$, d, $\delta_1$, $\delta_2$ are all non-zero natural numbers. This is a choice which makes (11.26) relatively sharp and fulfills (V1), (V2), (V3), for D sufficiently large as a function of |z|, |w|, γ0. It is immaterial here how this notion of D being large depends on |z|, |w|, or γ0, since in the end we shall let D → ∞. Note that we have
\[ d_1d_2-gd^2\ge\gamma\,d_1d_2 \]
for
\[ \gamma=\frac{\gamma_0}{g+\gamma_0}+o(1), \]
where the term implicit in o(1) tends to 0 as D → ∞. Using
\[ \frac{d_2}{d_1}=\frac{|z|^2}{|w|^2}+o(1), \]
condition (11.24) becomes
\[ \frac{|z|}{|w|}\le\varepsilon+o(1). \tag{11.27} \]
As remarked in the proof of Lemma 11.6.7, we have
\[ h_{N[P_0]}(P)=\frac{N}{2g}|z|^2+O(|z|)+O(1), \]
with a similar equation for Q and |w|. Thus the condition
\[ |z|\ge C_{10}/(\varepsilon\sqrt{\gamma}), \tag{11.28} \]
with C10 ≥ 1 a positive constant depending on C and P0, implies (11.25) for sufficiently large D.
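We substitute the values for $d_1$, $d_2$, d, γ in (11.26) to derive
\[ -C_{11}\cdot\Bigl(\frac{1}{\gamma_0}\,\frac{1}{|z|^2}+\varepsilon\Bigr)D\le\frac{\sqrt{g+\gamma_0}}{g}\,D-\frac{\langle z,w\rangle}{|z|\,|w|}\,D+O\Bigl(\frac{1}{|z|}+\frac{1}{|w|}\Bigr)D+o(D), \]
for a certain constant C11 depending only on C and P0. Assuming (11.28), we divide by D, let D tend to ∞, simplify, and find after rearranging terms
\[ \frac{\langle z,w\rangle}{|z|\,|w|}-\frac{\sqrt{g+\gamma_0}}{g}\le C_{12}\,\varepsilon \tag{11.29} \]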
with C12 depending only on C, P0. We still need conditions (11.28) and |z| < ε |w|, the limit of (11.27) for D → ∞. To this end, we choose first γ0 so small that
\[ \frac34-\frac{\sqrt{g+\gamma_0}}{g}>0, \]
and then ε so small that
\[ \frac34-\frac{\sqrt{g+\gamma_0}}{g}>C_{12}\,\varepsilon. \tag{11.30} \]
Here we have used $\frac34>\frac{\sqrt2}{2}$ and g ≥ 2. Let
\[ C_7>C_{10}\Big/\Bigl(\varepsilon\sqrt{\frac{\gamma_0}{g+\gamma_0}}\Bigr) \]
and
\[ C_8>\frac{1}{\varepsilon}. \]
For P, Q ∈ C(K) satisfying
\[ |z|\ge C_7,\qquad |w|\ge C_8|z|, \]
(11.27) and (11.28) are both satisfied and inequality (11.29) holds. Finally, because of (11.30) and C10 ≥ 1, we see that (11.29) implies
\[ \frac{\langle z,w\rangle}{|z|\,|w|}\le\frac34, \]
proving the theorem.
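The choice of parameters can be sanity-checked numerically. The sketch below is illustrative only: it uses the leading terms of $d_1$, $d_2$, d (the O(1) integrality adjustments are ignored, so the values are real numbers rather than natural numbers), and the sample values of g, γ0, |z|, |w|, D are arbitrary assumptions, not values from the text.

```python
import math


def vojta_parameters(g, gamma0, z_norm, w_norm, D):
    """Leading terms of d1, d2, d for the choice made in the proof."""
    root = math.sqrt(g + gamma0)
    d1 = root * D / z_norm ** 2
    d2 = root * D / w_norm ** 2
    d = D / (z_norm * w_norm)
    return d1, d2, d


def gamma_from_parameters(g, d1, d2, d):
    """Largest gamma with d1*d2 - g*d^2 >= gamma*d1*d2."""
    return (d1 * d2 - g * d * d) / (d1 * d2)


def cosine_margin(g, gamma0):
    """3/4 - sqrt(g + gamma0)/g; it must be positive before epsilon is chosen."""
    return 0.75 - math.sqrt(g + gamma0) / g


if __name__ == "__main__":
    g, gamma0 = 2, 0.1
    d1, d2, d = vojta_parameters(g, gamma0, z_norm=10.0, w_norm=50.0, D=1e6)
    print(gamma_from_parameters(g, d1, d2, d), gamma0 / (g + gamma0))
    print(d2 / d1, (10.0 / 50.0) ** 2)
    print(cosine_margin(g, gamma0))
```

With these leading terms, γ equals γ0/(g + γ0) and d2/d1 equals |z|²/|w|², matching the two displayed relations; for g = 2 the margin in (11.30) is positive only when γ0 < 1/4.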
Now we are ready to prove Faltings’s theorem (see Theorem 11.1.1).
Proof: We may assume that C has a base point P0 ∈ C(K). From Theorem
9.3.5 and the Mordell–Weil theorem in 10.6.1, we know that J(K) ⊗Z R is a
finite-dimensional euclidean space. By Lemma 5.2.19, we may cover it by finitely

many cones T centered at 0 with angle α/2 from the axis to the ending, where $\cos\alpha>\frac34$. Let C7, C8 be the constants in Theorem 11.9.1.
By Proposition 9.4.5 (Mumford’s gap principle with ε ≤ 2g cos α − 3), there is a constant C13, depending only on C and P0, such that for any pair of distinct points P, Q in the same cone T, with C13 ≤ |j(P)| ≤ |j(Q)|, we have
\[ |j(Q)|\ge2\,|j(P)|. \tag{11.31} \]

Let C14 = max(C7 , C13 ). The set of K -rational points in the ball with center 0
and radius C14 is finite by Northcott’s theorem in 2.4.9, so it remains to see that

S := T ∩ {P ∈ C(K) | |j(P )| > C14 }

is finite. Suppose P0 , P1 , . . . , Pk are different points of S such that

|j(Pi )| ≤ |j(Pi+1 )|, i = 0, . . . , k − 1.

By (11.31), we have
\[ |j(P_{i+1})|\ge2\,|j(P_i)|, \]
yielding
\[ |j(P_k)|\ge2^k\,|j(P_0)|. \]
On the other hand, Theorem 11.9.1 shows that
\[ |j(P_k)|\le C_8\,|j(P_0)|. \]
This proves
\[ k\le\log(C_8)/\log2 \]
and the theorem.
11.9.3. In much the same way as in Section 9.4, Lemma 5.2.19 shows easily that the number of cones may be bounded by $7^r$, where r is the rank of the Mordell–Weil group over K of the Jacobian of C. Hence C(K) is the union of two finite sets, namely the set of small points P with h(P) ≤ C14, and the set of large points P with h(P) > C14. The former set is finite by Northcott’s theorem, and the latter contains not more than $(\log(C_8)/\log2+1)\,7^r$ elements. The constants C14 and C8 are effectively computable. As mentioned in the introduction to this chapter, we can dispense with the choice of a K-rational point P0. It turns out that we can take $C_8=C_{15}\,g^{C_{16}}$ and $C_{14}=C_{17}\,g^{C_{18}}(h(C)+1)$ for suitable absolute effectively computable constants $C_{15},\dots,C_{18}$; here h(C) is the height of a presentation of C by means of (for example) a bicanonical closed embedding. A sketch of the additional arguments needed can be found in [33]. In any case, it is noteworthy that the bound for the height of small solutions is independent of the field K and is linear in h(C) + 1, and that the dependence on the field K shows up only through the rank of the Mordell–Weil group.
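The counting bound for large points is simple enough to evaluate mechanically. The following is a minimal sketch; the numerical values of C8 and r fed to it are hypothetical placeholders, since the text only asserts that C8 is effectively computable.

```python
import math


def large_point_bound(c8: float, rank: int) -> int:
    """Evaluate (floor(log(C8)/log 2) + 1) * 7^rank.

    In each cone a chain P_0, ..., P_k of large points satisfies
    k <= log(C8)/log 2, so a cone holds at most floor(log2(C8)) + 1 of them;
    the number of cones is bounded by 7^rank.
    """
    if c8 < 1:
        raise ValueError("C8 is assumed to be at least 1")
    per_cone = math.floor(math.log2(c8)) + 1
    return per_cone * 7 ** rank


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(large_point_bound(10.0, 1))
```

Taking the integer part is harmless because the number of points in a cone is an integer; the bound stated in the text, without the floor, is at least as large.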

11.10. Some further developments

Inspired by Vojta’s proof, G. Faltings [114], [115] proved the following generalization of the Mordell conjecture, now called Faltings’s big theorem:
Theorem 11.10.1. Let A be an abelian variety over a number field K , let Γ = A(K)
and let X be a geometrically irreducible closed subvariety of A , which is not a translate
of an abelian subvariety over K . Then X ∩ Γ is not Zariski dense in X .
We will not prove this theorem here, since the proof is better understood in the language of arithmetic geometry, and we refer instead to the original papers by Faltings [114], [115], as well as to B. Edixhoven and J.-H. Evertse [96] or P. Vojta [313]. We state two applications of
this theorem, with only a sketch of proofs. For the properties of curves needed to understand
the arguments, we refer to [13]. The first result is due to J. Harris and J.H. Silverman [147],
Th.2, Cor.3.
11.10.2. Recall that a smooth geometrically irreducible curve C of genus g ≥ 2 , defined
over a number field K , is hyperelliptic if it is a double cover of P1L and bi-elliptic if it is a
double cover of an elliptic curve over some finite extension L/K . The hyperelliptic cover
is indeed defined over K and unique up to GL(2, K) (see [148], Prop.IV.5.3). If g ≥ 6 ,
the bi-elliptic cover is unique up to translation ([13] Ch.VIII, C-2, p.366) and also defined
over K ([153], Lemma 5).
Theorem 11.10.3. Let C be a smooth geometrically irreducible curve over a number field
K , of genus g ≥ 3 . Let D(K, 2) be the set of effective divisors of degree 2 on C , defined
over K . Then:

(a) if C as above is not hyperelliptic or bi-elliptic, then D(K, 2) is a finite set;

(b) if C is hyperelliptic and the genus is at least 9, then D(K, 2) consists of a finite set together with all divisors $[Q]+[Q']=f^*([P])$ with $P\in\mathbb{P}^1(K)$ and $f:C\to\mathbb{P}^1$ the hyperelliptic morphism of degree 2;

(c) if C is bi-elliptic and the genus is at least 9, then D(K, 2) consists of a finite set together with all divisors $[Q]+[Q']=f^*([P])$ with $P\in E(K)$ and $f:C\to E$ the bi-elliptic morphism of degree 2.
Outline of proof: The proof is based on the following observations.
We may assume that C has a K -rational point Q0 , otherwise we perform a quadratic base
change, which is harmless for our statements.
Effective divisors of degree 2 are parametrized by a smooth surface $C^{(2)}$, the symmetric square of C, and points of $C^{(2)}(K)$ are in one-to-one correspondence with effective divisors of degree 2, rational over K. Now the map $Q\mapsto\operatorname{cl}([Q]-[Q_0])$, extended by linearity to divisors, yields a morphism $j^{(2)}:C^{(2)}\to A$ of $C^{(2)}$ into an abelian variety A of dimension g, namely the Jacobian of C (see 8.10.3 and Remark 8.10.12). Let X be the image of $C^{(2)}$ in A and denote by $\phi:C^{(2)}\to X$ the restriction of $j^{(2)}$ to X. Since Q0 is defined over K, the image of a rational point of $C^{(2)}$ is a rational point of X. Therefore, applying Faltings’s big theorem, we infer that the image of such a point lies in the union of a finite set and finitely many translates of elliptic curves. Indeed, Remark 8.10.12

implies that X cannot be a translate of an abelian surface. Thus it remains to describe the set $\phi^{-1}(X(K))$.
If we have two distinct divisors $[Q_1]+[Q_2]$ and $[Q_1']+[Q_2']$ with the same image $x_0$ under φ, then $H^0(C,O([Q_1]+[Q_2]))$ is two-dimensional and the quotient of independent global sections leads to a hyperelliptic structure on C, which is unique ([148], Prop.IV.5.3). We conclude that $|\phi^{-1}(x_0)|\ge2$ implies that $\phi^{-1}(x_0)$ is a curve of genus 0. Conversely, if there is a hyperelliptic morphism $f:C\to\mathbb{P}^1$, then $x_0:=\phi(f^*([P]))$ remains constant as P varies on $\mathbb{P}^1$, hence $\phi^{-1}(x_0)$ is a curve of genus 0 in $C^{(2)}$. Thus φ is one-to-one, except at the point $x_0$ if C is hyperelliptic.
Note that, if g ≥ 4, the curve C cannot be both hyperelliptic and bi-elliptic ([13], Ch.VIII, C-2, p.366).∗ Hence, if the curve C is bi-elliptic, the above shows that the image Y of E under the map $P\mapsto\phi(f^*([P]))$ is a curve. By Corollary 8.2.9, it is a translate of an elliptic curve in A. Conversely, we can show that, if C has genus g ≥ 9, any translate Y = E + b ⊂ X of an elliptic curve determines a bi-elliptic structure C → Y ([147], proof of Th.2 (b)). Moreover, since g ≥ 6, this bi-elliptic structure is unique and defined over K. This yields (b) and (c). For statement (a), we prove that, if X contains a curve of genus 1, then the curve C is either hyperelliptic or bi-elliptic ([147], Th.2 (a)).

11.10.4. In our second application of Faltings’s big theorem, we consider a variety with
many rational points. The study of such varieties is an important branch of diophantine
geometry, which could not be treated in this book. For a survey of the theory, the reader is
referred to E. Peyre [234].
For a projective variety X over a number field K and a fixed height function $h_L$ with respect to an ample line bundle L, we consider the multiplicative height $H_L:=e^{h_L}$ and
\[ N_L(X,T) := \#\{P\in X(K)\mid H_L(P)\le T\} \]
for T ≥ 0. Sometimes X is replaced by a subset and we are interested in the asymptotics of this counting function for T → ∞.

11.10.5. The simplest example of a variety with many rational points is $\mathbb{P}^{n-1}_K$ for n ≥ 2. Let $N(\mathbb{P}^{n-1}_K,T)$ be the counting function of points of bounded height with respect to the standard height. The case K = Q was considered by Dedekind and Weber, and extended to the counting of rational points on Grassmannians by W.M. Schmidt [265]. For general K, let d be the degree, r (resp. s) the number of real (resp. complex) places, $R_K$ the regulator, $D_{K/\mathbb{Q}}>0$ the discriminant, $w_K$ the number of roots of unity, $h_K$ the class number, and $\zeta_K$ the zeta function of our number field K (see [172] for definitions). If the reader is not willing to enter into the terminology of algebraic number theory, he may just consider the special case K = Q and hence $d=r=R_{\mathbb{Q}}=D_{K/\mathbb{Q}}=h_K=1$, $w_K=2$, s = 0 and $\zeta_K$ is the Riemann zeta function.


∗ The result is stated to hold for g ≥ 3, but the proof there gives the condition g ≥ 4. The example $y^2=x^8+1$ shows that g ≥ 4 is indeed necessary.

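Schanuel’s theorem (see [257]) states
\[
N(\mathbb{P}^{n-1}_K,T) = \frac{h_KR_K}{w_K\,\zeta_K(n)}\left(\frac{2^r(2\pi)^s}{D_{K/\mathbb{Q}}^{1/2}}\right)^{n} n^{r+s-1}\,T^{nd}
 + \begin{cases} O(T(1+\log T)) & \text{if } d=1,\ n=2,\\ O\bigl(T^{nd-1}\bigr) & \text{otherwise,}\end{cases}
\]
for T ≥ 1, where the implied constant may depend on n and K.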
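In the special case K = Q and n = 2 (so d = r = 1, s = 0, $h_K=R_K=D_{K/\mathbb{Q}}=1$, $w_K=2$, $\zeta_K(2)=\pi^2/6$) the main term reduces to $(12/\pi^2)\,T^2$. The following brute-force sketch, which is only an illustration and not part of the text, counts the points of $\mathbb{P}^1(\mathbb{Q})$ of height at most T and compares with that main term; the function names are illustrative.

```python
import math


def count_points_p1_q(T: int) -> int:
    """Number of points (a : b) of P^1(Q) with max(|a|, |b|) <= T, a and b coprime integers."""
    count = 0
    for a in range(-T, T + 1):
        for b in range(0, T + 1):          # normalise the sign so that b >= 0
            if math.gcd(a, b) != 1:
                continue
            if b == 0 and a < 0:           # (a : 0) and (-a : 0) are the same point
                continue
            count += 1
    return count


def schanuel_main_term_p1_q(T: float) -> float:
    """Main term of Schanuel's theorem for K = Q, n = 2."""
    return 12.0 / math.pi ** 2 * T ** 2


if __name__ == "__main__":
    for T in (10, 50, 100):
        exact = count_points_p1_q(T)
        print(T, exact, round(exact / schanuel_main_term_p1_q(T), 4))
```

The ratio approaches 1 as T grows, in line with the error term O(T(1 + log T)) for d = 1, n = 2.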
11.10.6. Schanuel’s theorem was generalized by J. Franke, Yu.I. Manin, and Y. Tschinkel
([122]) to flag manifolds, see also J.L. Thunder [299] for a more elementary approach with
a concrete error term. We expect also many rational points on Fano varieties, at least for K
sufficiently large. Recall that an irreducible projective smooth variety X is called a Fano
variety, if the anticanonical divisor −KX is ample. Then Manin has given a conjecture
about the asymptotics of NL (U, T ) for a sufficiently small open dense subset U of X (see
[122] and [234]).
11.10.7. Let X be a smooth cubic threefold in $\mathbb{P}^4_K$, defined over a number field K. The canonical divisor of X is the restriction of $O_{\mathbb{P}^4}(-2)$ to X ([148], II.8.20) and hence X is a Fano variety. Manin’s conjecture predicts an asymptotic $\sim c_K\,T^{2d}$ for the number of points of height at most T in a suitable dense open set of X, and it is expected that for X itself we still have a bound $O(T^{2d})$, where we use the standard height on $\mathbb{P}^4_K$. Since no precise asymptotics are used here, it is easy to see that we may choose a different height function with respect to $O_{\mathbb{P}^4}(1)$. It will be more convenient to use the Arakelov height $h_{Ar}$ on $\mathbb{P}^4_K$ (replacing the max-norm by the $L^2$-norm at archimedean places, see Section 2.8) and we denote by $N_{Ar}(X,T)$ the corresponding counting function of points of bounded height.
The solution of this problem is of particular importance in additive number theory with
Waring’s problem with cubes, and analytic number theory with cubic Weyl sums. General
Weyl sums
$$
S(\alpha) = \sum_{n=1}^{N} e^{2\pi i \alpha f(n)}
$$
with f (x) ∈ Z[x] a polynomial of degree k ≥ 1 occur quite often in analytic number
theory and harmonic analysis, and the problem of estimating such sums, both pointwise or
via high moments, is of central importance. For polynomials of high degree, Vinogradov’s
method and its variants lead to the best known results. However, for small degrees, in
particular for cubic polynomials, no improvements over Weyl’s original bounds (dating
back to 1910) in the case of pointwise estimates, or L.K. Hua’s bounds for moments [154]
(dating back to 1938) have been obtained.
The simplest non-trivial case for moments is $f(n) = n^3$ and the sixth moment, with the
conjectural bound
$$
\int_0^1 |S(\alpha)|^6 \, d\alpha \ll_\varepsilon N^{3+\varepsilon},
$$
which if true would be the best possible (an asymptotic $\sim cN^3$ is actually expected to
hold). As yet, no improvement in the exponent over Hua’s old bound $O(N^{7/2+\varepsilon})$ of
1938 has been obtained. (Hua’s exponent is the critical one; any improvement, no matter
how small, would have interesting consequences.)

The above integral is the number of integer solutions of


$$
x_1^3 + x_2^3 + x_3^3 = x_4^3 + x_5^3 + x_6^3
$$
for $1 \le x_i \le N$, $i = 1, \dots, 6$. This defines a non-singular cubic projective fourfold and
the problem amounts to counting the number of rational points of height at most T on the
subset of points with positive coordinates. By slicing, we are led to the problem of counting
rational points in a cubic threefold.
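The identification of the sixth moment with a solution count can be checked for small N. By orthogonality of the exponentials, the integral equals $\sum_m r(m)^2$, where $r(m)$ is the number of triples $(x_1, x_2, x_3)$ in $[1, N]^3$ with $x_1^3 + x_2^3 + x_3^3 = m$. The Python sketch below computes this sum and compares it with the number of "diagonal" solutions, those in which $(x_4, x_5, x_6)$ is a permutation of $(x_1, x_2, x_3)$; the closed formula $6N^3 - 9N^2 + 4N$ for the latter is an elementary count added here for illustration and is not taken from the text, and the function names are assumptions.

```python
from collections import Counter
from itertools import product

def sixth_moment_count(N):
    """Number of solutions of x1^3+x2^3+x3^3 = x4^3+x5^3+x6^3 with 1 <= xi <= N."""
    # r[m] = number of ordered triples with sum of cubes equal to m
    r = Counter(x**3 + y**3 + z**3 for x, y, z in product(range(1, N + 1), repeat=3))
    # each pair of triples with the same sum gives one solution
    return sum(v * v for v in r.values())

def diagonal_count(N):
    """Solutions where (x4, x5, x6) is a permutation of (x1, x2, x3)."""
    return 6 * N**3 - 9 * N**2 + 4 * N

if __name__ == "__main__":
    for N in (2, 5, 12, 20):
        print(N, sixth_moment_count(N), diagonal_count(N))
```

For N ≥ 12 the two counts differ, because $1^3 + 12^3 = 9^3 + 10^3$ produces non-diagonal solutions.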
One of the difficulties in studying the distribution of points on a cubic hypersurface is that
the obvious composition law for pairs of points, namely the residual intersection with the
line through two points, is not defined if the line lies on the cubic hypersurface. A thorough
study of this composition defined in the complement of rational lines is in the interesting
monograph [190] by Yu.I. Manin. Thus the set of rational lines in a cubic hypersurface is
an exceptional set, which requires separate study.
11.10.8. First, we need some facts about the geometry of X, which are due to G. Fano
[118]. We consider the set Σ of lines of $\mathbb{P}^4_K$ contained in X. Then Σ is a subset of the
Grassmannian G(2, 5) of lines in four-dimensional projective space $\mathbb{P}^4$. If $x = (x_0 : \cdots : x_4)$
and $x' = (x'_0 : \cdots : x'_4)$ are two points determining a line L, the Grassmann
coordinate of L is the vector
$$
\Lambda := \left( \det \begin{pmatrix} x_i & x_j \\ x'_i & x'_j \end{pmatrix} \right)_{0 \le i < j \le 4}
$$
(with for example (i, j) in lexicographic order) determining a closed embedding of G(2, 5)
in $\mathbb{P}^9$. It is easy to see that Σ becomes a closed projective subvariety of $\mathbb{P}^9_K$ (see J. Harris
[146], Example 6.19). This variety, of paramount importance in the study of the cubic
threefold X, is called in honor of Fano the Fano surface† associated to X. The surface
Σ is a smooth geometrically irreducible surface (see [34], Lemma 3).
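The Grassmann coordinates of a line can be computed directly from two of its points. The Python sketch below lists the ten minors in lexicographic order and checks the classical quadratic relations $p_{ab}p_{cd} - p_{ac}p_{bd} + p_{ad}p_{bc} = 0$ that every such vector satisfies. These relations are a standard fact not stated in the text above, and the function names are illustrative.

```python
from itertools import combinations

PAIRS = list(combinations(range(5), 2))          # (0,1), (0,2), ..., (3,4)
INDEX = {pair: k for k, pair in enumerate(PAIRS)}

def grassmann_coordinates(x, y):
    """The ten minors x_i*y_j - x_j*y_i, 0 <= i < j <= 4, in lexicographic order."""
    return [x[i] * y[j] - x[j] * y[i] for i, j in PAIRS]

def quadratic_relation(coords, a, b, c, d):
    """Value of p_ab*p_cd - p_ac*p_bd + p_ad*p_bc for indices a < b < c < d."""
    p = lambda i, j: coords[INDEX[(i, j)]]
    return p(a, b) * p(c, d) - p(a, c) * p(b, d) + p(a, d) * p(b, c)

if __name__ == "__main__":
    x, y = (1, 2, 3, 4, 5), (2, -1, 0, 7, 3)
    coords = grassmann_coordinates(x, y)
    print(coords)
    print([quadratic_relation(coords, *q) for q in combinations(range(5), 4)])
```

Replacing the pair (x, x′) by another pair of points on the same line multiplies the vector by a non-zero scalar, so the point of $\mathbb{P}^9$ depends only on the line.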
11.10.9. We assume that $\Sigma(K) \neq \emptyset$ and we fix a base point $Q_0 \in \Sigma(K)$. Then there
is a canonical abelian variety A , called the Albanese variety of Σ , and a canonical map
a : Σ → A mapping Q0 to 0 factoring through every such morphism from Σ to arbitrary
abelian varieties (see [165], II.3). It is obvious that a(Σ) generates A . This construction
works for any irreducible smooth projective variety with a base point and it may be shown
that the Albanese variety is the dual of the Picard variety ([165], VI, §1, Th.1). It was also
used in the deduction of Corollary 9.3.10.
Complex analytically, the Albanese variety is given as a complex torus
$H^0(\Sigma, \Omega^1_\Sigma)^* / H_1(\Sigma, \mathbb{Z})$ and the canonical map a is given by
$a(Q)(\omega) = \int_{\gamma_Q} \omega$, where $\omega \in H^0(\Sigma, \Omega^1_\Sigma)$
and $\gamma_Q$ is a path from $Q_0$ to Q (see [130], II.6). The reader will note that the Albanese
variety is a generalization of the Jacobian variety to higher dimensions.
Now we come back to the case of the cubic threefold. Then the Albanese variety A of the
Fano surface Σ has dimension 5 ([34], Lemma 5). Using the complex analytic description
of the Albanese map a above and that KΣ is very ample, it is easy to see that a has finite
fibres and hence it is a finite morphism (see A.6.15, A.12.4). Moreover, C.H. Clemens and
P. Griffiths have shown that a is an immersion (i.e. a local embedding, see [68], §12), but
we do not need this fact for our purposes.

† This should not be confused with the notion of Fano variety above: the surface Σ has a very
ample canonical bundle and therefore is of general type, see E. Bombieri and H.P.F. Swinnerton-Dyer
[34], proof of Lemma 5, and [68], Lemma 10.13.
11.10.10. On Σ, we use the Arakelov height induced by restricting the Arakelov height
$h_{\mathrm{Ar}}$ from $\mathbb{P}^9_K$. If Q ∈ Σ with corresponding line $L_Q$ in X, then we have
$$
h_{\mathrm{Ar}}(Q) = h_{\mathrm{Ar}}(L_Q),
$$
where on the right-hand side we have the Arakelov height of the line in $\mathbb{P}^4_K$. Here and in
the following, the Arakelov height of a projective linear subspace is defined as the Arakelov
height of the corresponding linear subspace in the sense of Section 2.8. Let $N_{\mathrm{lines}}(X, T)$
be the number of points P ∈ X(K) of height $H_{\mathrm{Ar}}(P) \le T$, which are on K-rational
lines in X.
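Theorem 11.10.11. If $K \neq \mathbb{Q}$, then for T ≥ 1 the estimate
$$
N_{\mathrm{lines}}(X, T) = c_2\, \gamma\, T^{2d} + O\bigl( T^{2d-1} (1 + \log T)^{r/2} \bigr)
$$
holds, where
$$
\gamma = \sum_{Q \in \Sigma(K)} H_{\mathrm{Ar}}(Q)^{-d} < \infty
\quad\text{and}\quad
c_2 = \frac{h_K R_K\, 2^{d-1} \pi^d}{w_K\, \zeta_K(2)\, D_{K/\mathbb{Q}}}
$$
and where r is the maximum rank of E(K) for all elliptic curves E in Σ. If K = Q,
then the error term has to be replaced by $O\bigl( T (1 + \log T)^{1+r/2} \bigr)$.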

Before we come to the proof, we need some useful results. First, we interpret the
Arakelov height of a subspace in terms of the associated lattice.
11.10.12. Let W be an n-dimensional subspace of $K^N$. The diagonal embedding
$K^N \to \prod_{v \mid \infty} K_v^N \cong \mathbb{R}^{Nd}$ maps the K-lattice $\Lambda := W \cap O_K^N$ to an nd-dimensional $\mathbb{R}$-lattice
$\Lambda_\infty$ in the closure of W in $\mathbb{R}^{Nd}$ (cf. Corollary C.2.7). Let $\mathrm{vol}(\Lambda_\infty)$ be the volume of
a fundamental domain of $\Lambda_\infty$ in this closure (a real vector space of dimension nd) with respect to the Lebesgue measure. We denote by
$\lambda_1$ the length of the shortest non-zero vector in $\Lambda_\infty$ with respect to the euclidean norm on
$\mathbb{R}^{Nd}$.

We have the following result of Schmidt [263], Th.1, which we quote without proof.
Theorem 11.10.13. $2^{ns}\, \mathrm{vol}(\Lambda_\infty) = D_{K/\mathbb{Q}}^{n/2}\, H_{\mathrm{Ar}}(W)^d$.

Next, we need a generalization of Schanuel’s theorem to an (n − 1)-dimensional linear subspace
L of $\mathbb{P}^{N-1}$, defined over K, which is uniform with respect to L. We denote by
$$
\alpha(n) = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}
$$
the volume of the unit ball in $\mathbb{R}^n$.
Theorem 11.10.14. Under the assumptions of 11.10.12 and with n ≥ 2, let L be the
projective linear subspace of $\mathbb{P}^{N-1}_K$ induced by W. Then for T ≥ 1 the following estimate
holds
$$
N_{\mathrm{Ar}}(L, T) = c_n \frac{T^{nd}}{H_{\mathrm{Ar}}(L)^d} +
\begin{cases}
O\Bigl( \frac{T}{\lambda_1} \bigl( 1 + \log^+ \bigl( \frac{T}{\lambda_1} \bigr) \bigr) \Bigr) & \text{if } d = 1,\ n = 2,\\
O\Bigl( \bigl( \frac{T}{\lambda_1} \bigr)^{nd-1} \Bigr) & \text{otherwise,}
\end{cases}
$$
where the implicit constant in the bound may depend on n and K and where
$$
c_n = \frac{h_K R_K}{w_K\, \zeta_K(n)}\, D_{K/\mathbb{Q}}^{-n/2}\, n^{r+s-1}\, \alpha(n)^r\, \{2^n \alpha(2n)\}^s .
$$

For the proof, we need the following lemma from geometry of numbers. Let Ω be a
bounded measurable set in Rn with Lipschitz parametrizable boundary meaning that there
are finitely many Lipschitz continuous maps, defined on bounded subsets of Rn−1 and with
images covering ∂Ω .
Lemma 11.10.15. Let Λ be a lattice in $\mathbb{R}^n$, let $\lambda_1$ be the length of the shortest non-zero
lattice vector with respect to the euclidean norm $\|\ \|$. For T > 0, the number N(Λ, T) of
lattice points in TΩ satisfies
$$
N(\Lambda, T) = \frac{\mathrm{vol}(\Omega)}{\mathrm{vol}(\Lambda)}\, T^n + O\Bigl( \Bigl( \frac{T}{\lambda_1} \Bigr)^{n-1} + 1 \Bigr),
$$
where the implied constant depends on n and Ω but not on Λ.
Proof: We refine the standard counting argument from [172], Ch.6, §2, Th.2 to make the
error term uniform with respect to the lattice. First, we use a result of geometry of numbers
(see C.G. Lekkerkerker [181], Ch.2, §10, Th.4) confirming that Λ has a basis $v_1, \dots, v_n$
such that the fundamental domain $F_\Lambda := \{ \sum_{i=1}^n m_i v_i \mid 0 \le m_i < 1 \}$ is not too skew,
namely
$$
\frac{\|v_1\| \cdots \|v_n\|}{\mathrm{vol}(F_\Lambda)} = O(1).
$$
Hence there is a change of coordinates of norm bounded by a constant $C_1$ independent
of Λ, which transforms Λ into the orthogonal lattice $\bigoplus_{i=1}^n \mathbb{Z}\, \|v_i\|\, e_i$, where $(e_i)$ is the
standard basis of $\mathbb{R}^n$. We conclude that a ball of diameter $< \lambda_1 / C_1$ intersects at most $2^n$
translates $\lambda + F_\Lambda$, λ ∈ Λ.
Now let $N_{\mathrm{int}}(\Lambda, T)$ (resp. $N_{\mathrm{bd}}(\Lambda, T)$) be the number of lattice points λ ∈ Λ with $\lambda + F_\Lambda$
contained in the interior of TΩ (resp. with $(\lambda + F_\Lambda) \cap \partial(T\Omega) \neq \emptyset$). Then
$$
N_{\mathrm{int}}(\Lambda, T) \le N(\Lambda, T) \le N_{\mathrm{int}}(\Lambda, T) + N_{\mathrm{bd}}(\Lambda, T).
$$
Obviously, we have
$$
N_{\mathrm{int}}(\Lambda, T)\, \mathrm{vol}(\Lambda) \le \mathrm{vol}(T\Omega) = T^n\, \mathrm{vol}(\Omega),
$$
hence it is enough to show that $N_{\mathrm{bd}}(\Lambda, T)$ may be estimated by the error term in the claim.
The Lipschitz parametrizations of the boundary yield easily that ∂Ω may be covered by at
most $C_2 / \nu^{n-1}$ balls of diameter < ν ≤ 2. If we set $\nu = \lambda_1 / (C_1 T)$, then the above
yields
$$
N_{\mathrm{bd}}(\Lambda, T) \le 2^n C_1^{n-1} C_2 \Bigl( \frac{T}{\lambda_1} \Bigr)^{n-1}
$$
at least for ν ≤ 2. Since N(Λ, T) = 1 for $T < \lambda_1$, this proves the claim. □

Proof of Theorem 11.10.14: For simplicity, we restrict the proof to the case K = Q . The
extension to arbitrary number fields may be done by S.H. Schanuel’s method in [257].
Let N(W, T) be the number of $x \in \mathbb{Z}^N \cap W$ with $|x_1|^2 + \cdots + |x_N|^2 \le T^2$. Let Ω
be the intersection of W with the unit ball in $\mathbb{R}^N$. Then Lemma 11.10.15 and Theorem
11.10.13 yield
$$
N(W, T) = \frac{\alpha(n)}{H_{\mathrm{Ar}}(L)}\, T^n + O\Bigl( \Bigl( \frac{T}{\lambda_1} \Bigr)^{n-1} \Bigr) + 1.
$$
Note that the implied constant is independent of W because all Ω’s are isometric. Let
$N^*(W, T)$ be the number of primitive solutions, i.e. $x \in \mathbb{Z}^N \cap W$ with $|x_1|^2 + \cdots + |x_N|^2 \le T^2$
and $\mathrm{GCD}(x_1, \dots, x_N) = 1$. It is clear that
$$
N(W, T) - 1 = \sum_{k=1}^{\infty} N^*\Bigl( W, \frac{T}{k} \Bigr),
$$
and hence the Möbius inversion formula gives
$$
N^*(W, T) = \sum_{k=1}^{\infty} \mu(k) \Bigl( N\Bigl( W, \frac{T}{k} \Bigr) - 1 \Bigr),
$$
where μ is the Möbius function. We have N(W, T/k) = 1 if $k > T/\lambda_1$. We must count
only one half of the primitive solutions for $N_{\mathrm{Ar}}(L, T)$, hence we get
$$
N_{\mathrm{Ar}}(L, T) = \frac{1}{2} N^*(W, T) = \frac{\alpha(n)}{2}\, \frac{T^n}{H_{\mathrm{Ar}}(L)} \sum_{1 \le k \le T/\lambda_1} \frac{\mu(k)}{k^n}
+ O\Bigl( \Bigl( \frac{T}{\lambda_1} \Bigr)^{n-1} \sum_{1 \le k \le T/\lambda_1} \frac{|\mu(k)|}{k^{n-1}} \Bigr).
$$
By the product development
$$
\frac{1}{\zeta(s)} = \prod_p (1 - p^{-s}) = \sum_{k=1}^{\infty} \frac{\mu(k)}{k^s}
$$
for the Riemann zeta function ([8], Ch.5, Sec.4.1, Th.9) and using Minkowski’s first theorem
in C.2.19 together with Theorem 11.10.13, we easily deduce the claim. □
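The Möbius inversion step can be tested numerically in the simplest situation $W = \mathbb{Q}^2$, N = n = 2, where $\lambda_1 = 1$, $H_{\mathrm{Ar}}(L) = 1$ and the leading term $\alpha(2) T^2 / (2\zeta(2))$ equals $3T^2/\pi$. The Python sketch below computes the number of primitive vectors in the disc of radius T both directly and through the inversion formula; it is only an illustration under these assumptions, and the function names are not from the text.

```python
from math import gcd, isqrt, pi

def mobius(k):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:        # square factor
                return 0
            result = -result
        p += 1
    return -result if k > 1 else result

def lattice_count(T, k=1):
    """N(W, T/k): number of x in Z^2 with |x| <= T/k, origin included."""
    bound_sq = (T * T) // (k * k)             # x1^2 + x2^2 <= floor(T^2 / k^2)
    total = 0
    for x1 in range(-isqrt(bound_sq), isqrt(bound_sq) + 1):
        total += 2 * isqrt(bound_sq - x1 * x1) + 1
    return total

def primitive_count_mobius(T):
    """N*(W, T) = sum_k mu(k) (N(W, T/k) - 1); terms with k > T vanish."""
    return sum(mobius(k) * (lattice_count(T, k) - 1) for k in range(1, T + 1))

def primitive_count_direct(T):
    """Primitive vectors x in Z^2 with |x| <= T, counted by brute force."""
    return sum(1 for a in range(-T, T + 1) for b in range(-T, T + 1)
               if a * a + b * b <= T * T and gcd(a, b) == 1)

if __name__ == "__main__":
    for T in (10, 50, 100):
        print(T, primitive_count_mobius(T) // 2, round(3 * T * T / pi, 1))
```

Halving the primitive count gives the number of points of $\mathbb{P}^1(\mathbb{Q})$ of Arakelov height at most T, which is the quantity compared with $3T^2/\pi$ in the printed table.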
Proof of Theorem 11.10.11: We may assume that Σ has a K -rational base point Q0 ,
otherwise the theorem is trivial. K -rational lines on X correspond to K -rational points
on Σ and we can apply Faltings’s big theorem to the image a(Σ) of Σ in the Albanese
variety A . Since a(Σ) generates A , no translate of an abelian subvariety is equal to a(Σ) .
Hence a(Σ(K)) is contained in the union of finitely many elliptic curves Yi and a finite
set S . Here, the elliptic curves are allowed to have origin different from the origin of A
and so we may assume that they are defined over K. Let $\hat{h}$ be the Néron–Tate height on
an elliptic curve $Y_i$ with respect to an even ample line bundle $H_i$; then we have
$$
\hat{h}(a(Q)) \ll h_{\mathrm{Ar}}(Q) \ll \hat{h}(a(Q))
$$
for all $Q \in a^{-1}(Y_i)$ (since $a^*(H_i)$ is ample on $a^{-1}(Y_i)$, see A.12.7).
Now recalling that the Néron–Tate height yields a norm on $Y_i(K)/\mathrm{tors}$, we deduce that
for any fixed κ > 0 the number of points Q ∈ Σ(K) with $H_{\mathrm{Ar}}(Q) \le T^\kappa$ is
$\ll (1 + \kappa \log T)^{r/2}$. Moreover, $H_{\mathrm{Ar}}(Q)$ grows faster than polynomial and hence γ is finite. By
Theorem 11.10.14 and noticing that the first successive minimum is uniformly bounded
from below (by the first successive minimum of $O_K^{10}$), we have
$$
\bigl| \{ P \in L_Q(K) \mid H_{\mathrm{Ar}}(P) \le T \} \bigr| = c_2 \frac{T^{2d}}{H_{\mathrm{Ar}}(Q)^d} + O\bigl( T^{2d-1} \bigr) \tag{11.32}
$$
with $T^{2d-1}$ replaced by $T(1 + \log T)$ in case of K = Q.


In order to finish the proof, we shall prove that there are constants κ, ρ > 0, and a (possibly
empty) finite set $\mathcal{P}$ of points P ∈ X(K), such that for every T ≥ ρ and every Q ∈ Σ(K)
with $H_{\mathrm{Ar}}(Q) > T^\kappa$, the set
$$
\{ P \in L_Q(K) \mid H_{\mathrm{Ar}}(P) \le T \}
$$
either is empty or consists of exactly one point $P \in \mathcal{P}$. Thus the contribution to the
counting of $N_{\mathrm{lines}}(X, T)$ due to lines $L_Q$ with $H_{\mathrm{Ar}}(Q) > T^\kappa$ is at most $|\mathcal{P}|$.
We have already remarked that there are $\ll (1 + \kappa \log T)^{r/2}$ points Q ∈ Σ(K) with
$H_{\mathrm{Ar}}(Q) \le T^\kappa$, hence summing (11.32) over Q we get the theorem, because the effect of
intersection of lines is $\ll (1 + \kappa \log T)^r$ and may be neglected in the counting.
Consider lines $L_Q$ with
$$
H_{\mathrm{Ar}}(L_Q) = H_{\mathrm{Ar}}(Q) > T^\kappa .
$$
If x and x′ are two distinct points of $L_Q$, 2.9.8 yields
$$
H_{\mathrm{Ar}}(L_Q) \le H_{\mathrm{Ar}}(x)\, H_{\mathrm{Ar}}(x'),
$$
hence if κ > 2 we cannot have two points on $L_Q$ with Arakelov height at most T. Suppose
now that $x_0 \in L_Q(K)$ is a (necessarily unique) point such that $H_{\mathrm{Ar}}(x_0) \le T$ and let y
be any other point on $L_Q(K)$. The line $L_Q$ has a parametrization $x_0 u + y v$ and lies in
X, a cubic threefold given by an equation
$$
\sum_{ijk} a_{ijk}\, x_i x_j x_k = 0.
$$
Substituting the parametrization of $L_Q$ into this equation and looking at the coefficient of
$u^2 v$, we infer that
$$
\sum_{k=0}^{4} \Bigl( \sum_{ij\pi} a_{\pi(ijk)}\, x_{0i}\, x_{0j} \Bigr) y_k = 0, \tag{11.33}
$$
where π ranges over all permutations. In other words, the line $L_Q$ is contained in the
cubic surface S in $\mathbb{P}^3$, which is the intersection of X with the hyperplane $\Pi_Q$ defined by
(11.33). Now we distinguish cases.
If S contains only finitely many lines, these lines are determined algebraically by the
coefficients of the defining equations of X and $\Pi_Q$, hence they have height of polynomial size
in the heights of these equations, hence $\ll H_{\mathrm{Ar}}(x_0)^\rho \le T^\rho$ for some absolute constant ρ
(for a precise result, we may use some sort of arithmetic Bézout theorem, see [45], Th.5.5.1,
for an advanced version). Since $L_Q$ is one of these lines, this contradicts the lower bound
$H_{\mathrm{Ar}}(L_Q) > T^\kappa$ if κ > ρ and T is larger than some constant depending only on X,
which we can assume by taking ρ large enough. Thus we need only deal with the case in
which S contains infinitely many lines.

Note that a smooth cubic surface contains exactly 27 lines ([148], Th.V.4.7). Therefore,
if S contains infinitely many lines it must be singular and, intersecting S with a generic
plane in $\Pi_Q$, we verify that S can be only one of the following possibilities:

(a) a cone over an elliptic curve;


(b) a cone over a rational cubic curve;
(c) a rational cubic ruled surface;
(d) reducible, with a projective plane as a component.

Cases (b), (c), (d) are immediately excluded because they would give rise to a family of
lines of X parametrized by a rational curve. This would lead to a rational curve in the Fano
surface Σ . Since the Albanese map is finite, the image under the Albanese map a would
give a rational curve inside an abelian variety contradicting 8.2.18 or Proposition 8.2.19.
Case (a) can actually occur. The maximum number of such cones is 30, attained for example
with the Fermat cubic threefold $x_0^3 + x_1^3 + x_2^3 + x_3^3 + x_4^3 = 0$ for the complete
intersections with the hyperplanes $x_i + \eta x_j = 0$, $\eta^3 = 1$ and $i \neq j$, see [68], Lemma 8.1
and p.315.
We conclude that the vertices $x_0$ of these cones belong to a finite set $\mathcal{P}$ and by choosing
T sufficiently large only the vertices contribute to the counting, proving the claim. □
11.10.16. We have studied in Chapter 4 the theory of small points on subvarieties of a linear
torus. There is a similar theory on an abelian variety A over a number field K , which we
mention briefly. For details, we refer to the overview article of Abbes [1].
Let X be a closed subvariety of A . As in 3.2.2, a torsion coset of X is a translate of an
abelian subvariety by a torsion point and we define X ∗ to be the complement of the union
of all torsion cosets in X . We have the following analogue of Theorem 4.2.2, also due to
Zhang [342]:

Theorem 11.10.17. Let $\hat{h}_L$ be the Néron–Tate height on A with respect to an ample
symmetric line bundle L. Then

(a) There are only finitely many maximal torsion cosets in X and their union is $X \setminus X^*$.
(b) There is a positive lower bound for the restriction of $\hat{h}_L$ to $X^*$.
11.10.18. This theorem is called the Bogomolov conjecture. In fact, Bogomolov conjec-
tured it only for a subcurve of A . We omit the proof of Theorem 11.10.17, which is best
done in the framework of Arakelov theory. It relies on an equidistribution theorem due to
Szpiro, Ullmo, and Zhang [296]. This inspired Bilu’s approach in Section 4.3.

Since torsion points are characterized by Néron–Tate height 0 , Theorem 11.10.17 yields
immediately:
Corollary 11.10.19. If X is not a torsion coset, then the torsion points are not Zariski-
dense in X .
11.10.20. A. Moriwaki [208] generalized Theorem 11.10.17 to finitely generated fields F
over Q using a new type of heights given in terms of Arakelov geometry. This proves

Corollary 11.10.19 over F, which is the Manin–Mumford conjecture. The latter was first
proved by M. Raynaud [237], [238], using different methods.

11.11. Bibliographical notes

For a historical account, we refer to the introduction. Our presentation follows
quite closely [29].
There is a version of the local Eisenstein theorem (without the existence statement)
which does not require the condition $\frac{\partial p}{\partial t}(0, 0) \neq 0$, as shown by W.M. Schmidt
[270]; his proof is more difficult and uses techniques from p-adic linear differential
equations. The sharpest result of this type is due to B.M. Dwork and A.J. van
der Poorten [95], also obtained with similar methods. These deeper results are not
needed here.
P. Vojta [314] proved Faltings’s big theorem for a semiabelian variety A over
any field K of characteristic 0 and with Γ any finitely generated subgroup of
A(K). M. McQuillan [199] extended Vojta’s result to the division group of Γ.
B. Poonen [235] proved a generalization of Faltings’s big theorem which includes
also Bogomolov’s conjecture in the case of A isogenous to a product of an abelian
variety with a torus, and G. Rémond [242] showed it for all semiabelian varieties.
The problem of obtaining an effective bound for the number of points in Faltings’s
big theorem remained open for quite a while. Eventually, G. Rémond [240] proved
that the number of translates of maximal abelian subvarieties contained in X is
effectively bounded in terms of A , rank(Γ) and deg(X) in the number field
case. The main difficulty in Rémond’s proof arises from the fact that the natural
generalization of Mumford’s gap principle does not hold for varieties of higher
dimension and he had to introduce new ideas to overcome this problem. In the
end, his bound turns out to be simply exponential in rank(Γ).
The conjecture of Manin on the number of rational points on a Fano manifold has
now been proved in several cases, see Peyre [234], although counterexamples were
found by V.V. Batyrev and Y. Tschinkel [17].
Theorem 11.10.11 is previously unpublished.
A proof of Bogomolov’s conjecture for subvarieties of an abelian variety of CM
type, following the method used in Chapter 4 for the torus case, was given in E.
Bombieri and U. Zannier [39]; however, the proof does not extend in an obvious
way to the general case. The case of curves in an arbitrary abelian variety was then
proved by E. Ullmo [301] introducing new ideas, and the general case was finally
settled in [342] and [296].
S. David and P. Philippon have given good explicit quantitative versions of the
Bogomolov conjecture (see [82], [84]).
12 THE abc-CONJECTURE

12.1. Introduction

The abc-conjecture of Masser and Oesterlé is a typical example of a simple statement
that can be used to unify and motivate many results in number theory, which
otherwise would be scattered statements without a common link. As such, it de-
serves to be discussed, first by showing its power and then by generalizing it and
showing how it fits into the much more general and coherent set of conjectures
provided by Vojta in his thesis.
Although a pessimist may conclude that the ease with which the abc-conjecture
may be applied to solve notoriously difficult problems is only a reflection of how
difficult its proof is likely to be, we should keep in mind that its function field
analogue is quite easy to prove and provides a unified method of attack for many
problems in the arithmetic of function fields. Moreover, whatever its status in
the classical case, it is likely that exceptions, if any, will be extremely rare and
most of the conclusions obtained by its application are also likely to be valid and
provable in some instances by different methods. The abc-conjecture is also a
useful tool for guessing the right answer when analysing specific problems, hence
its significance should not be too easily discounted.
The content of this chapter is as follows. In Section 12.2 we recall the formula-
tion of the abc-conjecture over Q and prove some consequences of it, including
Elkies’s proof that the strong abc-conjecture implies Roth’s celebrated theorem
on approximation of algebraic irrationals by rational numbers, in fact effectively if
we assume an effective abc-conjecture.
Elkies’s proof is based on an interesting result, Belyı̆’s lemma, which proves that
any non-trivial morphism $C \to C'$ of irreducible smooth curves over a number
field K can be extended to $C \to C' \to \mathbb{P}^1$, where now the composition $C \to \mathbb{P}^1$
is ramified only over {0, 1, ∞}. A very interesting feature of this lemma is that it
holds only for curves defined over a number field. Belyı̆’s lemma will also be
the main tool in Section 12.3 to prove Belyı̆’s theorem, which is a necessary and
sufficient criterion for a complex curve to be defined over Q . In Section 12.4,
we prove first the analogue of the abc-conjecture for polynomials. Then we give


examples for the strong abc-conjecture over Q as well as counterexamples for
stronger formulations.
In Section 12.5, we deal with the equivalence of the abc-conjecture with other
conjectures, including Szpiro’s conjecture about conductors and discriminants of
elliptic curves over Q . This chapter concludes with Section 12.6 dealing with
a result of Darmon and Granville that the generalized Fermat equation has only
finitely many integer solutions. This comes from a non-trivial application of Falt-
ings’s theorem based on the theory of ramified coverings presented in Section 12.3.
We use several previous results, but otherwise this chapter may be read to a large
extent independently of the other parts of the book.

12.2. The abc-conjecture

This section begins with the formulation of the abc-conjecture over the rational
numbers. Then we give a weak explicit form and indicate how to deduce Fermat’s
and Catalan’s conjectures from it. The standard formulation of the abc-conjecture
involves a positive small parameter ε and an unspecified positive constant C(ε),
which depends only on ε , to rule out the possibility of disproving the conjecture
by finding numerical counterexamples. The main drawback of such an approach is
that algebraic geometry has not been of any help in producing plausible heuristics
about the behavior of C(ε). However, considerations of diophantine approxima-
tion have led A. Baker to suggest a specific dependence of C(ε) on ε , and we
mention his conjecture in 12.2.6.
The remaining part is dedicated to an argument, due independently to Langevin
and Elkies, that the abc-conjecture implies Roth’s original theorem over Q , in fact
in a stronger form. Its importance also lies in the fact that an effective abc-theorem
would make Roth’s theorem effective. The proof is based on Belyı̆’s lemma from
12.2.7. In Section 14.4, Elkies’s argument will be extended to number fields and
the proof becomes more conceptual. We conclude this section with an amusing
application of the abc-conjecture to a classical question of analytic number theory.
Definition 12.2.1. The radical rad(N) of an integer N is the product of all distinct
primes dividing N:
$$
\mathrm{rad}(N) = \prod_{p \mid N} p.
$$

The following conjecture is called the abc-conjecture in the strong form:


Conjecture 12.2.2. Let ε > 0 be a positive real number. Then there is a constant
C(ε) such that, for any triple a, b , c of coprime positive integers with a + b = c,
the inequality
$$
c \le C(\varepsilon)\, \mathrm{rad}(abc)^{1+\varepsilon}
$$
holds.

12.2.3. The adjective “strong” here refers to the fact that this statement is supposed
to hold for every positive ε . If we assume that it only holds for some fixed ε > 0,
for example ε = 1, then we refer to it as the weak abc-conjecture.
In this respect, we note that for applications the weak abc-conjecture is often as
useful as the strong formulation. The statement

$$
a + b \le \mathrm{rad}(ab(a + b))^2 \tag{12.1}
$$

for every pair a, b of positive coprime integers has in fact been conjectured by
several authors as a likely explicit form of the weak abc-conjecture.
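For experimentation, the radical and the ratio $\log c / \log \mathrm{rad}(abc)$ (often called the quality of the triple, a term not used in the text) are easy to compute. The Python sketch below also searches a small range for violations of (12.1); finding none in such a range is of course no evidence beyond that range. All function names are illustrative assumptions.

```python
from math import gcd, log

def rad(n):
    """Product of the distinct primes dividing n (rad(1) = 1), by trial division."""
    n = abs(n)
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    return result * n if n > 1 else result

def quality(a, b):
    """log(c) / log(rad(abc)) for the triple a + b = c."""
    c = a + b
    return log(c) / log(rad(a * b * c))

def violations_of_12_1(bound):
    """Coprime pairs 1 <= a <= b <= bound with a + b > rad(ab(a+b))^2."""
    return [(a, b) for a in range(1, bound + 1) for b in range(a, bound + 1)
            if gcd(a, b) == 1 and a + b > rad(a * b * (a + b)) ** 2]

if __name__ == "__main__":
    print(quality(1, 8))                  # the triple 1 + 8 = 9
    print(quality(2, 3**10 * 109))        # the triple 2 + 3^10 * 109 = 23^5
    print(violations_of_12_1(200))
```

The second triple, $2 + 3^{10} \cdot 109 = 23^5$, has radical 15042 and is a well-known example where c is much larger than the radical.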
Example 12.2.4. Suppose $x^n + y^n = z^n$ is a non-trivial solution in coprime positive
integers of the famous Fermat equation. Let us take $a = x^n$, $b = y^n$, $c = z^n$ in (12.1).
Since $abc = (xyz)^n$ we deduce

$$
z^n \le \mathrm{rad}((xyz)^n)^2 = \mathrm{rad}(xyz)^2 \le (xyz)^2 < z^6 .
$$

Since z > 1, this implies n ≤ 5. It is well known that the Fermat equation has no non-trivial
solutions for n = 3 (Euler), n = 4 (Fermat), n = 5 (Dirichlet, Legendre). For a
proof of these classical cases, we refer to [97]. We conclude that (12.1) implies Fermat’s
last theorem. Any weak abc-conjecture would lead to a proof of the asymptotic Fermat
conjecture, namely a proof for all sufficiently large exponents n.
Fermat’s last theorem is now proved, at last, by A. Wiles and R. Taylor [331], [297]. For an
account of the proof, we refer to the book of H. Darmon, F. Diamond, and R. Taylor [77].

Example 12.2.5. The same argument applies to Catalan’s conjecture that 8 and 9 are
the only two consecutive perfect powers in the sequence of positive natural integers. If we
apply (12.1) to the Catalan equation $x^m + 1 = y^n$, we find
$$
y^n \le (xy)^2 < y^{\frac{2n}{m} + 2}
$$
and n(m − 2) < 2m. As we may restrict to m, n prime, this leaves us with the well-known
possibilities m = 2 (V.A. Lebesgue), or n = 2 (Chao Ko), or (m, n) one of the
pairs (3, 3) (trivial), (3, 5), (5, 3) (Nagell). For an account of these cases, we refer to P.
Ribenboim [243]. Hence the Catalan conjecture follows from (12.1).
The Catalan conjecture has recently been established unconditionally by P. Mihăilescu [203]
(see also Yu.F. Bilu [25] for an exposition of his proof). It had been shown earlier uncon-
ditionally by R. Tijdeman [300] that the Catalan equation has only a finite number of solu-
tions, and effective bounds for the size of the solutions x , y and of the exponents m , n
could also be given. It suffices to consider the equation $x^p - y^q = \pm 1$ with p > q odd
primes and what really matters is to give an upper bound for the exponent p. Tijdeman’s
proof achieves this by appealing to Baker’s theory of linear forms in logarithms. A noteworthy
aspect of his method is that it extends to study the equation $x^p - y^q = \pm 1$ over an
arbitrary number field.
The full solution required additional considerations from the theory of cyclotomic fields
so as to impose severe restrictions on the possible pairs of exponents (q, p). In particular,
M. Mignotte and Y. Roy [201] proved its validity for $q < 10^5$, while P. Mihăilescu [202]
proved that if odd primes q < p allow a solution of Catalan’s equation, then (q, p) satisfies
the two congruences $p^{q-1} \equiv 1 \pmod{q^2}$ and $q^{p-1} \equiv 1 \pmod{p^2}$, forming a so-called
Wieferich pair. A few examples of such pairs are known, namely
(2, 1093), (3, 1006003), (5, 1645333507), (83, 4871), (911, 318917), (2903, 18787),
but they appear to be rather uncommon.
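The two congruences are cheap to test with modular exponentiation, so the listed pairs can be verified directly. The Python sketch below uses the built-in three-argument `pow`; the function name is an illustrative assumption.

```python
def is_wieferich_pair(q, p):
    """Check p^(q-1) = 1 (mod q^2) and q^(p-1) = 1 (mod p^2)."""
    return pow(p, q - 1, q * q) == 1 and pow(q, p - 1, p * p) == 1

# The pairs quoted in the text.
KNOWN_PAIRS = [(2, 1093), (3, 1006003), (5, 1645333507),
               (83, 4871), (911, 318917), (2903, 18787)]

if __name__ == "__main__":
    for q, p in KNOWN_PAIRS:
        print((q, p), is_wieferich_pair(q, p))
    print((3, 5), is_wieferich_pair(3, 5))   # a pair that fails the test
```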
Finally, Mihăilescu [25] was able to prove the further congruence $p \equiv 1 \pmod{q}$, hence
$p \equiv 1 \pmod{q^2}$ followed because of the previously established congruence $p^{q-1} \equiv 1 \pmod{q^2}$.
Therefore, $p \ge q^2 + 1$. A slightly more accurate argument also proved
$p \ge 4q^2 + 1$. This clean lower bound for p could be combined with the upper bound for p in
terms of q, again obtained by means of Baker’s theory, showing that q had to be within
the range covered by the Mignotte and Roy result. This completed the proof of Catalan’s
conjecture.
Variants of this proof, avoiding the use of Baker’s theory and using instead the theory of
cyclotomic fields to obtain the required upper bound for p , have been described by several
authors and we refer to Mihăilescu [203] for further details.

The strong abc-conjecture as formulated before is unsatisfactory because it does
not make precise the constant C(ε). A. Baker [15] proposed the following more
explicit statement:
Conjecture 12.2.6. There is an absolute constant K, such that if a, b, c are three
coprime integers with a + b + c = 0 the inequality

$$
\max(|a|, |b|, |c|) \le K \cdot \prod_{p \mid abc} (p/\varepsilon)^{1+\varepsilon}
$$

holds for every ε with 0 < ε ≤ 1.

Numerical experiments are consistent with a small value for the constant K .
We continue with the application of the abc-conjecture to Roth’s theorem, begin-
ning with Belyı̆’s lemma:
Lemma 12.2.7. Let $g : C \to C'$ be a non-constant morphism between two irreducible
smooth curves C, C′, defined over a number field K. Let S be any
finite set of points on C(K). Then there is a non-constant rational function
$h : C' \to \mathbb{P}^1_K$ such that the composite morphism $f = h \circ g : C \to \mathbb{P}^1_K$ is
unramified outside of $f^{-1}(\{0, 1, \infty\})$ and moreover $f(S) \subset \{0, 1, \infty\}$.

Unramified morphisms are studied in A.12, B.4, and B.3. Here, we deal with
unramified morphisms $\varphi : C \to C'$ of smooth curves over a number field. Then it
suffices to know that φ is unramified in x ∈ C if and only if $d(w \circ \varphi)/dz\,(x) \neq 0$
with respect to local analytic coordinates z at x and w at φ(x) (see Proposition
A.12.18 and Example B.4.8).

Proof: Reduction to C = P1Q and to the identity morphism. We choose any


non-constant rational function g1 : C  → P1K and replace g by the composition
g1 ◦ g . This reduces the problem to the case C  = P1K . By increasing S so as
to include the ramification set of g and setting S  = g(S), it is enough to find
a non-constant h ∈ Q(P1 ) which is unramified outside of h−1 ({0, 1, ∞}) and
maps S  into {0, 1, ∞}.
The proof is completed by the following descending double induction on the de-
gree of components of S and on the cardinality of S .
Lowering the degree of a point in S . Let α ∈ S be algebraic of highest degree
d ≥ 2. The minimal polynomial p(x) of α defines a morphism p : P1Q → P1Q
of degree d . The ramification set S1 of p consists of the roots of p (x) = 0 and
∞ , hence its elements have degree at most d − 1. After composition with the
morphism p , by the chain rule we may replace S by the set S  = p(S ∪ S1 ).
Note that p(α) = 0 and that p(β), β ∈ Q , has degree not exceeding the degree
of β . Therefore, the number of ramification points of highest degree d ≥ 2 has
gone down at least by 1, replacing S by S  . By composition of such morphisms
we reach a situation in which S consists only of rational points and ∞ .
Lowering the cardinality of S. Suppose now that S consists only of rational
points and ∞ and has cardinality |S| ≥ 4. By applying a projective automorphism
of $\mathbb{P}^1_{\mathbb{Q}}$ we may assume that S contains {0, 1, ∞}. Let λ = A/(A + B) be a fourth
point in S, where A, B are integers with A, B, A + B ≠ 0. Consider the rational
function $h(x) = c\,x^A (1 - x)^B$, where c is a non-zero constant to be determined.
Since
\[
\frac{h'}{h} = \frac{A}{x} - \frac{B}{1 - x}
\]
vanishes only at x = ∞ or x = λ, the morphism $h : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$ is ramified at most
in {0, 1, ∞, λ}. We have h({0, 1, ∞}) ⊂ {0, ∞} because A, B, A + B ≠ 0.
Note that
\[
h(\lambda) = c\,\lambda^A (1 - \lambda)^B,
\]
hence choosing $c = \lambda^{-A} (1 - \lambda)^{-B}$ we get h(λ) = 1. Moreover, h is
unramified outside of $h^{-1}(\{0, 1, \infty\})$. Therefore, composition by h replaces S by
{0, 1, ∞} ∪ h(S), decreasing the cardinality of S at least by 1 because h(λ) = 1.
This completes the second induction step and the proof of the lemma.
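The normalization used in the second induction step can be checked with exact rational arithmetic. The following Python sketch is only an illustration: the function names and the sample exponents A = 2, B = 3 are assumptions made here, not part of the text.

```python
from fractions import Fraction

def belyi_step(A, B):
    """Return (lam, c, h) for h(x) = c * x**A * (1 - x)**B with lam = A/(A + B)."""
    if A == 0 or B == 0 or A + B == 0:
        raise ValueError("A, B and A + B must all be non-zero")
    lam = Fraction(A, A + B)
    c = lam ** (-A) * (1 - lam) ** (-B)   # chosen so that h(lam) = 1

    def h(x):
        x = Fraction(x)
        return c * x ** A * (1 - x) ** B

    return lam, c, h

def log_derivative(A, B, x):
    """h'(x)/h(x) = A/x - B/(1 - x), valid for x different from 0 and 1."""
    x = Fraction(x)
    return Fraction(A) / x - Fraction(B) / (1 - x)

lam, c, h = belyi_step(2, 3)          # illustrative exponents
print(lam, h(lam), log_derivative(2, 3, lam))   # 2/5 1 0
```

The printed values confirm, for this sample, that λ is the only finite zero of h′/h away from 0 and 1 and that the chosen c sends λ to 1.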
Example 12.2.8. Consider the morphism $g : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$, where $g(x) = 2x^3 - 3ax^2 + 1$
and a ∈ Z, a ≠ 0, 1. Let S be the set of roots of g(x). Since a ≠ 1, the elements
of S are algebraic numbers of degree 3. In order to determine the morphism h, we have
to replace S by $g(S \cup S_1)$, where $S_1$ is the ramification set of g. In this case, since
$g'(x) = 6x^2 - 6ax$, the ramification set of g is $S_1 = \{\infty, 0, a\}$. We have
\[
S' = g(S \cup S_1) = \{0, 1, \infty, 1 - a^3\}.
\]
By our choice of S and g, this makes the second step redundant. To lower the cardinality,
we may take $A = 1 - a^3$, $B = a^3$ whence
\[
h(x) = (1 - a^3)^{-1 + a^3} (a^3)^{-a^3} x^{1 - a^3} (1 - x)^{a^3}.
\]
The composite map f = h ◦ g yields the desired morphism $f : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$ with the property
that f is unramified outside of $f^{-1}(\{0, 1, \infty\})$ and f(S) ⊂ {0, 1, ∞}.
If we further specialize a = −1, we find
\[
f(x) = \frac{(1 + 3x^2 + 2x^3)^2}{4x^2(3 + 2x)},
\qquad
1 - f(x) = -\frac{(1 + x)^4 (1 - 2x)^2}{4x^2(3 + 2x)},
\]
yielding the identity
\[
(1 + 3x^2 + 2x^3)^2 - (1 + x)^4 (1 - 2x)^2 = 4x^2(3 + 2x).
\]

In general, this procedure based on composition of maps quickly leads to rational functions
with gigantic degree and height. Note that the procedure followed here is not necessarily
the best; for example, if a = −1 the polynomial $h(x) = (x - 1)^2$ does the job as well,
with a corresponding $f(x) = (3x^2 + 2x^3)^2$ and polynomial identity
\[
x^4 (3 + 2x)^2 + (1 + x)^2 (1 - 2x)(1 + 3x^2 + 2x^3) = 1.
\]
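Both polynomial identities stated for a = −1 in this example can be confirmed by expanding the products. The sketch below uses plain coefficient lists (lowest degree first); the helper names are hypothetical and chosen only for this illustration.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, low degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_pow(p, n):
    out = [1]
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

g = [1, 0, 3, 2]            # g(x) = 1 + 3x^2 + 2x^3, the case a = -1
one_plus_x = [1, 1]
one_minus_2x = [1, -2]

# First identity: g^2 - (1 + x)^4 (1 - 2x)^2 = 4x^2 (3 + 2x)
square_term = poly_mul(poly_pow(one_plus_x, 4), poly_pow(one_minus_2x, 2))
lhs1 = poly_add(poly_pow(g, 2), [-c for c in square_term])
rhs1 = [0, 0, 12, 8]

# Second identity: x^4 (3 + 2x)^2 + (1 + x)^2 (1 - 2x) g = 1
lhs2 = poly_add(poly_mul([0, 0, 0, 0, 1], poly_pow([3, 2], 2)),
                poly_mul(poly_mul(poly_pow(one_plus_x, 2), one_minus_2x), g))

print(trim(lhs1) == rhs1, trim(lhs2) == [1])   # True True
```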

Theorem 12.2.9. The strong abc-conjecture over Q implies Roth’s theorem (see
Theorem 6.2.3), in the special case K = Q and S = {∞}.
Remark 12.2.10. The proof is constructive; hence an effective version of the
strong abc-conjecture implies the effective Roth theorem, indeed a stronger ver-
sion of it which takes into account arithmetic ramification (see Section 14.4).
Proof: Let α be an algebraic number of degree n ≥ 2 (the case n = 1 is trivial)
and let g(x) be its minimal polynomial over Z. We apply Belyĭ’s lemma from
12.2.7 to the morphism $g : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$ and the set S consisting of the roots of g(x),
obtaining a rational function h(x), defined over Q, such that the composition
f(x) = h(g(x)) has the following properties:

(a) the morphism $f : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$ is unramified outside of {0, 1, ∞};
(b) f(S) = h(0) ⊂ {0, 1, ∞}.

Without loss of generality, we may suppose that h(0) = 0. Let f(x) = u(x)/w(x),
where u, w are polynomials with integral coefficients without common factors,
and let v(x) := w(x) − u(x). Let d be the degree of the morphism f, hence
d = max(deg(u), deg(w)), and let
\[
U(X, Y) = Y^d u(X/Y), \quad V(X, Y) = Y^d v(X/Y), \quad W(X, Y) = Y^d w(X/Y)
\]
be the associated homogeneous forms of degree d. Note that U + V = W and
they have no common factors as well.
Consider the factorizations of U, V, W into irreducible factors $U_i$ of degree
$n_i \ge 1$, namely
\[
\begin{aligned}
U(X, Y) &= u_0\, U_1(X, Y)^{m_1} \cdots U_r(X, Y)^{m_r} \\
V(X, Y) &= v_0\, U_{r+1}(X, Y)^{m_{r+1}} \cdots U_s(X, Y)^{m_s} \\
W(X, Y) &= w_0\, U_{s+1}(X, Y)^{m_{s+1}} \cdots U_t(X, Y)^{m_t}
\end{aligned}
\]
in the ring Z[X, Y] and with $u_0, v_0, w_0 \in \mathbb{Z}$.
Since we assume h(0) = 0, we see that we may take $n_1 = n$ and $U_1(X, Y) =
G(X, Y) = Y^n g(X/Y)$, the irreducible homogeneous binary form associated to
the algebraic number α.
We need a simple lemma based on the theory of Weil heights from Section 2.4.
The height of f is by definition the height of u(x) + w(y).
Lemma 12.2.11. There is a positive integer D, bounded in terms of the height
and degree of the rational function f(x), such that, if k, l are coprime integers,
then
\[
\mathrm{GCD}(U(k, l), V(k, l), W(k, l)) \mid D.
\]
Proof: Let us consider the morphism $\varphi : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^2_{\mathbb{Q}}$ given by ϕ(x) = (U(x) :
V(x) : W(x)). Then we have $\varphi^* O_{\mathbb{P}^2}(1) \cong O_{\mathbb{P}^1}(d)$. In terms of the multiplicative
height H, Theorem 2.3.8 gives a constant C < ∞ with
\[
H(x)^d \le C\, H_\varphi(x).
\]
Thus setting x = (k : l), the above inequality yields
\[
\max(|k|, |l|)^d \le C\, \frac{\max\{|U(k, l)|, |V(k, l)|, |W(k, l)|\}}{\mathrm{GCD}(U(k, l), V(k, l), W(k, l))}
\le C'\, \frac{\max(|k|, |l|)^d}{\mathrm{GCD}(U(k, l), V(k, l), W(k, l))}
\]
with a constant C′ depending only on the height and the degree of the rational
function f. This proves the claim.
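Lemma 12.2.11 can be explored numerically. The sketch below is an illustration only: it takes the rational function f from Example 12.2.8 with a = −1 (so d = 6), writes down its homogeneous forms by hand, and collects the values of GCD(U(k, l), V(k, l), W(k, l)) over coprime pairs in a box whose size is an arbitrary choice.

```python
from math import gcd

def forms(k, l):
    """U, V, W for f = u/w with u = (1 + 3x^2 + 2x^3)^2 and w = 4x^2(3 + 2x), d = 6."""
    G = 2 * k**3 + 3 * k**2 * l + l**3      # G(X, Y) = Y^3 g(X/Y)
    U = G * G
    W = 4 * k**2 * l**3 * (3 * l + 2 * k)   # Y^6 w(X/Y)
    V = W - U                               # so that U + V = W
    return U, V, W

def common_gcds(bound):
    """Set of GCD(U, V, W) over coprime (k, l) with |k| <= bound, 1 <= l <= bound."""
    seen = set()
    for l in range(1, bound + 1):
        for k in range(-bound, bound + 1):
            if gcd(k, l) != 1:
                continue
            U, V, W = forms(k, l)
            seen.add(gcd(gcd(U, V), W))
    return seen

print(sorted(common_gcds(60)))
```

Only a short list of values should appear, however large the box is taken, in line with the lemma's claim that all of them divide one fixed integer D.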
Now we complete the proof of Theorem 12.2.9 as follows.
We may suppose that α is real. Let k/l be a rational approximation to α. We may
assume k, l coprime, l > 0 and that U(k, l)V(k, l)W(k, l) ≠ 0. We abbreviate
\[
D_0 := \mathrm{GCD}(U(k, l), V(k, l), W(k, l)),
\]
set
\[
a = U(k, l)/D_0, \quad b = V(k, l)/D_0, \quad c = W(k, l)/D_0
\]
and apply the abc-inequality to the relation a + b = c. The radical rad(abc) is a
divisor of
\[
u_0 v_0 w_0\, U_1(k, l) U_2(k, l) \cdots U_t(k, l);
\]
therefore, recalling that $U_1(k, l) = G(k, l)$ and $|U_i(k, l)| \ll_f l^{\deg(U_i)}$, we get
\[
\mathrm{rad}(abc) \ll_f |G(k, l)|\, l^K,
\]
where
\[
K = -\deg(U_1) + \sum_{i=1}^{t} \deg(U_i) = -n + \sum_{i=1}^{t} \deg(U_i).
\]
Next, we note that, since f(α) = 0 and |α − k/l| is assumed to be sufficiently
small, we must have that k/l is bounded away from the zeros of v. It then follows
\[
|V(k, l)| = l^d \left| v\!\left( \frac{k}{l} \right) \right| \gg_f l^d.
\]
By Lemma 12.2.11, we conclude $|b| \gg_f l^d$, with an implied constant depending
only on the height and degree of f(x).
In view of these considerations, the abc-inequality yields
\[
l^d \ll_{\varepsilon, f} \left( |G(k, l)|\, l^K \right)^{1 + \varepsilon}
\]
whence
\[
|G(k, l)| \gg_{\varepsilon, f} l^{\,d - K - d\varepsilon/(1 + \varepsilon)}.
\]
It remains to evaluate d − K. To this end, we apply Hurwitz’s theorem from
B.4.6 to the ramified covering $f : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$. The ramification occurs only over
{0, 1, ∞} and we get
\[
-2 = d \cdot (-2) + \sum_{i=1}^{t} (m_i - 1) \deg(U_i).
\]
Moreover, we have
\[
3d = \sum_{i=1}^{t} m_i \deg(U_i)
\]
leading to
\[
\sum_{i=1}^{t} \deg(U_i) = d + 2
\]
and finally $d - K = d + n - \sum_{i=1}^{t} \deg(U_i) = n - 2$. We conclude that
\[
|G(k, l)| \gg_{\varepsilon, f} l^{\,n - 2 - \varepsilon d/(1 + \varepsilon)}. \tag{12.2}
\]
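The degree bookkeeping behind d − K = n − 2 can be illustrated on the rational function f of Example 12.2.8 with a = −1. The factor data below were read off by hand from U = G², V = −(X + Y)⁴(Y − 2X)², W = 4X²Y³(3Y + 2X); they are an assumption of this sketch rather than something stated in the proof.

```python
# (multiplicity m_i, degree deg(U_i)) of the irreducible factors of U, V, W
# for f = (1 + 3x^2 + 2x^3)^2 / (4x^2(3 + 2x)), homogenized in degree d = 6.
d = 6
n = 3  # degree of G(X, Y) = Y^3 g(X/Y)
factors = {
    "U": [(2, 3)],                     # G^2
    "V": [(4, 1), (2, 1)],             # -(X + Y)^4 (Y - 2X)^2
    "W": [(2, 1), (3, 1), (1, 1)],     # 4 X^2 Y^3 (3Y + 2X)
}
all_factors = [pair for group in factors.values() for pair in group]

total_with_mult = sum(m * deg for m, deg in all_factors)      # expected 3d
ramification = sum((m - 1) * deg for m, deg in all_factors)   # expected 2d - 2
sum_degrees = sum(deg for _, deg in all_factors)              # expected d + 2
K = sum_degrees - n

print(total_with_mult, ramification, sum_degrees, d - K)   # 18 10 8 1
```

The last printed value equals n − 2 = 1, as in the general computation.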
Since $G(k, l) = l^n g(k/l)$ and g(α) = 0, the mean-value theorem shows that
\[
G(k, l) = l^n g(k/l) = l^n \left( g(k/l) - g(\alpha) \right) = l^n \left( \frac{k}{l} - \alpha \right) g'(\xi)
\]
for some point ξ between k/l and α. Since k/l is close to α and g′(α) ≠ 0, we
see that |g′(ξ)| is bounded away from 0, giving
\[
\left| \frac{k}{l} - \alpha \right| \gg_g l^{-n} |G(k, l)|.
\]
Comparison with (12.2) yields the theorem. 


As already noted by M. Langevin [176], [177] and as we will further explain in
Remark 14.4.17, this argument proves much more than Roth’s theorem, namely
Theorem 12.2.12. Let ε > 0 and F(x, y) ∈ Z[x, y] be a homogeneous polynomial
of degree d with distinct linear factors over C. Then for all coprime integers
m, n with F(m, n) ≠ 0, the strong abc-conjecture over Q implies
\[
\mathrm{rad}(F(m, n)) \gg_{\varepsilon, F} \max(|m|, |n|)^{d - 2 - \varepsilon}.
\]

In particular, if we take F(x, y) = xy(x + y), we recover the strong abc-conjecture
over Q. If we apply Theorem 12.2.12 to the homogenization of the minimal polynomial
of an algebraic number, then we easily deduce Roth’s theorem over Q and
S = {∞}. An immediate consequence of Theorem 12.2.12 is
Corollary 12.2.13. Let ε > 0 and f(x) ∈ Z[x] be a polynomial of degree d, with
distinct roots over C. Then the strong abc-conjecture over Q implies
\[
\mathrm{rad}(f(n)) \gg_{\varepsilon, f} |n|^{d - 1 - \varepsilon}
\]
for non-zero n ∈ Z with f(n) ≠ 0.
Proof: The polynomial $F(x, y) = y^{d+1} f(x/y)$ is a homogeneous polynomial of
degree d + 1 with distinct linear factors over C. Then apply the preceding theorem
to F(n, 1) = f(n).
Example 12.2.14. The following conditional result of analytic number theory is an unusual
application of the abc-conjecture, due to A. Granville [128].
Let f (x) ∈ Z[x] be a polynomial without multiple roots. Assume also that f (n) has no
fixed square divisor for n ∈ Z . Then it is conjectured that f (n) takes squarefree values
infinitely often, in fact for a set of integers of positive density. This is easy to prove for
degree 1 , not difficult for degree 2 using sieve methods, and more delicate arguments can
be used for dealing with degree 3 , but not much more is known for larger degrees. We have
Theorem 12.2.15. Assume the strong abc-conjecture and let f(x) be as above. Then
the sequence $(f(n))_{n \in \mathbb{N}}$ contains infinitely many squarefree integers. More precisely, the
sequence of positive integers n such that f(n) is squarefree has positive density
\[
|\{n \le x \mid f(n) \text{ is squarefree}\}| \sim c(f)\, x
\]
for some constant c(f) > 0.
Proof: We may assume that f is not a constant. Let ω(p) be the number of solutions of
the congruence $f(a) \equiv 0 \pmod{p^2}$, hence the integers n with f(n) not divisible by $p^2$
form a sequence of density
\[
c_p(f) = 1 - \frac{\omega(p)}{p^2}.
\]
If $c_p(f) = 0$, then $p^2$ divides f(n) for every n, which was excluded by hypothesis.
If p does not divide the discriminant of f, any solution of the congruence f(a) ≡ 0
(mod p) lifts uniquely to a solution (mod $p^2$), as we see using Hensel’s lemma in 1.2.10.
This proves ω(p) ≤ deg(f) and hence the infinite product $\prod_p c_p(f)$ is absolutely
convergent.
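The local densities $c_p(f) = 1 - \omega(p)/p^2$ are easy to compute by brute force. The sketch below is a numerical illustration only; the sample polynomial f(n) = n² + 1, the cut-offs 1000 and 50, and all function names are assumptions made here.

```python
def f(n):
    return n * n + 1          # sample polynomial without multiple roots

def omega(poly, p):
    """Number of residues a mod p^2 with poly(a) divisible by p^2."""
    q = p * p
    return sum(1 for a in range(q) if poly(a) % q == 0)

def is_squarefree(n):
    n = abs(n)
    if n == 0:
        return False
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

def primes_up_to(z):
    sieve = [True] * (z + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(z ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, z + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

N = 1000
observed = sum(1 for n in range(1, N + 1) if is_squarefree(f(n))) / N

partial_product = 1.0
for p in primes_up_to(50):
    partial_product *= 1 - omega(f, p) / p**2   # truncated product of c_p(f)

print(observed, partial_product)
```

For this f the two printed numbers should be close to each other, which is what the density statement predicts; for degree 2 this is known unconditionally, as remarked in Example 12.2.14.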

Let x be a large integer and let $M := \prod_{p \le \sqrt{\log x}} p^2$. By the Chinese remainder theorem,
in any interval {k, . . . , k + M − 1} of M consecutive integers there are exactly
$\prod_{p \le \sqrt{\log x}} \left( p^2 - \omega(p) \right)$ integers n such that f(n) is not divisible by $p^2$ for every prime
$p \le \sqrt{\log x}$. As usual, we denote the number of primes up to z by π(z). By the elementary
estimate $\pi(z) \ll z/\log z$, we have $\sum_{p \le z} \log p \le \pi(z) \log z \ll z$; hence
\[
M = e^{O(\sqrt{\log x})} = o(x).
\]
Therefore, the number of integers n ≤ x for which f(n) has no prime factors $p^2 \mid f(n)$
with $p \le \sqrt{\log x}$ is equal to
\[
\left[ \frac{x}{M} \right] M \prod_{p \le \sqrt{\log x}} \left( 1 - \frac{\omega(p)}{p^2} \right) + O(M) \sim c(f) \cdot x,
\]
where $c(f) := \prod_p c_p(f)$ is non-zero, as we have seen above.

The number of integers n ≤ x for which $p^2 \mid f(n)$ and $\sqrt{\log x} < p < x$ is majorized by
\[
\sum_{p > \sqrt{\log x}} \frac{\omega(p)}{p^2} \cdot x + \sum_{p < x} \omega(p) \ll \frac{x}{\sqrt{\log x}}
\]
which, again because of the elementary estimate $\pi(x) \ll x/\log x$, is also negligible in our
counting.
It remains to show that the sequence of integers n, for which $p^2 \mid f(n)$ for some prime
p ≥ n, has zero density. This is the difficult step and sieve methods, combined with
additional ingenious ideas, have been used to prove this for polynomials f(x) of degree at
most 3.
Unfortunately, this approach fails if the degree of f(x) is 4 or more. However, on the assumption
of the abc-conjecture we can use a clever trick and Corollary 12.2.13 to conclude
the proof in a single stroke. We choose an integer m larger than the distance of any two
roots of f(x) and let l be another positive integer, which is at our disposal. Consider the
polynomial
\[
g(x) = f(x) f(x + m) \cdots f(x + lm)
\]
and note that the assumption on m ensures that g(x) too has no multiple roots.
By Corollary 12.2.13, the strong abc-conjecture implies that
\[
\prod_{p \mid g(n)} p \gg_{g, \varepsilon} |n|^{\deg(g) - 1 - \varepsilon}
\]
for n ∈ Z with g(n) ≠ 0. Therefore, if $g(n) = uv^2$ we must have
\[
|uv| \gg_{g, \varepsilon} |n|^{\deg(g) - 1 - \varepsilon}.
\]
Noting that $g(n) = uv^2$ has precise order $|n|^{\deg(g)}$, we get $|v| \ll_{g, \varepsilon} |n|^{1 + \varepsilon}$.
For n sufficiently large, this shows that of the integers f(n), f(n + m), f(n + 2m), . . . ,
f(n + lm) only one can be divisible by $p^2$ for some prime p ≥ n and, splitting the integers
into m progressions modulo m, only m of the integers f(n), f(n + 1), . . . , f(n + lm)
may admit such a square factor. In particular, the density of integers n for which f(n)
admits a square factor $p^2$ with p ≥ n is at most m/(lm) = 1/l. Since l can be chosen
arbitrarily large, the set of such integers n has density 0, concluding the proof.

12.3. Belyĭ’s theorem

Our next goal is Belyĭ’s striking theorem that a complex projective curve is defined over $\overline{\mathbb{Q}}$
if and only if it is a covering over $\mathbb{P}^1_{\mathbb{C}}$, unramified outside of {0, 1, ∞}. It will be of minor
importance in our book and the reader may skip it in a first reading.
We begin by gathering the necessary facts about coverings, referring for details to W.S.
Massey [196], Ch.5, and J.B. Conway [70], Ch.16. The reader is assumed to be familiar
with the connexion between algebraic and analytic structures on a smooth complex variety
as provided by A.14. Then we will prove Belyı̆’s theorem using the language of schemes.
An extension of the last part of the proof will yield a result of Grothendieck.
12.3.1. A topological covering is a continuous map π : Y → X of non-empty topological
spaces which is locally trivial, i.e. every x ∈ X has an open neighbourhood U and a
discrete non-empty topological space F such that π −1 (U ) is homeomorphic to U × F
with π corresponding to the first projection.
12.3.2. A morphism of topological coverings π1 : Y1 → X , π2 : Y2 → X is a continu-
ous map ϕ : Y1 → Y2 with π1 = π2 ◦ ϕ . If Y1 = Y2 and ϕ is a homeomorphism, then
ϕ is called an automorphism of the covering. The group of automorphisms of a topological
covering is called the covering group.
12.3.3. We always assume in this section that X is a connected locally contractible topological
space with a base point $x_0 \in X$. The fundamental group $\pi_1(X, x_0)$ is the group
of homotopy classes of loops with origin $x_0$. For a topological covering π : Y → X,
the fundamental group $\pi_1(X, x_0)$ acts from the right on the fibre $\pi^{-1}(x_0)$. Explicitly, for
$y \in \pi^{-1}(x_0)$ and $\gamma \in \pi_1(X, x_0)$, we define $y^{\gamma}$ as the end point of the unique lift of γ to
Y with starting point y. Every morphism of topological coverings of X restricts to a map
of fibres over $x_0$ compatible with the action of $\pi_1(X, x_0)$.
12.3.4. There is a bijective correspondence between isomorphism classes of connected
topological coverings with base point over x0 and subgroups of π1 (X, x0 ) , similar to the
Galois correspondence in the theory of algebraic field extensions. A connected topological
covering π : Y → X with base point y0 over x0 corresponds to the subgroup π1 (Y, y0 )
of π1 (X, x0 ) . Note that the identification of the elements of π1 (Y, y0 ) with its images in
X is allowed because this homomorphism is injective. If ϕ : Y1 → Y2 is a morphism of
connected topological coverings mapping the base point of Y1 to the base point of Y2 , then
it has to be surjective and induces an inclusion H1 ⊂ H2 of corresponding fundamental
groups. Moreover, every inclusion of subgroups arises this way and the corresponding ϕ is
unique up to isomorphisms.
12.3.5. Let Y be a connected topological covering of X with base point $y_0$ over $x_0$. Then
the (right) action of $G := \pi_1(X, x_0)$ on the fibre $\pi^{-1}(x_0)$ is transitive. The stabilizer H
of $y_0$ is equal to $\pi_1(Y, y_0)$. Let $N(H) := \{g \in G \mid gHg^{-1} = H\}$ be the normalizer
of H in G. Then N(H)/H is isomorphic to the covering group Γ. In fact, the choice of
$y_0$ leads to an identification of $\pi^{-1}(x_0)$ with the right coset space H\G and N(H)/H
operates from the left giving rise to the isomorphism with Γ.
If H is a normal subgroup of G, then Γ ≅ G/H operates transitively and freely on
$\pi^{-1}(x_0)$ and we may write X = Γ\Y, as a quotient of Y by the left Γ-action.
12.3.6. If we change the base point from $x_0$ to $x_1$, then we choose a path ρ from $x_0$ to
$x_1$ getting an isomorphism
\[
\pi_1(X, x_0) \longrightarrow \pi_1(X, x_1), \quad \gamma \mapsto \rho^{-1} \gamma \rho,
\]
where $\rho^{-1}$ is obtained from ρ by following the reverse direction. Now fixing $x_0$ but
varying the base point $y_0$ of Y in 12.3.4, we get a conjugated subgroup $\pi_1(Y, y_1)$ of
$\pi_1(Y, y_0)$ in $\pi_1(X, x_0)$. Hence there is a bijective correspondence between isomorphism
classes of connected topological coverings of X and conjugation classes of subgroups of
$\pi_1(X, x_0)$.
The connected topological covering corresponding to the trivial subgroup {1} is the universal
covering $\widetilde{X}$ of X. By 12.3.4 and 12.3.5, the covering group $\widetilde{\Gamma}$ of $\widetilde{X}$ is isomorphic
to $\pi_1(X, x_0)$ and there is a bijective correspondence between conjugation classes of subgroups
Γ of $\widetilde{\Gamma}$ and isomorphism classes of connected topological coverings Y of X given
by $Y = \Gamma \backslash \widetilde{X}$.

The covering Y is called finite if the number of fibre points is finite. The number of fibre
points in Y of X is equal to the index $[\widetilde{\Gamma} : \Gamma]$, hence the covering Y is finite over X if
and only if Γ has finite index in $\widetilde{\Gamma}$.
12.3.7. Now let X be a connected Riemann surface, in other words a connected complex
manifold of dimension 1 . Then every topological covering π : Y → X has a canonical
analytical structure: By 12.3.1, there is an analytic atlas (Uι )ι∈I of X such that Y is
locally trivial over Uι . Then we use the atlas (π −1 (Uι ))ι∈I of Y to define Y as a complex
manifold. Moreover, it is clear that every morphism of topological coverings of X will be
analytic.
12.3.8. Let us also assume that X is algebraic and π : Y → X is a finite connected
topological covering. We have seen in 12.3.7 that the covering is analytic. Let $\overline{X}$ be
the irreducible smooth projective curve containing X as an open subvariety (see A.13.3).
We use the complex manifold structure on X and $\overline{X}$. A local application of 12.3.6 to
{z ∈ C | 0 < |z| < 1} proves that π extends uniquely to a finite ramified covering
$\overline{\pi} : \overline{Y} \to \overline{X}$, namely to a holomorphic map of connected Riemann surfaces locally of the
form $z \mapsto z^n$ for some n ∈ N, n ≥ 1.
The Riemann existence theorem says that every compact Riemann surface is projective
algebraic (see [148], Th.B.3.1). This applies to $\overline{Y}$ and the GAGA-principle in A.14.7 shows
that $\overline{\pi}$ is algebraic. We conclude that every finite topological covering of X is algebraic
and that every morphism of topological coverings of X extends uniquely to a ramified
algebraic morphism of the smooth projective compactifications. As an algebraic morphism,
π is finite (because $\overline{\pi}$ is finite) and also étale. The latter follows from the fact that π is
analytically a local isomorphism and from A.12.18.
12.3.9. Conversely, we claim that every finite étale algebraic morphism π of a variety X′
onto a smooth complex variety X gives rise to a finite topological covering $\pi_{an} : X'_{an} \to
X_{an}$. To see this, note first that $\pi_{an}$ is a local isomorphism (use A.12.18). For x ∈ X with
$\pi_{an}^{-1}(x) = \{x_1, \dots, x_n\}$, we get disjoint open neighbourhoods $U_j$ of $x_j$ in the complex
topology such that $\pi_{an}$ maps $U_j$ biholomorphically onto an open neighbourhood U of x.
We need to prove, for U sufficiently small, that $\pi_{an}^{-1}(U) = U_1 \cup \dots \cup U_n$, yielding local
triviality of $\pi_{an}$, but this is an easy consequence of properness of $\pi_{an}$ (cf. A.14.6).

12.3.10. The universal covering space of $\mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$ is the upper half plane $\mathbb{H} :=
\{z \in \mathbb{C} \mid \operatorname{Im} z > 0\}$. The covering map is the modular function λ known from the
proof of Picard’s little theorem (see L. Ahlfors [8]; in 13.2.35 we give another proof using
Nevanlinna theory). Let Γ(2) be the kernel of the reduction modulo 2 on SL(2, Z) and let
$\overline{\Gamma}(2)$ be the image of Γ(2) in PSL(2, Z) = SL(2, Z)/{±1}. Then the covering group of
$\lambda : \mathbb{H} \to \mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$ is $\overline{\Gamma}(2)$. The next result is Belyĭ’s theorem:
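As a small sanity check on the size of Γ(2), one can count the matrices in SL(2, Z/2Z) by brute force. The sketch below relies on a standard fact that is not stated in the text: reduction modulo 2 maps SL(2, Z) onto SL(2, Z/2Z), so the index of Γ(2) in SL(2, Z) equals the number of matrices counted. The function name is hypothetical.

```python
from itertools import product

def sl2_mod(N):
    """All 2x2 matrices (a, b, c, d) over Z/N with determinant 1 mod N."""
    return [(a, b, c, d)
            for a, b, c, d in product(range(N), repeat=4)
            if (a * d - b * c) % N == 1 % N]

def reduces_to_identity(matrix, N=2):
    a, b, c, d = matrix
    return (a % N, b % N, c % N, d % N) == (1, 0, 0, 1)

print(len(sl2_mod(2)))                      # 6
print(reduces_to_identity((1, 2, 0, 1)),    # True: this matrix lies in Gamma(2)
      reduces_to_identity((1, 1, 0, 1)))    # False: this one does not
```

Since −1 lies in Γ(2), the same count gives the index of the image of Γ(2) in PSL(2, Z).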

Theorem 12.3.11. Let C be an irreducible smooth projective curve defined over C. The
following conditions are equivalent:

(a) There is a curve C′ defined over $\overline{\mathbb{Q}}$ with C isomorphic to the base change $C'_{\mathbb{C}}$.
(b) There exists a non-constant rational function $f : C \to \mathbb{P}^1_{\mathbb{C}}$ ramified at most over
three points.
(c) There is a subgroup Γ of finite index in $\overline{\Gamma}(2)$ such that $\Gamma \backslash \mathbb{H}$ is isomorphic to $U_{an}$
for a Zariski open subset U of C.

Moreover, if (a) holds, if we identify $C'_{\mathbb{C}}$ with C, and if S is a finite subset of $C'(\overline{\mathbb{Q}})$, then
Γ in (c) can be chosen such that S ⊂ C \ U.

Proof: (a) ⇒ (b). This follows from Belyĭ’s lemma in 12.2.7.

(b) ⇒ (c). Let $f : C \to \mathbb{P}^1_{\mathbb{C}}$ be the rational function in (b); we may assume, after a
projective linear transformation of $\mathbb{P}^1_{\mathbb{C}}$, that f is unramified outside of {0, 1, ∞}. For
$U := f^{-1}(\mathbb{P}^1_{\mathbb{C}} \setminus \{0, 1, \infty\})$, the map $f : U_{an} \to \mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$ is a finite unramified
connected covering. Since $\mathbb{H}$ is the universal covering space of $\mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$ with
covering group $\overline{\Gamma}(2)$ (cf. 12.3.10), there is a subgroup Γ of $\overline{\Gamma}(2)$ of finite index such that
$U_{an} \cong \Gamma \backslash \mathbb{H}$ (cf. 12.3.6). This proves (c).
Moreover, if (a) holds and S is a finite subset of $C'(\overline{\mathbb{Q}})$, then Belyĭ’s lemma proves that
there is a non-constant rational function f on C unramified outside of {0, 1, ∞} with
f(S) ⊂ {0, 1, ∞}. With this f in the proof of the implication (b) ⇒ (c), we get S ⊂
C \ U.
(c) ⇒ (a). Let Γ be a subgroup of $\overline{\Gamma}(2)$ of finite index. By 12.3.6 and 12.3.10, we have a
finite topological covering $f : U_{an} = \Gamma \backslash \mathbb{H} \to \mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$.
We first give a brief sketch of the proof. Evidently, f is defined over a finitely generated
$\overline{\mathbb{Q}}$-algebra R ⊂ C. Then the affine $\overline{\mathbb{Q}}$-variety S with coordinate ring R parametrizes a
family $(f_s)_{s \in S}$ of coverings of $\mathbb{P}^1 \setminus \{0, 1, \infty\}$ with generic fibre f. By shrinking S, we
will show that this family is étale leading to unramified coverings. Proving that the fibres
have the same monodromy, we will conclude that the coverings are isomorphic. Comparing
the generic fibre with the fibre over $s \in S(\overline{\mathbb{Q}})$, we will get the claim.
The details are as follows. By 12.3.8, f is a finite étale algebraic morphism and therefore
U is affine. The coordinate ring of $X := \mathbb{P}^1 \setminus \{0, 1, \infty\}$ over C is
\[
\mathbb{C}[X] = \mathbb{C}\left[ x, \frac{1}{x}, \frac{1}{x - 1} \right]
\]
and C[U] is a finitely generated C[X]-module. There are variables $\mathbf{y} = (y_1, \dots, y_N)$
and $\mathbf{x} = (x, \frac{1}{x}, \frac{1}{1 - x})$ such that $\mathbb{C}[U] = \mathbb{C}[\mathbf{x}, \mathbf{y}]/I$ for an ideal I. By finiteness, we may
assume that $y_1, \dots, y_N$ generate C[U] as a C[X]-module, hence
\[
y_i y_j - \sum_{k=1}^{N} \lambda_k(\mathbf{x})\, y_k \in I \tag{12.3}
\]
for suitable $\lambda_k(\mathbf{x}) \in \mathbb{C}[X]$. Hence I is generated by polynomials $p_1(\mathbf{x}, \mathbf{y}), \dots, p_r(\mathbf{x}, \mathbf{y})$
consisting of all polynomials in (12.3) and some other polynomials of degree 1 in $\mathbf{y}$. The
coefficients of these polynomials are contained in a subring R of C containing $\overline{\mathbb{Q}}$ such that
R is a finitely generated $\overline{\mathbb{Q}}$-algebra. We note that R is an integral domain.
Now it is convenient to use the language of schemes. We consider the affine scheme

U0 := Spec (R[x, y]/(p1 (x, y), . . . , pr (x, y)))

of finite type over the integral affine scheme S := Spec(R) and let us denote by π :
U0 → S the morphism of structure. By construction, we have a canonical finite morphism
f0 : U0 → XR defined over R whose base change from R to C is f . Note that R ⊂
C induces a geometric point z ∈ S(C) , namely a morphism z : Spec(C) → S . By
construction, the fibre (f0 )z of f0 over z is f and hence étale. The image of z is the
generic point ζ of S and (f0 )z is the base change of the fibre (f0 )ζ to C . By flat descent,
we conclude that (f0 )ζ is an étale morphism (cf. A. Grothendieck and J.A. Dieudonné
[137], Prop.17.7.1). Obviously, (f0 )ζ is also surjective.
Note that $f_0$ is flat in every point u over ζ. This follows from the fact that $O_{U_0, u}$ is
canonically isomorphic to the local ring of u in the fibre $(U_0)_\zeta$, from the similar fact for
$f_0(u)$ and from flatness of $(f_0)_\zeta$ (for details about fibres, cf. [139], 3.4). Clearly, $f_0$ is
also unramified in u because this is a property of the fibre over π(u) = ζ. This implies
that $f_0$ is étale in all the points of $(U_0)_\zeta$. The étale points of $f_0$ form an open subset $V_0$
of $U_0$ (cf. B.3.2).
Note that there is an open dense subset $S_0$ of S such that $\pi^{-1}(S_0) \subset V_0$. This follows
from a theorem of Chevalley stating that the image of a morphism is a constructible subset,
i.e. a finite disjoint union of intersections of a closed and an open subset (cf. [148],
Ex.s II.3.18, II.3.19). Since $(U_0)_\zeta \subset V_0$, we get $\zeta \notin T_0 := \pi(U_0 \setminus V_0)$. Then the constructibility
of $T_0$ implies that ζ is not contained in the closure $\overline{T_0}$ of $T_0$ in S and, by setting
$S_0 := S \setminus \overline{T_0}$, we get our claim.
Now we consider $S_0$ as an irreducible variety over $\overline{\mathbb{Q}}$ and, by passing to a dense open
subset, we may assume that $S_0$ is smooth (cf. A.7.16). Thus $X \times S_0$ is a smooth irreducible
variety over $\overline{\mathbb{Q}}$ (cf. A.4.11 and A.7.17). Since the restriction of $f_0$ to $\pi^{-1}(S_0)$ is étale, we
conclude that $\pi^{-1}(S_0)$ is a smooth variety over $\overline{\mathbb{Q}}$ (cf. [137], Prop.17.3.3). By Hilbert’s
Nullstellensatz in A.2.2, there is an algebraic point $s \in S_0(\overline{\mathbb{Q}})$.
By base change of $f_0$ from $\overline{\mathbb{Q}}$ to C, we obtain a finite étale morphism
\[
F : \pi^{-1}(S_0)_{\mathbb{C}} \longrightarrow X \times (S_0)_{\mathbb{C}}
\]
over $(S_0)_{\mathbb{C}}$. By construction, the fibre $F_z$ is f and the fibre $F_s$ is equal to the base change
of $(f_0)_s$ from $\overline{\mathbb{Q}}$ to C. By 12.3.9 and connectedness of $U_{an}$, we conclude that $F_{an}$ is a
connected topological covering between associated complex manifolds.
We choose x ∈ X(C). Let ρ be a path in $(S_0)_{an}$ from z to s and let $\rho_x = \{x\} \times \rho$. For
$y \in F^{-1}(x, z)$, let $y^{\rho_x}$ be the endpoint of the lift of $\rho_x$ to a path with origin y.
The fibres of F over S are finite étale morphisms, therefore $(F_z)_{an}$ and $(F_s)_{an}$ are also
topological coverings of $X_{an} = \mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$. For $\gamma \in G := \pi_1(X_{an}, x)$, let $\gamma_z :=
\gamma \times \{z\}$ and $\gamma_s := \gamma \times \{s\}$. Obviously, we have
\[
\rho_x^{-1} \gamma_z \rho_x = \gamma_s \in \pi_1(X_{an} \times (S_0)_{an}, (x, s)).
\]
For $y \in F_z^{-1}(x) = F^{-1}(x, z)$, we get
\[
(y^{\gamma})^{\rho_x} = (y^{\gamma_z})^{\rho_x} = (y^{\rho_x})^{\gamma_s} = (y^{\rho_x})^{\gamma}
\]
meaning that the map $y \mapsto y^{\rho_x}$ is a G-equivariant bijection of $F^{-1}(x, z)$ onto $F^{-1}(x, s)$.
Because $(F_z)_{an} = f_{an}$ is a connected topological covering, G operates transitively on
$F^{-1}(x, z)$ and hence the same is true for the action on $F^{-1}(x, s)$. This proves that $(F_s)_{an}$
is also a connected covering. Moreover, it follows that both coverings correspond to conjugated
subgroups of G (given as stabilizers of the actions). By the Galois correspondence
in 12.3.6, the topological coverings $(F_z)_{an}$ and $(F_s)_{an}$ are isomorphic. By 12.3.8, the
isomorphism is algebraic. This proves (a) with $C' = (U_0)_s$.

Theorem 12.3.12. Let X be a variety over $\overline{\mathbb{Q}}$ and let ϕ be a finite topological covering of
$(X_{\mathbb{C}})_{an}$. Then there is a finite étale morphism ψ : Y → X of varieties over $\overline{\mathbb{Q}}$ such that
$(\psi_{\mathbb{C}})_{an} = \varphi$. Moreover, Y and ψ are unique up to isomorphism.

Proof: The arguments are similar to the proof of (c) ⇒ (a) for Belyı̆’s theorem in 12.3.11.
The details are left to the reader. We give the following hints:

(a) By passing to the connected components, we may assume that X and Y are both
connected.
(b) There is a unique way to endow Y with the structure of a complex space such that
ϕ is a finite analytic morphism which is locally biholomorphic.
(c) For algebraicity, we have to use Grothendieck’s generalization of the Riemann
existence theorem (cf. A. Grothendieck [140], Exposé XII, Th.5.1; see also [148],
Th.B.3.2 for the original version of Grauert–Remmert):

Theorem 12.3.13. Let us assign to every finite étale covering of a complex algebraic variety
Z the associated analytic morphism. Then this gives an equivalence from the category of
finite étale coverings of Z to the category of finite analytic coverings of Zan (meaning
finite morphisms which are locally biholomorphic).
12.4. Examples

We start with the abc-theorem for polynomials which is a direct consequence of
Hurwitz’s theorem and holds for ε = 0 and C(ε) = 1. The proof easily extends to
function fields and will imply the abc-theorem of W.W. Stothers and R.C. Mason
in Section 14.5. The analogy with number fields gives some evidence for the
abc-conjecture.
On the other hand, the theorem of C.L. Stewart and R. Tijdeman [291], which
will be our next goal, shows that the abc-conjecture over Q does not hold for
ε = 0 whatever constant C(ε) we choose. This is a clear instance that the analogy
between function fields and number fields can have some very subtle points to it
and should not be followed blindly as a wholesale tool for making conjectures. The
proof of this theorem relies only on Dirichlet’s pigeon-hole principle and the prime
number theorem. There is a connexion from the proof to the birthday paradox and
we sketch some heuristics of Granville on this theme.
The abc-theorem for polynomials has the following form:

Theorem 12.4.1. Let K be a field of characteristic 0 and let a(x), b(x), c(x) ∈
K[x] be not all constant, coprime in pairs and such that a(x) + b(x) + c(x) = 0.
Let rad(abc) be the monic polynomial with simple zeros at the zeros of abc. Then

max{deg(a), deg(b), deg(c)} ≤ deg(rad(abc)) − 1.
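The inequality of Theorem 12.4.1 can be tested on explicit triples. The sketch below is an illustration under stated assumptions: polynomials are coefficient lists over the rationals (lowest degree first), the degree of the radical is computed as deg p − deg gcd(p, p′), which is valid in characteristic 0, and the sample triple (x + 1)², −(x − 1)², −4x as well as all function names are choices made here.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lists are low degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def degree(p):
    return len(trim(p)) - 1          # the zero polynomial gets degree -1

def poly_mul(p, q):
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def poly_rem(p, q):
    """Remainder of p modulo q over the rationals (q must be non-zero)."""
    p = [Fraction(c) for c in trim(p)]
    q = [Fraction(c) for c in trim(q)]
    while len(p) >= len(q):
        factor, shift = p[-1] / q[-1], len(p) - len(q)
        for i, c in enumerate(q):
            p[i + shift] -= factor * c
        p = trim(p)
    return p

def poly_gcd(p, q):
    p, q = trim(p), trim(q)
    while q:
        p, q = q, poly_rem(p, q)
    return p

def radical_degree(p):
    """deg rad(p) = deg p - deg gcd(p, p') in characteristic 0."""
    return degree(p) - degree(poly_gcd(p, derivative(p)))

a = [1, 2, 1]        # (x + 1)^2
b = [-1, 2, -1]      # -(x - 1)^2
c = [0, -4]          # -4x, so that a + b + c = 0
lhs = max(degree(a), degree(b), degree(c))
rhs = radical_degree(poly_mul(poly_mul(a, b), c)) - 1
print(lhs, rhs)      # 2 2
```

For this triple the two sides agree, so the bound is attained.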

We give here two simple proofs, which will be reinterpreted and extended to more
general cases in Section 14.5.
First proof: Let us first observe that on the right-hand side, we count the roots
z without their multiplicity $\mathrm{ord}_z$. Clearly, we may assume that K is algebraically
closed and, by a permutation, deg(a) = deg(b) ≥ deg(c).
The hypotheses of the theorem imply that none of a, b , c is identically 0 and the
rational functions u := −a/c, v := −b/c are not constant, with

u + v = 1.

First, note that the degree of u as a morphism P1 → P1 satisfies

deg(u) = max{deg(a), deg(b), deg(c)}.

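For the ramification divisor $R_u$, we have
\[
\mathrm{ord}_z(R_u) =
\begin{cases}
\mathrm{ord}_z(u) - 1 & \text{if } z \in u^{-1}(0), \\
\mathrm{ord}_z(v) - 1 & \text{if } z \in u^{-1}(1), \\
-\mathrm{ord}_z(u) - 1 & \text{if } z \in u^{-1}(\infty).
\end{cases}
\]
By Hurwitz’s theorem in B.4.6, we have
\[
-2 \ge \deg(u) \cdot (-2) + \sum_{z \in u^{-1}(\{0, 1, \infty\})} \mathrm{ord}_z(R_u).
\]
Since deg(u) = deg(v), u − 1 = −v and
\[
\sum_{z \in u^{-1}(\lambda)} \mathrm{ord}_z(u - \lambda) = \deg(u) \ \text{ if } \lambda \ne \infty,
\qquad
\sum_{z \in u^{-1}(\infty)} \mathrm{ord}_z(u) = -\deg(u),
\]
we conclude that
\[
-2 \ge \deg(u) - \left| u^{-1}(\{0, 1, \infty\}) \right|. \tag{12.4}
\]
It remains to give an upper bound for $\left| u^{-1}(\{0, 1, \infty\}) \right|$.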
We distinguish two cases. Suppose first that deg(u) = deg(a) = deg(b) =
deg(c). Then

supp(u−1 (0)) = {z ∈ K | a(z) = 0},


supp(u−1 (1)) = {z ∈ K | b(z) = 0},
supp(u−1 (∞)) = {z ∈ K | c(z) = 0},
and, since a, b , c are coprime, we get
−2 ≥ deg(u) − deg(rad(abc)),
which is even better than the conclusion of the theorem.
Otherwise, if deg(u) = deg(a) = deg(b) > deg(c), then the support of u−1 (∞)
must be increased by adding the point ∞ . This increases the final counting by 1,
completing the proof. 
Second proof: Replacing c by −c, consider the equation a(x) + b(x) = c(x) and differentiate with
respect to x. We denote differentiation by ′. Then
\[
\begin{aligned}
1 \cdot a + 1 \cdot b + 1 \cdot (-c) &= 0 \\
\frac{a'}{a} \cdot a + \frac{b'}{b} \cdot b + \frac{c'}{c} \cdot (-c) &= 0,
\end{aligned}
\]
which we view as a homogeneous linear system of two equations with a solution
(a, b, −c). The associated matrix is
\[
\begin{pmatrix}
1 & 1 & 1 \\
\dfrac{a'}{a} & \dfrac{b'}{b} & \dfrac{c'}{c}
\end{pmatrix}.
\]
Clearly, its rank is 2, otherwise $\frac{a'}{a} = \frac{b'}{b} = \frac{c'}{c}$, contradicting the assumption that a,
b, c are coprime and not all constant. By Cramer’s rule, it follows that the solution
(a, b, −c) of the above linear system is proportional to the vector of cofactors of
the matrix. Let λ ∈ K(x) be the associated proportionality factor, hence
\[
\begin{aligned}
\frac{c'}{c} - \frac{b'}{b} &= \lambda \cdot a, \\
\frac{a'}{a} - \frac{c'}{c} &= \lambda \cdot b, \\
\frac{b'}{b} - \frac{a'}{a} &= \lambda \cdot (-c).
\end{aligned}
\]
Recall that rad(abc) is the monic polynomial with simple zeros at the zeros of
abc. By the basic property d log(f g) = d log(f ) + d log(g), it is clear that
the product of rad(abc) with each cofactor on the left is a polynomial. Since
a, b, c are coprime, we conclude that the denominator of λ must be a divisor
of rad(abc). Since each cofactor vanishes at ∞ , taking degrees we infer that
max(deg(a), deg(b), deg(c)) ≤ deg(rad(abc)) − 1. 
Remark 12.4.2. Theorem 12.4.1 is sharp precisely whenever u := −a/c defines
a Belyı̆ morphism u : P1an → P1an ramified only over {0, 1, ∞} and a, b , c are
not all of the same degree. This is clear from the first proof of the theorem, because
additional ramification will contribute an additional positive term to the right-hand
side of (12.4).
Remark 12.4.3. A more natural geometric formulation of Theorem 12.4.1 is obtained
by replacing a, b, c by non-zero elements of the function field K(x) with
a + b + c = 0 and introducing the obvious projective height function
\[
h((a : b : c)) = -\sum_{z \in \mathbb{P}^1(K)} \min(\mathrm{ord}_z(a), \mathrm{ord}_z(b), \mathrm{ord}_z(c)),
\]
with $\mathrm{ord}_z = -\deg$ if z is the point at ∞ of $\mathbb{P}^1(K)$, thus with a contribution at
z = ∞ of max(deg(a), deg(b), deg(c)). Then, if S is a finite subset of $\mathbb{P}^1(K)$
such that a, b, c are S-units, Theorem 12.4.1 is equivalent to the inequality
\[
h((a : b : c)) \le |S| - 2.
\]
Equality holds if and only if u := −a/c defines a Belyı̆ morphism and S is the
set of valuations for which either u or 1 − u are not units.
This formulation admits a natural extension in which we replace the field of ra-
tional functions K(x) by a function field in one variable, of characteristic 0 and
genus g . Under the same hypotheses on a, b , c , we have the theorem of Stothers
and Mason
h((a : b : c)) ≤ |S| + 2g − 2,
which will be proved in Theorem 14.5.13.

The second proof of the abc-theorem for polynomials easily extends to prove the
general abc-theorem for polynomials.

Theorem 12.4.4. Let K be a field of characteristic 0 and let $a_i(x) \in K[x]$,
i = 1, . . . , n, where n ≥ 3, be polynomials such that $a_1(x) + \dots + a_n(x) = 0$.
Suppose that the polynomials $a_i(x)$ have no common zero for all i = 1, . . . , n
and also that no proper subsum of $a_1(x) + \dots + a_n(x)$ vanishes identically. Then
\[
\max_i \deg(a_i) \le \frac{(n-1)(n-2)}{2}\, \max\Bigl( \deg \operatorname{rad}\Bigl( \prod_{i=1}^{n} a_i \Bigr) - 1,\ 0 \Bigr).
\]

All these generalizations will be proved, in more detail and in a more general
setting, in Section 14.5.
Remark 12.4.5. The coefficient is sharp if n = 3 or 4. If n ≥ 5, it is not clear
what is the best coefficient in Theorem 12.4.4. The simple example
\[
\sum_{j=0}^{n-2} \binom{n-2}{j} x^{ej} - (x^e + 1)^{n-2} = 0
\]
shows that $\tfrac{1}{2}(n-1)(n-2)$ cannot be replaced by any constant smaller than n − 2.
Better lower bounds can be provided for n ≥ 4. The following clever example is
in J. Browkin and J. Brzeziński [51]. Let Pk (x) be the polynomial of degree k
such that
\[
\frac{x^{2k+1} - 1}{x - 1} = x^k P_k\Bigl( \frac{(x-1)^2}{x} \Bigr).
\]
It is easy to see that all roots of $P_k$ are negative, hence $P_k$ has positive coefficients.
If we take k = n − 3 and set $x^e$ in place of x, we obtain an identity
\[
x^{(2n-5)e} - 1 - \sum_{j=0}^{n-3} s_j (x^e - 1)^{2j+1} x^{(n-3-j)e} = 0
\]
with coefficients $s_j > 0$. This is a vanishing sum of n polynomials and no proper
subsum vanishes, as we verify for example by specializing x = 2, since then
only the first term in the identity is positive and all others are negative. Now the
maximum degree is (2n − 5)e, while the radical $(x^e - 1)x$ has degree e + 1.
Therefore, the best coefficient in Theorem 12.4.4 is at least 2n − 5. In particular,
the precise coefficient is 3 if n = 4.

The exponent 1 + ε is necessary in the abc -conjecture over Q , as we see by the following
interesting result, due to Stewart and Tijdeman [291].
Theorem 12.4.6. For every δ > 0, there are infinitely many triples a, b, c of coprime
positive numbers with a + b = c and
\[
c > \exp\Bigl( (4 - \delta) \frac{\sqrt{\log c}}{\log\log c} \Bigr) \prod_{p \mid abc} p.
\]

Note that the factor is of order $O(c^{\varepsilon})$ for any ε > 0 and hence this lower bound is
compatible with the strong abc-conjecture in 12.2.2.
12.4.7. The strategy of proof is quite simple. If we restrict a and c to integers whose
prime factors belong to a rather small set, then the radical of ac is small. On the other
hand, the number of such integers up to x is large, and an application of Dirichlet’s pigeon-
hole principle shows that we can make b = c − a fairly small. It does not matter much
which absolute value we use for this purpose, and we shall use the 2 -adic topology in our
argument. If b is small in this sense, then its radical is smaller than expected, thus giving a
non-trivial example of an (a, b, c) -triple.
We begin with counting the number of integers up to N all of whose prime factors belong
to a fixed set P , provided the set P is not too large.

Proposition 12.4.8. Let P be a finite set of primes and let $\vartheta_P = \sum_{p \in P} \log p$. Then the
number Ψ(N, P) of integers n ≤ N with prime factors only from P is bounded by
\[
\frac{(\log N)^{|P|}}{|P|!} \prod_{p \in P} (\log p)^{-1} \le \Psi(N, P) \le \frac{(\log N + \vartheta_P)^{|P|}}{|P|!} \prod_{p \in P} (\log p)^{-1}.
\]

Proof: Let $n = \prod_{p \in P} p^{a_p}$ and let us associate to n the integer vector $a = (a_p)_{p \in P}$. This
makes it clear that Ψ(N, P) is the number of lattice points in
\[
T(\log N) := \Bigl\{ x \in \mathbb{R}^{P} \Bigm| \sum_{p \in P} (\log p)\, x_p \le \log N,\ x_p \ge 0 \text{ for } p \in P \Bigr\}.
\]
Let A be the set of lattice points in T(log N). We associate to a lattice point a ∈ A the
unit cube $C(a) = \{ z \mid a_p \le z_p < a_p + 1,\ p \in P \}$. Then it is clear that
\[
T(\log N) \subset \bigcup_{a \in A} C(a) \subset T(\log N + \vartheta_P).
\]
Taking volumes, we obtain what we want.
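The two simplex volumes can be compared with a direct count. The following is a minimal sketch; the function names and the sample values N = 10^6 and P = {3, 5, 7} are our own illustrative choices, and the count is a brute-force enumeration rather than anything used in the proof.

```python
import math

def count_smooth(N, primes):
    """Count integers 1 <= n <= N all of whose prime factors lie in `primes`."""
    found, frontier = {1}, [1]
    while frontier:
        n = frontier.pop()
        for p in primes:
            m = n * p
            if m <= N and m not in found:
                found.add(m)
                frontier.append(m)
    return len(found)

def simplex_bounds(N, primes):
    """Lower and upper volume bounds for Psi(N, P) as in Proposition 12.4.8."""
    r = len(primes)
    theta = sum(math.log(p) for p in primes)                    # theta_P
    denom = math.factorial(r) * math.prod(math.log(p) for p in primes)
    lower = math.log(N) ** r / denom                            # vol T(log N)
    upper = (math.log(N) + theta) ** r / denom                  # vol T(log N + theta_P)
    return lower, upper

N, P = 10**6, [3, 5, 7]
lower, upper = simplex_bounds(N, P)
psi = count_smooth(N, P)
print(lower, psi, upper)
```

The printed count should lie between the two volumes, as the proposition asserts.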


Let Ψodd (x, y) be the number of y -smooth odd integers up to x , in other words the number
of odd integers n ≤ x which admit only prime divisors p ≤ y . Let π(y) be the number of
primes up to y . We have
Lemma 12.4.9. If y = o(log x), then
\[
\log \Psi_{\mathrm{odd}}(x, y) = (\pi(y) - 1) \log\log x - y + o\Bigl( \frac{y}{\log y} \Bigr).
\]
Proof: We apply Proposition 12.4.8 with N = x, taking for P the set of all odd primes
not exceeding y, which has cardinality |P| = π(y) − 1. We get
\[
\frac{(\log x)^{|P|}}{|P|!} \prod_{p \in P} (\log p)^{-1} \le \Psi_{\mathrm{odd}}(x, y) \le \frac{(\log x + \vartheta_P)^{|P|}}{|P|!} \prod_{p \in P} (\log p)^{-1}. \tag{12.5}
\]
Partial summation of
\[
\sum_{2 < p \le y} \log\log p = \sum_{2 < n \le y} (\pi(n) - \pi(n-1)) \log\log n
\]
yields
\[
\begin{aligned}
\sum_{p \in P} \log\log p &= \sum_{2 < n \le y} (\pi(n) - \pi(n-1)) \log\log y
 + \sum_{2 < m \le y-1} \int_{m+1}^{m} \sum_{2 < n \le u} (\pi(n) - \pi(n-1)) \frac{du}{u \log u} \\
&= (\pi(y) - 1) \log\log y - \int_{3}^{y} \frac{\pi(u) - 1}{u \log u}\, du.
\end{aligned}
\]
By the prime number theorem and integration by parts, we deduce
\[
\sum_{p \in P} \log\log p = \pi(y) \log\log y + O\Bigl( \frac{y}{(\log y)^2} \Bigr). \tag{12.6}
\]
Similarly, we find
\[
\vartheta_P = \sum_{2 < p \le y} \log p = y + o\Bigl( \frac{y}{\log y} \Bigr). \tag{12.7}
\]
Then the hypothesis y = o(log x) implies
\[
|P| \log(\log x + \vartheta_P) = (\pi(y) - 1) \log\log x + o\Bigl( \frac{y}{\log y} \Bigr). \tag{12.8}
\]
Now (12.5), (12.6), (12.8) and Stirling’s formula show that
\[
\log \Psi_{\mathrm{odd}}(x, y) = (\pi(y) - 1) \log\log x - \pi(y) \log \pi(y) + \pi(y) - \pi(y) \log\log y + o\Bigl( \frac{y}{\log y} \Bigr).
\]
The lemma follows from the prime number theorem in the stronger form (see G.J.O.
Jameson [158], Exercise 1.5.4, Th. 5.1.8)
\[
\pi(y) = \frac{y}{\log y} + \frac{y}{(\log y)^2} + o\Bigl( \frac{y}{(\log y)^2} \Bigr).
\]

12.4.10. Proof of Theorem 12.4.6: We choose $y = \sqrt{\log x}$ and for simplicity we abbreviate
$M = \Psi_{\mathrm{odd}}(x, y)$. We also define k by $2^k < M \le 2^{k+1}$. Now consider the sequence
$1 = n_1 < n_2 < \dots < n_M \le x$ of odd integers up to x without prime factors larger than y.
Since $2^k < M$, by Dirichlet’s pigeon-hole principle there are two distinct elements $n_i < n_j$
in this sequence, having the same residue class modulo $2^k$. Let $d = \mathrm{GCD}(n_i, n_j)$
and set
\[
a = n_i/d, \qquad b = (n_j - n_i)/d, \qquad c = n_j/d.
\]
The triple (a, b, c) so obtained is the desired good example for the abc-conjecture.
By construction, $2^k$ divides $n_j - n_i = d \cdot b$. On the other hand, $n_i$ and $n_j$ are both odd,
thus d is odd. Hence $2^k$ divides b, giving
\[
\operatorname{rad}(b) \le 2^{-k+1} b \le 4 M^{-1} b.
\]
Also, every prime factor of $n_i n_j$ does not exceed y, whence
\[
\operatorname{rad}(ac) \le \prod_{p \in P} p = e^{\vartheta_P}.
\]
It now follows from the last two displayed equations that
\[
\operatorname{rad}(abc) \le 4 M^{-1} e^{\vartheta_P} b. \tag{12.9}
\]
By Lemma 12.4.9, we have
\[
\log M = y + \frac{2y}{\log y} + o\Bigl( \frac{y}{\log y} \Bigr),
\]
therefore from (12.7) and (12.9) we infer that
\[
b > K \prod_{p \mid abc} p
\]
with (note that c ≤ x)
\[
K = \exp\Bigl( (4 + o(1)) \frac{\sqrt{\log c}}{\log\log c} \Bigr).
\]
The result follows.
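The pigeon-hole construction of 12.4.10 is easy to carry out for small parameters. The following is a minimal sketch; the function names are ours, and the sample parameters x = 10^6, y = 20 are chosen only so that the run is quick (they are not the asymptotic choice of y made in the proof).

```python
import math

def rad(n):
    """Product of the distinct primes dividing n (trial division)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    return result * n if n > 1 else result

def odd_smooth(x, y):
    """Sorted list of odd integers n <= x with all prime factors <= y."""
    odd_primes = [p for p in range(3, y + 1, 2)
                  if all(p % q for q in range(3, math.isqrt(p) + 1, 2))]
    found, frontier = {1}, [1]
    while frontier:
        n = frontier.pop()
        for p in odd_primes:
            if n * p <= x and n * p not in found:
                found.add(n * p)
                frontier.append(n * p)
    return sorted(found)

def pigeonhole_triple(x, y):
    """Return (a, b, c, k) built from two smooth numbers congruent modulo 2^k."""
    nums = odd_smooth(x, y)
    M = len(nums)
    k = (M - 1).bit_length() - 1      # the k with 2^k < M <= 2^(k+1); needs M >= 2
    first_seen = {}
    for n in nums:
        r = n % 2**k
        if r in first_seen:
            ni, nj = first_seen[r], n
            d = math.gcd(ni, nj)
            return ni // d, (nj - ni) // d, nj // d, k
        first_seen[r] = n
    raise ValueError("no collision; this cannot happen when M > 2^k")

a, b, c, k = pigeonhole_triple(10**6, 20)
print(a, b, c, k, rad(a * b * c))
```

The sketch returns the first collision it meets; any other colliding pair would serve equally well for the argument.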
Remark 12.4.11. The comparison of this example with Baker’s Conjecture 12.2.6 is of
some interest. In a triple (a, b, c) such as in the example the number of distinct prime factors
of ac is O(y/ log y), with $y = \sqrt{\log x}$. The most unfavorable case for the comparison
occurs when d = GCD(a, c) is negligible, c is of order x, and the number of prime factors
of b is also O(y/ log y). Then, choosing the parameter ε in Baker’s conjecture optimally,
we would get the upper bound
\[
c \le e^{\kappa \sqrt{\log c}} \prod_{p \mid abc} p
\]

for some positive constant κ . This hypothetical bound is not too far away from the actual
lower bound. Thus Baker’s conjecture, if true, would probably be close to optimal.

The same method proves:


Proposition 12.4.12. Let P be a finite set of odd primes and let
\[
\vartheta_P = \sum_{p \in P} \log p.
\]
Let N be a positive integer. Then we can find coprime positive integers a, b, c = a + b not
exceeding N, such that the prime factors of a and c are only from the set P and moreover
\[
c > \frac{1}{4}\, \frac{(\log N)^{|P|}}{|P|!}\, e^{-\vartheta_P} \prod_{p \in P} (\log p)^{-1} \operatorname{rad}(abc).
\]

Proof: Let M = Ψ(N, P) and let $n_1 < n_2 < \dots < n_M$ be the set of integers up to N
all of whose prime factors are from the set P. Then Proposition 12.4.8 yields
\[
M \ge \frac{(\log N)^{|P|}}{|P|!} \prod_{p \in P} (\log p)^{-1}.
\]
If $2^k < M \le 2^{k+1}$, then by Dirichlet’s pigeon-hole principle we can find two distinct
elements $n_i$, $n_j$ in the same class (mod $2^k$); the rest of the proof is as in 12.4.10.

Example 12.4.13. Consider the special case a = 1, $b = 3^{2^n} - 1$, $c = 3^{2^n}$. Then Euler’s
theorem shows that $2^{n+1}$ divides b, giving the explicit example
\[
c > \frac{\log c}{3 \log 3} \operatorname{rad}(abc).
\]
This is comparable in strength with what we can obtain from Proposition 12.4.12 if P
consists of only one prime.
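Example 12.4.13 can be checked for the first few n. The following is a minimal sketch with a naive trial-division radical (the helper `rad` is ours). Since log c = 2^n log 3, the inequality is tested in the equivalent exact form 3c > 2^n rad(abc), which avoids floating point.

```python
def rad(n):
    """Product of the distinct primes dividing n (trial division)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    return result * n if n > 1 else result

for n in range(1, 6):
    c = 3 ** (2 ** n)
    a, b = 1, c - 1
    assert b % 2 ** (n + 1) == 0          # divisibility given by Euler's theorem
    radical = rad(a * b * c)
    # c > (log c / (3 log 3)) * rad(abc)  is the same as  3c > 2^n * rad(abc)
    print(n, radical, 3 * c > 2 ** n * radical)
```

The range of n is kept small because the radical is computed by trial division.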
Example 12.4.14. We note here the example $2 + 109 \cdot 3^{10} = 23^5$, which has high abc-ratio
log c/ log rad(abc), namely 1.62991. This was found by E. Reyssat, by searching
for very good rational approximations to numbers of type $a^{1/n}$. Since
\[
109^{1/5} = 2 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{77733 + \dots}}}}
\]
we see that 23/9 is a very good convergent to $109^{1/5}$, corresponding to the example (note
that since 9 is a square this example is rather more effective than others of similar type).
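Both the abc-ratio and the convergent in Example 12.4.14 are quick to confirm. The following is a minimal sketch; the helper `convergent` is ours and simply evaluates a finite continued fraction exactly.

```python
import math
from fractions import Fraction

a, b, c = 2, 109 * 3**10, 23**5
assert a + b == c

radical = 2 * 3 * 23 * 109                 # rad(abc), read off from the factorizations
ratio = math.log(c) / math.log(radical)    # the abc-ratio log c / log rad(abc)

def convergent(partial_quotients):
    """Value of the finite continued fraction [q0; q1, ..., qr] as a Fraction."""
    value = Fraction(partial_quotients[-1])
    for q in reversed(partial_quotients[:-1]):
        value = q + 1 / value
    return value

approx = convergent([2, 1, 1, 4])          # truncate just before the large quotient 77733
print(ratio, approx, float(approx) ** 5)
```

Truncating before the large partial quotient 77733 is what makes (23/9)^5 so close to 109, and hence 23^5 so close to 109 · 3^10.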
Remark 12.4.15. For any coprime integers a, b, c with a + b = c, the abc-ratio is defined
by
\[
\frac{\log \max(|a|, |b|, |c|)}{\log \operatorname{rad}(abc)}.
\]
The strong abc-conjecture implies that the set of accumulation points of all abc-ratios is the
interval [1/3, 1]. For a proof, note that the lower bound 1/3 is obvious and the upper bound 1
is the strong abc-conjecture. Now take $a = n^k g(n)$, $b = h(n)$, $c = n^k g(n) + h(n)$ with
suitable polynomials g, h, and apply Theorem 12.2.15 to $f(x) := x\, g(x)\, h(x)\, (x^k g(x) + h(x))$
to ensure that f(n) is square free infinitely often. For such values of n, we have
rad(abc) = f(n). As n → ∞, the abc-ratio tends to
\[
\frac{\max\bigl( k + \deg(g),\ \deg(h),\ \deg(x^k g + h) \bigr)}{1 + \deg(g) + \deg(h) + \deg(x^k g + h)}.
\]
The set of these rational numbers is contained in [1/3, 1] and is dense there.
12.4.16. A simple heuristic argument has been proposed, suggesting that statements stronger
than Proposition 12.4.12 may be true. This is based on the so-called birthday paradox.
There are precisely $m^s$ integral vectors $\{n_1, \dots, n_s\}$ with $1 \le n_i \le m$. On the
other hand, the number of such vectors where all components $n_i$ are distinct is
m(m − 1) · · · (m − s + 1). By Stirling’s formula, we have
\[
m(m-1) \cdots (m-s+1)/m^s \sim e^{-s^2/2m}
\]
uniformly for $s = o(m^{2/3})$. This means that, if we look at vectors of length $s \sim \alpha \sqrt{m}$
with $\alpha = o(m^{1/6})$, then the probability of such a vector of having two equal components
is asymptotic to $1 - e^{-\alpha^2/2}$. In other words, a relatively short vector has a positive
probability of having two equal components. For example, in a group of 40 people there is an
89 % chance that two persons will be born on the same day of the year (whence the name
“birthday paradox,” since such a conclusion appears to be implausible at first sight).
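The figure quoted for 40 people can be reproduced directly. The following is a minimal sketch comparing the exact probability with the asymptotic expression 1 − e^{−s²/2m}; the function name is ours, and m = 365 equally likely birthdays is the usual simplifying assumption.

```python
import math

def prob_repeat(m, s):
    """Probability that s independent uniform draws from m values contain a repeat."""
    p_distinct = 1.0
    for i in range(s):
        p_distinct *= (m - i) / m     # builds m(m-1)...(m-s+1) / m^s factor by factor
    return 1.0 - p_distinct

exact = prob_repeat(365, 40)
approx = 1.0 - math.exp(-40**2 / (2 * 365))
print(round(exact, 4), round(approx, 4))
```

The exact value rounds to 0.89, and the asymptotic formula is already within one percentage point of it for these small parameters.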

In view of the above, if the sequence $n_i$, i = 1, . . . , M appearing in 12.4.10 were
sufficiently random, we should find two elements in the same class (mod $2^k$) as soon as
$M \gg (\sqrt{2})^k$, rather than the condition $M > 2^k$ required by Dirichlet’s pigeon-hole
principle. Then the construction in 12.4.10 should lead to the bound
\[
\operatorname{rad}(abc) \ll M^{-2} e^{\vartheta_P} b
\]
rather than (12.9) on page 422. If this were the case, then there would be a corresponding
improvement in Theorem 12.4.6 with $y = (\log x)^{2/3}$, yielding hypothetically
\[
c > \exp\Bigl( \bigl( \tfrac{9}{4} - \delta \bigr) \frac{(\log c)^{2/3}}{\log\log c} \Bigr) \prod_{p \mid abc} p.
\]

12.4.17. On the other hand, it has been pointed out by Granville that such a sequence ni
is unlikely to have the randomness property needed to apply the birthday paradox with
confidence. In fact, the construction proposed before can be rephrased as follows:
Let $p_1, \dots, p_t$ be the elements of P (we assume 2 ∉ P). We want to solve the congruence
\[
p_1^{a_1} \cdots p_t^{a_t} \equiv p_1^{b_1} \cdots p_t^{b_t} \pmod{2^k}.
\]
Let G be the group $(\mathbb{Z}/2^k\mathbb{Z})^{\times}$ of units of the ring $\mathbb{Z}/2^k\mathbb{Z}$. Then, if $g_1, \dots, g_t$ are the
images of P in G, the above congruence becomes a relation
\[
g_1^{a_1 - b_1} \cdots g_t^{a_t - b_t} = 1
\]
in the group G . Now there are too many choices of pairs (a1 , . . . , at ) , (b1 , . . . , bt ) leading
to the same difference (a1 − b1 , . . . , at − bt ) and the argument on which the birthday
paradox was built collapses. A closer analysis of this argument only shows that the constant
4 − δ in the theorem of Stewart and Tijdeman should be replaced by a larger one.
Remark 12.4.18. Indeed, M. van Frankenhuysen [304], using a packing of spheres
argument, has improved the constant 4 − δ in Theorem 12.4.6 unconditionally to 6.068.

12.5. Equivalent conjectures

We start with Hall’s conjecture and its strong version. Then we recall some ba-
sic facts about elliptic curves over Dedekind domains to introduce notions such
as reduction, minimal discriminant, and conductor. This enables us to formulate
Szpiro’s conjecture bounding the discriminant in terms of the conductor of an el-
liptic curve over Q . Finally, we prove the equivalence of the abc-conjecture, the
strong Hall conjecture, and the generalized Szpiro conjecture, via Frey’s famous
elliptic curve y 2 = x(x + a)(x − b). At the end, we give the generalization of
Hall’s conjecture due to Hall–Lang–Waldschmidt–Szpiro.
12.5.1. In 1969, on the basis of what at that time was considered extensive numerical
evidence, Marshall Hall [144] conjectured that there is a positive constant C
such that $|x^3 - y^2| \ge C|x|^{1/2}$ for x, y ∈ Z with $x^3 - y^2 \ne 0$. The exponent 1/2
in this statement cannot be improved upon, as shown a little later by L.V. Danilov,
who proved in [75] that $0 < |x^3 - y^2| < 0.97|x|^{1/2}$ has infinitely many solutions
in integers x, y.
The Hall conjecture is unlikely to be true as originally formulated and nowadays
we refer to Hall’s conjecture as the slightly weaker statement in which the exponent
1/2 is replaced by 1/2 − ε and C by some C(ε) > 0, for every fixed ε > 0. An
equivalent formulation is that, for any solution of $y^2 = x^3 - z$ with x, y, z ∈ Z
and z ≠ 0 viewed as a parameter, we have $|x| \ll_{\varepsilon} |z|^{2+\varepsilon}$, $|y| \ll_{\varepsilon} |z|^{3+\varepsilon}$.
12.5.2. What is of interest to us here is a stronger form of the conjecture, the
strong Hall conjecture below, claiming a bound in terms of rad(z) rather than
|z|. However, we have to avoid the counterexample
\[
(2p^4)^3 - (3p^6)^2 = -p^{12}.
\]
Note that every solution of $z = x^3 - y^2 \ne 0$ with $\mathrm{GCD}(x^3, y^2)$ divisible by a
sixth power $g^6$ may be reduced to a solution $x' := x/g^2$, $y' := y/g^3$, $z' := z/g^6$
such that $\mathrm{GCD}(x'^3, y'^2)$ is free from sixth powers. We call (x', y', z') a primitive
solution.
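Conjecture 12.5.3. Given ε > 0, every primitive solution of $x^3 - y^2 = z \ne 0$
satisfies
\[
|x| \ll_{\varepsilon} \operatorname{rad}(z)^{2+\varepsilon}, \qquad |y| \ll_{\varepsilon} \operatorname{rad}(z)^{3+\varepsilon}.
\]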
12.5.4. Another conjecture concerns the discriminant and the conductor of an el-
liptic curve. We recall first some basic facts (cf. [284] for more details).
Let K be any field. By (8.2) on page 241, an elliptic curve over K may be given
by an affine Weierstrass equation
\[
y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6. \tag{12.10}
\]
Using the quantities
\[
b_2 = a_1^2 + 4a_2, \qquad b_4 = a_1 a_3 + 2a_4, \qquad b_6 = a_3^2 + 4a_6
\]
and
\[
b_8 = a_1^2 a_6 + 4a_2 a_6 - a_1 a_3 a_4 + a_2 a_3^2 - a_4^2,
\]
we define
\[
c_4 = b_2^2 - 24b_4, \qquad c_6 = -b_2^3 + 36 b_2 b_4 - 216 b_6
\]
and the discriminant
\[
\Delta = -b_2^2 b_8 - 8 b_4^3 - 27 b_6^2 + 9 b_2 b_4 b_6.
\]
If char(K) ≠ 2, then replacing y by $\tfrac{1}{2}(y - a_1 x - a_3)$ leads to the Weierstrass
normal form
\[
y^2 = 4x^3 + b_2 x^2 + 2 b_4 x + b_6.
\]
If char(K) ≠ 3 as well, then replacing (x, y) by $\bigl( \tfrac{1}{36}(x - 3b_2), \tfrac{1}{108} y \bigr)$ leads to
the Weierstrass equation
\[
y^2 = x^3 - 27 c_4 x - 54 c_6.
\]
In any case, the quantities are defined in such a way that they are suitable in every
characteristic. We have the relation
\[
1728 \Delta = c_4^3 - c_6^2. \tag{12.11}
\]
If $a_1 = a_3 = 0$, then ∆ is 16 times the discriminant of $x^3 + a_2 x^2 + a_4 x + a_6$.
This normalization leads to the extension of Proposition 10.2.3 that an affine
Weierstrass equation (12.10) describes an elliptic curve if and only if ∆ ≠ 0 (see
[284], Prop.III.1.4). It is easy to see that a Weierstrass equation (12.10) is unique
up to a coordinate transformation of the form
\[
x = u^2 x' + r, \qquad y = u^3 y' + s u^2 x' + t \tag{12.12}
\]
for r, s, t, u ∈ K, u ≠ 0. Then a direct calculation shows
\[
u^4 c_4' = c_4, \qquad u^6 c_6' = c_6, \qquad u^{12} \Delta' = \Delta. \tag{12.13}
\]
If ∆ = 0, then there is exactly one singular point on the projective curve described
by (12.10). It is either a cusp or a node (see [284], Prop.III.1.4). The same formulas
as in Proposition 8.3.8 show that the smooth points of the projective curve form
a one-dimensional commutative group variety, hence it is isomorphic to the torus
$\mathbb{G}_m$ (case of a node) or to $\mathbb{A}^1$ (case of a cusp) over K (see [284], Prop.III.2.5).
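The quantities of 12.5.4 are polynomial expressions in the coefficients and can be computed mechanically. The following is a minimal sketch over the integers; the function name and the sample curve y² + y = x³ − x² are our own choices for illustration.

```python
def weierstrass_invariants(a1, a2, a3, a4, a6):
    """Return b2, b4, b6, b8, c4, c6 and the discriminant of
    y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6."""
    b2 = a1 * a1 + 4 * a2
    b4 = a1 * a3 + 2 * a4
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    c4 = b2 * b2 - 24 * b4
    c6 = -b2**3 + 36 * b2 * b4 - 216 * b6
    disc = -b2 * b2 * b8 - 8 * b4**3 - 27 * b6 * b6 + 9 * b2 * b4 * b6
    return {"b2": b2, "b4": b4, "b6": b6, "b8": b8, "c4": c4, "c6": c6, "disc": disc}

inv = weierstrass_invariants(0, -1, 1, 0, 0)     # y^2 + y = x^3 - x^2
print(inv)
# relation (12.11): 1728 * disc = c4^3 - c6^2
print(1728 * inv["disc"] == inv["c4"] ** 3 - inv["c6"] ** 2)
```

Since (12.11) is a polynomial identity in the a_i, the final check should print True for any integer input, not only for the sample curve.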
12.5.5. Now let v be a discrete valuation on K with valuation ring R and local
parameter π . As usual, we assume that the valuation is normalized by v(π) = 1.
Then v(x) is the order of an element x ∈ K, namely the largest integer for which
$x \in R\pi^{v(x)}$.
It is clear that the elliptic curve E over K may be given by a Weierstrass equation
(12.10) with every ai ∈ R . Such a Weierstrass equation is called minimal if
v(∆) is minimal. Obviously, a minimal Weierstrass equation exists and (12.13)
shows that the corresponding minimal discriminant is unique up to units in R .
Using the transformation formulas for the quantities bi , c4 , ∆ , we prove that the
minimal Weierstrass equation is unique up to transformations of the form (12.12)
with r, s, t ∈ R and $u \in R^{\times}$ ([284], Prop.VII.1.3). Let $\bar{a} \in k(v)$ be the reduction
of a ∈ R. Then the reduction
\[
y^2 + \bar{a}_1 xy + \bar{a}_3 y = x^3 + \bar{a}_2 x^2 + \bar{a}_4 x + \bar{a}_6
\]
of a minimal Weierstrass equation is an affine Weierstrass equation over the residue
field k(v) and the corresponding projective curve $\bar{E}$ over k(v) is unique up to
isomorphism.
We say that E has good reduction if $\bar{E}$ is an elliptic curve and otherwise we
speak of bad reduction. In the latter case, if the singular point is a node, the
reduction is called multiplicative (or semistable) and if the singular point is a
cusp, the reduction is called additive.
Multiplicative reduction is characterized by v(∆) > 0 and $v(c_4) = 0$, while
additive reduction is characterized by v(∆) > 0 and $v(c_4) > 0$ (see [284],
Prop.VII.5.1). If char(k(v)) ≠ 2, 3 the last condition is equivalent to $v(c_4) > 0$
and $v(c_6) > 0$.

We also have the following useful necessary condition for minimality.


Proposition 12.5.6. Let K, v, R, π be as before and suppose that char(K) ≠ 2, 3.
Let $y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$ be an R-Weierstrass equation
for E. Then, if $48\pi^4 \mid c_4$ and $864\pi^6 \mid c_6$, the given R-Weierstrass equation for E
is not minimal.

As an immediate consequence, we note:


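Corollary 12.5.7. For a minimal Weierstrass equation as before, the following
estimate holds
\[
\min(3v(c_4), 2v(c_6)) <
\begin{cases}
12 & \text{if } \operatorname{char}(k(v)) \ne 2, 3, \\
12 + 6v(3) & \text{if } \operatorname{char}(k(v)) = 3, \\
12 + 12v(2) & \text{if } \operatorname{char}(k(v)) = 2.
\end{cases}
\]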

Proof of Proposition 12.5.6: The change of variables
\[
(x, y) = \Bigl( x' - \frac{1}{12} b_2,\ y' - \frac{1}{2} a_1 x' + \frac{1}{24} a_1 b_2 - \frac{1}{2} a_3 \Bigr) \tag{12.14}
\]
transforms the Weierstrass equation
\[
y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6
\]
into
\[
(y')^2 = (x')^3 - \frac{1}{48} c_4 x' - \frac{1}{864} c_6. \tag{12.15}
\]
The discriminant ∆' of (12.15) is ∆' = ∆, because the change of variables
(12.14) has determinant 1.
The coefficients of (12.14) are a priori only in the ring $R[\tfrac{1}{2}, \tfrac{1}{3}]$, but if
\[
\frac{1}{12} b_2 \in R, \qquad \frac{1}{2} a_1 \in R, \qquad \frac{1}{2} a_3 \in R, \tag{12.16}
\]
then (12.14) has coefficients in R and (12.15) is again an R-Weierstrass equation
equivalent to the original one.
Now we claim that the conditions (12.16) already follow from $\tfrac{1}{48} c_4 \in R$ and
$\tfrac{1}{864} c_6 \in R$. Assuming this claim the proof of the proposition is immediate,
because if $\tfrac{1}{48} c_4 \in \pi^4 R$ and $\tfrac{1}{864} c_6 \in \pi^6 R$ then we can make the further change of
variables $(x', y') = (\pi^2 x'', \pi^3 y'')$, obtaining a new R-Weierstrass equation with
discriminant $\Delta'' = \pi^{-12} \Delta'$, hence with v(∆'') = v(∆) − 12 and thereby proving
that our original R-Weierstrass equation is not minimal. Thus it only remains to
prove our claim.
If char(k(v)) ∉ {2, 3}, this is obvious because then 2 and 3 are invertible in R.
If char(k(v)) = 3, then 2 is invertible in R and we need to prove that $\tfrac{1}{3} c_4 \in R$
and $\tfrac{1}{27} c_6 \in R$ imply $\tfrac{1}{3} b_2 \in R$. This is easy. Since $c_6 = -b_2^3 + 36 b_2 b_4 - 216 b_6$
and 27|216, we see that $\tfrac{1}{27} c_6 \in R$ implies
\[
-\Bigl( \frac{1}{3} b_2 \Bigr)^3 + 4 b_4 \Bigl( \frac{1}{3} b_2 \Bigr) \in R,
\]
yielding $\tfrac{1}{3} b_2 \in R$, as wanted.
If char(k(v)) = 2, then 3 is invertible in R and the argument becomes rather
intricate. We do this in several steps. The assumption now is $\tfrac{1}{16} c_4 \in R$ and
$\tfrac{1}{32} c_6 \in R$, and we want to show that
\[
\frac{1}{2} a_1 \in R, \qquad \frac{1}{2} a_3 \in R, \qquad \frac{1}{4} b_2 \in R.
\]
Since $c_4 = b_2^2 - 24 b_4$ and $v(c_4) \ge 4v(2)$, we get $v(b_2^2) \ge 3v(2)$ and
$v(b_2) \ge \tfrac{3}{2} v(2)$. Now $b_2 = a_1^2 + 4a_2$ and we conclude that $v(a_1) \ge \tfrac{3}{4} v(2)$.
Next, $c_6 = -b_2^3 + 36 b_2 b_4 - 216 b_6$ and $v(c_6) \ge 5v(2)$, whence
\[
\begin{aligned}
v(216 b_6) = v(b_6) + 3v(2) &\ge \min(v(c_6), v(b_2^3), v(36 b_2 b_4)) \\
&\ge \min\Bigl( 5v(2),\ \frac{9}{2} v(2),\ 2v(2) + \frac{3}{2} v(2) \Bigr) = \frac{7}{2} v(2)
\end{aligned}
\]
and $v(b_6) \ge \tfrac{1}{2} v(2)$ follows. Since $b_6 = a_3^2 + 4a_6$, we see that $v(a_3) \ge \tfrac{1}{4} v(2)$.
Therefore, $v(a_1 a_3) \ge \tfrac{3}{4} v(2) + \tfrac{1}{4} v(2) = v(2)$ and, recalling that $b_4 = a_1 a_3 + 2a_4$,
we infer that $v(b_4) \ge v(2)$. Now $c_4 = b_2^2 - 24 b_4$ gives
\[
v(b_2^2) \ge \min(v(c_4), v(24 b_4)) \ge 4v(2)
\]
and $v(b_2) \ge 2v(2)$. Hence $\tfrac{1}{4} b_2 \in R$, as wanted.
Now $b_2 = a_1^2 + 4a_2$ shows that $v(a_1) \ge v(2)$ and $\tfrac{1}{2} a_1 \in R$, as wanted.
Finally
\[
\begin{aligned}
v(216 b_6) = v(b_6) + 3v(2) &\ge \min(v(c_6), v(b_2^3), v(36 b_2 b_4)) \\
&\ge \min(5v(2),\ 6v(2),\ 2v(2) + 2v(2) + v(2)) = 5v(2)
\end{aligned}
\]
and we get $v(b_6) \ge 2v(2)$. Since $b_6 = a_3^2 + 4a_6$, we conclude that $v(a_3) \ge v(2)$
and $\tfrac{1}{2} a_3 \in R$, as wanted.

12.5.8. Now let R be a Dedekind domain with quotient field K . Then every
maximal ideal p induces a discrete valuation vp and hence a minimal discriminant
$\Delta_p$ of the elliptic curve E over K. The minimal discriminant of E is the ideal
\[
\Delta := \prod_{p} p^{v_p(\Delta_p)},
\]

where p ranges over all maximal ideals of R . Clearly, only finitely many prime
ideals give a non-trivial contribution and ∆ is well defined. In general, there is
no minimal Weierstrass equation with coefficients in R working for all primes.
However for R = Z , there is such a minimal Weierstrass equation working for
all primes (cf. [284], VIII.Cor.8.3) and we call it the global minimal Weierstrass
equation.
12.5.9. We do not give the precise definition of the conductor of an elliptic curve
over a number field K (the reader can find it in A. Ogg [230], T. Saito [256] or
J.H. Silverman [286], IV, §10). All we need to know here is that it is an ideal
\[
\operatorname{cond}(E) = \prod_{p} p^{f_p},
\]

where p ranges over the maximal ideals of OK and the exponent fp ∈ N has the
following properties (cf. [286], Th.IV.10.2, Cor.IV.11.2):

(a) E has good reduction at p if and only if fp = 0.


(b) E has multiplicative reduction if and only if fp = 1.
(c) E has additive reduction if and only if fp ≥ 2.
(d) The minimal discriminant ∆ is a multiple of cond(E) and is supported in
the same set of prime ideals.
Example 12.5.10. G. Frey’s idea [123] for the proof of Fermat’s last theorem was
to associate to coprime a, b, c ∈ Z \ {0} with a + b = c the curve
\[
y^2 = x(x + a)(x - b).
\]
By Proposition 10.2.3, it is the affine part of an elliptic curve E defined over Q
called the Frey curve. The Weierstrass form is given by
\[
y^2 = x^3 + (a - b)x^2 - abx. \tag{12.17}
\]
We have
\[
b_2 = 4(a - b), \qquad b_4 = -2ab, \qquad b_6 = 0, \qquad b_8 = -(ab)^2
\]
and
\[
c_4 = 16(a^2 + ab + b^2), \qquad \Delta = 16(abc)^2.
\]
A prime number p ≠ 2 divides at most one of $c_4$, ∆. Hence (12.13) on page 426
shows that (12.17) is a minimal Weierstrass equation for the prime p.
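The formulas for $c_4$ and ∆ of the Frey curve, and the claim that no odd prime divides both, can be confirmed on examples. The following is a minimal sketch; the function name is ours, and the invariants are computed from the general definitions in 12.5.4 rather than from the closed forms above.

```python
import math

def frey_invariants(a, b):
    """c4 and discriminant of y^2 = x^3 + (a - b) x^2 - a b x (a1 = a3 = a6 = 0)."""
    a2, a4 = a - b, -a * b
    b2, b4, b6, b8 = 4 * a2, 2 * a4, 0, -a4 * a4
    c4 = b2 * b2 - 24 * b4
    disc = -b2 * b2 * b8 - 8 * b4**3 - 27 * b6 * b6 + 9 * b2 * b4 * b6
    return c4, disc

def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0

for a, b in [(3, 5), (1, 8), (9, 16), (5, 27)]:      # coprime pairs, c = a + b
    c = a + b
    c4, disc = frey_invariants(a, b)
    print(a, b, c4 == 16 * (a * a + a * b + b * b), disc == 16 * (a * b * c) ** 2,
          is_power_of_two(math.gcd(c4, disc)))
```

For coprime a and b, each row should print three True values; the last column reflects the statement that an odd prime divides at most one of $c_4$ and ∆.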

If p = 2 and 16 ∤ abc, then $2^{11} \nmid \Delta$ and the transformation formula for ∆ in
(12.13) on page 426 shows that (12.17) is a minimal Weierstrass equation for p =
2 and hence a global minimal Weierstrass equation.
This is not necessarily true if one of a, b, c is divisible by 16. Up to evident
changes of coordinates, there are two cases:

(a) a ≡ 1 (mod 4) and b ≡ 0 (mod 16) .

In this case, the transformation


\[
x = 4x', \qquad y = 8y' + 4x'
\]
leads to the Weierstrass form
\[
(y')^2 + x'y' = (x')^3 + \frac{a - b - 1}{4} (x')^2 - \frac{ab}{16} x'. \tag{12.18}
\]
By assumption, the coefficients are integers and (12.13) on page 426 shows
\[
c_4' = a^2 + ab + b^2, \qquad \Delta' = 2^{-8} (abc)^2.
\]
Since $c_4'$ is odd and using (12.13) on page 426 again, we conclude that (12.18)
is a minimal Weierstrass equation for p = 2. Clearly, it is a global minimal
Weierstrass equation.

(b) a ≡ −1 (mod 4) and b ≡ 0 (mod 16) .

Again, we prove that (12.17) is a minimal Weierstrass equation for p = 2, hence a
global minimal Weierstrass equation. We argue by contradiction and assume that
there is a transformation (12.12) on page 426 with $v_2(\Delta') < v_2(\Delta)$ leading to an
R-Weierstrass equation. Using $v_2(c_4) = 4$ and (12.13) on page 426, we conclude
$v_2(u) = 1$. The transformation formula
\[
u^8 b_8' = b_8 + 3r b_6 + 3r^2 b_4 + r^3 b_2 + 3r^4 = -(ab)^2 - 6abr^2 + 4(a - b)r^3 + 3r^4
\]
(cf. [284], Table III.1.2) implies now $v_2(r) \ge 2$. Finally, we consider the
transformation formula
\[
u^2 a_2' = a_2 - s a_1 + 3r - s^2 = a - b + 3r - s^2
\]
showing that $v_2(s) \ge 0$. Moreover, reduction modulo 4 proves $0 \equiv -1 - s^2$
(mod 4), which is impossible, completing the proof.
Note that we do not need this case for applications to the abc-conjecture, because
we are free to replace a by −c and c by −a to go back to case (a).
Next, we consider the reduction $\bar{E}$ of E at a prime p. If p does not divide the
minimal discriminant ∆, then E has good reduction. So let us assume p|∆. For
p ≠ 2, we have p|abc and hence p does not divide a − b. It follows that $\bar{E}$ is
given by the affine Weierstrass equation
\[
y^2 = x^3 + (a - b)x^2.
\]
Since the residue characteristic is not 2, we see that the singular point (0, 0) is a
node and there is multiplicative reduction.
For p = 2, we assume first that the minimal Weierstrass equation has the form
(12.17) on page 429. Then ∆ ≡ 0 (mod 2) and Frey’s curve E has always bad
reduction at 2. The singular point $(\bar{x}_0, \bar{y}_0)$ of $\bar{E}$ is determined by the unique
solution of $\bar{x}_0^2 = ab$ (we are in characteristic 2). The transformation $x = x' + \bar{x}_0$,
$y = y' + \bar{y}_0$ leads to the Weierstrass equation
\[
(y')^2 = (x')^3 + (\bar{x}_0 + a - b)(x')^2
\]
for $\bar{E}$. As the residue characteristic is 2, the tangents in the singularity have the
same direction and hence it is a cusp, and there is additive reduction at 2.
Finally, assume that the minimal Weierstrass equation of E in p = 2 has the form
(12.18), hence a ≡ 1 (mod 4) and b ≡ 0 (mod 16). Assuming also that E has
bad reduction at 2, ∆ ≡ 0 (mod 2) leads to b ≡ 0 (mod 32). Then $\bar{E}$ is given
by the affine Weierstrass equation
\[
y^2 + xy = x^3 + d x^2, \qquad d = \frac{a - b - 1}{4}
\]
and it is clear that the singular point (0, 0) is a node. Hence we have multiplicative
reduction.
We conclude that the conductor of E has the form
\[
\operatorname{cond}(E) = 2^{f_2} \cdot \prod_{\substack{p \mid abc \\ p \ne 2}} p,
\]
where $f_2 = 0$ if E has good reduction at p = 2 and $f_2 = 1$ if E has multiplicative
reduction at p = 2. In the case of additive reduction at p = 2, we can only
say that $2 \le f_2 \le v_2(\Delta)$.

Now we are ready to state the generalized Szpiro conjecture:


Conjecture 12.5.11. Let E be an elliptic curve over Q with minimal Weierstrass
equation (12.10) on page 425 over Z and let ε > 0. Then
\[
\max(|\Delta|, |c_4|^3) \ll_{\varepsilon} \operatorname{cond}(E)^{6+\varepsilon},
\]
where the constant involved in the inequality is independent of E.

Theorem 12.5.12. The following conjectures are equivalent:

(a) strong abc-conjecture in 12.2.2 over Q ;


(b) strong Hall conjecture in 12.5.3;
(c) generalized Szpiro conjecture in 12.5.11.

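Proof: (a) ⇒ (b). Let $\Gamma := \mathrm{GCD}(x^3, y^2)$. Applying the strong abc-conjecture to
the relation $(x^3/\Gamma) + (-y^2/\Gamma) = z/\Gamma$, we get
\[
\max(|x|^3, |y|^2) \ll_{\varepsilon} \Bigl( \Gamma \operatorname{rad}\Bigl( \frac{x^3 y^2 z}{\Gamma^3} \Bigr) \Bigr)^{1+\varepsilon}. \tag{12.19}
\]
We claim that
\[
\Gamma \operatorname{rad}\Bigl( \frac{x^3 y^2 z}{\Gamma^3} \Bigr) \le |xy| \operatorname{rad}(z). \tag{12.20}
\]
For the proof, it is enough to show that
\[
v_p(\Gamma) + v_p\Bigl( \operatorname{rad}\Bigl( \frac{x^3 y^2 z}{\Gamma^3} \Bigr) \Bigr) \le v_p(x) + v_p(y) + v_p(\operatorname{rad}(z)) \tag{12.21}
\]
holds for every prime p. This will be done by checking the following three cases: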
First case: 3vp (x) < 2vp (y)
Then x3 − y 2 = z gives 3vp (x) = vp (Γ) = vp (z) and (12.21) reads as
3vp (x) + 1 ≤ vp (x) + vp (y) + χ(vp (x)), (12.22)
where χ(t) = 0 if t ≤ 0 and χ(t) = 1 if t > 0. Since Γ is free from sixth
powers, only the cases vp (x) = 0, 1 occur and (12.22) obviously holds.
Second case: 2vp (y) < 3vp (x)
Similarly as in the first case, we have 2vp (y) = vp (Γ) = vp (z) and (12.21) is
equivalent to
2vp (y) + 1 ≤ vp (x) + vp (y) + χ(vp (y)).
Checking vp (y) = 0, 1, 2, this is true.
Third case: 3vp (x) = 2vp (y)
Since Γ is free from sixth powers, we have vp (x) = vp (y) = 0 and (12.21) holds.
Now we deduce the strong Hall conjecture from (12.19) and (12.20). We conclude
that
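\[
|x|^3 \ll_{\varepsilon} |xy|^{1+\varepsilon} \operatorname{rad}(z)^{1+\varepsilon}, \tag{12.23}
\]
and
\[
|y|^2 \ll_{\varepsilon} |xy|^{1+\varepsilon} \operatorname{rad}(z)^{1+\varepsilon}. \tag{12.24}
\]
By combining (12.23) and (12.24), we get
\[
|xy|^6 \ll_{\varepsilon} |xy|^{5(1+\varepsilon)} \operatorname{rad}(z)^{5(1+\varepsilon)}.
\]
By passing to a sufficiently small ε, as we are allowed to do, we get
\[
|xy| \ll_{\varepsilon} \operatorname{rad}(z)^{5+\varepsilon}.
\]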

Inserting this in (12.23) and (12.24), we get (b).


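(b) ⇒ (c). We choose a global minimal Weierstrass equation for E. Let
\[
\Gamma := \mathrm{GCD}(c_4^3, c_6^2) = \Gamma' g^6
\]
with Γ' sixth-power free. By (12.11) on page 426, we have the relation
\[
1728\, \frac{\Delta}{g^6} = \Bigl( \frac{c_4}{g^2} \Bigr)^3 - \Bigl( \frac{c_6}{g^3} \Bigr)^2.
\]
Applying the strong Hall conjecture, we obtain
\[
\max(|c_4|^3, |c_6|^2) \ll_{\varepsilon} g^6 \operatorname{rad}(\Delta/g^6)^{6+\varepsilon}.
\]
By Corollary 12.5.7, we deduce $g \ll \operatorname{rad}(g) \le \operatorname{rad}(\Gamma)$ and hence
\[
\max(|c_4|^3, |c_6|^2) \ll_{\varepsilon} (\operatorname{rad}(\Delta) \operatorname{rad}(\Gamma))^{6+\varepsilon}. \tag{12.25}
\]
We have already remarked in 12.5.5 and 12.5.9 that if p ≠ 2, 3 and p divides both
$c_4$ and $c_6$ then E has additive reduction at p and $f_p \ge 2$. Hence 12.5.9 leads to
\[
\operatorname{rad}(\Delta) \operatorname{rad}(\Gamma) \le 6 \operatorname{cond}(E),
\]
proving what we want.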
(c) ⇒ (a): Let a, b, c ∈ Z be coprime with a + b = c and let us consider the
Frey curve E from Example 12.5.10 with affine equation
\[
y^2 = x(x + a)(x - b).
\]
First, we assume that 16 ∤ abc. Then (12.17) on page 429 is the global minimal
Weierstrass equation for E and the generalized Szpiro conjecture leads to
\[
\max\bigl( 16^3 |a^2 + ab + b^2|^3,\ 16 |abc|^2 \bigr) \ll_{\varepsilon} \operatorname{cond}(E)^{6+\varepsilon}. \tag{12.26}
\]
By $16^3 \nmid \Delta$, our considerations in Example 12.5.10 show that
\[
\operatorname{cond}(E) < 16^3 \prod_{\substack{p \mid abc \\ p \ne 2}} p.
\]
Taking this into account in (12.26), we easily deduce that
\[
\max(|a|, |b|, |c|) \ll_{\varepsilon} \operatorname{rad}(abc)^{1+\varepsilon}
\]
as claimed by the strong abc-conjecture.
Finally, we assume 16|abc. By permuting a, b, c, we may assume a ≡ 1 (mod 4)
and b ≡ 0 (mod 16). Then (12.18) on page 430 is the global minimal Weierstrass
equation for E and the generalized Szpiro conjecture gives
\[
\max\Bigl( |a^2 + ab + b^2|^3,\ \Bigl| \frac{abc}{16} \Bigr|^2 \Bigr) \ll_{\varepsilon} \operatorname{cond}(E)^{6+\varepsilon}.
\]

Since E has multiplicative reduction at all primes p|∆, we have
\[
\operatorname{cond}(E) = \operatorname{rad}(\Delta) \le \prod_{p \mid abc} p
\]
and, as in the first case, we get (a).


Remark 12.5.13. In order to understand the following remark, the reader should be
familiar with the proper minimal model C of the elliptic curve E over Q and the Kodaira
classification of special fibres (see [286], Ch.IV). The proof of the implication (a) ⇒ (b)
yields a little more than the strong Hall conjecture and suggests that the generalized Szpiro
conjecture may have a slightly stronger formulation in which the conductor is replaced by
$\prod_p p^{f_p'}$, where p ranges over all primes ≥ 5 and where $f_p' = 0$ if E has good reduction
at p, where $f_p' = 1$ if E has multiplicative reduction or the special fibre of C has Kodaira
type II, III, IV at p, and where $f_p' = 2$ if the special fibre of C has Kodaira type $\mathrm{I}^*_\nu$, $\mathrm{II}^*$,
$\mathrm{III}^*$ or $\mathrm{IV}^*$.
In order to see this, we note first that $f_p$ is bounded by $2 + 3v_p(3) + 6v_p(2)$ for any prime p
(see [286], Th.IV.10.4), hence we may neglect the primes 2 and 3. Then replacing rad(Γ)
in (12.25) by the stronger rad(g), we conclude that for primes p ≥ 5 the exponent 2 in
the conductor is only needed when $p^6 \mid \Gamma$. This is equivalent to the statement that the special
fibre of C has Kodaira type $\mathrm{I}^*_\nu$, $\mathrm{II}^*$, $\mathrm{III}^*$, or $\mathrm{IV}^*$ at p (use Tate’s algorithm in [286], IV,
§9).

In a similar vein to the Hall conjecture, the Hall–Lang–Waldschmidt–Szpiro


conjecture claims the following:
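Conjecture 12.5.14. Let A, B ∈ Z\{0}, m, n ∈ N\{0, 1}. Then for all x, y ∈ Z
with x and y coprime and
\[
A x^m + B y^n = z \ne 0
\]
and for every ε > 0, it holds
\[
|x|^{mn - m - n} \ll_{\varepsilon} \operatorname{rad}(z)^{n+\varepsilon}
\]
and
\[
|y|^{mn - m - n} \ll_{\varepsilon} \operatorname{rad}(z)^{m+\varepsilon}.
\]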
12.5.15. We leave to the reader the easy task of deducing Conjecture 12.5.14
from the abc-conjecture. We may just follow the arguments proving Theorem
12.5.12(b) from (12.19) and (12.20) on page 432.
Conversely, Conjecture 12.5.14 for the equation $x^3 - y^2 = z$ implies the strong
abc-conjecture. For the proof we use the identity
$$(a^2 + ab + b^2)^3 - \bigl((b - a)(a + 2b)(2a + b)/2\bigr)^2 = 3^3 \bigl(ab(a + b)/2\bigr)^2.$$
Since a, b are coprime, we see that $x := a^2 + ab + b^2$ and $y := (b - a)(a + 2b)(2a + b)/2$
have no common prime divisors ≠ 3. If 3 | b − a, then we may
divide the identity by $3^3$ to get coprime x and y. Then Conjecture 12.5.14 gives
$$a^2 + ab + b^2 \ll_\varepsilon \operatorname{rad}(abc)^{2+\varepsilon}$$
proving immediately the strong abc-conjecture for c = a + b .
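The identity used in 12.5.15 can be checked numerically. The following is only a sketch in exact integer arithmetic; the function names `hall_triple` and `is_power_of_three` are illustrative and do not come from the text. It also tests the statement that x and y share no prime divisor other than 3 when a and b are coprime.

```python
from math import gcd

def hall_triple(a, b):
    """Return (x, y, w) with x^3 - y^2 = 27 * w^2 for integers a, b."""
    x = a * a + a * b + b * b
    y2 = (b - a) * (a + 2 * b) * (2 * a + b)  # twice y; always even
    w2 = a * b * (a + b)                      # twice w; always even
    assert y2 % 2 == 0 and w2 % 2 == 0
    return x, y2 // 2, w2 // 2

def is_power_of_three(n):
    """True if the positive integer n is 3^k for some k >= 0."""
    while n % 3 == 0:
        n //= 3
    return n == 1

def check_range(bound):
    """Check the identity and the gcd statement for coprime 1 <= a, b <= bound."""
    for a in range(1, bound + 1):
        for b in range(1, bound + 1):
            if gcd(a, b) != 1:
                continue
            x, y, w = hall_triple(a, b)
            if x**3 - y**2 != 27 * w**2:
                return False
            if not is_power_of_three(gcd(x, y)):
                return False
    return True
```

For instance, `hall_triple(1, 2)` gives x = 7, y = 10, w = 3, and indeed 343 − 100 = 243 = 27 · 9.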
Remark 12.5.16. The conjecture implies that the generalized Fermat equation
$Ax^m + By^n = Cz^p$ has only finitely many solutions in coprime integers x,
y, z provided ABC ≠ 0 and $\frac1m + \frac1n + \frac1p < 1$. In the next section, we will
show that this is actually a consequence of Faltings’s theorem. It has also been
conjectured that for fixed A, B, C there will be no non-trivial solutions (namely,
satisfying a certain necessary coprimality condition) for sufficiently large m , n ,
and p , although this remains open.
If A = B = C = 1, we quickly find the solutions
$$1 + 2^3 = 3^2, \quad 2^5 + 7^2 = 3^4, \quad 7^3 + 13^2 = 2^9,$$
$$2^7 + 17^3 = 71^2, \quad 3^5 + 11^4 = 122^2.$$
However, Beukers did a more thorough computer search and found the amazing
examples
$$17^7 + 76271^3 = 21063928^2, \quad 1414^3 + 2213459^2 = 65^7,$$
$$43^8 + 96222^3 = 30042907^2, \quad 33^8 + 1549034^2 = 15613^3$$
to which Zagier added
$$9262^3 + 15312283^2 = 113^7.$$
These five large solutions look unusual from the point of view of diophantine equa-
tions. Their respective abc-ratios log c/ log rad(abc) are approximately
1.14125, 1.12221, 1.06109, 1.05700, 1.08836.
These values, fairly close to 1, are consistent with the strong abc-conjecture pre-
dicting that 1 is the largest accumulation point of abc-ratios.
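The solutions listed in Remark 12.5.16 can be confirmed with exact integer arithmetic. The sketch below is illustrative only: the names `SOLUTIONS`, `is_valid`, `rad`, and `abc_ratio` are not from the text, and the exponent 7 attached to the trivial term 1 in the first solution is an arbitrary choice made here so that the exponent condition holds. The radical is computed by naive trial division, which is adequate for numbers of this size.

```python
from fractions import Fraction
from math import gcd, log

# Each entry (x, p, y, q, z, r) stands for x^p + y^q = z^r.
SOLUTIONS = [
    (1, 7, 2, 3, 3, 2),            # exponent of 1 chosen arbitrarily
    (2, 5, 7, 2, 3, 4),
    (7, 3, 13, 2, 2, 9),
    (2, 7, 17, 3, 71, 2),
    (3, 5, 11, 4, 122, 2),
    (17, 7, 76271, 3, 21063928, 2),
    (1414, 3, 2213459, 2, 65, 7),
    (43, 8, 96222, 3, 30042907, 2),
    (33, 8, 1549034, 2, 15613, 3),
    (9262, 3, 15312283, 2, 113, 7),
]

def is_valid(x, p, y, q, z, r):
    """Check the equation, coprimality, and 1/p + 1/q + 1/r < 1."""
    exact = x**p + y**q == z**r
    coprime = gcd(gcd(x, y), z) == 1
    hyperbolic = Fraction(1, p) + Fraction(1, q) + Fraction(1, r) < 1
    return exact and coprime and hyperbolic

def rad(n):
    """Product of the distinct primes dividing n (trial division)."""
    n = abs(n)
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            result *= d
            while n % d == 0:
                n //= d
        d += 1
    return result * n if n > 1 else result

def abc_ratio(x, p, y, q, z, r):
    """log c / log rad(abc) for a = x^p, b = y^q, c = z^r."""
    return log(z**r) / log(rad(x * y * z))
```

Applying `abc_ratio` to the five large solutions gives values that can be compared with the ratios quoted above.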

12.6. The generalized Fermat equation

By Fermat’s last theorem, the Fermat equation $x^n + y^n = z^n$ has no solutions in Z \ {0}
for n ≥ 3. More generally, we may consider the generalized Fermat equation
$$Ax^p + By^q = Cz^r$$
for given parameters A, B, C ∈ Z \ {0}. In this section, we prove a result of H. Darmon
and A. Granville ([78]) that if $\frac1p + \frac1q + \frac1r < 1$, then the generalized Fermat equation has
only finitely many coprime solutions (meaning that GCD(x, y, z) = 1).
For p = q = r ≥ 4, the generalized Fermat equation describes a smooth projective curve
of genus $\frac12 (p - 1)(p - 2) \geq 3$ (see A.13.4), hence Faltings’s theorem implies immediately
the claim. For p = q = r = 3, we get an elliptic curve and we may easily construct
examples with infinitely many coprime solutions by essentially doubling the points (see
[78], §6).
In the general case, the argument is more involved because the equation no longer describes
a projective curve. The idea of Darmon and Granville is to consider a finite Galois covering
$\pi : C \to \mathbb{P}^1$, which is unramified outside $\pi^{-1}\{0, 1, \infty\}$ and with ramification indices
p, q, r over 0, 1, ∞. The construction of π will use the topological and analytical means
from Section 12.3 and Belyĭ’s theorem will show that π is defined over a number field K.
By analysing the ramification of K(P)/K(β) for $\beta = Ax^p/(Cz^r)$ and $P \in \pi^{-1}(\beta)$,
we can deduce from Hermite’s theorem that the fibre points over β are rational over a fixed
number field. This may be seen as an extension of the Chevalley–Weil theorem to modestly
ramified coverings. By Hurwitz’s theorem, the genus of C will turn out to be ≥ 2 and
hence Faltings’s theorem will lead to the finiteness of the number of coprime solutions.
This completes the discussion in Remark 12.5.16.
In this section, we assume that the reader is familiar with Section 12.3.

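12.6.1. We begin by describing the fundamental group of $\mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}$.
Let $x_0 \in \mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}$ be the base point and let $\sigma_0 \in \pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}, x_0)$ be
represented by a path connecting $x_0$ with a point in a small neighbourhood of 0, then
turning on a positive circle around 0 and then going backwards along the same way to $x_0$.
In the same way, we define $\sigma_1, \sigma_\infty \in \pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}, x_0)$ turning around 1 and ∞,
respectively. Then we claim that $\pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}, x_0)$ is generated by $\sigma_0, \sigma_1, \sigma_\infty$ with
the single relation $\sigma_0 \sigma_1 \sigma_\infty = 1$.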
In order to prove this statement, we may identify $\mathbb{P}^1_{\mathrm{an}}$ with the Riemann sphere. We choose
a closed disc D in $\mathbb{C}$ containing 0, 1. Topologically, the closure E of $\mathbb{P}^1_{\mathrm{an}} \setminus D$ is also
a disc and D ∩ E is a circle. We choose the base point $x_0 \in D \cap E$ getting natural
homomorphisms $\varphi_D, \varphi_E$ from $\pi_1(D \cap E, x_0)$ to $\pi_1(D \setminus \{0, 1\}, x_0)$ and $\pi_1(E \setminus \{\infty\}, x_0)$,
respectively. Then van Kampen’s theorem (see H. Seifert and W. Threlfall [274], §52,
Th.1) says that $\pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}, x_0)$ is the free product of $\pi_1(D \setminus \{0, 1\}, x_0)$ and
$\pi_1(E \setminus \{\infty\}, x_0)$ modulo the relations $\varphi_D(\gamma) = \varphi_E(\gamma)$ with γ varying over generators
of $\pi_1(D \cap E, x_0)$.
Clearly, we have $\pi_1(E \setminus \{\infty\}, x_0) = \sigma_\infty^{\mathbb{Z}}$ and another application of van Kampen’s theorem
shows that $\pi_1(D \setminus \{0, 1\}, x_0)$ is the free group on the set $\{\sigma_0, \sigma_1\}$. Then, if γ denotes
the boundary of D in positive direction, we have $\pi_1(D \cap E, x_0) = \gamma^{\mathbb{Z}}$ and the relations in
van Kampen’s theorem are $\gamma = \sigma_0 \sigma_1 = \sigma_\infty^{-1}$, proving the claim.

12.6.2. Our goal is to construct a ramified covering $\pi : C_{\mathrm{an}} \to \mathbb{P}^1_{\mathrm{an}}$ for a connected
Riemann surface $C_{\mathrm{an}}$ which is unramified outside $\pi^{-1}\{0, 1, \infty\}$. By 12.3.4 and 12.3.8,
such a covering corresponds to the subgroup $H = \pi_1(C_{\mathrm{an}} \setminus \pi^{-1}\{0, 1, \infty\}, y_0)$ of
$G = \pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}, x_0)$, where the base point $y_0$ lies over $x_0$. We would like to have a
Galois covering, which means that H is a normal subgroup. It follows easily from 12.3.5
and 12.3.8 that the covering group G/H also acts transitively on the ramified fibres of π.
For $x \in C_{\mathrm{an}}$, we denote by $e_{x/\pi(x)}$ the ramification index of the corresponding valuations,
which is equal to the multiplicity of the fibre divisor $\pi^*([\pi(x)])$ in x (see Example
1.4.12 for an algebraic discussion). The ramification index should not be confused with the
multiplicity of the ramification divisor at x, which is $e_{x/\pi(x)} - 1$.
Lemma 12.6.3. Let π be a covering as in 12.6.2. If π is finite, then $e_{x/\pi(x)}$ is equal to
the cardinality of the stabilizer $(G/H)_x$ of x.
Proof: The argument may be done complex analytically or complex algebraically by A.14.7.
Since the covering group G/H acts transitively on the fibre $\pi^{-1}(y)$ for any $y \in \mathbb{P}^1_{\mathrm{an}}$, the
ramification index $e_{x/\pi(x)}$ depends only on π(x). By 12.3.6, the degree of π is equal to
[G : H]. By Example 1.4.12, we have $\deg(\pi) = m \, e_{x/\pi(x)}$, where m is the cardinality of
the set $\pi^{-1}(y)$. Since $m = [G : H]/|(G/H)_x|$, we get the claim.
12.6.4. In hyperbolic geometry, we easily show that the disc D = {|z| < 1} contains a
geodesic triangle with angles $\frac{\pi}{p}, \frac{\pi}{q}, \frac{\pi}{r}$. This means that the sides are contained in circles
perpendicular to {|z| = 1} and the sum of angles is $\pi(\frac1p + \frac1q + \frac1r) < \pi$, leading to hyperbolicity.
Moreover, reflecting the triangles successively at their sides, we get a tessellation
of D. By the Riemann mapping theorem ([8], Ch.6, Th.s 1-4), the interior of the original
geodesic triangle may be mapped biholomorphically onto the upper half plane $\{\Im(w) > 0\}$
and the boundary is mapped to the boundary. Moreover, by Schwarz’s reflection principle,
we get an extension to a holomorphic map Ω : D → C. The multivalued inverse may be
described explicitly by a quotient of hypergeometric functions. For details, we refer to C.
Carathéodory [57], Vol.II, Part 7, Ch.III.
By the description above, it is clear that Ω is ramified only over points over 0, 1, ∞ with
ramification indices p, q, and r, respectively. Let $P_0, P_1, P_\infty$ be the vertices of the original
geodesic triangle ∆ lying over 0, 1, ∞. Let $\tau_\infty$ be the hyperbolic reflection at the side
through $P_0$ and $P_1$. Similarly, we denote the reflections at the other sides of ∆ by $\tau_0, \tau_1$
and we set $\gamma_0 := \tau_1 \circ \tau_\infty$, $\gamma_1 := \tau_\infty \circ \tau_0$, $\gamma_\infty := \tau_0 \circ \tau_1$. These are hyperbolic rotations
around the vertices $P_0, P_1, P_\infty$ with angles $\frac{2\pi}{p}, \frac{2\pi}{q}, \frac{2\pi}{r}$. So we have the relations
$$(\gamma_0)^p = (\gamma_1)^q = (\gamma_\infty)^r = \gamma_0 \gamma_1 \gamma_\infty = 1 \tag{12.27}$$
in the subgroup Γ of the automorphism group of D generated by $\gamma_0, \gamma_1, \gamma_\infty$.
By construction, it is easy to see that Γ is the covering group of Ω. We call Γ a triangle group.
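The product relation in (12.27) follows from the fact that every reflection is an involution. Writing the three rotations in terms of the reflections at the sides of ∆, and assuming only $\tau_0^2 = \tau_1^2 = \tau_\infty^2 = 1$, the computation can be spelled out as follows:

```latex
\begin{align*}
\gamma_0 \gamma_1 \gamma_\infty
  &= (\tau_1 \tau_\infty)(\tau_\infty \tau_0)(\tau_0 \tau_1) \\
  &= \tau_1 \, (\tau_\infty^2) \, (\tau_0^2) \, \tau_1 \\
  &= \tau_1^2 = 1 .
\end{align*}
```

The relations $(\gamma_0)^p = (\gamma_1)^q = (\gamma_\infty)^r = 1$ express that a rotation by $2\pi/p$, $2\pi/q$, or $2\pi/r$ has order p, q, or r, respectively.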
12.6.5. We can show that Γ is the free group on the set $\{\gamma_0, \gamma_1, \gamma_\infty\}$ modulo the relations
(12.27) (see S. Katok [159], Sec.4.3). It follows from 12.3.4 and 12.3.5 that a covering
$\pi : C_{\mathrm{an}} \to \mathbb{P}^1_{\mathrm{an}}$ as in 12.6.2 has ramification indices over 0, 1, ∞ dividing p, q, and r,
respectively, if and only if π factors through Ω. We will use this only as a motivation to
consider suitable normal subgroups of finite index in Γ to get our desired finite covering
π of ramification indices p, q, and r. This will be provided by Fox’s theorem solving a
conjecture of Fenchel.
Theorem 12.6.6. The triangle group Γ has a torsion-free normal subgroup H of finite
index.

For the quite elementary algebraic proof, we refer to the original article R.H. Fox [121].
Corollary 12.6.7. There is a compact connected Riemann surface $C_{\mathrm{an}}$ and a finite Galois
covering $\pi : C_{\mathrm{an}} \to \mathbb{P}^1_{\mathrm{an}}$ which is unramified outside of $\pi^{-1}\{0, 1, \infty\}$ and which has
ramification indices p, q, r at the points over 0, 1, and ∞, respectively.
Proof: By 12.3.4 and 12.3.5, the normal subgroup H from Fox’s theorem lifts to a normal
subgroup of $\pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus \{0, 1, \infty\}, x_0)$ leading to a ramified Galois covering $\pi : C_{\mathrm{an}} \to \mathbb{P}^1_{\mathrm{an}}$.
From 12.3.6, we deduce that π is finite and deg(π) = [Γ : H]. By construction, π
is unramified outside of $\pi^{-1}\{0, 1, \infty\}$. The geometric description of Γ given in 12.6.4
shows that Γ contains an element $\gamma_0$ of order p. Since H is torsion free, we conclude
that Γ/H also contains an element $\overline{\gamma}_0$ of order p. By 12.3.4 and 12.3.8, there is a ramified
covering ϕ : D → $C_{\mathrm{an}}$ such that Ω = π ◦ ϕ. By 12.3.5, it is clear that Γ/H is the covering
group of π. By construction, $\overline{\gamma}_0$ is in the stabilizer of $\varphi(P_0)$.
Thus Lemma 12.6.3 yields that p divides the ramification index $e_{\varphi(P_0)/0}$. On the other
hand, we have $e_{\varphi(P_0)/0} \mid e_{P_0/0} = p$, hence $e_{\varphi(P_0)/0} = p$. We argue at points over 1 and
∞ much in the same way, proving the claim.
12.6.8. By Belyĭ’s theorem and its proof (see Theorem 12.3.11), there is a number field K
and a morphism $\pi : C \to \mathbb{P}^1_K$ from an irreducible smooth projective curve C over K such
that the ramified covering from Corollary 12.6.7 is induced by base change. We are free to
enlarge K , hence we may assume that all fibre points over 0, 1, ∞ are K -rational. Since
we are in characteristic 0 , local parameters do not change under field extension (use A.4.6)
and hence the ramification points of C lie over 0, 1, ∞ with ramification indices p , q , and
r , respectively.
Proposition 12.6.9. The genus of C satisfies g(C) ≥ 2 .
Proof: The genus is invariant under base extension (use A.10.28), so we may work over the
complex numbers. Since the covering is Galois, the covering group operates transitively on
the fibres (see 12.6.2) and the ramification indices are the same in a given fibre. By Example
1.4.12, the number of fibre points over 0, 1, ∞ is equal to $\frac dp$, $\frac dq$, and $\frac dr$, respectively, where
d is the degree of the covering morphism π. Then Hurwitz’s theorem (see Theorem B.4.6)
yields
$$2 - 2g(C) = 2d - \frac dp (p - 1) - \frac dq (q - 1) - \frac dr (r - 1) = d\left(\frac1p + \frac1q + \frac1r - 1\right) < 0$$
proving the claim.
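The Hurwitz computation in the proof of Proposition 12.6.9 is easy to evaluate for concrete data. The sketch below uses exact rational arithmetic; the function name `genus_of_galois_cover` is illustrative, and the degrees in the examples are assumptions made here (they are the orders of well-known covering groups, not values taken from the text).

```python
from fractions import Fraction

def genus_of_galois_cover(d, p, q, r):
    """Genus g determined by 2 - 2g = d * (1/p + 1/q + 1/r - 1).

    Models a degree-d Galois covering of the projective line that is
    ramified only over 0, 1, infinity with indices p, q, r.
    """
    for e in (p, q, r):
        if d % e != 0:
            raise ValueError("each ramification index must divide the degree")
    euler = d * (Fraction(1, p) + Fraction(1, q) + Fraction(1, r) - 1)
    g = 1 - euler / 2
    if g.denominator != 1:
        raise ValueError("no covering with these data: genus is not an integer")
    return int(g)
```

For example, `genus_of_galois_cover(16, 4, 4, 4)` returns 3, in agreement with the genus (p − 1)(p − 2)/2 of the Fermat quartic, and `genus_of_galois_cover(168, 2, 3, 7)` also returns 3. In the spherical case `genus_of_galois_cover(4, 2, 2, 2)` the result is 0, which shows why the hypothesis 1/p + 1/q + 1/r < 1 is needed to force g ≥ 2.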
12.6.10. Now we use the language of schemes to construct an irreducible projective scheme
$\mathcal{C}$ over $O_K$ with generic fibre $C = \mathcal{C}_K$. This can be done by using a set of homogeneous
equations with coefficients in $O_K$ describing C in projective space $\mathbb{P}^n_K$. This leads to a
closed subscheme of $\mathbb{P}^n_{O_K}$ and we choose $\mathcal{C}$ to be the irreducible component containing
C.
The morphism π does not necessarily extend to a morphism $\mathcal{C} \to \mathbb{P}^1_{O_K}$; its domain is
an open subset containing the generic fibre. Hence there is a closed finite subset S of
Spec($O_K$) such that π extends to a morphism $\pi : \mathcal{C}_S \to \mathbb{P}^1_{O_{S,K}}$ over $O_{S,K}$, where $O_{S,K}$
is the ring of S-integers in K and $\mathcal{C}_S$ is the part of $\mathcal{C}$ lying over Spec($O_{S,K}$). Since
the set of smooth points of $\mathcal{C}_S$ is open and contains C (use [137], Th.17.5.1), the same
argument shows that we may assume, after enlarging S, that $\mathcal{C}_S$ is a smooth scheme over
$O_{S,K}$. This means that C has good reduction over the non-archimedean places outside of
S. We may also assume that no place outside of S divides pqr.
The set of unramified points of $\mathcal{C}_S$ with respect to π is open (see B.3.2) and does not contain
the points over 0, 1, ∞ of the generic fibre C. Hence the ramified points are contained in
the closure of $\pi^{-1}\{0, 1, \infty\}$ in $\mathcal{C}_S$ and in finitely many fibres of $\mathcal{C}$ over closed points of
Spec($O_{S,K}$). By enlarging S again, we may exclude the latter.
For two points in C, their closures in $\mathcal{C}$ intersect at most in finitely many points lying over
the closed points of Spec($O_K$). Again, we may assume that the closures in $\mathcal{C}_S$ of different
fibre points over 0 are disjoint and that the same holds for the fibres over 1 and ∞.
12.6.11. A point P ∈ C may be viewed as an $O_{K(P)}$-integral point of $\mathcal{C}$ by the valuative
criterion of properness ([148], Th.II.4.7). This means that there is a unique morphism
$\overline{P} : \operatorname{Spec}(O_{K(P)}) \to \mathcal{C}$ mapping the generic point {0} to P. For a non-archimedean place w
of K(P) with maximal ideal $\mathfrak{m}_w \in \operatorname{Spec}(O_{K(P)})$, let $P(w) := \overline{P}(\mathfrak{m}_w)$. We suppose that
the restriction v of w to K is not contained in S and that P(w) is a k(v)-rational point of
the reduction $\mathcal{C}_{k(v)}$. Since C has good reduction in v, the reduction is a smooth curve over
k(v) and hence we have a local parameter $\tilde\zeta$ in the discrete valuation ring $\mathcal{O}_{\mathcal{C}_{k(v)}, P(w)}$.
Let ζ be a lift to $\mathcal{O}_{\mathcal{C}, P(w)}$.
Lemma 12.6.12. Under the assumptions above and if we identify the completion $K_v$ with
a subfield of the completion $K(P)_w$, then $K(P)_w = K_v(\zeta(P))$.
Proof: We need to show that K(ζ(P)) is dense in K(P) with respect to w. Let $\pi_v$ be a
local parameter in the valuation ring $R_v$ of K, let $\pi_w$ be similarly defined for w, and let
N ∈ N.
First step: For every $a \in \mathcal{O}_{\mathcal{C}, P(w)}$, there is $p(x) \in R_v[x]$ with
$$a - p(\zeta) \in [\pi_v, \zeta]^N \, \mathcal{O}_{\mathcal{C}, P(w)}.$$
To prove the first step, we proceed by induction on N. For N = 0, there is nothing to
prove. Now let N ≥ 1. Since $\tilde\zeta$ is a local parameter in the smooth k(v)-rational point
P(w), there is a polynomial $q(x) \in R_v[x]$ with reduction $\tilde q(x) \in k(v)[x]$ such that
$\tilde a \equiv \tilde q(\tilde\zeta) \pmod{\tilde\zeta^N}$. By lifting, there is $b \in \mathcal{O}_{\mathcal{C}, P(w)}$ with
$$a \equiv q(\zeta) + b\pi_v \pmod{\zeta^N}.$$
By induction applied to b, we get the first step.
To prove the lemma, we note that the image of $\mathcal{O}_{\mathcal{C}, P(w)}$ under the evaluation map at P
generates K(P) as a field. Using $\zeta(P) \equiv 0 \pmod{\pi_w}$ and the first step, we conclude
that every element in K(P) may be approximated by rational functions in ζ(P) proving
the claim.
Lemma 12.6.13. Let $|\ |_v$ be a complete non-archimedean absolute value on the field F
and let k ∈ N be coprime to the characteristic of the residue field of v. For α ∈ F with
$|\alpha|_v < 1$, let u := 1 + α. Then there is λ ∈ F with $u = \lambda^k$.
Proof: Formally, we have the identity
$$1 + x = \Bigl(\sum_{n=0}^{\infty} \binom{1/k}{n} x^n\Bigr)^k$$
of power series with coefficients in Q. It is easy to see that the coefficients $\binom{1/k}{n}$ are indeed
in $\mathbb{Z}[\frac1k]$. Since k is a unit in $R_v$, we conclude that the identity also holds for power series
with coefficients in $R_v$. Replacing x by α, the power series is convergent and we get a
k th root of u in F.
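The two facts used in the proof of Lemma 12.6.13 can be tested on a truncation of the binomial series. The following sketch works with exact fractions; the helper names are illustrative and the truncation length is an arbitrary choice. It checks that the k-th power of the truncated series agrees with 1 + x up to the truncation order, and that the denominators of the coefficients involve only primes dividing k.

```python
from fractions import Fraction
from math import gcd

def binom_coeffs(k, n_terms):
    """Coefficients binom(1/k, n) for n = 0, ..., n_terms - 1."""
    coeffs = [Fraction(1)]
    for n in range(1, n_terms):
        # binom(a, n) = binom(a, n - 1) * (a - (n - 1)) / n with a = 1/k
        coeffs.append(coeffs[-1] * (Fraction(1, k) - (n - 1)) / n)
    return coeffs

def power_truncated(coeffs, k):
    """k-th power of the series, truncated to len(coeffs) terms."""
    n_terms = len(coeffs)
    result = [Fraction(1)] + [Fraction(0)] * (n_terms - 1)
    for _ in range(k):
        new = [Fraction(0)] * n_terms
        for i, a in enumerate(result):
            if a == 0:
                continue
            for j, b in enumerate(coeffs):
                if i + j >= n_terms:
                    break
                new[i + j] += a * b
        result = new
    return result

def only_primes_of(k, denom):
    """True if every prime factor of denom divides k."""
    g = gcd(denom, k)
    while g > 1:
        while denom % g == 0:
            denom //= g
        g = gcd(denom, k)
    return denom == 1
```

With k = 3 the first coefficients are 1, 1/3, −1/9, 5/81, so the denominators are powers of 3, as the lemma requires for the series to make sense over $R_v$ when k is a unit there.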
The following result of S. Beckmann [19] is the key to the proof of finiteness of the number
of solutions of the generalized Fermat equation.
Lemma 12.6.14. Let $\pi : C \to \mathbb{P}^1_K$ be the covering from 12.6.8 and let S be as in 12.6.10.
We identify the place v ∉ S with the corresponding discrete valuation normalized by
$v(O_K) = \mathbb{Z}$ and let $v^+ := \max(0, v)$. Then for $Q \in \mathbb{P}^1(K) \setminus \{0, 1, \infty\}$ with affine
coordinate β satisfying
$$v^+(\beta) \equiv 0 \pmod p, \quad v^+(\beta - 1) \equiv 0 \pmod q, \quad v^+(\tfrac1\beta) \equiv 0 \pmod r,$$
the extension K(P)/K is unramified over v for every $P \in \pi^{-1}(Q)$.
Proof: First, we handle the case $v^+(\beta) = v^+(\beta - 1) = v^+(\frac1\beta) = 0$. Clearly, this implies
v(β) = v(β − 1) = 0 and hence we have $Q(v) \notin \{0(v), 1(v), \infty(v)\}$ for the reductions.
Let w be a place of K(P) over v. Since π(P) = Q, we get $\pi(P(w)) = Q(v) \notin
\{0(v), 1(v), \infty(v)\}$. By our considerations in 12.6.10, we conclude that π is unramified
in P(w) and hence K(P)/K is unramified over w (see Proposition B.3.6).
Now we assume that at least one congruence is not an equality. By a change of coordinates,
we may assume v(β) > 0. This means Q(v) = 0(v). Every point P ∈ C may be
viewed as an $O_{K(P)}$-integral point $\overline{P}$ of $\mathcal{C}$ (see 12.6.11) and hence as a prime divisor on
the smooth model $\mathcal{C}_{R_v}$ over the discrete valuation ring $R_v$. By standard facts for divisors
(see A.8 and [125], Ch.20 for the generalization over Dedekind rings), we have the identity
$$\pi^*(0) = \sum_{R \in \pi^{-1}(0)} p \cdot R \tag{12.28}$$
of divisors on $\mathcal{C}_{R_v}$. This follows from surjectivity of $\mathcal{C}_{k(v)}$ onto $\mathbb{P}^1_{k(v)}$, so only horizontal
components occur in $\pi^*(0)$ and hence the multiplicities may be computed in the generic
fibre C. For the reduction $\tilde\pi$ of π modulo v, we have the identity
$$\tilde\pi^*[0(v)] = \sum_{\tilde R \in \tilde\pi^{-1}(0(v))} e_{\tilde R/0(v)} \, [\tilde R] \tag{12.29}$$
of divisors on $\mathcal{C}_{k(v)}$. Pulling (12.28) back to $\mathcal{C}_{k(v)}$ and comparing with (12.29), we conclude
that reduction modulo v maps $\pi^{-1}(0)$ onto $\tilde\pi^{-1}(0(v))$. By our assumptions in
12.6.10, this reduction is also one-to-one and hence bijective. Moreover, the above comparison
shows that the ramification index $e_{\tilde R/0(v)}$ is equal to p. By our assumptions in 12.6.8,
all fibre points over 0 are K-rational and hence all fibre points over 0(v) are k(v)-rational.
Now we consider $P \in \pi^{-1}(Q)$ and a place w of K(P) over v. Since Q(v) = 0(v), the
above shows the k(v)-rationality of $P(w) \in \tilde\pi^{-1}(0(v))$. There is a unique $R \in \pi^{-1}(0)$
with R(v) = P(w). Let ζ be a local equation of the prime divisor R in R(v). Hence
$\zeta \in \mathcal{O}_{\mathcal{C}, P(w)}$ with reduction $\tilde\zeta$ equal to a local parameter of P(w) inside the smooth curve
$\mathcal{C}_{k(v)}$. Let x be the affine coordinate on $\mathbb{P}^1_K$, then x is a local equation for 0 and therefore
(12.28) shows
$$x = u\zeta^p \tag{12.30}$$
for a unit $u \in \mathcal{O}_{\mathcal{C}, P(w)}$.
Let L be the number field obtained from K by adjoining all p th roots of u(R). Since
by assumption the characteristic of k(v) does not divide p, the extension L/K is unramified
over v (Lemma B.2.6). By Proposition B.2.3, it is enough to show that L(P)/L is
unramified over v. Replacing K by L, we may assume that K contains all p th roots of
u(R).
We claim that $K_v(\beta^{1/p}) = K(P)_w$. Replacing u by u/u(R) and ζ by $u(R)^{1/p}\zeta$,
we may assume u(R) = 1. For the reduction $\tilde u$ of u modulo v, we have
$\tilde u(P(w)) = \tilde u(R(v)) = 1 \in k(v)$ and hence $|u(P) - 1|_w < 1$. If u has a p th root, then our claim
follows directly from (12.30) and Lemma 12.6.12. However, in general the p th root of the
unit u exists only as a v-adic analytic function and we need to modify the proof slightly.
First we note the formal identity
$$u^{1/p} = \sum_{n=0}^{\infty} \binom{1/p}{n} (u - 1)^n, \tag{12.31}$$
which becomes an equation in the w-adic topology by evaluating it at u = u(P) (see proof
of Lemma 12.6.13). The first step in the proof of Lemma 12.6.12 shows that u − 1 may
be expressed as a formal power series in ζ without the constant term and with coefficients
in the valuation ring $\hat R_v$ of the completion $K_v$. Inserting this in the right-hand side of
(12.31), and using (12.30), we get a formal identity
$$x^{1/p} = \zeta + \sum_{n=2}^{\infty} a_n \zeta^n \tag{12.32}$$
with coefficients $a_n \in \hat R_v$. Since $\tilde\zeta(P(w)) = 0 \in k(v)$, we have $|\zeta(P)|_w < 1$ and hence
(12.32) gives an identity in $K(P)_w$ by evaluating it at ζ = ζ(P). Now (12.32) implies
that
$$\zeta = x^{1/p} + \sum_{n=2}^{\infty} b_n x^{n/p} \tag{12.33}$$
for unique coefficients $b_n \in \hat R_v$. This follows from putting (12.33) into (12.32) and
comparison of coefficients. Inserting P in (12.32) and (12.33), completeness yields
$K_v(\beta^{1/p}) = K_v(\zeta(P))$ and hence Lemma 12.6.12 implies $K_v(\beta^{1/p}) = K(P)_w$.
By assumption the residue characteristic of v does not divide p and p|v(β) , hence Lemma
B.2.6 yields that K(P )w /Kv is unramified over v . Since residue fields do not change by
passing to completions (see Proposition 1.2.11), we have proved that K(P )/K is unrami-
fied over w . 
Now we are ready to prove the theorem of Darmon and Granville.
Theorem 12.6.15. Let p, q, r ∈ N \ {0} with $\frac1p + \frac1q + \frac1r < 1$ and let A, B, C ∈
Z \ {0}. Then there are only finitely many solutions $(x, y, z) \in \mathbb{Z}^3$ of the generalized
Fermat equation
$$Ax^p + By^q = Cz^r$$
with GCD(x, y, z) = 1.
Proof: Let $\pi : C \to \mathbb{P}^1_K$ be the covering from 12.6.8 and let S be the finite set of
non-archimedean places on K constructed in 12.6.10. By enlarging S, we may assume that it
contains all places dividing ABC. Let (x, y, z) be a coprime solution of the generalized
Fermat equation. We may assume xyz ≠ 0 excluding at most finitely many solutions. For
$$\beta := \frac{Ax^p}{Cz^r},$$
our assumptions on S and GCD(x, y, z) = 1 yield the congruences in Lemma 12.6.14.
Note that $Q := \beta \in \mathbb{P}^1(\mathbb{Q}) \setminus \{0, 1, \infty\}$ and Lemma 12.6.14 shows that K(P)/K is
unramified over v for all non-archimedean places v ∉ S of K and for all $P \in \pi^{-1}(Q)$.
By Example 1.4.12, we have [K(P) : K] ≤ deg(π). Hermite’s theorem implies that there
are only finitely many number fields in $\overline{K}/K$ of bounded degree and unramified outside
S (see Corollary B.2.15). Hence there is a number field L extending K containing K(P)
for all possible coprime solutions. Since g(C) ≥ 2 (see Proposition 12.6.9), Faltings’s
theorem (see Theorem 11.1.1) shows that C(L) is finite, thus proving that there are only
finitely many possibilities for β.
Let (x, y, z) and (x′, y′, z′) be coprime solutions in $(\mathbb{Z} \setminus \{0\})^3$ of the generalized Fermat
equation with β = β′. Let ℓ be a prime with ℓ-adic valuation $v_\ell$. Using coprimeness,
we get $v_\ell(x) = v_\ell(x')$ and $v_\ell(z) = v_\ell(z')$ for every ℓ not dividing ABC. Moreover,
for ℓ | ABC we have $\min\{v_\ell(x), v_\ell(z)\} \leq v_\ell(B)$ and hence only finitely many coprime
solutions give rise to the same β. This proves the claim.
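The congruences of Lemma 12.6.14 that are used in the proof above can be observed on an explicit solution. The sketch below is a simplification: it works over Q with ordinary ℓ-adic valuations instead of the places of the number field K, and the function names are illustrative. For a coprime solution and a prime ℓ not dividing ABC, it checks that the positive parts of the valuations of β, β − 1, and 1/β are divisible by p, q, and r.

```python
from fractions import Fraction

def val(n, ell):
    """ell-adic valuation of a nonzero integer n."""
    n = abs(n)
    v = 0
    while n % ell == 0:
        n //= ell
        v += 1
    return v

def val_plus(t, ell):
    """max(0, v_ell(t)) for a nonzero rational number t."""
    return max(0, val(t.numerator, ell) - val(t.denominator, ell))

def congruences_hold(A, B, C, x, y, z, p, q, r, ell):
    """Check the three congruences of the lemma at the prime ell."""
    assert A * x**p + B * y**q == C * z**r
    beta = Fraction(A * x**p, C * z**r)
    return (val_plus(beta, ell) % p == 0
            and val_plus(beta - 1, ell) % q == 0
            and val_plus(1 / beta, ell) % r == 0)
```

For the solution $2^5 + 7^2 = 3^4$ we get β = 32/81, β − 1 = −49/81, and 1/β = 81/32, so the only nonzero positive parts are $v_2^+(\beta) = 5$, $v_7^+(\beta - 1) = 2$, and $v_3^+(1/\beta) = 4$, which are divisible by 5, 2, and 4, respectively.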

12.7. Bibliographical notes

Szpiro’s conjecture, originally stated only for the discriminant and with an
undetermined exponent, was formulated at an exposition in Hannover in 1983.
Influenced by this conjecture and the theorem of Stothers–Mason, the abc-conjecture
came out from a discussion between Masser and Oesterlé in 1985. The original
purpose of both conjectures was to give new insights into Fermat’s last theorem.
N. Elkies [98] proved that the abc-conjecture implies Mordell’s conjecture, effec-
tively if an effective version of the abc-conjecture is available. He also mentioned
that his argument using Belyı̆’s lemma proves Vojta’s height inequality and hence
Roth’s theorem as a special case. This aspect of the abc-conjecture over arbitrary
number fields will be touched upon in Section 14.4. Proofs of Theorem 12.2.9
have also been given by Langevin in [177] and Oesterlé (in a seminar).
The implication (b) ⇒ (a) in Belyı̆’s theorem was well known to the experts, the
argument uses a specialization argument due to Weil. The input of Grothendieck
to Theorem 12.3.12 was the generalization of Riemann’s existence theorem. In
1979, G.V. Belyı̆ [21] proved the converse implication in Theorem 12.3.11 and
caused Grothendieck to remark that such a deep result was never proved in such
a simple and short way (see A. Grothendieck [138], where the study of dessins
d’enfant was initiated).
In 1981 W.W. Stothers [293] obtained the abc-theorem for polynomials; our first
proof follows essentially his arguments with Hurwitz’s genus formula. Indepen-
dently, Mason was inspired by an analogue of Baker’s theory of linear forms of log-
arithms for function fields and found, in 1983, the generalization given in 12.4.3,
which is a major tool in his studies of diophantine equations over function fields
(R.C. Mason [192], [193]).
The idea of the second proof appears, in a special case, in a Comptes Rendus
note of A. Korkine in 1880 (see [89], Vol.II, p.750). The general abc-theorem for
polynomials is a special case of a result of D. Brownawell and D. Masser [52],
where also the first example in Remark 12.4.5 appears and, independently, of J.F.
Voloch [319]. Granville’s remark at the end of Section 12.4 is unpublished, so it
remains to the reader to fill in the necessary details.
The equivalence of the various conjectures in Section 12.5 is given by Vojta [307],
Ch.5, App.ABC. Note however that his version of the Hall–Lang–Waldschmidt–
Szpiro conjecture is not true as stated, with the counterexample given in 12.5.2. If
we assume x, y coprime as in Conjecture 12.5.14, then his proof of “Hall–Lang–
Waldschmidt–Szpiro conj. ⇒ generalized Szpiro conj.” is incomplete because c4
and c6 may have common divisors for a minimal Weierstrass equation.
In J. Oesterlé [229], a sketch of proof for “abc ⇒ generalized Szpiro conj.” is
given after ideas of Hindry, analysing the reduction of the elliptic curve at every
place. This is similar to our proof of Theorem 12.5.12 (a) ⇒ (b) based on the
minimality criterion in Corollary 12.5.7, for which we could not find a complete
reference in the literature. The implication “Hall–Lang–Waldschmidt–Szpiro conj.
⇒ abc” is from A. Nitaj [222], where the reader will also find further applications
of the abc-conjecture.
In Section 12.6, we follow the presentation of Darmon and Granville [78]. Beck-
mann’s proof of the crucial Lemma 12.6.14 uses Galois representations. We ex-
pand here the sketch of proof given in [78]. Darmon [76] has also an interpretation
of the lemma as a Chevalley–Weil theorem for orbifolds. For explicit examples
and a similar treatment for the superelliptic equation, we refer to [78].
13 NEVANLINNA THEORY

13.1. Introduction

In 1987 Vojta formulated a sweeping set of precise conjectures about the structure
of the set of rational points on algebraic varieties. The rationale about these con-
jectures was a rather precise analogy between the Nevanlinna theory of the distri-
bution of values of meromorphic functions and diophantine approximation. In this
way, Vojta motivated, clarified, and unified results and conjectures in diophantine
approximation and diophantine equations. The analogy between Nevanlinna the-
ory and diophantine approximation had also been noticed earlier by Ch. Osgood,
in a somewhat different setting.
Here we discuss the Nevanlinna theory, while Vojta’s conjectures, their connexion
with the abc-conjecture studied in the preceding chapter, and their parallelism with
Nevanlinna theory, will be dealt with in the next final chapter.
Section 13.2 is a brief introduction to the classical Nevanlinna theory in one vari-
able, presenting the main results together with some examples. Next, Section
13.3 presents the Ahlfors–Shimizu elegant formulation of the theory, including
Ahlfors’s proof of the second main theorem. Holomorphic curves are the content
of Section 13.4 dealing with geometric aspects of Nevanlinna theory, culminating
with the conjectural second main theorem of Griffiths.
This chapter may be read independently of the other parts of the book.

13.2. Nevanlinna theory in one variable

Nevanlinna theory in one variable in its simplest form describes the value distri-
bution theory of a non-constant meromorphic function f : C → P1an . First, we
prove Jensen’s formula, which is the basic tool. As a consequence, we will ob-
tain Nevanlinna’s first main theorem. Then we state without proof the lemma on
the logarithmic derivative and the second main theorem of Nevanlinna. Next we
present the notion of defect, deduce the basic defect inequality and Picard’s lit-
tle theorem as a corollary. The section ends by showing that the analogue of the
abc-conjecture holds for meromorphic functions on C .
Throughout this chapter, we shall suppose that the meromorphic function f is not
a constant, unless specified otherwise. Proofs, when given, will be written in a
concise form. Working them out in full detail is left as a useful exercise for the
reader. We follow here the standard classical notation except that the ramification
function and the counting function truncated at 1 are denoted by Nram and N (1)
instead of N1 and N , which we find in the classical literature.
The usual way to study the distribution of values of a meromorphic function f (z)
is to consider the number of solutions, counted with multiplicity, of the equation
f (z) = a in a disk {|z| < r}, as r varies. Recall that ordz (f ) denotes the order
of f at z ∈ C . The main tool at our disposal is the Poisson–Jensen formula:
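Proposition 13.2.1. Let f be meromorphic in the closed disk |z| ≤ R and assume
that f(z) ≠ 0, ∞. Then for |z| < R it holds
$$\log|f(z)| = -\sum_{|a|<R,\ a \neq z} \operatorname{ord}_a(f) \log\left|\frac{R^2 - \bar a z}{R(z - a)}\right|
+ \frac{1}{2\pi} \int_0^{2\pi} \log\left|f(Re^{i\theta})\right| \cdot \Re\left(\frac{Re^{i\theta} + z}{Re^{i\theta} - z}\right) d\theta. \tag{13.1}$$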

The case in which there are no zeros or poles is called Poisson’s formula and the
case z = 0 is called Jensen’s formula.

Proof: We give only a quick sketch of proof. Consider first the special case in
which f (z) has no zeros or poles. Then u(z) := log |f (z)| is harmonic in an
open neighborhood of the disk {|z| ≤ R}, hence u(0) is the average of u(z)
on the boundary {|z| = R}. The Poisson formula expressing u(z) in terms of
its boundary values reduces to this special case by composition with a Möbius
transformation mapping the disk conformally onto itself and sending the origin to
the point z .
In the general case, the Lebesgue dominated convergence theorem applied to the
functions f(rz) with r → 1 implies that we may assume f without zeros and
poles on the boundary. Finally, the claim is easily obtained by multiplying f(z)
by the finite Blaschke product $\prod \{R(z - a)/(R^2 - \bar a z)\}^{-\operatorname{ord}_a(f)}$ for |a| < R
and applying the formula to the new function (note that the product has absolute
value 1 on |z| = R). For more details, we refer to W. Hayman [149], p.1 and [8],
p.208.
The special case of the Poisson–Jensen formula in which z = 0 is particularly
important to us and we proceed to rewrite it as follows. First, we introduce some
notation. For a real-valued function F (r), r > 0, quantitative estimates such as
F (r) = O(log r) are always meant with respect to r → ∞ . Also we define
$$F^+(r) := \max\{F(r), 0\}, \quad F^-(r) := -\min\{F(r), 0\},$$
so that $F(r) = F^+(r) - F^-(r)$.


Definition 13.2.2. For a ∈ C and r > 0, the enumerating function is defined by
$$n(r, a, f) := \sum_{|z|<r} \operatorname{ord}^+_z(f - a),$$
the number of solutions of f(z) = a in the disk |z| < r counted with their
multiplicity. For a = ∞, we replace f by 1/f to get
$$n(r, \infty, f) := \sum_{|z|<r} \operatorname{ord}^-_z(f),$$
the number of poles of f(z) in the disk |z| < r counted with their multiplicity.
13.2.3. This function is too irregularly behaved and a logarithmic average
$$\int_0^r \frac{n(t, a, f)}{t} \, dt$$
is a much better function which also arises in other contexts. However, care must
be taken if f(z) − a vanishes at the origin, because then the integral diverges at
t = 0.
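Definition 13.2.4. For a ∈ C and r > 0, the counting function is defined by
$$N(r, a, f) := \int_0^r \frac{n(t, a, f) - \operatorname{ord}^+_0(f - a)}{t} \, dt + \operatorname{ord}^+_0(f - a) \log r
= \operatorname{ord}^+_0(f - a) \log r + \sum_{0<|z|<r} \operatorname{ord}^+_z(f - a) \log\left|\frac{r}{z}\right|.$$
For a = ∞, we replace f by 1/f and ∞ by 0 to get
$$N(r, \infty, f) := \int_0^r \frac{n(t, \infty, f) - \operatorname{ord}^-_0(f)}{t} \, dt + \operatorname{ord}^-_0(f) \log r
= \operatorname{ord}^-_0(f) \log r + \sum_{0<|z|<r} \operatorname{ord}^-_z(f) \log\left|\frac{r}{z}\right|.$$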
0<|z|<r

Remark 13.2.5. The modification in the definition of N(r, a, f) if $\operatorname{ord}_0(f - a) \neq 0$
is, in Hayman’s words, a tiresome but minor irritation of the theory. Its effect is
to replace quantities such as f(0) by c(f, 0), the leading coefficient of the Laurent
series of f(z) at z = 0.

On the other hand, the function N (r, a, f ) so defined is perfectly suited for a
compact reformulation of the Poisson–Jensen formula at z = 0, which is Jensen’s
formula:
Proposition 13.2.6. Let
c(f, 0) := lim f (z)z −ord0 (f )
z→0
13.2. Nevanlinna theory 447

be the leading coefficient in the Laurent series of f at 0. Then



1  
log f (reiθ ) dθ + N (r, ∞, f ) − N (r, 0, f ) = log |c(f, 0)|.
2π 0
Proof: This is the special case z = 0 of the Poisson–Jensen formula (13.1) applied to $z^{-\operatorname{ord}_0(f)} f(z)$, as we verify using the definition of N(r, a, f) and the general equation $F(r) = F^{+}(r) - F^{-}(r)$. □
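Jensen’s formula can be checked numerically. The sketch below assumes a hand-picked rational function f(z) = 3z(z − 1)/(z − 2)^2, whose zeros, poles and leading coefficient c(f, 0) = −3/4 are read off by inspection; the helper `mean_log_abs` is a hypothetical name and uses a plain midpoint rule on the circle.

```python
import cmath
import math

def mean_log_abs(f, r, samples=4096):
    """Midpoint-rule value of (1/2pi) * integral of log|f(r e^{i theta})| d theta."""
    total = 0.0
    for j in range(samples):
        theta = 2.0 * math.pi * (j + 0.5) / samples
        total += math.log(abs(f(r * cmath.exp(1j * theta))))
    return total / samples

def f(z):
    # Simple zeros at 0 and 1, a double pole at 2.
    return 3 * z * (z - 1) / (z - 2) ** 2

r = 3.0
N_zero = math.log(r) + math.log(r / 1.0)   # N(r, 0, f): origin term plus the zero at 1
N_pole = 2 * math.log(r / 2.0)             # N(r, infinity, f): double pole at 2
lhs = mean_log_abs(f, r) + N_pole - N_zero
rhs = math.log(0.75)                       # log|c(f, 0)| with c(f, 0) = -3/4
print(lhs, rhs)
```

The radius r = 3 is chosen so that no zero or pole lies on the circle, which keeps the quadrature accurate.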
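Definition 13.2.7. The proximity function is
\[ m(r, a, f) := \frac{1}{2\pi} \int_0^{2\pi} \log^{+} \left| \frac{1}{f(re^{i\theta}) - a} \right| d\theta. \]
For a = ∞, we replace f by 1/f to get
\[ m(r, \infty, f) := \frac{1}{2\pi} \int_0^{2\pi} \log^{+} \left| f(re^{i\theta}) \right| d\theta. \]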
Remark 13.2.8. The counting function is a logarithmically weighted degree of the zero divisor of f − a on the open disk D(r) := {z ∈ C | |z| < r}. The proximity function is a logarithmic average on the boundary ∂D(r) measuring how close f(z) is to a. Note that m(r, a, f) < ∞ because values $f(re^{i\theta}) = a$ lead only to integrable logarithmic singularities on ∂D(r).
Remark 13.2.9. Suppose that f is an entire function. Then the proximity function and $\log \|f\|_r$, where $\|f\|_r := \max_{|z| \le r} |f(z)|$, are comparable. In one direction, we have
\[ m(r, \infty, f) \le \log^{+} \|f\|_r . \]
In the other direction, we apply (13.1), noting that for |z| = r < R we have
\[ \left| \frac{R^2 - \bar{a} z}{R(z - a)} \right| \ge 1 \qquad\text{and}\qquad 0 \le \operatorname{Re} \frac{Re^{i\theta} + z}{Re^{i\theta} - z} \le \frac{R + r}{R - r}. \]
Since f(z) is entire, we have $\operatorname{ord}_a(f) \ge 0$ and we conclude that
\[ \log |f(z)| \le \frac{R + r}{R - r}\, m(R, \infty, f). \]
Setting R = 2r, we get
\[ \log^{+} \|f\|_r \le 3\, m(2r, \infty, f). \tag{13.2} \]
Thus in the case of entire functions the proximity function at a = ∞ plays the
same role as the logarithm of the maximum modulus. This is not so for meromor-
phic functions and we owe to R. Nevanlinna the discovery of a quantity that can
take the place of the logarithm of the maximum modulus in the general case. This
is expressed in his first main theorem.

Theorem 13.2.10. For a ∈ C the following formula holds:
\[ m(r, a, f) + N(r, a, f) = m(r, \infty, f) + N(r, \infty, f) - \log |c(f - a, 0)| + \varepsilon(r, a, f), \]
with $|\varepsilon(r, a, f)| \le \log^{+} |a| + \log 2$.
Proof: Immediate from Jensen’s formula in 13.2.6 applied to f − a, noting first that $\log |f| = \log^{+} |f| - \log^{-} |f|$ and $\operatorname{ord}_a(f) = \operatorname{ord}_a^{+}(f) - \operatorname{ord}_a^{-}(f)$, and then that
\[ \left| \log^{+} |f - a| - \log^{+} |f| \right| \le \log^{+} |a| + \log 2. \]
□
In 13.4.9, we will give the argument in a more general setting in the higher-dimensional case.
Since m(r, a, f ) + N (r, a, f ) turns out to be independent of a up to a bounded
function, this suggests the following definition:
Definition 13.2.11. The characteristic function of f is
T (r, f ) := m(r, ∞, f ) + N (r, ∞, f ).

This function turns out to be well behaved as a function of r .


Example 13.2.12. Let f(z) be a polynomial of degree d. Then the fundamental theorem of algebra shows
\[ N(r, a, f) = d \log r + O(1), \qquad m(r, a, f) = O(1) \]
for a ≠ ∞ and r → ∞, where the implied constant in the O(1) symbol depends on f and a. For a = ∞, we have N(r, ∞, f) = 0 and m(r, ∞, f) = d log r + O(1), hence T(r, f) = d log r + O(1).
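The growth T(r, f) = d log r + O(1) can be observed numerically. The following is a minimal sketch, assuming the sample cubic f(z) = z^3 − 2z + 5 (an arbitrary choice) and a midpoint rule on the circle; `proximity_at_infinity` is a hypothetical helper name. Since a polynomial has no poles, T(r, f) = m(r, ∞, f).

```python
import cmath
import math

def f(z):
    # Sample monic polynomial of degree d = 3; any polynomial would do.
    return z ** 3 - 2 * z + 5

def proximity_at_infinity(f, r, samples=2048):
    """Midpoint-rule value of m(r, infinity, f) = mean of log^+ |f| on |z| = r."""
    total = 0.0
    for j in range(samples):
        theta = 2.0 * math.pi * (j + 0.5) / samples
        total += max(math.log(abs(f(r * cmath.exp(1j * theta)))), 0.0)
    return total / samples

for r in (10.0, 100.0, 1000.0):
    m = proximity_at_infinity(f, r)
    # The difference m - 3 log r stays bounded as r grows.
    print(r, m, m - 3 * math.log(r))
```

For this monic example the difference is numerically zero once all roots lie inside the circle, in agreement with Jensen’s formula.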

The following interesting result, Cartan’s formula, is an easy consequence of Jensen’s formula.
Proposition 13.2.13. Let $C := \log^{+} |f(0)|$ if f(0) ≠ ∞ and $C := \log |c(f, 0)|$ if f(0) = ∞. Then
\[ T(r, f) = \frac{1}{2\pi} \int_0^{2\pi} N(r, e^{i\theta}, f)\, d\theta + C. \]
Proof: We assume first that f(0) ≠ ∞. By Jensen’s formula in 13.2.6, if $f(0) \neq e^{i\theta}$ we have
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \left| f(re^{i\varphi}) - e^{i\theta} \right| d\varphi + N(r, \infty, f) = N(r, e^{i\theta}, f) + \log \left| f(0) - e^{i\theta} \right| . \]
Integrating with respect to θ, using Fubini’s theorem and
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \left| c - e^{i\theta} \right| d\theta = \log^{+} |c| \tag{13.3} \]
(apply Jensen’s formula to c − z), we get Cartan’s formula.
If f(0) = ∞, we have to replace $\log |f(0) - e^{i\theta}|$ above by $\log |c(f, 0)|$ and the proof follows along the same lines. □
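Cartan’s formula can be tested on a simple case. The sketch below assumes f(z) = z + 2, for which the equation f(z) = a has the single solution z = a − 2, so that N(r, a, f) is available in closed form; the helper names are illustrative and the integrals are midpoint-rule approximations.

```python
import cmath
import math

def log_plus(x):
    """log^+ x = max(log x, 0) for x > 0."""
    return max(math.log(x), 0.0)

def circle_mean(g, samples=4096):
    """Midpoint-rule mean of g(theta) over [0, 2*pi]."""
    return sum(g(2.0 * math.pi * (j + 0.5) / samples) for j in range(samples)) / samples

def f(z):
    return z + 2

def N_value(r, a):
    # f(z) = a has the single simple solution z = a - 2, nonzero when |a| = 1.
    return log_plus(r / abs(a - 2))

r = 4.0
# f is entire, so T(r, f) = m(r, infinity, f).
T = circle_mean(lambda th: log_plus(abs(f(r * cmath.exp(1j * th)))))
# Right-hand side of Cartan's formula with C = log^+ |f(0)| = log 2.
cartan = circle_mean(lambda th: N_value(r, cmath.exp(1j * th))) + log_plus(abs(f(0)))
print(T, cartan)
```

Both quantities agree with log 4 for this choice of f and r.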
Corollary 13.2.14. The Nevanlinna characteristic T (r, f ) is an increasing convex
function of log r .
Proof: Note that $N(r, e^{i\theta}, f)$ is an increasing convex function of log r, hence so is its mean value in θ. □
Remark 13.2.15. Hence T(r, f) is increasing, but it need not be strictly increasing. For example, if f({|z| ≤ r}) ⊂ {|w| < 1}, then T(r, f) = 0.
Corollary 13.2.16. The proximity function is bounded on average on circles:
\[ \frac{1}{2\pi} \int_0^{2\pi} m(r, a + Re^{i\theta}, f)\, d\theta \le \log 2 + \log^{+}(1/R). \]
Proof: Replacing f by f − a we may assume that a = 0. Next, note that we have a general inequality $m(r, Rb, f) \le m(r, b, f/R) + \log^{+}(1/R)$, as we readily see using
\[ \frac{1}{f - Rb} = \frac{1}{R} \cdot \frac{1}{(f/R) - b} \]
and the definition of the proximity function. Hence we may assume that R = 1. Now we set $a = e^{i\theta}$ in the first main theorem, take its mean value with respect to θ, and conclude by applying Cartan’s formula and equation (13.3). □
Proposition 13.2.17. If f is not a constant, then T(r, f) is unbounded. If
\[ \liminf_{r \to \infty} \frac{T(r, f)}{\log r} < +\infty, \]
then f is a rational function.
Proof: For the second statement, the hypothesis entails that $T(r_j, f) = O(\log r_j)$ along an unbounded sequence $(r_j)_{j \in \mathbb{N}}$, hence $N(r_j, \infty, f) = O(\log r_j)$ and we conclude that f has only finitely many poles. Hence there is a polynomial Q(z) such that Q(z)f(z) is an entire function. By Example 13.2.12, we also have
\[ m(r_j, \infty, Qf) \le m(r_j, \infty, Q) + m(r_j, \infty, f) = O(\log r_j). \]
Using (13.2), we conclude that Qf is an entire function with
\[ \sup_{|z| \le r_j} |Qf(z)| \ll r_j^{d} \]
for some d ∈ N. The Cauchy inequalities applied to $\{|z| \le r_j\}$ show that Qf is a polynomial of degree ≤ d. This proves the second statement.
If T(r, f) is bounded, then the above argument shows that f has no poles, hence we may choose Q = 1 and d = 0, proving that f is constant. □
The next example gives the converse of Proposition 13.2.17.

Example 13.2.18. Let f be a non-constant rational function. Then there are co-
prime polynomials P, Q with f (z) = P (z)/Q(z). Recall that the degree of f ,
considered as a finite morphism, is given by deg(f ) = max{deg(P ), deg(Q)}.
Similarly as in Example 13.2.12, we may compute directly
\[ N(r, a, f) = \deg(P - aQ) \log r + O(1), \qquad m(r, a, f) = (\deg(f) - \deg(P - aQ)) \log r + O(1) \]
for a ≠ ∞ and
\[ N(r, \infty, f) = \deg(Q) \log r + O(1), \qquad m(r, \infty, f) = (\deg(f) - \deg(Q)) \log r + O(1). \]
This illustrates the first main theorem and shows that, if f is a non-constant ratio-
nal function, then T (r, f ) = deg(f ) log r + O(1).
Definition 13.2.19. The order ρ(f) of a meromorphic function f is
\[ \rho(f) := \limsup_{r \to \infty} \frac{\log T(r, f)}{\log r}. \]
13.2.20. For meromorphic functions f, g not identically zero, we have
\[ T(r, fg) \le T(r, f) + T(r, g), \qquad T(r, g) = T(r, 1/g) + \log |c(g, 0)|. \]
The inequality is obtained from the analogous relations for the proximity and counting functions. The identity follows immediately from Jensen’s formula. We conclude that
\[ \rho(fg) \le \max\{\rho(f), \rho(g)\}, \qquad \rho(f/g) \le \max\{\rho(f), \rho(g)\}. \]
13.2.21. The function N(r, a, f) is a measure of the vanishing of f − a in the disk {|z| < r}, counting multiplicity. It is very important to measure this multiplicity, in other words the ramification of f, and this is done as follows.
By the Weierstrass factorization theorem ([8], Ch.5, Th.7), there are entire functions $f_0, f_1$ without common zeros such that $f = f_1/f_0$.
The Wronskian $W(f_0, f_1)$ of $(f_0 : f_1)$ is
\[ W(f_0, f_1) := \det \begin{pmatrix} f_0 & f_1 \\ f_0' & f_1' \end{pmatrix} \]
and we define
\[ N_{\mathrm{ram}}(r, f) := N(r, 0, W(f_0, f_1)). \]
This function is an average measure of the ramification of f in the disk {|z| < r} and is independent of the choice of $f_0, f_1$ with $f = f_1/f_0$, as long as $f_0$ and $f_1$ have no common zeros. For z ∈ C and a := f(z), let m(z) be the multiplicity of z in the equation f(z) = a. Then it is easy to show that
\[ \operatorname{ord}_z(W(f_0, f_1)) = m(z) - 1. \]
Therefore, we have
\[ N_{\mathrm{ram}}(r, f) = N(r, 0, f') + 2 N(r, \infty, f) - N(r, \infty, f'). \tag{13.4} \]

13.2.22. It is also clear how to define functions $n_{\mathrm{ram}}(r, a, f)$ and $N_{\mathrm{ram}}(r, a, f)$. The only difference consists in taking $\max\{\operatorname{ord}_z^{+}(f - a) - 1, 0\}$ instead of $\operatorname{ord}_z^{+}(f - a)$ in the definitions of n(r, a, f) and N(r, a, f) (if a = ∞, use $\max\{\operatorname{ord}_z^{-}(f) - 1, 0\}$ instead of $\operatorname{ord}_z^{-}(f)$). Then we have
\[ N_{\mathrm{ram}}(r, f) = \sum_{a \in \mathbb{P}^1_{\mathrm{an}}} N_{\mathrm{ram}}(r, a, f), \]
where the sum on the right is actually a finite sum for each r.

Fundamental in Nevanlinna theory is the lemma on the logarithmic derivative.
Lemma 13.2.23. The estimate
\[ m(r, \infty, f'/f) = O(\log T(r, f)) + O(\log r) \]
holds for r outside a set E of finite Lebesgue measure.

The proof is elementary, but rather lengthy and certainly not obvious. We refer to
the literature, see for example [149], Th.2.2 or W. Cherry and Z. Ye [64], Ch.3.
We have seen in Corollary 13.2.16 that the average of the proximity function
m(r, a, f ) on a circle of radius R is at most log 2 + log+ (1/R). This means that
values of a for which m(r, a, f ) is large and comparable with T (r, f ) must be
quite exceptional and we expect N (r, a, f )/T (r, f ) to be usually near to 1. The
second main theorem of Nevanlinna theory makes this observation quantitative.
Theorem 13.2.24. Let $a_1, \dots, a_q$ be different elements of $\mathbb{P}^1_{\mathrm{an}} = \mathbb{C} \cup \{\infty\}$. Then
\[ \sum_{j=1}^{q} m(r, a_j, f) + N_{\mathrm{ram}}(r, f) \le 2T(r, f) + O(\log T(r, f)) + O(\log r) \]
outside of a set E of finite Lebesgue measure. Moreover, if f has finite order the result holds for all large r without exception.
Proof: This is a consequence of the lemma on the logarithmic derivative as we
will show more generally for the higher-dimensional case, in Theorem 13.4.16. A
complete proof will be given in Section 13.3. 
Remark 13.2.25. Lang drew attention to the significance of the error term, pointing out
that it has a definite counterpart in metric diophantine approximation (see S. Lang [171],
p.199). The interested reader will find in [64] a thorough analysis of the error term, in-
cluding numerically explicit estimates. So far, no significant application to the theory of
meromorphic functions in the plane has come from these refinements.

Example 13.2.26. Let f(z) = exp(z). Clearly, we have N(r, 0, f) = N(r, ∞, f) = 0. Further, we compute
\[ m(r, \infty, f) = \frac{1}{2\pi} \int_0^{2\pi} \log^{+} \left| e^{re^{i\theta}} \right| d\theta = \frac{1}{2\pi} \int_{-\pi/2}^{\pi/2} r \cos\theta \, d\theta = \frac{r}{\pi} \]
and hence T(r, f) = r/π. Similarly, we get m(r, 0, f) = r/π. For a ∉ {0, ∞}, the set of solutions of f(z) = a is log a + 2πiZ, with multiplicity 1. From this it is easy to verify that
\[ N(r, a, f) = \frac{r}{\pi} + O(\log r) \]
and a more accurate analysis shows that $N(r, a, f) = \pi^{-1} r + O(1)$ (see [149], p.7 or [64], Prop.1.6.1). Hence from the first main theorem we get m(r, a, f) = O(1).
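The value m(r, ∞, exp) = r/π is easy to confirm numerically. The following is a minimal sketch using a midpoint rule; the function name `proximity_exp` and the sample count are illustrative choices, and moderate r is assumed so that exp does not overflow or underflow.

```python
import cmath
import math

def proximity_exp(r, samples=20000):
    """Midpoint-rule value of m(r, infinity, exp) = mean of log^+ |exp(r e^{i theta})|."""
    total = 0.0
    for j in range(samples):
        theta = 2.0 * math.pi * (j + 0.5) / samples
        value = abs(cmath.exp(r * cmath.exp(1j * theta)))
        total += max(math.log(value), 0.0)
    return total / samples

for r in (1.0, 5.0, 20.0):
    # The second column approaches r / pi.
    print(r, proximity_exp(r), r / math.pi)
```

The integrand is max(r cos θ, 0), which has kinks at θ = ±π/2, so the convergence is only algebraic in the number of samples.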

Definition 13.2.27. The defect of f at a is
\[ \delta(a, f) := \liminf_{r \to \infty} \frac{m(r, a, f)}{T(r, f)}. \]
The ramification defect of f at a is
\[ \theta(a, f) := \liminf_{r \to \infty} \frac{N_{\mathrm{ram}}(r, a, f)}{T(r, f)}. \]
Also we define
\[ \Theta(a, f) := \liminf_{r \to \infty} \frac{m(r, a, f) + N_{\mathrm{ram}}(r, a, f)}{T(r, f)} \ge \delta(a, f) + \theta(a, f). \]

As an aside, note that Θ(a, f ) > δ(a, f ) + θ(a, f ) may occur even for functions
of finite order (contrary to what has been written in [64], p.125).

Since $N_{\mathrm{ram}}(r, f) = \sum_a N_{\mathrm{ram}}(r, a, f)$, as an almost immediate consequence of the second main theorem we have the celebrated defect inequality of Nevanlinna:
Theorem 13.2.28. Let f be a meromorphic function on C. Then
\[ \sum_{a \in \mathbb{P}^1_{\mathrm{an}}} \{\delta(a, f) + \theta(a, f)\} \le \sum_{a \in \mathbb{P}^1_{\mathrm{an}}} \Theta(a, f) \le 2. \]
Moreover, δ(a, f) + θ(a, f) ≤ 1 for every $a \in \mathbb{P}^1_{\mathrm{an}}$.


Proof: If f is not a rational function, we know that T(r, f)/log r → ∞ (see Proposition 13.2.17) and the result is immediate from the second main theorem. If instead f is a non-constant rational function, then Example 13.2.18 shows that the only deficient value is a = f(∞) with
\[ \delta(a, f) = \frac{m(\infty)}{\deg(f)} \le 1, \]
where m(λ) is the multiplicity of f(z) = a at z = λ. Hurwitz’s theorem in B.4.6 applied to the morphism $f : \mathbb{P}^1_{\mathbb{C}} \to \mathbb{P}^1_{\mathbb{C}}$ yields
\[ -2 = -2 \deg(f) + \sum_{\lambda \in \mathbb{P}^1_{\mathbb{C}}} \{m(\lambda) - 1\}, \]
where m(λ) − 1 is the multiplicity of the ramification divisor of f at the point z = λ. It is a simple exercise to rewrite this as
\[ \sum_{a \in \mathbb{P}^1_{\mathbb{C}}} \{\delta(a, f) + \theta(a, f)\} = 2 - \frac{1}{\deg(f)}, \tag{13.5} \]

hence the defect inequality continues to hold a fortiori for all non-constant rational
functions. The last claim follows from Nram (r, a, f ) ≤ N (r, a, f ) (use 13.2.22)
and the first main theorem. 
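As a worked instance of (13.5), consider f(z) = z^d. The sketch below is pure bookkeeping with exact fractions; the coefficients of log r are read off by hand from Example 13.2.18 and 13.2.22 (this reading is an assumption of the sketch, not a computation done by the code): T(r, f) = d log r, m(r, ∞, f) = d log r, and N_ram(r, 0, f) = (d − 1) log r, while every other value contributes nothing.

```python
from fractions import Fraction

def defect_sum_for_power(d):
    """Sum of delta(a, f) + theta(a, f) over all a for f(z) = z**d."""
    T = Fraction(d)                       # coefficient of log r in T(r, f)
    delta_infinity = Fraction(d) / T      # m(r, infinity, f) = d log r
    theta_zero = Fraction(d - 1) / T      # N_ram(r, 0, f) = (d - 1) log r
    return delta_infinity + theta_zero

def hurwitz_right_hand_side(d):
    """-2 deg(f) plus the ramification at 0 and infinity, each of index d - 1."""
    return -2 * d + 2 * (d - 1)

for d in range(1, 6):
    print(d, defect_sum_for_power(d), 2 - Fraction(1, d), hurwitz_right_hand_side(d))
```

The ramification point at ∞ enters Hurwitz’s count but not the defect sum, because f has no poles in C; this accounts for the term −1/deg(f).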

Remark 13.2.29. The term −1/ deg(f ) in the right-hand side of (13.5) arises
because we are dealing with a covering f : C → P1an , while Hurwitz’s theorem
deals with the covering f : P1an → P1an .

Remark 13.2.30. From the definition and the first main theorem, we have
\[ 1 - \delta(a, f) = \limsup_{r \to \infty} \frac{N(r, a, f)}{T(r, f)}. \]
The defect inequality implies that δ(a, f ) = 0 except for a countable set of values
a, hence δ(a, f ) measures the failure for N (r, a, f ) to be close to T (r, f ). This
explains the terminology “defect” for δ(a, f ) and deficient value for a whenever
δ(a, f ) > 0.
Remark 13.2.31. The notion of deficient value is not invariant by translation. There are examples of meromorphic functions f(z) for which $\delta(f(z), a) \neq \delta(f(z - z_0), a)$ for a suitable translation $z_0$. This quite pathological phenomenon is associated to either infinite order or extremely irregular behaviour of T(r, f) and does not occur if T(r + 1, f)/T(r, f) → 1 as r → ∞ (see R. Nevanlinna [221], Ch.X, §2, 230).

Remark 13.2.32. In Example 13.2.26, the deficient values are a = 0, ∞ , both with
δ(a, f ) = 1 . Thus Theorem 13.2.28 is sharp.

Example 13.2.33. Let ℘(z) be the Weierstrass elliptic function associated to the elliptic curve $y^2 = 4x^3 - g_2 x - g_3 = 4(x - e_1)(x - e_2)(x - e_3)$ (see 8.3.10). Define $e_0 = \infty$. Then ℘(z) is a doubly periodic meromorphic function of order 2 with no deficient values.
A doubly periodic meromorphic function on C is called elliptic. Let f be a non-constant
elliptic function with fundamental domain Ω of the period lattice Λ . The order d of f
is the number of solutions z ∈ Ω , counted with multiplicity, of the equation f (z) = a
(the reader should be careful not to confuse this definition of order with the order of f as a
meromorphic function, which is always ρ(f ) = 2 as we will show below).

An application of the argument principle shows that the order does not depend on the choice of a. An easy covering argument shows
\[ n(r, a, f) = \frac{\pi d}{\operatorname{vol}(\Omega)} \cdot r^2 + O(r) \]
and hence
\[ N(r, a, f) = \frac{\pi d}{2 \operatorname{vol}(\Omega)} \cdot r^2 + O(r). \]
Note that the bounds may be chosen independently of a, therefore Cartan’s formula (Proposition 13.2.13) gives
\[ T(r, f) = \frac{\pi d}{2 \operatorname{vol}(\Omega)} \cdot r^2 + O(r). \]
By Remark 13.2.30, no deficient values occur for an elliptic function f .
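The covering argument behind n(r, a, f) = πd r²/vol(Ω) + O(r) can be illustrated by counting lattice points. The sketch below assumes the square lattice Λ = Z + iZ (so vol(Ω) = 1) and the value a = ∞ for the Weierstrass function of this lattice, whose poles are exactly the lattice points, each of order two; the function names are illustrative.

```python
import math

def count_lattice_points(r):
    """Number of Gaussian integers m + n i with |m + n i| < r."""
    count = 0
    bound = int(math.floor(r))
    for m in range(-bound, bound + 1):
        for n in range(-bound, bound + 1):
            if m * m + n * n < r * r:
                count += 1
    return count

def n_poles(r):
    # Each lattice point is a double pole, so d = 2.
    return 2 * count_lattice_points(r)

for r in (5.0, 20.0, 50.0):
    main_term = 2 * math.pi * r * r        # pi * d * r^2 / vol(Omega) with d = 2
    # The last column stays bounded, reflecting the O(r) error term.
    print(r, n_poles(r), main_term, (n_poles(r) - main_term) / r)
```

Comparing the disk with the union of unit squares centred at the counted points bounds the error by 2(π√2 r + π/2), which is the O(r) term of the text for this lattice.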
Now we specialize f = ℘. Since ℘ has only the pole 0 ∈ Ω and the pole is of order two, we infer that the order of ℘ is 2, as claimed at the beginning, which is, by accident, equal to ρ(℘). Moreover, ℘′ is an odd elliptic function of order 3 with a triple pole at 0, hence the set of zeros of ℘′ is $(\tfrac{1}{2}\Lambda) \setminus \Lambda$ and they are all simple. We conclude that $\tfrac{1}{2}\Lambda$ is the set of ramification points of ℘. They are the solutions of the equations $\wp(z) = e_j$ (j = 0, . . . , 3), which have only double roots. A geometric argument as above gives
\[ N_{\mathrm{ram}}(r, e_j, \wp) = \frac{\pi}{2 \operatorname{vol}(\Omega)} \cdot r^2 + O(r). \]
We conclude that $\theta(e_j, \wp) = \tfrac{1}{2}$ and hence $\sum_{j=0}^{3} \theta(e_j, \wp) = 2$, giving again an extremal example of Theorem 13.2.28.
Consider the function z = z(w) defined by
\[ z = \int_0^w (t - a)^{\frac{1}{m} - 1} (t - b)^{\frac{1}{n} - 1} (t - c)^{\frac{1}{p} - 1} \, dt, \]
where a, b, c are distinct real numbers and m, n, p are integers such that $\frac{1}{m} + \frac{1}{n} + \frac{1}{p} = 1$. Then z(w) maps the upper half plane biholomorphically onto an open triangle in the z-plane with angles π/m, π/n, π/p at the vertices z(a), z(b), z(c). (Look first at the boundary. This is a special case of the Schwarz–Christoffel formula, see [152], Th.17.6.1, p.374.) The Schwarz reflection principle ([8], Th.6.5.26) proves that the image of the multi-valued function z(w) with ramification points a, b, c is the whole plane. In fact, the possible values for (m, n, p) are (2, 3, 6), (2, 4, 4), (3, 3, 3) up to ordering and the image triangles of the closed upper/lower half plane cover C. The inverse function f(z) = w is a meromorphic elliptic function (Schwarz reflection principle again) with no deficient values and similar arguments as above show that
\[ \theta(a, f) = 1 - \frac{1}{m}, \qquad \theta(b, f) = 1 - \frac{1}{n}, \qquad \theta(c, f) = 1 - \frac{1}{p}, \]
providing other extremal examples of Theorem 13.2.28 (see [221], Ch.X, §3, 236).
Remark 13.2.34. The following Nevanlinna inverse problem was completely solved by D. Drasin, in a major paper [92]. Let $\{(a_j, \delta_j, \theta_j)\}$, j = 1, 2, . . . , be any finite or countable sequence with distinct $a_j \in \mathbb{C} \cup \{\infty\}$ and non-negative real numbers $\delta_j, \theta_j$ satisfying $0 < \delta_j + \theta_j \le 1$ and $\sum_j (\delta_j + \theta_j) \le 2$. Then there is a meromorphic function f with $\{a_j\}$ as its set of deficient values and $\delta(a_j, f) = \delta_j$, $\theta(a_j, f) = \theta_j$. In general, such a function will have infinite order but the growth can always be bounded by $T(r, f) < r^{\omega(r)}$, with ω(r) → ∞ arbitrarily slowly. For finite order, there are additional restrictions. For example, a meromorphic function of order 0 can have at most one deficient value (this is an old result of Valiron from 1925, see [149], Th.4.10, p.110). This extends the result, pointed out in the proof of Theorem 13.2.28, that f(∞) is the only possible deficient value for a rational function. According to Drasin (loc.cit.), his method also shows that the restricted inverse problem with the additional condition $\delta_j = 0$ for every j can always be solved with f of order 0.
Another deep result of D. Drasin [93], solving a long-standing conjecture of F. Nevanlinna, is that if the sum of the deficiencies of a meromorphic function f of finite order ρ is $\sum_a \delta(a, f) = 2$, then 2ρ is an integer ≥ 2, ρ δ(a, f) is a positive integer, and every possibility can actually occur (see R. Nevanlinna [220], p. 357, or L. Ahlfors [4], p.406).
Meromorphic functions of finite order must also satisfy additional conditions. For example, a result of A. Weitsman [330] shows that $\sum_a \delta(a, f)^{1/3} < \infty$ whenever f has finite order.

13.2.35. Picard’s little theorem, stating that a meromorphic function $f : \mathbb{C} \to \mathbb{P}^1_{\mathrm{an}}$ omitting three values is a constant, follows immediately from Theorem 13.2.28.
Indeed, suppose that a is an omitted value. Then N (r, a, f ) = 0 and δ(a, f ) = 1
by Remark 13.2.30. The defect inequality in Theorem 13.2.28 now shows that a
non-constant f has at most two omitted values in P1an .
Example 13.2.36. Finally, we show that the analogue of the abc-conjecture in
Nevanlinna theory is well known (as in the function field case). We begin by
defining the analogue of the radical for a meromorphic function. We start with the
notion of the truncated counting function:
Definition 13.2.37. Let f be a non-constant meromorphic function and let a ∈ C. The truncated counting function $N^{(1)}(r, a, f)$ is
\[ N^{(1)}(r, a, f) := \min\{1, \operatorname{ord}_0^{+}(f - a)\} \log r + \sum_{0<|z|<r} \min\{1, \operatorname{ord}_z^{+}(f - a)\} \log \left| \frac{r}{z} \right| . \]
For a = ∞ we define
\[ N^{(1)}(r, \infty, f) := \min\{1, \operatorname{ord}_0^{-}(f)\} \log r + \sum_{0<|z|<r} \min\{1, \operatorname{ord}_z^{-}(f)\} \log \left| \frac{r}{z} \right| . \]
We have defined in Chapter 12 the radical of a non-zero integer m to be $\operatorname{rad}(m) = \prod_{p \mid m} p$, the product of all distinct primes dividing m. More generally, we define the radical of a rational number m/n (with m, n coprime) to be rad(m/n) = rad(mn), the product of the distinct primes appearing in the factorization of the rational number. We define the following logarithmic analogue:

Definition 13.2.38. Let f be a non-constant meromorphic function. The conductor of f in |z| ≤ r is by definition
\[ \operatorname{cond}(r, f) := N^{(1)}(r, 0, f) + N^{(1)}(r, \infty, f) = N(r, \infty, f'/f). \]

The following result can be viewed as the analogue of an abc-theorem for mero-
morphic functions.
Theorem 13.2.39. Let f , g be non-constant meromorphic functions such that
f + g = 1.
Then
T (r, f ) ≤ cond(r, f g) + O(log T (r, f )) + O(log r)
for all r outside of a set E of finite Lebesgue measure. Moreover, if f has finite
order, then we may choose E bounded.
Proof: We apply the second main theorem to f, with the three points $\{a_1, a_2, a_3\} = \{0, 1, \infty\}$. We infer that
\[ \sum_{j=1}^{3} m(r, a_j, f) + N_{\mathrm{ram}}(r, f) \le 2T(r, f) + O(\log T(r, f)) + O(\log r) \tag{13.6} \]
for r outside of an exceptional set E of finite Lebesgue measure. Moreover, the set E is bounded if f has finite order.
By the first main theorem, we have
\[ \sum_{j=1}^{3} \{m(r, a_j, f) + N(r, a_j, f)\} = 3\, T(r, f) + O(1) \]
and combining this with (13.6) we get
\[ T(r, f) + N_{\mathrm{ram}}(r, f) \le \sum_{j=1}^{3} N(r, a_j, f) + O(\log T(r, f)) + O(\log r) \]
for r ∉ E. In view of 13.2.22, we verify that
\[ \sum_{j=1}^{3} N(r, a_j, f) \le \sum_{j=1}^{3} N^{(1)}(r, a_j, f) + N_{\mathrm{ram}}(r, f) = \operatorname{cond}(r, fg) + N_{\mathrm{ram}}(r, f), \]
the last step because $N^{(1)}(r, 1, f) = N^{(1)}(r, 0, g)$ and f, g have the same poles. The theorem follows by combining the last two displayed inequalities. □

13.3. Variations on a theme: the Ahlfors–Shimizu characteristic

There is a very nice geometric definition for a slightly different characteristic func-
tion, due independently to L. Ahlfors [3] and T. Shimizu [281]. It leads to a theory
equivalent to Nevanlinna’s, but also to simpler and more elegant proofs, well mo-
tivated by underlying geometric concepts. We still denote by f : C → P1an a
non-constant meromorphic function.
13.3.1. We begin by recalling the stereographic projection and its main proper-
ties. Let S be the Riemann sphere of diameter 1 in R3 , lying on the Gauss plane
C with coordinates z = x + iy and touching the plane at the origin 0. Let N
denote the North Pole on S .

[Figure: the stereographic projection, showing the North Pole N, a point P on S, and its image z(P) in the z-plane.]

The stereographic projection maps a point P ≠ N on S to the point z(P) where the line joining N and P intersects the Gauss plane C. This map is conformal, i.e. preserving angles, and transforms circles into circles or lines. If P → N, then z(P) → ∞ and S becomes a model of the projective complex line $\mathbb{C} \cup \{\infty\} = \mathbb{P}^1_{\mathbb{C}}$. We define
\[
k(z_1, z_2) :=
\begin{cases}
\dfrac{|z_1 - z_2|}{\sqrt{1 + |z_1|^2}\, \sqrt{1 + |z_2|^2}} & \text{if } z_1, z_2 \in \mathbb{C}, \\[2ex]
\dfrac{1}{\sqrt{1 + |z|^2}} & \text{if } \{z_1, z_2\} = \{z, \infty\}.
\end{cases}
\]
Its interpretation is that, if P, Q are two points on S and z(P) and z(Q) are their images in C ∪ {∞} by the stereographic projection, then the euclidean chordal distance $\|P - Q\|$ between P and Q is given by
\[ \|P - Q\| = k(z(P), z(Q)). \]
This induces a distance function on $\mathbb{P}^1_{\mathrm{an}}$. For the elements of area, we easily compute that the pull-back of the Fubini–Study form ω, given in the affine coordinate z by
\[ \omega(z) = \frac{i}{2\pi} \frac{dz \wedge d\bar{z}}{(1 + |z|^2)^2}, \]
with respect to the stereographic projection, is equal to $\pi^{-1} d\sigma$, where dσ is the euclidean spherical area form on S. The normalization is such that ω induces a probability measure on $\mathbb{P}^1_{\mathrm{an}}$. Let us consider the euclidean form
\[ \Phi = \frac{i}{2\pi}\, dz \wedge d\bar{z} = \frac{1}{\pi}\, dx \wedge dy \]
on C. Using the notation of 13.2.21, an easy computation shows that
\[ f^{*}\omega = \frac{|f'|^2}{(1 + |f|^2)^2}\, \Phi = \frac{|W(f_0, f_1)|^2}{(|f_0|^2 + |f_1|^2)^2}\, \Phi. \tag{13.7} \]
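Now we define the spherical proximity function $\mathring{m}(r, a, f)$ by
\[ \mathring{m}(r, a, f) := \frac{1}{2\pi} \int_0^{2\pi} \log \frac{1}{k(f(re^{i\theta}), a)}\, d\theta. \tag{13.8} \]
A case-by-case calculation and Jensen’s formula in 13.2.6 imply
\[ \lim_{r \to 0+} \{\mathring{m}(r, a, f) + N(r, a, f)\} = -\nu(a, f), \tag{13.9} \]
where
\[
\nu(a, f) =
\begin{cases}
\log k(f(0), a) & \text{if } f(0) \neq a, \\
\log\bigl(|c(f - a, 0)| / (1 + |a|^2)\bigr) & \text{if } f(0) = a \text{ and } a \neq \infty, \\
-\log |c(f, 0)| & \text{if } f(0) = a = \infty.
\end{cases}
\]
Since the chordal distance is bounded by 1, we have
\[ \mathring{m}(r, a, f) \ge 0. \]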

With the new proximity function, the first main theorem takes the following very
elegant form:
Theorem 13.3.2. For every $a, b \in \mathbb{P}^1_{\mathrm{an}}$ we have
\[ \mathring{m}(r, a, f) + N(r, a, f) + \nu(a, f) = \mathring{m}(r, b, f) + N(r, b, f) + \nu(b, f). \]
Definition 13.3.3. The quantity $\mathring{T}(r, f) := \mathring{m}(r, a, f) + N(r, a, f) + \nu(a, f)$, which is independent of a, is the Ahlfors–Shimizu characteristic. It has been normalized so that $\lim_{r \to 0+} \mathring{T}(r, f) = 0$.

Proof of theorem: It is enough to verify the claim for a ∈ C and b = ∞. By (13.8) and Jensen’s formula in 13.2.6, we have
\[
\begin{aligned}
\mathring{m}(r, a, f) - \mathring{m}(r, \infty, f) &= \frac{1}{2\pi} \int_0^{2\pi} \log \frac{\sqrt{1 + |a|^2}}{|f(re^{i\theta}) - a|}\, d\theta \\
&= N(r, \infty, f) - N(r, a, f) - \log |c(f - a, 0)| + \log \sqrt{1 + |a|^2}.
\end{aligned}
\]
Let $\mathring{T}(r, a, f)$ denote the left-hand side in the theorem. The above shows that $\mathring{T}(r, a, f) - \mathring{T}(r, \infty, f)$ is constant in r. On the other hand, (13.9) shows that its limit for r → 0+ is equal to 0, hence $\mathring{T}(r, a, f) = \mathring{T}(r, \infty, f)$. □
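The independence of a in Theorem 13.3.2 can be observed numerically. The sketch below assumes the simplest case f(z) = z, for which N(r, a, f) and ν(a, f) are available in closed form, represents the point ∞ by `math.inf`, and approximates the spherical proximity function by a midpoint rule; all function names are illustrative. For this f the common value is ½ log(1 + r²).

```python
import cmath
import math

def chordal(z1, z2):
    """Chordal distance k(z1, z2); the point at infinity is represented by math.inf."""
    if z1 == math.inf and z2 == math.inf:
        return 0.0
    if z1 == math.inf:
        return 1.0 / math.sqrt(1.0 + abs(z2) ** 2)
    if z2 == math.inf:
        return 1.0 / math.sqrt(1.0 + abs(z1) ** 2)
    return abs(z1 - z2) / (math.sqrt(1.0 + abs(z1) ** 2) * math.sqrt(1.0 + abs(z2) ** 2))

def spherical_proximity(r, a, samples=4096):
    """Midpoint-rule value of the spherical proximity function for f(z) = z."""
    total = 0.0
    for j in range(samples):
        theta = 2.0 * math.pi * (j + 0.5) / samples
        total -= math.log(chordal(r * cmath.exp(1j * theta), a))
    return total / samples

def counting(r, a):
    """N(r, a, f) for f(z) = z: one simple solution z = a when a is finite."""
    if a == math.inf:
        return 0.0
    if a == 0:
        return math.log(r)
    return max(math.log(r / abs(a)), 0.0)

def nu(a):
    """nu(a, f) for f(z) = z, where f(0) = 0 and c(f, 0) = 1."""
    if a == 0:
        return 0.0
    return math.log(chordal(0, a))

def characteristic(r, a):
    return spherical_proximity(r, a) + counting(r, a) + nu(a)

r = 1.5
for a in (0, 0.5 + 0.5j, 2, -3j, math.inf):
    print(a, characteristic(r, a), 0.5 * math.log(1.0 + r * r))
```

The sample values of a are chosen off the circle |z| = r so that the integrand has no singularity on the path of integration.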
Lemma 13.3.4. Let η be an integrable 2-form on $\mathbb{P}^1_{\mathrm{an}}$. Then
\[ \int_{|z| \le r} f^{*}\eta = \int_{\mathbb{P}^1_{\mathrm{an}}} n(r, a, f)\, \eta(a). \]

Proof: Let γr be the curve f ({|z| = r}). Then the local mapping principle shows
that the restriction of f to {|z| < r} \ f −1 (γr ) is a ramified covering. For a
non-ramified a ∈ P1an \ γr , the number of sheets over a is equal to n(r, a, f ) and
hence the transformation formula of integrals implies the claim. 
Remark 13.3.5. As a corollary of proof, we note that n(r, a, f ) is a locally con-
stant function in a ∈ P1an \ γr . The same argument shows that n(r, a, f ) is a lo-
cally constant function in (r, a) outside of the closed set {(r, a) | r > 0, a ∈ γr }
of measure zero.
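Theorem 13.3.6.
\[ \mathring{T}(r, f) = \int_0^r \left( \int_{|z| \le t} f^{*}\omega \right) \frac{dt}{t}. \]
Proof: Since the chordal distance is continuous and bounded by 1, the spherical proximity function $\mathring{m}(r, a, f)$ is a non-negative measurable function on $\mathbb{R}_{+} \times \mathbb{P}^1_{\mathrm{an}}$. Interchanging the order of integration using Fubini’s theorem and recalling that the chordal distance is rotation invariant, we get
\[ \int_{\mathbb{P}^1_{\mathrm{an}}} \mathring{m}(r, a, f)\, \omega(a) = \int_{\mathbb{P}^1_{\mathrm{an}}} \log \frac{1}{k(f(0), a)}\, \omega(a) = - \int_{\mathbb{P}^1_{\mathrm{an}}} \nu(a, f)\, \omega(a). \]
This is clearly finite and so we may integrate the equation
\[ \mathring{T}(r, f) = \mathring{m}(r, a, f) + N(r, a, f) + \nu(a, f) \]
on $\mathbb{P}^1_{\mathrm{an}}$ with respect to ω to get
\[ \mathring{T}(r, f) = \int_{\mathbb{P}^1_{\mathrm{an}}} N(r, a, f)\, \omega(a). \]
By Remark 13.3.5, the integrand in the definition of N(r, a, f) is a non-negative measurable function on $\mathbb{R}_{+} \times \mathbb{P}^1_{\mathrm{an}}$ and hence Fubini’s theorem implies
\[ \mathring{T}(r, f) = \int_0^r \left( \int_{\mathbb{P}^1_{\mathrm{an}}} n(t, a, f)\, \omega(a) \right) \frac{dt}{t}. \]
By Lemma 13.3.4, we get the claim including the convergence of the integral. □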
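The area formula can be checked on an example. The sketch below assumes f(z) = z², for which f has no poles and ν(∞, f) = 0, so the Ahlfors–Shimizu characteristic equals ½ log(1 + r⁴). Exchanging the order of integration (an extra step made here for convenience, not taken from the text) rewrites the double integral as the integral of |f′|²/(1 + |f|²)² · log(r/|z|) against Φ over |z| ≤ r, which is then approximated by a midpoint rule in polar coordinates.

```python
import cmath
import math

def f(z):
    return z * z

def df(z):
    return 2 * z

def characteristic_by_area(r, radial=2000, angular=32):
    """Approximate int_0^r (int_{|z|<=t} f^* omega) dt/t for the f defined above."""
    ds = r / radial
    dtheta = 2.0 * math.pi / angular
    total = 0.0
    for i in range(radial):
        s = (i + 0.5) * ds
        # Polar area element s ds dtheta, the factor 1/pi from Phi, and the
        # weight log(r/s) produced by integrating dt/t from s to r.
        weight = math.log(r / s) * s * ds * dtheta / math.pi
        for j in range(angular):
            z = s * cmath.exp(1j * (j + 0.5) * dtheta)
            density = abs(df(z)) ** 2 / (1.0 + abs(f(z)) ** 2) ** 2
            total += density * weight
    return total

r = 2.0
print(characteristic_by_area(r), 0.5 * math.log(1.0 + r ** 4))
```

The inner area integral for this f equals 2t⁴/(1 + t⁴), which tends to 2 because z² covers the sphere twice.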

Remark 13.3.7. By Lemma 13.3.4, $\int_{|z| \le r} f^{*}\omega$ is equal to the area of f({|z| ≤ r}) on the sphere with respect to the spherical area form ω, counted with the sheet multiplicity. Hence $\mathring{T}(r, f)$ is a logarithmic average of the growth of the spherical area covered by f on disks of increasing radius.
Remark 13.3.8. Note that $\int_{|z| \le r} f^{*}\omega$ is a continuous positive strictly increasing function equal to $r \frac{d}{dr} \mathring{T}(r, f)$. We conclude that the Ahlfors–Shimizu characteristic $\mathring{T}(r, f)$ is a positive convex strictly increasing function of log(r) and hence $\lim_{r \to \infty} \mathring{T}(r, f) = \infty$. Moreover, using (13.7) and polar coordinates, we easily deduce that $\mathring{T}(r, f)$ is a $C^{\infty}$-function of r.
13.3.9. We have
\[ \mathring{m}(r, \infty, f) = \frac{1}{2\pi} \int_0^{2\pi} \log \sqrt{1 + \left| f(re^{i\theta}) \right|^2}\, d\theta, \]
hence
\[ 0 \le \mathring{m}(r, \infty, f) - m(r, \infty, f) \le \frac{1}{2} \log 2. \]
This proves
\[ \nu(\infty, f) \le \mathring{T}(r, f) - T(r, f) \le \nu(\infty, f) + \frac{1}{2} \log 2 \]
and hence $\mathring{T}(r, f)$ and T(r, f) differ by a bounded function. We will prove this again in 13.4.8, in a general setting valid in higher dimensions.
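The bound in 13.3.9 rests on the pointwise inequality 0 ≤ log √(1 + x²) − log⁺ x ≤ ½ log 2 for x ≥ 0, with the maximum attained at x = 1. A minimal numerical sketch (the sampling grid and the name `gap` are arbitrary choices):

```python
import math

def gap(x):
    """log sqrt(1 + x^2) - log^+ x for x >= 0."""
    log_plus = math.log(x) if x > 1.0 else 0.0
    return 0.5 * math.log1p(x * x) - log_plus

# Logarithmically spaced sample points between 1e-6 and 1e6.
samples = [10.0 ** (k / 50.0) for k in range(-300, 301)]
gaps = [gap(x) for x in samples]
print(min(gaps), max(gaps), 0.5 * math.log(2.0))
```

Averaging this pointwise inequality over the circle |z| = r, with x = |f(re^{iθ})|, gives the stated comparison of the two proximity functions.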

Now we give L. Ahlfors’s proof [5] of the second main theorem, in the following
form.
Theorem 13.3.10. Let f be a non-constant meromorphic function on C and $a_1, \dots, a_q$ be distinct points in $\mathbb{P}^1_{\mathrm{an}}$. Let k ≥ −1 be a given real number and let ε > 0. Then
\[ \sum_{j=1}^{q} \mathring{m}(r, a_j, f) + N_{\mathrm{ram}}(r, f) < 2\, \mathring{T}(r, f) + (1 + \varepsilon) \log \mathring{T}(r, f) + (k + \varepsilon) \log r, \]
for r → ∞ outside of an open set E such that $\int_E t^k \, dt < \infty$. Moreover, if f has finite order ρ, the result holds with the right-hand side $2\, \mathring{T}(r, f) + (2\rho - 1 + \varepsilon) \log r$, for all large r without exception.
Remark 13.3.11. The error term is sharp for ρ = 0 . If we apply it to a non-constant ratio-
nal function, hence ρ = 0 , we recover Hurwitz’s theorem in the form (13.5) on page 453.
For ρ > 0 , the error term was improved by Z. Ye to the sharp (ρ − 1 + ε) log r , see [64],
Th.4.3.1 and Z. Ye [334].

Proof: We start with a natural generalization of the spherical proximity function. Let us consider an absolutely continuous probability measure µ on $\mathbb{P}^1_{\mathrm{an}}$, with associated density ρ given by dµ = ρω.
The local mapping principle implies that n(r, a, f) is a non-negative lower semi-continuous function in (r, a) (see Remark 13.3.5), which is bounded for bounded r. Therefore n(r, a, f) is µ-integrable and we may define the enumerating function
\[ n_{\mu}(r, f) := \int_{\mathbb{P}^1_{\mathrm{an}}} n(r, a, f)\, d\mu(a) < \infty. \]
We also define a non-negative proximity function
\[ \mathring{m}_{\mu}(r, f) := \int_{\mathbb{P}^1_{\mathrm{an}}} \mathring{m}(r, a, f)\, d\mu(a). \]
Let $r > r_0 > 0$, where $r_0$ is fixed and will be chosen later. We assume that $\mathring{m}_{\mu}(r_0, f) < \infty$. Then the first main theorem in 13.3.2 yields
\[ \mathring{m}_{\mu}(r, f) - \mathring{m}_{\mu}(r_0, f) + \int_{r_0}^{r} n_{\mu}(s, f)\, \frac{ds}{s} = \mathring{T}(r, f) - \mathring{T}(r_0, f) \tag{13.10} \]
proving that $\mathring{m}_{\mu}(r, f) < \infty$. We have
\[
\begin{aligned}
n_{\mu}(r, f) &= \int_{\mathbb{P}^1_{\mathrm{an}}} n(r, a, f)\, d\mu(a) = \int_{|z| \le r} f^{*}(\rho\omega) \\
&= \frac{1}{\pi} \int_0^r \int_0^{2\pi} \frac{|f'(se^{i\theta})|^2}{\left(1 + |f(se^{i\theta})|^2\right)^2}\, \rho(f(se^{i\theta}))\, s \, ds \, d\theta,
\end{aligned}
\tag{13.11}
\]
where we have used Lemma 13.3.4 and the last step was obtained by (13.7) writing $f^{*}(\rho\omega)$ in polar coordinates.
By (13.7), it is clear that $|f'|^2 / \left(1 + |f|^2\right)^2$ is a $C^{\infty}$-function on C and we define
\[ \lambda(r) := \frac{1}{2\pi} \int_0^{2\pi} \frac{|f'(re^{i\theta})|^2}{(1 + |f(re^{i\theta})|^2)^2}\, \rho(f(re^{i\theta}))\, d\theta. \]
Using (13.11) and the definition of λ in (13.10), we infer that
\[ 2 \int_{r_0}^{r} \left( \int_0^{s} \lambda(t)\, t \, dt \right) \frac{ds}{s} \le \mathring{T}(r, f) + \mathring{m}_{\mu}(r_0, f). \tag{13.12} \]
Now we proceed to the key step, namely obtaining a lower bound for λ(r). To this end, we use Jensen’s inequality (see [252], Th.3.3) applied to log, which generalizes the inequality between arithmetic and geometric means:
Lemma 13.3.12. If F(x) is positive and integrable in the interval [a, b], then
\[ \log \left( \frac{1}{b - a} \int_a^b F(x)\, dx \right) \ge \frac{1}{b - a} \int_a^b \log F(x)\, dx. \]

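Hence from the definition of λ(r) and Jensen’s inequality we get
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \frac{|f'(re^{i\theta})|^2}{(1 + |f(re^{i\theta})|^2)^2}\, d\theta + \frac{1}{2\pi} \int_0^{2\pi} \log \rho(f(re^{i\theta}))\, d\theta \le \log \lambda(r). \tag{13.13} \]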
The usefulness of this inequality lies in the decoupling of f and ρ resulting from
it. The idea of using it in this context goes back to F. Nevanlinna.
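Jensen’s inequality for the logarithm says that the logarithm of a mean dominates the mean of the logarithm. A minimal numerical sketch, assuming the sample function F(x) = 1 + x² on [0, 2] (an arbitrary choice) and a midpoint rule; `interval_mean` is a hypothetical helper name:

```python
import math

def interval_mean(g, a, b, samples=10000):
    """Midpoint-rule value of (1/(b-a)) * integral of g over [a, b]."""
    h = (b - a) / samples
    return sum(g(a + (j + 0.5) * h) for j in range(samples)) / samples

def F(x):
    return 1.0 + x * x

log_of_mean = math.log(interval_mean(F, 0.0, 2.0))                   # log(7/3)
mean_of_log = interval_mean(lambda x: math.log(F(x)), 0.0, 2.0)      # log 5 - 2 + arctan 2
print(mean_of_log, log_of_mean)
```

In the proof this is applied with the product of the spherical derivative factor and ρ(f) on the circle, and the logarithm of the product splits into the two separate integrals of (13.13).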
We evaluate the first integral in (13.13) by splitting it as
\[ 2 \cdot \frac{1}{2\pi} \int_0^{2\pi} \log |f'(re^{i\theta})|\, d\theta - 4 \cdot \frac{1}{2\pi} \int_0^{2\pi} \log \sqrt{1 + |f(re^{i\theta})|^2}\, d\theta. \]
The result is
\[ 2N(r, 0, f') - 2N(r, \infty, f') + 2 \log |c(f', 0)| - 4\, \mathring{m}(r, \infty, f), \]
as we verify by applying Jensen’s formula in 13.2.6 to $\frac{1}{2\pi} \int_0^{2\pi} \log |f'(re^{i\theta})|\, d\theta$. By the first main theorem in the Ahlfors–Shimizu form, we also have $\mathring{m}(r, \infty, f) = \mathring{T}(r, f) - N(r, \infty, f) - \nu(\infty, f)$, hence the first term in the left-hand side of (13.13) is
\[ 2N(r, 0, f') + 4N(r, \infty, f) - 2N(r, \infty, f') - 4\, \mathring{T}(r, f) + 2 \log |c(f', 0)| + 4\nu(\infty, f). \]
We have already noted in (13.4) that
\[ N_{\mathrm{ram}}(r, f) = N(r, 0, f') + 2N(r, \infty, f) - N(r, \infty, f'). \]
Therefore, we have proved the important inequality
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \rho(f(re^{i\theta}))\, d\theta - 4\, \mathring{T}(r, f) + 2 N_{\mathrm{ram}}(r, f) \le \log \lambda(r) + O(1). \tag{13.14} \]
To complete the proof of the second main theorem, we combine the lower bound
for λ(r) given by (13.14) with the upper bound (13.12), together with a careful
choice for µ.
The upper bound (13.12) controls only a certain double average of λ(r), while we want a pointwise bound. We can achieve this by excluding an exceptional set of values of r. A simple way is as follows. We write for brevity
\[ A(r) := \int_0^r \lambda(t)\, t \, dt, \qquad B(r) := \int_{r_0}^{r} A(s)\, \frac{ds}{s}. \]
Note that A is a non-negative increasing continuous function and B(r) is a non-negative increasing $C^1$-function. We may choose $r_0 > 0$ such that $A(r_0) > 0$. Note that $rB'(r) = A(r)$ and hence B(r) is a strictly increasing function of log r for $r \ge r_0$.
We fix a parameter k ≥ −1 and ε > 0 and define
\[
\begin{aligned}
E_1 &:= \left\{ r > r_0 \mid \lambda(r) > r^{k-1} A(r)^{1+\varepsilon} \right\}, \\
E_2 &:= \left\{ r > r_0 \mid A(r) > r^{k+1} B(r)^{1+\varepsilon} \right\}, \\
E &:= E_1 \cup E_2.
\end{aligned}
\]
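Let S be the set of $r > r_0$ such that A is strictly increasing in a neighbourhood of r. Clearly, S is an open subset of $(r_0, \infty)$ and $E_1 \setminus S$ is a set of measure zero. Then
\[ \int_{E_1} r^k \, dr \le \int_{E_1 \setminus S} \frac{r\lambda(r)}{A(r)^{1+\varepsilon}}\, dr = \int_{E_1 \setminus S} \frac{dA(r)}{A(r)^{1+\varepsilon}} \le \frac{A(r_0)^{-\varepsilon}}{\varepsilon} \]
and in a similar way
\[ \int_{E_2} r^k \, dr = \int_{E_2} r^{k+1}\, \frac{dB(r)}{A(r)} \le \int_{E_2} \frac{dB(r)}{B(r)^{1+\varepsilon}} \le \frac{B(r_0)^{-\varepsilon}}{\varepsilon}. \]
Thus for $r > r_0$ and outside of the set $E := E_1 \cup E_2$ we have by (13.12)
\[ \lambda(r) \le r^{k-1} A(r)^{1+\varepsilon} \le r^{k-1+(k+1)(1+\varepsilon)} B(r)^{(1+\varepsilon)^2} \le r^{2k+(k+1)\varepsilon} \left( \frac{1}{2} \mathring{T}(r, f) + O(1) \right)^{(1+\varepsilon)^2} \]
and moreover
\[ \int_E r^k \, dr < \frac{1}{\varepsilon} \left\{ A(r_0)^{-\varepsilon} + B(r_0)^{-\varepsilon} \right\}. \]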
We plug this upper bound for λ(r) into (13.14) and easily obtain
\[ \frac{1}{2\pi} \int_0^{2\pi} \log \rho(f(re^{i\theta}))\, d\theta \le 4\, \mathring{T}(r, f) - 2 N_{\mathrm{ram}}(r, f) + (1 + \varepsilon)^2 \log \mathring{T}(r, f) + \{2k + (k + 1)\varepsilon\} \log r + O(1) \tag{13.15} \]
for $r \notin E \cup [0, r_0]$. The O(1) term depends a priori on f, µ, and $r_0$ but is independent of r and ε.
Let $a_1, \dots, a_q$ be distinct points in $\mathbb{P}^1_{\mathrm{an}}$. We define a non-negative function
\[ R(a) := \sum_{j=1}^{q} \log\frac{1}{k(a,a_j)} \]
and set
\[ \log\rho(a) := 2R(a) - \alpha\log(1+R(a)) + K, \tag{13.16} \]
where α > 1 and K is a constant such that dµ = ρω is a probability measure on $\mathbb{P}^1_{\mathrm{an}}$. This choice is indeed admissible, because the singularities of ρ occur only at $a = a_j$. If we write $t = a - a_j$, $t = |t|e^{i\theta}$ for $a_j \ne \infty$, then in a neighborhood of t = 0 we have
\[ \rho(a_j+t) = e^{O(1)}\,|t|^{-2}\Bigl(\log\frac{1}{|t|}\Bigr)^{-\alpha}, \]
while $\omega(a_j+t)$ is comparable with |t| d|t| dθ. Since α > 1, this and a similar calculation for $a_j = \infty$ show that ρ is integrable.
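The integrability of ρ near a finite $a_j$ can be checked by a short computation in polar coordinates. The sketch below fixes a radius $0 < \delta < 1$ and writes $s = |t|$, $u = \log(1/s)$; these symbols are chosen here for illustration.

```latex
\int_0^{2\pi}\!\!\int_0^{\delta} s^{-2}\Bigl(\log\frac{1}{s}\Bigr)^{-\alpha} s\,ds\,d\theta
  = 2\pi\int_0^{\delta}\frac{ds}{s\,(\log(1/s))^{\alpha}}
  = 2\pi\int_{\log(1/\delta)}^{\infty}\frac{du}{u^{\alpha}}
  = \frac{2\pi}{\alpha-1}\Bigl(\log\frac{1}{\delta}\Bigr)^{1-\alpha}.
```

The integral in $u$ converges exactly when α > 1, which is why the exponent α in (13.16) must exceed 1.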

We claim that for this µ we can choose $r_0$ such that $\mathring{m}_\mu(r_0,f) < \infty$. In order to verify this, it suffices to have $a_j \notin f(\{|z| = r_0\})$ for j = 1, ..., q, which is obviously the case for almost every $r_0$. Indeed, the potential function
\[ P(z) = \int_{\mathbb{P}^1_{\mathrm{an}}} \log\frac{1}{k(z,a)}\,d\mu(a) \]
is bounded for z away from $\{a_j \mid j = 1, \dots, q\}$ and our claim follows noting that
\[ \mathring{m}_\mu(r_0,f) = \frac{1}{2\pi}\int_0^{2\pi} P(f(r_0e^{i\theta}))\,d\theta. \]
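By definition
\[ \frac{1}{2\pi}\int_0^{2\pi} R(f(re^{i\theta}))\,d\theta = \sum_{j=1}^{q}\mathring{m}(r,a_j,f). \]
Moreover, Jensen’s inequality yields
\[ \frac{1}{2\pi}\int_0^{2\pi}\log\bigl(1+R(f(re^{i\theta}))\bigr)\,d\theta \le \log\Bigl(1+\frac{1}{2\pi}\int_0^{2\pi}R(f(re^{i\theta}))\,d\theta\Bigr) \le \log\Bigl(1+\sum_{j=1}^{q}\mathring{m}(r,a_j,f)\Bigr) \le \log^{+}T(r,f) + \log q + O(1). \]
Putting everything together, we conclude from our choice (13.16) that
\[ \frac{1}{2\pi}\int_0^{2\pi}\log\rho(f(re^{i\theta}))\,d\theta \ge 2\sum_{j=1}^{q}\mathring{m}(r,a_j,f) - \alpha\log^{+}T(r,f) - O(1). \tag{13.17} \]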

By Remark 13.3.8, we know that T(r, f) is a strictly increasing unbounded function and so we may assume T(r, f) ≥ 1 for $r \ge r_0$.

If we combine (13.17) and (13.15) on page 463, we find
\[ 2\sum_{j=1}^{q}\mathring{m}(r,a_j,f) \le 4\mathring{T}(r,f) - 2N_{\mathrm{ram}}(r,f) + \{\alpha+(1+\varepsilon)^2\}\log T(r,f) + \{2k+(k+1)\varepsilon\}\log r + O(1) \]
for r outside of a set E with $\int_E r^k\,dr < +\infty$. Note that we may choose E open because outside of a countable set of $r \ge r_0$ without accumulation point, our function λ(r) is continuous, hence $E_1$ is open and $E_2$ is open anyway. Since ε can be taken arbitrarily small and α > 1 can be taken arbitrarily close to 1, we get the first statement of the second main theorem.

Finally, suppose that f has finite order ρ, hence $T(r,f) = O(r^{\rho+\varepsilon})$. We fix k > ρ − 1 and we may choose ε > 0 with k > ρ − 1 + ε. Then by the first main theorem it suffices to show that
\[ (q-2)\mathring{T}(r,f) + N_{\mathrm{ram}}(r,f) \le \sum_{j=1}^{q}N(r,a_j,f) + (\rho+k)\log r \]
holds for sufficiently large r. This certainly holds outside of the set E. Now the left-hand side of this inequality is an increasing function of r, hence if r ∈ E we have
\[ (q-2)\mathring{T}(r,f) + N_{\mathrm{ram}}(r,f) \le \sum_{j=1}^{q}N(r_1,a_j,f) + (\rho+k)\log r_1 \tag{13.18} \]
for any $r_1 > r$ with $r_1 \notin E$. On the other hand, since $\int_E r^k\,dr < \infty$, we can always find such an $r_1$ with $r_1 < r + r^{-k}$ if r is large enough. This gives $\log(r_1/r) < r^{-k-1}$. Further, for any a and r < R, we have
\[ n(r,a,f)\log\frac{R}{r} \le N(R,a,f) - N(r,a,f) \le n(R,a,f)\log\frac{R}{r}. \]
If we apply the first inequality with $r_1$ in place of r and $R = er_1$, we get a fortiori $n(r_1,a,f) \le T(er_1,f) + O(1) = O(r^{\rho+\varepsilon})$. If we take $R = r_1$, the second inequality gives $N(r_1,a,f) - N(r,a,f) = O(r^{\rho+\varepsilon-k-1}) = o(1)$, because k > ρ − 1 + ε. This shows that having $r_1$ in the right-hand side of (13.18) has no effect on the final result and completes the proof.
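Two small estimates are used implicitly in the last step of the proof above. A sketch, assuming $r \ge 1$ is large enough that $r < r_1 < r + r^{-k}$ (so that $r_1 \le 2r$, since $k \ge -1$), and using $\log(1+x) \le x$ for $x \ge 0$:

```latex
\log\frac{r_1}{r} = \log\Bigl(1+\frac{r_1-r}{r}\Bigr) \le \frac{r_1-r}{r} < \frac{r^{-k}}{r} = r^{-k-1},
\qquad
N(r_1,a,f)-N(r,a,f) \le n(r_1,a,f)\,\log\frac{r_1}{r}
  = O(r^{\rho+\varepsilon})\cdot r^{-k-1}
  = O(r^{\rho+\varepsilon-k-1}).
```

The exponent $\rho+\varepsilon-k-1$ is negative by the choice of k, so the difference tends to 0.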

13.4. Holomorphic curves in Nevanlinna theory

In this section, we consider a holomorphic map f : C → X into a complex variety X. We generalize the concepts from the case $X = \mathbb{P}^1_{\mathrm{an}}$ considered in 13.2. After introducing characteristic functions for metrized line bundles completely analogous to Section 2.7, the generalization of Nevanlinna’s first main theorem corresponds to Weil’s theorem in 2.7.14. The extension of Nevanlinna’s second main theorem to holomorphic curves f remains conjectural, but the case of $\mathbb{P}^n_{\mathrm{an}}$ with D the sum of hyperplanes in general position was proved in Cartan’s thesis.
We deduce this special case from the lemma on the logarithmic derivative using
Wronskian techniques. The case of a curve X is also well known and we finish by
describing the generalization replacing C by finite ramified analytical coverings,
which find their arithmetical counterpart in finite extensions of number fields. At
the end, we give a reformulation of the abc-theorem for meromorphic functions,
which makes the analogy to the abc-conjecture clearer.
For this section, the reader is assumed to be familiar with the analytic theory of
complex varieties (see Section A.14 for an introduction and [130] for more details).
13.4.1. Let s be an invertible meromorphic section of a line bundle L giving
rise to a Cartier divisor D = div(s). We suppose that f (C) is not contained
in the support of D . We use the notation ordY (D) := ordY (s) to denote the
multiplicity of D in the prime divisor Y .
Definition 13.4.2. For r > 0, the counting function is defined by
\[ N_{f,D}(r) := \operatorname{ord}_0(f^{*}D)\log r + \sum_{0<|z|<r}\operatorname{ord}_z(f^{*}D)\log\left|\frac{r}{z}\right|. \]
13.4.3. Let ‖ ‖ be a continuous metric on L. Then
\[ \lambda_D(x) := -\log\|s(x)\| \]
is a local height with respect to D (see Section 2.7). It is continuous outside of the support of D and has logarithmic singularities along D. For notational simplicity, we omit here the reference to the metric.
Definition 13.4.4. The proximity function is defined by
\[ m_{f,D}(r) := \frac{1}{2\pi}\int_0^{2\pi}\lambda_D\circ f(re^{i\theta})\,d\theta \]
and the characteristic function is
\[ T_{f,D}(r) := N_{f,D}(r) + m_{f,D}(r). \]
Example 13.4.5. Let $X = \mathbb{P}^n_{\mathrm{an}}$ and let $L = O_{\mathbb{P}^n}(1)$. We denote the coordinates on $\mathbb{P}^n_{\mathrm{an}}$ by $x_0, \dots, x_n$. We consider a global section $\ell(x) = \ell_0x_0 + \cdots + \ell_nx_n$ of $O_{\mathbb{P}^n}(1)$, whose divisor is a hyperplane H. On L, we choose the standard metric given by
\[ \|\ell(x)\| = \frac{|\ell(x)|}{\max\{|x_0|,\dots,|x_n|\}}. \]
A holomorphic map $f : C \to \mathbb{P}^n_{\mathrm{an}}$ is always given by entire functions $f_0, \dots, f_n$ without common zeros such that $f_0(z), \dots, f_n(z)$ are the homogeneous coordinates of f(z). This follows from the Weierstrass factorization theorem ([8], Ch.5, Th.7). Assuming $f(C) \not\subset H$, we verify that
\[ N_{f,H}(r) = N(r,0,\ell\circ f) \]
and
\[ m_{f,H}(r) = \frac{1}{2\pi}\int_0^{2\pi}\log\max\{|f_0(re^{i\theta})|,\dots,|f_n(re^{i\theta})|\}\,d\theta - \frac{1}{2\pi}\int_0^{2\pi}\log|\ell\circ f(re^{i\theta})|\,d\theta. \]
Since $N(r,\infty,\ell\circ f) = 0$, Jensen’s formula in 13.2.6 proves
\[ T_{f,H}(r) = \frac{1}{2\pi}\int_0^{2\pi}\log\max\{|f_0(re^{i\theta})|,\dots,|f_n(re^{i\theta})|\}\,d\theta - \log|c(\ell\circ f,0)|. \]
Example 13.4.6. We specialize further in Example 13.4.5 by setting n = 1. Let $\ell(x) = x_0$ and hence H = [∞] on $\mathbb{P}^1_{\mathrm{an}}$. Then we have
\[ N_{f,H}(r) = N(r,\infty,f) \]
and
\[ m_{f,H}(r) = m(r,\infty,f), \]
where on the right-hand side, we have the quantities from the definitions in 13.2.4 and 13.2.7. We conclude that $T_{f,H}(r) = T(r,f)$.
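As a concrete check of Examples 13.4.5 and 13.4.6, consider the exponential map, written in homogeneous coordinates as $(1 : e^z)$. This example is ours and is not taken from the text. With the linear form $x_0$ the composite with f is the constant 1, so the correction term $\log|c(\cdot,0)|$ vanishes and the formula for $T_{f,H}$ reduces to an elementary integral:

```latex
T_{f,H}(r)
  = \frac{1}{2\pi}\int_0^{2\pi}\log\max\{1,\,|e^{re^{i\theta}}|\}\,d\theta
  = \frac{1}{2\pi}\int_0^{2\pi}\max\{0,\,r\cos\theta\}\,d\theta
  = \frac{r}{2\pi}\int_{-\pi/2}^{\pi/2}\cos\theta\,d\theta
  = \frac{r}{\pi}.
```

This agrees with the classical value $T(r, e^z) = r/\pi$.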
Example 13.4.7. Let us consider again the case n = 1 taking now $\ell(x) = x_1 - ax_0$ for some a ∈ C, hence H = [a] on $\mathbb{P}^1_{\mathrm{an}}$. Again, we have
\[ N_{f,H}(r) = N(r,a,f). \]
To get the old proximity function from Definition 13.2.4, we use the new metric
\[ \|x_1-ax_0\| = \frac{|x_1-ax_0|}{\max\{|x_0|,\,|x_1-ax_0|\}} \]
on $L = O_{\mathbb{P}^1}(1)$. We easily compute from Example 13.4.5 that
\[ m_{f,H}(r) = m(r,a,f). \]
Example 13.4.8. Going back to $X = \mathbb{P}^n_{\mathrm{an}}$, let us consider the Fubini–Study metric
\[ \|\ell(x)\| = \frac{|\ell(x)|}{(|x_0|^2+\cdots+|x_n|^2)^{1/2}} \]
on $L = O_{\mathbb{P}^n}(1)$. Using the standard differential operator
\[ d^c = \frac{1}{4\pi i}(\partial - \bar\partial), \]
the Chern form
\[ dd^c\log\|\ell(x)\|^{-2} = \omega \]
is the generalization to higher dimension of the Fubini–Study form from 13.3.1.
First, we assume $f(0) \notin H = \operatorname{div}(\ell(x))$. We compute the proximity function
\[ m_{f,H}(r) = -\frac{1}{2\pi}\int_0^{2\pi}\log\|\ell\circ f(re^{i\theta})\|\,d\theta = -\int_0^r\frac{\partial}{\partial t}\Bigl(\frac{1}{2\pi}\int_0^{2\pi}\log\|\ell\circ f(te^{i\theta})\|\,d\theta\Bigr)dt - \log\|\ell\circ f(0)\|. \]
The differential operator $d^c$ is given in polar coordinates $z = re^{i\theta}$ by
\[ d^cg = \frac{1}{4\pi}\Bigl(r\,\frac{\partial g}{\partial r}\,d\theta - \frac{\partial g}{\partial\theta}\,\frac{dr}{r}\Bigr) \]
for any differentiable function g. Therefore, by interchanging differentiation and integration, we get
\[ m_{f,H}(r) = -\int_0^r\Bigl(\int_{|z|=t}d^c\log\|\ell\circ f(z)\|^2\Bigr)\frac{dt}{t} - \log\|\ell\circ f(0)\|. \]
By Stokes’s theorem, we have
\[ \int_{|z|=t}d^c\log\|\ell\circ f(z)\|^2 = \int_{|z|\le t}dd^c\log\|\ell\circ f(z)\|^2 + n(t,0,\ell\circ f), \]
where the enumerating function n(t, 0, g) is the number of zeros, counted with multiplicities, of the holomorphic function $g = \ell\circ f$ in the open disk {|z| < t}. By
\[ f^{*}\omega = dd^c\log\|\ell\circ f\|^{-2} \]
and
\[ \int_0^r n(t,0,g)\,\frac{dt}{t} = N(r,0,g), \]
we get
\[ m_{f,H}(r) = \int_0^r\Bigl(\int_{|z|\le t}f^{*}\omega\Bigr)\frac{dt}{t} - N(r,0,\ell\circ f) - \log\|\ell\circ f(0)\|. \]
This proves our final result
\[ T_{f,H}(r) = \int_0^r\Bigl(\int_{|z|\le t}f^{*}\omega\Bigr)\frac{dt}{t} - \log\|\ell\circ f(0)\| \tag{13.19} \]
provided $f(0) \notin H$. Note that
\[ T_f^{\omega}(r) := \int_0^r\Bigl(\int_{|z|\le t}f^{*}\omega\Bigr)\frac{dt}{t} \]
does not depend on the choice of ℓ. It is called the Ahlfors–Shimizu characteristic of f, generalizing the construction in 13.3.3. The equivalence with $T_{f,H}(r)$ is a consequence of (13.19).
If f(0) ∈ H, then $T_{f,H}$ is still equivalent to $T_f^{\omega}$ by using the second part of the following first main theorem in the higher-dimensional case.
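The normalization of $d^c$ in Example 13.4.8 can be sanity-checked on the function $g = \log|z|^2 = 2\log r$, for which $\partial g/\partial r = 2/r$ and $\partial g/\partial\theta = 0$. This test function is our choice for illustration. A sketch using the polar-coordinate formula for $d^c$:

```latex
d^c\log|z|^2 = \frac{1}{4\pi}\Bigl(r\cdot\frac{2}{r}\,d\theta - 0\cdot\frac{dr}{r}\Bigr) = \frac{d\theta}{2\pi},
\qquad
\int_{|z|=t}d^c\log|z|^2 = \frac{1}{2\pi}\int_0^{2\pi}d\theta = 1.
```

A simple zero at the origin therefore contributes exactly 1 to the boundary integral, which is consistent with the appearance of the enumerating function n(t, 0, ·) when Stokes’s theorem is applied.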
Theorem 13.4.9. Let L be a line bundle on the compact complex variety X with an invertible meromorphic section s and a continuous metric ‖ ‖ giving rise to the characteristic function $T_{f,\operatorname{div}(s)}(r)$. For a holomorphic map f : C → X with $f(C) \not\subset |\operatorname{div}(s)|$, the following results hold:

(a) If we replace the metric ‖ ‖ by a continuous metric ‖ ‖′, getting the characteristic function $T'_{f,\operatorname{div}(s)}$, then
\[ T_{f,\operatorname{div}(s)}(r) - T'_{f,\operatorname{div}(s)}(r) = O(1) \]
with bound independent of f.
(b) If we replace s by the invertible meromorphic section s′ such that $f(C) \not\subset |\operatorname{div}(s')|$, then
\[ T_{f,\operatorname{div}(s)}(r) - T_{f,\operatorname{div}(s')}(r) = \log|c((s'/s)\circ f,0)|. \]
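Proof: Changing the metric does not influence the counting function. For the corresponding proximity functions, we get
\[ m_{f,\operatorname{div}(s)}(r) - m'_{f,\operatorname{div}(s)}(r) = \frac{1}{2\pi}\int_0^{2\pi}\log\frac{\|s\|'}{\|s\|}\circ f(re^{i\theta})\,d\theta. \]
Now ‖s‖′/‖s‖ is a continuous function on X without zeros. By compactness of X, we conclude that log(‖s‖′/‖s‖) is bounded, proving (a).

Now we exchange s. Then we have
\[ m_{f,\operatorname{div}(s)}(r) - m_{f,\operatorname{div}(s')}(r) = \frac{1}{2\pi}\int_0^{2\pi}\log\left|\frac{s'}{s}\circ f(re^{i\theta})\right|d\theta \]
and
\[ N_{f,\operatorname{div}(s)}(r) - N_{f,\operatorname{div}(s')}(r) = -\operatorname{ord}_0\Bigl(\frac{s'}{s}\circ f\Bigr)\log r - \sum_{0<|z|<r}\operatorname{ord}_z\Bigl(\frac{s'}{s}\circ f\Bigr)\log\left|\frac{r}{z}\right|. \]
By Jensen’s formula from 13.2.6, we get (b).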


Remark 13.4.10. Considering f as fixed, we see that the characteristic function is determined by the isomorphism class of L up to bounded functions. This makes the analogy with Theorem 2.3.8 in diophantine geometry perfect. In particular, we denote by $T_{f,L}$ any characteristic function; it is determined up to O(1). Moreover, the proof shows that $m_{f,D}$ is determined by the Cartier divisor D up to a bounded quantity.

Proposition 13.4.11. Let $D_1$, $D_2$ be Cartier divisors on the compact complex variety X and let $f : C \to X \setminus (\operatorname{supp}(D_1)\cup\operatorname{supp}(D_2))$ be a holomorphic map. Then
\[ m_{f,D_1+D_2} = m_{f,D_1} + m_{f,D_2} + O(1), \qquad N_{f,D_1+D_2} = N_{f,D_1} + N_{f,D_2} \]
and
\[ T_{f,D_1+D_2} = T_{f,D_1} + T_{f,D_2} + O(1). \]
Proof: This follows as for local heights. The details are left to the reader.
Proposition 13.4.12. Let ϕ : X → X′ be a holomorphic map of compact complex varieties and let L′ be a line bundle on X′. Then
\[ T_{\varphi\circ f,L'} = T_{f,\varphi^{*}L'} + O(1). \]
Moreover, if s′ is an invertible meromorphic section of L′ with D′ := div(s′) such that neither f(C) nor any component of X is mapped into supp(D′), then
\[ N_{\varphi\circ f,D'} = N_{f,\varphi^{*}D'}, \qquad m_{\varphi\circ f,D'} = m_{f,\varphi^{*}D'} + O(1). \]
Proof: Using the pull-back metric from L′, all claims are immediately clear from the definitions and hold even without the O(1) term. The first main theorem implies the claim in general.
Proposition 13.4.13. If the line bundle L on the compact complex variety X is
generated by global sections, then Tf,L is bounded below.
Proof: If $X = \mathbb{P}^n_{\mathrm{an}}$ and $L = O_{\mathbb{P}^n}(1)$, then, by choosing the Fubini–Study metric and $s = x_j$ with $f(0) \notin \operatorname{div}(x_j)$, it is clear that $m_{f,\operatorname{div}(s)} \ge 0$ and $N_{f,\operatorname{div}(s)} \ge 0$. In general, L is a pull-back of a line bundle $O_{\mathbb{P}^n}(1)$ (cf. A.6.8) and the claim follows from Proposition 13.4.12.
Concerning the generalization of the second main theorem, not much is known
apart from the linear case in Pnan which we study next.
Definition 13.4.14. The set $\{H_1, \dots, H_q\}$ of hyperplanes in $\mathbb{P}^n_{\mathbb{C}}$ is said to be in general position if any sublist of fewer than n + 2 hyperplanes corresponds to linearly independent linear forms. Equivalently, for any subset I ⊂ {1, ..., q} of cardinality |I| ≤ n + 1, we have
\[ \dim\bigcap_{i\in I}H_i = n - |I|. \]
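For instance (an illustration not taken from the text), in the projective plane the four lines defined by the forms $x_0$, $x_1$, $x_2$ and $x_0+x_1+x_2$ are in general position: any three of the four coefficient vectors are linearly independent, as the four determinants below show.

```latex
\det\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}=1,\quad
\det\begin{pmatrix}1&0&0\\0&1&0\\1&1&1\end{pmatrix}=1,\quad
\det\begin{pmatrix}1&0&0\\0&0&1\\1&1&1\end{pmatrix}=-1,\quad
\det\begin{pmatrix}0&1&0\\0&0&1\\1&1&1\end{pmatrix}=1.
```

By contrast, the three lines defined by $x_0$, $x_1$ and $x_0+x_1$ all pass through the point (0 : 0 : 1), so they are not in general position.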

13.4.15. Let $D := H_1 + \cdots + H_q$ be the associated divisor. Let $f : C \to \mathbb{P}^n_{\mathrm{an}}$ be a holomorphic map whose image is not contained in any hyperplane of $\mathbb{P}^n_{\mathrm{an}}$. By the Weierstrass factorization theorem ([8], Ch.5, Th.7), we have $f = (f_0 : \cdots : f_n)$ for entire functions $f_0, \dots, f_n$ without common zero. We define the Wronskian of $(f_0 : \cdots : f_n)$ by
\[ W(f_0,\dots,f_n) = \det\begin{pmatrix} f_0 & f_1 & \cdots & f_n \\ f_0' & f_1' & \cdots & f_n' \\ \vdots & \vdots & & \vdots \\ f_0^{(n)} & f_1^{(n)} & \cdots & f_n^{(n)} \end{pmatrix}. \]
Clearly this is determined by f up to entire functions without zeros, hence the counting function
\[ N_{f,\mathrm{ram}}(r) := N(r,0,W(f_0,\dots,f_n)) \]
is well defined. In this situation, we have Cartan’s second main theorem:
Theorem 13.4.16. For any ε > 0 and any given hyperplane H , the inequality
mf,D (r) + Nf,ram (r) ≤ (n + 1)Tf,H (r) + O(log Tf,H (r)) + O(log r)
holds for all r outside of a set E of finite Lebesgue measure.
Proof: If we add additional hyperplanes to $H_1, \dots, H_q$, then the left-hand side increases up to bounded functions (see the proof of Theorem 13.4.9). So we may assume q ≥ n + 2 and we set p := q − n − 1. Let $\ell_j(x)$ be a linear form with $H_j = \operatorname{div}(\ell_j(x))$. For $K = \{k_1, \dots, k_p\} \subset \{1, \dots, q\}$ of cardinality p, we define
\[ \ell_K(x) := \ell_{k_1}(x)\cdots\ell_{k_p}(x). \]
We consider the morphism
\[ \varphi : \mathbb{P}^n_{\mathrm{an}} \longrightarrow \mathbb{P}^{\binom{q}{p}-1}_{\mathrm{an}}, \qquad x \mapsto (\ell_K(x))_{|K|=p}. \]
Using $\varphi^{*}O(1) \cong O_{\mathbb{P}^n}(p)$, Proposition 13.4.12 implies
\[ T_{\varphi\circ f,O(1)}(r) = pT_{f,H}(r) + O(1). \tag{13.20} \]
Let $I = \{i_1, \dots, i_{n+1}\} := \{1, \dots, q\} \setminus K$ and $g_i := \ell_i(f_0, \dots, f_n)$. Then we have
\[ W(g_{i_1},\dots,g_{i_{n+1}}) = d_I\cdot W(f_0,\dots,f_n), \tag{13.21} \]
where $d_I$ is the determinant of the (n + 1) × (n + 1) matrix formed with the coefficients of $(\ell_i)_{i\in I}$. We define the logarithmic Wronskian by
\[ \lambda(f_0,\dots,f_n) := \det\begin{pmatrix} 1 & 1 & \cdots & 1 \\ f_0'/f_0 & f_1'/f_1 & \cdots & f_n'/f_n \\ \vdots & \vdots & & \vdots \\ f_0^{(n)}/f_0 & f_1^{(n)}/f_1 & \cdots & f_n^{(n)}/f_n \end{pmatrix} = \frac{W(f_0,\dots,f_n)}{f_0\cdots f_n}. \]
From (13.21), we deduce
\[ \ell_K(f) = \frac{g_1\cdots g_q}{W(f_0,\dots,f_n)}\cdot d_I^{-1}\cdot\lambda(g_{i_1},\dots,g_{i_{n+1}}). \tag{13.22} \]
To compute (13.20), we will use equation (13.22). The first factor is independent of K and will be handled by Jensen’s formula. The second factor is independent of r and hence contributes only a bounded amount to the height. The third factor is small by the lemma on the logarithmic derivative in 13.2.23.
The details are as follows. For a moment, we fix r > 0. Given a meromorphic function g on C and v ∈ C, |v| ≤ r, it is notationally convenient to use
\[ |g|_v = \begin{cases} |g(v)| & \text{if } |v| = r, \\ \left|\dfrac{v}{r}\right|^{\operatorname{ord}_v(g)} & \text{if } 0 < |v| < r, \\ r^{-\operatorname{ord}_v(g)} & \text{if } v = 0. \end{cases} \]
The functions $|\ |_v$ behave almost like absolute values and Jensen’s formula may be interpreted as a product formula (see 14.2.2). Later, this notation is helpful for translating the argument to the function field case.
Let v ∈ C, |v| = r with $v \notin f^{-1}(D)$. Then we have
\[ \log|\lambda(g_{i_1},\dots,g_{i_{n+1}})|_v \le \log\max_{\pi}\bigl|g_{i_1}^{(\pi_0)}/g_{i_1}\cdots g_{i_{n+1}}^{(\pi_n)}/g_{i_{n+1}}\bigr|_v + \log|(n+1)!|_v \le \sum_{j=0}^{n}\sum_{i=1}^{q}\log^{+}\bigl|g_i^{(j)}/g_i\bigr|_v + \log|(n+1)!|_v, \tag{13.23} \]
where π ranges over all bijective maps of {0, ..., n}. By Example 13.4.5 and Jensen’s formula in 13.2.6, we have
\[ T_{\varphi\circ f,O(1)}(r) = \sum_{|v|<r}\max_K\log|\ell_K(f)|_v + \frac{1}{2\pi}\int_{|v|=r}\max_K\log|\ell_K(f)|_v\,d\theta + O(1). \tag{13.24} \]
For |v| < r, we use $|\ell_K(f)|_v \le 1$. For |v| = r, equation (13.22) implies
\[ \frac{1}{2\pi}\int_{|v|=r}\max_K\log|\ell_K(f)|_v\,d\theta \le \frac{1}{2\pi}\int_{|v|=r}\Bigl(\log\left|\frac{g_1\cdots g_q}{W(f_0,\dots,f_n)}\right|_v - \log|d_I|_v + \log|\lambda(g_{i_1},\dots,g_{i_{n+1}})|_v\Bigr)d\theta. \tag{13.25} \]
Applying the lemma on the logarithmic derivative (see Lemma 13.2.23) to
\[ g_i^{(j)}/g_i = \bigl(g_i^{(j)}/g_i^{(j-1)}\bigr)\cdots\bigl(g_i'/g_i\bigr) \]
in (13.23), we get
\[ \frac{1}{2\pi}\int_{|v|=r}\log|\lambda(g_{i_1},\dots,g_{i_{n+1}})|_v\,d\theta \le O(\log T_{f,H}(r)) + O(\log r) \tag{13.26} \]
for all r outside of a set E of finite Lebesgue measure. Here, we have to note that $T_{g_i} = O(T_{f,H})$ and similarly for the derivatives of $g_i$. Using (13.25) and (13.26) in (13.24), we get
\[ T_{\varphi\circ f,O(1)}(r) \le \frac{1}{2\pi}\int_{|v|=r}\log\left|\frac{g_1\cdots g_q}{W(f_0,\dots,f_n)}\right|_v d\theta + O(\log T_{f,H}(r)) + O(\log r) \]
for $r \notin E$. By Jensen’s formula in 13.2.6, we deduce
\[ T_{\varphi\circ f,O(1)}(r) \le \sum_{|v|<r}\log\left|\frac{W(f_0,\dots,f_n)}{g_1\cdots g_q}\right|_v + O(\log T_{f,H}(r)) + O(\log r) = N_{f,D}(r) - N_{f,\mathrm{ram}}(r) + O(\log T_{f,H}(r)) + O(\log r) \]
outside of E. If we insert this in (13.20) on page 471, we get the claim from $T_{f,D} = m_{f,D} + N_{f,D}$.
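The remark in 13.4.15 that the Wronskian depends on the choice of homogeneous coordinates only through a zero-free factor can be verified directly for n = 1. In the sketch below, h denotes an entire function without zeros; the symbol is introduced here for illustration.

```latex
W(hf_0,hf_1)
  = hf_0\,(h'f_1+hf_1') - hf_1\,(h'f_0+hf_0')
  = h^2\,(f_0f_1'-f_1f_0')
  = h^2\,W(f_0,f_1),
\qquad
W(f_0,f_1) = f_0^{2}\Bigl(\frac{f_1}{f_0}\Bigr)'.
```

Thus N(r, 0, W) is unchanged when $(f_0 : f_1)$ is rescaled, and away from the zeros of $f_0$ the zeros of W are exactly the zeros of the derivative of $f_1/f_0$, which explains the name of the ramification term.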

13.4.17. Let X be a connected projective complex manifold with an ample line bundle H. The highest exterior power $K_X := \wedge^{\dim X}T_X^{*}$ of the cotangent bundle is called the canonical line bundle of X.
We consider a divisor D on X with normal crossings, i.e. for every x ∈ X ,
there is a holomorphic coordinate system w around x such that D is given by
the equation w1 · · · wk = 0. The generalization of the second main theorem is the
following conjecture of Griffiths:

Conjecture 13.4.18. There is a closed algebraic subset Z ≠ X such that for any holomorphic map f : C → X with $f(C) \not\subset Z$, the estimate
mf,D (r) + Tf,KX (r) = O(log Tf,H (r)) + O(log r)
holds for all r outside of a set of finite Lebesgue measure.
Remark 13.4.19. In the case $X = \mathbb{P}^n_{\mathrm{an}}$ and $D = \sum_{j=1}^{q}H_j$, we compare the Griffiths conjecture with Cartan’s second main theorem in 13.4.16. The assumption that D has normal crossings means that the hyperplanes are in general position. For H, we may choose a hyperplane and we have $K_X = O(-(n+1)H)$ (cf. [148], Example II.8.20.1). If we neglect the ramification term $N_{f,\mathrm{ram}}$, then the Griffiths conjecture matches with Cartan’s second main theorem up to the exceptional set Z. P. Vojta [316] removed this discrepancy and proved that Griffiths’s conjecture holds for $X = \mathbb{P}^n_{\mathrm{an}}$ and D a sum of hyperplanes in general position, with an exceptional set Z equal to a finite union of proper linear subspaces.

13.4.20. Let f : C → C′ be an analytic map of Riemann surfaces and assume that f is not constant on any connected component of C. Then the ramification divisor $R_f$ is locally given as a Cartier divisor by f′, where the derivative is with respect to any charts on C and C′. Then Example B.4.8 shows that the notion agrees in the algebraic framework with the definition in B.4.4. If g : C′ → C″ is a further analytic map of Riemann surfaces, the chain rule gives
\[ R_{g\circ f} = R_f + f^{*}R_g. \tag{13.27} \]

13.4.21. For a non-constant analytic map $f : \mathbb{C} \to C$ to a Riemann surface C with ramification divisor $R_f = \sum m_z[z]$, we have the counting function
\[ N_{R_f}(r) = m_0\log r + \sum_{0<|z|<r}m_z\log\left|\frac{r}{z}\right|. \]
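Formula (13.27) in 13.4.20 is the chain rule read at the level of vanishing orders. A sketch in local coordinates, where $e_z$ denotes the multiplicity with which f takes the value f(z); this symbol is introduced here for illustration.

```latex
(g\circ f)'(z) = g'(f(z))\,f'(z)
\;\Longrightarrow\;
\operatorname{ord}_z(g\circ f)'
  = \operatorname{ord}_z f' + e_z\cdot\operatorname{ord}_{f(z)}g'
  = \operatorname{ord}_z R_f + \operatorname{ord}_z f^{*}R_g .
```

The factor $e_z$ is precisely what the pull-back of a divisor contributes at z.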

13.4.22. Next, we prove Griffiths’s conjecture for an irreducible smooth projective curve C. Let D be a sum of distinct points $a_1, \dots, a_q \in C$. We choose also an ample line bundle H on C.
Theorem 13.4.23. Let $f : \mathbb{C} \to C \setminus \{a_1, \dots, a_q\}$ be a non-constant holomorphic map. Then
\[ m_{f,D}(r) + T_{f,K_C}(r) + N_{R_f}(r) = O(\log T_{f,H}(r)) + O(\log r) \]
for r outside of a set of finite Lebesgue measure.
Proof: This follows formally from Nevanlinna’s second main theorem using a non-constant rational function on C viewed as a morphism $g : C \to \mathbb{P}^1_{\mathrm{an}}$ and functorial properties.
The details are as follows: We choose an effective divisor $D_0$ on $\mathbb{P}^1_{\mathrm{an}}$ such that $D \subset g^{-1}(D_0)$ and such that g is unramified outside of $g^{-1}(D_0)$. Let $D' := g^{*}(D_0)$, $D_1 := (D')_{\mathrm{red}}$ (namely, the sum of the components of D′) and $D_2 := D_1 - D$. Nevanlinna’s second main theorem in 13.2.24 implies
\[ T_{g\circ f,D_0}(r) + T_{g\circ f,K_{\mathbb{P}^1}}(r) + N_{R_{g\circ f}}(r) \le N_{g\circ f,D_0}(r) + O(\log T_{g\circ f,O_{\mathbb{P}^1}(1)}(r)) + O(\log r) \tag{13.28} \]
outside of a set of finite Lebesgue measure. First, note that Proposition B.4.7 implies that $R_g = D' - D_1$ and hence Propositions 13.4.11 and 13.4.12 yield
\[ N_{g\circ f,D_0}(r) = N_{f,D'}(r) = N_{f,D_1}(r) + N_{f,R_g}(r) = N_{f,D}(r) + N_{f,D_2}(r) + N_{f,R_g}(r). \tag{13.29} \]
Note that (13.27) implies
\[ N_{R_{g\circ f}}(r) = N_{R_f}(r) + N_{f,R_g}(r). \]
Hurwitz’s theorem in B.4.5 gives
\[ K_C \cong g^{*}(K_{\mathbb{P}^1}) \otimes O(R_g). \]
By the first main theorem in 13.4.9 and $R_g = D' - D_1$, we get
\[ T_{g\circ f,D_0}(r) + T_{g\circ f,K_{\mathbb{P}^1}}(r) = T_{f,D'}(r) + T_{f,g^{*}K_{\mathbb{P}^1}}(r) + O(1) = T_{f,D}(r) + T_{f,D_2}(r) + T_{f,K_C}(r) + O(1). \]
Finally, for n sufficiently large, we have that $H^{\otimes n} \otimes g^{*}O_{\mathbb{P}^1}(-1)$ is base-point-free (cf. A.6.10) and hence Proposition 13.4.13 shows
\[ T_{g\circ f,O_{\mathbb{P}^1}(1)}(r) = T_{f,g^{*}O_{\mathbb{P}^1}(1)}(r) + O(1) = O(T_{f,H}(r)) \tag{13.30} \]
for r → ∞. Here we have to note that $T_{f,H}$ is unbounded because f is not a constant (see Proposition 13.2.17). If we use (13.29) and (13.30) in (13.28), we easily get the claim because $N_{f,D_2} \le T_{f,D_2} + O(1)$ and $m_{f,D} + N_{f,D} = T_{f,D} + O(1)$.

Remark 13.4.24. The generalization of the second main theorem to equidimensional non-degenerate holomorphic maps $f : \mathbb{C}^n \to X$ is due to J. Carlson and P. Griffiths [58]. It motivates the Griffiths conjecture. Note that the ramification term only occurs in the equidimensional setting. The error term may be improved in the equidimensional case in a similar way as in Remark 13.2.25. For details, we refer to [174], Ch.2, or [249], Ch.5.
Back to X = C, the case of genus g = 0 is just Nevanlinna’s second main theorem from 13.2.24. If g = 1, then $K_C$ is trivial (cf. Proposition 8.2.26 or A.13.6) and we conclude that the deficient value satisfies
\[ \delta_f(a) := \liminf_{r\to\infty}\frac{m_{f,[a]}(r)}{T_{f,[a]}(r)} \le 0 \]
for every a ∈ C. The first main theorem in 13.4.9 proves $N_{f,[a]}(r) > 0$ for r sufficiently large, hence every f is surjective if the genus is g = 1.
For genus g ≥ 2, $K_C$ is ample (see A.13.6, A.13.7). Theorem 13.4.23 implies that $T_{f,[a]} = O(\log r)$. It follows from Proposition 13.2.17 that f induces an algebraic morphism $\mathbb{P}^1_{\mathbb{C}} \to C$, which is impossible by Hurwitz’s theorem in B.4.6.
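The impossibility invoked in the last step can be made explicit with the Riemann–Hurwitz genus formula in its standard form; we assume, without having B.4.6 at hand, that this is the statement cited there. For a non-constant morphism φ from the projective line to a curve of genus g ≥ 2, of degree d ≥ 1 and with effective ramification divisor $R_\varphi$ (symbols introduced here for illustration):

```latex
-2 \;=\; 2\cdot 0-2 \;=\; d\,(2g-2)+\deg R_{\varphi} \;\ge\; d\,(2g-2) \;\ge\; 2 ,
```

which is a contradiction, so no such morphism exists.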

Remark 13.4.25. The above results for curves may be proved without Nevanlinna techniques. To get non-trivial results in the case g ≥ 1, we may prove a similar inequality as in Theorem 13.4.23 for holomorphic maps f : {|z| ≤ R} → C and we get an upper bound for R (cf. S. Lang and W. Cherry [174], p.93 or [64], §5.8).

13.4.26. For an effective reduced divisor D on C, we define the truncated counting function $N^{(1)}_{f,D}(r)$ by
\[ N^{(1)}_{f,D}(r) := \min\{1,\operatorname{ord}^{+}_0(f^{*}D)\}\log r + \sum_{0<|z|<r}\min\{1,\operatorname{ord}^{+}_z(f^{*}D)\}\log\left|\frac{r}{z}\right|. \]
Lemma 13.4.27. $N^{(1)}_{f,D}(r) \ge N_{f,D}(r) - N_{R_f}(r)$.
Proof: This follows from $f^{*}D - (f^{*}D)_{\mathrm{red}} \le R_f$ proved similarly as in Proposition B.4.7 (or just count zeros locally) and from Proposition 13.4.13.
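The local count mentioned in the proof of Lemma 13.4.27 can be sketched as follows. Suppose w is a local coordinate on C vanishing at a point of D, and that w∘f vanishes to order m ≥ 1 at z; the symbols w and m are introduced here for illustration.

```latex
\operatorname{ord}_z(f^{*}D)-\min\{1,\operatorname{ord}_z(f^{*}D)\}
  \;=\; m-1
  \;=\; \operatorname{ord}_z (w\circ f)'
  \;=\; \operatorname{ord}_z R_f .
```

At points z not lying over D the left-hand side is 0 while the order of $R_f$ at z is non-negative, which gives the pointwise form of the divisor inequality used in the proof.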
As an immediate consequence, we obtain the truncated second main theorem for curves.

Theorem 13.4.28. For every ε > 0, we have
\[ T_{f,D}(r) + T_{f,K_C}(r) \le N^{(1)}_{f,D}(r) + O(\log T_{f,H}(r)) + O(\log r). \]
Proof: This follows from $T_{f,D} = m_{f,D} + N_{f,D}$, Theorem 13.4.23 and Lemma 13.4.27.

13.4.29. Theorem 13.4.23 holds also if we replace $\mathbb{C}$ by a finite ramified covering. In diophantine geometry, this corresponds to passing from Q to a number field. Let Y be a connected Riemann surface and let $p : Y \to \mathbb{C}$ be a ramified covering. As we have seen in 12.3.8, this extends to a finite algebraic morphism $p : Y \to \mathbb{P}^1_{\mathbb{C}}$ of the compactifications.
Let D be a divisor on a complex variety X and let f : Y → X be a non-constant holomorphic map with image not contained in supp(D). For r > 0, we define the counting function
\[ N_{f,D}(r) := \frac{1}{\deg(p)}\Bigl(\sum_{p(y)=0}\operatorname{ord}_y(f^{*}D)\log r + \sum_{0<|p(y)|<r}\operatorname{ord}_y(f^{*}D)\log\left|\frac{r}{p(y)}\right|\Bigr) \]
and the proximity function, using a continuous metric on O(D) as before, is
\[ m_{f,D}(r) := -\frac{1}{2\pi\deg(p)}\int_{p^{-1}\{|z|=r\}}\log\|s_D(y)\|\,p^{*}(d\theta). \]
Note the analogy with the normalizations in 1.3.6. They lead to the fact that the characteristic function
\[ T_{f,D}(r) := N_{f,D}(r) + m_{f,D}(r) \]
is invariant under base change to a finite ramified covering of Y. Then the generalization of Theorem 13.4.23 is

Theorem 13.4.30. Let $a_1, \dots, a_q$ be distinct points of a complex projective curve C with an ample line bundle H and let $f : Y \to C \setminus \{a_1, \dots, a_q\}$ be a holomorphic map. Then
\[ \sum_{j=1}^{q}m_{f,[a_j]}(r) + T_{f,K_C}(r) + N_{R_f}(r) - N_{R_p}(r) \le O(\log(rT_{f,H}(r))) \]
holds for all r > 0 outside of a set of finite Lebesgue measure.

This is a special case of the second main theorem of P. Griffiths and J. King [131] and W. Stoll [292] for parabolic coverings in the equidimensional setting analogous to Remark 13.4.24. For a proof, we refer to [292], Th.18.13E.
Remark 13.4.31. This result indicates that NRp measures the dependence on the
covering p . However, to make sense of this statement, we should look for a second
main theorem uniform in f . This is done explicitly by W. Cherry (cf. [174] and
[63] for details) and his result implies that NRp indeed gives the contribution of
the covering p to the second main theorem.

The following is a reformulation of the abc-theorem in 13.2.39:

Theorem 13.4.32. Let f, g, h be non-constant entire functions without common zeros such that f + g = h and let
\[ T(r,f,g,h) := \frac{1}{2\pi}\int_0^{2\pi}\log\max\{|f(re^{i\theta})|,|g(re^{i\theta})|,|h(re^{i\theta})|\}\,d\theta. \]
Then
\[ T(r,f,g,h) \le N^{(1)}(r,0,fgh) + O(\log T(r,f,g,h)) + O(\log r) \]
holds for all r > 0 outside of a set E of finite Lebesgue measure. If f, g, h have finite order, we may choose E bounded.
Proof: By Example 13.4.5, T(r, f, g, h) and $T_{(h:f:g),H}$ differ only by a bounded function for any hyperplane H of $\mathbb{P}^2_{\mathrm{an}}$. Using Proposition 13.4.12 for the map $\varphi((x_0 : x_1)) = (x_0 : x_1 : (x_0 - x_1))$ and Example 13.4.6, we conclude that the left-hand side may be replaced by T(r, f/h). On the other hand, we have
\[ \operatorname{cond}\Bigl(r,\frac{f}{h}\cdot\frac{g}{h}\Bigr) = N^{(1)}(r,0,fgh) \]
and the first claim follows from Theorem 13.2.39 applied to the relation f/h + g/h = 1. If f, h are of finite order, then we have seen in 13.2.20 that f/h has also finite order and the last claim follows as well.

13.5. Bibliographical notes

For more details about Nevanlinna theory in one variable, we refer the reader to
the classic books of R. Nevanlinna [221] and W. K. Hayman [149]. If the reader
is interested in a finer analysis of the error term, he is referred to W. Cherry and Z.
Ye [64].
The foundations of the theory are given in R. Nevanlinna’s article [219]. He proved
the lemma on the logarithmic derivative to deduce his second main theorem. A
little later, his brother F. Nevanlinna gave a proof of the second main theorem
based on differential geometry and differential equations. Ahlfors simplified this proof further and expanded its geometric interpretation leading to the presentation in Section 13.3. In a breakthrough work [6], L. Ahlfors interpreted and extended Nevanlinna theory as a geometric theory of covering surfaces.
Cartan’s formula appears in H. Cartan [59]. In his thesis (see H. Cartan [60]), he
proved his second main theorem for hyperplanes.
References for Section 13.4 are the books of S. Lang [170], S. Lang and W. Cherry
[174] and M. Ru [249]. For a generalized abc-theorem in Nevanlinna theory sim-
ilar to Theorem 12.4.4, we refer to [249], Theorem A.3.2.6. The reader may also
consult, in connexion with Cartan’s version of Nevanlinna’s theory and its appli-
cations, the expository article by G.G. Gundersen and W. Hayman [142].
R. Nevanlinna asked for a second main theorem with moving targets, i.e. the con-
stants ai are replaced by meromorphic functions gi with log T (r, gi ) =
o(log T (r, f )). He treated the case of three targets by elementary means, using
a fractional linear transformation to reduce it to the constant case. The general
case remained open for a long time. The weaker form of the second fundamental
theorem without the contribution due to ramification was then obtained indepen-
dently by C.F. Osgood [232] and N. Steinmetz [290], see [249] for details and
further extensions. This was Vojta’s motivation for Roth’s theorem with moving
targets (see Section 6.5).
Finally, in a major paper K. Yamanoi [333] obtained a second fundamental theo-
rem with moving targets in full generality, with the expected contribution coming
from ramification. However, the error terms here are far weaker than those in
earlier works, because of the use of Ahlfors’s [6] geometric theory of covering
surfaces as a main tool.
14 THE VOJTA CONJECTURES

14.1. Introduction

Ch. Osgood [231], [232], was the first to observe, in his researches on diophantine
approximation in differential fields, that the corresponding Roth’s theorem in that
setting could be viewed as analogous to Nevanlinna’s second main theorem, with
the exponent 2 in Roth’s theorem and the coefficient 2 in 2 T (r, f ) having the
same significance. To P. Vojta, in his landmark Ph.D. thesis, goes the credit of
finding a solid connexion between classical diophantine geometry over number
fields and Nevanlinna theory, thereby leading to far-reaching conjectures, which
unified and motivated much further research, see [306], [307].
This final chapter is dedicated to the Vojta conjectures. They may be considered
as an arithmetic counterpart of the Nevanlinna theory discussed in Chapter 13
and of which the abc-conjecture, which was the subject of a detailed analysis in
Chapter 12, turns out to be an important special case.
The first two sections of this chapter develop Vojta’s dictionary establishing a par-
allel between diophantine approximation and Nevanlinna theory, leading to his
conjectures over number fields in Section 14.3. Schmidt’s subspace theorem and
the theorems of Roth, Siegel, and Faltings now appear as special cases of Vojta’s
conjectures without the ramification term. This lends support to the validity of Vo-
jta’s conjectures and also shows that the crux of the matter in attacking the general
case consists precisely in controlling the ramification. The next Section 14.4 con-
tains a generalization of the strong abc-conjecture to curves over number fields
and we show its equivalence to the Vojta conjecture with ramification for curves,
due to Elkies, van Frankenhuysen, and Vojta. In particular, the abc-conjecture
implies the Mordell conjecture. Section 14.5 deals with the analogue of the strong
abc-conjecture over function fields of characteristic 0, concluding with the general
abc-theorem over function fields due to Voloch and Brownawell-Masser.
As is already clear from the above, this chapter uses many results from previous chapters. The reader is assumed to be familiar with the geometric theory of heights from Chapter 2, with the abc-conjecture for integers and polynomials from Chapter 12 and with the basic results from Nevanlinna theory in Chapter 13. We will frequently use the results about ramification from Appendix B.

14.2. The Vojta dictionary

In this short section, we introduce Vojta’s dictionary between basic objects of Nevanlinna theory and corresponding concepts in diophantine approximation.
14.2.1. In diophantine approximation, we deal with an infinite sequence (βj )j∈N
of distinct elements of a number field K approximating given numbers a1 , . . . ,
aq ∈ K and arranged by increasing height. The sequence corresponds, in
Nevanlinna theory, to a non-constant meromorphic function f : C → P1an . The
role of the elements βj is played by the series of maps
fr : D(r) = {z ∈ C | |z| ≤ r} −→ P1an
for varying r > 0. The numbers ai are now given constants in C .
14.2.2. For r > 0, the role of places in Nevanlinna theory is played by the closed
disk D(r) . Let Fr be the field of meromorphic functions on D(r) , i.e. every
element extends to a meromorphic function in some neighbourhood of D(r) . For
v ∈ D(r), we have the discrete valuation ordv giving rise to the normalized
absolute value  ord (g)
v v if v = 0,
|g|v := r
r−ordv (g) if v = 0,
for any g ∈ Fr . For v on the boundary ∂D(r), we set
|g|v := |g(v)|v .
It is only well defined outside the poles of g contained in ∂D(r), although | |v
remains almost an absolute value in the sense that the triangle inequality |g1 +
g2 |v ≤ |g1 |v + |g2 |v and |g1 g2 |v = |g1 |v |g2 |v hold outside a finite set. Moreover,
|g|v = 0 for infinitely many v if and only if g = 0. The important point is that
v → log |g|v is integrable on ∂D(r) for any g ∈ Fr \ {0}. Thus the boundary
has to be considered as the set of archimedean places.
The product formula in K corresponds to Jensen’s formula in 13.2.6 written as

\[
\sum_{v \in D(r)} \log |g|_v + \frac{1}{2\pi} \int_{\partial D(r)} \log |g|_v \, d\theta = \log |c(g, 0)|.
\]

14.2.3. Thus |βj |v corresponds in Nevanlinna theory to
\[
|f_r|_v = \begin{cases} |f(v)| & \text{for } |v| = r, \\ |v/r|^{\operatorname{ord}_v(f)} & \text{for } 0 < |v| < r, \\ r^{-\operatorname{ord}_v(f)} & \text{for } v = 0. \end{cases}
\]

A reader puzzled by the fact that the set of places varies with r > 0 should identify
D(r) with the closed unit disk using the dilatation v → w := v/r and thus fr
with f(r) (w) := f (rw). Then we have


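\[
|f_{(r)}|_w = \begin{cases} |f(rw)| & \text{for } |w| = 1, \\ |w|^{\operatorname{ord}_w(f_{(r)})} = |w|^{\operatorname{ord}_{rw}(f)} & \text{for } 0 < |w| < 1, \\ r^{-\operatorname{ord}_w(f)} & \text{for } w = 0. \end{cases}
\]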
For w = 0, note that the normalization of the absolute value still depends on r
and not only on f(r) . This is the reason why it is better to consider the set of
places as variable. Ignoring the place w = 0 would lead to an additional error
term O(log r).
14.2.4. By the above analogy between normalized absolute values in number the-
ory and Nevanlinna theory, it is clear that the height h(β) corresponds to the char-
acteristic function T (r, f ). For a ∈ K , the analogue of the proximity function in
number theory is
 
\[
m_S(a, \beta) := -\sum_{v \in S} \log \min\{1, |\beta - a|_v\} = \sum_{v \in S} \log^{-} |\beta - a|_v ,
\]
which was used on the left-hand side of Roth’s theorem in 6.2.3, where S is any
finite set of places containing the archimedean ones. The counting function in this
number theoretic setting would be
\[
N_S(a, \beta) := \sum_{v \notin S} \log^{-} |\beta - a|_v .
\]

Note that in Nevanlinna theory the proximity function is defined by integration
over ∂D(r) which, by comparing the product formula with Jensen’s formula, only
corresponds to the archimedean places.
14.2.5. The first main theorem
mS (a, β) + NS (a, β) = h(β) + O(1)
with a constant independent of β clearly holds also for a number field K . For a
proof, note that the left-hand side is simply h(β − a) and use the standard inequal-
ity h(x + y) ≤ h(x) + h(y) + log 2, see Proposition 1.5.15.
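
The identity behind the first main theorem can be checked numerically for K = Q. The following is a minimal sketch under stated assumptions: the places of Q are the usual one ('inf') and the primes, |x|_p = p^(-ord_p(x)), and h(x) = log max(|numerator|, |denominator|). The function names are illustrative only and do not come from the text.

```python
from fractions import Fraction
from math import isclose, log


def ord_p(n: int, p: int) -> int:
    """Exponent of the prime p in the nonzero integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def prime_divisors(n: int) -> list:
    """Primes dividing the integer n, found by trial division."""
    n, primes, p = abs(n), [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def height(x: Fraction) -> float:
    """Logarithmic height of a rational number in lowest terms."""
    return log(max(abs(x.numerator), x.denominator))


def log_minus(x: Fraction, v) -> float:
    """log^-|x|_v = -log min(1, |x|_v) for v = 'inf' or a prime p, x != 0."""
    if v == "inf":
        return max(0.0, -log(float(abs(x))))
    order = ord_p(x.numerator, v) - ord_p(x.denominator, v)
    return max(0.0, order * log(v))  # since |x|_p = p^(-order)


def proximity_and_counting(a: Fraction, beta: Fraction, s_primes=()):
    """Return (m_S, N_S) for S = {'inf'} together with the primes in s_primes.

    Assumes beta != a.
    """
    x = beta - a
    m = log_minus(x, "inf") + sum(log_minus(x, p) for p in s_primes)
    n = sum(log_minus(x, p) for p in prime_divisors(x.numerator)
            if p not in s_primes)
    return m, n
```

For example, with a = 1 and β = 10 the difference β − a = 9 is large at the archimedean place and small only at p = 3, so for S = {∞} the whole height h(9) sits in the counting function, while for S = {∞, 3} it moves into the proximity function. In both cases m_S + N_S equals h(β − a).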
To deal with the analogue of the second main theorem, we give the following
variant of Roth’s theorem in 6.2.3.
Theorem 14.2.6. Let K be a number field, let S ⊂ MK be a finite set of places
containing all archimedean places, and let a1 , . . . , aq be distinct elements of K .
Then for any fixed ε > 0 there are only finitely many β ∈ K such that

\[
\sum_{i=1}^{q} m_S(a_i, \beta) \ge (2 + \varepsilon) h(\beta). \tag{14.1}
\]

Proof: For every v ∈ S , there is at most one i ∈ {1, . . . , q} such that β ∈ K is a
good approximation of ai , in the sense that
\[
|\beta - a_i|_v < c_v := \frac{1}{2} \min_{j \neq k} |a_j - a_k|_v .
\]
Now let β ∈ K be a solution of (14.1) which may be rewritten as

\[
\prod_{i=1}^{q} \prod_{v \in S} \min(1, |\beta - a_i|_v) \le H(\beta)^{-2-\varepsilon}. \tag{14.2}
\]

For v ∈ S , we choose av ∈ {a1 , . . . , aq } such that |β − av |v is minimal. Then the
above shows
\[
\prod_{i=1}^{q} \min(1, |\beta - a_i|_v) \ge \min(1, c_v)^{q-1} |\beta - a_v|_v .
\]
If we substitute this in (14.2), we get the inequality in Roth’s theorem with a multi-
plicative constant. If there were infinitely many solutions β of (14.1), then North-
cott’s theorem in 1.6.8 would yield H(β) → ∞ and the multiplicative constant
could be eliminated by passing to a smaller ε > 0, and Roth’s theorem in 6.2.3
would lead to a contradiction. 
Proposition 14.2.7. Roth’s theorem in 6.2.3 is equivalent to Theorem 14.2.6.
Proof: We have seen above that Roth’s theorem implies Theorem 14.2.6. In order
to see the converse, it is enough to deduce Theorem 6.4.1 for a normal finite ex-
tension E of K . Given αv ∈ E and extensions | |v,K of | |v to E for v in a
finite subset S of MK , we have to prove that

\[
\prod_{v \in S} \min(1, |\beta - \alpha_v|_{v,K}) \le H(\beta)^{-2-\varepsilon} \tag{14.3}
\]
has only finitely many solutions β ∈ K . By enlarging S , we may assume that S
contains all archimedean places. By Corollary 1.3.5, Gal(E/K) operates transitively
on {w ∈ ME | w|v}, hence the local degrees [Ew : Kv ] remain constant
and Corollary 1.3.2 gives
\[
\min(1, |\beta - \alpha_v|_{v,K}) = \prod_{w \mid v} \min(1, |\beta - \sigma_w(\alpha_v)|_w)
\]
for suitable σw ∈ Gal(E/K). So we may assume E = K . Applying Theorem
14.2.6 with {a1 , . . . , aq } = {αv | v ∈ S}, we get finiteness of (14.3). □
14.2.8. Hence Roth’s theorem is the analogue of the second main theorem without
the ramification term (see 13.2.24). In order to obtain a perfect correspondence,
one should have
\[
\sum_{i=1}^{q} m_S(a_i, \beta) \le 2 h(\beta) + O(\log h(\beta))
\]
for h(β) → ∞ instead of (2 + ε)h(β) in Theorem 14.2.6. This was conjectured
by Lang. In the particular case K = Q and S = {∞}, computations by Lang and
Trotter [175] of continued fraction expansions of certain algebraic numbers give
some limited support to this statement.

14.3. Vojta’s conjectures

In this section we translate the notions from the geometric part 13.4 of Nevanlinna
theory to projective varieties over a number field K . Griffiths’s conjecture leads us
to Vojta’s conjecture which would imply Roth’s theorem, Siegel’s theorem on in-
tegral points, Faltings’s theorem, as well as several other outstanding conjectures.
If we allow the points to vary in number fields of bounded degree, then the rami-
fication term analogous to Nevanlinna theory is given by the absolute logarithmic
discriminant. The consequences of Vojta’s conjecture with ramification will be
discussed in the next Section 14.4.
14.3.1. Let X be a projective variety over K and let s = sD be an invertible
meromorphic section of a line bundle L = O(D) with corresponding Cartier
divisor D . We choose a presentation of L (or more generally a bounded M -
metric) to get a local height λD (·, v) for every v ∈ MK . For a finite subset S of
MK (usually containing the archimedean primes), the counting function is

\[
N_{S,D}(P) := \sum_{w \mid v \in M_K \setminus S} \lambda_D(P, w)
\]
and the proximity function is
\[
m_{S,D}(P) := \sum_{w \mid v \in S} \lambda_D(P, w),
\]
where w ranges over MK(P ) . This is well defined for P ∈ X \ supp(D). As
a change of presentations (or M -metrics) changes these quantities only up to
bounded functions, which does not really matter here, we omit in what follows
explicit references to the presentations.

Based on his analogies between Nevanlinna theory and diophantine geometry, dis-
cussed here in Section 14.2, Vojta translated Griffiths’s conjectural second main
theorem from 13.4.18 into the following
Conjecture 14.3.2. Let X be an irreducible smooth projective variety over K .
Let D be a divisor with normal crossings, let H be an ample line bundle and let
KX be the canonical line bundle on X . For any ε > 0 and any finite subset
S ⊂ MK , there is a closed subset Z ≠ X such that for all P ∈ X(K) \ Z , we
have
\[
m_{S,D}(P) + h_{K_X}(P) \le \varepsilon h_H(P) + O(1).
\]

Here, D is said to have normal crossings if the base change to C has normal
crossings in the analytic sense of 13.4.17. The above inequality is called Vojta’s
height inequality.
Remark 14.3.3. Since Theorem 14.2.6 easily extends to the case ai ∈ K̄ (see
below for a generalization), Proposition 14.2.7 shows that the case X = P1K in
Vojta’s conjecture above is equivalent to Roth’s theorem.

More generally, for X = PnK Schmidt’s subspace theorem yields the follow-
ing analogue of Vojta’s version of Cartan’s second main theorem (see Remark
13.4.19):
Theorem 14.3.4. Vojta’s conjecture in 14.3.2 holds for X = PnK and D a divisor
equal to a finite union of hyperplanes in general position defined over K . The
exceptional subset Z may be chosen as a finite union of linear subspaces defined
over K .
Proof: Since all quantities in Conjecture 14.3.2 are invariant under base change to
a larger number field, we may assume that all hyperplane components {Li = 0}
of D are defined over K . For x ∈ PnK (K), we have the local height
\[
-\log\left( |L_i(x)|_v / |x|_v \right), \qquad |x|_v := \max_j |x_j|_v ,
\]
with respect to {Li = 0} and hence
\[
m_{S,D}(x) = -\sum_{v \in S} \sum_{i} \log \frac{|L_i(x)|_v}{|x|_v} + O(1).
\]
By K_{P^n_K} ≅ O_{P^n_K}(−n − 1) ([148], Example II.8.20.1), we easily deduce the claim
from Theorem 7.2.9. □
Remark 14.3.5. As in Nevanlinna theory, Vojta’s conjecture in 14.3.2 is known
for curves X = C of genus g . Note that in this case, the exceptional set is finite
and hence may be omitted by enlarging the bound. We sketch the argument and
we give additional explanations to the meaning of Vojta’s conjecture:
If g = 0, then we know that Conjecture 14.3.2 is equivalent to Roth’s theorem.
If g = 1 and D ≠ 0, then Vojta’s conjecture is a special case of the approximation
theorem for abelian varieties (cf. [277], §7.3) which is an intermediate step in the
standard proof of Siegel’s theorem on finiteness of S -integral points (see Remark
7.3.10 and Serre [277], §7.5). The latter claims that for a geometrically irreducible
smooth projective curve of genus g ≥ 1 and for a reduced divisor D ≠ 0 (or
g = 0 and |supp(D)| ≥ 3), the set of S -integral points of C \ supp(D) is finite. Note
that the complement of D is always affine ([148], Exercise IV.1.3).
In order to see that the Vojta height inequality directly implies Siegel’s theorem,
note that S -integral means that NS,D (P ) = O(1) and hence mS,D (P ) =
hD (P ) + O(1). If g = 0, then using KC = −2[∞], H = [∞] and |supp(D)| ≥ 3,
we get hH (P ) bounded, proving the claim by Northcott’s theorem in 2.4.9. If
g = 1, then KC is trivial (cf. Proposition 8.2.26 or A.13.6) and H := D ≠ 0
implies again hH (P ) bounded.
For g ≥ 2, Vojta’s conjecture is equivalent to Faltings’s theorem in 11.1.1 (use
KX = H ample and that we may assume λD (P, v) ≥ 0 for all v ∈ MK by
Proposition 2.3.9).
In [312], Vojta has given a direct proof of his height inequality for curves in 14.3.2.
This gives a unified proof of Faltings’s, Roth’s, and Siegel’s theorems.
Remark 14.3.6. The assumption that D has normal crossings is really necessary.
This will be shown in a series of examples where we always assume that S is a
finite subset of MK containing the archimedean places:
If X = P1K and D = 2[0] + [∞], then Vojta’s height inequality would give
h(P ) ≤ NS,D (P ) + εh(P ) + O(1) (14.4)
for all P ∈ Gm (K). The set of S -integral points in Gm (K) is equal to the set of
S -units. If |S| ≥ 2, then Dirichlet’s unit theorem in 1.5.13 shows that there are
infinitely many S -units, and they satisfy NS,D (P ) = O(1). By Northcott’s the-
orem in 2.4.9, this contradicts (14.4), hence no multiple components are allowed
for D .
Next, we consider the example X = P2K and
D = [x1 = 0] + [x2 = 0] + [(x1 − x2 )x0 = (x1 + x2 )2 ].
Then (1 : 0 : 0) is an ordinary triple point, hence D does not have normal crossings.
The coordinate ring of the affine variety P2K \ D is
\[
K\left[ z,\ x,\ \frac{1}{x},\ \frac{1}{(x-1)z - (x+1)^2} \right], \qquad z := \frac{x_0}{x_2}, \quad x := \frac{x_1}{x_2}.
\]
The goal is to construct infinitely many S -integral points P = (z : x : 1) in
P2K \ D which are Zariski dense. Here a point P is S -integral if and only if z is an
S -integer and x, (x − 1)z − (x + 1)^2 are S -units. Note that
\[
(x-1)z - (x+1)^2 = -4x^n \tag{14.5}
\]
has a unique solution z_n(x) ∈ K[x, x^{−1}] for every n ∈ Z . So we assume that S
contains all places over 2. By Dirichlet’s unit theorem, there is an S -unit a which
is not a root of unity. We conclude that the set
\[
Y := \{ (z_n(a^m) : a^m : 1) \mid m, n \in \mathbb{Z} \}
\]
is S -integral in P2K \ D . For fixed n ∈ Z, equation (14.5) describes a rational
projective curve in P2K such that {(z_n(a^m) : a^m : 1) | m ∈ Z} is dense. For n ≥
3, the irreducible curves have distinct degrees and hence they are not contained
in a proper closed subvariety of P2K . We conclude that Y is a dense S -integral
subset in P2K \ D . As before, we conclude that Vojta’s height inequality does not
hold.
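
The construction of the set Y can be made concrete in the simplest case. The sketch below assumes K = Q, S = {∞, 2} and the S-unit a = 2, choices made here for illustration and not taken from the text. It solves (14.5) for z and checks the three S-integrality conditions with exact rational arithmetic. The exponent m = 0 is skipped because the closed formula divides by x − 1.

```python
from fractions import Fraction


def z_n(x: Fraction, n: int) -> Fraction:
    """Solve (x - 1) z - (x + 1)^2 = -4 x^n for z (requires x != 1)."""
    return ((x + 1) ** 2 - 4 * x ** n) / (x - 1)


def only_primes(k: int, primes) -> bool:
    """True if the integer k is nonzero and has no prime factor outside primes."""
    k = abs(k)
    if k == 0:
        return False
    for p in primes:
        while k % p == 0:
            k //= p
    return k == 1


def is_s_integer(q: Fraction, primes) -> bool:
    """q is an S-integer if its denominator involves only primes of S."""
    return only_primes(q.denominator, primes)


def is_s_unit(q: Fraction, primes) -> bool:
    """q is an S-unit if numerator and denominator involve only primes of S."""
    return (q != 0 and only_primes(q.numerator, primes)
            and only_primes(q.denominator, primes))


def sample_points(a: Fraction, bound: int):
    """Affine coordinates (z, x) of the points (z_n(a^m) : a^m : 1), m != 0."""
    points = []
    for m in range(-bound, bound + 1):
        if m == 0:
            continue
        x = a ** m
        for n in range(-bound, bound + 1):
            points.append((z_n(x, n), x))
    return points


def is_s_integral_point(z: Fraction, x: Fraction, primes) -> bool:
    """Check the three conditions for S-integrality on the complement of D."""
    return (is_s_integer(z, primes) and is_s_unit(x, primes)
            and is_s_unit((x - 1) * z - (x + 1) ** 2, primes))
```

With a = 2 every sampled point passes the check, since z_n(x) has integer coefficients as a Laurent polynomial in x. For instance, n = 3 and m = 1 give z = −23, and indeed (2 − 1)(−23) − 3^2 = −32 = −4 · 2^3.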
Finally, we show that no bad singularities of the components are allowed either.
Let us consider the plane curve D = [x_0 x_1^{d−1} = x_2^d ] for d ≥ 4. Then D is
irreducible and again does not have normal crossings at the singularity (1 : 0 : 0). For
an S -integer a and an S -unit u , the point P = (a^d + u : 1 : a) is S -integral on
P2K \ D , hence
\[
N_{S,D}(P) = \sum_{v \notin S} \log \frac{\max_i |x_i|_v^{d}}{|x_0 x_1^{d-1} - x_2^{d}|_v} = 0.
\]

For |S| ≥ 2, again Dirichlet’s unit theorem implies easily that such points are
dense in P2K , in contradiction with Vojta’s height inequality.
Remark 14.3.7. A smooth projective variety X over K is called of general type
if there is a positive integer n , an ample line bundle L on X and an effective
divisor E on X with K_X^{⊗n} ≅ L ⊗ O(E) (cf. [307], §1.2). If X is of general
type, then Vojta’s conjecture, applied with D = 0 and L = H , implies
\[
(1 - \varepsilon) h_H(P) + h_E(P) \le O(1).
\]
By Proposition 2.3.9, we may assume hE (P ) ≥ 0. Then Northcott’s theorem in
2.4.9 shows that X(K) is not Zariski dense in X . This is the Bombieri–Lang
conjecture.
For any projective variety X over K , the special set SpX is defined as the Zariski
closure of the union of the images of all non-constant rational maps from irreducible
group varieties to X . Clearly, in the special set one may have infinitely
many K -rational points by considering the images of 1, g, g^2 , . . . in X for a
K -rational point g of the group variety. Then the general Lang conjecture claims
that X is of general type if and only if (X \ SpX )(K′) is finite for all finitely
generated extensions K′/K (see [171], Ch.I, §3 for a further discussion).
Faltings proved this for a subvariety X of an abelian variety, which is Faltings’s
big theorem (see Theorem 11.10.1 for the number field case and [115] for the
general case). In this case the special set is the union of all translates of abelian
subvarieties of dimension ≥ 1 contained in XK (see [171], Ch.I, §6).
14.3.8. In Conjecture 14.3.2, we have only considered points rational in a fixed
number field K . But what happens if we allow P to vary over all K̄ -rational
points? The analogous situation in Nevanlinna theory was considered in Theorem
13.4.30. The additional effect of finite field extension was measured by the count-
ing function of the ramification divisor of the covering. In order to get an analogy
with the number field case, we use the language of schemes. If the reader is not
familiar with the latter, then he may pass directly to Definition 14.3.9.
Let F/K be a finite dimensional field extension (say F ⊂ K̄ ). Then an F -rational
point P of the projective variety X may be identified with a morphism
Spec(F ) → X over K with image P . Instead of the finite coverings p : Y → C ,
we use here Spec(OF ) → Spec(OK ) as models for our fields which may be seen
as arithmetic curves. The ramification divisor of p may be written in the form
\[
R_p = \sum_{y \in Y} \ell(\Omega_{Y/\operatorname{Spec}(C),\,y}) \cdot [y],
\]
where Ω denotes the sheaf of relative differentials (see A.7.29) and ℓ is the length
(see B.4.4). So it is natural to define the ramification divisor of F/K by
\[
R_{F/K} = \sum_{\mathfrak{P} \in \operatorname{Spec}(O_F)} \ell(\Omega_{O_F/O_K,\,\mathfrak{P}})\, \mathfrak{P}.
\]

By B.1.18, Ω_{OF/OK} is a principal OF -module with annihilator equal to the
different D_{F/K} , hence
\[
\ell(\Omega_{O_F/O_K,\,\mathfrak{P}}) = v_{\mathfrak{P}}(D_{F/K}),
\]
where v_𝔓 is the discrete valuation associated to 𝔓 and where v_𝔓(I) := min{v_𝔓(a) |
a ∈ I} for any ideal I of OF . Note that after localization at 𝔓 , it is always the
valuation of a principal generator. Thus the counting function of R_{F/K} should be
\[
N_{R_{F/K}} = -\sum_{\mathfrak{P} \in \operatorname{Spec}(O_F)} \ell(\Omega_{O_F/O_K,\,\mathfrak{P}}) \log |\mathfrak{P}|_{v_{\mathfrak{P}}} = -\sum_{\mathfrak{P} \in \operatorname{Spec}(O_F)} \log |D_{F/K}|_{v_{\mathfrak{P}}} .
\]
Since the norm N_{F/K} of the different D_{F/K} is the discriminant d_{F/K} (see B.1.17),
Lemma 1.3.7 and the product formula lead to
\[
N_{R_{F/K}} = -\frac{1}{[F:K]} \sum_{\wp \in \operatorname{Spec}(O_K)} \log |d_{F/K}|_{\wp} = \frac{1}{[F:\mathbb{Q}]} \log |N_{K/\mathbb{Q}}\, d_{F/K}| .
\]

Definition 14.3.9. For a number field K , we define the absolute logarithmic
discriminant dK by
\[
d_K := \frac{1}{[K:\mathbb{Q}]} \log |D_{K/\mathbb{Q}}|
\]
where DK/Q ∈ N is the discriminant (see B.1.14). The absolute logarithmic
discriminant of a point P in a K -variety is defined by d(P ) := d_{K(P)} .
Proposition 14.3.10. Let F/K be a finite extension of number fields. Then
\[
0 \le d_F - d_K = \frac{1}{[F:\mathbb{Q}]} \log |N_{K/\mathbb{Q}}\, d_{F/K}| = -\frac{1}{[F:K]} \sum_{v} \log |d_{F/K}|_v ,
\]
where v ranges over all non-archimedean places of MK .
Proof: Recall that |dF/K |v is the absolute value of a principal generator of the
localization of dF/K in the prime ideal corresponding to v . It follows from the
approximation theorem in 1.2.13 that we may choose the same principal generator
for a finite set of places. Then the claim follows from Proposition B.1.19 and
Lemma 1.3.7. 
From 14.3.8, we get NRF / K = dF − dK . The analogy with Theorem 13.4.30
leads to Vojta’s conjecture with ramification.
Conjecture 14.3.11. Let X, D, H, S, ε be as in Conjecture 14.3.2 and let d ∈ N .
Then there is a closed subset Z ≠ X such that for all P ∈ X \ Z with [K(P ) :
K] ≤ d we have
\[
m_{S,D}(P) + h_{K_X}(P) - d(P) \le \varepsilon h_H(P) + O(1).
\]
Remark 14.3.12. Even for C = P1Q , Vojta’s conjecture with ramification is un-
known. The case of curves and its relations to the abc-conjecture are studied in
the next section.

14.4. A general abc-conjecture

A natural thing to ask is what form the abc-conjecture should take over a num-
ber field K . Also, the question arises whether there is any special feature in the
structure of the equation a + b = c , and what may be an appropriate higher-
dimensional generalization of it. To this end, we may view the abc-conjecture as
a statement about points on the model x + y = 1, x, y ≠ 0, 1 of the affine curve
P1 \ {0, 1, ∞}. There is nothing special about this particular model, for example
we could have worked instead with the affine curve x(x − 1)y = 1. What matters
here is that the abc-conjecture is a statement about ramification, both arithmetic
and geometric.
In this section, C denotes an irreducible smooth projective curve of genus g de-
fined over K . Let D be an effective reduced divisor on C with local height λ .
Then we will formulate a conjecture on C for D and λ similar to the truncated
second main theorem in Nevanlinna theory. The case D = [0]+[1]+[∞] will give
us the desired generalization of the strong abc-conjecture to number fields. Based
on the work of Elkies and Vojta, we will show that this strong abc-conjecture, the
conjectural truncated second main theorem for arbitrary C , and Vojta’s conjecture
with ramification for curves, are all equivalent.
14.4.1. Clearly, the left-hand side max{|a|, |b|, |c|} of the abc-conjecture in 12.2.2
corresponds logarithmically to the height hλ on C . So we have to deal with

\[
\log \operatorname{rad}(abc) = \sum_{p \mid abc} \log |1/p|_p .
\]

For every natural prime p , there is a contribution log p if the local height of the
point (a : b : c) of the curve x + y = z is non-zero. This leads to the follow-
ing generalization of the radical. For a precise explanation, we refer to Example
14.4.4.
Definition 14.4.2. The conductor of P ∈ C \ supp(D) with respect to λ is defined
by
\[
\operatorname{cond}^K_\lambda(P) := \sum_{v \in M^{\mathrm{fin}}_{K(P)}} \chi(\lambda(P, v)) \log |1/\pi_v|_v ,
\]
where v ∈ M^fin_{K(P)} denotes the discrete valuations of K(P ) with local parameters
πv and
\[
\chi(t) = \begin{cases} 0 & \text{if } t \le 0, \\ 1 & \text{if } t > 0. \end{cases}
\]
Remark 14.4.3. Note that the conductor is completely analogous to the truncated
counting function in Nevanlinna theory from 13.4.26. It also makes sense for
non-reduced effective divisors. The following example shows how to recover the
special conductor for the simple-minded formulation of the abc-conjecture we
have considered in Chapter 12.
Example 14.4.4. Consider the case in which C = P1K and D = [0] + [1] + [∞].
Let (x0 : x1 ) be standard homogeneous coordinates on P1K , which we view as
global sections of OP1 (1). Then D has a presentation
\[
\mathcal{D} = \left( x_0 x_1 (x_0 - x_1);\ O_{P^1}(3),\ x_0^3,\ x_0^2 x_1,\ x_0 x_1^2,\ x_1^3;\ O_{P^1},\ 1 \right)
\]
with associated height
\[
h_D(P) = 3h(P),
\]
where h(P ) is the standard projective height in P1K . Moreover, Ω^1_{P^1} ≅ O_{P^1}(−2)
(use A.13.6), thus there is a presentation K of a canonical divisor such that
\[
h_K(P) = -2h(P).
\]
It follows that
\[
h_D(P) + h_K(P) = h(P).
\]
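The local height function λ(P, v) := λD (P, v) at v ∈ MK(P ) is given by
\[
\lambda(P, v) = \max_{i=0,\dots,3} \log \left| \frac{x_0^{i} x_1^{3-i}}{x_0 x_1 (x_0 - x_1)} \right|_v
= \max \left\{ \log \left| \frac{x_0^2}{x_1 (x_0 - x_1)} \right|_v ,\ \log \left| \frac{x_1^2}{x_0 (x_0 - x_1)} \right|_v \right\} .
\]
If x0 = c, x1 = a, x0 − x1 = b, where OK(P ) a, OK(P ) b , OK(P ) c are coprime
ideals, which is always possible if OK(P ) is a principal ideal domain, then a + b =
c and
\[
\lambda(P, v) = \max \left\{ \log \left| \frac{c^2}{ab} \right|_v ,\ \log \left| \frac{a^2}{cb} \right|_v \right\}
= \max \left\{ \log \left| \frac{1}{a} \right|_v ,\ \log \left| \frac{1}{b} \right|_v ,\ \log \left| \frac{1}{c} \right|_v \right\} ;
\]
therefore
\[
\operatorname{cond}^K_\lambda(P) = \sum_{v(abc) > 0} \log |1/\pi_v|_v .
\]
Another way of writing this conductor, which works in any case, is as follows. For
x = x1 /x0 , let
\[
\operatorname{cond}^K_{[0]}(x) = \sum_{v(x) > 0} \log |1/\pi_v|_v , \qquad
\operatorname{cond}^K_{[1]}(x) = \sum_{v(1-x) > 0} \log |1/\pi_v|_v , \qquad
\operatorname{cond}^K_{[\infty]}(x) = \sum_{v(1/x) > 0} \log |1/\pi_v|_v .
\]
Clearly, they correspond to conductors of the divisors [0], [1] and [∞] on P1K .
We have
\[
\operatorname{cond}^K_\lambda(P) = \operatorname{cond}^K_{[0]}(x(P)) + \operatorname{cond}^K_{[1]}(x(P)) + \operatorname{cond}^K_{[\infty]}(x(P)).
\]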
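
For K = Q the decomposition above recovers the logarithm of the radical. The sketch below is a minimal illustration under the assumption that x ∈ Q \ {0, 1}, where the local parameter at p is π_p = p and log|1/p|_p = log p. The function names are illustrative and not part of the text.

```python
from fractions import Fraction
from math import log


def prime_divisors(n: int) -> list:
    """Primes dividing the integer n, found by trial division."""
    n, primes, p = abs(n), [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def cond_at_zero(x: Fraction) -> float:
    """Sum of log p over the primes p with v_p(x) > 0."""
    return sum(log(p) for p in prime_divisors(x.numerator))


def cond_at_one(x: Fraction) -> float:
    """Sum of log p over the primes p with v_p(1 - x) > 0."""
    return cond_at_zero(1 - x)


def cond_at_infinity(x: Fraction) -> float:
    """Sum of log p over the primes p with v_p(1/x) > 0."""
    return cond_at_zero(1 / x)


def conductor(x: Fraction) -> float:
    """Conductor of x with respect to D = [0] + [1] + [inf] over Q."""
    return cond_at_zero(x) + cond_at_one(x) + cond_at_infinity(x)
```

For the triple 1 + 8 = 9, that is x = a/c = 1/9, the three terms are 0, log 2 and log 3, so the conductor is log 6 = log rad(1 · 8 · 9).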

Proposition 14.4.5. Let D , D′ be effective Cartier divisors on C :

(a) If D ≤ D′ , then there are presentations 𝒟, 𝒟′ with associated local heights
0 ≤ λ ≤ λ′ and hence cond^K_λ ≤ cond^K_{λ′} outside of supp(D′ ).
(b) If supp(D) = supp(D′ ), then cond^K_λ − cond^K_{λ′} is a bounded function
outside of supp(D), for any local heights λ, λ′ of D and D′ .
(c) Let λ, λ′ , λ″ be local heights associated to D , D′ and D + D′ . Then
cond^K_{λ″} ≤ cond^K_λ + cond^K_{λ′} + O(1) outside of supp(D) ∪ supp(D′ ).

Proof: By Proposition 2.3.9, there are presentations 𝒟, 𝒟″ of D and D′ − D
such that the corresponding local heights λ, λ″ are non-negative functions outside
of the supports of D and D′ − D , respectively. Setting 𝒟′ = 𝒟 + 𝒟″ (cf. 2.2.4),
we get (a).
It follows from Theorem 2.2.11 and Remark 2.2.13 (resp. Theorem 2.7.14 in the
case of Néron divisors) that the conductor is independent of the choice of the local
height up to bounded functions on C \ supp(D). If supp(D) = supp(D′ ), then
there is n ∈ N such that D ≤ nD′ and (b) follows from (a).
To prove (c), we may assume by (a) and (b) that λ, λ′ ≥ 0 and λ″ = λ + λ′ . Then
the claim is obvious from the definition of the conductor. □
The conductor cond^K_λ(P ), as well as the absolute logarithmic discriminant d(P ),
does not have a good functorial behaviour with respect to base change. As we will
show next, their sum cond^K_λ(P ) + d(P ) does not suffer from this defect and it
is this quantity which will appear on the right-hand side of the conjectural second
main theorem.

Proposition 14.4.6. Let ϕ : C′ → C be a finite morphism of irreducible smooth
projective curves over K . Let also λ be a local height relative to the effective
divisor D on C and λ′ := λ ◦ ϕ be the local height relative to ϕ∗D . Then for
P′ ∈ C′ with P := ϕ(P′ ) ∉ supp(D) the following statements hold:

(a) cond^K_{λ′}(P′ ) ≤ cond^K_λ(P );
(b) d(P′ ) + cond^K_{λ′}(P′ ) ≥ d(P ) + cond^K_λ(P );
(c) if ϕ is unramified outside of ϕ^{−1}(supp(D)), then
d(P′ ) + cond^K_{λ′}(P′ ) ≤ d(P ) + cond^K_λ(P ) + O(1).

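Proof: By Lemma 1.3.7, we get
\[
\operatorname{cond}^K_{\lambda}(P) - \operatorname{cond}^K_{\lambda'}(P') = \sum_{v \in M^{\mathrm{fin}}_{K(P)}} \chi(\lambda(P, v)) \sum_{w \mid v} \left( 1 - \frac{1}{e_{w/v}} \right) \log |1/\pi_v|_w , \tag{14.6}
\]
where w ranges over M^fin_{K(P′)} and e_{w/v} is the ramification index. This proves (a).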

On the other hand, Proposition 14.3.10 gives
\[
d(P') - d(P) = -\frac{1}{[K(P') : K(P)]} \sum_{v \in M^{\mathrm{fin}}_{K(P)}} \log |d_{K(P')/K(P)}|_v . \tag{14.7}
\]

By Dedekind’s discriminant theorem in B.2.12, we have
\[
\log |d_{K(P')/K(P)}|_v = \sum_{w \mid v} f_{w/v} \left( e_{w/v} - 1 + \delta_w \right) \log |\pi_v|_v ,
\]
where f_{w/v} is the residue degree and
\[
\delta_w \begin{cases} \in [1,\ e_{w/v}\, v(e_{w/v})] & \text{if } v(e_{w/v}) \ge 1, \\ = 0 & \text{else.} \end{cases}
\]
As usual the discrete valuation v is normalized by v(πv ) = 1. The case in which
v(e_{w/v}) ≥ 1 is called the case of wild ramification.
By Proposition 1.2.11, e_{w/v} f_{w/v} is the local degree of w|v and hence
\[
-\frac{1}{[K(P') : K(P)]} \log |d_{K(P')/K(P)}|_v = \sum_{w \mid v} \left( 1 + \frac{\delta_w - 1}{e_{w/v}} \right) \log |1/\pi_v|_w .
\]
If we substitute this in (14.7), comparison with (14.6) leads to (b).
To prove (c), we note that (14.7) minus (14.6) and the above considerations yield
\[
d(P') + \operatorname{cond}^K_{\lambda'}(P') \le d(P) + \operatorname{cond}^K_{\lambda}(P) + C_1 + C_2 ,
\]
where
\[
C_1 = -\frac{1}{[K(P') : K(P)]} \sum_{v} \log |d_{K(P')/K(P)}|_v
\]
with v ranging over all non-archimedean places of K(P ) with λ(P, v) ≤ 0 and
\[
C_2 = \sum_{v \in M^{\mathrm{fin}}_{K(P)}} \sum_{w \mid v} v(e_{w/v}) \log |1/\pi_v|_w .
\]

Note that only places w ∈ M^fin_{K(P′)} wildly ramified over v contribute to C2 . Let
p be the natural prime with v|p and let vp be the corresponding discrete valuation
normalized by vp (p) = 1. Using
\[
v(e_{w/v}) = e_{v/p}\, v_p(e_{w/v}) \le e_{v/p} \log [K(P') : K(P)] / \log p ,
\]
we get
\[
\sum_{v \mid p} \sum_{w \mid v} v(e_{w/v}) \log |1/\pi_v|_w \le \frac{\log [K(P') : K(P)]}{\log p} \sum_{w \mid p} \log |1/p|_w = \log [K(P') : K(P)] ,
\]
where the last step was done by appealing to Lemma 1.3.7. Let S0 be the set of
natural primes p such that K(P′ )/K(P ) is wildly ramified at a place over p . For
every p ∈ S0 , Example 1.4.12 shows
\[
p \mid e_{w/v} \le [K(P') : K(P)] \le \deg(\varphi) ,
\]
hence the cardinality |S0 | is bounded by a constant depending on deg(ϕ). We
conclude
\[
C_2 \le |S_0| \log \deg(\varphi) = O(1).
\]
It remains to bound C1 . By Proposition 14.4.5, we may assume λ ≥ 0. Let M be
the set of places of K̄ represented by the extensions of | |p , p ∈ MQ . It is easy to
show that E^u := {x ∈ C \ supp(D) | λ(x, u) = 0}, u ∈ M , is an M -bounded
family in C \ supp(D) (use Proposition 2.6.17). The Chevalley–Weil theorem in
the form of Theorem 10.3.5 gives the existence of a non-zero α ∈ Z such that
\[
\alpha \in d_{K(P')_w / K(P)_v}
\]
for all P′ ∈ C′ \ supp(ϕ∗D) and w ∈ M^fin_{K(P′)} with λ′(P′ , w) = 0. By Corollary
1.3.2, B.1.20 and B.1.21, we get
\[
|d_{K(P')/K(P)}|_v = \prod_{w \mid v} |d_{K(P')_w / K(P)_v}|_v \ge |\alpha|_v^{[K(P') : K(P)]} .
\]
Therefore
\[
C_1 \le -\sum_{v \in M^{\mathrm{fin}}_{K(P)}} \log |\alpha|_v = \log |\alpha|
\]
by Lemma 1.3.7 and the product formula. This finishes the proof of (c). □
Corollary 14.4.7. Let F/K be a finite subextension of K̄/K . Then:

(a) cond^F_λ(P ) ≤ cond^K_λ(P ) with equality if F/K is unramified.
(b) d_{K(P)} + cond^K_λ(P ) ≤ d_{F(P)} + cond^F_λ(P ) ≤ d_{K(P)} + cond^K_λ(P ) + O(1).

Proof: Let us consider the base change C_F as a curve C′ over K . Then it is
clear that the natural morphism ϕ : C′ → C is finite and unramified. We have
P′ ∈ C′ in the fibre over P such that K(P′ ) = F (P ) and the claim follows from
Proposition 14.4.6. □
Proposition 14.4.8. Let S0 be a finite set of natural primes. Then the contribution
of the places v lying over the primes of S0 to cond^K_λ(P ) satisfies the bound
\[
\sum_{v \mid p \in S_0} \chi(\lambda(P, v)) \log |1/\pi_v|_v \le \sum_{p \in S_0} \log p
\]
on C \ supp(D).
Proof: This is obvious from |πv |v ≥ |p|v and Lemma 1.3.7. □
Proposition 14.4.9. Let λ be a local height relative to the effective divisor D on
C . Then
\[
\operatorname{cond}^K_\lambda(P) \le h_\lambda(P) + O(1)
\]
for all P ∉ supp(D).
Proof: By Proposition 14.4.5, we may assume that λ ≥ 0 and that λ is given by a
presentation. Then λ(P, v) ≥ log |1/πv |v or λ(P, v) = 0 and hence
\[
h_\lambda(P) = \sum_{v \in M_{K(P)}} \lambda(P, v) \ge \sum_{v \in M^{\mathrm{fin}}_{K(P)}} \chi(\lambda(P, v)) \log |1/\pi_v|_v = \operatorname{cond}^K_\lambda(P),
\]
proving the claim. □


In Nevanlinna theory, we have seen that the second main theorem for curves in
13.4.23 easily implies the truncated second main theorem in 13.4.28. Similarly,
we can show that the second main theorem for coverings in 13.4.30 implies a
corresponding truncated second main theorem. The contribution of a finite rami-
fied covering p : Y → C is measured by NRp (cf. 13.4.31), which is the ana-
logue of the absolute logarithmic discriminant d(P ) from diophantine geometry
(cf. 14.3.8). Since the conductor corresponds to the truncated counting function,
the truncated second main theorem for coverings leads to the following:
Conjecture 14.4.10. Let D be a reduced effective divisor on C with local height
λ , let H be an ample line bundle on C and let ε > 0. Then
\[
h_D(P) + h_{K_C}(P) \le \operatorname{cond}^K_\lambda(P) + d(P) + \varepsilon h_H(P) + O_{[K(P):K]}(1)
\]
for all P ∈ C \ supp(D).


Here, as in the other conjectures of this chapter, in the bound O[K(P ):K] (1) which
may depend on ε and a whole complex of other data, it is only the dependence on
the point P which matters here and we give it in terms of the degree [K(P ) : K].
14.4.11. If we specialize to the case C = P1K and D = [0] + [1] + [∞], then
K_C ≅ O_{P^1}(−2) (cf. A.13.6) and hence
\[
h_D(P) + h_{K_C}(P) = h(P) + O(1).
\]
By Example 14.4.4, we get the strong abc-conjecture of Elkies generalizing the
strong abc-conjecture in 12.2.2 to number fields:
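Conjecture 14.4.12. For every ε > 0, we have
\[
(1 - \varepsilon) h(x) \le \operatorname{cond}^{\mathbb{Q}}_{[0]}(x) + \operatorname{cond}^{\mathbb{Q}}_{[1]}(x) + \operatorname{cond}^{\mathbb{Q}}_{[\infty]}(x) + d_{\mathbb{Q}(x)} + O_{[\mathbb{Q}(x):\mathbb{Q}]}(1)
\]
for all x ∈ Q̄ \ {0, 1}, where O_{[Q(x):Q]}(1) depends only on ε and [Q(x) : Q].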
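
To get a feeling for the two sides of the inequality, one can evaluate them for rational x, where Q(x) = Q and d_Q = 0. The sketch below is only a numerical illustration under these assumptions: the conjecture is an asymptotic statement with an unspecified O(1) term, so no finite computation confirms or refutes it. The function names are illustrative.

```python
from fractions import Fraction
from math import log


def prime_divisors(n: int) -> list:
    """Primes dividing the integer n, found by trial division."""
    n, primes, p = abs(n), [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def height(x: Fraction) -> float:
    """Logarithmic height h(x) = log max(|numerator|, |denominator|)."""
    return log(max(abs(x.numerator), x.denominator))


def conductor_sum(x: Fraction) -> float:
    """cond_[0](x) + cond_[1](x) + cond_[inf](x) over Q, for x not in {0, 1}."""
    primes = set(prime_divisors(x.numerator))        # v_p(x) > 0
    primes |= set(prime_divisors((1 - x).numerator))  # v_p(1 - x) > 0
    primes |= set(prime_divisors(x.denominator))      # v_p(1/x) > 0
    return sum(log(p) for p in primes)


def both_sides(a: int, c: int):
    """Return (h(x), conductor sum) for x = a/c with d_Q = 0."""
    x = Fraction(a, c)
    return height(x), conductor_sum(x)
```

For the triple 2 + 3^10 · 109 = 23^5, that is x = 2/23^5, the height is log 6436343 and the conductor sum is log(2 · 3 · 109 · 23) = log 15042, so the ratio of the two is about 1.63.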

We will show that both conjectures 14.4.10 and 14.4.12 are equivalent to
Conjecture 14.3.11 restricted to curves. For the latter, the exceptional set Z is
finite and may be omitted by enlarging the O(1)-term. Hence Vojta’s conjecture
with ramification for curves reads as:
Conjecture 14.4.13. Let S be a finite subset of MK . With the same hypothesis
as in Conjecture 14.4.10, the estimate
\[
m_{S,D}(P) + h_{K_C}(P) \le d(P) + \varepsilon h_H(P) + O_{[K(P):K]}(1)
\]
holds for all P ∈ C \ supp(D).
14.4.14. Vojta’s idea to deduce the strong abc-conjecture from his conjecture is
that, by passing to a finite covering π : C′ → C , we may improve the height
inequality in 14.4.13. In fact, Conjecture 14.4.13 applied to C′ and D′ := π∗(D)_red
(namely, the sum of the irreducible components) gives
\[
m_{S,D'}(P') + h_{K_{C'}}(P') - d(P') \le \varepsilon\, h_{\pi^* H}(P') + O_{[K(P'):K]}(1) \tag{14.8}
\]
for all P′ ∉ supp(D′ ). Now K_{C′} ≅ π∗K_C + R_π from Theorem B.4.5 implies
\[
h_{K_{C'}}(P') = h_{K_C}(P) + m_{S,R_\pi}(P') + N_{S,R_\pi}(P') + O(1) \tag{14.9}
\]
for P := π(P′ ). By Proposition B.4.7, we have D′ ≥ π∗(D) − R_π and hence
\[
m_{S,D'}(P') \ge m_{S,D}(P) - m_{S,R_\pi}(P') + O(1). \tag{14.10}
\]
By (14.9) and (14.10) in (14.8), we get
\[
m_{S,D}(P) + h_{K_C}(P) + N_{S,R_\pi}(P') - d(P') \le \varepsilon\, h_H(P) + O_{[K(P'):K]}(1).
\]
This leads to the improvement N_{S,R_π}(P′ ) + d(P ) − d(P′ ) on the left-hand side
of the original Vojta height inequality on C . Moreover, by a Chevalley–Weil type
argument, this improvement is always bounded from below ([307], Th.5.1.6), but
we will not need this result here.
If the morphism π is unramified outside of π^{−1}(supp(D)), we may apply
Proposition 14.4.6, getting
\[
h_D(P) + h_{K_C}(P) - d(P) + N_{S,R_\pi}(P') \le \operatorname{cond}^K_\lambda(P) + N_{S,D}(P) + \varepsilon\, h_H(P) + O_{[K(P'):K]}(1). \tag{14.11}
\]
Example 14.4.15. A Fermat curve is a plane projective curve given by the
projective equation
\[
x_0^n + x_1^n = x_2^n
\]
for some n ≥ 1. By the Jacobi criterion in A.7.15, the Fermat curve is smooth.
There is a finite covering π : C_n → C_1 ≅ P1Q given by mapping (x0 : x1 : x2 )
to (x_0^n : x_1^n : x_2^n ). Local analytically over 0, 1, or ∞ , the morphism is given
by z → z^n , hence R_π = (n − 1)D′ , where D′ = π^{−1}{0, 1, ∞} (use Example
B.4.8). As a consequence of the Hurwitz theorem in B.4.6, we note that C_n has
genus (1/2)(n − 1)(n − 2) (also clear from A.13.4).
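
The genus formula can be recovered from the ramification data by a short count. The sketch below assumes the Hurwitz formula in the form 2g(C_n) − 2 = deg(π)(2g(P^1) − 2) + deg(R_π), together with deg(π) = n^2 and 3n points in π^{−1}{0, 1, ∞}. These counts are derived here for illustration and are not spelled out in the text.

```python
def fermat_genus_via_hurwitz(n: int) -> int:
    """Genus of the Fermat curve C_n from the covering C_n -> P^1."""
    degree = n * n                   # n-th roots of two affine coordinates
    points_over_branch_locus = 3 * n  # n points over each of 0, 1, infinity
    deg_ramification = (n - 1) * points_over_branch_locus  # R = (n - 1) D'
    two_g_minus_two = degree * (-2) + deg_ramification      # Hurwitz, g(P^1) = 0
    return (two_g_minus_two + 2) // 2


def fermat_genus_closed_form(n: int) -> int:
    """Closed form (n - 1)(n - 2) / 2 quoted in the example."""
    return (n - 1) * (n - 2) // 2
```

Both functions agree for every n ≥ 1. For n = 4, for instance, the count gives 2g − 2 = −32 + 36 = 4, hence g = 3.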
Theorem 14.4.16. The following conjectures are all equivalent:

(a) strong abc-conjecture in 14.4.12;


(b) Conjecture 14.4.10 for all curves C over any number field;
(c) Vojta’s conjecture in 14.4.13 for all curves C over any number field;
(d) Vojta’s conjecture in 14.4.13 for all Fermat curves over Q .

The implication (a) ⇒ (b) follows from an argument of Elkies [98] which was
elaborated by M. van Frankenhuysen [305]. The claims (b) ⇒ (c) ⇒ (d) are
trivial and (d) ⇒ (a) is due to P. Vojta (see [307], [317]).
Proof: (a) ⇒ (b): Let C be an irreducible smooth projective curve over the
number field K with local height $\lambda$ relative to the reduced divisor D. The proof
is based on Elkies's idea of using a Belyĭ function $f : C \to \mathbb{P}^1_K$ for D, in other
words with $\mathrm{supp}(D) \subset f^{-1}\{0, 1, \infty\}$ and unramified outside of $f^{-1}\{0, 1, \infty\}$
(see Lemma 12.2.7). We use $D_0 = [0] + [1] + [\infty]$, $D' = f^*(D_0)$, $D_1 = D'_{\mathrm{red}}$
and $D_2 = D_1 - D$ with corresponding local heights $\lambda_0$, $\lambda'$, $\lambda_1$ and $\lambda_2$. By
Proposition B.4.7, the ramification divisor $R_f$ satisfies
\[
R_f = D' - D_1. \tag{14.12}
\]
For $P \notin f^{-1}\{0, 1, \infty\}$, the strong abc-conjecture, Example 14.4.4 and Corollary
14.4.7 imply
\[
(1 - \varepsilon')\, h(f(P)) \le \mathrm{cond}^K_{\lambda_0}(f(P)) + d(f(P)) + O_{[K(f(P)):\mathbb{Q}]}(1).
\]

By Proposition 14.4.6, we have
\[
\mathrm{cond}^K_{\lambda_0}(f(P)) + d(f(P)) \le \mathrm{cond}^K_{\lambda'}(P) + d(P).
\]
By Proposition 14.4.5, we get
\[
(1 - \varepsilon')\, h(f(P)) \le \mathrm{cond}^K_{\lambda}(P) + \mathrm{cond}^K_{\lambda_2}(P) + d(P) + O_{[K(P):K]}(1).
\]
Note that only the dependence on P is indicated in the bound, which may depend
on K as well. By Proposition 14.4.9 and $h(f(P)) = h_{f^*O_{\mathbb{P}^1}(1)}(P) + O(1)$, we
deduce
\[
(1 - \varepsilon')\, h_{f^*O_{\mathbb{P}^1}(1)}(P) \le \mathrm{cond}^K_{\lambda}(P) + h_{D_2}(P) + d(P) + O_{[K(P):K]}(1). \tag{14.13}
\]
We have proved this only for $P \notin \mathrm{supp}(D_1)$, but by increasing the constants we
may assume that it holds for all $P \notin \mathrm{supp}(D)$. By Theorem B.4.5 and (14.12),
we have
\[
K_C \cong f^*K_{\mathbb{P}^1} \otimes O(R_f) \cong f^*O_{\mathbb{P}^1}(1) \otimes O(-D_1).
\]
By this equation, inequality (14.13), and Theorem 2.3.8, we get
\[
(1 - \varepsilon')\bigl(h_{K_C}(P) + h_{D_1}(P)\bigr) \le \mathrm{cond}^K_{\lambda}(P) + h_{D_2}(P) + d(P) + O_{[K(f(P)):K]}(1).
\]
Finally, $D = D_1 - D_2$ leads to
\[
h_{K_C}(P) + h_D(P) - \varepsilon'\bigl(h_{K_C}(P) + h_{D_1}(P)\bigr) \le \mathrm{cond}^K_{\lambda}(P) + d(P) + O_{[K(P):K]}(1).
\]
By A.6.10, there is $n \in \mathbb{N}$ such that $nH - D_1 - K_C$ is ample, hence
\[
h_{D_1} + h_{K_C} \le n\, h_H + O(1).
\]
Choosing $\varepsilon' = \varepsilon/n$, we get (b).
(b) ⇒ (c): This is obvious from $h_D(P) = m_{S,D}(P) + N_{S,D}(P)$ and from
\[
\mathrm{cond}^K_{\lambda}(P) \le N_{S,D}(P) + O(1),
\]
easily deduced from Proposition 14.4.8.
(c) ⇒ (d) is trivial.
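(d) ⇒ (a): Let $P \in \mathbb{P}^1_{\overline{\mathbb{Q}}} \setminus \{0, 1, \infty\}$ with affine coordinate $x = x(P) \in \overline{\mathbb{Q}}$. We
identify $\mathbb{P}^1_{\mathbb{Q}}$ with $C := C_1$ and we consider the covering $\pi : C' \to C$ of Fermat
curves with $C' := C_n$ for suitable $n \ge 1$ (see Example 14.4.15). We choose
$P' \in C'$ with $P = \pi(P')$. For $D := [0] + [1] + [\infty]$ and $D' := \pi^{-1}\{0, 1, \infty\}$,
Example 14.4.15 yields easily $\pi^*D = nD'$ and hence $R_\pi = (n-1)D'$ proves
\[
N_{S,R_\pi}(P') = \Bigl(1 - \frac{1}{n}\Bigr) N_{S,\pi^*D}(P') + O(1) = \Bigl(1 - \frac{1}{n}\Bigr) N_{S,D}(P) + O(1).
\]
Applying (14.11) on page 495 to the covering $\pi$, we get
\[
h_D(P) + h_{K_C}(P) - d(P) \le \mathrm{cond}^{\mathbb{Q}}_{\lambda}(P) + \frac{1}{n} N_{S,D}(P) + \varepsilon' h_H(P) + O_{[\mathbb{Q}(P):\mathbb{Q}]}(1).
\]
By Proposition 2.3.9, we have $N_{S,D}(P) \le h_D(P) + O(1)$ and hence
\[
\Bigl(1 - \frac{1}{n}\Bigr) h_D(P) + h_{K_C}(P) - \varepsilon' h_H(P) \le \mathrm{cond}^{\mathbb{Q}}_{\lambda}(P) + d(P) + O_{[\mathbb{Q}(P):\mathbb{Q}]}(1).
\]
Since $C = \mathbb{P}^1_{\mathbb{Q}}$, $K_C \cong O_{\mathbb{P}^1}(-2)$ (cf. A.13.6), $H = O_{\mathbb{P}^1}(1)$ with $h_H$ the standard
height and Example 14.4.4, we get
\[
\Bigl(1 - \frac{3}{n} - \varepsilon'\Bigr) h(P) \le \mathrm{cond}^{\mathbb{Q}}_{[0]}(x(P)) + \mathrm{cond}^{\mathbb{Q}}_{[1]}(x(P)) + \mathrm{cond}^{\mathbb{Q}}_{[\infty]}(x(P)) + d(P) + O_{[\mathbb{Q}(P):\mathbb{Q}]}(1)
\]
for all $P \in \mathbb{P}^1_{\overline{\mathbb{Q}}} \setminus \{0, 1, \infty\}$ with affine coordinate $x(P)$. If we choose $\varepsilon' = \varepsilon/2$
and $n \ge 6/\varepsilon$, we get (a). □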
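The final choice of parameters can be verified in one line. The display below only spells out that last step, writing $\varepsilon'$ for the auxiliary parameter used when applying Vojta's conjecture and $\varepsilon$ for the one in the strong abc-conjecture.

```latex
\[
n \ge \frac{6}{\varepsilon} \;\Longrightarrow\; \frac{3}{n} \le \frac{\varepsilon}{2},
\qquad\text{so with } \varepsilon' = \frac{\varepsilon}{2}:\qquad
1 - \frac{3}{n} - \varepsilon' \;\ge\; 1 - \frac{\varepsilon}{2} - \frac{\varepsilon}{2} \;=\; 1 - \varepsilon .
\]
```

Since $h(P) \ge 0$, the left-hand side of the last inequality in the proof is then at least $(1 - \varepsilon)\,h(P)$, which is the shape required in (a).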
Remark 14.4.17. If we are only interested in $x \in K \setminus \{0, 1\}$ for a fixed number
field K, then dependence on K plays no role and the strong abc-conjecture in
14.4.12 and Corollary 14.4.7 imply the K-rational abc-conjecture
\[
(1 - \varepsilon)\, h(x) \le \mathrm{cond}^K_{[0]}(x) + \mathrm{cond}^K_{[1]}(x) + \mathrm{cond}^K_{[\infty]}(x) + O(1).
\]
The proof of Theorem 14.4.16 can be adapted to show that the K-rational abc-conjecture
implies the K-rational version of Conjecture 14.4.10, namely
\[
h_D(P) + h_{K_C}(P) \le \mathrm{cond}^K_{\lambda}(P) + \varepsilon\, h_H(P) + O(1)
\]
for all $P \in C(K) \setminus \mathrm{supp}(D)$. In particular, choosing $K = \mathbb{Q}$, $C = \mathbb{P}^1_{\mathbb{Q}}$ and
$D = \mathrm{div}(F)$, we get immediately Theorem 12.2.12. Indeed, we have
\[
\log(\mathrm{rad}(F(m, n))) = \mathrm{cond}^{\mathbb{Q}}_{\lambda}((m : n))
\]
for the local height
\[
\lambda(x, p) = \log \frac{\max\{|x_0|_p^d, |x_1|_p^d\}}{|F(x_0, x_1)|_p}.
\]
14.4.18. In Section 12.2 we have shown that the strong abc-conjecture over Q
implies the classical Roth theorem over Q . It is now clear, as pointed out by Elkies,
that the K -rational abc-conjecture implies Roth’s theorem over the number field
K in 6.2.3.
Indeed, the proof of Theorem 14.4.16 shows that the K -rational abc-conjecture
implies Vojta’s height inequality in 14.3.2 for K -rational points of an irreducible
smooth projective curve X over K . Hence Remark 14.3.3 proves the claim.

By Remark 14.3.5, the same argument proves the following result of Elkies [98]:
Theorem 14.4.19. The K -rational abc-conjecture implies Faltings’s theorem in
11.1.1 for the number field K .
Remark 14.4.20. The proof we have given actually shows that an effective ver-
sion of the K -rational abc-conjecture implies effective versions of Roth’s and
Faltings’s theorems.

Remark 14.4.21. Equivalently, we may formulate the strong abc-conjecture as
\[
h(x) \le C_1 \cdot \bigl(\mathrm{cond}^{\mathbb{Q}}_{[0]}(x) + \mathrm{cond}^{\mathbb{Q}}_{[1]}(x) + \mathrm{cond}^{\mathbb{Q}}_{[\infty]}(x)\bigr) + C_2 \cdot d_{\mathbb{Q}(x)} + O_{C_1, C_2, [\mathbb{Q}(x):\mathbb{Q}]}(1) \tag{14.14}
\]
for $x \in \overline{\mathbb{Q}} \setminus \{0, 1\}$ with $C_1 = C_2 > 1$.

Such a result would be quite useful for any constants $C_1$, $C_2$. Note also that D. Masser
[194] proved that (14.14) is false for $C_2 = 1$ and any $C_1$, with a method similar to the one
used by Stewart–Tijdeman in proving Theorem 12.4.6.

14.5. The abc-theorem for function fields

We have seen in Example 12.4.1 that the abc-conjecture holds for complex poly-
nomials. In this section, we extend this to a function field K of characteristic 0
proving the theorem of Stothers and Mason. But first, we transfer the results of
the last section to the case of function fields of characteristic 0. Then we prove
the abc-conjecture and also the Vojta height inequality in the split function field
case where no ε -term is necessary. In this situation, we can also transfer Car-
tan’s second main theorem with similar arguments as in Nevanlinna theory. As
a corollary, we obtain the result of Voloch and Brownawell-Masser bounding the
non-degenerate solutions of the unit equation with several summands.
14.5.1. In this section, K = k(B) denotes a function field of an irreducible pro-
jective variety B over a field k of characteristic zero. We assume that B is regular
in codimension 1 and we fix an ample class c on B .
Let MK be the set of prime divisors on B . We recall from Proposition 1.4.7 that
the discrete absolute values
\[
|f|_v := e^{-\deg_c(v)\,\mathrm{ord}_v(f)} \qquad (f \in K,\ v \in M_K)
\]
satisfy the product formula.
14.5.2. By Lemma 1.4.10, every finite extension F/K is a function field F =
k(Y ) for a variety Y regular in codimension 1 and a finite morphism p : Y →
B . We have seen in Remark 1.4.11 that the set of places MF is independent
of the choice of the model Y and the argument shows that two such models are
isomorphic outside subsets of codimension at least 2. In fact, we may choose the
normalization of B in F as a canonical model. By Example 1.4.13, the absolute
values on MF are normalized according to 1.3.6 by
\[
|f|_w := e^{-\deg_{p^*c}(w)\,\mathrm{ord}_w(f)/[F:K]} \qquad (f \in F,\ w \in M_F).
\]
14.5.3. In what follows, we assume that the reader is familiar with Appendix B.4.
In analogy to 14.3.8, we define the counting function of the ramification divisor

Rp by

NRp = − (ΩY /B,v ) log |πv |v .
v∈MF
Note that $\pi_v$ is a local parameter of v, hence
\[
N_{R_p} = \frac{1}{[F:K]} \deg_{p^*c}(R_p).
\]
Let $K_Y$ (resp. $K_B$) be the canonical line bundle on the smooth part $Y_{\mathrm{reg}}$ (resp.
$B_{\mathrm{reg}}$). Since the complement of the smooth part has codimension at least 2, the
corresponding canonical divisors are well-defined in the Chow groups by passing
to the Zariski closures of the components, hence their degrees are also well-defined.
By Theorem B.4.5 and projection formula (A.13) on page 558, we get
\[
N_{R_p} = \frac{1}{[F:K]} \bigl(\deg_{p^*c}(K_Y) - \deg_{p^*c}(p^*K_B)\bigr) = \frac{\deg_{p^*c}(K_Y)}{[F:K]} - \deg_c(K_B).
\]
This suggests the following analogue of the absolute logarithmic discriminant
\[
d_F := \frac{\deg_{p^*c}(K_Y)}{[F:K]}.
\]
By the arguments in 14.5.2, all these quantities do not depend on the choice of Y .
Example 14.5.4. If B is a geometrically irreducible smooth projective curve and
c is the equivalence class of a point, then A.13.6 shows that dF = (2g(Y ) −
2)/[F : K], where g(Y ) is the genus of Y .
Definition 14.5.5. For a complete variety X over K with any local height λ
relative to a Cartier divisor D and for a finite subset S ⊂ MK , the counting
function NS,D and the proximity function mS,D are defined as in 14.3.1. By
14.5.3, we define d(P ) := dK(P ) for any P ∈ X .
If D is effective, then we define the conductor $\mathrm{cond}^K_{\lambda}$ as in Definition 14.4.2,
where the sum now ranges over all $M_{K(P)}$.
Example 14.5.6. A point $P = (f_0 : \cdots : f_n) \in \mathbb{P}^n_k(F)$ induces a rational map
$f_P : Y \dashrightarrow \mathbb{P}^n_k$ defined over k. By Example 2.4.11, the standard height satisfies
\[
h(P) = \frac{1}{[F:K]} \deg_{p^*c} f_P^*H
\]
for every hyperplane H of $\mathbb{P}^n_k$ with $P \notin H$.
Example 14.5.7. More generally, we consider a complete variety X with a Cartier
divisor D defined over the constant field k. Then $P \in X(F) \setminus \mathrm{supp}(D)(F)$
induces a rational map $f_P : Y \dashrightarrow X$ (locally defined as in Example 14.5.6), but
note that $f_P$ is intrinsically defined because X is defined over k. We claim that
\[
\lambda(P, v) = \frac{1}{[F:K]}\, \mathrm{ord}_v(f_P^*D)\, \deg_{p^*c}(v) \qquad (v \in M_F)
\]
is a local height relative to D. If D is a hyperplane section, then this follows
similarly as in Example 14.5.6. On a projective variety, D is the difference of
two hyperplane sections in suitable embeddings (cf. A.6.10, A.6.11), thus leading
to the claim. For complete varieties, we have to use the theory of local heights
from Section 2.7. Indeed, $O(D)$ has a canonical metric $\|\ \|_v$ given by $\|s\|_v := 1$
for any local nowhere vanishing section s defined over k and it is immediate that
$\lambda(P, v)$ is the corresponding local height. We conclude that
\[
h_D(P) = \frac{1}{[F:K]} \deg_{p^*c}(f_P^*(D)).
\]
By the valuative criterion of properness, the complement of the domain of $f_P$ has
codimension at least 2 in Y (see A.11.10), hence $f_P^*(D)$ may be viewed as a
Weil divisor on Y respecting rational equivalence. Let S be a finite subset of
$M_K$, then $f_P^*(D) = f_P^*(D)_S + f_P^*(D)_T$, where the first summand is supported
over S and the second over the complement T of S. We conclude
\[
m_{S,D}(P) = \frac{1}{[F:K]} \deg_{p^*c}(f_P^*(D)_S), \qquad N_{S,D}(P) = \frac{1}{[F:K]} \deg_{p^*c}(f_P^*(D)_T).
\]
For $F = K(P)$, it is clear that
\[
\mathrm{cond}^K_D(P) = \frac{1}{[F:K]} \deg_{p^*c}(f_P^*(D)_{\mathrm{red}}),
\]
where we recall that $E_{\mathrm{red}}$ denotes the sum of the components of the divisor E.
14.5.8. We introduce a technique to reduce the canonical quantities from Example
14.5.7 to the case of a function field of a curve. By intersection theory, all quan-
tities are homogeneous in c of degree dim(B) − 1. So we may assume that c is
very ample giving rise to a closed embedding B → PnK . Let F := K(P ) and let
Y be a model for F as in 14.5.2. By Bertini’s theorem (cf. [148], Cor.III.10.9),
we may choose (generic) hyperplanes H1 , . . . , Hdim(B)−1 of PnK such that the
proper intersection product
Yc := p∗ H1 . . . p∗ Hdim(B)−1 .Y
is an irreducible smooth projective curve over k . Bertini’s theorem is usually
stated for k algebraically closed and X regular, but this implies easily the claim
for any field of characteristic zero and the singularities of Y do not disturb the
applications of Bertini’s theorem because the dimension of the singular part de-
creases at least by 1 every time we intersect with a generic hyperplane. Similarly,
we may assume that
Bc := H1 . . . Hdim(B)−1 .B
is an irreducible smooth projective curve over k . Let Kc , Fc be the function
fields of Bc and Yc . Then the projection formula (A.13) on page 558 gives
[F : K] = degp∗ c Y / degc B = degp∗ c Yc / degc Bc = [Fc : Kc ]. (14.15)

Moreover, it is clear that P may be viewed as a point Pc ∈ X(Fc ) (see below).


We conclude that Kc (Pc ) = Fc . The goal is to show that the quantities from
Example 14.5.7 for P are the same as the corresponding quantities for Pc relative
to the function field Kc of the curve Bc .

Let U be the largest open subset of Yreg where fP is defined. Because Y \ U


has codimension at least 2 in Y and choosing H1 , . . . , Hdim(B)−1 generic, we
may assume that p∗ H1 , . . . , p∗ Hdim(B)−1 and the closure of fP∗ D in Y intersect
properly (even generically transversely) and that the proper intersection product
is supported in U . We may assume also that the restriction gP of fP to Yc is
a well-defined morphism. By commutativity of the proper intersection product
(generalizing A.9.20, see [125], Th.2.4), we have
\[
p^*H_1 \cdots p^*H_{\dim(B)-1} . f_P^*D = f_P^*D . Y_c = g_P^*D
\]
and hence
\[
h_D(P) = \frac{1}{[F:K]} \deg_{p^*c}(g_P^*D) = h_D(P_c).
\]
For $v_1, \dots, v_m \in M_F$ and choosing $H_1, \dots, H_{\dim(B)-1}$ generic, we may assume
that
\[
p^*H_1 \cdots p^*H_{\dim(B)-1} . v_i = \sum_{j=1}^{\deg_{p^*c}(v_i)} [y_{ij}]
\]
and that all the points $y_{ij}$ are different (here, we use that the intersections are
generically transverse and A.9.22). If we choose for $v_1, \dots, v_m$ the components
of $f_P^*D$, then we get
\[
p^*H_1 \cdots p^*H_{\dim(B)-1} . (f_P^*D)_{\mathrm{red}} = (g_P^*D)_{\mathrm{red}},
\]
proving easily
\[
\mathrm{cond}^K_D(P) = \mathrm{cond}^{K_c}_D(P_c).
\]
Applying the above to a finite subset S = {v1 , . . . , vm } of MK and to p−1 (S),
we get a finite subset Sc of MKc formed by the set of prime divisors, defined over
K , containing a point yij . As above, we may assume that
NS,D (P ) = NSc ,D (Pc ), mS,D (P ) = mSc ,D (Pc ).
As we may work on Yreg (the complement has codimension at least 2), Proposi-
tion II.8.20 of [148] implies
\[
\mathrm{div}(K_{Y_c}) \sim \bigl(\mathrm{div}(K_Y) + p^*H_1 + \cdots + p^*H_{\dim(B)-1}\bigr) . Y_c
\]
and hence
\[
d(P_c) = d(P) + (\dim(B) - 1) \deg_c(B)
\]
using (14.15) and
\[
\deg\bigl((p^*H_1 + \cdots + p^*H_{\dim(B)-1}) . Y_c\bigr) = (\dim(B) - 1) \deg_{p^*c}(Y).
\]

Remark 14.5.9. Now we restrict our attention to the case X = C an irreducible


smooth projective curve over K . The same arguments prove that the results
14.4.5–14.4.9 hold also in the context of function fields. In order to prove Propo-
sition 14.4.6, we have to use Proposition B.4.9 replacing Dedekind’s discriminant
theorem. In Proposition 14.4.8, we have to replace the right-hand side by
\[
\sum_{v \in S_0} \log |1/\pi_v|_v = \deg(S_0).
\]

We leave the details to the reader. By the way, we may easily extend these results
to higher-dimensional complete varieties X , but we do not need that in the sequel.
14.5.10. We may pose the conjectures 14.4.10 and 14.4.13 also in the function
field case. However, as no Belyı̆ function exists in this case, we are unable to show
that the strong abc-conjecture implies the other conjectures. As a substitute for
the Fermat coverings in 14.4.15, we will use the following result:
Lemma 14.5.11. Let C be an irreducible smooth projective curve over K and let
$n \in \mathbb{N} \setminus \{0\}$. Suppose that we have disjoint non-zero rationally equivalent reduced
divisors $D_1$, $D_2$ on C. Then there is an irreducible smooth projective curve $C'$
over K and a finite morphism $\pi : C' \to C$ of degree n which has ramification
divisor $R_\pi = (n-1)D$ for $D := (\pi^*(D_1 + D_2))_{\mathrm{red}}$.
Proof: The argument works for any field K of characteristic zero. There is $f \in
K(C) \setminus \{0\}$ with
\[
\mathrm{div}(f) = D_1 - D_2.
\]
Let $C'$ be the irreducible smooth projective curve with function field $K(C)(\sqrt[n]{f})$
and let $\pi : C' \to C$ be the natural finite morphism induced by the extension of
function fields (use Lemma 1.4.10). First note that
\[
\deg(\pi) = [K(C') : K(C)] \le n.
\]
By B.1.9, we verify easily that the discriminant of $x^n - f$ is $\pm n^n f^{n-1}$. By Lemma
B.2.2, we conclude that $K(C')/K(C)$ is unramified over any $v \in M_{K(C)}$ which
is not a component of $D_1$ or $D_2$. For $w \in M_{K(C')}$ with $v := \pi(w)$ equal to a
component of $D_1$, the ramification index satisfies
\[
e_{w/v} = \mathrm{ord}_w(f) = n \cdot \mathrm{ord}_w(\sqrt[n]{f}) \ge n.
\]
Similarly, we obtain $e_{w/v} \ge n$ if v is a component of $D_2$. By Example 1.4.12,
we have
\[
[K(C') : K(C)] = \sum_w e_{w/v} f_{w/v} = \sum_w e_{w/v} [K(w) : K(v)],
\]
where w ranges now over all components of $\pi^{-1}(v)$ for a given component v of
$D_1 + D_2$. We conclude that the fibre consists only of one component w and
\[
\deg(\pi) = [K(C') : K(C)] = e_{w/v} = n, \qquad [K(w) : K(v)] = 1.
\]
By Proposition B.4.9, we get the claim. □
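The discriminant formula for $x^n - f$ used in the proof can be sanity-checked numerically. The following sketch is an illustration only: it replaces the function $f$ by a rational number $a$, computes the resultant of $x^n - a$ and its derivative from the Sylvester matrix, and compares absolute values (so the sign convention plays no role). All function names are hypothetical.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

def resultant(p, q):
    """Resultant of p and q (coefficient lists, highest degree first) via the Sylvester matrix."""
    deg_p, deg_q = len(p) - 1, len(q) - 1
    size = deg_p + deg_q
    rows = []
    for shift in range(deg_q):
        rows.append([0] * shift + list(p) + [0] * (size - deg_p - 1 - shift))
    for shift in range(deg_p):
        rows.append([0] * shift + list(q) + [0] * (size - deg_q - 1 - shift))
    return determinant(rows)

def abs_discriminant_of_x_n_minus_a(n, a):
    """|disc(x^n - a)| = |Res(x^n - a, n x^(n-1))| because the polynomial is monic."""
    p = [1] + [0] * (n - 1) + [-a]
    dp = [n] + [0] * (n - 1)
    return abs(resultant(p, dp))

for n in range(1, 6):
    for a in (2, -3, 5):
        assert abs_discriminant_of_x_n_minus_a(n, a) == n**n * abs(a) ** (n - 1)
```

In particular the discriminant vanishes only where $f$ does, which is why ramification can occur only over the components of $D_1$ and $D_2$.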

Theorem 14.5.12. The following conjectures are equivalent for the function field
K:
(a) Conjecture 14.4.10 for C = P1K ;
(b) Conjecture 14.4.10 for all curves C over K ;
(c) Vojta’s conjecture in 14.4.13 for all curves C over K .
Proof: (a) ⇒ (b): We have still a finite covering f : C → P1K using any non-
constant rational function. We choose a reduced effective divisor D0 on P1K such
that supp(D) ⊂ f −1 (D0 ) and such that f is unramified outside of f −1 (D0 ).
Then (b) follows from (a) along the same lines as in Theorem 14.4.16.
(b) ⇒ (c): This is analogous to the proof of Theorem 14.4.16.
(c) ⇒ (a): The goal is to prove Conjecture 14.4.10 for an effective reduced
divisor D on $\mathbb{P}^1_K$. By the analogues of Propositions 14.4.5 and 14.4.9, we may
replace D by a larger effective reduced divisor. We choose any effective reduced
divisor $D_1$ disjoint from D with $\deg(D_1) = \deg(D)$. Note that $D \sim D_1$. We
may replace D by $D + D_1$. For $n \in \mathbb{N} \setminus \{0\}$, Lemma 14.5.11 gives a finite
morphism $\pi : C' \to \mathbb{P}^1_K$ of degree n with ramification divisor
\[
R_\pi = (n-1)\,(\pi^*(D))_{\mathrm{red}}.
\]
Using this covering instead of Example 14.4.15, the proof of (a) is completely
similar to that of the implication (d) ⇒ (a) in the proof of Theorem 14.4.16. □
If everything is defined over the constant field k , then we have seen in Example
14.5.7 that we have canonical local heights induced by geometry. In this special
case, we may prove the abc-conjecture, even without the ε -term. The correspond-
ing result is the following theorem of Stothers and Mason.
Theorem 14.5.13. Let C be an irreducible smooth projective curve over k and
let D be an effective reduced divisor on C defined over the constant field k . If we
use the canonical local heights from Example 14.5.7 relative to D and KC , then
\[
h_{D+K_C}(P) \le \mathrm{cond}^K_D(P) + d(P) + (\dim(B) - 1) \deg_c(B)
\]
for all $P \in C \setminus \bigl(\mathrm{supp}(D) \cup C(k)\bigr)$.
Proof: The basic idea is to use Bertini’s theorem to reduce the problem to the case
of a function field of a curve and then apply Hurwitz’s theorem.
By 14.5.8, we may assume that B is a curve. Let F := K(P ) with a model
p : Y → B as in 14.5.2. Taking into account the equivalence
\[
\mathrm{div}(f_P^*K_C) \sim \mathrm{div}(K_Y) - R_{f_P},
\]
from Theorem B.4.5, we conclude
\[
h_{K_C}(P) = \frac{1}{[F:K]} \bigl(\deg_{p^*c}(K_Y) - \deg_{p^*c}(R_{f_P})\bigr).
\]
From Proposition B.4.7, we get
\[
R_{f_P} \ge f_P^*D - (f_P^*D)_{\mathrm{red}}.
\]
This proves
\[
h_{D+K_C} \le \frac{1}{[F:K]} \bigl(\deg_{p^*c}(f_P^*D)_{\mathrm{red}} + \deg_{p^*c}(K_Y)\bigr) = \mathrm{cond}^K_D(P) + d(P)
\]
proving the claim. □
Example 14.5.14. In particular, if B is a geometrically irreducible smooth projective
curve over k and $P \in C(k(B)) \setminus C(k)$, then
\[
h_{D+K_C}(P) \le \mathrm{cond}^{k(B)}_D(P) + 2g(B) - 2. \tag{14.16}
\]
We consider the special case $C = \mathbb{P}^1_k$ and $D = [0] + [1] + [\infty]$. There are
$a(x), c(x) \in k(B)$ with $P = (c(x) : a(x))$. We define $b(x)$ by $a(x) + b(x) =
c(x)$. Since $K_{\mathbb{P}^1} \cong O_{\mathbb{P}^1}(-2)$, the left-hand side of (14.16) is the standard height
\[
h_{D+K_C}(P) = h(P) = h((a : b : c)).
\]
We set $S := f^{-1}(D)$, hence $|S| = \mathrm{cond}^{k(B)}_D(P)$. Dividing a, b, c by c, we may
assume that a, b, c are S-units and (14.16) yields
\[
h((a : b : c)) \le |S| + 2g(B) - 2,
\]
which was already mentioned in Remark 12.4.3 and which was proved by Mason
[192]. Assuming also $B = \mathbb{P}^1_k$, we may assume that a, b, c are coprime polynomials
in $k[x]$. Let $n_0$ be the number of different zeros of $a(x)b(x)c(x)$. We
get
\[
|S| = \mathrm{cond}^{k(B)}_D(P) =
\begin{cases}
n_0 & \text{if } \deg(a) = \deg(b) = \deg(c), \\
n_0 + 1 & \text{otherwise.}
\end{cases}
\]
Hence we recover Theorem 12.4.1, which was proved by Stothers [293], from
(14.16).
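The polynomial case can be tested on a concrete triple. The sketch below assumes the classical formulation of the Stothers–Mason bound, $\max\deg \le n_0 - 1$ for coprime polynomials with $a + b = c$, and checks it for the illustrative choice $a = (t-1)^2$, $b = 4t$, $c = (t+1)^2$ (not taken from the text). Polynomials are coefficient lists with the constant term first, and the helper names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def degree(p):
    return len(trim(p)) - 1  # convention: the zero polynomial has degree -1

def add(p, q):
    size = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(size)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def remainder(p, q):
    """Remainder of p modulo the non-zero polynomial q, over the rationals."""
    p, q = [Fraction(c) for c in trim(p)], trim(q)
    while len(p) >= len(q):
        factor = p[-1] / q[-1]
        shift = len(p) - len(q)
        for i, c in enumerate(q):
            p[i + shift] -= factor * c
        p = trim(p)  # the leading term cancels exactly
    return p

def gcd(p, q):
    p, q = trim(p), trim(q)
    while q:
        p, q = q, remainder(p, q)
    return p

def radical_degree(p):
    """Number of distinct roots over an algebraic closure (characteristic 0)."""
    return degree(p) - degree(gcd(p, derivative(p)))

a = [1, -2, 1]   # (t - 1)^2
b = [0, 4]       # 4t
c = [1, 2, 1]    # (t + 1)^2
assert add(a, b) == c
n0 = radical_degree(mul(mul(a, b), c))
assert max(degree(a), degree(b), degree(c)) <= n0 - 1
```

Here $n_0 = 3$ (the zeros $0$, $1$, $-1$) and the maximal degree is 2, so this triple attains equality.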

The following simple lemma is the substitute of the lemma on the logarithmic
derivative in the function field case.
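Lemma 14.5.15. Let Y be an irreducible smooth curve over k, let $f \in k(Y)$, let
$v \in M_{k(Y)}$ and let $j \in \mathbb{N}$. Then
\[
\mathrm{ord}_v\Bigl(f^{-1} \Bigl(\frac{d}{d\pi_v}\Bigr)^{j} f\Bigr) \ge -j
\]
for any local parameter $\pi_v$ in $O_{Y,v}$. Moreover, if f is a unit in $O_{Y,v}$ it holds
\[
\mathrm{ord}_v\Bigl(f^{-1} \Bigl(\frac{d}{d\pi_v}\Bigr)^{j} f\Bigr) \ge 0.
\]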
Proof: By the definition of $d/d\pi_v$ in A.7.25 and noting that $d\pi_v$ is a basis of $\Omega^1_{C,v}$
(see proof of Proposition B.4.7), we conclude that $d/d\pi_v$ is a derivative with
\[
\frac{d}{d\pi_v}\, O_{Y,v} \subset O_{Y,v}.
\]
Then the claim follows easily from
\[
f = u\, \pi_v^{\mathrm{ord}_v(f)}
\]
for a unit u in $O_{Y,v}$ and from Leibniz's rule. □
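For $j = 1$ the estimate can be made explicit. The worked case below is only an illustration of how the factorization and Leibniz's rule combine; the abbreviations $m := \mathrm{ord}_v(f)$, $\pi := \pi_v$ and the prime for $d/d\pi_v$ are introduced here for convenience.

```latex
\[
f = u\,\pi^{m}
\;\Longrightarrow\;
f' = u'\,\pi^{m} + m\,u\,\pi^{m-1}
\;\Longrightarrow\;
f^{-1} f' = \frac{u'}{u} + \frac{m}{\pi}.
\]
```

Since u is a unit and $u'$ lies in the local ring, the first summand has order at least 0, while the second has order $-1$ if $m \ne 0$ and vanishes if $m = 0$. This gives order at least $-1$ in general and at least 0 when f is a unit.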
14.5.16. Our next goal is an analogue of Cartan's second main theorem for
function fields. As we will see, the arguments are very similar to those for Theorem
13.4.16.
The basic assumptions are the following: Let S be a finite subset of $M_K$, let
$H_1, \dots, H_q$ be hyperplanes of $\mathbb{P}^n_K$ in general position, let $D := \sum_{j=1}^{q} H_j$ and
let P be a point of $\mathbb{P}^n_K$ not lying in any hyperplane defined over k. As in 14.5.2,
we may choose a model Y of $F := K(P)$.
14.5.17. We first assume that K is the function field of a curve. This case
resembles most the field of meromorphic functions on $\mathbb{C}$ considered in Nevanlinna
theory. It enables us to define the ramification divisor of P in the following way:
The point P is given by a rational map $f_P : Y \dashrightarrow \mathbb{P}^n_k$ defined over k. For
$v \in M_F$, we may choose relatively prime elements $f_{0v}, \dots, f_{nv}$ of the discrete
valuation ring $O_{Y,v}$ with $f_P = (f_{0v} : \cdots : f_{nv})$. By assumption, $f_{0v}, \dots, f_{nv}$
are linearly independent over k and so we may define $\mathrm{ord}_v(R_P)$ as the order of
the Wronskian of $f_{0v}, \dots, f_{nv}$ in v, where the derivatives are taken with respect
to a local parameter $\pi_v$ in v. By Leibniz's rule and the multilinearity of the
determinant, this is independent of the choices of $f_{0v}, \dots, f_{nv}$ and $\pi_v$. Then we
define the ramification divisor to be
\[
R_P := \sum_{v \in M_F} \mathrm{ord}_v(R_P)\, v,
\]
with counting function
\[
N_{S,\mathrm{ram}}(P) := \sum_{v \notin p^{-1}S} \mathrm{ord}_v(R_P) \log |1/\pi_v|_v.
\]

Theorem 14.5.18. Let K be the function field of a curve in characteristic 0 and
let $\varepsilon > 0$. Under the assumptions in 14.5.16, it holds
\[
m_{S,D}(P) + N_{S,\mathrm{ram}}(P) \le (n+1)\, h(P) + \frac{n(n+1)}{2} \Bigl( d(P) + \sum_{v \in p^{-1}S} \log |1/\pi_v|_v \Bigr),
\]
where we use the canonical proximity function from Example 14.5.7.


Proof: We proceed as in the proof of Theorem 13.4.16. We choose fixed $f_0, \dots, f_n
\in F$ with $f_P = (f_0 : \cdots : f_n)$. Of course, we cannot assume that they are regular
and relatively prime. Another difficulty is that no global coordinate z is available
on Y. In order to overcome this problem, we choose $z \in F \setminus k$ as a reference for
differentiation. Then we use the Wronskian
\[
W(f_0, \dots, f_n) := \det \Bigl( \Bigl(\frac{d}{dz}\Bigr)^{j} f_i \Bigr)_{i,j=0,\dots,n}.
\]
Again, we may assume $q \ge n + 2$ and we set $p' := q - n - 1$. Let $H_j :=
\mathrm{div}(\ell_j(x))$ for linear forms $\ell_j(x)$ and let $g_j := \ell_j(f_0, \dots, f_n)$. First, it is clear
that
\[
h\bigl( (g_{k_1} \cdots g_{k_{p'}})_{k_1 < \cdots < k_{p'}} \bigr) = p' \cdot h(P). \tag{14.17}
\]
No O(1)-term is necessary because the heights are canonical. Let $I = \{i_1 <
\cdots < i_{n+1}\}$ be the complement of $\{k_1 < \cdots < k_{p'}\}$ in $\{1, \dots, q\}$. For the
logarithmic Wronskian
\[
\lambda(g_{i_1}, \dots, g_{i_{n+1}}) := \frac{W(g_{i_1}, \dots, g_{i_{n+1}})}{g_{i_1} \cdots g_{i_{n+1}}},
\]
we get again the fundamental identity
\[
g_{k_1} \cdots g_{k_{p'}} = \frac{g_1 \cdots g_q}{W(f_0, \dots, f_n)} \cdot d_I^{-1} \cdot \lambda(g_{i_1}, \dots, g_{i_{n+1}}), \tag{14.18}
\]
where $d_I$ is the determinant formed with the coefficients of $(\ell_i)_{i \in I}$.
Let $v \in p^{-1}S$. We choose a local parameter $\pi_v$ for v. Leibniz's rule and the
multilinearity of the determinant give
\[
\lambda(g_{i_1}, \dots, g_{i_{n+1}}) = (d\pi_v/dz)^{\frac{n(n+1)}{2}}\, \lambda_v(g_{i_1}, \dots, g_{i_{n+1}}), \tag{14.19}
\]
where $\lambda_v$ denotes the logarithmic Wronskian with respect to the differential
operator $d/d\pi_v$ instead of $d/dz$. By Lemma 14.5.15, we get
\[
|\lambda(g_{i_1}, \dots, g_{i_{n+1}})|_v \le |d\pi_v/dz|_v^{\frac{n(n+1)}{2}} \cdot |1/\pi_v|_v^{\frac{n(n+1)}{2}}.
\]
Together with the product formula, (14.18) implies
\[
\sum_{v \in p^{-1}S} \max_{k_1 < \cdots < k_{p'}} \log |g_{k_1} \cdots g_{k_{p'}}|_v
\le \sum_{v \notin p^{-1}S} \log \Bigl| \frac{W(f_0, \dots, f_n)}{g_1 \cdots g_q} \Bigr|_v
+ \frac{n(n+1)}{2} \sum_{v \in p^{-1}S} \bigl( \log |d\pi_v/dz|_v + \log |1/\pi_v|_v \bigr).
\]
For $i = 0, \dots, n$ the functions
\[
f_{iv} := \pi_v^{-\min_{j=0,\dots,n} \mathrm{ord}_v(f_j)}\, f_i
\]
are regular and relatively prime in v. As in (14.19), we get
\[
W(f_0, \dots, f_n) = (d\pi_v/dz)^{\frac{n(n+1)}{2}}\, \pi_v^{(n+1) \min_{i=0,\dots,n} \mathrm{ord}_v(f_i)}\, W_v(f_{0v}, \dots, f_{nv}),
\]
where $W_v$ denotes the Wronskian with respect to the differentiation $d/d\pi_v$. We
conclude
\[
\sum_{v \notin p^{-1}S} \log |W(f_0, \dots, f_n)|_v = -N_{S,\mathrm{ram}}(P)
+ \sum_{v \notin p^{-1}S} \Bigl( \frac{n(n+1)}{2} \log \Bigl| \frac{d\pi_v}{dz} \Bigr|_v + (n+1) \log \max_i |f_i|_v \Bigr).
\]
Moreover, we have
\[
\lambda_{H_j}(P, v) = -\log |\ell_j(f_{0v}, \dots, f_{nv})|_v = -\log |g_j|_v + \log \max_i |f_i|_v.
\]

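We conclude that
\[
\sum_{v \in p^{-1}S} \max_{k_1 < \cdots < k_{p'}} \log |g_{k_1} \cdots g_{k_{p'}}|_v
\le \sum_{j=1}^{q} N_{S,H_j}(P) - N_{S,\mathrm{ram}}(P) - p' \sum_{v \notin p^{-1}S} \log \max_i |f_i|_v
+ \frac{n(n+1)}{2} \Bigl( \sum_{v \in M_F} \log \Bigl| \frac{d\pi_v}{dz} \Bigr|_v + \sum_{v \in p^{-1}S} \log |1/\pi_v|_v \Bigr).
\]
Note that dz may be viewed as a meromorphic section of $K_Y$ and hence
\[
\sum_{v \in M_F} \log \Bigl| \frac{d\pi_v}{dz} \Bigr|_v = d(P).
\]
From (14.17) on page 506, we have
\[
p'\, h(P) = \sum_{v \in M_F} \log \max_{k_1 < \cdots < k_{p'}} |g_{k_1} \cdots g_{k_{p'}}|_v.
\]
For $v \notin p^{-1}S$, we use
\[
\log |g_{k_1} \cdots g_{k_{p'}}|_v \le p' \cdot \log \max_i |f_i|_v.
\]
This leads to
\[
p' \cdot h(P) \le \sum_{j=1}^{q} N_{S,H_j}(P) - N_{S,\mathrm{ram}}(P)
+ \frac{n(n+1)}{2} \Bigl( d(P) + \sum_{v \in p^{-1}S} \log |1/\pi_v|_v \Bigr)
\]
proving immediately the claim. □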

Remark 14.5.19. To generalize Cartan's second main theorem to a higher-dimensional
function field $K = k(B)$, we introduce the n-th truncated counting function
\[
N^{(n)}_D(P) := \frac{1}{[F:K]} \sum_{v \in M_F} \min\bigl(\mathrm{ord}_v(f_P^*D), n\bigr) \deg_{p^*c}(v)
\]
for $n \in \mathbb{N}$ and where $F := K(P)$ is as in 14.5.2. By Example 14.5.7, we have
$N^{(1)}_D(P) = \mathrm{cond}^K_D(P)$. Then the desired generalization is
\[
(q - n - 1)\, h(P) \le \sum_{j=1}^{q} N^{(n)}_{H_j}(P)
+ \frac{n(n+1)}{2} \Bigl( d(P) + (\dim(B) - 1) \deg_c(B) + \frac{\deg_{p^*c}(p^{-1}S)}{[F:K]} \Bigr). \tag{14.20}
\]
To give a sketch of proof, we first note that we may assume B to be a curve by the
techniques of 14.5.8. Then the claim follows from Theorem 14.5.18 and from the
identity
\[
q\, h(P) - N_{\emptyset,\mathrm{ram}}(P) \le \sum_{j=1}^{q} N^{(n)}_{H_j}(P),
\]
which is easily deduced from the properties of the Wronskian (similarly as in
[249], Lemma A.3.2.1).
In the following application to the generalized unit equation, we will not use
(14.20), but we will directly reduce it to Theorem 14.5.18, which is sharper.

Let S be a finite subset of MK . Recall that u ∈ K is called an S -unit if


ordv (u) = 0 for all v ∈ MK \ S . As a consequence of the above theorem,
we obtain:

Corollary 14.5.20. Suppose that $u_0, \dots, u_n$ are S-units in K satisfying
\[
u_0 + \cdots + u_n = 0 \tag{14.21}
\]
and assume that the elements $(u_i)_{i \in I}$ are linearly independent over k for every
proper subset I of $\{0, \dots, n\}$. Then
\[
h((u_0 : \cdots : u_n)) \le \frac{n(n-1)}{2} \bigl( d_K + \deg_c(S) + (\dim(B) - 1) \deg_c(B) \bigr).
\]
Proof: By the techniques introduced in 14.5.8, we may assume that B is a curve.
We note also that
\[
\deg_c(S) = \sum_{v \in S} \log |1/\pi_v|_v.
\]
We apply Theorem 14.5.18 to $P := (u_1 : \cdots : u_n) \in \mathbb{P}^{n-1}(K)$ and to the
hyperplanes $H_j := \{x_j = 0\}$ for $j = 0, \dots, n-1$ and $H_n := \{x_0 + \cdots + x_{n-1} = 0\}$.
Because the $u_i$ are S-units, we have $N_{S,H_j}(P) = 0$ and hence $m_{S,H_j}(P) =
h(P)$. Omitting the term $N_{S,\mathrm{ram}}(P) \ge 0$ in Theorem 14.5.18 and using
\[
h(P) = h((u_0 : \cdots : u_n))
\]
as an easy consequence of (14.21), we get the claim. □
The special case in which K is the function field of a curve B of genus g is the
theorem of Voloch and Brownawell-Masser ([319], [52] Th.A).
Theorem 14.5.21. Let K be the function field of a geometrically irreducible
smooth projective curve B of genus g over the field of constants k of characteristic
0. For $u := (u_0, \dots, u_n) \in K^{n+1} \setminus \{0\}$, define
\[
h(u) := -\sum_{v \in M_K} \deg(v) \min\bigl(\mathrm{ord}_v(u_0), \dots, \mathrm{ord}_v(u_n)\bigr).
\]
Suppose that $u_0 + \cdots + u_n = 0$, that for any proper subset $I \subset \{0, \dots, n\}$ the
elements $u_i$, $i \in I$, are linearly independent over k, and that $u_0, \dots, u_n$ are
S-units for some finite set $S \subset M_K$. Then
\[
h(u) \le \frac{1}{2}\, n(n-1) \{\deg(S) + 2g - 2\}.
\]
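A minimal sanity check of this bound, under assumptions chosen here for illustration: $K = k(t)$ (so $g = 0$), $n = 2$, and the S-units $u_0 = t$, $u_1 = 1 - t$, $u_2 = -1$ with $S = \{0, 1, \infty\}$. The orders of vanishing are entered by hand and the function names are hypothetical.

```python
from fractions import Fraction

# ord_v(u_i) at the three places of k(t) where some u_i has a zero or a pole.
# u0 = t, u1 = 1 - t, u2 = -1, so u0 + u1 + u2 = 0.
orders = {
    "t=0":   {"u0": 1,  "u1": 0,  "u2": 0},
    "t=1":   {"u0": 0,  "u1": 1,  "u2": 0},
    "t=inf": {"u0": -1, "u1": -1, "u2": 0},
}
# All three places are k-rational points of the projective line, so deg(v) = 1.
degrees = {"t=0": 1, "t=1": 1, "t=inf": 1}

def height(orders, degrees):
    """h(u) = - sum_v deg(v) * min_i ord_v(u_i); places outside S contribute 0."""
    return -sum(degrees[v] * min(orders[v].values()) for v in orders)

def brownawell_masser_bound(n, deg_S, genus):
    """Right-hand side (1/2) n (n-1) (deg(S) + 2g - 2) of the theorem."""
    return Fraction(n * (n - 1), 2) * (deg_S + 2 * genus - 2)

h_u = height(orders, degrees)
bound = brownawell_masser_bound(n=2, deg_S=sum(degrees.values()), genus=0)
assert h_u <= bound
```

In this example both sides equal 1, so the bound cannot be improved for $n = 2$ and $g = 0$.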

In fact, Brownawell and Masser ([52], Th.B) proved a bit more, substantially re-
laxing the linear independence condition of subsets of u0 , . . . , un . This is useful
for specific applications.
Theorem 14.5.22. Let K = k(B) be a function field in characteristic 0. Suppose
that u0 + . . . + un = 0, that no non-empty proper subsum vanishes, and that
$u_0, \dots, u_n$ are S-units for some finite set $S \subset M_K$. Then
\[
h(u) \le \frac{(n-1)n}{2} \max\{ d_K + \deg_c(S) + (\dim(B) - 1) \deg_c(B),\ 0 \}.
\]

For the proof, we may not use Cartan’s second theorem for function fields because
the functions u1 , . . . , un need not be linearly independent over k . The following

lemma enables us to define an analogue of the Wronskians such that we can trans-
fer the steps from the proof of Cartan’s second main theorem. The proof does not
use the linearly independent case and hence reproves Corollary 14.5.20.
A subset I of {0, . . . , n} is called minimal if the set uI := {ui | i ∈ I} is linearly
dependent over k but every proper subset of uI is linearly independent over k .
Lemma 14.5.23. There are disjoint non-empty subsets I1 , . . . , Il of {0, . . . , n}
and non-empty subsets J1 , . . . , Jl−1 with
{0, . . . , n} = I1 ∪ · · · ∪ Il , Jν ⊂ I1 ∪ · · · ∪ Iν (ν = 1, . . . , l − 1)
such that I1 , J1 ∪ I2 , J2 ∪ I3 , . . . , Jl−1 ∪ Il are minimal.

For the elementary proof using simple linear algebra, we refer to [52], Lemma 6,
or [249], Lemma A.3.2.7.
Proof of Theorem 14.5.22: By the usual arguments from 14.5.8, we may assume
that B is a curve. By renumbering, we may assume that $I_j = \{N_{j-1}, \dots, N_j - 1\}$
for a sequence $N_0 = 0 < N_1 < \cdots < N_l = n + 1$. For convenience, we set
$J_0 = \emptyset$. For every $\nu \in \{1, \dots, l\}$, we have a linear relation
\[
c_{\nu,0} u_0 + \cdots + c_{\nu,n} u_n = 0
\]
with $c_{\nu,i} \in k^{\times}$ for $i \in J_{\nu-1} \cup I_\nu$ and $c_{\nu,i} = 0$ else. We set $n_\nu := |I_\nu|$ and let
$z \in k(B) \setminus k$. We consider the $(n_1 - 1) \times (n + 1)$ matrix
\[
A_1 := \Bigl( c_{1,j} \Bigl(\frac{d}{dz}\Bigr)^{i} u_j \Bigr)_{i=0,\dots,n_1-2;\ j=0,\dots,n}
\]
and the $n_\nu \times (n + 1)$ matrices
\[
A_\nu := \Bigl( c_{\nu,j} \Bigl(\frac{d}{dz}\Bigr)^{i} u_j \Bigr)_{i=0,\dots,n_\nu-1;\ j=0,\dots,n}
\]
for $\nu = 2, \dots, l$. Finally, let us consider the $n \times (n + 1)$ matrix
\[
A := \begin{pmatrix} A_1 \\ \vdots \\ A_l \end{pmatrix}.
\]
For $j = 0, \dots, n$, let $W_j(u_0, \dots, u_n)$ be the determinant of the matrix obtained
from A by deleting its j-th column. They are analogues of the Wronskians in the
proof of Theorem 14.5.18. Since the sum of columns of A is 0, we get
\[
W_j(u_0, \dots, u_n) = (-1)^j W_0(u_0, \dots, u_n). \tag{14.22}
\]
The block nature of the matrix A shows that $W_0(u_0, \dots, u_n)$ is $k^{\times}$-proportional
to the product
\[
W(u_1, \dots, u_{N_1 - 1})\, W(u_{N_1}, \dots, u_{N_2 - 1}) \cdots W(u_{N_{l-1}}, \dots, u_n)
\]
of usual Wronskians. By minimality of the $I_\nu$ in the decomposition of Lemma
14.5.23, these Wronskians and hence all $W_j(u_0, \dots, u_n)$ are non-zero. For
\[
\lambda_j(u_0, \dots, u_n) := \frac{W_j(u_0, \dots, u_n)}{\prod_{i \ne j} u_i},
\]
the analogue of the fundamental identity (14.18) on page 506 is
\[
u_j = (-1)^j\, \frac{u_0 \cdots u_n}{W_0(u_0, \dots, u_n)}\, \lambda_j(u_0, \dots, u_n) \tag{14.23}
\]
obtained immediately from (14.22). For
\[
a := \sum_{i=0}^{n_1 - 2} i + \sum_{\nu=2}^{l} \sum_{i=0}^{n_\nu - 1} i,
\]
Lemma 14.5.15 yields as in the proof of Theorem 14.5.18 the bound
\[
|\lambda_j(u_0, \dots, u_n)|_v \le |d\pi_v/dz|_v^{a} \cdot |1/\pi_v|_v^{a}.
\]
If we combine this inequality with (14.23) and sum over $v \in S$, the product
formula yields
\[
\sum_{v \in S} \log \max_{j=0,\dots,n} |u_j|_v \le \sum_{v \notin S} \log \Bigl| \frac{W_0(u_0, \dots, u_n)}{u_0 \cdots u_n} \Bigr|_v
+ a \sum_{v \in S} \bigl( \log |d\pi_v/dz|_v + \log |1/\pi_v|_v \bigr). \tag{14.24}
\]
Replacing differentiation $d/dz$ by $d/d\pi_v$, we get
\[
\log |W_0(u_0, \dots, u_n)|_v \le a \log |d\pi_v/dz|_v.
\]
Since $u_0, \dots, u_n$ are S-units, we conclude from (14.24) that
\[
h(u) \le a \Bigl( d_K + \sum_{v \in S} \log |1/\pi_v|_v \Bigr).
\]
Obviously, we have $a \le (n-1)n/2$ proving the claim. □
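The last inequality is a small combinatorial fact about the block sizes $n_1, \dots, n_l$ with $n_1 + \cdots + n_l = n + 1$. The following sketch, with hypothetical function names, enumerates all ordered block sizes for small n and confirms $a \le n(n-1)/2$. It deliberately ignores the extra constraints coming from minimality, so it checks slightly more than the proof needs.

```python
def wronskian_exponent(block_sizes):
    """a = sum_{i=0}^{n_1-2} i + sum_{nu>=2} sum_{i=0}^{n_nu-1} i for sizes (n_1, ..., n_l)."""
    first, rest = block_sizes[0], block_sizes[1:]
    return sum(range(first - 1)) + sum(sum(range(size)) for size in rest)

def compositions(total):
    """All ordered tuples of positive integers summing to total."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for tail in compositions(total - first):
            yield (first,) + tail

for n in range(1, 9):
    for sizes in compositions(n + 1):
        assert 2 * wronskian_exponent(sizes) <= n * (n - 1)
```

The single block $(n + 1)$, which is the linearly independent situation of Corollary 14.5.20, attains the maximum $n(n-1)/2$.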


Remark 14.5.24. Note that Theorem 14.5.22 includes the general abc-theorem
for polynomials mentioned in Theorem 12.4.4. Indeed, we just have to choose S
as the union of ∞ and the zeros of the polynomials. Similarly, we get the general
abc-theorem for function fields in characteristic zero:
Theorem 14.5.25. Let $f_0, \dots, f_n \in K = k(B)$ with $f_0 + \cdots + f_n = 0$ such that
no proper subsum vanishes identically. For $f := (f_0 : \cdots : f_n) \in \mathbb{P}^n(K)$ and
$D := \sum_{j=0}^{n} \{x_j = 0\}$, we get
\[
h(f) \le \frac{(n-1)n}{2} \max\bigl\{ \mathrm{cond}^K_D(f) + d_K + (\dim(B) - 1) \deg_c(B),\ 0 \bigr\}.
\]
2
Proof: We know that f induces a rational map $f : B \dashrightarrow \mathbb{P}^n_k$. Let S be the
set of prime divisors of B contained in the closure of $f^{-1}(D)$. Dividing by $f_0$,
we may assume that $f_0, \dots, f_n$ are S-units. We claim that $f^*(D)_{\mathrm{red}} = S$. For
$v \in S$, the local equation of $f^*(\{x_j = 0\})$ is $f_{jv} := f_j\, \pi_v^{-\min_j \mathrm{ord}_v(f_j)}$. Hence
$v \in f^*(D)_{\mathrm{red}}$ if and only if at least one $f_{jv}$ is in the maximal ideal of $O_{B,v}$
proving immediately the claim. Hence we have
\[
\mathrm{cond}^K_D(f) = \deg_c(f^*(D)_{\mathrm{red}}) = \deg_c(S)
\]
and Theorem 14.5.25 follows from Theorem 14.5.22. □

14.5.26. There are interesting unresolved questions connected with Theorem
14.5.25, best illustrated in the simplest case in which $K = \mathbb{C}(t)$ is the field of
rational functions in one variable and $f_i \in \mathbb{C}[t]$ are polynomials, not all constant
and without a common factor. As shown in 12.4.5, the coefficient $n(n-1)/2$ is
sharp if $n = 2$ or $3$, and is at least $2n - 3$ for $n \ge 4$. Browkin and Brzeziński
[51] conjecture that $2n - 3$ is the correct constant in the theorem, in the slightly
stronger form
$$\max \deg(f_i) \le (2n-3) \max\bigl\{\deg(\operatorname{rad}(f_0 \cdots f_n)) - 1,\ 0\bigr\}. \tag{14.25}$$

Another intriguing question has been posed by Vojta, namely whether for any fixed
$\varepsilon > 0$ and any $n \ge 2$ there is a closed subvariety $V \subsetneq \{x_0 + \cdots + x_n = 0\}$
in projective space $\mathbb{P}^n_{\mathbb{C}}$, depending only on $n$ and $\varepsilon$, with the following
property: If $f_0(t),\dots,f_n(t)$ are polynomials in $\mathbb{C}[t]$ without common zeros and
$f_0 + \cdots + f_n = 0$, then
$$\max \deg(f_i) \le (1+\varepsilon) \max\{\deg(\operatorname{rad}(f_0 \cdots f_n)) - 1,\ 0\}$$
unless the holomorphic curve $\{(f_0(t) : \cdots : f_n(t)) \mid t \in \mathbb{C}\}$ is contained in $V$.
We illustrate Vojta’s idea by showing how to exclude the example by Browkin and
Brzeziński (see 12.4.5) for the case $n = 3$. Consider the identity
$$(a+b+c)^3 = a^3 + b^3 + c^3 - 3abc + 3(ab+ac+bc)(a+b+c),$$
which reduces to the four term identity
$$a^3 + b^3 + c^3 - 3abc = 0$$
if $a + b + c = 0$. The example of Browkin and Brzeziński is obtained by taking
$a = 1$, $b = -t$, $c = t - 1$ and $f_0 = a^3$, $f_1 = b^3$, $f_2 = c^3$, $f_3 = -3abc$, so that
equality holds in (14.25). Clearly, no subsum vanishes. The relations to avoid are
simply
$$(f_i + f_j + f_k)^3 - 27 f_i f_j f_k = 0$$
for any choice of distinct indices i, j, k .
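
The example of Browkin and Brzeziński can be checked by direct computation. The following is only a sketch in Python using the standard library; the helper names (trim, add, mul, rad_degree, and so on) are our own and do not come from the text. Polynomials in t are stored as coefficient lists, lowest degree first, and the degree of the radical is computed as deg(p) − deg(gcd(p, p′)), which is valid in characteristic zero.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([x + y for x, y in zip(p, q)])

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def power(p, k):
    out = [1]
    for _ in range(k):
        out = mul(out, p)
    return out

def deg(p):
    return len(trim(p)) - 1  # the zero polynomial gets degree -1 here

def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def remainder(p, q):
    """Remainder of p modulo a nonzero q, computed exactly over Q."""
    p = [Fraction(c) for c in trim(p)]
    q = [Fraction(c) for c in trim(q)]
    while len(p) >= len(q):
        k = len(p) - len(q)
        c = p[-1] / q[-1]
        for i, y in enumerate(q):
            p[i + k] -= c * y
        p = trim(p)
    return p

def gcd_poly(p, q):
    p, q = trim(p), trim(q)
    while q:
        p, q = q, remainder(p, q)
    return p

def rad_degree(p):
    """Degree of rad(p), i.e. the number of distinct complex roots of p."""
    return deg(p) - deg(gcd_poly(p, derivative(p)))

# a = 1, b = -t, c = t - 1
a, b, c = [1], [0, -1], [-1, 1]
f0, f1, f2 = power(a, 3), power(b, 3), power(c, 3)
f3 = mul([-3], mul(mul(a, b), c))
fs = [f0, f1, f2, f3]

total = add(add(f0, f1), add(f2, f3))            # f0 + f1 + f2 + f3
product = mul(mul(f0, f1), mul(f2, f3))          # f0 f1 f2 f3
max_degree = max(deg(f) for f in fs)
n = 3
bound = (2 * n - 3) * max(rad_degree(product) - 1, 0)

# the relation (f0 + f1 + f2)^3 - 27 f0 f1 f2
relation = add(power(add(add(f0, f1), f2), 3),
               mul([-27], mul(mul(f0, f1), f2)))
```

Here total and relation are the zero polynomial (the empty list), while max_degree and bound are both equal to 3, so that equality holds in (14.25) for this example.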

14.6. Bibliographical notes

Sections 14.2 and 14.3 are taken from [307] and [318], where the reader will find
further information. The degeneracy of K-rational points in a variety of general
type was posed as an open problem in Bombieri’s lecture at the University
of Chicago in 1980. For function fields in characteristic 0, this was solved by J.
Noguchi [225] under the stronger assumption that the cotangent bundle is ample.
S. Lang gave more general conjectures relating the structure of K -rational points
also to hyperbolicity and connecting the special sets from Nevanlinna theory and
diophantine approximation (see [172]).
The zero-dimensional part of the exceptional set Z in the K -rational Vojta con-
jecture must depend on ε , but it may be that the higher-dimensional part is inde-
pendent. At least, this holds in Schmidt’s subspace theorem and in Vojta’s version
of Cartan’s second main theorem, both proved by P. Vojta in [309], [316].
Vojta’s conjecture with ramification does imply Conjecture 14.4.10 also in higher
dimension. This is due to P. Vojta [317] and holds over number fields and over
function fields of characteristic zero.
Cartan’s second main theorem holds also in the linearly degenerate case, with the
factor n + 1 replaced by n + t + 1, where t is the codimension of the linear span
of the image of f . This was done by E.I. Nochka (see [223], [224]). Ru and Wong
have worked out the number theoretic analogue, see also P. Vojta [316] for an
alternative proof. For function fields in characteristic 0 (with hyperplanes defined
over the constant field), this is due independently to J. Noguchi [226] and J.T.-Y.
Wang [320]. In [320], there is also a generalization of the linearly non-degenerate
case to characteristic p .
In the non-split case (when varieties and divisors are not defined over the constant
field), very few results are known. For a function field K of characteristic 0 and
a curve C of genus g ≥ 2 over K , P. Vojta [311] proved
$$h_{K_C}(P) \le (2+\varepsilon)\, d(P) + O(1)$$
for all $P \in C(\overline{K})$ using the methods of Grauert [129] in the proof of the Mordell
conjecture over function fields. M. Kim [160] generalized it to characteristic p
under an assumption on the Kodaira–Spencer map. Note that Vojta’s conjecture
($H = K_C$ ample, $D = 0$) would predict a factor $1 + \varepsilon$ instead of $2 + \varepsilon$, at least
for points of bounded degree.
The proof of the general abc-theorem for polynomials is similar to the one of
Brownawell and Masser [52]; it is inspired by Cartan’s proof of his second main
theorem in Nevanlinna theory. The proof of Voloch [319] is more geometric and
relies on the Brill–Segre formula, which is a generalization of Hurwitz’s genus
formula for non-degenerate maps from a curve into projective space.
APPENDIX A ALGEBRAIC GEOMETRY

A.1. Introduction

We collect here some definitions and results from algebraic geometry needed in
the text. For most of our purposes, it is enough to work with varieties over a base
field $K$ and to consider points rational in a fixed algebraic closure $\overline{K}$. Thus we
may neglect the modern language of schemes in order to keep the exposition ele-
mentary. In some side remarks or proofs, not essential for the basic understanding
of the book, it will still be convenient to use the theory of schemes on the level of
[148], Ch.II.
Arguments are only given if they are easy and instructive or if we have not found
an appropriate reference, otherwise we freely quote from standard text books up to
the volumes of Grothendieck. Most of the quoted results are true for more general
classes of schemes, but we formulate them just for varieties using the dictionary
mentioned in A.2.8.
No knowledge of algebraic geometry is required for reading Appendix A. How-
ever, the presentation is too brief for learning the subject and it would be useful,
from an educational point of view, if the reader is familiar with the theory of vari-
eties over an algebraically closed field as in the books of R. Hartshorne [148], Ch.
I, D. Mumford [213], Ch. I, or I.R. Shafarevich [279].
We advise the reader to work through Sections A.2–A.4 to gather the most
frequently used definitions, notations, terminology, and results. If you also look up
the definition of a projective variety in Section A.6, then you will be ready to
start the book, coming back to Appendix A only when required in the text.

A.2. Affine varieties

Let $K$ be a field and $\overline{K}$ an algebraic closure of $K$.

A.2.1. The affine n-space $\mathbb{A}^n_K$ is equal to $\overline{K}^n$ endowed with the following
topology: A subset $Y$ is closed if and only if there is $T \subset K[x_1,\dots,x_n]$ with
zero set $Y = Z(T)$, where
$$Z(T) := \{\alpha \in \overline{K}^n \mid f(\alpha) = 0 \ \ \forall f \in T\}.$$
This defines a topology on AnK depending on the ground field K . It is called
the Zariski topology. Note that x1 , . . . , xn are the coordinate functions on AnK .
On an affine n-space, we always fix a set of coordinate functions and set x :=
(x1 , . . . , xn ).
A.2.2. Let Y be a closed subset of AnK . The ideal of vanishing is given by
I(Y ) := {f ∈ K[x] | f (Y ) = {0}}.
Then
K[Y ] := K[x]/I(Y )
is the coordinate ring of Y . It is a reduced K -algebra meaning that K[Y ] has
no nilpotent elements. For an ideal $J$ in $K[x]$, let
$$\sqrt{J} := \{f \in K[x] \mid \exists\, n \in \mathbb{N} \text{ with } f^n \in J\}$$
be the radical of $J$. Note that $I(Y)$ is a radical ideal, i.e. $I(Y) = \sqrt{I(Y)}$. By
the trivial fact $Z(I(Y)) = Y$, every closed subset is the zero set of a radical ideal.
Hilbert’s Nullstellensatz says
$$I(Z(J)) = \sqrt{J}.$$
For a proof, we refer to [157], Th.7.15, where it is assumed that K is algebraically
closed but the same proof applies to our situation.
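
In one variable over a field of characteristic zero, the Nullstellensatz can be made completely explicit: for a principal ideal J = (f), the radical is generated by the squarefree part f / gcd(f, f′). The following Python sketch, using only the standard library, illustrates this for K = Q; the helper names are our own and the choice of f is an arbitrary illustration, not an example from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(p, q):
    """Quotient and remainder of p by a nonzero q, computed exactly over Q."""
    p = [Fraction(c) for c in trim(p)]
    q = [Fraction(c) for c in trim(q)]
    quo = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    while len(p) >= len(q):
        k = len(p) - len(q)
        c = p[-1] / q[-1]
        quo[k] = c
        for i, y in enumerate(q):
            p[i + k] -= c * y
        p = trim(p)
    return trim(quo), p

def gcd_poly(p, q):
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return p

def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def monic(p):
    p = trim(p)
    return [Fraction(c) / p[-1] for c in p]

def radical_generator(f):
    """Monic generator of the radical of (f) in Q[x] (characteristic zero)."""
    g = gcd_poly(f, derivative(f))
    return monic(divmod_poly(f, g)[0])

# f = x^2 (x - 1)^3 (x^2 + 1); Z(f) consists of 0, 1 and the two roots of x^2 + 1
f = mul(mul([0, 0, 1], mul([-1, 1], mul([-1, 1], [-1, 1]))), [1, 0, 1])
g = radical_generator(f)               # expected: x (x - 1) (x^2 + 1)

g_squared = mul(g, g)
g_cubed = mul(g_squared, g)
rem2 = divmod_poly(g_squared, f)[1]    # nonzero: g^2 is not in J
rem3 = divmod_poly(g_cubed, f)[1]      # zero: g^3 lies in J
```

The polynomial g vanishes on Z(J) and some power of it lies in J, in accordance with I(Z(J)) = √J; in this instance the third power is needed because of the factor (x − 1)^3.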
A.2.3. The Zariski topology on AnK induces a topology on Y . Let U be an open
subset of $Y$. Then a function $f : U \to \overline{K}$ is called regular in $\alpha \in U$ if there
are $p, q \in K[x]$ with $q(\alpha) \neq 0$ and $f = p/q$ in a neighbourhood of $\alpha$. If $f$ is
regular in all points of U , then f is called a regular function. The ring of regular
functions on U is denoted by OY (U ). Clearly, any element of the coordinate ring
may be viewed as a regular function on Y . In fact, we can prove K[Y ] = OY (Y ).
The pair (Y, OY (Y )) is called an affine variety over K . By abuse of notation, it
is usually denoted by Y .
A.2.4. Let $V, V'$ be open subsets of affine K-varieties $X, X'$. A K-morphism
(or a morphism over K) is a continuous map $\varphi : V \to V'$ such that for all
open subsets $U'$ of $V'$, we have a well-defined map
$$\varphi^\sharp : \mathcal{O}_{X'}(U') \longrightarrow \mathcal{O}_X(\varphi^{-1}(U')), \quad f' \mapsto f' \circ \varphi.$$
This means simply that the function $f' \circ \varphi$ is regular on $\varphi^{-1}(U')$. If the ground
field $K$ is fixed, then we often simply speak about varieties and morphisms, meaning
always that they are defined over $K$.
Note that we may reconstruct $\varphi$ from $\varphi^\sharp$: We may assume that $X, X'$ are closed
subsets of $\mathbb{A}^n_K$, $\mathbb{A}^m_K$ with coordinates $x$ and $x'$, respectively. Then let
$\varphi_i := \varphi^\sharp(x'_i)$. We get $\varphi(\alpha) = (\varphi_1(\alpha),\dots,\varphi_m(\alpha))$ for all $\alpha \in V$. This way, we
obtain a bijective correspondence between K-morphisms of varieties over $K$ and
K-algebra homomorphisms of the corresponding coordinate rings.
The morphism $\varphi$ is called a K-isomorphism if and only if there is a K-morphism
$\psi : X' \to X$ such that $\varphi \circ \psi$ and $\psi \circ \varphi$ are both the identity map.
A.2.5. For x ∈ X , we consider pairs (U, f ), where U is an open neighbourhood
of $x$ and $f \in \mathcal{O}_X(U)$. Two pairs $(U, f)$ and $(U', f')$ are called equivalent if
and only if $f = f'$ on a neighbourhood of $x$. The set of equivalence classes is
denoted by OX,x and is called the local ring of x in X . In K[X], we consider
the maximal ideal mx = {f ∈ K[X] | f (x) = 0}. Then OX,x is the localization
of K[X] in mx with unique maximal ideal mx = mx OX,x (see A.2.10).
A.2.6. Let $K \subset L \subset \overline{K}$ be an intermediate field. Let $X$ be an affine variety over
K . Then the set of L-rational points of X is
{x ∈ X | f (x) ∈ L ∀ f ∈ OX (X)}.
If we view X as a closed subset of AnK , then this means that all coordinates of
x are in L. For x ∈ X , the smallest L such that x is L-rational is denoted by
$K(x)$. In fact, we have
$$\mathcal{O}_{X,x}/\mathfrak{m}_x \xrightarrow{\ \sim\ } K(x), \quad f + \mathfrak{m}_x \mapsto f(x).$$
A.2.7. Note that points in $X$ need not be closed. For $x \in X$, the closure of $x$
is the set of conjugates of $x$. If $X$ is a closed subset of $\mathbb{A}^n_K$, then the conjugates
of $x \in X$ are given by $\sigma x$, applying $\sigma \in \operatorname{Gal}(\overline{K}/K)$ componentwise. Thus the
local rings of $x$ and of its conjugates are the same.
A.2.8. Now we relate affine varieties to affine schemes. This makes it possible to quote
results from standard books about schemes. If the reader is not familiar with the language
of schemes, he can skip the following remarks without any problems for the understanding
of the book.
Let $X$ be an affine variety over $K$ with ring of regular functions $A = \mathcal{O}_X(X)$. Then we
have a map
$$t : X \longrightarrow \operatorname{Spec}(A), \quad x \mapsto I(\{x\}).$$
Then $t(X)$ is the set of maximal ideals and this is dense in $\operatorname{Spec}(A)$. Moreover, the
topology on $X$ is the coarsest topology making $t$ continuous, i.e. $U$ is open in $X$ if and
only if it is the inverse image of an open subset $V$ in $\operatorname{Spec}(A)$. Then we have
$$\mathcal{O}_{\operatorname{Spec}(A)}(V) \cong \mathcal{O}_X(t^{-1}(V))$$
or more formally $t_* \mathcal{O}_X \cong \mathcal{O}_{\operatorname{Spec}(A)}$. This follows immediately from the definitions and
the density of $t(U)$ in $V$. Therefore any morphism of affine varieties over $K$ extends to a
morphism of affine schemes.

Conversely, for any reduced affine scheme $\operatorname{Spec}(A)$ of finite type over $K$, we can consider
the $\overline{K}$-rational points of $\operatorname{Spec}(A)$ as an affine variety $X$ over $K$ with $\mathcal{O}_X(X) = A$. This
may be used to translate results from affine varieties over $K$ to reduced affine schemes of
finite type over $K$ and conversely.

A.2.9. Let $X$ be an affine variety over $K$ with coordinate ring $K[X]$. By
Hilbert’s basis theorem, $K[X]$ is a noetherian ring, i.e. any ideal is finitely
generated or equivalently there exists no properly ascending infinite chain of ideals in $K[X]$
(see for example [157], 7.9). Using A.2.2, there is a one-to-one correspondence
between closed subsets of $X$ and radical ideals in $K[X]$, given by considering the
ideal of vanishing $I(Y)$ or passing to the zero set $Z(J)$. We conclude that there
is no properly descending infinite chain of closed subsets in $X$. This implies that
for any family $(U_\alpha)_{\alpha\in I}$ of open subsets of $X$, there is a finite $I_0 \subset I$ with
$$\bigcup_{\alpha\in I_0} U_\alpha = \bigcup_{\alpha\in I} U_\alpha.$$

In other words, every open subset of an affine variety over K is quasicompact.

A.2.10. Not every open subset of an affine K-variety $X$ is an affine K-variety.
But for any $f \in K[X]$, the open subset
$$X_f := \{x \in X \mid f(x) \neq 0\}$$
is an affine variety over $K$ with coordinate ring isomorphic to the localization of
$K[X]$ in the multiplicative monoid $\{f^n \mid n \in \mathbb{N}\}$ ([148], Prop.II.2.2). Note that
these open subsets form a basis for the topology on $X$.

To prove this, let $x$ be a point in an open subset $U$. There is $f \in I(X \setminus U)$ with $f(x) \neq 0$
and hence $x \in X_f \subset U$.
This is easily used to prove that the local ring OX,x of X in x is isomorphic to
the localization of K[X] in the maximal ideal mx ([148], Prop.II.2.2).

A.2.11. Here, we explain how to pass from affine varieties over $K$ to affine
varieties over $\overline{K}$. Let $X$ be an affine variety over $K$. It is a closed subset of $\mathbb{A}^n_K$ given
by the ideal $I(X)$. As $\mathbb{A}^n_K$ and $\mathbb{A}^n_{\overline{K}}$ have the same underlying set $\overline{K}^n$, we may
view $X$ as a closed subset of $\mathbb{A}^n_{\overline{K}}$ given by the zeros of the set $I(X) \subset \overline{K}[x]$.
The corresponding affine variety over $\overline{K}$ is denoted by $X_{\overline{K}}$. Note that $X$ and
$X_{\overline{K}}$ have the same points but a different topology and a different ring of regular
functions. It is easily seen that the variety $X_{\overline{K}}$ does not depend on the choice of
the affine space. It is called the base change of $X$ to $\overline{K}$.
For any extension field $F$ of $K$, we define the base change $X_F$ as the affine
variety given by the closed subset $Z(I(X))$ in $\mathbb{A}^n_F$ and by the corresponding
coordinate ring
$$F[X_F] = F[x_1,\dots,x_n]/\sqrt{I(X)}.$$

A.3. Topology and sheaves

Let T be a topological space and K a field.


A.3.1. A subset $Y \subset T$ is called irreducible if and only if for all closed subsets $A$
and $B$ of $Y$ different from $Y$ we have $Y \neq A \cup B$. The empty set is not irreducible.
Example A.3.2. If Y is a closed subset of AnK , then Y is irreducible (with respect
to the induced topology) if and only if I(Y ) is a prime ideal.

To prove it, assume first that Y is irreducible. Let a ∈ K[x] \ I(Y ) and b ∈ K[x] with
ab ∈ I(Y ) . Then the union of Z(a) ∩ Y and Z(b) ∩ Y is Y , hence Z(b) ⊃ Y proving
b ∈ I(Y ) . Thus I(Y ) is a prime ideal.
Conversely, assume that $I(Y)$ is a prime ideal. Let $A$ and $B$ be closed subsets of $Y$ with
$A \cup B = Y$. If $A \neq Y$, then A.2.2 shows that there is an $a \in I(A) \setminus I(Y)$. Since
$$I(A) \cap I(B) = I(A \cup B)$$
contains $ab$ for every $b \in I(B)$, we conclude that $b \in I(Y)$. Therefore $I(B) = I(Y)$,
proving $B = Y$ and the irreducibility of $Y$.
For an affine variety X , we conclude that in the one-to-one correspondence be-
tween radical ideals of K[X] and closed subsets of X mentioned in A.2.9, the
prime ideals correspond to the irreducible closed subsets of X .
A.3.3. A maximal irreducible subset of T is called an irreducible component
of T . As the closure of an irreducible subset is again irreducible, the irreducible
components are closed. If T is irreducible, then any non-empty open subset is
irreducible and dense. The proofs are immediate from the definitions.
Example A.3.4. If Y is a closed subset of AnK , then the irreducible components
of Y are the zero sets of the minimal prime ideals containing I(Y ). There are
finitely many irreducible components and their union is equal to Y .
A.3.5. We define the dimension of $T$ to be
$$\dim(T) := \sup\{n \mid A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_n\},$$
where $A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_n$ is ranging over all chains of irreducible closed
subsets of $T$.
Example A.3.6. If X is an affine variety over K , then Example A.3.4 shows
that the dimension of X is equal to the Krull dimension of the noetherian ring
K[X]. By definition, the Krull dimension dim(A) of a commutative ring A is
the supremum over the length of all prime ideal chains. So this follows from the
one-to-one correspondence between prime ideals in K[X] and irreducible closed
subsets of X, which we deduce from Example A.3.2. If X is irreducible, then
it is a consequence of commutative algebra that the dimension of X is equal to
the transcendence degree of K(X) over K, where K(X) is the quotient field of
K[X] (H. Matsumura [197], Ch.5, §14). In particular, the dimension of AnK is n .

Example A.3.7. Let $X$ be an affine variety over $\mathbb{C}$. We may view $X$ as a closed
subset of $\mathbb{A}^n_{\mathbb{C}}$. Instead of the Zariski topology on $\mathbb{C}^n$, we may use the complex
topology where the balls form a basis. It induces the complex topology on X
and we get a complex analytic variety Xan . Instead of regular functions, we
use holomorphic functions. For details, we refer to Section A.14 or to [280], Chs
VII, VIII. The complex topology is finer than the Zariski topology. In contrast to
the Zariski topology, the complex topology is Hausdorff. We have mentioned in
Example A.3.6 that dim(AnK ) = n . However, if we use the complex topology on
Cn , the dimension introduced in A.3.5 is infinite for n ≥ 1. So our notation is
useful only for the Zariski topology, but this is the only case of interest for us.
A.3.8. We call an affine variety $X$ over $K$ geometrically irreducible if $X_{\overline{K}}$ is
irreducible (cf. A.2.11). Note that a geometrically irreducible variety $X$ is always
irreducible since the topology on $X_{\overline{K}}$ is finer.
$X$ is called geometrically reduced if $K[X] \otimes_K \overline{K}$ contains no nilpotent
elements. If we consider $X$ as a closed subset of $\mathbb{A}^n_K$, then this is equivalent to
the assumption that the ideal in $\overline{K}[x]$ generated by $I(X)$ is a radical ideal. By
Hilbert’s Nullstellensatz in A.2.2, we conclude that
$$\overline{K}[X_{\overline{K}}] = K[X] \otimes_K \overline{K}$$
for a geometrically reduced $X$.
Example A.3.9. Note that $Z(x^2 + 1) \subset \mathbb{A}^1_{\mathbb{R}}$ is an irreducible zero-dimensional
$\mathbb{R}$-variety but not geometrically irreducible.
Example A.3.10. Let $K$ be the quotient field of the polynomial ring $\mathbb{F}_2[t]$. Then
$Z(x^2 + t)$ is a zero-dimensional variety in $\mathbb{A}^1_K$, which is not geometrically
reduced.
A.3.11. Let $X$ and $X'$ be affine varieties over $K$. On $X \times X'$, we define the
following structure of an affine variety over $K$. We may view $X, X'$ as closed
subsets of $\mathbb{A}^n_K$ and $\mathbb{A}^m_K$, respectively. We use the coordinates $x$ on $\mathbb{A}^n_K$ and $x'$
on $\mathbb{A}^m_K$. We identify $\mathbb{A}^n_K \times \mathbb{A}^m_K$ with $\mathbb{A}^{n+m}_K$, hence the coordinates on the latter
are given by $(x, x')$. We consider $I(X)$ and $I(X')$ as subsets of $K[x, x']$ by
$$I(X) \subset K[x] \subset K[x, x'] \supset K[x'] \supset I(X').$$
Then
$$X \times X' = Z(I(X) \cup I(X')) \subset \mathbb{A}^{n+m}_K$$
makes $X \times X'$ into an affine variety over $K$. We call $X \times X'$ the product variety
of $X$ and $X'$. It is easy to see that it does not depend on the choices of $\mathbb{A}^n_K$ and
$\mathbb{A}^m_K$. Note that the topology on $X \times X'$ is finer than the product topology.

Example A.3.12. We consider the group $GL(n, \overline{K})$ of invertible $n \times n$ matrices
with entries in $\overline{K}$. We identify the space of $n \times n$ matrices with entries in $\overline{K}$
with $\overline{K}^{n^2}$ by ordering the entries $x_{ij}$ of the matrices lexicographically. Now we
consider $GL(n, \overline{K})$ as a closed subset of $\mathbb{A}^{n^2+1}_K$ by using the map
$$GL(n, \overline{K}) \to \mathbb{A}^{n^2+1}_K, \quad (x_{ij}) \mapsto \bigl((x_{ij});\ \det(x_{ij})^{-1}\bigr).$$
Let $(y_{ij})_{1\le i,j\le n};\ y_{n^2+1}$ be the coordinates on $\mathbb{A}^{n^2+1}_K$. Then $GL(n, \overline{K})$ is
identified with $Z(\det(y_{ij})\, y_{n^2+1} - 1)$. This makes $GL(n, \overline{K})$ into an $n^2$-dimensional
affine variety over $K$, which we denote by $GL(n)_K$. Note that for any
intermediate field $L$ of $K$ and $\overline{K}$, the $L$-rational points are equal to $GL(n, L)$. The
base change of $GL(n)_K$ to $\overline{K}$ is equal to $GL(n)_{\overline{K}}$. The group operation gives a
K-morphism
$$GL(n)_K \times GL(n)_K \longrightarrow GL(n)_K, \quad (g_1, g_2) \mapsto g_1 \cdot g_2.$$
Similarly, the inverse is a K-morphism. Hence $GL(n)_K$ is an example of a group
variety handled in Section 8.2.
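
For n = 2 and K = Q, the embedding of Example A.3.12 can be made concrete. The following Python sketch (standard library only) sends an invertible matrix to the point ((x_ij); det^{-1}) of affine 5-space and evaluates the defining polynomial det(y_ij) y_5 − 1 there. The function names embed, defining_equation and matmul are our own illustrative choices.

```python
from fractions import Fraction

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def embed(m):
    """Send an invertible 2x2 matrix to the point ((x_ij); det^{-1}) of A^5."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [m[0][0], m[0][1], m[1][0], m[1][1], 1 / Fraction(d)]

def defining_equation(point):
    """Value of det(y_ij) * y_5 - 1 at a point of A^5."""
    y11, y12, y21, y22, y5 = point
    return (y11 * y22 - y12 * y21) * y5 - 1

def matmul(m1, m2):
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

g = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]   # det = 1
h = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]   # det = -2

value_g = defining_equation(embed(g))
value_h = defining_equation(embed(h))
value_gh = defining_equation(embed(matmul(g, h)))
```

All three values vanish, so the images of g, h and of the product g · h lie on the hypersurface; all coordinates are rational, in accordance with the description of the L-rational points as GL(n, L).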
A.3.13. A presheaf of abelian groups on our topological space T is a map assign-
ing to each open subset U of T an abelian group F(U ) and to every open subset
$V$ of $U$ a homomorphism $\rho^U_V : \mathcal{F}(U) \to \mathcal{F}(V)$ called the restriction map such
that:
(a) $\mathcal{F}(\emptyset) = 0$;
(b) $\rho^U_U$ is the identity;
(c) if $W \subset V \subset U$ are open subsets, then $\rho^U_W = \rho^V_W \circ \rho^U_V$.

Instead of abelian groups, we may consider rings, K -vector spaces or other alge-
braic structures.
Example A.3.14. For an open subset U of T , let F(U ) be the set of continuous
real functions on $U$. If $V$ is an open subset of $U$, then we define $\rho^U_V$ by restricting
functions of U to V . Then F is a presheaf of R -algebras on T . If T is a
differentiable manifold, then the same construction works with C ∞ -functions.
A.3.15. A presheaf F on T is called a sheaf if the following conditions are satis-
fied for every open subset U of T and every open covering (Ui )i∈I of U :
(a) if $s \in \mathcal{F}(U)$ with $\rho^U_{U_i}(s) = 0$ for all $i \in I$, then $s = 0$;
(b) if $s_i \in \mathcal{F}(U_i)$ for all $i \in I$ and $\rho^{U_i}_{U_i\cap U_j}(s_i) = \rho^{U_j}_{U_i\cap U_j}(s_j)$ for all $i, j \in I$,
then there is $s \in \mathcal{F}(U)$ with $\rho^U_{U_i}(s) = s_i$ for each $i \in I$.

Note that s in (b) is unique by (a). The presheaves in Example A.3.14 are sheaves.
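
The gluing condition (b) can be illustrated for the sheaf of all functions on a finite topological space. In the following Python sketch, a section over an open set is modelled as a dictionary from points to values and the restriction map simply forgets points. The space {1, 2, 3} with open sets ∅, {2}, {1, 2}, {2, 3}, {1, 2, 3}, as well as the names restrict and glue, are assumptions made only for this illustration.

```python
def restrict(section, V):
    """Restriction map: keep only the values of the section at points of V."""
    return {p: section[p] for p in V}

def glue(cover, sections):
    """Glue sections s_i on U_i that agree on all overlaps U_i ∩ U_j.

    Returns the glued section on the union of the U_i, or None if the
    compatibility hypothesis of condition (b) fails.
    """
    for Ui, si in zip(cover, sections):
        for Uj, sj in zip(cover, sections):
            overlap = Ui & Uj
            if restrict(si, overlap) != restrict(sj, overlap):
                return None
    glued = {}
    for si in sections:
        glued.update(si)
    return glued

U1, U2 = frozenset({1, 2}), frozenset({2, 3})
s1 = {1: "a", 2: "b"}
s2 = {2: "b", 3: "c"}

s = glue([U1, U2], [s1, s2])                            # sections agree on {2}
bad = glue([U1, U2], [s1, {2: "x", 3: "c"}])            # sections disagree at 2
```

Here s restricts to s1 on U1 and to s2 on U2, while bad is None because the two given sections differ on the overlap.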
Example A.3.16. Let X be an affine variety over K . Then OX is a presheaf of
K-algebras. We define $\rho^U_V$ again as the restriction map of functions. It is almost
by definition of a regular function that OX is a sheaf.

A.3.17. Let $\mathcal{F}$ and $\mathcal{G}$ be presheaves of abelian groups on $T$. A homomorphism
$\varphi : \mathcal{F} \to \mathcal{G}$ of presheaves is a homomorphism $\varphi_U : \mathcal{F}(U) \to \mathcal{G}(U)$ for all open
subsets $U$ such that for all open subsets $V \subset U$, we have $\rho^U_V \circ \varphi_U = \varphi_V \circ \rho^U_V$.
A homomorphism of sheaves is the same as a homomorphism of presheaves.
Obviously, $\varphi$ is called an isomorphism of presheaves if there is a morphism
$\psi : \mathcal{G} \to \mathcal{F}$ of presheaves such that $\varphi \circ \psi$ and $\psi \circ \varphi$ are both the identity on the
corresponding presheaves.
For more details, we refer to [148], II.1, or to [134], 0.3.

A.4. Varieties

Let $K$ be a field and $\overline{K}$ an algebraic closure of $K$.


A.4.1. A prevariety X over K is a topological space with a finite open covering
(Uα )α∈I of X and homeomorphisms ϕα : Uα → Xα for affine varieties Xα
over K such that
$$\varphi_\alpha \circ \varphi_\beta^{-1} : \varphi_\beta(U_\beta \cap U_\alpha) \longrightarrow \varphi_\alpha(U_\beta \cap U_\alpha) \tag{A.1}$$
is a K -isomorphism for all α, β ∈ I . Then (Uα , ϕα ) is called an affine chart of
X.
Note that this concept is similar to the concept of manifold in differential geometry.
Not to focus too much on the above covering, we should consider a maximal atlas
compatible with the covering. We leave the details to the reader.
A.4.2. Let $U$ be an open subset of the prevariety $X$. Then a function $f : U \to \overline{K}$
is called regular in $x \in U$ if $f \circ \varphi_\alpha^{-1}$ is regular in $\varphi_\alpha(x)$ for all $\alpha \in I$ with
$x \in U_\alpha$. By (A.1), it suffices to check this for a single $\alpha$. Again, $f$ is called
regular if it is regular in all x ∈ U . The K -algebra of regular functions on U is
denoted by OX (U ). Obviously, we get a sheaf OX on X .
A.4.3. Let $X, X'$ be prevarieties over $K$. A map $\varphi : X \to X'$ is called a
morphism over K (or a K-morphism or simply a morphism), if $\varphi$ is continuous
and for each open subset $U'$ of $X'$ and $f' \in \mathcal{O}_{X'}(U')$, we have
$$\varphi^\sharp(f') := f' \circ \varphi \in \mathcal{O}_X(\varphi^{-1} U').$$
In analogy with the affine case, we define an isomorphism to be an invertible
morphism.
A.4.4. Let $X$ and $X'$ be prevarieties over $K$, given by finite affine charts $(U_\alpha)_{\alpha\in I}$
and $(U'_\beta)_{\beta\in J}$, respectively. Let $\varphi_\alpha : U_\alpha \to X_\alpha$ and $\psi_\beta : U'_\beta \to X'_\beta$ be the
homeomorphisms to affine varieties $X_\alpha, X'_\beta$ from A.4.1. On $U_\alpha \times U'_\beta$, we use
the topology such that the map $\varphi_\alpha \times \psi_\beta$ to the affine product variety $X_\alpha \times X'_\beta$
is a homeomorphism. Clearly, the topologies coincide on overlappings, therefore
the covering $(U_\alpha \times U'_\beta)_{\alpha\in I,\beta\in J}$ defines a unique topology on $X \times X'$ such that
each $U_\alpha \times U'_\beta$ is an open subset of $X \times X'$. Then $X \times X'$ is a prevariety with
affine charts $(U_\alpha \times U'_\beta, \varphi_\alpha \times \psi_\beta)$. The structure does not depend on the choice
of the affine charts. The details are left to the reader.
A.4.5. A prevariety X over K is called a variety over K if the diagonal ∆ :=
{(x, x) | x ∈ X} is closed in X × X . Obviously, any affine variety over K is a
variety. Regular functions and morphisms are the same as for prevarieties.
A.4.6. In the whole book, only the notion of a K -variety will occur. Let X be
a variety over $K$. For an intermediate field $L$ of $K$ and $\overline{K}$, a point $x \in X$ is
called L-rational if x is L-rational in one affine chart (and hence in all affine
charts containing x ). The set of L-rational points is denoted by X(L). Similarly
as in A.2.5, we define the local ring OX,x . It has a unique maximal ideal mx and
the residue field K(x) := OX,x /mx .

The base change $X_F$ of $X$ to an extension field $F/K$ is the variety over $F$
obtained by using base change of the affine charts from A.2.11. The set $X(F)$ of
$F$-rational points of $X$ is defined by $X(F) := X_F(F)$, i.e. an $F$-rational point
of $X$ is a point $x \in X_F$ with an affine neighbourhood, where $x$ takes coordinates
in $F$.
The product $X_1 \times X_2$ of varieties $X_1, X_2$ over $K$ is again a variety over $K$.
The easy proofs are left to the reader.
We call $X$ geometrically irreducible if $X_{\overline{K}}$ is irreducible. A K-variety $X$
is geometrically reduced if the affine open charts are geometrically reduced.
Note that for $X$ geometrically irreducible (resp. geometrically reduced), the
base change $X_F$ with respect to any field extension $F/K$ is irreducible (resp.
reduced). Moreover, if the characteristic of $K$ is zero (or more generally if $K$ is
perfect), then all K-varieties are geometrically reduced ([148], Exercise II.3.15).
For the reader familiar with the language of schemes, A.2.8 shows that varieties
over K are the “same” as reduced schemes of finite type and separated over K .
A.4.7. Let U be an open subset of the K -variety X . We choose an open covering
(Uα , ϕα )α∈I of X by affine charts. Then U is a K -variety with the induced
topology and with the charts $U_\alpha \cap U$. There is a slight problem that $U_\alpha \cap U$
need not be affine. By A.2.10, we may replace $U_\alpha \cap U$ by an open affine covering
$(V_{\alpha\beta})_{\beta\in J_\alpha}$, meaning that the restriction of $\varphi_\alpha$ to $V_{\alpha\beta}$ gives a homeomorphism
onto an affine open subset of $X_\alpha$. Then the $(V_{\alpha\beta}, \varphi_\alpha|_{V_{\alpha\beta}})$ are the affine charts of
the K -variety U and U is called an open subvariety of X . If U is isomorphic to
an affine variety over K , then U is called an affine open subset of X . We have
seen above that the affine open subsets of X form a basis for the topology of X .
A.4.8. Let Y be a closed subset of the K -variety X . Again, let (Uα , ϕα )α∈I be
affine charts covering $X$. Since any closed subset of an affine K-variety is again
an affine K-variety, we conclude that $Y$ is a K-variety with the induced topology
and with the affine charts $(U_\alpha \cap Y, \varphi_\alpha|_{U_\alpha\cap Y})$. Clearly, the diagonal in $Y \times Y$ is
closed. We always use this structure of K -variety on Y and we call Y a closed
subvariety of X .
For each open subset U of X , let JY (U ) be the ideal of regular functions on U
vanishing on Y ∩ U . Then JY is called the ideal sheaf of Y .
If $i : Y' \to X$ is a morphism of varieties over $K$ such that $i$ induces a
K-isomorphism onto a closed subvariety of $X$, then $i$ is called a closed embedding
of $Y'$ in $X$.
A.4.9. We apply the topological considerations from Section A.3 to the K -variety
X . First, X has finitely many irreducible components which cover X . For affine
X , this is Example A.3.4. In general, we apply this to the affine charts and then
we take closure. As the closure of an irreducible subset is again irreducible, we
get easily the claim.
On $X$, we use the dimension introduced in A.3.5. Let
$$A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_n$$
be a chain of irreducible closed subsets. We choose an affine chart $U_\alpha$ with
$A_0 \cap U_\alpha \neq \emptyset$. Using irreducibility, it is clear that $A_j \cap U_\alpha$ is dense in $A_j$. Therefore
$$\dim(X) = \max_{\alpha\in I} \dim(X_\alpha) < \infty.$$
For an irreducible closed subset $A$ of $X$, the codimension of $A$ in $X$ is
$$\operatorname{codim}(A, X) := \sup\{n \mid A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_n\},$$
where $A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_n$ is ranging over all chains of irreducible closed
subsets of $X$ with $A = A_0$. For any closed subset $B$ of $X$, the codimension of
B in X is
codim(B, X) := inf{codim(A, X) | A ⊂ B},
where A is ranging over all irreducible closed subsets of B . Clearly, it is enough
to consider the irreducible components of B . If B and X are irreducible, then
dim(B) + codim(B, X) = dim(X).
Let $U_\alpha$ be an affine chart with $U_\alpha \cap B \neq \emptyset$. Since $U_\alpha$ is dense, we have $\dim(U_\alpha) =$
dim(X) and codim(Uα ∩ B, Uα ) = codim(B, X) . So we may assume that X is an
irreducible affine variety over K . Then the claim follows from [148], Caution II.3.2.8.

A.4.10. A variety $X$ over $K$ is called equidimensional (or of pure dimension
$n$) if all irreducible components of $X$ have dimension $n$. If $X$ and $X'$ are
K-varieties of pure dimension $n$ and $n'$, then $X_{\overline{K}}$ is a $\overline{K}$-variety of pure
dimension $n$ and $X \times X'$ is a K-variety of pure dimension $n + n'$ ([137], Cor.4.2.8,
Prop.4.2.4). However, the intersection of two irreducible subsets need not be
pure dimensional.

A.4.11. Let $X$ be an irreducible variety over $K$. We consider pairs $(U, f)$ where
$U$ is a non-empty open subset and $f$ is a regular function on $U$. Two pairs $(U, f)$
and $(U', f')$ are called equivalent if $f = f'$ on $U \cap U'$. An equivalence class
is called a rational function on X . Clearly, for a rational function, there is a
maximal representative (U, f ) containing all other representatives. We usually
identify a rational function with this representative. The rational functions form a
field K(X) called the function field of X . This follows from irreducibility of X .
Any non-empty open subset U of X is irreducible and dense (see A.3.3), hence
K(U ) = K(X). If U is affine with coordinate ring K[U ], then it follows that
K(X) is the quotient field of K[U ]. By Example A.3.6, the dimension of X is
equal to the transcendence degree of K(X) over K .
We remark that X is geometrically irreducible (resp. geometrically reduced) if
and only if K is separably algebraically closed in K(X) (resp. K(X) is separa-
ble over K ). For a proof, we refer to [213], II.4, Prop.4.
If X is geometrically irreducible and Y is an irreducible variety over K , then the
product variety X × Y is irreducible ([137], Cor.4.5.8). But the product of two
irreducible varieties is not necessarily irreducible.

A.4.12. To define rational functions on any variety $X$ over $K$, we consider pairs
$(U, f)$ with $U$ open dense in $X$ and $f$ regular on $U$. The equivalence classes
with respect to the above relation are called rational functions on $X$. They form a
ring $K(X)$. If $X$ is not irreducible, then $K(X)$ is no longer a field. We have
$$K(X) = \prod_j K(X_j),$$
where $X_j$ is ranging over all irreducible components of $X$, i.e. $K(X)$ is a product
of fields. To see this, note that any $(U, f) \in K(X)$ may be represented on
$\bigcup_j (U \setminus \bigcup_{i\neq j} X_i)$ and this is a disjoint union of open subsets.

A.4.13. Let $F$ be a subfield of $\overline{K}/K$ and let $\sigma : F \to \overline{K}$ be an embedding over
$K$. For an affine variety $Y$ over $F$ given by polynomials $f_1(x),\dots,f_m(x)$ in
$\mathbb{A}^n_F$, the conjugate $Y^\sigma$ is the affine variety in $\mathbb{A}^n_{\sigma(F)}$ given by $\sigma(f_1),\dots,\sigma(f_m)$.
The conjugate $Y^\sigma$ of any variety $Y$ over $F$ is defined by the conjugates of the
local affine charts. Then $Y^\sigma$ is a variety over $\sigma(F)$. If $Y$ is a projective
subvariety of $\mathbb{P}^n_F$ (see Section A.6), then $Y^\sigma$ is given by the conjugate homogeneous
polynomials. Applying $\sigma$ to the points of $Y$, we get a bijection onto $Y^\sigma$, which,
however, is not a morphism over $F$.
Let $R$ be the set of embeddings $\sigma : F \to \overline{K}$ over $K$. Let $X$ be a variety over
$K$ and let $Y$ be a closed subvariety of $X_F$. Then we claim that $Y_0 := \bigcup_{\sigma\in R} Y^\sigma$
is the smallest closed subvariety of $X$ which is defined over $K$ and contains $Y$.
Moreover, if $Y$ is irreducible, then $Y_0$ is also irreducible.

In a first step, we show that $Y_0$ is defined over $K$. To see this, we may assume that $Y$ is
affine given by $f_1(x),\dots,f_m(x)$ in $\mathbb{A}^n_F$. Then $Y_0$ is given by all the polynomials of the
form $\prod_{\sigma\in R} \sigma(f_{j_\sigma})$, where $1 \le j_\sigma \le m$. Since $Y$ is defined over a finite subextension,
we may assume that $[F : K] < \infty$. We consider first the case $F/K$ separable. Passing to
a finite extension, we may assume $F/K$ Galois with Galois group $R$. It is enough to show
that $Y_0$ is defined by the polynomials
$$\sum_{\rho\in R} \rho(\mu) \prod_{\sigma\in R} \rho\circ\sigma(f_{j_\sigma}) = \operatorname{Tr}_{F(x)/K(x)}\Bigl(\mu \prod_{\sigma\in R} \sigma(f_{j_\sigma})\Bigr) \qquad (\mu \in F,\ 1 \le j_\sigma \le m)$$
defined over $K$. Clearly, $Y_0$ is contained in the zero set of these polynomials. If $x \notin Y_0$,
there are $1 \le j_\sigma \le m$ $(\sigma \in R)$ with $\sigma(f_{j_\sigma})(x) \neq 0$. By Artin’s theorem on linear
independence of characters ([173], Ch.VIII, Th.4.1), there is $\mu \in F$ with
$$\sum_{\rho\in R} \rho(\mu) \prod_{\sigma\in R} \rho\circ\sigma(f_{j_\sigma})(x) \neq 0,$$
proving the first step in the separable case.


In general, let $E$ be the separable closure of $K$ in $F$. Then $F/E$ is purely inseparable. We may assume that $\mathrm{char}(K) = p > 0$, otherwise we are done by the separable case. By [156], Prop.8.13, there is a power $p^e$ such that $\alpha^{p^e} \in E$ for all $\alpha \in F$. We conclude that $f_1^{p^e}, \dots, f_m^{p^e}$ have coefficients in $E$, hence $Y$ is defined over $E$ (as a closed subvariety of $X_E$). Now the separable case yields that $Y_0$ is defined over $K$.
Every closed subvariety defined over K containing Y has to contain also all conjugates,
hence Y0 is indeed minimal. If Y is irreducible and Y0 is the union of two closed subsets,
then Y is contained in one of them and irreducibility of Y0 follows from minimality.
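
As an illustration of the claim in A.4.13, here is a small worked case. The fields $K = \mathbb{Q}$, $F = \mathbb{Q}(\sqrt{2})$ and the point $Y = \{\sqrt{2}\}$ are choices made for this sketch and are not taken from the text.

```latex
% Illustrative data (assumed): K = \mathbb{Q}, F = \mathbb{Q}(\sqrt{2}), Y = \{\sqrt{2}\} in \mathbb{A}^1_F.
\[
Y = Z(f_1) \subset \mathbb{A}^1_F, \qquad f_1(x) = x - \sqrt{2}, \qquad
R = \{\mathrm{id}, \sigma\}, \quad \sigma(\sqrt{2}) = -\sqrt{2}.
\]
\[
Y^{\sigma} = Z(x + \sqrt{2}), \qquad
Y_0 = Y \cup Y^{\sigma} = \{\sqrt{2}, -\sqrt{2}\},
\]
% The product over all conjugates of f_1 has rational coefficients:
\[
\prod_{\rho \in R} \rho(f_1) = (x - \sqrt{2})(x + \sqrt{2}) = x^2 - 2 \in \mathbb{Q}[x].
\]
```

Here $Y_0$ is the zero set of $x^2 - 2$, which is irreducible in $\mathbb{Q}[x]$, in line with the statement that $Y_0$ is irreducible over $K$ when $Y$ is irreducible.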

Example A.4.14. Let $Z$ be a zero-dimensional irreducible closed subvariety of $X$. Then $Z$ is contained in an affine open subset $U$ with ideal of vanishing $I(Z)$ equal to a maximal ideal $\mathfrak{m}$ of $K[U]$. For $x \in Z$, A.2.5 and A.2.6 show that $\mathcal{O}_{X,x} \cong K[U]_{\mathfrak{m}}$ and that $K(x) \cong K(Z) \cong K[U]/\mathfrak{m}$. By A.2.7, $Z$ is the set of conjugates of $x$.
Example A.4.15. Let $X$ be an irreducible variety over $K$ and let $F$ be the separable algebraic closure of $K$ in $K(X)$. By [137], Cor.4.5.10, the irreducible components of $X_{\overline{K}}$ are defined over the Galois closure $E$ of $F/K$ and their number is $[F : K]$. By A.4.13, they are conjugates with respect to $\mathrm{Gal}(E/K)$.

A.5. Vector bundles

Let X be a variety over a field K .


A.5.1. A vector bundle over $X$ is a variety $E$ over $K$ with a morphism $\pi_E : E \to X$ over $K$ and the following additional structure: There is an open covering $(U_\alpha)_{\alpha \in I}$ of $X$ and isomorphisms $\varphi_\alpha : \pi_E^{-1}(U_\alpha) \to U_\alpha \times \mathbb{A}^{r_\alpha}_K$ of varieties over $K$ such that $\pi_E = p_1 \circ \varphi_\alpha$ on $\pi_E^{-1}(U_\alpha)$, where $p_1$ is the first projection of $U_\alpha \times \mathbb{A}^{r_\alpha}_K$. Moreover, we assume that for all $\alpha, \beta \in I$ and $x \in U_\alpha \cap U_\beta$, there is $g_{\alpha\beta}(x) \in GL(r_\alpha, \overline{K})$ with
\[
\varphi_\alpha \circ \varphi_\beta^{-1}(x, \lambda) = (x, g_{\alpha\beta}(x)\lambda) \tag{A.2}
\]
for all $\lambda \in \mathbb{A}^{r_\alpha}_K$. Note that $r_\alpha = r_\beta$ on overlappings. If all $r_\alpha$ are equal to a fixed number $r$, then $E$ is called a vector bundle of rank $r$.
A.5.2. For example, $p_1 : X \times \mathbb{A}^r_K \to X$ is a vector bundle of rank $r$. It is called the trivial vector bundle of rank $r$. In A.5.1, we have required that $E$ is locally isomorphic to the trivial vector bundle. Hence $(U_\alpha, \varphi_\alpha)_{\alpha \in I}$ is called a trivialization of $E$. By abuse of notation, we often skip the morphism $\pi_E$.
The fibres of $E$ are denoted by $E_x := \pi_E^{-1}(x)$. Note that $\pi_E = p_1 \circ \varphi_\alpha$ means that $\varphi_\alpha$ maps $E_x$ isomorphically onto the fibre of $U_\alpha \times \mathbb{A}^r_K$ over $x$. We consider the restriction
\[
E_x(K(x)) \xrightarrow{\ \varphi_\alpha\ } \{x\} \times K(x)^r
\]
to $K(x)$-rational points. The right-hand side is a $K(x)$-vector space. We also endow $E_x(K(x))$ with the $K(x)$-vector space structure such that the above map is an isomorphism. We claim that this definition is independent of the choice of the trivialization. To see this we have just to note that $g_{\alpha\beta}(x) \in GL(r, K(x))$ in (A.2), because both $\varphi_\alpha$ and $\varphi_\beta$ are isomorphisms of $K$-varieties.
Similarly, we see that $E_x$ has a canonical $\overline{K}$-vector space structure such that $\varphi_\alpha$ induces an isomorphism of $\overline{K}$-vector spaces onto $\{x\} \times \overline{K}^r$. Clearly, we have $E_x = E_x(K(x)) \otimes_{K(x)} \overline{K}$.
Remark A.5.3. If we assume in A.5.1 that E is only a prevariety with all the
other properties of a vector bundle, then local triviality easily implies that E is a
variety.
A.5.4. We call gαβ (x) the transition matrix. Since the trivializations ϕα , ϕβ are
morphisms over K , it is easy to see that we have a morphism
Uα ∩ Uβ −→ GL(r)K , x → gαβ (x),
where GL(r)K is the group variety introduced in Example A.3.12. On Uα ∩ Uβ ∩
Uγ , we have the cocycle rule
gαβ gβγ = gαγ . (A.3)
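
The cocycle rule (A.3) has two standard consequences, spelled out here as a short derivation. The only inputs are the index choices indicated on the left and the invertibility of the transition matrices.

```latex
% Consequences of the cocycle rule g_{\alpha\beta} g_{\beta\gamma} = g_{\alpha\gamma} (A.3).
\begin{align*}
\beta = \gamma = \alpha:&\quad g_{\alpha\alpha}\, g_{\alpha\alpha} = g_{\alpha\alpha}
  \ \Longrightarrow\ g_{\alpha\alpha} = 1 \quad \text{on } U_\alpha,\\
\gamma = \alpha:&\quad g_{\alpha\beta}\, g_{\beta\alpha} = g_{\alpha\alpha} = 1
  \ \Longrightarrow\ g_{\beta\alpha} = g_{\alpha\beta}^{-1} \quad \text{on } U_\alpha \cap U_\beta.
\end{align*}
```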
A.5.5. Let E and F be vector bundles over X . A map ϕ : E → F is called a
homomorphism of vector bundles if it is a morphism of varieties over K such
that for all x ∈ X , there is a linear map ϕx : Ex → Fx of K -vector spaces with
ϕ(x) = (x, ϕx (x)). Note that this implies πF ◦ ϕ = πE .
An isomorphism of vector bundles is an invertible homomorphism of vector bun-
dles. If a homomorphism ϕ : E → F of vector bundles is a closed embedding,
then E is a subbundle of F and we may identify E with ϕ(E).

Example A.5.6. Let $E$ be a vector bundle over $X$ and let $\mu \in K$. Then we have a homomorphism $[\mu] : E \to E$ given by multiplication with $\mu$ on each fibre $E_x$. To check that $[\mu]$ is a $K$-morphism, we use that $[\mu]$ is given in a trivialization $U_\alpha \times \mathbb{A}^r_K$ by $(x, \lambda) \mapsto (x, \mu\lambda)$.
A.5.7. In this remark, we show how to give a vector bundle by its transition matrices. Let $(U_\alpha)_{\alpha \in I}$ be an open covering of $X$. For all $\alpha, \beta \in I$, we consider $K$-morphisms $g_{\alpha\beta} : U_\alpha \cap U_\beta \to GL(r)_K$ satisfying the cocycle rule (A.3). Then we glue the trivial bundles $U_\beta \times \mathbb{A}^r_K$ and $U_\alpha \times \mathbb{A}^r_K$ along the isomorphisms
\[
\varphi_{\alpha\beta} : (U_\alpha \cap U_\beta) \times \mathbb{A}^r_K \longrightarrow (U_\alpha \cap U_\beta) \times \mathbb{A}^r_K
\]
given by $\varphi_{\alpha\beta}(x, \lambda) = (x, g_{\alpha\beta}(x)\lambda)$. For more on glueing, see [148], Exercise II.2.12. We obtain a vector bundle $E$ over $X$ with trivializations $\varphi_\alpha : \pi_E^{-1}(U_\alpha) \to U_\alpha \times \mathbb{A}^r_K$ such that the transition matrices are equal to $g_{\alpha\beta}$.
Conversely, if we start with a vector bundle and apply this process to its transition matrices, we get a new vector bundle isomorphic to the original one.
Example A.5.8. Let $E$ and $E'$ be vector bundles over $X$. As an abstract set, the direct sum $E \oplus E'$ is given as the disjoint union of $(E_x \oplus E'_x)_{x \in X}$. We get $\pi_{E \oplus E'}$ by mapping $E_x \oplus E'_x$ onto $x$. To define a vector bundle structure on it, we choose trivializations $(U_\alpha, \varphi_\alpha)_{\alpha \in I}$ and $(U_\alpha, \varphi'_\alpha)_{\alpha \in I}$ of $E$ and $E'$, respectively. Note that we may always assume that the open coverings are the same by passing to a common refinement. We claim that there is a unique vector bundle structure on $E \oplus E'$ such that
\[
\varphi_\alpha \oplus \varphi'_\alpha : \pi_{E \oplus E'}^{-1}(U_\alpha) \longrightarrow U_\alpha \times \mathbb{A}^{r_\alpha + r'_\alpha}_K = U_\alpha \times \bigl( \overline{K}^{r_\alpha} \oplus \overline{K}^{r'_\alpha} \bigr),
\]
defined a priori on fibres, is an isomorphism of vector bundles. Implicitly, this is used to define $E \oplus E'$ as a $K$-variety. To see the claim, we note that the transition matrices of $E \oplus E'$ are given by
\[
g''_{\alpha\beta}(x) := \begin{pmatrix} g_{\alpha\beta}(x) & 0 \\ 0 & g'_{\alpha\beta}(x) \end{pmatrix} \in GL(r_\alpha + r'_\alpha, K(x)),
\]
where $g_{\alpha\beta}$, $g'_{\alpha\beta}$ are the transition matrices of $E$ and $E'$, respectively. Clearly, $g''_{\alpha\beta}$ gives a morphism $U_\alpha \cap U_\beta \to GL(r_\alpha + r'_\alpha)_K$ proving well-definedness of the vector bundle. Obviously, we have
\[
(E \oplus E')_x(K(x)) = E_x(K(x)) \oplus E'_x(K(x)).
\]
Note that we have a homomorphism $E \oplus E \to E$ of vector bundles given by addition on each fibre.
A.5.9. Let $E$ be a vector bundle over $X$. A section of $E$ over an open subset $U$ of $X$ is a $K$-morphism $s : U \to E$ such that $\pi_E \circ s$ is the identity map on $U$. If $U = X$, then $s$ is called a global section of $E$. Using Examples A.5.6 and A.5.8, it is clear that the set $\Gamma(U, E)$ of sections of $E$ over $U$ is a $K$-vector space. Then we get a sheaf $\mathcal{E}$ of $K$-vector spaces on $X$ by setting $\mathcal{E}(U) := \Gamma(U, E)$ for each open subset $U$ of $X$ and using restriction of morphisms. Then $\mathcal{E}$ is called the sheaf of sections of $E$.
Example A.5.10. Let OX := X × A1K be the trivial bundle of rank 1 over X ,
then we identify the sheaf of sections with the sheaf of regular functions OX on
X in the following way. Let f be a regular function, then x → (x, f (x)) gives a
section of OX and every section has this form.
Example A.5.11. Not every sheaf of K -vector spaces is the sheaf of sections of a vector bundle. For example, if JY is the ideal sheaf of a closed subvariety Y of codimension at least 2 in X , then it is not of this form. Otherwise, JY would be associated to a vector bundle of rank 1, since JY = OX outside of the subvariety. But then JY would be locally a principal ideal in OX , which contradicts the assumption that Y has codimension at least 2.
For the precise relation between sheaves and vector bundles, see A.10.13.
A.5.12. Let $E$ and $E'$ be vector bundles over $X$ of rank $r$ and $r'$. Then the tensor product $E \otimes E'$ of $E$ and $E'$ is constructed similarly as the direct sum in Example A.5.8. As an abstract set, it is the disjoint union of $E_x \otimes E'_x$, $x \in X$. Let $(U_\alpha, \varphi_\alpha)_{\alpha \in I}$, $(U_\alpha, \varphi'_\alpha)_{\alpha \in I}$ be trivializations of $E$ and $E'$, respectively. Let $x$ (resp. $x'$) be the coordinates of $\mathbb{A}^r_K$ (resp. $\mathbb{A}^{r'}_K$). We order $(x_i \otimes x'_j)_{1 \le i \le r,\, 1 \le j \le r'}$ lexicographically to get an identification of $\overline{K}^r \otimes_{\overline{K}} \overline{K}^{r'}$ with $\mathbb{A}^{rr'}_K$. There is a unique structure on $E \otimes E'$ as a vector bundle over $X$ such that all
\[
\varphi_\alpha \otimes \varphi'_\alpha : \pi_{E \otimes E'}^{-1}(U_\alpha) \longrightarrow U_\alpha \times \mathbb{A}^{rr'}_K,
\]
defined a priori on fibres, are isomorphisms of vector bundles. This is clear since the transition matrices of $E \otimes E'$ are given by $g_{\alpha\beta} \otimes g'_{\alpha\beta}$. Clearly, we have
\[
(E \otimes E')_x(K(x)) = E_x(K(x)) \otimes_{K(x)} E'_x(K(x))
\]
and
\[
\Gamma(U_\alpha, E \otimes E') = \Gamma(U_\alpha, E) \otimes \Gamma(U_\alpha, E').
\]
However, this need not be true for all open subsets of $X$. Here, equal means up to canonical isomorphism.
A.5.13. Similarly, we construct the dual vector bundle $E^*$ of $E$. It is the disjoint union of the dual vector spaces $E_x^*$ of $E_x$, $x \in X$. The transition matrices of $E^*$ are given by the transposes $h_{\alpha\beta} := g_{\beta\alpha}^t$.
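
As a quick consistency check (a sketch using only the cocycle rule (A.3) for $E$ and the identity $(AB)^t = B^t A^t$), the transposed matrices $h_{\alpha\beta} = g_{\beta\alpha}^t$ again satisfy the cocycle rule:

```latex
% Cocycle rule for the dual bundle, deduced from g_{\gamma\beta} g_{\beta\alpha} = g_{\gamma\alpha}.
\begin{align*}
h_{\alpha\beta}\, h_{\beta\gamma}
  &= g_{\beta\alpha}^{t}\, g_{\gamma\beta}^{t}
   = \bigl( g_{\gamma\beta}\, g_{\beta\alpha} \bigr)^{t}
   = g_{\gamma\alpha}^{t}
   = h_{\alpha\gamma}.
\end{align*}
```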
We can also extend other constructions from linear algebra to vector bundles. It is
always the same pattern. First, we define the underlying set using the construction
fibrewise. Then we choose the evident trivializations. We use it to define the vector
bundle structure on the abstract set. We have to show that it fits on overlappings,
which becomes clear by considering transition matrices. Note that pointwise, they
are the same as the transformation matrices in linear algebra.
For example, we can define exterior products $E \wedge E'$ of vector bundles, $\mathrm{Hom}(E, E')$ or the quotient vector bundles $E/F$ for a subbundle $F$ of $E$. The details are left to the reader.
A.5.14. Let $\varphi : X' \to X$ be a morphism of varieties over $K$ and let $E$ be a vector bundle over $X$. Suppose that $E$ is given by the transition matrices $g_{\alpha\beta}$ with respect to the open covering $(U_\alpha)_{\alpha \in I}$. Then the pull-back $\varphi^*(E)$ of $E$ is the vector bundle over $X'$ given by the transition matrices $g_{\alpha\beta} \circ \varphi$ with respect to the open covering $(\varphi^{-1} U_\alpha)_{\alpha \in I}$.
If $\varphi$ is a closed embedding or an inclusion of an open subvariety, then $\varphi^* E$ is simply denoted by $E|_{X'}$ and called the restriction of $E$ to $X'$.
A.5.15. Let F be an extension field of K and let E be a vector bundle on X .
Then the base change EF is the vector bundle on XF given by the same transition
functions as E . If F ⊂ K , then the fibre of EF over x ∈ X is Ex ⊗K F .
A.5.16. A line bundle L on X is a vector bundle of rank 1 over X . Note that
the tensor product or the pull-back of line bundles are again line bundles. We use
the following notation: The n -fold tensor product of L is denoted by L⊗n and
L−1 = L∗ for the dual. For negative n , we define L⊗n := (L−1 )⊗|n| .
The set of isomorphism classes of line bundles on $X$ forms a group under $\otimes$. It is called the Picard group $\mathrm{Pic}(X)$. An element $\mathbf{c}$ of $\mathrm{Pic}(X)$ is always written in boldface. The Picard group is abelian and so it is written additively. The isomorphism class of a line bundle $L$ is denoted by $\mathrm{cl}(L)$. We have $0 = \mathrm{cl}(O_X)$ and $-\mathrm{cl}(L) = \mathrm{cl}(L^{-1})$. To check this, note that the transition functions of $L \otimes L^{-1}$ are given by $g_{\alpha\beta}\, g_{\beta\alpha}^t = 1$.
The following evident functoriality rule is important. Given morphisms $X'' \xrightarrow{\psi} X' \xrightarrow{\varphi} X$ of varieties over $K$, we have
\[
\psi^* \varphi^*(\mathbf{c}) = (\varphi \circ \psi)^*(\mathbf{c}) \in \mathrm{Pic}(X'')
\]
for every $\mathbf{c} \in \mathrm{Pic}(X)$.
A.5.17. The zero section may be the only global section of a line bundle L on
X . To bypass this difficulty, we introduce meromorphic sections. We consider
pairs (U, sU ), where U is an open dense subset of X and sU ∈ Γ(U, L). Two
pairs (U, sU ) and (V, sV ) are called equivalent if sU = sV on U ∩ V . This
gives an equivalence relation. The corresponding equivalence classes are called
meromorphic sections of L. By abuse of notation, we simply denote them by s .
Note that there is a maximal (W, sW ) and we identify the equivalence class with s
to have a concrete section at hand. We call s an invertible meromorphic section
of L if there is an open dense subset U of X such that s is a regular function on
U without zeros.
Example A.5.18. Similarly as in Example A.5.10, we may identify a rational function $f$ with a meromorphic section $x \mapsto (x, f(x))$ of $O_X$ and conversely.
A.5.19. Let E be a vector bundle on X given by the transition matrices gαβ with
respect to the open covering (Uα )α∈I . For an open subset U of X , the restriction
of s ∈ Γ(U, E) may be identified with sα ∈ OX (Uα ) satisfying
sα = gαβ sβ
on Uα ∩ Uβ ∩ U . Conversely, any set sα ∈ OX (Uα ), α ∈ I, with this transition
rule gives rise to a section s ∈ Γ(U, E) using the sheaf property.
If ϕ : X  → X is a morphism of K -varieties and s ∈ Γ(U, E), then sα ◦ ϕ =
(gαβ ◦ ϕ)(sβ ◦ ϕ) and hence we get a section ϕ∗ (s) of ϕ∗ E over ϕ−1 U .
A.5.20. A vector bundle $E$ over $X$ is said to be generated by global sections $(s_\lambda)_{\lambda \in I}$ if for all $x \in X$, the vectors $\{s_\lambda(x) \mid \lambda \in I\}$ generate the $\overline{K}$-vector space $E_x$. In particular, if $E$ is a line bundle, this means that $s_\lambda(x) \neq 0$ for at least one $\lambda \in I$.
We say that E is generated by global sections if it is generated by the family of
all global sections.
If $E$, $E'$ are vector bundles over $X$ generated by global sections $(s_\lambda)_{\lambda \in I}$ and $(s'_\mu)_{\mu \in J}$, then $E \otimes E'$ is generated by global sections $(s_\lambda \otimes s'_\mu)_{\lambda \in I, \mu \in J}$. Moreover, if $\varphi : X' \to X$ is a morphism of varieties over $K$, then $\varphi^*(E)$ is generated by the global sections $(\varphi^* s_\lambda)_{\lambda \in I}$.
A base-point of a line bundle L is a point where every global section of L van-
ishes. A line bundle generated by global sections is called base-point-free.

A.6. Projective varieties

Let $K$ be a field and $\overline{K}$ an algebraic closure of $K$.

A.6.1. On $\overline{K}^{n+1}$, two non-zero vectors are called equivalent if they lie in the same one-dimensional linear subspace. The set of equivalence classes is denoted by $\mathbb{P}^n(\overline{K})$. It may be viewed as the set of one-dimensional linear subspaces in $\overline{K}^{n+1}$. We denote points in $\mathbb{P}^n(\overline{K})$ by $(\alpha_0 : \cdots : \alpha_n)$, where $\alpha \in \overline{K}^{n+1} \setminus \{0\}$ is a representative. We should always be aware that the coordinate vector $\alpha$ is only determined up to multiples. For $T \subset K[x_0, \dots, x_n]$ consisting of homogeneous polynomials, let
\[
Z(T) := \{\alpha \in \mathbb{P}^n(\overline{K}) \mid f(\alpha) = 0 \ \forall f \in T\}
\]
be the zero set of $T$. All subsets of this form are called closed in $\mathbb{P}^n(\overline{K})$. The Zariski topology on $\mathbb{P}^n(\overline{K})$ is the topology with exactly these closed subsets. It depends on $K$, so we denote the corresponding topological space by $\mathbb{P}^n_K$. It is called the projective $n$-space over $K$. In fact, it is canonically a variety, as we will show below. On $\mathbb{P}^n_K$, we always fix a set of coordinates $x = (x_0 : \cdots : x_n)$ and we call them homogeneous coordinates.
A.6.2. The standard affine open subsets of $\mathbb{P}^n_K$ are
\[
U_i := \{\alpha \in \mathbb{P}^n_K \mid \alpha_i \neq 0\} \qquad (i = 0, \dots, n).
\]
Then we have homeomorphisms
\[
\varphi_i : U_i \longrightarrow \mathbb{A}^n_K, \qquad \alpha \mapsto \Bigl( \frac{\alpha_0}{\alpha_i}, \dots, \frac{\alpha_{i-1}}{\alpha_i}, \frac{\alpha_{i+1}}{\alpha_i}, \dots, \frac{\alpha_n}{\alpha_i} \Bigr).
\]
So $\mathbb{P}^n_K$ is a $K$-variety with affine charts $(U_i, \varphi_i)_{i = 0, \dots, n}$. For the proof that the diagonal is closed, we refer to A.6.4. Clearly, the $K$-variety $\mathbb{P}^n_K$ does not depend on the choice of coordinates, i.e. if we change coordinates by $g \in GL(n + 1, K)$, then we get the same $K$-variety $\mathbb{P}^n_K$.
A.6.3. Let $Y$ be a closed subset of $\mathbb{P}^n_K$. By A.4.8, it is a closed subvariety of $\mathbb{P}^n_K$. We call it a projective variety over $K$. The homogeneous ideal $I(Y)$ of $Y$ is the ideal in $K[x_0, \dots, x_n]$ generated by the homogeneous polynomials vanishing on $Y$. The homogeneous coordinate ring $S(Y)$ of $Y$ is defined by
\[
S(Y) := \bigoplus_{d \in \mathbb{N}} S(Y)_d := K[x]/I(Y),
\]
where the grading is induced by the degree of polynomials. If $Y$ is irreducible, then the field of rational functions $K(Y)$ is the subfield of the quotient field of $S(Y)$ given by
\[
\{f/g \mid \exists\, d \in \mathbb{N} \text{ with } f, g \in S(Y)_d\}.
\]

To sketch the proof, we view $Y$ as a closed subset of $\mathbb{P}^n_K$. Then there is a standard affine open subset $U_i$ with $Y \cap U_i \neq \emptyset$. By A.4.11, the quotient field of $K[U_i \cap Y]$ is equal to $K(Y)$. Since $U_i$ is isomorphic to affine $n$-space, any rational function may be written as the quotient of two polynomials in $x_0, \dots, x_{i-1}, x_{i+1}, \dots, x_n$. By passing to the homogenizations, we get the claim.

A.6.4. The product of projective varieties over $K$ is again a projective variety over $K$. To prove it, let $Y$ (resp. $Y'$) be a closed subvariety of $\mathbb{P}^n_K$ (resp. $\mathbb{P}^m_K$). We consider the Segre embedding
\[
\iota : \mathbb{P}^n_K \times \mathbb{P}^m_K \longrightarrow \mathbb{P}^N_K, \qquad (x, x') \mapsto (x_i x'_j)_{0 \le i \le n,\ 0 \le j \le m},
\]
where $N = (n + 1)(m + 1) - 1$. As $Y \times Y'$ is a closed subvariety of $\mathbb{P}^n_K \times \mathbb{P}^m_K$, it is enough to show that the Segre embedding is a closed embedding. Let $(y_{ij})$ be the coordinates on $\mathbb{P}^N_K$ (say ordered lexicographically) such that $\iota$ maps $(x, x')$ to $(y_{ij}) = (x_i x'_j)$. We have to prove that $\iota$ maps $\mathbb{P}^n_K \times \mathbb{P}^m_K$ isomorphically onto the closed subvariety
\[
Z := Z(\{y_{ij} y_{kl} - y_{kj} y_{il} \mid i, k \in \{0, \dots, n\},\ j, l \in \{0, \dots, m\}\}).
\]
Clearly, the image of $\iota$ is contained in $Z$. Let $V_{kl} := \{y_{kl} \neq 0\}$ be a standard affine open subset of $\mathbb{P}^N_K$. Then $\iota^{-1}(V_{kl})$ is the product of the standard affine open subsets $U_k := \{x_k \neq 0\}$ and $U'_l := \{x'_l \neq 0\}$. Then we define a $K$-morphism
\[
V_{kl} \cap Z \longrightarrow U_k \times U'_l, \qquad (y_{ij}) \mapsto \bigl( (y_{0l} : \cdots : y_{nl}), (y_{k0} : \cdots : y_{km}) \bigr).
\]
It is easily checked that it is inverse to $\iota|_{U_k \times U'_l}$. Therefore $\iota$ is an isomorphism onto $Z$.
If $n = m$, then the diagonal in $\mathbb{P}^n_K \times \mathbb{P}^n_K$ is the intersection of $\mathbb{P}^n_K \times \mathbb{P}^n_K$ with $Z(\{y_{ij} - y_{ji} \mid i, j = 0, \dots, n\})$ in $\mathbb{P}^N_K$, hence the diagonal is closed. We conclude that every projective variety is indeed a variety.
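
For orientation, the smallest case $n = m = 1$ of the Segre embedding can be written out explicitly. The coordinate names follow the conventions of A.6.4; the example itself is an added illustration.

```latex
% Segre embedding for n = m = 1, so N = (1+1)(1+1) - 1 = 3.
\[
\iota : \mathbb{P}^1_K \times \mathbb{P}^1_K \longrightarrow \mathbb{P}^3_K, \qquad
\bigl( (x_0 : x_1), (x'_0 : x'_1) \bigr) \longmapsto
(y_{00} : y_{01} : y_{10} : y_{11}) = (x_0 x'_0 : x_0 x'_1 : x_1 x'_0 : x_1 x'_1).
\]
% Up to sign, the only non-trivial relation y_{ij} y_{kl} - y_{kj} y_{il} is the one with i = j = 0, k = l = 1:
\[
y_{00}\, y_{11} - y_{10}\, y_{01}
= (x_0 x'_0)(x_1 x'_1) - (x_1 x'_0)(x_0 x'_1) = 0 .
\]
```

So in this case the image $Z$ is the quadric surface $y_{00} y_{11} = y_{10} y_{01}$ in $\mathbb{P}^3_K$.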
A.6.5. Let $L$ be the subbundle of the trivial vector bundle $E := \mathbb{P}^n_K \times \mathbb{A}^{n+1}_K$ given by
\[
L := \{(\lambda, \mu) \mid \mu \in \overline{K}\lambda\}.
\]
This means that the fibre over $\lambda$ is just the line through $\lambda$ in $\overline{K}^{n+1}$. We denote the standard coordinates of $\mathbb{P}^n_K$ and $\mathbb{A}^{n+1}_K$ by $x_0, \dots, x_n$ and $y_0, \dots, y_n$, respectively. Then $L$ is given in the trivial bundle by the equations
\[
x_i y_j - x_j y_i \qquad (i, j = 0, \dots, n).
\]
Hence $L$ is a closed subset of $E$. We use that to define $L$ as a closed subvariety over $K$. Moreover, $\pi_L := \pi_E|_L$ is a $K$-morphism. To define trivializations, we use the standard affine open subsets $U_\alpha := \{x_\alpha \neq 0\}$ of $\mathbb{P}^n_K$. Let
\[
\varphi_\alpha : \pi_L^{-1}(U_\alpha) \longrightarrow U_\alpha \times \mathbb{A}^1_K, \qquad (\lambda, \mu) \mapsto (\lambda, \mu_\alpha).
\]
Then it is easy to see that $\varphi_\alpha$ is an isomorphism of $K$-varieties which is linear in the fibres. Therefore $L$ is a line bundle on $\mathbb{P}^n_K$ called the tautological line bundle. It is denoted by $O_{\mathbb{P}^n_K}(-1)$. The above trivializations lead to the transition functions
\[
g_{\alpha\beta}(x) = \frac{x_\alpha}{x_\beta}
\]
on $U_\alpha \cap U_\beta$. For $m \in \mathbb{Z}$, let
\[
O_{\mathbb{P}^n_K}(m) := O_{\mathbb{P}^n_K}(-1)^{\otimes(-m)}.
\]
Then its transition functions are $g_{\alpha\beta}(x) = \bigl( \frac{x_\beta}{x_\alpha} \bigr)^m$ on $U_\alpha \cap U_\beta$.
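
The transition functions of the tautological bundle can be checked directly from the trivializations $\varphi_\alpha$. The following is a short verification sketch, where $t$ denotes the fibre coordinate on $\mathbb{A}^1_K$ (a name introduced here for the computation).

```latex
% For \lambda in U_\alpha \cap U_\beta and a scalar t, the point \varphi_\beta^{-1}(\lambda, t)
% is (\lambda, \mu) with \mu a multiple of \lambda and \mu_\beta = t.
\[
\mu = \frac{t}{\lambda_\beta}\, \lambda
\quad \Longrightarrow \quad
\varphi_\alpha \circ \varphi_\beta^{-1}(\lambda, t) = (\lambda, \mu_\alpha)
= \Bigl( \lambda, \frac{\lambda_\alpha}{\lambda_\beta}\, t \Bigr),
\]
\[
\text{so } g_{\alpha\beta}(x) = \frac{x_\alpha}{x_\beta} \text{ for } O_{\mathbb{P}^n_K}(-1),
\qquad
g_{\alpha\beta}(x) = \Bigl( \frac{x_\alpha}{x_\beta} \Bigr)^{-m} = \Bigl( \frac{x_\beta}{x_\alpha} \Bigr)^{m} \text{ for } O_{\mathbb{P}^n_K}(m).
\]
```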

A.6.6. We claim that the global sections of $O_{\mathbb{P}^n_K}(m)$ may be identified with the homogeneous polynomials in $K[x_0, \dots, x_n]$ of degree $m$.

For $s \in \Gamma(\mathbb{P}^n_K, O_{\mathbb{P}^n_K}(m))$ and with respect to the above trivializations, there are regular functions $s_\alpha$ on $U_\alpha$ such that
\[
(\varphi_\alpha \circ s)(x) = (x, s_\alpha(x)) \tag{A.4}
\]
for all $x \in U_\alpha$. They satisfy
\[
s_\alpha = g_{\alpha\beta}\, s_\beta \tag{A.5}
\]
on $U_\alpha \cap U_\beta$, where $g_{\alpha\beta}(x) = (x_\beta / x_\alpha)^m$ are the transition functions of $O_{\mathbb{P}^n_K}(m)$. Conversely, any regular functions $s_\alpha \in \mathcal{O}_{\mathbb{P}^n_K}(U_\alpha)$ for $\alpha = 0, \dots, n$ satisfying (A.5) determine a unique global section $s$ with (A.4). Since $U_\alpha$ is a standard affine open subset of $\mathbb{P}^n_K$, we may identify the $s_\alpha$ with polynomials. The rule (A.5) means that there is a homogeneous polynomial of degree $m$ such that $s_\alpha$ is obtained by inserting $1$ for $x_\alpha$. Hence the global sections of $O_{\mathbb{P}^n_K}(m)$ may be identified with the homogeneous polynomials of degree $m$ in the variables $x_0, \dots, x_n$ with coefficients in $K$.
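
As an added illustration of this identification, take $n = 1$, $m = 2$ and a homogeneous polynomial $P = a x_0^2 + b x_0 x_1 + c x_1^2$ with $a, b, c \in K$. The names $P$, $t$ and $u$ are chosen here for the sketch.

```latex
% Local expressions of the section given by P on the two standard charts of \mathbb{P}^1_K.
\[
s_0 = P(1, t) = a + b t + c t^2, \quad t = \frac{x_1}{x_0} \text{ on } U_0; \qquad
s_1 = P(u, 1) = a u^2 + b u + c, \quad u = \frac{x_0}{x_1} \text{ on } U_1.
\]
% On U_0 \cap U_1 we have u = 1/t and g_{01} = (x_1/x_0)^2 = t^2, so rule (A.5) reads:
\[
g_{01}\, s_1 = t^2 \Bigl( \frac{a}{t^2} + \frac{b}{t} + c \Bigr) = a + b t + c t^2 = s_0 .
\]
```

In particular, $x_0^2$, $x_0 x_1$, $x_1^2$ form a basis of the global sections of $O_{\mathbb{P}^1_K}(2)$.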

A.6.7. Let E be a vector bundle over a projective K -variety X . Then Γ(X, E)


is a finite-dimensional K -vector space ([148], Th.II.5.19). In particular, if K is
algebraically closed and X is irreducible, then OX (X) = Γ(X, OX ) has to be a
finite-dimensional field extension of K and hence K = OX (X).
A.6.8. Let $L$ be a line bundle on a $K$-variety $X$ and assume that it is generated by a set of global sections $\{s_0, \dots, s_n\}$. Then we have a $K$-morphism
\[
\varphi : X \longrightarrow \mathbb{P}^n_K, \qquad x \mapsto (s_0(x) : \cdots : s_n(x)).
\]
To be more precise, we choose a trivialization of $L$ around $x$ such that we can identify $s_0, \dots, s_n$ with regular functions as in Example A.5.10. Then the above definition makes sense because not all $s_j(x)$ vanish. Since we have homogeneous coordinates on $\mathbb{P}^n_K$, this definition does not depend on the choice of the trivialization. We claim that $L = \varphi^* O_{\mathbb{P}^n_K}(1)$ and $s_j = \varphi^*(x_j)$ hold for $j = 0, \dots, n$.

In order to prove this statement, let $V_j := \{x \in X \mid s_j(x) \neq 0\}$. Since $L$ is generated by these global sections, $(V_j)_{j = 0, \dots, n}$ is an open covering of $X$. Clearly, we have $V_j = \varphi^{-1}(\{x_j \neq 0\})$. Moreover, $s_j$ defines a trivialization
\[
\varphi_j : \pi_L^{-1}(V_j) \longrightarrow V_j \times \mathbb{A}^1_K, \qquad \lambda s_j(x) \mapsto (x, \lambda).
\]
The transition functions of $L$ with respect to this trivialization are
\[
g_{ij}(x) = \frac{s_j(x)}{s_i(x)} = \frac{x_j \circ \varphi}{x_i \circ \varphi}(x).
\]
This proves $L = \varphi^* O_{\mathbb{P}^n_K}(1)$. To check $s_j = \varphi^*(x_j)$, note that $x_j$ (resp. $s_j$) is given in the trivialization over $\{x_i \neq 0\}$ (resp. $V_i$) by the regular function $\frac{x_j}{x_i}$ (resp. $\frac{s_j}{s_i}$). Because of $\frac{x_j}{x_i} \circ \varphi = \frac{s_j}{s_i}$, we get the claim.
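
A standard instance of the construction in A.6.8, added here as an illustration: on $X = \mathbb{P}^1_K$, the line bundle $O_{\mathbb{P}^1_K}(2)$ is generated by the three monomials of degree 2, since $x_0^2$ and $x_1^2$ have no common zero. The coordinates $y_0, y_1, y_2$ on the target are names chosen for this sketch.

```latex
\[
L = O_{\mathbb{P}^1_K}(2), \qquad (s_0, s_1, s_2) = (x_0^2,\ x_0 x_1,\ x_1^2),
\]
\[
\varphi : \mathbb{P}^1_K \longrightarrow \mathbb{P}^2_K, \qquad
(x_0 : x_1) \longmapsto (x_0^2 : x_0 x_1 : x_1^2),
\]
% With coordinates (y_0 : y_1 : y_2) on \mathbb{P}^2_K, every point of the image satisfies
\[
y_0\, y_2 - y_1^2 = x_0^2 x_1^2 - (x_0 x_1)^2 = 0 .
\]
```

So the image of $\varphi$ lies on the conic $y_0 y_2 = y_1^2$.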

A.6.9. Conversely, let ϕ : X → PnK be a K -morphism for a K -variety X . Then


L := ϕ∗ OPnK (1) is generated by the global sections s0 := ϕ∗ (x0 ), . . . , sn :=
ϕ∗ (xn ). If we construct now the morphism determined by these global sections as
in A.6.8, then we get back our original morphism ϕ .
A.6.10. Let $X$ be a projective variety over $K$. Then a line bundle $L$ on $X$ is called very ample if there is a closed embedding $i : X \to \mathbb{P}^n_K$ such that $L \cong i^* O_{\mathbb{P}^n_K}(1)$. Obviously, very ample line bundles are generated by their global sections, which may be viewed as hyperplane sections. Note however that not every global section of $L$ has to be a hyperplane section with respect to the embedding $i$. But we can construct an $i$ such that this is true (see Remark A.6.11 below).
A line bundle $L$ is called ample if $L^{\otimes n}$ is very ample for some $n \in \mathbb{N}$.
For line bundles $L$ and $M$, the following holds ([148], Exercise II.7.5):

(a) If M is ample, then L ⊗ M ⊗n is very ample for n sufficiently large.


(b) If L is generated by global sections and M is ample (resp. very ample),
then L ⊗ M is ample (resp. very ample).
(c) If L and M are both ample, then L ⊗ M is ample.
Remark A.6.11. Let $L$ be a very ample line bundle on $X$ and let $s_0, \dots, s_n$ be a basis of $\Gamma(X, L)$. Then we claim that the morphism
\[
\varphi : X \longrightarrow \mathbb{P}^n_K, \qquad x \mapsto (s_0(x) : \cdots : s_n(x))
\]
from A.6.8 is a closed embedding.

To prove this, we choose a closed embedding $i : X \to \mathbb{P}^m_K$ such that $i^* O_{\mathbb{P}^m_K}(1) = L$. We denote the coordinates on $\mathbb{P}^m_K$ by $y_0, \dots, y_m$ and those of $\mathbb{P}^n_K$ by $x_0, \dots, x_n$. The morphism $i$ is determined by the global sections $t_0 = i^*(y_0), \dots, t_m = i^*(y_m)$ as in A.6.9. We adjoin some $t_{m+1}, \dots, t_M \in \Gamma(X, L)$ to get generators of the vector space and a morphism
\[
j : X \longrightarrow \mathbb{P}^M_K, \qquad x \mapsto (t_0(x) : \cdots : t_M(x)).
\]
First note that $j(X)$ is closed in $\mathbb{P}^M_K$ (see A.6.15 below). Clearly, we have a morphism $\psi : j(X) \to i(X)$ with $i = \psi \circ j$. By definition of a closed embedding, $i$ maps $X$ isomorphically onto $i(X)$ and so we check easily that $\psi$ is an isomorphism. This proves that $j$ is a closed embedding. In particular, the morphism
\[
\varphi' : X \longrightarrow \mathbb{P}^{m+n+1}_K, \qquad x \mapsto (t_0(x) : \cdots : t_m(x) : s_0(x) : \cdots : s_n(x))
\]
is a closed embedding. Since $t_0, \dots, t_m$ are linear combinations of $s_0, \dots, s_n$, we have a morphism $\psi' : \mathbb{P}^n_K \to \mathbb{P}^{n+m+1}_K$ with $\varphi' = \psi' \circ \varphi$. Since $\varphi'$ is a closed embedding, we conclude as above that $\varphi$ is a closed embedding.

Remark A.6.12. A line bundle $L$ on $X$ is ample (resp. very ample) if and only if the base change $L_{\overline{K}}$ on $X_{\overline{K}}$ is ample (resp. very ample). To see the ample case, use the cohomological criterion of ampleness from [148], Prop.III.5.3, and the compatibility of cohomology and base change (see A.10.28). The very ample case follows from [137], Prop.2.7.1, implying of course also the ample case. In fact, it is proved that a morphism is a closed embedding if and only if its base change is a closed embedding.
Example A.6.13. A multiprojective space is a product
\[
\mathbb{P}_K := \mathbb{P}^{n_1}_K \times \cdots \times \mathbb{P}^{n_r}_K
\]
of projective spaces. The projection to the $i$th factor $\mathbb{P}^{n_i}_K$ is denoted by $p_i$. Now for $d_1, \dots, d_r \in \mathbb{Z}$, let
\[
O_{\mathbb{P}}(d_1, \dots, d_r) := p_1^* O_{\mathbb{P}^{n_1}}(d_1) \otimes \cdots \otimes p_r^* O_{\mathbb{P}^{n_r}}(d_r).
\]
Let $x_i = (x_{i0} : \cdots : x_{in_i})$ be the coordinates on $\mathbb{P}^{n_i}_K$. By generalizing A.6.6, we see that the global sections of $O_{\mathbb{P}}(d_1, \dots, d_r)$ may be identified with the multihomogeneous polynomials in $x_1, \dots, x_r$, homogeneous of degree $d_i$ in $x_i$.
The Segre embedding may be extended to include several factors, thereby proving that $O_{\mathbb{P}}(1, \dots, 1)$ is very ample. On the other hand, $p_i^* O_{\mathbb{P}^{n_i}}(1)$ is generated by global sections (but certainly not ample for two or more factors). By A.6.10, we conclude that $O_{\mathbb{P}}(d_1, \dots, d_r)$ is very ample if $d_1, \dots, d_r \ge 1$.
A.6.14. A variety X over K is called complete if for all varieties Y over K , the
second projection p2 : X × Y → Y is closed (i.e. maps closed sets to closed sets).
In algebraic geometry, this is the analogue of compact complex manifolds.
There is a relative version of this notion called proper morphisms. Every closed embedding and every morphism from a complete variety will be proper. This notion is important in algebraic geometry; many finiteness results are related to it. For our book, it plays only a minor role and is used to state the results properly. The reader may always think of a morphism of complete varieties.
For $j = 1, 2$, let $\varphi_j : X_j \to S$ be a morphism of varieties over $K$. Then
\[
X_1 \times_S X_2 := \{(x_1, x_2) \in X_1 \times X_2 \mid \varphi_1(x_1) = \varphi_2(x_2)\}
\]
is called the fibre product of $X_1$ and $X_2$ over $S$. It is easily seen that the fibre product is a closed subset of $X_1 \times X_2$, so we may view $X_1 \times_S X_2$ as a closed subvariety of $X_1 \times X_2$. Let $\varphi : X \to X'$ be a morphism of varieties over $K$. For every morphism $\psi : Y' \to X'$, we define the base change $\varphi_{Y'} : X \times_{X'} Y' \to Y'$ by $\varphi_{Y'}(x, y') = y'$. The morphism $\varphi$ is called proper if all base changes of $\varphi$ are closed.
Clearly, a variety X is complete if and only if the constant map X → A0K is
a proper morphism. For details about proper morphisms, we refer to [148], II.4.
(However, the reader has to translate the results from the category of schemes. Note that the
fibre product of varieties is the variety associated to the fibre product of schemes. Since any
morphism of varieties is of finite type and separated, our definition of a proper map agrees
with the one in [148].)
A.6.15. We mention some properties of complete varieties over K . Most of them
can be deduced from the corresponding properties of proper morphisms in [148],
Cor.4.8.
(a) A closed subvariety of a complete variety over K is complete because any
closed embedding of varieties is proper and the composition of proper mor-
phisms is proper.
(b) The base change $X_{\overline{K}}$ of a complete $K$-variety $X$ is a complete variety over $\overline{K}$. More generally, the base change of a proper morphism is again proper. Thus the fibre product of proper morphisms over $S$ is proper over $S$. In particular, the product of complete varieties over $K$ is a complete variety over $K$.
(c) If ϕ is a morphism from a complete variety X over K to an arbitrary K -
variety X  , then ϕ is closed and the image of X is a complete variety over
K . To deduce it, note that if a composition ψ1 ◦ ψ2 of morphisms is proper,
then ψ2 is also proper. This proves that ϕ is closed and we get the claims.
(d) A complete affine variety consists of finitely many points. For a proof, see
A.10.18.
(e) Any projective variety over K is complete ([148], Th.II.4.9). However, the
converse does not hold (cf. [280], Ch.VI, 2.3).

A.7. Smooth varieties

Let $K$ be a field, $\overline{K}$ an algebraic closure of $K$, and let $X$ be a variety over $K$ with sheaf of regular functions $\mathcal{O}_X$.
A.7.1. Let $A$ be a commutative $K$-algebra with identity and let $M$ be an $A$-module. Then a $K$-derivation of $A$ into $M$ is a $K$-linear map $\partial : A \to M$ satisfying Leibniz's rule
\[
\partial(ab) = a\,\partial(b) + b\,\partial(a)
\]
for all $a, b \in A$. Note that $K \subset A$ and $\partial(K) = \{0\}$ by Leibniz's rule. We denote the $A$-module of $K$-derivations of $A$ into $M$ by $\mathrm{Der}_K(A, M)$.
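
The vanishing of a $K$-derivation on constants, noted in A.7.1, follows in two lines from Leibniz's rule and $K$-linearity. The derivation is spelled out here for convenience.

```latex
% A K-derivation kills the constants K \subset A.
\begin{align*}
\partial(1) &= \partial(1 \cdot 1) = 1 \cdot \partial(1) + 1 \cdot \partial(1) = 2\,\partial(1)
  \ \Longrightarrow\ \partial(1) = 0,\\
\partial(c) &= \partial(c \cdot 1) = c\,\partial(1) = 0 \qquad (c \in K,\ \text{by $K$-linearity}).
\end{align*}
```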
A.7.2. The goal is to define the tangent space. Recall from differential geometry
that this is done by using derivations. A similar concept is used here.
For x ∈ X , let K(x) = OX,x /mx be the residue field of x . Then the tangent
space TX,x of x is the K(x)-vector space
TX,x := DerK (OX,x , K(x)).
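Example A.7.3. Let $A = K[x_1, \dots, x_n]$ and let $M$ be an $A$-module. Then
\[
\mathrm{Der}_K(A, M) \xrightarrow{\ \sim\ } M^n, \qquad \partial \mapsto (\partial x_1, \dots, \partial x_n).
\]
This is easily deduced from Leibniz's rule. In particular, if $M = K[x_1, \dots, x_n]$, then we have
\[
\mathrm{Der}_K(K[x_1, \dots, x_n], K[x_1, \dots, x_n]) = \bigoplus_{i=1}^n K[x_1, \dots, x_n]\, \frac{\partial}{\partial x_i}.
\]
For $\alpha \in \mathbb{A}^n_K$, let $\frac{\partial}{\partial x_i}\big|_\alpha \in \mathrm{Der}_K(K[x], K(\alpha))$ be the partial derivative evaluated at $\alpha$. Using $M = K(\alpha)$ above, we get
\[
\mathrm{Der}_K(K[x], K(\alpha)) = \bigoplus_{i=1}^n K(\alpha)\, \frac{\partial}{\partial x_i}\Big|_\alpha.
\]
Recall that $\mathcal{O}_{\mathbb{A}^n_K, \alpha}$ is the localization of $K[x]$ in the maximal ideal $I(\alpha)$. Again by Leibniz's rule, we get
\[
\mathrm{Der}_K(\mathcal{O}_{\mathbb{A}^n_K, \alpha}, K(\alpha)) = \mathrm{Der}_K(K[x], K(\alpha))
\]
and hence
\[
T_{\mathbb{A}^n_K, \alpha} = \bigoplus_{i=1}^n K(\alpha)\, \frac{\partial}{\partial x_i}\Big|_\alpha.
\]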

A.7.4. Let $\pi : A \to A'$ be a homomorphism of commutative $K$-algebras with identity. Note that an $A'$-module $M'$ is naturally an $A$-module using $\pi$ to define multiplication by elements of $A$. Then we have an $A$-linear map
\[
\mathrm{Der}_K(A', M') \longrightarrow \mathrm{Der}_K(A, M'), \qquad \partial' \mapsto \partial' \circ \pi.
\]
If $\pi$ is surjective, then this map is one-to-one. Moreover, the image consists of the $K$-derivations of $A$ into $M'$ vanishing on $\ker(\pi)$.
Let $\varphi : X \to Y$ be a morphism. We consider $x \in X$ and we assume that $K(x) = K(\varphi(x))$. Then we have a homomorphism $\varphi^\sharp : \mathcal{O}_{Y, \varphi(x)} \to \mathcal{O}_{X, x}$ inducing the differential $(d\varphi)_x : T_{X,x} \to T_{Y,\varphi(x)}$, $\partial \mapsto \partial \circ \varphi^\sharp$.
A.7.5. For a point y of a closed subvariety Y of X , we use this trivial remark
to identify TY,y with a subspace of TX,y . Note that the restriction of regular
functions gives a surjective homomorphism OX,y → OY,y and hence an injective
K(y)-linear map TY,y → TX,y .
A.7.6. As a corollary of Example A.7.3, we see that the tangent space in x ∈ X
is a finite-dimensional K(x)-vector space. More precisely, we may assume X a
closed affine subvariety of AnK given by polynomials f1 , . . . , fr . By A.7.4 and
A.7.5, the tangent space of x in X is a K(x)-linear subspace of TAnK ,x given by
TX,x = {∂ ∈ TAnK ,x | ∂(f1 ) = · · · = ∂(fr ) = 0} .
Let F/K be a field extension. By A.7.4 again, we see that DerF (K[X] ⊗K F,
F (x)) is the subspace of TAnF ,x = Der(F [x], F (x)) characterized by ∂(f1 ) =
· · · = ∂(fr ) = 0. We conclude
TX,x ⊗K(x) F (x) = DerF (K[X] ⊗K F, F (x)) . (A.6)

The coordinate ring of the base change $X_F$ is $K[X] \otimes_K F / \sqrt{0}$, where the radical ideal $\sqrt{0}$ consists of the nilpotent elements (see A.2.11). By A.7.4, we conclude that $T_{X_F, x}$ is an $F(x)$-linear subspace of $T_{X,x} \otimes_{K(x)} F(x)$. Equality occurs if $X$ is geometrically reduced. This holds certainly in $\mathrm{char}(K) = 0$ (see A.4.6).
A.7.7. We use A.7.6 to extend the definition of the differential in A.7.4 assuming no longer $K(x) = K(y)$. Let $\varphi : X \to Y$ be a morphism mapping $x$ to $y$. Since we deal with a local problem, we may assume $X$, $Y$ affine. By base change, $\varphi$ induces a homomorphism $\varphi^\sharp \otimes 1 : K[Y] \otimes_K K(x) \to K[X] \otimes_K K(x)$ of $K(x)$-algebras and hence a $K(x)$-linear map
\[
\mathrm{Der}(K[X] \otimes_K K(x), K(x)) \to \mathrm{Der}(K[Y] \otimes_K K(x), K(x)), \qquad \partial \mapsto \partial \circ (\varphi^\sharp \otimes 1).
\]
By A.7.6, this is a map $T_{X,x} \to T_{Y,y} \otimes_{K(y)} K(x)$, which we call the differential $(d\varphi)_x$.
A.7.8. Let x ∈ X and let mx be the maximal ideal of OX,x . Note that mx /m2x
is a K(x) = OX,x /mx -vector space whose dual is denoted by (mx /m2x )∗ . Then
we have a K(x)-linear map
ρ : TX,x −→ (mx /m2x )∗ ,
where ρ(∂) is defined for ∂ ∈ TX,x by
ρ(∂)(f ) := ∂(f ) ∈ K(x)
for f ∈ mx . By Leibniz’s rule, ρ(∂) vanishes on m2x and is K(x)-linear, hence
we may view ρ(∂) as an element of (mx /m2x )∗ .
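Now we assume that $x$ is $K$-rational. Then $\rho$ is an isomorphism. Its inverse maps $\ell \in (\mathfrak{m}_x/\mathfrak{m}_x^2)^*$ to $\partial \in T_{X,x}$ given by
\[
\partial(f) := \ell(f - f(x)), \qquad f \in \mathcal{O}_{X,x}.
\]
The space $(\mathfrak{m}_x/\mathfrak{m}_x^2)^*$ is Zariski's definition for the tangent space at $x \in X(K)$.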
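
That the map $\partial$ built from a linear form $\ell$ on $\mathfrak{m}_x/\mathfrak{m}_x^2$ in A.7.8 is indeed a derivation can be verified as follows. This is a sketch for a $K$-rational point $x$; the abbreviations $a = f(x)$ and $b = g(x)$ are introduced here.

```latex
% Write a = f(x), b = g(x) in K, so that f - a and g - b lie in \mathfrak{m}_x.
\begin{align*}
fg - ab &= a\,(g - b) + b\,(f - a) + (f - a)(g - b), \qquad (f - a)(g - b) \in \mathfrak{m}_x^2,\\
\partial(fg) &= \ell(fg - ab) = a\,\ell(g - b) + b\,\ell(f - a) = f(x)\,\partial(g) + g(x)\,\partial(f).
\end{align*}
```

The last expression is Leibniz's rule for the $\mathcal{O}_{X,x}$-module $K(x)$, on which $f$ acts by multiplication with $f(x)$.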
A.7.9. For $x \in X$, let $\dim_x(X) := \max_Y \dim(Y)$, where $Y$ ranges over all
irreducible components containing $x$. Since the prime ideals of $\mathcal{O}_{X,x}$ are in
one-to-one correspondence with the closed irreducible subsets containing $x$, we get
$\dim_x(X) = \dim(\mathcal{O}_{X,x})$.
From commutative algebra ([197], p.78), we know
\[ \dim_{K(x)}(\mathfrak{m}_x/\mathfrak{m}_x^2) \geq \dim(\mathcal{O}_{X,x}). \]
We call $\mathcal{O}_{X,x}$ a regular local ring if equality occurs. More generally, the inequality
holds, and the same definition applies, for any noetherian local ring.
If OX,x is regular, then we say that x is a regular point of X . A singular point
is a point which is not regular. If all points of X are regular, then X is called a
regular variety.
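As an illustration that is not taken from the text, the following sketch works out the inequality of A.7.9 on a nodal cubic. The curve, the points $o$ and $p$, and the notation are assumptions made for this example only.

```latex
% Illustrative example; the curve X and the points o, p are our own choice.
Let $X = Z\bigl(y^2 - x^2(x+1)\bigr) \subset \mathbb{A}^2_K$ and let $o = (0,0)$.
Then $\mathfrak{m}_o = (x, y)\,\mathcal{O}_{X,o}$, and the defining equation lies in $(x,y)^2$,
so it imposes no linear relation between the classes of $x$ and $y$:
\[
  \dim_{K}\bigl(\mathfrak{m}_o/\mathfrak{m}_o^2\bigr) = 2 \;>\; 1 = \dim(\mathcal{O}_{X,o}).
\]
Hence $o$ is a singular point of $X$. At $p = (-1,0)$ the function $x^2$ is a unit in
$\mathcal{O}_{X,p}$, so the equation gives $x + 1 = y^2/x^2 \in \mathfrak{m}_p^2$. Therefore
$\mathfrak{m}_p/\mathfrak{m}_p^2$ is spanned by the class of $y$,
\[
  \dim_{K}\bigl(\mathfrak{m}_p/\mathfrak{m}_p^2\bigr) = 1 = \dim(\mathcal{O}_{X,p}),
\]
and $p$ is a regular point.
```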
A.7.10. For $x \in X$, we have
\[ \dim_{K(x)}(T_{X,x}) \geq \dim_x(X). \]
This follows from A.7.8 and A.7.9 if $x$ is $K$-rational. In general, we use base
change to $\overline{K}$. The right-hand side does not change (see A.4.10). Then we use
A.7.6 to get the claim.
A.7.11. A point $x \in X$ is called smooth if
\[ \dim_{K(x)}(T_{X,x}) = \dim_x(X). \]
If all points are smooth, then $X$ is a smooth variety over $K$.
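A.7.12. Let $x \in X$ be a smooth point. We claim that $x$ is a regular point of $X_{\overline{K}}$
and hence also of $X$ ([137], Prop.0.17.3.3, for the descent).

Let $\mathfrak{m}_x$ be the maximal ideal of $x$ in $K[U] \otimes_K \overline{K}$. As in A.7.8, we have
\[ \mathrm{Der}_{\overline{K}}\bigl((K[U] \otimes_K \overline{K})_{\mathfrak{m}_x}, \overline{K}\bigr) \xrightarrow{\ \sim\ } (\mathfrak{m}_x/\mathfrak{m}_x^2)^*. \]
By A.7.6, the left-hand side may be identified with
\[ \mathrm{Der}_{\overline{K}}\bigl(K[U] \otimes_K \overline{K}, \overline{K}\bigr) = T_{X,x} \otimes_{K(x)} \overline{K}. \]
By A.4.10, the Krull dimension of $\mathcal{O}_{X,x}$ is equal to the one of $\mathcal{O}_{X_{\overline{K}},x}$ and hence
\[ \dim(\mathcal{O}_{X_{\overline{K}},x}) = \dim_{K(x)}(T_{X,x}) = \dim_{\overline{K}}(\mathfrak{m}_x/\mathfrak{m}_x^2). \]
But $\mathcal{O}_{X_{\overline{K}},x}$ is the quotient of $(K[U] \otimes_K \overline{K})_{\mathfrak{m}_x}$ by the nilpotent elements, hence they
have the same Krull dimension. We conclude that
\[ \dim\bigl((K[U] \otimes_K \overline{K})_{\mathfrak{m}_x}\bigr) = \dim_{\overline{K}}(\mathfrak{m}_x/\mathfrak{m}_x^2) \]
and hence $(K[U] \otimes_K \overline{K})_{\mathfrak{m}_x}$ is a regular local ring. In commutative algebra, we prove that
a regular local ring is a unique factorization domain ([197], Th.48, p.142). In particular, it
is an integral domain proving
\[ \mathcal{O}_{X_{\overline{K}},x} = (K[U] \otimes_K \overline{K})_{\mathfrak{m}_x}. \]
We conclude that $x$ is a regular point of $X_{\overline{K}}$.
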


A.7.13. A regular point of X is not necessarily smooth. For K perfect however, smooth
and regular points of X are the same (use [137], Prop.6.7.4).
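A standard example, not taken from the text, may help to see how regularity and smoothness can differ over a non-perfect field. The field $K = \mathbb{F}_p(t)$, the variety and the name $\xi$ for its unique point are assumptions of this sketch.

```latex
% Illustrative example over the non-perfect field K = F_p(t); not from the text.
Let $K = \mathbb{F}_p(t)$ and $X = Z(x^p - t) \subset \mathbb{A}^1_K$. The polynomial $x^p - t$ is
irreducible over $K$, so $X$ consists of one point $\xi$ with
$\mathcal{O}_{X,\xi} = K[x]/(x^p - t) = K(\xi)$, a field. Thus
\[
  \dim_{K(\xi)}\bigl(\mathfrak{m}_\xi/\mathfrak{m}_\xi^2\bigr) = 0 = \dim(\mathcal{O}_{X,\xi}),
\]
and $\xi$ is a regular point. On the other hand, every $\partial \in T_{\mathbb{A}^1_K,\xi}$ satisfies
\[
  \partial(x^p - t) = p\,x^{p-1}(\xi)\,\partial(x) - \partial(t) = 0,
\]
because $\partial$ vanishes on $K$ and $p = 0$ in $K$. By A.7.5, $T_{X,\xi} = T_{\mathbb{A}^1_K,\xi}$, so
\[
  \dim_{K(\xi)}(T_{X,\xi}) = 1 \;>\; 0 = \dim_\xi(X),
\]
and $\xi$ is not a smooth point.
```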
A.7.14. If $X$ is a smooth variety over $K$, then A.7.12 shows that $X$ is geometrically
reduced. Conversely, if $X$ is a geometrically reduced variety over $K$ such that $X_{\overline{K}}$ is a
smooth variety over $\overline{K}$, then $X$ is a smooth variety over $K$. This follows from
\[ T_{X,x} \otimes_{K(x)} \overline{K} = T_{X_{\overline{K}},x} \]
as we have seen in A.7.6.
It follows also from A.7.12 that the irreducible components of a smooth variety over K
are disjoint because the local ring OX,x in an intersection point x of two irreducible com-
ponents is not an integral domain contradicting regularity. Therefore, X is the disjoint
union of irreducible open subvarieties. Hence the connected components are the same as
the irreducible components. If X is an irreducible smooth variety over K with at least one
K -rational point, then X is geometrically irreducible ([137], Cor.4.5.14).

A.7.15. Let $X$ be a closed subset of $\mathbb{A}^n_K$. For $x \in X$, we have the Jacobi criterion of
smoothness: Let $f_1, \dots, f_r$ be generators of $I(X)$. Then $x$ is a smooth point of $X$ if and
only if the Jacobi matrix
\[ \Bigl( \frac{\partial}{\partial x_i}(f_j)(x) \Bigr)_{1 \leq i \leq n,\ 1 \leq j \leq r} \]
has rank $n - \dim_x(X)$.

To prove this, we consider the $K(x)$-linear map
\[ \varphi : T_{\mathbb{A}^n_K,x} \longrightarrow K(x)^r, \qquad \partial \mapsto (\partial f_1, \dots, \partial f_r). \]
By A.7.3, the dimension of the image of $\varphi$ is equal to the rank of the Jacobi matrix $J(x)$.
Using A.7.6, the kernel may be identified with $T_{X,x}$. We conclude
\[ \dim(T_{X,x}) + \operatorname{rank}(J(x)) = n. \]
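The criterion can be tried out on a small case. The following sketch, which is our own illustration and restricts to $K$-rational points for simplicity, applies it to the cuspidal cubic.

```latex
% Illustrative example; the curve and the restriction to K-rational points are our choice.
Let $X = Z(f) \subset \mathbb{A}^2_K$ with $f = y^2 - x^3$, so $n = 2$, $r = 1$ and
$\dim_{(a,b)}(X) = 1$ at every point $(a,b) \in X(K)$. The Jacobi matrix is
\[
  J(a,b) = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix}(a,b)
         = \begin{pmatrix} -3a^2 \\ 2b \end{pmatrix}.
\]
At the origin, $J(0,0) = 0$ has rank $0 \neq 1 = n - \dim_{(0,0)}(X)$, so the origin is not smooth
and $\dim(T_{X,(0,0)}) = 2$. If $(a,b) \neq (0,0)$, then $b^2 = a^3$ forces $a \neq 0$ and $b \neq 0$.
In characteristic $2$ the entry $-3a^2 = a^2$ is non-zero, in characteristic $3$ the entry $2b$ is
non-zero, and otherwise both are. So $J(a,b)$ has rank $1$ and $(a,b)$ is a smooth point.
```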
A.7.16. By the Jacobi criterion, the smooth points of X form an open subset char-
acterized by the non-vanishing of a suitable minor. If the variety is geometrically
reduced, this subset is dense ([137], Cor.17.15.13).
A.7.17. If $X, X'$ are smooth varieties over $K$, then it follows immediately from
the Jacobi criterion that $X \times X'$ is a smooth variety over $K$. For $x \in X(K)$, $x' \in X'(K)$,
we have
\[ T_{X \times X',(x,x')} = T_{X,x} \oplus T_{X',x'} \]
using Leibniz's rule.
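A.7.18. Our next goal is to define the tangent bundle for $X$. First, we handle $\mathbb{A}^n_K$.
Then the tangent bundle $T_{\mathbb{A}^n_K}$ is the disjoint union of $(T_{\mathbb{A}^n_K,x})_{x \in \mathbb{A}^n_K}$ as a set. We
get a map $\pi : T_{\mathbb{A}^n_K} \to \mathbb{A}^n_K$, mapping the $K$-vector space $T_{\mathbb{A}^n_K,x}$ to $x$. There is a
bijective map $\varphi$ from $T_{\mathbb{A}^n_K}$ to the trivial bundle $\mathbb{A}^n_K \times \mathbb{A}^n_K$ over $\mathbb{A}^n_K$ with inverse
given by
\[ \varphi^{-1}(\lambda) := \sum_{i=1}^n \lambda_i \, \frac{\partial}{\partial x_i}\Big|_{\alpha} \]
in the fibre over $\alpha \in \mathbb{A}^n_K$. Then $T_{\mathbb{A}^n_K}$ is a
vector bundle by requiring that $\varphi$ is a trivialization. By Example A.7.3, we may
identify the global sections of $T_{\mathbb{A}^n_K}$ with $\mathrm{Der}(K[x], K[x])$.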
A.7.19. Let X be a smooth affine variety over K . An element
∂ ∈ Der(K[X], K[X])
is called a vector field over X . We have the evaluation ∂|x ∈ TX,x for all x ∈ X
given by
∂|x (f ) = ∂(f )(x).
(By Leibniz's rule, $\partial$ extends to a derivative on $\mathcal{O}_{X,x}$.) If we view $X$ as a closed
subset of $\mathbb{A}^n_K$, then $T_{X,x}$ is a subspace of $T_{\mathbb{A}^n_K,x}$. Let $f_1, \dots, f_r$ be generators
of $I(X)$. We assume that $X$ is of pure dimension $d$. Then the Jacobi criterion
shows that $X$ is covered by the open subsets
\[ U_{I,J} := \Bigl\{ \det\Bigl( \frac{\partial f_j}{\partial x_i} \Bigr)_{i \in I,\ j \in J} \neq 0 \Bigr\}, \]
where $I \subset \{1, \dots, n\}$, $J \subset \{1, \dots, r\}$ are both of cardinality $n - d$.


A.7.20. Let $X$ be a pure dimensional affine variety over $K$ of dimension $d$ given
as a closed subset of $\mathbb{A}^n_K$ by $I(X) = (f_1, \dots, f_r)$. We assume that there are
subsets $I \subset \{1, \dots, n\}$, $J \subset \{1, \dots, r\}$ of cardinality $n - d$ with
\[ \det\Bigl( \frac{\partial f_j}{\partial x_i}(x) \Bigr)_{i \in I,\ j \in J} \neq 0 \]
for all $x \in X$. Then we claim that $\mathrm{Der}(K[X], K[X])$ is a free $K[X]$-module
with a basis $\partial_1, \dots, \partial_d$. Moreover, $\partial_1|_x, \dots, \partial_d|_x$ is a $K(x)$-basis of $T_{X,x}$ for
every $x \in X$.

We sketch the proof: For notational simplicity, we assume $I = J = \{1, \dots, n - d\}$. For
$x \in X$, we consider
\[ \sum_{i=1}^n g_i(x) \, \frac{\partial}{\partial x_i}\Big|_x \in T_{\mathbb{A}^n_K,x}. \]
By A.7.5, it is an element of $T_{X,x}$ if and only if
\[ \sum_{i=1}^n g_i(x) \, \frac{\partial}{\partial x_i}(f_j)(x) = 0 \qquad (j = 1, \dots, r). \tag{A.7} \]
We consider it as a system of homogeneous linear equations in the unknowns $g_i(x)$. Let us
write the Jacobi matrix in block form
\[ \Bigl( \frac{\partial}{\partial x_i}(f_j)(x) \Bigr)_{1 \leq i \leq n,\ 1 \leq j \leq r} = \begin{pmatrix} A(x) & B(x) \\ C(x) & D(x) \end{pmatrix}, \]
where $A(x)$ is an invertible $(n - d) \times (n - d)$ matrix. Then (A.7) is equivalent to
\[ (g_1(x), \dots, g_n(x)) \begin{pmatrix} A(x) \\ C(x) \end{pmatrix} = 0. \]
By linear algebra, a basis of solutions is given by the rows of the $d \times n$ matrix
\[ (h_{ij}(x)) := \bigl( -C(x) A^{-1}(x) \quad I_d \bigr). \]
Clearly, the entries of $A$ and $C$ are regular functions on $X$. By the formula for $A^{-1}(x)$
using determinants, this also holds for the entries of $A^{-1}(x)$. We conclude that $h_{ij} \in K[X]$.
For $i = 1, \dots, d$, let $\partial_i := \sum_{j=1}^n h_{ij} \frac{\partial}{\partial x_j}$. Since its evaluation at every point of
$X$ satisfies (A.7), this is a well-defined vector field on $X$, i.e. $\partial_i \in \mathrm{Der}(K[X], K[X])$.
Then $\partial_1|_x, \dots, \partial_d|_x$ is a basis of $T_{X,x}$ for all $x$, therefore $\partial_1, \dots, \partial_d$ are $K[X]$-linearly
independent. For any $\partial \in \mathrm{Der}(K[X], K[X])$, we have
\[ \partial = \sum_{j=1}^d \partial(x_{n-d+j}) \, \partial_j, \]
because $\partial_i(x_{n-d+j}) = \delta_{ij}$. Hence $\partial_1, \dots, \partial_d$ is a $K[X]$-basis of $\mathrm{Der}(K[X], K[X])$.
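To make the recipe of the proof concrete, here is a minimal worked instance. The parabola and the ordering of the coordinates are assumptions chosen so that $I = J = \{1\}$ as in the proof; nothing here is taken from the text.

```latex
% Illustrative example; the hypersurface f = x_1 - x_2^2 is our own choice.
Take $n = 2$, $d = 1$ and $X = Z(f) \subset \mathbb{A}^2_K$ with $f = x_1 - x_2^2$, so $r = 1$ and
$I = J = \{1\}$. The Jacobi matrix in block form is
\[
  \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix}
  = \begin{pmatrix} 1 \\ -2x_2 \end{pmatrix},
  \qquad A = (1), \quad C = (-2x_2),
\]
and $A$ is invertible at every point of $X$. The proof yields the $1 \times 2$ matrix
\[
  (h_{1j}) = \bigl( -C A^{-1} \quad I_1 \bigr) = ( 2x_2 \quad 1 ),
  \qquad
  \partial_1 = 2x_2 \, \frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}.
\]
Indeed $\partial_1(f) = 2x_2 \cdot 1 + 1 \cdot (-2x_2) = 0$, so $\partial_1$ is a vector field on $X$, and
$\partial_1(x_2) = 1$, in accordance with $\partial_i(x_{n-d+j}) = \delta_{ij}$. Any
$\partial \in \mathrm{Der}(K[X], K[X])$ is then $\partial = \partial(x_2)\,\partial_1$.
```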



A.7.21. Now we are ready to define the tangent bundle $T_X$ of a smooth $K$-variety $X$.
As a set, $T_X$ is the disjoint union of the tangent spaces $T_{X_{\overline{K}},x}$, $x \in X$.
We consider pure dimensional affine open subsets $U$ of $X$ such that the
module $\mathrm{Der}(K[U], K[U])$ is a free $K[U]$-module of rank $\dim(U)$. Since $X$
is the disjoint union of irreducible open subvarieties, we see that $X$ has a basis
of irreducible affine open subsets. By A.7.20, $X$ is covered by open subsets
$U$ considered above. For such a $U$, we choose a $K[U]$-basis $\partial_1, \dots, \partial_d$ of
$\mathrm{Der}(K[U], K[U])$. Then the coordinate map $\varphi_U$ gives an isomorphism from the
fibres of $T_X$ over $U$ to the fibres of the trivial bundle $U \times \mathbb{A}^d_K$. We claim that there
is a unique vector bundle structure on $T_X$ such that the $\varphi_U$'s are trivializations.
Proof: Clearly, we may assume that $X$ is irreducible. Let $d := \dim(X)$ and suppose that
$(U, \varphi_U)$ is given by $\partial_1, \dots, \partial_d$ as above. Let $(U', \varphi_{U'})$ be another trivialization given by
another basis $\partial'_1, \dots, \partial'_d \in \mathrm{Der}(K[U'], K[U'])$. Using Leibniz's rule, any vector field on
$U$ extends to an element of $\mathrm{Der}(\mathcal{O}_{X,x}, \mathcal{O}_{X,x})$ for all $x \in U$. Moreover, $\partial_1, \dots, \partial_d$ (resp.
$\partial'_1, \dots, \partial'_d$) form a basis of $\mathrm{Der}(\mathcal{O}_{X,x}, \mathcal{O}_{X,x})$ for all $x \in U \cap U'$. Since the evaluations
at $x$ form also a basis of $T_{X,x}$, there are $g_{ij}(x) \in K(x)$ with
\[ \partial'_i|_x = \sum_{j=1}^d g_{ji}(x) \, \partial_j|_x. \]
We conclude that $g_{ij}$ is a regular function on $U \cap U'$. Since $\varphi_U \circ \varphi_{U'}^{-1}$ is given by
$(x, \lambda) \mapsto (x, (g_{ij}) \cdot \lambda)$, it is a morphism of varieties. It has the inverse
$\varphi_{U'} \circ \varphi_U^{-1}$, hence it is an isomorphism. This proves the claim.
In fact, we have used ϕU to define the vector bundle structure over U by requiring that ϕU
is an isomorphism of vector bundles. Then the above argument shows that the structure fits
on overlappings.
Note it is the same construction as in A.5.7, where we have constructed a vector bundle
from given transition matrices. However, here the vector bundle is given as an abstract set.

A.7.22. Let U be an affine open subset of X . Then we have


Der(K[U ], K[U ]) = Γ(U, TX ).
By the construction of the tangent bundle, this is true locally. Using the behaviour
of derivatives with respect to localizations, we get easily the claim. It is also
immediate that the vector space of K(x)-rational points in the fibre TX over x is
equal to the tangent space in x .
A.7.23. Let $X$ be a smooth variety over $K$. We apply the various constructions
of Section A.5 to the tangent bundle $T_X$. The cotangent space $T^*_{X,x}$ at $x$ is
the dual $\mathrm{Hom}_{K(x)}(T_{X,x}, K(x))$ of $T_{X,x}$. The cotangent bundle $T^*_X$ is the dual
bundle of the tangent bundle. The sections of $T_X$ (resp. $\wedge^k T^*_X$) are called vector
fields (resp. $k$-forms). The sheaf of $k$-forms is denoted by $\Omega^k_X$. If $X$ is of pure
dimension $d$, then $\wedge^k T^*_X$ is a vector bundle of rank $\binom{d}{k}$. In particular, $\wedge^d T^*_X$ is a
line bundle called the canonical line bundle of $X$, which we denote by $K_X$.

A.7.24. Let $U$ be an open subset of the smooth $K$-variety $X$. For $f \in \mathcal{O}_X(U)$,
we define the differential $df \in \Omega^1_X(U)$ by the following procedure: For $x \in U$,
$df(x) \in T^*_{X,x}$ is given by $\partial \in T_{X,x} \mapsto \partial(f)(x)$.
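On affine space the definition recovers the familiar formula for the total differential. The following sketch is our own illustration and uses the basis $\partial/\partial x_i|_x$ of the tangent space from A.7.18.

```latex
% Illustrative computation on affine space; not taken from the text.
Let $X = \mathbb{A}^n_K$ with coordinates $x_1, \dots, x_n$. For a point $x \in X$ we have
\[
  dx_i(x)\Bigl( \frac{\partial}{\partial x_j}\Big|_x \Bigr)
  = \frac{\partial x_i}{\partial x_j}(x) = \delta_{ij},
\]
so $dx_1(x), \dots, dx_n(x)$ is the basis of $T^*_{X,x}$ dual to
$\partial/\partial x_1|_x, \dots, \partial/\partial x_n|_x$. For $f \in K[x_1, \dots, x_n]$,
evaluating both sides on these basis vectors gives
\[
  df = \sum_{i=1}^n \frac{\partial f}{\partial x_i} \, dx_i .
\]
For instance, $d(x_1^2 x_2) = 2 x_1 x_2 \, dx_1 + x_1^2 \, dx_2$.
```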
A.7.25. Let $C$ be a smooth curve over $K$. For every $f \in K(C)$, we get a
meromorphic section $df$ of $T^*_C$. If $g \in K(C)$ has differential not identically 0,
then we get a rational function $\frac{df}{dg} \in K(C)$ defined by $df = \frac{df}{dg} \cdot dg$.
If char(K) = 0, then $dg$ is identically zero if and only if $g$ is locally constant on
$C_{\overline{K}}$. In char(K) = $p \neq 0$, this may fail but it follows from A.7.8 that $d\pi$ does
not vanish identically for any local parameter $\pi$ in a point $x \in C$.
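A short computation on the affine line, given here as our own illustration, shows both the quotient $df/dg$ and the failure in positive characteristic. The coordinate $t$ and the chosen functions are assumptions of the sketch.

```latex
% Illustrative example on the affine line; functions f, g are our own choice.
Let $C = \mathbb{A}^1_K$ with coordinate $t$, so $K(C) = K(t)$. For $f = t^3$ and $g = t^2$ with
$\operatorname{char}(K) \neq 2$ we have $df = 3t^2\,dt$ and $dg = 2t\,dt$, hence
\[
  \frac{df}{dg} = \frac{3t^2}{2t} = \frac{3}{2}\,t \in K(C).
\]
If $\operatorname{char}(K) = p > 0$ and $g = t^p$, then
\[
  dg = p\,t^{p-1}\,dt = 0,
\]
although $g$ is not locally constant. By contrast, the local parameter $\pi = t - a$ at a point
$a \in K$ has $d\pi = dt$, which does not vanish identically.
```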
A.7.26. Let $\varphi : X \to X'$ be a morphism of smooth varieties over $K$. The dual of
the differential $d\varphi$ from A.7.6 induces a pull-back $\varphi^* : \Omega^k_{X'}(U') \to \Omega^k_X(\varphi^{-1}(U'))$
of $k$-forms for every open subset $U'$ of $X'$. We leave the details to the reader.
A.7.27. Let Y be a closed subvariety of the smooth variety X over K . We assume
that Y is also smooth. Then TY is a subbundle of TX . The quotient vector bundle
TX |Y /TY is called the normal bundle of Y in X and it is denoted by NY /X .
The conormal bundle $N^*_{Y/X}$ has fibre over $y \in Y$ equal to
$\{\ell \in T^*_{X,y} \mid \ell(\partial) = 0 \ \forall \partial \in T_{Y,y}\}$.
A.7.28. Using the theory of coherent sheaves (see Section A.10 on cohomology),
it is possible to define $\Omega^1_X$ also on singular varieties. Let $\Delta : X \to X \times X$ be the
diagonal morphism and let $J_\Delta$ be the ideal sheaf of the diagonal in $X \times X$. Then
$\Omega^1_X := \Delta^*(J_\Delta/J_\Delta^2)$ is a coherent sheaf on $X$. If $U$ is an affine open subset, then
it follows from [148], Rem.II.8.9.2, that
\[ \mathrm{Der}_K(K[U], M) = \mathrm{Hom}(\Omega^1_X(U), M) \tag{A.8} \]
for any K[U ]-module M . Applying this with M = K(x) for x ∈ U or with
M = K[U ], we see easily that the old definition of Ω1X agrees with the above one
for a smooth variety. Moreover, if X is of pure dimension n , then X is smooth
over K if and only if Ω1X is locally free of rank n . This follows immediately
from (A.8) for M = K(x).
A.7.29. More generally, if ϕ : X → Y is a morphism of varieties (or schemes),
then we define the sheaf of relative differentials Ω1X/Y by
Ω1X/Y := ∆∗ (J∆ /J∆2 ),
where ∆ : X → X ×Y X is the diagonal morphism and J∆ is the sheaf of
ideals of ∆(X). If X and Y are affine, then Ω1X/Y (X) is the module of Kähler
differentials defined in B.1.18. In general, Ω1X/Y is a coherent sheaf. If X, Y, ϕ
are defined over K , then we have a natural exact sequence
ϕ∗ Ω1Y → Ω1X → Ω1X/Y → 0.
For details, we refer to [148], II.8.

A.8. Divisors

In this section, $X$ denotes a variety over the field $K$.

A.8.1. A Weil divisor on $X$ is a formal linear combination $\sum_{i=1}^n n_i Y_i$ of
irreducible closed subvarieties $Y_i$ of codimension 1. The multiplicities $n_i$ are
assumed to be in $\mathbb{Z}$. In more abstract terms, a Weil divisor is an element of the
free abelian group with basis the set of irreducible closed subvarieties of $X$ of
codimension 1. The Weil divisors form an abelian group with addition
\[ \sum_Y n_Y Y + \sum_Y n'_Y Y = \sum_Y (n_Y + n'_Y) Y, \]
where $Y$ ranges over all irreducible closed subvarieties of codimension 1. Note
that only finitely many $n_Y$ and $n'_Y$ are different from 0.
A.8.2. An irreducible closed subvariety of codimension 1 is called a prime di-
visor. Non-negative linear combinations of prime divisors are called effective
Weil divisors. We define a partial order on the space of Weil divisors by setting
$D \geq D'$ if and only if $D - D'$ is effective.
The support supp(D) of a Weil divisor D is the union of all prime divisors with
non-zero multiplicity. The latter are called the components of D . The compo-
nents Y with multiplicity
 nY > 0 (resp. nY < 0) are called the zeros (resp.
poles) of D and nY >0 Y (resp. nY <0 Y ) is the zero- (resp. pole-) divisor
of D .
Example A.8.3. Let $C$ be a curve over an algebraically closed field. Then the
prime divisors of $C$ are just the sets with one element and we denote the prime
divisor associated to $x \in C$ by $[x]$. Then the Weil divisors on $C$ are of the form
$\sum_{x \in C} n_x [x]$.

A.8.4. Let F be a field. A surjective function v : F → Z ∪ {∞} is called a


discrete valuation if
(a) v(α) = ∞ ⇔ α = 0;
(b) v(αβ) = v(α) + v(β);
(c) v(α + β) ≥ min(v(α), v(β)).
The ring Rv := {α ∈ F | v(α) ≥ 0} is called a discrete valuation ring. It is a
local ring with maximal ideal mv := {α ∈ F | v(α) > 0}. We can show easily
that Rv is a principal ideal domain. A principal generator π of mv is called a local
parameter and it is unique up to multiplication with units. It is characterized by
v(π) = 1 or equivalently by π irreducible.
The valuation $v$ may be reconstructed from the discrete valuation ring $R_v$ by the
unique factorization property: For a non-zero $\alpha \in F$ and a local parameter $\pi$, there is a
unique unit $u \in R_v$ such that $\alpha = u \pi^{v(\alpha)}$. By passing to the discrete absolute
value $|\ |_v := e^{-v}$, we see that our definitions agree with 1.2.9.
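For orientation, here is a standard instance of these definitions, given as our own illustration: the order of vanishing at the origin on a rational function field. The field $F = K(t)$ and the sample element $\alpha$ are assumptions of the sketch.

```latex
% Illustrative example: order of vanishing at t = 0 on K(t); not from the text.
Let $F = K(t)$. Every $\alpha \in F \setminus \{0\}$ can be written as $\alpha = t^m\, g/h$ with
$m \in \mathbb{Z}$ and polynomials $g, h \in K[t]$ such that $g(0)\,h(0) \neq 0$. Set
$v(\alpha) := m$ and $v(0) := \infty$. Then $v$ is a discrete valuation with
\[
  R_v = \{\, g/h \mid g, h \in K[t],\ h(0) \neq 0 \,\} = K[t]_{(t)},
  \qquad \mathfrak{m}_v = t\,R_v,
\]
and $\pi = t$ is a local parameter. For instance,
\[
  \alpha = \frac{t^2 (t - 1)}{t + 1} = u\,\pi^2,
  \qquad u = \frac{t - 1}{t + 1} \in R_v^{\times},
  \qquad v(\alpha) = 2 .
\]
```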
Theorem A.8.5. Let R be a commutative noetherian local ring with 1 and maxi-
mal ideal m . Then the following conditions are equivalent:
(a) $\dim_{R/\mathfrak{m}}(\mathfrak{m}/\mathfrak{m}^2) = \dim(R) = 1$;
(b) R is a unique factorization domain of Krull dimension 1;
(c) m is a principal ideal and dim(R) = 1;
(d) R is a principal ideal domain which is not a field;
(e) R is a discrete valuation ring;
(f) R is an integrally closed domain of Krull dimension 1.
Proof: The first equality in (a) means that R is a regular local ring and hence a unique
factorization domain ([197], Th.48, p.142) yielding (b). The implications (b) ⇒ (a), (c), (f)
are obvious.
To deduce (d) from (c), let $\pi$ be a principal generator of $\mathfrak{m}$. By Krull's intersection theorem
$\bigcap_j \mathfrak{m}^j = \{0\}$ ([157], Th.7.21), every $\alpha \in R \setminus \{0\}$ may be written in the form
$\alpha = u \pi^{v(\alpha)}$ for a unit $u$ in $R$ and $v(\alpha) \in \mathbb{N}$. We deduce that $R$ is an integral
domain, otherwise $\pi$ would be nilpotent contradicting $\dim(R) = 1$. Hence the above factorization of $\alpha$ is
unique. If $I$ is an ideal of $R$, then it is generated by any $\alpha$ with $v(\alpha)$ minimal. This
proves (d).
For (d) ⇒ (e), we deduce from the assumptions that R has an irreducible element π unique
up to units. Then the above unique factorization holds for every non-zero α in the quotient
field F of R if we allow v(α) ∈ Z . This gives the discrete valuation v on F with
valuation ring R . Conversely, we get (e) ⇒ (b) using the local parameter π . Finally, we
quote (f) ⇒ (b) from the theory of Dedekind domains ([157], Th.10.6). 
Example A.8.6. Let f be a non-zero rational function on the irreducible smooth
curve C over an algebraically closed field. For every x ∈ C , the local ring OC,x
is regular (see A.7.12). From Example A.3.6, we easily deduce dim(OC,x ) = 1.
By Theorem A.8.5, OC,x is a discrete valuation ring with a canonical discrete
valuation $v$ on $K(C)$. We define the order of $f$ in $x$ by $\operatorname{ord}_x(f) := v(f)$. The
Weil divisor associated to $f$ is defined by
\[ \operatorname{div}(f) = \sum_{x \in C} \operatorname{ord}_x(f) [x]. \]
We have to prove that $\{x \in C \mid \operatorname{ord}_x(f) \neq 0\}$ is finite. We may assume that $C$ is an affine
variety. As $f$ is the quotient of two regular functions, we may assume $f \in K[C]$. Since $f$
is an invertible regular function outside the closed subset $Z(\{f\})$, we get the claim.

From A.8.4 (b), we get the property div(f g) = div(f )+div(g) for f, g ∈ K(C)\
{0}.
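A small computation on the affine line, added as our own illustration, shows how the divisor of a rational function is read off and checks the additivity just stated. The functions $f$ and $g$ are assumptions of the example.

```latex
% Illustrative example on the affine line; not taken from the text.
Let $C = \mathbb{A}^1_K$ over an algebraically closed field $K$, with coordinate $x$, and let
\[
  f = \frac{x^2}{x - 1} \in K(C).
\]
At a point $a \in K$, a local parameter of $\mathcal{O}_{C,a}$ is $x - a$. Hence
$\operatorname{ord}_0(f) = 2$, $\operatorname{ord}_1(f) = -1$ and $\operatorname{ord}_a(f) = 0$ for $a \neq 0, 1$, so
\[
  \operatorname{div}(f) = 2[0] - [1].
\]
For $g = x - 1$ we have $\operatorname{div}(g) = [1]$ and
\[
  \operatorname{div}(fg) = \operatorname{div}(x^2) = 2[0] = \operatorname{div}(f) + \operatorname{div}(g).
\]
```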

A.8.7. In order to extend this construction to a higher dimension, we need the local
ring in a prime divisor. Let $Y$ be an irreducible closed subset of the $K$-variety $X$.
Then we consider pairs $(U, f)$, where $U$ is an open subset of $X$ with $U \cap Y \neq \emptyset$
and $f \in \mathcal{O}_X(U)$. Two pairs $(U, f)$ and $(U', f')$ are called equivalent if $f = f'$
on an open subset $U'' \subset U \cap U'$ with $U'' \cap Y \neq \emptyset$. (Since $Y$ is irreducible,
$U \cap Y$ and $U' \cap Y$ are both open dense subsets of $Y$ and hence $U \cap U' \cap Y$ is not
empty.) The equivalence classes form a ring $\mathcal{O}_{X,Y}$ called the local ring of $X$ in
$Y$. It is a local ring with maximal ideal $\mathfrak{m}_Y$ formed by the classes of $(U, f)$ with
$f(Y \cap U) = \{0\}$.
If $Y$ is of dimension 0, then we have seen in A.2.7 that $Y$ is the set of conjugates
of a point $x \in X$. Just by definition, we have $\mathcal{O}_{X,x} = \mathcal{O}_{X,Y}$.
A.8.8. To study the local ring in a prime divisor $Y$, we may restrict to any open
subset $U$ of $X$ with $U \cap Y \neq \emptyset$. So we may assume that $X$ is an affine variety
over $K$. By Example A.3.2, the ideal $\{f \in K[X] \mid f(Y) = \{0\}\}$ of $Y$ is a prime
ideal $\wp$ in $K[X]$. We have a homomorphism
\[ K[X] \longrightarrow \mathcal{O}_{X,Y}, \qquad f \mapsto (X, f), \]
which induces a homomorphism $K[X]_\wp \to \mathcal{O}_{X,Y}$. The latter is an isomorphism
by definition of localization and since $f$ is locally the quotient of two regular
functions on $X$. Note that the prime ideals of $K[X]_\wp$ are in one-to-one
correspondence with prime ideals in $K[X]$ contained in $\wp$. A prime ideal $\tilde{\wp}$ of $K[X]_\wp$
corresponds to its inverse image in $K[X]$ (see [157], Prop.7.9). Using the one-to-one
correspondence between prime ideals in $K[X]$ and irreducible closed subsets
of $X$, we conclude that
\[ \dim(\mathcal{O}_{X,Y}) = \operatorname{codim}(Y, X). \tag{A.9} \]
This holds also for non-affine varieties X . We leave the details to the reader.
A.8.9. A variety X over K is called regular in codimension 1 if OX,Y is a
regular local ring for all prime divisors Y of X .
A.8.10. A variety $X$ over $K$ is said to be normal if $\mathcal{O}_{X,x}$ is an integrally
closed domain for all $x \in X$. An easy exercise shows that any localization
of an integrally closed domain remains integrally closed. For a prime divisor
$Y$ of $X$ and $y \in Y$, $\mathcal{O}_{X,Y}$ is the localization of $\mathcal{O}_{X,y}$ in the prime ideal
$\{f \in \mathcal{O}_{X,y} \mid f|_Y = 0\}$. We conclude that $\mathcal{O}_{X,Y}$ is an integrally closed
domain of Krull dimension 1.
By Theorem A.8.5, a normal variety is regular in codimension 1. Moreover, every
regular variety is normal. This is an easy consequence of the fact that a regular
local ring is a unique factorization domain ([197], Th.48, p.142).
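By way of contrast, the cuspidal cubic gives a standard example of a variety that is not normal. The following sketch is our own illustration; it uses the identification of $K[X]$ with the subring $K[t^2, t^3]$ of $K[t]$, which is an assumption of the example.

```latex
% Illustrative example of a non-normal variety; not taken from the text.
Let $X = Z(y^2 - x^3) \subset \mathbb{A}^2_K$ and let $o$ be the origin. In $K(X)$ put $t := y/x$. Then
\[
  t^2 = \frac{y^2}{x^2} = \frac{x^3}{x^2} = x, \qquad t^3 = t \cdot x = y,
\]
so $K[X] = K[t^2, t^3] \subset K[t]$, and $t$ is a root of the monic polynomial
$T^2 - x \in \mathcal{O}_{X,o}[T]$. Thus $t$ is integral over $\mathcal{O}_{X,o}$.
But $t \notin \mathcal{O}_{X,o}$: if $t = a/b$ with $a, b \in K[t^2, t^3]$ and $b(0) \neq 0$, then
$a = t\,b$ has the non-zero linear coefficient $b(0)$, while no element of $K[t^2, t^3]$ has a
linear term. Hence $\mathcal{O}_{X,o}$ is not integrally closed and $X$ is not normal.
```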
A.8.11. Let $X$ be a $K$-variety which is regular in codimension 1. For any prime
divisor $Y$, the regular local ring $\mathcal{O}_{X,Y}$ has Krull dimension 1 by (A.9). By
Theorem A.8.5, $\mathcal{O}_{X,Y}$ is a discrete valuation ring for a canonical valuation $\operatorname{ord}_Y$
on its field of fractions $Q$. We call $\operatorname{ord}_Y(f)$ the order of $f \in Q$ in $Y$.
If $X$ is irreducible, then $Q = K(X)$ and we define the Weil divisor of a non-zero
rational function $f$ on $X$ by
\[ \operatorname{div}(f) := \sum_Y \operatorname{ord}_Y(f) \, Y, \]
where $Y$ ranges over all prime divisors of $X$. We call them also principal Weil
divisors. We have to prove that $\operatorname{ord}_Y(f) \neq 0$ only for finitely many prime divisors
$Y$. As in Example A.8.6, we may assume that $X$ is affine and that $f \in K[X]$.
Then $f$ is an invertible regular function outside $Z(\{f\})$, hence $\operatorname{ord}_Y(f) \neq 0$
only for irreducible components of $Z(\{f\})$ proving the claim.

Two Weil divisors $D, D'$ are called rationally equivalent, denoted by $D \sim D'$,
if $D - D'$ is a principal divisor.
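The following computation in the affine plane is our own illustration of the definition; the function $f$ and the names $Y_1, Y_2$ are assumptions of the example.

```latex
% Illustrative example in the affine plane; not taken from the text.
Let $X = \mathbb{A}^2_K$ with coordinates $x, y$ and $f = x^2/y \in K(X)$. Numerator and denominator
vanish only on the prime divisors $Y_1 = Z(x)$ and $Y_2 = Z(y)$, whose local rings are
\[
  \mathcal{O}_{X,Y_1} = K[x,y]_{(x)}, \qquad \mathcal{O}_{X,Y_2} = K[x,y]_{(y)},
\]
with local parameters $x$ and $y$, respectively. Since $y$ is a unit in $K[x,y]_{(x)}$ and $x$ is a
unit in $K[x,y]_{(y)}$, we get $\operatorname{ord}_{Y_1}(f) = 2$, $\operatorname{ord}_{Y_2}(f) = -1$ and
\[
  \operatorname{div}(f) = 2\,Y_1 - Y_2 .
\]
In particular, $2\,Y_1 \sim Y_2$.
```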
A.8.12. We still assume that $X$ is a $K$-variety regular in codimension 1. We
generalize the above construction to an invertible meromorphic section $s$ of a line
bundle $L$ on $X$. Let $Y$ be any prime divisor of $X$. We choose a trivialization
$(U_\alpha, \varphi_\alpha)$ of $L$ with $U_\alpha \cap Y \neq \emptyset$. Then we define the order of $s$ in $Y$ by
$\operatorname{ord}_Y(s) := \operatorname{ord}_{Y \cap U_\alpha}(s_\alpha)$, where $s_\alpha = \varphi_\alpha \circ (s|_{U_\alpha})$ is considered as a rational
function on $U_\alpha$. If $(U_\beta, \varphi_\beta)$ is another trivialization, then we have $s_\alpha = g_{\alpha\beta} s_\beta$
for the transition function $g_{\alpha\beta}$. Since $g_{\alpha\beta}$ is invertible on $U_\alpha \cap U_\beta$ and hence in
$\mathcal{O}_{X,Y}$, we see that the definition of $\operatorname{ord}_Y(s)$ does not depend on the choice of the
trivialization. The Weil divisor associated to $s$ is defined by
\[ \operatorname{div}(s) := \sum_Y \operatorname{ord}_Y(s) \, Y, \]
where $Y$ ranges over all prime divisors of $X$. Since $X$ is covered by finitely many
trivializations and on a trivialization, the Weil divisor of $s$ is the Weil divisor of
the corresponding rational function, we see that $\operatorname{ord}_Y(s) \neq 0$ only for finitely
many prime divisors $Y$. Moreover, if we view $s$ as a section of $L$ defined on
an open subset $U$ (say with $U$ maximal), then $\operatorname{ord}_Y(s) \neq 0$ only for some
irreducible components of $(X \setminus U) \cup \{x \in U \mid s(x) = 0\}$.
A.8.13. Now we consider two line bundles $L, L'$ on $X$. Let $s, s'$ be invertible
meromorphic sections of $L$ and $L'$, respectively. Then $s \otimes s'$ is an invertible
meromorphic section of $L \otimes L'$ and we have
\[ \operatorname{div}(s \otimes s') = \operatorname{div}(s) + \operatorname{div}(s') \]
using property (b) of the order function.
A.8.14. Clearly, if $\varphi : L \to L'$ is an isomorphism of line bundles and $s' = \varphi \circ s$,
then we have $\operatorname{div}(s') = \operatorname{div}(s)$. To connect line bundles with sections, we should
identify sections up to such isomorphisms.

A.8.15. Let $X$ be a variety over $K$. On $X$, we consider the following group
$D(X)$. It is the set of equivalence classes of pairs $(L, s)$, where $s$ is an invertible
meromorphic section of $L$ and $(L, s) \sim (L', s')$ if there is an isomorphism $\varphi :
L \to L'$ with $s' = \varphi \circ s$. The group operation is given by
\[ (L, s) \cdot (L', s') = (L \otimes L', s \otimes s'). \]
The identity element is represented by $(\mathcal{O}_X, 1)$ and the inverse of $(L, s)$ is
represented by $(L^*, s^*)$, where $s^*$ is given on an open dense subset by $s^*(x)(s(x)) = 1$.
This explains the notion of invertible meromorphic section. From now on, we
write $ss'$ for $s \otimes s'$, $s^{-1}$ for $s^*$ and $s/s'$ for $s \otimes (s')^{-1}$. Clearly, $D(X)$ is an
abelian group.
A.8.16. We have a surjective homomorphism of $D(X)$ onto $\operatorname{Pic}(X)$, given by
$(L, s) \mapsto \operatorname{cl}(L)$.

To see surjectivity, we have to prove that every line bundle $L$ has an invertible meromorphic
section. For any irreducible component $X_j$ of $X$, we choose a non-empty trivialization
$(U_j, \varphi_j)$ of $L$ such that $U_j$ is disjoint from the other irreducible components. Then we
have $s_j \in \Gamma(U_j, L)$ given by $s_j(x) = \varphi_j^{-1}(x, 1)$. The union $U$ of all $U_j$'s is disjoint
and there is $s \in \Gamma(U, L)$ defined by $s|_{U_j} = s_j$. Since $U$ is dense, $(U, s)$ is an invertible
meromorphic section of $L$.

A.8.17. The elements of $D(X)$ may be described by Cartier divisors. The idea
behind this concept is that divisors should locally be given by single equations. To
make it precise, a Cartier divisor on $X$ is given by the data $(U_\alpha, f_\alpha)_{\alpha \in I}$, where
$(U_\alpha)_{\alpha \in I}$ is an open covering, $f_\alpha$ is a unit in $K(U_\alpha)$, and $f_\alpha/f_\beta$ is a unit in
$\mathcal{O}_X(U_\alpha \cap U_\beta)$ for all $\alpha, \beta$. We identify two Cartier divisors $(U_\alpha, f_\alpha)_{\alpha \in I}$,
$(U'_\beta, f'_\beta)_{\beta \in J}$ if $f_\alpha/f'_\beta$ is a unit in $\mathcal{O}_X(U_\alpha \cap U'_\beta)$ for all $\alpha \in I$, $\beta \in J$. Given two Cartier
divisors $D$ and $D'$, it is always possible to pass to a common refinement. So we
add two Cartier divisors $D$ and $D'$ by choosing representatives $(U_\alpha, f_\alpha)_{\alpha \in I}$ and
$(U_\alpha, f'_\alpha)_{\alpha \in I}$ and then $D + D'$ is given by $(U_\alpha, f_\alpha f'_\alpha)_{\alpha \in I}$. Clearly, they form an
abelian group.
A.8.18. We will show below that $D(X)$ is isomorphic to the group of Cartier
divisors. The line bundle associated to the Cartier divisor $D$ will be denoted by
$O(D)$; it comes with a distinguished invertible meromorphic section $s_D$. We will
speak about certain properties of $D$, such as ample or base-point-free, if $O(D)$ has the
corresponding properties.
Let s be an invertible meromorphic section of a line bundle L and let us choose a
trivialization (Uα , ϕα )α∈I of L. Then sα := ϕα ◦ s may be viewed as a rational
function on Uα . If gαβ is the transition function, then we have sα = gαβ sβ on
Uα ∩Uβ . Since gαβ is a unit, we see that D(s) := (Uα , sα )α∈I is a Cartier divisor
on X . It does not depend on the choice of the trivialization. Then (L, s) → D(s)
induces a homomorphism from D(X) to the group of Cartier divisors.

It remains to give an inverse of this map. So let $D = (U_\alpha, f_\alpha)_{\alpha \in I}$ be a Cartier
divisor. Then $g_{\alpha\beta} := f_\alpha/f_\beta$ is a unit in $\mathcal{O}_X(U_\alpha \cap U_\beta)$. Let $O(D)$ be the
line bundle on $X$ given by the transition functions $g_{\alpha\beta}$ (see A.5.7). Recall its
construction: We glue the trivial bundles $U_\alpha \times \mathbb{A}^1_K$ along the isomorphisms given
by $g_{\alpha\beta}$. This gives a trivialization over $U_\alpha$. If we consider $f_\alpha$ as a meromorphic
section of $U_\alpha \times \mathbb{A}^1_K$, then $f_\alpha = g_{\alpha\beta} f_\beta$ shows that they fit on overlappings, i.e.
we get an invertible meromorphic section $s_D$ of $O(D)$. It is easy to check that
$D \mapsto (O(D), s_D)$ gives the inverse.
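The construction can be followed on the projective line. The following sketch is our own illustration; the covering $U_0, U_1$ and the local equations $f_0, f_1$ are assumptions of the example.

```latex
% Illustrative example: a point of the projective line as a Cartier divisor.
Let $X = \mathbb{P}^1_K$ with homogeneous coordinates $(x_0 : x_1)$ and $U_i = \{x_i \neq 0\}$. The data
\[
  D = \bigl( (U_0, f_0),\ (U_1, f_1) \bigr), \qquad f_0 = \frac{x_1}{x_0}, \quad f_1 = 1,
\]
form a Cartier divisor, since $f_0/f_1 = x_1/x_0$ is a unit in $\mathcal{O}_X(U_0 \cap U_1)$.
The line bundle $O(D)$ is glued from $U_0 \times \mathbb{A}^1_K$ and $U_1 \times \mathbb{A}^1_K$ by the transition function
\[
  g_{01} = \frac{f_0}{f_1} = \frac{x_1}{x_0},
\]
and the local functions $f_0$ on $U_0$ and $f_1$ on $U_1$ satisfy $f_0 = g_{01} f_1$, so they glue to the
section $s_D$. This section is regular and vanishes only at the point $(1 : 0)$, where
$x_1/x_0$ is a local parameter.
```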

A.8.19. Let f be a rational function on X , which is not identically 0 on any


irreducible component of X . In other words, f is an invertible meromorphic
section of OX . Then the associated Cartier divisor D(f ) is called principal.
The above considerations show that (L, s) → D(s) induces an isomorphism from
Pic(X) onto the group of Cartier divisors modulo principal Cartier divisors.

A.8.20. Let $D$ be a Cartier divisor on a variety $X$ over $K$ assumed to be regular
in codimension 1. Then the Weil divisor associated to $D$ is given by $\operatorname{cyc}(D) =
\operatorname{div}(s_D)$. If $D$ is given by the local data $(U_\alpha, f_\alpha)_{\alpha \in I}$, then the restriction of
$\operatorname{cyc}(D)$ to $U_\alpha$ is equal to $\operatorname{div}(f_\alpha)$. Here, restriction means that we replace all
prime divisors $Y$ of $X$ by $Y \cap U_\alpha$, leaving the coefficients unchanged.

A.8.21. Let X be a regular variety over K . Then D → cyc(D) is an isomor-


phism from the group of Cartier divisors onto the group of Weil divisors ([148],
Prop.II.6.11, Rem.II.6.11.1A). So on regular varieties, we do not distinguish be-
tween Cartier divisors and Weil divisors and we simply speak about divisors.
We claim that a divisor D is effective if and only if sD is a global section of
O(D). As a special case, we obtain that a non-zero rational function f on an
irreducible regular variety is regular if and only if the pole-divisor of f is zero.

Clearly, if $s_D$ is a global section, then $D = \operatorname{div}(s_D)$ is effective. On the other hand,
let $D$ be a prime divisor. For any $x \in X$, the local ring $\mathcal{O}_{X,x}$ is a unique factorization
domain ([197], Th.48, p.142). The ideal $I_x$ of $D$ in $\mathcal{O}_{X,x}$ is defined by $I_x := \{(U, f) \in
\mathcal{O}_{X,x} \mid f = 0 \text{ on } U \cap D\}$. If $x \notin D$, then we have $I_x = \mathcal{O}_{X,x}$. If $x \in D$, then we
choose an affine neighbourhood $V$ of $x$. Then the ideal $I$ of $D$ in $K[V]$ is a prime ideal
not containing smaller prime ideals (see A.3.2). So the same holds for $I_x$ in $\mathcal{O}_{X,x}$. We
conclude that $I_x$ is a principal ideal generated by a prime $\pi_x \in \mathcal{O}_{X,x}$. Since $I$ is finitely
generated, we deduce that $\pi_x$ is regular and generates $I$ in a neighbourhood $U_x$ of $x$. For
$x \notin D$, we set $U_x := X \setminus D$ and $\pi_x = 1$. Then $(U_x, \pi_x)_{x \in X}$ is a Cartier divisor of $X$.
The associated Weil divisor is clearly supported in the irreducible $D$. Moreover, for any
$x \in D$, $(U_x, \pi_x)$ generates the maximal ideal in $\mathcal{O}_{X,D}$. Hence the associated Weil divisor
is $D$. This proves the claim. Note that this proves also surjectivity of the cycle map.

A.8.22. On an irreducible regular complete variety $X$ over an algebraically closed
field $K$, it is sometimes more convenient to work with divisors than with sections.
For a divisor $D$ on $X$, we define the complete linear system
\[ |D| := \{D' \mid D \sim D' \geq 0\} \]
in the space of divisors. Then we have a surjective map
\[ \Gamma(X, O(D)) \setminus \{0\} \longrightarrow |D|, \qquad s \mapsto \operatorname{div}(s). \]
Using $K$ algebraically closed, two non-trivial global sections $s, s'$ have the same
divisor if and only if $s/s' \in K^\times$. In fact, $s/s'$ has to be a rational function without
poles, hence regular (see A.8.21) and thus constant (use A.6.15(c) and (d)). We
may identify the complete linear system $|D|$ with the projective linear space given
by the one-dimensional linear subspaces in $\Gamma(X, O(D))$. Hence we have
\[ \dim(|D|) = \dim(\Gamma(X, O(D))) - 1. \]
A base-point of $|D|$ is $x \in X$ with $x \in \operatorname{supp}(D')$ for all $D' \in |D|$. It is the
same as a base-point for $O(D)$ (see A.5.20). More generally, a subspace of the
projective space $|D|$ is called a linear system, but in this book we consider only
complete linear systems.
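The dimension formula can be checked on the projective line. The sketch below is our own illustration; it identifies a global section $s$ of $O(D)$ with the rational function $f = s/s_D$, so that $\operatorname{div}(s) = \operatorname{div}(f) + D$ by A.8.13, and this identification is an assumption of the example.

```latex
% Illustrative example: complete linear systems on the projective line.
Let $X = \mathbb{P}^1_K$ with affine coordinate $t$, let $\infty$ be the point at infinity and let
$D = d\,[\infty]$ with $d \geq 0$. Writing a global section as $s = f\,s_D$ with $f \in K(t)$, we get
\[
  \Gamma(X, O(D)) \cong \{\, f \in K(t) \setminus \{0\} \mid \operatorname{div}(f) + D \geq 0 \,\} \cup \{0\}
  = \{\, f \in K[t] \mid \deg(f) \leq d \,\},
\]
because $f$ may have no pole in $\mathbb{A}^1_K$ and a pole of order at most $d$ at $\infty$. Hence
\[
  \dim \Gamma(X, O(D)) = d + 1, \qquad \dim(|D|) = d .
\]
For $f = c \prod_{i=1}^{e} (t - a_i)$ with $e \leq d$ and $c \in K^\times$, the corresponding element of $|D|$ is
\[
  \operatorname{div}(f) + D = \sum_{i=1}^{e} [a_i] + (d - e)\,[\infty] .
\]
If $d \geq 1$, the divisors $d\,[0]$ and $d\,[\infty]$ both lie in $|D|$ and have disjoint supports, so
$|D|$ has no base-point.
```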
A.8.23. Let $D = (U_\alpha, f_\alpha)_{\alpha \in I}$ be a Cartier divisor on any $K$-variety $X$. The
support of $D$ is the closed subset
\[ \operatorname{supp}(D) := \bigcup_\alpha \{x \in U_\alpha \mid f_\alpha \notin \mathcal{O}_{X,x}^\times\}, \]
where $\mathcal{O}_{X,x}^\times$ denotes the group of invertible elements in $\mathcal{O}_{X,x}$. The Cartier divisor
$D$ is called effective if and only if $f_\alpha \in \mathcal{O}_X(U_\alpha)$ for all $\alpha \in I$. If $D$ is
effective, then it follows from Krull's principal ideal theorem ([157], p.449) that
$\operatorname{supp}(D)$ is of codimension 1 in $X$. For non-effective Cartier divisors, this does
not necessarily hold. However, if $X$ is regular, then we may identify Cartier
and Weil divisors (see A.8.21) and their supports agree, thus the support is of
codimension 1.
A.8.24. Let $X$ be a regular variety over $K$ with an irreducible closed subset $Y$
and a closed subset $Z$ of codimension 1 with $Y \not\subset Z$. As a consequence of
A.8.21 and A.8.23, we remark that $Y \cap Z$ has codimension 1 in $Y$ and hence
\[ \operatorname{codim}(Y \cap Z, X) = \operatorname{codim}(Y, X) + 1 \]
following from additivity of codimensions (use A.4.9).
A.8.25. We mention the following generalization. If Y and Z are irreducible closed sub-
sets of a smooth variety X over K , then
codim(Y ∩ Z, X) ≤ codim(Y, X) + codim(Z, X).
For a proof, we refer to [125], Sec. 8.2.

A.8.26. The pull-back of a Cartier divisor is not always well defined as a Cartier
divisor. Let $\varphi : X' \to X$ be a morphism of varieties over $K$ and let $D =
(U_\alpha, f_\alpha)_{\alpha \in I}$ be a Cartier divisor on $X$. We have to assume that the image of no
irreducible component of $X'$ is contained in $\operatorname{supp}(D)$. Then the pull-back of $D$
is the Cartier divisor
\[ \varphi^*(D) := (\varphi^{-1}(U_\alpha), f_\alpha \circ \varphi)_{\alpha \in I}. \]
Our assumptions imply that $f_\alpha \circ \varphi$ are well-defined rational functions on $X'$. It
is easy to check that $(O(\varphi^* D), s_{\varphi^*(D)}) = (\varphi^* O(D), \varphi^*(s_D))$.
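A minimal example of a pull-back, added as our own illustration; the morphism and the divisor are assumptions of the sketch.

```latex
% Illustrative example: pull-back of a point under the squaring map.
Let $K$ be algebraically closed and let $\varphi : \mathbb{A}^1_K \to \mathbb{A}^1_K$ be given by $t \mapsto t^2$.
Let $D = (\mathbb{A}^1_K, x - 1)$ be the Cartier divisor with $\operatorname{cyc}(D) = [1]$. The image of
$\varphi$ is not contained in $\operatorname{supp}(D) = \{1\}$, and
\[
  \varphi^*(D) = \bigl( \mathbb{A}^1_K,\ (x - 1) \circ \varphi \bigr) = \bigl( \mathbb{A}^1_K,\ t^2 - 1 \bigr).
\]
If $\operatorname{char}(K) \neq 2$, then $t^2 - 1 = (t - 1)(t + 1)$ has two simple zeros and
\[
  \operatorname{cyc}(\varphi^* D) = [1] + [-1].
\]
If $\operatorname{char}(K) = 2$, then $t^2 - 1 = (t - 1)^2$ and $\operatorname{cyc}(\varphi^* D) = 2\,[1]$.
```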

A.9. Intersection theory of divisors

The basic reference for intersection theory is the book of W. Fulton [125]. We
need only the first two chapters, namely properties of the intersection product of
a divisor with a closed subvariety. We collect here the most important results, not
going into full generality, assuming often that the ambient variety is smooth and
neglecting the theory of refined intersections which takes care about supports.
In this section, X is a variety over a field K .
A.9.1. We extend the concept of divisors to higher codimension. A cycle of
dimension $d$ is a formal linear combination of irreducible closed subvarieties of $X$
of dimension $d$ with coefficients in $\mathbb{Z}$. They form an abelian group $Z_d(X)$. The
elements of $Z(X) := \bigoplus_{d \in \mathbb{N}} Z_d(X)$ are called cycles. A basis of this abelian
group is formed by the irreducible closed subvarieties called prime cycles. If
$Z = \sum_Y n_Y Y$ is a cycle, then the prime cycles $Y$ with multiplicity $n_Y \neq 0$ are
called the components of $Z$. By definition, they are finite in number. Their union
is called the support of $Z$. Note that a Weil divisor is a cycle of pure codimension
1.
A.9.2. We assume that X is irreducible and Y is a prime divisor on X . The
goal is to define the order of a non-zero function f of X in Y . This will be
a generalization of the construction in A.8.11. In A.8.7, we have introduced the
local ring OX,Y . By A.8.8, it is an integral domain of Krull dimension 1 with
quotient field K(X).
For any non-zero f in the maximal ideal mY of OX,Y , the ring A = OX,Y /f OX,Y
has Krull dimension 0. Since the localization of a noetherian ring remains noe-
therian ([157], Th.7.10), it follows from A.8.8 that OX,Y is a noetherian ring. In
commutative algebra, a theorem of Krull says that the intersection of all prime
ideals in a commutative ring is the ideal of nilpotent elements ([157], Th.7.1).
Since A is a noetherian local ring whose maximal ideal m = m_Y /f O_{X,Y} is the
unique prime ideal, we conclude that m is nilpotent, i.e. the ideal m^n generated
by the n -fold products in m is 0 for some n ∈ N . Now we have a chain of ideals
0 = m^n ⊂ m^{n−1} ⊂ · · · ⊂ m ⊂ A. (A.10)

Since A is noetherian, the K(Y )-vector space m^j /m^{j+1} is finite dimensional,
where K(Y ) is equal to the residue field A/m . Then we define the order of f in
Y by
ord_Y (f ) := Σ_{j=0}^{n−1} dim_{K(Y )} (m^j /m^{j+1} ).
If f ∈ O_{X,Y} \ m_Y , then f is a unit in O_{X,Y} and we define ord_Y (f ) = 0.
Example A.9.3. If X is regular in codimension 1, then O_{X,Y} is a principal ideal
domain (see A.8.11). Then m_Y is generated by the local parameter π_Y . If f =
u π_Y^n is the prime factorization of f ∈ O_{X,Y} , u a unit, then m^j is generated by the
image of π_Y^j and m^n = 0. Moreover, multiplication by π_Y^j gives an isomorphism
of K(Y ) = A/m onto m^j /m^{j+1} . This proves that our new definition of the order
agrees with the one in A.8.11.
A.9.4. The concept of order is best understood in terms of composition series. Let
Λ be a ring and M be a Λ -module. Then M is said to have finite length if there
is a chain
0 = M_0 ⊊ M_1 ⊊ · · · ⊊ M_r = M
of submodules without possible refinement. Such a chain is called a composition
series and r is called its length. Then the Jordan–Hölder theorem ([157], p.108)
implies that all composition series have the same length. This number is called
the length of M and is denoted by ℓ_Λ (M ). If N is a submodule of M , then it
is easy to prove that M has finite length if and only if N and M/N have finite
length. In this case, we have ([157], Exercise 2, p.109)
ℓ_Λ (N ) + ℓ_Λ (M/N ) = ℓ_Λ (M ). (A.11)

If X is an irreducible K -variety with prime divisor Y , we apply this to the
integral domain and noetherian local ring Λ = O_{X,Y} . In the notation of A.9.2, we
consider the Λ -module m^j /m^{j+1} for any non-zero f ∈ O_{X,Y} . It is clear that a
Λ -submodule is the same as a K(Y )-subspace. Therefore
ℓ_Λ (m^j /m^{j+1} ) = dim_{K(Y )} (m^j /m^{j+1} ).
With an inductive application of (A.11) to the chain (A.10), we get
ord_Y (f ) = ℓ_Λ (Λ/f Λ).
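The length formula can be tried out on a curve that is not regular at the chosen point. The following is only an illustrative sketch: the cuspidal cubic, the point and the functions x, y are assumptions made for the example and do not occur in the text, and K is any field. Here Y is the origin, viewed as a prime divisor of the curve X , so that K(Y ) = K .

```latex
% Illustrative example (assumed data): X = Z(y^2 - x^3) in A^2_K, Y = {(0,0)},
% Lambda = O_{X,Y}. Both quotients below are already local rings.
\begin{align*}
\Lambda/x\Lambda &\cong K[x,y]/(x,\; y^2 - x^3) \cong K[y]/(y^2),
  & \operatorname{ord}_Y(x) &= \ell_\Lambda(\Lambda/x\Lambda) = 2,\\
\Lambda/y\Lambda &\cong K[x,y]/(y,\; y^2 - x^3) \cong K[x]/(x^3),
  & \operatorname{ord}_Y(y) &= \ell_\Lambda(\Lambda/y\Lambda) = 3.
\end{align*}
% The chain (A.10) for f = x: A = K[y]/(y^2), m = (y), m^2 = 0, hence n = 2.
\[
\operatorname{ord}_Y(x)
  = \dim_K(A/\mathfrak{m}) + \dim_K(\mathfrak{m}/\mathfrak{m}^2)
  = 1 + 1 = 2 .
\]
```

As a consistency check, the relation y^2 = x^3 in Λ gives 2 · ord_Y (y) = 6 = 3 · ord_Y (x), as it must once the order is additive on products.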
A.9.5. Let f, g be non-zero elements of O_{X,Y} . We claim
ord_Y (f g) = ord_Y (f ) + ord_Y (g). (A.12)
This follows from (A.11) and the exact sequence
0 −→ Λ/gΛ −(·f )→ Λ/f gΛ −→ Λ/f Λ −→ 0
for Λ = O_{X,Y} , where the first map is multiplication by f . Therefore we have a
unique extension of ord_Y to a function
ord_Y : K(X)^× → Z satisfying (A.12).

A.9.6. Recall that, if O_{X,Y} is regular, then we have
ord_Y (f + g) ≥ min{ord_Y (f ), ord_Y (g)}
for all f, g ∈ K(X)^× . Conversely, if this holds, then O_{X,Y} is regular (as a
discrete valuation ring, see Theorem A.8.5).
A.9.7. Let X be an irreducible K -variety and f a non-zero rational function on
X . There is a non-empty open subset U of X such that f is a unit in O_X (U ).
By definition of the local ring O_{X,Y} , we have ord_Y (f ) = 0 for all prime divisors
Y with Y ∩ U ≠ ∅. Therefore ord_Y (f ) ≠ 0 is possible only for the irreducible
components of X \ U of codimension 1. As they are finite in number, we get a
well-defined Weil divisor
div(f ) := Σ_Y ord_Y (f ) Y,
where Y ranges over all prime divisors of X . It is called the Weil divisor of f
(or principal Weil divisor). By A.9.5, we have
div(f g) = div(f ) + div(g)
for non-zero f, g ∈ K(X).
The assumption X irreducible was just made for simplicity. By the same construc-
tion, we can define the Weil divisor of a rational function, which is not identically
zero on any irreducible component of X .
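A small computation may make the Weil divisor of a function concrete. This is a sketch under assumptions that are not in the text: X is the affine line over K with coordinate x , the characteristic of K is different from 2 (so that the points 1 and −1 are distinct), and the functions f and g are chosen only for illustration.

```latex
% Illustrative example (assumed data): X = A^1_K, char K != 2.
% f is a unit on U = A^1 \ {0, 1, -1}; only these three points can occur.
\[
f = \frac{x^2 (x-1)}{(x+1)^3} \in K(x)^\times , \qquad
\operatorname{div}(f) = 2\,[0] + [1] - 3\,[-1] .
\]
% Additivity check with g = x + 1, div(g) = [-1]:
\[
\operatorname{div}(fg)
  = \operatorname{div}\!\Bigl(\frac{x^2 (x-1)}{(x+1)^2}\Bigr)
  = 2\,[0] + [1] - 2\,[-1]
  = \operatorname{div}(f) + \operatorname{div}(g) .
\]
```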
A.9.8. Let X be a K -variety. We consider the subgroup R(X) of Z(X) gen-
erated by div(f ), where f ranges over all non-zero rational functions on prime
cycles of X . Note that we view the Weil divisor div(f ) of Y as a cycle on X .
Two cycles are called rationally equivalent if their difference is in R(X). This
gives an equivalence relation on Z(X) denoted by ∼ . The quotient CH(X) :=
Z(X)/R(X) is called the Chow group. We grade it by dimension.
A.9.9. Let ϕ : X → X′ be a morphism of varieties with X complete (or more
generally a proper morphism). Then the image of a closed subset of X is a closed
subset of X′ (see A.6.15). Let Y be a prime cycle of X . Then ϕ(Y ) is a prime
cycle of X′ and we may view K(ϕ(Y )) as a subfield of K(Y ) by the map f′ ↦
f′ ◦ ϕ . Let
ϕ_∗ (Y ) := [K(Y ) : K(ϕ(Y ))] ϕ(Y ) if [K(Y ) : K(ϕ(Y ))] < ∞, and ϕ_∗ (Y ) := 0 else.
For any cycle Z = Σ_Y n_Y Y , we define the push-forward of Z by
ϕ_∗ (Z) := Σ_Y n_Y ϕ_∗ (Y ) ∈ Z(X′ ),
where Y ranges over all the prime cycles of X .

A.9.10. Note that [K(Y ) : K(ϕ(Y ))] < ∞ if and only if Y and ϕ(Y ) have
the same dimension. This follows from the fact that the dimension is equal to
the transcendence degree of the function field (see A.4.11). If K(Y ) is separable
over K(ϕ(Y )), then [K(Y ) : K(ϕ(Y ))] is the number of points in the fibre of a
generic point of ϕ(Y ) (see A.12.9).
A.9.11. Let ϕ : X → X′ be a surjective proper morphism of irreducible varieties
over K and let f be a non-zero rational function on X . We have seen in A.9.9
that K(X) is a field extension of K(X′ ). If this is a finite extension, then
ϕ_∗ (div(f )) = div(N (f )),
where N : K(X) → K(X′ ) is the norm. If the extension is infinite, the
push-forward is 0. For a proof, see [125], Prop.1.4.
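The norm formula can be checked by hand in a simple case. The sketch below uses assumed data that is not in the text: ϕ is the squaring map from the affine line with coordinate t to the affine line with coordinate s , which is finite and hence proper, and a is any element of K . The norm is computed as the determinant of multiplication by t − a on the basis 1, t of K(t) over K(s).

```latex
% Illustrative example (assumed data): phi : A^1_K -> A^1_K, t |-> s = t^2,
% so K(X') = K(s) is contained in K(X) = K(t) with [K(t) : K(s)] = 2.
\begin{align*}
\varphi_*(\mathbb{A}^1_K) &= [K(t):K(s)]\cdot\mathbb{A}^1_K = 2\,\mathbb{A}^1_K ,\\
N(t-a) &= \det\begin{pmatrix} -a & s \\ 1 & -a \end{pmatrix} = a^2 - s ,\\
\varphi_*\bigl(\operatorname{div}(t-a)\bigr) &= \varphi_*([a]) = [a^2]
  = \operatorname{div}(a^2 - s) = \operatorname{div}\bigl(N(t-a)\bigr) .
\end{align*}
```

The point [a] is K -rational with K -rational image, so the degree of the residue field extension in the push-forward is 1.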
A.9.12. Let ϕ : X → X′ be a proper morphism of varieties over K . Then A.9.11
shows that ϕ_∗ R(X) ⊂ R(X′ ) and so we get a push-forward map
ϕ_∗ : CH(X) −→ CH(X′ ),
mapping the class of a cycle to the class of its push-forward.
A.9.13. Let X be a K -variety and let L be a line bundle on X . For an invertible
meromorphic section s of L, the Weil divisor div(s) associated to s is defined
similarly as in A.8.12 and A.8.13 still holds.
A.9.14. Let Y be a prime cycle on a K -variety X . Then we define c_1 (L).Y ∈
CH(X) to be the rational equivalence class of div(s_Y ), where s_Y is any
invertible meromorphic section of L|_Y . We have seen in A.8.16 that such an s_Y
always exists. If s′_Y is another choice, then s_Y /s′_Y is a rational function and
hence div(s_Y ) and div(s′_Y ) are rationally equivalent.
By additivity, we define c_1 (L).Z for all cycles Z on X . If Z is rationally
equivalent to 0, then c_1 (L).Z = 0 (see [125], Cor.2.4.1). Therefore c_1 (L).α is
well-defined for α ∈ CH(X) by using representatives. Then the homomorphism
CH(X) −→ CH(X), α ↦ c_1 (L).α
is called the first Chern class operation of L. Clearly, it depends only on the
isomorphism class of the line bundle.
A.9.15. If L and L′ are line bundles on X and α ∈ CH(X), then
c_1 (L ⊗ L′ ).α = c_1 (L).α + c_1 (L′ ).α.
This follows immediately from A.8.13. Moreover, we have
c_1 (L). (c_1 (L′ ).α) = c_1 (L′ ). (c_1 (L).α) .
For a proof of commutativity, we refer to [125], Cor.2.4.2.

A.9.16. If ϕ : X → X′ is a proper morphism over K and L′ is a line bundle on
X′ , α ∈ CH(X), then
ϕ_∗ (c_1 (ϕ^∗ L′ ).α) = c_1 (L′ ).ϕ_∗ (α).
This is the projection formula. For a proof, see [125], Prop.2.5(c).
A.9.17. The following remark is for readers familiar with the basics of schemes. If X is
any scheme of finite type over the field K with irreducible components X_1 , . . . , X_r , then
the obvious generalization of (A.9) on page 546 proves dim(O_{X,X_j} ) = 0 . Since O_{X,X_j}
is noetherian (as in A.9.2), we conclude that O_{X,X_j} is of finite length ([157], Th.7.12) and
we may define the multiplicity of X_j in X by ℓ(O_{X,X_j} ) . The cycle of X is
cyc(X) := Σ_j ℓ(O_{X,X_j} ) · X_j .
If D is an effective Cartier divisor on X , then D may be viewed as a closed subscheme
of X given by the local equations. By construction, the cycle of D is just the Weil divisor
associated to D .
This is useful to handle base change to a field extension F over K . Let X_F be the
base change of X to F as a scheme. Then there is a unique base change homomorphism
Z(X) → Z(X_F ), Z ↦ Z_F , such that for any closed subscheme Y of X , we have
cyc(Y_F ) = cyc(Y )_F . To see this, define the base change first for prime cycles in the
obvious way and then extend by linearity. By [125], Lemma A.4.1, it has the required
property (the argument is as in [125], Lemma 1.7.1).
It is easy to see that the base change of cycles descends to the Chow groups. For α ∈
CH(X) and a line bundle L on X , we claim that c1 (LF ).αF is equal to the base change
of c1 (L).α . To see this, we may assume that α is prime and even equal to X . By A.8.18,
it is enough to show that cyc(DF ) = cyc(D)F for a Cartier divisor D on X . This is a
local question, so we may assume D effective and the claim follows from the above.

Note that for an irreducible closed subvariety Y of a variety X over K , the base change
to F as a scheme may be non-reduced and hence may be different from the base change as
a variety. Only the use of schemes leads to the above compatibilities.

A.9.18. Now let X be a smooth variety over K . Then Cartier divisors and Weil
divisors are the same (cf. A.8.21). It follows immediately that we have an
isomorphism
Pic(X) −→ CH^1 (X), cl(L) ↦ c_1 (L).X.
Hence we get an intersection theory with divisors: For a divisor D on X and a
cycle Z on X , the intersection product is
D.Z := c_1 (O(D)).Z ∈ CH(X).
Then the intersection product is bilinear, compatible with rational equivalence and
satisfies commutativity for divisors. If ϕ : X → X′ is a morphism of smooth
varieties, then we have a pull-back ϕ^∗ (D′ ) := c_1 (ϕ^∗ O(D′ )).X ∈ CH(X) of

a divisor D′ on X′ . If ϕ is a proper morphism, then we have the projection
formula
ϕ_∗ (ϕ^∗ (D′ ).Z) = D′ .ϕ_∗ (Z)
for a cycle Z on X and a divisor D′ on X′ .
Proposition A.9.19. Let X be a smooth variety over K and let D be a prime
divisor which is smooth over K . Then the self-intersection D.D ∈ CH(X) is
represented by the divisor of any invertible meromorphic section of the normal
bundle ND/X .
Proof: It is enough to prove O(D)|_D ≅ N_{D/X} . Let D be equal to (U_α , f_α )_{α∈I} as a
Cartier divisor. Since D is effective, we have f_α ∈ O(U_α ) (see A.8.21). The conormal
bundle N^∗_{D/X} is generated by df_α as a subbundle of T^∗_X over D ∩ U_α (use A.7.27). In
particular, df_α is a non-vanishing section defining a trivialization of the line bundle N^∗_{D/X}
on D by mapping λ · df_α to λ . The transition function g_{αβ} is computed by
g_{αβ} · df_α = df_β = (f_β /f_α ) · df_α
on D ∩ U_α ∩ U_β . Hence N^∗_{D/X} has the same transition functions g_{αβ} = f_β /f_α as
O(−D)|_D , proving the claim. □

A.9.20. Let X be a smooth variety over K . For some purposes, it is necessary
to define the intersection product of a divisor D with a prime cycle Y of
codimension p as an honest cycle (not only as a rational equivalence class as above).
However, this is only possible if Y is not contained in the support supp(D) of
D . Under this assumption, we see that the invertible meromorphic section s_D of
O(D) corresponding to D restricts to an invertible meromorphic section s_D |_Y
of O(D)|_Y . Then the proper intersection product of D and Y is the cycle
D.Y := div(s_D |_Y ) ∈ Z^{p+1} (X).
Clearly, the rational equivalence class of the proper intersection product induces
the intersection product in the Chow group. By additivity, we define the proper
intersection product of a Cartier divisor D and a cycle Z under the hypothesis
that no component of Z is contained in supp(D). Then the proper intersection
product is of the form D.Z = Σ_W n_W W, where W is ranging over all prime
cycles. The number n_W is called the intersection multiplicity of W in D.Z .
If additionally Z = D′ is also a divisor, we can prove D.D′ = D′ .D as an identity
of cycles ([125], Th.2.4).
A.9.21. Let X be a smooth variety over K . We consider a prime cycle Y not
contained in the support of a divisor D . For a prime cycle W different from the
irreducible components of supp(D) ∩ Y , the intersection multiplicity of W in
the proper intersection product D.Y is obviously zero. Now we assume additionally
that D is effective. Then we claim that the intersection multiplicity of an
irreducible component W of supp(D) ∩ Y is at least 1.

To prove this, note that D is given as a divisor by a local equation γ ∈ O_X (U ) for some
open subset U intersecting W . Then the intersection multiplicity of W in D.Y is the
order of γ in O_{Y,W} . Since γ vanishes on W , it is contained in the maximal ideal m_{Y,W}
of O_{Y,W} . Hence the length of O_{Y,W} /γO_{Y,W} is at least 1 .
As a corollary of the proof, we see that the intersection multiplicity of W in D.Y is 1
if and only if the maximal ideal m_{Y,W} of O_{Y,W} is generated by a local equation of
D . In this case, O_{Y,W} is a regular local ring since it follows that m_{Y,W} /m_{Y,W}^2 is
a one-dimensional vector space over K(W ) = O_{Y,W} /m_{Y,W} and the Krull dimension
of O_{Y,W} is codim(W, Y ) = 1 (cf. A.8.8).
Example A.9.22. Assume that D and Y are both smooth. Then the intersection
of D and Y is called transversal if T_{Y,y} ⊄ T_{D,y} for all y ∈ D ∩ Y . We claim
that the intersection multiplicity of an irreducible component W of D ∩ Y in D.Y
is 1.

To prove it, we may assume that K is algebraically closed because the intersection product
is compatible with base change (see A.9.17). We may assume that D is given by a single
equation γ ∈ O_X (X) . Let γ = uπ^n for a unit u in O_{Y,W} and a local parameter π
in O_{Y,W} (which is a principal ideal domain since Y is smooth). We know that n =
ord_W (γ|_Y ) ≥ 1 . There is a y ∈ W such that π vanishes in y and u is regular in
y . Using Zariski’s definition of the tangent space (see A.7.8), T_{Y,y} ⊄ T_{D,y} means that
γ ∈ m_{Y,y} \ m_{Y,y}^2 . It follows n = 1 proving the claim.
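To see the role of transversality, one can compare a transversal and a tangential intersection in the affine plane. The following is an illustrative sketch with assumed data that is not in the text: X is the affine plane with coordinates x, y , the divisor D is the x -axis with equation γ = y , and W is the origin. Both curves are isomorphic to the affine line via the coordinate x , so O_{Y,W} is the local ring of the line at 0 with local parameter x .

```latex
% Illustrative example (assumed data): X = A^2_K, D = div(y), W = {(0,0)}.
\begin{align*}
Y_1 &= Z(y - x): & \gamma|_{Y_1} &= x ,
   & D.Y_1 &= \operatorname{ord}_W(x)\, W = W ,\\
Y_2 &= Z(y - x^2): & \gamma|_{Y_2} &= x^2 ,
   & D.Y_2 &= \operatorname{ord}_W(x^2)\, W = 2\,W .
\end{align*}
% T_{Y_1,W} is the line y = x, not contained in T_{D,W} = {y = 0}: transversal.
% T_{Y_2,W} is the line y = 0, equal to T_{D,W}: not transversal.
```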

Example A.9.23. Let C be a smooth curve over K . Then C × C is a smooth
surface (see A.7.17). We claim that for x, y ∈ C(K), we have
(C × {y}).({x} × C) = {(x, y)}
as a proper intersection product. By Example A.9.22, this follows from transver-
sality of the intersection. Let ∆ be the diagonal of C × C . Similarly, we deduce
that
∆.({x} × C) = {(x, x)} = (C × {x}).∆
for x ∈ C(K).
A.9.24. Let X be a complete K -variety. For Z ∈ Z_0 (X) of the form Z =
Σ_Y n_Y Y with Y ranging over the irreducible closed subvarieties of dimension
0, we define the degree of Z by
deg(Z) := Σ_Y n_Y [K(Y ) : K].
Note that K(Y ) is a finite-dimensional field extension of K (see A.9.10). The
unique map π : X → A^0_K is a proper morphism. The degree is characterized by
π_∗ (Z) = deg(Z) · A^0_K .
It follows from A.9.11 that deg(Z) = 0 for Z rationally equivalent to 0.
Therefore the degree may be viewed as an additive homomorphism on CH_0 (X).

Let X be a smooth complete variety over K of pure dimension d and let D_1 , . . . ,
D_d be divisors on X , not necessarily distinct. We define their intersection
number by
D_1 · · · D_d := deg(D_1 . . . D_d ) ∈ Z.
Here D_1 . . . D_d denotes their intersection product in CH_0 (X). Clearly, the
intersection number depends only on the rational equivalence classes in the Chow
group and it is invariant under permutation of the divisors. In other words, the
intersection number induces a symmetric multilinear form on the Chow group with
values in Z.
If ϕ : X′ → X is a surjective morphism of irreducible smooth complete varieties
over K , both of dimension d , then we have
ϕ^∗ D_1 · · · ϕ^∗ D_d = [K(X′ ) : K(X)] D_1 · · · D_d (A.13)
for divisors D_1 , . . . , D_d on X . This follows from
ϕ_∗ (X′ ) = [K(X′ ) : K(X)] X ∈ Z_d (X)
and the projection formula.
If D1 , . . . , Dd are effective divisors, it may happen that D1 · · · Dd is negative.
But, if D1 , . . . , Dd−1 are base-point free and Dd ≥ 0, then A.9.33 below shows
that
D1 · · · Dd ≥ 0.
A.9.25. Let F/K be a field extension and D1 , . . . , Dd divisors on the smooth
complete K -variety X of pure dimension d . Using the same set of equations, we
get a divisor (Dj )F on the base change XF and we claim that
(D1 )F · · · (Dd )F = D1 · · · Dd .
By A.9.17, the intersection product is compatible with base change of cycles. Now the
claim follows from the general fact that the degree of a zero-dimensional cycle is invariant
under base change. To see this, it is enough to consider Y ∈ Z0 (X) prime and even
Y = X . Then the base change Y_F is the cycle associated to Spec(K(Y ) ⊗_K F ) . Since
B := K(Y ) ⊗_K F is a finite-dimensional F -algebra, it is the product of the local rings in
the prime ideals ([157], Th.7.13) and we conclude
deg(Y ) = [K(Y ) : K] = [B : F ] = Σ_{p∈Spec(B)} [B_p : F ] = deg(Y_F ),
where the last step follows from [B_p : F ] = ℓ(B_p ) [B/p : F ] (see [125], Lemma A.1.3).
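Two small cases show how the degree is preserved, once by splitting into several points and once through a multiplicity. Both are sketches with assumed data that is not in the text: in the first, Y is the closed point of the affine line over Q defined by x^2 + 1; in the second, K = F_p(t), Y is the closed point defined by x^p − t , and F = K(s) with s^p = t , so that the base change is non-reduced.

```latex
% Illustrative examples (assumed data), in the notation B = K(Y) \otimes_K F.
\begin{align*}
K=\mathbb{Q},\ F=\mathbb{C}: \quad
  & B = \mathbb{Q}(i)\otimes_{\mathbb{Q}}\mathbb{C} \cong \mathbb{C}\times\mathbb{C},
  & \deg(Y_F) &= 1\cdot 1 + 1\cdot 1 = 2 = [\mathbb{Q}(i):\mathbb{Q}],\\
K=\mathbb{F}_p(t),\ F=K(s),\ s^p=t: \quad
  & B \cong F[x]/\bigl((x-s)^p\bigr),
  & \deg(Y_F) &= \ell(B)\cdot[B/(x-s):F] = p\cdot 1 = p = [K(s):K].
\end{align*}
```

In the second case B is local with a single prime ideal, and the whole degree is carried by the length ℓ(B) = p .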

A.9.26. Let X be a projective variety over K and let L be an ample line bundle
on X . For a prime cycle Z ∈ Zd (X), the degree of Z with respect to L is
degL (Z) := deg(c1 (L) . . . c1 (L).Z) ∈ Z,

where c_1 (L) occurs d times. By additivity, we extend the degree to all cycles. In
particular, we define the degree of X with respect to L by
deg_L (X) := Σ_j deg_L (X_j ),
where X_j is ranging over all irreducible components of X . Clearly, the degree of
a cycle depends only on its rational equivalence class, hence we may view it as an
additive homomorphism
deg_L : CH(X) −→ Z.
Example A.9.27. For affine space, we claim that CH^1 (A^n_K ) = {0}.

Let X be a prime divisor of A^n_K . First, we show that the ideal I(X) of X in K[x] is
generated by an irreducible polynomial. We choose any non-zero f (x) ∈ I(X) . Clearly,
f (x) is not a constant. By considering the prime factorization of f (x) and since I(X)
is a prime ideal, we see that I(X) contains an irreducible polynomial g(x) . Since X
has codimension 1 , we have X = Z(g) and Hilbert’s Nullstellensatz shows that g(x)
generates I(X) . Next, we have to prove X = div(g) . To see it, note that g is invertible
outside of X . This proves div(g) = mX for some m ∈ N . The maximal ideal of the
local ring O_{A^n_K ,X} is generated by g , proving X = div(g) . Since this holds for any prime
divisor, we get CH^1 (A^n_K ) = {0} .
As a corollary of A.9.18, we see that any line bundle on A^n_K is isomorphic to the
trivial bundle, i.e. Pic(A^n_K ) = {0}.
Example A.9.28. We deduce from the above that
Pic(P^n_K ) ≅ CH^1 (P^n_K ) ≅ Z.

To see it, let Y be a prime divisor of P^n_K . We choose a standard affine open subset U_j =
{x_j ≠ 0} with U_j ∩ Y ≠ ∅ . By the example above, Y ∩ U_j = div(f ) for some
f ∈ O_{P^n_K} (U_j ) . Using U_j ≅ A^n_K , f comes from an irreducible polynomial of degree d in
n variables. Let f̃ be the homogenization of f , i.e.
f̃(x_0 , . . . , x_n ) = x_j^d f (x_0 /x_j , . . . , x_{j−1} /x_j , x_{j+1} /x_j , . . . , x_n /x_j ).
We may view f̃ as a global section of O_{P^n_K} (d) (see A.6.6). We claim that div(f̃) = Y .
This follows immediately from div(f ) = Y ∩ U_j and the non-vanishing of f̃ outside
of U_j . So we conclude that d ↦ c_1 (O_{P^n_K} (d)).P^n_K is a surjective homomorphism of Z
onto CH^1 (P^n_K ) . Since the line bundles O_{P^n_K} (d) are pairwise not isomorphic (compare the
dimensions of Γ(P^n_K , O_{P^n_K} (±d)) ) and since Pic(X) ≅ CH^1 (X) (see A.9.18), we get the
claim.
For a multiprojective space P_K := P^{n_1}_K × · · · × P^{n_r}_K (see A.6.13), we can similarly
show that Pic(P_K ) ≅ Z^r . We simply have to replace homogeneous polynomials
in the above consideration by multihomogeneous polynomials.
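The homogenization step can be followed on a plane conic. This is only a sketch with assumed data that is not in the text: n = 2, j = 0, the affine coordinates on U_0 are u = x_1 /x_0 and v = x_2 /x_0 , and f is the equation of a parabola.

```latex
% Illustrative example (assumed data): Y = closure in P^2_K of Z(v - u^2).
\[
f(u,v) = v - u^2 , \quad d = 2 , \qquad
\tilde f(x_0,x_1,x_2)
  = x_0^{2}\, f\!\Bigl(\frac{x_1}{x_0}, \frac{x_2}{x_0}\Bigr)
  = x_0 x_2 - x_1^{2} .
\]
% On the hyperplane x_0 = 0 the section restricts to -x_1^2, which is not
% identically zero, so div(\tilde f) has no component outside U_0 and
\[
\operatorname{div}(\tilde f) = Y , \qquad
[Y] = c_1\bigl(O_{\mathbb{P}^2_K}(2)\bigr).\mathbb{P}^2_K \in CH^1(\mathbb{P}^2_K) .
\]
```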


Proposition A.9.29. Let X be a variety over K , then CH^p (X) ≅ CH^p (X ×
A^n_K ), where the isomorphism is given by mapping a prime cycle Y on X to the
prime cycle Y × A^n_K on X × A^n_K (see A.4.11).

Proof: For simplicity, we only prove surjectivity (see [125], Th.3.3, for a complete proof). By
induction, we may assume that n = 1 . Let Y′ be a prime cycle on X × A^1_K . We have to
prove that Y′ is rationally equivalent to a linear combination of cycles of the form Y × A^1_K .
Replacing X by the closure of p_1 (Y′ ) , where p_1 denotes the first projection, we may
assume that X is the closure of p_1 (Y′ ) . If dim(Y′ ) > dim(X) , then Y′ = X × A^1_K . So we may assume
dim(Y′ ) = dim(X) . Then Y′ is a divisor in X × A^1_K . Let U be a non-empty affine open
subset of X . As a closure of an irreducible subset, X is irreducible and hence the same
holds for U . We consider the ideal I(Y′ ∩ U′ ) in K[U′ ] = K[U ][x] for U′ = U × A^1_K .
The ideal in K(U )[x] generated by I(Y′ ∩ U′ ) is generated by a polynomial f with
coefficients in K(U ) . By shrinking U , we may assume that f ∈ K[U ][x] and that f
generates I(Y′ ∩ U′ ) in K[U′ ] . This shows that div(f ) and Y′ agree on U′ , hence
Y′ = div(f ) + Σ_j n_j (Y_j × A^1_K ) , where Y_j is ranging over the irreducible components
of X \ U (of codimension 1 ). This proves surjectivity. □

Remark A.9.30. In particular, we have CH(AnK ) = {0}. Note that we need only
surjectivity in the above statement.

Example A.9.31. We claim that we have an isomorphism CH(P^n_K ) ≅ Z[x]/(x^{n+1} )
of abelian groups. This can be seen as follows. A (d + 1)-dimensional subspace
of K^{n+1} is the same as the intersection of the kernels of p = n − d linearly
independent linear forms ℓ_1 (x), . . . , ℓ_p (x). Then L_d = Z({ℓ_1 (x), . . . , ℓ_p (x)})
is called a d -dimensional projective linear subspace of P^n_K . If p = 1, then we
call it a hyperplane. Since the hyperplanes intersect transversally, we easily get
L_d = div(ℓ_1 ) . . . div(ℓ_p ) as a proper intersection product (use Example A.9.22).
Since all hyperplanes are rationally equivalent, the same holds for all projective
linear subspaces of a given dimension d ∈ {0, . . . , n}. We prove that their classes form a basis
of CH(P^n_K ). Let Y be a prime cycle on P^n_K of dimension d . Let U_j = {x_j ≠ 0}
be a standard affine open subset of P^n_K intersecting Y . By Remark A.9.30, Y ∩ U_j
is rationally equivalent to 0 on U_j . This proves that Y is rationally equivalent to
a cycle contained in P^n_K \ U_j ≅ P^{n−1}_K . By induction on n , we see that Y is
rationally equivalent to a linear combination of L_0 , . . . , L_{n−1} . It remains to prove
that the classes of L_0 , . . . , L_n are linearly independent in CH(P^n_K ). By
dimensionality reasons, it is enough to show that no class is zero. This follows from
deg_{O_{P^n_K} (1)} (L_j ) = 1, which is obvious from transversal intersection.

We use the above isomorphism to define a ring structure on CH(PnK ). The mul-
tiplication is called the intersection product of cycles on PnK (see [125] for an
intersection product on any smooth variety). Note that this extends the intersection

product of divisors. The intersection product is determined by
L_j .L_k = L_{j+k−n} if j + k ≥ n, and L_j .L_k = 0 else. (A.14)
On P^n_K , the degree of a cycle Z is always with respect to O_{P^n_K} (1) and is denoted
by deg(Z). If Z ∈ Z_d (P^n_K ), then Z is rationally equivalent to deg(Z) L_d . This
follows immediately from deg(L_d ) = 1. If Z′ ∈ Z_{d′} (P^n_K ) and if d + d′ ≥ n , then
we get Bézout’s theorem
deg(Z.Z′ ) = deg(Z) deg(Z′ ).
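The product rule and Bézout's theorem can be read off in the truncated polynomial ring. The sketch below rests on an assumption that the text does not spell out: the isomorphism is taken to send the class of a hyperplane to x , so that L_j corresponds to x^{n−j} . The conic C and the line L in the projective plane are chosen only for illustration.

```latex
% Illustrative computation, assuming L_j <-> x^{n-j} in Z[x]/(x^{n+1}).
\[
L_j . L_k \;\longleftrightarrow\; x^{\,n-j}\, x^{\,n-k} = x^{\,n-(j+k-n)} ,
\qquad\text{which is } 0 \text{ in } \mathbb{Z}[x]/(x^{n+1}) \text{ when } j+k<n .
\]
% n = 2: a conic C has class deg(C) L_1 = 2x, a line L has class x.
\[
[C].[L] = (2x)(x) = 2x^{2} = 2\,[L_0] , \qquad
\deg(C.L) = 2 = \deg(C)\,\deg(L) .
\]
```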
Example A.9.32. If Z is an effective divisor on PnK , then Example A.9.28 shows
that Z = div(f˜) for some homogeneous polynomial f˜(x0 , . . . , xn ) of degree
d . By rational equivalence, Z has the same degree as d · div(x0 ) and hence
deg(Z) = d .
A.9.33. We mention some positivity properties of the intersection product. Let X
be a variety over K with a line bundle L and let Z be a cycle on X with non-
negative multiplicities. Then Z is called an effective cycle. If L is generated by
global sections, then c1 (L).Z may be represented by an effective cycle. To see it,
we may assume that Z is a prime cycle. By assumption, there is a global section
s whose restriction to Z is not identically zero. Then div(s|Z ) is an effective
representative of c1 (L).Z .
We conclude that degL (Z) ≥ 0. In particular, this holds for very ample line
bundles. But for L ample and Z a non-zero effective cycle, we claim even that
degL (Z) > 0.

For a proof, we may assume that L is very ample (replace L by a suitable power). But
then, we may assume that X is a closed subvariety of PnK and L = OPnK (1)|X . Then
degL (Z) is the degree of Z in PnK . We may assume that Z is a prime cycle of dimension
d . Let H be a hyperplane in PnK not containing the prime cycle Z . If d > 0 , then Z is
not contained in the affine space PnK \ H (see A.6.15). Therefore H ∩ Z is not empty and,
since the multiplicities of H.Z are at least 1 in every irreducible component of H ∩ Z (see
A.9.21), we get the claim by induction on d .

A.9.34. Next, we introduce algebraic equivalence. It is a coarser equivalence than
rational equivalence and it is similar to homotopy in homology. We use it only for
divisors.
Let X be a variety over K and let L_1 , L_2 be line bundles on X . We say that L_1
and L_2 are algebraically equivalent if there is an irreducible smooth variety T
called the parameter space and a line bundle L on X × T such that
L_1 ≅ L|_{X_{t_1}} and L_2 ≅ L|_{X_{t_2}}
for some t_1 , t_2 ∈ T (K). Here, we identify the fibres X_{t_j} := X × {t_j } with X .
This is possible because of the K -rationality of t_j .

It is easy to show that it is indeed an equivalence relation. To prove transitivity,
we pass to the product of parameter spaces. Clearly, isomorphic line bundles are
algebraically equivalent, hence algebraic equivalence makes sense on Pic(X).
A.9.35. Let L1 , L2 , M be line bundles on X such that L1 is algebraically equiv-
alent to L2 . Then it is easy to see that L1 ⊗ M is algebraically equivalent to
L_2 ⊗ M . In particular, the elements in Pic(X) algebraically equivalent to 0 form
a subgroup of Pic(X) denoted by Pic^0 (X).
A.9.36. Let ϕ : X′ → X be a morphism of varieties over K and let L_1 , L_2 be
algebraically equivalent line bundles on X . Then ϕ^∗ L_1 is algebraically equivalent
to ϕ^∗ L_2 . In the notation of the definition in A.9.34, we use the same parameter
space and the pull-back of L to X′ × T .
A.9.37. Let L_1 , L_2 be algebraically equivalent ample line bundles on a projective
variety X over K . Then we claim deg_{L_1} (X) = deg_{L_2} (X).

By passing to a suitable tensor power, we may assume that L_1 and L_2 are both very
ample (see A.6.10). Then deg_{L_j} (X) is determined by the leading coefficient of the Hilbert
polynomial of L_j (see A.10.33). By definition, we have a line bundle L on X × T with
T an irreducible smooth variety over K such that L_{t_1} ≅ L_1 , L_{t_2} ≅ L_2 for some t_1 , t_2 ∈
T (K) . Since the projection of X × T onto X is flat (see A.12.11) and X is complete,
the Hilbert polynomial of L_t does not depend on the choice of t ∈ T (see A.10.35). This
proves the claim.
Chow’s lemma ([137], Th.5.6.1) says that every complete K -variety X is the image
of a birational surjective morphism ϕ : X′ → X from a projective K -variety X′ .
By the projection formula, the invariance of the degree under algebraic equivalence
holds more generally for complete varieties over K .
A.9.38. Let X be a complete K -variety with line bundles L1 , . . . , Lr and let
Z ∈ Zr (X). Then deg(c1 (L1 ) . . . c1 (Lr ).Z) depends only on the algebraic
equivalence classes of L1 , . . . , Lr .

For L′_1 algebraically equivalent to L_1 , it is enough to show that
deg(c_1 (L_1 ) . . . c_1 (L_r ).Z) = deg(c_1 (L′_1 ).c_1 (L_2 ) . . . c_1 (L_r ).Z) .
By multilinearity of the intersection product and by A.6.10, we may assume L_1 , L′_1 both
ample. Let us choose a representative Y of c_1 (L_2 ) . . . c_1 (L_r ).Z . Clearly, we may assume
that Y is a prime cycle, i.e. an irreducible curve. By A.9.36, we know that L_1 |_Y is
algebraically equivalent to L′_1 |_Y . Using A.9.37, we get
deg(c_1 (L_1 ).Y ) = deg(c_1 (L_1 |_Y )) = deg(c_1 (L′_1 |_Y )) = deg(c_1 (L′_1 ).Y ).

A.9.39. Let X be a smooth K -variety. Since Pic(X) is isomorphic to CH^1 (X)
(see A.9.18), we can translate the above theory to divisors. Two divisors D_1 and
D_2 on X are algebraically equivalent if O(D_1 ) and O(D_2 ) are algebraically
equivalent. We have the following properties:

(a) If D1 and D2 are rationally equivalent, then they are algebraically equiva-
lent.
(b) If D1 and D2 are algebraically equivalent and D is a further divisor on
X , then D1 + D is algebraically equivalent to D2 + D .
(c) Algebraic equivalence is preserved by pull-back of divisors with respect to
a morphism of smooth varieties over K .
(d) Algebraically equivalent divisors on a smooth projective curve have the
same degree.
A.9.40. Let X be an irreducible smooth projective curve over an algebraically
closed field. Then the converse of (d) also holds, i.e. if deg(D1 ) = deg(D2 ), then
D1 and D2 are algebraically equivalent.

To prove it, note that D_1 and D_2 are sums of ±[x] for some x ∈ X . Since they have the
same degree and algebraic equivalence is compatible with sum of divisors, we may assume
D_1 = [x_1 ], D_2 = [x_2 ] . Note that X × X is smooth (see A.7.17) and the diagonal ∆
is a divisor on X × X . Moreover, for any x ∈ X , we have O(∆)|_{X×{x}} ≅ O([x]) .
This follows from transversality of X × {x} and ∆ in X × X . So we choose T := X as
parameter space, t_1 := x_1 , t_2 := x_2 and L = O(∆) to get the claim.

A.9.41. Two line bundles L_1 , L_2 on a smooth complete variety X over K are
called numerically equivalent if c_1 (L_1 ) · α = c_1 (L_2 ) · α for every α ∈ CH_1 (X).
By A.9.38, algebraically equivalent line bundles are numerically equivalent. As
above, we define divisors to be numerically equivalent if and only if the associated
line bundles are numerically equivalent.

A.10. Cohomology of sheaves

This section gives a brief introduction to sheaf cohomology on varieties. For more
details, we refer to [148], Ch.III. First, we need some additional constructions of
sheaves. We consider a topological space T . On T , all (pre-)sheaves will be
(pre-)sheaves of abelian groups. Later, we pass to a K -variety X .
A.10.1. For a presheaf F on T , there is a canonical way to associate a sheaf F^+
on T and a homomorphism ι : F → F^+ such that for any sheaf G on T and any
homomorphism ϕ : F → G , there is a unique homomorphism ϕ^+ : F^+ → G
with ϕ = ϕ^+ ◦ ι . Then F^+ is called the sheaf associated to F . For the easy
construction, we refer to [148], Prop.-Def.II.1.2.
A.10.2. Let F be a sheaf on T . A subsheaf of F is a sheaf G such that G(U )
is a subgroup of F(U ) for all open subsets U of T and such that the restriction
maps of G are induced by the restriction maps of F .

Example A.10.3. If ϕ : F → F′ is a homomorphism of sheaves, then the kernel
of ϕ is the subsheaf of F given by ker(ϕ)(U ) := ker(ϕ_U ) for all open subsets U .
However, the abelian groups ϕ_U (F(U )) form only a subpresheaf G of F′ . We define
the image im(ϕ) = ϕ(F) to be the sheaf associated to the presheaf U ↦ ϕ_U (F(U ))
with U ranging over the open subsets of T . We call ϕ surjective if ϕ(F) = F′ .
In general, this does not mean that ϕ_U (F(U )) = F′ (U ) for all open subsets U .
A.10.4. A sequence
· · · −→ F^{p−1} −(ϕ^{p−1})→ F^p −(ϕ^p)→ F^{p+1} −→ · · ·
of sheaves is said to be exact if all maps ϕ^p are homomorphisms of sheaves such
that im(ϕ^{p−1} ) = ker(ϕ^p ) for all p . An exact sequence of the form
0 −→ F′ −(ϕ′)→ F −(ϕ″)→ F″ −→ 0
is called a short exact sequence of sheaves.
Example A.10.5. Again, we consider a sheaf F on T and a subsheaf F′ . For
an open subset U of T , let G(U ) := F(U )/F′ (U ). Then G with the restriction
maps induced by F is a presheaf of abelian groups on T . The sheaf associated to
G is called the quotient sheaf F/F′ and we get a short exact sequence
0 −→ F′ −→ F −→ F/F′ −→ 0.
Conversely, for any short exact sequence as in A.10.4, we have a canonical
isomorphism F/ϕ′ (F′ ) ≅ F″ . For any homomorphism ϕ : F_1 → F_2 of sheaves,
the sheaf coker(ϕ) := F_2 /ϕ(F_1 ) is called the cokernel of ϕ .
A.10.6. Let F be a sheaf on T . We fix an open covering U = (U_α )_{α∈I} of X .
We fix a well-ordering on I . This will not cause a problem later, because all our
coverings will be finite. For p ∈ N , we define an abelian group
C^p (U, F) := ∏_{i_0 <···<i_p} F(U_{i_0} ∩ · · · ∩ U_{i_p} ).
The coboundary map is a homomorphism
d^p : C^p (U, F) −→ C^{p+1} (U, F).
For σ ∈ C^p (U, F), the components of d^p (σ) are given by
(d^p (σ))_{i_0 ,...,i_{p+1}} = Σ_{k=0}^{p+1} (−1)^k ρ(σ_{i_0 ,...,i_{k−1} ,i_{k+1} ,...,i_{p+1}} ),
where ρ is the restriction homomorphism of the sheaf F from U_{i_0} ∩ · · · ∩ U_{i_{k−1}} ∩
U_{i_{k+1}} ∩ · · · ∩ U_{i_{p+1}} to U_{i_0} ∩ · · · ∩ U_{i_{p+1}} . Then we get the Čech
complex
0 −(d^{−1})→ C^0 (U, F) −(d^0)→ C^1 (U, F) −(d^1)→ · · ·

of abelian groups. Complex means $d^p \circ d^{p-1} = 0$ for all $p$. The elements in
the kernel of $d^p$ are called Čech cocycles with respect to $\mathcal{U}$ of degree $p$ and the
elements in the image of $d^{p-1}$ are Čech boundaries with respect to $\mathcal{U}$ of degree
$p$. The $p$th Čech cohomology group with respect to $\mathcal{U}$ is
\[ \check{H}^p(\mathcal{U}, \mathcal{F}) := \ker(d^p)/\operatorname{im}(d^{p-1}). \]
By the sheaf axioms, we get
\[ \check{H}^0(\mathcal{U}, \mathcal{F}) = \mathcal{F}(X). \]
This is clear since a Čech cocycle of degree 0 with respect to $\mathcal{U}$ is a collection of
local sections which agree on overlaps.
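The definitions of A.10.6 can be tried out on a toy case. The following Python sketch is an illustration only and not part of the argument. It assumes a constant sheaf with values in Q on a covering all of whose non-empty finite intersections are connected, so that C^p is a product of copies of Q indexed by the p-simplices of the nerve and every restriction map is the identity. The function names and the two example nerves (three arcs covering a circle, three open sets covering a disc) are hypothetical.

```python
from fractions import Fraction


def simplices(nerve, p):
    """Index tuples i0 < ... < ip with non-empty intersection, in sorted order."""
    return sorted(s for s in nerve if len(s) == p + 1)


def coboundary(nerve, p):
    """Matrix of d^p: rows are (p+1)-simplices, columns are p-simplices."""
    src, dst = simplices(nerve, p), simplices(nerve, p + 1)
    col = {s: j for j, s in enumerate(src)}
    mat = [[Fraction(0)] * len(src) for _ in dst]
    for row, t in enumerate(dst):
        for k in range(len(t)):
            face = t[:k] + t[k + 1:]          # omit the index i_k
            mat[row][col[face]] += (-1) ** k  # sign (-1)^k as in A.10.6
    return mat


def rank(mat):
    """Rank over Q by Gauss-Jordan elimination."""
    rows = [r[:] for r in mat]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for c in range(ncols):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                f = rows[i][c] / rows[rk][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def is_complex(nerve, p):
    """Check d^{p+1} o d^p = 0 by multiplying the two matrices."""
    a, b = coboundary(nerve, p + 1), coboundary(nerve, p)
    ncols = len(b[0]) if b else 0
    return all(sum(r[k] * b[k][j] for k in range(len(b))) == 0
               for r in a for j in range(ncols))


def cech_dim(nerve, p):
    """dim of ker(d^p) / im(d^{p-1})."""
    n_p = len(simplices(nerve, p))
    rk_out = rank(coboundary(nerve, p))
    rk_in = rank(coboundary(nerve, p - 1)) if p > 0 else 0
    return n_p - rk_out - rk_in


# Hypothetical nerves: pairwise overlaps only (circle), plus a triple overlap (disc).
circle = {(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)}
disc = circle | {(0, 1, 2)}
```

Under these assumptions, `cech_dim(circle, 1)` counts the one independent class that is a cocycle but not a coboundary, while for `disc` the extra triple overlap imposes the cocycle condition and the first cohomology vanishes.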
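A.10.7. Let $\varphi : \mathcal{F} \to \mathcal{F}'$ be a homomorphism of sheaves. Then we get
homomorphisms
\[ \varphi^p : C^p(\mathcal{U}, \mathcal{F}) \longrightarrow C^p(\mathcal{U}, \mathcal{F}'), \quad (\alpha_{i_0,\dots,i_p}) \mapsto (\varphi(\alpha_{i_0,\dots,i_p})) \]
such that $\varphi^{p+1} \circ d^p = d^p \circ \varphi^p$, i.e. $\varphi$ is a homomorphism of complexes. Hence, it
induces homomorphisms $\check{H}^p(\mathcal{U}, \mathcal{F}) \to \check{H}^p(\mathcal{U}, \mathcal{F}')$ of Čech cohomology groups,
which we denote also by $\varphi^p$ or simply by $\varphi$.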
A.10.8. From a formal point of view, it is nicer to define Čech cocycles without
choosing a well-ordering. We define $C^p(\mathcal{U}, \mathcal{F})$ as the set of elements
\[ (\sigma_{i_0,\dots,i_p}) \in \prod_{(i_0,\dots,i_p)\in I^{p+1}} \mathcal{F}(U_{i_0} \cap \cdots \cap U_{i_p}) \]
with $\sigma_{i_0,\dots,i_p} = 0$ if two components of $(i_0, \dots, i_p)$ are equal and
\[ \sigma_{i_{\pi(0)},\dots,i_{\pi(p)}} = (-1)^{\operatorname{sign}(\pi)} \sigma_{i_0,\dots,i_p} \]
if $\pi$ is a permutation of $\{0, \dots, p\}$. The same definitions as in A.10.6 may be
used to define the Čech complex. Clearly, the Čech cocycles and boundaries may
be identified with the old ones.
Now let $\mathcal{U}' = (U'_\beta)_{\beta\in J}$ be a refinement of $\mathcal{U}$, i.e. $\mathcal{U}'$ is also an open covering
of $X$ and for every $\beta \in J$ there is $\alpha \in I$ with $U'_\beta \subset U_\alpha$. We choose a map
$\alpha : J \to I$ with $U'_\beta \subset U_{\alpha(\beta)}$. Then we define a homomorphism
$\theta^p : C^p(\mathcal{U}, \mathcal{F}) \to C^p(\mathcal{U}', \mathcal{F})$ by
\[ \theta^p(\sigma)_{j_0,\dots,j_p} := \rho^{U_{\alpha(j_0)}\cap\cdots\cap U_{\alpha(j_p)}}_{U'_{j_0}\cap\cdots\cap U'_{j_p}} (\sigma_{\alpha(j_0),\dots,\alpha(j_p)}). \]
Then $\theta$ is a homomorphism of Čech complexes inducing a homomorphism
\[ \check{H}^p(\mathcal{U}, \mathcal{F}) \to \check{H}^p(\mathcal{U}', \mathcal{F}) \]
of Čech cohomology groups. It does not depend on the choice of the refinement
map ([148] Exercise III.4.4 or F. Warner [321], 5.33).

Example A.10.9. Let $X$ be a variety over $K$. We consider the sheaf $\mathcal{K}$ of rational
functions on $X$, i.e. $\mathcal{K}(U) = K(U)$ for any open subset $U$ of $X$. The sheaf $\mathcal{K}$
is easy to understand. If we denote the irreducible components of $X$ by $X_j$, then
$\mathcal{K}(U)$ is the product of the fields $K(X_j)$, where $j$ is ranging over all components
with $X_j \cap U \neq \emptyset$ (see A.4.12). Note that $\mathcal{O}_X$ and $\mathcal{K}$ are sheaves of $K$-algebras.
We denote by $\mathcal{O}_X^\times(U)$ and $\mathcal{K}^\times(U)$ the group of multiplicative units for any open
subset $U$ of $X$. Clearly, $\mathcal{O}_X^\times$ is a subsheaf of the sheaf $\mathcal{K}^\times$ of abelian groups.
Then any open covering $\mathcal{U} = (U_\alpha)_{\alpha\in I}$ and $f_\alpha \in \mathcal{K}^\times(U_\alpha)$ give a Cartier divisor
on $X$ if and only if
\[ \bigl(f_\alpha \cdot \mathcal{O}_X^\times(U_\alpha)\bigr)_{\alpha\in I} \in \check{H}^0(\mathcal{U}, \mathcal{K}^\times/\mathcal{O}_X^\times). \]
Moreover, we see that the group of Cartier divisors is isomorphic to $(\mathcal{K}^\times/\mathcal{O}_X^\times)(X)$.

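Example A.10.10. Recall that we can give a line bundle on $X$ by a covering
$\mathcal{U} = (U_\alpha)_{\alpha\in I}$ and transition functions $g_{\alpha\beta} \in \mathcal{O}_X^\times(U_\alpha \cap U_\beta)$ satisfying
$g_{\alpha\beta}\, g_{\beta\gamma} = g_{\alpha\gamma}$ on $U_\alpha \cap U_\beta \cap U_\gamma$. We see that $(g_{\alpha\beta})$ is a Čech cocycle of
$\mathcal{O}_X^\times$ with respect to the covering $\mathcal{U}$.
Two line bundles $L, L'$ are isomorphic if and only if there is a common
trivialization $\mathcal{U} = (U_\alpha)_{\alpha\in I}$ and $h_\alpha \in \mathcal{O}(U_\alpha)^\times$ for $\alpha \in I$ such that
$g'_{\alpha\beta}/g_{\alpha\beta} = h_\alpha/h_\beta$, i.e. a Čech coboundary with respect to the covering $\mathcal{U}$.
The function $h_\alpha$ gives the matrix of the isomorphism from the trivialization
of $L$ to the one of $L'$ on $U_\alpha$. We conclude that
\[ \operatorname{Pic}(X) \cong \varinjlim_{\mathcal{U}} \check{H}^1(\mathcal{U}, \mathcal{O}_X^\times), \]
where the direct limit is over all open coverings directed with respect to
refinements.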
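A.10.11. An $\mathcal{O}_X$-module $\mathcal{F}$ is a sheaf of abelian groups on $X$ such that $\mathcal{F}(U)$
is an $\mathcal{O}_X(U)$-module with
\[ \rho^U_V(\xi) \cdot \rho^U_V(f) = \rho^U_V(\xi \cdot f) \]
for all open subsets $V \subset U$ in $X$ and $\xi \in \mathcal{O}_X(U)$, $f \in \mathcal{F}(U)$. A homomorphism
of $\mathcal{O}_X$-modules is a homomorphism $\varphi : \mathcal{F} \to \mathcal{F}'$ of sheaves such that $\varphi_U$ is a
homomorphism of $\mathcal{O}_X(U)$-modules for each open subset $U$.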
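A.10.12. An $\mathcal{O}_X$-module $\mathcal{F}$ on $X$ is called free of rank $r \in \mathbb{N}$ if there is an
isomorphism $\mathcal{F} \to \mathcal{O}_X^r$ of $\mathcal{O}_X$-modules. We call an $\mathcal{O}_X$-module locally free if,
for all $x \in X$, there is an open neighbourhood $U$ of $x$ such that $\mathcal{F}|_U$ is a free
$\mathcal{O}_U$-module of rank $r_U \in \mathbb{N}$.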
A.10.13. Let $E$ be a vector bundle on $X$. Then the sheaf of sections $\mathcal{E}$ is a locally
free $\mathcal{O}_X$-module: the trivializations give us the isomorphisms to $\mathcal{O}_U^{r_U}$. We
claim that every locally free $\mathcal{O}_X$-module $\mathcal{F}$ arises this way and we will carry

over the terminology of vector bundles (especially of line bundles) to locally free
OX -modules (of rank 1).

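We have an open covering $(U_\alpha)_{\alpha\in I}$ of $X$ such that $\mathcal{F}|_{U_\alpha}$ is free of rank $r_\alpha$. We have an
$\mathcal{O}_X(U_\alpha)$-basis $s_{\alpha 1}, \dots, s_{\alpha r_\alpha}$ in $\mathcal{F}(U_\alpha)$. On non-empty $U_\alpha \cap U_\beta$,
\[ \rho^{U_\alpha}_{U_\alpha\cap U_\beta}(s_{\alpha 1}), \dots, \rho^{U_\alpha}_{U_\alpha\cap U_\beta}(s_{\alpha r_\alpha}) \]
and
\[ \rho^{U_\beta}_{U_\alpha\cap U_\beta}(s_{\beta 1}), \dots, \rho^{U_\beta}_{U_\alpha\cap U_\beta}(s_{\beta r_\beta}) \]
are both an $\mathcal{O}_X(U_\alpha \cap U_\beta)$-basis of $\mathcal{F}(U_\alpha \cap U_\beta)$. Therefore, we have $r_\alpha = r_\beta$ and
$g_{\alpha\beta} \in GL(r_\alpha, \mathcal{O}_X(U_\alpha \cap U_\beta))$ with
\[ \rho^{U_\beta}_{U_\alpha\cap U_\beta}(s_{\beta i}) = \sum_{j=1}^{r_\alpha} (g_{\alpha\beta})_{ji}\, \rho^{U_\alpha}_{U_\alpha\cap U_\beta}(s_{\alpha j}) \qquad \text{(A.15)} \]
for $i = 1, \dots, r_\alpha$.
An element of $GL(r_\alpha, \mathcal{O}_X(U_\alpha \cap U_\beta))$ is an invertible $r_\alpha \times r_\alpha$-matrix with entries in
$\mathcal{O}_X(U_\alpha \cap U_\beta)$. Therefore $g_{\alpha\beta}$ may be viewed as a morphism $U_\alpha \cap U_\beta \to GL(r_\alpha)_K$.
Clearly, we have $g_{\alpha\beta}\, g_{\beta\gamma} = g_{\alpha\gamma}$ on $U_\alpha \cap U_\beta \cap U_\gamma$. By the construction in A.5.7, we get a
vector bundle $E$ on $X$ with transition matrices $g_{\alpha\beta}$. Let $\varphi_\alpha : \pi_E^{-1}(U_\alpha) \to U_\alpha \times \mathbb{A}^r_K$ be
the corresponding trivialization. We consider $e_1, \dots, e_{r_\alpha} \in \Gamma(U_\alpha, U_\alpha \times \mathbb{A}^r_K)$, pointwise
equal to the standard basis. Then $\varphi_\alpha^{-1} \circ e_1, \dots, \varphi_\alpha^{-1} \circ e_{r_\alpha}$ form an $\mathcal{O}_X(U_\alpha)$-basis of
$\Gamma(U_\alpha, E)$. Since they satisfy the same transition rule on $U_\alpha \cap U_\beta$ as in (A.15), we may
identify them with $s_{\alpha 1}, \dots, s_{\alpha r_\alpha}$. Then the sheaf of sections $\mathcal{E}$ of $E$ coincides with $\mathcal{F}$.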

Example A.10.14. Let $X$ be a $K$-variety. In Example A.10.9, we have introduced
the sheaf $\mathcal{K}$ of rational functions on $X$. For any Cartier divisor $D$ on
$X$, we realize the sheaf of sections $\mathcal{O}_X(D)$ of the line bundle $O_X(D)$ as a
subsheaf of $\mathcal{K}$: Let $D = (U_\alpha, f_\alpha)_{\alpha\in I}$. For an open subset $U$ of $X$, we define
$\mathcal{F}(U) := \{f \in K(U) \mid f \cdot f_\alpha \in \mathcal{O}_X(U \cap U_\alpha)\ \forall \alpha \in I\}$. Clearly, this is a
subsheaf of $\mathcal{K}$. The claim follows from the isomorphism
\[ \Gamma(U, O_X(D)) \longrightarrow \mathcal{F}(U), \quad s \mapsto s/s_D. \]

By A.9.17, an effective Cartier divisor $D$ may be viewed as a closed subscheme of $X$ and
the subsheaf of $\mathcal{K}$ corresponding to $\mathcal{O}_X(-D)$ is equal to the ideal sheaf $\mathcal{J}_D$.

A.10.15. We have seen in Example A.5.11 that not every sheaf is locally free. We
now introduce the notion of coherent sheaves, which includes almost all sheaves
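of importance for our book. A coherent sheaf is an $\mathcal{O}_X$-module $\mathcal{F}$ on $X$ which
is locally isomorphic to the cokernel of a homomorphism of free sheaves, i.e. for
all $x \in X$ there is an open neighbourhood $U$ of $x$ and an $\mathcal{O}_U$-module
homomorphism $\varphi : \mathcal{O}_U^{r_U} \to \mathcal{O}_U^{s_U}$ for some $r_U, s_U \in \mathbb{N}$ such that
$\mathcal{O}_U^{s_U}/\operatorname{im}(\varphi)$ is isomorphic to $\mathcal{F}|_U$. Obviously,
every locally free $\mathcal{O}_X$-module is coherent.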



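A.10.16. If $\varphi : \mathcal{E} \to \mathcal{F}$ is a homomorphism of $\mathcal{O}_X$-modules, then $\ker(\varphi)$, $\operatorname{im}(\varphi)$
and $\operatorname{coker}(\varphi)$ are $\mathcal{O}_X$-modules. They are all coherent when $\mathcal{E}$ and $\mathcal{F}$ are coherent
([134], (0.5.3.4)).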
A.10.17. We introduce now some basic operations on OX -modules. Let E and F
be OX -modules. Then HomOX (E, F) is the sheaf which is given on an open sub-
set U by the homomorphisms E|U → F|U of OU -modules and whose restriction
maps are the restrictions of homomorphisms.
We define E ⊗OX F to be the sheaf associated to the presheaf E(U )⊗OX (U ) F(U ),
U open in X , with the obvious restriction maps.
If E and F are coherent, then HomOX (E, F) and E ⊗OX F are both coherent
([134], (0.5.3.5)). Suppose that E, F are the sheaves of sections of vector bundles
E and F . Then it is easy to see that HomOX (E, F) (resp. E ⊗OX F ) is the sheaf
of sections of Hom(E, F ) (resp. E ⊗ F ).
A.10.18. Let $\varphi : X \to X'$ be a morphism of $K$-varieties and let $\mathcal{F}$ be an
$\mathcal{O}_X$-module on $X$. For an open subset $U'$ of $X'$, we define
\[ \varphi_*(\mathcal{F})(U') := \mathcal{F}(\varphi^{-1}U'). \]
Together with the restriction maps induced from $\mathcal{F}$, we get an $\mathcal{O}_{X'}$-module $\varphi_*(\mathcal{F})$
on $X'$ called the direct image of $\mathcal{F}$. If $X$ is a complete variety, then the direct
image of a coherent sheaf is coherent. In fact, this holds more generally for proper
morphisms ([136], Cor.3.2.2).

We deduce that a complete affine variety X is finite. Let ϕ : X → A0K be the constant
map. Then ϕ∗ (OX )(A0K ) = K[X] is a finite-dimensional K -vector space. We conclude
that ϕ is a finite morphism proving the claim (see A.12.4).

A.10.19. Again, we consider a morphism $\varphi : X \to X'$ of $K$-varieties. For an
$\mathcal{O}_{X'}$-module $\mathcal{F}'$ on $X'$, we define the presheaf $\mathcal{F}$ on $X$ in the following way. We
fix an open subset $U$ of $X$. Then we consider pairs $(U', s')$, where $U'$ is an open
neighbourhood of $\varphi(U)$ and $s' \in \mathcal{F}'(U')$. Two pairs $(U', s')$ and $(V', t')$ are
called equivalent if the restrictions of $s'$ and $t'$ agree on an open neighbourhood
of $\varphi(U)$. Then the set of equivalence classes is denoted by $\mathcal{F}(U)$. We use the
restriction maps induced from $\mathcal{F}'$ to get a presheaf $\mathcal{F}$.
The same thing can be done with $\mathcal{O}_{X'}$ instead of $\mathcal{F}'$. Then we get a presheaf
$\mathcal{G}$ of $K$-algebras on $X$, where the elements of $\mathcal{G}(U)$ are equivalence classes of
pairs $(U', f')$ with $U'$ as above and $f'$ a regular function on $U'$. Clearly, $\mathcal{F}(U)$
is a $\mathcal{G}(U)$-module. Using composition with $\varphi$, we see that $\mathcal{O}_X(U)$ is also a
$\mathcal{G}(U)$-module. Then we define the inverse image (or pull-back) $\varphi^*\mathcal{F}'$ to be the
sheaf associated to the presheaf on $X$ given on an open subset $U$ by
$\mathcal{F}(U) \otimes_{\mathcal{G}(U)} \mathcal{O}_X(U)$. Note that this construction is necessary to get an
$\mathcal{O}_X$-module. If $\mathcal{F}'$ is coherent, then $\varphi^*(\mathcal{F}')$ is also coherent ([148], Prop.II.5.8).

Obviously, we have $\varphi^*\mathcal{O}_{X'} = \mathcal{O}_X$. We deduce that the inverse image sheaf of
a (locally) free $\mathcal{O}_{X'}$-module is (locally) free. If $\mathcal{E}'$ is the sheaf of sections of a
vector bundle $E'$, this shows easily that $\varphi^*(\mathcal{E}')$ is the sheaf of sections of $\varphi^*(E')$.
If $X$ is an open or closed subvariety of $X'$, then we simply write $\mathcal{F}'|_X$ for the
pull-back.
Example A.10.20. Let Y be a closed subvariety of X with ideal sheaf JY . If i
denotes the inclusion map Y ⊂ X , then we have a short exact sequence
0 −→ JY −→ OX −→ i∗ OY −→ 0. (A.16)
Locally, this is obvious and, by the sheaf property, we get the claim. Since OX
and i∗ OY are coherent, it follows that JY is coherent. Note that, if we tensor
the short exact sequence with a locally free sheaf E on X , then we obtain a short
exact sequence
0 −→ JY ⊗OX E −→ E −→ i∗ OY ⊗OX E −→ 0.
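By the projection formula ([148], Exercise II.5.1(d)), we have
\[ i_*\mathcal{O}_Y \otimes_{\mathcal{O}_X} \mathcal{E} \cong i_*(\mathcal{O}_Y \otimes_{\mathcal{O}_Y} i^*\mathcal{E}) \cong i_* i^* \mathcal{E}. \]
If $Y$ is just a $K$-rational point $P$ of $X$, then $i_*\mathcal{O}_Y$ is called the skyscraper sheaf
$K_P$ of $P$. Note that $K_P$ is the sheaf given by
\[ K_P(U) = \begin{cases} K & \text{if } P \in U, \\ \{0\} & \text{if } P \notin U. \end{cases} \]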
A.10.21. Now we are ready to define the cohomology groups of a coherent OX -
module F on a K -variety X . We choose a covering U of X by affine open
subsets. For p ∈ N , the p th cohomology group is H p (X, F) := Ȟ p (U, F).
It does not depend on the choice of $\mathcal{U}$; i.e., if $\mathcal{U}'$ is also an affine open
covering of $X$, then there exists a common refinement $\mathcal{U}''$ also by affine open
subsets (use A.2.10) and the canonical homomorphisms $\check{H}^p(\mathcal{U}, \mathcal{F}) \to \check{H}^p(\mathcal{U}'', \mathcal{F})$
and $\check{H}^p(\mathcal{U}', \mathcal{F}) \to \check{H}^p(\mathcal{U}'', \mathcal{F})$ from A.10.8 are isomorphisms ([148], Th.III.4.5).
Note that the cohomology groups are $K$-vector spaces. By A.10.6, we have
$H^0(X, \mathcal{F}) = \mathcal{F}(X)$.
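A.10.22. Let $0 \to \mathcal{F}' \xrightarrow{\varphi'} \mathcal{F} \xrightarrow{\varphi''} \mathcal{F}'' \to 0$ be a short exact sequence of
coherent $\mathcal{O}_X$-modules. Then we get a long exact sequence of cohomology groups
\[ 0 \longrightarrow H^0(X, \mathcal{F}') \xrightarrow{\ \varphi'_X\ } H^0(X, \mathcal{F}) \xrightarrow{\ \varphi''_X\ } H^0(X, \mathcal{F}'') \xrightarrow{\ \delta\ } H^1(X, \mathcal{F}') \longrightarrow \cdots \]
\[ \cdots \xrightarrow{\ \delta\ } H^p(X, \mathcal{F}') \longrightarrow H^p(X, \mathcal{F}) \longrightarrow H^p(X, \mathcal{F}'') \xrightarrow{\ \delta\ } H^{p+1}(X, \mathcal{F}') \longrightarrow \cdots. \]

We briefly sketch the argument assuming some familiarity with homological algebra (see
for example [157], Ch.6). By [148], Prop.II.5.6, the sequence
\[ 0 \longrightarrow \mathcal{F}'(U) \xrightarrow{\ \varphi'_U\ } \mathcal{F}(U) \xrightarrow{\ \varphi''_U\ } \mathcal{F}''(U) \longrightarrow 0 \]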

is exact for any affine open subset $U$. Since the intersection of finitely many affine open
subsets remains affine ([148], Exercise II.4.3), we see that the sequence
\[ 0 \longrightarrow C^*(\mathcal{U}, \mathcal{F}') \xrightarrow{\ \varphi'\ } C^*(\mathcal{U}, \mathcal{F}) \xrightarrow{\ \varphi''\ } C^*(\mathcal{U}, \mathcal{F}'') \longrightarrow 0 \]
is also exact. By homological algebra, we get the long exact sequence of cohomology
groups.

A.10.23. If $X$ is an affine $K$-variety and $\mathcal{F}$ is a coherent sheaf of $\mathcal{O}_X$-modules,
then $H^p(X, \mathcal{F}) = 0$ for all $p \geq 1$ ([148], Th.III.3.7). This explains the use of
affine open coverings in A.10.21.
A.10.24. For any K -variety X and p > dim(X), we have H p (X, F) = 0
([148], Th.III.2.7).
A.10.25. If Y is a closed subvariety of X and F is a coherent OY -module on
Y , then we have
H p (Y, F) = H p (X, i∗ F) (p ∈ N),
where i : Y → X is the inclusion. This follows immediately from
C p (U, i∗ F) = C p (U ∩ Y, F),
where U is an affine open covering of X .
A.10.26. For a projective variety X over K , we have the following generalization
of A.6.7. If F is a coherent OX -module, then H p (X, F) is a finite-dimensional
K -vector space for all p ∈ N ([148], Th.III.5.2). In fact, this is more generally
true for complete varieties ([136], 3.2.1).
A.10.27. The sheaf associated to a line bundle is called invertible. Let L be an
invertible sheaf on a projective variety X over K . If L is ample, then for each
coherent OX -module F , there is n0 ∈ N such that
H p (X, F ⊗ L⊗n ) = 0
for all p ≥ 1 and all n ≥ n0 ([148], Prop.III.5.3).
A.10.28. Now we handle base change. Let $L$ be a field extension of $K$. We
consider a coherent sheaf $\mathcal{F}$ on a geometrically reduced $K$-variety $X$. There
is a unique coherent sheaf $\mathcal{F}_L$ on $X_L$ such that $\mathcal{F}_L(U_L) = \mathcal{F}(U) \otimes_K L$ and
$\rho^{U_L}_{V_L} = \rho^U_V \otimes 1$ for all affine open subsets $V \subset U$ of $X$. This follows easily from
the fact that on an affine variety $U$, there is a one-to-one correspondence between
coherent sheaves and finitely generated $K[U]$-modules, given by $\mathcal{G} \mapsto \mathcal{G}(U)$ (use
[148], Cor.II.5.5, Prop.II.5.2).
Anyway, we need the base change only for locally free sheaves E . Then E is the
sheaf of sections of a vector bundle E . We have seen that EL is a vector bundle
over XL (see A.5.15). It is given by the same transition functions as E . We claim
that EL is the sheaf of sections of EL . Since X is geometrically reduced, this is
clear for the restriction to trivializations and shows immediately the claim.

Cohomology is compatible with base change, i.e.
\[ H^p(X, \mathcal{F}) \otimes_K L = H^p(X_L, \mathcal{F}_L) \]
for all $p$. To prove this, we choose a finite affine open covering $\mathcal{U} = (U_\alpha)_{\alpha\in I}$ of
$X$. Then the base change $\mathcal{U}_L$ of the covering is affine. Clearly, the Čech complex
$C^*(\mathcal{U}_L, \mathcal{F}_L)$ is obtained from $C^*(\mathcal{U}, \mathcal{F})$ by tensoring with $L$. Thereby, kernels
and images go to kernels and images, proving the claim.
A.10.29. Let $X$ be an irreducible smooth projective variety over $K$ of dimension
$d$. We have the canonical line bundle $K_X = \wedge^d T_X^*$ on $X$ (see A.7.23). We denote
its sheaf of sections (i.e. the $d$-forms) by $\omega_X$. For a locally free sheaf $\mathcal{E}$ on $X$,
we have the Serre duality
\[ H^i(X, \mathcal{E}) \cong H^{d-i}(X, \check{\mathcal{E}} \otimes \omega_X)^*, \]
where $\check{\mathcal{E}} = \mathcal{H}om_{\mathcal{O}_X}(\mathcal{E}, \mathcal{O}_X)$ is the dual of $\mathcal{E}$ ([148], Cor.III.7.7, Cor.III.7.12).
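A.10.30. Let $\mathcal{F}$ be a coherent sheaf on a projective variety $X$ over $K$. The Euler
characteristic of $\mathcal{F}$ is
\[ \chi(\mathcal{F}) := \sum_{j=0}^{\dim(X)} (-1)^j \dim H^j(X, \mathcal{F}). \]
Let us consider an exact sequence
\[ 0 \longrightarrow \mathcal{F}_0 \xrightarrow{\ \varphi_0\ } \mathcal{F}_1 \xrightarrow{\ \varphi_1\ } \cdots \xrightarrow{\ \varphi_{n-1}\ } \mathcal{F}_n \longrightarrow 0 \]
of coherent sheaves on $X$. Then we have
\[ \sum_{j=0}^{n} (-1)^j \chi(\mathcal{F}_j) = 0. \qquad \text{(A.17)} \]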

For a short exact sequence, this follows from the long exact sequence of cohomology groups
and the fact that the alternating sum of dimensions is zero for a finite exact sequence of
finite-dimensional vector spaces. In general, we have
\[ \sum_{j=0}^{n} (-1)^j \chi(\mathcal{F}_j) = \sum_{j=0}^{n} (-1)^j \bigl(\chi(\ker(\varphi_j)) + \chi(\operatorname{im}(\varphi_j))\bigr) \]
\[ = \sum_{j=0}^{n} (-1)^j \chi(\operatorname{im}(\varphi_{j-1})) + \sum_{j=0}^{n} (-1)^j \chi(\operatorname{im}(\varphi_j)) = 0. \]

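Let $\mathcal{L}$ be a very ample invertible sheaf on $X$. We define $\mathcal{F}(n) := \mathcal{F} \otimes \mathcal{L}^{\otimes n}$ for
all $n \in \mathbb{Z}$. Then there is $P \in \mathbb{Q}[x]$ called the Hilbert polynomial of $\mathcal{F}$ with
respect to $\mathcal{L}$ given by $P(n) = \chi(\mathcal{F}(n))$ for all $n \in \mathbb{Z}$.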
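To prove it, we may assume that $X$ is a closed subvariety of $\mathbb{P}^n_K$ with $\mathcal{L} = \mathcal{O}_{\mathbb{P}^n_K}(1)|_X$.
Let $U$ be the set of points $x \in X$ such that $\mathcal{F}(V) = 0$ for every sufficiently small
neighbourhood $V$ of $x$. The support of $\mathcal{F}$ is $\operatorname{supp}(\mathcal{F}) := X \setminus U$. Since $\mathcal{F}$ is coherent,
we can show that the support is closed ([148], Exercise II.5.6). Let $d(\mathcal{F})$ be the dimension
of $\operatorname{supp}(\mathcal{F})$ and let $n(\mathcal{F})$ be the number of $d(\mathcal{F})$-dimensional irreducible components of
$\operatorname{supp}(\mathcal{F})$. We order the pairs $(d(\mathcal{F}), n(\mathcal{F}))$ lexicographically and we use induction with
respect to this order. If $\mathcal{F} \neq 0$, then the support of $\mathcal{F}$ is not empty and we find a standard
open subset $U_j = \{x_j \neq 0\}$ intersecting $\operatorname{supp}(\mathcal{F})$. Consider the exact sequence
\[ 0 \longrightarrow \mathcal{K} \longrightarrow \mathcal{F}(-1) \xrightarrow{\ \varphi\ } \mathcal{F} \longrightarrow \mathcal{C} \longrightarrow 0, \qquad \text{(A.18)} \]
where the homomorphism $\varphi$ is tensoring with $x_j$. Note that the restriction of $\varphi$ to
$U_j$ is an isomorphism. We conclude that the supports of the kernel $\mathcal{K}$ and the
cokernel $\mathcal{C}$ are contained in $\operatorname{supp}(\mathcal{F}) \cap \{x_j = 0\}$. By induction, we may assume that
$\chi(\mathcal{K}(n))$ and $\chi(\mathcal{C}(n))$ have the form $\sum_{j=0}^{d} a_j \binom{n}{j}$ for some $a_j \in \mathbb{Z}$. Note that (A.18)
remains exact after tensoring with $\mathcal{L}^{\otimes n}$ and the supports do not change. By (A.17), we
see that $\chi(\mathcal{F}(n)) - \chi(\mathcal{F}(n-1)) = \sum_{j=0}^{d(\mathcal{F})-1} b_j \binom{n}{j}$ for some $b_j \in \mathbb{Z}$. Then
$\chi(\mathcal{F}(n)) = \chi(\mathcal{F}(-1)) + \sum_{j=0}^{d(\mathcal{F})-1} b_j \binom{n+1}{j+1}$.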
As a corollary of the proof, we see that the Hilbert polynomial has the form
\[ P(x) = \sum_{j=0}^{d(\mathcal{F})} a_j \binom{x}{j} \]
with integer coefficients $a_j$. By [135], Prop.5.3.1, the degree of the Hilbert
polynomial is equal to the dimension of $\operatorname{supp}(\mathcal{F})$.
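The summation step at the end of the proof (passing from the differences χ(F(n)) − χ(F(n−1)) to χ(F(n)) itself) rests on the identity binom(n+1, j+1) − binom(n, j+1) = binom(n, j), read as an identity of polynomials in n. The following Python sketch checks this numerically; it is an illustration under stated assumptions, with hypothetical function names, and the coefficients b_j are arbitrary test data rather than values taken from the text.

```python
from fractions import Fraction


def binom(x, j):
    """Polynomial binomial coefficient x(x-1)...(x-j+1)/j! for any integer x."""
    out = Fraction(1)
    for i in range(j):
        out = out * (x - i) / (i + 1)
    return out


def chi_from_differences(chi_minus_one, b):
    """Return n -> chi(-1) + sum_j b_j * binom(n+1, j+1)."""
    def chi(n):
        return chi_minus_one + sum(bj * binom(n + 1, j + 1) for j, bj in enumerate(b))
    return chi


def difference_side(b, n):
    """Return sum_j b_j * binom(n, j), the prescribed value of chi(n) - chi(n-1)."""
    return sum(bj * binom(n, j) for j, bj in enumerate(b))


# Arbitrary integer test data (an assumption made only for this check).
b_test = [3, -1, 2]
chi_test = chi_from_differences(7, b_test)

# With b = [1] and chi(-1) = 0 one recovers n + 1, which agrees with the
# value binom(x+1, 1) stated for the projective line in Example A.10.32.
chi_line = chi_from_differences(0, [1])
```

Since binom(0, j+1) = 0 for all j ≥ 0, the function returned by `chi_from_differences` takes the prescribed value at n = −1, and its successive differences reproduce `difference_side` for negative as well as non-negative n.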
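A.10.31. Let $i : X \to \mathbb{P}^m_K$ be a closed embedding with $\mathcal{O}_{\mathbb{P}^m_K}(1)|_X = \mathcal{L}$. From
A.10.25 and the projection formula ([148], Exercise II.5.1(d))
\[ i_*(\mathcal{F} \otimes \mathcal{L}^{\otimes n}) = i_*(\mathcal{F} \otimes i^*\mathcal{O}_{\mathbb{P}^m_K}(n)) \cong i_*(\mathcal{F}) \otimes \mathcal{O}_{\mathbb{P}^m_K}(n), \]
we deduce that the Hilbert polynomial of $\mathcal{F}$ with respect to $\mathcal{L}$ is the same as the
Hilbert polynomial of $i_*(\mathcal{F})$ with respect to $\mathcal{O}_{\mathbb{P}^m_K}(1)$. We conclude that without
loss of generality, we can always assume $X = \mathbb{P}^m_K$, $\mathcal{L} = \mathcal{O}_{\mathbb{P}^m_K}(1)$.
So let $\mathcal{F}$ be a coherent sheaf on $\mathbb{P}^m_K$. Then we use $\mathcal{F}(n) := \mathcal{F} \otimes \mathcal{O}_{\mathbb{P}^m_K}(n)$. Note
that
\[ \Gamma_*(\mathcal{F}) := \bigoplus_{n=0}^{\infty} H^0(\mathbb{P}^m_K, \mathcal{F}(n)) \]
is a graded $K[x_0, \dots, x_m]$-module. It follows from [148], Exercise II.5.9 and
from A.10.26 that $\Gamma_*(\mathcal{F})$ is a finitely generated graded $K[x]$-module. Let $P$ be
the Hilbert polynomial of $\mathcal{F}$ (always with respect to $\mathcal{O}_{\mathbb{P}^m_K}(1)$). By A.10.27, we
get
\[ P(n) = \dim H^0(\mathbb{P}^m_K, \mathcal{F}(n)) \]
for all $n \gg 0$. Hilbert has shown that for any finitely generated graded $K[x]$-module $M$,
there is a polynomial $P_M \in \mathbb{Q}[x]$ with $P_M(n) = \dim M_n$ for $n \gg 0$ (see [157],
Th.7.23). We conclude that the Hilbert polynomial of $\mathcal{F}$ is equal to $P_{\Gamma_*(\mathcal{F})}$.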
 
Example A.10.32. The Hilbert polynomial of $\mathcal{O}_{\mathbb{P}^m_K}$ is $P(x) = \binom{x+m}{m}$ by A.6.6.

A.10.33. Let $X$ be a closed subvariety of $\mathbb{P}^m_K$ with homogeneous ideal $I(X)$ and let
$P$ be the Hilbert polynomial of $\mathcal{O}_X$ with respect to $\mathcal{L} = \mathcal{O}_{\mathbb{P}^m_K}(1)|_X$. We call it the
Hilbert polynomial of $X$. Then for $n$ sufficiently large, $P(n)$ is the dimension
of the space of homogeneous polynomials of degree $n$ restricted to $X$, i.e.
\[ P(n) = \binom{m+n}{n} - \dim I(X)_n. \]
This follows from the long exact cohomology sequence applied to (A.16) on
page 569, with arguments as those in A.10.31.
The Hilbert polynomial P of X has degree d = dim(X) (see A.10.30). By
[125], Example 2.5.2, the highest coefficient of P is ad /d! with ad equal to the
degree of X . This is illustrated in the following example.
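A.10.34. Assume that the homogeneous ideal $I(X)$ of $X$ is generated by a
homogeneous non-zero polynomial $f$ of degree $d$. By A.10.14, we have $\mathcal{O}_{\mathbb{P}^m}(-X) = \mathcal{J}_X$
giving rise to
\[ 0 \longrightarrow \mathcal{O}_{\mathbb{P}^m_K} \xrightarrow{\ f\cdot\ } \mathcal{O}_{\mathbb{P}^m_K}(d) \longrightarrow \bigl(\mathcal{O}_{\mathbb{P}^m_K}/\mathcal{J}_X\bigr)(d) \cong i_*\mathcal{O}_X(d) \longrightarrow 0. \]
The associated long exact sequence, A.10.25 and Example A.10.32 show that the
Hilbert polynomial of $X$ is
\[ P(x) = \binom{x+m}{m} - \binom{x+m-d}{m}. \]
Then $P$ is a polynomial of degree $m - 1$ with leading coefficient $d/(m - 1)!$.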
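The closing claim of A.10.34 can be checked numerically. The sketch below is an illustration under stated assumptions, with hypothetical function names. It evaluates P(x) = binom(x+m, m) − binom(x+m−d, m) and uses finite differences: a polynomial of degree m − 1 with leading coefficient d/(m − 1)! has constant (m − 1)-th difference equal to d and vanishing m-th difference. The two examples, a plane conic (m = 2, d = 2) and a quartic surface in projective 3-space (m = 3, d = 4), are chosen here and are not from the text.

```python
from fractions import Fraction


def binom(x, j):
    """Polynomial binomial coefficient x(x-1)...(x-j+1)/j! for any integer x."""
    out = Fraction(1)
    for i in range(j):
        out = out * (x - i) / (i + 1)
    return out


def hilbert_hypersurface(m, d):
    """Hilbert polynomial of a degree-d hypersurface in P^m, as in A.10.34."""
    return lambda x: binom(x + m, m) - binom(x + m - d, m)


def difference(f):
    """Forward difference operator: (Delta f)(x) = f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)


def iterated_difference(f, k):
    """Apply the forward difference operator k times."""
    for _ in range(k):
        f = difference(f)
    return f


conic = hilbert_hypersurface(2, 2)    # expands to 2x + 1
quartic = hilbert_hypersurface(3, 4)  # expands to 2x^2 + 2
```

For the conic this gives the linear polynomial 2x + 1, of degree m − 1 = 1 with leading coefficient d/(m − 1)! = 2, in line with the statement above.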
A.10.35. An important property of the Hilbert polynomial is that it is invariant under flat
perturbation. This is useful for modular problems (see [280], Ch.VI, 4). To formulate
it properly, we have to use the language of schemes. The reason is that for a morphism
$\varphi : X \to X'$ and $y \in X'$, the fibre $X_y$ may be different in the sense of schemes.
However, in our applications, there is no difference, i.e. the fibres will be reduced.
Let $\varphi : X \to X'$ be a proper morphism of noetherian schemes. Consider a coherent
sheaf $\mathcal{F}$ on $X$ such that $\mathcal{F}$ is flat over $X'$, i.e. $\mathcal{F}(\varphi^{-1}U')$ is a flat $\mathcal{O}_{X'}(U')$-module
(see A.12.10) for every affine open subset $U'$ of $X'$. Then the Euler characteristic of
$\mathcal{F}_y := \mathcal{F}|_{X_y}$ is locally constant in $y \in X'$ ([212], II.5). If $\mathcal{L}$ is a very ample invertible
sheaf on $X$, we conclude that the Hilbert polynomial of $\mathcal{F}_y$ with respect to $\mathcal{L}_y$ is locally
constant on $X'$. If $X'$ is connected, it is independent of the choice of the fibre $X_y$.
Example A.10.36. Let $K = \mathbb{F}_2$, let $X$ be the closed subvariety of $\mathbb{P}^1_K \times \mathbb{A}^1_K$ given
by $0 = x^2 + yx + 1$, where $x$ is an affine coordinate on $\mathbb{P}^1_K$ and $y$ is the coordinate
on $\mathbb{A}^1_K$. Let $\varphi : X \to \mathbb{A}^1_K$ be the second projection. Then the fibre of $\varphi$ over $y$ as a
scheme is given by the equation $x^2 + yx + 1 = 0$, hence the Hilbert polynomial of $X_y$ is
$\binom{x+1}{1} - \binom{x-1}{1} = 2$. On the other hand, the fibre over $0$ is as a variety given by $x = 1$, i.e.
the Hilbert polynomial is equal to $1$. This shows that it is necessary to consider the fibres
in the sense of schemes.

A.10.37. It is possible to relate the cohomology of the product to the cohomologies of the
factors. This is done in the Künneth formula: Let $X_1$ and $X_2$ be varieties over $K$. Let
$\mathcal{E}$ be a locally free sheaf on $X_1$ and let $\mathcal{F}$ be a coherent sheaf on $X_2$. We denote the
projection of $X_1 \times X_2$ onto $X_i$ by $p_i$. Then
\[ H^n(X_1 \times X_2, p_1^*\mathcal{E} \otimes p_2^*\mathcal{F}) \cong \bigoplus_{p+q=n} H^p(X_1, \mathcal{E}) \otimes_K H^q(X_2, \mathcal{F}). \]

For the proof (of a much more general result), we refer to [136], Th.6.7.8.
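On the level of dimensions, the Künneth formula says that the dimensions on the product are the convolution of the dimensions on the factors. The following Python sketch illustrates this for a product of two projective lines. It assumes the standard values dim H^0(P^1, O(d)) = max(d + 1, 0) and dim H^1(P^1, O(d)) = max(−d − 1, 0), which are not stated in the text above; the function names are hypothetical.

```python
def h_p1(d):
    """[dim H^0, dim H^1] of O(d) on the projective line (assumed standard values)."""
    return [max(d + 1, 0), max(-d - 1, 0)]


def kunneth_dims(h1, h2):
    """Dimensions on the product: out[n] = sum over p + q = n of h1[p] * h2[q]."""
    out = [0] * (len(h1) + len(h2) - 1)
    for p, a in enumerate(h1):
        for q, b in enumerate(h2):
            out[p + q] += a * b
    return out


def euler_characteristic(h):
    """Alternating sum of the dimensions, as in A.10.30."""
    return sum((-1) ** j * hj for j, hj in enumerate(h))


# O(2, -3) on P^1 x P^1: only H^1 survives, of dimension 3 * 2.
mixed = kunneth_dims(h_p1(2), h_p1(-3))
# O(1, 1) on P^1 x P^1: only H^0 survives, of dimension 2 * 2.
positive = kunneth_dims(h_p1(1), h_p1(1))
```

In both examples the Euler characteristic of the product equals the product of the Euler characteristics of the factors, as the convolution structure forces.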

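A.10.38. Let $\mathcal{F}$ be a coherent sheaf on a multiprojective space
$\mathbb{P}_K := \mathbb{P}^{n_1}_K \times \cdots \times \mathbb{P}^{n_r}_K$ (see A.6.13). Then there is $k \in \mathbb{N}$ such that
\[ H^i(\mathbb{P}_K, \mathcal{F} \otimes \mathcal{O}_{\mathbb{P}}(d_1, \dots, d_r)) = 0 \]
for all $d_1, \dots, d_r \geq k$ and $i > 0$.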
Note first that the claim holds for F = OX with k = 1 . This is clear from the case r = 1
([148], Th.III.5.1) and the Künneth formula. For general F , there is a short exact sequence
\[ 0 \longrightarrow \mathcal{F}' \longrightarrow \mathcal{E} \longrightarrow \mathcal{F} \longrightarrow 0, \]
where E is a finite direct sum of sheaves OP (qi , . . . , qi ) for various integers qi (because of
[148], Cor.II.5.18 and OP (1, . . . , 1) very ample). Applying descending induction on i in
the associated long exact cohomology sequence, we deduce the claim from the case above
because cohomology is compatible with direct sums ([148], Rem.III.2.9.1).

A.11. Rational maps

A.11.1. Let $X, X'$ be varieties over $K$. We consider pairs $(U, \varphi)$, where $U$ is
an open dense subset in $X$ and $\varphi : U \to X'$ is a morphism. Two pairs are
called equivalent if the morphisms agree on the intersection. To show that it is
an equivalence relation, we use that a morphism is determined by its restriction to
an open dense subset. This follows from the fact that the diagonal is closed. An
equivalence class is called a rational map. A rational map $\varphi : X \dashrightarrow X'$ may be
represented by a pair $(U_{\max}, \varphi_{\max})$, where $U_{\max}$ is maximal, in fact the union of
all possible open subsets where $\varphi$ is defined. We call $U_{\max}$ the domain of $\varphi$. If
$X' = \mathbb{A}^1_K$, then rational maps are the same as rational functions.
A.11.2. Let $\varphi : X \dashrightarrow X'$ and $\psi : X' \dashrightarrow X''$ be rational maps. The image
$\operatorname{im}(\varphi)$ of a rational map is defined as the image of its domain. Note that the
composition $\psi \circ \varphi$ does not always make sense, since $\operatorname{im}(\varphi)$ may be contained in
the complement of the domain of $\psi$. However, if $X'$ is irreducible and $\operatorname{im}(\varphi)$ is
dense in $X'$, then the composition $\psi \circ \varphi$ makes sense as a rational map from $X$
to $X''$. A rational map of varieties is called dominant if the image is dense.
A.11.3. A rational map $\varphi : X \dashrightarrow X'$ of irreducible varieties over $K$ is said
to be a birational map if it is dominant and if there is a dominant rational map
$\psi : X' \dashrightarrow X$ which is inverse to $\varphi$, i.e. $\psi \circ \varphi$ and $\varphi \circ \psi$ are both (equivalent to)

the identity on $X$ and $X'$, respectively. In this case, $X$ is said to be birational to
$X'$. Clearly, the composition of birational maps is again birational.
A.11.4. Let $\varphi : X \dashrightarrow X'$ be a dominant rational map of irreducible varieties.
Then we get a homomorphism $\varphi^\sharp : K(X') \to K(X)$, $f \mapsto f \circ \varphi$, of function
fields. Conversely, any homomorphism of function fields arises uniquely this way
because on suitable affine open dense subsets, it induces a homomorphism of
coordinate rings and then we may use A.2.4. In particular, $\varphi$ is birational if and only
if $\varphi^\sharp$ is an isomorphism of function fields.
Up to birationality, we have seen that X is determined by its function field. Note
that any finitely generated field extension F of K is a function field of an irre-
ducible (affine) variety over K .
A.11.5. Let $X$ be an irreducible $K$-variety of dimension $r$ which is geometrically
reduced. We claim that $X$ is birational to a hypersurface in $\mathbb{A}^{r+1}_K$.

We sketch the proof: By A.4.11, $K(X)$ is separable over $K$, i.e. there are algebraically
independent $f_1, \dots, f_r \in K(X)$ such that $K(X)$ is a finite-dimensional separable field
extension of $F := K(f_1, \dots, f_r)$. By the primitive element theorem ([156], Sec.4.14),
$K(X)$ is generated over $F$ by a rational function $f_{r+1}$. It follows that the vector
$\varphi(x) := (f_1(x), \dots, f_{r+1}(x))$ gives a rational map to $\mathbb{A}^{r+1}_K$. Let $p$ be the minimal polynomial
of $f_{r+1}$ over $F$. By clearing denominators, we may assume that $p = q(f_1, \dots, f_r, \cdot)$
for some $q \in K[x_1, \dots, x_{r+1}]$. The hypersurface $Z(\{q\})$ in $\mathbb{A}^{r+1}_K$ has a function field
isomorphic to $K(X)$. It follows easily that $\varphi$ gives a birational map from $X$ to $Z(\{q\})$.

A.11.6. Now we assume that $K$ is infinite and that $X$ is an irreducible geometrically
reduced projective variety in $\mathbb{P}^n_K$ of dimension $r < n$. Then we realize the birational map
into a hypersurface of $\mathbb{P}^{r+1}_K$ by a suitable projection with centre outside of $X$:
By a linear change of coordinates, we may assume that all $(r+1)$-codimensional projective
linear subspaces of the form $\{x_{j_0} = \cdots = x_{j_r} = 0\}$ are disjoint from $X$. In particular,
$X \not\subset \{x_0 = 0\}$. Then $K(X)$ is generated by $\frac{x_1}{x_0}, \dots, \frac{x_n}{x_0}$ over $K$. Since this extension is
separable, we may assume after a renumbering of the coordinates that $K(X)$ is separable
over $F := K(\frac{x_1}{x_0}, \dots, \frac{x_r}{x_0})$ (see [157], proof of Th.8.37). By separability, $F \subset K(X)$ has
only finitely many intermediate fields ([156], Th.4.28). Since $K$ is infinite, we conclude
that there are linearly independent linear forms $\ell_{r+1}, \dots, \ell_n \in K[x_{r+1}, \dots, x_n]$ such
that the intermediate fields $E_j := F\bigl(\ell_j(\frac{x_{r+1}}{x_0}, \dots, \frac{x_n}{x_0})\bigr)$ are the same for all
$j \in \{r+1, \dots, n\}$. Let $A$ be the square matrix whose rows are the coefficients of the $\ell_j$. Then
the primitive elements of the $E_j$ are the components of $A \cdot (\frac{x_{r+1}}{x_0}, \dots, \frac{x_n}{x_0})^t$. Multiplying
the latter with $A^{-1}$, we see that $E_j = K(X)$ for all $j \in \{r+1, \dots, n\}$. Now consider the
projection
\[ \pi : \mathbb{P}^n_K \dashrightarrow \mathbb{P}^{r+1}_K, \quad x \mapsto (x_0 : \cdots : x_r : \ell_{r+1}(x)). \]
As in A.11.5, we conclude that $\pi$ is a birational map to a hypersurface in $\mathbb{P}^{r+1}_K$. By
assumption, we have $\{x_0 = \cdots = x_r = 0\} \cap X = \emptyset$. Therefore the centre of $\pi$ lies outside
of $X$.

A.11.7. Let $\pi$ be a projection as above with $K$ still infinite. It maps $X$
birationally onto $Z(\{f\})$ for some homogeneous polynomial $f \in K[x_0, \dots, x_{r+1}]$.
The degree $d$ of $f$ is equal to the degree of $X$. To see this, note that pull-back
$\pi^*$ maps a hyperplane to a hyperplane and hence the claim follows from the
projection formula and Example A.9.32. After a linear change of coordinates, we may
assume that $f$ has the form
\[ f(x) = f_0(x_0, \dots, x_r) + \cdots + f_{d-1}(x_0, \dots, x_r)\, x_{r+1}^{d-1} + x_{r+1}^{d}. \]

A.11.8. Let $\varphi : X \dashrightarrow X'$ be a rational map of varieties, defined as a morphism
on the open dense subset $U$ of $X$. If $\varphi$ is defined as a morphism in an open
neighbourhood of $x \in X$ with $x' = \varphi(x)$, then $x'$ is contained in the closure of
$\varphi(U)$ and we have $\varphi^\sharp(\mathcal{O}_{X',x'}) \subset \mathcal{O}_{X,x}$.
We claim that the converse of this evident fact also holds.
Let $x \in X$ and $x' \in X'$. We may assume that all the irreducible components of
$X$ pass through $x$. We suppose that $x'$ is contained in the closure of $\varphi(U)$. Then
we get a well-defined homomorphism of $\mathcal{O}_{X',x'}$ to the ring of rational functions
$K(X)$, given by $f' \mapsto f' \circ \varphi$. Finally, we assume that the range of this
homomorphism is contained in $\mathcal{O}_{X,x}$. Then the claim is that $\varphi$ is defined in $x$ with image
$x'$.
Proof: We choose an affine neighbourhood $V'$ of $x'$ with coordinates $x'_1, \dots, x'_n$. Then
there is an affine open neighbourhood $V$ of $x$ such that all $x_j := x'_j \circ \varphi$ are regular
functions on $V$. We define a morphism $V \to V'$ by $v \mapsto (x_1(v), \dots, x_n(v))$.
By considering $\varphi^\sharp$, it is clear that $\varphi$ agrees with this morphism on $U \cap V$, proving the
claim.

A.11.9. Let $\varphi : X \dashrightarrow X'$ be a rational map of $K$-varieties with $X$ smooth. If the base
change $\varphi_{\bar K}$ extends to a morphism $X_{\bar K} \to X'_{\bar K}$, then $\varphi$ extends to a morphism $X \to X'$.

Proof: Assume that $\varphi_{\bar K}$ extends to a morphism $\bar\varphi : X_{\bar K} \to X'_{\bar K}$ and let $x \in X$. We need
to prove that $\varphi$ extends to a morphism in a neighbourhood of $x$ with image $x' := \bar\varphi(x)$.
We may assume that all irreducible components of $X$ pass through $x$. Let $U$ be an open
dense subset of $X$, where $\varphi$ is defined as a morphism.
Clearly, $x'$ is in the closure of $\varphi(U)$. By A.11.8, we have to prove that, for any regular
function $f'$ in an open neighbourhood of $x'$, the rational function $f' \circ \varphi$ is regular in a
neighbourhood of $x$.
Since $f' \circ \bar\varphi$ is regular in $x$ over $\bar K$, we know that no poles of $\operatorname{div}(f' \circ \bar\varphi)$ pass through
$x$. Since $\operatorname{div}(f' \circ \bar\varphi)$ is the base change of $\operatorname{div}(f' \circ \varphi)$ to $\bar K$ (use that $X$ is smooth and
work with Cartier divisors), we conclude that $f' \circ \varphi$ has no poles through $x$.
Now we know that a rational function on a smooth variety without poles is regular (cf.
A.8.21). Therefore, the function $f' \circ \varphi$ is regular in an open neighbourhood of $x$.

A.11.10. Let $\varphi : X \dashrightarrow X'$ be a rational map of a $K$-variety $X$ regular in
codimension 1 to a complete $K$-variety $X'$ with domain $U$. By the valuative
criterion of properness ([148], Th.II.4.7) and the scheme-theoretic analogue of
A.11.8, we have
\[ \operatorname{codim}(X \setminus U, X) \geq 2. \]
In particular, if $X$ is a regular curve, then $\varphi$ extends to a morphism.

A.12. Properties of morphisms

A.12.1. Let $\varphi : X \to X'$ be a dominant morphism of irreducible varieties over
$K$. Then for any $y \in \varphi(X)$, every irreducible component of $X_y = \varphi^{-1}(y)$ has
dimension $\geq \dim(X) - \dim(X')$ and there is an open dense subset $U$ of $X'$ such
that
\[ \dim(X_y) = \dim(X) - \dim(X') \]
for all $y \in U$. This is the dimension theorem.

The inequality follows from [137], Th.5.5.8, and the existence of U is a consequence of
generic flatness (see A.12.13 below). The whole statement is part of [148], Exercise II.3.22.
A.12.2. Let $\varphi : X \to X'$ be a morphism of $K$-varieties. Then $\varphi$ is called a
finite morphism if there is an open affine covering $(U'_\alpha)_{\alpha\in I}$ of $X'$ such that
$U_\alpha := \varphi^{-1}(U'_\alpha)$ is also affine and such that $K[U_\alpha]$ is a finitely generated
$K[U'_\alpha]$-module using A.2.4.
Lemma A.12.3. A finite morphism has finite fibres.
Proof: We may assume $X$ and $X'$ both affine.
For $x' \in X'$, the fibre over $x'$ is an affine variety over $K(x')$ with coordinate ring
$K(x') \otimes_{K[X']} K[X]$ divided by the radical ideal $\sqrt{0}$. We may assume that $K[X]$ is a
finite $K[X']$-module. Therefore, the coordinate ring of $X_{x'}$ is a finite-dimensional
$K(x')$-vector space.
Suppose now that $Y$ is an irreducible component of $X_{x'}$. Then $K[Y]$ is a quotient of
$K[X_{x'}]$. We conclude that $K[Y]$ is also a finite-dimensional $K(x')$-vector space and
hence a field. Its transcendence degree over $K(x')$ is $0$, proving $\dim(Y) = 0$ (see
Example A.3.6).
A.12.4. There is the following converse: A morphism ϕ : X → X' of varieties
over K is finite if and only if ϕ is a proper morphism with finite fibres ([136],
Prop.4.4.2).
A.12.5. Let ϕ : X → X' be a finite morphism and let U' be any affine open
subset of X'. Then U := ϕ^{-1}(U') is affine and K[U] is a finitely generated
K[U']-module ([148], Exercise II.3.4). Therefore, the definition does not depend
on the choice of the affine open covering.
Moreover, we conclude that the composition of finite morphisms remains finite.

Example A.12.6. For every irreducible variety X over K, there is a canonical
irreducible normal variety X' called the normalization of X.
To sketch the construction, we first assume that X is affine.
The integral closure A of K[X] in K(X) is a finite K[X]-module ([338], Th.9,
p.267). We conclude that there is an irreducible affine variety X' with K[X'] =
A. It follows easily that the localization of an integrally closed domain remains
integrally closed and hence X' is normal.
The inclusion K[X] ⊂ K[X'] leads to a canonical birational finite morphism
π : X' → X (using A.2.4, A.11.4). For general X, the irreducible normal variety
X' and the canonical finite birational morphism π : X' → X are obtained by a
gluing process (for details, see [135], Sec.6.4).
A.12.7. Let ϕ : X → X' be a finite surjective morphism of varieties over K and
let L' be a line bundle on X'. Then L' is ample if and only if ϕ*(L') is ample
on X ([148], Exercise III.5.7(d)).
A.12.8. Let ϕ : X → X' be a proper morphism of varieties over K. Then there
is a variety X'', a finite morphism ψ : X'' → X' and a surjective morphism
ρ : X → X'' with connected fibres such that ϕ = ψ ◦ ρ. These properties do not
determine X'', ρ and ψ uniquely. However, the construction below is canonical,
called the Stein factorization:

For an affine open subset U' of X', the sheaf O_X(ϕ^{-1}(U')) is a finitely generated
K[U']-module (see A.10.18). Although ϕ^{-1}(U') may not be affine, there is an affine
variety U'', unique up to isomorphism, with K[U''] = O_X(ϕ^{-1}(U')). This follows from
the fact that the K-algebra O_X(ϕ^{-1}(U')) is finitely generated and without nilpotents,
so we may use the generators to define coordinates. Using the homomorphism
ϕ♯ : K[U'] → K[U''], we get a finite morphism ψ : U'' → U' with ψ♯ = ϕ♯ (see A.2.4).
Using the affine open subsets U'_g, g ∈ K[U'], we prove easily that the morphisms
ψ : U'' → U' agree on overlappings, i.e. with varying U' we can paste the U'' and the
morphisms U'' → U' to get a K-variety X'' and a morphism ψ : X'' → X'. By
construction, ψ is finite. To define ρ : X → X'' on ϕ^{-1}(U'), let x_1, . . . , x_n be
generators of K[U''] = O_X(ϕ^{-1}(U')) as a K-algebra. We may view them as coordinates
on U'' and we define ρ(x) := (x_1(x), . . . , x_n(x)). It is easy to see that we get a
well-defined morphism ρ : X → X'' with ρ^{-1}(U'') = ϕ^{-1}(U') for all U'. By
construction, ρ♯ : K[U''] → O_X(ϕ^{-1}(U')) is the identity. We conclude that
ϕ♯ = ρ♯ ◦ ψ♯ on K[U'], proving ϕ = ψ ◦ ρ on X. By the factorization property and
A.6.15(c), we conclude that ρ is proper. Since ρ♯ is one-to-one, it is easy to see that
ρ(X) is dense in X'' (use [148], Exercise II.2.18). Since a proper map is closed, we
conclude ρ(X) = X''. For the proof of connected fibres and more details, we refer to
[136], 4.3.

A.12.9. Suppose that ϕ : X → X' is a morphism of irreducible varieties over K.
We assume that dim(X) = dim(X') and that ϕ is dominant. The degree of ϕ
is defined by deg(ϕ) := [K(X) : K(X')] (finite by A.9.10). Then there is an
open dense subset U' of X' such that the restriction of ϕ gives a finite morphism
ϕ^{-1}(U') → U' whose fibres have cardinality equal to the separable degree of
K(X)/K(X'), also called the separable degree of ϕ.

We sketch the proof. We may assume that X and X' are affine. Using the correspondence
between finitely generated fields and varieties up to birationality, we may assume
that K(X) is a primitive extension of K(X'), which is either separable or purely
inseparable. Let q(t) be the minimal polynomial of a primitive element over K(X').
Again by shrinking X and X', we may assume that q(t) ∈ K[X'][t] and that
K[X] ≅ K[X'][t]/(q(t)).
This proves already finiteness. If q is separable, then q and (d/dt)q are coprime in
K(X')[t], hence there are a, b ∈ K(X')[t] with aq + b (d/dt)q = 1. Shrinking X, X'
again, we may assume that a, b ∈ K[X'][t]. It follows that the identity survives after
specializing in any x' ∈ X', hence the specialization of q in x' is separable. We
conclude that the fibre over x' has exactly deg(q) = [K(X) : K(X')] points.
If K(X)/K(X') is purely inseparable, then K has characteristic p ≠ 0 and
q(t) = t^{p^e} − h for some h ∈ K(X') ([157], Prop.8.13). Again, we may assume
h ∈ K[X'] and all the fibres have exactly one element.
A.12.10. In algebra, there is an important generalization of free modules. Let A
be a commutative ring with 1 and let M be an A-module. Then M is called
flat if for every injective A-module homomorphism N' → N, the tensor
homomorphism M ⊗_A N' → M ⊗_A N is also injective. Flatness is preserved under
base change and localization. Moreover, M is flat if and only if the localization
M_℘ is a flat A_℘-module for all prime ideals ℘ of A. Clearly, the same holds if
we consider only maximal ideals. For proofs and more details, we refer to [197],
Ch.2, §3.
A.12.11. We transfer this concept to algebraic geometry by calling a morphism
ϕ : X → X' of varieties over K flat in x ∈ X if ϕ♯ : O_{X',ϕ(x)} → O_{X,x} makes
O_{X,x} into a flat O_{X',ϕ(x)}-module. If ϕ is flat in all points of X, then ϕ is called
flat. If X and X' are affine, then ϕ is flat if and only if K[X] is a flat K[X']-
module. This follows from A.12.10 and the one-to-one correspondence between
maximal ideals and points modulo conjugates (see A.2.7).
A.12.12. Flatness is important since many properties of one fibre hold for all fibres.
For example, if ϕ : X → X' is a flat morphism of pure dimensional varieties over
K, then for x' ∈ ϕ(X), we have
dim ϕ^{-1}(x') = dim(X) − dim(X')
([148], Cor.III.9.6). Without flatness, the dimension theorem only ensures ≥. For
example, if X is the blow up of a two-dimensional variety in a point, then the
inequality is strict for the fibre over that point. If ϕ is a flat morphism of
projective varieties, then equality follows from the more general fact that the
Hilbert polynomial is stable with respect to the scheme-theoretic fibres (use
A.10.33 and A.10.35).

A.12.13. The theorem on generic flatness states that for any morphism ϕ : X →
X' of varieties over K, there is an open dense subset U' of X' such that the
restriction of ϕ to a map ϕ^{-1}(U') → U' is flat ([137], Th.6.9.1).
Note that if ϕ is not dominant, this is not interesting since we may choose U' :=
X' \ ϕ(X). On the other hand, every flat morphism of K-varieties is open ([137],
Th.2.4.6) and hence dominant.
A.12.14. A morphism ϕ : X → X' of varieties over K is called unramified in
x ∈ X if the following conditions are satisfied:
(a) The maximal ideal m_x of O_{X,x} is generated by the image of m_{ϕ(x)} under
the map ϕ♯ : O_{X',ϕ(x)} → O_{X,x}, f' ↦ f' ◦ ϕ.
(b) ϕ♯ induces a separable extension K(ϕ(x)) ⊂ K(x) of residue fields.
If ϕ is unramified in all points of X, then it is called an unramified morphism.
If ϕ is unramified and flat, then it is called étale.
Example A.12.15. We assume that X and X' are both affine varieties over K
and K[X] = K[X'][t]/(f(x', t)), where f(x', t) is a monic polynomial in the
variable t with coefficients in K[X']. There is a unique morphism ϕ : X → X'
with ϕ♯ equal to the canonical homomorphism K[X'] → K[X] (see A.2.4). Note
that ϕ♯ makes K[X] into a free K[X']-module of rank deg(f). Hence ϕ is a
finite flat morphism. We claim that the set of unramified (even étale) points in X
is equal to X \ Z({∂f/∂t}).

We compute first the fibre over x'_0 ∈ X'. Let
f(x'_0, t) = \prod_{j=1}^{r} f_j(x'_0, t)^{n_j}
be the decomposition of f(x'_0, t) into different irreducible polynomials f_j(x'_0, t) in
the polynomial ring K(x'_0)[t]. Then the fibre ϕ^{-1}(x'_0) has coordinate ring
K[ϕ^{-1}(x'_0)] ≅ \prod_{j=1}^{r} K(x'_0)[t]/(f_j(x'_0, t))
by the Chinese remainder theorem. The fibre points are in one-to-one correspondence
with the zeros of f_1(x'_0, ·), . . . , f_r(x'_0, ·). Let x_0 = (x'_0, t_0) be a fibre point
over x'_0. Without loss of generality, we may assume that t_0 is a zero of f_1(x'_0, t)
(and hence of no other f_j(x'_0, t)). Let I(x'_0) be the ideal of vanishing of x'_0 in
K[X'], hence K(x'_0) = K[X']/I(x'_0). By the Chinese remainder theorem again, we have
K[X]/K[X]I(x'_0) ≅ \prod_{j=1}^{r} K(x'_0)[t]/(f_j(x'_0, t)^{n_j}).
Using localization in I(x_0), we conclude that
O_{X,x_0}/O_{X,x_0} m_{x'_0} ≅ K(x'_0)[t]/(f_1(x'_0, t)^{n_1}).
On the other hand, we have
O_{X,x_0}/m_{x_0} = K(x_0) ≅ K(x'_0)[t]/(f_1(x'_0, t)).
We conclude that m_{x'_0} generates m_{x_0} if and only if n_1 = 1. Moreover, the residue
field K(x_0) is separable over K(x'_0) if and only if f_1(x'_0, t) has no multiple roots.
We conclude that ϕ is unramified in x_0 if and only if t_0 is a simple zero of f(x'_0, t),
which proves the claim.
A.12.16. In the situation of Example A.12.15, let V be an open subset of
X \ Z({∂f/∂t}). Then the restriction ψ : V → X' of ϕ is called a standard étale
morphism. Next, we will see that étale morphisms are locally built up from standard
étale morphisms.
A.12.17. Let ϕ : X → X' be an étale morphism. For every x ∈ X, there are open
affine neighbourhoods U, V' of x and ϕ(x), respectively, such that ϕ restricts to
a standard étale morphism ψ : U → V'. For an unramified morphism, there is a
closed embedding i : U → V and a standard étale morphism ψ : V → V' such
that ϕ = ψ ◦ i. This follows from a theorem of Chevalley (see [137], Th.18.4.6
and the proof of Cor.18.4.7).
In the theory of complex manifolds, an étale morphism is just a local isomorphism.
This does not hold in algebraic geometry, but we have the following result,
which we deduce easily from [44], Prop.2.2.8:
Proposition A.12.18. Let ϕ : X → X' be a morphism of smooth varieties over
K. Then ϕ is étale in x ∈ X if and only if the differential dϕ induces an
isomorphism
T_{X,x} → T_{X',ϕ(x)} ⊗_{k(ϕ(x))} k(x).
A.12.19. A dominant morphism ϕ : X → X' of irreducible varieties over K is
called separable if K(X) is a separable extension of K(X'). If ϕ is separable
and dim(X) = dim(X'), the proof of A.12.9 shows that ϕ is generically
unramified over X', i.e. there is an open dense subset U' of X' such that the
restriction of ϕ to ϕ^{-1}(U') → U' is étale.

A.13. Curves and surfaces

A.13.1. A curve (resp. a surface) over the field K is a pure dimensional
K-variety of dimension 1 (resp. 2).
A.13.2. Any curve over K is birational to a regular projective curve over K .

To see this, we note first that the disjoint union of projective curves is projective. Hence
we may assume C irreducible. By passing to an open affine part and then to the projective
closure, we may assume C projective. The normalization π : C' → C is a birational
finite morphism (see A.12.6). By A.12.7, C' is projective. Since a normal curve is regular
(Theorem A.8.5), we get the claim.

A.13.3. If K is perfect, then a regular curve is smooth (see A.7.13). For
non-perfect K, this does not necessarily hold (see [148], Exercise III.10.1). For any
curve C over K, the above implies that C_{K̄} is birational to a smooth projective
curve over K̄. Clearly, this holds also over a finite-dimensional subextension of
K̄/K.
A.13.4. Let C be a geometrically irreducible smooth projective curve over K .
The dual of the tangent bundle is a line bundle which we call the canonical line
bundle of C . It is denoted by KC . The genus of C is
g(C) := dim Γ(C, KC ).
In other words, it is the dimension of the space Ω^1_C(C) of globally defined 1-forms
on C. For example, if C is a smooth plane curve of degree d in P^2_K, i.e. C is
the zero set of a geometrically irreducible homogeneous polynomial f(x_0, x_1, x_2)
of degree d, then the genus formula
g(C) = (1/2)(d − 1)(d − 2)
holds ([148], Exercise II.8.4(f)).
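As a small illustration of the genus formula, the following sketch evaluates g(C) = (d − 1)(d − 2)/2 for a few degrees. The function name `plane_curve_genus` is ours and not part of the text; the formula applies only to smooth plane curves, which the code cannot check.

```python
def plane_curve_genus(d):
    """Genus of a smooth plane curve of degree d, by the genus formula.

    Assumption: the curve is smooth and geometrically irreducible;
    this function only evaluates the formula (d - 1)(d - 2) / 2.
    """
    if d < 1:
        raise ValueError("the degree must be a positive integer")
    # (d - 1)(d - 2) is a product of consecutive integers, hence even.
    return (d - 1) * (d - 2) // 2


if __name__ == "__main__":
    for degree in range(1, 6):
        print(degree, plane_curve_genus(degree))
```

Lines and conics have genus 0, smooth cubics have genus 1 and smooth quartics have genus 3.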

We have seen in A.9.24 that the degree of a zero-dimensional cycle does not
depend on its rational equivalence class. Using CH_0(C) ≅ Pic(C) from A.9.18,
the degree deg(L) of a line bundle L on C is defined as the degree of the
corresponding rational equivalence class. Most important is the Riemann–Roch
theorem for curves:
Theorem A.13.5. Let L be a line bundle on the geometrically irreducible smooth
projective curve C over K . Then we have
dim Γ(C, L) − dim Γ(C, KC ⊗ L−1 ) = deg(L) + 1 − g(C).
Proof: For K algebraically closed, this is proved in [148], Th.IV.1.3. By base change and
the following remarks, we reduce to this case. First, note that the base change of the tangent
bundle is the tangent bundle of the base change (see A.7.6). So the same holds for the
canonical line bundle. The cohomology on a geometrically reduced variety is compatible
with base change (see A.10.28). We have seen in the proof of A.9.25 that degree of a divisor
(and hence of a line bundle) is invariant under base change. This proves the claim in general.


A.13.6. As an application (setting L = K_C), we conclude deg(K_C) = 2g(C) − 2.
Using cohomology, we can reformulate the Riemann–Roch theorem by
χ(ℒ) = deg(L) + 1 − g(C),
where χ(ℒ) = dim H^0(C, ℒ) − dim H^1(C, ℒ) is the Euler characteristic of the sheaf of
sections ℒ of L.
To see this, we may assume K algebraically closed as above. Then the claim
follows from Serre duality (see A.10.29), i.e. H^1(C, ℒ) is the dual space of
H^0(C, Ω^1_C ⊗ ℒ^{-1}).
A.13.7. Let C be a smooth geometrically irreducible projective curve over K and
let L be a line bundle on C . Then L is ample if and only if deg(L) > 0. If
deg(L) ≥ 2g(C) + 1, then L is very ample.

For K algebraically closed, this is proved in [148], Cor.IV.3.2 and the general case follows
by base change using A.6.12.

A.13.8. Let X be a smooth geometrically irreducible projective variety over K.
Then the tangent bundle of X is a vector bundle of rank d := dim(X) and
K_X := ∧^d T_X^* is called the canonical line bundle of X. Its sections are d-forms on
X. A divisor of a meromorphic section of K_X is denoted by div(K_X) and is
called a canonical divisor. The number p_a(X) := (−1)^d (χ(O_X) − 1) is the
arithmetic genus of X, where χ is the Euler characteristic from A.10.30. If X
is a smooth irreducible projective curve, then the arithmetic genus agrees with the
genus defined in A.13.4 by Serre duality A.10.29.

The Riemann–Roch theorem for surfaces says:
Theorem A.13.9. Let D be a divisor on the irreducible smooth projective surface
X. Then
χ(O_X(D)) = (1/2) D · (D − div(K_X)) + 1 + p_a(X).
Proof: For K algebraically closed, this is [148], Th.V.1.6. The reduction to this case is by
base change similarly as for curves by noting that intersection numbers are invariant under
base change (see A.9.25). □

A.13.10. By Serre duality again (see A.10.29), we may rephrase this as
dim Γ(X, O(D)) − dim H^1(X, O(D)) + dim Γ(X, O(K_X − D))
= (1/2) D · (D − div(K_X)) + 1 + p_a(X).
The proof is the same as for curves.

A.14. Connexion to complex manifolds

This section is for readers who are more familiar with the theory of complex
manifolds (as in [130]) than with the algebraic side. We explain the meaning of
the other sections of the appendix in terms of complex analysis, which could be
helpful for the understanding. For more details and proofs, we refer to [148], App.
B, [280], Chs. VII, VIII, and J.-P. Serre [275].

In this section, K denotes a field contained in C and K̄ is the algebraic closure
of K in C. As usual, x = (x_1, . . . , x_n).
A.14.1. Let X be a Zariski-closed subset of A^n_K, i.e. there are f_1(x), . . . , f_r(x) ∈
K[x] such that X ⊂ K̄^n is the zero set of these polynomials. Then X was
called an affine variety over K and regular functions on X were restrictions of
polynomials from K[x]. If we consider the zero set of f_1, . . . , f_r in C^n, then we
get an affine variety X_C over C. It is closed in C^n with respect to the Zariski
topology.
A.14.2. On Cn , we have the complex topology given by the basis formed by the
open balls. Clearly, the complex topology is finer than the Zariski topology. We
define Xan to be the complex space with the same underlying set as XC and with
the topology induced by the complex topology of Cn .
A.14.3. More generally, if X is a variety over K , then we can use an affine atlas
(Uα , ϕα )α∈I of X to define an atlas on XC . Passing to the corresponding com-
plex spaces, we get an atlas of a complex space Xan . The original variety X is
often called an algebraic variety and Xan is called the associated complex ana-
lytic variety. The dimension of the algebraic variety is the same as the dimension
of the associated complex space. It follows from the Jacobi criterion that X is
smooth if and only if Xan is a complex manifold.
For the rest of the section, we assume for simplicity that X is a smooth variety
over K .
A.14.4. First, note that every regular function on an open subset U of X is the
restriction of a unique analytic function on Uan . A morphism of smooth varieties
over K extends uniquely to an analytic map of the associated complex manifolds.
The smooth variety X is geometrically irreducible if and only if Xan is connected.
The dimension of X is equal to the dimension of the complex manifold Xan . If
X1 and X2 are smooth varieties over K , then (X1 × X2 )an = (X1 )an × (X2 )an .
A rational function on X gives rise to a meromorphic function on Xan .
A.14.5. If E is a vector bundle over X , then Ean is a holomorphic vector bundle
on Xan . They have the same transition matrices. Clearly, every section of E
extends uniquely to a holomorphic section of Ean . The basic operations of vector
bundles are compatible with passing to the associated holomorphic vector bundles.
Moreover, (TX )an is the tangent bundle of the complex manifold Xan .
A.14.6. The variety X is complete if and only if Xan is compact. More generally,
a morphism ϕ of algebraic varieties is proper if and only if inverse images of
compact subsets with respect to ϕan remain compact.
A.14.7. We have seen above that to every algebraic object, we have canonically an
analytic one. On complete varieties, we can often reverse the procedure due to the
GAGA principle of Serre (see [275]; [140], XII).

Suppose that X is a smooth complete variety over C. Then meromorphic functions
on X_an are the same as rational functions on X. More generally, a holomorphic
map ϕ : X_an → X'_an for smooth complete varieties X, X' over C gives
a morphism X → X'. The procedure Y ↦ Y_an maps the set of closed subsets of
P^n_C bijectively onto the set of closed subsets of (P^n_C)_an. If E, E' are vector bundles
on X, then E ≅ E' if and only if E_an ≅ E'_an as holomorphic vector bundles.
Moreover, the cohomology groups H^p(X, E) agree with the cohomology groups
H^p(X_an, E_an).
A.14.8. Let X be a smooth variety over K. Any divisor D on X gives rise to
a divisor D_an on X_an. Principal divisors pass to principal divisors on X_an. We
have a canonical homomorphism CH_r(X) → H_{2r}(X_an, Z) mapping the prime
cycle Y to Y_an in the homology group. If D is a divisor on X, then D.Y maps
to the cup product of D_an and Y_an. Hence it is the same to compute intersection
numbers on X or in homology on X_an. Two divisors on X are algebraically
equivalent if and only if they are homologically equivalent on X_an, i.e. if they have
the same image in H_{2 dim(X)−2}(X_an, Z). For details, we refer to [125], Ch.19.
A.14.9. If X is an irreducible smooth projective curve over C , then Xan is a
connected projective manifold of dimension 1. The genus of X is equal to the
genus of Xan . A complex manifold of dimension 1 is called a Riemann surface.
We conclude that Xan is a connected compact Riemann surface. Conversely, Rie-
mann’s existence theorem says that every connected compact Riemann surface
has this form ([148], Th.B.3.1).
APPENDIX B RAMIFICATION

B.1. Discriminants

The discriminant is a measure for ramification. In Section B.1, we begin with
the study of discriminants from a purely algebraic point of view. The arithmetic
aspect enters by applying it to the local rings of Dedekind domains. This will
be continued in Section B.2 to study unramified field extensions in some given
places and we will mention the classical discriminant theorems from algebraic
number theory. In Section B.3, we compare these notions with the concept of
unramified morphisms from algebraic geometry. The final Section B.4 introduces
the ramification divisor of a morphism of algebraic varieties in characteristic 0 and
we will prove Hurwitz’s theorem which is important at various places in this book.
B.1.1. Let R be a commutative ring with 1 and let A be a commutative R -algebra
free as an R -module of finite rank n . The discriminant of A with respect to
the basis e1 , . . . , en is given by
D_{A/R}(e_1, . . . , e_n) := det(Tr_{A/R}(e_i e_j)) ∈ R.
Recall that the trace and the norm of a ∈ A are given by
Tr_{A/R}(a) = \sum_{i=1}^{n} α_{ii} ∈ R,   N_{A/R}(a) = det(α_{ij}) ∈ R,
where a · e_j = \sum_{i=1}^{n} α_{ij} e_i for coordinates α_{ij} ∈ R.
B.1.2. If e'_1, . . . , e'_n is another basis of A, then there is an n × n matrix λ with
entries in R such that e'_i = \sum_{j=1}^{n} λ_{ji} e_j. The transformation matrix λ is
invertible and we have
D_{A/R}(e'_1, . . . , e'_n) = det(λ)^2 D_{A/R}(e_1, . . . , e_n).
Since det(λ) is a unit in R, this proves that the principal ideal d_{A/R} in R generated
by D_{A/R}(e_1, . . . , e_n) does not depend on the choice of the basis. Then d_{A/R}
is called the discriminant ideal of A/R or simply the discriminant.
B.1.3. Let f ∈ R[t] be a monic polynomial of degree n. Then
A := R[t]/[f(t)]
is an R-algebra with R-module basis 1, t, . . . , t^{n−1}. We call
D_f := D_{A/R}(1, t, . . . , t^{n−1})
the discriminant of f.
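The definition can be evaluated directly for R = Z or Q. The sketch below is only an illustration under our own conventions: a monic f = t^n + a_1 t^{n−1} + · · · + a_n is passed as the list [a_1, . . . , a_n], multiplication by t on the basis 1, t, . . . , t^{n−1} is encoded by a companion matrix, and D_f is the determinant of the matrix (Tr(t^{i+j})). All function names are hypothetical.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def companion(a):
    """Matrix of multiplication by t on R[t]/[f] in the basis 1, t, ..., t^(n-1).

    Here f = t^n + a[0] t^(n-1) + ... + a[n-1] (our input convention).
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        c[j + 1][j] = 1          # t * t^j = t^(j+1)
    for i in range(n):
        c[i][n - 1] = -a[n - i - 1]  # t * t^(n-1) reduced modulo f
    return c


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def discriminant_trace_form(a):
    """D_f = det(Tr(t^(i+j))) for 0 <= i, j <= n-1, as in B.1.1 and B.1.3."""
    n = len(a)
    c = companion(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    traces = []
    for _ in range(2 * n - 1):
        traces.append(sum(power[i][i] for i in range(n)))  # Tr(t^k)
        power = matmul(power, c)
    gram = [[traces[i + j] for j in range(n)] for i in range(n)]
    return det(gram)
```

For instance, `discriminant_trace_form([0, -2])` treats f = t^2 − 2 and `discriminant_trace_form([0, -1, 0])` treats f = t^3 − t.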
Example B.1.4. Let R := Z[x_1, . . . , x_n]. We consider the polynomial
f(t) := t^n + x_1 t^{n−1} + · · · + x_n ∈ R[t].
Let y_1, . . . , y_n be the zeros of f(t). It is known from Galois theory that y_1, . . . , y_n
generate a Galois extension K = Q(y_1, . . . , y_n) of degree n! over the fraction field
Q(x_1, . . . , x_n) of R. We have f(t) = (t − y_1) · · · (t − y_n). For the i-th elementary
symmetric polynomial s_i, we get
x_i = s_i(−y_1, . . . , −y_n). (B.1)
Clearly, the discriminants of f considered as a polynomial in R[t] and K[t], respectively,
are the same. For B := K[t]/[f(t)], we conclude that
D_f = D_{B/K}(1, t, . . . , t^{n−1}).
By the Chinese remainder theorem, we have a canonical isomorphism
B → \prod_{j=1}^{n} K[t]/(t − y_j),   t ↦ (y_1, . . . , y_n).
Let e_j be the j-th idempotent, i.e.
e_j = \prod_{k \neq j} (y_j − y_k)^{-1} \prod_{k \neq j} (t − y_k).
Then e_1, . . . , e_n is a K-basis of B and we have
t^i = \sum_{j=1}^{n} y_j^i e_j.
Hence the transformation matrix a is a Vandermonde matrix. By B.1.2, this leads to
D_{B/K}(1, t, . . . , t^{n−1}) = det(a)^2 D_{B/K}(e_1, . . . , e_n).
Since D_{B/K}(e_1, . . . , e_n) = 1, we get
D_f = \prod_{i>j} (y_i − y_j)^2 = (−1)^{n(n−1)/2} \prod_{i \neq j} (y_i − y_j). (B.2)
Note that the right-hand side is a symmetric polynomial in y_1, . . . , y_n and hence it is a
polynomial in x_1, . . . , x_n with coefficients in Z ([156], Th.2.19).
Remark B.1.5. Note that Example B.1.4 may be used to compute the discriminant
of a monic polynomial f whenever the zeros of f are known. More precisely, let
R be again an arbitrary commutative ring with 1 and let f ∈ R[t] be monic
and of degree n. Suppose that R is a subring of a commutative ring R' and
that f(t) = (t − ξ_1) · · · (t − ξ_n) for some ξ_1, . . . , ξ_n ∈ R'. If we specialize
y_1 = ξ_1, . . . , y_n = ξ_n in Example B.1.4, we get easily from (B.2)
D_f = \prod_{i>j} (ξ_i − ξ_j)^2 = (−1)^{n(n−1)/2} \prod_{i \neq j} (ξ_i − ξ_j).
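A minimal numerical sketch of this remark, assuming rational zeros so that exact arithmetic is available: both sides of the displayed identity are evaluated independently and can be compared. The function names are ours.

```python
from fractions import Fraction
from itertools import combinations


def discriminant_from_roots(roots):
    """Product of (xi_i - xi_j)^2 over all pairs i > j."""
    result = Fraction(1)
    for a, b in combinations(roots, 2):
        result *= (Fraction(b) - Fraction(a)) ** 2
    return result


def discriminant_signed_form(roots):
    """(-1)^(n(n-1)/2) times the product of (xi_i - xi_j) over all i != j."""
    n = len(roots)
    sign = -1 if (n * (n - 1) // 2) % 2 else 1
    result = Fraction(sign)
    for i in range(n):
        for j in range(n):
            if i != j:
                result *= Fraction(roots[i]) - Fraction(roots[j])
    return result


if __name__ == "__main__":
    zeros = [1, 2, 4]  # f(t) = (t - 1)(t - 2)(t - 4)
    print(discriminant_from_roots(zeros), discriminant_signed_form(zeros))
```

For the zeros 1, 2, 4 both expressions give 1 · 9 · 4 = 36, and a repeated zero gives discriminant 0.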

Remark B.1.6. Even if we do not know the zeros, we get information from
Example B.1.4. Let f(t) := t^n + a_1 t^{n−1} + · · · + a_n ∈ R[t]. By the specialization
x_i = a_i in Example B.1.4, we see that the discriminant D_f is a polynomial in
a_1, . . . , a_n with coefficients in Z. The right-hand side of (B.2) is homogeneous
of degree n(n − 1) in y_1, . . . , y_n. Hence if we consider a_i of degree i, then
(B.1) shows that the discriminant is a homogeneous polynomial in the weighted
variables a_1, . . . , a_n of degree n(n − 1).
B.1.7. Let f(t) = a_m + a_{m−1} t + · · · + a_1 t^{m−1} + a_0 t^m, g(t) = b_n + b_{n−1} t + · · · +
b_1 t^{n−1} + b_0 t^n be polynomials with coefficients in R. For k ∈ Z \ {0, . . . , m}, we
set a_k = 0 and, similarly, we proceed with the coefficients of g. Then we form
the (m + n) × (m + n) matrix M by the rules
M_{ij} = a_{i−j} if 1 ≤ j ≤ n,
M_{ij} = b_{i−j+n} if n + 1 ≤ j ≤ m + n.
For m = 2, n = 3, the matrix is
\begin{pmatrix}
a_0 & 0 & 0 & b_0 & 0 \\
a_1 & a_0 & 0 & b_1 & b_0 \\
a_2 & a_1 & a_0 & b_2 & b_1 \\
0 & a_2 & a_1 & b_3 & b_2 \\
0 & 0 & a_2 & 0 & b_3
\end{pmatrix}.
Then the resultant of f, g with respect to m, n is res_{m,n}(f, g) := det(M).
If we make the natural choice m = deg(f) and n = deg(g), then we skip the
reference to m and n.
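The rules for M translate directly into code. The following sketch assumes coefficients in Z or Q and passes a polynomial as the list [a_0, . . . , a_m] in the ordering of B.1.7, so that a_0 is the coefficient of t^m; the function names `sylvester` and `resultant` are ours.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def sylvester(a, b):
    """The (m+n) x (m+n) matrix M of B.1.7 for a = [a_0..a_m], b = [b_0..b_n]."""
    m, n = len(a) - 1, len(b) - 1

    def coeff(c, k):
        # Coefficients with index outside the allowed range are zero.
        return c[k] if 0 <= k < len(c) else 0

    size = m + n
    matrix = [[0] * size for _ in range(size)]
    for i in range(1, size + 1):          # 1-based indices as in the text
        for j in range(1, size + 1):
            if j <= n:
                matrix[i - 1][j - 1] = coeff(a, i - j)
            else:
                matrix[i - 1][j - 1] = coeff(b, i - j + n)
    return matrix


def resultant(a, b):
    """res_{m,n}(f, g) with m = len(a) - 1 and n = len(b) - 1."""
    return det(sylvester(a, b))
```

Padding a list with leading zeros raises m or n, which corresponds to taking the resultant with respect to a larger pair (m, n).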
B.1.8. Suppose that f is monic and g is an arbitrary polynomial in R[t]. Then
res(f, g) is invertible in R if and only if the ideal generated by f and g is R[t]
(N. Bourbaki [49], Ch.IV, §6, No.6, Prop.7, Cor.1). In particular for a field R, we
have res(f, g) ≠ 0 if and only if f and g are coprime (i.e. if and only if they have
no common root in an algebraic closure).
B.1.9. The resultant may be used to compute the discriminant in terms of the
coefficients. Let f'(t) be the formal derivative of the monic polynomial f(t) =
t^n + a_1 t^{n−1} + · · · + a_n ∈ R[t] of degree n. Then
D_f = (−1)^{n(n−1)/2} res_{n,n−1}(f, f')
([49], Ch.IV, §6, No.7, Prop.11). This shows that D_f is a polynomial of degree
≤ 2n − 2 in a_1, . . . , a_n with coefficients in R and we have certainly equality
in “most” cases. Note that in contrast to Remark B.1.6, the a_1, . . . , a_n are not
weighted, i.e. we consider a_1, . . . , a_n as variables of degree 1.
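The formula of B.1.9 can be checked on small examples. This is a sketch under our own conventions: a monic f is passed as [1, a_1, . . . , a_n], the matrix is built by the rules of B.1.7, and the helper names are hypothetical.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def sylvester(a, b):
    """Matrix M of B.1.7; a[0] and b[0] are the leading coefficients."""
    m, n = len(a) - 1, len(b) - 1
    size = m + n
    matrix = [[0] * size for _ in range(size)]
    for i in range(1, size + 1):
        for j in range(1, size + 1):
            c, k = (a, i - j) if j <= n else (b, i - j + n)
            matrix[i - 1][j - 1] = c[k] if 0 <= k < len(c) else 0
    return matrix


def discriminant_monic(f):
    """D_f = (-1)^(n(n-1)/2) res_{n,n-1}(f, f') for monic f = [1, a_1, ..., a_n]."""
    n = len(f) - 1
    if n < 1 or f[0] != 1:
        raise ValueError("f must be monic of degree at least 1")
    # Formal derivative: the coefficient of t^(n-k-1) is (n - k) * a_k.
    derivative = [(n - k) * f[k] for k in range(n)]
    sign = -1 if (n * (n - 1) // 2) % 2 else 1
    return sign * det(sylvester(f, derivative))
```

For f = t^2 + bt + c this reproduces the familiar value b^2 − 4c, for example 8 for t^2 − 2.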
B.1.10. The considerations in B.1.9 make it natural to define the discriminant of
any polynomial f(t) := a_0 t^n + a_1 t^{n−1} + · · · + a_n by
D_f := a_0^{2n−2} D_g,
where g(t) := t^n + (a_1/a_0) t^{n−1} + · · · + a_n/a_0. In this way, D_f is a homogeneous
polynomial of degree 2n − 2 in a_0, . . . , a_n with coefficients in Z. Obviously, this
definition extends the one in B.1.3 for monic polynomials. Note that the definition
even makes sense in the case a_0 = 0 if we consider first the right-hand side as a
homogeneous polynomial in the variables a_0, . . . , a_n and then we specialize. To
avoid notational problems in this case, we have to emphasise that the discriminant
of f is with respect to n and we denote it by D_{f,n}. From B.1.9, we get
D_{f,n} = (−1)^{n(n−1)/2} res_{n,n−1}(f, f')/a_0
for an arbitrary polynomial f.
B.1.11. Let f, g ∈ R[t] be of degree bounded by m and n, respectively. Then
D_{fg,m+n} = D_{f,m} D_{g,n} res_{m,n}(f, g)^2
([49], Ch.IV, §6, No.7, Prop.11, Cor.1).
B.1.12. Let ϕ : R → R' be a homomorphism of commutative rings with 1. For
f(t) := a_0 t^n + a_1 t^{n−1} + · · · + a_n, let ϕ(f) := ϕ(a_0) t^n + ϕ(a_1) t^{n−1} + · · · + ϕ(a_n).
Since forming the resultant is compatible with this operation, we have D_{ϕ(f),n} =
ϕ(D_{f,n}). This property was already used in the specialization of Remark B.1.5.
B.1.13. Let f, g ∈ R[t] with f monic and let A := R[t]/[f] be as in B.1.3. Let
m := deg(f) and deg(g) ≤ n. Then the resultant may be expressed in terms of
the norm by
res_{m,n}(f, g) = N_{A/R}(g)
([49], Ch.IV, §6, No.6, Prop.7). In particular, we get
D_f = (−1)^{m(m−1)/2} N_{A/R}(f').
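The identity between resultant and norm can be tested numerically over Q. In this sketch (our conventions, hypothetical function names) polynomials are lists with the leading coefficient first, the norm of g is computed as the determinant of multiplication by g on R[t]/[f] in the basis 1, t, . . . , t^{m−1}, and the resultant uses the matrix of B.1.7.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def sylvester(a, b):
    """Matrix M of B.1.7; a[0] and b[0] are the leading coefficients."""
    m, n = len(a) - 1, len(b) - 1
    size = m + n
    matrix = [[0] * size for _ in range(size)]
    for i in range(1, size + 1):
        for j in range(1, size + 1):
            c, k = (a, i - j) if j <= n else (b, i - j + n)
            matrix[i - 1][j - 1] = c[k] if 0 <= k < len(c) else 0
    return matrix


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def norm(f, g):
    """N_{A/R}(g) for A = R[t]/[f], with f = [1, a_1, ..., a_m] monic, m >= 1."""
    m = len(f) - 1
    # Companion matrix: multiplication by t in the basis 1, t, ..., t^(m-1).
    c = [[Fraction(0)] * m for _ in range(m)]
    for j in range(m - 1):
        c[j + 1][j] = Fraction(1)
    for i in range(m):
        c[i][m - 1] = -Fraction(f[m - i])
    # Horner scheme: evaluate g at the companion matrix.
    g_matrix = [[Fraction(0)] * m for _ in range(m)]
    for coefficient in g:
        g_matrix = matmul(g_matrix, c)
        for i in range(m):
            g_matrix[i][i] += coefficient
    return det(g_matrix)


def resultant(f, g):
    return det(sylvester(f, g))
```

For f = t^2 − 2 and g = t + 1 both sides equal (1 + √2)(1 − √2) = −1.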

B.1.14. Now we study discriminants in the case of Dedekind domains. The main
reference is the book of J.-P. Serre [276]. A Dedekind domain is an integrally
closed domain of Krull dimension ≤ 1. Let R be a Dedekind domain with field
of fractions K. We consider a finite-dimensional separable field extension L/K.
In most of our applications, R will be the ring of algebraic integers in a number
field. The integral closure of R in L is denoted by R̄_L. Note that R̄_L is not
necessarily a free R-module. The integral closure is only an R-lattice, i.e. a
finitely generated R-module which generates L as a K-vector space (see [276],
Ch.I, §4, Prop.8). Let n := [L : K]. Then the discriminant of R̄_L over R is the
ideal d_{R̄_L/R} in R generated by
{det(Tr_{L/K}(a_i b_j)) | a_1, . . . , a_n, b_1, . . . , b_n ∈ R̄_L}.
Since the K-bilinear form ⟨α, β⟩ := Tr_{L/K}(αβ) on L is non-degenerate (as
L/K is separable, see [49], Ch.V, §10, no.6, Prop.10), the discriminant is not
zero.
If no confusion is possible, we use the notation d_{L/K} := d_{R̄_L/R}. Moreover, if the
discriminant ideal is principal, D_{L/K} denotes a principal generator of d_{L/K}. For
K = Q, the choice can be made unique by using the positive principal generator.
Remark B.1.15. If we suppose additionally that R̄_L is a free R-module with
basis e_1, . . . , e_n, then for every a_1, . . . , a_n, b_1, . . . , b_n ∈ R̄_L, we have
det(Tr_{L/K}(a_i b_j)) = det(A) det(B) det(Tr_{L/K}(e_i e_j)),
where a_i = \sum_k A_{ki} e_k and b_j = \sum_k B_{kj} e_k. Hence the definitions in B.1.2 and
B.1.14 agree. Since L/K is separable, we have exactly n different embeddings
σ_1, . . . , σ_n of L into the algebraic closure K̄ (see [156], Th.4.26). Since the trace
is the sum of the conjugates, we get
Tr_{L/K}(e_i e_j) = \sum_{k=1}^{n} σ_k(e_i e_j)
and hence d_{L/K} is the principal ideal of R generated by det(σ_i(e_j))^2.
Example B.1.16. Let d be a square free integer, |d| ≥ 2. We consider the quadratic
number field L := Q(√d). We first determine the algebraic integers in L. Let
α := r + s√d ∈ L with r, s ∈ Q. Then
α^2 − 2rα + r^2 − s^2 d = 0.
Hence α is an algebraic integer if and only if 2r, r^2 − s^2 d ∈ Z. For α to be an algebraic
integer, it is necessary that r = m/2, s = n/2 for m, n ∈ Z. If m is even, then r^2 −
s^2 d = (m^2 − n^2 d)/4 is an integer if and only if n is even. If m is odd, then r^2 − s^2 d ∈ Z
if and only if n is odd and d ≡ 1 (mod 4). This proves that O_L has Z-basis
1, (1 + √d)/2 if d ≡ 1 (mod 4),
1, √d if d ≡ 2, 3 (mod 4).
In the first case, Remark B.1.15 shows
d_{L/Q} = \left[ \det \begin{pmatrix} 1 & \frac{1+\sqrt{d}}{2} \\ 1 & \frac{1-\sqrt{d}}{2} \end{pmatrix}^2 \right] = [d].
Similarly, for d ≡ 2, 3 (mod 4), we prove
d_{L/Q} = \left[ \det \begin{pmatrix} 1 & \sqrt{d} \\ 1 & -\sqrt{d} \end{pmatrix}^2 \right] = [4d].
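The two cases can be checked with the trace form of B.1.1 instead of the embeddings. In this sketch an element r + s√d is stored as the pair (r, s) of rationals, so that its trace is 2r; the representation and the function names are our own choices, and d is assumed square free without being checked.

```python
from fractions import Fraction


def multiply(x, y, d):
    """(r1 + s1*sqrt(d)) * (r2 + s2*sqrt(d)) as a pair (r, s)."""
    (r1, s1), (r2, s2) = x, y
    return (r1 * r2 + d * s1 * s2, r1 * s2 + s1 * r2)


def trace(x):
    """Trace of r + s*sqrt(d) over Q: the sum of the two conjugates."""
    return 2 * x[0]


def integral_basis(d):
    """Z-basis of the ring of integers of Q(sqrt(d)) from Example B.1.16."""
    one = (Fraction(1), Fraction(0))
    if d % 4 == 1:  # Python's % is non-negative, so this also covers d < 0
        return [one, (Fraction(1, 2), Fraction(1, 2))]
    return [one, (Fraction(0), Fraction(1))]


def quadratic_discriminant(d):
    """det(Tr(e_i e_j)) for the integral basis e_1, e_2."""
    e = integral_basis(d)
    t = [[trace(multiply(e[i], e[j], d)) for j in range(2)] for i in range(2)]
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]


if __name__ == "__main__":
    for d in (-5, -3, 2, 5):
        print(d, quadratic_discriminant(d))
```

The determinant equals d for d ≡ 1 (mod 4) and 4d otherwise, in agreement with the generators [d] and [4d] above.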

B.1.17. Let L/K be still a finite-dimensional separable extension of the quotient
field K of a Dedekind domain R. The codifferent {a ∈ L | Tr_{L/K}(a R̄_L) ⊂ R}
is a fractional ideal of L containing R̄_L. Its inverse is an ideal in R̄_L called the
different of R̄_L over R. It is denoted by D_{R̄_L/R} or simply by D_{L/K}. We have
d_{L/K} = N_{L/K}(D_{L/K}).
For proofs, we refer to [276], Ch.III, §3.

B.1.18. The Kähler differentials form an R̄_L-module Ω^1_{R̄_L/R} := I/I^2, where I is the
kernel of tensor multiplication R̄_L ⊗_R R̄_L → R̄_L (compare with A.7.29). If all residue
extensions of L/K are separable, then Ω^1_{R̄_L/R} is generated by one element as an
R̄_L-module and D_{L/K} is its annihilator (see [276], Ch.III, §7, Prop.14).

Proposition B.1.19. Let K be the fraction field of the Dedekind domain R and
let L/K and M/L be finite-dimensional separable field extensions. Then
D_{M/K} = D_{M/L} · D_{L/K},   d_{M/K} = N_{L/K}(d_{M/L}) · d_{L/K}^{[M:L]}.
Proof: See [276], Ch.III, §4, Prop.8. □

B.1.20. By localizing in maximal ideals, we can always reduce to the free case.
More generally, let S be a multiplicative submonoid of R \ {0}. Then the local-
ization S −1 R is still a Dedekind domain with integral closure S −1 R̄L in L and
we have
DS −1 R̄L /S −1 R = S −1 DR̄L /R , dS −1 R̄L /S −1 R = S −1 dR̄L /R .
In particular, if p is a non-zero maximal ideal of R , we get a discrete valuation
ring Rp and the integral closure of Rp in L is (R̄L )p . Since (R̄L )p is a finitely
generated torsion free module over the principal ideal domain Rp , we conclude
that (R̄L )p is a free Rp -module of rank [L : K] ([156], Th.3.10) and
\[ d_{(\bar R_L)_p/R_p} = (d_{\bar R_L/R})_p. \]
Note that (dR̄L /R )p is a power of pRp , hence
\[ d_{\bar R_L/R} = \prod_{p} \left( d_{(\bar R_L)_p/R_p} \cap R \right) \]

is the prime factorization of the discriminant. Similarly, we may proceed for the
different using prime ideals of R̄L .
B.1.21. The formation of the discriminant and the different is also compatible with
completions. Let p be a non-zero maximal ideal of R and let P be a maximal
ideal of S = R̄L with P|p (i.e. P ∩ R = p ). The completions of the discrete
valuation rings Rp and SP are denoted by R̂p and ŜP , respectively. They are
still discrete valuation rings and ŜP is a free R̂p -module of rank equal to the local
degree. By [276], Ch.III, §4, (iii), we have
\[ D_{\hat S_P/\hat R_p} = \hat S_P\, D_{\bar R_L/R}, \qquad \prod_{P|p} d_{\hat S_P/\hat R_p} = \hat R_p\, d_{\bar R_L/R}. \]

B.2. Unramified field extensions

In this section, L/K is a finite-dimensional field extension and v, w are
non-archimedean places of K and L with w|v .

B.2.1. We call L/K unramified in w if L/K and the extension of residue fields
k(w)/k(v) are separable and if the local degree [Lw : Kv ] is equal to the residue
degree [k(w) : k(v)]. Otherwise, L/K is called ramified in w . Note that by
passing to completions, the residue degree fw/v and the ramification index ew/v
do not change, hence
ew/v fw/v ≤ [Lw : Kv ] (B.3)
by Proposition 1.2.11. Hence ew/v = 1 if L/K is unramified in w . If the
valuation is discrete, then we have equality in (B.3) and hence [Lw : Kv ] =
[k(w) : k(v)] is equivalent to ew/v = 1. We say that L/K is unramified over v
if L/K is unramified in every place of L lying over v .
Lemma B.2.2. Assume that L is generated over K by a root of a monic f ∈
Rv [t], where Rv is the valuation ring of v . If the discriminant Df is a unit in
Rv , then L/K is unramified over v .
Proof: We call the root α . Using B.1.12, we deduce that α and hence L/K are separable.
A standard trick using the ultrametric inequality or Gauss’s lemma (see Lemma 1.6.3)
proves α ∈ Rv . By B.1.12, we have D_{f̄} = \overline{D_f} ≠ 0̄ , where the bar denotes
reduction to the residue field. By Remark B.1.5, the polynomial f̄ is separable. By Gauss’s
lemma and B.1.11, we may assume that f is also irreducible. Let f = f1 · · · fr be the
factorization into irreducible polynomials fj ∈ Kv [t] . By Gauss’s lemma again, we may
assume that the coefficients of these polynomials are contained in the discrete valuation ring
of the completion. By separability of f̄ , the factors f̄j are pairwise coprime. The polynomials
f1 , . . . , fr are in one-to-one correspondence with the places of L over v (see Proposition
1.3.1); we denote the latter by w1 , . . . , wr . By Hensel’s lemma (see Lemma 1.2.10), f̄j is
a power of an irreducible polynomial in k(v)[t] . Using separability of f̄ , we conclude that
f̄j is indeed irreducible over k(v) . Since Kv [t]/[fj (t)] = L_{wj} (see Proposition 1.3.1),
we get
\[ [L_{w_j} : K_v] \ge f_{w_j/v} \ge \deg(\bar f_j) = \deg(f_j) = [L_{w_j} : K_v] \]
and hence equality occurs everywhere. We conclude that ᾱ is a primitive separable element
of k(wj )/k(v) . Hence L/K is unramified in every wj , proving the claim.

Proposition B.2.3. Let E/L be a finite-dimensional field extension and let u be
a place of E with u|w . Then E/K is unramified in u if and only if E/L is
unramified in u and L/K is unramified in w .
Proof: By [49], Ch.V, §7, no.5, Prop.9, the residue field extension k(u)/k(v) is separable
if and only if k(u)/k(w) and k(w)/k(v) are separable. The same transitivity holds for
K ⊂ L ⊂ E . By fu/v = fu/w fw/v and [Eu : Kv ] = [Eu : Lw ][Lw : Kv ] , we get the
claim. 

Proposition B.2.4. Let L′/K be any field extension. We suppose that L, L′ are
contained in a field E to define the composite LL′ inside E . Let u be a place of
LL′ with u|w . If L/K is unramified in w , then LL′/L′ is unramified in u .

Proof: Since LL′/L′ is generated by separable elements (from L ), the extension is separable.
Note that the completion (LL′)u is the composite of Lw and L′ . This follows since
Lw is the closure of L in (LL′)u and Lw L′ is a finite-dimensional field extension of the
complete field Lw , hence Lw L′ is also complete (Proposition 1.2.7). For the restriction
w′ of u to L′ , we get
\[ (LL')_u = L_w L' = L_w L'_{w'}. \]
Because of the invariance of the residue fields under completions, we may assume that L
and L′ are complete with respect to w and w′ , respectively. Since k(w)/k(v) is separable,
the theorem of the primitive element (see [156] §4.14) gives k(w) = k(v)(ᾱ) for a suitable
α in the valuation ring Rw of w . As a consequence of unramifiedness and completeness,
we have [L : K] = [k(w) : k(v)] , hence α is a primitive element of L/K . Let f be the
minimal polynomial of α over K , then our above considerations also show that we may
assume f to be a monic polynomial with coefficients in Rv and that the reduction f̄ is
the minimal polynomial of ᾱ over k(v) . Obviously, α is a primitive element of LL′/L′ .
By Remark B.1.5, the discriminant D_{f̄} of the separable polynomial f̄ is not zero. Using
D_{f̄} = \overline{D_f} (see B.1.12), we conclude that Df is a unit in Rv . Finally, Lemma
B.2.2 proves that LL′/L′ is unramified in u .

From Propositions B.2.3 and B.2.4, we deduce:

Corollary B.2.5. Let L′/K be a finite-dimensional field extension such that L
and L′ are contained in a field E . Let w′ be a place of L′ with w′|v and let u
be a place of the composite LL′ inside E lying over w and w′ . Then LL′/K is
unramified in u if and only if L/K is unramified in w and L′/K is unramified
in w′ .

In the next lemma, we consider v as a valuation of K , i.e. we fix an absolute
value | |v in the non-archimedean place v and we set v := − log | |v (by abuse
of notation). The value group of v is defined by v(K × ); it is an additive subgroup
of R .
Lemma B.2.6. Let m ∈ N \ {0} and let α^{1/m} be an m-th root of α ∈ K × . If
K(α^{1/m})/K is unramified in a valuation w of L with w|K = v , then m divides
v(α) in the value group of v . Conversely, if the characteristic of the residue field
k(v) is prime to m and if m|v(α), then K(α^{1/m})/K is unramified over v .
Proof: Suppose that K(α^{1/m})/K is unramified in w . Then ew/v = 1 (see B.2.1) and
hence
\[ v(\alpha) = m\, w(\alpha^{1/m}) \in m\, v(K^{\times}). \]
Conversely, we assume that m and char(k(v)) are coprime and that m|v(α) . The latter
implies that we may assume v(α) = 0 . Obviously, α^{1/m} is a root of f (t) := t^m − α .
Using B.1.9, we easily see that
\[ D_f = (-1)^{m(m-1)/2}\, m^m\, \alpha^{m-1}. \]
By assumption, this is a unit in the valuation ring Rv . Then Lemma B.2.2 proves the claim.
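The unit condition in the converse direction can be illustrated for K = Q with a p-adic valuation. The sketch below only computes the valuation of m^m α^{m−1}, so the sign of the discriminant plays no role; the helper names `padic_valuation`, `kummer_disc_valuation` and `lemma_b22_applies` are hypothetical, and α is assumed to be a non-zero integer.

```python
def padic_valuation(n, p):
    """Exponent of the prime p in the non-zero integer n."""
    if n == 0:
        raise ValueError("valuation of 0 is infinite")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def kummer_disc_valuation(m, alpha, p):
    """p-adic valuation of m^m * alpha^(m-1), the discriminant of t^m - alpha up to sign."""
    return m * padic_valuation(m, p) + (m - 1) * padic_valuation(alpha, p)

def lemma_b22_applies(m, alpha, p):
    """True if the discriminant of t^m - alpha is a p-adic unit."""
    return kummer_disc_valuation(m, alpha, p) == 0
```

For example, the discriminant of t^3 − 2 is a unit at p = 5 but not at p = 2 or p = 3, in line with the hypotheses that p does not divide m and that v(α) = 0.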


Example B.2.7. Let K := Q2 (t) be the quotient field of the polynomial ring over the field
Q2 of 2 -adic numbers. On K , we have the discrete valuation v induced by
\[ \Bigl| \sum_i a_i t^i \Bigr|_v := \max_i |a_i|_2 , \]
which we considered in Definition 1.6.1. Clearly, the same construction gives a discrete
valuation w on K(√t) extending v . Obviously, we have ew/v = 1 and the residue
field k(w) = F2 (√t) is a non-separable extension of k(v) = F2 (t) of degree 2 . Since
fw/v = 2 = [K(√t) : K] , we conclude that w is the unique valuation over v . Although
2|v(t) = 0 , the extension K(√t)/K is ramified over v .

Definition B.2.8. Let E/K be an algebraic field extension which is not necessar-
ily finite-dimensional and let v be a non-archimedean place of K . By Corollary
B.2.5, the union of all intermediate fields L of E/K with [L : K] < ∞ and
with L/K unramified over v is still a subfield of E denoted by Kvnr . It fol-
lows from Corollary B.2.5 that a finite-dimensional subextension L/K of E/K
is unramified over v if and only if L is contained in Kvnr .
B.2.9. Let R be a Dedekind domain with field of fractions K ≠ R and let L/K
be a finite-dimensional separable field extension. Then the integral closure R̄L of
R in L is also a Dedekind domain (see B.1.14). By Theorem A.8.5, the localiza-
tion Rp in a maximal ideal p of R is a discrete valuation ring, inducing a place vp
of K . We say that L/K is unramified in a maximal ideal P of R̄L if the exten-
sion is unramified in the corresponding place vP of L. A fundamental property
of a Dedekind domain is that every non-zero ideal has a unique factorization into
maximal ideals (see [157], Th.10.1). Let p := P ∩ R with factorization
\[ p\bar R_L = P_1^{e_1} \cdots P_r^{e_r} \]
into different maximal ideals P1 , . . . , Pr of R̄L . Note that ej is the ramifica-
tion index of vPj over vp and that every place of L over vp has this form. We
conclude that L/K is unramified in P if and only if P occurs in the prime fac-
torization with exact power 1 and if R̄L /P is a separable field extension of R/p .
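For a quadratic field L = Q(√d) the factorization of pR̄L can be read off from classical congruence conditions, which gives a quick way to experiment with the criterion above. The following sketch relies on the standard splitting rule for quadratic fields (not stated in the text) and assumes d squarefree, d ≠ 0, 1, and p prime; the names `legendre` and `decompose` are illustrative.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, computed by Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def decompose(d, p):
    """List of pairs (e, f) for the maximal ideals over p in Q(sqrt(d))."""
    if p == 2:
        if d % 4 != 1:
            return [(2, 1)]                       # 2 ramifies
        return [(1, 1), (1, 1)] if d % 8 == 1 else [(1, 2)]
    s = legendre(d, p)
    if s == 0:
        return [(2, 1)]                           # p divides d: ramified
    return [(1, 1), (1, 1)] if s == 1 else [(1, 2)]
```

In every case the pairs satisfy the sum of e·f equal to 2 = [L : Q], and the extension is unramified in a maximal ideal exactly when the corresponding e equals 1 (the residue fields are finite, hence the residue extensions are automatically separable).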
B.2.10. Under the assumptions of B.2.9, let P be a non-zero prime ideal of R̄L
and let p := P ∩ R . We assume that R̄L /P is a separable field extension of R/p .
Then vP is said to be wildly ramified over vp if and only if p|evP /vp . Otherwise,
vP is called tamely ramified over vp . Note that unramified is considered as a
special case of tame ramification.

By abuse of notation, we identify vp with the discrete valuation on K normalized
by vp (K × ) = Z (see A.8.4). Similarly vP (L× ) = Z , leading to vP = e_{vP /vp} vp
on K . Important in this context is Dedekind’s different theorem:
Theorem B.2.11. Under the assumptions of B.2.10, we have
\[ e_{v_P/v_p} - 1 \le v_P(D_{L/K}) < e_{v_P/v_p} + v_P\bigl(e_{v_P/v_p}\bigr) \]
with equality on the left if and only if vP is tamely ramified over vp .

For a proof, we refer to [162], Th.3.12.9 and Th.4.6.7. As an immediate consequence
of B.1.17 and of NL/K (P) = p^{f_{vP /vp}} (see [162] §3.10), we obtain
Dedekind’s discriminant theorem:
Theorem B.2.12. Under the assumptions of B.2.9, let p be a non-zero prime ideal
of R such that the residue extension of every prime ideal of R̄L over p is separable.
Then we have
\[ v_p\bigl(d_{L/K}\bigr) = \sum_{P|p} \bigl( e_{v_P/v_p}(1 + \delta_P) - 1 \bigr) f_{v_P/v_p}, \]
where δP ∈ {1, . . . , vp (e_{vP /vp})} if vP is wildly ramified over vp and δP = 0 if
vP is tamely ramified over vp .
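In the tame case all δP vanish and the formula reduces to the sum of (e − 1)f over the primes above p. A minimal sketch of this special case follows; the function name `tame_discriminant_exponent` and the input format (a list of pairs (e, f), one per prime P over p) are assumptions made for illustration, and the wild case is deliberately not covered.

```python
def tame_discriminant_exponent(primes_above):
    """Exponent v_p(d_{L/K}) when every prime over p is tamely ramified.

    primes_above: iterable of pairs (e, f) of ramification index and residue degree.
    """
    total = 0
    for e, f in primes_above:
        if e < 1 or f < 1:
            raise ValueError("e and f must be positive integers")
        total += (e - 1) * f
    return total
```

For example, an odd prime ramified in a quadratic field has a single prime above it with (e, f) = (2, 1), giving exponent 1, and the prime 7 in the cyclotomic field Q(ζ7) has (e, f) = (6, 1), giving exponent 5, which matches the classical value 7^5 for the absolute value of that discriminant.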
B.2.13. As a corollary of B.2.11 (resp. B.2.12), we obtain that L/K is unramified
in vP (resp. over vp ) if and only if P (resp. p ) is not a divisor of the different
DL/K (resp. of the discriminant dL/K ). This holds even without assuming a
priori that the residue field R̄L /P is separable over R/p (see [276], Ch.III, §5,
Th.1).

For many results in diophantine geometry, Hermite’s discriminant theorem is
crucial:
Theorem B.2.14. For any D > 0, there are only finitely many number fields
K ⊂ Q̄ with discriminant DK/Q bounded in absolute value by D .
Proof: We give the proof under the assumption that the degree of the number field is also
bounded. In all our applications of Hermite’s discriminant theorem, this is a priori clear.
In fact, it follows from Minkowski’s discriminant theorem (see Theorem B.2.16) that the
degree is bounded in terms of the discriminant.

By Corollary C.2.7, OK induces a lattice in K∞ := ∏_{v|∞} Kv . We assume first that K
has a place w with Kw = R . For a given constant C > 0 , we consider the open convex
subset
\[ S_\infty := \{\alpha \in K_\infty \mid |\alpha_w|_w < |C|_w,\ |\alpha_v|_v < |\tfrac12|_v \ \ \forall v|\infty,\ v \ne w\}. \]
By the proof of Proposition C.1.10, the volume of a fundamental domain of the lattice with
respect to the normalized Haar measure on K∞ introduced in C.1.9 is equal to |DK/Q |1/2 .
Choosing C sufficiently large depending on DK/Q , Minkowski’s first theorem from C.2.19
gives a non-zero α ∈ OK with corresponding lattice point in S∞ . By the product formula,
we have |α|w ≥ 1 and hence one (real) conjugate of α is different from all other conjugates
(with respect to the embeddings of K into C ). A given embedding Q(α) → C has exactly
[K : Q(α)] extensions to K , hence we get K = Q(α) . Since h(α) is bounded in terms
of C , Northcott’s theorem (see Theorem 1.6.8) shows that only finitely many α may occur
proving the claim.

If K has only complex archimedean places, we fix a place w and we consider the open
convex subset
\[ S_\infty := \{\alpha \in K_\infty \mid |\Re(\alpha_w)|_w < |\tfrac12|_w,\ |\Im(\alpha_w)|_w < |C|_w,\ |\alpha_v|_v < |\tfrac12|_v \ \ \forall v|\infty,\ v \ne w\}. \]
Then the same argument as above yields the claim.
Corollary B.2.15. Let K be a number field and let S be a finite set of places
of K containing all archimedean ones. Then there are only finitely many number
fields L in K̄ of bounded degree which are unramified outside of S .
Proof: The transitivity rule of discriminants (see B.1.19) implies
\[ d_{L/\mathbb{Q}} = N_{K/\mathbb{Q}}(d_{L/K}) \cdot d_{K/\mathbb{Q}}^{[L:K]}. \]
By Dedekind’s discriminant theorem (see Theorem B.2.12), the norm is bounded. We
conclude that DL/Q is also bounded and hence the above theorem proves the claim.
For completeness, we state Minkowski’s discriminant theorem:
Theorem B.2.16. Let K be a number field of degree d with exactly s complex
places. Then
\[ |D_{K/\mathbb{Q}}| \ge \Bigl(\frac{\pi}{4}\Bigr)^{2s} \cdot \frac{d^{2d}}{(d!)^2}. \]
The proof is another application of Minkowski’s first theorem. For details, we refer to [162],
Th.2.13.5.
Remark B.2.17. Note that the function
\[ f(d) := \Bigl(\frac{\pi}{4}\Bigr)^{d} \frac{d^d}{d!} \]
satisfies
\[ \frac{f(d+1)}{f(d)} = \frac{\pi}{4}\Bigl(1 + \frac1d\Bigr)^{d} \ge \frac{\pi}{2} > 1, \]
hence Minkowski’s discriminant theorem shows that |DK/Q | > 1 for K ≠ Q and that d
is bounded in terms of the discriminant.
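The growth claimed in the remark is easy to check numerically. The sketch below evaluates the lower bound of Theorem B.2.16 and the auxiliary function f in floating point; the names `minkowski_lower_bound`, `f` and `ratio` are illustrative, and the comparison |DK/Q| ≥ f(d)^2 used in the comments rests on the elementary inequality 2s ≤ d ≤ 2d.

```python
import math

def minkowski_lower_bound(d, s):
    """Right-hand side of Theorem B.2.16 for degree d and s complex places."""
    return (math.pi / 4) ** (2 * s) * d ** (2 * d) / math.factorial(d) ** 2

def f(d):
    """Auxiliary function of Remark B.2.17."""
    return (math.pi / 4) ** d * d ** d / math.factorial(d)

def ratio(d):
    """Closed form of f(d + 1) / f(d)."""
    return (math.pi / 4) * (1 + 1 / d) ** d

# Since 2s <= d <= 2d and pi/4 < 1, the bound is at least f(d)**2,
# and f(2) = 2 * (pi/4)**2 already exceeds 1.
```

For a real quadratic field (d = 2, s = 0) the bound is 4, so no quadratic field has discriminant of absolute value smaller than 3 after rounding up in the imaginary case, where the bound is roughly 2.47.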
B.2.18. Let L/K be a finite-dimensional Galois extension and let w be a place
of L lying over the complete discrete valuation v of K . We assume also that the
residue field extension k(w)/k(v) is separable. We will find a maximal unramified
subextension L^I /K such that L/L^I is totally ramified, meaning that the
ramification index is equal to the degree.
By Proposition 1.2.7 and Corollary 1.3.5, the Galois group D = Gal(L/K) operates
isometrically on L with respect to | |w , hence reduction modulo w gives a
homomorphism ε : D → Gal(k(w)/k(v)). The kernel of ε is called the inertia
group and is denoted by I . From Hensel’s lemma in 1.2.10, Gauss’s lemma in
1.6.3 and the primitive element theorem ([156], §4.14), we deduce:
(a) The extension k(w)/k(v) is Galois and ε is surjective.
For the fixed field L^I of I , Galois theory and (a) yield
\[ \mathrm{Gal}(L^I/K) \xrightarrow{\ \sim\ } \mathrm{Gal}(k(w)/k(v)) \]
and hence the theory of finite fields (see [156], §4.13) proves:

(b) Gal(L^I /K) is a cyclic group of order equal to the residue degree fw/v .

Let wI be the restriction of w to L^I . Applying (a) with L^I instead of K and
using I = Gal(L/L^I ), we get:

(c) k(wI ) = k(w) and hence f_{wI /v} = fw/v , f_{w/wI} = 1.

By Galois theory and (b), we have [D : I] = [L^I : K] = fw/v . By Proposition
1.2.11 and (c), we deduce:

(d) ew/v = [L : L^I ] = |I| = e_{w/wI} and e_{wI /v} = 1.

Since the compositum of unramified extensions remains unramified (Corollary
B.2.5) and since the ramification index is bounded by the degree (see Proposition
1.2.11), it is clear from (d) that L^I /K is the maximal unramified subextension of
L/K . If the residue characteristic is not a divisor of ew/v , then w is called tamely
ramified over v (see B.2.10). The following result may be deduced from (d) and
[276], Ch.IV, §2, Cor.1-3.

(e) If w is tamely ramified over v , then I is a cyclic group.

B.2.19. To apply B.2.18 in the situation of a Galois extension L/K of number
fields with non-archimedean places w|v , we make the following observations: By
Corollary 1.3.5 the Galois group G := Gal(L/K) operates transitively on the set
of places lying over v . We call Dw := {σ ∈ G | σw = w} the decomposition
group of w over v . It is immediate that decomposition groups of places over v
are conjugate subgroups in G .
Since the elements of Dw are isometric with respect to | |w and since L/K is
generated by the zeros of a polynomial with coefficients in K , we deduce easily
that Dw is isomorphic to the Galois group of the completions Lw /Kv , which is
clearly a Galois extension. Using this natural isomorphism for identification, the
inertia subgroup of Lw /Kv from B.2.18 may be seen as a subgroup of G , which
we call the inertia group of w over v .
Let H be any subgroup of Dw with fixed field L^H . Let wH be the restriction
of w to L^H . To apply the results of B.2.18, it is useful to note that forming the
completion is compatible with forming fixed fields, i.e. (L^H )_{wH} = (Lw )^H .
It is clear that (L^H )_{wH} ⊂ (Lw )^H . On the other hand, α ∈ (Lw )^H may be approximated
by a sequence αn ∈ L . By continuity of the elements in Dw , we may replace αn by
\[ \frac{1}{|H|} \sum_{\sigma \in H} \sigma(\alpha_n) \in L^H , \]
proving the claim.

B.3. Unramified morphisms

In this section, we assume that the reader is familiar with the language of schemes. It will be
of minor importance in the book and serves mainly to connect the results about unramified
morphisms from A.12 with Appendix B. We consider a morphism ϕ : X → X′ of finite
type between noetherian schemes.
B.3.1. The morphism ϕ is called unramified in x ∈ X if the following conditions are
satisfied:

(a) The maximal ideal mx of OX,x is generated by ϕ♯ (mϕ(x) ) .

(b) ϕ♯ induces a separable extension k(x)/k(ϕ(x)) of residue fields.

If ϕ is unramified in all points of X , then ϕ is called an unramified morphism.


B.3.2. In the same way, we translate the definition of a flat morphism (see A.12.11) to
the case of schemes. Moreover, ϕ is called étale if ϕ is flat and unramified. The set of
unramified (resp. flat, resp. étale) points of X with respect to ϕ is an open subset of
X (see [137], Th.17.4.1, Th.11.3.1, Th.17.6.1). This makes it clear that a morphism of
varieties over a field is unramified (resp. flat, resp. étale) if and only if the corresponding
property holds for the associated morphism of schemes.
Remark B.3.3. In the situation of B.2.9, L/K is unramified in the maximal ideal P (i.e.
in vP ) if and only if the morphism Spec(R̄L ) → Spec(R) is unramified in P . In fact,
the assumption L/K separable was only used to ensure that R̄L is a finitely generated R -
module. This is also satisfied if R is a complete discrete valuation ring (see [276], Ch.II,
§2, Prop.3) or the coordinate ring of an affine irreducible regular curve (see A.12.6).
Example B.3.4. Let X′ = Spec(A) be an affine (noetherian) scheme and let X =
Spec(B) , where B = A[t]/[f (t)] for a monic polynomial f (t) ∈ A[t] . There is a unique
morphism ϕ : X → X′ with ϕ♯ equal to the canonical homomorphism A → B . Then
Example A.12.15 still holds in this context leading to the same definition of standard étale
morphisms as in A.12.16.
Every étale morphism is again locally of standard étale type and every unramified morphism
is locally the composition of a closed embedding with a standard étale morphism.
B.3.5. Let X, X′ be schemes of finite type over the discrete valuation ring Rv and let
ϕ : X → X′ be a morphism over Rv . Let Q ∈ X′(K) and let P ∈ X(K̄) with
ϕ(P ) = Q . We consider a place w of K(P ) and we assume that P extends to an
Rw -integral point P̄ of X . This means an Rv -morphism P̄ : Spec(Rw ) → X with
P = P̄ ({0}) . Using Rw ∩ K = Rv , we conclude that Q extends to an Rv -valued point
Q̄ of X′ . We denote the image of the maximal ideal of Rw (resp. Rv ) by P̄ (w) (resp.
Q̄(v) ).
Proposition B.3.6. If ϕ is unramified in P̄ (w) , then K(P )/K is unramified in w .
Proof: Since the set of unramified points of X with respect to ϕ is open (see B.3.2), we
conclude that P is also an unramified point and hence K(P )/K is separable. Using the
local nature of unramified morphisms, we may assume that ϕ is a standard étale morphism
(see Example B.3.4).
Now we proceed as in the proof of Proposition 10.3.3 using the same notation. Since Q̄
is an Rv -integral point of X′ , we have ai (Q) ∈ Rv and hence we may choose a = 1 .
Similarly, f′(P ) is a unit in Rw . So we may assume α = 1 . This proves that the
discriminant of the completions K(P )w /Kv is equal to the valuation ring R̂v of Kv . Using
base change to R̂v , the argument at the beginning of the proof shows that the extension
K(P )w = Kv (P )/Kv is separable. Now B.2.13 implies that K(P )w /Kv is unramified
over v . Since residue fields do not change by passing to completions (see Proposition
1.2.11), it follows that K(P )/K is also unramified.

B.4. The ramification divisor

In this section, k denotes a field of characteristic 0. Let B, Y be irreducible
k -varieties regular in codimension 1 and let p : Y → B be a finite morphism over
k . The goal is to study the divisor measuring the ramification of p . The reader is
assumed to have some familiarity with function fields (see Section 1.4).
B.4.1. We have seen in A.8.11 that OY,w is a discrete valuation ring for every
prime divisor w of Y . In particular, we may consider w as a place of k(Y ) and
v = p(w) as a place of k(B) with w|v (see Example 1.4.13).
B.4.2. The canonical line bundle KY is well defined on the smooth part Yreg of
Y by KY := ∧^{dim(Y )} T_Y^* . By passing to closures of associated Weil divisors
and using that Y \ Yreg has codimension at least 2, we get a canonical divisor
div(KY ) well defined in CH^1 (Y ).
Proposition B.4.3. On V := Yreg ∩ p^{−1} (Breg ), the pull-back induces an injective
homomorphism ν : p^* Ω^1_B |V → Ω^1_Y |V of locally free sheaves of rank dim(B).
Proof: By A.7.23, p^* Ω^1_B |V and Ω^1_Y |V are locally free of rank dim(Y ) . Anyway, we have
an exact sequence
\[ p^*\Omega^1_B|_V \xrightarrow{\ \nu\ } \Omega^1_Y|_V \longrightarrow \Omega^1_{Y/B}|_V \longrightarrow 0 \tag{B.4} \]
of coherent sheaves (see A.7.29). Let ξ be the generic point of Y . Then Ω^1_{Y/B,ξ} =
Ω^1_{k(Y )/k(B)} = 0 by separability of k(Y )/k(B) ([148], Th.8.6A). Hence Ω^1_{Y/B} is a
torsion sheaf, i.e. it is supported in a closed subset R ≠ Y . The image of ν is a locally free
subsheaf of Ω^1_Y |V of rank r′ . The sheaves are equal outside of R , hence r′ = dim(Y ),
proving immediately the claim.

B.4.4. We have seen in the proof above that Ω^1_{Y/B} is a torsion sheaf, hence the
stalk Ω^1_{Y/B,v} is of finite length for a prime divisor v of Y (following also from
the proof below). So we may define the ramification divisor of p by
\[ R_p := \sum_{v} \ell\bigl(\Omega^1_{Y/B,v}\bigr)\, v, \]
where ℓ is the length. It is supported in the set R considered above.

Theorem B.4.5. We have
\[ \operatorname{div}(K_Y) \sim p^* \operatorname{div}(K_B) + R_p . \]
Proof: Let v be a prime divisor on Y . Since B, Y are regular in codimension 1 and p
is finite, the dimension theorem (see A.12.1) shows that the complement of the set V from
Proposition B.4.3 has codimension at least 2 in Y . Considering the stalks of (B.4) in v ,
we get an exact sequence
\[ 0 \longrightarrow \bigl(p^*\Omega^1_B\bigr)_v \xrightarrow{\ \nu_v\ } \Omega^1_{Y,v} \longrightarrow \Omega^1_{Y/B,v} \longrightarrow 0 \]
of modules over the discrete valuation ring OY,v . By the theorem of elementary divisors
([49], Ch.VII, §4, no.3, Th.1), we deduce easily
\[ \ell\bigl(\Omega^1_{Y/B,v}\bigr) = \operatorname{ord}_v(\det(\nu_v)). \tag{B.5} \]
Let W ⊂ V be a trivialization of p^* Ω^1_B and Ω^1_Y . With respect to the trivialization,
γW := det(ν) is a regular function on W . Clearly, (γW ) is a well-defined Cartier divisor
on V . By (B.5), its associated Weil divisor is the ramification divisor Rp . On the other
hand, we may consider det(ν) as an injective homomorphism
\[ \det(\nu) : p^* K_B|_V \longrightarrow K_Y|_V \]
induced by pull-back. For an invertible meromorphic section s of K_{Breg} , we get an
invertible meromorphic section s′ := det(ν) ◦ p^* (s) of KY |V . Working locally with respect
to the trivializations, it is easy to check that
\[ \operatorname{div}(s') = p^*(\operatorname{div}(s)) + R_p|_V \]
on V . This proves the claim.


As a corollary, we immediately obtain Hurwitz’s theorem.
Theorem B.4.6. Let ϕ : C′ → C be a surjective morphism of irreducible smooth
projective curves over k of genus g(C′) and g(C). Then we have
\[ 2g(C') - 2 = \deg(\varphi)\,(2g(C) - 2) + \deg(R_\varphi). \]
Proof: By A.6.15 and A.12.4, the morphism is finite. For any divisor D on C , we have
seen in Example 1.4.12 that deg(ϕ^* D) = deg(ϕ) deg(D) . We apply Theorem B.4.5 and
A.13.6 to get the claim.
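Hurwitz’s formula is often used to solve for the genus of the covering curve. The following sketch does this arithmetic; the function name `genus_from_hurwitz` and its argument names are illustrative, and the inputs are assumed to come from an actual covering in characteristic 0.

```python
def genus_from_hurwitz(deg, g_base, deg_ram):
    """Solve 2g' - 2 = deg * (2 * g_base - 2) + deg_ram for g'."""
    total = deg * (2 * g_base - 2) + deg_ram  # this is 2g' - 2
    if total % 2 != 0 or total < -2:
        raise ValueError("no curve can have these invariants")
    return total // 2 + 1
```

For instance, a double cover of the projective line ramified in six points (each contributing 1 to the degree of the ramification divisor) has genus 2, and an unramified double cover of a curve of genus 1 again has genus 1.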
For an effective divisor D , the sum of its components is denoted by Dred . The
divisor is called reduced if D = Dred .
Proposition B.4.7. Let φ : C′ → C be a surjective morphism of irreducible
smooth projective curves over k and let D be an effective reduced divisor on C .
Then we have the inequality
\[ \varphi^*(D) - \varphi^*(D)_{\mathrm{red}} \le R_\varphi \]
of divisors on C′ with equality if and only if φ is unramified outside of φ^{−1} (D).
Proof: We check the identity over an irreducible component v of supp(D) . Since D
is reduced, it is given by a local parameter πv in v . Let π_{v′} be a local parameter in
v′ ∈ φ^{−1} (v) . Then we have
\[ \pi_v = \pi_{v'}^{e}\, u' \in O_{C',v'}, \qquad u' \in O_{C',v'}^{\times}, \]
where e is the multiplicity of φ^* (D) in v′ . We claim that Ω^1_{C,v} is generated by dπv and
similarly Ω^1_{C′,v′} is generated by dπ_{v′} . If v is k -rational, we argue as follows. By A.7.8,
we may identify the fibre of Ω^1_C over v with mv /m^2_v . As πv generates the latter and
corresponds to dπv in the former, we conclude that dπv generates also the stalk Ω^1_{C,v} . In
general, we may use base change to k̄ . Using that v is geometrically reduced (see A.4.6),
πv is a local parameter in all points of C_{k̄} lying in v . The special case considered above
and the compatibility of Ω^1_C with base change (use A.7.6) show that dπv is a basis of
Ω^1_{C_{k̄},x} for all x ∈ v . The stalk Ω^1_{C,v} is free of rank 1 over the discrete valuation ring
OC,v , proving that dπv is a basis of Ω^1_{C,v} .
Returning to the proof of our original claim, Leibniz’s rule gives
\[ d\pi_v = e\,\pi_{v'}^{e-1} u'\, d\pi_{v'} + \pi_{v'}^{e}\, du' . \]
By the short exact sequence (B.4) on page 599 and using char(k) = 0 , we conclude that
Ω^1_{C′/C,v′} has length e − 1 , proving immediately the claim.

Example B.4.8. If k = C in Proposition B.4.7, then we may consider the corresponding
analytic map φan : C′an → Can of compact Riemann surfaces. Note
that the local parameters π_{v′} in v′ ∈ C′ and πv in v := φ(v′) ∈ C correspond to
analytic coordinates. Hence the above proof shows that Rφ has multiplicity e − 1
in v′ where e is the order of zero of the holomorphic function πv ◦ φ ◦ π_{v′}^{−1} (z)
in z = 0.
Proposition B.4.9. Let w be a prime divisor of Y with ramification index ew/v
over v := p(w) (see Example 1.4.13). Then the multiplicity of the ramification
divisor Rp in w is equal to ew/v − 1.
Proof: We have Ω^1_{Y/B,w} = Ω^1_{OY,w /OB,v} . By B.1.18, this module over the discrete
valuation ring OY,w is generated by one element and its annihilator is the different of
OY,w /OB,v . We conclude that ℓ(Ω^1_{Y/B,w} ) is the order of the different in the place w . By
Dedekind’s different theorem (see B.2.11), we get the claim.
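As a small consistency check of Proposition B.4.9 against Hurwitz’s theorem, consider the n-th power map z ↦ z^n on the projective line over an algebraically closed field of characteristic 0, which is ramified only at 0 and ∞ with ramification index n. The sketch below encodes divisors as dictionaries; the function names and the point labels "0" and "infinity" are illustrative, and all points are assumed to have degree 1.

```python
def ramification_divisor(indices):
    """Map each prime divisor w to its multiplicity e_{w/v} - 1 in R_p."""
    return {w: e - 1 for w, e in indices.items()}

def degree(divisor):
    """Degree of a divisor all of whose points have degree 1."""
    return sum(divisor.values())

def power_map_indices(n):
    """Ramification indices of z -> z^n on the projective line (ramified points only)."""
    return {"0": n, "infinity": n}
```

The ramification divisor then has degree 2(n − 1), and Hurwitz’s formula reads −2 = n · (−2) + 2(n − 1), as expected for a self-map of a curve of genus 0.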
APPENDIX C GEOMETRY OF NUMBERS

C.1. Adeles

We first recall the existence and uniqueness of a Haar measure on a locally compact
group. In the special case of a completion of a number field K of degree d , we
prove this by an explicit construction. Afterwards we introduce adeles, which are
an important tool in number theory, following A. Weil [328]. Section C.2, with
McFeat’s version [198] of Minkowski’s second theorem in an adelic setting, is
modeled after the exposition of Bombieri and Vaaler [35]. Section C.3 presents
J.D. Vaaler’s cube slicing inequality [302] using also techniques from A. Prékopa
[236].
Theorem C.1.1. Let G be a locally compact group. There is a non-zero positive
left-invariant Borel measure µG on G , i.e.
\[ \int_G f(yx)\, d\mu_G(x) = \int_G f(x)\, d\mu_G(x) \]
for all y ∈ G and all continuous complex functions f with compact support. The
measure µG is uniquely determined up to positive multiples and is called a Haar
measure on G .
We refer to N. Bourbaki [46], Ch.7, §1, no.2, Th.1 for a proof. Now let H be a
normal closed subgroup and let π : G −→ G/H be the quotient morphism. By
definition, a subset U of G/H is open if and only if π −1 (U ) is open in G . Note
that π is an open map and hence G/H is a locally compact group.
Corollary C.1.2. Given Haar measures µG , µH on G and H , there is a unique
Haar measure µG/H on G/H such that
\[ \int_G f(x)\, d\mu_G(x) = \int_{G/H} d\mu_{G/H}(\pi(x)) \int_H f(xy)\, d\mu_H(y) \]
for all continuous complex functions f with compact support. Moreover, this
formula continues to hold for any f ∈ L^1 (G, µG ).
Proof: We prove easily that
\[ G/H \longrightarrow \mathbb{C}, \qquad Hx \mapsto \int_H f(xy)\, d\mu_H(y) \]
is a continuous function with compact support. Thus
\[ \int_{G/H} d\mu_{G/H}(\pi(x)) \int_H f(xy)\, d\mu_H(y) \]
is a positive linear functional on the space of continuous complex functions with compact
support. By the Riesz representation theorem ([249], Th.2.14), there is a unique Borel
measure µ on G such that
\[ \int_G f(x)\, d\mu(x) = \int_{G/H} d\mu_{G/H}(\pi(x)) \int_H f(xy)\, d\mu_H(y) \]
for every continuous complex function f on G with compact support. Clearly, µ is left
invariant and hence it is equal to µG up to a positive multiple (Theorem C.1.1). By
normalization, we get the first claim. In order not to go into too many details of measure
theory, we only refer for the last claim to [46], Ch.7, §2, no.3, Prop.5. The argument proceeds
by showing that, for µG/H -almost every π(x) ∈ G/H , the function y ↦ f (xy) is
µH -integrable on H , then that the function x ↦ ∫_H f (xy) dµH (y) is µG/H -integrable on
G/H , and finally that the desired formula holds.

Example C.1.3. If v ∈ MK , the completion Kv is locally compact (see 1.2.12).
We will construct a Haar measure µv on Kv with the property
\[ \mu_v(\alpha\Omega) = \|\alpha\|_v\, \mu_v(\Omega) \tag{C.1} \]
for any α ∈ Kv and any Borel measurable subset Ω of Kv . For the normalization
of the absolute value, we refer to 1.3.6.
If v is archimedean, then Kv = R or C , the Lebesgue measure is a Haar measure
satisfying (C.1), and there is nothing else to prove. Hence let v be a non-archimedean
place with valuation ring Rv in Kv , residue field k(v), and local
parameter π (i.e. π is a generator of the maximal ideal in Rv ). We denote by
ev , fv the ramification index and the residue degree of v over p := char(k(v)).
We consider the closed balls
\[ B_\varepsilon(x) := \{ y \in K_v \mid \|y - x\|_v \le \varepsilon \}, \]
where the “centres” x range over Kv . Note that we need only consider balls of
radius δ^n (n ∈ Z), where δ := ‖π‖v . By the ultrametric triangle inequality,
two balls are either disjoint or one is contained in the other. Every open subset of
Kv is a countable disjoint union of such closed balls. We claim that B1 (0) is the
disjoint union of p^{fv} balls Bδ (x). This follows from B1 (0) = Rv by noting that
the balls Bδ (x) are the fibres with respect to reduction. Thus
\[ \mu_v(B_{\delta^n}(x)) := p^{-n f_v} \]
is a σ -additive and translation invariant set function on these balls. Obviously, µv
extends uniquely to a function on the compact open subsets of Kv with the same
properties.
By standard arguments of measure theory, µv extends uniquely to a translation
invariant Borel measure. By Proposition 1.2.11, we have
\[ \delta = \|\pi\|_v = \|p\|_v^{1/e_v} = p^{-f_v}. \]
This proves (C.1) first for Ω = B_{δ^n} (0). By translation invariance, we get (C.1)
for any closed ball and by uniqueness of the extension we get it for all Borel
measurable subsets Ω of Kv .
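The bookkeeping behind this construction can be verified with exact rational arithmetic. The sketch below models only the values of µv on balls; the function name `ball_measure` and the parameters p (residue characteristic), f (residue degree fv) and n (exponent of the radius δ^n) are illustrative.

```python
from fractions import Fraction

def ball_measure(p, f, n):
    """Measure p^(-n*f) of a closed ball of radius delta^n, where delta = p^(-f)."""
    return Fraction(1, p ** f) ** n

# Additivity: a ball of radius delta^n is the disjoint union of p^f balls
# of radius delta^(n+1), so p^f * ball_measure(p, f, n + 1) == ball_measure(p, f, n).
# Scaling (C.1): multiplying by pi^k maps a ball of radius delta^n centred at 0
# onto the ball of radius delta^(n+k), and the measure is multiplied by delta^k.
```

Negative exponents n correspond to balls of radius larger than 1, for which the measure exceeds 1.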

Definition C.1.4. The adele ring of K is the subring
\[ K_A := \Bigl\{ x \in \prod_{v \in M_K} K_v \Bigm| x_v \in R_v \text{ up to finitely many } v \Bigr\} \]
of the product ∏_{v∈MK} Kv .
Remark C.1.5. We never use the topology on KA induced by the product topology,
because it is not locally compact. However, for every finite S ⊂ MK
containing all archimedean places, the product topology makes
\[ H_S := \prod_{v \in S} K_v \times \prod_{v \notin S} R_v \]
into a locally compact topological group. Then there is a unique structure on KA
as a topological group such that the groups HS are open topological subgroups of
KA . In fact, KA is a locally compact topological ring.

We identify K with a subgroup of KA by means of the diagonal map
\[ K \longrightarrow K_A , \qquad x \mapsto (x)_{v \in M_K}. \]
Theorem C.1.6. The subgroup K is a discrete closed subgroup of KA and KA /K
is compact.
Proof: We first show that K is discrete in KA . It is enough to prove that 0 is an isolated
point. We choose w ∈ MK and we consider the neighbourhood
\[ U := \{x \in K_w \mid |x|_w < 1\} \times \prod_{v \ne w} \{x \in K_v \mid |x|_v \le 1\} \]
of 0 in KA . By the product formula, 0 is the only point of K ∩ U .
Every discrete subgroup is closed. To see it, we choose a neighbourhood V of 0 in KA
such that V − V ⊂ U . Then it is clear that, for every x ∈ KA , there is at most one point
in K ∩ (V + x) , proving closedness.
The compactness of KA /K is an immediate consequence of the next lemma and
Tychonov’s theorem.

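Remark C.1.7. Let $\omega_1, \dots, \omega_d$ be a basis of $O_K$ over $\mathbb{Z}$. By equation (1.1) on page 8, there is a canonical isomorphism
\[ K \otimes_{\mathbb{Q}} \mathbb{R} \cong \prod_{v \mid \infty} K_v. \tag{C.2} \]
For simplicity, we may identify both spaces. Let
\[ \Omega_\infty := \Bigl\{ x \in \prod_{v \mid \infty} K_v \Bigm| \exists\, a_j \in [0, 1),\ x = \sum_{j=1}^{d} \omega_j \otimes a_j \Bigr\}. \]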

If $K$ is a number field, it is customary to call the non-archimedean places also finite places, while the archimedean places are called infinite places.

Lemma C.1.8. The subset $\Omega := \Omega_\infty \times \prod_{v \text{ finite}} R_v$ of $K_A$ is a fundamental domain of $K_A/K$, i.e. every class in $K_A/K$ has exactly one representative in $\Omega$.
Proof: First, we show uniqueness. Let $x, x' \in \Omega$ with $\alpha = x - x' \in K$ and with projections $\sum_{j=1}^{d} \omega_j \otimes a_j$, $\sum_{j=1}^{d} \omega_j \otimes a_j'$ to $\Omega_\infty$. For every finite place $v$, we conclude that $\alpha \in R_v$ and hence $\alpha \in O_K$. Now looking at the infinite places we see that
\[ \alpha = \sum_{j=1}^{d} \omega_j \otimes (a_j - a_j') \]
with $a_j - a_j' \in (-1, 1)$. Then $\alpha \in O_K$ yields $a_j = a_j'$ and hence $\alpha = 0$.


Now we show that any class $K + x \in K_A/K$ has a representative in $\Omega$. Let $S$ be the set of non-archimedean places with $|x_v|_v > 1$. By the strong approximation theorem (see Theorem 1.4.5) there is $\alpha \in K$ such that $|x_v - \alpha|_v < 1$ for all $v \in S$ and $|\alpha|_v \le 1$ for all non-archimedean places $v \notin S$. Therefore, replacing $x$ by $x - \alpha$, we may assume that $x_v \in R_v$ for all finite $v$. We have
\[ (x_v)_{v \mid \infty} = \sum_{j=1}^{d} \omega_j \otimes a_j \]
for some $a_j \in \mathbb{R}$. There are $b_j \in \mathbb{Z}$ such that $0 \le a_j - b_j < 1$ for $j = 1, \dots, d$. Then $x - \sum_{j=1}^{d} \omega_j \otimes b_j$ is the desired representative in $\Omega$. □

C.1.9. Now we fix the Haar measures βv on Kv and on the adeles in the following
way:

(a) If $v$ is not archimedean, then we normalize the Haar measure by
\[ \beta_v(R_v) = |D_{K_v/\mathbb{Q}_p}|_p^{1/2}, \]
where $D_{K_v/\mathbb{Q}_p}$ is the discriminant of $K_v/\mathbb{Q}_p$ for $v \mid p \in M_{\mathbb{Q}}$.


(b) If Kv = R , then βv is the ordinary Lebesgue measure.
(c) If Kv = C , then βv is twice the ordinary Lebesgue measure.

For a finite subset $S$ of $M_K$ containing all the archimedean primes, the product measure
\[ \beta_S := \prod_{v \in S} \beta_v \times \prod_{v \notin S} \beta_v|_{R_v} \]
is a Haar measure on the open topological subgroup HS of KA introduced in
Remark C.1.5. The measures βS fit together to give a Haar measure β on KA .
Clearly, the counting measure is a Haar measure on the discrete subgroup K . By
Corollary C.1.2, we get a uniquely determined Haar measure βKA /K on KA /K .
Proposition C.1.10. The volume of KA /K with respect to the Haar measure
βKA /K is 1.
Proof: The fundamental domain Ω is measurable. Since it is contained in a compact subset,
Ω has finite measure. By Corollary C.1.2, we have
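\[ \beta(\Omega) = \beta_{K_A/K}(K_A/K). \]
By definition, we have
\[ \beta(\Omega) = \Bigl( \prod_{v \mid \infty} \beta_v \Bigr)(\Omega_\infty) \prod_{v \text{ finite}} |D_{K_v/\mathbb{Q}_p}|_p^{1/2}. \tag{C.3} \]
From B.1.20 and B.1.21, we deduce
\[ |D_{K/\mathbb{Q}}|_p = \prod_{v \mid p} |D_{K_v/\mathbb{Q}_p}|_p \]
and hence the product formula gives
\[ |D_{K/\mathbb{Q}}|^{-1} = \prod_{v \text{ finite}} |D_{K_v/\mathbb{Q}_p}|_p. \tag{C.4} \]
Let $d = [K : \mathbb{Q}]$ be the degree of the number field $K$. We consider
\[ K \otimes_{\mathbb{Q}} \mathbb{R} = \prod_{v \mid \infty} K_v = \mathbb{R}^d \]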

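as a $d$-dimensional real vector space. By the identification of $K_v = \mathbb{R}$ or $K_v = \mathbb{C}$, we get an embedding $\sigma_v$ of $K$ into $\mathbb{R}$ or $\mathbb{C}$. Let $\sigma_1, \dots, \sigma_r$ be the real embeddings and $\sigma_{r+1}, \dots, \sigma_{r+s}$ be the chosen complex embeddings. Then $\alpha \in K$ gives rise to the vector
\[ v(\alpha) := \bigl(\sigma_1(\alpha), \dots, \sigma_r(\alpha), \Re\sigma_{r+1}(\alpha), \Im\sigma_{r+1}(\alpha), \dots, \Re\sigma_{r+s}(\alpha), \Im\sigma_{r+s}(\alpha)\bigr). \]
The volume of $\Omega_\infty$ with respect to the Lebesgue measure on $\mathbb{R}^d$ is
\[ |\det(v(\omega_1), \dots, v(\omega_d))| = 2^{-s}\, |\det(\sigma_i(\omega_j))|, \]
where on the right-hand side we have the determinant of a $d \times d$ matrix by using all complex embeddings, namely $\sigma_{r+s+1} := \overline{\sigma}_{r+1}, \dots, \sigma_d := \overline{\sigma}_{r+s}$. By Remark B.1.15, we conclude that
\[ \Bigl( \prod_{v \mid \infty} \beta_v \Bigr)(\Omega_\infty) = |D_{K/\mathbb{Q}}|^{1/2}. \]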

Now (C.3) and (C.4) show that β(Ω) = 1 , proving the claim. 
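The archimedean volume computation in this proof can be checked numerically in the two simplest cases. The following is only an illustrative sketch: the fields $\mathbb{Q}(\sqrt{2})$ (discriminant 8) and $\mathbb{Q}(i)$ (discriminant $-4$), their $\mathbb{Z}$-bases, and the helper `det2` are choices made here and do not come from the text.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists (entries may be complex)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Q(sqrt 2): r = 2, s = 0, Z-basis 1, sqrt 2, discriminant 8 (illustrative choice).
sqrt2 = math.sqrt(2)
vol_real = abs(det2([[1, sqrt2], [1, -sqrt2]]))  # |det(sigma_i(omega_j))|

# Q(i): r = 0, s = 1, Z-basis 1, i, discriminant -4 (illustrative choice).
# Lebesgue volume of Omega_inf from v(alpha) = (Re sigma(alpha), Im sigma(alpha)).
vol_lebesgue = abs(det2([[1, 0], [0, 1]]))
# The same volume from all complex embeddings: 2^{-s} |det(sigma_i(omega_j))|.
vol_via_embeddings = 2 ** -1 * abs(det2([[1, 1j], [1, -1j]]))
# beta_v is twice the Lebesgue measure at the complex place.
vol_beta = 2 * vol_lebesgue
```

In both cases the $\beta$-volume of $\Omega_\infty$ equals $|D_{K/\mathbb{Q}}|^{1/2}$, which is exactly the factor cancelled by the finite places in (C.4).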

C.1.11. To motivate our normalizations of the Haar measures in C.1.9, we have to use
duality and Fourier theory. The following considerations are not used in the sequel. For
proofs, we refer to N. Bourbaki [48], E. Hewitt and K.A. Ross [150], [151], W. Rudin
[253], and A. Weil [326].
Let $G$ be a locally compact abelian group. A character of $G$ is a continuous homomorphism of $G$ into $T := \{z \in \mathbb{C} \mid |z| = 1\}$. Together with the compact-open topology (also called the topology of uniform convergence on compact sets), the set of characters $\widehat{G}$ is a locally compact abelian group. For $x \in G$ and $\gamma \in \widehat{G}$, let $\langle x, \gamma \rangle := \gamma(x)$. Then we get a perfect duality between $G$ and $\widehat{G}$, i.e. the canonical homomorphism of $G$ into the characters of $\widehat{G}$ is an isomorphism. For a continuous complex function $f$ on $G$ with compact support, the Fourier transform is the continuous complex function $\widehat{f}$ on $\widehat{G}$ given by
\[ \widehat{f}(\gamma) := \int_G f(x)\, \langle x, \gamma \rangle \, d\mu(x), \]
where $\mu$ is a fixed Haar measure on $G$. By Plancherel’s theorem, there is a unique Haar measure $\widehat{\mu}$ on $\widehat{G}$ such that the Fourier transform extends to an isometry of $L^2(G, \mu)$ onto $L^2(\widehat{G}, \widehat{\mu})$. It is called the Haar measure on $\widehat{G}$ associated to $\mu$. Moreover, it is the unique Haar measure such that the Fourier inversion formula is true.
If $G$ is self-dual and if we identify $G$ with $\widehat{G}$ by a given isomorphism, then there is a unique Haar measure $\mu$ on $G$ which is equal to its associated Haar measure $\widehat{\mu}$. This measure is called self-dual. If $G$ is the additive group of $K_w$ for a number field $K$ and a place $w \in M_K$, then $G$ is self-dual. Let $\chi_w$ be any non-trivial character, then we may identify $y \in K_w$ with the character $\langle x, y \rangle = \chi_w(xy)$ on $K_w$. Thus the choice of a non-trivial character leads to a well-determined Haar measure on $K_w$.
For Q∞ = R , any character has the form χ(x) = exp(2πiax) for some non-zero real
number a . We normalize χ∞ by using a = 1 .
If $p$ is a prime number, we choose $\chi_p$ as follows. Any $x \in \mathbb{Q}_p$ has a representation of the form $\sum_{n=n_0}^{\infty} a_n p^n$ for some rational integers $a_n$ and for some $n_0 \in \mathbb{Z}$. Let $x_p$ be the sum restricted to indices $n < 0$; then any character on $\mathbb{Q}_p$ has the form $\chi(x) := \exp(-2\pi i\,(ax)_p)$ for some non-zero $a \in \mathbb{Q}_p$. Again, we choose $a = 1$ for $\chi_p$.
Our normalizations are justified by the fact that the annihilator of $\mathbb{Z}$ (resp. $\mathbb{Z}_p$) with respect to the pairing $\langle \cdot, \cdot \rangle$ is again $\mathbb{Z}$ (resp. $\mathbb{Z}_p$) in the infinite (resp. finite) case.
For the place $w$ of our number field $K$, let $v$ be its restriction to $\mathbb{Q}$. We choose $\chi_w := \chi_v \circ \mathrm{Tr}_{K_w/\mathbb{Q}_v}$, leading to the formula $\prod_w \chi_w(x) = 1$ for any $x \in K$. Then our normalized Haar measure $\beta_w$ is the unique self-dual measure on $K_w$ with respect to the duality induced by $\chi_w$.
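For $K = \mathbb{Q}$ the formula $\prod_w \chi_w(x) = 1$ says that a rational number $x$ differs from the sum of its $p$-adic principal parts $x_p$ by an integer. The sketch below checks this with exact arithmetic; it assumes the digits $a_n$ are taken in $\{0, \dots, p-1\}$, and the function names are hypothetical helpers introduced here, not notation from the text.

```python
from fractions import Fraction

def prime_factors(n):
    """Distinct prime factors of a positive integer, by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def padic_principal_part(x, p):
    """The part x_p of the p-adic expansion of x with negative exponents,
    assuming digits in {0, ..., p-1}; returned as a rational in [0, 1)."""
    a, b = x.numerator, x.denominator
    k = 0
    while b % p == 0:
        b //= p
        k += 1
    if k == 0:
        return Fraction(0)  # x is a p-adic integer
    q = p ** k
    # Need r with a - r*b divisible by p^k, where b is now prime to p.
    return Fraction((a * pow(b, -1, q)) % q, q)

def character_phase(x):
    """x minus the sum of all x_p; the product of the chi_v(x) over all
    places of Q equals exp(2*pi*i * character_phase(x))."""
    return x - sum(padic_principal_part(x, p) for p in prime_factors(x.denominator))
```

Since `character_phase(x)` is always an integer, the product of the local characters is 1 on $\mathbb{Q}$. The modular inverse via `pow(b, -1, q)` requires Python 3.8 or later.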

C.2. Minkowski’s second theorem

In this section, we prove Minkowski’s second theorem over the adeles. We begin
by introducing various types of lattices. This material is mainly borrowed from
[328].
We fix notation in the following way. Let $K$ be a number field of degree $d$, $N \in \mathbb{N}$, $v \in M_K$. By $E$, $E_v$, $E_\infty$, $E_A$, we denote the euclidean spaces $K^N$, $K_v^N$, $\prod_{v \mid \infty} K_v^N$, and $K_A^N$. We fix the Haar measure on $E_A$ by using the $N$-fold product of the Haar measure $\beta$ on $K_A$ introduced in C.1.9.
Definition C.2.1. For a finite place v of K , a Kv -lattice in Ev is an open and
compact Rv -submodule of Ev .
Proposition C.2.2. Let Λv be an Rv -submodule of Ev . Then Λv is a Kv -lattice
in Ev if and only if Λv is a finitely generated Rv -module which generates Ev as
a Kv -vector space.
Proof: Let Λv be a Kv -lattice. Since Λv is open, it is clear that Ev is generated by Λv as
a Kv -vector space. The Rv -span of N linearly independent vectors over K is an open and
compact Rv -submodule of Ev . Note that Λv is covered by such submodules contained in
Λv . Since Λv is compact, we can select a finite subcovering. This leads to a finite set of
generators for Λv .
Conversely, let Λv be a finitely generated Rv -submodule which generates Ev as a Kv -
vector space. As a continuous image of some RvM , the space Λv is compact. There is a
Kv -basis of Ev contained in Λv . Let U be its Rv -span. Then U is an open neighbour-
hood of 0 and U ⊂ Λv . By a translation argument, we conclude that Λv is open in Ev
and hence Λv is a Kv -lattice in Ev . 

Definition C.2.3. An R -lattice (or simply a lattice) in a finite-dimensional real


(or complex) vector space V is a discrete subgroup Λ of V such that V /Λ is
compact.

The analogue of Proposition C.2.2 is the following well-known fact.


Proposition C.2.4. Let Λ be a subgroup of the finite-dimensional real vector
space V . Then the following conditions are equivalent:

(a) Λ is an R -lattice in V ;
(b) Λ is discrete in V and contains an R -basis of V ;
(c) Λ has a Z-basis which is an R -basis for V .
Proof: (a) ⇒ (b). We claim that Λ generates V as an R -vector space. Let W be a
complementary subspace of RΛ . Then W is contained in the compact V /Λ , hence W =
0 . We conclude that Λ spans V over R , hence it contains an R -basis of V .

(b) ⇒ (c). We may assume that V = RN and that Λ contains the standard basis e1 , . . . ,
eN . Obviously, Λ is generated by S := {λ ∈ Λ | maxi |λi | ≤ 1} . Since Λ is discrete
in V , it is a closed subgroup (as in the proof of Theorem C.1.6). As an intersection of
a compact cube and a discrete closed subset, our S has to be finite. Thus Λ is a finitely
generated abelian group without torsion, hence free of finite rank r ≥ N . Since every
element of Λ/ ⊕j Zej has a representative in the finite set S , we conclude that r = N ,
proving (c).
Finally, (c) ⇒ (a) is obvious. 

Definition C.2.5. A K -lattice in E is a finitely generated OK -submodule Λ of


E which generates E as a K -vector space.
Proposition C.2.6. The K -lattices have the following characterization:

(a) If $\Lambda$ is a $K$-lattice in $E$, then the closure $\Lambda_v$ of $\Lambda$ in $E_v$ is a $K_v$-lattice in $E_v$ for any non-archimedean $v \in M_K$. Moreover, we have $\Lambda_v = R_v^N$ up to finitely many $v \in M_K$.
(b) Conversely, if for any non-archimedean $v \in M_K$ we have a $K_v$-lattice $\Lambda_v$ of $E_v$ and if $\Lambda_v = R_v^N$ up to finitely many $v$, then there is a unique $K$-lattice $\Lambda$ in $E$ such that $\Lambda_v$ is the closure of $\Lambda$ in $E_v$. Moreover, we have
\[ \Lambda = \bigcap_{v \text{ finite}} (E \cap \Lambda_v). \]
Proof: Let Λ be a K -lattice generated by x1 , . . . , xm as an OK -module. Then x1 , . . . , xm
generate the closure Λv as an Rv -module (use the compactness of Rv ) and they generate
Ev as a Kv -vector space. By Proposition C.2.2, we know that Λv is a Kv -lattice in Ev .
If we express our generators in terms of the standard basis of $E = K^N$ and vice-versa,
then it is clear that the coordinates are in the valuation ring Rv up to finitely many v . This
proves (a).
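Now let $\Lambda_v$ be a $K_v$-lattice in $E_v$, and assume $\Lambda_v = R_v^N$ up to finitely many $v$. We define
\[ \Lambda := \bigcap_{v \text{ finite}} (E \cap \Lambda_v) \quad \text{and} \quad \Lambda_{\mathrm{fin}} := \prod_{v \text{ finite}} \Lambda_v. \]
Clearly, $\Lambda_{\mathrm{fin}}$ is an open subgroup of $\prod_{v \text{ finite}} K_v$. We have to show that $\Lambda$ is a $K$-lattice in $E$. By means of the canonical embedding of $E$ into $E_\infty$, this subgroup $\Lambda$ is mapped onto the projection $\Lambda_\infty$ of
\[ \Gamma := E \cap (E_\infty \times \Lambda_{\mathrm{fin}}) \]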
to E∞ . We claim that Λ∞ is an R -lattice in E∞ . Since E is a discrete closed subgroup
in EA (see Theorem C.1.6), we conclude that Γ is a discrete closed subgroup of E∞ ×
Λfin . The first projection of the latter has the property that the inverse image of a compact
subset of E∞ is compact. This proves easily that the image of a discrete closed subset is
discrete. In particular, Λ∞ is a discrete subgroup of E∞ . In order to prove compactness
of E∞ /Λ∞ , it is enough to show that there is a compact subset C of E∞ × Λfin such that
E∞ × Λfin = C + Γ. (C.5)

Note that the first projection maps C onto E∞ /Λ∞ , hence this proves compactness of the
latter.
We have an isomorphism
\[ (E_\infty \times \Lambda_{\mathrm{fin}})/\Gamma \cong (E + (E_\infty \times \Lambda_{\mathrm{fin}}))/E \tag{C.6} \]
of locally compact groups. Since E∞ × Λfin is an open subgroup of EA , we conclude that
E + (E∞ × Λfin ) and also its complement are open subgroups of EA . This is easily seen
by writing them as a union of cosets of E∞ × Λfin . Hence the right-hand side of (C.6)
is a closed subset of the compact quotient EA /E (Theorem C.1.6). This proves that the
left-hand side of (C.6) is also compact. Since the quotient homomorphism is an open map,
we can cover (E∞ × Λfin ) /Γ by the images of finitely many open subsets Ui which are
relatively compact in E∞ × Λfin . Choosing C = ∪i Ui , we get (C.5). This finishes the
proof that Λ∞ is an R -lattice in E∞ .
Now Proposition C.2.4 shows that every Z -basis of Λ∞ is an R -basis of E∞ . Hence Λ
is a free abelian group of finite rank which generates E as a Q -vector space. Obviously, Λ
is a K -lattice in E .
Next, we prove that the closure of $\Lambda$ in $E_v$ equals $\Lambda_v$ for every finite $v$. Let $x_v \in \Lambda_v$ and let $\varepsilon > 0$; then the strong approximation theorem (see Theorem 1.4.5) applied to the coordinates of $x_v$ shows that there exists $x \in E$ such that $|x - x_v|_v < \varepsilon$ and $x \in \Lambda_w$ for all finite $w \neq v$. If $\varepsilon$ is sufficiently small, then the openness of $\Lambda_v$ implies also $x \in \Lambda_v$. Thus $x \in \Lambda$, proving the density of $\Lambda$ in $\Lambda_v$.
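It remains to prove uniqueness. Let $\Lambda'$ be another $K$-lattice with closure $\Lambda_v$ in $E_v$ for every finite $v$. By construction, we have $\Lambda' \subset \Lambda$. Since both are free abelian groups of rank $Nd$, the index is finite. Hence there is a non-zero $m \in \mathbb{Z}$ such that $m\Lambda \subset \Lambda'$. The strong approximation theorem shows that $\Lambda'$ is dense in $\Lambda_{\mathrm{fin}}$ with respect to the diagonal embedding (choose $O_K$-generators of $\Lambda'$ and then apply it to the coordinates). Let $\lambda \in \Lambda$. Since $m\Lambda_{\mathrm{fin}}$ is an open subgroup of $\Lambda_{\mathrm{fin}}$, there is $\lambda' \in \Lambda'$ such that $\lambda - \lambda' \in m\Lambda_{\mathrm{fin}}$. We deduce
\[ \lambda - \lambda' \in m\Lambda_{\mathrm{fin}} \cap E = m\Lambda \subset \Lambda', \]
proving $\lambda \in \Lambda'$. We conclude $\Lambda = \Lambda'$. □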

Corollary C.2.7. The image Λ∞ of a K -lattice Λ under the diagonal embedding


E → E∞ is an R -lattice.
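C.2.8. For $\lambda \in \mathbb{R}$ and $x = (x_v) \in E_A$, let $\lambda x \in E_A$ be given by
\[ (\lambda x)_v = \begin{cases} \lambda x_v & \text{if } v \text{ is archimedean,} \\ x_v & \text{if } v \text{ is not archimedean.} \end{cases} \]
For $v \mid \infty$, we consider a non-empty open convex symmetric bounded subset $S_v$ of $E_v$. Here, symmetric means $-S_v = S_v$. Let $\Lambda$ be a $K$-lattice in $E$ and let us consider the subset
\[ S := \prod_{v \mid \infty} S_v \times \prod_{v \text{ finite}} \Lambda_v \]
of $E_A$.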

Definition C.2.9. For $n = 1, \dots, N$, the $n$th successive minimum is
\[ \lambda_n = \inf \bigl\{ t > 0 \mid tS \text{ contains } n \text{ linearly independent vectors of } \Lambda \text{ over } K \bigr\}. \]
Remark C.2.10. Note that λS ∩ E is a discrete relatively compact subset of EA ,
hence finite. The closure of λn S is the intersection of all λS with λ > λn .
We conclude that this closure contains n linearly independent vectors. We easily
verify
0 < λ1 ≤ λ2 ≤ · · · ≤ λN < ∞.

Now we are ready to state Minkowski’s second theorem.


Theorem C.2.11. Let $\Lambda$ be a $K$-lattice of $E$ with closure $\Lambda_v$ in $E_v$ for finite places $v$. For $v \mid \infty$, let $S_v$ be a non-empty open convex symmetric bounded subset of $E_v$ and let $S := \prod_{v \mid \infty} S_v \times \prod_{v \text{ finite}} \Lambda_v \subset E_A$. Then
\[ (\lambda_1 \lambda_2 \cdots \lambda_N)^d \, \mathrm{vol}(S) \le 2^{dN}, \]
where the volume is computed with respect to the Haar measure on $E_A$ given by the product of the normalized Haar measures on $K_A$ from C.1.9.
Moreover, suppose that for every complex place $v$ the bounded set $S_v$ is symmetric in the stronger sense that $\alpha S_v = S_v$ for $|\alpha| = 1$. Let $r$ and $s$ be the number of real and complex places of $K$, respectively. Then
\[ \frac{2^{dN} \pi^{sN}}{(N!)^r \, ((2N)!)^s} \, |D_{K/\mathbb{Q}}|^{-N/2} \le (\lambda_1 \cdots \lambda_N)^d \, \mathrm{vol}(S), \]
where $D_{K/\mathbb{Q}}$ is the discriminant of $K$.

For the proof we follow [35], obtaining it as a consequence of the Davenport–Estermann theorem stated below. For $v \mid \infty$, let $T_v$ be a non-empty bounded convex open subset of $E_v$ and let $T := \prod_{v \mid \infty} T_v \times \prod_{v \text{ finite}} \Lambda_v$.
Definition C.2.12. For n = 1, . . . , N , let µn be the supremum of all µ ≥ 0
satisfying the following condition: If x, y ∈ µT with x − y ∈ E = K N , then the
last N − n + 1 coordinates of x and y coincide, i.e. xj = yj for j = n, . . . , N .
Remark C.2.13. In the same way as in Remark C.2.10, we have
0 < µ1 ≤ µ2 ≤ · · · ≤ µN < ∞.
We leave the details to the reader (hint: consider S := T − T ).
Theorem C.2.14. With the hypothesis above, we have
(µ1 µ2 · · · µN )d vol(T ) ≤ 1.

This is the Davenport–Estermann theorem. We prove it first and then we deduce Minkowski’s second theorem as a consequence.

Lemma C.2.15. Let us consider the homomorphism
\[ \Phi_n : E_A = K_A^N \longrightarrow (K_A/K)^n \times K_A^{N-n}, \qquad x \mapsto (\overline{x}_1, \dots, \overline{x}_n, x_{n+1}, \dots, x_N), \]
where $\overline{x}_j$ denotes the class of $x_j$. For $\mu \ge 1$, we have
\[ \mathrm{vol}(\Phi_n(\mu T)) \ge \mu^{d(N-n)} \, \mathrm{vol}(\Phi_n(T)), \]
where the volumes are with respect to a Haar measure.
Proof: Suppose first $n = N$ and pick $y \in T$. The origin is contained in $T - y$ (algebraic minus), hence the convexity shows $\mu(T - y) \supset T - y$ and so we have
\[ \mathrm{vol}(\Phi_n(\mu(T - y))) \ge \mathrm{vol}(\Phi_n(T - y)). \]
By invariance of the volume under translations, we get the claim for $n = N$.
Thus we may assume $n < N$. We write $K_A^N = K_A^n \times K_A^{N-n}$ and for $y \in K_A^{N-n}$, let
\[ T(y) = \{ w \in K_A^n \mid (w, y) \in T \}. \]
We denote the variables of $(K_A/K)^n$ and $K_A^{N-n}$ by $w$ and $y$, respectively. The corresponding Haar measures will be called $dw$ and $dy$. By Fubini’s theorem, we have
\[ \mathrm{vol}(\Phi_n(\mu T)) = \int_{K_A^{N-n}} dy \int_{\Phi_n((\mu T)(y))} dw. \]
Using the substitution $y = \mu y'$ and noting that $(K_A^{N-n})_\infty$ has real dimension $d(N - n)$, we get
\[ \mathrm{vol}(\Phi_n(\mu T)) = \mu^{d(N-n)} \int_{K_A^{N-n}} \mathrm{vol}\bigl(\Phi_n((\mu T)(\mu y'))\bigr) \, dy'. \tag{C.7} \]
By $(\mu T)(\mu y') = \mu T(y')$ and the case $N = n$, we conclude
\[ \mathrm{vol}\bigl(\Phi_n((\mu T)(\mu y'))\bigr) \ge \mathrm{vol}\bigl(\Phi_n(T(y'))\bigr). \tag{C.8} \]
From (C.7) and (C.8), we obtain
\[ \mathrm{vol}(\Phi_n(\mu T)) \ge \mu^{d(N-n)} \int_{K_A^{N-n}} \mathrm{vol}\bigl(\Phi_n(T(y'))\bigr) \, dy' = \mu^{d(N-n)} \, \mathrm{vol}(\Phi_n(T)), \]
proving the lemma. □

C.2.16. Proof of the Davenport–Estermann theorem: Let us apply Lemma C.2.15 with $\mu_n T$ in place of $T$ and with $\mu = \mu_{n+1}/\mu_n$. For $n = 1, \dots, N - 1$, we get
\[ \mathrm{vol}(\Phi_n(\mu_{n+1} T)) \ge \Bigl( \frac{\mu_{n+1}}{\mu_n} \Bigr)^{d(N-n)} \mathrm{vol}(\Phi_n(\mu_n T)). \tag{C.9} \]
We claim that the map
\[ (K_A/K)^n \times K_A^{N-n} \longrightarrow (K_A/K)^{n+1} \times K_A^{N-n-1} \]
given by
\[ (\overline{x}_1, \dots, \overline{x}_n, x_{n+1}, \dots, x_N) \mapsto (\overline{x}_1, \dots, \overline{x}_{n+1}, x_{n+2}, \dots, x_N) \]
is one-to-one on $\Phi_n(\mu_{n+1} T)$. If $\Phi_0$ denotes the identity on $K_A^N$, then this also holds for $n = 0$. To prove it, let $x, y \in \mu_{n+1} T$ with $\Phi_{n+1}(x) = \Phi_{n+1}(y)$. This means that $x_m = y_m$ for $m > n + 1$ and $x_m - y_m \in K$ for $m \le n + 1$. In particular, we have $x - y \in E$. Since $T_v$ is open for $v \mid \infty$, there is $\mu < \mu_{n+1}$ with $x, y \in \mu T$. By definition of $\mu_{n+1}$, we have $x_{n+1} = y_{n+1}$. This proves $\Phi_n(x) = \Phi_n(y)$, which means injectivity on $\Phi_n(\mu_{n+1} T)$.
Now we use the normalized Haar measures on KA and on KA /K introduced in
C.1.9. The volumes in (C.9) should be understood with respect to the product
measures. Since Φn (µn+1 T ) is mapped bijectively onto Φn+1 (µn+1 T ), Corol-
lary C.1.2 shows
vol (Φn (µn+1 T )) = vol (Φn+1 (µn+1 T )) . (C.10)
For $n = 1, \dots, N - 1$, we deduce from (C.9) and (C.10) that
\[ \mathrm{vol}(\Phi_{n+1}(\mu_{n+1} T)) \ge \Bigl( \frac{\mu_{n+1}}{\mu_n} \Bigr)^{d(N-n)} \mathrm{vol}(\Phi_n(\mu_n T)). \]
This leads to
\[ \mathrm{vol}(\Phi_N(\mu_N T)) \ge \prod_{n=1}^{N-1} \Bigl( \frac{\mu_{n+1}}{\mu_n} \Bigr)^{d(N-n)} \mathrm{vol}(\Phi_1(\mu_1 T)) = (\mu_1 \cdots \mu_N)^d \, \mathrm{vol}(T), \tag{C.11} \]
where in the last step we have used the identity
\[ \mathrm{vol}(\Phi_1(\mu_1 T)) = \mu_1^{dN} \, \mathrm{vol}(T), \]
which follows from (C.10) for $n = 0$ and the transformation formula. By Proposition C.1.10, the left-hand side of (C.11) is bounded by 1, thereby proving the Davenport–Estermann theorem. □
C.2.17. Proof of Minkowski’s second theorem: Note that a change of coordinates in $E_A$ by an invertible $N \times N$ matrix $\gamma$ does not affect the statement, because Example C.1.3 shows that the volume changes by
\[ \prod_{v \in M_K} |\det(\gamma)|_v = 1. \]
Thus we may assume that for $n = 1, \dots, N$ the closure of $\lambda_n S$ contains the first $n$ elements of the standard basis $e_1, \dots, e_N$ of $E_A = K_A^N$.
Let us apply the Davenport–Estermann theorem with $T = S$. It is enough to prove $\mu_n \ge \tfrac{1}{2} \lambda_n$. We proceed by induction on $n$. Let $x, y \in \tfrac{1}{2} \lambda_n S$ with $x - y \in E$. Since $S$ is convex and symmetric, we get
\[ x - y = \tfrac{1}{2}(2x) + \tfrac{1}{2}(-2y) \in \lambda_n S. \]
If $n = 1$, then Remark C.2.10 shows $x = y$ and hence $\mu_1 \ge \tfrac{1}{2} \lambda_1$. So we may assume $n \ge 2$ and, by induction hypothesis and C.2.13, we may assume $\lambda_n > \lambda_{n-1}$. Therefore the closure of $\lambda_{n-1} S$ is contained in $\lambda_n S$. Our assumptions show that $e_1, \dots, e_{n-1}$ and $x - y$ belong to $\lambda_n S$. Again by Remark C.2.10, $x - y$ must be a linear combination of $e_1, \dots, e_{n-1}$. This proves $x_m = y_m$ for $m \ge n$ and therefore $\mu_n \ge \tfrac{1}{2} \lambda_n$, as wanted.
For the lower bound, for each infinite place we define
\[ S_v' := \Bigl\{ t \in E_v \Bigm| \sum_{i=1}^{N} \lambda_i |t_i| < 1 \Bigr\}, \]
where $|\ |$ is the usual absolute value on $\mathbb{R}$ or $\mathbb{C}$. Then by convexity and symmetry we verify that $S_v' \subset S_v$. (If $v$ is complex, the condition of symmetry needed is exactly $\alpha S_v = S_v$ for $|\alpha| = 1$.)
Now let $S'$ be defined by $S_v'$ as above if $v \mid \infty$ and $S_v' = R_v^N$ otherwise. Then $S' \subset S$. By computing volumes (see [162], Th.2.13.3) we find
\[ \beta_v^N(S_v') = \begin{cases} \dfrac{2^N}{N!} (\lambda_1 \cdots \lambda_N)^{-1} & \text{if } v \text{ is real,} \\[2mm] \dfrac{(4\pi)^N}{(2N)!} (\lambda_1 \cdots \lambda_N)^{-2} & \text{if } v \text{ is complex,} \\[2mm] |D_{K_v/\mathbb{Q}_p}|_p^{N/2} & \text{if } v \mid p \text{ for } p \text{ prime.} \end{cases} \]
Since $\mathrm{vol}(S') \le \mathrm{vol}(S)$, the lower bound follows by multiplying the local volumes and by (C.4) on page 606.


C.2.18. We compare our adelic approach with the classical geometry of numbers. Let $\Lambda_\infty$ be an $\mathbb{R}$-lattice in $\mathbb{R}^N$ and let $S_\infty$ be a non-empty open convex symmetric bounded subset of $\mathbb{R}^N$. Let $\lambda_1, \dots, \lambda_N$ be the classical successive minima of $S_\infty$ with respect to $\Lambda_\infty$ defined by
\[ \lambda_n := \inf \bigl\{ t > 0 \mid tS_\infty \text{ contains } n \text{ linearly independent vectors of } \Lambda_\infty \bigr\}. \]
The classical second theorem of Minkowski (see [181], Ch.2, §9, Th.1) states that
\[ \lambda_1 \cdots \lambda_N \, \mathrm{vol}(S_\infty) \le 2^N \, \mathrm{vol}(\Lambda_\infty), \]
where the volumes are taken with respect to the Lebesgue measure on $\mathbb{R}^N$, with $\mathrm{vol}(\Lambda_\infty)$ the volume of a fundamental domain of $\mathbb{R}^N/\Lambda_\infty$.

We show that this is equivalent to Theorem C.2.11. In order to see that the adelic version
implies the classical theorem we may assume, by a linear transformation, that Λ∞ = ZN
and then we may apply Theorem C.2.11 in the case K = Q .
Conversely, let $\Lambda$ be a $K$-lattice in $E = K^N$ and let $S_v$ be a non-empty open symmetric convex bounded subset of $E_v$, for every archimedean place $v$. We choose a $\mathbb{Z}$-basis of $O_K$. Then we may identify $E = K^N$ with $\mathbb{Q}^{Nd}$, and similarly $\prod_{v \mid p} K_v^N$ with $\mathbb{Q}_p^{Nd}$, for every $p \in M_{\mathbb{Q}}$ by (C.2) on page 605. By Proposition C.1.10, the normalized Haar measures on $K_A^N$ and on $\mathbb{Q}_A^{Nd}$ agree. Our $K$-lattice $\Lambda$ may be viewed as an $\mathbb{R}$-lattice in $\mathbb{R}^{Nd}$. For the non-empty open convex symmetric bounded subset $S_\infty = \prod_{v \mid \infty} S_v$ of $\mathbb{R}^{Nd}$ and for $S = S_\infty \times \prod_{v \text{ finite}} \Lambda_v \subset E_A$ as in Theorem C.2.11, we claim that
\[ \mathrm{vol}(S) = \mathrm{vol}(S_\infty)/\mathrm{vol}(\Lambda_\infty), \tag{C.12} \]
where on the left-hand side of the equation the volume is taken with respect to the normalized Haar measure from C.1.9, while on the right the volume of $S_\infty$ is taken with respect to the usual real Lebesgue measure on $E_\infty = \mathbb{R}^{Nd}$ and $\mathrm{vol}(\Lambda_\infty)$ is the volume of a fundamental domain of $E_\infty/\Lambda_\infty$.
We have already seen that in proving this we may assume $K = \mathbb{Q}$. By the argument in C.2.17, both sides are invariant under a transformation by $A \in GL(Nd, \mathbb{Q})$, so we can reduce everything to the case $\Lambda = \mathbb{Z}^{Nd}$, where the claim is obvious.
Let $\lambda_1', \dots, \lambda_{Nd}'$ be the classical successive minima of $S_\infty$ with respect to $\Lambda_\infty$. Then it is clear that $\lambda_1' = \lambda_1$ and $\lambda_j \le \lambda_{d(j-1)+1}'$. Hence (C.12) shows that the classical second theorem of Minkowski implies Theorem C.2.11.
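The classical inequality of C.2.18 can be verified by brute force in dimension 2. The sketch below takes $S_\infty$ to be the open square $(-1,1)^2$, so that the gauge is the sup norm and $\mathrm{vol}(S_\infty) = 4$; the lattice basis, the function name and the search bound are illustrative assumptions, and the enumeration is only valid when the minima are attained by coefficient vectors inside the search box.

```python
from itertools import product

def successive_minima_sup_norm(b1, b2, search=10):
    """Successive minima (lambda_1, lambda_2) of the open square (-1, 1)^2
    with respect to the lattice Z*b1 + Z*b2, for integer basis vectors.
    Assumes both minima are attained with coefficients in [-search, search]."""
    vectors = []
    for m, n in product(range(-search, search + 1), repeat=2):
        if (m, n) != (0, 0):
            v = (m * b1[0] + n * b2[0], m * b1[1] + n * b2[1])
            # t*S contains v exactly when t > sup norm of v (S is open).
            vectors.append((max(abs(v[0]), abs(v[1])), v))
    vectors.sort()
    lam1, shortest = vectors[0]
    for norm, v in vectors:
        # The first vector linearly independent of the shortest one gives lambda_2.
        if shortest[0] * v[1] - shortest[1] * v[0] != 0:
            return lam1, norm
    raise ValueError("no independent vector found; enlarge the search box")

basis = ((2, 0), (1, 3))          # illustrative lattice
covolume = abs(2 * 3 - 0 * 1)     # |det| of the basis = vol(Lambda_inf)
lam1, lam2 = successive_minima_sup_norm(*basis)
```

For this lattice the minima are 2 and 3, so $\lambda_1 \lambda_2 \, \mathrm{vol}(S_\infty) = 24 = 2^2 \cdot 6$, and the bound holds with equality.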

C.2.19. Let $\Lambda_\infty$ and $S_\infty$ be as in C.2.18. If
\[ \mathrm{vol}(S_\infty) > 2^N \, \mathrm{vol}(\Lambda_\infty), \]
then Minkowski’s first theorem states that there is at least one non-zero lattice point contained in $S_\infty$.

This is a special case of the classical second theorem of Minkowski using λ1 ≤ λn . It is
also an immediate consequence of Blichfeldt’s principle [26], a special case of which states
that, if Σ is a measurable set, k ∈ N , and vol(Σ) > k vol(Λ∞ ) , then there is a translate of
Σ containing k + 1 distinct points of Λ∞ . Birkhoff’s elementary proof in H.F. Blichfeldt
[26] (see [181], pp.35, 40–43 for extensions and an alternative proof) may be presented as
follows. By intersecting Σ with a sufficiently large ball, we may assume that Σ is bounded.
Let R be a parallelepiped which is a fundamental domain for Λ∞ and consider all lattice
translates of R by x ∈ Λ∞ which intersect Σ ; they cover Σ . The sum of the volumes
of the sets Σ ∩ (R + x) − x ⊂ R is vol(Σ) > k vol(R) ; hence there is a point z ∈ R
which belongs to at least k + 1 such sets (the sum of the characteristic functions cannot be
bounded by k ). The translate Σ − z contains at least k + 1 points of Λ∞ .
Minkowski’s first theorem is immediate by applying Blichfeldt’s principle to Σ = (1/2)S∞
with k = 1 , because if Σ is a convex symmetric set about the origin and y1 , y2 ∈ Σ then
y1 − y2 ∈ 2Σ .
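A concrete instance of Minkowski's first theorem is Dirichlet-type approximation: for $\Lambda_\infty = \mathbb{Z}^2$ the open parallelogram $\{|x - \alpha y| < c_1,\ |y| < c_2\}$ is convex, symmetric and bounded with area $4 c_1 c_2$, so it must contain a non-zero integer point once $c_1 c_2 > 1$. The following sketch searches for such points; the parameter values and the function name are illustrative assumptions.

```python
import math

def lattice_points_in_parallelogram(alpha, c1, c2):
    """Non-zero (x, y) in Z^2 with |x - alpha*y| < c1 and |y| < c2."""
    points = []
    y_max = math.ceil(c2)
    for y in range(-y_max, y_max + 1):
        if abs(y) >= c2:
            continue
        centre = alpha * y
        # Candidate integers x near alpha*y; the strict inequality is re-checked.
        for x in range(math.floor(centre - c1), math.ceil(centre + c1) + 1):
            if (x, y) != (0, 0) and abs(x - centre) < c1:
                points.append((x, y))
    return points

alpha, c1, c2 = math.sqrt(2), 0.11, 10
# The map (x, y) -> (x - alpha*y, y) has determinant 1, so the area is 4*c1*c2.
area = 4 * c1 * c2
points = lattice_points_in_parallelogram(alpha, c1, c2)
```

Here the area is $4.4 > 2^2 \, \mathrm{vol}(\mathbb{Z}^2)$, and the search indeed finds the point $(7, 5)$, reflecting $|7 - 5\sqrt{2}| \approx 0.071$.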

C.3. Cube slicing

In this section, we prove Vaaler’s theorem [302] that the slice of a linear subspace with the
symmetric unit cube of volume 1 has volume at least 1 . The proof uses some basic facts
about log-concave functions, which we handle first.

Definition C.3.1. A non-negative real function $f$ on $\mathbb{R}^n$ is called log-concave if $\log f$ is concave, i.e. for any $x, y \in \mathbb{R}^n$ and real numbers $\lambda, \mu > 0$ with $\lambda + \mu = 1$, we have
\[ f(\lambda x + \mu y) \ge f(x)^{\lambda} f(y)^{\mu}. \]
Remark C.3.2. For a log-concave function f , the set {x ∈ Rn | f (x) > 0} is convex
and it is easy to see that f is continuous in the interior of this set.
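Lemma C.3.3. Let $f, g$ be log-concave functions on $\mathbb{R}^n$ and let $\lambda, \mu > 0$ with $\lambda + \mu = 1$. For $t \in \mathbb{R}^n$, let
\[ r(t) := \sup_{\lambda x + \mu y = t} f(x) g(y), \]
where $x, y$ range over $\mathbb{R}^n$. Then
\[ \int r(t) \, dt \ge \Bigl( \int f(x)^{1/\lambda} \, dx \Bigr)^{\lambda} \Bigl( \int g(y)^{1/\mu} \, dy \Bigr)^{\mu}. \]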

This is a kind of converse of Hölder’s inequality. In fact, it holds for all non-negative Borel
measurable functions. The proof for log-concave functions is easier and needs just basic
techniques from analysis. For details, we refer to Prékopa [236], Th.3.
Lemma C.3.4. Let $f : \mathbb{R}^n \times \mathbb{R}^m \longrightarrow \mathbb{R}_+$ be a log-concave function and let $A$ be a convex set of $\mathbb{R}^m$. Then
\[ x \mapsto \int_A f(x, y) \, dy \]
is a log-concave function on $\mathbb{R}^n$ if the integral is always finite.
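Proof: We fix $x_1, x_2 \in \mathbb{R}^n$ and let $\lambda_1, \lambda_2 > 0$ with $\lambda_1 + \lambda_2 = 1$. Let $x_3 := \lambda_1 x_1 + \lambda_2 x_2$. For $i = 1, 2, 3$, we define a function $f_i$ on $\mathbb{R}^m$ by
\[ f_i(y) := \chi_A(y) f(x_i, y), \]
where $\chi_A$ is the characteristic function of $A$. Clearly, $f_i$ is log-concave and
\[ f_3(y) \ge \sup_{\lambda_1 y' + \lambda_2 y'' = y} f_1(y')^{\lambda_1} f_2(y'')^{\lambda_2}. \]
By Lemma C.3.3, we get
\[ \int f_3(y) \, dy \ge \Bigl( \int f_1(y') \, dy' \Bigr)^{\lambda_1} \Bigl( \int f_2(y'') \, dy'' \Bigr)^{\lambda_2}, \]
proving the claim. □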


Remark C.3.5. Let $\mu$ be the probability measure on $\mathbb{R}^n$ corresponding to the Gauss normal density. For a Borel subset $\Omega$, it is given by
\[ \mu(\Omega) = \int_{\Omega} \exp(-\pi |x|^2) \, dx. \]
Note that the density $\exp(-\pi |x|^2)$ is a symmetric continuous log-concave function on $\mathbb{R}^n$. For $0 < s < 1$, let us consider the compact convex symmetric subset of $\mathbb{R}^n$, with non-empty interior, defined by
\[ K_s := \{ x \in \mathbb{R}^n \mid \exp(-\pi |x|^2) \ge s \}. \]
An easy application of Fubini’s theorem gives
\[ \mu(\Omega) = \int_0^1 \int_{K_s \cap \Omega} dx \, ds. \]

Remark C.3.6. Let $B_{\rho(n)}$ be the closed ball of volume 1 in $\mathbb{R}^n$ with centre 0 and radius $\rho(n)$. It is well known that
\[ \rho(n) = \pi^{-1/2} \, \Gamma\Bigl( \frac{n}{2} + 1 \Bigr)^{1/n}. \]
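The formula for $\rho(n)$ follows from the standard volume $\pi^{n/2} \rho^n / \Gamma(n/2 + 1)$ of a ball of radius $\rho$ in $\mathbb{R}^n$. A short numerical check, with function names chosen here for illustration:

```python
import math

def rho(n):
    """Radius of the ball of volume 1 in R^n, as in Remark C.3.6."""
    return math.gamma(n / 2 + 1) ** (1 / n) / math.sqrt(math.pi)

def ball_volume(n, radius):
    """Lebesgue volume of the n-dimensional ball of the given radius."""
    return math.pi ** (n / 2) * radius ** n / math.gamma(n / 2 + 1)
```

In particular $\rho(1) = 1/2$, so that $B_{\rho(1)} = [-\tfrac{1}{2}, \tfrac{1}{2}]$ and a product of such one-dimensional balls is the unit cube centred at the origin.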
Lemma C.3.7. Let $N = n_1 + \cdots + n_r$ be a partition of $N$ and define
\[ Q_N := B_{\rho(n_1)} \times \cdots \times B_{\rho(n_r)}. \]
Let $A$ be a closed symmetric convex subset of $\mathbb{R}^N$. Then we have
\[ \mu(A) \le \mathrm{vol}(A \cap Q_N), \]
where $\mu$ is the Gauss measure and vol is the Lebesgue measure on $\mathbb{R}^N$.
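In the simplest case $N = r = 1$ the lemma can be checked directly: a closed symmetric convex set is an interval $A = [-a, a]$, its Gauss measure is $\int_{-a}^{a} e^{-\pi x^2} dx = \operatorname{erf}(\sqrt{\pi}\, a)$, and $A \cap Q_1 = [-a, a] \cap [-\tfrac12, \tfrac12]$ has length $\min(2a, 1)$. The sketch below, with illustrative function names, compares the two sides.

```python
import math

def gauss_measure_interval(a):
    """Gauss measure of [-a, a] for the density exp(-pi x^2) on R."""
    return math.erf(math.sqrt(math.pi) * a)

def slice_volume(a):
    """Length of [-a, a] intersected with Q_1 = [-1/2, 1/2]."""
    return min(2 * a, 1.0)
```

The inequality $\operatorname{erf}(\sqrt{\pi}\, a) \le \min(2a, 1)$ holds for every $a \ge 0$, since the left-hand side is concave in $a$ with slope 2 at the origin and is bounded by 1.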
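Proof: We prove the lemma by induction on $r$. Suppose that $r = 1$ and let $N = n$. On the sphere $S^{n-1} = \{ x \in \mathbb{R}^n \mid |x| = 1 \}$, we consider the Lebesgue measure $\lambda_{n-1}$. Then the polar decomposition $x = r x'$ with $r > 0$ and $x' \in S^{n-1}$ gives
\[ \mu(A) = \int_{S^{n-1}} \int_0^{\infty} \chi_A(r x') \exp(-\pi r^2) \, r^{n-1} \, dr \, d\lambda_{n-1}(x') \]
for every closed convex symmetric subset $A$ of $\mathbb{R}^n$. We fix $x' \in S^{n-1}$. By convexity, either
\[ \mathbb{R} x' \cap B_{\rho(n)} \subset \mathbb{R} x' \cap A \]
or
\[ \mathbb{R} x' \cap A \subset \mathbb{R} x' \cap B_{\rho(n)}. \]
In the first case
\[ \int_0^{\infty} \chi_A(r x') \exp(-\pi r^2) \, r^{n-1} \, dr \le \int_0^{\infty} \exp(-\pi r^2) \, r^{n-1} \, dr = \frac{1}{n} \pi^{-n/2} \, \Gamma\Bigl( \frac{n}{2} + 1 \Bigr) = \int_0^{\rho(n)} r^{n-1} \, dr = \int_0^{\infty} \chi_{B_{\rho(n)}}(r x') \, r^{n-1} \, dr. \]
We conclude that
\[ \int_0^{\infty} \chi_A(r x') \exp(-\pi r^2) \, r^{n-1} \, dr \le \int_0^{\infty} \chi_{B_{\rho(n)} \cap A}(r x') \, r^{n-1} \, dr \]
holds in both cases. Thus
\[ \mu(A) \le \int_{S^{n-1}} \int_0^{\infty} \chi_{B_{\rho(n)} \cap A}(r x') \, r^{n-1} \, dr \, d\lambda_{n-1}(x') = \mathrm{vol}(B_{\rho(n)} \cap A), \]
proving the claim for $r = 1$.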

For notational simplicity, we do the induction step only in the case $r = 2$. Points on $\mathbb{R}^N = \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ are denoted by $x = (y, z)$. For $y \in \mathbb{R}^{n_1}$, let
\[ A_y := \{ z \in \mathbb{R}^{n_2} \mid (y, z) \in A \}. \]
Then Lemma C.3.4 shows that the symmetric function
\[ f(y) := \int_{B_{\rho(n_2)} \cap A_y} dz \]
is log-concave on $\mathbb{R}^{n_1}$. Note that the functions
\[ f_n := \frac{1}{n} \sum_{k \ge 0} \chi_{\{ x \in \mathbb{R}^{n_1} \mid f(x) \ge k/n \}} \]
decrease as $n \to \infty$ to $f$. Since the sets involved in the characteristic functions are closed, convex, and symmetric, the induction hypothesis and monotone convergence give
\[ \int f(y) \, d\mu(y) \le \int_{B_{\rho(n_1)}} f(y) \, dy. \]
Using Remark C.3.5, we get
\[ \int_0^1 \int_{B_{\rho(n_2)}} \int_{K_s \cap A_z} dy \, dz \, ds \le \mathrm{vol}(A \cap Q_N). \tag{C.13} \]
Now the same argument as above applied to the function
\[ f(z) := \int_{K_s \cap A_z} dy \]
shows that
\[ \int f(z) \, d\mu(z) \le \int_{B_{\rho(n_2)}} f(z) \, dz. \]
Using this in (C.13), the induction step is completed by Fubini’s theorem and Remark C.3.5. □
Finally, we are ready to prove Vaaler’s cube-slicing theorem. In the simplest case of a
cube in RN , this simply states that the volume of a linear slice through the centre of a cube
of volume 1 is bounded below by 1 . In general, it states that the volume of a slice through
the centre of a product of balls of volume 1 is bounded below by 1 .
Theorem C.3.8. Let $N = n_1 + \cdots + n_r$ be a partition and let $Q_N := B_{\rho(n_1)} \times \cdots \times B_{\rho(n_r)}$ as above. For a real $N \times M$ matrix $B$ of rank $M$, we have
\[ \det(B^t B)^{-1/2} \le \mathrm{vol}\bigl( \{ y \in \mathbb{R}^M \mid By \in Q_N \} \bigr). \]

Proof: Let $L$ be the $M$-dimensional linear subspace of $\mathbb{R}^N$ given as the image of $\mathbb{R}^M$ under $B$. By the transformation formula, it is clear that we have to show
\[ 1 \le \mathrm{vol}(Q_N \cap L), \]
where the volume is computed on $L$. This is the cube-slicing inequality. For $\varepsilon > 0$, let
\[ L_\varepsilon := \{ x \in \mathbb{R}^N \mid \inf_{y \in L} |x - y| \le \varepsilon \}. \]
With respect to the orthogonal decomposition $\mathbb{R}^N = L \times L^{\perp}$, we have $L_\varepsilon = L \times B_\varepsilon$. Note that $L_\varepsilon$ is a closed convex symmetric subset of $\mathbb{R}^N$. By Lemma C.3.7, we get
\[ \mathrm{vol}(B_\varepsilon)^{-1} \mu(L_\varepsilon) \le \mathrm{vol}(B_\varepsilon)^{-1} \int_{Q_N \cap L_\varepsilon} dy. \tag{C.14} \]
As $\varepsilon \to 0$, the left-hand side of this inequality tends to 1. Here, we have used that
\[ \mu(L_\varepsilon) = \mu(L)\mu(B_\varepsilon) = \mu(B_\varepsilon) \]
with respect to the decomposition above and
\[ \lim_{\varepsilon \to 0} \mu(B_\varepsilon)/\mathrm{vol}(B_\varepsilon) = 1. \]
Let $y \in L$, not a boundary point of $L \cap Q_N$. Then
\[ \lim_{\varepsilon \to 0} \mathrm{vol}(B_\varepsilon)^{-1} \int_{L^{\perp}} \chi_{Q_N \cap L_\varepsilon}(y, z) \, dz = \chi_{L \cap Q_N}(y) \]
and, by the Lebesgue dominated convergence theorem, we conclude that the right-hand side of (C.14) tends to $\mathrm{vol}(Q_N \cap L)$. This proves the cube-slicing inequality. □
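The smallest non-trivial case of Theorem C.3.8 is a line through the centre of the unit square: $N = 2$, $M = 1$, $n_1 = n_2 = 1$, so $Q_2 = [-\tfrac12, \tfrac12]^2$ and $B = (b_1, b_2)^t$. Then $\det(B^t B)^{-1/2} = 1/\sqrt{b_1^2 + b_2^2}$, while the set $\{ y \in \mathbb{R} \mid By \in Q_2 \}$ is an interval of length $1/\max(|b_1|, |b_2|)$. The sketch below, with illustrative function names and assuming $(b_1, b_2) \neq 0$, compares the two quantities.

```python
import math

def vaaler_lower_bound(b1, b2):
    """det(B^t B)^(-1/2) for the 2x1 matrix B = (b1, b2)^t, assumed non-zero."""
    return 1 / math.hypot(b1, b2)

def slice_length(b1, b2):
    """Length of {y in R : |b1*y| <= 1/2 and |b2*y| <= 1/2}, for (b1, b2) != 0."""
    return 1 / max(abs(b1), abs(b2))
```

Equality occurs exactly when the line is parallel to a coordinate axis; in terms of the slice $Q_2 \cap L$ itself, whose length is `slice_length` times $\sqrt{b_1^2 + b_2^2}$, this is the statement that every central section of the unit square has length at least 1.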
References

[1] A. Abbes, Hauteurs et discrétude (d’après L. Szpiro, E. Ullmo et S. Zhang), Séminaire Bour-
baki, Exposé 825, Vol. 1996/97, Astérisque 245 (1997), 141–166.
[2] D. Abramovich, Uniformité des points rationnels des courbes algébriques sur les extensions
quadratiques et cubiques, C. R. Acad. Sci. Paris Sér. I Math. 321 (1995), 755–758.
[3] L.V. Ahlfors, Beiträge zur Theorie der meromorphen Funktionen, Skand. Mathematik-
erkongress 7 (1930), 84–88.
[4] L.V. Ahlfors, Über eine in der neueren Wertverteilungstheorie betrachtete Klasse transzen-
denter Funktionen, Acta Math. 58 (1932), 375–406. Also Collected Papers. Vol. 1, 112–143.
Birkhäuser, Boston-Basel-Stuttgart 1982. xx+520 pp.
[5] L.V. Ahlfors, Über eine Methode in der Theorie der meromorphen Funktionen, Soc. Sci. Fenn.
Comm. Phys.-Math. 8, No. 10 (1935), pp.1–14. Also Collected Papers. Vol. 1, 190–203.
Birkhäuser, Boston-Basel-Stuttgart 1982. xx+520 pp.
[6] L.V. Ahlfors, Zur Theorie der Überlagerungsflächen, Acta Math. 65 (1935), 157–194.
[7] L.V. Ahlfors, The theory of meromorphic curves, Acta Soc. Sci. Fennicae Nova Ser. A. 3, No.
4 (1941), 31 pp.
[8] L.V. Ahlfors, Complex Analysis: An Introduction to the Theory of Analytic Functions of
One Complex Variable. Third edition. International Series in Pure and Applied Mathematics.
McGraw-Hill Book Co., New York 1978. xi+331 pp.
[9] F. Amoroso and S. David, Le problème de Lehmer en dimension supérieure, J. reine angew.
Math. 513 (1999), 145–179.
[10] F. Amoroso and S. David, Distribution des points de petite hauteur dans les groupes multipli-
catifs, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (5) 3 (2004), 325–348.
[11] F. Amoroso and S. Dvornicich, A lower bound for the height in abelian extensions, J. Number
Th. 80 (2000), 260–272.
[12] F. Amoroso and U. Zannier, A relative Dobrowolski lower bound over abelian extensions, Ann.
Scuola Norm. Sup. Pisa Cl. Sci. (4) 29 (2000), 711–727.
[13] E. Arbarello, M. Cornalba, P.A. Griffiths, and J. Harris, Geometry of Algebraic Curves. Vol.
I. Grundlehren der mathematischen Wissenschaften 267. Springer-Verlag, New York 1985.
xvi+386 pp.
[14] A. Baker, Transcendental Number Theory. Cambridge University Press 1975. ix+147 pp.
[15] A. Baker, Logarithmic forms and the abc-conjecture, in Number Theory: Diophantine,
Computational and Algebraic Aspects, 37–44. Györy, Kálmán et al. (eds), Proceedings of the
international conference (Eger, Hungary, 1996). De Gruyter, Berlin 1998.
[16] A. Baker and G. Wüstholz, Logarithmic forms and group varieties, J. reine angew. Math. 442
(1993), 19–62.
[17] V.V. Batyrev and Yu. Tschinkel, Rational points on some Fano cubic bundles, C. R. Acad. Sci.
Paris Sér. I Math. 323, No. 1 (1996), 41–46.
[18] B. Beauzamy, E. Bombieri, P. Enflo, and H.L. Montgomery, Products of polynomials in many
variables, J. Number Th. 36 (1990), 219–245.


[19] S. Beckmann, On extensions of number fields obtained by specializing branched coverings, J.
reine angew. Math. 419 (1991), 27–53.
[20] A. Beilinson, Height pairing between algebraic cycles, in K-theory, Arithmetic and Geometry,
1–26. Yu. Manin (ed.), Semin. Moscow Univ. 1984–86. Lecture Notes in Mathematics 1289.
Springer-Verlag, Berlin 1987.
[21] G.V. Belyĭ, On Galois extensions of a maximal cyclotomic field, Izv. Akad. Nauk SSSR, Ser.
Mat. 43 (1979), 267–276; English transl. in Math. USSR Izv. 14 (1980), 247–256.
[22] A. Bertram, L. Ein, and R. Lazarsfeld, Vanishing theorems, a theorem of Severi, and the equa-
tions defining projective varieties, J. Amer. Math. Soc. 4 (1991), 587–602.
[23] F. Beukers and H.P. Schlickewei, The equation x + y = 1 in finitely generated groups, Acta
Arith. 78 (1996), 189–199.
[24] Yu.F. Bilu, Limit distribution of small points on algebraic tori, Duke Math. J. 89 (1997), 465–
476.
[25] Yu.F. Bilu, Catalan’s conjecture (after Mihăilescu), Séminaire Bourbaki, Exposé 909, Vol.
2002/03, Astérisque 294 (2004), 1–26.
[26] H.F. Blichfeldt, A new principle in the geometry of numbers, with some applications, Trans.
Amer. Math. Soc. 15 (1914), 227–235.
[27] S. Bloch, Height pairings for algebraic cycles, J. Pure Appl. Algebra 34 (1984), 119–145.
[28] E. Bombieri, On Weil’s “Théorème de décomposition,” Amer. J. Math. 105 (1983), 295–308.
[29] E. Bombieri, The Mordell conjecture revisited, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 17
(1990), 615–640. Errata-corrige: “The Mordell conjecture revisited,” ibid. 18 (1991), 473.
[30] E. Bombieri, On the Thue–Mahler equation II, Acta Arith. 67 (1994), 69–96.
[31] E. Bombieri, Effective Diophantine approximation on $\mathbb{G}_m$, Ann. Scuola Norm. Sup. Pisa Cl.
Sci. (4) 20 (1993), 61–89.
[32] E. Bombieri and P.B. Cohen, An elementary approach to effective Diophantine approximation
on $\mathbb{G}_m$, in Number Theory and Algebraic Geometry, 41–62. M. Reid and A. Skorobogatov
(eds), London Math. Soc. Lecture Note Ser. 303. Cambridge University Press, Cambridge
2003.
[33] E. Bombieri, A. Granville and J. Pintz, Squares in arithmetic progressions, Duke Math. J. 66
(1992), 369–385.
[34] E. Bombieri and H.P.F. Swinnerton-Dyer, On the local zeta function of a cubic threefold, Ann.
Scuola Norm. Super. Pisa Sci. Fis. Mat. (3) 21 (1967), 1–29.
[35] E. Bombieri and J.D. Vaaler, On Siegel’s lemma, Invent. Math. 73 (1983), 11–32. Ibid., Ad-
dendum to “On Siegel’s lemma,” Invent. Math. 75 (1984), 377.
[36] E. Bombieri and A.J. van der Poorten, Some quantitative results related to Roth’s theorem, J.
Austral. Math. Soc. 45 (1988), 233–248. Corrigenda, J. Austral. Math. Soc. 48 (1990), 154–155.
[37] E. Bombieri, A.J. van der Poorten, and J.D. Vaaler, Effective measures of irrationality for cubic
extensions of number fields, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 23 (1996), 211–248.
[38] E. Bombieri and U. Zannier, Algebraic points on subvarieties of $\mathbb{G}_m^n$, Internat. Math. Res.
Notices (1995), 333–347.
[39] E. Bombieri and U. Zannier, Heights of algebraic points on subvarieties of abelian varieties,
Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 23 (1996), 779–792 (1997).
[40] E. Bombieri and U. Zannier, A note on heights in certain infinite extensions of Q, Atti Accad.
Naz. Lincei Cl. Sci. Fis. Mat. Natur. Rend. Lincei (9) Mat. Appl. 12 (2001), 5–14.
[41] A. Borel, Linear Algebraic Groups. Second edition. Graduate Texts in Mathematics 126.
Springer-Verlag, New York 1991. xii+288 pp.
[42] A.I. Borevich and I.R. Shafarevich, Number Theory. Translated from the Russian by Newcomb
Greenleaf. Pure and Applied Mathematics, Vol. 20. Academic Press, New York–London 1966.
x+435 pp.

[43] S. Bosch, U. Güntzer, and R. Remmert, Non-Archimedean Analysis. A systematic approach
to rigid analytic geometry. Grundlehren der mathematischen Wissenschaften 261.
Springer-Verlag, Berlin 1984. xii+436 pp.
[44] S. Bosch, W. Lütkebohmert, and M. Raynaud, Néron Models. Ergebnisse der Mathematik und
ihrer Grenzgebiete (3) 21. Springer-Verlag, Berlin 1990. x+325 pp.
[45] J.-B. Bost, H. Gillet, and C. Soulé, Heights of projective varieties and positive Green forms, J.
Amer. Math. Soc. 7, No. 4 (1994), 903–1027.
[46] N. Bourbaki, Éléments de Mathématique. Fasc. XXIX. Livre VI: Intégration. Chapitre 7:
Mesure de Haar. Chapitre 8: Convolution et représentations. Actualités Scientifiques et In-
dustrielles, No. 1306. Hermann, Paris 1963. 222 pp.
[47] N. Bourbaki, Éléments de Mathématique. Fasc. XXX. Algèbre commutative. Chapitre 5:
Entiers. Chapitre 6: Valuations. Actualités Scientifiques et Industrielles, No. 1308. Hermann,
Paris 1964. 207 pp.
[48] N. Bourbaki, Éléments de Mathématique. Fasc. XXXII. Théories Spectrales. Chapitre I:
Algèbres normées. Chapitre II: Groupes localement compacts commutatifs. Actualités
Scientifiques et Industrielles, No. 1332. Hermann, Paris 1967. iv+166 pp.
[49] N. Bourbaki, Éléments de Mathématique. Algèbre. Chapitres 4 à 7. Masson, Paris 1981.
vii+422 pp.
[50] B. Brindza, K. Györy, and R. Tijdeman, On the Catalan equation over algebraic number fields,
J. reine angew. Math. 367 (1986), 90–102.
[51] J. Browkin and J. Brzeziński, Some remarks on the abc-conjecture, Math. Comp. 62 (1994),
931–939.
[52] W.D. Brownawell and D.W. Masser, Vanishing sums in function fields, Math. Proc. Cambridge
Philos. Soc. 100 (1986), 427–434.
[53] Y. Bugeaud, Bornes effectives pour les solutions des équations en S-unités et des équations de
Thue–Mahler, J. Number Th. 71 (1998), 227–244.
[54] Y. Bugeaud and M. Laurent, Minoration effective de la distance p-adique entre puissances de
nombres algébriques, J. Number Th. 61 (1996), 311–342.
[55] G.S. Call and J.H. Silverman, Canonical heights on varieties with morphisms, Compos. Math.
89 (1993), 163–205.
[56] L. Caporaso, J. Harris, and B. Mazur, Uniformity of rational points, J. Amer. Math. Soc. 10
(1997), 1–35.
[57] C. Carathéodory, Theory of Functions of a Complex Variable. Vol. II. Translated by F. Stein-
hardt. Chelsea Publ. Company, New York 1954, 220 pp.
[58] J. Carlson and P. Griffiths, A defect relation for equidimensional holomorphic mappings be-
tween algebraic varieties, Ann. of Math. (2) 95 (1972), 557–584.
[59] H. Cartan, Sur la fonction de croissance attachée à une fonction méromorphe de deux variables
et ses applications aux fonctions méromorphes d’une variable, C. R. Acad. Sci. Paris Sér. I
Math. 189 (1929), 521–523.
[60] H. Cartan, Sur les zéros des combinaisons linéaires de p fonctions holomorphes données,
Mathematica 7 (1933), 5–31.
[61] J.W.S. Cassels, Diophantine equations with special reference to elliptic curves, J. London Math.
Soc. 41 (1966), 193–291.
[62] K. Chandrasekharan, Elliptic Functions. Grundlehren der mathematischen Wissenschaften
281. Springer-Verlag, Berlin 1985. xi+189 pp.
[63] W. Cherry, The Nevanlinna error term for coverings, generically surjective case, in Proceedings
of the Symposium on Value Distribution Theory in Several Complex Variables, 37–53. W. Stoll
(ed.), Notre Dame Math. Lectures 12. University of Notre Dame Press, Notre Dame IN 1992.
[64] W. Cherry and Z. Ye, Nevanlinna’s Theory of Value Distribution: The Second Main Theo-
rem and its Error Terms. Springer Monographs in Mathematics. Springer-Verlag, Berlin 2001.
xii+201 pp.

[65] C. Chevalley and A. Weil, Un théorème d’arithmétique sur les courbes algébriques, C. R. Acad.
Sci. Paris Sér. I Math. 195 (1932), 570–572.
[66] K.K. Choi and J.D. Vaaler, Diophantine approximation in projective space, in Number Theory
(Ottawa ON 1996), 55–65. CRM Proc. Lecture Notes 19. Amer. Math. Soc., Providence RI
1999.
[67] W.L. Chow, The Jacobian variety of an algebraic curve, Amer. J. Math. 76 (1954), 453–476.
[68] C.H. Clemens and P.A. Griffiths, The intermediate Jacobian of the cubic threefold, Ann. of
Math. (2) 95 (1972), 281–356.
[69] R. Coleman, Manin’s proof of the Mordell conjecture over function fields, Enseign. Math. (2)
36, No. 3–4 (1990), 393–427.
[70] J.B. Conway, Functions of One Complex Variable. II. Graduate Texts in Mathematics 159.
Springer-Verlag, New York 1995. xvi+394 pp.
[71] J.H. Conway and A.J. Jones, Trigonometric diophantine equations (On vanishing sums of roots
of unity), Acta Arith. 30 (1976), 229–240.
[72] P. Corvaja and U. Zannier, A subspace theorem approach to integral points on curves, C. R.
Math. Acad. Sci. Paris Sér. I Math. 334, No. 4 (2002), 267–271.
[73] P. Corvaja and U. Zannier, On the greatest prime factor of (ab+1)(ac+1), Proc. Amer. Math.
Soc. 131, No. 6 (2003), 1705–1709.
[74] M. Cugiani, Sull’approssimabilità di un numero algebrico mediante numeri algebrici di un
corpo assegnato, Boll. Un. Mat. Ital. (3) 14 (1959), 151–162.
[75] L.V. Danilov, The Diophantine equation $x^3 - y^2 = k$ and a conjecture of M. Hall. (Russian)
Mat. Zametki 32 (1982), 273–275, 425. English translation in Math. Notes 32 (1983), 617–618.
[76] H. Darmon, Faltings plus epsilon, Wiles plus epsilon, and the generalized Fermat equation, C.
R. Math. Rep. Acad. Sci. Canada 19, No. 1 (1997), 3–14. Corrigenda, C. R. Math. Rep. Acad.
Sci. Canada 19, No. 2 (1997), 64.
[77] H. Darmon, F. Diamond, and R. Taylor, Fermat’s Last Theorem. R. Bott et al. (eds) Current
Developments in Mathematics. International Press, Cambridge MA 1995. 1–154.
[78] H. Darmon and A. Granville, On the equation $z^m = F(x, y)$ and $Ax^p + By^q = Cz^r$, Bull.
London Math. Soc. 27 (1995), 513–543.
[79] H. Davenport and K.F. Roth, Rational approximations to algebraic numbers, Mathematika 2
(1955), 160–167.
[80] H. Davenport and W.M. Schmidt, Approximation to real numbers by quadratic irrationals, Acta
Arith. 13 (1967/1968), 169–176.
[81] H. Davenport and W.M. Schmidt, Approximation to real numbers by algebraic integers, Acta
Arith. 15 (1968/1969), 393–416.
[82] S. David and P. Philippon, Minorations des hauteurs normalisées des sous-variétés de variétés
abéliennes, in Number Theory, 333–364. V. Kumar Murty (ed.) et al., Proceedings of the Int.
Conference of the Ramanujan Mathematical Society, Providence, Contemp. Math. 210 (1998).
[83] S. David and P. Philippon, Minorations des hauteurs normalisées des sous-variétés des tores,
Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 28 (1999), 489–543. Errata: ibid. 29 (2000), 729–731.
[84] S. David and P. Philippon, Minorations des hauteurs normalisées des sous-variétés de variétés
abéliennes II, Comment. Math. Helv. 77 (2002), 639–700.
[85] M. Demazure and P. Gabriel, Groupes algébriques. Tome I: Géométrie Algébrique, Généralités,
Groupes Commutatifs: Avec un appendice Corps de Classes Local par Michiel Hazewinkel.
Masson & Cie, Éditeur, Paris; North-Holland Publishing Co., Amsterdam 1970. xxvi+700 pp.
[86] P. Dèbes, Quelques remarques sur un article de Bombieri concernant le Théorème de
Décomposition de Weil, Amer. J. Math. 107 (1985), 39–44.
[87] P. Dèbes, G-fonctions et théorème d’irréductibilité de Hilbert, Acta Arith. 47 (1986), 371–402.
[88] P. Dèbes and U. Zannier, Hilbert’s irreducibility theorem and G-functions, Math. Ann. 309
(1997), 491–503.

[89] L.E. Dickson, History of the Theory of Numbers. Vol. II: Diophantine Analysis. Reprinted
Chelsea Publishing Co., New York 1966. xxv+803 pp.
[90] E. Dobrowolski, On a question of Lehmer and the number of irreducible factors of a polyno-
mial, Acta Arith. 39 (1979), 391–401.
[91] C. Doche, On the spectrum of the Zhang–Zagier height, Math. Comp. 70 (2001), 419–430.
[92] D. Drasin, The inverse problem of the Nevanlinna theory, Acta Math. 138 (1977), 83–151.
[93] D. Drasin, Proof of a conjecture of F. Nevanlinna concerning functions which have deficiency
sum two, Acta Math. 158 (1987), 1–94.
[94] R. Dvornicich and U. Zannier, On sums of roots of unity, Monatsh. Math. 129 (2000), 97–108.
[95] B.M. Dwork and A.J. van der Poorten, The Eisenstein constant, Duke Math. J. 65 (1992), 23–
43. Corrigenda, Duke Math. J. 76 (1994), 669–672.
[96] B. Edixhoven and J.-H. Evertse (eds), Diophantine Approximation and Abelian Varieties: Intro-
ductory Lectures. Papers from the Conference held in Soesterberg, April 12–16, 1992. Lecture
Notes in Mathematics 1566. Springer-Verlag, Berlin 1993. xiv+127 pp.
[97] H. Edwards, Fermat’s Last Theorem: A Genetic Introduction to Algebraic Number Theory.
Graduate Texts in Mathematics 50. Springer-Verlag, New York 1996. xvi+410 pp.
[98] N.D. Elkies, ABC implies Mordell, Internat. Math. Res. Notices (1991), 99–109.
[99] W.J. Ellison, Waring’s problem, Amer. Math. Monthly 78 (1971), 10–36.
[100] A. Erdélyi, W. Magnus, F. Oberhettinger, and F.G. Tricomi, Higher Transcendental Functions.
Vol. I. With a preface by Mina Rees. With a foreword by E.C. Watson. Reprint of the 1953
original. Robert E. Krieger Publishing Co., Inc., Melbourne, Fla. 1981. xiii+302 pp. Also: H.
van Haeringen, L.P. Kok, Table errata: Higher Transcendental Functions, Vol. I by A. Erdélyi,
W. Magnus, F. Oberhettinger, and F.G. Tricomi, Math. Comp. 41 (1983), 778.
[101] P. Erdős, C.L. Stewart, and R. Tijdeman, Some diophantine equations with many solutions,
Compos. Math. 66 (1988), 37–56.
[102] H. Esnault and E. Viehweg, Dyson’s lemma for polynomials in several variables (and the
theorem of Roth), Invent. Math. 78 (1984), 445–490.
[103] J.-H. Evertse, On equations in S-units and the Thue–Mahler equation, Invent. Math. 75 (1984),
561–584.
[104] J.-H. Evertse, On sums of S-units and linear recurrences, Compos. Math. 53 (1984), 225–244.
[105] J.-H. Evertse, The subspace theorem of W. M. Schmidt, in Diophantine Approximation and
Abelian Varieties. Introductory Lectures, 31–50. B. Edixhoven and J.-H. Evertse (eds), papers
from the Conference held in Soesterberg, April 12–16, 1992. Lecture Notes in Mathematics
1566. Springer-Verlag, Berlin 1993.
[106] J.-H. Evertse, An explicit version of Faltings’ product theorem and an improvement of Roth’s
lemma, Acta Arith. 73 (1995), 215–248.
[107] J.-H. Evertse, Math. Reviews 95g:11068.
[108] J.-H. Evertse, An improvement of the quantitative subspace theorem, Compos. Math. 101
(1996), 225–311.
[109] J.-H. Evertse, The number of solutions of the Thue–Mahler equation, J. reine angew. Math. 482
(1997), 121–149.
[110] J.-H. Evertse and R.G. Ferretti, Diophantine inequalities on projective varieties, Internat. Math.
Res. Notices (2002), 1295–1330.
[111] J.-H. Evertse and H.P. Schlickewei, A quantitative version of the absolute subspace theorem, J.
reine angew. Math. 548 (2002), 21–127.
[112] J.-H. Evertse, H.P. Schlickewei, and W.M. Schmidt, Linear equations in variables which lie in
a multiplicative group, Ann. of Math. (2) 155 (2002), 807–836.
[113] G. Faltings, Endlichkeitssätze für abelsche Varietäten über Zahlkörpern, Invent. Math. 73
(1983), 349–366. Erratum: ibid. 75 (1984), 381.
[114] G. Faltings, Diophantine approximation on abelian varieties, Ann. of Math. (2) 133 (1991),
549–576.

[115] G. Faltings, The general case of S. Lang’s conjecture, in Barsotti Symposium in Algebraic
Geometry, 175–182. V. Cristante and W. Messing (eds), papers from the symposium held in
Abano Terme, 1991. Perspectives in Mathematics 15, Academic Press, Inc., San Diego CA
1994.
[116] G. Faltings and G. Wüstholz (eds), Rational Points. Seminar Bonn/Wuppertal 1983/84. Third
enlarged edition. Aspects of Mathematics E6. Vieweg, Braunschweig 1992. x+311 pp.
[117] G. Faltings and G. Wüstholz, Diophantine approximations on projective spaces, Invent. Math.
116 (1994), 109–138.
[118] G. Fano, Sul sistema $\infty^2$ di rette contenuto in una varietà cubica generale dello spazio a quattro
dimensioni, Atti R. Acc. Sc. Torino 39 (1904), 778–792.
[119] R.G. Ferretti, An effective version of Faltings’ product theorem, Forum Math. 8 (1996), 401–
427.
[120] R.G. Ferretti, Mumford’s degree of contact and Diophantine approximations, Compos. Math.
121 (2000), 247–262.
[121] R.H. Fox, On Fenchel’s conjecture about F-groups, Mat. Tidsskr. B. 1952 (1952), 61–65.
[122] J. Franke, Yu.I. Manin, and Y. Tschinkel, Rational points of bounded height on Fano vari-
eties, Invent. Math. 95 (1989), 421–435. Erratum: “Rational points of bounded height on Fano
varieties,” ibid. 102 (1990), 463.
[123] G. Frey, Links between stable elliptic curves and certain diophantine equations, Ann. Univ.
Sarav. Ser. Math. 1 (1986), 1–40.
[124] M. Fried, On the Sprindžuk–Weissauer approach to universal Hilbert subsets, Israel J. of Math.
51 (1985), 347–363.
[125] W. Fulton, Intersection Theory. Second edition. Ergebnisse der Mathematik und ihrer Grenzge-
biete, 3. Folge. Springer-Verlag, Berlin 1998. xiv+470 pp.
[126] H. Gillet and C. Soulé, Arithmetic intersection theory, Publ. Math. IHES 72 (1990), 93–174.
[127] H. Gillet and C. Soulé, An arithmetic Riemann–Roch theorem, Invent. Math. 110 (1992), 473–
543.
[128] A. Granville, ABC allows us to count squarefrees, Internat. Math. Res. Notices (1998),
991–1009.
[129] H. Grauert, Mordells Vermutung über rationale Punkte auf algebraischen Kurven und
Funktionenkörper, Publ. Math. IHES 25 (1965), 131–149.
[130] P. Griffiths and J. Harris, Principles of Algebraic Geometry. Pure and Applied Mathematics.
Wiley-Interscience, New York 1978. xii+813 pp.
[131] P. Griffiths and J. King, Nevanlinna theory and holomorphic mappings between algebraic vari-
eties, Acta Math. 130 (1973), 145–220.
[132] R. Gross, A note on Roth’s theorem, J. Number Th. 36 (1990), 127–132.
[133] A. Grothendieck, Fondements de la géométrie algébrique, Séminaire Bourbaki, Exposé 236,
Vol. 1961/62, Secrétariat Math., Paris 1962.
[134] A. Grothendieck, Eléments de géométrie algébrique I. Le Langage des Schémas. Rédigés avec
la collaboration de J. Dieudonné. Publ. Math. IHES 4 (1960), 228 pp.
[135] A. Grothendieck, Eléments de géométrie algébrique II. Étude globale élémentaire de quelques
classes de morphismes. Rédigés avec la collaboration de J. Dieudonné. Publ. Math. IHES 8
(1961), 222 pp.
[136] A. Grothendieck, Eléments de géométrie algébrique III. Étude cohomologique des faisceaux
cohérents. Rédigés avec la collaboration de J. Dieudonné. I, Publ. Math. IHES 11 (1961), 167
pp.; II, ibidem 17 (1963), 91 pp.
[137] A. Grothendieck, Eléments de géométrie algébrique IV. Étude locale des schémas et des mor-
phismes de schémas. Rédigés avec la collaboration de J. Dieudonné. I, Publ. Math. IHES 20
(1964), 259 pp.; II, ibid. 24 (1965), 231 pp.; III, ibid. 28 (1966), 255 pp.; IV, ibid. 32 (1967),
361 pp.

[138] A. Grothendieck, Esquisse d’un programme, in Geometric Galois Actions. 1. Around
Grothendieck’s “Esquisse d’un programme,” 5–48. Leila Schneps and Pierre Lochak (eds),
London Math. Soc. Lecture Note Ser. 242. Cambridge University Press, Cambridge 1997.
English translation ibid., 243–283.
[139] A. Grothendieck and J.A. Dieudonné, Eléments de Géométrie Algébrique I. Grundlehren der
mathematischen Wissenschaften 166, Springer-Verlag, Berlin Heidelberg New York 1971,
ix+466 pp.
[140] A. Grothendieck et al., Revêtements Étales et Groupe Fondamental. Séminaire de Géométrie
Algébrique du Bois Marie 1960–1961 (SGA 1). Dirigé par Alexandre Grothendieck. Augmenté
de deux exposés de M. Raynaud. Lecture Notes in Mathematics 224. Springer-Verlag, Berlin–
New York 1971. xxii+447 pp.
[141] W. Gubler, Local and canonical heights of subvarieties, Ann. Scuola Norm. Sup. Pisa Cl. Sci.
(5) 2 (2003), 711–760.
[142] G.G. Gundersen and W.K. Hayman, The strength of Cartan’s version of Nevanlinna theory,
Bull. London Math. Soc. 36 (2004), 433–454.
[143] J. Hadamard, Résolution d’une question relative aux déterminants, Bull. Sci. Math. (2) 17
(1893), 240–246.
[144] M. Hall, The diophantine equation $x^3 - y^2 = k$, in Computers in Number Theory, 173–198.
Proc. Sci. Res. Council Atlas Sympos. No. 2, Oxford 1969. Academic Press, London 1971.
[145] G.H. Hardy and E.M. Wright, An Introduction to the Theory of Numbers. Fifth edition. Oxford
at the Clarendon Press 1979. xvi+426 pp.
[146] J. Harris, Algebraic Geometry: A First Course. Graduate Texts in Mathematics 133. Springer-
Verlag, Berlin 1992. xix+328 pp.
[147] J. Harris and J.H. Silverman, Bielliptic curves and symmetric products, Proc. Amer. Math. Soc.
112 (1991), 347–356.
[148] R. Hartshorne, Algebraic Geometry. Graduate Texts in Mathematics 52. Springer-Verlag, New
York–Heidelberg 1977. xvi+496 pp.
[149] W.K. Hayman, Meromorphic Functions. Oxford Mathematical Monographs. Oxford at the
Clarendon Press 1964. xiv+191 pp. Reprinted 1975 (with Appendix). xiv+195 pp.
[150] E. Hewitt and K.A. Ross, Abstract Harmonic Analysis. Volume I: Structure of Topological
Groups, Integration Theory, Group Representations. Second edition. Grundlehren der mathe-
matischen Wissenschaften 115. Springer-Verlag, Berlin 1994. viii+519 pp.
[151] E. Hewitt and K.A. Ross, Abstract Harmonic Analysis. Volume II: Structure and Analysis for
Compact Groups: Analysis on Locally Compact Abelian Groups. Second printing. Grundlehren
der mathematischen Wissenschaften 152. Springer-Verlag, Berlin 1994. viii+771 pp.
[152] E. Hille, Analytic Function Theory. Vol. II. Introductions to Higher Mathematics. Ginn & Co.,
Boston MA–New York–Toronto, ON. 1962. xii+496 pp.
[153] M. Hindry and J.H. Silverman, Diophantine Geometry: An Introduction. Graduate Texts in
Mathematics 201. Springer-Verlag, New York 2000. xiv+558 pp.
[154] L.-K. Hua, On Waring’s problem, Quart. J. Math. Oxford Ser. (2) 9 (1938), 199–202.
[155] D. Husemöller, Elliptic Curves. With an appendix by Ruth Lawrence. Graduate Texts in Math-
ematics 111. Springer-Verlag, New York 1987. xvi+350 pp.
[156] N. Jacobson, Basic Algebra I. First edition. W. H. Freeman & Co., San Francisco 1974.
xvi+472 pp. Second edition. W. H. Freeman & Company, New York 1985. xviii+499 pp.
[157] N. Jacobson, Basic Algebra II. First edition. W. H. Freeman & Co., San Francisco 1980.
xix+666 pp. Second edition. W. H. Freeman & Company, New York 1989. xviii+686 pp.
[158] G.J.O. Jameson, The Prime Number Theorem. London Math. Soc. Student Texts 53. Cambridge
University Press 2003. x+252 pp.
[159] S. Katok, Fuchsian Groups. Chicago Lectures in Mathematics. The University of Chicago
Press, Chicago IL 1992. x+175 pp.

[160] M. Kim, Geometric height inequalities and the Kodaira–Spencer map, Compos. Math. 105, No.
1 (1997), 43–54. Erratum: ibid. 121, No. 2 (2000), 219.
[161] M. Kim, D.S. Thakur, and J.F. Voloch, Diophantine approximation and deformation, Bull. Soc.
Math. France 128 (2000), 585–598.
[162] H. Koch, Number Theory: Algebraic Numbers and Functions. Translated from the German by
David Kramer. Graduate Studies in Mathematics 24. AMS, Providence RI 2000. xviii+368 pp.
[163] M. Krasner, Nombre des extensions d’un degré donné d’un corps p-adique, in Les Tendances
Géom. en Algèbre et Théorie des Nombres, 143–169, Editions du Centre National de la
Recherche Scientifique, Paris 1966.
[164] E. Landau, Vorlesungen über Zahlentheorie. Bd. II: Aus der analytischen und geometrischen
Zahlentheorie. Hirzel, Leipzig 1927. Reprinted, Chelsea Publ. Co. 1947. viii+308 pp.
[165] S. Lang, Abelian Varieties. Interscience Publishers, New York 1959. x+169 pp. Reprinted,
Springer-Verlag, New York–Berlin 1983. xii+256 pp.
[166] S. Lang, Integral points on curves, Publ. Math. IHES 6 (1960), 319–335.
[167] S. Lang, Division points on curves, Annali Mat. Pura Appl. (4) 70 (1965), 229–234.
[168] S. Lang, Introduction to Algebraic and Abelian Functions. Second edition. Graduate Texts in
Mathematics 89. Springer-Verlag, New York–Berlin 1982. ix+169 pp.
[169] S. Lang, Fundamentals of Diophantine Geometry. Springer-Verlag, New York 1983. xviii+370
pp.
[170] S. Lang, Introduction to Complex Hyperbolic Spaces. Springer-Verlag, New York 1987.
viii+271 pp.
[171] S. Lang, Number Theory III: Diophantine Geometry. Encyclopaedia of Mathematical Sciences,
Vol. 60. Springer-Verlag, Berlin 1991, xiv+296 pp.
[172] S. Lang, Algebraic Number Theory. Second edition. Graduate Texts in Mathematics 110.
Springer-Verlag, New York 1994. xiv+357 pp.
[173] S. Lang, Algebra. Revised third edition. Graduate Texts in Mathematics 211. Springer-Verlag,
New York 2002. xvi+914 pp.
[174] S. Lang and W. Cherry, Topics in Nevanlinna Theory. With an Appendix by Zhuan Ye. Lecture
Notes in Mathematics 1433. Springer-Verlag, Berlin 1990. 174 pp.
[175] S. Lang and H. Trotter, Continued fractions for some algebraic numbers, J. reine angew. Math.
255 (1972), 112–134; addendum, ibid. 267 (1974), 219–220.
[176] M. Langevin, Cas d’égalité pour le théorème de Mason et applications de la conjecture (abc),
C. R. Acad. Sci. Paris Sér. I Math. 317 (1993), 441–444.
[177] M. Langevin, Liens entre le théorème de Mason et la conjecture (abc), in Number Theory
(Ottawa ON 1996), 187–213. CRM Proc. Lecture Notes 19, AMS, Providence RI 1999.
[178] M. Laurent, Équations diophantiennes exponentielles, Invent. Math. 78 (1984), 299–327.
[179] M. Laurent, M. Mignotte, and Yu. Nesterenko, Formes linéaires en deux logarithmes et
déterminants d’interpolation, J. Number Th. 55 (1995), 285–321.
[180] D.H. Lehmer, Factorization of certain cyclotomic functions, Ann. of Math. 34 (1933), 461–479.
[181] C.G. Lekkerkerker, Geometry of Numbers. Bibliotheca Mathematica, Vol. VIII. Wolters-
Noordhoff Publishing, Groningen; North-Holland Publishing Co., Amsterdam–London 1969.
ix+510 pp.
[182] P. Liardet, Sur une conjecture de Serge Lang. (French) Journées Arithmétiques de Bordeaux
(Conf., Univ. Bordeaux, Bordeaux 1974), 187–210. Astérisque 24–25, Soc. Math. France,
Paris 1975.
[183] R. Louboutin, Sur la mesure de Mahler d’un nombre algébrique, C. R. Acad. Sci. Paris Sér. I
Math. 296 (1983), 707–708.
[184] H. Luckhardt, Herbrand-Analysen zweier Beweise des Satzes von Roth: Polynomiale An-
zahlschranken, J. Symbolic Logic 54 (1989), 234–263.
[185] K. Mahler, On the fractional parts of the powers of a rational number (II), Mathematika 4
(1957), 122–124.

[186] K. Mahler, Lectures on Diophantine Approximations. Part I: g-adic Numbers and Roth’s
Theorem. Prepared from the notes by R. P. Bambah of my lectures given at the University
of Notre Dame in the Fall of 1957. University of Notre Dame Press, Notre Dame, Ind. 1961.
xi+188 pp.
[187] K. Mahler, On some inequalities for polynomials in several variables, J. London Math. Soc. 37
(1962), 341–344.
[188] K. Mahler, An inequality for the discriminant of a polynomial, Michigan Math. J. 11 (1964),
257–262.
[189] Yu.I. Manin, Rational points of algebraic curves over function fields, Izv. Akad. Nauk SSSR Ser.
Mat. 27 (1963), 1395–1440; Amer. Math. Soc., Transl., II. Ser. 50 (1966), 189–234.
[190] Yu.I. Manin, Cubic Forms. Algebra, Geometry, Arithmetic. Translated from the Russian by
M. Hazewinkel. Second edition. North-Holland Mathematical Library, Vol. 4. North-Holland
Publishing Co., Amsterdam 1986. x+326 pp.
[191] H.B. Mann, On linear relations between roots of unity, Mathematika 12 (1965), 107–117.
[192] R.C. Mason, The hyperelliptic equation over function fields, Math. Proc. Cambridge Philos.
Soc. 93 (1983), 219–230.
[193] R.C. Mason, Diophantine Equations over Function Fields. London Math. Soc. Lecture Note
Ser. 96. Cambridge University Press 1984. x+125 pp.
[194] D. Masser, On abc and discriminants, Proc. Amer. Math. Soc. 130, No. 11 (2002), 3141–3150.
[195] D. Masser and G. Wüstholz, Fields of large transcendence degree generated by values of elliptic
functions, Invent. Math. 72 (1983), 407–464.
[196] W.S. Massey, Algebraic Topology: An Introduction. Reprint of the 1967 edition. Graduate Texts
in Mathematics 56. Springer-Verlag, New York–Heidelberg 1977. xxi+261 pp.
[197] H. Matsumura, Commutative Algebra. Second edition. Mathematics Lecture Note Series 56.
W.A. Benjamin, Inc., New York 1970. xii+262 pp. Benjamin/Cummings Publishing Co., Inc.,
Reading MA 1980. xv+313 pp.
[198] R.B. McFeat, Geometry of numbers in adele spaces, Dissertationes Math. Rozprawy Mat. 88
(1971), 49 pp.
[199] M. McQuillan, Division points on semiabelian varieties, Invent. Math. 120 (1995), 143–159.
[200] M. Mignotte, Sur l’Équation de Catalan, C. R. Acad. Sci. Paris Sér. I Math. 314 (1992), 165–
168.
[201] M. Mignotte and Y. Roy, Minorations pour l’équation de Catalan, C. R. Acad. Sci. Paris Sér. I
Math. 324 (1997), 377–380.
[202] P. Mihăilescu, A class number free criterion for Catalan’s conjecture, J. Number Th. 99 (2003),
225–231.
[203] P. Mihăilescu, Primary cyclotomic units and a proof of Catalan’s conjecture, J. reine angew.
Math. 572 (2004), 167–195.
[204] J.S. Milne, Abelian varieties, in Arithmetic Geometry, 103–150. Cornell and Silverman (eds),
papers from the conference at University of Connecticut, Storrs Conn. 1984. Springer, New
York 1986.
[205] J.S. Milne, Jacobian varieties, in Arithmetic Geometry, 167–212. Cornell and Silverman (eds),
papers from the conference at University of Connecticut, Storrs Conn. 1984. Springer, New
York 1986.
[206] L. Mirsky, An Introduction to Linear Algebra. Oxford at the Clarendon Press 1955. xi+433 pp.
[207] L.J. Mordell, On the rational solutions of the indeterminate equations of the third and fourth
degrees, Proc. Cambridge Philos. Soc. 21 (1922), 179–192.
[208] A. Moriwaki, Arithmetic height functions over finitely generated fields, Invent. Math. 140
(2000), 101–142.
[209] T. Muir, The Theory of Determinants in the Historical Order of Development. Vol. 4: The period
1880 to 1900. Macmillan & Co. Limited, St. Martin Street, London 1923. xxxi+508 pp.

[210] D. Mumford, The topology of normal singularities of an algebraic surface and a criterion for
simplicity, Publ. Math. IHES 9 (1961), 5–22.
[211] D. Mumford, A remark on Mordell’s conjecture, Amer. J. Math. 87 (1965), 1007–1016.
[212] D. Mumford, Abelian Varieties. Published for the Tata Institute of Fundamental Research Stud-
ies in Mathematics, No. 5. Oxford University Press, London 1970. viii+242 pp.
[213] D. Mumford, The Red Book of Varieties and Schemes. Second, expanded edition. Includes
the Michigan lectures (1974) on curves and their Jacobians. With contributions by Enrico Ar-
barello. Lecture Notes in Mathematics 1358. Springer-Verlag, Berlin 1999. x+306 pp.
[214] D. Mumford, Tata Lectures on Theta. I: Introduction and Motivation: Theta Functions in One
Variable. Basic Results on Theta Functions in Several Variables. II: Jacobian Theta Functions
and Differential Equations. III (with M. Nori, P. Norman). Progr. Math. 28, 43, 97. Birkhäuser
1983, 1984, 1991. xiii+235 pp., xiv+272 pp., viii+202 pp.
[215] W. Narkiewicz, Elementary and Analytic Theory of Algebraic Numbers. Second Edition. PWN–
Polish Scientific Publishers and Springer-Verlag, Warszawa 1990. xiv+746 pp.
[216] A. Néron, Problèmes arithmétiques et géométriques rattachés à la notion de rang d’une courbe
algébrique dans un corps, Bull. Soc. Math. France 80 (1952), 101–166.
[217] A. Néron, Modèles minimaux des variétés abéliennes sur les corps locaux et globaux, Publ.
Math. IHES 21 (1964), 361–482.
[218] A. Néron, Quasi-fonctions et hauteurs sur les variétés abéliennes, Ann. of Math. (2) 82 (1965),
249–331.
[219] R. Nevanlinna, Zur Theorie der meromorphen Funktionen, Acta Math. 46 (1925), 1–99.
[220] R. Nevanlinna, Über Riemannsche Flächen mit endlich vielen Windungspunkten, Acta Math. 58
(1932), 295–373.
[221] R. Nevanlinna, Eindeutige Analytische Funktionen. Zweite Auflage, Reprint. Grundlehren der
mathematischen Wissenschaften 46. Springer-Verlag, Berlin-New York 1974. x+379 pp.
[222] A. Nitaj, La conjecture abc, Enseign. Math. (2) 42, No. 1–2 (1996), 3–24.
[223] E.I. Nochka, Defect relations for meromorphic curves (Russian) Izv. Akad. Nauk Moldav. SSR
Ser. Fiz.-Tekhn. Mat. Nauk No. 1 (1982), 41–47, 79.
[224] E.I. Nochka, On the theory of meromorphic functions, Sov. Math., Dokl 27 (1983), 377–381;
transl. from Dokl. Akad. Nauk SSSR 269 (1983), 547–552.
[225] J. Noguchi, A higher dimensional analogue of Mordell’s conjecture over function fields, Math.
Ann. 258 (1981), 207–212.
[226] J. Noguchi, Nevanlinna–Cartan theory over function fields and a diophantine equation, J. reine
angew. Math. 487 (1997), 61–83. Correction: ibid. 497 (1998), 235.
[227] D.G. Northcott, An inequality in the theory of arithmetic varieties, Proc. Cambridge Philos.
Soc. 45 (1949), 502–509.
[228] D.G. Northcott, A further inequality in the theory of arithmetic varieties, Proc. Cambridge
Philos. Soc. 45 (1949), 510–518.
[229] J. Oesterlé, Nouvelles approches du “théorème” de Fermat, Séminaire Bourbaki, Exposé 694,
Vol. 1987/88, Astérisque 161/162 (1988), 165–186.
[230] A. Ogg, Elliptic curves and wild ramification, Amer. J. Math. 89 (1967), 1–21.
[231] Ch.F. Osgood, A number theoretic-differential equations approach to generalizing Nevanlinna
theory, Indian J. Math. 23 (1981), 1–15.
[232] Ch.F. Osgood, Sometimes effective Thue–Siegel–Roth–Schmidt–Nevanlinna bounds, or better,
J. Number Th. 21 (1985), 347–389.
[233] O. Perron, Die Lehre von den Kettenbrüchen. Bd. I. Elementare Kettenbrüche. 3. Aufl., Teubner
Verlagsgesellschaft, Stuttgart 1954, vi+194 pp.
[234] E. Peyre, Points de hauteur bornée, topologie adélique et mesures de Tamagawa, 22nd Journées
Arithmétiques (Lille 2001), J. Théor. Nombres Bordeaux 15, No. 1 (2003), 319–349.
[235] B. Poonen, Mordell–Lang plus Bogomolov, Invent. Math. 137 (1999), 413–425.

[236] A. Prékopa, On logarithmic concave measures and functions, Acta Sci. Math. (Szeged) 34
(1973), 335–343.
[237] M. Raynaud, Courbes sur une variété abélienne et points de torsion, Invent. Math. 71 (1983),
207–233.
[238] M. Raynaud, Sous-variétés d’une variété abélienne et points de torsion, in Arithmetic and Geometry I,
327–352. J. Coates and S. Helgason (eds), Progr. Math. 35, Birkhäuser Boston Inc. 1983.
[239] L. Rédei, Algebra. Vol. 1. International Series of Monographs in Pure and Applied Mathematics
91. Oxford: Pergamon Press 1967. xviii+823 pp.
[240] G. Rémond, Décompte dans une conjecture de Lang, Invent. Math. 142 (2000), 513–545.
[241] G. Rémond, Sur le théorème du produit, 21st Journées Arithmétiques (Rome, 2001), J. Théor.
Nombres Bordeaux 13, No. 1 (2001), 287–302.
[242] G. Rémond, Approximation diophantienne sur les variétés semi-abéliennes, Ann. Sci. Éc.
Norm. Sup. (4) 36, No. 2 (2003), 191–212.
[243] P. Ribenboim, Catalan’s Conjecture. Are 8 and 9 the Only Consecutive Powers? Academic
Press, Inc., Boston MA 1994. xvi+364 pp.
[244] P. Roquette, Analytic Theory of Elliptic Functions over Local Fields. Hamburger mathematische
Einzelschriften (N.F.), Heft 1. Vandenhoeck & Ruprecht, Göttingen 1970. 90 pp.
[245] J.B. Rosser and L. Schoenfeld, Approximate formulas for some functions of prime numbers,
Illinois J. Math. 6 (1962), 64–94.
[246] D. Roy, Approximation to real numbers by cubic algebraic integers. II, Ann. of Math. (2) 158
(2003), 1081–1087.
[247] D. Roy and J. L. Thunder, An absolute Siegel’s lemma, J. reine angew. Math. 476 (1996), 1–26.
[248] D. Roy and J. L. Thunder, Addendum and erratum to: “an absolute Siegel’s lemma”, J. reine
angew. Math. 508 (1999), 47–51.
[249] M. Ru, Nevanlinna Theory and its Relation to Diophantine Approximation. World Scientific
Publishing Co., Inc., River Edge NJ 2001. xiv+323 pp.
[250] M. Ru and P. Vojta, Schmidt’s subspace theorem with moving targets, Invent. Math. 127 (1997),
51–65.
[251] W. Rudin, Functional Analysis. McGraw-Hill Series in Higher Mathematics. McGraw-Hill
Book Co., New York–Düsseldorf–Johannesburg 1973. xiii+397 pp. Second edition. International
Series in Pure and Applied Mathematics. McGraw-Hill, Inc., New York 1991. xviii+424 pp.
[252] W. Rudin, Real and Complex Analysis. Third edition. McGraw-Hill Book Co., New York 1987.
xiv+416 pp.
[253] W. Rudin, Fourier Analysis on Groups. Reprint of the 1962 original. Wiley-Interscience, New
York 1990. ix+285 pp.
[254] R. Rumely, On Bilu’s equidistribution theorem, in Spectral Problems in Geometry and Arithmetic.
Iowa City IA 1997, 159–166. Contemp. Math. 237, AMS, Providence RI 1999.
[255] C. Runge, Ueber ganzzahlige Lösungen von Gleichungen zwischen zwei Veränderlichen, J.
reine angew. Math. 100 (1887), 425–435.
[256] T. Saito, Conductor, discriminant, and the Noether formula of arithmetic surfaces, Duke Math.
J. 57 (1988), 151–173.
[257] S.H. Schanuel, Heights in number fields, Bull. Soc. Math. France 107 (1979), 433–449.
[258] A. Schinzel, On the product of the conjugates outside the unit circle of an algebraic number,
Acta Arith. 24 (1973), 385–399. Addendum ibid. 26 (1974/75), 329–331.
[259] A. Schinzel, Polynomials with Special Regard to Reducibility. With an Appendix by Umberto
Zannier. Encyclopedia of Mathematics and its Applications 77. Cambridge University Press,
Cambridge 2000. x+558 pp.
[260] H.P. Schlickewei, S-unit equations over number fields, Invent. Math. 102 (1990), 95–107.

[261] H.P. Schlickewei, The quantitative subspace theorem for number fields, Compos. Math. 82
(1992), 245–273.
[262] H.P. Schlickewei, Equations in roots of unity, Acta Arith. 76 (1996), 99–108.
[263] W.M. Schmidt, On heights of algebraic subspaces and diophantine approximations, Ann. of
Math. (2) 85 (1967), 430–472.
[264] W.M. Schmidt, On simultaneous approximations of two algebraic numbers by rationals, Acta
Math. 119 (1967), 27–50.
[265] W.M. Schmidt, Asymptotic formulae for point lattices of bounded determinant and subspaces
of bounded height, Duke Math. J. 35 (1968), 327–339.
[266] W.M. Schmidt, Simultaneous approximation to algebraic numbers by rationals, Acta Math. 125
(1970), 189–201.
[267] W.M. Schmidt, Norm form equations, Ann. of Math. (2) 96 (1972), 526–551.
[268] W.M. Schmidt, Diophantine Approximation. Lecture Notes in Mathematics 785. Springer-
Verlag, Berlin 1980. x+299 pp.
[269] W.M. Schmidt, The subspace theorem in Diophantine approximations, Compos. Math. 69
(1989), 121–173.
[270] W.M. Schmidt, Eisenstein’s theorem on power series expansions of algebraic functions, Acta
Arith. 56 (1990), 161–179.
[271] W.M. Schmidt, Diophantine Approximations and Diophantine Equations. Lecture Notes in
Mathematics 1467. Springer-Verlag, Berlin 1991. viii+217 pp.
[272] W.M. Schmidt, Heights of algebraic points lying on curves or hypersurfaces, Proc. Amer. Math.
Soc. 124, No. 10 (1996), 3003–3013.
[273] W.M. Schmidt, Heights of points on subvarieties of $\mathbb{G}_m^n$, in Number Theory (Paris, 1993–1994),
157–187. London Math. Soc. Lecture Note Ser. 235. Cambridge University Press, Cambridge 1996.
[274] H. Seifert and W. Threlfall, Lehrbuch der Topologie. B.G. Teubner VII, Leipzig und Berlin
1934. Also Herbert Seifert and William Threlfall, Seifert and Threlfall: A Textbook of Topology.
Translated from the German edition of 1934 by M.A. Goldman. With a preface by J.S. Birman.
With Topology of 3-dimensional Fibered Spaces by Seifert. Translated from the German by
Wolfgang Heil. Pure and Applied Mathematics 89. Academic Press, Inc., New York–London
1980. xvi+437 pp.
[275] J.-P. Serre, Géométrie algébrique et géométrie analytique, Ann. Inst. Fourier 6 (1956), 1–42.
[276] J.-P. Serre, Corps Locaux. Deuxième édition. Publications de l’Université de Nancago, No.
VIII. Hermann, Paris 1968. 245 pp.
[277] J.-P. Serre, Lectures on the Mordell–Weil Theorem. Translated from the French and edited
by Martin Brown from notes by Michel Waldschmidt. Aspects of Mathematics, E15. Friedr.
Vieweg & Sohn, Braunschweig 1989. x+218 pp.
[278] J.-P. Serre, Galois Cohomology. Translated from French by P. Ion. Springer-Verlag, Berlin
1997. x+210 pp.
[279] I.R. Shafarevich, Basic Algebraic Geometry. 1. Varieties in Projective Space. Second edition.
Translated from the 1988 Russian edition and with notes by Miles Reid. Springer-Verlag, Berlin
1994. xx+303 pp.
[280] I.R. Shafarevich, Basic Algebraic Geometry. 2. Schemes and Complex Manifolds. Second edition.
Translated from the 1988 Russian edition by Miles Reid. Springer-Verlag, Berlin 1994.
xiv+269 pp.
[281] T. Shimizu, On the theory of meromorphic functions, Japanese Journ. of Math. 6 (1929), 119–
171.
[282] X, The integer solutions of the equation $y^2 = ax^n + bx^{n-1} + \dots + k$, J. London Math. Soc.
1 (1926), 66–68. Also Gesammelte Abhandlungen, Bd. I, Springer-Verlag, Berlin-Heidelberg-
New York 1966, 207–208.

[283] C.L. Siegel, Über einige Anwendungen diophantischer Approximationen, Abh. Preuß. Akad.
Wissen. Phys.-math. Klasse 1929, Nr. 1. Also Gesammelte Abhandlungen, Bd. I, Springer-
Verlag, Berlin–Heidelberg–New York 1966, 209–274.
[284] J.H. Silverman, The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics 106.
Springer-Verlag, New York 1986. xii+400 pp.
[285] J.H. Silverman, Rational points on K3 surfaces: A new canonical height, Invent. Math. 105
(1991), 347–373.
[286] J.H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in Mathematics
151. Springer-Verlag, New York 1994. xiv+525 pp.
[287] C.J. Smyth, On the product of the conjugates outside the unit circle of an algebraic integer,
Bull. London Math. Soc. 3 (1971), 169–175.
[288] C.J. Smyth, On the measure of totally real algebraic numbers, I, J. Austral. Math. Soc. Ser. A
30 (1980/81), 137–149; II, Math. Comp. 37 (1981), 205–208.
[289] V.G. Sprindžuk, Arithmetic specializations in polynomials, J. reine angew. Math. 340 (1983),
26–52.
[290] N. Steinmetz, Eine Verallgemeinerung des zweiten Nevanlinnaschen Hauptsatzes, J. reine
angew. Math. 368 (1986), 134–141.
[291] C.L. Stewart and R. Tijdeman, On the Oesterlé–Masser conjecture, Monatsh. Math. 102 (1986),
251–257.
[292] W. Stoll, Value Distribution on Parabolic Spaces. Lecture Notes in Mathematics 600. Springer-
Verlag, Berlin-New York 1977. viii+216 pp.
[293] W. W. Stothers, Polynomial identities and Hauptmoduln, Quart. J. Math. Oxford Ser. (2) 32
(1981), 349–370.
[294] T. Struppeck and J.D. Vaaler, Inequalities for heights of algebraic subspaces and the Thue–
Siegel principle, in Analytic Number Theory (Allerton Park IL 1989), 493–528. Progr. Math.
85, Birkhäuser Boston, Boston MA 1990.
[295] L. Szpiro (ed.), Séminaire sur les pinceaux arithmétiques: la conjecture de Mordell. Papers
from the seminar held at the École Normale Supérieure, Paris 1983–84. Astérisque 127. Soc.
Math. France, Paris 1985. x+287 pp.
[296] L. Szpiro, E. Ullmo, and S. Zhang, Equirépartition des petits points, Invent. Math. 127 (1997),
337–347.
[297] R. Taylor and A. Wiles, Ring-theoretic properties of certain Hecke algebras, Ann. of Math. (2)
141 (1995), 553–572.
[298] A. Thue, Über Annäherungswerte algebraischer Zahlen, J. reine angew. Math. 135 (1909), 284–
305. Also Selected Mathematical Papers of Axel Thue. With an introduction by Carl Ludwig
Siegel. Universitetsforlaget, Oslo-Bergen-Tromsø 1977, 232–253.
[299] J.L. Thunder, Asymptotic estimates for rational points of bounded height on flag varieties,
Compos. Math. 88, No. 2 (1993), 155–186.
[300] R. Tijdeman, On the equation of Catalan, Acta Arith. 29 (1976), 197–209.
[301] E. Ullmo, Positivité et discrétion des points algébriques des courbes, Ann. of Math. (2) 147
(1998), 167–179.
[302] J.D. Vaaler, A geometric inequality with applications to linear forms, Pacific J. Math. 83 (1979),
543–553.
[303] M. van der Put, The product theorem, in Diophantine Approximation and Abelian Varieties.
Introductory Lectures, 77–82. Papers from the Conference held in Soesterberg, April 12–16,
1992. B. Edixhoven and J.-H. Evertse (eds), Lecture Notes in Mathematics 1566. Springer-
Verlag, Berlin 1993.
[304] M. van Frankenhuysen, A lower bound in the abc conjecture, J. Number Th. 82 (2000), 91–95.
[305] M. van Frankenhuysen, The ABC conjecture implies Vojta’s height inequality for curves, J.
Number Th. 95 (2002), 289–302.

[306] P. Vojta, A Diophantine conjecture over Q, in Séminaire de théorie des nombres, Paris 1984–
85, 241–250. Progr. Math. 63, Birkhäuser Boston, Boston MA 1986.
[307] P. Vojta, Diophantine Approximations and Value Distribution Theory. Lecture Notes in Mathematics
1239. Springer-Verlag, Berlin 1987. x+132 pp.
[308] P. Vojta, Mordell’s conjecture over function fields, Invent. Math. 98 (1989), 115–138.
[309] P. Vojta, A refinement of Schmidt’s subspace theorem, Amer. J. Math. 111, No. 3 (1989), 489–
518.
[310] P. Vojta, Siegel’s theorem in the compact case, Ann. of Math. (2) 133 (1991), 509–548.
[311] P. Vojta, On algebraic points on curves, Compos. Math. 78, No. 1 (1991), 29–36.
[312] P. Vojta, A generalization of theorems of Faltings and Thue–Siegel–Roth–Wirsing, J. Amer.
Math. Soc. 5 (1992), 763–804.
[313] P. Vojta, Applications of arithmetic algebraic geometry to Diophantine approximations, in
Arithmetic Algebraic Geometry, 164–208. E. Ballico (ed.), lectures from the Second C.I.M.E.
Session held in Trento, 1991. Lecture Notes in Mathematics 1553. Springer-Verlag, Berlin
1993.
[314] P. Vojta, Integral points on subvarieties of semiabelian varieties, I, Invent. Math. 126, No. 1
(1996), 133–181.
[315] P. Vojta, Roth’s theorem with moving targets, Internat. Math. Res. Notices (1996), 109–114.
[316] P. Vojta, On Cartan’s theorem and Cartan’s conjecture, Amer. J. Math. 119 (1997), 1–17.
[317] P. Vojta, A more general abc conjecture, Int. Math. Res. Notices (1998), 1103–1116.
[318] P. Vojta, Nevanlinna theory and diophantine approximation, in Several Complex Variables,
535–564. M. Schneider (ed.) et al., Berkeley CA 1995–96, MSRI Publ. 37, Cambridge University
Press, Cambridge 1999.
[319] J.F. Voloch, Diagonal equations over function fields, Bol. Soc. Brasil. Mat. 16 (1985), 29–39.
[320] J.T.-Y. Wang, The truncated second main theorem of function fields, J. Number Th. 58 (1996),
139–157.
[321] F. Warner, Foundations of Differentiable Manifolds and Lie Groups. Corrected reprint of the
1971 edition. Graduate Texts in Mathematics 94, Springer-Verlag, New York 1983. ix+272 pp.
[322] L.C. Washington, Introduction to Cyclotomic Fields. Second edition. Graduate Texts in Mathematics
83. Springer-Verlag, New York 1997. xiv+487 pp.
[323] W.C. Waterhouse, Introduction to Affine Group Schemes. Graduate Texts in Mathematics 66.
Springer-Verlag, New York–Berlin 1979. xi+164 pp.
[324] A. Weil, L’arithmétique sur les courbes algébriques, Acta Math. 52 (1929), 281–315. Also
Œuvres Scientifiques – Collected Papers. Vol. I, Corrected Second Printing, Springer-Verlag,
New York–Heidelberg–Berlin 1980, 11–45.
[325] A. Weil, Sur un théorème de Mordell, Bull. Sc. Math. (2) 54 (1929), 182–191. Also Œuvres
Scientifiques–Collected Papers. Vol. I, Corrected Second Printing, Springer-Verlag, New York–
Heidelberg–Berlin 1980, 47–56.
[326] A. Weil, Arithmétique et Géométrie sur les Variétés Algébriques. Actualités Scientifiques et
Industrielles, No. 206. Hermann, Paris 1935. 3–16. Also Œuvres Scientifiques–Collected Papers.
Vol. I, Corrected Second Printing, Springer-Verlag, New York–Heidelberg–Berlin 1980,
87–100.
[327] A. Weil, Variétés Abéliennes et Courbes Algébriques. Hermann, Paris 1948. 163 pp.
[328] A. Weil, Arithmetic on algebraic varieties, Ann. of Math. (2) 53 (1951), 412–444. Also Œuvres
Scientifiques–Collected Papers. Vol. I, Corrected Second Printing, Springer-Verlag, New York–
Heidelberg–Berlin 1980, 454–486.
[329] R. Weissauer, Der Hilbertsche Irreduzibilitätssatz, J. reine angew. Math. 333 (1982), 203–220.
[330] A. Weitsman, A theorem on Nevanlinna deficiencies, Acta Math. 128 (1972), 41–52.
[331] A. Wiles, Modular elliptic curves and Fermat’s Last Theorem, Ann. of Math. (2) 141 (1995),
443–551.

[332] P.M. Wong, On the second main theorem of Nevanlinna theory, Amer. J. Math. 111 (1989),
549–583.
[333] K. Yamanoi, The second main theorem for small functions and related problems, Acta Math.
192 (2004), 225–294.
[334] Z. Ye, A sharp form of Nevanlinna’s second main theorem of several complex variables, Math.
Z. 222 (1996), 81–95.
[335] K. Yu, p-adic logarithmic forms and group varieties. I, J. reine angew. Math. 502 (1998), 29–
92; II, Acta Arithmetica 89 (1999), 337–378.
[336] D. Zagier, Algebraic numbers close to both 0 and 1, Math. Comp. 61 (1993), 485–491.
[337] U. Zannier, Some Applications of Diophantine Approximation to Diophantine Equations, with
Special Emphasis on the Schmidt Subspace Theorem. Forum, Editrice Universitaria Udinese,
Udine 2003. 69 pp.
[338] O. Zariski, P. Samuel, Commutative Algebra. Vol. II. Reprint of the 1960 edition. Graduate
Texts in Mathematics 29. Springer-Verlag, New York–Heidelberg–Berlin 1975. x+414 pp.
[339] S. Zhang, Positive line bundles on arithmetic surfaces, Ann. of Math. (2) 136 (1992), 569–587.
[340] S. Zhang, Positive line bundles on arithmetic varieties, J. Amer. Math. Soc. 8 (1995), 187–221.
[341] S. Zhang, Small points and adelic metrics, J. Alg. Geom. 4 (1995), 281–300.
[342] S. Zhang, Equidistribution of small points on abelian varieties, Ann. of Math. 147 (1998), 159–
165.
[343] A. Zygmund, Trigonometric Series. Vols I, II. Third edition. With a foreword by Robert A.
Fefferman. Cambridge Mathematical Library. Cambridge University Press, Cambridge 2002.
xii; Vol. I: xiv+383 pp.; Vol. II: viii+364 pp.
Glossary of Notation

A⊂B A is a subset of B (possibly equal), xv


B\A complement of A in B, xv
|A| number of elements of A, xv
id identity map, xv
N, Z natural numbers (0 included), rational integers, xv
Q, R, C rational, real and complex numbers, xv
R+ non-negative real numbers, xv
δij Kronecker symbol, xv
RX real functions on X, xv
f = O(g), f = o(g) Landau symbols, xv
f ≪ g, g ≫ f Vinogradov symbols, xv
⌊x⌋ max{m ∈ Z | m ≤ x}, xv
⌈x⌉ min{m ∈ Z | m ≥ x}, xv
GCD(a, b) greatest common divisor of a, b ∈ Z, xvi
a|b a divides b in Z, xvi
π(x) number of primes up to x ∈ R+ , xvi
R× multiplicative units of a commutative ring R with 1, xvi
V∗ dual of a vector space V , xvi
[g1 , . . . , gm ] ideal generated by g1 , . . . , gm , 83
char(K) characteristic of a field K, xvi
Fq field with q elements, xvi
K[x] polynomials in variable x and coefficients in K, xvi
deg(α) degree of algebraic number α, xvi
x vector with entries xi , xvi
$\overline{K}$ algebraic closure of the field K, xvi
| |, | |v absolute value (with respect to place v), 1, 2
w|v place w is an extension of place v, 2
Kv completion of K at place v, 2
Qp , Zp , | |p p-adic numbers, integers and absolute value, 3
NL/K norm from L to K, 3
Rv , k(v) valuation ring and residue field for place v, 3
x reduction of x to the residue field, 3
fw/v , ew/v residue degree and ramification index, 4
OK algebraic integers of the number field K, 4
Gal(L/K) Galois group of field extension L/K, 6
$\|x\|_w$, $|x|_w$ normalizations for w|v on finite extension L/K, 6, 9
MK normalized absolute values of a number field K, 11


| |Z absolute value on function field in prime divisor Z, 12


MX standard absolute values on function field of X, 12
h(P ) absolute logarithmic height of P ∈ PnK , 16
H(P ) $e^{h(P)}$, multiplicative height of P ∈ PnK , 16
log+ t max(0, log t), 16
h(α) height of algebraic number α, 16
OS,K , US,K S-integers and S-units of number field, 17
$\|x\|_{w,K}$ $|N_{L_w/K_v}(x)|_v$, 20
h(f ) height of a polynomial f , 21
|f |v Gauss norm of f with respect to place v, 22
M (f ), M (α) Mahler measure of f and algebraic number α, 22, 28
T the unit circle in C, 22

$\ell_p(f)$ $\ell_p$-norm of the coefficients of the polynomial f , 24
$[f]_p$ $\ell_p$-norm of the hypercube representation, 31
sh(d, e) shuffle of type (d, e), 31
D = (sD ; L, s; M, t) presentation of Cartier divisor D, 35
λD (P ) local height of P relative to D, 35
λf (P ) local height relative to the rational function f , 36
λD (P, v), λD (P, u) local heights relative to v ∈ MF and u ∈ MK , 39
hλ (P ) global height of P relative to local height λ, 40
hc class of height functions relative to c ∈ Pic(X), 41
hϕ Weil height relative to morphism ϕ, 43
ϕ#ψ join of morphisms ϕ, ψ to projective spaces, 43
d(p), h(p) degree and height of presentation p, 47
$|\alpha|$ $\alpha_0 + \cdots + \alpha_N$ for $\alpha \in \mathbb{N}^{N+1}$, 49
$x^{\alpha}$ $x_0^{\alpha_0} \cdots x_N^{\alpha_N}$ for $x \in R^{N+1}$, 49
$\overline{L} = (L, \|\ \|)$ metrized line bundle, 58
$\hat{D}$, $O(\hat{D})$ Néron divisor and its metrized line bundle, 61, 62
$\lambda_{\hat{D}}(P)$ local height of P relative to $\hat{D}$, 61
λD (P, u), λD (P, v) local heights relative to v ∈ MF and u ∈ MK , 62, 64
hL (P ) global height of P relative to L, 64
Hu (x), Hu (A) Arakelov norm for vector x and matrix A, 66, 67
hAr , HAr Arakelov height, multiplicative version, 66, 67, 69
∧m W m th exterior power of vector space W , 67
At transpose of the matrix A, 69
Hu∗ (A) dual local height, 69
ηu (ψ) distortion factor of ψ ∈ PGL(n + 1, Qu ), 69
δu (x, y) projective distance, 69
$H_{\mathrm{Ar}}^{\mathrm{row}}(A)$ multiplicative Arakelov height with respect to rows, 75
H(A) multiplicative height of matrix A, 75
IM M × M unit matrix, 78
G := (Gnm , ·, 1n ) multiplicative algebraic group AnK \ {0}, 82
ej standard basis of Zn , 83
ϕA (x) (xAe1 , . . . , xAen ) for integer matrix A, 83
GL(n, Z), SL(n, Z) general and special linear group over Z, 83
$\tilde{\Lambda}$ division group of subgroup Λ in Zn , 83
ρ(Λ) $[\tilde{\Lambda} : \Lambda]$, 83



HΛ {x ∈ G | xλ = 1 ∀λ ∈ Λ}, 83
MΛ connected component of HΛ , 83
δ(X) essential degree of subvariety X of G, 88
X∗ complement of all torsion cosets in X, 90
X◦ complement of all torus cosets in X, 90
X(H) union of gH ⊂ X for given linear torus H, 91

h(x) standard height of x ∈ Gnm , 94
εv [Kv : R]/[K : Q] for v|∞ and 0 else, 96
δξ average of Dirac measures over conjugates of ξ ∈ Q, 101
Cc (X) continuous compactly supported functions on X, 101
Cc (X)∗ weak-* dual of Cc (X), 102
(N) Northcott property, 117
A(T ) algebraic numbers in A with height ≤ T , 117
K (d) compositum of degree d extensions of K, 117
$K_{\mathrm{ab}}^{(d)}$ maximal abelian subfield of $K^{(d)}$, 117
(B) Bogomolov property, 120
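Vp (α; K) normalized variance of α in number field K, 122
$\mathrm{rank}_{\mathbb{Q}}(\Gamma)$ rank of abelian group Γ, 126
$\binom{m}{\mu}$ $\prod_{j=1}^{m} \binom{m_j}{\mu_j}$, 156
$\partial_{\mu}$ $\frac{1}{\mu_1! \cdots \mu_m!} \left(\frac{\partial}{\partial x_1}\right)^{\mu_1} \cdots \left(\frac{\partial}{\partial x_m}\right)^{\mu_m}$, 156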
ind(P ; d; α) index of polynomial P at α, 156, 226
$V_m(t)$ volume of standard simplex $\sum x_i \le t$ in $\mathbb{R}^m$, 157
$V_m(\mathbf{t})$ $\prod_{i=1}^{n} V_m(t_i)$, 157
$\Lambda(\beta)$ $\prod_{v \in S} \min(1, |\beta - \alpha_v|_{v,K})$, 164
C(λ; N ) approximation class of size 1/N and corner λ, 164
|x|v maxi |xi |v , 177
i! i0 ! · · · in !, 197
ω∗ star-operator on exterior algebra ∧k V , 198
haff (x) affine height of x ∈ AnK , 16
Πv (Q), Π(Q) (v-adic) approximation domain, 201
V (Q), rank(Π(Q)) vector space spanned by K n+1 ∩ Π(Q), dimension, 203
P(d) multihomogeneous polynomials of multidegree d, 204
∂I ∂i1 · · · ∂im for I = (i1 , . . . , im ), 204
$V_i$, $(V/d)$ $(v_{1i}, \dots, v_{mi})$, $\frac{|v_1|}{d_1} + \cdots + \frac{|v_m|}{d_m}$, 204
ε identity element of a group variety G, 232
m, ι multiplication and inverse of G as morphisms, 232
SL(n)K special linear group as group variety, 233
τa translation by a on abelian group A, 235
[n], A[n] multiplication map by n on A and its kernel, 235
cy restriction of c ∈ Pic(X) to fibre Xy , 246
idX identity map of X, 248
Pic0 (X) Picard variety of X, 249
$\hat{\varphi}$, $\hat{A}$ dual map and dual abelian variety, 251, 255
$\varphi_c(a)$ $\tau_a^*(c) - c \in \hat{A}$ for a ∈ A and c ∈ Pic(A), 252

Mor(X, A) morphisms from X to A, 260


θ(v, Z) Riemann theta function for Λ = Zg + Z · Zg , 262
c+ , c− even and odd part of c ∈ Pic(A), 265
Hom(A1 , A2 ) homomorphisms of abelian varieties, 268
Sr , C (r) symmetric group with r elements, C r /Sr , 270
j canonical embedding of C into Jacobian J, 272
Θ theta divisor, 272
L(D), L(D) L ⊗ O(D), L ⊗ O(D), 272
jr canonical maps from C r to J, 274
Θ− , θ, θ − [−1]∗ Θ, classes of Θ, Θ− , 275

$\hat{h}_c$ Néron–Tate height associated to c, 287
⟨a, a′⟩, |a| canonical form of J and associated norm, 294
hD height function associated to cl(O(D)), 294
nΓ (H) |{Q ∈ C(K) ∩ j −1 (Γ) | Q large, |j(Q)| ≤ H}|, 298
d(‖ ‖, ‖ ‖′) distance of locally bounded metrics, 301
(L, ν), (‖ ‖ν,u )u∈M rigidified line bundle with canonical metrics, 303, 308
[P ] cycle in Z0 (XK ) associated to P ∈ X, 304
f (Z) $\prod_j f(P_j)^{n_j}$ for $Z = \sum_j n_j [P_j]$, 304
B0 (X) kernel of the degree map on Z0 (XK ), 304
(D, Z)u Néron symbol, 304, 308, 310
B 1 (X) divisors algebraically equivalent to zero, 308
E(P ), t E(P  ) (p2 )∗ (E.({P } × X  )), (p1 )∗ (E.(X × {P  })), 309
$\varepsilon_w$ $\frac{[F_w : \mathbb{Q}_p]}{[F : \mathbb{Q}]}$ for w ∈ MF , 315
$d_{P/Q}$, $d^{u}_{P/Q}$ local discriminant (in the place u), 336, 338
dP /Q global discriminant in the number field case, 339
$\frac{1}{m} A(K)$ m-division group of A(K) in $A(\overline{K})$, 341
K(S) smallest extension L/K with S ⊂ A(L), 341
xg Galois action of g ∈ Gal(L/K) on x ∈ X(L), 341
Ks separable algebraic closure of K, 342
⟨a, g⟩ Kummer pairing, 342, 344
µm (K) m th roots of unity in K, 344
K ×m , H ×m m th powers, for subgroup H of K × , 344
H i (G, M ) Galois cohomology group, 345
K ab maximal abelian extension of exponent dividing m, 346
$K_v^{\mathrm{nr}}$, $K_M^{\mathrm{nr}}$ max. subextension of $K^{\mathrm{ab}}$ unramified over v, M , 346, 346
$H_M$ $(K_M^{\mathrm{nr}})^{\times m} \cap K^{\times}$ for a set of valuations M , 346
$\sum n_v [v] \in \mathrm{Div}_M(K)$ M-divisors, 346
divM (f ) M-divisor associated to f ∈ K × , 346
ClM (K), UM M-class group, M-units, 346
∆′ ∆ − {P0 } × C − C × {P0 }, 356
V Vojta divisor for parameters d1 , d2 , d, 357
ψ closed embedding of C × C into PnK × PnK , 357
O(d), O(δ1 , δ2 ) OPm (d), OPn ×Pn (δ1 , δ2 ), 357
ϕB closed embedding of C × C into Pm K , 358
(V1), (V2), (V3) conditions for the Vojta divisor, 358, 378

K[[x]] ring of formal power series, 362
 aα xα r |aα |rα , 362
K r−1 x {f ∈ K[[x]] | f r < ∞}, 362
ρr (f ), ρ(f ) spectral norm, for r = (1, . . . , 1), 362, 364
K{r−1 x} strictly convergent power series, 364
∂α f (x0 ) Taylor coefficients at x0 , 366
$\dot{f}$ $\frac{\partial}{\partial t} f$, 368
$\partial_i$ $\frac{1}{i!} \partial^i$, 371
(i∗1 , i∗2 ) admissible pair in (P, Q) ∈ C × C, 372
NL (X, T ) number of points in X(K) with HL (P ) ≤ T , 392
RK , wK , hK regulator, number of roots of unity, class number, 392
ζK zeta function of number field K, 392
NAr (X, T ) NO(1) (X, T ) for the Arakelov height, 393
Nlines (X, T ) contributions of points on rational lines, 395
rad(N ) $\prod_{p | N} p$, radical of N ∈ N, 402
π1 (X, x0 ) fundamental group of X, 411
Γ\Y quotient of Y by left action of Γ, 411
H upper half plane in C, 413

rad(f ) $\prod_{f(\alpha)=0} (x - \alpha)$, radical of f ∈ K[x], 416
$\vartheta_P$ $\sum_{p \in P} \log p$, 420
Ψ(N, P), Ψodd (N, P) number of (odd) n ≤ N with all prime factors in P, 420
E reduction of elliptic curve E, 426
cond(E) conductor of elliptic curve E, 429
ex/π(x) ramification index of x over π(x), 436
D open unit disc in C, 437
F +, F − max{F, 0}, − min{F, 0}, 445
n(r, a, f ) enumerating function for r > 0, a ∈ P1an , 446
N (r, a, f ), m(r, a, f ) counting and proximity function, 446, 447
c(f, 0) leading coefficient of the Laurent series at 0, 446
T (r, f ) Nevanlinna’s characteristic function of f , 448
ρ(f ) order of a meromorphic function f , 450
W (f0 , . . . , fn ) Wronskian of entire functions f0 , . . . , fn , 450, 470
Nram (r, f ) counting function of ramification of f , 450
Nram (r, a, f ) counting function of ramification over a, 451
δ(a, f ), θ(a, f ) defect and ramification defect of f , 452
N (1) (r, a, f ), cond(r, f ) truncated counting function and conductor, 455
k(z1 , z2 ) chordal distance on the Riemann sphere, 457
ω Fubini–Study form on projective space, 457, 467

m(r, a, f ) spherical proximity function, 458

T (r, f ) Ahlfors–Shimizu characteristic, 458
ordY (D) multiplicity of divisor D in component Y , 466
Nf,D (r), mf,D counting function and proximity function, 466, 476
Tf,D (r), Tf,L (r) characteristic function for holomorphic curve f , 466, 469, 476
Nf,ram counting function for ramification of f , 470
Rf ramification divisor of holomorphic map f , 473

$N^{(1)}_{f,D}(r)$ truncated counting function, 475
mS (a, β), NS (a, β) proximity and counting function for a ∈ K, 481
NS,D , mS,D counting and proximity function for divisor D, 483, 499
dK , d(P ) absolute logarithmic discriminant, 487, 499
condK λ (P ) conductor of P , 489, 499
$M_K^{\mathrm{fin}}$ discrete valuations of number field K, 489
χ characteristic function for (0, ∞), 489
NS,ram counting of ramification in function field case, 505
$N_D^{(n)}(P)$ higher truncated counting function, 508
$\mathbb{A}^n_K$, $x_1, \dots, x_n$ affine n-space over K, fixed coordinates, 514
Z(T ) zero set of T ⊂ K[x] in AnK , 514
I(Y ) ideal of vanishing of Y ⊂ AnK , 515
K[X] coordinate ring of affine variety X, 515
$\sqrt{J}$ radical of ideal J, 515
ϕ composition with morphism ϕ, 515, 521, 574
OX,x , mx local ring of x ∈ X and maximal ideal, 516, 522
mx {f ∈ K[X] | f (x) = 0}, 516
X(L) L-rational points of X, 516, 522
K(x) residue field of OX,x , 516, 522
Spec(A) Spectrum of ring A, 516
XF base change of X to extension F/K, 517, 522
dim(T ) dimension of topological space T , 518
dim(A) Krull dimension of commutative ring A, 518
GL(n)K , GL(n, L) invertible matrices as K-variety, with entries in L, 519
$\rho^U_V$ restriction map from U to V for presheaf, 520
OX sheaf of regular functions on variety X, 521
JY ideal sheaf, 522
codim(B, X) codimension of B in X, 523
K(X) function field of X, 524
πE : E → X vector bundle E over X, 525
gαβ transition function of E, 525
Ex fibre of E over x, 526
E ⊕ E, E ⊗ E direct sum and tensor product of vector bundles, 527, 528
Γ(U, E) space of sections of E over U , 527
E sheaf of sections of vector bundle E, 527
OX trivial bundle of rank 1 over X, 528
E ∗ , E ∧ E  , Hom(E, E  ) dual, wedge product and homomorphisms, 528
E/F quotient vector bundle by subbundle F , 529
ϕ∗ (E), E|X  pull-back and restriction of E, 529
L⊗n , L−1 tensor power and dual of line bundle L, 529
c = cl(L) ∈ Pic(X) Picard group and class of L, 529
ϕ∗ (s) pull-back of section, 530
PnK , (x0 : · · · : xn ) projective space, fixed homogeneous coordinates, 530
I(Y ) homogeneous ideal of projective variety Y , 531
S(Y ) = K[x]/I(Y ) homogeneous coordinate ring of Y , 531
OPnK (m) tensor powers of the tautological line bundle, 532

PK , OP (d1 , . . . , dr ) multiprojective space with line bundle, 534


X1 ×S X2 fibre product of varieties X1 , X2 over S, 535
DerK (A, M ) K-derivations of A into M , 536
TX,x tangent space of variety X in x, 536
dϕ differential of morphism or function, 537, 543
dimx (X) dimension in x ∈ X, 538
∂|x evaluation of vector field ∂ at x, 540
TX , TX∗ tangent and cotangent bundle of X, 542, 542
ΩkX , ϕ∗ sheaf of k-forms with pull-back, 542, 543
KX , NY /X canonical line bundle, normal bundle, 542, 543
Ω1X/ Y sheaf of relative differentials, 543
D ≥ D partial order on Weil divisors, 544
supp(D) support of divisor, 544, 550
[x] divisor of x in a curve, 544
ordY (f ) order of f ∈ K(X) in Y , 545, 546
div(f ) Weil divisor associated to f , 545, 547, 553
OX,Y , mY local ring of X in prime divisor Y , 546
D ∼ D rational equivalence of Weil divisors, 547
div(s) Weil divisor of meromorphic section s, 547, 554
ss , s−1 , s/s multiplication, division of meromorphic sections, 548
O(D), sD line bundle and section for Cartier divisor D, 548
D(s) Cartier divisor of meromorphic section s, 548
cyc(D) Weil divisor associated to D, 549
|D| complete linear system, 549
ϕ∗ (D) pull-back of divisor D, 551, 555
Z∗ (X) group of cycles on X, graded by dimension, 551

$\ell_{\Lambda}(M)$ length of Λ-module M , 552


$Z \sim Z'$, $\mathrm{CH}_*(X)$ rational equivalence of cycles, Chow group, 553
$\varphi_*(Z)$ push-forward of cycle $Z$, 553
$c_1(L).Z$ first Chern class operation of line bundle $L$, 554
$\mathrm{cyc}(X)$ cycle of scheme $X$, 555
$Z_F$ base change of cycle $Z$ to extension $F/K$, 555
$D.Z$ intersection product of divisor $D$ with $Z$, 555, 556
$\deg(Z)$, $\deg_L(Z)$ degree of $Z$ (with respect to $L$), 557, 558, 561
$D_1 \cdots D_d$ intersection number of divisors, 557
$\mathrm{Pic}^0(X)$ classes of line bundles algebraically equivalent to 0, 562
$\ker(\varphi)$, $\mathrm{im}(\varphi) = \varphi(\mathcal{F})$ kernel and image of sheaf homomorphism, 564
$\mathcal{F}/\varphi(\mathcal{F}')$, $\mathrm{coker}(\varphi)$ quotient sheaf, cokernel of $\varphi$, 564
$C^*(\mathcal{U}, \mathcal{F})$, $\check{H}^*(\mathcal{U}, \mathcal{F})$ Čech complex and cohomology, 564, 565
$\mathcal{O}_X(D)$ sheaf of sections of $\mathcal{O}_X(D)$, 567
$\mathrm{Hom}_{\mathcal{O}_X}(\mathcal{E}, \mathcal{F})$ sheaf of $\mathcal{O}_X$-module homomorphisms, 568
$\mathcal{E} \otimes_{\mathcal{O}_X} \mathcal{F}$ tensor product of $\mathcal{O}_X$-modules, 568
$\varphi_*(\mathcal{F})$ direct image of $\mathcal{O}_X$-module $\mathcal{F}$, 568
$\varphi^*\mathcal{F}'$, $\mathcal{F}'|_X$ pull-back and restriction of $\mathcal{O}_X$-module, 568
$H^p(X, \mathcal{F})$ cohomology group of coherent $\mathcal{O}_X$-module, 569
$\omega_X$ sheaf of sections of canonical line bundle, 571

$\chi(\mathcal{F})$ Euler characteristic of $\mathcal{F}$, 571
$\mathcal{F}(n)$ $\mathcal{F} \otimes L^{\otimes n}$ for $L$ very ample, 571
$\mathrm{supp}(\mathcal{F})$ support of $\mathcal{O}_X$-module $\mathcal{F}$, 571
$\varphi : X \dashrightarrow X'$ rational map, 574
$X_y$ fibre of $\varphi : X \to Y$ over $y \in Y$, 577
$\deg(\varphi)$ degree of equidimensional morphism, 578
$g(C)$ genus of curve $C$, 582
$\deg(L)$ degree of a line bundle on $C$, 582
$\mathrm{div}(K_X)$, $p_a(X)$ canonical divisor, arithmetic genus, 583
$X_{\mathrm{an}}$ complex analytic variety associated to $X$, 584
$D_{A/R}(e_1, \dots, e_n)$, $d_{A/R}$ discriminant of free $R$-algebra $A$, 586
$\mathrm{Tr}_{A/R}$, $N_{A/R}$ trace and norm of $A$ over $R$, 586
$D_f$, $\mathrm{res}(f)$ discriminant and resultant of polynomial $f$, 586, 588
$\bar{R}_L$ integral closure of Dedekind domain $R$ in $L$, 589
$d_{L/K} = d_{\bar{R}_L/R}$ discriminant of $\bar{R}_L$ for $K = \mathrm{Quot}(R)$, 589
$D_{L/K}$ principal generator of $d_{L/K}$, 589
$D_{\bar{R}_L/R} = D_{L/K}$ different of $\bar{R}_L$ for $K = \mathrm{Quot}(R)$, 590
$K_v^{\mathrm{nr}}$ maximal unramified subextension, 594
$v_{\mathfrak{p}}$ place induced by maximal ideal $\mathfrak{p}$ of $R$, 594
$R_p$ ramification divisor of morphism $p$, 599
$D_{\mathrm{red}}$ reduced divisor, 600
$K_{\mathbb{A}}$ adele ring of number field $K$, 604
$\beta$, $\beta_v$ normalized Haar measures on $K_{\mathbb{A}}$ and $K_v$, 605
$E$, $E_v$, $E_\infty$, $E_{\mathbb{A}}$ euclidean spaces $K^N$, $K_v^N$, $\prod_{v|\infty} K_v^N$ and $K_{\mathbb{A}}^N$, 608
$\lambda_n$ $n$th successive minimum, 611
Index

abc-conjecture
  Baker’s explicit version, 404
  implies Catalan’s conjecture, 403
  implies Fermat’s last theorem, 403
  implies Roth’s theorem, 406
  K-rational, 497
  strong form, 402
    over number field, 494
  weak form, 403
abc-ratio, 423, 435
abc-theorem
  for meromorphic functions, 456, 477
  for polynomials, 416, 504
Abel’s theorem, 271
Abelian extension, 113, 344
Abelian subvariety, 233
Abelian variety, 232
  as a complex torus, 239
  is commutative, 234
  is projective, 254
  simple, 267
Absolute value, 1–4
  archimedean, 2
  discrete, 4
  equivalent, 2
  non-archimedean, 2
  normalization of, 6, 9, 11
  p-adic, 2
  trivial, 2
Absolute values
  of a function field, 12
  of a number field, 10
Additive reduction
  of an elliptic curve, 427
Adeles, 604–607
Admissible pair, 372
Affine chart, 521
Affine open subset, 522
Affine space, 514
Affine variety, 515
  associated complex analytic variety, 519
  associated scheme, 516
  basis of topology, 517
  complex topology, 519
  geometrically irreducible, 519
  geometrically reduced, 519
  open subsets are quasicompact, 517
Ahlfors–Shimizu characteristic, 458
  of a holomorphic curve, 468
Albanese variety, 394
Algebraic equivalence
  of divisors, 562
  of line bundles, 561–563
  on an abelian variety, 255, 265
Algebraic subgroup, 82
Algebraic variety, 584
Amoroso–Dvornicich theorem, 113
Ample
  Cartier divisor, 548
  line bundle, 534
    finite pull-back, 578
    on abelian variety, 252–255
    on curve, 583
Approximation
  class, 164
  nontrivial, 164
Approximation class
  primitive, 200
Approximation domain, 201
  v-adic, 201
Approximation theorem, 4
  strong form, 11
Arakelov height
  multiplicative, 66
  of a matrix, 67, 69, 74
  of a subspace, 67, 395
  on projective space, 66
Archimedean, see also Absolute value


Arithmetic genus, 583
Artin–Schreier equation, 154
Automorphism of group varieties, 232

Bad reduction of elliptic curve, 426
Banach’s fixed point theorem, 368
Base change
  and (very) ample, 534
  and closed embeddings, 534
  of a variety, 522
  of a vector bundle, 529
  of affine variety, 517
  of coherent sheaf, 570
  of cycle, 555
Base-point
  of a line bundle, 530
  of complete linear system, 550
Base-point-free
  Cartier divisor, 548
  line bundle, 530
Belyĭ’s lemma, 404
Belyĭ’s theorem, 413
Bertram–Ein–Lazarsfeld theorem, 53
Bi-elliptic curve, 391
Bilu’s theorem, 102, 399
Binet’s formula, 68
Birational map, 574
Birational to a hypersurface, 575–576
Birational varieties, 575
Birthday paradox, 423
Bogomolov conjecture, 399
Bogomolov property (B), 120
Bogomolov’s conjecture, 400
Bombieri–Lang conjecture, 486
Bounded subset, 37, 54–57
Bézout’s theorem, 561

Canonical divisor, 583, 599
Canonical form, 294
Canonical line bundle, 473, 542, 582, 583
Canonical metric, 303, 308
  has harmonic Chern form, 311
Cartan’s formula, 448
Cartier divisor, 548
  Čech cocycle, 566
  associated Weil divisor, 549
  effective, 550
  of rational function, 549
Catalan’s conjecture, 403
Cauchy inequalities, 366
Čech boundary, 565
Čech cocycle, 565
Čech cohomology group, 565
Čech complex, 564
Character, 607
Characteristic function, 448
  of a holomorphic curve, 466, 476
  of Ahlfors–Shimizu, 458
Chevalley–Weil theorem, 341
  for abelian varieties, 341
  for discrete valuations, 339
  for number fields, 339
  global version, 338
  local version, 336
  local version for abelian varieties, 340
Chordal distance, 457
Chow group, 553
  of projective space, 560
Chow’s lemma, 562
Class group, 142
  of S-integers, 348
Closed embedding, 523
Closed map, 535
Closed subgroup, 232
Closed subvariety, 523
  ideal sheaf, 523, 528, 569
Cocycle rule, 526
Codifferent, 590
Codimension, 523
Coherent sheaf, 567–574
Cohomology group
  of coherent sheaf, 569–574
Compact, xv
Complete linear system, 550
Complete variety, 535–536
Completion, 2
Complex analytic variety, 584
Complex embedding, 7
Complex manifold, 583–585
Complex space, 584
Complex topology, 519, 584
Complex torus, 239
Component
  of a cycle, 551
  of Weil divisor, 544
Composition series, 552
Conductor
  of a meromorphic function, 456
  of a point, 489
  over function field, 499
Conjugates
  of a point in an affine variety, 516

  of a variety, 524
Conormal bundle, 543
Constancy lemma, 233
Coordinate ring, 515
  is noetherian, 517
Correspondence, 309
Cotangent bundle, 542
Cotangent space, 542
Counting function, 446
  in diophantine approximation, 481
  in diophantine geometry, 483
  of a holomorphic curve, 466, 476
  over function field, 499
Covering group, 411
Cube-slicing inequality, 618
Curve, 581–583
  rational, 237
Cycle, 551
  of a scheme, 555

Darmon–Granville theorem, 441
Davenport–Estermann theorem, 611
Decomposition group, 597
Dedekind domain, 589
Dedekind’s different theorem, 594
Dedekind’s discriminant theorem, 595
Defect inequality, 452
Defect of meromorphic function, 452
Deficient value, 453
Degree
  essential, 88
  is positive, 561
  of a cycle, 559, 561
  of a line bundle, 582
  of a variety, 559
  of an algebraic number, xvi
  of morphism, 578
  of zero-dimensional cycle, 557
Derivation, 345
Derivative, 536
Descente infinie, 350
Different, 590–591
  Dedekind’s theorem, 594
Differential
  of a function, 543
  of a morphism, 537, 538
Differential forms, 542
Dimension
  of cycle, 551
  of topological space, 518
Dimension theorem
  for group varieties, 236
  of varieties, 577
Direct image of coherent sheaf, 568
Direct sum of vector bundles, 527
Dirichlet’s theorem, 181
Dirichlet’s unit theorem, 18
  for orders, 189
Discrete valuation ring, 544
Discriminant
  absolute logarithmic, 487
    for function fields, 499
  Dedekind’s theorem, 595
  Hermite’s theorem, 595
  minimal global form, 429
  minimal local form, 426
  Minkowski’s theorem, 596
  of a polynomial, 586–589, 592
  of elliptic curve, 425
  of free algebra, 586
  over Dedekind domain, 589–591
Discriminant ideal, 586
Distortion factor, 69
Divisor, 544–551
  Néron, 61
  on regular variety, 549
  special, 270
  with normal crossings, 473, 484
Dobrowolski’s theorem, 29, 107
Domain, 574
Dominant rational map, 574
Drasin’s results on deficiencies, 454
Dual abelian variety, 255
  biduality, 256
Dual vector bundle, 528

Effective Cartier divisor, 550
Effective cycle, 561
Effective divisor, 549
Effective methods, 146
  for unit equation, 146
Effective Weil divisor, 544
Elliptic curve, 240–246, 425–431
  addition law, 244
  additive reduction, 427
  good reduction, 426
  multiplicative reduction, 427
  split multiplicative reduction, 314
Elliptic function, 453
Endomorphism of group varieties, 232
Enflo’s theorem, 29
Enumerating function, 446

Essential degree, 88
Étale morphism, 580
  differential criterion, 581
  for schemes, 598
  local behaviour, 581, 598
Euler characteristic
  of coherent sheaf, 571
Even elements, 258
Even line bundle, 258
Exact sequence
  of sheaves, 564
Exponent of field extension, 117, 344
Exterior product of vector bundles, 529

Faltings’s big theorem, 391, 486
Faltings’s theorem, 352, 406, 485, 497
  Vojta’s proof, 352
Faltings–Wüstholz theorem, 229
Fano surface for cubic threefold, 394
Fano variety, 393
Fenchel’s conjecture, 437
Fermat curve, 495
Fermat descent, 349, 350
Fermat equation, 403
  generalized, 435, 441
Fermat’s conjecture, 403, 429
Fermat’s last theorem, 403, 429
Fibre of vector bundle, 526
Fibre product of varieties, 535
Filtration
  Harder–Narasimhan, 229
  jointly semistable, 228
Finite length, 552
Finite morphism, 577–579
First Chern class, 554
First main theorem
  Ahlfors–Shimizu version, 458
  for a holomorphic curve, 469
  of Nevanlinna, 448
Fischer’s inequality, 68
Flat module, 579
Flat morphism, 579–580
  for schemes, 598
  is open, 580
Fourier transform, 607
Fox’s theorem, 437
Free $\mathcal{O}_X$-module, 566
Frey curve, 429
Frobenius map, 299
Fubini–Study form, 457
Fubini–Study metric, 58, 467
Full module, 189
Function field, 524
  and birationality, 575
  extension of, 12
  of projective variety, 531
  places of, 12
Fundamental group, 411
Fundamental inequality, 20

GAGA-principle, 584
Galois cohomology, 345
Gap principle, 139
  strong, 171
Gauss norm, 22
Gauss’s lemma, 22
Gelfond’s lemma, 27
general abc-theorem
  for function field, 511
  for several polynomials, 418, 511
General Lang conjecture, 486
General position
  hyperplanes, 470
  linear forms, 180
Genus, 582
Geometrically connected, 235
Geometrically irreducible
  affine variety, 519
  smooth variety, 539
  variety, 522, 524
Geometrically reduced
  affine variety, 519
  variety, 522, 524
    smooth points are dense, 540
Global height, 40
Global section, 549
  of a vector bundle, 527
Good reduction
  of a curve, 438
  of an abelian variety, 312, 340
  of an elliptic curve, 426
Group variety, 232

Haar measure, 602–604
  normalized on adeles, 605–607
Hadamard’s inequality, 26, 68
Hall’s conjecture, 425
  strong form, 425
Hall–Lang–Waldschmidt–Szpiro conjecture, 434
Height, 15–21
  function, 41, 63

  global, 40
  in affine space, 16
  in projective space, 16
  local, 35, 39
  multiplicative, 16
  Néron–Tate, 284–289
  of a matrix, 75
  of a presentation, 47
  of an algebraic number, 16
  of polynomials, 21–29
  over function field, 21, 44, 500
  standard, 94
Hensel’s lemma, 4
Hermite’s discriminant theorem, 595
Hilbert polynomial
  of coherent sheaf, 571–573
  of projective subvariety, 573
Hilbert’s basis theorem, 517
Hilbert’s irreducibility theorem, 319, 326
Hilbert’s Nullstellensatz, 515
Hilbertian field, 314
Homogeneous coordinate ring, 531
Homogeneous coordinates, 531
Homogeneous function, 285
Homogeneous ideal
  of projective variety, 531
Homomorphism
  of abelian varieties, 233
  of algebraic subgroups, 82
  of complexes, 565
  of group varieties, 232
    image, 235
    kernel, 236
  of sheaves, 521
    cokernel, 564
    image, 564
    kernel, 564
    surjective, 564
  of vector bundles, 526
Hurwitz’s theorem, 453, 600
Hypercube representation, 31
Hyperelliptic curve, 391
Hyperelliptic equation, 142
Hyperplane in projective space, 560
Hyperplane section, 533

Ideal of vanishing, 515
Ideal sheaf of closed subvariety, 523
Index of a polynomial, 156, 226
Inertia group, 596, 597
Inner derivation, 345
Integer of a number field, 17
Intersection multiplicity, 556
Intersection number of divisors, 558
Intersection product
  non-negativity, 561
  with a divisor, 555
Intersection product of cycles, 560
Inverse image of coherent sheaf, 568
Invertible meromorphic section
  of a line bundle, 529, 548
  Weil divisor, 547, 554
Invertible sheaf, 570
Irreducible component, 518, 523
  of a smooth variety, 539
Irreducible subset, 518, 523
Irreducible topological space, 518
Isogeny, 267
Isomorphism
  of varieties, 521
  of affine varieties, 516
  of group varieties, 232
  of sheaves, 521
  of vector bundles, 526

Jacobi criterion, 540
Jacobi inversion theorem, 271
Jacobian variety, 268–282
  selfduality, 280
Jensen’s formula, 23, 445, 446
Jensen’s inequality, 462
Join
  of closed embeddings, 43
  of morphisms, 43

Kähler differentials, 543, 591
Kronecker symbol, xv
Kronecker’s theorem, 17, 289
Krull dimension, 518
Kummer pairing, 343
Kummer sequence, 345
Kummer theory, 344
Künneth formula, 574

(L, M)-independence, 165
Landau symbols, xv
Large point, 296
Large solution, 138
Lattice, 83, 608
  non-archimedean, 608
  number field, 609
  real, 608

Laurent’s theorem, 193
Lefschetz principle, 321
Lehmer’s conjecture, 28
Leibniz’s rule, 536
Lemma on the logarithmic derivative, 451
Length of module, 552
Line bundle, 529
  first Chern class, 554
  generated by global sections, 533
  metrized, 58
  on multiprojective space, 535, 559
  on projective space, 532–533, 559
Linear system, 550
Linear torus, 82
Liouville inequality, 154
Liouville’s inequality, 20, 150
  projective, 71
Local degree, 5
Local discriminant, 336
Local Eisenstein theorem
  for a polynomial, 370
  for a strictly convergent power series, 367
Local height, 61
  canonical, 304
  over function field, 499
  relative to a Cartier divisor, 35, 39
  relative to a presentation, 35, 39
Local parameter, 4, 544
Local ring
  in a prime divisor, 546
  of a point in a variety, 516, 522
Locally bounded
  function, 55
  M-metric, 62
  metric, 58
Locally compact field, 4
Locally free $\mathcal{O}_X$-module, 566
Locally M-bounded function, 57
Log-concave function, 616
Long exact cohomology sequence, 569

M-bounded family of subsets, 56
M-bounded subset, 57
M-class group, 347
M-constant, 304
M-divisor, 346
M-metric, 62
Mahler measure, 22
  infimum of, 28
Manin–Mumford conjecture, 400
Meromorphic section of line bundle, 529
Metric
  M-, 62
  canonical, 303, 308
  Fubini–Study, 58
  locally bounded, 58
  on a line bundle, 58
  standard, 58
  trivial, 58
Metrized line bundle, 58
  isometric, 58
  pull-back of, 58
Minimal polynomial, xvi
Minimal subset for unit equation, 510
Minkowski’s discriminant theorem, 596
Minkowski’s first theorem, 615
Minkowski’s second theorem, 611, 614
Monoidal transformation, 83
Mordell conjecture, 352, 406
Mordell–Weil group, 355
Mordell–Weil theorem, 349
  for finitely generated fields, 350
Morphism
  of affine varieties, 515
  of topological coverings, 411
  of varieties, 521
Moving lemma, 305
Multiplicative reduction of elliptic curve, 427
Multiplicity
  of a cycle, 551
  of a divisor, 544
Multiprojective space, 534
  cohomology, 574
Mumford’s formula, 294
  generalization, 359
Mumford’s gap principle, 298

Natural density, 319
Néron divisor, 61
Néron model, 312, 327, 340
Néron symbol
  on abelian varieties, 304
  on complete varieties, 308
  on curves, 310
Néron–Tate height, 284–289
  associated bilinear form, 289–293
Nevanlinna’s inverse problem, 454
Noetherian ring, 517
Non-archimedean, see also Absolute value
Norm, 586
Norm-form equation, 189
  family of solutions, 190

Normal bundle, 543
  and self-intersection, 556
Normal variety, 546
Normalization
  of a curve, 581
  of a variety, 578
    in a field, 12
Normalized variance, 122
Northcott property, 117, 298
Northcott’s theorem
  for algebraic numbers, 25
  for varieties, 44
Number field, 4
Numerical equivalence, 563

Odd elements, 258
Odd line bundle, 258, 265
Open subvariety, 522
Order
  of a meromorphic function, 450
  of a number field, 189
  of a rational function, 545, 547, 552
  of a section, 547
Ostrowski’s theorem, 3

p-adic, totally, 120
p-adic integers, 3
p-adic numbers, 3
Padé approximant, 129
Pell equation, 191
Picard group, 529, 548, 549
  Čech cohomology group, 566
  isomorphic to first Chow group, 555
  of affine space, 559
  of multiprojective space, 559
  of projective space, 559
Picard scheme, 249
Picard variety, 246–252
Picard’s little theorem, 455
Pisot–Vijayaraghavan number, 116
Place, 2
  complex, 7
  division of, 2
  extension of, 2
  finite, 605
  infinite, 605
  lying over, 2
  ramified, 592
  real, 7
  tamely ramified, 594
  totally ramified, 596
  unramified, 592, 598
  wildly ramified, 594
Plane curve, 582
Poincaré class, 249
  is even, 266
Poincaré’s complete reducibility theorem, 267
  complex analytically, 268
Poisson’s formula, 445
Poisson–Jensen formula, 445
Pole-divisor, 544
Preperiodic point, 288
Presentation
  degree of, 47
  height of, 47
  of a Cartier divisor, 35
  of a morphism, 47
  pull-back of, 36
  sum of, 36
Presheaf, 520
  associated sheaf, 563
Prevariety, 521
Prime cycle, 551
Prime divisor, 544
Primitive solution, 425
Primitive subgroup, 83
Principal Cartier divisor, 549
Principal Weil divisor, 547, 553
Product formula, 9
  for function fields, 12
  for number fields, 10
Product of prevarieties, 521
Product theorem, 226
Product variety, 522, 524
  of affine varieties, 519
Projection formula, 555, 556
Projective distance, 69
Projective linear subspace, 560
Projective space, 531
Projective space over function field, 499
Projective variety, 531
  is complete, 536
Proper intersection product
  with a divisor, 556
Proper morphism, 535–536
Proximity function, 447
  in diophantine approximation, 481
  in diophantine geometry, 483
  of a holomorphic curve, 466, 476
  over number field, 499
Pull-back
  of Cartier divisor, 551

  of coherent sheaf, 568
  of divisor class, 556
  of k-forms, 543
  of vector bundle, 529
Push-forward
  of cycle, 553
  of principal divisor, 554
  on Chow groups, 554

Quadratic form, 258
Quadratic function, 258
  associated bilinear form, 258
  associated linear form, 259
  associated quadratic form, 259
Quasi-homogeneous function, 285
Quasicompact, xv, 517
Quotient sheaf, 564
Quotient vector bundle, 529

Radical ideal, 515
Radical of an ideal, 515
Radical of an integer, 402
Ramification defect
  of a meromorphic function, 452
Ramification divisor, 473, 599–601
Ramification index, 4
Rank of an abelian group, 297
Rational equivalence
  of cycles, 553
  of Weil divisors, 547
Rational function, 524
Rational map, 574–577
  image of, 574
  on curve, 577
Rational point, 522
  of an affine variety, 516
Real embedding, 7
Reciprocity law, 309
Reduced divisor, 600
Reduced K-algebra, 515
Reduction, 3
Refinement of open covering, 565
Regular curve, 581
Regular function, 515, 521
Regular in codimension 1, 546
Regular local ring, 538
Regular point, 538
Regular variety, 538
  is normal, 546
Residue degree, 4
Residue field, 3
  of a point, 522
Restriction map for presheaves, 520
Restriction of a vector bundle, 529
Resultant, 588–589
Riemann form, 239
Riemann surface, 585
Riemann theta function, 262
Riemann’s existence theorem, 412, 585
  Grothendieck’s generalization, 415
Riemann–Roch theorem
  for abelian varieties, 262
  for curves, 582
  for surfaces, 583
Rigidified line bundle, 302
  pull-back, 302
  tensor product, 302
Roth’s lemma, 159
  generalized, 207
Roth’s theorem, 152–156, 164, 179, 181, 406
  with moving targets, 170
Runge’s theorem, 317, 326

S-integer, 17, 348
S-integral point, 183
S-unit, 17, 348
Schanuel’s theorem, 393
Scheme, 522, 555, 567, 573, 598–599
  affine, 516
Schmidt–Struppeck–Vaaler theorem, 69
Schwarz’s lemma, 366
Second main theorem, 451
  Ahlfors’s proof, 460
  Cartan’s generalization, 471
    over function field, 505
  Cartan–Vojta version, 473, 484
    equidimensional case, 475
    for projective curve, 474, 476
    truncated version, 476
  Griffiths’s conjecture, 473
  in diophantine approximation, 482
Section of vector bundle, 527
Seesaw principle, 247
Segre embedding, 531, 535
Semistable reduction of elliptic curve, 427
Separable degree of morphism, 579
Separable morphism, 581
Serre duality, 571
Sheaf, 520, 528, 563–574
  of k-forms, 542
    on singular varieties, 543
  of modules, 566

  of relative differentials, 543
Short exact sequence of sheaves, 564
Shuffle of type (d, e), 31
Siegel’s lemma, 72
  Bombieri–Vaaler version, 74
  relative version, 79
Siegel’s theorem, 184, 484
Singular point, 538
Size of approximation class, 164
Skyscraper sheaf, 569
Small point, 296
Small solution, 138
Smooth point, 539
  and regularity, 539
Smooth variety, 539–543
Smyth’s theorem, 29, 116
Special divisor, 270
Special set of a variety, 486
Spherical proximity function, 458
Standard affine open subsets, 531
Standard metric, 58
Standard étale morphism, 581
  for schemes, 598
Stein factorization, 578
Stereographic projection, 457
Stewart–Tijdeman theorem, 419
Stothers–Mason theorem, 418, 503
Strictly convergent power series, 364
Strong approximation theorem, 11
Subbundle of vector bundle, 526
Subfamily of $\mathrm{Pic}^0(X)$, 248
Subsheaf, 563
Subspace theorem, 177
  absolute, 228
  affine, 178
  basic inequality, 199
  Evertse’s lemma, 217
  general, 180
  primitive solutions, 199
Successive minimum, 76, 611
Superelliptic equation, 145
Support
  of a cycle, 551
  of Cartier divisor, 550
  of coherent sheaf, 571
  of Weil divisor, 544
Szpiro conjecture, 431

Tangent bundle, 542
  of affine space, 540
Tangent space, 536–540
Tate uniformization
  of complex elliptic curve, 312
Tate’s elliptic curve, 314
Tate’s limit argument, 286
Tautological line bundle, 532
Taylor coefficients, 366
Taylor series, 366
Tensor product of vector bundles, 528
Theorem of the cube, 260
  complex analytically, 262
  for varieties, 261
Theorem of the square, 253
  complex analytically, 257
Theta divisor, 272
  is ample, 280
Theta function, 261
  normalized, 261
  trivial, 261
Thue equation, 150, 189
Thue’s theorem, 150
Thue–Mahler equation, 140
Topological covering, 411
  finite, 412
Torsion coset, 83, 399
Torsion sheaf, 599
Torus
  coset, 82
  linear, 82
Totally real algebraic number, 107
Trace, 586
Transition function, 529
Transition matrix, 526, 527
Translation, 235
Transversal intersection, 557
Triangle group, 437
Triangle inequality, 1
  ultrametric, 2
Trivial metric, 58
Trivial vector bundle, 526
  of rank 1, 528
Trivialization of vector bundle, 526
Truncated counting function
  higher, 508
  of a holomorphic curve, 475
  of a meromorphic function, 455

Unit equation, 125–140
  general, 186
Unit of a number field, 17
Universal covering, 412
Unramified field extension, 591–597

  outside S, 348
  over M, 346
Unramified morphism, 580
  for schemes, 598–599
  local behaviour, 581, 598

v-topology, 55
Vaaler’s cube-slicing theorem, 618
Valiron’s result on deficiencies, 455
Valuation, 346, 593
  discrete, 544
Valuation ring, 3
  discrete, 544
Value group, 4, 593
Variety, 522
  associated complex analytic variety, 584
  associated scheme, 522
  basis of topology, 522
  codimension of a closed subset, 523
  dimension, 523
  equidimensional, 523
  general type, 486
  geometrically irreducible, 522, 524
  geometrically reduced, 522, 524
  of pure dimension, 523
  rationally connected, 237
Vector bundle, 525–530
  generated by global sections, 530
    on projective variety, 533
  rank, 526
  sheaf of sections, 528, 566
Vector field, 540, 542
Very ample line bundle, 533
  on curve, 583
Vinogradov symbols, xv
Vojta divisor, 356–359
  with small height, 378
Vojta’s conjecture, 483
  over function field, 503
  with ramification, 488
    for curves, 494
Vojta’s height inequality, 483
Vojta’s theorem, 387

Waring’s problem, 153, 393
Weak Mordell–Weil theorem, 341, 349
  for elliptic curves, 329–335
Weierstrass equation, 242
  global minimal form, 429
  minimal local form, 426
Weierstrass ℘-function, 246, 453
Weil divisor, 544
  component, 544
  effective, 544
  of invertible meromorphic section, 554
  of rational function, 545, 547, 553
  of section, 547
Weil height, 42–45
Weil’s theorem of decomposition, 63
Weyl sum, 393
Wieferich pair, 404
Wronskian, 450, 471
Wronskian criterion, 160

Ye’s error term, 460

Zariski topology, 515
  on projective space, 530
Zariski’s tangent space, 538
Zero set, 515, 530
Zero-divisor, 544
Zhang’s theorem, 94
