William Feller (Auth.), René L. Schilling, Zoran Vondraček, Wojbor A. Woyczyński (Eds.) - Selected Papers I-Springer International Publishing (2015)

WILLIAM FELLER

Selected Papers
I
W. Feller (around 1930)
Photograph courtesy of Joanne Elliott
William Feller
Selected Papers I

Edited by
René L. Schilling • Zoran Vondraček
Wojbor A. Woyczyński

William Feller
(1906 Zagreb – 1970 New York City)

Editors
René L. Schilling
Institut für Mathematische Stochastik
Technische Universität Dresden
Dresden
Germany

Zoran Vondraček
Department of Mathematics
University of Zagreb
Zagreb
Croatia

Wojbor A. Woyczyński
Department of Mathematics
Case Western Reserve University
Cleveland, OH
USA

ISBN 978-3-319-16858-6
Library of Congress Control Number: 2012954381

Math. Subj. Classification (2010): 01A75, 60G05, 60F05, 60E07, 60J35, 60K05, 47D07

Springer Cham Heidelberg New York Dordrecht London


© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained
herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media
(www.springer.com)

Preface

These volumes contain a selection of William Feller’s most important works on probability theory, mathematical biology, analysis and geometry. Feller was a prolific writer, with many groundbreaking contributions which, almost 45 years after his death, are still frequently quoted, both directly and indirectly. This abundance meant that deciding which of his more than 100 research papers should be included in these Selecta was not an easy task. Most of Feller’s English-language research contributions have been included, and we also selected a fair share of his pre-1940 publications in German and French, six of which were translated into English for these Selecta. His two most cherished contributions, the definitive Feller–Lindeberg version of the Central Limit Theorem [Feller 1935c] and the deep On the theory of stochastic processes [Feller 1936c] (a companion to Kolmogorov’s seminal work on stochastic processes and PDEs), are here being made accessible in English for the first time. We decided not to reprint Feller’s work on measure theory, and the geometry selection contains only the fundamental Acta Mathematica contribution by Busemann and Feller; in fact, Feller did not continue to work in these directions, and we felt that his impact on these areas remained rather limited.
Feller is widely known for his brilliant two-volume textbook, An Introduction to Probability Theory and Its Applications, on which he worked continuously from the late 1940s onwards. It is one of the few books which changed considerably from edition to edition. The observant reader of these Selecta will notice that in later editions Feller included many results from his (then) recent research papers. Consequently, the Introduction to Probability Theory is still a “modern” book, and a treat for every reader as well. For obvious reasons we could not include these books in the present selection.
The papers are arranged in chronological order: Volume 1 covers the years 1928–1950, and Volume 2 the period 1951–1971. Each volume contains additional material, of which the most important are commentaries and essays written by the leading experts in the areas covered by the Selecta. We are grateful to them for their willingness to spend considerable time and effort to put various aspects of Feller’s work, and its afterlife, into modern perspective; Ellen Baake & Anton Wakolbinger wrote on Mathematical Biology, Hans Fischer on Foundations and the Central Limit Theorem, Masatoshi Fukushima on Diffusions and Boundaries, Niels Jacob on Functional Analysis, Ross Maller on Limit Theorems, Goran Peskir on Boundary Conditions and Erhard Scholz on Geometry. The volumes would not have the same relevance to modern mathematical research without their incisive analyses.
Thanks are due to Springer-Verlag for undertaking the publication. The team around Catriona Byrne and Marina Reizakis was very supportive and helped the project move forward throughout the past couple of years. We are grateful to all owners of the copyrights for their cooperation, and for generously granting the reprint permissions free of charge. These days this is no longer the rule, but an exception. Only one publisher declined to help us. Thus the absence of the difficult-to-access 1938 survey paper [*Feller 1938a] on the foundations of probability theory leaves a distressing gap in these Selecta.
Many colleagues and friends helped us during the preparation of these volumes. We would like to thank all of them, in particular Christian Berg, Ulrich Brehm, Jürgen Elstrodt, Tony Knapp, Heinz König, Anders Martin-Löf and Hrvoje Šikić, for valuable comments. We are especially grateful to Joanne Elliott, a long-time friend and neighbour of the Feller family, for insightful conversations about Feller’s life in America, and for generously sharing her vast trove of historical photographs and other materials; some are reproduced here with her permission. Several of Feller’s surviving PhD students whom we contacted also expressed support for the idea of this publication. Our student Ms. Franziska Kühn (TU Dresden) helped us with the intricacies of typesetting this work in LaTeX and with other administrative chores.

Dresden, Zagreb, and Cleveland
September 2014

René L. Schilling
Zoran Vondraček
Wojbor A. Woyczyński

Acknowledgements

The Editors gratefully acknowledge the kindness of these institutions and individuals in granting the following permissions:

Acta Universitatis Szegediensis, Bolyai Institute
W. Feller: Über das Gesetz der großen Zahlen. Acta Litterarum ac Scientiarum. Regiae Universitatis Hungaricae Francisco-Josephinae. Sectio Scientiarum Mathematicarum. (Acta. Sci. Litt. Szeged) 8 (1937) 191–201. © Acta Universitatis Szegediensis, Bolyai Institute. Reprinted with permission of Acta Universitatis Szegediensis, Bolyai Institute.

American Mathematical Society
W. Feller: On the integro-differential equations of purely discontinuous Markoff processes. Transactions of the American Mathematical Society 48 (1940) 488–515. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
W. Feller: The general form of the so-called law of the iterated logarithm. Transactions of the American Mathematical Society 54 (1943) 373–402. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
W. Feller: Generalization of a probability limit theorem of Cramér. Transactions of the American Mathematical Society 54 (1943) 361–372. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
W. Feller: The fundamental limit theorems in probability. Bulletin of the American Mathematical Society 51 (1945) 800–832. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
H. Busemann, W. Feller: Regularity properties of a certain class of surfaces. Bulletin of the American Mathematical Society 51 (1945) 583–598. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
P. Erdös, W. Feller, H. Pollard: A property of power series with positive coefficients. Bulletin of the American Mathematical Society 55 (1949) 201–204. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
W. Feller: Fluctuation theory of recurrent events. Transactions of the American Mathematical Society 67 (1949) 98–119. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
W. Feller: Some recent trends in the mathematical theory of diffusion. Proceedings of the International Congress of Mathematicians, Cambridge (MA) 1950. Vol. 2. American Mathematical Society, Providence (RI) 1952, pp. 322–339. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
W. Feller: Diffusion processes in one dimension. Transactions of the American Mathematical Society 77 (1954) 1–31. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
W. Feller: Boundaries induced by non-negative matrices. Transactions of the American Mathematical Society 93 (1956) 19–54. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.
J. Elliott, W. Feller: Stochastic processes connected with harmonic functions. Transactions of the American Mathematical Society 82 (1956) 392–420. © American Mathematical Society. Reprinted with permission of the American Mathematical Society.

Cambridge University Press
W. Feller: Some new connections between probability and classical analysis. Proceedings of the International Congress of Mathematicians, Edinburgh 1958. Cambridge University Press, Cambridge 1960, pp. 69–86. © Cambridge University Press. Reprinted with permission of the Cambridge University Press.
W. Feller: On fitness and the cost of natural selection. Genetics Research 9 (1967) 1–15. © Cambridge University Press. Reprinted with permission of the Cambridge University Press.

Croatian Academy of Sciences and Arts
W. Feller: Neuer Beweis für die Kolmogoroff–P. Lévysche Charakterisierung der unbeschränkt teilbaren Verteilungsfunktionen. Bulletin international de l’académie Yougoslave des sciences et des beaux-arts, Zagreb. Classe des sciences mathématiques et naturelles 32 (1939) 106–113. © Croatian Academy of Sciences and Arts. Reprinted with permission of the Croatian Academy of Sciences and Arts.

Duke University Press
W. Feller: Completely monotone functions and sequences. Duke Mathematical Journal 5 (1939) 661–674. © Duke University Press. Reprinted with permission of the Duke University Press.
W. Feller: Some geometric inequalities. Duke Mathematical Journal 9 (1942) 885–892. © Duke University Press. Reprinted with permission of the Duke University Press.

Elsevier
W. Feller: The birth and death processes as diffusion processes. Journal de Mathématiques pures et appliquées, IX. série 38 (1959) 301–345. © Elsevier. Reprinted with permission of Elsevier.

Illinois Journal of Mathematics
W. Feller: Generalized second order differential operators and their lateral conditions. Illinois Journal of Mathematics 1 (1957) 459–504. Reprinted with permission of the Illinois Journal of Mathematics, published by the University of Illinois at Urbana-Champaign.
W. Feller: On the intrinsic form for second order differential operators. Illinois Journal of Mathematics 2 (1958) 1–18. Reprinted with permission of the Illinois Journal of Mathematics, published by the University of Illinois at Urbana-Champaign.
W. Feller: Differential operators with the positive maximum property. Illinois Journal of Mathematics 3 (1959) 182–186. Reprinted with permission of the Illinois Journal of Mathematics, published by the University of Illinois at Urbana-Champaign.

Indiana University Mathematics Journal
W. Feller, S. Orey: A renewal theorem. Journal of Mathematics and Mechanics 10 (1961) 619–624. © Indiana University Mathematics Journal. Reprinted with permission of the Indiana University Mathematics Journal.
W. Feller: On the Fourier representation for Markov chains and the strong ratio theorem. Journal of Mathematics and Mechanics 15 (1966) 273–283. © Indiana University Mathematics Journal. Reprinted with permission of the Indiana University Mathematics Journal.
W. Feller: An extension of the law of the iterated logarithm to variables without variance. Journal of Mathematics and Mechanics 18 (1968) 343–355. © Indiana University Mathematics Journal. Reprinted with permission of the Indiana University Mathematics Journal.

Institute of Mathematical Statistics, Baltimore
W. Feller: On the integral equation of renewal theory. Annals of Mathematical Statistics 12 (1941) 243–267. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: On a general class of “contagious” distributions. Annals of Mathematical Statistics 14 (1943) 389–400. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: On the normal approximation to the binomial distribution. Annals of Mathematical Statistics 16 (1945) 319–329. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: Note on the law of large numbers and “fair” games. Annals of Mathematical Statistics 16 (1945) 301–304. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: On the Kolmogorov–Smirnov limit theorems for empirical distributions. Annals of Mathematical Statistics 19 (1948) 177–189. Erratum: Annals of Mathematical Statistics 21 (1950) 301–302. Both: © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: The asymptotic distribution of the range of sums of independent random variables. Annals of Mathematical Statistics 22 (1951) 427–432. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: Non-Markovian processes with the semigroup property. Annals of Mathematical Statistics 30 (1959) 1252–1253. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
The Editors of Ann. Math. Statist.: William Feller, 1906–1970. Annals of Mathematical Statistics 41 No. 6 (1970), iv–xiii. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.

Institute of Mathematics Polish Academy of Sciences
H. Busemann, W. Feller: Zur Differentiation der Lebesgueschen Integrale. Fundamenta Mathematicae 22 (1934) 226–256. © Institute of Mathematics Polish Academy of Sciences. Reprinted with permission of the Institute of Mathematics Polish Academy of Sciences.
W. Feller: Über die Existenz von sogenannten Kollektiven. Fundamenta Mathematicae 32 (1939) 87–96. © Institute of Mathematics Polish Academy of Sciences. Reprinted with permission of the Institute of Mathematics Polish Academy of Sciences.
W. Feller: On positivity preserving semigroups of transformations on C[r1, r2]. Annales de la Société Polonaise de Mathématique 25 (1952) 85–94. © Institute of Mathematics Polish Academy of Sciences. Reprinted with permission of the Institute of Mathematics Polish Academy of Sciences.

John Wiley & Sons, Inc.
W. Feller: On differential operators and boundary conditions. Communications on Pure and Applied Mathematics 8:1 (1955) 203–216. © 1955 John Wiley & Sons, Inc. This material is reproduced with permission of John Wiley & Sons, Inc.
W. Feller: A simple proof for renewal theorems. Communications on Pure and Applied Mathematics 14:3 (1961) 285–293. © 1961 John Wiley & Sons, Inc. This material is reproduced with permission of John Wiley & Sons, Inc.

L’Enseignement Mathématique
W. Feller: One-sided analogues of Karamata’s regular variation. L’Enseignement Mathématique, IIe. Série 15 (1969) 107–121. © L’Enseignement Mathématique. Reprinted with permission of L’Enseignement Mathématique.

Lund University, Centre for Mathematical Sciences
W. Feller: On a generalization of Marcel Riesz’ potentials and the semigroups generated by them. Meddelanden från Lunds Universitets Matematiska Seminarium. Supplement M. Riesz, Uppsala 1952, pp. 73–81. © Lund University, Centre for Mathematical Sciences. Reprinted with permission of Lund University, Centre for Mathematical Sciences.

Princeton University, Department of Mathematics
W. Feller: The law of the iterated logarithm for identically distributed random variables. Annals of Mathematics 47 (1946) 631–638. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: Two singular diffusion problems. Annals of Mathematics 54 (1951) 173–182. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: The parabolic differential equations and the associated semigroups of transformation. Annals of Mathematics 55 (1952) 468–519. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: Semigroups of transformations in general weak topologies. Annals of Mathematics 57 (1953) 287–308. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: On the generation of unbounded semigroups of bounded linear operators. Annals of Mathematics 58 (1953) 166–174. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: The general diffusion operator and positivity preserving semigroups in one dimension. Annals of Mathematics 60 (1954) 417–436. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: On second order differential operators. Annals of Mathematics 61 (1955) 90–105. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: On boundaries and lateral conditions for the Kolmogorov differential equations. Annals of Mathematics 65 (1957) 527–570. Additional notes: Annals of Mathematics 68 (1958) 735–736. Both: © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.
W. Feller: On the oscillations of sums of independent random variables. Annals of Mathematics 91 (1970) 402–418. © Princeton University, Department of Mathematics. Reprinted with permission of Princeton University, Department of Mathematics.

Society for Industrial and Applied Mathematics
W. Feller: Infinitely divisible distributions and Bessel functions associated with random walks. SIAM Journal of Applied Mathematics 14 (1966) 864–875. © Society for Industrial and Applied Mathematics. Reprinted with permission of the Society for Industrial and Applied Mathematics.

Springer
W. Feller: Über algebraisch rektifizierbare transzendente Kurven. Mathematische Zeitschrift 27 (1928) 481–495. © Springer. Reprinted with permission of Springer.
W. Feller: Über die Lösungen der linearen partiellen Differentialgleichungen zweiter Ordnung vom elliptischen Typus. Mathematische Annalen 102 (1930) 633–649. © Springer. Reprinted with permission of Springer.
H. Busemann, W. Feller: Krümmungseigenschaften konvexer Flächen. Acta Mathematica 66 (1936) 1–47. © Springer. Reprinted with permission of Springer.
W. Feller: Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift 40 (1936) 521–559. © Springer. Reprinted with permission of Springer.
W. Feller: Zur Theorie der stochastischen Prozesse. (Existenz- und Eindeutigkeitssätze). Mathematische Annalen 113 (1936) 113–160. © Springer. Reprinted with permission of Springer.
W. Feller: Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. II. Mathematische Zeitschrift 42 (1937) 301–312. Erratum: Mathematische Zeitschrift 44 (1939) 794. Both: © Springer. Reprinted with permission of Springer.
W. Feller: Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in wahrscheinlichkeitstheoretischer Behandlung. Acta Biotheoretica, Leiden 5 (1939) 11–39. © Springer. Reprinted with permission of Springer.
W. Feller: On the classical Tauberian theorems. Archiv der Mathematik 14 (1963) 317–322. © Springer. Reprinted with permission of Springer.
W. Feller: On the Berry-Esseen theorem. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 10 (1968) 261–268. © Springer. Reprinted with permission of Springer.
W. Feller: General analogues to the law of the iterated logarithm. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 14 (1969) 21–26. © Springer. Reprinted with permission of Springer.
W. Feller: Limit theorems for probabilities of large deviations. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 14 (1969) 1–20. © Springer. Reprinted with permission of Springer.

The Johns Hopkins University Press
W. Feller: A limit theorem for random variables with infinite moments. American Journal of Mathematics 68:2 (1946) 257–262. © 1946 The Johns Hopkins Press. Reprinted with permission of The Johns Hopkins University Press.

University of California Press
W. Feller: On regular variation and local limit theorems. In: J. Neyman and L. LeCam (eds.): Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1965/66, Vol. 2, Pt. 1, University of California Press, Berkeley, Los Angeles (CA) 1967, pp. 373–388. © University of California Press. Reprinted with permission of the University of California Press.
J.L. Doob: William Feller and Twentieth-Century Probability. In: L. LeCam, J. Neyman, E.L. Scott (eds.): Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability 1970/71, Vol. 2, University of California Press, Berkeley, Los Angeles (CA) 1972, pp. 15–20. © University of California Press. Reprinted with permission of the University of California Press.
M. Kac: William Feller, In Memoriam. In: L. LeCam, J. Neyman, E.L. Scott (eds.): Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability 1970/71, Vol. 2, University of California Press, Berkeley, Los Angeles (CA) 1972, pp. 21–23. © University of California Press. Reprinted with permission of the University of California Press.

For the following publications, the copyright resides with the author(s). Despite our best efforts, we were not able to trace the present copyright owners and we do ask them to come forward if they feel that their rights are infringed or that the material has been used in violation of copyright law.
W. Feller: On probability problems in the theory of counters. In: Studies, Essays, presented to R. Courant (Courant anniversary volume). Interscience Publishers, New York 1948, pp. 105–115.
K.L. Chung, W. Feller: On fluctuations in coin-tossing. Proceedings of the National Academy of Sciences, USA 35 (1949) 605–608.
W. Feller, H.P. McKean Jr.: A diffusion equivalent to a countable Markov chain. Proceedings of the National Academy of Sciences, USA 42 (1956) 351–354.
W. Feller: On boundaries defined by stochastic matrices. In: Proceedings of Symposia in Applied Mathematics. Vol. 7. McGraw-Hill Book Co., New York (for the American Mathematical Society, Providence (RI)) 1957, 35–40.
W. Feller: On semi-Markov processes. Proceedings of the National Academy of Sciences, USA 51 (1964) 653–659.
W. Feller: On the influence of natural selection on population size. Proceedings of the National Academy of Sciences, USA 55 (1966) 733–738.

The following papers are already in the public domain. We are grateful to the University of California Press for this information.
W. Feller: On the theory of stochastic processes, with particular reference to applications. In: J. Neyman (ed.): Proceedings of the (First) Berkeley Symposium on Mathematical Statistics and Probability 1945/46. University of California Press, Berkeley, Los Angeles (CA) 1949, pp. 403–432.
W. Feller: Diffusion processes in genetics. In: J. Neyman (ed.): Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability 1950. University of California Press, Berkeley, Los Angeles (CA) 1951, pp. 227–246.

Contents of Volume 1

Preface
Acknowledgements
Contents of Volume 1
Contents of Volume 2
Curriculum Vitae: William Feller 1906–1970
Bibliography of William Feller
PhD Students of William Feller
Z. W. Birnbaum (ed.): William Feller, 1906–1970
J. L. Doob: William Feller and Twentieth Century Probability
M. Kac: William Feller, In Memoriam
R. L. Schilling and W. A. Woyczyński: William Feller. A Biography
H. Fischer: Feller’s Early Work on Measure Theory and Mathematical Foundations of Probability
H. Fischer: Feller’s Early Work on Limit Theorems
E. Scholz: Feller and Busemann on Surface Theory – Contributions to Geometry
[Feller 1928] Über algebraisch rektifizierbare transzendente Kurven
[Feller 1930] Über die Lösungen der linearen partiellen Differentialgleichungen zweiter Ordnung vom elliptischen Typus
[Feller 1934a] Zur Differentiation der Lebesgueschen Integrale
[Feller 1935c] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung
[Feller 1935c] (translation) On the Central Limit Theorem of Probability Theory
[Feller 1936b] Krümmungseigenschaften konvexer Flächen
[Feller 1936c] Zur Theorie der stochastischen Prozesse. (Existenz- und Eindeutigkeitssätze.)
[Feller 1936c] (translation) On the Theory of Stochastic Processes. (Existence and Uniqueness.)
[Feller 1937a] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. II
[Feller 1937a] (translation) On the Central Limit Theorem of Probability Theory. II
[Feller 1937b] Über das Gesetz der großen Zahlen
[Feller 1937b] (translation) On the Law of Large Numbers
[Feller 1939a] Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in wahrscheinlichkeitstheoretischer Behandlung
[Feller 1939a] (translation) The Foundations of Volterra’s Theory of the Struggle for Life in a Probabilistic Treatment
[Feller 1939b] Completely Monotone Functions and Sequences
[Feller 1939c] Über die Existenz von sogenannten Kollektiven
[Feller 1939c] (translation) On the Existence of So-called Kollektivs
[Feller 1939d] Neuer Beweis für die Kolmogoroff–P. Lévysche Charakterisierung der unbeschränkt teilbaren Verteilungsfunktionen
[Feller 1940c] On the Integro-differential Equations of Purely Discontinuous Markoff Processes
[Feller 1941a] On the Integral Equation of Renewal Theory
[Feller 1942] Some Geometric Inequalities
[Feller 1943b] Generalization of a Probability Limit Theorem of Cramér
[Feller 1943c] The General Form of the So-called Law of the Iterated Logarithm
[Feller 1943d] On a General Class of “Contagious” Distributions
[Feller 1945a] On the Normal Approximation to the Binomial Distribution
[Feller 1945b] The Fundamental Limit Theorems in Probability
[Feller 1945c] Regularity Properties of a Certain Class of Surfaces
[Feller 1945d] Note on the Law of Large Numbers and “Fair” Games
[Feller 1946a] A Limit Theorem for Random Variables with Infinite Moments
[Feller 1946b] The Law of the Iterated Logarithm for Identically Distributed Random Variables
[Feller 1948a] On the Kolmogorov–Smirnov Limit Theorems for Empirical Distributions
[Feller 1948c] On Probability Problems in the Theory of Counters
[Feller 1949a] A Property of Power Series with Positive Coefficients
[Feller 1949b] On Fluctuations in Coin-tossing
[Feller 1949c] On the Theory of Stochastic Processes, with Particular Reference to Applications
[Feller 1949d] Fluctuation Theory of Recurrent Events

Contents of Volume 2

Preface v
Acknowledgements vii
Contents of Volume 2 xiii

Contents of Volume 1 xvii

Curriculum Vitae: William Feller 1906–1970 xxi

Bibliography of William Feller xxv

PhD Students of William Feller xxxv

R. L. Schilling and W. A. Woyczyński: William Feller. A Biography 1

E. Baake and A. Wakolbinger: Feller’s Contributions to Mathematical


Biology 25

N. Jacob: Feller on Differential Operators and Semi-groups 45

M. Fukushima: Feller’s Contributions to the One-Dimensional


Diffusion Theory and Beyond 63

G. Peskir: On Boundary Behaviour of One-Dimensional Diffusions:


From Brown to Feller and Beyond 77

R. Maller: Feller’s Work in Renewal Theory, the Law of the Iterated


Logarithm and Karamata Theory 95

[Feller 1951a] The Asymptotic Distribution of the Range of Sums of


Independent Random Variables 115

[Feller 1951d] Diffusion Processes in Genetics 121

[Feller 1951e] Two Singular Diffusion Problems 141

[Feller 1952a] The Parabolic Differential Equations and the Associated


Semigroups of Transformation 151

xvii
[Feller 1952b] On a Generalization of Marcel Riesz’ Potentials and the
Semigroups Generated by Them 203

[Feller 1952c] Some Recent Trends in the Mathematical Theory of Diffusion 213

[Feller 1952e] On Positivity Preserving Semigroups of Transformations on C[r1, r2] 231

[Feller 1953a] Semigroups of Transformations in General Weak Topologies 241

[Feller 1953b] On the Generation of Unbounded Semigroups of Bounded Linear Operators 263

[Feller 1954a] The General Diffusion Operator and Positivity Preserving Semigroups in One Dimension 273

[Feller 1954b] Diffusion Processes in One Dimension 293

[Feller 1955a] On Second Order Differential Operators 325

[Feller 1955b] On Differential Operators and Boundary Conditions 341

[Feller 1956a] Boundaries Induced by Non-negative Matrices 355

[Feller 1956b] A Diffusion Equivalent to a Countable Markov Chain 391

[Feller 1956c] Stochastic Processes Connected with Harmonic Functions 395

[Feller 1957a] Generalized Second Order Differential Operators and Their Lateral Conditions 425

[Feller 1957d] On Boundaries and Lateral Conditions for the Kolmogorov Differential Equations 471

[Feller 1957e] On Boundaries Defined by Stochastic Matrices 517

[Feller 1958] On the Intrinsic Form for Second Order Differential Operators 523

[Feller 1959b] The Birth and Death Processes as Diffusion Processes 541

[Feller 1959c] Non-Markovian Processes with the Semigroup Property 587

[Feller 1959d] Differential Operators with the Positive Maximum Property 589

[Feller 1960] Some New Connections Between Probability and Classical Analysis 595

xviii Contents of Volume 2


[Feller 1961a] A Renewal Theorem 613

[Feller 1961b] A Simple Proof for Renewal Theorems 619

[Feller 1963] On the Classical Tauberian Theorems 629

[Feller 1964] On Semi-Markov Processes 635

[Feller 1966a] On the Fourier Representation for Markov Chains and the Strong Ratio Theorem 643

[Feller 1966b] On the Influence of Natural Selection on Population Size 655

[Feller 1966c] Infinitely Divisible Distributions and Bessel Functions Associated with Random Walks 661

[Feller 1967a] On Regular Variation and Local Limit Theorems 673

[Feller 1967c] On Fitness and the Cost of Natural Selection 689

[Feller 1968b] On the Berry-Esseen Theorem 705

[Feller 1968d] An Extension of the Law of the Iterated Logarithm to Variables Without Variance 713

[Feller 1969b] One-sided Analogues of Karamata’s Regular Variation 727

[Feller 1969d] General Analogues to the Law of the Iterated Logarithm 743

[Feller 1969c] Limit Theorems for Probabilities of Large Deviations 749

[Feller 1970] On the Oscillations of Sums of Independent Random Variables 769


Curriculum Vitae:
William Feller 1906–1970

7 July 1906 William Feller is born in Zagreb (German: Agram), Croatia (then part of the Austro-Hungarian empire), as the son of Ida (née Oehmichen) and Eugen Viktor Feller. Feller’s Christian name is Vilibald; during his life Feller uses several translations of his Christian name: Vilim, Willi, Willy, William.
1923–1925 Feller studies mathematics for four semesters at the University of Zagreb (Croatia)
1925–1926 Feller studies mathematics for two semesters at the Georg-August-University, Göttingen (Germany)
1926 PhD at the Georg-August-University, Göttingen (Germany). Advisor: Richard Courant. Thesis: Über algebraisch rektifizierbare transzendente Kurven (On algebraically rectifiable transcendental curves), see [*Feller 1926] and [Feller 1928] in Feller’s bibliography, p. xxv
1926–1928 Post-doc at the Georg-August-University, Göttingen (Germany)
1928 Habilitation in mathematics at Kiel University, Kiel (Germany)
1928–1933 Lecturer and Privatdozent at Kiel University, Kiel (Germany)
1933–1934 Visiting University of Copenhagen (Denmark)
1934–1939 Lecturer at the University of Stockholm (Sweden)
27 July 1938 William Feller marries Clara Maria Nielsen
1939 Editor-in-Chief of the newly founded Mathematical Reviews
1939–1945 Associate professor at Brown University, Providence (RI)
1944 US citizenship
1945–1950 Full professor at Cornell University, Ithaca (NY)
1950–1970 Eugene Higgins Professor of Mathematics at Princeton University, Princeton (NJ)
1966 Permanent title of visiting professor of the Rockefeller University, New York (NY)
1969/1970 National Medal of Science
14 January 1970 William Feller dies of cancer at the age of 63 in New York City (NY)

Handwritten CV from Feller’s application for admission to the PhD examination at
the University of Göttingen. Reproduction courtesy of Universitätsarchiv Göttingen
[Math Nat Prom 0010, 23]. Translation of the text:
Curriculum Vitae. I, Willy Feller, was born on 7 July 1906 in Zagreb (Agram); I
am a Yugoslav citizen and of Roman Catholic confession. In my native town I went to
secondary school (Realgymnasium) and I got the permission of the government to
skip two grades so that I could already graduate from school in June 1923. Then I
started studying mathematics and physics. First I studied from October 1923 until
July 1925, i.e. four semesters, at the University of Zagreb; since the last winter term
I have been enrolled at [the University of] Göttingen. Göttingen, 10 August 1926.

Selected Works of W. Feller, Volume 1 xxiii


Bibliography of William Feller

A star “*” indicates that the respective entry is not contained in these Selecta.

[*Feller 1926] Über algebraisch rektifizierbare transzendente Kurven [On algebraically rectifiable transcendental curves]. PhD-Thesis Universität Göttingen, June 1926. Advisor: Richard Courant. Also published as [Feller 1928].

[Feller 1928] Über algebraisch rektifizierbare transzendente Kurven [On algebraically rectifiable transcendental curves]. Mathematische Zeitschrift 27 (1928) 481–495.

[*Feller 1929] Über die Lösungen der linearen partiellen Differentialgleichungen zweiter Ordnung vom elliptischen Typus [On the solutions of second-order linear partial differential equations of elliptic type]. Habilitationsschrift Universität Kiel, February 1929. Also published as [Feller 1930].

[Feller 1930] Über die Lösungen der linearen partiellen Differentialgleichungen zweiter Ordnung vom elliptischen Typus [On the solutions of second-order linear partial differential equations of elliptic type]. Mathematische Annalen 102 (1930) 633–649.

[*Feller 1931] (with Erhard Tornier) Mengentheoretische Untersuchung von Eigenschaften der Zahlenreihe [Set-theoretic investigation of some properties of the natural numbers]. Zentralblatt 1 (1931) 257–259.

[*Feller 1932a] (with Erhard Tornier) Maß- und Inhaltstheorie des Baireschen Null-
raumes [Measure and integration theory of Baire’s null space]. Mathemati-
sche Annalen 107 (1932) 165–187.

[*Feller 1932b] (with Erhard Tornier) Mengentheoretische Untersuchung von Eigenschaften der Zahlenreihe [Set-theoretic investigation of some properties of the natural numbers]. Mathematische Annalen 107 (1932) 188–232.

[*Feller 1932c] Allgemeine Maßtheorie und Lebesguesche Integration [General measure theory and Lebesgue integration]. Sitzungsberichte der Preußischen Akademie der Wissenschaften, Physikalisch–Mathematische Klasse 27 (1932) 459–472.

[Feller 1934a] (with Herbert Busemann) Zur Differentiation der Lebesgueschen In-
tegrale [On the differentiation of Lebesgue’s integrals]. Fundamenta Mathe-
maticae 22 (1934) 226–256.

[*Feller 1934b] Bemerkungen zur Maßtheorie in abstrakten Räumen [Remarks on measure theory in abstract spaces]. Bulletin international de l’académie Yougoslave des sciences et des beaux-arts, Zagreb. Classe des sciences mathématiques et naturelles 28 (1934) 30–45. Croatian original publication: Dr. Vilim (W.) Feller: Prilog teoriji mjera u apstraktnim prostorima. RAD 249 (1934) 204–224.

[*Feller 1935a] (with Herbert Busemann) Bemerkungen zur Differentialgeometrie der konvexen Flächen. I. Kürzeste Linien auf differenzierbaren Flächen [Remarks on the differential geometry of convex surfaces. I. Shortest lines on differentiable surfaces]. Matematisk Tidsskrift B (1935) 25–36.

[*Feller 1935b] (with Herbert Busemann) Bemerkungen zur Differentialgeometrie der konvexen Flächen. II. Über die Krümmungsindikatrizen [Remarks on the differential geometry of convex surfaces. II. On curvature indicatrices]. Matematisk Tidsskrift B (1935) 87–115.

[Feller 1935c] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung [On the central limit theorem of probability theory]. Mathematische Zeitschrift 40 (1935/36) 521–559. This paper has been translated for the present Selecta.

[*Feller 1936a] (with Herbert Busemann) Bemerkungen zur Differentialgeometrie der konvexen Flächen. III. Über die Gauss’sche Krümmung [Remarks on the differential geometry of convex surfaces. III. On Gauss’ curvature]. Matematisk Tidsskrift B (1936) 41–70.

[Feller 1936b] (with Herbert Busemann) Krümmungseigenschaften konvexer Flächen [Curvature properties of convex surfaces]. Acta Mathematica 66 (1936) 1–47.

[Feller 1936c] Zur Theorie der stochastischen Prozesse. (Existenz- und Ein-
deutigkeitssätze) [On the theory of stochastic processes. (Existence- and
uniqueness theorems)]. Mathematische Annalen 113 (1936) 113–160. This
paper has been translated for the present Selecta.

[Feller 1937a] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. II [On the central limit theorem of probability theory. II]. Mathematische Zeitschrift 42 (1937) 301–312.
Erratum: Mathematische Zeitschrift 44 (1939) 794. This paper has been translated for the present Selecta.

[Feller 1937b] Über das Gesetz der großen Zahlen [On the law of large numbers]. Acta Litterarum ac Scientiarum. Regiae Universitatis Hungaricae Francisco-Josephinae. Sectio Scientiarum Mathematicarum. (Acta. Sci. Litt. Szeged) 8 (1937) 191–201. This paper has been translated for the present Selecta.
[*Feller 1937c] Über die Theorie der stochastischen Prozesse [On the theory of
stochastic processes]. In: Comptes Rendus du Congrès International des
Mathématiciens. Oslo 1936. Vol. 2. A. W. Brøggers Boktrykkeri A/S, Oslo
1937, pp. 194–196.
[*Feller 1938a] Sur les axiomatiques du calcul des probabilités et leurs relations avec
les expériences [On the axioms of probability theory and their relations with
the experience]. In: P. Cantelli, M. Fréchet, R. von Mises, I. F. Steffensen
and A. Wald (eds.): Conférences internationales de sciences mathématiques.
Colloque consacré à la théorie des probabilités. II: Les fondements du calcul
des probabilités. Actualités scientifiques et industrielles 735. Hermann &
Cie., Paris 1938, pp. 7–21.
[*Feller 1938b] Note on regions similar to the sample space. Statistical Research
Memoirs, University College London 2 (1938), 117–125.
[*Feller 1938c] (with J. Runnström and E. Sperber) Die Aufnahme von Glucose
durch Bäckerhefe unter aeroben und anaeroben Bedingungen [On the ab-
sorption of glucose by baker’s yeast under aerobic and anaerobic condi-
tions]. Naturwissenschaften 26 (1938) 547–548.
[Feller 1939a] Die Grundlagen der Volterraschen Theorie des Kampfes ums Da-
sein in wahrscheinlichkeitstheoretischer Behandlung [The foundations of
Volterra’s theory on the struggle for life in a probabilistic treatment]. Acta
Biotheoretica, Leiden 5 (1939) 11–39. This paper has been translated for
the present Selecta.
[Feller 1939b] Completely monotone functions and sequences. Duke Mathematical
Journal 5 (1939) 661–674.
[Feller 1939c] Über die Existenz von sogenannten Kollektiven [On the existence of
so-called Kollektivs]. Fundamenta Mathematicae 32 (1939) 87–96. This paper
has been translated for the present Selecta.
[Feller 1939d] Neuer Beweis für die Kolmogoroff–P. Lévysche Charakterisierung
der unbeschränkt teilbaren Verteilungsfunktionen [A new proof of Kolmogoroff’s
and P. Lévy’s characterization of infinitely divisible distribution
functions]. Bulletin international de l’académie Yougoslave des sciences et
des beaux-arts, Zagreb. Classe des sciences mathématiques et naturelles
32 (1939) 106–113. Croatian original publication: Vilim (W.) Feller: O
Kolmogoroff–P. Lévyjevu predočivanju beskonačno djeljivih funkcija reparticije.
RAD 263 (1939) 95–112.
[*Feller 1940a] On the logistic law of growth and its empirical verifications in biol-
ogy. Acta Biotheoretica, Leiden 5 (1940) 51–65.

[*Feller 1940b] On the time distribution of so-called random events. Physical Re-
views, II. Series 57 (1940) 906–908.

[Feller 1940c] On the integro-differential equations of purely discontinuous Markoff processes. Transactions of the American Mathematical Society 48 (1940) 488–515.

[*Feller 1940d] Statistical aspects of ESP. Journal of Parapsychology 4 (1940) 271–298.

[Feller 1941a] On the integral equation of renewal theory. Annals of Mathematical Statistics 12 (1941) 243–267.

[*Feller 1941b] (with Jacob David Tamarkin) Partial Differential Equations. Brown
University Summer Session for Advanced Instructions & Research in Me-
chanics. 6/23- 9/13 1941, Providence (RI) 1941.
Chapters 1–3 by J. D. Tamarkin, Chapters 4–7 by William Feller.
Reprinted by the Lewis Flight Propulsion Laboratory, National Committee
for Aeronautics, Cleveland 1956.

[Feller 1942] Some geometric inequalities. Duke Mathematical Journal 9 (1942) 885–892.

[*Feller 1943a] On A. C. Aitken’s method of interpolation. Quarterly of Applied Mathematics 1 (1943) 86–87.

[Feller 1943b] Generalization of a probability limit theorem of Cramér. Transactions of the American Mathematical Society 54 (1943) 361–372.

[Feller 1943c] The general form of the so-called law of the iterated logarithm. Transactions of the American Mathematical Society 54 (1943) 373–402.

[Feller 1943d] On a general class of “contagious” distributions. Annals of Mathematical Statistics 14 (1943) 389–400.

[Feller 1945a] On the normal approximation to the binomial distribution. Annals of Mathematical Statistics 16 (1945) 319–329.

[Feller 1945b] The fundamental limit theorems in probability. Bulletin of the American Mathematical Society 51 (1945) 800–832.

[Feller 1945c] (with Herbert Busemann) Regularity properties of a certain class of surfaces. Bulletin of the American Mathematical Society 51 (1945) 583–598.

[Feller 1945d] Note on the law of large numbers and “fair” games. Annals of Math-
ematical Statistics 16 (1945) 301–304.

[Feller 1946a] A limit theorem for random variables with infinite moments. Ameri-
can Journal of Mathematics 68 (1946) 257–262.

[Feller 1946b] The law of the iterated logarithm for identically distributed random
variables. Annals of Mathematics 47 (1946) 631–638.
[Feller 1948a] On the Kolmogorov–Smirnov limit theorems for empirical distribu-
tions. Annals of Mathematical Statistics 19 (1948) 177–189.
Erratum: Annals of Mathematical Statistics 21 (1950) 301–302.
[*Feller 1948b] Spanish Translation of the paper [Feller 1945b] Revista Matemática
Hispano-Americana, IV. Seria 8 (1948) 95–132.
[Feller 1948c] On probability problems in the theory of counters. In: Studies, Es-
says, Presented to R. Courant (Courant Anniversary Volume). Interscience
Publishers, New York 1948, pp. 105–115.
[Feller 1949a] (with Paul Erdös and Harry Pollard) A property of power series
with positive coefficients. Bulletin of the American Mathematical Society
55 (1949) 201–204.
[Feller 1949b] (with Kai Lai Chung) On fluctuations in coin-tossing. Proceedings of
the National Academy of Sciences, USA 35 (1949) 605–608.
[Feller 1949c] On the theory of stochastic processes, with particular reference to ap-
plications. In: J. Neyman (ed.): Proceedings of the (First) Berkeley Sym-
posium on Mathematical Statistics and Probability 1945/46. University of
California Press, Berkeley, Los Angeles 1949, pp. 403–432.
[Feller 1949d] Fluctuation theory of recurrent events. Transactions of the American
Mathematical Society 67 (1949) 98–119.
[*Feller 1950] An Introduction to Probability Theory and Its Applications. John Wi-
ley & Sons, New York 1950.
[Feller 1951a] The asymptotic distribution of the range of sums of independent ran-
dom variables. Annals of Mathematical Statistics 22 (1951) 427–432.
[*Feller 1951b] (with George E. Forsythe) New matrix transformations for obtaining
characteristic vectors. Quarterly of Applied Mathematics 8 (1951) 325–331.
[*Feller 1951c] The problem of n liars and Markov chains. American Mathematical
Monthly 58 (1951) 606–608.
[Feller 1951d] Diffusion processes in genetics. In: J. Neyman (ed.): Proceedings of
the Second Berkeley Symposium on Mathematical Statistics and Probability
1950. University of California Press, Berkeley, Los Angeles (CA) 1951, pp.
227–246.
[Feller 1951e] Two singular diffusion problems. Annals of Mathematics 54 (1951)
173–182.
[Feller 1952a] The parabolic differential equations and the associated semigroups of
transformation. Annals of Mathematics 55 (1952) 468–519.

[Feller 1952b] On a generalization of Marcel Riesz’ potentials and the semigroups
generated by them. In: Meddelanden från Lunds Universitets Matematiska
Seminarium. Supplement M. Riesz, Uppsala 1952, pp. 73–81.
[Feller 1952c] Some recent trends in the mathematical theory of diffusion. In: Pro-
ceedings of the International Congress of Mathematicians, Cambridge (MA)
1950. Vol. 2. American Mathematical Society, Providence (RI) 1952, pp.
322–339.
[*Feller 1952d] (with George E. Forsythe) New matrix transformations for obtain-
ing characteristic vectors. In: Proceedings of the International Congress of
Mathematicians, Cambridge (MA) 1950. Vol. 1. American Mathematical So-
ciety, Providence (RI) 1952, p. 661.
[Feller 1952e] On positivity preserving semigroups of transformations on C[r1 , r2 ].
Annales de la Société Polonaise de Mathématique 25 (1952) 85–94.
[Feller 1953a] Semigroups of transformations in general weak topologies. Annals of
Mathematics 57 (1953) 287–308.
[Feller 1953b] On the generation of unbounded semigroups of bounded linear oper-
ators. Annals of Mathematics 58 (1953) 166–174.
[Feller 1954a] The general diffusion operator and positivity preserving semigroups
in one dimension. Annals of Mathematics 60 (1954) 417–436.
[Feller 1954b] Diffusion processes in one dimension. Transactions of the American
Mathematical Society 77 (1954) 1–31.
[Feller 1955a] On second order differential operators. Annals of Mathematics 61
(1955) 90–105.
[Feller 1955b] On differential operators and boundary conditions. Communications
on Pure and Applied Mathematics 8 (1955) 203–216.
[Feller 1956a] Boundaries induced by non-negative matrices. Transactions of the
American Mathematical Society 83 (1956) 19–54.
[Feller 1956b] (with Henry P. McKean Jr.) A diffusion equivalent to a countable
Markov chain. Proceedings of the National Academy of Sciences, USA 42
(1956) 351–354.
[Feller 1956c] (with Joanne Elliott) Stochastic processes connected with harmonic
functions. Transactions of the American Mathematical Society 82 (1956)
392–420.
[*Feller 1956d] On generalized Sturm–Liouville operators. In: J. B. Diaz and
L. E. Payne (eds.): Proceedings of the Conference on Differential Equations
(Dedicated to A. Weinstein), University of Maryland Book Store, College
Park (MD) 1956, pp. 251–270.

[Feller 1957a] Generalized second order differential operators and their lateral con-
ditions. Illinois Journal of Mathematics 1 (1957) 459–504.
[*Feller 1957b] The numbers of zeros and of changes of sign in a symmetric random
walk. L’Enseignement Mathématique, IIe. Série 3 (1957) 229–235.
[*Feller 1957c] Sur une forme intrinsèque pour les opérateurs différentiels du second
ordre [On an intrinsic form of second-order differential operators]. Publica-
tions de l’Institut de Statistique de l’Université de Paris. Université de Paris
VI, Institut de Statistique, Paris 6 (1957) 291–301.
[Feller 1957d] On boundaries and lateral conditions for the Kolmogorov differential
equations. Annals of Mathematics 65 (1957) 527–570.
Additional notes: Annals of Mathematics 68 (1958) 735–736.
[Feller 1957e] On boundaries defined by stochastic matrices. In: Proceedings of
Symposia in Applied Mathematics. Vol. 7. McGraw-Hill Book Co., New
York (for the American Mathematical Society, Providence (RI)) 1957, 35–
40.
[*Feller 1957f] An Introduction to Probability Theory and Its Applications. Vol. 1.
2nd edn. of [*Feller 1950]. John Wiley & Sons, New York 1957.
[Feller 1958] On the intrinsic form for second order differential operators. Illinois
Journal of Mathematics 2 (1958) 1–18.
[*Feller 1959a] On combinatorial methods in fluctuation theory. In: U. Grenander
(ed.): The Harald Cramér Volume. Almqvist & Wiksell, Stockholm; John
Wiley & Sons, New York 1959, pp. 75–91.
[Feller 1959b] The birth and death processes as diffusion processes. Journal de
Mathématiques Pures et Appliquées, IX. série 38 (1959) 301–345.
[Feller 1959c] Non-Markovian processes with the semigroup property. Annals of
Mathematical Statistics 30 (1959) 1252–1253.
[Feller 1959d] Differential operators with the positive maximum property. Illinois
Journal of Mathematics 3 (1959) 182–186.
[*Feller 1959e] On the equation of the vibrating string. Journal of Mathematics and
Mechanics 8 (1959) 339–348.
[Feller 1960] Some new connections between probability and classical analysis. Pro-
ceedings of the International Congress of Mathematicians, Edinburgh 1958.
Cambridge University Press, Cambridge 1960, pp. 69–86.
[Feller 1961a] (with Steven Orey) A renewal theorem. Journal of Mathematics and
Mechanics 10 (1961) 619–624.
[Feller 1961b] A simple proof for renewal theorems. Communications on Pure and
Applied Mathematics 14 (1961) 285–293.

[*Feller 1961c] Chance processes and fluctuations. In: E. F. Beckenbach and
M. R. Hestenes (eds.): Modern Mathematics for the Engineer: Second Series.
University of California Engineering Extension Series. McGraw-Hill,
New York 1961, pp. 167–181.
[Feller 1963] On the classical Tauberian theorems. Archiv der Mathematik 14 (1963)
317–322.
[Feller 1964] On semi-Markov processes. Proceedings of the National Academy of
Sciences, USA 51 (1964) 653–659.
[Feller 1966a] On the Fourier representation for Markov chains and the strong ratio
theorem. Journal of Mathematics and Mechanics 15 (1966) 273–283.
[Feller 1966b] On the influence of natural selection on population size. Proceedings
of the National Academy of Sciences, USA 55 (1966) 733–738.
[Feller 1966c] Infinitely divisible distributions and Bessel functions associated with
random walks. SIAM Journal on Applied Mathematics 14 (1966) 864–875.
[*Feller 1966d] An Introduction to Probability Theory and Its Applications. Vol. 2.
John Wiley & Sons, New York 1966.
[Feller 1967a] On regular variation and local limit theorems. In: J. Neyman and
L. LeCam (eds.): Proceedings of the Fifth Berkeley Symposium on Mathematical
Statistics and Probability 1965/66, Vol. 2, Pt. 1, University of California
Press, Berkeley, Los Angeles (CA) 1967, pp. 373–388.
[*Feller 1967b] A direct proof of Stirling’s formula. American Mathematical
Monthly 74 (1967) 1223–1225.
Erratum: American Mathematical Monthly 75 (1968) 518.
[Feller 1967c] On fitness and the cost of natural selection. Genetics Research 9
(1967) 1–15.
[*Feller 1968a] On Müntz’ theorem and completely monotone functions. American
Mathematical Monthly 75 (1968) 342–350.
[Feller 1968b] On the Berry-Esseen theorem. Zeitschrift für Wahrscheinlichkeits-
theorie und verwandte Gebiete 10 (1968) 261–268.
[*Feller 1968c] On probabilities of large deviations. Proceedings of the National
Academy of Sciences, USA 61 (1968) 1224–1227.
[Feller 1968d] An extension of the law of the iterated logarithm to variables without
variance. Journal of Mathematics and Mechanics 18 (1968) 343–355.
[*Feller 1968e] An Introduction to Probability Theory and Its Applications. Vol. 1.
3rd edn. of [*Feller 1950], [*Feller 1957f]. John Wiley & Sons, New York
1968.

[*Feller 1969a] A geometrical analysis of fitness in triply allelic systems. Mathemat-
ical Biosciences 5 (1969) 19–38.
[Feller 1969b] One-sided analogues of Karamata’s regular variation.
L’Enseignement Mathématique, IIe. Série 15 (1969) 107–121.
[Feller 1969c] Limit theorems for probabilities of large deviations. Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 14 (1969) 1–20.
[Feller 1969d] General analogues to the law of the iterated logarithm. Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 14 (1969) 21–26.
[*Feller 1969e] On the fluctuations of sums of independent random variables. Pro-
ceedings of the National Academy of Sciences, USA 63 (1969) 637–639.
[*Feller 1969f] Are life scientists overawed by Statistics? (Too much faith in statis-
tics). Scientific Research 4.3 (1969) 24–29.
[Feller 1970] On the oscillations of sums of independent random variables. Annals
of Mathematics 91 (1970) 402–418.
[*Feller 1971] An Introduction to Probability Theory and Its Applications. Vol. 2.
2nd edn. of [*Feller 1966d]. John Wiley & Sons, New York 1971. Posthu-
mously published. The final version was published with the help of J. Gold-
man, A. Grunbaum, H. P. McKean, L. Pitt and A. Pittenger.

Textbooks
A star “*” indicates that the respective entry is not contained in the present selection.
[*Feller 1941b] (with Jacob David Tamarkin) Partial Differential Equations. Brown
University Summer Session for Advanced Instructions & Research in Me-
chanics. 6/23- 9/13 1941, Providence (RI) 1941.
Chapters 1–3 by J. D. Tamarkin, Chapters 4–7 by William Feller.
Reprinted by the Lewis Flight Propulsion Laboratory, National Committee
for Aeronautics, Cleveland 1956.
[*Feller 1950] An Introduction to Probability Theory and Its Applications. John Wi-
ley & Sons, New York 1950.
[*Feller 1957f] An Introduction to Probability Theory and Its Applications. Vol. 1.
2nd edn. of [*Feller 1950]. John Wiley & Sons, New York 1957.
[*Feller 1966d] An Introduction to Probability Theory and Its Applications. Vol. 2.
John Wiley & Sons, New York 1966.
[*Feller 1968e] An Introduction to Probability Theory and Its Applications. Vol. 1.
3rd edn. of [*Feller 1950], [*Feller 1957f]. John Wiley & Sons, New York
1968.

[*Feller 1971] An Introduction to Probability Theory and Its Applications. Vol. 2.
2nd edn. of [*Feller 1966d]. John Wiley & Sons, New York 1971. Posthu-
mously published. The final version was published with the help of J. Gold-
man, A. Grunbaum, H. P. McKean, L. Pitt and A. Pittenger.

Unpublished Lecture Notes


A star “*” indicates that the respective entry is not contained in the present selection.

[*Feller 1929/30] Partielle Differentialgleichungen der Physik. Kiel W.S. 1929/30 [Partial Differential Equations of Physics. Kiel Winter Term 1929/30]. Handwritten lecture notes (in German, approx. 100 pp.). Deposited as a bound manuscript at Princeton University Library.

[*Feller 1942b] Vectors in a Plane; Notes for Mathematics 5, Summer 1942. Brown University, Providence (RI) 1942. (Brown University Library: SCI – Level 7, Aisle 1B 1-SIZE QA261.F45)

[*Feller 1945e] Lectures on probability. Brown University, Semester II, 1944-1945. Brown University, Providence (RI) 1945. (Brown University Library: SCI QA273.F34)

PhD Students of William Feller

1. Forsythe, George E.: Riesz summability methods of order r, for R(r) < 0, Ce-
saro summability of independent random variables. (Brown University 1941)
2. Kincaid, Wilfred MacDonald: Part I: On non-cut sets of locally connected
continua. Part II: An application of orthogonal moments to problems in statis-
tically indeterminate structures. Part III: Numerical methods for finding char-
acteristic roots and vectors of matrices. (Brown University 1946)
3. Juncosa, Mario L.: On the asymptotic behavior of the minimum in a sequence
of random variables. (Cornell 1949)
4. Weber, Maria A.: The solution of a linear differential equation of parabolic
type. (Cornell 1949)
5. Seifert, George H.: Some third order boundary value problems. (Cornell 1950)
6. Shapley, Lloyd Stowell: Additive and non-additive set functions. (Princeton
1953. Co-advisor: A. W. Tucker)
7. Murdoch, Brian Hughes: Preharmonic functions. (Princeton 1955)
8. Billingsley, Patrick Paul: The invariance principle for dependent random vari-
ables. (Princeton 1955)
9. McKean, Henry P. Jr.: Sample functions of stable processes. (Princeton 1955)
10. Trotter, Hale F.: Convergence of semi-groups of operators. (Princeton 1956)
11. Gaver, Donald Paul: Some results in the theory of queues. (Princeton 1956)
12. Knight, Frank Beardsley: Construction of diffusion processes by means of ran-
dom walks. (Princeton 1959)
13. George, Melvin D.: The approximation of solutions of nonlinear differential
equations. (Princeton 1959)
14. Freedman, David Amiel: Mixtures of stochastic processes. (Princeton 1960)
15. Shepp, Lawrence A.: Recurrent sums of random variables. (Princeton 1961)

16. Schay, Geza: The equations of diffusion in the special theory of relativity.
(Princeton 1961)
17. Goldman, Jay Robert: Stochastic point processes: limit theorems and infinite
divisibility. (Princeton 1965)
18. Silverstein, Martin Louis: Many particle processes. (Princeton 1965)
19. Weiss, Benjamin: Vibrating systems and positivity preserving semi-groups.
(Princeton 1965)

20. Kurtz, Robert: On regularly varying functions in analysis and probability.
(Princeton 1966)
21. Pitt, Loren Dallas: Two problems in Markov processes. Extending the life span
of Markov processes. Products of Markovian semi-groups. (Princeton 1967)
22. Cleveland, William Swain II: Time series projections, theory and practice.
(Yale 1969. Co-advisor: L. J. Savage)

William Feller, 1906–1970¹,ᵃ ¶ iv

William Feller died on January 14, 1970, at the age of 63, at Memorial Hospital
in New York. He was a member of the National Academy of Sciences, of
the American Academy of Arts and Sciences, of the Danish and Yugoslavian
Academies of Sciences, fellow of the Royal Statistical Society, past governor of
the Mathematical Association of America, former president of the Institute of
Mathematical Statistics (1946). A few days before his death he had learned
also of his election as honorary member of the London Mathematical Society
and of the decision to award him the National Medal of Science, which his wife
Clara was to receive in his stead on February 16 at the White House. These
outward and visible honors confirm his position in science, to which is added
our affection for his gaiety, enthusiasm, gentleness, and responsiveness.
Will Feller was born in Zagreb, Yugoslavia, on July 7, 1906; he attended
the University there from 1923 to 1925, leaving with a degree equivalent to our
Master of Science. From 1925 to 1928 he worked at the University of Göttin-
gen, where he received the Ph.D. in 1926, at the age of twenty. At Göttingen
he had the good fortune to become acquainted with David Hilbert, always
his ideal mathematician, as well as with Richard Courant, who recognized his
promise and encouraged him to become a mathematician in earnest. In 1928
he went as Privatdozent to the University of Kiel, but left there in 1933 after
refusing to sign a Nazi oath. He passed a year in Copenhagen, where he came
to know Harald Bohr and his brother Niels, and then five years (1934-1939)
at the University of Stockholm, in the vicinity of Marcel Riesz and Harald
Cramér. It was during his last year there, on July 27, 1938, that he married
1 Editor’s note. This article, prepared by the Editors, is a combination of the memorial
resolution by the Faculty of Princeton University and the documents supporting Feller’s
nomination for the National Medal of Science.
a This text is retyped, with permission, from the obituary which appeared in The Annals
of Mathematical Statistics 41 (1970), No. 6, pp. iv–xiii. © Institute of Mathematical
Statistics. The obituary contained a rather complete bibliography of W. Feller, but we
decided to refer to the more comprehensive bibliography compiled by the present editors
which appears on pp. xxv–xxxiv; all citations of the form [Feller 19nn], resp., [*Feller 19nn]
(if the respective paper is not included in these Selecta) point to this bibliography. The
pilcrow ¶ indicates a page break in the original text, and the original pagination is indicated
in the margin.

Clara Nielsen, who had been his student at Kiel.
In 1939 Will and Clara moved to Providence, Rhode Island, where Will
became associate professor at Brown University as well as the first executive
editor of Mathematical Reviews; he deserves the gratitude of mathematicians
for his six years of effort establishing the new journal, now the leading review
of mathematics in the world. In 1944, toward the end of the period at Brown,
Will became a citizen of the United States (District Court, Providence). The
following year he accepted a professorship at Cornell University. Finally, in
1950, he came to Princeton as Eugene Higgins Professor of Mathematics, a po-
sition which he held until his death. In 1966 he was appointed also Permanent
Visiting Professor at Rockefeller University where, during two years of leave
from Princeton in 1965-66 and 1967-68, he found himself serving in part as li-
aison between geneticists and mathematicians. Feller’s work had great variety
and scope, for he contributed to calculus, geometry and functional analysis.
But about half of his papers lie in the field of probability. Only these, his best
known papers, are discussed here.
¶ v ¶ The most notable of Feller’s papers on probability before 1950 dealt with
the classical limit theorems of probability, which concern the asymptotic behavior
of the sums $S_n = X_1 + \cdots + X_n$ of a sequence $X_1, X_2, \ldots$ of independent
random variables. One supposes the sequence $S_1, S_2, \ldots$ to be essentially divergent,
and in the classical theory one assumes the mean value $m_n = E(S_n)$
and the variance $v_n = s_n^2 = E(S_n^2) - m_n^2$ to exist. In 1930, when Feller was 24
years old, the state of development was this:
Kolmogorov had just given necessary and sufficient conditions for the law
of large numbers in the classical version, that is to say, conditions on the $X_n$
to ensure that, for every $\varepsilon > 0$,
(1) $\Pr\{|S_n - m_n| < n\varepsilon\} \to 1$ as $n \to \infty$,
or more intuitively that for large $n$ the average $(X_1 + \cdots + X_n)/n$ is pretty sure
to be close to its mean value $m_n/n$.
The central limit theorem, whose history goes back to De Moivre and
Laplace, seemed to be near its final form. Indeed, the existence of the mean
and variance being granted, rather weak additional conditions were known ensuring
that the reduced variable $(S_n - m_n)/s_n$ have a distribution close to the
unit normal.
The foregoing investigations treated only the distribution measures of the
sums. Much more delicate is the study of the asymptotic behavior of the
sequence $S_1, S_2, \ldots$ itself, first properly formulated in 1909 by Émile Borel.
After notable advances by Hausdorff, Hardy and Littlewood, and Khinchin,
Kolmogorov arrived in 1929 at what seemed the last word, the famous law of
the iterated logarithm, which says that almost certainly
(2) $\limsup_{n \to \infty} \dfrac{S_n - m_n}{\sqrt{2 v_n \log\log v_n}} = 1$
provided the $X_n$ satisfy rather strong conditions of boundedness.
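To get a feel for statement (2), here is a small simulation sketch. It assumes the simplest case of fair ±1 steps, for which $m_n = 0$ and $v_n = n$; the function name `lil_ratios`, the seed and the cutoff `start` are illustrative choices that do not appear in the original text. A finite simulation can only suggest, never verify, an almost-sure limit statement.

```python
import math
import random


def lil_ratios(n_steps, seed=0, start=100):
    """Simulate a fair +/-1 random walk S_n and return the normalized values
    (S_n - m_n) / sqrt(2 v_n log log v_n) for n = start, ..., n_steps.

    For fair +/-1 steps m_n = 0 and v_n = n. The cutoff `start` must be
    at least 3 so that log log n is positive.
    """
    if start < 3:
        raise ValueError("start must be at least 3")
    rng = random.Random(seed)  # seeded generator for reproducibility
    s = 0
    ratios = []
    for n in range(1, n_steps + 1):
        s += rng.choice((-1, 1))
        if n >= start:
            ratios.append(s / math.sqrt(2 * n * math.log(math.log(n))))
    return ratios


if __name__ == "__main__":
    r = lil_ratios(100_000)
    print("largest normalized value:", max(r))
    print("smallest normalized value:", min(r))
```

For moderate n the normalized values usually lie well inside the interval from -1 to 1; the lim sup in (2) is approached only very slowly, which is one reason the theorem is delicate.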

2 The Annals of Mathematical Statistics 41 (1970), No. 6, pp. iv–xiii


Other problems concerning the sums Sn had not received such elaboration.
In particular, problems of renewal theory, so important in biology, population
research or engineering, had been solved in many particular instances but had
not yet been brought to the attention of mathematicians capable of establishing
the general theory. (In this context, $X_n$ is the lifetime of the $n$th replacement;
all $X_n$ have the same distribution; the quantity of interest is the age at time
$t$ of the replacement then in use, that is to say, $Y(t) = t - S_n$, where $n$ is
determined by the inequalities $S_n < t \le S_{n+1}$, and one seeks conditions for the
distribution of $Y(t)$ to converge as $t$ increases.)
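The definition of the age $Y(t)$ can be made concrete with a short sketch. The function name `age_at` and the convention $S_0 = 0$ (no replacement has occurred before the first lifetime ends) are assumptions added for illustration.

```python
from itertools import accumulate


def age_at(t, lifetimes):
    """Return Y(t) = t - S_n, where n is determined by S_n < t <= S_{n+1}.

    `lifetimes` holds X_1, X_2, ... (positive numbers); S_0 = 0 by convention.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    last_renewal = 0  # S_0
    for s in accumulate(lifetimes):  # s runs through S_1, S_2, ...
        if s >= t:
            # s is S_{n+1}, and last_renewal is S_n < t
            return t - last_renewal
        last_renewal = s
    raise ValueError("the given lifetimes do not reach time t")


if __name__ == "__main__":
    # Lifetimes 2, 3, 4 give renewal times S_1 = 2, S_2 = 5, S_3 = 9.
    print(age_at(6, [2, 3, 4]))
```

With lifetimes 2, 3, 4 and $t = 6$, the inequalities $S_2 = 5 < 6 \le 9 = S_3$ select $n = 2$, so the age is $6 - 5$.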
Let us follow the development of these topics after 1930.
In 1935 both Feller [Feller 1935c] and Paul Lévy, quite separately, showed
the mean $m_n$ and variance $s_n^2$ to be altogether foreign to the proper statement
of the central limit theorem, then proceeded to give simple conditions necessary
and sufficient for the distribution of $S_n$ to be nearly normal. Feller was
not content with this ‘best possible’ result; he returned to the problem time
and again to simplify the proof, to calculate explicit estimates [Feller 1943b],
[Feller 1968b], to study particular examples [Feller 1945a]. In 1937 Feller
[Feller 1937b] similarly pointed out the lack of naturality of the mean ¶ vi ¶ $m_n$
and the scaling factor $n$ in the formulation (1) of the law of large numbers,
then established the definitive version.
The law of the iterated logarithm had a startling continuation. First Lévy
gave new insight into the situation by noting that any sequence $(a_n)$ of constants
must be either a ‘lower sequence’ for the $S_n$, or else an ‘upper sequence’;
it is a lower (or upper) sequence if almost certainly the inequality $S_n < a_n$ (or
$S_n > a_n$) holds for only finitely many values of $n$. The discovery initiated
the problem of finding a criterion for upper and lower sequences, which was
accomplished by Kolmogorov for the simplest choices of the $X_n$. (His proof
was never published, but one can be found hidden in a paper by Petrovsky,
or explicit in a paper of 1942 by Erdős.) Finally, in 1943, Feller [Feller 1943c]
established the generalization of Kolmogorov’s criterion to arbitrary random
variables, under mild and reasonable conditions of regularity. The paper re-
mains certainly one of the most intense, profound arguments in mathematics.
Although difficult to read, because of the complexity of the reasoning, it has
still served as the model in searching for similarly complete answers to such
questions as the growth of $M_n = \max(S_1, \ldots, S_n)$ or the number of changes of
sign of the $S_n$. In recent lectures [Feller 1969d] Feller has recast the argument
and one must admit, in tribute to his exposition, that the new version not
only displays the author as the one best understanding his own work but also
provides the key to using his technique. He has also simplified the proof of
Kolmogorov’s law (2) in order to make it accessible to the beginning student.
Characteristically, Feller has investigated what can be said when the restric-
tions imposed in [Feller 1943c] are not assumed; for example, [Feller 1946b]
and the recent paper [Feller 1968d] both contain unexpected results.
The development of renewal theory illustrates both Feller’s power as a
mathematician and his interest in the applications of probability theory. His

Z. W. Birnbaum — Selected Works of W. Feller, Volume 1 3


paper [Feller 1940c] presented what was known in 1940, furnishing proofs,
unifying the theory, bringing the problems to the attention of mathemati-
cians. Exact results were known in several examples, asymptotic expansions
had been found under stringent assumptions, but the ultimate simplicity and
beauty of the subject were still concealed. The main problem turned out to be,
in the notation introduced above, the proof that $E(Y(t))$ approached $E(X_1)$,
a problem that can be phrased in integral equations or power series with-
out mention of probability; it was the decided opinion of most experts that
some supplementary hypothesis (beyond a certain obvious one) was needed
to ensure convergence. However, in the paper [Feller 1949a] of 1949, Erdős,
Feller and Pollard succeeded in showing convergence to hold universally, a re-
sult of extreme importance in theoretical applications, as shown by the paper
[Feller 1949d] on the theory of recurrent events. Feller had never tired of the
subject; for example, among his papers of 1961, [Feller 1961b] gives a simple
proof based on an idea of Hunt and Choquet and Deny, while [Feller 1961a]
introduces new methods which yield a generalization of the standard theorem.
Although Feller continued his contributions to classical problems, the years
1950 to 1962 saw him engaging all his effort in work of another spirit: the
creation of a theory of diffusion which combines functional analysis, differential
equations and probability. There was plenty of work for all: During this period
¶ vii ¶ twelve students wrote their theses under Feller, most of them on topics
related to diffusion. Mathematicians outside Princeton also took part in the
development, so that today diffusion in one dimension is perhaps the most
thoroughly investigated of stochastic phenomena.
A few historical remarks are needed to place this accomplishment in the
right setting.
Kolmogorov, in a remarkable paper of 1931, showed that a Markov process
satisfies certain integro-differential equations, the best known instance being
Brownian motion and the heat equation (an early result of Einstein). Con-
sequently there arose, for equations of a certain type, the questions of the
existence and the uniqueness of a corresponding Markov process, treated in
detail by Feller in [Feller 1936c] and [*Feller 1940b]. The exact relations re-
mained unclear, as Feller himself often pointed out in attempts to engage other
mathematicians in fruitful research. In addition, the study of other fields, es-
pecially genetics, had acquainted him with a number of situations concerning
differential operators where it was important to classify the totality of corre-
sponding processes.
Thus Feller was led to study, from a new point of view, the parabolic equa-
tion ∂u/∂t = Lu, with L a second-order differential operator on a linear inter-
val. It turned out that an associated Markov process corresponded to a ‘bound-
ary condition’, but one which might differ markedly from those familiar in the
theory of differential equations. The theory of semigroups of operators entered
as an essential tool in formulating and solving the analytic problems; the idea
was to view a boundary condition as a restriction on the domain of L which
made the restricted operator the infinitesimal generator of an appropriate semigroup.
The papers [Feller 1951e, Feller 1951d, Feller 1952c, Feller 1952a] settled
the main problems, [Feller 1953a], [Feller 1953b] and much later
[Feller 1953b] dealt with related questions about abstract semigroups of oper-
ators, and [Feller 1954b] presented the interpretation of the various boundary
conditions in terms of the behavior of the paths of the Markov processes.
Feller’s next step was to abstract and generalize the notion of differential op-
erator appearing in diffusion theory, and especially to avoid reference to an ir-
relevant differentiable structure. He found that every such operator (provided
certain degeneracies are excluded) can be written, essentially uniquely, in the
form (d/dμ)(d/dx), where x is a coordinate function (the scale parameter)
and μ is an increasing function (the speed measure), both intrinsically associ-
ated to the operator; these papers [Feller 1954a], [Feller 1955a], [Feller 1955b],
[*Feller 1956d], [Feller 1957a], [Feller 1958], [Feller 1959d] made the study of
the most general linear diffusion hardly more difficult than the study of Brow-
nian motion.
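As a sketch of how this canonical form arises in the classical smooth case (an illustration under added assumptions, not Feller’s general construction), suppose $L = a(x)\,d^2/dx^2 + b(x)\,d/dx$ with $a > 0$ and $a$, $b$ continuous on the interval. The symbols $a$, $b$, $B$, $s$, $m$ and the base point $x_0$ are introduced here for illustration only; $s$ plays the role of the scale $x$ and $m$ that of $\mu$ in the text.

```latex
\begin{align*}
B(x) &= \int_{x_0}^{x} \frac{b(y)}{a(y)}\,dy, &
s'(x) &= e^{-B(x)}, &
m'(x) &= \frac{e^{B(x)}}{a(x)}, \\
\frac{d}{dm}\frac{d}{ds} f
  &= \frac{1}{m'(x)} \frac{d}{dx}\!\left( \frac{f'(x)}{s'(x)} \right)
   = a(x)\,e^{-B(x)} \frac{d}{dx}\!\left( e^{B(x)} f'(x) \right)
   = a(x) f''(x) + b(x) f'(x).
\end{align*}
```

For Brownian motion, with $L = \tfrac12\,d^2/dx^2$, one may take $s(x) = x$ and $m(x) = 2x$: the scale is the coordinate itself and the speed measure is twice Lebesgue measure.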
Feller himself was the first to apply the ideas to other situations. In
[Feller 1959b] he showed how simply one can treat the birth-and-death process
by introducing the right parameters. In [Feller 1956a], [Feller 1957d], return-
ing to the Kolmogorov equations for discontinuous processes, he initiated the
general theory of boundaries for Markov processes, a subject which has had
a tremendous growth in the last decade. In two recent papers [Feller 1966b],
[Feller 1967c], the outcome of discussions with Th. Dobzhansky at the Rock-
efeller Institute, he dealt with the ‘Haldane paradox’: evolution by natural
selection involves as ‘cost’ a number of ‘genetic deaths’, often so great that
evolutionary change can occur only very slowly. Feller succeeded in showing
¶ viii ¶ that the Haldane paradox is spurious; rather amusingly, as early as 1952, in
the address [Feller 1951d], he had already analyzed the assumptions implicit in
the current mathematical theory of evolution and had warned against the one
of constant population size, which was later to lead to the Haldane paradox.
Feller’s activity increased with the years, and he never cast aside a problem
once begun, even when the solution appeared complete. Many of the twelve
papers written or published in his last two years treat interests going back
to the thirties. Among them is the paper of which he was proudest, General
analogues of the law of the iterated logarithm [Feller 1969d]; surpassing the
great paper [Feller 1943c] of 1943 in simplicity and generality, it is the outcome
of thirty years of struggle to clarify his understanding.
His own research will of course remain Feller’s principal contribution to
mathematics, but he has helped the progress of science and humanity in many
other ways. He always seemed to have time for discussion with students or
colleagues – indeed, with almost anyone and on any subject. His students, es-
pecially, recall the patience with which he guided them and the endless hours he
spent listening to their conjectures or voicing his own. His expository lectures
and articles have done a great deal to spread the knowledge and recognition of
probability throughout the world. His books catch the flavor of his mathemat-
ics, though only his presence could convey the full enthusiasm of his lectures.



The jacket cover of the third edition of Volume I bears this appreciation from
Gian-Carlo Rota of MIT: ‘. . . one of the great events in mathematics of this
century. Together with Weber’s Algebra and Artin’s Geometric Algebra this
is the finest text book in mathematics in this century. It is a delight to read
and it will be immensely useful to scientists in all fields.’ The book achieves
a remarkable popular appeal with no sacrifice of mathematical rigor, and is
good evidence of Feller’s contention that accurate theory has great practical
value and interest.

8 6th Berkeley Symposium Math. Statist. Probab. (1972), Vol II, pp. xv-xx
J. L. Doob — Selected Works of W. Feller, Volume 1 9
14 6th Berkeley Symposium Math. Statist. Probab. (1972), Vol II, pp. xxi-xxiii
M. Kac — Selected Works of W. Feller, Volume 1 15
William Feller. A Biography

by René L. Schilling from Dresden


and Wojbor A. Woyczyński from Cleveland

William Feller was born on 7 July 1906 in Zagreb, Croatia, which was then
a Southern province of the Austro-Hungarian Empire. He was the ninth of
twelve children of Eugen Viktor Feller and Ida Feller. Following the Roman-
Catholic tradition, the young baby boy was named Vilibald after the saint (St.
Willibald) whose feast day fell on his birthday. In the church register of births
(city register)1 he had been registered as Vilibald Srećko2 Feller. Feller called
himself Vilim3 (which is the Croatian form of William), and throughout his life
he would adopt the native versions of his first name depending on the country
he lived in: thus in Germany and Scandinavia he called himself Willi/Willy,
and William/Will in America.
Feller’s father Eugen was born in 1871 in Lemberg, Galicia, in the North-
eastern corner of the Austro-Hungarian Empire (now Lviv, in Western Ukraine),
and died in 1936 in Zagreb. His wife Ida was born in 1870 (place unknown),
and passed away in Zagreb two years after her husband’s death. His paternal
grandparents were David Feller and Elizabeth (Elsa) Holzer from Lemberg, and
his maternal grandparents were Ferdinand Oemichen and Hermina Peerz (or
Perc). Feller’s parents owned a pharmaceutical company. The then-famous,
now-obscure Elsa fluid, named after Eugen’s mother, was one of the mainstays

1 Žubrinić [47, p. 9]. The references [Feller 19nn] and [*Feller 19nn] (the star indicating

that the respective paper is not contained in these Selecta) refer to Feller’s bibliography,
while [n] points to the list of references at the end of this essay.
2 Srećko is the Croatian equivalent of Felix, ‘the lucky’.
3 The rather anecdotal story in Seneta [40, p. 87] is based on the claim that July 7 is

St. Willibald in the “(German) Catholic saint’s list” while it is Sv. Vilim’s feast day in the
“Croatian Catholic list” which cannot be verified (www.namecalendar.net, accessed 3 August
2014, gives for Vilim the dates 6 April, 23 May, 8 June and 25 June, the Croatian Catholic
Calendar www.hkr.hr/kalendar, accessed 29 August 2014, gives the dates 10 February, 23
May, 8 June and 29 July). It is more likely that Vili(m) was used as a family nickname
instead of the more formal Vilibald; also in German, it is common practice to abbreviate
Willibald by Willi or Willy.

Picture 1: William Feller as a young
boy, aged 4. The handwritten note
in the lower left corner says “Unser
Willy” (our Willy). The photograph
is taken from the family’s picture al-
bum, courtesy of Joanne Elliott.

of the enterprise. It had been marketed as an elixir capable of curing all kinds
of maladies, such as headaches, colds, back pains, etc., and was the source
of the family’s substantial wealth. Like many upper-class families, the Fellers
were bilingual and William spoke both Croatian and German. Young William
was raised in the Roman-Catholic tradition even though his paternal grand-
father David was Jewish; he had converted to Catholicism when marrying
Elsa.4 Feller’s ancestors probably came to Zagreb during the second half of
the nineteenth century. More details on Feller’s parents and grandparents can
be found in the two studies by Fatović-Ferenčić, and Ferber-Bogdan [10, 11].

Zagreb 1906–1925
We do not know the exact year when the Fellers moved to their modern, spacious
and elegantly decorated villa at 31a, Jurjevska Street,5 but it is safe to
assume that William spent most of his childhood there. Judging by family
photos and personal stories, they lived in a pleasant and productive envi-
ronment. Several of William’s brothers had successful careers of their own:

4 Žubrinić [47, pp. 2 ff.]
5 The Villa was built in 1910/11 and designed by Mathias Feller from Munich, William’s
uncle. The whole September 1914 issue of the German architectural magazine Innendekoration
[12] is devoted to its description.



Picture 2: The Fellers’ Villa at 31a, Jurjevska Street, in Zagreb. Clockwise
from the top left: View from the South in 1913/14 and today, from the East
(today), and the West (in 1913/14). Photographs on the left side courtesy of
Joanne Elliott, on the right side courtesy of Zoran Vondraček.

Ferdinand (Ferdo) (1897–1960) was a well-known pharmacist, Miroslav (Fritz)
(1901–1961) was a poet, novelist, philosopher, and an art critic, and Marijan
(1903–1974) was a pianist with an academic career in Zagreb and Sarajevo.6
Not much is known about William Feller’s schooldays. Until the end of
the First World War (1918), Croatia and Zagreb (Agram in German) were part
of the autonomous “Kingdom of Dalmatia, Croatia and Slavonia”, associated
with the Kingdom of Hungary, itself part of the Austro-Hungarian monarchy.
From 1918 onwards it was part of the Kingdom of Serbs, Croats and Slovenes,
which was renamed “Kingdom of Yugoslavia” in 1929. In Feller’s days the edu-
cational system still followed the Austrian and German tradition, and William
(as did five of his brothers) attended the I. Realgymnasium 7,8 in Zagreb, graduating
in June 1923.9 During his first years there he was a private pupil10; he

6 Žubrinić [47, pp. 26 ff.] and the Croatian Biographical Encyclopedia [22].
7 German for a “secondary school that prepares students for the university, that offers
Latin but no Greek, and that typically emphasizes sciences and modern languages”
(www.merriam-webster.com/dictionary/realgymnasium, accessed 30/7/2014).
8 The former I. Realgymnasium today houses the Mimara museum of art, one of Croatia’s
foremost art collections.

R. Schilling & W. Woyczyński — Selected Works of W. Feller, Volume 1 19

also skipped two years.11 One of Feller’s private teachers was Stanko Vlögel,
a mathematician and faculty member at the University of Zagreb.12
In October 1923 Feller enrolled at the University of Zagreb as a student of
Mathematics and Physics. From the matriculation forms, which for the sum-
mer terms of 1924, and 1925, and the winter term of 1924/25, are reprinted in
Žubrinić [47, p. 14, p. 23], we learn that Feller was then using Vilim (rather
than Vilibald) as his first name. The forms also contain information about
the courses Feller had been taking: Mathematics I–IV (with M. Kiseljak),
Differential and Integral Calculus (V. Varićak), Infinite Series (S. Bohniček),
Number Theory 1,2 (S. Bohniček), Theory of Real Functions (V. Vranić), Cal-
culus of Variations (V. Varićak), and two mathematical seminars (V. Varićak),
along with a panoply of experimental and theoretical physics13 , some chem-
istry, and a sample of courses in pedagogy and psychology. The fact that
Feller simultaneously studied two subjects was well within the Austro-German
tradition. The final exam was either the Staatsexamen, which would qualify
him to teach at grammar schools (Gymnasium and Realgymnasium), or the
doctorate (the Diplom, Master’s Diploma, was introduced only later, in the
1930s – 50s). From today’s perspective, Feller’s educational pathway at the
University of Zagreb is comparable to the standard education at an average
German or Austrian university of the 1920s. Among Feller’s teachers Vladimir
Varićak (1865–1942) was the most internationally visible scientist, renowned
for his contributions to Einstein’s theory of relativity and non-Euclidean geom-
etry. Very likely Varićak’s lectures and mathematical seminars made a lasting
impact on Feller’s early work in geometry.

The German years: Göttingen & Kiel 1925–1933


In Constance Reid’s biographies of David Hilbert and Richard Courant we find
a confirmation of the fact that
At the beginning of the twentieth century, mathematics students
all over the world were receiving the same advice: “Pack your
suitcase and take yourself to Göttingen!”14
Thus, after four semesters of study at Zagreb,15 Feller enrolled for the winter
term of 1925/26 at the University of Göttingen. After the First World War,
9 Žubrinić [47, p. 12]
10 I.e. he had private teachers at home; only the exams to progress to the next year were
taken at the Realgymnasium.
11 “[I]ch [. . . ] durfte mit Erlaubnis der Regierung zwei Klassen kontrahieren” (“with the
government’s permission I was allowed to combine two school years”) [1, handwritten
CV in the Promotionsakte].
12 Žubrinić [47, p. 12]
13 Among the courses were: Mechanics & Acoustics, Experiments in Physics, Theoretical
Mechanics, Thermodynamics, Optics, Theory of Heat and a Physics Colloquium.
14 Reid [33, p. 102]
15 Feller did not obtain a formal degree from Zagreb. The Promotionsakte from Göttingen



... in Göttingen there was again a scientific paradise. Gifted stu-
dents flocked to the university. There was a constant procession
of distinguished visitors from all over the world. Sometimes they
came merely for a single talk or a series of talks to the mathe-
matische Gesellschaft. Often they lectured for a full term as guest
professors. The very air seemed to [. . . ] crackle [. . . ] with scientific
electricity.16
Feller was only nineteen when he moved to Göttingen. We do not know many
details of his student life, but the Promotionsakte (dissertation file) [1] contains
his address: Nikolausbergerweg 43, Göttingen. He attended the Anfänger-
praktikum, problem classes for beginners, a revolutionary idea introduced by
Richard Courant17 , and very quickly caught the attention of Courant’s assis-
tants.18 Reid [34, p. 111] contains an anecdote about the first contact between
Feller and Courant:

At the beginning of the [winter] term in 1925, [. . . ] [the assistants]
promptly alerted Courant to the presence of Willy Feller. After
the third calculus lecture, to Feller’s amazement, the professor –
an unbelievably august personage to a European student of that
day – approached. Questioning the boy about his education in
his native land, Courant discovered that Feller was already doing
mathematics on his own. He told him to bring his work to the next
lecture. Even thus instructed, Feller was too bashful to produce his
papers on the appointed day. The next morning he was awakened
by a commotion on the stairs leading to his attic room. There was
a knock on the door. Courant entered and left a few moments later
with the desired papers.19
Very likely the papers were related to the topics covered in the Mathemat-
ics Seminar Feller attended in his Zagreb days. Soon thereafter Feller joined
Courant’s circle of students and collaborators, which proved to be hugely bene-
ficial to him: on 10 August 1926, after only two semesters at Göttingen, Feller,
who had just turned twenty, submitted his Ph.D. thesis Über algebraisch rek-
tifizierbare transzendente Kurven [*Feller 1926]. Courant gave the thesis the

[1] lists an Abgangszeugnis (transcript of marks) of the University in (sic!) Agram (Zagreb)
dated 13 October 1925. The Abgangszeugnis, however, was returned to the candidate after
the defence of the thesis and is not part of the Promotionsakte.
16 Reid [34, p. 119]
17 Reid [34, p. 99]. In those years, Courant’s assistants, O. Neugebauer and
K. O. Friedrichs were in charge of the Anfängerpraktikum. Feller would meet them again
later in his career.
18 Around the same time quite a few very gifted students were at Göttingen, e.g. Herbert

Busemann (1905–1994), Hans Lewy (1904–1988), John von Neumann (1903–1957), and
Franz Rellich (1906–1955).
19 Rota also mentions this as an example of one of Feller’s “bombastic stories” [38, p. 230].



grade of I, or at least II,20 remarking
that [Feller] independently, without outside input, found the topic
and worked on it, notably in an environment, (Agram), where no
outside input was possible.21
Courant was slightly unhappy about the fact that the topic was not com-
pletely in the mainstream of the mathematics of the day,22 but he found this
shortcoming balanced by the complete independence of Feller’s effort. Indeed,
the topic of Feller’s thesis (algebraically rectifiable transcendental curves) was
much closer to Varićak’s work than to anything done at Göttingen at that
time, and Feller would never come back to this line of research.
The 90-minute oral exam took place on 3 November 1926 and the exami-
nation subjects were Physics (examiner: J. Franck), Mathematical Analysis
(G. Herglotz), and Mathematics – Geometry (R. Courant). Feller passed
this part with a I (outstanding). German universities require that the the-
sis be printed, so the degree was not finally awarded until 18 July 1927, af-
ter the thesis had been accepted for publication in Mathematische Zeitschrift
[Feller 1928].
One year later Abraham Adolf Fraenkel23 invited the “highly gifted” Feller
to join Kiel University. He moved there in 1928 and almost immediately (in
February of 1929), with Fraenkel’s backing, obtained the habilitation in math-
ematics,24 and became Privatdozent. The habilitation thesis [*Feller 1929]
(published as [Feller 1930]) Über die Lösungen der linearen partiellen Differ-
entialgleichungen zweiter Ordnung vom elliptischen Typus 25 was obviously in-
fluenced by his time at Göttingen (see Jacob’s detailed commentary [24] in
these Selecta).
At about the same time, in October 1929, Erhard Tornier started his job
as a visiting professor at Kiel,26 an innocuous event at the time which had
momentous repercussions in Feller’s life four years later. He became Feller’s
colleague and it is very likely that they began discussing mathematics right
20 Promotionsakte [1]. At German universities (and also at Göttingen) a PhD thesis/examination
is marked on the following scale: I is outstanding, summa cum laude, II
very good, magna cum laude, III good, cum laude, IV satisfactory, rite.
21 dass er [Feller] selbständig, ohne jede äussere Anregung, sein Problem sich gestellt und

bearbeitet hat und zwar in einer Umgebung, (Agram), in welcher er keinerlei Anregung von
aussen empfangen konnte.
22 Courant notes “that the choice of the topic leads away from those questions which

nowadays appear to be interesting and important” [“dass die Stellung des Themas etwas aus
dem Rahmen der Fragen hin[aus]führt, die uns heute interessant und wichtig erscheinen”].
23 of Zermelo–Fraenkel fame, worked on number theory, set theory and the foundations
of mathematics. He came to Kiel in 1928. His relation to Feller and Tornier is described in
his autobiography Lebenskreise [17, pp. 154 f.].
24 Habilitation is a post-doctoral degree that bestows on the recipient the venia legendi,

the permission to teach, examine and supervise students independently at the university
which conferred the degree. Usually a full professor of the faculty has to support one’s
application for habilitation.
25 On the solutions of second-order linear partial differential equations of elliptic type
26 Hochkirchen [21, p. 26]



Picture 3: William Feller in his twenties or thirties. Photographs courtesy of
Joanne Elliott.

away since Tornier was interested in the foundational questions of probability
theory. They were then a very active topic in mathematics and several big open
problems remained following the introduction of von Mises’ Kollektivs (1919), and
the appearance of Steinhaus’ (1923) Fundamenta Mathematicae paper [44]
which tried to provide probability theory with a measure-theoretic foundation
in the sense of Lebesgue. But Kolmogorov’s axiomatization (1933) had not
yet been published, and Tornier’s strategy could be described as fitting in
between the above approaches.27 This may have been Feller’s first contact
with modern probability and measure theory. Having completed his visiting
appointment at Kiel, Tornier left for Halle but at the end of 1931 he
asked Fraenkel

to be allowed to transfer his habilitation [which he obtained at Halle
in 1930] to Kiel, in order to be ‘in my [Fraenkel’s] environment
and to gain substantial knowledge from me’. I knew Tornier only
superficially,28 but I did grant his wish, mainly because of Feller
[our emphasis].29
Thus Feller changed his subject yet again, and started a collaboration with
Erhard Tornier on measure theory and the foundations of probability theory
(see the essay of Fischer [15] in these Selecta), which resulted in several joint
publications [*Feller 1931, *Feller 1932a, *Feller 1932b] and his own papers
27 Hochkirchen [21]. Feller’s paper [*Feller 1938a] is largely trying to explain why Tornier’s
theory is in line with Kolmogorov’s approach; cf. also Fischer [15] in these Selecta.
28 When Tornier was a visiting faculty member at Kiel, Fraenkel was in Jerusalem. Both,
however, had the same doctoral advisor: Kurt Hensel at the University of Marburg.
29 Fraenkel [17, p. 131; our translation from German]



on Lebesgue integration [*Feller 1932c]; the later papers [Feller 1934a]
(with Busemann), [*Feller 1938a], and [Feller 1939c], also belong to this circle
of ideas.
The other side of the coin was that Tornier turned out to be an ardent
Nazi supporter, joining the Nazi party (NSDAP) as early as 1932.30 As soon
as Hitler came to power on 30 January 1933, Tornier started denouncing his
‘Jewish’ and non-Aryan colleagues, including Feller, whose paternal grandfa-
ther was Jewish. In Nazi language, Feller was a Vierteljude (‘Quarter-Jew’)
and, as such, not fit for civil service. The Gesetz zur Wiederherstellung des
Berufsbeamtentums (BBG)31 required all staff members to fill out a question-
naire and to prove their Aryan ancestry (Ariernachweis) – and so did Feller.32
Since Feller was considered to be a Vierteljude, the Freie Kieler Studentenschaft
– a Nazi student organization – demanded on 21 April 1933 (two days
after Feller’s questionnaire had been evaluated) that Feller be suspended, and
this request was supported by Tornier who emphasized Feller’s Croatian, hence
Slavic, origins. On the basis of the BBG, Feller’s right to teach at Kiel uni-
versity was revoked on 9 September 1933.33 A month later he left Germany
for Denmark.34
During Feller’s Kiel days a young student, Clara Nielsen, attended one of
his classes.35 Several years later, in 1938, she became his wife. Unfortunately,
we do not have much information regarding Clara Nielsen. Most likely she
was German, possibly a member of the Danish minority in Schleswig; we know
from Feller’s letter36 that “Clara’s father (who is 2/3 danish [sic!], her mother
being dutch [sic!]) lives in Flensburg”, a city on the German–Danish border.
It seems she was born on 15 September 1910,37 which would have made her
twenty-three years old in 1933, the year Feller left Germany.

30 He joined on 1 May 1932, after the philosophical faculty at Kiel agreed to recognize his
Halle habilitation on 2 March 1932, cf. [21, p. 29].


31 Act for the re-instatement of the professional civil service (7 April 1933), a rather euphemistically
named Nazi law which made it possible to purge the civil service of non-Aryan and
politically non-conforming members. Its §3 is one of the earliest Aryan paragraphs. The
full text is available at www.documentarchiv.de/ns/beamtenges.html (accessed 1 August
2014); excerpts are in Uhlig [45, pp. 143 ff.].
32 Doob [9] writes of a “Nazi oath” which Feller “refused to sign [. . . ] and was forced
to leave”; this story is reiterated in several places but it cannot be confirmed. On the
contrary, Feller filled out all required forms and he was forced to leave on the basis of
his ancestral information. The form, which appeared as an appendix to the BBG, is
available online at
alex.onb.ac.at/cgi-content/alex?apm=0&aid=dra&datum=19330004&zoom=2&seite=00000253&ues=0&x=13&y=7
(accessed 20 August 2014).
33 See the official web-page of the University of Kiel,
www.uni-kiel.de/ns-zeit/bios/feller-willy.shtml (accessed 31 July 2014) and the documentation in Uhlig [45, p. 23].
34 Uhlig [45, p. 23]
35 Doob [9, p. xvii], Birnbaum [4, p. iv]
36 Handwritten letter of Feller to Borge Jessen [13]
37 U.S. Social Security Death Index, accessed on 1 August 2014 via
death-records.findthebest.com/l/107145366/Clara-Feller. This record shows that a Clara Feller from
Princeton, Mercer County, was born on 15 September 1910 and died on 1 October 1973. Her
Social Security number was also provided.

24 William Feller. A Biography


European Exile: Copenhagen & Stockholm
1933–1939
Feller spent the academic year 1933/34 in Copenhagen, where Harald and
Niels Bohr had been actively helping German emigrés. There he again ran
into Herbert Busemann and Otto Neugebauer who, in the years 1934–38, con-
tinued editing the Zentralblatt für Mathematik und ihre Grenzgebiete 38 from
Denmark. Feller knew both from his Göttingen days and very quickly started
collaborating with them.
His work with Busemann focused on Lebesgue integration [Feller 1934a]
and on differential geometry resulting in a series of papers [*Feller 1935a,
*Feller 1935b, *Feller 1936a, Feller 1936b]. Erhard Scholz discusses their contribution
to surface theory later in these Selecta [39].
Neugebauer recruited Feller to write for Zentralblatt, where we find eight of
his contributions written in the years 1933–38. Two reviews, of Kolmogorov’s
seminal 1933 Grundbegriffe [Zbl. 0007.21601], and Khinchine’s Asymptotische
Gesetze [Zbl. 0007.21601] are particularly noteworthy: Both are enthusiasti-
cally friendly, in spite of the prevailing neutral tone of Zentralblatt’s entries,
and show Feller’s familiarity with the new measure-theoretic probability the-
ory. It should be emphasized that in 1933 Kolmogorov’s work was just one
more attempt to axiomatize probability theory. Feller was among the first
authors who fully recognized the importance of this approach and adopted it
throughout his career.

38 The following historical remarks on the Zentralblatt are from zbmath.org/about/
(accessed 1 August 2014): “The Zentralblatt für Mathematik und ihre Grenzgebiete was
founded in 1931 with the aim to publish reviews of the entire world literature in mathe-
matics and related areas. Zentralblatt became the second comprehensive review journal for
mathematics in Germany after the Jahrbuch über die Fortschritte der Mathematik (estab-
lished in 1868) which has been active until the 1940s. Although the Zentralblatt had, essen-
tially, the same agenda as the Jahrbuch, the latter aimed at maintaining the completeness
of the coverage and the classification of all articles in each calendar year, whereas Zentral-
blatt put more emphasis on the promptness of the reviews and the international aspect.
“The initiative for the foundation of a new mathematical reviewing journal came from
mathematicians Otto Neugebauer, Richard Courant, and Harald Bohr, together with the
publisher Ferdinand Springer. The rapidly growing number of newly published mathematics
works in the 1920s and the scientists’ need for obtaining quick information on recent material
motivated the decision to create an alternative service to the Jahrbuch. [...]
“Neugebauer directed the new periodical for several years until the political situation
in Germany made his position as editor-in-chief unsustainable. In 1933, shortly after the
Nazi party rose to power, a law was enacted which banned Jews and political enemies from
holding jobs as civil servants. A call to dismiss Courant, Neugebauer, Landau, Bernays,
and Noether appeared in a local newspaper and, soon afterwards, Courant escaped to the
UK and later moved to New York. Due to this pressure, Neugebauer decided to resign from
his post at Göttingen University and in 1934 took up a professorship in Copenhagen, from
where he continued his work for Zentralblatt.
“The struggle to produce the reviewing journal became more difficult throughout this
period, however, for the Nazis tried to influence the editorial policy. Neugebauer eventually
gave up his position as editor-in-chief in 1938 after a series of incidents, including Levi-Civita
being dismissed from the editorial board without his knowledge.”

In 1934, with Harald Bohr serving as an intermediary, Feller obtained a
position with Harald Cramér’s group in Stockholm,
and stayed on in Stockholm for five years. He made a great number
of Swedish friends, collaborating with economists and biologists as
well as with the members of our [Cramér’s] probabilistic group.
He had studied in Göttingen and was well initiated in the great
traditions of this mathematical center. We tried hard to get a
permanent position for him in Sweden, but in those years before
the war this was next to impossible, and it was with great regret
that we saw him leave [in 1939] for the United States, where an
outstanding career was awaiting him.39

With his already published work in probability theory and analysis, and pa-
pers on differential geometry, Feller had choices to make, and the contact with
Cramér tipped the scale in favour of probability theory. During his Stockholm
years Feller wrote two seminal papers which would establish his prominent
position as a probabilist: his work on limit theorems [Feller 1935c], where he
found the necessary and sufficient conditions for the validity of the Central
Limit Theorem (CLT), and his extension [Feller 1936c] of the results of Kol-
mogorov’s Analytische Methoden paper [27], which provided the foundations
of the theory of Markov processes. The mathematical methods used in both
papers reflected his strong analysis and PDE background which he had ac-
quired in Göttingen. There was more work on limit theorems [Feller 1937a]
(influenced by Feller’s interaction with Marcel Riesz), the paper on the weak
law of large numbers [Feller 1937b] (dedicated to Harald Bohr and written
during Feller’s February 1937 visit to Lund), and another on infinite divisibility
[Feller 1939d]; their mathematical content is scrutinized in detail in Hans
Fischer’s essay later in these Selecta [16].
At the memorable Geneva Colloque sur le Calcul des Probabilités40 Cramér
and Feller met Neyman who “gave a lecture on his theory of confidence inter-
vals, which was then something quite new”, and met with a skeptical recep-
tion;41 Feller, however, immediately took up this topic in [*Feller 1938b].42
His own contribution to the colloquium [*Feller 1938a]43 was a comparison
of von Mises’, Tornier’s and Kolmogorov’s approaches to the foundations
of probability theory; he continued this theme in [Feller 1939c]. Interestingly
enough, after all of Tornier’s political scheming in Kiel, Feller unflinchingly
gave him the mathematical credit due. A critical assessment of this work can
be found, again, in Fischer’s essay [15]. At the same time, clearly with Kol-
mogorov [27] and [Feller 1936c], as well as Risser [36], in mind he came up with

39 Cramér [8, p. 519]


40 11–16 October 1937; the proceedings are [18].
41 Cramér [8, p. 528]
42 Also, see Neyman [30, p. 57, Footnote (3)].
43 Unfortunately, this important contribution cannot be reprinted in these Selecta because

of the fees demanded by the publisher (Hermann, Paris).

Picture 4: Clara and William Feller
on the train in the 1930s. Photograph
courtesy of Joanne Elliott.

a probabilistic interpretation of Volterra’s theory of biological populations using
discrete-time and continuous-time stochastic processes [Feller 1939a] (see
the commentary by Baake and Wakolbinger later in these Selecta [2]). During
his Stockholm years Feller also started to collaborate with applied scientists
(see [*Feller 1938c, *Feller 1940a]). In [Feller 1939b] he solved a problem “pro-
posed [. . . ] by Conny Palm, of the telephone-administration of Stockholm”,
in connection with studies of telephone traffic; this started a new line of re-
search for Feller and he would return to this topic time and again, see the
commentary by Maller [28] in these Selecta.
During Feller’s stay in Sweden, important events took place in his pri-
vate life as well. On 27 July 1938 he married his former Kiel student, Clara
Nielsen. Back in Zagreb, his parents passed away, his father in 1936 and his
mother in 1938. Most of his siblings still resided in Zagreb, and we know
that he remained in contact with his homeland. In particular, his two pa-
pers [*Feller 1934b, Feller 1939d] appeared first in Croatian in Rad JAZU, the
journal of the Yugoslav (now: Croatian) Academy of Arts and Sciences, and
a shorter German version (“Auszug”, meaning excerpt) with essentially the
same mathematical content appeared in the international issue of Rad. At
least since 1937, Feller had been a corresponding member of the Yugoslav
Academy of Arts and Sciences.44

44 On the title page of the Rad-version of [Feller 1939d] there is a note, “Napisao član
dopisnik” [“written by corresponding member”], and the acceptance date (22 November
1937) is given. This, curiously, only appears in the original Croatian version of [Feller 1939d],
cf. also Seneta [40, p. 87 f.] and Žubrinić [47, p. 45 f.; he confirms that Feller became
a corresponding member in 1937].

The 1940s: Brown & Cornell
In the summer of 1939 the Fellers were on the move again. Since William could
not get a permanent position in Sweden,45 he accepted the offer of an asso-
ciate professorship at Brown University in Providence, Rhode Island. At that
time R. G. D. Richardson served as dean of the Graduate School at Brown, as
well as Secretary of the American Mathematical Society (from 1921 to 1940).
Richardson recognized early on the tremendous potential for growth of math-
ematics in the U.S. created by European emigré mathematicians and he was
one of the “principal agents of the mathematicians in aiding emigrés”.46 He
mobilized various institutions in his efforts, including the well-known Emer-
gency Committee in Aid of Displaced German [later: European] Scholars, and
was instrumental in creating permanent positions at Brown University for W.
Feller, O. Neugebauer, W. Prager and J. D. Tamarkin. These actions even-
tually led to the foundation of a distinguished applied mathematics division
at Brown. Wartime Brown was an exciting place for mathematics: Run by
Richardson, and with Tamarkin and then Prager as scientific directors, a
programme focussing on fluid dynamics, elasticity, and PDEs was set up;
it included Feller, Prager, and Tamarkin, who had Brown ap-
pointments, and at one time or another, Stefan Bergman, K. O.
Friedrichs, Witold Hurewicz, Charles Loewner, F. D. Murnaghan,
I. S. Sokolnikoff, Richard von Mises, Stefan E. Warschawski, Antoni
Zygmund and myself [Lipman Bers].47
In 1941 Richardson launched the Brown University Summer Session for Ad-
vanced Instruction and Research in Mechanics (June 23 – September 13, 1941),
and Tamarkin and Feller contributed with their lecture notes [*Feller 1941b] on
PDEs. Ironically, Richardson’s move initially torpedoed Courant’s parallel
initiative to set up a center for basic and applied mathematics at New York
University.48 Mathematically, Feller’s work fit well into Richardson’s plans at
Brown. There are several papers [*Feller 1940b, *Feller 1940d] where Feller
engages with a non-mathematical (scientific) audience, and one of his notes
[*Feller 1943a] appears to be specifically written for the new Brown University
journal Quarterly of Applied Mathematics. By 1941 Feller’s first graduate stu-
dent (co-mentored with Tamarkin) George E. Forsythe had defended his dis-
sertation on summability methods. Later, Forsythe would become the founder
of the Computer Science Department at Stanford. These developments were
just a small component in a vast web of strong influences that Feller and other
immigrant mathematicians have exerted on the American scientific environ-

45 Possibly, also because of the more or less latent anti-Semitism in Sweden, see the

comments and excerpts from Feller’s letters from 1934 and 1939 in Siegmund-Schultze [41,
pp. 135 f.].
46 Reingold [35, pp. 178, 184]
47 Bers [3, p. 240]
48 Reingold [35, p. 197], Reid [34, p. 228 f.]

Picture 5: William relaxing
on the grass during his Cor-
nell days. Photograph cour-
tesy of Joanne Elliott.

ment after World War II; a thorough discussion of this historical phenomenon
can be found in Siegmund-Schultze [41].
The main thrust of Feller’s research continued to involve limit theorems
and Markov processes. In [Feller 1943c] he strengthens the results of Kol-
mogorov’s paper on the law of the iterated logarithm (LIL), giving necessary
and sufficient criteria for norming functions to be in Lévy’s upper and lower
classes. Feller would come back to this theme time and again. Mark Kac49
called this paper a “veritable tour de force”, and it can be argued that this
paper is still unsurpassed in its completeness and its methods. The paper is
essentially self-contained, relying only on his own previous work [Feller 1943b]
where asymptotic estimates for the tails of sums of (bounded) independent ran-
dom variables were obtained. In a major survey paper [Feller 1945b] he reviews
developments on the LIL, CLT, and its “little brother”, the weak law of large
numbers (WLLN); he was also interested in applications of the CLT and the
WLLN, e.g. in connection with the St. Petersburg paradox [Feller 1945d], and
normal approximations of the binomial law [Feller 1945a]. His paper on con-
ditional probability functions [Feller 1940c] which determine a not necessarily
diffusive Markov process completed his seminal contribution in [Feller 1936c],
and actually introduced the term “Markoff process”. Later, Doob50 would say
that

Feller completely transformed the subject of Markov processes. Going
beyond his 1935 paper, he put the analysis into a modern framework,
applying semigroup theory [in the 1950s] to the semigroups
generated by these processes. He observed that the appropriate
boundary conditions for the parabolic differential equations gov-
erning the transition probabilities correspond on the one hand to
the specification of the domains of the infinitesimal generators of
the semigroups and on the other hand to the conduct of the pro-
49 M. Kac: William Feller, in memoriam – reprinted in these Selecta.
50 J. L. Doob: William Feller and twentieth century probability. Reprinted in these Se-
lecta.

cess trajectories at the boundaries of the process state spaces. In
particular, he found a beautiful perspicuous canonical form for the
infinitesimal generator of a one dimensional diffusion. In this work,
he was a pioneer yet frequently obtained definitive results.
The geometry papers [Feller 1942], [Feller 1945c] (jointly with H. Busemann)
were his last contributions to geometry; both continued his earlier work.
One could argue that during his stay at Brown Feller was mostly adding the
finishing touches to his earlier achievements. The only paper which contained
completely new material was his publication on renewal theory [Feller 1941a].
The reason could have been the new environment he found himself in, as well
as the various activities unrelated to research he was involved in during those years.
Around the year 1938 several mathematicians started a campaign for an
independent American abstracting journal modelled on the Zentralblatt (Zbl),
which had been compromised by Nazi racist policies. By the end of May 1939 the
name Mathematical Reviews (MR) was already in use and O. Neugebauer (who
had been editing Zentralblatt from 1931 to 1934 from Göttingen, and then
from 1934 till 1938 from Copenhagen) and J. D. Tamarkin were approached
with the request to serve as founding editors; both were employed by Brown
University and Brown was also the first location for the editorial office (until
1951). The first issue of MR saw the light in January 1940, covering articles
published from July 1939 onwards.
“Dr Willy [sic!] Feller [was appointed] to be technical assistant to
the editors for a three year term effective 1 July 1939”. [. . . ] The
name of Feller first appears, with the title Executive Editor in vol.
5 (1944).51
In fact, Feller was Executive Editor from 1941 to 1945 and served on the
editorial committee from 1948 until 1953.52 Since there was not much staff
besides the editors, working for MR was quite time-consuming. Many reviews
were actually written by the editors themselves, and a reviewer search on the
MathSciNet database reveals that Feller wrote 843 abstracts in 14 years (1940–
1954). On top of that he served as President of the Institute of Mathematical
Statistics during 1947.
Towards the end of World War II, in 1944, William Feller became a US
citizen. In 1945 he stepped down as Executive Editor of MR, and soon thereafter
decided to leave Brown University. He accepted a (full) professorship
at Cornell University, where he stayed until 1950. During those five
years at Cornell, Mark Kac and Richard Feynman were his colleagues, and
K. L. Chung, G. A. Hunt and G. Elfving were among the many permanent
and visiting staff members. Most of his articles at the time dealt with limit
theorems [Feller 1946a, Feller 1946b], [Feller 1948a] (containing an interesting
foray into statistics) and he developed important ideas regarding fluctuations
51 Pitcher [32, p. 72]
52 Pitcher [32]

and recurrent events [Feller 1949a, Feller 1949b, Feller 1949d]. In the paper
[Feller 1949c] presented at the first Berkeley Symposium Feller collected and
surveyed, for the first time, his research interests in the theory of Markov
processes. Towards the end of his stay at Cornell, Feller had completed the
first edition of the first volume of his landmark monograph, An Introduction
to Probability Theory and its Applications [*Feller 1950]. Volume 1 is ‘elemen-
tary’ in the sense that it does not use measure theory. The more advanced
parts of the theory would appear later in the second volume, but it is interest-
ing to note that many of Feller’s recent research results were directly included.
Work on this monograph must have started several years earlier. Indeed, in
a letter to Borge Jessen53 Feller writes “I am writing myself a book which is
to incorporate the mathematical theory of probability with many applications
and we are just mimeographing the first few chapters.” From this period also
comes the anecdote which was later told by J. Doob:

While writing my book [i.e. Doob: Stochastic Processes, 1953] I had
an argument with Feller. He asserted that everyone said “random
variable” and I asserted that everyone said “chance variable”. We
obviously had to use the same name in our books, so we decided
the issue by a stochastic procedure. That is, we tossed for it and
he won.54

The Princeton Period 1950–1970


In the summer of 1950 Feller was appointed Eugene Higgins Professor of Math-
ematics at Princeton University. Shortly thereafter Feller gave a 45-minute
address [Feller 1952c] in the Statistical Mechanics section at the 1950 Interna-
tional Congress of Mathematicians, which took place between 30 August and
6 September 1950 in Cambridge, Massachusetts. Feller talked about recent
trends in the theory of diffusion processes, covering some familiar terrain including
his and Kolmogorov’s work on Markov processes, but now clad in
the new language of semigroup theory. Moreover, he explicitly mentioned Itô’s
approach, Bochner’s subordination, and Riesz’ fractional derivatives. At that
time this was truly visionary, mapping out a programme which would combine
probability theory, (partial) differential equations and functional analysis.
The Princeton years, in particular the 1950s, marked the second extremely
innovative period in Feller’s research. He introduced semigroups into proba-
bility theory (following Bochner [6] and Yosida [46])55 and started discussing
processes on subsets of R (or Rd ), leading to parabolic initial value problems

53 Typewritten letter [14] of Feller to Borge Jessen dated 2 April 1947


54 Snell, interview with Doob [43, p. 307]. Interestingly, this did not prevent Feller from
speaking of “chance processes” in [*Feller 1961c].
55 Kendall writes in Reuter’s obituary: “Now Feller, in the preceding year [1951/52] had

given a long course on aspects of the Hille–Yosida theory of semigroups in the context

Picture 6: Clara and William Feller
around 1950. Photograph courtesy of
Joanne Elliott.

with additional boundary conditions. At the same time he understood (but
rarely worked himself on)56 the pathwise interpretation of diffusions offered
by Itô’s approach. The collaboration of one of his students, H. P. McKean,
with K. Itô, culminating in the celebrated monograph [23], was certainly no
accident, see also the discussions in Fukushima’s and Peskir’s essays [19, 31]
in these Selecta.
In a series of papers57 he develops semigroup theory with probabilistic
applications in mind which would usually be phrased in terms of the Kol-
mogorov backward and forward equations. Very important contributions are
contained in the 1953 papers where he relaxes the continuity conditions (both
in space and time) which are needed for the backward and forward Kolmogorov
equations. Even more important is his discovery in [Feller 1952a] that the
corresponding adjoint of the backward equation leads to an “essentially new
boundary condition”58 of second order. This line of research led to the general
characterization of boundary conditions for one-dimensional diffusion opera-
tors which appeared in a cascade of papers59 ; this body of work certainly
is Feller’s most cited contribution to pure mathematics, see the essays by
Fukushima [19], Jacob [24] and Peskir [31] in these Selecta.

of probability theory” and he goes on to describe how “semigroup analysis” helped him (Kendall) to
understand and work (with Harry Reuter) on general Markov chains [25, p.178].
56 The interpretation of the boundary conditions in terms of the processes in [Feller 1954b]
is one notable exception. In general, however, “[h]e was one of the first generation who
thought probabilistically [. . . ], but when it came to writing down any of his results for
publication, he would chicken out and recast the mathematics in purely analytic terms”
(Rota [38, p. 227]). The same pattern can already be observed in Feller’s early contributions
to probability theory, see Fischer [16].
57 notably in [Feller 1952a, Feller 1952b, Feller 1952e, Feller 1953a, Feller 1953b,
Feller 1959c]
58 Yosida, review MR0047886 (13,948a) of [Feller 1952a].
59 including [Feller 1952a, Feller 1954a, Feller 1954b, Feller 1955a, Feller 1955b,
Feller 1957a, Feller 1957d, Feller 1958]

Picture 7: William Feller at his home
in Princeton. Photograph courtesy of
Joanne Elliott.

At about the same time, building on his ideas from [Feller 1939a], he
presents a groundbreaking paper at the Second Berkeley Symposium on Math-
ematical Statistics and Probability [Feller 1951d], paving the way for non-
trivial applications of stochastic processes, in particular in genetics. At the
end of the decade, he connects his diffusion and semigroup theory with birth
and death processes [Feller 1959b] – an idea which was already present in the
1939 paper. Feller’s pioneering work on mathematical biology is among his
most influential contributions to science, see the commentary by Baake and
Wakolbinger [2] included in these Selecta.
At Princeton he was able to attract many exceptionally talented gradu-
ate students and young researchers to either work with him or to be strongly
influenced by his work. Among his Princeton PhD students were Patrick
Billingsley (1955), Henry McKean (1955), Hale Trotter (1956), Frank Knight
(1956), David Freedman (1960), Lawrence Shepp (1961), Martin Silverstein
(1965), Benjamin Weiss (1965) and Loren Pitt (1967); the Mathematics Ge-
nealogy Project60 lists 22 students and 1043 descendants. Many established
international visitors were attracted to Princeton as well. For example, Kiyosi
Itô visited the Institute for Advanced Study in Princeton from 1954 to 1956,
and almost immediately started working with Henry McKean, then one of
Feller’s graduate students, and soon they both started exchanging their ideas
with Feller. Their celebrated work on diffusion processes was in many ways
influenced by Feller’s programme: In Itô and McKean’s words “W. Feller has
our best thanks, his ideas run through the whole book.”61 In his MR book
review, S. Watanabe wrote:
Around 1955, W. Feller’s work on linear diffusion [MR0047886 (13,948a)
[Feller 1952a]; MR0068082(16,824g) [Feller 1955a]; and many oth-
ers], which was primarily of analytic character, spurred some out-
60 genealogy.math.ndsu.nodak.edu/id.php?id=33019 (accessed 5 August 2014).
61 Itô and McKean [23, p. XI]

standing probabilists (including the present authors [Itô, McKean])
to re-establishing Feller’s results by more probabilistic methods,
solving some conjectures presented by Feller (in his paper or pri-
vately) and, finally, studying the structure of the sample paths of
linear diffusion profoundly.62

This nicely illustrates how Feller handled his interactions with his students. He
never competed with them head-on but carefully led them onto paths of research
that were independent of, yet still related to, his own work.
The 1950s were not only an exceptionally productive period for Feller but
they also marked a breakthrough in the appreciation of his work in a wider
mathematical and scientific community. At the age of only 44 he was appointed
to a named chair at Princeton University, which was just about to enter the top
ranks of the mathematical world, taking over the role Göttingen had played
before the war. In 1958 he was a plenary lecturer at the International Congress
of Mathematicians (ICM) in Edinburgh and in the same year he was named
a Fellow of the American Academy of Arts and Sciences. Two years later,
William Feller became an elected member of the National Academy of Sci-
ences (USA)63 , and among the many signs of international recognition was
his appointment to serve on the Fields Medal Committee for the 1966 ICM in
Moscow.
Feller continued working on his benchmark monograph. The second, slightly
enlarged, but substantially changed, edition of Volume 1 appeared in 1957
and the preparations for the much more advanced Volume 2 (which would
appear in 1966) must have been in full swing. The content of both volumes
reflects Feller’s research interests: for example, the material from the papers
on fluctuations and coin tossing [Feller 1949b, *Feller 1951c, *Feller 1957b,
*Feller 1959a] has been included in Volume 1, while Volume 2 would establish
semigroup theory as a tool in probability and is still one of the most
complete discussions of limit theorems.
The 1958 ICM at Edinburgh showcased Feller at the height of his ca-
reer. In the one-hour plenary talk [Feller 1960] he summed up his bound-
ary theory, reviewed its connections with Martin boundaries, and explained
the intimate link between probability and potential theory. But after the
Congress had adjourned he left the field which he had initiated over the
past decade, and turned his attention again to limit theorems and appli-
cations. As in Stockholm, a quarter-century earlier, he started interacting
again with scientists and engineers. The expository paper [*Feller 1961c] was
aimed at engineering students and he returned to research in mathematical
population biology and genetics: [Feller 1966b, Feller 1967c, *Feller 1969a,
*Feller 1969f]. Jointly with S. Orey he studied renewal theorems for random
walks [Feller 1961a, Feller 1961b], and strong ratio theorems [Feller 1966a]. In
the field of limit theorems he returned to the topics he investigated in the
62 S. Watanabe, review MR 0199891 (33 #8031) of the monograph [23] by Itô and McKean.
63 Rosenblatt [37, p. 12]

decade starting in 1935, studying in a series of papers fluctuations and oscilla-
tions of series of random variables [Feller 1966c, *Feller 1969e, Feller 1970], and
limit theorems for large deviations and the LIL [*Feller 1968c, Feller 1968d,
Feller 1969d, Feller 1969c]. In connection with limit theorems he contributed
to Tauberian theory (for Laplace transforms) and the theory of regular varia-
tion [Feller 1963, Feller 1967a, *Feller 1968a] and [Feller 1969b]. A discussion
of the mathematical content of those papers can be found in Maller’s commentary
[28] in these Selecta.
By the end of 1965 Feller had completed the first edition of the second
volume of his exceptional monograph on probability and statistics. He would
continue working on the next edition until the end of his life, and it is thanks
to many of his former graduate students and collaborators that the last edi-
tion was eventually completed after his death. In 1966 Feller was also awarded
the title of permanent visiting professor at the Rockefeller University in New
York, NY. The university, having recently changed its name from Rockefeller
Institute, was expanding its mission and started to hire faculty with expertise
in physics and mathematics. Feller’s collaborators there included T. Dobzhan-
sky64 , among others, and the paper [Feller 1966b] was the first of Feller’s pub-
lications with both affiliations, Rockefeller and Princeton University.65 During
that time Feller also worked on the third edition of Volume 1 of his probability
textbook which appeared in 1968 [*Feller 1968e] (with a corrected reprint
in 1970); again it differs considerably from the two earlier editions.
William Feller is remembered not only for his scientific achievements and
his monographs, but lives on in the memories of many contemporaries as a
“man of wide interests, with a profound knowledge of subjects like history and
literature, and a great love of music.” 66
Feller was an ebullient man, who would rather be wrong than unde-
cided, and who preferred getting a hearing for his views to getting
applauded for them. [. . . ] He spoke loudly, very fast, with a strong
Yugoslav accent, with wit and charm and understanding.67
He was short, compact, with a mop of wooly gray hair, irrepress-
ible. In conversation quick, always ready with an opinion (or two)
addicted to exaggeration. If you knew the code, you applied the
“Feller factor” (discount by 90%) [. . . ] [he was] so full of fun.68
As a lecturer he was “loud and entertaining” and, although his proofs were not
always correct or complete, the underlying idea was usually sound.69 Coming
to the ‘end’ of such an incomplete proof, Feller would stare out at the people
64 Theodosius Dobzhansky was one of the founding fathers of what is now known as the
evolutionary synthesis within modern evolutionary biology. It is worth pointing out that Feller’s
publications in mathematical biology were strongly influenced by his work.
65 Rosenblatt [37, p. 12]
66 Cramér [7, p. 436]
67 Halmos [20, p. 94]
68 McKean quoted in Rosenblatt [37, pp. 13 f.]
69 Rota [38, p. 227]

R. Schilling & W. Woyczyński — Selected Works of W. Feller, Volume 1 35


Picture 8: One of the portrait pic-
tures taken by R. W. Collier at IBM’s
Thomas J. Watson Research Center,
where Feller served as a consultant for
many years. These photographs were
also used by Monroe Donsker’s wife,
Mary, to paint Feller’s watercolor por-
trait which is in the possession of Dr.
Elliott. Photograph courtesy of Joanne
Elliott.

in the class “with a set chin and an earnest look on his face, trying to elicit a
slight nod from people, and he would often get such a nod.”70 In this context
Mark Kac coined the expression “proof by intimidation”.71 Feller did not
adopt Landau’s Definition – Theorem – Proof style for his lectures and he did
not routinely use the headlines “Theorem” or “Proof” on the blackboard.72
We do not know when exactly Feller learned about his terminal illness, but
we do know that he was aware of it for a while:

Having accepted the verdict himself he tried to make it easy for all
of us to accept it too. He behaved so naturally and he took such
interest in things around him that he made us almost forget from
time to time that he was mortally ill.73

William Feller died of cancer in a New York hospital on 14 January 1970. He
did not live long enough to be present at the award ceremony for perhaps the
most distinguished of his many recognitions, the National Medal of Science for
1969. His widow Clara received the award at the White House on 16 February
1970. After her husband’s death, Clara continued her engagement with the
mathematical community, and served as the Technical Editor for the Annals of
Mathematics taking care of the Volumes 94/95 (1971/72). The second edition
of Volume 2 of Feller’s textbook [*Feller 1971] appeared posthumously in 1971;
70 Knapp [26]
71 Rota [38, pp. 227 f.]. Knapp [26] confirms that the phrase “proof by intimidation” was
actually used in Feller’s class in the academic year 1962/63, but he does not remember who
had introduced it; it may well have been Feller’s own joke on various ways to prove things.
72 Knapp [26]
73 Mark Kac in Rosenblatt [37, p. 13].



Picture 9: Clara Feller around 1960.
Photograph courtesy of Joanne
Elliott.

“[t]he manuscript had been finished at the time of the author’s death but no
proofs had been received” and proofreading, indexing and final touches were
done by Feller’s students.74
Acknowledgement. Many colleagues supported us when writing this bio-
graphical sketch. In particular we would like to thank Christian Berg for sup-
plying Feller’s letters to Borge Jessen, Joanne Elliott for sharing with us her
recollections and supplying many photographs of Feller, Hans Fischer, Niels
Jacob, Joseph Rosenblatt and Ken-iti Sato for critical remarks and stimulating
discussions. Tony Knapp kindly told us about Feller’s teaching at Princeton
and he shared with us his personal lecture notes of some of Feller’s classes. Our
Croatian colleagues, Nikola Sandrić, Hrvoje Šikić and Zoran Vondraček helped
us with many documents and sources in Croatian which would otherwise have
been inaccessible to us. Part of this exposition is based on Hrvoje Šikić’s arti-
cle [42] on Feller. We are also indebted to Ulrich Hunger from the Staats- und
Universitätsbibliothek Göttingen who provided scans of the Promotionsakte,
to Ms. Christiane Weber of TU Dresden for digitizing the photographs and
to Ms. Sarah Hamdouchi of the Einwohnermeldeamt Flensburg, Germany, for
checking the birth register of the city of Flensburg.

References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
[1] Promotionsakte [dissertation file] “Willy Feller”. Universitätsarchiv Göt-
tingen, Signatur [Math Nat Prom 0010, 23].
74 [*Feller 1971, p. xi]; J. Goldman, A. Grunbaum, H. McKean, L. Pitt and A. Pittenger
are explicitly mentioned.

[2] Baake, E., Wakolbinger, A.: Feller’s Contributions to Mathematical Biol-
ogy. These Selecta, Vol. 2, pp. 25 ff.
[3] Bers, L.: The migration of European mathematicians to America. In:
Duren, P. L. et al.: A Century of Mathematics in America Volume 1.
Am. Math. Soc., Providence (RI) 1988, pp. 231–243.
[4] Birnbaum, Z. W. (ed.): William Feller, 1906–1970. Ann. Math. Statist.
41.6 (1970) iv–xiii. Reprinted in these Selecta, Vol. 1.
[5] Brillinger, D. R., Davis, R. A.: A conversation with Murray Rosenblatt.
Statistical Science 24 (2009) 116–140.
[6] Bochner, S.: Diffusion equation and stochastic processes. Proc. Natl. Acad.
Sci. USA 35 (1949) 368–370.
[7] Cramér, H.: Obituary Note. William Feller 1906–1970. Revue de l’Institut
International de Statistique / Review of the International Statistical In-
stitute 38 (1970) 435–436.
[8] Cramér, H.: Half a century with probability theory. Some personal rec-
ollections. Ann. Probab. 4 (1976) 509–546. Reprinted in Martin-Löf, A.:
Collected Works of Harald Cramér Vol. 2. Springer, Berlin 1994, pp. 1352–
1389.
[9] Doob, J. L.: William Feller and Twentieth Century Probability. In: Le
Cam, L. and Neyman, J. (eds.): Proceedings of the Sixth Berkeley Sym-
posium on Mathematical Statistics and Probability. Vol. II, pp. xv–xxi.
University of California Press, Berkeley (CA) 1972. Reprinted in these
Selecta, Vol. 1.
[10] Fatović-Ferenčić, S. and Ferber-Bogdan, J.: Ljekarnik Eugen Viktor Feller
(Pharmacist Eugen Viktor Feller). (Croatian with English summary)
Medicus, 6 (1997) 277–283.
[11] Fatović-Ferenčić, S. and Ferber-Bogdan, J.: Ljekarna K sv. Trojstvu:
izgubljeni sjaj zagrebačke secesije. (Croatian) Medicus, 16 (2007) 121–
128.
[12] Feller, M.: Das Landhaus E. V. Feller in Agram. Von Architekt Math-
ias Feller in München. Innendekoration 25 (September 1914) 367–398.
digi.ub.uni-heidelberg.de/diglit/innendekoration1914/0396 –
digi.ub.uni-heidelberg.de/diglit/innendekoration1914/0428
[13] Feller, W.: Handwritten letter to Borge Jessen, dated 1 September 1946.
(The letter was kindly provided by Christian Berg, June 2014).
[14] Feller, W.: Typed letter to Borge Jessen, dated 2 April 1947. (The letter
was kindly provided by Christian Berg, June 2014).



[15] Fischer, H.: Feller’s Early Work on Measure Theory and Mathematical
Foundations of Probability. These Selecta, Vol. 1, pp. 43 ff.
[16] Fischer, H.: Feller’s Early Work on Limit Theorems. These Selecta, Vol. 1,
pp. 69 ff.
[17] Fraenkel, A. A.: Lebenskreise. Aus den Erinnerungen eines jüdischen
Mathematikers. Deutsche Verlags-Anstalt, Stuttgart 1967.
[18] Fréchet, M. (ed.): Colloque Consacré à la Théorie des Probabilités (I–
VIII). Actualités Scientifiques et Industrielles, tomes 734–740, 766, Her-
mann, Paris 1938/39.
[19] Fukushima, M.: Feller’s Contributions to the One-Dimensional Diffusion
Theory and Beyond. These Selecta, Vol. 2, pp. 63 ff.
[20] Halmos, P.: I Want to be a Mathematician. An Automathography. Springer,
New York (NY) 1985.
[21] Hochkirchen, T.: Wahrscheinlichkeitsrechnung im Spannungsfeld von
Maß- und Häufigkeitstheorie – Leben und Werk des „Deutschen“ Mathe-
matikers Erhard Tornier (1894–1982). NTM International Journal of His-
tory & Ethics of Natural Sciences, Technology & Medicine 6 (1998) 22–41.
[22] Hrvatski Biografski Leksikon, Vol 4 (E–Gm), ed. by Trpimir Macan, Lek-
sikografski Zavod Miroslav Krleža, Zagreb, 1998. URL: hbl.lzmk.hr
[23] Itô, K., McKean, H. P.: Diffusion Processes and Their Sample Paths.
Springer, Grundlehren Bd. 125, Berlin 1965.
[24] Jacob, N.: Feller on Differential Operators and Semi-groups. These Se-
lecta, Vol. 2, pp. 45 ff.
[25] Kendall, D. G., Bingham, N. H., Sondheimer, E. H.: Obituary. Gerd
Edzard Harry Reuter. Bull. London Math. Soc. 27 (1995) 177–188.
[26] Knapp, A.: Private communication. E-mail to the editors dated 28 August
2014.
[27] Kolmogoroff, A. N.: Über die analytischen Methoden in der Wahrschein-
lichkeitsrechnung. Math. Ann. 104 (1931) 415–458. English translation:
On analytical methods in probability theory, in: Shiryayev, A. N.: Selected
Works of A. N. Kolmogorov. Kluwer, Dordrecht 1992, Vol. 2, pp. 62–108.
[28] Maller, R.: Feller’s Work in Renewal Theory, the Law of the Iterated
Logarithm, Karamata Theory and Related Topics. These Selecta, Vol. 2,
pp. 95 ff.
[29] Mukhopadhyay, N.: A conversation with Ulf Grenander. Statistical Science
21 (2006) 404–426.



[30] Neyman, J.: L’estimation statistique traitée comme un problème classique
de probabilité. Exposé III, pp. 25–57 in Vol. VI of [18].
[31] Peskir, G.: On boundary behaviour of one-dimensional diffusions: From
Brown to Feller and beyond. These Selecta, Vol. 2, pp. 77 ff.
[32] Pitcher, E.: Mathematical Reviews. In: Pitcher, E. (ed.): A History of the
Second Fifty Years, American Mathematical Society 1939–1988: Volume
I. American Mathematical Society, Providence (RI) 1988, pp. 69–89.
[33] Reid, C.: Hilbert. Springer, New York (NY) 1970.
[34] Reid, C.: Courant in Göttingen and New York. The Story of an Improbable
Mathematician. Springer, New York (NY) 1976.
[35] Reingold, N.: Refugee mathematicians in the United States of America,
1933–1941: Reception and reaction. In: Duren, P. L. et al. (eds.): A Cen-
tury of Mathematics in America Volume 1. Am. Math. Soc., Providence
(RI) 1988, pp. 175–200. Originally published as Annals of Science 38
(1981) 313–338.
[36] Risser, R.: Applications de la statistique à la démographie et à la biologie.
Traité du calcul des probabilités et de ses applications, III.3. Gauthier–
Villars, Paris 1932.
[37] Rosenblatt, M.: William Feller. July 7, 1906–January 14, 1970. In: Na-
tional Academy of Sciences, Office of the Home Secretary (ed.): Biograph-
ical Memoirs. Volume 90, National Academies Press 2009. (This is mainly
based on [4] and [9].)
[38] Rota, G. C.: Fine Hall in its Golden Age: Remembrances of Princeton. In:
Duren, P. L. et al. (eds.): A Century of Mathematics in America. Volume
2. Am. Math. Soc., Providence (RI) 1988, pp. 195–236.
[39] Scholz, E.: Feller and Busemann on Surface Theory — Contributions to
Geometry. These Selecta, Vol. 1, pp. 87 ff.
[40] Seneta, E.: Karamata’s characterization theorem, Feller, and regular vari-
ation in probability theory. Publications de l’Institut Mathématique, Nou-
velle série, 71(85) (2002) 79–89.
[41] Siegmund-Schultze, R.: Mathematicians Fleeing from Nazi Germany. In-
dividual Fates and Global Impact. Princeton University Press, Princeton
(NJ) 2009.
[42] Šikić, H.: William Feller. In: J. Herak: Distinguished Croatian Scientists in
America. Croatian–American Society and Matica Hrvatska, Zagreb 1997,
pp. 105–115.
[43] Snell, J. L.: A conversation with Joe Doob. Statistical Science 12 (1997)
301–311.



[44] Steinhaus, H.: Les probabilités dénombrables et leur rapport à la théorie
de la mesure. Fundamenta Mathematicae 4 (1923) 286–310.
[45] Uhlig, R. (ed.): Vertriebene Wissenschaftler der Christian-Albrechts-
Universität zu Kiel (CAU) nach 1933. Kieler Werkstücke Reihe A:
Beiträge zur schleswig-holsteinischen und skandinavischen Geschichte
Band 2, Peter Lang, Frankfurt am Main 1991.
[46] Yosida, K.: An operator-theoretical treatment of temporally homogeneous
Markoff process. J. Math. Soc. Japan 1 (1949) 244–253.

[47] Žubrinić, D.: Vilim Feller / William Feller (Croatian and English).
Graphis, Zagreb 2010.

Prof. René L. Schilling
Fachrichtung Mathematik
Institut für Mathematische Stochastik
Technische Universität Dresden
Helmholtzstraße 10
01069 Dresden
Germany

Prof. Wojbor A. Woyczyński
Department of Mathematics,
Applied Mathematics and Statistics
Case Western Reserve University
10900 Euclid Avenue
Cleveland, Ohio 44106
U.S.A.



Feller’s Early Work on
Measure Theory and
Mathematical Foundations
of Probability
by Hans Fischer from Eichstätt

Introduction
During the first years of the 1930s, measure and integration as well as the
theory of probability began to gain the shape which we are used to these
days. Monographs like Saks’s “Theory of the Integral” (1st edn. 1933 [38]1 in
French, 2nd edn. 1937 [40] in English) and Kolmogorov’s “Grundbegriffe der
Wahrscheinlichkeitsrechnung” (1933, [27]) still provide a very “modern” im-
pression to the reader. Feller’s famous probabilistic papers from the second half
of the 1930s on limit theorems and related topics are, in accord with his first
publications, rather within the scope of “classical” analysis, and this fact cer-
tainly contributed to the popularity of these accounts. Therefore, it is a little
surprising, at first glance, that he had, from the beginning of his mathematical
career, very strong interests in measure and integration, and related proba-
bilistic problems. This interest is also reflected by his activities as a reviewer
in the “Jahrbuch über die Fortschritte der Mathematik”, and, especially, the
“Zentralblatt”; the latter had been founded in 1931 by his friend Otto Neuge-
bauer, whom Feller had known since their shared Göttingen days. Altogether,
Feller wrote 7 longer reviews in these journals during the 1930s, among them
on the above-mentioned “Grundbegriffe”, and on the likewise very influential
survey “Asymptotische Gesetze” by Khinchin [26], which appeared in 1933
as well. Feller’s report (Zbl 0007.21601) on Kolmogorov’s booklet is – as a

1 The references [Feller 19nn] and [*Feller 19nn] (the star indicating that the respective
paper is not contained in these Selecta) refer to Feller’s bibliography, while [n] points to the
list of references at the end of this essay.

© Springer International Publishing Switzerland 2015 43


R.L. Schilling et al. (eds.), Selected Papers I,
common feature of the reviews in both journals – quite neutral, though a certain
sympathy is visible. Feller designates Kolmogorov’s approach as “imbedded,
very naturally, into general measure theory”, Kolmogorov’s demand for
sigma-additivity is called “more than plausible”, and its significance for “well-
roundedness and simplicity of the theory” is pointed out. With respect to
sigma-additivity, Feller is in a certain contrast to Karl Dörge’s (likewise sym-
pathetic) review (JFM 59.1152.03) in “Jahrbuch”. Whereas Dörge refers to
the “large distance” caused by sigma-additivity “to the origin of probability
theory, which resides in frequency theory”, Feller, as it seems, attributes
to sigma-additivity an “ideal” role, which does not affect the applicability of the
theory. And in fact, especially Feller’s contributions to the theory of frequency,
the seemingly “natural” antipode of the concept of probability as an abstract
measure, show the author’s approach of basing all probabilistic problems on
the theory of measure and integration as a common mathematical ground.

Measure and Integration


Feller’s excellence regarding measure and integration is shown by his pertinent
work between 1930 and 1935, which was, in part, jointly done with his col-
leagues Erhard Tornier and Herbert Busemann. Roughly speaking, the main
problems of all these papers are extension of measures on the one hand, and
differentiation of “indefinite integrals” with respect to sets, on the other.2
Both themes were in the core of the contemporary activities concerning
measure and integration, and both were connected to each other. Feller’s ap-
proach to measure extension was essentially in the scope of metric spaces, and
there were still essential analogies with Radon’s traditional 1913 method of ex-
tending non-negative and countably additive interval functions as measures to
Borel sigma-algebras [37]. Yet, Feller’s particular concern was not to consider
interval functions (or additive functions on corresponding systems of sets) as
given, but to “construct” them by a process whose structure corresponds to a
combinatorial tree.
The starting point of Feller’s activities in measure and integration was his
joint work with Tornier. Feller had met Tornier in Kiel, and for some
time there was a close collaboration between the two mathemati-
cians (see [41, p. 23]). On the basis of the available biographical material, it is not
possible to discern which of the two authors was responsible for which
mathematical results. It is an obvious assumption that Tornier’s interests in
foundational issues of probability and number theory on the one hand, and
Feller’s analytic ambitions on the other provided the basis for their joint work.
At least since Borel’s seminal contribution (1909, [5]) on the distributions
of particular real numbers under the assumption that their binary represen-

2 For an outline of the history of measure theory from ca. 1900 to ca. 1950, see [36].
A wealth of historical details can be found in [17]. Especially with regard to the relations
between the development of probability theory and measure theory, see [23] and [42].

44 Measure Theory and Foundations of Probability


tation was generated by a Bernoulli process, which also contained first, rudi-
mentary versions of strong laws, a variety of accounts dealt with questions of
the distributions of specific numbers in a close relation with measure theoretic
probabilistic interpretations and methods (see [52, pp. 46–60]). A “density”
assigned to particular natural numbers which appear $N(n)$ times among the
first $n$ natural numbers is defined as
\[ \lim_{n\to\infty} \frac{1}{n} N(n), \]
provided this limit exists. If $2 = p_1 < p_2 < \dots < p_k < \dots$ is the ordered sequence
of prime numbers, then each natural number $z$ can be uniquely represented by
a sequence $(\lambda_k) \subset \mathbb{N}_0$ of exponents such that
\[ z = \prod_{k=1}^{\infty} p_k^{\lambda_k}, \]
where only finitely many $\lambda_k$ are not zero. In this sense, the study of certain
sets of natural numbers can be transferred to the study of sequences
(λk ), and the idea is to interpret densities of numbers with certain properties
as measures in certain sequence spaces. This was the point where Tornier’s
and Feller’s collaboration began. The papers “Maß- und Inhaltstheorie des
Baireschen Nullraums” [*Feller 1932a] and “Mengentheoretische Untersuchung
von Eigenschaften der Zahlenreihe” [*Feller 1932b] have to be considered as
closely connected, as one can see from the summarizing report on both papers
[*Feller 1931], and also from the same submission date 10 July 1931.
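The two notions just introduced, the exponent sequence of a natural number and the ratio N(n)/n whose limit is the density, can be illustrated by a short sketch. The function names and the choice of the squarefree numbers as a test property are assumptions made here for illustration only; nothing in this code is taken from the papers of Feller and Tornier.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small inputs)."""
    found = []
    for c in count(2):
        if all(c % p for p in found):
            found.append(c)
            yield c


def exponents(z):
    """Return the leading part of (lambda_k) with z = prod p_k**lambda_k.

    All entries after the returned list are zero.
    """
    if z < 1:
        raise ValueError("z must be a natural number >= 1")
    lam = []
    for p in primes():
        if z == 1:
            break
        e = 0
        while z % p == 0:
            z //= p
            e += 1
        lam.append(e)
    return lam


def empirical_density(has_property, n):
    """N(n)/n, where N(n) counts the z in 1..n having the property."""
    return sum(1 for z in range(1, n + 1) if has_property(z)) / n


def squarefree(z):
    """A number is squarefree when every exponent lambda_k is 0 or 1."""
    return all(e <= 1 for e in exponents(z))
```

For instance, `exponents(10)` gives the sequence beginning 1, 0, 1, since 10 = 2 · 5. The property "squarefree" is a condition on the sequence (λk) alone, which is the kind of translation from sets of numbers to sets of sequences described above.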
The Baire null space $R$ is defined [*Feller 1932a, p. 166] as a space of
sequences
\[ x = e_1^{(\rho_1)} e_2^{(\rho_2)} e_3^{(\rho_3)} \dots \]
of “symbols” $e_i^{(\rho_i)}$ ($i = 1, 2, 3, \dots$), where the symbols with the integer variables
$\rho_i$ are elements of sets $\{e_i^{(\rho_i)} \mid 0 \le \rho_i \le t_i\}$ with fixed constants $t_i \in \mathbb{N}_0 \cup \{\infty\}$
such that at most finitely many $t_i$ are equal to 0. In the most common case
all these sets are simply identical to $\mathbb{N}_0$, and the Baire space is the space $\mathbb{N}_0^{\mathbb{N}}$
consisting of all sequences of non-negative integers. Each of the more general
Baire null spaces
\[ \{e_1^{(\rho_1)}\} \times \{e_2^{(\rho_2)}\} \times \cdots \]
can be identified with a subspace of $\mathbb{N}_0^{\mathbb{N}}$.3 The “distance” of two “points”
$x = (e_i^{(\rho_i)})$ and $y = (e_i^{(\sigma_i)})$ in $R$, written symbolically as $xy$, is defined by
\[ xy := \frac{1}{n} \]
if $n$ is the smallest number such that $\rho_n \ne \sigma_n$. Two sequences which are
3 Insofar, the use of the symbols $e_i$ was unnecessarily complicated. Intricate notation
was a specialty of Tornier.

H. Fischer — Selected Works of W. Feller, Volume 1 45


different in the first element have the distance 1, and two identical sequences
have the distance 1/∞ = 0. One easily checks that all Baire spaces equipped
with this metric are complete and separable metric spaces.4
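A minimal sketch of the distance just described may help. It assumes that each point is represented by a finite prefix of its sequence and that prefixes of equal length which agree everywhere are treated as identical; the function name is hypothetical.

```python
from fractions import Fraction


def baire_distance(x, y):
    """Return 1/n, where n is the first (1-based) index at which x and y differ.

    x and y are finite prefixes of equal length; if no difference is found,
    the distance is taken to be 0.
    """
    for n, (a, b) in enumerate(zip(x, y), start=1):
        if a != b:
            return Fraction(1, n)
    return Fraction(0)
```

With this representation, `baire_distance((3, 1, 4), (3, 1, 5))` is 1/3, and two prefixes that differ in the first entry are at distance 1. The distance also satisfies the strong triangle inequality d(x, z) ≤ max(d(x, y), d(y, z)), because the first index at which x and z differ cannot come before both of the other two first differences.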
The construction of measures on Baire spaces R departs from a hierarchy
of so-called “basic sets” [*Feller 1932a, pp. 167 f.]. R itself is called “basic set
of zeroth grade”, and for all n > 0 the basic sets of nth grade are defined as
consisting of all sequences whose first n members are fixed. Each basic set is
both open and closed, and therefore (in the now usual terminology) a “clopen”
set with respect to the topology induced by the Baire metric. In German these
sets were termed “a.o.-Mengen”, where “a” stood for “abgeschlossen” (closed)
and “o” for “offen” (open). Any two basic sets are disjoint or nested. On these
basic sets a “measure” | · | is defined [*Feller 1932a, p. 169] such that

(i) $0 \le |E| < \infty$ for all basic sets $E$, $|\emptyset| = 0$

(ii) If an $n$th grade basic set $E$ is decomposed into basic sets $E^{(i)}$ of $(n + 1)$st
grade (whose elements coincide with respect to their first $n$ coordinates
with all elements of $E$), then
\[ |E| = \sum_i |E^{(i)}|. \]

Feller and Tornier show that this “measure” can be extended to open sets
$F$ such that $|F| = \sum_\nu |E_\nu|$ for any decomposition of $F$ into basic sets $E_\nu$
[*Feller 1932a, pp. 169–176]. The outer measure $\overline{A}$ of a set $A \subset R$ is defined
in the usual way as the infimum of the measures of all open sets containing $A$.
The inner measure $\underline{A}$ is correspondingly defined by
\[ \underline{A} = |R| - \overline{(R \setminus A)}. \]
A set $A \subset R$ is called measurable if and only if $\underline{A} = \overline{A}$, and in this case the
common value $|A| := \underline{A} = \overline{A}$ is said to be its measure. Referring to the usual
arguments in relation with Lebesgue measure, Feller and Tornier finally con-
clude that all sets which are measurable (in the just indicated sense) form
a sigma-algebra with their measures being countably additive [*Feller 1932a,
pp. 176–179].
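Condition (ii) can be made concrete with a small sketch. It assumes the simplest Baire null space in which every coordinate takes only the values 0 and 1, identifies a basic set of nth grade with the tuple of its first n fixed coordinates, and uses a product of fixed weights as the “measure”. The weights and all names are hypothetical choices for illustration, not the construction in [*Feller 1932a].

```python
from fractions import Fraction

# Hypothetical weights for the two admissible symbols; they sum to 1.
WEIGHTS = {0: Fraction(1, 3), 1: Fraction(2, 3)}


def basic_set_measure(prefix):
    """Measure of the basic set of all sequences starting with `prefix`."""
    m = Fraction(1)
    for symbol in prefix:
        m *= WEIGHTS[symbol]
    return m


def children(prefix):
    """The basic sets of the next grade into which `prefix` decomposes."""
    return [prefix + (s,) for s in WEIGHTS]


def satisfies_condition_ii(prefix):
    """Check |E| = sum of |E^(i)| over the next-grade decomposition of E."""
    return basic_set_measure(prefix) == sum(
        basic_set_measure(c) for c in children(prefix)
    )
```

Here the empty tuple stands for the basic set of zeroth grade, that is, the whole space, with measure 1. Any other assignment of non-negative numbers to prefixes would serve equally well as long as each value equals the sum over its children.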
Feller adapted this method for the construction of measures on $\mathbb{R}^n$ in his
paper [*Feller 1932c, pp. 460–465]. The simplest case concerned a measure
concentrated on an interval I = [a, b). In a first step, this interval is divided
into two parts, each closed on the left and open on the right. Each of these
two parts is divided in a second step into two parts again, closed on the left
and open on the right, and so on. The successive division has to be pursued in
4 In its original form the Baire null space was introduced in 1909 as the space of all
sequences of natural numbers. Baire [1, pp. 133 f.] observed that “his” space behaved,
e. g., with respect to connectivity, very differently compared with “ordinary” spaces $\mathbb{R}^n$.
Therefore he named this space “espace à 0 dimension” without explicitly referring to a
particular concept of dimension. Thanks to Bernhard Beham for this information.



a way such that the lengths of the respective intervals tend to 0. The interval
I is termed “zeroth grade basic set”, its two parts from the first step “first
grade basic sets”, the altogether 4 parts of the second step “second grade
basic sets”, and so on. Any two different basic sets are disjoint, or one of
them is a subset of the other. The analogy to the kth grade basic sets of the
Baire null space is obvious, especially when observing that, with respect to the
topology generated by all half open intervals $[x_1, x_2)$ with $a \le x_1$ and $x_2 \le b$,
the just defined basic sets are clopen as well. Feller explains this in the same
sense, though in a somewhat different phrasing, since the now common terminology
of general topology was not entirely established by the early 1930s.
To all basic sets $E$ a non-negative “measure” $|E|$ is assigned, such that
$|E| = |E^{(1)}| + |E^{(2)}|$ whenever $E$ is the union of two disjoint basic sets $E^{(1)}$
and $E^{(2)}$. A second condition for the measure of basic sets is that for all nested
systems of basic sets $E^{(1)} \supset E^{(2)} \supset E^{(3)} \supset \dots$ whose intersection is empty, the
measure $|E^{(n)}|$ tends to 0 for $n \to \infty$. Because of the well-known relation of this
condition to sigma-additivity, which in the case of Baire spaces with respect to
basic sets is given by condition (ii) above, it is not surprising that Feller was
able to extend the measure defined on the basic sets of I by essentially the
same arguments as with Baire spaces to a (uniquely determined) countably
additive measure defined on the sigma-algebra generated by the basic sets.
In turn, if any nonnegative increasing and bounded function $m(x)$ is given
for $x \in I$, such that the measure of a subinterval $[x_1, x_2)$ is defined by $m(x_2) - m(x_1)$,
then Feller’s construction can be applied and leads, independently of
the particular choice of the basic sets, to a sigma-additive measure defined on
all Borel sets and all sets whose difference to a Borel set is contained in a set
of measure 0 [*Feller 1932c, pp. 465 f.].
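The successive division of an interval and the assignment of measures by an increasing function can be sketched as follows. The sketch assumes that every basic set is halved at its midpoint, which is only one admissible way of letting the lengths tend to 0, and it uses m(x) = x² on [0, 1) as a hypothetical example; all names are illustrative.

```python
from fractions import Fraction


def dyadic_basic_sets(a, b, grade):
    """Basic sets of the given grade: [a, b) halved `grade` times.

    Each basic set is returned as a pair (x1, x2) standing for [x1, x2).
    """
    a, b = Fraction(a), Fraction(b)
    step = (b - a) / 2**grade
    return [(a + i * step, a + (i + 1) * step) for i in range(2**grade)]


def measure(m, interval):
    """Measure m(x2) - m(x1) of the half open interval [x1, x2)."""
    x1, x2 = interval
    return m(x2) - m(x1)


def total_measure(m, a, b, grade):
    """Sum of the measures of all basic sets of one grade."""
    return sum(measure(m, e) for e in dyadic_basic_sets(a, b, grade))
```

Because the sums telescope, `total_measure` returns m(b) − m(a) for every grade, and each basic set has the same measure as its two halves together, which is the additivity required of the measure on basic sets.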
At the end of his account, Feller sketched how his method of construc-
tion can be carried over, with minor modifications, to bounded intervals in
higher dimensions [*Feller 1932c, pp. 471 f.]. A direct continuation of these
ideas is Feller’s paper “Bemerkungen zur Masstheorie in abstrakten Räumen”
[*Feller 1934b], which essentially deals with (sigma-finite) measures in compact
and separable metric spaces and also contains, in its last part, some remarks
on non-additive “measures”.
Feller’s approach had some advantages, with respect to “Lebesgue’s main
theorem”, for example, according to which each measure that is absolutely
continuous in relation to a second can be represented by an integral with re-
spect to the second measure [*Feller 1932c, p. 472]. Another example was the
straightforward extension to a theory of measure and integration in product
spaces with “easy” proofs for the theorems of Fubini and Tonelli [*Feller 1934b,
pp. 42 f.]. Feller’s approach was restricted, however, to spaces with specific
topological properties. Moreover, Hahn’s (1933, [19, § 2]) and Kolmogorov’s
(1933, [27, Chapter 2]) measure extension theorem (according to which – in
modern terms – a pre-measure defined on a set algebra can be uniquely ex-
tended to a measure in the smallest sigma-algebra containing the set algebra)
would considerably have simplified Feller’s arguments. Altogether, the impact



of Feller’s papers on “measure construction” remained rather low.
The paper [Feller 1934a] which Feller published together with Herbert
Busemann was on the differentiation of so-called “indefinite integrals” in the
scope of Lebesgue’s theory. If $f$ is a locally summable function on $\mathbb{R}^n$, then
for any system $R$ of Lebesgue measurable and bounded subsets of $\mathbb{R}^n$ the
indefinite integral of $f$ is defined as the mapping
\[ R \ni \rho \mapsto \int_\rho f(x)\,dx =: I(\rho). \]
If for a point $Q \in \mathbb{R}^n$ and a certain sequence $\rho_1, \rho_2, \dots =: \{\rho_k\} \subset R$ with
Lebesgue measure $m(\rho_k) > 0$, such that $\operatorname{diam}(\rho_k) \to 0$ and $\bigcap_k \rho_k = \{Q\}$, the
limit
\[ \lim_{k\to\infty} \frac{1}{m(\rho_k)} I(\rho_k) \tag{1} \]
exists, then this limit is called “derivative” of the indefinite integral in $Q$ with
respect to the given sequence of sets. The basic problem in this context is:
Which properties have to be imposed on set sequences assigned to any point
$Q \in \mathbb{R}^n$ such that for all locally summable functions $f$ and almost all points
$Q \in \mathbb{R}^n$ the derivative with respect to these sequences exists and is equal to
$f(Q)$?
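The quotient in (1) can be evaluated exactly in a simple one-dimensional case. The sketch below assumes f(x) = x², uses the known antiderivative x³/3 to obtain the indefinite integral, and takes the non-centred intervals [Q − 1/k, Q + 2/k) as the contracting sequence; these choices and all names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def f(x):
    """The integrand; a hypothetical example."""
    return x * x


def antiderivative(x):
    """An antiderivative of f, used to evaluate I(rho) exactly."""
    return x**3 / 3


def average(Q, k):
    """I(rho_k) / m(rho_k) for the interval rho_k = [Q - 1/k, Q + 2/k)."""
    h = Fraction(1, k)
    lo, hi = Q - h, Q + 2 * h
    return (antiderivative(hi) - antiderivative(lo)) / (hi - lo)
```

A direct computation gives `average(Q, k)` = Q² + Q/k + 1/k², so the quotients tend to f(Q) as k grows. For a continuous integrand this is elementary; the question discussed in the text is what remains true for arbitrary locally summable functions and more general sets.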
In the case of locally summable functions $f$ of one variable with $R$ being
the system of all bounded intervals, the problem is equivalent to that of the
existence of the derivative with respect to $x$ of an integral function $\int_a^x f(s)\,ds$.
Lebesgue [28] in 1904 (see [33, p. 1092]) proved that
\[ \frac{d}{dx} \int_a^x f(s)\,ds = f(x) \tag{2} \]
almost everywhere. In 1908, Vitali [49] (see [30, pp. 362 f.]; [33, p. 1131])
sketched how this assertion can be extended to several dimensions. In the
two-dimensional case we have
\[ \lim_{\substack{h,k\to 0 \\ 0<\alpha<|h/k|<\beta}} \frac{1}{hk} \int_x^{x+h}\!\int_y^{y+k} f(s,t)\,ds\,dt = f(x,y) \tag{3} \]

for almost all (x, y), provided that f is locally summable. Strictly speaking,
the constraint 0 < α < |h/k| < β was made explicit only by Lebesgue (1910, [30,
p. 363]) in the introduction of his comprehensive article on relations between
differentiation and integration in several dimensions, in which also Vitali’s
result was considerably extended. In this context, Lebesgue’s main device was
the generalization of an assertion in Vitali’s above-cited 1908 paper. His (and
others’) generalized versions of this assertion would later simply be called
“Vitali’s covering theorem”. Lebesgue [30, pp. 390 f.] first defined what he
termed a “regular family”: A family F of Lebesgue measurable sets is called



“regular” if and only if there exists a positive constant μ such that for all sets
F ∈ F the inequality m(F )/m(S) > μ holds, where S denotes the smallest
sphere containing F . Then he [30, pp. 391–394] extended Vitali’s assertion
(which had only dealt with set families consisting of multidimensional cubes)
step by step. His version reads:

Let E be a measurable set of which each point appears in an infinity
of domains [i. e., perfect and connected subsets of $\mathbb{R}^n$] whose
diameters are as small as one wishes, which form a regular family,
and which are extracted from a given family F of domains. Then
one can find among F a finite or countably infinite number of do-
mains, two by two without common points, such that the sum of
their measures is not smaller than m(E).
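The role of the regularity condition can be seen for rectangles in the plane. The sketch below assumes axis-parallel rectangles with side lengths h and k and uses the fact that the smallest disc containing such a rectangle has the diagonal as its diameter; the function name is hypothetical.

```python
import math


def regularity_ratio(h, k):
    """m(F)/m(S) for an h-by-k rectangle F and its smallest enclosing disc S."""
    rectangle_area = abs(h * k)
    # The smallest enclosing disc has diameter sqrt(h^2 + k^2).
    disc_area = math.pi * (h * h + k * k) / 4
    return rectangle_area / disc_area
```

The ratio depends only on the quotient h/k: for squares it equals 2/π, and it stays above a positive constant μ as long as h/k is kept between two fixed positive bounds α and β. For rectangles that become arbitrarily thin, for example with h = 10⁻³ and k = 10⁻⁶, the ratio is close to 0, so such a family is not regular in Lebesgue’s sense.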

It was apparently Lebesgue who started the discussion of derivative in the
general sense of (1). He [30, p. 361] referred to Volterra, however without
precisely indicating the specific source. It may be that an analogous notion in
the context of calculus of variations, introduced by Volterra [50] in 1887 (see
[33, p. 1132]), inspired Lebesgue’s idea. Immediately after his
account of Vitali’s covering theorem, Lebesgue [30, pp. 395–405] assigned to
each point $Q \in \mathbb{R}^n$ in question arbitrarily many sequences of measurable sets
{ρk (Q)}, say, contracting to Q and each belonging to a (possibly different)
regular family. He showed that for a locally summable function $f$ defined on
$\mathbb{R}^n$ a derivative of the indefinite integral with respect to the sequences $\{\rho_k(Q)\}$
exists in almost all $Q \in \mathbb{R}^n$ such that
\[ \frac{1}{m(\rho_k(Q))} \int_{\rho_k(Q)} f(x)\,dx \to f(Q). \tag{4} \]

In his proof, Lebesgue made decisive use of Vitali’s theorem, for which, in
turn, the assumption of regular families seemed to be indispensable.5
In a footnote on pp. 362–363 of his 1910 paper, Lebesgue stated that neither
the validity nor the invalidity of (3) without the constraint 0 < α < |h/k| < β
corresponding to the assumption of regular sequences of two-dimensional inter-
vals was proven yet. Moreover, Harald Bohr and Stefan Banach later showed
that already in the case of certain higher-dimensional intervals which do not
fulfill the assumption of “regular families” a countable “Vitali covering” is
not possible, as is also hinted at in [Feller 1934a, p. 227].6 Therefore, it

5 Montel and Rosenthal [33, pp. 1132–1134] give an excellent description of Lebesgue’s

main ideas.
6 Banach (1924, [2]) constructed a bounded set F ⊂ R^2 with Lebesgue measure m(F) = 1
such that for each point of this set there exists a particular sequence of rectangles whose
edges are parallel to the coordinate axes and which contract to this point. From the family
of these rectangles, however, each countable subfamily consisting of mutually disjoint
rectangles covers a set of measure at most (1/2) m(F). Bohr constructed another counterexample,
again in relation to two-dimensional intervals, which he had already communicated

was desirable to attempt a theory of derivatives of indefinite
integrals without using Vitali’s covering theorem, and this was exactly Busemann
and Feller’s goal. In [*Feller 1932c, pp. 467 f.] and, for metric spaces
in [*Feller 1934b, pp. 40 f.], an easy approach to the problem of derivatives
of indefinite integrals had been expounded, but only with respect to “basic
sets”.7 On the last page of [*Feller 1932c] a reference is made to the problem
in relation to more general systems of sets. Yet, in Feller’s joint work with
Busemann, entirely different methods were used.
In contrast to Lebesgue, Busemann and Feller do not concentrate on se-
quences of arbitrary regular sets assigned to each single point of Rn , but on
systems R containing (open) sets of a certain type, such as spheres, higher-
dimensional intervals, general higher-dimensional cuboids, or even more gen-
eral parallelepipeds, from which one can construct for any point Q a con-
tracting sequence {ρk (Q)}. By an argument basically known since Lebesgue’s
1910 paper (see below), they show that in the case of bounded functions f
the validity of the so-called “density theorem” is sufficient (and automatically
necessary) for (4) [Feller 1934a, pp. 247–250]. By definition, the “density the-
orem” holds for R if, for each measurable set κ, in almost all points Q ∈ Rn
for all sequences {ρk } ⊂ R which contract to Q the property

\[ \lim_{k\to\infty} \frac{m(\rho_k \cap \kappa)}{m(\rho_k)} = \begin{cases} 1 & \text{if } Q \in \kappa \\ 0 & \text{otherwise} \end{cases} \]
is valid [Feller 1934a, p. 227]. The left side of this equation is called “density
of κ with respect to R in Q” [Feller 1934a, p. 229]. The notion of “density” in
the just described sense is due to Lebesgue (1905, [29, p. 266]). Lebesgue [30,
pp. 405–408] even showed that in the one-dimensional case with R being the
system of all bounded intervals the density theorem holds for all measurable
subsets κ of R, and, with the aid of this result, he elegantly deduced (2) for almost
all real x. Within a short time, Charles Jean de la Vallée Poussin extended
these arguments and results to the higher-dimensional case in the scope of
regular sequences, see [15, pp. 71–74]. Busemann and Feller’s idea of tackling
the problem by means of the density theorem is therefore quite obvious, and,
consequently, a significant portion of their text [Feller 1934a, pp. 229–238] is
devoted to necessary and sufficient conditions for the validity of the density
theorem. In particular, both authors prove that the density theorem holds if
R is the system of all multi-dimensional open intervals [Feller 1934a, pp. 238–
242], whereas the density theorem is in general not true if R is the system of
all arbitrary rectangles (not only those parallel to the coordinate axes) in R2
[Feller 1934a, pp. 243–247].
The issue of unbounded functions takes another considerable part of Buse-
mann and Feller’s article [Feller 1934a, pp. 250–256]. The main theorem in
in 1918 to Carathéodory. This communication was eventually published in Carathéodory’s
1927 book (see [8, pp. 689–692]).
7 Bruckner [6, p. 28] even cites [*Feller 1934b] as the earliest attempt
at this kind of differentiation in an “abstract” space.


this context is as follows:

The system R is assumed to contain for each set ρ also all the other sets that
can be generated from ρ by an affine mapping composed of a translation and
a uniform scaling.8 With respect to R the assertion (4) holds for any – even
unbounded – locally summable function f if and only if for any positive numbers
a1 , . . . , an and any arbitrary disjoint, bounded and Lebesgue measurable sets
κ1 , . . . , κn the union s of all sets ρ ∈ R with


\[ \sum_{\nu=1}^{n} a_\nu \, m(\kappa_\nu \cap \rho) > m(\rho) \]

satisfies the condition


\[ m(s) < C \sum_{\nu=1}^{n} a_\nu \, m(\kappa_\nu), \tag{5} \]

where the constant C only depends on the system R.

The validity of (5) can be shown without any use of Vitali’s covering the-
orem for systems of “standard” sets, e. g., in the two-dimensional case for
circles, squares, or rectangles with bounded ratios of edge lengths. A significant
consequence of Busemann and Feller’s main theorem arises from the fact that
(5) may already fail if R is the system of all rectangles parallel
to the coordinate axes [Feller 1934a, p. 255]. Since Vitali’s covering theorem
for a system R is sufficient for the differentiability of indefinite integrals with
respect to sequences from R [Feller 1934a, p. 254], it can be inferred that this
theorem does not hold if R is assumed to consist of rectangles parallel to the
coordinate axes without any restriction on the ratios of their edge lengths. By
this argument, an alternative proof for the assertion of Banach and Bohr (see
above) is possible.
Vitali’s own proof of the covering theorem, as well as the modifications
of this proof by Lebesgue (see above), Banach [2], or Carathéodory (see [8,
pp. 299–307]) used the axiom of choice, but only in its “weak” denumerable
version. Despite the controversial role of this axiom with regard to the founda-
tions of set theory, it seems that its use (at least in its denumerable form) was
not seen as problematic in connection with the just described problems. This
even applied to Lebesgue, who was, in principle, rather critical towards the
axiom of choice (see [34, pp. 95 f.; 314–317]). Busemann and Feller do not use
Vitali’s theorem, but their considerations are also based on the denumerable
axiom of choice at several places (see, e. g., [Feller 1934a, pp. 231 f.; 255]). In
this sense they share the usual pragmatic attitude of their colleagues working
in measure theory.

8 In short, R is closed under homothetic transformations.


Busemann and Feller’s main focus was on sets and not on functions. Their
assertions referred to “all” bounded or possibly unbounded functions, respectively.
In this respect, their achievements may have been perceived at first as less
important in comparison with other results which were sharper with respect
to the specific functions under consideration. A survey of the state of the
art around 1935 – also referring to Busemann and Feller’s contribution – is
given in Saks’s 1937 monograph [40, Chapter IV]. From this source (as already
from the note on the last page of [Feller 1934a]) we also learn that several of
Busemann and Feller’s achievements, e. g., the above mentioned result on the
failure of measure differentiation with respect to general multi-dimensional in-
tervals in the case of unbounded integrands,9 had been reached at practically
the same time by other authors.
Still, the very general discussion of differentiation of integrals and density
theorems by Busemann and Feller contained such a wealth of interesting ideas
and details that it came to exert – although rather gradually after 1945 – significant
influence on the subsequent development of pertinent problems, even
in abstract measure spaces.10 Nowadays any family B of open and bounded
subsets of Rn with positive Lebesgue measure such that for any Q ∈ Rn there
is a sequence {ρk } ⊂ B with Q ∈ ρk for all k and diam(ρk ) → 0, and all se-
quences from B contracting to Q are considered with respect to a possible
limit (4), is called a “Busemann-Feller differentiation base” (cf. [13, p. 42]).11
The conditions for the density theorem and for differentiability as expounded in
[Feller 1934a] are given in terms of unions s of sets belonging to B whose inter-
sections with particular measurable sets κ have certain properties. Because,
in general, s surrounds κ like a halo, those properties “of Busemann-Feller
type” have been called “halo properties” since the 1950s.12 Miguel de Guzmán’s
very readable book [13] from 1975 is, in large portions, a downright homage
to Busemann and Feller, in particular with respect to the geometric ideas ex-
pounded by the two authors; in fact, the 1934 article still excels in the richness
of its results and its geometric intuition.

Kollektivs
The two articles [*Feller 1938a], [Feller 1939c] on kollektivs are important contributions
to foundational questions of mathematical probability, which also
reflect their author’s intention of unifying the concepts of chance and measure.

9 Saks’s proof [39] for this failure can be found immediately after Busemann and
Feller’s article in the Fundamenta Mathematicae.

10 A survey of the current state of the theory is in [44]; a historical sketch of its development
from around 1930 is given in [14].

11 This term is also used under the additional requirement that B is closed under homothetic
transformations (cf. [44, p. 205]).

12 This terminology was apparently introduced into the mathematical literature by Hayes
and Pauc [20, pp. 222; 252–255] after some time of informal use. For a classification of
different “halo properties”, see [6, pp. 7–10].


Both papers belong to the context of an intensified interest in von Mises’s theory
of kollektivs during the 1930s.13 Originally, Richard von Mises (1919, [51,
pp. 55–57]) had introduced a frequentist definition of probability in the following
way: Let (ei ) be an infinite sequence of “thought things” such that each
of the ei is mapped onto a certain point xi ∈ M ⊂ R^k. M is called the “Merkmalraum”
(“label space”). It is assumed that at least two different values of
M each occur as images of infinitely many elements under this mapping.
Then (ei ) is called a “kollektiv” if the two following “conditions” or “axioms”
(translated from von Mises’s original text) are valid:

Condition I. Existence of limits. Let A be an arbitrary point
set of the label space and $N_A$ the number of those among the first
N elements of the sequence whose label is a point in A; then for
each A the limit
\[ \lim_{N\to\infty} \frac{N_A}{N} =: W_A \tag{6} \]
exists.

Condition II. Irregularity of the attribution. Let A and B be
two point sets of the label space without any common points, and let
$W_A$ and $W_B$ be the two limits according to (6), where both are not
equal to zero at the same time. From the sequence of all elements
(e) of K [the kollektiv] we at first eliminate all those whose labels
do not belong to A or B. The remaining elements are given the
indices 1, 2, 3, . . . . From the infinite sequence generated in this way
an infinite subsequence $(e')$ is selected such that the indices of the
selected elements are chosen without any regard to the differences of
these elements relative to their labels. Then among the subsequence
$(e')$ the limits $W'_A$ and $W'_B$ according to (6) exist and obey the
condition
\[ W'_A : W'_B = W_A : W_B. \]

Apparently, von Mises considered his kollektivs as ideal schemes serving to
approximately depict real situations, and their existence was self-evident for
him. In this sense, he was less interested in questions of a “formal” logical consistency.
From the beginning, two problems were in the focus of the discussion
of von Mises’s axioms: First, which restrictions were necessary for the sets A, B
as above? Von Mises’s original assumption that to any set in the label space a
probability could be assigned was not maintainable. Already in November of
1919, Hausdorff wrote two personal letters to von Mises with various critical

13 There are a good many surveys of the history of kollektivs up to the time around 1980.
The most comprehensive account is still [3]. For a concise introduction see [52, pp. 183–197].


remarks on von Mises’s assumption. The second letter contains the proof that,
whenever for a certain kollektiv with – at most countably many and possibly
repeatedly appearing – labels ξk the inequality $\sum_k p_k < 1$ holds, where pk is the
probability of the occurrence of ξk , a set A can be constructed such that the
limit (6) does not exist for this kollektiv. The case $\sum_k p_k < 1$ occurs in particular if
the underlying probability distribution is a continuous distribution. In this
case we even have $\sum_k p_k = 0$ (see [43] for details and for an English translation
of both letters). The second difficulty concerned the only imprecisely
defined “irregularity” of von Mises’s random sequences, which was likewise
criticized by Hausdorff. Especially from the late 1920s, these two questions
were discussed by various authors aiming at a logically satisfactory exposition
of von Mises’s ideas, albeit based on certain modifications of the original account.
In principle, Feller and Tornier’s research on densities of (tacitly strictly
monotonically increasing) “sequences” of natural numbers (and likewise of se-
quences consisting of ideals – ordered with respect to their norm – in the ring
of integers of an algebraic number field) already comprised important results
which could be applied to kollektivs. We recall that a natural number with
the prime factorization
\[ p_1^{\lambda_1} \cdot p_2^{\lambda_2} \cdot p_3^{\lambda_3} \cdot p_4^{\lambda_4} \cdots, \]
where p1 , p2 , . . . designate the prime numbers in ascending order, corresponds
to the element
\[ e_1^{(\lambda_1)} e_2^{(\lambda_2)} e_3^{(\lambda_3)} e_4^{(\lambda_4)} \ldots \]
of the Baire null space R. Each basic set consists of all points in R whose first
k “coordinates” λ1 , . . . , λk are fixed, and those elements of it which only have
finitely many λi ≠ 0 correspond to a number sequence consisting of numbers
\[ q \cdot p_1^{\lambda_1} p_2^{\lambda_2} \cdots p_k^{\lambda_k}, \]
where the prime factorization of the natural number q only contains prime
factors > pk . Such number sequences have the density

\[ d(\lambda_1, \ldots, \lambda_k) = \frac{1}{p_1^{\lambda_1} p_2^{\lambda_2} \cdots p_k^{\lambda_k}} \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right). \tag{7} \]
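
The density formula (7) can be checked on a small case. The following Python sketch is only an illustration under stated assumptions: the function names density, exponent and empirical_density, as well as the short list PRIMES, are hypothetical and do not occur in Feller and Tornier's papers. It evaluates (7) exactly with rational arithmetic and compares it with the share of natural numbers up to a finite bound whose first k prime exponents take the prescribed values.

```python
from fractions import Fraction

PRIMES = [2, 3, 5, 7, 11, 13]  # p_1, p_2, ... (enough for small k)


def density(lams):
    """Exact value of formula (7) for fixed exponents lams = (λ_1, ..., λ_k)."""
    d = Fraction(1)
    for p, lam in zip(PRIMES, lams):
        d *= Fraction(1, p**lam) * (1 - Fraction(1, p))
    return d


def exponent(n, p):
    """Exponent of the prime p in the factorization of n (n >= 1)."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e


def empirical_density(lams, limit):
    """Share of n in 1..limit whose first k prime exponents equal lams."""
    hits = sum(
        1
        for n in range(1, limit + 1)
        if all(exponent(n, p) == lam for p, lam in zip(PRIMES, lams))
    )
    return Fraction(hits, limit)
```

For example, with λ1 = 1 and λ2 = 0 the numbers in question are those divisible by 2 but not by 4 or 3; formula (7) gives 1/6, and because this condition is periodic with period 12, empirical_density((1, 0), 1200) yields exactly the same value.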

Feller and Tornier’s main goal was to develop a criterion for the existence of
the density of an arbitrary sequence of natural numbers among all natural
numbers, based on the knowledge of the density (7) for particular sequences.
To this end they considered – in a quite intricate way – matrices with infinitely
many rows and columns, of which the jth column with the entries $e_1^{(\lambda_{1j})}$, $e_2^{(\lambda_{2j})}$,
. . . corresponded to a point in the Baire null space. In this way, each of those

An excellent review of pertinent work during the 1930s is [32]. An easily accessible and very
good survey up to the most recent developments is [4]. Especially for Tornier’s theory see [21], [22].


matrices represented a point sequence in R. In order to apply measure theo-
retic methods, Feller and Tornier established a measure on R with the property
that the measure of the basic sets was defined by their density (7). In order
to broaden the basis for applications (in relation to densities among algebraic
structures, for example) they even considered arbitrary product measures | · |
for the basic sets such that |R| = 1. Aided by their measure theory for Baire
null spaces, they proved two important theorems:

(1) The set ℱ of all matrices F such that for each basic set E ⊂ R the density
of E within F (i. e., the limit as n → ∞ of the number of all columns (up to
the nth) which are points of E divided by n) is equal to |E|, is nonempty.
In the case of the measure (7) this set is nonempty simply because the
matrix corresponding to N is in accord with this condition. In the general
case the proof for the existence of such matrices was part of a proof for an
even more specific assertion [*Feller 1932b, pp. 204–206].
(2) A subset A of the Baire null space R has a density within all matrices F
from the above-characterized set ℱ if and only if A is Peano–Jordan measurable
with respect to the algebra of clopen sets and the measure | · |; then the measure
of the set is equal to its density [*Feller 1932b, pp. 208–214].

A “Peano–Jordan” theory of content in Baire spaces was given in the article
[*Feller 1932a, pp. 179–185]. Let κ be a system of subsets of R forming a subalgebra
of the sigma-algebra of all measurable sets with respect to the measure
| · |. Then the outer and inner Peano–Jordan content (in German “Inhalt”) of
a subset A of R in relation to κ is defined by

\[ \overline{J}_\kappa(A) := \inf\{|B| : B \in \kappa,\ A \subset B\} \quad\text{and}\quad \underline{J}_\kappa(A) := \sup\{|B| : B \in \kappa,\ B \subset A\}, \]

respectively.14 A set A is said to have a Peano–Jordan content Jκ (A) if and
only if
\[ \overline{J}_\kappa(A) = \underline{J}_\kappa(A) =: J_\kappa(A). \]
It follows almost directly from the respective definitions that, if it exists, the
Peano–Jordan content Jκ (A) is identical to the measure |A|. In general, the
Peano–Jordan content is additive, but not sigma-additive. The most important
case is when κ is the set algebra generated by all basic sets, an algebra which
exclusively consists of clopen sets.
The “if” part of the theorem (2) above immediately followed from the fact
that each Peano–Jordan measurable set can be, with respect to its measure,
approximated with an arbitrarily small error from the interior, as well as from
the exterior, by a finite union of disjoint basic sets. The “only if” part needed
considerably more effort, with a quite involved investigation of the “clopen

14 This notion of content had been introduced by Peano (1887, [35, Chapter 5]) in a

geometric context and, more generally in Rn , by Jordan (1892, [25]).


sets”. In the special case of a measure according to (7) the immediate conse-
quence of this theorem is that a sequence of natural numbers has a density in
N if and only if there exists a Peano–Jordan measurable subset A ⊂ R such
that there is a bijection from the sequence to the subset of all elements of A
with finitely many λi ≠ 0 [*Feller 1932b, p. 192].
From the beginning of his pertinent work, Tornier (see, e. g., [45]) had been
well aware of the close relations between number densities and probabilities
in kollektivs. Basically, number sequences were not random, however, and
thus von Mises’s second condition on irregularity was unimportant for him in
the scope of number theory. Yet, his ∞ × ∞-matrices corresponding to point
sequences in the Baire space – each column stood for a point – could also be
used to represent an infinity of “random” experiments [45, p. 181]. In this way,
he developed a theory in which frequency and probability measure could be
unified on the basis of a measure theory in (to some extent generalized) Baire
spaces.
In the simplest case of coin tosses, each matrix has the entries 0 or 1 only.
Each column in a matrix corresponds to a random experiment of infinitely
many, not necessarily independent tosses. The event “0” in a single toss is
represented by all points of the Baire space {0, 1}^N whose first coordinate is
equal to 0. The event “three times 1 and two times 0” among five tosses is
given by all points of the just defined Baire space whose first 5 coordinates
consist of three ones and two zeros. From Tornier’s point of view, an infinite
matrix whose columns list the coordinates of such “points” plays
the role of a kollektiv. With his matrices Tornier covered at the same time
events in single sample spaces Ωi (i ∈ N) when the possible entries in the ith
row corresponded to outcomes from Ωi , and finite as well as infinite product
spaces Ωi1 × Ωi2 × · · · corresponding to the entries in the rows i1 , i2 , . . . . For
the time being, the Ωi were assumed to consist of at most countably many
elements.
Notwithstanding several fundamental ideas expounded in his early contributions
on densities and kollektivs, it was only through his joint work with Feller
that Tornier reached a sufficiently elaborate theoretical basis for combining measure
and frequency in kollektivs. His pertinent results were published in a
comprehensive article in Acta Mathematica from 1933. The above-discussed
theorems (1) and (2) played a significant role in this approach to probability
theory [46], which was given in an axiomatic form and which now concerned
general measures (and not only product measures). The basic ideas of the
original proofs from [*Feller 1932a] and [*Feller 1932b] could be maintained,
however. Tornier’s 1936 book [47] provides a still more generalized exposi-
tion, where the single sample spaces Ωi can also have an uncountably infinite
cardinality.
Yet, to anybody who looks for a concise and at the same time precise
account of Tornier’s definitive theory, the summarizing survey [*Feller 1938a],
which appeared in the proceedings of the historically very important 1937
conference on probability theory in Geneva, can be recommended. In a slightly


simplified, but sufficiently general, model compared with Tornier’s, Feller only
considered the case where all sample spaces Ωi are equal to a certain Ω. Each
point in the Baire space R = Ω × Ω × · · · is a sequence $e_{j_1}, e_{j_2}, e_{j_3}, \ldots$, where
$e_{j_i} \in \{e_1, e_2, e_3, \ldots\} = \Omega$ with at most countably many $e_i$ ($1 \le i \le N \le \infty$).
Tornier’s matrices therefore corresponded to double sequences $(e_i^{(j)})$, where j
designated the column and i the row.
In a masterly manner Feller sketched the essential steps for establish-
ing a frequentist probability measure P in the Baire space R [*Feller 1938a,
pp. 16 f.]: First define P for the basic sets of R (see above) and assume that
this measure is sigma-additive within the system of basic sets. Then extend P
to the smallest sigma-algebra ζ′ generated by all basic sets. Finally adjoin all
those subsets of R which are part of a null set of ζ′. In this way a sigma-algebra
ζ is obtained which corresponds to all Lebesgue measurable sets with respect
to the sigma-additive measure P. At this place Feller did not refer to his
original method of constructing measures via clopen sets, as described above.
Instead, he apparently alluded to the measure extension theorem according to
Hahn and Kolmogorov, which probably was standard for the experts now.15
In a similar “modern” way, Feller defined the corresponding Peano–Jordan
measurable sets with respect to P not by inner and outer measure but simply
by characterizing them as the elements of the smallest set algebra ζ1 generated
by the basic sets. Feller now was ready to explain the two main assertions in
Tornier’s theory: For each matrix f with “points” of R as its columns, and for
each subset Γ of R, let Nk (Γ; f ) be the number of those of the first k columns
of f which are elements of Γ.

Theorem I: The set ϕ of all matrices f with the property
\[ P(E) = \lim_{k\to\infty} \frac{1}{k} N_k(E; f) \]
for all basic sets E ⊂ R is non-empty.

Theorem II: In order that for a certain Γ ⊂ R and for all f ∈ ϕ the respective
limits
\[ \lim_{k\to\infty} \frac{1}{k} N_k(\Gamma; f) \]
exist and are equal to P(Γ), it is necessary and sufficient that Γ ∈ ζ1 , i. e., that
Γ is Peano–Jordan measurable.
As we have seen, this had already been shown in Feller’s joint article with
Tornier [*Feller 1932b] in the special case of product measures.
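
The relative frequencies Nk (E; f )/k of Theorem I can be made concrete for the coin-tossing case. The following Python sketch is an illustration only: it fills the first k columns of a matrix with pseudo-random tosses (an assumption made here for convenience, since Tornier's theory does not itself require randomness), keeps just the first five coordinates of each column, and takes for E the event “three times 1 and two times 0” among five tosses. The names in_event, relative_frequency, columns and p_event are hypothetical.

```python
import random
from math import comb


def in_event(column):
    """Union of basic sets: exactly three 1s among the first five coordinates."""
    return sum(column[:5]) == 3


def relative_frequency(columns, event):
    """N_k(E; f) / k for the first k = len(columns) columns of the matrix f."""
    return sum(1 for col in columns if event(col)) / len(columns)


rng = random.Random(0)  # fixed seed, purely for reproducibility
k = 100_000
# Each column holds the first five coordinates of one point of {0, 1}^N.
columns = [[rng.randint(0, 1) for _ in range(5)] for _ in range(k)]

p_event = comb(5, 3) / 2**5  # product measure of the event: 10/32
freq = relative_frequency(columns, in_event)
```

Under the symmetric product measure the event has probability 10/32, and for a matrix of the kind described in Theorem I the value freq should approach this number as k grows.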

15 Only in [Feller 1939c, p. 90] was Kolmogorov’s version of the extension theorem explicitly
referred to, and also – rather vaguely – brought into connection with Banach’s name.
Possibly, Feller hinted at the so-called Hahn–Banach extension theorem for functionals in
normed vector spaces, which in fact can be used, via characteristic functions of sets, for
extending simply additive set functions defined on a set algebra consisting of subsets of a
certain set Ω to all subsets of Ω (see [31, p. 255]).


A significant part of Feller’s 1938 paper relates to Tornier’s remarks on the
possibility of infinitely many different extensions of probabilities beyond those
sets which are Peano–Jordan measurable. In [*Feller 1938a, pp. 19 f.] these
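considerations are described as follows: Let ϑ ⊂ R be an arbitrary set with

\[ \overline{P}(\vartheta) := \inf_{A \in \zeta_1,\, A \supset \vartheta} P(A), \qquad \underline{P}(\vartheta) := \sup_{A \in \zeta_1,\, A \subset \vartheta} P(A) \]

being its outer and inner Peano–Jordan measure with respect to P and the set
algebra ζ1 . Then for each matrix f ∈ ϕ we have
\[ \underline{P}(\vartheta) \le \liminf_{k\to\infty} \frac{1}{k} N_k(\vartheta; f) \le \limsup_{k\to\infty} \frac{1}{k} N_k(\vartheta; f) \le \overline{P}(\vartheta). \]

Feller now refers to a theorem of Tornier (1933, [46, p. 312]) to the effect that
for each number c with
\[ \underline{P}(\vartheta) \le c \le \overline{P}(\vartheta) \]
there exists an f ∈ ϕ such that
\[ \lim_{k\to\infty} \frac{1}{k} N_k(\vartheta; f) = c. \tag{8} \]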
As a consequence, if ϑ does not belong to ζ1 , then there exists a nonempty
system Φ ⊂ ϕ of matrices f for which (8) holds. It is easily shown that all sets
Λ ⊂ R such that
\[ p(\Lambda) := \lim_{k\to\infty} \frac{1}{k} N_k(\Lambda; f) \]
exists for all f ∈ Φ, form a set algebra ζ2 ⊃ ζ1 , where ϑ is an element of ζ2 . The
set function p(Λ) can be considered as an – additive but not sigma-additive –
extension of P from ζ1 to ζ2 . In this sense there exist infinitely many different
extensions of P from ζ1 depending on the particular choice of the adjoined set
ϑ and the choice p(ϑ) = c.
In this context Feller (p. 18) points out the common ground of Tornier’s
and Kolmogorov’s approaches. Both refer, in a first step, to a set algebra A1
endowed with a probability measure P which is sigma-additive in the sense
that for each sequence (Ai ) ⊂ A1 of mutually disjoint sets:
\[ \bigcup_{i=1}^{\infty} A_i \in A_1 \implies P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i). \]

In a second step Kolmogorov extends the probability P in a unique way to the
sigma-algebra A2 generated by A1 , whereas Tornier finds an infinite variety
of different extensions, in each case with different extended algebras, different
extended probabilities, and different underlying kollektivs. Feller (p. 20) writes
in this context with respect to Tornier’s ideas:
By identifying the calculus of probability with all possible extensions
of the function P(Γ) beyond the field ζ1 , one loses all connections
with the applications and even with the intuition. Moreover,
from the point of view of axiomatics this identification is not le-
gitimate because it has started from functions of very special sets,
and it is not possible to arrive in this way at all those set functions
which can be represented by means of frequencies.
Notwithstanding this critical statement, Feller’s discussion is only mildly
polemical towards Tornier, namely in the erroneous assertion (p. 17)
that, in Tornier’s opinion, the frequency model would not involve any sigma-additivity.
Actually, nowhere in his work did Tornier deny the restricted sigma-additivity
of P within the algebra ζ1 . On the other hand, it is surprising how
sympathetic Feller is towards Tornier’s theory on the whole, after all the hostilities
and plots Tornier had undertaken against him (see [41, pp. 23 f.]). He
is very restrained concerning his own merits; only in two footnotes (pp. 15 and
19) does he refer to the two jointly written articles [*Feller 1932a], [*Feller 1932b],
which in principle contain all essential elements of “Tornier’s” theory. And
in this friendly tenor Feller (p. 21) closes his remarks on Tornier’s frequency
theory with the words:
But if one leaves aside this last strange point [i. e., the measure extension
via adjoined sets] from the theory, one is led to a specialization
of the theory of Mr. Kolmogorov, which permits an interesting in-
terpretation of probabilities by means of frequencies.
As we have seen, Tornier was not interested in questions of irregularity,
but with his main result on the indispensability of measurability in the Peano–
Jordan sense he clarified (together with Feller!) an important prerequisite for
the proof of the consistency of von Mises’s (somewhat modified) axioms, a proof which
Abraham Wald accomplished in far-reaching generality in 1937. Yet, it
is a more than marginal detail that [Feller 1939c] also gave a consistency
proof, which was considerably simpler than Wald’s account, though nonconstructive.
With respect to the demand for irregularity, von Mises had only verbally
defined the different forms of selection of subsequences from kollektivs (see his
“Condition II” above). During the 1930s, however, it had become a common
approach to state selection rules in a formal manner with the aid of selection functions.
Additionally, von Mises’s original distinction between the outcomes
of experiments and their quantification in label spaces had been abandoned
by most authors, such that a kollektiv was simply regarded as a sequence in
the label space M (any set with at least two elements). In Wald’s exposition
[53, pp. 38 f.] any selection function of order n (n ∈ N) is a mapping in n
variables $M^n \to \{0, 1\}$. The zeroth order selection function is the constant
mapping which assigns to all elements of M either 0 or 1. Then any selection
rule corresponds to a sequence of selection functions $(f_n)_{n \in N_0}$, $f_n$ being
of order n, in the following sense: Select from the kollektiv $(m_n)_{n \in N}$ the
subsequence $(m_{n_i})$, where $n_1 < n_2 < \cdots$ are, in ascending order, the indices n with
$f_{n-1}(m_1, \ldots, m_{n-1}) = 1$. The selection of $m_{n_i}$ therefore only depends on the
index $n_i$ and the elements $m_1, \ldots, m_{n_i - 1}$. As Wald [53, pp. 71 f.] showed,
more general selection rules can lead to inconsistencies.
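
How such a selection rule acts on a sequence can be sketched in a few lines of Python. The sketch works on a finite prefix only, and the names select, identical, after_a_one and relative_frequency are hypothetical; the particular rule “select every element that follows a 1” is an illustrative choice and is not taken from Wald's paper.

```python
def select(sequence, rule):
    """Apply a Wald-style selection rule to a finite prefix of a kollektiv.

    rule(prefix) plays the role of f_{n-1}(m_1, ..., m_{n-1}): it sees only
    the elements before m_n and returns 1 to select m_n, else 0.
    """
    return [m for n, m in enumerate(sequence) if rule(sequence[:n]) == 1]


def identical(prefix):
    """The identical selection f_n = 1: keep every element."""
    return 1


def after_a_one(prefix):
    """Select an element whenever the element just before it equals 1."""
    return 1 if prefix and prefix[-1] == 1 else 0


def relative_frequency(subsequence, label):
    """Share of elements equal to label in the selected subsequence."""
    return subsequence.count(label) / len(subsequence)


alternating = [0, 1] * 50  # 0, 1, 0, 1, ...: the label 1 has frequency 1/2
```

For the alternating sequence the identical selection gives the relative frequency 1/2 for the label 1, whereas after_a_one selects only zeros, so that the frequency drops to 0. A strictly alternating sequence therefore violates the irregularity requirement with respect to this rule.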
For any label space M , any system ℳ of subsets of M , and any system
S of selection rules which also contains the “identical selection” (according to
the selection functions fn ≡ 1), Wald denoted by K(S, ℳ) any sequence (mn )
in M with the property: For each L ∈ ℳ there exists a number $W_L$ such that

\[ \lim_{n\to\infty} H_n(L, (m_{n_i})) = W_L \]

holds for each infinite subsequence $(m_{n_i})$ of (mn ) which is generated according
to any of the selection rules S. In the latter limit relation $H_n(L, (m_{n_i}))$ stands
for the relative frequency of occurrences of elements from L within the first
n members of the subsequence. $W_L$ was called the “probability of L in the
kollektiv K(S, ℳ)” by Wald.
The main result of Wald’s 1937 paper [53, p. 46] is as follows:

Given a countable set algebra K of subsets of M and a simply additive
set function μ ≥ 0 on K with μ(M ) = 1, let ℳ be the system of all subsets
of M which are Peano–Jordan measurable with respect to μ and K. If S is a
countable system of selection rules (including the identical selection), then to
each dyadic number in the interval [0, 1] a different kollektiv (mn ) ⊂ M can be
constructed such that (mn ) is K(S, ℳ) with μ(L) = $W_L$ for all L ∈ ℳ.

We note first that Wald expressed the limit behavior of subsequences (see
“Condition II”) in a considerably simplified mode compared with von Mises.
Wald [53, pp. 41 f.] proved, however, that his version was equivalent to von
Mises’s. A second important remark is that Wald’s treatment of kollektivs
was constructive with the only exception that the selection procedures could
also include non-constructive elements. In this way he gave a direct proof that
the cardinality of kollektivs was (at least) the cardinality of the continuum.
We have to note further that Wald in contrast to Tornier (and Feller) only
demanded simple additivity for the probability measure μ within the “basic”
set algebra K. He did not show the necessity of Peano–Jordan measurability,
however. He [53, p. 47] only stated that “further weakening” the assumptions
would “scarcely be of any interest”. By the remark that there existed at most
“countably many mathematical laws”, Wald (same place) justified the consid-
eration of only countably many selection rules. Without altering Wald’s results
and arguments, Church (1940, [9]) would give an exposition relating to a pre-
cise definition of “constructiveness” on the basis of the theory of computability
(recursiveness, lambda-calculus).
Prior to Wald, especially Arthur H. Copeland had contributed to a formal
elaboration of von Mises’s ideas in a series of papers between 1928 and 1937.
In his 1937 paper [12] (already submitted in 1933) he came, regarding gener-
ality, rather close to Wald’s achievement, with a basically similar approach.

60 Measure Theory and Foundations of Probability


Copeland (1928, [10]) had introduced “admissible” sequences in order to gain
models for frequentist probabilities, still without explicitly referring to von
Mises. In the most general setting, a sequence (mn ) ∈ M is called “admissible”
with respect to a probability measure μ defined on a set algebra A of subsets
of M if for any k ∈ N and for all A1 , . . . , Ak ∈ A with characteristic functions
cAi , and for all arbitrarily chosen natural numbers r1 < r2 < · · · < rk ≤ s the
limit relation
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \prod_{i=1}^{k} c_{A_i}(m_{r_i+js}) = \prod_{i=1}^{k} \mu(A_i)
\]

holds. As one can see quite easily, all sequences being in accord with Wald’s
conditions are admissible, the reverse, however, is not true. In the context
of “admissible” in its full generality also Copeland (1931, [11]) realized the
importance of restricting the considered sets to those which are Peano–Jordan
measurable. In the particular case M = {0, 1} and μ according to a Bernoulli
experiment with probability p for the outcome “1”, the limit relation above is
equivalent to
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \prod_{i=1}^{k} m_{r_i+js} = p^k, \tag{9}
\]

where k, r_i and s are as above. Since the sequence (m_n) represents, if it is
considered in the sense of the digits of a dyadic expansion, a number between
0 and 1, it is called “admissible number” whenever (9) is true. Finally, in the
case p = 1/2 the admissible numbers are “entirely normal numbers” with respect
to the number base 2 in the sense of Borel’s 1909 paper [5], where they are
studied in relation with measure theoretic questions concerning the uniform
distribution on the interval [0, 1]. Under the assumption that the digits of
each dyadic number between 0 and 1 are generated by an infinite Bernoulli
process with success probability 1/2, Borel showed that the subset of all entirely
normal numbers has, with respect to the corresponding product measure, the
probability 1.
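The limit relation (9) can be explored numerically. The following is a minimal sketch, not Copeland's construction: it evaluates the average on the left-hand side of (9) for a finite pseudo-random 0/1 sequence. The function name, the seed, and the particular choices of r, s and n are assumptions made for illustration, and a pseudo-random sequence is of course not a proven admissible number.

```python
import random

def admissibility_average(m, r, s, n):
    """Left-hand side of (9) before the limit:
    (1/n) * sum over j < n of the product over i of m[r_i + j*s],
    where the sequence m is indexed from 1 as in the text."""
    total = 0
    for j in range(n):
        prod = 1
        for ri in r:
            prod *= m[ri + j * s - 1]  # shift to 0-based list index
        total += prod
    return total / n

rng = random.Random(0)            # fixed seed, illustration only
p = 0.5
n, s, r = 50_000, 5, (1, 3, 4)    # r_1 < r_2 < r_3 <= s, so k = 3
m = [1 if rng.random() < p else 0 for _ in range(r[-1] + n * s)]

# The two printed numbers should be close: empirical average and p**k.
print(admissibility_average(m, r, s, n), p ** len(r))
```

Because r_k ≤ s, the index blocks used for different j do not overlap, which is why the average behaves like a mean of independent products.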
An essential aspect of kollektivs is the impossibility of gambling systems,
already pointed out by von Mises [51, p. 58]. To have a gambling system means
to have a selection rule which alters the relative frequency of a certain event
within the selected subsequence of trials. If, for example, while playing roulette
the relative frequency of a certain outcome would in the long run be increased
among all trials which directly follow “rouge”, then concentrating on the cor-
responding moves would be favorable for the gambler. Therefore, invariance
of relative frequency with respect to selections in the long run is equivalent to
the impossibility of gambling systems. In a mathematically rigorous manner
this was clarified by Doob (1936, [16]), as also hinted at in [Feller 1939c, p. 88].
Basically, Doob defined selection functions in just the same way as Wald. He
considered an arbitrary probability distribution on the sample space Ω = R^n,
and he took the infinite product space Ω∞ := Ω × Ω × · · · endowed with the

H. Fischer — Selected Works of W. Feller, Volume 1 61


product measure P∞ as a basis for his kollektivs. The selection functions were
assumed to be measurable, and each selection rule was presupposed to yield in-
finitely many selections with probability 1. Doob’s main theorem asserted that
the product measure P∞ was invariant with respect to selection. More specifically:
If all kollektivs (m_n) out of a measurable set Λ ⊂ Ω∞ are subjected
to a certain selection which transforms each (m_n) into a subsequence (m_{n_k}),
then the set Λ′ of all (m_{n_k})’s is measurable and the equality P∞(Λ′) = P∞(Λ)
holds.
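The invariance of relative frequencies under selection can be illustrated by a small simulation. This is only a sketch under stated assumptions: the sequence is pseudo-random rather than a kollektiv in any rigorous sense, the selection rule ("keep every trial that directly follows a 1") is one hypothetical place selection, and the function names are invented for the example.

```python
import random

def select_after_one(m):
    """Place selection: keep trial i+1 only when trial i was 1.
    The decision to select a trial uses earlier outcomes only."""
    return [m[i + 1] for i in range(len(m) - 1) if m[i] == 1]

def relative_frequency(seq):
    """Share of ones in a finite 0/1 sequence."""
    return sum(seq) / len(seq)

rng = random.Random(1)   # fixed seed, illustration only
p = 18 / 37              # chance of "rouge" in single-zero roulette (assumed setting)
m = [1 if rng.random() < p else 0 for _ in range(200_000)]
sub = select_after_one(m)

# Both frequencies should be close to p: the selection gives no advantage.
print(relative_frequency(m), relative_frequency(sub))
```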
Based on Doob’s result, a relatively simple consideration regarding kollek-
tivs is possible. Let A be a set from the countable set algebra K M introduced
above with M = R^n, let B be the sigma-algebra generated by K, and let P be
a probability measure on B. Then, on account of the strong law of large num-
bers, for all x ∈ Ω∞ with exception of a null set with respect to the product
measure P∞ the relation

\[
\lim_{n\to\infty} h_n(x, A) = P(A) \tag{10}
\]
holds, if h_n(x, A) designates the number of the first n coordinates of x with
values in A divided by n. Since the union of countably many null sets is again
a null set, the relation (10) is true for all A ∈ K and all elements x of a certain
set B ⊂ Ω∞ with P∞ (B) = 1. One can even show that (10) is true for all sets
A ⊂ M that are Peano–Jordan measurable with respect to K and P, and all
x ∈ B. If now S is a countable system of selection rules S_1, S_2, ..., each of them
satisfying Doob’s assumptions, then, by Doob’s theorem, any selection rule S_i
corresponds to a measure preserving transformation which in particular maps
B to B_i ⊂ Ω∞. In B_i the strong law of large numbers is valid in the same way
as in B, and therefore (10) holds for all x ∈ B_i′ and again all Peano–Jordan
measurable A, where B_i′ is a certain subset of B_i with P∞(B_i′) = 1. Let B_i″
be the set of all elements in B which are mapped by means of S_i to B_i′.
Then P∞(B_i″) = 1 and all x ∈ B_i″ are kollektivs with respect to the selection S_i.
We finally obtain the result that all elements of B∞ := ⋂_{i=1}^{∞} B_i″ are kollektivs
with respect to the countable (!) system S and all Peano–Jordan measurable
sets. The set B∞ necessarily contains uncountably many elements because of
P∞(B∞) = 1.
With certainty, the just explained consideration was a matter of course for
the expert, and there are hints by various authors (e. g., by Doob [16, p. 367],
Fréchet [18, p. 38], or Wald [54, p. 96]) in this direction.16 Feller, however,
by use of techniques connected with the strong law of large numbers for the
binomial process – he explicitly refers in this context to the proof by Cantelli
(1917, [7]) – in [Feller 1939c] directly shows that each selection rule “singles
out” (in Martin-Löf’s words [32, p. 31]) a set of probability zero only. Yet,
Feller goes further: He succeeds in dropping the assumption of measurability
of the single selection function as well as the assumption of sigma-additivity
for the considered probability measure. In this way he reaches a result which
is almost Wald’s. “Almost” only, because Feller tacitly infers the cardinality
of (at least) the continuum from uncountability. In contrast, Wald had given
an explicit definition of an injection from [0, 1] to Ω∞ in order to identify the
single kollektivs.

16 Fréchet at the already mentioned 1937 conference in Geneva referred to Jean Ville’s
PhD thesis, which was not entirely completed at this time. Ville (1939, [48, pp. 34–38])
would give a detailed account, but only for kollektivs corresponding to a Bernoulli process.
Apparently, Feller did not attribute much value to Wald’s (and others’)
“constructive” approaches. Neither was he willing to seriously consider Jean
Ville’s objection (brought forward in the Geneva conference by Fréchet [18,
p. 35] and discussed in detail in Ville’s PhD thesis [48, pp. 42–47]) that there
exist kollektivs which do not oscillate about their frequency limit in the usual
“irregular” manner. Such kollektivs could simply be neglected in the sense of
null sets, according to [Feller 1939c, p. 91]. Instead he gave credit to Hopf’s
measure theoretic investigations of randomness in the context of ergodic the-
ory [24]. Why was Feller so insensitive toward “deeper considerations” on
irregularity in kollektivs? A few hints can be found in the first part of his own
contribution [*Feller 1938a] to the Geneva conference (the second part of this
paper, referring to Tornier’s theory, was already described above). Regarding
those aspects of a mathematical theory which lie beyond pure mathematics,
Feller only put emphasis on “empirical” insights. “Philosophical” considera-
tions, however, were useless, from Feller’s point of view [*Feller 1938a, pp. 10–
13]. With respect to probabilistic experiences, only finite sequences of trials
are possible in reality. Therefore the mathematical theory can be confined to
the most convenient and most general approach which comprises such finite
models, and this is – at least in Feller’s opinion – measure theory according
to Kolmogorov. In this sense, Feller already in his early contributions showed
the basic attitude he maintained throughout his career: The attitude of a
pragmatic mathematician who easily changes from theory to applications and
back.

Acknowledgement. The author thanks Jürgen Elstrodt, Norbert Poschadel,
René Schilling, Reinhard Siegmund-Schultze, and Günther Wirsching for
helpful discussions and various hints.

References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.

[1] Baire, R.: Sur la représentation des fonctions discontinues, deuxième par-
tie. Acta Mathematica 32 (1909) 97–176. Reprinted in: P. Lelong (ed.):
Œuvres scientifiques, Gauthier–Villars, Paris 1992, pp. 377–456.

[2] Banach, S.: Sur un théorème de M. Vitali. Fundamenta Mathematicae
5 (1924) 130–136. Reprinted in: St. Hartman and E. Marczewski (eds.):
Œuvres, T. 1, Ed. Scientifiques de Pologne, Warszawa 1967, pp. 90–95.

[3] Bernhardt, H.: Richard von Mises und sein Beitrag zur Grundlegung
der Wahrscheinlichkeitsrechnung im 20. Jahrhundert. Dissertation (B),
Humboldt-Universität, Berlin 1984.

[4] Bienvenu, L., Shafer, G., and Shen, A.: On the history of martingales
in the study of randomness. Electronic Journal for History of Probability
and Statistics 5, n° 1 (2009).

[5] Borel, É.: Les probabilités dénombrables et leurs applications arithmétiques.
Rendiconti del Circolo Matematico di Palermo 27 (1909) 247–271.
Reprinted in: Œuvres, T. 2, Ed. du Centre National de la Recherche
Scientifique, Paris 1972, pp. 1055–1078.

[6] Bruckner, A. M.: Differentiation of integrals. The American Mathematical
Monthly 78, No. 9, Part 2 (1971) i–iii, 1–51.

[7] Cantelli, F.: Sulla probabilità come limite della frequenza. Atti della Reale
Accademia dei Lincei–Rendiconti 26 (1917) 39–45.

[8] Carathéodory, C.: Vorlesungen über reelle Funktionen. Teubner,
Leipzig–Berlin 1927 (2nd edn.).

[9] Church, A.: On the concept of a random sequence. Bulletin of the
American Mathematical Society 46 (1940) 130–135.

[10] Copeland, A.: Admissible numbers in the theory of probability. American
Journal of Mathematics 50 (1928) 535–552.

[11] Copeland, A.: Admissible numbers in the theory of geometrical probability.
American Journal of Mathematics 53 (1931) 153–162.

[12] Copeland, A.: Consistency of the conditions determining kollektivs.
Transactions of the American Mathematical Society 42 (1937) 333–357.

[13] de Guzmán, M.: Differentiation of Integrals in R^n. Springer, Lecture
Notes in Mathematics 481, Berlin–Heidelberg–New York 1975.

[14] de Guzmán, M.: The evolution of some ideas in the theory of differentiation
of integrals. In: J. A. Barroso (ed.): Aspects of Mathematics and its
Applications. Elsevier, Amsterdam 1986, pp. 377–385.

[15] de la Vallée Poussin, C.-J.: Intégrales de Lebesgue. Fonctions d’ensemble.
Classes de Baire. Gauthier–Villars, Paris 1916.

[16] Doob, J. L.: Note on probability. Annals of Mathematics, Second Series
37 (1936) 363–367.



[17] Elstrodt, J.: Maß- und Integrationstheorie, 7th edn. Springer, Berlin–
Heidelberg 2011.
[18] Fréchet, M.: Exposé et discussion de quelques recherches récentes sur les
fondements du calcul des probabilités. In: Colloque consacré à la théorie
des probabilités. Deuxième partie: Les fondements du calcul des probabi-
lités. Hermann, Actualités Scientifiques et Industrielles 735, Paris 1938,
pp. 23–55.
[19] Hahn, H.: Über die Multiplikation total-additiver Mengenfunktionen.
Annali della Scuola Normale Superiore di Pisa, Classe di Scienze, 2e serie
2 (1933) 429–452. Reprinted in: L. Schmetterer and K. Sigmund (eds.):
Collected Works Vol. 3, Springer, New York 1997, pp. 99–122.
[20] Hayes, C. A. and Pauc, C. Y.: Full individual and class differentiation
theorems in their relations to halo and Vitali properties. Canadian Journal
of Mathematics 7 (1955) 221–274.
[21] Hochkirchen, T.: Die Axiomatisierung der Wahrscheinlichkeitsrech-
nung zwischen Maß- und Häufigkeitstheorie – der Ansatz von Erhard
Tornier. Diplomarbeit, Fachbereich Mathematik, Bergische Universität
– Gesamthochschule Wuppertal 1992.
[22] Hochkirchen, T.: Wahrscheinlichkeitsrechnung im Spannungsfeld von
Maß- und Häufigkeitstheorie – Leben und Werk des “Deutschen” Mathe-
matikers Erhard Tornier (1894–1982). NTM (Neue Serie) Internationale
Zeitschrift für Geschichte und Ethik der Naturwissenschaften, Technik
und Medizin 6 (1998) 22–41.
[23] Hochkirchen, T.: Die Axiomatisierung der Wahrscheinlichkeitsrechnung
und ihre Kontexte. Vandenhoeck und Ruprecht, Göttingen 1999.
[24] Hopf, E.: On causality, statistics, and probability. Journal of Mathematics
and Physics of the Massachusetts Institute of Technology 13 (1934) 50–
102.
[25] Jordan, C.: Remarques sur les intégrales définies. Journal des
Mathématiques Pures et Appliquées (4) 8 (1892) 69–99. Reprinted in:
R. Garnier and J. Dieudonné (eds.): Œuvres de Camille Jordan T. 4,
Gauthier–Villars, Paris 1964, pp. 427–457.
[26] Khintchine (Khinchin), A.: Asymptotische Gesetze der Wahrscheinlich-
keitsrechnung. Springer, Ergebnisse der Mathematik und Ihrer Grenzge-
biete 2, Heft 4, Berlin 1933. Reprint: Chelsea, New York 1948.
[27] Kolmogoroff (Kolmogorov), A.: Grundbegriffe der Wahrscheinlichkeits-
rechnung. Springer, Ergebnisse der Mathematik und ihrer Grenzgebiete
2, Heft 3, Berlin 1933. English translation: Foundations of the Theory of
Probability, Chelsea, New York 1950.



[28] Lebesgue, H.: Leçons sur l’intégration et la recherche des fonctions prim-
itives. Gauthier–Villars, Paris 1904.
[29] Lebesgue, H.: Recherches sur la convergence des séries de Fourier. Mathe-
matische Annalen 61 (1905) 251–280. Reprinted in: Œuvres scientifiques
T. 3, L’Enseignement Mathématique, Genève 1972, pp. 181–210.
[30] Lebesgue, H.: Sur l’intégration des fonctions discontinues. Annales Sci-
entifiques de l’E.N.S., 3e série 27 (1910) 361–450. Reprinted in: Œuvres
scientifiques T. 2, L’Enseignement Mathématique, Genève 1972, pp. 185–
274.
[31] Łomnicki, Z. and Ulam, S.: Sur la théorie de la mesure dans les espaces
combinatoires et son application au calcul des probabilités. I. Variables
indépendantes. Fundamenta Mathematicae 23 (1934) 237–278. Reprinted
in: Ulam, S.: Sets, Numbers, and Universes, W.A. Beyer, J. Mycielski,
and G.-C. Rota (eds.), MIT Press, Cambridge (Mass.) 1974, pp. 79–120.
Commentary by Z. Łomnicki: pp. 679 f.
[32] Martin-Löf, P.: The literature on von Mises’ collectives revisited. Theoria
35 (1969) 12–37.
[33] Montel, P. and Rosenthal, A.: Integration und Differentiation. In: Enzy-
klopädie der mathematischen Wissenschaften, zweiter Band, dritter Teil,
zweite Hälfte. Teubner, Leipzig–Berlin 1923–1927, pp. 1031–1135.
[34] Moore, G. H.: Zermelo’s Axiom of Choice, its Origins, Development, and
Influence. Springer, Studies in the History of Mathematics and Physical
Sciences 8, New York 1982.
[35] Peano, G.: Applicazioni geometriche del calcolo infinitesimale. Bocca,
Torino 1887.
[36] Pier, J.-P.: Intégration et mesure 1900–1950. In: J.-P. Pier (ed.): Develop-
ment of Mathematics 1900–1950. Birkhäuser, Basel–Boston–Berlin 1994,
pp. 517–564.
[37] Radon, J.: Theorie und Anwendungen der absolut additiven Mengenfunk-
tionen. Sitzungsberichte der Kaiserlichen Akademie der Wissenschaften in
Wien, mathematisch-naturwissenschaftliche Klasse, Abt. IIa 122 (1913).
Reprinted in: P. Gruber et al. (eds.) Gesammelte Abhandlungen, Band 1,
Birkhäuser, Basel 1987, pp. 45–189.
[38] Saks, S.: Théorie de l’intégrale. Warszawa 1933.
[39] Saks, S.: Remarks on the differentiability of the Lebesgue indefinite inte-
gral. Fundamenta Mathematicae 22 (1934) 257–261.
[40] Saks, S.: Theory of the Integral. Hafner, New York 1937. Reprint: Dover
Publications, New York 2005.



[41] Schilling, R.L. and Woyczyński, W.A.: William Feller: A Biography. These
Selecta, 17 ff.

[42] Shafer, G. and Vovk, V.: The sources of Kolmogorov’s Grundbegriffe. Sta-
tistical Science 21 (2006) 70–98.

[43] Siegmund-Schultze, R.: Sets versus trial sequences, Hausdorff versus von
Mises: “Pure” mathematics prevails in the foundations of probability
around 1920. Historia Mathematica 37 (2010) 204–241.

[44] Thomson, B.S.: Differentiation. In: E. Pap (ed.): Handbook of Measure
Theory, Vol. 1. Elsevier, Amsterdam 2002, pp. 181–247.

[45] Tornier, E.: Wahrscheinlichkeitsrechnung und Zahlentheorie. Journal für
die reine und angewandte Mathematik 160 (1929) 177–198.

[46] Tornier, E.: Grundlagen der Wahrscheinlichkeitsrechnung. Acta
Mathematica 60 (1933) 239–380.

[47] Tornier, E.: Wahrscheinlichkeitsrechnung und allgemeine
Integrationstheorie. Teubner, Leipzig–Berlin 1936.

[48] Ville, J.: Étude critique de la notion de collectif. Gauthier–Villars,
Paris 1939.

[49] Vitali, G.: Sui gruppi di punti e sulle funzioni di variabili reali. Atti della
Reale Accademia delle Scienze di Torino 43 (1908) 229–246. Reprinted
in: Opere sull’analisi reale e complessa, Ed. Cremonese, Firenze 1984,
pp. 257–276.

[50] Volterra, V.: Sopra le funzioni che dipendono da altre funzioni. Atti della
Reale Accademia dei Lincei–Rendiconti 3 (1887) 97–105. Reprinted in:
Opere matematiche, Vol. 1, Accademia Nazionale dei Lincei, Roma 1954,
pp. 294–314.

[51] Von Mises, R.: Grundlagen der Wahrscheinlichkeitsrechnung.
Mathematische Zeitschrift 5 (1919) 52–99. Reprinted in: Ph. Frank et al. (eds.):
Selected Papers of Richard von Mises, Volume 2. American Mathematical
Society, Providence (RI) 1964, pp. 57–106.

[52] Von Plato, J.: Creating Modern Probability. Cambridge University Press,
Cambridge 1994.

[53] Wald, A.: Die Widerspruchsfreiheit des Kollektivbegriffes der
Wahrscheinlichkeitsrechnung. Ergebnisse eines mathematischen Kolloquiums, Heft 8
(1937) 38–72. Reprinted in: E. Dierker and K. Sigmund (eds.): Ergebnisse
eines mathematischen Kolloquiums, Springer, Wien–New York 1998,
pp. 418–452.



[54] Wald, A.: Die Widerspruchsfreiheit des Kollektivbegriffes. In: Colloque
consacré à la théorie des probabilités. Deuxième partie: Les fondements du
calcul des probabilités. Hermann, Actualités Scientifiques et Industrielles
735, Paris 1938, pp. 79–99. Reprinted in: Selected Papers in Statistics and
Probability, Stanford University Press, Stanford 1955, pp. 25–45.

Dr. Hans Fischer
Katholische Universität Eichstätt–Ingolstadt
Fachbereich Mathematik
Mathematisch–Geographische Fakultät
Ostenstraße 26
85072 Eichstätt, Germany
hans.fi[email protected]



Feller’s Early Work on
Limit Theorems
by Hans Fischer from Eichstätt

Feller’s papers on the central limit theorem (1935) and the weak law of large
numbers (1937), henceforth abbreviated by CLT and WLLN, prepared the
ground for his high reputation. Both papers, as well as further articles on
related topics from the same period, were written in the style of classical
analysis. Probabilistic concepts and notations were used in a very restrained
mode only, and this circumstance contributed, at a time when probability
theory did not belong to the central topics of the mathematical canon, to the
popularity of Feller’s work on limit theorems.

Historical scope
Since the early days of probability theory, the WLLN and CLT had been
crucial parts of its development.1 Jakob Bernoulli by giving explicit estimates
showed in his Ars conjectandi (1713, [6]2 ) that, expressed in modern terms,
in a Bernoulli process with success probability p the relative frequency hn of
successes among n trials obeys a limit relation of the form

\[
\lim_{n\to\infty} P(|h_n - p| > \varepsilon) = 0 \quad \forall \varepsilon > 0.
\]

1 Comprehensive historical accounts on this part of the history of stochastics, which the
reader may consult for details, are [1],2 [18], [20]. For Feller’s and Lévy’s work on the CLT
around 1935, in particular, see [29].
2 The references [Feller 19nn] and [*Feller 19nn] (the star indicating that the respective
paper is not contained in these Selecta) refer to Feller’s bibliography, while [n] points to the
list of references at the end of this essay.

© Springer International Publishing Switzerland 2015 69
R.L. Schilling et al. (eds.), Selected Papers I,

As can be seen from Bernoulli’s private papers, he possessed the essential
ideas for this theorem already since around 1690 [51, pp. 117 f.]. In 1733,
Abraham de Moivre [46] “refined” this result by showing that the probability
of k successes among n trials in a Bernoulli process approximately follows a
normal distribution if n is “large”. This achievement corresponded to the local
form of what is today called the de Moivre–Laplace theorem.
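The local approximation just described can be checked numerically. The sketch below is a modern illustration, not de Moivre's derivation: it compares the exact binomial probability of k successes in n trials with the normal density of mean np and variance np(1 − p) evaluated at k. The function names are chosen for this example only.

```python
import math

def binomial_pmf(n, k, p):
    """Exact probability of k successes among n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def local_normal_approx(n, k, p):
    """Normal density with mean n*p and variance n*p*(1-p), evaluated at k."""
    var = n * p * (1 - p)
    return math.exp(-((k - n * p) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 1000, 0.5
for k in (480, 500, 520):
    # The two columns should agree to within a fraction of a percent near the mean.
    print(k, binomial_pmf(n, k, p), local_normal_approx(n, k, p))
```

The agreement is best near k = np and deteriorates in the tails, which is one reason why later authors replaced such "approximations" by precisely stated limit relations.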
Around 1810, Laplace took an essential step in deriving normal approxima-
tions in cases which were by far more general than the binomial. He considered
sums of independent random variables (henceforth abbreviated by “r.vs”) Xk ,
which, with the only exception of two-valued variables, always were assumed
to have a common density with a compact support. His approximation in the
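sense of an integral version of the CLT corresponded to
\[
P\Bigl(\Bigl|\sum_{k=1}^{n} (X_k - EX_1)\Bigr| \le a\sqrt{n}\Bigr) \approx \frac{2}{\sigma\sqrt{2\pi}} \int_{0}^{a} e^{-\frac{x^2}{2\sigma^2}}\,dx, \qquad (\sigma^2 := \operatorname{Var}X_1).
\]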

Laplace did not elaborate a general mathematical theory of the CLT but
he derived his approximations in each particular situation again and again,
though by the same methods in each case. Laplace also began to consider
CLTs for higher-dimensional r.vs. Despite some significant problems regard-
ing details, this situation was however always viewed as a by-product of the
one-dimensional case by him and his successors. Therefore, in this survey the
multi-dimensional case is omitted. Laplace’s CLT immensely enlarged the field
of applications of probability theory, as in error theory, hypothesis testing, or
also in an early approach to a theory of risk. All these achievements were col-
lected in Laplace’s Théorie analytique des probabilités [28], the most significant
monograph of probability theory during the 19th century, whose first edition
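appeared in 1812. By considering probabilities
\[
P\Bigl(\Bigl|\sum_{k=1}^{n} (X_k - EX_1)\Bigr| \le na\Bigr) \approx \frac{2}{\sigma\sqrt{2\pi}} \int_{0}^{a\sqrt{n}} e^{-\frac{x^2}{2\sigma^2}}\,dx \approx 1
\]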

for “very large” n, Laplace also derived assertions corresponding to the weak
law of large numbers, and he interpreted them in the sense of regularities in
nature and society which appear in the long run.
Today the WLLN is often connected with Poisson’s name. This math-
ematician extended Laplace’s CLT towards non-identically distributed, uni-
formly bounded and independent variables (“quantités variables”) Xk by the
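following “approximation”:
\[
P\Biggl(\gamma \le \frac{\sum_{k=1}^{n} (X_k - EX_k)}{\sqrt{2\sum_{k=1}^{n} \operatorname{Var}X_k}} \le \gamma'\Biggr) \approx \frac{1}{\sqrt{\pi}} \int_{\gamma}^{\gamma'} e^{-u^2}\,du \qquad (n \gg 1). \tag{1}
\]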

Poisson’s “law of large numbers” has to be understood not as a mathematical
theorem in the narrow sense but as a universal law on the stability of relative
frequencies or, more generally, of arithmetic means in nature and society,
cf. [49]. Poisson tried to substantiate this law mathematically by a model of
causation, and in this context he derived from the CLT an assertion which

70 The Early Work on Limit Theorems


corresponds to the modern form of the traditional WLLN for independent
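random variables X_k:
\[
\forall \varepsilon > 0 : \quad P\Biggl(\Biggl|\frac{\sum_{k=1}^{n} (X_k - EX_k)}{n}\Biggr| > \varepsilon\Biggr) \to 0 \quad (n \to \infty). \tag{2}
\]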

Already with Laplace, the CLT and WLLN had gained, beyond their sig-
nificance in applications of probability theory, a purely mathematical quality,
due to the specific analytical methods employed, such as Fourier methods
and procedures for approximating “functions of large numbers”. This trend
was intensified in later investigations, such as Cauchy’s – even according to
modern standards rigorous – proof of a CLT [11] (under quite restrictive as-
sumptions, however), and was especially emphasized in Chebyshev’s pertinent
work. Pafnutii Lvovich Chebyshev tried to embed the CLT and WLLN into his
theory of moments, starting with an innovative proof [12] for the WLLN in the
form of (2) with the help of the now so-called Bienaymé–Chebyshev inequal-
ity for independent random variables Xk (simply “quantités” in Chebyshev’s
words):
\[
P\Biggl(\Bigl|\sum_{k=1}^{n} (X_k - EX_k)\Bigr| \le \alpha \sqrt{\sum_{k=1}^{n} \operatorname{Var}X_k}\Biggr) > 1 - \frac{1}{\alpha^2}.
\]

Bienaymé (1853, [10]) had derived an analogous inequality in the special situ-
ation of linear combinations of observational errors. In this way, the WLLN,
which up to this time had been a corollary of the CLT, reached autonomy.
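As a brief worked step (a sketch in modern notation, not Chebyshev's original wording), the inequality above yields the WLLN in the form (2) as soon as the variances are controlled. Assume, purely for illustration, that Var X_k ≤ C for all k with some constant C, and that the sum of the variances is positive. Choosing α = εn divided by the square root of the sum of the variances, and passing to the complementary event, gives

\[
P\Biggl(\Biggl|\frac{1}{n}\sum_{k=1}^{n} (X_k - EX_k)\Biggr| > \varepsilon\Biggr) \le \frac{\sum_{k=1}^{n} \operatorname{Var}X_k}{n^2 \varepsilon^2} \le \frac{C}{n \varepsilon^2} \to 0 \quad (n \to \infty).
\]

No appeal to the CLT is needed in this argument, which is the sense in which the WLLN became independent of it.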
Chebyshev’s version of the CLT (1887, [13]) is very notable already because he
replaced the somewhat vague assertions on “approximations” like (1), which
had been common up to his time, by a precisely stated limit relation. He
considered a sequence of “quantités” u_1, u_2, u_3, ... (which were only tacitly
assumed to be mutually independent) with zero means such that the absolute
values |E u_k^s| of all moments of order s ∈ N \ {1} were uniformly bounded for
all k by a constant depending on s. Then, as Chebyshev argued, the limit
relation
\[
P\Biggl(t \le \frac{\sum_{k=1}^{n} u_k}{\sqrt{2\sum_{k=1}^{n} E u_k^2}} \le t'\Biggr) \to \frac{1}{\sqrt{\pi}} \int_{t}^{t'} e^{-x^2}\,dx \quad (n \to \infty)
\]

was valid. In order to prove this, Chebyshev tried to compare the moments
of the normed sum and the normal limit distribution for n → ∞, but he did
not succeed in giving entirely sound arguments. The flaws of his proof were
eventually eliminated by his disciple Andrei Markov in 1898/99 [42], [43], after
having introduced the additional assumption $\sum_{k=1}^{n} E u_k^2 / n > \alpha > 0$ for all n.

Aleksandr Lyapunov, like Markov a member of the so-called St. Petersburg
school that had been founded by Chebyshev, published only a short time after
Markov in 1900 and 1901 the papers [40], [41], in which the assumptions for
the CLT in the case of independent random variables X_k were considerably
relaxed. Merely the existence of absolute moments $E|X_k|^{2+\delta}$ had to be supposed
for some δ > 0 according to Lyapunov’s second paper. For the proofs,
in contrast to the custom of the St. Petersburg school, Fourier methods were
successfully applied. But up to the first years of the 1920s, there was a certain
competition between Fourier methods on the one hand, and moment methods
on the other. Markov [45] was able to prove the CLT in Lyapunov’s version by
moment methods, and he also extended the CLT as well as the WLLN to sums
of dependent random variables forming Markov chains (see, e. g., [44]). The
article by Georg Pólya [50] in which the CLT obtained its definite name, was
also dedicated to the relations between moments and CLT. Finally, Fourier
methods prevailed over moment methods since Fourier transforms of distributions,
or, which is the same, characteristic functions $\mathbb{R} \ni t \mapsto E e^{itX}$ of random
variables X, do not require the existence of moments of any order. The most
essential tool in this connection was Paul Lévy’s theorem on the “continuous
correspondence” between characteristic functions and distribution functions
[30]:

Let V_n be a sequence of distribution functions and ϕ any complex function
defined in R such that
\[
\int_{-\infty}^{\infty} e^{itx}\,dV_n(x) \to \varphi(t) \quad (n \to \infty)
\]
uniformly in all bounded intervals of t. If ϕ is the characteristic function of a
random variable with distribution function V, then
\[
V_n(x) \to V(x) \quad (n \to \infty)
\]
in all x ∈ R where V is continuous.

Characteristic functions were at least implicitly used since Laplace, and
they were especially appropriate for treating sums of independent random
variables thanks to the basic property
\[
E e^{it(X_1 + X_2)} = E e^{itX_1} \, E e^{itX_2}
\]
for independent X_1, X_2.
In view of the beginning dominance of characteristic functions it was sur-
prising that Jarl Waldemar Lindeberg [38] succeeded in 1922 in proving the
CLT by a very elementary and direct method under very weak assumptions
which later even turned out to be necessary in the case of uniform “smallness”
of all summands:

If for a sequence X_1, X_2, X_3, ... of independent r.vs with zero means and finite
variances the respective distribution functions V_1, V_2, V_3, ... of these r.vs satisfy
the condition
\[
\forall \varepsilon > 0 : \quad \lim_{n\to\infty} \frac{1}{r_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon r_n} x^2 \, dV_k(x) = 0 \qquad \Bigl(r_n^2 := \sum_{k=1}^{n} \operatorname{Var}X_k\Bigr),
\]
then the limit relation
\[
P\Biggl(\frac{\sum_{k=1}^{n} X_k}{r_n} \le x\Biggr) \to \Phi(x) \quad (n \to \infty), \tag{3}
\]
Φ denoting the distribution function of the standard normal distribution, holds.
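To make the condition concrete, the following sketch evaluates the Lindeberg expression in its usual normalized form, that is, the sum of the truncated second moments over |x| > ε r_n divided by r_n². It does so for one simple hypothetical family: independent variables X_k taking the values +c_k and −c_k with probability 1/2 each. The function name and the two example sequences are assumptions made for illustration.

```python
import math

def lindeberg_sum(c, eps):
    """Normalized Lindeberg expression for X_k = +c_k or -c_k (probability 1/2 each).

    Here Var X_k = c_k**2 and r_n**2 is the sum of the c_k**2. The second
    moment of X_k restricted to |x| > eps * r_n equals c_k**2 if
    c_k > eps * r_n and 0 otherwise.
    """
    r2 = sum(ck * ck for ck in c)
    rn = math.sqrt(r2)
    return sum(ck * ck for ck in c if ck > eps * rn) / r2

eps = 0.1
for n in (10, 50, 200):
    equal = [1.0] * n                                # summands of equal size
    growing = [2.0 ** k for k in range(1, n + 1)]    # last summands dominate
    print(n, lindeberg_sum(equal, eps), lindeberg_sum(growing, eps))
```

For summands of equal size the expression drops to 0 as soon as ε r_n exceeds 1, whereas for the geometrically growing sequence it stays close to 1, since the last few summands carry almost all of the variance.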

During the first half of the 1920s another important aspect of modern limit
theorems emerged: General norming. In his research on stable limit distribu-
tions for sums of independent identically distributed r.vs, Lévy (see, e. g., [31])
considered sums of the form $\sum_{k=1}^{n} X_k / a_n$ with positive “constants” a_n. This
included normal limit distributions as a particular case, and in this way a CLT
for independent r.vs even with infinite second order moments was likewise es-
tablished. In this case, norming by means of the standard deviation rn , as
in (3), is not possible. For the time being, shift constants were not explicitly
considered by Lévy. Already in 1922, Sergei Bernshtein [7] published a note in
which the assertion of the CLT was not only extended from purely indepen-
dent to “almost independent” r.vs (including r.vs which form Markov chains),
but also to r.vs without necessarily existing second moments by considering
truncated variables (a comprehensive account [8] appeared in 1926). Instead
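of “classical” normed sums
\[
\sum_{k=1}^{n} (X_k - EX_k) \Bigg/ \sqrt{\sum_{k=1}^{n} \operatorname{Var}X_k},
\]
Bernshtein considered sums of the type
\[
\sum_{k=1}^{n} (X_k - EX_{nk}) \Bigg/ \sqrt{\sum_{k=1}^{n} \operatorname{Var}X_{nk}},
\]
where with suitably chosen positive N_n such that N_n → ∞:
\[
X_{nk} := \begin{cases} X_k & \text{if } |X_k| \le N_n \\ 0 & \text{else.} \end{cases}
\]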

The truncation technique had been introduced by Markov in his above-mentioned
proof of Lyapunov’s CLT by moment methods. Truncated variables
have moments of arbitrarily high order, and thus it is possible – at least in
principle – to apply moment methods even if r.vs without moments are given.
Later on, the truncation of r.vs would become a universal tool in almost all
investigations on limit theorems.
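The effect of truncation can be seen on simulated data. The sketch below applies the truncation rule defined above (keep the value if its absolute value does not exceed the bound, otherwise replace it by 0) to pseudo-random standard Cauchy samples, a distribution without mean or variance. The function name, the seed, and the sample size are assumptions for illustration only.

```python
import math
import random
import statistics

def truncate(x, bound):
    """Truncation as in the text: keep x if |x| <= bound, otherwise return 0."""
    return x if abs(x) <= bound else 0.0

rng = random.Random(2)  # fixed seed, illustration only
# Standard Cauchy samples via the inverse distribution function.
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(10_000)]

for bound in (10.0, 100.0):
    truncated = [truncate(x, bound) for x in sample]
    # Truncated values are bounded, so all empirical moments stay finite.
    print(bound, max(abs(x) for x in truncated), statistics.pvariance(truncated))
```

The printed variance grows with the bound, which reflects the need to let N_n tend to infinity at a suitable rate.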



Between ca. 1925 and ca. 1935 a certain pause in activities surrounding the
CLT can be observed. Instead, strong and weak laws of large numbers, and,
with growing intensity, stochastic processes were studied. In the field of the
WLLN for independent r.vs, Kolmogorov [26] reached an especially general
result already in 1928. With a little hindsight, Kolmogorov’s main theorem
implies the one in [Feller 1937b], even if Kolmogorov explicitly only dealt with
“classical” arithmetic means, see below for further details.
In 1934, Feller came to Sweden, and a close collaboration with Harald
Cramér, one of the most prominent exponents of modern mathematical statis-
tics, began. The latter had derived excellent knowledge and skills in applying
characteristic functions, mainly through his work during the 1920s on asymp-
totic expansions of the differences between actual distributions of sums of
independent r.vs and the approximating normal distributions. Therefore, it
is no surprise that Feller started his research on sums of independent random
variables by means of characteristic functions, even less so as the analytic
character of these problems was in line with his own way of thinking within
classical analysis.

The statement of Feller’s CLT, and the UAN condition
In his work on limit theorems around 1935, Feller in most cases chose “purely
analytic” [Feller 1935c, p. 521] formulations which relied rather on distribu-
tion functions than on r.vs. For a better comparability with related work by
other authors we repeat Feller’s main theorem (called “criterion” by him, see
[Feller 1935c, p. 526 f.]) in the language of r.vs:
Let X1 , X2 , . . . be a sequence of independent r.vs with the respective distribution
functions Vk . Each of these r.vs is assumed to have a zero median. We further
introduce the notation
\[
p_n(\delta) := \min\Bigl\{ r \in [0,\infty) \Bigm| \sum_{k=1}^{n} P(|X_k| > r) \le \delta \Bigr\}
\qquad (\delta > 0).
\]
In order that there exist sequences (an ) (an > 0) and (bk ) of real numbers such
that
\[
\forall \varepsilon > 0:\ \max_{1 \le k \le n} P(|X_k - b_k| > \varepsilon a_n) \to 0
\quad (n \to \infty) \tag{4}
\]
and
\[
P\Bigl( \frac{\sum_{k=1}^{n} (X_k - b_k)}{a_n} \le x \Bigr) \to \Phi(x)
\quad (n \to \infty), \tag{5}
\]
it is necessary and sufficient that
\[
\forall \delta > 0:\ \lim_{n \to \infty}
\frac{p_n^2(\delta)}{\sum_{k=1}^{n} \int_{|x| \le p_n(\delta)} x^2 \, dV_k(x)} = 0.
\]

74 The Early Work on Limit Theorems


In this theorem the statement (4), which is added to the common limit
assertion (5), is noteworthy. Insight into the significance of this restriction
was very important for Feller’s success. (4) is often called (probably following
Loève, cf. [39]) the “UAN condition”, where UAN stands for “uniformly
asymptotically negligible”. Roughly speaking, the UAN assumption means
that the influence of each individual summand on the distribution of the whole
(normed) sum is uniformly “small”. As it seems, clarifying the significance of
the UAN condition was a crucial precondition for the definitive solution of
the central limit problem. Already around 1935 Feller (and at the same time
Lévy) had become quite convinced that (roughly speaking) the distribution of
a normed sum of a very large number of independent r.vs can only be very close
to a normal distribution if the single summands either obey a UAN condition,
or are themselves normally distributed. A rigorous proof for this assertion was
finally elaborated by Lévy [34, pp. 97–101] in 1937 on the basis of a 1936 theorem
of Cramér [15]. The remarks on non-negligible summands in [Feller 1935c,
pp. 530–532] have to be understood in this context.

The priority issue concerning the CLT


After a longer break, Lévy had resumed work in probability from about 1930,
now in the field of strong laws. In this context he also derived entirely new
analytic tools, which he called “concentration” and “dispersion”. Given any
interval length l > 0, the concentration assigned to l of a r.v. X is defined as
\[
f_X(l) := \sup_{-\infty < a < \infty} P(a < X < a + l),
\]

and, given any probability γ ∈ [0, 1), the dispersion of X with respect to γ is
defined as
ϕX (γ) := inf{x ∈ (0, ∞) | fX (x) ≥ γ}.
Roughly speaking, the concentration function gives the maximum probability
belonging to any interval of a given length l, and the dispersion function is
the minimum interval length belonging to a certain probability level.3 In
this context, Lévy (1931, [32]) discovered that his new techniques could be
applied to the CLT, and he stated (without proof) the up to then most general
version of the CLT. On the sequence of independent r.vs (Xk ), Lévy imposed
a condition which was sufficient for the limit assertion (5) and already very
similar to the one he would state in the subsequent 1935 paper [33]; under the
UAN assumption it proved to be even necessary. This latter condition was:
 
\[
\forall \varepsilon > 0:\ P\Bigl( \max_{1 \le k \le n} |X_k| > \varepsilon L_n \Bigr) \to 0
\quad (n \to \infty), \tag{6}
\]
where Ln denotes the dispersion of $\sum_{k=1}^{n} X_k$ with respect to any fixed
probability level in (0, 1). Lévy already in 1931 alluded to the case of uniformly
“small” r.vs, but at this time he was apparently not yet ready to accept a
general restriction to uniform smallness.4

3 More precisely, dispersion is the generalized inverse (continuous from the left) of fX .
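
Lévy’s concentration and dispersion functions, as defined earlier in this section, can be made concrete for an empirical distribution. The following is only an illustrative sketch under stated assumptions: it treats a finite sample as the distribution of X (each point carrying mass 1/n), and the function names are hypothetical. Because the intervals (a, a + l) are open, k sample points fit into one of them exactly when their spread is strictly smaller than l.

```python
import math


def concentration(sample, length):
    """Empirical concentration: largest fraction of points in an open interval of
    the given positive length (each sample point carries mass 1/n)."""
    if length <= 0:
        raise ValueError("length must be positive")
    xs = sorted(sample)
    best, j = 0, 0
    for i in range(len(xs)):
        # Advance j while xs[i..j] still fits into an open interval of this length.
        while j < len(xs) and xs[j] - xs[i] < length:
            j += 1
        best = max(best, j - i)
    return best / len(xs)


def dispersion(sample, gamma):
    """Empirical dispersion: infimum of the lengths l with concentration(l) >= gamma,
    for a probability level gamma in [0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    k = math.ceil(gamma * n)  # number of points an interval has to capture
    if k <= 1:
        return 0.0
    # Smallest spread among all runs of k consecutive order statistics.
    return min(xs[i + k - 1] - xs[i] for i in range(n - k + 1))


if __name__ == "__main__":
    data = [0, 1, 2, 50]
    print(concentration(data, 2.5), dispersion(data, 0.75))
```

In this sketch the infimum in the dispersion is not attained, which reflects the left-continuity mentioned in the footnote on the generalized inverse.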
At the end of 1935, Lévy’s article [33] appeared. It also sketched a proof
that, for a sequence of independent r.vs, each with a zero median, under the
general precondition of UAN, expressed by Lévy through
\[
\forall \varepsilon > 0:\ \max_{1 \le k \le n} P(|X_k| > \varepsilon L_n) \to 0
\quad (n \to \infty)
\]
(Ln being defined as above), the condition (6) is equivalent to the
existence of sequences (an ) (an > 0) and (bk ) such that (5) is true. Lévy’s
assertion is equivalent to Feller’s “criterion”, and his arguments are sound (see
[18, pp. 291–296; 308–310]). Lévy [34, p. 107] claimed to have submitted
his article already in the fall of 1934, whereas Feller’s paper was submitted
considerably later, in May 1935. As far as can be established some 80 years
later, Feller’s article also appeared around the turn of the year 1935/36 and,
contrary to the statement in [Feller 1937a, p. 304, footnote 4], was by no
means issued significantly earlier than Lévy’s.5 The two accounts, Feller’s and
Lévy’s, differ so much in style, methods, and details that the question
of priority should not really have been of considerable significance.
Feller was usually very careful with references to other authors. In keeping
with this habit, at several places of [Feller 1935c], Lévy’s achievements
surrounding the “classic” CLT in Lindeberg’s version, the theorem on the
convergence of characteristic functions, and even the role of stable laws as limit
distributions, are hinted at. In [Feller 1937a, p. 304, footnote 4], Lévy’s 1935 paper
[33] is referred to “after a friendly communication by Mr. P. Lévy”.6 Yet, just
at this point the problem begins: Feller only makes a remark on a particular
result of Lévy in the context of identically distributed r.vs. As it seems, Feller
did not realize the full significance of Lévy’s 1935 paper, which was – admittedly –
written in a very condensed and idiosyncratic style. If he had done
so he would have noticed that the conditions “(7)” and “(8)” on p. 303 of
[Feller 1937a] (which were stated to be equivalent to his “criterion A” on the
same page) were almost the same as Lévy’s CLT conditions in [33, p. 386].
The only small, yet crucial, difference was that Lévy, beside Feller’s “(8)”,
put a condition corresponding to
\[
\lim_{n \to \infty}
\frac{X_n^2}{\sum_{k=1}^{n} \Bigl[ \int_{|x| < X_n} x^2 \, dV_k(x)
- \bigl( \int_{|x| < X_n} x \, dV_k(x) \bigr)^2 \Bigr]} = 0,
\]
whereas in Feller’s “(7)” the term $\bigl( \int_{|x| < X_n} x \, dV_k(x) \bigr)^2$ is missing. In fact,
Feller’s “criterion A” was erroneous in the general case, as hinted at in the
erratum to the paper [Feller 1937a].7

4 See [18, pp. 272–275] for more details.
5 Le Cam [29, p. 85] has carefully investigated the circumstances surrounding the
publication of the just mentioned articles by Lévy and Feller. He was informed by Springer,
the publisher of Mathematische Zeitschrift, that the issue containing Feller’s article was
“abgeschlossen” (which probably means “ready for the printer”) on 8 November 1935. In the
Bibliographie der Deutschen Zeitschriftenliteratur, Vol. LXXVII (July to December 1935),
[Feller 1935c] is cited on p. 733. From the same source we can learn that the part up to
p. 628 of the Mathematische Zeitschrift 40 was issued in 1935. The last page of Feller’s CLT
article in the same volume 40 has the number 559. Within this volume, the particular issue
in which Feller’s article appeared cannot be identified. The articles were apparently delivered
in different paper layers, but a more precise organization cannot be discerned. A renewed inquiry
to the Springer archive revealed that minutes on the delivery of Mathematische Zeitschrift
are no longer available there. Le Cam (loc. cit.) already reports the same.
6 Apparently, there was some correspondence regarding the CLT between Lévy and Feller.
As it seems, letters are not preserved, though. Lévy did not systematically collect documents,
and, moreover, his pre-war private papers were destroyed during WWII [3, p. 1]. Also in
Feller’s case, letters addressed to him could not be found despite considerable effort by the
editors of these Selecta.
In his 1945 survey of the state of the art of probability theory, Feller in
several places quite fairly refers to Lévy’s achievements. The pertinent section
on the CLT [Feller 1945b, pp. 818 f.] starts with a rather lengthy account on
the advantages of general norming for sums of independent r.vs and ends with
the formulation of the CLT in Feller’s version, suggesting to the reader that all
these ideas are mainly due to Feller. Only in a footnote on page 820 is Lévy’s
version (according to [34, p. 107]) of the CLT literally quoted, but without any
comment on Lévy’s very special terminology, so that the reader is scarcely
able to understand the content. Feller at this place only states that Lévy’s
theorem “in a sense, should be equivalent” to his own.8
Lévy, on the other hand, was similarly restrictive in acknowledging Feller’s
results. In a footnote which was added to the discussion of necessary and suf-
ficient conditions for the CLT in his 1937 book [34, p. 107], he maintained that
Feller had only found again (“retrouvé”) the theorem. In the second edition
of this book, Lévy [35, p. 107] was friendlier, in writing – again in a footnote –
that Feller had discovered (“découverte”) the theorem independently of him.
Finally, in his autobiography Lévy [36, p. 108] wrote “Je n’aurai jamais eu de
chance avec la loi de Gauss” (“I shall never have had any luck with the Gaussian
law”). In fact, the impact of Feller’s CLT was far greater than that of Lévy’s.
We will come back to this issue once again in the last section.

Preliminary studies on the WLLN


Not only in the field of the CLT, but also in that of the WLLN, there was rather
intensive competition around 1930, when several mathematicians tried to reach utmost
generality. There existed preliminary work, in particular by Kolmogorov and
Khinchin, which Feller had to compare with his examinations, and which was
also referred to by him [Feller 1937b, pp. 191 f.]. Kolmogorov [26] had (among
other topics and based on particular inequalities) derived a WLLN for triangular
arrays (Xnk ) with n = 1, 2, . . . and k = 1, 2, . . . , mn of rowwise independent
r.vs in the following sense:

Suppose that all Xnk have zero medians. Define the array (X′nk ) by
\[
X'_{nk} :=
\begin{cases}
X_{nk} & \text{if } |X_{nk}| \le m_n, \\
0 & \text{else.}
\end{cases}
\]
In order that there exists a sequence (dn ) of real numbers such that
\[
P\Bigl( \Bigl| \frac{\sum_{k=1}^{m_n} X_{nk}}{m_n} - d_n \Bigr| > \varepsilon \Bigr) \to 0
\quad (n \to \infty)
\]
for any ε > 0, it is necessary and sufficient that, as n → ∞,
\[
P\Bigl( \sum_{k=1}^{m_n} (X_{nk} - X'_{nk}) \ne 0 \Bigr) \to 0 \tag{7}
\]
and
\[
\frac{1}{m_n^2} \sum_{k=1}^{m_n} \operatorname{Var} X'_{nk} \to 0.
\]

Given a (one-parameter) sequence (Xk ) of independent r.vs, this theorem
for mn := n and $X_{nk} := n X_k / a_n$ implies Feller’s general version of the WLLN.9

7 The remark there that Wolfgang Doeblin had “succeeded in proving an even more useful
formulation of the criterion”, indicating at the same time a possible future publication
by Doeblin, might refer to Doeblin’s paper [16], where on p. 51 a necessary and sufficient
condition for the CLT, including a method to determine norming constants, is given.
Doeblin’s criterion is, however, rather different from Feller’s “Criterion A”.
8 Feller at this place erroneously asserts that Lévy’s CLT only refers to the case of
vanishing shift constants bn = 0.

As it seems, Feller did not realize this circumstance – this applies also to
later work, such as [Feller 1945b, p. 827] – and, admittedly, Kolmogorov had
not considered general norming in connection with his theorem.10 There-
fore, Feller’s WLLN, which was shown by entirely different methods than
Kolmogorov’s, was a really important innovation. Indeed, Feller stated that
Kolmogorov had only considered the particular case an = n and referred to
Khinchin [25] with a 1936 paper on a (specialized form of the) WLLN for
independent and identically distributed random variables under more general
norming [Feller 1937b, p. 192]. In 1929, Khinchin [23] had already shown that
for independent identically distributed r.vs Xk the existence of an expectation
μ was sufficient for
\[
\forall \varepsilon > 0:\ P\Bigl( \Bigl| \frac{\sum_{k=1}^{n} X_k}{n} - \mu \Bigr| > \varepsilon \Bigr) \to 0
\quad (n \to \infty).
\]
9 The necessity of a condition corresponding to “(1)” in [Feller 1937b] is explicitly shown
by Kolmogorov [26, p. 486]. From this condition, Kolmogorov’s condition (7) is deduced,
see [26, p. 317].
10 In [19, p. 105] a clear reference is made to Kolmogorov’s 1928 paper in the context of
the general WLLN, but only in a footnote.



He gave two different proofs, one via characteristic functions. Plessner [48]
(likewise cited by Feller) provided an extension of this result towards non-
identically distributed independent r.vs with finite moments of first order,
again using characteristic functions.
In the just mentioned 1936 paper, Khinchin proved, by means of characteristic
functions, a particular necessary and sufficient condition for
\[
\forall \varepsilon > 0:\ \lim_{n \to \infty}
P\Bigl( \Bigl| \frac{\sum_{k=1}^{n} X_k}{n f(n)} - 1 \Bigr| > \varepsilon \Bigr) = 0
\]
with an indefinitely growing positive f (n), if the Xk are independent identically
distributed nonnegative random variables with a continuous distribution
function F such that $\int_0^\infty x \, dF(x) = \infty$.
Feller was at pains to acknowledge previous efforts in connection with the
WLLN. He highly praised the methods of differential equations and characteristic
functions, which he closely associated with the “Moscow
mathematical school” [Feller 1937b, p. 192]. With “differential equations”
Feller most probably meant the investigation of distributions assigned to stochas-
tic processes by means of parabolic differential equations, as expounded in
Khinchin’s 1933 booklet [24]. But notwithstanding the excellence of some
members of the Moscow school, like Khinchin, in applying characteristic func-
tions, it was unfair to champion only the Russians in this field.

Feller and the St. Petersburg paradox


In connection with his general version of the WLLN, Feller also referred at
several places in his work to the so-called St. Petersburg paradox, beginning
with his account in the last section of [Feller 1937b]. The problem had originally
been posed in a 1713 letter by Nikolaus Bernoulli to Pierre Rémond de
Montmort. In 1728, Gabriel Cramer in a letter to N. Bernoulli slightly simplified
the problem (which had at first been on dice throwing) by considering
coin tosses: A gambler repeatedly tosses a coin. The stakeholder gives him one
écu if “head” is reached at the first toss, two écus if “head” falls only at the
second toss, four écus if “head” falls at the third toss for the first time, and,
generally, $2^{k-1}$ écus if “head” occurs at the kth toss for the first time. The
“fair” wager which the gambler has to give as an “entrance fee” is, according
to the common rule, equal to the expectation
\[
\sum_{k=1}^{\infty} \frac{2^{k-1}}{2^k} = \sum_{k=1}^{\infty} \frac{1}{2} = \infty.
\]

The game as described ends with an overwhelming probability after only a few
trials, and therefore only a modest gain seems to be attainable. Nobody would
pay a large, or even an “infinite”, amount for such a game, and this appeared to
be a paradoxical situation. A lively scientific discussion began on the concepts
and notions of probability and expectation, which lasted well into the first
decades of the 20th century.11
In 1738, an extensive essay by Daniel Bernoulli [5] was published in the
proceedings of the St. Petersburg Academy. Very soon, the problem obtained
its name from this institution. Bernoulli introduced the notion of “moral ex-
pectation” (this designation is due to Laplace [28, pp. 441–454]) which depends
on the previous fortune of the gambler. By use of this device, he was able to
calculate a finite wager for a “fair” Petersburg game – from the point of view of
the gambler. With Condorcet (1784, [14], see [22, p. 169]), a frequentist inter-
pretation of the problem was introduced: The expectation is (approximately)
equal to the mean gain among a large number of repetitions, and only in this
scope is the notion of expectation sensible. Because even a single Petersburg
game can take an indefinitely long time, repeatability is not guaranteed. A
fair wager, however, presupposes – at least in principle – that the game can
be repeated as often as one wishes.
Feller adopted the frequentist approach, but now in a quite different con-
ception. In [Feller 1937b, p. 201], Lévy’s comments [31, pp. 122–126] on Joseph
Bertrand’s discussion of the St. Petersburg game [9, pp. 61 f.] are referred to
without mentioning Bertrand’s name. Bertrand had argued that with a suf-
ficiently large number of repetitions (the general repeatability was not ques-
tioned by him), the gambler always has an advantage, regardless of the amount
he would stake before each single game. Yet, Bertrand clearly considered a
priori fixed, if arbitrarily high, stakes but not stakes which could change with
each single game. Lévy introduced the idea of modifying the stakes depending
on the number of repeated games in a disproportionally low (to the advantage
of the gambler) and disproportionally high (to the advantage of the stake-
holder) way. But Feller’s assertion that Lévy by so doing wanted to disprove
the erroneous belief that any system of stakes tending to infinity with indefi-
nitely many repeated games would be advantageous for the gambler, was based
on an exaggerated interpretation of Lévy’s account, which was – in a slightly
toned down way – repeated in [Feller 1945b, p. 826].
In [Feller 1945d] the Petersburg game was discussed again in a more sophisticated
way. Now, on p. 302 Bertrand’s remarks are correctly rendered
(again without referring to him explicitly), and designated as “trivial”. Under
the assumption that the possible gain after the kth throw is $2^k$, Feller’s general
version of the WLLN for the overall gain Sn after n repetitions yields:
\[
P\Bigl( \Bigl| \frac{S_n}{n \log_2 n} - 1 \Bigr| > \varepsilon \Bigr) \to 0
\quad (n \to \infty)
\]
for all ε > 0. Therefore, the accumulated stake for n repeated games
has to be asymptotically equivalent to $n \log_2 n$. In [*Feller 1950, pp. 199–201]
this result is explained in an especially elementary and lucid way.
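
The norming by n log2 n can be explored with a small simulation. The following is a rough sketch under stated assumptions, not Feller’s own argument: it assumes a fair coin, the gain 2^k when the first head occurs at the kth toss, and arbitrarily chosen sample sizes and seed; the function names are hypothetical. Since the convergence holds only in probability and the gains are heavy-tailed, individual runs can deviate noticeably from 1.

```python
import math
import random


def petersburg_gain(rng: random.Random) -> int:
    """Gain of one game: 2**k if the first head occurs at toss k (fair coin)."""
    k = 1
    while rng.random() < 0.5:  # tail with probability 1/2: toss again
        k += 1
    return 2 ** k


def normalized_total(n: int, seed: int = 0) -> float:
    """Return S_n / (n * log2(n)) for n independent simulated games (n >= 2)."""
    rng = random.Random(seed)
    total = sum(petersburg_gain(rng) for _ in range(n))
    return total / (n * math.log2(n))


if __name__ == "__main__":
    # Sample sizes and seed are arbitrary choices made for illustration.
    for n in (10**3, 10**4, 10**5):
        print(n, normalized_total(n))
```

Running the sketch for several seeds gives a feeling for how slowly the ratio settles, which fits Feller’s remark that the solution has a mathematically formal character.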
11 See [17] and [22] for comprehensive historical accounts. For a very detailed discussion
of the most important 18th century sources see [4, pp. 239–258].



Jorland [22, p. 158] credits Feller with having answered the “nontrivial question”
whether a fair stake is possible in the Petersburg game (both from the
point of view of the gambler and the stakeholder), by introducing variable
stakes for the single games in dependence on the overall number of games
which have happened so far. But, as Feller himself admits, his solution has a
mathematically formal character only:
Variable entrance fees are undesirable in gambling halls, but there
the Petersburg game would be impossible anyway because of lim-
ited resources [*Feller 1950, p. 200].

Conclusion
In methods and style, Feller’s articles on the CLT and the WLLN were in the
mainstream of the contemporary work on limit distributions of sums of
independent r.vs, and thus one can understand why they were so frequently
referred to in the pertinent literature. In this way Feller’s ideas significantly
influenced further contributions on weak convergence. Owing to a new focus
on stochastic processes with independent increments, which had evolved since
the late 1920s, the central limit problem could, within a few years after 1935, be
extended in a very general way towards limit problems for infinitely divisible
distributions, in particular by the work of Boris Vladimirovich Gnedenko. The
catalytic influence of Feller’s accounts is clearly visible in this work, e. g., by
the similarities of Gnedenko’s necessary and sufficient conditions for conver-
gence to infinitely divisible distributions (see [19, p. 116]) as compared with
conditions (I) and (II) in [Feller 1935c, p. 525].
On the other hand, Lévy did not gain broad acceptance for his newly
derived methods based on concentration and dispersion. Yet there was one
significant exception: Doeblin, one of Lévy’s disciples, in [16] showed how es-
sential results concerning infinitely divisible limit distributions could be treated
by Lévy’s methods as well. Doeblin died in 1940 as a soldier in World War
II,12 and, for the time being, this approach was not followed up, until in 1958
Kolmogorov [27, p. 29] called for renewed research into those “direct methods”
of probability among which he essentially counted considerations on concentration
and dispersion. And actually, to give only one example, it would turn out that
the extension of such direct methods to r.vs with values in infinite-dimensional
Banach spaces had considerable advantages over the use of “characteristic func-
tionals”, see [2].
12 More precisely, Doeblin killed himself in order to escape from being captured by the
approaching German troops, see [47].

It is an interesting detail that Feller in the second volume of his book (we
refer to the second edition [*Feller 1971]), beside a comprehensive exposition
on characteristic function methods (Chapters XV, XVI, XVII), also carefully
discusses methods based on semigroups of convolution operators in connection
with limit theorems, in particular the CLT (Chapters VIII, IX). In this way,
also an account of Trotter’s approach to the Lindeberg method for proving
the CLT is given. Le Cam [29, p. 82] rightly notes that, contrary to the
opinion expressed in [*Feller 1971, p. 256], this variant is still very close to
Lindeberg’s original arguments. In Chapter XV, on characteristic functions,
the sufficiency of the Lindeberg condition is shown again, but now also
its necessity is proven. In Feller’s book, the CLT under general norming is
deferred to the “problems for solution” section of Chapter XV (p. 530), if only
in the case of r.vs with symmetric distributions. And at this place a criterion
based on the conditions “(7)” and “(8)” of the paper [Feller 1937a] (see the
discussion above) reappears. In the particular case of symmetric distributions
this criterion (which, as we have seen, is basically due to Lévy) is sound. Feller
mentions neither his own nor Lévy’s previous work at this place.
We must not forget that the CLT in its general form could now be treated
as a particular case of a general limit theorem on convergence to an infinitely
divisible distribution – and, actually, Feller (Chapter XVII) discusses this
theorem by means of characteristic functions. Moreover, Feller’s book was not
intended to be a specialized monograph on sums of r.vs but a book with a
broad range of topics, among them, naturally, stochastic processes. For these,
the technique of semigroups was an important device, and this field was in the
centre of Feller’s activities (see Jacob’s commentary [21]). Yet, as Feller admits
in his 1971 book (p. 290) with regard to infinitely divisible distributions, “when
applicable, the methods of Fourier analysis lead to sharper results”. And in this
sense, Feller’s early achievements on limit theorems did not become obsolete,
but they were included into a more comprehensive theory to whose development
they had decisively contributed. Up to the present, the “analytic” way
as pursued by Feller by means of characteristic functions constitutes one of
the predominant “classic” contents of probability theory.
Acknowledgement. I thank René Schilling for inspiring discussions and helpful
advice, and Ross Maller for careful reading and correcting the text.

References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
[1] Adams, W. J.: The Life and Times of the Central Limit Theorem, 2nd
edn. American Mathematical Society, Providence (RI) 2009. The 1st edn.
appeared in 1974.
[2] Araujo, A. and Giné, E.: The Central Limit Theorem for Real and Banach
Valued Random Variables. Wiley, New York 1980.
[3] Barbut, M., Locker, B. and Mazliak, L.: Paul Lévy – Maurice Fréchet. 50
ans de correspondance en 107 lettres. Hermann, Paris 2004.

[4] Barth, F. and Haller, R.: Berühmte Aufgaben der Stochastik, von den
Anfängen bis heute. De Gruyter/Oldenbourg, München 2014.

[5] Bernoulli, D.: Specimen theoriae novae de mensura sortis. Commentarii
Academiae Scientiarum Imperialis Petropolitanae 5 (1738) 175–192.

[6] Bernoulli, J.: Ars conjectandi (posthumously published). Thurnisius,
Basel 1713. Reprinted in Die Werke von Jakob Bernoulli, Vol. 3,
Birkhäuser, Basel 1975.

[7] Bernstein (Bernshtein), S. N.: Sur le théorème limite du calcul des pro-
babilités. Mathematische Annalen 85 (1922) 237–241.

[8] Bernstein (Bernshtein), S. N.: Sur l’extension du théorème limite du calcul
des probabilités aux sommes des quantités dépendantes. Mathematische
Annalen 97 (1926) 1–59.

[9] Bertrand, J.: Calcul des Probabilités. Gauthier–Villars, Paris 1888, 2nd
edn. 1907.

[10] Bienaymé, I. J.: Considérations à l’appui de la découverte de Laplace sur
la loi de probabilité dans la méthode des moindres carrés. Comptes Rendus
Hebdomadaires des Séances de l’Académie des Sciences de Paris 37 (1853)
309–324.

[11] Cauchy, A. L.: Mémoire sur les résultats moyens d’un très-grand nom-
bre des observations. Comptes Rendus Hebdomadaires des Séances de
l’Académie des Sciences 37 (1853) 381–385. Reprinted in Œuvres com-
plètes (1) 12. Gauthier–Villars, Paris 1900, pp. 125–130.

[12] Chebyshev, P. L.: Des valeurs moyennes. Journal de Mathématiques Pures
et Appliquées (2) 12 (1867) 177–184. Originally published in Russian
in Matematicheskii Sbornik 2 (1867) 1–9. Reprinted in Œuvres, T. 1,
Académie Impériale des Sciences, St. Petersburg 1899, pp. 687–694.

[13] Chebyshev, P. L.: Sur deux théorèmes relatifs aux probabilités. Acta Ma-
thematica 14 (1890/91) 305–315. Originally published in Russian in Za-
piski Akademii Nauk 55 (1887). Reprinted in Œuvres, T. 2, Académie
Impériale des Sciences, St. Petersburg 1907, pp. 481–491.

[14] Condorcet, J. A. N. C.: Mémoire sur le calcul des probabilités. Histoire de
l’Académie Royale des Sciences, année MDCCLXXXI. Imprimerie Royale,
Paris 1784.

[15] Cramér, H.: Über eine Eigenschaft der normalen Verteilungsfunktion. Ma-
thematische Zeitschrift 41 (1936) 405–414. Reprinted in A. Martin-Löf
(ed.): Collected Works, Vol. 2, Springer, New York 1994, pp. 856–865.



[16] Doeblin, W.: Sur les sommes d’un grand nombre de variables aléatoires
indépendantes. Bulletin des Sciences Mathématiques 63 (1939) 23–32;
35–64.
[17] Dutka, J.: On the St. Petersburg Paradox. Archive for History of Exact
Sciences 39 (1988) 13–39.
[18] Fischer, H.: A History of the Central Limit Theorem, from Classical to
Modern Probability Theory. Springer, New York 2011.
[19] Gnedenko, B. V. and Kolmogorov, A. N.: Limit Distributions of Sums of
Independent Random Variables. 2nd English edn. Wiley, New York 1968.
[20] Hald, A.: A History of Mathematical Statistics from 1750 to 1930. Wiley,
New York 1998.
[21] Jacob, N.: Feller on Differential Operators and Semi-groups. These Se-
lecta, pp. ?? ff.
[22] Jorland, G.: The Saint Petersburg Paradox 1713–1937. In: L. Krüger et al.
(eds.): The Probabilistic Revolution, Vol. 1. MIT Press, Cambridge (MA)
1987, pp. 157–190.
[23] Khintchine (Khinchin), A.: Sur la loi des grands nombres. Comptes Ren-
dus Hebdomadaires de l’Académie des Sciences de Paris 188 (1929) 477–
479.
[24] Khintchine (Khinchin), A.: Asymptotische Gesetze der Wahrscheinlich-
keitsrechnung. Springer, Ergebnisse der Mathematik und Ihrer Grenzge-
biete 2, Heft 4, Berlin 1933.
[25] Khintchine (Khinchin), A.: Su una legge dei grandi numeri generalizzata.
Giornale dell’Istituto Italiano degli Attuari 7 (1936) 365–377.
[26] Kolmogorov, A. N.: Über die Summen durch den Zufall bestimmter unab-
hängiger Größen. Mathematische Annalen 99 (1928) 309–319; 102 (1929)
484–488. English translation in: A. N. Shiryayev (ed.): Selected Works of
A. N. Kolmogorov, Vol. II, Kluwer, Dordrecht 1992, pp. 15–31.
[27] Kolmogorov, A. N.: Sur les propriétés des fonctions de concentrations de
M. P. Lévy. Annales de l’Institut Henri Poincaré 16 (1958) 27–34. English
translation in: A. N. Shiryayev (ed.): Selected Works of A. N. Kolmogorov,
Vol. II. Kluwer, Dordrecht 1992, pp. 459–465.
[28] Laplace, P.-S. Théorie analytique des probabilités. 1st edn. 1812, 2nd edn.
1814, 3rd enlarged edn. 1820. Courcier, Paris. Reprint of the 3rd edn. in
Œuvres complètes de Laplace VII, Gauthier–Villars, Paris 1886.
[29] Le Cam, L. M.: The Central Limit Theorem around 1935. Statistical Sci-
ence 1 (1986) 78–96. Reprinted in [1, pp. 115–137].



[30] Lévy, P.: Sur la détermination des lois de probabilité par leurs fonctions
caractéristiques. Comptes Rendus Hebdomadaires de l’Académie des Sciences
de Paris 175 (1922) 854–856. Reprinted in [37, pp. 333–335].
[31] Lévy, P.: Calcul des probabilités. Gauthier–Villars, Paris 1925.
[32] Lévy, P.: Sur les séries dont les termes sont des variables éventuelles
indépendantes. Studia Mathematica 3 (1931) 119–155. Reprinted in [37,
pp. 123–159]
[33] Lévy, P.: Propriétés asymptotiques des sommes des variables aléatoires
indépendantes ou enchaînées. Journal de Mathématiques Pures et Ap-
pliquées 14 (1935) 347–402. Not reprinted in Œuvres.
[34] Lévy, P.: Théorie de l’addition des variables aléatoires. Gauthier–Villars,
Collection de monographies sur la théorie des probabilités Fasc. 1, Paris
1937, 1st edn.
[35] Lévy, P.: Théorie de l’addition des variables aléatoires. Gauthier–Villars,
Collection de monographies sur la théorie des probabilités Fasc. 1, Paris
1954, 2nd edn.
[36] Lévy, P.: Quelques aspects de la pensée d’un mathématicien. Blanchard,
Paris 1970.
[37] Lévy, P. and D. Dugué (eds.): Œuvres de Paul Lévy III. Gauthier–Villars,
Paris 1976.
[38] Lindeberg, J. W.: Eine neue Herleitung des Exponentialgesetzes in der
Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift 15 (1922) 211–
225.
[39] Loève, M.: Probability Theory, 1st edn. Van Nostrand, Princeton 1955.
[40] Lyapunov, A. M.: Sur une proposition de la théorie des probabilités. Bul-
letin de l’Académie Impériale des Sciences de St.-Pétersbourg (5) 13
(1900) 359–386. English translation in [1, pp. 151–171].
[41] Lyapunov, A. M.: Nouvelle forme du théorème sur la limite de probabilité.
Mémoires de l’Académie Impériale des Sciences de St.-Pétersbourg, VIIIe
Série, Classe Physico-Mathématique 12 (1901) 1–24. English translation
in [1, pp. 175–191].
[42] Markov, A. A.: Sur les racines de l’équation $e^{x^2} \frac{d^m e^{-x^2}}{dx^m} = 0$. Bulletin de
l’Académie Impériale des Sciences de St.-Pétersbourg (5) 9 (1898) 435–
446.
[43] Markov, A. A.: Zakon bolshikh chisel i sposob naimenshikh kvadratov.
(Izvlechenie iz pisem A. A. Markova k A. V. Vasilevu). Izvestiya fiz.-mat.
obschestva Kazan univ. (2) 8 (1899) 110–128. English translation “The
law of large numbers and the method of least squares” in: O. B. Sheynin
(ed.): Probability and Statistics, Russian Papers. NG-Verlag, Berlin 2004,
pp. 130–142.
[44] Markov, A. A.: Rasprostranenie predelnykh teorem ischisleniya veroyat-
nostei na summu velichin v tsep. Mémoires de l’Académie Impériale des
Sciences de St.-Pétersbourg, Classe physico-mathématique (8) 22 (1908)
1–29. English translation “The extension of the limit theorems of the cal-
culus of probability onto a sum of magnitudes connected into a chain”
in: O. B. Sheynin (ed.): Probability and Statistics, Russian Papers. NG-
Verlag, Berlin 2004, pp. 159–180.
[45] Markov, A. A.: O nekotorykh sluchayakh teoremy o predele veroyatnosti.
Bulletin de l’Académie Impériale des Sciences de St.-Pétersbourg (6) 2
(1908) 483–496. An amended version “Teorema o predele veroyatnosti
dlya sluchaev akademika A. M. Lyapunova” appeared as a supplement to
the 3rd edn. of Markov’s book Ischislenie veroyatnostei (1913). English
translation “The theorem on the limit of probability for the Liapunov
case” in: Nekrasov, P. A.: The Theory of Probability, O. B. Sheynin (ed.).
NG-Verlag, Berlin 2004, pp. 141–155.
[46] de Moivre, A.: Approximatio ad summam terminorum binomii $(a + b)^n$ in
seriem expansi. 7 pages offprint, 1733.
[47] Petit, M.: L’équation de Kolmogoroff: Vie et mort de Wolfgang Doeblin,
un génie dans la tourmente nazie. Gallimard, Folio 4240, Paris 2005.
[48] Plessner, A.: Über das Gesetz der großen Zahlen. Matematicheskii Sbornik
43 (1936) 165–168.
[49] Poisson, S. D.: Recherches sur la probabilité des jugements en matière cri-
minelle et en matière civile, précédés des règles générales du calcul des
probabilités. Bachelier, Paris 1837.
[50] Pólya, G.: Über den zentralen Grenzwertsatz der Wahrscheinlichkeits-
rechnung und das Momentenproblem. Mathematische Zeitschrift 8 (1920)
171–181.
[51] Schneider, I.: Die Entwicklung der Wahrscheinlichkeitstheorie von den
Anfängen bis 1933. Einführungen und Texte. Wissenschaftliche Buchge-
sellschaft, Darmstadt 1988.
Dr. Hans Fischer
Katholische Universität Eichstätt–Ingolstadt
Fachbereich Mathematik
Mathematisch–Geographische Fakultät
Ostenstraße 26
85072 Eichstätt, Germany
hans.fi[email protected]

86 The Early Work on Limit Theorems


Feller and Busemann on
Surface Theory —
Contributions to Geometry
by Erhard Scholz from Wuppertal

1 General remarks, historical context


Willy Feller left Kiel shortly after the Nazis seized power in Germany. During
the later part of 1933 and much of 1934 he lived in Copenhagen and communicated
with Harald Bohr and the geometer Tommy Bonnesen, both editors of
the Matematisk Tidsskrift of the Danish Mathematical Society. Moreover, he
met and cooperated with Werner Fenchel, Herbert Busemann and, from early
1934, Otto Neugebauer, with whom he coedited the Zentralblatt für Mathematik
(until 1938). The other three emigrants arrived from the Göttingen Institute
of Mathematics, which was particularly strongly affected by the Nazi purges
[25, 24].1 In Copenhagen they found a friendly and congenial environment.
T. Bonnesen had long been interested in convex geometry; in 1934 he and
the younger Fenchel finished a book on the topic [9]. Feller and Busemann
were also attracted by the subject. They started to explore ways of generalizing
classical differential geometric properties of curves and surfaces without
presupposing differentiability. They presented their results in their main article
in Acta Mathematica [Feller 1936b] and three shorter papers in Matematisk
Tidsskrift [*Feller 1935a], [*Feller 1935b], [*Feller 1936a]. The manuscript of
[Feller 1936b] was finished first in April 1934 and printed in July 1935, but
the publication was delayed until the appearance of the (single) issue of Acta
Mathematica, vol. 66. In the meantime the other papers had already appeared
in Matematisk Tidsskrift. Feller had already moved to Sweden (first to Stockholm
and later to Lund), which he left in 1939 for Providence, Rhode Island
(Brown University). Busemann moved to the US in 1936, first holding temporary
positions at Princeton, Baltimore, Illinois and other places before he finally
obtained a professorship at the University of Southern California, Los Angeles.

1 The references [Feller 19nn] and [*Feller 19nn] (the star indicating that the respective
paper is not contained in these Selecta) refer to Feller’s bibliography, while [n] points to the
list of references at the end of this essay.

© Springer International Publishing Switzerland 2015
R.L. Schilling et al. (eds.), Selected Papers I,
While Busemann and Fenchel continued research in convex geometry and
became respected in the slowly growing field,2 Feller’s research interests shifted
to probability.3 Nevertheless, as a kind of late extension of their common work,
Feller and Busemann published a final joint paper on regularity conditions for
functions and surfaces, slightly generalizing the conditions of convex geometry,
after the second devastating war of the 20th century [Feller 1945c].

2 Infinitesimal geometry of curves and surfaces, generalized
2.1 Background of the work and basic concepts
Feller and Busemann stated a clear goal for their joint work. They wanted to

[. . . ] develop the elements of the differential geometry of convex
surfaces without the common regularity conditions [Feller 1936b,
p. 1].4

Their main methods were geometrical rather than analytical.5 Other authors
had already started along similar lines of investigation. Among them
our authors referred, most prominently, to generalizations of tangents by G.
Bouligand in his Introduction à la géométrie infinitésimale directe (1931) [10],
to a variant of Meusnier’s theorem discussed by Hjelmslev in his Grundlag
for Fladernes Geometri (1914) [18], and to a generalized curvature definition
for curves without assuming differentiability proposed by H. Bohr’s doctoral
student B. Jessen in 1929 [19]. In 1935 the latter succeeded T. Bonnesen in
his Copenhagen chair, while Hjelmslev continued teaching geometry at Copenhagen
University for several more years. Feller and Busemann connected well
with their new local mathematical environment and moved ahead.
Their objects of inquiry were continuous curves and surfaces in $\mathbb{R}^3$, defined
as “topological images” of a closed interval or a disk respectively [*Feller 1936a,
pp. 6, 8], endowed with additional regularity conditions only during the course
of investigation. We may rephrase such a topological image as that of a homeomorphism
onto a closed subspace in $\mathbb{R}^3$ (equivalently, as the image of an
injective continuous and closed map).

2 For an outline of the history of convexity see [16].
3 See William Feller, 1906–1970, this volume.
4 All translations from the German originals by the commentator (E. Scholz).
5 That emphasis may be due to the coauthor H. Busemann; his doctoral dissertation of
1931 (supervisor R. Courant) had dealt with Über die Geometrien, in denen die “Kreise
mit unendlichem Radius” die kürzesten Linien sind.

As a prerequisite for generalizing differential geometry to these objects they
first introduced the concepts of tangent, curvature circle and radius, osculating
plane etc. for point sequences $\{p_n\}$ with limit $p_n \to p$, in analogy to the
classical approximation (or limit) procedures, although now reduced to the case
of single point sequences [Feller 1936b, p. 5 f.]. That allowed them to introduce
left and/or right (semi-)tangents, (two-sided) tangents, strong tangents (Tangente
im scharfen Sinne), and lower and upper curvatures for (continuous) curves.
Often they considered the whole collection of point sequences converging to
the same point on the curve (possibly restricted to converge from “left” or
“right” only). Of course, existence of limits is not always secured but had to
be stipulated. “Lower” and “upper” curvatures result from considering the
infimum or supremum of the respective values. A tangent is called strong (“im
strengen Sinne”) if all limits of secants $p_k p_l$ with $p_k \to p$ and $p_l \to p$ exist and
are equal; similarly for a strong tangent plane.
The existence of a strong tangent t of a curve in p is equivalent to a remarkable
regularity condition. If, after choosing local coordinates $(x, y, z)$ close
to p, with x-axis along t, the curve is given by $y = f(x)$, $z = g(x)$, then the
strong tangency condition is equivalent to differentiability of f and g almost
everywhere (a. e.) and, additionally, $f'(x_n) \to 0$, $g'(x_n) \to 0$ for all sequences
$\{x_n\}$, $x_n \to x_0$, for which the derivatives exist [Feller 1936b, p. 6]. On the
other hand, a curve which is locally $C^1$ at p may possess a curvature in p
without existence of second derivatives [Feller 1936b, p. 7]. Only for convex
curves “circumstances are particularly simple”. Here our authors could report
a result of H. Bohr’s doctoral student Jessen: For a plane convex curve (two-sided)
tangents exist a. e., and also second differentiability holds a. e. [19].

2.2 Meusnier’s theorem and Dupin’s indicatrix

For a surface given by a graph, $z = f(x, y)$,6 the existence of a strong tangent
plane at $(x_0, y_0)$ is equivalent to the following analytical regularity condition:
Existence a. e. of partial derivatives $f_x$, $f_y$, in particular at the coordinates
of the tangent point $(x_0, y_0)$, where they are even continuous; moreover, in a
neighbourhood of $(x_0, y_0)$, $f(x, y)$ satisfies, for almost all values of one variable,
Lebesgue’s condition of absolute continuity with regard to the other one.7
From this regularity assumption Busemann and Feller were able to derive
a variant of Meusnier’s theorem [Feller 1936b, p. 10 f.].

6 Such a representation exists for any surface with strong tangent plane at p not orthogonal
to the (x, y)-plane.
7 Feller wrote “totalstetig”, literally “totally continuous”, explaining it by Lebesgue’s
condition: $g'(x)$ exists a. e. and, for any subinterval $[a', b'] \subset [a, b]$, the fundamental theorem
of calculus is applicable, $\int_{a'}^{b'} g'(x)\,dx = g(b') - g(a')$ [Feller 1936b, Footnote 14].

Theorem 1 (Meusnier–Busemann–Feller). Consider a continuous surface S
in $\mathbb{R}^3$ with strong tangent plane Π at the point p and two point sequences $\{p_n\}$
and $\{p'_n\}$, both converging “differently” to p, although with the same tangent
t.8 If both sequences possess curvature radii $\rho$, $\rho'$ and osculating planes forming
angles $\theta$ and $\theta'$, respectively, to the normal at p, then

\[ \frac{\rho}{\cos\theta} = \frac{\rho'}{\cos\theta'}. \tag{1} \]

Of course, the usual Meusnier formula arises if one of the sequences has a
normal osculating plane ($\theta' = 0$). Then $\rho' = \rho_N$ and $\rho = \rho_N \cos\theta$.
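
As a numerical illustration in the classical smooth setting (not the authors' own argument), relation (1) can be checked on a paraboloid. The sketch below assumes the surface $z = x^2/(2 r_1) + y^2/(2 r_2)$ with the tangent t along the x-axis at the origin, cuts it with planes through t tilted by an angle theta against the normal, and approximates the curvature radius by the circle through three nearby points of the section. The names `section_offset` and `curvature_radius` and the values of `r1`, `r2`, `s` are hypothetical choices made for this example only.

```python
import math

def section_offset(s, theta, r1, r2):
    """In-plane height w of the section curve at abscissa s.

    Assumed surface: z = x^2/(2*r1) + y^2/(2*r2). The cutting plane contains
    the x-axis and is tilted by theta against the surface normal (the z-axis),
    so its points are (s, w*sin(theta), w*cos(theta)).
    """
    b = math.cos(theta)
    disc = b * b - (math.sin(theta) ** 2) * s * s / (r1 * r2)
    # Small root of the quadratic in w, written in a cancellation-free form.
    return (s * s / r1) / (b + math.sqrt(disc))

def curvature_radius(theta, r1, r2, s=1e-3):
    """Radius of the circle through the section points at -s, 0 and +s."""
    w = section_offset(s, theta, r1, r2)
    return (s * s + w * w) / (2.0 * w)

# Hypothetical sample values: principal curvature radii 2 and 5.
for theta in (0.0, 0.3, 0.8, 1.2):
    rho = curvature_radius(theta, 2.0, 5.0)
    print(theta, rho / math.cos(theta))
```

For every tilt angle the printed quotient $\rho/\cos\theta$ should stay close to `r1`, the curvature radius of the normal section, up to the discretization error of the three-point circle.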
Moreover, Busemann and Feller considered collections of point sequences
situated on two curves $\gamma$ and $\gamma'$, respectively, and converging only one-sidedly.
Then the lower and upper curvatures of $\gamma$ and $\gamma'$ at p satisfy the relation (1)
separately. They claimed that “an essentially equivalent” result had been derived
by Hjelmslev, although the latter had assumed continuously varying tangent
planes in a neighbourhood of p (which seems to be a stronger assumption than
required in Theorem 1) [Feller 1936b, Footnote 16].
In classical surface theory Euler’s formula, $\kappa = \kappa_1 \cos^2\theta + \kappa_2 \sin^2\theta$, relates
the curvature $\kappa$ of a general normal section to the two principal curvatures
$\kappa_1$, $\kappa_2$, where $\theta$ is the angle to the direction of the first principal curvature.
Dupin had proposed to study the curvature behaviour at a point p by plotting
the square root of the curvature radius $\rho = \kappa^{-1}$ of the normal section on the
corresponding (semi-)tangent with initial point p. Euler’s formula implies that
the resulting curve, Dupin’s indicatrix, is a conic section (possibly degenerate).
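
The step from Euler's formula to the conic can be made concrete with a small sketch. It assumes a smooth elliptic point with principal curvatures $\kappa_1, \kappa_2 > 0$ and plots the square root of the curvature radius in each tangent direction; the function names and the sample curvatures are hypothetical and serve only this illustration.

```python
import math

def indicatrix_point(kappa1, kappa2, theta):
    """Point of Dupin's indicatrix in the tangent direction theta.

    Assumes an elliptic point (kappa1, kappa2 > 0) of a smooth surface.
    """
    # Normal curvature in direction theta by Euler's formula.
    kappa = kappa1 * math.cos(theta) ** 2 + kappa2 * math.sin(theta) ** 2
    radius = math.sqrt(1.0 / kappa)  # square root of the curvature radius
    return radius * math.cos(theta), radius * math.sin(theta)

def conic_residual(kappa1, kappa2, theta):
    """Deviation of the plotted point from the conic kappa1*x^2 + kappa2*y^2 = 1."""
    x, y = indicatrix_point(kappa1, kappa2, theta)
    return kappa1 * x * x + kappa2 * y * y - 1.0

# Hypothetical sample curvatures 2 and 0.5, eight directions.
for step in range(8):
    print(step, conic_residual(2.0, 0.5, step * math.pi / 4))
```

The residual vanishes up to rounding in every direction, so under these assumptions the plotted points lie on the ellipse with semi-axes $\kappa_1^{-1/2}$ and $\kappa_2^{-1/2}$.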
Normal half-sections (i. e. sections of the surface with a normal half-plane)
at the point p of a continuous surface will lead to curves which generally
admit, at best, lower and upper curvatures, even in the case of a strong tangent
plane at p. Thus, generally speaking, two indicatrices, a lower and an
upper one, can be expected. Further constraints on the surface seemed to be
advisable for the study of these curves. At this point our authors restricted
their investigation to convex surfaces [Feller 1936b, p. 14].

3 Convex surfaces
3.1 Indicatrix, Euler’s theorem and umbilic points
Feller and Busemann showed that for a convex surface S the points in which
S has no tangent plane are exceptional in the sense that they form a null set
$\Lambda_0$ [Feller 1936b, p. 24].9 For all points q outside another, larger null set
$\Lambda_1 \supset \Lambda_0$, all plane sections through q have finite curvatures (with identical lower
and upper curvatures). Since any tangent plane of a convex
surface at p can be shown to be strong,10 they could exploit their generalized Meusnier theorem
to find that all normal sections at p have finite curvature. Thus a single
indicatrix exists at such a point; it is a convex curve and point-symmetric
with respect to p [Feller 1936b, p. 24]. In a longer analytical investigation
they were even able to show that, with the exception of another null set,
the indicatrix at the remaining points forms an ellipse (perhaps degenerate)
[Feller 1936b, pp. 25–29]. They arrived at [Feller 1936b, p. 30]

Theorem 2. For a convex surface S embedded in $\mathbb{R}^3$ there are three exceptional
sets of measure zero,

\[ \Lambda_0 \subset \Lambda_1 \subset \Lambda_2, \]

such that all points of $S \setminus \Lambda_0$ have a (strong) tangent plane; all points of $S \setminus \Lambda_1$
have a (convex point-symmetric) indicatrix, and at all points p of $S \setminus \Lambda_2$ the
indicatrix is an ellipse (possibly degenerate). Thus Euler’s normal curvature
theorem holds at points $p \in S \setminus \Lambda_2$.

8 More precisely, the angles between the straight lines $p_n p'_n$ and t are $\ge \varepsilon$ for some
$\varepsilon > 0$ (they are “bounded from below” in Feller’s and Busemann’s language).
9 Null sets may be considered with regard to Carathéodory’s two-dimensional measure
in $\mathbb{R}^3$, introduced in 1914. Our authors did not mention this paper (nor Hausdorff’s generalization
from 1919), so they may well have referred to Lebesgue’s surface measure introduced
for surfaces $S \subset \mathbb{R}^3$ parametrized by bijective continuous maps $f : D \to S$ over domains
$D \subset \mathbb{R}^2$ which are bounded by curves C of (Lebesgue) measure zero (“courbes quarrables”)
[21, pp. 246, 301–309]. At least they cited Lebesgue’s paper on another occasion (cf. fn. 14).
Carathéodory’s p-dimensional measure of point sets in $\mathbb{R}^n$ was a generalization of Lebesgue’s.

The existence of tangent planes or even of indicatrices has remarkable consequences
for the regularity of convex surfaces. Busemann and Feller gave an
analytical interpretation of the exceptional sets after presenting their theorem:

Corollary 3. Outside the exceptional sets of measure zero of a convex surface
S as in Theorem 2, the following regularity conditions hold:

a) At all points of $S \setminus \Lambda_0$ the surface has partial derivatives in every direction.

b) At all points of $S \setminus \Lambda_1$ the second partial derivative in every direction exists.

c) At all points of $S \setminus \Lambda_2$ the surface satisfies a generalized second differentiability
condition, described in terms of “differentiability with regard to
similar rectangles” [Feller 1936b, p. 30].

In [11, p. 24] Busemann rephrased condition c) as $f(x) = \frac{1}{2}\sum_{i,j} x_i x_j a_{ij} + o(|x|^2)$,
$a_{ij} = a_{ji}$, and showed that it could be translated into ordinary twice
differentiability. This reformulation was due to A. D. Alexandrov [1].11
For the proof of the classical theorem that a (connected) surface S consisting
only of umbilic points is contained in a 2-sphere or in a plane, one
usually assumes $C^3$-regularity of S.12 This no longer seems natural in
the present context. In the light of Theorem 2, the umbilic condition makes
sense for points p of $S \setminus \Lambda_2$ only; then (twice) differentiability at p follows, but
not at every point. In this more general setting our authors showed that a
closed connected differentiable surface exists in which almost all points are
umbilic with the same curvature but which, in spite of that, is not
even locally contained in a sphere (or a plane) [Feller 1936b, pp. 34 f.].

9 (continued) Carathéodory’s measure was again generalized a few years later (1919) by
Hausdorff; see [17] and the commentary by S. D. Chatterji in the Gesammelte Werke edition.
10 [Feller 1936b, p. 9]
11 See Footnote 18.

Accordingly, the question arises which regularity conditions allow one
to conclude sphericity from the knowledge that umbilic points exist almost
everywhere. Feller and Busemann found that, for a differentiable surface given
as the graph of a function $f(x, y)$, absolute continuity of the partial derivatives
$f_x(x, y)$, $f_y(x, y)$ in both variables suffices [Feller 1936b, p. 32].
Then they turned their attention to the other end of the chain of exceptional
sets in Theorem 2. There, on a dense set of directions, normal sections give
continuous curves with different lower and upper curvatures at each side. By
plotting their square roots along the corresponding half-tangent, Busemann
and Feller introduced two indicatrices, a lower one, i, and an upper one, I. In
the second of their papers for the Matematisk Tidsskrift they studied the four
curvature values at such exceptional points (two at “each side” of a tangent
direction) with the goal of giving a “survey of all possible distributions for these
four quantities in a single point of a convex surface” [*Feller 1935b, p. 87].
They classified the arising types and gave examples showing that their classification
was complete, in the sense that for any of their classification types a
surface could be given.
If two curves are given by polar coordinates $(r, \varphi)$ (with respect to the pole p),

\[ i : r = g(\varphi), \qquad I : r = G(\varphi), \]

the following cases, according to the vanishing properties of g, have to be
distinguished:
(i) $g(\varphi)$ does not vanish identically. Then (i, I) can arise as indicatrices of a
convex surface if and only if the region $A_g := \{0 \le r \le g(\varphi)\}$ is convex and
for each point q of the region $A_G := \{0 \le r \le G(\varphi)\}$ the convex hull of q
and $A_g$ is contained in $A_G$. If $g(\varphi)$ vanishes in a subinterval of $[0, 2\pi]$, it
may be discontinuous at two points $\varphi_1$, $\varphi_2$; further conditions for point
distributions on the lower indicatrix can be given [*Feller 1935b, p. 89].13
(ii) If $g(\varphi)$ vanishes identically, i. e. the lower indicatrix reduces to p, I arises
as upper indicatrix if and only if $G(\varphi)$ is the limit of a monotonically
decreasing sequence of lower semi-continuous functions [*Feller 1935b,
pp. 89 f., pp. 110 f.].
Thus, Feller and Busemann found a quite rich collection of properties with
regard to the generalized indicatrix of convex surfaces. Of course there were
other topics of classical surface theory that deserved attention in the more
general context.

12 A point p of a differentiable surface is called umbilic if and only if all normal curvatures
at p are equal, i. e. the indicatrix is a circle (perhaps at infinity).
13 For a short résumé see Fenchel’s review [13].

3.2 Shortest arcs


Already at the beginning of the century Lebesgue had shown that any two
points of a convex surface can be connected by a shortest arc.14 Feller and
Busemann showed that in all points p of $S \setminus \Lambda_0$ such a shortest arc $\gamma$ is differentiable
and, even though the normal sections and thus $\gamma$ may not have a
well-defined curvature (for $p \in \Lambda_1 \setminus \Lambda_0$), the projection of $\gamma$ onto the tangent
plane does, and even has vanishing curvature. In this way the classical property
of vanishing geodesic curvature of $\gamma$ translates to the more general setting
as described in [Feller 1936b, pp. 40 f.].
On the other hand, things look more surprising from the viewpoint of
tangential directions. Although a shortest arc through a point $p \in S \setminus \Lambda_0$
has a tangent at p [*Feller 1936a, p. 46 f.], the argument cannot be reversed:
There may exist tangents which do not arise as directions of a shortest arc
in the surface, even for an everywhere differentiable surface ($\Lambda_0 = \emptyset$). Feller
and Busemann announced this result as an addendum to [Feller 1936b, p. 47]
and discussed the problem more extensively in their first note in Matematisk
Tidsskrift [*Feller 1935a].
The existence of geodesic lines in a given direction t emanating from p
may become problematic if the upper normal curvature in that direction is
unbounded. Feller and Busemann constructed an example where such an
effect appears for the limit of a sequence of surfaces starting from the sphere,
erecting tangent cones and smoothing out the cusp [*Feller 1935a, pp. 26 f.].
In this case there exist points q in any neighbourhood of p such that the
shortest connection between p and q is not unique. Thus, the problem is
not due to insufficient differentiability assumptions for the surface but to the
non-uniqueness of shortest arcs between arbitrarily close points. On the other
hand, a criterion for the existence of shortest arcs in every tangent direction
could also be given [*Feller 1935a, pp. 25, 27 f.]:

Theorem 4. Let S be a convex differentiable surface in $\mathbb{R}^3$. If all upper (one-sided)
normal curvatures are bounded by $R^{-1}$ for some $R > 0$, all tangential
directions of S are tangents of a shortest arc in S. If p, q are two points of S
with more than one shortest connection, their length is $\ge \pi R$.

14 In fact, Lebesgue showed that any two points on a connected continuous surface lying
in a bounded domain of $\mathbb{R}^n$ can be connected by a shortest arc [21, pp. 345 f.].

3.3 Gaussian curvature

In his Vorlesungen über Differentialgeometrie Blaschke had characterized the
Gaussian curvature of a differentiable convex surface (“Eifläche”) S embedded
in 3-space in a new way [8, p. 83]. He considered a cap of the surface lying
between the tangent plane Π at a point p and a parallel plane at
distance h. Let $|\Omega_h|$ be the surface area of the cap; Blaschke showed that the
Gaussian curvature $\kappa(p)$ at the point p can be derived from the area by15

\[ \kappa(p) = \lim_{h \to 0} \left( \frac{2\pi h}{|\Omega_h|} \right)^2. \]
Busemann and Feller took this as a starting point for generalizing the concept
of Gaussian curvature to any (not necessarily everywhere differentiable) convex
surface. At a point p with tangent plane and well-defined, bounded indicatrix
they found that $\lim_{h \to 0} |\Omega_h|/h$ exists, and

\[ \lim_{h \to 0} \frac{|\Omega_h|}{h} = \int_0^{2\pi} \rho(\varphi)\, d\varphi, \tag{2} \]

for normal curvature radii $\rho(\varphi)$ at the point p (taken in a half-plane) with
direction $\varphi$. For an elliptic indicatrix with semi-axes $\rho_1$, $\rho_2$ the integral becomes
$\int_0^{2\pi} \rho\, d\varphi = 2\pi\rho_1\rho_2$, and the classical formula is recovered [Feller 1936b, p. 22 f.].
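
Blaschke's characterization can be tried out numerically in the smooth case. The following sketch assumes that the formula reads $\kappa(p) = \lim_{h \to 0} (2\pi h/|\Omega_h|)^2$ and takes as a hypothetical test surface the paraboloid $z = (\kappa_1 x^2 + \kappa_2 y^2)/2$, whose Gaussian curvature at the origin is $\kappa_1\kappa_2$; the function names, the midpoint rule and the grid size `n` are choices made for this example only.

```python
import math

def cap_area(kappa1, kappa2, h, n=200):
    """Surface area of the cap {z <= h} of z = (kappa1*x^2 + kappa2*y^2)/2.

    Substitution: x = sqrt(2h/kappa1)*u*cos(phi), y = sqrt(2h/kappa2)*u*sin(phi),
    0 <= u <= 1, which maps the unit disk onto the projection of the cap.
    """
    jacobian_scale = 2.0 * h / math.sqrt(kappa1 * kappa2)
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        for j in range(n):
            phi = 2.0 * math.pi * (j + 0.5) / n
            k_dir = kappa1 * math.cos(phi) ** 2 + kappa2 * math.sin(phi) ** 2
            # Area element sqrt(1 + |grad z|^2), with |grad z|^2 = 2*h*u^2*k_dir.
            total += u * math.sqrt(1.0 + 2.0 * h * u * u * k_dir)
    return jacobian_scale * total * (1.0 / n) * (2.0 * math.pi / n)

def blaschke_curvature(kappa1, kappa2, h):
    """Blaschke's quotient (2*pi*h/|Omega_h|)^2 at cap height h."""
    return (2.0 * math.pi * h / cap_area(kappa1, kappa2, h)) ** 2

# Hypothetical sample: principal curvatures 2 and 0.5, shrinking cap heights.
for h in (1e-2, 1e-3, 1e-4):
    print(h, blaschke_curvature(2.0, 0.5, h))
```

As h decreases, the printed quotient should approach the product of the two principal curvatures, in line with the classical value.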
In their third article for Matematisk Tidsskrift the two authors generalized
their result from early 1934 (published in 1936, as we know). They went back
to Gauss’ original idea and considered the “spherical images” $\Omega^*$ of subsets $\Omega$
of a convex surface S (embedded in $\mathbb{R}^3$). The spherical image $p^*$ of a point p
was defined as the totality of all points q on $S^2$ with direction normal to any
supporting plane of S at p.16 Now the question could be posed as:
Under which conditions does the generalized Gaussian curvature

\[ \kappa(p) := \lim_{h \to 0} \frac{|\Omega_h^*|}{|\Omega_h|} \]

exist, where $|\cdot|$ denotes Lebesgue measure?

Studying the limits of the Blaschke quotients $|\Omega_h|/h$ and $|\Omega_h^*|/h$ on the surface
and the spherical image separately, the authors found:
Lemma 5. Let p be a point of a convex surface S with indicatrix i given in
polar coordinates by $r = g(\varphi)$; let $|\Omega_h|$ and $|\Omega_h^*|$ be as above.

a) $\lim_{h \to 0} \frac{|\Omega_h|}{h}$ exists and equals $\int_0^{2\pi} g^2(\varphi)\, d\varphi$.

b) If i is bounded, $\lim_{h \to 0} \frac{|\Omega_h^*|}{h}$ exists.
If $p \in i$, the limit is 0. If $p \notin i$, it is $\int_0^{2\pi} \frac{g^2(\varphi) - g'^2(\varphi)}{g^4(\varphi)}\, d\varphi$.

c) If i is unbounded, $\lim_{h \to 0} \frac{|\Omega_h^*|}{h}$ need not exist.

15 Blaschke had shown a similar formula for the limit of the three-dimensional volume of
the cap in [7].
16 A plane Π through p is a supporting plane if the surface S is contained in one of the
half-spaces with regard to Π. Thus “spherical imaging” is no longer a point-map if S has
no tangent plane at p.

An immediate consequence is [*Feller 1936a, p. 43]
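Corollary 6. Under the same assumptions as in Lemma 5, the generalized
Gaussian curvature exists for bounded indicatrix i. It is

\[ \kappa(p) = \begin{cases} \dfrac{\int_0^{2\pi} \frac{g^2(\varphi) - g'^2(\varphi)}{g^4(\varphi)}\, d\varphi}{\int_0^{2\pi} g^2(\varphi)\, d\varphi}, & \text{for } p \notin i, \\[2ex] 0, & \text{for } p \in i. \end{cases} \]

Again, for i an ellipse with $g(\varphi) = \left( \frac{\cos^2\varphi}{a^2} + \frac{\sin^2\varphi}{b^2} \right)^{-1/2}$, the classical value
arises, $\kappa = (ab)^{-2}$.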
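
The elliptic case can be checked numerically. The sketch below assumes that the integrand in the numerator of Corollary 6 reads $(g^2 - g'^2)/g^4$, with $g'$ the derivative of $g$ with respect to $\varphi$, and evaluates both integrals with a midpoint rule; the helper names, the sample semi-axes and the number of nodes are hypothetical choices for this illustration.

```python
import math

def ellipse_indicatrix(a, b):
    """Polar radius g and its derivative dg for an ellipse with semi-axes a, b."""
    def k(phi):
        return math.cos(phi) ** 2 / a ** 2 + math.sin(phi) ** 2 / b ** 2

    def g(phi):
        return k(phi) ** -0.5

    def dg(phi):
        dk = 2.0 * math.sin(phi) * math.cos(phi) * (1.0 / b ** 2 - 1.0 / a ** 2)
        return -0.5 * k(phi) ** -1.5 * dk

    return g, dg

def generalized_gauss_curvature(g, dg, n=2000):
    """Quotient of the two integrals over [0, 2*pi] (midpoint rule)."""
    step = 2.0 * math.pi / n
    numerator = 0.0
    denominator = 0.0
    for j in range(n):
        phi = (j + 0.5) * step
        r = g(phi)
        numerator += (r * r - dg(phi) ** 2) / r ** 4
        denominator += r * r
    # The common factor `step` cancels in the quotient.
    return numerator / denominator

g, dg = ellipse_indicatrix(1.5, 1.0)
print(generalized_gauss_curvature(g, dg), (1.5 * 1.0) ** -2)
```

Under the stated reading of the integrand, the two printed numbers should agree to high accuracy, since the midpoint rule converges very quickly for smooth periodic integrands.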
Feller and Busemann went further: they analyzed conditions under which,
even in case c) of Lemma 5, the generalized Gaussian curvature exists. They
found a condition under which in this case the generalized Gaussian curvature
(exists and) even vanishes [*Feller 1936a, pp. 61 f.].17
Finally, for the total curvature of the surface they could use the null condition
for the exceptional set $\Lambda_2$ of Theorem 2. If the spherical image correspondence
$\Omega \to \Omega^*$ maps null sets onto null sets, the total curvature can be
derived from integrating over the points of $S \setminus \Lambda_2$ (the “normal” points in the
language of the authors),

\[ |\Omega^*| = \int_{\Omega \setminus \Lambda_2} (\rho_1(p)\rho_2(p))^{-1}\, dp, \]

with $\rho_i$ denoting the principal axes of the elliptic indicatrix i at p [*Feller 1936a,
p. 45].

3.4 Beyond convexity


A decade after Feller and Busemann had worked on surface theory in Copenhagen,
they came back to the topic in another joint paper [Feller 1945c]. At
the time this paper was written, Feller worked at Brown University and Busemann
at Illinois. The two mathematicians explored the possibility of finding
regularity properties similar to those for convex surfaces, but with wider
applications.
Regularity of convex curves had been analyzed by Jessen [19]; in their
own work they had established regularity properties of convex surfaces embedded
in three-space (Corollary 3). Their result was made more explicit and
extended by Alexandr D. Alexandrov from the Steklov Institute at Leningrad.
In 1939 Alexandrov showed that the slightly involved regularity condition c) of
Feller–Busemann’s Corollary 3 implies ordinary twice differentiability almost
everywhere. Moreover, he generalized the result (twice differentiability a. e.)
to convex hypersurfaces in $\mathbb{R}^n$ [1].18 Busemann and Feller now looked for more
general geometrical properties which would allow one to conclude similar regularity
results. Hjelmslev and Bouligand had started to consider finiteness of order,
or of class, of a surface, where a surface S was called of order n if any straight
line intersects S in at most n points, resp. of class m by a corresponding
tangency condition. Both attempts turned out to be unsatisfactory.19 Feller
and Busemann found a more appropriate characterization by curvature conditions
of plane sections, which allowed them to derive the validity of the generalized
theorems of Euler and Meusnier a. e. Then, as in the case of convex surfaces,
twice differentiability in the regular points followed.

17 See the résumé in [14].

In order to achieve the intended generality they first studied generalized
curvature concepts for curves, called “total” and “strong” curvature; the latter
was necessary for the formulation of the main result. Moreover they had
to take into account generalized tangency conditions, distinguishing “paratangent”
and “paratingens”, which had already played a role in their joint 1934/35
studies.
Consider a parametrized continuous curve $\gamma(t)$ which is locally a Jordan
arc. Then an (oriented) straight line t was called paratangent of $\gamma$ at a point
$p = \gamma(t_0)$, if for two parameter sequences $t_k \to t_0$, $t'_k \to t_0$ with $t_k < t'_k$, the
oriented straight lines through $\gamma(t_k)$, $\gamma(t'_k)$ converge to t. The paratingens
at p (a concept due to Bouligand [10]) was then the collection of all paratangents
at the point. “Strong” curvature of a plane rectifiable curve arose from
considering the collection of all angles $\Phi(s)$ formed by paratangents with the
(oriented) x-axis with regard to arc length parametrization s. The authors
showed that $\Phi'(s)$ is well defined a. e.; thus strong curvature of a (rectifiable
plane) curve could be defined a. e. by $|\Phi'(s)|$. These concepts are sufficient to
express the statement of the main result of the paper.
For a sufficiently large set of plane sections through points p of a continuous
surface S embedded in $\mathbb{R}^3$, the existence of (strong) curvature and the validity
of the theorems of Euler or Meusnier can be proven a. e., if the following conditions
hold for a surface S parametrized by $p(u, v) = (x(u, v), y(u, v), z(u, v))$,
$(u, v) \in \mathbb{R}^2$ [Feller 1945c, p. 589]:
(i) The map $p(u, v) \mapsto (u, v)$ is a homeomorphism in a square neighbourhood
of any $(u_0, v_0)$ with side $2\delta$, i. e. for $|u - u_0| < \delta$, $|v - v_0| < \delta$.
(ii) For every $(u_0, v_0)$ there is at least one straight line l not contained in the
paratingens of S at $(u_0, v_0)$.
(iii) There exists some $\delta' \le \delta$ such that planes parallel to l intersect the square
neighbourhood with side $2\delta'$ in curves of uniformly bounded (strong)
curvature.
From the curvature characterization twice differentiability a. e. could be concluded.
This result rounded off the joint work of Feller and Busemann. It was
their last joint publication.

18 See [11, p. 24]. Alexandrov himself modestly attributed twice differentiability of convex
surfaces a. e. to Busemann–Feller [5, p. 387].
19 Finiteness of order seemed to our authors to be “too weak”, finiteness of class to be
“more restrictive than necessary” [Feller 1945c, p. 583].

4 Reception
Busemann’s and Feller’s 1934–36 publications received detailed reports in
Zentralblatt für Mathematik and Jahrbuch für Fortschritte der Mathematik
[12, 13, 14]. Their analytical techniques allowed them to establish properties which
proved fruitful, independent of the original goals for which they had been developed,
and became part of the tradition of convex geometry. At the end of his
report S. Cohn-Vossen remarked:

In this work diverse inequalities and limits arise which, according
to the opinion of the reviewer, deserve interest in themselves [12].

He was right. One very widely used auxiliary result turned out to be a lemma
of the Acta Mathematica article.20
Lemma 7 (Busemann–Feller). If a curve $\gamma$ outside a closed convex surface S
in $\mathbb{R}^3$ has finite length $l(\gamma)$, then its “nearest point” projection to S is a curve
$\nu \circ \gamma$ with length
$l(\nu \circ \gamma) \le l(\gamma)$ [Feller 1936b, p. 41].
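
A simple numerical illustration of the lemma, under strong simplifying assumptions, takes the unit sphere as the convex surface, where the nearest-point projection of an outside point x is x/|x|. The test curve, its sampling and all function names below are hypothetical; curve lengths are approximated by the lengths of inscribed polylines.

```python
import math

def project_to_unit_sphere(point):
    """Nearest point of the unit sphere for a point outside the unit ball."""
    norm = math.sqrt(sum(c * c for c in point))
    return tuple(c / norm for c in point)

def polyline_length(points):
    """Length of the polyline through the given points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def sample_curve(n=400):
    """Hypothetical test curve: a wobbling spiral staying outside the unit ball."""
    points = []
    for i in range(n + 1):
        t = 4.0 * math.pi * i / n
        radius = 1.5 + 0.4 * math.sin(3.0 * t)  # always at least 1.1
        points.append((radius * math.cos(t), radius * math.sin(t), 0.2 * t))
    return points

curve = sample_curve()
projected = [project_to_unit_sphere(p) for p in curve]
print(polyline_length(projected), polyline_length(curve))
```

The first printed length should not exceed the second: the nearest-point map onto a convex body does not increase distances between points, so every chord of the polyline is shortened or preserved.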

The most widely cited result emerging from their curvature studies seems
to be twice differentiability of convex surfaces a. e. (Corollary 3 c)), and its
background in the Euler theorem for normal curvatures.21
In Soviet Russia, Alexandr D. Alexandrov and some of his students worked
on topics related to those of Busemann–Feller. They took up the results of the
two Copenhagen emigrants; Alexandrov’s 1939 generalization of the regularity
properties of convex surfaces and hypersurfaces has already been mentioned
[1]. His student I. M. Liberman obtained diverse results on shortest arcs on
convex surfaces in spaces of dimension n, refining and extending Busemann
and Feller’s existence result of tangents to shortest arcs at points p at which a
tangent plane exists (see Section 3.2) [22]. Liberman, like other gifted students
of A. D. Alexandrov, died fighting against Nazi Germany’s invasion of the Soviet
Union. His most important findings on shortest arcs were integrated by
Alexandrov in his magisterial work on Intrinsic Geometry of Convex Surfaces
[5, pp. xii, 31, 158 etc.]. In this book Feller and Busemann’s joint papers
were also quoted on several occasions.22

20 See, e. g., the study of the “upper bound conjecture” [23, p. 35].
21 See, e. g., [6, 15].

The broader field of intrinsic geometry of convex surfaces seems to have
flourished more vigorously in the school of A. D. Alexandrov than in the West.
Although Busemann later built on his work with Feller, their common research
appears as a starting point for his further works [11] rather than a recurring
reference, as in Alexandrov’s book. Outside the Soviet Union the generalization
of results from classical differential geometry seems not to have attracted
as much attention among convex geometers as in Russia.
In the recent Handbook of Convex Geometry there are only two references to
Feller’s joint work with Busemann.23 In the extended historical survey by one
of the main editors it is not mentioned at all [16]. Cohn-Vossen’s observation
on the importance of the more technical auxiliary results of the papers, in
particular Lemma 7, seems to have been more realistic in the long run than
its author may have wished. But, of course, I cannot exclude the possibility
that this impression is due to an insufficient insight into the impressively broad
tradition of convex geometry and its generalizations.

References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.

[1] Alexandrov, A. D.: Almost everywhere existence of the second differential
of a convex function and some properties of convex surfaces connected
with it (Russian). Uchenye Zapiski Gos. Universiteta, Seriya Matem.
Nauk 6 (1939) 3–35.
[2] Alexandrov, A. D.: Smoothness of a convex surface of bounded Gaussian
curvature (Russian). Doklady Akademii Nauk SSSR 36 (1942) 211–216.
[3] Aleksandrov, A. D.: Vnutrennyaya Geometriya Vypuklyh Poverhnosteı̆
(Russian). OGIZ, Moscow–Leningrad 1948. German translation [4], English
translation [5].
[4] Aleksandrov, A. D.: Die innere Geometrie der konvexen Flächen.
Akademie-Verlag, Berlin 1955.
[5] Alexandrov, A. D., Kutateladze S. S. (ed.): Alexandrov, A. D. Selected
Works Vol. 2. Intrinsic Geometry of Convex Surfaces. Chapman and Hall,
Boca Raton 2006.

22 Feller and Busemann are quoted in [5] for the following topics: Busemann–Feller Lemma
(p. 92 f.), tangents to shortest arcs (pp. 147, 156), twice differentiability of convex surfaces
a. e. (p. 387 f.).
23 [15, pp. 280, 1059] relating to normal points a. e. on a convex surface, and to the
principal curvatures and the indicatrix in the chapter on differential geometry [20].

98 Contributions to Geometry
[6] Bianchi, G., Colesanti, A. and Pucci, C.: On the second differentiability
of convex surfaces. Geometria Dedicata 60 (1996) 39–48.
[7] Blaschke, W.: Jahresbericht DMV 27 (1919) 149.
[8] Blaschke, W.: Vorlesungen über Differentialgeometrie, Bd. I. Springer,
Berlin 1921.
[9] Bonnesen, T. and Fenchel, W.: Theorie der konvexen Körper. Springer,
Berlin, 1934. English translation: Theory of Convex Bodies, Moscow (ID,
USA) BCS 1987.
[10] Bouligand, G.: Introduction à la géométrie infinitésimale directe.
Gauthier–Villars, Paris 1931.
[11] Busemann, H.: Convex Surfaces. Interscience, New York 1958.
[12] Cohn-Vossen, S.: Report on [*Feller 1935a]. Zentralblatt Mathematik
0015.12401.
[13] Fenchel, W.: Report on [*Feller 1935b]. Zentralblatt Mathematik
0013.17905.
[14] Fenchel, W.: Report on [*Feller 1936a]. Zentralblatt Mathematik,
0015.12401.
[15] Gruber, P. M. and Wills, J. M. (eds.): Handbook of Convex Geometry, 2
vols. North-Holland, Amsterdam 1993.
[16] Gruber, P. M.: History of convexity. In [15] Vol. A, Chapter 0, pp. 1–16.
[17] Hausdorff, F.: Dimension und äußeres Maß. Mathematische Annalen
79 (1919) 157–179. Reprinted (with commentaries) in: Felix Hausdorff:
Gesammelte Werke, Band 4. Springer, Berlin 2001, pp. 19–54.
[18] Hjelmslev, J.: Grundlag for Fladernes Geometri. Høst i Komm, Copen-
hagen 1914.
[19] Jessen, B.: On konvekse kurver’s krumning. Matematisk Tidsskrift B
(1929) 50–62.
[20] Leichtweiß, K.: Convexity and differential geometry. In [15] Vol. B, Chap-
ter 4.1, pp. 1045–1080.
[21] Lebesgue, H.: Intégrale, longueur, aire. Annali di Matematica 7 (1902)
231–359.
[22] Liberman, I. M.: Geodesic lines on convex surfaces (Russian). Doklady
Akademii Nauk SSSR 31 (1941) 310–313.
[23] McMullen, P. and Shephard, G.: Convex Polytopes and the Upper Bound
Conjecture. London Mathematical Society Lecture Note Series Vol. 3,
Cambridge University Press, London 1971.
[24] Schappacher, N.: Das mathematische Institut der Universität Göttingen
1929–1950. In: Wegeler, C., Becher, H. and Dahms, H.-J. (eds.): Die Uni-
versität Göttingen unter dem Nationalsozialismus. Das verdrängte Kapitel
ihrer 250-jährigen Geschichte. Saur, München 1987, pp. 345–373.
[25] Siegmund-Schultze, R.: Mathematicians Fleeing from Nazi-Germany. In-
dividual Fates and Global Impact. Princeton University Press, Princeton
(NJ) 2009.

Prof. Erhard Scholz


History of Mathematics
Department C – Mathematics
Universität Wuppertal
42097 Wuppertal, Germany
[email protected]



© Springer International Publishing Switzerland 2015 101
R.L. Schilling et al. (eds.), Selected Papers I,
102 Mathematische Zeitschrift 27 (1928) 481–495
[Feller 1928] — Selected Works of W. Feller, Volume 1 103
118 Mathematische Annalen 102 (1930) 633–649
[Feller 1930] — Selected Works of W. Feller, Volume 1 119
136 Fundamenta Mathematicae 22 (1934) 226–256
[Feller 1934a] — Selected Works of W. Feller, Volume 1 137
168 Mathematische Zeitschrift 40 (1935/36) 521–559
[Feller 1935c] — Selected Works of W. Feller, Volume 1 169
Translation of [Feller 1935c]

On the Central Limit Theorem of Probability Theory [¶ 521]

By Willy Feller in Stockholm

§ 1. Statement of the problem and results.


The aim of the present investigation is to derive a necessary and sufficient
condition for the validity of the classical Laplace–Ljapounoff limit theorem,
and to obtain a criterion whether in a concrete case the so-called exponential
law is satisfied. As it is well known, one can state the problem in purely
analytic terms (i.e. without referring to specifically probabilistic notions) in
the following way:
A distribution function $V(x)$ is a monotonic function defined for all real
values $x$ such that
$$\lim_{x\to-\infty} V(x) = 0, \qquad \lim_{x\to+\infty} V(x) = 1;$$
at any discontinuity points we always define $V(x)$ by, say,
$$V(x) = \lim_{\xi\to x+} V(\xi).$$
For any given sequence $\{V_n(x)\}$ of distribution functions we will always denote
by $W_n(x)$ the convolutions defined by
$$W_1(x) = V_1(x), \qquad W_{n+1}(x) = \int_{-\infty}^{+\infty} W_n(x-y)\,dV_{n+1}(y). \tag{1}$$

Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments. Throughout the text the index ν has
been changed to μ since the Greek ν closely resembles v, the small Roman V .

All versions of the central limit theorem give sufficient conditions guaranteeing
that for a certain sequence of real numbers $\{a_n\}$
$$W_n(a_n x) \to \Phi(x) \tag{2}$$
where, as usual, $\Phi(x)$ denotes the Gaussian standard normal distribution function
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac12 y^2}\,dy.$$
It is always a priori assumed that at least the second moments of the $V_n(x)$ are
finite, and that the first moments vanish; in these cases one considers only those
normalizing factors $a_n$ for which [¶ 522] the second moment of $W_n(a_n x)$ equals 1
(cf. § 5, (1) and (2), p. 541). Under these assumptions Lindeberg 1 has recently
given a sufficient condition for the limit theorem which contains all known
versions as special cases and which stands out because of its applicability (for
a statement see § 5, p. 541). The question whether Lindeberg’s condition is, at
least under the restrictions mentioned above, also necessary, has been asked2 ,
but not answered conclusively.
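As an illustration only (it is not part of Feller's text), the convergence (2) under the usual normalization can be checked numerically in the simplest case. The sketch below assumes that every $V_k$ is the law of a variable taking the values $+1$ and $-1$ with probability 1/2 each, so that $a_n = \sqrt{n}$; the function names `Phi`, `W` and `max_gap` are introduced here for the example.

```python
import math

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def W(n, y):
    """Exact distribution function of x_1 + ... + x_n at y, where the x_k are
    independent and equal to +1 or -1 with probability 1/2 each (assumed example)."""
    # The sum equals 2*B - n with B binomial(n, 1/2), so sum <= y iff B <= (y + n) / 2.
    k_max = min(n, math.floor((y + n) / 2))
    if k_max < 0:
        return 0.0
    return sum(math.comb(n, k) for k in range(k_max + 1)) / 2 ** n

def max_gap(n, grid=tuple(0.5 * j for j in range(-6, 7))):
    """Largest |W_n(a_n x) - Phi(x)| over the grid, with a_n = sqrt(n)."""
    a_n = math.sqrt(n)
    return max(abs(W(n, a_n * x) - Phi(x)) for x in grid)

if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, round(max_gap(n), 4))
```

The gap shrinks as $n$ grows, but only slowly, because $W_n$ is a step function; the grid and the sample sizes are arbitrary choices made for this sketch.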
First of all, one should remark that the existence of any moments is cer-
tainly not necessary since the convergence cannot depend on the infinitary
behaviour of each of the elements V (x). Indeed, as soon as an → ∞ – which
is, of course, always assumed –, it is easily seen that there is a sequence of real
numbers {ξn } such that the relation (2) remains valid if Vn (x) is arbitrarily
re-defined for |x| > ξn , for example in such a way that all moments become in-
finite. Therefore, this assumption is, from an analytic point of view, certainly
not natural.
A more serious restriction of the problem, however, is the usual normaliza-
tion that the second moment of Wn (an x) has to be 1. Since, in general, the
moments do not converge jointly with the distributions, there exist sequences
{Vn (x)} (even with finite moments of any order), for which Wn (an x) does not
converge at all under that normalization, but is convergent to Φ(x) for a suit-
able choice of an . Moreover, this excludes already those cases where Wn (an x)
converges, under that particular normalization, towards Φ(2x), say. It is ab-
solutely irrelevant not only for the theory, but even more for the original limit
problem from practice, which normalization achieves the convergence to Φ(x).
Therefore, we consider the following question: Let {Vn (x)} be a given se-
quence of distribution functions; do there exist two sequences of real numbers
{an } and {cn } such that Wn (an x + cn ) → Φ(x) and, if so, can one determine
1 J. W. Lindeberg: Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlich-

keitsrechnung, Math. Zeitschr. 15 (1922). A different proof based on the theory of character-
istic functions is given in P. Lévy, Calcul des probabilités, Paris, Gauthier–Villars, pp. 242 ff.
An essentially different method of proof which shows the connection with other asymptotic
theorems can be found in A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeits-
rechnung, Ergebnisse der Math. Vol. 2, Issue 2, Berlin 1933. The statement given below
(p. 541) differs formally from Lindeberg’s and is due to P. Lévy.
2 cf. for example P. Lévy loc. cit.

208 On the Central Limit Theorem of Probability Theory


such sequences? If one wants to preserve the character of an asymptotic limit
law, then one has to [¶ 523] exclude a trivial convergence case which, by the way,
already appears with the usual specific normalization. Clearly, the distribution
function $W_n(a_n x)$ depends symmetrically on the $V_k(x)$, $k = 1, 2, \dots, n$. Of
course, the limit problem concerns only the case that the influence of the single
component $V_k(x)$ on $W_n(a_n x)$ tends to zero as $n$ increases; in other words,
that the convergence (2) is not caused by the dominating influence of single
components which tend to $\Phi(x)$ on their own. The exact statement3 of this
is: Let for any $k = 1, \dots, n$ and $n \to \infty$ the limit
$$V_k(a_n x) \to E(x) = \begin{cases} 0 & \text{for } x < 0,\\ 1 & \text{for } x \ge 0, \end{cases} \tag{3}$$

exist. (One may assume that the Vk (an x) are arranged in a sequence: V1 (a1 x),
V1 (a2 x), . . . , V1 (an−1 x), V1 (an x), . . . , Vn (an x), . . .; then (3) means that this
sequence converges to E(x) and, because of the monotonicity and the limit
conditions for x → ±∞, this convergence is automatically uniform outside
any neighbourhood of the origin.) In fact, this condition is equivalent to the
seemingly weaker assumption that for every fixed $k$
$$\lim_{n\to\infty} V_k(a_n x) = \lim_{n\to\infty} V_n(a_n x) = E(x).$$
It turns out (§ 2) that, as one would expect, the condition (3) is equivalent to
considering only those sequences $\{a_n\}$ which satisfy
$$a_n \to \infty \quad\text{and}\quad \frac{a_{n+1}}{a_n} \to 1. \tag{4}$$
In general, literally the same statements hold for this restriction as one has
for the usual, more specialized, normalization4 : Only two effectively existing [¶ 524]
cases are excluded from the consideration, and both have a completely different
analytic character: 1. If an remains bounded; then (2) fails if any member
Vm (x) is somehow changed; one does not deal with an asymptotic limit law,
but rather with the question, how to split Φ(x) into components (cf. § 2,

3 Probabilistically this condition has the following meaning: Let $x_1, x_2, \dots$ be random
variables with the distribution functions $V_k(x)$, i.e. $V_k(x)$ is the probability that the value
of $x_k$ is less than or equal to $x$. Then $W_n(a_n x)$ is the distribution function of the random
variable $\frac{1}{a_n}(x_1 + \dots + x_n)$. Then (3) means that the probability for the absolute value of
$\frac{x_\mu}{a_n}$ to exceed any constant $\eta > 0$ tends to zero. The condition arises, e.g., from the requirement
that for any given $\alpha$, in particular (say) for $\alpha = -1$, also the sequences
$$\frac{1}{a_n}\,\bigl(x_1 + \dots + x_{k-1} + (1+\alpha)x_k + x_{k+1} + \dots + x_n\bigr)$$
tend to a normally distributed random variable as $n \to \infty$.
4 cf. e.g. P. Lévy, loc. cit. pp. 235–237.

[Feller 1935c] Translation — Selected Works of W. Feller, Volume 1 209


p. 531).a 2. If $\frac{a_{n+1}}{a_n} > \alpha > 1$ holds (or if this is the case for any subsequence);
then one has for suitable $c_k$ already $V_k(c_k x) \to \Phi(x)$, so that the convergence
again depends on the behaviour of the single components. It is, by the way,
possible to find necessary and sufficient (but not very practical) conditions for
(2), but we do not state them here, since the effort would be out of proportion to
this uninteresting question. In the appendix we show with a short example that
the condition $V_k(c_k x) \to \Phi(x)$ is not sufficient.
Thus, we study the above mentioned problem only under the restriction
(4), but we re-state this restriction more appropriately as
Definition. A sequence of distribution functions $\{V_k(x)\}$ belongs (with the
normalizing factors $\{a_n\}$) to the Gaussian standard normal distribution function
$\Phi(x)$ if the relations (2) and (3) hold.
The normalizing factors are, of course, unique only up to the fact that
one may replace the sequence $\{a_n\}$ by those, and only those, sequences $\{a_n'\}$
satisfying
$$\lim \frac{a_n}{a_n'} = 1;$$
we will call any such two sequences equivalent. One needs (§ 2) equivalent normalizing
factors for all sequences $\{V_k(x + b_k)\}$, and any such sequence belongs
simultaneously with $\{V_k(x)\}$ to $\Phi(x)$ if, and only if,
$$\lim \frac{1}{a_n} \sum_{\mu=1}^{n} b_\mu = 0$$
holds. The question is now whether there are constants $b_\mu$ such that $V_\mu(x + b_\mu)$
belongs to $\Phi(x)$.
When answering this question we assume at first that the normalizing
factors are given, and we rule out the possibility that the convergence is caused
by shifts. Then one has:
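The sequence of distribution functions $\{V_n(x)\}$ with the normalizing factors
$\{a_n\}$ belongs to $\Phi(x)$ if, and only if, for every $\eta > 0$ all of the following three
conditions are satisfied: [¶ 525]
$$\lim \sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu(x) = 0, \tag{I}$$
$$\lim \frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \left( \int_{|x|<\eta a_n} x\,dV_\mu(x) \right)^{2} \right\} = 1, \tag{II}$$
$$\lim \frac{1}{a_n} \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0. \tag{III}$$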

a This problem was solved by H. Cramér after the publication of this paper, see Feller’s



These conditions are still rather impractical; they simplify considerably if
one considers only some kind of absolute convergence. To begin with, it seems
to be natural to require that (2) remains valid if we replace any element by its
symmetric5 analogue, i.e. any $V_k(x)$ by $1 - V_k(-x)$. This leads to the new
Definition. The sequence $\{V_k(x)\}$ properly belongs to $\Phi(x)$ if all sequences
belong to $\Phi(x)$ which can be derived from $\{V_k(x)\}$ by replacing, for any subsequence
$\{n_k\}$, the element $V_{n_k}(x)$ by $1 - V_{n_k}(-x)$.
Of course, the normalizing factors remain unchanged. Then one has:
The sequence $\{V_k(x)\}$ with the normalizing factors $\{a_n\}$ properly belongs
to $\Phi(x)$ if, and only if, for every $\eta > 0$ all of the following three conditions are
satisfied:
$$\lim \sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu(x) = 0, \tag{I$'$}$$
$$\lim \frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} x^2\,dV_\mu(x) = 1, \tag{II$'$}$$
$$\lim \frac{1}{a_n} \sum_{\mu=1}^{n} \left| \int_{|x|<\eta a_n} x\,dV_\mu(x) \right| = 0. \tag{III$'$}$$

These conditions are direct generalizations of Lindeberg’s condition which,
thus, is at the heart of the matter. In the presence of finite second moments and
the usual choice of the $a_n$ one obtains Lindeberg’s condition. This condition is,
under the assumptions mentioned above, not only sufficient but also necessary.
(In the [¶ 526] parlance of P. Lévy this means that it is necessary and sufficient for
the $\{V_n(x)\}$ to be a famille normale.) This seems to have gone unnoticed,
although the necessity is an immediate consequence of Chebyshev’s inequality.
Its proof requires only a few lines and, given the general interest in Lindeberg’s
condition, it seems appropriate to give a direct proof, independently of the
somewhat complicated general case (cf. § 5, p. 542 f.).
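To make the three truncated sums in the conditions above concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions that are not in the text: all components $V_\mu$ are identical, the common law is discrete with mean zero, and $a_n$ is the usual normalization $\sqrt{n \cdot \text{variance}}$. The function name `feller_sums` is hypothetical.

```python
import math

def feller_sums(dist, n, eta):
    """Evaluate the three sums of the 'proper' conditions for n identical components.

    dist is a list of (value, probability) pairs of one discrete law with mean
    zero (an assumption of this sketch); a_n = sqrt(n * variance).
    Returns (tail sum, normalized truncated second moment, normalized
    absolute truncated first moment) for the given n and eta.
    """
    variance = sum(p * x * x for x, p in dist)       # mean is assumed to be zero
    a_n = math.sqrt(n * variance)
    cut = eta * a_n
    tail = n * sum(p for x, p in dist if abs(x) > cut)
    second = n * sum(p * x * x for x, p in dist if abs(x) < cut) / a_n ** 2
    first = n * abs(sum(p * x for x, p in dist if abs(x) < cut)) / a_n
    return tail, second, first

if __name__ == "__main__":
    coin = [(-1.0, 0.5), (1.0, 0.5)]                 # bounded law
    rare = [(-10.0, 0.005), (0.0, 0.99), (10.0, 0.005)]  # rare large values
    for n in (100, 10_000):
        print(n, feller_sums(coin, n, 0.1), feller_sums(rare, n, 0.5))
```

For a bounded law the tail sum vanishes as soon as $\eta a_n$ exceeds the bound, and the truncated second moment then reproduces the full variance, which is the mechanism behind the finite-variance case mentioned in the text.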
Moreover, it turns out that the condition (III), and even (III′), is totally irrelevant
in the sense that it only fixes the coordinate origins of the components.
One has, in fact, the theorem:
Assume that the conditions (I) and (II) are satisfied. If one defines the
numbers $b_k$ by
$$\int_{|x|<a_k} (x - b_k)\,dV_k(x) = 0 \tag{5}$$

5 This requirement means that both $\frac{1}{a_n}(x_1 + \dots + x_n)$ and $\frac{1}{a_n}(i_1 x_1 + \dots + i_n x_n)$, where
the $i_k$ take the values $\pm 1$, converge to a normally distributed random variable, i.e. that the
given sequence converges absolutely.

comment in Footnote 1 of [Feller 1937b].



and sets $V_k^*(x) = V_k(x + b_k)$, then the sequence $\{V_k^*(x)\}$ with the normalizing
factors $\{a_n\}$ belongs to $\Phi(x)$, and it belongs to $\Phi(x)$ even properly. In retrospect,
this allows one to replace the conditions (I)–(III) since by (5) one may
reduce any sequence belonging to $\Phi(x)$ to one which properly belongs to $\Phi(x)$.
For the final answer to the question posed at the beginning it is, therefore,
irrelevant if we start with the given sequence $\{V_n(x)\}$ or with any other
sequence $\{V_n(x + c_n)\}$ with arbitrary $c_n$. Thus we may, without loss of generality,
fix the coordinate origins for the components in an arbitrary manner by
a suitable choice of the $c_n$. We will do this by requiring that for all $k$
$$\lim_{\xi\to 0-} \int_{-\infty}^{\xi} dV_k(x) \le 1 - \tfrac12\,{}^{b}, \qquad \lim_{\xi\to 0+} \int_{\xi}^{\infty} dV_k(x) \le \tfrac12. \tag{6}$$
This fixes the origin in a natural way. Let us remark, however, that the
following criterion literally remains valid if one replaces in (6) on the right-hand
sides $\tfrac12$ by any number $q < 1$: it is only important that the origins are
not completely eccentric relative to the supports of the distribution functions
$V_k(x)$; in particular, (6) has only been introduced to get a unique condition.
Now we define for every $\delta > 0$ a real number $p_n(\delta)$ as the smallest number
such that
$$\sum_{\mu=1}^{n} \int_{|x|>p_n(\delta)} dV_\mu(x) \le \delta$$
(the integral over an interval $|x| > X$ is always understood as the limit of the
integrals over $|x| > X + \varepsilon$ as $\varepsilon \to 0+$; therefore, $p_n(\delta)$ is always defined).
[¶ 527] Then the following criterion holds which gives a complete answer to the
question.
Criterion. If the coordinate origins for the $V_k(x)$ are fixed in such a way that
(6) holds, then it is necessary and sufficient for the existence of a sequence of
real numbers $\{b_k\}$ such that the sequence $\{V_k(x + b_k)\}$ belongs to $\Phi(x)$, that
for every $\delta > 0$
$$\lim \frac{1}{p_n^2(\delta)} \sum_{\mu=1}^{n} \int_{|x|<p_n(\delta)} x^2\,dV_\mu(x) = \infty$$
holds. If this is the case, then there is a sequence $\delta_n \to 0$ such that also
$$\lim \frac{1}{p_n^2(\delta_n)} \sum_{\mu=1}^{n} \int_{|x|<p_n(\delta_n)} x^2\,dV_\mu(x) = \infty$$
holds; setting
$$a_n^2 = \sum_{\mu=1}^{n} \left\{ \int_{|x|<p_n(\delta_n)} x^2\,dV_\mu(x) - \left( \int_{|x|<p_n(\delta_n)} x\,dV_\mu(x) \right)^{2} \right\},$$

b Feller writes $\tfrac12$ instead of $1 - \tfrac12$, but this does not work for his comment (in italics)



and defining bk by (5), then the sequence {Vk (x + bk )} with the normalizing
factors {an } belongs to Φ(x), even properly.
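The criterion can be tried out on two standard laws. The sketch below is an illustration, not taken from the text: it assumes identical components, once uniform on $[-1, 1]$ and once standard Cauchy, both symmetric so that (6) holds with the origin at the median. The closed forms for $p_n(\delta)$ and for the truncated second moments are elementary calculations made for this example, and the function names are hypothetical.

```python
import math

def ratio_uniform(n, delta):
    """Criterion ratio for n identical components, each uniform on [-1, 1].

    n * P(|x| > p) = n * (1 - p) <= delta gives p_n(delta) = 1 - delta / n
    (for delta < n), and the second moment truncated to |x| < p is p**3 / 3.
    """
    p = 1.0 - delta / n
    truncated_second_moment = p ** 3 / 3.0
    return n * truncated_second_moment / p ** 2

def ratio_cauchy(n, delta):
    """Criterion ratio for n identical standard Cauchy components.

    P(|x| > p) = 1 - (2 / pi) * atan(p), so p_n(delta) = tan(pi/2 * (1 - delta/n)),
    and the second moment truncated to |x| < p is (2 / pi) * (p - atan(p)).
    """
    p = math.tan(0.5 * math.pi * (1.0 - delta / n))
    truncated_second_moment = (2.0 / math.pi) * (p - math.atan(p))
    return n * truncated_second_moment / p ** 2

if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        print(n, ratio_uniform(n, 0.1), ratio_cauchy(n, 0.1))
```

For the uniform law the ratio grows like $n/3$, so the criterion is met; for the Cauchy law it stays close to $\delta$ and does not tend to infinity, in line with the remark that laws with infinite second moment need not belong to $\Phi(x)$.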
The actual calculations become much easier in the following more general
formulation of the sufficient condition: Assume that $\{q_n\}$ is an arbitrary
sequence of real numbers for which either
$$\lim \frac{1}{q_n^2} \sum_{\mu=1}^{n} \int_{|x|<q_n} x^2\,dV_\mu(x) = \infty \quad\text{and}\quad \lim \sum_{\mu=1}^{n} \int_{|x|>q_n} dV_\mu(x) = 0, \tag{7}$$
or
$$\lim \frac{1}{q_n^2} \sum_{\mu=1}^{n} \int_{|x|<q_n} x^2\,dV_\mu(x) > 0 \quad\text{and}\quad \lim \sum_{\mu=1}^{n} \int_{|x|>\eta q_n} dV_\mu(x) = 0 \tag{8}$$
holds true for every $\eta > 0$, then
$$a_n^2 = \sum_{\mu=1}^{n} \left\{ \int_{|x|<q_n} x^2\,dV_\mu(x) - \left( \int_{|x|<q_n} x\,dV_\mu(x) \right)^{2} \right\}$$
yields a sequence $\{a_n\}$ of normalizing factors for $\{V_n(x)\}$.


The proofs rely on the method of characteristic functions and, among all
proofs, they are closest to P. Lévy’s proof1 of Lindeberg’s theorem. In order
to simplify the proofs, we provide some auxiliary results (§ 2) on characteristic
functions and begin, thereafter, in § 3 with the study of the conditions (I)–
(III); the results of that paragraph yield all other [¶ 528] results mentioned here,
and they will not appear again in this form; this is due to the postponed proof
that the conditions (I)–(III) are necessary and sufficient.
In § 8 we provide some examples illustrating several possible variants. The
example of a sequence with identical elements Vn (x) = V (x) might be interesting
in its own right. If V (x) has a (vanishing first and a) finite second moment then,
according to Lindeberg and P. Lévy, the sequence belongs to Φ(x). On the
other hand, the beautiful investigations by Pólya6 and P. Lévy7 on so-called
exceptional error laws show that in the presence of an infinite second moment
this is, in general, not the case. A special case of the results mentioned above
shows now exactly the difference between those V (x) which do or do not belong
to Φ(x).
Let us, finally, add a few words on the relation to practical applications. It
has often been emphasized (among others by P. Lévy) that already a special
case of Lindeberg’s theorem provides a sufficient explanation that, frequently,
6 G. Pólya, Herleitung des Gaußschen Fehlergesetzes aus einer Funktionalgleichung,
Math. Zeitschr. 18 (1923).
7 loc. cit., p. 252 f.

following (6) that we can replace $\tfrac12$ by $q < 1$; see also (4) in [Feller 1937a].



observation errors appearing in practical applications are approximately nor-
mally distributed. In practice one may always assume that the random vari-
ables xk (cf. Footnote 3) attain only values in a fixed interval |x| < X which
means that all Vk (x) are constant for |x| > X. The limit theorem for such
sequences was already proved by Lindeberg in an earlier paper8 . On the other
hand it is clear that any such argument is inappropriate in a truly theoretical
description of nature. The very fact to be proven, namely that the limit theorem is as
robust with respect to “small deviations” of the Vn (x) as practice would require,
is presupposed in that argument. Although it is impossible to associate practically
important situations with Lindeberg’s theorem, it seemed desirable
to carry through the parallelism between theoretical and empirical results. The
criterion provided here has immediately an intuitive interpretation, and only
effectively measurable quantities are being used there.
The completion of this paper was made possible by the kind invitation to
visit the Institute for Mathematical Statistics in Stockholm. I would like to
thank, also at this place, the director of the institute, Professor H. Cramér,
for his interest in my work and for generous hospitality. [¶ 529]

§ 2. Preparatory lemmas
According to P. Lévy9 the characteristic function of a distribution function
$F(x)$ is its Fourier transform
$$f(t) = \int_{-\infty}^{+\infty} e^{ixt}\,dF(x).$$
In the sequel we will consider characteristic functions for real-valued t only.


We begin with a collection of known calculation rules which we are going to
use later on.
$f(t)$ is continuous for all $t$, and one has
$$f(0) = 1, \qquad |f(t)| \le 1.$$
The distribution function $F(ax + b)$ has the characteristic function $e^{-\frac{bt}{a} i} f\!\left(\frac{t}{a}\right)$;
if $F(x)$ is the convolution of two distribution functions $F_1(x)$ and $F_2(x)$, i.e.
$$F(x) = \int_{-\infty}^{+\infty} F_1(x-y)\,dF_2(y) = \int_{-\infty}^{+\infty} F_2(x-y)\,dF_1(y),$$
then one has for the corresponding characteristic functions
$$f(t) = f_1(t) f_2(t).$$

The characteristic function of the standard normal distribution function Φ(x)


8 J. W. Lindeberg, Ann. Acad. Sci. Fennicae A 16 (1920).
9 loc. cit. p. 161. There one finds also the calculation rules mentioned below.



is
$$\varphi(t) = e^{-\frac12 t^2},$$
and the one of the function
$$E(x) = \begin{cases} 0 & \text{for } x < 0,\\ 1 & \text{for } x \ge 0, \end{cases}$$
is the constant function $= 1$. For our investigations characteristic functions
are important because of the following fundamental
Theorem.10 Let $F(x)$ and $F_n(x)$, $n = 1, 2, \dots$, denote distribution functions
with characteristic functions $f(t)$ and $f_n(t)$, respectively. For the convergence
of the sequence $\{F_n(x)\}$ [¶ 530] to $F(x)$ it is necessary and sufficient that $f_n(t) \to
f(t)$, uniformly in every finite interval.
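The calculation rules listed before the theorem can be checked numerically for discrete laws. The following Python sketch is only an illustration: the representation of a law as a dictionary from values to probabilities and the helper names `char_fn`, `convolve` and `affine` are choices made here, not notation from the text.

```python
import cmath

def char_fn(dist, t):
    """Characteristic function f(t) = sum of p * exp(i x t) for a discrete law."""
    return sum(p * cmath.exp(1j * x * t) for x, p in dist.items())

def convolve(d1, d2):
    """Law of the sum of two independent discrete variables."""
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

def affine(dist, a, b):
    """Law whose distribution function is F(a x + b), a > 0: the law of (x - b) / a."""
    return {(x - b) / a: p for x, p in dist.items()}

if __name__ == "__main__":
    d1 = {-1.0: 0.5, 1.0: 0.5}
    d2 = {0.0: 0.25, 2.0: 0.75}
    t = 0.7
    # Product rule for convolutions: f = f1 * f2.
    print(abs(char_fn(convolve(d1, d2), t) - char_fn(d1, t) * char_fn(d2, t)))
    # Rule for F(a x + b): exp(-i b t / a) * f(t / a).
    a, b = 2.0, 1.0
    print(abs(char_fn(affine(d2, a, b), t) - cmath.exp(-1j * b * t / a) * char_fn(d2, t / a)))
```

Both printed differences should be at the level of rounding error; the particular laws and the value of `t` are arbitrary.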
We will now consider a sequence of distribution functions $V_n(x)$ with characteristic
functions $v_n(t)$, and define their convolutions $W_n(x)$ as in the introduction
(1). For the corresponding characteristic functions one has
$$w_n(t) = \prod_{\mu=1}^{n} v_\mu(t). \tag{1}$$
$W_n(a_n x) \to \Phi(x)$ is equivalent to $w_n\!\left(\frac{t}{a_n}\right) \to e^{-\frac12 t^2}$, and analogously $V_\mu(a_n x) \to
E(x)$ (cf. the introduction (4)) is equivalent to $v_\mu\!\left(\frac{t}{a_n}\right) \to 1$. Both limits are
uniform in any finite interval; obviously, the latter can also be written in the
form:
$$\lim_{n\to\infty} \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) = 0, \qquad (\mu = 1, 2, \dots, n).$$

Thus, we have the following


Criterion. The sequence $\{V_n(x)\}$ with the normalizing factors $\{a_n\}$ belongs
to $\Phi(x)$ if, and only if, firstly
$$w_n\!\left(\frac{t}{a_n}\right) = \prod_{\mu=1}^{n} v_\mu\!\left(\frac{t}{a_n}\right) \to e^{-\frac12 t^2} \tag{$\alpha$}$$
converges uniformly in every finite interval and if, moreover, for every $\varepsilon > 0$
10 S. Bochner, Vorlesungen über Fouriersche Integrale, Leipzig 1932, p. 70 (“necessary”)

and p. 72 (“sufficient”). Bochner’s theorem is even a bit more general than the one stated
here. A somewhat more restrictive theorem (assuming the finiteness of the second mo-
ments) is in P. Lévy, loc. cit. pp. 195 and 197.c A simplification of this proof, which avoids
conditionally convergent integrals, is in H. Cramér, On composition of elementary errors,
Skandin. Actuarie Tidskr. 1928. (There one finds also remarkable remainder estimates for
the central limit theorem.)
c Feller mentions in [Feller 1937a, Footnote 1] that he is actually using Lévy’s version.



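and $T > 0$ there is some $N = N(\varepsilon, T)$ such that for $n > N$ and $|t| < T$
$$\left| \int_{-\infty}^{+\infty} \left( 1 - \cos\frac{xt}{a_n} \right) dV_\mu(x) \right| < \varepsilon, \qquad \left| \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \right| < \varepsilon \tag{$\beta$}$$
holds11 , $\mu = 1, 2, \dots, n$.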
Now we study first the restrictions for the normalizing factors, and under
which circumstances (α) can hold without (β).
In order that for two sequences $\{a_n\}$ and $\{a_n'\}$ of positive reals both (α)
and
$$w_n\!\left(\frac{t}{a_n'}\right) \to e^{-\frac12 t^2}$$
hold, it is obviously necessary and sufficient that both sequences are equivalent,
i.e.
$$\lim_{n\to\infty} \frac{a_n}{a_n'} = 1$$
holds. [¶ 531] Now it is easy to see that (α) cannot hold for oscillating sequences
$\{a_n\}$: Since $e^{-\frac12 t^2}$ is monotonically decreasing and $|v_\mu(t)| \le 1$, it immediately
follows from (α) that for given $\varepsilon > 0$ one has for $n > N'(\varepsilon)$ and all $k > 0$
$$\frac{a_{n+k}}{a_n} > 1 - \varepsilon. \tag{2}$$
Thus, for every sequence {an } of positive reals, such that (α) holds, there is an
equivalent, monotonically increasing sequence, and one might restrict oneself
in all that follows to monotone sequences {an } (which, alas, does not simplify
things).
Now we consider first the case that $a_n$ remains bounded, so that $\lim a_n = a$
exists. Then (β) cannot hold. If (α) is satisfied, then one has immediately
because of the uniform convergence
$$\prod_{\mu=1}^{\infty} v_\mu\!\left(\frac{t}{a}\right) = e^{-\frac12 t^2}.$$
If, for bounded $a_n$, one has at all the convergence $W_n(a_n x) \to \Phi(x)$, then the
sequence $V_n(ax)$ is obtained by successively splitting $\Phi(x)$ into components.
It is well-known that these have necessarily finite dispersion, and convergence
is only possible with the usual normalization. The convergence changes as
soon as one leaves out any single member, unless it is, incidentally, $= E(x)$.
This case does not belong to probability theory at all.
Thus, assume that $a_n \to \infty$. If both relations (α) and (β) hold, then
$\lim v_n\!\left(\frac{t}{a_n}\right) = 1$ and, therefore, it follows at once from
$$w_{n+1}\!\left(\frac{t}{a_{n+1}}\right) = w_n\!\left(\frac{t}{a_{n+1}}\right) v_{n+1}\!\left(\frac{t}{a_{n+1}}\right) \tag{3}$$
11 The second inequality in (β) follows, by the way, from the first one using Schwarz’



that
$$\lim w_n\!\left(\frac{t}{a_n}\right) = \lim w_n\!\left(\frac{t}{a_{n+1}}\right) = e^{-\frac12 t^2},$$
and so, because of the uniform convergence,
$$\lim \frac{a_n}{a_{n+1}} = 1. \tag{4}$$

Conversely, if the conditions (α) and (4) are satisfied, then it follows first
that $v_n\!\left(\frac{t}{a_n}\right)$ converges uniformly in every finite interval to 1. By (2), the
same applies to $v_\mu\!\left(\frac{t}{a_n}\right)$, $\mu = 1, \dots, n$. Thus, (α) and (4) together imply (β).
Therefore, we can claim:
[¶ 532] If one has at all that $W_n(a_n x) \to \Phi(x)$, then the sequence $V_n(x)$ belongs
to $\Phi(x)$ if, and only if,
$$a_n \to \infty \quad\text{and}\quad \lim \frac{a_n}{a_{n+1}} = 1. \tag{5}$$
Only in this case the influence of the single members becomes negligible as $n$
increases.
In order to verify that also the second restriction in (5) excludes only completely uninteresting sequences from the investigation, it suffices to consider a sequence for which one has
\[ \frac{a_{n+1}}{a_n} > \alpha > 1. \]
Then, for (α) to hold, (3) shows at once that already $V_n(x)$ has to converge to a normal law. More precisely: Setting $a_n^2 = \sum_{\mu=1}^n c_\mu^2$ (as one may do without loss of generality) and if $\frac{c_n}{a_n} > \alpha > 0$, then $W_n(a_n x)$ can converge to $\Phi(x)$ only if
\[ V_n(c_n x) \to \Phi(x) \tag{6} \]
as well. The typical example for such sequences is, say,
\[ V_n(c_n x) = \Phi\Bigl(\frac{x}{\sqrt{2^n}}\Bigr), \]
where $a_n^2 = 2^{n+1} - 1$ and $W_n(a_n x) = \Phi(x)$. Also in this case the convergence
depends only on the behaviour of the single elements. Moreover, as we have
already remarked in § 1, the condition (6) is not sufficient for (α), but we
refrain from stating the exact conditions. Let us just note that every sequence
for which (α) holds can be decomposed by some kind of diagonal procedure
into two subsequences, one of which is of the type (6) while the other belongs
to Φ(x).
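The second restriction in (5) can be illustrated numerically. The following is only a sketch under our own naming (the two helper functions are not part of the paper): it compares the ratio $a_n/a_{n+1}$ for the usual normalization $a_n^2 = n$ with the geometric example $a_n^2 = 2^{n+1} - 1$ mentioned above.

```python
import math

def ratio_sqrt(n):
    """a_n / a_{n+1} for the usual normalization a_n^2 = n."""
    return math.sqrt(n / (n + 1))

def ratio_geometric(n):
    """a_n / a_{n+1} for a_n^2 = 2^(n+1) - 1 (integers are divided exactly first)."""
    return math.sqrt((2 ** (n + 1) - 1) / (2 ** (n + 2) - 1))

# The first ratio approaches 1, the second stays near 1/sqrt(2),
# so only the first sequence can satisfy the second relation in (5).
for n in (10, 100, 1000):
    print(n, ratio_sqrt(n), ratio_geometric(n))
```

In the first case the influence of the single members becomes negligible; in the second case the last member always carries about half of the total dispersion.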
inequality.

[Feller 1935c] Translation — Selected Works of W. Feller, Volume 1 217


Assume now that the sequence $\{V_n(x)\}$ with the normalizing factors $a_n$ belongs to $\Phi(x)$. The characteristic function of $V_n(x + b_n)$ differs from the one of $V_n(x)$ merely by the factor $e^{-i b_n t}$; the sequence $V_n(x + b_n)$ belongs, if at all, to $\Phi(x)$ with the same normalizing factors. In order that an analogue of (α) holds for this sequence, it is obviously necessary and sufficient that
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n b_\mu = 0 \tag{7} \]
is valid. From (7) it follows, in particular, that $\frac{b_n}{a_n} \to 0$, and so, because of (2),
\[ \lim \frac{1}{a_n} \max[|b_1|, \dots, |b_n|] = 0. \]

¶ 533 ¶ Therefore, the analogue of (β) holds for the new sequence {Vn (x + bn )}.
Thus we have
The sequence {Vn (x + bn )} belongs simultaneously with {Vn (x)} to Φ(x) if,
and only if, (7) obtains; in particular, both sequences always have the same
normalizing factors.

§ 3. Analysis of the conditions


We will not immediately turn to the proof that the conditions stated above
are necessary and sufficient, but we begin with an analysis of these conditions.
This will yield a re-formulation which allows for a considerably simplified proof.
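Assume that for a sequence $\{V_n(x)\}$ of distribution functions and a sequence of real numbers $\{a_n\}$ the relations
\[ \lim \sum_{\mu=1}^n \int_{|x|>\eta a_n} dV_\mu(x) = 0, \tag{1} \]
\[ \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} = 1 \tag{2} \]
hold for every $\eta > 0$. Then, obviously, $a_n \to \infty$ and one has
\[ \lim \frac{a_{n+1}}{a_n} = 1; \tag{3} \]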

to see the latter, we set for the moment $p_n = \min[a_n, a_{n+1}]$ and $q_n = \max[a_n, a_{n+1}]$; with $\eta = 1$ in (2) one has
\[
\begin{aligned}
\lim \frac{q_n^2 - p_n^2}{q_n^2}
&\le \lim \frac{1}{q_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{p_n \le |x| < q_n} x^2\,dV_\mu(x) + 2 \int_{|x|<q_n} |x|\,dV_\mu(x) \cdot \int_{p_n \le |x| < q_n} |x|\,dV_\mu(x) \Bigr\} \\
&\le 3 \lim \sum_{\mu=1}^n \int_{|x| \ge p_n} dV_\mu(x) = 0.
\end{aligned}
\]

In the same way one easily concludes that for every $\varepsilon > 0$, all $k > 0$ and sufficiently large $n$
\[ \frac{a_{n+k}}{a_n} > 1 - \varepsilon. \tag{4} \]
The characteristic relations § 2, (5) for normalizing factors are, thus, a conse-
quence of (1) and (2).
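For distributions with finitely many jumps the left-hand sides of (1) and (2) are finite sums and can be evaluated directly. The sketch below is an illustration under our own conventions, not part of the paper: each $V_\mu$ is represented as a list of (jump location, jump size) pairs, and the coin-flip sequence is a hypothetical test case.

```python
def tail_mass(dists, r):
    """Sum over mu of the mass of V_mu in |x| > r; this is (1) with r = eta * a_n."""
    return sum(p for d in dists for x, p in d if abs(x) > r)

def truncated_variance_sum(dists, r):
    """Sum over mu of {int x^2 dV_mu - (int x dV_mu)^2}, both integrals over |x| < r."""
    total = 0.0
    for d in dists:
        m1 = sum(x * p for x, p in d if abs(x) < r)
        m2 = sum(x * x * p for x, p in d if abs(x) < r)
        total += m2 - m1 * m1
    return total

# Hypothetical example: n jumps of size 1/2 at +1 and -1, normalized by a_n = sqrt(n).
n = 400
dists = [[(-1.0, 0.5), (1.0, 0.5)]] * n
a_n = n ** 0.5
eta = 0.1
relation_1 = tail_mass(dists, eta * a_n)
relation_2 = truncated_variance_sum(dists, eta * a_n) / a_n ** 2
print(relation_1, relation_2)
```

As soon as $\eta a_n$ exceeds the largest jump location, the first quantity vanishes and the second equals the ratio of the total variance to $a_n^2$.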
We will now see how (1) and (2) change if we replace $V_n(x)$ by new distribution functions $V_n^*(x)$ where $V_n^*(x) = V_n(x + b_n)$. First we consider the sequence $\{b_n\}$ which is defined by ¶ ¶ 534
\[ \int_{|x|<a_n} (x - b_n)\,dV_n(x) = 0. \tag{5} \]
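Relation (5) says that $b_n$ is the mean of $V_n$ restricted to $|x| < a_n$ and renormalized by the mass of that interval. A minimal sketch for a distribution with finitely many jumps (the representation as (location, size) pairs and the function name are our own assumptions):

```python
def truncated_center(dist, a):
    """Solve int_{|x|<a} (x - b) dV(x) = 0 for b; dist is a list of (location, mass) pairs."""
    mass = sum(p for x, p in dist if abs(x) < a)
    if mass == 0:
        raise ValueError("no mass inside |x| < a, so (5) does not determine b")
    return sum(x * p for x, p in dist if abs(x) < a) / mass

# Hypothetical example: the far jump at x = 100 is ignored when a = 10.
dist = [(-1.0, 0.25), (2.0, 0.25), (100.0, 0.5)]
b = truncated_center(dist, 10.0)
residual = sum((x - b) * p for x, p in dist if abs(x) < 10.0)
print(b, residual)
```

Since $b$ is an average of points with $|x| < a$, one always has $|b| \le a$, which is the bound $|b_k| \le a_k$ used in the proof of (6).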

We are going to show that for every $\eta > 0$
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x|<\eta a_n} (x - b_\mu)\,dV_\mu(x) \Bigr| = 0. \tag{6} \]

The following proof will show that this claim remains true if one only assumes
(1), (3) and (4), but not (2).
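For the proof we note that $|b_k| \le a_k$ so that the left-hand side in (6) is, by (1), certainly independent of $\eta$. Therefore, it is enough to prove the particular case
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x|<2a_n} (x - b_\mu)\,dV_\mu(x) \Bigr| = 0. \tag{7} \]
For any given $\varepsilon > 0$ we find some $N = N(\varepsilon)$ such that for $n > N$ one has simultaneously
\[ \sum_{\mu=1}^n \int_{|x|>\frac15 a_n} dV_\mu(x) < \varepsilon, \tag{8} \]
\[ \frac{a_{n+1}}{a_n} < \frac32, \tag{9} \]
\[ \frac{a_{n+t}}{a_n} > \frac23, \qquad t = 1, 2, \dots \tag{10} \]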



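If $n > N$, we set $n_0 = n$ and define inductively a decreasing sequence of indices $n_i$, $i = 1, 2, \dots$ in such a way that
\[ 2 < \frac{a_{n_{i-1}}}{a_{n_i}} < 3 \tag{11} \]
holds. Because of (9) this is (at least) possible until we reach an index $n_k$ satisfying
\[ a_{n_k} < 2a_N. \]
Denoting, for a moment, by $n(\mu)$ the largest index $n_i$ which is $\le \mu$, we have by (5)
\[
\begin{aligned}
\frac{1}{a_{n_0}} \sum_{\mu=n_k}^{n_0} \Bigl| \int_{|x|<2a_{n_0}} (x - b_\mu)\,dV_\mu(x) \Bigr|
&= \frac{1}{a_{n_0}} \sum_{\mu=n_k}^{n_0} \Bigl| \int_{a_\mu \le |x| < 2a_{n_0}} (x - b_\mu)\,dV_\mu(x) \Bigr| \\
&\le \frac{1}{a_{n_0}} \sum_{\mu=n_k}^{n_0} \int_{a_\mu \le |x| < 2a_{n_0}} |x - b_\mu|\,dV_\mu(x) \\
&\le \frac{1}{a_{n_0}} \sum_{\mu=n_k}^{n_0} \int_{\frac23 a_{n(\mu)} < |x| < 2a_{n_0}} |x - b_\mu|\,dV_\mu(x) \\
&\le \frac{1}{a_{n_0}} \sum_{i=1}^{k} \sum_{\mu=n_k}^{n_{i-1}} \int_{\frac23 a_{n_i} < |x| < 2a_{n_{i-1}}} |x - b_\mu|\,dV_\mu(x).
\end{aligned}
\]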

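¶ 535 ¶ Using (10), (11) and (8) this gives, since $|b_n| \le a_n$,
\[
\begin{aligned}
\frac{1}{a_{n_0}} \sum_{\mu=n_k}^{n_0} \Bigl| \int_{|x|<2a_{n_0}} (x - b_\mu)\,dV_\mu(x) \Bigr|
&\le \frac{4}{a_{n_0}} \sum_{i=1}^{k} a_{n_{i-1}} \sum_{\mu=n_k}^{n_{i-1}} \int_{\frac23 a_{n_i} < |x| < 2a_{n_{i-1}}} dV_\mu(x) \\
&\le \frac{4}{a_{n_0}} \sum_{i=1}^{k} a_{n_{i-1}} \sum_{\mu=n_k}^{n_{i-1}} \int_{|x| > \frac29 a_{n_{i-1}}} dV_\mu(x) \\
&< \frac{4\varepsilon}{a_{n_0}} \sum_{i=1}^{k} a_{n_{i-1}} < 4\varepsilon \sum_{i=1}^{\infty} \Bigl(\frac12\Bigr)^{i-1} = 8\varepsilon.
\end{aligned}
\]
Denote by $N_0$ the smallest index satisfying $a_{N_0} > 2a_N$ such that $a_{n_k} < a_{N_0}$; $N_0$ is a constant which does not depend on $n$. We choose $\xi = \xi(\varepsilon)$ in such a way that
\[ \sum_{\mu=1}^{N_0} \int_{|x| \ge \xi} dV_\mu(x) < \varepsilon. \]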



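Using $|b_n| \le a_n$ and (10), we have
\[
\begin{aligned}
\frac{1}{a_n} \sum_{\mu=1}^{N_0} \Bigl| \int_{|x|<2a_n} (x - b_\mu)\,dV_\mu(x) \Bigr|
&\le \frac{1}{a_n} \sum_{\mu=1}^{N_0} \int_{|x|<\xi} |x - b_\mu|\,dV_\mu(x) + 3 \sum_{\mu=1}^{N_0} \int_{|x| \ge \xi} dV_\mu(x) \\
&< \frac{N_0 (2a_{N_0} + \xi)}{a_n} + 3\varepsilon;
\end{aligned}
\]
combining these estimates we get
\[
\begin{aligned}
\frac{1}{a_n} \sum_{\mu=1}^{n} \Bigl| \int_{|x|<2a_n} (x - b_\mu)\,dV_\mu(x) \Bigr|
&\le \frac{1}{a_n} \sum_{\mu=1}^{N_0} (\,\cdots) + \frac{1}{a_n} \sum_{\mu=n_k}^{n_0} (\,\cdots) \\
&< 11\varepsilon + \frac{N_0 (2a_{N_0} + \xi)}{a_n}.
\end{aligned}
\]
Since $\xi$ and $N_0$ depend only on $\varepsilon$, but not on $n$, the right-hand side tends to $11\varepsilon$ as $n \to \infty$, hence (7), and so (6), follow.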
Secondly, we prove some kind of invariance property of the relations (1) and (2). Assume that $\{b_n\}$ is an arbitrary sequence of real numbers such that
\[ \lim \frac{b_n}{a_n} = 0, \tag{12} \]
and we set again $V_n^*(x) = V_n(x + b_n)$. We are going to show that (1) and (2) imply the analogous relations for $V_n^*(x)$, that is ¶ ¶ 536
\[ \lim \sum_{\mu=1}^n \int_{|x|>\eta a_n} dV_\mu^*(x) = 0, \tag{13} \]
\[ \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu^*(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \Bigr)^2 \Bigr\} = 1. \tag{14} \]

For the proof we note first that, due to (4) and (12), we also have
\[ \lim \frac{1}{a_n} \max[|b_1|, \dots, |b_n|] = 0; \tag{15} \]
thus one can find some $N = N(\eta)$ such that for $n > N$ and $\mu = 1, 2, \dots, n$ one always has
\[ \frac{|b_\mu|}{a_n} < \frac{\eta}{2}. \]
But then one has for $n > N$
\[ \sum_{\mu=1}^n \int_{|x|>\eta a_n} dV_\mu^*(x) \le \sum_{\mu=1}^n \int_{|x|>\frac12 \eta a_n} dV_\mu(x) \]



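which proves (13). Moreover,
\[
\begin{aligned}
&\sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu^*(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \Bigr)^2 \Bigr\} \\
&\qquad = \sum_{\mu=1}^n \Bigl\{ \int_{-\eta a_n + b_\mu}^{\eta a_n + b_\mu} (x - b_\mu)^2\,dV_\mu(x) - \Bigl( \int_{-\eta a_n + b_\mu}^{\eta a_n + b_\mu} (x - b_\mu)\,dV_\mu(x) \Bigr)^2 \Bigr\},
\end{aligned}
\]
and a simple calculation gives, by (1),
\[
\begin{aligned}
&\lim \Bigl| \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu^*(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \Bigr)^2 \Bigr\} - 1 \Bigr| \\
&\qquad = \lim \Bigl| \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} (x - b_\mu)^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} (x - b_\mu)\,dV_\mu(x) \Bigr)^2 \Bigr\} - 1 \Bigr| \\
&\qquad = \lim \Bigl| \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ -2b_\mu \int_{|x|<\eta a_n} x\,dV_\mu(x) + b_\mu^2 \int_{|x|<\eta a_n} dV_\mu(x) \Bigr\} \cdot \int_{|x| \ge \eta a_n} dV_\mu(x) \Bigr| \\
&\qquad \le 3\eta^2 \lim \sum_{\mu=1}^n \int_{|x| \ge \eta a_n} dV_\mu(x) = 0.
\end{aligned}
\]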

This proves (14).


¶ 537 ¶ If the numbers $b_n$ are defined by (5), then condition (12) is satisfied. Indeed, using (1) one has for every $\varepsilon > 0$
\[ \lim \frac{1}{a_n} \int_{|x|<\varepsilon a_n} (x - b_n)\,dV_n(x) = 0, \]
hence $\lim \frac{|b_n|}{a_n} \le \varepsilon$. Therefore, the relations (13) and (14) hold. Additionally, one also has
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \Bigr| = 0. \tag{16} \]

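Indeed, using (6) and (1) this follows from
\[
\begin{aligned}
\lim \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \Bigr|
&= \lim \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{-\eta a_n + b_\mu}^{\eta a_n + b_\mu} (x - b_\mu)\,dV_\mu(x) \Bigr| \\
&\le \lim \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x|<\eta a_n} (x - b_\mu)\,dV_\mu(x) \Bigr| + 2\eta \lim \sum_{\mu=1}^n \int_{|x|>\frac12 \eta a_n} dV_\mu(x) = 0.
\end{aligned}
\]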



From (16) it follows directly
\[ \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \Bigr)^2 = 0 \]
leading to a simplification of (14). Combining all this we have:


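If the sequence $\{V_n(x)\}$ satisfies the relations (1) and (2) and if the numbers $b_n$ are defined by (5), then the sequence $\{V_n^*(x) = V_n(x + b_n)\}$ satisfies
\[ \lim \sum_{\mu=1}^n \int_{|x|>\eta a_n} dV_\mu^*(x) = 0, \]
\[ \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<\eta a_n} x^2\,dV_\mu^*(x) = 1, \]
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \Bigr| = 0. \]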

For later reference let us make one further remark. Because of (6) the two relations
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0 \quad\text{and}\quad \lim \frac{1}{a_n} \sum_{\mu=1}^n b_\mu \int_{|x|<\eta a_n} dV_\mu(x) = 0 \tag{17} \]
¶ are equivalent, i.e. either both hold or fail at the same time. According to ¶ 538
(15) one has, however,
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n b_\mu \int_{|x| \ge \eta a_n} dV_\mu(x) = 0. \]

Adding this equality to the second equality in (17), one obtains the

Theorem. In order that
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0 \]
holds, it is necessary and sufficient that for the numbers defined in (5)
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n b_\mu = 0 \]
holds.



§ 4. Proof that the conditions are sufficient
We have to prove that from (I)–(III) it follows that $\{V_n(x)\}$ with the normalizing factors $a_n$ belongs to $\Phi(x)$. For this we set again, using the numbers defined in § 3, (5), $V_n^*(x) = V_n(x + b_n)$. Since, by assumption, (III) holds, the last two theorems of §§ 2 and 3 show that the sequence $\{V_n(x)\}$ belongs to $\Phi(x)$ if, and only if, $\{V_n^*(x)\}$ has the same property; therefore, it is sufficient to prove the latter assertion. By § 3, p. 538, however, the conditions (I′)–(III′) hold for the sequence $\{V_n^*(x)\}$. Thus, we only have to prove the following:
If (I′), (II′) and (III′) (p. 525) hold, the relations (α) and (β), § 2, p. 530 follow. – As stated in the theorem, we use in the following proof a fixed interval $|t| < T$.

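a) First we verify (β). For this we choose for any given $\varepsilon > 0$ some $\delta = \delta(\varepsilon, T)$ such that for $|x| < \delta a_n$ and $|t| < T$
\[ 1 - \cos\frac{xt}{a_n} < \frac{\varepsilon}{2}, \qquad \Bigl| \sin\frac{xt}{a_n} \Bigr| < \frac{\varepsilon}{2} \]
holds; moreover, we pick $N = N(\varepsilon)$ in such a way that for $n > N$ and the above $\delta$
\[ \sum_{\mu=1}^n \int_{|x|>\delta a_n} dV_\mu(x) < \frac{\varepsilon}{4} \]
¶ 539 ¶ holds; this is possible because of (I′). Then we have
\[ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) = \int_{|x| \le \delta a_n} + \int_{|x|>\delta a_n} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} \int_{|x| \le \delta a_n} dV_\mu(x) \le \varepsilon, \]
and this is the first inequality (β). The analogous inequality for the sine follows either in the same way, or directly by an application of Schwarz' inequality.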

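b) Using the notation of a) we have, because of
\[ \Bigl| 1 - e^{\frac{ixt}{a_n}} + \frac{ixt}{a_n} \Bigr| \le \frac12 \frac{x^2 t^2}{a_n^2}, \]
that
\[
\begin{aligned}
\sum_{\mu=1}^n \Bigl| \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) \Bigr|
&\le \sum_{\mu=1}^n \Bigl| \int_{|x| \le \delta a_n} \Bigr| + \sum_{\mu=1}^n \Bigl| \int_{|x|>\delta a_n} \Bigr| \\
&< \varepsilon + \frac{T}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x| \le \delta a_n} x\,dV_\mu(x) \Bigr| + \frac{T^2}{2a_n^2} \sum_{\mu=1}^n \int_{|x|<\delta a_n} x^2\,dV_\mu(x),
\end{aligned}
\]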



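which implies that there is a bound $M = M(T)$ such that
\[ \sum_{\mu=1}^n \Bigl| 1 - v_\mu\Bigl(\frac{t}{a_n}\Bigr) \Bigr| = \sum_{\mu=1}^n \Bigl| \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) \Bigr| < M. \tag{1} \]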

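c) From this point onwards, the proof basically does not differ from P. Lévy's proof of Lindeberg's theorem.
Since by (β) all characteristic functions $v_\mu\bigl(\frac{t}{a_n}\bigr)$ are uniformly close to 1 for sufficiently large $n$, we can take logarithms in (α), if we agree to use the principal branch. Then, (α) is equivalent to (always uniformly in $|t| < T$)
\[ \lim \sum_{\mu=1}^n \log v_\mu\Bigl(\frac{t}{a_n}\Bigr) = -\frac{t^2}{2}; \tag{2} \]
using Landau's symbols we have
\[ \log v_\mu\Bigl(\frac{t}{a_n}\Bigr) = -\Bigl\{ 1 - v_\mu\Bigl(\frac{t}{a_n}\Bigr) \Bigr\} + o\Bigl( 1 - v_\mu\Bigl(\frac{t}{a_n}\Bigr) \Bigr), \tag{3} \]
and so, by (1),
\[ \sum_{\mu=1}^n \log v_\mu\Bigl(\frac{t}{a_n}\Bigr) \sim -\sum_{\mu=1}^n \Bigl\{ 1 - v_\mu\Bigl(\frac{t}{a_n}\Bigr) \Bigr\}, \tag{4} \]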

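¶ where the symbol ∼ indicates that the difference of both sides uniformly ¶ 540
tends to zero. All that remains is to prove the following relation which is equivalent to (2):
\[ \lim \sum_{\mu=1}^n \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) = \frac{t^2}{2}. \tag{5} \]
According to (I′)–(III′) one can find for every $\varepsilon > 0$ some $N$ such that for $n > N$
\[ \sum_{\mu=1}^n \int_{|x| \ge \varepsilon a_n} dV_\mu(x) < \frac{\varepsilon}{2}, \]
\[ \Bigl| \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<\varepsilon a_n} x^2\,dV_\mu(x) - 1 \Bigr| < \varepsilon, \]
\[ \frac{1}{a_n} \sum_{\mu=1}^n \Bigl| \int_{|x|<\varepsilon a_n} x\,dV_\mu(x) \Bigr| < \varepsilon. \]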



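But then one has
\[
\begin{aligned}
&\Bigl| \sum_{\mu=1}^n \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) - \frac{t^2}{2} \Bigr| \\
&\qquad < \varepsilon + \Bigl| \sum_{\mu=1}^n \int_{|x|<\varepsilon a_n} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) - \frac{t^2}{2} \Bigr| \\
&\qquad < \varepsilon + \sum_{\mu=1}^n \Bigl| \int_{|x|<\varepsilon a_n} \frac{xt}{a_n}\,dV_\mu(x) \Bigr| + \frac{t^2}{2} \Bigl| \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<\varepsilon a_n} x^2\,dV_\mu(x) - 1 \Bigr| + \frac{|t|^3}{3!} \frac{1}{a_n^3} \sum_{\mu=1}^n \int_{|x|<\varepsilon a_n} |x|^3\,dV_\mu(x) \\
&\qquad < \varepsilon (1 + T + T^2) + \frac{\varepsilon T^3}{3!} \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<\varepsilon a_n} x^2\,dV_\mu(x) \\
&\qquad < \varepsilon (1 + T + T^2 + T^3 + \varepsilon T^3).
\end{aligned}
\]
This proves (5).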


Incidentally, the proof shows a bit more. If one replaces any subsequence $\{V_{n_k}(x)\}$ by $\{1 - V_{n_k}(-x)\}$, then the conditions (I′), (II′) and (III′) remain valid for the new sequence. Therefore, these conditions are even sufficient for the sequence
¶ 541 $\{V_n(x)\}$ to belong properly to $\Phi(x)$. ¶

§ 5. On the Lindeberg condition


As a particular case of the result of § 4 one obtains Lindeberg's theorem. This can be stated in the following way:
Assume that the distribution functions $V_n(x)$ have vanishing first and finite second moments:
\[ \int_{-\infty}^{+\infty} x\,dV_\mu(x) = 0, \qquad \int_{-\infty}^{+\infty} x^2\,dV_\mu(x) = \sigma_\mu^2, \tag{1} \]
and set
\[ s_n^2 = \sum_{\mu=1}^n \sigma_\mu^2. \tag{2} \]
If for every $\eta > 0$
\[ \lim \frac{1}{s_n^2} \sum_{\mu=1}^n \int_{|x|<\eta s_n} x^2\,dV_\mu(x) = 1 \tag{3} \]
holds, then $\{V_n(x)\}$ with the normalizing factors $s_n$ belongs properly to $\Phi(x)$.



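In fact, (3) coincides with (II′) for $a_n = s_n$; moreover, one has
\[
\begin{aligned}
\sum_{\mu=1}^n \int_{|x| \ge \eta s_n} dV_\mu(x)
&\le \frac{1}{\eta^2 s_n^2} \sum_{\mu=1}^n \int_{|x| \ge \eta s_n} x^2\,dV_\mu(x) \\
&= \frac{1}{\eta^2} \Bigl\{ 1 - \frac{1}{s_n^2} \sum_{\mu=1}^n \int_{|x|<\eta s_n} x^2\,dV_\mu(x) \Bigr\},
\end{aligned}
\]
from which (I′) follows; (III′) is obtained in the same way.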


At the same time, our theorem shows that condition (3), even if one assumes (1), is not necessary. There are various direct generalizations of it, the perhaps simplest one being
Assume that (1) holds and that for every $\eta > 0$
\[ \lim \sum_{\mu=1}^n \int_{|x|>\eta s_n} dV_\mu(x) = 0. \]
If the sequence
\[ \alpha_n^2 = \frac{1}{s_n^2} \sum_{\mu=1}^n \int_{|x|<s_n} x^2\,dV_\mu(x) \]
has a positive lower bound, then the sequence $\{V_n(x)\}$ with the normalizing factors $a_n = \alpha_n s_n$ belongs properly to $\Phi(x)$. The proof that (I′)–(III′) are indeed satisfied, is obvious.
¶ It is, however, easy to show that the Lindeberg condition (3) is necessary ¶ 542
for the fact that $\{V_n(x)\}$ with the normalizing factors $s_n$ belongs to $\Phi(x)$. This assertion is, of course, contained in the proof of the next section. That proof, however, is rather complicated while the proof of the particular case considered here is immediate. On the other hand, it is interesting to note that Lindeberg's theorem completely covers the practically only relevant normalization (2); therefore we include its proof.
Thus, assume that $\{V_n(x)\}$ with the normalizing factors $s_n$ belongs to $\Phi(x)$, i.e. § 2, (α) and (β) hold with $a_n = s_n$. Then one has by (1)
\[ \sum_{\mu=1}^n \Bigl| 1 - v_\mu\Bigl(\frac{t}{s_n}\Bigr) \Bigr| = \sum_{\mu=1}^n \Bigl| \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{s_n}} \Bigr) dV_\mu(x) \Bigr| < \frac{t^2}{2}. \tag{4} \]
Moreover, because of (β), $v_\mu\bigl(\frac{t}{s_n}\bigr)$ tends to 1 uniformly, which means that we may take, as in § 4, (2)–(5), logarithms in (α) and conclude that, uniformly for $|t| < T$,
\[ \sum_{\mu=1}^n \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{s_n}} \Bigr) dV_\mu(x) \to \frac{t^2}{2}; \]
therefore, we have
\[ \lim \Bigl| \frac{t^2}{2} - \sum_{\mu=1}^n \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{s_n} \Bigr) dV_\mu(x) \Bigr| = 0. \tag{5} \]


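By the easily established Chebyshev inequality one has for every $\eta > 0$
\[ \sum_{\mu=1}^n \int_{|x| \ge \eta s_n} dV_\mu(x) \le \frac{1}{\eta^2}, \]
and so it follows from (5) that
\[ \lim \Bigl| \frac{t^2}{2} - \sum_{\mu=1}^n \int_{|x|<\eta s_n} \Bigl( 1 - \cos\frac{xt}{s_n} \Bigr) dV_\mu(x) \Bigr| \le \frac{2}{\eta^2}. \tag{6} \]
On the other hand, we have
\[ \sum_{\mu=1}^n \int_{|x|<\eta s_n} \Bigl( 1 - \cos\frac{xt}{s_n} \Bigr) dV_\mu(x) \le \frac{t^2}{2} \cdot \frac{1}{s_n^2} \sum_{\mu=1}^n \int_{|x|<\eta s_n} x^2\,dV_\mu(x) \le \frac{t^2}{2}, \]
¶ 543 ¶ and inserting this into (6) yields
\[ 0 \le \lim \frac{t^2}{2} \Bigl\{ 1 - \frac{1}{s_n^2} \sum_{\mu=1}^n \int_{|x|<\eta s_n} x^2\,dV_\mu(x) \Bigr\} \le \frac{2}{\eta^2}. \]
Since this inequality holds for all $t$, we conclude that the expression inside the curly braces converges to 0, q.e.d.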

§ 6. Proof that the conditions are necessary


Now we assume that the conditions § 2, (α) and (β) are satisfied, and we want to infer that the conditions (I), (II) and (III) hold.
Compared to the arguments in §§ 4 and 5, the following proof is relatively involved, which is due to the fact that we cannot restrict ourselves to the linear terms in the Taylor expansion of $\log v_\mu\bigl(\frac{t}{a_n}\bigr)$. In fact, there are sequences belonging to $\Phi(x)$ such that not even the sum of the absolute values of the quadratic terms remains bounded (cf. § 8, no. e). Our next aim is to simplify the considerations and to reduce everything to sequences where we need at most quadratic terms in the Taylor development; this can be achieved by a transformation of the form $V_n^*(x) = V_n(x + b_n)$. One can show that this is the case if, and only if,
\[ \sum_{\mu=1}^n \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) \]
stays bounded in every finite interval.


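First we prove that for every $\eta > 0$ one has
\[ \lim \int_{|x|>\eta a_n} dV_n(x) = 0. \tag{1} \]
– By § 2, (β) there is some $N = N(\varepsilon)$ such that for $|t| < \frac{2}{\eta}$ and $n > N$
\[ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_n(x) < \varepsilon. \]
Then we also have
\[ \int_{|x|>\eta a_n} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_n(x) < \varepsilon \]
¶ and integrating this in $t$ over $0 < t < T = \frac{2}{\eta}$ yields ¶ 544
\[
\begin{aligned}
\varepsilon T &\ge \int_{|x|>\eta a_n} \Bigl( T - \frac{a_n}{x} \sin\frac{xT}{a_n} \Bigr) dV_n(x) \\
&\ge \Bigl( T - \frac{1}{\eta} \Bigr) \int_{|x|>\eta a_n} dV_n(x) = \frac{T}{2} \int_{|x|>\eta a_n} dV_n(x).
\end{aligned}
\]
This proves (1).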
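The integration step rests on the closed form $\int_0^T (1 - \cos\frac{xt}{a})\,dt = T - \frac{a}{x}\sin\frac{xT}{a}$ and on the bound $\frac{a}{|x|} < \frac{1}{\eta}$ for $|x| > \eta a$. The sketch below checks both numerically; the parameter values and function names are arbitrary choices of ours, made only for illustration.

```python
import math

def integrated_kernel(x, a, T):
    """Closed form of the integral of 1 - cos(x t / a) over 0 < t < T."""
    return T - (a / x) * math.sin(x * T / a)

def midpoint_integral(x, a, T, steps=10_000):
    """Midpoint-rule approximation of the same integral, used as a cross-check."""
    h = T / steps
    return h * sum(1.0 - math.cos(x * (k + 0.5) * h / a) for k in range(steps))

eta, a = 0.5, 3.0            # arbitrary illustrative values
T = 2.0 / eta
# Sample points on both sides of the origin with |x| > eta * a.
xs = [sign * eta * a * (1.0 + 0.01 * k) for k in range(1, 2000) for sign in (1.0, -1.0)]
worst = min(integrated_kernel(x, a, T) for x in xs)
print(worst, T / 2)
```

The smallest value of the integrated kernel over the sampled points stays above $T/2$, which is exactly the factor appearing in the last line of the estimate.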


Now pick $b_n$ such that for $V_n^*(x) = V_n(x + b_n)$
\[ \lim_{\xi \to 0-} \int_{-\infty}^{\xi} dV_\mu^*(x) \le 1 - \frac12, \qquad \lim_{\xi \to 0+} \int_{\xi}^{\infty} dV_\mu^*(x) \le \frac12 \]
holds. Obviously, this can be done in exactly one way. Note that the following arguments remain valid if we replace on the right-hand sides $\frac12$ by any real number $q < 1$: This changes in all calculations only some not really important constants.d
From (1) one directly obtains that
\[ \frac{b_n}{a_n} \to 0. \tag{2} \]
an
Therefore, cf. § 3, p. 535, the sequence {Vn (x)} satisfies the conditions (I) and
(II) if, and only if, the corresponding relations are valid for {Vn∗ (x)}, and we
will show that this is indeed the case. Of course, the sequence {Vn∗ (x)} need
not belong to Φ(x). For the corresponding characteristic functions, however,
one has

\[ v_\mu^*(t) = v_\mu(t)\, e^{-i b_\mu t}; \tag{3} \]
since the coefficients $a_n$ satisfy the inequality § 2, (2), p. 531, it follows from (2), again, that
\[ \lim \frac{1}{a_n} \max[|b_1|, \dots, |b_n|] = 0. \]
Thus, (3) entails that, together with $v_\mu\bigl(\frac{t}{a_n}\bigr) \to 1$, the term $v_\mu^*\bigl(\frac{t}{a_n}\bigr)$ converges
d See the footnote to § 1, (6).



to 1 uniformly in every finite interval. Moreover,
\[ \Bigl| w_n^*\Bigl(\frac{t}{a_n}\Bigr) \Bigr| = \Bigl| w_n\Bigl(\frac{t}{a_n}\Bigr) \Bigr| \to e^{-\frac12 t^2}. \]
Both properties of the sequence {Vn∗ (x)} are a consequence of the fact that
{Vn (x)} belongs to Φ(x). If it is possible to prove that (I) and (II) are true
for {Vn∗ (x)}, then this proves the same for {Vn (x)}.
¶ 545 ¶ In order to prove that the conditions (I) and (II) are necessary for $\{V_n(x)\}$ to belong to $\Phi(x)$, it is enough to verify that (I) and (II) hold for every sequence $\{V_n(x)\}$ satisfying
\[ \Bigl| \prod_{\mu=1}^n v_\mu\Bigl(\frac{t}{a_n}\Bigr) \Bigr| \to e^{-\frac12 t^2}, \tag{4} \]
\[ v_\mu\Bigl(\frac{t}{a_n}\Bigr) \to 1, \tag{5} \]
\[ \lim_{\xi \to 0-} \int_{-\infty}^{\xi} dV_\mu(x) \le \frac12, \qquad \lim_{\xi \to 0+} \int_{\xi}^{\infty} dV_\mu(x) \le \frac12. \tag{6} \]
The limits (4) and (5) are, of course, meant to be uniform in the sense of § 2, (α) and (β).
First we prove that, by (6), for $|t| < T$, $\mu = 1, 2, \dots, n$ and sufficiently large $n$ we have
\[ \Bigl( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 < \frac23 \int_{-\infty}^{+\infty} \sin^2\frac{xt}{a_n}\,dV_\mu(x). \tag{7} \]
For this we set $\eta = \frac{\pi}{2T}$ and for any fixed $0 < t < T$ we denote by $J'$ the set of all points $x$ for which $\sin\frac{xt}{a_n} > 0$, and by $J''$ the set for which $\sin\frac{xt}{a_n} < 0$. Then $J'$ contains the interval $(0, \eta a_n)$, $J''$ the interval $(0, -\eta a_n)$. Because of (5), we have (1), and for sufficiently large values of $n$
\[ \int_{|x|>\eta a_n} dV_\mu(x) < \frac16, \]
hence,
\[ \int_{J'} dV_\mu(x) < \frac23 \quad\text{and}\quad \int_{J''} dV_\mu(x) < \frac23. \]
Thus, one has for $J = J'$ or $J = J''$, respectively, applying the Schwarz inequality to $\sin\frac{xt}{a_n}$ and 1
\[
\begin{aligned}
\Bigl( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2
&< \Bigl( \int_{J} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 < \frac23 \int_{J} \sin^2\frac{xt}{a_n}\,dV_\mu(x) \\
&\le \frac23 \int_{-\infty}^{+\infty} \sin^2\frac{xt}{a_n}\,dV_\mu(x),
\end{aligned}
\]
and this yields (7).
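¶ If we use throughout the principal branch of the logarithm, we get from ¶ 546
(4) and (5)
\[ \Re \sum_{\mu=1}^n \log v_\mu\Bigl(\frac{t}{a_n}\Bigr) \to -\frac{t^2}{2}. \tag{8} \]
Now we have
\[
\begin{aligned}
\log v_\mu\Bigl(\frac{t}{a_n}\Bigr) = {}& -\int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) - \frac12 \Bigl( \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) \Bigr)^2 \\
& + o\Bigl( \Bigl| \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) \Bigr|^2 \Bigr).
\end{aligned}
\tag{9}
\]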

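As soon as $n$ is so large that (7) is satisfied and, by (5),
\[ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) < \frac16 \qquad (\mu = 1, \dots, n) \]
holds, then it follows that
\[ \Bigl| \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) \Bigr|^2 < \frac32 \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x). \tag{10} \]
According to (9) one has for sufficiently large $n$
\[ -\Re \log v_\mu\Bigl(\frac{t}{a_n}\Bigr) > \frac15 \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x). \]
Thus we get because of the uniformity in (8) that for $|t| < T$ and all $n$
\[ \sum_{\mu=1}^n \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) < M \tag{11} \]
where, of course, $M = M(T)$.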
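This yields, by (10), that the sum of the absolute values of the quadratic terms in (9) for $\mu = 1, \dots, n$ stays bounded, and so we have
\[ \sum_{\mu=1}^n \log v_\mu\Bigl(\frac{t}{a_n}\Bigr) \sim -\sum_{\mu=1}^n \Bigl\{ \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) + \frac12 \Bigl( \int_{-\infty}^{+\infty} \Bigl( 1 - e^{\frac{ixt}{a_n}} \Bigr) dV_\mu(x) \Bigr)^2 \Bigr\}, \tag{12} \]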



¶ 547 ¶ where the symbol ∼ indicates once again that the difference of both sides tends uniformly to 0. Furthermore, by (5) it follows from (11) that
\[ \sum_{\mu=1}^n \Bigl( \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) \Bigr)^2 \to 0. \]
Thus (8) and (12) finally imply that, uniformly in $|t| < T$,
\[ \lim \sum_{\mu=1}^n \Bigl\{ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) - \frac12 \Bigl( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 \Bigr\} = \frac{t^2}{2}. \tag{13} \]
This is the fundamental relation from which we will derive the necessity of (I) and (II).
To simplify (13) even further, we need two more consequences of (11). First, according to (11) one has, a fortiori, for every $\eta > 0$
\[ \sum_{\mu=1}^n \int_{|x|>\eta a_n} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) < M. \]
Integrating this inequality in $t$ over the interval
\[ 0 < t < T, \qquad T \ge \frac{2}{\eta} \]
we obtain (cf. the proof of (1))
\[ \Bigl( T - \frac{1}{\eta} \Bigr) \sum_{\mu=1}^n \int_{|x|>\eta a_n} dV_\mu(x) < MT. \]
Thus, for every $\eta > 0$ there is a constant $M' = M'(\eta)$ such that
\[ \sum_{\mu=1}^n \int_{|x|>\eta a_n} dV_\mu(x) < M'. \tag{14} \]

For any given $\eta$ we pick $\tau = \tau(\eta)$ such that for $|x| < \eta a_n$
\[ 1 - \cos\frac{x\tau}{a_n} \ge \frac14 \cdot \frac{x^2 \tau^2}{a_n^2}. \]
Again we have by (11)
\[ M > \sum_{\mu=1}^n \int_{|x|<\eta a_n} \Bigl( 1 - \cos\frac{x\tau}{a_n} \Bigr) dV_\mu(x) \ge \frac{\tau^2}{4} \cdot \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<\eta a_n} x^2\,dV_\mu(x). \]
This gives that also
\[ \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<\eta a_n} x^2\,dV_\mu(x) < K \tag{15} \]
is bounded, where $K = K(\eta)$.
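¶ Using these relations it is possible to simplify (13) considerably: First of ¶ 548
all we can get rid of the sine. We have
\[
\begin{aligned}
&\sum_{\mu=1}^n \Bigl( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 - \sum_{\mu=1}^n \Bigl( \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 \\
&\qquad = \sum_{\mu=1}^n \Bigl\{ \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) + \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr\} \int_{|x| \ge \eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x).
\end{aligned}
\tag{16}
\]
By the Schwarz inequality,
\[
\begin{aligned}
\Bigl( \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2
&\le \int_{|x|<\eta a_n} \sin^2\frac{xt}{a_n}\,dV_\mu(x) \\
&\le 2 \int_{|x|<\eta a_n} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) \\
&\le 2 \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x);
\end{aligned}
\tag{17}
\]
by (5) (= § 2, (β)) the expression inside the curly braces on the right-hand side of (16) tends uniformly to 0, and we find for sufficiently large $n$
\[
\Bigl| \sum_{\mu=1}^n \Bigl( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 - \sum_{\mu=1}^n \Bigl( \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 \Bigr|
< \varepsilon \sum_{\mu=1}^n \int_{|x| \ge \eta a_n} dV_\mu(x) < \varepsilon M'.
\]
Thus, it follows from (13) that also
\[ \lim \sum_{\mu=1}^n \Bigl\{ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) - \frac12 \Bigl( \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \Bigr)^2 \Bigr\} = \frac{t^2}{2}. \tag{18} \]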

Moreover, we have because of (1) for sufficiently large values of n and μ =





1, . . . , n that dVμ (x) < ,g and so
|x|> a n 2
4T

 n   2 
 2
n
 xt xt 
 sin dVμ (x) − dVμ (x) 
 |x|<ηan
an |x|<ηa n
an 
μ=1 μ=1
 
 n   
 xt xt xt xt 
= sin + dVμ (x) sin − dVμ (x)
 an an an an 
μ=1 |x|<ηan |x|<ηan

n 
x2 t2
≤ (1 + η|t|) 2
dVμ (x) < (1 + η|t|)t2 · K · ;
|x|<ηan an
μ=1

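¶ 549 ¶ further, we get from (18)
\[ \lim \sum_{\mu=1}^n \Bigl\{ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) - \frac{t^2}{2} \cdot \frac{1}{a_n^2} \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} = \frac{t^2}{2}. \tag{19} \]
Now we pick $\tau = \tau(\varepsilon, \eta)$ so small that for $|t| < \tau$ and $|x| < \eta a_n$ we have
\[ 1 - \cos\frac{xt}{a_n} \ge (1 - \varepsilon) \frac{x^2 t^2}{2 a_n^2}. \tag{20} \]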

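For $|t| < \tau$ we find
\[
\begin{aligned}
&\sum_{\mu=1}^n \Bigl\{ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) - \frac{t^2}{2a_n^2} \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \\
&\qquad \ge \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) - \frac{t^2}{2a_n^2} \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \\
&\qquad \ge \frac{t^2}{2a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} - \frac{t^2}{2} \cdot K \cdot \varepsilon.
\end{aligned}
\tag{21}
\]
Inserting this into (19) yields
\[ \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \le 1. \tag{22} \]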

g Corrected according to Footnote 1 in [Feller 1937b].



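Using (14) we obtain an upper estimate of the left-hand side of (19):
\[
\begin{aligned}
&\sum_{\mu=1}^n \Bigl\{ \int_{-\infty}^{+\infty} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) - \frac{t^2}{2a_n^2} \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \\
&\qquad \le 2M' + \frac{t^2}{2a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\}.
\end{aligned}
\]
Inserting this into (19) yields, if we use the shorthand
\[ A^2 = \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\}, \]
that
\[ \frac{t^2}{2} (1 - A^2) \le 2M'. \]
By (22), the left-hand side is non-negative and, since the estimate holds for all $t$, we have $A = 1$. Together with (22) this gives condition (II), which shows that it is necessary.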
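¶ The necessity of condition (I) is an immediate consequence. For the ¶ 550
given interval $|t| < T$ we find some $\bar\eta = \bar\eta(T, \varepsilon)$ such that for $|x| < \bar\eta a_n$ (20) holds. Using (II), the estimate (21) gives for $\eta < \bar\eta$
\[ \lim \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) - \frac{t^2}{2a_n^2} \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \ge \frac{t^2}{2} (1 - \varepsilon K), \]
and from (19) it follows for $\eta < \bar\eta$ that
\[ \lim \sum_{\mu=1}^n \int_{|x| \ge \eta a_n} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) \le \varepsilon. \]
Since the left-hand side decreases monotonically as $\eta$ grows, this estimate holds for every $\eta > 0$; thus, the sum on the left-hand side converges to 0, and the limit is, of course, again uniform. So, one has for $n > N(\varepsilon, T)$
\[ \sum_{\mu=1}^n \int_{|x|>\eta a_n} \Bigl( 1 - \cos\frac{xt}{a_n} \Bigr) dV_\mu(x) < \varepsilon \]
and, by integration, one concludes from this that
\[ \Bigl( T - \frac{1}{\eta} \Bigr) \sum_{\mu=1}^n \int_{|x|>\eta a_n} dV_\mu(x) < \varepsilon T. \]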



This proves the necessity of (I).
All that remains is to show that (III) holds. For this one must not fix the coordinate origins by (6) but one has to consider the general case. The inequality (15) which would prove the claim at once need not be satisfied. – We know, however, that for the sequence $\{V_n(x)\}$ (I) and (II) hold. If we define the numbers $b_n$ by § 3, (5), then we know according to p. 537 that (I′), (II′) and (III′) hold for the sequence $\{V_n^*(x) = V_n(x + b_n)\}$. By § 4 the sequence $\{V_n^*(x)\}$ belongs to $\Phi(x)$, and by § 2, p. 533, $\{V_n(x)\}$ belongs to $\Phi(x)$ if, and only if,
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n b_\mu = 0. \]

¶ 551 ¶ By § 3, p. 538 this is true if, and only if,
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0. \tag{23} \]
Since, by assumption, $\{V_n(x)\}$ belongs to $\Phi(x)$, (23) holds, and this proves the necessity of (III).
It is a direct consequence of the just established necessity of the conditions (I)–(III) that (I′)–(III′) are necessary for the fact that $\{V_n(x)\}$ belongs properly to $\Phi(x)$.
Indeed, by definition it is necessary that (I), (II) and the condition
\[ \lim \frac{1}{a_n} \sum_{\mu=1}^n i_\mu \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0 \]
hold where the $i_\mu$ may attain, independently of each other, the values ±1. This is just (III′). By (I) one has uniformly in $\mu = 1, \dots, n$
\[ \lim \frac{1}{a_n} \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0 \]
and, therefore, (I) and (III′) together yield
\[ \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 = 0. \]
The condition (II) simplifies, because of (III′), to (II′) and this proves its necessity.

§ 7. Proof of the criterion


Now we have to provide a method to decide if for a given sequence $\{V_n(x)\}$ there are two sequences of real numbers $\{a_n\}$ and $\{b_n\}$ such that the sequence $\{V_n(x + b_n)\}$ with the normalizing factors $a_n$ belongs to $\Phi(x)$. The previous results reduce this problem to the question whether it is possible to determine $\{a_n\}$ and $\{b_n\}$ such that the conditions (I) and (II) are satisfied. If so, we may, cf. § 3, (5), afterwards shift the origins in such a way that even (I′)–(III′) hold, i.e. that the sequence belongs properly to $\Phi(x)$.
Since there are no restrictions in the choice of suitable $b_n$, it is no restriction of generality if we fix the origins of the $V_n(x)$ by § 6, (6). Then, we have ¶
by § 3, p. 535 (using § 6, (2), p. 544): If there exists at all a sequence $\{b_n\}$ ¶ 552
such that the conditions (I) and (II) hold for the sequence $\{V_n(x + b_n)\}$, then they already hold for the sequence $\{V_n(x)\}$. The proof of the criterion from § 1, p. 527 will be completed once the following theorem is established:
Assume that the origins of the $V_\mu(x)$ are fixed by § 6, (6) and that the numbers $p_n(\delta)$ are defined as in § 1, p. 526. For the existence of a sequence of real numbers $\{a_n\}$ such that the conditions (I) and (II) hold it is necessary and sufficient that for every $\delta > 0$
\[ \lim \frac{1}{p_n^2(\delta)} \sum_{\mu=1}^n \int_{|x|<p_n(\delta)} x^2\,dV_\mu(x) = \infty. \tag{1} \]
If this is the case, then there is a sequence $\delta_n \to 0$ such that
\[ \lim \frac{1}{p_n^2(\delta_n)} \sum_{\mu=1}^n \int_{|x|<p_n(\delta_n)} x^2\,dV_\mu(x) = \infty; \tag{2} \]
setting
\[ a_n^2 = \sum_{\mu=1}^n \Bigl\{ \int_{|x| \le p_n(\delta_n)} x^2\,dV_\mu(x) - \Bigl( \int_{|x| \le p_n(\delta_n)} x\,dV_\mu(x) \Bigr)^2 \Bigr\}, \tag{3} \]
then the sequence $\{a_n\}$, together with the distribution functions $V_n(x)$, satisfies the conditions (I) and (II).

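Proof. a) Assume that (I) and (II) hold. By (I) one has for every $\delta > 0$
\[ \lim \frac{p_n(\delta)}{a_n} = 0, \tag{4} \]
and by (II)
\[ \lim \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<a_n} x^2\,dV_\mu(x) \ge 1. \tag{5} \]
If $n$ is large enough to guarantee that $p_n(\delta) < a_n$, one has
\[
\begin{aligned}
\frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x|<a_n} x^2\,dV_\mu(x)
&= \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x| \le p_n(\delta)} x^2\,dV_\mu(x) + \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{p_n(\delta)<|x|<a_n} x^2\,dV_\mu(x) \\
&\le \frac{1}{a_n^2} \sum_{\mu=1}^n \int_{|x| \le p_n(\delta)} x^2\,dV_\mu(x) + \delta.
\end{aligned}
\]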



If δ < 1, then the condition (1) follows immediately because of (4) and (5).
On the other hand, as soon as (1) holds for some particular δ, then it holds
for all larger values of δ. This proves the necessity of (1) for all positive δ.
¶ 553 ¶ b) The sufficiency of the criterion is contained in the following, more general theorem, which is much more amenable for practical applications:
If the sequence of real numbers $\{q_n\}$ satisfies either the two relations (7) or the relations (8) of § 1, p. 527, then the sequence $\{a_n\}$ defined by
\[ a_n^2 = \sum_{\mu=1}^n \Bigl\{ \int_{|x|<q_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<q_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \]
satisfies, together with the distribution functions $\{V_n(x)\}$, the conditions (I) and (II).
From the assumption on the position of the origins we get, by the Schwarz inequality,
\[ \Bigl( \int_{|x|<q_n} x\,dV_\mu(x) \Bigr)^2 \le \frac12 \int_{|x|<q_n} x^2\,dV_\mu(x), \]
and so,
\[ a_n^2 \ge \frac12 \sum_{\mu=1}^n \int_{|x|<q_n} x^2\,dV_\mu(x). \]
If we assume (7), then $\lim \frac{q_n}{a_n} = 0$, and if (8) holds, then $\lim \frac{q_n}{a_n}$ is finite. Because of the definition of $q_n$ and $a_n$, the condition (I) is satisfied.
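Moreover, if we assume (7), one has for all sufficiently large $n$ such that $q_n < \eta a_n$
\[
\begin{aligned}
&\Bigl| \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} - 1 \Bigr| \\
&\qquad = \Bigl| \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<\eta a_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \\
&\qquad\qquad - \frac{1}{a_n^2} \sum_{\mu=1}^n \Bigl\{ \int_{|x|<q_n} x^2\,dV_\mu(x) - \Bigl( \int_{|x|<q_n} x\,dV_\mu(x) \Bigr)^2 \Bigr\} \Bigr| \\
&\qquad \le 3\eta^2 \sum_{\mu=1}^n \int_{|x| \ge q_n} dV_\mu(x),
\end{aligned}
\]
and the right-hand side tends to 0, which proves (II) in this case. – If we assume (8), an analogous estimate holds, only the right-hand side becomes, say,
\[ 6 \sum_{\mu=1}^n \int_{|x| \ge \sigma q_n} dV_\mu(x) \]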



where σ is chosen in such a way that σqn < an .
This proves the criterion as well as the additional remark in § 1, p. 527. ¶ ¶ 554

§ 8. Examples
a) The case where all components are identical: Vn (x) = V (x).
The discussion above shows that we can apply the criterion if we fix the origin
in such a way that V (0) = 0 and = 1. The numbers pn (δ) are defined as the
smallest real numbers such that
\[ \int_{|x|>p_n(\delta)} dV_\mu(x) \le \frac{\delta}{n}. \]
In this context it is more natural to eliminate $n$ and to use $\frac{\delta}{n} = \zeta$ as variable.
Then the criterion reads: Assume that $Z = Z(\zeta)$ is the smallest number such that
\[ \int_{|x|>Z} dV_\mu(x) \le \zeta. \]
For the existence of a sequence $\{b_n\}$ such that $\{V_n(x) = V(x + b_n)\}$ belongs to $\Phi(x)$, it is necessary and sufficient that
\[ \lim_{\zeta \to 0} \frac{1}{\zeta Z^2} \int_{|x| \le Z} x^2\,dV(x) = \infty. \]

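Intuitively, this means that the speed of convergence of $V(x)$ as $x \to \pm\infty$ is not too slow. In particular, consider the function¹² given by
\[
V(x) =
\begin{cases}
\dfrac{1}{2|x|^s} & \text{if } x \le -1, \\[1ex]
\dfrac12 & \text{if } |x| \le 1, \\[1ex]
1 - \dfrac{1}{2x^s} & \text{if } x \ge 1,
\end{cases}
\]
where $s > 0$ is some constant. We have
\[ p_n(\delta) = \Bigl( \frac{n}{\delta} \Bigr)^{1/s}, \]
and the condition becomes
\[ s\,\delta^{\frac{2}{s}} \cdot n^{1 - \frac{2}{s}} \int_1^{(n/\delta)^{1/s}} \frac{dx}{x^{s-1}} \to \infty. \]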

From this we see: The sequence belongs for s ≥ 2 to Φ(x), whereas for s < 2 ¶
12 The assumption on the position of the origin is here and in the following examples



¶ 555 it doesn’t. In order to calculate the normalizing factors, we may set

1
δn = ,
log log n
and obtain by § 7, (3),
 (n log log n)1/s
dx
a2n = ns .
1 xs−1
Since every equivalent sequence does the same job, we see: One can use the
following normalizing factors: a_n² = s n/(s − 2) if s > 2, and a_n² = n log n if s = 2.
It is worth noticing that in the latter case a_n² grows faster than n, although
there are n identical components.
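
The dichotomy between s ≥ 2 and s < 2 can be checked numerically. The following is a minimal sketch, assuming the closed-form tail mass Z^(−s) of the example distribution; the function name `criterion_ratio` is purely illustrative and does not come from the paper. It evaluates the quantity (1/(ζZ²)) ∫_{|x|≤Z} x² dV(x) from the criterion for decreasing ζ.

```python
import math


def criterion_ratio(zeta: float, s: float) -> float:
    """Evaluate (1/(zeta*Z^2)) * integral_{|x|<=Z} x^2 dV(x) for example a)."""
    # The mass outside [-Z, Z] equals Z**(-s) for Z >= 1, so the smallest
    # admissible Z is zeta**(-1/s) (valid for 0 < zeta <= 1).
    big_z = zeta ** (-1.0 / s)
    # Truncated second moment: s * integral_1^Z x^(1-s) dx.
    if abs(s - 2.0) < 1e-12:
        second_moment = 2.0 * math.log(big_z)
    else:
        second_moment = s * (big_z ** (2.0 - s) - 1.0) / (2.0 - s)
    return second_moment / (zeta * big_z ** 2)


for s in (1.0, 2.0, 3.0):
    values = [criterion_ratio(10.0 ** (-k), s) for k in (2, 4, 6)]
    print(s, [round(v, 3) for v in values])
```

Under these assumptions the ratio stays bounded for s = 1 and grows without bound for s = 2 and s = 3, in line with the statement above.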

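b) Let V_n(x) be a step function with five jumps which have the following
sizes (c > 1):
\[
\begin{array}{ll}
\dfrac{1}{2c} & \text{for } x = \pm 1, \\[2mm]
\dfrac{1}{2} \left( 1 - \dfrac{1}{c} \right) \dfrac{1}{n^2} & \text{for } x = \pm n, \\[2mm]
1 - \dfrac{1}{c} - \left( 1 - \dfrac{1}{c} \right) \dfrac{1}{n^2} & \text{for } x = 0.
\end{array}
\]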

The first moment vanishes, the second equals 1. The usual normalization would
be s_n = √n; in fact one finds, either by an application of the general criterion,
or by § 5, p. 542, that the sequence belongs to Φ(x) with the normalizing
factors a_n = √(n/c). Lindeberg’s condition is, of course, not satisfied.
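It is instructive to use the example in order to understand how the convergence
is achieved. One has
\[
\prod_{\mu=1}^{n} v_\mu\!\left( \sqrt{\tfrac{c}{n}} \, t \right)
= \prod_{\mu=1}^{n} \left\{ 1 - \frac{1}{c} \left( 1 - \cos \sqrt{\tfrac{c}{n}} \, t \right)
- \frac{1}{\mu^2} \left( 1 - \frac{1}{c} \right) \left( 1 - \cos \mu \sqrt{\tfrac{c}{n}} \, t \right) \right\}.
\]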

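For the first expression inside the braces it is enough to consider the linear
term of the Taylor development, whereas the second expression requires a
completely different estimate. We write
\[
v_\mu\!\left( \sqrt{\tfrac{c}{n}} \, t \right) = 1 - \frac{t^2}{2n} - \varphi(t) - \psi_\mu(t),
\]
where
\[
\varphi(t) = \frac{1}{c} \left( 1 - \cos \sqrt{\tfrac{c}{n}} \, t \right) - \frac{t^2}{2n},
\qquad \text{hence} \qquad |\varphi(t)| < \frac{c t^4}{n^2}
\]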
satisfied because of symmetry; the same applies to the condition (III).



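¶ and ¶ 556
\[
0 \le \psi_\mu(t) = \frac{1}{\mu^2} \left( 1 - \frac{1}{c} \right) \left( 1 - \cos \mu \sqrt{\tfrac{c}{n}} \, t \right)
<
\begin{cases}
\dfrac{2}{n^{4/3}} & \text{if } n^{2/3} \le \mu \le n, \\[2mm]
\dfrac{c t^2}{n} & \text{if } \mu \le n^{2/3}.
\end{cases}
\]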
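Then
\[
\prod_{\mu=1}^{n} v_\mu\!\left( \sqrt{\tfrac{c}{n}} \, t \right)
= \left( 1 - \frac{t^2}{2n} \right)^{n} \prod_{\mu=1}^{n} \left( 1 - \frac{\varphi(t) + \psi_\mu(t)}{1 - \frac{t^2}{2n}} \right)
\]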

and the previous estimates show that the product on the right-hand side tends
to 1; thus, the whole expression tends to e^{−t²/2}.
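
The convergence of the product can also be observed numerically. The sketch below is an illustration only: the name `product_of_char_functions` and the choices c = 2, t = 1 are assumptions made for this example. It multiplies the characteristic functions v_μ(√(c/n) t) of example b) directly and compares the result with e^{−t²/2}.

```python
import math


def product_of_char_functions(n: int, c: float, t: float) -> float:
    """Compute prod_{mu=1}^{n} v_mu(sqrt(c/n) * t) for the five-jump example b)."""
    u = math.sqrt(c / n) * t
    first = (1.0 - math.cos(u)) / c  # contribution of the jumps at x = +-1
    log_product = 0.0
    for mu in range(1, n + 1):
        # contribution of the jumps at x = +-mu
        second = (1.0 - 1.0 / c) * (1.0 - math.cos(mu * u)) / mu ** 2
        log_product += math.log(1.0 - first - second)
    return math.exp(log_product)


target = math.exp(-0.5)  # e^{-t^2/2} at t = 1
for n in (10 ** 3, 10 ** 4, 10 ** 5):
    print(n, product_of_char_functions(n, 2.0, 1.0), target)
```

In this sketch the gap to the limit shrinks only slowly as n grows, which reflects the fact that the terms ψ_μ(t) need the separate estimate given above.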

c) Let V_μ(x) be again a step function, now with jumps of the size
\[
\begin{array}{ll}
\dfrac{1}{4} & \text{if } x = \pm 1, \\[2mm]
\dfrac{\mu - 1}{2\mu^4} & \text{if } x = \pm \mu^2, \\[2mm]
\dfrac{1}{2} - \dfrac{\mu - 1}{\mu^4} & \text{if } x = 0.
\end{array}
\]
The second moment of V_μ(x) is σ_μ² = (2μ − 1)/2; with the usual normalization
s_n² = ½ n² we obtain a sequence which converges to E(x) (Definition § 1, (3),
p. 523).
In order to apply the criterion we note that for every sequence {p_n} which
monotonically diverges to ∞ one has
\[
\sum_{\mu=1}^{n} \int_{|x|>p_n} dV_\mu(x) \to 0.
\]

Moreover, one has for such sequences
\[
\frac{1}{p_n^2} \sum_{\mu=1}^{n} \int_{|x|<p_n} x^2 \, dV_\mu(x)
\ge \frac{1}{p_n^2} \sum_{\mu=1}^{n} \int_{|x|<\frac{3}{2}} x^2 \, dV_\mu(x)
= \frac{n}{2 p_n^2}.
\]

Thus, it is enough to assume that p_n grows slower than √n to get the limit
∞. For example, one could use
\[
a_n^2 = \sum_{\mu=1}^{n} \int_{|x|<\sqrt[3]{n}} x^2 \, dV_\mu(x);
\]
this sequence is equivalent to ½ n. Thus, the sequence {V_μ(x)} with the
normalizing factors √(n/2) (properly) belongs to Φ(x).
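
The moment computations of example c) can be verified in exact arithmetic. The following is a small sketch using rational numbers; the helper names `jumps` and `second_moment` are illustrative assumptions, and the cutoff is passed as an integer to avoid rounding issues with cube roots.

```python
from fractions import Fraction


def jumps(mu: int) -> dict:
    """Jump sizes of V_mu in example c), keyed by the jump location."""
    tail = Fraction(mu - 1, 2 * mu ** 4)
    masses = {}
    pairs = ((1, Fraction(1, 4)), (-1, Fraction(1, 4)),
             (mu ** 2, tail), (-mu ** 2, tail),
             (0, Fraction(1, 2) - 2 * tail))
    for location, mass in pairs:
        # For mu = 1 the locations +-mu^2 coincide with +-1, so accumulate.
        masses[location] = masses.get(location, Fraction(0)) + mass
    return masses


def second_moment(mu: int, cutoff=None) -> Fraction:
    """Second moment of V_mu, optionally restricted to |x| < cutoff."""
    return sum((x * x * m for x, m in jumps(mu).items()
                if cutoff is None or abs(x) < cutoff), Fraction(0))


n = 1000
print(sum(second_moment(mu) for mu in range(1, n + 1)))      # s_n^2
print(sum(second_moment(mu, 10) for mu in range(1, n + 1)))  # truncated sum, cutoff 10 = n^(1/3)
```

The first sum reproduces s_n² = n²/2, while the truncated sum stays close to n/2, which is the reason why the two normalizations behave so differently.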



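¶ d) Example of a sequence which does not belong to Φ(x). Let
¶ 557 V_μ(x) be the step function with jumps given by
\[
\begin{array}{ll}
\dfrac{1}{2\mu^{2/3}} & \text{for } x = \pm \mu^{1/3}, \\[2mm]
\dfrac{\mu - 1}{\mu^2} & \text{for } x = \pm \mu, \\[2mm]
1 - \dfrac{1}{\mu^{2/3}} - 2 \, \dfrac{\mu - 1}{\mu^2} & \text{for } x = 0.
\end{array}
\]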
Then we have for 0 < η < 1
\[
\sum_{\mu=1}^{n} \int_{|x|>\eta n} dV_\mu(x) \ge 2 \sum_{\mu=[\eta n]+1}^{n} \frac{\mu - 1}{\mu^2} \sim -2 \log \eta,
\]

which means that the sequence {a_n} should grow faster than n. For any such
sequence, however, one has
\[
\frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<a_n} x^2 \, dV_\mu(x) = \frac{n^2}{a_n^2} \to 0,
\]

while the limit inferior of this quantity with the normalizing factors should
be ≥ 1. Hence, there exists no sequence of constants such that {Vn (x + bn )}
belongs to Φ(x).
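
The asymptotic relation used for the tail masses in example d) is easy to check numerically. This is a minimal sketch; the function name `tail_mass_lower_bound` is an assumption made for illustration.

```python
import math


def tail_mass_lower_bound(n: int, eta: float) -> float:
    """2 * sum_{mu=[eta*n]+1}^{n} (mu - 1)/mu^2, the lower bound in example d)."""
    start = math.floor(eta * n) + 1
    return 2.0 * sum((mu - 1) / mu ** 2 for mu in range(start, n + 1))


for eta in (0.5, 0.25):
    print(eta, tail_mass_lower_bound(100_000, eta), -2.0 * math.log(eta))
```

For large n the two printed columns agree closely, so the tail masses do not vanish unless a_n grows faster than n.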

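e) Let
\[
V_\mu(x) =
\begin{cases}
0 & \text{if } x < 1, \\[1mm]
\dfrac{1}{\mu^{2/3}} & \text{if } 1 \le x < 1 + \sqrt[3]{\mu}, \\[2mm]
1 & \text{if } 1 + \sqrt[3]{\mu} \le x,
\end{cases}
\]
and set a_n = √n. Then we have for n > (2/η)⁶
\[
\sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu(x) = 0
\]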

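and
\[
\frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<\eta a_n} x^2 \, dV_\mu(x) - \left( \int_{|x|<\eta a_n} x \, dV_\mu(x) \right)^{2} \right\}
= \frac{1}{n} \sum_{\mu=1}^{n} \left( 1 - \frac{1}{\mu^{2/3}} \right) \to 1,
\]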

which means that the conditions (I) and (II) are satisfied. On the other hand,
we have
\[
\frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} x^2 \, dV_\mu(x) > \frac{1}{n} \sum_{\mu=1}^{n} \mu^{1/3} \to \infty.
\]



The relations (15), hence (11), of § 6, pp. 547 and 546, respectively, fail in
this case, and the example shows that the complications causing the relatively
involved argument in § 6 can really occur; ¶ if we study this sequence directly, ¶ 558
it would not be enough to consider only the quadratic terms of the Taylor
expansions of the characteristic functions. Of course, (III) is not satisfied in
this case. If we set V̄_{2μ}(x) = V_μ(x) and V̄_{2μ−1}(x) = 1 − V_μ(−x), then (III) holds
for {V̄_μ(x)} while nothing else changes. – In order to derive from {V_μ(x)}
a sequence belonging to Φ(x) we have to use, according to § 3, (5),
b_n = 1 + ∛n − 1/∛n. The sequence {V_n(x + b_n)} properly belongs to Φ(x) with the
normalizing constants a_n = √n.

Appendix
Let us finally (cf. § 1, p. 524) provide an example of a sequence {Vn (x)} such
that Vn (cn x) → Φ(x), but there is no sequence {an } for which Wn (an x) →
Φ(x).
Denote by l_μ the solution of the equation Φ(l_μ) = 1 − 1/μ, and set for any
integer k > 0
\[
m_{2^k+1} = m_{2^k+2} = \cdots = m_{2^{k+1}} = \left( \sqrt{2} \right)^{2^{k+1}+1}.
\]

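Then we define
\[
V_\mu(x) =
\begin{cases}
\Phi\!\left( \dfrac{x}{(\sqrt{2})^{\mu}} \right) & \text{if } 0 \le x \le (\sqrt{2})^{\mu} \, l_\mu, \\[2mm]
\Phi(l_\mu) & \text{if } (\sqrt{2})^{\mu} \, l_\mu \le x < m_\mu, \\[1mm]
1 & \text{if } m_\mu \le x,
\end{cases}
\qquad
V_\mu(x) = 1 - V_\mu(-x) \quad \text{if } x < 0.
\]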
Then V_μ((√2)^μ · x) tends uniformly to Φ(x). By § 2, p. 532 the only possible
normalizing factors are sequences which are equivalent to a_n = (√2)^{n+1}. – For
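the characteristic function we get
\[
v_\mu(t) = \int_{-\infty}^{+\infty} e^{ixt} \, dV_\mu(x) = 1 - u_\mu(t) - \frac{2}{\mu} \left( 1 - \cos m_\mu t \right)
\]
with
\[
u_\mu(t) = \int_{|x| < (\sqrt{2})^{\mu} l_\mu} \left( 1 - e^{ixt} \right) d\Phi\!\left( \frac{x}{(\sqrt{2})^{\mu}} \right).
\]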

It is easy to see that
\[
\prod_{\mu=1}^{n} \left( 1 - u_\mu\!\left( \frac{t}{(\sqrt{2})^{n+1}} \right) \right) \to e^{-\frac{1}{2} t^2}.
\]



¶ 559 ¶ If {V_n(x)} belonged to Φ(x), then we would also have
\[
\prod_{\mu=1}^{n} v_\mu\!\left( \frac{t}{(\sqrt{2})^{n+1}} \right) \to e^{-\frac{1}{2} t^2},
\]

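hence,
\[
\prod_{\mu=1}^{n} \left\{ 1 - \frac{ \dfrac{2}{\mu} \left( 1 - \cos \dfrac{m_\mu t}{(\sqrt{2})^{n+1}} \right) }{ 1 - u_\mu\!\left( \dfrac{t}{(\sqrt{2})^{n+1}} \right) } \right\} \to 1,
\]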
or
\[
\sum_{\mu=1}^{n} \frac{1}{\mu} \left( 1 - \cos \frac{m_\mu t}{(\sqrt{2})^{n+1}} \right) \to 0.
\]

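Setting n = 2^{k+1} one has, in particular:
\[
\sum_{\mu=1}^{n} \frac{1}{\mu} \left( 1 - \cos \frac{m_\mu t}{(\sqrt{2})^{n+1}} \right)
\ge \sum_{\mu=2^k+1}^{2^{k+1}} \frac{1}{\mu} \left( 1 - \cos \frac{m_\mu t}{(\sqrt{2})^{n+1}} \right)
= \sum_{\mu=2^k+1}^{2^{k+1}} \frac{1}{\mu} \left( 1 - \cos t \right) \sim (1 - \cos t) \log 2.
\]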
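
The last asymptotic relation rests on the fact that the harmonic sum over a dyadic block tends to log 2, so the lower bound stays away from 0 whenever cos t ≠ 1, which contradicts the convergence to 0 required above. A minimal numerical sketch of this step (the function name `dyadic_block_sum` is an assumption made for illustration):

```python
import math


def dyadic_block_sum(k: int) -> float:
    """sum_{mu=2^k+1}^{2^(k+1)} 1/mu, the harmonic sum over one dyadic block."""
    return sum(1.0 / mu for mu in range(2 ** k + 1, 2 ** (k + 1) + 1))


t = 1.0
for k in (5, 10, 15):
    block = dyadic_block_sum(k)
    print(k, block, math.log(2.0), (1.0 - math.cos(t)) * block)
```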

Stockholm, April 1935.

(Received 5–May–1935)



[Feller 1936b] Acta Mathematica 66 (1936) 1–47
[Feller 1936c] Mathematische Annalen 113 (1936) 113–160
Translation of [Feller 1936c]

On the Theory of Stochastic Processes. (Existence and Uniqueness.) ¶ 113
By Willy Feller in Stockholm

Content.
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
§1. Derivation of the Functional Equations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
§2. The Fundamental Solution and the Initial Value Problem for Second-Order
Linear Parabolic Differential Equations in Two Variables. . . . . . . . . . . . 124
§3. Further Properties of the Fundamental Solution.
Continuous Stochastic Processes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
§4. Purely Discontinuous Stochastic Processes. . . . . . . . . . . . . . . . . . . . . . . . . . 144
§5. The General Mixed Case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

Introduction.
The following investigation treats random processes with one degree of free-
dom; more precisely it is about those functions F (t, x; τ, ξ) which can appear
as transition probabilities from some state x at time t to another state ≤ ξ at
time τ > t (a purely analytic characterization of these functions will be given

Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Niels Jacob and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments.

in § 1, 1). We build on the well-known paper by Kolmogoroff [7]1 where he
provides, for the first time, a systematic investigation of a general stochastic
process, thus laying the theoretical foundation for a particularly fertile branch2
of probability theory. There it has been shown for processes which move
continuously from state to state that, under certain conditions, F(t, x; τ, ξ) obeys
as a function of t, x a parabolic differential equation, while ∂F/∂ξ satisfies in τ, ξ
the adjoint equation. The principal aim of the present paper is to provide the
¶ 114 still missing existence and uniqueness theorems3 ¶ and then to treat, in the
same way, the more general processes which can change their state also by
jumps.
We begin with continuous processes 4 (§ 1, 2) where we call (more general
than usual) a process continuous, if for every δ > 0 the probability, that during
a small time-interval of length Δt the change in the state is at least δ, is of
lower order than Δt:

\[
\int_{|x-\xi|>\delta} d_\xi F(t, x; \tau, \xi) = o(\tau - t), \qquad \tau > t. \tag{1}
\]

According to Kolmogoroff it is not hard to see that F (t, x; τ, ξ) satisfies, as a


function of t, x, a parabolic differential equation. Under the usual assumptions
on its coefficients we show that the associated initial value problem has a unique
solution which satisfies both the general functional relations of a stochastic
process as well as all relations which are peculiar for the particular case under
consideration which are expressed through the coefficients. (Eqs. (5)–(7), (11)–(13)).
Remarkably, Kolmogoroff’s result that ∂F/∂ξ as a function of τ, ξ satisfies
the adjoint equation, follows in this way without further assumptions from
general theorems on differential equations; it is not even necessary to assume
the differentiability of F (t, x; τ, ξ) in ξ, neither the existence of any moments.
Then we consider the case that the states may change also discontinuously.
This leads to a partial integro-differential equation based on Stieltjes integrals
((26), and (20) in a particular case). It is shown that the associated initial
value problem has under certain assumptions again a unique solution and that
this satisfies all requirements without further conditions (in particular (2), (5)–
(7), (22)–(25)). Also in this case there is an adjoint equation ((27), resp., (21))
for F (t, x; τ, ξ) as a function of τ, ξ.
The purely discontinuous processes 5 which may change their states exclu-
1 Numbers in square brackets refer to the bibliography at the end of this paper.
2 It is, for instance, sufficient to point out the importance of temporally homogeneous
stochastic processes (including diffusion processes) in the beautiful presentation of Khint-
chine [6].
3 A particular uniqueness theorem has been proved by Kolmogoroff, cf. [8] (where the

differential equations for several degrees of freedom are derived).


4 This includes, in particular, the so-called diffusion processes.
5 Such processes include atomic collisions, telephone calls, various phenomena in single

atoms etc. A completely different phenomenon which is important in mathematical risk



sively by jumps (cf. § 1, 3, p. 120) demand special attention. This case can
be treated under much weaker assumptions than the general case: One does
not need any continuity assumptions on F (t, x; τ, ξ) in x, ξ, and the case ¶
that jumps from certain states are more likely, is particularly interesting. This ¶ 115
case is also particularly simple, since (only) this case can be treated with-
out referring to partial differential equations; this is the reason for discussing
purely discontinuous processes separately (§ 4). Since this includes all essen-
tial concepts for the general mixed case, we give only a brief sketch (§ 5) of
the necessary changes. Thus § 4 continues directly § 1 and can be read without
the discussions in §§ 2, 3, 5.
All solutions are represented by series which are convergent for small values
of τ − t. For the construction we need a fundamental solution and existence
theorems for second-order linear parabolic differential equations in two vari-
ables and for unbounded domains; as is sometimes mentioned in the literature,
such results are presently lacking. It is (at least locally) possible to reduce such
equations to the form ±u_t = u_xx + a u_x + b u and to obtain some fundamental
solution which was, following Hadamard [5], constructed by Gevrey [4]. Here we
use a slightly different fundamental solution, and we need a few not completely
trivial estimates which could not be derived with that approach. Therefore,
§ 2 is devoted to the construction of the fundamental solution of the general
equation which is achieved, without further difficulties, by an adaptation of
Hadamard’s and Gevrey’s method; this fundamental solution can be seen as
Green’s function for a half-plane which is bounded by a characteristic; this
proves, of course, also the uniqueness (or non-uniqueness, cf. pp. 125 f.) of the
initial value problem.
The problem under investigation will be immediately stated and treated
in a purely analytic fashion. Nevertheless we should mention the not yet com-
monly accepted fact that the probabilistic notions, which we have mentioned
in passing, have been, independently of any application, given a rigorous math-
ematical meaning using set-theory, in particular by the lucid axiomatization
of Kolmogoroff [9], to which we will refer throughout.
It is a very pleasing obligation to thank Prof. Cramér, also at this point,
for the hospitality at the institute chaired by him: It has initiated and enabled
me to complete this work.

§1. Derivation of the Functional Equations.


1. Generalities. Let x be a random variable6 depending on time t, i.e. a
real-valued quantity for which at every instant ¶ there is a probability V(t, x) ¶ 116
theory for insurance companies has already been treated by Cramér [1] in 1930 as a particular
stochastic process which depends continuously on time. There seem to be opportunities for
novel applications, and we expect a further publication by Cramér.
6 i.e. “zufällige Größe” in Kolmogoroff’s terminology (variable aléatoire), that is simply

a measurable real function defined on the base set.a


a Feller writes stochastische Veränderliche i.e. stochastic variable. We use the modern



to be ≤ x. The process described by x is called, according to Kolmogoroff,
stochastically definite if the knowledge of x at time t′ entails the knowledge of
V(t, x) for t > t′; here it is assumed that the increments of x in disjoint intervals
of time are mutually independent. Therefore, a stochastically definite process
can be analytically completely described by a function F (t, x; τ, ξ) which is the
conditional probability that the random variable x takes at time τ some value
≤ ξ given that it took the value x at time t. The function F (t, x; τ, ξ) has to
satisfy certain conditions which we are now going to collect.
In the variable ξ the function F (t, x; τ, ξ) is for fixed t, x, τ , by definition,
a distribution function, i.e. a non-decreasing function which is defined for all
real ξ such that

\[
\lim_{\xi\to-\infty} F(t, x; \tau, \xi) = 0, \qquad \lim_{\xi\to+\infty} F(t, x; \tau, \xi) = 1; \tag{2}
\]

at any discontinuity point we define

\[
F(t, x; \tau, \xi_0) = \lim_{\xi\to\xi_0+} F(t, x; \tau, \xi). \tag{3}
\]

Naturally, we assume further that F (t, x; τ, ξ) is continuous as a function of t


and τ . A priori we do not make any assumption on the x-dependence apart
from the trivial assumption that F (t, x; τ, ξ) is, for fixed t, τ, ξ, Borel measurable
in x.
If we define for all real x, ξ
\[
E(x, \xi) =
\begin{cases}
0 & \text{if } \xi < x, \\
1 & \text{if } \xi \ge x,
\end{cases}
\tag{4}
\]

then we get from the definition of F (t, x; τ, ξ) and the assumed continuity in t

\[
\lim_{t\to\tau-} F(t, x; \tau, \xi) = E(x, \xi), \tag{5}
\]
\[
\lim_{\tau\to t+} F(t, x; \tau, \xi) = E(x, \xi). \tag{6}
\]

Finally, the composition rule for probabilities yields for every t < t′ < τ the
fundamental relation named after Chapman and Smoluchowski
\[
F(t, x; \tau, \xi) = \int_{-\infty}^{+\infty} F(t', y; \tau, \xi) \, dF(t, x; t', y). \tag{7}
\]

Here and in the sequel the Stieltjes differential always refers to the last variable,
¶ 117 i.e. to y. The existence of the ¶ integral in (7) is a priori clear since we have
0 ≤ F ≤ 1 and since F (t, x; τ, ξ) is Borel measurable in x; cf. Lebesgue [11,
p. 261].
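
As a concrete illustration of the composition rule (7), one may take the Gaussian transition function of standard Brownian motion, F(t, x; τ, ξ) = Φ((ξ − x)/√(τ − t)). This choice, the midpoint quadrature and all function names in the sketch below are assumptions made for this example and are not part of the paper's argument.

```python
import math


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def transition(t: float, x: float, tau: float, xi: float) -> float:
    """F(t, x; tau, xi) for standard Brownian motion (an assumed example)."""
    return normal_cdf((xi - x) / math.sqrt(tau - t))


def chapman_rhs(t, x, t_mid, tau, xi, half_width=10.0, steps=4000):
    """Midpoint-rule approximation of the integral on the right-hand side of (7)."""
    sd = math.sqrt(t_mid - t)
    h = 2.0 * half_width * sd / steps
    total = 0.0
    for i in range(steps):
        y = x - half_width * sd + (i + 0.5) * h
        density = normal_pdf((y - x) / sd) / sd  # dF(t, x; t_mid, y) = density * dy
        total += transition(t_mid, y, tau, xi) * density * h
    return total


print(transition(0.0, 0.3, 2.0, 1.0), chapman_rhs(0.0, 0.3, 0.7, 2.0, 1.0))
```

Both printed numbers should agree up to the quadrature error, independently of the intermediate time chosen between t and τ.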

terminology and write random variable. See also Feller’s comment in footnote 1 of the later
paper [Feller 1937b].



The properties mentioned above give a complete analytic characterization
of the function F (t, x; τ, ξ), i.e. every such function yields some stochastic pro-
cess. In the following the issue is to determine solutions of (7) which are
continuous in t, τ , distribution functions in ξ and satisfy the initial conditions
(5)–(6).
To simplify notation we introduce, following Kolmogoroff, the operator ⊕
which is defined by
\[
u(x, \xi) \oplus v(x, \xi) = \int_{-\infty}^{+\infty} v(y, \xi) \, du(x, y) \tag{8}
\]

and which always acts on the non-temporal variables x, ξ and x, y, respectively.


Using this operator (7) can be rewritten as
\[
F(t, x; \tau, \xi) = F(t, x; t', \xi) \oplus F(t', x; \tau, \xi), \qquad t < t' < \tau. \tag{7a}
\]

If the partial derivative ∂_ξ F(t, x; τ, ξ) = f(t, x; τ, ξ) exists, then we call
f(t, x; τ, ξ) frequency function (= differential distribution function in
Kolmogoroff’s diction). It satisfies
\[
f(t, x; \tau, \xi) = \int_{-\infty}^{+\infty} f(t', y; \tau, \xi) \, f(t, x; t', y) \, dy \tag{9}
\]

with the lateral condition
\[
\int_{-\infty}^{+\infty} f(t, x; \tau, \xi) \, d\xi = 1. \tag{10}
\]

The process is called temporally homogeneous, if F (t, x; τ, ξ) depends only


on the length of the time-interval (t, τ ) but not on its position: F (t, x; τ, ξ) =
G(τ − t, x, ξ). Then equation (7) becomes
\[
G(t + t', x, \xi) = \int_{-\infty}^{+\infty} G(t', y, \xi) \, dG(t, x, y) \qquad (t, t' > 0). \tag{7b}
\]

For processes which are also homogeneous in space, i.e. where G depends only
on τ − t and ξ − x, our problem has been solved in the greatest generality, as
Kolmogoroff [10] has constructed – assuming only the existence of the second
moment for G – the most general solution to (7b) with the initial condition
(6) using the theory of characteristic functions. Cramér [2] gives asymptotic
estimates of this solution as t → ∞. For the connection of the continuous ho-
mogeneous case with differential equations see Petrowsky [13] and Khintchine
[6]. ¶ ¶ 118

2. The continuous stochastic process. The definition is given in
(1), p. 114. We cast this assumption in a form which is better suited for the
following discussion: Assume that for every δ > 0
\[
\lim_{\Delta t\to 0} \frac{1}{\Delta t} \int_{|x-y|>\delta} dF(t - \Delta t, x; t, y) = 0, \qquad \Delta t > 0 \tag{11}
\]



holds. Moreover, following [6] and [13] we make the restriction that the
following two limits
\[
\lim_{\Delta t\to 0} \frac{1}{\Delta t} \int_{|x-y|<\delta} (y - x)^2 \, dF(t - \Delta t, x; t, y) = 2a(t, x) > 0, \tag{12}
\]
\[
\lim_{\Delta t\to 0} \frac{1}{\Delta t} \int_{|x-y|<\delta} (y - x) \, dF(t - \Delta t, x; t, y) = b(t, x), \tag{13}
\]

exist7 (are finite); because of (11), these limits depend only seemingly on δ 8 .
Let us explicitly remark that the existence of any moments will not be assumed
(and there are, indeed, solutions F (t, x; τ, ξ) without finite moments).
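
To see the conditions (11)–(13) at work, one can take Gaussian increments with mean b·Δt and variance 2a·Δt, i.e. constant coefficients. This is an assumed example, not one treated in the text, and the function name `truncated_moments` is likewise an illustrative assumption. The sketch evaluates the three normalized quantities by a midpoint rule for a small Δt.

```python
import math


def truncated_moments(a: float, b: float, dt: float, delta: float, steps: int = 20000):
    """Return (tail mass/dt, truncated 2nd moment/dt, truncated 1st moment/dt)
    for increments y - x that are normal with mean b*dt and variance 2*a*dt."""
    var = 2.0 * a * dt
    h = 2.0 * delta / steps
    mass = first = second = 0.0
    for i in range(steps):
        z = -delta + (i + 0.5) * h  # z stands for y - x, restricted to |z| < delta
        weight = math.exp(-(z - b * dt) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var) * h
        mass += weight
        first += z * weight
        second += z * z * weight
    return (1.0 - mass) / dt, second / dt, first / dt


tail, second, first = truncated_moments(a=0.5, b=1.0, dt=1e-4, delta=0.5)
print(tail, second, first)  # compare with 0, 2a = 1 and b = 1
```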
Since we want to derive a differential equation for F (t, x; τ, ξ) as a function
of t, x, we have to assume that

\[
\frac{\partial F(t, x; \tau, \xi)}{\partial x}, \qquad \frac{\partial^2 F(t, x; \tau, \xi)}{\partial x^2}
\]
exist and are continuous in x for every triplet t, ξ, τ > t. On the other hand,
no assumptions are made on the continuity in ξ.
¶ 119 ¶ Now pick δ > 0 and t > 0. For every fixed t, x, τ > t, ξ one has by (2) and
(7)

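\[
\begin{aligned}
&\frac{F(t - \Delta t, x; \tau, \xi) - F(t, x; \tau, \xi)}{\Delta t} \\
&\quad = \frac{1}{\Delta t} \int_{-\infty}^{+\infty} \bigl[ F(t, y; \tau, \xi) - F(t, x; \tau, \xi) \bigr] \, dF(t - \Delta t, x; t, y) \\
&\quad = \frac{1}{\Delta t} \int_{|y-x|\ge\delta} \bigl[ F(t, y; \tau, \xi) - F(t, x; \tau, \xi) \bigr] \, dF(t - \Delta t, x; t, y) \\
&\qquad + \frac{\partial F(t, x; \tau, \xi)}{\partial x} \, \frac{1}{\Delta t} \int_{|y-x|<\delta} (y - x) \, dF(t - \Delta t, x; t, y) \\
&\qquad + \frac{\partial^2 F(t, x; \tau, \xi)}{\partial x^2} \, \frac{1}{\Delta t} \int_{|y-x|<\delta} \Bigl\{ \tfrac{1}{2} (y - x)^2 + o\bigl( (y - x)^2 \bigr) \Bigr\} \, dF(t - \Delta t, x; t, y).
\end{aligned}
\tag{14}
\]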

7 Of course, one could have replaced F (t − Δt, x; t, y) in (11)–(13) by F (t, x; t + Δt, y).

The thus obtained equations are, by the way, for continuous a(t, x) and b(t, x) – and only
this case will be considered – a consequence of (11)–(13) and vice versa. In the sequel we
will only use (11)–(13).
8 Our assumptions differ from those of Kolmogoroff [7, pp. 445 f.] and [8, p. 150]. Therein
(11) is strengthened by assuming the existence of m_k = ∫_{−∞}^{+∞} |y − x|^k dF(t, x; τ, y) for k =
1, 2, 3 and lim m_3/m_2 = 0. With these assumptions, (12) and (13) have been proved (under
certain assumptions on the regularity and that a functional determinant does not vanish). –
Khintchine and Petrowsky replace (11) by the stronger analogue of Lindeberg’s condition



Letting Δt → 0 then the accumulation points of the right-hand side are, be-
cause of (11), obviously independent of δ; on the other hand, they can differ
among each other by at most o(δ²) which means that the right-hand side
converges to a limit as Δt → 0. Therefore, the left-hand derivative ∂F/∂t exists, and
(14) yields the “first differential equation”

\[
\frac{\partial F(t, x; \tau, \xi)}{\partial t} + a(t, x) \frac{\partial^2 F(t, x; \tau, \xi)}{\partial x^2} + b(t, x) \frac{\partial F(t, x; \tau, \xi)}{\partial x} = 0. \tag{15}
\]
Here ∂F/∂t denotes the left-hand derivative which is enough for all that follows;
however, the same argument with Δt < 0 yields the existence of the derivative
∂F/∂t for which (15) holds.
For F (t, x; τ, ξ) we have (15) and the initial condition (5). Under certain
assumptions on a(t, x) and b(t, x) – notably condition A of pp. 128 f. – it will
be shown that there is a unique solution of (15) satisfying (5); this remains
true for a more general equation which is obtained from (15) by adding the
term cF. This solution has a frequency function
\[
F(t, x; \tau, \xi) = \int_{-\infty}^{\xi} f(t, x; \tau, y) \, dy, \tag{16}
\]

and f is a so-called fundamental solution of the differential equation. From this


one immediately deduces that, on the one hand, (7) holds and, on the other,
that f (t, x; τ, ξ) satisfies as a function of τ, ξ the adjoint differential equation;
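¶ for (15) this is Kolmogoroff’s “second differential equation”:b ¶ 120
\[
-\frac{\partial f(t, x; \tau, \xi)}{\partial \tau} + \frac{\partial^2}{\partial \xi^2} \bigl[ a(\tau, \xi) f(t, x; \tau, \xi) \bigr] - \frac{\partial}{\partial \xi} \bigl[ b(\tau, \xi) f(t, x; \tau, \xi) \bigr] = 0. \tag{17}
\]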
Let us note that for a general equation f may become negative. For equations
of the form (15) (i.e. c = 0) F (t, x; τ, ξ) is indeed a distribution function in ξ,
so that all conditions of a stochastic process are satisfied. We will furthermore
show that the relations (11)–(13) hold which solves the problem completely.
The equation (17) together with (16) and the initial condition (6) uniquely
determines our stochastic process, too.
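
For constant coefficients the first differential equation (15) can be checked against an explicit transition function. The sketch below assumes a(t, x) = A and b(t, x) = B constant and uses the corresponding Gaussian F; these choices, the finite-difference step and the names `transition` and `residual_15` are assumptions made for illustration only.

```python
import math

A, B = 0.5, 1.0  # assumed constant coefficients a(t, x) = A and b(t, x) = B


def transition(t: float, x: float, tau: float, xi: float) -> float:
    """Gaussian F(t, x; tau, xi) with mean x + B*(tau - t) and variance 2*A*(tau - t)."""
    s = tau - t
    z = (xi - x - B * s) / math.sqrt(2.0 * A * s)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def residual_15(t: float, x: float, tau: float, xi: float, h: float = 1e-3) -> float:
    """Central-difference approximation of the left-hand side of (15)."""
    d_t = (transition(t + h, x, tau, xi) - transition(t - h, x, tau, xi)) / (2.0 * h)
    d_x = (transition(t, x + h, tau, xi) - transition(t, x - h, tau, xi)) / (2.0 * h)
    d_xx = (transition(t, x + h, tau, xi) - 2.0 * transition(t, x, tau, xi)
            + transition(t, x - h, tau, xi)) / h ** 2
    return d_t + A * d_xx + B * d_x


print(residual_15(0.0, 0.2, 1.0, 0.9))  # should be close to zero
```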

3. The purely discontinuous process. A process is called like this if it


comprises certain events whose state changes by jumps where the probability
for the Laplace–Liapounoff theorem:

\[
\lim_{\Delta t\to 0} \frac{1}{\Delta t} \int_{|y-x|>\delta} (y - x)^2 \, dF(t - \Delta t, x; t, y) = 0.
\]

Consequently, (11) is the analogue of a weaker condition which may be used to replace
Lindeberg’s condition cf. Feller [3].

b The original paper contains the misprint −∂f/∂τ − ∂²/∂ξ² [af] + ∂/∂ξ [bf] which has been
corrected.



distribution of the jump size is known. More precisely: Assume that at time
t the random variable x has the value x; then we assume that there exists
a probability of the form p(t, x)Δt + o(Δt) such that in the following time-
interval of length Δt the random variable x changes; P (t, x, ξ) is the conditional
probability that x is ≤ ξ if it changes at time t + Δt. In purely analytic terms
this can be expressed in the following way:
The stochastic process defined by F (t, x; τ, ξ) is said to be purely discontin-
uous, if

\[
F(t, x; \tau, \xi) = \bigl( 1 - p(t, x)(\tau - t) \bigr) E(x, \xi) + (\tau - t) \, p(t, x) \, P(t, x, \xi) + o(\tau - t) \qquad (t < \tau) \tag{18}
\]

holds. Here p(t, x) and P(t, x, ξ) denote two non-negative functions, and P(t, x, ξ)
is, as a function of ξ, a distribution function, i.e. non-decreasing
(right-continuous, cf. (3)), and we have
\[
\lim_{\xi\to-\infty} P(t, x, \xi) = 0, \qquad \lim_{\xi\to+\infty} P(t, x, \xi) = 1.
\]

For the solution of this problem we will assume, in addition, that p(t, x) and
P (t, x, ξ) are continuous in t and Borel measurable in x; finally (for simplicity)
we assume that p(t, x) is bounded in each finite t-interval. – For F (t, x; τ, ξ) we
do not pose further continuity assumptions than those mentioned in 1.
A marked difference of the purely discontinuous case compared with all
other cases is the fact, that in the other cases the second equation follows only
¶ 121 indirectly ¶ from the first equation, whereas here both equations are equally
important and completely symmetric. In order to derive the equation for t, x
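we note that we have by (7) and (18) for 0 < Δt < τ − t:c
\[
\begin{aligned}
F(t, x; \tau, \xi) &= \int_{-\infty}^{+\infty} F(t + \Delta t, y; \tau, \xi) \, dF(t, x; t + \Delta t, y) \\
&= \bigl( 1 - p(t, x) \Delta t \bigr) F(t + \Delta t, x; \tau, \xi) \\
&\quad + \Delta t \, p(t, x) \int_{-\infty}^{+\infty} F(t + \Delta t, y; \tau, \xi) \, dP(t, x, y) + o(\Delta t).
\end{aligned}
\tag{19}
\]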

The existence of all integrals appearing here and in the sequel is ensured by the
above mentioned measurability and boundedness of the functions. Subtracting
in (19) on both sides F (t + Δt, x; τ, ξ) and dividing by Δt immediately shows
that the right-hand side admits a limit as Δt → 0+. Therefore, the right-hand
derivative ∂F/∂t exists and we get
\[
\frac{\partial F(t, x; \tau, \xi)}{\partial t} = p(t, x) \left\{ F(t, x; \tau, \xi) - \int_{-\infty}^{+\infty} F(t, y; \tau, \xi) \, dP(t, x, y) \right\}. \tag{20}
\]

In the same way we can deal with the increments Δt < 0 leading again to (20)
c The misprint of the original, τ has been corrected.



but for the left-hand derivative; this proves the existence of a proper derivative
∂F/∂t satisfying (20).
(20) is the analogue of the “first differential equation” (15). Using the
operator ⊕ (cf. (8)) we may write (20) in the form

\[
\frac{\partial F(t, x; \tau, \xi)}{\partial t} = p(t, x) \bigl[ E(x, \xi) - P(t, x, \xi) \bigr] \oplus F(t, x; \tau, \xi). \tag{20a}
\]
This is, of course again, the initial value problem for (20) determined by (5).
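In order to get to the second equation for F(t, x; τ, ξ) we write for Δτ > 0 d
\[
\begin{aligned}
F(t, x; \tau + \Delta\tau, \xi) &= \int_{-\infty}^{+\infty} F(\tau, y; \tau + \Delta\tau, \xi) \, dF(t, x; \tau, y) \\
&= \int_{-\infty}^{+\infty} \bigl[ (1 - p(\tau, y) \Delta\tau) E(y, \xi) + \Delta\tau \, p(\tau, y) P(\tau, y, \xi) \bigr] \, dF(t, x; \tau, y) + o(\Delta\tau).
\end{aligned}
\]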

¶ As before this implies the existence of the derivative ∂F/∂τ and⁹ ¶ 122
\[
\frac{\partial F(t, x; \tau, \xi)}{\partial \tau} = - \int_{-\infty}^{\xi} p(\tau, y) \, dF(t, x; \tau, y) + \int_{-\infty}^{+\infty} p(\tau, y) P(\tau, y, \xi) \, dF(t, x; \tau, y) \tag{21}
\]

or
\[
\frac{\partial F(t, x; \tau, \xi)}{\partial \tau} = F(t, x; \tau, \xi) \oplus \bigl[ - p(\tau, x) E(x, \xi) + p(\tau, x) P(\tau, x, \xi) \bigr], \tag{21a}
\]
this time with the initial condition (6).
The equations (20) and (21) are, of course, only the simplest particular case
of the two integro-differential equations (26) and (27) which will be derived
in 4.; their solution is much easier and needs less restrictive assumptions. As
we will show in § 410 , we may choose arbitrary functions p(t, x) and P (t, x, ξ)
which satisfy the assumptions mentioned above. The equations (20) and (21)
with the initial conditions (5) and (6) admit each a unique solution, and these
solutions coincide; this solution satisfies also the other conditions which have
to be satisfied by a transition function (§ 1, 1). As one would a priori expect,
the solution F (t, x; τ, ξ) is, in general, not continuous as a function of ξ.
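
The simplest instance of a purely discontinuous process is obtained when p(t, x) is a constant and every jump adds exactly 1, so that P(t, x, ξ) = E(x + 1, ξ); this gives the Poisson process. The sketch below uses this assumed special case (the constant `RATE` and the function names are illustrative and not taken from the paper) to check equation (20) by a finite difference in t. The resulting F is a step function in ξ, in line with the remark above.

```python
import math

RATE = 1.5  # assumed constant jump intensity, p(t, x) = RATE


def transition(t: float, x: float, tau: float, xi: float) -> float:
    """F(t, x; tau, xi) when every jump adds exactly 1 (a Poisson process)."""
    m = math.floor(xi - x)  # largest number of jumps compatible with a state <= xi
    if m < 0:
        return 0.0
    lam = RATE * (tau - t)
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(m + 1))


def residual_20(t: float, x: float, tau: float, xi: float, h: float = 1e-5) -> float:
    """Difference between the two sides of (20), with a central difference in t."""
    d_t = (transition(t + h, x, tau, xi) - transition(t - h, x, tau, xi)) / (2.0 * h)
    # P(t, x, .) is the unit step at x + 1, so the integral in (20) equals F(t, x + 1; tau, xi).
    rhs = RATE * (transition(t, x, tau, xi) - transition(t, x + 1.0, tau, xi))
    return d_t - rhs


print(residual_20(0.0, 0.0, 2.0, 3.5))  # should be close to zero
```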
9 In the notation of (21) it is, of course, assumed that the first integral on the
right-hand side, as well as F(t, x; τ, ξ) and E(x, ξ), are defined such that they are
right-continuous (cf. (3)):
\[
\int_{-\infty}^{\xi} = \lim_{z\to\xi+} \int_{-\infty}^{z}.
\]
This remark is not essential as it is enough to consider continuity points.
10 As already mentioned in the introduction, § 4 is a direct continuation and can be read

without the following sections.


d The original contains, in the first line of the following formula, the misprint F (t, x; τ, ξ) =



4. The mixed case. It is natural to combine both cases studied in 2. and
3. and to replace in the governing equation (18) of the purely discontinuous
case the function E(x, ξ) by G(t, x; τ, ξ) which yields a continuous component
of the process, i.e. which assumes the role (and the properties) of F (t, x; τ, ξ)
in the continuous case.
Thus, we characterize the mixed case by the following requirement: For
small values of τ − t > 0 there is a representation of the form
\[
F(t,x;\tau,\xi) = \bigl[1 - p(t,x)(\tau-t)\bigr]\,G(t,x;\tau,\xi) + (\tau-t)\,p(t,x)\,P(t,x,\xi) + o(\tau-t) \tag{22}
\]

¶ 123 ¶ where G(t, x; τ, ξ) is for fixed t, x, τ a distribution function in ξ which satisfies the following three conditions
\[
\lim_{\Delta t\to 0+} \frac{1}{\Delta t}\int_{|y-x|>\delta} dG(t-\Delta t,x;t,y) = 0, \tag{23}
\]
\[
\lim_{\Delta t\to 0+} \frac{1}{\Delta t}\int_{|y-x|<\delta} (y-x)^2\,dG(t-\Delta t,x;t,y) = 2a(t,x) > 0, \tag{24}
\]
\[
\lim_{\Delta t\to 0+} \frac{1}{\Delta t}\int_{|y-x|<\delta} (y-x)\,dG(t-\Delta t,x;t,y) = b(t,x). \tag{25}
\]

Of course, p(t, x) and P (t, x, ξ) are, as stated in the preceding subsection, non-
negative functions and as a function of ξ, P (t, x, ξ) is a distribution function;
moreover, we assume that both functions are continuous in t.
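
The representation (22) together with (24) and (25) can be read as a recipe for a small time step: with probability p(t, x)Δt the state jumps to a point distributed according to P(t, x, ·), otherwise it moves continuously with mean b(t, x)Δt and variance 2a(t, x)Δt. The following is only an illustrative sketch of that reading, not part of the original text. The constant coefficients, the unit jump, the Gaussian continuous step and the names `simulate_mixed`, `jump_target` and `sample_mean` are all assumptions chosen for the example.

```python
import math
import random


def simulate_mixed(x0, t0, t1, a, b, p, jump_target, n_steps, rng):
    """Euler-type sketch of one path of the mixed process on [t0, t1].

    In each step of length dt: with probability p(t, x) * dt the path jumps
    to a point drawn from P(t, x, .) (here: jump_target); otherwise it moves
    with drift b(t, x) * dt and variance 2 * a(t, x) * dt, cf. (22), (24), (25).
    """
    dt = (t1 - t0) / n_steps
    t, x = t0, x0
    for _ in range(n_steps):
        if rng.random() < p(t, x) * dt:
            x = jump_target(t, x, rng)
        else:
            x += b(t, x) * dt + math.sqrt(2.0 * a(t, x) * dt) * rng.gauss(0.0, 1.0)
        t += dt
    return x


def sample_mean(n_paths=2000, seed=0):
    """Average end point at time 1 for assumed constant coefficients."""
    rng = random.Random(seed)
    a = lambda t, x: 0.5              # assumed diffusion coefficient
    b = lambda t, x: 0.0              # assumed drift
    p = lambda t, x: 2.0              # assumed jump intensity
    jump_target = lambda t, x, r: x + 1.0   # assumed P(t, x, .): unit mass at x + 1
    total = 0.0
    for _ in range(n_paths):
        total += simulate_mixed(0.0, 0.0, 1.0, a, b, p, jump_target, 100, rng)
    return total / n_paths


if __name__ == "__main__":
    # With these choices the expected end point is p * 1 = 2 (two jumps on average).
    print(sample_mean())
```

With these assumed coefficients the mean displacement comes only from the jumps, so the sample mean should be close to 2, up to Monte Carlo error.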
As in the continuous case, a direct argument only gives an equation for
F(t, x; τ, ξ) as a function of t, x. Inserting (22) into the fundamental equation
(7) (applied to t − Δt, t, τ) and subtracting the identity
\[
F(t,x;\tau,\xi) = \int_{-\infty}^{+\infty} F(t,x;\tau,\xi)\,dG(t-\Delta t,x;t,y),
\]
one obtains
\[
\begin{aligned}
\frac{F(t-\Delta t,x;\tau,\xi) - F(t,x;\tau,\xi)}{\Delta t}
&= \frac{1}{\Delta t}\int_{-\infty}^{+\infty} \bigl[F(t,y;\tau,\xi) - F(t,x;\tau,\xi)\bigr]\,dG(t-\Delta t,x;t,y) \\
&\quad - p(t,x)\Bigl\{\int_{-\infty}^{+\infty} F(t,y;\tau,\xi)\,dG(t-\Delta t,x;t,y) - \int_{-\infty}^{+\infty} F(t,y;\tau,\xi)\,dP(t,x,y)\Bigr\} + o(\Delta t).
\end{aligned}
\]

From (22) and (5) it follows immediately that G(t − Δt, x; t, y) → E(x, y) as
Δt → 0. If we assume again that ∂F/∂x and ∂²F/∂x² exist and are continuous in x,

. . . which has been corrected.



then it follows as in 2. and 3. that the (in the first step one-sided) derivative ∂F/∂t exists and
\[
\frac{\partial F(t,x;\tau,\xi)}{\partial t} + a(t,x)\frac{\partial^2 F(t,x;\tau,\xi)}{\partial x^2} + b(t,x)\frac{\partial F(t,x;\tau,\xi)}{\partial x}
- p(t,x)\,F(t,x;\tau,\xi) + p(t,x)\int_{-\infty}^{+\infty} F(t,y;\tau,\xi)\,dP(t,x,y) = 0, \tag{26}
\]

¶ or ¶ 124
\[
\frac{\partial F(t,x;\tau,\xi)}{\partial t} + a(t,x)\frac{\partial^2 F(t,x;\tau,\xi)}{\partial x^2} + b(t,x)\frac{\partial F(t,x;\tau,\xi)}{\partial x}
- p(t,x)\,\bigl[E(x,\xi) - P(t,x,\xi)\bigr] \oplus F(t,x;\tau,\xi) = 0. \tag{26a}
\]
This is the analogue of the “first differential equation”, and again we have
the initial condition (5). For the proof of the existence of a solution we will
make the same assumptions on the coefficients a(t, x) and b(t, x) as in the
continuous case, i.e. that condition A of pp. 128 f. is satisfied. On the other
hand, p(t, x) and P(t, x, ξ) have to satisfy more restrictive conditions than in
the purely discontinuous case; for simplicity, we will restrict ourselves to the
case that ∂P/∂ξ exists and that this function as well as p(t, x) are differentiable in
t, x (although continuity and a Lipschitz condition would be enough). Under
these assumptions we will show in § 5 that the initial value problem admits
a unique solution F(t, x; τ, ξ) and that it indeed defines a stochastic process
satisfying (22)–(25); again, there is a frequency function f(t, x; τ, ξ) (cf. (16))
satisfying the equation
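\[
-\frac{\partial f(t,x;\tau,\xi)}{\partial\tau} + \frac{\partial^2}{\partial\xi^2}\bigl[a(\tau,\xi)\,f(t,x;\tau,\xi)\bigr] - \frac{\partial}{\partial\xi}\bigl[b(\tau,\xi)\,f(t,x;\tau,\xi)\bigr]
- p(\tau,\xi)\,f(t,x;\tau,\xi) + \int_{-\infty}^{+\infty} p(\tau,y)\,f(t,x;\tau,y)\,\frac{\partial P(\tau,y,\xi)}{\partial\xi}\,dy = 0. \tag{27}
\]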
Moreover, f (t, x; τ, ξ) is a solution to (26) as one sees by differentiating this
equality in ξ.
The equation (27) has already been mentioned by Kolmogoroff [7, Eq. (179)]
as an “alternative possibility”.

§2. The Fundamental Solution and the Initial Value Problem for Second-Order Linear Parabolic Differential Equations in Two Variables.

For the following we need the solution of the initial value problem of both the
differential equation (15) and the adjoint differential equation (16). There-
fore it is appropriate to consider right away the most general homogeneous



parabolic differential equation in two variables:

\[
L(u) \equiv u_t + a(t,x)\,u_{xx} + b(t,x)\,u_x + c(t,x)\,u = 0, \tag{28}
\]

which we will also use in § 5. In this way it becomes evident, too, to which
extent the results are due to general properties of parabolic equations or, in
particular, related to stochastic processes.
¶ 125 ¶ Of course, it is necessary that a ≠ 0, and we assume that a > 0 (the case
a < 0 can be treated analogously). The coefficients are assumed to be defined
in some fixed t-interval for all x, and then the initial value problem becomes:
We are looking for a continuous solution to (28) which is defined for t < t′ and
converges to a prescribed function g(x) as t → t′−. (The analogous problem for
t > t′ is known to have no solution, in general; if a < 0 the half-plane t < t′ must
be replaced by t > t′.) This initial value problem is, essentially, equivalent to
the construction of some fundamental solution to (28) which is defined for all x,
i.e. a solution to (28) which has in the point (τ, ξ) some prescribed singularity.
Among all fundamental solutions we look for a particular one which is suited
for our purposes and which is the analogue of the so-called Green’s function
for bounded domains. This solution will now be constructed by an adaptation
of the method of Hadamard and Gevrey for a = 1 [4, mainly II, pp. 138 f.] (cf. the
introduction p. 115).

1. Let a(t, x) be a positive function defined for all real x and t₀ < t < t₁
for which the derivatives a_t, a_x and a_xx exist and are continuous (in t and x);
we set
\[
\varphi(t,x) = \int_0^x \frac{dy}{\sqrt{a(t,y)}}, \tag{29}
\]
such that ϕ(t, x) increases monotonically in x. Moreover, we assume that
\[
\lim_{x\to-\infty}\varphi(t,x) = -\infty, \qquad \lim_{x\to+\infty}\varphi(t,x) = +\infty. \tag{30}
\]

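We consider first the particular equation
\[
M(u) \equiv u_t + \sqrt{a}\,\bigl(\sqrt{a}\,u_x\bigr)_x - \sqrt{a}\,\varphi_t\,u_x = 0, \tag{31}
\]
and the adjoint equation
\[
M^*(u) \equiv -u_t + \bigl(\sqrt{a}\,(\sqrt{a}\,u)_x\bigr)_x + \bigl(\sqrt{a}\,\varphi_t\,u\bigr)_x = 0, \tag{32}
\]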

for which one can immediately obtain the fundamental solution (cf. (38)).
Note that the restriction (30) is absolutely necessary for the uniqueness of
the initial value problem (31) (hence, for (32)), and thus for the whole theory
to follow. This is shown by the following example:
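Set a(t, x) = ch⁴ x where ch x = cos ix; here ϕ(t, x) = tanh x is bounded, so that (30) fails, and (31) becomes
\[
u_t + \operatorname{ch}^2 x\,\bigl(\operatorname{ch}^2 x \cdot u_x\bigr)_x = 0. \tag{33}
\]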



Denoting, as usual, the standard normal (Gaussian) distribution function by
\[
\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac12 y^2}\,dy, \tag{34}
\]
¶ it is easy to verify that for each t < τ ¶ 126
\[
u(t,x) = -1 + \Phi\Bigl(\frac{1-\tanh x}{\sqrt{2(\tau-t)}}\Bigr) - \Phi\Bigl(\frac{-1-\tanh x}{\sqrt{2(\tau-t)}}\Bigr) \tag{36}
\]
is (see editors' note e) a solution of (33). Since |tanh x| ≤ 1 and Φ(∞) = 1, the solution u converges
to 0 as t → τ−, i.e. (36) is a not identically vanishing solution to (33) which
tends, as t → τ, to the initial value 0.
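
The counterexample can be checked numerically. The sketch below is an added illustration, not part of the original: it evaluates (36) with τ = 1 (an arbitrary choice), approximates the left-hand side of (33) by central finite differences, and evaluates u close to t = τ. The function names and the step size h are assumptions.

```python
import math


def Phi(z):
    """Standard normal distribution function, cf. (34)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def u(t, x, tau=1.0):
    """The function (36) for t < tau."""
    s = math.sqrt(2.0 * (tau - t))
    return -1.0 + Phi((1.0 - math.tanh(x)) / s) - Phi((-1.0 - math.tanh(x)) / s)


def residual(t, x, h=1e-3, tau=1.0):
    """Finite-difference value of u_t + ch^2 x (ch^2 x u_x)_x at (t, x), cf. (33)."""
    ch2 = lambda y: math.cosh(y) ** 2
    u_t = (u(t + h, x, tau) - u(t - h, x, tau)) / (2.0 * h)
    # ch^2 * u_x evaluated at the half points x + h/2 and x - h/2
    flux_plus = ch2(x + 0.5 * h) * (u(t, x + h, tau) - u(t, x, tau)) / h
    flux_minus = ch2(x - 0.5 * h) * (u(t, x, tau) - u(t, x - h, tau)) / h
    return u_t + ch2(x) * (flux_plus - flux_minus) / h


if __name__ == "__main__":
    print(residual(0.5, 0.3))      # small: (36) satisfies (33) up to discretisation error
    print(u(0.5, 0.3))             # clearly different from zero
    print(u(1.0 - 1e-6, 0.3))      # essentially zero close to t = tau
```

The residual is of the size of the discretisation error, while u itself is far from zero at t = 0.5 and vanishes near t = τ, which is the non-uniqueness described above.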
Even if it is not always explicitly mentioned, we assume throughout that
the quantities t and τ always satisfy the inequality

\[
t_0 < t < \tau < t_1. \tag{37}
\]

Now we set
\[
U_0(t,x;\tau,\xi) = \frac{1}{2\sqrt{\pi}}\,\frac{1}{\sqrt{a(\tau,\xi)}}\,\frac{1}{\sqrt{\tau-t}}\,\exp\Bigl\{-\frac{\{\varphi(\tau,\xi)-\varphi(t,x)\}^2}{4(\tau-t)}\Bigr\}. \tag{38}
\]
As a function of t, x, U₀(t, x; τ, ξ) is a solution of (31); as a function of τ, ξ,
it solves (32), and it is the fundamental solution which we were looking for. If
t → τ− and x ≠ ξ, then we have U₀ → 0; otherwise it is continuous for t < τ
and x, ξ.
Introducing a new integration variable
\[
y = \frac{\varphi(\tau,\xi)-\varphi(t,x)}{\sqrt{2(\tau-t)}} \tag{39}
\]
(for fixed t, x, τ; y is monotonically increasing in ξ) it follows because of Φ(∞) = 1 (cf. (34)) that
\[
\int_{-\infty}^{+\infty} U_0(t,x;\tau,\xi)\,d\xi = 1. \tag{40}
\]

Therefore, the function
\[
F(t,x;\tau,\xi) = \int_{-\infty}^{\xi} U_0(t,x;\tau,z)\,dz
\]

is for every t, x, τ a distribution function in ξ. Although it is not yet needed


and will, later on, follow from the general theory, let us note already at this
e In the original text the label (35) is missing. Feller uses th x to denote tanh x, the



point that F(t, x; τ, ξ) defines a stochastic process, i.e. it satisfies the equation
(7), or U₀ satisfies the equation (9); this follows easily by introducing a new
integration variable in (9) (t, x, t′, τ, ξ are fixed!)
\[
z = \frac{\varphi(t',y)-\varphi(t,x)}{\sqrt{2(t'-t)}}\,\sqrt{\frac{\tau-t}{\tau-t'}} + \frac{\varphi(t,x)-\varphi(\tau,\xi)}{\sqrt{2(\tau-t')}}\,\sqrt{\frac{t'-t}{\tau-t}}.
\]
This substitution is admissible because of the monotonicity of ϕ(t′, y).
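
Both the normalisation (40) and the composition rule (9) for U₀ can be tested numerically. The sketch below is an added illustration under an assumption not made in the original: the time-independent coefficient a(x) = (2 + cos x)⁻², for which ϕ(x) = 2x + sin x by (29) and (30) holds. The helper names, the integration window and the grid size are likewise assumptions.

```python
import math


def phi(x):
    """phi(x) = int_0^x dy / sqrt(a(y)) for the assumed a(x) = (2 + cos x)^(-2), cf. (29)."""
    return 2.0 * x + math.sin(x)


def U0(t, x, tau, xi):
    """Formula (38); for the assumed coefficient 1 / sqrt(a(xi)) = 2 + cos(xi)."""
    s = tau - t
    d = phi(xi) - phi(x)
    return (2.0 + math.cos(xi)) * math.exp(-d * d / (4.0 * s)) / (2.0 * math.sqrt(math.pi * s))


def trapezoid(f, lo, hi, n=24000):
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + sum(f(lo + k * h) for k in range(1, n)) + 0.5 * f(hi))


def total_mass(t, x, tau):
    """Left-hand side of (40), truncated to a window where U0 is not negligible."""
    return trapezoid(lambda xi: U0(t, x, tau, xi), x - 12.0, x + 12.0)


def chapman_kolmogorov_gap(t, x, t_mid, tau, xi):
    """Integral of U0(t,x;t',y) U0(t',y;tau,xi) over y, minus U0(t,x;tau,xi)."""
    lhs = trapezoid(lambda y: U0(t, x, t_mid, y) * U0(t_mid, y, tau, xi), x - 12.0, x + 12.0)
    return lhs - U0(t, x, tau, xi)


if __name__ == "__main__":
    print(total_mass(0.0, 0.4, 0.3))
    print(chapman_kolmogorov_gap(0.0, 0.4, 0.1, 0.3, -0.2))
```

For this coefficient the total mass should equal 1 and the gap should vanish, both up to quadrature error.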

2. For the following we need a

Lemma. Let t₀ < T < t₁ and let g(t, x) be a bounded function which is defined
¶ 127 for t₀ ≤ t ≤ T and all real x ¶ and satisfies in some neighbourhood of every
point (t′, x′) a Lipschitz condition of the form
\[
|g(t,x) - g(t',x')| < K\bigl[\,|t-t'|^{\alpha} + |x-x'|^{\alpha}\bigr], \qquad \alpha = \alpha(t',x') > 0. \tag{41}
\]
Then the function
\[
G(t,x) = \int_t^T d\tau \int_{-\infty}^{+\infty} g(\tau,\xi)\,U_0(t,x;\tau,\xi)\,d\xi \tag{42}
\]
admits the partial derivatives G_t, G_x, G_xx, and one has
\[
M(G(t,x)) = -g(t,x). \tag{43}
\]
(The finiteness of the integral (42) follows from (40) and the boundedness of
g(t, x).)
The proof is most simply accomplished by reducing everything to a known
theorem for the heat equation. Under the (obviously bijective) transformation
\[
\bar t = t, \qquad \bar\tau = \tau, \qquad \bar x = \varphi(t,x), \qquad \bar\xi = \varphi(\tau,\xi),
\]
(42) becomes
\[
G(t,x) \equiv \bar G(\bar t,\bar x) = \frac{1}{2\sqrt{\pi}}\int_{\bar t}^{T} \frac{d\bar\tau}{\sqrt{\bar\tau-\bar t}} \int_{-\infty}^{+\infty} \bar g(\bar\tau,\bar\xi)\,\exp\Bigl\{-\frac{(\bar\xi-\bar x)^2}{4(\bar\tau-\bar t)}\Bigr\}\,d\bar\xi.
\]
Since ϕ(t, x) is continuously differentiable (see editors' note f), ḡ(τ̄, ξ̄) satisfies again a Lipschitz
condition of the form (41) in τ̄, ξ̄, and so there exist the partial derivatives
Ḡ_t̄, Ḡ_x̄, Ḡ_x̄x̄ due to a well-known theorem which is, as far as I am aware, due

tangens hyperbolicus.
f The original contains the misprint “g(t, x) is continuously differentiable” which has been

corrected.



to E. E. Levi [12, pp. 229 f.] (see footnote 11), and we have (see editors' note g)
\[
\bar G_{\bar t} + \bar G_{\bar x\bar x} = -\bar g.
\]

Returning to the original variables, the claim follows.
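
The step "returning to the original variables" can be written out. The computation below is an added sketch, not part of the original; it uses only (29) and (31), and the auxiliary function ū with u(t, x) = ū(t, ϕ(t, x)) is notation introduced here for the purpose.

```latex
% Assumption: u(t,x) = \bar u(\bar t,\bar x) with \bar t = t, \bar x = \varphi(t,x);
% by (29), \varphi_x = 1/\sqrt{a}.
\begin{align*}
u_x &= \frac{1}{\sqrt a}\,\bar u_{\bar x},\\
\sqrt a\,\bigl(\sqrt a\,u_x\bigr)_x &= \sqrt a\,\bigl(\bar u_{\bar x}\bigr)_x = \bar u_{\bar x\bar x},\\
u_t &= \bar u_{\bar t} + \varphi_t\,\bar u_{\bar x} = \bar u_{\bar t} + \sqrt a\,\varphi_t\,u_x ,
\end{align*}
so that
\[
M(u) = u_t + \sqrt a\,\bigl(\sqrt a\,u_x\bigr)_x - \sqrt a\,\varphi_t\,u_x
     = \bar u_{\bar t} + \bar u_{\bar x\bar x}.
\]
```

Applied to u = G, this shows that Ḡ_t̄ + Ḡ_x̄x̄ = −ḡ is the same statement as (43).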


It is almost evident that the boundedness of g(t, x) is not at all necessary:
For the theorem to hold it is enough to assume that g(t, x) grows so slowly
as x → ±∞ (for example like a power of ϕ(t, x)) that the integrals appearing in the
definition converge; but we want to restrict ourselves to the simplest case.
Later on, we will need the analogue of our lemma for more general equations,
which is less easy to prove. If we only consider the case that g(t, x) is
continuously differentiable, then the proofs here and there coincide almost
literally: In this case one may apply a well-known method from potential theory.
¶ For the benefit of less experienced readers we will briefly sketch the proof. ¶ 128
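Assume that g(t, x) is continuously differentiable. Differentiating (42) formally
in x and introducing in the thus obtained integral new integration variables
α, β (t, x are fixed!)
\[
\tau - t = \alpha, \qquad \varphi(\tau,\xi) - \varphi(t,x) = \sqrt{2\alpha}\,\beta, \tag{44}
\]
one obtains a uniformly convergent integral. Therefore, G_x exists and one has
\[
\begin{aligned}
G_x(t,x) &= \int_t^T d\tau \int_{-\infty}^{+\infty} g(\tau,\xi)\,\frac{\partial U_0(t,x;\tau,\xi)}{\partial x}\,d\xi \\
&= -\frac{1}{\sqrt{a(t,x)}}\int_t^T d\tau \int_{-\infty}^{+\infty} g(\tau,\xi)\,\frac{\partial}{\partial\xi}\Bigl[\sqrt{a(\tau,\xi)}\,U_0(t,x;\tau,\xi)\Bigr]\,d\xi \\
&= \frac{1}{\sqrt{a(t,x)}}\int_t^T d\tau \int_{-\infty}^{+\infty} \sqrt{a(\tau,\xi)}\,g_\xi(\tau,\xi)\,U_0(t,x;\tau,\xi)\,d\xi.
\end{aligned}
\]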

The substitution (44) shows that the last integral may again be formally
differentiated in x. Using this substitution directly in (42), t appears
under the integral only in g(·, ·) and we may formally differentiate in t: In this
way we obtain G_t as a sum of a line integral and an area integral; obviously, the
latter converges, as well as G_xx, to 0 if T → t+. In order to see M(G) = 0, it is
enough to split the interval (t, T) into two parts (t, t′) + (t′, T): In the second
interval the integrand appearing in (42) is regular and solves M(u) = 0, the
first interval is dealt with as stated above. For t′ → t we get M(G) = −g.

3. Let us return to the general equation (28) which we write in the form
\[
L(u) \equiv M(u) + \lambda\,u_x + c\,u = 0 \tag{45}
\]


11 A proof under even weaker assumptions than (41) can be found in Gevrey [4, I,
g The original contains in the following formula the misprint “= 0” which has been cor-



where (cf. the definition of ϕ, (29))
\[
\lambda = b - \tfrac12\,a_x + \sqrt{a}\,\varphi_t. \tag{46}
\]
We will make the following assumption on the coefficients which we will, for
¶ 129 brevity, call condition A: ¶
1) For t₀ ≤ t ≤ t₁ and all x the functions a, a_t, a_x, a_xx, b, b_x, c exist and
satisfy in some neighbourhood of every point a Lipschitz condition of the form
(41).
2) a, 1/a, λ, λ_x, c are bounded.
The boundedness of a includes, in particular, (30).
We note that the condition 2) is only assumed for our convenience and to
ease the presentation by not repeating this trivial fact. It is easy to verify that
the following condition 2′) allows us to reduce all necessary estimates to bounded
domains so that we may replace 2) by 2′):
2′) Assume that (30) holds; if f(t, x) is one of the functions √a, 1/√a, λ,
λ_x, c and V(t, x; τ, ξ) one of the functions U₀, ∂U₀/∂x, ∂U₀/∂t, ∂²U₀/∂x², then
\[
\int_{|\xi-x|>\delta} f(\tau,\xi)\,V(t,x;\tau,\xi)\,d\xi
\]
converges for some δ > 0 uniformly and absolutely.

For example, 2′) is certainly satisfied if the above mentioned functions are
majorized by a power of ϕ(t, x) (cf. (29)).
Now we set for every integer n ≥ 0 (of course, always in the region (37)):
\[
U_{n+1}(t,x;\tau,\xi) = \int_t^{\tau} dp \int_{-\infty}^{+\infty} \Bigl\{\lambda(p,q)\,\frac{\partial U_n(p,q;\tau,\xi)}{\partial q} + c(p,q)\,U_n(p,q;\tau,\xi)\Bigr\}\,U_0(t,x;p,q)\,dq, \tag{47}
\]
and study the functions U_n(t, x; τ, ξ) first as functions of t, x for arbitrary but
fixed τ, ξ.
First we consider U₁(t, x; τ, ξ). If we remove from the domain of integration
in (47) any neighbourhood of the point (τ, ξ), then U₀(p, q; τ, ξ) and
∂U₀(p, q; τ, ξ)/∂q are bounded on the remaining domain; because of (40), the
integral from (47), restricted on the remaining domain, converges for n = 0
absolutely and uniformly in t, x. Thus, we have only to study convergence lo-

pp. 343 f.].

rected.



cally in a neighbourhood of (τ, ξ). For this, we introduce in (47) again (cf. (44))
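new integration variables,
\[
\tau - p = \alpha, \qquad \varphi(\tau,\xi) - \varphi(p,q) = \sqrt{2\alpha}\,\beta. \tag{48}
\]
¶ If, for example, |λ| < K, 1/√a < K, then we get ¶ 130
\[
\begin{aligned}
\Bigl|\int_t^{\tau} dp \int_{-\infty}^{+\infty} \lambda(p,q)\,\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\,U_0(t,x;p,q)\,dq\Bigr|
&< \frac{K^3}{4\pi}\int_0^{\tau-t} d\alpha \int_{-\infty}^{+\infty} \frac{|\beta|}{\sqrt{\alpha(\tau-t-\alpha)}}\,e^{-\frac12\beta^2}\,d\beta \\
&= \frac{K^3}{2\pi}\int_0^{\tau-t} \frac{d\alpha}{\sqrt{\alpha(\tau-t-\alpha)}} \\
&= \frac{K^3}{2\pi}\int_0^1 \frac{ds}{\sqrt{s(1-s)}} = \frac{K^3}{2\pi}\,B\bigl(\tfrac12,\tfrac12\bigr)
\end{aligned}
\]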

where B(p, q) denotes Euler’s integral of the first kind, cf. [14, p. 253]. In the
same way we get (a fortiori) the boundedness of the other term in (47) for
n = 0, and so we have uniformly in t, x

(49) |U1 (t, x; τ, ξ)| < M.

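Further, we want to show that
\[
\frac{\partial U_1(t,x;\tau,\xi)}{\partial x} = \int_t^{\tau} dp \int_{-\infty}^{+\infty} \Bigl\{\lambda(p,q)\,\frac{\partial U_0(p,q;\tau,\xi)}{\partial q} + c(p,q)\,U_0(p,q;\tau,\xi)\Bigr\}\,\frac{\partial U_0(t,x;p,q)}{\partial x}\,dq. \tag{50}
\]
In order to see that the integral on the right-hand side converges, we split the
interval (t, τ) by, say, t′ = ½(t + τ) into two parts. Observing that |z e^{−z²/4}| < 1
we obtain in the second part using the substitution (48):
\[
\begin{aligned}
\Bigl|\int_{\frac12(t+\tau)}^{\tau} dp \int_{-\infty}^{+\infty} \lambda(p,q)\,\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\,\frac{\partial U_0(t,x;p,q)}{\partial x}\,dq\Bigr|
&< \frac{K^4}{8\pi}\int_0^{\frac12(\tau-t)} d\alpha \int_{-\infty}^{+\infty} \frac{|\beta|}{\sqrt{\alpha}\,(\tau-t-\alpha)}\,e^{-\frac12\beta^2}\,d\beta \\
&= \frac{K^4}{4\pi}\int_0^{\frac12(\tau-t)} \frac{d\alpha}{\sqrt{\alpha}\,(\tau-t-\alpha)} < \frac{K^4}{\sqrt{\tau-t}}.
\end{aligned}
\]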



The same method yields for the second term in (50) and the same subinterval,
of course, an even better estimate. In the first half of the interval we obtain
the same estimates using the substitution
\[
p - t = \alpha, \qquad \varphi(p,q) - \varphi(t,x) = \sqrt{2\alpha}\,\beta; \tag{51}
\]
altogether we get the following estimate which is again uniform in t, x
\[
\Bigl|\frac{\partial U_1(t,x;\tau,\xi)}{\partial x}\Bigr| < \frac{M'}{\sqrt{\tau-t}}. \tag{52}
\]
¶ 131 ¶ For our purposes it is more convenient to combine the inequalities (49) and
(52); if we restrict our considerations to an arbitrary but finite t-interval, then
we can write (49) also in the form
\[
|U_1(t,x;\tau,\xi)| < \frac{M'}{\sqrt{\tau-t}}. \tag{53}
\]

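For the estimate of U_n(t, x; τ, ξ) with n > 1 we use induction. Assume that
for some n ≥ 1 we have shown that
\[
|U_n(t,x;\tau,\xi)| < \frac{N^{2n}}{\Gamma\bigl(\frac n2\bigr)}\bigl(\sqrt{\tau-t}\bigr)^{n-2}, \qquad
\Bigl|\frac{\partial U_n(t,x;\tau,\xi)}{\partial x}\Bigr| < \frac{N^{2n}}{\Gamma\bigl(\frac n2\bigr)}\bigl(\sqrt{\tau-t}\bigr)^{n-2}; \tag{54}
\]
here the constant N > 1 is already chosen so large that in the t-interval under
consideration
\[
\frac{1}{\sqrt a} < N, \qquad |\lambda| + |c| < N, \qquad 2\sqrt{\tau-t} < N
\]
holds. Then the substitution (51), applied to (47), immediately proves the
claim. We obtain, for example,
\[
\begin{aligned}
\Bigl|\frac{\partial U_{n+1}(t,x;\tau,\xi)}{\partial x}\Bigr|
&< \frac{N^{2n+1}}{\Gamma\bigl(\frac n2\bigr)}\int_t^{\tau} dp \int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(t,x;p,q)}{\partial x}\Bigr|\,\bigl(\sqrt{\tau-p}\bigr)^{n-2}\,dq \\
&< \frac{N^{2n+2}}{\Gamma\bigl(\frac n2\bigr)\cdot 2\sqrt{\pi}}\int_0^{\tau-t} d\alpha\,\bigl(\sqrt{\tau-t-\alpha}\bigr)^{n-2} \int_{-\infty}^{+\infty} \frac{|\beta|}{\sqrt{\alpha}}\,e^{-\frac12\beta^2}\,d\beta \\
&= \frac{N^{2n+2}}{\Gamma\bigl(\frac n2\bigr)\sqrt{\pi}}\int_0^{\tau-t} \frac{\bigl(\sqrt{\tau-t-\alpha}\bigr)^{n-2}}{\sqrt{\alpha}}\,d\alpha \\
&= \frac{N^{2n+2}}{\Gamma\bigl(\frac n2\bigr)\sqrt{\pi}}\bigl(\sqrt{\tau-t}\bigr)^{n-1}\int_0^1 \frac{\bigl(\sqrt{1-s}\bigr)^{n-2}}{\sqrt{s}}\,ds \\
&= \frac{N^{2n+2}}{\Gamma\bigl(\frac n2\bigr)\sqrt{\pi}}\,B\bigl(\tfrac12,\tfrac n2\bigr)\bigl(\sqrt{\tau-t}\bigr)^{n-1}
 = \frac{N^{2n+2}}{\Gamma\bigl(\frac{n+1}{2}\bigr)}\bigl(\sqrt{\tau-t}\bigr)^{n-1}.
\end{aligned}
\]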

Because of (53) we see that (54) indeed holds for every n > 0.
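
The last step of the induction rests on the identity B(½, n/2) = √π Γ(n/2)/Γ((n+1)/2). The following numerical check is an added sketch, not part of the original. The substitution s = sin²θ and the midpoint rule are choices made here to avoid the endpoint singularities of the integrand.

```python
import math


def beta_half(n, m=20000):
    """Midpoint-rule value of B(1/2, n/2) = int_0^1 (1 - s)^(n/2 - 1) s^(-1/2) ds.

    With s = sin(theta)^2 the integral becomes 2 * int_0^{pi/2} cos(theta)^(n-1) d(theta),
    which has a bounded integrand for every integer n >= 1.
    """
    h = 0.5 * math.pi / m
    return 2.0 * h * sum(math.cos((k + 0.5) * h) ** (n - 1) for k in range(m))


def gamma_form(n):
    """sqrt(pi) * Gamma(n/2) / Gamma((n+1)/2), the closed form used in the estimate."""
    return math.sqrt(math.pi) * math.gamma(0.5 * n) / math.gamma(0.5 * (n + 1))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, beta_half(n), gamma_form(n))
```

For n = 1 both expressions reduce to B(½, ½) = π, the value that appears in the bound for U₁.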



Now we set
\[
V(t,x;\tau,\xi) = \sum_{n=1}^{\infty} U_n(t,x;\tau,\xi), \qquad
U(t,x;\tau,\xi) = U_0(t,x;\tau,\xi) + V(t,x;\tau,\xi); \tag{55}
\]
¶ by (54) this series converges absolutely and uniformly and may be differentiated
term-by-term. ¶ 132 Therefore, we have by (47)
\[
V(t,x;\tau,\xi) = \int_t^{\tau} dp \int_{-\infty}^{+\infty} \Bigl\{\lambda(p,q)\,\frac{\partial U(p,q;\tau,\xi)}{\partial q} + c(p,q)\,U(p,q;\tau,\xi)\Bigr\}\,U_0(t,x;p,q)\,dq. \tag{56}
\]

From this we conclude (cf. (43)) that for (t, x) ≠ (τ, ξ)
\[
M(V(t,x;\tau,\xi)) = -\lambda(t,x)\,\frac{\partial U(t,x;\tau,\xi)}{\partial x} - c(t,x)\,U(t,x;\tau,\xi) \tag{57}
\]
holds; in the lemma stated in no. 3 it was assumed that g(t, x) is bounded;
nevertheless, we may remove from the integration domain in (56) some
neighbourhood of (τ, ξ) and apply the lemma for the integral over the remaining
domain. In some neighbourhood of (τ, ξ) the function U₀(t, x; p, q) is a regular
solution of M(u) = 0, and the same is true for the integral over this
neighbourhood, which proves (57). Using (55) and (45) we can write (57) also as
\[
L(U(t,x;\tau,\xi)) = 0.
\]
Thus, U (t, x; τ, ξ) is, as a function of t, x, a solution of (28); it is the required
fundamental solution.
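
The convergence of the series in (55) comes from the Gamma function in the denominator of (54), which eventually beats the geometric growth of N²ⁿ. The sketch below is an added illustration, not part of the original: it sums the majorant terms for sample values N = 2 and τ − t = 1 (arbitrary assumptions) and compares two partial sums. Logarithms are used so that the Gamma function cannot overflow.

```python
import math


def majorant_term(n, N, h):
    """n-th bound N^(2n) / Gamma(n/2) * sqrt(h)^(n-2) from (54), with h = tau - t."""
    log_term = 2 * n * math.log(N) + 0.5 * (n - 2) * math.log(h) - math.lgamma(0.5 * n)
    return math.exp(log_term)


def majorant_sum(n_max, N=2.0, h=1.0):
    """Partial sum of the majorant series for n = 1, ..., n_max."""
    return sum(majorant_term(n, N, h) for n in range(1, n_max + 1))


if __name__ == "__main__":
    s200 = majorant_sum(200)
    s400 = majorant_sum(400)
    print(s200, s400 - s200)   # the tail beyond n = 200 is negligible
```

The partial sums stabilise long before n = 200, in line with the absolute and uniform convergence claimed for (55).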

4. Let us finally show that for every δ > 0
\[
\lim_{t\to\tau}\int_{-\infty}^{+\infty} U(t,x;\tau,\xi)\,dx = 1, \qquad
\lim_{t\to\tau}\int_{|x-\xi|>\delta} U(t,x;\tau,\xi)\,dx = 0 \tag{58}
\]
hold.
An application of the substitution (39), this time, however, for fixed t, τ, ξ,
immediately gives
\[
\int_{-\infty}^{+\infty} U_0(t,x;\tau,\xi)\,dx = \frac{1}{\sqrt{2\pi}}\,\frac{1}{\sqrt{a(\tau,\xi)}}\int_{-\infty}^{+\infty} \sqrt{\bar a(y;t,\tau,\xi)}\;e^{-\frac12 y^2}\,dy, \tag{59}
\]
where ā(y; t, τ, ξ) denotes the function resulting from a(t, x) under the
substitution. As t → τ, we have in every finite y-interval that x → ξ, hence
ā(y; t, τ, ξ) → a(τ, ξ), and so we find from (59)
\[
\lim_{t\to\tau}\int_{-\infty}^{+\infty} U_0(t,x;\tau,\xi)\,dx = 1; \tag{60}
\]



the same argument also shows that
\[
\lim_{t\to\tau}\int_{|x-\xi|>\delta} U_0(t,x;\tau,\xi)\,dx = 0 \tag{61}
\]
¶ 133 ¶ holds. Finally we may integrate in (56) with respect to x under the
integral sign; since the integral converges we get
\[
\lim_{t\to\tau}\int_{-\infty}^{+\infty} V(t,x;\tau,\xi)\,dx = 0. \tag{62}
\]
The relations (60)–(62) together with (55) yield the claim (58).
In conclusion we thus have:
If the coefficients of the differential equation (28) satisfy in the range (37)
the condition A of pp. 128 f., then there exists a fundamental solution

U (t, x; τ, ξ) = U0 (t, x; τ, ξ) + V (t, x; τ, ξ)

of (28) with bounded V (t, x; τ, ξ) satisfying the relations (58). The quantities
U0 and V are defined by (38), (47) and (55), and the series appearing in (55)
converges absolutely and uniformly in t, x, and may be differentiated term-by-
term in x.
In exactly the same fashion we can, of course, construct a fundamental
solution U*(t, x; τ*, ξ*) with corresponding properties for the equation (see editors' note h)
\[
-u_t + a(t,x)\,u_{xx} + b(t,x)\,u_x + c(t,x)\,u = 0, \qquad (a > 0),
\]
where, necessarily, τ* < t; the construction starts with
\[
U_0^*(t,x;\tau^*,\xi^*) = U_0(\tau^*,\xi^*;t,x);
\]

the remainder of the proof can be literally carried over.

5. Now we consider, in particular, the differential equation adjoint to (28)
\[
L^*(u) \equiv -u_t + (au)_{xx} - (bu)_x + cu \equiv M^*(u) - (\lambda u)_x + cu = 0 \tag{63}
\]
(cf. (31), (32)). It is still assumed that the coefficients of L(u) satisfy the
condition A of pp. 128 f.; obviously, the coefficients of L*(u) satisfy the same
condition, too. Therefore, there exists for τ* < t a fundamental solution
U*(t, x; τ*, ξ*) of (63) in the sense of the theorem stated in no. 4. We want to
prove the fundamental symmetry property
\[
U(t,x;\tau,\xi) = U^*(\tau,\xi;t,x), \qquad (t < \tau). \tag{64}
\]


h The original contains in the following equation the misprint “+c(t, x)” which has been



To do so, we note that for any pair of functions u(t, x), v(t, x), which are
sufficiently regular such that all integrals make sense, we have the following
identity:
\[
\int_{t'}^{t''} dt \int_{-\infty}^{+\infty} \bigl[vL(u) - uL^*(v)\bigr]\,dx = \int_{-\infty}^{+\infty} \bigl[u(t'',x)\,v(t'',x) - u(t',x)\,v(t',x)\bigr]\,dx \tag{65}
\]
(one verifies (65) by integrating vL(u) by parts twice). ¶ Assume that ¶ 134
\[
t_0 < \tau^* < t' < t'' < \tau < t_1,
\]
and set, in particular,
\[
u(t,x) = U(t,x;\tau,\xi), \qquad v(t,x) = U^*(t,x;\tau^*,\xi^*).
\]
Clearly, all integrals appearing above converge. In the interval t′ ≤ t ≤ t″
the functions u and v are regular, so that the left-hand side of (65) vanishes
identically, and thus (65) shows that the integral of u(t, x)v(t, x) is independent
of t in the interval τ* < t < τ:
\[
\int_{-\infty}^{+\infty} U(t,x;\tau,\xi)\,U^*(t,x;\tau^*,\xi^*)\,dx = \psi(\tau,\xi;\tau^*,\xi^*). \tag{66}
\]
If we let t → τ, then by (58) the left-hand side obviously converges to U*(τ, ξ; τ*, ξ*), i.e. one has
\[
\psi(\tau,\xi;\tau^*,\xi^*) = U^*(\tau,\xi;\tau^*,\xi^*). \tag{67}
\]
For t → τ* one obtains similarly
\[
\psi(\tau,\xi;\tau^*,\xi^*) = U(\tau^*,\xi^*;\tau,\xi); \tag{68}
\]
together, (67) and (68) yield (64). Inserting this into (66) one obtains the
fundamental relation (cf. (9))
\[
\int_{-\infty}^{+\infty} U(\tau^*,\xi^*;t,x)\,U(t,x;\tau,\xi)\,dx = U(\tau^*,\xi^*;\tau,\xi) \tag{69}
\]
for τ* < t < τ. Therefore, we have

The fundamental solution appearing in the theorem of no. 4 is differentiable
in τ, ξ and, as a function of τ, ξ, it solves the adjoint equation L*(u) = 0; it
satisfies (69).
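
The identity (65) on which this argument rests can be checked term by term. The following computation is an added sketch, not part of the original; it uses only the definitions of L in (28) and L* in (63), and it assumes that the boundary terms at x = ±∞ vanish, which is what "sufficiently regular" is taken to mean here.

```latex
% Pointwise form of (65), with L as in (28) and L^* as in (63).
\begin{align*}
v\,u_t + u\,v_t &= (uv)_t,\\
v\,a\,u_{xx} - u\,(av)_{xx} &= \bigl(a\,v\,u_x - u\,(av)_x\bigr)_x,\\
v\,b\,u_x + u\,(bv)_x &= (b\,u\,v)_x,\\
v\,c\,u - u\,c\,v &= 0,
\end{align*}
and therefore
\[
vL(u) - uL^*(v) = (uv)_t + \bigl(a\,v\,u_x - u\,(av)_x + b\,u\,v\bigr)_x .
\]
```

Integrating over x removes the x-derivative under the stated assumption, and integrating over t from t′ to t″ gives (65).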

6. Using known arguments, the existence of our fundamental solution
immediately implies the existence and uniqueness of a solution to the initial
corrected.



value problem mentioned at the beginning of this paragraph. Let u(t, x) be
a bounded solution of L(u) = 0 which converges as t → τ to the continuous
initial datum g(x). Inserting this u(t, x) into (65) and taking for v(t, x) the
fundamental solution U*(t, x; τ*, ξ*) = U(τ*, ξ*; t, x) with τ* < t < τ, we
get
\[
\int_{-\infty}^{+\infty} U^*(t',x;\tau^*,\xi^*)\,u(t',x)\,dx = \int_{-\infty}^{+\infty} U(\tau^*,\xi^*;t'',x)\,u(t'',x)\,dx,
\]
and from this, using (58) (of course, applied to U*(t′, x; τ*, ξ*)) we get as
t″ → τ, t′ → τ*
\[
u(\tau^*,\xi^*) = \int_{-\infty}^{+\infty} g(x)\,U(\tau^*,\xi^*;\tau,x)\,dx. \tag{70}
\]

¶ 135 ¶ Our solution necessarily is of the form (70) and, conversely, (70) is a solution
of the problem. Therefore, (70) represents the only bounded solution u(t, x) of
L(u) = 0 which is defined for t < τ and which converges as t → τ to the continu-
ous initial datum g(x). – It is almost evident that the assumed boundedness of
the solution can easily be replaced by more general conditions (cf. no. 3). The
argument also remains valid if g(x) is discontinuous, but of bounded variation;
then, the initial values are only attained if the approximation takes place at
continuity points. A corresponding theorem holds, of course, for the equation
L∗ (u) = 0.
Finally we note for later use (in § 5): If f(t, x) is continuously differentiable
and bounded, then
\[
u(t,x) = \int_t^{\tau} dp \int_{-\infty}^{+\infty} f(p,q)\,U(t,x;p,q)\,dq \tag{71}
\]

is for t < τ the only bounded solution of L(u) = −f (t, x) which tends to zero
as t → τ . The fact that (71) is indeed a solution to L(u) = −f follows almost
literally from the arguments in the proof of no. 2, p. 128. That it is the only
solution follows since, otherwise, there would exist a not identically vanishing
solution of L(u) = 0 which tends to zero as t → τ .

§3. Further Properties of the Fundamental Solution. Continuous Stochastic Processes.

Per se, it would be enough to consider in this paragraph the differential equation
(28) with c(t, x) = 0; it does, however, not add any complication if we
include the case c ≠ 0, and doing so, the interrelations become clearer. Therefore,
we continue to consider the general equation (28) and assume that the
coefficients satisfy condition A of § 2, pp. 128 f.



1. Our next aim is to prove that (11)–(13) hold for the function F(t, x; τ, ξ)
whose frequency function is the fundamental solution U(t, x; τ, ξ) in the sense
of the theorem stated in § 2, 4. For this we begin with some estimates for
U₀(t, x; τ, ξ) (cf. (38)).
First we show that for every δ > 0
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|>\delta} U_0(t,x;\tau,\xi)\,d\xi = 0 \tag{72}
\]
and
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|<\delta} (\xi-x)^2\,U_0(t,x;\tau,\xi)\,d\xi = 2a(\tau,x) \tag{73}
\]

¶ hold, uniformly in every finite domain (observe that, in contrast to (60)–(61),
we integrate with respect to ξ!). ¶ 136 For the proof we introduce, for fixed
t, x, τ, a new integration variable y by
\[
y = \frac{\varphi(\tau,\xi)-\varphi(t,x)}{\sqrt{2(\tau-t)}}. \tag{74}
\]
Since ϕ(τ, ξ) has a positive continuous derivative ϕ_ξ(τ, ξ) (cf. (29)), there is a
number M > 0 such that for |ξ − x| > δ and sufficiently small values of τ − t
\[
|y| > \frac{M}{\sqrt{\tau-t}},
\]
and M has in every bounded region of the x, t-plane a positive lower bound.
Therefore,
\[
\int_{|\xi-x|>\delta} U_0(t,x;\tau,\xi)\,d\xi \le \frac{1}{\sqrt{2\pi}}\int_{|y|>\frac{M}{\sqrt{\tau-t}}} e^{-\frac12 y^2}\,dy \le \frac{\tau-t}{M^2\sqrt{2\pi}}\int_{|y|>\frac{M}{\sqrt{\tau-t}}} y^2 e^{-\frac12 y^2}\,dy,
\]
and this yields (72). For the proof of (73) we rewrite (74) as
\[
\begin{aligned}
y &= \frac{\varphi(\tau,\xi)-\varphi(\tau,x)}{\sqrt{2(\tau-t)}} + \frac{\varphi(\tau,x)-\varphi(t,x)}{\sqrt{2(\tau-t)}} \\
&= (\xi-x)\,\frac{1}{\sqrt{a(\tau,x')}}\,\frac{1}{\sqrt{2(\tau-t)}} + \eta\,\sqrt{2(\tau-t)},
\end{aligned}
\]
where η is bounded and x′ denotes a point satisfying x − δ < x′ < x + δ. This
and the above estimate for the integration limits yield
\[
\int_{|\xi-x|<\delta} (\xi-x)^2\,U_0(t,x;\tau,\xi)\,d\xi = \frac{\tau-t}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} 2a(\tau,x')\,y^2 e^{-\frac12 y^2}\,dy + o(\tau-t); \tag{75}
\]
denote by m and M the minimum and maximum, respectively, of a(τ, x′)
in the interval x − δ < x′ < x + δ; because of (1/√(2π)) ∫_{−∞}^{+∞} y² e^{−y²/2} dy = 1 all



accumulation points of the left-hand side of (75) are between 2m and 2M as
t → τ . On the other hand, (72) holds for every δ > 0 so that the accumulation
points cannot depend on δ. If δ → 0, we get m, M → a(τ, x), and this proves
(73).
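
The limits (72) and (73) can be observed numerically for a concrete coefficient. The sketch below is an added illustration under the same assumption as before, which is not from the original: the time-independent coefficient a(x) = (2 + cos x)⁻² with ϕ(x) = 2x + sin x, so that U₀ depends on t and τ only through s = τ − t. The function names, δ = 0.5 and the grid sizes are choices made for the example.

```python
import math


def a(x):
    """Assumed time-independent coefficient a(x) = (2 + cos x)^(-2)."""
    return (2.0 + math.cos(x)) ** -2


def phi(x):
    """phi(x) = int_0^x dy / sqrt(a(y)) = 2x + sin x, cf. (29)."""
    return 2.0 * x + math.sin(x)


def U0(s, x, xi):
    """Formula (38) with s = tau - t; here 1 / sqrt(a(xi)) = 2 + cos(xi)."""
    d = phi(xi) - phi(x)
    return (2.0 + math.cos(xi)) * math.exp(-d * d / (4.0 * s)) / (2.0 * math.sqrt(math.pi * s))


def truncated_second_moment(s, x, delta=0.5, n=40000):
    """(1/s) * integral over |xi - x| < delta of (xi - x)^2 U0, cf. (73)."""
    h = 2.0 * delta / n
    total = 0.0
    for k in range(n + 1):
        xi = x - delta + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * (xi - x) ** 2 * U0(s, x, xi)
    return h * total / s


def tail_mass(s, x, delta=0.5, width=6.0, n=12000):
    """(1/s) * integral over delta < |xi - x| < delta + width of U0, cf. (72)."""
    h = width / n
    total = 0.0
    for k in range(n + 1):
        w = 0.5 if k in (0, n) else 1.0
        total += w * (U0(s, x, x + delta + k * h) + U0(s, x, x - delta - k * h))
    return h * total / s


if __name__ == "__main__":
    print(truncated_second_moment(1e-4, 0.3), 2.0 * a(0.3))
    print(tail_mass(1e-3, 0.3))
```

For small τ − t the truncated second moment divided by τ − t is close to 2a(x), and the normalised tail mass is negligible.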
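¶ 137 ¶ The same method also yields the following estimates where K denotes
some constant satisfying 1/√a(t, x) < K:
\[
\int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(t,x;\tau,\xi)}{\partial x}\Bigr|\,d\xi < \frac{K}{2\sqrt{\pi}}\,\frac{1}{\sqrt{\tau-t}}\int_{-\infty}^{+\infty} |y|\,e^{-\frac12 y^2}\,dy = \frac{K}{\sqrt{\pi}}\,\frac{1}{\sqrt{\tau-t}}; \tag{76}
\]
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|>\delta} \Bigl|\frac{\partial U_0(t,x;\tau,\xi)}{\partial x}\Bigr|\,d\xi = 0; \tag{77}
\]
\[
\lim_{t\to\tau}\frac{1}{\sqrt{\tau-t}}\int_{|\xi-x|<\delta} (\xi-x)^2\,\Bigl|\frac{\partial U_0(t,x;\tau,\xi)}{\partial x}\Bigr|\,d\xi \le 4K\cdot a(\tau,x). \tag{78}
\]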

2. Now we turn to the corresponding estimates for U₁(t, x; τ, ξ) (cf. (47)).
The constant K > 1 is chosen so large that
\[
\frac{1}{\sqrt{a(t,x)}} < K, \qquad a(t,x) < K, \qquad 2|c(t,x)| < K, \qquad 2|\lambda(t,x)| < K;
\]
moreover, we restrict the following considerations to the interval τ − t < K.
For the further investigations we decompose U₁ (cf. (47))
\[
U_1(t,x;\tau,\xi) = U^{1}(t,x;\tau,\xi) + U^{2}(t,x;\tau,\xi) \tag{79}
\]
where
\[
\begin{aligned}
U^{1}(t,x;\tau,\xi) &= \int_t^{\tau} dp \int_{-\infty}^{+\infty} \lambda(p,q)\,\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\,U_0(t,x;p,q)\,dq, \\
U^{2}(t,x;\tau,\xi) &= \int_t^{\tau} dp \int_{-\infty}^{+\infty} c(p,q)\,U_0(p,q;\tau,\xi)\,U_0(t,x;p,q)\,dq.
\end{aligned} \tag{80}
\]

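Integrating (80) under the integral sign and enlarging the domain of integration
given by δ > 0 yields:
\[
\begin{aligned}
\frac{1}{\tau-t}\int_{|\xi-x|>\delta} |U^{1}(t,x;\tau,\xi)|\,d\xi
&< \frac{K}{\tau-t}\int_t^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\,dq \int_{|\xi-x|>\delta} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi \\
&< \frac{K}{\tau-t}\int_t^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\,dq \int_{|\xi-q|>\frac12\delta} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi \\
&\quad + \frac{K}{\tau-t}\int_t^{\tau} dp \int_{|q-x|>\frac12\delta} U_0(t,x;p,q)\,dq \int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi.
\end{aligned} \tag{81}
\]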



It goes almost without saying that the following limits under the integral sign
are admissible although ¶ the interval of integration in q is infinite; ¶ 138 trivially,
the contribution of the part where |q| > M will tend to zero as M → ∞. In
every finite domain we will use the limits (72), (73), (77) and (78) which exist
uniformly.
By (77) and 1/(τ − t) < 1/(τ − p) one has for sufficiently small τ − t:
\[
\frac{1}{\tau-t}\int_t^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\,dq \int_{|\xi-q|>\frac12\delta} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi
< \int_t^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\,dq = (\tau-t) \tag{82}
\]

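(cf. (40)). Using (76), and then (72), we get for sufficiently small τ − t
\[
\begin{aligned}
\frac{1}{\tau-t}\int_t^{\tau} dp \int_{|q-x|>\frac12\delta} U_0(t,x;p,q)\,dq \int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi
&< K\int_t^{\tau} \frac{dp}{\sqrt{\tau-p}}\,\frac{1}{p-t}\int_{|q-x|>\frac12\delta} U_0(t,x;p,q)\,dq \\
&< K\sqrt{\tau-t};
\end{aligned} \tag{83}
\]
(82) and (83) give, in view of (81),
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|>\delta} |U^{1}(t,x;\tau,\xi)|\,d\xi = 0.
\]
It is clear that the same argument (even simpler and a fortiori) yields the
corresponding relation for U²(t, x; τ, ξ), too. Combining everything, one has
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|>\delta} |U_1(t,x;\tau,\xi)|\,d\xi = 0. \tag{84}
\]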

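Moreover, by (80) one has:
\[
\begin{aligned}
\frac{1}{\tau-t}\int_{|\xi-x|<\delta} (\xi-x)^2\,|U^{1}(t,x;\tau,\xi)|\,d\xi
&< \frac{K}{\tau-t}\int_t^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\,dq \int_{|\xi-x|<\delta} (\xi-x)^2\,\Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi \\
&< \frac{K\delta^2}{\tau-t}\int_t^{\tau} dp \int_{|q-x|>\delta} U_0(t,x;p,q)\,dq \int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi \\
&\quad + \frac{K}{\tau-t}\int_t^{\tau} dp \int_{|q-x|<\delta} U_0(t,x;p,q)\,dq \int_{|\xi-q|<2\delta} (\xi-x)^2\,\Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi.
\end{aligned} \tag{85}
\]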



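¶ 139 ¶ Using (76), and then (72), we have
\[
\begin{aligned}
\frac{1}{\tau-t}\int_t^{\tau} dp \int_{|q-x|>\delta} U_0(t,x;p,q)\,dq \int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi
&< K\int_t^{\tau} \frac{dp}{\sqrt{\tau-p}}\cdot\frac{1}{p-t}\int_{|q-x|>\delta} U_0(t,x;p,q)\,dq \to 0.
\end{aligned} \tag{86}
\]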

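Since (ξ − x)² ≤ 2(ξ − q)² + 2(q − x)² we get, using (78), (76), (73) and (40),
for sufficiently small τ − t:
\[
\begin{aligned}
&\frac{1}{\tau-t}\int_t^{\tau} dp \int_{|q-x|<\delta} U_0(t,x;p,q)\,dq \int_{|\xi-q|<2\delta} (\xi-x)^2\,\Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi \\
&\quad< \frac{2}{\tau-t}\int_t^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\,dq \int_{|\xi-q|<2\delta} (\xi-q)^2\,\Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi \\
&\qquad + \frac{2}{\tau-t}\int_t^{\tau} dp \int_{|q-x|<\delta} (q-x)^2\,U_0(t,x;p,q)\,dq \int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\Bigr|\,d\xi \\
&\quad< \frac{8K}{\tau-t}\int_t^{\tau} \sqrt{\tau-p}\;dp \int_{-\infty}^{+\infty} a(\tau,q)\,U_0(t,x;p,q)\,dq \\
&\qquad + 2K\int_t^{\tau} \frac{dp}{\sqrt{\tau-p}}\cdot\frac{1}{p-t}\int_{|q-x|<\delta} (q-x)^2\,U_0(t,x;p,q)\,dq \\
&\quad< 15K^2\sqrt{\tau-t}.
\end{aligned} \tag{87}
\]
(85)–(87) yield:
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|<\delta} (\xi-x)^2\,|U^{1}(t,x;\tau,\xi)|\,d\xi = 0.
\]
The same estimates yield again, a fortiori, the corresponding relation for
U²(t, x; τ, ξ), and so
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|<\delta} (\xi-x)^2\,|U_1(t,x;\tau,\xi)|\,d\xi = 0. \tag{88}
\]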

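3. The remaining estimates which we need are now easily derived. Assume
that for some n ≥ 0 and some constant M > 0 we have proved that
\[
\begin{aligned}
\int_{-\infty}^{+\infty} |U_n(t,x;\tau,\xi)|\,d\xi &< M\,\frac{K^{2n-2}\,(\tau-t)^{\frac{n-1}{2}}}{\Gamma\bigl(\frac{n+1}{2}\bigr)}, \\
\int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_n(t,x;\tau,\xi)}{\partial x}\Bigr|\,d\xi &< M\,\frac{K^{2n-2}\,(\tau-t)^{\frac{n-1}{2}}}{\Gamma\bigl(\frac{n+1}{2}\bigr)}.
\end{aligned} \tag{89}
\]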



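¶ Then we get from (47), using (40) and (76), respectively, and the constant
K as at the beginning of no. 2, immediately that ¶ 140
\[
\begin{aligned}
\int_{-\infty}^{+\infty} |U_{n+1}(t,x;\tau,\xi)|\,d\xi
&< \frac{MK^{2n-1}}{\Gamma\bigl(\frac{n+1}{2}\bigr)}\int_t^{\tau} (\tau-p)^{\frac{n-1}{2}}\,dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\,dq \\
&= \frac{MK^{2n-1}}{\Gamma\bigl(\frac{n+1}{2}\bigr)}\cdot\frac{2}{n+1}\,(\tau-t)^{\frac{n+1}{2}}
 = \frac{MK^{2n-1}}{\Gamma\bigl(\frac{n+3}{2}\bigr)}\,(\tau-t)^{\frac{n+1}{2}}
 < \frac{MK^{2n}}{\Gamma\bigl(\frac{n+2}{2}\bigr)}\,(\tau-t)^{\frac n2},
\end{aligned} \tag{90}
\]
and
\[
\begin{aligned}
\int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_{n+1}(t,x;\tau,\xi)}{\partial x}\Bigr|\,d\xi
&< \frac{MK^{2n-1}}{\Gamma\bigl(\frac{n+1}{2}\bigr)}\int_t^{\tau} (\tau-p)^{\frac{n-1}{2}}\,dp \int_{-\infty}^{+\infty} \Bigl|\frac{\partial U_0(t,x;p,q)}{\partial x}\Bigr|\,dq \\
&< \frac{MK^{2n}}{\sqrt{\pi}\,\Gamma\bigl(\frac{n+1}{2}\bigr)}\int_t^{\tau} \frac{(\tau-p)^{\frac{n-1}{2}}}{\sqrt{p-t}}\,dp \\
&= \frac{MK^{2n}}{\sqrt{\pi}\,\Gamma\bigl(\frac{n+1}{2}\bigr)}\,(\tau-t)^{\frac n2}\int_0^1 \frac{(1-s)^{\frac{n-1}{2}}}{\sqrt{s}}\,ds \\
&= \frac{MK^{2n}}{\sqrt{\pi}\,\Gamma\bigl(\frac{n+1}{2}\bigr)}\,(\tau-t)^{\frac n2}\,B\bigl(\tfrac12,\tfrac{n+1}{2}\bigr)
 = \frac{MK^{2n}\,(\tau-t)^{\frac n2}}{\Gamma\bigl(\frac{n+2}{2}\bigr)}.
\end{aligned} \tag{91}
\]
Because of (40) and (76) the inequalities (89) hold for all n ≥ 0. Moreover,
the penultimate of the inequalities in (90) also yields for all n
\[
\int_{-\infty}^{+\infty} |U_n(t,x;\tau,\xi)|\,d\xi < \frac{MK^{2n-3}}{\Gamma\bigl(\frac{n+2}{2}\bigr)}\,(\tau-t)^{\frac n2}. \tag{92}
\]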

Thus, there is a constant N such that
\[
\frac{1}{\tau-t}\,\Bigl|\int_{-\infty}^{+\infty} \sum_{n=3}^{\infty} U_n(t,x;\tau,\xi)\,d\xi\Bigr| \le N\sqrt{\tau-t} \tag{93}
\]
holds. A separate investigation is only needed for the case where n = 2. But
the estimates leading to (84) yield in exactly the same way, and even easier,
the relation
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|>\delta} |U_2(t,x;\tau,\xi)|\,d\xi = 0. \tag{94}
\]



¶ 141 ¶ Moreover, by (92), it follows that

1
(95) (ξ − x)2 |U2 (t, x; τ, ξ)| dξ
τ − t |ξ−x|<δ
 +∞
δ2
≤ |U2 (t, x; τ, ξ)| dξ < δ 2 · M K.
τ − t −∞
According to (94), however, the accumulation points of the left-hand side, as
t → τ , are certainly independent of δ. Thus, (95) yields:

(96) lim (ξ − x)2 |U2 (t, x; τ, ξ)| dξ = 0.
t→τ |ξ−x|<δ

When combined, (72), (84), (94) and (93) give according to (55):
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|>\delta} U(t,x;\tau,\xi)\,d\xi = 0;
\tag{97}
\]

(73), (88), (96) and (93) also yield:
\[
\lim_{t\to\tau}\frac{1}{\tau-t}\int_{|\xi-x|<\delta} (\xi-x)^2\,U(t,x;\tau,\xi)\,d\xi = 2a(\tau,x).
\tag{98}
\]

Finally, summing (92) over n ≥ 1 and adding (40), gives:
\[
\lim_{t\to\tau}\int_{-\infty}^{+\infty} U(t,x;\tau,\xi)\,d\xi = 1;
\tag{99}
\]
if we set
\[
F(t,x;\tau,\xi) = \int_{-\infty}^{\xi} U(t,x;\tau,y)\,dy,
\tag{100}
\]
then we get from (99) and (97) at once that F (t, x; τ, ξ) tends for t → τ and
x ≠ ξ to E(x, ξ) (for the definition cf. (4)), i.e. that (5) holds. The corresponding
relation (6) is equivalent to the two equations (58) if we apply them to
U (τ ∗ , ξ ∗ ; t, x) which we understand as a function of t, x and as the fundamental
solution to the adjoint equation (63). Thus we can collect our results up to
this point in the following way:
The fundamental solution, constructed in § 2, to the general linear parabolic
differential equation (28) yields by (100) a function F (t, x; τ, ξ) which is in the
variable ξ of bounded variation and satisfies the necessary conditions (5), (6),
(7), (11) and (12) for continuous stochastic processes.
In general, F (t, x; τ, ξ) does not define a stochastic process since it need
¶ 142 not be a distribution function in ξ. Moreover, ¶ we have not yet established
that the necessary condition (13) for continuous stochastic processes holds. In
order to derive those two properties, we have to consider the particular type of
the differential equation (15) for continuous stochastic processes, i.e. we have
to set c(t, x) = 0 in the general equation (28).



4. Now we consider, in particular, the equation
\[
u_t + a(t,x)\,u_{xx} + b(t,x)\,u_x = 0 \qquad (a(t,x) > 0),
\tag{101}
\]
and denote by U (t, x; τ, ξ) (t < τ ) its fundamental solution in the sense of
the theorem stated in § 2, 4. First, we show that the function F (t, x; τ, ξ)
defined by (100) is, for equations of the form (101), a distribution function.
For this we rely on Gevrey’s lemma [4, I, p. 373]12 : Assume that in the
differential equation (28) one has c(t, x) ≤ 0 and let u(t, x) be some solution
which is negative at some interior point (t′, x′) of the domain; denote by m(δ)
the minimum of u(t, x) on the semicircle (t − t′)² + (x − x′)² = δ², t ≥ t′; then
one has for sufficiently small δ
\[
m(\delta) \le u(t',x').
\tag{102}
\]

We use this lemma to prove the following assertion which includes also the
case c < 0 for later reference (§ 5):
Assume that c ≤ 0 in the differential equation (28). Then its fundamental
solution U (t, x; τ, ξ) is non-negative.
Assume there were a point (t′ < τ, x′) such that U (t′, x′; τ, ξ) < 0; then
there would exist an interval |ξ − y| < ε where U (t′, x′; τ, y) < 0. Let g(y) be
some continuous function which is nonnegative in, and vanishing outside, that
interval (g(ξ) ≠ 0). We set
\[
u(t,x) = \int_{-\infty}^{+\infty} g(y)\,U(t,x;\tau,y)\,dy,
\]
so that u(t′, x′) < 0 and u is a solution to (101). For t′ < t < τ and x → ±∞
the function u(t, x) obviously tends to zero uniformly and, as t → τ , to the
nonnegative function g(y) (cf. § 2, 6). This clearly contradicts (102).
¶ We have, in particular, for the fundamental solution of (101) ¶ 143

(103) U (t, x; τ, ξ) ≥ 0,

i.e. the function F (t, x; τ, ξ) defined by (100) is monotonically increasing in ξ.


Further we note that the function u = 1 is a solution of (101) and attains,
as t → τ , the initial value 1; according to § 2, 6 this is the only bounded
solution with this initial datum and it is, therefore, given by
\[
1 = \int_{-\infty}^{+\infty} U(t,x;\tau,\xi)\,d\xi.
\tag{104}
\]
12 In order to be self-contained we provide its proof. First we assume that in a neighbourhood
of (t′, x′) we have c < 0. Then (102) holds even with a strict inequality sign; otherwise
we would clearly have at the point (t′, x′)
\[
u_x = 0,\quad u_{xx} \ge 0,\quad u_t \ge 0,\quad cu > 0,
\]
which contradicts (28). If c ≤ 0, then it is sufficient to set z(t, x) = u e^{k(t−t′)}; then z satisfies
an equation of the form (28) where c is replaced by c − k; thus, z satisfies an inequality of



This equation and the just established monotonicity show that F (t, x; τ, ξ) is
for each t < τ and x a distribution function in ξ.
In order to prove that (13) holds, we use the fundamental relation (69)
which gives with Δt > 0
\[
U(t-\Delta t,x;\tau,\xi) = \int_{-\infty}^{+\infty} U(t-\Delta t,x;t,y)\,U(t,y;\tau,\xi)\,dy.
\tag{105}
\]

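This yields, using (104) as in § 1, 2 (p. 119)
\[
\begin{aligned}
\frac{U(t-\Delta t,x;\tau,\xi)-U(t,x;\tau,\xi)}{\Delta t}
 &= U_x(t,x;\tau,\xi)\,\frac{1}{\Delta t}\int_{|x-y|<\delta}(y-x)\,U(t-\Delta t,x;t,y)\,dy\\
 &\quad+\frac12\,U_{xx}(t,x;\tau,\xi)\,\frac{1}{\Delta t}\int_{|x-y|<\delta}\bigl\{(y-x)^2+o(\delta)\bigr\}\,U(t-\Delta t,x;t,y)\,dy\\
 &\quad+\frac{1}{\Delta t}\int_{|x-y|>\delta}\bigl\{U(t,y;\tau,\xi)-U(t,x;\tau,\xi)\bigr\}\,U(t-\Delta t,x;t,y)\,dy.
\end{aligned}
\]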

The expression in the curly braces of the last integral is for every t < τ bounded
and, by (97), the integral tends to zero. The accumulation points of the second
term on the right-hand side are, because of (97), independent of δ; thus this
term converges to a(t, x)Uxx (t, x; τ, ξ) by (98); the left-hand side converges to
−Ut (t, x; τ, ξ). Therefore, from (101) it follows that also
\[
\lim_{\Delta t\to 0}\frac{1}{\Delta t}\int_{|y-x|<\delta}(y-x)\,U(t-\Delta t,x;t,y)\,dy = b(t,x).
\tag{106}
\]

Thus, we have:
The fundamental solution, constructed in § 2, of the differential equation
(101) yields by (100) the transition probabilities F (t, x; τ, ξ) of a continuous
stochastic process satisfying the relations (11), (12), (13).
Alternatively: Assume that the coefficients a(t, x) and b(t, x) satisfy condition A
of pp. 128 f.; then the equation (15) with the ¶ 144 ¶ initial condition (5)
uniquely defines a stochastic process satisfying (11)–(13). Without further assumptions
there is a continuously differentiable frequency function U (t, x; τ, ξ)
which satisfies the adjoint equation (17). Together with the initial condition
(6), the equation (17) is also a unique characterization of the process.

§4. Purely Discontinuous Stochastic Processes.


1. As we have mentioned in the introduction (pp. 114 f.) we want to
treat the special case of purely discontinuous processes separately: On the

the form (102) for any k > 0. Letting k → 0, the claim (102) follows.



one hand, since this is possible under more general conditions, and on the
other, since the theory does not depend on differential equations. Since this
case seems to demand special interest, it is reasonable not to encumber the
presentation with the somewhat intricate arguments of §§ 2 and 3. The general
case will be treated in § 5 along the very same lines, only the ordinary integrals
will be replaced by certain solutions of a parabolic differential equation. Our
treatment will also highlight the essence of the general case and, when treating
this case, it will be enough to indicate briefly the new additions.
According to § 1, 2 we can either start from equation (20) or (21) which
correspond to running the process in negative or positive direction of time. In
both cases the treatment runs parallel and we begin with (21) which is the
more natural choice (in § 3 and, analogously, in § 5 we had to start with the
equation in t, x, since the other equation could only be derived a posteriori).
We assume throughout this paragraph that p(t, x) and P (t, x, ξ) are two
nonnegative functions which are defined for all real x, ξ and t in some (finite
or infinite) interval t0 < t < t1 , continuous in t and Borel measurable in x;
finally it is assumed that P (t, x, ξ) is a distribution function in ξ (cf. p. 116),
and p(t, x) is bounded in every finite subinterval of (107):

(107) 0 ≤ p(t, x) < K(T ), if |t| < T.

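We start with (21) which we write in the form:
\[
\begin{aligned}
\frac{\partial u(\tau,\xi)}{\partial\tau}
 &= u(\tau,\xi)\oplus\bigl\{-p(\tau,x)E(x,\xi)+p(\tau,x)P(\tau,x,\xi)\bigr\}\\
 &= -\int_{-\infty}^{\xi}p(\tau,y)\,du(\tau,y)+\int_{-\infty}^{+\infty}p(\tau,y)P(\tau,y,\xi)\,du(\tau,y);
\end{aligned}
\tag{108}
\]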

¶ the integrals on the right-hand side are well-defined for every function u(τ, ξ) ¶ 145
which is of bounded variation in ξ, cf. Lebesgue [11, p. 261].
The point is to find a solution u(τ, ξ) = F (t, x; τ, ξ) of (108)i which is defined
for

(109) t0 < t < τ < t1

and satisfies the initial condition (6). Once such a solution has been con-
structed, it remains to check whether it enjoys the other remaining conditions
of a stochastic process.
In the following we always assume that the variables satisfy the inequality
(109); by a “finite interval” we always mean |t| < T such that we can apply
(107).

i The original cross-reference (109) is a misprint.



2. We begin with the uniqueness theorem: There is at most one solution
to (108) which converges for τ → t to a given function g(ξ) (where, of course,
g(ξ) is of bounded variation).
If there were two such solutions, then their difference u(τ, ξ) would also be
a solution to (108) which would tend for τ → t to zero; thus,
\[
u(\tau,\xi) = \int_t^{\tau} u(s,\xi)\oplus\bigl\{-p(s,x)E(x,\xi)+p(s,x)P(s,x,\xi)\bigr\}\,ds.
\tag{110}
\]

Now let u(τ, ξ) = Π(τ, ξ) − N(τ, ξ) be the decomposition of u(τ, ξ) into two
components which are non-decreasing as ξ is increasing, V (τ, ξ) = Π(τ, ξ) +
N(τ, ξ) and v(τ ) = lim V (τ, ξ) as ξ → ∞. Then we get easily from (110) that
\[
\Pi(\tau,\xi) \le \int_t^{\tau}\bigl\{N(s,\xi)\oplus p(s,x)E(x,\xi)+\Pi(s,\xi)\oplus p(s,x)P(s,x,\xi)\bigr\}\,ds,
\]

and an analogous inequality for N(τ, ξ). Because of (107) and 0 ≤ P ≤ 1 we
have in every finite interval
\[
V(\tau,\xi) \le \int_t^{\tau} V(s,\xi)\oplus\bigl\{p(s,x)E(x,\xi)+p(s,x)P(s,x,\xi)\bigr\}\,ds
 \le 2K(T)\int_t^{\tau} v(s)\,ds.
\]
Starting from the assumption v(τ ) < M , it follows for every natural number n
by induction
\[
V(\tau,\xi) \le M\,\frac{\{2K(T)(\tau-t)\}^n}{n!},
\]
so v(τ ) ≡ 0, q.e.d.
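The induction step can be illustrated numerically: inserting a bound for v into v ≤ 2K ∫ v ds repeatedly produces bounds of the form M{2K(τ − t)}ⁿ/n!, which tend to zero. The sketch below assumes a constant K and works on a uniform grid; the function names are hypothetical and not part of the paper.

```python
import math


def picard_bound(M: float, K: float, delta: float, n: int) -> float:
    """Closed form M * (2*K*delta)**n / n! of the n-th iterated bound, delta = tau - t."""
    return M * (2.0 * K * delta) ** n / math.factorial(n)


def iterate_numerically(M: float, K: float, delta: float, n: int, steps: int = 2000) -> float:
    """Apply v -> 2K * (integral of v from 0 to s) n times to the constant M.

    The integral is computed with the trapezoidal rule on a grid of [0, delta];
    the value at the right end point delta is returned.
    """
    h = delta / steps
    v = [M] * (steps + 1)
    for _ in range(n):
        running = [0.0]
        for i in range(1, steps + 1):
            running.append(running[-1] + 0.5 * h * (v[i - 1] + v[i]))
        v = [2.0 * K * value for value in running]
    return v[-1]


# The iterated bounds agree with the closed form and decay factorially in n.
print(iterate_numerically(1.0, 3.0, 0.5, 4), picard_bound(1.0, 3.0, 0.5, 4))
print(picard_bound(1.0, 3.0, 0.5, 60))
```

Since the bound holds for every n and the right-hand side tends to zero, only v ≡ 0 is possible, which is the content of the uniqueness argument.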
¶ 146 ¶ The solution to our problem is most easily found by successive approximations;
we set
\[
F_0(t,x;\tau,\xi) = E(x,\xi),
\tag{111}
\]
\[
F_{n+1}(t,x;\tau,\xi) = \int_t^{\tau} F_n(t,x;s,\xi)\oplus\bigl\{-p(s,x)E(x,\xi)+p(s,x)P(s,x,\xi)\bigr\}\,ds.
\tag{112}
\]
Then it will turn out that
\[
F(t,x;\tau,\xi) = \sum_{n=0}^{\infty} F_n(t,x;\tau,\xi)
\tag{113}
\]

represents the desired solution. For the proof of uniform convergence of (113)
we need a new representation for Fn (t, x; τ, ξ) which will also be much more
manageable.
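The scheme (111)–(113) can be tried out in a drastically simplified setting. The sketch below assumes a finite state space and time-independent intensities, so that the operation ⊕ becomes a matrix product and the n-th approximation is (Q(τ − t))ⁿ/n!; this finite-state reduction, and the names `transition_matrix` and `two_state_exact`, are illustrative assumptions and not part of the paper.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transition_matrix(p, jump, dt, terms=60):
    """Sum F_0 + F_1 + ... of the successive approximations for constant rates.

    p[i]       : jump intensity in state i (finite-state stand-in for p(t, x))
    jump[i][j] : probability of landing in j after a jump from i (stand-in for P(t, x, xi))
    dt         : length tau - t of the time interval
    """
    n = len(p)
    # Generator Q = -diag(p) + diag(p) * jump, the finite-state form of {-pE + pP}.
    q = [[p[i] * (jump[i][j] - (1.0 if i == j else 0.0)) for j in range(n)] for i in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # F_0 = identity
    total = [row[:] for row in term]
    for m in range(1, terms):
        # For constant Q the time integral in (112) gives F_m = F_{m-1} * Q * dt / m.
        term = [[value * dt / m for value in row] for row in mat_mul(term, q)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def two_state_exact(dt):
    """Closed form of the (0, 0) entry for p = [1, 2] with deterministic switching."""
    return 2.0 / 3.0 + math.exp(-3.0 * dt) / 3.0


P = transition_matrix([1.0, 2.0], [[0.0, 1.0], [1.0, 0.0]], dt=0.7)
print(P[0][0], two_state_exact(0.7))
```

In this toy case each row of the resulting matrix is a probability distribution, mirroring the statement proved below that F is a distribution function in ξ.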



3. By induction we define for every integer k and n ≥ 0:
\[
\varphi_{0,0}(t,x;\tau,\xi) = E(x,\xi);\qquad \varphi_{k,0}(t,x;\tau,\xi) \equiv 0 \ \text{ if } k \ne 0;
\tag{114}
\]
\[
\varphi_{k,n+1}(t,x;\tau,\xi) = \int_t^{\tau}\bigl\{\varphi_{k,n}(t,x;s,\xi)\oplus p(s,x)E(x,\xi)
 + \varphi_{k-1,n}(t,x;s,\xi)\oplus p(s,x)P(s,x,\xi)\bigr\}\,ds.
\tag{115}
\]
All ϕk,n are, of course, Borel measurable in x, differentiable in t and τ , and
non-decreasing in ξ. Furthermore we find by induction from (115) using (107)
that in every finite interval
\[
0 \le \varphi_{k,n}(t,x;\tau,\xi) \le \binom{n}{k}\frac{\{K(T)(\tau-t)\}^n}{n!},
\tag{116}
\]
and, finally,
\[
\varphi_{k,n}(t,x;\tau,\xi) = 0 \ \text{ if } n < k \text{ or } k < 0.
\tag{117}
\]
Now it is easy to see that (cf. (112))
\[
F_n(t,x;\tau,\xi) = (-1)^n\sum_{k=0}^{n}(-1)^k\varphi_{k,n}(t,x;\tau,\xi)
\tag{118}
\]

holds. Indeed, if we use (118) as the definition of Fn , then it follows from
(115) thatj
\[
\begin{aligned}
\frac{\partial F_{n+1}(t,x;\tau,\xi)}{\partial\tau}
 &= (-1)^{n+1}\sum_{k=0}^{n+1}(-1)^k\bigl\{\varphi_{k,n}(t,x;\tau,\xi)\oplus p(\tau,x)E(x,\xi)
   + \varphi_{k-1,n}(t,x;\tau,\xi)\oplus p(\tau,x)P(\tau,x,\xi)\bigr\}\\
 &= -F_n(t,x;\tau,\xi)\oplus p(\tau,x)E(x,\xi) + F_n(t,x;\tau,\xi)\oplus p(\tau,x)P(\tau,x,\xi).
\end{aligned}
\tag{119}
\]
Since
\[
\lim_{\tau\to t} F_n(t,x;\tau,\xi) = 0 \ \text{ if } n \ge 1,
\tag{120}
\]
the relation (112) follows from (119), i.e. (112) and (118) are equivalent.
¶ (118) yields a representation of Fn (t, x; τ, ξ) as difference of two functions ¶ 147
which are monotonic in ξ. From (118) we obtain because of (116) in every
finite intervalk
\[
|F_n(t,x;\tau,\xi)| \le \frac{\{K(T)(\tau-t)\}^n}{n!}\sum_{k=0}^{n}\binom{n}{k}
 = \frac{\{2K(T)(\tau-t)\}^n}{n!},
\tag{121}
\]
j The original of the last two lines of the following formula contains the misprints
−Fn (t, x; τ, ξ) ⊕ p(τ, ξ)E(x, ξ) and +Fn (t, x; τ, ξ)p(τ, x)P (τ, x, ξ) which have been corrected.

k In the original, the following calculation contains the misprint \sum_{n=0}^{n}.



and, at the same time, the right-hand side yields an upper bound for the
total variation of Fn as a function of ξ. Therefore, the series (113) converges
uniformly towards a function whose total variation does not exceed
\[
e^{2K(T)(\tau-t)}.
\tag{122}
\]
Similarly, the series Σ ∂Fn/∂τ converges uniformly because of (119), and it follows
immediately from (119) that (113) is indeed a solution to (108). From (120)
and (111) it follows as well that the initial condition (6) is (even uniformly)
satisfied.
Thus, F (t, x; τ, ξ) defined in (113) is a solution to (108) which is of bounded
variation in ξ (cf. (121)) and satisfies the initial condition (6).
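The estimates (116) and (121) justify the following calculation with infinite
series:
\[
\begin{aligned}
F(t,x;\tau,\xi) &= \sum_{n=0}^{\infty} F_n(t,x;\tau,\xi)\\
 &= \sum_{n=0}^{\infty}(-1)^n\sum_{k=0}^{\infty}(-1)^k\varphi_{k,n}(t,x;\tau,\xi)\\
 &= \sum_{k=0}^{\infty}\sum_{n=0}^{\infty}(-1)^n\varphi_{k,k+n}(t,x;\tau,\xi).
\end{aligned}
\]
Thus, setting for k ≥ 0
\[
\psi_k(t,x;\tau,\xi) = \sum_{n=0}^{\infty}(-1)^n\varphi_{k,k+n}(t,x;\tau,\xi),
\tag{123}
\]
we have
\[
F(t,x;\tau,\xi) = \sum_{k=0}^{\infty}\psi_k(t,x;\tau,\xi).
\tag{124}
\]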

4. In order to prove the monotonicity of F (t, x; τ, ξ) in ξ, we derive a new
representation for the functions ψk (t, x; τ, ξ) which is also in itself interesting
and which shows that all ψk are monotone in ξ.
a) k = 0. According to (114) and (117) we obtain inductively
\[
\varphi_{0,n+1}(t,x;\tau,\xi) = \int_t^{\tau}\varphi_{0,n}(t,x;s,\xi)\oplus p(s,x)E(x,\xi)\,ds
 = A_{n+1}(t,\tau,x)\,E(x,\xi),
\]
¶ 148 ¶ where we use the shorthand
\[
A_0(t,\tau,x) = 1,\qquad A_{n+1}(t,\tau,x) = \int_t^{\tau} p(s,x)\,A_n(t,s,x)\,ds.
\tag{125}
\]
t



Inserting this into (123) yields
\[
\psi_0(t,x;\tau,\xi) = E(x,\xi)\cdot\sum_{n=0}^{\infty}(-1)^n A_n(t,\tau,x),
\tag{126}
\]
where the convergence of the series on the right-hand side is evident. Setting,
for a moment,
\[
\sum_{n=0}^{\infty}(-1)^n A_n(t,\tau,x) = f(t,\tau,x)
\tag{127}
\]
then we get from (125)
\[
\frac{\partial f(t,\tau,x)}{\partial\tau} = -p(\tau,x)\,f(t,\tau,x)
\]
or, since f (t, τ, x) → 1 as τ → t,
\[
f(t,\tau,x) = e^{-\int_t^{\tau} p(s,x)\,ds}.
\]
Combined, (126) and (127) thus yield
\[
\psi_0(t,x;\tau,\xi) = E(x,\xi)\cdot e^{-\int_t^{\tau} p(s,x)\,ds}.
\tag{128}
\]
This is a nonnegative and in ξ non-decreasing function.
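The passage from the alternating series (127) to the exponential in (128) can be checked numerically for a fixed x. The sketch below builds the iterated integrals Aₙ of (125) on a grid with the trapezoidal rule; the function name `alternating_series` and the sample intensity p(s) = 1 + s are assumptions made for illustration only.

```python
import math


def alternating_series(p, t, tau, terms=40, steps=4000):
    """Approximate sum_{n>=0} (-1)^n A_n(t, tau) with A_0 = 1 and A_{n+1} = int_t^tau p(s) A_n(t, s) ds.

    p is the intensity as a function of time only (the space variable x is held fixed).
    """
    h = (tau - t) / steps
    p_values = [p(t + i * h) for i in range(steps + 1)]
    a = [1.0] * (steps + 1)        # A_0(t, s) = 1 for every grid point s
    total, sign = 1.0, 1.0
    for _ in range(terms):
        nxt = [0.0]
        for i in range(1, steps + 1):
            # trapezoidal rule for the running integral of p * A_n
            nxt.append(nxt[-1] + 0.5 * h * (p_values[i - 1] * a[i - 1] + p_values[i] * a[i]))
        a = nxt
        sign = -sign
        total += sign * a[-1]      # add (-1)^n A_n(t, tau)
    return total


# For p(s) = 1 + s on (0, 1) the integral of p equals 1.5.
print(alternating_series(lambda s: 1.0 + s, 0.0, 1.0), math.exp(-1.5))
```

In fact Aₙ(t, τ, x) = (∫ p ds)ⁿ/n!, so the series is just the exponential series of −∫ p ds.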


b) If k > 0, then we obtain because of (115) by differentiating (123)
\[
\frac{\partial\psi_k(t,x;\tau,\xi)}{\partial\tau}
 = -\psi_k(t,x;\tau,\xi)\oplus p(\tau,x)E(x,\xi) + \psi_{k-1}(t,x;\tau,\xi)\oplus p(\tau,x)P(\tau,x,\xi).
\tag{129}
\]
Here, t, x are two parameters which only influence the initial values. Indeed,
we have by (123) and (115) for k > 0
\[
\lim_{\tau\to t}\psi_k(t,x;\tau,\xi) = 0.
\tag{130}
\]
(129) is a particular case of a more general equation of the form
\[
\frac{\partial v(\tau,\xi)}{\partial\tau} = -\int_{-\infty}^{\xi} p(\tau,y)\,dv(\tau,y) + h(\tau,\xi),
\tag{131}
\]

where h(τ, ξ) is a prescribed function which is of bounded variation in ξ and


continuous in τ , and where the unknown function v(τ, ξ) has to be of bounded
variation and converges, as τ → t, to a given function g(ξ) of bounded variation.



If p(τ, ξ) does not depend on ξ, then (131) simplifies to an ordinary linear
¶ 149 ¶ differential equation depending on a parameter ξ. The initial value problem
for (131) stated above always has a unique solution given by
\[
v(\tau,\xi) = \int_{-\infty}^{\xi} e^{-\int_t^{\tau} p(s,y)\,ds}\,dH(\tau,y),
\tag{132}
\]
where we set
\[
H(\tau,\xi) = \int_t^{\tau} d\sigma\int_{-\infty}^{\xi} e^{\int_t^{\sigma} p(s,y)\,ds}\,dh(\sigma,y) + g(\xi).
\tag{133}
\]
It is easy to verify that this is indeed a solution by formally differentiating
this expression (this is obviously admissible). The uniqueness of the solution
follows exactly as in no. 2: The difference of any two solutions of the initial
value problem would be a solution to
\[
\frac{\partial u(\tau,\xi)}{\partial\tau} = -\int_{-\infty}^{\xi} p(\tau,y)\,du(\tau,y)
\]
which tends to zero as τ → t. This is a particular case of (110) if P (t, x, ξ) = 0
(although the latter is not a distribution function in ξ, it is easily verified that
the proof of p. 145 remains, a fortiori, valid if P = 0).
With these preparations we can return to the particular equation (129).
(132)–(133) yield for ψk (k > 0) the new representation
\[
\psi_k(t,x;\tau,\xi) = \int_{-\infty}^{\xi} e^{-\int_t^{\tau} p(s,y)\,ds}\,dH_k(\tau,y)
\tag{134}
\]
with
\[
H_k(\tau,\xi) = \int_t^{\tau} d\sigma\int_{-\infty}^{\xi} e^{\int_t^{\sigma} p(s,y)\,ds}\,
 d_y\bigl\{\psi_{k-1}(t,x;\sigma,y)\oplus p(\sigma,y)P(\sigma,x,y)\bigr\}.
\tag{135}
\]
Now we see by induction immediately that all ψk (t, x; τ, ξ) are monotone
in ξ. Indeed, if this is true for ψk−1 , then it holds for ψk−1 (t, x; τ, ξ) ⊕
p(τ, x)P (τ, x, ξ), too, hence for the function Hk defined in (135) and, by (134),
also for ψk .
By (124) the function F (t, x; τ, ξ) does not decrease if ξ increases. It remains
to show that for every t, x, τ (2) holds; because of (111) and the uniform
convergence of (113) this is equivalent to saying that for each t, x, τ and n ≥ 1
\[
\lim_{\xi\to\pm\infty} F_n(t,x;\tau,\xi) = 0
\tag{136}
\]
holds. By the monotonicity of P (t, x, ξ) in the variable ξ,
\[
\lim_{\xi\to\infty} F_n(t,x;\tau,\xi)\oplus p(\tau,x)P(\tau,x,\xi) = \int_{-\infty}^{+\infty} p(\tau,y)\,dF_n(t,x;\tau,y).
\]



¶ Therefore, we have for every fixed t, x, τ ¶ 150
\[
\lim_{\xi\to\pm\infty} F_n(t,x;\tau,\xi)\oplus\bigl\{-p(\tau,x)E(x,\xi)+p(\tau,x)P(\tau,x,\xi)\bigr\} = 0.
\]

The integrand in (112) tends to zero as ξ → ±∞, and since it is, by (121),
uniformly bounded, also the integral tends to 0; this proves (136).
Collecting the results we have:
Under the assumptions p. 144 on p(t, x) and P (t, x, ξ) there exists exactly
one solution F (t, x; τ, ξ) of (108) or (21) which satisfies the condition (6); it is
for all fixed t, x, τ a distribution function in ξ and it enjoys the representations
a) (111)–(113) or (118), respectively;
b) (123)–(124) together with (114)–(115);
c) (123)–(124) together with (134)–(135).

5. As an application we consider the most general purely discontinuous


stochastic process which is temporally and spatially homogeneous, i.e. whose
transition probabilities only depend on the length of the respective time-
interval, but not on its position nor on the current size of the random variables
constituting the process.
Then we have

p(t, x) = λ = const. > 0, P (t, x, ξ) = G(ξ − x),

where G(ξ) is a fixed distribution function (= a-posteriori-probability of a


jump of the size ≤ ξ, given the knowledge that a jump really took place). This
is the “general discontinuous stochastic process” of Khintchine [6, Chap. II,
§ 4] (general, due to the assumed homogeneity).
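We set
\[
G_0(\xi) = E(0,\xi) = \begin{cases} 0 & \text{if } \xi < 0,\\ 1 & \text{if } \xi \ge 0,\end{cases}
\]
and for n ≥ 0
\[
G_{n+1}(\xi) = G_n(\xi)\oplus G(\xi-x) = \int_{-\infty}^{+\infty} G(\xi-y)\,dG_n(y) = \int_{-\infty}^{+\infty} G_n(\xi-y)\,dG(y).
\]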

Then (128), (134) and (135) immediately imply
\[
\psi_k(t,x;\tau,\xi) = e^{-\lambda(\tau-t)}\,\frac{\{\lambda(\tau-t)\}^k}{k!}\,G_k(\xi-x).
\]
Thus we have because of (124)
\[
F(t,x;\tau,\xi) = e^{-\lambda(\tau-t)}\sum_{k=0}^{\infty}\frac{\{\lambda(\tau-t)\}^k}{k!}\,G_k(\xi-x),
\]
and this is the known solution to the problem. ¶ ¶ 151
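The series above is straightforward to evaluate when the jumps take values on a lattice. The following sketch assumes, purely for illustration, that jump sizes are nonnegative integers with probabilities `jump_pmf[j]`, so that the convolution powers Gₖ become discrete convolutions; the function names are hypothetical.

```python
import math


def convolve(a, b, size):
    """Discrete convolution of two probability vectors, truncated to `size` entries."""
    out = [0.0] * size
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j >= size:
                break
            out[i + j] += ai * bj
    return out


def compound_poisson_pmf(lam, dt, jump_pmf, size, terms=80):
    """Probabilities of the increment xi - x = 0, 1, ..., size - 1 over a time span dt = tau - t.

    Implements sum_k exp(-lam*dt) * (lam*dt)**k / k! * (k-fold convolution of jump_pmf).
    """
    result = [0.0] * size
    g_k = [1.0] + [0.0] * (size - 1)     # G_0: unit mass at 0
    weight = math.exp(-lam * dt)         # Poisson weight for k = 0
    for k in range(terms):
        for i in range(size):
            result[i] += weight * g_k[i]
        g_k = convolve(g_k, jump_pmf, size)   # pass from G_k to G_{k+1}
        weight *= lam * dt / (k + 1)          # pass from the k-th to the (k+1)-th weight
    return result


# With jumps of size exactly 1 the increment is Poisson distributed with mean lam * dt.
pmf = compound_poisson_pmf(lam=2.0, dt=1.5, jump_pmf=[0.0, 1.0], size=40)
print(pmf[:4])
```

The distribution function F(t, x; τ, ξ) is then the partial sum of these probabilities over all increments not exceeding ξ − x.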



6. The preceding observations carry over almost literally if we start from
the equality (20). The problem is to determine a function F ∗ (t, x; τ, ξ) satisfying
the equation
\[
\begin{aligned}
\frac{\partial u(t,x)}{\partial t}
 &= p(t,x)\left\{u(t,x) - \int_{-\infty}^{+\infty} u(t,y)\,dP(t,x,y)\right\}\\
 &= p(t,x)\bigl\{E(x,\xi) - P(t,x,\xi)\bigr\}\oplus u(t,x),
\end{aligned}
\tag{137}
\]

the initial condition (5) and which is for all t, x, τ a distribution function in
ξ. Again we obtain a unique solution of this problem with the following representation:
\[
F^*(t,x;\tau,\xi) = \sum_{n=0}^{\infty} F_n^*(t,x;\tau,\xi) = \sum_{k=0}^{\infty}\psi_k^*(t,x;\tau,\xi)
\]
with
\[
\begin{aligned}
F_0^*(t,x;\tau,\xi) &= E(x,\xi),\\
F_{n+1}^*(t,x;\tau,\xi) &= -\int_t^{\tau} p(s,x)\bigl\{E(x,\xi)-P(s,x,\xi)\bigr\}\oplus F_n^*(s,x;\tau,\xi)\,ds,\\
\psi_k^*(t,x;\tau,\xi) &= \sum_{n=0}^{\infty}(-1)^n\varphi^*_{k,k+n}(t,x;\tau,\xi),\\
\varphi^*_{k,0}(t,x;\tau,\xi) &= \begin{cases} E(x,\xi) & \text{if } k = 0,\\ 0 & \text{if } k \ne 0,\end{cases}\\
\varphi^*_{k,n+1}(t,x;\tau,\xi) &= \int_t^{\tau} p(s,x)\bigl\{\varphi^*_{k,n}(s,x;\tau,\xi) + P(s,x,\xi)\oplus\varphi^*_{k-1,n}(s,x;\tau,\xi)\bigr\}\,ds.
\end{aligned}
\]

The functions ψk∗ (t, x; τ, ξ) are now given by
\[
\psi_0^*(t,x;\tau,\xi) = E(x,\xi)\,e^{-\int_t^{\tau} p(s,x)\,ds},
\]
and for k > 0 by the simpler ordinary differential equation
\[
\frac{\partial\psi_k^*(t,x;\tau,\xi)}{\partial t} = p(t,x)\,\psi_k^*(t,x;\tau,\xi) - p(t,x)\,P(t,x,\xi)\oplus\psi_{k-1}^*(t,x;\tau,\xi)
\]
which has the well-known solutionl
\[
\psi_k^*(t,x;\tau,\xi) = e^{-\int_t^{\tau} p(s,x)\,ds}\int_t^{\tau} p(\sigma,x)\,e^{\int_\sigma^{\tau} p(s,x)\,ds}\,
 P(\sigma,x,\xi)\oplus\psi_{k-1}^*(\sigma,x;\tau,\xi)\,d\sigma,
\]
from which the monotonicity in ξ follows.

l The misprints of the original, ψk (t, x; τ, ξ) and ψ∗k−1 (σ, x, ξ), have been corrected.



7. It still remains to be shown 1) that the two solutions from no. 4 and
no. 6 coincide, as one would expect according to § 1, 3, and 2) that this solution
satisfies the fundamental relation (7).
¶ For this we define for any function u(s, y) of bounded variation in y ¶ 152
\[
\begin{aligned}
L(u) &\equiv -u_s(s,y) - \int_{-\infty}^{y} p(s,z)\,du(s,z) + \int_{-\infty}^{+\infty} p(s,z)P(s,z,y)\,du(s,z),\\
L^*(u) &\equiv u_s(s,y) - p(s,y)\left\{u(s,y) - \int_{-\infty}^{+\infty} u(s,z)\,dP(s,y,z)\right\},
\end{aligned}
\]

so that the function F (t, x; s, y) as of no. 4 is a solution to L(u) = 0, and


F ∗ (s, y; τ, ξ) from no. 6 is a solution to L∗ (u) = 0 (both times as a function of
the variables s, y).
Now let v(s, y) be a function which is, along with its partial derivative
vs (s, y), continuous in s and Borel measurable in y. Moreover, let u(s, y) be
a function which is, together with us (s, y), continuous in s and of bounded
variation in y; then the expression L(u) is well-defined and has also bounded
variation in y. Under these assumptions, the integral
\[
\int_{-\infty}^{+\infty} v(s,y)\,dL(u(s,y))
\]

exists, and if we change the order of integrations or perform an integration by
parts in the variable s, we obtain the following fundamental identity:
\[
\int_{t'}^{t''} ds\int_{-\infty}^{+\infty}\bigl\{v(s,y)\,dL(u(s,y)) - L^*(v(s,y))\,du(s,y)\bigr\}
 \equiv \int_{-\infty}^{+\infty}\bigl\{v(t',y)\,du(t',y) - v(t'',y)\,du(t'',y)\bigr\}.
\tag{138}
\]

Now we set, in particular for t < s < τ ,

u(s, y) = F (t, x; s, y), v(s, y) = F ∗ (s, y; τ, ξ),

where F and F ∗ have the same meaning as in no. 4 and no. 6. Then the
above assumptions are fulfilled and we may apply (138). The left-hand side
vanishes identically, and (138) tells us that the integral over v du for t < s < τ
is independent of s:
\[
\int_{-\infty}^{+\infty} F^*(s,y;\tau,\xi)\,dF(t,x;s,y) = \psi(t,x;\tau,\xi).
\tag{139}
\]

By construction (cf. (111)–(113)) we now have

(140) F (t, x; τ, ξ) = E(x, ξ) + (τ − t)G(t, x; τ, ξ),



where G(t, x; τ, ξ) is in ξ uniformly of bounded variation. Inserting this into
(139) and letting s → t, we get because of the continuity of F (s, y; τ, ξ) in s
immediately
(141) ψ(t, x; τ, ξ) = F ∗ (t, x; τ, ξ);
¶ 153 ¶ using the representation of F ∗ which is the analogue of (140) and letting
s → τ we also get
(142) ψ(t, x; τ, ξ) = F (t, x; τ, ξ).
Then (140)–(142) yield the two claims
\[
F(t,x;\tau,\xi) = F^*(t,x;\tau,\xi)
\]
and
\[
\int_{-\infty}^{+\infty} F(s,y;\tau,\xi)\,dF(t,x;s,y) = F(t,x;\tau,\xi),\qquad t < s < \tau.
\]
Thus, our solutions always yield a stochastic process. It is obvious that for
this process also the original defining relation (18) of a discontinuous stochastic
process holds; this follows, for instance, from (111)–(113):
\[
\begin{aligned}
F(t,x;\tau,\xi) &= E(x,\xi) + F_1(t,x;\tau,\xi) + o(\tau-t),\\
F_1(t,x;\tau,\xi) &= \int_t^{\tau} E(x,\xi)\oplus\bigl\{-p(s,x)E(x,\xi)+p(s,x)P(s,x,\xi)\bigr\}\,ds\\
 &= (\tau-t)\bigl\{-p(t,x)E(x,\xi)+p(t,x)P(t,x,\xi)\bigr\} + o(\tau-t).
\end{aligned}
\]
Summing up, we have:
Under the assumptions stated on p. 144 on p(t, x) and P (t, x, ξ) the initial
value problems (5) for the equation (20) as well as (6) for the equation (21)
have each a unique solution, and the solutions coincide. This solution gives
rise to a stochastic process (i.e. it is a distribution function in ξ and satisfies
(7)) and the relation (18) holds.
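The fundamental relation (7) can be illustrated in the simplest homogeneous case of no. 5. The sketch below assumes unit jumps with constant intensity λ, so that the increment over a time span is Poisson distributed, and checks that composing the transitions over (t, s) and (s, τ) reproduces the transition over (t, τ). The names `poisson_pmf` and `chapman_kolmogorov_gap` are illustrative only.

```python
import math


def poisson_pmf(mean, i):
    """Probability that a Poisson variable with the given mean equals i."""
    return math.exp(-mean) * mean ** i / math.factorial(i)


def chapman_kolmogorov_gap(lam, t, s, tau, size=60):
    """Largest deviation between the direct and the composed transition probabilities.

    Direct:   increment i over (t, tau).
    Composed: increment j over (t, s) followed by increment i - j over (s, tau), summed over j.
    """
    worst = 0.0
    for i in range(size):
        direct = poisson_pmf(lam * (tau - t), i)
        composed = sum(
            poisson_pmf(lam * (s - t), j) * poisson_pmf(lam * (tau - s), i - j)
            for j in range(i + 1)
        )
        worst = max(worst, abs(direct - composed))
    return worst


# The gap is at the level of floating-point rounding.
print(chapman_kolmogorov_gap(lam=2.0, t=0.0, s=0.4, tau=1.3))
```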

§5. The General Mixed Case.


1. The considerations of the previous paragraph can be adapted to the
general case discussed in § 1, 4, with only minor changes, the simple integrals
with respect to time have to be replaced by a particular solution of a parabolic
differential equation; therefore, we begin with an estimate of this solution.
In the sequel we set
\[
L(u(t,x)) \equiv u_t + a(t,x)\,u_{xx} + b(t,x)\,u_x,\qquad a > 0;
\tag{143}
\]
we assume that the coefficients a and b satisfy again condition A from § 2,
pp. 128 f. Let us introduce, for brevity, the notation
\[
J^{\tau}[f(t,x)] = \int_t^{\tau} ds\int_{-\infty}^{+\infty} f(s,y)\,U(t,x;s,y)\,dy,\qquad \tau > t,
\tag{144}
\]



¶ where U (t, x; τ, ξ) denotes the fundamental solution to the equation L(u) = 0 ¶ 154
which was constructed in § 2. If f (t, x) is differentiable and bounded then,
according to § 2, 6, p. 135, (144) represents a solution to the equation
\[
L(u) = -f(t,x),
\]
and it is the only bounded solution which tends to zero as t → τ . (In fact, this
theorem remains valid under substantially weaker assumptions on f (t, x).) By
§ 3, 4, p. 142, we have U (t, x; τ, ξ) ≥ 0, and therefore it follows from
\[
f_1(t,x) \le f(t,x) \le f_2(t,x)
\]
that also
\[
J^{\tau}[f_1(t,x)] \le J^{\tau}[f(t,x)] \le J^{\tau}[f_2(t,x)].
\]
Now we obviously have
\[
J^{\tau}[(\tau-t)^n] = \frac{(\tau-t)^{n+1}}{n+1},
\]
since u(t, x) = (τ − t)^{n+1}/(n + 1) is some, hence the only, bounded solution to
L(u) = −(τ − t)^n which vanishes if t = τ . Thus we have the
Lemma. If f (t, x) is differentiable and satisfies
\[
0 \le f(t,x) \le M(\tau-t)^n,\qquad M = \text{const.},
\tag{145a}
\]
then the estimate
\[
0 \le J^{\tau}[f(t,x)] \le M\,\frac{(\tau-t)^{n+1}}{n+1}
\tag{145b}
\]
holds.
Of course, one could have obtained these estimates also directly from the
representation of the fundamental solution in § 2.
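The lemma can be checked numerically in the simplest special case. The sketch below assumes constant a > 0 and b = 0, for which the fundamental solution of u_t + a u_xx = 0 is the Gaussian kernel with variance 2a(s − t); this closed form, the quadrature parameters and the function names are assumptions made for illustration and are not taken from § 2.

```python
import math


def heat_kernel(a, t, x, s, y):
    """Fundamental solution U(t, x; s, y) of u_t + a*u_xx = 0 for constant a > 0 and s > t."""
    var = 2.0 * a * (s - t)
    return math.exp(-(y - x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def J(f, a, t, x, tau, s_steps=200, y_steps=400, width=8.0):
    """Numerical value of J^tau[f](t, x) = int_t^tau ds int f(s, y) U(t, x; s, y) dy."""
    hs = (tau - t) / s_steps
    total = 0.0
    for i in range(s_steps):
        s = t + (i + 0.5) * hs                    # midpoint rule avoids the singular point s = t
        sd = math.sqrt(2.0 * a * (s - t))         # standard deviation of the kernel at time s
        hy = 2.0 * width * sd / y_steps
        inner = 0.0
        for j in range(y_steps + 1):
            y = x - width * sd + j * hy
            w = 0.5 if j in (0, y_steps) else 1.0  # trapezoidal weights
            inner += w * f(s, y) * heat_kernel(a, t, x, s, y)
        total += inner * hy * hs
    return total


tau = 1.0
exact = J(lambda s, y: (tau - s) ** 2, a=1.0, t=0.0, x=0.3, tau=tau)
bounded = J(lambda s, y: (tau - s) ** 2 * math.cos(y) ** 2, a=1.0, t=0.0, x=0.3, tau=tau)
print(exact, bounded, (tau - 0.0) ** 3 / 3.0)
```

The first value reproduces (τ − t)ⁿ⁺¹/(n + 1) with n = 2, and the second, whose integrand satisfies (145a) with M = 1, stays between 0 and that bound as (145b) asserts.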

2. Continuing from the methods developed in § 4 we can solve the more


general equation (26) with the initial value problem (5), but this requires
certain continuity properties of the functions p(t, x) and P (t, x, ξ). Since we do
not want to obscure the issue by unnecessary complications, let us introduce
some stronger assumptions than actually needed. – In order to derive (26)
we have already a priori assumed that F (t, x; τ, ξ) is in t once, and in x twice,
differentiable. Therefore it seems to be natural, to require also differentiability
of p(t, x) and P (t, x, ξ).
We assume, from now on, that ∂P (t, x, ξ)/∂ξ ≥ 0 exists, and that it is, along
with p(t, x), continuously differentiable in x and t. Of course, one needs that
\[
\int_{-\infty}^{+\infty}\frac{\partial P(t,x,y)}{\partial y}\,dy \equiv 1,
\]



¶ 155 ¶ and p(t, x) is nonnegative and bounded in each finite t-interval. The coeffi-
cients a(t, x) and b(t, x) have to satisfy condition A as of pp. 128 f.
We write the equation (26) as
\[
N(u(t,x)) \equiv L(u(t,x)) - p(t,x)\,u(t,x) + p(t,x)\int_{-\infty}^{+\infty} u(t,y)\,\frac{\partial P(t,x,y)}{\partial y}\,dy = 0.
\tag{146}
\]
Then there is a unique solution u(t, x) = F (t, x; τ, ξ) of (146) which is defined
for t < τ and tends to E(x, ξ) as t → τ ; without further assumptions, F (t, x; τ, ξ)
is a distribution function in ξ and it enjoys the following representation
\[
F(t,x;\tau,\xi) = \sum_{n=0}^{\infty} F_n(t,x;\tau,\xi),
\tag{147}
\]
and
\[
F_0(t,x;\tau,\xi) = \int_{-\infty}^{\xi} U(t,x;\tau,y)\,dy,
\tag{148}
\]
\[
F_{n+1}(t,x;\tau,\xi) = J^{\tau}\left[-p(t,x)\,F_n(t,x;\tau,\xi)
 + p(t,x)\int_{-\infty}^{+\infty} F_n(t,y;\tau,\xi)\,\frac{\partial P(t,x,y)}{\partial y}\,dy\right].
\tag{149}
\]
Here U (t, x; τ, ξ) is the fundamental solution of L(u) = 0; by induction we see
that the expression in the square brackets is differentiable in t, x which allows
us to apply the theorem stated in § 2, p. 135. – According to § 3, p. 141 (cf. the
explanation following (100)), F0 (t, x; τ, ξ) tends to E(x, ξ) as t → τ .
If n ≥ 1, then (149) shows that Fn (t, x; τ, ξ) tends to zero as t → τ , and this
implies (by the uniform convergence of (147)) that the initial condition (5) is
satisfied.
The proof of the uniqueness of the solution and of the uniform convergence
of (147) will be accomplished step-by-step using the method of § 4; we only
have to replace the integrals over (t, τ ) by the operator J τ : Lemma (145a),
(145b) provides all the necessary estimates. Again we use the more manageable
representation
\[
F_n(t,x;\tau,\xi) = (-1)^n\sum_{k=0}^{n}(-1)^k\varphi_{k,n}(t,x;\tau,\xi)
\]
where
\[
\varphi_{k,0}(t,x;\tau,\xi) = \begin{cases}\displaystyle\int_{-\infty}^{\xi} U(t,x;\tau,y)\,dy & \text{if } k = 0,\\[1ex] 0 & \text{if } k \ne 0,\end{cases}
\]
¶ 156 ¶
\[
\varphi_{k,n+1}(t,x;\tau,\xi) = J^{\tau}\left[p(t,x)\,\varphi_{k,n}(t,x;\tau,\xi)
 + p(t,x)\int_{-\infty}^{+\infty}\varphi_{k-1,n}(t,y;\tau,\xi)\,\frac{\partial P(t,x,y)}{\partial y}\,dy\right].
\tag{150}
\]
Again we see that the functions ϕ are monotone and we obtain the analogues
of the estimates (116).
Moreover, we obtain the new representation:
\[
F(t,x;\tau,\xi) = \sum_{k=0}^{\infty}\psi_k(t,x;\tau,\xi)
\tag{151}
\]
where
\[
\psi_k(t,x;\tau,\xi) = \sum_{n=0}^{\infty}(-1)^n\varphi_{k,k+n}(t,x;\tau,\xi).
\tag{152}
\]
For ψ0 we get from (152) and (150) the partial differential equation
\[
L(\psi_0(t,x;\tau,\xi)) - p(t,x)\,\psi_0(t,x;\tau,\xi) = 0
\tag{153}
\]
with the initial condition
\[
\lim_{t\to\tau}\psi_0(t,x;\tau,\xi) = E(x,\xi);
\tag{154}
\]
if k > 0 we also get, recursively,
\[
L(\psi_k(t,x;\tau,\xi)) - p(t,x)\,\psi_k(t,x;\tau,\xi)
 = -p(t,x)\int_{-\infty}^{+\infty}\psi_{k-1}(t,y;\tau,\xi)\,\frac{\partial P(t,x,y)}{\partial y}\,dy
\tag{155}
\]
with the initial condition
\[
\lim_{t\to\tau}\psi_k(t,x;\tau,\xi) = 0.
\tag{156}
\]

In order to solve this differential equation we introduce the fundamental
solution Ū (t, x; τ, ξ) (in the sense of the theorem of p. 133) of the equation
\[
L(u) - p(t,x)\,u = 0.
\]
Since p(t, x) ≥ 0 we have by § 3, p. 142
\[
\bar U(t,x;\tau,\xi) \ge 0.
\tag{157}
\]
According to § 2, p. 135 the solution of the initial value problem (153)–(156)
has the following form
\[
\psi_0(t,x;\tau,\xi) = \int_{-\infty}^{\xi}\bar U(t,x;\tau,y)\,dy,
\tag{158}
\]
\[
\psi_k(t,x;\tau,\xi) = \int_t^{\tau} ds\int_{-\infty}^{+\infty}\bar U(t,x;s,z)\,p(s,z)\,dz
 \int_{-\infty}^{+\infty}\psi_{k-1}(s,y;\tau,\xi)\,\frac{\partial P(s,z,y)}{\partial y}\,dy.
\tag{159}
\]



¶ 157 ¶ The factor accompanying Ū under the integral is, as one easily sees by
induction, continuously differentiable in t, x, i.e. (159) is indeed a solution to
(155).
(150), (158) and (159) yield a new representation of F (t, x; τ, ξ).
By recursion we conclude from (158) and (159) because of (157) that all
functions ψk (t, x; τ, ξ) are non-decreasing as ξ increases, and by (151) the same
is certainly true for F (t, x; τ, ξ). In order to see that F (t, x; τ, ξ) is a distribution
function as a function of ξ, we note, that by (106) and (148) we have for all
t, x, τ
\[
\lim_{\xi\to-\infty} F_0(t,x;\tau,\xi) = 0,\qquad \lim_{\xi\to+\infty} F_0(t,x;\tau,\xi) = 1;
\]
if n ≥ 0 and as ξ → ±∞, the expression in the square bracket in (149) tends
to zero, and so we have by § 2, p. 135 also
\[
\lim_{\xi\to\pm\infty} F_n(t,x;\tau,\xi) = 0,\qquad n \ge 1;
\]
because of the uniform convergence of (147) and the monotonicity in ξ, the
function F (t, x; τ, ξ) increases from 0 to 1, q.e.d.

3. The function F (t, x; τ, ξ) constructed above has a frequency function:
\[
F(t,x;\tau,\xi) = \int_{-\infty}^{\xi} f(t,x;\tau,y)\,dy,
\]
and we obtain f (t, x; τ, ξ) from term-by-term differentiation of the representations
for F (t, x; τ, ξ) from the previous section. In particular, we have
\[
f(t,x;\tau,\xi) = \sum_{n=0}^{\infty} f_n(t,x;\tau,\xi)
\]
with
\[
f_0(t,x;\tau,\xi) = U(t,x;\tau,\xi),
\]
\[
f_{n+1}(t,x;\tau,\xi) = J^{\tau}\left[-p(t,x)\,f_n(t,x;\tau,\xi)
 + p(t,x)\int_{-\infty}^{+\infty} f_n(t,y;\tau,\xi)\,\frac{\partial P(t,x,y)}{\partial y}\,dy\right].
\tag{160}
\]
To study the convergence behaviour, it is enough to note that, due to the
boundedness 0 ≤ p(t, x) < α,
\[
|J^{\tau}[pU]| \le \alpha J^{\tau}[|U|] = \alpha\int_t^{\tau} ds\int_{-\infty}^{+\infty}|U(s,y;\tau,\xi)\,U(t,x;s,y)|\,dy
\]

holds, and the last expression is certainly bounded; we have, using the notation
of § 2 and the theorem on p. 133, U = U0 + V , where V is bounded (for U0



cf. (38)); the expression
\[
\int_t^{\tau} dp\int_{-\infty}^{+\infty} U_0(p,q;\tau,\xi)\,U_0(t,x;p,q)\,dq
\]
¶ 158 ¶ was transformed by the substitution (48) into a uniformly convergent integral.
Moreover, also
\[
\int_{-\infty}^{+\infty} U(t,y;\tau,\xi)\,\frac{\partial P(t,x,y)}{\partial y}\,dy
\]
is bounded by (58). Thus, it follows from (160) that f1 (t, x; τ, ξ) is bounded
and so all further estimates carry over literally.
In this way we obtain, starting from (146) and without the detour via
F (t, x; τ, ξ), the frequency function f (t, x; τ, ξ). It is directly defined as the
non-negative solution to (146) such that
\[
\int_{-\infty}^{+\infty} f(t,x;\tau,\xi)\,d\xi \equiv 1,\qquad
\lim_{t\to\tau}\int_{|\xi-x|>\delta} f(t,x;\tau,\xi)\,d\xi = 0,
\]
for each δ > 0.


The preceding lines make it perfectly clear that we can treat the adjoint
equation
\[
N^*(u(\tau,\xi)) \equiv L^*(u(\tau,\xi)) - p(\tau,\xi)\,u(\tau,\xi)
 + \int_{-\infty}^{+\infty} p(\tau,y)\,u(\tau,y)\,\frac{\partial P(\tau,y,\xi)}{\partial\xi}\,dy = 0
\]
in exactly the same way where L∗ (u) denotes the adjoint differential expression
of L(u):
\[
L^*(u(\tau,\xi)) \equiv -u_\tau + \frac{\partial^2}{\partial\xi^2}\bigl(a(\tau,\xi)\,u\bigr) - \frac{\partial}{\partial\xi}\bigl(b(\tau,\xi)\,u\bigr).
\]
Thus we arrive at a frequency function f ∗ (t, x; τ, ξ) with corresponding properties.
The proof that f and f ∗ are identical and that the fundamental relation
(9) holds, is achieved by directly adopting the arguments of §§ 3 and 4. Indeed,
if u(τ, ξ) and v(τ, ξ) are (assumed to be) sufficiently regular, we have the
identity
\[
\int_{t'}^{t''} ds\int_{-\infty}^{+\infty}\bigl\{v\,N(u) - u\,N^*(v)\bigr\}\,dy
 \equiv \int_{-\infty}^{+\infty} u(t'',y)\,v(t'',y)\,dy - \int_{-\infty}^{+\infty} u(t',y)\,v(t',y)\,dy,
\]
where both operators act in the variables s, y.

4. Thus we have shown that the equation (26) with the initial value prob-
lem (5) defines a unique stochastic process and that the corresponding function
F (t, x; τ, ξ) admits a frequency function which is, as a function of t, x, also a
solution to (26) and, as a function of τ, ξ, a solution to the adjoint equation



¶ 159 (27). – All that remains ¶ to be shown is that for this function F (t, x; τ, ξ) the
relations (22) and (26) hold, which had been our starting points.
According to (148)–(149) we have for Δt > 0 because of the uniform con-
vergence and (144) that
.
F (t, x; t + Δt, ξ) = F0 (t, x; t + Δt, ξ) + J t+Δt − p(t, x)F0 (t, x; t + Δt, ξ)
 +∞ /
∂P (t, x, y)
+ p(t, x) F0 (t, y; t + Δt, ξ) dy + o(Δt)
−∞ ∂y
.
= F0 (t, x; t + Δt, ξ) + Δt − p(t, x)F0 (t, x; t + Δt, ξ)
 +∞ /
∂P (t, x, y)
+ p(t, x) F0 (t, y; t + Δt, ξ) dy + o(Δt).
−∞ ∂y

Since, however, F0 (t, x; t + Δt, ξ) converges to E(x, ξ) as Δt → 0, we get

F (t, x; t + Δt, ξ) = (1 − p(t, x)Δt)F0 (t, x; t + Δt, ξ) + Δtp(t, x)P (t, x, ξ) + o(Δt).

This is the relation (22) with F0 instead of G. The fact that F0 indeed satisfies
the conditions (23)–(25) has already been established in § 3.

References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
[1] H. Cramér: On the Mathematical Theory of Risk. Skandia-Festschrift,
Stockholm 1930.
[2] ——: Sur les propriétés asymptotiques d’une classe de variables aléatoires.
C. R. Acad. Sci. Paris 201 (1935).
[3] W. Feller: Über den zentralen Grenzwertsatz der Wahrscheinlichkeits-
rechnung. Math. Zeitschr. 40 (1935).m
[4] M. Gevrey: Sur les équations aux dérivées partielles du type parabolique.
Journ. Math. pures appl. (6) 9 (1913), pp. 305–471; 10 (1914), pp. 105–
148.
[5] J. Hadamard: Sur la solution fondamentale des équations aux dérivées
partielles du type parabolique. C. R. Acad. Sci. Paris 152 (1911).
[6] A. Khintchine: Asymptotische Gesetze der Wahrscheinlichkeitsrechnung.
Ergebnisse der Math. 2, Issue 4, Berlin 1933.
[7] A. Kolmogoroff: Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung.
Math. Annalen 104 (1931).n ¶ ¶ 160
[8] ——: Zur Theorie der stetigen zufälligen Prozesse. Math. Annalen 108
(1933).o
[9] ——: Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der
Math. 2, Issue 3, Berlin 1933.p
[10] ——: Sulla forma generale di un processo stocastico omogeneo. Ancora
sulla forma generale di un processo stocastico omogeneo. Both: Accad.
Naz. Lincei, Rend. (6) 15 (1932).q
[11] H. Lebesgue: Leçons sur l’intégration, 2nd ed., Paris 1928.
[12] E. E. Levi: Sull’equazione del calore. Ann. di math. pura appl. (3) 14
(1908).
[13] I. Petrowsky: Über das Irrfahrtproblem. Math. Annalen 109 (1934).r
[14] E. T. Whittaker and G. N. Watson: A Course of Modern Analysis, 4th
ed., Cambridge 1935.

Stockholm, February 1936.

(Received 17–February–1936.)

m This is [Feller 1935c] which is contained in these Selecta along with an English
translation: On the Central Limit Theorem of Probability Theory.
n On analytical methods in probability theory, in: A. N. Shiryayev (ed.): Selected Works
of A. N. Kolmogorov. Volume II: Probability Theory and Mathematical Statistics. Kluwer,
Dordrecht 1992, pp. 62–108.
o On the theory of continuous random processes, in: A. N. Shiryayev (ed.): Selected
Works of A. N. Kolmogorov. Volume II: Probability Theory and Mathematical Statistics.
Kluwer, Dordrecht 1992, pp. 156–168.
p Foundations of the theory of probability. Translation edited by Nathan Morrison, with
an added bibliography by A.T. Bharucha-Reid. Chelsea Publishing Co., New York 1956.
q Both parts are translated as On the general form of a homogeneous stochastic process,
in: A. N. Shiryayev (ed.): Selected Works of A. N. Kolmogorov. Volume II: Probability
Theory and Mathematical Statistics. Kluwer, Dordrecht 1992, pp. 121–127.
r On the problem of random walk, in: O. A. Oleinik (ed.): I. G. Petrovsky: Selected
Works. Part II: Differential Equations and Probability Theory. Gordon and Breach,
Amsterdam 1996, pp. 278–298.



Mathematische Zeitschrift 42 (1937) 301–312
Translation of [Feller 1937a]

¶ 301 On the Central Limit Theorem of Probability Theory. II
By Willy Feller in Stockholm
Introduction
In an earlier paper,1 among other results, exact conditions were determined
for the classical Laplace-Ljapounoff limit theorem to hold, and a criterion was
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments. Throughout the text the index ν has
been changed to μ since the Greek ν closely resembles v, the small Roman V .
1 On the central limit theorem of probability theory, Math. Zeitschr. 40 (1935). In the

sequel, this paper will be cited as 1. Let me use this opportunity for the following corrections.
a) My results rely essentially on a theorem establishing the connection between the conver-
gence of a sequence of distribution functions with the convergence of the corresponding char-
acteristic functions; for this theorem I quote in Footnote 10, p. 529, Bochner and I remark
that P. Lévy has proved “a somewhat more restrictive theorem”. This is a regrettable error
on my side: In fact, the above mentioned theorem which I have used is exactly due to P. Lévy.
b) Please read on p. 548, line 4 from below
 

dVμ (x) < instead of dVμ (x) < .
a
|x|> 4Tn
2 |x|<ηan

Finally, the following addendum might be of interest. On pp. 524 and 531 the hypo-
thetic case of bounded normalization factors an was explicitly excluded from the consid-
erations since “in this case one does not deal with an asymptotic law, but with the prob-
lem how to split Φ(x) into components”. After 1 had been published, this question was
answered conclusively. Cramér proved [“Über eine Eigenschaft der normalen
Verteilungsfunktion”, Math. Zeitschr. 41 (1936), also C. R. Acad. Sci., Paris 202 (1936)]
the following theorem which was repeatedly stated as a conjecture by P. Lévy: The integral
equation $\int_{-\infty}^{+\infty} V_1(x - y)\, dV_2(y) = \Phi(x)$, where $V_k(x)$ are
distribution functions, has only the solution
given to determine whether a given sequence of distribution functions con-
verges to the Gaussian standard normal distribution function Φ(x) (in which
case one could also calculate both sequences of normalizing factors). The first
aim of the following observations is to take up a suggestion by Marcel Riesz and
to cast the above criterion in a much more convenient form; as an additional
advantage, this formulation makes the connection with the known sufficient
conditions of Ljapounoff and of Lindeberg more transparent. The second aim
of this paper is to address a still remaining open question which arose when
dealing with the particular case of sequences with identical elements.
¶ 302 ¶ Let {Vk (x)} denote a sequence of distribution functions, i.e. non-decreas-
ing functions which are defined for all real x and have the limits 0 and 1,
respectively, as x → ±∞; we set
$$W_1(x) = V_1(x), \qquad W_{n+1}(x) = \int_{-\infty}^{\infty} W_n(x - y)\, dV_{n+1}(y). \tag{1}$$

We say, for short2 , that the sequence {Vk (x)} belongs to the Gaussian standard
normal distribution function Φ(x) with the normalizing factors {an }, if
$$W_n(a_n x) \to \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} y^2}\, dy \tag{2}$$
and
$$V_n(a_n x) \to E(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 & \text{for } x > 0. \end{cases} \tag{3}$$

The probabilistic meaning of the latter condition is that the influence of the
single components tends to zero as n increases, i.e. the convergence of Wn (an x)
is not caused by the dominating influence of the function Vn (x) which, if suit-
ably normalized, converges to Φ on its own.3 This condition is equivalent to the
requirement that $a_{n+1}/a_n \to 1$ and $a_n \to \infty$. The most general question with
regard to the central limit theorem is as follows: Given a sequence of distri-
bution functions {Vk (x)}; under which conditions do there exist two sequences
of normalizing constants {an } and {bn } such that the sequence {Vk (x + bk )}
with the normalizing factors {an } belongs to Φ(x); and, if so, how can one
determine these normalizing factors? The main result of 1 was a complete
answer to this question which we are now going to restate in the following
way.

$V_k = \Phi\left( \frac{x - m_k}{\sigma_k} \right)$ with $\sigma_1^2 + \sigma_2^2 = 1$, $m_1 + m_2 = 0$. This means that the case which was
excluded corresponds to the trivial sequence
$$V_\mu(x) = \Phi\left( \frac{x + b_\mu}{c_\mu} \right) \quad \text{with} \quad \sum b_\mu = 0, \quad \sum c_\mu^2 = 1.$$
2 1. p. 524.
3 1, p. 532. In this paper, a further possible case was excluded. As Cramér [cf. Footnote
¶ 303 ¶ Clearly, the question under consideration is invariant with respect to
shifts of the origin, i.e. there is a simultaneous answer for all sequences {Vk (x +
ck )} with given distribution functions {Vk (x)} and arbitrary constants ck .
Without loss of generality we may, therefore, assume that the coordinate origins
are not completely eccentric relative to the supports of the functions
Vk (x), i.e., for example, that for some fixed 0 < λ < 1 one has
(4) Vk (0−) < 1 − λ and Vk (0+) > λ.
(If λ = 1/2 this means that the coordinate origin is a median of Vk (x).) Then
we have the
Criterion A. Let the sequence {Vk (x)} be given (with the normalization (4) of
the coordinate origin). For the existence of two sequences of real numbers {an }
and {bn } such that the sequence {Vk (x + bk )} with normalization constants an
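belongs to Φ(x), it is necessary and sufficient that
$$\lim_{\substack{X \to \infty \\ n \to \infty}} \frac{X^2 \sum_{k=1}^{n} \int_{|x| > X} dV_k(x)}{\sum_{k=1}^{n} \int_{|x| < X} x^2\, dV_k(x)} = 0 \tag{5}$$
and, at the same time,
$$\lim_{X \to \infty} \lim_{n \to \infty} \frac{X^2}{\sum_{k=1}^{n} \int_{|x| < X} x^2\, dV_k(x)} = 0 \tag{6}$$
hold. Then (and only then) there is a sequence of real numbers Xn → ∞ such
that
$$\frac{X_n^2}{\sum_{k=1}^{n} \int_{|x| < X_n} x^2\, dV_k(x)} \to 0 \tag{7}$$
as well as
$$\sum_{k=1}^{n} \int_{|x| > X_n} dV_k(x) \to 0 \tag{8}$$
hold. In this case one may set
$$a_n^2 = \sum_{k=1}^{n} \left\{ \int_{|x| < X_n} x^2\, dV_k(x) - \left( \int_{|x| < X_n} x\, dV_k(x) \right)^2 \right\} \tag{9}$$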

1, last paragraph] has meanwhile shown, this case will practically not appear.



and define the constants bn by
$$\int_{|x| < a_n} (x - b_n)\, dV_n(x) = 0 \tag{10}$$
¶ 304 ¶ According to 1, p. 530 and 533, respectively, the thus defined sequences
{an } and {bn } can be replaced by two different sequences {a′n } and {b′n } if,
and only if,
$$\frac{a_n'}{a_n} \to 1 \quad \text{and} \quad \frac{1}{a_n} \sum_{k=1}^{n} (b_k - b_k') \to 0. \tag{11}$$

By the way, this is an immediate consequence of P. Lévy’s theorem on the


convergence of characteristic functions. — A method to calculate the numbers
Xn explicitly will be given in no. 2, p. 306.
The main condition (5) can be cast in a different form which is reminiscent of
Ljapounoff’s condition.

Criterion B. In the statement of Criterion A one may replace (5) by
$$\lim_{\substack{X \to \infty \\ n \to \infty}} \frac{\sum_{k=1}^{n} \int_{|x| < X} |x|^p\, dV_k(x)}{X^{p-2} \sum_{k=1}^{n} \int_{|x| < X} x^2\, dV_k(x)} = 0 \tag{12}$$

where p is any fixed number > 2.

The remark that both criteria are equivalent (as is easily proved with known
methods) is due to Marcel Riesz whose valuable advice is gratefully acknowl-
edged.
The second part of this paper is devoted to an important special case where
all elements are identical: Vk (x) = V (x). Condition (6) is (unless V (x) degen-
erates into a step function with exactly one discontinuity at the origin) auto-
matically satisfied, while (5) and (12) are transformed into the two equivalent
relations
$$\lim_{X \to \infty} \frac{\int_{|x| < X} |x|^p\, dV(x)}{X^{p-2} \int_{|x| < X} x^2\, dV(x)} = \lim_{X \to \infty} \frac{X^2 \int_{|x| > X} dV(x)}{\int_{|x| < X} x^2\, dV(x)} = 0. \tag{13}$$

In the latter form the condition was already given in 1 (§8, Example a, p. 554)
and discovered, almost simultaneously and independently with different techniques,
by A. Khintchine and P. Lévy4 . This gives a necessary and sufficient
condition for the existence of a sequence of real numbers {bn } such that the
sequence {V (x + bn )} belongs to Φ(x). ¶ I am indebted to Mr. P. Lévy for ¶ 305
pointing out the interesting question, whether and under which condition it is
possible to choose the bn independently of n: Only in this case one really deals
with a sequence with identical elements that belongs to Φ(x). It will now be
shown (no. 5, p. 310) that this is always the case: If the sequence {V (x + bn )}
belongs, for arbitrary bn , to Φ(x), i.e. if (13) holds, then V (x) admits a finite
first absolute moment5 ; in this case one can choose the number b in such a
way, that the first moment of V (x + b) vanishes and the sequence consisting
of the elements V (x + b) belongs to Φ(x). The corresponding normalizing
factors an can be calculated by the method shown in no. 2.
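
Condition (13) can be made concrete with closed-form examples. The following is a minimal sketch under stated assumptions: it uses two hypothetical symmetric densities that do not appear in the paper, namely f(x) = |x|^(-3) for |x| ≥ 1 (infinite variance, but finite first absolute moment) and f(x) = |x|^(-2)/2 for |x| ≥ 1 (a Cauchy-like tail). All function names are illustrative. For the first density both ratios in (13) tend to zero, for the second the tail ratio tends to 1.

```python
import math

# Assumed example 1: density f(x) = |x|**-3 for |x| >= 1, zero otherwise.
def tail_cubic(X):
    """Integral of dV over |x| > X (valid for X >= 1)."""
    return X ** -2

def trunc_second_moment_cubic(X):
    """Integral of x^2 dV over |x| < X (valid for X >= 1)."""
    return 2.0 * math.log(X)

def trunc_abs_moment_cubic(X, p):
    """Integral of |x|^p dV over |x| < X for p > 2 (valid for X >= 1)."""
    return 2.0 * (X ** (p - 2) - 1.0) / (p - 2)

# Assumed example 2: density f(x) = 0.5 * |x|**-2 for |x| >= 1, zero otherwise.
def tail_square(X):
    return 1.0 / X

def trunc_second_moment_square(X):
    return X - 1.0

def ratio_tail(X, tail, second):
    """Second ratio in (13): X^2 * tail mass / truncated second moment."""
    return X * X * tail(X) / second(X)

def ratio_moment(X, p):
    """First ratio in (13), evaluated for the cubic density."""
    return trunc_abs_moment_cubic(X, p) / (X ** (p - 2) * trunc_second_moment_cubic(X))

for X in (1e2, 1e4, 1e8):
    print(X,
          ratio_tail(X, tail_cubic, trunc_second_moment_cubic),
          ratio_moment(X, 3.0),
          ratio_tail(X, tail_square, trunc_second_moment_square))
```

The decay in the first case is only logarithmic (the tail ratio equals 1/(2 ln X)), which is typical for distributions on the boundary of the domain of attraction of Φ.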

1 Auxiliary Results
Our proof relies on the following facts which were proved in 1.

a) In order that the sequence of distribution functions {Vk (x)} with nor-
malization factors {an } belongs to Φ(x) it is necessary and sufficient6 , that for
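every fixed η > 0 we have simultaneously
$$\sum_{k=1}^{n} \int_{|x| > \eta a_n} dV_k(x) \to 0, \tag{1}$$
$$\frac{1}{a_n^2} \sum_{k=1}^{n} \left\{ \int_{|x| < a_n} x^2\, dV_k(x) - \left( \int_{|x| < a_n} x\, dV_k(x) \right)^2 \right\} \to 1, \tag{2}$$
$$\frac{1}{a_n} \sum_{k=1}^{n} \int_{|x| < a_n} x\, dV_k(x) \to 0. \tag{3}$$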

b) Let {bn } be any sequence of reals such that
$$\frac{b_n}{a_n} \to 0 \tag{4}$$

4 A. Khintchine: “Sul dominio di attrazione della legge di Gauss”, Giorn. Ist. Ital. Attuari
6 (1935). — P. Lévy: “Propriétés asymptotiques des sommes de variables aléatoires
indépendantes ou enchaînées”, J. Math. pures appl., IX.s. 14 (1935). I readily acknowledge
that Mr. P. Lévy kindly pointed out that his paper, despite the later date of publication,
had been submitted much earlier than mine (October 1934 vs. May 1935) and was presented
to the Société Math. de France.
5 This fact has been remarked by P. Lévy (loc. cit.)a and will again follow from our proof.
6 1, §1, p. 524 f.

a Feller refers to Lévy’s Theorem IV, p. 368 in the “Propriétés asymptotiques. . . ”. I am



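and let (1) be satisfied; then7 one has
$$\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > \eta a_n} dV_k(x + b_k) = 0 \tag{5}$$
¶ 306 ¶ and
$$\lim_{n \to \infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \left| \int_{|x| < a_n} x^2\, dV_k(x) - \int_{|x| < a_n} x^2\, dV_k(x + b_k) \right| = 0. \tag{6}$$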

c) If the relations (1) and (2) obtain and if bn is defined by (10), then
the sequence {Vk (x + bk )} belongs, with the normalization factors an , to Φ(x).
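Then, the convergence is even absolute8 , and one has
$$\lim_{n \to \infty} \frac{1}{a_n} \sum_{k=1}^{n} \left| \int_{|x| < a_n} x\, dV_k(x + b_k) \right| = 0, \qquad \lim_{n \to \infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \left( \int_{|x| < a_n} x\, dV_k(x + b_k) \right)^2 = 0. \tag{7}$$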
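
Theorem a) above lends itself to a quick numerical check. The sketch below assumes the simplest hypothetical case, identical components putting mass 1/2 on each of the points ±1, with the normalizing factors a_n = √n; neither choice is taken from the paper, and all function names are illustrative. It evaluates the left-hand sides of (1)–(3) and builds W_n by the repeated convolution (1) of the introduction, so that W_n(a_n x) can be compared with Φ(x).

```python
import math

def phi(x):
    """Gaussian standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def convolve(p, q):
    """Convolution of two probability mass functions given as {value: mass}."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

def sum_distribution(component, n):
    """Mass function of W_n for n identical components, via W_{k+1} = W_k * V."""
    w = dict(component)
    for _ in range(n - 1):
        w = convolve(w, component)
    return w

def cdf(pmf, x):
    """Distribution function of a discrete law at the point x."""
    return sum(m for v, m in pmf.items() if v <= x)

def conditions(component, n, a_n, eta):
    """Left-hand sides of (1), (2), (3) of Theorem a) for identical components."""
    tail = n * sum(m for v, m in component.items() if abs(v) > eta * a_n)
    m1 = sum(v * m for v, m in component.items() if abs(v) < a_n)
    m2 = sum(v * v * m for v, m in component.items() if abs(v) < a_n)
    return tail, n * (m2 - m1 * m1) / a_n ** 2, n * m1 / a_n

coin = {-1: 0.5, 1: 0.5}  # assumed example distribution
n = 101
a_n = math.sqrt(n)
print(conditions(coin, n, a_n, eta=0.5))
w = sum_distribution(coin, n)
print(cdf(w, a_n * 1.0), phi(1.0))
```

For this bounded example the three quantities already sit at their limits 0, 1 and 0 for moderate n, and the convolved distribution function is close to Φ.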

2 Sufficiency of Criterion A
First we show: If the conditions (5) and (6) hold, then there exists a sequence
of real numbers Xn → ∞ such that (7) and (8) are satisfied.
We have to distinguish between two cases. Firstly, if
$$\lim_{X \to \infty} \lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > X} dV_k(x) = 0 \tag{1}$$
holds, then (8) is true for every sequence Xn → ∞, whereas by (6), the relation
(7) is always satisfied for every sufficiently slowly growing sequence {Xn }.
Secondly, one may have
$$\lim_{X \to \infty} \lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > X} dV_k(x) = \alpha > 0 \tag{2}$$
(the existence of the iterated limits follows from monotonicity). Then, for
every 0 < ε < α, there is obviously a sequence X̄n (ε) → ∞ such that
$$\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > \bar{X}_n(\varepsilon)} dV_k(x) > 0, \qquad \lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > 2\bar{X}_n(\varepsilon)} dV_k(x) < \varepsilon. \tag{3}$$

7 The proof is evident, cf. also 1, §3, p. 537.


8 1, §1, p. 526

indebted to Hans Fischer for pointing this out.



Because of (5) one has
$$\lim_{n \to \infty} \frac{\bar{X}_n^2(\varepsilon)}{\sum_{k=1}^{n} \int_{|x| < \bar{X}_n(\varepsilon)} x^2\, dV_k(x)} = 0. \tag{4}$$
¶ 307 ¶ The second relation in (3) together with (4) shows that, for the validity
of (7) and (8), it is enough to pick a sufficiently slowly decreasing sequence
εn → 0 and to define Xn = X̄n (εn ).
Now choose an arbitrary sequence Xn → ∞ such that (7) and (8) hold, and
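define the numbers an by (9). By the Schwarz inequality one has
$$\left( \int_0^{X_n} x\, dV_k(x) \right)^2 \le \{ V_k(X_n) - V_k(0) \} \int_0^{X_n} x^2\, dV_k(x),$$
an analogous inequality holds for x < 0 and, by (4), one has
$$\left( \int_{|x| < X_n} x\, dV_k(x) \right)^2 \le (1 - \lambda) \int_{|x| < X_n} x^2\, dV_k(x).$$
Thus, because of (7) and (9)
$$\lim_{n \to \infty} \frac{a_n^2}{X_n^2} \ge \lambda \lim_{n \to \infty} \frac{1}{X_n^2} \sum_{k=1}^{n} \int_{|x| < X_n} x^2\, dV_k(x) = \infty.$$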

This, together with (8), proves (1) for every η > 0.


Moreover, if n is chosen so large that an > Xn can be excluded, one has by
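(8) and by (9)
$$\begin{aligned}
&\left| \frac{1}{a_n^2} \sum_{k=1}^{n} \left\{ \int_{|x| < a_n} x^2\, dV_k(x) - \left( \int_{|x| < a_n} x\, dV_k(x) \right)^2 \right\} - 1 \right| \\
&\quad = \frac{1}{a_n^2} \left| \sum_{k=1}^{n} \left\{ \int_{|x| < a_n} x^2\, dV_k(x) - \int_{|x| < X_n} x^2\, dV_k(x) - \left( \int_{|x| < a_n} x\, dV_k(x) \right)^2 + \left( \int_{|x| < X_n} x\, dV_k(x) \right)^2 \right\} \right| \\
&\quad \le 3 \sum_{k=1}^{n} \int_{|x| > X_n} dV_k(x) \to 0.
\end{aligned}$$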

Therefore, also the relation (2) holds and Theorem c) mentioned in no. 1 shows
that the condition of the criterion is indeed sufficient.
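
To see how the construction of Xn and an plays out for a concrete distribution, the following sketch assumes identical components with the hypothetical density |x|^(-3) for |x| ≥ 1 and the trial choice X_n = √n (ln n)^(1/4). Both the density and this choice of X_n are illustrative assumptions, not taken from the paper. For this symmetric density the truncated first moment vanishes, so (9) reduces to a_n² = 2n ln X_n.

```python
import math

def x_n(n):
    """Trial truncation level X_n = sqrt(n) * (ln n)^(1/4) (an assumption)."""
    return math.sqrt(n) * math.log(n) ** 0.25

def lhs_7(n):
    """Left-hand side of (7): X_n^2 / (2 n ln X_n) for the assumed density."""
    X = x_n(n)
    return X * X / (2.0 * n * math.log(X))

def lhs_8(n):
    """Left-hand side of (8): n * P(|x| > X_n) = n / X_n^2."""
    X = x_n(n)
    return n / (X * X)

def a_n(n):
    """Normalizing factor from (9); the first-moment term is zero by symmetry."""
    return math.sqrt(2.0 * n * math.log(x_n(n)))

for n in (10 ** 3, 10 ** 6, 10 ** 12):
    print(n, lhs_7(n), lhs_8(n), a_n(n) / math.sqrt(n * math.log(n)))
```

Both left-hand sides decrease towards zero, though only at a logarithmic rate, and the last column suggests that a_n behaves like √(n ln n) for this example.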



3 Necessity of Criterion A
Let the coordinate origins again be normalized by (4) and assume that the
sequence {Vk (x + bk )} belongs, with the normalizing factors an , to Φ(x). Ac-
cording to Theorem a) from no. 1 one has for each fixed η > 0
$$\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > \eta a_n} dV_k(x + b_k) = 0, \qquad \lim_{n \to \infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < a_n} x^2\, dV_k(x + b_k) \ge 1.$$

¶ 308 ¶ From the first of these relations one concludes that, as n → ∞, one has
uniformly for k = 1, 2, . . . , n
$$\int_{|x| > \eta a_n} dV_k(x + b_k) \to 0.$$
Since η may be chosen arbitrarily small, this and (4) immediately entail that
$$\lim_{n \to \infty} \frac{1}{a_n} \max\{ |b_1|, |b_2|, \ldots, |b_n| \} = 0. \tag{1}$$

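By Theorem b) one thus has
$$\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > \eta a_n} dV_k(x) = 0, \qquad \lim_{n \to \infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < a_n} x^2\, dV_k(x) \ge 1. \tag{2}$$
This yields, by division,
$$\lim_{n \to \infty} \frac{a_n^2 \sum_{k=1}^{n} \int_{|x| > \eta a_n} dV_k(x)}{\sum_{k=1}^{n} \int_{|x| < a_n} x^2\, dV_k(x)} = 0 \tag{3}$$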

for every fixed η > 0.


Let Yn be any divergent, increasing sequence of positive reals. Since
$a_n \to \infty$ and $a_{n+1}/a_n \to 1$ (cf. the introduction, p. 302 and 1, pp. 523 and 531,
respectively), one can find, for sufficiently large n, some kn such that
$$a_{k_n} < Y_n \le 2 a_{k_n}.$$
Then, however, one has by (3)
$$\frac{Y_n^2 \sum_{k=1}^{n} \int_{|x| > Y_n} dV_k(x)}{\sum_{k=1}^{n} \int_{|x| < Y_n} x^2\, dV_k(x)} \le \frac{4 a_{k_n}^2 \sum_{k=1}^{n} \int_{|x| > a_{k_n}} dV_k(x)}{\sum_{k=1}^{n} \int_{|x| < a_{k_n}} x^2\, dV_k(x)} \to 0;$$
since the sequence Yn → ∞ is arbitrary, this entails the necessity of (5).


All that remains is to prove that (6) holds. Again there are two cases.
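First, if (1) holds, then there is some number X0 such that for X > X0 ,
$$\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| \ge X} dV_k(x) < \frac{1}{2}$$
¶ 309 ¶ holds true. If X > X0 and if n is so large that an > X, one therefore has
$$\begin{aligned}
\frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < a_n} x^2\, dV_k(x) &\le \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < X} x^2\, dV_k(x) + \sum_{k=1}^{n} \int_{|x| \ge X} dV_k(x) \\
&< \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < X} x^2\, dV_k(x) + \frac{1}{2}.
\end{aligned}$$
By (2) this immediately implies that
$$\lim_{n \to \infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < X} x^2\, dV_k(x) \ge \frac{1}{2},$$
and, since an → ∞, one has
$$\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| < X} x^2\, dV_k(x) = \infty;$$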

this proves that (6) holds. Alternatively, (2) obtains. Then (6) is a direct
consequence of (5) and (2), as one can see using a sufficiently slowly growing
sequence Xn → ∞.

4 Equivalence of Both Criteria


We use for X > 0 and p ≥ 2 the following shorthand
$$F_n(X) = \sum_{k=1}^{n} \int_{|x| > X} dV_k(x), \qquad F_n^{(p)}(X) = \sum_{k=1}^{n} \int_{|x| < X} |x|^p\, dV_k(x).$$
As a function of X, $F_n(X)$ is non-increasing and $F_n^{(p)}(X)$ non-decreasing, and
one has
$$F_n^{(p)}(X) = -\int_0^X x^p\, dF_n(x).$$
We have to show that (5) implies (12), and vice versa. This can be done (cf.
p. 304), for example, in the following way.
Assume first that (5) holds. For any given ε > 0 one picks some n̄ = n̄(ε)
and X̄ = X̄(ε) such that for all n > n̄ and X > X̄
$$\frac{X^2 F_n(X)}{F_n^{(2)}(X)} < \varepsilon$$
¶ 310 ¶ holds. For p > 2, n > n̄ and X > X̄ one easily obtains by integration by
parts
$$\begin{aligned}
F_n^{(p)}(X) = -\int_0^X x^p\, dF_n(x) &= C - \int_{\bar{X}}^X x^p\, dF_n(x) \\
&= C' + p \int_{\bar{X}}^X x^{p-1} F_n(x)\, dx \\
&< C' + \varepsilon p \int_{\bar{X}}^X x^{p-3} F_n^{(2)}(x)\, dx \\
&< C' + \frac{\varepsilon p}{p - 2}\, X^{p-2} F_n^{(2)}(X);
\end{aligned}$$
here, C and C′ denote positive constants which may depend on ε, but not on
n and X. Since it is certain that $F_n^{(2)}(X)$ is positive for sufficiently large X,
it follows from the last inequality that
$$\frac{F_n^{(p)}(X)}{X^{p-2} F_n^{(2)}(X)}$$
is uniformly small for large n and X, q.e.d.


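Conversely, the proof that (12) implies (5) is even easier. From (12) it
follows for sufficiently large n and X
$$F_n^{(p)}(X) < \varepsilon X^{p-2} F_n^{(2)}(X). \tag{1}$$
For fixed n and p > 2 the expression $\frac{1}{X^p} F_n^{(p)}(X)$ trivially tends to 0 and so one
has by (1) for sufficiently large n and X
$$\begin{aligned}
F_n(X) = \int_X^\infty \frac{1}{x^p}\, dF_n^{(p)}(x) &\le p \int_X^\infty \frac{1}{x^{p+1}} F_n^{(p)}(x)\, dx \\
&< \varepsilon p \int_X^\infty \frac{1}{x^3} F_n^{(2)}(x)\, dx \\
&< \frac{\varepsilon p}{2} \frac{1}{X^2} F_n^{(2)}(X) + \frac{\varepsilon p}{2} \int_X^\infty \frac{1}{x^2}\, dF_n^{(2)}(x) \\
&= \frac{\varepsilon p}{2} \frac{1}{X^2} F_n^{(2)}(X) + \frac{\varepsilon p}{2} F_n(X)
\end{aligned}$$
or, for 0 < ε < 2/p,
$$\frac{X^2 F_n(X)}{F_n^{(2)}(X)} < \frac{\varepsilon p}{2 - \varepsilon p},$$
q.e.d.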

5 The Case of Identical Components


Let the sequence {Vk (x) = V (x + bk )} with the normalization factors an , where
V (x) is a fixed distribution function, belong to Φ(x). According to Theorem
a) in no. 1, p. 305, one has for every η > 0
$$\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|x| > \eta a_n} dV(x + b_k) = 0, \tag{1}$$
¶ 311 ¶
$$\lim_{n \to \infty} \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x| < a_n} x\, dV(x + b_k) = 0; \tag{30}$$
moreover, there is a relation which corresponds to (2) but which we will not
explicitly use in the sequel.
Set
$$b = \int_{-\infty}^{+\infty} x\, dV(x);$$
following a remark by P. Lévy (cf. p. 305, Footnote 5), this is well defined
and our proof includes this, too. The theorem in question is as follows: The
sequence consisting of identical elements V (x + b) belongs to Φ(x), too. By a
theorem mentioned in the introduction on p. 304 this is the case if, and only
if,
$$\frac{1}{a_n} \sum_{k=1}^{n} (b - b_k) \to 0$$



and this relation is, by (30), equivalent to
$$\lim_{n \to \infty} \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x| \ge a_n} x\, dV(x + b_k) = 0. \tag{3}$$
Similar to the argument at the beginning of no. 3 it easily follows from (1)
that, in turn, (1) holds true. For sufficiently large n and k = 1, . . . , n one has,
therefore,
$$\left| \int_{|x| \ge a_n} x\, dV(x + b_k) \right| \le \int_{|x| > \frac{1}{2} a_n} (|x| + |b_k|)\, dV(x).$$
If we can show that
$$\lim_{n \to \infty} \frac{n}{a_n} \int_{|x| > \frac{1}{2} a_n} |x|\, dV(x) = 0 \tag{4}$$

holds, then (3), hence our claim, are proved. Moreover, (4) contains the finite-
ness of the first moment of V (x).
By Theorem c) of no. 1 one may assume without loss of generality that the
constants bn are chosen in such a way that the sequence Vk (x) = V (x + bk )
satisfies the relations (7); then (2) becomes
$$\lim_{n \to \infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < a_n} x^2\, dV(x + b_k) = 1,$$
and one knows from Theorem b) that
$$\lim_{n \to \infty} \frac{n}{a_n^2} \int_{|x| < a_n} x^2\, dV(x) = 1.$$
¶ 312 ¶ Because of this relation, (4) is equivalent to
$$\lim_{n \to \infty} \frac{a_n \int_{|x| > a_n} |x|\, dV(x)}{\int_{|x| < a_n} x^2\, dV(x)} = 0. \tag{5}$$
For X > 0 we set
$$F(X) = \int_{|x| > X} dV(x), \qquad F^{(2)}(X) = \int_{|x| < X} x^2\, dV(x), \qquad G(X) = \int_{|x| > X} |x|\, dV(x).$$
Then (5) is contained in
$$\lim_{X \to \infty} \frac{X G(X)}{F^{(2)}(X)} = 0,$$

and, thus, it is enough to prove this assertion. By our criterion, $\frac{X^2 F(X)}{F^{(2)}(X)}$
tends to zero so that for sufficiently large X
$$F(X) < \frac{\varepsilon}{X^2} F^{(2)}(X)$$
holds, as well as
$$\begin{aligned}
G(X) = -\int_X^\infty x\, dF(x) &= X F(X) + \int_X^\infty F(x)\, dx \\
&< \frac{\varepsilon}{X} F^{(2)}(X) + \varepsilon \int_X^\infty \frac{1}{x^2} F^{(2)}(x)\, dx \\
&= \frac{2\varepsilon}{X} F^{(2)}(X) + \varepsilon G(X).
\end{aligned}$$
Thus, one has for 0 < ε < 1
$$\lim_{X \to \infty} \frac{X G(X)}{F^{(2)}(X)} \le \frac{2\varepsilon}{1 - \varepsilon}$$

q.e.d.

(Received 24–August–1936)



Erratum: Math. Zeitschrift 44 (1939) p. 794

Corrections
Volume 42, pp. 301–312

I am grateful to Messrs. Doeblin and Fréchet for the kind advice that the
new version on pp. 303–4 of my necessary and sufficient condition from vol. 40,
pp. 521 f. is due to a regrettable mistake (following the equality (3)): In the
present form the criterion is only sufficient. In fact, the validity of (5), hence of
(12), has to be restricted in an obvious way, which also becomes immediately
clear from the proof. I do not give details since, meanwhile, Mr. Doeblin
succeededb in proving an even more useful formulation of the criterion.
In the case of identical components, the criterion remains valid and so
does the main result of the paper: The answer to Mr. P. Lévy’s question is
independent of the first part.
Willy Feller.

b Feller probably refers to W. Doeblin: Sur les sommes d’un grand nombre de variables

aléatoires indépendantes. Bulletin des Sciences Mathématiques, II. Ser. 63 (1939) 23–32
and 35–64.



Acta. Sci. Litt. Szeged 8 (1937) 191–201
Translation of [Feller 1937b]

¶ 191 On the Law of Large Numbers
Dedicated to Harald Bohr
on the occasion of his
50th birthday 22–April–1937

By Willy Feller in Stockholm

1. A sequence of random variables X1 , X2 , . . . , is said to converge in probability¹
to zero, if the probability of the relation |Xn | > ε > 0 tends to zero as
n increases, i.e. if the distribution functions Vn (x) of the Xn satisfy
$$\lim_{n \to \infty} V_n(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 & \text{for } x > 0. \end{cases}$$

Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments.
1 For all definitions as well as for a mathematically rigorous foundation for the notions

used here, I refer to the fundamental treatises by A. Kolmogoroff, Grundbegriffe der Wahr-
scheinlichkeitsrechnung a and A. Khintchine, Asymptotische Gesetze der Wahrscheinlich-
keitsrechnung, respectively. Both are included in: Ergebnisse der Mathematik, Bd. 2 (Berlin
1933). For linguistic reasons I prefer “stochastic variable” (variable aléatoire = real function
on the basic set) over “chance variable”. b
a A. N. Kolmogorov: Foundations of the theory of probability. Translation edited by

Nathan Morrison, with an added bibliography by A. T. Bharucha-Reid. Chelsea Publishing


Co., New York 1956.
b Following today’s custom, see also Feller’s later English-language publications, we will

use throughout the familiar random variable.

Moreover, the sequence {Xn } satisfies the law of large numbers, if there is a
sequence of real numbers {bn } such that the sequence of random variables
$$\frac{1}{n} \sum_{k=1}^{n} (X_k - b_k) = \frac{1}{n} S_n$$

converges in probability to zero. In the case of mutually independent random


variables Xn Kolmogoroff has given a necessary and sufficient condition.2
¶ 192 Recently, ¶ using different methods, Plessner3 has derived a stronger sufficient
condition.
We are going to give a new proof of Kolmogorov’s theorem; at the same time
this is a generalization which is inspired by a recent question of Khintchine4 in
a related special situation. A new proof of Kolmogorov’s condition might be in-
teresting in order to get a unification of the methods. The unforeseen progress
in this direction of modern probability theory—which is essentially due to the
Moscow mathematical school—emphasizes the mathematical treatment based
on differential equations and characteristic functions. The latter are used in
the following proof which is closely related to my derivation of necessary and
sufficient conditions for the Laplace–Ljapounov limit theorem5 ,—and the law
of large numbers may well be seen as a degenerate case of this theorem. This
approach allows, without substantial changes, to replace the normalization
$\frac{1}{n} S_n$ by a general normalization $\frac{1}{a_n} S_n$. We will, in fact, prove the following

Theorem. Assume that Xn are pairwise independent random variables with
distribution functions Vn (x). Then there exists a sequence of constants {bn }
such that $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - b_k)$ converges in probability to zero if
$$\sum_{k=1}^{n} \int_{|x| > a_n} dV_k(x) = o(1) \tag{1}$$

2 A. Kolmogoroff: Über die Summen durch den Zufall bestimmter unabhängiger Größen,

Math. Annalen, 99 (1928), pp. 300–319. One should note the erratum with corrections: A.
Kolmogoroff, Bemerkungen zu meiner Arbeit “Über die Summen zufälliger Grössen”, Math.
Annalen, 102 (1930), pp. 484–488.c
3 A. Plessner: Über das Gesetz der großen Zahlen, Recueil Math. (= Matematitscheski

Sbornik, Moskau), 43 (= neue Folge, 1) (1936), pp. 165–168.d


4 A. Khintchine, Su una legge dei grandi numeri generalizzata, Giornale Istituto Attuari,

7 (1936), pp. 365–377.


5 W. Feller, Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung I, Math.

c English translations are contained in A.N. Shiryayev (ed.): Selected Works of A.N.

Kolmogorov. Volume II: Probability Theory and Mathematical Statistics. Kluwer, Dordrecht
1992, pp. 15–26, 26–31.
d Feller writes in his Zentralblatt review [Zbl 0014.16804] that Plessner’s paper generalizes

Khintchine’s result [C. R. Acad. Sci., Paris 188 (1929) 477–479] to independent, identically
distributed random variables with finite absolute first moments.
e Feller refers to the central limit theorem and his papers [Feller 1935c] and [Feller 1937a].



¶ 193 and ¶
$$\frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x| < a_n} x^2\, dV_k(x) = o(1). \tag{2}$$
In this case, one may set
$$b_n = \int_{|x| < a_n} x\, dV_n(x). \tag{3}$$
|x|<an

If the coordinate origins are chosen in such a way that for all n
(4) Vn (+0) ≥ λ > 0, Vn (−0) ≤ 1 − λ
hold, then these conditions are also necessary.
The random variables $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - b_k)$ and $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - b_k')$ simultaneously
converge in probability to zero if, and only if (cf. no. 2. on p. 195)
$$\frac{1}{a_n} \sum_{k=1}^{n} (b_k - b_k') = o(1) \tag{5}$$

holds. There are, obviously, always sequences {an } such that (1) and (2) are
satisfied; therefore it is basically a matter of determining the slowest possible
growth.
If an = n, we get Kolmogorov’s theorem. The connection to the problem
studied by Khintchine (cf. footnote 4) is as follows. There it is assumed,
restrictively, that all Xn are positive and that they have the same (continuous)
distribution function V (x):
(6) Vn (x) = V (x), V (0) = 0.
Then the question is asked under which conditions one can pick the constants
an and bn such that
$$\frac{1}{a_n} \sum_{k=1}^{n} b_k = 1. \tag{7}$$
By (3) this is equivalent to
$$\frac{n}{a_n} \int_0^{a_n} x\, dV(x) = 1 + o(1) \tag{8}$$
which yields, because of (1),
$$\int_{a_n}^{\infty} dV(x) = o\left( \frac{1}{a_n} \int_0^{a_n} x\, dV(x) \right). \tag{9}$$

Zeitschrift, 40 (1935), pp. 521–559 and II, loc. cit. 42 (1937), pp. 301–312.e



¶ 194 ¶ Because of (1) and (8) we have trivially $a_{n+1}/a_n \to 1$, and so (9) implies, more
generally,
$$\int_z^{\infty} dV(x) = o\left( \frac{1}{z} \int_0^z x\, dV(x) \right), \qquad z \to \infty. \tag{10}$$

Conversely, if (10) holds, we can easily choose the an in such a way that (1)
and (8) are satisfied. The condition (2) is, if we assume (6), a consequence of
(10). In fact, one has
$$\int_0^z x^2\, dV(x) = 2 \int_0^z x \{ V(z) - V(x) \}\, dx \le 2 \int_0^z x\, dx \int_x^{+\infty} dV(y),$$
which easily implies the estimate
$$\frac{n}{a_n^2} \int_0^{a_n} x^2\, dV(x) = o\left( \frac{n}{a_n} \int_0^{a_n} x\, dV(x) \right) = o(1).$$

Thus, (10) is necessary and sufficient for the choice of the constants. (The con-
dition which was obtained by Khintchine can be reduced to (10) by changing
the order of integration and evaluating one integral.)
In the general case, where Vn (x) = V (x) and with the usual normalization
an = n, (1) is equivalent to V (−z) + 1 − V (z) = o(1/z). Condition (2) is then
a simple consequence. This condition has already been derived by Cramér6
using characteristic functions. If the first moment is finite, the same method
had already been used by Khintchine7 .
Let us finally remark that the above mentioned theorem immediately gives
a more precise answer to the question raised in the so-called St. Petersburg
Paradox (cf. no. 7., p. 200 f.).

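2. Setting
$$v_n(t) = \int_{-\infty}^{+\infty} e^{ixt}\,dV_n(x),$$
¶ 195 ¶ we know that the characteristic function of $(X_n - b_n)$ is $e^{-ib_nt}v_n(t)$ and that
of $\frac{1}{a_n}\sum_{k=1}^{n}(X_k-b_k)$ is
$$w_n(t) = e^{-\frac{it}{a_n}\sum_{k=1}^{n} b_k}\,\prod_{k=1}^{n} v_k\!\left(\frac{t}{a_n}\right). \tag{11}$$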

6 In a booklet appearing soon in the Cambridge Tracts series.f

7 A. Khintchine, Sur la loi des grands nombres, Comptes Rendus Paris, 189 (1929), pp.
477–479
f H. Cramér: Random Variables and Probability Distributions. Cambridge Tracts in

432 On the Law of Large Numbers


According to a well-known theorem by P. Lévy8 on the convergence of characteristic
functions, the convergence in probability to zero of $\frac{1}{a_n}\sum_{k=1}^{n}(X_k-b_k)$
is equivalent to

$$w_n(t) \to 1 \tag{12}$$

uniformly on every finite interval.
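
The criterion (12) can be checked numerically in a simple case. The sketch below assumes identically distributed exponential variables with mean 1, whose characteristic function is $v(t) = 1/(1-it)$, the usual normalization $a_n = n$, and the constant centering $b_k = 1$; these choices are assumptions made for illustration only. It evaluates $w_n(t)$ from (11) on a grid in $[-T, T]$ and reports the largest deviation from 1.

```python
import cmath

def v(t):
    """Characteristic function of the exponential distribution with mean 1 (assumed example)."""
    return 1.0 / (1.0 - 1j * t)

def w(n, t, a_n=None, b=1.0):
    """w_n(t) from (11) for identically distributed X_k with constant centering b_k = b."""
    if a_n is None:
        a_n = float(n)  # usual normalization a_n = n
    return cmath.exp(-1j * t * n * b / a_n) * v(t / a_n) ** n

def max_deviation(n, T=5.0, grid=201):
    """Largest |w_n(t) - 1| over an equidistant grid of points in [-T, T]."""
    ts = [-T + 2.0 * T * j / (grid - 1) for j in range(grid)]
    return max(abs(w(n, t) - 1.0) for t in ts)

for n in (10, 100, 1000, 10000):
    print(n, max_deviation(n))
```

In this example the deviation shrinks roughly like $T^2/(2n)$, in line with uniform convergence on every finite interval.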


Incidentally, the combination of (11) and (12) immediately proves the re-
mark connected with (5). — If we exclude the case that |vn (t)| ≡ 1 for all n
(in this case the theorem would be trivial), it directly follows from (12) that
an → ∞. Therefore, it is easy to see that we may always restrict ourselves to
monotonically increasing, divergent sequences {an }.

3. We begin by showing that the conditions (1) and (2) are necessary; to
do so, we start with (12) and assume that (4) is satisfied.
From (12) it follows that, uniformly in every interval $|t| < T$,
$$\int_{-\infty}^{+\infty} e^{itx/a_n}\,dV_n(x+b_n) \to 1.$$
Thus, for every fixed $\eta > 0$,
$$\int_{|x|>\eta a_n}\left(1-\cos\frac{xt}{a_n}\right)dV_n(x+b_n) \to 0,$$
and integrating over a fixed interval $0 < t < T$, $T > 1/\eta$, yields
$$0 \le \left(T-\frac{1}{\eta}\right)\int_{|x|>\eta a_n} dV_n(x+b_n) \le \int_{|x|>\eta a_n}\left(T-\frac{a_n}{x}\sin\frac{xT}{a_n}\right)dV_n(x+b_n) \to 0.$$
¶ 196 ¶ Thus, for every positive $\eta$,
$$\int_{|x|>\eta a_n} dV_n(x+b_n) \to 0. \tag{13}$$

From (4) and (13) one obviously gets that

(14) |bn | = o(an )

is necessary. From this and (12) one obtains easily, using the monotonicity of
the sequence $\{a_n\}$, that
$$\int_{|x|>\eta a_n} dV_k(x) \to 0 \tag{15}$$
8 P. Lévy, Calcul des Probabilités (Paris 1925), pp. 195 and 197.

Math. and Math. Phys. 36, Cambridge Univ. Press, Cambridge 1937.



uniformly in k = 1, 2, . . . , n as n → ∞.
We will now show that in every interval $|t| < T$, for $n > N = N(T)$ and
$k = 1, 2, \dots, n$,
$$\left(\int_{-\infty}^{+\infty}\sin\frac{xt}{a_n}\,dV_k(x)\right)^{2} \le \left(1-\frac{\lambda}{2}\right)\int_{-\infty}^{+\infty}\sin^2\frac{xt}{a_n}\,dV_k(x) \tag{16}$$
holds. For this we set $\eta = \pi/(2T)$. Then $\sin\frac{xt}{a_n}$ is, for all $0 < t < T$, positive in
the interval $0 < x < \eta a_n$ and negative in $-\eta a_n < x < 0$. Whenever the left-hand
side of (15) is smaller than $\lambda/2$, it follows from (4) that, for every fixed
$|t| < T$, the variation of all $V_k(x)$ with respect to the two sets where $\sin\frac{xt}{a_n}$ is
non-negative and non-positive, respectively, is at least $\lambda/2$. An application of
the Schwarz inequality easily shows that (16) holds.
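By (12) one sees that
$$\prod_{k=1}^{n}\left|\int_{-\infty}^{+\infty} e^{ixt/a_n}\,dV_k(x)\right| \to 1; \tag{17}$$
from (16) it follows that
$$\begin{aligned}
\left|\int_{-\infty}^{+\infty} e^{ixt/a_n}\,dV_k(x)\right|^{2}
&\le \left(\int_{-\infty}^{+\infty}\cos\frac{xt}{a_n}\,dV_k(x)\right)^{2} + \left(1-\frac{\lambda}{2}\right)\int_{-\infty}^{+\infty}\sin^2\frac{xt}{a_n}\,dV_k(x)\\
&\le 1-\frac{\lambda}{2}\int_{-\infty}^{+\infty}\sin^2\frac{xt}{a_n}\,dV_k(x)
\end{aligned}$$
¶ 197 ¶ holds. Because of (17), we thus have that
$$\sum_{k=1}^{n}\int_{-\infty}^{+\infty}\sin^2\frac{xt}{a_n}\,dV_k(x) \to 0,$$
uniformly in every finite interval. In particular, for each positive $\eta$ one has
$$\sum_{k=1}^{n}\int_{|x|>\eta a_n}\left(1-\cos\frac{2xt}{a_n}\right)dV_k(x) \to 0 \tag{18a}$$
and
$$\sum_{k=1}^{n}\int_{|x|<\eta a_n}\left(1-\cos\frac{2xt}{a_n}\right)dV_k(x) \to 0. \tag{18b}$$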
Integrating (18a) over $0 < t < T$, $T > 1/\eta$, one obtains (cf. the derivation of
(13)) that
$$\sum_{k=1}^{n}\int_{|x|>\eta a_n} dV_k(x) \to 0. \tag{19}$$



This proves the necessity of (1).
Now choose some fixed $\eta > 0$ so small that for $|t| < T$ and $0 \le |x| \le \eta a_n$
$$1-\cos\frac{2xt}{a_n} \ge \frac{x^2t^2}{a_n^2}$$
holds. From the relation (18b) one concludes immediately that for this $\eta$
$$\frac{1}{a_n^2}\sum_{k=1}^{n}\int_{|x|<\eta a_n} x^2\,dV_k(x) \to 0. \tag{20}$$

Since (19) holds for every η > 0, one easily sees that (20) remains true for any
η > 0 and, in particular, for η = 1. This proves the necessity of (2).

4. In order to prove that our conditions are sufficient, we begin with the
following

Lemma. Let $\{a_n\}$ denote any monotonically increasing, divergent sequence
of real numbers such that for every $\eta > 0$ the relation ¶ 198 ¶ (19) holds, and define
the constants $b_n$ by (3). Then one has
$$\frac{1}{a_n}\sum_{k=1}^{n}\left|\int_{|x|<a_n}(x-b_k)\,dV_k(x)\right| = o(1). \tag{21}$$

The same lemma was used in the proof of the central limit theorem9. The
proof given there relies essentially on the fact that $a_{n+1}/a_n \to 1$, which is not
assumed in the present setting. In order to prove the lemma in full generality,
one determines $N = N(\varepsilon)$ such that for all $n \ge N$
$$\sum_{k=1}^{n}\int_{|x|\ge\frac{1}{e}a_n} dV_k(x) < \varepsilon. \tag{22}$$

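Then, by (3), one has
$$\begin{aligned}
\frac{1}{a_n}\sum_{k=N}^{n}\left|\int_{|x|<a_n}(x-b_k)\,dV_k(x)\right|
&= \frac{1}{a_n}\sum_{k=N}^{n}\left|\int_{a_k\le|x|<a_n}(x-b_k)\,dV_k(x)\right|\\
&\le \frac{1}{a_n}\sum_{k=N}^{n}\int_{e^{[\log a_k]}\le|x|<e^{[\log a_n]+1}}|x-b_k|\,dV_k(x)\\
&\le \frac{1}{a_n}\sum_{s=[\log a_N]}^{[\log a_n]}\;\sum_{a_N\le a_k\le e^{s+1}}\;\int_{e^{s}\le|x|<e^{s+1}}|x-b_k|\,dV_k(x).
\end{aligned}$$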

9 op. cit. Footnote 5, [Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrech-



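Clearly, $|b_k| \le a_k$, and so the right-hand side of the last line can be estimated
by
$$\begin{aligned}
&\le \frac{2}{a_n}\sum_{s=[\log a_N]}^{[\log a_n]} e^{s+1}\sum_{a_N\le a_k\le e^{s+1}}\int_{|x|\ge e^{s}} dV_k(x)\\
&\le \frac{2\varepsilon}{a_n}\sum_{s=[\log a_N]}^{[\log a_n]} e^{s+1} < 2\varepsilon e^{2}.
\end{aligned}$$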

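Moreover, one obtains by splitting the domain of integration
$$\begin{aligned}
\frac{1}{a_n}\sum_{k=1}^{N-1}\left|\int_{|x|<a_n}(x-b_k)\,dV_k(x)\right|
&\le \frac{2a_N}{a_n}\sum_{k=1}^{N-1}\int_{|x|<a_N} dV_k(x) + 2\sum_{k=1}^{N-1}\int_{|x|\ge a_N} dV_k(x)\\
&\le \frac{2a_N}{a_n}N + 2\varepsilon.
\end{aligned}$$
Since $N = N(\varepsilon)$ is fixed and $a_n \to \infty$, the right-hand side tends to $2\varepsilon$ so that
¶ 199 ¶ for sufficiently large values of n
$$\frac{1}{a_n}\sum_{k=1}^{n}\left|\int_{|x|<a_n}(x-b_k)\,dV_k(x)\right| < (2e^{2}+3)\,\varepsilon$$
holds, q.e.d.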

5. We will also need the following


Lemma. If any sequence of real numbers $\{a_n\}$ satisfies the relations (1) and
(2), and if $b_n$ is defined by (3), then
$$\sum_{k=1}^{n}\int_{|x|>a_n} dV_k(x+b_k) = o(1), \tag{23}$$
$$\frac{1}{a_n^2}\sum_{k=1}^{n}\int_{|x|<a_n} x^2\,dV_k(x+b_k) = o(1), \tag{24}$$
and
$$\frac{1}{a_n}\sum_{k=1}^{n}\left|\int_{|x|<a_n} x\,dV_k(x+b_k)\right| = o(1). \tag{25}$$

For the proof, observe first that by (1) and (2) one has for each $\eta > 0$
$$\sum_{k=1}^{n}\int_{|x|>\eta a_n} dV_k(x) \to 0. \tag{26}$$

nung] I, p. 534.



Moreover, from (26) and (3) it immediately follows, because of an → ∞ (cf.
no. 2., p. 195), that

(27) |bn | = o(an ).

Now (23) is a trivial consequence of (26) and (27).


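Since we have assumed that the $\{a_n\}$ are monotone, we get from (27) that,
for sufficiently large n, $\max\{|b_1|,\dots,|b_n|\} < \tfrac12 a_n$. Then one hasg
$$\begin{aligned}
0 &\le \frac{1}{a_n^2}\sum_{k=1}^{n}\int_{|x|<a_n} x^2\,dV_k(x+b_k)\\
&\le \frac{1}{a_n^2}\sum_{k=1}^{n}\int_{|x|<2a_n}(x-b_k)^2\,dV_k(x)\\
&= \frac{1}{a_n^2}\sum_{k=1}^{n}\left\{\int_{|x|<2a_n} x^2\,dV_k(x) - 2b_k\int_{|x|<2a_n}(x-b_k)\,dV_k(x) - b_k^2\int_{|x|<2a_n} dV_k(x)\right\}\\
&\le \frac{1}{a_n^2}\sum_{k=1}^{n}\int_{|x|<a_n} x^2\,dV_k(x) + 4\sum_{k=1}^{n}\int_{|x|\ge a_n} dV_k(x)\\
&\quad + \frac{1}{a_n}\sum_{k=1}^{n}\left|\int_{|x|<a_n}(x-b_k)\,dV_k(x)\right| + 2\sum_{k=1}^{n}\int_{|x|\ge a_n} dV_k(x)
\end{aligned}$$
¶ 200 ¶ and, according to (2), (1) and (21), all quantities on the right-hand side
converge to zero. This proves (24). For the proof of (25) it is enough to note
that one has, because of (27), for sufficiently large n
$$\begin{aligned}
\frac{1}{a_n}\left|\,\sum_{k=1}^{n}\left|\int_{|x|<a_n} x\,dV_k(x+b_k)\right| - \sum_{k=1}^{n}\left|\int_{|x|<a_n}(x-b_k)\,dV_k(x)\right|\,\right|
&\le \frac{2}{a_n}\sum_{k=1}^{n}\int_{\frac12 a_n\le|x|\le a_n}(|x|+|b_k|)\,dV_k(x)\\
&\le 4\sum_{k=1}^{n}\int_{|x|\ge\frac12 a_n} dV_k(x) = o(1),
\end{aligned}$$
q.e.d.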

g In the second estimate the original has the misprint $2/a_n^2$ instead of $1/a_n^2$.



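6. With these preparations it is easy to prove that the conditions of the
theorem on p. 192 f. are sufficient. In fact, one has for $|t| < T$
$$\begin{aligned}
\sum_{k=1}^{n}\left|\int_{-\infty}^{+\infty}\left(1-e^{ixt/a_n}\right)dV_k(x+b_k)\right|
&\le 2\sum_{k=1}^{n}\int_{|x|\ge a_n} dV_k(x+b_k) + \sum_{k=1}^{n}\left|\int_{|x|<a_n}\frac{xt}{a_n}\,dV_k(x+b_k)\right|\\
&\quad + \sum_{k=1}^{n}\left|\int_{|x|<a_n}\left(1+\frac{ixt}{a_n}-e^{ixt/a_n}\right)dV_k(x+b_k)\right|\\
&\le 2\sum_{k=1}^{n}\int_{|x|\ge a_n} dV_k(x+b_k) + \frac{T}{a_n}\sum_{k=1}^{n}\left|\int_{|x|<a_n} x\,dV_k(x+b_k)\right|\\
&\quad + \frac{T^2}{2a_n^2}\sum_{k=1}^{n}\int_{|x|<a_n} x^2\,dV_k(x+b_k),
\end{aligned}$$
and the right-hand side converges, by the lemma from no. 5, p. 199, to zero.
This entails immediately that, uniformly on each finite interval,
$$w_n(t) = \prod_{k=1}^{n}\left\{1-\int_{-\infty}^{+\infty}\left(1-e^{ixt/a_n}\right)dV_k(x+b_k)\right\} \to 1,$$
and by no. 2, p. 195, this is equivalent to the assertion.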

7. Let us, finally, add a few words on the frequently discussed St. Petersburg
Paradox. This consists in the following erroneous conclusion from
the law of large numbers: In a gamble where the mathematical expectation
of the gains is infinite, one may pay arbitrarily high stakes ¶ 201 ¶ without any
disadvantage provided that the game is infinitely often repeated. In mathematical
terms it is claimed that (at least if $V_n(x) = V(x)$ and $V(0) = 0$),
whenever $\int_{-\infty}^{+\infty} x\,dV_n(x) = +\infty$, for every sequence of real numbers $\{c_n\}$, the
probability of the relation
$$\frac{1}{n}\sum_{k=1}^{n}(X_k-c_k) > 0$$

tends to 1 as n increases. Among others, P. Lévy10 has demonstrated by a


particular example that this is absurd from a mathematical point of view. We
may now, in general, assert that the above determined constants bn (cf. (3)),
or any other in the sense of (5) equivalent constants, always represent the
fairest possible stakes in the following sense. If one may choose an = n, then
10 P. Lévy, op. cit.



there is an almost sure compensation (and this may indeed happen, even if
$V_n(x) = V(x)$ and if the mean values are infinite). Otherwise, positive
deviations are almost certain; but if one bets arbitrarily high stakes $c_n$ which
satisfy, for example,
$$\frac{1}{a_n}\sum_{k=1}^{n}(c_k-b_k) > \alpha > 0,$$
then an arbitrarily high loss is almost certain; more precisely: The probability
of the relation
$$\frac{1}{n}\sum_{k=1}^{n}(X_k-c_k) < -A$$

tends, for every positive A, to one as n increases.
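
For the classical St. Petersburg game the quantities entering the theorem can be computed in closed form. The sketch below assumes the usual formulation, in which the gain is $2^j$ with probability $2^{-j}$ ($j = 1, 2, \dots$), and the normalization $a_n = n\log_2 n$; both are assumptions made for this illustration and are not stated in the text. It evaluates the sums appearing in (19) and (20) with $\eta = 1$ and the left-hand side of (8), all for identically distributed $X_k$.

```python
import math

def count_below(a):
    """Number of indices j >= 1 with 2**j < a."""
    j = 0
    while 2 ** (j + 1) < a:
        j += 1
    return j

def truncated_mean(a):
    """Sum of 2**j * 2**(-j) over 2**j < a: every admissible j contributes 1."""
    return count_below(a)

def truncated_second_moment(a):
    """Sum of (2**j)**2 * 2**(-j) = 2**j over 2**j < a."""
    return sum(2 ** j for j in range(1, count_below(a) + 1))

def tail(a):
    """P(X > a): sum of 2**(-j) over all j with 2**j > a."""
    j = 1
    while 2 ** j <= a:
        j += 1
    return 2.0 ** (-(j - 1))

def report(n):
    a_n = n * math.log2(n)                               # assumed normalization
    cond1 = n * tail(a_n)                                # sum in (19) with eta = 1
    cond2 = n * truncated_second_moment(a_n) / a_n ** 2  # sum in (20) with eta = 1
    ratio8 = n * truncated_mean(a_n) / a_n               # left side of (8)
    return cond1, cond2, ratio8

for n in (2 ** 10, 2 ** 20, 2 ** 40):
    print(n, report(n))
```

Under these assumptions the first two quantities decrease like $1/\log_2 n$, while the third approaches 1 only slowly, so the fair stakes for n games grow like $n\log_2 n$ rather than like n.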

Lund, February 1937

(Received 2–March–1937)



© Springer International Publishing Switzerland 2015 441
R.L. Schilling et al. (eds.), Selected Papers I,
442 Acta Biotheoretica 5 (1939) 11–39
[Feller 1939a] — Selected Works of W. Feller, Volume 1 443
Translation of [Feller 1939a]

¶ 11 ¶ The Foundations of Volterra’s Theory of the Struggle for Life in a
Probabilistic Treatment
by
Willy Feller
(Upplandsgatan 70, Stockholm, Sweden)
(Received 13–IV–1939)

1. Introduction. The following account is an attempt to explain how


Volterra’s theory can be embedded into the framework of modern probability
theory; and how one can get a notion of the influence that will be exerted on
average by the necessarily expectable statistical fluctuations on the quantities
under investigation. Here we restrict ourselves to so-called events without
aftereffect (in contrast to the “actions héréditaires”, cf. Volterra, 1931), i.e. to
events whose infinitesimal change over time depends only on the current state,
but not on the history of how this state was reached. Volterra’s theory
describes such events analytically in terms of systems of ordinary differential
equations. In order to derive them one uses (cf. Volterra, 1931, Risser, 1932,
or Kostitzin, 1937) ideas which are essentially of statistical nature and relate
to the average probability of certain events, such as birth or death of members
of a species, or encounters of members of a predator and a prey species. Since
one always deals with large populations, one may assume, however, that the
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Ellen Baake and Anton Wakolbinger. The symbol ¶ indicates a page break
in the original text, and the original pagination is shown in the margin. Footnotes indexed
by lowercase Roman letters contain editorial comments.

expected statistical fluctuations are relatively insignificant; therefore, the pro-
cess is most simply described in a purely deterministic way: For the derivation
of the required differential equations one essentially assumes that the speed at
which [the size of] a certain population changes at any moment is proportional
to the average total probability of the (positive or negative) increase of the
members of the population.
¶ 12 ¶ This is, of course, exactly in accordance with the usual method in, say,
the theory of radioactive decay where the assumption that every single atom
has a constant decay probability λ leads to the differential equation

$$\frac{dm}{dt} = -\lambda m \tag{1}$$
for the total mass m = m(t) of the material not yet decayed. In theory, the
numbers effectively observed will somewhat fluctuate, but these fluctuations
are extremely small and take place around the mean value determined by
(1). The fact that this is true, i.e. that equation (1) and the corresponding
equations in Volterra’s theory in this sense yield the expected values (math-
ematical expectations, mean values) of the relevant quantities, seems to be
mostly regarded as a matter of course which requires no further investiga-
tions. A detailed examination, however, yields the interesting result that this
is not always the case. Although (1) provides the relevant expected values, al-
ready the most basic biological growth processes, for example those which are
described in Volterra’s theory by the Pearl–Verhulst or “logistic” differential
equation

$$\frac{dm}{dt} = \varepsilon m - \delta m^2, \tag{2}$$

reveal that these equations hold only approximately for the relevant expected
values (although with sufficient precision for any practical purpose). The
theoretical expectations, around which the quantities actually observed (under
invariable basic assumptions) have to fluctuate statistically, are always a bit
smaller in reality than one would get from equation (2). A similar statement
holds for the system of equations

$$\begin{aligned}
\frac{dN_1}{dt} &= \varepsilon_1 N_1 + \delta_1 N_1 N_2,\\
\frac{dN_2}{dt} &= \varepsilon_2 N_2 - \delta_2 N_1 N_2,
\end{aligned} \tag{3}$$
which, in Volterra’s theory, yields just the first approximation for the descrip-
tion of the struggle for a joint habitat between, say, a predator and a prey
species of size $N_1(t)$ and $N_2(t)$, respectively. The fundamental difference
between the phenomena described by (2) and (3) ¶ 13 ¶ and simple growth models
of type (1) consists in the fact that, in the latter case, single atoms act, in the
sense of probability theory, completely independently of each other, while the
472 Volterra’s Theory of the Struggle for Life


possibility of interactions between the different individuals is the hallmark of
Volterra’s theory.
This very remark suggests that one could consider the struggle for life as
a stochastic process, i.e. as a statistical phenomenon depending also on “ran-
domness”. From a biological perspective this seems all the more likely, as
several problems from genetics lead to the theory of stochastic processes and
have been investigated using such methods (Kolmogoroff, 1935; Kolmogoroff,
Petrovsky & Piskounoff, 1937). The general theory deals with phenomena
like radioactive decay, telephone calls, transport of stones in a riverbed,
various activities of insurance companies or, to mention an example from a
completely different field, Brownian motion and diffusion. On a case by case
basis such processes have been investigated mathematically for some time, but
the foundations of a general and systematic theory have been laid only in a rel-
atively recent paper of the well known Russian mathematician A. Kolmogoroff
(1931).
For the sake of simplicity, and in order to expose some points of princi-
ple, we will restrict ourselves to considering the growth of a single popula-
tion; only at the end we will briefly mention how our considerations can be
adapted to the most general phenomena without aftereffect in the theory of
the struggle for life. It turns out that the biological processes in which we are
interested here can be mathematically described in two essentially different
ways, depending on the interpretation as “purely discontinuous” or “purely
continuous” processes (Feller, 1936). The first method is more primitive and
yields only a rough approximation of reality; in turn, it requires only relatively
simple mathematical techniques as it leads to systems of ordinary differential
equations. In order to characterize it briefly, one could call this description
atomistic. It is based on the assumption that the size of the population un-
der investigation changes discontinuously, i.e. that it remains exactly constant
over short intervals of time, suddenly increasing or decreasing every now and
then by one unit. The probability of such a jump depends of course, at any
time, on the size of the population, its age structure, the environment etc.,
but it is always assumed to be a known quantity. ¶ 14 ¶ The first four examples of
stochastic processes mentioned above belong to the class of purely discontinuous
processes.
From a biological point of view it is much more natural also to consider
the whole life energy of the population and to treat the size of the population
as a variable quantity which is constantly, but continuously, changing. This
is also in better accordance with Volterra’s approach for the struggle between
several species: The benefit, which the predators gain when a member of their
group encounters a member of a prey species, is accounted for as increment
of the predator species; this means that the increments are essentially not
restricted to be integer multiples of a unit. This approach uses a continuous
stochastic process and, analytically, one is led to a partial differential equation
of the type of the heat equation. In the above mentioned genetical papers by
Kolmogoroff (1935) and Kolmogoroff, Petrovsky and Piskounoff (1937) the

[Feller 1939a] Translation — Selected Works of W. Feller, Volume 1 473


theory of continuous stochastic processes plays a role, and it is here where
the theory of diffusion belongs. The solution of the diffusion equation which
describes the spatial concentration of a diffusing substance may, according to
Einstein’s approach, also be interpreted as probability density to find some –
in a certain sense randomly moving – particle at exactly that position. An
essential difference between the differential equations – which we will
encounter in the sequel – and the usual diffusion equation is, however, that
the former are, in general, singular, since the coefficients will vanish at the
boundary of the domain under consideration; this will also lead to particular
mathematical difficulties.
Let us also mention that one can definitely combine the atomistic and con-
tinuous approaches; this results in the description of a growth process of a
population whose size is incessantly changing in a continuous way and, some-
times, through jumps. Biologically, such an approach is most natural and most
flexible. Analytically, it leads to a “mixed stochastic process” and has to be
treated by integro-differential equations (Feller, 1936). Since the present essay
addresses mainly the fundamental aspects, it seemed indeed to be sufficient to
treat the two analytically simpler methods. Also for the reader’s convenience,
we begin with the rougher, atomistic treatment, which allows us to introduce the
necessary definitions in a most simple way. It is ¶ 15 ¶ thus our primary task to
derive subsequently, starting from both viewpoints mentioned earlier, the
differential equations for the probability that the population under investigation
has a certain size at a given moment. We will see which additional assumptions
allow us to determine this probability uniquely and how we can obtain estimates
for the probable size of the statistical fluctuations in the size of the population,
as well as for the probability of extinction.
The analytic tools that are needed in the probabilistic treatment of the
struggle for life are, naturally, much more difficult than those of Volterra’s
approach. The practical calculations of even the simplest case will (cf. No. 8)
lead to Bessel functions, and in the more general cases there is not yet a
feasible method for the construction of the solutions. Of course, we are far from
proposing a new practical method here, rather, this is solely a fundamental
question in the biological approach which underlies the theory of the struggle
for life. Moreover, let us once again point out that Volterra’s method also
yields approximate values for the statistical expectations of the size of the
population under investigation; therefore, one has to arrive qualitatively at
the same results. I believe that one cannot expect more than just qualitative
results, given the state of the theory now and in the near future1 .

2. Before we describe the general growth process of a population as a dis-


continuous stochastic process let us, for the reader’s convenience, explain the
methods by means of the most trivial example, namely the theory of radioac-

1 For a purely qualitative treatment of the growth problem of several populations which

fight each other see, in particular, Kolmogoroff (1936).



tive decay (which, incidentally, coincides with the most elementary approach
for the extinction of a homogeneous population under certain assumptions).
We assume that at time t = 0 there are N atoms (individuals) and each
atom (individual) which is still present at time t > 0 has probability λ dt to
decay (die) in the time interval (t, t + dt); here, λ can be seen as a constant.
We then write $P_n(t)$ for the probability that at ¶ 16 ¶ time t > 0 there exist
exactly n atoms, and consequently, 0 ≤ n ≤ N holds. Since at time t = 0 there
existed exactly N atoms, we have the initial conditions

$$P_n(0) = \begin{cases} 0 & \text{if } 0 \le n \le N-1,\\ 1 & \text{if } n = N. \end{cases} \tag{4}$$

In order to get to a differential equation for Pn (t) we can use a technique


which aligns itself with the method described by Kolmogoroff. If at time t there
are exactly n atoms, then the probability that in the time interval (t, t + dt)
at least one of them will decay obviously2 is nλ dt + o(dt) where o(dt) is a
quantity which is of a smaller magnitude than dt. The probability that in
any infinitesimal time interval more than one atom decays is, of course, of
the magnitude (dt)2 , and can be neglected. The event that at time t + dt
there are exactly n atoms may occur in the following two mutually exclusive
ways: (i) At time t there were also exactly n atoms and over the time interval
(t, t + dt) none of them decays. The probability for this is, according to our
discussion, Pn (t) · {1 − λn dt + o(dt)}. – (ii). At time t there were exactly n + 1
atoms and in (t, t + dt) exactly one of them decayed. The probability for this is
Pn+1 (t) · {λ(n + 1) dt + o(dt)}. All remaining ways to arrive at this event have
a probability of a smaller order than dt.
Altogether we thus have for n < N

$$P_n(t+dt) = P_n(t)\cdot\{1-\lambda n\,dt\} + P_{n+1}(t)\cdot\{\lambda(n+1)\,dt\} + o(dt),$$

and this obviously yields in the limit as dt → 0

$$P_n'(t) = -\lambda nP_n(t) + \lambda(n+1)P_{n+1}(t) \qquad (0 \le n < N). \tag{5}$$

In particular, if n = N one obtains in the same way

$$P_N'(t) = -\lambda NP_N(t). \tag{5a}$$

The system of N differential equations (5) and (5a) together with the initial
conditions (4) uniquely determines the probabilities Pn (t) that we have been

2 The exact probability is, of course,
$$1-(1-\lambda\,dt)^n = n\lambda\,dt - \binom{n}{2}\lambda^2(dt)^2 + \dots$$
since $(1-\lambda\,dt)^n$ is the probability that over the observed time interval none of the n existing
atoms decays.



looking for. In the terminology of probability theory ¶ 17 ¶ this is a “simple Markoff
chain”.3 The solutions are determined recursively as
$$P_n(t) = \binom{N}{n}e^{-\lambda Nt}\left(e^{\lambda t}-1\right)^{N-n}, \tag{6}$$
which can also be verified directly.
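
One way to carry out this verification numerically is sketched below: the system (5), (5a) is integrated from the initial condition (4) with a classical Runge-Kutta scheme and compared with the closed form (6). The parameter values and the choice of integrator are assumptions made for illustration; they are not part of the original paper.

```python
import math

def death_rhs(P, lam):
    """Right-hand side of (5) and (5a); P[n] is the probability of n atoms."""
    N = len(P) - 1
    dP = [0.0] * (N + 1)
    for n in range(N + 1):
        dP[n] = -lam * n * P[n]
        if n < N:
            dP[n] += lam * (n + 1) * P[n + 1]
    return dP

def rk4(rhs, P, t_end, steps, *args):
    """Classical fourth-order Runge-Kutta integration from time 0 to t_end."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(P, *args)
        k2 = rhs([p + 0.5 * h * k for p, k in zip(P, k1)], *args)
        k3 = rhs([p + 0.5 * h * k for p, k in zip(P, k2)], *args)
        k4 = rhs([p + h * k for p, k in zip(P, k3)], *args)
        P = [p + h / 6.0 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P

def formula_6(n, N, lam, t):
    """Closed form (6)."""
    return (math.comb(N, n) * math.exp(-lam * N * t)
            * (math.exp(lam * t) - 1.0) ** (N - n))

N, lam, t = 8, 0.7, 1.5          # hypothetical parameter values
P0 = [0.0] * N + [1.0]           # initial condition (4)
P_num = rk4(death_rhs, P0, t, 2000, lam)
P_exact = [formula_6(n, N, lam, t) for n in range(N + 1)]
max_err = max(abs(a - b) for a, b in zip(P_num, P_exact))
print(f"largest difference between the numerical solution and (6): {max_err:.2e}")
```

Formula (6) is the binomial distribution with N trials and survival probability $e^{-\lambda t}$, which reflects the independence of the individual atoms.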
In the deterministic approach we thus obtain for the quantity m(t) of the
matter in stock the differential equation (1), hence $m = Ne^{-\lambda t}$. It is also very
easy to see that the system (5) for the statistical expectation M = M(t) of
the existing number of atoms yields the same value $M = Ne^{-\lambda t}$, and that
the fluctuations around this expected value are relatively small, i.e. all $P_n(t)$
whose index n is significantly different from M(t) are very small.
The expected value M(t) at time t is defined as the sum of all possible
numbers of atoms at time t, each multiplied with the corresponding probability,
i.e.

$$M(t) = \sum_n nP_n(t). \tag{7}$$

If we repeat the experiment very often, i.e. if we frequently observe among


the N initial atoms the number of those atoms which have not yet decayed by
time t, then the arithmetic mean of these numbers will be close to M (t) with
overwhelming likelihood. In order to get a measure for the likely deviations of
the observed numbers from the theoretical expectation M (t), we introduce the
quadratic dispersion S(t), i.e. the expectation of the square of the deviations
of the observed numbers from the expected value, that is

$$S(t) = \sum_n [n-M(t)]^2\cdot P_n(t). \tag{8}$$

A very small S(t) clearly indicates, according to (8), that only those Pn (t)
are significantly different from zero whose index is close to M (t), i.e. that we
may practically expect only numbers which do not significantly differ from the
expectation. Indeed, the limiting case S(t) = 0 means that $P_M(t) = 1$ and that
all other $P_n(t)$ vanish, i.e. only the expectation M(t) will be realized. In general,
one can easily show using (8) (the so-called inequality of Tschebytscheffa)
that the probability of the event that the observed number n differs from the
expectation M(t) by more than $\sqrt{\eta S(t)}$ is always smaller than ¶ 18 ¶ $1/\eta$, i.e. that
the probable deviations from the theoretical expectation are at most of the
order $\sqrt{S(t)}$. This estimate holds, quite generally, for any probability distribution;
for particular distributions like (5) and all others which appear later
on, one can very easily improve the estimate significantly, but we may as well
manage with this rough estimate.
For many purposes the quantities M (t) and S(t) are a sufficient quantita-
tive description of the probability distribution in question. For more precise
3 For a comprehensive review of the modern theory of Markoff chains cf. Fréchet (1938).
a Chebychev



estimates we introduce the higher “moments” of the given probability distribution
which are defined by

$$M_k(t) = \sum_n n^k P_n(t) \qquad (k = 1, 2, \dots). \tag{9}$$

In particular, we have $M_1 = M(t)$ and we get using (8)

$$\begin{aligned}
S(t) &= \sum n^2P_n(t) - 2M(t)\cdot\sum nP_n(t) + M^2(t)\cdot\sum P_n(t)\\
&= M_2(t) - 2M^2(t) + M^2(t)\cdot\sum P_n(t)
\end{aligned}$$

or, since by the definition of a probability always $\sum P_n(t) = 1$,
$$S(t) = M_2(t) - M^2(t). \tag{10}$$
In our case one could calculate all moments directly from the probability
distribution (6), but we prefer to derive already here the moments directly from
the differential equation (5), using a more far-reaching method that we will
apply to the cases appearing later on. First, we see from (5) by summation that
$\sum P_n'(t) = 0$, thus $\sum P_n(t) = \text{const.}$ and, with regard to the initial condition
(4), $\sum P_n(t) = 1$ as it should be according to the definition of a probability.
We can now generalize this method. In order to calculate $M_k(t)$, we multiply
(5) with $n^k$, sum over n, and obtainb
$$\begin{aligned}
M_k'(t) &= -\lambda\sum n^{k+1}P_n(t) + \lambda\sum n^k(n+1)P_{n+1}(t)\\
&= -\lambda M_{k+1} + \lambda\left\{M_{k+1} - \binom{k}{1}M_k + \binom{k}{2}M_{k-1} - \dots + (-1)^k M_1\right\}\\
&= -\lambda\left\{kM_k - \binom{k}{2}M_{k-1} + \dots + (-1)^{k-1}M_1\right\}.
\end{aligned} \tag{11}$$
In particular, one gets for k = 1
$$M'(t) = -\lambda M(t), \tag{12}$$
¶ 19 ¶ and since one has by (4) M(0) = N, this implies

$$M(t) = Ne^{-\lambda t}, \tag{13}$$

in accordance with the differential equation (1) for the mass.
In order to calculate the dispersion S(t) we use k = 2 in (11) and obtain
$$M_2'(t) = -2\lambda\cdot M_2(t) + \lambda\cdot M(t), \tag{14}$$
and with the help of (10) and (12)
$$\begin{aligned}
S'(t) &= M_2'(t) - 2M(t)M'(t)\\
&= -2\lambda\cdot M_2(t) + \lambda\cdot M(t) + 2\lambda\cdot M^2(t) = -2\lambda\cdot S(t) + \lambda\cdot M(t).
\end{aligned}$$
b The following equation is No. (11), but the label is missing in the original paper –



Inserting for M(t) the result from (13), one finally obtains
$$S(t) = N\cdot e^{-\lambda t}\left(1-e^{-\lambda t}\right). \tag{15}$$

Thus we have S(t) < M(t) and, according to the discussion above, the probable
deviations from the expectation are, therefore, at most of order $\sqrt{M(t)}$, i.e.
they are relatively insignificant for large M (t). Hence, the number of atoms
actually observed will be, with overwhelming likelihood, close to the value
given by the deterministic differential equation (1).
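
The formulas (13) and (15) can be cross-checked against the definitions (7) and (8) by summing directly over the distribution (6). The following is a minimal sketch with hypothetical parameter values; it also prints the ratio $\sqrt{S}/M$ that measures the relative size of the fluctuations.

```python
import math

def pmf(n, N, lam, t):
    """Distribution (6) of the number of atoms still present at time t."""
    return (math.comb(N, n) * math.exp(-lam * N * t)
            * (math.exp(lam * t) - 1.0) ** (N - n))

def mean_and_dispersion(N, lam, t):
    probs = [pmf(n, N, lam, t) for n in range(N + 1)]
    M = sum(n * p for n, p in enumerate(probs))              # definition (7)
    S = sum((n - M) ** 2 * p for n, p in enumerate(probs))   # definition (8)
    return M, S

N, lam, t = 50, 0.3, 2.0                     # hypothetical parameter values
M, S = mean_and_dispersion(N, lam, t)
M_13 = N * math.exp(-lam * t)                                # formula (13)
S_15 = N * math.exp(-lam * t) * (1.0 - math.exp(-lam * t))   # formula (15)
print(f"M = {M:.6f}, formula (13) gives {M_13:.6f}")
print(f"S = {S:.6f}, formula (15) gives {S_15:.6f}")
print(f"sqrt(S)/M = {math.sqrt(S) / M:.4f}")
```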

3. No further explanations are needed in order to adapt these considerations
to the case that every individual has the probability λ dt to split, in
each time interval of length dt, into two individuals (instead of decaying). The
place of the system (5) is now taken by

$$\begin{aligned}
P_n'(t) &= -\lambda\cdot nP_n(t) + \lambda(n-1)P_{n-1}(t) \quad\text{for } n > N,\\
P_N'(t) &= -\lambda\cdot NP_N(t),
\end{aligned} \tag{16}$$

again with the initial values (4). Although the system (16) is a system of
infinitely many differential equations, this does not present any difficulties.
The solution is given by

$$P_n(t) = \binom{n-1}{n-N}e^{-\lambda Nt}\left(1-e^{-\lambda t}\right)^{n-N} \qquad (n \ge N \ge 1), \tag{17}$$

and one easily verifies that this indeed is a probability distribution. For the
expectation and the dispersion (cf. the definitions (7), (8) and (10)) we obtain,
as before,

$$\begin{aligned}
M'(t) &= \lambda\cdot M(t),\\
S'(t) &= 2\lambda\cdot S(t) + \lambda\cdot M(t),
\end{aligned} \tag{18}$$

¶ 20 ¶ or, explicitly,

$$\begin{aligned}
M(t) &= N\cdot e^{\lambda t},\\
S(t) &= N\cdot e^{2\lambda t}\cdot\left(1-e^{-\lambda t}\right).
\end{aligned} \tag{18a}$$
Hence, one now has $\sqrt{S(t)} < M(t)/\sqrt{N}$, i.e. for large initial values N one will,
in practice, observe only numbers which differ only little from the expectation
M(t), namely at most by the order of $M/\sqrt{N}$. The ratio of $\sqrt{S}$ and M is,
however, much less favourable than before.
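
The statements about (17) and (18a) can be checked by direct summation. The sketch below truncates the infinite sums at a hypothetical cut-off `n_max`, beyond which the terms are negligible for the chosen parameter values; the parameters and the cut-off are assumptions made for illustration.

```python
import math

def yule_pmf(n, N, lam, t):
    """Formula (17): probability of n individuals at time t, starting from N."""
    return (math.comb(n - 1, n - N) * math.exp(-lam * N * t)
            * (1.0 - math.exp(-lam * t)) ** (n - N))

N, lam, t = 5, 1.0, 1.0          # hypothetical parameter values
n_max = 400                      # truncation point of the infinite sums (assumption)
probs = {n: yule_pmf(n, N, lam, t) for n in range(N, n_max + 1)}

total = sum(probs.values())
M = sum(n * p for n, p in probs.items())
S = sum((n - M) ** 2 * p for n, p in probs.items())
M_18a = N * math.exp(lam * t)
S_18a = N * math.exp(2 * lam * t) * (1.0 - math.exp(-lam * t))

print(f"sum of probabilities = {total:.12f}")
print(f"M = {M:.6f}, (18a) gives {M_18a:.6f}")
print(f"S = {S:.6f}, (18a) gives {S_18a:.6f}")
print(f"sqrt(S) = {math.sqrt(S):.4f} < M/sqrt(N) = {M / math.sqrt(N):.4f}")
```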
The system (16) corresponds to a growth process that would, in the deterministic
setting, be described by the first one of the differential equations in
although it is used for cross-referencing.



(18). In this context, it is now customary to describe more general growth pro-
cesses by the Pearl–Verhulst differential equation (2). The reasoning leading
to this can be sketched as follows. If the population under investigation lived
in an unchanged environment, then each individual would have, as assumed
earlier, a constant average probability λ dt to reproduce during (t, t + dt) (this
refers, of course, to the excess of the birth probability over the death probabil-
ity). Now the total population constrains the reproduction probability of each
member since the habitat is limited; this causes us to specify the reproduction
probability of a single individual during (t, t + dt) in second approximation as
the value (λ − γm) dt where m = m(t) denotes the current size of the popula-
tion; here it is assumed that λ > 0 and 0 < γ < λ/N , where N = m(0) denotes
the initial size of the population.
For the stochastic treatment it is essential that the total probability for
the reproduction of a single individual (λ − γm) dt consists of two components,
since in the differential equations the decrease of the population and its increase
are expressed in different ways. Let us, for the moment, neglect this, i.e. we first
consider populations where the individuals can only give birth, and only with
the average probabilities just mentioned (such populations may only turn up in
certain laboratory experiments). Then the above consideration yields, instead
of (16), the following system of differential equations for the probabilities Pn (t)
that at time t there are exactly n individuals given that at time t = 0 there
were N individuals:

$$P_n'(t) = -\{\lambda n - \gamma n^2\} \cdot P_n(t) + \{\lambda(n-1) - \gamma(n-1)^2\} \cdot P_{n-1}(t). \tag{19}$$

¶ 21 ¶ Of course, only those indices satisfying N ≤ n ≤ λ/γ are admissible (all
others must be set to 0), and one again has the initial condition (4).
The most general case in this direction would obviously be the system of
equations

$$P_n'(t) = -p_n P_n(t) + p_{n-1} P_{n-1}(t), \tag{20}$$

which describes the increase of a population where each individual has, at
any moment where the size of the population is exactly n, the average probability
$p_n/n$ to split in the following time interval of length dt. Provided that
the population does not change biologically, the pn are non-negative constants.
One can, however, account for a variation in the age structure of the popula-
tion, if one assumes that the pn , i.e. the reproduction intensity, is a function
of time; we will not discuss this here.
The system of differential equations (20) is infinite. As far as the existence
of solutions is concerned, we can calculate them directly by recursion; but it
turns out that the solution does, in general, not have the properties which
are necessary for a probability distribution (in particular $\sum_n P_n(t) = 1$ is not
always satisfied). One can show⁴ that the solutions of (20) define a probability
4 The system (20) is a simple special case of much more general systems of differential

[Feller 1939a] Translation — Selected Works of W. Feller, Volume 1 479



distribution if, and only if, either only finitely many $p_n \neq 0$ or if the sum $\sum_n 1/p_n$
diverges. Under this condition, the solution is given by

$$P_n(t) = (-1)^{n-N}\, p_N p_{N+1} \cdots p_{n-1} \sum_{\kappa=N}^{n} \frac{e^{-p_\kappa t}}{(p_\kappa - p_N)(p_\kappa - p_{N+1}) \cdots (p_\kappa - p_{\kappa-1})(p_\kappa - p_{\kappa+1}) \cdots (p_\kappa - p_n)}, \tag{21}$$

(n ≥ N), and this includes, in particular, the solution of (19). ¶ 22 ¶
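Formula (21) can be evaluated directly whenever the rates involved are pairwise distinct. The sketch below is an illustration under that assumption; the function names and the parameter values (N = 2, λ = 1, t = 0.5) are hypothetical. It specialises (21) to the linear rates p_n = λn and compares the result with the closed form (17).

```python
import math

def birth_solution(n, N, p, t):
    """P_n(t) from (21); p(k) must give pairwise distinct rates p_N, ..., p_n."""
    prefactor = (-1) ** (n - N)
    for k in range(N, n):
        prefactor *= p(k)                      # p_N * p_{N+1} * ... * p_{n-1}
    total = 0.0
    for kappa in range(N, n + 1):
        denom = 1.0
        for j in range(N, n + 1):
            if j != kappa:
                denom *= p(kappa) - p(j)       # product of (p_kappa - p_j), j != kappa
        total += math.exp(-p(kappa) * t) / denom
    return prefactor * total

def linear_case(n, N, lam, t):
    """Closed form (17), valid for the linear rates p_n = lam * n."""
    q = math.exp(-lam * t)
    return math.comb(n - 1, n - N) * q**N * (1 - q) ** (n - N)

# Illustrative values (assumptions, not from the text).
N, lam, t = 2, 1.0, 0.5
rates = lambda k: lam * k
max_gap = max(
    abs(birth_solution(n, N, rates, t) - linear_case(n, N, lam, t))
    for n in range(N, 9)
)
print(max_gap)
```

The sum in (21) alternates in sign, so for large n the direct evaluation loses accuracy through cancellation; the comparison is therefore restricted to small n.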


Again, we prefer to calculate the expectation M (t) and the dispersion S(t)
directly from the differential equation, rather than from the explicit form of
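the solution. Using (19) one obtains (cf. (7) and (9))

$$\begin{aligned}
M'(t) &= \sum_n n P_n'(t) \\
&= -\lambda \sum_n n^2 P_n(t) + \lambda \sum_n n(n-1) P_{n-1}(t) + \gamma \sum_n n^3 P_n(t) - \gamma \sum_n n(n-1)^2 P_{n-1}(t) \\
&= \lambda \sum_n (n-1) P_{n-1}(t) - \gamma \sum_n (n-1)^2 P_{n-1}(t) \\
&= \lambda M(t) - \gamma M_2(t).
\end{aligned}$$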

Although we have set out from the same ansatz which leads to the equation
(2) in the deterministic approach, we obtain here a different equation for the
expectation M (t), namely

$$M'(t) = \lambda \cdot M(t) - \gamma \cdot M_2(t). \tag{22}$$

In order to compare this value with the solution of (2), we note that

$$M^2(t) \le M_2(t), \tag{23}$$

and this is true for any probability distribution; this follows at once from (10)
since, because of the definition (8), one always has S(t) ≥ 0. By (22) and (23)
one has

$$M'(t) \le \lambda \cdot M(t) - \gamma \cdot M^2(t), \tag{24}$$

and this shows that M (t) is always smaller than the solution of (2) with the
same initial value M (0) = N . Under the present assumptions the sizes of the
population to be observed will statistically fluctuate around a quantity which
is slightly smaller than what would correspond to the differential equation (2).

equations from the theory of Markoff chains. Under the assumption that all coefficients are
bounded, I have (Feller, 1936) investigated the most general of these systems and shown

that they have solutions which satisfy all requirements of the theory (in particular, they
are positive and $\sum_n P_n(t) = 1$). If the coefficients are unbounded, this theorem fails to
hold. Although in the case of (20) there are still nonnegative solutions, which can easily be
computed by a recursion, one has $\sum_n P_n(t) < 1$ whenever $\sum_n 1/p_n$ converges; thus the $P_n(t)$
are, in this case, not a probability distribution. I intend to show elsewhere that, also in the
more general setting, one can find a necessary and sufficient condition which is completely
analogous to the one given in the text.

Thus, under the present assumptions the statistical fluctuations result in a
slow-down of the growth. We are going to see in what follows that such a
slow-down is the rule also under more general assumptions.
Quantitatively, the difference between the solutions of (24) and (2) will
only be small. This is due to the fact that one can show, in a similar way
as above, that the dispersion of the probability distribution given by (19) is
relatively small; if S(t) = 0, both differential equations (2) and (24) would
coincide because of (10). With the help of the so-called moment inequalities
of probability theory, this consideration can be made more precise, however,
we may content ourselves with the qualitative result.
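The slow-down expressed by (24) can be observed numerically. The sketch below integrates the finite system (19) with a classical Runge-Kutta scheme and compares the resulting expectation with the deterministic logistic curve. It assumes that equation (2) has the form m' = λm − γm², which is how the text uses it here, and the parameter values (N = 2, λ = 1, γ = 0.05, t = 3) are illustrative assumptions.

```python
import math

def logistic_birth_mean(N, lam, gamma, t_end, dt=1e-3):
    """Integrate system (19) with classical RK4; return (sum of P_n, M(t_end))."""
    n_top = int(math.floor(lam / gamma + 1e-9))       # largest admissible state
    states = list(range(N, n_top + 1))
    # Transition rate out of state n: lam*n - gamma*n^2 (clipped at 0 for safety).
    rate = [max(lam * n - gamma * n * n, 0.0) for n in states]

    def rhs(P):
        dP = [-rate[0] * P[0]]
        for i in range(1, len(P)):
            dP.append(-rate[i] * P[i] + rate[i - 1] * P[i - 1])
        return dP

    P = [1.0] + [0.0] * (len(states) - 1)             # exactly N individuals at t = 0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(P)
        k2 = rhs([p + 0.5 * dt * k for p, k in zip(P, k1)])
        k3 = rhs([p + 0.5 * dt * k for p, k in zip(P, k2)])
        k4 = rhs([p + dt * k for p, k in zip(P, k3)])
        P = [p + dt * (a + 2 * b + 2 * c + d) / 6
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return sum(P), sum(n * p for n, p in zip(states, P))

def logistic_deterministic(N, lam, gamma, t):
    """Solution of m' = lam*m - gamma*m^2 with m(0) = N (assumed form of (2))."""
    return lam / (gamma + (lam / N - gamma) * math.exp(-lam * t))

# Illustrative values (assumptions, not from the text); note 0 < gamma < lam/N.
N, lam, gamma, t_end = 2, 1.0, 0.05, 3.0
total, M = logistic_birth_mean(N, lam, gamma, t_end)
m = logistic_deterministic(N, lam, gamma, t_end)
print(total, M, m)
```

With these values the stochastic expectation stays below the deterministic curve, and the gap remains small relative to the population size, in line with the qualitative statement above.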

4. The differential equations (16) and (19) correspond to an ¶ 23 ¶ exponential
and a logistic growth, respectively, but in the derivation we neglected any
possibility of death. In general, the resulting growth probability consists of
a birth and a death probability. Assume first that, regardless of the size of
the population, each individual has the constant probability τ · dt of dying in
a time-interval of length dt and that the corresponding birth probability is
ω · dt. Then the event that there are exactly n individuals at time t + dt can
be achieved in three mutually disjoint ways: (i) At time t there were already
exactly n individuals and there was no birth or death during (t, t + dt). (ii)
At time t there were exactly n − 1 individuals, and there was exactly one
birth and no death during (t, t + dt). (iii) At time t there were exactly n + 1
individuals, and there was exactly one death. As in No. 2, p. 16, this yields
the corresponding probabilities if one neglects terms of order less than dt: (i)
$P_n(t) \cdot \{1 - (\omega + \tau)\, n\, dt\}$, (ii) $P_{n-1}(t) \cdot \omega(n-1)\, dt$, and (iii) $P_{n+1}(t) \cdot \tau(n+1)\, dt$.
All other possibilities that also produce exactly n individuals at time t + dt
have a probability of order smaller than dt (cf. the footnote on p. 16 [this is
footnote 2]), so that Pn (t+dt) can be written as the sum of the three quantities
that we have just calculated. This yields the system of differential equations

$$P_n'(t) = -(\omega + \tau)\, n \cdot P_n(t) + \omega (n-1) \cdot P_{n-1}(t) + \tau (n+1) \cdot P_{n+1}(t), \tag{25}$$

of course again with the initial values (4), if there were initially exactly N
individuals.
It is now interesting to compare expectation and dispersion of the probability
distribution (25) with those of (16). One easily obtains from (25)

$$M'(t) = \sum_n n P_n'(t) = -\tau \sum_n (n+1) P_{n+1}(t) + \omega \sum_n (n-1) P_{n-1}(t) = (\omega - \tau) \sum_n n P_n(t),$$

i.e. one has

$$M'(t) = (\omega - \tau) M(t), \qquad M(t) = N e^{(\omega - \tau) t}. \tag{26}$$

Depending on the sign of ω − τ , i.e. on the resulting growth probability, this is


in accordance with the expected values (18) or (13). For both the stochastic

and the deterministic approach, it is irrelevant in which way the total growth
probability ¶ 24 ¶ is composed of a birth and a death component, as far as expec-
tations are concerned. This is different for the size of the probable statistical
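fluctuations. In fact, one obtains from (25) that

$$\begin{aligned}
M_2'(t) &= \sum_n n^2 P_n'(t) \\
&= -(\omega + \tau) \sum_n n^3 P_n(t) + \omega \sum_n n^2 (n-1) P_{n-1}(t) + \tau \sum_n n^2 (n+1) P_{n+1}(t) \\
&= \omega \sum_n \{2(n-1) + 1\} \cdot (n-1) P_{n-1}(t) + \tau \sum_n \{-2(n+1) + 1\} \cdot (n+1) P_{n+1}(t) \\
&= 2(\omega - \tau) M_2(t) + (\omega + \tau) M(t),
\end{aligned}$$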

and from this it follows, by virtue of (26), that

$$\begin{aligned}
S'(t) &= \frac{d}{dt} \left\{ M_2(t) - M^2(t) \right\} = 2(\omega - \tau) M_2(t) + (\omega + \tau) M(t) - 2(\omega - \tau) M^2(t) \\
&= 2(\omega - \tau) S(t) + (\omega + \tau) M(t).
\end{aligned}$$

Assuming that, say, ω − τ > 0, one obtains

$$S(t) = N\, \frac{\omega + \tau}{\lambda}\, e^{2\lambda t} \left(1 - e^{-\lambda t}\right) \qquad (\lambda = \omega - \tau). \tag{27}$$

Thus, one sees that here the dispersion is larger than in the pure-birth case
(18a), namely, by the factor $\frac{\omega + \tau}{\omega - \tau} > 1$. It is also intuitive that the statistical
fluctuations have to be larger if the resulting growth probability λ comprises
a death component as well, and (27) just makes this guess precise.
The case ω − τ < 0 is the exact analogue to the one discussed above. If
ω = τ , i.e. if birth and death probabilities coincide, then M (t) is identically N ,
and $S'(t) = 2\omega N$, i.e. $S(t) = 2\omega N t$. In the deterministic approach the equation
$m'(t) = 0$ corresponds to this case. Statistically one would expect that on the
one hand positive and negative increases of the initial value N are equally
likely, but on the other hand the observable values fluctuate around N by a
value which increases, on average, proportional to $\sqrt{\omega t}$.
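The closed forms (26) and (27), and the balanced case ω = τ, can be cross-checked by integrating the two moment equations numerically. This is a minimal sketch; the Runge-Kutta integrator, the step count and the parameter values are illustrative assumptions, and the dispersion is started at zero because the initial size is known exactly.

```python
import math

def integrate_moments(N, omega, tau, t_end, steps=2000):
    """RK4 for M' = (omega - tau) M and S' = 2 (omega - tau) S + (omega + tau) M."""
    lam, dt = omega - tau, t_end / steps

    def rhs(M, S):
        return lam * M, 2 * lam * S + (omega + tau) * M

    M, S = float(N), 0.0          # exactly N individuals at t = 0, hence S(0) = 0
    for _ in range(steps):
        a1, b1 = rhs(M, S)
        a2, b2 = rhs(M + 0.5 * dt * a1, S + 0.5 * dt * b1)
        a3, b3 = rhs(M + 0.5 * dt * a2, S + 0.5 * dt * b2)
        a4, b4 = rhs(M + dt * a3, S + dt * b3)
        M += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        S += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6
    return M, S

def closed_form(N, omega, tau, t):
    """Formulas (26) and (27); for omega == tau one has M = N and S = 2*omega*N*t."""
    lam = omega - tau
    if lam == 0:
        return float(N), 2 * omega * N * t
    M = N * math.exp(lam * t)
    S = N * (omega + tau) / lam * math.exp(2 * lam * t) * (1 - math.exp(-lam * t))
    return M, S

# Illustrative values (assumptions, not from the text).
growing = (integrate_moments(10, 1.5, 0.5, 2.0), closed_form(10, 1.5, 0.5, 2.0))
balanced = (integrate_moments(10, 1.0, 1.0, 2.0), closed_form(10, 1.0, 1.0, 2.0))
print(growing, balanced)
```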
If one wanted to adapt the Pearl–Verhulst approach (2), one would have to
assume that in the population under investigation, at any moment where the
size of the population is exactly n, each member has an average infinitesimal
birth probability (ω − γn) dt and an infinitesimal death probability (τ − σn) dt.
Instead of (25) one obtains, with the same reasoning, the more general differential
equationsᶜ

$$\begin{aligned}
P_n'(t) = {} & -\{n(\tau + \omega) - n^2(\sigma + \gamma)\} \cdot P_n(t) \\
& + \{(n+1)\tau - (n+1)^2 \sigma\} \cdot P_{n+1}(t) \\
& + \{(n-1)\omega - (n-1)^2 \gamma\} \cdot P_{n-1}(t).
\end{aligned} \tag{27*}$$
c In the original the following equation is erroneously labeled as (27). In order to avoid

confusion we re-label it as (27*)

¶ 25 ¶ Here, n is, of course, restricted to the interval $0 \le n \le \min\left(\frac{\omega}{\gamma}, \frac{\tau}{\sigma}\right)$, which
has to contain N if the population should at all be able to evolve from the
initial value N . The resulting reproduction probability of any single individual
is {(ω − τ ) − (γ − σ)n} dt. Whenever this quantity is positive, i.e. if we are
facing a proper growth process, then it is consistent with the approach (2)
that the interaction between the members of the population results in a slow-
down of the reproduction speed, i.e. it is always assumed in (2) that δ > 0.
Therefore, we will also assume that γ − σ > 0. As before, one can calculate the
expectation and the dispersion of the probability distribution (27*). With the
usual transformations, one obtains, in particular, that

$$M'(t) = \sum_n n P_n'(t) = (\omega - \tau) M(t) - (\gamma - \sigma) M_2(t) < (\omega - \tau) M(t) - (\gamma - \sigma) M^2(t). \tag{28}$$

This is the analogue of (22). In order to compare it with (2), we have to set
ω − τ = ε and γ − σ = δ, in line with the biological meaning of these quantities.
Thus, we again encounter the same phenomenon which we have already met
in the more specialized approach (19) (cf. p. 22): The statistical fluctuations
have the effect that the population grows, on average, somewhat more slowly
than one would get from the approach (2). Also for the other aspects, our
earlier remarks (p. 22) remain valid.

5. We will now turn to the alternative probabilistic approach to the
growth problem mentioned in the introduction, which is substantially more
flexible, and in which the population size is no longer assumed to be integer-
valued. One can imagine the mechanism of the process to be similar to a
Brownian motion. The state of the population under consideration, i.e. its
total life energy, is exposed to continual changes which result from a large
number of small causes. As there are many causes, it is, in practice, safe
to assume that there will be some change in any arbitrarily small time in-
terval. Since the single causes are small, the change over a very small time
interval will also be small⁵. This is the main difference ¶ 26 ¶ compared to the
previous approach, where only changes of a fixed size were allowed. There, in
any infinitesimal time interval, the dominating probability (namely 1 − n dt)
was that no change happens; if a change did occur, then the new and the old
states would at once differ by a jump. In contrast, in the new approach it is

5 It is, per se, not necessary to assume that the size of the population changes only

continuously. As we have already mentioned in the introduction, we can very easily combine
both techniques described in the text and assume that the size of the population changes
partly continually and continuously – namely by ageing and similar continuous processes –
and partly sometimes by jumps. This would actually be the correct mathematical approach,
but the continuous approach seems to give a sufficiently good approximation and to contain
all qualitative essentials, so that we believed that we can restrict ourselves in the text to
this approach. Moreover, it will become obvious from the text how one can derive the

assumed that almost surely (probability = 1) some change happens, but this
change will, with practical certainty, be smaller the smaller dt is (“continuous
process”).
The size (life energy) of the population can a priori attain any value x > 0.
Assume it is known that at time t = 0 it had the value N > 0. Let then w(t, x)
denote, for any x > 0 and t > 0, the probability density for the population to be
of size x at time t (“w(t, x) dx is the probability that the population at time t is
between x and x + dx”). Finally, w0 (t) is the probability that the population is
extinct at time t. It is assumed that w(t, x) is a continuous function (and, for
simplicity, we will assume the same also for its first and second derivatives).
According to the definition of a probability density one has for all t > 0

$$w_0(t) + \int_0^\infty w(t, x)\, dx = 1. \tag{29}$$

Moreover, it is known that the size of the population at time t = 0 was
exactly N; since the change happens continuously, this means that for small t
the probability for the size being between N − ε and N + ε is very high. This
yields the initial condition

$$\lim_{t \to 0} \int_{N-\varepsilon}^{N+\varepsilon} w(t, x)\, dx = 1, \tag{30}$$

which corresponds to equation (4) in the atomistic approach. Due to (29) one
can rewrite (30) also in the formᵈ

$$\lim_{t \to 0} \left\{ w_0(t) + \int_0^{N-\varepsilon} w(t, x)\, dx + \int_{N+\varepsilon}^{\infty} w(t, x)\, dx \right\} = 0. \tag{30a}$$

¶ 27 ¶ We will now show how one can apply a method which was first used by
A. Kolmogoroff in order to derive a differential equation for w(t, x):

$$\frac{\partial w(t, x)}{\partial t} = \frac{\partial^2 \{b(x) w(t, x)\}}{\partial x^2} - \frac{\partial \{a(x) w(t, x)\}}{\partial x}. \tag{31}$$
This, together with the initial condition (30), will in general completely deter-
mine the probability density w(t, x).
For simplicity, we assume, as usual, that the growth process is temporally
homogeneous, i.e. that the tendency to change only depends on the current
state, but not on when the process happens. Biologically this means, among
other things, to assume a constant age structure of the population. One may,
by the way, transfer the considerations almost literally to the more general
corresponding equations for the combined case; they, of course, will be integro-differential
equations of the type considered in Feller (1936), §5.

d Misprint in the original: $\int_0^{N+\varepsilon} \dots$ instead of $\int_0^{N-\varepsilon} \dots$.

inhomogeneous case: The essential difference is just that the coefficients a and
b in (31) will also depend on t.
The inner mechanism of the growth process is analytically expressed through
the function f (ξ, x; τ ), the so-called transition probability, which gives the prob-
ability for the population to grow from size ξ > 0 to size x > 0 during any time
interval of length τ > 0; in other words, f (ξ, x; τ ) dx is the conditional proba-
bility for the size of the population to be between x and x + dx at time t + τ ,
given that it had size ξ at time t. Thus, the integral of f (ξ, x; τ ) over all x > 0
is the probability that the population still has positive size after time τ has
elapsed, i.e.

$$\Phi(\xi, \tau) = 1 - \int_0^\infty f(\xi, x; \tau)\, dx \tag{32}$$

is the probability that the population has died out by time t + τ if it was of
size ξ at time t.
Using the function f (ξ, x; τ ), we can make precise the notion of continuity
of the process (cf. Feller, 1936, §1) described vaguely above. Actually, when
describing the process we assume continuity in the sense that for every 0 < ε < ξ

$$\lim_{\Delta t \to 0} \frac{1}{\Delta t} \left\{ 1 - \int_{\xi-\varepsilon}^{\xi+\varepsilon} f(\xi, x; \Delta t)\, dx \right\} = 0, \tag{33}$$

i.e. the probability of a change exceeding ε during a time interval of length Δt
will be of a smaller order than Δt. ¶ 28 ¶
It is now assumed that the population has size ξ at time t. In the following
time interval of length Δt there is, with probability density f (ξ, x; Δt), an
increment of size x − ξ if x > 0, and with probability Φ(ξ, Δt) an increment of
size −ξ, i.e. extinction. The expected value of the increment is thus

$$A(\xi, \Delta t) = -\xi\, \Phi(\xi, \Delta t) + \int_0^\infty (x - \xi)\, f(\xi, x; \Delta t)\, dx. \tag{34}$$

The biological meaning of the expected value is the same as before: If
one observes, under identical conditions, a large number of the populations in
question, all with size ξ, then A(ξ, Δt) yields approximately the arithmetic
mean of the observed increments during a time interval of length Δt.
Assume now that the limit

$$\lim_{\Delta t \to 0} \frac{1}{\Delta t} A(\xi, \Delta t) = a(\xi) \tag{35}$$

exists; clearly, a(ξ) is then the population’s tendency to grow at any moment
when the size is ξ, and this is the exact correspondence to the notion of speed
of growth in the deterministic approach.
However, a(ξ) = 0 is possible as soon as positive and negative increments
are equally likely; this may happen also in cases where, statistically, relatively

large changes of the size of the population are not unlikely. In order to get
an idea of the extent of these likely statistical fluctuations, we introduce, in
addition to (34), the expectation of the square of the increments, i.e. the
dispersion

$$B(\xi, \Delta t) = \xi^2\, \Phi(\xi, \Delta t) + \int_0^\infty (x - \xi)^2 f(\xi, x; \Delta t)\, dx. \tag{36}$$

Then, provided the limit exists,

$$\lim_{\Delta t \to 0} \frac{1}{2 \Delta t} B(\xi, \Delta t) = b(\xi) \tag{37}$$

is the required measure for the dispersion of the possible speeds of increase
around their mean value a(ξ). In the sequel we will always assume that b(ξ)
exists and that it is positive.
If the assumption (33) is satisfied, then one can show under very ¶ 29 ¶ general
conditions6 that w(t, x) satisfies the equation (31), where the coefficients are
defined by (35) and (37), and thus have an immediate biological meaning.

6. This biological meaning allows a simple derivation of the corresponding
differential equation (31) for all special cases that occur. For example, if
one assumes that the size of the population has no influence on the average
reproduction rate of the single individuals, i.e. that they are stochastically
independent – it is this assumption which leads to equation (1) in the deter-
ministic case –, then a(x) and b(x) obviously have to be proportional to x, i.e.
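(31) attains the form

$$\frac{\partial w(t, x)}{\partial t} = \beta\, \frac{\partial^2 \{x w(t, x)\}}{\partial x^2} - \alpha\, \frac{\partial \{x w(t, x)\}}{\partial x}, \tag{38}$$
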
where α and β are constants, β > 0. Thus, this equation is the analogue
of the equation (16) in the atomistic approach. α corresponds to λ and it
measures the average infinitesimal reproduction rate of any single individual
(α > 0 corresponds to a positive reproduction). β is a measure of the statistical
dispersion around this mean value in such a way that it depends, for example,
on the age structure and the homogeneity of the population and similar factors.
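Now, the probability function w(t, x) is, as one can show⁷, uniquely determined
for all x > 0 and t > 0 by (38) together with the initial condition (30).
If α ≠ 0 one can represent it in the form

$$w(t, x) = \frac{1}{i}\, \frac{\alpha}{\beta (e^{\alpha t} - 1)} \sqrt{\frac{N e^{\alpha t}}{x}}\; J_1\!\left( \frac{2 i \alpha}{\beta (e^{\alpha t} - 1)} \sqrt{x N e^{\alpha t}} \right) \cdot e^{-\frac{\alpha (N e^{\alpha t} + x)}{\beta (e^{\alpha t} - 1)}} \tag{39}$$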

6 For this, cf. Kolmogoroff (1931), §§13–14, and under slightly more general conditions

Feller (1936), §§1–2.


7 Although equation (38) seems to have been nowhere treated, I prefer not to provide

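where $J_1(x)$ denotes, as usual, the Bessel function of the first kind. The
real and positive character of w(t, x) becomes clearer from the following series
representation of (39)

$$w(t, x) = \frac{N \alpha^2 e^{\alpha t}}{\beta^2 (e^{\alpha t} - 1)^2}\; e^{-\frac{\alpha (N e^{\alpha t} + x)}{\beta (e^{\alpha t} - 1)}} \sum_{\kappa=0}^{\infty} \frac{1}{\kappa!\, (\kappa + 1)!} \left\{ \frac{\alpha \sqrt{x N e^{\alpha t}}}{\beta (e^{\alpha t} - 1)} \right\}^{2\kappa}. \tag{39a}$$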

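¶ 30 ¶ For the special case α = 0 one obtains from (39), using a limit argument,
the solution of the more special equation

$$\frac{\partial w(t, x)}{\partial t} = \beta\, \frac{\partial^2 \{x w(t, x)\}}{\partial x^2} \tag{40}$$

in the form

$$\begin{aligned}
w(t, x) &= \frac{1}{i \beta t} \sqrt{\frac{N}{x}}\; J_1\!\left( \frac{2i}{\beta t} \sqrt{xN} \right) \cdot e^{-\frac{N + x}{\beta t}} \\
&= \frac{N}{\beta^2 t^2}\; e^{-\frac{N + x}{\beta t}} \sum_{\kappa=0}^{\infty} \frac{1}{\kappa!\, (\kappa + 1)!} \left\{ \frac{\sqrt{xN}}{\beta t} \right\}^{2\kappa}.
\end{aligned} \tag{41}$$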

Since one has good estimates for Bessel functions, one can read off from (39)
and (41) all essential properties of the solution w(t, x) of (38), and of (40),
respectively.
The greatest difference between this solution and the corresponding solu-
tion (17) of the atomistic approach is that, in the latter case, it was absolutely
impossible for the population to die out at any point, if the reproduction
probability λ is positive (one always has n ≥ N ). The equation (31), however,
always yields a positive probability for the extinction of the population. The
probability w0 (t) that the population has died out by time t can be calculated
from (29). The integral appearing there can be easily computed from the series
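representation (39a): Noting that

$$\int_0^\infty e^{-\frac{\alpha x}{\beta (e^{\alpha t} - 1)}} \left\{ \frac{\alpha x}{\beta (e^{\alpha t} - 1)} \right\}^{\kappa} dx = \kappa!\, \frac{\beta (e^{\alpha t} - 1)}{\alpha},$$

one obtains easily

$$\begin{aligned}
\int_0^\infty w(t, x)\, dx &= e^{-\frac{\alpha N e^{\alpha t}}{\beta (e^{\alpha t} - 1)}} \sum_{\kappa=0}^{\infty} \frac{1}{(\kappa + 1)!} \left\{ \frac{\alpha N e^{\alpha t}}{\beta (e^{\alpha t} - 1)} \right\}^{\kappa + 1} \\
&= e^{-\frac{\alpha N e^{\alpha t}}{\beta (e^{\alpha t} - 1)}} \left\{ e^{\frac{\alpha N e^{\alpha t}}{\beta (e^{\alpha t} - 1)}} - 1 \right\}.
\end{aligned}$$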

Thus, by (29) one obtains for the extinction probability $w_0(t)$

$$w_0(t) = e^{-\frac{\alpha N e^{\alpha t}}{\beta (e^{\alpha t} - 1)}}, \qquad (\alpha \neq 0). \tag{42}$$
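The relation between the series (39a), the normalisation (29) and the extinction probability (42) can be verified numerically. The sketch below sums the series term by term and integrates the density with the trapezoidal rule; the integration range `x_max`, the step count and the parameter values are illustrative assumptions chosen so that the neglected tail is negligible.

```python
import math

def density(t, x, N, alpha, beta):
    """w(t, x) from the series (39a); requires alpha != 0, t > 0 and x >= 0."""
    c = alpha / (beta * math.expm1(alpha * t))    # alpha / (beta (e^{alpha t} - 1))
    growth = N * math.exp(alpha * t)              # N e^{alpha t}
    z2 = c * c * x * growth                       # square of the series argument
    term, total, kappa = 1.0, 1.0, 0
    while term > 1e-17 * total:
        kappa += 1
        term *= z2 / (kappa * (kappa + 1))        # ratio of consecutive series terms
        total += term
    return c * c * growth * math.exp(-c * (growth + x)) * total

def survival_by_quadrature(t, N, alpha, beta, x_max=150.0, steps=15000):
    """Trapezoidal approximation of the integral of w(t, x) over 0 < x < x_max."""
    h = x_max / steps
    values = [density(t, i * h, N, alpha, beta) for i in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def extinction(t, N, alpha, beta):
    """w_0(t) from (42)."""
    return math.exp(-alpha * N * math.exp(alpha * t) / (beta * math.expm1(alpha * t)))

# Illustrative values (assumptions, not from the text).
t, N, alpha, beta = 1.0, 2.0, 1.0, 1.0
mass = survival_by_quadrature(t, N, alpha, beta)
w0 = extinction(t, N, alpha, beta)
print(mass, w0, mass + w0)
```

The sum of the two printed probabilities should be close to 1, up to the quadrature error of the trapezoidal rule.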
the rather involved calculations which lead to the solution (39). Concerning the uniqueness

In case α = 0 one obtains in the same way from equation (41), or through a
limit argument in (42), that

$$w_0(t) = e^{-\frac{N}{\beta t}}. \tag{43}$$

¶ 31 ¶ One sees that $0 < w_0(t) < 1$ always holds, and that the smaller β, the
smaller is the extinction probability. Indeed, it is just the dispersion β which
makes an extinction of the population possible. Moreover, $w_0(t)$ is a monotonically
increasing function, as it should be due to its meaning. One has

$$\lim_{t \to \infty} w_0(t) = \begin{cases} e^{-\frac{\alpha N}{\beta}}, & \text{if } \alpha > 0, \\ 1, & \text{if } \alpha \le 0. \end{cases} \tag{44}$$
Cast in words: If the infinitesimal growth probability is α ≤ 0, then eventual
extinction of the population occurs with probability 1 (certainty). If, however,
the tendency to grow is α > 0, the dispersion is β > 0 and the population has
initially size N, then the probability to die out eventually equals $e^{-\alpha N/\beta}$.
If α > 0, then the expectation of the population equals N eαt (as we will
see soon), and the population will, with overwhelming likelihood, grow expo-
nentially. Therefore, once the population has reached a certain larger size,
it is, by (44), exponentially more improbable that it will die out later. This
also finds its expression in the fact that, for α > 0, the quantity (42) tends to
its limit (44), i.e. that an extinction, if it happens at all, will most probably
happen very soon, as long as the size of the population still is relatively small.
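The qualitative statements about the extinction probability can be illustrated with a few lines of code. This is a sketch with assumed parameter values (N = 3, β = 1, α = ±0.5); it evaluates (42) and (43), checks that the curve is increasing in t, and compares the long-time values with the limit (44).

```python
import math

def extinction(t, N, alpha, beta):
    """w_0(t): formula (42) for alpha != 0 and formula (43) for alpha == 0."""
    if alpha == 0:
        return math.exp(-N / (beta * t))
    return math.exp(-alpha * N * math.exp(alpha * t) / (beta * math.expm1(alpha * t)))

# Illustrative values (assumptions, not from the text).
N, beta = 3.0, 1.0
times = [0.1 * k for k in range(1, 200)]
curve_growing = [extinction(t, N, 0.5, beta) for t in times]     # alpha > 0
curve_decaying = [extinction(t, N, -0.5, beta) for t in times]   # alpha < 0
limit_growing = math.exp(-0.5 * N / beta)                        # limit (44) for alpha > 0
late_growing = extinction(60.0, N, 0.5, beta)
late_decaying = extinction(60.0, N, -0.5, beta)
print(late_growing, limit_growing, late_decaying)
```

For α > 0 the curve levels off quickly at the limit in (44), which reflects the remark that an extinction, if it happens at all, most probably happens early.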

7. In order to get more information on the likely size of the population,
one could easily compute its expectation, its dispersion etc. from (39) and (41),
respectively, by integration. We prefer, however, to derive these quantities
directly from the differential equation (38), using a technique which has the
potential to be generalized and is the analogue of the method used earlier.
In general, we define the moments

$$M_k(t) = \int_0^\infty x^k\, w(t, x)\, dx, \qquad k = 1, 2, \ldots \tag{45}$$

(cf. (9)). In particular, M (t) = M1 (t) is the expected value of the popula-
tion size, around which the observable statistical fluctuations will take place.
Accordingly (cf. (8))

$$S(t) = \int_0^\infty \{x - M(t)\}^2\, w(t, x)\, dx + M^2(t)\, w_0(t) \tag{46}$$

is the dispersion of the population size, i.e. the expectation of the square of
its deviation from its expected value. Multiplying out the square in (46), one
obtains, with (29) and the definition (45) of M(t) in mind, that

$$S(t) = M_2(t) - M^2(t), \tag{47}$$

of the initial value problem (31), cf. also the remarks to No. 8, pp. 33 f.

i.e. again the relation (10).
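Multiplying the differential equation (38) with x and integrating it with
respect to x, using integration by parts repeatedly, together with the fact that
$x\, w(t, x)$ and $x\, \partial w(t, x)/\partial x$ tend to zero as x → 0, one obtains

$$\begin{aligned}
M'(t) &= \int_0^\infty x\, \frac{\partial w(t, x)}{\partial t}\, dx = \int_0^\infty x \left\{ \beta\, \frac{\partial^2\, x w(t, x)}{\partial x^2} - \alpha\, \frac{\partial\, x w(t, x)}{\partial x} \right\} dx \\
&= -\int_0^\infty \left\{ \beta\, \frac{\partial\, x w(t, x)}{\partial x} - \alpha\, x w(t, x) \right\} dx + \left[ x \left\{ \beta\, \frac{\partial\, x w(t, x)}{\partial x} - \alpha\, x w(t, x) \right\} \right]_0^\infty \\
&= \Bigl[ -\beta\, x w(t, x) \Bigr]_0^\infty + \alpha \int_0^\infty x w(t, x)\, dx.
\end{aligned}$$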

So we recover for the expectation that

$$M'(t) = \alpha \cdot M(t),$$

or, together with the initial condition (30),

(48) M (t) = N · eαt ,

which coincides with the previous results. In general, for $M_k(t)$ and k > 1, the
same technique yields

$$M_k'(t) = \beta \cdot k(k-1)\, M_{k-1}(t) + \alpha k\, M_k(t), \tag{49}$$

and this gives a recursion formula for all $M_k(t)$. In particular, we get from
this using (47)

$$S'(t) = M_2'(t) - 2 M(t) M'(t) = 2\beta M(t) + 2\alpha M_2(t) - 2\alpha M^2(t) = 2\alpha S(t) + 2\beta M(t)$$

and so

$$S(t) = 2\, \frac{\beta}{\alpha}\, N e^{2\alpha t} \left(1 - e^{-\alpha t}\right). \tag{50}$$

If one compares (50) with the corresponding formula (15) and (18a), then one
sees that we have complete coincidence if one sets α = β = λ. This, however,
is exactly the relation which one can ¶ 33 ¶ calculate for the quantities (33) and
(37) from the probability distributions in the atomistic case. With these quan-
tities, the two ways of describing the growth process coincide, as far as the
expectation of the population size and its quadratic dispersion are concerned.
However, if one computes, using (49), the higher-order dispersion measures
that are familiar in statistics, then one easily sees that under the present
continuous approach these quantities become larger than in the atomistic ap-
proach. This corresponds with the fact that the continuous approach takes
into account more biological factors which will, in turn, cause theoretically
larger statistical fluctuations as well.
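The step from the differential equation for S(t) to the closed form (50) can be spelled out as follows. This is a short worked derivation, assuming α ≠ 0 and the initial value S(0) = 0 (the population size is known exactly at t = 0), and using M(t) = N e^{αt} from (48).

```latex
\begin{aligned}
S'(t) - 2\alpha S(t) &= 2\beta N e^{\alpha t}, \qquad S(0) = 0, \\
\frac{d}{dt}\Bigl( e^{-2\alpha t} S(t) \Bigr) &= 2\beta N e^{-\alpha t}, \\
e^{-2\alpha t} S(t) &= 2\beta N \int_0^t e^{-\alpha s}\, ds
  = \frac{2\beta N}{\alpha} \bigl( 1 - e^{-\alpha t} \bigr), \\
S(t) &= 2\, \frac{\beta}{\alpha}\, N e^{2\alpha t} \bigl( 1 - e^{-\alpha t} \bigr).
\end{aligned}
```

Letting α tend to 0 in the last line gives S(t) = 2βNt, the dispersion belonging to equation (40).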

8. If one starts with the Pearl–Verhulst approach (2) and assumes that
there is interaction between the members of the population that makes the
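infinitesimal reproduction tendency α of any single individual decrease in proportion
to the size of the population, then (31) takes the form

$$\frac{\partial w(t, x)}{\partial t} = \beta\, \frac{\partial^2 \{x w(t, x)\}}{\partial x^2} - \frac{\partial}{\partial x} \left\{ (\alpha x - \gamma x^2)\, w(t, x) \right\}. \tag{51}$$
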
For this equation, too, one can calculate expectation and dispersion by the
method mentioned above. For example, one gets

$$M'(t) = \alpha \cdot M(t) - \gamma \cdot M_2(t)$$

so that the expected value is again smaller than under the deterministic ap-
proach (2). For the other quantities, one also obtains qualitatively the same
results as in the atomistic approach and in No. 7 (pp. 31 f.), so that we can do
without the explicit calculations.
A general method of integration for the equation (31) does not yet exist for
the cases we are interested in, and here one encounters many interesting, but
unsolved mathematical problems. A reasonably definitive result exists only
for the case where b(x) > 0 throughout, and all x are possible. In this case
I have (Feller, 1936, §§2–3), under suitably general regularity assumptions,
constructed the solution to (31) as a convergent series, and I have proven
that it is unique and satisfies all probabilistic requirements, in particular, that
the conditions (33), (35) and (37) are actually satisfied for this solution. ¶ 34 ¶
The equations of the form (31), which we will now encounter, contain an
essentially novel aspect, since b(x) = 0 is, in general, possible, and since only
solutions in the quarter-plane t > 0, x > 0 are of interest. This is, from an
analysis perspective, a singularity of the equation that causes major difficulties
and which thoroughly modifies the initial value problems. From the point of
view of applications, the situation is better, since every equation (31) may
be approximated by equations for which b(x) > 0 holds and since many types
of equations can be directly reduced to equations of the latter kind. The
last remark holds, in particular, for those equations whose coefficients can be
written in the form

$$a(x) = x^{\rho} A(x), \qquad b(x) = x^{\sigma} B(x)$$

where A(0) ≠ 0, B(0) ≠ 0, ρ ≥ σ/2, σ ≥ 2, provided that

$$\int_1^\infty \frac{dx}{\sqrt{b(x)}}$$

diverges. Then one can introduce a new variable ξ in place of x through

$$\xi = \int_1^x \frac{ds}{\sqrt{b(s)}}, \qquad x > 0.$$

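If x ranges from 0 to ∞, then ξ varies between −∞ and ∞, and (31) becomes

$$\frac{\partial w}{\partial t} = \frac{\partial^2 w}{\partial \xi^2} + \left\{ \frac{3}{2}\, b'(x) - a \right\} \frac{1}{\sqrt{b}}\, \frac{\partial w}{\partial \xi} - a'(x) \cdot w,$$

i.e. an equation which is completely understood.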
In particular, one sees that in this case, exactly as for the equation (38) –
which is not contained in this case – the solution depends only on the initial
value for t = 0. This is an interesting contrast to the theory of the ordinary
diffusion equation

$$\frac{\partial w(t, x)}{\partial t} = \frac{\partial^2 w(t, x)}{\partial x^2} \tag{52}$$

which has infinitely many solutions in the quarter-plane x > 0, t > 0 with
the same initial values, and where any single solution is only determined by
the boundary values along both semi-axes. This remarkable difference may,
however, also be made plausible through the biological meaning of equation
(31). ¶ 35 ¶
If one wanted to associate (52), as a differential equation of the form (31),
with a specific growth process of a hypothetical population, then this would
be a population satisfying a(x) = 0, i.e. having a vanishing expected value of
the reproduction: Births and deaths would have to cancel on average. The
dispersion around the mean value 0, however, is 1, independently of the size
of the population; i.e. independently of the size of the population there would
be finite reproduction rates. Such an assumption contradicts, of course, any
biological postulates, and backfires analytically exactly in the special role that
is played by the semi-axis x = 0, the case of extinction. In the case of the
equation (38) both a(x) and b(x) vanished for x = 0, i.e. the net reproduction
rate of a small population was small, and converged with certainty towards
zero if the size of the population did so, too. The axis x = 0 became negligible
by itself – any extinct population could not interfere with the calculation,
because of the form of a(x) and b(x). In contrast, the approach (52) formally
leaves the possibility that even a population of size zero starts reproducing (or,
as one would say in diffusion theory: Matter is flowing in across the x-axis).
Therefore, a special boundary condition is needed to express that a population,
once it has reached zero, remains at size zero forever. The analytic form of
such boundary conditions is well known from diffusion theory; it corresponds
to an “absorbing boundary” and can be expressed as w(t, 0) = 0.
In those cases where b(0) ≠ 0, equation (31) (in contrast to equation (38)
etc. discussed before) has infinitely many solutions with the same initial values
for t = 0, and the one which vanishes along x = 0 would correspond to our
problem. In the case (52) this solution is known to be

$$w(t, x) = \frac{1}{2\sqrt{\pi t}} \left\{ e^{-\frac{(x - N)^2}{4t}} - e^{-\frac{(x + N)^2}{4t}} \right\}.$$
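The absorbing-boundary solution of (52) can be examined numerically. The sketch below checks that the density vanishes at x = 0 and that its total mass, i.e. the probability that the population has not yet died out, agrees with erf(N / (2√t)). The latter identity is a standard Gaussian integral supplied here as an assumption, not a formula from the text, and the parameter values and integration range are illustrative.

```python
import math

def absorbed_density(t, x, N):
    """Solution of (52) with w(t, 0) = 0 that starts from a unit mass at x = N."""
    scale = 1.0 / (2.0 * math.sqrt(math.pi * t))
    return scale * (math.exp(-(x - N) ** 2 / (4 * t)) - math.exp(-(x + N) ** 2 / (4 * t)))

def surviving_mass(t, N, x_max=60.0, steps=20000):
    """Trapezoidal approximation of the integral of w(t, x) over 0 < x < x_max."""
    h = x_max / steps
    values = [absorbed_density(t, i * h, N) for i in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

# Illustrative values (assumptions, not from the text).
t, N = 1.5, 2.0
boundary_value = absorbed_density(t, 0.0, N)
mass = surviving_mass(t, N)
expected = math.erf(N / (2.0 * math.sqrt(t)))   # standard Gaussian integral (assumption)
print(boundary_value, mass, expected)
```

The mass is strictly smaller than 1, and the missing part is the probability that the population has been absorbed at size zero by time t.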
Retaining from (38) the form of the coefficient a(x), but defining b(x) again
to be constant, one arrives at the equation

$$\frac{\partial w(t, x)}{\partial t} = \beta\, \frac{\partial^2 w(t, x)}{\partial x^2} - \alpha\, \frac{\partial \{x w(t, x)\}}{\partial x}$$

¶ 36 ¶ which has the solution⁸

$$w(t, x) = \sqrt{\frac{\alpha}{2\pi\beta}}\; \frac{1}{\sqrt{e^{2\alpha t} - 1}} \left\{ e^{-\frac{\alpha (x - N e^{\alpha t})^2}{2\beta (e^{2\alpha t} - 1)}} - e^{-\frac{\alpha (x + N e^{\alpha t})^2}{2\beta (e^{2\alpha t} - 1)}} \right\}. \tag{53}$$


If both a(0) = 0 and b(0) = 0, then the solution of (31) is uniquely determined
just through the initial values at t = 0. If only b(0) = 0 but a(0) ≠ 0, then both
cases may occur. To my knowledge, the latter has first been observed by S.
Kepinski (1905). An all-embracing criterion for this case does not exist.

9. Let us finally say a few words on the limiting procedures that lead to
the deterministic approach.
The growth process is completely determined in a causal sense if, and only
if, the speed a(x) of the increments of the population, in the moment when it
has reached size x, is completely specified, i.e. if the statistical dispersion b(x)
around the expectation vanishes identically. In this limiting case (31) indeed
becomes a first-order equation

$$\frac{\partial w(t,x)}{\partial t} + \frac{\partial \{a(x)\,w(t,x)\}}{\partial x} = 0, \tag{54}$$
and this is known to be equivalent with the deterministic approach according
to which the size x of the population satisfies, as a function of time t, the
ordinary differential equation

$$\frac{dx}{dt} = a(x). \tag{55}$$

In fact, (55) yields the so-called characteristics of the equation (54), i.e. those
lines along which any attained state has to evolve according to (54).
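The role of the characteristics can be made explicit in a worked example. This is an illustration under stated assumptions, not part of the original argument: the linear rate a(x) = αx (the form of the coefficient retained from (38)) and an arbitrary differentiable initial density w_0, a symbol introduced here only for the example.

```latex
% Assumption: a(x) = \alpha x, initial density w(0,x) = w_0(x).
\begin{align*}
  \frac{dx}{dt} = \alpha x
    &\;\Longrightarrow\; x(t) = x_0\, e^{\alpha t}
    && \text{(characteristic through } x_0\text{)},\\
  w(t,x) &= e^{-\alpha t}\, w_0\!\left(x\, e^{-\alpha t}\right)
    && \text{(initial mass carried along the characteristics)}.
\end{align*}
% Check against the first-order equation:
\begin{align*}
  \frac{\partial w}{\partial t}
    &= -\alpha e^{-\alpha t} w_0\!\left(x e^{-\alpha t}\right)
       - \alpha x\, e^{-2\alpha t} w_0'\!\left(x e^{-\alpha t}\right),\\
  \frac{\partial \{\alpha x\, w\}}{\partial x}
    &= \alpha e^{-\alpha t} w_0\!\left(x e^{-\alpha t}\right)
       + \alpha x\, e^{-2\alpha t} w_0'\!\left(x e^{-\alpha t}\right),
\end{align*}
% so the two terms cancel and the sum vanishes identically.
```

The factor e^{-αt} compensates for the stretching of the x-axis along the characteristics, so that the total mass of w is preserved.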
Using a different limiting procedure, one reaches the same result starting
from the atomistic approach. For this we start, say, from the form (20) of the
differential equations of the growth process. The limit to infinite populations
is now done in the following way: The size of the population is assumed to
be any multiple of a quantity h > 0 and Pn (t) is the probability that exactly
¶ 37 the size n · h is attained. Then we carry out the limit ¶ h → 0 in such a way
that nh → x where x > 0 is chosen arbitrarily. If mh < x < (m + 1)h and m is
integer, then we set

$$a^{(h)}(x) = h \cdot p_m, \qquad w^{(h)}(t,x) = h \cdot P_m(t).$$


8 This follows from a result by Kolmogoroff (1931), §17, and it can also be established

using the Fourier transform.

492 Volterra’s Theory of the Struggle for Life


In agreement with the meaning of the quantities pm , one has to assume that in
the limit h → 0 the quantity a(h) (x) converges to a continuous function a(x).
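The same applies to w(h) (t, x) since, according to (20), one hasᵉ

$$\begin{aligned}
\frac{\partial w^{(h)}(t,x)}{\partial t}
  &= \frac{-a^{(h)}(x)\,w^{(h)}(t,x) + a^{(h)}(x-h)\,w^{(h)}(t,x-h)}{h} \\
  &= -\frac{\Delta_x \bigl\{ a^{(h)}(x-h)\,w^{(h)}(t,x-h) \bigr\}}{h}.
\end{aligned}$$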

From this one easily concludes that w(h) (t, x) converges to a solution of the
equation (54).
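The limit h → 0 can be explored numerically. The sketch below is only an illustration under assumptions that are not in the text: the linear rate a(x) = αx, a pure growth system of the form P_n' = −p_n P_n + p_{n−1} P_{n−1} with p_n = a(nh)/h, a plain Euler scheme, and a truncation of the state space at `n_max`. The function name `scaled_moments` and all parameter values are hypothetical.

```python
import math

def scaled_moments(h, N=1.0, alpha=1.0, t=1.0, dt=1e-3, n_max=400):
    """Euler-integrate P_n' = -p_n P_n + p_{n-1} P_{n-1}; return mean and std of the size n*h."""
    n0 = round(N / h)                                # initial size N corresponds to n0 units of h
    P = [0.0] * (n_max + 1)
    P[n0] = 1.0
    rate = [alpha * n for n in range(n_max + 1)]     # p_n = a(n h) / h = alpha * n
    for _ in range(round(t / dt)):
        flow = [rate[n] * P[n] * dt for n in range(n_max + 1)]
        for n in range(n_max, 0, -1):
            P[n] += flow[n - 1] - flow[n]            # mass leaving n_max is dropped (truncation)
        P[0] -= flow[0]
    mean = sum(n * h * P[n] for n in range(n_max + 1))
    var = sum((n * h - mean) ** 2 * P[n] for n in range(n_max + 1))
    return mean, math.sqrt(var)

if __name__ == "__main__":
    for h in (0.5, 0.25):
        mean, std = scaled_moments(h)
        print(f"h = {h}: mean size {mean:.4f}, standard deviation {std:.4f}")
    print("deterministic value N e^(alpha t):", math.e)
```

In this setting the mean stays close to the deterministic value N e^{αt} for every h, while the spread around it shrinks as h decreases, which is the behaviour the limiting procedure describes.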

10. Let us finally show how the previous considerations can be adapted
to the more general problems (without aftereffect) of Volterra’s theory of the
struggle for life. For this, it is enough to adapt the more natural, continuous
approach: The atomistic method then carries over in the same way, and even
more easily.
Assume, for simplicity, that there are only two populations, which compete
for the same environment, one of them being a predator species, the other a
prey species. Instead of describing the process, as is usually done, by two
functions of one variable each, we use a single function of three variables. If
the first population has size x1 and if, at the same time, the second is of size
x2 , then we represent this state by a point (x1 , x2 ) in the plane. One has
to determine a function w(t, x1 , x2 ) that is the probability density for both
populations to have the sizes x1 ≥ 0 and x2 ≥ 0, respectively, at the same
moment t > 0; the initial values of the quantities at time t = 0 are, of course,
assumed to be known.
Using similar considerations as before one obtains the following partial
differential equation for w(t, x1 , x2 ):
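$$\frac{\partial w(t,x_1,x_2)}{\partial t} = \sum_{i,k=1}^{2} \frac{\partial^2 \{b_{ik}(x_1,x_2)\,w(t,x_1,x_2)\}}{\partial x_i\,\partial x_k} - \sum_{i=1}^{2} \frac{\partial \{a_i(x_1,x_2)\,w(t,x_1,x_2)\}}{\partial x_i}. \tag{56}$$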

The biological meaning of the coefficients in (56) can be obtained by ¶ similar ¶ 38


considerations as in No. 5: For example, a1 (x1 , x2 ) is the average infinitesimal
speed of growth of the first population as a function of the current size of
both populations; b11 (x1 , x2 ) stands for the infinitesimal statistical dispersion
around this expected value of the speed of growth, and b12 (x1 , x2 ) is a measure
of correlation for the mutual influence of both populations on their speed of
growth.
The simplest assumption is, again, the one that leads to Volterra’s ansatz
(3): The members of the single populations act stochastically independently

e The original contains the misprint w(t, x − h) instead of w(h) (t, x − h) in the second line of the following calculation.

among themselves, so that the speed of growth and the dispersion are both
proportional to the size of that population. One still has to consider the
encounters between members of the predator and prey species; the probability
of such an encounter is evidently proportional to the product x1 x2 , and from
each such encounter the predator species will draw a certain advantage, the
prey species a disadvantage. Under these particular assumptions, (56) takes
the form
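$$\begin{aligned}
\frac{\partial w}{\partial t} = {} & \beta_1 \frac{\partial^2 \{x_1 w\}}{\partial x_1^2} + \beta_2 \frac{\partial^2 \{x_2 w\}}{\partial x_2^2} - \frac{\partial}{\partial x_1}\bigl\{(\alpha_1 x_1 + \gamma x_1 x_2)\,w\bigr\} \\
& - \frac{\partial}{\partial x_2}\bigl\{(\alpha_2 x_1 - \delta x_1 x_2)\,w\bigr\},
\end{aligned} \tag{57}$$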
where the Greek letters mean constants and, say, γ > 0, δ < 0, if index 1 denotes
the predator species. In the next approximation discussed by Volterra, which
corresponds to the Pearl–Verhulst approach (2), one would have to set the
coefficients a1 (x1 , x2 ) in (56) as quadratic functions, etc.
The ansatz (57) corresponds to the linear ansatz (38) in the theory of
growth of a single population. Since, however, (57) also contains quadratic
terms, the following phenomenon already appears here, which we have so far
encountered only in connection with the more general ansatz (51): The ex-
pected value for the size of the two populations, according to (57), will not be
identical with the size of these populations that would be obtained from the
approach (3), assuming the same speed of growth. Here, however, the growth
does not necessarily experience a slow-down. Rather – depending on the ini-
tial situation – the statistical fluctuations will sometimes cause a slow-down
and sometimes a speed-up of growth. One could repeat here some of the re-
¶ 39 marks from above, but ¶ we will refrain from providing details of the practical
calculations in this context.
Let us finally remark that the general theory of equations of the form
(56) in several variables is even less developed than in the case of a single
space dimension; recently, however, groundbreaking new results have been
announced by S. Bernstein (1938).

Literature
Bernstein, S. (1938) Limitation des modules des dérivées successives des solu-
tions des équations du type parabolique. – C. R. Acad. Sci. URSS, N. S.,
XVIII, 385–389.
Feller, W. (1936) Zur Theorie der stochastischen Prozesse (Existenz- und Ein-
deutigkeitssätze). – Math. Annalen CXIII, 113–160.f
Fréchet, M. (1938) Recherches théoriques modernes sur le calcul des probabilités.
Livre II (Traité du calcul des probabilités I, 3) – Paris, Gauthier–Villars,
315 pp.
f This is [Feller 1936c] of these Selecta.

Kepinski, S. (1905) Über die Differentialgleichung $\frac{\partial^2 z}{\partial x^2} + \frac{m+1}{x}\frac{\partial z}{\partial x} - \frac{n}{x}\frac{\partial z}{\partial y} = 0$. –
Math. Annalen LXI, 397–405.

Kolmogoroff, A. (1931) Über die analytischen Methoden in der
Wahrscheinlichkeitsrechnung. – Math. Annalen CIV, 415–458.ᵍ
—— (1935) Deviations from Hardy’s formula in partial isolation. – C. R. Acad.
Sci. URSS III (8), 129–132.h

—— (1936) Sulla teoria di Volterra della lotta per l’esistenza. – Giorn. Ist.
Ital. Attuari VII, 74–80.

Kolmogoroff, A., I. Petrovsky & N. Piscounoff (1937) Étude de l’équation de
la diffusion avec croissance de la quantité de matière et son application
à un problème biologique. – Bull. Univ. État Moscou, Sér. Int. Sect. A:
Math. Mécan. I, fasc 6, 1–25.ⁱ

Kostitzin, V. A. (1937) Biologie mathématique. – Paris, A. Colin, 215 pp.

Risser, R. (1932) Applications de la statistique à la démographie et à la
biologie. 3me Partie. (Traité du calcul des probabilités III, 3) – Paris,
Gauthier–Villars, 252 pp.

Volterra, V. (1931) Leçons sur la théorie mathématique de la lutte pour la vie.
(Cahiers scientifiques VII) – Paris, Gauthier–Villars, vi + 214 pp.

g On analytical methods in probability theory, in: A. N. Shiryayev (ed.): Selected Works

of A. N. Kolmogorov. Volume II: Probability Theory and Mathematical Statistics. Kluwer,


Dordrecht 1992, pp. 62–108.
h Reprinted in: A. N. Shiryayev (ed.): Selected Works of A. N. Kolmogorov. Volume II:

Probability Theory and Mathematical Statistics. Kluwer, Dordrecht 1992, pp. 179–181.
i Studies of the Diffusion with the Increasing Quantity of the Substance; Its Application

to a Biological Problem, in: O. A. Oleinik (ed.): I. G. Petrovsky: Selected Works. Part


II: Differential Equations and Probability Theory. Gordon and Breach, Amsterdam 1996,
pp. 106–132.

© Springer International Publishing Switzerland 2015 497
R.L. Schilling et al. (eds.), Selected Papers I,
498 Duke Mathematical Journal 5 (1939) 661–674
[Feller 1939b] — Selected Works of W. Feller, Volume 1 499
512 Fundamenta Mathematicae 32 (1939) 87–96
[Feller 1939c] — Selected Works of W. Feller, Volume 1 513
Translation of [Feller 1939c]

On the Existence of So-called Kollektivs ¶ 87
By Willy Feller (Stockholm)

1. Following the well-known foundations of probability theoryᵃ by von Mises
there have been many attempts to prove the existence of certain (numerical)
label sequencesᵇ which exhibit most general “irregularity”ᶜ.¹ Partially,
these investigations are ends in themselves, but most of them serve for an
axiomatic verification of the consistency of various models for kollektivs
which have emerged as restrictions of von Mises’ original definition. Yet,
only Copeland² and Wald obtained sufficiently general existence results which
may claim to be an axiomatisation of probability theory containing all essential
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break
in the original text, and the original pagination is shown in the margin. Footnotes indexed
by lowercase Roman letters contain editorial comments. We follow von Mises and do not
translate the German word Kollektiv but use kollektiv and its English plural kollektivs.
Feller uses P ⊂ γ to indicate that a point P is an element of the set γ; following modern
practice, we use P ∈ γ. We also use modern notation to indicate P = (P1 , P2 , . . .) and sets of
points {P1 , P2 , . . .}, respectively; Feller is not consistent in his notation and we have carefully
corrected the original.
1 Cf. the bibliography at the end of the paper.
2 We are mainly interested in the most general result by Copeland which is contained
in [6]. The theory of admissible numbers which was developed in other papers (not quoted
here) by Copeland is only a special case.
a Feller uses Wahrscheinlichkeitsrechnung which means literally “calculus of probability”.
Following modern custom we use “probability theory”, in German: Wahrscheinlichkeitstheorie.
b German original: Merkmal(-Zahlen)folgen
c The German original Regellosigkeitseigenschaften means, literally, the property to obey

no rules at all

ingredients of von Mises’ theory. Their approaches are closely related, but
Copeland’s results are only a special case of Wald’s results.
In connection with the so-called foundational questions of probability theory,
these approaches have recently been widely discussed, and the literature
on the existence of so-called kollektivs seems not yet to be in definite form.
Therefore it may be useful to highlight more clearly the essence of the exis-
tence results mentioned earlier. As it turns out, one needs to interpret Wald’s
¶ 88 (hence, a fortiori, Copeland’s) theorems in a set-theoretic way; ¶ without any
further efforts, modern theory provides not only an extremely simple and lucid
proof but also essential improvements.
A label-sequence is just a point in the infinite product space Π = π × π ×
π × . . . where π is the space of labels (i.e. any abstract space). Under suitable
regularity assumptions (that is σ-additivityd of the probability distribution in
π and measurability of the admissible “selection functions”) Wald’s existence
theorem appears as a far-reaching, but in view of the modern tools rather
simple, generalization of Borel’s theorem on the existence of normal numbers;
this theorem has already been interpreted in 1922 by Steinhaus [6] in a similar
way in an infinite product space (see also Łomnicki & Ulam [5]). Thus, the
theorem can easily be embedded into the modern theory of probability, e.g. in
Kolmogoroff’s [4] axiomatisation. Wald, however, does not make the regularity
assumptions mentioned above, seemingly leaving the common realm of proba-
bility theory. It turns out, however, that in this connection these assumptions
are not essential and that one can remove them by a simple device.
Concerning their basic ideas, the following considerations are closely related
to a note by Doob [1] who shows in an elegant way how one may understand the
impossibility of a gambling system in an infinite dimensional Euclidean space.
Doob already remarks in this note that his theorem (which, by the way, has
been proved with different methods) easily entails, among other results, the
existence of Copeland’s “admissible numbers”.
No. 2 contains some necessary preparations and a discussion of the results;
no. 3 contains the proof of the special case mentioned above, in no. 4 we remove
the restrictive regularity assumptions.

2. Let π be an arbitrary set, containing at least two elements, which we


will call, following probabilistic custom, “label space” and whose elements are
called labels. Φ is a field of subsets of π, and p(γ) is a set function defined
¶ 89 on Φ such that p(π) = 1. From now on we ¶ assume that there are countably
many sets ϕk , k = 1, 2, . . . , in Φ such that for any γ ∈ Φ

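$$p(\gamma) = \operatorname*{fin\,inf}_{\varphi_i \supset \gamma} p(\varphi_i) = \operatorname*{fin\,sup}_{\varphi_i \subset \gamma} p(\varphi_i). \tag{1}$$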

Wald defines an n-ary (n ≥ 1) selection function as a function fn (P1 , P2 , . . . , Pn )


which assigns to each ordered n-tuple P1 , . . . , Pn of elements from π the value 0

d Feller uses absolut additiv (absolutely additive). For details of this nomenclature, see

522 On the Existence of So-called Kollektivs


or 1. f0 always means 0 or 1. An infinite sequence f0 , f1 (P1 ), . . . , fn (P1 , . . . , Pn ),
. . . is said to be a selection rule A. For any sequence P = (P1 ,P2 , . . .) of elements
of π, the selection rule A produces a new sequence P ∗ = (Pr1 , Pr2 , . . .) by select-
ing all points Prk , r1 < r2 < . . ., such that

(2) frk −1 (P1 , . . . , Prk −1 ) = 1.

The sequence rk may, of course, be empty3 .


For a set γ ∈ Φ we denote by aN (γ; P, A) the number of indices rk with
k ≤ N such that Prk ∈ γ. The sequence P = (P1 , . . .) is said to be regular
with respect to the selection rule A if the sequence rk is either terminating or
satisfies
$$\lim_{N\to\infty} \frac{1}{N}\, a_N(\gamma; P, A) = p(\gamma) \tag{3}$$
for all γ ∈ Φ.
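The definitions of a selection rule and of the counting function a_N can be sketched on a finite stretch of a label sequence. This is only an illustration: the helper names `select`, `relative_frequency` and `after_a_one` are hypothetical, the sequence is a short fixed list rather than an infinite one, and the rule used is the die example of footnote 3 (select the trials that follow the outcome 1).

```python
from fractions import Fraction

def select(seq, rule):
    """Keep P_r whenever rule(P_1, ..., P_{r-1}) == 1, as in condition (2)."""
    return [x for i, x in enumerate(seq) if rule(seq[:i]) == 1]

def relative_frequency(selected, gamma, N):
    """a_N(gamma; P, A) / N, counted over the first N selected labels."""
    hits = sum(1 for x in selected[:N] if x in gamma)
    return Fraction(hits, N)

def after_a_one(prefix):
    """Selection rule of footnote 3: f_0 = 0, and f_n = 1 exactly when P_n = 1."""
    return 1 if prefix and prefix[-1] == 1 else 0

if __name__ == "__main__":
    rolls = [1, 3, 1, 1, 6, 2, 1, 5]
    chosen = select(rolls, after_a_one)
    print("selected subsequence:", chosen)
    print("relative frequency of {1} among them:",
          relative_frequency(chosen, {1}, len(chosen)))
```

Each decision uses only the labels already observed, which is what makes the rule admissible as a gambling system; regularity in the sense of (3) concerns the limit of these relative frequencies along the whole infinite sequence, which a finite sample can only suggest.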
Finally, if Σ denotes a set of selection rules, then P is a kollektiv with
respect to Σ and the set-function p(γ) defined on Φ if it is regular with respect
to all selection rules from Σ. The main result4 of Wald states: Given the
assumptions on p(γ) and if Σ is countable, then there are as many kollektivs
as there are points in the continuum 5 .
¶ Let us understand the sequences P = (P1 , . . .) as elements of the product ¶ 90
space Π = π × π × π × . . .. In order to obtain a measure theory in Π we assume,
at first, that p(γ) is continuous (in the sense of Kolmogoroff [3]): For any
sequence γ1 ⊃ γ2 ⊃ . . . of sets from Φ which decreases monotonically to ∅ we
have

(4) lim p(γi ) = 0.

According to a well-known extension theorem by Banach (cf. Kolmogoroff


[4], p. 16) we can extend p(γ) to a σ-additive set function defined on a σ-field
Φ.
3 Consider, for instance, rolling a die and “select” those trials which follow the outcome
“1”.
4 The other results of Wald discuss the practical construction of kollektivs and related

questions.
5 Copeland’s results (cf. footnote2 ) are a special case of this. First of all, he essentially

considers only Euclidean and finite spaces and σ-additive p(γ). The greatest restriction
concerns, however, the admissible selection functions. Only the following procedures are
considered:
a) Let s1 < s2 < . . . be any fixed sequence of numbers; one selects the subsequence
Ps1 , Ps2 , . . . from the sequence P1 , P2 , . . .. This means that the selection functions fn are in-
dependent of the Pi .
b) Moreover, it is required, that for any N -tuple of sets γ1 , . . . , γN in Φ the relative fre-
quency of those sub-strings PnN +1 , PnN +2 , . . . , P(n+1)N for which PnN +r ∈ γr tends to-

p. 14 of H. Hahn and A. Rosenthal: Set Functions. The Univ. of New Mexico Press,
Albuquerque 1948.

[Feller 1939c] Translation — Selected Works of W. Feller, Volume 1 523


Let λ1 , . . . , λn be arbitrary subsets of π; following Kolmogoroff we call the
set of points P = (P1 , . . .) in Π such that Pi ∈ λi , i = 1, . . . , n, a rank-n cylinder
set and write {λ1 × λ2 × . . . × λn }. If, in particular, λi ∈ Φ, then we set

(5) |{λ1 × λ2 × . . . × λn }| = p(λ1 ) · p(λ2 ) · . . . · p(λn ).

Łomnicki & Ulam [5] and Kolmogoroff [4], III, §4, have shown that this
set function can be extended to a σ-additive measure on Π; in particular, we
have |Π| = 1.
In this setting, a selection function fn (P1 , . . . , Pn ) is a particular two-valued
function defined on Π. At first we will only consider measurable selection rules.
Although this restriction is, in particular from the point of view of applied
probability theory, quite natural, we will do away with it as well as with (4).
We are going to show:
Under the assumptions on p(γ) almost all points in Π are regular with re-
spect to any arbitrary (but fixed) measurable selection rule A = {f0 , f1 (P1 ), . . .}.
Since Wald and Copeland allow only countably many selection rules, this result
contains their existence theorems.
¶ 91 ¶ Denote by A0 the selection rule for which we have identically fn ≡ 1, i.e.
which maps Π identically onto itself. The regular points with respect to A0
(which assigns to each set γ ∈ Φ the total frequency p(γ)) are the analogue of
Borel’s normal numbers; the latter are obtained if π consists of finitely many
points only (Steinhaus [6]). In this particular case, our theorem is of course
just the set-theoretic interpretation of the “strong law of large numbers”.
At the same time, this shows how to improve the theorem in a trivial
way. For this we have only to replace the requirement of normality by any
stricter assumption which also holds “almost surely”. For example, among
all admissible kollektivs there are some where the relative frequency of a set
γ ∈ Φ converges to p(γ) but stays always ≥ p(γ). It is, however, known that
for almost all points of Π the corresponding deviations oscillate about 0, and
one also has quite precise estimates for this. If we replace the requirement of
normality by a more stringent one in this vein, then it is still possible to adapt
the following proof and, doing so, to give a “more strict” or “more precise”
notion of a kollektiv. It is, of course, a mere matter of taste which assumptions
are required to hold for a kollektiv or are thought to be “indispensable” for
the foundations of probability theory6 , respectively. In the sequel I will not

wards the limit p(γ1 )p(γ2 ) . . . p(γN ). As one can see, Copeland’s assumptions are weaker
than Wald’s, even if we admit for the latter only such selection rules which depend only on
boundedly many of the Pi .
6 Often, in particular by Fréchet [2], it is pointed out that von Mises’ notion of kollektiv

excludes, a priori, only certain events of probability zero from the consideration in an arbi-
trary way. At least, this is done by von Mises in a canonical way since his starting point is
the description of the most elementary experience, i.e. the experience of the excluded gam-
bling system. Wald and Copeland, however, completely alter the system of selection rules,
and it is not mentioned at all, which events should actually be excluded from the theory.

dwell on this point since the interest in such considerations seems to be more
than doubtful for the foundations of probability theory7 .
¶ In its present form, our theorem depends essentially on the restrictive ¶ 92
assumption (4) and the restriction on measurable selection functions. Using
non-measurable functions it is easy to construct examples such that the outer
measure of the set of points which fail to be regular for some selection rule
becomes 1. In the currently adopted probabilistic practice such cases are
a priori excluded. In the particular case considered here, however, it turns
out that the measure in Π can be modified in such a way that all asymptotic
properties remain valid while all selection functions become measurable. Using
this, by the way rather arbitrary, measure, our theorem remains valid even in
the most general case (i.e. even without the continuity assumption (4)).

3. As we have announced we require throughout this section the continuity


assumption (4) entailing that the prescription (5) yields a unique measure
theory in Π. Let A = {f0 , f1 (P1 ), f2 (P1 , P2 ), . . .} denote a fixed selection rule
with fn being measurable point functions in Π.
In order to show that P = (P1 , P2 , . . .) is regular with respect to A, it
suffices, because of (1), to prove that (3) holds if γ is any of the basic sets
ϕi of Φ. Thus, it is enough to prove: For any fixed set γ ∈ Φ (and the given
selection rule A) the relation (3) holds for almost all points P in Π.
The action of A on P yields, say, the sequence Pr1 , Pr2 . . . (cf. (2)). Denote
by an,k the set of all points of Π for which the sequence ri contains at least n
elements and such that among the points Pr1 , . . . , Prn exactly k belong to the
set γ (if k < 0 or n < k we have to interpret an,k , of course, as the empty set;
a similar convention is used in the sequel). We want to show that8

$$|a_{n,k}| \le \binom{n}{k}\, \{p(\gamma)\}^k\, \{1 - p(\gamma)\}^{n-k} \tag{6}$$
holds.
¶ The proof is by induction. If n = 1, let $\beta_1^t$ denote the set of points ¶ 93
for which $r_1 = t$. By $a_{1,1}^t$ we denote the subset of $\beta_1^t$ such that
$P_t = P_{r_1} \in \gamma$. Thus, $\beta_1^t$ is the intersection of those (t − 2) sets, where
$f_i = 0$ for i = 1, 2, . . . , t − 2 and the set where $f_{t-1} = 1$, which means that
$\beta_1^t$ is measurable. If $Q = (Q_1, Q_2, \dots)$ is contained in $\beta_1^t$, then $\beta_1^t$ contains
all points of the form $(Q_1, \dots, Q_{t-1}, X_1, X_2, \dots)$ and we see that $a_{1,1}^t$ is the
intersection of $\beta_1^t$ with the rank-t cylinder set {π × π × . . . × π × γ}. Thus, we have
$|a_{1,1}^t| = |\beta_1^t| \cdot p(\gamma)$,
Therefore, I think that these theories, even regarding their content, are only loosely con-
nected with von Mises’ system.
7 In this connection one may refer to the important and much deeper irregularity consider-

ations which are connected with the ergodic theorems of statistical mechanics; cf. E. Hopf [3].
8 If A = A (cf. 2) one has, of course, equality in (6). In general, this is not true, simply
0
because of those points for which the sequence ri terminates. The binomial law (6) does,
in general, not even give the relative magnitude of the |an,k | among themselves, and any
single selection function may well strongly favour certain sets.

and

$$|a_{1,1}| = \sum_t |a_{1,1}^t| = p(\gamma) \sum_t |\beta_1^t| \le p(\gamma).$$

Symmetry considerations yield, of course, also $|a_{1,0}| \le 1 - p(\gamma)$.

In general, denote by $\beta_{n,k}^t$ the set of all points P of Π satisfying $r_n = t$ and
such that among the points $P_{r_1}, \dots, P_{r_{n-1}}$ exactly k are from the set γ. For
fixed n and k these sets are mutually disjoint, and we have

$$\sum_t \beta_{n,k}^t \subset a_{n-1,k}.^{\,g} \tag{7}$$
t

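Let $a_{n,k}^t$ and $\bar{a}_{n,k}^t$ be subsets of $\beta_{n,k}^t$ such that $P_{r_n} \in \gamma$ and $P_{r_n} \notin \gamma$,
respectively. Again we have $|a_{n,k}^t| = p(\gamma) \cdot |\beta_{n,k}^t|$ and
$|\bar{a}_{n,k}^t| = \{1 - p(\gamma)\} \cdot |\beta_{n,k}^t|$. Since

$$a_{n,k} = \sum_t a_{n,k-1}^t + \sum_t \bar{a}_{n,k}^t,$$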

we see inductively, using (7), that


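$$\begin{aligned}
|a_{n,k}| &\le p(\gamma)\,|a_{n-1,k-1}| + \{1 - p(\gamma)\}\,|a_{n-1,k}| \\
&\le \left\{ \binom{n-1}{k-1} + \binom{n-1}{k} \right\} \{p(\gamma)\}^k \{1 - p(\gamma)\}^{n-k}. \qquad \text{q.e.d.}^{h}
\end{aligned}$$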
From (6) we conclude using a well-known estimate for the binomial coefficients
– Cantelli’s strong law of large numbers – that for each fixed ε > 0 the
measure of the set

$$\sum_{n \ge N} \; \sum_{\left|\frac{k}{n} - p(\gamma)\right| > \varepsilon} a_{n,k}$$

tends to 0 as N → ∞. The measure of those points for which the upper or
lower limit of $\frac{1}{N} a_N(\gamma; P, A)$ deviates by more than ε from p(γ) is, therefore, 0,
and this shows the assertion.
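One way to make the tail estimate concrete is sketched below. It is an illustration, not the estimate Feller has in mind: it substitutes Hoeffding's inequality, a later and cruder bound, for the classical estimate of the binomial terms, and writes p for p(γ) and |·| for the measure in Π.

```latex
% Assumption: Hoeffding's inequality is used in place of the classical estimate.
\begin{align*}
  \sum_{|k/n - p| > \varepsilon} \binom{n}{k} p^k (1-p)^{n-k}
    &\le 2\, e^{-2 n \varepsilon^2}
    && \text{(Hoeffding)},\\
  \Bigl|\, \sum_{n \ge N} \; \sum_{|k/n - p| > \varepsilon} a_{n,k} \Bigr|
    &\le \sum_{n \ge N} 2\, e^{-2 n \varepsilon^2}
     = \frac{2\, e^{-2 N \varepsilon^2}}{1 - e^{-2 \varepsilon^2}}
    && \text{(by (6) and a geometric series)},
\end{align*}
% and the right-hand side tends to 0 as N \to \infty for every fixed \varepsilon > 0.
```

The point of the bound (6) is exactly this: although a selection rule may distort the relative sizes of the sets a_{n,k}, each of them is still dominated by the corresponding binomial term, so the usual summable tail bounds apply unchanged.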

¶ 94 ¶ 4. We will now pass on to the case of arbitrary selection functions


discarding, at the same time, the assumption (4). First, we want to construct
a measure in Π for which all selection functions become measurable. In order
to do so, we construct for any natural number n a non-negative and σ-additive
set function pn (γ) which is defined on all subsets of π, and for which

(8) pn (ϕi ) = p(ϕi ) for i = 1, 2, . . . , n and pn (π) = 1.

Such functions do always exist. The simplest construction is probably the


following. Using the intersections ϕi1 ϕi2 · · · ϕin we easily obtain at most $2^n$
mutually disjoint sets ψ1 , . . . , ψN , ($N \le 2^n$), such that each ϕi , i ≤ n, can be
g The symbol $\sum_t$ indicates the union of disjoint sets.
h The misprint $\binom{k-1}{n-1} + \binom{k}{n-1}$ of the original text has been corrected.

written as the union of certain ψj . Since Φ is a field, the sets ψj are again in
Φ. For each i ≤ N we pick a point Qi ∈ ψi and set

pn ({Qi }) = p(ψi );

for all subsets γ ⊂ π \ {Q1 } \ {Q2 } \ . . . \ {QN } we define pn (γ) = 0. Because of


the additivity of p(γ), it is clear that pn (γ) satisfies the above assumptions.
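The construction of the discrete set function p_n can be sketched for a toy label space. The sketch rests on assumptions that are not in the text: π is taken to be a small finite set, p is given by point masses, and the names `discretise` and `measure` are hypothetical. Points with the same membership pattern in ϕ_1, ..., ϕ_n form one class ψ_j, and the whole mass p(ψ_j) is placed on a single representative Q_j.

```python
from fractions import Fraction

def discretise(universe, p_point, phis):
    """Return the atoms of p_n as a dict {Q_j: p(psi_j)}."""
    classes = {}
    for x in universe:
        signature = tuple(x in phi for phi in phis)   # which of the phi_i contain x
        classes.setdefault(signature, []).append(x)
    # The first point of each class psi_j serves as its representative Q_j.
    return {psi[0]: sum(p_point[x] for x in psi) for psi in classes.values()}

def measure(weights, gamma):
    """p_n(gamma) for an arbitrary subset gamma of the universe."""
    return sum((w for q, w in weights.items() if q in gamma), Fraction(0))

if __name__ == "__main__":
    universe = range(1, 7)                            # faces of a die
    p_point = {x: Fraction(1, 6) for x in universe}
    phis = [{1, 2, 3}, {2, 4, 6}]
    weights = discretise(universe, p_point, phis)
    print("atoms Q_j and their masses:", weights)
    for phi in phis:
        print(sorted(phi), "has p_n =", measure(weights, phi))
```

Because p_n lives on finitely many points, it is defined and σ-additive on every subset of the label space, while it still agrees with p on the chosen sets ϕ_1, ..., ϕ_n.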
In order to construct a measure theory in Π we define for the cylinder set
{λ1 × λ2 × . . . × λn } (cf. no. 2, p. 90)

|{λ1 × λ2 × . . . × λn }| = p1 (λ1 ) · p2 (λ2 ) · . . . · pn (λn ).

According to Kolmogoroff’s measure theory for infinite dimensional spaces
([4], III, §4) we can extend this set function uniquely to a measure in Π. In
the rank-n cylinder set {π × π × . . . × π} there are at most
$2 \times 2^2 \times 2^3 \times \dots \times 2^n$
points whose union has measure 1, such that all subsets of this cylinder set
become measurable. Therefore, all selection functions are measurable.
Now it is easy to show: If γ denotes any of the sets ϕi and A is a fixed
selection rulei , then (3) holds for almost all points in Π. Indeed, if we observe
that pn (ϕi ) = p(ϕi ) for all n ≥ i, we see that the proof in no. 3 carries over
almost literally, if we leave out all indices rk < i, which is not essential for
(3). By (1) we immediately conclude that for our definition of a measure in
Π almost all points are regular for any fixed selection rulei . This contains the
most general existence theorem of Wald.

¶ Literature. ¶ 95

a) For the construction of kollektivs (and related topics).9

A. H. Copeland:2
[1] Admissible numbers in the theory of probability. Amer. Journ. Math. 50
(1928). pp. 535–552.l

9 For the original approach of von Mises (where set-theoretic existence questions do not

play any role) we refer to von Mises: Wahrscheinlichkeitsrechnung. Leipzig and Vienna,
1931.j
Related to the notion of kollektiv, but of essentially different kind is Tornier’s interpre-
tation of probabilities using frequencies of matrices. For this we refer to E. Tornier: Wahr-
i In the original text: selection function
l Page numbers missing in the original text.
j There seems to be no English translation, but Hilda Geiringer’s edition of von Mises’

lectures: Mathematical theory of probability and statistics. Academic Press, New York and
London 1964, may serve as a substitute. Note, however, that the 1964 lectures include
many developments since 1931; in particular the treatment of kollektivs differs from the
monograph of 1931. For the history and role of this book, see also the lucid reviews by J.L.
Doob in Mathematical Reviews [MR0178486 (31 #2743)] and R. Theodorescu in Zentralblatt
[Zbl 0132.12303].

[2] Admissible numbers in the theory of geometrical probability. Ibid. 53
(1931). pp. 153–162.
[3] The theory of probability from the point of view of admissible numbers.
Ann. Mathem. Statistics, 3, (1932), pp. 143–156.
[4] Point set theory applied to the random selection of the digits of an admis-
sible number. Amer. Journ. Math. 58 (1936), pp. 181–192.
[5] Admissible numbers. Rev. Fac. Sci. Univ. Istanbul, 1 (1936), pp. 52–57.
[6] Consistency of the conditions determining Kollektivs. Trans. Amer. Math.
Soc. 42 (1937), pp. 333–357.
K. Dörge:

[7] Zu der von R. v. Mises gegebenen Begründung der Wahrscheinlichkeits-


rechnung. Math. Zeitschr. 32 (1932), pp. 232–258.
R. Iglisch:

[8] Zum Aufbau der Wahrscheinlichkeitsrechnung. Math. Ann. 107 (1933),


pp. 471–484.

R. v. Mises:

[9] Über Zahlenfolgen, die ein kollektiv-ähnliches Verhalten zeigen. Math.


Ann. 108 (1933), pp. 757–772.

H. Reichenbach:

[10] Axiomatik der Wahrscheinlichkeitsrechnung. Math. Zeitschr. 34 (1932),


pp. 568–619.

E. Tornier:

[11] Bemerkungen zu der Arbeit von Herrn Iglisch: Zum Aufbau der Wahr-
scheinlichkeitsrechnung. Math. Ann. 108 (1933), pp. 319–320.

J. A. Ville:

[12] Sur la notion de collectif. C. R. Acad. Sci. Paris, 203 (1936), pp. 26–27.
¶ 96 ¶ A. Wald:10

scheinlichkeitsrechnung und allgemeine Integrationstheorie. Leipzig and Berlin 1936, or E.


Kamke: Wahrscheinlichkeitstheorie, Leipzig 1932.
10 A recent talk on the notion of a kollektiv, presented at the colloquium on probability

theory organized 1937 by the University of Geneva will soon appear in the Actualités Sci-
entifiques, Hermann, Paris. m
m A. Wald: Die Widerspruchsfreiheit des Kollektivbegriffes (The Consistency of the No-
tion of Kollektiv) In: P. Cantelli, W. Feller, M. Fréchet, R. de Misès, J.F. Steffensen et A.
Wald: Colloque consacré à la Théorie des Probabilités. Deuxième Partie: Les Fondements
du Calcul des Probabilités. Hermann, Actualités Scientifiques et Industrielles 735, Paris
1938, pp. 79–99.

[13] Sur la notion de collectif dans le calcul des probabilités. C. R. Acad. Sci.
Paris, 202 (1936), pp. 180–183.
[14] Die Widerspruchsfreiheit des Kollektivbegriffs der Wahrscheinlichkeits-
rechnung. Ergebn. math. Kolloqu. 8, (1937), pp. 38–72.n

b) Further literature.

J. L. Doob:
[1] Note on Probability. Ann. of Math. 37 (1936), pp. 363–367.
M. Fréchet:

[2] Recherches théoriques modernes sur le calcul des probabilités. (in Borel’s
Traitéo 1, Fasc. 3), livre 1, Paris 1937
E. Hopf:
[3] On causality, statistics and probability. Journ. Math. Phys. of Massachusetts
Inst. Techn. 13 (1934), pp. 50–102.

A. Kolmogoroff:

[4] Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebn. Math. 2.3. Berlin


1933.p
Z. Łomnicki & S. Ulam:

[5] Sur la théorie de la mesure dans les espaces combinatoires et son application
au calcul des probabilités I. Fund. Math. 23 (1934), pp. 237–278.

H. Steinhaus:
[6] Les probabilités dénombrables et leur rapport à la théorie de la mesure.
Fund. Math. 4 (1923), pp. 286–310.

n Reprinted (with English commentaries) in: E. Dierker and K. Sigmund (eds.):
Karl Menger: Ergebnisse eines Mathematischen Kolloquiums (German and English
Edition). Springer, Wien 1998.
o É. Borel (ed.): Traité du Calcul des Probabilités et de ses Applications.
Gauthier-Villars, Paris 1924–1965, some volumes – including Fréchet’s
contribution – have second editions.
p English translation: A. N. Kolmogorov: Foundations of the theory of
probability. Translation edited by Nathan Morrison, with an added bibliography
by A. T. Bharucha-Reid. Chelsea Publishing Co., New York 1956.

[Feller 1939c] Translation — Selected Works of W. Feller, Volume 1 529


532 RAD – Bull. Int. Acad. Sci. Yougoslave 32 (1939) 106–113
[Feller 1939d] — Selected Works of W. Feller, Volume 1 533
540 Transactions AMS 48 (1940) 488–515
[Feller 1940c] — Selected Works of W. Feller, Volume 1 541
568 Annals of Mathematical Statistics 12 (1941) 243–267
[Feller 1941a] — Selected Works of W. Feller, Volume 1 569
594 Duke Mathematical Journal 9 (1942) 885–892
[Feller 1942] — Selected Works of W. Feller, Volume 1 595
602 Transactions AMS 54 (1943) 361–372
[Feller 1943b] — Selected Works of W. Feller, Volume 1 603
614 Transactions AMS 54 (1943) 373–402
[Feller 1943c] — Selected Works of W. Feller, Volume 1 615
644 Annals of Mathematical Statistics 14 (1943) 389–400
[Feller 1943d] — Selected Works of W. Feller, Volume 1 645
656 Annals of Mathematical Statistics 16 (1945) 319–329
[Feller 1945a] — Selected Works of W. Feller, Volume 1 657
668 Bulletin AMS 51 (1945) 800–832
[Feller 1945b] — Selected Works of W. Feller, Volume 1 669
702 Bulletin AMS 51 (1945) 583–598
[Feller 1945c] — Selected Works of W. Feller, Volume 1 703
718 Annals of Mathematical Statistics 16 (1945) 301–304
[Feller 1945d] — Selected Works of W. Feller, Volume 1 719
722 American Journal of Mathematics 68:2 (1946) 257–262
[Feller 1946a] — Selected Works of W. Feller, Volume 1 723
728 Annals of Mathematics 47 (1946) 631–638
[Feller 1946b] — Selected Works of W. Feller, Volume 1 729
736 Annals of Mathematical Statistics 19 (1948) 177–189
[Feller 1948a] — Selected Works of W. Feller, Volume 1 737
Erratum: Ann. Math. Stat. 21, 301 – 302 (1950)

On Probability Problems in the Theory of Counters   ¶ 105

1. Introduction. Probability problems in the theory of counters have been
treated by many authors, but usually laborious special methods are employed.
Now the same type of problems appears in various disguises, in other appli-
cations. It is, therefore, methodologically important to notice that no special
technique is required. With a proper formulation all problems concerning
a single counter1 reduce to special instances of the theory of summation of
random variables. The familiar tools of the operational calculus then yield
the results in a simpler and often more precise manner than do the specially
devised methods.
The simplest problems can be described as follows. A counter is supposed
to register ‘random events’ such that (i) the probability of an event in any small
time-interval of length $\Delta t$ is, independently of previous events,
$a\,\Delta t + o(\Delta t)$, where $a$ is a constant;2 (ii) the probability of
more than one event is $o(\Delta t)$. This is equivalent to saying that the
probability that the time $T_0$ from an arbitrary moment to the next event will
not exceed $t$ is given by ¶ 106

(1)    $F_0(t) = 1 - e^{-at} \qquad (t > 0).$

Alternatively one may say that the probability of exactly $k$ events during any
time interval of length $t$ is given by the Poisson law

(2)    $(at)^k e^{-at}/k!.$

Typeset by René L. Schilling. The symbol ¶ indicates a page break in the original text,
and the original pagination is indicated in the margin. Originally underlined text appears
in italics.
1 Problems concerning coincidences in several counters are more delicate and
mostly unsolved.
2 The assumption that a is constant is a great simplification which is usually
introduced. The method outlined in the sequel applies directly also if a
depends on the number of
Due to the ‘resolving time’ the counter is unable to register all events. It
is customary to treat two ideal cases which are both approached by existing
mechanisms while most counters represent a compromise between the two
types.3
Type I. After each registration the counter is locked for a constant time τ .
An event is registered if, and only if, no registration has taken place during a
time τ preceding it. Mathematically, this type is much simpler than
Type II. The counter registers an event if, and only if, no event has occurred
during the preceding time interval of length τ . Here an event occurring at a
moment when the counter is locked prolongs the inoperative period. In theory,
the counter can remain locked indefinitely.
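The two locking rules can be made concrete with a short simulation. The following Python sketch is an illustration only and is not part of the original paper: the function names are hypothetical, the events are generated as a Poisson stream with exponential gaps as in (1), and the counter is assumed to be unlocked at time 0.

```python
import random

def poisson_event_times(rate, horizon, rng):
    """Event times of a Poisson stream with the given rate on [0, horizon]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gap between events, as in (1)
        if t > horizon:
            return times
        times.append(t)

def register_type1(events, tau):
    """Type I: an event is registered iff no REGISTRATION occurred
    during the preceding time tau."""
    registered, last_registration = [], None
    for t in events:
        if last_registration is None or t - last_registration >= tau:
            registered.append(t)
            last_registration = t
    return registered

def register_type2(events, tau):
    """Type II: an event is registered iff no EVENT (registered or not)
    occurred during the preceding time tau."""
    registered, last_event = [], None
    for t in events:
        if last_event is None or t - last_event >= tau:
            registered.append(t)
        last_event = t  # every event, registered or not, prolongs the lock
    return registered
```

Every Type II registration is also a Type I registration for the same event stream, so a Type II counter never registers more than a Type I counter with the same resolving time. Over a long horizon the registration rates should come out close to $a/(1+a\tau)$ and $ae^{-a\tau}$ respectively, the leading terms of (29) and (45).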
Type I has been recently treated by Gnedenko (1941) who has obtained
an expression for the average number M (t) of registrations. Subsequently
Kurbatov and Mann (1945) obtained a slightly simpler expression for M (t)
starting from an integral equation. They also obtained an estimate for the
difference between M (t) and its steady-state approximation. This estimate
has subsequently been improved by Mann (1946).
More far-reaching results have been obtained for Type II counters ¶ 107 by
Levert and Scheen (1943) and Kosten (1943). They determined not only the average
$M(t)$, but also the variance $B(t)$ of the number of registrations. Only the
latter enables one to verify theoretical assumptions experimentally.4 For
example, for random events distributed according to (2) one has $M(t) = at$,
$B(t) = at$. From the fact that for cosmic rays the ratio $B/M$ turned out to be
different from one, it has been concluded that cosmic rays are not random
events. Levert and Scheen rightly pointed out that this conclusion is erroneous
and that for actual counters the ratio $B/M$ may be less than $1/2$.
In section 2 we shall outline a general method which applies to both types
and to more general cases. Special formulas relating to Type I will be found
in section 3, for Type II in section 5. In section 4 a method is outlined for
estimating the error committed by replacing M (t) or B(t) by their asymptotic
limit. The method applies generally to similar integral equations. It gives, in
an easy way, a simpler and sharper result than the estimates due to Kurbatov
and Mann (1945) and Mann (1946).

2 (continued) preceding events. Physically and formally such a dependence is
equivalent to a dependence of a on time, provided this dependence is a
continuous one.
3 This information is taken from the comprehensive review by Maier-Leibnitz
(1942) where further details and references to previous work are found.
4 Some cruder tests have been occasionally used based, e.g., on the average
number of ‘close-by’ registrations. Such tests require a more elaborate theory.

Courant Anniversary Volume (1948) pp. 105–115

2. General Theory. In the sequel, boldface capitals will denote random
variables. [Comment (ed.): Both here and in the original random variables
are just normal capitals.] Denote by $T_0$ the time from the beginning (when
the counter is not locked) to the first registration, and generally by $T_k$
($k \ge 0$) the time between registrations number $k - 1$ and $k$. The $T_k$ are
mutually independent random variables and

(3)    $S_k = T_0 + T_1 + \dots + T_{k-1}$

is the time up to the $k$-th registration. Let $N$ be the number of
registrations during time $t$. Clearly ¶ 108

(4)    $p_k \equiv \Pr\{N = k\} = \Pr\{S_{k-1} \le t\} - \Pr\{S_k \le t\}.$

The variables $T_k$, $k \ge 1$, have the common distribution function

(5)    $\Pr\{T_k \le t\} = F(t)$

which will be determined later. The distribution of $T_0$ is given by (1). The
distributions $F_n(t) = \Pr\{S_n \le t\}$ are obtained from the familiar
relations

(6)    $F_{n+1}(t) = \int_0^t F_n(t - x)\,dF(x).$

The actual integrations, as we shall see, need not be performed. From (4) we
see that

(7)    $p_k = F_{k-1}(t) - F_k(t).$

Hence the mean number of registrations $M(t) = \sum k\,p_k$ is given by

(8)    $M(t) = \sum_{k=0}^{\infty} F_k(t).$

As will be seen later on, for Type I counters Fk (t) can be written down im-
mediately, and even Type II requires only routine computations. However,
even if this were not so, the central limit theorem would give us a satisfactory
approximation to Fk (t), and therefore, both to pk and M (t).
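As a sanity check of (7) and (8), consider the simplest case in which there is no dead time at all, so that every waiting time has the distribution (1). The Python sketch below is an added illustration with hypothetical function names; it reads $F_k(t)$ as the probability that registration number $k+1$ has occurred by time $t$, which is the reading under which (7) and (8) hold, and it truncates the infinite series at a finite number of terms.

```python
import math

def arrival_cdfs(t, a, terms):
    """F_0(t), ..., F_{terms-1}(t) when every waiting time is exponential
    with rate a (no dead time): F_k(t) = P(at least k+1 events in [0, t])."""
    lam = a * t
    pmf = math.exp(-lam)  # probability of exactly 0 events, from (2)
    cdf = 0.0
    result = []
    for k in range(terms):
        cdf += pmf                # now P(at most k events)
        result.append(1.0 - cdf)  # F_k(t)
        pmf *= lam / (k + 1)      # probability of exactly k+1 events
    return result

def mean_registrations(t, a, terms=200):
    """Partial sum of (8): M(t) = sum over k >= 0 of F_k(t)."""
    return sum(arrival_cdfs(t, a, terms))

def registration_probabilities(t, a, terms=200):
    """p_k for k = 1, ..., terms-1 from (7): p_k = F_{k-1}(t) - F_k(t)."""
    F = arrival_cdfs(t, a, terms)
    return [F[k - 1] - F[k] for k in range(1, terms)]
```

In this special case the sum (8) should reproduce $M(t) = at$ and the differences (7) should reproduce the Poisson probabilities (2), up to truncation and rounding.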
It is preferable to use the operational calculus which enables us to describe
the asymptotic behavior of $M(t)$ even in more general cases where an exact
formula is difficult to obtain. We introduce the Laplace transforms

(9)    $\varphi(s) = \int_0^\infty e^{-st}\,dF(t), \qquad \varphi_k(s) = \int_0^\infty e^{-st}\,dF_k(t),$

and

(10)    $\mu(s) = \int_0^\infty M(t)\,e^{-st}\,dt.$

¶ 109 Equation (6) is equivalent to

(11)    $\varphi_k(s) = \varphi(s)\,\varphi_{k-1}(s)$;

[Feller 1948c] — Selected Works of W. Feller, Volume 1 753


accordingly, taking the transforms in (8) we have only to sum a geometric
series and obtain5

(12)    $\mu(s) = \dfrac{\varphi_0(s)}{s\{1 - \varphi(s)\}}.$

In our examples it is easy to derive from (12) an explicit representation for
$M(t)$. For a general theory it is more important to derive directly the
asymptotic behavior of $M(t)$. For that purpose it suffices to expand
$\varphi_0(s)$ and $\varphi(s)$ according to powers of $s$ and determine in (12)
the coefficient of $s^{-2}$ and, if one wants to luxuriate, that of $s^{-1}$.
Let

(13)    $m = \int_0^\infty t\,dF(t), \qquad \sigma^2 = \int_0^\infty (t - m)^2\,dF(t) = \int_0^\infty t^2\,dF(t) - m^2.$

Then $m$ is the average time between two registrations and $\sigma^2$ the
corresponding variance. From (9) we see that

(14)    $\varphi(s) = 1 - ms + \tfrac12(\sigma^2 + m^2)s^2 + \dots.$

For $\varphi_0(s)$ we have similarly

(15)    $\varphi_0(s) = 1 - m_0 s + \dots$ with $m_0 = \int_0^\infty t\,dF_0(t)$;

$m_0$ is the average of $T_0$ (in the case (2) we have $m_0 = 1/a$).
Substituting from (14) and (15) into (12) we have ¶ 110

(16)    $\mu(s) = \dfrac{1}{ms^2} + \left\{\dfrac12 + \dfrac{\sigma^2}{2m^2} - \dfrac{m_0}{m}\right\}\dfrac{1}{s} + \cdots$

This equation permits the immediate ‘Tauberian’ conclusion that the expected
number of registrations, $M(t)$, satisfies the asymptotic relation6

(17)    $M(t) \sim \dfrac{t}{m} + \dfrac12 + \dfrac12\,\dfrac{\sigma^2}{m^2} - \dfrac{m_0}{m} + o(t).$
It may also be remarked that $M(t)$ satisfies the integral equation

(18)    $M(t) = F_0(t) + \int_0^t M(t - x)\,dF(x)$;

this follows either from (8) or (12) using (6).

5 The factor $s$ in the denominator is due to the fact that $M(t)$ appears in
(10) as counterpart to $dF(t)$ in (9).
6 The formal justification of this conclusion (and the corresponding one for
$B(t)$) is in the present case not difficult. It may be remarked that the
integral equations (18) and (23)
Similarly, we obtain from (7) for the variance of $N$

(19)    $B(t) \equiv \sum k^2 p_k - M^2(t) = D(t) - M(t) - M^2(t)$

where for simplicity we put

(20)    $D(t) = 2\sum_{k=1}^{\infty} k\,F_{k-1}(t).$

For the Laplace transform we find therefore

(21)    $\delta(s) = \int_0^\infty e^{-st} D(t)\,dt = \dfrac{2\varphi_0(s)}{s\{1 - \varphi(s)\}^2}.$

As before, we conclude that the variance $B(t)$ of the number $N$ of
registrations satisfies the asymptotic relation

(22)    $B(t) \sim \sigma^2\,t\,m^{-3}.$

¶ 111 Incidentally, $D(t)$ is seen to satisfy the integral equation

(23)    $D(t) = 2M(t) + \int_0^t D(t - x)\,dF(x).$
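The asymptotic relations (17) and (22) need only the three moments $m$, $\sigma^2$ and $m_0$, so they are easy to evaluate numerically. The following Python sketch is an added illustration (the function names are hypothetical); it returns the right-hand side of (17) without its remainder term and the right-hand side of (22).

```python
def asymptotic_mean(t, m, sigma2, m0):
    """Right-hand side of (17) without the remainder term.

    m      -- mean time between two registrations
    sigma2 -- variance of the time between two registrations
    m0     -- mean time from the start to the first registration
    """
    return t / m + 0.5 + 0.5 * sigma2 / m**2 - m0 / m

def asymptotic_variance(t, m, sigma2):
    """Right-hand side of (22): sigma^2 * t / m^3."""
    return sigma2 * t / m**3
```

For events without any dead time one has $m = m_0 = 1/a$ and $\sigma^2 = 1/a^2$, and both functions then return $at$, in agreement with the values $M(t) = at$, $B(t) = at$ quoted in the introduction for the Poisson law (2).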

3. The Type I. $F_0(t)$ is given by (1), while, by definition,

(24)    $F(t) = \begin{cases} 0 & \text{for } t \le \tau \\ 1 - e^{-a(t-\tau)} & \text{for } t \ge \tau. \end{cases}$

Here $F(t)$ differs from $F_0(t)$ only by a change of origin. Now, if the
integrations (6) were applied to $F_0(t)$ instead of to $F(t)$, the derivative
of $F_n(t)$ would be given by the Poisson expression (2). We conclude that

(25)    $\dfrac{d}{dt}F_k(t) = \begin{cases} 0 & \text{if } t \le k\tau \\ \dfrac{a^{k+1}\{t - k\tau\}^k}{k!}\,e^{-a(t-k\tau)} & \text{if } t \ge k\tau. \end{cases}$

The same result follows operationally from

(26)    $\varphi(s) = \dfrac{a e^{-\tau s}}{a + s}, \qquad \varphi_n(s) = \dfrac{a^{n+1} e^{-n\tau s}}{(a + s)^{n+1}}.$
6 (continued) are of the type familiar in the so-called renewal theory. Within
the framework of this general theory the passage from (16) to the asymptotic
expansion (17) (and similarly for $B(t)$) has been rigorously justified (Feller
(1941) and, with more refinements, Täcklind (1945)).



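Substituting from (25) into (8) (or, alternatively, expanding

(27)    $\mu(s) = \dfrac{a}{s(s + a - ae^{-\tau s})}$

according to powers of $e^{-\tau s}$) one obtains finally

(28)    $M(t) = \left[\dfrac{t}{\tau}\right] + 1 - \sum_{\nu=0}^{[t/\tau]} \sum_{\rho=0}^{\nu} \dfrac{a^\rho (t - \nu\tau)^\rho}{\rho!}\,e^{-a(t-\nu\tau)} = \sum_{\nu=0}^{[t/\tau]} \sum_{\rho=\nu+1}^{\infty} \dfrac{a^\rho (t - \nu\tau)^\rho}{\rho!}\,e^{-a(t-\nu\tau)}.$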

Here $[t/\tau]$ stands for the greatest integer not exceeding $t/\tau$. Either
of the expressions (28) is simpler than the one obtained by Kurbatov and Mann.
(After performing the integrations the latter can ¶ 112 be reduced to (28),
however. The identity of Gnedenko’s solution with (28) is more hidden.)
Finally (17) and (22) provide the asymptotic expansions for the Type I:

(29)    $M(t) \sim \dfrac{at}{1 + a\tau} + \dfrac{a^2\tau^2}{2(1 + a\tau)^2}$

(30)    $B(t) \sim \dfrac{at}{(1 + a\tau)^3}.$

The leading term in (29) is familiar. The case of Levert and Scheen corresponds
approximately to $\tau = .08$ sec., $a > 3$. Then $B/M \approx 2/3$ instead of 1
as would be the case with ‘random events.’
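The exact expression (28) and the asymptotic expansion (29) can be compared numerically. The Python sketch below is an added illustration with hypothetical function names. It evaluates the first form of (28), reading each inner sum as a Poisson probability, and it is meant for moderate values of $at$ only, since $e^{-a(t-\nu\tau)}$ underflows in floating point once $a(t-\nu\tau)$ exceeds roughly 700.

```python
import math

def mean_type1_exact(t, a, tau):
    """Mean number of registrations of a Type I counter, first form of (28)."""
    total = 0.0
    for nu in range(int(math.floor(t / tau)) + 1):
        lam = max(a * (t - nu * tau), 0.0)  # clamp rounding noise at t = nu*tau
        pmf = math.exp(-lam)                # term rho = 0 of the inner sum
        cdf = 0.0
        for rho in range(nu + 1):
            cdf += pmf                      # sum over rho <= nu
            pmf *= lam / (rho + 1)
        total += 1.0 - cdf                  # contributes 1 - (inner sum)
    return total

def mean_type1_asymptotic(t, a, tau):
    """Right-hand side of (29)."""
    x = a * tau
    return a * t / (1.0 + x) + x * x / (2.0 * (1.0 + x) ** 2)
```

With the values $a = 3$, $\tau = .08$ quoted above, the two functions should already agree to many decimal places for $t$ of the order of one second, while for $t < \tau$ the exact expression reduces to $1 - e^{-at}$.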

4. Numerical Estimates. We shall now derive limits for the error
$M(t) - at/(1 + a\tau)$ by a method which is of much wider applicability. For
the Type I the integral equation (18) reduces for $t > \tau$ to

(31)    $M(t) = 1 - e^{-at} + \int_\tau^t M(t - x)\,a\,e^{-a(x-\tau)}\,dx$;

for $0 < t < \tau$ the integral is naturally to be replaced by zero. Consider
now the more general equation

(32)    $A(t) = H(t) + \int_\tau^t A(t - x)\,a\,e^{-a(x-\tau)}\,dx.$

It is obvious that $H(t) \le 1 - e^{-at}$ implies $A(t) \le M(t)$. To obtain a
lower estimate for $M(t)$ we put

(33)    $A(t) = \dfrac{at}{1 + a\tau} + C$;



($C$ is a constant to be determined). Substituting into (32) we obtain an $H(t)$
such that (33) will be a solution of (32). In order to have $M(t) \ge A(t)$ it
suffices to determine $C$ so that

(34)    $H(t) \le 1 - e^{-at}.$

If the inequality is reversed, we shall have $M(t) \le A(t)$.
In our case we can put $C = 0$; then ¶ 113

(35)    $H(t) = \begin{cases} \dfrac{at}{1 + a\tau} & \text{for } 0 < t < \tau \\ 1 - \dfrac{e^{-a(t-\tau)}}{1 + a\tau} & \text{for } t > \tau, \end{cases}$

and a simple computation shows that (34) holds. On the other hand, with
$C = \dfrac{a^2\tau^2}{2(1 + a\tau)}$ we get

(36)    $H(t) = \begin{cases} \dfrac{at}{1 + a\tau} + \dfrac{a^2\tau^2}{2(1 + a\tau)} & \text{for } t < \tau \\ 1 - \left(1 - \dfrac{a^2\tau^2}{2}\right)\dfrac{e^{-a(t-\tau)}}{1 + a\tau} & \text{for } t > \tau \end{cases}$

and (34) holds with the inequality reversed. Therefore, for Type I and all $t$

(37)    $0 \le M(t) - \dfrac{at}{1 + a\tau} \le \dfrac{a^2\tau^2}{2(1 + a\tau)}.$

This estimate is both simpler and sharper than that of Kurbatov and Mann
(1945) and also than the improved estimate of Mann (1946).
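The bounds (37) can be checked against a direct numerical solution of the integral equation (31). The Python sketch below is an added illustration and not the method of the paper: it discretises (31) on a uniform grid with the trapezoidal rule, which is an assumption of this example, and all names and parameter values are hypothetical. Because the kernel vanishes for $x < \tau$, each grid value of $M$ depends only on earlier grid values, so no linear system has to be solved.

```python
import math

def solve_type1_mean(a, tau, steps_per_tau, n_tau):
    """Approximate M(t) from (31) on the grid t_i = i*h, h = tau/steps_per_tau,
    for 0 <= t_i <= n_tau * tau. Returns (h, list of M values)."""
    h = tau / steps_per_tau
    n = steps_per_tau * n_tau
    # kernel a*exp(-a(x - tau)) at x = j*h; only indices with j*h >= tau are used
    kernel = [a * math.exp(-a * (j * h - tau)) for j in range(n + 1)]
    M = [0.0] * (n + 1)
    for i in range(n + 1):
        value = 1.0 - math.exp(-a * i * h)
        if i > steps_per_tau:
            # trapezoidal rule for the integral over x in [tau, t_i]
            terms = [M[i - j] * kernel[j] for j in range(steps_per_tau, i + 1)]
            value += h * (sum(terms) - 0.5 * (terms[0] + terms[-1]))
        M[i] = value
    return h, M

def deviation_from_leading_term(a, tau, h, M):
    """M(t_i) - a*t_i/(1 + a*tau) on the grid, the quantity bounded in (37)."""
    return [Mi - a * i * h / (1.0 + a * tau) for i, Mi in enumerate(M)]
```

For $a = 3$, $\tau = .08$ the deviations should stay between $0$ and $a^2\tau^2/\{2(1+a\tau)\} \approx 0.023$, up to the discretisation error, and should settle near the constant term $a^2\tau^2/\{2(1+a\tau)^2\} \approx 0.019$ of (29).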

5. Type II. The application of our formulas to Type II is less immediate
since $F(t)$ has first to be determined. The procedure provides another
instructive example for the summation of random variables. The time $T$ during
which the counter is locked may be arbitrarily long since a large number of
events may occur in rapid succession (of which only the first is registered).
Put

(38)    $p = e^{-a\tau}, \qquad q = 1 - e^{-a\tau}.$

The probability that, once the counter is locked, exactly $\nu$ events will
prolong the dead interval is $q^\nu p$. Now let $T^{(i)}$ be the time elapsed
between the events number $i - 1$ and $i$. The total locked time is
$T = T^{(1)} + \dots + T^{(\nu)}$. It is again a sum of random variables but
their number is itself a random variable. Clearly

(39)    $U(t) = \dfrac{1}{q}\,(1 - e^{-at}), \qquad 0 < t < \tau$



¶ 114 is the conditional probability that an event will succeed another by less
than $t$ if it is known that the interval between them does not exceed $\tau$.
Accordingly, if it is known that $\nu$ events have occurred during the locked
interval, we have for $0 < t < \tau$

(40)    $\Pr\{T^{(i)} \le t\} = U(t), \qquad i = 1, 2, \dots, \nu.$

The conditional probability distribution of the sum $T$ could again be obtained
by successive integrations of the form (6), but it suffices to find its Laplace
transform. Now the transform of $U(t)$ is

(41)    $\omega(s) = \int_0^\tau e^{-st}\,dU(t) = \dfrac{1}{q} \cdot \dfrac{a}{a + s} \cdot \left(1 - e^{-(a+s)\tau}\right)$

and therefore that of $\Pr\{T^{(1)} + \dots + T^{(\nu)} \le t\}$ is simply
$\omega^\nu(s)$. Adding the constant resolving time $\tau$ following the last
event necessitates a multiplication of $\omega^\nu(s)$ by $e^{-\tau s}$.
Therefore the conditional probability distribution of $T$ (assuming $\nu$
events during the passive time) has the transform $\omega^\nu(s)e^{-\tau s}$.
The absolute distribution $\Pr\{T \le t\}$ has accordingly the Laplace transform

(42)    $p\,e^{-s\tau} \sum_{\nu=0}^{\infty} \{q\,\omega(s)\}^\nu = \dfrac{(a + s)\,e^{-(a+s)\tau}}{s + ae^{-(a+s)\tau}}.$

Now the time $T_k$ between the $(k-1)$st and the $k$-th registration is composed
of a time $T$ (distributed according to (42)), and the time from the moment the
counter is again set free to the next event. The latter time is a random
variable distributed according to (1) with the Laplace transform
$\frac{a}{a+s}$. Therefore, the Laplace transform $\varphi(s)$ of the
distribution $F(t)$ of $T_k$ is given by

(43)    $\varphi(s) = \dfrac{ae^{-(a+s)\tau}}{s + ae^{-(a+s)\tau}}.$

¶ 115 Substituting into (12) we obtain finally

(44)    $\mu(s) = \dfrac{a}{s^2(a + s)} \left\{ s + ae^{-(a+s)\tau} \right\}.$

This, of course, is the familiar operational image of

(45)    $M(t) = \begin{cases} 1 - e^{-at} & \text{for } t \le \tau \\ 1 - e^{-a\tau} + ae^{-a\tau}(t - \tau) & \text{for } t \ge \tau. \end{cases}$

This is the exact form for the average number of registrations for counters of
Type II. The variance $B(t)$ is obtained in a similar way substituting from (43)
into (21). For $t > 2\tau$ we get

(46)    $B(t) = ae^{-a\tau}(t - \tau)\left\{1 - 2a\tau e^{-a\tau}\right\} - e^{-a\tau} + (1 + a\tau)^2 e^{-2a\tau},$

in accordance with Kosten (1943).
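For Type II counters both the mean and the variance are available in closed form, so they can be tabulated directly. The Python sketch below is an added illustration with hypothetical function names. It assumes that (46) is to be read as $B(t) = ae^{-a\tau}(t-\tau)\{1 - 2a\tau e^{-a\tau}\} - e^{-a\tau} + (1+a\tau)^2 e^{-2a\tau}$, which is the reading obtained by inverting (21) with the transform (43).

```python
import math

def mean_type2(t, a, tau):
    """Mean number of registrations of a Type II counter, formula (45)."""
    if t <= tau:
        return 1.0 - math.exp(-a * t)
    p = math.exp(-a * tau)
    return 1.0 - p + a * p * (t - tau)

def variance_type2(t, a, tau):
    """Variance of the number of registrations, formula (46) as read above.
    The formula is stated only for t > 2*tau."""
    if t <= 2.0 * tau:
        raise ValueError("(46) is stated only for t > 2*tau")
    p = math.exp(-a * tau)
    return (a * p * (t - tau) * (1.0 - 2.0 * a * tau * p)
            - p + (1.0 + a * tau) ** 2 * p * p)
```

For large $t$ the ratio $B/M$ computed from these two functions approaches $1 - 2a\tau e^{-a\tau}$, which is about $0.62$ for $a = 3$, $\tau = .08$.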



LITERATURE

W. Feller (1941): On the integral equation of renewal theory. Ann. Math.
Statist. 12, pp. 243–267.

B.V. Gnedenko (1941): On the theory of Geiger-Müller counters (in Russian).
Zurnal eksperimentaljnoi i teoreticeskoi fiziki, 11, pp. 101–106.

L. Kosten (1943): On the frequency distribution of the number of discharges
counted by a Geiger-Müller counter in a constant interval. Physica 10,
pp. 749–756.

J.D. Kurbatov and B.H. Mann (1945): Correction of G-M counter data.
Phys. Rev. 68, pp. 40–43.

C. Levert and W.L. Scheen (1943): Probability fluctuations of discharges
in a Geiger-Müller counter produced by cosmic radiation. Physica 10,
pp. 225–238.

H. Maier-Leibnitz (1942): Die Koinzidenzmethode und ihre Anwendung
auf kernphysikalische Probleme. Physikal. Zeitschrift 43, pp. 333–362.

H.B. Mann (1946): A note on the correction of Geiger-Müller counter data.
Quarterly Appl. Math. 4, pp. 307–309.

S. Täcklind (1945): Fourieranalytische Behandlung vom Erneuerungsproblem.
Skandinavisk Aktuarietidskrift 1945, pp. 68–105.

762 Bulletin AMS 55 (1949) 201–204
[Feller 1949a] — Selected Works of W. Feller, Volume 1 763
766 Proc. Natl. Acad. Sci. USA 35 (1949) 605–608
[Feller 1949b] — Selected Works of W. Feller, Volume 1 767
770 Berkeley Symposium Math. Statist. Probab. (1949) pp. 403–432
[Feller 1949c] — Selected Works of W. Feller, Volume 1 771
800 Transactions AMS 67 (1949) 98–119
[Feller 1949d] — Selected Works of W. Feller, Volume 1 801

Common questions

Feller's approach to probability emphasized the unification of frequency and measure through axiomatic methods, contrasting with Kolmogorov's analytical focus on foundational aspects of probability. Feller extended probability by applying measure theory in more generalized Baire spaces and explored applications beyond the confines Kolmogorov set, such as in his treatment of stochastic processes involving boundary behavior. Kolmogorov provided more formal mathematical definitions and focused on foundational work like the unconditional set theory treatment, which influenced probability axiomatization broadly.

Feller significantly influenced mathematical biology by applying his mathematical expertise to biological problems, notably in collaboration with Theodosius Dobzhansky, a key figure in evolutionary synthesis. His work on genetic and evolutionary models, like the critique of the Haldane paradox, provided mathematical clarity and innovative solutions that extended the quantitative methods available for studying evolutionary dynamics. This interdisciplinary approach helped bridge gaps between theoretical biology and mathematical modeling, broadening the scope and precision of biological research.

William Feller addressed the Haldane paradox, which posited that the cost of 'genetic deaths' during evolution by natural selection could slow down evolutionary change significantly. He showed that the paradox was based on spurious assumptions, particularly the unrealistic assumption of constant population size, which he had critiqued as early as 1952. By accommodating varying population sizes, Feller provided a mathematical framework demonstrating that evolutionary change could occur more rapidly than the paradox suggested.

William Feller demonstrated persistence and a long-term commitment to problem-solving, often revisiting and refining problems he had started years earlier. He was known for not abandoning problems regardless of the completeness of existing solutions, which is evident in his dedication to enhancing the understanding of mathematical theories over decades. This approach highlights his thoroughness and commitment to achieving clarity and generality in mathematical concepts.

Feller's contributions to one-dimensional diffusion theory extended beyond solving specific problems to establishing a generalized framework for understanding stochastic processes. His work formulated conditions under which diffusion processes could be described, particularly in treating boundary behavior and in elaborating on Kolmogorov's earlier work. Feller's insights into diffusion processes have had lasting impacts on both theoretical frameworks and practical applications in fields such as physics, biology, and finance.

Feller's personality, described as ebullient and humorous, had a significant impact on his teaching and research approaches. His lively and fast-paced lecturing style, characterized by 'proof by intimidation,' made complex mathematical concepts engaging and memorable despite not always being complete or correct. His readiness to discuss and debate, even at the cost of being wrong, highlighted his preference for intellectual engagement over mere correctness, which fostered an inquisitive and dynamic learning environment.

Feller's academic journey and collaborations are reflected in his publications, which often resulted from dialogues with leading figures of various fields. For instance, his association with Theodosius Dobzhansky influenced his work in evolutionary theory, and his collaborative efforts are evident in publications like [Feller 1966b], which denote institutional affiliations and interdisciplinary interests. His evolving academic roles, from Rockefeller Institute to Princeton University, are also mirrored in his increasingly diverse research focus and publications.

Feller introduced a methodology in treating boundary conditions in stochastic processes by conceptualizing them as restrictions on the domain of operators that served as infinitesimal generators of appropriate semigroups. This approach allowed for a more nuanced understanding of the behavior of paths in Markov processes and generalized the use of differential operators within diffusion theory, revealing deep connections between operator theory and stochastic processes.
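
As a hedged illustration of this idea (the example is an assumption chosen for concreteness and is not taken from the text above), consider Brownian motion on the half-line $[0,\infty)$. The differential expression is the same in both cases; only the domain of the generator changes with the behavior at the boundary point 0. The labels $A_{\mathrm{refl}}$ and $A_{\mathrm{abs}}$ are notation introduced for this sketch, and $C_0[0,\infty)$ denotes the continuous functions vanishing at infinity.

```latex
% Illustrative example (not from the source): Brownian motion on [0, \infty).
% One differential expression, two domains, two different processes.
% Requires amsmath for the aligned environment.
\[
  A f = \tfrac{1}{2} f'' ,
  \qquad
  \begin{aligned}
    \mathcal{D}(A_{\mathrm{refl}}) &= \{\, f :\ f, f'' \in C_0[0,\infty),\ f'(0) = 0 \,\}
      && \text{(reflection at } 0\text{)},\\
    \mathcal{D}(A_{\mathrm{abs}})  &= \{\, f :\ f, f'' \in C_0[0,\infty),\ f''(0) = 0 \,\}
      && \text{(absorption at } 0\text{)}.
  \end{aligned}
\]
```

In the absorbing case the condition $f''(0) = 0$ says $Af(0) = 0$, which matches a path that stays at 0 once it arrives there; in the reflecting case the condition $f'(0) = 0$ is the familiar Neumann condition.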

Feller's critique of constant population size assumptions significantly influenced modern evolutionary biology by identifying and addressing key limitations in the existing models of evolution, such as the Haldane paradox. By advocating for models that incorporate variable population sizes, his work facilitated the development of more robust frameworks that better account for real-world complexities in evolutionary dynamics. This shift has enabled more accurate predictions and analyses in evolutionary studies, supporting further advancements in the field.

Feller's understanding of the law of the iterated logarithm was enhanced through his ongoing effort to simplify and generalize complex probabilistic results. His paper "General analogues of the law of the iterated logarithm" surpassed his earlier work by achieving simplicity and generality after decades of refining his approach. By addressing foundational assumptions and expanding the conditions under which the theorem applies, he made it more accessible and useful across various mathematical and scientific fields.
