William Feller (Auth.), René L. Schilling, Zoran Vondraček, Wojbor A. Woyczyński (Eds.) - Selected Papers I-Springer International Publishing (2015)
W. Feller (around 1930)
Photograph courtesy of Joanne Elliott
William Feller
Selected Papers I
Edited by
René L. Schilling • Zoran Vondraček
Wojbor A. Woyczyński
William Feller
(1906 Zagreb – 1970 New York City)
Editors
René L. Schilling
Institut für Mathematische Stochastik
Technische Universität Dresden
Dresden
Germany
Zoran Vondraček
Department of Mathematics
University of Zagreb
Zagreb
Croatia
Wojbor A. Woyczyński
Department of Mathematics
Case Western Reserve University
Cleveland, OH
USA
ISBN 978-3-319-16858-6
Library of Congress Control Number: 2012954381
Math. Subj. Classification (2010): 01A75 60G05, 60F05, 60E07, 60J35, 60K05, 47D07
These volumes contain a selection of William Feller’s most important works on prob-
ability theory, mathematical biology, analysis and geometry. Feller was a prolific
writer, with many groundbreaking contributions which, almost 45 years after his
death, are still frequently quoted – both directly and indirectly. This abundance meant
that deciding which of his more than 100 research papers to include
in these Selecta was not an easy task. Most of Feller’s English-language research
contributions have been included, and we also selected a fair share of his pre-1940
publications in German and French, six of which were translated into English for
these Selecta. His two most cherished contributions, the definitive Feller–Lindeberg
version of the Central Limit Theorem [Feller 1935c], and the deep On the theory of
stochastic processes [Feller 1936c] – a companion to Kolmogorov’s seminal work
on stochastic processes and PDEs – are here being made accessible in English for
the first time. We decided not to reprint Feller’s work on measure theory, and the
geometry selection contains only the fundamental Acta Mathematica contribution by
Busemann and Feller; in fact, Feller did not continue to work in these directions, and
we felt that his impact on these areas remained rather limited.
Feller is widely known for his brilliant two-volume textbook, An Introduction
to Probability Theory and Its Applications, on which he worked continuously from
the late 1940s. It is one of the few books which changed considerably from edition
to edition. The observant reader of these Selecta will notice that in later editions
Feller included many results from his (then) recent research papers. Consequently,
the Introduction to Probability Theory is still a “modern” book, and a treat for every
reader as well. For obvious reasons we could not include these books in the present
selection.
The papers are arranged in chronological order: Volume 1 covers the years 1928–
1950, and Volume 2 the period 1951–1971. Each volume contains additional mate-
rial, of which the most important are commentaries and essays written by the leading
experts in the areas covered by the Selecta. We are grateful to them for their will-
ingness to spend considerable time and effort to put various aspects of Feller’s work,
and its afterlife, into modern perspective; Ellen Baake & Anton Wakolbinger wrote
on Mathematical Biology, Hans Fischer on Foundations and the Central Limit Theo-
rem, Masatoshi Fukushima on Diffusions and Boundaries, Niels Jacob on Functional
Analysis, Ross Maller on Limit Theorems, Goran Peskir on Boundary Conditions
and Erhard Scholz on Geometry. The volumes would not have the same relevance to
modern mathematical research without their incisive analyses.
Thanks are due to Springer-Verlag for undertaking the publication. The team
around Catriona Byrne and Marina Reizakis was very supportive and helped the
project move forward throughout the past couple of years. We are grateful to all own-
ers of the copyrights for their cooperation, and for generously granting the reprint
permissions free of charge. These days this is no longer the rule, but an exception.
Only one publisher declined to help us. Thus the absence of the difficult-to-access
1938 survey paper [*Feller 1938a] on the foundations of probability theory leaves a
distressing gap in these Selecta.
Many colleagues and friends helped us during the preparation of these volumes.
We would like to thank all of them, in particular, Christian Berg, Ulrich Brehm,
Jürgen Elstrodt, Tony Knapp, Heinz König, Anders Martin-Löf and Hrvoje Šikić
for valuable comments. We are especially grateful to Joanne Elliott, a long-time
friend and neighbour of the Feller family, for insightful conversations about Feller’s
life in America, and for generously sharing her vast trove of historical photographs
and other materials; some are reproduced here with her permission. Several of
Feller’s surviving PhD students whom we contacted also expressed support for the idea of
this publication. Our student Ms. Franziska Kühn (TU Dresden) helped us with the
intricacies of typesetting this work in LaTeX and with other administrative chores.
Acknowledgements
The Editors gratefully acknowledge the kindness of these institutions and individuals in granting the fol-
lowing permissions:
W. Feller: Boundaries induced by non-negative matrices. Transactions of the American
Mathematical Society 83 (1956) 19–54. © American Mathematical Society. Reprinted with
permission of the American Mathematical Society.
J. Elliott, W. Feller: Stochastic processes connected with harmonic functions. Transactions of the
American Mathematical Society 82 (1956) 392–420. © American Mathematical Society. Reprinted
with permission of the American Mathematical Society.
Cambridge University Press
W. Feller: Some new connections between probability and classical analysis. Proceedings of the
International Congress of Mathematicians, Edinburgh 1958. Cambridge University Press,
Cambridge 1960, pp. 69–86. © Cambridge University Press. Reprinted with permission of the
Cambridge University Press.
W. Feller: On fitness and the cost of natural selection. Genetics Research 9 (1967) 1–15.
© Cambridge University Press. Reprinted with permission of the Cambridge University Press.
Croatian Academy of Sciences and Arts
W. Feller: Neuer Beweis für die Kolmogoroff–P. Lévysche Charakterisierung der unbeschränkt
teilbaren Verteilungsfunktionen. Bulletin international de l’académie Yougoslave des sciences et
des beaux-arts, Zagreb. Classe des sciences mathématiques et naturelles 32 (1939) 106–113.
© Croatian Academy of Sciences and Arts. Reprinted with permission of the Croatian Academy of
Sciences and Arts.
Duke University Press
W. Feller: Completely monotone functions and sequences. Duke Mathematical Journal 5 (1939)
661–674. © Duke University Press. Reprinted with permission of the Duke University Press.
W. Feller: Some geometric inequalities. Duke Mathematical Journal 9 (1942) 885–892.
© Duke University Press. Reprinted with permission of the Duke University Press.
Elsevier
W. Feller: The birth and death processes as diffusion processes. Journal de Mathématiques pures
et appliquées, IX. série 38 (1959) 301–345.
© Elsevier. Reprinted with permission of Elsevier.
Illinois Journal of Mathematics
W. Feller: Generalized second order differential operators and their lateral conditions. Illinois Jour-
nal of Mathematics 1 (1957) 459–504. Reprinted with permission of the Illinois Journal of Mathe-
matics, published by the University of Illinois at Urbana-Champaign.
W. Feller: On the intrinsic form for second order differential operators. Illinois Journal of Mathe-
matics 2 (1958) 1–18. Reprinted with permission of the Illinois Journal of Mathematics, published
by the University of Illinois at Urbana-Champaign.
W. Feller: Differential operators with the positive maximum property. Illinois Journal of Mathemat-
ics 3 (1959) 182–186. Reprinted with permission of the Illinois Journal of Mathematics, published
by the University of Illinois at Urbana-Champaign.
Indiana University Mathematics Journal
W. Feller, S. Orey: A renewal theorem. Journal of Mathematics and Mechanics 10 (1961) 619–624.
© Indiana University Mathematics Journal. Reprinted with permission of the Indiana University
Mathematics Journal.
W. Feller: On the Fourier representation for Markov chains and the strong ratio theorem. Journal
of Mathematics and Mechanics 15 (1966) 273–283. © Indiana University Mathematics Journal.
Reprinted with permission of the Indiana University Mathematics Journal.
W. Feller: An extension of the law of the iterated logarithm to variables without variance. Journal
of Mathematics and Mechanics 18 (1968) 343–355. © Indiana University Mathematics Journal.
Reprinted with permission of the Indiana University Mathematics Journal.
Institute of Mathematical Statistics, Baltimore
W. Feller: On the integral equation of renewal theory. Annals of Mathematical Statistics 12 (1941)
243–267. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the
Institute of Mathematical Statistics, Baltimore.
W. Feller: On a general class of “contagious” distributions. Annals of Mathematical Statistics 14
(1943) 389–400. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of
the Institute of Mathematical Statistics, Baltimore.
W. Feller: On the normal approximation to the binomial distribution. Annals of Mathematical
Statistics 16 (1945) 319–329. © Institute of Mathematical Statistics, Baltimore. Reprinted with
permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: Note on the law of large numbers and “fair” games. Annals of Mathematical Statistics
16 (1945) 301–304. © Institute of Mathematical Statistics, Baltimore. Reprinted with permission
of the Institute of Mathematical Statistics, Baltimore.
W. Feller: On the Kolmogorov–Smirnov limit theorems for empirical distributions. Annals of
Mathematical Statistics 19 (1948) 177–189. Erratum: Annals of Mathematical Statistics 21 (1950)
301–302. Both: © Institute of Mathematical Statistics, Baltimore. Reprinted with permission of the
Institute of Mathematical Statistics, Baltimore.
W. Feller: The asymptotic distribution of the range of sums of independent random variables.
Annals of Mathematical Statistics 22 (1951) 427–432. © Institute of Mathematical Statistics,
Baltimore. Reprinted with permission of the Institute of Mathematical Statistics, Baltimore.
W. Feller: Non-Markovian processes with the semigroup property. Annals of Mathematical
Statistics 30 (1959) 1252–1253. © Institute of Mathematical Statistics, Baltimore. Reprinted with
permission of the Institute of Mathematical Statistics, Baltimore.
The Editors of Ann. Math. Statist.: William Feller, 1906–1970. Annals of Mathematical
Statistics 41 No. 6 (1970), iv–xiii. © Institute of Mathematical Statistics, Baltimore. Reprinted with
permission of the Institute of Mathematical Statistics, Baltimore.
Institute of Mathematics Polish Academy of Sciences
H. Busemann, W. Feller: Zur Differentiation der Lebesgueschen Integrale. Fundamenta
Mathematicae 22 (1934) 226–256. © Institute of Mathematics Polish Academy of Sciences. Reprinted
with permission of the Institute of Mathematics Polish Academy of Sciences.
W. Feller: Über die Existenz von sogenannten Kollektiven. Fundamenta Mathematicae 32 (1939)
87–96. © Institute of Mathematics Polish Academy of Sciences. Reprinted with permission of the
Institute of Mathematics Polish Academy of Sciences.
W. Feller: On positivity preserving semigroups of transformations on C[r1, r2]. Annales de la
Société Polonaise de Mathématique 25 (1952) 85–94. © Institute of Mathematics Polish Academy of
Sciences. Reprinted with permission of the Institute of Mathematics Polish Academy of Sciences.
John Wiley & Sons, Inc.
W. Feller: On differential operators and boundary conditions. Communications on Pure and
Applied Mathematics 8:1 (1955) 203–216. © 1955 John Wiley & Sons, Inc. This material is
reproduced with permission of John Wiley & Sons, Inc.
W. Feller: A simple proof for renewal theorems. Communications on Pure and Applied
Mathematics 14:3 (1961) 285–293. © 1961 John Wiley & Sons, Inc. This material is reproduced with
permission of John Wiley & Sons, Inc.
L’Enseignement Mathématique
W. Feller: One-sided analogues of Karamata’s regular variation. L’Enseignement Mathématique,
IIe. Série 15 (1969) 107–121.
© L’Enseignement Mathématique. Reprinted with permission of
L’Enseignement Mathématique.
Lund University, Centre for Mathematical Sciences
W. Feller: On a generalization of Marcel Riesz’ potentials and the semigroups generated by them.
W. Feller: Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. II.
Mathematische Zeitschrift 42 (1937) 301–312. Erratum: Mathematische Zeitschrift 44 (1939) 794. Both:
© Springer. Reprinted with permission of Springer.
W. Feller: Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in
wahrscheinlichkeitstheoretischer Behandlung. Acta Biotheoretica, Leiden 5 (1939) 11–39.
© Springer. Reprinted with permission of Springer.
W. Feller: On the classical Tauberian theorems. Archiv der Mathematik 14 (1963) 317–322.
© Springer. Reprinted with permission of Springer.
W. Feller: On the Berry-Esseen theorem. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte
Gebiete 10 (1968) 261–268. © Springer. Reprinted with permission of Springer.
W. Feller: General analogues to the law of the iterated logarithm. Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 14 (1969) 21–26. © Springer. Reprinted with permission of
Springer.
W. Feller: Limit theorems for probabilities of large deviations. Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 14 (1969) 1–20. © Springer. Reprinted with permission of
Springer.
The Johns Hopkins University Press
W. Feller: A limit theorem for random variables with infinite moments. American Journal of
Mathematics 68:2 (1946) 257–262. © 1946 The Johns Hopkins Press. Reprinted with permission of The
Johns Hopkins University Press.
University of California Press
W. Feller: On regular variation and local limit theorems. In: J. Neyman and L. LeCam (eds.):
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1965/66,
Vol. 2, Pt. 1, University of California Press, Berkeley, Los Angeles (CA) 1967, pp. 373–388.
© University of California Press. Reprinted with permission of the University of California Press.
J.L. Doob: William Feller and Twentieth-Century Probability. In: L. LeCam, J. Neyman, E.L. Scott
(eds.): Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability
1970/71, Vol. 2, University of California Press, Berkeley, Los Angeles (CA) 1972, pp. 15–20.
© University of California Press. Reprinted with permission of the University of California Press.
M. Kac: William Feller, In Memoriam. In: L. LeCam, J. Neyman, E.L. Scott (eds.): Proceedings
of the Sixth Berkeley Symposium on Mathematical Statistics and Probability 1970/71, Vol. 2,
University of California Press, Berkeley, Los Angeles (CA) 1972, pp. 21–23. © University of
California Press. Reprinted with permission of the University of California Press.
For the following publications, the copyright resides with the author(s). Despite our best efforts, we were
not able to trace the present copyright owners and we do ask them to come forward if they feel that their
rights are infringed or that the material has been used in violation of copyright law.
W. Feller: On probability problems in the theory of counters. In: Studies, Essays, presented to
R. Courant (Courant anniversary volume). Interscience Publishers, New York 1948, pp. 105–115.
K.L. Chung, W. Feller: On fluctuations in coin-tossing. Proceedings of the National Academy of
Sciences, USA 35 (1949) 605–608.
W. Feller, H.P. McKean Jr.: A diffusion equivalent to a countable Markov chain. Proceedings of
the National Academy of Sciences, USA 42 (1956) 351–354.
W. Feller: On boundaries defined by stochastic matrices. In: Proceedings of Symposia in Applied
Mathematics. Vol. 7. McGraw-Hill Book Co., New York (for the American Mathematical Society,
Providence (RI)) 1957, 35–40.
W. Feller: On semi-Markov processes. Proceedings of the National Academy of Sciences, USA 51
(1964) 653–659.
W. Feller: On the influence of natural selection on population size. Proceedings of the National
Academy of Sciences, USA 55 (1966) 733–738.
Contents of Volume 1
Preface
Acknowledgements
Contents of Volume 1
[Feller 1935c] (translation) On the Central Limit Theorem of Probability Theory
[Feller 1943c] The General Form of the So-called Law of the Iterated Logarithm
[Feller 1945d] Note on the Law of Large Numbers and “Fair” Games
Contents of Volume 2
Preface
Acknowledgements
Contents of Volume 2
[Feller 1952b] On a Generalization of Marcel Riesz’ Potentials and the Semigroups Generated by Them
[Feller 1959b] The Birth and Death Processes as Diffusion Processes
[Feller 1959d] Differential Operators with the Positive Maximum Property
[Feller 1966a] On the Fourier Representation for Markov Chains and the Strong Ratio Theorem
[Feller 1969d] General Analogues to the Law of the Iterated Logarithm
1966 Permanent title of visiting professor of the Rockefeller University, New York (NY)
1969/1970 National Medal of Science
14 January 1970 William Feller dies of cancer at the age of 63 in New York City (NY)
A star “*” indicates that the respective entry is not contained in these Selecta.
[*Feller 1932a] (with Erhard Tornier) Maß- und Inhaltstheorie des Baireschen Null-
raumes [Measure and integration theory of Baire’s null space]. Mathemati-
sche Annalen 107 (1932) 165–187.
[Feller 1934a] (with Herbert Busemann) Zur Differentiation der Lebesgueschen In-
tegrale [On the differentiation of Lebesgue’s integrals]. Fundamenta Mathe-
maticae 22 (1934) 226–256.
[Feller 1936c] Zur Theorie der stochastischen Prozesse. (Existenz- und Ein-
deutigkeitssätze) [On the theory of stochastic processes. (Existence- and
uniqueness theorems)]. Mathematische Annalen 113 (1936) 113–160. This
paper has been translated for the present Selecta.
[Feller 1937b] Über das Gesetz der großen Zahlen [On the law of large num-
bers]. Acta Litterarum ac Scientiarum. Regiae Universitatis Hungaricae
[*Feller 1941b] (with Jacob David Tamarkin) Partial Differential Equations. Brown
University Summer Session for Advanced Instructions & Research in Me-
chanics. 6/23- 9/13 1941, Providence (RI) 1941.
Chapters 1–3 by J. D. Tamarkin, Chapters 4–7 by William Feller.
Reprinted by the Lewis Flight Propulsion Laboratory, National Committee
for Aeronautics, Cleveland 1956.
[Feller 1943c] The general form of the so-called law of the iterated logarithm. Trans-
actions of the American Mathematical Society 54 (1943) 373–402.
[Feller 1945b] The fundamental limit theorems in probability. Bulletin of the Ameri-
can Mathematical Society 51 (1945) 800–832.
[Feller 1945d] Note on the law of large numbers and “fair” games. Annals of Math-
ematical Statistics 16 (1945) 301–304.
[Feller 1946a] A limit theorem for random variables with infinite moments. Ameri-
can Journal of Mathematics 68 (1946) 257–262.
Textbooks
A star “*” indicates that the respective entry is not contained in the present selection.
[*Feller 1941b] (with Jacob David Tamarkin) Partial Differential Equations. Brown
University Summer Session for Advanced Instructions & Research in Me-
chanics. 6/23- 9/13 1941, Providence (RI) 1941.
Chapters 1–3 by J. D. Tamarkin, Chapters 4–7 by William Feller.
Reprinted by the Lewis Flight Propulsion Laboratory, National Committee
for Aeronautics, Cleveland 1956.
[*Feller 1950] An Introduction to Probability Theory and Its Applications. John Wi-
ley & Sons, New York 1950.
[*Feller 1957f] An Introduction to Probability Theory and Its Applications. Vol. 1.
2nd edn. of [*Feller 1950]. John Wiley & Sons, New York 1957.
[*Feller 1966d] An Introduction to Probability Theory and Its Applications. Vol. 2.
John Wiley & Sons, New York 1966.
[*Feller 1968e] An Introduction to Probability Theory and Its Applications. Vol. 1.
3rd edn. of [*Feller 1950], [*Feller 1957f]. John Wiley & Sons, New York
1968.
[*Feller 1942b] Vectors in a Plane; Notes for Mathematics 5, Summer 1942. Brown
University, Providence (RI) 1942. (Brown University Library: SCI – Level 7,
Aisle 1B 1-SIZE QA261.F45)
1. Forsythe, George E.: Riesz summability methods of order r, for R(r) < 0, Ce-
saro summability of independent random variables. (Brown University 1941)
2. Kincaid, Wilfred MacDonald: Part I: On non-cut sets of locally connected
continua. Part II: An application of orthogonal moments to problems in statis-
tically indeterminate structures. Part III: Numerical methods for finding char-
acteristic roots and vectors of matrices. (Brown University 1946)
3. Juncosa, Mario L.: On the asymptotic behavior of the minimum in a sequence
of random variables. (Cornell 1949)
4. Weber, Maria A.: The solution of a linear differential equation of parabolic
type. (Cornell 1949)
5. Seifert, George H.: Some third order boundary value problems. (Cornell 1950)
6. Shapley, Lloyd Stowell: Additive and non-additive set functions. (Princeton
1953. Co-advisor: A. W. Tucker)
7. Murdoch, Brian Hughes: Preharmonic functions. (Princeton 1955)
8. Billingsley, Patrick Paul: The invariance principle for dependent random vari-
ables. (Princeton 1955)
9. McKean, Henry P. Jr.: Sample functions of stable processes. (Princeton 1955)
10. Trotter, Hale F.: Convergence of semi-groups of operators. (Princeton 1956)
11. Gaver, Donald Paul: Some results in the theory of queues. (Princeton 1956)
12. Knight, Frank Beardsley: Construction of diffusion processes by means of ran-
dom walks. (Princeton 1959)
13. George, Melvin D.: The approximation of solutions of nonlinear differential
equations. (Princeton 1959)
14. Freedman, David Amiel: Mixtures of stochastic processes. (Princeton 1960)
15. Shepp, Lawrence A.: Recurrent sums of random variables. (Princeton 1961)
16. Schay, Geza: The equations of diffusion in the special theory of relativity.
(Princeton 1961)
17. Goldman, Jay Robert: Stochastic point processes: limit theorems and infinite
divisibility. (Princeton 1965)
18. Silverstein, Martin Louis: Many particle processes. (Princeton 1965)
19. Weiss, Benjamin: Vibrating systems and positivity preserving semi-groups.
(Princeton 1965)
William Feller died on January 14, 1970, at the age of 63, at Memorial Hospital
in New York. He was a member of the National Academy of Sciences, of
the American Academy of Arts and Sciences, of the Danish and Yugoslavian
Academies of Sciences, fellow of the Royal Statistical Society, past governor of
the Mathematical Association of America, former president of the Institute of
Mathematical Statistics (1946). A few days before his death he had learned
also of his election as honorary member of the London Mathematical Society
and of the decision to award him the National Medal of Science, which his wife
Clara was to receive in his stead on February 16 at the White House. These
outward and visible honors confirm his position in science, to which is added
our affection for his gaiety, enthusiasm, gentleness, and responsiveness.
Will Feller was born in Zagreb, Yugoslavia, on July 7, 1906; he attended
the University there from 1923 to 1925, leaving with a degree equivalent to our
Master of Science. From 1925 to 1928 he worked at the University of Göttin-
gen, where he received the Ph.D. in 1926, at the age of twenty. At Göttingen
he had the good fortune to become acquainted with David Hilbert, always
his ideal mathematician, as well as with Richard Courant, who recognized his
promise and encouraged him to become a mathematician in earnest. In 1928
he went as Privatdozent to the University of Kiel, but left there in 1933 after
refusing to sign a Nazi oath. He passed a year in Copenhagen, where he came
to know Harald Bohr and his brother Niels, and then five years (1934–1939)
at the University of Stockholm, in the vicinity of Marcel Riesz and Harald
Cramér. It was during his last year there, on July 27, 1938, that he married
1 Editor’s note. This article, prepared by the Editors, is a combination of the memorial
resolution by the Faculty of Princeton University and the documents supporting Feller’s
nomination for the National Medal of Science.
a This text is retyped, with permission, from the obituary which appeared in The Annals
William Feller was born on 7 July 1906 in Zagreb, Croatia, which was then
a Southern province of the Austro-Hungarian Empire. He was the ninth of
twelve children of Eugen Viktor Feller and Ida Feller. Following the Roman-
Catholic tradition, the young baby boy was named Vilibald after the saint (St.
Willibald) whose feast day fell on his birthday. In the church register of births
(city register)1 he had been registered as Vilibald Srećko2 Feller. Feller called
himself Vilim3 (which is the Croatian form of William), and throughout his life
he would adopt the native versions of his first name depending on the country
he lived in: thus in Germany and Scandinavia he called himself Willi/Willy,
and William/Will in America.
Feller’s father Eugen was born in 1871 in Lemberg, Galicia, in the North-
eastern corner of the Austro-Hungarian Empire (now Lviv, in Western Ukraine),
and died in 1936 in Zagreb. His wife Ida was born in 1870 (place unknown),
and passed away in Zagreb two years after her husband’s death. His paternal
grandparents were David Feller and Elizabeth (Elsa) Holzer from Lemberg, and
his maternal grandparents were Ferdinand Oemichen and Hermina Peerz (or
Perc). Feller’s parents owned a pharmaceutical company. The then-famous,
now-obscure Elsa fluid, named after Eugen’s mother, was one of the mainstays
1 Žubrinić [47, p. 9]. The references [Feller 19nn] and [*Feller 19nn] (the star indicating
that the respective paper is not contained in these Selecta) refer to Feller’s bibliography,
while [n] points to the list of references at the end of this essay.
2 Srećko is the Croatian equivalent of Felix, ‘the lucky’.
3 The rather anecdotal story in Seneta [40, p. 87] is based on the claim that July 7 is
St. Willibald in the “(German) Catholic saint’s list” while it is Sv. Vilim’s feast day in the
“Croatian Catholic list” which cannot be verified (www.namecalendar.net, accessed 3 August
2014, gives for Vilim the dates 6 April, 23 May, 8 June and 25 June, the Croatian Catholic
Calendar www.hkr.hr/kalendar, accessed 29 August 2014, gives the dates 10 February, 23
May, 8 June and 29 July). It is more likely that Vili(m) was used as a family nickname
instead of the more formal Vilibald; also in German, it is common practice to abbreviate
Willibald by Willi or Willy.
of the enterprise. It had been marketed as an elixir capable of curing all kinds
of maladies, such as headaches, colds, back pains, etc., and was the source
of the family’s substantial wealth. Like many upper-class families, the Fellers
were bilingual and William spoke both Croatian and German. Young William
was raised in the Roman-Catholic tradition even though his paternal grand-
father David was Jewish; he had converted to Catholicism when marrying
Elsa.4 Feller’s ancestors probably came to Zagreb during the second half of
the nineteenth century. More details on Feller’s parents and grandparents can
be found in the two studies by Fatović-Ferenčić, and Ferber-Bogdan [10, 11].
Zagreb 1906–1925
We do not know the exact year when the Fellers moved to their modern, spa-
cious and elegantly decorated villa on 31a, Jurjevska street,5 but it is safe to
assume that William spent most of his childhood there. Judging by family
photos and personal stories, they lived in a pleasant and productive envi-
ronment. Several of William’s brothers had successful careers of their own:
6 Žubrinić [47, pp. 26 ff.] and the Croatian Biographical Encyclopedia [22].
7 German for a “secondary school that prepares students for the university, that offers
Latin but no Greek, and that typically emphasizes sciences and modern languages”
(www.merriam-webster.com/dictionary/realgymnasium, accessed 30/7/2014).
8 The former I. Realgymnasium today houses the Mimara museum of art, one of Croatia’s
[1] lists an Abgangszeugnis (transcript of marks) of the University in (sic!) Agram (Zagreb)
dated 13 October 1925. The Abgangszeugnis, however, was returned to the candidate after
the defence of the thesis and is not part of the Promotionsakte.
16 Reid [34, p. 119]
17 Reid [34, p. 99]. In those years, Courant’s assistants, O. Neugebauer and
K. O. Friedrichs, were in charge of the Anfängerpraktikum. Feller would meet them again
later in his career.
18 Around the same time quite a few very gifted students were at Göttingen, e.g. Herbert
Busemann (1905–1994), Hans Lewy (1904–1988), John von Neumann (1903–1957), and
Franz Rellich (1906–1955).
19 Rota also mentions this as an example of one of Feller’s “bombastic stories” [38, p. 230].
bearbeitet hat und zwar in einer Umgebung, (Agram), in welcher er keinerlei Anregung von
aussen empfangen konnte.
22 Courant notes “that the choice of the topic leads away from those questions which
nowadays appear to be interesting and important” [“dass die Stellung des Themas etwas aus
dem Rahmen der Fragen hin[aus]führt, die uns heute interessant und wichtig erscheinen”].
23 of Zermelo–Fraenkel fame, worked on number theory, set theory and the foundations
of mathematics. He came to Kiel in 1928. His relation to Feller and Tornier is described in
his autobiography Lebenskreise [17, pp. 154 f ].
24 Habilitation is a post-doctoral degree that bestows on the recipient the venia legendi,
the permission to teach, examine and supervise students independently at the university
which conferred the degree. Usually a full professor of the faculty has to support one’s
application for habilitation.
25 On the solutions of second-order linear partial differential equations of elliptic type
26 Hochkirchen [21, p. 26]
however, have had the same doctoral advisor: Kurt Hensel at the University of Marburg.
29 Fraenkel [17, p. 131; our translation from German]
30 He joined on 1 May 1932, after the philosophical faculty at Kiel agreed to recognize his
phemistically named Nazi law which allowed to cleanse the civil service from non-Aryan and
politically non-conforming members. Its §3 is one of the earliest Aryan paragraphs. The
full text is available under www.documentarchiv.de/ns/beamtenges.html (accessed 1 August
2014), excerpts are in Uhlig [45, pp. 143 ff.].
32 Doob [9] writes of a “Nazi oath” which Feller “refused to sign [...] and was forced
to leave”; this story is reiterated in several places but it cannot be confirmed. On the
contrary, Feller filled out all required forms and he was forced to leave on the basis of
his ancestral information. The form, which appeared as an appendix to the BBG, is
available online at alex.onb.ac.at/cgi-content/alex?apm=0&aid=dra&datum=19330004&zoom=2&seite=00000253&ues=0&x=13&y=7
(accessed 20 August 2014).
33 See the official web-page of the University of Kiel,
www.uni-kiel.de/ns-zeit/bios/feller-willy.shtml (accessed 31 July 2014) and the documentation in Uhlig [45, p. 23].
34 Uhlig [45, p. 23]
35 Doob [9, p. xvii], Birnbaum [4, p. iv]
36 Handwritten letter of Feller to Børge Jessen [13]
37 U.S. Social Security Death Index, accessed on 1 August 2014 via death-records.
38 The following historical remarks on the Zentralblatt are from zbmath.org/about/ (ac-
cessed 1 August 2014): “The Zentralblatt für Mathematik und ihre Grenzgebiete was
founded in 1931 with the aim to publish reviews of the entire world literature in mathe-
matics and related areas. Zentralblatt became the second comprehensive review journal for
mathematics in Germany after the Jahrbuch über die Fortschritte der Mathematik (estab-
lished in 1868) which has been active until the 1940s. Although the Zentralblatt had, essen-
tially, the same agenda as the Jahrbuch, the latter aimed at maintaining the completeness
of the coverage and the classification of all articles in each calendar year, whereas Zentral-
blatt put more emphasis on the promptness of the reviews and the international aspect.
“The initiative for the foundation of a new mathematical reviewing journal came from
mathematicians Otto Neugebauer, Richard Courant, and Harald Bohr, together with the
publisher Ferdinand Springer. The rapidly growing number of newly published mathematics
works in the 1920s and the scientists’ need for obtaining quick information on recent material
motivated the decision to create an alternative service to the Jahrbuch. [...]
“Neugebauer directed the new periodical for several years until the political situation
in Germany made his position as editor-in-chief unsustainable. In 1933, shortly after the
Nazi party rose to power, a law was enacted which banned Jews and political enemies from
holding jobs as civil servants. A call to dismiss Courant, Neugebauer, Landau, Bernays,
and Noether appeared in a local newspaper and, soon afterwards, Courant escaped to the
UK and later moved to New York. Due to this pressure, Neugebauer decided to resign from
his post at Göttingen University and in 1934 took up a professorship in Copenhagen, from
where he continued his work for Zentralblatt.
“The struggle to produce the reviewing journal became more difficult throughout this
period, however, for the Nazis tried to influence the editorial policy. Neugebauer eventually
gave up his position as editor-in-chief in 1938 after a series of incidents, including Levi-Civita
being dismissed from the editorial board without his knowledge.”
With his already published work in probability theory and analysis, and pa-
pers on differential geometry, Feller had choices to make, and the contact with
Cramér tipped the scale in favour of probability theory. During his Stockholm
years Feller wrote two seminal papers which would establish his prominent
position as a probabilist: his work on limit theorems [Feller 1935c], where he
found the necessary and sufficient conditions for the validity of the Central
Limit Theorem (CLT), and his extension [Feller 1936c] of the results of Kol-
mogorov’s Analytische Methoden paper [27], which provided the foundations
of the theory of Markov processes. The mathematical methods used in both
papers reflected his strong analysis and PDE background which he had ac-
quired in Göttingen. There was more work on limit theorems [Feller 1937a]
(influenced by Feller’s interaction with Marcel Riesz), the paper on the weak
law of large numbers [Feller 1937b] (dedicated to Harald Bohr and written
during Feller’s February 1937 visit to Lund), and another on infinite divisibility
[Feller 1939d]; their mathematical content is scrutinized in detail in Hans
Fischer’s essay later in these Selecta [16].
At the memorable Geneva Colloque sur le Calcul des Probabilités40 Cramér
and Feller met Neyman who “gave a lecture on his theory of confidence inter-
vals, which was then something quite new”, and met with a skeptical recep-
tion;41 Feller, however, immediately took up this topic in [*Feller 1938b].42
His own contribution to the colloquium [*Feller 1938a]43 was a comparison
of von Mises’, Tornier’s and Kolmogorov’s approaches to the foundations
of probability theory; he continued this theme in [Feller 1939c]. Interestingly
enough, after all of Tornier’s political scheming in Kiel, Feller unflinchingly
gave him the mathematical credit due. A critical assessment of this work can
be found, again, in Fischer’s essay [15]. At the same time, clearly with Kol-
mogorov [27] and [Feller 1936c], as well as Risser [36], in mind he came up with
44 On the title page of the Rad-version of [Feller 1939d] there is a note, “Napisao član
dopisnik” [“written by corresponding member”], and the acceptance date (22 November
1937) is given. This, curiously, only appears in the original Croatian version of [Feller 1939d],
cf. also Seneta [40, p. 87 f.] and Žubrinić [47, p. 45 f.; he confirms that Feller became
a corresponding member in 1937].
45 Possibly, also because of the more or less latent anti-Semitism in Sweden, see the
comments and excerpts from Feller’s letters from 1934 and 1939 in Siegmund-Schultze [41,
pp. 135 f.].
46 Reingold [35, pp. 178, 184]
47 Bers [3, p. 240]
48 Reingold [35, p. 197], Reid [34, p. 228 f.]
ment after World War II; a thorough discussion of this historical phenomenon
can be found in Siegmund-Schultze [41].
The main thrust of Feller’s research continued to involve limit theorems
and Markov processes. In [Feller 1943c] he strengthens the results of Kol-
mogorov’s paper on the law of the iterated logarithm (LIL), giving necessary
and sufficient criteria for norming functions to be in Lévy’s upper and lower
classes. Feller would come back to this theme time and again. Mark Kac49
called this paper a “veritable tour de force”, and it can be argued that this
paper is still unsurpassed in its completeness and its methods. The paper is
essentially self-contained, relying only on his own previous work [Feller 1943b]
where asymptotic estimates for the tails of sums of (bounded) independent ran-
dom variables were obtained. In a major survey paper [Feller 1945b] he reviews
developments on the LIL, CLT, and its “little brother”, the weak law of large
numbers (WLLN); he was also interested in applications of the CLT and the
WLLN, e.g. in connection with the St. Petersburg paradox [Feller 1945d], and
normal approximations of the binomial law [Feller 1945a]. His paper on con-
ditional probability functions [Feller 1940c] which determine a not necessarily
diffusive Markov process completed his seminal contribution in [Feller 1936c],
and actually introduced the term “Markoff process”. Later, Doob50 would say
that
given a long course on aspects of the Hille–Yosida theory of semigroups in the context
of probability theory” and he goes on to describe how “semigroup analysis” helped him (Kendall) to
understand and work (with Harry Reuter) on general Markov chains [25, p. 178].
56 The interpretation of the boundary conditions in terms of the processes in [Feller 1954b]
is one notable exception. In general, however, “[h]e was one of the first generation who
thought probabilistically [. . . ], but when it came to writing down any of his results for
publication, he would chicken out and recast the mathematics in purely analytic terms”
(Rota [38, p. 227]). The same pattern can already be observed in Feller’s early contributions
to probability theory, see Fischer [16].
57 notably in [Feller 1952a, Feller 1952b, Feller 1952e, Feller 1953a, Feller 1953b,
Feller 1959c]
58 Yosida, review MR0047886 (13,948a) of [Feller 1952a].
59 including [Feller 1952a, Feller 1954a, Feller 1954b, Feller 1955a, Feller 1955b,
Feller 1957a, Feller 1957d, Feller 1958]
At about the same time, building on his ideas from [Feller 1939a], he
presents a groundbreaking paper at the Second Berkeley Symposium on Math-
ematical Statistics and Probability [Feller 1951d], paving the way for non-
trivial applications of stochastic processes, in particular in genetics. At the
end of the decade, he connects his diffusion and semigroup theory with birth
and death processes [Feller 1959b] – an idea which was already present in the
1939 paper. Feller’s pioneering work on mathematical biology is among his
most influential contributions to science, see the commentary by Baake and
Wakolbinger [2] included in these Selecta.
At Princeton he was able to attract many exceptionally talented gradu-
ate students and young researchers to either work with him or to be strongly
influenced by his work. Among his Princeton PhD students were Patrick
Billingsley (1955), Henry McKean (1955), Hale Trotter (1956), Frank Knight
(1956), David Freedman (1960), Lawrence Shepp (1961), Martin Silverstein
(1965), Benjamin Weiss (1965) and Loren Pitt (1967); the Mathematics Ge-
nealogy Project60 lists 22 students and 1043 descendants. Many established
international visitors were attracted to Princeton as well. For example, Kiyosi
Itô visited the Institute for Advanced Study in Princeton from 1954 to 1956,
and almost immediately started working with Henry McKean, then one of
Feller’s graduate students, and soon they both started exchanging their ideas
with Feller. Their celebrated work on diffusion processes was in many ways
influenced by Feller’s programme: In Itô and McKean’s words “W. Feller has
our best thanks, his ideas run through the whole book.”61 In his MR book
review, S. Watanabe wrote:
Around 1955, W. Feller’s work on linear diffusion [MR0047886 (13,948a)
[Feller 1952a]; MR0068082(16,824g) [Feller 1955a]; and many oth-
ers], which was primarily of analytic character, spurred some out-
60 genealogy.math.ndsu.nodak.edu/id.php?id=33019 (accessed 5 August 2014).
61 Itô and McKean [23, p. XI]
This nicely illustrates how Feller handled his interactions with his students. He
never competed with them head-on but carefully led them onto paths of research
that were independent of, yet still related to, his own work.
The 1950s were not only an exceptionally productive period for Feller but
they also marked a breakthrough in the appreciation of his work in a wider
mathematical and scientific community. At the age of only 45 he was appointed
to a named chair at Princeton University, which was just about to enter the top
ranks of the mathematical world, taking over the role Göttingen had played
before the war. In 1958 he was a plenary lecturer at the International Congress
of Mathematicians (ICM) in Edinburgh and in the same year he was named
a Fellow of the American Academy of Arts and Sciences. Two years later,
William Feller became an elected member of the National Academy of Sciences
(USA)63, and among the many signs of international recognition was
his appointment to serve on the Fields Medal Committee for the 1966 ICM in
Moscow.
Feller continued working on his benchmark monograph. The second, slightly
enlarged, but substantially changed, edition of Volume 1 appeared in 1957
and the preparations for the much more advanced Volume 2 (which would
appear in 1966) must have been in full swing. The content of both volumes
reflects Feller’s research interests: for example, the material from the papers
on fluctuations and coin tossing [Feller 1949b, *Feller 1951c, *Feller 1957b,
*Feller 1959a] has been included in Volume 1, while Volume 2 would establish
semigroup theory as a tool in probability, and it is still one of the most
complete discussions of limit theorems.
The 1958 ICM at Edinburgh showcased Feller at the height of his ca-
reer. In the one-hour plenary talk [Feller 1960] he summed up his bound-
ary theory, reviewed its connections with Martin boundaries, and explained
the intimate link between probability and potential theory. But after the
Congress had adjourned he left the field which he had initiated over the
past decade, and turned his attention again to limit theorems and appli-
cations. As in Stockholm, a quarter-century earlier, he started interacting
again with scientists and engineers. The expository paper [*Feller 1961c] was
aimed at engineering students and he returned to research in mathematical
population biology and genetics: [Feller 1966b, Feller 1967c, *Feller 1969a,
*Feller 1969f]. Jointly with S. Orey he studied renewal theorems for random
walks [Feller 1961a, Feller 1961b], and strong ratio theorems [Feller 1966a]. In
the field of limit theorems he returned to the topics he investigated in the
62 S. Watanabe, review MR 0199891 (33 #8031) of the monograph [23] by Itô and McKean.
63 Rosenblatt [37, p. 12]
tionary synthesis within modern evolutionary biology. It is worth pointing out that Feller’s
publications in mathematical biology were strongly influenced by his work.
65 Rosenblatt [37, p. 12]
66 Cramér [7, p. 436]
67 Halmos [20, p. 94]
68 McKean quoted in Rosenblatt [37, pp. 13 f.]
69 Rota [38, p. 227]
in the class “with a set chin and an earnest look on his face, trying to elicit a
slight nod from people, and he would often get such a nod.”70 In this context
Mark Kac coined the expression “proof by intimidation”.71 Feller did not
adopt Landau’s Definition – Theorem – Proof style for his lectures and he did
not routinely use the headlines “Theorem” or “Proof” on the blackboard.72
We do not know when exactly Feller learned about his terminal illness, but
we do know that he was aware of it for a while:
Having accepted the verdict himself he tried to make it easy for all
of us to accept it too. He behaved so naturally and he took such
interest in things around him that he made us almost forget from
time to time that he was mortally ill.73
“[t]he manuscript had been finished at the time of the author’s death but no
proofs had been received” and proofreading, indexing and final touches were
done by Feller’s students.74
Acknowledgement. Many colleagues supported us while we were writing this
biographical sketch. In particular, we would like to thank Christian Berg for
supplying Feller’s letters to Borge Jessen, Joanne Elliott for sharing with us her
recollections and supplying many photographs of Feller, Hans Fischer, Niels
Jacob, Joseph Rosenblatt and Ken-iti Sato for critical remarks and stimulating
discussions. Tony Knapp kindly told us about Feller’s teaching at Princeton
and he shared with us his personal lecture notes of some of Feller’s classes. Our
Croatian colleagues, Nikola Sandrić, Hrvoje Šikić and Zoran Vondraček helped
us with many documents and sources in Croatian which would otherwise have
been inaccessible to us. Part of this exposition is based on Hrvoje Šikić’s arti-
cle [42] on Feller. We are also indebted to Ulrich Hunger from the Staats- und
Universitätsbibliothek Göttingen who provided scans of the Promotionsakte,
to Ms. Christiane Weber of TU Dresden for digitizing the photographs and
to Ms. Sarah Hamdouchi of the Einwohnermeldeamt Flensburg, Germany, for
checking the birth register of the city of Flensburg.
References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
[1] Promotionsakte [dissertation file] “Willy Feller”. Universitätsarchiv Göt-
tingen, Signatur [Math Nat Prom 0010, 23].
74 [*Feller 1971, p. xi]; J. Goldman, A. Grunbaum, H. McKean, L. Pitt and A. Pittenger
[47] Žubrinić, D.: Vilim Feller / William Feller (Croatian and English).
Graphis, Zagreb 2010.
Introduction
During the first years of the 1930s, measure and integration as well as the
theory of probability began to gain the shape which we are used to these
days. Monographs like Saks’s “Theory of the Integral” (1st edn. 1933 [38]1 in
French, 2nd edn. 1937 [40] in English) and Kolmogorov’s “Grundbegriffe der
Wahrscheinlichkeitsrechnung” (1933, [27]) still make a very “modern” impression
on the reader. Feller’s famous probabilistic papers from the second half
of the 1930s on limit theorems and related topics are, in accord with his first
publications, rather within the scope of “classical” analysis, and this fact cer-
tainly contributed to the popularity of these accounts. Therefore, it is a little
surprising, at first glance, that he had, from the beginning of his mathematical
career, very strong interests in measure and integration, and related proba-
bilistic problems. This interest is also reflected by his activities as a reviewer
in the “Jahrbuch über die Fortschritte der Mathematik”, and, especially, the
“Zentralblatt”; the latter had been founded in 1931 by his friend Otto Neugebauer,
whom Feller had known since their time together in Göttingen. Altogether,
Feller wrote 7 longer reviews in these journals during the 1930s, among them
on the above-mentioned “Grundbegriffe”, and on the likewise very influential
survey “Asymptotische Gesetze” by Khinchin [26], which appeared in 1933
as well. Feller’s report (Zbl 0007.21601) on Kolmogorov’s booklet is – as a
1 The references [Feller 19nn] and [*Feller 19nn] (the star indicating that the respective
paper is not contained in these Selecta) refer to Feller’s bibliography, while [n] points to the
list of references at the end of this essay.
2 For an outline of the history of measure theory from ca. 1900 to ca. 1950, see [36].
A wealth of historical details can be found in [17]. Especially with regard to the relations
between the development of probability theory and measure theory, see [23] and [42].
where only finitely many $\lambda_k$ are not zero. In this sense, the study of certain
sets of natural numbers can be transferred to the study of sequences
$(\lambda_k)$, and the idea is to interpret densities of numbers with certain properties
as measures in certain sequence spaces. This was the point where Tornier’s
and Feller’s collaboration began. The papers “Maß- und Inhaltstheorie des
Baireschen Nullraums” [*Feller 1932a] and “Mengentheoretische Untersuchung
von Eigenschaften der Zahlenreihe” [*Feller 1932b] have to be considered as
closely connected, as one can see from the summarizing report on both papers
[*Feller 1931], and also from their common submission date, 10 July 1931.
The Baire null space $R$ is defined [*Feller 1932a, p. 166] as a space of
sequences
\[ x = e_1^{(\rho_1)} e_2^{(\rho_2)} e_3^{(\rho_3)} \dots \]
of “symbols” $e_i^{(\rho_i)}$ ($i = 1, 2, 3, \dots$), where the symbols with the integer variables
$\rho_i$ are elements of sets $\{e_i^{(\rho_i)} \mid 0 \le \rho_i \le t_i\}$ with fixed constants $t_i \in \mathbb{N}_0 \cup \{\infty\}$
such that at most finitely many $t_i$ are equal to 0. In the most common case
all these sets are simply identical to $\mathbb{N}_0$, and the Baire space is the space $\mathbb{N}_0^{\mathbb{N}}$
consisting of all sequences of non-negative integers. Each of the more general
Baire null spaces
\[ \{e_1^{(\rho_1)}\} \times \{e_2^{(\rho_2)}\} \times \cdots \]
can be identified with a subspace of $\mathbb{N}_0^{\mathbb{N}}$.3 The “distance” of two “points”
$x = (e_i^{(\rho_i)})$ and $y = (e_i^{(\sigma_i)})$ in $R$, written symbolically as $\overline{xy}$, is defined by
\[ \overline{xy} := \frac{1}{n} \]
if $n$ is the smallest number such that $\rho_n \ne \sigma_n$. Two sequences which are
3 In this respect, the use of the symbols $e_i$ was unnecessarily complicated. Intricate notation
was a specialty of Tornier.
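The distance on the Baire null space, which equals 1/n when n is the first index at which two sequences differ, can be illustrated by a minimal Python sketch. It works on finite prefixes of the index sequences (ρ_1, ρ_2, ...) only; the function name `baire_distance` and the convention of returning 0 for identical prefixes are assumptions made for this illustration and are not taken from Feller and Tornier's paper.

```python
from fractions import Fraction


def baire_distance(x, y):
    """Return 1/n, where n is the first (1-based) index at which x and y differ.

    x and y are finite prefixes of index sequences (rho_1, rho_2, ...).
    Identical prefixes give 0, an assumption needed for finite data.
    """
    for n, (rho, sigma) in enumerate(zip(x, y), start=1):
        if rho != sigma:
            return Fraction(1, n)
    return Fraction(0)


x = (3, 1, 4, 1, 5)
y = (3, 1, 4, 2, 5)  # differs from x first at index 4
z = (3, 7, 4, 2, 5)  # differs from x and y first at index 2

print(baire_distance(x, y))  # 1/4
print(baire_distance(x, z))  # 1/2
# A first-difference distance of this kind satisfies the strong triangle inequality.
print(baire_distance(x, z) <= max(baire_distance(x, y), baire_distance(y, z)))  # True
```

Two sequences are close in this sense exactly when they share a long common initial segment, which is why sets of sequences with prescribed first coordinates play the role of basic sets.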
(ii) If an $n$th grade basic set $E$ is decomposed into basic sets $E^{(i)}$ of $(n + 1)$st
grade (whose elements coincide with respect to their first $n$ coordinates
with all elements of $E$), then
\[ |E| = \sum_i |E^{(i)}|. \]
\[ \underline{A} = |R| - \overline{(R \setminus A)}. \]
for almost all (x, y), provided that f is locally summable. Strictly speaking,
the constraint 0 < α < |h/k| < β was made explicit only by Lebesgue (1910, [30,
p. 363]) in the introduction of his comprehensive article on relations between
differentiation and integration in several dimensions, in which also Vitali’s
result was considerably extended. In this context, Lebesgue’s main device was
the generalization of an assertion in Vitali’s above-cited 1908 paper. His (and
others’) generalized versions of this assertion would later simply be called
“Vitali’s covering theorem”. Lebesgue [30, p. 390 f.] first defined what he
termed a “regular family”: A family F of Lebesgue measurable sets is called
In his proof, Lebesgue made decisive use of Vitali’s theorem, for which, in
turn, the assumption of regular families seemed to be indispensable.5
In a footnote on pp. 362–363 of his 1910 paper, Lebesgue stated that neither
the validity nor the invalidity of (3) without the constraint 0 < α < |h/k| < β
corresponding to the assumption of regular sequences of two-dimensional intervals
had yet been proven. Moreover, Harald Bohr and Stefan Banach later showed
that already in the case of certain higher-dimensional intervals which do not
fulfill the assumption of “regular families” a countable “Vitali covering” is
not possible, as is also hinted at in [Feller 1934a, p. 227].6 Therefore, it
5 Montel and Rosenthal [33, pp. 1132–1134] give an excellent description of Lebesgue’s
main ideas.
6 Banach (1924, [2]) constructed a bounded set $F \subset \mathbb{R}^2$ with Lebesgue measure $m(F) = 1$
such that to each point of this set a particular sequence of rectangles exists whose edges are
parallel to the coordinate axes, and which contract to this point. From the family of these
rectangles, however, each countable subfamily consisting of mutually disjoint rectangles covers
a set of measure at most $\tfrac{1}{2} m(F)$. Bohr constructed another counterexample,
again in relation to two-dimensional intervals, which he had already communicated
The system R is assumed to contain for each set ρ also all the other sets that
can be generated from ρ by an affine mapping composed of a translation and
a uniform scaling.8 With respect to R the assertion (4) holds for any – even
unbounded – locally summable function f if and only if for any positive numbers
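$a_1, \dots, a_n$ and any arbitrary disjoint, bounded and Lebesgue measurable sets
$\kappa_1, \dots, \kappa_n$ the union $s$ of all sets $\rho \in R$ with
\[ \sum_{\nu=1}^{n} a_\nu \, m(\kappa_\nu \cap \rho) > m(\rho) \]
satisfies
\[ m(s) < C \sum_{\nu=1}^{n} a_\nu \, m(\kappa_\nu), \tag{5} \]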
The validity of (5) can be shown without any use of Vitali’s covering the-
orem for systems of “standard” sets, e. g., in the two-dimensional case for
circles, squares, rectangles with bounded ratios of edge lengths. A significant
consequence of Busemann and Feller’s main theorem is due to the fact that
(5) may already fail if R is the system of any arbitrary rectangles parallel
to the coordinate axes [Feller 1934a, p. 255]. Since Vitali’s covering theorem
for a system R is sufficient for the differentiability of indefinite integrals with
respect to sequences from R [Feller 1934a, p. 254], it can be inferred that this
theorem does not hold if R is assumed to consist of rectangles parallel to the
coordinate axes without any restriction on the ratios of their edge lengths. By
this argument, an alternative proof for the assertion of Banach and Bohr (see
above) is possible.
Vitali’s own proof of the covering theorem, as well as the modifications
of this proof by Lebesgue (see above), Banach [2], or Carathéodory (see [8,
pp. 299–307]) used the axiom of choice, but only in its “weak” denumerable
version. Despite the controversial role of this axiom with regard to the founda-
tions of set theory, it seems that its use (at least in its denumerable form) was
not seen as problematic in connection with the just described problems. This
even applied to Lebesgue, who was, in principle, rather critical towards the
axiom of choice (see [34, pp. 95 f.; 314–317]). Busemann and Feller do not use
Vitali’s theorem, but their considerations are also based on the denumerable
axiom of choice at several places (see, e. g., [Feller 1934a, pp. 231 f.; 255]). In
this sense they share the usual pragmatic attitude of their colleagues working
in measure theory.
Kollektivs
The two articles [*Feller 1938a], [Feller 1939c] on kollektivs are important contributions
to foundational questions of mathematical probability, which also
reflect their author’s intention of unifying the concepts of chance and measure.
9 Saks’s proof [39] for this failure can be found in the immediate sequel of Busemann and
and Pauc [20, pp. 222; 252–255] after some time of informal use. For a classification of
different “halo properties”, see [6, pp. 7–10].
13 There are a good many surveys of the history of kollektivs up to the time around 1980.
The most comprehensive account is still [3]. For a concise introduction see [52, pp. 183–197].
where the prime factorization of the natural number $q$ only contains prime
factors $> p_k$. Such number sequences have the density
\[ d(\lambda_1, \dots, \lambda_k) = \frac{1}{p_1^{\lambda_1} p_2^{\lambda_2} \cdots p_k^{\lambda_k}} \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right). \tag{7} \]
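The density formula (7) can be checked numerically for small cases. The following Python sketch, given here only as an illustration, counts the natural numbers up to a bound whose exponents of the first k primes are exactly λ_1, ..., λ_k and compares the resulting proportion with the product formula. The function names and the chosen bound are assumptions of this sketch; a finite count only approximates a density in general, although in the example below the bound is a multiple of the period 12, so the two values coincide.

```python
from fractions import Fraction


def exponent(n, p):
    """Exponent of the prime p in the factorization of n."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e


def density_formula(primes, lambdas):
    """Right-hand side of (7) for the primes p_1..p_k and exponents lambda_1..lambda_k."""
    d = Fraction(1)
    for p, lam in zip(primes, lambdas):
        d *= Fraction(1, p**lam) * (1 - Fraction(1, p))
    return d


def empirical_density(primes, lambdas, limit):
    """Proportion of n <= limit having exactly the prescribed exponents."""
    hits = sum(
        all(exponent(n, p) == lam for p, lam in zip(primes, lambdas))
        for n in range(1, limit + 1)
    )
    return Fraction(hits, limit)


# Numbers of the form 2^1 * 3^0 * q, where q has only prime factors > 3.
primes, lambdas = (2, 3), (1, 0)
print(density_formula(primes, lambdas))          # 1/6
print(empirical_density(primes, lambdas, 6000))  # 1/6
```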
Feller and Tornier’s main goal was to develop a criterion for the existence of
the density of an arbitrary sequence of natural numbers among all natural
numbers, based on the knowledge of the density (7) for particular sequences.
To this end they considered – in a quite intricate way – matrices with infinitely
many rows and columns, of which the $j$th column with the entries $e_1^{(\lambda_{1j})}, e_2^{(\lambda_{2j})}, \dots$
corresponded to a point in the Baire null space. In this way, each of those
An excellent review of pertinent work during the 1930s is [32]. An easily accessible and very
good survey up to the most recent developments is [4]. Especially for Tornier’s theory see [21], [22].
(1) The set $\mathcal{F}$ of all matrices $F$ such that for each basic set $E \subset R$ the density
of $E$ within $F$ (i. e., the limit as $n \to \infty$ of the number of all columns (up to
the $n$th) which are points of $E$ divided by $n$) is equal to $|E|$, is nonempty.
In the case of the measure (7) this set is nonempty simply because the
matrix corresponding to $\mathbb{N}$ is in accord with this condition. In the general
case the proof for the existence of such matrices was part of a proof for an
even more specific assertion [*Feller 1932b, pp. 204–206].
(2) A subset $A$ of the Baire null space $R$ has a density within all matrices $F$
from the above-characterized set $\mathcal{F}$ if and only if $A$ is Peano–Jordan measurable
with respect to the algebra of clopen sets and the measure $|\cdot|$; then the measure
of the set is equal to its density [*Feller 1932b, pp. 208–214].
14 This notion of content had been introduced by Peano (1887, [35, Chapter 5]) in a
exist and are equal to P(Γ), it is necessary and sufficient that Γ ∈ ζ1 , i. e., that
Γ is Peano–Jordan measurable.
As we have seen, this had already been shown in Feller’s joint article with
Tornier [*Feller 1932b] in the special case of product measures.
15 Only in [Feller 1939c, p. 90] was Kolmogorov’s version of the extension theorem explicitly
referred to, and also – rather vaguely – brought into connection with Banach’s name.
Possibly, Feller hinted at the so-called Hahn–Banach extension theorem for functionals in
normed vector spaces, which in fact can be used, via characteristic functions of sets, for
extending simply additive set functions defined on a set algebra consisting of subsets of a
certain set Ω to all subsets of Ω (see [31, p. 255]).
being its outer and inner Peano–Jordan measure with respect to P and the set
algebra ζ1 . Then for each matrix f ∈ ϕ we have
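\[ \underline{P}(\vartheta) \le \liminf_{k\to\infty} \frac{1}{k} N_k(\vartheta; f) \le \limsup_{k\to\infty} \frac{1}{k} N_k(\vartheta; f) \le \overline{P}(\vartheta). \]
Feller now refers to a theorem of Tornier (1933, [46, p. 312]) to the effect that
for each number $c$ with
\[ \underline{P}(\vartheta) \le c \le \overline{P}(\vartheta) \]
there exists an $f \in \varphi$ such that
\[ \lim_{k\to\infty} \frac{1}{k} N_k(\vartheta; f) = c. \tag{8} \]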
As a consequence, if ϑ does not belong to ζ1 , then there exists a nonempty
system Φ ⊂ ϕ of matrices f for which (8) holds. It is easily shown that all sets
Λ ⊂ R such that
\[ p(\Lambda) := \lim_{k\to\infty} \frac{1}{k} N_k(\Lambda; f) \]
exists for all f ∈ Φ, form a set algebra ζ2 ⊃ ζ1 , where ϑ is an element of ζ2 . The
set function p(Λ) can be considered as an – additive but not sigma-additive –
extension of P from ζ1 to ζ2 . In this sense there exist infinitely many different
extensions of P from ζ1 depending on the particular choice of the adjoined set
ϑ and the choice p(ϑ) = c.
In this context Feller (p. 18) points out the common ground of Tornier’s
and Kolmogorov’s approaches. Both refer, in a first step, to a set algebra $\mathcal{A}_1$
endowed with a probability measure $P$ which is sigma-additive in the sense
that for each sequence $(A_i) \subset \mathcal{A}_1$ of mutually disjoint sets:
\[ \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}_1 \implies P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i). \]
holds for each infinite subsequence $(m_{n_i})$ of $(m_n)$ which is generated according
to any of the selection rules $S$. In the latter limit relation $H_n(L, (m_{n_i}))$ stands
for the relative frequency of occurrences of elements from $L$ within the first
$n$ members of the subsequence. $W_L$ was called the “probability of $L$ in the
kollektiv $K(S, M)$” by Wald.
The main result of Wald’s 1937 paper [53, p. 46] is as follows:
We note first that Wald expressed the limit behavior of subsequences (see
“Condition II”) in a considerably simplified mode compared with von Mises.
Wald [53, pp. 41 f.] proved, however, that his version was equivalent to von
Mises’s. A second important remark is that Wald’s treatment of kollektivs
was constructive with the only exception that the selection procedures could
also include non-constructive elements. In this way he gave a direct proof that
the cardinality of kollektivs was (at least) the cardinality of the continuum.
We have to note further that Wald in contrast to Tornier (and Feller) only
demanded simple additivity for the probability measure μ within the “basic”
set algebra K. He did not show the necessity of Peano–Jordan measurability,
however. He [53, p. 47] only stated that “further weakening” the assumptions
would “scarcely be of any interest”. By the remark that there existed at most
“countably many mathematical laws”, Wald (same place) justified the consid-
eration of only countably many selection rules. Without altering Wald’s results
and arguments, Church (1940, [9]) would give an exposition relating to a pre-
cise definition of “constructiveness” on the basis of the theory of computability
(recursiveness, lambda-calculus).
Prior to Wald, especially Arthur H. Copeland had contributed to a formal
elaboration of von Mises’s ideas in a series of papers between 1928 and 1937.
In his 1937 paper [12] (already submitted in 1933) he came, regarding gener-
ality, rather close to Wald’s achievement, with a basically similar approach.
holds. As one can see quite easily, all sequences being in accord with Wald’s
conditions are admissible; the converse, however, is not true. In the context
of the notion “admissible” in its full generality, Copeland (1931, [11]) also realized the
importance of restricting the considered sets to those which are Peano–Jordan
measurable. In the particular case M = {0, 1} and μ according to a Bernoulli
experiment with probability p for the outcome “1”, the limit relation above is
equivalent to
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \prod_{i=1}^{k} m_{r_i + js} = p^k, \tag{9} \]
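The averaging in relation (9) can be made concrete with a small Python experiment. This is only a sketch under stated assumptions: a seeded pseudo-random 0/1 sequence stands in for the sequence $(m_n)$, which is not the same thing as an admissible sequence in Copeland's sense, and the particular offsets $r_i$, the step $s$ and the condition $r_i \le s$ used below are choices made for this illustration, since the precise conditions on these indices are not restated here.

```python
import random


def copeland_average(m, r, s, n):
    """(1/n) * sum_{j=0}^{n-1} prod_i m[r_i + j*s] for a 0/1 list m.

    The offsets r_i are 1-based, as in relation (9); Python lists are 0-based.
    """
    total = 0
    for j in range(n):
        prod = 1
        for ri in r:
            prod *= m[ri + j * s - 1]
        total += prod
    return total / n


rng = random.Random(0)           # fixed seed, so the run is reproducible
p = 0.3                          # probability of the outcome "1"
r, s, n = (1, 2, 4), 5, 100_000  # hypothetical offsets r_i <= s and step s
length = max(r) + (n - 1) * s    # number of sequence members needed
m = [1 if rng.random() < p else 0 for _ in range(length)]

avg = copeland_average(m, r, s, n)
print(avg, p ** len(r))  # the empirical average should be close to p^k
```

With these choices the index blocks for different $j$ do not overlap, so the products behave like independent trials with success probability $p^k$, and the average settles near that value.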
PhD thesis, which was not entirely completed at this time. Ville (1939, [48, pp. 34–38])
would give a detailed account, but only for kollektivs corresponding to a Bernoulli process.
References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
[1] Baire, R.: Sur la représentation des fonctions discontinues, deuxième par-
tie. Acta Mathematica 32 (1909) 97–176. Reprinted in: P. Lelong (ed.):
Œuvres scientifiques, Gauthier–Villars, Paris 1992, pp. 377–456.
[3] Bernhardt, H.: Richard von Mises und sein Beitrag zur Grundlegung
der Wahrscheinlichkeitsrechnung im 20. Jahrhundert. Dissertation (B),
Humboldt-Universität, Berlin 1984.
[4] Bienvenu, L., Shafer, G., and Shen, A.: On the history of martingales
in the study of randomness. Electronic Journal for History of Probability
and Statistics 5, n° 1 (2009).
[7] Cantelli, F.: Sulla probabilità come limite della frequenza. Atti della Reale
Accademia dei Lincei–Rendiconti 26 (1917) 39–45.
[9] Church, A.: On the concept of a random sequence. Bulletin of the Amer-
ican Mathematical Society 46 (1940) 130–135.
[14] de Guzmán, M.: The evolution of some ideas in the theory of differenti-
ation of integrals. In: J. A. Barroso (ed.): Aspects of Mathematics and its
Applications. Elsevier, Amsterdam 1986, pp. 377–385.
[42] Shafer, G. and Vovk, V.: The sources of Kolmogorov’s Grundbegriffe. Sta-
tistical Science 21 (2006) 70–98.
[43] Siegmund-Schultze, R.: Sets versus trial sequences, Hausdorff versus von
Mises: “Pure” mathematics prevails in the foundations of probability
around 1920. Historia Mathematica 37 (2010) 204–241.
[49] Vitali, G.: Sui gruppi di punti e sulle funzioni di variabili reali. Atti della
Reale Accademia delle Scienze di Torino 43 (1908) 229–246. Reprinted
in: Opere sull’analisi reale e complessa, Ed. Cremonese, Firenze 1984,
pp. 257–276.
[50] Volterra, V.: Sopra le funzioni che dipendono da altre funzioni. Atti della
Reale Accademia dei Lincei–Rendiconti 3 (1887) 97–105. Reprinted in:
Opere matematiche, Vol. 1, Accademia Nazionale dei Lincei, Roma 1954,
pp. 294–314.
[52] Von Plato, J.: Creating Modern Probability. Cambridge University Press,
Cambridge 1994.
Feller’s papers on the central limit theorem (1935) and the weak law of large
numbers (1937), henceforth abbreviated by CLT and WLLN, prepared the
ground for his high reputation. Both papers, as well as further articles on
related topics from the same period, were written in the style of classical
analysis. Probabilistic concepts and notations were used in a very restrained
mode only, and this circumstance contributed, at a time when probability
theory did not belong to the central topics of the mathematical canon, to the
popularity of Feller’s work on limit theorems.
Historical scope
Since the early days of probability theory, the WLLN and CLT had been
crucial parts of its development.1 Jakob Bernoulli by giving explicit estimates
showed in his Ars conjectandi (1713, [6]2 ) that, expressed in modern terms,
in a Bernoulli process with success probability p the relative frequency hn of
successes among n trials obeys a limit relation of the form
1 Comprehensive historical accounts on this part of the history of stochastics, which the
reader may consult for details, are [1],2 [18], [20]. For Feller’s and Lévy’s work on the CLT
around 1935, in particular, see [29].
2 The references [Feller 19nn] and [*Feller 19nn] (the star indicating that the respective
paper is not contained in these Selecta) refer to Feller’s bibliography, while [n] points to the
list of references at the end of this essay.
Laplace did not elaborate a general mathematical theory of the CLT but
he derived his approximations in each particular situation again and again,
though by the same methods in each case. Laplace also began to consider
CLTs for higher-dimensional r.vs. Despite some significant problems regard-
ing details, this situation was however always viewed as a by-product of the
one-dimensional case by him and his successors. Therefore, in this survey the
multi-dimensional case is omitted. Laplace’s CLT immensely enlarged the field
of applications of probability theory, as in error theory, hypothesis testing, or
also in an early approach to a theory of risk. All these achievements were col-
lected in Laplace’s Théorie analytique des probabilités [28], the most significant
monograph of probability theory during the 19th century, whose first edition
appeared in 1812. By considering probabilities
\[
P\Bigl(\Bigl|\sum_{k=1}^{n}(X_k - EX_1)\Bigr| \le na\Bigr) \approx \frac{2}{\sigma\sqrt{2\pi}}\int_0^{a\sqrt{n}} e^{-\frac{x^2}{2\sigma^2}}\,dx \approx 1
\]
for “very large” n, Laplace also derived assertions corresponding to the weak
law of large numbers, and he interpreted them in the sense of regularities in
nature and society which appear in the long run.
Today the WLLN is often connected with Poisson’s name. This math-
ematician extended Laplace’s CLT towards non-identically distributed, uni-
formly bounded and independent variables (“quantités variables”) Xk by the
following “approximation”:
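\[
(1)\qquad P\Biggl(-\gamma \le \frac{\sum_{k=1}^{n}(X_k - EX_k)}{\sqrt{2\sum_{k=1}^{n}\operatorname{Var}X_k}} \le \gamma\Biggr) \approx \frac{1}{\sqrt{\pi}}\int_{-\gamma}^{\gamma} e^{-u^2}\,du \qquad (n \gg 1).
\]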
Already with Laplace, the CLT and WLLN had gained, beyond their sig-
nificance in applications of probability theory, a purely mathematical quality,
due to the specific analytical methods employed, such as Fourier methods
and procedures for approximating “functions of large numbers”. This trend
was intensified in later investigations, such as Cauchy’s – even according to
modern standards rigorous – proof of a CLT [11] (under quite restrictive as-
sumptions, however), and was especially emphasized in Chebyshev’s pertinent
work. Pafnutii Lvovich Chebyshev tried to embed the CLT and WLLN into his
theory of moments, starting with an innovative proof [12] for the WLLN in the
form of (2) with the help of the now so-called Bienaymé–Chebyshev inequal-
ity for independent random variables Xk (simply “quantités” in Chebyshev’s
words):
\[
P\Biggl(\Bigl|\sum_{k=1}^{n}(X_k - EX_k)\Bigr| \le \alpha\sqrt{\sum_{k=1}^{n}\operatorname{Var}X_k}\Biggr) > 1 - \frac{1}{\alpha^2}.
\]
Bienaymé (1853, [10]) had derived an analogous inequality in the special situ-
ation of linear combinations of observational errors. In this way, the WLLN,
which up to this time had been a corollary of the CLT, reached autonomy.
Chebyshev’s version of the CLT (1887, [13]) is very notable already because he
replaced the somewhat vague assertions on “approximations” like (1), which
had been common up to his time, by a precisely stated limit relation. He
considered a sequence of “quantités” $u_1, u_2, u_3, \ldots$ (which were only tacitly
assumed to be mutually independent) with zero means such that the absolute
values $|Eu_k^s|$ of all moments of order $s \in \mathbb{N} \setminus \{1\}$ were uniformly bounded for
all $k$ by a constant depending on $s$. Then, as Chebyshev argued, the limit
relation
\[
P\Biggl(t \le \frac{\sum_{k=1}^{n} u_k}{\sqrt{2\sum_{k=1}^{n} Eu_k^2}} \le t'\Biggr) \to \frac{1}{\sqrt{\pi}}\int_{t}^{t'} e^{-x^2}\,dx \qquad (n \to \infty)
\]
was valid. In order to prove this, Chebyshev tried to compare the moments
of the normed sum and the normal limit distribution for n → ∞, but he did
not succeed in giving entirely sound arguments. The flaws of his proof were
eventually eliminated by his disciple Andrei Markov in 1898/99 [42], [43], after
having introduced the additional assumption $\sum_{k=1}^{n} Eu_k^2/n > \alpha > 0$ for all $n$.
Vn (x) → V (x) (n → ∞)
for independent X1 , X2 .
In view of the beginning dominance of characteristic functions it was sur-
prising that Jarl Waldemar Lindeberg [38] succeeded in 1922 in proving the
CLT by a very elementary and direct method under very weak assumptions
which later even turned out to be necessary in the case of uniform “smallness”
of all summands:
During the first half of the 1920s another important aspect of modern limit
theorems emerged: General norming. In his research on stable limit distribu-
tions for sums of independent identically distributed r.vs, Lévy (see, e. g., [31])
considered sums of the form $\sum_{k=1}^{n} X_k/a_n$ with positive “constants” $a_n$. This
included normal limit distributions as a particular case, and in this way a CLT
for independent r.vs even with infinite second order moments was likewise es-
tablished. In this case, norming by means of the standard deviation rn , as
in (3), is not possible. For the time being, shift constants were not explicitly
considered by Lévy. Already in 1922, Sergei Bernshtein [7] published a note in
which the assertion of the CLT was not only extended from purely indepen-
dent to “almost independent” r.vs (including r.vs which form Markov chains),
but also to r.vs without necessarily existing second moments by considering
truncated variables (a comprehensive account [8] appeared in 1926). Instead
of “classical” normed sums
\[
\sum_{k=1}^{n}(X_k - EX_k) \Bigg/ \sqrt{\sum_{k=1}^{n}\operatorname{Var}X_k}\,,
\]
and
\[
(5)\qquad P\Biggl(\frac{\sum_{k=1}^{n}(X_k - b_k)}{a_n} \le x\Biggr) \to \Phi(x) \qquad (n \to \infty),
\]
it is necessary and sufficient that
\[
\forall \delta > 0:\quad \lim_{n\to\infty} \frac{p_n^2(\delta)}{\sum_{k=1}^{n}\int_{|x|\le p_n(\delta)} x^2\,dV_k(x)} = 0.
\]
and, given any probability γ ∈ [0, 1), the dispersion of X with respect to γ is
defined as
\[
\varphi_X(\gamma) := \inf\{x \in (0, \infty) \mid f_X(x) \ge \gamma\}.
\]
Roughly speaking, the concentration function gives the maximum probability
belonging to any interval of a given length l, and the dispersion function is
the minimum interval length belonging to a certain probability level.3 In
this context, Lévy (1931, [32]) discovered that his new techniques could be
applied to the CLT, and he stated (without proof) the up to then most general
version of the CLT. On the sequence of independent r.vs (Xk ), Lévy imposed
a condition which was sufficient for the limit assertion (5) and already very
similar to the one he would state in the subsequent 1935 paper [33]; under the
UAN assumption it proved to be even necessary. This latter condition was:
(6) $\forall \varepsilon > 0:\; P\bigl(\max_{1\le k\le n} |X_k| > \varepsilon L_n\bigr) \to 0 \quad (n \to \infty)$,
where $L_n$ denotes the dispersion of $\sum_{k=1}^{n} X_k$ with respect to any fixed
probability level in (0, 1). Lévy already in 1931 alluded to the case of uniformly
3 More precisely, dispersion is the generalized inverse (continuous from the left) of fX .
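To make the two notions concrete, the following minimal Python sketch computes both for a distribution with finitely many atoms. It assumes that the concentration function is taken over closed intervals, $f_X(l) = \sup_x P(x \le X \le x + l)$, which is one common reading of Lévy's definition; the function names and the fair-die example are purely illustrative and do not stem from Lévy or Feller.

```python
from fractions import Fraction


def concentration(pmf, length):
    """Largest probability of any closed interval [x, x + length].

    pmf maps atoms to probabilities. For finitely many atoms the supremum
    is attained by an interval that starts at one of the atoms.
    """
    points = sorted(pmf)
    best = Fraction(0)
    for start in points:
        mass = sum(pmf[p] for p in points if start <= p <= start + length)
        best = max(best, mass)
    return best


def dispersion(pmf, gamma):
    """inf{ l in (0, oo) : concentration(pmf, l) >= gamma }."""
    # If a single atom already carries mass gamma, every l > 0 qualifies,
    # so the infimum over (0, oo) is 0.
    if max(pmf.values()) >= gamma:
        return 0
    points = sorted(pmf)
    # The concentration function can only jump at distances between atoms.
    gaps = sorted({q - p for p in points for q in points if q > p})
    for length in gaps:
        if concentration(pmf, length) >= gamma:
            return length
    raise ValueError("gamma must not exceed the total mass")


# Illustrative example: a fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
```

For the fair die, an interval of length 2 covers at most three faces, so `concentration(die, 2)` is 1/2, and conversely `dispersion(die, Fraction(1, 2))` is 2.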
by Lévy (Ln being defined as above), the condition (6) is equivalent to the
existence of sequences (an ) (an > 0) and (bk ) such that (5) is true. Lévy’s
assertion is equivalent to Feller’s “criterion”, and his arguments are sound (see
[18, pp. 291–296; 308–310]). Lévy [34, p. 107] claimed to have submitted
his article as early as the fall of 1934, whereas Feller’s paper was submitted
considerably later, in May 1935. As far as can be established some 80 years
later, Feller’s article also appeared around the turn of the year 1935/36 and,
in contrast to the statement in [Feller 1937a, p. 304, footnote 4], was by no
means issued significantly earlier than Lévy’s.5 Both accounts, Feller’s and
Lévy’s, differ so much regarding style, methods, and details that the question
of priority should not really have been of considerable significance.
Feller was usually very careful with references to other authors. According
to this habit, at several places of [Feller 1935c], Lévy’s achievements surround-
ing the “classic” CLT in Lindeberg’s version, the theorem on the convergence
of characteristic functions, and even the role of stable laws as limit distribu-
tions, are hinted at. In [Feller 1937a, p. 304, footnote 4], Lévy’s 1935 paper
[33] is referred to “after a friendly communication by Mr. P. Lévy”.6 Yet, just
at this point the problem begins: Feller only makes a remark on a particular
result of Lévy in the context of identically distributed r.vs. As it seems, Feller
did not realize the full significance of Lévy’s 1935 paper, which was – admit-
tedly – written in a very condensed and idiosyncratic style. If he had done
so he would have registered that the conditions “(7)” and “(8)” on p. 303 of
4 See [18, pp. 272–275] for more details.
5 Le Cam [29, p. 85] has carefully investigated the circumstances surrounding the pub-
lication of the just mentioned articles by Lévy and Feller. He was informed by Springer,
the publisher of Mathematische Zeitschrift, that the issue containing Feller’s article was
“abgeschlossen” (which probably means “ready for the printer”) on 8 November 1935. In the
Bibliographie der Deutschen Zeitschriftenliteratur, Vol. LXXVII (July to December 1935),
[Feller 1935c] is cited on p. 733. From the same source we can learn that the part up to
p. 628 of the Mathematische Zeitschrift 40 was issued in 1935. The last page of Feller’s CLT
article in the same volume 40 has the number 559. Within this volume, the particular issue
where Feller’s article appeared cannot be seen. The articles were apparently delivered in dif-
ferent paper layers, but a more precise organization cannot be discerned. A renewed inquiry
to the Springer archive yielded that minutes on the delivery of Mathematische Zeitschrift
are no longer available there. Le Cam (loc. cit.) already reports the same.
6 Apparently, there was some correspondence regarding the CLT between Lévy and Feller.
As it seems, letters are not preserved, though. Lévy did not systematically collect documents,
and, moreover, his pre-war private papers were destroyed during WWII [3, p. 1]. Also in
Feller’s case, letters addressed to him could not be found despite considerable effort by the
editors of these Selecta.
\[
\lim_{n\to\infty} \frac{X_n^2}{\sum_{k=1}^{n}\Bigl[\int_{|x|<X_n} x^2\,dV_k(x) - \bigl(\int_{|x|<X_n} x\,dV_k(x)\bigr)^2\Bigr]} = 0,
\]
whereas in Feller’s “(7)” the term $\bigl(\int_{|x|<X_n} x\,dV_k(x)\bigr)^2$ is missing. In fact,
Feller’s “criterion A” was erroneous in the general case, as hinted at in the
erratum to the paper [Feller 1937a].7
In his 1945 survey of the state of the art of probability theory, Feller in
several places quite fairly refers to Lévy’s achievements. The pertinent section
on the CLT [Feller 1945b, pp. 818 f.] starts with a rather lengthy account on
the advantages of general norming for sums of independent r.vs and ends with
the formulation of the CLT in Feller’s version, suggesting to the reader that all
these ideas are mainly due to Feller. Only in a footnote on page 820, is Lévy’s
version (according to [34, p. 107]) of the CLT literally quoted, but without any
comments on Lévy’s very special terminology, such that the reader is scarcely
able to understand the content. Feller at this place only states that Lévy’s
theorem “in a sense, should be equivalent” to his own.8
Lévy, on the other hand, was similarly restrictive in acknowledging Feller’s
results. In a footnote which was added to the discussion of necessary and suf-
ficient conditions for the CLT in his 1937 book [34, p. 107], he maintained that
Feller had only found again (“retrouvé”) the theorem. In the second edition
of this book, Lévy [35, p. 107] was friendlier, in writing – again in a footnote –
that Feller had discovered (“découverte”) the theorem independently of him.
Finally, in his autobiography Lévy [36, p. 108] wrote “Je n’aurai jamais eu de
chance avec la loi de Gauss”. In fact, the impact of Feller’s CLT was by far
higher than of Lévy’s. We will come back to this issue once again in the last
section.
ful formulation of the criterion”, indicating at the same time a possible future publication
by Doeblin, might refer to Doeblin’s paper [16], where on p. 51 a necessary and sufficient
condition for the CLT, including a method to determine norming constants, is given. Doe-
blin’s criterion is, however, rather different from Feller’s “Criterion A”.
8 Feller at this place erroneously asserts that Lévy’s CLT only refers to the case of
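Suppose that all $X_{nk}$ have zero medians. Define the array $(X'_{nk})$ by
\[
X'_{nk} := \begin{cases} X_{nk} & \text{if } |X_{nk}| \le m_n, \\ 0 & \text{else.} \end{cases}
\]
In order that there exists a sequence $(d_n)$ of real numbers such that
\[
P\Biggl(\Bigl|\frac{\sum_{k=1}^{m_n} X_{nk}}{m_n} - d_n\Bigr| > \varepsilon\Biggr) \to 0 \qquad (n \to \infty)
\]
\[
(7)\qquad P\Biggl(\sum_{k=1}^{m_n}(X_{nk} - X'_{nk}) \ne 0\Biggr) \to 0
\]
and
\[
\frac{1}{m_n^2}\sum_{k=1}^{m_n}\operatorname{Var}X'_{nk} \to 0.
\]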
As it seems, Feller did not realize this circumstance – this applies also to
later work, such as [Feller 1945b, p. 827] – and, admittedly, Kolmogorov had
not considered general norming in connection with his theorem.10 There-
fore, Feller’s WLLN, which was shown by entirely different methods than
Kolmogorov’s, was a really important innovation. Indeed, Feller stated that
Kolmogorov had only considered the particular case an = n and referred to
Khinchin [25] with a 1936 paper on a (specialized form of the) WLLN for
independent and identically distributed random variables under more general
norming [Feller 1937b, p. 192]. In 1929, Khinchin [23] had already shown that
for independent identically distributed r.vs Xk the existence of an expectation
μ was sufficient for
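\[
\forall \varepsilon > 0:\quad P\Biggl(\Bigl|\frac{\sum_{k=1}^{n} X_k}{n} - \mu\Bigr| > \varepsilon\Biggr) \to 0 \qquad (n \to \infty).
\]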
9 The necessity of a condition corresponding to “(1)” in [Feller 1937b] is explicitly shown
by Kolmogorov [26, p. 486]. From this condition, Kolmogorov’s condition (7) is deduced,
see [26, p. 317].
10 In [19, p. 105] a clear reference is made to Kolmogorov’s 1928 paper in the context of
The game as described ends with an overwhelming probability after only a few
trials, and therefore only a modest gain seems to be attainable. Nobody would
pay a large, or even an “infinite”, amount for such a game, and this appeared to
be a paradoxical situation. A vivid scientific discussion began on the concepts
for all ε > 0. Therefore, the accumulated stake for n repeated games
has to be asymptotically equivalent to $n \log_2 n$. In [*Feller 1950, pp. 199–201]
this result is explained in an especially elementary and lucid way.
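The order of magnitude $n \log_2 n$ can be made plausible by a truncation heuristic, sketched below in Python. The sketch assumes the usual convention of the St. Petersburg game (payoff $2^k$ with probability $2^{-k}$, $k = 1, 2, \ldots$), which is not spelled out in the passage above, and it only computes the mean of a single payoff truncated at $n \log_2 n$ for $n = 2^m$; it is an illustration, not a reproduction of Feller's proof. The function names are hypothetical.

```python
def truncated_mean(m):
    """Mean of one payoff truncated at n*log2(n), where n = 2**m and m >= 1.

    Assumed game: payoff 2**k with probability 2**-k for k = 1, 2, ...
    Each admissible k contributes 2**k * 2**-k = 1 to the truncated mean,
    so the mean equals the number of k >= 1 with 2**k <= n*log2(n).
    """
    threshold = m * 2**m              # n * log2(n) as an exact integer
    return threshold.bit_length() - 1  # floor(log2(threshold))


def ratio_to_log2_n(m):
    """Truncated mean per game divided by log2(n) = m."""
    return truncated_mean(m) / m
```

Under these assumptions the truncated mean per game is $\lfloor \log_2(n \log_2 n) \rfloor$, so n games carry a truncated mean of roughly $n \log_2 n$; the ratio returned by `ratio_to_log2_n` is 1.2 for m = 20 and 1.009 for m = 1000, approaching 1 only slowly.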
11 See [17] and [22] for comprehensive historical accounts. For a very detailed discussion
of the most important 18th century sources see [4, pp. 239–258].
Conclusion
Feller’s articles on the CLT and the WLLN were, as regards methods and style,
in the mainstream of the contemporary work on limit distributions of sums of
independent r.vs, and thus one can understand why they were so frequently
referred to in the pertinent literature. In this way Feller’s ideas significantly in-
fluenced further contributions on weak convergence. According to a new focus
on stochastic processes with independent increments, which had evolved since
the late 1920s, the central limit problem could within a few years after 1935 be
extended in a very general way towards limit problems for infinitely divisible
distributions, in particular by the work of Boris Vladimirovich Gnedenko. The
catalytic influence of Feller’s accounts is clearly visible in this work, e. g., by
the similarities of Gnedenko’s necessary and sufficient conditions for conver-
gence to infinitely divisible distributions (see [19, p. 116]) as compared with
conditions (I) and (II) in [Feller 1935c, p. 525].
On the other hand, Lévy did not gain broad acceptance for his newly
derived methods based on concentration and dispersion. Yet there was one
significant exception: Doeblin, one of Lévy’s disciples, in [16] showed how es-
sential results concerning infinitely divisible limit distributions could be treated
by Lévy’s methods as well. Doeblin died in 1941 as a soldier in World War
II,12 and, for the time being, this approach was not followed up, until in 1958
Kolmogorov [27, p. 29] called for renewed research into those “direct methods”
of probability, among which he essentially counted considerations on concentration
and dispersion. Indeed, to give only one example, it would turn out that
the extension of such direct methods to r.vs with values in infinite-dimensional
Banach spaces had considerable advantages over the use of “characteristic func-
tionals”, see [2].
It is an interesting detail that Feller in the second volume of his book (we
refer to the second edition [*Feller 1971]), beside a comprehensive exposition
on characteristic function methods (Chapters XV, XVI, XVII) also carefully
discusses methods based on semigroups of convolution operators in connection
with limit theorems, in particular the CLT (Chapters VIII, IX). In this way,
12 More precisely, Doeblin killed himself in order to escape from being captured by the
approaching German troops, see [47].
References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
[1] Adams, W. J.: The Life and Times of the Central Limit Theorem, 2nd
edn. American Mathematical Society, Providence (RI) 2009. The 1st edn.
appeared in 1974.
[2] Araujo, A. and Giné, E.: The Central Limit Theorem for Real and Banach
Valued Random Variables. Wiley, New York 1980.
[3] Barbut, M., Locker, B. and Mazliak, L.: Paul Lévy – Maurice Fréchet. 50
[4] Barth, F. and Haller, R.: Berühmte Aufgaben der Stochastik, von den
Anfängen bis heute. De Gruyter/Oldenbourg, München 2014.
[7] Bernstein (Bernshtein), S. N.: Sur le théorème limite du calcul des pro-
babilités. Mathematische Annalen 85 (1922) 237–241.
[9] Bertrand, J.: Calcul des Probabilités. Gauthier–Villars, Paris 1888, 2nd
edn. 1907.
[11] Cauchy, A. L.: Mémoire sur les résultats moyens d’un très-grand nom-
bre des observations. Comptes Rendus Hebdomadaires des Séances de
l’Académie des Sciences 37 (1853) 381–385. Reprinted in Œuvres com-
plètes (1) 12. Gauthier–Villars, Paris 1900, pp. 125–130.
[13] Chebyshev, P. L.: Sur deux théorèmes relatifs aux probabilités. Acta Ma-
thematica 14 (1890/91) 305–315. Originally published in Russian in Za-
piski Akademii Nauk 55 (1887). Reprinted in Œuvres, T. 2, Académie
Impériale des Sciences, St. Petersburg 1907, pp. 481–491.
[15] Cramér, H.: Über eine Eigenschaft der normalen Verteilungsfunktion. Ma-
thematische Zeitschrift 41 (1936) 405–414. Reprinted in A. Martin-Löf
(ed.): Collected Works, Vol. 2, Springer, New York 1994, pp. 856–865.
88 Contributions to Geometry
classical approximation (or limit) procedures, although now reduced to the case
of single point sequences [Feller 1936b, p. 5 f.]. This made it possible to introduce a
left and/or right (semi-)tangent, a (two-sided) tangent, a strong tangent (Tangente
im scharfen Sinne), and lower and upper curvature for (continuous) curves.
Often they considered the whole collection of point sequences converging to
the same point on the curve (possibly restricted to converge from “left” or
“right” only). Of course, existence of limits is not always secured but had to
be stipulated. “Lower” and “upper” curvatures result from considering the
infimum or supremum of respective values. A tangent is called strong (“im
strengen Sinne”), if all limits of secants pk pl with pk → p and pl → p exist and
are equal; similarly for a strong tangent plane.
The existence of a strong tangent t of a curve in p is equivalent to a re-
markable regularity condition. If, after choosing local coordinates (x, y, z) close
to p, with x-axis along t, the curve is given by y = f (x), z = g(x), then the
strong tangency condition is equivalent to differentiability of f and g almost
everywhere (a. e.) and, additionally, $f'(x_n) \to 0$, $g'(x_n) \to 0$ for all sequences
{xn }, xn → x0 , for which the derivatives exist [Feller 1936b, p. 6]. On the
other hand, a curve which is locally C 1 at p may possess a curvature in p
without existence of second derivatives [Feller 1936b, p. 7]. Only for convex
curves “circumstances are particularly simple”. Here our authors could report
a result of H. Bohr’s doctoral student: For a plane convex curve (two-sided)
tangents exist a. e., and also second differentiability holds a. e. [19].
1931 (supervisor R. Courant) had dealt with Über die Geometrien, in denen die “Kreise
mit unendlichem Radius” die kürzesten Linien sind.
6 Such a representation exists for any surface with strong tangent plane at p not orthog-
condition: If $g'(x)$ exists a.e. and for any subinterval $[a', b'] \subset [a, b]$ the fundamental theorem
of calculus is applicable, $\int_{a'}^{b'} g'(x)\,dx = g(b') - g(a')$ [Feller 1936b, Footnote 14].
\[
(1)\qquad \frac{\rho}{\cos\theta} = \frac{\rho'}{\cos\theta'}.
\]
Of course, the usual Meusnier formula arises if one of the sequences has a
normal osculating plane ($\theta = 0$). Then $\rho = \rho_N$ and $\rho' = \rho_N \cos\theta'$.
Moreover, Busemann and Feller considered collections of point sequences
situated on two curves $\gamma$ and $\gamma'$, respectively, and converging only one-sidedly.
Then lower and upper curvatures of $\gamma$ and $\gamma'$ at p satisfy the relation (1) sep-
arately. They claimed that “an essentially equivalent” result had been derived
by Hjelmslev, although the latter had assumed continuously varying tangent
planes in a neighbourhood of p (which seems to be a stronger assumption than
required in Theorem 1) [Feller 1936b, Footnote 16].
In classical surface theory Euler’s formula, $\kappa = \kappa_1 \cos^2\theta + \kappa_2 \sin^2\theta$, relates
the curvature $\kappa$ of a general normal section to the two principal curvatures
$\kappa_1, \kappa_2$, where $\theta$ is the angle to the direction of the first principal curvature.
Dupin had proposed to study the curvature behaviour at a point p by plotting
the square root of the curvature radius $\rho = \kappa^{-1}$ of the normal section on the
corresponding (semi-)tangent with initial point p. Euler’s formula implies that
the resulting curve, Dupin’s indicatrix, is a conic section (possibly degenerate).
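The step from Euler’s formula to the conic can be checked numerically. The following Python sketch is only an illustration for the case of two positive principal curvatures (the function names are hypothetical): it plots $\sqrt{\rho}$ in the direction $\theta$ and yields points $(x, y)$ that should satisfy $\kappa_1 x^2 + \kappa_2 y^2 = 1$, an ellipse with semi-axes $1/\sqrt{\kappa_1}$ and $1/\sqrt{\kappa_2}$.

```python
import math


def normal_curvature(k1, k2, theta):
    """Euler's formula for the curvature of the normal section in direction theta."""
    return k1 * math.cos(theta) ** 2 + k2 * math.sin(theta) ** 2


def indicatrix_point(k1, k2, theta):
    """Point of Dupin's indicatrix in direction theta (assumes k1, k2 > 0).

    The distance from the origin is the square root of the curvature radius.
    """
    rho = 1.0 / normal_curvature(k1, k2, theta)
    r = math.sqrt(rho)
    return r * math.cos(theta), r * math.sin(theta)


def conic_residual(k1, k2, theta):
    """k1*x**2 + k2*y**2 - 1 at the indicatrix point; zero up to rounding."""
    x, y = indicatrix_point(k1, k2, theta)
    return k1 * x * x + k2 * y * y - 1.0
```

The residual vanishes because $\kappa_1 x^2 + \kappa_2 y^2 = \rho\,(\kappa_1\cos^2\theta + \kappa_2\sin^2\theta) = \rho\kappa = 1$. Principal curvatures of opposite sign or a vanishing curvature lead to hyperbolic or degenerate cases, which this sketch does not cover.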
Normal half-sections (i. e. sections of the surface Π with a normal half-
plane) at the point p of a continuous surface, will lead to curves which generally
admit, at best, lower and upper curvatures, even in the case of a strong tangent
plane at p. Thus, generally speaking, two indicatrices, a lower and an
upper one, can be expected. Further constraints on the surface seemed to be
advisable for the study of these curves. At this point our authors restricted
their investigation to convex surfaces [Feller 1936b, p. 14].
3 Convex surfaces
3.1 Indicatrix, Euler’s theorem and umbilic points
Feller and Busemann showed that for a convex surface S the points in which
S has no tangent plane are exceptional in the sense that they form a null set
Λ0 [Feller 1936b, p. 24].9 Except for the points q of another, larger null set
8 More precisely, the angles between the straight lines $p_n p'_n$ and $t$ are $\ge \varepsilon$ for some
$\varepsilon > 0$ (they are “bounded from below” in Feller’s and Busemann’s language).
9 Null sets may be considered with regard to Carathéodory’s two-dimensional measure
in R3 , introduced in 1914. Our authors did not mention this paper (nor Hausdorff’s gener-
alization from 1919), so they may well have referred to Lebesgue’s surface measure intro-
duced for surfaces S ⊂ R3 parametrized by bijective continuous maps f : D → S over domains
D ⊂ R2 which are bounded by curves C of (Lebesgue) measure zero (“courbes quarrables”)
[21, pp. 246, 301–309]. At least they cited Lebesgue’s paper at another occasion (cf. fn. 14).
Carathéodory’s p-dimensional measure of point sets in Rn was a generalization of Lebesgue’s.
Λ1 ⊃ Λ0 all plane sections through q have finite curvatures (with identical lower
and upper curvatures). As it can be shown that any tangent plane of a convex
surface at p is strong,10 they could exploit their generalized Meusnier theorem
to find that all normal sections at p have finite curvature. Thus a single
indicatrix exists at such a point; it is a convex curve and point-symmetric
with respect to p [Feller 1936b, p. 24]. In a longer analytical investigation
they were even able to show that, with the exception of another null set,
the indicatrix at the remaining points forms an ellipse (perhaps degenerate)
[Feller 1936b, pp. 25–29]. They arrive at [Feller 1936b, p. 30]
Λ0 ⊂ Λ1 ⊂ Λ2 ,
such that all points of S \ Λ0 have a (strong) tangent plane; all points of S \ Λ1
have a (convex point-symmetric) indicatrix, and at all points p of S \ Λ2 the
indicatrix is an ellipse (possibly degenerate). Thus Euler’s normal curvature
theorem holds at points p ∈ S \ Λ2 .
The existence of tangent planes or even of indicatrices has remarkable con-
sequences for the regularity of convex surfaces. Busemann and Feller gave an
analytical interpretation of the exceptional sets after presenting their theorem:
It was again generalized a few years later (1919) by Hausdorff; see [17] and the commentary
by S. D. Chatterji in the Gesammelte Werke edition.
10 [Feller 1936b, p. 9]
11 See the Footnote 18.
regard to the generalized indicatrix of convex surfaces. Of course there were
other topics of classical surface theory that deserved attention in the more
general context.
14 In fact, Lebesgue showed that any two points on a connected continuous surface lying
in a bounded domain of Rn can be connected by a shortest arc [21, pp. 345 f.].
for normal curvature radii $\rho(\varphi)$ at the point p (taken in a half-plane) with
direction $\varphi$. For an elliptic indicatrix with semi-axes $\rho_1, \rho_2$ the integral becomes
$\int_0^{2\pi} \rho\,d\varphi = 2\pi\rho_1\rho_2$, and the classical formula is recovered [Feller 1936b, p. 22 f.].
In their third article for Matematisk Tidsskrift the two authors generalized
their result from early 1934 (published in 1936, as we know). They went back
to Gauss’ original idea and considered the “spherical images” Ω∗ of subsets Ω
of a convex surface S (embedded in R3 ). The spherical image p∗ of a point p
was defined as the totality of all points q on S 2 with direction normal to any
supporting plane of S at p.16 Now the question could be posed as:
Under which conditions does the generalized Gaussian curvature
\[
\kappa(p) := \lim_{h\to 0} \frac{|\Omega^*_h|}{|\Omega_h|}
\]
b) If i is bounded, $\lim_{h\to 0} \frac{|\Omega^*_h|}{h}$ exists.
15 Blaschke had shown a similar formula for the limit of the three-dimensional volume of
the cap in [7].
16 A plane Π through p is a supporting plane, if the surface S is contained in one of the
If $p \in i$, the limit is 0. If $p \notin i$, it is $\int_0^{2\pi} \frac{g^2(\varphi) - g'^2(\varphi)}{g^4(\varphi)}\,d\varphi$.
c) If i is unbounded, $\lim_{h\to 0} \frac{|\Omega^*_h|}{h}$ need not exist.
An immediate consequence is [*Feller 1936a, p. 43]
Corollary 6. Under the same assumptions as in Lemma 5, the generalized
Gaussian curvature exists for bounded indicatrix i. It is
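\[
\kappa(p) = \begin{cases}
\dfrac{\int_0^{2\pi} \frac{g^2(\varphi) - g'^2(\varphi)}{g^4(\varphi)}\,d\varphi}{\int_0^{2\pi} g^2(\varphi)\,d\varphi}, & \text{for } p \notin i, \\[2ex]
0, & \text{for } p \in i.
\end{cases}
\]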
Again, for i an ellipse with $g(\varphi) = \bigl(\frac{\cos^2\varphi}{a^2} + \frac{\sin^2\varphi}{b^2}\bigr)^{-1/2}$, the classical value
arises, $\kappa = (ab)^{-2}$.
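The elliptic case can be verified numerically. The Python sketch below reads the quotient in Corollary 6 as $\int_0^{2\pi} (g^2 - g'^2)/g^4\,d\varphi$ divided by $\int_0^{2\pi} g^2\,d\varphi$; the placement of the derivative is an assumption made here in reading the formula, and the helper names are hypothetical. The integrals are approximated by an equally spaced Riemann sum, which is very accurate for smooth periodic integrands.

```python
import math


def g(phi, a, b):
    """Polar radius of the ellipse with semi-axes a and b."""
    return (math.cos(phi) ** 2 / a**2 + math.sin(phi) ** 2 / b**2) ** -0.5


def g_prime(phi, a, b):
    """Derivative of g with respect to phi, via g = u**(-1/2)."""
    u = math.cos(phi) ** 2 / a**2 + math.sin(phi) ** 2 / b**2
    du = math.sin(2 * phi) * (1 / b**2 - 1 / a**2)
    return -0.5 * u**-1.5 * du


def periodic_integral(f, n=2000):
    """Equally spaced sum approximating the integral of f over [0, 2*pi]."""
    h = 2 * math.pi / n
    return h * sum(f(i * h) for i in range(n))


def generalized_curvature(a, b):
    """Quotient from Corollary 6 (as read above) for an elliptic indicatrix."""
    num = periodic_integral(
        lambda p: (g(p, a, b) ** 2 - g_prime(p, a, b) ** 2) / g(p, a, b) ** 4
    )
    den = periodic_integral(lambda p: g(p, a, b) ** 2)
    return num / den
```

With this reading the numerator equals $2\pi/(ab)$ and the denominator $2\pi ab$ (twice the area of the ellipse), so `generalized_curvature(a, b)` should agree with $(ab)^{-2}$ up to numerical error.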
Feller and Busemann went further: they analyzed conditions under which,
even in case c) of Theorem 6, the generalized Gaussian curvature exists. They
found a condition under which in this case the generalized Gaussian curvature (exists
and) even vanishes [*Feller 1936a, pp. 61 f.].17
Finally, for the total curvature of the surface they could use the null con-
dition for the exceptional set Λ2 of Theorem 2. If the spherical image corre-
spondence Ω −→ Ω∗ maps null sets onto null sets, the total curvature can be
derived from integrating over the points of S \ Λ2 (the “normal” points in the
language of the authors),
\[
|\Omega^*| = \int_{\Omega\setminus\Lambda_2} (\rho_1(p)\rho_2(p))^{-1}\,dp,
\]
with ρi denoting the principal axes of the elliptic indicatrix i at p, [*Feller 1936a,
p. 45].
(iii) There exists some δ′ ≤ δ such that planes parallel to l intersect the square
neighbourhood with side 2δ in curves of uniformly bounded (strong)
curvature.
From the curvature characterization twice differentiability a. e. could be con-
cluded. This result rounded off the joint work of Feller and Busemann. It was
their last joint publication.
4 Reception
Busemann’s and Feller’s 1934–36 publications received detailed reports in
Zentralblatt für Mathematik and Jahrbuch für Fortschritte der Mathematik
[12, 13, 14]. Their analytical techniques made it possible to establish properties which
proved fruitful, independent of the original goals for which they had been
developed, and became part of the tradition of convex geometry. At the end of his
report S. Cohn-Vossen remarked:
The most widely cited result emerging from their curvature studies seems
to be twice differentiability of convex surfaces a. e. (Corollary 3 c)), and its
background in the Euler theorem for normal curvatures.21
In Soviet Russia, Alexandr D. Alexandrov and some of his students worked
on topics related to those of Busemann–Feller. They took up the results of the
two Copenhagen emigrants; Alexandrov’s 1939 generalization of the regularity
properties of convex surfaces and hypersurfaces has already been mentioned
[1]. His student I. M. Liberman obtained diverse results on shortest arcs on
convex surfaces in spaces of dimension n, refining and extending Busemann
and Feller’s existence result of tangents to shortest arcs at points p at which a
tangent plane exists (see Section 3.2) [22]. Liberman, like other gifted students
of A. D. Alexandrov, died fighting against Nazi-Germany’s invasion of the So-
viet Union. His most important findings on shortest arcs were integrated by
Alexandrov in his magisterial work on Intrinsic Geometry of Convex Surfaces
[5, pp. xii, 31, 158 etc.]. In this book also Feller and Busemann’s joint papers
References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
22 Feller and Busemann are quoted in [5] for the following topics: Busemann–Feller Lemma
(p. 92 f.), tangents to shortest arcs (pp. 147, 156), twice differentiability of convex surfaces
a. e. (p. 387 f.).
23 [15, pp. 280, 1059] relating to normal points a. e. on a convex surface, and to the
principal curvatures and the indicatrix in the chapter on differential geometry [20].
[6] Bianchi, G., Colesanti, A. and Pucci, C.: On the second differentiability
of convex surfaces. Geometria Dedicata 60 (1996) 39–48.
[7] Blaschke, W.: Jahresbericht DMV 27 (1919) 149.
[8] Blaschke, W.: Vorlesungen über Differentialgeometrie, Bd. I. Springer,
Berlin 1921.
[9] Bonnesen, T. and Fenchel, W.: Theorie der konvexen Körper. Springer,
Berlin, 1934. English translation: Theory of Convex Bodies, Moscow (ID,
USA) BCS 1987.
[10] Bouligand, G.: Introduction à la géométrie infinitésimale directe.
Gauthier–Villars, Paris 1931.
[11] Busemann, H.: Convex Surfaces. Interscience, New York 1958.
[12] Cohn-Vossen, S.: Report on [*Feller 1935a]. Zentralblatt Mathematik
0015.12401.
[13] Fenchel, W.: Report on [*Feller 1935b]. Zentralblatt Mathematik
0013.17905.
[14] Fenchel, W.: Report on [*Feller 1936a]. Zentralblatt Mathematik,
0015.12401.
[15] Gruber, P. M. and Wills, J. M. (eds.): Handbook of Convex Geometry, 2
vols. North-Holland, Amsterdam 1993.
[16] Gruber, P. M.: History of convexity. In [15] Vol. A, Chapter 0, pp. 1–16.
[17] Hausdorff, F.: Dimension und äußeres Maß. Mathematische Annalen
79 (1919) 157–179. Reprinted (with commentaries) in: Felix Hausdorff:
Gesammelte Werke, Band 4. Springer, Berlin 2001, pp. 19–54.
[18] Hjelmslev, J.: Grundlag for Fladernes Geometri. Høst i Komm, Copen-
hagen 1914.
[19] Jessen, B.: Om konvekse Kurvers Krumning. Matematisk Tidsskrift B
(1929) 50–62.
[20] Leichtweiß, K.: Convexity and differential geometry. In [15] Vol. B, Chap-
ter 4.1, pp. 1045–1080.
[21] Lebesgue, H.: Intégrale, longueur, aire. Annali di Matematica 7 (1902)
231–359.
[22] Liberman, I. M.: Geodesic lines on convex surfaces (Russian). Doklady
Akademii Nauk SSSR 31 (1941) 310–313.
[23] McMullen, P. and Shephard, G.: Convex Polytopes and the Upper Bound
On the Central Limit Theorem of Probability Theory
By Willy Feller in Stockholm
For any given sequence {Vn (x)} of distribution functions we will always denote
by Wn (x) the convolutions defined by
\[
W_1(x) = V_1(x), \qquad W_{n+1}(x) = \int_{-\infty}^{+\infty} W_n(x-y)\,dV_{n+1}(y). \tag{1}
\]
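For distributions concentrated on finitely many points, the recursion (1) reduces to a finite double sum. The following is a minimal sketch under that assumption; the helper name `convolve` and the representation of a distribution as a dictionary of atoms and masses are illustrative choices, not part of the original paper.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(w, v):
    """Law of X + Y for independent X ~ w and Y ~ v.

    Both laws are dicts {atom: mass}. This is the discrete form of
    W_{n+1}(x) = integral of W_n(x - y) dV_{n+1}(y).
    """
    out = defaultdict(Fraction)  # Fraction() == 0, so masses accumulate exactly
    for x, p in w.items():
        for y, q in v.items():
            out[x + y] += p * q
    return dict(out)


# Three fair +-1 steps: w3 is the law W_3 of their sum.
step = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
w3 = convolve(convolve(step, step), step)
```

Exact rational masses are used so that the total mass of each convolution can be checked to equal 1 without rounding error.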
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments. Throughout the text the index ν has
been changed to μ since the Greek ν closely resembles v, the small Roman V .
where, as usual, Φ(x) denotes the Gaussian standard normal distribution function
\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} y^2}\,dy.
\]
It is always a priori assumed that at least the second moments of the Vn (x) are
finite, and that the first moments vanish; in these cases one considers only those
¶ 522 normalizing factors an for which ¶ the second moment of Wn (an x) equals 1
(cf. § 5, (1) and (2), p. 541). Under these assumptions Lindeberg 1 has recently
given a sufficient condition for the limit theorem which contains all known
versions as special cases and which stands out because of its applicability (for
a statement see § 5, p. 541). The question whether Lindeberg’s condition is, at
least under the restrictions mentioned above, also necessary, has been asked2 ,
but not answered conclusively.
First of all, one should remark that the existence of any moments is cer-
tainly not necessary since the convergence cannot depend on the infinitary
behaviour of each of the elements V (x). Indeed, as soon as an → ∞ – which
is, of course, always assumed –, it is easily seen that there is a sequence of real
numbers {ξn } such that the relation (2) remains valid if Vn (x) is arbitrarily
re-defined for |x| > ξn , for example in such a way that all moments become in-
finite. Therefore, this assumption is, from an analytic point of view, certainly
not natural.
A more serious restriction of the problem, however, is the usual normaliza-
tion that the second moment of Wn (an x) has to be 1. Since, in general, the
moments do not converge jointly with the distributions, there exist sequences
{Vn (x)} (even with finite moments of any order), for which Wn (an x) does not
converge at all under that normalization, but is convergent to Φ(x) for a suit-
able choice of an . Moreover, this excludes already those cases where Wn (an x)
converges, under that particular normalization, towards Φ(2x), say. It is ab-
solutely irrelevant not only for the theory, but even more for the original limit
problem from practice, which normalization achieves the convergence to Φ(x).
Therefore, we consider the following question: Let {Vn (x)} be a given se-
quence of distribution functions; do there exist two sequences of real numbers
{an } and {cn } such that Wn (an x + cn ) → Φ(x) and, if so, can one determine
1 J. W. Lindeberg: Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlich-
keitsrechnung, Math. Zeitschr. 15 (1922). A different proof based on the theory of character-
istic functions is given in P. Lévy, Calcul des probabilités, Paris, Gauthier–Villars, pp. 242 ff.
An essentially different method of proof which shows the connection with other asymptotic
theorems can be found in A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeits-
rechnung, Ergebnisse der Math. Vol. 2, Issue 2, Berlin 1933. The statement given below
(p. 541) differs formally from Lindeberg’s and is due to P. Lévy.
2 cf. for example P. Lévy loc. cit.
exist. (One may assume that the Vk (an x) are arranged in a sequence: V1 (a1 x),
V1 (a2 x), . . . , V1 (an−1 x), V1 (an x), . . . , Vn (an x), . . .; then (3) means that this
sequence converges to E(x) and, because of the monotonicity and the limit
conditions for x → ±∞, this convergence is automatically uniform outside
any neighbourhood of the origin.) In fact, this condition is equivalent to the
seemingly weaker assumption that for every fixed k
It turns out (§ 2) that, as one would expect, the condition (3) is equivalent to
considering only those sequences {an } which satisfy
\[
a_n \to \infty \quad\text{and}\quad \frac{a_{n+1}}{a_n} \to 1. \tag{4}
\]
In general, literally the same statements hold for this restriction as one has
for the usual, more specialized, normalization4 : Only two effectively existing ¶
cases are excluded from the consideration, and both have a completely different ¶ 524
analytic character: 1. If an remains bounded; then (2) fails if any member
Vm (x) is somehow changed; one does not deal with an asymptotic limit law,
but rather with the question, how to split Φ(x) into components (cf. § 2,
\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} b_\mu = 0
\]
holds. The question is now whether there are constants bμ such that Vμ (x+bμ )
belongs to Φ(x).
When answering this question we assume at first that the normalizing
factors are given, and we rule out the possibility that the convergence is caused
by shifts. Then one has:
The sequence of distribution functions {Vn (x)} with the normalizing factors
{an } belongs to Φ(x) if, and only if, for every η > 0 all of the following three
¶ 525 conditions are satisfied: ¶
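\[
\lim \sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu(x) = 0, \tag{I}
\]
\[
\lim \frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \left( \int_{|x|<\eta a_n} x\,dV_\mu(x) \right)^{2} \right\} = 1, \tag{II}
\]
\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0. \tag{III}
\]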
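For laws with finitely many atoms, the three sums appearing in (I), (II) and (III) can be evaluated directly for a given n, η and normalizing factor. The sketch below is only an illustration under that assumption; the function name `feller_sums` and the dictionary representation are hypothetical, and a single evaluation at finite n can of course only suggest, not establish, the limits.

```python
import math


def feller_sums(laws, a_n, eta):
    """Evaluate the sums in (I), (II), (III) for discrete laws.

    laws: list of dicts {atom: mass}, one per V_mu, mu = 1..n.
    Returns (tail mass, truncated variance sum / a_n^2,
             truncated mean sum / a_n).
    """
    cut = eta * a_n
    tail = var = mean = 0.0
    for law in laws:
        m1 = sum(x * p for x, p in law.items() if abs(x) < cut)
        m2 = sum(x * x * p for x, p in law.items() if abs(x) < cut)
        tail += sum(p for x, p in law.items() if abs(x) > cut)
        var += m2 - m1 * m1
        mean += m1
    return tail, var / a_n**2, mean / a_n


# n identical symmetric +-1 laws with the classical choice a_n = sqrt(n).
n = 400
coin = {-1.0: 0.5, 1.0: 0.5}
sums = feller_sums([coin] * n, math.sqrt(n), eta=0.1)
```

In this example η·a_n = 2 exceeds every atom, so the tail sum vanishes, the variance sum equals 1 and the mean sum equals 0.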
a This problem was solved by H. Cramér after the publication of this paper, see Feller’s
5 This requirement means that both $\frac{1}{a_n}(x_1 + \cdots + x_n)$ and $\frac{1}{a_n}(i_1 x_1 + \cdots + i_n x_n)$, where
the ik take the values ±1, converge to a normally distributed random variable, i.e. that the
given sequence converges absolutely.
This fixes the origin in a natural way. Let us remark, however, that the
following criterion literally remains valid if one replaces in (6) on the right-
hand sides 1/2 by any number q < 1: it is only important that the origins are
not completely eccentric relative to the supports of the distribution functions
Vk (x); in particular, (6) has only been introduced to get a unique condition.
Now we define for every δ > 0 a real number pn (δ) as the smallest number
such that
\[
\sum_{\mu=1}^{n} \int_{|x|>p_n(\delta)} dV_\mu(x) \le \delta
\]
(the integral over an interval |x| > X is always understood as the limit of the
integrals over |x| > X + ε as ε → 0+; therefore, pn (δ) is always defined).
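For laws with finitely many atoms the number pn (δ) can be found by inspecting the atoms alone, because the combined tail mass on |x| > p only changes when p crosses an atom. The following sketch assumes such discrete laws and δ ≥ 0; the function name `p_n` and the data layout are illustrative, not taken from the paper.

```python
def p_n(laws, delta):
    """Smallest p >= 0 such that the mass on {|x| > p}, summed over all
    laws, is at most delta (assumes delta >= 0).

    laws: list of dicts {atom: mass}.
    """
    mass_at = {}
    for law in laws:
        for x, p in law.items():
            mass_at[abs(x)] = mass_at.get(abs(x), 0.0) + p
    # The tail function is a right-continuous step function that only
    # jumps at |atom|, so it suffices to test p = 0 and p = |atom|.
    for p in sorted(set(mass_at) | {0.0}):
        tail = sum(m for r, m in mass_at.items() if r > p)
        if tail <= delta:
            return p


laws = [{-2: 0.1, 0: 0.8, 2: 0.1}, {-5: 0.05, 0: 0.9, 5: 0.05}]
```

The strict inequality `r > p` mirrors the convention stated in the parenthesis above: the integral over |x| > X excludes the boundary point itself.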
¶ 527 ¶ Then the following criterion holds which gives a complete answer to the
question.
Criterion. If the coordinate origins for the Vk (x) are fixed in such a way that
(6) holds, then it is necessary and sufficient for the existence of a sequence of
real numbers {bk } such that the sequence {Vk (x + bk )} belongs to Φ(x), that
for every δ > 0
\[
\lim \frac{1}{p_n^2(\delta)} \sum_{\mu=1}^{n} \int_{|x|<p_n(\delta)} x^2\,dV_\mu(x) = \infty
\]
holds. If this is the case, then there is a sequence δn → 0 such that also
\[
\lim \frac{1}{p_n^2(\delta_n)} \sum_{\mu=1}^{n} \int_{|x|<p_n(\delta_n)} x^2\,dV_\mu(x) = \infty
\]
holds; setting
⎧ ⎫
n ⎨
2⎬
or
\[
\lim \frac{1}{q_n^2} \sum_{\mu=1}^{n} \int_{|x|<q_n} x^2\,dV_\mu(x) > 0
\quad\text{and}\quad
\lim \sum_{\mu=1}^{n} \int_{|x|>\eta q_n} dV_\mu(x) = 0 \tag{8}
\]
b Feller writes 1/2 instead of 1 − 1/2, but this does not work for his comment (in italics)
following (6) that we can replace 1/2 by q < 1; see also (4) in [Feller 1937a].
§ 2. Preparatory lemmas
According to P. Lévy9 the characteristic function of a distribution function
F (x) is its Fourier transform
\[
f(t) = \int_{-\infty}^{+\infty} e^{ixt}\,dF(x).
\]
f (0) = 1, |f (t)| ≤ 1.
The distribution function F (ax + b) has the characteristic function $e^{-i\frac{bt}{a}} f\!\left(\frac{t}{a}\right)$;
if F (x) is the convolution of two distribution functions F1 (x) and F2 (x), i.e.
\[
F(x) = \int_{-\infty}^{+\infty} F_1(x-y)\,dF_2(y) = \int_{-\infty}^{+\infty} F_2(x-y)\,dF_1(y),
\]
\[
w_n(t) = \prod_{\mu=1}^{n} v_\mu(t). \tag{1}
\]
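$W_n(a_n x) \to \Phi(x)$ is equivalent to $w_n\!\left(\frac{t}{a_n}\right) \to e^{-\frac{1}{2}t^2}$, and analogously $V_\mu(a_n x) \to E(x)$
(cf. the introduction (4)) is equivalent to $v_\mu\!\left(\frac{t}{a_n}\right) \to 1$. Both limits are
uniform in any finite interval; obviously, the latter can also be written in the
form:
\[
\lim_{n\to\infty} \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) = 0, \qquad (\mu = 1, 2, \ldots, n).
\]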
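The two rules for characteristic functions used here, the affine rule for F (ax + b) and the product rule for convolutions, can be checked numerically for laws with finitely many atoms. The sketch below makes that assumption; the names `char_fn` and `convolve` and the sample laws `v1`, `v2` are hypothetical and chosen only for illustration.

```python
import cmath


def char_fn(law, t):
    """Characteristic function f(t) = sum of mass * exp(i x t)."""
    return sum(p * cmath.exp(1j * x * t) for x, p in law.items())


def convolve(f1, f2):
    """Law of the sum of two independent discrete variables."""
    out = {}
    for x, p in f1.items():
        for y, q in f2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


v1 = {-1.0: 0.5, 1.0: 0.5}
v2 = {0.0: 0.25, 2.0: 0.75}
t = 0.7

# Product rule: the convolution has characteristic function v1(t) * v2(t).
product_gap = abs(char_fn(convolve(v1, v2), t) - char_fn(v1, t) * char_fn(v2, t))

# Affine rule: F(ax + b) is the law of (X - b) / a when X has law F, a > 0.
a, b = 2.0, 3.0
shifted = {(x - b) / a: p for x, p in v2.items()}
affine_gap = abs(char_fn(shifted, t)
                 - cmath.exp(-1j * b * t / a) * char_fn(v2, t / a))
```

Both gaps are zero up to floating-point rounding.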
\[
w_n\!\left(\frac{t}{a_n}\right) = \prod_{\mu=1}^{n} v_\mu\!\left(\frac{t}{a_n}\right) \to e^{-\frac{1}{2}t^2} \tag{$\alpha$}
\]
converges uniformly in every finite interval and if, moreover, for every ε > 0
10 S. Bochner, Vorlesungen über Fouriersche Integrale, Leipzig 1932, p. 70 (“necessary”)
and p. 72 (“sufficient”). Bochner’s theorem is even a bit more general than the one stated
here. A somewhat more restrictive theorem (assuming the finiteness of the second mo-
ments) is in P. Lévy, loc. cit. pp. 195 and 197.c A simplification of this proof, which avoids
conditionally convergent integrals, is in H. Cramér, On composition of elementary errors,
Skandin. Actuarie Tidskr. 1928. (There one finds also remarkable remainder estimates for
the central limit theorem.)
c Feller mentions in [Feller 1937a, Footnote 1] that he is actually using Lévy’s version.
holds11 , μ = 1, 2, . . . , n.
Now we study first the restrictions for the normalizing factors, and under
which circumstances (α) can hold without (β).
In order that for two sequences {an } and {a′n } of positive reals both (α)
and
\[
w_n\!\left(\frac{t}{a'_n}\right) \to e^{-\frac{1}{2}t^2}
\]
hold, it is obviously necessary and sufficient that both sequences are equivalent,
i.e.
\[
\lim_{n\to\infty} \frac{a_n}{a'_n} = 1
\]
¶ 531 holds. ¶ Now it is easy to see that (α) cannot hold for oscillating sequences
{an }: Since $e^{-\frac{1}{2}t^2}$ is monotonically decreasing and |vμ (t)| ≤ 1, it immediately
follows from (α) that for given ε > 0 one has for n > N (ε) and all k > 0
\[
\frac{a_{n+k}}{a_n} > 1 - \varepsilon. \tag{2}
\]
Thus, for every sequence {an } of positive reals, such that (α) holds, there is an
equivalent, monotonically increasing sequence, and one might restrict oneself
in all that follows to monotone sequences {an } (which, alas, does not simplify
things).
Now we consider first the case that an remains bounded, so that lim an = a
exists. Then (β) cannot hold. If (α) is satisfied, then one has immediately
because of the uniform convergence
\[
\prod_{\mu=1}^{\infty} v_\mu\!\left(\frac{t}{a}\right) = e^{-\frac{1}{2}t^2}.
\]
If, for bounded an , the convergence Wn (an x) → Φ(x) holds at all, then the
sequence Vn (ax) is obtained by successively splitting Φ(x) into components.
It is well-known that these have necessarily finite dispersion, and convergence
is only possible with the usual normalization. The convergence changes as
soon as one omits any single member, unless it is, incidentally, = E(x).
This case does not belong to probability theory at all.
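Thus, assume that an → ∞. If both relations (α) and (β) hold, then
$\lim v_n\!\left(\frac{t}{a_n}\right) = 1$ and, therefore, it follows at once from
\[
w_{n+1}\!\left(\frac{t}{a_{n+1}}\right) = w_n\!\left(\frac{t}{a_{n+1}}\right) v_{n+1}\!\left(\frac{t}{a_{n+1}}\right) \tag{3}
\]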
11 The second inequality in (β) follows, by the way, from the first one using Schwarz’
Conversely, if the conditions (α) and (4) are satisfied, then it follows first
that $v_n\!\left(\frac{t}{a_n}\right)$ converges uniformly in every finite interval to 1. By (2), the
same applies to $v_\mu\!\left(\frac{t}{a_n}\right)$, μ = 1, . . . , n. Thus, (α) and (4) together imply (β).
Therefore, we can claim:
¶ If one has at all that Wn (an x) → Φ(x), then the sequence Vn (x) belongs ¶ 532
to Φ(x) if, and only if,
\[
a_n \to \infty \quad\text{and}\quad \lim \frac{a_n}{a_{n+1}} = 1. \tag{5}
\]
Only in this case the influence of the single members becomes negligible as n
increases.
In order to verify that also the second restriction in (5) excludes only com-
pletely uninteresting sequences from the investigation, it suffices to consider a
sequence for which one has
\[
\frac{a_{n+1}}{a_n} > \alpha > 1.
\]
Then, for (α) to hold, (3) shows at once that already Vn (x) has to converge to
a normal law. More precisely: Setting $a_n^2 = \sum_{\mu=1}^{n} c_\mu^2$ (as one may do without
loss of generality) and if $\frac{c_n}{a_n} > \alpha > 0$, then Wn (an x) can converge to Φ(x) only
if
where $a_n^2 = 2^{n+1} - 1$ and Wn (an x) = Φ(x). Also in this case the convergence
depends only on the behaviour of the single elements. Moreover, as we have
already remarked in § 1, the condition (6) is not sufficient for (α), but we
refrain from stating the exact conditions. Let us just note that every sequence
for which (α) holds can be decomposed by some kind of diagonal procedure
into two subsequences, one of which is of the type (6) while the other belongs
to Φ(x).
inequality.
\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} b_\mu = 0 \tag{7}
\]
is valid. From (7) it follows, in particular, that $\frac{b_n}{a_n} \to 0$, and so, because of
(2),
\[
\lim \frac{1}{a_n} \max[|b_1|, \ldots, |b_n|] = 0.
\]
¶ 533 ¶ Therefore, the analogue of (β) holds for the new sequence {Vn (x + bn )}.
Thus we have
The sequence {Vn (x + bn )} belongs simultaneously with {Vn (x)} to Φ(x) if,
and only if, (7) obtains; in particular, both sequences always have the same
normalizing factors.
\[
\lim \sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu(x) = 0, \tag{1}
\]
⎧ ⎫
1 ⎨
n 2⎬
\[
\lim \frac{a_{n+1}}{a_n} = 1; \tag{3}
\]
to see the latter, we set for the moment pn = min[an , an+1 ] and qn = max[an , an+1 ];
In the same way one easily concludes that for every ε > 0, all k > 0 and
sufficiently large n
\[
\frac{a_{n+k}}{a_n} > 1 - \varepsilon. \tag{4}
\]
The characteristic relations § 2, (5) for normalizing factors are, thus, a conse-
quence of (1) and (2).
We will now see how (1) and (2) change if we replace Vn (x) by new dis-
tribution functions Vn∗ (x) where Vn∗ (x) = Vn (x + bn ). First we consider the
sequence {bn } which is defined by ¶ ¶ 534
\[
\int_{|x|<a_n} (x - b_n)\,dV_n(x) = 0. \tag{5}
\]
The following proof will show that this claim remains true if one only assumes
(1), (3) and (4), but not (2).
For the proof we note that |bk | ≤ ak so that the left-hand side in (6) is, by
(1), certainly independent of η. Therefore, it is enough to prove the particular
case
\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} \int_{|x|<2a_n} (x - b_\mu)\,dV_\mu(x) = 0. \tag{7}
\]
μ=1
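For any given ε > 0 we find some N = N (ε) such that for n > N one has
simultaneously
\[
\sum_{\mu=1}^{n} \int_{|x|>\frac{1}{5}a_n} dV_\mu(x) < \varepsilon, \tag{8}
\]
\[
\frac{a_{n+1}}{a_n} < \frac{3}{2}, \tag{9}
\]
\[
\frac{a_{n+t}}{a_n} > \frac{2}{3}, \qquad t = 1, 2, \ldots \tag{10}
\]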
k n
i−1
1
≤ |x − bμ | dVμ (x).
an0 2 a <|x|<2a
3 ni ni−1
i=1 μ=nk
¶ 535 ¶ Using (10), (11) and (8) this gives since |bn | ≤ an
n0
1
(x − bμ ) dVμ (x)
an0 |x|<2an
μ=nk 0
k n
i−1
4
≤ ani−1 dVμ (x)
an0 2 a <|x|<2a
3 ni ni−1
i=1 μ=nk
ni−1
4
k
≤ ani−1 dVμ (x)
an0 μ=n |x|> 29 an
i=1 k i−1
∞
4
k i
1
< ani−1 < 4 = 8.
an0 2
i=1 i=1
Denote by N0 the smallest index satisfying aN0 > 2aN such that ank < aN0 ;
N0 is a constant which does not depend on n. We choose ξ = ξ(ε) in such a
way that
\[
\sum_{\mu=1}^{N_0} \int_{|x|\ge\xi} dV_\mu(x) < \varepsilon.
\]
\[
\cdots < 11\varepsilon + \frac{N_0 (2a_{N_0} + \xi)}{a_n}.
\]
Since ξ and N0 depend only on ε, but not on n, the right-hand side tends to
11ε as n → ∞, hence (7), and so (6), follow.
Secondly, we prove some kind of invariance property of the relations (1)
and (2). Assume that {bn } is an arbitrary sequence of real numbers such that
\[
\lim \frac{b_n}{a_n} = 0, \tag{12}
\]
and we set again Vn∗ (x) = Vn (x + bn ). We are going to show that (1) and (2)
imply the analogous relations for Vn∗ (x), that is ¶ ¶ 536
\[
\lim \sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu^*(x) = 0, \tag{13}
\]
\[
\lim \frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<\eta a_n} x^2\,dV_\mu^*(x) - \left( \int_{|x|<\eta a_n} x\,dV_\mu^*(x) \right)^{2} \right\} = 1. \tag{14}
\]
For the proof we note first that, due to (4) and (12), we also have
\[
\lim \frac{1}{a_n} \max[|b_1|, \ldots, |b_n|] = 0; \tag{15}
\]
thus one can find some N = N (η) such that for n > N and μ = 1, 2, . . . , n one
always has
\[
\frac{|b_\mu|}{a_n} < \frac{\eta}{2}.
\]
But then one has for n > N
\[
\sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu^*(x) \le \sum_{\mu=1}^{n} \int_{|x|>\frac{1}{2}\eta a_n} dV_\mu(x)
\]
x2
dVμ∗ (x) − x dVμ∗ (x)
⎩ |x|<ηan |x|<ηan ⎭
μ=1
⎧ ⎫
n ⎨
ηan +bμ ηan +bμ
2⎬
2
1
n
lim 2 x dVμ∗ (x) =0
an |x|<ηan
μ=1
For later reference let us make one further remark. Because of (6) the two
relations
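\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0
\quad\text{and}\quad
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} b_\mu \int_{|x|<\eta a_n} dV_\mu(x) = 0 \tag{17}
\]
¶ are equivalent, i.e. either both hold or fail at the same time. According to ¶ 538
(15) one has, however,
\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} b_\mu \int_{|x|\ge\eta a_n} dV_\mu(x) = 0.
\]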
Adding this equality to the second equality in (17), one obtains the
holds, it is necessary and sufficient that for the numbers defined in (5)
\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} b_\mu = 0
\]
holds.
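a) First we verify (β). For this we choose for any given ε > 0 some
δ = δ(ε, T ) such that for |x| < δan and |t| < T
\[
1 - \cos\frac{xt}{a_n} < \frac{\varepsilon}{2}, \qquad \left| \sin\frac{xt}{a_n} \right| < \frac{\varepsilon}{2}
\]
holds; moreover, we pick N = N (ε) in such a way, that for n > N and the
above δ
\[
\sum_{\mu=1}^{n} \int_{|x|>\delta a_n} dV_\mu(x) < \frac{\varepsilon}{4}
\]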
and this is the first inequality (β). The analogous inequality for the sine follows
either in the same way, or directly by an application of Schwarz’ inequality.
c) From this point onwards, the proof basically does not differ from
P. Lévy’s proof of Lindeberg’s theorem.
Since by (β) all characteristic functions $v_\mu\!\left(\frac{t}{a_n}\right)$ are uniformly close to 1
for sufficiently large n, we can take logarithms in (α), if we agree to use the
principal branch. Then, (α) is equivalent to (always uniformly in |t| < T )
\[
\lim \sum_{\mu=1}^{n} \log v_\mu\!\left(\frac{t}{a_n}\right) = -\frac{t^2}{2}; \tag{2}
\]
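\[
\sum_{\mu=1}^{n} \log v_\mu\!\left(\frac{t}{a_n}\right) \sim - \sum_{\mu=1}^{n} \left\{ 1 - v_\mu\!\left(\frac{t}{a_n}\right) \right\}, \tag{4}
\]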
¶ where the symbol ∼ indicates that the difference of both sides uniformly ¶ 540
tends to zero. All that remains is to prove the following relation which is
equivalent to (2):
\[
\lim \sum_{\mu=1}^{n} \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) = \frac{t^2}{2}. \tag{5}
\]
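According to (I′)–(III′) one can find for every ε > 0 some N such that for
n > N
\[
\sum_{\mu=1}^{n} \int_{|x|\ge\varepsilon a_n} dV_\mu(x) < \varepsilon,
\]
\[
\left| \frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<\varepsilon a_n} x^2\,dV_\mu(x) - 1 \right| < \varepsilon,
\]
\[
\frac{1}{a_n} \sum_{\mu=1}^{n} \left| \int_{|x|<\varepsilon a_n} x\,dV_\mu(x) \right| < \varepsilon.
\]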
and set
\[
s_n^2 = \sum_{\mu=1}^{n} \sigma_\mu^2. \tag{2}
\]
holds, then {Vn (x)} with the normalizing factors sn belongs properly to Φ(x).
If the sequence
\[
\alpha_n^2 = \frac{1}{s_n^2} \sum_{\mu=1}^{n} \int_{|x|<s_n} x^2\,dV_\mu(x)
\]
has a positive lower bound, then the sequence {Vn (x)} with the normalizing
factors an = αn sn belongs properly to Φ(x). The proof that (I′)–(III′) are
indeed satisfied, is obvious.
¶ It is, however, easy to show that the Lindeberg condition (3) is neces- ¶ 542
sary for the fact that {Vn (x)} with the normalizing factors sn belongs to Φ(x).
This assertion is, of course, contained in the proof of the next section. That
proof, however, is rather complicated while the proof of the particular case
considered here is immediate. On the other hand, it is interesting to note that
Lindeberg’s theorem completely covers the practically only relevant normal-
ization (2); therefore we include its proof.
Thus, assume that {Vn (x)} with the normalizing factors sn belongs to
Φ(x), i.e. § 2, (α) and (β) hold with an = sn . Then one has by (1)
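\[
\sum_{\mu=1}^{n} \left| 1 - v_\mu\!\left(\frac{t}{s_n}\right) \right| = \sum_{\mu=1}^{n} \left| \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{s_n}} \right) dV_\mu(x) \right| < \frac{t^2}{2}. \tag{4}
\]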
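Moreover, because of (β), $v_\mu\!\left(\frac{t}{s_n}\right)$ tends to 1 uniformly, which means that we
may take, as in § 4, (2)–(5), logarithms in (α) and conclude that, uniformly
for |t| < T ,
\[
\sum_{\mu=1}^{n} \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{s_n}} \right) dV_\mu(x) \to \frac{t^2}{2};
\]
therefore, we have
\[
\lim \left\{ \frac{t^2}{2} - \sum_{\mu=1}^{n} \int_{-\infty}^{+\infty} \left( 1 - \cos\frac{xt}{s_n} \right) dV_\mu(x) \right\} = 0. \tag{5}
\]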
Since this inequality holds for all t, we conclude that the expression inside the
curly braces converges to 0, q.e.d.
holds. Obviously, this can be done in exactly one way. Note that the following
arguments remain valid if we replace on the right-hand sides 1/2 by any real
number q < 1: This changes in all calculations only some not really important
constants.d
From (1) one directly obtains that
\[
\frac{b_n}{a_n} \to 0. \tag{2}
\]
an
Therefore, cf. § 3, p. 535, the sequence {Vn (x)} satisfies the conditions (I) and
(II) if, and only if, the corresponding relations are valid for {Vn∗ (x)}, and we
will show that this is indeed the case. Of course, the sequence {Vn∗ (x)} need
not belong to Φ(x). For the corresponding characteristic functions, however,
one has
since the coefficients an satisfy the inequality § 2, (2), p. 531, it follows from
(2), again, that
\[
\lim \frac{1}{a_n} \max[|b_1|, \ldots, |b_n|] = 0.
\]
Thus, (3) entails that, together with $v_\mu\!\left(\frac{t}{a_n}\right) \to 1$, the term $v_\mu^*\!\left(\frac{t}{a_n}\right)$ converges
d See the footnote to § 1, (6).
\[
\sum_{\mu=1}^{n} \Re \log v_\mu\!\left(\frac{t}{a_n}\right) \to -\frac{t^2}{2}. \tag{8}
\]
Now we have
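\[
\begin{aligned}
\log v_\mu\!\left(\frac{t}{a_n}\right) ={}& - \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) - \frac{1}{2} \left\{ \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) \right\}^{2} \\
& + o\!\left( \left| \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) \right|^{2} \right).
\end{aligned}
\tag{9}
\]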
Thus we get because of the uniformity in (8) that for |t| < T and all n
\[
\sum_{\mu=1}^{n} \int_{-\infty}^{+\infty} \left( 1 - \cos\frac{xt}{a_n} \right) dV_\mu(x) < M \tag{11}
\]
where, of course, M = M (T ).
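This yields, by (10), that the sum of the absolute values of the quadratic
terms in (9) for μ = 1, . . . , n stays bounded, and so we have
\[
\sum_{\mu=1}^{n} \log v_\mu\!\left(\frac{t}{a_n}\right) \sim - \sum_{\mu=1}^{n} \left\{ \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) + \frac{1}{2} \left( \int_{-\infty}^{+\infty} \left( 1 - e^{\frac{ixt}{a_n}} \right) dV_\mu(x) \right)^{2} \right\}, \tag{12}
\]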
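Thus (8) and (12) finally imply that, uniformly in |t| < T ,
\[
\lim \sum_{\mu=1}^{n} \left\{ \int_{-\infty}^{+\infty} \left( 1 - \cos\frac{xt}{a_n} \right) dV_\mu(x) - \frac{1}{2} \left( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \right)^{2} \right\} = \frac{t^2}{2}. \tag{13}
\]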
This is the fundamental relation from which we will derive the necessity of (I)
and (II).
To simplify (13) even further, we need two more consequences of (11).
First, according to (11) one has, a fortiori, for every η > 0
\[
\sum_{\mu=1}^{n} \int_{|x|>\eta a_n} \left( 1 - \cos\frac{xt}{a_n} \right) dV_\mu(x) < M.
\]
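For any given η we pick τ = τ (η) such that for |x| < ηan
\[
1 - \cos\frac{x\tau}{a_n} \ge \frac{1}{4} \cdot \frac{x^2\tau^2}{a_n^2}.
\]
Again we have by (11)
\[
M > \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} \left( 1 - \cos\frac{x\tau}{a_n} \right) dV_\mu(x) \ge \frac{\tau^2}{4} \cdot \frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} x^2\,dV_\mu(x).
\]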
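\[
\begin{aligned}
& \sum_{\mu=1}^{n} \left( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \right)^{2} - \sum_{\mu=1}^{n} \left( \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \right)^{2} \\
&\quad = \sum_{\mu=1}^{n} \left\{ \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) + \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \right\} \int_{|x|\ge\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x).
\end{aligned}
\tag{16}
\]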
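\[
\begin{aligned}
\left( \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \right)^{2}
& \le \int_{|x|<\eta a_n} \sin^2\frac{xt}{a_n}\,dV_\mu(x) \\
& \le 2 \int_{|x|<\eta a_n} \left( 1 - \cos\frac{xt}{a_n} \right) dV_\mu(x) \\
& \le 2 \int_{-\infty}^{+\infty} \left( 1 - \cos\frac{xt}{a_n} \right) dV_\mu(x);
\end{aligned}
\tag{17}
\]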
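by (5) (= § 2, (β)) the expression inside the curly braces on the right-hand
side of (16) tends uniformly to 0, and we find for sufficiently large n
\[
\left| \sum_{\mu=1}^{n} \left( \int_{-\infty}^{+\infty} \sin\frac{xt}{a_n}\,dV_\mu(x) \right)^{2} - \sum_{\mu=1}^{n} \left( \int_{|x|<\eta a_n} \sin\frac{xt}{a_n}\,dV_\mu(x) \right)^{2} \right|
< \varepsilon \sum_{\mu=1}^{n} \int_{|x|\ge\eta a_n} dV_\mu(x) < \varepsilon M.
\]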
n 2
2
n
xt xt
sin dVμ (x) − dVμ (x)
|x|<ηan
an |x|<ηa n
an
μ=1 μ=1
n
xt xt xt xt
= sin + dVμ (x) sin − dVμ (x)
an an an an
μ=1 |x|<ηan |x|<ηan
n
x2 t2
≤ (1 + η|t|) 2
dVμ (x) < (1 + η|t|)t2 · K · ;
|x|<ηan an
μ=1
Now we pick τ = τ (ε, η) so small that for |t| < τ and |x| < ηan we have
\[
1 - \cos\frac{xt}{a_n} \ge (1 - \varepsilon) \frac{x^2 t^2}{2 a_n^2}. \tag{20}
\]
that
t2
(1 − A2 ) ≤ 2M .
2
By (22), the left-hand side is non-negative and, since the estimate holds for
all t, we have A = 1. Together with (22) this gives condition (II), which shows
that it is necessary.
¶ The necessity of condition (I) is an immediate consequence. For the ¶ 550
given interval |t| < T we find some η̄ = η̄(T, ε) such that for |x| < η̄an (20)
holds. Using (II), the estimate (21) gives for η < η̄
⎧ ⎫
n ⎨ 2⎬
xt t2
lim 1 − cos dVμ (x) − 2 x dVμ (x)
⎩ |x|<ηan an 2an |x|<ηan ⎭
μ=1
t2
≥ (1 − K),
2
and from (19) it follows for η < η̄ that
n
xt
lim 1 − cos dVμ (x) ≤ .
|x|≥ηan an
μ=1
Since the left-hand side decreases monotonically as η grows, this estimate holds
for every η > 0; thus, the sum on the left-hand side converges to 0, and the
limit is, of course, again uniform. So, one has for n > N (ε, T )
\[
\sum_{\mu=1}^{n} \int_{|x|>\eta a_n} \left( 1 - \cos\frac{xt}{a_n} \right) dV_\mu(x) < \varepsilon
\]
Since, by assumption, {Vn (x)} belongs to Φ(x), (23) holds, and this proves
the necessity of (III).
It is a direct consequence of the just established necessity of the conditions
(I)–(III) that (I′)–(III′) are necessary for the fact that {Vn (x)} belongs properly
to Φ(x).
Indeed, by definition it is necessary that (I), (II) and the condition
\[
\lim \frac{1}{a_n} \sum_{\mu=1}^{n} i_\mu \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0
\]
hold where the ik may attain, independently of each other, the values ±1.
This is just (III′). By (I) one has uniformly in μ = 1, . . . , n
\[
\lim \frac{1}{a_n} \int_{|x|<\eta a_n} x\,dV_\mu(x) = 0
\]
The condition (II) simplifies, because of (III′), to (II′) and this proves its
necessity.
setting
⎧ ⎫
n ⎨
2⎬
then the sequence {an }, together with the distribution functions Vn (x), satisfies
the conditions (I) and (II).
Proof. a) Assume that (I) and (II) hold. By (I) one has for every δ > 0
\[
\lim \frac{p_n(\delta)}{a_n} = 0, \tag{4}
\]
and by (II)
\[
\liminf \frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<a_n} x^2\,dV_\mu(x) \ge 1. \tag{5}
\]
satisfies, together with the distribution functions {Vn (x)}, the conditions (I)
and (II).
From the assumption on the position of the origins we get, by the Schwarz
inequality,
\[
\left( \int_{|x|<q_n} x\,dV_\mu(x) \right)^{2} \le \frac{1}{2} \int_{|x|<q_n} x^2\,dV_\mu(x),
\]
and so,
\[
a_n^2 \ge \frac{1}{2} \sum_{\mu=1}^{n} \int_{|x|<q_n} x^2\,dV_\mu(x).
\]
If we assume (7), then $\lim \frac{q_n}{a_n} = 0$, and if (8) holds, then $\lim \frac{q_n}{a_n}$ is finite. Because
of the definition of qn and an , the condition (I) is satisfied.
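Moreover, if we assume (7), one has for all sufficiently large n such that
qn < ηan
\[
\begin{aligned}
& \left| \frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \left( \int_{|x|<\eta a_n} x\,dV_\mu(x) \right)^{2} \right\} - 1 \right| \\
&\quad = \left| \frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \left( \int_{|x|<\eta a_n} x\,dV_\mu(x) \right)^{2} \right\}
- \frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<q_n} x^2\,dV_\mu(x) - \left( \int_{|x|<q_n} x\,dV_\mu(x) \right)^{2} \right\} \right| \\
&\quad \le 3\eta^2 \sum_{\mu=1}^{n} \int_{|x|\ge q_n} dV_\mu(x),
\end{aligned}
\]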
and the right-hand side tends to 0, which proves (II) in this case. – If we
assume (8), an analogous estimate holds, only the right-hand side becomes,
say,
n
6 dVμ (x)
μ=1 |x|≥σqn
§ 8. Examples
a) The case where all components are identical: Vn (x) = V (x).
The discussion above shows that we can apply the criterion if we fix the origin
in such a way that V (0) = 0 and = 1. The numbers pn (δ) are defined as the
smallest real numbers such that
\[
\int_{|x|>p_n(\delta)} dV_\mu(x) \le \frac{\delta}{n}.
\]
For the existence of a sequence {bn } such that {Vn (x) = V (x + bn )} belongs to
Φ(x), it is necessary and sufficient that
1
lim x2 dV (x) = ∞.
ζ→0 ζZ 2 |x|≤Z
From this we see: The sequence belongs for s ≥ 2 to Φ(x), whereas for s < 2 ¶
12 The assumption on the position of the origin is here and in the following examples
\[
\delta_n = \frac{1}{\log\log n},
\]
and obtain by § 7, (3),
\[
a_n^2 = n s \int_{1}^{(n \log\log n)^{1/s}} \frac{dx}{x^{s-1}}.
\]
Since every equivalent sequence does the same job, we see: One can use the
following normalizing factors: $a_n^2 = \frac{s}{s-2}\,n$ if s > 2, and $a_n^2 = n \log n$ if s = 2.
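The passage from the integral expression for a_n² to the simpler equivalent sequences can be checked numerically, since the integral has an elementary antiderivative. The sketch below evaluates it in closed form and compares it with the stated factors; the function name `a_n_squared` is hypothetical, and the convergence of the ratio for s = 2 is only logarithmic, so the ratio is still visibly above 1 for n = 10^12.

```python
import math


def a_n_squared(n, s):
    """n * s * integral from 1 to U of x^(1 - s) dx,
    with U = (n log log n)^(1/s) and s >= 2."""
    upper = (n * math.log(math.log(n))) ** (1.0 / s)
    if s == 2:
        integral = math.log(upper)
    else:
        integral = (1.0 - upper ** (2.0 - s)) / (s - 2.0)
    return n * s * integral


n = 10**12
ratio_s3 = a_n_squared(n, 3) / (3.0 / (3 - 2) * n)  # against s/(s-2) * n
ratio_s2 = a_n_squared(n, 2) / (n * math.log(n))    # against n log n
```

For s = 3 the ratio differs from 1 by roughly (n log log n)^(-1/3); for s = 2 it equals 1 + log(log log n)/log n.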
s−2
It is worth noticing that in the latter case a2n grows faster than n, although
there are n identical components.
b) Let Vn (x) be a step function with five jumps which have the following
sizes
\[
\left.
\begin{array}{ll}
\dfrac{1}{2c} & \text{for } x = \pm 1, \\[2ex]
\dfrac{1}{2} \left( 1 - \dfrac{1}{c} \right) \dfrac{1}{n^2} & \text{for } x = \pm n, \\[2ex]
1 - \dfrac{1}{c} - \left( 1 - \dfrac{1}{c} \right) \dfrac{1}{n^2} & \text{for } x = 0,
\end{array}
\right\} \qquad c > 1.
\]
The first moment vanishes, the second equals 1. The usual normalization would
be $s_n = \sqrt{n}$; in fact one finds, either by an application of the general criterion,
or by § 5, p. 542, that the sequence belongs to Φ(x) with the normalizing
factors $a_n = \sqrt{n/c}$. Lindeberg's condition is, of course, not satisfied.
It is instructive to use the example in order to understand how the conver-
gence is achieved. One has
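\[
\prod_{\mu=1}^{n} v_\mu\!\left( \sqrt{\frac{c}{n}}\,t \right) = \prod_{\mu=1}^{n} \left\{ 1 - \frac{1}{c} \left( 1 - \cos\sqrt{\frac{c}{n}}\,t \right) - \frac{1}{\mu^2} \left( 1 - \frac{1}{c} \right) \left( 1 - \cos\sqrt{\frac{c}{n}}\,\mu t \right) \right\}.
\]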
μ=1 μ=1
For the first expression inside the braces it is enough to consider the linear
term of the Taylor development, whereas the second expression requires a
completely different estimate. We write
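\[
v_\mu\!\left( \sqrt{\frac{c}{n}}\,t \right) = 1 - \frac{t^2}{2n} - \varphi(t) - \psi_\mu(t),
\]
where
\[
\varphi(t) = \frac{1}{c} \left( 1 - \cos\sqrt{\frac{c}{n}}\,t \right) - \frac{t^2}{2n}, \qquad\text{hence}\qquad |\varphi(t)| < \frac{c\,t^4}{n^2}
\]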
satisfied because of symmetry; the same applies to the condition (III′).
n
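Then
\[
\prod_{\mu=1}^{n} v_\mu\!\left( \sqrt{\frac{c}{n}}\,t \right) = \left( 1 - \frac{t^2}{2n} \right)^{n} \prod_{\mu=1}^{n} \left\{ 1 - \frac{\varphi(t) + \psi_\mu(t)}{1 - \frac{t^2}{2n}} \right\}
\]
and the previous estimates show that the product on the right-hand side tends
to 1; thus, the whole expression tends to $e^{-\frac{1}{2}t^2}$.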
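The convergence in example b) can be observed numerically by multiplying the n characteristic functions directly. The sketch below assumes the five-point laws with masses 1/(2c) at ±1 and (1/2)(1 − 1/c)/μ² at ±μ, and the normalizing factor √(n/c); the function name `product_b` and the sample values c = 2, t = 1 are illustrative choices.

```python
import math


def product_b(n, c, t):
    """Product over mu = 1..n of v_mu(sqrt(c/n) * t), where
    v_mu(u) = 1 - (1/c)(1 - cos u) - (1 - 1/c)(1 - cos(mu u)) / mu^2."""
    y = math.sqrt(c / n) * t
    prod = 1.0
    for mu in range(1, n + 1):
        v = (1.0
             - (1.0 / c) * (1.0 - math.cos(y))
             - (1.0 - 1.0 / c) / mu**2 * (1.0 - math.cos(mu * y)))
        prod *= v
    return prod


target = math.exp(-0.5)  # e^{-t^2/2} at t = 1
gap_small_n = abs(product_b(100, 2.0, 1.0) - target)
gap_large_n = abs(product_b(200_000, 2.0, 1.0) - target)
```

The gap shrinks only at a rate of order n^(-1/2), which reflects the contribution of the atoms at ±μ that the linear Taylor term does not capture.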
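c) Let Vμ (x) be again a step function, now with jumps of the size
\[
\begin{array}{ll}
\dfrac{1}{4} & \text{if } x = \pm 1, \\[2ex]
\dfrac{\mu - 1}{2\mu^4} & \text{if } x = \pm\mu^2, \\[2ex]
\dfrac{1}{2} - \dfrac{\mu - 1}{\mu^4} & \text{if } x = 0.
\end{array}
\]
The second moment of Vμ (x) is $\sigma_\mu^2 = \frac{2\mu - 1}{2}$; with the usual normalization
$s_n^2 = \frac{1}{2} n^2$ we obtain a sequence which converges to E(x) (Definition § 1, (3),
p. 523).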
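The moment computations in example c) are elementary and can be verified exactly with rational arithmetic. The sketch below assumes the jump sizes 1/4 at ±1, (μ − 1)/(2μ⁴) at ±μ² and the remaining mass at 0; the helper names `atoms_c` and `second_moment` are hypothetical. A list of pairs is used instead of a dictionary because for μ = 1 the points ±μ² coincide with ±1.

```python
from fractions import Fraction


def atoms_c(mu):
    """Atoms and masses of V_mu in example c) as (atom, mass) pairs."""
    far = Fraction(mu - 1, 2 * mu**4)
    return [(-1, Fraction(1, 4)), (1, Fraction(1, 4)),
            (-mu**2, far), (mu**2, far),
            (0, Fraction(1, 2) - 2 * far)]


def second_moment(atoms):
    return sum(x * x * p for x, p in atoms)


n = 50
total_mass_ok = all(sum(p for _, p in atoms_c(mu)) == 1
                    for mu in range(1, n + 1))
sigma_sq = [second_moment(atoms_c(mu)) for mu in range(1, n + 1)]
s_n_sq = sum(sigma_sq)  # equals n^2 / 2
```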
In order to apply the criterion we note that for every sequence {pn } which
monotonically diverges to ∞ one has
\[
\sum_{\mu=1}^{n} \int_{|x|>p_n} dV_\mu(x) \to 0.
\]
this sequence is equivalent to 12 n. Thus, the sequence {Vμ (x)} with the nor-
malizing factors 12 n (properly) belongs to Φ(x).
which means that the sequence {an } should grow faster than n. For any such
sequence, however, one has
\[
\frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<a_n} x^2\,dV_\mu(x) = \frac{n^2}{a_n^2} \to 0,
\]
while the limit inferior of this quantity with the normalizing factors should
be ≥ 1. Hence, there exists no sequence of constants such that {Vn (x + bn )}
belongs to Φ(x).
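e) Let
\[
V_\mu(x) =
\begin{cases}
0 & \text{if } x < 1, \\[1ex]
\dfrac{1}{\mu^{2/3}} & \text{if } 1 \le x < 1 + \sqrt[3]{\mu}, \\[1ex]
1 & \text{if } 1 + \sqrt[3]{\mu} \le x,
\end{cases}
\]
and set $a_n = \sqrt{n}$. Then we have for $n > \left( \frac{2}{\eta} \right)^{6}$
\[
\sum_{\mu=1}^{n} \int_{|x|>\eta a_n} dV_\mu(x) = 0
\]
and
\[
\frac{1}{a_n^2} \sum_{\mu=1}^{n} \left\{ \int_{|x|<\eta a_n} x^2\,dV_\mu(x) - \left( \int_{|x|<\eta a_n} x\,dV_\mu(x) \right)^{2} \right\} = \frac{1}{n} \sum_{\mu=1}^{n} \left( 1 - \frac{1}{\mu^{2/3}} \right) \to 1,
\]
which means that the conditions (I) and (II) are satisfied. On the other hand,
we have
\[
\frac{1}{a_n^2} \sum_{\mu=1}^{n} \int_{|x|<\eta a_n} x^2\,dV_\mu(x) > \frac{1}{n} \sum_{\mu=1}^{n} \mu^{1/3} \to \infty.
\]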
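The behaviour claimed in example e) can be illustrated numerically. The sketch below rests on an assumed reading of the definition: Vμ places mass μ^(-2/3) at x = 1 and the remaining mass at x = 1 + μ^(1/3). It evaluates the centred sum from (II) and the mean sum from (III) with a_n = √n and without truncation, which is legitimate once η√n exceeds 1 + n^(1/3). The function name `sums_e` is hypothetical.

```python
import math


def sums_e(n):
    """(II)-sum and (III)-sum for the two-point laws of e), a_n = sqrt(n).

    Assumed reading: V_mu has mass mu^(-2/3) at x = 1 and the rest
    at x = 1 + mu^(1/3).
    """
    a_n = math.sqrt(n)
    var_sum = mean_sum = 0.0
    for mu in range(1, n + 1):
        p = mu ** (-2.0 / 3.0)
        hi = 1.0 + mu ** (1.0 / 3.0)
        mean = p * 1.0 + (1.0 - p) * hi
        second = p * 1.0 + (1.0 - p) * hi * hi
        var_sum += second - mean * mean
        mean_sum += mean
    return var_sum / a_n**2, mean_sum / a_n


ii_value, iii_value = sums_e(100_000)
```

Under this reading the first value approaches 1 while the second grows without bound, so the sequence satisfies (I) and (II) but not (III) with the origins as given.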
Appendix
Let us finally (cf. § 1, p. 524) provide an example of a sequence {Vn (x)} such
that Vn (cn x) → Φ(x), but there is no sequence {an } for which Wn (an x) →
Φ(x).
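Denote by lμ the solution of the equation $\Phi(l_\mu) = 1 - \frac{1}{\mu}$, and set for any
integer k > 0
\[
m_{2^k+1} = m_{2^k+2} = \cdots = m_{2^{k+1}} = \left( \sqrt{2} \right)^{2^{k+1}+1}.
\]
Then we define
\[
V_\mu(x) =
\begin{cases}
\Phi\!\left( \dfrac{x}{(\sqrt{2})^{\mu}} \right) & \text{if } 0 \le x \le (\sqrt{2})^{\mu}\, l_\mu, \\[2ex]
\Phi(l_\mu) & \text{if } (\sqrt{2})^{\mu}\, l_\mu \le x < m_\mu, \\[1ex]
1 & \text{if } m_\mu \le x,
\end{cases}
\]
\[
V_\mu(x) = 1 - V_\mu(-x) \quad\text{if } x < 0.
\]
Then $V_\mu\!\left( (\sqrt{2})^{\mu} \cdot x \right)$ tends uniformly to Φ(x). By § 2, p. 532 the only possible
normalizing factors are sequences which are equivalent to $a_n = (\sqrt{2})^{n+1}$. – For
the characteristic function we get
\[
v_\mu(t) = \int_{-\infty}^{+\infty} e^{ixt}\,dV_\mu(x) = 1 - u_\mu(t) - \frac{2}{\mu} (1 - \cos m_\mu t)
\]
with
\[
u_\mu(t) = \int_{|x|<(\sqrt{2})^{\mu} l_\mu} \left( 1 - e^{ixt} \right) d\Phi\!\left( \frac{x}{(\sqrt{2})^{\mu}} \right).
\]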
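\[
\prod_{\mu=1}^{n} v_\mu\!\left( \frac{t}{(\sqrt{2})^{n+1}} \right) \to e^{-\frac{1}{2}t^2},
\]
hence,
\[
\prod_{\mu=1}^{n} \left\{ 1 - \frac{\dfrac{2}{\mu} \left( 1 - \cos\dfrac{m_\mu t}{(\sqrt{2})^{n+1}} \right)}{1 - u_\mu\!\left( \dfrac{t}{(\sqrt{2})^{n+1}} \right)} \right\} \to 1,
\]
or
\[
\sum_{\mu=1}^{n} \frac{1}{\mu} \left( 1 - \cos\frac{m_\mu t}{(\sqrt{2})^{n+1}} \right) \to 0.
\]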
(Received 5–May–1935)
On the Theory of Stochastic Processes.
(Existence and Uniqueness.)
By Willy Feller in Stockholm
Content.
Introduction.
§1. Derivation of the Functional Equations.
§2. The Fundamental Solution and the Initial Value Problem for Second-Order
Linear Parabolic Differential Equations in Two Variables.
§3. Further Properties of the Fundamental Solution.
Continuous Stochastic Processes.
§4. Purely Discontinuous Stochastic Processes.
§5. The General Mixed Case.
Introduction.
The following investigation treats random processes with one degree of free-
dom; more precisely it is about those functions F (t, x; τ, ξ) which can appear
as transition probabilities from some state x at time t to another state ≤ ξ at
time τ > t (a purely analytic characterization of these functions will be given
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Niels Jacob and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments.
\[
E(x, \xi) =
\begin{cases}
0 & \text{if } \xi < x, \\
1 & \text{if } \xi \ge x,
\end{cases}
\tag{4}
\]
then we get from the definition of F (t, x; τ, ξ) and the assumed continuity in t
Finally, the composition rule for probabilities yields for every t < t′ < τ the
fundamental relation named after Chapman and Smoluchowski
\[
F(t, x; \tau, \xi) = \int_{-\infty}^{+\infty} F(t', y; \tau, \xi)\,dF(t, x; t', y). \tag{7}
\]
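Relation (7) can be checked numerically for a concrete transition function. The sketch below assumes, purely as an example, the Gaussian transition function of standard Brownian motion, F(t, x; τ, ξ) = Φ((ξ − x)/√(τ − t)); this choice, the function names and the trapezoidal quadrature are illustrative and do not come from the paper.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def F(t, x, tau, xi):
    """Assumed example: P(X_tau <= xi | X_t = x) for Brownian motion."""
    return Phi((xi - x) / math.sqrt(tau - t))


def density(t, x, tau, y):
    """Derivative of F(t, x; tau, y) with respect to y."""
    s = math.sqrt(tau - t)
    return math.exp(-0.5 * ((y - x) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def chapman_rhs(t, x, t_mid, tau, xi, half_width=12.0, steps=4000):
    """Trapezoidal value of the integral of
    F(t_mid, y; tau, xi) dF(t, x; t_mid, y) over y."""
    lo = x - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * F(t_mid, y, tau, xi) * density(t, x, t_mid, y)
    return total * h


lhs = F(0.0, 0.3, 2.0, 1.0)
rhs = chapman_rhs(0.0, 0.3, 0.8, 2.0, 1.0)
```

Since the Stieltjes differential refers to the last variable y, the integral is taken against the density of F(t, x; t′, y) in y.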
Here and in the sequel the Stieltjes differential always refers to the last variable,
¶ 117 i.e. to y. The existence of the ¶ integral in (7) is a priori clear since we have
0 ≤ F ≤ 1 and since F (t, x; τ, ξ) is Borel measurable in x; cf. Lebesgue [11,
p. 261].
terminology and write random variable. See also Feller’s comment in footnote 1 of the later
paper [Feller 1937b].
For processes which are also homogeneous in space, i.e. where G depends only
on τ − t and ξ − x, our problem has been solved in the greatest generality, as
Kolmogoroff [10] has constructed – assuming only the existence of the second
moment for G – the most general solution to (7b) with the initial condition
(6) using the theory of characteristic functions. Cramér [2] gives asymptotic
estimates of this solution as t → ∞. For the connection of the continuous ho-
mogeneous case with differential equations see Petrowsky [13] and Khintchine
[6]. ¶ ¶ 118
exist7 (are finite); because of (11), these limits depend only seemingly on δ 8 .
Let us explicitly remark that the existence of any moments will not be assumed
(and there are, indeed, solutions F (t, x; τ, ξ) without finite moments).
Since we want to derive a differential equation for F (t, x; τ, ξ) as a function
of t, x, we have to assume that
\[
\frac{\partial F(t,x;\tau,\xi)}{\partial x}, \qquad \frac{\partial^2 F(t,x;\tau,\xi)}{\partial x^2}
\]
exist and are continuous in x for every triplet t, ξ, τ > t. On the other hand,
no assumptions are made on the continuity in ξ.
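¶ 119 ¶ Now pick δ > 0 and Δt > 0. For every fixed t, x, τ > t, ξ one has by (2) and
(7)
\[
\begin{aligned}
\frac{F(t-\Delta t,x;\tau,\xi) - F(t,x;\tau,\xi)}{\Delta t}
&= \frac{1}{\Delta t}\int_{-\infty}^{+\infty} \bigl\{F(t,y;\tau,\xi) - F(t,x;\tau,\xi)\bigr\}\, dF(t-\Delta t,x;t,y) \\
&= \frac{1}{\Delta t}\int_{|y-x|\ge\delta} \bigl\{F(t,y;\tau,\xi) - F(t,x;\tau,\xi)\bigr\}\, dF(t-\Delta t,x;t,y) \\
&\quad + \frac{\partial F(t,x;\tau,\xi)}{\partial x}\,\frac{1}{\Delta t}\int_{|y-x|<\delta} (y-x)\, dF(t-\Delta t,x;t,y) \\
&\quad + \frac{\partial^2 F(t,x;\tau,\xi)}{\partial x^2}\,\frac{1}{\Delta t}\int_{|y-x|<\delta} \Bigl[\tfrac{1}{2}(y-x)^2 + o\bigl((y-x)^2\bigr)\Bigr]\, dF(t-\Delta t,x;t,y).
\end{aligned}
\tag{14}
\]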
7 Of course, one could have replaced F (t − Δt, x; t, y) in (11)–(13) by F (t, x; t + Δt, y).
The thus obtained equations are, by the way, for continuous a(t, x) and b(t, x) – and only
this case will be considered – a consequence of (11)–(13) and vice versa. In the sequel we
will only use (11)–(13).
8 Our assumptions differ from those of Kolmogoroff [7, pp. 445 f.] and [8, p. 150]. Therein
(11) is strengthened by assuming the existence of
$m_k = \int_{-\infty}^{+\infty} |y-x|^k\, dF(t,x;\tau,y)$ for k = 1, 2, 3 and
$\lim \frac{m_0}{m_2} = 0$. With these assumptions, (12) and (13) have been proved (under
certain assumptions on the regularity and that a functional determinant does not vanish). –
Khintchine and Petrowsky replace (11) by the stronger analogue of Lindeberg’s condition
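\[
-\frac{\partial f(t,x;\tau,\xi)}{\partial\tau} + \frac{\partial^2}{\partial\xi^2}\bigl[a(\tau,\xi) f(t,x;\tau,\xi)\bigr] - \frac{\partial}{\partial\xi}\bigl[b(\tau,\xi) f(t,x;\tau,\xi)\bigr] = 0. \tag{17}
\]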
Let us note that for a general equation f may become negative. For equations
of the form (15) (i.e. c = 0) F (t, x; τ, ξ) is indeed a distribution function in ξ,
so that all conditions of a stochastic process are satisfied. We will furthermore
show that the relations (11)–(13) hold which solves the problem completely.
The equation (17) together with (16) and the initial condition (6) uniquely
determines our stochastic process, too.
Consequently, (11) is the analogue of a weaker condition which may be used to replace
Lindeberg’s condition cf. Feller [3].
b The original paper contains the misprint
$-\frac{\partial f}{\partial\tau} - \frac{\partial^2}{\partial\xi^2}\bigl[af\bigr] + \frac{\partial}{\partial\xi}\bigl[bf\bigr]$
which has been corrected.
holds. Here p(t, x) and P (t, x, ξ) denote two non-negative functions, and P (t,x,ξ)
is, as a function of ξ, a distribution function, i.e. non-decreasing (right-continu-
ous, cf. (3)), and we have
For the solution of this problem we will assume, in addition, that p(t, x) and
P (t, x, ξ) are continuous in t and Borel measurable in x; finally (for simplicity)
we assume that p(t, x) is bounded in each finite t-interval. – For F (t, x; τ, ξ) we
do not pose further continuity assumptions than those mentioned in 1.
A marked difference between the purely discontinuous case and all
other cases is the fact that in the other cases the second equation follows only
¶ 121 indirectly ¶ from the first equation, whereas here both equations are equally
important and completely symmetric. In order to derive the equation for t, x
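we note that we have by (7) and (18) for 0 < Δt < τ − t:ᶜ
\[
\begin{aligned}
F(t,x;\tau,\xi) &= \int_{-\infty}^{+\infty} F(t+\Delta t,y;\tau,\xi)\, dF(t,x;t+\Delta t,y) \\
&= \bigl(1 - p(t,x)\Delta t\bigr) F(t+\Delta t,x;\tau,\xi) \\
&\quad + \Delta t\, p(t,x) \int_{-\infty}^{+\infty} F(t+\Delta t,y;\tau,\xi)\, dP(t,x,y) + o(\Delta t).
\end{aligned}
\tag{19}
\]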
The existence of all integrals appearing here and in the sequel is ensured by the
above mentioned measurability and boundedness of the functions. Subtracting
in (19) on both sides F (t + Δt, x; τ, ξ) and dividing by Δt immediately shows
that the right-hand side admits a limit as Δt → 0+. Therefore, the right-hand
derivative ∂F/∂t exists and we get
\[
\frac{\partial F(t,x;\tau,\xi)}{\partial t} = p(t,x)\left[ F(t,x;\tau,\xi) - \int_{-\infty}^{+\infty} F(t,y;\tau,\xi)\, dP(t,x,y) \right]. \tag{20}
\]
In the same way we can deal with the increments Δt < 0 leading again to (20)
c The misprint of the original, τ has been corrected.
\[
\frac{\partial F(t,x;\tau,\xi)}{\partial t} = p(t,x)\bigl\{E(x,\xi) - P(t,x,\xi)\bigr\} \oplus F(t,x;\tau,\xi). \tag{20a}
\]
This is, of course again, the initial value problem for (20) determined by (5).
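In order to get to the second equation for F (t, x; τ, ξ) we write for Δτ > 0ᵈ
\[
\begin{aligned}
F(t,x;\tau+\Delta\tau,\xi) &= \int_{-\infty}^{+\infty} F(\tau,y;\tau+\Delta\tau,\xi)\, dF(t,x;\tau,y) \\
&= \int_{-\infty}^{+\infty} \bigl\{(1 - p(\tau,y)\Delta\tau) E(y,\xi) + \Delta\tau\, p(\tau,y) P(\tau,y,\xi)\bigr\}\, dF(t,x;\tau,y) + o(\Delta\tau).
\end{aligned}
\]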
or
\[
\frac{\partial F(t,x;\tau,\xi)}{\partial\tau} = F(t,x;\tau,\xi) \oplus \bigl\{-p(\tau,x)E(x,\xi) + p(\tau,x)P(\tau,x,\xi)\bigr\}, \tag{21a}
\]
this time with the initial condition (6).
The equations (20) and (21) are, of course, only the simplest particular case
of the two integro-differential equations (26) and (27) which will be derived
in 4.; their solution is much easier and needs less restrictive assumptions. As
we will show in § 4¹⁰, we may choose arbitrary functions p(t, x) and P (t, x, ξ)
which satisfy the assumptions mentioned above. The equations (20) and (21)
with the initial conditions (5) and (6) admit each a unique solution, and these
solutions coincide; this solution satisfies also the other conditions which have
to be satisfied by a transition function (§ 1, 1). As one would a priori expect,
the solution F (t, x; τ, ξ) is, in general, not continuous as a function of ξ.
9 In the notation of (21) it is, of course, assumed that the first integral on the right-
hand side, as well as F (t, x; τ, ξ) and E(x, ξ), are defined such that they are right-continuous
(cf. (3)):
\[
\int_{-\infty}^{\xi} = \lim_{z\to\xi+} \int_{-\infty}^{z}.
\]
This remark is not essential as it is enough to consider continuity points.
10 As already mentioned in the introduction, § 4 is a direct continuation and can be read
Of course, p(t, x) and P (t, x, ξ) are, as stated in the preceding subsection, non-
negative functions and as a function of ξ, P (t, x, ξ) is a distribution function;
moreover, we assume that both functions are continuous in t.
As in the continuous case a direct argument only gives an equation for
F (t, x; τ, ξ) as a function of t, x. Inserting (22) into the fundamental equation
(7) (applied to t − Δt, t, τ ) and subtracting the identity
\[
F(t,x;\tau,\xi) = \int_{-\infty}^{+\infty} F(t,x;\tau,\xi)\, dG(t-\Delta t,x;t,y),
\]
one obtains
\[
\begin{aligned}
\frac{F(t-\Delta t,x;\tau,\xi) - F(t,x;\tau,\xi)}{\Delta t}
&= \frac{1}{\Delta t}\int_{-\infty}^{+\infty} \bigl\{F(t,y;\tau,\xi) - F(t,x;\tau,\xi)\bigr\}\, dG(t-\Delta t,x;t,y) \\
&\quad - p(t,x)\left[ \int_{-\infty}^{+\infty} F(t,y;\tau,\xi)\, dG(t-\Delta t,x;t,y) - \int_{-\infty}^{+\infty} F(t,y;\tau,\xi)\, dP(t,x,y) \right] + o(\Delta t).
\end{aligned}
\]
From (22) and (5) it follows immediately that G(t − Δt, x; t, y) → E(x, y) as
Δt → 0. If we assume again that ∂F/∂x and ∂²F/∂x² exist and are continuous in x,
¶ or ¶ 124
For the following we need the solution of the initial value problem of both the
differential equation (15) and the adjoint differential equation (16). There-
fore it is appropriate to consider right away the most general homogeneous
which we will also use in § 5. In this way it becomes evident, too, to which
extent the results are due to general properties of parabolic equations or, in
particular, related to stochastic processes.
¶ 125 ¶ Of course, it is necessary that a ≠ 0 and we assume that a > 0 (the case
a < 0 can be treated analogously). The coefficients are assumed to be defined
in some fixed t-interval for all x, and then the initial value problem becomes:
We are looking for a continuous solution to (28) which is defined for t < t′ and
converges to a prescribed function g(x) as t → t′−. (The analogous problem for
t > t′ is known to have no solution, in general; if a < 0 the half-plane t < t′ must
be replaced by t > t′.) This initial value problem is, essentially, equivalent to
the construction of some fundamental solution to (28) which is defined for all x,
i.e. a solution to (28) which has in the point (τ, ξ) some prescribed singularity.
Among all fundamental solutions we look for a particular one which is suited
for our purposes and which is the analogue of the so-called Green’s function
for bounded domains. This solution will now be constructed by an adaptation
of Hadamard’s-and-Gevrey’s method for a = 1 [4, mainly II, pp. 138 f.] (cf. the
introduction p. 115).
1. Let a(t, x) be a positive function defined for all real x and t0 < t < t1
for which the derivatives at , ax and axx exist and are continuous (in t and x);
we set
\[
\varphi(t,x) = \int_{0}^{x} \frac{dy}{\sqrt{a(t,y)}}, \tag{29}
\]
for which one can immediately obtain the fundamental solution (cf. (38)).
Note that the restriction (30) is absolutely necessary for the uniqueness of
the initial value problem (31) (hence, for (32)), and thus for the whole theory
to follow. This is shown by the following example:
Set a(t, x) = ch⁴ x where ch x = cos ix; then (31) becomes
\[
u(t,x) = -1 + \Phi\!\left(\frac{1 - \tanh x}{\sqrt{2(\tau-t)}}\right) - \Phi\!\left(\frac{-1 - \tanh x}{\sqrt{2(\tau-t)}}\right) \tag{36}
\]
isᵉ a solution of (33). Since | tanh x| ≤ 1 and Φ(∞) = 1, the solution u converges
to 0 as t → τ −, i.e. (36) is a solution to (33) which does not vanish identically
but tends to the initial value 0 as t → τ.
Even if it is not always explicitly mentioned, we assume throughout that
the quantities t and τ always satisfy the inequality
Now we set
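\[
U_0(t,x;\tau,\xi) = \frac{1}{2\sqrt{\pi}}\, \frac{1}{\sqrt{a(\tau,\xi)}}\, \frac{1}{\sqrt{\tau-t}}\, \exp\left( -\frac{\{\varphi(\tau,\xi) - \varphi(t,x)\}^2}{4(\tau-t)} \right). \tag{38}
\]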
\[
y = \frac{\varphi(\tau,\xi) - \varphi(t,x)}{\sqrt{2(\tau-t)}} \tag{39}
\]
Lemma. Let t0 < T < t1 and let g(t, x) be a bounded function which is defined
¶ 127 for t0 ≤ t ≤ T and all real x ¶ and satisfies in some neighbourhood of every
point (t′, x′) a Lipschitz condition of the form
\[
|g(t,x) - g(t',x')| < K\bigl\{ |t-t'|^{\alpha} + |x-x'|^{\alpha} \bigr\}, \qquad \alpha = \alpha(t',x') > 0. \tag{41}
\]
(The finiteness of the integral (42) follows from (40) and the boundedness of
g(t, x)).
The proof is most simply accomplished by reducing everything to a known
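theorem for the heat equation. Under the (obviously bijective) transformation
\[
\bar t = t, \quad \bar\tau = \tau, \quad \bar x = \varphi(t,x), \quad \bar\xi = \varphi(\tau,\xi),
\]
(42) becomes
\[
G(t,x) \equiv \bar G(\bar t,\bar x) = \frac{1}{2\sqrt{\pi}} \int_{\bar t}^{T} \frac{d\bar\tau}{\sqrt{\bar\tau - \bar t}} \int_{-\infty}^{+\infty} \bar g(\bar\tau,\bar\xi)\, \exp\left( -\frac{(\bar\xi - \bar x)^2}{4(\bar\tau - \bar t)} \right) d\bar\xi.
\]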
tangens hyperbolicus.
f The original contains the misprint “g(t, x) is continuously differentiable” which has been
corrected.
\[
\tau - t = \alpha, \qquad \varphi(\tau,\xi) - \varphi(t,x) = \sqrt{2\alpha}\,\beta, \tag{44}
\]
one obtains a uniformly convergent integral. Therefore, Gx exists and one has
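\[
\begin{aligned}
G_x(t,x) &= \int_{t}^{T} d\tau \int_{-\infty}^{+\infty} g(\tau,\xi)\, \frac{\partial U_0(t,x;\tau,\xi)}{\partial x}\, d\xi \\
&= -\frac{1}{\sqrt{a(t,x)}} \int_{t}^{T} d\tau \int_{-\infty}^{+\infty} g(\tau,\xi)\, \frac{\partial}{\partial\xi}\Bigl( \sqrt{a(\tau,\xi)}\, U_0(t,x;\tau,\xi) \Bigr)\, d\xi \\
&= \frac{1}{\sqrt{a(t,x)}} \int_{t}^{T} d\tau \int_{-\infty}^{+\infty} \sqrt{a(\tau,\xi)}\, g_\xi(\tau,\xi)\, U_0(t,x;\tau,\xi)\, d\xi.
\end{aligned}
\]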
The substitution (44) shows that the last integral may again be formally
differentiated in x. – Using this substitution directly in (42), t appears
under the integral only in g(·, ·) and we may formally differentiate in t. In this
way we obtain Gt as a sum of a line integral and an area integral; obviously, the
latter converges, as well as Gxx, to 0 as T → t+. In order to see M (G) = 0, it is
enough to split the interval (t, T ) into two parts (t, t′) + (t′, T ): In the second
interval the integrand appearing in (42) is regular and solves M (u) = 0, the
first interval is dealt with as stated above. For t′ → t we get M (G) = −g.
3. Let us return to the general equation (28) which we write in the form
and study the functions Un (t, x; τ, ξ) first as functions of t, x for arbitrary but
fixed τ, ξ.
First we consider U1 (t, x; τ, ξ). If we remove from the domain of integration
in (47) any neighbourhood of the point (τ, ξ), then U0 (p, q; τ, ξ) and
∂U0 (p, q; τ, ξ)/∂q are bounded on the remaining domain; because of (40), the
integral from (47), restricted to the remaining domain, converges for n = 0
absolutely and uniformly in t, x. Thus, we have only to study convergence lo-
rected.
where B(p, q) denotes Euler’s integral of the first kind, cf. [14, p. 253]. In the
same way we get (a fortiori) the boundedness of the other term in (47) for
n = 0, and so we have uniformly in t, x
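\[
\frac{\partial U_1(t,x;\tau,\xi)}{\partial x} = \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} \left[ \lambda(p,q)\, \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} + c(p,q)\, U_0(p,q;\tau,\xi) \right] \frac{\partial U_0(t,x;p,q)}{\partial x}\, dq. \tag{50}
\]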
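In order to see that the integral on the right-hand side converges, we split the
interval (t, τ ) by, say, t′ = ½(t + τ ) into two parts. Observing that $z e^{-\frac{1}{4}z^2} < 1$
we obtain in the second part using the substitution (48):
\[
\begin{aligned}
\int_{\frac{1}{2}(t+\tau)}^{\tau} dp \int_{-\infty}^{+\infty} \left| \lambda(p,q)\, \frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\, \frac{\partial U_0(t,x;p,q)}{\partial x} \right| dq
&< \frac{K^4}{8\pi} \int_{0}^{\frac{1}{2}(\tau-t)} d\alpha \int_{-\infty}^{+\infty} \frac{|\beta|}{\sqrt{\alpha(\tau-t-\alpha)}}\, e^{-\frac{1}{2}\beta^2}\, d\beta \\
&= \frac{K^4}{4\pi} \int_{0}^{\frac{1}{2}(\tau-t)} \frac{d\alpha}{\sqrt{\alpha(\tau-t-\alpha)}} < \frac{K^4}{\sqrt{\tau-t}}.
\end{aligned}
\]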
¶ 131 ¶ For our purposes it is more convenient to combine the inequalities (49) and
(52); if we restrict our considerations to an arbitrary but finite t-interval, then
we can write (49) also in the form
\[
|U_1(t,x;\tau,\xi)| < \frac{M}{\sqrt{\tau-t}}. \tag{53}
\]
For the estimate of Un (t, x; τ, ξ) with n > 1 we use induction. Assume that
for some n ≥ 1 we have shown that
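\[
|U_n(t,x;\tau,\xi)| < \frac{N^{2n}}{\Gamma\left(\frac{n}{2}\right)} \sqrt{\tau-t}^{\,n-2}, \qquad
\left| \frac{\partial U_n(t,x;\tau,\xi)}{\partial x} \right| < \frac{N^{2n}}{\Gamma\left(\frac{n}{2}\right)} \sqrt{\tau-t}^{\,n-2}; \tag{54}
\]
here the constant N > 1 is already chosen so large that in the t-interval under
consideration
\[
\frac{1}{\sqrt{a}} < N, \qquad |\lambda| + |c| < N, \qquad 2\sqrt{\tau-t} < N
\]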
holds. Then the substitution (51), applied to (47), immediately proves the
claim. We obtain, for example,
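\[
\begin{aligned}
\left| \frac{\partial U_{n+1}(t,x;\tau,\xi)}{\partial x} \right|
&< \frac{N^{2n+1}}{\Gamma\left(\frac{n}{2}\right)} \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} \left| \frac{\partial U_0(t,x;p,q)}{\partial x} \right| \sqrt{\tau-p}^{\,n-2}\, dq \\
&< \frac{N^{2n+2}}{\Gamma\left(\frac{n}{2}\right) \cdot 2\sqrt{\pi}} \int_{0}^{\tau-t} d\alpha\, \sqrt{\tau-t-\alpha}^{\,n-2} \int_{-\infty}^{+\infty} \frac{|\beta|}{\sqrt{\alpha}}\, e^{-\frac{1}{2}\beta^2}\, d\beta \\
&= \frac{N^{2n+2}}{\Gamma\left(\frac{n}{2}\right) \sqrt{\pi}} \int_{0}^{\tau-t} \frac{\sqrt{\tau-t-\alpha}^{\,n-2}}{\sqrt{\alpha}}\, d\alpha \\
&= \frac{N^{2n+2}}{\Gamma\left(\frac{n}{2}\right) \sqrt{\pi}}\, \sqrt{\tau-t}^{\,n-1} \int_{0}^{1} \frac{\sqrt{1-s}^{\,n-2}}{\sqrt{s}}\, ds \\
&= \frac{N^{2n+2}}{\Gamma\left(\frac{n}{2}\right) \sqrt{\pi}}\, B\left(\tfrac{1}{2}, \tfrac{n}{2}\right) \sqrt{\tau-t}^{\,n-1}
= \frac{N^{2n+2}}{\Gamma\left(\frac{n+1}{2}\right)}\, \sqrt{\tau-t}^{\,n-1}.
\end{aligned}
\]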
Because of (53) we see that (54) indeed holds for every n > 0.
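The last two steps of the induction rest on the Beta integral $\int_0^1 (1-s)^{(n-2)/2} s^{-1/2}\, ds = B(\tfrac12, \tfrac n2) = \sqrt{\pi}\,\Gamma(\tfrac n2)/\Gamma(\tfrac{n+1}{2})$. The following sketch compares a quadrature value with the closed form; the function names are hypothetical, and the substitution s = sin²θ is a device of this sketch (not of the paper) to avoid the endpoint singularities.

```python
import math

def beta_half_numeric(n, steps=20000):
    """Approximate B(1/2, n/2) = int_0^1 (1-s)^((n-2)/2) s^(-1/2) ds.
    With s = sin(theta)^2 the integral becomes
    2 * int_0^{pi/2} cos(theta)^(n-1) d theta, which is smooth for n >= 1."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h          # midpoint rule
        total += math.cos(theta) ** (n - 1)
    return 2.0 * total * h

def beta_half_closed(n):
    """Closed form sqrt(pi) * Gamma(n/2) / Gamma((n+1)/2)."""
    return math.sqrt(math.pi) * math.gamma(n / 2.0) / math.gamma((n + 1) / 2.0)

for n in range(1, 7):
    print(n, beta_half_numeric(n), beta_half_closed(n))
```

Multiplying the closed form by $N^{2n+2}/(\Gamma(\tfrac n2)\sqrt{\pi})$ gives exactly the factor $N^{2n+2}/\Gamma(\tfrac{n+1}{2})$ that appears at the end of the estimate.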
¶ 132 ¶ by (54) this series converges absolutely and uniformly and may be
differentiated term-by-term. Therefore, we have by (47)
\[
V(t,x;\tau,\xi) = \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} \left[ \lambda(p,q)\, \frac{\partial U(p,q;\tau,\xi)}{\partial q} + c(p,q)\, U(p,q;\tau,\xi) \right] U_0(t,x;p,q)\, dq. \tag{56}
\]
hold.
An application of the substitution (39) this time, however, for fixed t, τ, ξ,
immediately gives
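\[
\int_{-\infty}^{+\infty} U_0(t,x;\tau,\xi)\, dx = \frac{1}{\sqrt{2\pi}}\, \frac{1}{\sqrt{a(\tau,\xi)}} \int_{-\infty}^{+\infty} \sqrt{\bar a(y;t,\tau,\xi)}\, e^{-\frac{1}{2}y^2}\, dy, \tag{59}
\]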
where ā(y; t, τ, ξ) denotes the function resulting from a(t, x) under the sub-
stitution. As t → τ , we have in every finite y-interval that x → ξ, hence
ā(y; t, τ, ξ) → a(τ, ξ), and so we find from (59)
\[
\lim_{t\to\tau} \int_{-\infty}^{+\infty} U_0(t,x;\tau,\xi)\, dx = 1; \tag{60}
\]
¶ 133 ¶ obtains. – Finally we may integrate in (56) with respect to x under the
integral sign; since the integral converges we get
\[
\lim_{t\to\tau} \int_{-\infty}^{+\infty} V(t,x;\tau,\xi)\, dx = 0. \tag{62}
\]
The relations (60)–(62) together with (55) yield the claim (58).
In conclusion we thus have:
If the coefficients of the differential equation (28) satisfy in the range (37)
the condition A of pp. 128 f., then there exists a fundamental solution
of (28) with bounded V (t, x; τ, ξ) satisfying the relations (58). The quantities
U0 and V are defined by (38), (47) and (55), and the series appearing in (55)
converges absolutely and uniformly in t, x, and may be differentiated term-by-
term in x.
In exactly the same fashion we can, of course, construct a fundamental
solution U ∗ (t, x; τ ∗ , ξ ∗ ) with corresponding properties for the equationʰ
(cf. (31), (32)). It is still assumed that the coefficients of L(u) satisfy the con-
dition A of pp. 128 f.; obviously, the coefficients of L∗ (u) satisfy the same
condition, too. Therefore, there exists for τ ∗ < t a fundamental solution
U ∗ (t, x; τ ∗ , ξ ∗ ) of (63) in the sense of the theorem stated in no. 4. We want to
prove the fundamental symmetry property
(one verifies (65) integrating by parts vL(u) two times). ¶ Assume that ¶ 134
together, (67) and (68) yield (64). Inserting this into (66) one obtains the
fundamental relation (cf. (9))
\[
\int_{-\infty}^{+\infty} U(\tau^*,\xi^*;t,x)\, U(t,x;\tau,\xi)\, dx = U(\tau^*,\xi^*;\tau,\xi) \tag{69}
\]
and from this, using (58) (of course, applied to U ∗ (t , x; τ ∗ , ξ ∗ )) we get as
t → τ , t → τ ∗
\[
u(\tau^*,\xi^*) = \int_{-\infty}^{+\infty} g(x)\, U(\tau^*,\xi^*;\tau,x)\, dx. \tag{70}
\]
¶ 135 ¶ Our solution necessarily is of the form (70) and, conversely, (70) is a solution
of the problem. Therefore, (70) represents the only bounded solution u(t, x) of
L(u) = 0 which is defined for t < τ and which converges as t → τ to the continu-
ous initial datum g(x). – It is almost evident that the assumed boundedness of
the solution can easily be replaced by more general conditions (cf. no. 3). The
argument also remains valid if g(x) is discontinuous, but of bounded variation;
then, the initial values are only attained if the approximation takes place at
continuity points. A corresponding theorem holds, of course, for the equation
L∗ (u) = 0.
Finally we note for later use (in § 5): If f (t, x) is continuously differentiable
and bounded, then
\[
u(t,x) = \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} f(p,q)\, U(t,x;p,q)\, dq \tag{71}
\]
is for t < τ the only bounded solution of L(u) = −f (t, x) which tends to zero
as t → τ . The fact that (71) is indeed a solution to L(u) = −f follows almost
literally from the arguments in the proof of no. 2, p. 128. That it is the only
solution follows since, otherwise, there would exist a not identically vanishing
solution of L(u) = 0 which tends to zero as t → τ .
Per se, it would be enough to consider in this paragraph the differential equation
(28) with c(t, x) = 0; it does, however, not add any complication if we
include the case c ≠ 0, and doing so, the interrelations become clearer. – Therefore,
we continue to consider the general equation (28) and assume that the
coefficients satisfy condition A of § 2, pp. 128 f.
and
\[
\lim_{t\to\tau} \frac{1}{\tau-t} \int_{|\xi-x|<\delta} (\xi-x)^2\, U_0(t,x;\tau,\xi)\, d\xi = 2a(\tau,x) \tag{73}
\]
¶ 136 ¶ holds, uniformly in every finite domain (observe that, in contrast to (60)–(61)
we integrate with respect to ξ!). – For the proof we introduce – for fixed
t, x, τ – a new integration variable y by
\[
y = \frac{\varphi(\tau,\xi) - \varphi(t,x)}{\sqrt{2(\tau-t)}}. \tag{74}
\]
Since ϕ(τ, ξ) has a positive continuous derivative ϕξ (τ, ξ) (cf. (29)), there is a
number M > 0 such that for |ξ − x| > δ and sufficiently small values of τ − t
\[
|y| > \frac{M}{\sqrt{\tau-t}},
\]
and M has in every bounded region of the x, t-plane a positive lower bound.
Therefore,
\[
\int_{|\xi-x|>\delta} U_0(t,x;\tau,\xi)\, d\xi \le \frac{1}{\sqrt{2\pi}} \int_{|y|>\frac{M}{\sqrt{\tau-t}}} e^{-\frac{1}{2}y^2}\, dy \le \frac{\tau-t}{M^2\sqrt{2\pi}} \int_{|y|>\frac{M}{\sqrt{\tau-t}}} y^2 e^{-\frac{1}{2}y^2}\, dy,
\]
and this yields (72). – For the proof of (73) we re-write (74) as
\[
\begin{aligned}
y &= \frac{\varphi(\tau,\xi) - \varphi(\tau,x)}{\sqrt{2(\tau-t)}} + \frac{\varphi(\tau,x) - \varphi(t,x)}{\sqrt{2(\tau-t)}} \\
&= (\xi-x)\, \frac{1}{\sqrt{a(\tau,x')}}\, \frac{1}{\sqrt{2(\tau-t)}} + \eta\sqrt{2(\tau-t)},
\end{aligned}
\]
where η is bounded and x′ denotes a point satisfying x − δ < x′ < x + δ. This
and the above estimate for the integration limits yield
\[
\int_{|\xi-x|<\delta} (\xi-x)^2\, U_0(t,x;\tau,\xi)\, d\xi = \frac{\tau-t}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} 2a(\tau,x')\, y^2 e^{-\frac{1}{2}y^2}\, dy + o(\tau-t); \tag{75}
\]
denote by m and M the minimum and maximum, respectively, of a(τ, x′)
in the interval x − δ < x′ < x + δ; because of
$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} y^2 e^{-\frac{1}{2}y^2}\, dy = 1$ all
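The limit relation (73) can be illustrated numerically. The sketch below is built on assumptions that are not in the text: a time-independent coefficient with $1/\sqrt{a(x)} = 1 + \tfrac12\cos x$ (so that ϕ from (29) is available in closed form), hypothetical helper names, and a midpoint rule over |ξ − x| < δ. It evaluates U0 from (38) and compares the normalised second moment with 2a(x) for a small value of τ − t.

```python
import math

def inv_sqrt_a(x):
    # Assumed example coefficient: 1/sqrt(a(x)) = 1 + 0.5*cos(x), smooth and positive.
    return 1.0 + 0.5 * math.cos(x)

def a(x):
    return 1.0 / inv_sqrt_a(x) ** 2

def phi(x):
    # phi(x) = integral from 0 to x of dy / sqrt(a(y)), cf. (29).
    return x + 0.5 * math.sin(x)

def U0(x, xi, elapsed):
    # Formula (38) for time-independent a; `elapsed` stands for tau - t.
    d = phi(xi) - phi(x)
    return math.exp(-d * d / (4.0 * elapsed)) / (2.0 * math.sqrt(math.pi * a(xi) * elapsed))

def local_moments(x, elapsed, delta=0.5, steps=40000):
    # Midpoint rule over |xi - x| < delta.
    # Returns (mass inside the interval, second moment divided by elapsed).
    h = 2.0 * delta / steps
    mass = 0.0
    second = 0.0
    for i in range(steps):
        xi = x - delta + (i + 0.5) * h
        w = U0(x, xi, elapsed) * h
        mass += w
        second += (xi - x) ** 2 * w
    return mass, second / elapsed

mass, ratio = local_moments(0.3, 1e-3)
print(mass, ratio, 2.0 * a(0.3))
```

For this choice the mass inside the δ-interval is numerically indistinguishable from 1, in line with (72), and the normalised second moment is close to 2a(x); the remaining gap shrinks as τ − t decreases.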
where
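\[
\begin{aligned}
U^{1}(t,x;\tau,\xi) &= \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} \lambda(p,q)\, \frac{\partial U_0(p,q;\tau,\xi)}{\partial q}\, U_0(t,x;p,q)\, dq, \\
U^{2}(t,x;\tau,\xi) &= \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} c(p,q)\, U_0(p,q;\tau,\xi)\, U_0(t,x;p,q)\, dq.
\end{aligned}
\tag{80}
\]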
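Integrating (80) under the integral sign and enlarging the domain of integration
given by δ > 0 yields:
\[
\begin{aligned}
\frac{1}{\tau-t} \int_{|\xi-x|>\delta} |U^{1}(t,x;\tau,\xi)|\, d\xi
&< \frac{K}{\tau-t} \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\, dq \int_{|\xi-x|>\delta} \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi \\
&< \frac{K}{\tau-t} \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\, dq \int_{|\xi-q|>\frac{1}{2}\delta} \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi \\
&\quad + \frac{K}{\tau-t} \int_{t}^{\tau} dp \int_{|q-x|>\frac{1}{2}\delta} U_0(t,x;p,q)\, dq \int_{-\infty}^{+\infty} \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi.
\end{aligned}
\tag{81}
\]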
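(cf. (40)). Using (76), and then (72), we get for sufficiently small τ − t
\[
\begin{aligned}
\frac{1}{\tau-t} \int_{t}^{\tau} dp \int_{|q-x|>\frac{1}{2}\delta} U_0(t,x;p,q)\, dq \int_{-\infty}^{+\infty} \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi
&< K \int_{t}^{\tau} \frac{dp}{\sqrt{\tau-p}}\, \frac{1}{p-t} \int_{|q-x|>\frac{1}{2}\delta} U_0(t,x;p,q)\, dq \\
&< K\sqrt{\tau-t};
\end{aligned}
\tag{83}
\]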
It is clear that the same argument (even simpler and a fortiori) yields the
corresponding relation for U² (t, x; τ, ξ), too. Combining everything, one has
\[
\lim_{t\to\tau} \frac{1}{\tau-t} \int_{|\xi-x|>\delta} |U_1(t,x;\tau,\xi)|\, d\xi = 0. \tag{84}
\]
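\[
\begin{aligned}
\frac{1}{\tau-t} \int_{|\xi-x|<\delta} (\xi-x)^2\, |U^{1}(t,x;\tau,\xi)|\, d\xi
&< \frac{K}{\tau-t} \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\, dq \int_{|\xi-x|<\delta} (\xi-x)^2 \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi \\
&< \frac{K\delta^2}{\tau-t} \int_{t}^{\tau} dp \int_{|q-x|>\delta} U_0(t,x;p,q)\, dq \int_{-\infty}^{+\infty} \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi \\
&\quad + \frac{K}{\tau-t} \int_{t}^{\tau} dp \int_{|q-x|<\delta} U_0(t,x;p,q)\, dq \int_{|\xi-q|<2\delta} (\xi-x)^2 \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi.
\end{aligned}
\tag{85}
\]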
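Since (ξ − x)² ≤ 2(ξ − q)² + 2(q − x)² we get, using (78), (76), (73) and (40),
for sufficiently small τ − t:
\[
\begin{aligned}
&\frac{1}{\tau-t} \int_{t}^{\tau} dp \int_{|q-x|<\delta} U_0(t,x;p,q)\, dq \int_{|\xi-q|<2\delta} (\xi-x)^2 \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi \\
&\qquad < \frac{2}{\tau-t} \int_{t}^{\tau} dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\, dq \int_{|\xi-q|<2\delta} (\xi-q)^2 \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi \\
&\qquad\quad + \frac{2}{\tau-t} \int_{t}^{\tau} dp \int_{|q-x|<\delta} (q-x)^2\, U_0(t,x;p,q)\, dq \int_{-\infty}^{+\infty} \left| \frac{\partial U_0(p,q;\tau,\xi)}{\partial q} \right| d\xi \\
&\qquad < \frac{8K}{\tau-t} \int_{t}^{\tau} \sqrt{\tau-p}\, dp \int_{-\infty}^{+\infty} a(\tau,q)\, U_0(t,x;p,q)\, dq \\
&\qquad\quad + 2K \int_{t}^{\tau} \frac{dp}{\sqrt{\tau-p}} \cdot \frac{1}{p-t} \int_{|q-x|<\delta} (q-x)^2\, U_0(t,x;p,q)\, dq \\
&\qquad < 15K^2 \sqrt{\tau-t}.
\end{aligned}
\tag{87}
\]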
(85)–(87) yield:
\[
\lim_{t\to\tau} \frac{1}{\tau-t} \int_{|\xi-x|<\delta} (\xi-x)^2\, |U^{1}(t,x;\tau,\xi)|\, d\xi = 0.
\]
The same estimates yield again, a fortiori, the corresponding relation for
U² (t, x; τ, ξ), and so
\[
\lim_{t\to\tau} \frac{1}{\tau-t} \int_{|\xi-x|<\delta} (\xi-x)^2\, |U_1(t,x;\tau,\xi)|\, d\xi = 0. \tag{88}
\]
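3. The remaining estimates which we need are now easily derived. Assume
that for some n ≥ 0 and some constant M > 0 we have proved that
\[
\begin{aligned}
\int_{-\infty}^{+\infty} |U_n(t,x;\tau,\xi)|\, d\xi &< M\, \frac{K^{2n-2} (\tau-t)^{\frac{n-1}{2}}}{\Gamma\left(\frac{n+1}{2}\right)}, \\
\int_{-\infty}^{+\infty} \left| \frac{\partial U_n(t,x;\tau,\xi)}{\partial x} \right| d\xi &< M\, \frac{K^{2n-2} (\tau-t)^{\frac{n-1}{2}}}{\Gamma\left(\frac{n+1}{2}\right)}.
\end{aligned}
\tag{89}
\]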
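\[
\begin{aligned}
\int_{-\infty}^{+\infty} |U_{n+1}(t,x;\tau,\xi)|\, d\xi
&< \frac{M K^{2n-1}}{\Gamma\left(\frac{n+1}{2}\right)} \int_{t}^{\tau} (\tau-p)^{\frac{n-1}{2}}\, dp \int_{-\infty}^{+\infty} U_0(t,x;p,q)\, dq \\
&= \frac{M K^{2n-1}}{\Gamma\left(\frac{n+1}{2}\right)} \cdot \frac{2}{n+1}\, (\tau-t)^{\frac{n+1}{2}}
= \frac{M K^{2n-1}}{\Gamma\left(\frac{n+3}{2}\right)}\, (\tau-t)^{\frac{n+1}{2}}
< \frac{M K^{2n}}{\Gamma\left(\frac{n+2}{2}\right)}\, (\tau-t)^{\frac{n}{2}},
\end{aligned}
\tag{90}
\]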
and
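\[
\begin{aligned}
\int_{-\infty}^{+\infty} \left| \frac{\partial U_{n+1}(t,x;\tau,\xi)}{\partial x} \right| d\xi
&< \frac{M K^{2n-1}}{\Gamma\left(\frac{n+1}{2}\right)} \int_{t}^{\tau} (\tau-p)^{\frac{n-1}{2}}\, dp \int_{-\infty}^{+\infty} \left| \frac{\partial U_0(t,x;p,q)}{\partial x} \right| dq \\
&< \frac{M K^{2n}}{\sqrt{\pi}\, \Gamma\left(\frac{n+1}{2}\right)} \int_{t}^{\tau} \frac{(\tau-p)^{\frac{n-1}{2}}}{\sqrt{p-t}}\, dp \\
&= \frac{M K^{2n}}{\sqrt{\pi}\, \Gamma\left(\frac{n+1}{2}\right)}\, (\tau-t)^{\frac{n}{2}} \int_{0}^{1} \frac{(1-s)^{\frac{n-1}{2}}}{\sqrt{s}}\, ds \\
&= \frac{M K^{2n}}{\sqrt{\pi}\, \Gamma\left(\frac{n+1}{2}\right)}\, (\tau-t)^{\frac{n}{2}}\, B\left(\tfrac{1}{2}, \tfrac{n+1}{2}\right)
= \frac{M K^{2n} (\tau-t)^{\frac{n}{2}}}{\Gamma\left(\frac{n+2}{2}\right)}.
\end{aligned}
\tag{91}
\]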
Because of (40) and (76) the inequalities (89) hold for all n ≥ 0. Moreover,
the penultimate of the inequalities in (90) also yields for all n
\[
\int_{-\infty}^{+\infty} |U_n(t,x;\tau,\xi)|\, d\xi < \frac{M K^{2n-3}}{\Gamma\left(\frac{n+2}{2}\right)}\, (\tau-t)^{\frac{n}{2}}. \tag{92}
\]
holds. – A separate investigation is only needed for the case where n = 2. But
the estimates leading to (84) yield in exactly the same way, and even more easily,
the relation
\[
\lim_{t\to\tau} \frac{1}{\tau-t} \int_{|\xi-x|>\delta} |U_2(t,x;\tau,\xi)|\, d\xi = 0. \tag{94}
\]
When combined, (72), (84), (94) and (93) give according to (55):
\[
\lim_{t\to\tau} \frac{1}{\tau-t} \int_{|\xi-x|>\delta} U(t,x;\tau,\xi)\, d\xi = 0; \tag{97}
\]
if we set
\[
F(t,x;\tau,\xi) = \int_{-\infty}^{\xi} U(t,x;\tau,y)\, dy, \tag{100}
\]
then we get from (99) and (97) at once that F (t, x; τ, ξ) tends for t → τ and
x ≠ ξ to E(x, ξ) (for the definition cf. (4)), i.e. that (5) holds. The corresponding
relation (6) is equivalent to the two equations (58) if we apply them to
U (τ ∗ , ξ ∗ ; t, x) which we understand as a function of t, x and as the fundamental
solution to the adjoint equation (63). Thus we can collect our results up to
this point in the following way:
The fundamental solution, constructed in § 2, to the general linear parabolic
differential equation (28) yields by (100) a function F (t, x; τ, ξ) which is in the
variable ξ of bounded variation and satisfies the necessary conditions (5), (6),
(7), (11) and (12) for continuous stochastic processes.
In general, F (t, x; τ, ξ) does not define a stochastic process since it need
¶ 142 not be a distribution function in ξ. Moreover, ¶ we have not yet established
that the necessary condition (13) for continuous stochastic processes holds. In
order to derive those two properties, we have to consider the particular type of
the differential equation (15) for continuous stochastic processes, i.e. we have
to set c(t, x) = 0 in the general equation (28).
We use this lemma to prove the following assertion which includes also the
case c < 0 for later reference (§ 5):
Assume that c ≤ 0 in the differential equation (28). Then its fundamental
solution U (t, x; τ, ξ) is non-negative.
Assume there were a point (t′ < τ, x′) such that U (t′, x′; τ, ξ) < 0, then
there would exist an interval |ξ − y| < ε where U (t′, x′; τ, y) < 0. Let g(y) be
some continuous function which is nonnegative in, and vanishing outside, that
interval (g(ξ) ≠ 0). We set
\[
u(t,x) = \int_{-\infty}^{+\infty} g(y)\, U(t,x;\tau,y)\, dy,
\]
so that u(t′, x′) < 0 and u is a solution to (101). For t′ < t < τ and x → ±∞
the function u(t, x) obviously tends to zero uniformly and, as t → τ, to the
nonnegative function g(y) (cf. § 2, 6). This clearly contradicts (102).
¶ We have, in particular, for the fundamental solution of (101) ¶ 143
(103) U (t, x; τ, ξ) ≥ 0,
hood of (t′, x′) we have c < 0. Then (102) holds even with a strict inequality sign; otherwise
we would clearly have at the point (t′, x′)
ux = 0, uxx ≥ 0, ut ≥ 0, cu > 0,
which contradicts (28). If c ≤ 0, then it is sufficient to set $z(t,x) = u\,e^{k(t-t')}$; then z satisfies
an equation of the form (28) where c is replaced by c − k; thus, z satisfies an inequality of
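\[
\begin{aligned}
\frac{U(t-\Delta t,x;\tau,\xi) - U(t,x;\tau,\xi)}{\Delta t}
&= U_x(t,x;\tau,\xi)\, \frac{1}{\Delta t} \int_{|x-y|<\delta} (y-x)\, U(t-\Delta t,x;t,y)\, dy \\
&\quad + \frac{1}{2}\, U_{xx}(t,x;\tau,\xi)\, \frac{1}{\Delta t} \int_{|x-y|<\delta} \bigl\{(y-x)^2 + o(\delta)\bigr\}\, U(t-\Delta t,x;t,y)\, dy \\
&\quad + \frac{1}{\Delta t} \int_{|x-y|>\delta} \bigl\{U(t,y;\tau,\xi) - U(t,x;\tau,\xi)\bigr\}\, U(t-\Delta t,x;t,y)\, dy.
\end{aligned}
\]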
The expression in the curly braces of the last integral is for every t < τ bounded
and, by (97), the integral tends to zero. The accumulation points of the second
term on the right-hand side are, because of (97), independent of δ; thus this
term converges to a(t, x)Uxx (t, x; τ, ξ) by (98); the left-hand side converges to
−Ut (t, x; τ, ξ). Therefore, from (101) it follows that also
\[
\lim_{\Delta t\to 0} \frac{1}{\Delta t} \int_{|y-x|<\delta} (y-x)\, U(t-\Delta t,x;t,y)\, dy = b(t,x). \tag{106}
\]
Thus, we have:
The fundamental solution, constructed in § 2, of the differential equation
(101) yields by (100) the transition probabilities F (t, x; τ, ξ) of a continuous
stochastic process satisfying the relations (11), (12), (13).
Alternatively: Assume that the coefficients a(t, x) and b(t, x) satisfy con-
¶ 144 dition A of pp. 128 f.; then the equation (15) with the ¶ initial condition (5)
uniquely defines a stochastic process satisfying (11)–(13). Without further as-
sumptions there is a continuously differentiable frequency function U (t, x; τ, ξ)
which satisfies the adjoint equation (17). Together with the initial condition
(6), the equation (17) is also a unique characterization of the process.
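The three limit relations behind this theorem, namely (97) for the mass outside |y − x| < δ, (106) for the truncated first moment and the second-moment relation of type (73), can be observed numerically in the simplest situation. The sketch below assumes constant coefficients a and b, for which the frequency function is the Gaussian density with mean x + b(τ − t) and variance 2a(τ − t); this special case, the function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def kernel(x, y, elapsed, a, b):
    """Assumed constant-coefficient case: Gaussian density in y with
    mean x + b*elapsed and variance 2*a*elapsed."""
    var = 2.0 * a * elapsed
    return math.exp(-(y - x - b * elapsed) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def truncated_moments(elapsed, a=0.5, b=0.8, x=0.0, delta=0.5, steps=40000):
    """Midpoint rule over |y - x| < delta. Returns, each divided by `elapsed`:
    (mass outside the interval, truncated first moment, truncated second moment)."""
    h = 2.0 * delta / steps
    m0 = 0.0
    m1 = 0.0
    m2 = 0.0
    for i in range(steps):
        y = x - delta + (i + 0.5) * h
        w = kernel(x, y, elapsed, a, b) * h
        m0 += w
        m1 += (y - x) * w
        m2 += (y - x) ** 2 * w
    return (1.0 - m0) / elapsed, m1 / elapsed, m2 / elapsed

for dt in (1e-2, 1e-3):
    print(dt, truncated_moments(dt))
```

With a = 0.5 and b = 0.8 the printed triples should approach (0, b, 2a) = (0, 0.8, 1.0) as the time step shrinks; the second moment differs from 2a by a term of order b² times the step.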
the form (102) for any k > 0. Letting k → 0, the claim (102) follows.
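\[
\begin{aligned}
\frac{\partial u(\tau,\xi)}{\partial\tau} &= u(\tau,\xi) \oplus \bigl\{-p(\tau,x)E(x,\xi) + p(\tau,x)P(\tau,x,\xi)\bigr\} \\
&= -\int_{-\infty}^{\xi} p(\tau,y)\, du(\tau,y) + \int_{-\infty}^{+\infty} p(\tau,y) P(\tau,y,\xi)\, du(\tau,y);
\end{aligned}
\tag{108}
\]
¶ 145 ¶ the integrals on the right-hand side are well-defined for every function u(τ, ξ)
which is of bounded variation in ξ, cf. Lebesgue [11, p. 261].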
The point is to find a solution u(τ, ξ) = F (t, x; τ, ξ) of (108)ⁱ which is defined
for
and satisfies the initial condition (6). Once such a solution has been con-
structed, it remains to check whether it enjoys the other remaining conditions
of a stochastic process.
In the following we always assume that the variables satisfy the inequality
(109); by a “finite interval” we always mean |t| < T such that we can apply
(107).
Now let u(τ, ξ) = Π(τ, ξ) − N(τ, ξ) be the decomposition of u(τ, ξ) into two
components which are non-decreasing as ξ is increasing, V (τ, ξ) = Π(τ, ξ) +
N(τ, ξ) and v(τ ) = lim V (τ, ξ) as ξ → ∞. Then we get easily from (110) that
\[
\Pi(\tau,\xi) \le \int_{t}^{\tau} \bigl\{ N(s,\xi) \oplus p(s,x)E(x,\xi) + \Pi(s,\xi) \oplus p(s,x)P(s,x,\xi) \bigr\}\, ds,
\]
Starting from the assumption v(τ ) < M , it follows for every natural number n
by induction
\[
V(\tau,\xi) \le 2K(T)\,M\, \frac{(\tau-t)^n}{n!},
\]
so v(τ ) ≡ 0, q.e.d.
¶ 146 ¶ The solution to our problem is most easily found by successive approxi-
mations; we set
\[
F_0(t,x;\tau,\xi) = E(x,\xi), \tag{111}
\]
\[
F_{n+1}(t,x;\tau,\xi) = \int_{t}^{\tau} F_n(t,x;s,\xi) \oplus \bigl\{-p(s,x)E(x,\xi) + p(s,x)P(s,x,\xi)\bigr\}\, ds. \tag{112}
\]
represents the desired solution. For the proof of uniform convergence of (113)
we need a new representation for Fn (t, x; τ, ξ) which will also be much more
manageable.
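The mechanism of the successive approximations (111)–(112) can be seen in a finite-state, time-homogeneous analogue. This is only a sketch under assumptions that go beyond the text: the state space is {0, 1, 2}, the intensities p and jump distributions P do not depend on time, and transition probabilities are stored as a matrix instead of a distribution function. In that setting the step (112) reduces to multiplying the previous term by the matrix Q with entries p_i(P_ij − δ_ij) and integrating over time, so that the n-th term is Qⁿ(τ − t)ⁿ/n!. The names `mat_mul` and `transition_matrix` are hypothetical.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transition_matrix(p, P, elapsed, terms=40):
    """Sum of the first `terms` successive approximations in the finite-state,
    time-homogeneous analogue of (111)-(112)."""
    n = len(p)
    # Q[i][j] = p_i * (P[i][j] - delta_ij): discrete counterpart of -p*E + p*P
    Q = [[p[i] * (P[i][j] - (1.0 if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # F_0
    total = [row[:] for row in term]
    for k in range(1, terms):
        # F_k = integral of F_{k-1} Q over time = F_{k-1} Q * elapsed / k
        term = [[v * elapsed / k for v in row] for row in mat_mul(term, Q)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

p = [1.0, 2.0, 0.5]                       # jump intensities (assumed values)
P = [[0.0, 0.5, 0.5],                     # jump distributions, rows sum to 1
     [0.3, 0.0, 0.7],
     [1.0, 0.0, 0.0]]
for row in transition_matrix(p, P, 1.0):
    print(row)
```

In this analogue the rows of the result sum to 1 and the matrices for elapsed times 0.4 and 0.6 multiply to the matrix for elapsed time 1.0, which mirrors the composition rule (7).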
the relation (112) follows from (119), i.e. (112) and (118) are equivalent.
¶ 147 ¶ (118) yields a representation of Fn (t, x; τ, ξ) as difference of two functions
which are monotonic in ξ. From (118) we obtain because of (116) in every
finite intervalᵏ
\[
|F_n(t,x;\tau,\xi)| \le \frac{\{K(T)(\tau-t)\}^n}{n!} \sum_{k=0}^{n} \binom{n}{k} = \frac{\{2K(T)(\tau-t)\}^n}{n!}, \tag{121}
\]
j The original of the last two lines of the following formula contains the misprints
−Fn (t, x; τ, ξ) ⊕ p(τ, ξ)E(x, ξ) and +Fn (t, x; τ, ξ)p(τ, x)P (τ, x, ξ) which have been corrected.
k In the original, the following calculation contains the misprint $\sum_{n=0}^{n}$.
we have
\[
F(t,x;\tau,\xi) = \sum_{k=0}^{\infty} \psi_k(t,x;\tau,\xi). \tag{124}
\]
where the convergence of the series on the right-hand side is evident. Setting,
for a moment,
\[
\sum_{n=0}^{\infty} (-1)^n A_n(t,\tau,x) = f(t,\tau,x) \tag{127}
\]
\[
\frac{\partial f(t,\tau,x)}{\partial\tau} = -p(\tau,x)\, f(t,\tau,x)
\]
or, since f (t, τ, x) → 1 as τ → t,
\[
f(t,\tau,x) = e^{-\int_{t}^{\tau} p(s,x)\, ds}.
\]
\[
\frac{\partial\psi_k(t,x;\tau,\xi)}{\partial\tau} = -\psi_k(t,x;\tau,\xi) \oplus p(\tau,x)E(x,\xi) + \psi_{k-1}(t,x;\tau,\xi) \oplus p(\tau,x)P(\tau,x,\xi). \tag{129}
\]
Here, t, x are two parameters which only influence the initial values. Indeed,
we have by (123) and (115) for k > 0
The integrand in (112) tends to zero as ξ → ±∞, and since it is, by (121),
uniformly bounded, also the integral tends to 0; this proves (136).
Collecting the results we have:
Under the assumptions of p. 144 on p(t, x) and P (t, x, ξ) there exists exactly
one solution F (t, x; τ, ξ) of (108) or (21) which satisfies the condition (6); it is
for all fixed t, x, τ a distribution function in ξ and it enjoys the representations
a) (111)–(113) or (118), respectively;
b) (123)–(124) together with (114)–(115);
c) (123)–(124) together with (134)–(135).
\[
\psi_k(t,x;\tau,\xi) = e^{-\lambda(\tau-t)}\, \frac{\{\lambda(\tau-t)\}^k}{k!}\, G_k(\xi-x).
\]
Thus we have because of (124)
\[
F(t,x;\tau,\xi) = e^{-\lambda(\tau-t)} \sum_{k=0}^{\infty} \frac{\{\lambda(\tau-t)\}^k}{k!}\, G_k(\xi-x),
\]
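The series above lends itself to direct evaluation. The following sketch assumes that G_k denotes the k-fold convolution of the jump distribution G with G_0 = E (the usual reading; the defining passage is not reproduced here) and that the jumps take values in {0, 1, 2, ...}, so that convolutions can be computed on lists. The function names and the truncation after 60 terms are choices of this sketch.

```python
import math

def convolve(p, q):
    """Convolution of two probability mass functions on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def compound_poisson_cdf(lam, elapsed, jump_pmf, z, terms=60):
    """Truncated series  sum_k e^{-lam*s} (lam*s)^k / k! * G_k(z)  with s = elapsed,
    where G_k is the distribution function of the sum of k independent jumps
    and z plays the role of xi - x."""
    weight = math.exp(-lam * elapsed)   # Poisson weight for k = 0
    pmf_k = [1.0]                       # k = 0: unit mass at 0, i.e. G_0 = E
    total = 0.0
    for k in range(terms):
        g_k = sum(m for size, m in enumerate(pmf_k) if size <= z)
        total += weight * g_k
        weight *= lam * elapsed / (k + 1)   # next Poisson weight
        pmf_k = convolve(pmf_k, jump_pmf)   # next convolution power
    return total

# Unit jumps only: the series reduces to the Poisson distribution function.
print(compound_poisson_cdf(2.0, 1.5, [0.0, 1.0], 2))
print(8.5 * math.exp(-3.0))
```

With unit jumps, intensity 2 and elapsed time 1.5 the value at z = 2 should equal e⁻³(1 + 3 + 4.5), which provides a simple consistency check of the representation.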
the initial condition (5) and which is for all t, x, τ a distribution function in
ξ. Again we obtain a unique solution of this problem with the following repre-
sentation:
\[
F^*(t,x;\tau,\xi) = \sum_{n=0}^{\infty} F_n^*(t,x;\tau,\xi) = \sum_{k=0}^{\infty} \varphi_k^*(t,x;\tau,\xi)
\]
with
\[
\frac{\partial\psi_k^*(t,x;\tau,\xi)}{\partial t} = p(t,x)\,\psi_k^*(t,x;\tau,\xi) - p(t,x)P(t,x,\xi) \oplus \psi_{k-1}^*(t,x;\tau,\xi)
\]
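which has the well-known solutionˡ
\[
\psi_k^*(t,x;\tau,\xi) = e^{-\int_{t}^{\tau} p(s,x)\, ds} \int_{t}^{\tau} p(\sigma,x)\, e^{\int_{\sigma}^{\tau} p(s,x)\, ds}\, P(\sigma,x,\xi) \oplus \psi_{k-1}^*(\sigma,x;\tau,\xi)\, d\sigma,
\]
l The misprints of the original, $\psi_k(t,x;\tau,\xi)$ and $\psi_{k-1}^*(\sigma,x,\xi)$, have been corrected.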
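\[
\begin{aligned}
L(u) &\equiv -u_s(s,y) - \int_{-\infty}^{y} p(s,z)\, du(s,z) + \int_{-\infty}^{+\infty} p(s,z) P(s,z,y)\, du(s,z), \\
L^*(u) &\equiv u_s(s,y) - p(s,y)\left[ u(s,y) - \int_{-\infty}^{+\infty} u(s,z)\, dP(s,y,z) \right],
\end{aligned}
\]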
where F and F ∗ have the same meaning as in no. 4 and no. 6. Then the
above assumptions are fulfilled and we may apply (138). The left-hand side
vanishes identically, and (138) tells us that the integral over v du for t < s < τ
is independent of s:
\[
\int_{-\infty}^{+\infty} F^*(s,y;\tau,\xi)\, dF(t,x;s,y) = \psi(t,x;\tau,\xi). \tag{139}
\]
and
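\[
F_0(t,x;\tau,\xi) = \int_{-\infty}^{\xi} U(t,x;\tau,y)\, dy, \tag{148}
\]
\[
F_{n+1}(t,x;\tau,\xi) = J^{\tau}\left[ -p(t,x) F_n(t,x;\tau,\xi) + p(t,x) \int_{-\infty}^{+\infty} F_n(t,y;\tau,\xi)\, \frac{\partial P(t,x,y)}{\partial y}\, dy \right]. \tag{149}
\]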
Here U (t, x; τ, ξ) is the fundamental solution of L(u) = 0; by induction we see
that the expression in the square brackets is differentiable in t, x which allows
us to apply the theorem stated in § 2, p. 135. – According to § 3, p. 141 (cf. the
explanation following (100)), F0 (t, x; τ, ξ) tends to E(x, ξ) as t → τ .
If n ≥ 1, then (149) shows that Fn (t, x; τ, ξ) tends to zero as t → τ , and this
implies (by the uniform convergence of (147)) that the initial condition (5) is
satisfied.
The proof of the uniqueness of the solution and of the uniform convergence
of (147) will be accomplished step-by-step using the method of § 4; we only
have to replace the integrals over (t, τ ) by the operator J τ : Lemma (145a),
(145b) provides all the necessary estimates. Again we use the more manageable
representation
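\[
F_n(t,x;\tau,\xi) = (-1)^n \sum_{k=0}^{n} (-1)^k \varphi_{k,n}(t,x;\tau,\xi)
\]
where
\[
\varphi_{k,0}(t,x;\tau,\xi) = \begin{cases} \displaystyle\int_{-\infty}^{\xi} U(t,x;\tau,y)\, dy & \text{if } k = 0, \\ 0 & \text{if } k \ne 0, \end{cases}
\]
¶ 156 ¶
\[
\varphi_{k,n+1}(t,x;\tau,\xi) = J^{\tau}\left[ p(t,x)\,\varphi_{k,n}(t,x;\tau,\xi) + p(t,x) \int_{-\infty}^{+\infty} \varphi_{k-1,n}(t,y;\tau,\xi)\, \frac{\partial P(t,x,y)}{\partial y}\, dy \right]. \tag{150}
\]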
Again we see that the functions ϕ are monotone and we obtain the analogues
of the estimates (116).
Moreover, we obtain the new representation:
\[
F(t,x;\tau,\xi) = \sum_{k=0}^{\infty} \psi_k(t,x;\tau,\xi) \tag{151}
\]
where
\[
\psi_k(t,x;\tau,\xi) = \sum_{n=0}^{\infty} (-1)^n \varphi_{k,k+n}(t,x;\tau,\xi). \tag{152}
\]
For ψ0 we get from (152) and (150) the partial differential equation
(153) L(ψ0 (t, x; τ, ξ)) − p(t, x)ψ0 (t, x; τ, ξ) = 0
with the initial condition
\[
\lim_{t\to\tau} \psi_0(t,x;\tau,\xi) = E(x,\xi); \tag{154}
\]
\[
\lim_{\xi\to\pm\infty} F_n(t,x;\tau,\xi) = 0, \qquad n \ge 1;
\]
with
\[
f_0(t,x;\tau,\xi) = U(t,x;\tau,\xi),
\]
\[
f_{n+1}(t,x;\tau,\xi) = J^{\tau}\left[ -p(t,x) f_n(t,x;\tau,\xi) + p(t,x) \int_{-\infty}^{+\infty} f_n(t,y;\tau,\xi)\, \frac{\partial P(t,x,y)}{\partial y}\, dy \right]. \tag{160}
\]
To study the convergence behaviour, it is enough to note that, due to the
boundedness 0 ≤ p(t, x) < α,
\[
|J^{\tau}[pU]| \le \alpha J^{\tau}[|U|] = \alpha \int_{t}^{\tau} ds \int_{-\infty}^{+\infty} |U(s,y;\tau,\xi)\, U(t,x;s,y)|\, dy
\]
holds, and the last expression is certainly bounded; we have, using the notation
of § 2 and the theorem on p. 133, U = U0 + V , where V is bounded (for U0
¶ 158 ¶ was transformed by the substitution (48) into a uniformly convergent integral.
Moreover, also
\[ \int_{-\infty}^{+\infty} \frac{\partial P(t,x,y)}{\partial y}\,U(t,y;\tau,\xi)\,dy \]
in exactly the same way where L∗ (u) denotes the adjoint differential expression
of L(u):
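\[ L^*(u(\tau,\xi)) \equiv -u_\tau + \frac{\partial^2}{\partial\xi^2}\bigl[a(\tau,\xi)\,u\bigr] - \frac{\partial}{\partial\xi}\bigl[b(\tau,\xi)\,u\bigr]. \]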
Thus we arrive at a frequency function f ∗ (t, x; τ, ξ) with corresponding proper-
ties. The proof that f and f ∗ are identical and that the fundamental relation
(9) holds, is achieved by directly adopting the arguments of §§ 3 and 4. In-
deed, if u(τ, ξ) and v(τ, ξ) are (assumed to be) sufficiently regular, we have the
identity
\[ \int_{t'}^{t''} ds \int_{-\infty}^{+\infty} \bigl[ vN(u) - uN^*(v) \bigr]\,dy \equiv \int_{-\infty}^{+\infty} u(t'',y)\,v(t'',y)\,dy - \int_{-\infty}^{+\infty} u(t',y)\,v(t',y)\,dy, \]
4. Thus we have shown that the equation (26) with the initial value prob-
lem (5) defines a unique stochastic process and that the corresponding function
F (t, x; τ, ξ) admits a frequency function which is, as a function of t, x, also a
solution to (26) and, as a function of τ, ξ, a solution to the adjoint equation
F (t, x; t + Δt, ξ) = (1 − p(t, x)Δt)F0 (t, x; t + Δt, ξ) + Δtp(t, x)P (t, x, ξ) + o(Δt).
This is the relation (22) with F0 instead of G. The fact that F0 indeed satisfies
the conditions (23)–(25) has already been established in § 3.
References
All citations of the form [Feller 19nn], resp., [*Feller 19nn] (if the respective
paper is not included in these Selecta) point to Feller’s bibliography, pp. xxv–
xxxiv.
[1] H. Cramér: On the Mathematical Theory of Risk. Skandia-Festschrift,
Stockholm 1930.
[2] ——: Sur les propriétés asymptotiques d’une classe de variables aléatoires.
C. R. Acad. Sci. Paris 201 (1935).
[3] W. Feller: Über den zentralen Grenzwertsatz der Wahrscheinlichkeits-
rechnung. Math. Zeitschr. 40 (1935).m
[4] M. Gevrey: Sur les équations aux dérivées partielles du type parabolique.
Journ. Math. pures appl. (6) 9 (1913), pp. 305–471; 10 (1914), pp. 105–
148.
[5] J. Hadamard: Sur la solution fondamentale des équations aux dérivées
partielles du type parabolique. C. R. Acad. Sci. Paris 152 (1911).
[6] A. Khintchine: Asymptotische Gesetze der Wahrscheinlichkeitsrechnung.
Ergebnisse der Math. 2, Issue 4, Berlin 1933.
m This is [Feller 1935c] which is contained in these Selecta along with an English trans-
[8] ——: Zur Theorie der stetigen zufälligen Prozesse. Math. Annalen 108
(1933).o
[12] E. E. Levi: Sull’equazione del calore. Ann. di math. pura appl. (3) 14
(1908).
(Received 17–February–1936.)
an added bibliography by A.T. Bharucha-Reid. Chelsea Publishing Co., New York 1956.
q Both parts are translated as On the general form of a homogeneous stochastic process,
Works. Part II: Differential Equations and Probability Theory. Gordon and Breach, Ams-
terdam 1996, pp. 278–298.
Theorem of Probability
Theory. II
By Willy Feller in Stockholm
Introduction
In an earlier paper,1 among other results, exact conditions were determined
for the classical Laplace-Ljapounoff limit theorem to hold, and a criterion was
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments. Throughout the text the index ν has
been changed to μ since the Greek ν closely resembles v, the small Roman V .
1 On the central limit theorem of probability theory, Math. Zeitschr. 40 (1935). In the
sequel, this paper will be cited as 1. Let me use this opportunity for the following corrections.
a) My results rely essentially on a theorem establishing the connection between the conver-
gence of a sequence of distribution functions with the convergence of the corresponding char-
acteristic functions; for this theorem I quote in Footnote 10, p. 529, Bochner and I remark
that P. Lévy has proved “a somewhat more restrictive theorem”. This is a regrettable error
on my side: In fact, the above mentioned theorem which I have used is exactly due to P. Lévy.
b) Please read on p. 548, line 4 from below
dVμ (x) < instead of dVμ (x) < .
a
|x|> 4Tn
2 |x|<ηan
Finally, the following addendum might be of interest. On pp. 524 and 531 the hypo-
thetic case of bounded normalization factors an was explicitly excluded from the consid-
erations since “in this case one does not deal with an asymptotic law, but with the prob-
lem how to split Φ(x) into components”. After 1 had been published, this question was
answered conclusively. Cramér proved [“Über eine Eigenschaft der normalen Verteilungsfunktion”,
Math. Zeitschr. 41 (1936), also C. R. Acad. Sci., Paris 202 (1936)] the following
theorem which was repeatedly stated as a conjecture by P. Lévy: The integral equation
\[ \int_{-\infty}^{+\infty} V_1(x-y)\,dV_2(y) = \Phi(x), \]
where $V_k(x)$ are distribution functions, has only the solution
We say, for short2 , that the sequence {Vk (x)} belongs to the Gaussian standard
normal distribution function Φ(x) with the normalizing factors {an }, if
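\[ W_n(a_n x) \to \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}y^2}\,dy \tag{2} \]
and
\[ V_n(a_n x) \to E(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 & \text{for } x > 0. \end{cases} \tag{3} \]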
The probabilistic meaning of the latter condition is that the influence of the
single components tends to zero as n increases, i.e. the convergence of Wn (an x)
is not caused by the dominating influence of the function Vn (x) which, if suitably
normalized, converges to Φ on its own.3 This condition is equivalent to the
requirement that $a_{n+1}/a_n \to 1$ and $a_n \to \infty$. The most general question with
regard to the central limit theorem is as follows: Given a sequence of distribution
functions {Vk (x)}; under which conditions do there exist two sequences
of normalizing constants {an } and {bn } such that the sequence {Vk (x + bk )}
with the normalizing factors {an } belongs to Φ(x); and, if so, how can one
determine these normalizing factors? The main result of 1 was a complete
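\[ V_k(x) = \Phi\Big(\frac{x - m_k}{\sigma_k}\Big) \quad\text{with}\quad \sigma_1^2 + \sigma_2^2 = 1,\; m_1 + m_2 = 0. \]
This means that the case which was excluded corresponds to the trivial sequence
\[ V_\mu(x) = \Phi\Big(\frac{x + b_\mu}{c_\mu}\Big) \quad\text{with}\quad \sum_\mu b_\mu = 0,\; \sum_\mu c_\mu^2 = 1. \]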
2 1. p. 524.
3 1, p. 532. In this paper, a further possible case was excluded. As Cramér [cf. Footnote
hold. Then (and only then) there is a sequence of real numbers Xn → ∞ such
that
\[ \frac{X_n^2}{\sum_{k=1}^{n} \int_{|x|<X_n} x^2\,dV_k(x)} \to 0 \tag{7} \]
as well as
\[ \sum_{k=1}^{n} \int_{|x|>X_n} dV_k(x) \to 0 \tag{8} \]
1, last paragraph] has meanwhile shown, this case will practically not appear.
¶ 304 ¶ According to 1, p. 530 and 533, respectively, the thus defined sequences
{an } and {bn } can be replaced by two different sequences {a′n } and {b′n } if,
and only if,
\[ \frac{a_n'}{a_n} \to 1 \quad\text{and}\quad \frac{1}{a_n} \sum_{k=1}^{n} (b_k - b_k') \to 0. \tag{11} \]
\[ \lim_{\substack{X\to\infty \\ n\to\infty}} \frac{\sum_{k=1}^{n} \int_{|x|<X} |x|^p\,dV_k(x)}{X^{p-2} \sum_{k=1}^{n} \int_{|x|<X} x^2\,dV_k(x)} = 0 \tag{12} \]
The remark that both criteria are equivalent (as is easily proved with known
methods) is due to Marcel Riesz whose valuable advice is gratefully acknowl-
edged.
The second part of this paper is devoted to an important special case where
all elements are identical: Vk (x) = V (x). Condition (6) is (unless V (x) degen-
erates into a step function with exactly one discontinuity at the origin) auto-
matically satisfied, while (5) and (12) are transformed into the two equivalent
relations
\[ \lim_{X\to\infty} \frac{\int_{|x|<X} |x|^p\,dV(x)}{X^{p-2} \int_{|x|<X} x^2\,dV(x)} = \lim_{X\to\infty} \frac{X^2 \int_{|x|>X} dV(x)}{\int_{|x|<X} x^2\,dV(x)} = 0. \tag{13} \]
In the latter form the condition was already given in 1 (§8, Example a, p. 554)
and discovered, almost simultaneously and independently with different tech-
1 Auxiliary Results
Our proof relies on the following facts which were proved in 1.
a) In order that the sequence of distribution functions {Vk (x)} with nor-
malization factors {an } belongs to Φ(x) it is necessary and sufficient6 , that for
every fixed η > 0 we have simultaneously
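\[ \sum_{k=1}^{n} \int_{|x|>\eta a_n} dV_k(x) \to 0, \tag{1} \]
\[ \frac{1}{a_n^2} \sum_{k=1}^{n} \Big\{ \int_{|x|<a_n} x^2\,dV_k(x) - \Big( \int_{|x|<a_n} x\,dV_k(x) \Big)^2 \Big\} \to 1, \tag{2} \]
\[ \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x|<a_n} x\,dV_k(x) \to 0. \tag{3} \]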
4 A. Khintchine: “Sul dominio di attrazione della legge di Gauss”, Giorn. Ist. Ital. At-
¶ 306 ¶ and
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \Big\{ \int_{|x|<a_n} x^2\,dV_k(x) - \int_{|x|<a_n} x^2\,dV_k(x+b_k) \Big\} = 0. \tag{6} \]
c) If the relations (1) and (2) obtain and if bn is defined by (10), then
the sequence {Vk (x + bk )} belongs, with the normalization factors an , to Φ(x).
Then, the convergence is even absolute8 , and one has
\[ \lim_{n\to\infty} \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x|<a_n} x\,dV_k(x+b_k) = 0, \tag{7} \]
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \Big( \int_{|x|<a_n} x\,dV_k(x+b_k) \Big)^2 = 0. \]
2 Sufficiency of Criterion A
First we show: If the conditions (5) and (6) hold, then there exists a sequence
of real numbers Xn → ∞ such that (7) and (8) are satisfied.
We have to distinguish between two cases. Firstly, if
\[ \lim_{X\to\infty} \lim_{n\to\infty} \sum_{k=1}^{n} \int_{|x|>X} dV_k(x) = 0 \tag{1} \]
holds, then (8) is true for every sequence Xn → ∞, whereas by (6), the relation
(7) is always satisfied for every sufficiently slowly growing sequence {Xn }.
Secondly, one may have
\[ \lim_{X\to\infty} \lim_{n\to\infty} \sum_{k=1}^{n} \int_{|x|>X} dV_k(x) = \alpha > 0 \tag{2} \]
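(the existence of the iterated limits follows from monotonicity). Then, for
every 0 < ε < α, there is obviously a sequence X̄n (ε) → ∞ such that
\[ \lim_{n\to\infty} \sum_{k=1}^{n} \int_{|x|>\bar X_n(\varepsilon)} dV_k(x) > 0, \qquad \lim_{n\to\infty} \sum_{k=1}^{n} \int_{|x|>2\bar X_n(\varepsilon)} dV_k(x) < \varepsilon. \tag{3} \]
\[ \lim_{n\to\infty} \frac{\bar X_n^2(\varepsilon)}{\sum_{k=1}^{n} \int_{|x|<\bar X_n(\varepsilon)} x^2\,dV_k(x)} = 0. \tag{4} \]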
¶ The second relation in (3) together with (4) shows that, for the validity ¶ 307
of (7) and (8), it is enough to pick a sufficiently slowly decreasing sequence
εn → 0 and to define Xn = X̄n (εn ).
Now choose an arbitrary sequence Xn → ∞ such that (7) and (8) hold, and
define the numbers an by (9). By the Schwarz inequality one has
\[ \Big( \int_0^{X_n} x\,dV_k(x) \Big)^2 \le \{V_k(X_n) - V_k(0)\} \int_0^{X_n} x^2\,dV_k(x), \]
Therefore, also the relation (2) holds and Theorem c) mentioned in no. 1 shows
that the condition of the criterion is indeed sufficient.
¶ 308 ¶ From the first of these relations one concludes that, as n → ∞, one has
uniformly for k = 1, 2, . . . , n
\[ \int_{|x|>\eta a_n} dV_k(x+b_k) \to 0. \]
Since η may be chosen arbitrarily small, this and (4) immediately entail that
\[ \lim_{n\to\infty} \frac{1}{a_n} \max\{|b_1|, |b_2|, \ldots, |b_n|\} = 0. \tag{1} \]
¶ holds true. If X > X0 and if n is so large that an > X, one therefore has ¶ 309
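\[ \begin{aligned} \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<a_n} x^2\,dV_k(x) &\le \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<X} x^2\,dV_k(x) + \sum_{k=1}^{n} \int_{|x|\ge X} dV_k(x) \\ &< \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<X} x^2\,dV_k(x) + \frac{\varepsilon}{2}. \end{aligned} \]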
this proves that (6) holds. Alternatively, (2) obtains. Then (6) is a direct
consequence of (5) and (2), as one can see using a sufficiently slowly growing
sequence Xn → ∞.
\[ \frac{X^2 F_n(X)}{F_n^{(2)}(X)} < \varepsilon \]
¶ 310 ¶ holds. For p > 2, n > n̄ and X > X̄ one easily obtains by integration by
parts
\[ \begin{aligned} F_n^{(p)}(X) &= -\int_0^{X} x^p\,dF_n(x) = C - \int_{\bar X}^{X} x^p\,dF_n(x) \\ &= C' + p \int_{\bar X}^{X} x^{p-1} F_n(x)\,dx \\ &< C' + p\varepsilon \int_{\bar X}^{X} x^{p-3} F_n^{(2)}(x)\,dx \\ &< C' + \frac{p}{p-2}\,\varepsilon\, X^{p-2} F_n^{(2)}(X); \end{aligned} \]
here, C and C′ denote positive constants which may depend on ε, but not on
n and X. Since it is certain that $F_n^{(2)}(X)$ is positive for sufficiently large X,
it follows from the last inequality that
\[ \frac{F_n^{(p)}(X)}{X^{p-2} F_n^{(2)}(X)} \]
¶ 311 ¶
\[ \lim_{n\to\infty} \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x|<a_n} x\,dV(x+b_k) = 0; \tag{30} \]
moreover, there is a relation which corresponds to (2) but which we will not
explicitly use in the sequel.
Set
\[ b = \int_{-\infty}^{+\infty} x\,dV(x); \]
following a remark by P. Lévy (cf. p. 305, Footnote 5), this is well defined
and our proof includes this, too. The theorem in question is as follows: The
sequence consisting of identical elements V (x + b) belongs to Φ(x), too. By a
theorem mentioned in the introduction on p. 304 this is the case if, and only
if,
\[ \frac{1}{a_n} \sum_{k=1}^{n} (b - b_k) \to 0 \]
Similar to the argument at the beginning of no. 3 it easily follows from (1)
that, in turn, (1) holds true. For sufficiently large n and k = 1, . . . , n one has,
therefore,
\[ \int_{|x|\ge a_n} x\,dV(x+b_k) \le \int_{|x|>\frac{1}{2}a_n} (|x| + |b_k|)\,dV(x). \]
holds, then (3), hence our claim, are proved. Moreover, (4) contains the finite-
ness of the first moment of V (x).
By Theorem c) of no. 1 one may assume without loss of generality that the
constants bn are chosen in such a way that the sequence Vk (x) = V (x + bk )
satisfies the relations (7); then (2) becomes
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<a_n} x^2\,dV(x+b_k) = 1, \]
\[ \lim_{X\to\infty} \frac{X G(X)}{F^{(2)}(X)} \le \frac{2\varepsilon}{1-\varepsilon} \]
q.e.d.
(Received 24–August–1936)
Corrections
Volume 42, pp. 301–312
I am grateful to Messrs. Doeblin and Fréchet for the kind advice that the
new version on pp. 303–4 of my necessary and sufficient condition from vol. 40,
pp. 521 f. is due to a regrettable mistake (following the equality (3)): In the
present form the criterion is only sufficient. In fact, the validity of (5), hence of
(12), has to be restricted in an obvious way, which also becomes immediately
clear from the proof. I do not give details since, meanwhile, Mr. Doeblin
succeededb in proving an even more useful formulation of the criterion.
In the case of identical components, the criterion remains valid and so
does the main result of the paper: The answer to Mr. P. Lévy’s question is
independent of the first part.
Willy Feller.
b Feller probably refers to W. Doeblin: Sur les sommes d’un grand nombre de variables
aléatoires indépendantes. Bulletin des Sciences Mathématiques, II. Ser. 63 (1939) 23–32
and 35–64.
Numbers
Dedicated to Harald Bohr
on the occasion of his
50th birthday 22–April–1937
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break in
the original text, and the original pagination is shown in the margin. Footnotes indexed by
lowercase Roman letters contain editorial comments.
1 For all definitions as well as for a mathematically rigorous foundation for the notions
used here, I refer to the fundamental treatises by A. Kolmogoroff, Grundbegriffe der Wahr-
scheinlichkeitsrechnung a and A. Khintchine, Asymptotische Gesetze der Wahrscheinlich-
keitsrechnung, respectively. Both are included in: Ergebnisse der Mathematik, Bd. 2 (Berlin
1933). For linguistic reasons I prefer “stochastic variable” (variable aléatoire = real function
on the basic set) over “chance variable”. b
a A. N. Kolmogorov: Foundations of the theory of probability. Translation edited by
\[ \frac{1}{n} \sum_{k=1}^{n} (X_k - b_k) = \frac{1}{n} S_n \]
\[ \sum_{k=1}^{n} \int_{|x|>a_n} dV_k(x) = o(1) \tag{1} \]
2 A. Kolmogoroff: Über die Summen durch den Zufall bestimmter unabhängiger Größen,
Math. Annalen, 99 (1928), pp. 300–319. One should note the erratum with corrections: A.
Kolmogoroff, Bemerkungen zu meiner Arbeit “Über die Summen zufälliger Grössen”, Math.
Annalen, 102 (1930), pp. 484–488.c
3 A. Plessner: Über das Gesetz der großen Zahlen, Recueil Math. (= Matematitscheski
c English translations are contained in A.N. Shiryayev (ed.): Selected Works of A.N.
Kolmogorov. Volume II: Probability Theory and Mathematical Statistics. Kluwer, Dordrecht
1992, pp. 15–26, 26–31.
d Feller writes in his Zentralblatt review [Zbl 0014.16804] that Plessner’s paper generalizes
Khintchine’s result [C. R. Acad. Sci., Paris 188 (1929) 477–479] to independent, identically
distributed random variables with finite absolute first moments.
e Feller refers to the central limit theorem and his papers [Feller 1935c] and [Feller 1937a].
\[ \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<a_n} x^2\,dV_k(x) = o(1). \tag{2} \]
If the coordinate origins are chosen in such a way that for all n
(4) Vn (+0) ≥ λ > 0, Vn (−0) ≤ 1 − λ
hold, then these conditions are also necessary.
The random variables $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - b_k)$ and $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - b_k')$ simultaneously
converge in probability to zero if, and only if (cf. no. 2. on p. 195)
\[ \frac{1}{a_n} \sum_{k=1}^{n} (b_k - b_k') = o(1) \tag{5} \]
holds. There are, obviously, always sequences {an } such that (1) and (2) are
satisfied; therefore it is basically a matter of determining the slowest possible
growth.
If an = n, we get Kolmogorov’s theorem. The connection to the problem
studied by Khintchine (cf. footnote 4) is as follows. There it is assumed,
restrictively, that all Xn are positive and that they have the same (continuous)
distribution function V (x):
(6) Vn (x) = V (x), V (0) = 0.
Then the question is asked under which conditions one can pick the constants
an and bn such that
\[ \frac{1}{a_n} \sum_{k=1}^{n} b_k = 1. \tag{7} \]
Zeitschrift, 40 (1935), pp. 521–559 and II, loc. cit. 42 (1937), pp. 301–312.e
Conversely, if (10) holds, we can easily choose the an in such a way that (1)
and (8) are satisfied. The condition (2) is, if we assume (6), a consequence of
(10). In fact, one has
\[ \int_0^{z} x^2\,dV(x) = 2 \int_0^{z} x\{V(z) - V(x)\}\,dx \le 2 \int_0^{z} x\,dx \int_x^{+\infty} dV(y), \]
Thus, (10) is necessary and sufficient for the choice of the constants. (The con-
dition which was obtained by Khintchine can be reduced to (10) by changing
the order of integration and evaluating one integral.)
In the general case, where Vn (x) = V (x) and with the usual normalization
an = n, (1) is equivalent to V (−z) + 1 − V (z) = o(1/z). Condition (2) is then
a simple consequence. This condition has already been derived by Cramér6
using characteristic functions. If the first moment is finite, the same method
had already been used by Khintchine7 .
Let us finally remark that the above mentioned theorem immediately gives
a more precise answer to the question raised in the so-called St. Petersburg
Paradox (cf. no. 7., p. 200 f.).
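2. Setting
\[ v_n(t) = \int_{-\infty}^{+\infty} e^{ixt}\,dV_n(x), \]
¶ 195 ¶ we know that the characteristic function of (Xn − bn ) is $e^{-ib_n t} v_n(t)$ and that
of $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - b_k)$ is
\[ w_n(t) = e^{-\frac{it}{a_n} \sum_{k=1}^{n} b_k} \prod_{k=1}^{n} v_k\Big(\frac{t}{a_n}\Big). \tag{11} \]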
(12) wn (t) → 1
3. We begin by showing that the conditions (1) and (2) are necessary; to
do so, we start with (12) and assume that (4) is satisfied.
From (12) it follows that, uniformly in every interval |t| < T ,
\[ \int_{-\infty}^{+\infty} e^{\frac{itx}{a_n}}\,dV_n(x+b_n) \to 1. \]
is necessary. From this and (12) one obtains easily, using the monotonicity of
the sequence {an }, that
\[ \int_{|x|>\eta a_n} dV_k(x) \to 0 \tag{15} \]
8 P. Lévy, Calcul des Probabilités (Paris 1925), pp. 195 and 197.
Math. and Math. Phys. 36, Cambridge Univ. Press, Cambridge 1937.
\[ 1 - \cos\frac{2xt}{a_n} \ge \frac{x^2 t^2}{a_n^2} \]
holds. From the relation (18b) one concludes immediately that for this η
\[ \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<\eta a_n} x^2\,dV_k(x) \to 0. \tag{20} \]
Since (19) holds for every η > 0, one easily sees that (20) remains true for any
η > 0 and, in particular, for η = 1. This proves the necessity of (2).
4. In order to prove that our conditions are sufficient, we begin with the
following
The same lemma was used in the proof of the central limit theorem9 . The
proof of this lemma essentially relies on the fact that $a_{n+1}/a_n \to 1$ which is not
assumed in the present setting. In order to prove the lemma in full generality,
one determines N = N (ε) such that for all n ≥ N
\[ \sum_{k=1}^{n} \int_{|x|\ge \frac{1}{e} a_n} dV_k(x) < \varepsilon. \tag{22} \]
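\[ \le \frac{1}{a_n} \sum_{s=[\log a_N]}^{[\log a_n]} \; \sum_{a_N \le a_k \le e^{s+1}} \int_{e^s \le |x| < e^{s+1}} |x - b_k|\,dV_k(x). \]
\[ \begin{aligned} &\le \frac{2}{a_n} \sum_{s=[\log a_N]}^{[\log a_n]} e^{s+1} \sum_{a_N \le a_k \le e^{s+1}} \int_{|x|\ge e^s} dV_k(x) \\ &\le \frac{2\varepsilon}{a_n} \sum_{s=[\log a_N]}^{[\log a_n]} e^{s+1} < 2\varepsilon e^2. \end{aligned} \]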
holds, q.e.d.
and
\[ \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x|<a_n} x\,dV_k(x+b_k) = o(1). \tag{25} \]
For the proof, observe first that by (1) and (2) one has for each η > 0
\[ \sum_{k=1}^{n} \int_{|x|>\eta a_n} dV_k(x) \to 0. \tag{26} \]
nung] I, p. 534.
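\[ \begin{aligned} 0 &\le \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<a_n} x^2\,dV_k(x+b_k) \\ &\le \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<2a_n} (x-b_k)^2\,dV_k(x) \\ &= \frac{1}{a_n^2} \sum_{k=1}^{n} \Big\{ \int_{|x|<2a_n} x^2\,dV_k(x) - 2b_k \int_{|x|<2a_n} (x-b_k)\,dV_k(x) - b_k^2 \int_{|x|<2a_n} dV_k(x) \Big\} \\ &\le \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{|x|<a_n} x^2\,dV_k(x) + 4 \sum_{k=1}^{n} \int_{|x|\ge a_n} dV_k(x) \\ &\quad + \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x|<a_n} (x-b_k)\,dV_k(x) + 2 \sum_{k=1}^{n} \int_{|x|\ge a_n} dV_k(x) \end{aligned} \]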
¶ and, according to (2), (1) and (21), all quantities on the right-hand side ¶ 200
converge to zero. This proves (24). For the proof of (25) it is enough to note
that one has, because of (27), for sufficiently large n
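\[ \begin{aligned} \frac{1}{a_n} \Big| \sum_{k=1}^{n} \int_{|x|<a_n} x\,dV_k(x+b_k) - \sum_{k=1}^{n} \int_{|x|<a_n} (x-b_k)\,dV_k(x) \Big| &\le \frac{2}{a_n} \sum_{k=1}^{n} \int_{\frac{1}{2}a_n \le |x| \le a_n} (|x| + |b_k|)\,dV_k(x) \\ &\le 4 \sum_{k=1}^{n} \int_{|x|\ge \frac{1}{2}a_n} dV_k(x) = o(1), \end{aligned} \]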
q.e.d.
g In the second estimate the original has the misprint 2/a2n instead of 1/a2n .
and the right-hand side converges, by the lemma from no. 5., p. 199 to zero.
This entails immediately that, uniformly on each finite interval,
\[ w_n(t) = \prod_{k=1}^{n} \Big[ 1 - \int_{-\infty}^{+\infty} \Big( 1 - e^{\frac{ixt}{a_n}} \Big)\,dV_k(x+b_k) \Big] \to 1, \]
7. Let us, finally, add a few words on the frequently discussed St. Petersburg
Paradox. This consists in the following erroneous conclusion from
the law of large numbers: In a gamble where the mathematical expectation
of the gains is infinite, one may pay arbitrarily high stakes ¶ 201 ¶ without any
disadvantage provided that the game is infinitely often repeated. In mathematical
terms it is claimed that (at least if Vn (x) = V (x) and V (0) = 0),
whenever $\int_{-\infty}^{+\infty} x\,dV_n(x) = +\infty$, for every sequence of real numbers {cn }, the
probability of the relation
\[ \frac{1}{n} \sum_{k=1}^{n} (X_k - c_k) > 0 \]
then an arbitrarily high loss is almost certain; more precisely: The probability
of the relation
\[ \frac{1}{n} \sum_{k=1}^{n} (X_k - c_k) < -A \]
(Received 2–March–1937)
The Foundations of ¶ 11
\[ \frac{dm}{dt} = -\lambda m \tag{1} \]
for the total mass m = m(t) of the material not yet decayed. In theory, the
numbers effectively observed will somewhat fluctuate, but these fluctuations
are extremely small and take place around the mean value determined by
(1). The fact that this is true, i.e. that equation (1) and the corresponding
equations in Volterra’s theory in this sense yield the expected values (math-
ematical expectations, mean values) of the relevant quantities, seems to be
mostly regarded as a matter of course which requires no further investiga-
tions. A detailed examination, however, yields the interesting result that this
is not always the case. Although (1) provides the relevant expected values, al-
ready the most basic biological growth processes, for example those which are
described in Volterra’s theory by the Pearl–Verhulst or “logistic” differential
equation
\[ \frac{dm}{dt} = \varepsilon m - \delta m^2, \tag{2} \]
reveal that these equations hold only approximately for the relevant expected
values (although with sufficient precision for any practical purpose). The theoretical
expectations, around which the quantities actually observed (under invariable
basic assumptions) have to fluctuate statistically, are always a bit
smaller in reality than one would get from equation (2). A similar statement
holds for the system of equations
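\[ \begin{aligned} \frac{dN_1}{dt} &= \varepsilon_1 N_1 + \delta_1 N_1 N_2 \\ \frac{dN_2}{dt} &= \varepsilon_2 N_2 - \delta_2 N_1 N_2 \end{aligned} \tag{3} \]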
which, in Volterra’s theory, yields just the first approximation for the descrip-
tion of the struggle for a joint habitat between, say, a predator and a prey
species of size N1 (t) and N2 (t), respectively. The fundamental difference be-
¶ 13 tween the phenomena described by (2) and (3) ¶ and simple growth models
of type (1) consists in the fact that, in the latter case, single atoms act, in the
sense of probability theory, completely independently of each other, while the
1 For a purely qualitative treatment of the growth problem of several populations which
The system of N differential equations (5) and (5a) together with the initial
conditions (4) uniquely determines the probabilities Pn (t) that we have been
A very small S(t) clearly indicates, according to (8), that only those Pn (t)
are significantly different from zero whose index is close to M (t), i.e. that we
may practically expect only numbers which do not significantly differ from the
expectation. Indeed, the limiting case S(t) = 0 means that PM (t) = 1 and that
all other Pn (t) vanish, i.e. only the expectation M (t) will be realized. In general,
one can easily show using (8) (the so-called inequality of Tschebytscheffa )
that the probability of the event that the observed number n differs from the
expectation M (t) by more than $\eta\sqrt{S(t)}$ is always smaller than ¶ 18 ¶ $1/\eta^2$, i.e. that
the probable deviations from the theoretical expectation are at most of the
order $\sqrt{S(t)}$. This estimate holds, quite generally, for any probability distribution;
for particular distributions like (5) and all others which appear later
on, one can very easily improve the estimate significantly, but we may as well
manage with this rough estimate.
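The rough estimate can be checked numerically. The following is a minimal sketch, assuming a binomial distribution as a stand-in for the probabilities Pn (t) (the function names and the chosen distribution are illustrative and not taken from the text). It compares the probability of a deviation larger than η√S from the expectation M with the Chebyshev bound 1/η².

```python
from math import comb, sqrt

def binomial_pmf(n_trials, p):
    """Return [P_0, ..., P_n] for a binomial distribution (illustrative stand-in)."""
    return [comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
            for k in range(n_trials + 1)]

def mean_and_dispersion(pmf):
    """Expectation M and dispersion S (mean squared deviation from M)."""
    m = sum(n * pn for n, pn in enumerate(pmf))
    s = sum((n - m) ** 2 * pn for n, pn in enumerate(pmf))
    return m, s

def tail_probability(pmf, eta):
    """Probability that n differs from M by more than eta * sqrt(S)."""
    m, s = mean_and_dispersion(pmf)
    return sum(pn for n, pn in enumerate(pmf) if abs(n - m) > eta * sqrt(s))

pmf = binomial_pmf(200, 0.3)
for eta in (1.5, 2.0, 3.0):
    # actual tail mass next to the general bound 1/eta^2
    print(eta, tail_probability(pmf, eta), 1 / eta**2)
```

For a concrete distribution the actual tail mass is usually far below the bound, which matches the remark that the estimate can be improved significantly in particular cases.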
For many purposes the quantities M (t) and S(t) are a sufficient quantita-
tive description of the probability distribution in question. For more precise
3 For a comprehensive review of the modern theory of Markoff chains cf. Fréchet (1938).
a Chebychev
Thus we have S(t) < M (t) and, according to the discussion above, the probable
deviations from the expectation are, therefore, at most of order $\sqrt{M(t)}$, i.e.
they are relatively insignificant for large M (t). Hence, the number of atoms
actually observed will be, with overwhelming likelihood, close to the value
given by the deterministic differential equation (1).
again with the initial values (4). Although the system (16) is a system of
infinitely many differential equations, this does not present any difficulties.
The solution is given by
\[ P_n(t) = \binom{n-1}{n-N} e^{-\lambda N t} (1 - e^{-\lambda t})^{n-N}, \qquad (n \ge N \ge 1), \tag{17} \]
and one easily verifies that this indeed is a probability distribution. For the
expectation and the dispersion (cf. the definitions (7), (8) and (10)) we obtain,
as before,
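\[ \begin{aligned} M'(t) &= \lambda \cdot M(t), \\ S'(t) &= 2\lambda \cdot S(t) + \lambda \cdot M(t), \end{aligned} \tag{18} \]
¶ 20 ¶ or, explicitly,
\[ \begin{aligned} M(t) &= N \cdot e^{\lambda t}, \\ S(t) &= N \cdot e^{2\lambda t} \cdot \bigl( 1 - e^{-\lambda t} \bigr). \end{aligned} \tag{18a} \]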
Hence, one now has $\sqrt{S(t)} < M(t)/\sqrt{N}$, i.e. for large initial values N one will,
in practice, observe only numbers which differ only little from the expectation
M (t), namely at most by the order of $M/\sqrt{N}$. The ratio of $\sqrt{S}$ and M is,
however, much less favourable than before.
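The closed forms for this pure growth process can be verified directly from the distribution (17). The following is a minimal sketch (the function names, the parameter values and the truncation index n_max are assumptions made for illustration): it sums the probabilities (17) and compares total mass, expectation and dispersion with N·e^{λt} and N·e^{2λt}(1 − e^{−λt}).

```python
from math import comb, exp, sqrt

def yule_pmf(n, N, lam, t):
    """P_n(t) as in (17): probability of size n at time t when starting from N."""
    if n < N:
        return 0.0
    return comb(n - 1, n - N) * exp(-lam * N * t) * (1 - exp(-lam * t)) ** (n - N)

def moments(N, lam, t, n_max=4000):
    """Total mass, expectation and dispersion, truncated at n_max (assumed large enough)."""
    total = mean = second = 0.0
    for n in range(N, n_max + 1):
        p = yule_pmf(n, N, lam, t)
        total += p
        mean += n * p
        second += n * n * p
    return total, mean, second - mean**2

N, lam, t = 5, 1.0, 1.0
total, M, S = moments(N, lam, t)
print(total)                                              # total probability
print(M, N * exp(lam * t))                                # expectation vs. closed form
print(S, N * exp(2 * lam * t) * (1 - exp(-lam * t)))      # dispersion vs. closed form
print(sqrt(S), M / sqrt(N))                               # sqrt(S) stays below M / sqrt(N)
```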
The system (15) corresponds to a growth process that would, in the deter-
ministic setting, be described by the first one of the differential equations in
although it is used for cross-referencing.
\[ P_n'(t) = -\{\lambda n - \gamma n^2\} \cdot P_n(t) + \{\lambda(n-1) - \gamma(n-1)^2\} \cdot P_{n-1}(t). \tag{19} \]
Although we have set out from the same ansatz which leads to the equation
(2) in the deterministic approach, we obtain here a different equation for the
expectation M (t), namely
In order to compare this value with the solution of (2), we note that
and this is true for any probability distribution; this follows at once from (10)
since, because of the definition (8), one always has S(t) ≥ 0. By (22) and (23)
one has
and this shows that M (t) is always smaller than the solution of (2) with the
same initial value M (0) = N . Under the present assumptions the sizes of the
population to be observed will statistically fluctuate around a quantity which
is slightly smaller than what would correspond to the differential equation (2).
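This comparison can be illustrated numerically. The following is a minimal sketch under stated assumptions: the system (19) is read as a pure growth process with transition rates λn − γn², the parameters are chosen so that λ/γ is an integer (the rate then vanishes at that size and the system becomes finite), and a classical Runge-Kutta scheme is used as the integrator. None of these choices come from the text. The expectation obtained from (19) is compared with the solution of the deterministic logistic equation dm/dt = λm − γm² with the same initial value N.

```python
from math import exp

def birth_rates(lam, gamma, n_max):
    """Rates lam*n - gamma*n^2 for the transition n -> n+1, clipped at zero."""
    return [max(lam * n - gamma * n * n, 0.0) for n in range(n_max + 1)]

def rhs(P, rates):
    """Right-hand side of (19): P_n' = -p_n P_n + p_{n-1} P_{n-1}."""
    dP = [0.0] * len(P)
    for n in range(len(P)):
        dP[n] = -rates[n] * P[n]
        if n > 0:
            dP[n] += rates[n - 1] * P[n - 1]
    return dP

def expected_size(N, lam, gamma, t_end, steps=4000):
    """Integrate (19) from P_N(0) = 1 and return the expectation at t_end."""
    n_max = round(lam / gamma)          # the rate vanishes here (assumed integer)
    rates = birth_rates(lam, gamma, n_max)
    P = [0.0] * (n_max + 1)
    P[N] = 1.0
    h = t_end / steps
    for _ in range(steps):              # classical fourth-order Runge-Kutta step
        k1 = rhs(P, rates)
        k2 = rhs([p + 0.5 * h * k for p, k in zip(P, k1)], rates)
        k3 = rhs([p + 0.5 * h * k for p, k in zip(P, k2)], rates)
        k4 = rhs([p + h * k for p, k in zip(P, k3)], rates)
        P = [p + h / 6 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return sum(n * p for n, p in enumerate(P))

def logistic(N, lam, gamma, t):
    """Solution of dm/dt = lam*m - gamma*m^2 with m(0) = N."""
    K = lam / gamma
    return K / (1 + (K / N - 1) * exp(-lam * t))

M = expected_size(N=2, lam=1.0, gamma=0.05, t_end=2.0)
m = logistic(2, 1.0, 0.05, 2.0)
print(M, m)   # the expectation M lies below the deterministic value m
```

The gap between the two values comes from the dispersion term: the growth of the expectation is reduced by γ times the dispersion, which is positive as soon as the distribution is not concentrated in a single point.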
equations from the theory of Markoff chains. Under the assumption that all coefficients are
bounded, I have (Feller, 1936) investigated the most general of these systems and shown
that they have solutions which satisfy all requirements of the theory (in particular, they
are positive and $\sum P_n(t) = 1$). If the coefficients are unbounded, this theorem fails to
hold. Although in the case of (20) there are still nonnegative solutions, which can easily be
computed by a recursion, one has $\sum P_n(t) < 1$ whenever $\sum 1/p_n$ converges; thus the Pn (t)
are, in this case, not a probability distribution. I intend to show elsewhere that, also in the
more general setting, one can find a necessary and sufficient condition which is completely
analogous to the one given in the text.
\[ P_n'(t) = -(\omega + \tau) n \cdot P_n(t) + \omega(n-1) \cdot P_{n-1}(t) + \tau(n+1) \cdot P_{n+1}(t), \tag{25} \]
of course again with the initial values (4), if there were initially exactly N
individuals.
It is now interesting to compare expectation and dispersion of the proba-
bility distribution (25) with those of (16). One easily obtains from (25)
\[ \begin{aligned} M'(t) = \sum n P_n'(t) &= -\tau \sum (n+1) P_{n+1}(t) + \omega \sum (n-1) P_{n-1}(t) \\ &= (\omega - \tau) \sum n P_n(t) \end{aligned} \]
This is the analogue of (22). In order to compare it with (2), we have to set
ω − τ = ε and γ − σ = δ, in line with the biological meaning of these quantities.
Thus, we again encounter the same phenomenon which we have already met
in the more specialized approach (19) (cf. p. 22): The statistical fluctuations
have the effect that the population grows, on average, somewhat more slowly
than one would get from the approach (2). Also for the other aspects, our
earlier remarks (p. 22) remain valid.
5 It is, per se, not necessary to assume that the size of the population changes only
continuously. As we have already mentioned in the introduction, we can very easily combine
both techniques described in the text and assume that the size of the population changes
partly continually and continuously – namely by ageing and similar continuous processes –
and partly sometimes by jumps. This would actually be the correct mathematical approach,
but the continuous approach seems to give a sufficiently good approximation and to contain
all qualitative essentials, so that we believed that we can restrict ourselves in the text to
this approach. Moreover, it will become obvious from the text how one can derive the
which corresponds to equation (4) in the atomistic approach. Due to (29) one
can rewrite (30) also in the formd
\[ \lim_{t\to 0} \Big\{ w_0(t) + \int_0^{N-\varepsilon} w(t,x)\,dx + \int_{N+\varepsilon}^{\infty} w(t,x)\,dx \Big\} = 0. \tag{30a} \]
¶ 27 ¶ We will now show how one can apply a method which was first used by
A. Kolmogoroff in order to derive a differential equation for w(t, x):
d Misprint in the original: $\int_0^{N+\varepsilon} \ldots$ instead of $\int_0^{N-\varepsilon} \ldots$.
is the probability that the population has died out by time t + τ if it was of
size ξ at time t.
Using the function f (ξ, x; τ ), we can make precise the notion of continuity
of the process (cf. Feller, 1936, §1) described vaguely above. Actually, when
describing the process we assume continuity in the sense that for every 0 < ε < ξ
\[ \lim_{\Delta t\to 0} \frac{1}{\Delta t} \Big\{ 1 - \int_{\xi-\varepsilon}^{\xi+\varepsilon} f(\xi,x;\Delta t)\,dx \Big\} = 0, \tag{33} \]
exists; clearly, a(ξ) is then the population’s tendency to grow at any moment
when the size is ξ, and this is the exact correspondence to the notion of speed
of growth in the deterministic approach.
However, a(ξ) = 0 is possible as soon as positive and negative increments
are equally likely; this may happen also in cases where, statistically, relatively
6 For this, cf. Kolmogoroff (1931), §§13–14, and under slightly more general conditions
¶ For the special case α = 0 one obtains from (39), using a limit argument, ¶ 30
the solution of the more special equation
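\[ \frac{\partial w(t,x)}{\partial t} = \beta\,\frac{\partial^2 \{x\,w(t,x)\}}{\partial x^2} \tag{40} \]
in the form
\[ \begin{aligned} w(t,x) &= \frac{1}{i\beta t} \sqrt{\frac{N}{x}}\; J_1\Big( \frac{2i}{\beta t} \sqrt{xN} \Big) \cdot e^{-\frac{N+x}{\beta t}} \\ &= \frac{N}{\beta^2 t^2}\, e^{-\frac{N+x}{\beta t}} \sum_{\kappa=0}^{\infty} \frac{1}{\kappa!\,(\kappa+1)!} \Big( \frac{\sqrt{xN}}{\beta t} \Big)^{2\kappa}. \end{aligned} \tag{41} \]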
Since one has good estimates for Bessel functions, one can read off from (39)
and (41) all essential properties of the solution w(t, x) of (38), and of (40),
respectively.
The greatest difference between this solution and the corresponding solu-
tion (17) of the atomistic approach is that, in the latter case, it was absolutely
impossible for the population to die out at any point, if the reproduction
probability λ is positive (one always has n ≥ N ). The equation (31), however,
always yields a positive probability for the extinction of the population. The
probability w0 (t) that the population has died out by time t can be calculated
from (29). The integral appearing there can be easily computed from the series
representation (39a): Noting that
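\[ \int_0^{\infty} e^{-\frac{\alpha x}{\beta(e^{\alpha t}-1)}} \Big( \frac{\alpha x}{\beta(e^{\alpha t}-1)} \Big)^{\kappa} dx = \kappa!\,\frac{\beta(e^{\alpha t}-1)}{\alpha}, \]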
one obtains easily
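\[ \begin{aligned} \int_0^{\infty} w(t,x)\,dx &= e^{-\frac{\alpha N e^{\alpha t}}{\beta(e^{\alpha t}-1)}} \sum_{\kappa=0}^{\infty} \frac{1}{(\kappa+1)!} \Big( \frac{\alpha N e^{\alpha t}}{\beta(e^{\alpha t}-1)} \Big)^{\kappa+1} \\ &= e^{-\frac{\alpha N e^{\alpha t}}{\beta(e^{\alpha t}-1)}} \Big\{ e^{\frac{\alpha N e^{\alpha t}}{\beta(e^{\alpha t}-1)}} - 1 \Big\}. \end{aligned} \]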
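For the special case α = 0 the same computation can be checked numerically from the series in (41). The following is a minimal sketch under stated assumptions: the function names, the parameter values, the cut-off x_max and the use of Simpson's rule are illustrative choices. It assumes that letting α tend to 0 in the closed form above gives 1 − e^{−N/(βt)} for the integral of w(t, x), so that the remaining mass e^{−N/(βt)} is the probability of extinction by time t.

```python
from math import exp, factorial

# Coefficients 1 / (kappa! (kappa+1)!) of the series in (41); 80 terms are an assumption
COEFFS = [1.0 / (factorial(k) * factorial(k + 1)) for k in range(80)]

def w_series(t, x, N, beta):
    """Density (41) for alpha = 0, evaluated from its power series."""
    z = x * N / (beta * t) ** 2          # equals (sqrt(x N) / (beta t))^2
    series = sum(c * z**k for k, c in enumerate(COEFFS))
    return N / (beta * t) ** 2 * exp(-(N + x) / (beta * t)) * series

def survival_probability(t, N, beta, x_max=60.0, n=6000):
    """Simpson's rule for the integral of w(t, x) over 0 <= x <= x_max (n even)."""
    h = x_max / n
    total = w_series(t, 0.0, N, beta) + w_series(t, x_max, N, beta)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * w_series(t, i * h, N, beta)
    return total * h / 3

t, N, beta = 1.0, 2.0, 1.0
print(survival_probability(t, N, beta), 1 - exp(-N / (beta * t)))
```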
(cf. (9)). In particular, M (t) = M1 (t) is the expected value of the popula-
tion size, around which the observable statistical fluctuations will take place.
Accordingly (cf. (8))
\[ S(t) = \int_0^{\infty} \{x - M(t)\}^2\,w(t,x)\,dx + M^2(t)\,w_0(t) \tag{46} \]
is the dispersion of the population size, i.e. the expectation of the square of
its deviation from its expected value. Multiplying out the square in (46), one
obtains, with (29) and the definition (45) of M (t) in mind, that
(47) S(t) = M2 (t) − M 2 (t),
of the initial value problem (31), cf. also the remarks to No. 8, pp. 33 f.
\[ M'(t) = \alpha \cdot M(t), \]
which coincides with the previous results. In general, for Mk (t) and k > 1, the
same technique yields
and this gives a recursion formula for all Mk (t). In particular, we get from
this using (47)
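\[ \begin{aligned} S'(t) = M_2'(t) - 2M(t)M'(t) &= 2\beta M(t) + 2\alpha M_2(t) - 2\alpha M^2(t) \\ &= 2\alpha S(t) + 2\beta M(t) \end{aligned} \]
and so
\[ S(t) = 2\,\frac{\beta}{\alpha}\, N e^{2\alpha t} (1 - e^{-\alpha t}). \tag{50} \]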
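Formula (50) can be cross-checked by integrating the two moment equations numerically. The following is a minimal sketch (the function names, the parameter values and the Runge-Kutta integrator are assumptions for illustration). It solves M′ = αM and S′ = 2αS + 2βM with M(0) = N and S(0) = 0, the latter expressing that the initial size is exactly N, and compares the result with N·e^{αt} and with (50).

```python
from math import exp

def integrate_moments(N, alpha, beta, t_end, steps=2000):
    """Runge-Kutta integration of M' = alpha*M, S' = 2*alpha*S + 2*beta*M."""
    def f(M, S):
        return alpha * M, 2 * alpha * S + 2 * beta * M

    M, S = float(N), 0.0                 # S(0) = 0: initial size known exactly
    h = t_end / steps
    for _ in range(steps):
        k1 = f(M, S)
        k2 = f(M + 0.5 * h * k1[0], S + 0.5 * h * k1[1])
        k3 = f(M + 0.5 * h * k2[0], S + 0.5 * h * k2[1])
        k4 = f(M + h * k3[0], S + h * k3[1])
        M += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        S += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return M, S

def closed_form(N, alpha, beta, t):
    """Expectation N e^{alpha t} and dispersion as in (50)."""
    M = N * exp(alpha * t)
    S = 2 * beta / alpha * N * exp(2 * alpha * t) * (1 - exp(-alpha * t))
    return M, S

print(integrate_moments(10, 0.7, 0.4, 1.5))
print(closed_form(10, 0.7, 0.4, 1.5))
```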
If one compares (50) with the corresponding formula (15) and (18a), then one
sees that we have complete coincidence if one sets α = β = λ. This, however,
is exactly the relation which one can ¶ calculate for the quantities (33) and ¶ 33
(37) from the probability distributions in the atomistic case. With these quan-
tities, the two ways of describing the growth process coincide, as far as the
expectation of the population size and its quadratic dispersion are concerned.
However, if one computes, using (49), the higher-order dispersion measures
that are familiar in statistics, then one easily sees that under the present
continuous approach these quantities become larger than in the atomistic ap-
proach. This corresponds with the fact that the continuous approach takes
into account more biological factors which will, in turn, cause theoretically
larger statistical fluctuations as well.
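\[
\frac{\partial w(t,x)}{\partial t}
= \beta\,\frac{\partial^2 \{x\,w(t,x)\}}{\partial x^2}
- \frac{\partial}{\partial x}\bigl\{(\alpha x - \gamma x^2)\,w(t,x)\bigr\}. \tag{51}
\]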
For this equation, too, one can calculate expectation and dispersion by the
method mentioned above. For example, one gets
so that the expected value is again smaller than under the deterministic ap-
proach (2). For the other quantities, one also obtains qualitatively the same
results as in the atomistic approach and in No. 7 (pp. 31 f.), so that we can do
without the explicit calculations.
A general method of integration for the equation (31) does not yet exist for
the cases we are interested in, and here one encounters many interesting, but
unsolved mathematical problems. A reasonably definitive result exists only
for the case where b(x) > 0 throughout, and all x are possible. In this case
I have (Feller, 1936, §§2–3), under suitably general regularity assumptions,
constructed the solution to (31) as a convergent series, and I have proven
that it is unique and satisfies all probabilistic requirements, in particular, that
the conditions (33), (35) and (37) are actually satisfied for this solution. ¶
¶ 34 The equations of the form (31), which we will now encounter, contain an
essentially novel aspect, since b(x) = 0 is, in general, possible, and since only
solutions in the quarter-plane t > 0, x > 0 are of interest. This is, from an
analysis perspective, a singularity of the equation that causes major difficulties
and which thoroughly modifies the initial value problems. From the point of
view of applications, the situation is better, since every equation (31) may
be approximated by equations for which b(x) > 0 holds and since many types
of equations can be directly reduced to equations of the latter kind. The
last remark holds, in particular, for those equations whose coefficients can be
written in the form
If both a(0) = 0 and b(0) = 0, then the solution of (31) is uniquely determined
just through the initial values at t = 0. If only b(0) = 0 but a(0) ≠ 0, then both
cases may occur. To my knowledge, the latter was first observed by S.
Kepinski (1905). An all-embracing criterion for this case does not exist.
9. Let us finally say a few words on the limiting procedures that lead to
the deterministic approach.
The growth process is completely determined in a causal sense if, and only
if, the speed a(x) of the increments of the population, in the moment when it
has reached size x, is completely specified, i.e. if the statistical dispersion b(x)
around the expectation vanishes identically. In this limiting case (31) indeed
becomes a first-order equation
(55) dx/dt = a(x).
In fact, (55) yields the so-called characteristics of the equation (54), i.e. those
lines along which any attained state has to evolve according to (54).
Using a different limiting procedure, one reaches the same result starting
from the atomistic approach. For this we start, say, from the form (20) of the
differential equations of the growth process. The limit to infinite populations
is now done in the following way: The size of the population is assumed to
be any multiple of a quantity h > 0 and Pn (t) is the probability that exactly
¶ 37 the size n · h is attained. Then we carry out the limit ¶ h → 0 in such a way
that nh → x where x > 0 is chosen arbitrarily. If mh < x < (m + 1)h and m is
integer, then we set
From this one easily concludes that w(h) (t, x) converges to a solution of the
equation (54).
10. Let us finally show how the previous considerations can be adapted
to the more general problems (without aftereffect) of Volterra’s theory of the
struggle for life. For this, it is enough to adapt the more natural, continuous
approach: The atomistic method then carries over in the same way, and even
more easily.
Assume, for simplicity, that there are only two populations, which compete
for the same environment, one of them being a predator species, the other a
prey species. Instead of describing the process, as is usually done, by two
functions of one variable each, we use a single function of three variables. If
the first population has size x1 and if, at the same time, the second is of size
x2 , then we represent this state by a point (x1 , x2 ) in the plane. One has
to determine a function w(t, x1 , x2 ) that is the probability density for both
populations to have the sizes x1 ≥ 0 and x2 ≥ 0, respectively, at the same
moment t > 0; the initial values of the quantities at time t = 0 are, of course,
assumed to be known.
Using similar considerations as before one obtains the following partial
differential equation for w(t, x1 , x2 ):
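\[
\frac{\partial w(t,x_1,x_2)}{\partial t}
= \sum_{i,k=1}^{2} \frac{\partial^2 \{b_{ik}(x_1,x_2)\,w(t,x_1,x_2)\}}{\partial x_i\,\partial x_k}
- \sum_{i=1}^{2} \frac{\partial \{a_i(x_1,x_2)\,w(t,x_1,x_2)\}}{\partial x_i}. \tag{56}
\]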
e The original contains the misprint w(t, x − h) instead of w(h) (t, x − h) in the second line
Literature
Bernstein, S. (1938) Limitation des modules des dérivées successives des solu-
tions des équations du type parabolique. – C. R. Acad. Sci. URSS, N. S.,
XVIII, 385–389.
Feller, W. (1936) Zur Theorie der stochastischen Prozesse (Existenz- und Ein-
deutigkeitssätze). – Math. Annalen CXIII, 113–160.f
Fréchet, M. (1938) Recherches théoriques modernes sur le calcul des probabilités.
Livre II (Traité du calcul des probabilités I, 3) – Paris, Gauthier–Villars,
315 pp.
of the following calculation.
f This is [Feller 1936c] of these Selecta.
—— (1936) Sulla teoria di Volterra della lotta per l’esistenza. – Giorn. Ist.
Ital. Attuari VII, 74–80.
Probability Theory and Mathematical Statistics. Kluwer, Dordrecht 1992, pp. 179–181.
i Studies of the Diffusion with the Increasing Quantity of the Substance; Its Application
On the Existence of So-called Kollektivs ¶ 87
By Willy Feller (Stockholm)
Translated and typeset by René L. Schilling. I am grateful for critical comments and
suggestions by Hans Fischer and Zoran Vondraček. The symbol ¶ indicates a page break
in the original text, and the original pagination is shown in the margin. Footnotes indexed
by lowercase Roman letters contain editorial comments. We follow von Mises and do not
translate the German word Kollektiv but use kollektiv and its English plural kollektivs.
Feller uses P ⊂ γ to indicate that a point P is an element of the set γ; following modern
practice, we use P ∈ γ. We also use modern notation to indicate P = (P1 , P2 , . . .) and sets of
points {P1 , P2 , . . .}, respectively; Feller is not consistent in his notation and we have carefully
corrected the original.
1 Cf. the bibliography at the end of the paper.
2 We are mainly interested in the most general result by Copeland which is contained
in [6]. The theory of admissible numbers which was developed in other papers (not quoted
here) by Copeland is only a special case.
a Feller uses Wahrscheinlichkeitsrechnung which means literally “calculus of probability”.
no rules at all
d Feller uses absolut additiv (absolutely additive). For details of this nomenclature, see
questions.
5 Copeland’s results (cf. footnote 2) are a special case of this. First of all, he essentially
considers only Euclidean and finite spaces and σ-additive p(γ). The greatest restriction
concerns, however, the admissible selection functions. Only the following procedures are
considered:
a) Let s1 < s2 < . . . be any fixed sequence of numbers; one selects the subsequence
Ps1 , Ps2 , . . . from the sequence P1 , P2 , . . .. This means that the selection functions fn are in-
dependent of the Pi .
b) Moreover, it is required, that for any N -tuple of sets γ1 , . . . , γN in Φ the relative fre-
quency of those sub-strings PnN +1 , PnN +2 , . . . , P(n+1)N for which PnN +r ∈ γr tends to-
p. 14 of H. Hahn and A. Rosenthal: Set Functions. The Univ. of New Mexico Press,
Albuquerque 1948.
Łomnicki & Ulam [5] and Kolmogoroff [4], III, §4, have shown that this
set function can be extended to a σ-additive measure on Π; in particular, we
have |Π| = 1.
In this setting, a selection function fn (P1 , . . . , Pn ) is a particular two-valued
function defined on Π. At first we will only consider measurable selection rules.
Although this restriction is, in particular from the point of view of applied
probability theory, quite natural, we will do away with it as well as with (4).
We are going to show
Under the assumptions on p(γ) almost all points in Π are regular with re-
spect to any arbitrary (but fixed) measurable selection rule A = {f0 , f1 (P1 ), . . .}.
Since Wald and Copeland allow only countably many selection rules, this result
contains their existence theorems.
¶ 91 ¶ Denote by A0 the selection rule for which we have identically fn ≡ 1, i.e.
which maps Π identically onto itself. The regular points with respect to A0
(which assigns to each set γ ∈ Φ the total frequency p(γ)) are the analogue of
Borel’s normal numbers; the latter are obtained if π consists of finitely many
points only (Steinhaus [6]). In this particular case, our theorem is of course
just the set-theoretic interpretation of the “strong law of large numbers”.
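The statement for a single selection rule can be illustrated by a small simulation. The sketch below is an addition, not Feller's construction: it assumes a two-point space π = {0, 1} with γ = {1} and p(γ) = 0.3, replaces a typical point of Π by a pseudo-random sequence, and uses one hypothetical selection rule (keep an observation exactly when the preceding one fell in γ). It illustrates the claim and proves nothing.

```python
import random

def select_after_success(outcomes):
    """Place selection: keep P_{n+1} exactly when P_n was a success.

    The decision about index n+1 uses only earlier outcomes, never
    P_{n+1} itself, as required of a selection function f_n(P_1..P_n).
    """
    return [nxt for prev, nxt in zip(outcomes, outcomes[1:]) if prev == 1]

def relative_frequency(outcomes):
    """Relative frequency of the set gamma = {1} in a finite sequence."""
    return sum(outcomes) / len(outcomes)

def demo(p=0.3, n=200_000, seed=1):
    """Return the frequency in the full sequence and in the selected one."""
    rng = random.Random(seed)
    sample = [1 if rng.random() < p else 0 for _ in range(n)]
    return relative_frequency(sample), relative_frequency(select_after_success(sample))

full, selected = demo()
print(full, selected)
```

Both frequencies should lie close to p: the selection rule changes which terms are counted, but not their limiting frequency.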
At the same time, this shows how to improve the theorem in a trivial
way. For this we have only to replace the requirement of normality by any
stricter assumption which also holds “almost surely”. For example, among
all admissible kollektivs there are some where the relative frequency of a set
γ ∈ Φ converges to p(γ) but stays always ≥ p(γ). It is, however, known that
for almost all points of Π the corresponding deviations oscillate about 0, and
one also has quite precise estimates for this. If we replace the requirement of
normality by a more stringent one in this vein, then it is still possible to adapt
the following proof and, doing so, to give a “more strict” or “more precise”
notion of a kollektiv. It is, of course, a mere matter of taste which assumptions
are required to hold for a kollektiv or are thought to be “indispensable” for
the foundations of probability theory6 , respectively. In the sequel I will not
wards the limit p(γ1 )p(γ2 ) . . . p(γN ). As one can see, Copeland’s assumptions are weaker
than Wald’s, even if we admit for the latter only such selection rules which depend only on
boundedly many of the Pi .
6 Often, in particular by Fréchet [2], it is pointed out that von Mises’ notion of kollektiv
excludes, a priori, only certain events of probability zero from the consideration in an arbi-
trary way. At least, this is done by von Mises in a canonical way since his starting point is
the description of the most elementary experience, i.e. the experience of the excluded gam-
bling system. Wald and Copeland, however, completely alter the system of selection rules,
and it is not mentioned at all, which events should actually be excluded from the theory.
\[
|a_{n,k}| \le \binom{n}{k}\,\{p(\gamma)\}^{k}\,\{1 - p(\gamma)\}^{n-k} \tag{6}
\]
holds.
¶ The proof is by induction. If n = 1, let β1t denote the set of points ¶ 93
for which r1 = t. By at1,1 we denote the subset of β1t such that Pt = Pr1 ∈
γ. Thus, β1t is the intersection of those (t − 2) sets, where fi = 0 for i =
1, 2, . . . , t − 2 and the set where ft−1 = 1, which means that β1t is measurable.
If Q = (Q1 , Q2 , . . .) is contained in β1t , then β1t contains all points of the form
(Q1 , . . . , Qt−1 , X1 , X2 , . . .) and we see that at1,1 is the intersection of β1t with
the rank-t cylinder set {π × π × . . . × π × γ}. Thus, we have |at1,1 | = |β1t | · p(γ),
Therefore, I think that these theories, even regarding their content, are only loosely con-
nected with von Mises’ system.
7 In this connection one may refer to the important and much deeper irregularity consider-
ations which are connected with the ergodic theorems of statistical mechanics; cf. E. Hopf [3].
8 If A = A0 (cf. 2) one has, of course, equality in (6). In general, this is not true, simply
because of those points for which the sequence ri terminates. The binomial law (6) does,
in general, not even give the relative magnitude of the |an,k | among themselves, and any
single selection function may well strongly favour certain sets.
pn ({Qi }) = p(ψi );
¶ Literature. ¶ 95
A. H. Copeland:2
[1] Admissible numbers in the theory of probability. Amer. Journ. Math. 50
(1928). pp. 535–552.l
9 For the original approach of von Mises (where set-theoretic existence questions do not
play any role) we refer to von Mises: Wahrscheinlichkeitsrechnung. Leipzig and Vienna,
1931.j
Related to the notion of kollektiv, but of essentially different kind is Tornier’s interpre-
tation of probabilities using frequencies of matrices. For this we refer to E. Tornier: Wahr-
i In the original text: selection function
l Page numbers missing in the original text.
j There seems to be no English translation, but Hilda Geiringer’s edition of von Mises’
lectures: Mathematical theory of probability and statistics. Academic Press, New York and
London 1964, may serve as a substitute. Note, however, that the 1964 lectures include
many developments since 1931; in particular the treatment of kollektivs differs from the
monograph of 1931. For the history and role of this book, see also the lucid reviews by J.L.
Doob in Mathematical Reviews [MR0178486 (31 #2743)] and R. Theodorescu in Zentralblatt
[Zbl 0132.12303].
R. v. Mises:
H. Reichenbach:
E. Tornier:
[11] Bemerkungen zu der Arbeit von Herrn Iglisch: Zum Aufbau der Wahr-
scheinlichkeitsrechnung. Math. Ann. 108 (1933), pp. 319–320.
J. A. Ville:
[12] Sur la notion de collectif. C. R. Acad. Sci. Paris, 203 (1936), pp. 26–27.
¶ 96 ¶ A. Wald:10
theory organized 1937 by the University of Geneva will soon appear in the Actualités Sci-
entifiques, Hermann, Paris. m
m A. Wald: Die Widerspruchsfreiheit des Kollektivbegriffes (The Consistency of the No-
tion of Kollektiv) In: P. Cantelli, W. Feller, M. Fréchet, R. de Misès, J.F. Steffensen et A.
Wald: Colloque consacré à la Théorie des Probabilités. Deuxième Partie: Les Fondements
du Calcul des Probabilités. Hermann, Actualités Scientifiques et Industrielles 735, Paris
1938, pp. 79–99.
b) Further literature.
J. L. Doob:
[1] Note on Probability. Ann. of Math. 37 (1936), pp. 363–367.
M. Fréchet:
[2] Recherches théoriques modernes sur le calcul des probabilités. (in Borel’s
Traitéo 1, Fasc. 3), livre 1, Paris 1937
E. Hopf:
[3] On causality, statistics and probability. Journ. Math. Phys. of Massachusetts
Inst. Techn. 13 (1934), pp. 50–102.
A. Kolmogoroff:
[5] Sur la théorie de la mesure dans les espaces combinatoires et son application
au calcul des probabilités I. Fund. Math. 23 (1934), pp. 237–278.
H. Steinhaus:
[6] Les probabilités dénombrables et leur rapport à la théorie de la mesure.
Fund. Math. 4 (1923), pp. 286–310.
n Reprinted (with English commentaries) in: E. Dierker and K. Sigmund (eds.): Karl
Villars, Paris 1924–1965, some volumes – including Fréchet’s contribution – have second
editions.
p English translation: A. N. Kolmogorov: Foundations of the theory of probability. Trans-
Alternatively one may say that the probability of exactly k events during any
time interval of length t is given by the Poisson law
Typeset by René L. Schilling. The symbol ¶ indicates a page break in the original text,
and the original pagination is indicated in the margin. Originally underlined text appears
in italics.
1 Problems concerning coincidences in several counters are more delicate and mostly
unsolved.
2 The assumption that a is constant is a great simplification which is usually introduced.
The method outlined in the sequel applies directly also if a depends on the number of
(3) Sk = T0 + T1 + . . . + Tk−1
The actual integrations, as we shall see, need not be performed. From (4) we
see that
As will be seen later on, for Type I counters Fk (t) can be written down im-
mediately, and even Type II requires only routine computations. However,
even if this were not so, the central limit theorem would give us a satisfactory
approximation to Fk (t), and therefore, both to pk and M (t).
It is preferable to use the operational calculus which enables us to describe
the asymptotic behavior of M (t) even in more general cases where an exact
formula is difficult to obtain. We introduce the Laplace transforms
\[
\varphi(s) = \int_0^\infty e^{-st}\,dF(t), \qquad
\varphi_k(s) = \int_0^\infty e^{-st}\,dF_k(t), \tag{9}
\]
and
\[
\mu(s) = \int_0^\infty M(t)\,e^{-st}\,dt. \tag{10}
\]
Then m is the average time between two registrations and σ 2 the corresponding
variance. From (9) we see that
m0 is the average of T0 (in the case (2) we have m0 = 1/a). Substituting from
¶ 110 (14) and (15) into (12) we have ¶
\[
\mu(s) = \frac{1}{m s^2}
+ \left\{\frac{1}{2} + \frac{\sigma^2}{2m^2} - \frac{m_0}{m}\right\}\frac{1}{s} + \cdots \tag{16}
\]
This equation permits the immediate ‘Tauberian’ conclusion that the expected
number of registrations, M (t), satisfies the asymptotic relation6
\[
M(t) \sim \frac{t}{m} + \frac{1}{2} + \frac{1}{2}\,\frac{\sigma^2}{m^2} - \frac{m_0}{m} + o(t). \tag{17}
\]
It may also be remarked that M (t) satisfies the integral equation
\[
M(t) = F_0(t) + \int_0^t M(t-x)\,dF(x); \tag{18}
\]
5 The factor s in the denominator is due to the fact that M (t) appears in (9) as coun-
the present case not difficult. It may be remarked that the integral equations (18) and (3)
\[
D(t) = 2M(t) + \int_0^t D(t-x)\,dF(x). \tag{23}
\]
Here F (t) differs from F0 (t) only by a change of origin. Now, if the integrations
(6) were applied to F0 (t) instead of to F (t), the derivative of Fn (t) would be
given by the Poisson expression (2). We conclude that
\[
\frac{d}{dt}F_k(t) =
\begin{cases}
0 & \text{if } t \le k\tau,\\[1mm]
\dfrac{a^{k+1}\{t-k\tau\}^{k}}{k!}\,e^{-a(t-k\tau)} & \text{if } t \ge k\tau.
\end{cases} \tag{25}
\]
The same result follows operationally from
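\[
\begin{aligned}
M(t) &= \left[\frac{t}{\tau}\right] + 1
 - \sum_{\nu=0}^{[t/\tau]} \sum_{\rho=0}^{\nu}
   \frac{a^{\rho}(t-\nu\tau)^{\rho}}{\rho!}\,e^{-a(t-\nu\tau)}\\
&= \sum_{\nu=0}^{[t/\tau]} \sum_{\rho=\nu+1}^{\infty}
   \frac{a^{\rho}(t-\nu\tau)^{\rho}}{\rho!}\,e^{-a(t-\nu\tau)}.
\end{aligned} \tag{28}
\]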
Here [t/τ ] stands for the greatest integer not exceeding t/τ . Either of the
expressions (28) is simpler than the one obtained by Kurbatov and Mann.
¶ 112 (After performing the integrations the latter can ¶ be reduced to (28), however.
The identity of Gnedenko’s solution with (28) is more hidden).
Finally (17) and (22) provide the asymptotic expansions for the Type I :
\[
M(t) \sim \frac{at}{1+a\tau} + \frac{a^2\tau^2}{2(1+a\tau)^2} \tag{29}
\]
\[
B(t) \sim \frac{at}{(1+a\tau)^3}. \tag{30}
\]
The leading term in (29) is familiar. The case of Levert and Scheen corresponds
approximately to τ x = .08 sec., a > 3. Then B/M ≈ 2/3 instead of 1 as would
be the case with ‘random events.’
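Formulas (28) and (29) are easy to compare numerically. The following Python sketch is an illustration added here; the function names and parameter values are assumptions. It evaluates the first expression in (28) term by term and sets it against the asymptotic value (29).

```python
import math

def m_exact(t, a, tau):
    """Formula (28): [t/tau] + 1 minus the double sum, summed over nu."""
    total = 0.0
    for nu in range(int(math.floor(t / tau)) + 1):
        x = a * max(0.0, t - nu * tau)   # guard against tiny negative rounding
        term, partial = 1.0, 1.0         # term = x^rho / rho!, starting at rho = 0
        for rho in range(1, nu + 1):
            term *= x / rho
            partial += term
        total += 1.0 - math.exp(-x) * partial
    return total

def m_asymptotic(t, a, tau):
    """Formula (29)."""
    return a * t / (1 + a * tau) + (a * tau) ** 2 / (2 * (1 + a * tau) ** 2)

# Illustrative values only: a = 3 events per second, tau = 0.08 seconds.
for time in (0.05, 0.2, 1.0, 5.0):
    print(time, m_exact(time, 3.0, 0.08), m_asymptotic(time, 3.0, 0.08))
```

With these values the two expressions agree to many digits already at t = 5, and (29), (30) give B/M ≈ 1/(1 + aτ)² ≈ 0.65 for large t.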
4. Numerical Estimates. We shall now derive limits for the error M (t) −
at/(1 + aτ ) by a method which is of much wider applicability. For the Type I
the integral equation (18) reduces for t > τ to
\[
M(t) = 1 - e^{-at} + \int_\tau^t M(t-x)\,a\,e^{-a(x-\tau)}\,dx; \tag{31}
\]
for 0 < t < τ the integral is naturally to be replaced by zero. Consider now the
more general equation
\[
A(t) = H(t) + \int_\tau^t A(t-x)\,a\,e^{-a(x-\tau)}\,dx. \tag{32}
\]
\[
H(t) =
\begin{cases}
\dfrac{at}{1+a\tau} & \text{for } 0 < t < \tau,\\[2mm]
1 - \dfrac{e^{-a(t-\tau)}}{1+a\tau} & \text{for } t > \tau,
\end{cases} \tag{35}
\]
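and a simple computation shows that (34) holds. On the other hand, with
C = a²τ²/(2(1 + aτ)) we get
\[
H(t) =
\begin{cases}
\dfrac{at}{1+a\tau} + \dfrac{a^2\tau^2}{2(1+a\tau)} & \text{for } t < \tau,\\[2mm]
1 - \left(1 - \dfrac{a^2\tau^2}{2}\right)\dfrac{e^{-a(t-\tau)}}{1+a\tau} & \text{for } t > \tau,
\end{cases} \tag{36}
\]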
and (34) holds with the inequality reversed. Therefore, for Type I and all t
\[
0 \le M(t) - \frac{at}{1+a\tau} \le \frac{a^2\tau^2}{2(1+a\tau)}. \tag{37}
\]
This estimate is both simpler and sharper than that of Kurbatov and Mann
(1945) and also than the improved estimate of Mann (1946).
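The computation behind the two choices of H(t) can be written out. The sketch below assumes that the trial function inserted into (32) is A(t) = at/(1 + aτ) + C with a constant C; the passage defining it is not reproduced above, so this is our reading. For t > τ, with u = x − τ and s = t − τ:

```latex
\begin{aligned}
\int_\tau^t A(t-x)\,a e^{-a(x-\tau)}\,dx
 &= \int_0^{s}\Bigl[\frac{a(s-u)}{1+a\tau}+C\Bigr]\,a e^{-au}\,du
  = \frac{as-1+e^{-as}}{1+a\tau} + C\bigl(1-e^{-as}\bigr),\\
H(t) &= A(t) - \int_\tau^t A(t-x)\,a e^{-a(x-\tau)}\,dx
  = 1 - \bigl\{1 - C(1+a\tau)\bigr\}\,\frac{e^{-a(t-\tau)}}{1+a\tau}.
\end{aligned}
```

Under this assumption, C = 0 reproduces the case t > τ of (35), and C = a²τ²/(2(1 + aτ)) reproduces that of (36), since then C(1 + aτ) = a²τ²/2. Comparing either H(t) with the inhomogeneous term 1 − e^{−at} = 1 − e^{−aτ} e^{−a(t−τ)} of (31) reduces to comparing {1 − C(1 + aτ)}/(1 + aτ) with e^{−aτ}.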
The probability that, once the counter is locked, exactly ν events will prolong
the dead interval is q ν p. Now let T (i) be the time elapsed between the events
number i − 1 and i. The total locked time is T = T (1) + . . . + T (ν) . It is again a
sum of random variables but their number is itself a random variable. Clearly
\[
U(t) = \frac{1}{q}\bigl(1 - e^{-at}\bigr), \qquad 0 < t < \tau \tag{39}
\]
Now the time Tk between the (k − 1)st and the k-th registration is composed
of a time T (distributed according to (42)), and the time from the moment the
counter is again set free to the next event. The latter time is a random variable
distributed according to (1) with the Laplace transform a/(a + s). Therefore, the
Laplace transform φ(s) of the distribution F (t) of Tk is given by
\[
\varphi(s) = \frac{a e^{-(a+s)\tau}}{s + a e^{-(a+s)\tau}}. \tag{43}
\]
¶ 115 ¶ Substituting into (12) we obtain finally
\[
\mu(s) = \frac{a}{s^2(a+s)}\left\{s + a e^{-(a+s)\tau}\right\}. \tag{44}
\]
This, of course, is the familiar operational image of
\[
M(t) =
\begin{cases}
1 - e^{-at} & \text{for } t \le \tau,\\
1 - e^{-a\tau} + a e^{-a\tau}(t-\tau) & \text{for } t \ge \tau.
\end{cases} \tag{45}
\]
This is the exact form for the average number of registrations for counters of
Type II. The variance B(t) is obtained in a similar way substituting from (43)
into (2). For t > 2τ we get
\[
B(t) = a e^{-a\tau}(t-\tau)\left\{1 - 2a\tau e^{-a\tau}\right\} - e^{-a\tau} + (1+a\tau)^2 e^{-2a\tau}, \tag{46}
\]
in accordance with Kosten (1943).
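Formulas (45) and (46) can be checked against a direct simulation. The Python sketch below is an illustration added here. It assumes that events arrive as a Poisson process of rate a, that every event, registered or not, locks the counter for a further time τ, and that the counter is free at t = 0; function names, parameter values and the seed are hypothetical.

```python
import math
import random

def simulate_type2_count(t, a, tau, rng):
    """Registrations in [0, t] when every event prolongs the dead time."""
    clock, last_event, count = 0.0, None, 0
    while True:
        clock += rng.expovariate(a)      # waiting time to the next event
        if clock > t:
            return count
        if last_event is None or clock - last_event > tau:
            count += 1                   # counter was free: event registered
        last_event = clock               # any event restarts the dead time

def mean_formula(t, a, tau):
    """Formula (45)."""
    if t <= tau:
        return 1.0 - math.exp(-a * t)
    return 1.0 - math.exp(-a * tau) + a * math.exp(-a * tau) * (t - tau)

def variance_formula(t, a, tau):
    """Formula (46), stated for t > 2*tau."""
    p = math.exp(-a * tau)
    return a * p * (t - tau) * (1 - 2 * a * tau * p) - p + (1 + a * tau) ** 2 * p * p

def estimate(t=5.0, a=3.0, tau=0.08, runs=20_000, seed=7):
    """Sample mean and sample variance of the number of registrations."""
    rng = random.Random(seed)
    counts = [simulate_type2_count(t, a, tau, rng) for _ in range(runs)]
    mean = sum(counts) / runs
    var = sum((c - mean) ** 2 for c in counts) / (runs - 1)
    return mean, var

print(estimate(), mean_formula(5.0, 3.0, 0.08), variance_formula(5.0, 3.0, 0.08))
```

The sample mean and variance should agree with (45) and (46) up to Monte Carlo error, which shrinks roughly like the inverse square root of the number of runs.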
J.D. Kurbatov and B.H. Mann (1945): Correction of G-M counter data.
Phys. Rev. 68, pp. 40–43.
Feller's approach to probability emphasized the unification of frequency and measure through axiomatic methods, contrasting with Kolmogorov's analytical focus on foundational aspects of probability. Feller extended probability by applying measure theory in more generalized Baire spaces and explored applications beyond the confines Kolmogorov set, such as in his treatment of stochastic processes involving boundary behavior. Kolmogorov provided more formal mathematical definitions and focused on foundational work like the unconditional set theory treatment, which influenced probability axiomatization broadly.
Feller significantly influenced mathematical biology by applying his mathematical expertise to biological problems, notably in collaboration with Theodosius Dobzhansky, a key figure in evolutionary synthesis. His work on genetic and evolutionary models, like the critique of the Haldane paradox, provided mathematical clarity and innovative solutions that extended the quantitative methods available for studying evolutionary dynamics. This interdisciplinary approach helped bridge gaps between theoretical biology and mathematical modeling, broadening the scope and precision of biological research.
William Feller addressed the Haldane paradox, which posited that the cost of 'genetic deaths' during evolution by natural selection could slow down evolutionary change significantly. He showed that the paradox was based on spurious assumptions, particularly the unrealistic assumption of constant population size, which he had critiqued as early as 1952. Through his work, Feller provided a mathematical framework to demonstrate that evolutionary change could occur more rapidly than the paradox suggested by accommodating varying population sizes.
William Feller demonstrated persistence and a long-term commitment to problem-solving, often revisiting and refining problems he had started years earlier. He was known for not abandoning problems regardless of the completeness of existing solutions, which is evident in his dedication to enhancing the understanding of mathematical theories over decades. This approach highlights his thoroughness and commitment to achieving clarity and generality in mathematical concepts.
Feller's contributions to one-dimensional diffusion theory extended beyond solving specific problems to establishing a generalized framework for understanding stochastic processes. His work formulated conditions under which diffusion processes could be described, particularly in treating boundary behaviors and elaborations on Kolmogorov's earlier work. Feller's insights into diffusion processes have had lasting impacts on both theoretical frameworks and practical applications in fields such as physics, biology, and finance.
Feller's personality, described as ebullient and humorous, had a significant impact on his teaching and research approaches. His lively and fast-paced lecturing style, characterized by 'proof by intimidation,' made complex mathematical concepts engaging and memorable despite not always being complete or correct. His readiness to discuss and debate, even at the cost of being wrong, highlighted his preference for intellectual engagement over mere correctness, which fostered an inquisitive and dynamic learning environment.
Feller's academic journey and collaborations are reflected in his publications that often resulted from dialogues with leading figures of various fields. For instance, his association with Theodosius Dobzhansky influenced his work in evolutionary theory, and his collaboration efforts are evident in publications like [Feller 1966b], which denote institutional affiliations and interdisciplinary interests. His evolving academic roles, from Rockefeller Institute to Princeton University, are also mirrored in his increasingly diverse research focus and publications.
Feller introduced a methodology in treating boundary conditions in stochastic processes by conceptualizing them as restrictions on the domain of operators that served as infinitesimal generators of appropriate semigroups. This approach allowed for a more nuanced understanding of the behavior of paths in Markov processes and generalized the use of differential operators within diffusion theory, revealing deep connections between operator theory and stochastic processes.
Feller's critique of constant population size assumptions significantly influenced modern evolutionary biology by identifying and addressing key limitations in the existing models of evolution, such as the Haldane paradox. By advocating for models that incorporate variable population sizes, his work facilitated the development of more robust frameworks that better account for real-world complexities in evolutionary dynamics. This shift has enabled more accurate predictions and analyses in evolutionary studies, supporting further advancements in the field.
Feller's understanding of the law of the iterated logarithm was enhanced through his ongoing effort to simplify and generalize complex probabilistic results. His paper "General analogues of the law of the iterated logarithm" surpassed his earlier work by achieving simplicity and generality after decades of refining his approach. By addressing foundational assumptions and expanding the conditions for its application, he made the theorem more accessible and useful, contributing to its application across various mathematical and scientific fields.