Estimating a Distribution Function with Truncated Data

Michael Woodroofe

The Annals of Statistics, Vol. 13, No. 1. (Mar., 1985), pp. 163-177.

Stable URL:
http://links.jstor.org/sici?sici=0090-5364%28198503%2913%3A1%3C163%3AEADFWT%3E2.0.CO%3B2-3

The Annals of Statistics is currently published by Institute of Mathematical Statistics.

ESTIMATING A DISTRIBUTION FUNCTION WITH TRUNCATED DATA

The University of Michigan and Rutgers University


Let $\mathscr{P}$ be a finite population with $N \ge 1$ elements; for each $e \in \mathscr{P}$ let $X_e$
and $Y_e$ be independent, positive random variables with unknown distribution
functions $F$ and $G$; and suppose that the pairs $(X_e, Y_e)$ are i.i.d. We consider
the problem of estimating $F$, $G$, and $N$ when the data consist of those pairs
$(X_e, Y_e)$ for which $e \in \mathscr{P}$ and $Y_e \le X_e$. The nonparametric maximum
likelihood estimators (MLEs) of $F$ and $G$ are described; and their asymptotic
properties as $N \to \infty$ are derived. It is shown that the MLEs are consistent
against pairs $(F, G)$ for which $F$ and $G$ are continuous, $G^{-1}(0) \le F^{-1}(0)$, and
$G^{-1}(1) \le F^{-1}(1)$. $\sqrt{N} \times$ estimation error for $F$ converges in distribution to a
Gaussian process if $\int_0^\infty (1/G)\,dF < \infty$, but may fail to converge if this integral
is infinite.

1. Introduction. Consider a finite population $\mathscr{P}$ whose size $N$ is large, but
otherwise unknown. For each element $e \in \mathscr{P}$ let $X_e$ and $Y_e$ denote independent,
positive random variables with distribution functions $F$ and $G$, say; and suppose
that $(X_e, Y_e)$, $e \in \mathscr{P}$, are i.i.d., as $(X, Y)$, say. Finally, suppose that one observes
(only) those pairs $(X_e, Y_e)$ for which $Y_e \le X_e$, but not the labels $e \in \mathscr{P}$. The
problem considered is that of estimating $F$, $G$, and $N$. Nonparametric maximum
likelihood estimators (MLEs) of $F$ and $G$, described in (8) and (9) below, have
been derived by several authors, listed below, from different perspectives. Here
the asymptotic properties of the estimators are studied, and still another
derivation is suggested.
This model arises in astronomy. The absolute and apparent luminosities of an
astronomical object are defined to be its brightness at a fixed distance and as
observed on earth; and magnitude is defined to be the negative logarithm of
luminosity. In some models, the redshift $z$ and the absolute magnitude $M$ of
astronomical objects are assumed to be independent random variables which are
related to the apparent magnitude $m$ by the equation
(1) $m = f(z) + M$,
where $f$ is a known function, or at least a nearly known one. For example,
Hubble's Law specifies that $f(z) = 5 \log z$, and Segal's Chronometric Theory
specifies that $f(z) = (5/2)\log[z/(1 + z)]$. See Segal (1975). Of course, one can
only detect objects which are sufficiently bright, say $m \le m^*$. Then, letting
$X = \exp[-f(z)]$ and $Y = \exp[M - m^*]$ yields the model described above, since
$m \le m^*$ if and only if $Y \le X$.

Received June 1983; revised March 1984.
¹ Presented at the Jack Kiefer-Jacob Wolfowitz Memorial Statistical Research Conference;
dedicated to their memories.
Research supported by the National Science Foundation under MSC-8101897.
AMS 1980 subject classifications. Primary 62F20; secondary 62G05.
Key words and phrases. Nonparametric, maximum likelihood estimation, consistency, asymptotic
distributions.

In other applications, the $X_e$ may be the sizes of hidden objects for which one
searches for one unit of time and $T_e = Y_e/X_e$ might be the time at which one
would find the object $e$, if the search were continued indefinitely. Then the
conditional probability of finding object $e$ given $X_e$ is $G(X_e)$, an unknown but
increasing function of $X_e$. For example, Barouch and Kaufman (1975) have
described models for exploring for petroleum reserves in which the probability of
finding a given pool is proportional to the pool's size. Letting $X$ denote a pool's
size and $T$ denote the time at which it would be found in an infinite search yields
a model which is closely related to Barouch and Kaufman's (1975).
Starr (1974), Starr, Wardrop, and Woodroofe (1976), and Kramer (1983) have
considered a class of optimal stopping problems in which one searches for hidden
objects and receives a reward depending on the objects found, say the sum of
their sizes, less a cost of sampling. Assuming a known stochastic model and
certain other conditions, these authors obtain explicit solutions to the optimal
stopping problem. In addition, they propose adaptive procedures for use when
the total number of objects N is unknown. The estimators studied here may
allow implementation of adaptive procedures in which other quantities, like F,
are estimated sequentially.
Nonparametric MLEs of $F$ and $G$ were derived by Lynden-Bell (1971), who
described another application to astronomy. See also Jackson (1974). Nicoll and
Segal (1980) derive the MLEs for grouped data; and Bhattacharya, Chernoff, and
Yang (1983) derived MLEs from a conditional likelihood function of certain
counts, given the observed $X$-values. The latter paper also computes the
information matrix for its model. Bhattacharya et al. (1983) construct nonparametric
estimators of regression parameters in models like (1), and show asymptotic
normality of estimation error, properly normalized; and Bhattacharya (1983)
considers the asymptotic distribution of a goodness of fit statistic with a view
towards testing hypotheses about regression parameters. None of these papers
gives conditions for the consistency and asymptotic normality of the MLEs of $F$
and $G$, however.
Here asymptotic properties of these estimators are studied as $N \to \infty$. In
Section 2, the conditional distributions of $X$ and $Y$ given $Y \le X$ are related to
the unconditional distributions $F$ and $G$. The estimators are described in Section
3. Section 4 considers consistency; if $F$ and $G$ are continuous and if the lower
and upper endpoints of the convex support of $G$ are individually less than or
equal to those of $F$, then the estimators converge to the true distribution functions
$F$ and $G$ in probability as $N \to \infty$. Sections 5 and 6 consider normalized estimation
error for the distribution functions. Here $\sqrt{n} \times$ estimation error for $F$ converges
in distribution to a Gaussian process if $\int_0^\infty (1/G)\,dF < \infty$; but the asymptotic
variance may be infinite if this integral diverges.
There is some similarity between the estimators studied here and the estimator
of Kaplan and Meier (1958), and hence with the asymptotic results of Breslow
and Crowley (1974). There are also differences. The Kaplan-Meier estimator
would be appropriate if $X_i \wedge Y_i = \min(X_i, Y_i)$ and $\delta_i = I\{X_i \le Y_i\}$ were observed
for $1 \le i \le N$; here both $X_i$ and $Y_i$ are observed if $Y_i \le X_i$, and nothing is
observed otherwise. In terms of the asymptotic distributions, this difference leads
to the possibility of an infinite variance for $\sqrt{N} \times$ estimation error.
There is also some similarity with recent results of Vardi (1982a, 1982b). He
considers generalizations of our model when $G$ is known, and obtains both
nonparametric MLEs and asymptotic distributions.

2. A Transformation. Let $X$ and $Y$ denote independent, positive random
variables with distribution functions $F$ and $G$, taken to be continuous from the
right. Let $H_*$ denote the joint distribution function of $X$ and $Y$ given $Y \le X$; and
let $F_*$ and $G_*$ denote the marginal distribution functions of $X$ and $Y$ given
$Y \le X$. Thus,
(2) $H_*(x, y) = \alpha^{-1} \int_0^x G(y \wedge z)\,dF(z)$,
$F_*(x) = H_*(x, \infty)$ and $G_*(y) = H_*(\infty, y)$, $0 \le x, y < \infty$,
where $\alpha = \int_0^\infty G(z)\,dF(z) = \int_0^\infty [1 - F(z-)]\,dG(z)$ is assumed to be positive. Here
$y \wedge z$ denotes the minimum of $y$ and $z$ for $0 \le y, z < \infty$; $F(z-) = P\{X < z\}$ for
$z \ge 0$; and $\int_a^b = \int_{(a,b]}$ for $0 \le a < b \le \infty$. There is little hope of finding consistent
estimators of $F$ and $G$ from the data described in the introduction, unless $F_*$ and
$G_*$ determine $F$ and $G$. So, this question is investigated first.
If $K$ is any distribution function on $[0, \infty)$, let
$a_K = \inf\{z: K(z) > 0\}$
and
$b_K = \sup\{z: K(z) < 1\}$,
so that $(a_K, b_K)$ is the interior of the convex support of $K$. Then $\alpha > 0$ in (2) if
$a_G < b_F$, and $\alpha = 0$ unless $a_G \le b_F$. If $\alpha > 0$ and if $F_*$ and $G_*$ are related to $F$ and
$G$ by (2), then $a_{F_*} = \max\{a_F, a_G\}$, $b_{F_*} = b_F$, $a_{G_*} = a_G$, and $b_{G_*} = \min(b_F, b_G)$. In
addition, it is convenient to have the following notation: let
$\mathscr{K} = \{(F, G): F(0) = 0 = G(0),\ \alpha(F, G) > 0\}$,
$\mathscr{K}_0 = \{(F, G) \in \mathscr{K}: a_G \le a_F,\ b_G \le b_F\}$,
$T(F, G) = H_*$, $(F, G) \in \mathscr{K}$.
LEMMA 1. (i) Let $(F, G) \in \mathscr{K}$ and let $F_0$ and $G_0$ denote the conditional
distributions of $X$ and $Y$ given $X \ge a_G$ and $Y \le b_F$. Then $(F_0, G_0) \in \mathscr{K}_0$ and
$T(F_0, G_0) = T(F, G)$;
(ii) $T(\mathscr{K}) = T(\mathscr{K}_0)$.

PROOF. Since $Y \le X$ implies $X \ge a_G$ and $Y \le b_F$ w.p.1, $T(F, G) = T(F_0, G_0)$.
To see that $a_{G_0} \le a_{F_0}$, observe that $a_{G_0} = a_G$, since $(F, G) \in \mathscr{K}$, and that
$a_{F_0} = \max(a_F, a_G) \ge a_G = a_{G_0}$. A similar argument shows that $b_{G_0} \le b_{F_0}$ to complete
the proof of (i). Assertion (ii) then follows since $\mathscr{K}_0 \subset \mathscr{K}$.

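Recall that the cumulative hazard function of a distribution function $F$ (with
$F(0) = 0$) is defined by

$\Lambda(x) = \int_0^x dF(z)/[1 - F(z-)]$, $0 \le x < \infty$.

The cumulative hazard function $\Lambda$ uniquely determines the distribution $F$ by
the following algorithm; let $D$ denote the set of $x$ for which $0 \le x < b_F$ and
$\lambda(x) = \Lambda(x) - \Lambda(x-) > 0$; then

(3) $1 - F(x) = \prod_{z \in D,\, z \le x} [1 - \lambda(z)] \cdot \exp\{-\Lambda_c(x)\}$,

where $\Lambda_c(x) = \Lambda(x) - \sum_{z \in D,\, z \le x} \lambda(z)$, $0 \le x < b_F$.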
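
As a small numerical illustration of (3), the following Python sketch treats the purely discrete case, in which the continuous part $\Lambda_c$ vanishes and only the product over jumps remains. The function names and the three-point distribution are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def hazard_jumps(masses):
    """Jumps lambda(z) = dF(z) / [1 - F(z-)] of the cumulative hazard.

    `masses` is a list of (point, probability) pairs sorted by point.
    """
    survival_before = Fraction(1)          # 1 - F(z-)
    jumps = []
    for point, p in masses:
        jumps.append((point, Fraction(p) / survival_before))
        survival_before -= Fraction(p)
    return jumps

def cdf_from_hazard(jumps, x):
    """F(x) = 1 - prod_{z <= x} [1 - lambda(z)]; an empty product is one."""
    survival = Fraction(1)
    for point, lam in jumps:
        if point <= x:
            survival *= 1 - lam
    return 1 - survival

# Hypothetical distribution with masses 1/5, 3/10, 1/2 at the points 1, 2, 4.
masses = [(1, Fraction(1, 5)), (2, Fraction(3, 10)), (4, Fraction(1, 2))]
jumps = hazard_jumps(masses)
```

Exact rational arithmetic is used so that the round trip from $F$ to $\lambda$ and back to $F$ can be checked without rounding error.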

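THEOREM 1. Suppose that $H_* \in T(\mathscr{K})$. Then there is a unique pair
$(F, G) \in \mathscr{K}_0$ for which $T(F, G) = H_*$. Here the pair $(F, G)$ is determined by the
conditions

(4) $\Lambda(x) = \int_0^x dF_*(z)/C(z)$, $0 \le x < \infty$,

and

$\int_x^\infty dG(z)/G(z) = \int_x^\infty dG_*(z)/C(z)$, $0 < x < \infty$,

where
$C(z) = G_*(z) - F_*(z-)$, $0 \le z < \infty$.

PROOF. By the lemma, there is at least one pair $(F, G) \in \mathscr{K}_0$ for which
$T(F, G) = H_*$. It is shown below that (4) holds for any such pair, and it then
follows that there is only one such pair, by (3) applied to $F$ and $G_1$, where
$G_1(z) = 1 - G(1/z-)$, $z > 0$. The proof of (4) depends on the simple identity
$C(z) = \alpha^{-1} G(z)[1 - F(z-)]$ for $z \ge 0$, which may be derived as follows:

$C(z) = \alpha^{-1}\left[\int_0^\infty G(y \wedge z)\,dF(y) - \int_{[0,z)} G(y)\,dF(y)\right] = \alpha^{-1} \int_{[z,\infty)} G(z)\,dF(y) = \alpha^{-1} G(z)[1 - F(z-)]$

for $0 \le z < \infty$. Since $a_G \le a_F$, it follows easily that

$\int_0^x dF_*(z)/C(z) = \int_{a_F}^x G(z)\,dF(z)/\alpha C(z)$

for all $x \ge a_F$; and both sides vanish for $x < a_F$. By the identity, the integrand on
the right is $dF(z)/[1 - F(z-)]$, so the right side is $\Lambda(x)$. This establishes the first
assertion in (4) and the second may be established similarly.
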
COROLLARY 1. Let $(F, G) \in \mathscr{K}$ and let $F_0$ and $G_0$ be the conditional
distributions of $X$ and $Y$ given $X \ge a_G$ and $Y \le b_F$, as in Lemma 1. Then $(F_0, G_0)$ is the
only pair in $\mathscr{K}_0$ for which $T(F_0, G_0) = T(F, G)$.

COROLLARY 2. Let $T_0$ denote the restriction of $T$ to $\mathscr{K}_0$. Then $T_0$ has an
inverse function.

PROOFS. Lemma 1 asserts that $(F_0, G_0) \in \mathscr{K}_0$ and $T(F_0, G_0) = T(F, G)$; and
the theorem asserts that there is only one such pair. This establishes the first
corollary. The second then follows, since $(F_0, G_0) = (F, G)$ when $(F, G) \in \mathscr{K}_0$.

REMARKS 1. The inversion formula of Theorem 1 uses only the marginal
distributions of $H_*$.

2. Let $\mathscr{F}$ denote the class of all distribution functions on $[0, \infty)$. Endow $\mathscr{F}$
with its weak topology; endow $\mathscr{F} \times \mathscr{F}$ with the product topology; and endow
$\mathscr{K}$, $\mathscr{K}_0$, and $T(\mathscr{K})$ with their relative topologies. Then $T$ is easily seen to be
continuous at all $(F, G) \in \mathscr{K}$ which have no common points of discontinuity.
However, the inverse transformation to $T_0$ is not continuous. To see this let $F$
and $G$ be continuous distribution functions with support $[0, \infty)$; and let $G_n =
(G + \delta_n)/2$, where $\delta_n$ denotes the point mass at $n$ for $n \ge 1$. Then $T(F, G_n) \to
T(F, G)$ as $n \to \infty$, but $G_n$ does not converge to $G$.

3. Estimation. Now let $F$ and $G$ denote distribution functions for which
$(F, G) \in \mathscr{K}$; let $X$ and $Y$ denote independent random variables with
distribution functions $F$ and $G$; and let $(X_1, Y_1), \ldots, (X_N, Y_N)$ be i.i.d. as $(X, Y)$. As
in the introduction, suppose that one observes only those pairs $(X_i, Y_i)$ for which
$i \le N$ and $Y_i \le X_i$. Suppose that there is at least one such pair, and let
$(x_1, y_1), \ldots, (x_n, y_n)$ denote these pairs, so labeled that $(x_1, y_1), \ldots, (x_n, y_n)$ are
conditionally i.i.d. given $n$.
To describe the estimators of $F$ and $G$, let $F_n^*$ and $G_n^*$ denote the empirical
distribution functions of $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$,
(5) $F_n^*(z) = (1/n)\,\#\{i \le n: x_i \le z\}$,
    $G_n^*(z) = (1/n)\,\#\{j \le n: y_j \le z\}$, $0 \le z < \infty$,
where $\# A$ denotes the cardinality of a set $A$. Thus, $F_n^*$ and $G_n^*$ estimate the
conditional distribution functions $F_*$ and $G_*$. Estimators of $F$ and $G$ may be
constructed from $F_n^*$ and $G_n^*$ by using the inversion formula of Theorem 1. Let

$C_n(z) = G_n^*(z) - F_n^*(z-) = (1/n)\,\#\{i \le n: y_i \le z \le x_i\}$, $0 \le z < \infty$,

and observe that $C_n(x_i) \ge 1/n$ for all $i \le n$. Then Theorem 1 suggests estimating
the cumulative hazard function $\Lambda$ by

$\hat\Lambda_n(x) = \int_0^x dF_n^*(z)/C_n(z) = \sum_{i:\, x_i \le x} 1/[nC_n(x_i)]$, $0 \le x < \infty$.

Observe that $\hat\Lambda_n$ is a step function with discontinuities (only) at $x_1, \ldots, x_n$. Thus,
Equation (3) suggests estimating $F$ by

(8) $\hat F_n(x) = 1 - \prod_{i:\, x_i \le x} [1 - r(x_i)/nC_n(x_i)]$, $0 \le x < \infty$,

where $r(x_i) = \#\{k \le n: x_k = x_i\}$ for $1 \le i \le n$, the product extends over distinct
values of $x_1, \ldots, x_n$, and an empty product is to be interpreted as one. Of course,
a similar construction is possible for the estimation of $G$. After some algebra, one
is led to the estimator

(9) $\hat G_n(y) = \prod_{j:\, y_j > y} [1 - s(y_j)/nC_n(y_j)]$, $0 \le y < \infty$,

where $s(y_j) = \#\{k \le n: y_k = y_j\}$ for $1 \le j \le n$.
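
The product formulas for the two estimators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names (`at_risk`, `f_hat`, `g_hat`) and the three data pairs are hypothetical, and the quadratic-time counting is chosen for readability rather than speed.

```python
def at_risk(xs, ys, z):
    """n * C_n(z) = #{i: y_i <= z <= x_i}."""
    return sum(1 for x, y in zip(xs, ys) if y <= z <= x)

def f_hat(xs, ys, t):
    """1 - product over distinct x-values v <= t of [1 - r(v) / (n C_n(v))]."""
    survival = 1.0
    for v in sorted(set(xs)):
        if v > t:
            break
        survival *= 1.0 - xs.count(v) / at_risk(xs, ys, v)
    return 1.0 - survival

def g_hat(xs, ys, t):
    """Product over distinct y-values v > t of [1 - s(v) / (n C_n(v))]."""
    product = 1.0
    for v in set(ys):
        if v > t:
            product *= 1.0 - ys.count(v) / at_risk(xs, ys, v)
    return product

# Hypothetical truncated sample: every pair satisfies y_i <= x_i.
xs = [0.5, 0.8, 0.9]
ys = [0.2, 0.6, 0.3]
```

For these three pairs, $nC_n$ equals 2, 2, 1 at the ordered $x$-values, so the estimate of $F$ places masses 1/2, 1/4, 1/4 on 0.5, 0.8, 0.9.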


The estimators $\hat F_n$ and $\hat G_n$ were derived by Lynden-Bell (1971). Suppose, for
simplicity, that there are no ties among $x_1, \ldots, x_n, y_1, \ldots, y_n$ and consider
estimating $F$ and $G$ by distributions which are supported by $\{x_1, \ldots, x_n\}$
and $\{y_1, \ldots, y_n\}$. For such distributions, the conditional likelihood function given
$n$ is

$L(p, q) = \prod_{i=1}^n p_i q_i \Big/ \Big[\sum_{(i,j):\, y_j \le x_i} p_i q_j\Big]^n$,

where $p_1, \ldots, p_n$ and $q_1, \ldots, q_n$ are the masses assigned to $x_1, \ldots, x_n$ and
$y_1, \ldots, y_n$. This likelihood function may be maximized with respect to $p_1, \ldots,
p_n$ and $q_1, \ldots, q_n$; and the estimators $\hat F_n$ and $\hat G_n$ result, provided that (10) below
does not occur. Alternatively, one may show that $F_n^*$ and $G_n^*$ are the nonparametric
maximum likelihood estimators of $F_*$ and $G_*$ and then use the invariance
properties of maximum likelihood estimators. The alternative derivation is not
substantially simpler than the direct one, however.
The estimators $\hat F_n$ and $\hat G_n$ may be supported by proper subsets of $\{x_1, \ldots, x_n\}$
and $\{y_1, \ldots, y_n\}$. Let $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ and $y_{(1)} < \cdots < y_{(n)}$ denote the
ordered values of $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$. If
(10) $nC_n[x_{(k)}] = 1$, for some $k$, $1 \le k < n$,
then

$\hat F_n[x_{(k)}] = 1$ and $\hat G_n[y_{(k+1)}-] = 0$.

This is a disturbing property of the estimators, since it may lead to unreasonable
estimates. For example, it is possible to have $nC_n[x_{(1)}] = 1$. It is shown below that
the probability of (10) approaches zero as $N \to \infty$, if $F$ and $G$ are continuous; but
this will be of little comfort when (10) occurs.
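
Whether (10) occurs in a given sample is easy to check before trusting the estimates. The sketch below is illustrative only; the function name and both data sets are assumptions, and ties are assumed absent as in the discussion above.

```python
def degenerate_indices(xs, ys):
    """Return every k (1-based, k < n) with n * C_n[x_(k)] == 1."""
    ordered = sorted(xs)
    hits = []
    for k, v in enumerate(ordered[:-1], start=1):
        # n * C_n(v) counts the pairs with y_j <= v <= x_j
        count = sum(1 for x, y in zip(xs, ys) if y <= v <= x)
        if count == 1:
            hits.append(k)
    return hits

# Hypothetical sample in which (10) holds with k = 1: no other y lies below x_(1).
bad = degenerate_indices([0.3, 0.7, 0.9], [0.1, 0.5, 0.6])

# Hypothetical sample in which (10) does not occur.
good = degenerate_indices([0.5, 0.8, 0.9], [0.2, 0.6, 0.3])
```

In the first sample the product in (8) already vanishes at $x_{(1)} = 0.3$, so the whole estimated mass of $F$ sits on the smallest observation.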
The problems which result from (10) may be overcome in a simple, if ad hoc,
manner. Let $k_n$ be a nonincreasing function for which $k_n(x) > k_n[x_{(n)}] = 1/n$ for
all $x < x_{(n)}$. If $C_n$ is replaced by

$C_n'(x) = \max\{C_n(x), k_n(x)\}$

in (9), then the resulting estimator $\hat F_n'$ is not supported by any proper subset of
$\{x_1, \ldots, x_n\}$. In fact, $1/nk_n[x_{(i)}]$ is the maximum proportion of the estimated
probability $1 - \hat F_n'[x_{(i)}-]$ which the experimenter is willing to assign to $x_{(i)}$ for
$i = 1, \ldots, n$.

TABLE 1
Calculation of $\hat F_n$

The $(x, y)$ pairs are listed in order of increasing $x$ values; and $p_k = \hat F_n[x_{(k)}] - \hat F_n[x_{(k-1)}]$,
$k = 1, \ldots, 10$. The sample average and MLE of the mean of $F$ are
$\bar x = .6116$ and $\hat\mu_n = .5192$.

It is especially interesting that one may estimate $\alpha$, the probability that
$Y \le X$, when one observes only those pairs $(X_i, Y_i)$ for which $i \le N$ and $Y_i \le X_i$.
The nonparametric maximum likelihood estimator of $\alpha$ is

$\hat\alpha_n = \int_0^\infty \hat G_n\,d\hat F_n$.

It is easily seen that $\hat\alpha_n > 0$ if $nC_n[x_{(i)}] > 1$ for all $i \le n - 1$; otherwise, $\hat F_n$ and $\hat G_n$
may be replaced by $\hat F_n'$ and $\hat G_n'$. Having estimated $\alpha$, one may then estimate the
population size by
$\hat N_n = n/\hat\alpha_n$.
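
Because the estimate of $F$ is a step function, the integral defining the estimate of $\alpha$ reduces to a finite sum of the estimated $G$ at each distinct $x$-value times the mass placed there. The following Python sketch computes that sum and the resulting population-size estimate; all function names and the data are hypothetical.

```python
def at_risk(xs, ys, z):
    """n * C_n(z) = #{i: y_i <= z <= x_i}."""
    return sum(1 for x, y in zip(xs, ys) if y <= z <= x)

def f_hat_masses(xs, ys):
    """(value, mass) pairs that the estimate of F puts on the distinct x-values."""
    survival, masses = 1.0, []
    for v in sorted(set(xs)):
        jump = survival * xs.count(v) / at_risk(xs, ys, v)
        masses.append((v, jump))
        survival -= jump
    return masses

def g_hat(xs, ys, t):
    """Product over distinct y-values v > t of [1 - s(v) / (n C_n(v))]."""
    product = 1.0
    for v in set(ys):
        if v > t:
            product *= 1.0 - ys.count(v) / at_risk(xs, ys, v)
    return product

def alpha_hat(xs, ys):
    """Integral of G-hat with respect to F-hat, written as a finite sum."""
    return sum(g_hat(xs, ys, v) * p for v, p in f_hat_masses(xs, ys))

# Hypothetical truncated sample with n = 3 observed pairs.
xs = [0.5, 0.8, 0.9]
ys = [0.2, 0.6, 0.3]
a_hat = alpha_hat(xs, ys)
n_hat = len(xs) / a_hat      # estimated population size n / alpha-hat
```

With these data the estimated probability of observing a pair is 3/4, so three observed pairs suggest a population of about four.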
EXAMPLE 1. When $F$ and $G$ are both the uniform distribution on the unit
interval, $F_*(x) = x^2$ for $0 < x < 1$ and the conditional distribution of $y_1$ given $x_1$
is uniform on the interval $(0, x_1]$. To illustrate the properties of the estimators
$\hat F_n$ and $\hat G_n$, $n = 10$ pairs of $(x, y)$ values were simulated from the latter joint
distribution. The results are listed in Table 1, along with the value of $C_n$ and $\hat F_n$.
Observe that there is only one data point in the interval $(0, 1/3]$ and four in the
interval $(2/3, 1]$, reflecting the selection bias. The estimator $\hat F_n$ attempts to correct
for this bias by assigning higher weight to the smaller values of $x_1, \ldots, x_n$. One
may see the extent of this correction by comparing the observed average $\bar x = .612$
with the MLE of the mean of $F$, $\hat\mu_n = \int_0^1 x\,d\hat F_n = .519$. Of course, the means of $F_*$
and $F$ are $2/3$ and $1/2$. While assigning larger weights to smaller values may correct
for some bias, it also increases variability. This is illustrated by the erratic
behavior of $\hat F_n(x)$ for $x \le 1/2$.
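
An experiment of this kind can be repeated with a short simulation. The Python sketch below is not the author's computation: the sample size, the seed, and the function names are assumptions, and the random numbers will differ from those behind Table 1. It draws uniform pairs, keeps those with $y \le x$, and compares the raw sample mean with the mean of the estimated $F$.

```python
import random
from bisect import bisect_left, bisect_right

def simulate_truncated_uniform(n, seed=0):
    """Draw (X, Y) i.i.d. uniform on (0, 1); keep the first n pairs with Y <= X."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        x, y = rng.random(), rng.random()
        if y <= x:
            pairs.append((x, y))
    return pairs

def f_hat_masses(pairs):
    """Masses the product-limit estimate of F puts on the sorted x-values
    (continuous data, so ties are assumed absent)."""
    xs = sorted(x for x, _ in pairs)
    ys = sorted(y for _, y in pairs)
    survival, masses = 1.0, []
    for v in xs:
        # n*C_n(v) = #{y_j <= v} - #{x_j < v}, valid because y_j <= x_j for every pair
        at_risk = bisect_right(ys, v) - bisect_left(xs, v)
        jump = survival / at_risk
        masses.append((v, jump))
        survival -= jump
    return masses

pairs = simulate_truncated_uniform(1000, seed=1985)
sample_mean = sum(x for x, _ in pairs) / len(pairs)   # targets the mean of F_*, 2/3
masses = f_hat_masses(pairs)
mle_mean = sum(v * p for v, p in masses)              # targets the mean of F, 1/2
print(round(sample_mean, 3), round(mle_mean, 3))
```

The sorted lists and `bisect` keep the risk-set counts at logarithmic cost per point, which matters once $n$ is in the thousands.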

4. Consistency. In this section, $F$ and $G$ denote continuous distribution
functions for which $(F, G) \in \mathscr{K}$; and $(X_1, Y_1), (X_2, Y_2), \ldots$ denote i.i.d. random
vectors for which $X_1 \sim F$ and $Y_1 \sim G$ are independent. We imagine the estimators
$\hat F_n$ and $\hat G_n$ computed from the populations $\mathscr{P} = \{1, 2, \ldots, N\}$ for $N = 1, 2, \ldots$
and investigate the limiting behavior of $\hat F_n$ and $\hat G_n$ as $N \to \infty$. Let $(x_1, y_1),
(x_2, y_2), \ldots$ denote the successive values of $(X_i, Y_i)$ for which $Y_i \le X_i$. Then
$(x_1, y_1), (x_2, y_2), \ldots$ are i.i.d. with the common joint distribution function $H_*$ of
(2). As in Section 3, let $n = n_N = \#\{i \le N: Y_i \le X_i\}$ for $N \ge 1$. Then $n \sim
\mathrm{Binomial}(N, \alpha)$ for all $N \ge 1$; and the conditional distribution of $(x_1, y_1), \ldots,
(x_k, y_k)$ given $n = k$ is the same as their unconditional distribution for $1 \le k \le
N$. Let $P_n$ denote conditional probability given $n$. Below, the $P_n$-probability limits
of $\hat F_n$ and $\hat G_n$ are determined as $n \to \infty$. It then follows that these are also the
limits in unconditional probability as $N \to \infty$.
The following lemma may be of independent interest, since it computes the
bias of the estimator $\hat\Lambda_n$.
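LEMMA 2. Suppose that $F$ and $G$ are continuous and that $(F, G) \in \mathscr{K}_0$. If $h$ is
a measurable function for which $\int_0^\infty |h|\,d\Lambda < \infty$, then

$E_n\left\{\int_0^\infty h\,d\hat\Lambda_n\right\} = \int_0^\infty h\,d\Lambda - \int_0^\infty h(1 - C)^n\,d\Lambda$

for all $n \ge 1$, where $C(z) = \alpha^{-1} G(z)[1 - F(z)]$, $z \ge 0$. In particular,

$E_n\{\hat\Lambda_n(x)\} = \Lambda(x) - \int_0^x (1 - C)^n\,d\Lambda$.

PROOF. If $h$ is integrable with respect to $\Lambda$ and $n \ge 1$, then

$\int_0^\infty h\,d\hat\Lambda_n = \sum_{i=1}^n h(x_i)/[nC_n(x_i)]$.

Now, the conditional distribution of $nC_n(x_i) - 1 = \#\{j \le n: j \ne i,\ y_j \le x_i \le x_j\}$
given $n$ and $x_i$ is binomial with parameters $n - 1$ and $C(x_i)$ for each $i = 1, \ldots, n$.
So,

(11) $E_n\{1/[nC_n(x_i)] \mid x_i\} = \{1 - [1 - C(x_i)]^n\}/[nC(x_i)]$

for all $i = 1, \ldots, n$, by an elementary calculation. Since $d\Lambda = dF_*/C$, the first
assertion of the lemma now follows from multiplying (11) by $h(x_i)$, integrating
over $x_i$, and summing over $i = 1, \ldots, n$. The second assertion then follows by
letting $h$ be the indicator of $[0, x]$ for fixed $x$, $0 < x \le b_F$.
Observe that the conditional bias of $\hat\Lambda_n(x)$ approaches zero as $n \to \infty$ for all
$x < b_F$, but may do so arbitrarily slowly.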
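
The elementary calculation in the proof is the identity $E\{1/(1 + B)\} = [1 - (1 - p)^n]/(np)$ for a binomial variable $B$ with parameters $n - 1$ and $p$. The Python sketch below checks it in exact rational arithmetic; the function names and the test values of $n$ and $p$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def expected_inverse(n, p):
    """E[1 / (1 + B)] for B ~ Binomial(n - 1, p), by direct summation."""
    return sum(
        Fraction(comb(n - 1, b)) * p**b * (1 - p) ** (n - 1 - b) / (1 + b)
        for b in range(n)
    )

def closed_form(n, p):
    """The closed form [1 - (1 - p)^n] / (n p)."""
    return (1 - (1 - p) ** n) / (n * p)

p = Fraction(1, 3)           # plays the role of C(x_i)
checks = [expected_inverse(n, p) == closed_form(n, p) for n in (1, 2, 5, 12)]
```

The factor $(1 - p)^n$ in the closed form is what produces the bias term $\int h(1 - C)^n\,d\Lambda$ in the lemma, and it decays slowly wherever $C$ is close to zero.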
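THEOREM 2. Let $F$ and $G$ be continuous distribution functions for which
$(F, G) \in \mathscr{K}$, and let $F_0$ and $G_0$ denote the conditional distributions of $X_1$ and $Y_1$
given $X_1 \ge a_G$ and $Y_1 \le b_F$, respectively. Then

$\sup_x |\hat F_n(x) - F_0(x)| \to 0 \leftarrow \sup_y |\hat G_n(y) - G_0(y)|$ in $P_n$-probability as $n \to \infty$.

PROOF. Since the distribution function $H_*$ remains unchanged when $F$ and
$G$ are replaced by $F_0$ and $G_0$ by Lemma 1, it suffices to prove the theorem in the
special case that $(F, G) \in \mathscr{K}_0$. Moreover, it suffices to prove the convergence of
$\hat F_n$.
Given $\varepsilon$, $0 < \varepsilon < 1$, let $a > a_F$ be such that $\Lambda(a) < \varepsilon^2/4$ and let $B = B_{n,\varepsilon}$ be the
event $B = \{\hat\Lambda_n(a) \le \varepsilon/2\}$. Then

$P_n(B^c) \le (2/\varepsilon) E_n\{\hat\Lambda_n(a)\} \le (2/\varepsilon)\Lambda(a) < \varepsilon/2$

for all $n \ge 1$ by Lemma 2. So, since $\hat F_n(z) \le \hat\Lambda_n(a)$ and $F(z) \le \Lambda(a)$ for $z \le a$, it
suffices to show that $P_n\{B, \sup_{x \ge a} |\hat F_n(x) - F(x)| \ge \varepsilon\} \to 0$ as $n \to \infty$.
Let $\lambda_{ni} = 1/nC_n(x_i)$ for $1 \le i \le n$; and define $K_n$ and $K$ by

$K_n(x) = \prod_{i:\, a < x_i \le x} (1 - \lambda_{ni})$

and

$K(x) = \exp\{-[\Lambda(x) - \Lambda(a)]\}$

for $x \ge a$ and $n \ge 1$. Then $1 - \hat F_n(x) = [1 - \hat F_n(a)]K_n(x)$ and $1 - F(x) =
[1 - F(a)]K(x)$ for all $x \ge a$ and $n \ge 1$. If $B$ occurs, then

$|\hat F_n(x) - F(x)| \le \varepsilon/2 + |K_n(x) - K(x)|$

for all $x \ge a$ and $n \ge 1$ by simple algebra. So, it suffices to show that
$\sup_{x \ge a} |K_n(x) - K(x)| \to 0$ w.p.1 as $n \to \infty$ (on the space of $(x_i, y_i)$, $i \ge 1$). In fact,
since $K$ is continuous and each $K_n$ is monotone, it suffices to show that $K_n(x) \to
K(x)$ w.p.1 for each fixed $x$, $a \le x < b_F$ (cf. Breiman, 1968, page 160).
Since $\sup_{x \ge 0} |F_n^*(x) - F_*(x)| \to 0 \leftarrow \sup_{y \ge 0} |G_n^*(y) - G_*(y)|$ w.p.1 as $n \to \infty$ and
since $C$ is positive and continuous on the interval $(a_G, b_F)$, one finds that
$\sup_{a \le z \le x} |1/C_n(z) - 1/C(z)| \to 0$ w.p.1 as $n \to \infty$ for all $x$, $a < x < b_F$. So,

$\hat\Lambda_n(x) - \hat\Lambda_n(a) = \int_a^x dF_n^*/C_n \to \int_a^x dF_*/C = \Lambda(x) - \Lambda(a)$

w.p.1 as $n \to \infty$ for $a < x < b_F$. See Billingsley (1968, page 34). Since $\Lambda$ is
continuous and $\hat\Lambda_n$, $n \ge 1$, are monotone, the convergence must be uniform on
$a \le x \le b$ for any $b < b_F$; and it follows that the maximum of $\lambda_{ni}$ over any such
interval $[a, b]$ approaches zero w.p.1 as $n \to \infty$. To complete the proof, let
(12) $R_n(a, x) = \sum_{i:\, a < x_i \le x} \log[1 - \lambda_{ni}] + [\hat\Lambda_n(x) - \hat\Lambda_n(a)]$
for $a < x < b_F$ and $n \ge 1$. Then, by expanding $\log(1 - \lambda)$ in a Taylor series about
$\lambda = 0$, one finds that there are intermediate points $\xi_{ni}$ for which $|1 - \xi_{ni}| \le \lambda_{ni}$
for $1 \le i \le n$,
$|R_n(a, x)| = \tfrac{1}{2} \sum_{i:\, a < x_i \le x} \xi_{ni}^{-2} \lambda_{ni}^2 \to 0$
and
$K_n(x) = \exp\{-[\hat\Lambda_n(x) - \hat\Lambda_n(a)] + R_n(a, x)\} \to K(x)$
w.p.1 as $n \to \infty$ for $a < x < b_F$. This completes the proof.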

COROLLARY 3. If $F$ and $G$ are continuous and $(F, G) \in \mathscr{K}_0$, then
$\sup |\hat F_n - F| \to 0 \leftarrow \sup |\hat G_n - G|$ in $P_n$-probability as $n \to \infty$.

COROLLARY 4. If $F$ and $G$ are continuous and $(F, G) \in \mathscr{K}_0$, then $\hat\alpha_n \to \alpha$ in
$P_n$-probability as $n \to \infty$ and $\hat N_n/N \to 1$ in probability as $N \to \infty$.

COROLLARY 5. If $F$ and $G$ are continuous and $(F, G) \in \mathscr{K}_0$, then
$P_n\{nC_n[x_{(i)}] = 1, \text{ for some } i \le n - 1\} \to 0$
and
$\min\{nC_n[x_{(i)}]: 1 \le i \le (1 - c)n\} \to \infty$
in $P_n$-probability as $n \to \infty$ for all $c$, $0 < c < 1$.

PROOFS. Corollary 3 is clear, and the convergence of $\hat\alpha_n$ to $\alpha$ in Corollary 4
follows. That $\hat N_n/N \to 1$ then follows, since $n/N \to \alpha$ w.p.1 as $N \to \infty$.
The second assertion in Corollary 5 follows from the relation
$\hat F_n[x_{(i)}] - \hat F_n[x_{(i)}-] = (1 - \hat F_n[x_{(i)}-])/nC_n[x_{(i)}]$
for all $i \le n$ and $n \ge 1$. Let $0 < \varepsilon < 1$ and $k = k(n, \varepsilon) = [(1 - \varepsilon)n] + 1$, where $[\cdot]$
denotes the greatest integer function. Then $1/(1 - \hat F_n[x_{(k)}])$ is stochastically
bounded and $\max_{i \le n} \{\hat F_n(x_i) - \hat F_n(x_i-)\} \to 0$ in $P_n$-probability as $n \to \infty$, both by
Theorem 2. This proves the second assertion in Corollary 5. The first assertion
then follows from the second and its dual, obtained by reversing the roles of
$(X, Y)$ and $(1/Y, 1/X)$, by observing that $nC_n[x_{(i)}] = 1$ implies that $nC_n[y_{(i+1)}]
= 1$ for $1 \le i \le n - 1$.

REMARK 3. In the astronomy example, improved instrumentation might
change $m^*$. In turn, this could change the definitions of $Y$, $a_G$, and $F_0$, the
asymptotic value of $\hat F_n$.

REMARK 4. Since the joint distribution $H_*$ depends on $F$ and $G$ only through
$F_0$ and $G_0$, it is not possible to test the hypotheses $a_G \le a_F$ and $b_G \le b_F$ using
$(x_1, y_1), \ldots, (x_n, y_n)$.

5. Convergence on compact intervals. For $0 \le a < b \le \infty$, let $D[a, b]$
be the space of all functions $f$ from $[a, b]$ into $R = (-\infty, \infty)$ which are right
continuous on $[a, b)$, have left-hand limits on $(a, b]$, and are continuous at $b$.
Endow $D[a, b]$ with the Skorohod topology, as described by Billingsley (1968,
Section 14). For each $n \ge 1$, define the stochastic processes $X_n$ and $Y_n$ by

$X_n(t) = \sqrt{n}[F_n^*(t) - F_*(t)]$

and
$Y_n(t) = \sqrt{n}[G_n^*(t) - G_*(t)]$, $0 \le t \le \infty$,
where $F_n^*$ and $G_n^*$ are as in (5); and note the change in the use of the symbols
"$X$" and "$Y$." Then $(X_n, Y_n)$ is a random element with values in $D^2[0, \infty] =
D[0, \infty] \times D[0, \infty]$ for each $n \ge 1$. If $F$ and $G$ are continuous, then the conditional
distributions of $(X_n, Y_n)$ given $n$ converge

$(X_n, Y_n) \Rightarrow (X, Y)$ as $n \to \infty$,

where $X$ and $Y$ are jointly Gaussian processes on $[0, \infty)$ with continuous sample
paths and covariance structure
$\rho_1(s, t) = F_*(s) - F_*(s)F_*(t)$, $0 \le s \le t < \infty$,
$\rho_2(s, t) = G_*(s) - G_*(s)G_*(t)$, $0 \le s \le t < \infty$,
and $\rho_3(s, t) = H_*(s, t) - F_*(s)G_*(t)$, $0 \le s, t \le \infty$,
for the covariances of $X(s)$ with $X(t)$, of $Y(s)$ with $Y(t)$, and of $X(s)$ with $Y(t)$.

Indeed, the convergence of the finite dimensional distributions of $(X_n, Y_n)$ follows
directly from the univariate central limit theorem and the Cramer-Wold device;
and the tightness of the distributions of the pairs $(X_n, Y_n)$, $n \ge 1$, follows from
that of the components.
Observe that the covariance functions $\rho_1$, $\rho_2$, and $\rho_3$ may be consistently
estimated.
Now suppose that $F$ and $G$ are continuous and that $(F, G) \in \mathscr{K}_0$. Fix values
of $a$ and $b$ for which $a_G < a < b < b_F$ and let

(14) $W_{n,a}(t) = \sqrt{n}\{[\hat\Lambda_n(t) - \hat\Lambda_n(a)] - [\Lambda(t) - \Lambda(a)]\}$
     $= \int_a^t (1/CC_n)(X_n - Y_n)\,dF_n^* + \int_a^t C^{-1}\,dX_n$

w.p.1 for $a \le t \le b$ and $n \ge 1$. The processes $W_{n,a}$, $n \ge 1$, are random elements
with values in $D[a, b]$.

THEOREM 3. Suppose that $F$ and $G$ are continuous and that $(F, G) \in \mathscr{K}_0$. If
$a_G < a < b < b_F$, then
$W_{n,a} \Rightarrow W^a = W_1^a + W_2^a$, as $n \to \infty$,
where

$W_1^a(t) = \int_a^t C(s)^{-2}[X(s)\,dG_*(s) - Y(s)\,dF_*(s)]$

and

$W_2^a(t) = X(t)/C(t) - X(a)/C(a)$, $a \le t \le b$.

PROOF. First observe that $C = G_* - F_*$ is positive and continuous on $[a, b]$,
since $a_G < a < b < b_F$. So, expressions like $X/C$ and $\int_a^t C^{-2} X\,dG_*$ define
continuous transformations from $D[a, b]$ back into $D[a, b]$. Since weak convergence
is preserved by such continuous transformations, it suffices to show that

$\Delta_n = \sup_{a \le t \le b} \left| \int_a^t (1/CC_n) Z_n\,dF_n^* - \int_a^t C^{-2} Z_n\,dF_* \right| \to 0$

in $P_n$-probability as $n \to \infty$, with $Z_n = X_n - Y_n$, $n \ge 1$. To see this, one may
replace $C_n$, $F_n^*$, and $Z_n$ by other random elements, also denoted by $C_n$, $F_n^*$,
and $Z_n$, which have the same joint distribution and converge to $C$, $F_*$, and
$Z = X - Y$ w.p.1 as $n \to \infty$. See Skorohod (1956). That $\Delta_n \to 0$ w.p.1 then follows
from Theorem 5.5 of Billingsley (1968) by considering a sequence $t_n$, $n \ge 1$, of
random variables for which the supremum is nearly attained. The details are
omitted. For a closely related argument, see Breslow and Crowley (1974, pages
447-448).
Of course, one would like to set $a = a_F$ in Theorem 3. If $a_G < a_F$, then this is
possible. If $a_G = a_F$, then the limiting process may not be defined.

THEOREM 4. Suppose that $F$ and $G$ are continuous, that $(F, G) \in \mathscr{K}_0$, and
that $a_G = a_F$. If

(15) $\int_0^\infty (1/G)\,dF < \infty$,

then $X(a)/C(a) \to 0$ and $W_1^a(t) \to \int_{a_F}^t C^{-2}[X\,dG_* - Y\,dF_*]$ in probability as
$a \downarrow a_F$, for $a_F < t < b_F$. Conversely, if (15) fails, then the variance of $W_1^a(t)$ diverges
to $\infty$ as $a \downarrow a_F$ for any $t \in (a_F, b_F)$.

PROOF. Recall that $C = \alpha^{-1} G(1 - F)$, so that $C(z) \sim \alpha^{-1} G(z)$ as $z \downarrow a_F$.
Suppose first that (15) holds. Then the variance of $X(a)/C(a)$ is at most
$C(a)^{-2} F_*(a) \le [1 - F(a)]^{-2} \int_{a_F}^a (1/G)\,dF$, which tends to zero as $a \downarrow a_F$. Next,
write $W_1^a = W_{11}^a - W_{12}^a$, where $W_{11}^a(t) = \int_a^t C^{-2} X\,dG_*$ and $W_{12}^a(t) = \int_a^t C^{-2} Y\,dF_*$
for $a_F < a < t < b_F$. Thus, to show that $\lim W_1^a(t)$ exists in probability for all
$t > a_F$, it suffices to show that the variances of $W_{11}^a(t)$ and $W_{12}^a(t)$ remain
bounded as $a \downarrow a_F$ for some $t < b_F$. If $a_F < a < z < b_F$, then the variance of
$W_{11}^a(z)$ is

(16) $\sigma_1^2(z) = 2 \int_a^z \int_a^t C(s)^{-2} C(t)^{-2} F_*(s)[1 - F_*(t)]\,dG_*(s)\,dG_*(t) \le B \int_{a_F}^z (1/G)\,dF$

for some constant $B$; and the last line is finite, by assumption. A similar argument
shows that the variance $\sigma_2^2(z)$ of $W_{12}^a(z)$ remains bounded as $a \downarrow a_F$, if (15) holds.
If (15) fails, then a careful examination of (16) shows that $\sigma_1^2(z) \to \infty$
as $a \downarrow a_F$. $\sigma_2^2(z)$ may either diverge or remain bounded, depending on whether
$\int (F/G)\,dF = \infty$ or $< \infty$, but one may show that $\sigma_2^2(z)/\sigma_1^2(z) \to 0$ in either case.
That the variance of $W_1^a(z)$ diverges is an easy consequence. The details are
omitted.

6. Convergence at an endpoint. In this section, we suppose that $F$ and
$G$ are continuous, that $(F, G) \in \mathscr{K}_0$, and that (15) holds. In this case, the limiting
distributions developed in the last section are valid when $a = a_F$. To avoid
trivialities and simplify the notation, we suppose that $a_G = a_F = 0$ throughout.
Fix a value of $b$ for which $0 < b < b_F$ and define processes

$W_n(t) = \sqrt{n}[\hat\Lambda_n(t) - \Lambda(t)]$

and
$Z_n(t) = \sqrt{n}[\hat F_n(t) - F(t)]$, $0 \le t \le b$, $n \ge 1$.
Then $W_n$ and $Z_n$ take values in $D[0, b]$ w.p.1 for all $n \ge 1$.

THEOREM 5. Suppose that $F$ and $G$ are continuous, that $(F, G) \in \mathscr{K}_0$, that
(15) holds, and that $a_G = a_F = 0$. Then $W_n \Rightarrow W$ and $Z_n \Rightarrow Z$, as $n \to \infty$, where

$W(t) = \int_0^t C^{-2}[X\,dG_* - Y\,dF_*] + X(t)/C(t)$

and
$Z(t) = [1 - F(t)]W(t)$, $0 \le t \le b$,
with the convention $0/0 = 0$ when $t = 0$.

PROOF. That $W$ is well defined follows from Theorem 4. To show that
$W_n \Rightarrow W$ as $n \to \infty$, it suffices to show that $W_n(a) \to 0$ in $P_n$-probability as first
$n \to \infty$ and then $a \to 0$. See Theorem 3. Now, as in (14),

$W_n(a) = \int_0^a (1/CC_n)(X_n - Y_n)\,dF_n^* + \int_0^a C^{-1}\,dX_n = I_n(a) + II_n(a)$, say,

for $a > 0$ and $n \ge 1$. Given $n \ge 1$, $II_n(a)$ is a normalized sum of i.i.d. random
variables, and $E_n(II_n(a)^2) \le \int_0^a C^{-2}\,dF_*$, which is independent of $n$ and tends to
zero as $a \downarrow 0$, since $\int_0^a C^{-2}\,dF_*$ is finite. Thus, $II_n(a)$ converges to zero in
$P_n$-probability as $n \to \infty$ and then $a \downarrow 0$. Next, recall that $d\hat\Lambda_n = dF_n^*/C_n$ and
write

$|I_n(a)| \le \int_0^a C^{-1} |X_n - Y_n|\,d\hat\Lambda_n \le B_{n,a} \int_0^a C^{-1}\,d\hat\Lambda_n$

for $a > 0$ and $n \ge 1$, where $B_{n,a} = \sup_{t \le a} |X_n(t) - Y_n(t)|$. Now $B_{n,a} \to 0$ in
$P_n$-probability as $n \to \infty$ and then $a \downarrow 0$; and, by Lemma 2, $E_n\{\int_0^a C^{-1}\,d\hat\Lambda_n\} \le
\int_0^a C^{-1}\,d\Lambda$, which is independent of $n$ and tends to zero as $a \downarrow 0$. This completes
the proof that $W_n \Rightarrow W$ as $n \to \infty$.
Now consider $Z_n$. With $R_n$ as in (12),

(17) $Z_n(t) = -\sqrt{n}[1 - F(t)]\{\exp[-n^{-1/2} W_n(t) + R_n(0, t)] - 1\}$

for $0 \le t \le b$ and $n \ge 1$. So, it suffices to show that $\max_{t \le b} \sqrt{n}|R_n(0, t)| \to 0$ in
$P_n$-probability as $n \to \infty$. Now, $\max_{t \le b} |R_n(0, t)| = |R_n(0, b)|$; and

(18) $|R_n(0, b)| \le \tfrac{1}{2} B_n \sum_{i:\, x_i \le b} [nC_n(x_i)]^{-2}$,

where $B_n = \max\{\xi_{ni}^{-2}: x_i \le b\}$ and $\xi_{ni}$, $1 \le i \le n$, are intermediate points as in
(12). Now, $B_n$ is bounded in $P_n$-probability, by Corollary 3; and the expectation
of the sum in (18) is at most $(1/n) \int_0^b C^{-2}\,dF_*$, as in the proof of Lemma 2. (The
conditional distribution of $nC_n(x_i) - 1$ given $n$ and $x_i$ is binomial $[n - 1, C(x_i)]$
for $1 \le i \le n$.) Thus, $R_n(0, b) = O_p(1/n) = o_p(1/\sqrt{n})$ in $P_n$-probability to complete
the proof.

REMARKS 5. By Corollary 5, Theorems 2, 3, and 5 are valid if $\hat F_n$ is replaced
by the modification $\hat F_n'$ of (9), provided the constants $k_{n1}, \ldots, k_{nn}$ are bounded.
Indeed, Corollary 5 asserts that $P_n\{\hat F_n'(z) = \hat F_n(z) \text{ for all } z \le b\} \to 1$ as $n \to \infty$ for
any $b < b_F$, in this case.

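6. There is a dual to Theorem 5. Suppose that $F$ and $G$ are continuous, that
$(F, G) \in \mathscr{K}_0$, that $b_G = b_F = \infty$, and that $\int_0^\infty 1/(1 - F)\,dG < \infty$. Let $U_n(t) =
\sqrt{n}[\hat G_n(t) - G(t)]$, $t \ge 0$, $n \ge 1$, and regard $U_n$ as random elements with values in
$D[a, \infty]$, where $a > a_F \ge a_G$. Then $U_n \Rightarrow U$, where

$U(t) = G(t)\left\{ Y(t)/C(t) - \int_t^\infty C^{-2}[X\,dG_* - Y\,dF_*] \right\}$

for $a \le t < \infty$.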

7. The condition (15) is not surprising, since it is necessary for the convergence
in distribution of $\sqrt{n} \times$ estimation error even in the case when $G$ is known. In
this case, the nonparametric maximum likelihood estimator of $F$ is
$\tilde F_n(t) = [\sum_{i:\, x_i \le t} 1/G(x_i)] / [\sum_{i=1}^n 1/G(x_i)]$ for $t \ge 0$ and $n \ge 1$; and it is easily seen that
$\sqrt{n}[\tilde F_n(t) - F(t)]$ converges in distribution for all $t > a_F \ge a_G$ iff (15) holds.
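
When $G$ is known, the estimator in Remark 7 is simply an inverse-probability-weighted empirical distribution function. The Python sketch below is a minimal illustration; the function name `f_tilde`, the choice of a uniform $G$, and the three observations are assumptions made for the example.

```python
def f_tilde(xs, G, t):
    """Weighted empirical d.f.: each x_i carries weight 1 / G(x_i)."""
    weights = [1.0 / G(x) for x in xs]
    below = sum(w for x, w in zip(xs, weights) if x <= t)
    return below / sum(weights)

def uniform_cdf(x):
    """Hypothetical known truncation distribution: uniform on (0, 1]."""
    return min(max(x, 0.0), 1.0)

xs = [0.25, 0.5, 1.0]                   # observed (selection-biased) x-values
value = f_tilde(xs, uniform_cdf, 0.5)   # weights 4, 2, 1 give (4 + 2) / 7
```

Small observations receive weights $1/G(x_i)$ that are unbounded as $G(x_i)$ tends to zero, which gives some intuition for why an integrability condition such as (15) is needed for a normal limit.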

8. If (15) fails, then other limiting distributions may obtain. Suppose, for
example, that $F$ is continuous, that $a_F = 0$, and that $G = F^c$, where $1 < c < \infty$.
Let $\delta = 1/(1 + c)$. Then $n^\delta[\hat F_n(t) - F(t)]$ has a limiting stable distribution for all
$t > 0$ for which $F(t) < 1$. To see this, fix $t$ and write

$\hat\Lambda_n(t) - \Lambda(t) = \int_0^t C^{-1}\,d(F_n^* - F_*) + \int_0^t (1/C_n - 1/C)\,dF_n^* = I_n + II_n$, say.

Then $|II_n| \le \max_{s \le t} |C_n(s) - C(s)| \int_0^t (1/C)\,d\hat\Lambda_n = o_p(1/\sqrt{n})$ by Lemma 2 and
properties of empirical processes. Let $z_i = (1/C(x_i)) I_{(0,t]}(x_i)$ for $i = 1, 2, \ldots$.
Then $z_1, z_2, \ldots$ are i.i.d. with common mean $\Lambda(t)$. Now, it is easily seen that $z_i$
is in the domain of attraction of a stable distribution with characteristic exponent
$\gamma = (1 + c)/c$ and skewness parameter 1, in Feller's (1966, pages 540-543)
terminology. So, $n^\delta[\hat\Lambda_n(t) - \Lambda(t)]$ has a limiting stable distribution as $n \to \infty$. (In
fact, the same stable distribution is obtained for all $t$.) That $n^\delta[\hat F_n(t) - F(t)]$ has
a limiting distribution now follows from (17) by using a stable distribution to
bound $z_1^2 + \cdots + z_n^2$ in (18).

Acknowledgements. Irving Segal introduced me to this problem through
the application to astronomy. Gary Lorden brought the papers by Lynden-Bell
(1971) and Jackson (1974) to my attention. The referees and associate editor
contributed useful comments and criticisms.

REFERENCES
BAROUCH, E. and KAUFMAN, G. M. (1975). Probabilistic modelling of oil and gas discovery. Energy
133-152.
BHATTACHARYA, P. K. (1983). Justification for a K-S type test for the slope of a truncated regression.
Ann. Statist. 11 697-701.
BHATTACHARYA, P. K., CHERNOFF, H. and YANG, S. S. (1983). Nonparametric estimation of the
slope of a truncated regression. Ann. Statist. 11 505-514.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New York.
BREIMAN, L. (1968). Probability. Addison-Wesley, Reading, Massachusetts.
BRESLOW, N. and CROWLEY, J. (1974). A large sample study of the life table and product limit
estimates under random censorship. Ann. Statist. 2 437-453.
FELLER, W. (1966). An Introduction to Probability Theory and its Applications, Vol. 2. Wiley, New
York.
JACKSON, J. C. (1974). The analysis of quasar samples. Mon. Not. R. Astr. Soc. 166 281-295.
KAPLAN, E. L. and MEIER, P. (1958). Nonparametric estimation from incomplete observations. J.
Amer. Statist. Assoc. 53 457-481.
KRAMER, M. (1983). Stopping a size dependent exploration process. Ph.D. Thesis, The University of
Michigan.
LYNDEN-BELL, D. (1971). A method of allowing for known observational selection in small samples
applied to 3CR quasars. Mon. Not. R. Astr. Soc. 155 95-118.
NICOLL, J. F. and SEGAL, I. E. (1980). Nonparametric elimination of the observational cutoff bias.
Astron. Astrophys. 82 L3-L6.
SEGAL, I. E. (1975). Observational validation of the chronometric cosmology: I. Preliminaries and
the red shift-magnitude relation. Proc. Nat. Acad. Sci. 72 2437-2477.
SKOROHOD, A. V. (1956). Limit theorems for stochastic processes. Theor. Probab. Appl. 1 261-290.
STARR, N. (1974). Optimal and adaptive stopping based on capture times. J. Appl. Probab. 11
294-301.
STARR, N., WARDROP, R. and WOODROOFE, M. (1976). Estimating the mean from delayed
observations. Z. Wahrsch. verw. Gebiete 35 103-113.
VARDI, Y. (1982a). Nonparametric estimation in the presence of length bias. Ann. Statist. 10
616-620.
VARDI, Y. (1982b). Nonparametric estimation in renewal processes. Ann. Statist. 10 772-785.
