Estimating a Distribution Function with Truncated Data
Michael Woodroofe
The Annals of Statistics, Vol. 13, No. 1. (Mar., 1985), pp. 163-177.
Stable URL:
http://links.jstor.org/sici?sici=0090-5364%28198503%2913%3A1%3C163%3AEADFWT%3E2.0.CO%3B2-3
only detect objects which are sufficiently bright, say $m \le m^*$. Then, letting $X = \exp[-f(z)]$ and $Y = \exp[M - m^*]$ yields the model described above.
In other applications, the $X_e$ may be the sizes of hidden objects for which one
searches for one unit of time and $T_e = Y_e/X_e$ might be the time at which one
would find the object $e$, if the search were continued indefinitely. Then the
conditional probability of finding object $e$ given $X_e$ is $G(X_e)$, an unknown but
increasing function of $X_e$. For example, Barouch and Kaufman (1975) have
described models for exploring for petroleum reserves in which the probability of
finding a given pool is proportional to the pool's size. Letting X denote a pool's
size and T denote the time at which it would be found in an infinite search yields
a model which is closely related to Barouch and Kaufman's (1975).
Starr (1974), Starr, Wardrop, and Woodroofe (1976), and Kramer (1983) have
considered a class of optimal stopping problems in which one searches for hidden
objects and receives a reward depending on the objects found, say the sum of
their sizes, less a cost of sampling. Assuming a known stochastic model and
certain other conditions, these authors obtain explicit solutions to the optimal
stopping problem. In addition, they propose adaptive procedures for use when
the total number of objects N is unknown. The estimators studied here may
allow implementation of adaptive procedures in which other quantities, like F,
are estimated sequentially.
Nonparametric MLEs of F and G were derived by Lynden-Bell (1971), who
described another application to astronomy. See also Jackson (1974). Nicoll and
Segal (1980) derived the MLEs for grouped data; and Bhattacharya, Chernoff, and
Yang (1983) derived MLEs from a conditional likelihood function of certain
counts, given the observed X-values. The latter paper also computes the
information matrix for its model. Bhattacharya et al. (1983) construct nonparametric
estimators of regression parameters in models like (1), and show asymptotic
normality of estimation error, properly normalized; and Bhattacharya (1983)
considers the asymptotic distribution of a goodness of fit statistic with a view
towards testing hypotheses about regression parameters. None of these papers
gives conditions for the consistency and asymptotic normality of the MLEs of F
and G, however.
Here asymptotic properties of these estimators are studied as $N \to \infty$. In
Section 2, the conditional distributions of X and Y given $Y \le X$ are related to
the unconditional distributions F and G. The estimators are described in Section
3. Section 4 considers consistency; if F and G are continuous and if the lower
and upper endpoints of the convex support of G are individually less than or
equal to those of F, then the estimators converge to the true distribution functions
F and G in probability as $N \to \infty$. Sections 5 and 6 consider normalized estimation
error for the distribution functions. Here $\sqrt{n}$ × estimation error for F converges
in distribution to a Gaussian process if $\int_0^\infty (1/G)\,dF < \infty$; but the asymptotic
variance may be infinite if this integral diverges.
There is some similarity between the estimators studied here and the estimator
of Kaplan and Meier (1958), and hence with the asymptotic results of Breslow
and Crowley (1974). There are also differences. The Kaplan-Meier estimator
would be appropriate if $X_i \wedge Y_i = \min(X_i, Y_i)$ and $\delta_i = I\{X_i \le Y_i\}$ were observed
DISTRIBUTION FUNCTIONS WITH TRUNCATED DATA 165
and
so that $(a_K, b_K)$ is the interior of the convex support of $K$. Then $\alpha > 0$ in (2) if
$a_G < b_F$, and $\alpha = 0$ unless $a_G \le b_F$. If $\alpha > 0$ and if $F_*$ and $G_*$ are related to $F$ and
$G$ by (2), then $a_{F_*} = \max\{a_F, a_G\}$, $b_{F_*} = b_F$, $a_{G_*} = a_G$, and $b_{G_*} = \min\{b_F, b_G\}$. In
addition, it is convenient to have the following notation: let
$$\mathcal{K} = \{(F, G): F(0) = 0 = G(0),\ \alpha(F, G) > 0\},$$
$$\mathcal{K}_0 = \{(F, G) \in \mathcal{K}: a_G \le a_F,\ b_G \le b_F\},$$
$$T(F, G) = H_*, \qquad (F, G) \in \mathcal{K}.$$
LEMMA 1. (i) Let $(F, G) \in \mathcal{K}$ and let $F_0$ and $G_0$ denote the conditional
distributions of X and Y given $X \ge a_G$ and $Y \le b_F$. Then $(F_0, G_0) \in \mathcal{K}_0$ and
$T(F_0, G_0) = T(F, G)$;
(ii) $T(\mathcal{K}) = T(\mathcal{K}_0)$.
and
where
$$C(z) = G_*(z) - F_*(z-), \qquad 0 \le z < \infty.$$
PROOF. By the lemma, there is at least one pair $(F, G) \in \mathcal{K}_0$ for which
$T(F, G) = H_*$. It is shown below that (4) holds for any such pair, and it then
follows that there is only one such pair, by (3) applied to $F$ and $G_1$, where
$G_1(z) = 1 - G(1/z-)$, $z > 0$. The proof of (4) depends on the simple identity
$C(z) = \alpha^{-1}G(z)[1 - F(z-)]$ for $z \ge 0$, which may be derived as follows:
for all $x \ge a_F$; and both sides vanish for $x < a_F$. This establishes the first assertion
in (4) and the second may be established similarly.
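The identity $C(z) = \alpha^{-1}G(z)[1 - F(z-)]$ says that $C(z)$ is the conditional probability of the event $\{Y \le z \le X\}$ given $Y \le X$. As a rough numerical sanity check, the following sketch simulates the truncation model in the special case where F and G are both uniform on (0, 1), an assumption made here only for illustration, so that $\alpha = 1/2$ and $C(z) = 2z(1 - z)$. The function name and its parameters are hypothetical.

```python
import random


def check_identity(z, n_draws=200_000, seed=0):
    """Monte Carlo check of C(z) = G(z)[1 - F(z)] / alpha for uniform F and G."""
    rng = random.Random(seed)
    kept = 0   # pairs that survive truncation (y <= x)
    hits = 0   # surviving pairs with y <= z <= x
    for _ in range(n_draws):
        x, y = rng.random(), rng.random()
        if y <= x:
            kept += 1
            if y <= z <= x:
                hits += 1
    alpha_hat = kept / n_draws        # estimates alpha = P(Y <= X) = 1/2
    c_hat = hits / kept               # estimates C(z) = G_*(z) - F_*(z-)
    c_theory = z * (1.0 - z) / 0.5    # G(z)[1 - F(z)] / alpha when F = G = U(0, 1)
    return alpha_hat, c_hat, c_theory
```

Calling `check_identity(0.3)` returns the simulated truncation probability together with the simulated and theoretical values of $C(0.3)$, which should agree up to Monte Carlo error.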
2. Let $\mathcal{D}$ denote the class of all distribution functions on $[0, \infty)$. Endow $\mathcal{D}$
with its weak topology; endow $\mathcal{D} \times \mathcal{D}$ with the product topology; and endow
$\mathcal{K}$, $\mathcal{K}_0$, and $T(\mathcal{K})$ with their relative topologies. Then T is easily seen to be
continuous at all $(F, G) \in \mathcal{K}$ which have no common points of discontinuity.
However, the inverse transformation to $T_0$ is not continuous. To see this, let F
and G be continuous distribution functions with support $[0, \infty)$; and let $G_n =
(G + \delta_n)/2$, where $\delta_n$ denotes the point mass at n for $n \ge 1$. Then $T(F, G_n) \to
T(F, G)$ as $n \to \infty$, but $G_n$ does not converge to G.
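One way to see this convergence is sketched below. It assumes that $H_* = T(F, G)$ is the joint distribution function of (X, Y) given $Y \le X$, and it writes $\alpha_n$ and $H_{*,n}$ for the truncation probability and the conditional distribution under $(F, G_n)$; this notation is introduced only for the sketch.

```latex
\begin{align*}
\alpha_n &= \int_0^\infty G_n\,dF = \tfrac12\,\alpha + \tfrac12\,[1 - F(n)], \\
H_{*,n}(x, y)
  &= \frac{\tfrac12 \int_0^x G(y \wedge s)\,dF(s)
          + \tfrac12\,[F(x) - F(n)]^{+}\,\mathbf{1}\{y \ge n\}}
         {\tfrac12\,\alpha + \tfrac12\,[1 - F(n)]} \\
  &\longrightarrow \alpha^{-1} \int_0^x G(y \wedge s)\,dF(s) = H_*(x, y)
  \qquad (n \to \infty),
\end{align*}
```

for each fixed (x, y), since $F(n) \to 1$ and the indicator vanishes once $n > y$. On the other hand, $G_n(z) \to G(z)/2$ for every fixed z, and $G/2$ is not a distribution function, so $G_n$ has no weak limit in the class of distribution functions on $[0, \infty)$.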
3. Estimation. Now let F and G denote distribution functions for which
$(F, G) \in \mathcal{K}$; let X and Y denote independent random variables with distribution
functions F and G; and let $(X_1, Y_1), \ldots, (X_N, Y_N)$ be i.i.d. as (X, Y). As
in the introduction, suppose that one observes only those pairs $(X_i, Y_i)$ for which
$i \le N$ and $Y_i \le X_i$. Suppose that there is at least one such pair, and let
$(x_1, y_1), \ldots, (x_n, y_n)$ denote these pairs, so labeled that $(x_1, y_1), \ldots, (x_n, y_n)$ are
conditionally i.i.d. given n.
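The observation scheme just described can be mimicked in a few lines. The following is a minimal sketch, assuming the caller supplies samplers for F and G; the function name `truncated_sample` and the sampler arguments are hypothetical and serve only to illustrate how the random sample size n arises from N latent pairs.

```python
import random


def truncated_sample(N, draw_x, draw_y, seed=0):
    """Draw N independent (X, Y) pairs and keep only those with Y <= X.

    draw_x and draw_y are callables taking a random.Random instance and
    returning one draw from F and G, respectively.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(N):
        x, y = draw_x(rng), draw_y(rng)
        if y <= x:                 # the pair is observable
            observed.append((x, y))
    return observed                # its length is the random sample size n


# Example: F and G both uniform on (0, 1), so about half the pairs survive.
pairs = truncated_sample(10_000, lambda r: r.random(), lambda r: r.random())
n = len(pairs)
```

Only `pairs` and `n` would be available to the statistician; N itself is not observed.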
To describe the estimators of F and G, let $F_n^*$ and $G_n^*$ denote the empirical
distribution functions of $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$,
$$F_n^*(z) = (1/n)\,\#\{i \le n: x_i \le z\}, \qquad G_n^*(z) = (1/n)\,\#\{j \le n: y_j \le z\}, \qquad 0 \le z < \infty, \tag{5}$$
where $\#A$ denotes the cardinality of a set A. Thus, $F_n^*$ and $G_n^*$ estimate the
conditional distribution functions $F_*$ and $G_*$. Estimators of F and G may be
constructed from $F_n^*$ and $G_n^*$ by using the inversion formula of Theorem 1. Let
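$$C_n(z) = G_n^*(z) - F_n^*(z-), \qquad 0 \le z < \infty,$$
and observe that $C_n(x_i) \ge 1/n$ for all $i \le n$. Then Theorem 1 suggests estimating
the cumulative hazard function $\Lambda$ by
$$\Lambda_n(t) = \sum_{i:\, x_i \le t} \frac{1}{nC_n(x_i)}, \qquad 0 \le t < \infty.$$
Observe that $\Lambda_n$ is a step function with discontinuities (only) at $x_1, \ldots, x_n$. Thus,
$$\hat F_n(t) = 1 - \prod_{i:\, x_i \le t} \left[1 - \frac{r(x_i)}{nC_n(x_i)}\right],$$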
168 M. WOODROOFE
where $r(x_i) = \#\{k \le n: x_k = x_i\}$ for $1 \le i \le n$, the product extends over distinct
values of $x_1, \ldots, x_n$, and an empty product is to be interpreted as one. Of course,
a similar construction is possible for the estimation of G. After some algebra, one
is led to the estimator
in (9), then the resulting estimator $F_n^0$ is not supported by any proper subset of
$\{x_1, \ldots, x_n\}$. In fact, $1/nk_n[x_{(i)}]$ is the maximum proportion of the estimated
probability $1 - F_n^0[x_{(i)}-]$ which the experimenter is willing to assign to $x_{(i)}$ for
$i = 1, \ldots, n$.
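A minimal sketch of the product-limit construction described in this section follows. It assumes the usual form $\hat F_n(t) = 1 - \prod_{x_i \le t}[1 - r(x_i)/nC_n(x_i)]$, with $nC_n(z) = \#\{i: y_i \le z \le x_i\}$ and the product taken over distinct x-values. The displayed formulas are not fully legible in this copy, so that form, like the function name, is an assumption.

```python
def product_limit_F(xs, ys):
    """Product-limit estimate of F from truncated pairs with y_i <= x_i.

    Returns the sorted distinct x-values and the value of F_hat at each.
    """
    points, values = [], []
    survival = 1.0                                   # running value of 1 - F_hat
    for x in sorted(set(xs)):
        ties = sum(1 for xi in xs if xi == x)        # r(x)
        at_risk = sum(1 for xi, yi in zip(xs, ys)
                      if yi <= x <= xi)              # n * C_n(x), always >= r(x)
        survival *= 1.0 - ties / at_risk
        points.append(x)
        values.append(1.0 - survival)
    return points, values


# Tiny worked example with three observed pairs.
points, values = product_limit_F([0.5, 0.8, 0.9], [0.1, 0.6, 0.2])
```

Note that whenever $nC_n(x) = r(x)$ at some x below the largest observation, the running product drops to zero and the estimate places no mass on later points. This is the degeneracy that the modified estimator discussed above is meant to avoid.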
TABLE 1
Calculation of $\hat F_n$
The (x, y) pairs are listed in order of increasing x values; and $p_k = \hat F_n[x_{(k)}] - \hat F_n[x_{(k-1)}]$,
k = 1, ..., 10. The sample average and MLE of the mean of F are
$\bar x = .6116$ and $\hat\mu = .5192$.
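$$\hat\alpha_n = \int_0^\infty \hat G_n\,d\hat F_n.$$
It is easily seen that $\hat\alpha_n > 0$ if $nC_n[x_{(i)}] > 1$ for all $i \le n - 1$; otherwise, $\hat F_n$ and $\hat G_n$
may be replaced by $F_n^0$ and $G_n^0$. Having estimated $\alpha$, one may then estimate the
population size by
$$\hat N_n = n/\hat\alpha_n.$$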
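The following sketch computes $\hat\alpha_n = \int \hat G_n\,d\hat F_n$ and $\hat N_n = n/\hat\alpha_n$ for a small sample. It assumes the standard product-limit forms $\hat F_n(t) = 1 - \prod_{x_i \le t}[1 - r(x_i)/nC_n(x_i)]$ and $\hat G_n(t) = \prod_{y_j > t}[1 - s(y_j)/nC_n(y_j)]$, where $s(y_j)$ counts ties among the y-values. The display for $\hat G_n$ is missing from this copy, so its form is an assumption, as are the function and variable names.

```python
def estimate_alpha_and_N(xs, ys):
    """Return (alpha_hat, N_hat) from truncated pairs with y_i <= x_i."""
    xs, ys = list(xs), list(ys)
    n = len(xs)

    def at_risk(z):
        # n * C_n(z) = #{i: y_i <= z <= x_i}
        return sum(1 for x, y in zip(xs, ys) if y <= z <= x)

    # Jumps of the product-limit estimate of F at the distinct x-values.
    survival, jumps = 1.0, []
    for x in sorted(set(xs)):
        new_survival = survival * (1.0 - xs.count(x) / at_risk(x))
        jumps.append((x, survival - new_survival))
        survival = new_survival

    def G_hat(t):
        # Assumed product form: product over distinct y-values exceeding t.
        prod = 1.0
        for y in set(ys):
            if y > t:
                prod *= 1.0 - ys.count(y) / at_risk(y)
        return prod

    alpha_hat = sum(G_hat(x) * p for x, p in jumps)   # integral of G_hat dF_hat
    if alpha_hat <= 0.0:
        raise ValueError("alpha_hat is zero; the estimators are degenerate here")
    return alpha_hat, n / alpha_hat


alpha_hat, N_hat = estimate_alpha_and_N([0.5, 0.8, 0.9], [0.1, 0.6, 0.2])
```

For these three pairs the sketch gives $\hat\alpha_n = 0.75$ and hence $\hat N_n = 4$, that is, one unobserved pair is imputed.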
EXAMPLE 1. When F and G are both the uniform distribution on the unit
interval, $F_*(x) = x^2$ for $0 < x < 1$ and the conditional distribution of $y_1$ given $x_1$
is uniform on the interval $(0, x_1]$. To illustrate the properties of the estimators
$\hat F_n$ and $\hat G_n$, n = 10 pairs of (x, y) values were simulated from the latter joint
distribution. The results are listed in Table 1, along with the values of $C_n$ and $\hat F_n$.
Observe that there is only one data point in the interval (0, 1/3] and four in the
interval (2/3, 1], reflecting the selection bias. The estimator $\hat F_n$ attempts to correct
for this bias by assigning higher weight to the smaller values of $x_1, \ldots, x_n$. One
may see the extent of this correction by comparing the observed average $\bar x = .612$
with the MLE of the mean of F, $\hat\mu = \int_0^1 x\,d\hat F_n = .519$. Of course, the means of $F_*$
and F are 2/3 and 1/2. While assigning larger weights to smaller values may correct
for some bias, it also increases variability. This is illustrated by the erratic
behavior of $\hat F_n(x)$ for $x \le 1/2$.
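The setting of Example 1 is easy to re-create. The sketch below is not the author's simulation and will not reproduce Table 1; it draws fresh pairs from the same truncated joint distribution, using $x = \sqrt{U}$ so that x has distribution function $x^2$, and compares the sample average with the mean of the product-limit estimate. The product-limit form $\hat F_n(t) = 1 - \prod_{x_i \le t}[1 - r(x_i)/nC_n(x_i)]$ and all function names are assumptions made for this illustration.

```python
import bisect
import math
import random


def simulate_example(n, seed=0):
    """Draw n truncated pairs when F and G are both uniform on (0, 1)."""
    rng = random.Random(seed)
    xs = [math.sqrt(rng.random()) for _ in range(n)]   # F_*(x) = x^2
    ys = [x * (1.0 - rng.random()) for x in xs]        # y given x is uniform on (0, x]
    return xs, ys


def product_limit_mean(xs, ys):
    """Mean of the product-limit estimate of F: sum of (jump point * jump size)."""
    sx, sy = sorted(xs), sorted(ys)
    n = len(sx)
    survival, mean, i = 1.0, 0.0, 0
    while i < n:
        x = sx[i]
        j = bisect.bisect_right(sx, x)                 # one past the last tie at x
        ties = j - i                                   # r(x)
        at_risk = bisect.bisect_right(sy, x) - i       # #{y <= x} - #{x_k < x} = n*C_n(x)
        new_survival = survival * (1.0 - ties / at_risk)
        mean += x * (survival - new_survival)
        survival = new_survival
        i = j
    return mean


xs, ys = simulate_example(4000, seed=1)
sample_mean = sum(xs) / len(xs)          # estimates the mean of F_*, which is 2/3
pl_mean = product_limit_mean(xs, ys)     # estimates the mean of F, which is 1/2
```

With a few thousand pairs the sample average should sit near 2/3 while the product-limit mean is pulled back toward 1/2, in line with the comparison made in the example.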
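$$E\left\{\int_0^\infty h\,d\Lambda_n\right\} = \int_0^\infty h\,d\Lambda - \int_0^\infty h(1 - C)^n\,d\Lambda$$
for all $n \ge 1$, where $C(z) = \alpha^{-1}G(z)[1 - F(z)]$, $z \ge 0$. In particular,
for all $n \ge 1$ by Lemma 2. So, since $\hat F_n(z) \le \Lambda_n(a)$ and $F(z) \le \Lambda(a)$ for $z \le a$, it
suffices to show that $P_n\{B_n, \sup_{x \ge a} |\hat F_n(x) - F(x)| \ge \varepsilon\} \to 0$ as $n \to \infty$.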
Let $\lambda_{ni} = 1/nC_n(x_i)$ for $1 \le i \le n$; and define $K_n$ and $K$ by
and
w.p.1 as $n \to \infty$ for $a < x < b_F$. See Billingsley (1968, page 34). Since $\Lambda$ is
continuous and $\Lambda_n$, $n \ge 1$, are monotone, the convergence must be uniform on
$a \le x \le b$ for any $b < b_F$; and it follows that the maximum of the $\lambda_{ni}$ over any such
interval $[a, b]$ approaches zero w.p.1 as $n \to \infty$. To complete the proof, let
$$R_n(a, x) = \sum_{i:\, a < x_i \le x} \log[1 - \lambda_{ni}] + [\Lambda_n(x) - \Lambda_n(a)] \tag{12}$$
for $a < x < b_F$ and $n \ge 1$. Then, by expanding $\log(1 - \lambda)$ in a Taylor series about
$\lambda = 0$, one finds that there are intermediate points $\xi_{ni}$ for which $|1 - \xi_{ni}| \le \lambda_{ni}$
for $1 \le i \le n$,
$$|R_n(a, x)| = \tfrac12 \sum_{i:\, a < x_i \le x} \xi_{ni}^{-2}\lambda_{ni}^2 \to 0$$
and
$$K_n(x) = \exp\{-[\Lambda_n(x) - \Lambda_n(a)] + R_n(a, x)\}$$
and
$$Y_n(t) = \sqrt{n}[G_n^*(t) - G_*(t)], \qquad 0 \le t \le \infty,$$
where $F_n^*$ and $G_n^*$ are as in (5); and note the change in the use of the symbols
"X" and "Y." Then $(X_n, Y_n)$ is a random element with values in $D^2[0, \infty] =
D[0, \infty] \times D[0, \infty]$ for each $n \ge 1$. If F and G are continuous, then the conditional
distributions of $(X_n, Y_n)$ given n converge
where X and Y are jointly Gaussian processes on $[0, \infty)$ with continuous sample
paths and covariance structure
$$\rho_1(s, t) = F_*(s) - F_*(s)F_*(t), \qquad 0 \le s \le t < \infty,$$
Indeed, the convergence of the finite dimensional distributions of $(X_n, Y_n)$ follows
directly from the univariate central limit theorem and the Cramér-Wold device;
and the tightness of the distributions of the pairs $(X_n, Y_n)$, $n \ge 1$, follows from
that of the components.
Observe that each of these covariance functions may be consistently
estimated.
Now suppose that F and G are continuous and that $(F, G) \in \mathcal{K}_0$. Fix values
of a and b for which $a_G < a < b < b_F$ and let
THEOREM 3. Suppose that F and G are continuous and that $(F, G) \in \mathcal{K}_0$. If
$a_G < a < b < b_F$, then
$$W_{n,a} \Rightarrow W^a = W_1^a + W_2^a, \qquad \text{as } n \to \infty,$$
where
$$W_1^a(t) = \int_a^t C(s)^{-2}[X(s)\,dG_*(s) - Y(s)\,dF_*(s)]$$
and
PROOF. Recall that $C = \alpha^{-1}G(1 - F)$, so that $C(z) \sim \alpha^{-1}G(z)$ as $z \downarrow a_F$.
Suppose first that (15) holds. Then the variance of $X(a)/C(a)$ is at most
$C(a)^{-2}F_*(a) \le [1 - F(a)]^{-2} \int_{a_F}^{a} (1/G)\,dF$, which tends to zero as $a \downarrow a_F$. Next,
write $W_1^a = W_{11}^a - W_{12}^a$, where $W_{11}^a(t) = \int_a^t C^{-2}X\,dG_*$ and $W_{12}^a(t) = \int_a^t C^{-2}Y\,dF_*$
for $a_F < a < t < b_F$. Thus, to show that $\lim W_1^a(t)$ exists in probability for all
$t > a_F$, it suffices to show that the variances of $W_{11}^a(t)$ and $W_{12}^a(t)$ remain
bounded as $a \downarrow a_F$ for some $t < b_F$. If $a_F < a < z < b_F$, then the variance of
$W_{11}^a(z)$ is
for some constant B; and the last line is finite, by assumption. A similar argument
shows that the variance $\sigma_2^2(z)$ of $W_{12}^a(z)$ remains bounded as $a \downarrow a_F$, if (15) holds.
If (15) fails, then a careful examination of (16) shows that $\sigma_1^2(z) \to \infty$
as $a \downarrow a_F$. $\sigma_2^2(z)$ may either diverge or remain bounded, depending on whether
$\int (F/G)\,dF = \infty$ or $< \infty$, but one may show that $\sigma_2^2(z)/\sigma_1^2(z) \to 0$ in either case.
That the variance of $W_1^a(z)$ diverges is an easy consequence. The details are
omitted.
and
$$Z_n(t) = \sqrt{n}[\hat F_n(t) - F(t)], \qquad 0 \le t \le b, \quad n \ge 1.$$
Then $W_n$ and $Z_n$ take values in $D[0, b]$ w.p.1 for all $n \ge 1$.
THEOREM 5. Suppose that F and G are continuous, that $(F, G) \in \mathcal{K}_0$, that
(15) holds, and that $a_G = a_F = 0$. Then $W_n \Rightarrow W$ and $Z_n \Rightarrow Z$, as $n \to \infty$, where
and
$$Z(t) = [1 - F(t)]W(t), \qquad 0 \le t \le b,$$
with the convention $0/0 = 0$ when $t = 0$.
for $a \le t < \infty$.
7. The condition (15) is not surprising, since it is necessary for the convergence
in distribution of $\sqrt{n}$ × estimation error even in the case when G is known. In
this case, the nonparametric maximum likelihood estimator of F is
$$F_n(t) = \left[\sum_{i:\, x_i \le t} 1/G(x_i)\right] \Big/ \left[\sum_{i=1}^{n} 1/G(x_i)\right]$$
for $t \ge 0$ and $n \ge 1$; and it is easily seen that
$\sqrt{n}[F_n(t) - F(t)]$ converges in distribution for all $t > a_F \ge a_G$ iff (15) holds.
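When G is known, the estimator in this remark is simply a self-normalized inverse-probability-weighted empirical distribution function. A minimal sketch, in which the function name and the choice of G in the example are hypothetical:

```python
def weighted_F(t, xs, G):
    """Known-G estimate of F(t): weights 1/G(x_i), normalized to sum to one."""
    weights = [1.0 / G(x) for x in xs]
    return sum(w for x, w in zip(xs, weights) if x <= t) / sum(weights)


# Example with G uniform on (0, 1], i.e. G(x) = x for 0 < x <= 1.
value = weighted_F(0.5, [0.25, 0.5, 1.0], lambda x: x)
```

Small observed values receive the largest weights $1/G(x_i)$. This is why the weights must not be too heavy-tailed under the observed distribution, which is the role played by condition (15).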
8. If (15) fails, then other limiting distributions may obtain. Suppose, for
example, that F is continuous, that $a_F = 0$, and that $G = F^c$, where $1 < c < \infty$.
Let $\delta = 1/(1 + c)$. Then $n^\delta[\hat F_n(t) - F(t)]$ has a limiting stable distribution for all
$t > 0$ for which $F(t) < 1$. To see this, fix t and write
properties of empirical processes. Let $z_i = (1/C(x_i))\,I_{(0,t]}(x_i)$ for $i = 1, 2, \ldots$.
Then $z_1, z_2, \ldots$ are i.i.d. with common mean $\Lambda(t)$. Now, it is easily seen that $z_i$
is in the domain of attraction of a stable distribution with characteristic exponent
$\gamma = (1 + c)/c$ and skewness parameter 1, in Feller's (1966, pages 540-543)
terminology. So, $n^\delta[\Lambda_n(t) - \Lambda(t)]$ has a limiting stable distribution as $n \to \infty$. (In
fact, the same stable distribution is obtained for all t.) That $n^\delta[\hat F_n(t) - F(t)]$ has
a limiting distribution now follows from (17) by using a stable distribution to
bound $z_1^2 + \cdots + z_n^2$ in (18).
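The exponent $\gamma = (1 + c)/c$ can be checked by a short tail computation. The sketch below assumes that F is continuous, so that $F(X)$ is uniform under F, and writes $P_*$ for probabilities under the observed (truncated) distribution; this notation is introduced only for the sketch.

```latex
\begin{align*}
\alpha &= \int G\,dF = \int_0^1 u^c\,du = \frac{1}{1+c}, \\
P_*\{F(x_1) \le v\} &= \alpha^{-1}\int_0^v u^c\,du = v^{1+c}, \qquad 0 \le v \le 1, \\
C(x) &= \alpha^{-1}F(x)^c\,[1 - F(x)] \sim (1+c)\,F(x)^c \qquad (x \downarrow 0), \\
P_*\{z_1 > u\} &\sim P_*\bigl\{F(x_1) \le [(1+c)u]^{-1/c}\bigr\}
  = [(1+c)u]^{-(1+c)/c} \qquad (u \to \infty).
\end{align*}
```

Thus $z_1$ has a regularly varying right tail with index $\gamma = (1 + c)/c \in (1, 2)$ and a bounded left tail, which is consistent with skewness parameter 1. Centered averages of such variables are normalized by $n^{1 - 1/\gamma} = n^{1/(1+c)} = n^\delta$, matching the rate stated above.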
REFERENCES
BAROUCH, E. and KAUFMAN, G. M. (1975). Probabilistic modelling of oil and gas discovery. Energy
133-152.
BHATTACHARYA, P. K. (1983). Justification for a K-S type test for the slope of a truncated regression.
Ann. Statist. 11 697-701.
BHATTACHARYA, P. K., CHERNOFF, H. and YANG, S. S. (1983). Nonparametric estimation of the
slope of a truncated regression. Ann. Statist. 11 505-514.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New York.
BREIMAN, L. (1968). Probability. Addison-Wesley, Reading, Massachusetts.
BRESLOW, N. and CROWLEY, J. (1974). A large sample study of the life table and product limit
estimates under random censorship. Ann. Statist. 2 437-453.
FELLER, W. (1966). An Introduction to Probability Theory and its Applications, Vol. 2. Wiley, New
York.
JACKSON, J. C. (1974). The analysis of quasar samples. Mon. Not. R. Astr. Soc. 166 281-295.
KAPLAN, E. L. and MEIER, P. (1958). Nonparametric estimation from incomplete observations. J.
Amer. Statist. Assoc. 53 457-481.
KRAMER, M. (1983). Stopping a size dependent exploration process. Ph.D. Thesis, The University of
Michigan.
LYNDEN-BELL, D. (1971). A method of allowing for known observational selection in small samples
applied to 3CR quasars. Mon. Not. R. Astr. Soc. 155 95-118.
NICOLL, J. F. and SEGAL, I. E. (1980). Nonparametric elimination of the observational cutoff bias.
Astron. Astrophys. 82 L3-L6.
SEGAL, I. E. (1975). Observational validation of the chronometric cosmology: I. Preliminaries and
the red shift-magnitude relation. Proc. Nat. Acad. Sci. 72 2437-2477.
SKOROHOD, A. V. (1956). Limit theorems for stochastic processes. Theor. Probab. Appl. 1 261-290.
STARR, N. (1974). Optimal and adaptive stopping based on capture times. J. Appl. Probab. 11
294-301.
STARR, N., WARDROP, R. and WOODROOFE, M. (1976). Estimating the mean from delayed
observations. Z. Wahrsch. verw. Gebiete 35 103-113.
VARDI, Y. (1982a). Nonparametric estimation in the presence of length bias. Ann. Statist. 10
616-620.
VARDI, Y. (1982b). Nonparametric estimation in renewal processes. Ann. Statist. 10 772-785.