Estimating a Distribution Function with Truncated Data

Michael Woodroofe

The Annals of Statistics, Vol. 13, No. 1. (Mar., 1985), pp. 163-177.

Stable URL:
http://links.jstor.org/sici?sici=0090-5364%28198503%2913%3A1%3C163%3AEADFWT%3E2.0.CO%3B2-3

The Annals of Statistics is currently published by Institute of Mathematical Statistics.


ESTIMATING A DISTRIBUTION FUNCTION WITH TRUNCATED DATA

BY MICHAEL WOODROOFE

The University of Michigan and Rutgers University

Let $\mathscr{P}$ be a finite population with $N \ge 1$ elements; for each $e \in \mathscr{P}$ let $X_e$
and $Y_e$ be independent, positive random variables with unknown distribution
functions $F$ and $G$; and suppose that the pairs $(X_e, Y_e)$ are i.i.d. We consider
the problem of estimating $F$, $G$, and $N$ when the data consist of those pairs
$(X_e, Y_e)$ for which $e \in \mathscr{P}$ and $Y_e \le X_e$. The nonparametric maximum
likelihood estimators (MLEs) of $F$ and $G$ are described; and their asymptotic
properties as $N \to \infty$ are derived. It is shown that the MLEs are consistent
against pairs $(F, G)$ for which $F$ and $G$ are continuous, $G^{-1}(0) \le F^{-1}(0)$, and
$G^{-1}(1) \le F^{-1}(1)$. $\sqrt{n} \times$ estimation error for $F$ converges in distribution to a
Gaussian process if $\int_0^\infty (1/G)\,dF < \infty$, but may fail to converge if this integral
is infinite.

1. Introduction. Consider a finite population $\mathscr{P}$ whose size $N$ is large, but
otherwise unknown. For each element $e \in \mathscr{P}$ let $X_e$ and $Y_e$ denote independent,
positive random variables with distribution functions $F$ and $G$, say; and suppose
that $(X_e, Y_e)$, $e \in \mathscr{P}$, are i.i.d., as $(X, Y)$, say. Finally, suppose that one observes
(only) those pairs $(X_e, Y_e)$ for which $Y_e \le X_e$, but not the labels $e \in \mathscr{P}$. The
problem considered is that of estimating $F$, $G$, and $N$. Nonparametric maximum
likelihood estimators (MLEs) of $F$ and $G$, described in (8) and (9) below, have
been derived by several authors, listed below, from different perspectives. Here
the asymptotic properties of the estimators are studied, and still another
derivation suggested.
This model arises in astronomy. The absolute and apparent luminosities of an
astronomical object are defined to be its brightness at a fixed distance and as
observed on earth; and magnitude is defined to be the negative logarithm of
luminosity. In some models, the redshift $z$ and the absolute magnitude $M$ of
astronomical objects are assumed to be independent random variables which are
related to the apparent magnitude $m$ by the equation
(1)  $m = f(z) + M$,
where $f$ is a known function, or at least a nearly known one. For example,
Hubble's Law specifies that $f(z) = 5 \log z$, and Segal's Chronometric Theory
specifies that $f(z) = (5/2)\log[z/(1 + z)]$. See Segal (1975). Of course, one can
only detect objects which are sufficiently bright, say $m \le m^*$. Then, letting
$X = \exp[-f(z)]$ and $Y = \exp[M - m^*]$ yields the model described above.

Received June 1983; revised March 1984.
Presented at the Jack Kiefer-Jacob Wolfowitz Memorial Statistical Research Conference;
dedicated to their memories.
Research supported by the National Science Foundation under MSC-8101897.
AMS 1980 subject classifications. Primary 62F20; secondary 62G05.
Key words and phrases. Nonparametric, maximum likelihood estimation, consistency, asymptotic
distributions.

In other applications, the $X_e$ may be the sizes of hidden objects for which one
searches for one unit of time and $T_e = Y_e/X_e$ might be the time at which one
would find the object $e$, if the search were continued indefinitely. Then the
conditional probability of finding object $e$ given $X_e$ is $G(X_e)$, an unknown but
increasing function of $X_e$. For example, Barouch and Kaufman (1975) have
described models for exploring for petroleum reserves in which the probability of
finding a given pool is proportional to the pool's size. Letting $X$ denote a pool's
size and $T$ denote the time at which it would be found in an infinite search yields
a model which is closely related to Barouch and Kaufman's (1975).
Starr (1974), Starr, Wardrop, and Woodroofe (1976), and Kramer (1983) have
considered a class of optimal stopping problems in which one searches for hidden
objects and receives a reward depending on the objects found, say the sum of
their sizes, less a cost of sampling. Assuming a known stochastic model and
certain other conditions, these authors obtain explicit solutions to the optimal
stopping problem. In addition, they propose adaptive procedures for use when
the total number of objects N is unknown. The estimators studied here may
allow implementation of adaptive procedures in which other quantities, like F,
are estimated sequentially.
Nonparametric MLEs of $F$ and $G$ were derived by Lynden-Bell (1971), who
described another application to astronomy. See also Jackson (1974). Nicoll and
Segal (1980) derived the MLEs for grouped data; and Bhattacharya, Chernoff, and
Yang (1983) derived MLEs from a conditional likelihood function of certain
counts, given the observed $X$-values. The latter paper also computes the
information matrix for its model. Bhattacharya et al. (1983) construct nonparametric
estimators of regression parameters in models like (1), and show asymptotic
normality of estimation error, properly normalized; and Bhattacharya (1983)
considers the asymptotic distribution of a goodness of fit statistic with a view
towards testing hypotheses about regression parameters. None of these papers
give conditions for the consistency and asymptotic normality of the MLEs of $F$
and $G$, however.
Here asymptotic properties of these estimators are studied as $N \to \infty$. In
Section 2, the conditional distributions of $X$ and $Y$ given $Y \le X$ are related to
the unconditional distributions $F$ and $G$. The estimators are described in Section
3. Section 4 considers consistency; if $F$ and $G$ are continuous and if the lower
and upper endpoints of the convex support of $G$ are individually less than or
equal to those of $F$, then the estimators converge to the true distribution functions
$F$ and $G$ in probability as $N \to \infty$. Sections 5 and 6 consider normalized estimation
error for the distribution functions. Here $\sqrt{n} \times$ estimation error for $F$ converges
in distribution to a Gaussian process if $\int_0^\infty (1/G)\,dF < \infty$; but the asymptotic
variance may be infinite if this integral diverges.
There is some similarity between the estimators studied here and the estimator
of Kaplan and Meier (1958), and hence with the asymptotic results of Breslow
and Crowley (1974). There are also differences. The Kaplan-Meier estimator
would be appropriate if $X_i \wedge Y_i = \min(X_i, Y_i)$ and $\delta_i = I\{X_i \le Y_i\}$ were observed
for $1 \le i \le N$; here both $X_i$ and $Y_i$ are observed if $Y_i \le X_i$, and nothing is
observed otherwise. In terms of the asymptotic distributions, this difference leads
to the possibility of an infinite variance for $\sqrt{n} \times$ estimation error.
There is also some similarity with recent results of Vardi (1982a, 1982b). He
considers generalizations of our model when $G$ is known, and obtains both
nonparametric MLEs and asymptotic distributions.

2. A Transformation. Let $X$ and $Y$ denote independent, positive random
variables with distribution functions $F$ and $G$, taken to be continuous from the
right. Let $H_*$ denote the joint distribution function of $X$ and $Y$ given $Y \le X$; and
let $F_*$ and $G_*$ denote the marginal distribution functions of $X$ and $Y$ given
$Y \le X$. Thus,

(2)  $H_*(x, y) = \alpha^{-1} \int_0^x G(y \wedge z)\,dF(z)$,

$F_*(x) = H_*(x, \infty)$ and $G_*(y) = H_*(\infty, y)$, $0 \le x, y < \infty$,

where $\alpha = \int_0^\infty G(z)\,dF(z) = \int_0^\infty [1 - F(z-)]\,dG(z)$ is assumed to be positive. Here
$y \wedge z$ denotes the minimum of $y$ and $z$ for $0 \le y, z < \infty$; $F(z-) = P\{X < z\}$ for
$z \ge 0$; and $\int_a^b = \int_{(a,b]}$ for $0 \le a < b \le \infty$. There is little hope of finding consistent
estimators of $F$ and $G$ from the data described in the introduction, unless $F_*$ and
$G_*$ determine $F$ and $G$. So, this question is investigated first.
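As a worked illustration (an added example, not part of the original text), take $F$ and $G$ both uniform on $(0, 1)$, the case used later in Example 1. Conditioning on $Y \le X$ then gives the following, where the only inputs are the definitions of $\alpha$, $F_*$, and $G_*$ above:

```latex
\begin{align*}
\alpha &= \int_0^1 G(z)\,dF(z) = \int_0^1 z\,dz = \tfrac12, \\
F_*(x) &= \alpha^{-1}\int_0^x G(z)\,dF(z) = 2\int_0^x z\,dz = x^2, \\
G_*(y) &= \alpha^{-1}\int_0^y [1 - F(z-)]\,dG(z) = 2\int_0^y (1 - z)\,dz = 2y - y^2,
\qquad 0 \le x, y \le 1.
\end{align*}
```

The first marginal agrees with the value $F_*(x) = x^2$ quoted in Example 1. Relative to the uniform distribution, $F_*$ shifts mass toward large $x$ and $G_*$ shifts mass toward small $y$, which is the selection effect the estimators have to undo.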
If $K$ is any distribution function on $[0, \infty)$, let

$a_K = \inf\{z > 0: K(z) > 0\}$

and

$b_K = \sup\{z > 0: K(z) < 1\}$,

so that $(a_K, b_K)$ is the interior of the convex support of $K$. Then $\alpha > 0$ in (2) if
$a_G < b_F$, and $\alpha = 0$ unless $a_G \le b_F$. If $\alpha > 0$ and if $F_*$ and $G_*$ are related to $F$ and
$G$ by (2), then $a_{F_*} = \max\{a_F, a_G\}$, $b_{F_*} = b_F$, $a_{G_*} = a_G$, and $b_{G_*} = \min(b_F, b_G)$. In
addition, it is convenient to have the following notation: let
$\mathscr{K} = \{(F, G): F(0) = 0 = G(0),\ \alpha(F, G) > 0\}$,
$\mathscr{K}_0 = \{(F, G) \in \mathscr{K}: a_G \le a_F,\ b_G \le b_F\}$,
$T(F, G) = H_*$, $(F, G) \in \mathscr{K}$.
LEMMA 1. (i) Let $(F, G) \in \mathscr{K}$ and let $F_0$ and $G_0$ denote the conditional
distributions of $X$ and $Y$ given $X \ge a_G$ and $Y \le b_F$. Then $(F_0, G_0) \in \mathscr{K}_0$ and
$T(F_0, G_0) = T(F, G)$;
(ii) $T(\mathscr{K}) = T(\mathscr{K}_0)$.

PROOF. Since $Y \le X$ implies $X \ge a_G$ and $Y \le b_F$ w.p.1, $T(F, G) = T(F_0, G_0)$.
To see that $a_{G_0} \le a_{F_0}$, observe that $a_{G_0} = a_G$ since $(F, G) \in \mathscr{K}$ and that
$a_{F_0} = \max(a_F, a_G) \ge a_G = a_{G_0}$. A similar argument shows that $b_{G_0} \le b_{F_0}$ to complete
the proof of (i). Assertion (ii) then follows since $\mathscr{K}_0 \subset \mathscr{K}$.

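Recall that the cumulative hazard function of a distribution function $F$ (with
$F(0) = 0$) is defined by

$\Lambda(x) = \int_0^x dF(z)/[1 - F(z-)]$, $0 \le x < \infty$.

The cumulative hazard function $\Lambda$ uniquely determines the distribution $F$ by
the following algorithm; let $D$ denote the set of $x$ for which $0 \le x < b_F$ and
$\lambda(x) = \Lambda(x) - \Lambda(x-) > 0$; then

(3)  $1 - F(x) = \prod_{z \in D,\ z \le x} [1 - \lambda(z)] \exp\{-\Lambda_c(x)\}$,

where $\Lambda_c(x) = \Lambda(x) - \sum_{z \in D,\ z \le x} \lambda(z)$, $0 \le x < b_F$.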
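The passage from a cumulative hazard back to a distribution can be sketched for the purely discrete case, where the continuous part of the hazard vanishes and only the product over jumps remains. The function names and the three-point example below are illustrative assumptions, not taken from the paper.

```python
def hazards_from_masses(masses):
    """Hazard jumps lambda(z) = P{X = z} / P{X >= z} for ordered atoms."""
    remaining = 1.0  # P{X >= current atom}
    jumps = []
    for p in masses:
        jumps.append(p / remaining)
        remaining -= p
    return jumps


def survival_from_hazards(jumps):
    """Product formula: 1 - F(x_k) = prod_{j <= k} (1 - lambda_j)."""
    surv = 1.0
    out = []
    for lam in jumps:
        surv *= 1.0 - lam
        out.append(surv)
    return out


# Hypothetical distribution with atoms of mass 0.2, 0.3, 0.5 (in increasing order).
masses = [0.2, 0.3, 0.5]
jumps = hazards_from_masses(masses)
survival = survival_from_hazards(jumps)
```

Round-tripping masses to hazard jumps and back recovers the survival function, which is the sense in which the hazard determines the distribution; the same product appears in the estimator of $F$ in Section 3.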

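THEOREM 1. Suppose that $H_* \in T(\mathscr{K})$. Then there is a unique pair
$(F, G) \in \mathscr{K}_0$ for which $T(F, G) = H_*$. Here the pair $(F, G)$ is determined by the
conditions

(4)  $\Lambda(x) = \int_0^x dF_*(z)/C(z)$

and

$\int_y^\infty dG(z)/G(z) = \int_y^\infty dG_*(z)/C(z)$,

where
$C(z) = G_*(z) - F_*(z-)$, $0 \le z < \infty$.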

PROOF. By the lemma, there is at least one pair $(F, G) \in \mathscr{K}_0$ for which
$T(F, G) = H_*$. It is shown below that (4) holds for any such pair, and it then
follows that there is only one such pair, by (3) applied to $F$ and $G_1$, where
$G_1(z) = 1 - G(1/z-)$, $z > 0$. The proof of (4) depends on the simple identity
$C(z) = \alpha^{-1} G(z)[1 - F(z-)]$ for $z \ge 0$, which may be derived as follows:

$C(z) = P\{Y \le z \mid Y \le X\} - P\{X < z \mid Y \le X\} = \alpha^{-1} P\{Y \le z \le X\} = \alpha^{-1} G(z)[1 - F(z-)]$

for $0 \le z < \infty$. Since $a_G \le a_F$, it follows easily that

$\int_0^x dF_*(z)/C(z) = \int_{a_F}^x G(z)\,dF(z)/\alpha C(z)$

for all $x \ge a_F$; and both sides vanish for $x < a_F$. This establishes the first assertion
in (4) and the second may be established similarly.

COROLLARY 1. Let $(F, G) \in \mathscr{K}$ and let $F_0$ and $G_0$ be the conditional
distributions of $X$ and $Y$ given $X \ge a_G$ and $Y \le b_F$, as in Lemma 1. Then $(F_0, G_0)$ is the
only pair in $\mathscr{K}_0$ for which $T(F_0, G_0) = T(F, G)$.

COROLLARY 2. Let $T_0$ denote the restriction of $T$ to $\mathscr{K}_0$. Then $T_0$ has an
inverse function.

PROOFS. Lemma 1 asserts that $(F_0, G_0) \in \mathscr{K}_0$ and $T(F_0, G_0) = T(F, G)$; and
the theorem asserts that there is only one such pair. This establishes the first
corollary. The second then follows, since $(F_0, G_0) = (F, G)$ when $(F, G) \in \mathscr{K}_0$.

REMARKS 1. The inversion formula of Theorem 1 uses only the marginal
distributions of $H_*$.

2. Let $\mathscr{D}$ denote the class of all distribution functions on $[0, \infty)$. Endow $\mathscr{D}$
with its weak topology; endow $\mathscr{D} \times \mathscr{D}$ with the product topology; and endow
$\mathscr{K}_0$ and $T(\mathscr{K})$ with their relative topologies. Then $T$ is easily seen to be
continuous at all $(F, G) \in \mathscr{K}$ which have no common points of discontinuity.
However, the inverse transformation to $T_0$ is not continuous. To see this let $F$
and $G$ be continuous distribution functions with support $[0, \infty)$; and let $G_n =
(G + \delta_n)/2$, where $\delta_n$ denotes the point mass at $n$ for $n \ge 1$. Then $T(F, G_n) \to
T(F, G)$ as $n \to \infty$, but $G_n$ does not converge to $G$.

3. Estimation. Now let $F$ and $G$ denote distribution functions for which
$(F, G) \in \mathscr{K}$; let $X$ and $Y$ denote independent random variables with
distribution functions $F$ and $G$; and let $(X_1, Y_1), \dots, (X_N, Y_N)$ be i.i.d. as $(X, Y)$. As
in the introduction, suppose that one observes only those pairs $(X_i, Y_i)$ for which
$i \le N$ and $Y_i \le X_i$. Suppose that there is at least one such pair, and let
$(x_1, y_1), \dots, (x_n, y_n)$ denote these pairs, so labeled that $(x_1, y_1), \dots, (x_n, y_n)$ are
conditionally i.i.d. given $n$.
To describe the estimators of $F$ and $G$, let $F_n^*$ and $G_n^*$ denote the empirical
distribution functions of $x_1, \dots, x_n$ and $y_1, \dots, y_n$,
(5)  $F_n^*(z) = (1/n)\#\{i \le n: x_i \le z\}$,
     $G_n^*(z) = (1/n)\#\{j \le n: y_j \le z\}$, $0 \le z < \infty$,
where $\#A$ denotes the cardinality of a set $A$. Thus, $F_n^*$ and $G_n^*$ estimate the
conditional distribution functions $F_*$ and $G_*$. Estimators of $F$ and $G$ may be
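constructed from $F_n^*$ and $G_n^*$ by using the inversion formula of Theorem 1. Let

(6)  $C_n(z) = G_n^*(z) - F_n^*(z-) = (1/n)\#\{i \le n: y_i \le z \le x_i\}$, $0 \le z < \infty$,

and observe that $C_n(x_i) \ge 1/n$ for all $i \le n$. Then Theorem 1 suggests estimating
the cumulative hazard function $\Lambda$ by

(7)  $\hat\Lambda_n(x) = \int_0^x dF_n^*(z)/C_n(z) = \sum_{i: x_i \le x} 1/nC_n(x_i)$, $0 \le x < \infty$.

Observe that $\hat\Lambda_n$ is a step function with discontinuities (only) at $x_1, \dots, x_n$. Thus,
Equation (3) suggests estimating $F$ by

(8)  $\hat F_n(x) = 1 - \prod_{i: x_i \le x} [1 - r(x_i)/nC_n(x_i)]$, $0 \le x < \infty$,

where $r(x_i) = \#\{k \le n: x_k = x_i\}$ for $1 \le i \le n$, the product extends over distinct
values of $x_1, \dots, x_n$, and an empty product is to be interpreted as one. Of course,
a similar construction is possible for the estimation of $G$. After some algebra, one
is led to the estimator

(9)  $\hat G_n(y) = \prod_{j: y_j > y} [1 - s(y_j)/nC_n(y_j)]$, $0 \le y < \infty$,

where $s(y_j) = \#\{k \le n: y_k = y_j\}$ for $1 \le j \le n$.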
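The construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: it takes the estimators in the product-limit form usually attributed to Lynden-Bell (1971), with $nC_n(z)$ computed as the number of observed pairs satisfying $y_i \le z \le x_i$, and the function names and three-pair data set are hypothetical.

```python
def risk_count(xs, ys, z):
    """n * C_n(z): number of observed pairs with y_i <= z <= x_i."""
    return sum(1 for x, y in zip(xs, ys) if y <= z <= x)


def lynden_bell_F(xs, ys):
    """Return [(x, F_hat(x))] at the distinct observed x values, in increasing order."""
    surv = 1.0  # running product, equal to 1 - F_hat
    out = []
    for x in sorted(set(xs)):
        r = xs.count(x)  # multiplicity r(x)
        surv *= 1.0 - r / risk_count(xs, ys, x)
        out.append((x, 1.0 - surv))
    return out


def lynden_bell_G(xs, ys):
    """Return [(y, G_hat(y))] at the distinct observed y values, in increasing order."""
    prod = 1.0  # product of factors over distinct y_j strictly above the current y
    out = []
    for y in sorted(set(ys), reverse=True):
        out.append((y, prod))
        s = ys.count(y)  # multiplicity s(y)
        prod *= 1.0 - s / risk_count(xs, ys, y)
    out.reverse()
    return out


# Hypothetical truncated sample: every pair satisfies y <= x.
xs = [0.5, 0.8, 0.9]
ys = [0.1, 0.3, 0.6]
F_hat = lynden_bell_F(xs, ys)
G_hat = lynden_bell_G(xs, ys)
```

In this toy sample the risk counts at the three $x$ values are 2, 2, and 1, so the estimated distribution of $X$ steps through 0.5, 0.75, and 1. The count of 1 at the largest $x$ is what forces the last factor to zero.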

The estimators $\hat F_n$ and $\hat G_n$ were derived by Lynden-Bell (1971). Suppose, for
simplicity, that there are no ties among $x_1, \dots, x_n, y_1, \dots, y_n$ and consider
estimating $F$ and $G$ by distributions which are supported by $\{x_1, \dots, x_n\}$
and $\{y_1, \dots, y_n\}$. For such distributions, the conditional likelihood function given
$n$ is

where $p_1, \dots, p_n$ and $q_1, \dots, q_n$ are the masses assigned to $x_1, \dots, x_n$ and
$y_1, \dots, y_n$. This likelihood function may be maximized with respect to $p_1, \dots,
p_n$ and $q_1, \dots, q_n$; and the estimators $\hat F_n$ and $\hat G_n$ result, provided that (10) below
does not occur. Alternatively, one may show that $F_n^*$ and $G_n^*$ are the nonparametric,
maximum likelihood estimators of $F_*$ and $G_*$ and then use the invariance
properties of maximum likelihood estimators. The alternative derivation is not
substantially simpler than the direct one, however.
The estimators $\hat F_n$ and $\hat G_n$ may be supported by proper subsets of $\{x_1, \dots, x_n\}$
and $\{y_1, \dots, y_n\}$. Let $x_{(1)} < x_{(2)} < \dots < x_{(n)}$ and $y_{(1)} < \dots < y_{(n)}$ denote the
ordered values of $x_1, \dots, x_n$ and $y_1, \dots, y_n$. If
(10)  $nC_n[x_{(k)}] = 1$, for some $k$, $1 \le k < n$,
then

This is a disturbing property of the estimators, since it may lead to unreasonable
estimates. For example, it is possible to have $\hat F_n[x_{(1)}] = 1$. It is shown below that
the probability of (10) approaches zero as $N \to \infty$, if $F$ and $G$ are continuous; but
this will be of little comfort when (10) occurs.
The problems which result from (10) may be overcome in a simple, if ad hoc,
manner. Let $k_n$ be a nonincreasing function for which $k_n(x) > k_n[x_{(n)}] = 1/n$ for
all $x < x_{(n)}$. If $C_n$ is replaced by

in (9), then the resulting estimator $F_n^0$ is not supported by any proper subset of
$\{x_1, \dots, x_n\}$. In fact, $1/nk_n[x_{(i)}]$ is the maximum proportion of the estimated
probability $1 - F_n^0[x_{(i)}-]$ which the experimenter is willing to assign to $x_{(i)}$ for
$i = 1, \dots, n$.

TABLE 1
Calculation of $\hat F_n$

The $(x, y)$ pairs are listed in order of increasing $x$ values; and $p_k = \hat F_n[x_{(k)}] - \hat F_n[x_{(k-1)}]$,
$k = 1, \dots, 10$. The sample average and MLE of the mean of $F$ are
$\bar x = .6116$ and $\hat\mu_n = .5192$.

It is especially interesting that one may estimate $\alpha$, the probability that
$Y \le X$, when one observes only those pairs $(X_i, Y_i)$ for which $i \le N$ and $Y_i \le X_i$.
The nonparametric maximum likelihood estimator of $\alpha$ is

$\hat\alpha_n = \int_0^\infty \hat G_n\,d\hat F_n$.

It is easily seen that $\hat\alpha_n > 0$ if $nC_n[x_{(i)}] > 1$ for all $i \le n - 1$; otherwise, $\hat F_n$ and $\hat G_n$
may be replaced by $F_n^0$ and $G_n^0$. Having estimated $\alpha$, one may then estimate the
population size by
$\hat N_n = n/\hat\alpha_n$.
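A small sketch of the plug-in estimates of the truncation probability and the population size follows. It assumes the product-limit forms of the two estimators (masses of the estimated $F$ at the observed $x$ values, and the estimated $G$ as a product over $y_j > t$); the function name and the toy data are hypothetical, and the sketch does not guard against an estimate of zero, which the text notes can occur.

```python
def estimate_alpha_and_N(pairs):
    """Return (alpha_hat, N_hat) from observed pairs (x, y) with y <= x."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    def risk(z):
        # n * C_n(z): pairs with y_i <= z <= x_i
        return sum(1 for x, y in pairs if y <= z <= x)

    # Masses that the product-limit estimate of F assigns to the distinct x values.
    masses = {}
    surv = 1.0
    for x in sorted(set(xs)):
        new_surv = surv * (1.0 - xs.count(x) / risk(x))
        masses[x] = surv - new_surv
        surv = new_surv

    def G_hat(t):
        # Product over distinct observed y values strictly greater than t.
        prod = 1.0
        for y in set(ys):
            if y > t:
                prod *= 1.0 - ys.count(y) / risk(y)
        return prod

    # alpha_hat is the integral of G_hat with respect to F_hat.
    alpha_hat = sum(p * G_hat(x) for x, p in masses.items())
    return alpha_hat, n / alpha_hat


# Hypothetical truncated sample of n = 3 pairs.
alpha_hat, N_hat = estimate_alpha_and_N([(0.5, 0.1), (0.8, 0.3), (0.9, 0.6)])
```

For this sample the estimated truncation probability is 0.75, so three observed pairs suggest a population of about four.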
EXAMPLE 1. When $F$ and $G$ are both the uniform distribution on the unit
interval, $F_*(x) = x^2$ for $0 < x < 1$ and the conditional distribution of $y_1$ given $x_1$
is uniform on the interval $(0, x_1]$. To illustrate the properties of the estimators
$\hat F_n$ and $\hat G_n$, $n = 10$ pairs of $(x, y)$ values were simulated from the latter joint
distribution. The results are listed in Table 1, along with the values of $C_n$ and $\hat F_n$.
Observe that there is only one data point in the interval $(0, 1/3]$ and four in the
interval $(2/3, 1]$, reflecting the selection bias. The estimator $\hat F_n$ attempts to correct
for this bias by assigning higher weight to the smaller values of $x_1, \dots, x_n$. One
may see the extent of this correction by comparing the observed average $\bar x = .612$
with the MLE of the mean of $F$, $\hat\mu_n = \int_0^1 x\,d\hat F_n = .519$. Of course, the means of $F_*$
and $F$ are 2/3 and 1/2. While assigning larger weights to smaller values may correct
for some bias, it also increases variability. This is illustrated by the erratic
behavior of $\hat F_n(x)$ for $x \le 1/2$.
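The sampling scheme behind this example can be sketched by simulating the full population and discarding the pairs with $Y > X$. The function name, the seed, and the population size are assumptions made for illustration; the paper's own ten simulated pairs are not reproduced here.

```python
import random


def simulate_truncated_sample(N, seed=0):
    """Draw N independent uniform pairs (X, Y) and keep only those with Y <= X."""
    rng = random.Random(seed)
    kept = []
    for _ in range(N):
        x, y = rng.random(), rng.random()
        if y <= x:
            kept.append((x, y))
    return kept


N = 20000  # hypothetical population size
pairs = simulate_truncated_sample(N, seed=1)
n = len(pairs)
fraction_kept = n / N                    # should be near alpha = 1/2
mean_x = sum(x for x, _ in pairs) / n    # should be near the mean of F_*, 2/3
```

The retained fraction estimates the truncation probability, and the naive sample mean of the retained $x$ values sits near 2/3 rather than the true mean 1/2, which is the selection bias the estimator is meant to correct.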

4. Consistency. In this section, $F$ and $G$ denote continuous distribution
functions for which $(F, G) \in \mathscr{K}$; and $(X_1, Y_1), (X_2, Y_2), \dots$ denote i.i.d. random
vectors for which $X_1 \sim F$ and $Y_1 \sim G$ are independent. We imagine the estimators
$\hat F_n$ and $\hat G_n$ computed from the populations $\mathscr{P} = \{1, 2, \dots, N\}$ for $N = 1, 2, \dots$
and investigate the limiting behavior of $\hat F_n$ and $\hat G_n$ as $N \to \infty$. Let $(x_1, y_1)$,
$(x_2, y_2), \dots$ denote the successive values of $(X_i, Y_i)$ for which $Y_i \le X_i$. Then
$(x_1, y_1), (x_2, y_2), \dots$ are i.i.d. with the common joint distribution function $H_*$ of
(2). As in Section 3, let $n = n_N = \#\{i \le N: Y_i \le X_i\}$ for $N \ge 1$. Then $n \sim$
Binomial$(N, \alpha)$ for all $N \ge 1$; and the conditional distribution of $(x_1, y_1), \dots,
(x_k, y_k)$ given $n = k$ is the same as their unconditional distribution for $1 \le k \le
N$. Let $P_n$ denote conditional probability given $n$. Below, the $P_n$-probability limits
of $\hat F_n$ and $\hat G_n$ are determined as $n \to \infty$. It then follows that these are also the
limits in unconditional probability as $N \to \infty$.
The following lemma may be of independent interest, since it computes the
bias of the estimator $\hat\Lambda_n$.
LEMMA 2. Suppose that $F$ and $G$ are continuous and that $(F, G) \in \mathscr{K}_0$. If $h$ is
a measurable function for which $\int_0^\infty |h|\,d\Lambda < \infty$, then

$E_n\{\int_0^\infty h\,d\hat\Lambda_n\} = \int_0^\infty h\,d\Lambda - \int_0^\infty h(1 - C)^n\,d\Lambda$

for all $n \ge 1$, where $C(z) = \alpha^{-1}G(z)[1 - F(z)]$, $z \ge 0$. In particular,

$E_n\{\hat\Lambda_n(x)\} = \Lambda(x) - \int_0^x (1 - C)^n\,d\Lambda$.

PROOF. If $h$ is integrable with respect to $\Lambda$ and $n \ge 1$, then

$E_n\{\int_0^\infty h\,d\hat\Lambda_n\} = \sum_{i=1}^n E_n\{h(x_i)/nC_n(x_i)\}$.

Now, the conditional distribution of $nC_n(x_i) - 1 = \#\{j \le n: j \ne i,\ y_j \le x_i \le x_j\}$
given $n$ and $x_i$ is binomial with parameters $n - 1$ and $C(x_i)$ for each $i = 1, \dots, n$.
So,

(11)  $E_n\{1/nC_n(x_i) \mid x_i\} = [1 - (1 - C(x_i))^n]/nC(x_i)$

for all $i = 1, \dots, n$, by an elementary calculation. Since $d\Lambda = dF_*/C$, the first
assertion of the lemma now follows from multiplying (11) by $h(x_i)$, integrating
over $x_i$, and summing over $i = 1, \dots, n$. The second assertion then follows by
letting $h$ be the indicator of $[0, x]$ for fixed $x$, $0 < x \le b_F$.
Observe that the conditional bias of $\hat\Lambda_n(x)$ approaches zero as $n \to \infty$ for all
$x < b_F$, but may do so arbitrarily slowly.
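The "elementary calculation" in the proof is the standard identity $E[1/(1 + B)] = [1 - (1 - p)^n]/(np)$ for $B \sim$ Binomial$(n - 1, p)$, applied with $1 + B = nC_n(x_i)$ and $p = C(x_i)$. A short numerical check, with function names chosen here for illustration:

```python
from math import comb


def expected_inverse_count(n, p):
    """E[1 / (1 + B)] for B ~ Binomial(n - 1, p), by direct summation."""
    return sum(
        comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) / (1 + k)
        for k in range(n)
    )


def closed_form(n, p):
    """The closed form [1 - (1 - p)^n] / (n p)."""
    return (1 - (1 - p) ** n) / (n * p)


gap_small = abs(expected_inverse_count(5, 0.3) - closed_form(5, 0.3))
gap_large = abs(expected_inverse_count(20, 0.05) - closed_form(20, 0.05))
```

The closed form shows where the bias term comes from: the factor $(1 - p)^n$ decays geometrically for fixed $p$ but very slowly where $C$ is close to zero, which is why the bias can vanish arbitrarily slowly.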
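THEOREM 2. Let $F$ and $G$ be continuous distribution functions for which
$(F, G) \in \mathscr{K}$, and let $F_0$ and $G_0$ denote the conditional distributions of $X_1$ and $Y_1$
given $X_1 \ge a_G$ and $Y_1 \le b_F$, respectively. Then

$\sup_x |\hat F_n(x) - F_0(x)| \to 0$ and $\sup_y |\hat G_n(y) - G_0(y)| \to 0$ in $P_n$-probability as $n \to \infty$.

PROOF. Since the distribution function $H_*$ remains unchanged when $F$ and
$G$ are replaced by $F_0$ and $G_0$ by Lemma 1, it suffices to prove the theorem in the
special case that $(F, G) \in \mathscr{K}_0$. Moreover, it suffices to prove the convergence of
$\hat F_n$.
Given $\varepsilon$, $0 < \varepsilon < 1$, let $a > a_F$ be such that $\Lambda(a) < \varepsilon^2/4$ and let $B = B_{n,a}$ be the
event $B = \{\hat\Lambda_n(a) \le \varepsilon/2\}$. Then

$P_n(B^c) \le (2/\varepsilon) E_n\{\hat\Lambda_n(a)\} \le (2/\varepsilon)\Lambda(a) < \varepsilon/2$

for all $n \ge 1$ by Lemma 2. So, since $\hat F_n(z) \le \hat\Lambda_n(a)$ and $F(z) \le \Lambda(a)$ for $z \le a$, it
suffices to show that $P_n\{B,\ \sup_{x \ge a} |\hat F_n(x) - F(x)| \ge \varepsilon\} \to 0$ as $n \to \infty$.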
Let $\lambda_{ni} = 1/nC_n(x_i)$ for $1 \le i \le n$; and define $K_n$ and $K$ by

$K_n(x) = \prod_{i: a < x_i \le x} (1 - \lambda_{ni})$

and

$K(x) = \exp\{-[\Lambda(x) - \Lambda(a)]\}$

for $x \ge a$ and $n \ge 1$. Then $1 - \hat F_n(x) = [1 - \hat F_n(a)]K_n(x)$ and $1 - F(x) =
[1 - F(a)]K(x)$ for all $x \ge a$ and $n \ge 1$. If $B$ occurs, then

for all $x \ge a$ and $n \ge 1$ by simple algebra. So, it suffices to show that
$\sup_{x \ge a} |K_n(x) - K(x)| \to 0$ w.p.1 as $n \to \infty$ (on the space of $(x_i, y_i)$, $i \ge 1$). In fact,
since $K$ is continuous and each $K_n$ is monotone, it suffices to show that $K_n(x) \to
K(x)$ w.p.1 for each fixed $x$, $a \le x < b_F$ (cf. Breiman, 1968, page 160).
Since $\sup_{x \ge 0} |F_n^*(x) - F_*(x)| \to 0 \leftarrow \sup_{y \ge 0} |G_n^*(y) - G_*(y)|$ w.p.1 as $n \to \infty$ and
since $C$ is positive and continuous on the interval $(a_G, b_F)$, one finds that
$\sup_{a \le z \le x} |1/C_n(z) - 1/C(z)| \to 0$ w.p.1 as $n \to \infty$ for all $x$, $a < x < b_F$. So,

$\hat\Lambda_n(x) - \hat\Lambda_n(a) = \int_a^x dF_n^*/C_n \to \int_a^x dF_*/C = \Lambda(x) - \Lambda(a)$

w.p.1 as $n \to \infty$ for $a < x < b_F$. See Billingsley (1968, page 34). Since $\Lambda$ is
continuous and $\hat\Lambda_n$, $n \ge 1$, are monotone, the convergence must be uniform on
$a \le x \le b$ for any $b < b_F$; and it follows that the maximum of $\lambda_{ni}$ over any such
interval $[a, b]$ approaches zero w.p.1 as $n \to \infty$. To complete the proof, let
(12)  $R_n(a, x) = \sum_{i: a < x_i \le x} \log[1 - \lambda_{ni}] + [\hat\Lambda_n(x) - \hat\Lambda_n(a)]$
for $a < x < b_F$ and $n \ge 1$. Then, by expanding $\log(1 - \lambda)$ in a Taylor series about
$\lambda = 0$, one finds that there are intermediate points $\xi_{ni}$ for which $|1 - \xi_{ni}| \le \lambda_{ni}$
for $1 \le i \le n$,
$|R_n(a, x)| = \tfrac12 \sum_{i: a < x_i \le x} \xi_{ni}^{-2} \lambda_{ni}^2 \to 0$
and
$K_n(x) = \exp\{-[\hat\Lambda_n(x) - \hat\Lambda_n(a)] + R_n(a, x)\} \to K(x)$
w.p.1 as $n \to \infty$ for $a < x < b_F$. This completes the proof.

COROLLARY 3. If $F$ and $G$ are continuous and $(F, G) \in \mathscr{K}_0$, then
$\sup |\hat F_n - F| \to 0 \leftarrow \sup |\hat G_n - G|$ in $P_n$-probability as $n \to \infty$.

COROLLARY 4. If $F$ and $G$ are continuous and $(F, G) \in \mathscr{K}_0$, then $\hat\alpha_n \to \alpha$ in
$P_n$-probability as $n \to \infty$ and $\hat N_n/N \to 1$ in probability as $N \to \infty$.

COROLLARY 5. If $F$ and $G$ are continuous and $(F, G) \in \mathscr{K}_0$, then
$P_n\{nC_n[x_{(i)}] = 1$, for some $i \le n - 1\} \to 0$
and
$\min\{nC_n[x_{(i)}]: 1 \le i \le (1 - \varepsilon)n\} \to \infty$
in $P_n$-probability as $n \to \infty$ for all $\varepsilon$, $0 < \varepsilon < 1$.

PROOFS. Corollary 3 is clear, and the convergence of $\hat\alpha_n$ to $\alpha$ in Corollary 4
follows. That $\hat N_n/N \to 1$ then follows, since $n/N \to \alpha$ w.p.1 as $N \to \infty$.
The second assertion in Corollary 5 follows from the relation
$\hat F_n[x_{(i)}] - \hat F_n[x_{(i)}-] = (1 - \hat F_n[x_{(i)}-])/nC_n[x_{(i)}]$
for all $i \le n$ and $n \ge 1$. Let $0 < \varepsilon < 1$ and $k = k(n, \varepsilon) = [(1 - \varepsilon)n] + 1$, where $[\cdot]$
denotes the greatest integer function. Then $1/(1 - \hat F_n[x_{(k)}])$ is stochastically
bounded and $\max_{i \le n} \hat F_n(x_i) - \hat F_n(x_i-) \to 0$ in $P_n$-probability as $n \to \infty$, both by
Theorem 2. This proves the second assertion in Corollary 5. The first assertion
then follows from the second and its dual, obtained by reversing the roles of
$(X, Y)$ and $(1/Y, 1/X)$, by observing that $nC_n[x_{(i)}] = 1$ implies that $nC_n[y_{(i+1)}-]
= 1$ for $1 \le i \le n - 1$.

REMARK 3. In the astronomy example, improved instrumentation might
change $m^*$. In turn, this could change the definitions of $Y$, $a_G$, and $F_0$, the
asymptotic value of $\hat F_n$.

REMARK 4. Since the joint distribution $H_*$ depends on $F$ and $G$ only through
$F_0$ and $G_0$, it is not possible to test the hypotheses $a_G \le a_F$ and $b_G \le b_F$ using
$(x_1, y_1), \dots, (x_n, y_n)$.

5. Convergence on compact intervals. For $0 \le a < b \le \infty$, let $D[a, b]$
be the space of all functions $f$ from $[a, b]$ into $R = (-\infty, \infty)$ which are right
continuous on $[a, b)$, have left-hand limits on $(a, b]$, and are continuous at $b$.
Endow $D[a, b]$ with the Skorohod topology, as described by Billingsley (1968,
Section 14). For each $n \ge 1$, define the stochastic processes $X_n$ and $Y_n$ by

$X_n(t) = \sqrt{n}[F_n^*(t) - F_*(t)]$

and
$Y_n(t) = \sqrt{n}[G_n^*(t) - G_*(t)]$, $0 \le t \le \infty$,
where $F_n^*$ and $G_n^*$ are as in (5); and note the change in the use of the symbols
"X" and "Y." Then $(X_n, Y_n)$ is a random element with values in $D^2[0, \infty] =
D[0, \infty] \times D[0, \infty]$ for each $n \ge 1$. If $F$ and $G$ are continuous, then the conditional
distributions of $(X_n, Y_n)$ given $n$ converge

$(X_n, Y_n) \Rightarrow (X, Y)$ as $n \to \infty$,

where $X$ and $Y$ are jointly Gaussian processes on $[0, \infty)$ with continuous sample
paths and covariance structure
$\rho_1(s, t) = F_*(s) - F_*(s)F_*(t)$, $0 \le s \le t < \infty$,
$\rho_2(s, t) = G_*(s) - G_*(s)G_*(t)$, $0 \le s \le t < \infty$,
and $\rho_3(s, t) = H_*(s, t) - F_*(s)G_*(t)$, $0 \le s, t \le \infty$.

Indeed, the convergence of the finite dimensional distributions of $(X_n, Y_n)$ follows
directly from the univariate central limit theorem and the Cramer-Wold device;
and the tightness of the distributions of the pairs $(X_n, Y_n)$, $n \ge 1$, follows from
that of the components.
Observe that the covariance functions $\rho_1$, $\rho_2$, and $\rho_3$ may be consistently
estimated.
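Now suppose that $F$ and $G$ are continuous and that $(F, G) \in \mathscr{K}_0$. Fix values
of $a$ and $b$ for which $a_G < a < b < b_F$ and let

$W_{n,a}(t) = \sqrt{n}\{[\hat\Lambda_n(t) - \hat\Lambda_n(a)] - [\Lambda(t) - \Lambda(a)]\}$

(14)  $= \int_a^t (1/CC_n)(X_n - Y_n)\,dF_n^* + \int_a^t C^{-1}\,dX_n$

w.p.1 for $a \le t \le b$ and $n \ge 1$. The processes $W_{n,a}$, $n \ge 1$, are random elements
with values in $D[a, b]$.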

THEOREM 3. Suppose that $F$ and $G$ are continuous and that $(F, G) \in \mathscr{K}_0$. If
$a_G < a < b < b_F$, then
$W_{n,a} \Rightarrow W^a = W_1^a + W_2^a$, as $n \to \infty$,
where

$W_1^a(t) = \int_a^t C(s)^{-2}[X(s)\,dG_*(s) - Y(s)\,dF_*(s)]$

and

$W_2^a(t) = X(t)/C(t) - X(a)/C(a)$, $a \le t \le b$.

PROOF. First observe that $C = G_* - F_*$ is positive and continuous on $[a, b]$,
since $a_G < a < b < b_F$. So, expressions like $X/C$ and $\int_a^t C^{-2} X\,dG_*$ define
continuous transformations from $D[a, b]$ back into $D[a, b]$. Since weak convergence
is preserved by such continuous transformations, it suffices to show that

in $P_n$-probability as $n \to \infty$, with $Z_n = X_n - Y_n$, $n \ge 1$. To see this, one may
replace $C_n$, $F_n^*$, and $Z_n$ by other random elements, also denoted by $C_n$, $F_n^*$,
and $Z_n$, which have the same joint distribution and converge to $C$, $F_*$, and
$Z = X - Y$ w.p.1 as $n \to \infty$. See Skorohod (1956). That $A_n \to 0$ w.p.1 then follows
from Theorem 5.5 of Billingsley (1968) by considering a sequence $t_n$, $n \ge 1$, of
random variables for which the supremum is nearly attained. The details are
omitted. For a closely related argument, see Breslow and Crowley (1974, pages
447-448).
Of course, one would like to set $a = a_F$ in Theorem 3. If $a_G < a_F$, then this is
possible. If $a_G = a_F$, then the limiting process may not be defined.

THEOREM 4. Suppose that $F$ and $G$ are continuous, that $(F, G) \in \mathscr{K}_0$, and
that $a_G = a_F$. If

(15)  $\int_0^\infty (1/G)\,dF < \infty$,

then $X(a)/C(a) \to 0$ and $W_1^a(t) \to \int_{a_F}^t C^{-2}[X\,dG_* - Y\,dF_*]$ in probability as
$a \downarrow a_F$, for $a_F < t < b_F$. Conversely, if (15) fails, then the variance of $W_1^a(t)$ diverges
to $\infty$ as $a \downarrow a_F$ for any $t \in (a_F, b_F)$.

PROOF. Recall that $C = \alpha^{-1}G(1 - F)$, so that $C(z) \sim \alpha^{-1}G(z)$ as $z \downarrow a_F$.
Suppose first that (15) holds. Then the variance of $X(a)/C(a)$ is at most
$C(a)^{-2}F_*(a) \le [1 - F(a)]^{-2} \int_{a_F}^a (1/G)\,dF$, which tends to zero as $a \downarrow a_F$. Next,
write $W_1^a = W_{11}^a - W_{12}^a$, where $W_{11}^a(t) = \int_a^t C^{-2} X\,dG_*$ and $W_{12}^a(t) = \int_a^t C^{-2} Y\,dF_*$
for $a_F < a < t < b_F$. Thus, to show that $\lim W_1^a(t)$ exists in probability for all
$t > a_F$, it suffices to show that the variances of $W_{11}^a(t)$ and $W_{12}^a(t)$ remain
bounded as $a \downarrow a_F$ for some $t < b_F$. If $a_F < a < z < b_F$, then the variance of
$W_{11}^a(z)$ is

for some constant $B$; and the last line is finite, by assumption. A similar argument
shows that the variance $\sigma_2^2(z)$ of $W_{12}^a(z)$ remains bounded as $a \downarrow a_F$, if (15) holds.
If (15) fails, then a careful examination of (16) shows that $\sigma_1^2(z) \to \infty$
as $a \downarrow a_F$. $\sigma_2^2(z)$ may either diverge or remain bounded, depending on whether
$\int (F/G)\,dF = \infty$ or $< \infty$, but one may show that $\sigma_2^2(z)/\sigma_1^2(z) \to 0$ in either case.
That the variance of $W_1^a(z)$ diverges is an easy consequence. The details are
omitted.

6. Convergence at an endpoint. In this section, we suppose that $F$ and
$G$ are continuous, that $(F, G) \in \mathscr{K}_0$, and that (15) holds. In this case, the limiting
distributions developed in the last section are valid when $a = a_F$. To avoid
trivialities and simplify the notation, we suppose that $a_G = a_F = 0$ throughout.
Fix a value of $b$ for which $0 < b < b_F$ and define processes

$W_n(t) = \sqrt{n}[\hat\Lambda_n(t) - \Lambda(t)]$

and
$Z_n(t) = \sqrt{n}[\hat F_n(t) - F(t)]$, $0 \le t \le b$, $n \ge 1$.
Then $W_n$ and $Z_n$ take values in $D[0, b]$ w.p.1 for all $n \ge 1$.

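THEOREM 5. Suppose that $F$ and $G$ are continuous, that $(F, G) \in \mathscr{K}_0$, that
(15) holds, and that $a_G = a_F = 0$. Then $W_n \Rightarrow W$ and $Z_n \Rightarrow Z$, as $n \to \infty$, where

$W(t) = X(t)/C(t) + \int_0^t C^{-2}[X\,dG_* - Y\,dF_*]$

and
$Z(t) = [1 - F(t)]W(t)$, $0 \le t \le b$,
with the convention $0/0 = 0$ when $t = 0$.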

PROOF. That $W$ is well defined follows from Theorem 4. To show that
$W_n \Rightarrow W$ as $n \to \infty$, it suffices to show that $W_n(a) \to 0$ in $P_n$-probability as first
$n \to \infty$ and then $a \to 0$. See Theorem 3. Now, as in (14),

$W_n(a) = \int_0^a (1/CC_n)(X_n - Y_n)\,dF_n^* + \int_0^a C^{-1}\,dX_n = I_n(a) + II_n(a)$

for $a > 0$ and $n \ge 1$. Given $n \ge 1$, $II_n(a)$ is a normalized sum of i.i.d. random
variables, and $E_n\{II_n(a)^2\} \le \int_0^a C^{-2}\,dF_*$, which is independent of $n$ and tends to
zero as $a \downarrow 0$, since $\int C^{-2}\,dF_*$ is finite. Thus, $II_n(a)$ converges to zero in
$P_n$-probability as $n \to \infty$ and then $a \downarrow 0$. Next, recall that $d\hat\Lambda_n = dF_n^*/C_n$ and
write

$|I_n(a)| \le \int_0^a C^{-1} |X_n - Y_n|\,d\hat\Lambda_n \le B_{n,a} \int_0^a C^{-1}\,d\hat\Lambda_n$

for $a > 0$ and $n \ge 1$, where $B_{n,a} = \sup_{t \le a} |X_n(t) - Y_n(t)|$. Now $B_{n,a} \to 0$ in
$P_n$-probability as $n \to \infty$ and then $a \downarrow 0$; and, by Lemma 2, $E_n\{\int_0^a C^{-1}\,d\hat\Lambda_n\} \le
\int_0^a C^{-1}\,d\Lambda$, which is independent of $n$ and tends to zero as $a \downarrow 0$. This completes
the proof that $W_n \Rightarrow W$ as $n \to \infty$.
Now consider $Z_n$. With $R_n$ as in (12),

(17)  $Z_n(t) = -[1 - F(t)]\sqrt{n}\{\exp[-(1/\sqrt{n})W_n(t) + R_n(0, t)] - 1\}$

for $0 \le t \le b$ and $n \ge 1$. So, it suffices to show that $\max_{t \le b} \sqrt{n}|R_n(0, t)| \to 0$ in
$P_n$-probability as $n \to \infty$. Now, $\max_{t \le b} |R_n(0, t)| = |R_n(0, b)|$; and

where $B_n = \max\{\xi_{ni}^{-2}: x_i \le b\}$ and $\xi_{ni}$, $1 \le i \le n$, are intermediate points as in
(12). Now, $B_n$ is bounded in $P_n$-probability, by Corollary 3; and the expectation
of the sum in (18) is at most $(1/n)\int_0^b C^{-2}\,dF_*$, as in the proof of Lemma 2. (The
conditional distribution of $nC_n(x_i) - 1$ given $n$ and $x_i$ is binomial $[n - 1, C(x_i)]$
for $1 \le i \le n$.) Thus, $R_n(0, b) = O_p(1/n) = o_p(1/\sqrt{n})$ in $P_n$-probability to complete
the proof.

REMARKS 5. By Corollary 5, Theorems 2, 3, and 5 are valid if $\hat F_n$ is replaced
by the modification $F_n^0$ of (9), provided the constants $k_{n1}, \dots, k_{nn}$ are bounded.
Indeed, Corollary 5 asserts that $P_n\{F_n^0(z) = \hat F_n(z)$ for all $z \le b\} \to 1$ as $n \to \infty$ for
any $b < b_F$, in this case.

6. There is a dual to Theorem 5. Suppose that $F$ and $G$ are continuous, that
$(F, G) \in \mathscr{K}_0$, that $b_G = b_F = \infty$, and that $\int_0^\infty 1/(1 - F)\,dG < \infty$. Let $U_n(t) =
\sqrt{n}[\hat G_n(t) - G(t)]$, $t \ge 0$, $n \ge 1$, and regard $U_n$ as random elements with values in
$D[a, \infty]$, where $a > a_F \ge a_G$. Then $U_n \Rightarrow U$, where

for $a \le t < \infty$.

7. The condition (15) is not surprising, since it is necessary for the convergence
in distribution of $\sqrt{n} \times$ estimation error even in the case when $G$ is known. In
this case, the nonparametric maximum likelihood estimator of $F$ is $\tilde F_n(t) =
[\sum_{i: x_i \le t} 1/G(x_i)]/[\sum_{i=1}^n 1/G(x_i)]$ for $t \ge 0$ and $n \ge 1$; and it is easily seen that
$\sqrt{n}[\tilde F_n(t) - F(t)]$ converges in distribution for all $t > a_F \ge a_G$ iff (15) holds.
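When $G$ is known, the estimator in this remark is an inverse-probability-weighted empirical distribution: each observed $x_i$ is weighted by $1/G(x_i)$, the reciprocal of its chance of surviving truncation. A minimal sketch follows; the function name and the choice $G(x) = x$ (uniform $G$ on the unit interval) are assumptions made for illustration.

```python
def weighted_cdf(xs, G, t):
    """Self-normalized weighted CDF: sum of 1/G(x_i) over x_i <= t, divided by the total."""
    weights = [1.0 / G(x) for x in xs]
    return sum(w for x, w in zip(xs, weights) if x <= t) / sum(weights)


# Hypothetical known truncation distribution and observed x values.
G_uniform = lambda x: x
observed = [0.5, 1.0]
value_at_half = weighted_cdf(observed, G_uniform, 0.5)  # weights 2 and 1
value_at_one = weighted_cdf(observed, G_uniform, 1.0)
```

The weights $1/G(x_i)$ become very large for observations near the lower endpoint, which gives some intuition for why an integrability condition on $1/G$ such as (15) governs the $\sqrt{n}$ behavior.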

8. If (15) fails, then other limiting distributions may obtain, Suppose, for
example, that F is continuous, that a~ = 0, and that G = Fc, where 1 < c < w.
+
Let 6 = 1/(1 c). Then n6[Fn(t)- F ( t ) ] has a limiting stable distribution for all
t > 0 for which F(,t) < 1. To see this, fix t and write

Then I II, I 5 maxSst I Cn(s) - C(s) 1 $6 ( l / ~ ) d i n= op(l/&) by Lemma 1 and

properties of empirical processes. Let $z_i = (1/C(x_i))\,I_{(0,t]}(x_i)$ for $i = 1, 2, \ldots$. Then $z_1, z_2, \ldots$ are i.i.d. with common mean $\Lambda(t)$. Now, it is easily seen that $z_i$ is in the domain of attraction of a stable distribution with characteristic exponent $\gamma = (1 + c)/c$ and skewness parameter 1, in Feller's (1966, pages 540-543) terminology. So, $n^{\delta}[\hat\Lambda_n(t) - \Lambda(t)]$ has a limiting stable distribution as $n \to \infty$. (In fact, the same stable distribution is obtained for all $t$.) That $n^{\delta}[\hat F_n(t) - F(t)]$ has a limiting distribution now follows from (17) by using a stable distribution to bound $z_1^2 + \cdots + z_n^2$ in (18).
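The exponent $(1 + c)/c$ and the rate $n^{\delta}$ in Remark 8 can be checked by a heuristic tail computation. The sketch below is not part of the original argument. It assumes the identities $dF_* = \alpha^{-1} G\,dF$ and $C = \alpha^{-1} G\,(1 - F)$, valid for continuous $F$ and $G$ with $\alpha = P(Y \le X)$, and it writes $z = 1/C(x)$ for an observation $x \le t$ drawn from $F_*$.

```latex
\begin{align*}
\alpha &= \int G \, dF = \int_0^1 v^{c} \, dv = \frac{1}{1 + c},
\qquad
F_*(x) = \alpha^{-1} \int_0^x F^{c} \, dF = F(x)^{1 + c}, \\
\frac{1}{C(x)} &= \frac{\alpha}{F(x)^{c}\,[1 - F(x)]}
\;\approx\; \alpha \, F(x)^{-c}
\quad \text{as } F(x) \to 0, \\
P\{ z > u \} &\approx P\bigl\{ F(x) < (\alpha / u)^{1/c} \bigr\}
= \bigl[ (\alpha / u)^{1/c} \bigr]^{1 + c}
= (\alpha / u)^{(1 + c)/c}
\quad \text{as } u \to \infty .
\end{align*}
% Tail index gamma = (1 + c)/c lies in (1, 2) when 1 < c < infinity, so the
% centered sums are normalized by n^{1/gamma}; the centered average is then
% of order n^{1/gamma - 1} = n^{-1/(1 + c)} = n^{-delta}.
```

Since $F(t) < 1$, the factor $1 - F(x)$ stays bounded away from zero for $x \le t$, so only small values of $F(x)$ contribute to the upper tail of $z$.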

Acknowledgements. Irving Segal introduced me to this problem through the application to astronomy. Gary Lorden brought the papers by Lynden-Bell (1971) and Jackson (1974) to my attention. The referees and associate editor contributed useful comments and criticisms.

REFERENCES
BAROUCH, E. and KAUFMAN, G. M. (1975). Probabilistic modelling of oil and gas discovery. Energy
133-152.
BHATTACHARYA, P. K. (1983). Justification for a K-S type test for the slope of a truncated regression.
Ann. Statist. 11 697-701.
BHATTACHARYA, P. K., CHERNOFF, H. and YANG, S. S. (1983). Nonparametric estimation of the
slope of a truncated regression. Ann. Statist. 11 505-514.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New York.
BREIMAN, L. (1968). Probability. Addison-Wesley, Reading, Massachusetts.
BRESLOW, N. and CROWLEY, J. (1974). A large sample study of the life table and product limit
estimates under random censorship. Ann. Statist. 2 437-453.
FELLER, W. (1966). An Introduction to Probability Theory and its Applications, Vol. 2. Wiley, New
York.
JACKSON, J. C. (1974). The analysis of quasar samples. Mon. Not. R. Astr. Soc. 166 281-295.
KAPLAN, E. L. and MEIER, P. (1958). Nonparametric estimation from incomplete observations. J.
Amer. Statist. Assoc. 53 457-481.
KRAMER, M. (1983). Stopping a size dependent exploration process. Ph.D. Thesis, The University of
Michigan.
LYNDEN-BELL, D. (1971). A method of allowing for known observational selection in small samples
applied to 3CR quasars. Mon. Not. R. Astr. Soc. 155 95-118.
NICOLL, J. F. and SEGAL, I. E. (1980). Nonparametric elimination of the observational cutoff bias.
Astron. Astrophys. 82 L3-L6.
SEGAL, I. E. (1975). Observational validation of the chronometric cosmology: I. Preliminaries and
the red shift-magnitude relation. Proc. Nat. Acad. Sci. 72 2437-2477.
SKOROHOD, A. V. (1956). Limit theorems for stochastic processes. Theor. Probab. Appl. 1 261-290.
STARR, N. (1974). Optimal and adaptive stopping based on capture times. J. Appl. Probab. 11
294-301.
STARR, N., WARDROP, R. and WOODROOFE, M. (1976). Estimating the mean from delayed observations.
Z. Wahrsch. verw. Gebiete 35 103-113.
VARDI, Y. (1982a). Nonparametric estimation in the presence of length bias. Ann. Statist. 10
616-620.
VARDI, Y. (1982b). Nonparametric estimation in renewal processes. Ann. Statist. 10 772-785.

