
Regression Models and Life-Tables

Author(s): D. R. Cox
Reviewed work(s):
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 34, No. 2
(1972), pp. 187-220
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2985181
Accessed: 12/03/2013 03:25

This content downloaded on Tue, 12 Mar 2013 03:25:47 AM


All use subject to JSTOR Terms and Conditions
1972] 187

Regression Models and Life-Tables

By D. R. Cox
Imperial College, London
[Read before the ROYAL STATISTICAL SOCIETY, at a meeting organized by the Research Section, on Wednesday, March 8th, 1972, Mr M. J. R. HEALY in the Chair]

SUMMARY
The analysis of censored failure times is considered. It is assumed that on each individual are available values of one or more explanatory variables. The hazard function (age-specific failure rate) is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an arbitrary and unknown function of time. A conditional likelihood is obtained, leading to inferences about the unknown regression coefficients. Some generalizations are outlined.
Keywords: LIFE TABLE; HAZARD FUNCTION; AGE-SPECIFIC FAILURE RATE; PRODUCT
LIMIT ESTIMATE; REGRESSION; CONDITIONAL INFERENCE; ASYMPTOTIC THEORY;
CENSORED DATA; TWO-SAMPLE RANK TESTS; MEDICAL APPLICATIONS; RELIABILITY
THEORY; ACCELERATED LIFE TESTS.
1. INTRODUCTION
LIFE tables are one of the oldest statistical techniques and are extensively used by medical statisticians and by actuaries. Yet relatively little has been written about their more formal statistical theory. Kaplan and Meier (1958) gave a comprehensive review of earlier work and many new results. Chiang in a series of papers has, in particular, explored the connection with birth-death processes; see, for example, Chiang (1968). The present paper is largely concerned with the extension of the results of Kaplan and Meier to the comparison of life tables and more generally to the incorporation of regression-like arguments into life-table analysis. The arguments are asymptotic but are relevant to situations where the sampling fluctuations are large enough to be of practical importance. In other words, the applications are more likely to be in industrial reliability studies and in medical statistics than in actuarial science. The procedures proposed are, especially for the two-sample problem, closely related to procedures for combining contingency tables; see Mantel and Haenszel (1959), Mantel (1963) and, especially for the application to life tables, Mantel (1966). There is also a strong connection with a paper read recently to the Society by R. and J. Peto (1972).
We consider a population of individuals; for each individual we observe either the time to "failure" or the time to "loss" or censoring. That is, for the censored individuals we know only that the time to failure is greater than the censoring time.
Denote by $T$ a random variable representing failure time; it may be discrete or continuous. Let $\mathscr{F}(t)$ be the survivor function,
$$\mathscr{F}(t) = \mathrm{pr}(T \ge t),$$
and let $\lambda(t)$ be the hazard or age-specific failure rate. That is,
$$\lambda(t) = \lim_{\Delta t \to 0+} \frac{\mathrm{pr}(t \le T < t + \Delta t \mid t \le T)}{\Delta t}. \tag{1}$$
Note that if $T$ is discrete, then
$$\lambda(t) = \sum_j \lambda_j\, \delta(t - u_j), \tag{2}$$
where $\delta(t)$ denotes the Dirac delta function and $\lambda_j = \mathrm{pr}(T = u_j \mid T \ge u_j)$. By the product law of probability $\mathscr{F}(t)$ is given by the product integral
$$\mathop{\mathcal{P}}_{u=0}^{t-0} \{1 - \lambda(u)\,du\} = \lim \prod_{k=0}^{r-1} \{1 - \lambda(\tau_k)(\tau_{k+1} - \tau_k)\}, \tag{3}$$
the limit being taken as all $\tau_{k+1} - \tau_k$ tend to zero with $0 = \tau_0 < \tau_1 < \ldots < \tau_{r-1} < \tau_r = t$. If $\lambda(t)$ is integrable this is
$$\exp\Big\{-\int_0^t \lambda(u)\,du\Big\}, \tag{4}$$
whereas if $\lambda(t)$ is given by (2), the product integral is
$$\prod_{u_j < t} (1 - \lambda_j). \tag{5}$$
If the distribution has both discrete and continuous components the product integral is a product of factors (4) and (5).
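As a numerical illustration of (3) to (5), the following minimal sketch evaluates the finite product in (3) on an equal grid and compares it with the closed form (4). The hazard $\lambda(u) = 2u$, the atoms in the discrete case and the function names are assumptions chosen only for the illustration; they do not come from the paper.

```python
import math

def survivor_discrete(t, atoms):
    """Product (5): survivor function as the product of (1 - lambda_j) over u_j < t."""
    out = 1.0
    for u, lam in atoms:
        if u < t:
            out *= 1.0 - lam
    return out

def survivor_product_limit(t, hazard, steps=100_000):
    """Finite product in (3) on the equal grid 0 = tau_0 < ... < tau_r = t."""
    h = t / steps
    out = 1.0
    for k in range(steps):
        out *= 1.0 - hazard(k * h) * h
    return out

# Continuous case (assumed hazard): lambda(u) = 2u, so (4) gives exp(-t^2).
approx = survivor_product_limit(1.5, lambda u: 2.0 * u)
exact = math.exp(-1.5 ** 2)

# Discrete case (assumed atoms): points u_j with conditional failure probabilities lambda_j.
atoms = [(1.0, 0.2), (2.0, 0.5), (3.0, 0.25)]
print(approx, exact, survivor_discrete(2.5, atoms))
```

With a fine grid the finite product and the exponential form agree closely, which is the sense of the limit in (3).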
2. THE PRODUCT-LIMIT METHOD
Suppose observations are available on $n_0$ independent individuals and, to begin with, that the failure times are identically distributed in the form specified in Section 1. Let $n$ individuals be observed to failure and the rest be censored. The rather strong assumption will be made throughout that the only information available about the failure time of a censored individual is that it exceeds the censoring time. This assumption is testable only if suitable supplementary information is available. Denote the distinct failure times by
$$t_{(1)} < t_{(2)} < \ldots < t_{(k)}. \tag{6}$$
Further let $m_{(i)}$ be the number of failure times equal to $t_{(i)}$, the multiplicity of $t_{(i)}$; of course $\sum m_{(i)} = n$, and in the continuous case $k = n$, $m_{(i)} = 1$.
The set of individuals at risk at time $t - 0$ is called the risk set at time $t$ and denoted $\mathscr{R}(t)$; this consists of those individuals whose failure or censoring time is at least $t$. Let $r_{(i)}$ be the number of such individuals for $t = t_{(i)}$. The product-limit estimate of the underlying distribution is obtained by taking estimated conditional probabilities that agree exactly with the observed conditional frequencies. That is,
$$\hat{\lambda}(t) = \sum_{i=1}^{k} \frac{m_{(i)}}{r_{(i)}}\, \delta(t - t_{(i)}). \tag{7}$$
Correspondingly,
$$\hat{\mathscr{F}}(t) = \mathop{\mathcal{P}}_{u=0}^{t-0} \{1 - \hat{\lambda}(u)\,du\} = \prod_{t_{(i)} < t} \Big(1 - \frac{m_{(i)}}{r_{(i)}}\Big). \tag{8}$$
For uncensored data this is the usual sample survivor function; some of the asymptotic properties of (8) are given by Kaplan and Meier (1958) and by Efron (1967) and can be used to adapt to the censored case tests based on sample cumulative distribution function.
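The estimates (7) and (8) can be sketched in a few lines. The example below is an illustrative sketch, not the author's program; it uses the sample 0 (drug 6-MP) remission times listed in Table 1 of Section 10, and the function name `product_limit` is a hypothetical choice.

```python
def product_limit(times, failed):
    """Return rows (t_(i), m_(i), r_(i), estimate) following (7) and (8).

    The last entry of each row is the estimated survivor function (8)
    for arguments t with t_(i) < t <= t_(i+1).
    """
    failure_times = sorted({t for t, d in zip(times, failed) if d})
    rows, surv = [], 1.0
    for t in failure_times:
        m = sum(1 for u, d in zip(times, failed) if d and u == t)  # multiplicity m_(i)
        r = sum(1 for u in times if u >= t)                        # size of risk set r_(i)
        surv *= 1.0 - m / r
        rows.append((t, m, r, surv))
    return rows

# Sample 0 of Table 1; failed is 0 for the asterisked (censored) values.
times  = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
failed = [0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
rows = product_limit(times, failed)
for t, m, r, s in rows:
    print(t, m, r, round(s, 4))
```

An individual censored at a failure time is counted in the risk set at that time, in line with the definition of $\mathscr{R}(t)$ above.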
The functions (7) and (8) are maximum-likelihood estimates in the family of all possible distributions (Kaplan and Meier, 1958). However, as in the uncensored case, this property is of limited importance and the best justification is essentially (7). The estimates probably also have a Bayesian interpretation involving a very "irregular" prior.
If the class of distributions is restricted, either parametrically or by some such condition as requiring $\lambda(t)$ to be monotonic or smooth, the maximum-likelihood estimates will be changed. For the monotone hazard case with uncensored data, see Grenander (1956). The smoothing of estimated hazard functions has been considered by Watson and Leadbetter (1964a, b) for the uncensored case.

3. REGRESSION MODELS
Suppose now that on each individual one or more further measurements are available, say on variables $z_1, \ldots, z_p$. We deal first with the notationally simpler case when the failure-times are continuously distributed and the possibility of ties can be ignored. For the $j$th individual let the values of $z$ be $z_j = (z_{1j}, \ldots, z_{pj})$. The $z$'s may be functions of time. The main problem considered in this paper is that of assessing the relation between the distribution of failure time and $z$. This will be done in terms of a model in which the hazard is
$$\lambda(t; z) = \exp(z\beta)\, \lambda_0(t), \tag{9}$$
where $\beta$ is a $p \times 1$ vector of unknown parameters and $\lambda_0(t)$ is an unknown function giving the hazard function for the standard set of conditions $z = 0$. In fact $(z\beta)$ can be replaced by any known function $h(z, \beta)$, but this extra generality is not needed at this stage. The following examples illustrate just a few possibilities.
Example 1. Two-sample problem. Suppose that there is just one $z$ variable, $p = 1$, and that this takes values 0 and 1, being an indicator variable for the two samples. Then according to (9) the hazards in samples 0 and 1 are respectively $\lambda_0(t)$ and $\psi\lambda_0(t)$, where $\psi = e^{\beta}$. In the continuous case the survivor functions are related (Lehmann, 1953) by $\mathscr{F}_1(t) = \{\mathscr{F}_0(t)\}^{\psi}$. There is an obvious extension for the $k$ sample problem.
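The relation between the two survivor functions in Example 1 follows in one line from (4) when the hazard in sample 1 is $\psi\lambda_0(t)$. The step is written out here as a reading aid; it is not displayed in the original.

```latex
\mathscr{F}_1(t)
  = \exp\Big\{-\int_0^t \psi\,\lambda_0(u)\,du\Big\}
  = \Big[\exp\Big\{-\int_0^t \lambda_0(u)\,du\Big\}\Big]^{\psi}
  = \{\mathscr{F}_0(t)\}^{\psi}.
```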
Example 2. The two-sample problem; extended treatment. We can deal with more complicated relationships between the two samples than are contemplated in Example 1 by introducing additional time-dependent components into $z$. Thus if $z_2 = t z_1$, where $z_1$ is the binary variable of Example 1, the hazard in the second sample is
$$\psi\, e^{\beta_2 t}\, \lambda_0(t). \tag{10}$$
Of course in defining $z_2$, $t$ could be replaced by any known function of $t$; further, several new variables could be introduced involving different functions of $t$. This provides one way of examining consistency with a simple model of proportional hazards. In fitting the model and often also in interpretation it is convenient to reparametrize (10) in the form
$$\rho \exp\{\beta_2 (t - t^*)\}, \tag{11}$$
where $t^*$ is any convenient constant time somewhere near the overall mean. This will avoid the more extreme non-orthogonalities of fitting. All the points connected with this example extend to the comparison of several samples.
Example 3. Two-sample problem with covariate. By introducing into the models of Examples 1 and 2 one or more further $z$ variables representing concomitant variables, it is possible to examine the relation between two samples adjusting for the presence of concomitant variables.
Example 4. Regression. The connection between failure-time and regressor variables can be explored in an obvious way. Note especially that by introducing functions of $t$, effects other than constant multiplication of the hazard can be included.
4. ANALYSIS OF REGRESSION MODELS
There are several approaches to the analysis of the above models. The simplest is to assume $\lambda_0(t)$ constant, i.e. to assume an underlying exponential distribution; see, for example, Chernoff (1962) for some models of this type in the context of accelerated life tests. The next simplest is to take a two-parameter family of hazard functions, such as the power law associated with the Weibull distribution or the exponential of a linear function of $t$. Then standard methods such as maximum likelihood can be used; to be rigorous, extension of the usual conditions for maximum-likelihood formulae and theory would be involved to cover censoring, but there is little doubt that some such justification could be given. This is in many ways the most natural approach but will not be explored further in the present paper. In this approach a computationally desirable feature is that both probability density and survivor function are fairly easily found. A simple form for the hazard is not by itself particularly advantageous, and models other than (9) may be more natural. For a normal theory maximum-likelihood analysis of factorial experiments with censored observations, see Sampford and Taylor (1959), and for the parametric analysis of response times in bioassay, see Sampford (1954).
Alternatively we may restrict $\lambda_0(t)$ qualitatively, for example by assuming it to be monotonic or to be a step function (a suggestion of Professor J. W. Tukey). The latter possibility is related to a simple spline approximation to the log survivor function.
In the present paper we shall, however, concentrate on exploring the consequence of allowing $\lambda_0(t)$ to be arbitrary, main interest being in the regression parameters. That is, we require our method of analysis to have sensible properties whatever the form of the nuisance function $\lambda_0(t)$. Now this is a severe requirement and unnecessary in the sense that an assumption of some smoothness in the distribution $\mathscr{F}_0(t)$ would be reasonable. The situation is parallel to that arising in simpler problems when a nuisance parameter is regarded as completely unknown. It seems plausible in the present case that the loss of information about $\beta$ arising from leaving $\lambda_0(t)$ arbitrary is usually slight; if this is indeed so the procedure discussed here is justifiable as a reasonably cautious approach to the study of $\beta$. A major outstanding problem is the analysis of the relative efficiency of inferences about $\beta$ under various assumptions about $\lambda_0(t)$.
The general attitude taken is that parametrization of the dependence on $z$ is required so that our conclusions about that dependence are expressed concisely; of course any form taken is provisional and needs examination in the light of the data. So far as the secondary features of the system are concerned, however, it is sensible to make a minimum of assumptions leading to a convenient analysis, provided that no major loss of efficiency is involved.

5. A CONDITIONAL LIKELIHOOD
Suppose then that $\lambda_0(t)$ is arbitrary. No information can be contributed about $\beta$ by time intervals in which no failures occur because the component $\lambda_0(t)$ might conceivably be identically zero in such intervals. We therefore argue conditionally on the set $\{t_{(i)}\}$ of instants at which failures occur; in discrete time we shall condition also on the observed multiplicities $\{m_{(i)}\}$. Once we require a method of analysis holding for all $\lambda_0(t)$, consideration of this conditional distribution seems inevitable.
For the particular failure at time $t_{(i)}$, conditionally on the risk set $\mathscr{R}(t_{(i)})$, the probability that the failure is on the individual as observed is
$$\exp\{z_{(i)}\beta\} \Big/ \sum_{l \in \mathscr{R}(t_{(i)})} \exp\{z_l \beta\}. \tag{12}$$
Each failure contributes a factor of this nature and hence the required conditional log likelihood is
$$L(\beta) = \sum_{i=1}^{k} z_{(i)}\beta - \sum_{i=1}^{k} \log\Big[\sum_{l \in \mathscr{R}(t_{(i)})} \exp\{z_l \beta\}\Big]. \tag{13}$$
Direct calculation from (13) gives for $\xi, \eta = 1, \ldots, p$
$$U_{\xi}(\beta) = \frac{\partial L(\beta)}{\partial \beta_{\xi}} = \sum_{i=1}^{k} \{z_{(\xi i)} - A_{(\xi i)}(\beta)\}, \tag{14}$$
where
$$A_{(\xi i)}(\beta) = \frac{\sum z_{\xi l} \exp(z_l \beta)}{\sum \exp(z_l \beta)}, \tag{15}$$
the sum being over $l \in \mathscr{R}(t_{(i)})$. That is, $A_{(\xi i)}(\beta)$ is the average of $z_{\xi}$ over the finite population $\mathscr{R}(t_{(i)})$, using an "exponentially weighted" form of sampling. Similarly
$$\mathscr{I}_{\xi\eta}(\beta) = -\frac{\partial^2 L(\beta)}{\partial \beta_{\xi}\, \partial \beta_{\eta}} = \sum_{i=1}^{k} C_{(\xi\eta i)}(\beta), \tag{16}$$
where
$$C_{(\xi\eta i)}(\beta) = \Big\{\sum z_{\xi l} z_{\eta l} \exp(z_l \beta) \Big/ \sum \exp(z_l \beta)\Big\} - A_{(\xi i)}(\beta)\, A_{(\eta i)}(\beta) \tag{17}$$
is the covariance of $z_{\xi}$ and $z_{\eta}$ in this form of weighted sampling.
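The quantities (13) to (17) are simple to compute when there is a single covariate ($p = 1$) and no ties. The following is a minimal sketch under those assumptions; the eight observations are hypothetical, as are the function names, and the Newton-Raphson loop is one common way of solving $U(\beta) = 0$ rather than a procedure specified in the paper.

```python
import math

def partial_likelihood(beta, times, failed, z):
    """Log likelihood (13), score (14) and information (16) for scalar z, no ties."""
    loglik = score = info = 0.0
    for t_i, d_i, z_i in zip(times, failed, z):
        if not d_i:
            continue  # censored individuals enter only through the risk sets
        risk = [z_l for t_l, z_l in zip(times, z) if t_l >= t_i]
        w = [math.exp(z_l * beta) for z_l in risk]
        total = sum(w)
        a = sum(z_l * w_l for z_l, w_l in zip(risk, w)) / total               # (15)
        c = sum(z_l * z_l * w_l for z_l, w_l in zip(risk, w)) / total - a * a  # (17)
        loglik += z_i * beta - math.log(total)
        score += z_i - a
        info += c
    return loglik, score, info

def fit(times, failed, z, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration: beta <- beta + U(beta) / I(beta)."""
    for _ in range(max_iter):
        _, score, info = partial_likelihood(beta, times, failed, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: eight individuals, distinct times, one binary covariate.
times  = [2, 3, 5, 6, 8, 9, 11, 12]
failed = [1, 1, 0, 1, 1, 1, 0, 1]
z      = [1, 0, 1, 1, 0, 1, 0, 0]
beta_hat = fit(times, failed, z)
print(beta_hat, partial_likelihood(beta_hat, times, failed, z))
```

The baseline hazard $\lambda_0(t)$ never appears in the computation, which is the point of arguing conditionally on the failure instants.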
To calculate the expected value of (16) it would be necessary to know the times at which individuals who failed would have been censored had they not failed. This information would often not be available and in any case might well be thought irrelevant; this point is connected with difficulties of conditionality at the basis of a sampling theory approach to statistics (Pratt, 1962). Here we shall use asymptotic arguments in which (16) can be used directly for the estimation of variances, $\beta$ being replaced by a suitable estimate. For a rigorous justification, assumptions about the censoring times generalizing those of Breslow (1970) would be required. It would not be satisfactory to assume that the censoring times are random variables distributed independently of the $z$'s. For instance in the two-sample problem censoring might be much more severe in one sample than in the other.
Maximum-likelihood estimates of $\beta$ can be obtained by iterative use of (14) and (16) in the usual way. Significance tests about subsets of parameters can be derived in various ways, for example by comparison of the maximum log likelihoods achieved. Relatively simple results can, however, be obtained for testing the global null hypothesis, $\beta = 0$. For this we treat $U(0)$ as asymptotically normal with zero mean vector and with covariance matrix $\mathscr{I}(0)$. That is, the statistic
$$\{U(0)\}^{T} \{\mathscr{I}(0)\}^{-1} \{U(0)\} \tag{18}$$
has, under the null hypothesis, an asymptotic chi-squared distribution with $p$ degrees of freedom.
We have from (14) and (15) that
$$U_{\xi}(0) = \sum_{i=1}^{k} (z_{(\xi i)} - A_{(\xi i)}), \tag{19}$$
where $A_{(\xi i)} = A_{(\xi i)}(0)$ is the mean of $z_{\xi}$ over $\mathscr{R}(t_{(i)})$. Further, from (16),
$$\mathscr{I}_{\xi\eta}(0) = \sum_{i=1}^{k} C_{(\xi\eta i)}, \tag{20}$$
where $C_{(\xi\eta i)} = C_{(\xi\eta i)}(0)$ is the covariance of $z_{\xi}$ and $z_{\eta}$ in the finite population $\mathscr{R}(t_{(i)})$. The form of weighted sampling associated with general $\beta$ has reduced to random sampling without replacement.

6. ANALYSIS IN DISCRETE TIME
Unfortunately it is quite likely in applications that the data will be recorded in a form involving ties. If these are small in number a relatively ad hoc modification of the above procedures will be satisfactory. To cover the possibility of an appreciable number of ties, we generalize (9) formally to discrete time by
$$\frac{\lambda(t; z)\,dt}{1 - \lambda(t; z)\,dt} = \exp(z\beta)\, \frac{\lambda_0(t)\,dt}{1 - \lambda_0(t)\,dt}. \tag{21}$$
In the continuous case this reduces to (9); in discrete time $\lambda(t; z)\,dt$ is a non-zero probability and (21) is a logistic model.
The typical contribution (12) to the likelihood now becomes
$$\exp\{s_{(i)}\beta\} \Big/ \sum_{l \in \mathscr{R}(t_{(i)};\, m_{(i)})} \exp\{s_l \beta\}, \tag{22}$$
where $s_{(i)}$ is the sum of $z$ over the individuals failing at $t_{(i)}$ and the notation in the denominator means that the sum is taken over all distinct sets of $m_{(i)}$ individuals drawn from $\mathscr{R}(t_{(i)})$.
Thus the full conditional log likelihood is
$$\sum_{i=1}^{k} s_{(i)}\beta - \sum_{i=1}^{k} \log\Big[\sum_{l \in \mathscr{R}(t_{(i)};\, m_{(i)})} \exp\{s_l \beta\}\Big].$$
The derivatives can be calculated as before. In particular,
$$U_{\xi}(0) = \sum_{i=1}^{k} \{s_{(\xi i)} - m_{(i)} A_{(\xi i)}\}, \tag{23}$$
$$\mathscr{I}_{\xi\eta}(0) = \sum_{i=1}^{k} \frac{m_{(i)}\{r_{(i)} - m_{(i)}\}}{\{r_{(i)} - 1\}}\, C_{(\xi\eta i)}. \tag{24}$$
Note that (24) gives the exact covariance matrix when the observations $z_{(\xi i)}$ and the totals $s_{(\xi i)}$ are drawn randomly without replacement from the fixed finite populations $\mathscr{R}(t_{(1)}), \ldots, \mathscr{R}(t_{(k)})$. In fact, however, the population at one time is influenced by the outcomes of the "trials" at previous times.


7. THE TWO-SAMPLE PROBLEM


As an illustration, consider the two-sample problem with the proportional hazard model of Section 3, Example 1. Here $p = 1$ and we omit the first suffix on the indicator variable. Then
$$U(0) = n_1 - \sum_{i=1}^{k} m_{(i)} A_{(i)}, \tag{25}$$
$$\mathscr{I}(0) = \sum_{i=1}^{k} \frac{m_{(i)}\{r_{(i)} - m_{(i)}\}}{\{r_{(i)} - 1\}}\, A_{(i)}\{1 - A_{(i)}\}, \tag{26}$$
where $A_{(i)}$ is the proportion of the risk population $\mathscr{R}(t_{(i)})$ that have $z = 1$, i.e. belong to sample 1, and $n_1$ is the total number of failures in sample 1. An asymptotic two-sample test is thus obtained by treating
$$U(0) / \sqrt{\mathscr{I}(0)} \tag{27}$$
as having a standard normal distribution under the null hypothesis. This is different from the procedure of Gehan who adapted the Wilcoxon test to censored data (Gehan, 1965; Efron, 1967; Breslow, 1970). The test has been considered in some detail by Peto and Peto (1972).
The test (27) is formally identical with that obtained by setting up at each failure point a 2 x 2 contingency table (sample 1, sample 2) (failed, survived). To test for the presence of a difference between the two samples the information from the separate tables can then be combined (Cochran, 1954; Mantel and Haenszel, 1959; Mantel, 1963). The application of this to life tables is discussed especially by Mantel (1966). Note, however, that whereas the test in the contingency table situation is, at least in principle, exact, the test here is only asymptotic, because of the difficulties associated with specification of the stopping rule. Formally the same test was given by Cox (1959) for a different life-table problem where there is a single sample with two types of failure and the hypothesis under test concerns the proportionality of the hazard function for the two types.
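The identity with the combined 2 x 2 tables can be seen from the hypergeometric moments of each table. In the sketch below, $d_{1(i)}$ is a symbol introduced only for this illustration: it denotes the number of the $m_{(i)}$ failures at $t_{(i)}$ that fall in sample 1, with all margins of the table held fixed.

```latex
% Table at t_(i): r_(i) at risk, r_(i) A_(i) of them in sample 1, m_(i) failures.
E\{d_{1(i)}\} = m_{(i)} A_{(i)}, \qquad
\operatorname{var}\{d_{1(i)}\}
  = \frac{m_{(i)}\{r_{(i)} - m_{(i)}\}}{r_{(i)} - 1}\, A_{(i)}\{1 - A_{(i)}\}.
% Summing observed minus expected, and the variances, over the k tables:
\sum_{i=1}^{k} \big[d_{1(i)} - E\{d_{1(i)}\}\big] = n_1 - \sum_{i=1}^{k} m_{(i)} A_{(i)} = U(0),
\qquad
\sum_{i=1}^{k} \operatorname{var}\{d_{1(i)}\} = \mathscr{I}(0).
```

These are the quantities (25) and (26), so the combined-table statistic coincides with (27).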
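When there is a non-zero value of $\beta$, the "weighted" average of a single observation from the risk population $\mathscr{R}(t_{(i)})$ is
$$\frac{e^{\beta} A_{(i)}}{1 - A_{(i)} + e^{\beta} A_{(i)}} \tag{28}$$
and the maximum-likelihood equation $U(\hat{\beta}) = 0$ gives, when all failure times are distinct,
$$\sum_{i=1}^{k} \frac{e^{\hat{\beta}} A_{(i)}}{1 - A_{(i)} + e^{\hat{\beta}} A_{(i)}} = n_1. \tag{29}$$
If $\beta$ is thought to be close to some known constant, it may be useful to linearize (29). In particular, if $\beta$ is small, we have as an approximation to the maximum-likelihood estimate
$$\hat{\beta}_0 = \Big(n_1 - \sum A_{(i)}\Big) \Big/ \sum A_{(i)}\{1 - A_{(i)}\}.$$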
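Equation (29) has no closed-form solution, but its left side is increasing in $\beta$, so a root can be bracketed. The sketch below solves it by bisection and compares the root with the linearized estimate; the proportions `A` and the count `n1` are hypothetical values chosen for illustration, and bisection is a substituted numerical method, not one named in the paper.

```python
import math

def ml_equation(beta, A, n1):
    """Left side of (29) minus n1."""
    e = math.exp(beta)
    return sum(e * a / (1.0 - a + e * a) for a in A) - n1

def solve_beta(A, n1, lo=-20.0, hi=20.0, tol=1e-12):
    """Bisection; valid because the left side of (29) increases with beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ml_equation(mid, A, n1) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def linearized(A, n1):
    """Small-beta approximation: (n1 - sum A) / sum A(1 - A)."""
    return (n1 - sum(A)) / sum(a * (1.0 - a) for a in A)

# Hypothetical proportions A_(i) of sample 1 in six risk sets, with n1 = 3.
A = [0.5, 0.4, 0.5, 0.33, 0.5, 0.4]
n1 = 3
beta_hat = solve_beta(A, n1)
beta_lin = linearized(A, n1)
print(beta_hat, beta_lin)
```

For values of $\beta$ near zero, as here, the two estimates should be close; the gap widens as $\beta$ moves away from zero.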
The procedures of this section involve only the ranked data, i.e. are unaffected by an arbitrary monotonic transformation of the time scale. Indeed the same is true for any of the results in Section 4 provided that the $z$'s are not functions of time. While the connection with the theory of rank tests will not be explored in detail, it is worth examining the form of the test (27) for uncensored data with all failure times distinct. For this, let the failure times in sample 1 have ranks $c_1 < c_2 < \ldots < c_{n_1}$ in the ranking of the full data. At the $i$th largest observed failure time, individuals with ranks $n, n-1, \ldots, i$ are at risk, so that
$$A_{(i)} = \frac{1}{n - i + 1} \sum_{l=1}^{n_1} H(c_l - i), \tag{30}$$
where $H(x)$ is the unit Heaviside function,
$$H(x) = \begin{cases} 1 & (x \ge 0), \\ 0 & (x < 0). \end{cases} \tag{31}$$
Thus, by (25),
$$U(0) = n_1 - \sum_{l=1}^{n_1} \sum_{i=1}^{c_l} \frac{1}{n - i + 1} = n_1 - \sum_{l=1}^{n_1} e_{n c_l}, \tag{32}$$
where the $e_{ni}$'s are the expected values of the order statistics in a random sample of size $n$ from a unit exponential distribution. The test based on (32) is asymptotically fully efficient for the comparison of two exponential distributions (Savage, 1956; Cox, 1964). Further, by (26),
$$\mathscr{I}(0) = \sum_{l=1}^{n_1} e_{n c_l} - \sum_{l=1}^{n_1} (1 + 2n_1 - 2l)\, v_{n c_l}, \tag{33}$$
where
$$v_{n c_l} = \sum_{i=1}^{c_l} \frac{1}{(n - i + 1)^2} \tag{34}$$
is the variance of an exponential order statistic.
Here the test statistic is, under the null hypothesis, a constant minus the total of a random sample of size $n_1$ drawn without replacement from the finite population $\{e_{n1}, \ldots, e_{nn}\}$. The exact distribution can in principle be obtained and in particular it can be shown that
$$E\{U(0)\} = 0, \qquad \mathrm{var}\{U(0)\} = \frac{n_1 (n - n_1)(n - e_{nn})}{n(n - 1)}. \tag{35}$$
There is not much point in this case in using the more complicated asymptotic formula (33), especially as fairly simple more refined approximations to the distribution of the test statistic are available (Cox, 1964). It can easily be verified that
$$E\{\mathscr{I}(0)\} = \mathrm{var}\{U(0)\}. \tag{36}$$
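For small samples the statements (32), (33), (35) and (36) can be checked by complete enumeration, since under the null hypothesis every choice of the $n_1$ ranks for sample 1 is equally likely. The following sketch does this for the arbitrary illustrative sizes $n = 7$, $n_1 = 3$; the function names are hypothetical.

```python
from itertools import combinations

def exp_scores(n):
    """e_{n,i} and v_{n,i}: mean and variance of the ith unit exponential order statistic."""
    e, v, se, sv = [], [], 0.0, 0.0
    for i in range(1, n + 1):
        se += 1.0 / (n - i + 1)
        sv += 1.0 / (n - i + 1) ** 2
        e.append(se)
        v.append(sv)
    return e, v

def u_and_info(ranks, n):
    """U(0) from (25) and I(0) from (26) with every m_(i) = 1, using A_(i) of (30)."""
    u, info = float(len(ranks)), 0.0
    for i in range(1, n + 1):
        a = sum(1 for c in ranks if c >= i) / (n - i + 1)
        u -= a
        info += a * (1.0 - a)
    return u, info

n, n1 = 7, 3
e, v = exp_scores(n)
us, infos = [], []
for ranks in combinations(range(1, n + 1), n1):  # ranks arrive sorted: c_1 < ... < c_n1
    u, info = u_and_info(ranks, n)
    sum_e = sum(e[c - 1] for c in ranks)
    sum_v = sum((1 + 2 * n1 - 2 * l) * v[c - 1] for l, c in enumerate(ranks, start=1))
    assert abs(u - (n1 - sum_e)) < 1e-12       # rank form (32)
    assert abs(info - (sum_e - sum_v)) < 1e-12  # rank form (33)
    us.append(u)
    infos.append(info)

mean_u = sum(us) / len(us)
var_u = sum((u - mean_u) ** 2 for u in us) / len(us)
mean_info = sum(infos) / len(infos)
formula = n1 * (n - n1) * (n - e[-1]) / (n * (n - 1))  # right side of (35)
print(mean_u, var_u, formula, mean_info)
```

The enumeration treats the 35 rank subsets as equally likely, which is the permutation distribution referred to in the text.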

8. ESTIMATION OF DISTRIBUTION OF FAILURE-TIME


Once we have obtained the maximum-likelihood estimate of $\beta$, we can consider the estimation of the distribution associated with the hazard (10) either for $z = 0$, or for some other given value of $z$. Thus to estimate $\lambda_0(t)$ we need to generalize (7). To do this we take $\lambda_0(t)$ to be identically zero, except at the points where failures have occurred, and carry out a separate maximum-likelihood estimation at each such failure point. For the latter it is convenient to write the contribution to $\lambda_0(t)$ at $t_{(i)}$ in the form
$$\frac{\pi_{(i)} \exp(-\hat{\beta}\tilde{z}_{(i)})}{1 - \pi_{(i)} + \pi_{(i)} \exp(-\hat{\beta}\tilde{z}_{(i)})},$$
where $\tilde{z}_{(i)}$ is an arbitrary constant to be chosen; it is useful to take $\tilde{z}_{(i)}$ as approximately the mean in the relevant risk set. The maximum-likelihood estimate of $\pi_{(i)}$ can then be shown to satisfy
$$\hat{\pi}_{(i)} = \frac{m_{(i)}}{r_{(i)}} - \frac{\hat{\pi}_{(i)}(1 - \hat{\pi}_{(i)})}{r_{(i)}} \sum_{j \in \mathscr{R}(t_{(i)})} \frac{\exp\{\hat{\beta}(z_j - \tilde{z}_{(i)})\} - 1}{1 - \hat{\pi}_{(i)} + \hat{\pi}_{(i)} \exp\{\hat{\beta}(z_j - \tilde{z}_{(i)})\}}, \tag{37}$$
which can be solved by iteration. The suggested choice of $\tilde{z}_{(i)}$ is designed to make the second term in (37) small. Note that in the single-sample case, the second term is identically zero. Once (37) is solved for all $i$, we have by the product integral formula
$$\hat{\mathscr{F}}_0(t) = \prod_{t_{(i)} < t} \Big(1 - \frac{\hat{\pi}_{(i)} \exp(-\hat{\beta}\tilde{z}_{(i)})}{1 - \hat{\pi}_{(i)} + \hat{\pi}_{(i)} \exp(-\hat{\beta}\tilde{z}_{(i)})}\Big). \tag{38}$$
For an estimate at a given non-zero $z$, replace $\exp(-\hat{\beta}\tilde{z}_{(i)})$ by $\exp\{\hat{\beta}(z - \tilde{z}_{(i)})\}$.
Alternative simpler procedures would be worth having (Mantel, 1966).
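One way to carry out the estimation at a single failure point is sketched below. It assumes the logistic form of Section 6, under which each member $j$ of the risk set fails with probability $p_j = \pi e_j / (1 - \pi + \pi e_j)$, where $e_j = \exp\{\hat{\beta}(z_j - \tilde{z}_{(i)})\}$; equation (37) is then equivalent to $\sum_j p_j = m_{(i)}$. The sketch solves that equation by bisection, a substituted method rather than the iteration mentioned in the text, and checks the result against (37). The risk set is the one at time 7 in Table 2 of Section 10, with $\hat{\beta} = 1.65$ as quoted there; function names are hypothetical.

```python
import math

def pi_hat(m, z_risk, beta, z_tilde):
    """Solve sum_j p_j(pi) = m by bisection; the left side increases with pi."""
    e = [math.exp(beta * (z - z_tilde)) for z in z_risk]

    def expected_failures(pi):
        return sum(pi * ej / (1.0 - pi + pi * ej) for ej in e)

    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_failures(mid) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def residual_37(pi, m, z_risk, beta, z_tilde):
    """Right side of (37) minus left side; zero at the solution."""
    r = len(z_risk)
    e = [math.exp(beta * (z - z_tilde)) for z in z_risk]
    second = pi * (1.0 - pi) / r * sum((ej - 1.0) / (1.0 - pi + pi * ej) for ej in e)
    return m / r - second - pi

# Risk set at time 7: 17 individuals with z = 0, 12 with z = 1, one failure.
z_risk = [0] * 17 + [1] * 12
beta = 1.65
# z_tilde chosen so that exp(beta * z_tilde) equals the risk-set mean of exp(beta * z)
z_tilde = math.log(sum(math.exp(beta * z) for z in z_risk) / len(z_risk)) / beta
p = pi_hat(1, z_risk, beta, z_tilde)
# Corresponding factor removed from the baseline survivor function in (38)
jump = p * math.exp(-beta * z_tilde) / (1.0 - p + p * math.exp(-beta * z_tilde))
print(p, jump, residual_37(p, 1, z_risk, beta, z_tilde))
```

With all $z_j$ equal to $\tilde{z}_{(i)}$ every $e_j$ is 1 and the solution reduces to $m_{(i)}/r_{(i)}$, the single-sample estimate (7).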

9. BIVARIATE LIFE TABLES


We now consider briefly the extension of life-table arguments to multivariate data. Suppose for simplicity that there are two types of failure time for each individual represented by random variables $T_1$ and $T_2$. For instance, these might be the failure-times of two different but associated components; observations may be censored on neither, one or both components. For analogous problems in bioassay, see Sampford (1952).
The joint distribution can be described in terms of hazard functions $\lambda_{10}(t)$, $\lambda_{20}(t)$, $\lambda_{21}(t \mid u)$, $\lambda_{12}(t \mid u)$, where
$$\lambda_{p0}(t) = \lim_{\Delta t \to 0+} \frac{\mathrm{pr}(t \le T_p < t + \Delta t \mid t \le T_1,\, t \le T_2)}{\Delta t} \quad (p = 1, 2),$$
$$\lambda_{21}(t \mid u) = \lim_{\Delta t \to 0+} \frac{\mathrm{pr}(t \le T_2 < t + \Delta t \mid t \le T_2,\, T_1 = u)}{\Delta t} \quad (u < t), \tag{39}$$
with a similar definition for $\lambda_{12}(t \mid u)$. It is easily shown that the bivariate probability density function $f(t_1, t_2)$ is given by
$$f(t_1, t_2) = \exp\Big[-\int_0^{t_1} \{\lambda_{10}(u) + \lambda_{20}(u)\}\,du - \int_{t_1}^{t_2} \lambda_{21}(u \mid t_1)\,du\Big]\, \lambda_{10}(t_1)\, \lambda_{21}(t_2 \mid t_1), \tag{40}$$
for $t_2 > t_1$, with again an analogous expression for $t_2 \le t_1$. It is fairly easy to show formally that a necessary and sufficient condition for the independence of $T_1$ and $T_2$ is
$$\lambda_{12}(t \mid u) = \lambda_{10}(t), \qquad \lambda_{21}(t \mid u) = \lambda_{20}(t), \tag{41}$$
as is obvious on general grounds. Note also that if $\mathscr{F}(t_1, t_2)$ is the joint survivor function
$$\lambda_{10}(t) = -\frac{1}{\mathscr{F}(t, t)} \Big[\frac{\partial \mathscr{F}(t, u)}{\partial t}\Big]_{u=t}, \qquad \lambda_{12}(t \mid u) = -\frac{\partial^2 \mathscr{F}(t, u)}{\partial t\, \partial u} \Big/ \frac{\partial \mathscr{F}(t, u)}{\partial u}. \tag{42}$$
Dependence on further variables $z$ can be indicated in the same way as for (11). The simplest model would have the same function of $z$ multiplying all four hazard functions, although this restriction is not essential.
Estimation and testing would in principle proceed as before, although grouping of the conditioning $u$ variable seems necessary in the parts of the analysis concerning the functions $\lambda_{12}(t \mid u)$ and $\lambda_{21}(t \mid u)$.
Further generalizations which will not, however, be explored here are to problems in multidimensional time and to problems connected with point processes (Cox and Lewis, 1972; Cox, 1972).

10. AN EXAMPLE
To illustrate some of the above results, it is convenient to take data of Freireich et al. used by Gehan (1965) and several subsequent authors. Table 1 gives the ordered times for two samples of individuals; censored values are denoted with asterisks. Table 2 outlines the calculation of the simple test statistic $U(0)$ and its asymptotic variance. The failure instants and their multiplicities $m_{(i)}$ are listed; $A_{(i)}$ is the proportion of the relevant risk population in sample 1.

TABLE 1
Times of remission (weeks) of leukemia patients (Gehan, 1965, from Freireich et al.)

Sample 0 (drug 6-MP)   6*, 6, 6, 6, 7, 9*, 10*, 10, 11*, 13, 16, 17*, 19*, 20*, 22, 23, 25*, 32*, 32*, 34*, 35*

Sample 1 (control)     1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

* Censored.

The value of $U(0) = n_1 - \sum m_{(i)} A_{(i)}$ is 10.25 with an asymptotic standard error $\sqrt{\mathscr{I}(0)}$ of 2.50. The critical ratio of just over 4 compares with about 3.6 for the generalized Wilcoxon test of Gehan (1965). The overwhelming significance of the difference is in line with one's qualitative impression of the data.
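The calculation of $U(0)$ and $\mathscr{I}(0)$ from (25) and (26) can be reproduced directly from the data of Table 1. The following is a minimal sketch, not the author's program; the function name and the data layout are illustrative choices.

```python
import math

# Table 1 as (time, failed) pairs; failed is 0 for the asterisked (censored) values.
sample0 = [(6, 0), (6, 1), (6, 1), (6, 1), (7, 1), (9, 0), (10, 0), (10, 1), (11, 0),
           (13, 1), (16, 1), (17, 0), (19, 0), (20, 0), (22, 1), (23, 1), (25, 0),
           (32, 0), (32, 0), (34, 0), (35, 0)]
sample1 = [(t, 1) for t in [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12,
                            15, 17, 22, 23]]

def two_sample_test(s0, s1):
    """Return U(0) of (25), I(0) of (26) and the ratio (27)."""
    data = [(t, d, 0) for t, d in s0] + [(t, d, 1) for t, d in s1]
    score = float(sum(d for _, d in s1))  # n1, the number of failures in sample 1
    info = 0.0
    for t in sorted({t for t, d, _ in data if d}):
        m = sum(1 for time, d, _ in data if d and time == t)  # multiplicity m_(i)
        r = sum(1 for time, _, _ in data if time >= t)        # risk-set size r_(i)
        a = sum(z for time, _, z in data if time >= t) / r    # proportion A_(i)
        score -= m * a
        if r > 1:  # a risk set of one individual contributes no variance
            info += m * (r - m) / (r - 1) * a * (1.0 - a)
    return score, info, score / math.sqrt(info)

U0, I0, ratio = two_sample_test(sample0, sample1)
print(round(U0, 2), round(I0, 4), round(ratio, 2))
```

The intermediate values of `r`, `a` and `m` correspond to the columns $r_{(i)}$, $A_{(i)}$ and $m_{(i)}$ of Table 2.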
The technique used to find $\hat{\beta}$ was direct computation of the log likelihood as a function of $\beta$ and of a further parameter $\gamma$ to be described in a moment. This, while not the best way of getting maximum-likelihood estimates on their own, is useful in enabling various approximate tests and confidence regions to be found in a unified manner.
To examine possible departures from the simple model of proportional hazards, the procedure of Example 2 of Section 3 was followed, taking as in (11) the hazard in sample 1 to be a time-dependent multiple of that in sample 0 of the form
$$\exp\{\beta + \gamma(t - 10)\}\, \lambda_0(t); \tag{43}$$
the arbitrary constant 10 is inserted to achieve approximate orthogonality of estimation of the two parameters, being chosen as a convenient value in the centre of the range.

TABLE 2
Main quantities for the test of the null hypothesis for the data of Table 1

   "Failure" time              Risk population
Sample 0     Sample 1     No. in      No. in      r(i)    A(i)      m(i)
                          sample 0    sample 1
23           23            6           1           7      0.1429    2
22           22            7           2           9      0.2222    2
             17           10           3          13      0.2308    1
16                        11           3          14      0.2143    1
             15           11           4          15      0.2667    1
13                        12           4          16      0.2500    1
             12, 12       12           6          18      0.3333    2
             11, 11       13           8          21      0.3810    2
10                        15           8          23      0.3478    1
             8, 8, 8, 8   16          12          28      0.4286    4
7                         17          12          29      0.4138    1
6, 6, 6                   21          12          33      0.3636    3
             5, 5         21          14          35      0.4000    2
             4, 4         21          16          37      0.4324    2
             3            21          17          38      0.4474    1
             2, 2         21          19          40      0.4750    2
             1, 1         21          21          42      0.5000    2

$U(0) = n_1 - \sum m_{(i)} A_{(i)} = 10.25$;
$\mathscr{I}(0) = \sum \dfrac{m_{(i)}\{r_{(i)} - m_{(i)}\}}{\{r_{(i)} - 1\}}\, A_{(i)}\{1 - A_{(i)}\} = 6.2570$.

A test of the global null hypothesis $\beta = \gamma = 0$ could be done via the test statistic (20) but is not very relevant here. Instead the log likelihood (15) was computed directly for a grid of points in the $(\beta, \gamma)$ plane. Note that in (15) the first term is $21\beta - 28\gamma$; for instance, the coefficient $-28$ is the sum of the values $(t - 10)$ over the individuals in sample 1. The logarithmic second term is simple for those time points at which there is a single completed time, $m_{(i)} = 1$; for example corresponding to the time 7 there is a term in the log likelihood
$$-\log(17 + 12\, e^{\beta - 3\gamma}),$$
the risk set at this time consisting of 17 individuals from sample 0 and 12 from sample 1. For points of higher multiplicity, the situation is more complicated, because all possible samples of size $m_{(i)}$ from the risk population have to be considered; fortunately all the samples have the same totals of the two relevant variables. For example, for the point 6, of multiplicity 3, we have to consider the total of all samples of size 3 drawn from the relevant risk population and this leads to a term
$$-\log\Big\{\binom{21}{3} + \binom{21}{2}\binom{12}{1} e^{\beta - 4\gamma} + \binom{21}{1}\binom{12}{2} e^{2\beta - 8\gamma} + \binom{12}{3} e^{3\beta - 12\gamma}\Big\}. \tag{44}$$
To avoid unduly large numbers, it might often be convenient to divide each term in the logarithm by a suitable constant, but this was not done in the present case.
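The grouping that leads to (44) can be checked by brute force: every set of three individuals from the risk set at time 6 is enumerated and its contribution summed, and the result is compared with the binomial-coefficient form. This is an illustrative sketch; the function names and the trial values of $\beta$ and $\gamma$ are arbitrary assumptions.

```python
import math
from itertools import combinations

def tied_term_closed(n0, n1, m, beta, gamma, t, t_star=10):
    """Minus log of the denominator of (22) for two samples, grouped as in (44):
    sets with j members from sample 1 all share the total j * (beta + gamma * (t - t_star))."""
    x = beta + gamma * (t - t_star)
    total = sum(math.comb(n0, m - j) * math.comb(n1, j) * math.exp(j * x)
                for j in range(m + 1))
    return -math.log(total)

def tied_term_brute(n0, n1, m, beta, gamma, t, t_star=10):
    """Same quantity by enumerating every set of m individuals from the risk set."""
    x = beta + gamma * (t - t_star)
    scores = [0.0] * n0 + [x] * n1  # contribution of each individual to the exponent
    total = sum(math.exp(sum(s)) for s in combinations(scores, m))
    return -math.log(total)

# Time 6: 21 at risk in sample 0, 12 in sample 1, multiplicity 3.
closed = tied_term_closed(21, 12, 3, beta=1.65, gamma=0.05, t=6)
brute = tied_term_brute(21, 12, 3, beta=1.65, gamma=0.05, t=6)
print(closed, brute)
```

The enumeration visits 5456 sets here, whereas the grouped form needs only four terms, which is why the grouping matters for larger multiplicities.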
The maximum-likelihood estimate of $\beta$ when $\gamma = 0$ is $\hat{\beta} = 1.65$. Thus the ratio of the hazards is estimated as $e^{\hat{\beta}} = 5.21$; if the distributions were exponential, this would be the ratio of means. Confidence limits for $\beta$, subject to $\gamma = 0$, can be obtained either by computing the second derivative $\mathscr{I}(\hat{\beta})$ or directly from the log likelihood. With the latter method, approximate 95 per cent confidence limits for $\beta$ of (0.78, 2.60) are obtained from those values for which the log likelihood is within $\tfrac{1}{2} \times 1.96^2 = 1.92$ of its maximum value. An alternative test of the null hypothesis $\beta = 0$ is obtained by comparing the log likelihood at $\beta = 0$ and $\beta = \hat{\beta}$; the difference of 7.43 corresponds to chi-squared of 14.9 and hence to a standardized deviate of 3.86, in reasonable agreement with the test based on $U(0)$.
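A direct maximization of the conditional log likelihood with $\gamma = 0$ can be sketched from the rows of Table 2, using the grouped form of the tied terms. The golden-section search and the search interval are assumptions made for the illustration, not the grid computation described in the text, and a modern floating-point evaluation need not reproduce the published figures to the last digit.

```python
import math

# Rows of Table 2 as (failures in sample 0, failures in sample 1,
# number at risk in sample 0, number at risk in sample 1), times 23 down to 1.
rows = [(1, 1, 6, 1), (1, 1, 7, 2), (0, 1, 10, 3), (1, 0, 11, 3), (0, 1, 11, 4),
        (1, 0, 12, 4), (0, 2, 12, 6), (0, 2, 13, 8), (1, 0, 15, 8), (0, 4, 16, 12),
        (1, 0, 17, 12), (3, 0, 21, 12), (0, 2, 21, 14), (0, 2, 21, 16), (0, 1, 21, 17),
        (0, 2, 21, 19), (0, 2, 21, 21)]

def log_lik(beta):
    """Conditional log likelihood of Section 6 for the two-sample case, gamma = 0."""
    total = 0.0
    for d0, d1, n0, n1 in rows:
        m = d0 + d1
        denom = sum(math.comb(n0, m - j) * math.comb(n1, j) * math.exp(j * beta)
                    for j in range(m + 1))
        total += d1 * beta - math.log(denom)
    return total

def maximize(f, lo=0.0, hi=4.0, tol=1e-8):
    """Golden-section search; adequate here because the log likelihood is concave."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            a = c
        else:
            b = d
    return 0.5 * (a + b)

beta_hat = maximize(log_lik)
lr = 2.0 * (log_lik(beta_hat) - log_lik(0.0))  # likelihood ratio chi-squared
print(beta_hat, math.exp(beta_hat), lr)
```

The printed estimate, hazard ratio and likelihood ratio statistic can be set beside the values quoted in the paragraph above; the slope of `log_lik` at zero equals the statistic $U(0)$ of Table 2.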
The inclusionof the extraparametery providesa test of the adequacy of the
assumptionof simplyrelatedhazards. In facttheadditionallog likelihoodachieved
by the extraparameter,about 0 01, is small, even suspiciouslysmall. Confidence
limitsfory are, at the 95 per centlevel,approximately-0-12 and 0-14. Thus any
markeddeparturefromthe proportionalhazard model is not likelyto be a smooth
monotonicchangewitht. Furtherdetailsof thelikelihoodfunctionwillnot be given
here. It is, however,quadraticto a close approximationand the particularpara-
metrization chosenachievedalmostexactorthogonality.
Finally, we consider graphical techniques, which are likely to be particularly useful for data more extensive than the present set. A first step is to obtain unconditional estimates of the separate survivor functions by (8). For sample 1 this gives the ordinary sample survivor function, there being no censoring. For sample 0, we get the product limit estimate. Now consider estimation of the survivor functions under the model of proportional hazards; the constrained maximum-likelihood estimates of the survivor functions in the two samples are given by (37) and (38). Iterative solution of the 17 equations of the form (37) took in all 1 sec. on the CDC 6600; $\tilde z$ was chosen separately for each risk set so that $e^{\hat\beta \tilde z}$ equalled the mean of $e^{\hat\beta z}$ over the risk set in question.
Fig. 1 shows the four estimated functions. Discrepancy with the model of proportional hazards would be shown by clear departures of the conditional from the unconstrained survivor curves. More elaborate versions of this analysis are certainly possible, in which, for instance, plots are made on a non-linear scale, or in which residuals from the constrained fit are formed, or in which the analysis is presented in tabulated rather than graphical form. The graphical analysis confirms the consistency of the data with a model of proportional hazards.
Only a very brief note will be added here about alternative approaches to the analysis. If exponential distributions are assumed the relevant statistics are the total periods at risk, namely 359 weeks and 182 weeks, and the total numbers of failures 9 and 22 respectively. Approximate 95 per cent confidence limits for the log ratio of means can be obtained via the F distribution with (18, 44) degrees of freedom. They are 0.83 and 2.43, as compared with 0.78 and 2.60 from the earlier analysis.
An analysis with a step function for $\lambda_0(\cdot)$ is barely feasible with the limited amount of data available. The procedure is to divide the time scale into cells, for instance 0-10 weeks and 11-20 weeks. Numbers of failures and periods at risk are calculated for each cell and hence ratios of rates derived. Provided they are consistent for the different cells the ratios can then be combined into a single summary statistic with confidence limits. In the present example this approach does not lead to essentially different conclusions.

FIG. 1. Empirical survivor functions for data of Table 1 (vertical axis: survivor function, 0 to 1.0; horizontal axis: remission time in weeks, 0 to 30). Product limit estimates are drawn as lines, one for sample 0 (6-MP) and one for sample 1 (control). Estimates constrained by proportionality are drawn as points, one symbol for sample 0 and × for sample 1. For clarity, the constrained estimates are indicated by the left ends of the defining horizontal lines.

A third possibility is the use of the Weibull distribution. If we assume a common index in the two samples we may fit by maximum-likelihood distribution functions in the form
$$1 - \exp\{-(\rho t/\kappa)^{\nu}\}, \qquad 1 - \exp\{-(\kappa\rho t)^{\nu}\}.$$

The maximum-likelihood estimate of the index is $\hat\nu = 1.3$ and the maximized log-likelihoods show that this is just significantly different from $\nu = 1.0$ at the 5 per cent level. The explanation of the departure probably lies largely in the deficiency of small failure times in sample 0. Fitting of different indexes for the two samples has not been attempted. Approximate 95 per cent confidence limits for the log ratio of means can be derived in the usual way from the maximized log likelihoods and are 0.71 and 2.10; the maximum-likelihood estimate is $\log(\hat\kappa^2) = 1.31$.
The data have been analysed in some detail to illustrate a number of relevant points. Many applications are likely to be more complicated partly because of larger sample sizes and partly because of the presence of a number of explanatory variables.

11. PHYSICAL INTERPRETATION OF MODEL


The model (9), which is the basis of this paper, is intended as a representation of the behaviour of failure-time that is convenient, flexible and yet entirely empirical. One of the referees has, however, suggested adding some discussion of the physical meaning of the model and in particular of its possible relevance to accelerated life testing. Suppose in fact that there is a variable s, called "stress", and that life tests are carried out at various levels of s. For simplicity we suppose that s is one-dimensional and that each individual is tested at a fixed level of s. The usual idea is that we are really interested in some standard stress, say s = 1, and wish to use other values of s to get quick laboratory results as a substitute for, or a predictor of, the expensive results of user trials.
Now in order that the distribution of failure-time at one level of stress should be related to that at some other level, the relationship being stable under a wide range of conditions, it seems necessary that the basic physical process of failure should be common at the different stress levels; and this is likely to happen only when there is a single predominant mode of failure. One difficulty of the problem is that of knowing enough about the physical process to be able to define a stress variable, i.e. a set of test conditions, with the right properties.
One of the simplest models proposed for the effect of stress on the distribution of failure-time is to assume that the mechanism of failure is identical at the various levels of s but takes place on a time-scale that depends on s. Thus if $\mathscr{F}(t; s)$ denotes the survivor function at stress s, this model implies that
$$\mathscr{F}(t; s) = \mathscr{F}\{g(s)\,t; 1\}, \tag{45}$$
where g(s) is some function of s with g(1) = 1. Thus the hazard function at stress s is
$$g(s)\,\lambda_0\{g(s)\,t\}, \tag{46}$$
where $\lambda_0(\cdot)$ is the hazard at s = 1. In particular if $g(s) = s^{\beta}$ and if $z = \log s$ this gives
$$e^{\beta z}\lambda_0(e^{\beta z}t). \tag{47}$$

This is similar to but different from the model (9) of this paper. A special set of conditions where (47) applies is where the individual is subject to a stream of shocks of randomly varying magnitudes until the cumulative shock exceeds some time-independent tolerance. If, for instance, all aspects of the process except the rate of incidence of shocks are independent of s, then (45) will apply.
If, however, the shocks are non-cumulative and failure occurs when a rather high threshold is first exceeded, failures occur in a Poisson process with a rate depending on s. A special model of this kind often used for thermal stress is to suppose that failure corresponds to the exceedance of the activation energy of some process; then by the theory of rate processes (47) can be used with $\lambda_0(\cdot) = 1$ and z equal to the reciprocal of absolute temperature.
As a quite different model suppose that some process of ageing goes on independently of stress. Suppose further that the conditional probability of failure at any time is the product of an instantaneous time-dependent term arising from the ageing process and a stress-dependent term; the model is non-cumulative. Then the hazard is
$$h(s)\,\lambda_0(t), \tag{48}$$
where h(s) is some function of stress. Again if $h(s) = s^{\beta}$, the model becomes
$$e^{\beta z}\lambda_0(t), \tag{49}$$
exactly that of (9), where again $\lambda_0(t)$ is the hazard function at s = 1, z = 0. One special example of this model is rather similar to that suggested for (46), except that the critical tolerance varies in a fixed way with time and the shocks are non-cumulative, the rate of incidence of shocks depending on s. For another possibility, see Shooman (1968).
If hazard or survivor functions are available at various levels of s we might attempt an empirical discrimination between (46) and (48). Note, however, that if we have a Weibull distribution at s = 1, $\lambda_0(\cdot)$ is a power function and (46) and (48) are identical. Then the models cannot be discriminated from failure-time distributions alone. That is, if we did want to make such a discrimination we must look for situations in which the distributions are far from the Weibull form. Of course the models outlined here can be made much more specific by introducing explicit stochastic processes or physical models. The wide variety of possibilities serves to emphasize the difficulty of inferring an underlying mechanism indirectly from failure times alone rather than from direct study of the controlling physical processes.
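The remark that (46) and (48) coincide for a Weibull baseline can be illustrated numerically. The sketch below is ours and rests on stated assumptions: the baseline hazard at s = 1 is taken as the power function $\nu t^{\nu-1}$, and the function names and the particular values of the index and of g(s) are hypothetical.

```python
import math

def weibull_hazard(t, nu):
    """Assumed baseline hazard at s = 1: a power function of t."""
    return nu * t ** (nu - 1.0)

def accelerated_hazard(t, g, nu):
    """Form (46): g(s) * lambda0{g(s) t}."""
    return g * weibull_hazard(g * t, nu)

def proportional_hazard(t, h, nu):
    """Form (48): h(s) * lambda0(t)."""
    return h * weibull_hazard(t, nu)

nu, g = 1.5, 2.0
h = g ** nu   # multiplier under which (48) reproduces (46) for this baseline
for t in (0.5, 1.0, 4.0):
    assert math.isclose(accelerated_hazard(t, g, nu), proportional_hazard(t, h, nu))
```

With any baseline that is not a power function no single multiplier h works at every t, which is why discrimination requires distributions far from the Weibull form.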
As a basis for rather empirical data reduction (9), possibly with time-dependent exponent, seems flexible and satisfactory.

ACKNOWLEDGEMENTS
I am grateful to the referees for helpful comments and to Professor P. Armitage, Mr P. Fisk, Dr N. Mantel, Professors J. W. Tukey and M. Zelen for references and constructive suggestions.

REFERENCES
BRESLOW, N. (1970). A generalized Kruskal-Wallis test for comparing samples subject to unequal patterns of censoring. Biometrika, 57, 579-594.
CHERNOFF, H. (1962). Optimal accelerated life designs for estimation. Technometrics, 4, 381-408.
CHIANG, C. L. (1968). Introduction to Stochastic Processes in Biostatistics. New York: Wiley.

COCHRAN, W. G. (1954). Some methods for strengthening the common $\chi^2$ tests. Biometrics, 10, 417-451.
COX, D. R. (1959). The analysis of exponentially distributed life-times with two types of failure. J. R. Statist. Soc. B, 21, 411-421.
COX, D. R. (1964). Some applications of exponential ordered scores. J. R. Statist. Soc. B, 26, 103-110.
COX, D. R. (1972). The statistical analysis of dependencies in point processes. In Symposium on Point Processes (P. A. W. Lewis, ed.). New York: Wiley (to appear).
COX, D. R. and LEWIS, P. A. W. (1972). Multivariate point processes. Proc. 6th Berkeley Symp. (to appear).
EFRON, B. (1967). The two sample problem with censored data. Proc. 5th Berkeley Symp., 4, 831-853.
GEHAN, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily single-censored samples. Biometrika, 52, 203-224.
GRENANDER, U. (1956). On the theory of mortality measurement, I and II. Skand. Akt., 39, 90-96, 125-153.
KAPLAN, E. L. and MEIER, P. (1958). Nonparametric estimation from incomplete observations. J. Am. Statist. Assoc., 53, 457-481.
LEHMANN, E. L. (1953). The power of rank tests. Ann. Math. Statist., 24, 23-43.
MANTEL, N. (1963). Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure. J. Am. Statist. Assoc., 58, 690-700.
MANTEL, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50, 163-170.
MANTEL, N. and HAENSZEL, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. J. Nat. Cancer Inst., 22, 719-748.
PETO, R. and PETO, J. (1972). Asymptotically efficient rank invariant test procedures. J. R. Statist. Soc. A, 135, 185-206.
PRATT, J. W. (1962). Contribution to discussion of paper by A. Birnbaum. J. Am. Statist. Assoc., 57, 314-316.
SAMPFORD, M. R. (1952). The estimation of response-time distributions, II: Multi-stimulus distributions. Biometrics, 8, 307-369.
SAMPFORD, M. R. (1954). The estimation of response-time distribution, III: Truncation and survival. Biometrics, 10, 531-561.
SAMPFORD, M. R. and TAYLOR, J. (1959). Censored observations in randomized block experiments. J. R. Statist. Soc. B, 21, 214-237.
SAVAGE, I. R. (1956). Contributions to the theory of rank order statistics: the two-sample case. Ann. Math. Statist., 27, 590-615.
SHOOMAN, M. L. (1968). Reliability physics models. IEEE Trans. on Reliability, 17, 14-20.
WATSON, G. S. and LEADBETTER, M. R. (1964a). Hazard analysis, I. Biometrika, 51, 175-184.
WATSON, G. S. and LEADBETTER, M. R. (1964b). Hazard analysis, II. Sankhyā, A, 26, 101-116.

DISCUSSION ON PROFESSOR COX'S PAPER
Professor F. DOWNTON (University of Birmingham): Professor Cox has given us a paper which is characteristically both elegant and useful. One can only regret that it is probably true that, as he says, "the applications are more likely to be in industrial reliability studies and in medical statistics than in actuarial science". Benjamin (1972) gave one reason for this when he said that to insurance companies the estimation of future mortality was the least of their problems; the major parameter in life insurance has become the interest rate on invested money. It would appear that insurance companies are, in general, extremely reluctant to take on special short-term risks, where the methods of this paper could be applied. One would have thought, however, that these methods could be used in non-life insurance. Would it be too outrageous to suggest that the recent failures in motor insurance would not have occurred if the companies concerned had read, applied and drawn the correct conclusions from this paper?
However, I do not wish to discuss practical applications, but to suggest that by giving his paper a somewhat restrictive title Professor Cox has been too modest. He has said that he does not wish to explore the connection of this paper with the theory of rank tests, so I hope he will forgive me if I do. Basically the approach adopted here is a mixture of the parametric and the non-parametric but this approach may be used to derive non-parametric test procedures of a more traditional kind. I will illustrate this for one class of problems.
The clue lies in his remark in Example 1 of Section 3 that for the two sample problem the basic model implies that we are concerned with a Lehmann-type family of distributions. This was also the condition found by Armitage (1959) for using the Dixon and Mood "sign test". It seems natural to ask first how a paired comparison design would respond to the treatment of this paper. We assume therefore that out of n pairs of results, r specimens given treatment A "failed" before their paired specimens, which had been given treatment B. For the remaining n - r pairs the position was reversed. Then if the failure rates were $\lambda_0(t)$ and $\lambda_0(t)e^{\beta}$, for A and B, respectively, using the conditional argument of Section 5 of the paper, the probability, at the first failure time $t_i$ of the ith pair, that failure occurred to the actual individual observed is $e^{\beta z}/(1 + e^{\beta})$, where z = 0 or 1, according as the failure was of the specimen given treatment A or B, respectively.
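The log likelihood is then
$$L(\beta) = r\beta - n\log(1 + e^{\beta}),$$
whence
$$U(\beta) = \frac{\partial L(\beta)}{\partial\beta} = r - \frac{n e^{\beta}}{1 + e^{\beta}}, \qquad \frac{\partial^2 L(\beta)}{\partial\beta^2} = -\frac{n e^{\beta}}{(1 + e^{\beta})^2} = E\left\{\frac{\partial^2 L(\beta)}{\partial\beta^2}\right\}.$$
Thus to test the hypothesis $\beta = 0$ we have the test statistic
$$4(r - n/2)^2/n,$$
whose distribution (if $\beta = 0$) is, asymptotically, $\chi^2$ with one degree of freedom.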
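The statistic follows from the score and information at zero: the score is r - n/2 and the information is n/4. A minimal sketch follows; the function name and the counts used in the call are hypothetical, and r is simply the count that enters the log likelihood as printed. The statistic is unchanged if r is replaced by n - r.

```python
def paired_score_statistic(r, n):
    """Score statistic U(0)^2 / I(0) for the paired-comparison log likelihood."""
    score = r - n / 2.0      # U(0) = r - n e^0 / (1 + e^0)
    information = n / 4.0    # minus the second derivative at beta = 0
    return score ** 2 / information

# Hypothetical example: 20 pairs, 15 of them counted in r.
print(paired_score_statistic(15, 20))   # equals 4 * (15 - 10)**2 / 20
```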
This is, of course, the "sign test" for the median and is a trivial result. However, paired comparisons are a special case of the randomized block design, and generalizing the method above yields test statistics for that situation different from those usually used.
We will assume that we have n blocks each containing the results for p + 1 treatments, these results being ranked in order of preference in each block. Equivalently we may say that for each block we have an observation consisting of a permutation of the numbers 0 to p, representing the treatments arranged in order of preference.
We suppose that in the jth block the distributions underlying the ranking of the p + 1 treatments are of the form
$$1 - F_{i,j}(t) = \{1 - F_j(t)\}^{k_i}, \qquad i = 0, 1, \ldots, p; \; j = 1, 2, \ldots, n.$$
The "standard" treatment corresponding to i = 0 may be chosen arbitrarily and we assume $k_0 = 1$. This distributional assumption is equivalent, in Professor Cox's terms, to a hazard function for the ith treatment in the jth block of the form
$$\lambda_j(t)\,e^{\beta_i}, \quad \text{with } \beta_i = \log_e k_i \; (\beta_0 = 0).$$
We now need to use a slight generalization of the conditional argument of Section 5 to see that if we index the possible (p + 1)! permutations of treatments by r = 1, 2, ..., (p + 1)! then the conditional probability of obtaining the rth permutation may be written
$$\exp\Big(\sum_{i=0}^{p}\beta_i\Big)\Big/\Big(T_0\prod_{l=1}^{p}T_{r,l}\Big),$$
where $T_0 = \sum_{i=0}^{p}\exp\beta_i$ and $T_{r,l} = \sum^{(r,l)}\exp\beta_i$. The summation sign $\sum^{(r,l)}$ denotes that in the summation those terms corresponding to the first l elements of the rth permutation of 0 to p have been omitted.
The conditional log likelihood of the observed results is given by
$$L(\beta) = n\sum_{i=0}^{p}\beta_i - n\log T_0 - \sum_{r=1}^{(p+1)!} n_r\sum_{l=1}^{p}\log T_{r,l},$$

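where $n_r$ is the number of blocks in which the rth permutation occurs. It may be shown that
$$U_k = \frac{\partial L(0)}{\partial\beta_k} = \sum_{l=1}^{p}\frac{m_{k,l}}{p + 1 - l} - n\sum_{t=2}^{p+1}\frac{1}{t},$$
where $m_{k,l}$ (k = 1, 2, ..., p) is the number of blocks in which the kth treatment has a rank of at most l. It may also be shown that
$$E\left\{\frac{\partial^2 L(0)}{\partial\beta_k^2}\right\} = -np\,\phi(p)$$
and
$$E\left\{\frac{\partial^2 L(0)}{\partial\beta_h\,\partial\beta_k}\right\} = n\,\phi(p) \qquad (h \ne k),$$
where
$$\phi(p) = \frac{1}{p(p+1)}\Big(p - \sum_{t=2}^{p+1}\frac{1}{t}\Big),$$
so that the information matrix is given by
$$I = n\,\phi(p)\begin{bmatrix} p & -1 & \cdots & -1 \\ -1 & p & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & p \end{bmatrix}$$
with inverse
$$I^{-1} = [n(p+1)\,\phi(p)]^{-1}\begin{bmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{bmatrix}.$$
On the hypothesis that $\beta = 0$, the test statistic
$$U' I^{-1} U = 2[n(p+1)\,\phi(p)]^{-1}\sum_{h \le k} U_h U_k$$
has, asymptotically, a $\chi^2$ distribution with p degrees of freedom.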


This statistic is quite different from that due to Friedman, which would usually be employed in this situation. As an example of its application Bradley (1968, Example 5.12.6, p. 127) gives data of the effect of four drugs on a person's visual acuity based on tests on five people. The rankings are as follows:

               Drug
Subject    0    1    2    3
   A       2    4    1    3
   B       3    4    2    1
   C       4    3    2    1
   D       3    4    1    2
   E       4    2    1    3

The table of the numbers $m_{k,l}$ together with the resulting values of $U_k$ is:

            l
  k    1    2    3      U_k
  1    0    1    2    -35/12
  2    3    5    5     37/12
  3    2    3    5     21/12

This gives
$$U' I^{-1} U = 1782/230 = 7.75,$$
while the 5 per cent point of $\chi^2$ with three degrees of freedom is 7.815. On the other hand Friedman's test for these data gave a value of 8.28 for a different approximate $\chi^2$ variable with three degrees of freedom. These results are broadly in agreement.
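The figures in this example can be reproduced from the rankings table. The sketch below is ours, not Professor Downton's: it computes each score directly from the conditional likelihood at zero (n minus, for every risk set that still contains treatment k, the reciprocal of the size of that risk set), takes rank 1 as the first element of the permutation, and uses the closed form of the inverse information matrix (identity plus the all-ones matrix, divided by $n(p+1)\phi(p)$). Friedman's statistic is computed from the usual rank-sum formula. All names are hypothetical.

```python
from fractions import Fraction

# Rankings from the table: rows are subjects A-E, columns are drugs 0-3.
ranks = [
    [2, 4, 1, 3],
    [3, 4, 2, 1],
    [4, 3, 2, 1],
    [3, 4, 1, 2],
    [4, 2, 1, 3],
]
n = len(ranks)           # number of blocks
p = len(ranks[0]) - 1    # treatments are numbered 0..p

def score(k):
    """U_k at beta = 0, accumulated block by block."""
    total = Fraction(n)
    for block in ranks:
        for stage in range(1, block[k] + 1):       # k is at risk at stages 1..rank
            total -= Fraction(1, p + 2 - stage)    # size of the risk set at this stage
    return total

U = [score(k) for k in range(1, p + 1)]
phi = (p - sum(Fraction(1, t) for t in range(2, p + 2))) / (p * (p + 1))
# U' I^{-1} U, using I^{-1} = (identity + all-ones matrix) / {n (p + 1) phi}
statistic = (sum(u * u for u in U) + sum(U) ** 2) / (n * (p + 1) * phi)

# Friedman's statistic from the rank sums of the p + 1 treatments.
q = p + 1
rank_sums = [sum(block[k] for block in ranks) for k in range(q)]
friedman = Fraction(12, n * q * (q + 1)) * sum(r * r for r in rank_sums) - 3 * n * (q + 1)

print(U, float(statistic), float(friedman))
```

Exact rational arithmetic is used so that the scores can be compared with the twelfths quoted in the table.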
It should be pointed out that the non-parametric analysis given here also provides in the statistics $U_k$ some information about whether a treatment is "good" or "bad" relative to the standard. By inspection of those statistics treatment 1 is "bad", whereas 2 and 3 are "good". We can also attribute a standard error to the statistics $U_k$, given, asymptotically, by
$$\sqrt{[2/\{n(p+1)\,\phi(p)\}]}.$$
In the example this takes value 0.79. Because it is a fairly small experiment and because there is a high correlation between the $U_k$'s, we need to adopt a cautious attitude in interpreting this standard error.
In principle this approach may be extended to deal with ties and/or with blocks (either incomplete or over-complete) of different sizes, although the algebra may not come out so neatly. By a suitable choice of "blocks" a non-parametric test may be derived for any situation in which an analysis of variance test would be appropriate on continuous measurements. In particular a relatively simple test emerges for the k-sample situation (as an alternative to the usual Kruskal-Wallis test). For k = 2 this reduces of course to the test given in equations (32)-(36) of the present paper. This two-sample test was earlier described by Professor Cox in his 1964 paper as an example of the use of exponential scores. In fact all the tests developed by the method I have described can be expressed in terms of exponential scores, illustrating the point that the use of these scores arises from the Lehmann alternative rather than from the exponential distribution itself.
A rather more interesting and relatively simple non-parametric test that can be derived is for the equivalence of treatment effects in a balanced incomplete block experiment. Apart from its practical uses it is interesting because if there are only two treatments per block we are back again in a paired comparison situation, only this time paired comparisons of the Round Robin type. For this case Professor Cox's approach leads to the test given by David (1963, p. 38). Thus the methods of this paper applied to traditional non-parametric problems enable us to put under a single umbrella apparently unconnected situations.
As usual the statistical ideas that Professor Cox has discussed are of both theoretical interest and great practical importance. It gives me the greatest pleasure to propose the vote of thanks.

Mr RICHARD PETO (Oxford University): I have greatly enjoyed Professor Cox's paper. It seems to me to formulate and to solve the problem of the regression of prognosis on other factors perfectly, and it is very pretty.
In one detail I think that Professor Cox has not claimed the full credit that his method deserves. Suppose we have a single explanatory variable z and a single parameter $\beta$ relating z to prognosis (i.e. to the distribution of failure time) and suppose that censoring is independent of z. In this situation, Professor Cox suggests in equation (18) the statistic U(0) for testing $\beta = 0$. This test statistic is not merely asymptotically efficient, it is locally most powerful among all rank-invariant test procedures. This is exactly true for any particular finite sample size, and U(0) is therefore the best conceivable rank-invariant test statistic for this problem.
In the case where z is a zero-one indicator variable, the test of $\beta = 0$ is the two-group rank test of Section 7, which is the logrank test and which has already been proved to be of maximal local power for detecting a difference between two groups of similarly censored observations. However, the discovery of a rank test of maximal local power for detecting dependence of prognosis on a continuous variable is completely novel. We have used Professor Cox's regression methods in Oxford on real data and, despite appearances, they are computationally very quick and easy to handle, given careful programming.
I think only that his treatment of tied ranks is unsatisfactory. From the viewpoint of the analysis of clinical trials, it falls between two stools. His suggested likelihood function for tied ranks is not exactly the correct likelihood function if time is continuous and tied ranks merely represent slight grouping, although the exactly correct function is horribly complicated. However, if Cox's suggested likelihood function is seen as merely a very good approximation to the proper grouped-continuous-time likelihood function, then it can be shown that an equally good approximation, which is much simpler, exists (see below).
Now, it is not fair to complain that a paper which has been very full and interesting does not give all the techniques required for the analysis of clinical trials. However, it does seem to us at Oxford that a synthesis of Professor Cox's fully conditional regression and our fully permutational two-group significance testing is better than either separately. In a clinical trial, patients are allocated at random to receive drug A or drug B and, as they enter the trial, various explanatory variables are recorded; white blood count, age and so on. Suppose we have a vector z of information on each patient, where $z_1$ is a zero-one indicator variable specifying group membership. Let $\beta$ be the vector of coefficients relating to prognosis in exactly the manner Professor Cox has described. Professor Cox has suggested the following test for whether, after allowing for everything else, group membership affects prognosis. First, find $\beta_r$, the restricted ML value of $\beta$ in which $\beta_1$, the group membership parameter, is constrained to be zero. Then examine what Professor Cox calls U(0), which is the log-likelihood derivative at $\beta_r$ with respect to the group membership parameter $\beta_1$. Following Professor Cox, either the square root of the log likelihood increase when the restriction on $\beta_1$ is lifted or U(0) is approximately normally distributed, and since $z_1$ is independent of the other components of z it does not matter which we examine to test whether treatment matters.
However, if this test is the heart of a clinical trial which has lasted several years, it is better for it to be exact than approximate. Having located the restricted likelihood maximum at $\beta_r$, we can in fact construct a score for each subject, expressing how well he has done given his initial white blood count, age and so on, such that the sum of the scores of the subjects in group A equals U(0). The null distribution of U(0) is therefore that of the sum of a random selection from the finite population of our derived scores, and exact significance tests are therefore possible.
Define the observed death count for subject j to be 1 or 0 according to whether the subject died or not, and define the expected death count for subject j to be an appropriate function of $\beta_r$,
$$\sum_{i:\, j \in R_{t(i)}} \exp(\beta_r \cdot z_j)\Big/\sum_{k \in R_{t(i)}} \exp(\beta_r \cdot z_k),$$
which equals the risk of death on a man-years basis for subject j if the explanatory variables affect prognosis as $\beta_r$ (where for typographical reasons $R(t_{(i)})$ is printed $R_{t(i)}$).
The score for subject j is now the difference between his observed and his expected death-counts, and the sum of the scores for one particular treatment group equals plus or minus U(0). The exact null distribution of U(0) is therefore that of the sum of a random combination of these scores.
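A minimal sketch of the observed-minus-expected scores just described, under assumptions that are ours: no tied failure times, covariates fixed in time, and a risk set consisting of every subject whose recorded time is at least the failure time in question. The function name `peto_scores` and the four-subject data set are hypothetical; in the test described above the coefficient vector would be the restricted estimate.

```python
import math

def peto_scores(times, died, z, beta):
    """Observed minus expected death counts for each subject."""
    weights = [math.exp(sum(b * x for b, x in zip(beta, zj))) for zj in z]
    expected = [0.0] * len(times)
    for i, t_i in enumerate(times):
        if not died[i]:
            continue                       # only failure times define risk sets
        risk_set = [k for k, t_k in enumerate(times) if t_k >= t_i]
        total = sum(weights[k] for k in risk_set)
        for j in risk_set:
            expected[j] += weights[j] / total
    return [d - e for d, e in zip(died, expected)]

# Hypothetical data: four subjects, one explanatory variable.
times = [2.0, 3.0, 5.0, 7.0]
died = [1, 0, 1, 1]
z = [[0.0], [1.0], [1.0], [0.0]]
scores = peto_scores(times, died, z, beta=[0.4])
print(scores, sum(scores))
```

Each failure contributes a total expected count of one to its risk set, so the scores sum to zero over all subjects; the sum over one treatment group is then the negative of the sum over the other.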
We have also found the calculation of observed and expected death-counts for individuals to be good for illustrating the dependence of prognosis on a particular factor. If the factor is divided into a few sub-groups, and the sums of the observed and the sums of the expected death-counts in those sub-groups are compared, then it is easier to understand physically the apparent nature of the dependence than if we just have a few regression coefficients and significance levels to look at.

Finally, I would like to return to the question of how Professor Cox deals with tied ranks when time is continuous and tied ranks mean only that slight grouping has occurred. If $\beta$ is a vector of coefficients and $z_j$ is the vector of explanatory variables for subject j, denote by $e_j$ the quantity $\exp(\beta \cdot z_j)$. Also, I restrict attention to one event only (consisting of one death or several tied deaths), and abbreviate "the sum over the risk set of" to "the sum of". Now, at any particular time the death rate for subject j is proportional to $e_j$, so if one death only occurs the probability that it was subject j who died is $e_j/\sum e$. What likelihood should replace $e_j/\sum e$ if more than one death occurs? As Professor Cox remarks, any relatively ad hoc modification of his procedure will deal satisfactorily with this problem if the ties are few in number.
I will take the special case of two subjects, $j_1$ and $j_2$, dying at the same recorded time: generalization to several deaths is straightforward. If time is continuous, the probability that $j_1$ and $j_2$ are the two subjects who die is the sum of the probability that $j_1$ dies first and $j_2$ second plus the probability that $j_2$ dies first and $j_1$ second. Call this the real probability;
$$P_{\rm real} = \frac{e_{j_1}}{\sum e}\cdot\frac{e_{j_2}}{(\sum e) - e_{j_1}} + \frac{e_{j_2}}{\sum e}\cdot\frac{e_{j_1}}{(\sum e) - e_{j_2}}.$$
Professor Cox's suggested probability appears in his equation (22); call this Cox's probability;
$$P_{\rm Cox} = \frac{2e_{j_1}e_{j_2}}{(\sum e)^2 - \sum e^2}.$$
I would like to suggest a third form that the probability might take, which I call the rough probability;
$$P_{\rm rough} = 2e_{j_1}e_{j_2}\Big/\Big(\sum e\Big)^2.$$
Physically, it is a matter of indifference which of the three forms we adopt. All are identically equal in the absence of tied ranks, and if there are tied ranks the differences between the three forms are two orders of magnitude less than the random variation which is being analysed. The rough probability is just as good an approximation to the real probability as Cox's probability, but all things being equal I suppose one would marginally prefer to use the real probability since no approximation to reality is involved. However, all things are not equal; the location, even given extremely efficient programming, of the maxima of likelihoods derived from the real probability or from Cox's probability is much more complex than the location of the maximum of the rough probability. For this reason, I believe that Professor Cox's model should perhaps be fitted in continuous time by maximizing the sum over all events of the logs of the rough probabilities. Susannah Howard has developed an algorithm which converges in powers of ten or better, and which is fast: the fit of five factors to 250 patients took less than a second per step on an old Atlas, and is, therefore, quite practicable.
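The three forms can be compared numerically for a single event with two tied deaths. The sketch below is ours; the function name and the risk set of 50 subjects are hypothetical, and the factor 2 in the rough probability is assumed so that it is directly comparable with the other two forms.

```python
def tie_probabilities(e, j1, j2):
    """Return (real, Cox, rough) probabilities that subjects j1 and j2 are the two who die."""
    s1 = sum(e)                    # the sum of e over the risk set
    s2 = sum(x * x for x in e)     # the sum of e squared over the risk set
    a, b = e[j1], e[j2]
    real = (a / s1) * (b / (s1 - a)) + (b / s1) * (a / (s1 - b))
    cox = 2 * a * b / (s1 ** 2 - s2)
    rough = 2 * a * b / s1 ** 2
    return real, cox, rough

# Hypothetical risk set: 50 subjects with e_j spread between 0.6 and about 1.6.
e = [0.6 + 0.02 * k for k in range(50)]
print(tie_probabilities(e, 3, 17))
```

With a risk set of this size the three values differ by a few per cent at most; when all the e are equal the real and Cox forms coincide exactly.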
Last week, I used these methods on some clinical trial data, and while I was going over the results someone asked me why I was looking so pleased. I said that it was because the method that was being used was so neat, and she asked me to explain it. She is not a mathematician nor a statistician, so I described the conditional argument and left out all the computational details. When I had finished, she said "I can't see why you think that's neat. It's just common sense." I second the vote of thanks to Professor Cox because he has opened up new territories to common sense.
The vote of thanks was put to the meeting and carried unanimously.
Professor D. J. BARTHOLOMEW (University of Kent): Professor Cox's methods have interesting potential applications to the analysis of labour wastage. The function $\lambda(t; z)$ then represents an individual's propensity to leave as a function of his length of service. The form of this function has an obvious relevance to personnel policies and it has been the subject of a good deal of empirical work. Forbes (1971) reviewed the application of life table techniques to the non-parametric estimation of the survivor function from censored data.
It is well established that propensity to leave depends on many attributes of which sex, grade, level of skill, place of residence are among the most important. The methods given in this paper offer the prospect of a much more efficient estimation of these relationships than has hitherto been possible. The model of equation (9) is particularly appealing because of its simplicity and because of a certain plausibility which it has in the wastage application. The form of survivor functions is remarkably stable and this might suggest a common $\lambda_0(t)$ scaled up or down by a factor depending on the explanatory variables z. Unfortunately there is a considerable body of empirical evidence to suggest that this is not the case. Survivor functions are often close to the lognormal implying that
$$\lambda(t) = \frac{\phi\{(\log t - \mu)/\sigma\}}{\sigma t\,[1 - \Phi\{(\log t - \mu)/\sigma\}]},$$
where $\phi$ is the standard normal density and $\Phi$ its integral. Further, the parameter $\sigma$ appears to reflect the type of job concerned (e.g. professional, skilled manual) whereas variation in the explanatory variables listed above exert their influence through $\mu$. A suitable model might then be obtained by writing $\mu = z'\beta$ in $\lambda(t)$. The analysis could then be developed using parametric maximum likelihood methods but the simplicity of the author's methods would be lost. It would be interesting to know whether the methods of the paper are robust enough to give sensible answers when the lognormal model is appropriate. Put another way we might ask whether it is possible to construct $z'\beta$ in such a way that there is close agreement between the two models. Some of the z's would be the explanatory variables in which we are interested and others might be functions of t designed to improve the approximation.
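The departure from proportionality can be seen by evaluating the lognormal hazard for two values of the location parameter. The sketch below is ours: the hazard is written in the standard lognormal form (normal density over survivor function, with the Jacobian for log t), and the parameter values and lengths of service are hypothetical.

```python
import math

def lognormal_hazard(t, mu, sigma):
    """phi(u) / [sigma t {1 - Phi(u)}] with u = (log t - mu) / sigma."""
    u = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    survivor = 0.5 * math.erfc(u / math.sqrt(2.0))    # 1 - Phi(u)
    return density / (sigma * t * survivor)

# Ratio of hazards for two hypothetical values of mu at three lengths of service.
ratios = [lognormal_hazard(t, 0.0, 1.0) / lognormal_hazard(t, 1.0, 1.0)
          for t in (0.5, 2.0, 8.0)]
print(ratios)
```

Under a model of the form (9) with time-independent z this ratio would be the same at every t; here it falls steadily as t increases, which is the kind of behaviour that functions of t among the z's would have to absorb.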
The non-parametric estimation of survivor functions when $\lambda(t)$ is monotonic, referred to in Section 2, has been extended to increasing failure rate average (IFRA) distributions. A review of this general problem is to appear in Barlow et al. (1972).

Mr DAVID OAKES (Imperial College, London): I should like to remark briefly concerning the estimation of the distribution of failure time once an estimate $\hat\beta$ of $\beta$ is obtained. The method given in Section 8 of the paper treats $\lambda_0(t)$ as identically zero except at points where failures occur. However when dealing with data in continuous time it seems more natural to assume that $\lambda_0(t)$ is a slowly varying function of t. This leads to a simple maximum likelihood estimate of $\lambda_k$, the (assumed constant) value of $\lambda_0(t)$ between the failure times $t_{(k-1)}$ and $t_{(k)}$ ($t_{(0)} = 0$). We obtain
$$\hat\lambda_k = \Big[\int_{t_{(k-1)}}^{t_{(k)}} \sum_i H(\tau_i - u)\exp\{\hat\beta z_i(u)\}\,du\Big]^{-1},$$
where $\tau_i$ is the time to failure or censoring of the ith individual and H(x) is the Heaviside unit function. In order to obtain a good indication of the behaviour of $\lambda_0(t)$ it will be necessary to apply some grouping or smoothing procedure to these estimates.
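A minimal sketch of this estimate, under simplifying assumptions that are ours: a single explanatory variable fixed in time, distinct failure times, and one failure closing each interval. With fixed covariates the integral of the Heaviside term is just the length of time each individual spends at risk inside the interval. The function name and the data in the call are hypothetical.

```python
import math

def piecewise_hazard(failure_times, tau, z, beta):
    """Reciprocal of the weighted time at risk between successive failure times."""
    estimates = []
    previous = 0.0                                   # t_(0) = 0
    for t_k in sorted(failure_times):
        exposure = 0.0
        for tau_i, z_i in zip(tau, z):
            # time individual i spends at risk between previous and t_k
            at_risk = max(0.0, min(tau_i, t_k) - previous)
            exposure += at_risk * math.exp(beta * z_i)
        estimates.append(1.0 / exposure)
        previous = t_k
    return estimates

# Hypothetical data: failures at 1 and 3, one individual censored at 2, beta = 0.
print(piecewise_hazard([1.0, 3.0], [1.0, 2.0, 3.0], [0, 0, 0], 0.0))
```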

Professor D. V. LINDLEY (University College London): For simplicity, my remarks are confined to the two-sample problem in continuous time. Let sample 0 have m observations occurring at times $s_1, s_2, \ldots, s_m$ (either failures or censored), and let m' (≤ m) of them be failures. The corresponding data for sample 1 are n times $t_1, t_2, \ldots, t_n$ of which n' (≤ n) are failures. If $\mathscr{F}_i(t)$ are the survivor functions (i = 0, 1), $f_i(t) = -d\mathscr{F}_i(t)/dt$ the corresponding density functions and $\lambda_i(t)$ the hazard rates, so that $f_i(t) = \mathscr{F}_i(t)\lambda_i(t)$, each censored value contributes a term $\mathscr{F}(t)$, and each failure a term $\mathscr{F}(t)\lambda(t)$, to the overall likelihood (as distinct from Cox's marginal likelihood). Hence the likelihood function is
$$\prod_{i=1}^{m}\mathscr{F}_0(s_i)\prod_{i \in F_0}\lambda_0(s_i)\prod_{j=1}^{n}\mathscr{F}_1(t_j)\prod_{j \in F_1}\lambda_1(t_j),$$
where $F_i$ is the set of failures for sample i. If we write, with the author, $\lambda_1(t) = \psi\lambda_0(t)$, so that $\mathscr{F}_1(t) = \mathscr{F}_0(t)^{\psi}$, this becomes
$$\Big\{\prod_{i=1}^{m}\mathscr{F}_0(s_i)\prod_{i \in F_0}\lambda_0(s_i)\prod_{j \in F_1}\lambda_0(t_j)\Big\}\Big\{\prod_{j=1}^{n}\mathscr{F}_0(t_j)^{\psi}\Big\}\psi^{n'}.$$
Now Ao(t),and hence %o(t),is unknown, so we shouldproperly I 0) and.50(tI 6)


writeAO(t
a parametric
indicating dependence on 0, say. It is immediately
apparentfromthesecond
setof bracesthattheobviousconditions fora marginal likelihoodargument, namelythat
thelikelihoodfactorizes
intoone partinvolvingb,theparameter ofinterest,and another
with0, thenuisanceparameter, doesnotobtain. So Cox's argument cannotbe supported
thisway.
Supposewe takethecase AO(t)= , a constant.Thenthelikelihoodis easilyfound
to be
e-(S+OT) Om'+n' /n

whereS = Si and T = = tj. If the prioris proportional


to 0-1 +b-l, we easily
obtaintheposteriorfor b to be proportional
to
0n'_- (S + )sT)mn (*

so thattb/sis F on (2n',2m') d.f.: heres = S/m'and t = Tln'. (Noticethedivisionby


m', n'; notm,n.)
Howeverthe assumptionof constanthazardis not necessarily appropriate, and is
clearlyavoidedin themarginallikelihoodapproach. But forany AO(t)thereis a trans-
formation ofthetimeaxisso thatit is constantand again(*) willobtainbutwithS and T
nowthesumson thenewtimescale. Hencewe can explorea rangeofpriorestimates for
AO(t) and see howtheresultsare affected.
It is worthcontrastingthemarginallikelihoodwiththeintegrated (withrespectto 0)
likelihood,equal to (*) times b. The former is a productof termslike b/(ai + bi+) or
(ai + bi+)-L whereai and bi referto thenumbersat risk. The numerators are at most
different by b butthedenominators are quitedifferentsincethetimesappearin (*) but
not in the marginallikelihood.Special cases are worthexploring.Suppose sample0
has one censoredvalueat 2, and sample1 has a failureat 1. Thenthemarginal likelihood
is b/(1 to thesinglerisksetat t = 1. The integrated
+ b)referring likelihoodis b/(2+ b).
Witha changeof timescale themostthatthelattercouldbe is b/(1 + b),and thiswhen
t = 2 is identified
witht = 1. The marginal likelihoodis therefore
veryextreme, especially
in its failureto dependon the timeof censoringor failurein sample0 whenever this
exceeds1.
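
Professor Lindley's special case can be checked numerically. The sketch below is only an illustration under his constant-hazard assumption; the function names are hypothetical, and the integrated likelihood is taken, as in his remarks, to be proportional to $\psi^{n'}(S+\psi T)^{-(m'+n')}$.

```python
def integrated_likelihood(psi, S, T, m_fail, n_fail):
    """psi^{n'} (S + psi T)^{-(m'+n')}: the likelihood integrated over theta
    with prior proportional to 1/theta (constant baseline hazard assumed)."""
    return psi ** n_fail * (S + psi * T) ** (-(m_fail + n_fail))

def marginal_term(psi, a, b):
    """Contribution psi / (a + b psi) of a sample-1 failure when a individuals
    from sample 0 and b individuals from sample 1 are at risk."""
    return psi / (a + b * psi)

# Sample 0: one value censored at 2 (m' = 0, S = 2).
# Sample 1: one failure at 1 (n' = 1, T = 1).
for psi in (0.5, 1.0, 4.0):
    print(psi,
          marginal_term(psi, a=1, b=1),
          integrated_likelihood(psi, S=2.0, T=1.0, m_fail=0, n_fail=1))
```

Replacing `S=2.0` by `S=1.0`, which corresponds to identifying $t = 2$ with $t = 1$, makes the two columns agree.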
Mr P. W. GLASSBOROW (British Rail): I want to make a brief remark. In Section 8 Professor Cox analyses two causes of failure and whether the causes of failure are independent. In real life they often are not independent and this brings us back to the beginning of the paper. It is unfortunate that Professor Cox uses the term "censored"; I do not know whether this has been used elsewhere instead of the traditional term "withdrawal". If you use "withdrawal" you realize it is just a type of failure, and withdrawal and failure are often not independent.
The following contributions were received in writing after the meeting.

Professor D. E. BARTON (Queen Mary College and Institute of Computer Science, University of London): My feeling is that Professor Cox understates the importance of Kaplan and Meier's result that the product-limit estimate is the maximum likelihood one and, conversely, is too kind to those who find the analytic problems in specifying the family of all possible distributions. As discussed in Barton (1968) there are several possible alternative forms of estimator, and it is not immediately clear that the maximum likelihood estimator makes best use of the information available. Moreover there are effectively an infinite number of nuisance parameters being eliminated (that is, a nuisance function: the unknown censoring rule). In the paper cited I show that the method of maximum likelihood gives more efficient estimation than the alternatives and give a heuristic argument suggesting that it is efficient. This efficiency does seem to be a property which gives Kaplan and Meier's result some importance.
Miss SUSANNAH HOWARD (Department of Biomathematics, Oxford University): Since Professor Cox has proposed such a satisfying method for the analysis of censored failure times, it seems worth while indicating how easily the computation involved can be performed.

By replacing the explanatory variables $z_j$ for each individual $j$ by $z_j - \bar z$, where $\bar z$ is the mean of $z$ over all those individuals who are observed to fail, the term $\sum s_{(i)}\beta$ in the full conditional log likelihood (following equation (22)) vanishes identically, giving

$$L(\beta) = -\sum_{i=1}^{k}\log\sum_{\ell\in R(t_{(i)};\,m_{(i)})}\exp\{s(\ell)\beta\}$$

in either discrete or continuous time. The notation here is as in Section 6 of the paper, but with $\bar z$ now equalling 0. If there are ties, $L$ and its first and second derivatives can be computed by exploiting their "symmetric function" properties in the following way.

Let $e_j$ be the exponential weight $\exp\{z_j\beta\}$ for the $j$th individual, and, for $1 \le \xi, \eta \le p$, define

$$x_{\xi j} = z_{\xi j}e_j, \qquad y_{\xi\eta j} = z_{\xi j}z_{\eta j}e_j.$$

For any risk set $\mathcal{R}$ and any integer $m$, define $a(\mathcal{R}; m)$, $b(\mathcal{R}; \xi; m)$, $c(\mathcal{R}; \xi, \eta; m)$ and $d(\mathcal{R}; \xi, \eta; m)$, for $1 \le \xi, \eta \le p$, by recursion on $\mathcal{R}$:

(i) If $\mathcal{R} = \emptyset$,
$$a(\mathcal{R}; m) = \delta_{m0},$$
and
$$b(\mathcal{R}; \xi; m) = c(\mathcal{R}; \xi, \eta; m) = d(\mathcal{R}; \xi, \eta; m) = 0 \quad\text{for all } m.$$

(ii) If $\mathcal{R}^{+} = \mathcal{R}\cup\{j\}$, with $j \notin \mathcal{R}$,
$$a(\mathcal{R}^{+}; m) = a(\mathcal{R}; m) + e_j\,a(\mathcal{R}; m-1),$$
$$b(\mathcal{R}^{+}; \xi; m) = b(\mathcal{R}; \xi; m) + e_j\,b(\mathcal{R}; \xi; m-1) + x_{\xi j}\,a(\mathcal{R}; m),$$
$$c(\mathcal{R}^{+}; \xi, \eta; m) = c(\mathcal{R}; \xi, \eta; m) + e_j\,c(\mathcal{R}; \xi, \eta; m-1) + y_{\xi\eta j}\,a(\mathcal{R}; m),$$
$$d(\mathcal{R}^{+}; \xi, \eta; m) = d(\mathcal{R}; \xi, \eta; m) + e_j\,d(\mathcal{R}; \xi, \eta; m-1) + x_{\xi j}\,b(\mathcal{R}; \eta; m) + x_{\eta j}\,b(\mathcal{R}; \xi; m)$$
for all $m$.

Then

$$L = -\sum_{i=1}^{k}\log a_i \quad\text{where } a_i = a(\mathcal{R}(t_{(i)}); m_{(i)}),$$

$$\frac{\partial L}{\partial\beta_\xi} = -\sum_{i=1}^{k} b_{\xi i} \quad\text{where } b_{\xi i} = b(\mathcal{R}(t_{(i)}); \xi; m_{(i)}-1)/a_i,$$

and

$$-\frac{\partial^2 L}{\partial\beta_\xi\,\partial\beta_\eta} = \sum_{i=1}^{k}\{c_{\xi\eta i} - b_{\xi i}b_{\eta i}\},$$

where

$$c_{\xi\eta i} = \{c(\mathcal{R}(t_{(i)}); \xi, \eta; m_{(i)}-1) + d(\mathcal{R}(t_{(i)}); \xi, \eta; m_{(i)}-2)\}/a_i.$$

Now if we consider the times of censoring or observed failure in reverse chronological order, the risk set $\mathcal{R}$ increases steadily. So after first defining arrays $A(m)$, $B(\xi, m)$, $C(\xi, \eta, m)$ and $D(\xi, \eta, m)$ according to (i), with $\xi$, $\eta$ and $m$ within the following bounds

    A(m):          0 <= m <= m_inf
    B(xi, m):      1 <= xi <= p,        0 <= m <= m_inf - 1
    C(xi, eta, m): 1 <= xi, eta <= p,   0 <= m <= m_inf - 1
    D(xi, eta, m): 1 <= xi, eta <= p,   0 <= m <= m_inf - 2

where $m_\infty = \max\{m_{(i)} \mid 1 \le i \le k\}$, then as each new individual joins the risk set the corresponding values in the arrays can be computed according to (ii). Thus at each failure time $t_{(i)}$ all the terms needed for computing $L$ and its derivatives are already known. Moreover, at any time $t$, in the bounds given above, $m_\infty$ may be replaced by $m_t = \max\{m_{(i)} \mid t_{(i)} \le t\}$.

This simple procedure can be programmed in a way which allows for flexibility, so that one can choose whether or not to use approximations for the second derivatives, or even for $L$ itself. If there are not too many data (say, up to 200 individuals with not more than 5 parameters to be fitted), maximization of the full conditional log likelihood is feasible without resorting to approximations and, in situations where time is genuinely discrete, as one might find in certain types of life-testing, it is better to fit the logistic model exactly. However, in analysing a large clinical trial with "ties" due to slight grouping, approximations such as the "rough" probability which Richard Peto has suggested would still seem preferable.
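
The recursion for $a$ and $b$ can be sketched in a few lines. The code below is an illustrative reading of steps (i) and (ii) only, assuming a single explanatory variable ($p = 1$) and omitting $c$ and $d$; the function names and the toy risk set are hypothetical, and a brute-force enumeration is included purely as a check.

```python
from itertools import combinations
from math import exp, log, prod

def symmetric_sums(z, beta, m_max):
    """Build a(R; m) and b(R; m) by adding individuals to R one at a time.

    a[m] is the sum over all m-subsets of R of the product of the weights e_j;
    b[m] is the derivative of a[m + 1] with respect to beta.
    """
    a = [1.0] + [0.0] * m_max  # step (i): a(empty; m) = 1 if m == 0, else 0
    b = [0.0] * (m_max + 1)    # step (i): b(empty; m) = 0
    for z_j in z:
        e_j = exp(z_j * beta)
        x_j = z_j * e_j
        # Step (ii). Descending m keeps the entries at m - 1 un-updated,
        # and b[m] is updated before a[m] so that it sees the old a[m].
        for m in range(m_max, -1, -1):
            a_below = a[m - 1] if m > 0 else 0.0
            b_below = b[m - 1] if m > 0 else 0.0
            b[m] = b[m] + e_j * b_below + x_j * a[m]
            a[m] = a[m] + e_j * a_below
    return a, b

def brute_force_a(z, beta, m):
    """Direct enumeration of all m-subsets, for checking only."""
    return sum(prod(exp(z_j * beta) for z_j in subset)
               for subset in combinations(z, m))

z = [-1.0, 0.5, 2.0, 0.0]      # hypothetical risk set
beta, m = 0.3, 2               # two tied failures at this failure time
a, b = symmetric_sums(z, beta, m)
log_lik_term = -log(a[m])      # this risk set's contribution to L
score_term = -b[m - 1] / a[m]  # its contribution to dL/dbeta
```

The cost of the update is proportional to the number of individuals times the largest multiplicity, whereas direct enumeration grows with the number of subsets.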
Professor B. BENJAMIN (Civil Service College): It is not quite true that actuaries are only concerned with situations in which sampling errors are insignificant. Many of them are involved in follow-up studies of special groups (e.g. those with impairments) or with non-life investigations which are analogous to reliability trials. The actuary is moreover not only interested in the probability of surviving $t$ years, or the expectation of life, or the expected number of "failures" in a specified period. He is interested in the shape of the life table. He is a collector of shapes and part of his special skill lies in his experience of and recognition of typical shapes. My approach to the data of Table 1 is as follows. (1) Turn the table upside down and group in 5-week periods to reduce irregularities. Assuming that the failures are at the nearest integral interval and that the censored "lives" survived to the beginning of the interval in which they were censored, calculate the average exposed and thence the average death-rates $m_t$ in each interval [note that we do not wholly discard the censored "lives"]. (2) Plot these and draw a smooth curve through the points (see Fig. I), thus inferring and removing sampling fluctuations (there are tests for improving the efficiency of this inference; see Benjamin and Haycocks, 1971). The shape of $m_t$ is reminiscent of many curves with a basic exponential progression and an additional component of early "mortality", probably like some population life tables where $m_x$, $x$ being age, is a combination of a Gompertz ($m_x = Bc^x$) and a Normal curve in early ages. It is also very evident without calculating errors that the two experiences are different. I have not seen the author's diagram. (3) Read off $m_t$ for each week and calculate first $p_t$ and then $[p_0 p_1 p_2 \cdots p_{t-1}]$, the probability of surviving $t$ intervals. (4) Calculate the variance of this probability and if necessary make a formal test of the difference between the two experiences. In this case, as the author agrees, the difference is overwhelmingly significant. There are probably weaknesses and strengths in this procedure. The author would be doing the actuaries a great service if he would turn to a practical review of these weaknesses and to an assessment of their importance in practical situations (like Table 1). Actuaries are willing and able to follow the mathematics, especially when so lucidly expressed as in this paper, but they need to be convinced that it is important to decision-taking.
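
Steps (1) and (3) of the procedure just described can be sketched as follows. This is only an illustration: the conversion from the central rate $m_t$ to the survival probability, $p_t = (2 - m_t)/(2 + m_t)$, is a common actuarial approximation assumed here and is not stated in the contribution, and the function names and counts are hypothetical.

```python
def central_rates(deaths, exposed):
    """Average death-rate m_t = deaths / average number exposed, per interval."""
    return [d / e for d, e in zip(deaths, exposed)]

def survival_from_rates(rates):
    """Return [p_0, p_0 p_1, ...], the probability of surviving t intervals.

    Assumes p_t = (2 - m_t) / (2 + m_t), i.e. deaths spread evenly over
    each interval (an assumption of this sketch, not of the contribution).
    """
    survival, running = [], 1.0
    for m_t in rates:
        running *= (2.0 - m_t) / (2.0 + m_t)
        survival.append(running)
    return survival

m = central_rates(deaths=[2, 3], exposed=[20.0, 15.0])  # hypothetical counts
print(survival_from_rates(m))
```

The smoothing of step (2) would be applied to the values of $m_t$ before they are passed to `survival_from_rates`.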


For what it is worth my estimates for the proportion surviving 5 and 10 weeks respectively are (variance in brackets)

                5 weeks           10 weeks
    Sample 1    0.649 (0.0155)    0.358 (0.0569)
    Sample 0    0.923 (0.0040)    0.753 (0.0145)

May I also stress, as elsewhere (Benjamin, 1972), that no actuary would recommend action on any experiment for which significance could be demonstrated only after great mathematical strain. Most important changes stick out like a sore thumb.

[FIG. I. Average death-rate $m_t$ plotted against time in weeks (0 to 20) for Sample 1 and Sample 0.]

Dr JOHN J. GART (National Cancer Institute): In 1958 Professor Cox presented an elegant and unified approach to the analysis of binary data and now he gives a treatment of life tables of equal elegance and usefulness. In Section 7 he points out the formal identity of (27) to the test for partial association in combining 2 × 2 contingency tables. It follows almost as directly that the $\chi^2$ test statistic for the comparison of $p + 1$ independent survival curves derived from (18) is formally identical to the Birch-Armitage statistic for partial association in $2 \times k \times (p + 1)$ contingency tables (Birch, 1965; Armitage, 1966). In the two-sample problem, it appears that valid, asymptotic methods for the point and interval estimation of $e^{\beta}$ are formally identical to those of the common odds ratio in combining 2 × 2 contingency tables (e.g. Gart, 1970). It will prove interesting to pursue further the possible parallels between life tables and contingency tables. Can the formally identical tests for interaction in higher dimensional contingency tables be used to test the plausibility of the proportional hazard rate model? Will the proportional hazard rate model methods prove as robust as the logistic model methods for contingency table analyses? Once again Professor Cox has provided a simple, coherent framework within which such questions can be resolved.

Drs L. D. MESHALKIN and A. R. KAGAN (World Health Organization): We congratulate the author on an extremely stimulating paper, which has relevance to epidemiological studies of prediction of high risk and identification of causes, as well as to clinical trials. We make below two points, illustrated by an example of how the power of a particular factor (raised blood pressure) to predict subsequent disease (death from cardiovascular disease) varies with the interval between its measurement and the onset of disease. We believe that this demonstrates further the ideas expressed by Cox.

1. Use of a more complicated function $h(z, \beta)$. Predictors of those at high risk to develop ischaemic heart disease have been identified by relating initial measurements made on groups of subjects to their subsequent disease experience. But the predictive power of some factors changes with the passage of time. It is important to know the way in which this change takes place for a proper understanding of the disease process and its control and also for more adequate study design.

An adaptation of Professor Cox's approach enables us to measure this even when the study includes subjects of different age, who remain in the study for varying periods of time, and the number of subsequent disease events is small (e.g. 684 males were followed for not more than 10 years, aged 30-62 years at entry, with 66 cases of cardiovascular death).

Our illustration (Fig. II in this Discussion) shows how the predictive power of the value of the systolic blood pressure decreases. Two analytical expressions were used for the function $h(z, \beta)$:

$$h_1 = (\beta_0 + \beta_1 z)(1 - \beta_2)^{T},$$
$$h_2 = (\beta_0 + \beta_1 z)/(1 + \beta_2 T),$$

where $T$ is the time from the initial measurement and $z$ the value of the systolic blood pressure. Fig. II shows that the choice of analytical expression has not influenced the result much.

2. A knowledge of $\lambda_0(t)$. For a number of chronic diseases, $\lambda_0(t)$ can be well approximated by the function

$$\lambda_0(t) = \exp\{d_0 + d_1 t\}$$

as used, for example, in de Haas (1964).

In the above example, use of this form of function $\lambda_0(t)$ reduces asymptotic variances of estimates by 10-20 per cent.

Computer programs for the above analyses can be obtained from the Numerical Analysis Unit of the Division of Research in Epidemiology and Communications Science, of the World Health Organization, Geneva, Switzerland.
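
The parametric forms mentioned in this contribution are simple to evaluate. The sketch below is illustrative only and is unrelated to the programs mentioned above: the parameter names are hypothetical, the form of $h_1$ follows the reading $(\beta_0 + \beta_1 z)(1 - \beta_2)^T$ of the printed expression, and the cumulative hazard is the closed-form integral of $\exp\{d_0 + d_1 t\}$, assuming $d_1 \neq 0$.

```python
from math import exp

def h1(z, T, b0, b1, b2):
    """h1 = (b0 + b1 z)(1 - b2)^T."""
    return (b0 + b1 * z) * (1.0 - b2) ** T

def h2(z, T, b0, b1, b2):
    """h2 = (b0 + b1 z) / (1 + b2 T)."""
    return (b0 + b1 * z) / (1.0 + b2 * T)

def baseline_hazard(t, d0, d1):
    """lambda_0(t) = exp(d0 + d1 t)."""
    return exp(d0 + d1 * t)

def cumulative_baseline_hazard(t, d0, d1):
    """Integral of exp(d0 + d1 u) for u from 0 to t, assuming d1 != 0."""
    return exp(d0) * (exp(d1 * t) - 1.0) / d1
```

Both $h_1$ and $h_2$ reduce to $\beta_0 + \beta_1 z$ at $T = 0$ and decrease with $T$ when $0 < \beta_2 < 1$, which is the kind of decay in predictive power the contribution describes.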

[Fig. II. Predictive power of initial systolic blood pressure as a function of time from initial measurement (horizontal axis: time in months, 10 to 120). The curves show the predictive power, with confidence intervals (two standard deviations), for the expressions $h_1$ and $h_2$. Predictive power is measured by $\log_{10} h(x_1)/h(x_2)$, where $h = h(x)$ is a factor which shows how many times the risk of an individual with a measurement value of $x$ is more than the risk for the average individual of his age, and $x_1$ ($x_2$) is the value of measurement such that one-quarter of the whole population of his age has bigger (lower) values of $x$.]

Professor M. ZELEN (State University of New York at Buffalo): My congratulations to Professor Cox on presenting a very stimulating and pioneering paper. He has raised several points in his paper which I am certain will be the subject of much future investigation. I wish to confine my remarks to the analogy between the model discussed by Professor Cox and contingency tables. To simplify matters only the two-sample problem will be discussed and no censoring will be assumed present.

Suppose we have $(k + 1)$ intervals $(z_{\alpha-1}, z_\alpha]$ ($\alpha = 1, 2, \ldots, k$), $(z_k, \infty)$ where $z_0 = 0$. Also let there be two populations having the conditional probabilities $p_{i\alpha} = \mathcal{F}_i(z_\alpha)/\mathcal{F}_i(z_{\alpha-1})$ for $i = 1, 2$. (Choose $z_k$ so that there are no failures past $z_k$.) Then, if the event of surviving or not surviving an interval is only considered for analysis, the comparison of the two populations is formally the same (as Professor Cox has noted) as comparing several 2 × 2 contingency tables. The test statistic depends on the alternative hypothesis whether the odds ratios $\theta_\alpha = q_{1\alpha}p_{2\alpha}/q_{2\alpha}p_{1\alpha}$ for the $\alpha$th table are the same or possibly different. If $\theta_\alpha = \theta$ for all $\alpha$, then the appropriate test statistic is the one discussed by Cochran (1954) and Mantel and Haenszel (1959). Alternatively, if the $\theta_\alpha$ are not all equal the test statistic would be different, cf. Zelen (1971). For example, if the two populations have exponential distributions ($\mathcal{F}_i(t) = \exp(-\lambda_i t)$), we have

$$\theta_\alpha = \left[\frac{1 - \exp(-\lambda_1\Delta_\alpha)}{1 - \exp(-\lambda_2\Delta_\alpha)}\right]\exp\{-(\lambda_2 - \lambda_1)\Delta_\alpha\},$$

where $\Delta_\alpha = z_\alpha - z_{\alpha-1}$. Thus the $\theta_\alpha$ will not be the same (provided $\lambda_1 \neq \lambda_2$) unless the intervals are chosen to be of equal length. In general for arbitrary survival distributions where $\mathcal{F}_2(t) = [\mathcal{F}_1(t)]^{P}$, the same result will hold in that the $\{\theta_\alpha\}$ will be different. The same remarks hold if the intervals are chosen to coincide with the observed failure times. Thus the asymptotic test procedure will not in general lead to equation (27) of Professor Cox's paper.
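
Professor Zelen's exponential example can be checked directly. The following sketch is illustrative only, with hypothetical rates and interval lengths; it computes $\theta_\alpha = q_{1\alpha}p_{2\alpha}/q_{2\alpha}p_{1\alpha}$ from the conditional survival probabilities $p_{i\alpha} = \exp(-\lambda_i\Delta_\alpha)$.

```python
from math import exp

def odds_ratio(lam1, lam2, delta):
    """theta = q1 p2 / (q2 p1) for an interval of length delta, where
    p_i = exp(-lam_i * delta) is the conditional survival probability."""
    p1, p2 = exp(-lam1 * delta), exp(-lam2 * delta)
    return (1.0 - p1) * p2 / ((1.0 - p2) * p1)

for delta in (0.5, 1.0, 2.0):  # hypothetical interval lengths
    print(delta, odds_ratio(1.0, 2.0, delta))
```

With unequal rates the printed values change with the interval length, while equal rates give an odds ratio of one for every interval.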


Professor R. E. BARLOW (University of California at Berkeley): Professor Cox has proposed some apparently very useful procedures for analysing life test data. In a recent paper with Doksum (1972), we found the cumulative total time on test statistic to be very useful in the single sample goodness-of-fit problem for exponentiality. This statistic is

$$\int_0^{X_{k:n}}[1 - F_n(u)]\,du \Big/ \int_0^{X_{n:n}}[1 - F_n(u)]\,du,$$

where $F_n$ is the empirical distribution and $X_{1:n} \le X_{2:n} \le \cdots \le X_{k:n}$ are the first $k$ order statistics from $F$. The process

$$\int_0^{F_n^{-1}(t)}[1 - F_n(u)]\,du$$

on [0, 1] also played a key role in Barlow and van Zwet (1970) where we investigated estimates for the failure rate assumed monotone. These statistics thus seem useful in life test models besides those based on the exponential distribution. Perhaps since the present paper is more concerned with supplementary information, total time on test statistics does not play such a central role. However, I would like to see a formulation of these problems in which the total time on test statistics might be used to advantage.

Reference should perhaps be made to the relevant paper by Harris et al. (1950) in connection with step function failure rate estimators.
Doksum (1967) also uses tests based on (32) for non-parametric two-sample life test problems. He shows that the Savage statistic (32) maximizes the minimum power over IFRA (for increasing failure rate average) distributions, $F$, asymptotically, for the problem $H_0$: $\Delta \le 1$ versus $H_1$: $\Delta > 1$ where the first sample is from $F(\cdot)$ and the second sample is from $F(\cdot/\Delta)$.

Recently, some very elegant properties of shock model processes have been discovered by Esary et al. (1972). Perhaps these are now ripe for statistical analysis.
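
The total time on test that Professor Barlow refers to can be computed from the ordered sample. The sketch below is a minimal illustration for a complete, uncensored sample, scaling the total time on test up to each order statistic by the total over the whole sample; the function name is hypothetical and the scaling is an assumption of this sketch rather than a definition taken from the papers cited.

```python
def scaled_total_time_on_test(sample):
    """Return the total time on test up to each order statistic, divided by
    the total time on test of the whole (complete, uncensored) sample."""
    x = sorted(sample)
    n = len(x)
    totals, running, previous = [], 0.0, 0.0
    for i, x_i in enumerate(x):
        # Between consecutive order statistics, n - i items are still on test.
        running += (n - i) * (x_i - previous)
        totals.append(running)
        previous = x_i
    return [t / running for t in totals]

print(scaled_total_time_on_test([3.0, 1.0, 2.0]))
```

For exponential data the scaled values lie close to the diagonal $k/n$; systematic departures above or below it are what tests based on this statistic look for.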

Drs JACK KALBFLEISCH and R. L. PRENTICE† (State University of New York at Buffalo): We would like to raise some questions concerning the conditional likelihood in Section 5 of this paper. Let us suppose a continuous hazard without censored observations. Expression (12) appears to be the conditional probability that individual $i$ fails at $t_{(i)}$, given that a failure occurs at $t_{(i)}$ and given the risk set $R(t_{(i)})$. Thus if individuals 1, 2, 3 have associated covariate values $z_1, z_2, z_3$ and are observed to fail at $t_1, t_2, t_3$, with $t_1 < t_2 < t_3$, then expression (12) yields

(i) $P(1 \text{ fails at } t_1 \mid \text{one failure at } t_1 \text{ and } R(t_1) = \{1, 2, 3\}) = \exp\{z_1\beta\}\big/\sum_{i=1}^{3}\exp\{z_i\beta\}$;

(ii) $P(2 \text{ fails at } t_2 \mid \text{one failure at } t_2 \text{ and } R(t_2) = \{2, 3\}) = \exp\{z_2\beta\}\big/\sum_{i=2}^{3}\exp\{z_i\beta\}$;

(iii) $P(3 \text{ fails at } t_3 \mid \text{one failure at } t_3 \text{ and } R(t_3) = \{3\}) = 1$.
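
The product of statements (i), (ii) and (iii) can be evaluated as in the following sketch, which assumes a scalar covariate, no ties and no censoring; the function name and covariate values are hypothetical.

```python
from itertools import permutations
from math import exp

def ordered_failure_probability(z_in_failure_order, beta):
    """Product over failure times of exp(z_i beta) divided by the sum of
    exp(z beta) over the individuals still at risk."""
    product = 1.0
    for i, z_i in enumerate(z_in_failure_order):
        risk_set = z_in_failure_order[i:]  # individuals not yet failed
        product *= exp(z_i * beta) / sum(exp(z * beta) for z in risk_set)
    return product

z = (0.2, -1.0, 1.5)  # hypothetical covariate values z_1, z_2, z_3
total = sum(ordered_failure_probability(order, 0.7) for order in permutations(z))
print(ordered_failure_probability(z, 0.7), total)
```

Summed over all possible orders of failure the product equals one, so it can be read as a probability distribution over the order in which the individuals fail.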


Our questions concern the combination of such statements to form the expression (13). If (13) is the logarithm of a conditional likelihood, then the product of (i), (ii) and (iii) should permit an interpretation as a conditional probability statement. The introduction of Section 5 appears to suggest that the distribution to be calculated is to be conditional on the observed order statistic. However, the conditional portion of (i), for instance, is the event that a failure occurs at $t_1$ and two failures occur after $t_1$ (as opposed to the event that failures occur at $t_1, t_2, t_3$). Thus the likelihood corresponding to (13) differs from that arising from the permutation distribution calculated conditionally on the observed failure times. The permutation distribution generally involves $\lambda_0(t)$ ($\beta \neq 0$).
† On leave from the University of Waterloo, Canada.


On casual reading, it appears that (13) is formed by regarding the selection of an individual from the risk set at each observed failure time as an independent experiment. The Cartesian product of the conditional probability spaces corresponding to each such experiment would then give a probability yielding (13) as the log likelihood of $\beta$. This procedure, however, defines a reference set which attaches positive probability to events in which the same individual fails several times. We would appreciate it if Professor Cox would discuss the reference set and the conditional probability statements from which (13) arises.

Considering again the continuous uncensored case, it is of interest to note that the model (9) is invariant under the group of differentiable, monotone, strictly increasing transformations on survival time. This invariance permits the calculation of a marginal likelihood (Kalbfleisch and Sprott, 1970, or Fraser, 1968) for $\beta$. The marginal likelihood, the logarithm of which is given by (13), arises from the marginal distribution of the ranks. The continuous censored case can also be handled from the viewpoint of marginal likelihood by imposing approximations similar to those in Section 5. Again the resulting expression is (13). If multiplicities are allowed in the continuous case, the resulting marginal likelihood differs from (22) and is written

(iv) $\displaystyle\sum_{i=1}^{k} s_{(i)}\beta - \sum_{i=1}^{k} m_{(i)}\log\sum_{\ell\in R(t_{(i)})}\exp\{z_\ell\beta\}$.

Expression (iv) seems appealing in certain special instances considered. For example, if $n = 2$ and $t_1 = t_2$ is observed with corresponding covariate values $z_1$ and $z_2$, then (iv) has a unique maximum at $\beta = 0$ unless $z_1 = z_2$. Expression (22), however, reduces identically to zero in this case, indicating that no one value of $\beta$ is to be preferred to any other. But, if $z_1$ and $z_2$ differ widely it seems clear that $\beta = 0$ is to be favoured (provided the intervals for measuring survival time are not unduly large).
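
The two-individual example can be verified numerically. The sketch below writes out expression (iv) and expression (22) for $n = 2$ tied failures with a scalar covariate; the function names and covariate values are hypothetical.

```python
from math import exp, log

def expression_iv(beta, z1, z2):
    """(iv) for two tied failures: (z1 + z2) beta - 2 log(e^{z1 beta} + e^{z2 beta})."""
    return (z1 + z2) * beta - 2.0 * log(exp(z1 * beta) + exp(z2 * beta))

def expression_22(beta, z1, z2):
    """(22) for the same data: the only subset of size 2 is the pair itself,
    so the sum inside the logarithm has the single term e^{(z1 + z2) beta}."""
    return (z1 + z2) * beta - log(exp((z1 + z2) * beta))

for beta in (-0.5, 0.0, 0.5):
    print(beta, expression_iv(beta, 0.0, 3.0), expression_22(beta, 0.0, 3.0))
```

Expression (22) is zero for every value of $\beta$, whereas (iv) is largest at $\beta = 0$ when the two covariate values differ.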
In order to keep these comments relatively brief, the calculations involved in obtaining these marginal likelihoods have been deferred to a note now being prepared for publication.

A final question involves the specification of the continuous model (9). Professor Cox suggests that a function of survival time itself may be used as a covariate in the hazard function. Since no assumption is made about $\lambda_0(t)$, the hazard

$$\lambda(t, z) = \lambda_0(t)\exp\{\beta_1 z\}$$

may be re-written as

$$\lambda(t, z) = \lambda_1(t)\exp\{\beta_2 t + \beta_1 z\}$$

without additional assumption. Corresponding to these two specifications, different conditional likelihoods (13) could be formed, which would generally give rise to different estimates of $\beta$. We note that the above-mentioned marginal likelihoods do not permit the inclusion of such time dependent covariates, and we would appreciate a discussion of when such covariates should be included.

Professor NORMAN BRESLOW (University of Washington): Like some of the other discussants I too was puzzled by the conditional likelihood of Section 2. I would like to suggest an alternative approach to the estimation of $\beta$ and $\lambda_0$ which leads to equation (14) and also to a simpler estimate of the underlying survival distribution than is provided by equations (37) and (38). This approach is motivated in part by the discussion of Kalbfleisch and Prentice. However it differs from both their arguments and those of Cox in that simultaneous estimation of $\beta$ and $\lambda_0$ is achieved through consideration of a joint likelihood function involving both sets of parameters.

One of the methods of deriving the Kaplan-Meier estimate in a maximum likelihood (ML) framework is to restrict attention to distributions having a hazard function which is constant between the distinct observed uncensored failure times, i.e.

$$\lambda_0(t) = \lambda_i \quad\text{for } t_{(i-1)} < t \le t_{(i)},\quad i = 1, \ldots, k.$$

This is also the starting point from which Grenander (1956) derives ML estimates in the class of distributions with monotone hazard functions. Writing down the joint likelihood for Cox's model with $\lambda_0$ as defined above, and adopting Kalbfleisch and Prentice's convention of considering all censored observations as censored at the preceding uncensored failure time, it turns out that the values of $\beta$ and $\lambda_i$ which simultaneously maximize the likelihood are given by setting Cox's equation (14) to 0 to find $\hat\beta$ and by

$$\hat\lambda_i = m_{(i)}\Big/\Big\{(t_{(i)} - t_{(i-1)})\sum_{\ell\in R(t_{(i)})}\exp(z_\ell\hat\beta)\Big\}.$$

Hence the estimate of the cumulative hazard

$$\Lambda(t) = -\log\mathcal{F}(t) = \int_0^{t}\lambda(u)\,du$$

evaluated at $t_{(i)}$ is

$$\hat\Lambda(t_{(i)}) = \sum_{j=1}^{i} m_{(j)}\Big/\sum_{\ell\in R(t_{(j)})}\exp(z_\ell\hat\beta).$$

With $\hat\beta = 0$ this is the form of the Kaplan-Meier estimate considered by Nelson (1969). To achieve an exact analogue of the Kaplan-Meier estimate, one may take

$$\hat{\mathcal{F}}(t) = \prod_{t_{(i)} < t}(1 - \hat\pi_i),$$

where

$$\hat\pi_i = m_{(i)}\Big/\sum_{\ell\in R(t_{(i)})}\exp(z_\ell\hat\beta).$$

This expression for the $\hat\pi_i$ can also be obtained as a first-order approximation to the estimate suggested by Cox and, as noted by them, as an approximation to the estimate derived from the distinct discrete time model of Kalbfleisch and Prentice.

I have recently applied Cox's regression model to the covariance analysis of survival data arising from a clinical trial involving 268 patients on 5 regimens. When the estimate of the underlying survival distribution suggested above was compared to the more complicated estimate of Cox, the two were found to agree to within 0.001 at each time point. Even more surprising was the fact that neither departed greatly from the unadjusted Kaplan-Meier estimate, obtained by setting $\hat\beta = 0$ in the expression for $\hat\pi_i$ above. This was true in spite of the fact that the covariate had a marked effect on survival.
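
The cumulative hazard estimate described in this contribution can be sketched as below. The code is an illustration only: it assumes a scalar covariate and a given value of $\hat\beta$, and it treats every individual whose recorded time is at least $t_{(j)}$ as a member of the risk set at $t_{(j)}$; the function name and data are hypothetical.

```python
from math import exp

def cumulative_hazard(times, failed, z, beta_hat):
    """Cumulative baseline hazard at each distinct failure time:
    the sum over failure times t_(j) <= t of
    m_(j) / sum over the risk set of exp(z_l * beta_hat).

    times: failure or censoring time of each individual
    failed: True where the time is an observed failure
    z: scalar covariate of each individual (a simplifying assumption)
    """
    failure_times = sorted({t for t, d in zip(times, failed) if d})
    result, running = [], 0.0
    for t_j in failure_times:
        m_j = sum(1 for t, d in zip(times, failed) if d and t == t_j)
        at_risk = sum(exp(z_l * beta_hat) for t, z_l in zip(times, z) if t >= t_j)
        running += m_j / at_risk
        result.append((t_j, running))
    return result

# Hypothetical data: failures at 1, 2 and 3, one value censored at 2.
print(cumulative_hazard([1, 2, 2, 3], [True, True, False, True], [0.0] * 4, 0.0))
```

With `beta_hat=0.0` every individual has weight one, so each increment is the number of failures divided by the number at risk.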

The AUTHOR replied briefly at the meeting and subsequently more fully in writing as follows.

I am very grateful to all the contributors for their constructive and helpful comments. Many points have been made and it is not feasible to comment on them all.

Professor Downton has discussed a number of interesting non-parametric procedures which have good properties when the data are derived from underlying exponential variates. One question here concerns whether it is practicable to test from data whether such tests are more appropriate than, say, those based on underlying normal variates.

Mr Peto has made a number of very cogent points. The fact that "exact" tests can be based on the permutation distribution, while it does require the extra assumption that censoring operates equally on all groups, is important. Also his suggestion of a simpler approximate likelihood for the grouped case is ingenious and should certainly be noted by anyone proposing to use these methods, as should Miss Howard's valuable contribution on computational methods.

Professors Lindley, Zelen, Breslow and Kalbfleisch and Prentice all raise questions about the likelihood (12). The paper is unduly cryptic over this and I agree that further work may be needed to clarify exactly what is being done. The essence of the argument seems to me to be as follows.


(a) If $\lambda_0(t)$ is specified parametrically, the ordinary likelihood is used consisting, when the explanatory variables are independent of time, of a product of density functions from the individuals who fail and survivor functions from the individuals who are censored. This can be regarded as an integral to which all elements of time at risk contribute.

(b) If $\lambda_0(t)$ is arbitrary, (a) is not helpful. (Professor Lindley's remark about a transformation of the time scale is, I think, useful only when $\lambda_0(t)$ is known.) We therefore consider the likelihood for a description of part of the data, namely the specification of those individuals who fail, considering hypothetical repetitions in which the times of failure are fixed. The probabilities in this new random system are deduced from those in the original fuller specification. Each probability is conditional on what happened at the previous time-points and on any intervening censoring. Factors associated with non-occurrences in intervening time-intervals are, however, not included. This is in the spirit of Bartlett (1937).

(c) This raises a number of issues.

(i) It is assumed without proof in the paper that the usual asymptotic procedures and properties associated with maximum likelihood estimates and tests hold.

(ii) Is it possible and worth while to try to recover information which for any specific $\lambda_0(t)$ is contained in the gaps between failures?

(iii) What is the loss of information about the regression coefficients involved in using the procedures of the paper when some parametric representation of $\lambda_0(t)$ is in fact appropriate? This clearly depends on the magnitude of the regression effects present.

Both Professors Lindley and Zelen work with formulations in which an exponential assumption allows use of information arising from gaps. Their results therefore differ from the results of the paper which, at least when the explanatory variables are independent of time, are invariant under monotonic transformations of the time scale, a property emphasized by Mr Peto; see especially Peto and Peto (1972). Incidentally a non-Bayesian version of Professor Lindley's main result is used at the end of Section 10 in comparing alternative analyses.

Professor Breslow's interesting derivation is not, I feel, essentially different from what I have done. He attaches a separate unknown parameter to every gap. This is an oblique way of saying that the gaps contribute no information about $\beta$. His likelihood function has a very large number of unknown parameters and this is well known to be dangerous.

In discrete time the position is in some ways more complicated. The logistic model used in (21) is possibly sensible for a process "really" taking place in discrete time, but is only a first-order approximation when the data are obtained by grouping a process in continuous time to which (9) applies. Putting the same point another way, if we had large amounts of data from the same system in two sets with greatly different grouping intervals, slightly different estimates would be obtained for the regression coefficients. This is unlikely to be a serious practical point and from this point of view, there being an approximation anyway, use of Mr Peto's simpler function seems entirely sensible.

Mr Oakes's suggestion appears superior to that of Section 8 of the paper.

Professor Bartholomew has raised some interesting questions, which serve in particular to emphasize that a simple model in terms of hazards may not be the best way to proceed. Dr Meshalkin and Dr Kagan's contribution is very welcome as illustrating both a more complicated form of dependence on the explanatory variables and the use of a parametric assumption for $\lambda_0(t)$.

Mr Glassborow stresses an important assumption about censoring. As to terminology, I think I have followed that usual in statistical papers although this may well not be ideal. It is worth emphasizing that the discussion of Section 9 is concerned with the possibly rather unusual situation where there are two or more distinct kinds of failure time, all of which may be observed, and not with the situation where only one kind of failure time can be observed on any one individual.


Professor Benjamin's analysis is not all that different from the one of the paper, especially in the light of Fig. 1 (which unfortunately was not available at the meeting). His approach is in some ways simpler, and therefore better, than that of the paper. On the other hand, the regression approach deals more readily with complex problems involving many explanatory variables. Also in simpler problems, provided that the relation between the different hazards is fairly direct, the comparison between them is made concisely in terms of parameters with a quite immediate physical meaning. Of course I agree that in taking action one wants the statistical uncertainty in the narrow sense to be small, although there surely are situations where this is not achievable.

I agree with Professor Barton that the difficulty in specifying the space of distributions involved in the maximum likelihood property of the product-limit method is not to be taken very seriously. On the other hand the property is analogous to that for a multinomial distribution with a very large number of cells and typical observed occupancies all very small, and the usual justifications for maximum likelihood are then fairly irrelevant.

Dr Gart has raised the important possibility that a wide variety of contingency table techniques can be adapted. Professor Barlow mentions a number of very interesting recent investigations. It seems quite possible that time on test could be adapted to problems of this paper by working with an estimated operational time variable after preliminary estimation of the regression coefficients.

Professors Kalbfleisch and Prentice have asked for clarification of the role of time-dependent explanatory variables. These must be either fixed functions for each individual or, if random, we argue conditionally on their realized values. If we were to take the same fixed function for each individual, e.g. $t$ itself, the contribution would disappear from (12), the function having been absorbed into $\lambda_0(t)$. In the example we have an explanatory variable that is $t$ for some individuals and zero for others.

Finally I would like to stress that while the model (9) seems to provide a flexible and simple way of representing a wide range of situations it is only one such way and the possibility of other physically sounder or more economical models should not be overlooked. Further, given the model (9), the method of analysis given main emphasis here is only one way of proceeding and the possibility of a parametric representation of $\lambda_0(t)$ will often be worth consideration.
REFERENCES IN THE DISCUSSION
ARMITAGE, P. (1959). The comparison of survival curves. J. R. Statist. Soc. A, 122, 279-300.
ARMITAGE, P. (1966). The chi-squared test for heterogeneity of proportions after adjustment for
stratification. J. R. Statist. Soc. B, 28, 150-163; Addendum: 1967, 29, 197.
BARLOW, R. E., BARTHOLOMEW, D. J., BREMNER, J. M. and BRUNK, H. D. (1972). Statistical
Inference under Order Restrictions. Chichester: Wiley.
BARLOW, R. E. and DOKSUM, K. (1972). Isotonic tests for convex orderings. In Proc. 6th Berkeley
Symp. on Math. Statist. Prob., pp. 293-323. Berkeley: University of California Press.
BARLOW, R. E. and VAN ZWET, W. (1970). Asymptotic properties of isotonic estimators for the
generalised failure rate function. In Proc. 1st Int. Symp. on Non-parametric Techniques in
Statistical Inference, pp. 159-174. Cambridge: University Press.
BARTLETT, M. S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. A, 160,
268-282.
BARTON, D. E. (1968). The solution of stochastic integral relations for strongly-consistent
estimators of an unknown distribution function from a sample subject to variable censoring
and truncation. Trab. Estadist., 19, 51-73.
BENJAMIN, B. (1972). Stochastic aspects of life tables. I.M.A. Bull., 8, 12-16.
BENJAMIN, B. and HAYCOCKS, H. W. (1971). The Analysis of Mortality and other Actuarial
Statistics. Cambridge: University Press.
BIRCH, M. W. (1965). The detection of partial association, II. The general case. J. R. Statist.
Soc. B, 27, 111-124.
BRADLEY, J. V. (1968). Distribution-free Statistical Tests. Englewood Cliffs, N.J.: Prentice Hall.
COCHRAN, W. G. (1954). Some methods for strengthening the common χ² tests. Biometrics,
10, 417-451.
COX, D. R. (1958). The regression analysis of binary sequences (with Discussion). J. R. Statist.
Soc. B, 20, 215-242.
DAVID, H. A. (1963). The Method of Paired Comparisons. London: Griffin.
DE HASS, J. H. (1964). Changing Mortality Patterns and Cardiovascular Diseases. N. V. Haarlem:
De Erven F. Bohn.
DOKSUM, K. (1967). Asymptotically optimal statistics in some models with increasing failure
rate averages. Ann. Math. Statist., 38, 1731-1739.
ESARY, J. D., MARSHALL, A. W. and PROSCHAN, F. (1972). Shock models. Ann. Math. Statist.,
in the press.
FORBES, A. F. (1971). Non-parametric methods of estimating the survivor function. The
Statistician, 20, 27-52.
FRASER, D. A. S. (1968). The Structure of Inference. New York: Wiley.
GART, J. J. (1970). Point and interval estimation of the common odds ratio in the combination
of 2 x 2 tables with fixed marginals. Biometrika, 57, 471-475.
GRENANDER, U. (1956). On the theory of mortality measurement, Part II. Skan. Aktuarietidskr.,
39, 125-153.
HARRIS, T. E., MEIER, P. and TUKEY, J. W. (1950). Timing of the distribution of events between
observations. Hum. Biol., 22, 249-270.
KALBFLEISCH, J. D. and SPROTT, D. A. (1970). Application of likelihood methods to models
involving large numbers of parameters (with Discussion). J. R. Statist. Soc. B, 32, 175-208.
MANTEL, N. and HAENSZEL, W. (1959). Statistical aspects of the analysis of data from
retrospective studies of disease. J. Nat. Cancer Inst., 22, 719-748.
NELSON, W. (1969). Hazard plotting for incomplete failure data. J. Qual. Tech., 1, 27-52.
ZELEN, M. (1971). The analysis of several 2 x 2 contingency tables. Biometrika, 58, 129-137.