A General Definition of Residuals

Author(s): D. R. Cox and E. J. Snell


Source: Journal of the Royal Statistical Society. Series B (Methodological), 1968, Vol. 30,
No. 2 (1968), pp. 248-275
Published by: Wiley for the Royal Statistical Society

Stable URL: https://www.jstor.org/stable/2984505

This content downloaded from


196.189.55.94 on Fri, 22 Dec 2023 06:33:13 +00:00
All use subject to https://about.jstor.org/terms
248 [No. 2,

A General Definition of Residuals

By D. R. Cox and E. J. SNELL


Imperial College

[Read at a RESEARCH METHODS MEETING of the Society, March 13th, 1968,
Professor R. L. PLACKETT in the Chair]

SUMMARY
Residuals are usually defined in connection with linear models. Here a more
general definition is given and some asymptotic properties found. Some
illustrative examples are discussed, including a regression problem
involving exponentially distributed errors and some problems concerning
Poisson and binomially distributed observations.

1. INTRODUCTION
RESIDUALS are now widely used to assess the adequacy of linear models; see Anscombe
(1961) for a systematic discussion of significance tests based on residuals, and for
references to earlier work. A second and closely related application of residuals is
in time-series analysis, for example in examining the fit of an autoregressive model.
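In the context of normal-theory linear models, the n x 1 vector of random variables
Y is assumed to have the form

$$Y = X\beta + \varepsilon, \tag{1}$$

where X is a known matrix, $\beta$ is a vector of unknown parameters and $\varepsilon$ is an n x 1 vector
of unobserved random variables, independently normally distributed with zero mean and
with constant variance. If $\hat\beta$ is the vector of least-squares estimates of $\beta$, the
residuals R* are defined by

$$Y = X\hat\beta + R^*. \tag{2}$$

Provided that the number of parameters is small compared with n, most of the
properties of R* are nearly those of $\varepsilon$, i.e. R* should have approximately the
properties of a random sample from a normal distribution. In fact, R* being linear
in Y, the random variable R* has, under (1), a singular normal distribution and
hence the properties of significance tests can be studied in some detail (Anscombe,
1961).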
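As a small illustration of why R* has a singular distribution, the sketch below fits a straight line by least squares and checks that the residuals satisfy two exact linear constraints, one for each fitted parameter. The data values and the function name `least_squares_residuals` are made up for illustration and do not come from the paper.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]


def least_squares_residuals(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


b0, b1, r_star = least_squares_residuals(x, y)

# The residuals are linear in y and orthogonal to the columns of X,
# here the constant column and the x column.
constraint_sum = sum(r_star)
constraint_x = sum(xi * ri for xi, ri in zip(x, r_star))
print(constraint_sum, constraint_x)
```

With p fitted parameters the n residuals satisfy p such constraints, so only n - p of them are free to vary.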
The main types of departure from the model (1) likely to be of importance are:
(i) the presence of outliers;
(ii) the relevance of a further factor, omitted from (1), detected by plotting the
residuals against the levels of that factor;
(iii) non-linear regression on a factor already included in (1), detected by plotting the
residuals against the levels of that factor and obtaining a curved relationship;
(iv) correlation between different $\varepsilon_i$'s, for example between $\varepsilon_i$'s adjacent in time,
detected from scatter diagrams of suitable pairs of $R_i^*$'s, or possibly from a
periodogram analysis of residuals;
(v) non-constancy of variance, detected by plotting residuals or squared residuals
against factors thought to affect the variance, or against fitted values;

(vi) non-normality of the distribution of the $\varepsilon_i$'s, detected by plotting the ordered
residuals against the expected order statistics from a standard normal distribution
(Pearson and Hartley, 1966, Table 28).
Corresponding to the graphical analyses suggested in points (i)-(vi), statistics can
be constructed for formal tests of significance. The idea of inspecting residuals is
very old, but the systematic calculation of residuals, particularly from extensive data,
has become practicable only recently; their thorough graphical analysis as a routine
is feasible only with a suitable computer graphical output device.
The examination of the adequacy of a model by such analyses may be contrasted
with a more formal approach in which there is fitted:

either (a) a more general model containing one or more additional parameters and
reducing to (1) for particular values of the new parameters;

or (b) a different family of models, adequacy of fit being assessed, say, by the
maximum log likelihood achieved.

One example of (a) is the family of models considered by Box and Cox (1964),
in which model (1) is considered as applying to an unknown power of the original
observations. The advantages of the more formal techniques are that they have
sensitive significance tests associated with them and that they are directly constructive
in the sense that, if the initial model does not fit, a specific better-fitting model is
obtained immediately from the analysis. On the other hand, analysis of residuals,
especially by graphical techniques, does not require committal in advance to a
particular family of alternative models. It will indicate the nature of a departure
from the initial model, but not explicitly how to extend or replace the model. With
very extensive data, significance testing is relatively unimportant and the types of
departure that can be detected are more numerous than can be captured in advance
in a few simple parametric models. It is in such applications that the analysis of
residuals is likely to be most fruitful.

2. A MORE GENERAL DEFINITION


The main object of the present paper is to give a more general definition of residuals
and to illustrate some of its properties and applications. Consider a model expressing
an observed vector random variable Y in terms of a vector $\beta$ of unknown parameters
and a vector $\varepsilon$ of independent and identically distributed unobserved random variables.
More particularly we assume that each observation $Y_i$ depends on only one of the
$\varepsilon$'s, so that we can write

$$Y_i = g_i(\beta, \varepsilon_i) \quad (i = 1, \ldots, n). \tag{3}$$

This assumption excludes applications to time series and to component of
variance problems in which several random variables enter each observation.
Models involving discrete distributions, such as the Poisson, are not in
the first place included, because, for example, Poisson variables with
different means cannot be expressed in terms of independent and identically
distributed observations. Later, however, in Sections 7 and 8, we extend the definition
to deal with Poisson and binomial distributions.
To define residuals for (3), let $\hat\beta$ be the maximum likelihood estimate of $\beta$ from
Y. It would be possible to work with other asymptotically efficient estimates, or
even with inefficient estimates, but the details of Section 4 would be different.


Now suppose that the equation

$$Y_i = g_i(\hat\beta, R_i) \tag{4}$$

has a unique solution for $R_i$, namely

$$R_i = h_i(Y_i, \hat\beta). \tag{5}$$

Note that

$$\varepsilon_i = h_i(Y_i, \beta). \tag{6}$$

We take (5) as defining the residuals; later
we shall introduce a minor modification.
Example 1. If (3) is the normal-theory linear model (1) with known variance,
the residuals (5) are the same as those, R*, of Section 1, equation (2). If the variance
is an additional unknown parameter, then

$$R_i = R_i^* \Big\{\sum_j R_j^{*2}/n\Big\}^{-1/2},$$

and so $R_i$ is essentially equivalent to $R_i^*$.


Example 2. Feigl and Zelen (1965) discussed some leukemia data in which for
the ith individual $Y_i$ is the time to death in weeks and $x_i$ is the log of the initial
white blood cell count. Feigl and Zelen considered primarily linear regression of
$Y_i$ on $x_i$ with exponential errors, but here we work with the model, mentioned briefly
by Feigl and Zelen,

$$Y_i = \beta_1 \exp\{\beta_2(x_i - \bar x)\}\,\varepsilon_i, \tag{7}$$

where $\varepsilon_1, \ldots, \varepsilon_n$ are independently exponentially distributed with unit mean and
$\bar x = \sum x_j/n$. The advantage of (7) over a linear regression model is that, for all $\beta_1 > 0$
and all $\beta_2$, $x_i$, the random variable on the left-hand side is positive.
For this model, if $\hat\beta_1$, $\hat\beta_2$ are maximum likelihood estimates, then

$$Y_i = \hat\beta_1 \exp\{\hat\beta_2(x_i - \bar x)\}\,R_i,$$

i.e.

$$R_i = [\hat\beta_1 \exp\{\hat\beta_2(x_i - \bar x)\}]^{-1}\,Y_i. \tag{8}$$
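The calculation in Example 2 can be sketched numerically. The code below fits model (7) by maximum likelihood to the leukemia data listed in Table 1 of Section 6 and then forms the crude residuals (8). The function name `fit_model_7`, the bisection search and its bracket are assumptions of this sketch; the paper does not say how the estimates were computed.

```python
import math

# Leukemia data as tabulated in Table 1 (Feigl and Zelen, 1965):
# X = log white blood cell count, Y = survival time in weeks.
X = [3.36, 2.88, 3.63, 3.41, 3.78, 4.02, 4.00, 4.23, 3.73,
     3.85, 3.97, 4.51, 4.54, 5.00, 5.00, 4.72, 5.00]
Y = [65, 156, 100, 134, 16, 108, 121, 4, 39,
     143, 56, 26, 22, 1, 1, 5, 65]


def fit_model_7(x, y, lo=-10.0, hi=10.0, iterations=200):
    """Maximum likelihood fit of Y_i = b1 * exp{b2 * (x_i - x_bar)} * eps_i.

    The bracket [lo, hi] for b2 and the use of bisection are assumptions
    of this sketch, not part of the paper.
    """
    n = len(y)
    x_bar = sum(x) / n
    d = [xi - x_bar for xi in x]

    def score_b2(b2):
        # Has the sign of dL/d(b2), because the d_i sum to zero and b1 > 0;
        # it is a decreasing function of b2, so the root is unique.
        return sum(di * yi * math.exp(-b2 * di) for di, yi in zip(d, y))

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score_b2(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    b2 = 0.5 * (lo + hi)
    # For given b2 the likelihood equation for b1 has a closed-form solution.
    b1 = sum(yi * math.exp(-b2 * di) for di, yi in zip(d, y)) / n
    # Crude residuals, equation (8).
    residuals = [yi * math.exp(-b2 * di) / b1 for di, yi in zip(d, y)]
    return b1, b2, residuals


b1_hat, b2_hat, crude = fit_model_7(X, Y)
print(round(b1_hat, 2), round(b2_hat, 3))
print([round(r, 2) for r in crude])
```

The residuals sum to n by construction, and they should agree to rounding with the crude residuals listed in Table 1.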


Example 3. For some purposes it is convenient for analysing a random sample
$Y_1, \ldots, Y_n$ from a Weibull distribution to write the model in the form

$$Y_i = (\beta_1 \varepsilon_i)^{\beta_2}, \tag{9}$$

where again $\varepsilon_1, \ldots, \varepsilon_n$ have an exponential distribution of unit mean.
Some further examples are given in Section 10.
Often the number of parameters is small compared with the number of observations
and the configuration is such that all relevant combinations of parameters are
estimated with small standard error of order $n^{-1/2}$. Then a residual $R_i$ will differ from
$\varepsilon_i$ by an amount of order $n^{-1/2}$ in probability and most statistical properties of the
R's will differ little from those of the $\varepsilon$'s.
We examine the properties of the R's more carefully in Section 4. This will be
done by expanding $R_i - \varepsilon_i$ in a Taylor series in terms of $\hat\beta - \beta$. We need some of the
properties of maximum likelihood estimates, in particular an expression for their bias,
and these are developed briefly in Section 3.


3. SOME PROPERTIES OF MAXIMUM LIKELIHOOD ESTIMATION


Bartlett (1952), incidentally to his study of large-sample confidence intervals,
gave a simple expression for the bias to order $n^{-1}$ of the maximum likelihood estimate
from a single random sample, there being one unknown parameter. Haldane (1953)
and Haldane and Smith (1956) further discussed asymptotic expansions for the
properties of a maximum likelihood estimate dealing with random samples and one
or two unknown parameters; for further discussion and extensions see Shenton and
Wallington (1962) and Shenton and Bowman (1963).
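With a single parameter and observations that are independent, but not necessarily
identically distributed, the log likelihood is

$$L(\beta) = \sum_j \log p_j(Y_j, \beta),$$

where $p_j(Y_j, \beta)$ is the p.d.f. of $Y_j$. For a regular problem the maximum likelihood
equation $L'(\hat\beta) = 0$ is to first order

$$L'(\beta) + (\hat\beta - \beta)\,L''(\beta) = 0. \tag{10}$$

Write

$$U^{(j)} = \frac{\partial \log p_j(Y_j, \beta)}{\partial \beta}, \qquad V^{(j)} = \frac{\partial^2 \log p_j(Y_j, \beta)}{\partial \beta^2}, \tag{11}$$

and replace $-L''(\beta)$ by its expectation

$$I = \sum_j E(-V^{(j)}),$$

where I is the total information in the sample.
We thus have the standard first-order expressions

$$\hat\beta - \beta = U^{(\cdot)}/I, \qquad \mathrm{var}(\hat\beta) = 1/I, \tag{12}$$

where $U^{(\cdot)} = \sum_j U^{(j)}$, the dot indicating a sum over the sample.
To obtain a more refined answer, we replace (10) by the second-order equation

$$L'(\beta) + (\hat\beta - \beta)\,L''(\beta) + \tfrac12(\hat\beta - \beta)^2 L'''(\beta) = 0. \tag{13}$$

Take expectations in (13), thereby obtaining

$$E(\hat\beta - \beta)\,E\{L''(\beta)\} + \mathrm{cov}\{\hat\beta - \beta, L''(\beta)\} + \tfrac12 E(\hat\beta - \beta)^2\,E\{L'''(\beta)\} + \mathrm{cov}\{\tfrac12(\hat\beta - \beta)^2, L'''(\beta)\} = 0. \tag{14}$$

Now, approximately, by (12)

$$\mathrm{cov}\{\hat\beta - \beta, L''(\beta)\} = \mathrm{cov}(U^{(\cdot)}/I, V^{(\cdot)}) = J/I, \tag{15}$$

where

$$J = \sum_j E(U^{(j)} V^{(j)}).$$

Also if

$$W^{(j)} = \frac{\partial^3 \log p_j(Y_j, \beta)}{\partial \beta^3}, \qquad K = \sum_j E(W^{(j)}),$$

then

$$E\{L'''(\beta)\} = K.$$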


Note that I, J, K refer to a total over the sample and are of order n.
Finally, a calculation similar to (15) shows that the final term in (14) is of smaller order,
whence (14) gives (Bartlett, 1952), for the terms of order 1,

$$-I\,E(\hat\beta - \beta) + \frac{J}{I} + \frac{K}{2I} = 0,$$

i.e.

$$b = E(\hat\beta - \beta) = \tfrac12 I^{-2}(K + 2J), \tag{16}$$

which is of order $n^{-1}$.
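A short worked case may help fix the notation of (16). The example below is an illustration added here, not one taken from the paper: it assumes a random sample from an exponential density $\beta e^{-\beta y}$, so that $\beta$ is a rate and $\hat\beta = 1/\bar Y$.

```latex
% Illustration (an assumption of this note): exponential density with rate \beta.
\begin{align*}
\log p_j(Y_j,\beta) &= \log\beta - \beta Y_j,\\
U^{(j)} &= \beta^{-1} - Y_j,\qquad V^{(j)} = -\beta^{-2},\qquad W^{(j)} = 2\beta^{-3},\\
I &= n\beta^{-2},\qquad J = \sum_j E(U^{(j)}V^{(j)}) = 0,\qquad K = 2n\beta^{-3},\\
b &= \tfrac12 I^{-2}(K + 2J)
   = \tfrac12\,\frac{\beta^{4}}{n^{2}}\cdot\frac{2n}{\beta^{3}}
   = \frac{\beta}{n}.
\end{align*}
```

In this case the exact answer is available for comparison: $E(1/\bar Y) = n\beta/(n-1)$ for $n > 1$, so the exact bias is $\beta/(n-1)$, which agrees with $\beta/n$ to order $n^{-1}$.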
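When there are parameters $\beta_1, \ldots, \beta_p$, we define

$$U_r^{(j)} = \frac{\partial \log p_j(Y_j, \beta)}{\partial \beta_r}, \qquad V_{rs}^{(j)} = \frac{\partial^2 \log p_j(Y_j, \beta)}{\partial \beta_r\,\partial \beta_s}, \qquad W_{rst}^{(j)} = \frac{\partial^3 \log p_j(Y_j, \beta)}{\partial \beta_r\,\partial \beta_s\,\partial \beta_t},$$

$$I_{rs} = \sum_j E(-V_{rs}^{(j)}), \qquad J_{r,st} = \sum_j E\{U_r^{(j)} V_{st}^{(j)}\}, \qquad K_{rst} = \sum_j E(W_{rst}^{(j)}). \tag{17}$$

Expansion of the equation

$$[\partial L/\partial \beta_r]_{\beta = \hat\beta} = 0$$

replaces the first-order equation (12) by

$$\hat\beta_r - \beta_r = I^{rs} U_s^{(\cdot)}, \qquad \mathrm{cov}(\hat\beta_r, \hat\beta_s) = I^{rs}. \tag{18}$$

The superscripts denote matrix inversion and the summation convention is applied
to multiple suffices referring to parameter components.
The second-order equation (13) becomes

$$\frac{\partial L}{\partial \beta_r} + (\hat\beta_s - \beta_s)\frac{\partial^2 L}{\partial \beta_r\,\partial \beta_s} + \tfrac12(\hat\beta_t - \beta_t)(\hat\beta_u - \beta_u)\frac{\partial^3 L}{\partial \beta_r\,\partial \beta_t\,\partial \beta_u} = 0.$$

On taking expectations, we have that

$$E(\hat\beta_s - \beta_s)\,I_{rs} = \tfrac12 I^{tu}(K_{rtu} + 2J_{t,ru}), \tag{19}$$

a set of simultaneous linear equations for the biases, with solution

$$b_s = E(\hat\beta_s - \beta_s) = \tfrac12 I^{rs} I^{tu}(K_{rtu} + 2J_{t,ru}). \tag{20}$$

In the right-hand side of (20), which is of order $n^{-1}$, consistent estimates of parameters
can be inserted.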

4. FURTHER PROPERTIES OF RESIDUALS


It would be useful to know the joint distribution of the $R_i$'s defined by (5). The
distribution of any suggested test statistic could then be found, and the properties of
graphical procedures evaluated. It is, however, not feasible to determine this joint
distribution in general and, as a first step, we consider the expectations and covariances
of the $R_i$'s.


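For this, expand (5) in series, obtaining to order $n^{-1}$

$$R_i = \varepsilon_i + (\hat\beta_r - \beta_r) H_r^{(i)} + \tfrac12(\hat\beta_r - \beta_r)(\hat\beta_s - \beta_s) H_{rs}^{(i)}, \tag{21}$$

where

$$H_r^{(i)} = \frac{\partial h_i(Y_i, \beta)}{\partial \beta_r}, \qquad H_{rs}^{(i)} = \frac{\partial^2 h_i(Y_i, \beta)}{\partial \beta_r\,\partial \beta_s}. \tag{22}$$

Thus

$$E(R_i) = E(\varepsilon_i) + E(\hat\beta_r - \beta_r)\,E(H_r^{(i)}) + \mathrm{cov}(\hat\beta_r - \beta_r, H_r^{(i)}) + \tfrac12 E\{(\hat\beta_r - \beta_r)(\hat\beta_s - \beta_s)\}\,E(H_{rs}^{(i)}), \tag{23}$$

the neglected terms being $o(n^{-1})$.
In (23), the second term is given by (20). The fourth term is given to sufficient
accuracy by the usual large-sample result (18) and to evaluate the third term, we have,
again by (18),

$$\mathrm{cov}(\hat\beta_r - \beta_r, H_r^{(i)}) = E(I^{rs} U_s^{(\cdot)} H_r^{(i)}) = I^{rs} E(U_s^{(i)} H_r^{(i)}); \tag{24}$$

on the right-hand side the summation convention does not apply to the superscript i.
Thus, to order $n^{-1}$,

$$E(R_i) = E(\varepsilon_i) + b_r E(H_r^{(i)}) + I^{rs} E(H_r^{(i)} U_s^{(i)} + \tfrac12 H_{rs}^{(i)}) = E(\varepsilon_i) + a_i, \tag{25}$$

say. In the same way, squaring (21) and taking expectations, we have to the same
order that

$$E(R_i^2) = E(\varepsilon_i^2) + 2 b_r E(\varepsilon_i H_r^{(i)}) + 2 I^{rs} E(\varepsilon_i H_r^{(i)} U_s^{(i)} + \tfrac12 H_r^{(i)} H_s^{(i)} + \tfrac12 \varepsilon_i H_{rs}^{(i)}), \tag{26}$$

and that for $i \neq j$

$$E(R_i R_j) = \{E(\varepsilon)\}^2 + (a_i + a_j) E(\varepsilon) + I^{rs} E(\varepsilon_i H_r^{(j)} U_s^{(i)} + \varepsilon_j H_r^{(i)} U_s^{(j)} + H_r^{(i)} H_s^{(j)}). \tag{27}$$

We can summarize (25), (26), and (27) as follows:

$$E(R_i) = E(\varepsilon_i) + a_i, \qquad \mathrm{var}(R_i) = \mathrm{var}(\varepsilon_i) + c_{ii}, \qquad \mathrm{cov}(R_i, R_j) = c_{ij}, \tag{28}$$

where $a_i$, $c_{ii}$, $c_{ij}$ can be found in terms of the right-hand sides of (25), (26) and (27)
and are of order $n^{-1}$. Note that the summation convention applies to the parameter
suffices only and not to $c_{ii}$.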
A simple example of these formulae is given in Section 6.

5. APPLICATION OF RESULTS OF SECTION 4


There are broadly three ways in which the above results can be used.
Firstly, if all the correction terms in (28) are numerically small this gives some
assurance that treating the $R_i$'s as having the same statistical properties as the $\varepsilon_i$'s
is reasonable. If, say, the correction terms are small for all residuals except one, we
might look at that residual separately and then omit it from the rest of the analysis.


Secondly, for particular types of test statistic, the results can be used to approximate
its distribution. Thus if we consider

$$T = \sum R_i z_i,$$

where the $z_i$'s are constants, then

$$E(T) = E(\varepsilon) \sum z_i + \sum a_i z_i,$$

$$\mathrm{var}(T) = \mathrm{var}(\varepsilon) \sum z_i^2 + \sum\sum c_{ij} z_i z_j.$$

A statistic used for testing possible dependence on $z_i$ of var($\varepsilon_i$) is

$$T' = \sum R_i^2 (z_i - \bar z), \tag{29}$$

where $\bar z = \sum z_i/n$. Here the results (28) give

$$E(T') = \sum c_{ii}(z_i - \bar z) + 2E(\varepsilon) \sum a_i (z_i - \bar z).$$

In principle, it is possible to [...]
and hence to reach an approximation to var(T').
Thirdly, we may use (28) to define a modified residual $R_i'$ having more nearly the
properties of $\varepsilon_i$. How best to do this depends somewhat on the particular case, but
one fairly general procedure is to write

$$R_i' = (1 + k_i) R_i + l_i, \tag{30}$$

where $k_i$, $l_i$ are small constants. If we require

$$E(R_i') = E(\varepsilon_i), \qquad \mathrm{var}(R_i') = \mathrm{var}(\varepsilon_i), \tag{31}$$

two equations determining $k_i$, $l_i$ follow from (28).
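The two equations can be written out explicitly. The first-order solution below is a sketch added for illustration, obtained by substituting (28) into (31) for the linear modification $R_i' = (1 + k_i)R_i + l_i$ and dropping products of small quantities; it is not quoted from the paper.

```latex
% First-order solution of (31) for the linear modification (30);
% products of the small quantities a_i, c_ii, k_i, l_i are dropped.
\begin{align*}
E(R_i') &= (1+k_i)\{E(\varepsilon_i) + a_i\} + l_i = E(\varepsilon_i),\\
\operatorname{var}(R_i') &= (1+k_i)^2\{\operatorname{var}(\varepsilon_i) + c_{ii}\}
   = \operatorname{var}(\varepsilon_i),\\
k_i &\simeq -\frac{c_{ii}}{2\operatorname{var}(\varepsilon_i)},\qquad
l_i \simeq -a_i - k_i\,E(\varepsilon_i)
   = -a_i + \frac{c_{ii}\,E(\varepsilon_i)}{2\operatorname{var}(\varepsilon_i)}.
\end{align*}
```

When $E(\varepsilon_i) = 0$ the shift reduces to $l_i \simeq -a_i$, so the mean and variance corrections separate.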
A serious limitation to this discussion is that it applies only indirectly to the
examination of distributional form, by plotting ordered residuals against the expected
order statistics for the distributional form proposed for the $\varepsilon_i$'s. For this we would
like, in particular, to calculate $E(R_{(i)})$, where $R_{(i)}$ is the ith largest residual;
alternatively, we would like to introduce modified residuals $R_i'$ such that

$$E(R'_{(i)}) = E(\varepsilon_{(i)});$$

of course, it is easy to formulate even more ambitious aims. It is plausible, but not
certain, that the modification (30), designed to produce residuals with approximately
the same marginal mean and variance, is an advantage also from the point of view
of plots to examine distributional form.

6. AN EXAMPLE
We consider further the data of Feigl and Zelen (1965). The model (7) is

$$Y_j = \beta_1 \exp\{\beta_2(x_j - \bar x)\}\,\varepsilon_j, \quad j = 1, 2, \ldots, n.$$

We first find the bias in $\hat\beta_1$, $\hat\beta_2$. Since $\varepsilon_j$ is exponentially distributed with unit mean,
we have, writing $(x_j - \bar x) = d_j$,

$$p_j(Y_j, \beta) = [\exp\{-Y_j \exp(-\beta_2 d_j)/\beta_1\}]/\{\beta_1 \exp(\beta_2 d_j)\},$$

$$\log p_j(Y_j, \beta) = -\{Y_j \exp(-\beta_2 d_j)/\beta_1\} - \log \beta_1 - \beta_2 d_j, \tag{32}$$


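which, on differentiating and taking expectations, leads to

$$I_{rs} = \begin{cases} n/\beta_1^2, & r = s = 1,\\ \sum d_j^2, & r = s = 2,\\ 0, & r \neq s. \end{cases}$$

Equations (20) therefore give

$$b_1 = \tfrac12 I^{11}\{I^{11}(K_{111} + 2J_{1,11}) + I^{22}(K_{122} + 2J_{2,12})\},$$

with a similar equation for $b_2$. From (17) and (32), we have, without the summation
convention,

$$J_{t,1t} = \sum_j E(U_t^{(j)} V_{1t}^{(j)}) = \begin{cases} -2n/\beta_1^3, & t = 1,\\ -\sum d_j^2/\beta_1, & t = 2, \end{cases}$$

and

$$K_{1tt} = \sum_j E(W_{1tt}^{(j)}) = \begin{cases} 4n/\beta_1^3, & t = 1,\\ \sum d_j^2/\beta_1, & t = 2. \end{cases}$$

Thus

$$b_1 = -\tfrac12 \beta_1/n.$$

A similar calculation gives

$$b_2 = -\tfrac12 \sum d_j^3 \Big/ \Big(\sum d_j^2\Big)^2.$$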
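The bias calculation can be checked numerically. The sketch below evaluates the general formula (20), $b_s = \tfrac12 I^{rs} I^{tu}(K_{rtu} + 2J_{t,ru})$, for model (7) and compares it with the closed forms $b_1 = -\tfrac12\beta_1/n$ and $b_2 = -\tfrac12\sum d_j^3/(\sum d_j^2)^2$. The paper quotes only the components $J_{t,1t}$ and $K_{1tt}$; the remaining entries filled in by `model_7_arrays` were derived for this sketch from (17) and (32). The function names and the value of `beta1` are illustrative assumptions.

```python
def mle_bias(I_inv, J, K):
    """Equation (20): b_s = 0.5 * I^{rs} I^{tu} (K_{rtu} + 2 J_{t,ru}).

    I_inv[r][s] holds I^{rs}, J[t][r][u] holds J_{t,ru}, K[r][t][u] holds K_{rtu}.
    """
    p = len(I_inv)
    bias = []
    for s in range(p):
        total = 0.0
        for r in range(p):
            for t in range(p):
                for u in range(p):
                    total += (I_inv[r][s] * I_inv[t][u]
                              * (K[r][t][u] + 2.0 * J[t][r][u]))
        bias.append(0.5 * total)
    return bias


def model_7_arrays(beta1, d):
    """Arrays for model (7); index 0 is beta_1 and index 1 is beta_2.

    Assumes the d_j are centred (they sum to zero), so that I_{12} = 0.
    """
    n = len(d)
    s1 = sum(d)
    s2 = sum(v ** 2 for v in d)
    s3 = sum(v ** 3 for v in d)
    I_inv = [[beta1 ** 2 / n, 0.0], [0.0, 1.0 / s2]]
    # K_{rtu} depends only on how many of r, t, u refer to beta_2.
    k_by_count = [4.0 * n / beta1 ** 3, 2.0 * s1 / beta1 ** 2, s2 / beta1, s3]
    # J_{t,ru} depends on t and on how many of r, u refer to beta_2.
    j_by_count = [[-2.0 * n / beta1 ** 3, -s1 / beta1 ** 2, -s2 / beta1],
                  [-2.0 * s1 / beta1 ** 2, -s2 / beta1, -s3]]
    K = [[[k_by_count[r + t + u] for u in range(2)] for t in range(2)]
         for r in range(2)]
    J = [[[j_by_count[t][r + u] for u in range(2)] for r in range(2)]
         for t in range(2)]
    return I_inv, J, K


# Log white blood cell counts from Table 1; beta1 is an illustrative value.
x = [3.36, 2.88, 3.63, 3.41, 3.78, 4.02, 4.00, 4.23, 3.73,
     3.85, 3.97, 4.51, 4.54, 5.00, 5.00, 4.72, 5.00]
n = len(x)
x_bar = sum(x) / n
d = [xi - x_bar for xi in x]
beta1 = 50.0

I_inv, J, K = model_7_arrays(beta1, d)
bias = mle_bias(I_inv, J, K)

s2 = sum(v ** 2 for v in d)
s3 = sum(v ** 3 for v in d)
closed_form = [-0.5 * beta1 / n, -0.5 * s3 / s2 ** 2]
print(bias, closed_form)
```

Keeping the full three-index arrays, rather than only the components that survive, makes it easy to reuse `mle_bias` for another model by swapping in a different array builder.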


To find $E(R_i)$ and $E(R_i^2)$, given by (25), (26), we write

$$h_i(Y_i, \beta) = Y_i \exp(-\beta_2 d_i)/\beta_1,$$

from which we evaluate $H_r^{(i)}$, etc., and obtain

$$E(R_i) = 1 + \tfrac12 n^{-1} + \tfrac12\Big(d_i \sum d_j^3 - d_i^2 \sum d_j^2\Big) \Big/ \Big(\sum d_j^2\Big)^2 = 1 + a_i, \tag{33}$$

$$E(R_i^2) = 2 + 2\Big(d_i \sum d_j^3 - 2 d_i^2 \sum d_j^2\Big) \Big/ \Big(\sum d_j^2\Big)^2 = 2 + c_i, \tag{34}$$

where the connection with (28) is that, to order $n^{-1}$, $c_{ii} = c_i - 2a_i$.
Although Feigl and Zelen compared two groups of observations it is sufficient
here to consider only one of the groups. The data are given in Table 1, together
with the corresponding values $a_i$, $c_i$ calculated from (33), (34); values of the crude
residuals $R_i$, defined by (5), are also given.
In order to calculate modified residuals $R_i'$ we note, since $\varepsilon_i$ has an exponential
distribution, that we require a transformation which adjusts the mean and variance
and yet restricts $R_i'$ to be positive. Hence we take

$$R_i' = \{R_i/(1 - l_i)\}^{1 + k_i}, \tag{35}$$


where both $l_i$ and $k_i$ are small. Assuming this transforms $R_i$ to a variable having an exponential
distribution with unit mean, it can be shown that

$$E(R_i) = (1 - l_i)\,\Gamma\{1 + 1/(1 + k_i)\}$$

and

$$E(R_i^2) = (1 - l_i)^2\,\Gamma\{1 + 2/(1 + k_i)\}.$$

TABLE 1

Leukemia data (Feigl and Zelen, 1965). Log white blood cell count, $x_i$.
Survival time, weeks, $Y_i$. Crude and modified residuals, $R_i$ and $R_i'$

x_i     Y_i    a_i       c_i       R_i     R_i'

3.36     65    -0.013    -0.340    0.56    0.49
2.88    156    -0.088    -0.946    0.79    0.70
3.63    100     0.013    -0.134    1.17    1.12
3.41    134    -0.007    -0.293    1.23    1.20
3.78     16     0.022    -0.063    0.22    0.19
4.02    108     0.029    -0.003    1.94    1.91
4.00    121     0.029    -0.005    2.13    2.11
4.23      4     0.028    -0.012    0.09    0.07
3.73     39     0.019    -0.082    0.51    0.46
3.85    143     0.025    -0.039    2.12    2.11
3.97     56     0.028    -0.009    0.96    0.90
4.51     26     0.015    -0.109    0.80    0.74
4.54     22     0.013    -0.131    0.71    0.65
5.00      1    -0.037    -0.528    0.05    0.03
5.00      1    -0.037    -0.528    0.05    0.03
4.72      5    -0.002    -0.249    0.19    0.15
5.00     65    -0.037    -0.528    3.47    4.17

If we equate these expressions to (33) and (34), take logarithms and expand, ignoring
high-order terms of $k_i$ and $l_i$, we find

$$a_i = -l_i - (1 - \gamma) k_i, \qquad c_i = -4 l_i - 2(3 - 2\gamma) k_i,$$

where $\gamma = -\Gamma'(1)/\Gamma(1)$. Thus

$$k_i = \tfrac12(4a_i - c_i), \qquad l_i = 0.21 c_i - 1.85 a_i. \tag{36}$$

Values of $R_i'$ are given in Table 1; the most noticeable difference between crude
and modified residuals occurs in the final entry, at x = 5.00. A plot of the residuals against x
shows no evidence that the mean or dispersion of the residuals vary systematically
with x. The modified residuals $R_i'$ are shown plotted in Fig. 1 against the expected
values of exponential order statistics for a sample of size n = 17; the assumption of
an exponential distribution is clearly confirmed.
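The entries of Table 1 can be reproduced approximately from the formulae of this section. The sketch below computes $a_i$ and $c_i$ from (33) and (34), then $k_i$ and $l_i$ from the two linear relations preceding (36), and finally the modified residual, reading (35) as $R_i' = \{R_i/(1 - l_i)\}^{1 + k_i}$, the form consistent with the expressions for $E(R_i)$ and $E(R_i^2)$ above. The crude residuals are copied from Table 1 as printed; the function names are illustrative, and because the tabulated $x_i$ are rounded to two decimals, small discrepancies in the third decimal place are to be expected.

```python
EULER_GAMMA = 0.5772156649015329  # gamma = -Gamma'(1)/Gamma(1)

# Columns x_i and R_i of Table 1.
x = [3.36, 2.88, 3.63, 3.41, 3.78, 4.02, 4.00, 4.23, 3.73,
     3.85, 3.97, 4.51, 4.54, 5.00, 5.00, 4.72, 5.00]
crude = [0.56, 0.79, 1.17, 1.23, 0.22, 1.94, 2.13, 0.09, 0.51,
         2.12, 0.96, 0.80, 0.71, 0.05, 0.05, 0.19, 3.47]


def corrections(x):
    """Return the lists a_i of (33) and c_i of (34)."""
    n = len(x)
    x_bar = sum(x) / n
    d = [xi - x_bar for xi in x]
    s2 = sum(v ** 2 for v in d)
    s3 = sum(v ** 3 for v in d)
    a = [0.5 / n + 0.5 * (di * s3 - di * di * s2) / s2 ** 2 for di in d]
    c = [2.0 * (di * s3 - 2.0 * di * di * s2) / s2 ** 2 for di in d]
    return a, c


def modified_residual(r, a_i, c_i):
    """Apply (35) with k_i, l_i solved from the relations before (36)."""
    k = 0.5 * (4.0 * a_i - c_i)
    # Exact form of l_i; (36) quotes the rounded coefficients 0.21 and 1.85.
    l = -a_i - (1.0 - EULER_GAMMA) * k
    return (r / (1.0 - l)) ** (1.0 + k)


a, c = corrections(x)
modified = [modified_residual(r, ai, ci) for r, ai, ci in zip(crude, a, c)]
for row in zip(x, a, c, crude, modified):
    print("%.2f  %7.3f  %7.3f  %5.2f  %5.2f" % row)
```

Two identities give a quick check on the arithmetic: since the $d_i$ sum to zero, the $a_i$ sum to zero and the $c_i$ sum to -4.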
A supplement to the graphical analysis is the calculation of a test statistic designed
to examine consistency with the exponential distribution; see Cox and Lewis (1966,
pp. 161-163) for a brief review of such tests applied to simple random samples.


One such test, although not normally the best, is based in effect on comparing the
variance with the square of the mean. Now

$$T^* = \sum R_i^2 = 31.07$$

and the result (34) leads, with n = 17, to

$$E(T^*) = 2(n - 2) = 30.$$

FIG. 1. Leukemia data. Modified residual, $R_i'$, versus expected exponential order
statistics. Straight line corresponds to unit exponential distribution.

Now if the residuals were obtained directly from a random sample of size n, i.e.
$R_i = Y_i/\bar Y$, and the test statistic is

$$T^* = \sum Y_i^2/(\bar Y)^2,$$

then it is easy to show that, to the order considered,

$$E(T^*) = 2(n - 1).$$


This suggests that T* should be regarded as derived from a random sample of
size n - 1, i.e. a "degree of freedom" subtracted for the extra parameter fitted. Again
the close fit to the exponential distribution is confirmed.
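For the simple random sample the expectation of $T^* = \sum Y_i^2/(\bar Y)^2$ can in fact be found exactly, which gives a check on the value 2(n - 1). The derivation below is a supplementary sketch, not part of the paper; it uses the standard fact that for independent exponential variables the proportions $D_i = Y_i/\sum_j Y_j$ are uniformly distributed on the simplex, so that each $D_i$ has a Beta(1, n - 1) distribution.

```latex
% Exact expectation of T* for a random sample from an exponential distribution.
\begin{align*}
T^* &= \frac{\sum_i Y_i^2}{\bar Y^{2}} = n^{2}\sum_i D_i^{2},
   \qquad D_i = \frac{Y_i}{\sum_j Y_j} \sim \mathrm{Beta}(1,\,n-1),\\
E(D_i^{2}) &= \frac{2}{n(n+1)},\\
E(T^*) &= n^{2}\cdot n\cdot\frac{2}{n(n+1)} = \frac{2n^{2}}{n+1}
        = 2(n-1) + \frac{2}{n+1}.
\end{align*}
```

For n = 17 this gives about 32.1, against 2(n - 1) = 32.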
The covariance between different residuals can be calculated from (25), (27).
In fact

$$\mathrm{cov}(R_i, R_k) = -n^{-1} - d_i d_k \Big/ \sum d_j^2. \tag{37}$$

This is identical, to the order considered, with the corresponding formula for ordinary
linear regression. Since the residuals here have approximately unit variance, (37) is
also the correlation between residuals. In this example, the numerically greatest
correlation is about 0.2. The presence of a substantial correlation between a particular
pair of residuals would have been a warning of possible difficulties of interpretation,
especially if both residuals had appeared to correspond to outliers.
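The size of these correlations is easy to check. The sketch below evaluates (37), taken as $\mathrm{cov}(R_i, R_k) = -n^{-1} - d_i d_k/\sum d_j^2$ for $i \neq k$, for every pair of observations, using the $x_i$ of Table 1, and reports the numerically greatest value. The function name is illustrative.

```python
# Log white blood cell counts from Table 1.
x = [3.36, 2.88, 3.63, 3.41, 3.78, 4.02, 4.00, 4.23, 3.73,
     3.85, 3.97, 4.51, 4.54, 5.00, 5.00, 4.72, 5.00]


def residual_covariances(x):
    """Matrix of approximate covariances (37); only off-diagonal entries apply."""
    n = len(x)
    x_bar = sum(x) / n
    d = [xi - x_bar for xi in x]
    s2 = sum(v ** 2 for v in d)
    return [[-1.0 / n - d[i] * d[k] / s2 for k in range(n)] for i in range(n)]


cov = residual_covariances(x)
n = len(x)
largest = max(abs(cov[i][k]) for i in range(n) for k in range(n) if i != k)
print(round(largest, 3))
```

The largest values arise for pairs of observations whose $x_i$ lie far from the mean on the same side, which is where a pair of apparent outliers would be hardest to interpret.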
In one way the fact that the $R_i$ and the $R_i'$ differ relatively little is an anticlimax.
A more encouraging way of looking at the conclusions is, however, that they suggest
that, even with the very small number of observations considered here, the distortion
introduced by the fitting is small and that the unmodified residuals may in practice
often be adequate.

7. POISSON DATA
In order to apply the methods of the preceding sections to Poisson-distributed
observations, we must first consider how to define residuals so as to obtain nearly
identically distributed variables. The difference from the earlier discussion is that,
there, the model was defined directly in terms of independently distributed random
variables. Here we proceed indirectly, defining $R_i$ as

(a) $(Y_i - \hat\mu_i)/\sqrt{\hat\mu_i}$,
(b) $2(\sqrt{Y_i} - \sqrt{\hat\mu_i})$,
or
(c) $\{\phi(Y_i) - \phi(\hat\mu_i)\}/\{\phi'(\hat\mu_i)\sqrt{\hat\mu_i}\}$,

where $\hat\mu_i = \mu_i(\hat\beta)$ is the expected frequency, on some model dependent upon parameters
$\beta_1, \ldots, \beta_p$, of the Poisson observation, $Y_i$. Each of these, asymptotically,
defines a standard normal deviate; (c) is a generalization of (a) and (b), and in it
$\phi(x)$ is an arbitrary function.
The choice of an appropriate transformation depends upon the requirements;
the need for a direct interpretation might lead to (a) or, alternatively, for a
multiplicative model, to (c) with $\phi(x) = \log x$. But since our immediate object is to detect
departures, rather than to explain them, it is desirable to find a transformation to
give a set of residuals with a distribution as near as possible to some known form.
Anscombe (1953) suggests that an appropriate transformation for normalizing
Poisson observations is

$$\{Y_i^{2/3} - (\mu_i - \tfrac16)^{2/3}\}/(\tfrac23 \mu_i^{1/6}). \tag{38}$$

This can be derived by considering the Taylor expansion for moments of a power
$Y^h$ and equating the coefficient of skewness to zero; see also Moore (1957).
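The three definitions and the Anscombe form (38) are simple to compute. The sketch below takes the fitted mean as given and implements (a), (b), the general form (c), and (38) read as $\{Y^{2/3} - (\mu - \tfrac16)^{2/3}\}/(\tfrac23\mu^{1/6})$. The function names are illustrative and the numerical inputs are made up.

```python
import math


def pearson_residual(y, mu):
    """Definition (a): (Y - mu) / sqrt(mu)."""
    return (y - mu) / math.sqrt(mu)


def root_residual(y, mu):
    """Definition (b): 2 * (sqrt(Y) - sqrt(mu))."""
    return 2.0 * (math.sqrt(y) - math.sqrt(mu))


def general_residual(y, mu, phi, dphi):
    """Definition (c) for a transformation phi with derivative dphi."""
    return (phi(y) - phi(mu)) / (dphi(mu) * math.sqrt(mu))


def anscombe_residual(y, mu):
    """Equation (38); requires mu > 1/6 so that the shifted mean is positive."""
    return ((y ** (2.0 / 3.0) - (mu - 1.0 / 6.0) ** (2.0 / 3.0))
            / ((2.0 / 3.0) * mu ** (1.0 / 6.0)))


# Hypothetical observation and fitted mean.
y_obs, mu_hat = 9, 4.0
print(pearson_residual(y_obs, mu_hat))
print(root_residual(y_obs, mu_hat))
print(anscombe_residual(y_obs, mu_hat))

# (b) is the special case of (c) with phi(x) = sqrt(x).
as_special_case = general_residual(
    y_obs, mu_hat, math.sqrt, lambda m: 0.5 / math.sqrt(m))
```

Writing (c) as a function of `phi` and its derivative makes it straightforward to try other transformations, for example the logarithm suggested for a multiplicative model (for observations greater than zero).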
We therefore define $h_i(Y_i, \beta)$ by (38); we have also

$$p_j(Y_j, \beta) = \exp(-\mu_j)\,\mu_j^{Y_j}/Y_j!,$$

and hence we can apply the results of Section 4 to obtain $E(R_i)$ and $E(R_i^2)$. To do so,
however, involves certain approximations and it is necessary to distinguish between
terms that become small when n, the number of distinct Poisson observations, is
large and those that become small when $\mu$, the expectation of a typical observation,
is large. The general results of Section 5 refer to large n.
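The biases are given by

$$b_s = -\tfrac12 I^{rs} I^{tu} \sum_j \frac{1}{\mu_j}\,\frac{\partial \mu_j}{\partial \beta_r}\,\frac{\partial^2 \mu_j}{\partial \beta_t\,\partial \beta_u}, \tag{39}$$

with

$$I_{rs} = \sum_j \frac{1}{\mu_j}\,\frac{\partial \mu_j}{\partial \beta_r}\,\frac{\partial \mu_j}{\partial \beta_s},$$

and hence $b_s$ is of order $n^{-1}$.

In obtaining $E(R_i)$ and $E(R_i^2)$, there is appreciable simplification if we consider
leading terms in expansions in powers of $1/\mu_i$; this leads to terms of order $\mu_i^{-1/2}$ for
$E(R_i)$ and of order 1 and $\mu_i^{-1}$ for $E(R_i^2)$. Thus if $\mu_i$ is sufficiently large it will be adequate
to take as an approximation only the terms of order 1 and this gives

$$E(R_i) = 0, \qquad E(R_i^2) = 1 - I^{rs}\,\frac{\partial \mu_i}{\partial \beta_r}\,\frac{\partial \mu_i}{\partial \beta_s}\,\frac{1}{\mu_i}. \tag{40}$$

The expression for $E(R_i)$ to order $\mu_i^{-1/2}$ is, in fact,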

E(J?%) =i_?rk +Irs/ 1 ____i _ ku (41)


ER 4 'gr k6, pr afls 214 apr (41)
The transformations (a) and (b) also lead to the approximation (40), to the order
considered.
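A one-parameter case shows what the approximation (40) amounts to. The example below is an illustration added here, not one from the paper: it assumes n Poisson observations with a common mean $\mu_i = \beta$, and takes (39) and (40) in the forms $b_s = -\tfrac12 I^{rs}I^{tu}\sum_j \mu_j^{-1}(\partial\mu_j/\partial\beta_r)(\partial^2\mu_j/\partial\beta_t\partial\beta_u)$ and $E(R_i^2) = 1 - I^{rs}(\partial\mu_i/\partial\beta_r)(\partial\mu_i/\partial\beta_s)/\mu_i$.

```latex
% Illustration: n Poisson observations with common mean \mu_i = \beta.
\begin{align*}
\frac{\partial\mu_i}{\partial\beta} &= 1,\qquad
\frac{\partial^{2}\mu_i}{\partial\beta^{2}} = 0,\\
I &= \sum_j \frac{1}{\mu_j} = \frac{n}{\beta},\qquad I^{-1} = \frac{\beta}{n},\\
b &= 0 \quad\text{(the second derivative vanishes, and indeed } \hat\beta = \bar Y
      \text{ is unbiased)},\\
E(R_i^{2}) &= 1 - \frac{\beta}{n}\cdot 1\cdot 1\cdot\frac{1}{\beta} = 1 - \frac{1}{n}.
\end{align*}
```

The reduction of the variance below one by 1/n is the same as for residuals about a sample mean in the normal-theory case.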

8. BINOMIAL DATA

Following the arguments of Section 7, we define a transformation $\phi(Y_i/m_i)$ of the
observation $Y_i$ from a binomial distribution with parameters $\theta_i(\beta)$ and $m_i$. By
considering the Taylor expansion and equating the skewness to zero, we obtain a
differential equation of which a solution is

$$\phi(u) = \int_0^u t^{-1/3}(1 - t)^{-1/3}\,dt, \quad 0 \le u \le 1. \tag{42}$$

Blom (1954) suggests equation (42) as a normalizing transformation but does not
apply it. In order to simplify its application we have computed Table 2. This gives
values of $\phi(u)/\phi(1)$, i.e. the incomplete beta function $I_u(\tfrac23, \tfrac23)$, which is symmetrical
about u = 0.5; multiplication by $B(\tfrac23, \tfrac23) = 2.0533$ gives the value of (42). For example,
$\phi(0.2) = 2.0533 \times 0.257 = 0.528$, $\phi(0.8) = 2.0533\,(1 - 0.257) = 1.526$.
Introducing the mean and variance of the transformed binomial variate, we define

$$h_i(Y_i, \beta) = [\phi(Y_i/m_i) - \phi\{\theta_i - \tfrac16(1 - 2\theta_i)/m_i\}]/\{\theta_i^{1/6}(1 - \theta_i)^{1/6}/\sqrt{m_i}\}; \tag{43}$$

this reduces to (38) for small $\theta_i$. Plots on probability paper suggest that the
transformation is very effective, even for values as small as $m_i = 5$, $\theta_i = 0.04$. Often the
bias correction $-\tfrac16(1 - 2\theta_i)/m_i$ can be omitted.
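Where a table is inconvenient, the transformation (42) can be evaluated numerically. The sketch below uses only the standard library: it reads (42) as $\phi(u) = \int_0^u t^{-1/3}(1-t)^{-1/3}\,dt$, removes the singularity at the origin by the substitution $t = s^{3/2}$, applies Simpson's rule, and uses the symmetry about u = 0.5 for the upper half of the range. The residual function reads (43) with bias correction $-(1 - 2\theta_i)/(6m_i)$. The function names, the number of intervals and the choice of integration rule are assumptions of this sketch.

```python
import math

# Complete beta function B(2/3, 2/3) = Gamma(2/3)^2 / Gamma(4/3).
BETA_23_23 = math.gamma(2.0 / 3.0) ** 2 / math.gamma(4.0 / 3.0)


def phi(u, intervals=400):
    """Equation (42), evaluated by Simpson's rule (intervals must be even)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    if u > 0.5:
        # Symmetry: phi(u) + phi(1 - u) = B(2/3, 2/3).
        return BETA_23_23 - phi(1.0 - u, intervals)
    upper = u ** (2.0 / 3.0)
    if upper == 0.0:
        return 0.0

    def integrand(s):
        # After t = s**1.5 the integrand is 1.5 * (1 - s**1.5) ** (-1/3).
        return 1.5 * (1.0 - s ** 1.5) ** (-1.0 / 3.0)

    h = upper / intervals
    total = integrand(0.0) + integrand(upper)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def binomial_residual(y, m, theta, bias_correct=True):
    """Equation (43); the shifted argument must remain inside [0, 1]."""
    shift = (1.0 - 2.0 * theta) / (6.0 * m) if bias_correct else 0.0
    scale = (theta * (1.0 - theta)) ** (1.0 / 6.0) / math.sqrt(m)
    return (phi(y / m) - phi(theta - shift)) / scale


print(round(BETA_23_23, 4))
print(round(phi(0.2), 3), round(phi(0.8), 3))
```

Dividing `phi(u)` by `BETA_23_23` gives the quantity tabulated in Table 2.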


TABLE 2

Values of the incomplete beta function $I_u(\tfrac23, \tfrac23)$

u      0.000  0.001  0.002  0.003  0.004  0.005  0.006  0.007  0.008  0.009

0.00   0      0.007  0.012  0.015  0.018  0.021  0.024  0.027  0.029  0.032
0.01   0.034  0.036  0.038  0.040  0.043  0.045  0.046  0.048  0.050  0.052
0.02   0.054  0.056  0.058  0.059  0.061  0.063  0.064  0.066  0.068  0.069
0.03   0.071  0.072  0.074  0.075  0.077  0.079  0.080  0.082  0.083  0.084
0.04   0.086  0.087  0.089  0.090  0.092  0.093  0.094  0.096  0.097  0.098
0.05   0.100  0.101  0.102  0.104  0.105  0.106  0.108  0.109  0.110  0.112
0.06   0.113  0.114  0.115  0.117  0.118  0.119  0.120  0.122  0.123  0.124
0.07   0.125  0.126  0.128  0.129  0.130  0.131  0.132  0.134  0.135  0.136
0.08   0.137  0.138  0.139  0.141  0.142  0.143  0.144  0.145  0.146  0.147
0.09   0.149  0.150  0.151  0.152  0.153  0.154  0.155  0.156  0.157  0.158
0.10   0.160  0.161  0.162  0.163  0.164  0.165  0.166  0.167  0.168  0.169
0.11   0.170  0.171  0.172  0.173  0.174  0.176  0.177  0.178  0.179  0.180
0.12   0.181  0.182  0.183  0.184  0.185  0.186  0.187  0.188  0.189  0.190
0.13   0.191  0.192  0.193  0.194  0.195  0.196  0.197  0.198  0.199  0.200
0.14   0.201  0.202  0.203  0.204  0.205  0.206  0.207  0.208  0.209  0.210
0.15   0.211  0.212  0.213  0.214  0.214  0.215  0.216  0.217  0.218  0.219
0.16   0.220  0.221  0.222  0.223  0.224  0.225  0.226  0.227  0.228  0.229
0.17   0.230  0.231  0.232  0.232  0.233  0.234  0.235  0.236  0.237  0.238
0.18   0.239  0.240  0.241  0.242  0.243  0.244  0.244  0.245  0.246  0.247
0.19   0.248  0.249  0.250  0.251  0.252  0.253  0.254  0.254  0.255  0.256
0.20   0.257  0.258  0.259  0.260  0.261  0.262  0.262  0.263  0.264  0.265
0.21   0.266  0.267  0.268  0.269  0.270  0.270  0.271  0.272  0.273  0.274
0.22   0.275  0.276  0.277  0.277  0.278  0.279  0.280  0.281  0.282  0.283
0.23   0.284  0.284  0.285  0.286  0.287  0.288  0.289  0.290  0.290  0.291
0.24   0.292  0.293  0.294  0.295  0.296  0.296  0.297  0.298  0.299  0.300
0.25   0.301  0.302  0.302  0.303  0.304  0.305  0.306  0.307  0.308  0.308
0.26   0.309  0.310  0.311  0.312  0.313  0.313  0.314  0.315  0.316  0.317
0.27   0.318  0.318  0.319  0.320  0.321  0.322  0.323  0.323  0.324  0.325
0.28   0.326  0.327  0.328  0.328  0.329  0.330  0.331  0.332  0.333  0.333
0.29   0.334  0.335  0.336  0.337  0.338  0.338  0.339  0.340  0.341  0.342
0.30   0.342  0.343  0.344  0.345  0.346  0.347  0.347  0.348  0.349  0.350
0.31   0.351  0.351  0.352  0.353  0.354  0.355  0.355  0.356  0.357  0.358
0.32   0.359  0.360  0.360  0.361  0.362  0.363  0.364  0.364  0.365  0.366
0.33   0.367  0.368  0.368  0.369  0.370  0.371  0.372  0.372  0.373  0.374
0.34   0.375  0.376  0.376  0.377  0.378  0.379  0.380  0.380  0.381  0.382
0.35   0.383  0.384  0.384  0.385  0.386  0.387  0.388  0.388  0.389  0.390
0.36   0.391  0.392  0.392  0.393  0.394  0.395  0.396  0.396  0.397  0.398
0.37   0.399  0.400  0.400  0.401  0.402  0.403  0.403  0.404  0.405  0.406
0.38   0.407  0.407  0.408  0.409  0.410  0.411  0.411  0.412  0.413  0.414
0.39   0.414  0.415  0.416  0.417  0.418  0.418  0.419  0.420  0.421  0.422
0.40   0.422  0.423  0.424  0.425  0.425  0.426  0.427  0.428  0.429  0.429
0.41   0.430  0.431  0.432  0.433  0.433  0.434  0.435  0.436  0.436  0.437
0.42   0.438  0.439  0.440  0.440  0.441  0.442  0.443  0.443  0.444  0.445
0.43   0.446  0.447  0.447  0.448  0.449  0.450  0.450  0.451  0.452  0.453
0.44   0.454  0.454  0.455  0.456  0.457  0.457  0.458  0.459  0.460  0.461
0.45   0.461  0.462  0.463  0.464  0.464  0.465  0.466  0.467  0.468  0.468
0.46   0.469  0.470  0.471  0.471  0.472  0.473  0.474  0.474  0.475  0.476
0.47   0.477  0.478  0.478  0.479  0.480  0.481  0.481  0.482  0.483  0.484
0.48   0.485  0.485  0.486  0.487  0.488  0.488  0.489  0.490  0.491  0.491
0.49   0.492  0.493  0.494  0.495  0.495  0.496  0.497  0.498  0.498  0.499


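The p.d.f. of $Y_j$ is

$$p_j(y, \beta) = \binom{m_j}{y}\,\theta_j^{\,y}(1-\theta_j)^{m_j-y},$$

from which we obtain the biases

$$b_s = -\tfrac{1}{2} I^{rs} I^{tu} \sum_j \frac{m_j}{\theta_j(1-\theta_j)}\,\frac{\partial\theta_j}{\partial\beta_r}\,\frac{\partial^2\theta_j}{\partial\beta_t\,\partial\beta_u}. \tag{44}$$

To obtain $E(R_j)$ and $E(R_j^2)$, we consider expansions in powers of $m_j^{-1}$ and get,
analogous to (40), the approximation

$$E(R_j) \simeq 0, \qquad E(R_j^2) \simeq 1 - I^{rs}\,\frac{m_j}{\theta_j(1-\theta_j)}\,\frac{\partial\theta_j}{\partial\beta_r}\,\frac{\partial\theta_j}{\partial\beta_s}. \tag{45}$$

In the numerical example which follows, we checked that the terms
neglected in (45) are indeed negligible.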

9. A FURTHER EXAMPLE
Dyke and Patterson (1952) present the analysis for a $2^4$ factorial design of the
proportions of respondents who achieve good scores on cancer knowledge; some
details of the data are given in columns (1)-(3) of Table 3. They assume a logit
transformation of the proportions, the expected value of the transformed variate
being a linear function of parameters representing main effects and interactions.
Values of the parameters are estimated by maximum likelihood. We consider their
solution and apply the methods of Section 8 to examine residuals from the fitted
model.
Following Dyke and Patterson (with slight changes in notation) we write the model
as

$$\theta_j(\beta) = \{1 + \exp(-2z_j)\}^{-1},$$

where $z_j = l_{jr}\beta_r$, summed over the parameters; $l_{jr} = \pm 1$. Then

$$\frac{\partial\theta_j}{\partial\beta_r} = 2\theta_j(1-\theta_j)\,l_{jr}, \qquad
\frac{\partial^2\theta_j}{\partial\beta_r\,\partial\beta_s} = 4\theta_j(1-\theta_j)(1-2\theta_j)\,l_{jr}l_{js},$$

and substitution into (44) gives

$$b_s = -4 I^{rs} I^{tu} \sum_j m_j\theta_j(1-\theta_j)(1-2\theta_j)\,l_{jr}l_{jt}l_{ju}.$$

From (45), we have

$$E(R_i) \simeq 0$$

and

$$E(R_i^2) \simeq 1 - 4 I^{rs} m_i\theta_i(1-\theta_i)\,l_{ir}l_{is} = 1 + c_{ii}.$$

Dyke and Patterson fit a model with five parameters, representing the overall
mean and the four main effects. They quote the values of $I^{rs}$ obtained in the course
of their solution and we use these values to calculate $b_s$ and $c_{ii}$.
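
As a rough illustration of how the quantities $c_{ii}$ arise, the sketch below refits the five-parameter main-effects model to the counts in Table 3 by Fisher scoring and then evaluates $-c_{ii} = 4 m_i\theta_i(1-\theta_i)\,l_{ir}l_{is}I^{rs}$. It does not use the values of $I^{rs}$ quoted by Dyke and Patterson; the fitting routine, the small linear solver and all function names are assumptions made for this example.

```python
import math

# (treatment, m_i, r_i) as listed in Table 3.
DATA = [
    ("abcd", 31, 23), ("abc", 169, 102), ("abd", 12, 8), ("ab", 94, 35),
    ("acd", 45, 27), ("ac", 378, 201), ("ad", 13, 7), ("a", 231, 75),
    ("bcd", 4, 1), ("bc", 32, 16), ("bd", 7, 4), ("b", 63, 13),
    ("cd", 11, 3), ("c", 150, 67), ("d", 12, 2), ("1", 477, 84),
]

def design_row(treatment):
    """Mean column, then l_jr = +1 if the factor is present and -1 otherwise."""
    return [1.0] + [1.0 if f in treatment else -1.0 for f in "abcd"]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [x - factor * y for x, y in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fitted_theta(rows, beta):
    """theta_j = 1 / (1 + exp(-2 z_j)) with z_j = sum_r l_jr beta_r."""
    return [1.0 / (1.0 + math.exp(-2.0 * sum(l * b for l, b in zip(row, beta))))
            for row in rows]

def information(rows, m, theta):
    """I_rs = 4 sum_j m_j theta_j (1 - theta_j) l_jr l_js."""
    p = len(rows[0])
    return [[4.0 * sum(mj * t * (1 - t) * row[r] * row[s]
                       for row, mj, t in zip(rows, m, theta))
             for s in range(p)] for r in range(p)]

def score(rows, m, r, theta):
    """Derivative of the log likelihood: 2 sum_j (r_j - m_j theta_j) l_jr."""
    return [2.0 * sum((rj - mj * t) * row[k]
                      for row, mj, rj, t in zip(rows, m, r, theta))
            for k in range(len(rows[0]))]

def fit(rows, m, r, iterations=50):
    """Maximum likelihood by Fisher scoring, starting from beta = 0."""
    beta = [0.0] * len(rows[0])
    for _ in range(iterations):
        theta = fitted_theta(rows, beta)
        step = solve(information(rows, m, theta), score(rows, m, r, theta))
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-12:
            break
    return beta, fitted_theta(rows, beta)

def minus_c(rows, m, theta):
    """-c_ii = 4 m_i theta_i (1 - theta_i) l_i' I^{-1} l_i for each observation."""
    info = information(rows, m, theta)
    return [4.0 * mj * t * (1 - t) * sum(a * b for a, b in zip(row, solve(info, row)))
            for row, mj, t in zip(rows, m, theta)]

ROWS = [design_row(t) for t, _, _ in DATA]
M = [float(mi) for _, mi, _ in DATA]
R = [float(ri) for _, _, ri in DATA]
beta_hat, theta_hat = fit(ROWS, M, R)
minus_c_values = minus_c(ROWS, M, theta_hat)
for (treatment, _, _), value in zip(DATA, minus_c_values):
    print(f"{treatment:5s} {value:6.3f}")
```

Because $I_{rs} = 4\sum_j m_j\theta_j(1-\theta_j)\,l_{jr}l_{js}$, the values of $-c_{ii}$ must sum to the number of fitted parameters, here five; the $-c_{ii}$ column of Table 3 sums to about 5.02, which gives a quick check on the tabulated figures.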

In order to calculate modified residuals, we use (30) writing $l_i = 0$ since $E(R_i) \simeq 0$;
solving for $k_i$, remembering $k_i$ is small, we obtain

$$R'_i = (1 + c_{ii})^{-1/2} R_i.$$

Values of $c_{ii}$, $R_i$ and $R'_i$ are given in Table 3. The biases $b_s$ were all extremely small;
none exceeded 21 per cent of the standard error of the estimate of the parameter.
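
The relation between the crude and modified residuals can be checked directly against the rounded entries of Table 3. The sketch below is only an arithmetic check under the reading $R'_i = (1 + c_{ii})^{-1/2} R_i$; the function name is hypothetical.

```python
import math

# (treatment, -c_ii, crude residual R_i, tabulated modified residual R'_i), Table 3.
TABLE_3 = [
    ("abcd", 0.248, 0.36, 0.41), ("abc", 0.540, -0.46, -0.68),
    ("abd", 0.135, 1.34, 1.44), ("ab", 0.372, -0.07, -0.09),
    ("acd", 0.379, -0.59, -0.75), ("ac", 0.711, -0.45, -0.84),
    ("ad", 0.133, 1.04, 1.12), ("a", 0.527, 0.65, 0.94),
    ("bcd", 0.050, -1.33, -1.37), ("bc", 0.190, 0.38, 0.43),
    ("bd", 0.076, 1.38, 1.44), ("b", 0.225, -0.68, -0.78),
    ("cd", 0.123, -1.47, -1.57), ("c", 0.505, 1.45, 2.06),
    ("d", 0.100, -0.73, -0.77), ("1", 0.705, -0.78, -1.43),
]

def modified_residual(crude, minus_c):
    """R'_i = (1 + c_ii)^(-1/2) R_i, with the table's -c_ii passed as minus_c."""
    return crude / math.sqrt(1.0 - minus_c)

for treatment, minus_c, crude, tabulated in TABLE_3:
    recomputed = modified_residual(crude, minus_c)
    print(f"{treatment:5s} recomputed {recomputed:6.2f}   tabulated {tabulated:6.2f}")
```

Small differences in the second decimal place are to be expected, since the tabulated $R_i$ and $c_{ii}$ are themselves rounded.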

FIG. 2. $2^4$ factorial design. Modified residual, $R'$, versus expected normal order
statistics. Straight line corresponds to unit normal distribution.
[Plot omitted; horizontal axis: expected normal order statistic.]

The modified residuals $R'_i$ are plotted against the expected normal order statistics
in Fig. 2 and show agreement with the assumption of a standard normal distribution.
Dyke and Patterson go on to fit a model with extra parameters to represent the
interactions AD, BD and CD. Before proceeding to extend the model, we examine
the residuals $R'_i$ from the simple main effects model. If we regard the values of $R'_i$
as observations in a $2^4$ design, we can analyse them in the usual way and obtain the
sums of squares given in Table 4. This is an unweighted analysis and, as such, can
be used only for guidance; the existence of any apparent effect can be established
only by fitting a model containing the appropriate parameters. Also, for the same
reason, the sums of squares due to main effects in Table 4 are not zero. Nevertheless,
the magnitude of the AC effect suggests it to be worth including in the model along
with AD, BD and CD; the three-factor interaction ACD also is large but has not been
included in the further analysis. We therefore fitted a model with nine parameters;

TABLE 3

$2^4$ factorial design (Dyke and Patterson, 1952). $r_i$, number of "good"
responses out of $m_i$. Crude and modified residuals, $R_i$ and $R'_i$

Treatment   $m_i$   $r_i$   $-c_{ii}$   $R_i$    $R'_i$

abcd          31     23     0.248       0.36     0.41
abc          169    102     0.540      -0.46    -0.68
abd           12      8     0.135       1.34     1.44
ab            94     35     0.372      -0.07    -0.09
acd           45     27     0.379      -0.59    -0.75
ac           378    201     0.711      -0.45    -0.84
ad            13      7     0.133       1.04     1.12
a            231     75     0.527       0.65     0.94
bcd            4      1     0.050      -1.33    -1.37
bc            32     16     0.190       0.38     0.43
bd             7      4     0.076       1.38     1.44
b             63     13     0.225      -0.68    -0.78
cd            11      3     0.123      -1.47    -1.57
c            150     67     0.505       1.45     2.06
d             12      2     0.100      -0.73    -0.77
1            477     84     0.705      -0.78    -1.43

TABLE 4

$2^4$ factorial design. Sums of squares

            Sums of squares
Effect      $R'_i$    $R_i$

A           0.78      0.80
B           0.26      0.20
C           1.09      1.14
D           0.01      0.00
AB          0.04      0.04
AC          2.53      0.98
AD          1.85      1.57
BC          0.32      0.20
BD          2.06      1.69
CD          4.88      3.89
ABC         2.51      1.26
ABD         0.07      0.10
ACD         3.77      1.97
BCD         0.03      0.05
ABCD        0.03      0.03

the estimated values and standard errors are given in Table 5. Comparison of the
estimates with their standard errors confirms that the AC interaction is at least as
significant as the AD and BD interactions.

TABLE 5

$2^4$ factorial design. Estimated parameters and
their standard errors

Parameter   Estimate   Standard error

Mean        -0.13      0.06
A            0.25      0.06
B            0.12      0.05
C            0.17      0.05
D            0.11      0.06
AC          -0.05      0.03
AD           0.10      0.06
BD           0.06      0.05
CD          -0.11      0.05

For comparison, the corresponding analysis on the crude residuals, $R_i$, is also given
in Table 4; it is interesting that the AC interaction does not stand out in this case.
Note, however, that 9 parameters are being fitted to 16 observations, so that the
applicability of the asymptotic formulae is in doubt.
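
The unweighted analysis behind Table 4 can be reproduced from the rounded residuals of Table 3. The sketch below treats the sixteen residuals as observations in a $2^4$ design and computes, for each effect, the squared contrast divided by 16; the dictionary and function names are hypothetical, and agreement with Table 4 is only to within rounding of the inputs.

```python
from itertools import combinations
from math import prod

# Modified residuals R'_i and crude residuals R_i, as listed in Table 3.
MODIFIED = {
    "abcd": 0.41, "abc": -0.68, "abd": 1.44, "ab": -0.09,
    "acd": -0.75, "ac": -0.84, "ad": 1.12, "a": 0.94,
    "bcd": -1.37, "bc": 0.43, "bd": 1.44, "b": -0.78,
    "cd": -1.57, "c": 2.06, "d": -0.77, "1": -1.43,
}
CRUDE = {
    "abcd": 0.36, "abc": -0.46, "abd": 1.34, "ab": -0.07,
    "acd": -0.59, "ac": -0.45, "ad": 1.04, "a": 0.65,
    "bcd": -1.33, "bc": 0.38, "bd": 1.38, "b": -0.68,
    "cd": -1.47, "c": 1.45, "d": -0.73, "1": -0.78,
}

def effect_sums_of_squares(residuals):
    """Sum of squares for every main effect and interaction of a 2^4 design."""
    n = len(residuals)
    result = {}
    for size in range(1, 5):
        for effect in combinations("abcd", size):
            # Contrast coefficient: product of +1/-1 over the factors in the effect.
            contrast = sum(
                value * prod(1 if f in treatment else -1 for f in effect)
                for treatment, value in residuals.items()
            )
            result["".join(effect).upper()] = contrast ** 2 / n
    return result

ss_modified = effect_sums_of_squares(MODIFIED)
ss_crude = effect_sums_of_squares(CRUDE)
for name in ss_modified:
    print(f"{name:5s} {ss_modified[name]:5.2f} {ss_crude[name]:5.2f}")
```

Run on these inputs, the AC line shows the contrast discussed in the text: the sum of squares is much larger for the modified residuals than for the crude ones.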

10. FURTHER WORK


Some possible extensions of the work fall under the following three broad
headings.
(i) There are applications, for example to time series and components of variance
problems, in which more than one random variable contributes to each observation.
Durbin and Watson (1950, 1951) have considered a significance test for serial
correlation based on residuals from a fitted regression.
(ii) There could be more refined distributional studies of the quantities discussed
above. In particular, the more detailed study of order statistics of residuals would be
useful both in connection with detecting outliers and for tests of distributional form.
There is obvious scope for simulation.
(iii) Further special applications can be considered. Two general types concern
(a) the examination of distributional form from random samples, for example by
using the probability integral transformation to produce residuals having a theoretical
rectangular distribution, and (b) more complex situations that are essentially generali-
zations of regression.
As an example of (a), consider the Weibull distribution (Section 2), where a
transformation can be made to the unit exponential distribution, or the circular normal
distribution, where the probability integral transformation can be applied. As further
examples of (b), consider two problems closely associated with normal-theory
regression, namely the calculation of residuals after fitting a transformation (Box and
Cox, 1964) or after fitting a non-linear model by iterative least squares.
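
As a small illustration of type (a), the sketch below transforms Weibull observations first to the unit exponential scale and then, by the probability integral transformation, to the rectangular scale. For simplicity it uses the true scale and shape rather than estimates, which is an assumption of the example; the function names and the parameter values are hypothetical.

```python
import math
import random

def to_unit_exponential(y, scale, shape):
    """If Y is Weibull(scale, shape), then (Y / scale)**shape is unit exponential."""
    return (y / scale) ** shape

def to_rectangular(y, scale, shape):
    """Probability integral transformation: F(Y) is rectangular on (0, 1)."""
    return 1.0 - math.exp(-to_unit_exponential(y, scale, shape))

rng = random.Random(0)
SCALE, SHAPE = 2.0, 1.5
sample = [rng.weibullvariate(SCALE, SHAPE) for _ in range(20000)]

exponential_residuals = [to_unit_exponential(y, SCALE, SHAPE) for y in sample]
rectangular_residuals = [to_rectangular(y, SCALE, SHAPE) for y in sample]

mean_exponential = sum(exponential_residuals) / len(sample)
mean_rectangular = sum(rectangular_residuals) / len(sample)
print(f"mean on exponential scale: {mean_exponential:.3f} (theory 1)")
print(f"mean on rectangular scale: {mean_rectangular:.3f} (theory 0.5)")
```

In practice the parameters would be replaced by their estimates, and the ordered residuals compared with the corresponding expected order statistics.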

ACKNOWLEDGEMENT
We are grateful to Mrs E. A. Chambers and Mr B. G. F. Springer for programming
the calculations. Their work was supported by the Science Research Council.

REFERENCES
ANSCOMBE, F. J. (1953). Contribution to the discussion of H. Hotelling's paper. J. R. Statist.
Soc. B, 15, 229-230.
ANSCOMBE, F. J. (1961). Examination of residuals. Proc. 4th Berkeley Symp., 1, 1-36.
BARTLETT, M. S. (1952). Approximate confidence intervals. II. Biometrika, 40, 306-317.
BLOM, G. (1954). Transformations of the binomial, negative binomial, Poisson and χ²
distributions. Biometrika, 41, 302-316.
BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. B,
26, 211-252.
COX, D. R. and LEWIS, P. A. W. (1966). The Statistical Analysis of Series of Events. London:
Methuen.
DURBIN, J. and WATSON, G. S. (1950). Testing for serial correlation in least squares regression.
I. Biometrika, 37, 409-428.
DURBIN, J. and WATSON, G. S. (1951). Testing for serial correlation in least squares regression. II. Biometrika, 38, 159-178.
DYKE, G. V. and PATTERSON, H. D. (1952). Analysis of factorial arrangements when the data are
proportions. Biometrics, 8, 1-12.
FEIGL, P. and ZELEN, M. (1965). Estimation of exponential survival probabilities with concomitant
observation. Biometrics, 21, 826-838.
HALDANE, J. B. S. (1953). The estimation of two parameters from a sample. Sankhyā, 12, 313-320.
HALDANE, J. B. S. and SMITH, S. M. (1956). The sampling distribution of a maximum likelihood
estimate. Biometrika, 43, 96-103.
MOORE, P. G. (1957). Transformations to normality using fractional powers of the variate.
J. Am. Statist. Ass., 52, 237-246.
PEARSON, E. S. and HARTLEY, H. O. (1966). Biometrika Tables for Statisticians, 3rd ed. Cambridge
University Press.
SHENTON, L. R. and BOWMAN, K. (1963). Higher moments of a maximum-likelihood estimate.
J. R. Statist. Soc. B, 25, 305-317.
SHENTON, L. R. and WALLINGTON, P. A. (1962). The bias of moment estimators with an appli-
cation to the negative binomial distribution. Biometrika, 49, 193-204.

DISCUSSION ON THE PAPER BY PROFESSOR COX AND MRS SNELL


Mr M. J. R. HEALY (Medical Research Council): From one point of view, the
statistical developments of the last half-century or so may be regarded as the detailed
exploitation of the linear model. That these developments are not over is evidenced by
the programme of next week's ordinary meeting of the Society† (to which Professor Cox is
making another of his characteristically stimulating contributions); but there is an ever-
growing interest nowadays in the use of non-linear models for the statistical study of a
wide diversity of phenomena. One reason for this is the feeling that the physical situation
generating the observations will often demand the use of non-linear formulae, and these
are bound to be needed as soon as we deal with either thresholds or asymptotes.
Parallel with this change in the climate of statistical thought has been the growing
interest in the study of residuals. As soon as a feeling of dissatisfaction with one's model
sets in, it is natural to look at the discrepancies between its predictions and what is actually
observed, and there is now a fair-sized literature dealing with methods for doing this.
Both the tendencies I refer to have been fostered by advances in computing facilities which
have brought to practical trial hitherto theoretical notions and techniques.
It is when both tendencies intersect that we reach the subject of tonight's paper. If our
model involves, for example, a curve with a horizontal asymptote, it is almost inevitable
that the standard notion of additive homoscedastic errors with a symmetric distribution

† The discussion at that meeting is published in Part 3 of Series A of the journal.


will be quite inadequate. A model which takes the underlying physical situation seriously
enough to influence the expectations might also allow it to affect the error structure, and
doing so is likely to lead to just the sort of problems dealt with here. The solutions to
such problems may well be more important than they would be in the linear case. If a
linear model is admittedly an over-simplification, we can tolerate even some systematic
departures from it, weighing these against the advantages of simplicity and ease of calcu-
lation; whereas the extra cost of the non-linear model may not be recouped by better
insight into the true situation if discrepancies between model and observations are allowed
to remain uninvestigated.
I would like to make a few specific comments on the paper, in particular on the second
example, and to ask the authors whether they have compared their incomplete beta-
function transform with any alternatives. The maximum-likelihood calculations can often
be viewed as weighted least squares following a transformation, and thus can readily be
persuaded to produce a set of quasi-residuals as a by-product. Even simpler, perhaps,
is a procedure I have myself found useful for informal investigation; it consists of tabulating
or plotting $(\text{observed} - \text{expected})/(\text{expected})^{1/2}$, the sum of squares of these quantities being
simply the goodness-of-fit $\chi^2$. The surprising thing about the transformation tabulated in
the paper is perhaps its near linearity between 10-90 per cent.
I confess that I share the authors' reservations over treating $16 - 9$ as a large number.
I would also like to ask whether they know any way of taking advantage of the fact that
their "ordered plots" should in the null situation not only give straight lines but straight
lines with known position and slope. Perhaps one might examine the residuals.
In summary, my reaction to this paper has been that which is so often stimulated by
papers in which Professor Cox has had a hand-a mixture of somewhat envious admiration
with an impatience to go away and apply the suggested methods to data a satisfactory
treatment of which has so far eluded me. I would like to propose a very hearty vote of
thanks to both authors.

Mr P. J. HARRISON (Imperial Chemical Industries Ltd): In tonight's paper Professor Cox
and Mrs Snell deal with a topic which has always been of importance to practising
statisticians and which with the advent of computers has grown in importance. The main
questions which we ask of residuals have been detailed in Section 1 of the paper and these
are broadly:

(i) Does the fitted model adequately describe the data?


(ii) Are the data adequate for determining the model? Here we may be concerned with
the need for data editing such as blocking observations or eliminating outliers. We may
also be concerned about whether more observations are required for model discrimination.
(iii) Is the assumption about the nature of the error distribution correct and if not is a
re-analysis necessary?

Before statisticians had ready access to computers they had a reasonable degree of
control over residuals. This was primarily due to the fact that observations largely
resulted from statistically designed experiments and there was often prior information
about the nature of the experimental error. When this was not the case then computing
limitations usually restricted investigations to models with few parameters.
However, immediately computers became available for statistical analyses, multiple
linear regression and non-linear estimation programs were written which were capable
of analysing models involving a large number of parameters.
Initially most of these programs were applied somewhat indiscriminately, and generally
to data which had not been statistically planned. Residuals were listed but their analysis
was left to the statistician who probably contented himself with a quick scan for outliers
and distribution form. More recently these programs have attempted to automate the
examination of model adequacy.

In one I.C.I. program there is an option for considering the family of mod
by Box and Cox (1964). The paper defines this as a more formal approach. Procedures
are also available for dealing with outliers, for testing distributional form and for plotting
residuals with selected variables. It is also possible to test the hypothesis that the residuals
are distributed in the multivariable space according to a given law. In this case interest
might lie in detecting and defining regions of the space in which the residuals are signifi-
cantly correlated. Clump or Cluster Analysis is a possibility here.
In general the residuals which are analysed are those the paper refers to as Crude
Residuals and Professor Cox and Mrs Snell draw our attention to the fact that perhaps we
should work in terms of Modified Residuals. However, the paper comes to no definite
recommendation that we should use the Modified Residual except perhaps when the num-
ber of observations is small compared with the number of parameters to be estimated,
and I would agree with the authors that this is encouraging.
The paper suggests that further work might be undertaken in defining residuals related
to time series and component of variance problems. Certainly in such situations the
problems are more difficult. Box and Jenkins discuss the identification of non-stationary
time series models using combinations of differencing, regression, overparameterization
and simulation techniques. In industry the most frequently encountered non-stationary
model can be represented in terms of a first-order random walk generating process:

$$d_t = m_t + \epsilon_t, \qquad m_t = m_{t-1} + \gamma_t, \tag{1}$$

where $d_t$ is the observation at time $t$ and $\epsilon$ and $\gamma$ are each a set of independently and identically
distributed random variables with zero mean.
This model is a particular case of the polynomial random walk in which each derivative
is subjected to random impulses. The linear least-squares predictor for (1) is

$$\hat{d}_{t+1} = \hat{d}_t + A e_t,$$

where $e_t = d_t - \hat{d}_t$, and $A$ depends on the magnitude of $V(\epsilon)/V(\gamma)$.
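
The recursion for the predictor and its one-step residuals can be sketched as follows. The starting forecast and the value of $A$ are taken as given here; how they would be chosen in practice is not specified in the discussion, so both arguments, like the function name, are assumptions of the example.

```python
def one_step_residuals(observations, a, initial_forecast):
    """Residuals e_t = d_t - forecast_t for the update forecast_{t+1} = forecast_t + a * e_t."""
    forecast = initial_forecast
    residuals = []
    for d in observations:
        e = d - forecast           # one-step-ahead prediction error
        residuals.append(e)
        forecast = forecast + a * e  # revise the forecast by a fraction a of the error
    return residuals

series = [1.0, 2.0, 4.0]
print(one_step_residuals(series, 1.0, 1.0))  # a = 1: forecast is the last observation
print(one_step_residuals(series, 0.5, 1.0))  # a = 0.5: forecast moves half-way
print(one_step_residuals(series, 0.0, 1.0))  # a = 0: forecast never changes
```

With $A = 1$ the residuals are simply first differences of the series, and with $A = 0$ they are deviations from the starting forecast; intermediate values weight recent errors more or less heavily.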


In a dynamic situation the residuals are closely scrutinized and often analysed using a
series of Wald Sequential tests in the form of a Backward Cumulative Sum test. If the
test shows a significant departure from the null hypothesis that the predictor is satisfactory
then the residuals are used to assess (i) whether an "outlier" has occurred in the $\epsilon$ set, in
which case that particular outlier is ignored, (ii) whether an outlier has occurred in the
$\gamma$ set, in which case adjustments to the predictor are made, or (iii) whether the model has
changed.
In the study of chemical reactions and processes both variance components and
stochastic variation are to be expected and present difficulties in the analysis of residuals.
Professor Cox and Mrs Snell have taken one step towards the clarification of residuals
and I have much pleasure in seconding the vote of thanks.

The vote of thanks was put to the meeting and carried unanimously.

Professor F. J. ANSCOMBE (Yale University): I heartily endorse Mr Healy's appreciation
of this paper. It is a most timely study of a topic that will surely command widespread
interest-a topic that no doubt many of us have been uneasily aware of and are now
delighted to see unfold so clearly.
Back in the days when I frequented this hall, an honoured tradition permitted a con-
tributor to the discussion sometimes to make remarks only indirectly related to the
subject of the paper. I hope I shall be pardoned for speaking now about computing.
One fateful day last June I was taken by a colleague to see a demonstration of a new
computer language. The language is based on a book by K. E. Iverson called A Programming
Language (Iverson, 1962). The book was concerned with the precise and concise
expression of algorithms, using a notation and way of thinking closely resembling estab-
lished mathematics. The present implementation of a modified version of the language
as a computer coding language is known as APL. (The modest first letter has been
retained, though some of us think that TPL would be more fitting. The language should
not be confused with PL/1, developed for IBM's 360 series of computers.) APL has been
implemented experimentally at IBM's Thomas J. Watson Research Center, Yorktown
Heights, N.Y., for computation in conversational mode through typewriter terminals.
At present it has not been generally released.
Requirements for statistical computing are many and various, because the persons
who have occasion to do such computing have very diverse degrees of interest in statistics
and in computing. No one system or method can be satisfactory for all. A professional
statistician (like me) needs to be able to experiment freely in computing, and ought if
possible to be able to do so without a vast effort in mastering a computer language and
with little time spent in coding or otherwise specifying what he wants done. It seems to
me that, unlike previous general-purpose languages such as FORTRAN or ALGOL,
APL is sufficiently powerful and sufficiently easy to learn to meet this need. I have been
preparing a paper to show how the language can be used for typical statistical calculations,
to show enough of its character that a reader could have some basis for judging whether
to take an active interest. Far less than that can be said now.
Statistical computing, like other computing, requires negotiation of arrays. Various
languages and systems have been proposed permitting matrix operations to be called for
easily. APL also is designed to handle arrays. The unique feature of APL is the generality
of its definitions, leading to a high degree of consistency that not only makes the language
easy to remember but also gives it a peculiar dignity and reasonableness. One example
must suffice.
Any language or system designed to handle matrices must obviously encompass
matrix addition. It must permit a command like:

ADD A B,

where A and B are matrices of the same size. In APL matrix addition is denoted by

A+B.

What is peculiar is that this notation refers not to a special operation, matrix addition,
but to a general method of combining two arrays that are not necessarily matrices by a
function that is not necessarily addition. In fact A and B can be any two arrays of the same
size-vectors, or matrices, or rectangular arrays of any number of dimensions. The
function " + " can be replaced by any other standard function f having the same syntax,
that is by a symbol f such that for any scalar arguments A and B the combination A f B
is scalar. Thus if A and B are two 17-dimensional arrays of the same size, A × B means
the array of the same size formed by multiplying each element of A by the corresponding
element of B. Similarly for A * B (where * means exponentiation) and A | B (where | means
the residue function) and A = B (where = is the logical function taking the value 1 when
the arguments are equal and 0 otherwise), and so on for a dozen more functions. We
might say that for the cost of a capability in matrix addition we have obtained free many
other capabilities, many of which turn out to be just as useful. The whole of APL has
been constructed with this kind of generality.
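
The generality described here, one rule for combining two arrays of the same shape by any scalar function, can be imitated in a few lines. The sketch below is written in Python rather than APL and uses nested lists as arrays; it is an illustration of the idea only, and the function name is hypothetical.

```python
import operator

def elementwise(f, a, b):
    """Combine two equally shaped nested lists element by element using f."""
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("arrays must have the same shape")
        return [elementwise(f, x, y) for x, y in zip(a, b)]
    return f(a, b)

A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
print(elementwise(operator.add, A, B))                    # addition
print(elementwise(operator.mul, A, B))                    # multiplication
print(elementwise(operator.pow, [2, 3], [3, 2]))          # exponentiation
print(elementwise(lambda x, y: y % x, [3, 4], [10, 10]))  # residue of y modulo x
print(elementwise(lambda x, y: int(x == y), [1, 2], [1, 3]))  # equality as 1 or 0
```

The same function serves for vectors, matrices and arrays of higher dimension, because the recursion descends until it reaches scalars.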
Whatever may be the fate of this particular implementation of APL, something like
it must surely eventually command widespread approval.
The relevance of computing to the paper by Professor Cox and Mrs Snell is that until
a computer has been adequately tamed, residuals have only theoretical interest. Such a
study was not done many years ago in the Fisherian era because computers were not
available then. I am eager to try out these methods, which promise to be a valuable
weapon in the continual struggle to fit theory to facts.

Professor J. DURBIN (London School of Economics): I am strongly in favou


examination of the residuals after carrying out regression analysis, linear or no
a means of examining the adequacy of the model specification. But it must be r
that this can be a highly treacherous business where an intuitive analysis can
seriously astray. In addition to the inspection of visual plots of various kinds
often wish to carry out some form of test of significance. If the test statistic b
fitted residuals is a(R,, ..., R,n), which we may take to be suitably normalized
appear reasonable to suppose that its distribution converges for large n to th
corresponding statistic based on the true residuals a(el, ..., E,,) on the ground
usually converges stochastically to zero for all n. If this were true one could
large-sample tests of fit by treating statistics calculated from the fitted residua
had been based on the true residuals. The method is usually invalid, however,
particularly favourable circumstances.
Let $\beta$ denote the parameters of interest and suppose that if $\beta$ were known, $a$ would
estimate a vector $\alpha$ of nuisance parameters. The model specification is then equivalent to
a hypothesis of the form $\alpha = \alpha_0$. Let $b$ be the maximum-likelihood estimate of $\beta$ assuming
$\alpha = \alpha_0$ and let $a$ be the maximum-likelihood estimator of $\alpha$ assuming $\beta = b$. In the present
context $a$ would be a statistic based on the fitted residuals. Assuming the usual type of
regularity conditions one can then show that $\sqrt{n}\,(a - \alpha_0)$ is asymptotically normal with
zero mean and variance matrix $A^{-1} - A^{-1} C B^{-1} C' A^{-1}$ where

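$$\begin{pmatrix} A & C \\ C' & B \end{pmatrix}
= -\operatorname{plim}\,\frac{1}{n}
\begin{pmatrix}
\dfrac{\partial^2 \log L}{\partial\alpha\,\partial\alpha'} & \dfrac{\partial^2 \log L}{\partial\alpha\,\partial\beta'} \\[2ex]
\dfrac{\partial^2 \log L}{\partial\beta\,\partial\alpha'} & \dfrac{\partial^2 \log L}{\partial\beta\,\partial\beta'}
\end{pmatrix}.$$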

On the other hand, the intuitive procedure would take a t


distribution as the maximum-likelihood estimator of a assu
known, that is 4(n) (a - a0) would be assumed to be N(O, A
can be very different the intuitive test can be seriously mis
that any error is always in the same direction, that is, to
significance. The intuitive test is asymptotically valid if E
results are given in a paper "Testing for serial correlation
some of the regressors are lagged dependent variables" whic
To get an idea of how the results apply to problems of th
Cox and Mrs Snell consider a simplified form of their mod

$$y_i = \exp(\beta x_i)\,\epsilon_i, \qquad i = 1, \ldots, n,$$


where el, .., E, are independently exponentially distributed
they have mean unity. Suppose we require a test based on
a shift in mean. The example is of course artificial but I b
point. The intuitive test would be to compute

a =

and to take this as asymptotically N(1, n


fact $\sqrt{n}\,a$ has asymptotic variance

$$1 - n\bar{x}^2 \Big/ \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \Big/ \sum_{i=1}^{n} x_i^2.$$

The intuitive procedur


amount that can be very large.
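
The size of this effect can be seen in a small simulation. The sketch below assumes that the statistic is the mean of the fitted residuals, $a = n^{-1}\sum y_i \exp(-b x_i)$ with $b$ the maximum-likelihood estimate of $\beta$; the defining formula is not legible in this copy, so that reading, together with the choice of $x$ values, the Newton iteration and all function names, is an assumption made for illustration.

```python
import math
import random

def variance_ratio(x):
    """Asymptotic variance of sqrt(n) * a: sum (x_i - xbar)^2 / sum x_i^2."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x) / sum(xi * xi for xi in x)

def fit_beta(x, y, start, tol=1e-12, max_iter=100):
    """Solve the likelihood equation sum x_i (y_i exp(-b x_i) - 1) = 0 by Newton's method."""
    b = start
    for _ in range(max_iter):
        g = sum(xi * (yi * math.exp(-b * xi) - 1.0) for xi, yi in zip(x, y))
        dg = -sum(xi * xi * yi * math.exp(-b * xi) for xi, yi in zip(x, y))
        step = g / dg
        b -= step
        if abs(step) < tol:
            break
    return b

def simulated_variance(x, beta=0.5, reps=1000, seed=1):
    """Sample variance of sqrt(n) * (a - 1) over repeated samples from the model."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(reps):
        y = [math.exp(beta * xi) * rng.expovariate(1.0) for xi in x]
        b = fit_beta(x, y, start=beta)
        a = sum(yi * math.exp(-b * xi) for xi, yi in zip(x, y)) / n
        values.append(math.sqrt(n) * (a - 1.0))
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)

X = [1.0 + i / 49.0 for i in range(50)]  # 50 equally spaced values on [1, 2]
theory = variance_ratio(X)
simulated = simulated_variance(X)
print(f"asymptotic variance from the formula: {theory:.4f}")
print(f"variance in the simulation:           {simulated:.4f}")
print("variance assumed by the intuitive test: 1.0000")
```

For these $x$ values the formula gives a variance far below the value 1 that the intuitive procedure would assume, so a test based on the intuitive variance would rarely reject.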

Dr V. BARNETT (University of Birmingham): I should like to echo the sentiments of
earlier speakers and say how much I enjoyed this paper. There are just two points I should
like to make. The first and major one arises from the form of the model (3) and defined
residuals (5). Implicit in the definition (5) is that any parameters which relate specifically
to the distribution of "errors" or unobserved variables $\epsilon$ are incorporated in the set of
model parameters, $\beta$. Later discussion and examples in the paper make this quite apparent.
For instance, the authors suggest that an unknown variance in the normal-theory linear
model should be handled in this way (example 1, Section 2); and the $\epsilon$-distributions for
the two practical examples discussed are assumed to be in standardized form-this is an
essential feature of the ordered residual plots of Figs. 1 and 2, requiring the linear relation-
ships to be of unit slope and through the origin. The mathematical convenience of this
approach is quite clear and well demonstrated in this paper. Also, in many cases the basic
behaviour of the model is bound up with the parameters related to the error distribution
(examples 2 and 3 in Section 2 are of this type), and we have no alternative but to simul-
taneously estimate all the parameters in fitting the model. But in certain cases this is not so;
for a regression model, say, with independent identically distributed (not necessarily
normal) errors it is reasonable (and indeed customary) to fit the parameters in the basic
model separately, and without regard to the scale parameter in the error distribution.
This is not to deny the essential need of studying residuals in order to validate the model;
but there would seem to be no reason why the residuals should have been fully standardized
for this purpose. More generally, some parameters in the error distribution may be of
secondary practical interest, and not warrant the complication of the model fitting process
by their inclusion.
The procedure of plotting the ordered residuals (or ordered modified residuals) remains
a useful tool for obtaining a visual validation of the model in such cases. But it can
provide the further service of yielding quick estimates of any error distribution parameter
omitted from the fitted model. Also, although such omitted parameters may have secon-
dary practical interest, we are not absolved from trying to get as good estimates of them
as we are able, within the scope of the (limited) effort we are prepared to put into this.
This raises the general question of estimation via probability plotting procedures. Such
methods find wide practical application in many fields, for example in engineering,
meteorology, hydrology, etc., usually for estimating scale and location parameters in
distributions with density function of the form $(1/\sigma) f\{(x-\mu)/\sigma\}$. However, there would
seem to have been little serious study of the statistical rationale of this approach to
estimation. The method consists simply of plotting ordered sample values, y(i), at pre-
assigned values, xi, of a variable x; and estimating the scale and location parameters from
the best straight-line relationship. Various sets of plotting positions, xi, have been
employed. These might be, as in this paper, the expected values of the standardized order
statistics or, more commonly in practical application, the inverse probabilities of the
cumulative sampling distribution, for example

$$F^{-1}\{i/(n+1)\} \quad \text{or} \quad F^{-1}\{(i - \tfrac{1}{2})/n\}.$$

It is in the choice of xi that more detailed work


the statistical properties (bias and relative effici
may vary wildly with the plotting positions chosen.
Little detailed formal discussion of this problem appears in the literature; although
there are papers by Benard and Bos-Levenbach (1953) who propose modified plotting
positions for general use; and Chernoff and Lieberman (1954), concerned specifically with
the normal distribution. Some more detailed results on suitable choice of plotting
positions for estimation can be easily obtained. For instance, if $\mu$ happens to be the mean
of the distribution, unbiasedness of the estimate of $\mu$ constrains the $x_i$ to always produce
the sample mean $\bar{y}$, which for certain distributions is most inefficient. For the plotting
positions proposed in this paper no problems of bias will arise for, say, a scale parameter $\sigma$
(if we ignore inter-residual correlations); but we may well sacrifice a lot of information
on $\mu$ in relation to some other choice of plotting positions. This is well illustrated, for
example, by the case of uniformly distributed residuals with half range $\sigma$.
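
The estimation procedure under discussion, plotting ordered values against preassigned positions and reading location and scale off the fitted straight line, can be sketched as follows. The example assumes a normal reference distribution, plotting positions $F^{-1}\{(i-\tfrac{1}{2})/n\}$ and an ordinary least-squares line; each of these is one choice among several, and the function names are hypothetical.

```python
from statistics import NormalDist

def plotting_positions(n):
    """Standard normal quantiles at (i - 1/2) / n for i = 1, ..., n."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def line_fit(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def estimate_location_scale(sample):
    """Intercept estimates the location, slope estimates the scale."""
    ordered = sorted(sample)
    return line_fit(plotting_positions(len(ordered)), ordered)

# Values lying exactly on the line y = 3 + 2 x recover location 3 and scale 2.
positions = plotting_positions(9)
exact_sample = [3.0 + 2.0 * x for x in positions]
location, scale = estimate_location_scale(exact_sample)
print(f"location {location:.3f}, scale {scale:.3f}")
```

Replacing `plotting_positions` by another set of values changes the estimates obtained from the same sample, which is the dependence on the choice of $x_i$ that the speaker draws attention to.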
The choice of plotting positions relates also to the question of model validation. We
are invited to consider the plotted ordered residuals in Figs. 1 and 2 as evidence of the
adequacy of the respective models for these examples. It is largely a subjective matter
how convincing we find the evidence in these two cases; personally, Fig. 2 worries me a
little! More objective criteria of the adequacy of the model can of course be constructed
and applied to these plots, as the authors have mentioned. But again the properties, and
relative convenience, of relevant test statistics for this purpose are going to depend
(perhaps quite strongly) on the choice of plotting positions, xi.
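
The dependence on plotting positions can be made concrete with a small sketch. It assumes unit exponential residuals (as in the paper's Fig. 1) and compares the expected order statistics with two inverse-probability choices, i/(n + 1) and (i - 1/2)/n; the function names and the sample size n = 5 are illustrative assumptions, not taken from the paper.

```python
import math

def expected_exponential_order_stats(n):
    """E[X_(i)], i = 1..n, for a unit exponential sample: sum_{k=1..i} 1/(n-k+1)."""
    out, total = [], 0.0
    for k in range(1, n + 1):
        total += 1.0 / (n - k + 1)
        out.append(total)
    return out

def exponential_quantile(p):
    """Inverse cumulative distribution function of the unit exponential."""
    return -math.log(1.0 - p)

def plotting_positions(n):
    """Three candidate sets of plotting positions x_i for a sample of size n."""
    exact = expected_exponential_order_stats(n)
    pos_a = [exponential_quantile(i / (n + 1)) for i in range(1, n + 1)]
    pos_b = [exponential_quantile((i - 0.5) / n) for i in range(1, n + 1)]
    return exact, pos_a, pos_b

exact, pos_a, pos_b = plotting_positions(5)
for i, (e, a, b) in enumerate(zip(exact, pos_a, pos_b), start=1):
    print(f"i={i}: E[X_(i)]={e:.4f}  F^-1(i/(n+1))={a:.4f}  F^-1((i-1/2)/n)={b:.4f}")
```

The three sets differ most for the largest order statistic, which is where a fitted straight line is most sensitive to the choice.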
My second and concluding point is something of a personal "hobby horse" and I will
be brief. In the Introduction the authors comment on the use of computers in studying
residuals. There is a growing tendency, in some circles, to regard the computer as a
substitute for common sense, thought and observation. I certainly would not presume to
criticize the authors on these grounds; the whole spirit of their paper denies this attitude;
nor would I dispute the essential value of the computer in large-scale studies of residuals.
But it would be unfortunate if the lack of a "suitable computer with graphical output
device" was allowed to discourage, or excuse, the study of residuals for model validation.
A pencil and piece of graph paper may still work wonders in this respect!

Dr S. C. PEARCE (East Malling Research Station): I studied the text of this paper with
the greatest interest and found it both stimulating and provocative. After a time I came to
suspect that the authors know rather more about the quantities, R'i, than they say, so
perhaps I may provoke them in turn, hoping they will add to their remarks and so make an
excellent paper even better.
These quantities are derived from the crude residuals, Ri, by a transformation that is
intended to give them the same mean and variance as the random variables, Ei. Accordingly
they go only part of the way towards the desired quantities, R''i, each of which shall be an
estimate of one of the Ei's. Where they first appear at equation (30), there are some cautionary
words about particular cases and fairly general procedures, but when they are actually used
at equation (35) the word is "Hence"; I suggest that it should be "Let". The transformation
proposed is completely reasonable, but it is not unique for the purpose intended.
In fact the transformation in this instance proves to be rather too powerful, as is shown
by Fig. 1. In general, it will make small Ri into even smaller R'i, whereas it will increase
large Ri. In Fig. 1 we see that all the smallest residuals lie below the line while the largest
is awkwardly above it, and we should actually have done better not to have transformed
at all. Turning to Fig. 2, the transformation has again been too powerful; with one
exception all the negative residuals are too small and all the positive ones too large. Here
let me agree that these two Figures provide most impressive support for the essential
rightness of the authors' approach; my point is merely that there are in fact systematic
deviations, which may be due to the arbitrary element in the transformation.
However, there are other explanations possible. Perhaps the systematic deviations
result from a quantity having been badly estimated on account of some quirk in the data.
In that case the fact of the transformation having been too powerful in two instances may
be of no more importance than a coin having been tossed twice and come down heads on
both occasions. Alternatively everything may be the result of using Ri instead of R''i or Ei.
Admittedly the method of deriving the R'i is reasonable, because each Ei must be a function
of the values Ri and must depend chiefly upon the Ri to which it corresponds, but a
transformation such as the one used can hardly be exact. There is another point: how certain
can we be a priori that the random residuals, Ei, are in fact distributed exponentially
(Fig. 1) or normally (Fig. 2)? (In the latter case, as a matter of fact, it is scarcely
conceivable that they should be.) For my own part I can see no way of judging where the
systematic deviations come from, but perhaps the authors can help.

I have rather trailed my coat because I would like to hear the extended comments of
the authors on this point. I would, however, advance a suggestion. Scandalous as the
suggestion may seem at a meeting of the Royal Statistical Society, there are occasions when
real data are a nuisance and fudged-up figures are better. This is perhaps an occasion for
simulation. If we knew for certain what the parameters and random residuals were, we
could apply the methods of this valuable paper and observe how the values of Ri actually
behave. It could be that we do not need to seek much further.
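
A minimal sketch of the kind of simulation suggested here, under stated assumptions: the model is taken to be Y_i = b1 exp(b2 z_i) E_i with centred z_i and unit exponential E_i (the form implied by the discussion of equation (7)); the parameter values, sample size, covariate range and function names are all hypothetical. The fit uses bisection on the profile score for b2, a stand-in for whatever numerical method the authors used.

```python
import math
import random

def fit_exponential_regression(y, z, lo=-10.0, hi=10.0, steps=100):
    """ML fit of Y_i = b1 * exp(b2 * z_i) * E_i with unit exponential E_i.

    Assumes the z_i are centred (sum to zero). For fixed b2 the ML estimate of
    b1 is mean(y_i * exp(-b2 * z_i)); the profile score for b2 is proportional
    to sum(z_i * y_i * exp(-b2 * z_i)), which decreases in b2, so bisection
    on the bracket [lo, hi] (assumed to contain the root) is sufficient.
    """
    def score(b2):
        return sum(zi * yi * math.exp(-b2 * zi) for yi, zi in zip(y, z))

    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    b2 = 0.5 * (lo + hi)
    b1 = sum(yi * math.exp(-b2 * zi) for yi, zi in zip(y, z)) / len(y)
    return b1, b2

def simulate_residuals(n=30, b1=50.0, b2=-1.0, seed=1):
    """Generate data with known parameters and errors, then return the crude residuals."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 3.0) for _ in range(n)]      # hypothetical covariate
    x_bar = sum(x) / n
    z = [xi - x_bar for xi in x]
    errors = [rng.expovariate(1.0) for _ in range(n)]  # the true E_i
    y = [b1 * math.exp(b2 * zi) * ei for zi, ei in zip(z, errors)]
    b1_hat, b2_hat = fit_exponential_regression(y, z)
    residuals = [yi / (b1_hat * math.exp(b2_hat * zi)) for yi, zi in zip(y, z)]
    return z, errors, residuals, (b1_hat, b2_hat)

z, errors, residuals, estimates = simulate_residuals()
print("estimates (b1, b2):", estimates)
for e, r in zip(sorted(errors), sorted(residuals)):
    print(f"true error {e:.3f}   crude residual {r:.3f}")
```

Because the true errors are known, the ordered crude residuals can be set beside the ordered E_i directly; repeating the run over many seeds would show whether deviations like those in Figs. 1 and 2 are systematic.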

Mr A. M. WALKER (University of Cambridge): I would like to ask a very simple
question in connection with Example 2 in Section 2 of the paper, where the model is given
by equation (7). Why did the authors not take logarithms of each side of the equation,
so as to obtain a linear regression model of the usual form, with independently and identically
distributed residuals log Ei = ξi (say)? If that were done, and the basic parameters
taken to be log β1 and β2, the analysis would seem to be somewhat simpler, because the
parameters occur linearly in the definition of the residuals, which become log Ri, 1 ≤ i ≤ n,
in the authors' notation. There would of course be no difference as regards maximum
likelihood estimation (and least-squares estimators might well not be particularly efficient),
but derivation of formulae analogous to (33) and (34) on p. 255 giving the approximate
mean and variance of log Ri would certainly be more straightforward, and if behaviour of
residuals not in accordance with the model was expected to be associated with the value
of the independent variable x (log white blood cell count), the logarithmic transformation
would be a fairly natural one. Also the estimated residuals are no longer restricted to be
positive, so that one does not have to introduce a transformation such as that given by
equation (35); incidentally, taking logarithms in this equation obviously just brings one
back to equation (30) with Ri replaced by log Ri. The distribution of ξi is, admittedly,
less simple than that of Ei, but is still fairly easy to handle. For example its moment
generating function E{exp(tξi)} = Γ(1 + t), so that its rth cumulant is ψ^{(r-1)}(1), where
ψ(x) = d/dx {log Γ(x)} denotes the digamma function.
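
As a numerical check on the cumulant statement, the sketch below differentiates the cumulant generating function K(t) = log Γ(1 + t) by central differences. The function names and the step size h are assumptions made for illustration; the first two cumulants should come out close to the known values ψ(1) = -γ (γ being Euler's constant, about 0.5772) and ψ'(1) = π²/6.

```python
import math

def cumulant_generating_function(t):
    """K(t) = log E[exp(t * xi)] = log Gamma(1 + t), for xi the log of a unit exponential."""
    return math.lgamma(1.0 + t)

def first_two_cumulants(h=1e-3):
    """Approximate K'(0) and K''(0) by central finite differences with step h."""
    k_plus = cumulant_generating_function(h)
    k_zero = cumulant_generating_function(0.0)
    k_minus = cumulant_generating_function(-h)
    mean = (k_plus - k_minus) / (2.0 * h)
    variance = (k_plus - 2.0 * k_zero + k_minus) / (h * h)
    return mean, variance

mean, variance = first_two_cumulants()
print(f"mean of log E     ~ {mean:.4f}")
print(f"variance of log E ~ {variance:.4f}")
```

The negative mean and the variance well above one show how differently the log scale weights very small values of Ei.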
This question illustrates the point that the general definition of residuals given at the
beginning of Section 2 is not unique; one can apply an arbitrary 1-1 transformation to
the Ei and still satisfy it, as the resulting random variables remain independently and
identically distributed. In many problems there will be a natural transformation to use,
but that will not always be so, and even when it is, is the "natural" transformation
necessarily the most appropriate? I would be grateful if Professor Cox and Mrs Snell
could comment on this.

The following written contributions were received after the meeting.

Dr R. M. LOYNES (University of Cambridge): I should like to say that I enjoyed the
paper, and to observe very briefly that it is possible to obtain the distribution of the
residuals Ri, and then to find a function pi(Ri) which has the same distribution as Ei, thus
producing a different modified residual R''i; these statements are to hold if account is taken
of terms of order n^{-1} but of no higher order.
A numerical comparison with the values of R'i in Table 1 shows, as one would expect,
no great differences: all except the last row showing differences no greater than 0.03, and
a difference of 0.09 in the last row.
One can similarly consider joint distributions of the Ri, and more complicated adjustments
of them in an attempt to reduce the dependence. It is not possible to be quite as
thorough now, but on general grounds it seems unlikely that much can be done: for
example in the simplest possible (normal, linear) case there are n independent Ei, while the
Ri satisfy a linear relationship.
A full account will be given elsewhere.

Dr C. L. MALLOWS (Bell Telephone Laboratories): I would like to congratulate the
authors on their important paper, and to report some results that bear on the questions
raised at the end of Section 5 and in Section 10 (ii). Suppose that X1, ..., Xn are random
variables (for example, residuals) that are nearly, but not quite, independent and identically
distributed. In a forthcoming paper I give a general method for obtaining the distributions
of the order-statistics of X1, ..., Xn, and of pairs, triads, and so on of these order-statistics.
These distributions are obtained as expansions in series; approximate results can be
obtained by truncation. A key result is that through first correction terms, the joint
distribution of any pair of order-statistics is determined as soon as the pairwise joint
distributions of X1, ..., Xn are known. Thus first-order corrections to the means, variances
and covariances of the order-statistics can be determined from the means, variances and
covariances of the original variables. For example, suppose X1, ..., Xn are multinormal
with zero means, unit variances and small correlations. (If X1, ..., Xn are rescaled residuals,
the correlations will be known.) One finds that if squares and higher powers of the
correlations are neglected, then through second moments the order-statistics behave as
though all the correlations were equal to their average. Thus to this approximation the
configuration of the order-statistics has the same distributional properties as that derived
from a set of independent normal variables, so that probability plotting remains an
appropriate technique.

Dr M. B. PRIESTLEY (University of Manchester): I should like to join the other
discussants in expressing my thanks to the authors for a very interesting and stimulating
paper. The techniques which they propose will no doubt find many useful applications,
but I would like to mention one or two points which I feel require further consideration. In
the first place, equation (3) does not by itself seem to define a unique set of {Ei}. As stated,
the {Ei} are defined as a set of independent identically distributed random variables. However,
as Mr Walker has pointed out, there are many transformations on any given set
of {Ei} which will result in a new set of random variables which are also independent and
identically distributed. Since the authors' definition of the residuals {Ri} (equati
assumes a knowledge of the maximum-likelihood estimates β̂, and hence implicitly a knowledge
of the distributional form of the {Ei}, it might be as well to include this information
in the basic model (equation (3)), so that the {Ri} would then be defined more explicitly
with respect to a given family G of functions {gi} and a given family F of distributions for
{Ei}.
The "interaction" between the unknown parameters of G (β) and the unknown parameters
of F has already been mentioned by Dr Barnett, and is related to the general
problem of interpreting the observed {Ri}. The authors have investigated some of the
sampling properties of the {Ri}, and have shown how these properties may be used to
detect departures from the original model. However, these results were deduced on the
basis of the original model. If, therefore, one decides (on the result of some test on the Ri)
that this model was inadequate, there would seem no reason to suppose, a priori, that the
{Ri} will still follow closely the pattern of the (modified) {Ei}. In the case of a polynomial
regression model it is probably true that, in general, an examination of the {Ri} (or {R'i})
would correctly indicate when further terms were needed. However, in the context of
time-series analysis (that is, when the {Ei} are allowed to be correlated in "time") it is
doubtful whether the same conclusion would hold, and a periodogram analysis of the
residuals (as suggested by the authors in Section 1) might be misleading. As an example,
consider a linear system with stationary input Xt and transfer function A, and suppose
that the output, Wt, contains an additive "noise" term, et. Then the system may be
described by the equation:
Wt = AXt+et.

Suppose now that one observes Xt and Wt over some interval of time and wishes to estimate
the transfer function A. As an initial model one might suppose that et was a white noise
process uncorrelated with Xt, and estimate A by the usual technique of cross-spectral
analysis. However, if these assumptions were incorrect then the estimate of A could be
seriously in error, and any subsequent model fitted to the residuals may be invalid. In
such situations it would be nice if one could appeal to some type of "orthogonality"
property, but this is hardly likely to be applicable in many situations.

Professor J. TUKEY (Princeton University): The broad conclusions to be supported or
drawn from this work are relatively clear. These include:
(i) Since residuals are mainly used to look for what might be going on beyond what is
already in the model, we can go for this with first-order answers. Small corrections,
whether or not of higher order, are often negligible. (Studies of the distribution of residuals
may prove exceptions and may not.) The important thing is to look at "residuals"; details
of definition matter much less.
(ii) The authors have made available a useful and well-behaved normalizing transformation
for the binomial.
(iii) The authors have provided approximate formulae for bias and variance of any
residual-like function of the observations and the parameters, however selected. (How
complex the situations are, under which we will, in fact, use these formulae is not yet clear.)
There remain a number of minor points which I fail to understand. These include: The
omission (in (v) on p. 248) of distributional behaviour as a possible clue to heterogeneous
variance. The statement, in the second paragraph of Section 2, that Poisson variates cannot
be expressed in terms of identically distributed quantities. (The inverse of the probability-integral
transformation always applies; presumably the interchanged statement was
meant.) Why it is not more convenient to introduce the logarithm of time to death in
Example 2, thus obtaining residuals that are both additive and unbounded? Why should
we be more concerned with variances than with covariances for the Ri of Table 3? Finally,
how much the nine-parameter fit leading to Table 5 increases our belief in the reality
of the AC effect beyond that coming from Table 4?

Mrs SNELL replied briefly at the meeting and the authors subsequently replied more
fully in writing as follows:
We are grateful for the extremely constructive and helpful comments that have been
made and for the pinpointing of problems for further study. As Mr Healy has pointed out,
there are a number of commonly used transformations of the binomial distribution which
are for many purposes effectively linear functions of one another. The incomplete beta
transformation studied in the paper does, however, give an appreciably more linear plot
on probability paper than, for example, the inverse sine transformation, especially in the
range p = 0.1 to 0.3 and with small n, for example n = 10; in looking for systematic departures
this could be an advantage. Mr Harrison has raised in particular the question of the
behaviour of cusum charts plotted from residuals, and we agree that this deserves further
study, especially of the effect on the plot of correlation between different residuals.
We have deliberately in the paper put the main emphasis on the plotting of residuals
rather than on formal tests of significance. Professor Durbin and also Dr Priestley have
stressed the need for caution in applying significance tests to residuals. That this is a very
important point is clear from a consideration of the simple problem of examining in linear
regression the possible importance of an omitted regressor variable, say z. It is entirely
legitimate to make a graphical analysis by plotting residuals from the initial regression
relation directly against z. If, however, the significance of the regression on z is to be
tested by the usual formula the residuals of z must be used, as is in effect done in analysis
of covariance. In the more general situations contemplated in our paper the expected
value of linear or nearly linear test statistics calculated from residuals will be close to that
calculated from the Ei's, but the contribution of the covariance terms (28) to the variance
of such test statistics will in general be non-negligible, because the number of covariance
terms will be of order n2. A special case is Professor Durbin's example for the sum of
residuals, where the correct variance he gives follows directly from (28). That is, in such
applications it is essential to take into account the correction terms in (28) and with more
complex test statistics to develop appropriate extensions of them.
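
The omitted-regressor point can be illustrated with a short sketch for the ordinary normal-theory linear model; it is not the authors' own calculation. The simulated data, the coefficients and the helper names `solve` and `ols` are all hypothetical. Regressing the initial residuals on the residuals of z (after z has itself been regressed on the fitted variables) reproduces the coefficient of z in the full regression, whereas regressing them on raw z does not.

```python
import random

def solve(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients and residuals of y on the given regressor columns."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * col[t] for c, col in zip(coef, columns)) for t in range(len(y))]
    return coef, [yt - ft for yt, ft in zip(y, fitted)]

# Hypothetical data: z is correlated with the regressor x already in the model.
rng = random.Random(1968)
n = 40
ones = [1.0] * n
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
z = [0.7 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [1.0 + 2.0 * xi + 0.5 * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]

_, e_y = ols([ones, x], y)            # residuals from the initial regression
_, e_z = ols([ones, x], z)            # residuals of z on the same regressors
full_coef, _ = ols([ones, x, z], y)   # full regression including z

slope_on_resid_z = sum(p * q for p, q in zip(e_y, e_z)) / sum(q * q for q in e_z)
z_bar = sum(z) / n
slope_on_raw_z = (sum(p * (q - z_bar) for p, q in zip(e_y, z))
                  / sum((q - z_bar) ** 2 for q in z))

print("coefficient of z in full regression:", full_coef[2])
print("slope of residuals on residuals of z:", slope_on_resid_z)
print("slope of residuals on raw z:", slope_on_raw_z)
```

Plotting e_y against raw z remains a legitimate graphical check; only the formal slope and its standard error require the residuals of z.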
Mr Walker has pointed out that the distribution of the Ei's could by transformation
be taken in any form, and he and Professor Tukey have enquired about the use of a log
transformation in the exponential regression model (7), converting it into a model with
additive error. It must be stressed that computationally there is no particular advantage
in such a transformation, unless a log normal distribution of error is assumed. Our reason
for keeping to the original scale was partly the simplicity of properties of the exponential
distribution and more importantly that we felt that in this application the precise form of
the distribution for small values is not important and that test statistics and plots that are
sensitive to the very small values are not good. We think that similar arguments of
simplicity and meaningfulness will often be applicable; see also the remarks at the
beginning of Section 7 of the paper.
Dr Loynes and Dr Mallows have sketched theoretical arguments that should be an
improvement on those that we have used and we look forward to seeing a detailed account
of their work. In particular they should throw light on some of the points raised by
Dr Pearce, with whom we agree that there is substantial scope for simulation in studying
these problems.
We agree with Dr Barnett that it is not always necessary to take the distribution of the
Ei's in standardized form, nor always to include its parameters in fitting the model. Often
however, as in our two examples, it will be useful to take the Ei's as having a completely
known distribution.
We accept the general points made by Dr Priestley; they make explicit some of the
reservations about residuals discussed in the introduction of our paper.
Professor Tukey has made a considerable number of cogent points. We agree that the
real usefulness of minor adjustments to the residuals is not established although, as
Professor Durbin's contribution makes clear, second-order properties are important when
it comes to tests. In the statement about Poisson variates that he queries, we had in mind
that no continuous relation is available. If the Ri's are to be plotted against external
variables we think that standardizing them to have equal means and variances is very
reasonable to avoid spurious regularities in the plots but in examining distributional form
the neglect of covariances is less easily defended, as indeed we indicated in the paper.
Finally we mention some relevant references that have come to our attention since we
wrote the paper. Dr Mallows has pointed out that the work of David and Johnson (1948)
has some bearing on the problem briefly mentioned at the end of Section 10. Also Theil
(1965, 1968) has suggested that for normal theory linear model problems adjusted residuals
with exactly the distribution of the true errors can be produced by attempting to find
residuals only at a suitably limited set of observational points.

REFERENCES IN THE DISCUSSION


BENARD, A. and BOS-LEVENBACH, E. C. (1953). The plotting of observations on probability paper.
Statistica, 7, 163-173.
CHERNOFF, H. and LIEBERMAN, G. J. (1954). Use of normal probability paper. J. Amer. Statist.
Ass., 49, 778-785.
DAVID, F. N. and JOHNSON, N. L. (1948). The probability integral transformation when parameters
are estimated from the sample. Biometrika, 35, 182-190.
IVERSON, K. E. (1962). A Programming Language. New York: Wiley.
THEIL, H. (1965). The analysis of distributions in regression analysis. J. Amer. Statist. Ass., 60,
1067-1079.
THEIL, H. (1968). A simplification of the BLUS procedure for analysing regression disturbances.
J. Amer. Statist. Ass., 63, 242-251.
