Biometrika Trust
On Power Transformations to Symmetry
Author(s): David V. Hinkley
Source: Biometrika, Vol. 62, No. 1 (Apr., 1975), pp. 101-111
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2334491
Biometrika (1975), 62, 1, p. 101
Printed in Great Britain
On power transformations to symmetry
BY DAVID V. HINKLEY
School of Statistics, University of Minnesota, Twin Cities
SUMMARY
Transformations to symmetry, or approximate symmetry, are considered. In particular,
properties of simple estimates based on equitailed order statistics are derived. Examples
include transformation of exponential and gamma random variables. Errors in previous
work are discovered and partially corrected.
Some key words: Maximum likelihood; Order statistics; Robustness; Symmetry; Transformation.
1. INTRODUCTION
Box & Cox (1964) discussed estimation of data transformations which would yield
variables satisfying a normal-error additive linear model. In particular, a family of power
transformations was considered, which in its simple form consists of transformations
$T_\lambda : y \to z_\lambda$ defined by
$$z_\lambda = \begin{cases} (y^\lambda - 1)/\lambda & (\lambda \neq 0), \\ \log y & (\lambda = 0). \end{cases} \tag{1.1}$$
Here $y$ might be an observable quantity or a residual from a fitted model. A conventional
assumption underlying the use of the transformation is that, for some $\lambda$, $Z_\lambda$ has a normal
distribution.
One method of estimating $\lambda$ discussed by Box & Cox is that of maximum likelihood. This
was further explored by Draper & Cox (1969), who derived expressions for the precision of
the maximum likelihood estimate. Other aspects of normal-theory estimation and inference
about $\lambda$ in (1.1) have been investigated by Andrews (1971) and Atkinson (1973).
It is frequently assumed in connexion with (1.1) that $y$ is positive; if $y$ could be negative
many values of $\lambda$ would be clearly inadmissible. Note, however, that if $y$ is positive then $Z_\lambda$
can have a normal distribution only if $\lambda$ is zero or if $\lambda^{-1}$ is an even integer. Nevertheless, one
can often obtain a transformation for which $Z_\lambda$, although bounded below, is very nearly
normal, or close enough to normal for practical purposes.
There are three reservations that one might have about fitting (1.1) with a normal
distribution assumption by maximum likelihood. First, the maximum likelihood method
involves a great deal of calculation even in the normal case. Secondly, as Andrews (1971)
has shown, the maximum likelihood method can be very sensitive to outliers; this reservation
is actually unjustified in the sense that all reasonably efficient methods depend
critically on the extreme observations. Thirdly, if we are aiming to use a linear model for the
transformed data we may not want to make a normality assumption at any stage for fear
of nonrobustness. We may be planning to use now popular robust methods of analysis
(Huber, 1973), and the assumption of normality in connexion with (1.1) would seem
contradictory.
In this paper we discuss simple and not-so-simple methods of estimating $\lambda$ to give
approximate symmetry for the distribution of $Z_\lambda$. The methods are based on symmetrizing order
statistics about the median.
As were Draper & Cox (1969), we are not concerned here with the requirement of an
additive linear model for transformed data. We do assume that the $Y$'s have a common
distribution with unknown location and scale.
In § 2 we discuss a very simple order statistic estimate of $\lambda$ and derive its large-sample
properties. Corresponding results for the normal-theory maximum likelihood estimate are
outlined in § 3, which includes corrections to the results of Draper & Cox (1969). Section 4
then gives several illustrations of the results, for gamma, lognormal and other distributions.
Generalizations of the simple estimate of § 2 are discussed in § 5, with an example given
in § 6.
2. A QUICK ESTIMATE
2.1. Definition of the estimate
Suppose that $Y_1, \ldots, Y_n$ are continuous nonnegative independent and identically distributed
random variables; the restriction to positive variables is necessary if the family (1.1) is to
be sensible. If there exists a $\lambda$ such that $Z_\lambda$ in (1.1) has a symmetric distribution, then the $p$
and $1-p$ quantiles will be symmetrically placed about the median. This symmetry of
population quantiles for $Z_\lambda$ suggests a simple method for estimating $\lambda$, namely that of
symmetrizing the sample quantiles corresponding to tail probabilities $p$ and $1-p$ for some $p$.
As suggested in § 1, $Z_\lambda$ cannot have exact symmetry for most $\lambda$, but we assume that a value
of $\lambda$ exists which 'nearly' gives symmetry. More will be said about this later.
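Let $Y_1, \ldots, Y_n$ have the common distribution function $F(y)$, with quantiles $\xi_s$ defined by
$F(\xi_s) = s$ $(0 < s < 1)$. Then we seek that transformation in the family (1.1) for which
$$\xi_{0.5}^\lambda - \xi_p^\lambda = \xi_{1-p}^\lambda - \xi_{0.5}^\lambda. \tag{2.1}$$
If we denote the ordered values of $Y_1, \ldots, Y_n$ by $X_1 \le \ldots \le X_n$, and define the median $\tilde X$ in
the usual way, then the sample analogue of (2.1) is, unless $X_r = \tilde X = X_{n-r+1}$,
$$\tilde X^\lambda - X_r^\lambda = X_{n-r+1}^\lambda - \tilde X^\lambda \qquad (r = [np]), \tag{2.2}$$
which is an estimating equation for $\lambda$. There are only two solutions to (2.2), one of them
being $\lambda = 0$. However, by comparison with (1.1) we exclude $\lambda = 0$ unless
$$\tilde X / X_r = X_{n-r+1} / \tilde X, \tag{2.3}$$
which is the condition for sample quantiles of $\log Y$ to be symmetric about the median.
For computation purposes it is easier to rewrite (2.2) in the form
$$(X_r/\tilde X)^\lambda + (X_{n-r+1}/\tilde X)^\lambda = 2. \tag{2.4}$$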
The existence of one nonzero solution to (2.4) is easily proved directly, or as a special case
of the lemma in § 5. The left-hand side of (2.4) is convex in $\lambda$ with slope
$\log(X_r X_{n-r+1}/\tilde X^2)$ at $\lambda = 0$, so the nonzero solution $T$ of (2.4) is positive if and only if
$X_r X_{n-r+1} < \tilde X^2$ and is otherwise negative if (2.3) is not satisfied; this is obviously sensible
on physical grounds. Moreover it is easy to verify that
$$|T| > \frac{|\log\log(X_{n-r+1}/\tilde X) - \log\log(\tilde X/X_r)|}{|\log(X_r/X_{n-r+1})|},$$
which may be useful in solving (2.4).
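The estimating equation (2.4) is easy to solve numerically. The following is a minimal sketch only, not code from the paper: the names `solve_pair` and `quick_estimate` are hypothetical, bisection is one arbitrary choice of root finder, and the bound on $|T|$ quoted after (2.4) is used merely as a starting point for the bracket.

```python
import math
from statistics import median


def _positive_root(a, b, tol):
    """Root t > 0 of a**t + b**t = 2, assuming 0 < a < 1 < b and a * b < 1."""
    log_b, neg_log_a = math.log(b), -math.log(a)

    def f(t):
        return a ** t + b ** t - 2.0

    # Lower bound on the root, taken from the inequality quoted after (2.4).
    lo = (math.log(neg_log_a) - math.log(log_b)) / (log_b + neg_log_a)
    hi = lo
    while f(hi) < 0.0:                      # f is negative between 0 and the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):     # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_pair(a, b, tol=1e-12):
    """Solution T of a**T + b**T = 2 for a = X_r / median and b = X_{n-r+1} / median."""
    if not 0.0 < a < 1.0 < b:
        raise ValueError("need 0 < a < 1 < b")
    slope = math.log(a) + math.log(b)       # derivative of a**t + b**t at t = 0
    if abs(slope) < 1e-14:
        return 0.0                          # condition (2.3): log-quantiles symmetric
    if slope < 0.0:
        return _positive_root(a, b, tol)
    # a * b > 1: reflect t -> -t, which swaps the roles of 1/b and 1/a.
    return -_positive_root(1.0 / b, 1.0 / a, tol)


def quick_estimate(sample, p):
    """Quick estimate T from one equitailed pair of order statistics, r = [np]."""
    x = sorted(sample)
    n = len(x)
    r = int(n * p)
    if r < 1:
        raise ValueError("n * p must be at least 1")
    med = median(x)
    return solve_pair(x[r - 1] / med, x[n - r] / med)
```

For instance, `quick_estimate([1, 2, 4, 8, 16], 0.2)` returns zero, because the logarithms of the extreme pair are symmetric about the log median and (2.3) holds.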
The estimator defined by (2.3) and (2.4) is somewhat naive. One would expect that in
order to obtain a reasonably efficient estimator one would have to combine the equations
(2.2) corresponding to several $p$ values in some sensible way. This we do in § 5. However, the
simplicity of (2.2) is appealing, and there is some flexibility in our ability to choose $p$. Also
the reasonableness of the basic idea and some generally useful properties of the estimator
are most easily discussed in the simple case.
2.2. Properties of the quick estimate
We have already seen that $T$, the nonzero estimator satisfying (2.4), is unique. We now
show that as $n \to \infty$ the estimator $T$ defined by (2.4) has a limiting normal distribution. To
do this we use the joint asymptotic normality of the order statistics $X_{r_1}, \ldots, X_{r_m}$ for
$r_j = [np_j]$, $0 < p_1 < \ldots < p_m < 1$. Specifically, if the original distribution function $F(y)$ has
density $f(y)$ and quantiles $\xi_s = F^{-1}(s)$, then the vector $(X_{r_1}, \ldots, X_{r_m})$ has a limiting
multivariate normal distribution with mean $(\xi_{p_1}, \ldots, \xi_{p_m})$ and covariance matrix determined by
$$n\,\mathrm{cov}(X_{r_i}, X_{r_j}) = \frac{p_i(1 - p_j)}{f(\xi_{p_i})f(\xi_{p_j})} \qquad (i \le j). \tag{2.5}$$
The first property of the estimator $T$ that we need is consistency, which strictly means
that $T = \lambda + o_p(1)$, where $\lambda$ is the solution of (2.1); if there is a transformation in the class
(1.1) giving exact symmetry, then the solution to (2.1) gives it, whatever $p$. Actually
consistency is easy to verify from continuity of the left-hand side of (2.4) and consistency
of $X_r$, $X_{n-r+1}$ and $\tilde X$ for the respective quantiles. Note that $\tilde X$ is asymptotically equivalent
to $X_{[n/2]}$, which fact we shall use.
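Now let us suppose $\lambda \neq 0$, and write, with $r = [np]$, $p + q = 1$,
$$X_r = \xi_p(1 + n^{-1/2}W_p), \qquad X_{n-r+1} = \xi_q(1 + n^{-1/2}W_q), \qquad \tilde X = \xi_{0.5}(1 + n^{-1/2}W_{0.5}). \tag{2.6}$$
Then the estimating equation (2.4) can be written
$$[\alpha_p\{1 + n^{-1/2}(W_p - W_{0.5}) + o_p(n^{-1/2})\}]^T + [\alpha_q\{1 + n^{-1/2}(W_q - W_{0.5}) + o_p(n^{-1/2})\}]^T = 2, \tag{2.7}$$
where $\alpha_s = \xi_s/\xi_{0.5}$ and the $\alpha$'s satisfy
$$\alpha_p^\lambda + \alpha_q^\lambda = 2 \tag{2.8}$$
by definition. Since $T$ is consistent, expansion of (2.7) about $T = \lambda$ gives, using (2.8),
$$(T - \lambda)(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q) + \lambda n^{-1/2}\{\alpha_p^\lambda(W_p - W_{0.5}) + \alpha_q^\lambda(W_q - W_{0.5})\} + o_p(T - \lambda) + o_p(n^{-1/2}) = 0.$$
That is, to first order,
$$\sqrt{n}(T - \lambda)/\lambda = -\frac{\alpha_p^\lambda(W_p - W_{0.5}) + \alpha_q^\lambda(W_q - W_{0.5})}{\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q}. \tag{2.9}$$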
We then use the limiting joint normality of the $W$'s, whose covariance matrix is
determined by (2.5) and the transformation (2.6), to obtain the limiting normal distribution
of $T$.
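The variance calculation that follows from (2.5) and (2.6) can be written out explicitly. The block below is a worked step supplied for clarity rather than taken from the paper; it uses the shorthand $h_s = \{\xi_s f(\xi_s)\}^{-1}$ and takes $p < \tfrac12 < q = 1 - p$.

```latex
% Limiting covariances of the standardized order statistics in (2.6):
\operatorname{cov}(W_s, W_t) \;\to\; \frac{s(1-t)}{\xi_s f(\xi_s)\,\xi_t f(\xi_t)}
  = s(1-t)\,h_s h_t \qquad (s \le t).

% By (2.8) the numerator of (2.9) equals, up to sign,
% N = \alpha_p^\lambda W_p + \alpha_q^\lambda W_q - 2 W_{0.5}, whose limiting variance is
\operatorname{var}(N) \;\to\;
  h_{0.5}^2
  + pq\,(\alpha_p^{2\lambda} h_p^2 + \alpha_q^{2\lambda} h_q^2)
  - 2p\,(\alpha_p^\lambda h_p + \alpha_q^\lambda h_q)\,h_{0.5}
  + 2p^2\,\alpha_p^\lambda \alpha_q^\lambda h_p h_q .
```

Multiplying by $\lambda^2$ and dividing by the squared denominator of (2.9) then gives the limiting variance of $\sqrt{n}(T - \lambda)$.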
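If we define $h_s^{-1} = \xi_s f(\xi_s)$, the variance of the limiting normal distribution of $\sqrt{n}(T - \lambda)$ is
found to be
$$V_T(\lambda, p) = \frac{\lambda^2\{h_{0.5}^2 + pq(\alpha_p^{2\lambda}h_p^2 + \alpha_q^{2\lambda}h_q^2) - 2p(\alpha_p^\lambda h_p + \alpha_q^\lambda h_q)h_{0.5} + 2p^2\alpha_p^\lambda\alpha_q^\lambda h_p h_q\}}{(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q)^2}. \tag{2.10}$$
An alternative expression, in terms of the probability density function $g(z)$ for the
transformed variable $Z_\lambda$, is
$$V_T(\lambda, p) = \frac{\lambda^4\{g_{0.5}^{-2} + pq(g_p^{-2} + g_q^{-2}) - 2p(g_p^{-1} + g_q^{-1})g_{0.5}^{-1} + 2p^2 g_p^{-1}g_q^{-1}\}}{\{(1 + \lambda\kappa_p)\log(1 + \lambda\kappa_p) + (1 + \lambda\kappa_q)\log(1 + \lambda\kappa_q) - 2(1 + \lambda\kappa_{0.5})\log(1 + \lambda\kappa_{0.5})\}^2}, \tag{2.11}$$
where $\kappa_s$ is the quantile defined by $G(\kappa_s) = s$ and $g_s = g(\kappa_s)$.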
Notice that the properties of $T$ are invariant under scale change of $Y$, as is immediately
obvious from the estimating equation (2.4).
The above results hold also for $\lambda = 0$, when $Z = \log Y$. Slightly more generally, (2.11) for
small $\lambda$ may be written as
$$V_T \simeq \frac{g_{0.5}^{-2} + pq(g_p^{-2} + g_q^{-2}) - 2p\,g_{0.5}^{-1}(g_p^{-1} + g_q^{-1}) + 2p^2 g_p^{-1}g_q^{-1}}{\{\tfrac12(\kappa_p^2 + \kappa_q^2 - 2\kappa_{0.5}^2) - \tfrac16\lambda(\kappa_p^3 + \kappa_q^3 - 2\kappa_{0.5}^3)\}^2}. \tag{2.12}$$
This contradicts the type of result obtained by Draper & Cox, but their results are
wrong, as we show in the next section.
Several examples illustrating the results of this section are given in § 4.
3. NORMAL-THEORY MAXIMUM LIKELIHOOD
As pointed out in § 1, previous work on power transformations has assumed the
transformed variable $Z_\lambda$ to be normally distributed; in the simplest case the variables are taken
to be homogeneous $N(\mu, v)$. Draper & Cox derived large-sample properties of the estimator
$\hat\lambda_N$ obtained by maximizing the $N(\mu, v)$ likelihood. These properties would provide useful
standards by which to judge the simple estimate $T$ described in § 2; however some of Draper
& Cox's results are incorrect and others are incomplete. We therefore briefly outline the
basic properties of the normal-theory maximum likelihood estimate $\hat\lambda_N$ here.
The $N(\mu, v)$ likelihood $e^L$ for $Z_{\lambda,1}, \ldots, Z_{\lambda,n}$ leads directly to the efficient score vector $U_\cdot$,
given by
$$U_\lambda = \frac{\partial L}{\partial\lambda} = \sum_j \log Y_j - v^{-1}\lambda^{-2}\sum_j (Z_j - \mu)\{(1 + \lambda Z_j)\log(1 + \lambda Z_j) - \lambda Z_j\},$$
$$U_\mu = \frac{\partial L}{\partial\mu} = v^{-1}\sum_j (Z_j - \mu), \qquad U_v = \frac{\partial L}{\partial v} = (2v^2)^{-1}\sum_j (Z_j - \mu)^2 - n(2v)^{-1}, \tag{3.1}$$
writing $Z_j$ for $Z_{\lambda,j}$.
An obvious feature of the component likelihood equation $U_\lambda = 0$ is its invariance under
scale transformation of the original variable $Y$.
Provided that the density $f(y)$ of $Y$ is regular and a unique solution of $E(U_\cdot) = 0$ exists,
as is the case for standard continuous distributions on $[0, \infty)$, the normal-theory maximum
likelihood estimate converges stochastically to the solution of $E(U_\cdot) = 0$ and has a limiting
normal distribution.
Let $\theta = (\lambda, \mu, v)$ and denote the normal-theory maximum likelihood estimate by $\hat\theta_N$ with
limit $\theta_N$. A standard expansion of the likelihood equation gives
$$\sqrt{n}(\hat\theta_N - \theta_N) = \left[-\frac{1}{n}\frac{\partial^2 L}{\partial\theta^2}\right]^{-1}\frac{1}{\sqrt{n}}\,U_\cdot(\theta_N) + o_p(1); \tag{3.2}$$
see, for example, Cox & Hinkley (1974, Chapter 9). Then $\sqrt{n}(\hat\theta_N - \theta_N)$ has a limiting normal
distribution with covariance matrix
$$\Sigma = J^{-1}IJ^{-1}, \tag{3.3}$$
where
$$nJ = E_f\left[-\frac{\partial^2 L}{\partial\theta^2}\right], \qquad nI = E_f\{U_\cdot(\theta_N)U_\cdot^{\mathrm T}(\theta_N)\}; \tag{3.4}$$
here $E_f$ denotes expectation with respect to the density $f(y)$ of $Y$. Note that $\Sigma = I^{-1}$ only if
$f$ is the normal density because $I = J$ only if $L$ is the log likelihood according to the density $f$.
The general form (3.3) is required when examining properties of $\hat\lambda_N$ under nonnormal
distributions, as we do in § 4.
Draper & Cox incorrectly obtain the variance of $\hat\lambda_N$ from $I^{-1}$. Their method of expanding
$U_\cdot$ as a power series in $\lambda$ does lead to approximations for $I$ and $J$ up to any order in $\lambda$, but
the results for $\Sigma$ are very complicated, involving the first six moments of $Z_\lambda$, and of limited
usefulness. In particular cases one can evaluate $\Sigma$. Some general results for the case $\lambda = 0$
are given in § 4.
4. EXAMPLES
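4.1. Exponential and gamma cases
To illustrate the discussion up to this point we first examine in some detail the example
chosen by Draper & Cox, where the original variables $Y_1, \ldots, Y_n$ are exponentially distributed
with density $f(y) = \rho\exp(-\rho y)$.
In this particular case (2.1) becomes
$$(-\log p)^\lambda + (-\log q)^\lambda = 2(\log 2)^\lambda. \tag{4.1}$$
The quantiles of $Y^\lambda$ are
$$\xi_s^\lambda(p, \rho) = \rho^{-\lambda}\{-\log(1 - s)\}^\lambda, \tag{4.2}$$
and the quantiles $\kappa_s(p, \rho)$ of $Z_\lambda$ are given by $\kappa = (\xi^\lambda - 1)/\lambda$. A crude outlier-free measure
of asymmetry for $Z_\lambda$ is the 'tilt factor'
$$\tau(s, p) = \frac{\kappa_{1-s}(p, \rho) - \kappa_{0.5}(p, \rho)}{\kappa_{0.5}(p, \rho) - \kappa_s(p, \rho)} \qquad (0 < s < 1). \tag{4.3}$$
Note that the nonzero solution $\lambda_p$ of (4.1) and $\tau(s, p)$ are both independent of the scale
parameter.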
Table 1 gives some values of $\lambda_p$ and $\tau(s, p)$ for $p, s \ge 0.01$. The entries show that $\lambda_p$ is very
nearly constant for $p > 0.10$; and, related to this stability, there is a high degree of symmetry
as far as the upper and lower 5% points of the transformed distributions. Much the same
conclusions were reached by Draper & Cox, who noted that small changes in $\lambda$ have little
visible effect on the symmetry. The Weibull distribution of $Z_\lambda$ is quite close to normality
except in the extreme tails.
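Equation (4.1) and the tilt factor are simple to evaluate directly. The sketch below is illustrative only: the names `lambda_p` and `tilt` are hypothetical, the scale parameter is set to one (both quantities are scale free), the tilt factor is taken to be the ratio of the upper to the lower quantile distance from the median, and bisection on the interval (0, 1) is an assumed but sufficient choice because the nonzero root lies there for $0 < p < \tfrac12$.

```python
import math


def lambda_p(p, tol=1e-13):
    """Nonzero root of (-log p)**t + (-log q)**t = 2 (log 2)**t, q = 1 - p, 0 < p < 1/2."""
    a = -math.log(1.0 - p) / math.log(2.0)   # xi_p / xi_0.5, less than 1
    b = -math.log(p) / math.log(2.0)         # xi_q / xi_0.5, greater than 1
    lo, hi = 0.0, 1.0                        # a**t + b**t - 2 is < 0 just above 0, > 0 at t = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a ** mid + b ** mid < 2.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tilt(s, lam):
    """Upper over lower quantile distance from the median for Z = (Y**lam - 1) / lam,
    with Y standard exponential."""
    def kappa(u):
        return ((-math.log(1.0 - u)) ** lam - 1.0) / lam

    return (kappa(1.0 - s) - kappa(0.5)) / (kappa(0.5) - kappa(s))


if __name__ == "__main__":
    for p in (0.05, 0.10):
        lam = lambda_p(p)
        print(p, round(lam, 3), [round(tilt(s, lam), 3) for s in (0.2, 0.1, 0.05, 0.02, 0.01)])
```

By construction `tilt(p, lambda_p(p))` equals one, which corresponds to the unit entries on the diagonal of Table 1.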
Table 1. Transformations, $\lambda_p$, and tilt factors, $\tau(s, p)$, in the exponential case

Quantile p                 0.005   0.01    0.05    0.10    0.20    0.30    0.40
Transformation power λ_p   0.272   0.28    0.291   0.297   0.303   0.305   0.307
s = 0.2                    0.970   0.978   0.989   0.995   1.000   1.002   1.004
s = 0.1                    0.963   0.975   0.991   1.000   1.009   1.018   1.015
s = 0.05                   0.964   0.979   1.000   1.011   1.023   1.027   1.031
s = 0.02                   0.973   0.992   1.019   1.034   1.047   1.054   1.059
s = 0.01                   0.985   1.000   1.038   1.055   1.072   1.078   1.084
The limiting normal distribution of $T$ is scale invariant, as we noted in § 2, and hence
independent of $\rho$. The variance $V_T$ is given in Table 2 for the same transformations described
in Table 1; rows below that for the exponential case are defined later.

Table 2. Large-sample variance $V_T$ of the quantile transformation estimate for
gamma distributions with index r, including exponential (r = 1)

r \ p    0.005    0.01     0.05     0.10     0.20
1        0.589    0.582    1.012    1.894    6.271
2        1.704    1.670    2.968    6.148    19.916
3        2.841    2.718    4.507    8.982    36.069
4        3.977    3.748    5.936    11.442   43.420
It is interesting to see that rather extreme order statistics give the best precision, $p = 0.01$
being close to optimal. This is a pity, in a sense, because rather large samples would be
required for anyone to have faith in the results! Also the method is consequently sensitive
to outliers.
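The large-sample variance for the exponential case can be evaluated from the quantities defined in § 2.2. The sketch below is an illustration under stated assumptions, not the author's computation: the function names are hypothetical, the rate is set to one, and the variance is taken as $\lambda^2$ times the limiting variance of the numerator of (2.9) divided by the squared denominator, with $h_s^{-1} = \xi_s f(\xi_s)$.

```python
import math


def lambda_p(p, tol=1e-13):
    """Nonzero root of (4.1) by bisection on (0, 1), for 0 < p < 1/2."""
    a = -math.log(1.0 - p) / math.log(2.0)
    b = -math.log(p) / math.log(2.0)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a ** mid + b ** mid < 2.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def v_t_exponential(p):
    """Limiting variance of sqrt(n) (T - lambda_p) for unit-rate exponential data."""
    lam = lambda_p(p)
    q = 1.0 - p

    def xi(s):                      # quantile of the unit exponential
        return -math.log(1.0 - s)

    def h(s):                       # h_s = 1 / {xi_s f(xi_s)}, with f(xi_s) = 1 - s
        return 1.0 / (xi(s) * (1.0 - s))

    def log_alpha(s):               # log(xi_s / xi_0.5)
        return math.log(xi(s) / xi(0.5))

    ap, aq = math.exp(lam * log_alpha(p)), math.exp(lam * log_alpha(q))  # alpha_s ** lambda
    hp, hq, hm = h(p), h(q), h(0.5)
    numerator = (hm ** 2
                 + p * q * ((ap * hp) ** 2 + (aq * hq) ** 2)
                 - 2.0 * p * (ap * hp + aq * hq) * hm
                 + 2.0 * p * p * ap * aq * hp * hq)
    denominator = ap * log_alpha(p) + aq * log_alpha(q)
    return lam ** 2 * numerator / denominator ** 2
```

At `p = 0.10` this agrees closely with the corresponding entry in the first row of Table 2.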
The corresponding results for the normal-theory maximum likelihood estimate $\hat\lambda_N$ are
easily derived using the efficient score formulae in (3.1) together with the identity
$$\int_0^\infty (\log y)^r y^{s-1}e^{-y}\,dy = \frac{d^r\Gamma(s)}{ds^r} \qquad (s > 0),$$
which is related to the polygamma functions. The maximum likelihood estimate $\hat\lambda_N$
converges to 0.265, to be compared with Draper & Cox's approximation 0.268, and
FtN?P-ANrF(l+ AN), VN + 2AN).+
ftb-?p-2AVP(1
The variance $V_N$ of the limiting normal distribution of $\sqrt{n}(\hat\lambda_N - \lambda_N)$ is 0.314. Note from Table 1
that $\lambda = 0.265$ gives a relatively poor degree of symmetry.
The above calculations for the exponential case are easily extended to the general gamma
density $f(y) = y^{r-1}e^{-y}/\Gamma(r)$,
and we have added such calculations in Tables 2 and 3 for $r = 2$, 3 and 4. The correct
transformation power $\lambda_p$ for $Y$ is quite stable at about 0.32 for these cases, i.e. close to the
conventional cube root transformation. As $r$ increases, the transformed variable $Z_\lambda$ is closer to
symmetry and normality.

Table 3. Large-sample limit $\lambda_N$ and variance $V_N$ of the normal-theory maximum likelihood
estimate for gamma distribution of index r, including exponential (r = 1)

r      1        2        3        4
λ_N    0.2654   0.301    0.312    0.318
V_N    0.314    0.914    1.567    2.229
4.2. Examples with $\lambda = 0$
For the special case $\lambda = 0$ equation (2.12) gives a simple expression for $V_T$, the large-sample
variance of $\sqrt{n}(T - \lambda_p)$. A corresponding result for the normal-theory maximum likelihood
estimate is quite easily derived from (3.3). Lengthy algebra gives
$$V_N = \frac{36\{v^2\mu_{(6)} - 6v^3\mu_{(4)} - 2v\mu_{(3)}\mu_{(5)} + \mu_{(3)}^2\mu_{(4)} + 7v^2\mu_{(3)}^2 + 9v^5\}}{\{7v\mu_{(4)} - 6\mu_{(3)}^2 - 3v^3\}^2}, \tag{4.4}$$
where $\mu_{(r)}$ is the $r$th central moment of $Z_0 = \log Y$. We now look at two specific examples.
When $\log Y$ has the $N(\mu, v)$ density, (2.12) and (4.4) simplify to
$$V_T = x_p^{-4}v^{-1}(\phi_{0.5}^{-2} + 2p\phi_p^{-2} - 4p\phi_p^{-1}\phi_{0.5}^{-1}), \tag{4.5}$$
where $\Phi(x_s) = s$ and $\phi_s = \phi(x_s)$, and $V_N = \tfrac23 v^{-1}$. Some numerical values of $V_T$ are given in
Table 4. The smallest value of $V_T$ occurs at $p = 0.01$, at which point $V_N/V_T \simeq 2/\pi$, rather
interestingly.
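Formula (4.5) involves only the standard normal quantile and density, so it is easy to check numerically. The following is a small sketch, with hypothetical function names and with $v$ defaulting to one; it relies on `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def v_t_lognormal(p, v=1.0):
    """V_T from (4.5) when log Y is N(mu, v); p is the tail probability, 0 < p < 1/2."""
    x_p = _STD_NORMAL.inv_cdf(p)             # Phi(x_p) = p
    phi_p = _STD_NORMAL.pdf(x_p)
    phi_m = _STD_NORMAL.pdf(0.0)             # density at the median
    bracket = phi_m ** -2 + 2.0 * p * phi_p ** -2 - 4.0 * p / (phi_p * phi_m)
    return bracket / (x_p ** 4 * v)


def v_n_lognormal(v=1.0):
    """Normal-theory value V_N = (2/3) / v quoted alongside (4.5)."""
    return 2.0 / (3.0 * v)


if __name__ == "__main__":
    for p in (0.005, 0.01, 0.02, 0.05, 0.10):
        print(p, round(v_t_lognormal(p), 3))
    print(v_n_lognormal() / v_t_lognormal(0.01), 2.0 / math.pi)
```

The last line of the usage example prints the ratio $V_N/V_T$ at $p = 0.01$ next to $2/\pi$ for comparison.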
Table 4. Large-sample variances $V_T$ for quantile transformation estimate in
lognormal and log double exponential cases

                                                                  Normal-theory
                                                                  maximum
p                            0.005   0.01    0.02    0.05   0.10  likelihood
Normal: vV_T                 1.15    1.04    1.08    1.48   2.62  0.667
Double exponential: ρ⁻²V_T   0.881   0.837   0.894   1.28   2.39  1.491

Note that the variances of $\log Y$ are respectively $v$ and $2\rho^{-2}$.
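The effect of unknown $\lambda$ on estimation of $\mu$ and $v$ is seen from the complete covariance
matrix
$$\Sigma_N = n\,\mathrm{var}(\hat\theta_N) = \begin{pmatrix} \tfrac23 v^{-1} & \tfrac13(v + \mu^2)/v & \tfrac43\mu \\ \tfrac13(v + \mu^2)/v & v + \tfrac16(v + \mu^2)^2/v & \tfrac23\mu(v + \mu^2) \\ \tfrac43\mu & \tfrac23\mu(v + \mu^2) & 2v^2 + \tfrac83\mu^2 v \end{pmatrix}.$$
The potentially heavy increase in $\mathrm{var}(\hat\mu_N)$ due to not knowing $\lambda$ is clearly worth investigating
in more generality.
If $\log Y$ has a distribution close to the normal, so that the standardized moments
$\gamma_1 = \mu_{(3)}/v^{3/2}$, $\gamma_2 = \mu_{(4)}/v^2 - 3$, etc. are of successively lower order in some notional parameter,
we can approximate $V_N$ from (4.4) by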
VN= 3^11-9VY 16)2)
In a sense this correspondsto (9) of Draper & Cox, theirfactor02 being incorrect.
A corresponding approximation for $V_T$ is easily constructed from (2.12) using a Fisher-Cornish
expansion for $\kappa_s$ and an Edgeworth expansion for $g(z)$. The result is somewhat
complicated and will not be given here.
A distribution characterizing much longer tails than the normal is the double exponential,
with density $\tfrac12\rho\exp(-\rho|z|)$. If $\log Y$ has this distribution, it is easy to show that (2.12)
becomes
$$V_T = \rho^2(\log 2p)^{-4}(2p^{-1} - 4) \qquad (0 < p < \tfrac12),$$
with values as in Table 4. The corresponding value of $V_N$ calculated from (4.4) is $1.491\rho^2$,
so that $T$ is superior to $\hat\lambda_N$ in large samples for $p < 0.06$. In terms of the variance $v$ of $Z$, the
smallest value of $V_T$ here is $1.674v^{-1}$, compared with $1.044v^{-1}$ in the log normal case.
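The closed form for the double exponential case is a one-line computation. The sketch below is illustrative only (the function name is hypothetical, and $\rho$ defaults to one); it evaluates $\rho^2(\log 2p)^{-4}(2p^{-1} - 4)$ on the grid of tail probabilities used in Table 4.

```python
import math


def v_t_double_exponential(p, rho=1.0):
    """V_T = rho**2 (log 2p)**-4 (2/p - 4) when log Y is double exponential, 0 < p < 1/2."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 1/2")
    return rho ** 2 * (2.0 / p - 4.0) / math.log(2.0 * p) ** 4


if __name__ == "__main__":
    for p in (0.005, 0.01, 0.02, 0.05, 0.10):
        # Second column is in units of rho**2; third is in units of 1/v, using v = 2 / rho**2.
        print(p, round(v_t_double_exponential(p), 3), round(2.0 * v_t_double_exponential(p), 3))
```

Since the variance of $Z$ is $v = 2\rho^{-2}$, doubling the value at unit $\rho$ expresses $V_T$ in units of $v^{-1}$.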
5. GENERALIZATION OF THE QUICK ESTIMATE
5.1. The generalization
There are several ways in which one could generalize the estimator $T$ defined by (2.2).
First, we could solve (2.2) for several values of $p$ and average the resulting estimates of $\lambda$.
Secondly, we could, as it were, average the equation (2.2) for several $p$ values and then solve
for the estimator. Other possible methods exist, but this latter method is the one we
examine here.
We propose, then, to use the equation (2.2) for several values of $p$, say $p_1 < \ldots < p_m < \tfrac12$,
and in fact to form the combined equation
$$\sum_{j=1}^m c_j(X_{r_j}^T + X_{n-r_j+1}^T) = 2\sum_{j=1}^m c_j\tilde X^T, \tag{5.1}$$
where $r_j = [np_j]$; the solution $T = 0$ is chosen only if
$$\Sigma c_j\log(X_{r_j}X_{n-r_j+1}) = 2\Sigma c_j\log\tilde X, \tag{5.2}$$
corresponding to (2.3). The coefficients $c_1, \ldots, c_m$ are arbitrary weights to be chosen. A more
convenient form of (5.1) is
$$\Sigma c_j\{(X_{r_j}/\tilde X)^T + (X_{n-r_j+1}/\tilde X)^T\} = 2\Sigma c_j. \tag{5.3}$$
In practice it would be sensible to choose all $c_j$'s positive, particularly if a monotone
transformation of $Y$ is symmetrically distributed, since otherwise asymmetry of quantile pairs
tends to cancel out in the summation.
The existence of a unique nonzero solution to (5.3) for positive $c_j$ is proved by the following
lemma, easily proved by convexity.
LEMMA. For arbitrary positive constants $a_1, \ldots, a_m$, $b_1, \ldots, b_m$ and $c_1, \ldots, c_m$, the equation
$$\Sigma c_j(a_j^t + b_j^t) = 2\Sigma c_j \tag{5.4}$$
has a single nonzero real solution unless $\Sigma c_j\log(a_j b_j) = 0$, in which case $t = 0$ is the only
solution.
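The convexity argument also suggests a simple way to locate the nonzero root of (5.4). The sketch below is an assumption-laden illustration, not the paper's procedure: the name `solve_weighted` is hypothetical, and the code restricts attention to the case arising in (5.3), where each $a_j < 1 < b_j$ because the $a_j$ and $b_j$ are lower and upper order statistics divided by the median. The sign of $\Sigma c_j\log(a_j b_j)$ is the slope of the left-hand side at $t = 0$ and tells us on which side of zero to search.

```python
import math


def solve_weighted(a, b, c, tol=1e-12):
    """Root t of sum_j c_j (a_j**t + b_j**t) = 2 sum_j c_j, assuming 0 < a_j < 1 < b_j, c_j > 0.

    Returns 0.0 when sum_j c_j log(a_j b_j) vanishes, the analogue of (5.2).
    """
    if not (len(a) == len(b) == len(c) > 0):
        raise ValueError("a, b and c must be non-empty and of equal length")
    if any(not (0.0 < aj < 1.0 < bj) or cj <= 0.0 for aj, bj, cj in zip(a, b, c)):
        raise ValueError("need 0 < a_j < 1 < b_j and c_j > 0")
    slope = sum(cj * (math.log(aj) + math.log(bj)) for aj, bj, cj in zip(a, b, c))
    if abs(slope) < 1e-14:
        return 0.0
    sign = 1.0 if slope < 0.0 else -1.0      # side of zero on which the nonzero root lies

    def g(u):                                # equation residual at t = sign * u, u > 0
        t = sign * u
        return sum(cj * (aj ** t + bj ** t - 2.0) for aj, bj, cj in zip(a, b, c))

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:                       # residual is negative between 0 and the root
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):      # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)
```

With a single pair and unit weight this reduces to the quick estimate of § 2; with several pairs the root lies between the smallest and largest of the single-pair solutions.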
Although the general equation (5.2) is interesting theoretically for any value of $m$, in
practice one might well restrict attention to $m = 2$ or 3 and use equal weights $c_j$. Potentially
the use of $m > 1$ could accomplish two things: (i) increased precision of the transformation
estimate, (ii) an averaging out of the asymmetry in $Z_T$ when no $Z_\lambda$ has a symmetric
distribution.
5.2. Large-sample properties
The groundwork for establishing large-sample properties of $T$ has been laid in § 2.2. Here
we outline the main steps and results.
By continuity of (5.2) and consistency of the order statistics, $T$ is consistent for that
value $\lambda_p$ of $\lambda$ satisfying
$$\Sigma c_j(\xi_{p_j}^\lambda + \xi_{q_j}^\lambda) = 2\xi_{0.5}^\lambda\,\Sigma c_j,$$
which would be common to all vectors $p$ if $Z_\lambda$ is symmetrically distributed. By the same
expansion route used in § 2.2 we find that for all $\lambda$
$$\sqrt{n}(T - \lambda)/\lambda = \frac{2W_{0.5}\,\Sigma c - \Sigma c(\alpha_p^\lambda W_p + \alpha_q^\lambda W_q)}{\Sigma c(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q)}.$$
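Here and below the suffix $j$ on $c_j$, $p_j$ and $q_j$ has been dropped for typographical convenience.
The resulting limiting normal distribution for $T$ is again obtained from the limiting joint
normal distribution of order statistics, and, using (2.5), the variance $V_T(\lambda, p)$ is
$$V_T(\lambda, p) = \lambda^2\Big[h_{0.5}^2(\Sigma c)^2 - 2h_{0.5}(\Sigma c)\,\Sigma cp(\alpha_p^\lambda h_p + \alpha_q^\lambda h_q) + \Sigma c^2\{pq(\alpha_p^{2\lambda}h_p^2 + \alpha_q^{2\lambda}h_q^2) + 2p^2\alpha_p^\lambda\alpha_q^\lambda h_p h_q\}$$
$$\qquad + 2\sum_{p < p'} cc'\{pq'(\alpha_p^\lambda\alpha_{p'}^\lambda h_p h_{p'} + \alpha_q^\lambda\alpha_{q'}^\lambda h_q h_{q'}) + pp'(\alpha_p^\lambda\alpha_{q'}^\lambda h_p h_{q'} + \alpha_q^\lambda\alpha_{p'}^\lambda h_q h_{p'})\}\Big]$$
$$\qquad \times \{\Sigma c(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q)\}^{-2}. \tag{5.6}$$
The notation throughout is that of § 2.2.
A corresponding expression for $V_T$ in terms of the density $g(z)$ can be obtained from (5.6)
in the same way that (2.11) was derived from (2.10). This simply amounts to substituting
$g_s^{-1}$ for $\alpha_s^\lambda h_s$ in the numerator and $(1 + \lambda\kappa_s)\log(1 + \lambda\kappa_s) - (1 + \lambda\kappa_{0.5})\log(1 + \lambda\kappa_{0.5})$ for $\alpha_s^\lambda\log\alpha_s$
in the denominator of (5.6), with the leading factor $\lambda^2$ becoming $\lambda^4$, where we recall that
$G(\kappa_s) = s$ and $g_s = g(\kappa_s)$.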
The result (5.6), as we have given it, is for finite $m$, and would apply when $m$ is small relative
to $n$. If all the order statistics $X_i$ are used, so that $m = [\tfrac12 n]$ in (5.1), a corresponding
asymptotic result can be obtained for a smooth weight function $c(x)$ defined by $c_j = c\{j/(n+1)\}$.
In terms of $g(z)$ the result is
A-4V (I)}2 - 2~fr(1)
J~{f +c
Al(c(c) A A3(c)
A4VT(A,C)= {2 2((C) 2(C)+A
where $\psi(x) = 1/g\{G^{-1}(x)\}$ and
$$A_1(c) = \int c(x)\,x\{\psi(x) + \psi(1 - x)\}\,dx, \qquad A_2(c) = \int x(1 - x)c^2(x)\{\psi^2(x) + \psi^2(1 - x)\}\,dx,$$
$$A_3(c) = \iint_{x < x'} c(x)c(x')[x(1 - x')\{\psi(x)\psi(x') + \psi(1 - x)\psi(1 - x')\} + xx'\{\psi(x)\psi(1 - x') + \psi(1 - x)\psi(x')\}]\,dx\,dx',$$
$$B(c) = \int c(x)[\{1 + \lambda G^{-1}(x)\}\log\{1 + \lambda G^{-1}(x)\} + \{1 + \lambda G^{-1}(1 - x)\}\log\{1 + \lambda G^{-1}(1 - x)\}]\,dx - 2\{1 + \lambda G^{-1}(\tfrac12)\}\log\{1 + \lambda G^{-1}(\tfrac12)\}.$$
A discussion of conditions required for this result will not be given here; see Stigler (1974).
6. AN EXAMPLE
After introducing the generalization of $T$ in § 5, we need to assess what is gained in
precision at the expense of complication. From calculations we have done it would seem
that in fact little is to be gained. Here we give only one example, the case where $\log Y$ is
normally distributed.
When $\lambda = 0$ and $Z = \log Y$ is $N(\mu, v)$, we saw in § 4.2 that $T$ has minimum large-sample
variance at $p = 0.01$, where $V_T = 1.04v^{-1}$. Using a simplified form of (5.6) corresponding
to (2.12), we obtain the results given in Table 5. The right-hand column of the table gives
values of $vV_T$, and the other entries indicate values of $c_j$; the $c_j$'s sum to one in each case.
Some general features are apparent from this small set of results. Most striking is the fact
that if all values of $p$ exceed 0.05, then $m = 1$, that is the use of one pair of order statistics,
cannot be markedly improved on by $m = 2$. Use of $m = 3$ with one value of $p$ equal to 0.01
can give up to 15% improvement in precision, which is a little better than using $m = 2$.
With $p = 0.05$, 0.10, 0.15 and 0.20 and each $c_j$ equal to $\tfrac14$, $vV_T = 2.23$. We conclude that it is
not possible to escape the extreme tails ($p < 0.02$) and keep precision, unless perhaps $m$ is
considerably larger than 3.
Table 5. Large-sample variance $V_T$ and coefficients $c_j$ of the generalized version of $T$
when $\log Y$ is $N(\mu, v)$
p    0.01    0.02    0.05    0.10    vV_T
1 0 0 0 104
a 2 0 0 0*92
o 1 0 0 1*08
0 0 0.91
o o 1 0 148
o 12
2 1P04
0 0 23 1 5
t54
o o a1 1P64
o 0 0 1 2*62
1 1
3
0
o *8
0-88
3 3
o 3
1
31
t2
P1
39
7. FURTHER DISCUSSION
Use of power transformations such as (1.1) occurs most frequently with more complicated
linear models than the single mean case discussed in this paper. The ability to generalize the
estimator $T$ defined by (5.1) depends to some extent on whether or not the linear model
design includes replication.
Suppose that $Y_{ij}$ $(j = 1, \ldots, r_i)$ are replicates of the $i$th cell of a linear model, meaning that
for some $\lambda$
$$Z_{\lambda,ij} = \mu_i + e_{ij}. \tag{7.1}$$
We can generalize (2.2) and (2.3), or (5.1) and (5.2), as follows. Let $\tilde Y_i$ be the median of
variables in the $i$th cell, and define
$$A_{ij} = Y_{ij}/\tilde Y_i \qquad (j = 1, \ldots, r_i;\ i = 1, \ldots, I). \tag{7.2}$$
Then the ordered values of $A_{ij}$ replace the ratios $X_r/\tilde X$ in (5.1) and (5.2). The estimating
equation so defined is not a trivial generalization, although the consistency of $T$ for fixed $I$
and large $n = \Sigma r_i$ is still assured. The problem is that the standardization in (7.2) is
nonhomogeneous, the more so if the variability of $\mu_i$ is large relative to that of the $e_{ij}$ in (7.1).
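The standardization in (7.2) amounts to dividing each replicate by its own cell median and pooling the results. A minimal sketch follows; the function name `standardized_ratios` and the nested-list layout of the cells are assumptions made for illustration.

```python
from statistics import median


def standardized_ratios(cells):
    """Pooled, ordered ratios Y_ij / median_i, where each element of `cells`
    is the list of positive replicates observed in one cell."""
    ratios = []
    for cell in cells:
        cell_median = median(cell)
        ratios.extend(y / cell_median for y in cell)
    return sorted(ratios)


if __name__ == "__main__":
    # Two cells whose means differ by a factor of ten but whose ratios coincide.
    print(standardized_ratios([[1, 2, 4], [10, 20, 40]]))
```

The smallest and largest pooled ratios then play the parts of the lower and upper order statistics divided by the median in the estimating equations.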
Assuming that the $e_{ij}$ are homogeneous errors, it is clear that if
$$\mathrm{var}(Y_{ij}) \propto \{E(Y_{ij})\}^b,$$
then cells with larger means will dominate the estimating equation, and hence $T$, if $b > 2$.
For example, if $\lambda = 1$ in (7.1) then $b = 0$ and cells with small means dominate $T$; if $\lambda = 0$
then $b = 2$ and no cell dominates $T$.
While we have not examined this problem in any detail, this does seem to be a suitable
situation for use of the generalization (5.1) with $m = [\tfrac12 n]$ and $c_j$ = constant. This has the
disadvantage of requiring a large amount of computation.
An example that fits into this discussion is the first numerical example of Box & Cox
(1964), which is a fourfold replicate of a 3 x 4 design. The normal-theory likelihood suggests
that $\lambda = -1$, although one would not discount values $-1 < \lambda < 0$. The three outmost pairs
of ordered $A_{ij}$'s each yield the estimate $T = 0$ by the method of § 2. Fitting the additive
two-way linear model by least squares with $\lambda = -1$ and $\lambda = 0$ gives negligible interactions.
Normal plots of residuals reveal that $\lambda = -1$ gives a better fit to normality, although the
closeness to symmetry is about the same for both $\lambda = -1$ and $\lambda = 0$; in each case there are
two or three moderately large outliers, not from the same data points. There is some evidence
that extreme $A_{ij}$'s are associated with large cell means, which suggests that $\lambda$ is somewhat
negative. Strangely, use of less extreme $A_{ij}$'s indicates $\lambda$ to be around 2 although there is
no consistent value for any particular pair.
This discussion is intended to suggest that there are difficulties with the order statistic
method, particularly in connexion with complex models. When one is able to use the simple
estimating equation (2.2), either in the original form or with the $A_{ij}$ defined in (7.2), the
estimate $T$ should be reasonably constant over the outermost pairs of order statistics in
order to be convincing. It would be helpful to understand more clearly the problem of
heterogeneity in the $A_{ij}$'s, particularly through experience with applications.
One must conclude, however, that the need to use fairly extreme order statistics in order
to achieve precise estimates of $\lambda$ makes the quick method of § 2 unappealing with moderate
amounts of data containing genuine outliers. Data transformation in the presence of
outliers is a risky business.
REFERENCES
ANDREWS, D. F. (1971). A note on the selection of data transformations. Biometrika 58, 249-54.
ATKINSON, A. C. (1973). Testing transformations to normality. J. R. Statist. Soc. B 35, 473-9.
BOX, G. E. P. & COX, D. R. (1964). An analysis of transformations (with discussion). J. R. Statist. Soc.
B 26, 211-52.
COX, D. R. & HINKLEY, D. V. (1974). Theoretical Statistics. London: Chapman & Hall.
DRAPER, N. R. & COX, D. R. (1969). On distributions and their transformation to normality. J. R.
Statist. Soc. B 31, 472-6.
HUBER, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1,
799-821.
STIGLER, S. M. (1974). Linear functions of order statistics with smooth weight functions. Ann. Statist.
2, 676-93.
[Received August 1974. Revised September 1974]