
Biometrika Trust

On Power Transformations to Symmetry

Author(s): David V. Hinkley
Source: Biometrika, Vol. 62, No. 1 (Apr., 1975), pp. 101-111
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2334491 .
Accessed: 28/09/2012 07:41

Biometrika (1975), 62, 1, p. 101
Printed in Great Britain

On power transformations to symmetry
BY DAVID V. HINKLEY
School of Statistics, University of Minnesota, Twin Cities

SUMMARY
Transformations to symmetry, or approximate symmetry, are considered. In particular, properties of simple estimates based on equitailed order statistics are derived. Examples include transformation of exponential and gamma random variables. Errors in previous work are discovered and partially corrected.

Some key words: Maximum likelihood; Order statistics; Robustness; Symmetry; Transformation.

1. INTRODUCTION

Box & Cox (1964) discussed estimation of data transformations which would yield variables satisfying a normal-error additive linear model. In particular, a family of power transformations was considered, which in its simple form consists of transformations $T_\lambda: y \to z_\lambda$ defined by

$$ z_\lambda = \begin{cases} (y^\lambda - 1)/\lambda & (\lambda \neq 0), \\ \log y & (\lambda = 0). \end{cases} \tag{1.1} $$

Here $y$ might be an observable quantity or a residual from a fitted model. A conventional assumption underlying the use of the transformation is that, for some $\lambda$, $Z_\lambda$ has a normal distribution.
One method of estimating $\lambda$ discussed by Box & Cox is that of maximum likelihood. This was further explored by Draper & Cox (1969), who derived expressions for the precision of the maximum likelihood estimate. Other aspects of normal-theory estimation and inference about $\lambda$ in (1.1) have been investigated by Andrews (1971) and Atkinson (1973).
It is frequently assumed in connexion with (1.1) that $y$ is positive; if $y$ could be negative many values of $\lambda$ would be clearly inadmissible. Note, however, that if $y$ is positive then $Z_\lambda$ can have a normal distribution only if $\lambda$ is zero or if $\lambda^{-1}$ is an even integer. Nevertheless, one can often obtain a transformation for which $Z_\lambda$, although bounded below, is very nearly normal, or close enough to normal for practical purposes.
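A minimal Python sketch of the family (1.1), for readers who want to experiment with it; the function name `box_cox` is our own label, and the guard against nonpositive $y$ simply reflects the positivity assumption just discussed.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Power transformation (1.1): (y**lam - 1)/lam if lam != 0, log y if lam == 0."""
    if y <= 0:
        raise ValueError("the family (1.1) is only used here for positive y")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


print(box_cox(4.0, 0.5))      # (2 - 1)/0.5 -> 2.0
print(box_cox(math.e, 0.0))   # log e -> 1.0
```

For $\lambda > 0$ every transformed value exceeds $-1/\lambda$, which is the lower bound on $Z_\lambda$ mentioned above, and the $\lambda = 0$ branch is the limit of $(y^\lambda - 1)/\lambda$ as $\lambda \to 0$.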
There are three reservations that one might have about fitting (1.1) with a normal distribution assumption by maximum likelihood. First, the maximum likelihood method involves a great deal of calculation even in the normal case. Secondly, as Andrews (1971) has shown, the maximum likelihood method can be very sensitive to outliers; this reservation is actually unjustified in the sense that all reasonably efficient methods depend critically on the extreme observations. Thirdly, if we are aiming to use a linear model for the transformed data we may not want to make a normality assumption at any stage for fear of nonrobustness. We may be planning to use now popular robust methods of analysis (Huber, 1973), and the assumption of normality in connexion with (1.1) would seem contradictory.
In this paper we discuss simple and not-so-simple methods of estimating $\lambda$ to give approximate symmetry for the distribution of $Z_\lambda$. The methods are based on symmetrizing order statistics about the median.
Like Draper & Cox (1969), we are not concerned here with the requirement of an additive linear model for transformed data. We do assume that the $Y$'s have a common distribution with unknown location and scale.
In § 2 we discuss a very simple order statistic estimate of $\lambda$ and derive its large-sample properties. Corresponding results for the normal-theory maximum likelihood estimate are outlined in § 3, which includes corrections to the results of Draper & Cox (1969). Section 4 then gives several illustrations of the results, for gamma, lognormal and other distributions. Generalizations of the simple estimate of § 2 are discussed in § 5, with an example given in § 6.

2. A QUICK ESTIMATE

2.1. Definition of the estimate
Suppose that $Y_1, \ldots, Y_n$ are continuous nonnegative independent and identically distributed random variables; the restriction to positive variables is necessary if the family (1.1) is to be sensible. If there exists a $\lambda$ such that $Z_\lambda$ in (1.1) has a symmetric distribution, then the $p$ and $1 - p$ quantiles will be symmetrically placed about the median. This symmetry of population quantiles for $Z_\lambda$ suggests a simple method for estimating $\lambda$, namely that of symmetrizing the sample quantiles corresponding to tail probabilities $p$ and $1 - p$ for some $p$.
As suggested in § 1, $Z_\lambda$ cannot have exact symmetry for most $\lambda$, but we assume that a value of $\lambda$ exists which 'nearly' gives symmetry. More will be said about this later.
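Let $Y_1, \ldots, Y_n$ have the common distribution function $F(y)$, with quantiles $\xi_s$ defined by $F(\xi_s) = s$ $(0 < s < 1)$. Then we seek that transformation in the family (1.1) for which

$$ \xi_{0.5}^\lambda - \xi_p^\lambda = \xi_{1-p}^\lambda - \xi_{0.5}^\lambda. \tag{2.1} $$

If we denote the ordered values of $Y_1, \ldots, Y_n$ by $X_1 < \ldots < X_n$, and define the median $\tilde X$ in the usual way, then the sample analogue of (2.1) is, unless $X_r = \tilde X = X_{n-r+1}$,

$$ \tilde X^\lambda - X_r^\lambda = X_{n-r+1}^\lambda - \tilde X^\lambda \quad (r = [np]), \tag{2.2} $$

which is an estimating equation for $\lambda$. There are only two solutions to (2.2), one of them being $\lambda = 0$. However, by comparison with (1.1) we exclude $\lambda = 0$ unless

$$ \tilde X / X_r = X_{n-r+1} / \tilde X, \tag{2.3} $$

which is the condition for sample quantiles of $\log Y$ to be symmetric about the median.

For computational purposes it is easier to rewrite (2.2) in the form

$$ (X_r/\tilde X)^\lambda + (X_{n-r+1}/\tilde X)^\lambda = 2. \tag{2.4} $$

The existence of one nonzero solution to (2.4) is easily proved directly, or as a special case of the lemma in § 5. The nonzero solution $T$ of (2.4) is positive if and only if $X_r X_{n-r+1} < \tilde X^2$, and is otherwise negative if (2.3) is not satisfied: the left-hand side of (2.4) is convex in $\lambda$, equals 2 at $\lambda = 0$ and has slope $\log(X_r X_{n-r+1}/\tilde X^2)$ there, so the second root lies on the side where that slope is negative. This is obviously sensible on physical grounds. Moreover it is easy to verify that

$$ |T| > \frac{\left|\log\log(X_{n-r+1}/\tilde X) - \log\log(\tilde X/X_r)\right|}{\left|\log(X_r/X_{n-r+1})\right|}, $$

which may be useful in solving (2.4).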
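The root of (2.4) is easy to find numerically. A minimal sketch follows, assuming plain bisection (our choice; the paper does not prescribe an algorithm): the sign of the nonzero root is read off from $X_r X_{n-r+1}$ versus $\tilde X^2$, the lower bound on $|T|$ above starts the bracket, and the helper names `solve_pair` and `quick_estimate` are hypothetical.

```python
import math
import statistics
from typing import Sequence


def solve_pair(a: float, b: float) -> float:
    """Nonzero root t of a**t + b**t = 2 (equation (2.4)), for 0 < a < 1 < b."""
    if not 0.0 < a < 1.0 < b:
        raise ValueError("need X_r < median < X_{n-r+1}")
    slope0 = math.log(a * b)                 # derivative of a**t + b**t at t = 0
    if slope0 == 0.0:
        return 0.0                           # condition (2.3) holds: T = 0
    sign = 1.0 if slope0 < 0.0 else -1.0     # T > 0 iff X_r X_{n-r+1} < median**2

    def g(u: float) -> float:                # convex in u, negative on (0, |T|)
        return a ** (sign * u) + b ** (sign * u) - 2.0

    u, w = -math.log(a), math.log(b)
    lo = abs(math.log(w) - math.log(u)) / (u + w)   # lower bound on |T| from the text
    hi = max(2.0 * lo, 1.0)
    while g(hi) <= 0.0:                      # expand the bracket until g changes sign
        hi *= 2.0
    for _ in range(200):                     # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return sign * 0.5 * (lo + hi)


def quick_estimate(y: Sequence[float], p: float) -> float:
    """Quick estimate T of section 2.1 from one pair of order statistics."""
    x = sorted(y)
    n = len(x)
    r = int(n * p)                           # r = [np]
    if r < 1:
        raise ValueError("n * p must be at least 1")
    med = statistics.median(x)
    return solve_pair(x[r - 1] / med, x[n - r] / med)   # X_r/median, X_{n-r+1}/median


# Square roots 1 and 3 straddle sqrt(4) = 2 symmetrically, so T should be close to 0.5.
print(quick_estimate([1, 2, 4, 6, 9], 0.2))
```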
The estimator defined by (2.3) and (2.4) is somewhat naive. One would expect that in order to obtain a reasonably efficient estimator one would have to combine the equations (2.2) corresponding to several $p$ values in some sensible way. This we do in § 5. However, the simplicity of (2.2) is appealing, and there is some flexibility in our ability to choose $p$. Also the reasonableness of the basic idea and some generally useful properties of the estimator are most easily discussed in the simple case.

2.2. Properties of the quick estimate
We have already seen that $T$, the nonzero estimator satisfying (2.4), is unique. We now show that as $n \to \infty$ the estimator $T$ defined by (2.4) has a limiting normal distribution. To do this we use the joint asymptotic normality of the order statistics $X_{r_1}, \ldots, X_{r_m}$ for $r_i = [np_i]$, $0 < p_1 < \ldots < p_m < 1$. Specifically, if the original distribution function $F(y)$ has density $f(y)$ and quantiles $\xi_s = F^{-1}(s)$, then the vector $(X_{r_1}, \ldots, X_{r_m})$ has a limiting multivariate normal distribution with mean $(\xi_{p_1}, \ldots, \xi_{p_m})$ and covariance matrix determined by

$$ n\,\mathrm{cov}(X_{r_i}, X_{r_j}) = \frac{p_i(1 - p_j)}{f(\xi_{p_i})\,f(\xi_{p_j})} \quad (i \le j). \tag{2.5} $$

The first property of the estimator $T$ that we need is consistency, which strictly means that $T = \lambda + o_p(1)$, where $\lambda$ is the solution of (2.1); if there is a transformation in the class (1.1) giving exact symmetry, then the solution to (2.1) gives it, whatever $p$. Actually consistency is easy to verify from continuity of the left-hand side of (2.4) and consistency of $X_r$, $X_{n-r+1}$ and $\tilde X$ for the respective quantiles. Note that $\tilde X$ is asymptotically equivalent to $X_{[n/2]}$, which fact we shall use.
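Now let us suppose $\lambda \neq 0$, and write, with $r = [np]$ and $p + q = 1$,

$$ X_r = \xi_p(1 + n^{-1/2}W_p), \quad X_{n-r+1} = \xi_q(1 + n^{-1/2}W_q), \quad \tilde X = \xi_{0.5}(1 + n^{-1/2}W_{0.5}). \tag{2.6} $$

Then the estimating equation (2.4) can be written

$$ \big[\alpha_p\{1 + n^{-1/2}(W_p - W_{0.5}) + o_p(n^{-1/2})\}\big]^T + \big[\alpha_q\{1 + n^{-1/2}(W_q - W_{0.5}) + o_p(n^{-1/2})\}\big]^T = 2, \tag{2.7} $$

where $\alpha_s = \xi_s/\xi_{0.5}$ and the $\alpha$'s satisfy

$$ \alpha_p^\lambda + \alpha_q^\lambda = 2 \tag{2.8} $$

by definition. Since $T$ is consistent, expansion of (2.7) about $T = \lambda$ gives, using (2.8),

$$ (T - \lambda)(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q) + \lambda n^{-1/2}\{\alpha_p^\lambda(W_p - W_{0.5}) + \alpha_q^\lambda(W_q - W_{0.5})\} + o_p(T - \lambda) + o_p(n^{-1/2}) = 0. $$

That is, to first order,

$$ \frac{\sqrt n\,(T - \lambda)}{\lambda} = -\frac{\alpha_p^\lambda(W_p - W_{0.5}) + \alpha_q^\lambda(W_q - W_{0.5})}{\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q}. \tag{2.9} $$

We then use the limiting joint normality of the $W$'s, whose covariance matrix is determined by (2.5) and the transformation (2.6), to obtain the limiting normal distribution of $T$.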
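If we define $h_s^{-1} = \xi_s f(\xi_s)$, the variance of the limiting normal distribution of $\sqrt n(T - \lambda)$ is found to be

$$ V_T(\lambda, p) = \frac{\lambda^2\{h_{0.5}^2 + pq(\alpha_p^{2\lambda}h_p^2 + \alpha_q^{2\lambda}h_q^2) - 2p(\alpha_p^\lambda h_p + \alpha_q^\lambda h_q)h_{0.5} + 2p^2\alpha_p^\lambda\alpha_q^\lambda h_p h_q\}}{(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q)^2}. \tag{2.10} $$

An alternative expression, in terms of the probability density function $g(z)$ for the transformed variable $Z_\lambda$, is

$$ V_T(\lambda, p) = \frac{\lambda^4\{g_{0.5}^{-2} + pq(g_p^{-2} + g_q^{-2}) - 2p(g_p^{-1}g_{0.5}^{-1} + g_q^{-1}g_{0.5}^{-1}) + 2p^2 g_p^{-1}g_q^{-1}\}}{\{(1 + \lambda\kappa_p)\log(1 + \lambda\kappa_p) + (1 + \lambda\kappa_q)\log(1 + \lambda\kappa_q) - 2(1 + \lambda\kappa_{0.5})\log(1 + \lambda\kappa_{0.5})\}^2}, \tag{2.11} $$

where $\kappa_s$ is the quantile defined by $G(\kappa_s) = s$ and $g_s = g(\kappa_s)$; (2.11) follows from (2.10) because $1 + \lambda\kappa_s = \xi_s^\lambda$ and $g(\kappa_s) = \xi_s^{1-\lambda}f(\xi_s)$.
Notice that the properties of $T$ are invariant under scale change of $Y$, as is immediately obvious from the estimating equation (2.4).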
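Formula (2.10) needs only the quantile function and the density of $Y$, so it is easy to evaluate. The sketch below is ours (function names hypothetical); it assumes the nonzero root of (2.1) is positive and lies in $(0, 4]$, which holds for the exponential example of § 4.1 used here as a check, with unit rate because $V_T$ is scale invariant.

```python
import math
from typing import Callable


def v_t(lam: float, p: float,
        xi: Callable[[float], float], f: Callable[[float], float]) -> float:
    """Large-sample variance (2.10) of sqrt(n)(T - lam) for one pair (p, q = 1 - p)."""
    q = 1.0 - p
    x5, xp, xq = xi(0.5), xi(p), xi(q)
    # h_s^{-1} = xi_s f(xi_s)
    h5, hp, hq = (1.0 / (x * f(x)) for x in (x5, xp, xq))
    # alpha_s ** lam with alpha_s = xi_s / xi_0.5
    ap, aq = (xp / x5) ** lam, (xq / x5) ** lam
    num = (h5 ** 2
           + p * q * (ap ** 2 * hp ** 2 + aq ** 2 * hq ** 2)
           - 2.0 * p * (ap * hp + aq * hq) * h5
           + 2.0 * p ** 2 * ap * aq * hp * hq)
    den = (ap * math.log(xp / x5) + aq * math.log(xq / x5)) ** 2
    return lam ** 2 * num / den


def lambda_p(p: float, xi: Callable[[float], float]) -> float:
    """Nonzero root of (2.1) by bisection on (0, 4] (assumed to contain the root)."""
    a, b = xi(p) / xi(0.5), xi(1.0 - p) / xi(0.5)
    g = lambda t: a ** t + b ** t - 2.0
    lo, hi = 1e-9, 4.0
    if not g(lo) < 0.0 < g(hi):
        raise ValueError("root of (2.1) not bracketed in (0, 4]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if g(mid) > 0.0 else (mid, hi)
    return 0.5 * (lo + hi)


# Unit-rate exponential distribution.
exp_xi = lambda s: -math.log(1.0 - s)
exp_f = lambda y: math.exp(-y)

lam = lambda_p(0.05, exp_xi)
print(round(lam, 3), round(v_t(lam, 0.05, exp_xi, exp_f), 3))
```

For $p = 0.05$ this should print values close to the entries 0.291 of Table 1 and 1.012 of Table 2.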
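The above results hold also for $\lambda = 0$, when $Z = \log Y$. Slightly more generally, (2.11) for small $\lambda$ may be written, using $\kappa_p + \kappa_q = 2\kappa_{0.5}$ (which is (2.1) on the scale of $Z_\lambda$), as

$$ V_T(\lambda, p) \approx \frac{g_{0.5}^{-2} + pq(g_p^{-2} + g_q^{-2}) - 2p\,g_{0.5}^{-1}(g_p^{-1} + g_q^{-1}) + 2p^2 g_p^{-1}g_q^{-1}}{\{\tfrac12(\kappa_p^2 + \kappa_q^2 - 2\kappa_{0.5}^2) - \tfrac16\lambda(\kappa_p^3 + \kappa_q^3 - 2\kappa_{0.5}^3)\}^2}. \tag{2.12} $$

This contradicts the type of result obtained by Draper & Cox, but their results are wrong, as we show in the next section. Several examples illustrating the results of this section are given in § 4.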

3. NORMAL-THEORY MAXIMUM LIKELIHOOD

As pointed out in § 1, previous work on power transformations has assumed the transformed variable $Z_\lambda$ to be normally distributed; in the simplest case the variables are taken to be homogeneous $N(\mu, v)$. Draper & Cox derived large-sample properties of the estimator $\hat\lambda_N$ obtained by maximizing the $N(\mu, v)$ likelihood. These properties would provide useful standards by which to judge the simple estimate $T$ described in § 2; however some of Draper & Cox's results are incorrect and others are incomplete. We therefore briefly outline the basic properties of the normal-theory maximum likelihood estimate $\hat\lambda_N$ here.
The $N(\mu, v)$ likelihood $e^L$ for $Z_{\lambda,1}, \ldots, Z_{\lambda,n}$ leads directly to the efficient score vector $U_\theta$, given by

$$ \begin{aligned} U_\lambda &= \sum \log Y_j - v^{-1}\sum (Z_j - \mu)\,\lambda^{-2}\{(1 + \lambda Z_j)\log(1 + \lambda Z_j) - \lambda Z_j\}, \\ U_\mu &= v^{-1}\sum (Z_j - \mu), \qquad U_v = (2v^2)^{-1}\sum (Z_j - \mu)^2 - n(2v)^{-1}. \end{aligned} \tag{3.1} $$

An obvious feature of the component likelihood equation $U_\lambda = 0$ is its invariance under scale transformation of the original variable $Y$.
Provided that the density $f(y)$ of $Y$ is regular and a unique solution of $E(U_\theta) = 0$ exists, as is the case for standard continuous distributions on $[0, \infty)$, the normal-theory maximum likelihood estimate converges stochastically to the solution of $E(U_\theta) = 0$ and has a limiting normal distribution.
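The score $U_\lambda$ in (3.1) can be checked against a numerical derivative of the log likelihood. The sketch below is our own (hypothetical names `loglik` and `score_lambda`); it drops additive constants from $L$ and includes the Jacobian term $(\lambda - 1)\sum\log Y_j$, which is what produces the $\sum\log Y_j$ in $U_\lambda$. The data and parameter values are arbitrary.

```python
import math
from typing import Sequence


def box_cox(y: float, lam: float) -> float:
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def loglik(lam: float, mu: float, v: float, y: Sequence[float]) -> float:
    """Normal-theory log likelihood L (constants dropped), including the Jacobian."""
    z = [box_cox(yi, lam) for yi in y]
    return ((lam - 1.0) * sum(math.log(yi) for yi in y)
            - 0.5 * len(y) * math.log(v)
            - sum((zi - mu) ** 2 for zi in z) / (2.0 * v))


def score_lambda(lam: float, mu: float, v: float, y: Sequence[float]) -> float:
    """U_lambda from (3.1); requires lam != 0."""
    total = 0.0
    for yi in y:
        z = box_cox(yi, lam)
        # dZ/dlambda = {(1 + lam z) log(1 + lam z) - lam z} / lam^2
        dz = ((1.0 + lam * z) * math.log(1.0 + lam * z) - lam * z) / lam ** 2
        total += math.log(yi) - (z - mu) * dz / v
    return total


y = [0.5, 1.2, 2.0, 3.7, 8.1]
h = 1e-5
numeric = (loglik(0.3 + h, 0.4, 1.3, y) - loglik(0.3 - h, 0.4, 1.3, y)) / (2 * h)
print(numeric, score_lambda(0.3, 0.4, 1.3, y))   # the two should agree closely
```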
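Let $\theta = (\lambda, \mu, v)$ and denote the normal-theory maximum likelihood estimate by $\hat\theta_N$, with limit $\theta_N$. A standard expansion of the likelihood equation gives

$$ \sqrt n(\hat\theta_N - \theta_N) = -\left[\frac1n\,\frac{\partial^2 L}{\partial\theta\,\partial\theta^{\mathrm T}}\right]^{-1}\frac{1}{\sqrt n}\,U_\theta(\theta_N) + o_p(1); \tag{3.2} $$

see, for example, Cox & Hinkley (1974, Chapter 9). Then $\sqrt n(\hat\theta_N - \theta_N)$ has a limiting normal distribution with covariance matrix

$$ \Sigma = J^{-1} I J^{-1}, \tag{3.3} $$

where

$$ nJ = E_f\left[-\frac{\partial^2 L}{\partial\theta\,\partial\theta^{\mathrm T}}\right], \qquad nI = E_f\{U_\theta(\theta_N)\,U_\theta^{\mathrm T}(\theta_N)\}; \tag{3.4} $$

here $E_f$ denotes expectation with respect to the density $f(y)$ of $Y$. Note that $\Sigma = I^{-1}$ only if $f$ is the density implied by normality of $Z_\lambda$, because $I = J$ only if $L$ is the log likelihood according to the density $f$.
The general form (3.3) is required when examining properties of $\hat\lambda_N$ under nonnormal distributions, as we do in § 4.
Draper & Cox incorrectly obtain the variance of $\hat\lambda_N$ from $I^{-1}$. Their method of expanding $U_\lambda$ as a power series in $\lambda$ does lead to approximations for $I$ and $J$ up to any order in $\lambda$, but the results for $\Sigma$ are very complicated, involving the first six moments of $Z_\lambda$, and of limited usefulness. In particular cases one can evaluate $\Sigma$. Some general results for the case $\lambda = 0$ are given in § 4.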
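For a concrete instance of (3.3), take $\lambda_N = 0$ with $\log Y$ symmetric about zero (zero mean is no loss of generality, since $\hat\lambda_N$ is scale invariant). Differentiating (3.1) at $\lambda = 0$ gives, by our own calculation, so treat it as a sketch, per-observation $2 \times 2$ blocks of $J$ and $I$ for $(\lambda, \mu)$ written out in the code comments; the $v$ component decouples when the odd central moments vanish. The function name is hypothetical.

```python
def sandwich_var_lambda(v: float, m4: float, m6: float) -> float:
    """(J^{-1} I J^{-1})_{lambda lambda} at lambda = 0 for symmetric log Y with mean 0.

    v, m4, m6 are the 2nd, 4th and 6th central moments of log Y.
    Per observation, for (lambda, mu):
      J = [[7 m4/(12 v), -1/2], [-1/2, 1/v]]
      I = [[v - m4/v + m6/(4 v^2), 1 - m4/(2 v^2)], [1 - m4/(2 v^2), 1/v]]
    """
    J = [[7.0 * m4 / (12.0 * v), -0.5],
         [-0.5, 1.0 / v]]
    I = [[v - m4 / v + m6 / (4.0 * v * v), 1.0 - m4 / (2.0 * v * v)],
         [1.0 - m4 / (2.0 * v * v), 1.0 / v]]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    row = [J[1][1] / det, -J[0][1] / det]            # first row of J^{-1}
    return sum(row[a] * I[a][b] * row[b] for a in range(2) for b in range(2))


print(sandwich_var_lambda(1.0, 3.0, 15.0))   # normal log Y with v = 1
```

For normal $\log Y$ one has $I = J$, and the value reduces to $J^{-1}_{\lambda\lambda} = \tfrac23 v^{-1}$, the figure used for $V_N$ in § 4.2; for other symmetric laws the two blocks differ and the sandwich form is needed.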
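4. EXAMPLES

4.1. Exponential and gamma cases
To illustrate the discussion up to this point we first examine in some detail the example chosen by Draper & Cox, where the original variables $Y_1, \ldots, Y_n$ are exponentially distributed with density $f(y) = \rho\exp(-\rho y)$. In this particular case (2.1) becomes

$$ (-\log p)^\lambda + (-\log q)^\lambda = 2(\log 2)^\lambda. \tag{4.1} $$

The quantiles of $Y^\lambda$ are

$$ \zeta_s(\lambda, \rho) = \rho^{-\lambda}\{-\log(1 - s)\}^\lambda, \tag{4.2} $$

and the quantiles $\kappa_s(\lambda, \rho)$ of $Z_\lambda$ are given by $\kappa_s = (\zeta_s - 1)/\lambda$. A crude outlier-free measure of asymmetry for $Z_\lambda$, evaluated at $\lambda = \lambda_p$, the nonzero solution of (4.1), is the 'tilt factor'

$$ \tau(s, p) = \frac{\kappa_{1-s}(\lambda_p, \rho) - \kappa_{0.5}(\lambda_p, \rho)}{\kappa_{0.5}(\lambda_p, \rho) - \kappa_s(\lambda_p, \rho)} \quad (0 < s < \tfrac12). \tag{4.3} $$

Note that $\lambda_p$ and $\tau(s, p)$ are both independent of the scale parameter $\rho$.
Table 1 gives some values of $\lambda_p$ and $\tau(s, p)$ for $p \ge 0.005$ and $s \ge 0.01$. The entries show that $\lambda_p$ is very nearly constant for $p > 0.10$; and, related to this stability, there is a high degree of symmetry as far as the upper and lower 5% points of the transformed distributions. Much the same conclusions were reached by Draper & Cox, who noted that small changes in $\lambda$ have little visible effect on the symmetry. The Weibull distribution of $Z_\lambda$ is quite close to normality except in the extreme tails.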
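Equations (4.1) to (4.3) can be evaluated directly. The sketch below (names hypothetical) solves (4.1) by bisection, assuming the root lies in $(0, 4]$, which holds for the values of $p$ used in Table 1, and computes the tilt factor at $\lambda = \lambda_p$ with $\rho = 1$, since both quantities are scale free.

```python
import math


def lambda_p_exponential(p: float) -> float:
    """Nonzero root of (4.1): (-log p)**lam + (-log q)**lam = 2 (log 2)**lam."""
    a = -math.log(1.0 - p) / math.log(2.0)    # xi_p / xi_0.5 (the rate rho cancels)
    b = -math.log(p) / math.log(2.0)          # xi_q / xi_0.5
    g = lambda t: a ** t + b ** t - 2.0
    lo, hi = 1e-9, 4.0                        # assumed bracket
    if not g(lo) < 0.0 < g(hi):
        raise ValueError("root of (4.1) not bracketed in (0, 4]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if g(mid) > 0.0 else (mid, hi)
    return 0.5 * (lo + hi)


def tilt(s: float, lam: float) -> float:
    """Tilt factor (4.3); kappa_s = (zeta_s - 1)/lam, so the 1 and the lam cancel."""
    zeta = lambda u: (-math.log(1.0 - u)) ** lam     # quantiles (4.2) with rho = 1
    return (zeta(1.0 - s) - zeta(0.5)) / (zeta(0.5) - zeta(s))


for p in (0.005, 0.01, 0.05, 0.10, 0.20):
    lam = lambda_p_exponential(p)
    print(p, round(lam, 3), [round(tilt(s, lam), 3) for s in (0.2, 0.1, 0.05, 0.02, 0.01)])
```

The printed values should be close to the entries of Table 1, though the last digit need not agree everywhere; by construction $\tau(p, p) = 1$.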

Table 1. Transformations, λ_p, and tilt factors, τ(s, p), in the exponential case

Quantile p                    0.005   0.01    0.05    0.10    0.20    0.30    0.40
Transformation power λ_p      0.272   0.28    0.291   0.297   0.303   0.305   0.307
τ(s, p):   s = 0.2            0.970   0.978   0.989   0.995   1.000   1.002   1.004
           s = 0.1            0.963   0.975   0.991   1.000   1.009   1.018   1.015
           s = 0.05           0.964   0.979   1.000   1.011   1.023   1.027   1.031
           s = 0.02           0.973   0.992   1.019   1.034   1.047   1.054   1.059
           s = 0.01           0.985   1.000   1.038   1.055   1.072   1.078   1.084
The limiting normal distribution of $T$ is scale invariant, as we noted in § 2, and hence independent of $\rho$. The variance $V_T$ is given in Table 2 for the same transformations described in Table 1; rows below that for the exponential case are defined later.

Table 2. Large-sample variance V_T of the quantile transformation estimate for gamma distributions with index r, including exponential (r = 1)

         p:   0.005   0.01    0.05    0.10     0.20
r = 1         0.589   0.582   1.012   1.894    6.271
r = 2         1.704   1.670   2.968   6.148    19.916
r = 3         2.841   2.718   4.507   8.982    36.069
r = 4         3.977   3.748   5.936   11.442   43.420
It is interesting to see that rather extreme order statistics give the best precision, $p = 0.01$ being close to optimal. This is a pity, in a sense, because rather large samples would be required for anyone to have faith in the results! Also the method is consequently sensitive to outliers.
The corresponding results for the normal-theory maximum likelihood estimate $\hat\lambda_N$ are easily derived using the efficient score formulae in (3.1) together with the identity

$$ \int_0^\infty (\log y)^r\,y^{s-1}e^{-y}\,dy = \frac{d^r\,\Gamma(s)}{ds^r} \quad (s > 0), $$

which is related to the polygamma functions. The maximum likelihood estimate $\hat\lambda_N$ converges to 0.265, to be compared with Draper & Cox's approximation 0.268, and

$$ \mu_N = \frac{\rho^{-\lambda_N}\Gamma(1 + \lambda_N) - 1}{\lambda_N}, \qquad v_N = \frac{\rho^{-2\lambda_N}\{\Gamma(1 + 2\lambda_N) - \Gamma^2(1 + \lambda_N)\}}{\lambda_N^2}. $$

The variance $V_N$ of the limiting normal distribution of $\sqrt n(\hat\lambda_N - \lambda_N)$ is 0.314. Note from Table 1 that $\lambda = 0.265$ gives a relatively poor degree of symmetry.
The above calculations for the exponential case are easily extended to the general gamma density $f(y) = y^{r-1}e^{-y}/\Gamma(r)$, and we have added such calculations in Tables 2 and 3 for $r = 2$, 3 and 4. The correct transformation power $\lambda_p$ for $Y$ is quite stable at about 0.32 for these cases, i.e. close to the conventional cube root transformation. As $r$ increases, the transformed variable $Z_\lambda$ is closer to symmetry and normality.

Table 3. Large-sample limit λ_N and variance V_N of the normal-theory maximum likelihood estimate for gamma distribution of index r, including exponential (r = 1)

r       1       2       3       4
λ_N     0.265   0.301   0.312   0.318
V_N     0.314   0.914   1.567   2.229

4.2. Examples with $\lambda = 0$

For the special case $\lambda = 0$ equation (2.12) gives a simple expression for $V_T$, the large-sample variance of $\sqrt n(T - \lambda_p)$. A corresponding result for the normal-theory maximum likelihood estimate is quite easily derived from (3.3). Lengthy algebra gives
$$ V_N = \frac{36\{v^2\mu(6) - 6v^3\mu(4) - 2v\mu(3)\mu(5) + \mu^2(3)\mu(4) + 7v^2\mu^2(3) + 9v^5\}}{\{7v\mu(4) - 6\mu^2(3) - 3v^3\}^2}, \tag{4.4} $$
where $\mu(r)$ is the $r$th central moment of $Z_0 = \log Y$. We now look at two specific examples.
When $\log Y$ has the $N(\mu, v)$ density, (2.12) and (4.4) simplify to

$$ V_T = x_p^{-4}v^{-1}(\phi_{0.5}^{-2} + 2p\phi_p^{-2} - 4p\phi_p^{-1}\phi_{0.5}^{-1}), \tag{4.5} $$

where $\Phi(x_s) = s$ and $\phi_s = \phi(x_s)$, and $V_N = \tfrac23 v^{-1}$. Some numerical values of $V_T$ are given in Table 4. The smallest value of $V_T$ occurs at $p = 0.01$, at which point $V_N/V_T \approx 2/\pi$, rather interestingly.
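Formula (4.5) and the normal case of (4.4) are easy to evaluate with the standard library's `statistics.NormalDist`. In the sketch below (function names ours), `v_n_symmetric` keeps only the terms of (4.4) that survive when $\mu(3) = \mu(5) = 0$; with normal moments $\mu(4) = 3v^2$ and $\mu(6) = 15v^3$ it returns $\tfrac23 v^{-1}$.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal; v * V_T does not depend on mu or v


def v_times_vt_lognormal(p: float) -> float:
    """v * V_T from (4.5) when log Y ~ N(mu, v)."""
    x = STD.inv_cdf(p)                            # x_p with Phi(x_p) = p
    phi_p, phi_half = STD.pdf(x), STD.pdf(0.0)    # phi_{0.5} = phi(0)
    return x ** -4 * (phi_half ** -2 + 2.0 * p * phi_p ** -2 - 4.0 * p / (phi_p * phi_half))


def v_n_symmetric(v: float, m4: float, m6: float) -> float:
    """Terms of (4.4) that remain when the odd central moments vanish."""
    return 36.0 * (v ** 2 * m6 - 6.0 * v ** 3 * m4 + 9.0 * v ** 5) / (7.0 * v * m4 - 3.0 * v ** 3) ** 2


for p in (0.005, 0.01, 0.02, 0.05, 0.10):
    print(p, round(v_times_vt_lognormal(p), 3))

v_n = v_n_symmetric(1.0, 3.0, 15.0)               # normal moments with v = 1
print(round(v_n, 3), round(v_n / v_times_vt_lognormal(0.01), 4))
```

The loop should give values close to the Normal row of Table 4, and the last line shows the ratio $V_N/V_T$ at $p = 0.01$ sitting close to $2/\pi \approx 0.6366$.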
Table 4. Large-sample variances V_T for quantile transformation estimate in lognormal and log double exponential cases

                                  p:   0.005   0.01    0.02    0.05   0.10   Normal-theory maximum likelihood
Normal: vV_T                           1.15    1.04    1.08    1.48   2.62   0.667
Double exponential: ρ^{-2}V_T          0.881   0.837   0.894   1.28   2.39   1.491

Note that the variances of log Y are respectively v and 2ρ^{-2}.

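The effect of unknown $\lambda$ on estimation of $\mu$ and $v$ is seen from the complete covariance matrix

$$ n\,\mathrm{var}(\hat\theta_N) = \begin{pmatrix} \tfrac23 v^{-1} & \tfrac13(v + \mu^2)/v & \tfrac43\mu \\ \tfrac13(v + \mu^2)/v & v + \tfrac16(v + \mu^2)^2/v & \tfrac23\mu(v + \mu^2) \\ \tfrac43\mu & \tfrac23\mu(v + \mu^2) & 2v^2 + \tfrac83\mu^2 v \end{pmatrix}. $$

The potentially heavy increase in $\mathrm{var}(\hat\mu_N)$ due to not knowing $\lambda$ is clearly worth investigating in more generality.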
If $\log Y$ has a distribution close to the normal, so that the standardized moments $\gamma_1 = \mu(3)/v^{3/2}$, $\gamma_2 = \mu(4)/v^2 - 3$, etc. are of successively lower order in some notional parameter, we can approximate $V_N$ from (4.4) by
VN= 3^11-9VY 16)2)

In a sense this correspondsto (9) of Draper & Cox, theirfactor02 being incorrect.
A corresponding approximation for $V_T$ is easily constructed from (2.12) using a Cornish-Fisher expansion for $\kappa_s$ and an Edgeworth expansion for $g(z)$. The result is somewhat complicated and will not be given here.
A distribution characterizing much longer tails than the normal is the double exponential, with density $\tfrac12\rho\exp(-\rho|z|)$. If $\log Y$ has this distribution, it is easy to show that (2.12) becomes

$$ V_T = \rho^2(\log 2p)^{-4}(2p^{-1} - 4) \quad (0 < p < \tfrac12), $$

with values as in Table 4. The corresponding value of $V_N$ calculated from (4.4) is $1.491\rho^2$, so that $T$ is superior to $\hat\lambda_N$ in large samples for $p < 0.06$. In terms of the variance $v$ of $Z$, the smallest value of $V_T$ here is $1.674v^{-1}$, compared with $1.044v^{-1}$ in the lognormal case.
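The double exponential formula can be checked against (2.12) directly. The sketch below (names ours) takes $\rho = 1$ and location zero, which is harmless because $\rho^{-2}V_T$ is free of both, and evaluates (2.12) at $\lambda = 0$ alongside the closed form.

```python
import math


def laplace_quantile(s: float) -> float:
    """Quantile of the density (1/2) exp(-|z|), i.e. rho = 1 and location 0."""
    return math.log(2.0 * s) if s < 0.5 else -math.log(2.0 * (1.0 - s))


def laplace_density(z: float) -> float:
    return 0.5 * math.exp(-abs(z))


def vt_small_lambda(p: float, quantile, density, lam: float = 0.0) -> float:
    """(2.12): large-sample variance of T for small lam, from quantiles and density of Z."""
    q = 1.0 - p
    k5, kp, kq = quantile(0.5), quantile(p), quantile(q)
    g5, gp, gq = density(k5), density(kp), density(kq)
    num = (g5 ** -2 + p * q * (gp ** -2 + gq ** -2)
           - 2.0 * p * (gp ** -1 + gq ** -1) / g5 + 2.0 * p ** 2 / (gp * gq))
    den = (0.5 * (kp ** 2 + kq ** 2 - 2.0 * k5 ** 2)
           - lam / 6.0 * (kp ** 3 + kq ** 3 - 2.0 * k5 ** 3))
    return num / den ** 2


def closed_form(p: float) -> float:
    """rho^{-2} V_T = (log 2p)^{-4} (2/p - 4), for 0 < p < 1/2."""
    return math.log(2.0 * p) ** -4 * (2.0 / p - 4.0)


for p in (0.005, 0.01, 0.02, 0.05, 0.10):
    print(p, round(vt_small_lambda(p, laplace_quantile, laplace_density), 3), round(closed_form(p), 3))
```

The two columns should agree to rounding error and match the double exponential row of Table 4.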

5. GENERALIZATION OF THE QUICK ESTIMATE

5.1. The generalization
There are several ways in which one could generalize the estimator $T$ defined by (2.2). First, we could solve (2.2) for several values of $p$ and average the resulting estimates of $\lambda$. Secondly, we could, as it were, average the equation (2.2) for several $p$ values and then solve for the estimator. Other possible methods exist, but this latter method is the one we examine here.
We propose, then, to use the equation (2.2) for several values of $p$, say $p_1 < \ldots < p_m < \tfrac12$, and in fact to form the combined equation

$$ \sum_{j=1}^m c_j\,(X_{r_j}^T + X_{n-r_j+1}^T) = 2\sum_{j=1}^m c_j\,\tilde X^T, \tag{5.1} $$

where $r_j = [np_j]$; the solution $T = 0$ is chosen only if

$$ \sum c_j\log(X_{r_j}X_{n-r_j+1}) = 2\sum c_j\log\tilde X, \tag{5.2} $$

corresponding to (2.3). The coefficients $c_1, \ldots, c_m$ are arbitrary weights to be chosen. A more convenient form of (5.1) is

$$ \sum c_j\{(X_{r_j}/\tilde X)^T + (X_{n-r_j+1}/\tilde X)^T\} = 2\sum c_j. \tag{5.3} $$

In practice it would be sensible to choose all $c_j$'s positive, particularly if a monotone transformation of $Y$ is symmetrically distributed, since otherwise asymmetry of quantile pairs tends to cancel out in the summation.
The existence of a unique nonzero solution to (5.3) for positive $c_j$ is proved by the following lemma, easily proved by convexity.

LEMMA. For arbitrary positive constants $a_1, \ldots, a_m$, $b_1, \ldots, b_m$ and $c_1, \ldots, c_m$, the equation

$$ \sum c_j(a_j^t + b_j^t) = 2\sum c_j \tag{5.4} $$

has a single nonzero real solution unless $\sum c_j\log(a_j b_j) = 0$, in which case $t = 0$ is the only solution.
Although the general equation (5.1) is interesting theoretically for any value of $m$, in practice one might well restrict attention to $m = 2$ or 3 and use equal weights $c_j$. Potentially the use of $m > 1$ could accomplish two things: (i) increased precision of the transformation estimate, (ii) an averaging out of the asymmetry in $Z_T$ when no $Z_\lambda$ has a symmetric distribution.
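Solving (5.3) is no harder than solving (2.4). A minimal sketch, assuming positive weights and bisection (names hypothetical): the side on which the nonzero root lies is taken from the slope $\sum c_j\log(a_j b_j)$ of the left-hand side at zero, which is where the convexity argument behind the lemma enters.

```python
import math
import statistics
from typing import Sequence


def generalized_estimate(y: Sequence[float], ps: Sequence[float],
                         cs: Sequence[float]) -> float:
    """Nonzero root T of (5.3) for positive weights cs (bisection sketch)."""
    if len(cs) != len(ps) or any(c <= 0 for c in cs):
        raise ValueError("need one positive weight per p_j")
    x = sorted(y)
    n = len(x)
    med = statistics.median(x)
    pairs = []
    for p in ps:
        r = int(n * p)                                    # r_j = [n p_j]
        if r < 1:
            raise ValueError("n * p_j must be at least 1")
        pairs.append((x[r - 1] / med, x[n - r] / med))    # (X_r/median, X_{n-r+1}/median)
    slope0 = sum(c * math.log(a * b) for c, (a, b) in zip(cs, pairs))
    if slope0 == 0.0:
        return 0.0                                        # condition (5.2): T = 0
    sign = 1.0 if slope0 < 0.0 else -1.0                  # side of the second root
    target = 2.0 * sum(cs)

    def g(u: float) -> float:                             # <= 0 on (0, |T|], > 0 beyond
        t = sign * u
        return sum(c * (a ** t + b ** t) for c, (a, b) in zip(cs, pairs)) - target

    lo, hi = 0.0, 1.0
    for _ in range(60):                                   # expand the bracket
        if g(hi) > 0.0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("no nonzero root found for these quantile pairs")
    for _ in range(200):                                  # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return sign * 0.5 * (lo + hi)


# Square roots of these values are symmetric about sqrt(4) = 2, so T should be 0.5.
data = [0.25, 1.0, 2.25, 4.0, 6.25, 9.0, 12.25]
print(generalized_estimate(data, [0.15, 0.30], [0.5, 0.5]))
```

Because every quantile pair in the example is symmetric on the square-root scale, any choice of positive weights gives the same answer, as the lemma implies.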
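5.2. Large-sample properties
The groundwork for establishing large-sample properties of $T$ has been laid in § 2.2. Here we outline the main steps and results.
By continuity of (5.3) and consistency of the order statistics, $T$ is consistent for that value $\lambda_c$ of $\lambda$ satisfying

$$ \sum c_j(\xi_{p_j}^\lambda + \xi_{q_j}^\lambda) = 2\sum c_j\,\xi_{0.5}^\lambda, $$

which would be common to all vectors $p$ if $Z_\lambda$ is symmetrically distributed. By the same expansion route used in § 2.2 we find, writing $\lambda$ for $\lambda_c$, that

$$ \frac{\sqrt n\,(T - \lambda)}{\lambda} = \frac{2W_{0.5}\sum c - \sum c(\alpha_p^\lambda W_p + \alpha_q^\lambda W_q)}{\sum c(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q)}. \tag{5.5} $$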
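Here and below the suffix $j$ on $c_j$, $p_j$ and $q_j$ has been dropped for typographical convenience. The resulting limiting normal distribution for $T$ is again obtained from the limiting joint normal distribution of order statistics, and, using (2.5), with the weights normalized so that $\sum c_j = 1$, the variance $V_T(\lambda, p)$ is

$$ \begin{aligned} V_T(\lambda, p) = \lambda^2\Big[&h_{0.5}^2 - 2h_{0.5}\sum c\,p(\alpha_p^\lambda h_p + \alpha_q^\lambda h_q) + \sum c^2\{pq(\alpha_p^{2\lambda}h_p^2 + \alpha_q^{2\lambda}h_q^2) + 2p^2\alpha_p^\lambda\alpha_q^\lambda h_p h_q\} \\ &+ 2\sum_{p < p'} c\,c'\{pq'(\alpha_p^\lambda\alpha_{p'}^\lambda h_p h_{p'} + \alpha_q^\lambda\alpha_{q'}^\lambda h_q h_{q'}) + pp'(\alpha_p^\lambda\alpha_{q'}^\lambda h_p h_{q'} + \alpha_{p'}^\lambda\alpha_q^\lambda h_{p'}h_q)\}\Big] \\ &\times \Big\{\sum c(\alpha_p^\lambda\log\alpha_p + \alpha_q^\lambda\log\alpha_q)\Big\}^{-2}, \end{aligned} \tag{5.6} $$

where in the double sum the primed quantities refer to the second member of each pair $p < p'$, with $q' = 1 - p'$. The notation throughout is that of § 2.2.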
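A corresponding expression for $V_T$ in terms of the density $g(z)$ can be obtained from (5.6) in the same way that (2.11) was derived from (2.10). This simply amounts to substituting $g_s^{-1}$ for $\alpha_s^\lambda h_s$ in the numerator and $(1 + \lambda\kappa_s)\log(1 + \lambda\kappa_s) - (1 + \lambda\kappa_{0.5})\log(1 + \lambda\kappa_{0.5})$ for $\alpha_s^\lambda\log\alpha_s$ in the denominator of (5.6), with $\lambda^2$ replaced by $\lambda^4$ as in (2.11), where we recall that $G(\kappa_s) = s$ and $g_s = g(\kappa_s)$.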
The result (5.6), as we have given it, is for finite $m$, and would apply when $m$ is small relative to $n$. If all the order statistics $X_i$ are used, so that $m = [\tfrac12 n]$ in (5.1), a corresponding asymptotic result can be obtained for a smooth weight function $c(x)$ defined by $c_j = c\{j/(n + 1)\}$. In terms of $g(z)$ the result is

$$ \lambda^{-4}V_T(\lambda, c) = \frac{\{\psi(\tfrac12)\}^2 - 2\psi(\tfrac12)A_1(c) + A_2(c) + 2A_3(c)}{B^2(c)}, $$

where $\psi(x) = 1/g\{G^{-1}(x)\}$ and, with all integrals over $0 < x < \tfrac12$,

$$ A_1(c) = \int c(x)\,x\{\psi(x) + \psi(1 - x)\}\,dx, \qquad A_2(c) = \int x(1 - x)\,c^2(x)\{\psi^2(x) + \psi^2(1 - x)\}\,dx, $$

$$ A_3(c) = \iint_{x < x'} c(x)c(x')\big[x(1 - x')\{\psi(x)\psi(x') + \psi(1 - x)\psi(1 - x')\} + xx'\{\psi(x)\psi(1 - x') + \psi(1 - x)\psi(x')\}\big]\,dx\,dx', $$

$$ B(c) = \int c(x)\big[\{1 + \lambda G^{-1}(x)\}\log\{1 + \lambda G^{-1}(x)\} + \{1 + \lambda G^{-1}(1 - x)\}\log\{1 + \lambda G^{-1}(1 - x)\}\big]\,dx - 2\{1 + \lambda G^{-1}(\tfrac12)\}\log\{1 + \lambda G^{-1}(\tfrac12)\}. $$

A discussion of conditions required for this result will not be given here; see Stigler (1974).

6. AN EXAMPLE
After introducing the generalization of $T$ in § 5, we need to assess what is gained in precision at the expense of complication. From calculations we have done it would seem that in fact little is to be gained. Here we give only one example, the case where $\log Y$ is normally distributed.
When $\lambda = 0$ and $Z = \log Y$ is $N(\mu, v)$, we saw in § 4.2 that $T$ has minimum large-sample variance at $p = 0.01$, where $V_T = 1.04v^{-1}$. Using a simplified form of (5.6) corresponding to (2.12), we obtain the results given in Table 5. The right-hand column of the table gives values of $vV_T$, and the other entries indicate values of $c_j$; the $c_j$'s sum to one in each case.
Some general features are apparent from this small set of results. Most striking is the fact that if all values of $p$ exceed 0.05, then $m = 1$, that is the use of one pair of order statistics, cannot be markedly improved on by $m = 2$. Use of $m = 3$ with one value of $p$ equal to 0.01 can give up to 15% improvement in precision, which is a little better than using $m = 2$. With $p = 0.05$, 0.10, 0.15 and 0.20 and each $c_j$ equal to $\tfrac14$, $vV_T = 2.23$. We conclude that it is not possible to escape the extreme tails ($p < 0.02$) and keep precision, unless perhaps $m$ is considerably larger than 3.
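For the lognormal case the simplified form of (5.6) behind Table 5 can be written out explicitly. The following is our own reduction, so treat it as a sketch rather than the paper's formula: with $\lambda = 0$, $\sum c_j = 1$ and $Z = \log Y$ normal, (5.6) in its $g$ form collapses to

$$ vV_T = \frac{\phi(0)^{-2} - 4\phi(0)^{-1}\sum_j c_j p_j/\phi_{p_j} + 2\sum_j\sum_k c_j c_k\min(p_j, p_k)/(\phi_{p_j}\phi_{p_k})}{\big(\sum_j c_j x_{p_j}^2\big)^2}, $$

with $\Phi(x_p) = p$ and $\phi_p = \phi(x_p)$. The function name below is hypothetical.

```python
from statistics import NormalDist
from typing import Sequence

STD = NormalDist()


def v_times_vt_general(ps: Sequence[float], cs: Sequence[float]) -> float:
    """v * V_T for the generalized estimate when log Y ~ N(mu, v), lambda = 0, sum(cs) = 1."""
    if abs(sum(cs) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    xs = [STD.inv_cdf(p) for p in ps]            # x_p with Phi(x_p) = p
    phis = [STD.pdf(x) for x in xs]              # phi_p = phi(x_p)
    phi0 = STD.pdf(0.0)
    linear = sum(c * p / f for c, p, f in zip(cs, ps, phis))
    double = sum(cj * ck * min(pj, pk) / (fj * fk)
                 for cj, pj, fj in zip(cs, ps, phis)
                 for ck, pk, fk in zip(cs, ps, phis))
    num = phi0 ** -2 - 4.0 * linear / phi0 + 2.0 * double
    den = sum(c * x * x for c, x in zip(cs, xs)) ** 2
    return num / den


print(round(v_times_vt_general([0.01], [1.0]), 3))                      # single pair, as in (4.5)
print(round(v_times_vt_general([0.01, 0.02, 0.05], [1 / 3] * 3), 3))
print(round(v_times_vt_general([0.05, 0.10, 0.15, 0.20], [0.25] * 4), 3))
```

With a single pair this reduces to (4.5); the other two calls should land close to the $m = 3$ entry of Table 5 and the value 2.23 quoted above.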

Table 5. Large-sample variance V_T and coefficients c_j of the generalized version of T when log Y is N(μ, v)

c_j for p =  0.01    0.02    0.05    0.10      vV_T
             1       0       0       0         1.04
             ½       ½       0       0         0.92
             0       1       0       0         1.08
             ½       0       ½       0         0.91
             0       0       1       0         1.48
             0       ½       ½       0         1.04
             0       0       ⅔       ⅓         1.54
             0       0       ½       ½         1.64
             0       0       0       1         2.62
             ⅓       ⅓       ⅓       0         0.88

7. FURTHER DISCUSSION
Use of power transformations such as (1.1) occurs most frequently with more complicated linear models than the single mean case discussed in this paper. The ability to generalize the estimator $T$ defined by (5.1) depends to some extent on whether or not the linear model design includes replication.
Suppose that $Y_{ij}$ $(j = 1, \ldots, r_i)$ are replicates of the $i$th cell of a linear model, meaning that for some $\lambda$

$$ Z_{\lambda,ij} = \mu_i + \epsilon_{ij}. \tag{7.1} $$

We can generalize (2.2) and (2.3), or (5.1) and (5.2), as follows. Let $\tilde Y_i$ be the median of variables in the $i$th cell, and define

$$ A_{ij} = Y_{ij}/\tilde Y_i \quad (j = 1, \ldots, r_i;\ i = 1, \ldots, I). \tag{7.2} $$

Then the ordered values of $A_{ij}$ replace the ratios $X_i/\tilde X$ in (5.1) and (5.2). The estimating equation so defined is not a trivial generalization, although the consistency of $T$ for fixed $I$ and large $n = \sum r_i$ is still assured. The problem is that the standardization in (7.2) is nonhomogeneous, the more so if the variability of the $\mu_i$ is large relative to that of the $\epsilon_{ij}$ in (7.1). Assuming that the $\epsilon_{ij}$ are homogeneous errors, it is clear that if

$$ \mathrm{var}(Y_{ij}) \propto \{E(Y_{ij})\}^b, $$

then cells with larger means will dominate the estimating equation, and hence $T$, if $b > 2$. For example, if $\lambda = 1$ in (7.1) then $b = 0$ and cells with small means dominate $T$; if $\lambda = 0$ then $b = 2$ and no cell dominates $T$.
While we have not examined this problem in any detail, this does seem to be a suitable situation for use of the generalization (5.1) with $m = [\tfrac12 n]$ and $c_j$ constant. This has the disadvantage of requiring a large amount of computation.
An example that fits into this discussion is the first numerical example of Box & Cox (1964), which is a fourfold replicate of a 3 × 4 design. The normal-theory likelihood suggests that $\lambda = -1$, although one would not discount values $-1 < \lambda < 0$. The three outermost pairs of ordered $A_{ij}$'s each yield the estimate $T = 0$ by the method of § 2. Fitting the additive two-way linear model by least squares with $\lambda = -1$ and $\lambda = 0$ gives negligible interactions. Normal plots of residuals reveal that $\lambda = -1$ gives a better fit to normality, although the closeness to symmetry is about the same for both $\lambda = -1$ and $\lambda = 0$; in each case there are two or three moderately large outliers, not from the same data points. There is some evidence that extreme $A_{ij}$'s are associated with large cell means, which suggests that $\lambda$ is somewhat negative. Strangely, use of less extreme $A_{ij}$'s indicates $\lambda$ to be around 2, although there is no consistent value for any particular pair.
This discussion is intended to suggest that there are difficulties with the order statistic method, particularly in connexion with complex models. When one is able to use the simple estimating equation (2.2), either in the original form or with the $A_{ij}$ defined in (7.2), the estimate $T$ should be reasonably constant over the outermost pairs of order statistics in order to be convincing. It would be helpful to understand more clearly the problem of heterogeneity in the $A_{ij}$'s, particularly through experience with applications.
One must conclude, however, that the need to use fairly extreme order statistics in order to achieve precise estimates of $\lambda$ makes the quick method of § 2 unappealing with moderate amounts of data containing genuine outliers. Data transformation in the presence of outliers is a risky business.

REFERENCES
ANDREWS, D. F. (1971). A note on the selection of data transformations. Biometrika 58, 249-54.
ATKINSON, A. C. (1973). Testing transformations to normality. J. R. Statist. Soc. B 35, 473-9.
BOX, G. E. P. & COX, D. R. (1964). An analysis of transformations (with discussion). J. R. Statist. Soc. B 26, 211-52.
COX, D. R. & HINKLEY, D. V. (1974). Theoretical Statistics. London: Chapman & Hall.
DRAPER, N. R. & COX, D. R. (1969). On distributions and their transformation to normality. J. R. Statist. Soc. B 31, 472-6.
HUBER, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1, 799-821.
STIGLER, S. M. (1974). Linear functions of order statistics with smooth weight functions. Ann. Statist. 2, 676-93.

[Received August 1974. Revised September 1974]

No ratings yet
FMEAgueorguiev 2020
4 pages
Qlund 1983
No ratings yet
Qlund 1983
9 pages
UTCIWBGT
No ratings yet
UTCIWBGT
8 pages
ISO 7243 Paper
No ratings yet
ISO 7243 Paper
12 pages
Organizational Climate Lawler III Et Al.
100% (1)
Organizational Climate Lawler III Et Al.
17 pages
Multilevel Modeling: IRR & IRA Guide
No ratings yet
Multilevel Modeling: IRR & IRA Guide
37 pages
The 4 Influencers of Employee Engagement and Commitment
No ratings yet
The 4 Influencers of Employee Engagement and Commitment
2 pages
Organizational Climate and Culture Schneider
100% (1)
Organizational Climate and Culture Schneider
15 pages
Wet-Bulb Temperature From RH and Air Temperature
No ratings yet
Wet-Bulb Temperature From RH and Air Temperature
3 pages
Benchmarking RWG Interrater Agreement Indices HarveyHollander 2004SIOP
No ratings yet
Benchmarking RWG Interrater Agreement Indices HarveyHollander 2004SIOP
13 pages
VBA Data Types for Developers
No ratings yet
VBA Data Types for Developers
2 pages
SIPOC: A Six Sigma Tool Helping On ISO 9000 Quality Management Systems
No ratings yet
SIPOC: A Six Sigma Tool Helping On ISO 9000 Quality Management Systems
10 pages
Reinforcement Learning
No ratings yet
Reinforcement Learning
136 pages
Datasheet SFFBB11203XCN2D4
No ratings yet
Datasheet SFFBB11203XCN2D4
2 pages
Multiplying Two-Digit by Two-Digit Numbers Education Presentation in Cream Green Orange Nostalgic Handdrawn Style
No ratings yet
Multiplying Two-Digit by Two-Digit Numbers Education Presentation in Cream Green Orange Nostalgic Handdrawn Style
13 pages
Worksheet 8 Answers
No ratings yet
Worksheet 8 Answers
1 page
Data Stage Parallel Job Tutorial
No ratings yet
Data Stage Parallel Job Tutorial
76 pages
Rotor Balancing: HG 4 (Chapter 8)
No ratings yet
Rotor Balancing: HG 4 (Chapter 8)
30 pages
PYTHON Khurramshahzad
No ratings yet
PYTHON Khurramshahzad
20 pages
Appendix 4 - SPECIFICATION FOR STRUCTURAL STEEL MATERIAL FOR OFFSHORE STRUCTURES
100% (3)
Appendix 4 - SPECIFICATION FOR STRUCTURAL STEEL MATERIAL FOR OFFSHORE STRUCTURES
21 pages
Optimization of Submerged Arc Welding
No ratings yet
Optimization of Submerged Arc Welding
4 pages
Modified Fuel-less Air Engine Design
No ratings yet
Modified Fuel-less Air Engine Design
46 pages
Shree Cement LTD, Bangurcity: HALF YEARLY / YEARLY CHECK LIST For Monitoring of Earthing System (Earth Pits) (4 X 18 MW)
No ratings yet
Shree Cement LTD, Bangurcity: HALF YEARLY / YEARLY CHECK LIST For Monitoring of Earthing System (Earth Pits) (4 X 18 MW)
8 pages
SEIKO 6M13 Watch User Guide
100% (1)
SEIKO 6M13 Watch User Guide
20 pages
Questionnaire Performance Testing
No ratings yet
Questionnaire Performance Testing
10 pages
SM SSM DB Uk 003
No ratings yet
SM SSM DB Uk 003
4 pages
2) Change Control
No ratings yet
2) Change Control
4 pages
Grove 1997 VIII On The Gas Voltaic Battery Experiments Made With A View of Ascertaining The Rationale of Its Action and
No ratings yet
Grove 1997 VIII On The Gas Voltaic Battery Experiments Made With A View of Ascertaining The Rationale of Its Action and
23 pages
A I I E Transactions Volume 11 Issue 4 1979 (Doi 10.1080 - 05695557908974471) Muth, Eginhard J. White, John A. - Conveyor Theory - A Survey
100% (1)
A I I E Transactions Volume 11 Issue 4 1979 (Doi 10.1080 - 05695557908974471) Muth, Eginhard J. White, John A. - Conveyor Theory - A Survey
9 pages
SOT-23 Plastic-Encapsulate Transistors: Jiangsu Changjiang Electronics Technology Co., LTD
No ratings yet
SOT-23 Plastic-Encapsulate Transistors: Jiangsu Changjiang Electronics Technology Co., LTD
2 pages
Islam Et Al. - 2024 - iXGB Improving The Interpretability of XGBoost Us
No ratings yet
Islam Et Al. - 2024 - iXGB Improving The Interpretability of XGBoost Us
9 pages
Protein Study Guide for Students
No ratings yet
Protein Study Guide for Students
8 pages
Airport Terminal Capacity Model
No ratings yet
Airport Terminal Capacity Model
20 pages
Ad 001 en PDF
No ratings yet
Ad 001 en PDF
84 pages
Special Purpose Diodes Overview
No ratings yet
Special Purpose Diodes Overview
131 pages
Sand and Gravel For Se As Filtration Medium - Specification: Indian Standard
No ratings yet
Sand and Gravel For Se As Filtration Medium - Specification: Indian Standard
12 pages
Latihan Soal-Soal Bab 1-4 (Fismod)
No ratings yet
Latihan Soal-Soal Bab 1-4 (Fismod)
35 pages
OS Concepts for BSc IT Students
No ratings yet
OS Concepts for BSc IT Students
3 pages
Cariology: Presented By-Dr. Neha Sultana Post Graduate Student Department of Conservative Dentistry and Endodontics
No ratings yet
Cariology: Presented By-Dr. Neha Sultana Post Graduate Student Department of Conservative Dentistry and Endodontics
93 pages
Weibull-Analysis-In-Excel Standard IEC 61649
No ratings yet
Weibull-Analysis-In-Excel Standard IEC 61649
113 pages
Extraction Notes
No ratings yet
Extraction Notes
16 pages
7 Rational Functions Equations Inequalities
No ratings yet
7 Rational Functions Equations Inequalities
27 pages

Hinkley 1975

Uploaded by

Hinkley 1975

Uploaded by

Biometrika Trust

On Power Transformations to Symmetry

Z=] 4A (Z0), (1.1)

which is the condition for sample quantiles of log Y to be symmetric about the median.
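The symmetry condition above can be turned into an estimator built from three order statistics. The following is a minimal sketch under stated assumptions. T is taken to be the power λ for which the transformed r-th smallest and r-th largest observations lie equally far from the transformed median, with λ = 0 read as log y and r = [np]. The function names and the bisection search are hypothetical choices, not the paper's definitions. On the log scale, the ratio (y_hi^λ - y_med^λ)/(y_med^λ - y_lo^λ) increases with λ, so a bracketing search finds its single crossing of 1.

```python
import math
import statistics


def _ratio(lam, d1, d2):
    """(hi**lam - med**lam) / (med**lam - lo**lam), rewritten with expm1.

    d1 = log(hi) - log(med) and d2 = log(med) - log(lo) are both positive.
    At lam == 0 the ratio is the log-scale limit d1 / d2. For lam != 0 the
    numerator and denominator share a sign, and the ratio increases with lam.
    """
    if lam == 0.0:
        return d1 / d2
    return math.expm1(lam * d1) / -math.expm1(-lam * d2)


def quantile_power_estimate(sample, p, max_abs_lam=32.0):
    """Hypothetical estimator T: the power lam that puts the transformed r-th
    smallest and r-th largest observations equally far from the transformed
    median, with r = floor(n * p) and lam == 0 meaning log y (an assumption,
    not necessarily the paper's exact definition)."""
    y = sorted(sample)
    n = len(y)
    r = math.floor(n * p)
    if r < 1 or 2 * r > n:
        raise ValueError("need 1 <= floor(n * p) <= n / 2")
    lo, hi = y[r - 1], y[n - r]  # r-th smallest and r-th largest
    med = statistics.median(y)
    if not 0 < lo < med < hi:
        raise ValueError("need positive data with lo < median < hi")
    d1 = math.log(hi) - math.log(med)
    d2 = math.log(med) - math.log(lo)

    # Expand [a, b] until the ratio is below 1 at a and above 1 at b.
    a, b = -1.0, 1.0
    while _ratio(a, d1, d2) >= 1.0:
        a *= 2.0
        if -a > max_abs_lam:
            raise ValueError("root below the search range")
    while _ratio(b, d1, d2) <= 1.0:
        b *= 2.0
        if b > max_abs_lam:
            raise ValueError("root above the search range")

    # Bisection: the ratio is monotone increasing in lam, so the root is unique.
    for _ in range(200):
        mid = 0.5 * (a + b)
        if _ratio(mid, d1, d2) < 1.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)
```

Data that are exactly symmetric on the log scale, such as `[math.exp(k) for k in range(-10, 11)]`, give an estimate essentially equal to 0. The squares `[k * k for k in range(1, 22)]` give essentially 0.5, since their square roots are evenly spaced.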

The first property of the estimator T that we need is consistency, which strictly means

$X = \theta_{0.5}(1 + n^{-1/2} W_{0.5})$  (2.6)

[cp{l + n-i( - W0.5) +

expansion of (2.7) about T = λ gives, using (2.8),

(T-A) (ac.log ax + aA log aq)

That is, to first order,

We then use the limiting joint normality of the W's, whose covariance matrix is deter-

g(z) for the trans-

VT(A,p) = A4{g2+?pq(gi2 +g-2) - 2p(glg-1 + g-lg1) + 2p2g-glg-l}

This contradicts the type of result obtained by Draper & Cox, but their results are

3. NORMAL-THEORY MAXIMUM LIKELIHOOD

As pointed out in § 1, previous work on power transformations has assumed the trans-

UA= ,A = -ogYj V IA-(Zt-y)

here E_f denotes expectation with respect to the density f(y) of Y. Note that E = I^{-1} only if
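Section 3 concerns normal-theory maximum likelihood for λ. What follows is a minimal sketch that assumes the usual Box-Cox form. In that form, (y^λ - 1)/λ (log y at λ = 0) is treated as a normal sample with unknown mean and variance. Maximizing out those two parameters leaves the profile log-likelihood -(n/2) log σ̂²(λ) + (λ - 1) Σ log y_i, where the second term is the Jacobian of the transformation. The grid search and the function names are hypothetical. The sketch only computes the estimate and says nothing about its large-sample limit when the normal assumption fails.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1) / lam, read as log y when lam == 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def profile_loglik(sample, lam):
    """Normal-theory log-likelihood for lam after maximizing over the mean and
    variance of the transformed data (additive constants dropped):
    -(n/2) log sigma_hat^2(lam) + (lam - 1) * sum(log y)."""
    n = len(sample)
    z = [box_cox(y, lam) for y in sample]
    mean = sum(z) / n
    s2 = sum((v - mean) ** 2 for v in z) / n  # ML (divide-by-n) variance
    jacobian = (lam - 1.0) * sum(math.log(y) for y in sample)
    return -0.5 * n * math.log(s2) + jacobian


def normal_theory_mle(sample, grid=None):
    """Maximize the profile log-likelihood over a grid of lam values
    (a coarse, hypothetical stand-in for a proper one-dimensional optimizer)."""
    if grid is None:
        grid = [k / 100 for k in range(-300, 301)]  # assumed range -3..3
    return max(grid, key=lambda lam: profile_loglik(sample, lam))
```

For example, take `sample = [math.exp(k / 10) for k in range(-20, 21)]`, whose logarithms are symmetric about zero. For this sample the grid maximizer is λ = 0. The profile likelihood is symmetric in λ, and the variance of the transformed values is smallest at λ = 0.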

r(s,p)= (4(3) (O<

Table 2. Large-sample variance V_T of the quantile transformation

It is interesting to see that rather extreme order statistics give the best precision, p = 0.01
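Simulation offers one way to explore whether extreme order statistics help. The following is an assumption-laden illustration, not a reproduction of Table 2. It draws lognormal samples, so that λ = 0 symmetrizes them. It then applies a hypothetical equitailed estimator `t_estimate` for several values of p and reports the Monte Carlo standard deviation of sqrt(n) T. The sample size, replication count, seed and search range [-16, 16] are arbitrary choices. The printed numbers depend on these choices and on the lognormal assumption, so they should not be read as confirming the p = 0.01 finding.

```python
import math
import random
import statistics


def t_estimate(sample, p):
    """Hypothetical equitailed estimator: the lam for which
    (hi**lam - med**lam) / (med**lam - lo**lam) == 1, with lo and hi the
    r-th smallest and r-th largest observations, r = max(1, floor(n * p))."""
    y = sorted(sample)
    n = len(y)
    r = max(1, math.floor(n * p))
    lo, hi, med = y[r - 1], y[n - r], statistics.median(y)
    d1, d2 = math.log(hi / med), math.log(med / lo)  # gaps on the log scale

    def ratio(lam):
        # The ratio above written with expm1; lam == 0 is the log-scale limit.
        if lam == 0.0:
            return d1 / d2
        return math.expm1(lam * d1) / -math.expm1(-lam * d2)

    a, b = -16.0, 16.0  # assumed search range for lam
    if not ratio(a) < 1.0 < ratio(b):
        raise ValueError("root not bracketed in [-16, 16]")
    for _ in range(100):  # bisection: the ratio increases with lam
        mid = 0.5 * (a + b)
        if ratio(mid) < 1.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def simulated_spread(p, n=200, reps=500, seed=1):
    """Monte Carlo standard deviation of sqrt(n) * T over lognormal samples
    (true lam = 0), a rough stand-in for the large-sample variance."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        values.append(math.sqrt(n) * t_estimate(sample, p))
    return statistics.stdev(values)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.25):
        print(f"p = {p:.2f}: Monte Carlo sd of sqrt(n) * T = {simulated_spread(p):.3f}")
```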

Table 3. Large-sample limit λ_N and variance V_N of the normal-theory maximum likelihood

4.2. Examples with λ = 0

VN = 36(v2A(6) -6v3,(4)-2vg(3)Jt(5) + t3)lt + 7V2j3t+ 9v5) (4 4)

The effect of unknown λ on estimation of at and v is seen from the complete covariance

where r_j = [np_j]; the solution T = 0 is chosen only if

In practice it would be sensible to choose all c_j's positive, particularly if a monotone trans-

$\sum_j c_j (a_j + b_j) = 2 \sum_j c_j$  (5.4)

which would be common to all vectors p if Z_λ is symmetrically distributed. By the same

VT(A,p)= A4[h -2h.5 cp(ch +ohq) )+c2{pq(4"h +G2Ahq)

Al(c) = c(x) x{*f(x)+

A3(C) = fC(X) C(X')[X(1-X') {/f(X) ir(X')+ r(1-X) r(1-X')}

Table 5. Large-sample variance V_T and coefficients

Aii = Yj/Y (j= 1, .. ., r; i = 1,...,I). (7.2)

[Received August 1974. Revised September 1974]
