Capability Indices for Non-Normal Data
To cite this article: D. W. McCormack, Jr. , Ian R. Harris , Arnon M. Hurwitz & Patrick D. Spagon
(2000) CAPABILITY INDICES FOR NON-NORMAL DATA, Quality Engineering, 12:4, 489-495,
DOI: 10.1080/08982110008962614
Ian R. Harris
Department of Mathematics and Statistics
Northern Arizona University
P.O. Box 5717
Flagstaff, Arizona 86011
Arnon M. Hurwitz
Qualtech Productivity Solutions
3rd Floor, Sanclare, Dreyer Street
Claremont 7700 (Cape), South Africa
Patrick D. Spagon
Statistical Methods
Motorola University Southwest
505 Barton Springs Road, Suite 300
Austin, Texas 78704
metric models. Simulation studies are performed indicating that as a distribution becomes more skewed and kurtotic, the alternative PCI shows less bias and has a smaller mean square error than $C_{pk}$.

Capability Indices for Normally Distributed Processes

Process capability indices appear to rely on two general methods to quantify capability. The first method focuses on the proportion of nonconforming observations (i.e., those that fall outside of the specification limits). It is the older and, perhaps, more widely used approach. A more recent method is to define capability in terms of deviation from an intended value. Typically, a loss function is proposed based on some function of the difference between observed and target values. Examples include $C_{pm}$ (1,2) and its variants (see Ref. 3, Chap. 3). To limit the size and scope of the present inquiry, only the former approach will be considered. This choice was based on the authors' preference and should not be deemed an indication of the relative importance of the two methods.

Perhaps the most widely used PCI is $C_p$. It is typically defined as

$$C_p = \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma},$$

where USL is the upper specification limit, LSL is the lower specification limit, and $\sigma$ is the process standard deviation. A natural estimator for $C_p$, $\hat{C}_p$, can be arrived at by replacing $\sigma$ with $s$ and $\mu$ with $\bar{X}$, where

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}.$$

$C_p$ measures the size of the process standard deviation relative to the specification range. Several assumptions are often made about the process being characterized to make $C_p$, and the statistical properties associated with it, more easily interpreted. First, it is assumed that the system on which the measurements are taken is in control. This is required for characterization to reflect accurately system capability rather than performance at some point in time. Second, process observations are taken as independent and identically distributed. Independence ensures that $C_p$ is not affected by correlation among the observations. When autocorrelation exists, $\hat{C}_p$ may underestimate or overestimate $C_p$ (3-5). A third assumption is that the process follows a Normal distribution. Mathematical tractability is a strong reason for using the Normal. In addition, a wealth of information exists detailing the characteristics of Normally distributed random variables. A fourth premise is that the process mean is centered between the USL and LSL. Such an assumption is necessary to establish a relationship between the specification range and the process distribution. Without it, it would be impossible to estimate the proportion of the process distribution that falls within the specification range. Violation of this premise leads to spuriously high $\hat{C}_p$ values.

If the above four assumptions hold, a $\hat{C}_p$ of 1.0 indicates that the specification limits contain approximately 99.73% of the observations.

At times, one may wish to relax the last assumption. This can be accomplished by using an alternative PCI, $C_{pk}$. It can be defined as

$$C_{pk} = \min\left(C_{pl}, C_{pu}\right),$$

where

$$C_{pl} = \frac{\mu - \mathrm{LSL}}{3\sigma}, \qquad C_{pu} = \frac{\mathrm{USL} - \mu}{3\sigma}.$$

As can be seen, the relationship between the process distribution and the specification limits is explicitly established. $C_{pu}$ and $C_{pl}$ can be useful when only one specification limit is of interest. A functional relationship between $C_p$ and $C_{pk}$ exists such that $C_{pk} = (1 - k)C_p$ (6), where $k$ measures a shift in the process mean from the center. $C_{pk}$ can be estimated by replacing the process mean and standard deviation with appropriate estimators that rely on sample observations. The distributional characteristics of $\hat{C}_{pk}$, $\hat{C}_{pl}$, and $\hat{C}_{pu}$ can be found in Ref. 3. To ensure non-negative values for $C_{pk}$, the process mean must fall within the upper and lower specification limits.

Non-Normal Data

The ability of $\hat{C}_p$ and $\hat{C}_{pk}$ to provide an accurate estimate of capability relies on the underlying process being Normally distributed. Given the assumptions stated earlier (e.g., that any nonstationarity in the mean is accounted for), this is possible because the proportion of observations falling outside any specified range is affected by the process variance only. Knowledge of the mean and variance allows one to calculate with certainty where any value or range of values lies within the distribution. This is not necessarily the case for other distributions. The shapes of non-Normal distributions are usually affected by moments not expressible in terms of the mean and variance alone. When this is the case, it is no
longer possible to say accurately how much of the process is in specification, unless these moments are estimated along with the mean and variance of the underlying distribution. Gunter (7) demonstrates this situation by showing three different distributions with identical $\hat{C}_p$ and $\hat{C}_{pk}$ but different proportions of observations falling outside of fixed specification limits. Chan et al. (8) and Kocherlakota et al. (9) provide additional information on the effects of non-Normality on $\hat{C}_p$ and $\hat{C}_{pk}$.

The problems engendered by non-Normality are not new to manufacturing. Kane (10), in a seminal article, mentioned problems associated with non-Normality. Development of capability indices for non-Normal process distributions has received increased attention in the 10 years since. Presently, there are a small number of PCIs that may be used for non-Normal process distributions. They can be characterized by the assumptions they make about the process distribution.

Model-based methods attempt to fit the data to a specific distribution. Two model-based methods use general distributions. Clements (11) uses the Pearson family of distributions to provide a flexible means to fit data. With this method, skew and kurtosis are calculated from sample values. A set of tables is provided that allows one to match the appropriate curve to the data indirectly. Once this is done, process variability is estimated by determining the distributional points where $P(X \le x) = 0.00135$ and $P(X \le x) = 0.99865$. (This is facilitated by a set of tables that accompany Ref. 11.) Page (12) suggests that the same may be accomplished with the Johnson family of distributions (cf. Ref. 13).

A third method is to use mixture distributions to fit a model to data. Mixture distributions are treated extensively in the literature and have gained wide acceptance in the statistical community (cf. Ref. 14). Although these techniques are computationally intensive, the evolution of the computer has made this less of an issue. Johnson et al. (15) suggest a flexible PCI that takes into account possible skew in the process distribution. It relies on using two different standard deviations: one for observations falling below the target value and the other for observations above the target value.

Some approaches make few or no assumptions about the process distribution. Pearn et al. (16) offer a PCI that is similar to that of Clements but has the advantage of being computationally simpler. The assumption is made that the Johnson family of curves can be used to model the process distribution. Unlike Clements, however, they select a denominator for the PCI that minimizes risks caused by skewness. The value $5.15\sigma$ is used as an estimate of process spread rather than $6\sigma$. It can be shown that this substitution results in a PCI that is less affected by skewness in the process distribution. Chan et al. (17) outline a technique that substitutes tolerance limits for process capability. The use of tolerance limits is well established (3) and can be found in the work of Hahn (18,19) and Guenther (20,21). Franklin and Wasserman (22,23) used the bootstrap, a technique developed by Efron (24), to generate confidence intervals for $\hat{C}_p$, $\hat{C}_{pk}$, and $\hat{C}_{pm}$ for non-Normal process distributions.

Empirical Percentile Method

Harris (25) examined the behavior of data-fitting methods on non-Normal data. One of the goals of this study was to find a suitable alternative to $C_{pk}$ for conditions that were inherently non-Normal. He found that in samples of greater than 100 observations, there was little difference between the empirical cumulative distribution function (CDF) and the other methods, in terms of bias of estimated extreme percentiles. This suggests that empirical CDFs may be appropriate models for distributions with as few as 100 observations and that nonparametric forms of PCIs may be used to circumvent the problems encountered by parametric models when the process distribution is not Normal.

The study above led to the development of four PCIs. In each, the parametric estimate of process capability is replaced by a nonparametric approximation based on empirical percentiles. Specifically, these PCIs are

$$C_{np} = \frac{\mathrm{USL} - \mathrm{LSL}}{\hat{\xi}_{99.5} - \hat{\xi}_{0.5}},$$

$$C_{npl} = \frac{\hat{\xi}_{50} - \mathrm{LSL}}{\hat{\xi}_{50} - \hat{\xi}_{0.5}}, \qquad C_{npu} = \frac{\mathrm{USL} - \hat{\xi}_{50}}{\hat{\xi}_{99.5} - \hat{\xi}_{50}},$$

$$C_{npk} = \min\left(C_{npl}, C_{npu}\right),$$

where $\hat{\xi}_i$ is the $i$th empirical percentile.

The choice of the 0.5th and 99.5th percentiles was based on the finding that little difference existed between methods when the sample size was greater than 100. If one defines the $100p$th percentile such that $1/(2n) \le p \le 1 - 1/(2n)$, estimation of both the 0.5th and 99.5th percentiles is possible with 100 or more data points. When two specification limits are present, the process capability estimate represents 99% of the empirical distribution. If only one specification limit is used, this percentage is 49.5%. It is immediately apparent that there is a difference between these percentages and those typically found in conventional PCIs. If, for example, $\hat{C}_{np} = 1.0$, then the minimum expected percent of nonconforming product is 1.0%, not 0.27% as it is for $\hat{C}_p$.
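As an illustration of the percentile-based indices described above, the following Python sketch computes the conventional estimates ($\hat{C}_p$, $\hat{C}_{pk}$) alongside percentile-based counterparts. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical. Empirical percentiles are obtained by interpolating the order statistics at plotting positions $(i - 0.5)/n$, which is one reading of the condition $1/(2n) \le p \le 1 - 1/(2n)$. The one-sided forms use the median in place of the mean, an assumption chosen to be consistent with the 49.5% coverage noted above.

```python
import numpy as np


def empirical_percentile(x, p):
    """Empirical 100p-th percentile via plotting positions (i - 0.5)/n (assumed)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    tol = 1e-12  # guards against floating-point round-off at the boundaries
    if not (1.0 / (2 * n) - tol <= p <= 1.0 - 1.0 / (2 * n) + tol):
        raise ValueError("p must satisfy 1/(2n) <= p <= 1 - 1/(2n)")
    positions = (np.arange(1, n + 1) - 0.5) / n
    return float(np.interp(p, positions, xs))


def parametric_indices(x, lsl, usl):
    """Return (Cp_hat, Cpk_hat) from the sample mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    xbar, s = x.mean(), x.std(ddof=1)
    cp = (usl - lsl) / (6.0 * s)
    cpk = min((xbar - lsl) / (3.0 * s), (usl - xbar) / (3.0 * s))
    return cp, cpk


def percentile_indices(x, lsl, usl):
    """Return (Cnp_hat, Cnpk_hat) from the 0.5th, 50th, and 99.5th percentiles."""
    lo = empirical_percentile(x, 0.005)
    med = empirical_percentile(x, 0.50)
    hi = empirical_percentile(x, 0.995)
    cnp = (usl - lsl) / (hi - lo)
    # Assumed one-sided forms: the median replaces the mean, and the distance
    # from the median to an extreme percentile replaces 3 sigma.
    cnpl = (med - lsl) / (med - lo)
    cnpu = (usl - med) / (hi - med)
    return cnp, min(cnpl, cnpu)


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed, illustrative only
    sample = rng.gamma(shape=2.0, scale=1.0, size=500)  # right-skewed data
    print("parametric:", parametric_indices(sample, lsl=0.0, usl=10.0))
    print("percentile:", percentile_indices(sample, lsl=0.0, usl=10.0))
```

With fewer than 100 observations the 0.5th percentile falls outside the admissible range of $p$, so the sketch raises an error rather than extrapolating beyond the data.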
and $\hat{C}_{pk}$ to $\hat{C}_{npk}$, one can see that, for the Normal distribution, both variance and bias are smaller in all sample sizes. $\hat{C}_{npk}$ shows smaller biases and MSEs but larger variances for the two gamma distributions. This would suggest that as skewness increases, the nonparametric capability indices are preferable to $\hat{C}_{pk}$.

Study 2 involved simulating data from 10 additional distributions. A computer program was written to solve for Johnson family curves with specified skewness and kurtosis. The distributions generated are shown in Table 3, with curve 1 corresponding to the Normal distribution. Ranges selected for $\beta_1$ and $\beta_2$ were based on the results from Study 1. Of interest was the behavior of $\hat{C}_{pk}$ and $\hat{C}_{npk}$ when small to moderate skewness and kurtosis were present. Four different sample sizes, N = 50, 100, 250, and 500, were investigated for each of the 10 curves. Five thousand datasets were simulated for each of the 40 conditions (10 curves times 4 sample sizes). The MSE, bias, and variance were compared for two PCIs, $\hat{C}_{pk}$ and $\hat{C}_{npk}$. The results can be found in Table 4.

Individually, the behavior of each PCI was similar to the results found in Study 1. As the underlying distributions became more non-Normal, bias increased, with the change more pronounced for $\hat{C}_{pk}$. As the sample size increased, the variance decreased for all distributions. The change in variance was greater for $\hat{C}_{npk}$. A comparison of the PCIs shows that in 33 of the 40 cases (82.5%), $\hat{C}_{npk}$ had a smaller bias than $\hat{C}_{pk}$. The seven cases where $\hat{C}_{pk}$ outperformed $\hat{C}_{npk}$ were from either the Normal distribution or from samples of size 50. In all cases, the variance was smaller for $\hat{C}_{pk}$. Approximately half the cases showed $\hat{C}_{npk}$ to have a smaller MSE than $\hat{C}_{pk}$. When the non-Normal cases of sample sizes 250 and 500 are examined, however, $\hat{C}_{npk}$ outperforms $\hat{C}_{pk}$ 67% of the time (12 of 18 cases).

Table 3. Distributions Used in Study 2 (columns: Curve, $\beta_1$, $\beta_2$)

Table 4. MSE, Var, and Bias for Curves from the Johnson Family (columns: Curve, Size, and MSE, Var, Bias for each index)

Conclusion

The above results suggest that $\hat{C}_{npk}$ is a viable estimator of process capability for non-Normal distributions. The drawback of being associated with a different percent nonconforming than $\hat{C}_{pk}$ is outweighed by the benefits provided in its ability to estimate capability accurately when the process distribution is not Normal. Several questions remain unanswered, however. How well would $\hat{C}_{npk}$ perform against other PCIs for non-Normal data? As mentioned previously, there is a class of capability indices that were not considered for this article. Are confidence intervals available for $\hat{C}_{npk}$? This is important in order to provide users of $\hat{C}_{npk}$ with
this study has provided some insight into the use of capability indices for non-Normal process distributions.

References

1. Chan, L. K., Cheng, S. W., and Spiring, F. A., A Graphical Technique for Process Capability, ASQC Quality Congress Transactions, 1988, pp. 268-275.
2. Spiring, F. A., An Application of $C_{pm}$ to the Toolwear Problem, ASQC Quality Congress Transactions, 1989, pp. 123-128.
3. Kotz, S. and Johnson, N. L., Process Capability Indices, Chapman & Hall, New York, 1993.
4. Deligonul, S. Z. and Mergen, E., Dependence Bias in Conventional p-Charts and Its Correction with an Approximate Lot Quality Distribution, J. Appl. Statist., 14(1), 75-81 (1987).
5. Deligonul, S. Z., On the Independence Condition for the Classical Variance Estimator in an Autoregressive Moving Average Process, J. Statist. Comput. Simul., 17(3), 233-236 (1983).
6. Porter, L. J. and Oakland, J. S., Process Capability Indices-An Overview, Qual. Reliab. Eng. Int., 7, 437-447 (1991).
7. Gunter, B. H., The Use and Abuse of $C_{pk}$, Part 2, Qual. Prog., 22(3), 108-109 (1989).
8. Chan, L. K., Cheng, S. W., and Spiring, F. A., A New Measure of Process Capability, $C_{pm}$, J. Qual. Technol., 20(3), 160-175 (1988).
9. Kocherlakota, S., Kocherlakota, K., and Kirmani, S. N. A. U., Process Capability Indices Under Nonnormality, Int. J. Math. Statist. Sci., 1(2), 175-210 (1992).
10. Kane, V. E., Process Capability Indices, J. Qual. Technol., 18(1), 41-52 (1986).
11. Clements, J. A., Process Capability Calculations for Non-Normal Distributions, Qual. Prog., 22, 95-100 (1989).
12. Page, M., Analysis of Non-Normal Process Distributions, Semicond. Int., 10, 88-96 (1994).
13. Johnson, N. L., Systems of Frequency Curves Generated by Methods of Translation, Biometrika, 36, 149-176 (1949).
14. Johnson, N. L. and Kotz, S., Distributions in Statistics: Continuous Univariate Distributions, Houghton Mifflin, New York, 1970.
15. Johnson, N. L., Kotz, S., and Pearn, W. L., Flexible Process Capability Indices, Pakistan J. Statist., 10(1A), 23-31 (1994).
16. Pearn, W. L., Kotz, S., and Johnson, N. L., Distributional and Inferential Properties of Process Capability Indices, J. Qual. Technol., 24(4), 216-231 (1992).
17. Chan, L. K., Cheng, S. W., and Spiring, F. A., The Robustness of Process Capability Index $C_p$ to Departures from Normality, in Statistical Theory and Data Analysis II, edited by K. Matusita, Elsevier Science/North-Holland, Amsterdam, 1988, pp. 223-229.
18. Hahn, G. J., Statistical Intervals for a Normal Population, Part I. Tables, Examples and Applications, J. Am. Statist. Assoc., 2(3), 115-125 (1970).
19. Hahn, G. J., Statistical Intervals for a Normal Population, Part II. Formulas, Assumptions, Some Derivations, J. Qual. Technol., 2(4), 195-206 (1970).
20. Guenther, W. C., Determination of Sample Size for Distribution-Free Tolerance Limits, Am. Statistician, 24, 44-46 (1970).
21. Guenther, W. C., Two-Sided Distribution-Free Tolerance Intervals and Accompanying Sample Size Problems, J. Qual. Technol., 17(1), 40-43 (1985).
22. Franklin, L. A. and Wasserman, G., Bootstrap Confidence Interval Estimates of $C_{pk}$: An Introduction, Commun. Statist.-Simul., 20(1), 231-242 (1991).
23. Franklin, L. A. and Wasserman, G., Bootstrap Lower Confidence Limits for Capability Indices, J. Qual. Technol., 24(4), 196-210 (1992).
24. Efron, B., The Jackknife, the Bootstrap and Other Resampling Plans, Society for Industrial and Applied Mathematics, Philadelphia, 1982.
25. Harris, I. R., Nonnormal Particle Data Project: Phase II, Technical Report, SEMATECH, Austin, TX (1994).
26. Nelson, P. R., Editorial, J. Qual. Technol., 24(4), 175 (1992).
27. Casella, G. and Berger, R. L., Statistical Inference, Wadsworth and Brooks/Cole, Pacific Grove, CA, 1990.

About the Authors: D. W. McCormack, Jr. is a statistician at SEMATECH where he is responsible for developing and teaching statistics courses and consulting. He is a member of the American Statistical Association.

Ian R. Harris is associate professor in Northern Arizona University's Department of Mathematics and Statistics. His publications have appeared in Biometrika, Statistics and Probability Letters, and the Journal of Applied Statistics.

Arnon M. Hurwitz is founder and director of Qualtech Productivity Solutions, an international productivity consulting firm in Cape Town, South Africa. He is a member of the American Society for Quality.

Patrick D. Spagon works at Motorola University's College of Statistical Methods. He is a member of the American Statistical Association and the American Society for Quality.