
Quality Engineering

ISSN: 0898-2112 (Print) 1532-4222 (Online) Journal homepage: https://www.tandfonline.com/loi/lqen20

CAPABILITY INDICES FOR NON-NORMAL DATA

D. W. McCormack, Jr., Ian R. Harris, Arnon M. Hurwitz & Patrick D. Spagon

To cite this article: D. W. McCormack, Jr., Ian R. Harris, Arnon M. Hurwitz & Patrick D. Spagon (2000) CAPABILITY INDICES FOR NON-NORMAL DATA, Quality Engineering, 12:4, 489-495, DOI: 10.1080/08982110008962614

To link to this article: https://doi.org/10.1080/08982110008962614

Published online: 29 May 2007.


D. W. McCormack, Jr.
Sematech
2706 Montopolis Drive
Austin, Texas 78741

Ian R. Harris
Department of Mathematics and Statistics
Northern Arizona University
P.O. Box 5717
Flagstaff, Arizona 86011

Arnon M. Hurwitz
Qualtech Productivity Solutions
3rd Floor, Sanclare, Dreyer Street
Claremont 7700 (Cape), South Africa

Patrick D. Spagon
Statistical Methods
Motorola University Southwest
505 Barton Springs Road, Suite 300
Austin, Texas 78704

Copyright © 2000 by Marcel Dekker, Inc.

Key Words

Non-Normal processes; Process capability; Nonparametric process capability indices.

Introduction

Process capability indices (PCIs) have received extensive treatment in professional, technical, and academic literature. The most common and earliest forms of PCIs assume that the process under examination is Normally distributed. Violation of this assumption often leads to inappropriate or misleading results. Non-Normally distributed processes frequently occur in practice, however. Two examples often found in high-purity manufacturing are particle count distributions and process yield data. Although the problems associated with non-Normality have been recognized for some time, only recently has research focused on developing PCIs for such situations.

This investigation serves as a brief overview of PCIs and examines the feasibility of using nonparametric PCIs based on empirical percentiles. Current research shows that empirical cumulative distribution functions are appropriate models for distributions with as few as 100 observations. This suggests that nonparametric PCIs may be used to circumvent problems encountered by parametric models. Simulation studies indicate that as a distribution becomes more skewed and kurtotic, the alternative PCI shows less bias and has a smaller mean square error than $\hat{C}_{pk}$.

Capability Indices for Normally Distributed Processes

Process capability indices appear to rely on two general methods to quantify capability. The first method focuses on the proportion of nonconforming observations (i.e., those that fall outside of the specification limits). It is the older and, perhaps, more widely used approach. A more recent method is to define capability in terms of deviation from an intended value. Typically, a loss function is proposed based on some function of the difference between observed and target values. Examples include $C_{pm}$ (1,2) and its variants (see Ref. 3, Chap. 3). To limit the size and scope of the present inquiry, only the former approach will be considered. This choice was based on the authors' preference and should not be deemed an indication of the relative importance of the two methods.

Perhaps the most widely used PCI is $C_p$. It is typically defined as

$$C_p = \frac{USL - LSL}{6\sigma},$$

where USL is the upper specification limit, LSL is the lower specification limit, and $\sigma$ is the process standard deviation. A natural estimator for $C_p$, $\hat{C}_p$, can be arrived at by replacing $\sigma$ with $s$ (and, in indices that involve the mean, $\mu$ with $\bar{X}$), where

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}.$$

$C_p$ measures the size of the process standard deviation relative to the specification range. Several assumptions are often made about the process being characterized to make $C_p$, and the statistical properties associated with it, more easily interpreted. First, it is assumed that the system on which the measurements are taken is in control. This is required for the characterization to reflect system capability accurately rather than performance at some point in time. Second, process observations are taken as independent and identically distributed. Independence ensures that $\hat{C}_p$ is not affected by correlation among the observations. When autocorrelation exists, $\hat{C}_p$ may underestimate or overestimate $C_p$ (3-5). A third assumption is that the process follows a Normal distribution. Mathematical tractability is a strong reason for using the Normal. In addition, a wealth of information exists detailing the characteristics of Normally distributed random variables. A fourth premise is that the process mean is centered between the USL and LSL. Such an assumption is necessary to establish a relationship between the specification range and the process distribution. Without it, it would be impossible to estimate the proportion of the process distribution that falls within the specification range. Violation of this premise leads to spuriously high $\hat{C}_p$ values. If the above four assumptions hold, a $C_p$ of 1.0 indicates that the specification limits lie at $\mu \pm 3\sigma$ and contain approximately 99.73% of the observations.

At times, one may wish to relax the last assumption. This can be accomplished by using an alternative PCI, $C_{pk}$. It can be defined as

$$C_{pk} = \min\left(C_{pl},\, C_{pu}\right),$$

where

$$C_{pl} = \frac{\mu - LSL}{3\sigma}, \qquad C_{pu} = \frac{USL - \mu}{3\sigma}.$$

As can be seen, the relationship between the process distribution and the specification limits is explicitly established. $C_{pu}$ and $C_{pl}$ can be useful when only one specification limit is of interest. A functional relationship between $C_p$ and $C_{pk}$ exists such that $C_{pk} = (1 - k)C_p$ (6), where $k = |m - \mu| / [(USL - LSL)/2]$ measures the shift of the process mean from the center of the specification range, $m = (USL + LSL)/2$. $C_{pk}$ can be estimated by replacing the process mean and standard deviation with appropriate estimators that rely on sample observations. The distributional characteristics of $\hat{C}_p$, $\hat{C}_{pl}$, $\hat{C}_{pu}$, and $\hat{C}_{pk}$ can be found in Ref. 3. To ensure non-negative values for $C_{pk}$, the process mean must fall within the upper and lower specification limits.

Non-Normal Data

The ability of $\hat{C}_p$ and $\hat{C}_{pk}$ to provide an accurate estimate of capability relies on the underlying process being Normally distributed. Given the assumptions stated earlier (e.g., that any nonstationarity in the mean is accounted for), this is possible because, for a Normal process, the proportion of observations falling outside any specified range is then determined by the variance alone. Knowledge of the mean and variance allows one to calculate with certainty where any value or range of values lies within the distribution. This is not necessarily the case for other distributions. The shapes of non-Normal distributions are usually affected by moments not expressible in terms of the mean and variance alone. When this is the case, it is no
longer possible to say accurately how much of the process is in specification, unless these moments are estimated along with the mean and variance of the underlying distribution. Gunter (7) demonstrates this situation by showing three different distributions with identical $C_p$ and $C_{pk}$ values but different proportions of observations falling outside of fixed specification limits. Chan et al. (8) and Kocherlakota et al. (9) provide additional information on the effects of non-Normality on $\hat{C}_p$ and $\hat{C}_{pk}$.

The problems engendered by non-Normality are not new to manufacturing. Kane (10), in a seminal article, mentioned problems associated with non-Normality. Development of capability indices for non-Normal process distributions has received increased attention in the 10 years since. Presently, there are a small number of PCIs that may be used for non-Normal process distributions. They can be characterized by the assumptions they make about the process distribution.

Model-based methods attempt to fit the data to a specific distribution. Two model-based methods use general families of distributions. Clements (11) uses the Pearson family of distributions to provide a flexible means to fit data. With this method, skewness and kurtosis are calculated from sample values. A set of tables is provided that allows one to match the appropriate curve to the data indirectly. Once this is done, process variability is estimated by determining the distributional points where $P(X \le x) = 0.00135$ and $P(X \le x) = 0.99865$. (This is facilitated by a set of tables that accompany Ref. 11.) Page (12) suggests that the same may be accomplished with the Johnson family of distributions (cf. Ref. 13).

A third method is to use mixture distributions to fit a model to data. Mixture distributions are treated extensively in the literature and have gained wide acceptance in the statistical community (cf. Ref. 14). Although these techniques are computationally intensive, advances in computing have made this less of an issue. Johnson et al. (15) suggest a flexible PCI that takes into account possible skew in the process distribution. It relies on using two different standard deviations: one for observations falling below the target value and the other for observations above the target value.

Some approaches make few or no assumptions about the process distribution. Pearn et al. (16) offer a PCI similar to that of Clements that has the advantage of being computationally simpler. The assumption is made that the Johnson family of curves can be used to model the process distribution. Unlike Clements, however, they select a denominator for the PCI that minimizes risks caused by skewness: $5.15\sigma$, rather than $6\sigma$, is used as the measure of process spread. It can be shown that this substitution results in a PCI that is less affected by skewness in the process distribution. Chan et al. (17) outline a technique that substitutes tolerance limits for process capability. The use of tolerance limits is well established (3) and can be found in the work of Hahn (18,19) and Guenther (20,21). Franklin and Wasserman (22,23) used the bootstrap, a technique developed by Efron (24), to generate confidence intervals for $\hat{C}_p$, $\hat{C}_{pk}$, and $\hat{C}_{pm}$ for non-Normal process distributions.

Empirical Percentile Method

Harris (25) examined the behavior of data-fitting methods on non-Normal data. One of the goals of this study was to find a suitable alternative to $C_{pk}$ for conditions that were inherently non-Normal. He found that in samples of more than 100 observations, there was little difference between the empirical cumulative distribution function (CDF) and the other methods in terms of bias of estimated extreme percentiles. This suggests that empirical CDFs may be appropriate models for distributions with as few as 100 observations and that nonparametric forms of PCIs may be used to circumvent the problems encountered by parametric models when the process distribution is not Normal.

The study above led to the development of four PCIs. In each, the parametric estimate of process capability is replaced by a nonparametric approximation based on empirical percentiles. Specifically, these PCIs are

$$C_{np} = \frac{USL - LSL}{\xi_{99.5} - \xi_{0.5}},$$

where $\xi_i$ is the $i$th empirical percentile.

The choice of the 0.5th and 99.5th percentiles was based on the finding that little difference existed between methods when the sample size was greater than 100. If the $100p$th percentile is defined only for $1/(2n) \le p \le 1 - 1/(2n)$, estimation of both the 0.5th and 99.5th percentiles is possible with 100 or more data points (with $n = 100$, $1/(2n) = 0.005$). When two specification limits are present, the process capability estimate represents 99% of the empirical distribution. If only one specification limit is used, this percentage is 49.5%. It is immediately apparent that these percentages differ from those typically found in conventional PCIs. If, for example, $C_{np} = 1.0$, then the minimum expected percent of nonconforming product is 1.0%, not 0.27% as it is for $C_p$.
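
The estimators discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' S-PLUS code: it computes the normal-theory $\hat{C}_p$ and $\hat{C}_{pk}$ (replacing $\sigma$ with $s$ and $\mu$ with $\bar{X}$) alongside the percentile-based $C_{np}$. Two choices are assumptions and are labeled in the code: the empirical percentile places order statistic $k$ at $(k - 0.5)/n$ and interpolates linearly, one rule that is defined exactly on $1/(2n) \le p \le 1 - 1/(2n)$; and the one-sided forms are a hypothetical median-based analogue, picked only because each one-sided ratio then spans the 49.5% of the empirical distribution mentioned above. All function names are illustrative.

```python
import math
import random
import statistics


def empirical_percentile(data, p, eps=1e-12):
    """Return the 100p-th empirical percentile of data.

    Assumption: order statistic k (1-based) sits at p = (k - 0.5)/n and values
    in between are linearly interpolated, so the percentile exists only for
    1/(2n) <= p <= 1 - 1/(2n), the range quoted in the text.
    """
    xs = sorted(data)
    n = len(xs)
    if p < 1 / (2 * n) - eps or p > 1 - 1 / (2 * n) + eps:
        raise ValueError(f"p={p} is outside [1/(2n), 1 - 1/(2n)] for n={n}")
    h = min(max(n * p + 0.5, 1.0), float(n))  # fractional 1-based rank
    k = math.floor(h)
    if k == n:
        return xs[-1]
    return xs[k - 1] + (h - k) * (xs[k] - xs[k - 1])


def normal_theory_indices(data, lsl, usl):
    """Return (Cp_hat, Cpk_hat) using x-bar and s in place of mu and sigma."""
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation, n - 1 divisor
    cp = (usl - lsl) / (6 * s)
    cpl = (xbar - lsl) / (3 * s)
    cpu = (usl - xbar) / (3 * s)
    return cp, min(cpl, cpu)


def percentile_indices(data, lsl, usl):
    """Return (Cnp, Cnpk) built from empirical percentiles.

    Cnp follows the definition in the text. The one-sided forms below are a
    hypothetical median-based analogue (each spans 49.5% of the data), an
    assumption rather than formulas reproduced from the paper.
    """
    q_lo = empirical_percentile(data, 0.005)
    q_med = empirical_percentile(data, 0.5)
    q_hi = empirical_percentile(data, 0.995)
    cnp = (usl - lsl) / (q_hi - q_lo)
    cnpl = (q_med - lsl) / (q_med - q_lo)  # hypothetical lower-side index
    cnpu = (usl - q_med) / (q_hi - q_med)  # hypothetical upper-side index
    return cnp, min(cnpl, cnpu)


if __name__ == "__main__":
    random.seed(1)
    # Gamma(shape=16, scale=1) sample; the limits are the Gamma(16, 1)
    # values used in the simulation study described in this article.
    sample = [random.gammavariate(16.0, 1.0) for _ in range(500)]
    lsl, usl = 6.599174, 30.688759
    cp, cpk = normal_theory_indices(sample, lsl, usl)
    cnp, cnpk = percentile_indices(sample, lsl, usl)
    print(f"Cp={cp:.3f}  Cpk={cpk:.3f}  Cnp={cnp:.3f}  Cnpk={cnpk:.3f}")
```

The two families are not expected to agree numerically even for Normal data: the 0.5th and 99.5th percentiles of a Normal lie at about $\mu \pm 2.576\sigma$, so $C_{np} \approx 6/5.152 \cdot C_p \approx 1.16\,C_p$, which is the percentage difference discussed above expressed as a ratio.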

Discussion

One must proceed cautiously when employing capability indices. Use of a single measure to describe distributional characteristics is tenuous. As Nelson (26) argues, such an approach is "fundamentally flawed." Unfortunately, this has little effect on the use of these statistics in nonacademic settings. The situation often faced by the industrial statistician is how to make best use of an imperfect measure.

Monte Carlo Simulations

Two studies, both using Monte Carlo simulation, were performed to investigate the characteristics of $\hat{C}_p$ and $\hat{C}_{np}$ when samples were drawn from distributions with varying degrees of non-Normality. The first study looked at three distributions, the standard Normal, $\Gamma(16, 1)$, and $\Gamma(0.25, 1)$, where $\Gamma(a, b)$ corresponds to the gamma distribution with shape parameter $a$, whose density involves the gamma function

$$\Gamma(a) = \int_0^\infty x^{a-1} \exp(-x)\, dx,$$

and four sample sizes, N = 50, 100, 250, and 500. All combinations of distributions and sample sizes were used.

Measures of skewness and kurtosis were defined as

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \qquad \text{and} \qquad \beta_2 = \frac{\mu_4}{\mu_2^2},$$

respectively, where $\mu_2$, $\mu_3$, and $\mu_4$ are the second, third, and fourth central moments. Skewness and kurtosis values for the three distributions in Study 1 can be found in Table 1.

Five thousand data sets were generated for each of the 12 experimental conditions. Version 3.2 of S-PLUS was used to generate the data and perform the analysis.

Table 1. Distributions Used in Study 1

Distribution    Skewness (β1)    Kurtosis (β2)
N(0, 1)         0                3
Γ(16, 1)        0.25             3.375
Γ(0.25, 1)      16               27

Specification limits for the distributions were chosen so that 99.73% of the data fell between them, the proportion lying within $\pm 3\sigma$ of the mean of a Normal distribution; that is, the USL was set to the value at which $P(X \le x) = 0.99865$, where $X$ is the random variable corresponding to the distribution being simulated. Likewise, the LSL was selected to be the value at which $P(X \le x) = 0.00135$. For the standard Normal, $\Gamma(16, 1)$, and $\Gamma(0.25, 1)$ distributions, the USLs were 3, 30.688759, and 0.482133, and the LSLs were -3, 6.599174, and $2.241239 \times 10^{-12}$, respectively.

The mean squared error (MSE), variance (Var), and bias were calculated for all estimators. The formulas for MSE, variance, and bias are, respectively,

$$\mathrm{MSE} = E\left[(\hat{\theta} - \theta)^2\right], \qquad \mathrm{Var} = E\left[(\hat{\theta} - \bar{\theta})^2\right], \qquad \mathrm{Bias} = \sqrt{\mathrm{MSE} - \mathrm{Var}},$$

where $\theta$ is the true parameter, $\hat{\theta}$ is the estimate of $\theta$, and $\bar{\theta}$ is the mean of the estimate of $\theta$ (26). The MSE and variance can be estimated by their averages over the simulated data sets, which is only possible when the true value of the parameter $\theta$ is known a priori. When determining the quality of an estimator, these three statistics are often compared. The MSE gives the mean squared difference between the estimator and the true parameter. The variance is the average squared difference between the estimator and its mean. It is a measure of how much stochastic noise is associated with the estimator, independent of the quantity it is estimating. Bias is an indication of how accurate the estimator is: it measures how far the mean of the estimator lies from the parameter, independent of the variability of the estimator. As the sample size increases, an unbiased estimator whose variance shrinks will more closely approximate the value of the parameter. An unbiased estimator may have a large variance, and vice versa. It can be shown that the MSE is the sum of the squared bias and the variance of an estimator (27).

Results from Study 1 appear in Table 2. The bias increases for both PCIs as the distributions become more non-Normal. The variance, which accounts for a smaller portion of MSE, decreases or remains the same. Increases in sample size lead to smaller variances for both PCIs. Decreases are more rapid for $\hat{C}_p$, however. Bias does not appear to change with sample size for $\hat{C}_p$. The results for $\hat{C}_{np}$ indicate that bias increases with sample size for the non-Normal distributions. Although this change is small, it is counterintuitive and warrants further investigation. $\hat{C}_p$ outperforms $\hat{C}_{np}$ for the Normal distribution. Comparing
$\hat{C}_p$ to $\hat{C}_{np}$, one can see that, for the Normal distribution, both variance and bias are smaller for $\hat{C}_p$ in all sample sizes. $\hat{C}_{np}$ shows smaller biases and MSEs but larger variances for the two gamma distributions. This would suggest that as skewness increases, the nonparametric capability indices are preferable to $\hat{C}_p$.

Table 2. MSE, Var, and Bias for $\hat{C}_p$ and $\hat{C}_{np}$

                    N(0, 1)           Γ(16, 1)          Γ(0.25, 1)
Size   Statistic    Ĉp      Ĉnp       Ĉp      Ĉnp       Ĉp      Ĉnp
50     MSE          0.012   0.106
       Var          0.011   0.033
       Bias         0.020   0.290
100    MSE          0.006   0.063
       Var          0.006   0.020
       Bias         0.020   0.207
250    MSE          0.002   0.020
       Var          0.002   0.013
       Bias         0.014   0.085
500    MSE          0.001   0.019
       Var          0.001   0.006
       Bias         0.011   0.113

Study 2 involved simulating data from 10 additional distributions. A computer program was written to solve for Johnson family curves with specified skewness and kurtosis. The distributions generated are shown in Table 3, with curve 1 corresponding to the Normal distribution. Ranges selected for $\beta_1$ and $\beta_2$ were based on the results from Study 1. Of interest was the behavior of $\hat{C}_{pk}$ and $\hat{C}_{npk}$ when small to moderate skewness and kurtosis were present. Four different sample sizes, N = 50, 100, 250, and 500, were investigated for each of the 10 curves. Five thousand data sets were simulated for each of the 40 conditions (10 curves times 4 sample sizes). The MSE, bias, and variance were compared for two PCIs, $\hat{C}_{pk}$ and $\hat{C}_{npk}$. The results can be found in Table 4.

Table 3. Distributions Used in Study 2

Curve   β1   β2      Curve   β1   β2

Individually, the behavior of each PCI was similar to the results found in Study 1. As the underlying distributions became more non-Normal, bias increased, with the change more pronounced for $\hat{C}_{pk}$. As the sample size increased, the variance decreased for all distributions. The change in variance was greater for $\hat{C}_{npk}$. A comparison of the PCIs shows that in 33 of the 40 cases (82.5%), $\hat{C}_{npk}$ had a smaller bias than $\hat{C}_{pk}$. The seven cases where $\hat{C}_{pk}$ outperformed $\hat{C}_{npk}$ were from either the Normal distribution or from samples of size 50. In all cases, the variance was smaller for $\hat{C}_{pk}$. Approximately half the cases showed $\hat{C}_{npk}$ to have a smaller MSE than $\hat{C}_{pk}$. When the non-Normal cases of sample sizes 250 and 500 are examined, however, $\hat{C}_{npk}$ outperforms $\hat{C}_{pk}$ 67% of the time (12 of 18 cases).

Table 4. MSE, Var, and Bias for Curves from the Johnson Family

Conclusion

The above results suggest that $\hat{C}_{npk}$ is a viable estimator of process capability for non-Normal distributions. The drawback of being associated with a different percent nonconforming than $\hat{C}_{pk}$ is outweighed by its ability to estimate capability accurately when the process distribution is not Normal. Several questions remain unanswered, however. How well would $\hat{C}_{npk}$ perform against other PCIs for non-Normal data? As mentioned previously, there is a class of capability indices that was not considered for this article. Are confidence intervals available for $\hat{C}_{npk}$? This is important in order to provide users of $\hat{C}_{npk}$ with more than just point estimates. Finally, how is $\hat{C}_{npk}$ affected by correlated observations? It was assumed throughout the study that observations were uncorrelated. Although these and other questions remain, it is the authors' hope that
this study has provided some insight into the use of capability indices for non-Normal process distributions.

References

1. Chan, L. K., Cheng, S. W., and Spiring, F. A., A Graphical Technique for Process Capability, ASQC Quality Congress Transactions, 1988, pp. 268-275.
2. Spiring, F. A., An Application of $C_{pm}$ to the Toolwear Problem, ASQC Quality Congress Transactions, 1989, pp. 123-128.
3. Kotz, S. and Johnson, N. L., Process Capability Indices, Chapman & Hall, New York, 1993.
4. Deligonul, S. Z. and Mergen, E., Dependence Bias in Conventional p-Charts and Its Correction with an Approximate Lot Quality Distribution, J. Appl. Statist., 14(1), 75-81 (1987).
5. Deligonul, S. Z., On the Independence Condition for the Classical Variance Estimator in an Autoregressive Moving Average Process, J. Statist. Comput. Simul., 17(3), 233-236 (1983).
6. Porter, L. J. and Oakland, J. S., Process Capability Indices: An Overview, Qual. Reliab. Eng. Int., 7, 437-447 (1991).
7. Gunter, B. H., The Use and Abuse of $C_{pk}$, Part 2, Qual. Prog., 22(3), 108-109 (1989).
8. Chan, L. K., Cheng, S. W., and Spiring, F. A., A New Measure of Process Capability: $C_{pm}$, J. Qual. Technol., 20(3), 160-175 (1988).
9. Kocherlakota, S., Kocherlakota, K., and Kirmani, S. N. A. U., Process Capability Indices Under Nonnormality, Int. J. Math. Statist. Sci., 1(2), 175-210 (1992).
10. Kane, V. E., Process Capability Indices, J. Qual. Technol., 18(1), 41-52 (1986).
11. Clements, J. A., Process Capability Calculations for Non-Normal Distributions, Qual. Prog., 22, 95-100 (1989).
12. Page, M., Analysis of Non-Normal Process Distributions, Semicond. Int., 10, 88-96 (1994).
13. Johnson, N. L., Systems of Frequency Curves Generated by Methods of Translation, Biometrika, 36, 149-176 (1949).
14. Johnson, N. L. and Kotz, S., Distributions in Statistics: Continuous Univariate Distributions, Houghton Mifflin, New York, 1970.
15. Johnson, N. L., Kotz, S., and Pearn, W. L., Flexible Process Capability Indices, Pakistan J. Statist., 10(1A), 23-31 (1994).
16. Pearn, W. L., Kotz, S., and Johnson, N. L., Distributional and Inferential Properties of Process Capability Indices, J. Qual. Technol., 24(4), 216-231 (1992).
17. Chan, L. K., Cheng, S. W., and Spiring, F. A., The Robustness of Process Capability Index $C_p$ to Departures from Normality, in Statistical Theory and Data Analysis, II, edited by K. Matusita, Elsevier Science/North-Holland, Amsterdam, 1988, pp. 223-229.
18. Hahn, G. J., Statistical Intervals for a Normal Population, Part I. Tables, Examples and Applications, J. Am. Statist. Assoc., 2(3), 115-125 (1970).
19. Hahn, G. J., Statistical Intervals for a Normal Population, Part II. Formulas, Assumptions, Some Derivations, J. Qual. Technol., 2(4), 195-206 (1970).
20. Guenther, W. C., Determination of Sample Size for Distribution-Free Tolerance Limits, Am. Statistician, 24, 44-46 (1970).
21. Guenther, W. C., Two-Sided Distribution-Free Tolerance Intervals and Accompanying Sample Size Problems, J. Qual. Technol., 17(1), 40-43 (1985).
22. Franklin, L. A. and Wasserman, G., Bootstrap Confidence Interval Estimates of $C_{pk}$: An Introduction, Commun. Statist.-Simul., 20(1), 231-242 (1991).
23. Franklin, L. A. and Wasserman, G., Bootstrap Lower Confidence Limits for Capability Indices, J. Qual. Technol., 24(4), 196-210 (1992).
24. Efron, B., The Jackknife, the Bootstrap and Other Resampling Plans, Society for Industrial and Applied Mathematics, Philadelphia, 1982.
25. Harris, I. R., Nonnormal Particle Data Project: Phase II, Technical Report, SEMATECH, Austin, TX (1994).
26. Nelson, P. R., Editorial, J. Qual. Technol., 24(4), 175 (1992).
27. Casella, G. and Berger, R. L., Statistical Inference, Wadsworth and Brooks/Cole, Pacific Grove, CA, 1990.

About the Authors: D. W. McCormack, Jr. is a statistician at SEMATECH, where he is responsible for developing and teaching statistics courses and for consulting. He is a member of the American Statistical Association.

Ian R. Harris is associate professor in Northern Arizona University's Department of Mathematics and Statistics. His publications have appeared in Biometrika, Statistics and Probability Letters, and the Journal of Applied Statistics.

Arnon M. Hurwitz is founder and director of Qualtech Productivity Solutions, an international productivity consulting firm in Cape Town, South Africa. He is a member of the American Society for Quality.

Patrick D. Spagon works at Motorola University's College of Statistical Methods. He is a member of the American Statistical Association and the American Society for Quality.
