Advantages of Bootstrap Forest For Yield Analysis
SAS White Paper
Table of Contents
Introduction
Problem
  Description of various control stages
    Online control
    Parametric test
    EWS (electrical wafer sort)
  Analytical methodologies
    Problem classification
    Data preparation
Statistical analysis methods
  Traditional methods
  Multivariate analysis
    Principal component analysis
  Predictive modeling
    Regression
    Partitioning
    The bootstrap forest
    The boosted tree
    Neural networks
    Validation
Case studies
  Case study 1: Root-cause identification (smile signature)
    Description of problem and impact
    Data
    Analysis
    Conclusion
  Case study 2: Correlation between EWS and PT parameter
    Description of problem and impact
    Data
    Analysis
    Conclusion
Summary
References
Introduction
The semiconductor industry is becoming increasingly competitive, forcing manufacturers to achieve significant reductions in time to market. As a result, every step in the manufacturing process needs to be completed in less time while maintaining a high level of control and quality. This must be accomplished while meeting an increasing number of customer requirements, particularly for products aimed at the automotive industry and the medical sector.
Explaining any variation in yield is a challenge for the yield engineer insofar as there can
be multiple causes of variability, including variations in process parameters, unexpected
and unmonitored manufacturing events, defective equipment, etc. The engineer must
have access to high-performance tools and methods that make it possible to get to the
root of the problem as quickly and reliably as possible. Given the significant quantity of
data collected at every stage of a manufacturing lot and the sometimes-limited statistics
available to describe the problem, traditional techniques are not always adequate for
resolving the issues faced by the yield engineer. For some years, data mining has proved
a highly effective supplement to these techniques.
In this paper, we will use a number of practical examples to demonstrate the use of
partitioning techniques. In particular, we will focus on the “bootstrap forest” method
available in JMP® Pro for root-cause analysis. Sections two and three explain the
problem and describe the techniques used; we will then present the results obtained
using two case studies. The first involves using partitioning for root-cause identification
in the case of a loss of electrical yield that is not detected during the manufacturing
process; the second examines a variation in electrical yield detected during
manufacturing.
Problem
Description of various control stages
Online control
The volume and complexity of data in the semiconductor industry are significant.
Creating the final product involves a manufacturing process made up of several hundred
steps. These are divided into macro steps involving areas including photolithography,
etching, implantation and CMP (chemical mechanical polishing). A distinction is drawn
between FEOL (front end of line) steps, which define the active parts of transistors, and
BEOL (back end of line) steps, which establish the interconnections between different
transistors using contacts and wire lines.
All of these steps are controlled online, either in real time or after the event, with the help
of the traditional techniques used by process control: SPC (statistical process control)
and APC (advanced process control). Metrological data is gathered as part of the control
process with respect to processing equipment (chamber temperature, pressure, etc.),
along with physical data measured on a lot-by-lot basis, such as thickness, dimensions,
physicochemical characteristics, etc. All data is recorded in EDA (engineering data
analysis) databases and must comply with the specifications agreed when defining the
processes and equipment; any variation must be individually analyzed at the relevant
point. In spite of this, physical measurements are not sufficient to ensure that the
manufacturing process has been completed correctly: Defects that are not identified by
physical measurements may appear when components are electrically activated.
There are two additional steps used to sort chips at the end of the manufacturing
process: the parametric test and the electrical wafer sort, both electrical tests. The
first of these relates to elementary components (transistors, resistors, capacitors, etc.), which are produced in the cutting (scribe) lines and placed on the wafer in N geographical areas (a sampling of N = 5, 9 or 17 areas). In the second case there is no sampling: Functional
measurements are carried out on all chips. There is also one final sorting step carried
out at assembly sites, known as a final test, which is used to eliminate chips that are
found to be defective after they have been placed in a casing. The entire cycle can be
summarized in Figure 1.
Parametric test
The parametric test is carried out at the end of the manufacturing process and involves
electrical testing of the elementary components (resistors, transistors, capacitors, etc.) in
structures known as TEGs (test element groups). These are repeated at several points
on the wafer to measure its uniformity (see Figure 2). Several parameters are measured
for these components. They are aggregated at several different levels, including
individual values per site, average per wafer, average per lot, etc. All of this information is
recorded in databases. At this stage, wafers that do not comply with specifications are
rejected.
EWS (electrical wafer sort)
The electrical wafer sort is an electrical sorting step to ensure that all chips are
electrically functional in accordance with the client’s specifications. Figure 3 illustrates
the EWS test process. Each chip is subjected to a sequence of tests combined in
various subtests (or subprograms) identified by a whole number known as a BIN. If
a particular subtest fails to meet its criteria, the test sequence is stopped and the chip (identified by its X,Y coordinates on the wafer) is allocated the corresponding BIN number. By default, the BIN number for a chip that passes all the tests successfully is 1.
Wafer map
A wafer map can be produced based on the chip coordinates and its BIN number.
Each BIN (associated with a failed chip) is represented with a color; conventionally,
BIN 1 is shown in green as in Figure 4.
Yield calculation
The yield of the wafer (as a percent) is calculated based on the following formula:

Yield (%) = (number of good chips (BIN 1) / total number of chips tested) × 100
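As a minimal illustration (outside the JMP workflow described in this paper), the same calculation can be written in a few lines of Python; the table, column names and BIN values below are hypothetical.

```python
import pandas as pd

# Hypothetical per-chip EWS results: one row per tested chip,
# with its wafer, X/Y coordinates and the BIN number it was assigned.
test_results = pd.DataFrame({
    "wafer_id": ["W01"] * 6 + ["W02"] * 6,
    "x": [0, 1, 2, 0, 1, 2] * 2,
    "y": [0, 0, 0, 1, 1, 1] * 2,
    "bin_number": [1, 1, 7, 1, 1, 1, 1, 10, 10, 1, 1, 7],
})

# Yield (%) = good chips (BIN 1) / total chips tested * 100, per wafer.
wafer_yield = (
    test_results.assign(good=lambda d: d["bin_number"].eq(1))
    .groupby("wafer_id")["good"]
    .mean()
    .mul(100)
)
print(wafer_yield)
```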
All test data (yield, BIN, test conditions, etc.) are recorded in the EDA database in
the same way as for the parametric test, with several levels of aggregation: individual
value per chip, average per wafer, average per lot, product, etc. The role of the device
engineer is to monitor product yields: They are responsible for increasing yield to its
theoretical maximum as quickly as possible at the start of production. They must
also react quickly to identify any sudden variation in yield and analyze the root causes
of the latest anomalies not identified by the SPC process. The first step in these
analyses is to categorize the problems.
Analytical methodologies
The first stage is to identify the anomalies using a BIN Pareto analysis, as shown in
Figure 5. This provides a means of focusing on the BINs that are most representative
in terms of loss of yield.
Figure 5. Pareto analysis of BINs with associated signatures (BIN count versus BIN number).
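The following sketch shows how such a BIN Pareto could be computed in Python from per-chip results; the data and column name are hypothetical, and the paper's own analysis was produced in JMP.

```python
import pandas as pd

# Hypothetical per-chip results; BIN 1 = pass, any other BIN = a failed subtest.
ews = pd.DataFrame({"bin_number": [1, 1, 7, 10, 10, 7, 7, 13, 1, 10, 7, 1]})

failed = ews.loc[ews["bin_number"] != 1, "bin_number"]
pareto = failed.value_counts().to_frame("count")
pareto["cum_percent"] = pareto["count"].cumsum() / pareto["count"].sum() * 100
print(pareto)  # BINs ranked by how much of the yield loss they account for
```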
Once the main anomalies have been identified, further analysis is carried out based
on three key steps: classification, data extraction and the choice of relevant statistical
analysis models.
Problem classification
Classification is an essential step in narrowing down the problem, whether it involves
defining the population affected by the crisis (for comparison with a healthy population)
or describing its signature.
Data preparation
Figure 8 shows an example of a data table used for yield analyses. This represents
an aggregation at the wafer level of EWS, PT, INLINE, EQUIPMENT and CHAMBER
parameters. In the case of a lot of 25 wafers, the table dimension is 25x (N+1: where
N is the number of parameters). The total amount of data is then this table dimension
multiplied by the number of lots and number of sites. The main issue in data preparation
is defining the most relevant aggregation level (LOT, WAFER, SITE) used for the analysis,
and extracting the maximum amount of information related to the process. The number
of variables is often significantly higher than the number of observations, which can
cause a problem for traditional statistical analyses. Furthermore, it is possible that some
parameters may be sampled but not available on all wafers or measurement sites, which
results in missing data; from experience, this can negatively affect the analysis when
missing data represents more than 30 percent of the observations. Finally, there can be
high levels of correlation between certain factors, so a second issue is only selecting the
most relevant factors to identify the process problem.
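A hedged sketch of these preparation steps in Python/pandas is shown below; the miniature table and parameter names are illustrative assumptions rather than the exact procedure used at STMicroelectronics.

```python
import pandas as pd

# Hypothetical site-level measurements: several PT sites per wafer.
site_data = pd.DataFrame({
    "lot": ["L1"] * 4, "wafer": [1, 1, 2, 2],
    "PT_PARAM_26": [1.02, 0.98, None, 1.10],
    "PT_PARAM_171": [1740.0, 1755.0, 1731.0, None],
})

# Aggregate to the WAFER level (the aggregation level chosen for the analysis).
wafer_data = site_data.groupby(["lot", "wafer"]).mean(numeric_only=True)

# Fraction of missing observations per parameter: from experience, columns
# missing more than about 30 percent are of limited use in the analysis.
missing_pct = wafer_data.isna().mean() * 100
print(wafer_data)
print(missing_pct)
```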
Statistical analysis methods
Traditional methods
As pointed out by Lee et al. [1], the usual techniques used in root-cause analysis are
analysis of variance (ANOVA) and the Kruskal-Wallis test. Analysis of variance assumes normally distributed data; the Kruskal-Wallis test, being nonparametric, avoids this constraint. The number of factors to be considered in our case
makes this a very time-consuming task. This approach will therefore be used more to
validate the results obtained using the data mining techniques outlined below. Moreover,
these methods ignore the possible interactions that may occur in a crisis, and are based
on studying just one parameter at a time.
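For illustration, a Kruskal-Wallis comparison of wafer yields across chambers might look like the following Python sketch; the yield values and chamber groupings are invented.

```python
from scipy import stats

# Hypothetical wafer yields grouped by the chamber that processed them.
yield_chamber_a = [92.1, 90.5, 93.0, 91.8, 92.4]
yield_chamber_b = [81.3, 79.9, 84.2, 80.7, 82.5]
yield_chamber_c = [91.7, 92.8, 90.9, 93.3, 91.2]

# Kruskal-Wallis: a nonparametric test of whether the groups share a distribution.
stat, p_value = stats.kruskal(yield_chamber_a, yield_chamber_b, yield_chamber_c)
print(f"H = {stat:.2f}, p = {p_value:.4f}")  # a small p-value suggests a chamber effect
```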
Multivariate analysis
Various analysis methods, in particular discriminant analysis, principal component
analysis and partial least squares (PLS) modeling can be used to better describe and
understand multivariate relationships. For illustrative purposes, in this paper we focus
entirely on principal component analysis.
When a set of variables is correlated, some of the information contained in any given
variable is redundant, due to the information already provided by the other variables.
An orthogonal (uncorrelated) set of variables, however, contains no such redundancy.
PCA (Pearson [2]) uses linear combinations of the original variables to construct a set of orthogonal
variables. These orthogonal variables are termed principal components, and are ordered
such that the variance of any given component exceeds that of the next.
If the original set of variables is highly correlated, we are often able to retain the vast
majority of the information contained in the original variable set, using only the first few
principal components. For this reason, PCA is often described as a dimension-reduction
technique.
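A brief scikit-learn sketch of this dimension-reduction idea follows; the correlated data set is simulated, and the code is only a stand-in for the PCA platform used in JMP.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical set of correlated PT parameters (200 wafers x 10 parameters),
# built from only two underlying sources of variation plus noise.
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))

# Standardize, then project onto principal components.
pca = PCA().fit(StandardScaler().fit_transform(X))

# With highly correlated inputs, the first few components carry most of the variance.
print(pca.explained_variance_ratio_.cumsum().round(3))
```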
Predictive modeling
By constructing a model that links factors to the response (in our case, yield), we can
identify the impact of these factors, and thereby identify root causes.
Regression
A typical approach would be to construct a linear (or quadratic) model of the form

Y = a0 + a1X1 + a2X2 + ... + anXn

and to examine the largest ai coefficients (in absolute value). In our case, however, this is impractical because the number of factors usually exceeds the number of historical observations, making it impossible to estimate all the coefficients.
The stepwise method can be used to circumvent this type of problem by retaining only the most explanatory factors in the model. This technique was used successfully by McCray
et al. [3], but suffers in settings like ours, where a significant portion of the data is
missing.
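As an illustrative stand-in for classical stepwise regression, the sketch below uses greedy forward selection on simulated data; it is not the procedure of McCray et al., and all names and sizes are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Hypothetical data: 60 wafers, 40 candidate factors, only two of which matter.
X = rng.normal(size=(60, 40))
y = 3.0 * X[:, 5] - 2.0 * X[:, 12] + rng.normal(scale=0.5, size=60)

# Forward stepwise-style selection: add factors one at a time while they
# improve cross-validated fit, keeping the model small and interpretable.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
).fit(X, y)
print(np.flatnonzero(selector.get_support()))  # indices of the retained factors
```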
Partitioning
More recently, partition- or decision-tree-based methods have been used to identify
ways of improving yield (Cheng et al. [4]). Partitioning is a way to describe the
relationship between a response and a set of factors without a mathematical model; its goal is to divide the data into groups that differ maximally with respect to some characteristic – in our example, yield. Partitioning is an iterative process, the visualization of which resembles a tree – hence the term “decision tree.”
If we examine a group of data, we can identify the factor X that best explains the variance of Y and split the data at the value of X that maximizes the difference between the resulting groups. An example of this is shown in Figure 10.
Figure 10. Example partition report: 874 rows (81.4 percent GOOD, 18.7 percent BAD) split into a group of 377 rows (1.6 percent BAD) and a group of 497 rows (31.6 percent BAD).
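The single-split idea can be sketched with an off-the-shelf decision tree, as below; the simulated factors and threshold are hypothetical, and the JMP partition platform reports the equivalent split graphically.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(2)
# Hypothetical factors; yield drops sharply when factor X3 exceeds a threshold.
X = rng.uniform(size=(300, 5))
y = 90.0 - 25.0 * (X[:, 3] > 0.6) + rng.normal(scale=2.0, size=300)

# A one-split tree chooses the factor and cut point that best separate
# high-yield from low-yield wafers - the basic partitioning step.
tree = DecisionTreeRegressor(max_depth=1).fit(X, y)
print(export_text(tree, feature_names=[f"X{i}" for i in range(5)]))
```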
The bootstrap forest
A bootstrap forest (Ho [5]) grows many decision trees, each on a bootstrap sample of the data with a random subset of the factors considered at each split. The final prediction is the average of the individual tree predictions:

ŷ = (ŷ1 + ŷ2 + ... + ŷn) / n, for a forest of n trees
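For readers who want a concrete analogue outside JMP Pro, the sketch below uses a scikit-learn random forest, which follows the same recipe (bootstrap samples, random column subsets at each split, averaged predictions); the data, factor count and importance ranking are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
# Hypothetical data: 400 wafers, 50 factors, yield driven mainly by factor 7.
X = rng.normal(size=(400, 50))
y = 85.0 - 6.0 * (X[:, 7] > 0.5) + rng.normal(scale=1.5, size=400)

# Many trees, each fit on a bootstrap sample, with a random subset of the
# columns considered at every split; predictions are averaged over the trees.
forest = RandomForestRegressor(
    n_estimators=100, max_features=15, bootstrap=True, random_state=0
).fit(X, y)

# Analogue of the column contributions report: rank factors by importance.
ranking = np.argsort(forest.feature_importances_)[::-1]
print(ranking[:5])  # factor 7 should appear at or near the top
```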
The boosted tree
A boosted tree builds its final model from a sequence of small trees, each one fit to the residuals of the trees before it. Although boosted trees perform slightly worse than bootstrap forests in identifying root causes, their simplicity allows them to be estimated more quickly than bootstrap forests.
Neural networks
Neural networks are highly flexible predictive models. Based on the way in which the
brain was originally thought to function, neural networks contain one or more “hidden
layers,” each of which contains one or more transformation functions, operating on the
predictors. The relationship between the predictors and the response, described by
Sassenberg et al. [6], is usually quite complex, and generally renders interpretation of
the model coefficients impossible. For this reason, neural networks have traditionally
been better suited to making predictions than to identifying a root cause.
JMP 11, however, allows an analyst to order factors based on the significance of their
impact on the model. Using this information in the same way one might use the column
contributions from a bootstrap forest, likely root-cause candidates can be unearthed,
making the use of neural networks in this way an opportunity deserving additional study.
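A hedged sketch of this idea follows: a small neural network fit in scikit-learn, with factors ranked by permutation importance as a rough analogue of the JMP factor ordering; the data and network size are invented.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Hypothetical data: yield depends nonlinearly on factor 2.
X = rng.normal(size=(300, 10))
y = 80.0 + 5.0 * np.tanh(2.0 * X[:, 2]) + rng.normal(scale=0.5, size=300)

# A small neural network: one hidden layer of transformation functions.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
).fit(X, y)

# Rank factors by how much shuffling each one degrades the fit -
# one way to use a neural network for root-cause screening.
result = permutation_importance(net, X, y, n_repeats=10, random_state=0)
print(np.argsort(result.importances_mean)[::-1][:3])  # factor 2 should rank first
```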
Validation
To validate a model, we use the model to score (make predictions based on) data that
was not used to fit the model (termed validation data). The model’s predictions are then
compared to the true responses. If overfitting is present, the model will not perform as
well as expected on the validation data.
While there are a variety of validation strategies, we will discuss two of the most
commonly used: cross-validation and holdback validation.
Cross-validation: The cross-validation technique divides the data set into k subsets
or folds (this is commonly referred to as k-fold cross-validation). K models are then
estimated: Each model is estimated using the data that remains after excluding a single
fold from the original data. Each of the k models is then used to score the fold excluded
when estimating it. The resultant model picked is the one that produces the best fit
to the excluded subset. This technique is well suited to small data sets but does not
guarantee the model’s general applicability.
Holdback validation: Holdback validation divides the data set into three subsets: a
training set, a validation set and a test set. The training set and validation set are used
to select the best candidate model from among many. Several models are constructed:
Each is fit using the training data, and then scores the validation data. From among
these candidate models, the best performing model is selected and used to score the
test data set, providing an indication of the model’s ability to generalize to previously
unseen data.
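Both strategies can be sketched as follows in Python; the data, split fractions and model are hypothetical and serve only to make the two validation schemes concrete.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 20))
y = 2.0 * X[:, 0] - X[:, 4] + rng.normal(scale=0.5, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# k-fold cross-validation: k models, each scored on the fold it did not see.
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
print(cv_r2.round(3))

# Holdback validation: training / validation / test partitions (60/20/20 here).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_valid, y_valid), model.score(X_test, y_test))
```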
We will now apply these techniques to our crisis data sets, using the model comparison
tool in JMP Pro to identify the best model.
Case studies
Case study 1: Root-cause identification (smile signature)
We observed losses of yield for a number of weeks on a critical BIN for a mature
STMicroelectronics product. The process was well understood and stable, with a high
level of baseline yield. Yield losses were as high as 20 percent on some wafers, with
a signature at the bottom of the wafer we will call a “smile” (see Figure 13). A defect
analysis identified the nature of the problem, which affected more than 100 wafers.
A detailed view of the defect, which has been produced through scanning electron
microscopy (SEM), is shown in Figure 14. Unfortunately, online parametric analysis was
unable to identify the cause of the problem.
Figure 13. Example of a failed BIN wafer stack. These failed wafers exhibited a smile
defect signature.
Data
The data contains 874 rows, comprising one response and 802 factors. Because
the smile BIN response is positively skewed, we decided to apply a logarithmic
transformation to the data. The raw response is shown in Figure 15.
After the logarithmic transformation has been applied to the raw data, we have a
response distribution that is close to normal.
As Figure 17 shows, the data set contains a significant amount of missing data.
Removing columns with more than 30 percent missing data gives us a data set that is
smaller (just 346 columns) and much easier to analyze. The further analyses presented in this case study are based on this subset.
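A compact sketch of these two preparation steps (log transformation of the skewed response and removal of high-missing columns) is shown below; the miniature table and the +1 offset in the log transform are illustrative assumptions, not the exact treatment applied to the production data.

```python
import numpy as np
import pandas as pd

# Hypothetical wafer-level table: a skewed smile-BIN count and some factors,
# one of which is heavily sampled (and therefore mostly missing).
df = pd.DataFrame({
    "SMILE_BIN": [2, 5, 1, 40, 160, 3, 7, 90, 4, 12],
    "F_A": [1.0, 1.1, None, 1.2, 1.0, None, 1.1, 1.3, 1.2, 1.0],
    "F_B": [None, None, None, None, 7.2, None, None, 6.9, None, None],
})

# Log-transform the positively skewed response (adding 1 to handle zero counts).
df["LOG_SMILE"] = np.log10(df["SMILE_BIN"] + 1)

# Drop factor columns with more than 30 percent missing data before modeling.
factors = df.drop(columns=["SMILE_BIN", "LOG_SMILE"])
kept = factors.loc[:, factors.isna().mean() <= 0.30]
print(kept.columns.tolist())  # F_B is dropped; F_A is kept
```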
Analysis
We begin our analysis by fitting a validated partition model (see Figure 18).
Figure 18. Partition report for the training data: 524 rows (mean 39.7, standard deviation 63.6), with a single split; one of the resulting groups contains 512 rows (mean 38.1, standard deviation 61.9).
Figure 19. Fit statistics for the partition model:

              RSquare        RMSE      N    Number of Splits    Imputes       AICc
Training        0.029    62.579856    524                   1         55    5828.09
Validation      0.037    59.018092    175
Test           -0.040    56.642924    175
Although the process step 177_1_CHAMBER (see Figure 20) seems to affect the
response in a statistical sense, it has too many levels to be useful in a practical sense –
and gives us no help in understanding our yield problem.
Column Contributions

Term               Number of Splits            SS    Portion
177_1_CHAMBER                     1    61248.1522     1.0000
175_1_CHAMBER                     0             0     0.0000
144_1_CHAMBER                     0             0     0.0000
1_1_CHAMBER                       0             0     0.0000
Therefore, we will base our root-cause analysis on other models, using the continuous
SMILE BIN variable. In particular, we are going to create a set of models based on
advanced partitioning techniques: bootstrap forests and boosted trees. We will then use
a model comparison tool in JMP Pro to select a model based on the statistical criterion
R2 (see Figure 21).
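As a rough stand-in for the JMP Pro model comparison tool, the sketch below fits a single tree, a random forest and a boosted tree on simulated data and compares their validation R2; all data and settings are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(600, 30))
y = 4.0 * np.tanh(X[:, 2]) - 3.0 * (X[:, 9] > 0) + rng.normal(scale=0.5, size=600)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "partition (single tree)": DecisionTreeRegressor(max_depth=4, random_state=0),
    "bootstrap forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "boosted tree": GradientBoostingRegressor(random_state=0),
}
# Compare the candidate models on validation R2.
for name, model in candidates.items():
    r2 = model.fit(X_train, y_train).score(X_valid, y_valid)
    print(f"{name}: R2 = {r2:.3f}")
```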
We can see that the bootstrap forest gives us a model that is statistically superior to
either of the other approaches. We therefore proceed by analyzing the factors proposed
by the bootstrap forest model.
If we examine the bootstrap forest’s column contribution report (Figure 22), we see that
the 80_1_CHAMBER chamber is at the top of the list. Again, this variable is much more
useful in understanding yield loss than the variable first proposed by a simple partition,
the 177_1_CHAMBER variable.
Column Contributions

Term                Number of Splits            SS    Portion
80_1_CHAMBER                      23    1654529.98     0.1078
160_2_CHAMBER                     10    864816.201     0.0563
146_1_CHAMBER                      4    801808.207     0.0522
146_1_EQUIPMENT                    5    755164.555     0.0492
146_2_CHAMBER                      7    635484.504     0.0414
160_2_EQUIPMENT                   11    574988.809     0.0375
7_1_CHAMBER                       11    555647.003     0.0362
130_1_EQUIPMENT                    2    395391.082     0.0258
203_1_CHAMBER                      9    393854.156     0.0257
We were therefore able to identify the root cause of the problem directly, without
discretizing the response.
Conclusion
The 80_1 tool was confirmed as the culprit by various approaches: First, a Kruskal-Wallis test (Figure 23) carried out after the event confirmed a significant difference in yield among the chambers. This confirmed our view that the bootstrap forest method offered an appropriate solution for identifying the root cause. It is important to note that the use of a bootstrap forest in this way is more generally applicable than the Kruskal-Wallis test.
This is because the bootstrap forest can be used for any combination of continuous
and categorical factors and responses, and is also able to succeed in the presence of
interactions and other complex relationships among factors.
Finally, the root cause was confirmed by the process teams, using physical analyses and
analyses of data related to the equipment (see Figure 24).
Figure 24. Graph Builder plot of EWS1 SMILE Zone-01 versus 80_1_EQUIPMENT / 80_1_CHAMBER (equipment EQ01 and EQ02), with wafers colored GOOD/BAD.
Case study 2: Correlation between EWS and PT parameter
Data
The data set has 560 rows and 600 columns - too much information to begin with
principal component analysis. In this case, due to the correlation between EWS and the
PTs, missing data is minimal: Except for noncritical parameters, 100 percent of the PT
information is available.
As the BIN10 data followed a logarithmic trend and included some outliers, we applied a
logarithmic transformation to it, producing a more symmetric distribution (see Figure 26).
Figure 26. Distribution of BIN10 before the logarithmic transformation (fitted LogNormal(1.98652, 0.67515)) and after the transformation (fitted Normal(0.86273, 0.29348)).
Analysis
We fit a bootstrap forest with the 606 PT parameters, using the transformed BIN10
variable as Y. Our forest contained 100 trees, with 151 randomly selected columns
considered at each split.
The model created performed reasonably well, as seen by the high R2 reported in Figure
27 for the training data, and minimal reduction of R2 in validation and test sets. Therefore,
we have built a well-fitting model that is also predicting new data well.
              RSquare         RMSE      N
Training        0.859    0.1136393    335
Validation      0.748    0.1353828    111
Test            0.750    0.1435366    112

Figure 27. Statistical characteristics of the bootstrap forest model.
We can see that a certain number of columns has an impact on the response, in
particular parameters 171, 451 and 26.
Column Contributions

Term                   Number of Splits            SS    Portion
451__PARAM_AVERAGE                   35    179.294222     0.1017
26__PARAM_AVERAGE                    50    147.894805     0.0839
171__PARAM_AVERAGE                   31    120.072569     0.0681
164__PARAM_AVERAGE                   31    74.5215686     0.0423
183__PARAM_AVERAGE                   22    48.1871382     0.0273
218__PARAM_AVERAGE                   12    36.9233666     0.0209
42__PARAM_AVERAGE                    29    30.1121665     0.0171
216__PARAM_AVERAGE                   13    28.6468436     0.0162
The most significant PT parameters in the bootstrap forest model were used to perform
a correlation study, using principal component analysis. This technique helps to identify
and group elements that are highly correlated.
Figure 29. Projection of the PT parameters and EWS1_BIN10_ onto the first two principal components (Component 1 explains 72.9 percent of the variance).
Of those parameters most highly correlated with BIN10, we selected parameter 171; it is
positively correlated with the first principal component, negatively with the second, and
measures the effects of the process directly. This is illustrated in Figure 29.
Figure 30. Graph Builder plot of EWS1_BIN10_ versus 171__PARAM_AVERAGE, by process split (S1-S4); one row excluded.
As Figure 30 shows, there is a correlation between the PT parameter and the process
parameter. In Figure 31 the significant difference is confirmed by a post-hoc Student’s
t-test.
Figure 31. 171_PARAM_AVERAGE by SPLIT (S1-S4), with Each Pair Student's t comparisons at the 0.05 level.
Conclusion
After this analysis was used to optimize the process parameter, the process returned to
a minimal BIN10 rate. The same result could have been realized with other techniques,
after a detailed analysis of each of the 600 parameters, but the bootstrap forest enabled
us to easily pre-screen the parameters, greatly reducing the number of parameters
subjected to detailed analysis.
Summary
As George Box famously said, “All models are wrong, but some models are useful.” In
both the examples above, the use of bootstrap forests greatly streamlined the analysis
process, but it is natural to ask how these models compare to other models we might
have fit. The JMP Pro environment allows us to quickly build, compare and identify the
most useful of several models.
Figure 32. R2 by model type (Partition, Bootstrap Forest, Boosted Tree, Neural) for four responses: CS_1_Log10(Smile), CS_2_Log(Bin10), CS_3_Log_EWS13_EQUIP and CS_3_Log_EWS13_PT.
As seen in Figure 32, which compares R2 values for a variety of models, each
constructed over a variety of responses, the performance of the bootstrap forests was
clearly better than that of the other tree-based methods. Its performance was on par
with the performance of the neural networks.
The two case studies outlined in this paper illustrate just two examples from
STMicroelectronics. We have also frequently used bootstrap forests in other cases
where a simple partition proved inadequate. In particular, they were employed in a case
that required aspects of both of the procedures described above. Specifically, we first
used a bootstrap forest to support a principal component analysis (the purpose of which
was to identify a PT parameter), and then used it a second time to identify equipment
(Figure 33).
Figure 33. Graph Builder plot of BIN13, LOGBIN13 and PARAMETER_132_AVERAGE versus 68_1_EQUIPMENT (EQ01-EQ07), colored GOOD/BAD.
References
[1] Lee, C. H., Woo, H. D., Hong, S. W., Moon, J. Y., Kang, S. H., Lee, J. C., Chong, K. W. and Oh, K. S. (2006). “Novel Method for Identification and Analysis of Various Yield Problems in Semiconductor Manufacturing.” IEEE Advanced Semiconductor Manufacturing Conference and Workshop: 185-190.
[2] Pearson, K. (1901). “On Lines and Planes of Closest Fit to Systems of Points in Space.” Philosophical Magazine 2 (11): 559-572.
[3] McCray, A. T., McNames, J. and Abercrombie, D. (2004). “Stepwise Regression for Identifying Sources of Variation in a Semiconductor Manufacturing Process.” IEEE/SEMI Advanced Semiconductor Manufacturing Conference and Workshop: 448-452.
[4] Cheng, H., Ooi, M. P., Kuang, Y. C., Sim, E., Cheah, B. and Demidenko, S. (2006). “Automatic Yield Management System for Semiconductor Production Test.” Sixth IEEE International Symposium on Electronic Design, Test and Application (DELTA): 254-258.
[5] Ho, T. K. (1995). “Random Decision Forests.” Proceedings of the Third International Conference on Document Analysis and Recognition, Montreal, QC, 14-16 August: 278-282.
[6] Sassenberg, C., Weber, C., Fathi, M. and Montino, R. (2009). “A Data Mining-Based Knowledge Management Approach for the Semiconductor Industry.” IEEE International Conference on Electro/Information Technology: 72-77.
About SAS and JMP®
JMP is a software solution from SAS that was first launched in 1989. John Sall, SAS co-founder and Executive Vice President, is the
chief architect of JMP. SAS is the leader in business analytics software and services, and the largest independent vendor in the business
intelligence market. Through innovative solutions, SAS helps customers at more than 65,000 sites improve performance and deliver value
by making better decisions faster. Since 1976 SAS has been giving customers around the world THE POWER TO KNOW®.
STMicroelectronics
STMicroelectronics is a global leader in the semiconductor market, with clients covering the full range of sense & power technologies,
automotive products and on-board processing solutions. From managing consumption to energy savings, confidentiality to data
security and health and well-being to intelligent devices for the general public, ST is involved wherever microelectronic technology is
making a positive and innovative contribution to day-to-day living. ST is active at the heart of professional and entertainment solutions
at home, in the office and in the car. ST is synonymous with “life.augmented” through the increasing use of technology to improve the
quality of life.
In 2012, ST generated net turnover of $8.49 billion. For further information visit the ST website: st.com.