DEPARTMENT OF INFORMATION TECHNOLOGY
ARTIFICIAL INTELLIGENCE FOR ENHANCED
SEMICONDUCTOR MANUFACTURING FEATURE
SELECTION FOR YIELD IMPROVEMENT
Guide: Batch no:12
[Link] [Link] GOUD (20UJ1A1245)
[Link] [Link] (20UJ1A1208)
[Link] (20UJ1A1204)
CONTENTS
ABSTRACT
INTRODUCTION
TRADITIONAL SYSTEM
PROBLEM DEFINITION
LITERATURE SURVEY
RESEARCH GAPS
EXISTING METHODOLOGY
LIMITATIONS
PROPOSED SYSTEM
PROPOSED METHODOLOGY
ADVANTAGES
APPLICATIONS
SOFTWARE USED
DATASET DESCRIPTION
CODE
RESULT
COMPARISON TABLE
CONCLUSION
FUTURE SCOPE
REFERENCES
ABSTRACT
In semiconductor manufacturing, ensuring high yield rates is critical for
optimizing production efficiency and minimizing costs.
Traditional approaches, although instrumental, do not effectively distinguish
faulty devices from efficient ones, leading to reduced yield rates. Additionally,
manual feature selection processes are labour-intensive.
To overcome these limitations, this work proposes the use of artificial-intelligence-
based feature selection techniques. The proposed system ranks features
according to their impact on semiconductor manufacturing yield.
INTRODUCTION
Semiconductors are materials whose conductivity lies between that of conductors
and insulators. They can be pure elements, such as silicon or germanium, or
compounds, such as gallium arsenide or cadmium selenide.
Nowadays, semiconductors have become an essential part of our high-tech world
in the form of integrated circuits (ICs).
A survey from the Semiconductor Industry Association (SIA) reports that faults in
semiconductors manufactured in the last 5 years had 5% to in 2021 the number of
fatalities increased by 19.8%.
TRADITIONAL SYSTEM
Enhanced semiconductor feature selection is mostly used to improve the yield and
performance of ICs.
Traditional feature selection techniques are instrumental: they help identify the
most relevant signals that impact yield rates.
Manual examination plays a vital role in inspecting and testing ICs.
However, it is time-consuming: it delays the analysis, which in turn affects the yield rate.
PROBLEM DEFINITION
Each step of semiconductor design has its own complexities.
Increased data complexity makes it difficult to extract the required data from ICs.
The current system for traffic tracking and accident detection may delay
emergency treatment.
Lack of automation in fault detection and resolution.
LITERATURE SURVEY
TITLE | AUTHOR | YEAR | WORK DONE | DRAWBACKS
An Expandable Yield Prediction Framework Using Explainable Artificial Intelligence | Lee, [Link]. | 2022 | Based on analysis of actual data, the random forest classifier was shown to perform best when paired with multiple imputation by chained equations. | It has lower root mean square error compared to other classifiers.
Semiconductor Manufacturing Process Improvement Using Data-Driven Methodologies | Chowdhury, [Link]. | 2022 | To investigate the similarity of travel patterns across seasonal variations, a WND-LSTM model covering data pretreatment and data model implementation was presented. | Lower accuracy when a large dataset is used.
Machine Learning Techniques for Fault Diagnosis in the Semiconductor Manufacturing Process | Nuhu, et al. | 2022 | A real-world dataset was used to assess the model from several angles. The outcomes demonstrated that the suggested model performed better than certain benchmark models when both temporal and geographical data were taken into account. | High computational requirements.
Yield Prediction with Machine Learning and Parameter Limits in Semiconductor Production | Busch, Rebecca [Link]. | 2022 | To reduce the false detection rate, dynamic weights were employed in ensemble transfer learning. | Scalability and high cost.
RESEARCH GAPS
Class imbalance in data
Time-consuming analysis
Limited explanation and interpretation of models
Imperfect handling of noisy and missing data
Lack of automation in detection
EXISTING METHODOLOGY
• The MLP (multilayer perceptron) is a type of artificial neural network characterized
by multiple layers of nodes (neurons): an input layer, one or more hidden layers,
and an output layer.
• In the context of fault identification, the MLP classifier is trained on integrated
circuit (IC) statistics data.
• The MLP learns to map these input features to the presence or absence of faults in
ICs.
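A minimal sketch of such an MLP fault classifier is shown below. It assumes scikit-learn's `MLPClassifier` and uses synthetic data from `make_classification` as a stand-in for the IC statistics; the layer sizes and iteration count are illustrative assumptions, not the existing system's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for IC statistics: 20 numeric features and a binary
# label (1 = faulty IC, 0 = good IC). Real data would replace this.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Input layer (20 features) -> two hidden layers (64 and 32 neurons) -> output layer.
# Features are standardized first because the MLP is trained by gradient descent.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)

accuracy = mlp.score(X_test, y_test)
print(f"MLP test accuracy: {accuracy:.3f}")
```

The many weights in the hidden layers are also why the limitations below (overfitting, hyperparameter sensitivity, low interpretability) tend to apply to this kind of model.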
LIMITATIONS
Overfitting
Complexity
Hyperparameter sensitivity
Lack of interpretability
Heavy dependence on training data
PROPOSED SYSTEM
RANDOM FOREST
• Random forest is a popular machine learning algorithm that belongs to the family of
supervised learning techniques.
• It is based on the concept of ensemble learning, which is the process of combining
multiple classifiers.
• Instead of relying on a single decision tree, it takes the prediction from every tree
and decides the final output by majority vote.
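The bagging-and-voting idea above can be sketched from scratch as follows. This is a simplified illustration, assuming scikit-learn's `DecisionTreeClassifier` as the base learner and synthetic data; the helper names `fit_forest` and `predict_majority` are hypothetical. Note that scikit-learn's own `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than by a hard majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of the rows
    and considering a random subset of features at every split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: draw rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X):
    """Each tree votes for one class; the most frequent vote per sample wins.
    Class labels are assumed to be integers 0..k-1."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_trees, n_samples)
    n_classes = votes.max() + 1
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)  # ties go to the smaller label


# Usage on synthetic data (stand-in for IC statistics; 1 = faulty, 0 = good)
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
trees = fit_forest(X[:500], y[:500])
pred = predict_majority(trees, X[500:])
accuracy = (pred == y[500:]).mean()
print(f"majority-vote forest accuracy: {accuracy:.3f}")
```

With an odd number of trees and two classes, a tie cannot occur, so the vote always has a clear winner.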
[Figure: Proposed system flow. Dataset → Data preprocessing → SMOTE → Train & test split → RFC trained on train data → RFC prediction on test / production data → Fault: Yes / No]
PROPOSED METHODOLOGY
The proposed methodology employs the Random Forest Classifier (RFC) for
fault identification in semiconductor devices and comprises the following key
steps:
Dataset upload
Dataset preprocessing
Feature extraction
Dataset splitting
Training of the Random Forest Classifier on the training dataset to learn
patterns and relationships between integrated circuit statistics and fault
occurrences.
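The steps above can be sketched end to end as follows. This is a minimal illustration, not the project's code: it assumes a pandas DataFrame of numeric IC statistics with a binary `Label` column (1 = faulty), as in the project's `preprocessDataset` code, and builds a small synthetic DataFrame with hypothetical columns `s1` to `s4` in place of the real dataset. To stay dependency-free it swaps SMOTE for scikit-learn's `class_weight="balanced"` re-weighting, which is a different imbalance-handling technique.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def run_pipeline(dataset: pd.DataFrame, label_col: str = "Label", seed: int = 42):
    # Preprocessing: replace missing values with 0.
    dataset = dataset.fillna(0)
    # Feature extraction: every column except the label is a feature.
    X = dataset.drop(columns=label_col)
    y = dataset[label_col]
    # Splitting: stratify so both splits keep the same fault ratio.
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    # Training: class_weight="balanced" up-weights the rare (faulty) class;
    # used here as a stand-in for SMOTE oversampling.
    rfc = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                 random_state=seed)
    rfc.fit(x_train, y_train)
    # Evaluation with the metrics reported in the comparison table.
    y_pred = rfc.predict(x_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_test, y_pred),
            "precision": precision, "recall": recall, "f1": f1}


# Dataset upload (synthetic, hypothetical): 4 IC statistics, roughly 7% faulty.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["s1", "s2", "s3", "s4"])
noise = rng.normal(scale=0.5, size=n)
df["Label"] = ((df["s1"] + 0.5 * df["s2"] + noise) > 1.8).astype(int)
df.loc[rng.choice(n, 50, replace=False), "s3"] = np.nan  # inject missing values

result = run_pipeline(df)
print(result)
```

Splitting before any resampling keeps the test set made of real samples only, which gives a more honest estimate of performance on production data.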
ADVANTAGES
Handling of imbalanced data
Robustness to overfitting
Implicit feature selection
High accuracy rate
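A random forest exposes its implicit feature selection through impurity-based feature importances, which can be used to rank features by their influence on the fault label, as the abstract proposes. The sketch below assumes scikit-learn and a synthetic dataset in which only the first three of ten hypothetical features (named `s0` to `s9`) carry signal.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: with shuffle=False the 3 informative features come first,
# and the remaining 7 columns are pure noise.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           shuffle=False, random_state=0)
features = [f"s{i}" for i in range(X.shape[1])]  # hypothetical feature names

rfc = RandomForestClassifier(random_state=0)
rfc.fit(X, y)

# Mean decrease in impurity per feature, normalized to sum to 1, highest first.
ranking = (pd.Series(rfc.feature_importances_, index=features)
           .sort_values(ascending=False))
print(ranking)
```

Impurity-based importances can favour features with many distinct values, so a permutation-based importance on held-out data is a useful cross-check before discarding features.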
APPLICATIONS
Managing the cost of smart devices
Automated faulty-IC identification systems
Automated IC inspection cards
SOFTWARE USED
Software Requirements
• Anaconda: It is a free and open-source platform that includes a Python
distribution, a package manager, and a collection of pre-built scientific packages.
• Jupyter Notebook: It is a web-based interactive computing platform.
Programming Language used
• Python: Python is a multipurpose, high-level, object-oriented programming
language.
Packages used
• TensorFlow: It is used to create dataflow graphs that describe how data moves
through a graph.
• NumPy: It is a powerful Python library for numerical computations.
• Pandas: It is a data manipulation package in Python for tabular data.
• Matplotlib: It is a library for creating static, animated and interactive
visualizations in Python.
• Scikit-learn: It is a Python library for implementing machine learning models and
statistical modelling.
DATASET DESCRIPTION
The dataset consists of 2 files:
• test data
• [Link]
• [Link]
The dataset consists of 5852 samples, with 2926 samples in each file.
CODE
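from tkinter import END                    # END marks the end of a Tkinter text widget
from imblearn.over_sampling import SMOTE

def preprocessDataset():
    global X, y
    global le
    global dataset
    global x_train, y_train, x_test, y_test
    [Link]('1.0', END)                    # clear the GUI output box
    print([Link]())
    [Link](END, str([Link]()) + "\n\n")
    [Link](0, inplace=True)               # replace missing values with 0
    X = [Link]('Label', axis=1)           # features: every column except Label
    y = dataset['Label']                   # target: fault label
    # Oversample the minority class with synthetic samples to balance the classes
    smote = SMOTE(sampling_strategy='auto', random_state=42)
    X, y = smote.fit_resample(X, y)
    [Link](END,"Total records found in dataset: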
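The core SMOTE idea used in `preprocessDataset` can be sketched with NumPy alone, which may help readers without imbalanced-learn installed. This is a simplified illustration, not the library's implementation, and the function name `smote_oversample` and its parameters are hypothetical: for each synthetic sample it picks a random minority-class row, picks one of that row's k nearest minority-class neighbours, and places a new point at a random position on the line segment between them.

```python
import numpy as np


def smote_oversample(X, y, minority_label, k=5, seed=42):
    """Append synthetic minority rows until both classes have the same count
    (binary case only). Returns (X_res, y_res)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Brute-force Euclidean distances between all pairs of minority rows.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # k nearest minority rows per row
    base = rng.integers(0, len(X_min), size=n_needed)
    nb = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))              # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[nb] - X_min[base])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_res, y_res


# Usage: 90 "good" rows and 10 "faulty" rows with 3 features each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 3)), rng.normal(3, 1, (10, 3))])
y = np.array([0] * 90 + [1] * 10)
X_res, y_res = smote_oversample(X, y, minority_label=1)
print(np.bincount(y_res))  # -> [90 90]
```

A common practice is to apply oversampling to the training split only, so that synthetic samples derived from test rows cannot leak into the evaluation.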
RESULT
Existing Systems
• Confusion Matrix
Proposed Model
• Confusion Matrix
COMPARISON
Model name | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%)
RF classifier | 86.35 | 86.36 | 86.20 | 86.26
CNN classifier | 97.78 | 97.76 | 97.78 | 97.77
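For reference, the four metrics in the table are conventionally defined from the confusion-matrix counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), taking the faulty class as positive. The slides do not state whether the reported values are averaged over both classes, so these are the standard single-class definitions:

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{aligned}
```

As a rough consistency check, the harmonic mean of the CNN row's precision and recall, 2 × 97.76 × 97.78 / (97.76 + 97.78), is about 97.77, matching the reported F1-score. The same calculation for the RF row gives about 86.28 against the reported 86.26, a small difference that could arise if F1 was averaged per class.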
CONCLUSION
This project has great potential for the future of intelligent transportation systems.
These models and graph neural networks can help identify accident-prone areas
and predict traffic patterns.
They also enable proactive measures to improve road safety and reduce congestion.
The ability to provide real-time traffic updates and accident warnings to drivers
can empower them to make informed decisions and navigate more safely.
FUTURE SCOPE
The future scope of this project lies in further refinement and optimization of
machine learning models such as the Random Forest Classifier (RFC).
This includes fine-tuning hyperparameters, experimenting with different ensemble
methods, or exploring advanced algorithms to improve model performance and
scalability.
Continued research into feature engineering and selection techniques can
enhance the effectiveness of fault identification models.
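As one possible starting point for the hyperparameter fine-tuning mentioned above, a cross-validated grid search over a few Random Forest settings could look like the sketch below. The grid values, the choice of F1 as the scoring metric and the synthetic imbalanced data are illustrative assumptions, not tuned settings for this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in data (about 15% positive "fault" class).
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)

# Hypothetical search space: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # F1 on the positive class; more informative than accuracy when faults are rare
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV F1:", round(search.best_score_, 3))
```

Stratified folds keep the fault ratio similar in every fold, which matters when the positive class is small.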
REFERENCES
[1] Lee, Youjin, and Yonghan Roh. "An Expandable Yield Prediction Framework Using Explainable Artificial Intelligence for Semiconductor Manufacturing." Applied Sciences 13, no. 4 (2023): 2660.
[2] Chowdhury, Hribhu. "Semiconductor Manufacturing Process Improvement Using Data-Driven Methodologies." (2023).
[3] Nuhu, Abubakar Abdussalam, Qasim Zeeshan, Babak Safaei, and Muhammad Atif Shahzad. "Machine Learning-Based Techniques for Fault Diagnosis in the Semiconductor Manufacturing Process: A Comparative Study." The Journal of Supercomputing 79, no. 2 (2023): 2031-2081.
[4] Kao, Sheng-Xiang, and Chen-Fu Chien. "Deep Learning Based Positioning Error Fault Diagnosis of Wire Bonding Equipment and an Empirical Study for IC Packaging." IEEE Transactions on Semiconductor Manufacturing (2023).
Thank You!