INTEL - AI FOR MANUFACTURING
PROJECT
REPORT
for
Pass/Fail Prediction for Semiconductor Wafers
Prepared by :
Gosai Harshpari Sunilpari
Patel Jenil Mahendrakumar
Uneval Lalit Bharatbhai
TABLE OF CONTENTS
OVERVIEW
OBJECTIVES
METHODOLOGY
TECHNOLOGIES USED
RESULTS
CONCLUSION
PROTOTYPE
IMPORTANT LINKS
OVERVIEW
Project Title : Pass/Fail Prediction for Semiconductor Wafers
Project Description : Semiconductor wafer manufacturing is a critical
step in electronics production. However, wafers often undergo retesting
because of inaccurate predictions, which causes delays and increases cost.
This project implements an AI-based prediction model to identify
potential pass/fail outcomes during the initial testing stage. By leveraging
data-driven insights, the model aims to improve first-time pass rates
and streamline the manufacturing process.
Timeline :
Phase                    Duration
Problem Scoping          3 days
Data Acquisition         1 day
Data Exploration         2 days
Modeling                 5 days
Evaluation               1 day
Report & Finalization    2 days
Benefits :
Reduces production delays
Reduces wafer retesting
Increases production rate
Team Members :
Gosai Harshpari Sunilpari - gosaiharsh8200@[Link]
Patel Jenil Mahendrakumar - jenilgajera19@[Link]
Uneval Lalit Bharatbhai - unevallalit6499@[Link]
Risks :
The model may not perform well on unseen wafer design patterns
Any inaccuracy in data preprocessing can bias the output
Lack of real-time data integrity can limit deployment
OBJECTIVES
Primary Objectives : The primary objective of this project is to develop an
AI-based prediction model that classifies semiconductor wafers as Pass or
Fail during testing. The aim is to improve first-time yield by
minimizing the number of wafers that require retesting. By leveraging
historical wafer data and machine learning algorithms, the model is
designed to assist manufacturers in making faster, more reliable
decisions during the quality control process.
Secondary Objectives : To collect and preprocess historical wafer test
data so that it is ready for model training, and to identify the key
parameters that influence wafer test outcomes. A further objective is to
prepare the model for future integration into a real-time production
environment.
Measurable Goals :
Achieve at least 90% accuracy in classifying wafer pass/fail status
on the test dataset.
Reduce the false positive rate (failing wafers incorrectly predicted
as Pass).
Reduce the overall need for wafer retesting relative to historical
retest trends.
METHODOLOGY
Approach : The project followed a data-driven, iterative machine
learning approach. The project was divided into distinct stages
including data preprocessing, model development, evaluation, and
result interpretation. Python and Jupyter Notebook were used for
implementation, supported by libraries such as Pandas, Scikit-learn,
and Matplotlib.
Phases : The project followed a standard AI project development cycle,
carried out step by step. The phases were Problem Scoping, Data
Acquisition, Data Exploration, Modeling, Evaluation, and Deployment.
Deliverables :
Cleaned and preprocessed wafer dataset.
Graphs and charts of the key parameters.
Jupyter Notebook containing all code, plots, and metrics.
Project report with results and conclusion.
Testing and Quality Assurance : Testing was conducted using a 70-30
train-test split to assess model generalization. Evaluation metrics such
as the F1-score were prioritized because of class imbalance in the data.
Confusion matrix analysis was used to check that the model kept false
positives low. Code validation included checking data pipeline
correctness, ensuring reproducibility, and avoiding data leakage.
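One common way to guard against data leakage is to keep every preprocessing step inside a scikit-learn Pipeline that is fitted on the training rows only. The sketch below illustrates that idea on synthetic data. The report does not state whether a Pipeline was used, and the imputation strategy, component count, seed, and stratified split are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wafer data (assumption: the real dataset is not used here).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
X[rng.random(X.shape) < 0.05] = np.nan     # sprinkle in missing readings
y = (rng.random(200) < 0.8).astype(int)    # imbalanced labels: 1 = Pass, 0 = Fail

# 70-30 split with a fixed seed for reproducibility; stratify keeps the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# All preprocessing lives inside the pipeline, so its statistics (medians,
# means, principal axes) are learned from the training rows only.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
test_predictions = model.predict(X_test)
```

Because the test rows never reach a fit call, the score on X_test is not influenced by statistics computed from those rows.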
TECHNOLOGIES USED
Programming Language : Python was used for all stages of the project,
including preprocessing, model building, evaluation, and visualization.
It was chosen for its rich ecosystem of machine learning libraries and
its readability.
Development Framework & Libraries :
Pandas: For data loading, transformation, and manipulation.
NumPy: Used for numerical operations and efficient array
handling.
Matplotlib & Seaborn: Used for data visualization, including
count plots and confusion matrices.
Scikit-learn: Provided tools for machine learning model
implementation, model evaluation, and utilities like train-test
splitting.
Development Tools :
Jupyter Notebook : The development environment where the
entire workflow, from data exploration to model evaluation, was
written and executed. Jupyter’s cell-based format made it easy to
test, debug, and document code inline.
Testing Tools :
Scikit-learn’s evaluation suite : Metrics such as the confusion matrix,
classification report, accuracy, precision, recall, and F1-score were
used to validate the models.
RESULTS
Key Metrics : The model's performance was evaluated using accuracy,
precision, recall, F1 score, and the confusion matrix. The results are
shown below.
Metric       Score (%)
Accuracy     67
Precision    96
Recall       70
F1 score     80
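The F1 score is the harmonic mean of precision and recall, so the reported value can be cross-checked against the other two rows of the table. The short sketch below does that arithmetic with the rounded percentages. The function name is illustrative only.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported scores from the table above, expressed as fractions.
reported_precision = 0.96
reported_recall = 0.70

f1 = f1_from(reported_precision, reported_recall)
print(f"F1 from rounded precision and recall: {f1:.3f}")
```

The result is about 0.81, which agrees with the reported 80% once rounding of the input percentages is taken into account.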
Return on investment (ROI) :
Reduction in Retesting Costs: By predicting pass/fail outcomes
accurately, the model reduces unnecessary retests, saving both
machine time and labor.
Faster Production: Manufacturers can make quick Pass/Fail decisions
for each wafer, accelerating wafer flow on the production line.
Improved Yield Quality: With a precision of 96%, the model helps
avoid releasing faulty wafers into further stages.
CONCLUSION
Recap : This project focused on developing an AI-based model to predict
the Pass/Fail status of semiconductor wafers. Using Python and
machine learning frameworks, multiple models were built and
evaluated on key classification metrics. The project followed a
structured methodology, from data preprocessing and model training
to testing and evaluation, to assess the model’s robustness and
accuracy.
Key Takeaways :
The model reached a precision of 96%, which lowers the likelihood of
false positives (faulty wafers predicted as Pass). Accuracy (67%) and
recall (70%) were lower, and accuracy remained below the 90% goal.
Among the models tested, Logistic Regression achieved the best
balance between precision, recall, and F1-score.
The entire project was implemented using open-source
technologies, making it a cost-effective solution for smart
manufacturing.
Future Plan :
Deployment in a production environment, possibly through API
integration or real-time monitoring dashboards.
Integration with live manufacturing systems, including data
streaming from wafer testing equipment.
Feedback loop integration to allow the model to improve over time
with new wafer test data.
PROTOTYPE
Importing the required libraries and loading the data set into a variable
called dataset.
Performing Exploratory Data Analysis, where the main characteristics of
the data are explored and summarized.
Visualizing the class distribution by plotting bar chart which shows
how many samples belong to each class.
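A minimal sketch of the class-count step follows. It uses a hypothetical miniature table, and the column name "Pass/Fail" and the encoding 1 = Pass, 0 = Fail are assumptions, not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical miniature dataset; column names and label encoding are assumed.
dataset = pd.DataFrame({
    "sensor_1": [0.5, 0.7, 0.6, 0.9, 0.4, 0.8],
    "Pass/Fail": [1, 1, 1, 0, 1, 1],
})

# Number of samples per class; these are the bar heights a count plot would show.
class_counts = dataset["Pass/Fail"].value_counts()
# The same information as a fraction of all samples, which exposes imbalance.
class_share = dataset["Pass/Fail"].value_counts(normalize=True)
print(class_counts)
```

The counts can then be drawn with class_counts.plot(kind="bar") or with Seaborn's countplot on the label column.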
Checking for missing values and selecting categorical columns for
further processing.
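The check described here could look roughly like the sketch below. The table and its column names are hypothetical, and "categorical" is taken to mean any non-numeric column, which is an assumption about how the notebook selects them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in the sensor readings and one text column.
dataset = pd.DataFrame({
    "sensor_1": [0.5, np.nan, 0.6, 0.9],
    "sensor_2": [1.2, 1.1, np.nan, np.nan],
    "lot_id": ["A", "A", "B", "B"],
    "label": [1, 1, 0, 1],
})

# Count of missing values in each column.
missing_per_column = dataset.isnull().sum()

# Non-numeric columns, treated here as the categorical ones.
categorical_columns = dataset.select_dtypes(exclude=["number"]).columns.tolist()

print(missing_per_column)
print(categorical_columns)
```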
Performing Data Preprocessing by analyzing categorical features,
removing irrelevant data, splitting features and labels and imputing
missing values.
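As an illustration of these preprocessing steps, the sketch below drops an identifier column, separates features from labels, and fills gaps with scikit-learn's SimpleImputer. The column names, the choice of which column is irrelevant, and the median strategy are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample; "lot_id" stands in for a column judged irrelevant.
dataset = pd.DataFrame({
    "lot_id": ["A", "A", "B", "B", "C"],
    "sensor_1": [0.5, np.nan, 0.6, 0.9, 0.7],
    "sensor_2": [1.2, 1.1, np.nan, 1.4, 1.3],
    "label": [1, 1, 0, 1, 1],
})

# Drop the irrelevant column, then separate features from labels.
features = dataset.drop(columns=["lot_id", "label"])
labels = dataset["label"]

# Replace each missing value with its column's median (strategy is an assumption).
imputer = SimpleImputer(strategy="median")
features_imputed = pd.DataFrame(
    imputer.fit_transform(features), columns=features.columns
)
print(features_imputed)
```

Fitting the imputer on the training portion only, after the split, would be the stricter option where leakage is a concern.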
The dataset is split into training and testing sets: 70% training data
and 30% testing data.
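A 70-30 split of this kind can be sketched with scikit-learn's train_test_split as below. The data is synthetic, and the fixed seed and the stratify argument are assumptions rather than settings confirmed by the report.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and imbalanced labels (80 Pass, 20 Fail).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([1] * 80 + [0] * 20)

# test_size=0.30 gives the 70-30 split; random_state makes it repeatable,
# and stratify=y keeps the Pass/Fail ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```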
Performing normalization (standardization) by applying StandardScaler
from sklearn to the features, so that each has mean 0 and standard
deviation 1.
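The scaling step can be sketched as follows on synthetic arrays. The scaler is fitted on the training features and then reused on the test features, which is the usual leakage-free pattern and an assumption about the notebook's order of operations.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic training and test features with a non-zero mean and large spread.
rng = np.random.default_rng(1)
X_train = rng.normal(loc=50.0, scale=10.0, size=(70, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(30, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Only the training columns come out with mean exactly 0 and standard deviation exactly 1. The test columns are close to those values but not identical, because they are scaled with the training statistics.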
Reducing dimensionality with principal component analysis (PCA) to
lower the number of features and the noise.
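To illustrate the reduction, the sketch below builds ten correlated synthetic features from three underlying signals and projects them onto three principal components. The number of components is an assumption, since the report does not state the value used.

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 samples with 10 features that are mixtures of 3 latent signals plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 10))

# Keep 3 components (assumed value) and project the data onto them.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

retained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape, f"variance retained: {retained:.3f}")
```

In practice the component count is often chosen by inspecting explained_variance_ratio_ until an acceptable share of the variance is retained.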
Creating and training a logistic regression model on the PCA reduced
training data.
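A minimal sketch of this training step is shown below. Random numbers stand in for the PCA-reduced training features, and the labels, the five-component width, and the max_iter setting are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for PCA-reduced training features (5 components assumed).
rng = np.random.default_rng(3)
X_train_pca = rng.normal(size=(140, 5))
# Labels loosely tied to the first component so there is something to learn.
y_train = (X_train_pca[:, 0] + 0.5 * rng.normal(size=140) > -0.8).astype(int)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_pca, y_train)

# Training accuracy is only a sanity check; the held-out test set is the real measure.
train_accuracy = model.score(X_train_pca, y_train)
print(f"training accuracy: {train_accuracy:.2f}")
```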
Evaluating the model's performance on the test data using metrics such as
accuracy, the classification report, and the confusion matrix.
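The evaluation calls can be sketched as below. The labels and predictions are hypothetical values chosen for illustration, not the project's results, and the encoding 1 = Pass, 0 = Fail is an assumption.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels and predictions (1 = Pass, 0 = Fail).
y_test = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(y_test, y_pred)

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp.
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = matrix.ravel()

# Per-class precision, recall, and F1-score in one text table.
report = classification_report(y_test, y_pred, target_names=["Fail", "Pass"])
print(report)
print(f"accuracy={accuracy:.2f}, false positives={fp}")
```

Here fp counts failing wafers predicted as Pass, the error type the report aims to minimize.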
Predicting the class and probability for a randomly selected sample.
This helps to evaluate model confidence and correctness on a
real-world test case.
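The single-sample check might look like the sketch below, which trains a small model on synthetic data and then inspects one randomly chosen test row. The data, the seed, and the label encoding 1 = Pass, 0 = Fail are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: the first feature determines the label.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > -0.7).astype(int)
X_train, y_train = X[:140], y[:140]
X_test, y_test = X[140:], y[140:]

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Pick one test row at random and ask for both the class and its probabilities.
index = int(rng.integers(len(X_test)))
sample = X_test[index].reshape(1, -1)
predicted_class = int(model.predict(sample)[0])
class_probabilities = model.predict_proba(sample)[0]  # order follows model.classes_

print(f"sample {index}: predicted={predicted_class}, actual={y_test[index]}, "
      f"P(Fail)={class_probabilities[0]:.2f}, P(Pass)={class_probabilities[1]:.2f}")
```

A probability close to 0.5 signals a borderline wafer, which may be a reasonable candidate for retesting.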
IMPORTANT LINKS
GitHub Link :
For the Jupyter Notebook file
[Link]
Google Drive Link :
For the Dataset (.csv) file
[Link]
s/view?usp=sharing