ML Report
2. Data Preprocessing
Dataset Overview
The UCI SECOM dataset contains
measurements from various sensors
used in a manufacturing process, along
with a binary target variable indicating
whether a product passed or failed
quality control.
1. Import necessary libraries:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.decomposition import PCA
import warnings
2. Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')
This step is specific to Google Colab and
allows you to access files stored on Google
Drive. The dataset is stored in the user's
Google Drive, and this code mounts it for use.
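Once the drive is mounted, the dataset can be read like any local file. The sketch below is only an illustration: the helper name load_and_summarize, the column names, and the in-memory demo data are assumptions, and the real file name and location in Drive are not given in this report.

```python
import io

import pandas as pd


def load_and_summarize(source):
    """Read a CSV (path or file-like object) and return the frame
    together with the overall share of missing cells."""
    df = pd.read_csv(source)
    missing_share = float(df.isna().to_numpy().mean())
    return df, missing_share


# Small in-memory CSV standing in for the real file (hypothetical columns).
demo_csv = io.StringIO("sensor_1,sensor_2,label\n1.0,,-1\n2.0,3.0,1\n")
demo_df, demo_missing = load_and_summarize(demo_csv)
print(demo_df.shape, round(demo_missing, 3))
```

In Colab, the same helper would be called with the path of the file under the mounted /content/drive directory instead of the in-memory demo.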
7. Feature Standardization
scaler = StandardScaler()
x_resampled_scaled = scaler.fit_transform(x_resampled)
The features are standardized (scaled to a
mean of 0 and a standard deviation of 1).
This matters for scale-sensitive models
such as SVM, and for PCA.
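To make the effect of standardization concrete, here is a minimal sketch on synthetic data. It is a variation on the step above, not the report's own code: it assumes the data is split first and the scaler is fitted on the training part only, so that test-set statistics do not influence the scaling. The array X and its three columns are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, -3.0, 500.0], scale=[2.0, 0.5, 80.0], size=(200, 3))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on training data
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/std

# Training columns now have mean ~0 and standard deviation ~1.
print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
```

The test columns will be close to, but not exactly, zero mean and unit variance, because they are scaled with the training statistics.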
CONFUSION MATRIX
Conclusion
The code takes the SECOM dataset, handles
missing data and class imbalance, and applies
PCA for dimensionality reduction.
Three classifiers (Logistic Regression, Random
Forest, and SVC) are combined using a Voting
Classifier to make final predictions.
The ensemble is evaluated with accuracy, a
confusion matrix, and a classification report.
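The steps summarized above can be sketched roughly as follows. This is a hedged illustration on synthetic data from make_classification, not the report's actual code: the number of PCA components, the hard-voting mode, and all hyperparameters are assumptions, and the missing-value handling and SMOTE resampling steps are left out because their details are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the sensor data (illustrative only).
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three base classifiers combined by majority (hard) vote.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(random_state=0)),
    ],
    voting="hard",
)

# Scale, reduce dimensionality with PCA, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=10, random_state=0), voter)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", acc)
print(cm)
print(classification_report(y_test, y_pred))
```

Wrapping the steps in one pipeline means the scaler and PCA are fitted only on the training split, which keeps the evaluation on the held-out split honest.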