Lab Report
Submitted to:
MD. Assaduzzaman
Senior Lecturer
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Objectives: The objective of this experiment is to understand and implement the Breadth-First Search
(BFS) traversal algorithm for graphs using Python. BFS explores nodes level by level from a starting
node, visiting all immediate neighbors before moving deeper.
This experiment aims to:
• Learn the application of BFS in finding the shortest paths and traversing data structures like trees
and graphs.
Procedure:
➢ Graph Representation:
● A Python dictionary is used to represent the graph as an adjacency list.
● Each key represents a node, and its corresponding value is a list of neighboring nodes.
➢ Queue Initialization:
● We use Python’s deque from the collections module to implement a FIFO queue.
● A set is maintained to track visited nodes, ensuring no node is visited twice.
➢ Testing:
● A graph with 7 nodes was created.
● BFS traversal was performed starting from node 0.
Code:
from collections import deque

class Graph:
    ...
        print(vertex, end=" ")
        for neighbor in ...
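Only fragments of the listing survive, so a minimal reconstruction consistent with them is sketched below (the deque import, the Graph class, and the print(vertex, end=" ") call come from the original; the add_edge helper and the sample 7-node edge set are assumptions):

from collections import deque

class Graph:
    def __init__(self):
        self.adj_list = {}  # adjacency list: node -> list of neighbors

    def add_edge(self, u, v):
        # undirected edge: record both directions
        self.adj_list.setdefault(u, []).append(v)
        self.adj_list.setdefault(v, []).append(u)

    def bfs(self, start):
        visited = {start}       # the set ensures no node is visited twice
        queue = deque([start])  # FIFO queue drives the level-by-level order
        while queue:
            vertex = queue.popleft()
            print(vertex, end=" ")
            for neighbor in self.adj_list.get(vertex, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)

g = Graph()
for u, v in [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]:
    g.add_edge(u, v)
g.bfs(0)  # with these assumed edges, prints: 0 1 2 3 4 5 6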
Output:
Output Explanation:
Conclusion: Through this experiment, we successfully implemented the Breadth-First Search (BFS) algorithm
in Python using an adjacency list. By using a deque for efficient queue operations and a set to manage visited
nodes, the BFS traversal correctly explored the graph level by level.
This hands-on practice improved understanding of how graphs are traversed, how queues operate internally, and
how BFS can be used for various real-world applications like network analysis, finding shortest paths in
unweighted graphs, and social network exploration.
Overall, the experiment deepened our skills in Python programming, object-oriented design, and algorithmic
thinking related to graph theory.
Lab Report
Submitted to:
MD. Assaduzzaman
Senior Lecturer
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Objectives: The objective of this experiment is to understand and implement the Depth-First Search
(DFS) traversal algorithm in Python. DFS explores as far as possible along each branch before
backtracking, making it useful for tasks such as pathfinding, cycle detection, and solving puzzles. The
main goals are to implement recursive DFS on an adjacency-list graph and to trace its traversal order from a source node.
Procedure:
➢ Graph Representation:
● The graph is represented using a Python dictionary (adjacency list format).
● Each key is a node, and its value is a list of connected neighboring nodes.
➢ Initialization:
● A set is used to keep track of visited nodes to prevent revisiting.
➢ Traversal Steps:
● Start DFS traversal from a given source node.
● Mark the node as visited and print/process it.
● Recursively visit each unvisited neighbor.
● Continue depth-first until all nodes are visited.
➢ Testing:
● A sample undirected graph with 7 nodes is created.
● DFS traversal is performed starting from node 0.
Code:
class Graph:
    def __init__(self):
        self.adj_list = {}
    ...
        for neighbor in self.adj_list.get(node, []):
            if neighbor not in ...
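Only fragments survive here as well; a minimal recursive-DFS reconstruction consistent with them follows (Graph, __init__, self.adj_list, and the self.adj_list.get(node, []) loop are from the original; add_edge, the visited-set handling, and the sample edges are assumptions chosen to match the output explanation below):

class Graph:
    def __init__(self):
        self.adj_list = {}

    def add_edge(self, u, v):
        # undirected edge: record both directions
        self.adj_list.setdefault(u, []).append(v)
        self.adj_list.setdefault(v, []).append(u)

    def dfs(self, node, visited=None):
        if visited is None:
            visited = set()
        visited.add(node)   # mark before descending so no node repeats
        print(node, end=" ")
        for neighbor in self.adj_list.get(node, []):
            if neighbor not in visited:
                self.dfs(neighbor, visited)  # go as deep as possible, then backtrack

g = Graph()
for u, v in [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]:
    g.add_edge(u, v)
g.dfs(0)  # with these assumed edges, prints: 0 1 3 4 2 5 6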
Output:
Output Explanation:
• Start at node 0.
• Visit neighbor 1, then go deeper to 3; backtrack to 1 and visit 4.
• After finishing all of 1's neighbors, backtrack to 0 and move to 2.
• Then recursively visit 5 and 6.
Conclusion: In this lab, we successfully implemented the Depth-First Search (DFS) traversal algorithm using
recursion in Python. The experiment highlighted how DFS dives deep into the graph before backtracking, making
it ideal for exhaustive exploration.
The experience strengthened our understanding of fundamental graph algorithms and improved our Python
programming skills, especially recursion and data structure handling.
Lab Report
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Objectives: The objective of this lab experiment is to perform data preprocessing steps on a dataset to
handle missing values effectively and prepare the dataset for further analysis or model training.
Key goals include visualizing missing data, dropping columns with excessive missing values, and imputing the remaining numerical gaps with column means.
Procedure:
Dataset Loading:
• Upload the CSV dataset using Google Colab’s files.upload() method.
• Read the dataset into a Pandas DataFrame.
df3_drop_rows = df2_drop_clm.select_dtypes(include=['int64','float64']).values

plt.figure(figsize=(25,25))
for i, var in enumerate(num_var):
    plt.subplot(9, 4, i+1)
    sns.distplot(df[var], bins=20)
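The imputation step described in the conclusion does not appear in the surviving listing; a minimal sketch of mean imputation with scikit-learn's SimpleImputer is given below (the DataFrame name df is carried over from above):

from sklearn.impute import SimpleImputer

# impute missing numerical values with the column mean
num_cols = df.select_dtypes(include=['int64', 'float64']).columns
imputer = SimpleImputer(strategy='mean')
df[num_cols] = imputer.fit_transform(df[num_cols])
print(df.isnull().sum())  # verify no numeric missing values remain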
Output:
Conclusion: In this lab, we successfully performed data preprocessing on a real-world dataset using Python. We
visualized missing data, identified columns with excessive missing values, and dropped them. We then handled
missing numerical data using mean value imputation through the SimpleImputer class.
The process highlighted the importance of cleaning datasets before applying machine learning algorithms.
Properly addressing missing values ensures that models are trained on complete and meaningful data, leading to
better accuracy and insights.
Additionally, visual comparisons showed how preprocessing can slightly adjust data distributions, impacting
future data analysis. This lab enhanced our skills in data cleaning, imputation techniques, and visualization using
Python.
Lab Report
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Experiment Name: Handling Imbalanced Dataset using Over-Sampling and Hybrid Techniques
Objectives: The objective of this lab is to address the problem of imbalanced class distribution using
advanced resampling techniques.
The goals are to balance the RiskLevel classes using SMOTE and to compare hybrid alternatives such as SMOTEENN and ADASYN.
Procedure:
Dataset Loading:
• Load the Maternal Health Risk Dataset.
• Explore features and check the distribution of the target class (RiskLevel).
Data Preprocessing:
• Handle missing values if any (isnull().sum()).
• Detect and replace outliers with column mean using IQR method.
• Standardize numerical features using StandardScaler.
Resampling Techniques:
• SMOTE:
o Apply SMOTE to create synthetic samples for minority classes.
o Balance the dataset.
• SMOTEENN (optional step - add if required):
o Combine SMOTE with Edited Nearest Neighbor to oversample and clean.
• ADASYN (optional step - add if required):
o Generate synthetic samples for difficult-to-learn examples dynamically. (Both optional steps are sketched below.)
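For the optional SMOTEENN and ADASYN steps, a minimal sketch using imbalanced-learn (assuming the features X and target y have already been separated, as in the code that follows):

from imblearn.combine import SMOTEENN
from imblearn.over_sampling import ADASYN

# SMOTE + Edited Nearest Neighbours: oversample, then clean noisy samples
X_se, y_se = SMOTEENN(random_state=42).fit_resample(X, y)

# ADASYN: concentrate synthetic samples where the minority class is hardest to learn
X_ad, y_ad = ADASYN(random_state=42).fit_resample(X, y)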
Code:
from google.colab import drive
drive.mount('/content/drive')

# sklearn
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, confusion_matrix, precision_score, recall_score, auc, roc_curve
from sklearn import ensemble, linear_model, neighbors, svm, tree, neural_network
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn import svm, model_selection, tree, linear_model, neighbors, naive_bayes, ensemble, discriminant_analysis, gaussian_process
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

path = '/content/Maternal Health Risk Data Set.csv'  # or wherever it is in your Colab session
df = pd.read_csv(path)
df.head()

X = df.drop('RiskLevel', axis=1)
y = df['RiskLevel']
# Value counts for the RiskLevel column
print(df['RiskLevel'].value_counts())

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# class_distribution is assumed here; its defining cell is not in the listing
class_distribution = df['RiskLevel'].value_counts().to_dict()

# Convert to DataFrame
df_class_distribution = pd.DataFrame(list(class_distribution.items()), columns=['RiskLevel', 'Count'])
print(df_class_distribution)

import matplotlib.pyplot as plt
import seaborn as sns
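# The cell that computes the IQR bounds used below is missing from the surviving
# listing; an assumed reconstruction matching the variable names that follow:
numerical_df = df.select_dtypes(include=['int64', 'float64'])
Q1 = numerical_df.quantile(0.25)
Q3 = numerical_df.quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR   # values below this are treated as outliers
upper_bound = Q3 + 1.5 * IQR   # values above this are treated as outliers
df_cleaned = df.copy()         # will hold the data with outliers replaced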
# Find outliers (those that fall outside the lower and upper bounds)
# Apply the outlier detection only to numerical columns
outliers = ((numerical_df < lower_bound) | (numerical_df > upper_bound))

# Print rows with outliers using the original DataFrame (df)
outlier_rows = df[outliers.any(axis=1)]
print(outlier_rows)

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Loop through each numerical column and replace outliers with the mean of the column
for column in numerical_df.columns:
    mean_value = df[column].mean()
    df_cleaned[column] = df[column].apply(
        lambda x: mean_value if x < lower_bound[column] or x > upper_bound[column] else x)

# Now df_cleaned has outliers replaced with the mean values
print(df_cleaned)

import seaborn as sns
import matplotlib.pyplot as plt

# Create a boxplot to visualize the data after replacing outliers with the mean
plt.figure(figsize=(10, 6))
sns.boxplot(data=df_cleaned)  # use df_cleaned after replacing outliers
plt.title('Boxplot After Replacing Outliers with Mean')
plt.show()
# SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic samples for the minority class.
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler  # needed below; missing from the original listing

# Initialize SMOTE
smote = SMOTE(random_state=42)

# Initialize RandomUnderSampler
undersample = RandomUnderSampler(random_state=42)
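# The cell that actually resamples and scales the data is not in the surviving
# listing; an assumed reconstruction matching the names used below:
from sklearn.preprocessing import StandardScaler
X_resampled, y_resampled = smote.fit_resample(X, y)      # oversample minority classes
scaler = StandardScaler()
X_resampled_scaled = scaler.fit_transform(X_resampled)   # standardize features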
# Convert back to DataFrame to visualize the resampled and scaled data
df_resampled_scaled = pd.DataFrame(X_resampled_scaled, columns=X.columns)
df_resampled_scaled['RiskLevel'] = y_resampled
print(df_resampled_scaled.head())
print(df_resampled_scaled['RiskLevel'].value_counts())

# Split the resampled and scaled data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_resampled_scaled, y_resampled, test_size=0.20, random_state=42)
Output:
Conclusion: In this experiment, we addressed the issue of class imbalance in a real-world health dataset by
applying SMOTE and optionally SMOTEENN/ADASYN techniques.
By generating synthetic samples, we balanced the dataset, leading to significant improvement in model
performance metrics.
Key learnings: how SMOTE synthesizes minority-class samples, how hybrid methods such as SMOTEENN and ADASYN refine or adapt the oversampling, and why outlier handling and feature scaling should be done before resampling.
Lab Report
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Experiment Name: Feature Selection and Model Evaluation on Maternal Health Risk Dataset
Objectives: The objective of this lab is to apply different feature selection techniques on a real-world
dataset and analyze their impact on model performance.
The primary goals are:
• Apply Recursive Feature Elimination (RFE) and Chi-Square Test for selecting important features.
• Evaluate model accuracy after feature selection using Random Forest and Decision Tree
classifiers.
• Understand the role of feature selection in reducing overfitting and improving generalization.
Procedure:
• Dataset Loading:
• Load the "Maternal Health Risk" dataset.
• Encode the target variable RiskLevel into numerical form using LabelEncoder.
• Data Splitting:
• Split the data into training (70%) and testing (30%) sets using train_test_split.
• Model Training:
• Train Random Forest and Decision Tree classifiers on the selected features from each method.
• Evaluate using accuracy score.
• Result Analysis:
• Compare model accuracies across different feature selection methods.
• Identify which method led to the best model performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif, VarianceThreshold
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
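# The dataset-loading and splitting cells are missing from the surviving
# listing; an assumed reconstruction following the procedure (70/30 split,
# LabelEncoder on RiskLevel; the CSV path is a placeholder):
df = pd.read_csv('Maternal Health Risk Data Set.csv')
X = df.drop('RiskLevel', axis=1)
y = LabelEncoder().fit_transform(df['RiskLevel'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)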
# Feature selection using Variance Threshold (removing low-variance features)
var_thresh = VarianceThreshold(threshold=0.1)
X_train_var = var_thresh.fit_transform(X_train)
X_test_var = var_thresh.transform(X_test)

# Feature selection using Recursive Feature Elimination (RFE) with RandomForestClassifier
rfe_selector = RFE(estimator=RandomForestClassifier(), n_features_to_select=5)
X_train_rfe = rfe_selector.fit_transform(X_train, y_train)
X_test_rfe = rfe_selector.transform(X_test)

# Train and evaluate a Decision Tree on the selected features
# (X_train_selected / X_test_selected refer to whichever method's output is being evaluated)
dt = DecisionTreeClassifier()
dt.fit(X_train_selected, y_train)
y_pred_dt = dt.predict(X_test_selected)
accuracy_dt = accuracy_score(y_test, y_pred_dt)
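# The helper train_and_evaluate and the ANOVA / Chi-Square / Mutual Information /
# Feature Importance selector cells are missing from the surviving listing.
# Assumed reconstructions consistent with how they are called below:
def train_and_evaluate(X_tr, X_te, y_tr, y_te, method_name):
    records = []
    for name, clf in [('Random Forest', RandomForestClassifier()),
                      ('Decision Tree', DecisionTreeClassifier())]:
        clf.fit(X_tr, y_tr)
        records.append((method_name, name, accuracy_score(y_te, clf.predict(X_te))))
    return records

# Each missing selector cell follows the same SelectKBest pattern, e.g. ANOVA (k=5 assumed):
anova_selector = SelectKBest(f_classif, k=5)
X_train_anova = anova_selector.fit_transform(X_train, y_train)
X_test_anova = anova_selector.transform(X_test)
anova_selected_features = X.columns[anova_selector.get_support()]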
results = []
results.extend(train_and_evaluate(X_train_anova, X_test_anova, y_train, y_test, 'ANOVA'))
results.extend(train_and_evaluate(X_train_chi2, X_test_chi2, y_train, y_test, 'Chi-Square'))
results.extend(train_and_evaluate(X_train_mi, X_test_mi, y_train, y_test, 'Mutual Information'))
results.extend(train_and_evaluate(X_train_var, X_test_var, y_train, y_test, 'Variance Threshold'))
results.extend(train_and_evaluate(X_train_rfe, X_test_rfe, y_train, y_test, 'RFE'))
results.extend(train_and_evaluate(X_train_rf_imp, X_test_rf_imp, y_train, y_test, 'Feature Importance'))

# Get the selected feature names from each feature selection method
feature_names = X.columns

# Create a list to store the selected features and their corresponding methods
selected_features_list = []
selected_features_list.append(['ANOVA'] + list(anova_selected_features))
selected_features_list.append(['Chi-Square'] + list(chi2_selected_features))
selected_features_list.append(['Mutual Information'] + list(mi_selected_features))
selected_features_list.append(['RFE'] + list(rfe_selected_features))
selected_features_list.append(['Variance Threshold'] + list(var_selected_features))
selected_features_list.append(['Feature Importance'] + list(important_features))

# Find the maximum number of features selected by any method + 1 (for the method name)
max_len = max(len(row) for row in selected_features_list)

# Pad shorter lists with None to make them equal length
for row in selected_features_list:
    while len(row) < max_len:
        row.append(None)
Output:
Conclusion: In this lab experiment, we successfully implemented multiple feature selection techniques and
analyzed their effects on model performance.
Key Findings:
• Recursive Feature Elimination (RFE) slightly outperformed Chi-Square in both Random Forest and
Decision Tree classifiers.
• Feature selection helped reduce overfitting, improved training speed, and maintained or enhanced model
accuracy.
• Chi-Square method is simple but only applicable when features are non-negative and independent.
• RFE is model-dependent but more flexible and powerful for complex data.
Lab Report
Task Name: Model Training and Evaluation using Random Forest, Decision Tree,
and Logistic Regression
Submitted to:
MD. Assaduzzaman
Senior Lecturer
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Experiment Name: Classification using Different Machine Learning Algorithms on Health Risk Dataset
Objectives:
• Implement multiple machine learning algorithms: Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR).
Procedure:
• Dataset Loading:
• Load the Alzheimer’s Disease Dataset.
• Preprocessing:
• Separate features (X) and target (y).
• Apply one-hot encoding to categorical variables.
• Standardize features using StandardScaler.
• Model Definition:
• Logistic Regression (LR): Basic linear classifier.
• Decision Tree (DT): Tree-based classification model.
• Random Forest (RF): Ensemble method using multiple decision trees.
• Model Evaluation:
• Apply Stratified K-Fold Cross-Validation (5 folds).
• Calculate and store mean accuracy for each model on each feature-selected dataset.
• Result Analysis:
• Compare the models' performance across different feature selection techniques.
• Identify which model and feature selection combination achieved the highest accuracy.
Code:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.feature_selection import (
    SelectKBest, f_classif, mutual_info_classif, RFE, VarianceThreshold, chi2, RFECV
)
from sklearn.linear_model import LogisticRegression, Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from imblearn.over_sampling import SMOTE

# Mutual Information
mi_selector = SelectKBest(mutual_info_classif, k=20)
X_mi = mi_selector.fit_transform(X_resampled, y_resampled)

# RFE
logistic_model = LogisticRegression(max_iter=500)
rfe_selector = RFE(logistic_model, n_features_to_select=10)
X_rfe = rfe_selector.fit_transform(X_resampled, y_resampled)

# Variance Threshold
var_thresh = VarianceThreshold(threshold=0.1)
X_var_thresh = var_thresh.fit_transform(X_resampled)

# RFECV
rfecv_selector = RFECV(logistic_model, step=1, cv=5)
X_rfecv = rfecv_selector.fit_transform(X_resampled, y_resampled)

# PCA
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X_resampled)
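# The cell defining the models, CV splitter, and method -> dataset map is
# missing from the listing; an assumed reconstruction matching the names used
# in the evaluation loop below (5-fold Stratified CV per the procedure).
# X_resampled, y_resampled are assumed to come from an earlier SMOTE step, e.g.:
# X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
models = {
    'Logistic Regression': LogisticRegression(max_iter=500),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
feature_datasets = {
    'Mutual Information': X_mi,
    'RFE': X_rfe,
    'Variance Threshold': X_var_thresh,
    'RFECV': X_rfecv,
    'PCA': X_pca,
}
accuracies = {}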
# Evaluate each model with all feature selection methods
for model_name, model in models.items():
    accuracies[model_name] = {}
    for method_name, X_selected in feature_datasets.items():
        scores = cross_val_score(model, X_selected, y_resampled, cv=cv, scoring='accuracy')
        mean_accuracy = scores.mean()
        accuracies[model_name][method_name] = mean_accuracy

# Step 9: Display the accuracies for each model and feature selection method
for model_name, acc in accuracies.items():
    print(f'\nModel: {model_name}')
    for method, accuracy in acc.items():
        print(f'  Cross-validated accuracy with {method}: {accuracy * 100:.2f}%')
Output:
Conclusion:
• Random Forest achieved the best classification accuracy across all feature selection methods.
• Logistic Regression worked well on feature sets like ANOVA and Chi-Square but struggled when the
data had complex relationships.
• Feature selection techniques like RFE and RFECV provided the best boost in model performance.
• Handling class imbalance with SMOTE significantly improved classification performance for minority
classes.
Lab Report
Task Name: Hyperparameter Tuning using GridSearchCV for Decision Tree and
Random Forest
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Procedure:
• Outlier Removal:
• Detect and remove outliers using the IQR method.
• Feature Selection:
• Select the top 9 important features using SelectKBest with ANOVA F-test.
• Data Standardization:
• Scale the feature values using StandardScaler.
• Train-Test Split:
• Split the dataset into 80% training and 20% testing sets.
• Hyperparameter Tuning:
• Perform GridSearchCV on:
• Decision Tree Classifier:
• Hyperparameters: criterion, max_depth, min_samples_split, min_samples_leaf.
• Random Forest Classifier:
• Hyperparameters: n_estimators, max_depth, min_samples_split, min_samples_leaf, bootstrap.
• Use 5-fold cross-validation during tuning.
• Model Evaluation:
• Predict on the test set using the tuned models.
• Evaluate models using accuracy, classification report, and confusion matrix.
• Performance Comparison:
• Analyze the improvements in model performance after tuning.
Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.utils import resample

# Load dataset
df = pd.read_csv('/content/data (1).csv')
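# The preprocessing and parameter-grid cells are missing from the surviving
# listing; an assumed reconstruction following the procedure (the IQR outlier
# step is omitted here; the target column name 'target' is a placeholder):
X = df.drop('target', axis=1)
y = df['target']
selector = SelectKBest(f_classif, k=9)                 # top 9 features via ANOVA F-test
X_selected = selector.fit_transform(X, y)
X_scaled = StandardScaler().fit_transform(X_selected)  # standardize features
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.20, random_state=42)      # 80/20 split

# Assumed grids covering the hyperparameters named in the procedure:
dt_param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
rf_param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True, False],
}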
dt_model = DecisionTreeClassifier(random_state=42)
dt_grid = GridSearchCV(dt_model, param_grid=dt_param_grid, cv=5, n_jobs=-1, verbose=1)
dt_grid.fit(X_train, y_train)

rf_model = RandomForestClassifier(random_state=42)
rf_grid = GridSearchCV(rf_model, param_grid=rf_param_grid, cv=5, n_jobs=-1, verbose=1)
rf_grid.fit(X_train, y_train)
Conclusion:
• The tuned Random Forest Classifier achieved an accuracy of 92.5% on the test set.
• The Decision Tree Classifier also showed improved performance after tuning.
• Tuning parameters like max_depth, min_samples_split, and bootstrap helped in reducing overfitting and
improved the model's generalization ability.
Lab Report
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Experiment Name: Transfer Learning with CNN (Xception Model) on CT Kidney Dataset
Objectives:
• To apply transfer learning techniques using a pre-trained CNN (Xception) to classify CT scan images of kidneys into four categories: Normal, Cyst, Tumor, and Stone.
• To understand the use of convolutional layers for feature extraction from medical images and
dense layers for classification tasks.
• To evaluate the performance of the CNN model using training, validation, and test datasets.
• To predict new unseen images and measure the confidence level of predictions.
Procedure:
2.8 Visualization:
To better understand the training, we plotted Training Accuracy vs Epochs and Validation Accuracy vs Epochs:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'])
plt.show()
Code:
import tensorflow as tf
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone",
    shuffle=True,
    batch_size=32,
    image_size=(299, 299),
)
import numpy as np
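# The cell that builds the network is missing from the surviving listing; an
# assumed reconstruction of a standard Xception transfer-learning head (the
# frozen base_model, the labels list, and the 4-class output follow the report):
base_model = tf.keras.applications.Xception(
    weights='imagenet', include_top=False, input_shape=(299, 299, 3))
labels = ['Cyst', 'Normal', 'Stone', 'Tumor']  # class folders in the dataset
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(299, 299, 3)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # Xception expects inputs in [-1, 1]
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation='softmax'),  # matches the sparse_categorical_crossentropy loss
])
# Assumed training call producing the `history` used by the plots below:
# history = model.fit(train_ds, validation_data=val_ds, epochs=10)
# Note: with a softmax head, tf.nn.sigmoid(predictions[0]) below only rescales
# the probabilities monotonically; tf.nn.softmax would be the usual choice.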
base_model.trainable = False
model.compile(
    optimizer='adamw',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Cyst/Cyst- (1003).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (1006).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Stone/Stone- (1005).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Tumor/Tumor- (1007).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

# plot accuracy and loss
import matplotlib.pyplot as plt
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

model.save('model.keras')
Output:
• The model achieved very high accuracy (~98.5%) in classifying CT scan images into four medical
conditions.
• Transfer learning helped us save time and computational resources without compromising accuracy.
• Predictions on unseen images were confident and correct, demonstrating strong generalization.
Key Learnings: how pre-trained convolutional layers extract transferable features from medical images, and how a small dense classification head adapts them to a new four-class task.
Lab Report
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Experiment Name: Image Classification Using Transfer Learning (Xception) on CT Kidney Dataset
Objectives:
• Apply transfer learning by using a pre-trained Convolutional Neural Network (Xception) for image classification.
• Classify CT kidney images into four classes: Normal, Cyst, Tumor, and Stone.
Procedure:
Code:
import tensorflow as tf
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone",
    shuffle=True,
    batch_size=32,
    image_size=(299, 299),
)
import numpy as np
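# As in the previous experiment, the model-building cell is missing; an assumed
# reconstruction of the Xception transfer-learning head described in the report:
base_model = tf.keras.applications.Xception(
    weights='imagenet', include_top=False, input_shape=(299, 299, 3))
labels = ['Cyst', 'Normal', 'Stone', 'Tumor']  # class folders in the dataset
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(299, 299, 3)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # Xception expects inputs in [-1, 1]
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation='softmax'),  # matches the sparse_categorical_crossentropy loss
])
# Assumed training call producing the `history` used by the plots below:
# history = model.fit(train_ds, validation_data=val_ds, epochs=10)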
base_model.trainable = False
model.compile(
    optimizer='adamw',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Cyst/Cyst- (1003).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (1006).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Stone/Stone- (1005).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Tumor/Tumor- (1007).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

# plot accuracy and loss
import matplotlib.pyplot as plt
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

model.save('model.keras')
Output:
Conclusion: In this lab experiment:
• Transfer Learning using Xception was successfully applied to classify CT kidney images.
• The model achieved very high accuracy (~98.03%) with minimal training because of the pre-learned
powerful features from ImageNet.
• The custom classification head allowed fine-tuning specifically for the CT kidney dataset.
• The experiment demonstrated how transfer learning can reduce computation time and improve
performance even on relatively small datasets.
Key Takeaways:
• How CNN layers extract and transfer complex features like edges, textures, and patterns.
Lab Report
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Objectives:
• Apply transfer learning by using a pre-trained Xception model for CT Kidney image classification.
• Fine-tune a new classification head by freezing the pre-trained layers and training additional layers.
• Understand the advantages of freezing base layers and training only a small part of the model (for efficiency).
• Split the dataset into training, testing, and validation sets appropriately.
Procedure:
Code:
import tensorflow as tf
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone",
    shuffle=True,
    batch_size=32,
    image_size=(299, 299),
)
import numpy as np
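# The cell that attaches the new classification head to the frozen base is
# missing; an assumed reconstruction matching the fine-tuning setup described:
base_model = tf.keras.applications.Xception(
    weights='imagenet', include_top=False, input_shape=(299, 299, 3))
labels = ['Cyst', 'Normal', 'Stone', 'Tumor']  # class folders in the dataset
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(299, 299, 3)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # Xception expects inputs in [-1, 1]
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation='softmax'),  # new trainable head for the 4 classes
])
# Assumed training call producing the `history` used by the plots below:
# history = model.fit(train_ds, validation_data=val_ds, epochs=10)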
base_model.trainable = False
model.compile(
    optimizer='adamw',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Cyst/Cyst- (1003).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (1006).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Stone/Stone- (1005).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Tumor/Tumor- (1007).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

# plot accuracy and loss
import matplotlib.pyplot as plt
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

model.save('model.keras')
Output:
Conclusion: In this lab, we successfully applied Fine-Tuned Transfer Learning:
• Model achieved very high accuracy (~98.03%), proving the power of transfer learning.
• Frozen base layers ensured faster training with fewer computational resources.
• Unseen image predictions were highly accurate with high confidence scores.
Lab Report
Submitted to:
MD. Assaduzzaman
Department of CSE
Submitted by:
Mohaiminul Islam
ID: 221-15-6007
Section: 61_P1
Department of CSE
Experiment Name: Ablation Study on Batch Size and Optimizer in Transfer Learning with Xception
Objectives:
• Perform an Ablation Study to observe the effect of different hyperparameters (batch size, optimizer) on a CNN’s performance.
• Apply fine-tuned transfer learning using a pre-trained Xception model for CT Kidney image classification.
• Visualize and compare training and validation accuracy curves for different setups.
Procedure:
2.9 Visualization
Plotted accuracy curves to observe training dynamics:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'])
plt.show()
Code:
import tensorflow as tf
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone",
    shuffle=True,
    batch_size=32,
    image_size=(299, 299),
)
import numpy as np
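# The model-building cell (and the base-freezing line) is missing from the
# surviving listing; an assumed reconstruction matching the report's setup:
base_model = tf.keras.applications.Xception(
    weights='imagenet', include_top=False, input_shape=(299, 299, 3))
base_model.trainable = False  # freeze pre-trained weights (absent from the surviving listing)
labels = ['Cyst', 'Normal', 'Stone', 'Tumor']  # class folders in the dataset
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(299, 299, 3)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # Xception expects inputs in [-1, 1]
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation='softmax'),  # matches the sparse_categorical_crossentropy loss
])
# Assumed training call producing the `history` used by the plots below
# (batch size and optimizer are the variables of the ablation study):
# history = model.fit(train_ds, validation_data=val_ds, epochs=10)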
model.compile(
    optimizer='adamw',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.summary()

img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Cyst/Cyst- (1003).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (1006).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Stone/Stone- (1005).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

import numpy as np
img = tf.keras.preprocessing.image.load_img(
    '/kaggle/input/ct-kidney-dataset-normal-cyst-tumor-and-stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Tumor/Tumor- (1007).jpg',
    target_size=(299, 299)
)
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
predictions = model.predict(img_array)
score = tf.nn.sigmoid(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(labels[np.argmax(score)], 100 * np.max(score))
)

# plot accuracy and loss
import matplotlib.pyplot as plt
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

model.save('model.keras')
Output:
Conclusion: In this ablation study, we conclude:
• Freezing the base model (Xception) led to high accuracy (~98.03%) while avoiding overfitting.
• The fine-tuned model was stable, fast, and accurate for CT Kidney Image Classification.