
ST1 4483 8995 Capstone PPT Template

This presentation explores analyzing the Iris dataset using exploratory data analysis, developing a predictive model using decision trees, and creating a GUI desktop and web application. The EDA found correlations between features and differences between species. Testing classifiers found random forests most accurate at 96.67%. Applications allowing users to input features and get predictions were created with Tkinter and Streamlit.

Uploaded by

360mostafasaif

4483/8995 CAPSTONE PROJECT

PRESENTATION

Exploring the Iris Dataset: EDA, Classification and GUI Application

INSERT STUDENT NAMES/IDs:


TUTORIAL GROUP – WEEK DAY/TIME:
Table of Contents

1. Introduction / Problem Statement
2. Dataset Details
3. EDA (Exploratory Data Analysis) Outcomes
4. PDA (Predictive Data Analytics) Outcomes
5. Implementation and Deployment (TkInter/Flask/Streamlit) Plan and Status Update
6. References/Bibliography
1. Introduction / Problem Statement
• This presentation covers our analysis of the Iris dataset and the development of a predictive model for classifying its species.
• The Iris dataset is a well-known dataset in the field of machine learning and data analysis, and it contains
measurements of various features of different Iris flower species. In this presentation, we will walk you through
our exploratory data analysis (EDA) of the dataset, where we have looked at different characteristics and patterns
in the data.
• We will also present our approach for developing a predictive model using a decision tree classifier, and we will
demonstrate how we can use this model to classify new Iris samples based on their features.
• Finally, we will showcase our efforts in creating a desktop GUI application using Python's Tkinter library and a
web application using Streamlit.
• We hope you find this presentation informative and interesting.
2. Dataset Details
• The Iris dataset is a classic dataset used in machine learning and statistics. It contains information on three
species of the Iris flower (Iris setosa, Iris versicolor, and Iris virginica) with 50 samples for each species.

• The dataset includes four features of each sample: sepal length, sepal width, petal length, and petal width. The
dataset is often used for classification tasks, where the goal is to predict the species of a new flower based on its
features.
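• As a minimal sketch, the structure described above can be checked in a few lines of Python. This assumes the copy of the dataset bundled with scikit-learn rather than the Kaggle CSV listed in the references, and it needs pandas installed; the column name "species" is our own choice for illustration.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of Iris stands in for the Kaggle CSV.
iris = load_iris(as_frame=True)

# iris.frame holds the four measurements (in cm) plus an integer "target" column.
df = iris.frame.copy()

# Map the integer class codes 0/1/2 to readable species names.
code_to_name = {i: str(name) for i, name in enumerate(iris.target_names)}
df["species"] = df["target"].map(code_to_name)

# Four feature columns and the number of samples per species.
feature_columns = list(iris.feature_names)
samples_per_species = df["species"].value_counts().to_dict()

print(feature_columns)
print(samples_per_species)
```

The same checks apply to a CSV loaded with pandas, although the column names in that file may differ.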
3. EDA (Exploratory Data Analysis) Outcomes
• The average sepal length and width of the flowers are 5.8 cm and 3.1 cm, respectively. The average petal
length and width are 3.8 cm and 1.2 cm, respectively.

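• The petal length and width are highly correlated (Pearson r of about 0.96), while the sepal length and width are only weakly correlated; with the three species pooled, that correlation is slightly negative (about -0.12).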

• The setosa species can be easily separated from the other two species based on their petal length and
width, while versicolor and virginica have some overlap in their feature distributions.
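• The summary statistics, correlations and species separation listed above can be recomputed with a short sketch like the one below. It assumes scikit-learn's bundled copy of the dataset and pandas; the variable names are illustrative and not taken from our project code.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data, loaded as pandas objects.
iris = load_iris(as_frame=True)
features = iris.data  # four measurement columns, in cm
species = iris.target.map({i: str(n) for i, n in enumerate(iris.target_names)})

# Mean of each feature, rounded to one decimal place.
means = features.mean().round(1)

# Pearson correlations between all pairs of features.
corr = features.corr()
petal_corr = corr.loc["petal length (cm)", "petal width (cm)"]
sepal_corr = corr.loc["sepal length (cm)", "sepal width (cm)"]

# Minimum and maximum petal length per species, to see where ranges overlap.
petal_ranges = features.groupby(species)["petal length (cm)"].agg(["min", "max"])

print(means)
print(f"petal length vs width: r = {petal_corr:.2f}")
print(f"sepal length vs width: r = {sepal_corr:.2f}")
print(petal_ranges)
```

Comparing the rows of petal_ranges shows whether one species' petal lengths fall entirely below another's, which is the basis for the separation claim.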
4. PDA (Predictive Data Analytics) Outcomes
• As our project focuses on classification of iris plants based on their sepal and
petal dimensions, the PDA outcomes involve the performance of different
machine learning models on the dataset.
• We evaluated the performance of four different classification models -
Logistic Regression, K-Nearest Neighbors, Decision Tree, and Random
Forest - using 10-fold cross-validation. The results showed that all four
models were able to classify the iris plants with high accuracy, with Random
Forest performing the best with an average accuracy of 96.67%.
• We also performed feature selection using Recursive Feature Elimination
(RFE) with Logistic Regression as the underlying model. The RFE results
showed that the most important features for iris classification were petal
length and petal width.
• Overall, the PDA outcomes suggest that machine learning models can
effectively classify iris plants based on their sepal and petal dimensions, and
that petal length and width are the most important features for this
classification task.
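• The evaluation described above can be sketched as follows. This is an illustration, not our exact project code: the hyperparameters, the shuffled stratified folds and the random seeds are all assumptions, so the accuracies it prints may differ from the 96.67% reported above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target

# The four model families named above; all settings here are assumed defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 10-fold cross-validation; stratification and the seed are assumptions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.4f}")

# Recursive Feature Elimination with Logistic Regression, keeping two features.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)
selected = [n for n, keep in zip(data.feature_names, rfe.support_) if keep]
print("Features kept by RFE:", selected)
```

Fixing random_state makes a run repeatable; changing the seed or the fold count shifts the mean accuracies slightly, so small differences between models should be read with care.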
5. Implementation and Deployment (TkInter/Flask/Streamlit) Plan and Status Update

• For the deployment plan, we will create a desktop application using the Tkinter library for the decision tree model, and a web-based application using the Streamlit library for the support vector machine model. Both applications will allow users to input values for the four iris features, and the model will output the predicted class.

• Currently, the implementation phase for both the desktop application and web-based application has been completed. The desktop application has been developed using Python and Tkinter, while the web-based application has been developed using Python and Streamlit. Both applications have been tested and validated with the iris dataset, and the accuracy of the machine learning models has been confirmed.
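• A minimal sketch of how such an application can be wired together is shown below. The helper predict_species and the function launch_desktop_app are hypothetical names introduced for illustration, the decision tree settings are assumed, and the layout is not taken from our actual application. The Tkinter window is only defined here, not opened, because it needs a display.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Train a decision tree on the full dataset (assumed settings).
iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)


def predict_species(sepal_length, sepal_width, petal_length, petal_width):
    """Return the predicted species name for one flower (measurements in cm)."""
    row = [[sepal_length, sepal_width, petal_length, petal_width]]
    return str(iris.target_names[model.predict(row)[0]])


def launch_desktop_app():
    """Hypothetical Tkinter front end: four entry boxes and a Predict button."""
    import tkinter as tk  # imported here so the helper above works without a display

    root = tk.Tk()
    root.title("Iris species predictor")

    labels = ["Sepal length (cm)", "Sepal width (cm)",
              "Petal length (cm)", "Petal width (cm)"]
    entries = []
    for row, text in enumerate(labels):
        tk.Label(root, text=text).grid(row=row, column=0, sticky="w")
        entry = tk.Entry(root)
        entry.grid(row=row, column=1)
        entries.append(entry)

    result = tk.Label(root, text="")
    result.grid(row=len(labels) + 1, column=0, columnspan=2)

    def on_predict():
        # Validate the four inputs before calling the model.
        try:
            values = [float(e.get()) for e in entries]
        except ValueError:
            result.config(text="Please enter four numbers.")
            return
        result.config(text=f"Predicted species: {predict_species(*values)}")

    tk.Button(root, text="Predict", command=on_predict).grid(
        row=len(labels), column=0, columnspan=2)
    root.mainloop()


print(predict_species(5.1, 3.5, 1.4, 0.2))
```

Keeping the prediction logic in one helper means a Streamlit page could reuse it, for example by collecting the four values with st.number_input and showing the result with st.write.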
5. Implementation and Deployment (TkInter/Flask/Streamlit) Plan and Status Update (Screenshots)

[Screenshot: Desktop GUI app]

[Screenshot: Streamlit app]
6. References / Bibliography
Brownlee, J. (2020). How to Develop a Multiclass Classification Model for Iris Flower Species. Retrieved from https://machinelearningmastery.com/how-to-develop-a-multiclass-classification-model-for-iris-flower-species/

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.

https://www.kaggle.com/datasets/uciml/iris?resource=download

Thakur, A. Building Machine Learning Applications with Streamlit. Retrieved from https://towardsdatascience.com/building-machine-learning-applications-with-streamlit-667cef3e0f1a
