Classification Chapter 5

The document provides an overview of classification in data mining, explaining its purpose, steps, and key concepts such as supervised and unsupervised learning. It outlines the process of building and evaluating classifiers, including popular algorithms like decision trees and Naïve Bayes. Additionally, it discusses the importance of classification for predictions, automation, and decision-making in various applications.

Uploaded by

windsorgrey890

CLASSIFICATION

Fundamentals of Data Mining

Outline:
• Introduction to classification
• Basic Steps in Classification
• Important terms
• Types of classification problems
• Popular classification algorithms
• Evaluation of classifier
• Applications of classification
Introduction
• What is Classification?

Classification is a way to sort things into groups based on their features.

It categorizes new data with the help of current or past data.

Examples:

Grouping patients based on their medical records.

Hospitals use classification to predict diseases based on patient symptoms.
Example
🍎 🍌 🍎 🍌 🍇
Apple Banana Apple Banana Grape

We want to classify them into groups:

Group 1: Apples (🍎)

Group 2: Bananas (🍌)
Group 3: Grapes (🍇)
In Data Mining:
• In data mining, classification is the process where a computer learns from past data (examples we already know) and then predicts the group for new data.
Supervised vs Unsupervised Learning
• Supervised:

The computer learns from past data with answers and then makes
predictions for new data.
Example:
Predicting if a person will get a loan or not
• Unsupervised:
The computer doesn’t get answers. It tries to find patterns or group
things by itself.
Example:
Grouping customers by shopping habits
Why Do We Need Classification in Data Mining?
Reason            Explanation
Predictions       We can guess which group new data belongs to (like spam emails or loan approvals).
Automation        Saves human time! Computers can classify thousands of things very quickly.
Decision Making   Helps companies and people make smart decisions based on data.
Finding Patterns  Helps discover hidden patterns in the data that we might not notice easily.
Organization      Helps in sorting and organizing large amounts of data properly.
Basic Steps In Classification
• Data Collection:

Gather the old data you already know about (like old emails, patient records, etc.).

• Data Preparation:

Clean the data (remove mistakes, fill missing values, etc.).

• Model Building (Training):

Teach the computer using this clean data. The computer looks for patterns and learns.

• Model Testing:

Check if the computer has learned correctly. Give it some new test data and see if it guesses right.

• Model Evaluation:

Measure how good or bad the computer is at guessing.

• Deployment (Use the Model):

Now, start using this trained computer model in real life to predict new cases.
Example
Step             What happens?
1. Collect Data  Gather examples
2. Prepare Data  Clean the examples
3. Train Model   Teach the computer
4. Test Model    Check if it learned
5. Evaluate      See how well it guesses
6. Deploy        Start using it for real!
Simple Example: Classifying Fruits
• Collect: Gather pictures of apples and bananas 🍎🍌.

• Prepare: Remove blurry images.

• Train the model: Teach it what apples and bananas look like.

• Test: Try it with new fruit pictures.

• Evaluate: See how many times it guesses right.

• Deploy: Use it in a real fruit-sorting machine!
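The six steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the fruit measurements (weight in grams, length in centimetres) are invented for the example, and the "closest class average" rule is just one simple stand-in for a model, not a method the slides prescribe.

```python
# All numbers below are made-up illustration data (weight in g, length in cm).

# 1. Collect: labelled examples as (features, label); None marks a bad record.
raw_data = [
    ((150, 7), "apple"), ((160, 8), "apple"), ((170, 7), "apple"),
    ((120, 18), "banana"), ((130, 20), "banana"), ((125, 19), "banana"),
    (None, "apple"),  # unusable record, like a blurry image
]

# 2. Prepare: drop records with missing features.
clean = [(x, y) for x, y in raw_data if x is not None]

# Hold back one example per class for testing; train on the rest.
test_set = [clean[2], clean[5]]
train_set = [row for row in clean if row not in test_set]


def train(rows):
    """3. Train: the 'model' is the average feature vector of each class."""
    sums, counts = {}, {}
    for (weight, length), label in rows:
        sum_w, sum_l = sums.get(label, (0, 0))
        sums[label] = (sum_w + weight, sum_l + length)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sw / counts[c], sl / counts[c]) for c, (sw, sl) in sums.items()}


def predict(model, features):
    """Assign the class whose average is closest to the new item."""
    weight, length = features
    return min(
        model,
        key=lambda c: (model[c][0] - weight) ** 2 + (model[c][1] - length) ** 2,
    )


model = train(train_set)

# 4-5. Test and evaluate: fraction of held-back examples guessed correctly.
correct = sum(predict(model, x) == y for x, y in test_set)
accuracy = correct / len(test_set)
print(f"accuracy on test data: {accuracy:.0%}")

# 6. Deploy: classify a new, unseen fruit.
print(predict(model, (155, 8)))
```

The test examples are kept out of training on purpose: checking the model on data it has already seen would not show whether it learned a pattern or only memorized the examples.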

What is a Model?
• A model is like a smart recipe or a set of rules that the computer creates after learning from old data.

• It remembers the important patterns.

• It uses those patterns to predict the group/class of new things.
Important Terms
Term           Explanation                                                    Example
Training Data  Data we use to teach the computer (with answers).              100 emails with labels
Test Data      New data used to check if the computer has learned correctly.  20 new emails to test predictions
Model          What the computer builds/learns to make predictions.           A spam detector model
Class          A group or category something belongs to.                      "Spam" or "Not Spam" in emails
Label          The correct answer for the data.                               Label = "Spam" for an email
Accuracy       How many predictions were correct out of the total.            90 correct out of 100 = 90% accuracy
Decision Tree
• A decision tree is like a flowchart that helps a computer make decisions based on answers to simple questions.
• It starts at the top (called the root).
• It asks a yes/no or simple question at each step.
• Based on the answer, it moves to the next question.
• Finally, it reaches a decision (this is called a leaf node).
Real-Life Example
Helping a student decide whether they should study for an exam.

              [Is exam tomorrow?]
                /             \
              Yes              No
              /                 \
    [Are notes ready?]       [Watch TV]
        /        \
      Yes         No
      /            \
  [Revise]    [Prepare Notes]
In Data Mining
• A decision tree helps in classification. For example:
• Classify a person’s loan status (Yes or No)

          [Income > 50k?]
            /         \
          Yes          No
          /             \
  [Credit Good?]      Reject
     /      \
   Yes       No
   /          \
Accept      Reject
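One way to see how a computer follows such a tree is to write the loan example as a small data structure and walk it from the root to a leaf. The sketch below is an illustration only; the dictionary layout and the key names `income_over_50k` and `credit_good` are assumptions made for this example.

```python
# The loan tree from the slide, written as nested dictionaries.
# Inner nodes hold a question and one branch per answer; leaves are plain strings.
loan_tree = {
    "question": "income_over_50k",
    "yes": {
        "question": "credit_good",
        "yes": "Accept",
        "no": "Reject",
    },
    "no": "Reject",
}


def classify(tree, person):
    """Walk from the root to a leaf, following the person's answers."""
    node = tree
    while isinstance(node, dict):          # still at a question, not a leaf
        answer = person[node["question"]]  # True or False
        node = node["yes"] if answer else node["no"]
    return node                            # the leaf is the final decision


applicant = {"income_over_50k": True, "credit_good": False}
print(classify(loan_tree, applicant))  # Reject
```

Each pass through the loop answers one question, so the number of steps is at most the depth of the tree (two questions here).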
By Using ID3
• ID3 stands for Iterative Dichotomiser 3.

• ID3 is a popular algorithm used to create decision trees.

It chooses the best question (attribute) at each step by checking which question gives the most information. It uses two measures:

• Entropy

• Information Gain
Entropy
• Entropy tells us how mixed or impure a group of data is.
• If a group has all the same values (like all “Yes”) → Entropy = 0 (pure)
• If it's a perfect mix (like half “Yes” and half “No”) → Entropy = 1 (very impure)
• Entropy Formula:
$\mathrm{Entropy}(S) = -p_1 \log_2(p_1) - p_2 \log_2(p_2) - \dots$
• $p_1, p_2, \dots$ are the proportions of each class (like Yes, No)
• "log base 2" means you're measuring in bits
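The entropy formula can be checked with a short Python function. This is a minimal sketch; the function name `entropy` and the example label lists are chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Classes that never occur are absent from the Counter, so log2(0) is avoided.
    return sum(-(n / total) * log2(n / total) for n in counts.values())


print(entropy(["Yes", "Yes", "Yes", "Yes"]))  # pure group: 0.0
print(entropy(["Yes", "Yes", "No", "No"]))    # perfect two-class mix: 1.0
```

Groups between these two extremes give values between 0 and 1 when there are two classes; for example, two "Yes" and one "No" gives about 0.918 bits.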
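Information Gain
• Information Gain (IG) tells us how much a question (attribute) reduces entropy when we split the data on it. We use it to decide which question to ask first.
Formula of Information Gain:
$\mathrm{Information\ Gain} = \mathrm{Entropy}(\text{Parent}) - \sum_{\text{branches}} \left( \frac{\text{rows in branch}}{\text{total rows}} \times \mathrm{Entropy}(\text{Branch}) \right)$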
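The information gain formula translates almost word for word into code. The sketch below is illustrative: the function names and the four-row example are assumptions, not part of the slides.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy(parent) minus the size-weighted entropy of each branch.

    values: the attribute value of each row (this decides the branch)
    labels: the class label of each row
    """
    branches = defaultdict(list)
    for value, label in zip(values, labels):
        branches[value].append(label)
    weighted = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - weighted


# Hypothetical four-row example: the attribute separates the classes perfectly.
print(information_gain(["A", "A", "B", "B"], ["Yes", "Yes", "No", "No"]))  # 1.0
```

A gain of 1.0 here means the split removes all of the uncertainty; an attribute that leaves every branch as mixed as the parent would have a gain of 0.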
Data Set
Weather   Temperature  Humidity  Windy  Play?
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Step By Step
• Calculate the entropy of the "Play?" column
This tells us how "mixed" the data is (e.g., how many Yes vs. No).
• Choose the best attribute
Check each column (Weather, Temperature, etc.) and find the one with the highest information gain (the best question to split the data).
• Split the dataset
Use the chosen attribute to divide the data into branches.
• Repeat
Do the same steps on each branch, until:
• All data is the same class (pure), or
• There are no more questions left
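The first two steps can be run directly on the eight-row weather data set. The sketch below only picks the first question; it does not build the whole tree. The helper names are assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ["Weather", "Temperature", "Humidity", "Windy", "Play?"]
ROWS = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column_index):
    """Gain from splitting rows on one column; the class is the last column."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[column_index]].append(row[-1])
    weighted = sum(len(b) / len(rows) * entropy(b) for b in branches.values())
    return entropy([row[-1] for row in rows]) - weighted


# Step 1: entropy of the "Play?" column.
play_entropy = entropy([row[-1] for row in ROWS])

# Step 2: information gain of every attribute, then pick the highest.
gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
best = max(gains, key=gains.get)

print(f"Entropy(Play?) = {play_entropy:.3f}")
for name, gain in gains.items():
    print(f"IG({name}) = {gain:.3f}")
print("Best first question:", best)
```

On this data the "Play?" column has four Yes and four No, so its entropy is 1 bit. Weather has the highest gain (roughly 0.66 bits, against less than 0.07 for each of the other columns) because its Sunny and Overcast branches are already pure. Only the Rainy branch would need another question in the Repeat step.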
Naïve Bayes (Bayes’ Theorem)
Person  Covid (Yes/No)  Flu (Yes/No)  Fever (Yes/No)
1       Yes             No            Yes
2       No              Yes           Yes
3       Yes             Yes           Yes
4       No              No            No
5       Yes             No            Yes
6       No              No            Yes
7       Yes             No            Yes
8       Yes             No            No
9       No              Yes           Yes
10      No              Yes           No
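The slide does not say which question this table is meant to answer. As an assumption for illustration, the sketch below asks: "given that a person has a fever, how likely is Covid?" It answers with Bayes' theorem, P(Covid | Fever) = P(Fever | Covid) × P(Covid) / P(Fever), using counts from the ten rows. Exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

# (covid, flu, fever) for persons 1-10, copied from the table.
RECORDS = [
    ("Yes", "No", "Yes"),
    ("No", "Yes", "Yes"),
    ("Yes", "Yes", "Yes"),
    ("No", "No", "No"),
    ("Yes", "No", "Yes"),
    ("No", "No", "Yes"),
    ("Yes", "No", "Yes"),
    ("Yes", "No", "No"),
    ("No", "Yes", "Yes"),
    ("No", "Yes", "No"),
]

n = len(RECORDS)
covid_rows = [r for r in RECORDS if r[0] == "Yes"]
fever_rows = [r for r in RECORDS if r[2] == "Yes"]

p_covid = Fraction(len(covid_rows), n)   # P(Covid = Yes) = 5/10
p_fever = Fraction(len(fever_rows), n)   # P(Fever = Yes) = 7/10
p_fever_given_covid = Fraction(
    sum(r[2] == "Yes" for r in covid_rows), len(covid_rows)
)                                        # P(Fever = Yes | Covid = Yes) = 4/5

# Bayes' theorem: P(Covid | Fever) = P(Fever | Covid) * P(Covid) / P(Fever)
p_covid_given_fever = p_fever_given_covid * p_covid / p_fever
print(p_covid_given_fever)  # 4/7
```

The result agrees with counting directly: of the seven people with a fever, four have Covid. A Naïve Bayes classifier applies the same rule with several features at once, treating the features as independent given the class.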
Classifier Evaluation
• When we build a classifier (like Decision Tree, Naïve Bayes, etc.), we must check how good it is. This process is called evaluation.
Accuracy:
How many predictions were correct out of the total.
Precision:
Of the predicted positives, how many were actually positive?
Recall:
Of all actual positives, how many did we predict correctly?
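The three measures can be computed by counting four kinds of outcome: true positives, false positives, false negatives, and true negatives. The sketch below is a minimal illustration; the six email labels are invented for the example and "Spam" is assumed to be the positive class.

```python
def evaluate(actual, predicted, positive="Spam"):
    """Return (accuracy, precision, recall) for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard: no predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard: no actual positives
    return accuracy, precision, recall


# Hypothetical labels for six emails.
actual = ["Spam", "Spam", "Spam", "Not Spam", "Not Spam", "Not Spam"]
predicted = ["Spam", "Spam", "Not Spam", "Spam", "Not Spam", "Not Spam"]

accuracy, precision, recall = evaluate(actual, predicted)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here two spam emails are caught, one is missed, and one normal email is wrongly flagged, so all three measures come out to about 0.67. On less balanced data the three values usually differ, which is why accuracy alone can be misleading.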
Confusion Matrix
Example
Formulas
