Classification Chapter 5

The document provides an overview of classification in data mining, explaining its purpose, steps, and key concepts such as supervised and unsupervised learning. It outlines the process of building and evaluating classifiers, including popular algorithms like decision trees and Naïve Bayes. Additionally, it discusses the importance of classification for predictions, automation, and decision-making in various applications.

Uploaded by

windsorgrey890
CLASSIFICATION

Fundamentals of Data Mining


Outline:
• Introduction to classification
• Basic Steps in Classification
• Important terms
• Types of classification problems
• Popular classification algorithms
• Evaluation of classifier
• Applications of classification
Introduction
• What is Classification?

Classification is a way to sort things into groups based on their features.

It assigns new data to a category with the help of current or past data.

Example:

Grouping patients based on their medical records.

Hospitals use classification to predict diseases based on patient symptoms.
Example
🍎 🍌 🍎 🍌 🍇
Apple Banana Apple Banana Grape

We want to classify them into groups:

Group 1: Apples (🍎)
Group 2: Bananas (🍌)
Group 3: Grapes (🍇)
In Data Mining:
• Classification is the process where a computer learns from past data (examples we already know) and then predicts the group for new data.
Supervised vs Unsupervised Learning
• Supervised:

The computer learns from past data with answers and then makes
predictions for new data.
Example:
Predicting if a person will get a loan or not
• Unsupervised:
The computer doesn’t get answers. It tries to find patterns or group
things by itself.
Example:
Grouping customers by shopping habits
Why Do We Need Classification in Data Mining?
Reason | Explanation
Predictions | We can guess which group new data belongs to (like spam emails or loan approvals).
Automation | Saves human time! Computers can classify thousands of things very quickly.
Decision Making | Helps companies and people make smart decisions based on data.
Finding Patterns | Helps discover hidden patterns in the data that we might not notice easily.
Organization | Helps in sorting and organizing large amounts of data properly.
Basic Steps In Classification
• Data Collection:
Gather the old data you already know about (like old emails, patient records, etc.).

• Data Preparation:
Clean the data (remove mistakes, fill in missing values, etc.).

• Model Building (Training):
Teach the computer using this clean data. The computer looks for patterns and learns.

• Model Testing:
Check if the computer has learned correctly. Give it some new test data and see if it guesses right.

• Model Evaluation:
Measure how good or bad the computer is at guessing.

• Deployment (Use the Model):
Start using the trained model in real life to predict new cases.
Example
Step | What Happens?
1. Collect Data | Gather examples
2. Prepare Data | Clean the examples
3. Train Model | Teach the computer
4. Test Model | Check if it learned
5. Evaluate | See how well it guesses
6. Deploy | Start using it for real!
Simple Example: Classifying Fruits
• Collect: pictures of apples and bananas 🍎🍌.
• Prepare: remove blurry images.
• Train: teach the model what apples and bananas look like.
• Test: try it with new fruit pictures.
• Evaluate: see how many times it guesses right.
• Deploy: use it in a real fruit-sorting machine!
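The six steps can be sketched in a few lines of Python. This is only an illustration under loud assumptions: instead of pictures it uses made-up length and width measurements, and the "model" is a simple nearest-centroid rule (average the measurements per fruit, then pick the closest average). The names `train_model` and `predict` are hypothetical helpers, not part of any library.

```python
import random

# 1. Collect: made-up (length_cm, width_cm, label) measurements, for illustration only.
samples = [
    (7.0, 7.5, "apple"), (7.5, 7.0, "apple"), (6.8, 7.2, "apple"),
    (8.0, 7.8, "apple"), (7.2, 6.9, "apple"),
    (18.0, 3.5, "banana"), (20.0, 4.0, "banana"), (17.5, 3.8, "banana"),
    (19.0, 3.6, "banana"), (21.0, 4.2, "banana"),
    (None, 7.0, "apple"),  # a broken record with a missing value
]

# 2. Prepare: drop records that have missing values.
clean = [s for s in samples if None not in s]

# Shuffle, then keep 7 examples for training and 3 for testing.
random.seed(0)
random.shuffle(clean)
train, test = clean[:7], clean[7:]

# 3. Train: the model is the average (length, width) of each fruit.
def train_model(rows):
    sums = {}
    for length, width, label in rows:
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += length
        acc[1] += width
        acc[2] += 1
    return {label: (l / n, w / n) for label, (l, w, n) in sums.items()}

# Predict the fruit whose average measurements are closest to the new item.
def predict(model, length, width):
    return min(
        model,
        key=lambda label: (model[label][0] - length) ** 2
        + (model[label][1] - width) ** 2,
    )

model = train_model(train)

# 4. Test and 5. Evaluate: count correct guesses on data the model has not seen.
correct = sum(predict(model, l, w) == label for l, w, label in test)
accuracy = correct / len(test)
print("accuracy on test data:", accuracy)

# 6. Deploy: use the trained model on a brand-new fruit.
print(predict(model, 19.5, 3.9))
```

A real fruit sorter would learn from image features rather than two hand-picked numbers, but the order of the steps stays the same.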


What is a Model?
• A model is like a smart recipe or a set of rules that the computer creates after learning from old data.
• It remembers the important patterns.
• It uses those patterns to predict the group/class of new things.
Important Terms
Term | Explanation | Example
Training Data | Data we use to teach the computer (with answers). | 100 emails with labels
Test Data | New data used to check if the computer has learned correctly. | 20 new emails to test predictions
Model | What the computer builds/learns to make predictions. | A spam detector model
Class | A group or category something belongs to. | "Spam" or "Not Spam" in emails
Label | The correct answer for the data. | Label = "Spam" for an email
Accuracy | How many predictions were correct out of the total. | 90 correct out of 100 = 90% accuracy
Decision Tree
•A decision tree is like a flowchart that helps a
computer make decisions based on answers to simple
questions.
• It starts at the top (called the root).
• It asks a yes/no or simple question at each step.
• Based on the answer, it moves to the next question.
• Finally, it reaches a decision (this is called a leaf node).
Real-Life Example
Helping a student decide whether they should study for an exam.

            [Is exam tomorrow?]
              /            \
            Yes             No
            /                 \
   [Are notes ready?]      [Watch TV]
       /        \
     Yes         No
     /             \
 [Revise]    [Prepare Notes]
In Data Mining
• A decision tree helps in classification. For example:
• Classify a person's loan application (Accept or Reject)

          [Income > 50k?]
            /         \
          Yes          No
          /              \
   [Credit Good?]       Reject
      /      \
    Yes       No
    /           \
 Accept       Reject
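Because a decision tree is just a chain of questions, the loan tree above can be written as nested `if` statements. The sketch below is a minimal illustration; the function name `classify_loan` is hypothetical, and it assumes "50k" means an income of 50,000 in whatever currency the data uses.

```python
def classify_loan(income, credit_good):
    """Follow the loan decision tree from the root down to a leaf."""
    # Root question: is the income above 50k? (assumed to mean 50,000)
    if income > 50_000:
        # Second question: does the person have good credit?
        if credit_good:
            return "Accept"
        return "Reject"
    # Income is 50k or less: the tree rejects without asking anything else.
    return "Reject"


print(classify_loan(60_000, True))
print(classify_loan(60_000, False))
print(classify_loan(40_000, True))
```

Each `if` corresponds to one question node, and each `return` corresponds to one leaf node.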
Using ID3
• ID3 stands for Iterative Dichotomiser 3.
• ID3 is a popular algorithm used to create decision trees.
• It chooses the best question (attribute) at each step by checking which question gives the most information. It uses two measures:
   • Entropy
   • Information Gain
Entropy
• Entropy tells us how mixed or impure a group of data is.
• If a group has all the same values (like all “Yes”) → Entropy = 0 (pure)
• If it's a perfect mix of two classes (like half “Yes” and half “No”) → Entropy = 1 (very impure)
• Entropy Formula:
$$\mathrm{Entropy}(S) = -p_1 \log_2(p_1) - p_2 \log_2(p_2) - \dots$$
• $p_1, p_2, \dots$ are the proportions of each class (like Yes, No)
• "log base 2" means you're measuring in bits
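The entropy formula can be checked with a short Python function. This is a minimal sketch; the helper name `entropy` and the example label lists are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits: the sum of -p * log2(p) over the classes present."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


pure = ["Yes", "Yes", "Yes", "Yes"]      # all the same class
mixed = ["Yes", "Yes", "No", "No"]       # perfect half/half mix
mostly_yes = ["Yes", "Yes", "Yes", "No"] # somewhere in between

print(entropy(pure))
print(entropy(mixed))
print(round(entropy(mostly_yes), 3))
```

The pure group gives 0 and the half/half group gives 1, matching the two bullet points above; a 3-to-1 group lands in between, at about 0.811 bits.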
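Information Gain
• Information Gain (IG) tells us how much a question (attribute) reduces the entropy of the data. We use it to decide which question to ask first.
Formula of Information Gain:
$$\mathrm{Information\ Gain} = \mathrm{Entropy}(\mathrm{Parent}) - \sum_{\text{branches}} \frac{\text{rows in branch}}{\text{total rows}} \times \mathrm{Entropy}(\mathrm{Branch})$$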
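As a rough illustration of the information gain formula, the sketch below compares a question that separates the classes perfectly with one that does not help at all. The helper names and the tiny label lists are assumptions made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(
        -(c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def information_gain(parent, branches):
    """Entropy(parent) minus the size-weighted entropy of the branches."""
    total = len(parent)
    weighted = sum(len(b) / total * entropy(b) for b in branches)
    return entropy(parent) - weighted


parent = ["Yes", "Yes", "No", "No"]

# A question that sends every "Yes" one way and every "No" the other way.
perfect_split = [["Yes", "Yes"], ["No", "No"]]

# A question that leaves both branches as mixed as the parent.
useless_split = [["Yes", "No"], ["Yes", "No"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

The perfect split gains the full 1 bit, while the useless split gains 0, so ID3 would ask the first question.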
Data Set
Weather Temperature Humidity Windy Play?
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Step By Step
• Calculate the entropy of the "Play?" column
This tells us how "mixed" the data is (e.g., how many Yes vs. No).
• Choose the best attribute
Check each column (Weather, Temperature, etc.) and find the one with the highest information gain (the best question to split the data).
• Split the dataset
Use the chosen attribute to divide the data into branches.
• Repeat
Do the same steps on each branch, until:
   • All data in the branch is the same class (pure), or
   • There are no more questions left
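These steps can be run on the eight-row weather data set. The code below is a simplified sketch of ID3, not a reference implementation: it breaks ties by taking the first attribute in the list, and it falls back to the most common class when no questions are left. Both choices are assumptions of this example.

```python
import math
from collections import Counter

COLUMNS = ["Weather", "Temperature", "Humidity", "Windy", "Play"]
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    return sum(
        -(c / len(rows)) * math.log2(c / len(rows)) for c in counts.values()
    )


def information_gain(rows, attribute, target):
    # Group the rows by the value they have for this attribute.
    branches = {}
    for r in rows:
        branches.setdefault(r[attribute], []).append(r)
    weighted = sum(
        len(b) / len(rows) * entropy(b, target) for b in branches.values()
    )
    return entropy(rows, target) - weighted


def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:      # stop: the branch is pure
        return labels[0]
    if not attributes:             # stop: no more questions left
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    # Split on that attribute and repeat on each branch.
    tree = {best: {}}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


gains = {a: information_gain(DATA, a, "Play") for a in COLUMNS[:-1]}
tree = id3(DATA, COLUMNS[:-1], "Play")
print(gains)
print(tree)
```

On this data the "Play?" column has 4 Yes and 4 No, so its entropy is 1 bit. Weather has the highest information gain (about 0.656), so it becomes the root; the Sunny and Overcast branches are already pure, and the Rainy branch is split once more on Windy.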
Naïve Bayes
Person Covid (Yes/No) Flu (Yes/No) Fever (Yes/No)
1 Yes No Yes
2 No Yes Yes
3 Yes Yes Yes
4 No No No
5 Yes No Yes
6 No No Yes
7 Yes No Yes
8 Yes No No
9 No Yes Yes
10 No Yes No
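The slide does not say which column is being predicted, so the sketch below assumes Fever is the class and that Covid and Flu are the features. Under that assumption, Naïve Bayes multiplies the class prior by the per-feature conditional probabilities (treating the features as independent given the class) and then normalizes. No smoothing is applied, and `naive_bayes_posterior` is a hypothetical helper name.

```python
from collections import Counter

# (covid, flu, fever) for persons 1 to 10, copied from the table above.
ROWS = [
    ("Yes", "No", "Yes"),
    ("No", "Yes", "Yes"),
    ("Yes", "Yes", "Yes"),
    ("No", "No", "No"),
    ("Yes", "No", "Yes"),
    ("No", "No", "Yes"),
    ("Yes", "No", "Yes"),
    ("Yes", "No", "No"),
    ("No", "Yes", "Yes"),
    ("No", "Yes", "No"),
]


def naive_bayes_posterior(rows, covid, flu):
    """Return P(Fever = label | Covid, Flu) for each label (assumed class: Fever)."""
    class_counts = Counter(fever for _, _, fever in rows)
    scores = {}
    for label, n in class_counts.items():
        prior = n / len(rows)
        # Conditional probabilities estimated by counting within the class.
        p_covid = sum(1 for c, _, f in rows if f == label and c == covid) / n
        p_flu = sum(1 for _, fl, f in rows if f == label and fl == flu) / n
        scores[label] = prior * p_covid * p_flu
    total = sum(scores.values())
    # Normalize so the probabilities add up to 1.
    return {label: s / total for label, s in scores.items()}


posterior = naive_bayes_posterior(ROWS, covid="Yes", flu="Yes")
print(posterior)
```

For a person with both Covid and Flu, the unnormalized scores are 0.7 × 4/7 × 3/7 for Fever = Yes and 0.3 × 1/3 × 1/3 for Fever = No, which gives a probability of about 0.84 for Fever = Yes.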
Classifier Evaluation
• When we build a classifier (like Decision Tree, Naïve Bayes, etc.), we must check how good it is. This process is called evaluation.
Accuracy:
How many predictions were correct out of the total?
Precision:
Of the predicted positives, how many were actually right?
Recall:
Of all actual positives, how many did we predict right?
Confusion Matrix
Example
Formulas
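In terms of confusion-matrix counts, the standard definitions are: accuracy = (TP + TN) / total, precision = TP / (TP + FP), and recall = TP / (TP + FN), where TP, FP, FN and TN are the true positives, false positives, false negatives and true negatives. The sketch below computes them for a made-up spam example; the labels and the helper name `confusion_counts` are assumptions for illustration only.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, really positive
        elif p == positive:
            fp += 1   # predicted positive, really negative
        elif a == positive:
            fn += 1   # predicted negative, really positive
        else:
            tn += 1   # predicted negative, really negative
    return tp, fp, fn, tn


# Made-up labels for eight emails.
actual = ["Spam", "Spam", "Spam", "Not Spam", "Not Spam", "Not Spam", "Not Spam", "Spam"]
predicted = ["Spam", "Spam", "Not Spam", "Spam", "Not Spam", "Spam", "Not Spam", "Spam"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="Spam")

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

Here the classifier finds 3 of the 4 real spam emails (recall 0.75) but 2 of its 5 spam predictions are wrong (precision 0.6), which shows why accuracy alone (0.625) does not tell the whole story.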
