Data Classification and Prediction : Lecture-11
Dr.J. Dhar 1
Classification Algorithms
Classification by Decision Tree Induction:
ID3
C4.5
CART
• Bayesian Classifier
• k-Nearest-Neighbor Classifier
• Classification and Prediction Accuracy Measures
A Defect of Information Gain
• It favors attributes with many values.
• Such an attribute splits the N training tuples into many subsets, and if these are small, they will tend to be pure anyway.
• One way to rectify this is through a corrected measure, the information gain ratio.
• C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which attempts to overcome this bias.
• It applies a kind of normalization to information gain using a “split information” value defined analogously with Info(D).
C4.5 Algorithm: Information Gain Ratio
• SplitInfo_A(D) is the amount of information needed to determine the value of attribute A for a tuple in D:

SplitInfo_A(D) = − Σ_{j=1}^{v} (|D_j| / |D|) · log2(|D_j| / |D|)

• The gain ratio is then GainRatio(A) = Gain(A) / SplitInfo_A(D), and the attribute with the maximum gain ratio is selected as the splitting attribute.
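As a minimal sketch, the gain-ratio computation can be written directly from these definitions. The helper names and the count-list interface are illustrative, not part of the slides:

```python
from math import log2

def info(counts):
    """Expected information (entropy) of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain_ratio(class_counts, partitions):
    """class_counts: class distribution of D.
    partitions: one class distribution per value of attribute A."""
    n = sum(class_counts)
    # Gain(A) = Info(D) - Info_A(D)
    gain = info(class_counts) - sum(sum(p) / n * info(p) for p in partitions)
    # SplitInfo_A(D): entropy of the partition sizes themselves
    split_info = info([sum(p) for p in partitions])
    return gain / split_info if split_info > 0 else 0.0
```

For instance, a 9/5 class split partitioned into subsets with class distributions (2, 3), (4, 0), and (3, 2) gives a gain of about 0.247 but a gain ratio of only about 0.156, since the three-way split itself carries information.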
Information Gain Ratio

[Worked example (figure): the data set is split on Color (red, green, yellow) and the information gain ratio of the split is computed.]
Comparison:
Information Gain and Information Gain Ratio
CART Algorithm: Gini Index
• Another sensible measure of impurity (i and j are classes):

Gini(D) = Σ_{i≠j} p_i · p_j = 1 − Σ_i p_i²

where p_i is the probability that a tuple in D belongs to class i.
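A minimal sketch of the Gini computation under the definition above (count lists stand in for class distributions; the GiniGain helper uses the same subset weighting as information gain):

```python
def gini(counts):
    """Gini(D) = 1 - sum_i p_i^2, equivalently sum over i != j of p_i * p_j."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(class_counts, partitions):
    """Reduction in impurity from splitting D into the given partitions."""
    n = sum(class_counts)
    return gini(class_counts) - sum(sum(p) / n * gini(p) for p in partitions)
```

A pure node has Gini index 0, while a 50/50 two-class node has the maximum two-class value of 0.5.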
Gini Index
Gini Index for Color

[Worked example (figure): the split on Color (red, green, yellow) with the Gini index of each resulting subset.]
GiniGain of Color

GiniGain(A) = Gini(D) − Σ_{j=1}^{v} (|D_j| / |D|) · Gini(D_j)
Comparison of Three Impurity Measures
A Gain(A) GainRatio(A) GiniGain(A)
Color 0.247 0.156 0.058
Outline 0.152 0.152 0.046
Dot 0.048 0.049 0.015
Tree Pruning
When a decision tree is built, many of the branches will reflect
anomalies in the training data due to noise or outliers.
Pruned trees tend to be smaller and less complex and, thus,
easier to comprehend.
They are usually faster and better at correctly classifying
independent test data than unpruned trees.
In the prepruning approach, a tree is “pruned” by halting its
construction early; in the postpruning approach, subtrees are
removed from a fully grown tree.
Decision Tree Pruning
Bayesian Classification
• Bayesian classification is based on Bayes’ theorem.
• Studies comparing classification algorithms have found a
simple Bayesian classifier known as the naïve Bayesian
classifier to be comparable in performance with decision tree
and selected neural network classifiers.
• Bayesian classifiers have also exhibited high accuracy
and speed when applied to large databases.
Naïve Bayesian Classification
1. Let D be a training set of tuples and their associated class labels.
As usual, each tuple is represented by an n-dimensional attribute
vector, X = (x_1, x_2, …, x_n), depicting n measurements made on
the tuple from n attributes, A_1, A_2, …, A_n, respectively.
2. Suppose that there are m classes, C_1, C_2, …, C_m. Given a tuple
X, the classifier will predict that X belongs to the class having the
highest posterior probability, conditioned on X. That is, the naïve
Bayesian classifier predicts that tuple X belongs to the class C_i if
and only if

P(C_i|X) > P(C_j|X)  for 1 ≤ j ≤ m, j ≠ i.
3. Thus we maximize P(C_i|X). The class C_i for which P(C_i|X) is
maximized is called the maximum posteriori hypothesis. By
Bayes’ theorem:

P(C_i|X) = P(X|C_i) · P(C_i) / P(X)

Since P(X) is constant for all classes, only P(X|C_i)·P(C_i) needs to be maximized.
• In order to reduce computation in evaluating P(X|C_i), the
naive assumption of class-conditional independence is made.
We can easily estimate the probabilities P(x_1|C_i), P(x_2|C_i), …,
P(x_n|C_i) from the training tuples. Thus

P(X|C_i) = Π_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) · P(x_2|C_i) · … · P(x_n|C_i)
Example: Bank Loan
The data tuples are described by the attributes Age, Income, Occupation, and C_R. The class label attribute, Bank loan, has two distinct values (namely, Safe and Risky). Let C_1 correspond to the class Bank loan = Safe and C_2 correspond to Bank loan = Risky.

Sl No  Age     Income   Occupation  C_R   Loan
1      Young   Low      Service     Good  Safe
2      Young   Average  Entp        Fair  Risky
3      Middle  High     Farmer      Fair  Safe
4      Senior  Average  Entp        Good  Risky
5      Middle  High     Service     Fair  Safe
6      Senior  High     Entp        Good  Safe
7      Middle  High     Farmer      Good  Safe
8      Senior  Average  Entp        Fair  Risky
9      Middle  Average  Entp        Good  Risky
10     Young   Average  Service     Fair  Safe

The tuple we wish to classify is
X = (Age = Middle, Income = Average, Occupation = Entp, C_R = Fair)
Solution
We need to maximize P(X|C_i) · P(C_i) for i = 1, 2.
Here P(C_1) = 6/10 and P(C_2) = 4/10.
X = (Age = Middle, Income = Average, Occupation = Entp, C_R = Fair)
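The full computation for this example can be sketched as follows. The helper `naive_bayes_score` is a hypothetical name, but it estimates each P(x_k|C_i) by relative frequency exactly as described above:

```python
from collections import Counter

# The ten training tuples from the bank-loan table:
# (Age, Income, Occupation, C_R) -> Loan
data = [
    (("Young", "Low", "Service", "Good"), "Safe"),
    (("Young", "Average", "Entp", "Fair"), "Risky"),
    (("Middle", "High", "Farmer", "Fair"), "Safe"),
    (("Senior", "Average", "Entp", "Good"), "Risky"),
    (("Middle", "High", "Service", "Fair"), "Safe"),
    (("Senior", "High", "Entp", "Good"), "Safe"),
    (("Middle", "High", "Farmer", "Good"), "Safe"),
    (("Senior", "Average", "Entp", "Fair"), "Risky"),
    (("Middle", "Average", "Entp", "Good"), "Risky"),
    (("Young", "Average", "Service", "Fair"), "Safe"),
]

def naive_bayes_score(x, data):
    """Return P(X|C_i)*P(C_i) for each class, by relative frequencies."""
    classes = Counter(label for _, label in data)
    n = len(data)
    scores = {}
    for c, nc in classes.items():
        score = nc / n  # prior P(C_i)
        for j, value in enumerate(x):
            match = sum(1 for feats, label in data
                        if label == c and feats[j] == value)
            score *= match / nc  # P(x_j | C_i)
        scores[c] = score
    return scores

x = ("Middle", "Average", "Entp", "Fair")
scores = naive_bayes_score(x, data)
# Risky: 0.4 * (1/4) * (4/4) * (4/4) * (2/4) = 0.05
# Safe:  0.6 * (3/6) * (1/6) * (1/6) * (3/6) ≈ 0.0042
# so the classifier predicts Bank loan = Risky for X
```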
Example
Example
• Learning Phase
Outlook Play=Yes Play=No Temperature Play=Yes Play=No
Sunny 2/9 3/5 Hot 2/9 2/5
Overcast 4/9 0/5 Mild 4/9 2/5
Rain 3/9 2/5 Cool 3/9 1/5
Example
• Test Phase
– Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase
P(Outlook=Sunny|Play=Yes) = 2/9 P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9 P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9 P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9 P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14 P(Play=No) = 5/14
Naïve Bayes
• Example: Continuous-valued Features
– Temperature is naturally of continuous value.
Yes: 22.8, 25.2, 20.1, 19.3, 18.5, 21.7, 24.3, 23.1, 19.8
No: 15.1, 17.4, 27.3, 30.1, 29.5
– Estimate the mean and standard deviation for each class:

μ = (1/N) Σ_{n=1}^{N} x_n ,   σ² = (1/(N−1)) Σ_{n=1}^{N} (x_n − μ)²

μ_Yes = 21.64 , σ_Yes = 2.35 ;   μ_No = 23.88 , σ_No = 7.09

– Learning Phase: output two Gaussian models for P(temp|C):

P̂(x|Yes) = 1/(2.35·√(2π)) · exp( −(x − 21.64)² / (2 · 2.35²) )
P̂(x|No) = 1/(7.09·√(2π)) · exp( −(x − 23.88)² / (2 · 7.09²) )
Naïve Bayes: Zero conditional probability
• If no example contains the feature value
– In this circumstance, we face a zero conditional probability problem
during test
P̂(x_1|c_i) ⋯ P̂(a_jk|c_i) ⋯ P̂(x_n|c_i) = 0 when x_j = a_jk and P̂(a_jk|c_i) = 0
– For a remedy, class-conditional probabilities are re-estimated with

P̂(a_jk|c_i) = (n_c + m·p) / (n + m)   (m-estimate)

n_c : number of training examples for which x_j = a_jk and c = c_i
n : number of training examples for which c = c_i
p : prior estimate (usually, p = 1/t for t possible values of x_j)
m : weight given to the prior (number of “virtual” examples, m ≥ 1)
Zero conditional probability
• Example: P(outlook=overcast|no)=0 in the play-tennis dataset
– Adding m “virtual” examples (m: up to 1% of #training examples)
• In this dataset, # of training examples for the “no” class is 5.
• Non-probabilistic Classification Algorithm
k-Nearest-Neighbor Classifiers
• Nearest-neighbor classifiers are based on learning by analogy, that
is, by comparing a given test tuple with training tuples that are similar
to it. The training tuples are described by n attributes. Each tuple
represents a point in an n-dimensional space. In this way, all of the
training tuples are stored in an n-dimensional pattern space.
• When given an unknown tuple, a k-nearest-neighbor classifier
searches the pattern space for the k training tuples that are closest to
the unknown tuple. These k training tuples are the k “nearest
neighbors” of the unknown tuple.
k-NN Classifiers
• “Closeness” is defined in terms of a distance metric, such
as Euclidean distance. The Euclidean distance between
two points or tuples, say, X1 = (x11, x12, …, x1n) and
X2 = (x21, x22, …, x2n), is

dist(X1, X2) = √( Σ_{i=1}^{n} (x1i − x2i)² )

• Min-max normalization is performed to bring all attributes
into the same range, so that attributes with initially large
ranges do not outweigh attributes with smaller ranges.
• “But how can distance be computed for attributes that are
not numeric, but categorical, such as color?” A simple
method compares the corresponding values: if they are
identical, the difference is 0; otherwise, it is 1.
• “What about missing values?” In general, if the value
of a given attribute A is missing in tuple X1 and/or in
tuple X2, we assume the maximum possible difference.
• “How can I determine a good value for k, the number
of neighbors?” This can be determined experimentally.
Starting with k = 1, we use a test set to estimate the
error rate of the classifier. This process can be repeated
each time by incrementing k to allow for one more
neighbor.
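The whole procedure for numeric attributes (min-max normalization, Euclidean distance, majority vote over the k nearest tuples) can be sketched as follows; the helper names are illustrative:

```python
from collections import Counter
from math import sqrt

def minmax_normalize(data):
    """Scale each attribute to [0, 1] so no attribute dominates the distance."""
    cols = list(zip(*data))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, mins, maxs))
        for row in data
    ]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, labels, query, k):
    """Majority vote among the k training tuples closest to the query."""
    ranked = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

As noted above, k is best chosen experimentally: evaluate the error rate on a test set for k = 1, 2, … and keep the value that performs best.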
Comparing Classification and Prediction Methods
• Accuracy: The accuracy of a classifier refers to the ability to
correctly predict the class label of previously unseen data.
• Scalability: This refers to the ability to construct the classifier
or predictor efficiently given large amounts of data.
• Speed: This refers to the time required in generating and
using the given classifier or predictor.
• Robustness: This is the ability of the classifier to make correct
predictions given noisy data or data with missing values.
Classifier Accuracy Measures
Evaluating the Accuracy of a Classifier or Predictor
• Holdout Method and Random Subsampling
• k-fold Cross-Validation
• Bootstrap: The bootstrap method samples the given training
tuples uniformly with replacement. That is, each time a tuple is
selected, it is equally likely to be selected again and re-added
to the training set.
• Ensemble Methods (increasing the accuracy): Bagging, Boosting
Few more algorithms
• We will discuss other machine-learning (i.e., soft-computing)
based classification algorithms after the introduction of soft
computing (in Lectures 15 and 16).
Thank You