DMW Module 3

This document discusses classification and prediction models. It begins by introducing classification and prediction, explaining that they are forms of data analysis used to extract patterns from data and predict future trends. Classification predicts categorical labels while prediction forecasts continuous values. The document then covers issues that can impact classification and prediction accuracy, such as noisy or missing data, irrelevant attributes, and data transformation and reduction techniques.


Data Mining & Warehousing

MODULE 3



Outline:

1 Classification Models
● Introduction to Classification and Prediction
● Issues regarding classification and prediction
● Decision Tree - ID3
● Tree Pruning
● Decision Tree - C4.5
● Naive Bayes Classifier

Classification and Prediction I


Classification and prediction are two forms of data analysis
● They can be used to extract models describing important data classes or
to predict future data trends
● Classification and prediction have numerous applications, including
fraud detection, target marketing, performance prediction,
manufacturing, and medical diagnosis.

Classification and Prediction II

Classification
● Classification is the process of predicting the class of given data points.
Classes are sometimes called targets, labels, or categories.
● Examples:
● A bank loans officer needs analysis of her data in order to learn which
loan applicants are “safe” and which are “risky”
● A marketing manager at AllElectronics needs data analysis to help guess
whether a customer with a given profile will buy a new computer: ”Yes”
or ”No”
● A medical researcher wants to analyze breast cancer data in order to
predict which one of three specific treatments a patient should receive:
“treatment A”, “treatment B”, or “treatment C”
● In each of these examples, the data analysis task is classification, where
a model or classifier is constructed to predict categorical labels

Classification and Prediction III

Classification (contd.)
● The classification process can also be viewed as the learning of a mapping
or function, y = f(X), that can predict the associated class label y of a
given tuple X.
● Data classification is a two-step process:
● Step 1: Building the classifier or model
● Step 2: Using the classifier for classification

Classification and Prediction IV

Classification (contd.)
1 Building the Classifier or Model
● A classifier is built describing a predetermined set of data classes or
concepts.
● This step is the learning step or the training phase.
● Builds the classifier by analyzing or “learning from” a training set made
up of database tuples and their associated class labels.
Training tuples are used for training.
● A tuple, X, is represented by an n-dimensional attribute vector,
X = (x_1, x_2, ..., x_n)
● X depicts n measurements made on the tuple from n database
attributes, respectively, A_1, A_2, ..., A_n
● X is assumed to belong to a predefined class as determined by another
database attribute called the class label attribute, Y.
● In the context of classification, data tuples can be referred to as
samples, examples, instances, data points, or objects

Classification and Prediction V

Classification (contd.)
1 Building the Classifier or Model
● The learning process of the classifier is supervised: the class label of each
training tuple is provided or known.
● The training data are analyzed by a classification algorithm, and the
learned model or classifier is represented in the form of
classification rules.
2 Using the Classifier for Classification
Test data are used to estimate the accuracy of the classification rules.
● If the accuracy is considered acceptable, the rules can be applied to the
classification of new data tuples.
● A test set is used, made up of test tuples and their associated class
labels.
● These tuples are randomly selected from the general data set.
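To make the two-step process concrete, here is a minimal scikit-learn sketch (not from the slides); the iris dataset, the 1/3 test-set size, and the decision-tree learner are stand-in assumptions for illustration only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                  # stand-in dataset (assumption)

# Step 1: build the classifier from training tuples with known class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: estimate accuracy on an independent test set, then classify new tuples
print("test accuracy:", model.score(X_test, y_test))
print("predicted labels:", model.predict(X_test[:5]))
```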

Classification and Prediction VI

Figure: Training

Classification and Prediction VII

Figure: Classification

Classification and Prediction VIII

Prediction
● Predicts a continuous-valued function, or ordered value, as opposed
to a categorical label
● The goal of prediction is to forecast or deduce the value of an
attribute based on values of other attributes.
Focus on numeric prediction
● Regression analysis is a statistical methodology that is most often
used for numeric prediction, hence the two terms are often used
synonymously
● Examples
● Marketing manager would like to predict how much money a given
customer will spend during a sale at AllElectronics
Predicting age of a person
● Predicting whether stock price of a company will increase tomorrow

Classification and Prediction IX

Prediction (contd.)
● Prediction can also be viewed as a mapping or function, y = f(X),
where X is the input tuple and the output y is a continuous or
ordered value
● Data prediction is a two-step process, similar to that of data
classification
● Training: the training set is used to train the predictor model.
● Prediction: the model predicts a numerical, continuous (ordered) value.

Issues Regarding Classification and Prediction I
● Describes issues regarding preprocessing the data for classification
and prediction.
● Data cleaning: for noisy and missing values
● Preprocessing of data in order to remove or reduce noise (smoothing)
and to treat missing values (fill with probable values)
● Although most classification algorithms have some mechanisms for handling noisy
or missing data, this step can help reduce confusion during learning.
● Relevance analysis: detecting redundant and irrelevant attributes
which do not contribute to the classification or prediction task
Many of the attributes in the data may be redundant.
● Correlation analysis can be used to identify whether any two given
attributes are statistically related.
● Ex: A strong correlation between A1 and A2 would suggest that one of
the two could be removed from further analysis.
● A database may also contain irrelevant attributes

Issues Regarding Classification and Prediction II
● Attribute subset selection can be used in these cases to find a reduced set
of attributes
● Data transformation and reduction: prevent attributes with initially
large ranges (like, say, income) from outweighing attributes with smaller
ranges, and reduce the data size
● Normalization involves scaling all values for a given attribute so that
they fall within a small specified range, such as −1.0 to 1.0, or 0.0 to
1.0
● Generalizing the data to higher-level concepts: use concept hierarchies, useful
for continuous-valued attributes.
● Ex: income can be generalized to discrete ranges, such as low,
medium, and high.
● Generalization compresses the original training data, so fewer input/
output operations may be involved during learning
● Data reduction: wavelet transformation and principal components
analysis, discretization techniques such as binning, histogram analysis,
and clustering
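As a small illustration of the normalization step described above (a sketch, not from the slides; the income values are hypothetical):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale all values of an attribute into [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:                          # constant attribute: map to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in values]

incomes = [20000, 35000, 50000, 120000]             # hypothetical income values
print(min_max_normalize(incomes))                   # every value now lies in [0.0, 1.0]
```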

Issues Regarding Classification and Prediction III

Comparing Classification and Prediction Methods
● Accuracy:
● Classifier: ability to correctly predict the class label of new or previously
unseen data
● Predictor: refers to how well a given predictor can guess the value of
the predicted attribute for new or previously unseen data
● Speed: computational costs involved in generating and using the
given classifier or predictor
● Robustness: ability of the classifier or predictor to make correct
predictions given noisy data or data with missing values

● Scalability: ability to construct the classifier or predictor efficiently
given large amounts of data.
● Interpretability: level of understanding and insight that is provided by the
classifier or predictor.
● Interpretability is subjective and therefore more difficult to assess.

Issues Regarding Classification and Prediction V

Classification Methods
● Decision Tree Induction - ID3, C4.5 - M3
● Naive Bayes Classifier - M3
● Rule-based classification - 1R - M4
● Classification using Neural Networks - Backpropagation - M4
● Support Vector Machine (SVM) Classifier - M4
● Lazy Learners - K-Nearest Neighbor Classifier - M4

Decision Tree Induction I

Example
Consider the following situation: Somebody is hunting for a job.
● At the very beginning, he decides that he will consider only those jobs
for which the monthly salary is at least Rs.50,000.
● Our job hunter does not like spending much time traveling to place of
work. He is comfortable only if the commuting time is less than one
hour.
● Also, he expects the company to arrange for a free coffee every
morning!
● The decisions to be made before deciding to accept or reject a job
offer can be schematically represented as in Figure below.
● This figure represents a decision tree
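The figure itself is not reproduced here, but the same decision tree can be sketched as nested tests; the function name is invented and the thresholds follow the bullets above:

```python
def accept_job(monthly_salary, commute_hours, free_coffee):
    """Nested tests mirroring the decision tree described above; leaves are accept/reject."""
    if monthly_salary < 50000:          # root node: salary test
        return "reject"
    if commute_hours >= 1:              # internal node: commuting time
        return "reject"
    if not free_coffee:                 # internal node: free coffee every morning
        return "reject"
    return "accept"

print(accept_job(60000, 0.5, True))     # -> accept
```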

Decision Tree Induction II

(Figure: decision tree for the job-offer example, not reproduced)

Decision Tree Induction III


● The term “tree” refers to the concept of a tree in graph theory in
mathematics
● Since a decision tree is a graph-theoretical tree, all terminology related to
graph-theoretical trees can be applied to describe decision trees also.
Nodes or vertices shown as ellipses are called the leaf nodes.
● All other nodes, except the root node, are called the internal nodes

Decision Tree Induction IV

Two types of decision trees
● Classification trees: Tree models where the target variable can take a
discrete set of values are called classification trees. In these tree
structures, leaves represent class labels and branches represent
conjunctions of features that lead to those class labels.
● Regression trees: Decision trees where the target variable can take
continuous values (real numbers), like the price of a house or a
patient’s length of stay in a hospital, are called regression trees.

Decision Tree Induction V

Classification trees: an Example
● Consider the data given in the table below, which specifies the features of
certain vertebrates and the class to which they belong.
● For each species, four features have been identified: “gives birth”,
”aquatic animal”, “aerial animal” and “has legs”.
● There are five class labels, namely, “amphibian”, “bird”, “fish”,
“mammal” and “reptile”.
● The problem is how to use this data to identify the class of a newly
discovered vertebrate.

Decision Tree Induction VI

(Table of vertebrate data, not reproduced)

Decision Tree Induction VII

Construction of the tree: Step 1


● Split the dataset into disjoint subsets according to the values of the
feature “gives birth”
● Since there are only two possible values for this feature, we have only two
subsets
● Subset 1: Consists of those examples for which the value of “gives
birth” is “yes”
● Subset 2: consists of those for which the value is “no”
● The tree in the figure represents the classification tree after Step 1

Decision Tree Induction VIII-X

(Figure slides, not reproduced)

Decision Tree Induction XI
Construction of the tree: Step 2
● We split these examples based on the values of the feature “aquatic
animal”
● There are three possible values for this feature
● Only two appear in Table 8.2; accordingly, we need to consider only two
subsets: Tables 8.4 and 8.5
● Table 8.4: contains only one example and hence no further splitting is
required. It leads to the assignment of the class label “fish”.
● Table 8.5: needs to be split into subsets based on the values of “aerial
animal”. Subset 1: value “yes”; Subset 2: value “no”
● It can be seen that these subsets immediately lead to unambiguous
assignment of class labels: The value of “no” leads to “mammal” and
the value “yes” leads to ”bird”.
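A partial sketch of the subtree built in Steps 1-2 for the “gives birth = yes” branch; the assumption that the single-example subset (Table 8.4) corresponds to “aquatic animal = yes” is mine, and the “gives birth = no” branch of Step 3 is omitted:

```python
def classify_gives_birth_yes(aquatic_animal, aerial_animal):
    """Subtree for the 'gives birth = yes' branch as described in Steps 1-2."""
    if aquatic_animal == "yes":         # assumed to be the single-example subset (Table 8.4)
        return "fish"
    if aerial_animal == "yes":          # Table 8.5 split on 'aerial animal'
        return "bird"
    return "mammal"
```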

Decision Tree Induction XII-XIII

(Figure slides, not reproduced)

Decision Tree Induction XIV

Construction of the tree: Step 3
● Table 8.3 is split based on the values of the feature “aquatic animal”
● Split into three subsets: Table 8.6 - “yes”, Table 8.7 - “no”, Table
8.8 - “semi”
● Now split the resulting subsets based on the values of “has legs”, etc.
● Finally we get the classification tree

Decision Tree Induction XV-XVII

(Figure slides: the completed classification tree and its rule format, not reproduced)

Decision Tree Induction XVIII-XIX

Some Remarks on Classification Trees
● Elements of a classification tree
● Nodes represent features in the dataset, branches are
identified by the values of features, and the leaf nodes are identified by the
class labels.
Order in which the features are selected
● The tree depends on the order in which the features are selected; there is
no theoretical justification for this choice.
Stopping criteria
● When there are a large number of features, each having several
possible values, the result is a complex classification tree
● Criterion 1: All (or nearly all) of the examples at the node have the
same class.
● Criterion 2: There are no remaining features to distinguish among the
examples.
● Criterion 3: The tree has grown to a predefined size limit.

Decision Tree Induction XX

Feature selection measures
● Which attribute is to be placed at the root or as internal nodes? If
a dataset consists of n attributes, this is a complicated problem
● Random selection may give us bad results with low accuracy
● Solution : Use feature selection measures - assign numerical values to
the various features such that the values reflect the relative importance
of the various features
● Two of the popular feature selection measures are information gain
and Gini index.

Decision Tree Induction XXI

Entropy
● The degree to which a subset of examples contains only a single class is
known as purity
● Any subset composed of only a single class is called a pure class
● Entropy is a measure of “impurity” in a dataset.
● If there are only two possible classes, entropy values can range from 0 to 1
● For n classes, entropy ranges from 0 to log2(n)
● Minimum entropy value: the sample is completely homogeneous
● Maximum value: the data are as diverse as possible

Decision Tree Induction XXII

Entropy
● Consider a segment S of a dataset having c class labels
● Let p_i be the proportion of examples in S having the ith class label.
● The entropy of S is defined as:

Entropy(S) = − Σ (i = 1, ..., c) p_i log2(p_i)

● Special case: If S has only two class labels, say, “yes” and “no”, and p
is the proportion of label “yes”, then the proportion of label “no” will be
1 − p. Here the entropy of S is given by:

Entropy(S) = − p log2(p) − (1 − p) log2(1 − p)
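A minimal sketch of this entropy computation in Python (the label values shown are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

print(entropy(["yes", "no", "yes", "no"]))   # two equally likely classes -> 1.0
print(entropy(["yes", "yes", "yes"]))        # pure subset -> 0.0
```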

Decision Tree Induction XXIII-XXIX

(Worked examples, not reproduced: entropy of Table 8.1, Table 8.2, and Table 8.3)

Decision Tree Induction XXX


Information gain
● Let S be a set of examples and A be a feature (or attribute)
● Let S_v be the subset of S with A = v
● Let Values(A) be the set of all possible values of A
● Then the information gain of an attribute A relative to the set S,
denoted by Gain(S, A), is:

Gain(S, A) = Entropy(S) − Σ (v ∈ Values(A)) (|S_v| / |S|) Entropy(S_v)

● where |S| denotes the number of elements in S.
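A matching sketch of Gain(S, A), reusing the entropy() helper above and assuming each example is a dict keyed by feature name (an illustrative representation, not the slides' notation):

```python
def information_gain(examples, feature, label_key="class"):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v|/|S|) * Entropy(S_v)."""
    total = len(examples)
    base = entropy([ex[label_key] for ex in examples])
    weighted = 0.0
    for value in {ex[feature] for ex in examples}:          # Values(A)
        subset = [ex[label_key] for ex in examples if ex[feature] == value]
        weighted += (len(subset) / total) * entropy(subset)
    return base - weighted
```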

Decision Tree Induction XXXI

Consider the data in Table 8.1


● |S| = 10, Entropy(S) = 2.2464

Decision Tree Induction XXXII-XXXV

(Worked gain computations, not reproduced)

Decision Tree Induction XXXVI

● Task: Compute Gain(S, aerial animal) and Gain(S, has legs)

Decision Tree Induction XXXVII

Problem: Use the ID3 algorithm to construct a decision tree for the data in
the table below (table not reproduced)

Decision Tree Induction XXXVIII

Step 1
● Create a root node for the tree
● Total number of features = 4
● Decide which feature is to be placed at the root node.
● Calculate the information gains corresponding to each of the four
features.
● Calculation of Entropy(S)
● Calculation of Gain(S, outlook), Gain(S, temperature), Gain(S,
humidity) and Gain(S, wind)
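A sketch of how Step 1 looks with the helpers above; the rows below are invented placeholders, since the slide's actual table is not reproduced here:

```python
# Invented placeholder rows in the dict format used by information_gain();
# the slide's actual table is not reproduced here.
weather = [
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "wind": "weak",   "class": "no"},
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "wind": "strong", "class": "no"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "wind": "weak",   "class": "yes"},
    {"outlook": "rain",     "temperature": "mild", "humidity": "normal", "wind": "weak",   "class": "yes"},
]

# Step 1: compute the gain of every feature and pick the maximum for the root node.
features = ["outlook", "temperature", "humidity", "wind"]
gains = {f: information_gain(weather, f) for f in features}
print(gains, max(gains, key=gains.get))
```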

Decision Tree Induction XXXIX

Calculation of Gain(S, outlook) and Gain(S, temperature)

Decision Tree Induction XL-XLI

Calculation of Gain(S, temperature)

Decision Tree Induction XLII-XLIII

Calculation of Gain(S, humidity) and Gain(S, wind)

Step 2

Decision Tree Induction XLIV


● Find the highest information gain which is the maximum among
Gain(S, outlook), Gain(S, temperature), Gain(S, humidity) and
Gain(S, wind).

● Therefore, we place “outlook” at the root node.

Decision Tree Induction XLV


Step 3

● Follow the steps for Node 1 (S(1) = S_outlook=sunny), |S(1)| = 5
● Find Entropy(S(1)), Gain(S(1), temperature), Gain(S(1), humidity),
Gain(S(1), wind)

Decision Tree Induction XLVI-XLVIII

(Worked computations for Node 1, not reproduced)

Decision Tree Induction XLIX

Step 4
● Find the maximum of Gain(S(1), temperature), Gain(S(1), humidity),
Gain(S(1), wind); the maximum is for humidity
● Place humidity at Node 1 and split it into branches according to the values
of humidity.

Decision Tree Induction L-LI


Step 5
● All tuples at Node 4 have the same class label “no” (Play Tennis = no)
● All tuples at Node 5 have the same class label “yes” (Play Tennis = yes)
● So we represent Node 4 as a leaf node with value “no” and Node 5 as a
leaf node with value “yes”.
● All tuples at Node 2 have the same class label “yes”, so we convert Node 2 to a
leaf node with value “yes”.
● For Node 3: S(3) = S_outlook=rain, and the maximum gain is Gain(S(3),
humidity)
● The branches resulting from splitting Node 3 according to the
values “high” and “normal” of “humidity” lead to leaf nodes with class
labels “no” and “yes”

Decision Tree Induction LII

(Figure slide, not reproduced)

Decision Tree Induction LIII


The Algorithm
● The algorithm uses information gain to select the most useful
attribute for classification.
● Assume that there are only two class labels, namely “+” and “−”.
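The algorithm slides themselves are not reproduced; the following is a minimal recursive sketch consistent with the steps worked above, reusing information_gain() (it handles any number of class labels, not only “+” and “−”):

```python
from collections import Counter

def id3(examples, features, label_key="class"):
    """Recursive ID3 sketch: returns a class label or a nested dict {feature: {value: subtree}}."""
    labels = [ex[label_key] for ex in examples]
    if len(set(labels)) == 1:                      # Criterion 1: node is pure
        return labels[0]
    if not features:                               # Criterion 2: no features left
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(examples, f, label_key))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, [f for f in features if f != best], label_key)
    return tree

# e.g. id3(weather, ["outlook", "temperature", "humidity", "wind"])
```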

Decision Tree Induction LIV-LVII

(Slides showing the ID3 algorithm, not reproduced)

Decision Tree Induction LVIII

Advantages of ID3
● Understandable prediction rules are created from the training data.
● Builds the fastest tree.
● Builds a short tree.
● Only need to test enough attributes until all data is classified.
● Finding leaf nodes enables test data to be pruned, reducing the number of
tests.
Disadvantages of ID3
● Data may be over-fitted or over-classified if a small sample is tested.
● Only one attribute at a time is tested for making a decision.

Decision Tree Induction LIX


● Classifying continuous data may be computationally expensive, as
many trees must be generated to see where to break the continuum
Tree Pruning
● When a decision tree is built, many of the branches will reflect anomalies in
the training data due to noise or outliers.
● This will lead to the problem of overfitting
● Pruning is a technique that reduces the size of decision trees by removing
sections of the tree that provide little power to classify instances.
● Tree pruning methods help to remove the least reliable branches in the
decision tree.

Decision Tree Induction LX

Tree Pruning
● Pruned trees tend to be smaller and less complex and, thus, easier to
comprehend.
● Pruned trees are usually faster and better at correctly classifying independent
test data (i.e., previously unseen tuples) than unpruned trees.
● There are two common approaches to tree pruning: prepruning and postpruning.

Decision Tree Induction LXI-LXII

Prepruning
A tree is “pruned” by halting its construction early
● Apply pruning earlier, that is, before it reaches the point where it perfectly
classifies the training data.
● This is done by deciding not to further split or partition the subset of
training tuples at a given node
● Upon halting, the node becomes a leaf.

Decision Tree Induction LXIII

Prepruning (contd.)
● The leaf may hold the most frequent class among the subset tuples or the
probability distribution of those tuples.
● Usually the goodness of a split is assessed using measures such as statistical
significance, information gain, Gini index etc.
● Here, the partitioning of the given subset is halted when the split falls
below a prespecified threshold

Decision Tree Induction LXIV

Postpruning
Allow the tree to overfit the data, and then post-prune the tree. It
removes subtrees from a “fully grown” tree.
● A subtree at a given node is pruned by removing its branches and replacing it
with a leaf.
● The leaf is labeled with the most frequent class among the subtree being
replaced
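As a practical aside (not from the slides), scikit-learn's DecisionTreeClassifier exposes both pruning styles; X_train and y_train are assumed to exist, and the parameter values are illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

# Prepruning: halt construction early with size/threshold limits
prepruned = DecisionTreeClassifier(criterion="entropy",
                                   max_depth=3,            # predefined size limit
                                   min_samples_split=10)   # do not split small subsets

# Postpruning: grow fully, then remove weak subtrees via cost-complexity pruning
postpruned = DecisionTreeClassifier(criterion="entropy",
                                    ccp_alpha=0.01)        # larger alpha -> more pruning

# prepruned.fit(X_train, y_train); postpruned.fit(X_train, y_train)
```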

Decision Tree - C4.5 I


● C4.5 uses an extension to information gain known as gain ratio
● It applies a kind of normalization to information gain using a “split
information” value defined analogously with Info(D) as:

SplitInfo_A(D) = − Σ (j = 1, ..., v) (|D_j| / |D|) log2(|D_j| / |D|)

● The gain ratio is defined as

GainRatio(A) = Gain(A) / SplitInfo_A(D)

Example: Computation of gain ratio for the attribute income

● A test on income splits the data into three partitions, namely low,
medium, and high, containing four, six, and four tuples, respectively
● To compute the gain ratio of income:
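A quick sketch of the split-information computation for this 4/6/4 partition:

```python
import math

parts, total = [4, 6, 4], 14                       # the income partition described above
split_info = -sum((p / total) * math.log2(p / total) for p in parts)
print(round(split_info, 3))                        # -> 1.557
print(round(0.029 / split_info, 3))                # GainRatio(income) -> 0.019
```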
Decision Tree - C4.5 II

● SplitInfo_income(D) = −(4/14) log2(4/14) − (6/14) log2(6/14) − (4/14) log2(4/14) ≈ 1.557
● We have Gain(income) = 0.029
● Therefore, GainRatio(income) = 0.029/1.557 ≈ 0.019

Decision Tree - C4.5 III
The C4.5 algorithm improves ID3 in the following ways:
1 Handling both continuous and discrete attributes - In order to handle
continuous attributes, C4.5 creates a threshold and then splits the list
into those whose attribute value is above the threshold and those that
are less than or equal to it (see the sketch below).
2 Handling training data with missing attribute values - C4.5 allows
attribute values to be marked as ? for missing. Missing attribute
values are simply not used in gain and entropy calculations.
3 Handling attributes with differing costs.
4 Pruning trees after creation - C4.5 goes back through the tree once it has
been created and attempts to remove branches that do not help by
replacing them with leaf nodes.
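A sketch of the threshold search described in item 1, reusing the entropy() helper; taking midpoints between consecutive sorted values as candidate cut points is an assumption of this sketch:

```python
def best_threshold(examples, feature, label_key="class"):
    """Pick the numeric threshold for `feature` maximizing information gain of a <=/> split."""
    values = sorted({ex[feature] for ex in examples})
    best_t, best_gain = None, -1.0
    for lo, hi in zip(values, values[1:]):            # candidate cut points between values
        t = (lo + hi) / 2.0
        left = [ex[label_key] for ex in examples if ex[feature] <= t]
        right = [ex[label_key] for ex in examples if ex[feature] > t]
        gain = entropy([ex[label_key] for ex in examples]) \
            - (len(left) / len(examples)) * entropy(left) \
            - (len(right) / len(examples)) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain                           # (None, -1.0) if only one value
```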

Decision Tree - C4.5 IV

ID3 vs C4.5


                          | ID3                            | C4.5
Splitting criteria        | Information Gain               | Gain Ratio
Attribute types supported | Only categorical attributes    | Both categorical and numerical
Missing values            | Does not handle missing values | Handles missing values
Pruning                   | No pruning is done             | Error-based pruning is used
Outlier detection         | Susceptible to outliers        | Susceptible to outliers

Naive Bayes Classifier I
● Predict class membership probabilities, such as the probability that a
given tuple belongs to a particular class.
● Assumption 1: All the features are independent and are unrelated to
each other.
● i.e., presence or absence of a feature does not influence the presence or
absence of any other feature.
● Assumption 2: The data has class-conditional independence: i.e., the
effect of an attribute value on a given class is independent of the
values of the other attributes
● This assumption is made to simplify the computations involved, even though it
may not be true in many real-world problems; in this sense, the classifier is considered “naive”.

Naive Bayes Classifier II


Basic Idea
● Consider a training data set consisting of N examples having n
features.
● Let the features be named (F_1, ..., F_n).
● A feature vector is of the form (f_1, f_2, ..., f_n)
● Let the set of class labels be {c_1, c_2, ..., c_p}
● Let X be the test instance having the feature vector
X = (x_1, x_2, ..., x_n).

Naive Bayes Classifier III


Basic Idea (contd.)
● To find the most appropriate class label that should be assigned to the
test instance X, compute the following conditional probabilities and
choose the maximum among them: P(c_1 | X), P(c_2 | X), ..., P(c_p | X)
● Let the maximum probability be P(c_i | X). Then, we choose c_i as
the most appropriate class label for feature vector X.
● The direct computation of these probabilities is difficult for a number of
reasons, so Bayes’ theorem is used

Naive Bayes Classifier IV


Bayes’ theorem: Computation of probabilities
● Using Bayes’ theorem, we have:

P(c_k | X) = P(X | c_k) P(c_k) / P(X)

● Since the data has class-conditional independence, we note that the
events “x_1 | c_k”, “x_2 | c_k”, ..., “x_n | c_k” are independent

Naive Bayes Classifier V


Bayes’ theorem: Computation of probabilities (contd.)
● Since the denominator P(X) is independent of the class labels, we
have:

P(c_k | X) ∝ P(X | c_k) P(c_k)

● It is enough to find the maximum among the following values:

P(c_k | X) = P(x_1 | c_k) P(x_2 | c_k) ... P(x_n | c_k) P(c_k), where k = 1, ..., p

Algorithm: Naive Bayes

Naive Bayes Classifier VI

Figure: Steps 1-2: Training; other steps: Testing
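The algorithm figure is not reproduced; below is a minimal sketch of the training and testing steps just outlined, using the dict-based example format from the earlier sketches (without the Laplacian correction discussed later):

```python
from collections import Counter, defaultdict

def train_nb(examples, features, label_key="class"):
    """Steps 1-2: count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(ex[label_key] for ex in examples)
    cond_counts = defaultdict(Counter)               # (feature, class) -> value counts
    for ex in examples:
        for f in features:
            cond_counts[(f, ex[label_key])][ex[f]] += 1
    return class_counts, cond_counts

def predict_nb(x, class_counts, cond_counts, features):
    """Remaining steps: pick the class maximizing q_k = P(c_k) * prod_i P(x_i | c_k)."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        q = n_c / total                              # prior P(c_k)
        for f in features:
            # unseen feature values give 0 here (the zero-frequency problem, see below)
            q *= cond_counts[(f, c)][x[f]] / n_c
        scores[c] = q
    return max(scores, key=scores.get), scores
```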

Naive Bayes Classifier VII

Example 1: Naive Bayes

Naive Bayes Classifier VIII

Figure: Use the naive Bayes algorithm to classify a particular species if its features are
(Slow, Rarely, No)

Naive Bayes Classifier IX


● The features are F_1 = “Swim”, F_2 = “Fly”, F_3 = “Crawl”.
● The class labels are c_1 = “Animal”, c_2 = “Bird”, c_3 = “Fish”
● The test instance is x_1 = “Slow”, x_2 = “Rarely”, x_3 = “No”
● Construct frequency table:

Naive Bayes Classifier X


● Step 1: Compute following probabilities

Naive Bayes Classifier XI

● Step 2: Construct the table of conditional probabilities:

● Conditional probabilities are calculated as follows

Naive Bayes Classifier XII

● Step 3: Compute the products q_k

Naive Bayes Classifier XIII-XIV

(Worked computations for Example 1, not reproduced)

Naive Bayes Classifier XV

Example 2: Naive Bayes Classifier

Naive Bayes Classifier XVI

(Figure slide, not reproduced)

Naive Bayes Classifier XVII
● Features: age, income, student, and credit rating
● Class labels: c_1 = yes, c_2 = no

Test instance:

● Step 1: Compute following probabilities

Naive Bayes Classifier XVIII


● Step 2: Calculate Conditional probabilities

Naive Bayes Classifier XIX


● Steps 3, 4, 5: Compute the products P(X | buys_computer = yes), P(X |
buys_computer = no), then q_1, q_2, max{q_1, q_2}, and predict the class

Naive Bayes Classifier XX

Question 3:

Naive Bayes Classifier XXI

For numerical or continuous-valued attributes
● Find the mean and variance of each numerical attribute for each class label

Naive Bayes Classifier XXII

For numerical or continuous-valued attributes
1. Calculate the class probabilities: P(male), P(female)
2. For the given sample, calculate the conditional probabilities P(x_k | c_i) as
follows:

P(x_k | c_i) = (1 / (sqrt(2π) σ)) exp(−(x_k − µ)² / (2σ²))

3. where µ and σ are the mean and standard deviation of attribute
A_k (for class c_i), respectively.
4. Find P(X | c_1), P(X | c_2), q_1, q_2, max{q_1, q_2} to predict the class
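A small sketch of the normal-density likelihood above; the mean, standard deviation, and attribute value are hypothetical numbers:

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """P(x_k | c_i) for a continuous attribute under the normal-distribution assumption."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical numbers: a height attribute with class mean 176 cm and std 7 cm
print(gaussian_likelihood(180.0, mu=176.0, sigma=7.0))
```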

Naive Bayes Classifier XXIII

Q3 Answer

Naive Bayes Classifier XXIV


Question 4: Use naive Bayes algorithm to determine whether a red domestic
SUV car is a stolen car or not using the following data (Ans: No)

Naive Bayes Classifier XXV


Zero frequency problem in Naive Bayes Classifier
● We calculate P(X | C_i) as the product of the probabilities
P(x_1 | C_i), P(x_2 | C_i), ..., P(x_n | C_i)
● Suppose the count of some x_k is zero; i.e., there are no training tuples
representing x_k for the class C_i
● This results in P(x_k | C_i) = 0
● Since it appears in the product, the entire term becomes zero, i.e.,
P(X | C_i) = 0
● i.e., a zero probability cancels the effects of all of the other
probabilities (on C_i) involved in the product.
● Yet, without the zero probability, we may have ended up
with a high probability, suggesting that X belonged to class C_i

Naive Bayes Classifier XXVI
● Solution: For a large training database, D, adding one to each count
that we need would only make a negligible difference in the
estimated probability value
● Yet it would conveniently avoid the case of probability values of zero
● This technique for probability estimation is known as the Laplacian
correction or Laplace estimator
Example:
● Suppose that for the class buys computer = yes in some training
database, D, containing 1,000 tuples
● We have 0 tuples with income = low, 990 tuples with income =
medium, and 10 tuples with income = high
● The probabilities of these events are 0 (= 0/1,000), 0.990 (= 990/1,000), and
0.010 (= 10/1,000)

Naive Bayes Classifier XXVII

● Using the Laplacian correction for the three quantities, we pretend
that we have 1 more tuple for each income-value pair.
● i.e., count(low) = 1, count(medium) = 991, count(high) = 11; therefore,
total tuples = 1,003.
● Now we can calculate the new probabilities as follows:
1/1,003 ≈ 0.001, 991/1,003 ≈ 0.988, and 11/1,003 ≈ 0.011
● The “corrected” probability estimates are close to their “uncorrected”
counterparts, yet the zero probability value is avoided
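A small sketch of the Laplacian correction applied to the income example above:

```python
def laplace_conditional(value_count, class_count, num_values):
    """P(x_k | C_i) with the Laplacian correction: add 1 to each value count."""
    return (value_count + 1) / (class_count + num_values)

# Income example above: class size 1,000 and three income values
for name, count in [("low", 0), ("medium", 990), ("high", 10)]:
    print(name, round(laplace_conditional(count, 1000, 3), 3))
# -> low 0.001, medium 0.988, high 0.011
```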
