DMW Module 3
MODULE 3: Classification Models
Introduction to Classification and Prediction
Issues regarding classification and prediction
Decision Tree - ID3
Tree Pruning
Decision Tree - C4.5
Naive Bayes Classifier
Classification and Prediction
Classification
● Classification is the process of predicting the class of given data points.
● Classes are sometimes called targets, labels, or categories.
● Examples:
● A bank loans officer needs analysis of her data in order to learn which loan applicants are "safe" and which are "risky".
● A marketing manager at AllElectronics needs data analysis to help guess
whether a customer with a given profile will buy a new computer: ”Yes”
or ”No”
● A medical researcher wants to analyze breast cancer data in order to predict which one of three specific treatments a patient should receive: "treatment A", "treatment B", or "treatment C".
● In each of these examples, the data analysis task is classification, where
a model or classifier is constructed to predict categorical labels
Classification (contd.)
● Step 1, building the classifier (model): A classifier is built describing a predetermined set of data classes or concepts.
● This step is the learning step or the training phase.
● Builds the classifier by analyzing or “learning from” a training set made
up of database tuples and their associated class labels.
● The tuples making up the training set are referred to as training tuples.
● A tuple X is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn),
● depicting n measurements made on the tuple from n database attributes A1, A2, ..., An, respectively.
● X is assumed to belong to a predefined class as determined by another
database attribute called the class label attribute, Y.
● In the context of classification, data tuples can be referred to as
samples, examples, instances, data points, or objects
Figure: Training
Figure: Classification
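To make the two steps concrete, here is a minimal sketch in Python; the loan tuples, the attribute encoding, and the use of scikit-learn's decision tree are illustrative assumptions, not the data from the examples above:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training tuples: each row is an attribute vector X = (income, years_of_credit),
# and y holds the associated class labels ("safe" / "risky").
X_train = [[55000, 8], [23000, 1], [78000, 12], [31000, 2]]
y_train = ["safe", "risky", "safe", "risky"]

# Step 1 (learning / training phase): build the classifier from the training set.
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Step 2 (classification): predict the class label of a previously unseen tuple.
print(clf.predict([[60000, 5]]))   # e.g. ['safe']
```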
Prediction
● Predicts a continuous-valued function, or ordered value, as opposed
to a categorical label
● The goal of prediction is to forecast or deduce the value of an
attribute based on values of other attributes.
● The focus here is on numeric prediction.
● Regression analysis is a statistical methodology that is most often
used for numeric prediction, hence the two terms are often used
synonymously
● Examples
● A marketing manager would like to predict how much money a given customer will spend during a sale at AllElectronics.
● Predicting the age of a person.
● Predicting whether the stock price of a company will increase tomorrow.
Prediction (contd.)
● Prediction can also be viewed as a mapping or function, y = f(X), where X is the input tuple and the output y is a continuous or ordered value.
● Data prediction is a two-step process, similar to that of data classification.
● Training: a training set is used to train the predictor (the model).
● Prediction: the model predicts a numerical, continuous-valued (ordered) value for a given input tuple.
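The same two-step idea for numeric prediction, sketched with a regression model; the income and spending figures are made up for illustration:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: yearly income vs. amount spent during a sale.
incomes = [[30000], [45000], [60000], [80000]]
spend = [150.0, 240.0, 310.0, 420.0]

# Training step: fit the predictor on tuples whose target values are known.
reg = LinearRegression()
reg.fit(incomes, spend)

# Prediction step: estimate a continuous value (not a class label) for a new customer.
print(reg.predict([[52000]]))
```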
Classification Methods
● Decision Tree Induction - ID3, C4.5 - M3
● Naive Bayes Classifier - M3
● Rule-based classification - 1R - M4
● Classification using Neural Networks - Backpropagation - M4
● Support Vector Machine (SVM classifier) - M4
● Lazy Learners - K-Nearest Neighbor classifier - M4
Decision Tree Induction
Example
Consider the following situation: Somebody is hunting for a job.
● At the very beginning, he decides that he will consider only those jobs
for which the monthly salary is at least Rs.50,000.
● Our job hunter does not like spending much time traveling to the place of work. He is comfortable only if the commuting time is less than one hour.
● Also, he expects the company to arrange for a free coffee every
morning!
● The decisions to be made before deciding to accept or reject a job
offer can be schematically represented as in Figure below.
● This figure represents a decision tree
Figure: Decision tree for the job offer example
Two types of decision trees:
● Classification trees: Tree models where the target variable can take a discrete set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
● Regression trees: Decision trees where the target variable can take continuous values (real numbers), like the price of a house or a patient's length of stay in a hospital, are called regression trees.
Classification trees: an example
● Consider the data given in the table below, which specifies the features of certain vertebrates and the class to which they belong.
● For each species, four features have been identified: “gives birth”,
”aquatic animal”, “aerial animal” and “has legs”.
● There are five class labels, namely, “amphibian”, “bird”, “fish”,
“mammal” and “reptile”.
● The problem is how to use this data to identify the class of a newly
discovered vertebrate.
[Omitted: Table 8.1 (the vertebrate training data) and Step 1 of the tree construction]
Construction of the tree: Step 2
● We split these examples based on the values of the feature "aquatic animal".
● There are three possible values for this feature.
● Only two of them appear in Table 8.2; accordingly, we need consider only two subsets, Table 8.4 and Table 8.5.
● Table 8.4 contains only one example, so no further splitting is required; it leads to the assignment of the class label "fish".
● Table 8.5 needs to be split into subsets based on the values of "aerial animal": subset 1 with the value "yes", subset 2 with the value "no".
● These subsets immediately lead to unambiguous assignment of class labels: the value "no" leads to "mammal" and the value "yes" leads to "bird".
Construction of the tree: Step 3
● Table 8.3 is split based on the values of the feature "aquatic animal".
● This gives three subsets: Table 8.6 ("yes"), Table 8.7 ("no"), Table 8.8 ("semi").
● The resulting subsets are then split based on the values of "has legs", and so on.
● Finally we get the classification tree.
Figure: The resulting classification tree
Classification tree in rule format
● Elements of a classification tree
● Nodes represent features in the data set; branches are identified by the values of those features; leaf nodes hold the class labels.
● Order in which the features are selected
● The tree depends on the order in which the features are selected; there is no theoretical justification for this choice.
● Stopping criteria
● With a large number of features, each having several possible values, the result can be a very complex classification tree, so growth is stopped using criteria such as:
● Criterion 1: All (or nearly all) of the examples at the node have the same class.
● Criterion 2: There are no remaining features to distinguish among the examples.
● Criterion 3: The tree has grown to a predefined size limit.
Entropy
● The degree to which a subset of examples contains only a single class is known as purity.
● Any subset composed of only a single class is called a pure class.
● Entropy is a measure of "impurity" in a dataset.
● If there are only two possible classes, entropy values can range from 0 to 1. For n classes, entropy ranges from 0 to log2(n).
● Minimum entropy value: the sample is completely homogeneous. Maximum value: the data are as diverse as possible.
● Special case: if S has only two class labels, say "yes" and "no", and p is the proportion of tuples with label "yes", then the proportion with label "no" is 1 − p. The entropy of S is then:
Entropy(S) = −p·log2(p) − (1 − p)·log2(1 − p)
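A minimal sketch of this entropy computation in plain Python; the helper function and the sample label lists are invented for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels: sum over classes of -p * log2(p)."""
    n = len(labels)
    # log2(n / c) equals -log2(c / n); written this way to keep the result non-negative
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

print(entropy(["yes", "yes", "yes"]))        # 0.0 -> a pure (homogeneous) subset
print(entropy(["yes", "no", "yes", "no"]))   # 1.0 -> maximum entropy for two classes
print(entropy(["mammal", "bird", "fish"]))   # log2(3), about 1.585, for three equally likely classes
```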
[Omitted: entropy calculations for Table 8.1, Table 8.2, and Table 8.3]
Problem: Use the ID3 algorithm to construct a decision tree for the data in the table below. [Omitted: the data table]
Step 1
● Create a root node for the tree.
● Total number of features = 4.
● Decide which feature is to be placed at the root node.
● Calculate the information gains corresponding to each of the four features:
● Calculation of Entropy(S)
● Calculation of Gain(S, outlook), Gain(S, temperature), Gain(S, humidity) and Gain(S, wind)
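A sketch of how these gains can be computed; the entropy helper is as before, and the small data fragment with attributes outlook, humidity, wind, and target "play" is assumed for illustration, not the full table from the problem:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def information_gain(rows, attribute, target="play"):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    total = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# Illustrative fragment only; the real exercise uses the full table from the problem.
S = [
    {"outlook": "sunny", "humidity": "high",   "wind": "weak",   "play": "no"},
    {"outlook": "sunny", "humidity": "normal", "wind": "strong", "play": "yes"},
    {"outlook": "rain",  "humidity": "high",   "wind": "weak",   "play": "yes"},
    {"outlook": "rain",  "humidity": "high",   "wind": "strong", "play": "no"},
]

# Step 1: compute the gain of every candidate feature and pick the largest for the root.
for a in ["outlook", "humidity", "wind"]:
    print(a, information_gain(S, a))
root = max(["outlook", "humidity", "wind"], key=lambda a: information_gain(S, a))
print("root attribute:", root)
```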
[Omitted: calculation of Entropy(S) and of the four information gains]
Step 2
● Find the maximum of Gain(S, outlook), Gain(S, temperature), Gain(S, humidity) and Gain(S, wind); place the winning attribute at the root node and split the data into branches according to its values.
[Omitted: the gain calculations and Step 3]
Step 4
● Find the maximum of Gain(S(1), temperature), Gain(S(1), humidity) and Gain(S(1), wind); the maximum is Gain(S(1), humidity).
● Place humidity at node 1 and split it into branches according to the values of humidity.
[Omitted: the remaining construction steps and the final decision tree]
Advantages of ID3
● Understandable prediction rules are created from the training data.
● Builds the fastest tree.
● Builds a short tree.
● Only needs to test enough attributes until all data is classified.
● Finding leaf nodes enables test data to be pruned, reducing the number of tests.
Disadvantages of ID3
● Data may be over-fitted or over-classified if a small sample is tested.
● Only one attribute at a time is tested for making a decision.
Tree Pruning
● Pruned trees tend to be smaller and less complex and, thus, easier to
comprehend.
● Pruned trees are usually faster and better at correctly classifying independent test data (i.e., previously unseen tuples) than unpruned trees.
● There are two common approaches to tree pruning: prepruning and postpruning.
Prepruning
● A tree is "pruned" by halting its construction early, that is, before it reaches the point where it perfectly classifies the training data.
● This is done by deciding not to further split or partition the subset of
training tuples at a given node
● Upon halting, the node becomes a leaf.
● The leaf may hold the most frequent class among the subset tuples, or the probability distribution of those tuples.
● Usually the goodness of a split is assessed using measures such as statistical significance, information gain, or the Gini index.
● The partitioning of the given subset is halted when the split falls below a prespecified threshold (a concrete sketch follows).
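One way prepruning can be realised in practice, sketched here with scikit-learn's decision tree; the dataset and the threshold values are illustrative choices, not part of the original example:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Prepruning: construction halts early, so a node becomes a leaf instead of being split
# whenever the candidate split fails one of these prespecified thresholds.
pre_pruned = DecisionTreeClassifier(
    max_depth=3,                 # predefined size limit
    min_samples_split=10,        # do not partition very small subsets further
    min_impurity_decrease=0.01,  # the "goodness" of a split must exceed this threshold
)
pre_pruned.fit(X, y)
print(pre_pruned.get_depth(), pre_pruned.get_n_leaves())
```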
Postpruning
● Allow the tree to overfit the data, and then post-prune the tree: subtrees are removed from a "fully grown" tree.
● A subtree at a given node is pruned by removing its branches and replacing it
with a leaf.
● The leaf is labeled with the most frequent class among the subtree being
replaced
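A sketch of postpruning using scikit-learn's cost-complexity pruning as one concrete method; the dataset and the alpha value are arbitrary illustrations (in practice alpha would be chosen on validation data):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# First grow a full ("overfitted") tree, then refit with a cost-complexity penalty:
# subtrees whose removal costs less than ccp_alpha are collapsed into leaves,
# each labelled with the most frequent class of the tuples it covers.
full   = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print(full.get_n_leaves(), pruned.get_n_leaves())   # the pruned tree ends up with fewer leaves
```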
Decision Tree - C4.5
The C4.5 algorithm improves ID3 in the following ways:
1. Handling both continuous and discrete attributes - In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it (a sketch of this thresholding follows the list).
2. Handling training data with missing attribute values - C4.5 allows attribute values to be marked as "?" for missing. Missing attribute values are simply not used in gain and entropy calculations.
3. Handling attributes with differing costs.
4. Pruning trees after creation - C4.5 goes back through the tree once it has been created and attempts to remove branches that do not help by replacing them with leaf nodes.
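A sketch of how such a threshold can be chosen for a continuous attribute: candidate thresholds are taken between consecutive sorted values, and the one with the highest information gain defines the binary split. The data values and helper names are illustrative, not C4.5's actual implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (gain, t) for the split "value <= t / value > t" with the highest gain."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                   # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2        # midpoint candidate threshold
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

# Illustrative continuous attribute (e.g. temperature) with class labels.
print(best_threshold([64, 65, 68, 69, 70, 75, 80, 85],
                     ["yes", "no", "yes", "yes", "yes", "no", "no", "no"]))
```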
Naive Bayes Classifier
● Predict class membership probabilities, such as the probability that a
given tuple belongs to a particular class.
● Assumption 1: All the features are independent and are unrelated to
each other.
● i.e., the presence or absence of a feature does not influence the presence or absence of any other feature.
● Assumption 2: The data has class-conditional independence, i.e., the effect of an attribute value on a given class is independent of the values of the other attributes.
● This assumption is made to simplify the computations involved; it may not hold true in many real-world problems, and in this sense the classifier is considered "naive."
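Under these assumptions the classifier picks the class c that maximises P(c) multiplied by the product of P(xk | c) over all attributes. A minimal categorical sketch of this rule follows; the toy tuples, attribute names, and counts are invented in the spirit of the species example, not the actual table:

```python
from collections import Counter, defaultdict

def train_nb(rows, target):
    """Estimate P(c) and P(attribute = value | c) from categorical training tuples."""
    priors = Counter(r[target] for r in rows)
    cond = defaultdict(Counter)              # (class, attribute) -> counts of each value
    for r in rows:
        for a, v in r.items():
            if a != target:
                cond[(r[target], a)][v] += 1
    return priors, cond, len(rows)

def classify(priors, cond, n, x):
    """Pick argmax over c of P(c) * product_k P(x_k | c) (class-conditional independence)."""
    best_class, best_score = None, -1.0
    for c, count in priors.items():
        score = count / n                    # prior P(c)
        for a, v in x.items():
            score *= cond[(c, a)][v] / count # conditional P(x_k | c)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Invented toy data in the spirit of the species example.
rows = [
    {"moves": "slow", "flies": "rarely", "aquatic": "no",  "cls": "reptile"},
    {"moves": "fast", "flies": "often",  "aquatic": "no",  "cls": "bird"},
    {"moves": "slow", "flies": "rarely", "aquatic": "yes", "cls": "amphibian"},
    {"moves": "slow", "flies": "rarely", "aquatic": "no",  "cls": "reptile"},
]
priors, cond, n = train_nb(rows, "cls")
print(classify(priors, cond, n, {"moves": "slow", "flies": "rarely", "aquatic": "no"}))
```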
Figure: Use the naive Bayes algorithm to classify a particular species whose features are (Slow, Rarely, No)
[Omitted: worked naive Bayes solution for the species example]
● Features: age, income, student, and credit rating
● Class labels: c1 = yes, c2 = no
● Test instance:
Question 3:
For numerical (continuous-valued) attributes
● Find the mean and variance of each numerical attribute for each class label.
For numerical (continuous-valued) attributes (contd.)
1. Calculate the class probabilities: P(male), P(female).
2. For the given sample, calculate the conditional probabilities P(xk | ci) as follows:
   P(xk | ci) = (1 / (√(2π) · σ)) · exp(−(xk − µ)² / (2σ²))
3. where µ and σ are the mean and standard deviation of attribute Ak for class ci, respectively.
4. Find P(X | c1) and P(X | c2), compute q1 = P(X | c1)·P(c1) and q2 = P(X | c2)·P(c2), and predict the class corresponding to max{q1, q2}.
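A sketch of these steps for a single numeric attribute; the height values and class labels are invented, and Python's statistics module supplies the mean and standard deviation:

```python
from math import exp, pi, sqrt
from statistics import mean, stdev

def gaussian(x, mu, sigma):
    """P(x | c) estimated with the normal density, using the class mean and std."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# Invented heights (cm) grouped by class label.
heights = {"male": [175.0, 180.0, 178.0, 183.0], "female": [160.0, 165.0, 162.0, 168.0]}
n_total = sum(len(v) for v in heights.values())

x_new = 170.0
for c, values in heights.items():
    prior = len(values) / n_total                              # step 1: P(c)
    likelihood = gaussian(x_new, mean(values), stdev(values))  # steps 2-3: P(x | c)
    print(c, prior * likelihood)                               # step 4: pick the max product
```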
Q3 Answer
[Omitted: worked solution]
The zero-probability problem
● If an attribute value never occurs with a class in the training data, its estimated conditional probability is zero, and that single zero makes the whole product P(X | ci) zero.
● Solution: For a large training database D, adding one to each count that we need would only make a negligible difference in the estimated probability value,
● yet it would conveniently avoid the case of probability values of zero.
● This technique for probability estimation is known as the Laplacian correction or Laplace estimator.
Example:
● Suppose that for the class buys_computer = yes in some training database D containing 1,000 tuples,
● we have 0 tuples with income = low, 990 tuples with income = medium, and 10 tuples with income = high.
● The probabilities of these events are 0, 0.990 (= 990/1000), and 0.010 (= 10/1000).
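● Applying the Laplacian correction, one extra tuple is counted for each income value (1,003 tuples in all), so the corrected estimates are 1/1003 ≈ 0.001, 991/1003 ≈ 0.988, and 11/1003 ≈ 0.011: close to the original values, but no longer zero.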