
KRISHNA ENGINEERING COLLEGE


(Approved by AICTE & Affiliated to Dr. APJ Abdul Kalam Technical University (Formerly UPTU), Lucknow)
Department of CSE-Artificial Intelligence
Department of CSE-Artificial Intelligence & Machine Learning

Machine Learning Techniques (KAI601)


Unit-1:
INTRODUCTION
Learning, Types of Learning, Well defined learning problems, Designing a learning System, History of ML,
Introduction of Machine Learning Approaches – (Artificial Neural Network, Clustering, Reinforcement Learning,
Decision Tree Learning, Bayesian networks, Support Vector Machine, Genetic Algorithm), Issues in Machine
Learning and Data Science Vs Machine Learning;
Unit-2:
REGRESSION: Linear Regression and Logistic Regression
BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal Classifier, Naïve Bayes classifier,
Bayesian belief networks, EM algorithm.
SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel – (Linear kernel, polynomial kernel,
and Gaussian kernel), Hyperplane – (Decision surface), Properties of SVM, and Issues in SVM.
Unit-3:
DECISION TREE LEARNING - Decision tree learning algorithm, Inductive bias, Inductive inference with decision
trees, Entropy and information theory, Information gain, ID-3 Algorithm, Issues in Decision tree learning.
INSTANCE-BASED LEARNING - k-Nearest Neighbour Learning, Locally Weighted Regression, Radial basis
function networks, Case-based learning.
Unit-4:
ARTIFICIAL NEURAL NETWORKS – Perceptrons, Multilayer perceptron, Gradient descent and the Delta rule,
Multilayer networks, Derivation of Backpropagation Algorithm, Generalization, Unsupervised Learning – SOM
Algorithm and its variant;
DEEP LEARNING – Introduction, Concept of convolutional neural network, Types of layers – (Convolutional layers,
Activation function, Pooling, Fully connected), Concept of Convolution (1D and 2D) layers, Training of network,
Case study of CNN, e.g. on Diabetic Retinopathy, Building a smart speaker, Self-driving car, etc.

Unit-5:
REINFORCEMENT LEARNING – Introduction to Reinforcement Learning, Learning Task, Example of
Reinforcement Learning in Practice, Learning Models for Reinforcement – (Markov Decision Process, Q Learning –
Q Learning function, Q Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep Q
Learning.
GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction, Crossover, Mutation, Genetic
Programming, Models of Evolution and Learning, Applications.

Books:

1. Tom M. Mitchell, "Machine Learning", McGraw-Hill Education (India) Private Limited, 2013.
2. Ethem Alpaydin, "Introduction to Machine Learning" (Adaptive Computation and Machine Learning), MIT Press.
3. Stephen Marsland, "Machine Learning: An Algorithmic Perspective", CRC Press, 2009.
4. C. Bishop, "Pattern Recognition and Machine Learning", Berlin: Springer-Verlag.
5. M. Gopal, "Applied Machine Learning", McGraw Hill Education.

Unit-3:
DECISION TREE LEARNING - Decision tree learning algorithm, Inductive bias, Inductive
inference with decision trees, Entropy and information theory, Information gain, ID-3 Algorithm,
Issues in Decision tree learning.

Decision tree

• A decision tree is a non-parametric supervised learning algorithm that can be used for both
classification and regression tasks.
• Decision tree learning is a method for approximating discrete-valued target functions, in
which the learned function is represented by a decision tree.
• Learned trees can also be re-represented as sets of if-then rules.

Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. Each node in the tree specifies a test of some
attribute of the instance, and each branch descending from that node corresponds to one of the
possible values for this attribute.

An instance is classified by starting at the root node of the tree, testing the attribute specified by
this node, then moving down the tree branch corresponding to the value of the attribute. This
process is then repeated for the subtree rooted at the new node.

For example, the instance (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong)
would be sorted down the leftmost branch of this decision tree and would therefore be classified
as a negative instance.
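A minimal sketch of this top-down classification procedure, assuming the tree is stored as nested Python dictionaries (an illustrative representation, not part of any particular library); the tree below is the standard PlayTennis tree that the expression in the next paragraph also describes:

```python
# Sketch: a decision tree stored as nested dicts {attribute: {value: subtree_or_leaf}}.
play_tennis_tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, instance):
    """Sort an instance down the tree from the root to a leaf."""
    while isinstance(tree, dict):                    # internal node: test one attribute
        attribute = next(iter(tree))                 # the attribute tested at this node
        tree = tree[attribute][instance[attribute]]  # follow the branch for the instance's value
    return tree                                      # a leaf holds the class label

instance = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Strong"}
print(classify(play_tennis_tree, instance))          # -> "No" (a negative instance)
```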

In general, decision trees represent a disjunction of conjunctions of constraints on the attribute
values of instances. For example, the decision tree shown above corresponds to the expression

(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)

Appropriate problems for decision tree learning

Decision tree learning is generally best suited to problems with the following characteristics:
• Instances are represented by attribute-value pairs
• The target function has discrete output values
• Disjunctive descriptions may be required
• The training data may contain missing attribute values

Decision tree learning algorithm

The basic idea behind any decision tree algorithm is as follows (a usage sketch in scikit-learn is given after this list):

1. Select the best attribute using an Attribute Selection Measure (ASM) to split the records.
2. Make that attribute a decision node and break the dataset into smaller subsets.
3. Build the tree by repeating this process recursively for each child until one of the following conditions is met:
• All the tuples in the subset belong to the same class (the same target attribute value).
• There are no remaining attributes.
• There are no remaining instances.
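This recipe is what scikit-learn's DecisionTreeClassifier (a CART-style implementation, used in the tutorials listed under Reference at the end of this unit) carries out internally. A minimal usage sketch, with the built-in Iris data standing in as a placeholder dataset, where the criterion parameter plays the role of the attribute selection measure:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                        # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# criterion selects the attribute selection measure: "entropy" (information gain) or "gini"
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)                                # recursive splitting happens here
print(clf.score(X_test, y_test))                         # accuracy on held-out examples
```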

Inductive bias, Inductive inference with decision trees

Given a collection of training examples, there are typically many decision trees consistent with
these examples.

Approximate inductive bias of ID3: Shorter trees are preferred over larger trees.

A closer approximation to the inductive bias of ID3: Shorter trees are preferred over longer trees.
Trees that place high information gain attributes close to the root are preferred over those that do
not.

Inductive inference refers to the process of generalizing knowledge from specific examples to make
probabilistic predictions about new, unseen instances.

Inductive Learning Algorithm (ILA) is an iterative, inductive machine learning algorithm used to
generate a set of classification rules of the form "IF-THEN" from a set of examples; at each
iteration it produces one rule and appends it to the set of rules.

Entropy in information theory

Entropy characterizes the (im)purity of an arbitrary collection of examples. Given a collection S,
containing positive and negative examples of some target concept, the entropy of S relative to this
Boolean classification is

Entropy(S) = −p+ log2 p+ − p− log2 p−

p+ : the proportion of positive examples in S
p− : the proportion of negative examples in S

In all calculations involving entropy we define 0 log2 0 to be 0.

Suppose S is a collection of 14 examples of some Boolean concept, including 9 positive and 5
negative examples. Then the entropy of S relative to this Boolean classification is

Entropy([9+, 5−]) = −(9/14) log2 (9/14) − (5/14) log2 (5/14) = 0.940

Note:
• The entropy is 0 if all members of S belong to the same class:
  Entropy([14+, 0−]) = −(14/14) log2 (14/14) − (0/14) log2 (0/14) = 0
• The entropy is 1 when the collection contains an equal number of positive and negative examples:
  Entropy([7+, 7−]) = −(7/14) log2 (7/14) − (7/14) log2 (7/14) = 1
• If the collection contains unequal numbers of positive and negative examples, the entropy is
  between 0 and 1.
• If the target attribute can take on c different values, then the entropy of S relative to this
  c-wise classification is defined as

Entropy(S) = Σ (i = 1 to c) −pi log2 pi

Where pi is the proportion of S belonging to class i. If the target attribute can take on c possible
values, the entropy can be as large as log2c.
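A small Python sketch of this definition, treating 0 log2 0 as 0 as stated above (the function and variable names are illustrative); it reproduces the numbers used in this section:

```python
from collections import Counter
import math

def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i); classes with p_i = 0 contribute 0."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

S = ["+"] * 9 + ["-"] * 5                 # the [9+, 5-] collection from the text
print(round(entropy(S), 3))               # 0.94
print(entropy(["+"] * 14))                # 0.0 -- all members in the same class
print(entropy(["+"] * 7 + ["-"] * 7))     # 1.0 -- equal numbers of + and -
```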

Information gain

Information gain is the decrease in entropy. Information gain computes the difference between
entropy before the split and average entropy after the split of the dataset based on given attribute
values. ID3 (Iterative Dichotomiser) decision tree algorithm uses information gain.

The information gain, Gain(S, A), of an attribute A relative to a collection of examples S, is defined
as

Gain(S, A) = Entropy(S) − Σ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which
attribute A has value v.

The information gain due to sorting the original 14 examples by the attribute Wind may then be
calculated as follows. With Values(Wind) = {Weak, Strong}, S = [9+, 5−], SWeak = [6+, 2−] and
SStrong = [3+, 3−]:

Gain(S, Wind) = Entropy(S) − (8/14) Entropy(SWeak) − (6/14) Entropy(SStrong)
             = 0.940 − (8/14)(0.811) − (6/14)(1.00)
             = 0.048

Information gain is precisely the measure used by ID3 to select the best attribute at each step in
growing the tree.

For the same 14 examples, Gain(S, Humidity) = 0.151, which is larger than Gain(S, Wind) = 0.048, so
Humidity is the better attribute. Answer: Humidity
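A sketch of Gain(S, A) in the same style, checked against the Wind example above; the list-of-dicts representation and the "label" key are assumptions made only for this illustration:

```python
from collections import Counter
import math

def entropy(labels):                       # as in the earlier entropy sketch
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, target="label"):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of |S_v|/|S| * Entropy(S_v)."""
    labels = [e[target] for e in examples]
    gain = entropy(labels)
    for value in set(e[attribute] for e in examples):
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

# The Wind example from the text: S = [9+, 5-], Weak -> [6+, 2-], Strong -> [3+, 3-]
S = ([{"Wind": "Weak", "label": "+"}] * 6 + [{"Wind": "Weak", "label": "-"}] * 2
     + [{"Wind": "Strong", "label": "+"}] * 3 + [{"Wind": "Strong", "label": "-"}] * 3)
print(round(information_gain(S, "Wind"), 3))   # 0.048
```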

ID-3 Algorithm

ID3 stands for Iterative Dichotomiser 3 and is named such because the algorithm iteratively
(repeatedly) dichotomizes (divides) features into two or more groups at each step.

Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree.

Our basic algorithm, ID3, learns decision trees by constructing them top down, beginning with the
question "which attribute should be tested at the root of the tree?" To answer this question, each
instance attribute is evaluated using a statistical test to determine how well it alone classifies the
training examples.
The best attribute is selected and used as the test at the root node of the tree. A descendant of the
root node is then created for each possible value of this attribute, and the training examples are
sorted to the appropriate descendant node (i.e., down the branch corresponding to the example's
value for this attribute).
The entire process is then repeated using the training examples associated with each descendant
node to select the best attribute to test at that point in the tree.

Steps of the ID3 Algorithm

1. Calculate the Information Gain of each feature.
2. If the rows do not all belong to the same class, split the dataset S into subsets using the
feature for which the Information Gain is maximum.
3. Make a decision tree node using the feature with the maximum Information Gain.
4. If all rows belong to the same class, make the current node a leaf node labelled with that class.
5. Repeat for the remaining features until no features are left or every branch of the decision
tree ends in a leaf node.

A sketch of these steps in Python is given after this list.
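The entropy and information-gain helpers from the earlier sketches are repeated so the snippet runs on its own; the "label" key and the nested-dict tree are illustrative choices rather than part of ID3 itself:

```python
from collections import Counter
import math

def entropy(labels):                       # helpers repeated from the earlier sketches
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    labels = [e["label"] for e in examples]
    gain = entropy(labels)
    for value in set(e[attribute] for e in examples):
        subset = [e["label"] for e in examples if e[attribute] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

def id3(examples, attributes):
    """Grow a decision tree top-down, choosing attributes greedily by information gain."""
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:                        # step 4: pure node -> leaf
        return labels[0]
    if not attributes:                               # nothing left to split on
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    best = max(attributes, key=lambda a: information_gain(examples, a))   # steps 1-3
    tree = {best: {}}
    for value in set(e[best] for e in examples):     # one subtree per value of `best`
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])  # step 5
    return tree
```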

An Illustrative Example

Refer to the solution discussed in the class.



Issues in Decision tree learning

1. Avoiding Overfitting the Data

Overfitting is a significant practical difficulty for decision tree learning and many other learning
methods. Random noise in the training examples can lead to overfitting.

There are several approaches to avoiding overfitting in decision tree learning. These can be grouped
into two classes:
• Pre-pruning – stop growing the tree early, i.e. prune (remove) a node of low importance while
the tree is still being grown.
• Post-pruning – allow the tree to grow to its full depth (and overfit the data), then prune nodes
back according to their significance.

Another approach is often referred to as the training and validation set approach: even though the
learner may be misled by random errors and coincidental regularities within the training set, the
validation set provides a safety check against overfitting. A sketch of this idea is given below.
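The sketch uses scikit-learn's pre-pruning parameter max_depth (the dataset, the depth range and random_state below are placeholders): candidate trees are grown on the training split and the depth is chosen by validation accuracy rather than training accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)                   # placeholder dataset
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_depth, best_score = None, 0.0
for depth in range(1, 11):                                    # candidate amounts of pre-pruning
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)                                # fit on the training set only
    score = tree.score(X_val, y_val)                          # judge on the validation set
    if score > best_score:
        best_depth, best_score = depth, score
print(best_depth, round(best_score, 3))                       # depth that generalizes best
```

scikit-learn also supports post-pruning through its ccp_alpha (cost-complexity pruning) parameter.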

2. Incorporating Continuous-Valued Attributes

Our initial definition of ID3 is restricted to attributes that take on a discrete set of values.
Continuous-valued decision attributes can nevertheless be incorporated into the learned tree.
This is accomplished by dynamically defining new discrete-valued attributes that partition the
continuous attribute value into a discrete set of intervals.
In particular, for an attribute A that is continuous-valued, the algorithm can dynamically create a
new Boolean attribute Ac that is true if A < c and false otherwise, where the threshold c is chosen
to maximize information gain.
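One way such a threshold c can be chosen for a single continuous attribute is sketched below: candidate cut points are taken midway between adjacent sorted values, and the cut giving the largest information gain for the Boolean test (value < c) is kept. The Temperature values and labels are illustrative.

```python
from collections import Counter
import math

def entropy(labels):                       # as in the earlier entropy sketch
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Pick the cut c maximising the information gain of the Boolean test (value < c)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_c, best_gain = None, -1.0
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue                                   # no cut between equal values
        c = (v1 + v2) / 2                              # candidate threshold: the midpoint
        below = [y for v, y in pairs if v < c]
        above = [y for v, y in pairs if v >= c]
        gain = (base - (len(below) / len(pairs)) * entropy(below)
                     - (len(above) / len(pairs)) * entropy(above))
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain

# Temperature values with PlayTennis-style labels (illustrative):
temps = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temps, labels))       # -> a cut at 54.0, i.e. the test Temperature < 54
```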

3. Alternative Measures for Selecting Attributes


The GainRatio measure is defined in terms of the earlier Gain measure, as well as a quantity called
SplitInformation, as follows:

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

where

SplitInformation(S, A) = −Σ (i = 1 to c) (|Si| / |S|) log2 (|Si| / |S|)

and S1 through Sc are the c subsets of examples resulting from partitioning S by the c-valued
attribute A. SplitInformation is simply the entropy of S with respect to the values of attribute A.
A variety of other attribute selection measures have been proposed as well. In experimental
comparisons, however, the choice of attribute selection measure appears to have a smaller impact on
final accuracy than does the extent and method of post-pruning.
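A sketch of SplitInformation and GainRatio in the same style as the earlier helpers; SplitInformation is computed as the entropy of S over the values of A, and the guard below avoids division by zero when an attribute takes only one observed value in S:

```python
from collections import Counter
import math

def entropy(labels):                       # as in the earlier entropy sketch
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_information(examples, attribute):
    """SplitInformation(S, A): the entropy of S with respect to the values of attribute A."""
    return entropy([e[attribute] for e in examples])

def gain_ratio(examples, attribute, target="label"):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    labels = [e[target] for e in examples]
    gain = entropy(labels)
    for value in set(e[attribute] for e in examples):
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    si = split_information(examples, attribute)
    return gain / si if si > 0 else 0.0    # guard: attribute with only one observed value
```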

4. Handling Training Examples with Missing Attribute Values

In certain cases, the available data may be missing values for some attributes. One strategy for
dealing with a missing attribute value is to assign it the value that is most common among the
training examples that reach the node where that attribute is tested.
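A sketch of this strategy; the attribute name, the use of None to mark a missing value, and the example records are all illustrative:

```python
from collections import Counter

def fill_missing(examples, attribute):
    """Replace missing (None) values of `attribute` with the most common observed value
    among the training examples that reached this node."""
    observed = [e[attribute] for e in examples if e[attribute] is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [dict(e, **{attribute: most_common}) if e[attribute] is None else e
            for e in examples]

node_examples = [{"Humidity": "High", "label": "No"},
                 {"Humidity": None, "label": "No"},
                 {"Humidity": "High", "label": "No"},
                 {"Humidity": "Normal", "label": "Yes"}]
print(fill_missing(node_examples, "Humidity"))   # the missing value becomes "High"
```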

5. Handling Attributes with Differing Costs

In some learning tasks the instance attributes may have associated costs, and these costs can vary
significantly from attribute to attribute. In such cases we would prefer decision trees that use
low-cost attributes where possible.
ID3 can be modified to take into account attribute costs by introducing a cost term into the attribute
selection measure. For example, we might divide the Gain by the cost of the attribute, so that lower-
cost attributes would be preferred.
While such cost-sensitive measures do not guarantee finding an optimal cost-sensitive decision
tree, they do bias the search in favor of low-cost attributes.
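A sketch of the suggestion above, choosing the attribute that maximizes Gain(S, A) / Cost(A); all gain and cost values below are made up purely for illustration:

```python
def pick_attribute(attributes, gain, cost):
    """Prefer attributes with a high gain-to-cost ratio, i.e. maximise Gain(S, A) / Cost(A)."""
    return max(attributes, key=lambda a: gain[a] / cost[a])

# Illustrative (made-up) gains and measurement costs:
gain = {"Temperature": 0.029, "BloodTest": 0.20, "Biopsy": 0.25}
cost = {"Temperature": 1.0, "BloodTest": 5.0, "Biopsy": 50.0}
print(pick_attribute(list(gain), gain, cost))    # "BloodTest": cheaper than Biopsy, more informative than Temperature
```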

Reference
https://www.datacamp.com/tutorial/decision-tree-classification-python
https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html
https://scikit-learn.org/stable/modules/tree.html

EXAMPLE: (worked example pages not reproduced here)
