Chapter 03
Random Forests:
Overview: An ensemble of decision trees, each trained on a different random subset of the data.
Nonlinearity: Combines multiple nonlinear trees into a more robust model with improved generalization ability.
Advantages:
Models Complex Relationships: Capable of capturing complex, nonlinear patterns in data.
High Accuracy: Often achieves higher accuracy than linear models, especially in real-world applications with complex data.
Disadvantages:
Computationally Intensive: Training many nonlinear trees can be resource-intensive.
Risk of Overfitting: Nonlinear models are more prone to overfitting, especially if not properly regularized.
Decision Tree
• A Decision Tree is a supervised learning technique.
• It can be used for both classification and regression problems, but it is mostly preferred for classification.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• A decision tree has two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
Why use Decision Trees?
• Decision Trees mimic the way humans think when making a decision.
• The logic of a decision tree is easy to understand because of its tree-like structure.
Decision Tree Terminologies
Root Node: The node where the decision tree starts. It represents the entire dataset, which then gets divided into two or more homogeneous sets.
Leaf Node: A final output node; the tree cannot be split further after a leaf node is reached.
Splitting: The process of dividing a decision node/root node into sub-nodes according to the given conditions.
Branch/Sub-Tree: A subtree formed by splitting the tree.
Pruning: The process of removing unwanted branches from the tree.
Parent/Child Node: A node that is split into sub-nodes is called the parent of those sub-nodes, and the sub-nodes are its children.
How does the Decision Tree algorithm Work?
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until the nodes cannot be classified further; such a final node is called a leaf node.
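A minimal sketch of these steps in Python, using scikit-learn's DecisionTreeClassifier as a CART implementation (scikit-learn is assumed to be available; the Iris dataset is only an illustrative stand-in for dataset S):

# Sketch of the steps above with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)                      # Step-1: the complete dataset S
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Steps 2-5: CART repeatedly picks the best attribute (Gini impurity here),
# splits the data into subsets, and recurses until leaves are pure or max_depth is reached.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=load_iris().feature_names))
print("Test accuracy:", tree.score(X_test, y_test))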
Attribute Selection Measures:
• While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes.
• To solve this problem, we use a technique called an Attribute Selection Measure (ASM).
• Popular ASM techniques are Information Gain and the Gini Index.
1. Information Gain:
• Information gain measures the change in entropy after a dataset is split on an attribute.
• It calculates how much information a feature provides about the class.
• According to the value of information gain, we split the node and build the decision tree.
• A decision tree algorithm always tries to maximize information gain, and the node/attribute with the highest information gain is split first.
It can be calculated using the formula below, where:
• S is the set of instances (the whole dataset),
• A is an attribute,
• v is an individual value that attribute A can take, and Values(A) is the set of all possible values of A,
• Sv is the subset of S for which attribute A has the value v.
Gain(S, A) = Entropy(S) − Σ over v ∈ Values(A) of (|Sv| / |S|) × Entropy(Sv)
where Entropy(S) = − Σ p_i log2(p_i), summed over the proportion p_i of each class in S.
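A small Python sketch of this calculation (NumPy assumed; the "Outlook"/"Play" toy data are hypothetical):

import numpy as np

def entropy(labels):
    # Entropy(S) = -sum_i p_i * log2(p_i), over the class proportions in S.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(attribute_values, labels):
    # Gain(S, A) = Entropy(S) - sum over v in Values(A) of |S_v|/|S| * Entropy(S_v)
    total = entropy(labels)
    weighted = 0.0
    for v in np.unique(attribute_values):
        subset = labels[attribute_values == v]   # S_v: instances where A == v
        weighted += len(subset) / len(labels) * entropy(subset)
    return total - weighted

# Hypothetical toy data: attribute "Outlook" vs. class "Play".
outlook = np.array(["sunny", "sunny", "overcast", "rain", "rain", "overcast"])
play    = np.array(["no",    "no",    "yes",      "yes",  "no",   "yes"])
print("Gain(S, Outlook) =", information_gain(outlook, play))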
Random Forest is a popular machine learning algorithm that belongs to the family of
ensemble learning methods.
• Random Forest is a tree-based ensemble learning algorithm used in machine learning
for classification and regression.
• It constructs multiple Decision Trees during training, each using a random subset of the
dataset.
• Each tree considers only a random subset of features at each split, increasing variability among trees and reducing overfitting.
• Prediction is made by aggregating the results of all trees:
• Voting for classification tasks.
• Averaging for regression tasks.
• This ensemble approach leads to stable and precise results.
• Random Forests can handle complex data effectively and are widely used in various
applications for their reliability in predictions.
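A minimal scikit-learn sketch of this ensemble (the breast-cancer dataset is only an assumed example):

# Many trees on bootstrap samples, a random feature subset at each split,
# and majority voting for the final class.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,      # number of decision trees in the forest
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # each tree is trained on a random bootstrap sample
    random_state=0,
)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))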
What are Ensemble Learning models?
• Ensemble learning combines several base models into a single predictor: the collective strength of multiple models overcomes individual limitations, leading to more robust predictions.
• Ensemble models are commonly used in classification and regression tasks.
• Popular ensemble models include:
• Bagging: Reduces variance by training multiple versions of a model.
• Random Forest: Builds multiple decision trees on random data subsets.
• Boosting: Sequentially improves models by focusing on errors (e.g., AdaBoost,
XGBoost, LightGBM).
• Voting: Combines predictions by taking a majority or average vote across models.
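As a concrete illustration of the voting idea (a sketch assuming scikit-learn; the choice of base models is arbitrary):

# Combine different base models and take a majority ("hard") vote on their predicted classes.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

voter = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
    ],
    voting="hard",   # majority vote; "soft" would average predicted probabilities
)
voter.fit(X_train, y_train)
print("Test accuracy:", voter.score(X_test, y_test))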
Bagging (Bootstrap Aggregating)
Goal: Reduce variance and avoid overfitting by combining predictions from multiple
models.
How it works:
• Creates multiple subsets of the training data by sampling with replacement.
• Trains a separate model on each subset (often using decision trees).
• Aggregates predictions:
  • For regression: takes the average of predictions.
  • For classification: uses majority voting.
Example: Random Forest is a popular bagging method that combines many decision trees.
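A minimal bagging sketch (assuming a recent scikit-learn, where the base-model parameter is named estimator):

# Bootstrap samples of the training data, one decision tree per sample,
# and aggregated (majority-vote) predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagger = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # separate model trained on each subset
    n_estimators=50,                     # number of bootstrap samples / models
    bootstrap=True,                      # sample with replacement
    random_state=0,
)
bagger.fit(X_train, y_train)
print("Test accuracy:", bagger.score(X_test, y_test))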
Boosting
Goal: Improve model accuracy by focusing on difficult-to-predict cases.
How it works:
• Trains models sequentially, with each new model correcting the errors of the previous
ones.
• Adjusts weights to emphasize data points that were misclassified earlier.
• Final prediction combines all models, often with weighted voting.
Example: AdaBoost and XGBoost are popular boosting methods that iteratively refine
predictions.
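A minimal AdaBoost sketch (again assuming a recent scikit-learn; the dataset is only illustrative):

# Shallow trees ("stumps") trained sequentially; points misclassified by earlier
# trees receive larger weights, and the final prediction is a weighted vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner
    n_estimators=100,                               # sequential boosting rounds
    learning_rate=0.5,
    random_state=0,
)
booster.fit(X_train, y_train)
print("Test accuracy:", booster.score(X_test, y_test))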
Both bagging and boosting aim to create a stronger overall model by combining the
strengths of individual models.
Popular boosting algorithms:
AdaBoost: Focuses on improving errors by adjusting the weights of misclassified data points.
Gradient Boosting: Focuses on improving the model by reducing prediction errors through gradient
descent.
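A short sketch contrasting the two (scikit-learn assumed; the dataset is only illustrative):

# AdaBoost re-weights misclassified points; gradient boosting fits each new tree
# to the gradient of the loss (roughly, the residual errors) of the current model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                 random_state=0).fit(X_train, y_train)

print("AdaBoost accuracy:         ", ada.score(X_test, y_test))
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))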