
Prepared by

Prince Thomas M.E., PhD


Associate Professor
Chapter 3 Supervised Learning: Nonlinear Models
K-Nearest Neighbors (K-NN)
- Distance Metrics (Euclidean, Manhattan)
- K-Value Selection and Cross-Validation
Neural Networks and Multilayer Perceptrons (MLPs)
- Structure of Neural Networks (Input, Hidden, and Output Layers)
- Activation Functions (ReLU, Sigmoid, Softmax)
- Backpropagation and Gradient Descent in Neural Networks
Decision Trees
- Splitting Criteria: Gini Index, Entropy, and Information Gain
- Overfitting in Decision Trees and Pruning
Random Forests (Introduction to Ensembles)
- Concept of Weak Learners
- Bagging and Random Subspace Sampling
Boosting Techniques (e.g., AdaBoost)
- Boosting Algorithms: AdaBoost, Gradient Boosting
Stacking and Voting Methods
- Model Combination Techniques
- Hard vs. Soft Voting
Nonlinear Models in Machine Learning
- Nonlinear models capture relationships between inputs and outputs that are not linear.
- They can model complex patterns in data that linear models cannot.
Key Characteristics of Nonlinear Models:
Flexibility: Capable of modeling complex, nonlinear relationships.
Complex Patterns: Can fit data with intricate patterns and interactions between variables.
Non-Linear Boundaries: Able to create decision boundaries that are not straight lines, which
is crucial for solving complex classification problems.
Examples of Nonlinear Models:
K-Nearest Neighbors (K-NN):
Overview: Classifies a data point based on the classification of its nearest neighbors.
Nonlinearity: The decision boundary can be highly nonlinear depending on the distribution of
training data and the value of K.
Neural Networks:
Overview: Consist of layers of neurons that can learn complex representations of data.
Nonlinearity: Layers with nonlinear activation functions (like ReLU, Sigmoid) allow modeling
very complex relationships.
Decision Trees:
Overview: Splits the data based on feature values to make predictions.
Nonlinearity: Creates complex, piecewise constant decision boundaries that adapt to
intricate data patterns.

Random Forests:
Overview: An ensemble of decision trees, each trained on different subsets of the data.
Nonlinearity: Combines multiple nonlinear trees to create a more robust model with
improved generalization ability.

Advantages:
Model Complex Relationships: Capable of capturing complex patterns in data.
High Accuracy: Often achieve higher accuracy compared to linear models, especially in
real-world applications with complex data.

Disadvantages:
Computationally Intensive: Training nonlinear models can be resource-intensive.
Risk of Overfitting: More prone to overfitting, especially if not properly regularized.
Decision Tree
• Decision Tree is a Supervised learning technique
• It can be used for both Classification and Regression problems, but it is mostly preferred for
solving Classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
• In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision
nodes are used to make decisions and have multiple branches, whereas leaf nodes are
the outputs of those decisions and do not contain any further branches.
• In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
Why use Decision Trees?
• Decision Trees usually mimic human thinking ability while making a decision.
• The logic of a decision tree can be easily understood because it has a tree-like structure.
Decision Tree Terminologies
Root Node: The decision tree starts from the root node. It represents the entire dataset, which
further gets divided into two or more homogeneous sets.
Leaf Node: A leaf node is a final output node; the tree cannot be split further once a leaf
node is reached.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub Tree: A subtree formed by splitting part of the main tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: A node that is divided into sub-nodes is called the parent node, and the
sub-nodes are called its child nodes.
How does the Decision Tree algorithm Work?
Step-1 Begin the tree with the root node, say S, which contains the complete dataset.
Step-2 Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3 Divide S into subsets that contain the possible values of the best attribute.
Step-4 Generate the decision tree node that contains the best attribute.
Step-5 Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be
classified further; such final nodes are called leaf nodes.
Attribute Selection Measures:
• While implementing a decision tree, the main issue is how to select the best
attribute for the root node and for the sub-nodes.
• To solve this problem, a technique called the Attribute Selection Measure (ASM) is used.
• Popular ASM techniques are: Information Gain and the Gini Index.
1. Information Gain:
• Information gain is the measurement of changes in entropy after the segmentation of a
dataset based on an attribute.
• It calculates how much information a feature provides us about a class.
• According to the value of information gain, we split the node and build the decision tree.
• A decision tree algorithm always tries to maximize information gain: the node/attribute with
the highest information gain is split first.
It can be calculated using the below formula; information gain is a measure of this change in entropy:
Information Gain(S, A) = Entropy(S) − ∑ (|Sv| / |S|) × Entropy(Sv), summed over v ∈ Values(A)
Where,
• S is the set of instances (the whole dataset),
• A is an attribute,
• Values(A) is the set of all possible values of A,
• v is an individual value that attribute A can take,
• Sv is the subset of S for which attribute A has value v.

Entropy: Entropy is a metric that measures the impurity of a given set of samples. It specifies
the randomness in the data, i.e., how mixed the class labels are. Entropy can be
calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
Where,
S = the set of samples, P(yes) = probability of yes, P(no) = probability of no
Example: entropy calculation on a sample dataset (see the worked example below).
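As a worked example, assume a toy dataset of 14 samples in which 9 are labelled "yes" and 5 are labelled "no" (these numbers are assumptions used only for illustration):
Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.410 + 0.530 ≈ 0.94
A value close to 1 indicates a highly mixed (impure) set, while a set containing only one class has entropy 0.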
2. Gini Index:
• Gini index is a measure of impurity or purity used while creating a decision tree in the
CART(Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index,
because a low Gini index indicates less impurity, leading to better decision tree splits.
• It only creates binary splits, and the CART algorithm uses the Gini index to create binary
splits.
• Gini index can be calculated using the below formula:
Gini Index = 1 − ∑j Pj²
Pj: the probability of a sample being classified into the j-th class.
∑j Pj²: the sum of the squared probabilities over all classes.
Range: The Gini index ranges from 0 up to 1 − 1/k for k classes (approaching 1 as the number of classes grows).
0: the node is pure (all samples belong to a single class).
Maximum value: samples are spread evenly across the classes (maximum impurity).
Higher Gini Index: indicates greater impurity in the node.
Lower Gini Index: indicates higher purity.
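As a short worked example (assumed numbers, used only for illustration), consider a node with 10 samples, 7 of class A and 3 of class B:
Gini = 1 − (0.7² + 0.3²) = 1 − (0.49 + 0.09) = 0.42
A pure node with all 10 samples in one class would give Gini = 1 − 1² = 0.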
Overfitting in Decision Trees
Definition: Overfitting happens when the model captures too much detail from the
training data, including noise and outliers.
Impact: Leads to poor generalization, meaning the model performs well on training data
but poorly on unseen data.
Cause: Often results from a highly complex tree with many nodes and branches, trying to
perfectly fit the training data.
Consequence: High risk of making incorrect predictions on new data due to the overly
specific patterns learned from the training data.
Controlling Overfitting Through Pruning
Pruning helps reduce complexity by removing branches that don’t contribute significantly to
model accuracy.
1. Pre-pruning (Early Stopping):
• Stops tree growth early based on certain criteria, preventing it from becoming overly
complex.
Common parameters:
Max depth: Limits the depth of the tree.
Minimum samples per leaf: Sets a minimum number of samples for each leaf node.
Minimum samples to split: Specifies the minimum samples required to split a node.
Maximum leaf nodes: Limits the total number of leaf nodes.
Pros: Faster, reduces complexity upfront.
Cons: Risk of underfitting if the tree stops growing too early.
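A minimal sketch of pre-pruning with scikit-learn, reusing the X_train/y_train split from the decision-tree sketch above; the specific limits below are arbitrary illustrative values, not recommendations.

from sklearn.tree import DecisionTreeClassifier

early_stopped_tree = DecisionTreeClassifier(
    max_depth=4,           # Max depth
    min_samples_leaf=5,    # Minimum samples per leaf
    min_samples_split=10,  # Minimum samples to split
    max_leaf_nodes=20,     # Maximum leaf nodes
    random_state=42,
)
early_stopped_tree.fit(X_train, y_train)   # X_train, y_train as defined in the earlier sketch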
2. Post-pruning (Cost Complexity Pruning):
• Prunes the tree after it has fully grown by removing branches to simplify it.
Techniques:
• Reduced Error Pruning: Removes a branch if doing so does not worsen accuracy on a validation
set.
• Cost Complexity Pruning: Adds a penalty for each node to balance accuracy and
complexity, tuned with ccp_alpha in libraries like scikit-learn.
Pros: Allows exploration of deeper patterns before simplifying.
Cons: Computationally intensive and requires careful parameter selection.
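A minimal sketch of cost complexity pruning with scikit-learn's ccp_alpha, again reusing the earlier X_train/y_train split; the choice of alpha below (the middle of the candidate list) is purely illustrative.

from sklearn.tree import DecisionTreeClassifier

# Grow the full tree first, then inspect the effective alphas along the pruning path.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit with a non-zero ccp_alpha: larger values prune more aggressively.
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # illustrative pick, not a recommendation
pruned_tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
pruned_tree.fit(X_train, y_train)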
Additional Tips for Controlling Overfitting
Cross-Validation: Apply cross-validation to fine-tune pruning parameters and other
hyperparameters, achieving a balance between bias and variance.
Ensemble Methods: Use methods like Random Forests or Gradient Boosted Trees, which
combine multiple trees to reduce overfitting through averaging.
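As a sketch of how cross-validation can tune these pruning parameters (the grid values below are assumptions chosen only for illustration):

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "max_depth": [2, 4, 6, None],
    "ccp_alpha": [0.0, 0.005, 0.01, 0.05],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)            # reuses the split from the earlier sketch
print("Best parameters:", search.best_params_)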
Advantages of the Decision Tree
• It is simple to understand, as it follows the same process that a human follows while
making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• It requires less data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
• The decision tree contains lots of layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.
• For more class labels, the computational complexity of the decision tree may increase.
Random Forest Algorithm
What is the Random Forest Algorithm?
Random Forest = Decision Tree + Column Sampling/Row Sampling

Random Forest is a popular machine learning algorithm that belongs to the family of
ensemble learning methods.
• Random Forest is a tree-based ensemble learning algorithm used in machine learning
for classification and regression.
• It constructs multiple Decision Trees during training, each using a random subset of the
dataset.
• Each tree considers a random subset of features at each split, increasing variability and
reducing overfitting.
• Prediction is made by aggregating the results of all trees:
• Voting for classification tasks.
• Averaging for regression tasks.
• This ensemble approach leads to stable and precise results.
• Random Forests can handle complex data effectively and are widely used in various
applications for their reliability in predictions.
What are Ensemble Learning models?
• Ensemble learning combines the predictions of several base models to produce a single,
stronger model.
• The collective strength of multiple models overcomes individual limitations, leading to
more robust predictions.
• Ensemble models are commonly used in classification and regression tasks.
• Popular ensemble models include:
• Bagging: Reduces variance by training multiple versions of a model.
• Random Forest: Builds multiple decision trees on random data subsets.
• Boosting: Sequentially improves models by focusing on errors (e.g., AdaBoost,
XGBoost, LightGBM).
• Voting: Combines predictions by taking a majority or average vote across models.
Bagging (Bootstrap Aggregating)
Goal: Reduce variance and avoid overfitting by combining predictions from multiple
models.
How it works:
• Creates multiple subsets of the training data by sampling with replacement.
• Trains a separate model on each subset (often using decision trees).
Aggregates predictions:
For regression: Takes the average of predictions.
For classification: Uses majority voting.
Example: Random Forest is a popular bagging method that combines many decision trees.
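A minimal sketch of bagging with scikit-learn's BaggingClassifier, reusing the earlier data split; note the base model parameter is named estimator in recent scikit-learn versions (base_estimator in older ones), and the number of trees is an arbitrary illustrative choice.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Each of the 50 trees is trained on a bootstrap sample drawn with replacement;
# the final class is decided by majority voting across the trees.
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=50, bootstrap=True, random_state=42)
bagging.fit(X_train, y_train)           # reuses the earlier X_train/y_train split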
Boosting
Goal: Improve model accuracy by focusing on difficult-to-predict cases.
How it works:
• Trains models sequentially, with each new model correcting the errors of the previous
ones.
• Adjusts weights to emphasize data points that were misclassified earlier.
• Final prediction combines all models, often with weighted voting.
Example: AdaBoost and XGBoost are popular boosting methods that iteratively refine
predictions.

Both bagging and boosting aim to create a stronger overall model by combining the
strengths of individual models.
How does the Random Forest algorithm work?

Step 1 Select K random data points from the training set.
Step 2 Build a decision tree associated with the selected data points (subset).
Step 3 Choose the number N of decision trees that you want to build.
Step 4 Repeat Steps 1 and 2 until N trees have been built.
Step 5 For a new data point, obtain the prediction of each decision tree and assign the new
data point to the category that wins the majority vote.
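A minimal sketch of these steps with scikit-learn's RandomForestClassifier, reusing the earlier data split; n_estimators plays the role of N, and max_features controls the random subset of features tried at each split. The values are illustrative assumptions.

from sklearn.ensemble import RandomForestClassifier

# 100 trees, each grown on a bootstrap sample and restricted to sqrt(n_features)
# candidate features at every split; prediction is by majority vote.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
print(forest.predict(X_test[:5]))       # majority-vote predictions for five new points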
What is a Weak Learner?
• A Weak Learner is a model that performs just slightly better than random guessing on a
given problem.
• In binary classification, a weak learner has an accuracy of just above 50% (i.e., better than
chance).
• For regression, it performs only marginally better than guessing the average value.
Characteristics of Weak Learners
• Simple models: Often, weak learners are simple models, such as small decision trees
(stumps with one or two splits) or simple linear models.
• High bias: Weak learners typically have limited complexity, so they’re biased and may
underfit the data if used alone.
• Low predictive power individually: On their own, weak learners may not capture all
patterns or relationships in the data.
Why Use Weak Learners?
• Combining Weak Learners in Ensembles: While a weak learner on its own is not powerful,
combining many weak learners can lead to a strong model.
• In Boosting, each weak learner corrects the errors of the previous ones, resulting in a
progressively better model.
• In Bagging (like Random Forest), the weak learners are trained independently, and their
predictions are averaged or voted upon, reducing variance.
Efficiency: Weak learners are computationally simpler and faster to train, making them suitable
for use in large ensemble methods where many learners are needed.
Controlled overfitting: Because weak learners are limited in complexity, they can help keep the
ensemble model from overfitting, especially in Boosting methods.
Examples of Weak Learners
• Decision Stumps: Decision trees with only one or two splits.
• Shallow Trees: Decision trees with low depth, typically limited to a few levels.
• Simple Linear Models: Models that only capture linear relationships without complex
transformations.
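As a sketch, a decision stump is simply a tree restricted to a single split (max_depth=1 in scikit-learn); the data split is reused from the earlier example and is an illustrative assumption.

from sklearn.tree import DecisionTreeClassifier

stump = DecisionTreeClassifier(max_depth=1)   # one split only: a classic weak learner
stump.fit(X_train, y_train)
print("Stump accuracy:", stump.score(X_test, y_test))  # often only modestly above chance on hard problems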
Boosting Technique:
1. AdaBoost (Adaptive Boosting)
How it works:
• AdaBoost builds models sequentially, where each new model focuses on the mistakes
made by the previous one.
• After each model, misclassified data points are given more weight, so the next model
will focus more on those points.
• The final prediction is made by combining the results from all models, with more weight
given to models that performed better.
Key idea: AdaBoost adjusts itself based on what it learns from the errors of earlier models.
Common use: It works well for both classification and regression tasks.
Strength: AdaBoost is simple and effective, but it can be sensitive to noisy data.
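A minimal sketch of AdaBoost with scikit-learn, using decision stumps as the weak learners and reusing the earlier data split; the base model parameter is named estimator in recent scikit-learn versions (base_estimator in older ones), and the values are illustrative.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 stumps trained sequentially; misclassified points receive higher weights so that
# later stumps concentrate on them, and better-performing stumps get more say in the final vote.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, learning_rate=1.0, random_state=42)
ada.fit(X_train, y_train)               # reuses the earlier X_train/y_train split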
2. Gradient Boosting
• Like AdaBoost, Gradient Boosting also builds models sequentially, but with a key difference: each
new model is trained to predict the residual errors (the difference between the actual and predicted
values) of the previous models.
• Each model tries to minimize a loss function (such as mean squared error) by making small
corrections to the previous models' predictions.
• The predictions of all models are combined, usually by weighted summing.
Key idea: It focuses on correcting errors by directly improving the predictions in small steps.
Common use: It is widely used for both classification and regression when accuracy is a priority.
Strength: It is powerful and flexible but can be prone to overfitting if not tuned properly.

AdaBoost: Focuses on improving errors by adjusting the weights of misclassified data points.
Gradient Boosting: Focuses on improving the model by reducing prediction errors through gradient
descent.
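A minimal sketch of gradient boosting with scikit-learn's GradientBoostingClassifier, reusing the earlier data split; the hyperparameter values are illustrative assumptions.

from sklearn.ensemble import GradientBoostingClassifier

# Each new shallow tree is fitted to the residual errors (the negative gradient of the loss)
# of the current ensemble; its contribution is scaled by learning_rate before being added.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=42)
gbm.fit(X_train, y_train)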
