
Decision Tree and Random Forest

Presenters: Atul Jaguri, Ayush Kukreja, Daivik Mohan
Table of Contents:
1. Mathematical formulation of the algorithm.
1.1 Its significance in today's technology-driven world.
1.2 Real-time applications.
2. Challenges and ethical considerations in data collection and usage.
3. Real-world applications and future trends.
4. Evaluation metrics.
5. Model deployment.
6. Problem solving using the given algorithm.
7. References.
Introduction
Decision Tree:
• Input: Training dataset D = {(X1, y1), (X2, y2), …, (XN, yN)}
• Algorithm: Recursive partitioning based on features, using a splitting criterion (e.g., Gini impurity or entropy) at each node.
• Prediction: Traverse the tree to a leaf node to obtain the class prediction.

Random Forest (Ensemble of Decision Trees):
• Train multiple decision trees on random subsets of the data and features.
• Aggregate their predictions (classification: majority vote; regression: average), as in the sketch below.
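A minimal sketch of this formulation with scikit-learn (the library cited in the references); the iris dataset and all parameter values here are illustrative assumptions, not from the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Illustrative dataset; any (X, y) classification data works the same way
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Single decision tree: recursive partitioning with Gini impurity (the default)
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)

# Random forest: many trees on bootstrap samples and random feature subsets,
# with predictions aggregated by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Tree accuracy:  ", tree.score(X_test, y_test))
print("Forest accuracy:", forest.score(X_test, y_test))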
Definition
A decision tree is a supervised machine learning algorithm used for both
classification and regression tasks. It works by recursively partitioning the
dataset into subsets based on the most significant attribute at each node of the
tree. The goal is to create a tree that makes accurate predictions on unseen data.

Whom to loan?
• Not a student
• 45 years old
• Medium income
• Fair credit record
→ Yes

• Student
• 27 years old
• Low income
• Excellent credit record
→ No

(A hand-coded sketch of a tree consistent with these two cases follows below.)
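The prediction for each applicant is the leaf reached by walking the tree on that applicant's attributes. A hypothetical traversal, written as plain Python (the split order below is an assumption; the slide only shows the two cases above):

def loan_decision(student, age, income, credit):
    # Hypothetical tree consistent with the slide's two examples only;
    # age and income would be further split points in a fuller tree.
    if student:
        return "No"
    # Non-students: approve when the credit record is at least fair
    return "Yes" if credit in ("fair", "good", "excellent") else "No"

print(loan_decision(student=False, age=45, income="medium", credit="fair"))   # Yes
print(loan_decision(student=True, age=27, income="low", credit="excellent"))  # No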
Decision Tree Learning
Entropy
• Entropy measures the degree of randomness in data.

• For a set of samples S with C classes:

      H(S) = − Σ_{i=1..C} p_i log2(p_i)

  where p_i is the proportion of elements of class i.

• Lower entropy implies greater predictability!
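A small sketch of this computation in Python (numpy used for convenience):

import numpy as np
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()              # p_i: proportion of each class
    return float(-(p * np.log2(p)).sum())

print(entropy(["yes"] * 9 + ["no"] * 5))   # ~0.940 bits: a mixed set
print(entropy(["yes"] * 14))               # 0.0: a pure set is fully predictable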


Information Gain
• The information gain of an attribute a is the expected reduction
in entropy due to splitting on values of a:

where is the subset of for which
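Continuing the sketch above (this reuses the entropy helper just defined; the dict-based sample format is an assumption for illustration):

def information_gain(samples, labels, attribute):
    # IG(S, a): entropy of S minus the weighted entropy of the subsets S_v
    total, n = entropy(labels), len(labels)
    remainder = 0.0
    for v in {s[attribute] for s in samples}:
        subset = [y for s, y in zip(samples, labels) if s[attribute] == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

samples = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(samples, labels, "outlook"))   # 1.0: this split is perfect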


Gini Impurity
• Gini impurity measures how often a randomly chosen example would be incorrectly labeled if it were labeled at random according to the label distribution.

• For a set of samples S with C classes:

      G(S) = Σ_{i=1..C} p_i (1 − p_i) = 1 − Σ_{i=1..C} p_i²

  where p_i is the proportion of elements of class i.

• Can be used as an alternative to entropy for selecting attributes!
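The same sketch for Gini impurity (reusing the numpy and Counter imports from the entropy example):

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

print(gini(["yes"] * 9 + ["no"] * 5))   # ~0.459: a mixed set
print(gini(["yes"] * 14))               # 0.0: a pure set has no impurity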


Random Forests
• Random Forests: instead of building a single decision tree and using it to make predictions, build many slightly different trees and combine their predictions.
• We have a single data set, so how do we obtain slightly different trees?
1. Bagging (Bootstrap Aggregating): take random subsets of data points from the training set to create N smaller data sets, and fit a decision tree on each subset.
2. Random Subspace Method (also known as Feature Bagging): fit N different decision trees by constraining each one to operate on a random subset of features.
(A code sketch of both ideas follows the diagrams below.)
Bagging at training time: [diagram: the training set is resampled with replacement into N subsets, and one tree is fit per subset]

Bagging at inference time: [diagram: a test sample is run through every tree and the votes are combined, e.g. a 75%-confidence majority vote]

Random Forests: [diagram: Tree 1, Tree 2, …, Tree N aggregated into a single random forest]
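A minimal sketch of both ideas built from scikit-learn trees (dataset, tree count, and max_features are illustrative assumptions):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_trees = 25

trees = []
for _ in range(n_trees):
    # Bagging: a bootstrap sample (data points drawn with replacement)
    idx = rng.integers(0, len(X), size=len(X))
    # Random subspace: each tree considers a random subset of features at every split
    t = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
    trees.append(t.fit(X[idx], y[idx]))

# Inference: majority vote across all the trees
votes = np.stack([t.predict(X) for t in trees])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("Vote accuracy on the training set:", (majority == y).mean())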
Significance in Today's Technology-Driven World:
• Versatility: Decision trees can handle both classification and regression tasks.
• Interpretability: Easy to understand and interpret.
• Ensemble Power: Random Forests improve accuracy and generalization.
• Applications: Widely used in finance, healthcare, marketing, and more.

Real-Time Applications:
• Fraud Detection: Identify unusual patterns in real-time transactions.
• Health Monitoring: Predict patient conditions based on real-time data.
• Online Retail: Personalized recommendations for users.
Challenges:
1. Overfitting: Decision trees, especially deep ones, are prone to overfitting, capturing noise in the training data rather than the underlying patterns.
   • Impact: Reduced generalization performance on new, unseen data.
2. Sensitivity to Small Variations: Small changes in the training data can lead to the generation of significantly different decision trees.
   • Impact: Lack of stability and consistency in the model's predictions.
3. Bias in Data: Decision trees can perpetuate and amplify biases present in the training data.
   • Impact: Unfair or discriminatory predictions, reinforcing societal biases.
4. Lack of Robustness to Outliers: Decision trees can be sensitive to outliers, leading to skewed decision boundaries.
   • Impact: Outliers may disproportionately influence model predictions.

Ethical Considerations:
1. Transparency: Ensuring transparency in how decision trees make predictions.
   • Action: Providing explanations for model decisions, especially in critical applications like healthcare or finance.
2. Fairness: Addressing and mitigating biases in the training data to promote fair and unbiased predictions.
   • Action: Regularly auditing and updating training data to correct biases.
3. Privacy Preservation: Safeguarding individuals' privacy in the training and deployment of decision trees.
   • Action: Implementing data anonymization and encryption protocols to protect sensitive information.
4. Accountability: Establishing accountability for the outcomes of decision tree models.
   • Action: Clearly defining responsibility for model development, monitoring, and addressing any negative consequences.
Real-World Applications and Future Trends:
• Healthcare: Predicting diseases and personalized treatment.
• Finance: Credit scoring, fraud detection.
• Marketing: Customer segmentation, recommendation systems.

Applications:
• Cybersecurity: Decision trees and random forests are used for anomaly detection and identifying patterns indicative of cyber threats.
• Environmental Monitoring: Decision trees can be employed for analyzing environmental data, predicting climate patterns, and assessing the impact of human activities on ecosystems.

Future Trends:
• Explainable AI: Enhancing interpretability.
• Automated Machine Learning (AutoML): Streamlining model development.
• Federated Learning: Training decision tree models across decentralized devices or servers without exchanging raw data.
Problem Solving Using the Given Algorithm
Example Problem: Effect of weather on Play?

• Data Collection: weather_forecast data.
• Model Development: Train a Decision Tree or Random Forest.
• Evaluation: Use appropriate metrics.
• Deployment: Deploy the model for real-time predictions.

Model Creation: [screenshot of the model-training code; a worked sketch follows below]
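A worked sketch of the weather-and-play example with scikit-learn (the toy rows below are classic play-tennis-style data, assumed here because the slides do not list the actual weather_forecast rows):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed toy data in the spirit of the slide's weather_forecast dataset
data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
                "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"],
    "windy":   [False, True, False, False, False, True, True,
                False, False, False, True, True, False, True],
    "play":    ["no", "no", "yes", "yes", "yes", "no", "yes",
                "no", "yes", "yes", "yes", "yes", "yes", "no"],
})

X = pd.get_dummies(data[["outlook", "windy"]])   # one-hot encode the categoricals
y = data["play"]

model = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(model, feature_names=list(X.columns)))  # readable tree dump

# Predict for a new day: sunny and not windy
new_day = pd.get_dummies(pd.DataFrame([{"outlook": "sunny", "windy": False}]))
print(model.predict(new_day.reindex(columns=X.columns, fill_value=0)))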
Evaluation Metrics & Model Deployment
[App screenshots: Decision Tree and Random Forest demo apps]
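A minimal sketch of common evaluation metrics with scikit-learn, plus a deployment note (y_test and y_pred are assumed to come from a held-out split and model.predict):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def report(y_test, y_pred):
    # Weighted averages handle multi-class labels; binary tasks work unchanged
    print("Accuracy :", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred, average="weighted"))
    print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
    print("F1 score :", f1_score(y_test, y_pred, average="weighted"))
    print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Deployment sketch (hypothetical app.py, following the Streamlit reference):
# import streamlit as st, joblib
# model = joblib.load("forest.joblib")      # a previously saved model
# if st.button("Predict"):
#     st.write(model.predict(features))     # features gathered from a form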
7. References:
• Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
• Scikit-learn Decision Trees documentation
• Scikit-learn Random Forest documentation
• Streamlit documentation (for model deployment)
• Decision Tree implementation
• Random Forest implementation
• Scikit-learn
THANK YOU
