
Unit-5 DECISION TREES & ENSEMBLE METHODS

What Is a Decision Tree:


A decision tree is a popular supervised machine learning algorithm used for both classification and
regression tasks. It is a flowchart-like structure where each internal node represents a "decision" based on
one of the input features, each branch represents the outcome of that decision, and each leaf node
represents the final decision or the target value.
In a classification decision tree, the goal is to predict the class label of the input data, while in a
regression decision tree, the goal is to predict a continuous value. Decision trees are constructed by
recursively partitioning the input space into smaller regions, making decisions at each step based on the
features that best separate the data.
The decision-making process in a decision tree involves selecting the most informative feature at each
step to split the data into subsets that are as pure as possible with respect to the target variable. This
process continues until a stopping criterion is met, such as reaching a maximum tree depth, achieving a
minimum number of samples in a leaf node, or when further splitting does not significantly improve the
model's performance.
Decision trees are easy to interpret and visualize, making them particularly useful for understanding the
underlying patterns in the data. However, they can be prone to overfitting, especially when the trees are
deep and complex.
• Entropy:
• Entropy, in the context of decision trees and machine learning, is a measure of impurity or disorder in a set
of data. In decision tree algorithms, entropy is commonly used to determine the best feature to split the
data on at each node of the tree.
• The entropy of a set is calculated using the formula:
• H(S) = −∑_{i=1}^{c} p_i log₂(p_i)
• Where:
• H(S) is the entropy of the set S.
• p_i is the proportion of examples in class i in set S.
• c is the number of classes.
• The entropy is highest when the classes in the dataset are evenly distributed, meaning there is maximum
uncertainty about which class a given example belongs to. Conversely, entropy is lowest (zero) when all
examples in the set belong to the same class, indicating perfect purity or homogeneity.
• When building a decision tree, the algorithm selects the feature that minimizes entropy or maximizes
information gain, which is the reduction in entropy that results from splitting the data on that feature. The
goal is to find the feature that separates the data into subsets that are as pure as possible with respect to the
target variable, leading to more accurate predictions.
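
To make these formulas concrete, the short sketch below computes entropy and the information gain of a single categorical feature with NumPy. It is an illustrative sketch only; the toy arrays `y` and `outlook` and the function names are made up for this example.

```python
import numpy as np

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions in `labels`
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    # Reduction in entropy obtained by splitting on one categorical feature
    weighted = 0.0
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        weighted += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - weighted

# Toy data: three "yes" and three "no" labels give maximum entropy H(S) = 1
y = np.array(["yes", "yes", "no", "no", "yes", "no"])
outlook = np.array(["sunny", "rain", "sunny", "rain", "rain", "sunny"])
print("H(S) =", entropy(y))
print("IG(outlook) =", information_gain(y, outlook))
```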
• Creating a Decision Tree:
• Creating a decision tree involves several steps. Here's a simplified overview:

• 1. **Data Collection**: Gather the dataset containing the features and the target variable you want to predict.

• 2. **Data Preprocessing**: This step involves handling missing values, encoding categorical variables, and
splitting the dataset into a training set and a testing set for evaluation.

• 3. **Tree Building**: The tree-building process typically follows a recursive, top-down approach. At each node
of the tree:
• - Select the best feature to split the data based on a criterion such as entropy or Gini impurity.
• - Split the data into subsets based on the chosen feature.
• - Recursively repeat the process on each subset until certain stopping criteria are met (e.g., maximum tree
depth, minimum number of samples per leaf).

• 4. **Stopping Criteria**: These criteria determine when to stop growing the tree. Common stopping criteria
include reaching a maximum tree depth, having a minimum number of samples in a node, or when further
splitting does not significantly improve model performance.

• 5. **Pruning (Optional)**: After the tree is built, pruning can be applied to reduce overfitting. Pruning involves
removing parts of the tree that do not provide significant improvements in prediction accuracy on a validation
dataset; a minimal pruning sketch follows the classifier example below.

• 6. **Prediction**: Once the tree is constructed, it can be used to make predictions on new data. For
classification tasks, predictions are made by traversing the tree from the root to a leaf node and assigning the
majority class in that leaf node. For regression tasks, predictions are made by averaging the target values of
samples in the leaf node.
• 7. **Evaluation**: Finally, evaluate the performance of the decision tree model on the testing set
using appropriate evaluation metrics such as accuracy, precision, recall, F1-score (for classification),
or mean squared error (for regression).

• When implementing a decision tree, libraries like scikit-learn in Python provide convenient functions
for building and training decision tree models. Here's a simplified example using scikit-learn:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Train the classifier on the training data
clf.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = clf.predict(X_test)

# Evaluate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
This example demonstrates how to create and train a decision tree classifier using scikit-learn and evaluate its performance
on a test set.
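
The optional pruning step (step 5 above) can also be sketched with scikit-learn's cost-complexity pruning. The snippet below continues from the `X_train`, `X_test`, `y_train`, `y_test` variables defined in the example above; carving a validation split out of the training data and looping over the candidate alphas are illustrative choices, not the only way to prune.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out a validation split from the training data; the test set stays untouched
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# cost_complexity_pruning_path returns the alphas at which subtrees get pruned away
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_tr, y_tr)

best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_tr, y_tr)
    acc = accuracy_score(y_val, pruned.predict(X_val))
    if acc >= best_acc:
        best_alpha, best_acc = alpha, acc

# Refit on the full training set with the alpha selected on the validation split
final = DecisionTreeClassifier(random_state=42, ccp_alpha=best_alpha).fit(X_train, y_train)
print("Chosen ccp_alpha:", best_alpha)
print("Test accuracy:", accuracy_score(y_test, final.predict(X_test)))
```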
ID3 Algorithm:
• ID3 (Iterative Dichotomiser 3) is one of the earliest and simplest algorithms used for constructing decision
trees. Here's an overview of the ID3 algorithm:
• Selecting the Best Attribute: The algorithm begins by selecting the best attribute to split the dataset. It uses a
criterion such as information gain or entropy to determine which attribute provides the most significant
reduction in uncertainty.
• Splitting the Dataset: Once the best attribute is chosen, the dataset is split into subsets based on the possible
values of that attribute.
• Building the Tree Recursively: This process is repeated recursively on each subset of data until one of the
following conditions is met:
– All instances in a subset belong to the same class (pure node).
– There are no more attributes left to split on.
– A stopping criterion is met (e.g., maximum tree depth, minimum number of samples per leaf).
• Handling Missing Values: Basic ID3 does not handle missing attribute values directly; a simple workaround is
to replace a missing value with the most common value of that attribute in the dataset.
• Handling Categorical Data: ID3 is designed for categorical (discrete) attributes. If the dataset contains
continuous attributes, they need to be discretized before applying the algorithm.
• Pruning (Optional): ID3 does not perform pruning, which can lead to overfitting on the training data.
However, post-pruning techniques can be applied to reduce overfitting.
• Tree Representation: The resulting decision tree is typically represented in a hierarchical structure, where
each internal node represents a decision based on an attribute, and each leaf node represents a class label.
• ID3 has some limitations, such as its inability to handle continuous attributes directly and its bias towards
attributes with many distinct values, which can produce overly specific splits. These limitations led to the
development of more advanced algorithms like C4.5 and CART (Classification and Regression Trees), which
address some of the shortcomings of ID3.
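
As a rough illustration of how ID3 grows a tree over categorical attributes, here is a minimal recursive sketch. It is not Quinlan's original implementation: the dictionary-based tree representation, the tiny dataset, and the function names are assumptions made for this example, and it omits the stopping criteria and missing-value handling discussed above.

```python
from collections import Counter
import math

def entropy(labels):
    # H(S) over the class proportions in `labels`
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    """rows: list of dicts {attribute: categorical value}; labels: list of class labels."""
    if len(set(labels)) == 1:                      # pure node -> leaf
        return labels[0]
    if not attributes:                             # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]

    def info_gain(attr):
        remainder = 0.0
        for v in {r[attr] for r in rows}:
            idx = [i for i, r in enumerate(rows) if r[attr] == v]
            remainder += len(idx) / len(labels) * entropy([labels[i] for i in idx])
        return entropy(labels) - remainder

    best = max(attributes, key=info_gain)          # attribute with highest information gain
    tree = {best: {}}
    for v in {r[best] for r in rows}:              # one branch per attribute value
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        tree[best][v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                            [a for a in attributes if a != best])
    return tree

# Tiny hypothetical dataset, just to show the recursion
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain",  "windy": "no"}, {"outlook": "rain",  "windy": "yes"}]
labels = ["no", "no", "yes", "no"]
print(id3(rows, labels, ["outlook", "windy"]))
```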
• C4.5:
• C4.5 is an extension of the ID3 (Iterative Dichotomiser 3) algorithm, developed by Ross Quinlan. It overcomes some of
the limitations of ID3 and introduces several improvements, making it one of the most widely used decision tree
algorithms. Here's an overview of the C4.5 algorithm:

• 1. **Handling Continuous Attributes**: Unlike ID3, which only works with categorical attributes, C4.5 can handle
both categorical and continuous attributes. It accomplishes this by first sorting the values of continuous attributes and
then selecting thresholds for splitting.

• 2. **Handling Missing Values**: C4.5 includes a mechanism to handle missing attribute values. Rather than simply
assigning the most common value, it discounts the information gain of an attribute by the fraction of instances whose
value is missing, and distributes instances with missing values fractionally across the branches of a split.

• 3. **Information Gain Ratio**: While ID3 uses information gain to select the best attribute for splitting, C4.5 uses
information gain ratio. Information gain ratio adjusts for bias towards attributes with a large number of values. It
penalizes attributes with many distinct values and encourages smaller trees with more meaningful splits.

• 4. **Pruning**: C4.5 includes a pruning step to reduce overfitting. After the decision tree is built, pruning involves
removing branches that do not significantly improve the tree's accuracy on a separate validation dataset. Pruning
helps to create simpler, more generalizable trees.

• 5. **Dealing with Overfitting**: C4.5 addresses overfitting by using pruning and by setting a minimum number of
instances required to split a node. This helps prevent the algorithm from creating overly complex trees that capture
noise in the training data.

• 6. **Tree Representation**: Like ID3, the resulting decision tree in C4.5 is represented in a hierarchical structure,
where each internal node represents a decision based on an attribute, and each leaf node represents a class label.

• C4.5 has been influential in the field of machine learning and data mining due to its effectiveness and flexibility. It has
inspired many variations and improvements, including the popular open-source implementation called C5.0.
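
The gain ratio criterion is easy to illustrate directly. The sketch below (an illustrative example, not Quinlan's C4.5 code) shows how the split-information term penalises an attribute with many distinct values relative to a meaningful two-valued attribute, even though both give the same raw information gain on this toy data.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(labels, feature_values):
    # Gain ratio = information gain / split information (C4.5's selection criterion)
    values, counts = np.unique(feature_values, return_counts=True)
    weights = counts / counts.sum()
    gain = entropy(labels) - sum(w * entropy(labels[feature_values == v])
                                 for v, w in zip(values, weights))
    split_info = -np.sum(weights * np.log2(weights))   # grows with the number of values
    return gain / split_info if split_info > 0 else 0.0

y = np.array(["yes", "yes", "no", "no"])
meaningful  = np.array(["a", "a", "b", "b"])    # two values, perfectly separates the classes
many_valued = np.array(["1", "2", "3", "4"])    # one unique value per row, also "separates" them
print("gain ratio (meaningful): ", gain_ratio(y, meaningful))    # 1.0
print("gain ratio (many-valued):", gain_ratio(y, many_valued))   # 0.5
```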
• CART:
• CART, which stands for Classification and Regression Trees, is a versatile decision tree algorithm
introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. CART can be used
for both classification and regression tasks, making it highly flexible. Here's an overview of the CART
algorithm:
• Binary Splitting: Unlike ID3 and C4.5, which can handle multi-way splits, CART performs binary splits
at each node of the tree. It considers all possible splits for each attribute and selects the one that
maximizes a criterion such as Gini impurity (for classification) or mean squared error (for regression).
• Handling Continuous and Categorical Attributes: CART can handle both continuous and categorical
attributes. For continuous attributes, it finds the best split threshold based on the chosen criterion. For
categorical attributes, it considers binary splits that partition the attribute's categories into two groups.
• Pruning: CART includes a pruning step to prevent overfitting. After the decision tree is built, pruning
involves iteratively removing nodes from the tree while monitoring the tree's performance on a separate
validation dataset. Pruning helps to create simpler, more interpretable trees that generalize well to unseen
data.
• Regression Trees: In regression tasks, CART constructs regression trees to predict continuous target
variables. At each node, it minimizes the mean squared error between the predicted values and the actual
values of the target variable.
• Classification Trees: In classification tasks, CART constructs classification trees to predict class labels.
At each node, it minimizes the Gini impurity, which measures the degree of impurity in the node. CART
aims to create pure nodes with predominantly one class label.
• Tree Representation: The resulting decision tree in CART is represented in a hierarchical structure,
similar to other decision tree algorithms. Each internal node represents a decision based on an attribute,
and each leaf node represents a predicted class label (for classification) or a predicted value (for
regression).
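
scikit-learn's tree estimators are CART-based, so the classifier example earlier in this unit already illustrates the classification side. The sketch below shows the regression side with `DecisionTreeRegressor` on synthetic data; the dataset parameters and `max_depth=4` are arbitrary illustrative choices.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data, purely for illustration
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# CART-style regression tree: binary splits chosen to minimise squared error;
# each leaf predicts the mean target value of the training samples that reach it
reg = DecisionTreeRegressor(max_depth=4, random_state=42)
reg.fit(X_train, y_train)

print("Test MSE:", mean_squared_error(y_test, reg.predict(X_test)))
```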
Bagging & Boosting and their impact on bias and variance:
• Bagging:
– Process: Bagging involves training multiple base learners independently on random subsets of the
training data, sampled with replacement (bootstrap sampling). Each base learner is trained on a
different subset of the data.
– Combining Predictions: In bagging, predictions from the base learners are typically averaged (for
regression tasks) or aggregated using voting (for classification tasks) to make the final prediction.
– Impact on Bias and Variance:
• Bias: Bagging has little effect on bias; because each base learner is trained on a bootstrap sample
drawn from the same data, the ensemble's bias is roughly that of an individual base learner.
• Variance: Bagging reduces variance by reducing the risk of overfitting. Each base learner is
trained on a different subset of the data, which introduces diversity among the models.
Combining these diverse models helps to reduce variance and make the ensemble model more
robust to variations in the training data.
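
A minimal bagging sketch with scikit-learn's `BaggingClassifier` is shown below; by default its base estimator is a decision tree, so this is essentially an ensemble of bagged trees. The iris data and the choice of 50 estimators are illustrative only.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 50 trees, each fit on a bootstrap sample of the training data (bootstrap=True);
# class predictions are aggregated across the ensemble
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=42)
bag.fit(X_train, y_train)

print("Bagging accuracy:", accuracy_score(y_test, bag.predict(X_test)))
```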
• Boosting:
– Process: Boosting trains a sequence of base learners iteratively, where each subsequent
learner focuses more on the instances that were misclassified by the previous ones. After each
round, the training examples are re-weighted so that misclassified examples carry more weight.
– Combining Predictions: Boosting combines predictions from all base learners, giving more weight
to those with higher performance on the training data.
– Impact on Bias and Variance:
• Bias: Boosting tends to reduce bias by iteratively improving the model's ability to fit the
training data. It can learn complex patterns in the data, potentially leading to lower bias.
• Variance: Boosting can increase variance as it adapts the model to the training data, potentially
leading to overfitting. However, techniques like early stopping and regularization can be used to
mitigate this issue.
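
The same iris setup can illustrate boosting with scikit-learn's `AdaBoostClassifier`, which by default boosts shallow decision trees (stumps); each round re-weights the training examples so that previously misclassified ones matter more. The parameter values below are illustrative, not tuned.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each new weak learner focuses on the examples misclassified in previous rounds;
# the final prediction is a weighted vote of all weak learners
boost = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
boost.fit(X_train, y_train)

print("AdaBoost accuracy:", accuracy_score(y_test, boost.predict(X_test)))
```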
