Unit-5 Decision Trees & Ensemble Methods
• 1. **Data Collection**: Gather the dataset containing the features and the target variable you want to predict.
• 2. **Data Preprocessing**: This step involves handling missing values, encoding categorical variables, and
splitting the dataset into a training set and a testing set for evaluation.
• 3. **Tree Building**: The tree-building process typically follows a recursive, top-down approach. At each node
of the tree:
• - Select the best feature to split the data on, using a criterion such as entropy or Gini impurity (a small impurity sketch follows this list).
• - Split the data into subsets based on the chosen feature.
• - Recursively repeat the process on each subset until certain stopping criteria are met (e.g., maximum tree
depth, minimum number of samples per leaf).
• 4. **Stopping Criteria**: These criteria determine when to stop growing the tree. Common stopping criteria
include reaching a maximum tree depth, having a minimum number of samples in a node, or when further
splitting does not significantly improve model performance.
• 5. **Pruning (Optional)**: After the tree is built, pruning can be applied to reduce overfitting. Pruning involves
removing parts of the tree that do not provide significant improvements in prediction accuracy on a validation
dataset.
• 6. **Prediction**: Once the tree is constructed, it can be used to make predictions on new data. For
classification tasks, predictions are made by traversing the tree from the root to a leaf node and assigning the
majority class in that leaf node. For regression tasks, predictions are made by averaging the target values of
samples in the leaf node.
• 7. **Evaluation**: Finally, evaluate the performance of the decision tree model on the testing set
using appropriate evaluation metrics such as accuracy, precision, recall, F1-score (for classification),
or mean squared error (for regression).
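• The impurity criteria mentioned in step 3 are straightforward to compute directly. Below is a minimal sketch (illustrative only; the function names are not from these notes) of entropy and Gini impurity for an array of class labels:
```python
import numpy as np

def entropy(labels):
    """Entropy: -sum(p * log2(p)) over the class proportions in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

labels = np.array([0, 0, 1, 1, 1, 2])
print(entropy(labels))  # ~1.459 (class proportions 2/6, 3/6, 1/6)
print(gini(labels))     # ~0.611
```
• At each candidate split, the tree builder compares the impurity of the parent node with the weighted impurity of the child nodes and keeps the split with the largest reduction.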
• When implementing a decision tree, libraries like scikit-learn in Python provide convenient functions
for building and training decision tree models. Here's a simplified example using scikit-learn:
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris data, hold out a test set, fit a decision tree, and evaluate it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```
• C4.5:
• C4.5, developed by Ross Quinlan as a successor to the ID3 algorithm, introduces several improvements:
• 1. **Handling Continuous Attributes**: Unlike ID3, which only works with categorical attributes, C4.5 can handle both categorical and continuous attributes. It does this by sorting the values of a continuous attribute, evaluating candidate thresholds between adjacent sorted values, and splitting on the best threshold.
• 2. **Handling Missing Values**: C4.5 includes a mechanism to handle missing attribute values. Rather than simply imputing the most common value (as is often done with ID3), C4.5 discounts instances with missing values when computing the gain of a candidate split and then distributes those instances fractionally across the branches, in proportion to the observed frequency of each attribute value.
• 3. **Information Gain Ratio**: While ID3 uses information gain to select the best attribute for splitting, C4.5 uses the information gain ratio, which divides the information gain by the split information of the partition. This adjusts for the bias of plain information gain towards attributes with many distinct values and encourages smaller trees with more meaningful splits (a worked sketch follows this list).
• 4. **Pruning**: C4.5 includes a pruning step to reduce overfitting. After the decision tree is built, pruning involves
removing branches that do not significantly improve the tree's accuracy on a separate validation dataset. Pruning
helps to create simpler, more generalizable trees.
• 5. **Dealing with Overfitting**: C4.5 addresses overfitting by using pruning and by setting a minimum number of
instances required to split a node. This helps prevent the algorithm from creating overly complex trees that capture
noise in the training data.
• 6. **Tree Representation**: Like ID3, the resulting decision tree in C4.5 is represented in a hierarchical structure,
where each internal node represents a decision based on an attribute, and each leaf node represents a class label.
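• As a concrete illustration of points 1 and 3 above, the following toy sketch (an illustrative example, not taken from these notes) computes the information gain and gain ratio of a threshold split on a continuous attribute:
```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y, threshold):
    """Information gain and gain ratio for the binary split x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    w_left, w_right = len(left) / len(y), len(right) / len(y)
    gain = entropy(y) - (w_left * entropy(left) + w_right * entropy(right))
    # Split information penalizes splits that fragment the data into uneven pieces
    split_info = -(w_left * np.log2(w_left) + w_right * np.log2(w_right))
    return gain, gain / split_info

# C4.5 sorts the continuous attribute and evaluates thresholds between adjacent values
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(gain_ratio(x, y, threshold=3.5))  # perfect split: gain = 1.0, gain ratio = 1.0
```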
• C4.5 has been influential in the field of machine learning and data mining due to its effectiveness and flexibility. It has
inspired many variations and improvements, including the popular open-source implementation called C5.0.
• CART:
• CART, which stands for Classification and Regression Trees, is a versatile decision tree algorithm
introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. CART can be used
for both classification and regression tasks, making it highly flexible. Here's an overview of the CART
algorithm:
• Binary Splitting: Unlike ID3 and C4.5, which can produce multi-way splits, CART performs binary splits at each node of the tree. It considers all possible splits for each attribute and selects the one that yields the largest reduction in a criterion such as Gini impurity (for classification) or mean squared error (for regression).
• Handling Continuous and Categorical Attributes: CART can handle both continuous and categorical attributes. For continuous attributes, it finds the best split point according to the chosen criterion. For categorical attributes, it searches over binary partitions of the category values, grouping the categories into two subsets.
• Pruning: CART includes a pruning step to prevent overfitting. After the full tree is grown, cost-complexity pruning iteratively removes the weakest subtrees while monitoring the tree's performance on a separate validation dataset (or via cross-validation). Pruning helps to create simpler, more interpretable trees that generalize well to unseen data.
• Regression Trees: In regression tasks, CART constructs regression trees to predict continuous target variables. At each node, it chooses the split that minimizes the mean squared error between the predicted values (the mean target value in each child node) and the actual values of the target variable (a brief sketch follows this overview).
• Classification Trees: In classification tasks, CART constructs classification trees to predict class labels.
At each node, it minimizes the Gini impurity, which measures the degree of impurity in the node. CART
aims to create pure nodes with predominantly one class label.
• Tree Representation: The resulting decision tree in CART is represented in a hierarchical structure,
similar to other decision tree algorithms. Each internal node represents a decision based on an attribute,
and each leaf node represents a predicted class label (for classification) or a predicted value (for
regression).
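• A minimal sketch of a CART-style regression tree is shown below (illustrative only; the dataset and the ccp_alpha value are assumptions). scikit-learn's decision trees are CART-based and expose minimal cost-complexity pruning through the ccp_alpha parameter:
```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Regression tree: each split minimizes the mean squared error of the child nodes
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# ccp_alpha > 0 turns on minimal cost-complexity pruning; larger values prune more
tree = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0)
tree.fit(X_train, y_train)
print("Test MSE:", mean_squared_error(y_test, tree.predict(X_test)))
```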
Bagging & Boosting and their impact on bias and variance:
• Bagging:
– Process: Bagging involves training multiple base learners independently on random subsets of the
training data, sampled with replacement (bootstrap sampling). Each base learner is trained on a
different subset of the data.
– Combining Predictions: In bagging, predictions from the base learners are typically averaged (for regression tasks) or aggregated by majority voting (for classification tasks) to make the final prediction (a minimal sketch follows below).
– Impact on Bias and Variance:
• Bias: Bagging leaves bias largely unchanged; the averaged ensemble has roughly the same bias as an individual base learner, because averaging predictions does not make any single learner more expressive.
• Variance: Bagging's main benefit is variance reduction. Each base learner is trained on a different bootstrap sample, which introduces diversity among the models, and averaging these diverse models smooths out their individual errors, making the ensemble more robust to variations in the training data and less prone to overfitting.
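• The sketch below (illustrative; it assumes a recent scikit-learn release, where the base learner argument of BaggingClassifier is named estimator) follows the process described above: bootstrap samples of the training data, one decision tree per sample, and aggregation of the trees' predictions:
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 trees, each fit on a bootstrap sample (sampling with replacement);
# class predictions are aggregated across trees (voting / averaged probabilities)
bagger = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50,
                           bootstrap=True, random_state=0)
bagger.fit(X_train, y_train)
print("Bagging accuracy:", accuracy_score(y_test, bagger.predict(X_test)))
```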
• Boosting:
– Process: Boosting involves training a sequence of base learners iteratively, where each subsequent
learner focuses more on the instances that were misclassified by the previous ones. Examples are
weighted based on their classification performance during training.
– Combining Predictions: Boosting combines the predictions of all base learners in a weighted vote or weighted sum, giving more weight to learners that performed better during training (a minimal sketch follows below).
– Impact on Bias and Variance:
• Bias: Boosting tends to reduce bias by iteratively improving the model's ability to fit the
training data. It can learn complex patterns in the data, potentially leading to lower bias.
• Variance: Boosting can increase variance as it adapts the model to the training data, potentially
leading to overfitting. However, techniques like early stopping and regularization can be used to
mitigate this issue.
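• For comparison, here is a minimal boosting sketch using AdaBoost (illustrative; again assuming a recent scikit-learn release where the base learner argument is estimator), with shallow decision stumps that are successively reweighted toward previously misclassified examples:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit 100 depth-1 trees (stumps) sequentially; each round reweights the training
# examples so later stumps focus on instances the earlier ones misclassified, and
# the final prediction is a weighted vote of all stumps.
boost = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=100, learning_rate=0.5, random_state=0)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", accuracy_score(y_test, boost.predict(X_test)))
```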