ML Exp6
Objective:
Program to implement a decision tree in machine learning.
Apparatus required:
PC with Jupyter Notebook or Google Colab.
Theory:
A decision tree is a supervised machine learning algorithm used for both classification and regression
tasks; it makes predictions based on a set of training data. It resembles a flowchart: each internal node
represents a test on an attribute, each branch represents an outcome of that test, and each leaf node
represents the final decision or predicted output. The goal of a decision tree is to build a model that
predicts the target variable from the input features by recursively splitting the data into subsets.
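As a concrete illustration, the following is a minimal sketch of the experiment in Python using scikit-learn (the Iris dataset and the 70/30 split are assumptions made here for illustration; any labelled dataset works the same way):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load features X and class labels y
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to measure generalization to unseen samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit the tree on the training subset and evaluate on the test subset
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))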
Key Components:
Root Node: The starting point of the tree, typically containing the entire dataset.
Internal Nodes (Decision Nodes): These nodes ask questions about the features (attributes) of the data.
The answer to the question determines which branch to follow.
Branches: Each branch represents a possible answer to the question posed at the internal node.
Leaf Nodes (Terminal Nodes): These nodes represent the final prediction or outcome. They contain
class labels in classification tasks or continuous values in regression tasks.
Construction Process:
1. Start with the Root Node: The entire training dataset resides here.
2. Ask Questions and Split: The algorithm iteratively selects the best feature and its corresponding value
to split the data into subsets, aiming to maximize the purity (homogeneity) of the data points within each
branch. Common splitting criteria include Gini impurity for classification and variance reduction for
regression (a small Gini sketch follows this list).
3. Repeat Until Stopping Condition: The splitting process continues at each internal node until a stopping
criterion is met. This could be reaching a certain depth in the tree, having a minimum number of data
points in a leaf node, or achieving a desired level of prediction accuracy.
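To make the splitting criterion concrete, here is a small hand-rolled sketch of Gini impurity (one minus the sum of squared class proportions) and of how a candidate split is scored by the size-weighted impurity of its children; lower is better:

import numpy as np

def gini_impurity(labels):
    # Gini impurity: 1 - sum over classes of p_k^2, for class proportions p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(np.array([0, 0, 0, 0])))  # pure node -> 0.0
print(gini_impurity(np.array([0, 0, 1, 1])))  # 50/50 node -> 0.5

# Score a candidate split by the size-weighted impurity of its children
left, right = np.array([0, 0, 0, 1]), np.array([1, 1])
n = len(left) + len(right)
weighted = (len(left) / n) * gini_impurity(left) \
         + (len(right) / n) * gini_impurity(right)
print("Weighted impurity after split:", weighted)  # 0.25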
Interpretability: One of the biggest strengths of decision trees is their ease of interpretation. The tree
structure visually depicts the decision-making process, making it easier to understand how the model
arrives at its predictions.
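For instance, scikit-learn can print the fitted tree as indented if/else rules (a short sketch, again assuming the Iris dataset; max_depth=3 is chosen only to keep the printout small):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)

# Each indented line is one test on an attribute; leaves show the class
print(export_text(clf, feature_names=list(iris.feature_names)))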
Handling of Different Data Types: Decision trees can work effectively with both categorical and
continuous data, making them adaptable to various machine learning problems.
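One practical caveat: scikit-learn's tree implementation expects numeric input, so categorical columns are usually encoded first. A brief sketch with a made-up toy table (the column names and values are purely illustrative):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "outlook": ["sunny", "rain", "overcast", "sunny"],  # categorical
    "temperature": [30.0, 18.5, 22.0, 27.5],            # continuous
    "play": [0, 1, 1, 0],                               # target
})

# One-hot encode the categorical column; the continuous one passes through
X = pd.get_dummies(df[["outlook", "temperature"]], columns=["outlook"])
y = df["play"]

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict(X))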
Efficiency: Training decision trees is often computationally efficient, especially for smaller datasets.
They can also be relatively fast for making predictions.
Overfitting: Decision trees are prone to overfitting, especially when grown too deep. This means they
may perform well on the training data but generalize poorly to unseen data. Techniques like pruning
(removing unnecessary branches) can help mitigate this.
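A sketch of two common controls in scikit-learn (Iris again as a stand-in dataset): max_depth pre-prunes by limiting tree depth, while ccp_alpha applies cost-complexity post-pruning; the specific values below are illustrative, not tuned:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01,
                                random_state=0).fit(X_train, y_train)

# A pruned tree typically trades a little training accuracy for
# better accuracy on the held-out test split
print("deep  :", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("pruned:", pruned.score(X_train, y_train), pruned.score(X_test, y_test))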
Sensitivity to Irrelevant Features: Decision trees can be sensitive to irrelevant or redundant features in
the data, which can lead to suboptimal performance. Feature selection methods can be used to address
this.
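One possible approach, sketched below, is univariate selection with SelectKBest; the added noise columns simulate irrelevant features, and k=4 (the number of real Iris features) is an assumption for this example:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Append two columns of random noise to simulate irrelevant features
rng = np.random.default_rng(0)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 2))])

# Keep the 4 features with the strongest class association
selector = SelectKBest(f_classif, k=4).fit(X_noisy, y)
print("Kept feature indices:", selector.get_support(indices=True))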
Variance: Decision trees can sometimes be unstable, meaning small changes in the training data can lead
to significant changes in the tree structure. Ensemble methods (combining multiple decision trees) can
improve stability.
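For example, a random forest (many trees trained on bootstrap samples, with predictions averaged) usually gives more stable results than any single tree; a brief sketch, again on Iris with illustrative settings:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 100 trees, each fit on a bootstrap sample with random feature subsets
forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
print("Mean cross-validated accuracy:", scores.mean())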