CART (Classification And Regression Tree) in Machine Learning
CART (Classification And Regression Trees) is a variation of the decision tree algorithm that can handle both classification and regression tasks. Scikit-Learn uses the CART algorithm to train Decision Trees (also called "growing" trees). CART was first introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984.
CART (Classification And Regression Tree) for Decision Trees
CART is a predictive algorithm used in machine learning that explains how a target variable's values can be predicted from the other variables. It is a decision tree in which each fork is a split on a predictor variable and each leaf node holds a prediction for the target variable.
The term CART serves as a generic term for the following categories of decision trees:
- Classification Trees: The tree is used to determine which "class" the target variable is most likely to fall into when the target is categorical.
- Regression trees: These are used to predict a continuous variable's value.
In the decision tree, nodes are split into sub-nodes based on a threshold value of an attribute. The root node holds the full training set and is split into two by considering the best attribute and threshold value. The resulting subsets are then split using the same logic, and this continues until every remaining subset is pure or the tree reaches its maximum allowed number of leaves.
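To make the threshold idea concrete, here is a minimal sketch of a single binary split; the feature values, labels, and threshold are invented purely for illustration:
Python
import numpy as np

# Toy feature column and a candidate threshold (values are illustrative)
ages = np.array([22, 35, 47, 51, 28, 63])
labels = np.array([0, 0, 1, 1, 0, 1])
threshold = 40

# A CART split sends rows with value <= threshold left, the rest right
left_mask = ages <= threshold
print("Left subset labels: ", labels[left_mask])   # [0 0 0]
print("Right subset labels:", labels[~left_mask])  # [1 1 1]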
CART Algorithm
Classification and Regression Trees (CART) is a decision tree algorithm that is used for both classification and regression tasks. It is a supervised learning algorithm that learns from labelled data to predict unseen data.
- Tree structure: CART builds a tree-like structure consisting of nodes and branches. The nodes represent different decision points, and the branches represent the possible outcomes of those decisions. The leaf nodes in the tree contain a predicted class label or value for the target variable.
- Splitting criteria: CART uses a greedy approach to split the data at each node. It evaluates all possible splits and selects the one that best reduces the impurity of the resulting subsets. For classification tasks, CART uses Gini impurity as the splitting criterion; the lower the Gini impurity, the purer the subset. For regression tasks, CART chooses the split that most reduces the residual sum of squares; the lower the residual error after the split, the better the model fits the data.
- Pruning: To prevent overfitting, pruning removes nodes that contribute little to the model's accuracy. Cost-complexity pruning and information-gain pruning are two popular techniques. Cost-complexity pruning weighs each subtree's error against its size and removes the subtrees that cost the least accuracy per node saved; information-gain pruning removes nodes with low information gain (a cost-complexity sketch follows this list).
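Scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter of DecisionTreeClassifier; below is a minimal sketch using the built-in iris dataset (the alpha value 0.02 is an arbitrary illustration):
Python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow an unpruned tree, then the same tree with a small complexity penalty
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

# cost_complexity_pruning_path reports the alphas at which subtrees collapse
path = unpruned.cost_complexity_pruning_path(X, y)
print("Candidate alphas:", path.ccp_alphas)
print("Leaves before vs. after pruning:",
      unpruned.get_n_leaves(), pruned.get_n_leaves())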
How Does the CART Algorithm Work?
The CART algorithm works via the following process:
- The best split point for each input variable is obtained.
- From these per-input split points, the overall "best" split point is identified.
- Split the chosen input according to the “best” split point.
- Continue splitting until a stopping rule is satisfied or no further desirable splitting is available.

The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does so by searching for the split that makes the resulting sub-nodes as homogeneous as possible, using the Gini index as the criterion.
Gini index/Gini impurity
The Gini index is the metric CART uses for classification tasks. It is based on the sum of squared class probabilities and measures the probability that a randomly chosen element would be misclassified if it were labelled randomly according to the class distribution; it is a variation of the Gini coefficient. It operates on categorical targets, treats each split outcome as a binary "success" or "failure", and hence performs binary splits only.
The Gini index ranges from 0 to 1:
- A value of 0 means all the elements belong to a single class, i.e., the node is pure.
- Values close to 1 indicate a high level of impurity, where the elements are spread across many classes, each holding a small fraction.
- A value of 1 - 1/n occurs when the elements are uniformly distributed across n classes, each with an equal probability of 1/n. For example, with two balanced classes, the Gini impurity is 1 - 1/2 = 0.5.
Mathematically, we can write Gini Impurity as follows:
Gini = 1 - \sum_{i=1}^{n} p_i^2

where p_i is the probability of an object being classified into a particular class.
In conclusion, Gini impurity is the probability of misclassification, assuming independent selection of the element and its class based on the class probabilities.
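The formula translates directly into a few lines of Python; here is a minimal sketch with invented label lists that reproduce the 0 (pure node) and 0.5 (two balanced classes) values discussed above:
Python
def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["yes", "yes", "yes"]))       # 0.0 -> pure node
print(gini_impurity(["yes", "no", "yes", "no"]))  # 0.5 -> two balanced classes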
CART for Classification
A classification tree is an algorithm where the target variable is categorical, and the tree is used to identify the "class" within which the target variable is most likely to fall. Classification trees are used when the dataset needs to be split into classes that belong to the response variable (like "yes" or "no").
CART for classification is a decision tree learning algorithm that creates a tree-like structure to predict class labels. The tree consists of nodes, which represent decision points, and branches, which represent the possible outcomes of those decisions. Predicted class labels are stored at the leaf nodes of the tree.
How Does CART for Classification Work?
CART for classification works by recursively splitting the training data into smaller and smaller subsets based on certain criteria. The goal is to split the data in a way that minimizes the impurity within each subset; impurity is a measure of how mixed the data in a particular subset is. For classification tasks, CART uses Gini impurity:
- Gini Impurity: Gini impurity measures the probability of misclassifying a random instance from a subset labelled according to the majority class. Lower Gini impurity means a purer subset.
- Splitting Criteria: The CART algorithm evaluates all potential splits at every node and chooses the one that best decreases the Gini impurity of the resulting subsets. This continues until a stopping criterion is reached, such as a maximum tree depth or a minimum number of instances in a leaf node (a from-scratch sketch of this search is shown below).
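Here is a from-scratch sketch of that search for a single numeric feature; the data and the helper names (gini, best_threshold) are invented for illustration. It scans the candidate thresholds and keeps the one with the lowest weighted Gini impurity of the two children:
Python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    """Return the threshold on feature x minimizing weighted child Gini."""
    best_t, best_score = None, float("inf")
    for t in np.unique(x)[:-1]:          # candidate cut points
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([2.0, 3.5, 1.0, 6.0, 7.5, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(x, y))  # best split at 3.5, impurity 0.0 (a perfect split)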
CART for Regression
A regression tree is an algorithm where the target variable is continuous and the tree is used to predict its value. Regression trees are used when the response variable is continuous, for example, predicting the temperature of the day.
CART for regression is a decision tree learning method that creates a tree-like structure to predict continuous target variables. The tree consists of nodes that represent different decision points and branches that represent the possible outcomes of those decisions. Predicted values for the target variable are stored in each leaf node of the tree.
How Does CART Work for Regression?
Regression CART works by splitting the training data recursively into smaller subsets based on specific criteria. The objective is to split the data in a way that minimizes the residual error within each subset.
- Residual Reduction: Residual reduction measures how much the average squared difference between the predicted and actual target values decreases when a subset is split. The greater the reduction, the better the resulting model fits the data.
- Splitting Criteria: CART evaluates every possible split at each node and selects the one that yields the greatest reduction in residual error in the resulting subsets. This is repeated until a stopping criterion is met, such as reaching the maximum tree depth or having too few instances in a leaf node (a minimal scikit-learn sketch follows).
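In scikit-learn this corresponds to DecisionTreeRegressor, whose squared_error criterion picks the split that most reduces the residual sum of squares; here is a minimal sketch on synthetic data (the noisy sine data is invented for illustration):
Python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: a noisy sine wave (invented for illustration)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Each split is chosen to maximize the reduction in squared error
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3)
reg.fit(X, y)

print(reg.predict([[2.5]]))  # predicted value near sin(2.5)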
Pseudo-code of the CART algorithm
d = 0, endtree = 0
Node(0) = 1, Node(1) = 0, Node(2) = 0
while endtree < 1
    if Node(2^d - 1) + Node(2^d) + ... + Node(2^(d+1) - 2) = 2 - 2^(d+1)
        endtree = 1
    else
        do i = 2^d - 1, 2^d, ..., 2^(d+1) - 2
            if Node(i) > -1
                Split tree
            else
                Node(2i+1) = -1
                Node(2i+2) = -1
            end if
        end do
    end if
    d = d + 1
end while
Here d is the current depth and Node(i) holds the state of the node at index i, with the children of node i stored at indices 2i + 1 and 2i + 2; the loop keeps splitting the nodes at each depth until every node at that depth is marked terminal.
CART model representation
CART models are formed by picking input variables and evaluating split points on those variables until an appropriate tree is produced.
Steps to create a Decision Tree using the CART algorithm:
- Greedy algorithm: The input space is divided using a greedy method known as recursive binary splitting. This is a numerical procedure in which the values are lined up and candidate split points are tried and assessed using a cost function.
- Stopping Criterion: As it works its way down the tree with the training data, the recursive binary splitting method must know when to stop. The most common stopping rule is a minimum number of training instances assigned to each leaf node: if a candidate split would leave fewer instances than this threshold, the split is rejected and the node becomes a leaf (see the sketch after this list).
- Tree pruning: A decision tree's complexity is defined as the number of splits in the tree. Trees with fewer branches are preferred because they are easy to grasp and less likely to overfit the data. The quickest and simplest pruning approach is to work through each leaf node and evaluate the effect of removing it using a hold-out test set.
- Data preparation for CART: No special data preparation is required for the CART algorithm.
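To see the minimum-leaf-size stopping rule in action, here is a minimal sketch comparing tree sizes with and without it; the synthetic dataset and the threshold of 20 are arbitrary illustrations:
Python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (invented purely for illustration)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# With no stopping rule the tree grows until every leaf is pure;
# min_samples_leaf=20 rejects any split that would create a smaller leaf
deep = DecisionTreeClassifier(random_state=0).fit(X, y)
shallow = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X, y)

print("Leaves without a stopping rule:", deep.get_n_leaves())
print("Leaves with min_samples_leaf=20:", shallow.get_n_leaves())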
Decision Tree CART Implementation
The code below implements the CART algorithm to classify fruits based on their color and size. It first encodes the categorical data using a LabelEncoder, then trains a CART classifier on the encoded data. Finally, it predicts the fruit type for a new instance and decodes the result back to its original categorical value.
Python
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
# Define the features and target variable
features = [
["red", "large"],
["green", "small"],
["red", "small"],
["yellow", "large"],
["green", "large"],
["orange", "large"],
]
target_variable = ["apple", "lime", "strawberry", "banana", "grape", "orange"]
# Flatten the features list for encoding
flattened_features = [item for sublist in features for item in sublist]
# Use a single LabelEncoder for all features and target variable
le = LabelEncoder()
le.fit(flattened_features + target_variable)
# Encode features and target variable
encoded_features = [le.transform(item) for item in features]
encoded_target = le.transform(target_variable)
# Create a CART classifier
clf = DecisionTreeClassifier()
# Train the classifier on the training set
clf.fit(encoded_features, encoded_target)
# Predict the fruit type for a new instance
new_instance = ["red", "large"]
encoded_new_instance = le.transform(new_instance)
predicted_fruit_type = clf.predict([encoded_new_instance])
decoded_predicted_fruit_type = le.inverse_transform(predicted_fruit_type)
print("Predicted fruit type:", decoded_predicted_fruit_type[0])
Output:
Predicted fruit type: apple
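Note that encoding colors and sizes as integers with a LabelEncoder imposes an artificial ordering on nominal categories; for real datasets, one-hot encoding (for example, scikit-learn's OneHotEncoder) is usually a better fit for such features.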
Popular CART-based Algorithms
- CART (Classification and Regression Trees): The original algorithm that uses binary splits to build decision trees.
- C4.5 and C5.0: Related decision tree algorithms (successors of ID3 rather than extensions of CART) that allow for multiway splits and handle categorical variables natively.
- Random Forests: Ensemble methods that use multiple decision trees (often CART) to improve predictive performance and reduce overfitting.
- Gradient Boosting Machines (GBM): Boosting algorithms that also use decision trees (often CART) as base learners, sequentially improving model performance (a minimal sketch of both ensembles follows).
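Both ensemble families are available in scikit-learn with CART-style trees as base learners; here is a minimal sketch (the iris dataset and hyperparameter values are arbitrary illustrations):
Python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Random forest: many CART-style trees grown on bootstrap samples, votes combined
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gradient boosting: shallow trees fitted one after another, each correcting
# the errors of the ensemble built so far
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

print("Random forest training accuracy:   ", forest.score(X, y))
print("Gradient boosting training accuracy:", gbm.score(X, y))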
Advantages of CART
- Results are simple to interpret.
- Classification and regression trees are nonparametric and nonlinear.
- Classification and regression trees implicitly perform feature selection.
- Outliers in the input variables have little effect on CART.
- It requires minimal supervision and produces easy-to-understand models.
Limitations of CART
- Overfitting.
- High Variance.
- Low bias.
- The tree structure may be unstable; small changes in the data can produce a very different tree.
Applications of the CART algorithm
- For quick data insights.
- In blood donor classification.
- For environmental and ecological data.
- In the financial sector.
Frequently Asked Questions (FAQs)
1. What is CART (classification and regression tree)?
CART is a decision tree algorithm that can be used for both classification and regression tasks. It works by recursively partitioning the data into smaller and smaller subsets based on certain criteria. The goal is to create a tree structure that can accurately predict the target variable for new data points.
2. What is a regression tree in machine learning?
A regression tree is a type of decision tree that is used to predict continuous target variables. It works by partitioning the data into smaller and smaller subsets based on certain criteria, and then predicting the average value of the target variable within each subset.
3. What is the difference between a regression tree and a classification tree?
A regression tree is used to predict continuous target variables, while a classification tree is used to predict categorical target variables. Regression trees predict the average value of the target variable within each subset, while classification trees predict the most likely class for each data point.
4. What is the difference between CART and a decision tree?
CART is a specific implementation of the decision tree algorithm. There are other decision tree algorithms, such as ID3 and C4.5, that have different splitting criteria and pruning techniques.
5. What is the main difference between classification and regression?
Classification is the task of assigning a category to an instance, while regression is the task of predicting a continuous value. For example, classification could be used to predict whether an email is spam or not spam, while regression could be used to predict the price of a house based on its size, location, and amenities.