EST Cheatsheet

The document provides an overview of Decision Trees, a supervised learning algorithm used for classification and regression, detailing impurity measures like Gini and Entropy that help in selecting the best splits. It discusses the concepts of pruning to prevent overfitting and various parameters that can be adjusted to control tree growth. Additionally, it covers ensemble methods such as Random Forest and Boosting, which combine multiple models to enhance prediction accuracy.


Decision Trees

A decision tree is a Supervised Learning Algorithm Impurity measures in Decision Trees are used to select the best split for that node. The lower the
used for both Classification and Regression problems. impurity the better the split, yielding the maximum purity in each node.
It splits the dataset using conditions and forms a tree
like structure which acts as a flow chart to classify the The tree keeps growing until all the nodes are completely pure.
records. It consists of branches and nodes.
Commonly used impurity measures are: Gini, Entropy and Information gain

• Gini Impurity: 1 − Σ pᵢ², summed over the classes i (ranges from 0 to 0.5)

• Entropy (E): −Σ pᵢ log₂(pᵢ), summed over the C classes (ranges from 0 to 1)

• Information Gain:
  Entropy(parent node) − (m / (m + n)) · Entropy(first child) − (n / (m + n)) · Entropy(second child)
  where m and n are the number of samples in the first and second child nodes
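
For concreteness, a minimal sketch of these three measures in plain NumPy (the helper names and example labels here are illustrative, not part of the original cheatsheet):

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum(p_i^2) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum(p_i * log2(p_i)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left_child, right_child):
    # Weighted reduction in entropy after splitting the parent node
    m, n = len(left_child), len(right_child)
    return (entropy(parent)
            - m / (m + n) * entropy(left_child)
            - n / (m + n) * entropy(right_child))

# A pure node has impurity 0; a 50/50 node has Gini 0.5 and entropy 1
print(gini([1, 1, 0, 0]), entropy([1, 1, 0, 0]))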

Source: https://www.displayr.com/what-is-a-decision-tree/

Root Node: The first node, containing all the data points and having the maximum impurity. It identifies the best split and divides the data accordingly.

Decision Node: An intermediate node that further splits the data with the best splitting condition.

Leaf Nodes: The final nodes in the decision tree, which make the final prediction or classification.

[Figure: Gini impurity and Entropy plotted against the class probability p, for p from 0 to 1]
• Higher Gini and Entropy values indicate higher impurity
• Reduction in impurity after the split is measured by information gain
Pruning and Parameters in Decision Trees

One of the disadvantages of a decision tree is that it is prone to overfitting: the tree grows to the maximum depth until all the samples in the training data are perfectly fitted.

Pruning: It helps us overcome the problem of overfitting by cutting off branches or nodes of the decision tree to improve the model.

Cost Complexity Pruning: One of the pruning techniques; it uses the cost complexity parameter ccp_alpha to prune the nodes.
Reference: https://scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html

The following DecisionTreeClassifier parameters control tree growth:

max_depth: The maximum length of a path between the root node and a leaf node. By assigning a max depth value we can limit the tree growth and simplify the model.

min_samples_leaf: The minimum number of samples required in a leaf node. Further splitting is stopped if the child nodes cannot meet this minimum samples requirement.

min_impurity_decrease: The tree grows by decreasing the impurity in each node it builds, and sometimes it even grows when there is no decrease in impurity, leading to an overfit model. Setting a value for this parameter helps us limit the tree growth.

min_samples_split: The minimum number of samples required to split an internal node. If the number of samples is less than the specified value, the split will not happen and the tree stops growing.

max_leaf_nodes: Limits the tree based on the number of leaf nodes. The tree stops growing once it reaches the maximum number of leaf nodes specified in this parameter.

max_features: The maximum number of features to be considered when looking for the best split.

Reference for all parameters: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
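
As an illustration, a minimal sketch passing these parameters to scikit-learn's DecisionTreeClassifier; the dataset and the specific parameter values below are arbitrary examples, not recommendations from the original cheatsheet:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    max_depth=4,                 # limit the path length from root to leaf
    min_samples_leaf=5,          # each leaf must hold at least 5 samples
    min_samples_split=10,        # a node needs 10 samples before it can split
    min_impurity_decrease=0.01,  # require at least this much impurity reduction
    max_leaf_nodes=20,           # cap the total number of leaves
    max_features="sqrt",         # features considered when searching for a split
    random_state=0,
)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))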

Pruning and Visualizing Trees

#Fitting the model
from sklearn.tree import DecisionTreeClassifier

d_tree = DecisionTreeClassifier()
d_tree.fit(X_train, y_train)

#Pruning with the cost complexity parameter
path = d_tree.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

#Next, we train a decision tree using the effective alphas. The last value in
#ccp_alphas is the alpha value that prunes the whole tree, leaving the tree,
#clfs[-1], with one node.
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)

Reference: https://scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html

#Visualizing a Decision Tree
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree, export_text

plt.figure(figsize=(80, 20))
plot_tree(d_tree, feature_names=X_train.columns, max_depth=2, filled=True)

Reference: https://www.section.io/engineering-education/hyperparmeter-tuning/

[Figure: example decision tree for loan approval]
Root Node (Approved – 6, Rejected – 4)
├── Income > 80,000 (Approved – 4, Rejected – 1)
│   ├── Age > 50 (Approved – 0, Rejected – 1) → predict Rejected
│   └── Age <= 50 (Approved – 4, Rejected – 0) → predict Approved
└── Income <= 80,000 (Approved – 2, Rejected – 3)
    ├── Age > 30 (Approved – 0, Rejected – 3) → predict Rejected
    └── Age <= 30 (Approved – 2, Rejected – 0) → predict Approved
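
Once the trees for every effective alpha are available, a common next step is to score each pruned tree on held-out data and keep the best one. A minimal sketch, assuming X_test and y_test exist alongside X_train and y_train (this step is not in the original cheatsheet):

# Pick the ccp_alpha whose pruned tree scores best on held-out data
test_scores = [clf.score(X_test, y_test) for clf in clfs]  # assumes X_test, y_test
best_index = max(range(len(test_scores)), key=test_scores.__getitem__)
print("Best ccp_alpha:", ccp_alphas[best_index], "test accuracy:", test_scores[best_index])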


Ensemble Methods

Ensemble methods combine multiple individual models to derive an output or prediction. They are used for both classification and regression problems and typically give a more accurate and generalizable model.

Averaging methods: the driving principle is to build several estimators independently and then to average / vote their predictions. On average, the combined estimator is usually better than any single base estimator because its variance is reduced. E.g. Bagging methods, forests of randomized trees, ...

Boosting methods: base estimators are built sequentially, and each one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble. E.g. AdaBoost, Gradient Tree Boosting, ...

Bagging: Also known as Bootstrap Aggregation, it uses sampling with replacement to generate multiple samples of a given size and builds an ensemble of models that improves performance and accuracy. It trains homogeneous weak learners independently and aggregates them.

>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.neighbors import KNeighborsClassifier
>>> bagging = BaggingClassifier(KNeighborsClassifier(),
...                             max_samples=0.5, max_features=0.5)

Random Forest: One of the bagging methods, where each tree is built from a sample drawn with replacement from the training set. It uses a random subset of features instead of all features and finds the best split among them.

>>> from sklearn.ensemble import RandomForestClassifier
>>> X = [[0, 0], [1, 1]]
>>> Y = [0, 1]
>>> clf = RandomForestClassifier(n_estimators=10)
>>> clf = clf.fit(X, Y)

AdaBoost: The successive models are created with a focus on the ill-fitted data of the previous learner. Each successive model focuses more and more on the harder-to-fit data, i.e. the residuals of the previous model. Model instances are created sequentially; except for the first, each subsequent model is grown from previously grown learners.

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import AdaBoostClassifier
>>> X, y = load_iris(return_X_y=True)
>>> clf = AdaBoostClassifier(n_estimators=100)
>>> scores = cross_val_score(clf, X, y, cv=5)
>>> scores.mean()
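
To make the "average / vote their predictions" idea concrete, a small sketch using scikit-learn's VotingClassifier with hard voting; this example and the choice of base estimators are illustrative additions, not part of the original cheatsheet:

>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import VotingClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> voting = VotingClassifier(estimators=[
...     ('lr', LogisticRegression(max_iter=1000)),
...     ('dt', DecisionTreeClassifier(max_depth=3)),
...     ('nb', GaussianNB())], voting='hard')
>>> voting = voting.fit(X, y)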

Gradient Boosting: Similarly to AdaBoost, gradient tree boosting is built from a set of small trees, though usually slightly deeper than decision stumps. The trees are trained sequentially, just like in AdaBoost, but the training of the individual trees is not the same.

>>> from sklearn.datasets import make_hastie_10_2
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> X, y = make_hastie_10_2(random_state=0)
>>> X_train, X_test = X[:2000], X[2000:]
>>> y_train, y_test = y[:2000], y[2000:]
>>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
...     max_depth=1, random_state=0).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
0.913...

XGBoost: Short for Extreme Gradient Boosting, it uses the gradient boosting framework with better performance and speed.

>>> from xgboost import XGBClassifier
>>> xgb_classifier = XGBClassifier(random_state=1, eval_metric="error")
>>> xgb_classifier.fit(X_train.astype('int'), y_train)

Stacking: It uses a meta-learning algorithm to learn how to best combine the predictions from two or more base machine learning algorithms.

>>> from sklearn.linear_model import RidgeCV, LassoCV
>>> from sklearn.neighbors import KNeighborsRegressor
>>> estimators = [('ridge', RidgeCV()),
...               ('lasso', LassoCV(random_state=42)),
...               ('knr', KNeighborsRegressor(n_neighbors=20,
...                                           metric='euclidean'))]
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> from sklearn.ensemble import StackingRegressor
>>> final_estimator = GradientBoostingRegressor(
...     n_estimators=25, subsample=0.5, min_samples_leaf=25, max_features=1,
...     random_state=42)
>>> reg = StackingRegressor(
...     estimators=estimators,
...     final_estimator=final_estimator)

Reference: https://scikit-learn.org/stable/modules/ensemble.html
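
As a small follow-up, the stacked regressor above can be fitted and scored like any other estimator; the dataset and split below are illustrative assumptions added here, not part of the original cheatsheet:

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import train_test_split
>>> X, y = load_diabetes(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> reg = reg.fit(X_train, y_train)
>>> reg.score(X_test, y_test)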

