Learn Python From Scratch

Uploaded by

mohanadvani74

Building

Random Forest Algorithm

from Scratch in Python

Without relying on high-level libraries

ANSHUMAN JHA

Table of Contents
1. Introduction to Random Forest
2. Building Blocks of a Random Forest
   a. Decision Tree
   b. Bootstrap Aggregation (Bagging)
   c. Random Feature Selection
3. The Structure of a Random Forest
4. Implementing Random Forest from Scratch in Python
   a. Implementing the Decision Tree
      i. Calculate the Gini Index
      ii. Split a Dataset Based on an Attribute and an Attribute Value
      iii. Select the Best Split Point for a Dataset
      iv. Create a Terminal Node Value
      v. Create Child Splits for a Node or Make Terminal
      vi. Build a Decision Tree
      vii. Make a Prediction with a Decision Tree
      viii. Create a Random Subsample from the Dataset with Replacement
      ix. Prediction with a List of Bagged Trees
   b. Building the Random Forest
      i. Create a Random Forest
      ii. Evaluate the Algorithm Using Cross-Validation
      iii. Split a Dataset into K Folds
      iv. Calculate Accuracy Percentage
      v. Test the Random Forest Algorithm
      vi. Load Dataset Function
      vii. Main Script to Run the Random Forest Algorithm
5. Conclusion


1. Introduction to Random Forest Algorithm

Random Forest is a popular ensemble learning method for classification and regression tasks. In this post, we will
build a Random Forest algorithm from scratch in Python, without relying on high-level libraries like scikit-learn.

A Random Forest builds multiple decision trees and merges their outputs to get a more accurate and stable prediction
than a single tree, which reduces the risk of overfitting. Each tree is built using a different subset of the training
data, and the final prediction is made by averaging the predictions of all trees (for regression) or by majority
voting (for classification).
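
As a small illustration of the two aggregation rules mentioned above, the sketch below combines per-tree outputs for a single input row. The values in tree_votes and tree_estimates are made up for the example and do not come from a trained model.

```python
from collections import Counter
from statistics import mean

# Hypothetical outputs from five trees for one input row (illustrative values).
tree_votes = [1, 0, 1, 1, 0]                 # classification: class labels
tree_estimates = [2.0, 2.5, 3.0, 2.5, 2.0]   # regression: numeric predictions

# Classification: the most frequent label wins.
majority_label = Counter(tree_votes).most_common(1)[0][0]

# Regression: the forest prediction is the mean of the tree predictions.
average_estimate = mean(tree_estimates)

print(majority_label, average_estimate)
```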

2. Building Blocks of a Random Forest

Decision Tree
A decision tree is a flowchart-like structure where each internal node represents a decision based on a feature, each
branch represents the outcome of the decision, and each leaf node represents a class label (for classification) or a
continuous value (for regression).

Bootstrap Aggregation (Bagging)

Bagging involves randomly sampling with replacement from the training data to create multiple datasets. Each decision
tree is trained on a different dataset, reducing variance and improving robustness.
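
To make sampling with replacement concrete, here is a minimal sketch that draws one bootstrap sample from a stand-in list of row indices and measures how many distinct rows it contains. The dataset size n and the seed are arbitrary choices for the illustration. For large n, roughly 63% of the original rows are expected to appear in any one sample, which is part of why the trees end up seeing different data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

n = 1000
rows = list(range(n))  # stand-in for the row indices of a training set

# Draw n rows with replacement: one bootstrap sample.
bootstrap = [random.choice(rows) for _ in range(n)]

# Fraction of distinct original rows that made it into the sample.
unique_fraction = len(set(bootstrap)) / n

# Probability that a given row appears at least once in n draws.
expected = 1 - (1 - 1 / n) ** n

print(f"unique rows in sample: {unique_fraction:.3f}, expected about {expected:.3f}")
```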

Random Feature Selection

Random feature selection means that each node in a decision tree is split using a random subset of features. This
process helps to decorrelate the trees and improve the model's performance.
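
A minimal sketch of picking the candidate features for one split is shown below. The feature count is hypothetical, and taking the square root of the number of features is a common heuristic for classification rather than a fixed rule.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

n_total_features = 16                            # hypothetical number of input columns
n_features = int(math.sqrt(n_total_features))    # square-root heuristic

# Sample feature indices without replacement; only these are tried at this split.
candidates = random.sample(range(n_total_features), n_features)

print(candidates)
```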


3. The Structure of a Random Forest Algorithm

The structure of the algorithm is a sequence of steps and sub-steps. Each step corresponds to a function or a key
part of the implementation described in the next section.


4. Implementation in Python
Let's implement a simple Random Forest Algorithm in Python.

Step 1: Implementing the Decision Tree

Let's start by implementing the core component of our Random Forest: the decision tree.

Calculate the Gini Index

The Gini index is a measure of impurity or diversity used to evaluate splits in decision trees.

import numpy as np

def gini_index(groups, classes):
    # Count all samples at split point
    n_instances = float(sum([len(group) for group in groups]))
    # Sum weighted Gini index for each group
    gini = 0.0
    for group in groups:
        size = float(len(group))
        # Avoid division by zero
        if size == 0:
            continue
        score = 0.0
        # Score the group based on the score for each class
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p
        # Weight the group score by its relative size
        gini += (1.0 - score) * (size / n_instances)
    return gini
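
As a quick check of the two extremes, the sketch below evaluates a compact restatement of gini_index on two hand-made splits for a two-class problem. The toy rows are invented for the illustration; each row is [feature, label]. A split whose groups each hold a single class scores 0.0, and a split whose groups are each half and half scores 0.5.

```python
def gini_index(groups, classes):
    """Weighted Gini impurity of a split; each row stores its class label last."""
    n_instances = sum(len(group) for group in groups)
    gini = 0.0
    for group in groups:
        if not group:
            continue  # skip empty groups to avoid division by zero
        labels = [row[-1] for row in group]
        score = sum((labels.count(c) / len(group)) ** 2 for c in classes)
        gini += (1.0 - score) * (len(group) / n_instances)
    return gini

# Toy splits (illustrative data).
pure_split = ([[1, 0], [2, 0]], [[3, 1], [4, 1]])    # each side holds one class
mixed_split = ([[1, 0], [2, 1]], [[3, 0], [4, 1]])   # each side is half and half

print(gini_index(pure_split, [0, 1]))   # 0.0
print(gini_index(mixed_split, [0, 1]))  # 0.5
```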

Split a Dataset Based on an Attribute and an Attribute Value

This function splits the dataset into two groups based on a feature index and a threshold value.

def test_split(index, value, dataset):
    left, right = list(), list()
    for row in dataset:
        # Rows below the threshold go left; all others go right
        if row[index] < value:
            left.append(row)
        else:
            right.append(row)
    return left, right

Select the Best Split Point for a Dataset

This function evaluates all potential splits and selects the one with the lowest Gini index.

def get_split(dataset):
    class_values = list(set(row[-1] for row in dataset))
    b_index, b_value, b_score, b_groups = 999, 999, 999, None
    # Try every feature (all columns except the last, which is the label)
    for index in range(len(dataset[0])-1):
        # Use each row's value for that feature as a candidate threshold
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)
            if gini < b_score:
                b_index, b_value, b_score, b_groups = index, row[index], gini, groups
    return {'index': b_index, 'value': b_value, 'groups': b_groups}


Create a Terminal Node Value

This function determines the output value for a terminal node, which is the most common class in the group.

def to_terminal(group):
    # Collect the class labels and return the most frequent one
    outcomes = [row[-1] for row in group]
    return max(set(outcomes), key=outcomes.count)
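
An equivalent way to pick the most common class uses collections.Counter from the standard library. This is a sketch of an alternative, not the original author's code; to_terminal_counter is a hypothetical name and the group below is invented. One practical difference is that Counter breaks ties by first-encountered order, whereas iterating over a set gives no ordering guarantee.

```python
from collections import Counter

def to_terminal_counter(group):
    """Most common class label in a group of rows (label stored last)."""
    outcomes = [row[-1] for row in group]
    return Counter(outcomes).most_common(1)[0][0]

# Toy group: labels are 0, 1, 1, 0, 1.
group = [[2.7, 0], [1.3, 1], [3.6, 1], [7.4, 0], [5.0, 1]]
print(to_terminal_counter(group))  # 1
```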

Create Child Splits for a Node or Make Terminal

This recursive function splits nodes into child nodes or makes them terminal nodes if stopping criteria are met.

def split(node, max_depth, min_size, depth):
    left, right = node['groups']
    del(node['groups'])
    # Check for no split
    if not left or not right:
        node['left'] = node['right'] = to_terminal(left + right)
        return
    # Check for max depth
    if depth >= max_depth:
        node['left'], node['right'] = to_terminal(left), to_terminal(right)
        return
    # Process left child
    if len(left) <= min_size:
        node['left'] = to_terminal(left)
    else:
        node['left'] = get_split(left)
        split(node['left'], max_depth, min_size, depth+1)
    # Process right child
    if len(right) <= min_size:
        node['right'] = to_terminal(right)
    else:
        node['right'] = get_split(right)
        split(node['right'], max_depth, min_size, depth+1)

Build a Decision Tree

This function builds a decision tree by recursively splitting nodes.

def build_tree(train, max_depth, min_size):
    # Find the root split, then grow the tree recursively from depth 1
    root = get_split(train)
    split(root, max_depth, min_size, 1)
    return root


Make a Prediction with a Decision Tree

This function predicts the class label for a given row of data using a decision tree.

def predict(node, row):
    # Follow the left branch when the feature value is below the threshold
    if row[node['index']] < node['value']:
        if isinstance(node['left'], dict):
            return predict(node['left'], row)
        else:
            return node['left']
    else:
        if isinstance(node['right'], dict):
            return predict(node['right'], row)
        else:
            return node['right']
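
To show the tree representation that this traversal expects, the sketch below walks a hand-built tree: internal nodes are dicts with 'index', 'value', 'left' and 'right', and leaves are plain class labels. The tree and the input rows are invented for the illustration, and predict is restated compactly so the block runs on its own.

```python
def predict(node, row):
    """Walk the tree until a leaf (a non-dict value) is reached."""
    branch = 'left' if row[node['index']] < node['value'] else 'right'
    child = node[branch]
    return predict(child, row) if isinstance(child, dict) else child

# Hand-built tree: split on feature 0 at 5.0, then on feature 1 at 2.0.
tree = {
    'index': 0, 'value': 5.0,
    'left': 0,
    'right': {'index': 1, 'value': 2.0, 'left': 0, 'right': 1},
}

print(predict(tree, [3.0, 9.0]))  # 0: feature 0 is below 5.0
print(predict(tree, [7.0, 1.0]))  # 0: goes right, then feature 1 is below 2.0
print(predict(tree, [7.0, 4.0]))  # 1: goes right, then right again
```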

Create a Random Subsample from the Dataset with Replacement

This function creates a bootstrap sample from the dataset, which is used to train each decision tree in the
Random Forest.

from random import seed, randrange

def subsample(dataset, ratio):
    sample = list()
    n_sample = round(len(dataset) * ratio)
    # Draw rows at random; the same row may be picked more than once
    while len(sample) < n_sample:
        index = randrange(len(dataset))
        sample.append(dataset[index])
    return sample

Make a Prediction with a List of Bagged Trees

This function predicts the class label for a given row of data by aggregating predictions from multiple decision
trees.

def bagging_predict(trees, row):
    # Collect one prediction per tree and return the majority vote
    predictions = [predict(tree, row) for tree in trees]
    return max(set(predictions), key=predictions.count)

Step 2: Building the Random Forest

With our decision tree implementation ready, we can now build the Random Forest. The Random Forest will be
composed of multiple decision trees, each trained on a different bootstrap sample of the training data.

Create a Random Forest

This function builds the Random Forest by training n_trees decision trees and collecting them in a list.

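def random_forest(train, max_depth, min_size, sample_size, n_trees, n_features):
    # Note: n_features is accepted but never passed to build_tree, so each
    # split still considers every feature; trees differ only by bootstrap sample
    trees = list()
    for _ in range(n_trees):
        sample = subsample(train, sample_size)
        tree = build_tree(sample, max_depth, min_size)
        trees.append(tree)
    return trees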


Evaluate the Algorithm Using Cross-Validation

This function evaluates the performance of the Random Forest algorithm using cross-validation.

def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        # Train on every fold except the held-out one
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        # Copy the held-out rows and hide their labels
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        # The algorithm needs the test rows as well as the training rows
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

Split a Dataset into K Folds

This function splits the dataset into k folds for cross-validation.

def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for _ in range(n_folds):
        fold = list()
        # Move random rows out of the copy so no row lands in two folds
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split
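
One consequence of the integer fold size is worth seeing in numbers: when the dataset size is not a multiple of n_folds, the leftover rows are not placed in any fold. The sketch below restates the function compactly and runs it on an invented 23-row dataset; the seed and row contents are arbitrary.

```python
from random import randrange, seed

def cross_validation_split(dataset, n_folds):
    """Split rows into n_folds disjoint folds of equal (integer) size."""
    dataset_copy = list(dataset)
    fold_size = len(dataset) // n_folds
    folds = []
    for _ in range(n_folds):
        fold = []
        while len(fold) < fold_size:
            # pop() removes the row, so it cannot be drawn again
            fold.append(dataset_copy.pop(randrange(len(dataset_copy))))
        folds.append(fold)
    return folds

seed(1)
rows = [[i, i % 2] for i in range(23)]   # 23 toy rows: [feature, label]
folds = cross_validation_split(rows, 5)

fold_sizes = [len(fold) for fold in folds]
unused = len(rows) - sum(fold_sizes)
print(fold_sizes)  # [4, 4, 4, 4, 4]
print(unused)      # 3
```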

Calculate Accuracy Percentage

This function calculates the accuracy percentage of the predictions.

def accuracy_metric(actual, predicted):
    # Count matching labels and convert to a percentage
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

Test the Random Forest Algorithm

This function tests the Random Forest algorithm on the dataset.

def random_forest_algorithm(train, test, max_depth, min_size, sample_size, n_trees, n_features):
    # Train the forest, then predict one label per test row
    trees = random_forest(train, max_depth, min_size, sample_size, n_trees, n_features)
    predictions = [bagging_predict(trees, row) for row in test]
    return predictions


Load Dataset Function

This function loads a dataset from a CSV file.

def load_dataset(filename):
    dataset = list()
    with open(filename, 'r') as file:
        for line in file:
            # Skip blank lines; every field, including the label, is parsed as float
            if line.strip():
                dataset.append(list(map(float, line.split(','))))
    return dataset

Main Script to Run the Random Forest Algorithm

The main script ties everything together and runs the Random Forest algorithm on a given dataset.

seed(1)
filename = 'data.csv'
dataset = load_dataset(filename)
n_folds = 5
max_depth = 10
min_size = 1
sample_size = 1.0
n_trees = 10
n_features = int(np.sqrt(len(dataset[0])-1))
scores = evaluate_algorithm(dataset, random_forest_algorithm, n_folds, max_depth, min_size,
                            sample_size, n_trees, n_features)
print(f'Scores: {scores}')
print(f'Mean Accuracy: {sum(scores)/float(len(scores)):.3f}%')
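
The script above expects a file named data.csv with numeric columns and the class label in the last column, but no dataset accompanies the text. For experimentation, a minimal sketch of producing a compatible file is shown below. The two-cluster synthetic data, its size, and the use of a temporary directory are all assumptions made for the illustration; load_dataset is restated so the block runs on its own.

```python
import os
import random
import tempfile

def load_dataset(filename):
    """Read a headerless CSV of numeric values; the last column is the label."""
    dataset = []
    with open(filename, 'r') as file:
        for line in file:
            if line.strip():
                dataset.append([float(x) for x in line.split(',')])
    return dataset

random.seed(1)

# Two synthetic clusters: class 0 around (0, 0) and class 1 around (3, 3).
rows = []
for label, center in ((0, 0.0), (1, 3.0)):
    for _ in range(50):
        rows.append([random.gauss(center, 1.0), random.gauss(center, 1.0), label])

# Write the rows in the headerless comma-separated layout the loader expects.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv')
    with open(path, 'w') as f:
        for row in rows:
            f.write(','.join(str(v) for v in row) + '\n')
    dataset = load_dataset(path)

print(len(dataset), len(dataset[0]))  # 100 3
```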

By breaking down each function and providing detailed comments, we have a clear understanding of how each part of
the Random Forest algorithm is implemented from scratch in Python.

5. Conclusion
Building a Random Forest from scratch involves understanding and implementing several key components: decision
trees, bootstrap aggregation, and random feature selection. By combining these elements, we can create a powerful
ensemble model that improves prediction accuracy and reduces overfitting.

This implementation provides a foundational understanding of how Random Forest works and allows you to customize
and extend the algorithm for specific use cases.

Constructive comments and feedback are welcome.
