RANDOM FOREST (Binary Classification)

The document describes a machine learning workflow for binary classification of honey samples using spectral data. It includes: 1) Importing common Python libraries for data processing, modeling, and visualization. 2) Loading a CSV dataset, splitting it into features (spectral data) and a target (adulterated or not). 3) Training a random forest model on 80% of the data and evaluating its predictions on the remaining 20%. Key steps are data preprocessing, model training and tuning, and evaluating performance using various metrics to identify an accurate model. Feature importance plots provide insights into the most predictive spectral bands.


CODE:

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import os

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

EXPLANATION:
1. import numpy as np: This line imports the NumPy library as np. NumPy is a fundamental library
for numerical computations in Python, and it provides support for arrays and matrices, which are
commonly used in machine learning.

2. import pandas as pd: This line imports the Pandas library as pd. Pandas is another essential
library for data manipulation and analysis in Python, often used to work with structured data,
such as CSV files or data tables.

3. Comments (# data processing, CSV file I/O...): These lines are comments that briefly explain what each imported library is used for.

4. import os: This line imports the os module, which provides a way to interact with the operating
system. It is used to perform file and directory operations.

5. for dirname, _, filenames in os.walk('/kaggle/input'):: This line initiates a loop using the os.walk
function to traverse the directory tree starting from the '/kaggle/input' directory. It retrieves
three values in each iteration:

 dirname: The current directory being explored.

 _: A list of subdirectory names in the current directory (named _ because it is not used in this loop).

 filenames: A list of filenames in the current directory.

6. for filename in filenames:: This line starts another loop to iterate over the list of filenames
obtained in the previous step.

7. print(os.path.join(dirname, filename)): In this line, os.path.join() is used to combine the current dirname and filename into a full path, and print() displays that path on the console. This effectively lists all the files under the '/kaggle/input' directory and its subdirectories.
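For instance, joining the directory and file name of the dataset used later in this document would produce the following (a minimal illustration, assuming that file is present in the Kaggle input folder):

import os

# os.path.join inserts the correct path separator between its arguments
path = os.path.join('/kaggle/input/honey-adulteration', 'adulteration.csv')
print(path)  # /kaggle/input/honey-adulteration/adulteration.csv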

CODE:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt

# Load the dataset (replace the path with your actual file location)
data = pd.read_csv('/kaggle/input/honey-adulteration/adulteration.csv')

# Split the data into features (X) and target (y)
X = data.iloc[:, 4:-1]  # Spectral band columns (drops the first four and the last column)
y = data['Class']       # Target variable (adulterated or not)

# Map 'Class' to binary labels (e.g., 'Clover' to 1 and all other classes to 0)
y = (y == 'Clover').astype(int)

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (optional but often recommended)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train a classification model (Random Forest in this example)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

# Print the evaluation results
print(f'Accuracy: {accuracy:.2f}')
print(f'Confusion Matrix:\n{conf_matrix}')
print(f'Classification Report:\n{classification_rep}')

# Plot feature importances (if applicable to your model)
feature_importances = clf.feature_importances_
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importances)), feature_importances)
plt.xlabel('Spectral Bands')
plt.ylabel('Feature Importance')
plt.title('Feature Importance for Binary Classification')
plt.show()

EXPLANATION:
This code snippet demonstrates a workflow for building and evaluating a binary classification model
using a dataset with spectral band features. Here's a step-by-step explanation:

1. Importing Libraries:

 Necessary libraries such as pandas, numpy, scikit-learn, and matplotlib are imported to
perform data manipulation, model building, and visualization tasks.

2. Loading the Dataset:


 The dataset is loaded from a CSV file ('adulteration.csv') using pandas and stored in a
DataFrame named data.

3. Splitting Features and Target:

 The features (X) are selected with data.iloc[:, 4:-1], which drops the first four columns and the last column (assumed not to be spectral measurements). The remaining columns are assumed to represent spectral band data.

 The target variable (y) is extracted from the 'Class' column, where binary labels are
created. For example, 'Clover' is mapped to 1 (indicating adulterated) and other classes
to 0 (indicating not adulterated).

4. Splitting the Dataset:

 The dataset is split into training and testing sets using the train_test_split function from
scikit-learn. This is a common practice for evaluating machine learning models. Here,
80% of the data is used for training, and 20% is used for testing.

5. Standardizing Features (Optional):

 The features are standardized using the StandardScaler from scikit-learn. Standardization rescales each feature to a mean of 0 and a standard deviation of 1, i.e. z = (x - mean) / std. This mainly benefits scale-sensitive algorithms; tree-based models such as Random Forests are largely unaffected by feature scaling, so the step is optional here but often recommended as a general habit.

6. Initializing and Training a Classification Model (Random Forest):

 A binary classification model is initialized using the RandomForestClassifier from scikit-learn. This model learns the relationship between the spectral band features and the binary target variable (adulterated or not). It is created here with default hyperparameters; a hedged tuning sketch is given after the summary paragraph that closes this explanation.

 The model is trained on the standardized training data using the fit method.

7. Making Predictions:

 The trained model is used to make predictions on the test set using the predict method.

8. Evaluating Model Performance:

 The code calculates the accuracy of the model's predictions using the accuracy_score function from scikit-learn.

 The confusion matrix is computed using the confusion_matrix function, providing the counts of true negatives, false positives, false negatives, and true positives (a sketch for unpacking these counts follows this list).

 A classification report is generated using the classification_report function, which includes precision, recall, F1-score, and support for both classes.

9. Printing Evaluation Results:

 The accuracy, confusion matrix, and classification report are printed to assess the model's performance.

10. Plotting Feature Importances (if applicable):

 If the model supports feature importance analysis (as Random Forest does), the code
calculates and plots feature importances. This helps understand which spectral bands
contribute most to the classification decision.
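As referenced in step 8, here is a minimal sketch of how to unpack the printed confusion matrix and recompute two of the report's metrics by hand, assuming the same y_test and y_pred from the code above:

# For binary labels, ravel() returns the counts in the order
# true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f'TN={tn}, FP={fp}, FN={fn}, TP={tp}')

precision = tp / (tp + fp)  # fraction of predicted positives that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are recovered
print(f'Precision: {precision:.2f}, Recall: {recall:.2f}')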

Overall, this code provides a complete example of a binary classification workflow, including data
preprocessing, model training, evaluation, and feature importance analysis.
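The summary at the top of this document also mentions model tuning, which the code itself does not perform. Below is a minimal, hedged sketch of one common approach using scikit-learn's GridSearchCV; the grid values are illustrative assumptions, not settings taken from the original code:

from sklearn.model_selection import GridSearchCV

# Illustrative hyperparameter grid; adjust for your dataset
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [None, 10, 20],
    'min_samples_leaf': [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training set
    scoring='accuracy',  # same metric the code reports on the test set
)
search.fit(X_train, y_train)

print('Best parameters:', search.best_params_)
print(f'Best cross-validated accuracy: {search.best_score_:.2f}')
clf = search.best_estimator_  # the tuned model can then replace clf above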

GRAPH EXPLANATION:
The graph in the code is used to visualize the feature importances when using a Random Forest classifier
for binary classification. This visualization helps you understand which spectral bands (features) are the
most important for making classification decisions. Here's an explanation of the graph:

1. Feature Importances: In a machine learning model like Random Forest, feature importances represent how much each feature (each spectral band, in this case) contributes to the model's predictions; scikit-learn reports the mean decrease in impurity across all trees. Higher feature importance indicates that the feature is more influential in making classification decisions.

2. x-axis (Spectral Bands): The x-axis of the graph indexes the spectral bands used as features (the code plots them by column position rather than by name). Each band corresponds to a specific wavelength in the hyperspectral data, such as 399.40nm, 404.39nm, and so on. These bands are the input features for the model.

3. y-axis (Feature Importance): The y-axis represents the feature importance scores. It quantifies
the importance of each spectral band in the classification process. Higher values indicate more
important features.

4. Bars: Each bar in the graph corresponds to a specific spectral band. The height of the bar
represents the feature importance score for that band. The taller the bar, the more important
that particular band is in making classification decisions.

5. Interpretation: By looking at this graph, you can identify which spectral bands have the most
significant impact on whether honey is classified as 'Clover' (positive class) or not. Bands with
higher feature importance are more informative for distinguishing between the two classes.

6. Usage: You can use this information to reduce the number of features (spectral bands) in your model if some bands contribute little; a sketch of this follows at the end of this section. It can also provide insight into the underlying characteristics of the data and help focus further analysis on the specific wavelengths that matter most for classification.

The graph is a valuable tool for feature selection and for understanding the key factors behind the model's decisions. It can guide feature engineering, model improvement, and domain-specific interpretation of the dataset.
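As referenced in point 6, here is a minimal sketch that ranks the bands by importance and keeps only the most informative ones, assuming X still holds the spectral DataFrame from the code and that its column headers are the wavelength labels; the 0.01 threshold is purely illustrative:

# Rank bands by importance; X.columns holds the wavelength names from the CSV
band_names = np.array(X.columns)
order = np.argsort(feature_importances)[::-1]  # most important first

# Show the ten most informative wavelengths
for name, score in zip(band_names[order][:10], feature_importances[order][:10]):
    print(f'{name}: {score:.4f}')

# Keep only bands whose importance exceeds an assumed threshold
keep = feature_importances > 0.01
X_reduced = X.loc[:, keep]
print(f'Kept {keep.sum()} of {len(band_names)} bands')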
