Forest Fire Prediction Sem 8 - Review 1
• Forest fires are a major environmental hazard that threatens forest preservation,
causing economic and ecological harm as well as human suffering.
• They endanger not only the forest's wealth, but also the entire ecosystem's
animals and vegetation, causing major disruption of a region's biodiversity,
ecology, and environment.
• In this project, we perform forest fire prediction on two datasets using machine learning classification techniques (SVM, Decision Tree (DT), Logistic Regression, KNN, Random Forest, Stacking and Voting).
Literature Survey
• Author name: T. Preeti, S. Kanakaraddi, A. Beelagi, S. Malagi and A. Sudi
Publisher: IEEE - 2021
Title: Forest Fire Prediction Using Machine Learning Techniques
Advantage: This paper presents a comparative study of different models for predicting forest fires, such as Decision Tree, Random Forest, Support Vector Machine and Artificial Neural Network (ANN) algorithms. Meteorological parameters such as temperature, rain, wind and humidity were used.
Disadvantage: The project cannot make predictions for common provincial locations and does not provide any API or GUI for local assistance.
• Author name: Liqing Si, Lifu Shu, Mingyu Wang, Fengjun Zhao, Feng Chen, Weike Li, Wei Li
Publisher: Natural Hazards Research - 2022
Title: Study on forest fire danger prediction in plateau mountainous forest area
Advantage: The main goals of this paper are to determine the best logistic regression model for predicting the chance of forest fires in the study area and to examine the link between forest fires and climatic factors, vegetation types and terrain. The prediction accuracy of the logistic model reached 86.9% when the ratio of fire to non-fire points was 1:1.5.
Disadvantage: The data source used in the system is complex, and the data are split into fire and non-fire points. The ratio between them must be adjusted accordingly; only then would the model's prediction accuracy rise above approximately 90%.
• Author name: Faroudja Abid
Publisher: Springer - 2020
Title: A Survey of Machine Learning Algorithms Based Forest Fires Prediction and Detection Systems
Advantage: The paper presents a new machine learning method, named DFP-MnBpAnn, based on an Artificial Neural Network.
Disadvantage: A Geographical Information System (GIS) is required; without it, the proposed model cannot work properly.
• Author name: K. V. Murali Mohan, A. R. Satish, K. Mallikharjuna Rao, R. K. Yarava and G. C. Babu
Publisher: IEEE - 2021
Title: Leveraging Machine Learning to Predict Wild Fires
Advantage: This paper proposes a prediction algorithm that helps wildfire rescue workers use the predicted fire level in the initial phases to mitigate the destruction caused by a forest fire. The modelling data were gathered from Natural Resources Canada's real-time dataset, including forest fire and weather information for Alberta, Canada. The size of the area affected by the fire and the duration of the fire were used to evaluate its severity.
Disadvantage: The project lacks precise ignition-point information and has a long response time.
• Author name: D. Rosadi, W. Andriyani, D. Arisanty and D. Agustina
Publisher: IEEE - 2020
Title: Prediction of Forest Fire Occurrence in Peatlands using Machine Learning Approaches
Advantage: This paper applies Support Vector Machine, k-Nearest Neighbours, Logistic Regression, Decision Tree and Naïve Bayes, together with an AdaBoost (Decision Tree based) approach for comparison, to forest fire prediction in peatland areas.
Disadvantage: The project is designed only to predict forest fires for the fire hotspots of peatland areas in Indonesia; extending the prediction to a wider, global scope is yet to be explored.
• Author name: P. Rakshit et al.
Publisher: IEEE - 2021
Title: Prediction of Forest Fire Using Machine Learning Algorithms: The Search for the Better Algorithm
Advantage: The paper predicts forest fire risk with machine learning algorithms using meteorological data. The authors compare different classification models to find which predicts forest fires with the greatest accuracy, and the results obtained with the machine learning classifiers are better and more reliable than those obtained with traditional computing methods.
Disadvantage: Big data measurement and analysis of fire occurrences, and the prediction of fire events, require a more substantial framework.
• Author name: R. Sharma, S. Rani and I. Memon
Publisher: Springer - 2020
Title: A smart approach for fire prediction under uncertain conditions using machine learning
Advantage: The results suggest that the boosted decision tree model, with an Area Under Curve (AUC) value of 0.78, is the most suitable candidate for a fire prediction model. Based on these results, the authors propose a novel IoT-based smart fire prediction system that considers both meteorological data and images for early fire prediction.
Disadvantage: The system does not use oxygen as a parameter for prediction. Since the oxygen level contributes significantly to forest fires, it should not be ignored; including it would also have yielded a more accurate and precise model.
Problem Statement
● Forest fires are one of the most serious problems in today's world, and they are
becoming more common as global warming and other climatic conditions
worsen. Wildfires are fires that, once started, become uncontrollable, threatening
not only the lives of humans but also the wildlife that lives in the forest.
● Wildfires are one of the reasons why endangered flora and fauna, which help maintain our planet's ecological balance, are on the verge of extinction. In this project, we aim to predict forest fires using Machine Learning classification algorithms such as Logistic Regression, Support Vector Classifier, Decision Tree, KNN, Random Forest, Stacking and Voting techniques, so that we can obtain precise and accurate predictions before fires have to be detected.
Objective
● Forest fires can have an economic impact, as many families and communities depend on forests for food, ration and fuel. Thus, forest fire forecasting is an important component of wildfire management.
● The main purpose of our project is to determine whether or not forest fires will occur.
Scope
• The project applies several machine learning algorithms to predict forest fires. A forest fire can be caused by many factors, including changes in climatic conditions, rising temperatures, campfires and so on. The effects are severe: fires threaten wildlife habitat, pollute the environment and burn people's homes.
• One practical solution is to use an automated prediction system to make the prediction and plan the necessary measures beforehand.
• The model aims to predict forest fires well in advance in a simple way, helping to avoid last-minute panic and chaos.
Hardware and Software requirements
• System must be able to predict whether the region will be affected by forest
fire based on the parameters present in the dataset.
• All the risks that can occur must be evaluated and efforts should be
undertaken to manage, mitigate and monitor those risks.
• It should be user-friendly and useful for local use, so that an individual with basic computer knowledge can use the system for prediction.
• The system should predict accurately and instantly while using little memory.
• System should be safe, secure and reliable.
Proposed Methodology
● Initial dataset
● Preprocessing dataset
● Splitting data
● Use classification algorithms
1. Logistic Regression
2. SVM
3. KNN
4. Random Forest
5. Decision Tree
6. Stacking
7. Voting
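The first three steps (initial dataset, preprocessing, splitting) can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the project's code: the toy rows and the column order (oxygen, temperature, humidity, fire occurrence) are hypothetical, preprocessing is limited to dropping rows with missing values and min-max scaling, and the helper names are ours. The scaler is fitted on the training split only, so no information from the test rows leaks into training.

```python
import random

# Hypothetical toy rows in the order (oxygen, temperature_F, humidity, fire_occurrence);
# None marks a missing value. The numbers are illustrative, not from the project's datasets.
RAW = [
    (40, 95, 20, 1), (35, 90, None, 1), (20, 60, 70, 0), (25, 65, 65, 0),
    (45, 100, 15, 1), (22, 58, 80, 0), (38, 92, 25, 1), (18, 55, 85, 0),
]


def clean(rows):
    """Preprocessing: drop rows with missing values, then split features from labels."""
    kept = [r for r in rows if None not in r]
    return [list(r[:-1]) for r in kept], [r[-1] for r in kept]


def train_test_split(X, y, test_ratio=0.25, seed=42):
    """Splitting: shuffle indices with a fixed seed, then cut off a test portion."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(X) * test_ratio))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def fit_min_max(X):
    """Learn each feature's (min, max) from the training rows only."""
    return [(min(col), max(col)) for col in zip(*X)]


def apply_min_max(X, bounds):
    """Scale features with the training bounds; test values may fall outside [0, 1]."""
    return [[(v - lo) / ((hi - lo) or 1) for v, (lo, hi) in zip(x, bounds)] for x in X]


X, y = clean(RAW)
X_train, X_test, y_train, y_test = train_test_split(X, y)
bounds = fit_min_max(X_train)
X_train, X_test = apply_min_max(X_train, bounds), apply_min_max(X_test, bounds)
print(len(X_train), len(X_test))  # 6 1
```

The scaled training and test sets would then be passed to each of the seven classifiers listed above.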
Dataset Information
Forest fire dataset
• Parameters: oxygen, temperature (°F), humidity, fire occurrence.
• Random Forest: Random forest is a supervised learning approach used in machine learning for classification and regression. It is a classifier that combines (averages) the predictions of many decision trees, each trained on a different random subset of the dataset, to improve predictive accuracy.
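A simplified sketch of the random forest idea, assuming toy data: here each "tree" is only a depth-1 decision stump trained on a bootstrap sample using one randomly chosen feature, and the stumps are combined by majority vote. Real random forests (for example scikit-learn's RandomForestClassifier) grow deeper trees and sample a random feature subset at every split. The helper names `fit_stump`, `fit_forest` and `predict_forest` are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label (the first one seen wins a tie)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature):
    """Depth-1 'tree': the threshold on one feature with the fewest training errors.
    Returns (feature, threshold, label_above, label_below)."""
    best_err, best = None, None
    for t in sorted({x[feature] for x in X}):
        above = [lbl for x, lbl in zip(X, y) if x[feature] > t]
        below = [lbl for x, lbl in zip(X, y) if x[feature] <= t]  # never empty
        lab_below = majority(below)
        lab_above = majority(above) if above else lab_below
        err = sum(lbl != lab_above for lbl in above) + sum(lbl != lab_below for lbl in below)
        if best_err is None or err < best_err:
            best_err, best = err, (feature, t, lab_above, lab_below)
    return best


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample using one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature selection
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, x):
    """Combine the stumps' predictions by majority vote."""
    votes = [above if x[f] > t else below for f, t, above, below in forest]
    return majority(votes)


# Hypothetical scaled features [temperature, humidity]; label 1 = fire.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05],
     [0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.05, 0.95]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.99, 0.01]), predict_forest(forest, [0.01, 0.99]))  # 1 0
```

Because each stump sees a different bootstrap sample and feature, individual stumps disagree on borderline points; the vote smooths those errors out, which is the variance reduction the random forest relies on.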
• Support vector machine: A support vector machine (SVM) is a supervised algorithm that can be used for classification or regression. It works by finding the hyperplane that separates two classes of data with the largest margin; in two dimensions, this hyperplane can be visualized as a line separating the two classes.
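To show what finding the separating hyperplane involves, here is a minimal linear SVM trained with Pegasos-style stochastic sub-gradient descent on the regularised hinge loss. This training method is our choice for illustration, not necessarily the solver used in the project (scikit-learn's `SVC`, for instance, uses LIBSVM and also supports non-linear kernels). The toy points are hypothetical, labels are -1/+1, and the bias is learned by appending a constant feature, a simplification that also regularises it.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.
    X: list of feature lists; y: labels in {-1, +1}. A constant 1.0 feature is appended
    so the bias is learned as an ordinary (and therefore also regularised) weight."""
    rng = random.Random(seed)
    data = [(x + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)  # Pegasos step size
            margin = t * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: gradient of the regulariser
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                w = [wj + eta * t * xj for wj, xj in zip(w, x)]
    return w


def svm_predict(w, x):
    """Which side of the hyperplane w . [x, 1] = 0 the point x falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy points: +1 = fire, -1 = no fire.
X = [[2.0, 2.0], [3.0, 1.5], [2.5, 3.0], [-2.0, -2.0], [-1.5, -3.0], [-3.0, -1.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(svm_predict(w, [4.0, 4.0]), svm_predict(w, [-4.0, -4.0]))  # 1 -1
```

Here `lam` plays roughly the role of an inverse of the `C` penalty found in other SVM implementations: a smaller value tolerates fewer margin violations.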
• K-nearest neighbor: K-nearest neighbor (KNN) is a supervised, lazy learning algorithm. It stores the labelled training data and, to make a prediction, assigns a new point the majority class among its k closest training points.
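A minimal KNN sketch of that rule, assuming Euclidean distance, k = 3 and hypothetical scaled features [temperature, humidity]. Because KNN relies on distances, the features should be scaled to comparable ranges before use.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical training data; label 1 = fire.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.3], [0.1, 0.9], [0.2, 0.8], [0.15, 0.7]]
y_train = [1, 1, 1, 0, 0, 0]
print(knn_predict(X_train, y_train, [0.8, 0.15]),
      knn_predict(X_train, y_train, [0.2, 0.85]))  # 1 0
```

There is no training step beyond storing the data, which is why KNN is called a lazy learner; all the work happens at prediction time.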
• Logistic Regression: In this technique, a logistic (sigmoid) function models the probability of the possible outcomes of a single trial, here whether or not a fire occurs.
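A minimal sketch of logistic regression fitted by batch gradient descent on the mean log-loss. The learning rate, number of epochs and toy data are assumptions for illustration; library implementations typically use more sophisticated solvers and add regularisation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; y holds 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # p - y
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of fire (class 1) for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical scaled features [temperature, humidity]; label 1 = fire.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.9, 0.1]) > 0.5, predict_proba(w, b, [0.1, 0.9]) < 0.5)  # True True
```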
• Voting: A voting classifier trains several models and predicts the output class that receives the most votes (hard voting) or the highest average predicted probability (soft voting). It is used to improve performance over the individual models.
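A small sketch of the two voting rules for a binary fire / no-fire decision. The three probabilities are hypothetical outputs of already-trained base models; the example is chosen to show that hard and soft voting can disagree.

```python
from collections import Counter


def hard_vote(labels):
    """Hard voting: the class predicted by the most base models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(fire_probs, threshold=0.5):
    """Soft voting: average the models' fire probabilities, then apply a threshold."""
    mean_p = sum(fire_probs) / len(fire_probs)
    return (1 if mean_p >= threshold else 0), mean_p


# Hypothetical outputs of three base models for one region.
probs = [0.9, 0.4, 0.3]
labels = [1 if p >= 0.5 else 0 for p in probs]  # [1, 0, 0]
print(hard_vote(labels), soft_vote(probs)[0])  # 0 1
```

Two models narrowly vote "no fire", so hard voting says 0, but the first model is very confident, which pulls the average probability above 0.5 under soft voting.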
• Stacking: Stacking combines different regression or classification models: the predictions of several base models are used as inputs to a final meta-model that learns how best to combine them. The two other best-known ensemble modeling techniques are bagging and boosting.
- Bagging averages several comparable high-variance models, each trained on a different bootstrap sample, in order to reduce variance.
- Boosting builds models incrementally, each one focusing on the errors of the previous ones, mainly to reduce bias while keeping variance under control.
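A minimal sketch of the stacking idea in its simpler blending-style variant: the base models are assumed to be already trained (they are stood in for here by two hypothetical scoring functions), and a logistic-regression meta-model is fitted on their outputs over a held-out set. Full stacking usually builds the meta-model's training data from out-of-fold predictions obtained with K-fold cross-validation instead. All names and data below are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical, already-trained base models: each maps [temperature, humidity]
# (scaled to [0, 1]) to a fire score. Stand-ins for e.g. a KNN and a decision tree.
def temperature_model(x):
    return x[0]


def humidity_model(x):
    return 1.0 - x[1]


BASE_MODELS = [temperature_model, humidity_model]


def meta_features(x):
    """Level-1 features: the base models' outputs for x."""
    return [model(x) for model in BASE_MODELS]


def train_meta(X_holdout, y_holdout, lr=1.0, epochs=3000):
    """Fit a logistic-regression meta-model on base-model outputs (held-out data only)."""
    Z = [meta_features(x) for x in X_holdout]
    n, d = len(Z), len(BASE_MODELS)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for z, t in zip(Z, y_holdout):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
            grad_w = [g + err * zj / n for g, zj in zip(grad_w, z)]
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def stacked_predict(w, b, x):
    """Final prediction: the meta-model applied to the base models' outputs."""
    z = meta_features(x)
    return 1 if sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) >= 0.5 else 0


# Hypothetical held-out rows that the base models were not trained on.
X_holdout = [[0.9, 0.2], [0.7, 0.3], [0.8, 0.1], [0.2, 0.8], [0.3, 0.6], [0.1, 0.9]]
y_holdout = [1, 1, 1, 0, 0, 0]
w, b = train_meta(X_holdout, y_holdout)
print(stacked_predict(w, b, [0.95, 0.05]), stacked_predict(w, b, [0.05, 0.95]))  # 1 0
```

Unlike voting, the meta-model learns a weight per base model, so a more reliable base model can end up with more influence.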
• Hyperparameter tuning:
Hyperparameter tuning is an important aspect of controlling the behaviour of a machine learning model. If the hyperparameters are not properly tuned, the learned model parameters will not minimize the loss function well, and the model will make more errors. A hyperparameter value is set before the learning process begins. Models may have many hyperparameters, and finding the best combination can be approached as a search problem. Hyperparameters control over-fitting and under-fitting of the model, and the optimal values often differ between datasets.
To get the best hyperparameters, the following steps are followed:
1. For each proposed hyperparameter setting, the model is evaluated.
2. The hyperparameters that give the best model are selected.
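The two steps above amount to a grid search. A minimal sketch, assuming the hyperparameter being tuned is the number of neighbours k of a KNN classifier and that a separate validation set is available; the 1-D data and the deliberately noisy label at 0.25 are hypothetical, chosen so that k = 1 over-fits the noise. The helper names are ours; libraries such as scikit-learn provide ready-made equivalents (e.g. GridSearchCV, which combines the search with cross-validation).

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k):
    """Majority label among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_accuracy(X_tr, y_tr, X_val, y_val, k):
    """Step 1: evaluate one hyperparameter setting on the validation set."""
    hits = sum(knn_predict(X_tr, y_tr, x, k) == t for x, t in zip(X_val, y_val))
    return hits / len(y_val)


def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Evaluate every candidate k, then (step 2) keep the best one.
    Ties go to the earliest candidate in the grid."""
    scores = {k: validation_accuracy(X_tr, y_tr, X_val, y_val, k) for k in grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical 1-D data; the point 0.25 carries a deliberately noisy "fire" label.
X_tr = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [0.25]]
y_tr = [0, 0, 0, 1, 1, 1, 1]
X_val, y_val = [[0.24], [0.26]], [0, 0]
best_k, scores = grid_search(X_tr, y_tr, X_val, y_val, grid=[1, 3, 5])
print(best_k, scores)  # 3 {1: 0.0, 3: 1.0, 5: 1.0}
```

With k = 1 every validation point copies the noisy neighbour (over-fitting); larger k averages it away, illustrating how a hyperparameter controls over- and under-fitting.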
• K-fold Cross-validation:
In K-fold cross-validation, the dataset is partitioned into K folds (groups) in order to estimate how well the model will perform on unseen data. For instance, if K = 5 the procedure is called 5-fold cross-validation. Each fold is used exactly once as the test set while the remaining K - 1 folds are used for training, and the K scores are averaged.
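A minimal sketch of the fold bookkeeping. The folds here are contiguous; in practice the rows are shuffled first (or the folds stratified by class). `evaluate` is a hypothetical user-supplied function that would train a classifier on the training indices and return its score on the test indices.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds; each fold is the test set once."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def k_fold_score(evaluate, n, k):
    """Average evaluate(train_idx, test_idx) over the k folds; also return per-fold scores."""
    scores = [evaluate(train_idx, test_idx) for train_idx, test_idx in k_fold_indices(n, k)]
    return sum(scores) / k, scores


folds = list(k_fold_indices(10, 3))
print([test for _, test in folds])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

When n is not divisible by K, the first n mod K folds receive one extra row, so every row is tested exactly once. The charts later in this report use K = 10.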
Performance Metrics For Classification
• Accuracy
Accuracy simply measures how often the classifier predicts correctly. It is defined as the ratio of the number of correct predictions to the total number of predictions. Accuracy is useful when the target classes are well balanced but is not a good choice for imbalanced classes.
• Precision
Precision measures how many of the cases predicted as positive were actually positive. It is useful when False Positives are more of a concern than False Negatives. Precision is calculated by dividing the number of true positives by the number of predicted positives (true positives plus false positives).
• Recall (Sensitivity)
Recall measures how many of the actual positive cases the model was able to predict correctly. It is a useful metric when False Negatives are of higher concern than False Positives.
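Writing TP, TN, FP and FN for the numbers of true positives, true negatives, false positives and false negatives (taking "fire" as the positive class, an assumption for this illustration), the accuracy, precision and recall definitions above can be written as:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
```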
• Confusion Matrix
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table of the combinations of actual and predicted values, and it is frequently used to describe the performance of a classification model on a set of test data for which the true values are known.
• F1 Score
F1 Score is the harmonic mean of precision and recall.
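A minimal sketch computing the binary confusion matrix and the four metrics from it. The labels are hypothetical, 1 is assumed to mean "fire", and the matrix uses the rows = actual, columns = predicted layout ([[TN, FP], [FN, TP]]) that scikit-learn's `confusion_matrix` also uses for labels [0, 1].

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN); `positive` is the label treated as 'fire'."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 (harmonic mean of precision and recall)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted fires, how many were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real fires, how many were caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"confusion": [[tn, fp], [fn, tp]], "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


m = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0, 0, 1])
print(m["confusion"], m["accuracy"], round(m["f1"], 4))  # [[3, 1], [2, 2]] 0.625 0.5714
```

Here two of the four real fires are missed, so recall (0.5) is lower than precision (2/3); for fire prediction, where a missed fire is costly, recall deserves particular attention.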
Gantt Chart
Use Case Diagram
Sequence Diagram:
RMMM Model and Planning
RMMM1 (continued)
Mitigation: 2. Balance the dataset out of an imbalanced one, i.e., perform under-sampling and over-sampling.
Monitoring: 2. Cluster the abundant class and resample it with different ratios.
Management: 2. Clean and format the data to make it consistent.
RMMM2 - Risk: Chances of low accuracy
Mitigation: 1. Identify common downfalls of existing ML techniques when learning from under-represented data. 2. Increase the number of training examples.
Monitoring: 1. Analyze the impact of risks and prioritize them according to their severity. 2. Consider alternate values for the training parameters used by the classifiers.
Management: 1. Identify and evaluate alternatives. 2. Compare the final outputs and determine which classifier predicts forest fires with maximum accuracy.
RMMM3 - Risk: Implementing various machine learning modules can turn out to be more difficult than expected.
Mitigation: 1. Make use of simple classifiers for implementation. 2. Train the models accordingly.
Monitoring: 1. Use efficient algorithms for the model. 2. Make appropriate feature selection.
Management: 1. Identify the most complex classifier among the classifiers and work more on that classifier.
Implementation
Random Forest 99.76 89.32 89.07 90.24 90.00 89.00 89.00 94.69
K Nearest Neighbours 93.66 82.52 81.49 87.56 84.00 84.00 84.00 89.86
Support Vector Classifier 81.71 81.55 81.27 88.54 79.00 80.00 79.00 89.55
Decision Tree 99.76 81.55 86.34 88.78 85.00 84.00 85.00 89.30
Training accuracy, Testing accuracy and k-fold accuracy
Figure (Dataset 1 and Dataset 2): Train-Test-Kfold comparison, showing Training Accuracy, Test Accuracy and k-fold (10-fold) accuracy (%) for Logistic Regression, Random Forest, K Nearest Neighbours, Support Vector Classifier and Decision Tree.
k-fold accuracy and Hyperparameter tuning accuracy
Figure (Dataset 1 and Dataset 2): k-fold accuracy and hyperparameter tuning accuracy (%) for Logistic Regression, Random Forest, K Nearest Neighbours, Support Vector Classifier and Decision Tree.
• Dataset 2 (all scores in %):
Classifier                  Accuracy  Precision  Recall  F1
Logistic Regression         81.00     80.00      81.00   80.00
Random Forest               89.00     90.00      89.00   89.00
K Nearest Neighbours        84.00     84.00      84.00   84.00
Support Vector Classifier   80.00     79.00      80.00   79.00
Decision Tree               84.00     85.00      84.00   85.00
Comparison with other research papers
Figure: Accuracy (%) of other research papers, including [1] S. Natekar et al., [3] G. Denim et al., [4] G. E. Sakr et al., [5] N. Omar et al., [6] B. K. Singh et al., [7] M. Anshori et al. and [8] Shidik et al.
ROC-AUC curve
• Dataset 1
roc_auc_score_rf: 0.9808612440191388
roc_auc_score_lr: 0.9880382775119616
roc_auc_score_knn: 0.9282296650717702
roc_auc_score_dt: 0.9138755980861244
roc_auc_score_sv: 0.9856459330143541
• Dataset 2
roc_auc_score_rf: 0.9382255389718077
roc_auc_score_lr: 0.8897180762852405
roc_auc_score_knn: 0.8986318407960199
roc_auc_score_dt: 0.8930348258706468
roc_auc_score_sv: 0.8955223880597015
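For a binary classifier, the ROC-AUC equals the probability that a randomly chosen fire example receives a higher score (for example, a higher predicted fire probability) than a randomly chosen non-fire example, with ties counted as one half. A minimal pairwise sketch of that definition, written for clarity rather than speed; the labels and scores are hypothetical:

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive (fire) example is scored
    above a random negative one; ties count 1/2. O(P * N) pairwise version."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

A score of 1.0 means every fire example is ranked above every non-fire example, and 0.5 corresponds to random ranking; on this reading, the values above close to 0.99 indicate that the models rank almost all fire cases above non-fire cases on Dataset 1.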
Conclusion
• Forest fire prediction is an important aspect of forest fire control. It has a significant impact on resource allocation, mitigation and recovery. In this project, we predicted the occurrence of forest fires using Machine Learning classification algorithms such as Logistic Regression, Support Vector Classifier, Naive Bayes, Decision Tree, KNN and Random Forest.
• In this way, we can help manage forest fires before they destroy the whole forest, since predicting fires early makes prevention easier. In the future, we aim to improve the accuracy and speed of our model. We would also like to make live or on-site predictions using satellite images.
References
[1] F. Abid, "A Survey of Machine Learning Algorithms Based Forest Fires Prediction and Detection Systems," Fire Technology, vol. 57, pp. 559–590, 2021.
[2] T. Preeti, S. Kanakaraddi, A. Beelagi, S. Malagi and A. Sudi, "Forest Fire Prediction Using Machine Learning Techniques," 2021
International Conference on Intelligent Technologies (CONIT), 2021, pp. 1-6.
[3] D. Rosadi, W. Andriyani, D. Arisanty and D. Agustina, "Prediction of Forest Fire Occurrence in Peatlands using Machine
Learning Approaches," 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI),
2020, pp. 48-51.
[4] R. Sharma, S. Rani and I. Memon, "A smart approach for fire prediction under uncertain conditions using machine learning," Multimedia Tools and Applications, vol. 79, pp. 28155–28168, 2020.
[5] L. Si, L. Shu, M. Wang, F. Zhao, F. Chen, W. Li and W. Li, "Study on forest fire danger prediction in plateau mountainous forest area," Natural Hazards Research, 2022, pp. 25–32, ISSN 2666-5921.
[6] P. Rakshit et al., "Prediction of Forest Fire Using Machine Learning Algorithms: The Search for the Better Algorithm," 2021 6th
International Conference on Innovative Technology in Intelligent System and Industrial Applications (CITISIA), 2021, pp. 1-6.
[7] K. V. Murali Mohan, A. R. Satish, K. Mallikharjuna Rao, R. K. Yarava and G. C. Babu, "Leveraging Machine Learning to Predict
Wild Fires," 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), 2021, pp. 1393-1400.
Thank You