
Remote Sensing Applications: Society and Environment 30 (2023) 100961


Social media data extraction for disaster management aid using deep learning techniques
Trisha Vishwanath a, Rudresh Deepak Shirwaikar b, *, Washitwa Mani Jaiswal a,
M. Yashaswini a
a Department of Information Science and Engineering, Ramaiah Institute of Technology (RIT), 560054, Karnataka, India
b Department of Computer Engineering, Agnel Institute of Technology and Design (AITD), Goa, India

ARTICLE INFO

Keywords: Disaster management; Machine learning; Convolutional neural network; Transfer learning

ABSTRACT

The process of minimizing the damage caused by disasters through information gathering, information sharing, disaster planning, problem-solving, and decision-making is known as disaster management. Social media generates a lot of information that can be used to provide disaster relief organizations with crucial information. The method under discussion is data extraction from social media during disasters, as social media generates a lot of complex data, much of which is not effectively utilized. Therefore, this study offers a technique for obtaining information from Twitter, the primary social media source, to locate and distribute the crucial aid required in emergency and catastrophe situations.
Two distinct classification problems are covered in the study: the first task is preprocessing and classifying the images relevant to our inquiry, and the second is classifying four different types of natural disasters, namely earthquakes, floods, cyclones, and wildfires. We employ Convolutional Neural Network (CNN) designs together with Transfer Learning (TL) architectures, wherein the model is trained and tested using a secondary data set. Furthermore, real-time tweets and images extracted from Twitter are validated on the trained models and the accuracy is noted. Additional data pretreatment methods such as image augmentation have also been used for preprocessing. Transfer learning, the bottleneck features of InceptionV3, and a fine-tuning model have been included after the disaster classification through the CNN model as a means to improve accuracy up to 98.14%, attaining 0.82 Precision, 0.86 Recall, and 0.84 F1-Score for Cyclone; 0.96 Precision, 0.89 Recall, and 0.92 F1-Score for Earthquake; 0.74 Precision, 0.95 Recall, and 0.83 F1-Score for Flood; and 0.97 Precision, 0.96 Recall, and 0.96 F1-Score for Wildfire classification. Disasters can manifest themselves in different forms and generally have a negative impact on the biosphere, resulting in loss of property and damage to the environment. This study aims to effectively exercise the power of social media data for disaster management aid.

Abbreviations: ML, Machine Learning; DL, Deep Learning; UAVs, Unmanned Aerial Vehicles; CNN, Convolutional Neural Network; SVM, Support Vector Machine; LSTM, Long Short-Term Memory; DR, Disaster Response; RNN, Recurrent Neural Networks; KDD, Knowledge Discovery in Databases; NLP, Natural Language Processing; BERT, Bidirectional Encoder Representations from Transformers; GPU, Graphical Processing Unit; LDA, Latent Dirichlet Allocation; ReLU, Rectified Linear Unit; FC, Fully Connected.
* Corresponding author. Department of Computer Engineering, Agnel Institute of Technology and Design (AITD), Goa University, Assagao, Goa, 403507, India.

E-mail addresses: [email protected], [email protected] (R.D. Shirwaikar).

https://doi.org/10.1016/j.rsase.2023.100961
Received 24 November 2022; Received in revised form 19 February 2023; Accepted 19 March 2023
Available online 28 March 2023
2352-9385/© 2023 Elsevier B.V. All rights reserved.

1. Introduction
In recent times, India has witnessed several major disasters that have completely changed trajectories and reshaped the lives of many. Cyclone Tauktae, which made its landfall in mid-May 2021 in Gujarat, was a devastating tropical cyclone that resulted in at least 67 deaths in India, damaged nearly 88,910 houses and 475 boats, and adversely affected almost 1.49 lakh hectares of crops in Gujarat (The Indian Express, 2021). To provide immediate disaster relief measures such as saving lives and providing shelter, food and clothing, relief agencies should be able to quickly gain access to the evolving real-time status of the affected areas so that they can act aptly. With the increase in smartphone usage and wider internet connectivity, the world's population is just one touch away from accessing information around the world. Accordingly, social media platforms play a major role in the exchange of real-time information during disaster crises; crises are generally complicated, move at different speeds, and have disproportionate impacts. By utilizing the tools offered by social media, the quality of information extracted from social media platforms can increase an organization's ability to respond to emergencies with resilience, and this can be considerably improved by opening new channels for cooperation to support impacted communities. For instance, when a crisis first develops, crisis workers can access data from social networks, on the basis of which management and first responders can determine the origin and gravity of the problem and communicate a unified message to the impacted areas.
Information from social networking sites, which is present in the form of text, images, and audio-visual formats, can once extracted play a pivotal role in disaster management. Social media platforms like Twitter have emerged over the past several years as crucial channels of communication during emergencies. With real-time monitoring, there are increasing calls for businesses and governments to manually monitor and comprehend people's Twitter feeds in real time in order to provide quick responses such as disaster relief, medical assistance, shelter, and donations.
Deep Learning (DL) has become increasingly popular in recent years due to its superior accuracy when models are trained on enormous volumes of data. The main benefit of DL is that it incrementally learns high-level features from data, which does away with the need for hand-crafted feature extraction and subject expertise. The method used to solve problems differs significantly between DL and Machine Learning (ML) techniques. Chamola's literature (Chamola et al., 2020) presents a thorough analysis of DL models in conjunction with other technologies, including geodesics, satellite, remote sensing, smartphone-based, and Unmanned Aerial Vehicle (UAV) platforms. These technologies are used for disaster prediction, crowd evacuation, and post-disaster scenarios. DL is capable of autonomously learning the representation of a complicated system for classification, prediction, or detection. In DL, long causal chains of neural network layers are employed to build higher-level, more abstract computational models of the real system. DL techniques enable representations with many degrees of abstraction, formed by simple non-linear modules that transform the representation at each level into a higher and more abstract one, in order to finally learn invariant properties and incredibly complex functions. DL advances open up new approaches to disaster management.
DL has recently made advances that help people deal with severe disaster impacts. The literature (Linardos et al., 2022) employs CNN, Support Vector Machine (SVM), and Long Short-Term Memory (LSTM), three machine learning architectures frequently used in disaster management. The average CNN accuracy for detecting flood water was 80.07%. The study focuses on using Twitter to obtain geolocation data that isn't offered by other social media sites. It does not, however, concentrate on long-term disaster recovery that is enabled or fuelled by ML and DL advancements.
Despite recent positive results from DL algorithms in extracting knowledge from multiple modalities of data, the use of DL approaches for Disaster Response (DR) activities has so far primarily been researched in an academic environment. The significant application of DL in the literature (Algiriyage et al., 2022) has produced promising findings when comparing the outcomes of CNN with those of other DL algorithms, particularly LSTM and Recurrent Neural Networks (RNN). The strategy of employing DL models for disaster response has benefited from the use of Knowledge Discovery in Databases (KDD) to find pertinent information from data with the help of Natural Language Processing (NLP).
This study aims to form a repository of information by extracting all tweets and images associated with an ongoing disaster, filtering out the relevant information, and storing it in a database accessible to local and central government agencies, news agencies, defence services, NGOs, hospitals, voluntary organizations, and private companies engaged in disaster relief operations. It can create a mechanism to raise awareness and present a visual summary of disasters in real time. By extracting donation and fundraiser links, people residing outside the affected areas can further support the impacted community while being informed in real time of the evolving situation.
The primary objectives of the study were to address the following: (1) the extraction of data from Twitter, including tweets and images, as our primary source of data, as well as the collection of secondary data sources for model training; (2) the classification, using a CNN model, of the extracted images as "Disaster" or "Not a disaster"; (3) the classification of the previously identified "Disaster" images into one of the four chosen disasters of the study (earthquakes, wildfires, floods and cyclones) using three models based on the concepts of CNN, transfer learning with InceptionV3, and fine-tuning; and (4) the validation of the CNN model built for the classification of disasters using real-time data extracted from Twitter.

2. Related work
A large body of research on Machine Learning (ML) and Deep Learning (DL) for disaster management uses social media data to train, validate, and test models. This section highlights the key literature on using ML and DL in disaster management.
Madichetty and Sridevi (2019) suggested a system for separating informative from uninformative tweets during emergencies. They employed CNN for feature extraction and Artificial Neural Networks (ANN) for tweet classification. The model outperformed previous classification models in terms of precision (76%), recall (76%), F1-score (76%), and accuracy (75.9%). However, the results could be improved by adding more layers and by employing advanced deep learning architectures. Wang et al. (2020) focused on using the Bidirectional Encoder Representations from Transformers (BERT) model in the domain of real-life disaster detection, identifying disasters through Twitter. BERT and a comparison of various neural network models were used to detect disasters through tweets. The prediction was more precise when compared to models that did not have keyword position data; the accuracy of the BERT model can be increased to over 90% by including the position. The study found that the recall rate (88.12%) and precision rate (88.14%) were relatively close, suggesting that the model has a similar amount of false-negative and false-positive output that needs more investigation. However, they were able to work only on the text data available on Twitter.
Sindhu (2019) investigated the use of machine learning to manage disaster-related data on social media. In order to aid humanitarian groups by giving them a head start in planning relief activities through the updates received, the paper explores denoising and classifying visual content from social media sites. The strategies for irrelevant image detection and duplicate image detection are a Deep Neural Network and Perceptual Hashing, respectively. The proposed system achieved a total accuracy of 88% and was found to be more universal and effective when compared to previous models.
A method for gathering, managing, extracting, and categorizing disaster information from social media for disaster response and volunteer work was put forth by Xie and Yang (2018). A web-based disaster information and application system was created for natural disasters, using a typhoon as a case study to evaluate the system. According to the experiment's findings, the method achieved an average Precision of 81.42%, Recall of 83.09%, and F-Measure of 81.42%. A disaster taxonomy for emergency response was put forth by Asif et al. (2021), who suggested automating the decision-making process for emergency response by using an object identification algorithm. The same taxonomy was applied to the classification of images using deep learning and an emergency response pipeline. For assessing disaster-related images, types of disasters, and pertinent cues, the VGG16 and InceptionV3 models were utilized. The findings revealed that 96% of images were accurately classified using YOLOv4's disaster taxonomy. The investigation was constrained by limited computational resources; high-performance Graphical Processing Unit (GPU) machines could enhance detection efficiency.
Table 1 summarizes the literature on the techniques used to extract data from social media and classify it for disaster management.
According to the literature review, no studies on ML/DL methods for (long-term) disaster recovery were found. Collectively, most literature uses UAVs, satellite imagery, IoT devices, or data from the internet for Disaster Response (DR), which is often insufficient and inaccurate. There is only limited use of the data available on social media platforms, especially Twitter, which provides structured data along with geospatial features. Furthermore, the literature using Twitter deals with tweets alone, neglecting other forms of data such as URLs and images for classifying disasters into different types and levels of intensity. Moreover, most of the research deals with a single disaster rather than multiple disasters, which reduces the scalability of the models. The existing research does create awareness and mobilize resources, but fails to do so effectively because the amount of data used in training and testing the models is insufficient.
This study focuses on using different types of Twitter data to analyze multiple disasters and determine the extent of relief that can be provided. The text and image data are extracted from Twitter using Twitter APIs accessed through the Tweepy library for Python. The extracted images are classified into relevant and irrelevant images using a CNN. The relevant images are further classified into types of disasters using DL techniques, and aid can then be provided based on the type of disaster and its impact.

Table 1
Summary of literature.

Literature | Objective | Methodology | Result
Jony et al. (Islam Jony et al., 2019) | Usage of visual characteristics to categorize social media images showing signs of natural disasters. | Three CNN models were pre-trained on two separate datasets and a straightforward neural network was used. | Accuracy: visual features (88%), textual metadata (86%).
Putra et al. (Putra et al., 2020) | Monitoring floods with the information extraction approach from social media data. | Classification techniques: Naive Bayes, random forest, support vector machines, logistic regression, and conditional random fields; Stanford NER, geocoding locator and NLP. | F1-score: classification model (82.5%), NER model (73%), geocoding (75%).
Dwarakanath et al. (Dwarakanath et al., 2021) | Review relevant academic research publications and categorize them into one of three categories: damage assessment, post-disaster coordination and response, and early warning and event detection. | SVM, Latent Dirichlet Allocation (LDA), CNN, RNN, LSTM. | AUC: SVM (0.937), LDA (0.95); accuracy: CNN (77.61%).
Peters et al. (Peters and de Albuquerque, 2015) | Identify relevant messages that have the potential to improve situation awareness in the instance of disaster management using social media. | On-topic tweets were categorized using keywords, and the Euclidean distance between each message and the danger zone was calculated. | The results clearly show that communications containing images are closer to the actual occurrence.


3. Methodology
The study's secondary data came from the publicly available website "PyImageSearch" (PyImageSearch, 2019). The CNN model used for the classification of disasters is trained and tested using this image dataset. Twitter's API is the key data source for obtaining crucial aid-related information and the images used to validate the disaster-classification CNN model once the training and testing phases are complete. Fig. 1 depicts the goal and application of the primary and secondary data. There are six crucial steps that need to be carried out each time a set of images or tweets is downloaded from Twitter. The complete proposed method is shown in Fig. 2.
The first step towards implementation is extracting data from Twitter using the API provided by Twitter itself. The study uses a Twitter developer's account with elevated access to extract any data depending upon the requirements, and accesses the Twitter API using the secret keys and tokens obtained from the Twitter platform with an active internet connection. The API also provides an easy and effective mechanism to delete data whenever required. For this study, the extraction of data began with tweets posted by different users over a three-month timeframe beginning in May 2022. The necessary tweets are often obtained within a recent timeframe, i.e., one or two months prior to the given date. The tweets in this study are extracted with the hashtags and keywords trending for a given disaster, yielding the most relevant tweets. These tweets serve two purposes: firstly, gaining general knowledge about the public's perception of the disaster, and secondly, identifying the interested parties that are more likely to be posting images of the very disaster their tweets are about. This enables easy extraction of other disaster-related information from these targeted accounts through their IDs, which are instrumental in obtaining the images.
The images are extracted from the accounts of users who are more likely to share information regarding the disaster on a social networking platform, as mentioned in step 1 of Fig. 2. As seen in Fig. 2, user ID extraction comes first, followed by the isolation of pertinent images from the whole collection. The extracted images frequently include both pertinent and unrelated images that users posted during the same timeframe as the disaster. If this were not addressed, the model would be far less accurate and effective. As a result, a CNN model is used to separate the pertinent images. Since the segregation is a simple process and does not call for a complex architecture, this model is created from scratch without using any pre-built structures.
Following the acquisition of relevant images, test datasets are provided to the CNN model for disaster classification, which was developed for classification and image analysis. Here, the images play a crucial role in judging how accurately the CNN model for classifying disasters categorizes the kind of catastrophe the images show. This technique helps mobilize resources rapidly and effectively by indicating not only what kind of assistance the affected locations would receive but also what kind of assistance the people would require. For instance, images of a wildfire would demand the deployment of firefighting units in the affected areas; the size of the fire would determine the number of units required and provide a preliminary estimate of the damage the wildfire would create. After being categorized into cyclones, earthquakes, wildfires or floods, the images are further processed to estimate the damage done by calculating how much of each image is affected by the disaster. The amount and kind of assistance required by the specific disaster-affected area are both determined by this impact analysis. Accessing contribution links, charitable endeavours, volunteer opportunities, etc. using the disaster information obtained in step 2 will substantially aid in the rapid rehabilitation of as many individuals as possible.

Fig. 1. Purpose of primary and secondary data.

Fig. 2. Overall methodology.
The methodology is divided into two sections. The first is for the extraction of data from Twitter, and the second is for disaster
classification.

3.1. Data extraction and classification of relevant and irrelevant images


Data extraction is the process of extracting all important data from Twitter that might aid in the allocation of resources to disaster-affected areas. The entire process is explained in detail through two major steps, data collection with preprocessing and model building, resulting in only relevant data being used to aid disaster management.
I. Data Collection and Preprocessing
Approximately 1% of tweets are made accessible via the API; these restrictions are site- and API-specific. An analyst who does not want to work directly with the API can use third-party tools to get the data and perhaps perform some analysis. Twitter provides an API to extract data from its platform, but for that function to be available, the user must create a developer's account with elevated access, as done in this study. Once the account is created, different access keys and tokens are provided which are specific to every user. Using these keys and tokens, all forms of data become available; tweets are processed for data regarding dead and affected people, URLs of journals and articles, and donation links.
In this study, to make sure that only relevant data is extracted from Twitter's API as far as possible, proper keywords or hashtags have been used that return all the results related to them. For example, in the case of the California wildfires, keywords like 'California Wildfires' and hashtags such as '#CalWildfire' and '#CaliforniaWildfire' are used. After eliminating duplicate tweets, we employ a selection of unique tweets that are live-extracted for the study. The query is given as hashtags or keywords and the data is extracted within a required timeframe. Retweets are dropped during the search process to prevent redundancy. The media involved, including URLs, images and text, are all extracted and used for helping disaster-affected people.
II. Model building and Implementation
For data extraction, several key features need to be taken into account before obtaining the data, as shown in Fig. 3. The approach to obtaining Twitter data goes through several steps until the relevant information can finally be gathered.
A Twitter developer's account is created from the main Twitter website and elevated access is requested, which is approved depending upon the level of data requirements and confirmed through a questionnaire submitted along with a formal application regarding the usage of data extracted from Twitter's platform. All the access keys and tokens are obtained upon the creation of the project for which data extraction is required. These keys and tokens are of access and bearer types, are unique to every user, and need to be kept safe. Since the keys and tokens are unique to every user, the user is liable for every usage derived from them, because they provide access to all kinds of publicly available Twitter data such as texts (tweets), URLs, images, etc. In this study, these data items have been extracted and then used for disaster management.
First, the task is to identify those people who are most likely to post images related to disasters. Here this is done by using hashtags and keywords for a particular disaster and getting the names of people who sent tweets regarding it. The names are then used to get the user IDs, which allow much broader access than usernames. The user IDs are then used to extract the URLs, images and tweets as per the requirements. The images are downloaded, with an active internet connection, after their respective URLs are obtained. The problem here is that many irrelevant images also get downloaded and need to be segregated.
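As a concrete illustration, the sketch below traces this flow (keyword search, user ID collection, timeline media extraction, image download) using Tweepy's v1.1 interface. It is a minimal sketch, not the authors' exact script: the credential strings are placeholders, and the query, counts and output folder are assumed values.

```python
# Minimal sketch of the extraction flow in Fig. 3 using Tweepy (v4.x).
# Credentials are placeholders; query, counts and paths are assumptions.
import os
import urllib.request

import tweepy

auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Step 1: search recent tweets by disaster hashtag/keyword, dropping retweets.
query = "#CaliforniaWildfire OR #CalWildfire -filter:retweets"
user_ids = set()
for tweet in tweepy.Cursor(api.search_tweets, q=query,
                           lang="en", tweet_mode="extended").items(100):
    user_ids.add(tweet.user.id)      # users likely to post disaster images

# Step 2: pull each user's timeline and collect attached photo URLs.
image_urls = []
for uid in list(user_ids)[:20]:      # around 20 accounts, as in this study
    for status in api.user_timeline(user_id=uid, count=50,
                                    include_rts=False, tweet_mode="extended"):
        media = getattr(status, "extended_entities", {}).get("media", [])
        image_urls += [m["media_url_https"] for m in media
                       if m["type"] == "photo"]

# Step 3: download the images; the relevance CNN filters them afterwards.
os.makedirs("raw_images", exist_ok=True)
for i, url in enumerate(image_urls):
    urllib.request.urlretrieve(url, f"raw_images/{i}.jpg")
```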

Fig. 3. Methodology flow of extracting tweets from Twitter's API.


The relevant images are then segregated from the irrelevant ones using a CNN model. The CNN model gives high segregation accuracy, so only the required images are obtained; these are saved into a folder and used as feed for the classification model. The model uses 40 images for each class of disaster in training, a total of 160 images, and 10 images for every class of disaster in testing, a total of 40 images.
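Since the segregation task is binary (relevant versus irrelevant) and, as noted above, does not call for a complex architecture, a small network suffices. The sketch below is one plausible minimal form of such a relevance filter, assuming 128x128 RGB inputs and a hand-labelled folder layout; the layer sizes are illustrative assumptions, not the authors' reported configuration.

```python
# A minimal binary CNN sketch for separating relevant from irrelevant images.
# Layer sizes, input resolution and directory names are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

relevance_model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # P(image is disaster-relevant)
])
relevance_model.compile(optimizer="adam", loss="binary_crossentropy",
                        metrics=["accuracy"])

# Downloaded Twitter images, labelled into relevant/ and irrelevant/ folders.
train_gen = ImageDataGenerator(rescale=1./255).flow_from_directory(
    "twitter_images/train", target_size=(128, 128),
    batch_size=8, class_mode="binary")
relevance_model.fit(train_gen, epochs=8)        # 8 epochs, as in this study
```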
The tweets, upon extraction, are all relevant as they are extracted with keywords and hashtags. The URLs to be extracted are divided into three types: information, general knowledge, and dead and affected people. The categorization isn't explicitly done but happens implicitly based on the keywords and hashtags used to extract them in the first place. For example, to extract informative tweets about a cyclone, keywords such as "timesofindia" and "Cyclone" are used. To obtain a general perception of the public regarding the disaster, hashtags such as "#indiacyclone" and "#tauktae" have been used. For analyzing the number of dead and affected people, combinations of keywords and hashtags are used, such as "#indiacyclone" and "people died". The images are extracted with a user's ID, which is in turn extracted using keywords or hashtags, for around 20 people who tweeted about a particular disaster, as shown in Fig. 3. The flow diagram depicts every consecutive step of these processes. This process results in a high concentration of relevant information in the tweets and URLs, although the image extraction is primarily based on the assumption that if a user tweets about a disaster, then they are most likely to post images regarding the same (Fig. 3). This results in several irrelevant images cropping up during extraction, as not all the images posted by a user are disaster-related, and these need to be filtered out. Examples of relevant and irrelevant tweets are shown in Table 2.

3.2. Data Classification for disaster prediction


I. Data Collection and Preprocessing
The dataset used for training and testing our model is from a secondary source made publicly available by "PyImageSearch". Of the 4429 total images in the dataset, 930 are earthquake images, 1349 are cyclone images, 1072 are flood images and the remaining 1072 are wildfire images. These images are divided into training and testing datasets to train and test the model respectively. The training set contains 3414 images, while the test set contains 854 images. The model, trained using the training set and validated using the validation set, is later used to classify images in the test dataset, which is the dataset created from the images extracted from Twitter in real time using Twitter's API. The distribution of images into their corresponding classes for training and testing is shown in Fig. 4.
DL architectures learn and pick up features from the data automatically, which usually requires a vast quantity of training data, especially for problems where the input samples are extremely high-dimensional, like images. Image augmentation is a technique for producing several altered versions of the same image by making various adjustments to the source images, and has been implemented in this study for image preprocessing. Scaling, rotating, and zooming are a few of the techniques we have used on our training dataset. Before any extra processing is performed, the training input images are multiplied by a value called "rescale". The RGB coefficients in the original training images range from 0 to 255, but given a normal learning rate, such values would be too high for our CNN model to handle. As a result, we scale our original images by a factor of 1/255 to get values between 0 and 1. The "zoom_range" is an image augmentation parameter for randomly zooming inside images and has been set to 0.2 in our study. Image augmentation keeps the CNN model for the classification of disasters from overfitting and improves generalization; by making these small alterations to the original image, we capture the training disaster images from different perspectives, as they would appear in reality.
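In Keras, these settings correspond directly to ImageDataGenerator arguments. The sketch below reflects the rescale and zoom_range values stated above; the rotation and shift ranges, input size, and directory names are illustrative assumptions.

```python
# Augmentation settings sketch: rescale=1/255 and zoom_range=0.2 as stated
# above; rotation/shift ranges, input size and paths are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,          # map RGB values from [0, 255] into [0, 1]
    zoom_range=0.2,          # random zoom inside the image
    rotation_range=20,       # small random rotations (assumed value)
    width_shift_range=0.1,   # slight translations (assumed values)
    height_shift_range=0.1)

train_gen = train_datagen.flow_from_directory(
    "dataset/train",         # one sub-folder per disaster class
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical")

# Validation/test images are only rescaled, never augmented.
val_gen = ImageDataGenerator(rescale=1./255).flow_from_directory(
    "dataset/test", target_size=(128, 128),
    batch_size=32, class_mode="categorical")
```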
II. Model building and Implementation
The methodology of building and implementing the CNN model for the classification of disasters is illustrated in Fig. 5.
The secondary dataset is run through a CNN model that is built from scratch and intended for the classification of disasters into four classes: Cyclones, Earthquakes, Floods, and Wildfires. The CNN model is improved by using transfer learning, and the performance of the model is verified using the test data. To further improve the accuracy of the CNN model, InceptionV3 and fine-tuning features are built into the previous CNN model along with the transfer learning framework. The primary data that was extracted earlier and classified as "relevant" by the relevance CNN serves as the source for validating the CNN model for disaster classification.
The CNN algorithm is the most well-known and widely used one in the field of DL. CNNs have a structure akin to a traditional neural network and were modelled after the neurons found in human and animal brains (Fig. 6). For this study, a CNN model for the classification of disasters has been built and implemented from scratch. The model consists of four layers, each of which performs a specific function. The CNN is a sequential model wherein the convolution layer, the model's first layer, is composed of convolutional filters: N-dimensional matrices that are applied to the training data, which in this study consists of the input disaster images, to create the output feature map.
Table 2
Examples of relevant and irrelevant tweets.

Sl No. | Relevant Tweets | Irrelevant Tweets
1. | Assam's chief minister, Himanta Biswa Sarma, revealed today that the CM's Relief Fund (CMRF) had given more than 1 lakh students beneficiary cash aid totalling 1000 INR apiece. | Accordingly, I will submit the cheque to the Honourable Chief Minister of Assam.
2. | Shri Aditya Mein and Shri Khanseng Mein donated 15 lakh INR to CMRF on behalf of Lohit Allied Industries for flood relief in Assam. | Funds are already being released for kids.

Fig. 4. Dataset distribution for training and testing.

Fig. 5. Framework of models for disaster classification.

Fig. 6. CNN architecture.

We have built the Pooling layer after the Convolution layer; the output of one layer becomes the input of the following layer. The Pooling layer's primary function is subsampling the feature map: it creates smaller versions of large-scale feature maps while keeping most of the dominant characteristics of the disaster images throughout the pooling stage. The activation function implemented is the Rectified Linear Unit (ReLU), which decides whether or not to fire a neuron in response to a specific input. The Fully Connected (FC) layer forms the last layer of our CNN model; its input is the output of the final Pooling layer, flattened and then fed into it. Here, we change the activation function from ReLU to SoftMax, which is used to obtain the probabilities of an input disaster image belonging to each disaster class.
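A minimal Keras sketch of this convolution, pooling, ReLU, and fully connected with SoftMax arrangement is given below. The filter counts, number of convolution/pooling stages, and dense-layer width are assumptions rather than the authors' exact hyperparameters.

```python
# Sketch of the sequential CNN described above: convolution + ReLU,
# pooling, flatten, fully connected, SoftMax over the four disaster
# classes. Filter counts and layer widths are assumed values.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(128, 128, 3)),   # convolutional filters + ReLU
    layers.MaxPooling2D((2, 2)),                # subsample the feature map
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                           # pooled maps into the FC layer
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="softmax"),      # class probabilities for the
])                                              # four disaster types

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_gen, validation_data=val_gen, epochs=8)
# (train_gen/val_gen from the augmentation sketch above)
```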
TL is the process of using a machine learning model that has already been trained to address a different but related issue (Fig. 7). In this study, transfer learning has been utilized as a means to improve the accuracy of the CNN model for the classification of disasters. The process of using knowledge already obtained to complete another activity and enhance learning is what motivates this technique. We began the learning process by using the patterns discovered while classifying disasters using the CNN model. The TL architecture transfers as much as possible of the information gained by the CNN model during the training process. The lower layers of the TL architecture generate more general features that make knowledge transfer to other tasks easier, whereas the higher layers are more task-specific. To validate our CNN model, the parameters of the model are modified very precisely during the fine-tuning procedure, which involves training the variables of an imported module alongside those of the CNN model. Fine-tuning uses a smaller learning rate for the transferred parameters, while the output layer, trained from scratch, uses a larger learning rate; this significantly improves the model's accuracy. Fine-tuning improves generalization only when sufficient examples are available, and hence is used in this study.

Fig. 7. Transfer learning architecture.
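The sketch below shows one way this two-stage scheme can be realized in Keras: InceptionV3 serves as a frozen bottleneck feature extractor while a new classification head is trained with a larger learning rate, after which the top Inception blocks are unfrozen and fine-tuned with a smaller one. The head size, number of unfrozen layers, and learning rates are assumed values, not the authors' reported settings.

```python
# Transfer-learning sketch: frozen InceptionV3 bottleneck, new SoftMax
# head, then fine-tuning at a smaller learning rate. Head width, layer
# cut-off and learning rates are assumptions.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(128, 128, 3))
base.trainable = False                       # stage 1: bottleneck features only

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),   # four disaster classes
])

# Stage 1: train the new head from scratch with a larger learning rate.
model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_gen, validation_data=val_gen, epochs=10)

# Stage 2: unfreeze the top Inception blocks and fine-tune with a smaller
# learning rate, so the transferred weights are adjusted only slightly.
base.trainable = True
for layer in base.layers[:-30]:              # keep the general lower layers frozen
    layer.trainable = False
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_gen, validation_data=val_gen, epochs=5)
```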

4. Results and discussion


I. CNN model for Data Extraction
The data extraction achieves near-100% accuracy for tweets, journals and extracted links, as they all employ hashtags and keywords which return only tweets relevant to the disaster, i.e., informative, general knowledge, or counts of people dead or affected. However, accuracy is lower for image extraction, because hashtags and keywords are not used there, and a large number of irrelevant images are extracted that need to be filtered out. The accuracies obtained for the CNN model are shown in Figs. 8 to 15 for all four disasters. The accuracy for each disaster is high enough that the segregated images can be used as a validation set for the second CNN model, which classifies images into disasters. Altogether 8 epochs are used for the extraction of images in the CNN model, which on average gives maximum accuracy at around four epochs for each disaster, as shown in Figs. 9, 11, 13 and 15. The loss plots, which give a quantitative measure of the difference between the target and predicted output values for the four disasters, are shown in Figs. 8, 10, 12 and 14.
As seen in Fig. 9, for the identification of flood images the model has its highest accuracy of 0.89 at epoch 8, which is also the overall accuracy. The highest accuracy of 0.95 for the wildfire class, as seen in Fig. 11, is obtained at epoch 4, with an overall accuracy of 0.90. For the earthquake class, the maximum accuracy is 0.99 at epoch 3, with an overall accuracy of 0.88, as shown in Fig. 13. The highest accuracy for the cyclone class, as shown in Fig. 15, is 0.97 at epoch 5, with an overall accuracy of 0.90. The confusion matrix in Table 3 gives a deeper understanding of the model's performance, along with the precision and recall values mentioned in Table 4.
Table 3 shows the confusion matrix for the CNN model for the extraction of images. Here the total number of images taken as part of the validation set is 40, 10 for each disaster, and the table shows the numbers of truly and falsely identified images: the true positive, true negative, false positive, and false negative values. The true positive values for all the disasters are high, indicating that the model is good at identifying images that are truly disaster-related. As there are 5 relevant images for each disaster, the model easily identifies the relevant images, with true positive values equal or close to 5.

Fig. 8. Loss Plot for Extraction of flood images.


Fig. 9. Accuracy Plot for Extraction of flood images.

Fig. 10. Loss Plot for Extraction of Wildfire images.

Fig. 11. Accuracy Plot for Extraction of Wildfire images.

To segregate the irrelevant images, the model needs to identify those images as well. The CNN model for the extraction of images works well enough at finding the unnecessary images; it does this best for earthquake-related images and worst for flood-related images. As shown by the true positive and true negative values, wildfire images are the best identified in terms of both relevance and irrelevance, so the model is best suited to wildfire-related disaster images. The overall accuracy of the CNN model is 90% across all four disasters. The model works best for wildfire-related images and less well for flood-related images, ensuring that images containing wildfire information are readily recognized and classified as relevant. For the other disasters, i.e., cyclones, earthquakes, and floods, the model works with more than satisfactory outcomes, as can be seen from the F1-scores in Table 4.
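The per-class precision, recall, and F1 values in Table 4 follow directly from the counts in Table 3. The short script below reproduces them as a worked check.

```python
# Worked check of Table 4 from the per-class counts in Table 3:
# precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean.
counts = {                 # (TP, TN, FP, FN) per disaster, from Table 3
    "Cyclone":    (5, 3, 2, 0),
    "Earthquake": (4, 5, 0, 1),
    "Flood":      (5, 2, 3, 0),
    "Wildfire":   (5, 4, 1, 0),
}
for name, (tp, tn, fp, fn) in counts.items():
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
# Cyclone: P=0.71 R=1.00 F1=0.83, and so on, matching Table 4.
```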
II. Data Classification

Fig. 12. Loss Plot for Extraction of Earthquake images.

Fig. 13. Accuracy Plot for Extraction of Earthquake images.

Fig. 14. Loss Plot for Extraction of Cyclone images.

The images that were extracted from Twitter and classified as "relevant" were used as the validation dataset for the CNN model for the classification of disasters. On validation, the CNN model obtained an accuracy of 71%, as displayed in Table 5; on testing, it achieved 80% accuracy, with the other performance measures seen in Table 6. The CNN model in conjunction with TL, the bottleneck features of InceptionV3 and fine-tuning resulted in 96.3% accuracy on validation, with the other performance measures reflected in Table 7, and 98.14% accuracy on testing, as seen in Table 8.

Fig. 15. Accuracy Plot for Extraction of Cyclone images.

Table 3
Confusion matrix values of the CNN model for the extraction of images.

Class | True Positive | True Negative | False Positive | False Negative
Cyclone | 5 | 3 | 2 | 0
Earthquake | 4 | 5 | 0 | 1
Flood | 5 | 2 | 3 | 0
Wildfire | 5 | 4 | 1 | 0

Table 4
Performance measure values of the CNN model for the extraction of images.

Performance Measure | Cyclone | Earthquake | Flood | Wildfire
Precision | 0.71 | 1.00 | 0.62 | 0.83
Recall | 1.00 | 0.80 | 1.00 | 1.00
F1-Score | 0.83 | 0.89 | 0.77 | 0.91
Overall accuracy of the CNN model: 0.90

Table 5
Performance measure values of the CNN model for the classification of disasters (validation).

Performance Measure | Cyclone | Earthquake | Flood | Wildfire
Precision | 0.62 | 0.75 | 0.49 | 0.77
Recall | 0.68 | 0.69 | 0.57 | 0.80
F1-Score | 0.65 | 0.71 | 0.52 | 0.78
Overall accuracy of the CNN model: 0.71

Table 6
Performance measure values of the CNN model for the classification of disasters (testing).

Performance Measure | Cyclone | Earthquake | Flood | Wildfire
Precision | 0.71 | 0.82 | 0.57 | 0.83
Recall | 0.76 | 0.77 | 0.69 | 0.87
F1-Score | 0.75 | 0.79 | 0.62 | 0.85
Overall accuracy of the CNN model: 0.80

As seen from the above tables, the CNN model for the classification of disasters scored higher results in testing than in validation. Likewise, for the CNN model with the additional transfer learning, fine-tuning and bottleneck features, testing results improved over validation results. The 'Flood' class consistently achieved the lowest precision score across all models, while 'Wildfire' achieved the highest scores in precision, recall, and F1.

Table 7
Results of the transfer learning, InceptionV3 and fine-tuning model for the classification of disasters (validation).

Performance Measure | Cyclone | Earthquake | Flood | Wildfire
Precision | 0.74 | 0.85 | 0.62 | 0.86
Recall | 0.69 | 0.81 | 0.74 | 0.90
F1-Score | 0.71 | 0.83 | 0.67 | 0.88
Overall accuracy of the transfer learning, InceptionV3 and fine-tuning model: 0.963

Table 8
Results of the transfer learning, InceptionV3 and fine-tuning model for the classification of disasters (testing).

Performance Measure | Cyclone | Earthquake | Flood | Wildfire
Precision | 0.82 | 0.96 | 0.74 | 0.97
Recall | 0.86 | 0.89 | 0.95 | 0.96
F1-Score | 0.84 | 0.92 | 0.83 | 0.96
Overall accuracy of the transfer learning, InceptionV3 and fine-tuning model: 0.98

5. Conclusion
This study defined results based on data, including text and images extracted from Twitter, and explored their associations using different deep learning techniques. We successfully developed two classifiers: the first to classify between relevant and irrelevant images, and the second to classify images into four disaster classes: Cyclones, Earthquakes, Floods and Wildfires. The study incorporates transfer learning, InceptionV3 and a fine-tuning model for the classification of disasters and achieved an accuracy of 98.14%, attaining 0.82 Precision, 0.86 Recall and 0.84 F1-Score for Cyclone; 0.96 Precision, 0.89 Recall and 0.92 F1-Score for Earthquake; 0.74 Precision, 0.95 Recall and 0.83 F1-Score for Flood; and 0.97 Precision, 0.96 Recall and 0.96 F1-Score for Wildfire classification.
The CNN model for validation without TL for the classification of disasters attained an accuracy of only 71%. It is observed that, with the help of TL, this accuracy improved significantly. The overall architecture and attention to feature detail in TL contributed tremendously to improving the CNN model's results, and TL should hence be used in association with CNN models in the future for better performance.
Through this study, a repository of information was formed by extracting all tweets and images associated with the ongoing disaster and filtering out the relevant information. A mechanism was created to raise awareness and to help people prepare for disasters in real time. By extracting donation and fundraiser links, people residing outside the affected areas can further support the impacted community while being kept informed in real time of the evolving situation, using the article links extracted from Twitter.
The study's advantages include the comparison of various DL techniques, which have become increasingly popular in recent years due to their superior accuracy when models are trained on enormous volumes of data. The main benefit of DL is that it incrementally learns high-level features from data, doing away with the need for hand-crafted feature extraction and subject expertise. The insights gained show promising results from using InceptionV3 and fine-tuning. These methods, when combined with CNNs, have the potential to make a significant impact on disaster management. The study's conclusions are important for determining high-impact areas and gauging the efficacy of government policies in the context of catastrophe.
From a future work perspective, there is considerable scope for research. A similar methodology of DL techniques can be employed in various cases under disaster management aid. It can be expanded to pursue further analysis of disasters, classification of a wider variety of hazards, and data extraction from a multitude of social media platforms. The precise location of the images taken and posted can also be extracted. This work is limited to research purposes and can thus be expanded to include real-time applications for people in disaster-struck areas, where direct and appropriate aid can be provided using the extracted location, an assessment of the extent of damage, the feasibility of providing aid, etc. Moreover, the real-time extraction and processing can also be improved by using a larger time frame of up to a year instead of three months.

Ethical statement
1) The material is the authors' original work, which has not been previously published elsewhere.
2) The paper is not currently being considered for publication elsewhere.
3) The paper reflects the authors' own research and analysis in a truthful and complete manner.
4) The paper properly credits the meaningful contributions of co-authors and co-researchers.
5) The results are appropriately placed in the context of prior and existing research.
6) All sources used are properly disclosed (correct citation). Literal copying of text is indicated as such by using quotation marks and giving proper references.
7) All authors have been personally and actively involved in substantial work leading to the paper, and will take public
responsibility for its content.

Declaration of competing interest


The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.


Data availability
Data will be made available on request.

References
Asif, Amna, Khatoon, Shaheen, Hasan, Md Maruf, Alshamari, Majed A., Abdou, Sherif, Elsayed, Khaled Mostafa, Rashwan, Mohsen, 2021. Automatic analysis of social media images to identify disaster type and infer appropriate emergency response. https://doi.org/10.1186/s40537-021-00471-5.
Algiriyage, Nilani, Prasanna, Raj, Stock, Kristin, Doyle, Emma E.H., Johnston, David, 2022. Multi-source multimodal data and deep learning for disaster response: a systematic review. https://doi.org/10.1007/s42979-021-00971-4.
Chamola, Vinay, Hassija, Vikas, Gupta, Sakshi, Goyal, Adit, Guizani, Mohsen, Sikdar, Biplab, 2020. Disaster and pandemic management using machine learning: a survey. https://doi.org/10.1109/jiot.2020.3044966.
Dwarakanath, Lokabhiram, Kamsin, Amirrudin, Rasheed, Rasheed Abubakar, Anandhan, Anitha, Shuib, Liyana, 2021. Automated machine learning approaches for emergency response and coordination via social media in the aftermath of a disaster: a review. https://doi.org/10.1109/access.2021.3074819.
Islam Jony, Rabiul, Woodley, Alan, Perrin, Dimitri, 2019. Flood detection in social media images using visual features and metadata. https://doi.org/10.1109/dicta47822.2019.8946007.
Linardos, Vasileios, Drakaki, Maria, Tzionas, Panagiotis, Karnavas, Yannis L., 2022. Machine learning in disaster management: recent developments in methods and applications. Machine Learning and Knowledge Extraction. https://doi.org/10.3390/make4020020.
Madichetty, Sreenivasulu, Sridevi, M., 2019. Detecting informative tweets during disaster using deep neural networks. https://doi.org/10.1109/comsnets.2019.8711095.
Peters, Robin, de Albuquerque, João Porto, 2015. Investigating images as indicators for relevant social media messages in disaster management.
Putra, Prabu Kresna, Sencaki, D.B., Dinanta, G.P., Alhasanah, F., Ramadhan, R., 2020. Flood monitoring with information extraction approach from social media data. https://doi.org/10.1109/agers51788.2020.9452770.
PyImageSearch, 2019. Detecting natural disasters with Keras and deep learning. https://pyimagesearch.com/2019/11/11/detecting-natural-disasters-with-keras-and-deep-learning/. (Accessed 12 August 2022).
Sindhu, S., 2019. Disaster management from social media using machine learning. https://doi.org/10.1109/icacc48162.2019.8986198.
The Indian Express, 2021. Tauktae. https://indianexpress.com/article/india/tauktae-killed-67-people-left-8629-cattle-dead-in-gujarat-says-centre-7415713/. (Accessed 12 August 2022).
Wang, Zihan, Zhu, Taozheng, Mai, Shice, 2020. Disaster detector on Twitter using Bidirectional Encoder Representations from Transformers with keyword position information. https://doi.org/10.1109/iccasit50869.2020.9368610.
Xie, Jibo, Yang, Tengfei, 2018. Using social media to enhance disaster response and community service. https://doi.org/10.1109/bgdds.2018.8626839.

