
International Journal of Disaster Risk Reduction 55 (2021) 102101


Twitter for disaster relief through sentiment analysis for COVID-19 and
natural hazard crises
Shivam Behl, Aman Rao, Sahil Aggarwal, Sakshi Chadha, H.S. Pannu *
Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India

ARTICLE INFO

Keywords: Disaster management, Deep learning, COVID-19 preparedness, Sentiment analysis

ABSTRACT

In emergencies and disasters, large numbers of people require basic needs and medical attention. In such situations, online social media comes as a possible solution to aid current disaster management methods. In this paper, supervised learning approaches are compared for the multi-class classification of Twitter data. A careful setting of the Multilayer Perceptron (MLP) network layers and the optimizer has shown promising results for classification of tweets into three categories, i.e. 'resource needs', 'resource availability', and 'others', the last being neutral and of no useful information. Public data of the Nepal Earthquake (2015) and the Italy Earthquake (2016) have been used for training and validation of the models, and original COVID-19 data is acquired, annotated, and used for testing. Detailed data analysis of tweets collected during different disasters has also been incorporated in the paper. The proposed model has been able to achieve 83% classification accuracy on the original COVID-19 dataset. Local Interpretable Model-Agnostic Explanations (LIME) is used to explain the behavior and shortcomings of the model on COVID-19 data. This paper provides a simple choice for real-world applications and a good starting point for future research.

* Corresponding author.
E-mail address: [email protected] (H.S. Pannu).

https://doi.org/10.1016/j.ijdrr.2021.102101
Received 15 July 2020; Received in revised form 12 January 2021; Accepted 28 January 2021
Available online 3 February 2021
2212-4209/© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

Natural hazards are events caused by natural forces of the Earth, such as earthquakes, cyclones, hurricanes, floods, volcanic eruptions, and tsunamis, which lead to loss of lives and immense disruption. A natural hazard is a tragic event of atmospheric or geological origin which causes disruption of the social environment, huge damage, and fatalities [1]. Varying estimates place the number of natural hazards across the world in the range of 500-1,000 per year. Victims of a natural hazard event need two kinds of help: one during the disaster, and the other while dealing with its immediate after-effects. To reduce the human and financial loss caused by disasters, many possible ways are being explored in support of current disaster management techniques, and one such solution is the use of online social media (OSM). During a natural hazard event there is a spike in communication, since people seek to contact family and friends in the disaster-affected region and search for information regarding food, shelter, and transportation.

1.1. Role of social media

It has been observed that the internet was available in situations when there was no other medium of communication during a disaster [2]. OSM websites such as Twitter, Facebook, WhatsApp, and Instagram have played an important role in establishing this communication. These websites help in spreading real-time information about disasters by allowing people to share ground-zero information and ask for help. For example, during the 2015 Chennai Floods [3], when many regions remained drowned after heavy rain, various groups and individuals offered help voluntarily and continuously interacted with social media channels to share information and seek help. Many researchers propose three ways to use social media during a natural hazard event [4,5]:

1. Preparing for a natural hazard - social media can help people better prepare for a disaster and understand which organizations will help their communities.
2. Responding during and immediately after the natural hazard - during the disaster, social media help users communicate directly with their families, reporters, volunteer organizations, and other residents, and share information immediately. It also controls rumors, because it is easier for organizations to validate facts.
3. Recovering from the natural hazard event - social media help bring the community together to discuss the event and share information, coordinate recovery efforts, and get information about aid.

OSM data refers to all of the raw insights and information collected from an individual's social media. This unstructured data is ever-growing due to an exponential increase in the popularity of OSM. There is a need for a solution that uses OSM data to bridge the gap between relief efforts, victims, and people willing to help with medicines, shelters, and other resources. It can help us gather numbers, percentages, and statistics of tweets from the victims straightforwardly.

In this paper, we study how to use social media to gather situational information about various disasters and convert it into structured, usable information, in order to devise a system that streamlines the process of disaster management, helps save more lives, and makes relief operations more efficient. The work here focuses on information posted on Twitter, though it can be extended to other platforms such as Facebook and Instagram.

1.2. Research challenges

Existing disaster management techniques suffer from the following research gaps [6,7]:

1. Supervised learning requires the labeling of massive social media datasets, which is expensive and time-consuming and hence not the best approach during disasters. Unsupervised or semi-supervised learning is limited due to the lack of rule-based structural grouping. Therefore, recent techniques are still at an infant stage, struggling for effectiveness and cost-efficiency in real-time applications.
2. The majority of existing studies employ machine learning models using only basic lexicon features. Strong feature extraction techniques should be incorporated with advanced machine learning techniques for accurate and quick performance.
3. Mainstream research on disaster sentiment analysis considers only linguistic features and lexicon aspects to match victim demand with the supply of resources for disaster relief.
4. Emotional association using temporal and spatial features could be deployed to find missing patterns in social media information via the homophily effect. Homophily signifies the similarity in the behavior of a group of people regarding a certain event.
5. Sociological and psychological impacts on human behavior need to be employed to reduce the difference between the algorithmic prototype and the human way of understanding. For example, fear, hope, and excitement surge during disasters and often overpower the normal sense of reasoning.
6. Visual features could also leverage the effectiveness of disaster management to evaluate the analytical factors for disaster-related posts and prioritize relief operations.

This paper has seven sections. The first section has introduced the topic, discussing the role of social media in disaster management and the problem challenges. The second section highlights recent works on social media use in disaster management. The third section discusses the methodology and proposed architecture of the study. The fourth section is about experiments and detailed results. The fifth section discusses model interpretation and explainability. The sixth section contains discussions about the research undertaken in this paper. The seventh and last section lists the research challenges confronted, concludes, and points at future possibilities and research extensions.

2. Recent works

The impact of users' sentiments based upon geographical locations has been visualised for the disastrous Hurricane Sandy [8]. The association of retweets with sentiment divergence and the hurricane's effects has been studied for the disaster situation. A transient overview of natural hazards and social media, along with their basic functions and components, is provided in various other studies [4,9,10], which propose basic guidelines for organizing information exchange between actors during a natural hazard. The role of information in disaster management highlights the need for an effective information system for efficient measures, also recognizing various information sources and databases for handling disaster situations in India [11].

A case study of the Chennai city of India has been performed on WhatsApp and Facebook for managing the natural hazard event known as a 'black swan', one of the biggest floods of 2015 in southern India [12]. The study revealed that data retrieved from WhatsApp and Facebook conversations can be an eye-opener to gaps in resource need and distribution, which in real disasters should help in decision-making. Recovery and disaster response have been studied for India and Pakistan with sentiment analysis utilizing big data [13]. Real-time visualization, categorization, and grouping of social media have been discussed using machine learning techniques for recovery and response. Table 2 presents a comparative analysis of recent works, including the techniques used and the results obtained. Table 1 lists definitions of the abbreviations used in this paper.

Table 1
Abbreviations used in the recent works.

AUC: Area under the curve
CNN: Convolutional Neural Network
DL: Deep Learning
DT: Decision Tree
GBDT: Gradient Boosting Decision Tree
IR: Information Retrieval
KNN: K-Nearest Neighbor
LIME: Local Interpretable Model-Agnostic Explanations
LR: Logistic Regression
MLP: Multi Layer Perceptron
NB: Naïve Bayes
OOV: Out of Vocabulary
OSM: Online Social Media
RF: Random Forest
ROC: Receiver Operating Characteristic curve
SVM: Support Vector Machine
SLDA: Supervised Linear Discriminant Analysis
XAI: Explainable Artificial Intelligence

The relationships between network size and Twitter activity during disasters have been mapped [21]. The study, conducted on five disasters (hurricanes Irene and Sandy, two sets of tornado outbreaks, and flooding in Louisiana), found that among all account types, individuals with "average" sized networks, i.e. those with 100 followers or fewer on Twitter, are most likely to share extensive information about preparation and relief methods for the disaster. The study on hurricanes Irma and Harvey [22] shows how geo-location and filtration of tweets can be used for rapid disaster relief. The study on hurricane Irma [23] cites the usefulness of Twitter data in resource allocation decisions during a disaster; geo-spatial and supervised machine learning techniques were used to categorize geo-located tweets into negative, neutral, or positive groups. An approach is presented to enhance the identification of relevant messages from social media which relies on the relationships between geo-referenced social media messages and geographic features of flood phenomena [24]. Applying the technique to tweets produced during the River Elbe flood of June 2013 in Germany, it concludes that messages near (up to 10 km from) severely flooded areas had a much higher probability of being related to floods.
Table 2
Comparative analysis of recent works. The insights from these papers are used in the experiments conducted in this research.

1. Pranckevičius and Marcinkevičius, 2017 [14]. Dataset: Amazon Reviews: Unlocked Mobile Phones, Kaggle (2016). Techniques: NB, RF, DT, SVM, LR. Results: LR achieved the highest classification accuracy (58.50%).
2. Ramadhani and Goo, 2017 [15]. Dataset: 2,000 tweets for training, 2,000 tweets for testing. Techniques: deep learning, MLP. Results: DL achieved the highest accuracy (75.03%).
3. Ashktorab et al., 2014 [16]. Dataset: Twitter data of 12 selected disasters in North America from 2006. Techniques: classification (KNN, DT, NB, LR), clustering and information extraction. Results: LR achieved 0.86 accuracy and 0.88 AUC.
4. Nguyen et al., 2017 [17]. Dataset: Nepal and California earthquakes, Typhoon Hagupit and cyclone PAM. Techniques: SVM, LR, RF, CNN-I (crisis embedding), CNN-II (Google embedding). Results: CNN-I performed best across the 4 disasters (86.89%, 81.21%, 87.83%, 94.17% AUC).
5. Basu et al., 2019 [18]. Dataset: Twitter data from the Italy and Nepal earthquakes. Techniques: unsupervised (pattern matching and IR) and supervised (GBDT, SVM, NB, CNN), combining word- and character-level embeddings. Results: Nepal (trained on Italy): unsupervised achieved the highest F-score@100, Need (0.191), Avail (0.117); Italy (trained on Nepal): supervised achieved the highest F-score@100, Need (0.087), Avail (0.076).
6. Jianqiang et al., 2018 [19]. Dataset: the Stanford Twitter Sentiment Test (STSTd), SemEval2014, STSGd, SED, SSTd. Techniques: CNN. Results: average accuracy = 85.63%.
7. Akhtar et al., 2018 [20]. Dataset: SemEval-2017. Techniques: MLP with word embedding. Results: macro-average accuracy = 0.646.

The accuracy of the supervised learning algorithms (NB, RF, DT, SVM, LR in Table 1) is compared by analyzing short product reviews from Amazon [14]. The Kaggle (2016) data can be found at https://www.kaggle.com/PromptCloudHQ/Amazon-reviews-unlocked-mobile-phones.

Deep learning methods are used for sentiment analysis of tweets in [15]. A method to extract information from Twitter for quantitative analysis is discussed in [16]: a pipeline is constructed that consists of three main parts, classification (SLDA, SVM, LR in Table 1) to identify tweets reporting damages, clustering to merge similar tweets, and extraction of phrases that report specific information. Neural network-based classification methods have been explored [17] to achieve good results in the classification of Twitter data at the onset of a disaster when no labeled data is available. A comparison of various unsupervised (pattern matching and IR) and supervised (GBDT, SVM, NB, CNN in Table 1) machine learning techniques for extracting useful information from the Twitter datasets of the Nepal and Italy earthquakes is given in [18].

Explainable machine learning and explainable AI (XAI) are quite useful to reverse engineer the test results obtained from a black-box machine learning model for human interpretation and analysis [25,26]. Thus XAI helps to extend the utility of, and trust in, the underlying model. The next section discusses the proposed architecture for disaster management (see Fig. 1).

Fig. 1. Graphical abstract of the proposed work.

3. Proposed architecture

The proposed idea for disaster relief has four phases: (a) data collection, (b) pre-processing, (c) classification based on need or availability of the resource, and (d) presentation of data to the participants of relief operations, namely government and NGOs. However, the main contribution of this research is the classification phase using supervised methods. This involves pre-processing of the tweet text (Table 4), training the proposed MLP model, and classifying the tweets into victim or resource-provider classes, as described in Fig. 2. Since labeled data is available, the classified tweets are compared against their true labels to calculate the testing accuracy. As labeling of data is an expensive and time-consuming task and is difficult to perform while dealing with the effects of a disaster, this study also includes studying the re-usability of the model when trained and tested on geographically detached regions. Since this is a domain-specific problem, we have done a detailed analysis of the text to understand and compare the tweets. This section contains the basic architecture of the proposed MLP model, details of pre-processing, and model training.

Fig. 2. Flow diagram representing the process sequence of input data pre-processing, Word2Vec [27,28], and MLP layers involved in the proposed architecture. The testing phase is similar to the prediction phase.

3.1. Proposed MLP architecture

The proposed deep learning architecture using MLP is depicted in Fig. 3. MLP is a feed-forward artificial neural network comprising multiple layers of perceptrons. A perceptron is a basic unit of a neural network consisting of weights and biases [29]. Training an MLP involves finding the best set of weights and biases to make the best prediction. A perceptron is represented by the equation:

M = f\left(b + \sum_{i=1}^{n} x_i w_i\right)    (1)

where b is the bias, x is the input vector, w is the weight vector, and n is the cardinality of vector x, with i ranging from 1 to n.

MLP has been shown to be a reliable model for the classification of short texts like tweets [20,30]. A variety of deep learning models are available for the MLP architecture. The architecture of the model proposed here is inspired by Akhtar et al. [20] and Mamgain et al. [30], modified to improve performance on the underlying dataset. An MLP is a neural network with multiple layers of artificial neurons stacked one after another. In our architecture, after pre-processing, tweets are passed to a word embedding layer, which transforms each word into its corresponding vector; this is described in subsection 3.3. Generally, the outputs of the embedding layer are flattened or pooled before being passed to fully connected layers; pooling is preferred to reduce computation. In our experiments, global average pooling produced superior results. Since we are working on a smaller dataset, we have used a one-dimensional global average pooling layer after three fully connected layers; the increase in training time was compensated by improved results. A fully connected layer, also known as a dense layer, is a layer in which every input is connected to every output by a weight, representing a matrix multiplication operation. Pooling is a way to down-sample the incoming vectors; in global average pooling, we take the average of all the features in a pool.

The input layer of the proposed model feeds the bag-of-words vector into the embedding layer using the disaster-specific Word2Vec embedding [31]. The sentence vectors thus obtained, as a 2D matrix with each word represented as a (1 x 300)-dimensional row, are fed into repeating units of a dropout layer and a fully connected layer with the Rectified Linear Unit (ReLU) activation function. ReLU is represented as follows:

f(y) = \max(0, y)    (2)

During model training, the weights are updated with the aim of minimising the loss function, Sparse Categorical Crossentropy, calculated as

L(w) = -\frac{1}{n} \sum_{i=1}^{n} \left[ t_i \log \hat{t}_i + (1 - t_i) \log\left(1 - \hat{t}_i\right) \right]    (3)

where t_i is the true label, \hat{t}_i is the predicted label, w denotes the model parameters, and i ranges from 1 to n, the number of classes or target variables. The output layer of the proposed model uses a Softmax activation function, primarily employed for multi-class classification tasks. Softmax normalises the input values into a vector representing the probability of each class, which adds up to 1. The probability of a class y = k, with m total classes, is calculated by the expression:

P(y = k \mid \theta_i) = \frac{e^{\theta_i}}{\sum_{k=0}^{m} e^{\theta_k}}    (4)

The standard exponential function is applied to each element \theta_i, where \theta is the input parameter represented by Equations (5) and (6). Here j is the index of summation, which runs to n, the cardinality of vector \theta. Also,

\theta = b + \sum_{j=1}^{n} wt_j X_j    (5)
y = k with total classes being m. This is calculated by expression:


\theta = b + Wt^{T} X    (6)

where X is the feature vector of the i-th training sample, Wt is the weight vector, and b is the bias.
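To make Equations (1), (2), (4) and (6) concrete, the short NumPy sketch below computes a perceptron activation with ReLU and turns a vector of class scores into softmax probabilities; the input values are illustrative only and are not taken from the paper.

import numpy as np

def relu(y):
    # Eq. (2): f(y) = max(0, y)
    return np.maximum(0, y)

def perceptron(x, w, b):
    # Eqs. (1) and (6): weighted sum of inputs plus bias, passed through the activation
    return relu(b + np.dot(x, w))

def softmax(theta):
    # Eq. (4): exponentiate each score and normalise so the probabilities sum to 1
    e = np.exp(theta - theta.max())  # subtracting the max improves numerical stability
    return e / e.sum()

x = np.array([0.5, -1.0, 2.0])   # toy input vector
w = np.array([0.1, 0.4, -0.2])   # toy weight vector
print(perceptron(x, w, b=0.05))              # single perceptron activation
print(softmax(np.array([1.2, 0.3, -0.8])))   # probabilities for 3 classes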
Models which show low bias and high variance are said to overfit the data, which may result in a very high error on test data. Deep learning models trained on smaller datasets are more likely to show high variance, trying to observe patterns that do not exist; this results in poor accuracy on test data, because the model is not well generalised and the neuron weights are adjusted just to fit the few underlying training examples. Therefore, data augmentation is often used to enhance the size of the training data of a deep learning model for robustness, better overall performance, and model generalization to reduce over-fitting [32].

According to the Cambridge dictionary, deep learning exploits algorithms and tries to mimic the way the human brain works. It is a type of artificial intelligence and was introduced by Professor Geoffrey Hinton [33]. Dropout provides a computationally cheap and effective method for regularization and reduction of overfitting: some neurons are randomly excluded during each training cycle based on the dropout rate. All the technical terms have been defined in Table 3. The Adam optimizer has been found to perform best in terms of accuracy and speed. The hyper-parameter values, which control the learning process, are listed in Table 3, and the algorithm of the proposed tweet classification is provided in Algorithm 1.

Table 3
Definition of technical parameters and their values involved in the proposed technique.

Parameter | Meaning | Value
Input Shape | Tweets are provided to our model as a 1 x 55 dimensional vector of integers. | 55
Dropout Rate | Regularization technique for neural networks. | 0.2
Output Shape | Equal to the number of classes. | 3
Activation (Dense Layer) | Mathematical function that defines the output of a node for given inputs. | "ReLU" (Rectified Linear Unit)
Output Activation | The activation function of the output node. | "softmax"
Max Epochs | Number of passes an algorithm has completed over the training data. | 1000
Callback Monitor | Functions which help monitor or affect model training; an early stopping callback is used to terminate model training when it is no longer learning, to prevent overfitting. | Validation Loss
Callback Patience (early stopping) | The number of epochs our metric may falter before we stop training. | 3
Batch Size | Data is grouped into batches before feeding into the machine learning algorithm. | 128
Optimizer | Method used to update the weights during the training process. | Adam
Learning Rate | Tuning parameter that determines the step size to minimize a loss function [34]. | 0.0001
Validation Ratio | Ratio in which data is divided into validation and training sets. | 0.25
Embedding Dimension | Size of the vector used to represent each word embedding. | 300
Loss | Function to assess the model's predictions. | Sparse categorical crossentropy

Fig. 3. Flow diagram of the proposed Multi Layer Perceptron neural network with word embedding.


3.2. Pre-processing

The dataset contains raw data in the form of sentences with special symbols, email ids, and URLs. Such data is not required for machine learning models, and thus pre-processing is needed, which deals with data preparation and transformation of the dataset to make the input data easier to decode and interpret. Initially, lowerization is performed, i.e. converting all the characters to lower case. After that, all the special words such as emails, mentions, links, and images are eliminated using regular expressions. Special symbols are also cut from the corpus of the tweet. All punctuation, such as full stops, commas, and brackets, is removed using the NLTK library. The Natural Language Toolkit (NLTK) is an easy-to-use platform for Python programs to work with human language, involving lexical resources and over 50 corpora (https://www.nltk.org).

Removal of stop words such as "the", "a", "an", and "in" is also necessary for simplification of the language and efficient vectorization. In the end, lemmatization is performed using the WordNetLemmatizer. The dataset used has a class imbalance problem, and the majority of the tweets are from the 'others' class. This bias influences many machine learning algorithms and can lead to models ignoring the minority class [35,36]. In our experiments, we have used oversampling and under-sampling to deal with this problem (Section 4.6).
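A compact sketch of these cleaning steps, under the assumption that the NLTK resources (punkt, stopwords, wordnet) have been downloaded, is shown below; the exact regular expressions are illustrative rather than the authors' originals.

import re
import string
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# Requires: nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')

STOP_WORDS = set(stopwords.words('english'))
LEMMATIZER = WordNetLemmatizer()

def clean_tweet(text):
    text = text.lower()                                  # lowerization
    text = re.sub(r'https?://\S+|www\.\S+', ' ', text)   # links
    text = re.sub(r'\S+@\S+', ' ', text)                 # email ids
    text = re.sub(r'@\w+', ' ', text)                    # mentions
    text = text.translate(str.maketrans('', '', string.punctuation))  # punctuation
    tokens = [LEMMATIZER.lemmatize(tok) for tok in word_tokenize(text)
              if tok not in STOP_WORDS]                  # stop-word removal + lemmatization
    return ' '.join(tokens)

print(clean_tweet("Need O+ plasma donors urgently, contact @relief_org https://example.org"))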
Algorithm 1. Tweet classification using the proposed architecture layers of MLP.

Result: Text data classified into three classes
x = Get Twitter data;
y = Over-sampling/under-sampling(x);
z = Text lowerizer(y);
p = Removal of special words and stop words(z);
q = Lemmatizer(p);
r = Tokenizer(q);
s = Generation of Bag of Words (BOW) model(r);
t = Create embedding layer(pre-trained embedding matrix, s);
while Epochs <= 1000 do
  Sentence transformation via word embedding(t);
  for MLP blocks do
    Dropout layer(t);
    Dense layer(t);
  end
  Global average pooling(t);
  Dense layer(t);
  Output layer(t);
  if Callback patience on validation loss <= 0 then
    End training
  end
end
Present training output;

3.3. Word embedding

Word embedding is a feature learning technique where words are mapped to vectors of real numbers. The word embedding used in this paper was developed and employed with proof-of-concept in [31]. The authors used disaster-specific Word2Vec embeddings trained using 52 million human-annotated crisis-related tweets collected during 19 different disasters that took place between 2013 and 2015. In word embeddings, the words are represented as low-dimensional vectors that capture both the semantics and the syntax of the word. The Word2Vec model used here employs a continuous bag-of-words architecture with each word represented in 300 dimensions.
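A sketch of how such pre-trained vectors can be wired into the model is given below, assuming the crisis embeddings of [31] are available locally in word2vec format (the file name is a placeholder) and that word_index is the tokenizer's vocabulary; out-of-vocabulary (OOV) words fall back to zero vectors.

import numpy as np
from gensim.models import KeyedVectors

# Placeholder path: the 300-dimensional crisis Word2Vec vectors of [31]
w2v = KeyedVectors.load_word2vec_format('crisis_embeddings_300d.bin', binary=True)

def build_embedding_matrix(word_index, dim=300):
    matrix = np.zeros((len(word_index) + 1, dim))   # row 0 reserved for padding
    for word, idx in word_index.items():
        if word in w2v:                             # OOV words keep the zero vector
            matrix[idx] = w2v[word]
    return matrix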
4.2. System configuration
4. Experiments

This section contains information about the public datasets, system configuration, data analysis, performance metrics, results, and comparative analysis on the public and original datasets.

4.1. Public datasets - Nepal and Italy earthquake

A public dataset prepared by Basu et al. [18] (2019) is used, containing tweet-ids of tweets collected during the Nepal earthquake of 2015 and the Italy earthquake of 2016. We used these tweet-ids to access the tweet text, which covers three different categories: tweets related to needs of resources, tweets related to the availability of resources for donation or relief work, and others, which contain tweets collected during the time of the disaster but not related to the need or availability of resources. In the Nepal dataset, we had 495 need-related tweets, 1,333 availability-related tweets, and 50,018 other tweets. Similarly, the Italy dataset consists of 177 need, 233 availability, and 70,487 other tweets (see Table 4). From this, it is evident that the dataset is skewed and resource-related tweets form only a small part of the whole dataset. Examples of tweets with their labels can be found in Table 5. The testing of the proposed model has also been applied to a new target dataset, i.e. a primary dataset collected from Twitter related to COVID-19. Details of the primary data are discussed in subsection 4.7.

Table 4
Dataset description of the Italy and Nepal earthquakes. There are three categories: need, availability and others.

Categories | Italy-Earthquake (2016) | Nepal-Earthquake (2015)
Need | 177 | 495
Availability | 233 | 1,333
Others | 70,487 | 50,018

Table 5
Snapshot of need and availability tweets from the Italy [18], Nepal [18] and COVID-19 datasets.

Italy dataset:
- "Rieti hospital in Italy is asking for blood donors of all blood types" (Need)
- "We are sending food, water, and medicine to survivors of the 6.2 Magnitude earthquake" (Avail)
Nepal dataset:
- "Pickaxes, shovels and earth-moving equipment required in Nepal" (Need)
- "Turkey, is sending 80-person search, rescue and medicine team to Nepal" (Avail)
COVID-19 dataset:
- "Urgent for a friend's father HELP 76 years old male blood group = A+ Diagnosed with covid19 with pneumonia Admitted in Max Saket Hospital, New Delhi Need a plasma donor who recovered from Covid by a month ago" (Need)
- "I am AB positive. I've had no COVID-19 symptoms since 3 months. I will be glad to donate plasma. I am in Kolkata" (Avail)

4.2. System configuration

A Dell Inspiron 7570 with an Intel i5 8th-generation (8250U) processor, Windows 10, and 8 GB RAM has been used for all the experimental work. Graphics were provided by the integrated Intel 640 graphics along with an NVIDIA GF 940MX graphics card. Python 3.7 with the latest versions of TensorFlow 2, SciKit-Learn, NLTK, and other minor libraries was used for implementing and evaluating the methods. The Python programming language for machine learning is discussed in detail in [37].

4.3. Data analysis

The data collected is in text form, and this helped in analyzing the frequency of the words used in the need and availability sets of both disasters. Visualizing these frequencies in the form of a word cloud, in which the most frequent words are highlighted, gives clarity about the context of the datasets taken into consideration, as visible in Fig. 4. While making these word clouds, it was important to remove the words with outlier frequency. This was done by constructing the frequency chart of the normalized frequency.


The normalized frequency was then capped at 0.04, as the words with higher frequency (e.g. "Nepal", "Italy", and "Earthquake") added no value for the objective of the analysis.

Fig. 4. Word cloud of prominent words.
To understand how different the data distributions of the Nepal and Italy datasets are, we used a simple MLP with TF-IDF to classify tweets into the categories Nepal and Italy. TF-IDF stands for term frequency-inverse document frequency [38]; it is a metric of how important a word is to a document relative to the whole collection or corpus.

This model can classify our two datasets with an AUC-ROC score of 0.94, which implies that both datasets have different distributions and can be distinguished. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) are performance metrics used with various threshold values to assess classification performance [39]. AUC measures the ability of a classifier to differentiate among classes and is a summary measure of the ROC curve. If 0.5 < AUC < 1, the classifier detects more true negatives and true positives than false negatives and false positives, and can therefore separate positive and negative class examples; the best AUC value is 1, which means positive and negative examples are distinguished perfectly. So the AUC-ROC score of 0.94 obtained here is quite promising.
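This distinguishability check can be sketched as follows, assuming nepal_texts and italy_texts are lists of cleaned tweets; scikit-learn's MLPClassifier stands in for the simple MLP, so the exact score may differ from the reported 0.94.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

texts = nepal_texts + italy_texts
labels = [0] * len(nepal_texts) + [1] * len(italy_texts)   # 0 = Nepal, 1 = Italy

X = TfidfVectorizer(max_features=10000).fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          stratify=labels, random_state=42)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200).fit(X_tr, y_tr)
print('AUC-ROC:', roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))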
4.4. Performance metrics

The performance of the proposed MLP architecture has been compared with the state-of-art models LR-TF [14], CNN-W [19], CNN-WF [19], and MLP-TF [20]. The metrics include accuracy, precision, recall, and F1-score, which are described in Table 6. The next sub-section briefly explains the state-of-art techniques.

Table 6
Formulae for performance metrics. True (t), false (f), positive (p) and negative (n).

Accuracy = (tp + tn) / (tp + fp + fn + tn)
Precision (P) = tp / (tp + fp)
Recall (R) = tp / (tp + fn)
F1 Score = 2 * R * P / (R + P)
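For the three-class problem these metrics can be computed directly with scikit-learn, as sketched below; y_true and y_pred are assumed arrays of integer class labels.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

accuracy = accuracy_score(y_true, y_pred)  # (tp+tn)/(tp+fp+fn+tn)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro')       # macro average over the three classes
# average='weighted' yields the weighted scores reported for imbalanced data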
4.5. State-of-art techniques for performance comparison

The techniques considered for performance comparison are discussed briefly as follows:

1. LR-TF [14]: traditional logistic regression with TF-IDF features. Logistic regression (LR) uses the logistic function at its core, which maps a real number onto (0, 1) and is defined by

L(x) = \frac{1}{1 + e^{-x}}    (7)

LR is used to find the probability of one or several classes, such as an image containing a fruit, an animal, a human being, etc., where each class is assigned a probability and the probabilities add up to 1.

2. CNNs were originally used for 2-D images, but for time series and natural language processing they are used as 1-dimensional convolutions. 1-D CNNs with disaster-specific word embedding, with fine-tuning (CNN-WF) and without fine-tuning (CNN-W), have been considered for our experiments. CNNs have been found promising for sentiment analysis of short texts (such as tweets) [19]. Fine-tuning of the two types of pre-trained embeddings, Google embedding and crisis embedding, has been discussed in [17].

3. Multi-Layer Perceptron (MLP) [40] is a type of artificial neural network consisting of multiple layers of perceptrons along with threshold activation. A perceptron is a type of binary linear classifier that decides whether a given vector belongs to one class or the other, for example apples versus bananas. An MLP consists of an input layer, hidden layer(s), and an output layer, and uses the back-propagation learning technique for training. Hidden layers consist of non-linear activation functions (neurons) to separate non-linear data distributions. MLP has been studied for sentiment analysis when investigating rumors on Twitter for information authenticity [20]. MLP combined with TF-IDF is denoted by MLP-TF, and our proposed model is MLP with disaster-specific word embedding (MLP-W).


These techniques were carefully selected after researching recent applications of traditional machine learning and deep learning methods for short-text classification. When training on our dataset, new hyper-parameters were selected using a combination of random search and grid search.

The next sub-section discusses the results analysis and performance comparison of the underlying case study.

4.6. Results

The result comparisons were done in five different cases: trained and tested on the Nepal dataset; trained and tested on the Italy dataset; trained on the Nepal dataset and tested on the Italy dataset; trained on the Italy dataset and tested on the Nepal dataset; and trained and tested on the mix. The results are recorded in Tables 7-9 and Fig. 6, respectively. Max epochs were set to 1000, but an early-stopping callback with patience 3 was used on the validation loss to save time and prevent over-fitting. Epochs are the number of passes over the entire data during training. Bigger datasets are usually split into batches or groups which are passed through the model one at a time. Patience is the number of epochs with no further improvement after which the training process finishes. The dataset is split into training, validation, and testing sets, and the corresponding errors or losses signify the quality of the model. For example, a higher validation loss would signify that the trained model does not generalise well to the validation data and thus is not good enough for the testing phase. Accuracy is the ratio of correct predictions over total input samples [35], and it is inversely proportional to error.

Fig. 5. (a)-(f): curves of accuracy and loss for training and validation on Nepal (a-b), Italy (c-d), and the combined Nepal+Italy dataset (e-f). In (c), the zig-zag pattern is due to the smaller dataset; with more tweets the curve becomes smoother.


During training, we used oversampling of the minority class to deal with the acute data imbalance. In the Nepal dataset, we under-sampled the 'others' class by taking 1,000 tweets, took all the tweets from the resource availability class, and over-sampled the resource need class by taking each tweet twice. Similarly, in the Italy dataset, we under-sampled 'others' by randomly selecting 400 tweets and over-sampled resource availability and need by again selecting each tweet twice. For unbiased results, testing was again done using stratified sampling.

Stratified sampling [41] selects data inversely proportional to the size of each stratum; see, for example, the Neyman allocation studied in [42]. Again, during testing, stratified sampling was done for unbiased testing of the results. However, a different approach was used in testing on the original dataset, as described in subsection 4.7.
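A sketch of the re-balancing described above for the Nepal dataset is given below, assuming df is a pandas DataFrame with 'text' and 'label' columns; the random seed is arbitrary.

import pandas as pd

others = df[df.label == 'others'].sample(n=1000, random_state=42)  # under-sample 'others'
avail  = df[df.label == 'availability']                            # keep all availability tweets
need   = df[df.label == 'need']
need   = pd.concat([need, need])                                   # over-sample: each need tweet twice

balanced = pd.concat([others, avail, need]).sample(frac=1, random_state=42)  # shuffle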
Plots comparing training and validation accuracy against epochs, and training and validation loss against epochs, for the MLP-W model are presented in Fig. 5(a-f). Curves obtained using smaller datasets show a zig-zag pattern, reflecting high sensitivity to noise. Such zig-zag curves are obtained because the noise of the training data and the validation data is always different, and this effect is magnified when dealing with smaller datasets. This behaviour can be observed in Fig. 5.

Table 7
Comparative analysis of accuracy with the model trained on the Nepal dataset. The proposed model MLP-W lagged behind MLP-TF when tested on left-out Nepal tweets but scored highest when tested for re-usability on the Italy dataset.

Method | Accuracy on Nepal | Accuracy on Italy
1. LR-TF [14] | 0.84 | 0.37
2. CNN-W [19] | 0.83 | 0.67
3. CNN-WF [19] | 0.84 | 0.67
4. MLP-TF [20] | 0.86 | 0.60
5. MLP-W [20] | 0.82 | 0.76

Table 8
Comparative analysis of accuracy of models trained on the Italy dataset. MLP-W was second best when tested on the Italy dataset but again scored highest when tested for re-usability on the Nepal dataset. Re-usability scores for this case are lower overall, which can be attributed to the fact that the Italy dataset has fewer tweets to train the models.

Method | Accuracy on Nepal | Accuracy on Italy
1. LR-TF | 0.47 | 0.88
2. CNN-W | 0.46 | 0.92
3. CNN-WF | 0.46 | 0.94
4. MLP-TF | 0.45 | 0.97
5. MLP-W | 0.52 | 0.95

Table 9
Comparative analysis of accuracy of models trained on the mix (Nepal+Italy) dataset. MLP-W lagged a little behind when tested on the mixed dataset but performed best on re-usability scores on the original COVID-19 data. Because the COVID-19 data were not stratified during testing, a detailed performance evaluation is also given in Table 10.

Method | Accuracy on mixed dataset | Accuracy on COVID-19 dataset
1. LR-TF | 0.88 | 0.81
2. CNN-W | 0.87 | 0.81
3. CNN-WF | 0.88 | 0.78
4. MLP-TF | 0.87 | 0.77
5. MLP-W | 0.85 | 0.83

Fig. 6. Accuracy plotted for different scenarios as per Tables 7, 8, and 9.

4.7. Original dataset - COVID-19 tweets

Original data of tweets was collected during the COVID-19 pandemic with the keywords 'COVID-19' and 'COVID19'. These tweets were limited in resource-related content, so we used resource keywords such as 'Plasma', 'Mask', 'PPE', and 'Shelter' to collect some more tweets and added them to the mix. No terms such as need or availability were used, and the keywords used for the collection are not present in the Italy and Nepal datasets, so that the new dataset does not become biased. Data collection was case-insensitive for keywords. The data was manually annotated with the labels 'need', 'availability', and 'others', as present in the open dataset. Retweets, repeated tweets, and tweets in languages other than English were removed to maintain the quality of the dataset. The dataset contains a total of 2,274 tweets: 194 resource need tweets, 125 resource availability tweets, and 1,955 others tweets. We tested the proposed MLP model trained on a mix of the Nepal and Italy datasets on the original dataset and report the results in Tables 9 and 10. Since this dataset is small and only includes a few tweets of the availability and need classes, the whole dataset was used in the evaluation. In Table 10 we calculated the individual accuracy of each class along with the macro average. We observed that the models performed particularly badly on the availability class. The F1-score is also included for a fair comparison of models on unseen data. The weighted average is also used since we are dealing with imbalanced classes. MLP-W performed consistently better on all the metrics and thus was considered the winner. The next section discusses the Explainable Artificial Intelligence (XAI) technique used to evaluate the test results obtained by the MLP model. XAI refers to techniques used to interpret and evaluate machine learning predictions for human analysis [25]. Local Interpretable Model-Agnostic Explanations (LIME) [43] is an XAI technique used to reverse engineer the test results to evaluate the responsible factors.

Table 10
Comparative analysis of algorithms trained on the union of the Nepal and Italy datasets and tested on the COVID-19 dataset. MLP-W performed well in individual as well as mean accuracy for all three classes. It also obtained the highest F1-score among all the models. The weighted average is considered suitable for comparison when imbalanced data is used.

Method | Accuracy: Need | Avail | Others | Mean (macro average) | F1-score: macro average | weighted average
1. LR-TF | 0.82 | 0.16 | 0.85 | 0.61 | 0.57 | 0.83
2. CNN-W | 0.82 | 0.33 | 0.84 | 0.66 | 0.60 | 0.84
3. CNN-WF | 0.87 | 0.38 | 0.80 | 0.68 | 0.57 | 0.82
4. MLP-TF | 0.82 | 0.17 | 0.81 | 0.60 | 0.55 | 0.82
5. MLP-W | 0.85 | 0.41 | 0.85 | 0.70 | 0.61 | 0.85

5. Model interpretation - XAI
Disaster relief management is a critical task, and any machine learning model used for such an important task needs to be interpreted and explained before using it in the real world. Some machine learning models are easy to explain, like linear models such as logistic regression, but many of the models used for real-world applications use deep learning. Deep neural networks can understand complex structures better than traditional machine learning but are also difficult to interpret. In this section, we have used LIME [43] to understand the behavior of the MLP model, particularly for explaining its inferior performance on COVID-19 availability tweets. LIME works on the idea of approximating a black-box model, in this case the MLP, with a more interpretable white-box model constructed on the input data. The functioning of LIME can be described by the following equation:

\mathrm{explanation}(x) = \min_{m \in M} \left[ L(f, m, n_x) + \theta(m) \right]    (8)

where m is the explainable model, M is the family of all possible explanations, \theta is the model complexity, L is the loss function, f is the black-box model to be explained, and n_x is the size of the neighbourhood around instance x. The LIME algorithm optimizes the loss part, while the model complexity is determined by the user. In this experiment, "num_samples" (the neighbourhood used to learn the approximate linear model) has been set to 5,000 and 'cosine similarity' is used as the distance measure.

For this study, we randomly selected tweets and applied LIME to them. Figs. 7 and 8 show the output results generated for misclassified availability tweets. From Fig. 7, it can be observed that words like 'blood' make the tweet more likely to be resource-related, but their impact on the availability class is very small. Further, the word 'donate', which from common sense can be attached to both the need and availability classes, decreases the probability of the availability class (Figs. 7 and 8). On the other hand, the important words in Fig. 9 for a correctly classified tweet make sense: words like 'blood', 'contact' and 'old' make it a resource-related tweet, and the words 'need' and 'please' make it a clear need tweet.

Fig. 7. Results of LIME on the incorrectly classified availability tweet "today registered donate blood plasma nh finally feeling like ive recovered covid19 best feeling tested positive three month ago feel grateful still bit help fight", which gave probabilities of 0.69 for Need, 0.30 for Others and 0.01 for Availability.
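A sketch of this LIME setup, assuming predict_proba is a wrapper that pre-processes a list of raw tweets and returns the MLP's class probabilities, is shown below; num_samples and the distance metric follow the values stated above.

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=['need', 'availability', 'others'])
exp = explainer.explain_instance(
    tweet_text,                 # the raw tweet to be explained
    predict_proba,              # callable: list[str] -> array of shape (n, 3)
    labels=(0, 1, 2),
    num_features=10,
    num_samples=5000,           # size of the neighbourhood used to fit the linear model
    distance_metric='cosine')
print(exp.as_list(label=1))     # word weights for the 'availability' class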


Fig. 8. Results of LIME on the incorrectly classified availability tweet "ab positive Covid-19 symptom since 3 month glad donate plasma Kolkata", which gave probabilities of 0.03 for Need, 0.02 for Availability and 0.95 for Others.

The under-performance in the preceding examples can be ascribed to limitations of the training dataset, which are: (i) the training data constitutes tweets collected during earthquakes, while testing was done on a completely different kind of crisis data, i.e. COVID-19; (ii) smaller training data may lead to high variability in some cases, thus causing bias. This is discussed in further detail in Section 6. However, LIME has provided indispensable information about the deficiencies of the proposed model. These model interpretations can be monitored in the initial stages of disasters. If they can be explained by humans, it can be considered a sanity check. If some abrupt behavior is observed (Fig. 8), the underlying dataset for the classification model can be changed to suit the new crisis needs.

6. Discussion

The challenges faced during the current study are as follows:

1. The annotated dataset available to train our supervised machine learning models is limited.
2. It is found that, out of all the data collected during disasters, only a small proportion is related to disaster needs and availability.
3. The optimal neural network architecture depends upon the underlying data distribution used for training and validation. Therefore, it is a problem without a well-defined solution.

This work explores supervised learning algorithms for the classification of user-generated Twitter data into three classes based on whether the tweet talks about some resource need, resource availability, or neither of these categories, i.e. others. The testing of state-of-art supervised text classification models was done for six different cases. It was found that all the models performed quite similarly when trained and tested on the same dataset. However, when we trained on data collected during one disaster and tested on another disaster with a different data distribution, a steep decline in the performance of some models was observed. Looking at the results, it was found that deep learning algorithms performed much better than traditional machine learning methods.

Overall, MLP-W (the proposed model) performed best. It scored 83% accuracy when trained on data from a mix of the Nepal Earthquake (2015) and Italy Earthquake (2016) and tested on the original COVID-19 dataset. The problem with re-usability was observed when models were trained on the Italy dataset, which can be attributed to the limited number of tweets it contains. Fig. 5(a-f) illustrates the accuracy and loss for training and validation for Nepal 5(a-b), Italy 5(c-d), and Nepal+Italy 5(e-f) as a combined dataset. A zig-zag pattern can be observed in Fig. 5c and d due to the smaller dataset, which becomes smoother with the involvement of more tweets. Also, the model performed poorly in classifying availability tweets when it was trained on the earthquake datasets and tested on the COVID-19 dataset. The availability tweets in the COVID-19 dataset mainly involved new resource names (e.g. plasma) and were more directed towards appreciation of volunteers, as in Fig. 7.

The success of MLP can be attributed to the fact that, being a deep learning model, it has been able to learn complex structures better than traditional machine learning, but it did not rely on sentence structure like a CNN, which mattered since we were dealing with tweets from various sources and thus with varied writing styles. Further, we found that the pre-trained disaster-specific word embedding performed better than TF-IDF. This can be explained by the fact that we were dealing with a small dataset, while the embedding was trained on a much larger disaster-specific dataset. We have also discussed the LIME technique for XAI to interpret models and analyze their shortcomings for real-world applications.

7. Conclusion

Following are the contributions made by our research:

1. Explored the importance and complexity of utilizing social media in disaster relief operations, specifically using the micro-blogging site Twitter.
2. Considered two public datasets (Nepal and Italy Earthquake) for training and one original Twitter dataset (COVID-19) for testing, and explored the re-usability of the model trained on previous disasters.
3. Cleaned and normalized tweets and compared TF-IDF and pre-trained Word2Vec in different scenarios for text vectorization.
4. Compared state-of-art supervised classification algorithms for the segregation of resource need, resource availability, and other tweets.


5. MLP with pre-trained Word2Vec embedding has been found to outperform the alternatives, giving 83% testing classification accuracy on COVID-19 data, with the model trained on the Nepal and Italy datasets.
6. LIME has been used for model interpretation to explain model behaviour and shortcomings.

Fig. 9. Results of LIME on the correctly classified need tweet "55 year old lady max saket need plasma covid recovered person recovered 15 day back blood group please contact daughter arpita", which gave probabilities of 0.71 for Need, 0.14 for Others and 0.15 for Availability.

This paper has presented a multilayer perceptron model for the classification of tweets during disasters into resource needs, resource availability, and others. After adequate pre-processing, tweets were vectorized using a pre-trained disaster-specific word embedding [31]. Public datasets of the Nepal earthquake (2015) and Italy earthquake (2016) were analyzed using word frequency and used for model training and validation. Although data from different locations had very different distributions, many words distinguishing need and availability were common. The original dataset of COVID-19 tweets was used for testing the models. Our proposed model, MLP-W, has shown better results in all the test cases and proved to be usable for data from an unseen disaster.

The limitation of the proposed technique is that its classification accuracy heavily depends on the kind of resource being asked for in the disaster it is being tested on. This is evident from the fact that the model trained on earthquake data performed poorly for the availability class when tested on COVID-19 data. LIME (a technique for XAI) has been used to understand model behaviour; it can be used in the initial stages of a disaster, and in later stages one can identify the performance on tweet classes, thus extending trust towards the model. The future plan is to incorporate more diverse data and improve model generalization.

Declaration of competing interest

There is no conflict of interest.

References

[1] S.L. Cutter, The perilous nature of food supplies: natural hazards, social vulnerability, and disaster resilience, Environment 59 (1) (2017) 4-15.
[2] M. Jahanian, Y. Xing, J. Chen, K. Ramakrishnan, H. Seferoglu, M. Yuksel, The evolving nature of disaster management in the internet and social media era, in: 2018 IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN), IEEE, 2018, pp. 79-84.
[3] M. Yadav, Z. Rahman, The social role of social media: the case of Chennai rains-2015, Social Network Analysis and Mining 6 (1) (2016) 101.
[4] D. Velev, P. Zlateva, Use of social media in natural disaster management, in: Intl. Proc. of Economic Development and Research, vol. 39, 2012, pp. 41-45.
[5] A. Lin, H. Wu, G. Liang, A. Cardenas-Tristan, X. Wu, C. Zhao, D. Li, A big data-driven dynamic estimation model of relief supplies demand in urban flood disaster, International Journal of Disaster Risk Reduction (2020) 101682.
[6] G. Beigi, X. Hu, R. Maciejewski, H. Liu, An overview of sentiment analysis in social media and its applications in disaster relief, in: Sentiment Analysis and Ontology Engineering, Springer, 2016, pp. 313-340.
[7] F. Roth, T. Prior, Volunteerism in Disaster Management: Opportunities, Challenges and Instruments for Improvement, vol. 1, 07 2019.
[8] V.K. Neppalli, C. Caragea, A. Squicciarini, A. Tapia, S. Stehle, Sentiment analysis during hurricane sandy in emergency response, International Journal of Disaster Risk Reduction 21 (2017) 213-222.
[9] N. Pourebrahim, S. Sultana, J. Edwards, A. Gochanour, S. Mohanty, Understanding communication dynamics on twitter during natural disasters: a case study of hurricane sandy, International Journal of Disaster Risk Reduction 37 (2019) 101176.

[10] N. Kankanamge, T. Yigitcanlar, A. Goonetilleke, M. Kamruzzaman, Determining disaster severity through social media analysis: testing the methodology with south east queensland flood tweets, International Journal of Disaster Risk Reduction 42 (2020) 101360.
[11] M.S. Tad, K. Janardhanan, The role of information system in disaster management, International Journal of Management and Social Sciences Research 3 (1) (2014) 16-20.
[12] N. Bhuvana, I.A. Aram, Facebook and whatsapp as disaster management tools during the Chennai (India) floods of 2015, International Journal of Disaster Risk Reduction 39 (2019) 101135.
[13] J.R. Ragini, P.R. Anand, V. Bhaskar, Big data analytics for disaster response and recovery through sentiment analysis, Int. J. Inf. Manag. 42 (2018) 13-24.
[14] T. Pranckevičius, V. Marcinkevičius, Comparison of naive bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification, Baltic Journal of Modern Computing 5 (2) (2017) 221.
[15] A.M. Ramadhani, H.S. Goo, Twitter sentiment analysis using deep learning methods, in: 2017 7th International Annual Engineering Seminar (InAES), 2017, pp. 1-4.
[16] Z. Ashktorab, C. Brown, M. Nandi, A. Culotta, Tweedr: mining twitter to inform disaster response, in: ISCRAM, 2014, pp. 269-272.
[17] D.T. Nguyen, K.A. Al Mannai, S. Joty, H. Sajjad, M. Imran, P. Mitra, Robust classification of crisis-related data on social networks using convolutional neural networks, in: Eleventh International AAAI Conference on Web and Social Media, 2017.
[18] M. Basu, A. Shandilya, P. Khosla, K. Ghosh, S. Ghosh, Extracting resource needs and availabilities from microblogs for aiding post-disaster relief operations, IEEE Transactions on Computational Social Systems 6 (3) (2019) 604-618.
[19] Z. Jianqiang, G. Xiaolin, Z. Xuejun, Deep convolution neural networks for twitter sentiment analysis, IEEE Access 6 (2018) 23253-23260.
[20] M.S. Akhtar, A. Ekbal, S. Narayan, V. Singh, E. Cambria, No, that never happened!! investigating rumors on twitter, IEEE Intell. Syst. 33 (5) (2018) 8-15.
[21] M.T. Niles, B.F. Emery, A.J. Reagan, P.S. Dodds, C.M. Danforth, Social media usage patterns during natural hazards, PloS One 14 (2) (2019).
[22] M. Enenkel, S.M. Saenz, D.S. Dookie, L. Braman, N. Obradovich, Y. Kryvasheyeu, Social Media Data Analysis and Feedback for Advanced Disaster Risk Management, 2018, arXiv preprint arXiv:1802.02631.
[23] D. Reynard, M. Shirgaokar, Harnessing the power of machine learning: can twitter data be useful in guiding resource allocation decisions during a natural disaster? Transport. Res. Transport Environ. 77 (2019) 449-463.
[24] J.P. De Albuquerque, B. Herfort, A. Brenning, A. Zipf, A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management, Int. J. Geogr. Inf. Sci. 29 (4) (2015) 667-689.
[25] K. Främling, Decision theory meets explainable ai, in: International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, Springer, 2020, pp. 57-74.
[26] A. Malhi, S. Knapic, K. Främling, Explainable agents for less bias in human-agent decision making, in: International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, Springer, 2020, pp. 129-146.
[27] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 26 (2013) 3111-3119.
[28] Y. Goldberg, O. Levy, word2vec explained: deriving mikolov et al.'s negative-sampling word-embedding method, 2014, arXiv preprint arXiv:1402.3722.
[29] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer Series in Statistics, New York, 2001, p. 10.
[30] N. Mamgain, E. Mehta, A. Mittal, G. Bhatt, Sentiment analysis of top colleges in India using twitter data, in: 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), IEEE, 2016, pp. 525-530.
[31] M. Imran, P. Mitra, C. Castillo, Twitter as a Lifeline: Human-Annotated Twitter Corpora for NLP of Crisis-Related Messages, 2016, arXiv preprint arXiv:1605.05894.
[32] H.S. Pannu, S. Ahuja, N. Dang, S. Soni, A.K. Malhi, Deep learning based image classification for intestinal hemorrhage, Multimedia Tools and Applications, 2020.
[33] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436-444.
[34] K.P. Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012.
[35] H. Kaur, H.S. Pannu, A.K. Malhi, A systematic review on imbalanced data challenges in machine learning: applications and solutions, ACM Comput. Surv. 52 (4) (2019) 1-36.
[36] H. He, Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications, John Wiley & Sons, 2013.
[37] S. Raschka, V. Mirjalili, Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-Learn, and TensorFlow 2, Packt Publishing Ltd, 2019.
[38] J. Ramos, et al., Using tf-idf to determine word relevance in document queries, in: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, 2003, pp. 133-142, New Jersey, USA.
[39] S.D. Walter, The partial area under the summary roc curve, Stat. Med. 24 (13) (2005) 2025-2040.
[40] S.K. Pal, S. Mitra, Multilayer perceptron, fuzzy sets, classification, IEEE Trans. Neural Network. 3 (5) (1992) 683-697.
[41] E. Liberty, K. Lang, K. Shmakov, Stratified sampling meets machine learning, in: International Conference on Machine Learning, 2016, pp. 2320-2329.
[42] J. Neyman, On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection, in: Breakthroughs in Statistics, Springer, 1992, pp. 123-150.
[43] M.T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?" explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135-1144.
