Twitter for disaster relief through sentiment analysis for COVID-19 and natural hazard crises

Shivam Behl, Aman Rao, Sahil Aggarwal, Sakshi Chadha, H.S. Pannu *

Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India

* Corresponding author. E-mail address: [email protected] (H.S. Pannu).

https://doi.org/10.1016/j.ijdrr.2021.102101
Received 15 July 2020; Received in revised form 12 January 2021; Accepted 28 January 2021
Keywords: Disaster management; Deep learning; COVID-19 preparedness; Sentiment analysis

Abstract: In emergencies and disasters, large numbers of people require basic needs and medical attention. In such situations, online social media comes as a possible solution to aid current disaster management methods. In this paper, supervised learning approaches are compared for the multi-class classification of Twitter data. A careful setting of the Multilayer Perceptron (MLP) network layers and the optimizer has shown promising results for the classification of tweets into three categories, i.e. 'resource needs', 'resource availability', and 'others', the last being neutral and of no useful information. Public data of the Nepal Earthquake (2015) and Italy Earthquake (2016) have been used for training and validation of the models, and original COVID-19 data is acquired, annotated, and used for testing. A detailed data analysis of tweets collected during different disasters has also been incorporated in the paper. The proposed model has been able to achieve 83% classification accuracy on the original COVID-19 dataset. Local Interpretable Model-Agnostic Explanations (LIME) is used to explain the behavior and shortcomings of the model on COVID-19 data. This paper provides a simple choice for real-world applications and a good starting point for future research.
1. Introduction

Natural hazards are events caused by natural forces of the Earth, such as earthquakes, cyclones, hurricanes, floods, volcanic eruptions, and tsunamis, which lead to loss of lives and immense disruption. A natural hazard is a tragic event with an atmospheric or geological origin which causes social environment disruption, huge damage, and fatalities [1]. There are varying estimates that the number of natural hazards across the world is in the range of 500–1,000 per year. There are two kinds of help needed by the victims suffering in a natural hazard event: one is during the disaster, and the other is while dealing with its immediate after-effects. To reduce the human and financial loss caused by a disaster, many different possible ways are being explored in support of the current disaster management techniques, and one such possible solution is the use of online social media (OSM). During a natural hazard event, there is a spike in communication, since people seek to contact family and friends in the disaster-affected region and search for information regarding food, shelter, and transportation.

It has been observed that the internet was available in situations when there was no other medium of communication during the disaster [2]. OSM websites such as Twitter, Facebook, WhatsApp, and Instagram have played an important role in establishing this communication. These websites help in spreading real-time information about disasters by allowing people to share ground-zero information and ask for help. For example, in the 2015 Chennai Floods [3], many regions remained drowned in heavy rain, and various groups and individuals offered to help voluntarily and were continuously interacting with social media channels to share information and seek help. Many researchers propose three ways to use social media during a natural hazard event [4,5]:

1. Preparing for a natural hazard - social media can help people better prepare for a disaster and understand which organizations will help their communities.
2. Responding during and immediately after the natural hazard - during the disaster, social media help users communicate directly with their families, reporters, volunteer organizations, and other residents and
immediately share information. It also controls rumors, because it is easier for organizations to validate facts.
3. Recovering from the natural hazard event - social media help bring the community together to discuss the event and share information, coordinate recovery efforts, and get information about aid.

OSM data refers to all of the raw insights and information collected from individuals' social media. This unstructured data is ever-growing due to an exponential increase in the popularity of OSM. There is a need for a solution that uses OSM data to bridge the gap between relief efforts, victims, and people willing to help with medicines, shelters, and other resources. It can help us gather numbers, percentages, and statistics of tweets from the victims straightforwardly.

In this paper, we study how to use social media to gather situational information about various disasters and convert it into structured and usable information, to devise a system that streamlines the process of disaster management, helps save more lives, and makes relief operations more efficient. The work here focuses on information posted on Twitter, though it can further be extended to other platforms such as Facebook and Instagram.

1.2. Research challenges

Existing disaster management techniques suffer from the following research gaps [6,7]:

1. Supervised learning requires the labeling of massive social media datasets, which is expensive and time-consuming and hence not the best approach during disasters. Unsupervised or semi-supervised learning is limited due to the lack of rule-based structural grouping. Therefore, the recent techniques are still at an infant stage, struggling for effectiveness and cost-efficiency in real-time applications.
2. The majority of existing studies employ machine learning models using only basic lexicon features. Strong feature extraction techniques should be incorporated with advanced machine learning techniques for accurate and quick performance.
3. Mainstream research on disaster sentiment analysis considers only linguistic features and lexicon aspects to match victim demand and the resources of supply for disaster relief.
4. Emotional association using temporal and spatial features could be deployed to find missing patterns in the social media information via the homophily effect. Homophily signifies the similarity in the behavior of a group of people about a certain event.
5. Sociological and psychological impacts on human behavior need to be employed to reduce the difference between the algorithmic prototype and the human's way of understanding. For example, fear, hope, and excitement rise during disasters and often overpower the normal sense of reasoning.
6. Visual features could also leverage the effectiveness of disaster management to evaluate the analytical factors for disaster-related posts and prioritize relief operations.

This paper has seven sections. The first section has introduced the topic, discussing the role of social media in disaster management and the problem challenges. The second section highlights recent works on social media use in disaster management. The third section discusses the methodology and proposed architecture of the study. The fourth section is about experiments and detailed results. The fifth section discusses model interpretation and explainability. The sixth section contains discussions about the research undertaken in this paper. The seventh section, the last, lists the research challenges confronted, concludes, and points at future possibilities and research extensions.

2. Recent works

The impact of users' sentiments based upon geographical locations has been visualised for the disastrous Hurricane Sandy [8]. The association of retweets with sentiment divergence and the hurricane's effects has been studied during the disaster situation. A transient overview of natural hazards and social media, along with their basic functions and components, is provided in various other studies; [4,9,10] propose basic guidelines for organizing information exchange between actors during a natural hazard. The role of information in disaster management highlights the need for an effective information system for efficient measures, also recognizing various information sources and databases for handling disaster situations in India [11].

A case study of the Chennai city of India has been performed on WhatsApp and Facebook to manage the natural hazard event known as a 'black swan', which was one of the biggest floods of 2015 in southern India [12]. The study revealed that the data retrieved from WhatsApp and Facebook conversations can be an eye-opener to gaps in resource need and distribution, which in real disasters should help in decision-making. Recovery and disaster response have been studied for India and Pakistan with sentiment analysis by utilizing big data [13]. Real-time visualization, categorization, and grouping of social media have been discussed using machine learning techniques for recovery and response. Table 2 presents a comparative analysis of recent works, including the techniques used and the results obtained. Table 1 lists definitions of the abbreviations used in this paper.

The relationships between network size and Twitter activity during disasters were mapped in [21]. The study, conducted on five disasters (hurricanes Irene and Sandy, two sets of tornado outbreaks, and flooding in Louisiana), found that among all account types, individuals with "average"-sized networks, i.e. those with 100 followers or fewer on Twitter, are most likely to share extensive information about the preparation and relief methods of the disaster. A study on hurricanes Irma and Harvey [22] shows how geo-location and filtration of tweets can be used for rapid disaster relief. A study on hurricane Irma [23] cites the usefulness of Twitter data in resource allocation decisions during a disaster; geo-spatial and supervised machine learning techniques were used to categorize geo-located tweets into negative, neutral, or positive groups. An approach is presented to enhance the identification of relevant messages from social media which relies on the relationships between geo-referenced social media messages and geographic features of flood phenomena [24]. It concludes that messages near (up to 10 km from) severely flooded areas had a much higher probability of being related to floods when the technique is applied to examine tweets produced during the River Elbe flood of June 2013 in Germany. The accuracy of supervised learning algorithms (NB, RF, DT, SVM, LR in

Table 1
Abbreviations used in the recent works.

Abbreviation   Full form
AUC            Area under the curve
CNN            Convolutional Neural Network
DL             Deep Learning
DT             Decision Tree
GBDT           Gradient Boosting Decision Tree
IR             Information Retrieval
KNN            K-Nearest Neighbor
LIME           Local Interpretable Model-Agnostic Explanations
LR             Logistic Regression
MLP            Multilayer Perceptron
NB             Naïve Bayes
OOV            Out of Vocabulary
OSM            Online Social Media
RF             Random Forest
ROC            Receiver Operating Characteristic curve
SVM            Support Vector Machine
SLDA           Supervised Linear Discriminant Analysis
XAI            Explainable Artificial Intelligence
Fig. 2. Flow diagram representing the process sequence of input data pre-processing, Word2Vec [27,28], and MLP layers involved in the proposed architecture. The testing phase is similar to the prediction phase.
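As a rough, minimal sketch of the pipeline in Fig. 2, the snippet below averages pre-trained Word2Vec vectors [27,28] over the tokens of a cleaned tweet and feeds the result to a small MLP with a Softmax output. The embedding file name "crisis_w2v.bin", the layer sizes, and the optimizer are illustrative assumptions, not the paper's exact configuration (which is given in Table 3):

```python
# Minimal sketch of the Fig. 2 pipeline: Word2Vec features -> MLP -> Softmax.
# "crisis_w2v.bin" and the layer sizes are assumptions, not the paper's setup.
import numpy as np
from gensim.models import KeyedVectors
from tensorflow import keras

w2v = KeyedVectors.load_word2vec_format("crisis_w2v.bin", binary=True)

def embed(tokens, dim=300):
    """Average the Word2Vec vectors of in-vocabulary tokens (zeros if all OOV)."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

model = keras.Sequential([
    keras.Input(shape=(300,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),  # need / availability / others
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```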
L(w) = −(1/n) Σ_{i=1..n} [ t_i log(t̂_i) + (1 − t_i) log(1 − t̂_i) ]   (3)

where t_i → true label, t̂_i → predicted label, w → model parameters, and i ranges from 1 to n, i.e. the number of classes or target variables. The output layer of the proposed model uses a Softmax activation function, primarily employed for the multi-class classification task. Softmax normalises the input values into a vector representing the probability of each class, which adds up to 1. Here the probability is calculated for a class y = k, with the total number of classes being m. This is calculated by the expression:

P(y = k | θ) = e^(θ_k) / Σ_{j=1..n} e^(θ_j)   (4)

The standard exponential function is applied to each element θ_i of the input parameter θ, as represented by Equations (5) and (6). Here j is the index of summation, which goes up to n, the cardinality of the vector θ. Also,

θ = b + Σ_{j=1..n} w_j^t X_j   (5)
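These definitions can be checked numerically; the short NumPy sketch below (the function names are ours) implements the loss of Eq. (3) and the Softmax of Eq. (4):

```python
# Numerical check of Eqs. (3) and (4): cross-entropy loss and Softmax.
import numpy as np

def cross_entropy(t, t_hat, eps=1e-12):
    """Eq. (3): -(1/n) * sum_i [t_i log t_hat_i + (1 - t_i) log(1 - t_hat_i)]."""
    t_hat = np.clip(t_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(t * np.log(t_hat) + (1 - t) * np.log(1 - t_hat))

def softmax(theta):
    """Eq. (4): e^(theta_k) / sum_j e^(theta_j); max-shift added for stability."""
    e = np.exp(theta - np.max(theta))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))           # three-class example
print(p, p.sum())                                # probabilities add up to 1
print(cross_entropy(np.array([1.0, 0.0, 0.0]), p))
```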
3.2. Pre-processing

The dataset contains raw data in the form of sentences with special symbols, email ids, and URLs. Such data is not required by machine learning models, and thus pre-processing is needed, which deals with data preparation and transformation of the dataset to make the input data easier to decode and interpret. Initially, lowerization is performed, i.e. converting all the characters to lower case. After that, all the special words such as emails, mentions, links, and images are eliminated using regular expressions. Special symbols are also cut from the corpus of the tweet. All punctuation such as full stops, commas, and brackets is removed using the NLTK library. The Natural Language Toolkit (NLTK) is an easy-to-use platform for Python programs to work with human language, involving lexical resources and over 50 corpora (https://www.nltk.org).

Removal of stop words such as "the", "a", "an", "in" is also necessary for simplification of the language and efficient vectorization. In the end, lemmatization is performed using WordNetLemmatizer. The dataset used has a class imbalance problem, and the majority of the tweets are from the 'others' class. This bias influences many machine learning algorithms and can lead to models ignoring the minority class [35,36]. In our experiments, this imbalance is handled by resampling during training, as described in Section 4.6.
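A minimal sketch of this cleaning pipeline using NLTK follows; the regular expressions and the function name are illustrative assumptions rather than the authors' exact code:

```python
# Sketch of the pre-processing steps: lowerization, removal of emails,
# links, mentions and punctuation, stop-word removal, and lemmatization.
# Requires: nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet")
import re
import string
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean_tweet(text):
    text = text.lower()                                   # lowerization
    text = re.sub(r"\S+@\S+", " ", text)                  # email ids
    text = re.sub(r"(https?://\S+|www\.\S+)", " ", text)  # links
    text = re.sub(r"[@#]\w+", " ", text)                  # mentions and hashtags
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = word_tokenize(text)
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words]

print(clean_tweet("Need O+ blood at Patiala, please contact https://t.co/xyz"))
```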
Table 3
Definition of technical parameters and their values involved in the proposed technique.

Parameter   Meaning   Value
Table 4
Dataset description of the Italy and Nepal earthquakes. There are three categories: need, availability and others.

Categories   Italy-Earthquake (2016)   Nepal-Earthquake (2015)

Table 5
Snapshot of need and availability tweets from the Italy [18], Nepal [18] and COVID-19 datasets.

Tweets   Class
capped at 0.04, as the words with higher frequency (e.g. "Nepal", "Italy" and "Earthquake") added no value for the objective of the analysis. To understand how different the data distributions of the Nepal and Italy datasets are, we used a simple MLP with TF-IDF to classify tweets into the categories Nepal and Italy. TF-IDF stands for term frequency–inverse document frequency [38]. It is a metric to calculate how important a word is to a document relative to the whole collection or corpus.

This model can classify our two datasets with an AUC-ROC score of 0.94, which implies that both datasets have different distributions and can be distinguished. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) are performance metrics used with various threshold values to see the classification performance [39]. AUC-ROC shows the degree of separability between classes via a probability curve. AUC measures the ability of a classifier to differentiate among various classes and is a summary measure for the ROC. If 0.5 < AUC < 1, the classifier detects more true negatives and true positives than false negatives and false positives, and can therefore separate positive and negative class examples; the best AUC value is 1, which means it can distinguish positive and negative examples perfectly. So the AUC-ROC score of 0.94 by the proposed model is quite promising.
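A minimal sketch of this distinguishability check, using scikit-learn components in place of the authors' exact network (`nepal_texts` and `italy_texts` are hypothetical lists of cleaned tweets):

```python
# Sketch: can a TF-IDF + MLP classifier tell Nepal tweets from Italy tweets?
# A high AUC-ROC (the paper reports 0.94) implies the distributions differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

texts = nepal_texts + italy_texts                 # hypothetical cleaned tweets
labels = [0] * len(nepal_texts) + [1] * len(italy_texts)

X = TfidfVectorizer(min_df=2).fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X_tr, y_tr)
print("AUC-ROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```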
4.4. Performance metrics

The performance of the proposed MLP architecture has been compared with the state-of-art models LR-TF [14], CNN–W [19], CNN-WF [19], and MLP-TF [20]. The metrics include accuracy, precision, recall, and F1-score, which are described in Table 6. The next sub-section briefly explains the state-of-art techniques.

4.5. State-of-art techniques for performance comparison

The techniques which are considered for performance comparison are discussed briefly as follows:

1. LR-TF [14]: traditional logistic regression with TF-IDF features. Logistic regression (LR) uses the logistic function at its core, which maps a real number onto (0, 1) and is defined by

L(x) = 1 / (1 + e^(−x))   (7)

LR is used to find the probability of a class or several classes, such as an image containing a fruit, an animal, a human being, etc., where each class gets an assigned probability and the probabilities add up to 1.

2. CNNs were originally used for 2-D images but, for time series and Natural Language Processing, are used as 1-dimensional convolutions. 1-D CNNs with disaster-specific word embedding, with fine-tuning (CNN-WF) and without fine-tuning (CNN–W), have been considered for our experiments. CNNs have been found promising for sentiment analysis of short texts (such as tweets) [19]. Fine-tuning of, for example, the Google embedding and the crisis embedding has been discussed [17] for the two types of pre-trained embedding.

3. A Multilayer Perceptron (MLP) [40] is a type of artificial neural network consisting of multiple layers of perceptrons along with threshold activation. A perceptron is a binary linear classifier that decides which of two classes a given vector belongs to, for example apples versus bananas. An MLP consists of an input layer, hidden layer(s), and an output layer, and uses the back-propagation learning technique for training. Hidden layers consist of non-linear activation functions (neurons) to separate non-linear data distributions. MLP has been studied in sentiment analysis for investigating rumors on Twitter for information authenticity [20]. MLP combined with TF-IDF is denoted by MLP-TF, and our proposed model is MLP with disaster-specific word embedding (MLP-W).
These techniques were carefully selected after researching recent applications of traditional machine learning and deep learning methods for short-text classification. When training on our dataset, new hyper-parameters were selected using a combination of random search and grid search.

The next sub-section discusses the results analysis and performance comparison of the underlying case study.

4.6. Results

The result comparisons were done in five different cases: trained and tested on the Nepal dataset, trained and tested on the Italy dataset, trained on the Nepal dataset and tested on the Italy dataset, trained on the Italy dataset and tested on the Nepal dataset, and trained and tested on the mix. The results are recorded in Tables 7–9 and Fig. 6, respectively. Max epochs were set to 1000, but an early-stop callback with patience 3 on the validation loss was used to save time and prevent over-fitting. Epochs are the number of passes over the entire data during training. Bigger datasets are usually split into batches or groups, which are passed through the model one at a time. Patience is the number of epochs with no further improvement after which the training process finishes. The dataset is split into training, validation and testing sets. The corresponding errors or losses signify the quality of the model. For example, a higher validation loss would signify that the trained model does not generalise well to the validation data and thus is not good enough for the testing phase. Accuracy is the ratio of correct predictions over total input samples [35], and it is inversely proportional to error.
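This training setup maps directly onto a standard Keras callback; a minimal sketch, assuming `model` and the train/validation splits from the earlier steps (the batch size is our assumption):

```python
# Sketch: train for at most 1000 epochs, stopping early when the
# validation loss shows no improvement for 3 consecutive epochs.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=1000,
                    batch_size=32,          # assumed; not stated in the text
                    callbacks=[early_stop])
```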
During training, we used oversampling of the minority class to deal with the acute data imbalance. In the Nepal dataset, we under-sampled the others class by taking 1000 tweets, took all the tweets from the resource availability class, and over-sampled the resource need class by taking each tweet twice. Similarly, in the Italy dataset, we under-sampled others by randomly selecting 400 tweets and over-sampled resource availability and need by again selecting each tweet twice. For unbiased results, testing was done using stratified sampling.

Stratified sampling [41] selects data inversely proportional to the size of their stratum; see, for example, the Neyman allocation studied in [42]. Again, during testing, stratified sampling was done for unbiased testing of the results. However, a different approach was used for testing on the original dataset, as described in subsection 4.7.
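A minimal sketch of the Nepal-dataset recipe just described (a stratified hold-out first, then rebalancing of the training portion only; the tweet lists are hypothetical variables):

```python
# Sketch: stratified test split, then rebalance the training data by
# capping 'others' at 1000 tweets and duplicating every 'need' tweet.
from sklearn.model_selection import train_test_split

texts = others_tweets + availability_tweets + need_tweets   # hypothetical lists
labels = (["others"] * len(others_tweets)
          + ["availability"] * len(availability_tweets)
          + ["need"] * len(need_tweets))

# Stratified sampling preserves class proportions in the held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)

others_tr = [x for x, y in zip(X_train, y_train) if y == "others"][:1000]
avail_tr = [x for x, y in zip(X_train, y_train) if y == "availability"]
need_tr = [x for x, y in zip(X_train, y_train) if y == "need"] * 2   # each twice
X_bal = others_tr + avail_tr + need_tr
y_bal = (["others"] * len(others_tr) + ["availability"] * len(avail_tr)
         + ["need"] * len(need_tr))
```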
Plots comparing training and validation accuracy against epochs, and training and validation loss against epochs, for the MLP-W model are presented in Fig. 5(a–f). Curves obtained using smaller datasets show a zig-zag pattern, reflecting greater sensitivity to noise. Such zig-zag curves are obtained because the noise of the training data and the validation data is always different, and this effect is magnified when dealing with smaller datasets. This behaviour can be observed in Fig. 5.

Fig. 5. (a)–(f): curves for accuracy and loss during training and validation for Nepal (a–b), Italy (c–d), and Nepal+Italy (e–f) as a combined dataset. In (c), the zig-zag pattern is due to the smaller dataset; with more tweets the curve becomes smoother.

Table 8
Comparative analysis of the accuracy of models trained on the Italy dataset. MLP-W was second best when tested on the Italy dataset but again scored the most when tested for re-usability on the Nepal dataset. Reusability scores for this case are again lower, which can be attributed to the fact that the Italy dataset has fewer tweets to train the models.

S.No.   Method   Accuracy on Nepal   Accuracy on Italy
1.      LR-TF    0.47                0.88
2.      CNN–W    0.46                0.92
3.      CNN-WF   0.46                0.94
4.      MLP-TF   0.45                0.97
5.      MLP-W    0.52                0.95

Table 9
Comparative analysis of the accuracy of models trained on the mix (Nepal+Italy) dataset. MLP-W lagged a little behind when tested on the mixed dataset but performed best on reusability scores on the original COVID-19 data. Because the COVID-19 data were not stratified during testing, a detailed performance evaluation is also done in Table 10.

S.No.   Method   Accuracy on Mixed Dataset   Accuracy on COVID-19 Dataset
4.7. Original dataset - COVID-19 tweets

The keywords used for the collection are not present in the Italy and Nepal datasets, so that the new dataset does not become biased. Data collection was case-insensitive for keywords. The data was manually annotated with the labels 'need', 'availability', and 'others', as present in the open dataset. Retweets, repeated tweets, and tweets in languages other than English were removed to maintain the quality of the dataset. The dataset contains a total of 2,274 tweets: 194 resource need tweets, 125 resource availability tweets and 1,955 others tweets. We tested the proposed MLP model trained on a mix of the Nepal and Italy datasets on the original dataset and report the results in Tables 9 and 10. Since this dataset is small and only includes a few tweets of the availability and need classes, the whole dataset was used in the evaluation. In Table 10 we calculated the individual accuracy of each class along with their macro average. We observed that the models performed particularly badly on the availability class. The F1-score is also included for a fair comparison of models on unseen data. The weighted average is also used, since we are dealing with imbalanced classes. MLP-W performed consistently better on all the metrics and thus was considered the winner. The next section discusses the Explainable Artificial Intelligence (XAI) technique used to evaluate the test results obtained by the MLP model. XAI refers to techniques used to interpret and evaluate machine learning predictions for human analysis [25]. Local interpretable model-agnostic explanations (LIME) [43] is an XAI technique used to reverse engineer the test results and evaluate the responsible factors.
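The per-class scores and the macro and weighted averages reported in Table 10 can be computed as below (`y_true` and `y_pred` are hypothetical arrays of test labels and predictions):

```python
# Sketch: per-class metrics plus macro and weighted averages for the
# imbalanced COVID-19 test set.
from sklearn.metrics import classification_report, f1_score

print(classification_report(y_true, y_pred,
                            labels=["need", "availability", "others"]))
print("macro F1:   ", f1_score(y_true, y_pred, average="macro"))
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))  # weighted by class support
```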
Table 10
Comparative analysis of algorithms trained on the union of the Nepal and Italy datasets and tested on the COVID-19 dataset. MLP-W performed well in individual as well as mean accuracy for all three classes. It was also able to get the highest F1-score among all the models. The weighted average is considered suitable for comparison when imbalanced data is used.

S.No   Method   Accuracy   F1-Score
5. Model interpretation - XAI

Disaster relief management is a critical task, and any machine learning model which is used for such an important task needs to be interpreted and explained before using it in the real world. Some machine learning models are easy to explain, like linear models such as logistic regression, but many of the models used for real-world applications use deep learning. Deep neural networks can understand complex structures better than traditional machine learning but also suffer from being difficult to interpret. In this section, we have used LIME [43] to understand the behavior of the MLP model, particularly for explaining its inferior performance on COVID-19 availability tweets. LIME works on the idea of approximating a black-box model, in this case the MLP, with a more interpretable white-box model constructed on the input data. The functioning of LIME can be described by the following equation:

explanation(x) = min_{m ∈ M} [ L(f, m, n_x) + θ(m) ]   (8)

where m → explainable model, M → family of all possible explanations, θ → model complexity, L → loss function, f → black-box model to be explained, and n_x → size of the neighbourhood around instance x. The LIME algorithm optimizes the loss part, and the model complexity is determined by the user. In this experiment, "num_samples" (the neighbourhood used to learn the approximate linear model) has been set to 5,000 and 'cosine similarity' is used as the distance measure.

For this study, we randomly selected tweets and applied LIME to them. Figs. 7 and 8 show the output results thus generated for misclassified availability tweets. From Fig. 7, it can be observed that words like 'blood' make the tweet more likely to be resource-related, but their impact on the availability class is very small. Further, the word 'donate', which from common sense can be attached to both the need and availability classes, decreases the probability of the availability class (Figs. 7 and 8). On the other hand, the important words from Fig. 9 for correctly classified tweets make sense: words like 'blood', 'contact' and 'old' make it a resource-related tweet, and the words 'need' and 'please' make it a clear need tweet.

The under-performance in the preceding examples can be ascribed to limitations of the training dataset: (i) the training data constitutes tweets during earthquakes, whereas testing was done on a completely different kind of crisis data, i.e. COVID-19; (ii) smaller training data may lead to high variability in some cases, thus causing bias. This is further discussed in detail in Section 6.

Fig. 7. Results of LIME on the incorrectly classified availability tweet "today registered donate blood plasma nh finally feeling like ive recovered covid19 best feeling tested positive three month ago feel grateful still bit help fight", which gave probabilities of 0.69 for Need, 0.30 for Others and 0.01 for Availability.

Fig. 8. Results of LIME on the incorrectly classified availability tweet "ab positive Covid-19 symptom since 3 month glad donate plasma Kolkata", which gave probabilities of 0.03 for Need, 0.02 for Availability and 0.95 for Others.
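A minimal sketch of producing such an explanation with the `lime` package under the settings above (num_samples = 5,000, cosine distance); the `predict_proba` wrapper around the trained MLP is an assumed helper:

```python
# Sketch: explaining a single tweet with LIME (num_samples=5000, cosine
# distance, matching the settings described in the text).
from lime.lime_text import LimeTextExplainer

class_names = ["need", "availability", "others"]
explainer = LimeTextExplainer(class_names=class_names)

def predict_proba(texts):
    """Assumed wrapper: cleaned tweets in, MLP class-probability rows out."""
    return model.predict(vectorize(texts))   # vectorize() per the Fig. 2 pipeline

exp = explainer.explain_instance(
    "ab positive covid19 symptom since 3 month glad donate plasma kolkata",
    predict_proba, num_samples=5000, distance_metric="cosine")
print(exp.as_list())   # word-level weights for/against the class
```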
However, LIME has provided indispensable information about deficiencies of the proposed model. These model interpretations can be monitored in the initial stages of disasters. If they can be explained by humans, this can be considered a sanity check. If some abrupt behavior is observed (Fig. 8), the underlying dataset for the classification model can be changed to suit new crisis needs.

6. Discussion

The challenges faced during the current study are as follows:

1. The annotated dataset available to train our supervised machine learning models is limited.
2. It is found that, out of all the data collected during disasters, only a small proportion is related to disaster needs and availability.
3. The optimal neural network architecture depends upon the underlying data distribution used for training and validation. Therefore, it is a problem without a well-defined solution.

This work explores supervised learning algorithms for the classification of user-generated Twitter data into three classes, based on whether the tweet talks about some resource need, resource availability, or belongs to neither of these categories, i.e. others. The testing of state-of-art supervised learning text classification models was done for six different cases. It was found that all the models performed quite similarly when trained and tested on the same dataset. However, when we trained on data collected during one disaster and tested on another disaster with a different data distribution, a steep decline in the performance of some models was observed. Looking at the results, it was found that deep learning algorithms performed much better than traditional machine learning methods.

Overall, MLP-W (the proposed model) performed best. It scored 83% accuracy when trained on data from a mix of the Nepal Earthquake (2015) and Italy Earthquake (2016) and tested on the original COVID-19 dataset. A problem with re-usability was observed when models were trained on the Italy dataset, which can be attributed to the limited number of tweets it contains. Fig. 5(a–f) illustrates the accuracy and loss for training and validation for Nepal 5(a–b), Italy 5(c–d), and Nepal+Italy 5(e–f) as a combined dataset. A zig-zag pattern can be observed in Fig. 5c and d due to the smaller dataset, which becomes smoother with the involvement of more tweets. Also, the model performed poorly at classifying availability tweets when it was trained on the earthquake datasets and tested on the COVID-19 dataset. The availability tweets in the COVID-19 dataset mostly involved new resource names (e.g. plasma) and were more directed towards appreciation of volunteers, as in Fig. 7.

The success of MLP can be attributed to the fact that, being a deep learning model, it has been able to learn complex structures better than traditional machine learning, yet it did not rely on sentence structure like a CNN, since we were dealing with tweets from various sources and thus containing varied writing styles. Further, we found that the pre-trained disaster-specific word embedding performed better than TF-IDF. This can be explained by the fact that we were dealing with a small dataset, while the embedding is trained on a much larger disaster-specific dataset. We have also discussed the LIME technique for XAI to interpret models and analyze their shortcomings for real-world applications.

7. Conclusion

Following are the contributions made by our research:

1. Explored the importance and complexity of utilizing social media in disaster relief operations, specifically using the micro-blogging site Twitter.
2. Considered two public datasets (Nepal and Italy Earthquake) for training and one original Twitter dataset (COVID-19) for testing, and explored the reusability of the model trained on previous disasters.
3. Cleaned and normalized tweets and compared TF-IDF and pre-trained Word2Vec in different scenarios for text vectorization.
4. Compared state-of-art supervised classification algorithms for segregation of resource need, resource availability, and other tweets.
Fig. 9. Results of LIME on the correctly classified need tweet "55 year old lady max saket need plasma covid recovered person recovered 15 day back blood group please contact daughter arpita", which gave probabilities of 0.71 for Need, 0.14 for Others and 0.15 for Availability.
References

[10] N. Kankanamge, T. Yigitcanlar, A. Goonetilleke, M. Kamruzzaman, Determining disaster severity through social media analysis: testing the methodology with south east queensland flood tweets, International Journal of Disaster Risk Reduction 42 (2020) 101360.
[11] M.S. Tad, K. Janardhanan, The role of information system in disaster management, International Journal of Management and Social Sciences Research 3 (1) (2014) 16–20.
[12] N. Bhuvana, I.A. Aram, Facebook and whatsapp as disaster management tools during the Chennai (India) floods of 2015, International Journal of Disaster Risk Reduction 39 (2019) 101135.
[13] J.R. Ragini, P.R. Anand, V. Bhaskar, Big data analytics for disaster response and recovery through sentiment analysis, Int. J. Inf. Manag. 42 (2018) 13–24.
[14] T. Pranckevičius, V. Marcinkevičius, Comparison of naive bayes, random forest, decision tree, support vector machines, and logistic regression classifiers for text reviews classification, Baltic Journal of Modern Computing 5 (2) (2017) 221.
[15] A.M. Ramadhani, H.S. Goo, Twitter sentiment analysis using deep learning methods, in: 2017 7th International Annual Engineering Seminar (InAES), 2017, pp. 1–4.
[16] Z. Ashktorab, C. Brown, M. Nandi, A. Culotta, Tweedr: mining twitter to inform disaster response, in: ISCRAM, 2014, pp. 269–272.
[17] D.T. Nguyen, K.A. Al Mannai, S. Joty, H. Sajjad, M. Imran, P. Mitra, Robust classification of crisis-related data on social networks using convolutional neural networks, in: Eleventh International AAAI Conference on Web and Social Media, 2017.
[18] M. Basu, A. Shandilya, P. Khosla, K. Ghosh, S. Ghosh, Extracting resource needs and availabilities from microblogs for aiding post-disaster relief operations, IEEE Transactions on Computational Social Systems 6 (3) (2019) 604–618.
[19] Z. Jianqiang, G. Xiaolin, Z. Xuejun, Deep convolution neural networks for twitter sentiment analysis, IEEE Access 6 (2018) 23253–23260.
[20] M.S. Akhtar, A. Ekbal, S. Narayan, V. Singh, E. Cambria, No, that never happened!! investigating rumors on twitter, IEEE Intell. Syst. 33 (5) (2018) 8–15.
[21] M.T. Niles, B.F. Emery, A.J. Reagan, P.S. Dodds, C.M. Danforth, Social media usage patterns during natural hazards, PloS One 14 (2) (2019).
[22] M. Enenkel, S.M. Saenz, D.S. Dookie, L. Braman, N. Obradovich, Y. Kryvasheyeu, Social Media Data Analysis and Feedback for Advanced Disaster Risk Management, 2018, arXiv preprint arXiv:1802.02631.
[23] D. Reynard, M. Shirgaokar, Harnessing the power of machine learning: can twitter data be useful in guiding resource allocation decisions during a natural disaster? Transport. Res. Transport Environ. 77 (2019) 449–463.
[24] J.P. De Albuquerque, B. Herfort, A. Brenning, A. Zipf, A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management, Int. J. Geogr. Inf. Sci. 29 (4) (2015) 667–689.
[25] K. Främling, Decision theory meets explainable AI, in: International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, Springer, 2020, pp. 57–74.
[26] A. Malhi, S. Knapic, K. Främling, Explainable agents for less bias in human-agent decision making, in: International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, Springer, 2020, pp. 129–146.
[27] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 26 (2013) 3111–3119.
[28] Y. Goldberg, O. Levy, word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method, 2014, arXiv preprint arXiv:1402.3722.
[29] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning, vol. 1, Springer Series in Statistics, New York, 2001, p. 10.
[30] N. Mamgain, E. Mehta, A. Mittal, G. Bhatt, Sentiment analysis of top colleges in India using twitter data, in: 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), IEEE, 2016, pp. 525–530.
[31] M. Imran, P. Mitra, C. Castillo, Twitter as a Lifeline: Human-Annotated Twitter Corpora for NLP of Crisis-Related Messages, 2016, arXiv preprint arXiv:1605.05894.
[32] H.S. Pannu, S. Ahuja, N. Dang, S. Soni, A.K. Malhi, Deep learning based image classification for intestinal hemorrhage, Multimedia Tools and Applications, 2020.
[33] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[34] K.P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
[35] H. Kaur, H.S. Pannu, A.K. Malhi, A systematic review on imbalanced data challenges in machine learning: applications and solutions, ACM Comput. Surv. 52 (4) (2019) 1–36.
[36] H. He, Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications, John Wiley & Sons, 2013.
[37] S. Raschka, V. Mirjalili, Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-Learn, and TensorFlow 2, Packt Publishing Ltd, 2019.
[38] J. Ramos, et al., Using tf-idf to determine word relevance in document queries, in: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, 2003, pp. 133–142, New Jersey, USA.
[39] S.D. Walter, The partial area under the summary ROC curve, Stat. Med. 24 (13) (2005) 2025–2040.
[40] S.K. Pal, S. Mitra, Multilayer perceptron, fuzzy sets, and classification, IEEE Trans. Neural Network. 3 (5) (1992) 683–697.
[41] E. Liberty, K. Lang, K. Shmakov, Stratified sampling meets machine learning, in: International Conference on Machine Learning, 2016, pp. 2320–2329.
[42] J. Neyman, On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection, in: Breakthroughs in Statistics, Springer, 1992, pp. 123–150.
[43] M.T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?" Explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.