10 1108 - Jhti 02 2022 0078
https://www.emerald.com/insight/2514-9792.htm
JHTI 6,3
Predicting sentiment and rating of tourist reviews using machine learning
Karlo Puh and Marina Bagic Babac
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Received 21 February 2022
Revised 16 April 2022, 12 June 2022, 13 June 2022, 16 June 2022
Accepted 16 June 2022
Abstract
Purpose – As the tourism industry becomes more vital for the success of many economies around the
world, the importance of technology in tourism grows daily. Alongside increasing tourism importance and
popularity, the amount of significant data grows, too. On a daily basis, millions of people write their
opinions, suggestions and views about accommodation, services, and much more on various websites.
Well-processed and filtered data can provide a lot of useful information that can be used for making
tourists’ experiences much better and help us decide when selecting a hotel or a restaurant. Thus, the
purpose of this study is to explore machine and deep learning models for predicting sentiment and rating
from tourist reviews.
Design/methodology/approach – This paper used machine learning models such as Naïve Bayes, support
vector machines (SVM), convolutional neural network (CNN), long short-term memory (LSTM) and
bidirectional long short-term memory (BiLSTM) for extracting sentiment and ratings from tourist reviews.
These models were trained to classify reviews into positive, negative, or neutral sentiment, and into one to five
grades or stars. Data used for training the models were gathered from TripAdvisor, the world’s largest travel
platform. The models based on multinomial Naïve Bayes (MNB) and SVM were trained using the term
frequency-inverse document frequency (TF-IDF) for word representations while deep learning models were
trained using global vectors (GloVe) for word representation. The results from testing these models are
presented, compared and discussed.
Findings – The machine and deep learning models achieved high accuracy in predicting positive,
negative, or neutral sentiments and ratings from tourist reviews. The optimal model architecture for both
classification tasks was a deep learning model based on BiLSTM. The study’s results confirmed that deep
learning models are more efficient and accurate than machine learning algorithms.
Practical implications – The proposed models allow for forecasting the number of tourist arrivals and
expenditure, gaining insights into the tourists’ profiles, improving overall customer experience, and upgrading
marketing strategies. Different service sectors can use the implemented models to get insights into customer
satisfaction with the products and services as well as to predict the opinions given a particular context.
Originality/value – This study developed and compared different machine learning models for classifying
customer reviews as positive, negative, or neutral, as well as predicting ratings with one to five stars based on a
TripAdvisor hotel reviews dataset that contains 20,491 unique hotel reviews.
Keywords Sentiment analysis, Machine learning, Deep learning, Customer reviews, Tourism
Paper type Research paper
Introduction
Customer experience and opinion are crucial for the enhancement of the tourism industry.
Therefore, this industry has already largely adapted to information and communication
technologies and the advent of big data (Madyatmadja et al., 2021). Currently, many tourist
services are available online such as booking websites (Manosso and Domareski Ruiz, 2021).
© Karlo Puh and Marina Bagic Babac. Published by Emerald Publishing Limited. This article is
published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce,
distribute, translate and create derivative works of this article (for both commercial and non-commercial
purposes), subject to full attribution to the original publication and authors. The full terms of this licence
may be seen at http://creativecommons.org/licences/by/4.0/legalcode
Journal of Hospitality and Tourism Insights, Vol. 6 No. 3, 2023, pp. 1188-1204, Emerald Publishing Limited, 2514-9792, DOI 10.1108/JHTI-02-2022-0078
Since tourists use many websites and social media platforms to leave their personal opinions or
comments on a specific place or service, customer reviews have become a significant factor
when deciding which hotels or restaurants to visit (Neidhardt et al., 2017). For
example, the total number of reviews on TripAdvisor surpassed 884 million in 2020
(Statista, 2020). Information obtained from these reviews is important to other tourists but
also to service providers who can then note key aspects that make their hotel/restaurant good
or bad (Sumarsono et al., 2018).
In parallel with the huge increase in the number of online user reviews, there is also a
growing need for automated processing of these huge amounts of data because it is
impossible for humans to read and analyze all these reviews on their own (Gour et al., 2021).
Sentiment analysis is a natural language processing technique used to identify and
extract subjective information from text (Collobert and Weston, 2008). In most cases, it means
determining whether a review expresses positive or negative sentiment (Barbierato et al.,
2021). Although there is much research in sentiment analysis of tourist reviews over the
past decade, most of the research is limited to positive/negative classification. Fewer studies
include neutral review sentiment in addition to positive/negative (Wadhe and Suratkar,
2020), which is a more demanding task and is included in this study. Adding neutral in
sentiment classification is important because it gives us additional useful information. A
neutral comment is usually an indicator of concern since the customer can easily turn
positive or negative. Thus, taking into consideration neutral comments can help one to
increase the number of satisfied customers since it is easier to turn a neutral experience into
a positive than a negative one. Moreover, even fewer studies include rating classification
and prediction based on tourist reviews (Harrag et al., 2019), which are also analyzed in this
study. Performing rating prediction is useful when one has a lot of customers’ comments
and wants to process them fast. That way we can easily visualize data (customer
satisfaction) and quickly see if drastic changes need to be made. Furthermore, it enables us
to sort comments based on their importance. It makes sense to first act on comments rated
with the lowest score and make our way up. Predicting specific ratings rather than
sentiment is useful when we want to get more detailed information on some factors, which
make a customer’s rating great or poor. Then, we can find common features that make
customers’ experience poor and improve them in the future but also see what is being done
well. In addition, the studies that test the performance of sentiment analysis are rare in the
tourism and hospitality domain (Mehraliyev et al., 2022), thus our study also contributes to
filling this gap.
In this paper, we have analyzed sentiment and ratings of a specific place or service
expressed in customer reviews on TripAdvisor to predict tourist satisfaction. We have
conducted sentiment and rating classification using different methods ranging from machine
learning algorithms like Naïve Bayes and support vector machines (SVMs) to deep learning
methods. Experimental results have shown that deep learning methods based on
bidirectional long short-term memory (BiLSTM) outperformed other implemented
methods. Based on results from this study, tourist service providers can easily and quickly
process a lot of data and get very accurate customer feedback, since user-generated content is
regarded as the most influential content in the tourism industry. There are other noteworthy
benefits of sentiment analysis like shaping company marketing strategies, classification of
textual data and providing overall better service.
Literature review
Sentiment analysis has been performed with a variety of techniques over the last decade
including lexicon-based and machine-learning and deep-learning-based techniques (Jurafsky
and Martin, 2000).
Lexicon-based sentiment analysis
For lexicon-based sentiment analysis, a sentiment relates to its semantic value and the
intensity of each word in the sentence, which requires a pre-defined lexicon to classify
positive and negative words (Bagic Babac and Podobnik, 2016). Generally, a text item is
treated as a bag of words, and after scoring each word, the sentiment is obtained by a certain
pooling operation such as taking an average of individual word scores.
Today, many of these lexicon-based approaches are automated, for example by using
TextBlob (Loria, 2018), a Python library for natural language processing (NLP). Larasati et al.
(2020) used TextBlob to obtain sentiment analysis scores from eight tourist websites, which
confirmed most of the visitors’ sentiments were positive. In addition, a lexicon-based
approach has been used to evaluate consumers’ sentiment toward several well-known
technological brands (Mostafa, 2013), and sentiment analysis confirmed a generally positive
consumer sentiment. Tan and Wu (2011) utilized a lexical database for extracting hotel
reviews from Ctrip based on the random walk algorithm for the automated generation of a
specific-domain sentiment lexicon. Serna et al. (2016) made use of the WordNet lexical
database to obtain emotions from Twitter mentioning two holiday periods. In addition, Kang
et al. (2012) proposed a new senti-lexicon for the sentiment analysis of restaurant
reviews based on an improved Naïve Bayes algorithm.
It should be noted that most of the lexicon-based approaches are built upon so-called
general-purpose lexicons (Avdic and Bagic Babac, 2021). Bagherzadeh et al. (2021) developed
two specific lexicons, namely weighted and manually selected lexicons, which were tested
and validated by applying classification accuracy metrics to the TripAdvisor data. Their
approach outperformed a SentiWords lexicon-based method and a Naïve Bayes
machine-learning algorithm in classifying sentiment.
Research methodology
Data preprocessing
Preprocessing is one of the most important steps when performing any NLP task
(Bagic Babac and Podobnik, 2016). Basically, preprocessing means bringing the text into a
clean form and making it ready to be fed into the model. When it comes to data
preprocessing, there are many useful techniques. Specifically in this paper, tokenization is
the first step in preprocessing. Tokenization means splitting a sentence into a list of words.
After tokenization, stop words are removed. Stop words are words that
are commonly used in a language; in English, these are words
such as “is”, “the”, “and”, “a”, etc. Those words are considered uninformative in natural
language processing, so they are removed. Next comes lemmatization, the process of transforming a
word into its root form, or lemma. Examples are “swimming”
to “swim”, “was” to “be” and “mice” to “mouse”. Because machines treat lower
and upper case differently, all words are lower-cased for consistent interpretation.
Finally, all punctuation is removed, which reduces noise and discards
uninformative data.
To perform preprocessing tasks, spaCy was used, an open-source library for advanced
natural language processing in Python. It is multilingual, but for this project, only English
was needed. After loading data for the English language, spaCy enables us to perform
tokenization, lemmatization and stop word removal quite easily. Examples of using spaCy
and the explained techniques are shown in Table 1.
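The preprocessing steps described above can be illustrated with a minimal, dependency-free sketch. It is not the spaCy pipeline used in the study: the stop-word set `STOP_WORDS`, the lemma table `LEMMAS` and the function `preprocess` are hypothetical stand-ins, kept tiny for illustration, and lower-casing is applied first for simplicity.

```python
import re

# Hypothetical, tiny resources for illustration only; spaCy's English
# pipeline ships far larger stop-word lists and a real lemmatizer.
STOP_WORDS = {"is", "the", "and", "a", "in", "are", "do", "not"}
LEMMAS = {"swimming": "swim", "was": "be", "mice": "mouse", "sucks": "suck"}


def preprocess(text):
    """Lower-case, tokenize, drop punctuation and stop words, then lemmatize."""
    # Keeping only alphabetic runs both tokenizes and removes punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Dictionary lookup stands in for real lemmatization.
    return [LEMMAS.get(t, t) for t in kept]


print(preprocess("The mice was swimming, and the place sucks."))
```

A real lemmatizer uses part-of-speech information rather than a lookup table, so this sketch only conveys the order and effect of the steps.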
Word representation
Since computers do not understand words or their context, text must be converted into an
appropriate, machine-interpretable form. Word embeddings are mathematical
representations of words that give similar representations to words that have a similar
meaning (Mikolov et al., 2013). In other words, those representations model the semantic
meaning of words. Specifically, those representations are vectors that are positioned in space
in such a way that vectors closer to each other have more similar semantic meanings.
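One common way to quantify how close two word vectors are is cosine similarity. The sketch below assumes this measure (the text does not name one) and uses made-up three-dimensional vectors in `toy_embeddings`; real GloVe vectors have 50 to 300 dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, chosen by hand for illustration only.
toy_embeddings = {
    "hotel": [0.9, 0.1, 0.3],
    "hostel": [0.8, 0.2, 0.35],
    "banana": [0.05, 0.9, 0.1],
}

similar = cosine_similarity(toy_embeddings["hotel"], toy_embeddings["hostel"])
dissimilar = cosine_similarity(toy_embeddings["hotel"], toy_embeddings["banana"])
print(similar > dissimilar)
```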
The word representation used in this research is called global vectors (GloVe) for word
representation as introduced by Pennington et al. (2014). Since then, it has gained popularity due
to its good performance and simplicity. GloVe is a log-bilinear model with a weighted
least-squares objective trained on a global word-word co-occurrence matrix. That matrix
shows words’ co-occurrence frequency with one another in a given corpus. The main idea
behind GloVe is that ratios of word-word co-occurrence probabilities encode meaning, as
shown with an example in Table 2.
If we investigate the example shown in Table 2, we can see some actual probabilities from a
six billion word, i.e. token corpus. The table shows how the word ice co-occurs more frequently
with solid, but steam co-occurs more with gas. Furthermore, if we look at the word water, we can
see that both ice and steam co-occur with it frequently because it is their shared property.
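The idea of comparing co-occurrence probabilities can be sketched on a toy corpus. The four sentences, the window size and the helper names below are assumptions made for illustration; they do not reproduce the six-billion-token statistics of Table 2.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=3):
    """Count how often each (word, context) pair appears within the window."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def cooccurrence_probability(counts, word, context):
    """Estimate P(context | word) from the raw pair counts."""
    total = sum(c for (w, _), c in counts.items() if w == word)
    return counts[(word, context)] / total if total else 0.0


# Hypothetical miniature corpus.
corpus = ["ice is solid water", "steam is gas water",
          "ice is cold solid", "steam is hot gas"]
counts = cooccurrence_counts(corpus)

p_solid_ice = cooccurrence_probability(counts, "ice", "solid")
p_solid_steam = cooccurrence_probability(counts, "steam", "solid")
p_water_ice = cooccurrence_probability(counts, "ice", "water")
p_water_steam = cooccurrence_probability(counts, "steam", "water")
print(p_solid_ice, p_solid_steam, p_water_ice / p_water_steam)
```

In this toy corpus "solid" is far more probable next to "ice" than next to "steam", while the ratio for the shared word "water" is close to one, which mirrors the pattern the text describes.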
Another way used for representing words by vectors is Term Frequency Inverse
Document Frequency (TF-IDF). It is commonly used in NLP tasks because it takes into
consideration the relevance of a word in a document and scales it across all documents in a
specific corpus. TF-IDF is calculated by multiplying two metrics, namely term frequency, and
inverse document frequency (IDF). Term frequency (TF) is the number of times a specific
word (term t) appears in a document (d) divided by the total number of words in a document as
shown in Eq. (1) (Jurafsky and Martin, 2000).
Table 1. Examples of data preprocessing
Original review | Preprocessed tokens
Rude people. Do not stay, despite the fact cool hotel, the place sucks, rudest people, are disappointed | “rude”, “people”, “stay”, “despite”, “fact”, “cool”, “hotel”, “place”, “suck”, “rude”, “people”, “disappointed”
Great location, jr. suite is great, clean comfortable, close pike. Market in walking distance, breakfast nice and fresh | “great”, “location”, “jr”, “suite”, “great”, “clean”, “comfortable”, “close”, “pike”, “market”, “walking”, “distance”, “breakfast”, “nice”, “fresh”
Enjoyed the hotel. Location and service costs are excellent, good room. Recommend | “enjoy”, “hotel”, “location”, “service”, “cost”, “excellent”, “good”, “room”, “recommend”
Finally, to calculate TF-IDF for the specific term we multiply those two values.
\[ \text{tf-idf} = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df} + 1} \tag{3} \]
The main difference between these two described vectorization methods is that TF-IDF is
easier to use, but GloVe carries semantic meaning and can understand the context better.
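Eq. (3) can be sketched in a few lines. The natural logarithm and the toy corpus are assumptions (the text does not state the log base), and `tf` and `tf_idf` are illustrative helper names rather than the implementation used in the study.

```python
import math


def tf(term, doc):
    """Term frequency: occurrences of the term divided by document length."""
    return doc.count(term) / len(doc)


def tf_idf(term, doc, corpus):
    """Eq. (3): tf(t, d) * log(N / (df + 1)), with N documents in the corpus."""
    df = sum(1 for d in corpus if term in d)  # documents containing the term
    return tf(term, doc) * math.log(len(corpus) / (df + 1))


# Hypothetical, already tokenized corpus.
corpus = [
    ["great", "location", "great", "room"],
    ["rude", "people", "rude", "staff"],
    ["good", "room", "good", "service"],
    ["clean", "hotel", "nice", "breakfast"],
]

score_great = tf_idf("great", corpus[0], corpus)  # frequent here, rare elsewhere
score_room = tf_idf("room", corpus[0], corpus)    # also appears in another review
print(score_great, score_room)
```

A term that is frequent in one review but rare across the corpus ("great") receives a higher weight than a term spread over several reviews ("room").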
Sentiment analysis using machine learning
For the purposes of this study, Naïve Bayes and SVMs were chosen as frequently used
machine learning algorithms in data science (Poch Alonso and Bagic Babac, 2022).
Naïve Bayes is one of the most commonly used methods in natural language processing
tasks. It is based on Bayes' theorem, which calculates the probability of a specific event
based on prior knowledge using the following equation:
\[ P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \tag{4} \]
where $P(c \mid x)$ is the posterior probability of a class, $P(c)$ is the prior probability of a class, $P(x)$ is
the prior probability of the predictor, and $P(x \mid c)$ is the likelihood, i.e. the conditional probability of the
predictor given the class.
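As a worked illustration of Eq. (4), the sketch below computes class posteriors for a single predictor. The priors and likelihoods are invented numbers, not values estimated from the TripAdvisor data, and `posterior` is a hypothetical helper.

```python
def posterior(likelihoods, priors):
    """Apply Bayes' theorem: P(c | x) = P(x | c) P(c) / P(x)."""
    # P(x) is obtained by summing P(x | c) P(c) over all classes.
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}


# Hypothetical numbers: most reviews are positive, but the word "rude"
# is much more likely to appear in a negative review.
priors = {"positive": 0.7, "negative": 0.3}
likelihoods = {"positive": 0.02, "negative": 0.20}  # P("rude" | class)

post = posterior(likelihoods, priors)
print(post)
```

Even with a prior that favors the positive class, observing "rude" shifts most of the probability mass to the negative class.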
SVM is a machine learning algorithm that uses a hyperplane to separate different classes
of data. A hyperplane is a subspace whose dimension is one less than that of the space
containing it. For example, in a two-dimensional space a hyperplane is a
line. The main goal of this algorithm is to find the hyperplane with the largest distance
(margin) to the nearest data points, called support vectors. New data points are
classified according to the side of the hyperplane on which they lie. Furthermore, the
larger the margin is, the more confidence we have in determining the data class.
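The classification rule can be sketched for a linear hyperplane of the form w·x + b = 0. The weights below are chosen by hand as an assumption; a trained SVM would learn them by maximizing the margin, which this sketch does not do.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance of x from the hyperplane: |w . x + b| / ||w||."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating line in two dimensions: x1 - x2 = 0.
w, b = [1.0, -1.0], 0.0

print(classify(w, b, [3.0, 1.0]), classify(w, b, [1.0, 3.0]))
print(distance_to_hyperplane(w, b, [3.0, 1.0]))
```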
Next, we want to update the cell state. The second gate, called the input gate, also
uses a sigmoid layer to decide which values to update. Afterward, we combine the result
of the input gate with the output of a tanh layer to create the update to the cell state (Hochreiter,
1998).
\[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{6} \]
\[ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{7} \]
\[ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{8} \]
Specifically, to update the cell state, we multiply the old cell state by the forget gate and then add
the input gate multiplied by $\tilde{C}_t$. This process is shown in Eq. (8). Finally, we
have the output gate. Its job is to calculate the next hidden state. As Eq. (9) shows, we first
pass the current input and the previous hidden state through the sigmoid. Then, to get the output,
we put the cell state through tanh and multiply it by the previously calculated sigmoid output.
As a result, we get the new hidden state shown in Eq. (10). In the end,
the new hidden state and the cell state are carried over to the next cell (Hochreiter and
Schmidhuber, 1997).
\[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{9} \]
\[ h_t = o_t * \tanh(C_t) \tag{10} \]
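Eqs. (6) to (10) can be traced with a scalar sketch of a single LSTM step, where the hidden state, cell state and input are single numbers. The forget gate is written in its standard sigmoid form, which is an assumption here because its equation does not appear above; the weights in `params` are arbitrary illustrative values, not trained ones.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step; params maps a gate name to (w_h, w_x, b)."""
    def pre_activation(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b  # scalar form of W . [h_{t-1}, x_t] + b

    f_t = sigmoid(pre_activation("f"))        # forget gate (standard form, assumed)
    i_t = sigmoid(pre_activation("i"))        # Eq. (6): input gate
    c_tilde = math.tanh(pre_activation("c"))  # Eq. (7): candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # Eq. (8): new cell state
    o_t = sigmoid(pre_activation("o"))        # Eq. (9): output gate
    h_t = o_t * math.tanh(c_t)                # Eq. (10): new hidden state
    return h_t, c_t


# Hypothetical weights shared by all four gates, for illustration only.
params = {gate: (0.5, 0.5, 0.0) for gate in "fico"}

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:  # a short toy input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

In a real model each weight is a matrix and each state a vector, but the order of operations is the same.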
The described LSTM model achieves much better results than a traditional RNN (Sherstinsky,
2020), but there is still room for improvement. We have seen that LSTM uses information from
the past, meaning that the current state depends on the information before that moment. In
order to have more contextual information at every moment, i.e. to increase the amount of
information available to the network, we use BiLSTM. BiLSTM consists of two LSTMs, each
going in a different direction. The first one goes forward (from past to future) and the
second one goes backward (from future to past). That kind of architecture enables the model to
understand the context much better.
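The bidirectional idea can be sketched independently of the LSTM internals. Below, a simple running sum stands in for the recurrent cell (a deliberate simplification, not an LSTM), and the helper names are hypothetical; the point is only that each position ends up with one state summarizing the past and one summarizing the future.

```python
def run_recurrent(step, inputs, state):
    """Apply a recurrent step function over a sequence, collecting each state."""
    outputs = []
    for x in inputs:
        state = step(x, state)
        outputs.append(state)
    return outputs


def bidirectional(step_fwd, step_bwd, inputs, init=0.0):
    """Pair the forward-pass state with the backward-pass state per position."""
    forward = run_recurrent(step_fwd, inputs, init)
    # Process the reversed sequence, then flip back to align positions.
    backward = run_recurrent(step_bwd, inputs[::-1], init)[::-1]
    return list(zip(forward, backward))


def running_sum(x, state):
    """Hypothetical stand-in for a recurrent cell."""
    return state + x


print(bidirectional(running_sum, running_sum, [1, 2, 3]))
```

For the middle element, the forward state has seen 1 and 2, while the backward state has seen 3 and 2, so the pair carries context from both directions.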
Besides RNNs, CNNs have been commonly used for text classification and sentiment
analysis tasks, although they are better known for working with images. The difference here is
that one-dimensional (1D) convolution is used instead of the two-dimensional (2D) convolution applied to
images. One of the biggest advantages of CNNs is that they are translation invariant.
This means that once a pattern is learned, a CNN can recognize it later at any other
position. Just like 2D convolution, 1D convolution includes many kernels with
weights that are learned through the training process. Those kernels are designed to generate
an output by looking at the word and its surroundings. That way, since similar words have
similar vector representations, convolution will produce a similar value. In practice, those
convolutional layers are combined with pooling layers that discard less relevant information
(Kuhn and Johnson, 2013).
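A minimal sketch of 1D convolution followed by max pooling is given below. It operates on a sequence of single numbers rather than word vectors, uses a hand-picked kernel instead of learned weights, and computes the unflipped sliding dot product that deep learning libraries call convolution; all of these are simplifying assumptions.

```python
def conv1d(sequence, kernel):
    """Slide the kernel over the sequence (no padding, stride 1)."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]


def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of the given size."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


kernel = [1, 2, 1]                   # hypothetical pattern detector
early = [0, 1, 2, 1, 0, 0, 0]        # pattern near the start
late = [0, 0, 0, 1, 2, 1, 0]         # same pattern, shifted right

features_early = conv1d(early, kernel)
features_late = conv1d(late, kernel)
print(features_early, features_late)
print(max(features_early), max(features_late))
print(max_pool1d(features_early, 2))
```

The kernel produces the same peak response wherever the pattern occurs, and taking the maximum over the whole sequence makes the result independent of the pattern's position.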
Model architectures for machine learning
For conducting sentiment analysis, a few different methods and architectures were proposed.
First, we implemented two machine learning algorithms, namely MNB and SVM. In these
machine learning approaches, we used TF-IDF for word representations.
After that, we implemented deep learning models using GloVe for word
representations. Our first deep learning model is based on a 1D CNN, i.e. it consists of
three 1D convolutional layers combined with dropout and max-pooling layers, followed by three
linear layers and a softmax at the end. The described architecture is shown in Figure 1.
Furthermore, we implemented a model architecture that consists of two stacked
BiLSTMs followed by three linear layers with a softmax function at the end. The model’s
architecture is shown in Figure 2. For this model, word representations are provided by
GloVe, thus the word embeddings are used as the inputs of BiLSTM. After the word
embeddings pass through two BiLSTM layers for text feature extraction, the resulting vectors are used as
inputs to three linear neural network layers with ReLU activation functions, which
perform text classification. Lastly, the output is passed through the softmax function to
convert the numerical output into the range [0, 1] representing the probabilities of
each class.
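The final softmax step can be sketched as follows. The input scores are invented for illustration, and subtracting the maximum score is a standard numerical-stability measure assumed here rather than taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the last linear layer for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
```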
In addition, another model has the same architecture as the one shown in Figure 2, but we
used normal LSTMs instead of BiLSTMs.
Figure 1. Proposed CNN model architecture
Figure 2. BiLSTM model architecture
Experimental results
For the purpose of training the models to achieve good performance in practice, the dataset
has to be convincing (Cvitanovic and Bagic Babac, 2022). Having that in mind, data was
extracted from TripAdvisor, the world’s largest travel platform that today has over 860
million reviews and opinions (Alam et al., 2016a, b). This study utilized a dataset called
TripAdvisor Hotel Reviews that contains 20,491 unique hotel reviews graded from one to five
stars by guests (Alam et al., 2016a, b). For training purposes, the dataset was split into three
parts, that is training, evaluation and testing subsets, in the ratio of 70% for the training, 10%
for evaluation and 20% for the testing subset (Kuhn and Johnson, 2013).
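The 70/10/20 split can be sketched as below. Shuffling with a fixed seed is an assumption (the text does not say whether the split was random or stratified), and integer indices stand in for the 20,491 reviews.

```python
import random


def split_dataset(items, train=0.7, evaluation=0.1, seed=0):
    """Shuffle and split into training, evaluation and testing subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = int(len(shuffled) * train)
    n_eval = int(len(shuffled) * evaluation)
    # Whatever remains after the first two parts becomes the test subset.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_eval],
            shuffled[n_train + n_eval:])


train_set, eval_set, test_set = split_dataset(range(20491))
print(len(train_set), len(eval_set), len(test_set))
```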
Since the used dataset consists of reviews and their scores which are grades from one to
five, the machine learning models are first trained to predict the exact grade based on the
review text. The MNB algorithm achieved 46% accuracy on the test data, while SVM
outperformed Naïve Bayes with an accuracy of 55%.
After machine learning algorithms, deep learning models were trained and tested.
Given that grid search is quite exhaustive and time-consuming, a random search was
used in the process of setting hyperparameters for training (Vrigazova, 2021).
Furthermore, an early stopping mechanism was used and the model with the smallest
loss in the evaluation data was saved (Marrese-Taylor et al., 2014). Also, to prevent the
model from overfitting, we used the dropout mechanism. In addition, gradient clipping, a technique that
mitigates the exploding gradients problem, was used. Throughout all
training runs, the batch size was kept constant at 16 examples. Considering the problem
of predicting the score using text review can be treated as a classification task, a loss
function called categorical cross-entropy was implemented. Categorical cross-entropy is
one of the most popular loss functions when it comes to multi-class classification
(Neidhardt et al., 2017). It is shown in Eq. (11), where $\hat{y}_i$ is the i-th value in the model
prediction and $y_i$ is the true label value.
\[ \mathrm{Loss} = -\sum_{i=1}^{5} y_i \cdot \log \hat{y}_i \tag{11} \]
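Categorical cross-entropy over the five rating classes can be sketched as follows, using the conventional form with a leading minus sign. The predicted distributions are invented, and the small constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Negative sum over classes of y_i * log(predicted probability)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# One-hot label for a four-star review (classes are 1 to 5 stars).
label = [0, 0, 0, 1, 0]
good_prediction = [0.05, 0.05, 0.10, 0.70, 0.10]  # hypothetical model outputs
poor_prediction = [0.40, 0.30, 0.15, 0.10, 0.05]

loss_good = categorical_cross_entropy(label, good_prediction)
loss_poor = categorical_cross_entropy(label, poor_prediction)
print(loss_good, loss_poor)
```

With a one-hot label, only the probability assigned to the true class contributes, so a confident correct prediction yields a smaller loss.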
Finally, Adam, an optimization algorithm based on stochastic gradient descent, was chosen
for model training.
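Gradient clipping by norm, one of the training safeguards mentioned above, can be sketched as below. Treating the gradients as one flat list and rescaling by the global L2 norm is an assumption about the variant used; the threshold plays the role of the "clip norm" hyperparameter.

```python
import math


def clip_by_norm(gradients, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm:
        return list(gradients)  # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in gradients]


clipped = clip_by_norm([3.0, 4.0], max_norm=0.5)   # original norm is 5.0
untouched = clip_by_norm([0.1, 0.2], max_norm=0.5)
print(clipped, untouched)
```

Clipping preserves the direction of the gradient and only limits its length, which keeps a single large update from destabilizing training.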
The highest accuracy that the 1D CNN managed to achieve after conducting a random
search for setting hyperparameters was 62%. Furthermore, the stacked LSTM model
performed better than the CNN-based model, as expected. The best model with LSTM architecture
achieved 66% accuracy. Finally, the stacked BiLSTM model outperformed the other
models by achieving 72% accuracy. Table 3 shows experimental results of the BiLSTM-based
model on test data for specific hyperparameter combinations. Figure 3 shows losses on the
evaluation and training data during the training of the best-performing model.
Table 3. Experimental results of the BiLSTM model for different hyperparameters: learning rate, dropout, and gradient clipping norm
Learning rate | Dropout | Clip norm | Accuracy
0.001 | 0.33 | 0.33 | 0.69
0.0001 | 0.33 | 0.33 | 0.49
0.001 | 0.5 | 0.5 | 0.72
0.001 | 0.65 | 0.65 | 0.71
Figure 3. Visualization of loss during training on training and evaluation data for the best performing BiLSTM model
Table 5. Experimental results for different hyperparameters: learning rate, dropout, and gradient clipping norm
Learning rate | Dropout | Clip norm | Accuracy
0.001 | 0.33 | 0.33 | 0.85
0.001 | 0.45 | 0.45 | 0.89
0.001 | 0.65 | 0.75 | 0.84
0.001 | 0.55 | 0.55 | 0.86
Finally, we can compare the experimental results of all the above classification methods.
An overview of those results is shown in Table 4. The “Rating task” column summarizes the
previously explained results, i.e. classifying reviews into five classes (or grades) from one to
five, and the “Sentiment task” column shows the results from classifying reviews into three
classes representing positive, negative and neutral customer experience. During the process
of calculating the scores, a review is considered positive if it has a score greater than 3, neutral
if the score is 3, and negative if the score is less than 3.
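The rule for deriving the three sentiment classes from the star rating is simple enough to state directly in code. The function name `rating_to_sentiment` is hypothetical; the thresholds follow the rule given above.

```python
def rating_to_sentiment(score):
    """Map a 1-5 star rating to a sentiment label."""
    if score > 3:
        return "positive"
    if score == 3:
        return "neutral"
    return "negative"


print([rating_to_sentiment(s) for s in range(1, 6)])
```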
Table 5 shows the results for different hyperparameters combinations of the BiLSTM
model proposed for sentiment classification, while Figure 4 shows how evaluation and
training loss behave during the training process of the model with the highest accuracy.
From the results presented in this section, it can be concluded that deep learning models
delivered better overall performance than the existing classical machine learning approaches.
It has been shown that by leveraging the BiLSTM-based model architecture with touristic
opinion data, higher accuracy in predictions may be obtained. This model’s high accuracy
and efficiency can help the hotel and tourism industry better understand
the requirements and expectations of tourists, which benefits both customers and touristic
organizations and businesses.
Figure 4. Visualization of loss during the training process on training and evaluation data for the best performing model
Although deep learning models outperform other machine learning models in these multi-class
prediction tasks, it can also be noted that the results from the sentiment task seem
satisfactory in certain settings, e.g. 80% for SVM, given that simpler pre-processing
and less memory were used. Thus, during the decision-making process in a
particular setting or environment, one can balance higher efficiency and
accuracy against the use of fewer computational resources. However, for a more complex task
such as rating prediction, deep learning models provide significantly better accuracy
than some other models that do not even provide adequate accuracy (e.g. Naïve Bayes
with below 50% accuracy).
In the comparison of our results to the results of others (Gitto and Mancuso, 2017; Mehta
et al., 2021; Wang et al., 2022), it can be noted that there are differences in the size, quality, and
purpose of a particular dataset and different uses and implementations of sentiment analysis.
In addition, there are various approaches to calculate whether the sentiment is positive,
negative, or neutral, e.g. a study that explored the cruise experiences (Wang et al., 2022)
considered a comment as negative “if a comment’s positive score was less than or equal to two
times the absolute value of its negative score”. Moreover, a recent survey on the sentiment
analysis in hospitality and tourism (Mehraliyev et al., 2022) has reported that the studies that
test the performance of sentiment analysis are rare, thus our results contribute to filling
this gap.
Discussion
Conclusions
Given the vast amount of data on people’s individual opinions, there is a need to develop and
improve existing sentiment analysis tools. These tools serve not only individuals, as a
recommender for optimizing their choice of services, but also decision-makers
in improving the quality of their services. The long-term implications of the knowledge
gained by these sentiment tools may influence tourism development and the engagement of
tourist stakeholders. Our contribution in the form of proposed models can indicate a plausible
further direction for developing more robust and accurate models for sentiment and rating
classification.
More specifically, this study provides an insight into how to apply machine and deep
learning models for sentiment analysis on tourist reviews. It showed that the BiLSTM model
outperformed the other models in both sentiment and rating classification tasks. Specifically, in our BiLSTM
model, data were first passed through two stacked BiLSTMs whose job was to gather
contextual information, followed by three linear layers that perform classification. Models
were trained to classify reviews first into five and later into three classes with GloVe used for
word representation. As a result, the best performing model for five classes achieved 72%
accuracy while the best model for three classes reached 89% accuracy. For
methodological comparison, other machine learning models, namely Naïve Bayes
and SVMs, were implemented as well as other deep learning models like 1D convolution and
LSTM. Experimental results have shown that the deep learning model based on BiLSTM
achieved the best results in both tasks.
Our results confirmed that deep neural network algorithms are more accurate than
machine learning algorithms (Waghmare and Bhala, 2020). Deep neural networks in general
need less time as less human intervention is needed, and they perform automated feature
extraction. However, to produce appropriate accuracy and performance, they require a larger
amount of data than machine learning algorithms, and the training costs are also high.
Theoretical implications
In recent research based on customer comments in the hospitality and tourism field,
three themes were identified as most relevant: behavior, social media, and
marketing related to user-generated data (Mukhopadhyay et al., 2022). In addition to the use
of sentiment analysis to gain insights from these user-generated data, various text mining
techniques are used depending on a particular research goal, e.g. a recent study that
analyzed online reviews from TripAdvisor has applied sentiment analysis, clustering, topic
modeling, and machine learning algorithms for real-time classification (Gour et al., 2021).
Furthermore, sentiment variables were investigated “not only as independent but also as
dependent variables” (Mehraliyev et al., 2022). For instance, Kim and Han (2022) used
regression analysis to understand the impacts of the length of stay at hotels on online
reviews.
A systematic review of sentiment analysis literature in hospitality and tourism from
methodological and thematic perspectives confirmed that “testing the performance of
sentiment analysis was uncommon” (Mehraliyev et al., 2022), and our study contributes to
filling this gap by providing performance results of sentiment analysis based on different
machine and deep learning models. While most studies use sentiment analysis as a tool to
find insights into customer opinions, our study provides a methodological framework to
create and customize sentiment analysis models based on machine and deep learning
approaches.
Sentiment analysis theorists might find fruitful insights in the methodological aspects of
this study, for instance, when investigating an appropriate model architecture for a
particular purpose and domain as well as fine-tuning the parameters of the machine and deep
learning models. This study provides detailed methodological insight into several different
models, their architectures and complete training and testing processes. Furthermore,
different word representation models like TF-IDF and GloVe are implemented and compared.
It can also be noted that our approach goes beyond the hospitality and tourism domain.
Another direction for exploring optimal models for rating prediction is the use of other
features as input to the model. Additional features may also be learned from the text, e.g. by
content analysis (Wang et al., 2022), or topic analysis (Gour et al., 2021).
Furthermore, the touristic insights made from these models may provide a basis for the
understanding of tourist behavior patterns and setting up a theoretical framework for
shaping public opinion, i.e. distilling variables that contribute to making opinions. Such a
framework can contribute to establishing short-, mid-, or long-term marketing and other
relevant strategies for companies and organizations given a particular touristic context.
Corresponding author
Marina Bagic Babac can be contacted at: [email protected]