
Ingénierie des Systèmes d'Information
Vol. 24, No. 1, February, 2019, pp. 125-129
Journal homepage: http://iieta.org/Journals/isi

Sentiment Analysis from Movie Reviews Using LSTMs

Jyostna Devi Bodapati (1), N. Veeranjaneyulu (2*), Shareef Shaik (1)
(1) Department of CSE, Vignan's Foundation for Science, Technology and Research, Vadlamudi, Guntur, AP, India
(2) Department of IT, Vignan's Foundation for Science, Technology and Research, Vadlamudi, Guntur, AP, India

Corresponding Author Email: [email protected]

https://doi.org/10.18280/isi.240119

Received: 13 November 2018
Accepted: 19 January 2019

Keywords: recurrent neural networks, gated recurrent neural networks, text mining, word embedding, SVM, deep neural networks

ABSTRACT

With the advent of social networking and the internet, it is very common for people to share reviews or feedback on the products they purchase, on the services they use, or to share their opinions on an event. These reviews can be useful to others if analyzed properly, but analyzing such enormous amounts of text manually is impossible, so automation is required. The objective of sentiment analysis is to determine whether the reviews or opinions given by people express a positive or a negative sentiment, predicted from the given textual information in the form of reviews or ratings. Earlier, linear regression and SVM based models were used for this task, but the introduction of deep neural networks has displaced the classical methods and achieved greater success at automatically inferring sentiment from textual descriptions. Most recent progress on this problem has been achieved by employing recurrent neural networks (RNNs). Though RNNs give state-of-the-art performance on tasks like machine translation, caption generation and language modeling, they suffer from the vanishing or exploding gradients problem when used with long sentences. In this paper we use LSTMs, a variant of RNNs, to predict sentiment for the task of movie review analysis; LSTMs are good at modeling very long sequence data. The problem is posed as a binary classification task where a review can be either positive or negative. Sentence vectorization methods are used to deal with the variability of sentence length. We investigate the impact of hyperparameters such as dropout, the number of layers and the activation functions; we analyze the performance of the model with different neural network configurations and report the performance of each configuration. The IMDB benchmark dataset is used for the experimental studies.

1. INTRODUCTION

Sentiment analysis is the task of processing the given textual information to analyze the emotions in it. In simple words, we need to analyze whether the text gives positive or negative feedback about a product or topic. It is also popularly known as opinion mining, and it requires knowledge of natural language processing, artificial intelligence and machine learning. Sentiment analysis is all about what other people think about something; it is very useful as it provides inferences that help us understand public opinion on a product or service.

The internet is a rich source of such textual opinion or review information, and analyzing it can yield much information and many future insights. For example, on an online shopping website people usually write reviews after buying and using a product, and these reviews are very helpful for customers who wish to buy that product. The problem is that when the number of reviews is large, it is not possible for a customer to read all the reviews before taking a decision, so it would be helpful to automate this process; the task is popularly known as sentiment analysis. Potential applications of this task are movie review analysis, product analysis, Twitter opinion mining, etc.

In this work we focus on understanding the polarity of given movie reviews by classifying each as positively or negatively polarized. This problem could be posed as a multi-label classification task where the final opinion is worse, bad, neutral, good or excellent; in this work the problem is posed as a binary classification task where the final opinion is either positive or negative. The reviews given by different people are of different lengths, with a different number of words in each review; sentence vectorization methods are used to deal with this variability of sentence length.

In this paper we investigate the effect of different hyperparameters such as dropout, the number of layers and the activation functions. We analyze the performance of the model with different neural network configurations and report the performance of each configuration.

The IMDB benchmark dataset, which contains movie reviews classified as positive or negative, is used for our experimental studies. In the experiments, an LSTM model is compared to other models, and the LSTM model yields the best performance on the IMDB dataset.

Figure 1. A sample positive and negative review

2. BACKGROUND WORK

Sentiment analysis is the process of analyzing the given textual information to extract the emotions in it. In the recent past, one or more of the following models have been used for this task. In vocabulary-based methods, the important keywords are identified and the review is considered positive, negative or neutral depending on the set of words it contains. The task of sentiment analysis can be achieved using two different types of techniques: lexicon-based and machine-learning-based techniques.

Lexicon-based (or corpus-based) methods leverage the set of words and the semantics of the words in the given review. These are unsupervised techniques, so they do not require labelled data. Machine-learning-based techniques are supervised methods that rely on labeled data. These methods outperform the lexicon-based methods; popular approaches are logistic regression, support vector machines (SVM) and the multi-layer perceptron (MLP).

Among the machine-learning-based techniques, deep neural networks [1] have recently displaced all the classical methods and achieved great success at automatically generating sentiment information from textual descriptions. Most recent progress on this problem has been achieved by employing recurrent neural networks (RNNs). A recurrent neural network is a type of neural network suitable for modeling sequence data [2]; these networks can better represent the temporal dynamics in the data.

RNNs are well suited to modeling sequential data because they can remember the input through their internal memory state and recurrent connections, as shown in Figure 2. The current state is computed as:

h_t = f(h_{t-1}, x_t) = tanh(w_hh h_{t-1} + w_xh x_t)    (1)

In Eq. (1), h_t, h_{t-1} and x_t are the current state, the previous state and the input respectively, and w_hh and w_xh represent the weights associated with the hidden state and with the input respectively.

Figure 2. Working model of RNN

Though RNNs are theoretically capable of modeling long sequential data, they fail to represent long sequences in real applications [3]. This is mainly due to the vanishing or exploding gradients problem. RNNs are trained using backpropagation through time (BPTT); for longer sequences, as the gradient flows back through time it can either explode or vanish, and the model cannot be trained further. Therefore RNNs are not well suited to modeling longer sequences in data.

To address the issue of exploding or vanishing gradients, a variant of neural networks called Long Short-Term Memory networks (LSTMs) has been introduced, which is well suited to modeling longer sequences in the data [4]. An LSTM makes use of four gates to regulate the flow of data.

Figure 3. Illustration of the LSTM cell

Figure 3 shows a pictorial representation of the LSTM cell. The LSTM leverages four different gates to determine how much of the new information is added to the cell state (the input gate i) and how much of the previous cell state is forgotten (the forget gate f), together with the candidate gate g and the output gate o, along with the cell state c_t and the hidden state s_t. σ denotes the logistic sigmoid. The following formulas compute the state values at each gate, where ∘ denotes the elementwise (Hadamard) product:

i = σ(x_t U^i + s_{t-1} W^i)
f = σ(x_t U^f + s_{t-1} W^f)
o = σ(x_t U^o + s_{t-1} W^o)
g = tanh(x_t U^g + s_{t-1} W^g)
c_t = c_{t-1} ∘ f + g ∘ i
s_t = tanh(c_t) ∘ o
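To make these gate equations concrete, here is a minimal NumPy sketch of a single LSTM step in the paper's notation (U matrices act on the input, W matrices on the previous hidden state). The dimensions and random initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, s_prev, c_prev, U, W):
    """One LSTM step following the gate equations above."""
    i = sigmoid(x_t @ U['i'] + s_prev @ W['i'])  # input gate
    f = sigmoid(x_t @ U['f'] + s_prev @ W['f'])  # forget gate
    o = sigmoid(x_t @ U['o'] + s_prev @ W['o'])  # output gate
    g = np.tanh(x_t @ U['g'] + s_prev @ W['g'])  # candidate values
    c_t = c_prev * f + g * i      # new cell state (elementwise products)
    s_t = np.tanh(c_t) * o        # new hidden state
    return s_t, c_t

# Illustrative sizes: 32-d word embeddings, 100 LSTM units (cf. Section 4.2)
d, h = 32, 100
rng = np.random.default_rng(0)
U = {k: rng.normal(scale=0.1, size=(d, h)) for k in 'ifog'}
W = {k: rng.normal(scale=0.1, size=(h, h)) for k in 'ifog'}
s, c = np.zeros(h), np.zeros(h)
s, c = lstm_step(rng.normal(size=d), s, c, U, W)
print(s.shape, c.shape)  # (100,) (100,)
```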
The GRU (gated recurrent unit) works in much the same way as the LSTM cell, but the number of gates is reduced: the functionality of the input and forget gates is merged into a single gate called the update gate. Hence, compared to the LSTM, the GRU is simpler in operation and has fewer parameters, which ultimately results in faster training than LSTMs.
Figure 4. Illustration of LSTM vs GRU

2.1 Sentence vectorization

The objective of this task is to identify the polarity of the given review. These reviews are usually in text format, and none of the neural network models can be used directly with text data: the given text has to be converted to vector form to be used as input to the model. The process of converting the given reviews, or any text, into a vector is known as vectorization of the textual data. Two popular methods in the literature are one-hot encoding and Bag-of-Words (BoW) representations [5]. In one-hot encoding, a binary representation is used with a vector size equal to the number of words in the vocabulary; each element of the vector represents the occurrence or non-occurrence of the corresponding word. In the BoW method, the i-th element of the vector represents the number of times the i-th word of the vocabulary occurs in the given text; that is, instead of a binary 1 or 0, a frequency count is used. These methods are simple, but the context of the words is not preserved in the representation, and the resultant vectors, whose size equals the vocabulary size, are sparse [6].
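The following small sketch (not from the paper) illustrates the difference between the two representations on a toy vocabulary.

```python
from collections import Counter

docs = ["a great great movie", "a dull movie"]
vocab = sorted({w for d in docs for w in d.split()})  # ['a', 'dull', 'great', 'movie']

def one_hot(doc):
    # 1 if the vocabulary word occurs in the document, else 0
    words = set(doc.split())
    return [1 if w in words else 0 for w in vocab]

def bow(doc):
    # frequency count of each vocabulary word in the document
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

print(one_hot("a great great movie"))  # [1, 0, 1, 1]
print(bow("a great great movie"))      # [1, 0, 2, 1]
```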
Another method, called word embedding, has been introduced [7] [8], where the vector size is much smaller than the size of the vocabulary. Sentences that are semantically similar are represented using similar features [9]: in the vector feature space, sentences that are semantically similar yield a higher similarity than ones that are not. One of the recent successful word embedding approaches is the word2vec method [10]. In our work we make use of word2vec representations, as they are capable of preserving context in the final feature representation [11] [12].

Once the representation is available, any classification model can be used. As there is sequential structure in the data, an LSTM can model the sentence vectors better, and finally a softmax layer is used to classify the results.

3. PROPOSED WORK

The objective of this work is to analyze the reviews and predict the sentiment based on the reviews available. In this work we propose a sequential model to identify the sentiment of the movie reviews. To process the sequential data, an LSTM is proposed: a variant of the RNN that is mainly used to process long sequential data. The following steps are involved in the process: identify the dataset, load the data, clean the data, then develop a vocabulary and save the processed data. In this work the IMDB benchmark movie review data is used for the experimental studies; IMDB is a collection of movie reviews retrieved from the imdb.com website.

Figure 5. Block diagram of proposed sentiment classification

In the first step we consider the IMDB dataset for sentiment analysis. During the preprocessing stage, the entire text is converted to lowercase, and all the extra white space, punctuation marks and other special symbols are removed. The sentences are then tokenized; single-letter words, numbers and all the less frequent words are removed, and the remaining words form the vocabulary. The next step is to represent the words of the vocabulary as feature vectors so that these feature representations can be used as input to the model.
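A plain-Python sketch of this preprocessing pipeline is shown below; the paper does not state a frequency threshold for rare words, so the one used here is an assumption.

```python
import re
from collections import Counter

def clean(review):
    """Lowercase, strip punctuation/special symbols, tokenize,
    and drop single-letter words and numbers (Section 3)."""
    review = review.lower()
    review = re.sub(r"[^a-z\s]", " ", review)  # removes digits and symbols too
    return [t for t in review.split() if len(t) > 1]

reviews = ["A truly GREAT film - 10/10!", "Dull, dull plot..."]
tokenized = [clean(r) for r in reviews]

# Drop the less frequent words; min count of 2 is an assumed threshold
freq = Counter(w for toks in tokenized for w in toks)
vocab = {w for w, c in freq.items() if c >= 2}
tokenized = [[w for w in toks if w in vocab] for toks in tokenized]
print(vocab, tokenized)
```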
A more classical representation is the bag-of-words (BoW) representation: if k is the size of the vocabulary, then the feature vector size is k, where the i-th element represents the number of occurrences of the i-th word in the document. This method is very simple and has been used successfully for language modelling and document classification, but the problem with this representation is that the relationship between words is completely ignored. In the proposed work we use word2vec for word embedding. Word2vec is one of the most popular techniques for learning word embeddings using a shallow neural network. Word2vec has two primary methods of contextualizing words: the Continuous Bag-of-Words (CBOW) model and the Skip-Gram model. CBOW, the less popular of the two models, uses source words to predict the target words; in practice, this model is very inefficient when working with a large set of words.

For our work we use the Skip-Gram model, which works in the opposite fashion to the CBOW model, using target words to predict the source, or context, of the surrounding words. Here we train a simple neural network with a single hidden layer, where the objective is to learn the weights of the hidden layer; the weights of the hidden layer represent the "word vectors" that we are trying to learn.
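As an illustration of skip-gram training, here is a minimal sketch using the gensim library; the paper does not name its word2vec implementation, and the window size and other parameters are assumptions, apart from the 32-dimensional vectors matching Section 4.2.

```python
from gensim.models import Word2Vec  # assumes gensim >= 4.0

# Toy corpus standing in for the tokenized, cleaned IMDB reviews
reviews = [
    ["this", "movie", "was", "fantastic", "with", "great", "acting"],
    ["terrible", "plot", "and", "poor", "acting"],
    ["great", "direction", "and", "a", "fantastic", "story"],
]

# sg=1 selects the Skip-Gram objective; vector_size=32 matches the
# embedding dimension reported in Section 4.2
w2v = Word2Vec(reviews, vector_size=32, sg=1, window=5, min_count=1, seed=1)

print(w2v.wv["fantastic"].shape)            # (32,)
print(w2v.wv.most_similar("great", topn=2))
```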
4. EXPERIMENTAL RESULTS

4.1 Summary of the dataset

For our experimental study we use the IMDB dataset. It is the large movie review dataset, a benchmark for movie review datasets, containing a total of 50,000 reviews, of which 25,000 are positively polarized and 25,000 are negatively polarized. Among the total available reviews, 25,000 are used for training and the remaining 25,000 for evaluating the performance of the trained model; that is, the same amount of data is used for training and testing. The objective of this work is to identify the polarity of the given review, that is, whether the given review is of positive or negative sentiment.

Table 1. Summary of the IMDB dataset

Dataset   # Total samples   # Train samples   # Test samples   # Classes
IMDB      50000             25000             25000            2

The task here is to classify whether the given reviews lead to a positive or a negative sentiment. We use logistic regression and SVM based classification models to compare against the results of our proposed model. SVM is a shallow model that has proved to be among the most robust for classifying given data, but with the introduction of deep neural networks all the existing methods have fallen behind in terms of performance.
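The following sketch shows one way to load and prepare this data with the IMDB dataset built into Keras; the paper does not say which implementation it used, and only the 5,000-word vocabulary and 500-token review length follow Section 4.2.

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE = 5000  # keep only the top-5000 words (Section 4.2)
MAX_LEN = 500      # pad/truncate every review to 500 tokens

# 25,000 training and 25,000 test reviews, already integer-encoded
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=VOCAB_SIZE)

# Equal-length inputs for the LSTM: pad short reviews, truncate long ones
x_train = pad_sequences(x_train, maxlen=MAX_LEN)
x_test = pad_sequences(x_test, maxlen=MAX_LEN)
print(x_train.shape, x_test.shape)  # (25000, 500) (25000, 500)
```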
4.2 Architecture of the proposed network

We initialize the word embedding layer with random values. Each word is represented with an embedding vector of size 32. The maximum length of a review is set to 500, as most of the reviews have fewer than 500 words and very few fall on the other side. The top 5000 words are used as the vocabulary, and infrequent words are removed from the dictionary to avoid unnecessary computation.

During training, the hyperparameters that resulted in the best performance are as follows. Dropout is applied with a rate of 0.5. The Adam optimizer is used to optimize the model, and binary cross-entropy is used as the loss function. The initial learning rate is set to 0.001 with a decay rate of 0.97. A batch size of 128 is adopted. The regularization parameter is set to 0.001 to avoid overfitting.
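A plausible Keras realization of this embedding + LSTM + dense architecture, continuing from the data-loading sketch above, is given below. The paper provides no code, so the exact layer arrangement is an assumption, and the 0.97 learning-rate decay schedule is omitted for brevity.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # Randomly initialized 32-d embeddings over the top 5000 words
    Embedding(input_dim=5000, output_dim=32),
    LSTM(100),                       # 100 units: the best width in Table 2
    Dropout(0.5),                    # dropout rate reported above
    Dense(1, activation='sigmoid'),  # binary positive/negative output
])

# Adam with the reported initial learning rate and loss function
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy', metrics=['accuracy'])

# Batch size 128 as reported; 50 epochs matched the best row of Table 2
model.fit(x_train, y_train, batch_size=128, epochs=50,
          validation_data=(x_test, y_test))
```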

Table 2. Performance of the model with different configurations

Configuration of the model: EMBEDDING LAYER + LSTM LAYER + DENSE LAYER

Epochs   LSTM units   Max review length   Accuracy
3        100          500                 85.51 %
3        100          1000                86.72 %
3        100          500                 87.44 %
3        50           500                 86.88 %
3        200          500                 86.65 %
30       200          500                 84.68 %
10       100          500                 84.96 %
3        100          500                 86.96 %
10       100          500                 86.87 %
10       100          500                 86.41 %
50       100          500                 88.46 %

Table 2 exhibits the accuracy of the proposed model with different architectures. From Table 2 we can observe that with 100 LSTM units, when the model is trained for 50 epochs, we reach the best performance compared to the other configurations. When the review length is set to 1000 we can observe a dip in performance. When the number of LSTM units is increased to 200 we also observe a deterioration in performance; this could be due to overfitting.

4.3 Comparison with other models

This subsection gives a comparative study of the performance of different classification models on the benchmark IMDB movie review dataset. The proposed model is compared with logistic regression, SVM, MLP and DNN. Except for the DNN, all the other models are shallow models, and SVM is the most robust classification model among the shallow models.

Table 3. Comparison of proposed model with existing models

SNO   Classification Model      Accuracy
1     Logistic Regression       85.5 %
2     SVM with linear kernel    82.89 %
3     MLP                       87.70 %
4     DNN                       87.64 %
5     LSTM + DNN                88.46 %

To compare the proposed model with the existing models, a comparison study has been done using the different existing models. Table 3 shows that the proposed LSTM based model with word2vec embedding gives better performance than the other models. Logistic regression is better than SVM; this could be due to the linear kernel used with the SVM model.

Figure 6. Comparison of proposed model with existing models

From Figure 6 it is evident that the proposed model gives better performance than the other models. SVM gives the lowest performance, while MLP and DNN come close to the LSTM based model.
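For reference, the logistic regression baseline in Table 3 could be reproduced along these lines; the paper does not describe its baseline features, so the bag-of-words counts and toy data below are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy stand-ins for the raw IMDB review strings and 0/1 labels
train_texts = ["a great great movie", "dull and boring plot",
               "fantastic acting", "poor and boring film"]
train_labels = [1, 0, 1, 0]
test_texts = ["great acting", "a boring film"]
test_labels = [1, 0]

vec = CountVectorizer(max_features=5000)  # cap the vocabulary as in Section 4.2
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
print(accuracy_score(test_labels, clf.predict(X_test)))
```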

5. CONCLUSION AND FUTURE SCOPE

The objective of this work is to determine the polarity of a given review. For the word representations we proposed to use word2vec embeddings, as they represent contextual information well compared to the other models. The experimental studies show that the proposed method, when used with LSTM based classification, gives the best performance.

In future we plan to extend this study further, considering different embedding models on a larger variety of datasets.
REFERENCES

[1] Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724-1734. https://doi.org/10.3115/v1/D14-1

[2] Sutskever I, Vinyals O, Le QV. (2014). Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems.

[3] Pascanu R, Mikolov T, Bengio Y. (2012). On the difficulty of training recurrent neural networks. In: ICML'13 Proceedings of the 30th International Conference on Machine Learning 28: 1310-1318. https://doi.org/10.1007/s12088-011-0245-8

[4] Hochreiter S, Schmidhuber J. (1997). Long short-term memory. In: Neural Computation 9(8): 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735

[5] Ramos J. (2013). Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, pp. 242.

[6] Dong J, Li X, Snoek CGM. (2018). Predicting visual features from text for image and video caption retrieval. In: IEEE Transactions on Multimedia 20(12): 3377-3388. https://doi.org/10.1109/TMM.2018.2832602

[7] Ain QT, Ali M, Riazy A, Noureenz A, Kamranz M, Hayat B, Rehman A. (2017). Sentiment analysis using deep learning techniques: A review. In: International Journal of Advanced Computer Science and Applications (IJACSA) 8(6). https://doi.org/10.14569/IJACSA.2017.080657

[8] Sokolova M. (2018). Big text advantages and challenges: Classification perspective. In: International Journal of Data Science and Analytics 5(1): 1-10. https://doi.org/10.1007/s41060-017-0087-5

[9] Bouazizi M, Ohtsuki T. (2018). Multi-class sentiment analysis in Twitter: What if classification is not the answer. In: IEEE Access 6: 64486-64502. https://doi.org/10.1109/ACCESS.2018.2876674

[10] Yan XL, Subramanian P. (2018). A review on exploiting social media analytics for the growth of tourism. In: International Conference of Reliable Information and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-99007-1_32

[11] Shafi MK, Bhat MR, Lone TA. (2018). Sentiment analysis of print media coverage using deep neural networking. In: Journal of Statistics and Management Systems 21(4): 519-527. https://doi.org/10.1080/09720510.2018.1471263

[12] Petrolito R, Dell'Orletta F. (2018). Word embeddings in sentiment analysis. In: Proceedings of the Fifth Italian Conference on Computational Linguistics.
