Sentiment Extraction From Bangla Text: A Character Level Supervised Recurrent Neural Network Approach

Mohammad Salman Haydar
Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
Email: [email protected]

Mustakim Al Helal
Computer Science
University of Regina
Regina, SK, Canada
Email: [email protected]

Syed Akhter Hossain
Computer Science and Engineering
Daffodil International University
Dhaka, Bangladesh
Email: [email protected]
Abstract—Over recent years, people have become heavily involved in the virtual world to express their opinions and feelings. Every second, hundreds of thousands of posts are gathered on social media sites. Extracting information from these data and determining their sentiment is known as sentiment analysis. Sentiment analysis (SA) is an autonomous text summarization and analysis system. It is one of the most active research areas in the field of NLP and is also widely studied in data mining, web mining and text mining. The significance of sentiment analysis grows day by day due to its direct impact on various businesses. However, extracting sentiment is not straightforward for the Bangla language because of its complex grammatical structure. In this paper, a deep learning model was developed to train on Bangla text and mine the underlying sentiments. A critical analysis was performed to compare it with a different deep learning model across different representations of words. The main idea is to represent a Bangla sentence in terms of its characters and extract information from the characters using a Recurrent Neural Network (RNN). The extracted information is decoded as positive, negative or neutral sentiment.

Index Terms—Bangla, Sentiment Analysis, RNN, Deep Learning, Character level RNN, NLP in Bengali
I. INTRODUCTION

One challenge in understanding user opinions from social media is extracting information from the large amount of opinionated text. It becomes more complicated when opinions are not made explicitly. It is a difficult and time consuming task for human beings to classify such data and extract the opinions. Sentiment analysis has become vital to data science since online reviews are becoming more popular every day. There are many conventional methods for sentiment analysis, and deep learning techniques have also been applied; for instance, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have been used in practice to solve sentiment analysis problems. However, sentiment analysis for reviews or short texts in Bangla has not been addressed at a large scale to date. Bangla now has an increasing amount of text in social media, e.g. Facebook, blogs etc. Therefore, analyzing Bangla text will open a new horizon towards real life intelligence operations in online sectors. Judging the opinions of others is in growing demand across various businesses. In order to produce better analytics and more accurate information, we need to be able to analyze people's reviews. The main contributions of this paper are as follows:

• Showing the effect of character-level representation in the Bangla language.
• Comparing the traditional representation of words with our approach.

The paper is organized in different sections. After a brief summary of related work in section II, we discuss data collection, preprocessing and character encoding in section III. The methodology, the model and the experimental setup are discussed in section IV. The following section demonstrates the results of our experiment and discusses the experimental process. Finally, future work and the conclusion are drawn.

II. RELATED WORK

Due to its complex grammatical structure and limited resources, Bangla has seen very little research so far, and most research on SA has been carried out in the English language. Researchers have proposed different methods to achieve state-of-the-art results. Some past works related to this topic were studied for this paper.

In [1] sentiment analysis was performed on Romanized Bangla and Bangla text collected from different social media. The authors applied a deep recurrent neural network (LSTM) to train their model and obtained an accuracy of 78% with categorical crossentropy loss.

In [2], the authors used a semi-supervised method to identify the sentiment of Twitter posts. They first annotated the posts into positive and negative polarity using a rule based classifier to make training data and then used this data to train their sentiment classifier. They used the support vector machine (SVM) and Maximum Entropy (MaxEnt) algorithms and achieved 93% accuracy with SVM using emoticons as features.
A hybrid method was proposed in [3] to identify sentiment at the sentence level. The authors first determined whether a sentence is subjective or not, designed a model from a mixture of parts-of-speech-tagging (POS) features collected from phrase level similarity, and then used a syntactic model to perform sentiment analysis. By doing so they achieved an overall 63% recall rate using SVM on news data.

Other researchers have also worked on social media data to identify sentiment in [4]. They applied sentiment analysis to a specific domain: they collected posts from a Facebook group and then applied two different methods to identify the polarity of a post, one based on Naive Bayes and the other using lexical resources. After the experiment, they found that in specific domains the lexicon based approach performs better than the other one.
III. DATA COLLECTION AND PREPROCESSING

A. Data Collection

The data have been collected from Facebook pages using the Facebook Graph API. The data are mostly comments by users on Facebook posts; we have also collected reviews from pages, specifically from e-commerce and restaurant Facebook pages, as reviews contain direct opinions of the users. We collected more than 45 thousand items from Facebook.
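The collection step can be sketched roughly as follows. This is a minimal illustration rather than the exact script used in this work: the Graph API version, POST_ID and ACCESS_TOKEN are placeholders, and the pagination handling is a simplifying assumption.

    import requests

    ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder: a valid Graph API token
    POST_ID = "PAGE_POST_ID"            # placeholder: id of a page post

    def fetch_comments(post_id, token):
        """Collect the comments of a post, following pagination links."""
        url = "https://graph.facebook.com/v2.12/%s/comments" % post_id
        params = {"access_token": token, "limit": 100}
        comments = []
        while url:
            data = requests.get(url, params=params).json()
            comments += [c.get("message", "") for c in data.get("data", [])]
            url = data.get("paging", {}).get("next")  # next page, if any
            params = {}  # the 'next' URL already carries the query string
        return comments

    comments = fetch_comments(POST_ID, ACCESS_TOKEN)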
B. Data Preprocessing

We removed all unnecessary data tuples except those containing Bangla. We then tagged the data manually into the Positive, Negative and Neutral classes. Noisy data are those containing English words, only emoticons, or only randomized Bangla words that are of no use for our classification. Figure 1 shows what the data looks like after the noisy data have been cleaned.

Figure 1. Dataset Sample

Table I shows the statistics of our data set after the cleaning operation.

Table I
DATA STATISTICS

Class     Number
Positive   8271
Negative  14000
Neutral   12000
Total     34271

C. Character Encoding

To use these data in our model, we first represented the dataset in a vector space. There are different methods of representing text data: Tf-Idf, Bag of Words and distributed representations of words (e.g. word2vec, GloVe) are some examples. The major drawback of these representations is that they rely strictly on the words of the documents: if a word appears that was not observed during training, the model cannot interpret it and the word has no effect on the model. In most research the word is taken as the unit of a sentence, but the unit can also be a character. In our research we have taken the character as the unit of the sentence.

In [5], Xiang Zhang et al. performed an empirical study on character level text classification using convolutional networks on English datasets and found that this method works well on real life data, that is, data generated by users. The accuracy depends on some other factors, for instance the choice of alphabet and the size of the dataset. In our work we chose an alphabet of 67 Bangla characters, including the space and some special characters; Figure 2 shows the characters we included. We did not include any Bangla numeric characters. The Bangla digits one, two and three are used in place of three other Bengali letters for representation purposes, due to Python's limitation in recognizing them.

Figure 2. Characters

We then encoded each character of a sentence with a unique id from the list of characters; this process is illustrated in Figure 3. The sequence length is l = 1024, and we believe that most of a sentence fits within this length. Sentences shorter than 1024 characters were padded with zeros, and sentences longer than 1024 characters were truncated to 1024. Characters other than the selected 67 are removed before the encoding phase using a regular expression.

Figure 3. Illustration of Encoding
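A minimal sketch of this encoding step is given below. The alphabet list is abbreviated here to a few characters and the helper names are our own; the regular-expression filter, the id assignment and the fixed length l = 1024 follow the description above.

    import re

    # The 67-character Bangla alphabet incl. space and special characters,
    # abbreviated here for readability; id 0 is reserved for padding.
    ALPHABET = [" ", "অ", "আ", "ই", "ঈ", "উ", "এ", "ও", "ক", "খ"]  # ... 67 in total
    CHAR2ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
    MAX_LEN = 1024

    # matches every character outside the chosen alphabet
    NOT_IN_ALPHABET = re.compile("[^%s]" % re.escape("".join(ALPHABET)))

    def encode(sentence):
        """Map a sentence to a fixed-length sequence of character ids."""
        cleaned = NOT_IN_ALPHABET.sub("", sentence)      # drop unselected characters
        ids = [CHAR2ID[ch] for ch in cleaned[:MAX_LEN]]  # truncate to 1024
        return ids + [0] * (MAX_LEN - len(ids))          # zero-pad to 1024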
IV. METHODOLOGY

Deep learning methods have been applied successfully to Natural Language Processing problems and have achieved state-of-the-art results in this field. The Recurrent Neural Network [6] is a kind of neural network used for processing sequential data. Later on, however, researchers found mathematical problems in modeling long sequences with RNNs [7][8].

A clever idea was proposed by Hochreiter and Schmidhuber to solve this problem: create a path and let the gradient flow over the time steps dynamically [9]. It is known as Long Short Term Memory (LSTM) and is a very popular and successful technique for handling the long-term dependency problem. There are several variants of LSTM. One of them is the Gated Recurrent Unit (GRU) proposed by Cho et al. [10]. The difference between LSTM and GRU is that GRU merges the forget and input gates into an update gate, which means it can control the flow of information without a separate memory unit, and it combines the cell state and hidden state, along with some other changes. The rest is the same as LSTM. In [11] Junyoung Chung et al. conducted an empirical study of three types of RNN and found the Gated Recurrent Unit superior to the other two. GRU is also computationally more efficient than LSTM.
z_t = \sigma(W_z \cdot [h_{t-1}, x_t])   (1)
r_t = \sigma(W_r \cdot [h_{t-1}, x_t])   (2)
\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])   (3)
h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t   (4)

The equations above demonstrate how the hidden state h_t is calculated in a GRU. It has two gates: the update gate z and the reset gate r. Equations (1) and (2) show how these two are calculated. The reset gate determines how to combine the new input with the previous memory, and the update gate defines how much of the previous memory to keep. Finally, the hidden state h_t is calculated as in equation (4).
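Equations (1)-(4) translate almost directly into code. The following NumPy sketch computes a single GRU step for illustration; biases are omitted to mirror the equations, and the weight matrices are assumed to be given.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(h_prev, x_t, W_z, W_r, W):
        """One GRU time step following equations (1)-(4); biases omitted."""
        concat = np.concatenate([h_prev, x_t])
        z_t = sigmoid(W_z @ concat)                                  # update gate, eq. (1)
        r_t = sigmoid(W_r @ concat)                                  # reset gate, eq. (2)
        h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))  # candidate, eq. (3)
        return (1 - z_t) * h_prev + z_t * h_tilde                   # new state, eq. (4)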
The classification of sentiment is a step by step process. For example, to classify the first sentence in Figure 1, the sentence first goes through the preprocessing step: all characters except the ones defined above are filtered out, and the remaining sentence is represented in a vector space. Every character is given a numeric id, and the sequence is then zero-padded to 1024 characters (any sentence with more than 1024 characters is truncated to 1024). This vector is fed through the model, and the model eventually maps the input sentence to a sentiment class. In each hidden layer of the model, progressively more meaningful features are extracted from the previous layer, and the output layer calculates the softmax probability of each class. The class with the highest probability is the predicted result. For simplicity and better understanding, we have not included the mathematical details of how the model learns through backpropagation.

A. Model

The baseline model that we compared against consists of an embedding layer with 80 units and 3 hidden layers, of which two are LSTM layers with 128 units each and one is a vanilla (fully connected) layer with 1024 units, followed by an output layer with 3 units. We used a dropout [12] layer with probability 0.3 between the last hidden layer and the output layer.

In our model we used an embedding layer with 67 units and 3 hidden layers, where two layers have 128 GRU units each and one vanilla layer has 1024 units, stacked serially, with the output layer at the end. Here we also used a dropout of 0.3 between the last hidden layer and the output layer. Our model is illustrated in Figure 4, and a code sketch follows below.

Figure 4. Model Structure
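As a rough sketch, the architecture described above could be assembled in Keras as follows. The layer sizes are taken from the description; the ReLU activation of the vanilla layer and the vocabulary size of 68 (67 characters plus the zero padding id) are assumptions of this sketch.

    from keras.models import Sequential
    from keras.layers import Embedding, GRU, Dense, Dropout

    VOCAB_SIZE = 68   # 67 characters plus the zero padding id (assumption)
    MAX_LEN = 1024    # fixed sequence length from section III

    model = Sequential()
    model.add(Embedding(VOCAB_SIZE, 67, input_length=MAX_LEN))  # embedding size 67
    model.add(GRU(128, return_sequences=True))   # first recurrent hidden layer
    model.add(GRU(128))                          # second recurrent hidden layer
    model.add(Dense(1024, activation="relu"))    # "vanilla" fully connected layer
    model.add(Dropout(0.3))                      # dropout before the output layer
    model.add(Dense(3, activation="softmax"))    # positive / negative / neutral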
B. Experimental Setup

We ran our model for 6 epochs with a batch size of 512, used Adam [13] as our optimizer and categorical cross entropy as our loss function, and set the learning rate to 0.01. Many different hyperparameter settings (learning rate, number of layers, layer size, optimizer) were tried, and this configuration gave us the optimal result. The embedding size was kept at 67 as we have 67 characters, and the dropout was set to 0.3 between the output layer and the dense layer in both models. Early stopping was used to avoid overfitting. All our experiments were done with the Python library Keras [14], a high-level neural networks API.
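In Keras this setup corresponds roughly to the following calls. The variables x_train, y_train, x_test and y_test stand for the encoded sequences and one-hot labels prepared earlier and are assumptions of this sketch, as is the exact early stopping criterion.

    from keras.optimizers import Adam
    from keras.callbacks import EarlyStopping

    model.compile(optimizer=Adam(lr=0.01),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    # x_train, y_train, x_test, y_test: encoded sequences and one-hot labels
    model.fit(x_train, y_train,
              batch_size=512,
              epochs=6,
              validation_data=(x_test, y_test),
              callbacks=[EarlyStopping(monitor="val_loss")])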
C. Results and Discussion

The result achieved by the character level model over the word level model is good: we obtained 80% accuracy with the character level model and 77% accuracy with our baseline model using word level representation. The highest recently reported accuracy for sentiment analysis in Bangla was 78%, obtained in [1] using LSTM with two-class classification.

Figure 5 shows the training and testing loss of our model. After a certain epoch, the training loss starts decreasing faster than the testing loss; the training loss keeps decreasing, while the testing loss decreases at a slower rate. We therefore stopped training at epoch 6, saving the model from overfitting. Figure 6 shows the training and testing accuracy of our character level model, and Figure 7 shows the comparison between the two models.

Figure 5. Training and Testing loss
Figure 6. Training and Testing accuracy
Figure 7. Comparison of the two models

The most important observation from our experiments is that the character-level RNN can work for text classification without needing the semantic meanings of words. It may also extract information even when a word is misspelled, since we go through each of the characters individually. However, more study is needed to prove this for Bangla, so that this representation can be used to handle real life data from social media. More research is also needed to observe the performance of this model across different datasets. Nevertheless, the result depends on various factors including the size of the dataset, the choice of alphabet, data quality etc. Our dataset is focused on a specific telecommunication campaign domain, so this model can be helpful in some specific applications.

We calculated the accuracy as the ratio of correctly classified data to the total number of data in the test set, where T_p, T_n, F_p and F_n denote true positives, true negatives, false positives and false negatives respectively:

Accuracy = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}   (5)
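For the multi-class model this reduces to the fraction of correctly predicted test samples, which can be computed as in the following sketch (model, x_test and y_test as in the previous listings):

    import numpy as np

    def accuracy(model, x_test, y_test):
        """Fraction of correctly classified test samples, cf. equation (5)."""
        pred = np.argmax(model.predict(x_test), axis=1)  # predicted class ids
        true = np.argmax(y_test, axis=1)                 # one-hot labels to ids
        return float(np.mean(pred == true))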
V. FUTURE WORK AND CONCLUSION

To conclude, this paper offers a research based study of character-level RNNs for sentiment analysis in Bangla. We compared our model with a deep learning model using word level representation and found a good result. However, the model is not generic of its kind, since it worked well with data from a specific domain. Also, we did not address sarcastic sentence analysis in this model: if a positive word is used in a sentence with a negative, sarcastic intent, the model will not be able to detect it. This needs to be addressed, which is challenging due to the level of abstraction a user can create in one sentence, so intensive research is needed in this regard. Our analysis shows that the character-level RNN is an effective method for extracting sentiment from Bangla. The model, however, is still immature and is yet to be applied to Romanized Bangla. Making the model more reliable across different data is one future goal of this project, which would make it usable at the industry level to extract sentiment from social media reviews and comments.

REFERENCES

[1] Hassan, Asif, et al. "Sentiment analysis on Bangla and Romanized Bangla text using deep recurrent models." Computational Intelligence (IWCI), International Workshop on. IEEE, 2016.
[2] Chowdhury, Shaika, and Wasifa Chowdhury. "Performing sentiment analysis in Bangla microblog posts." Informatics, Electronics & Vision (ICIEV), 2014 International Conference on. IEEE, 2014.
[3] Das, Amitava, and Sivaji Bandyopadhyay. "Phrase-level polarity identification for Bangla." Int. J. Comput. Linguist. Appl. (IJCLA) 1.1-2 (2010): 169-182.
[4] Akter, Sanjida, and Muhammad Tareq Aziz. "Sentiment analysis on Facebook group using lexicon based approach." Electrical Engineering and Information Communication Technology (ICEEICT), 2016 3rd International Conference on. IEEE, 2016.
[5] Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character-level convolutional networks for text classification." Advances in Neural Information Processing Systems. 2015.
[6] Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. "Learning representations by back-propagating errors." Nature 323.6088 (1986): 533.
[7] Hochreiter, Sepp. "Untersuchungen zu dynamischen neuronalen Netzen." Diploma thesis, Technische Universität München 91 (1991).
[8] Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE Transactions on Neural Networks 5.2 (1994): 157-166.
[9] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.
[10] Cho, Kyunghyun, et al. "On the properties of neural machine translation: Encoder-decoder approaches." arXiv preprint arXiv:1409.1259 (2014).
[11] Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).
[12] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
[13] Kingma, Diederik, and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
[14] F. Chollet and others, "Keras", Keras.io, 2015. [Online]. Available: http://keras.io. [Accessed: 16-Nov-2017].