
Information Processing and Management 58 (2021) 102656


A machine learning-based sentiment analysis of online product reviews with a novel term weighting and feature selection approach

Huiliang Zhao a,d, Zhenghong Liu b,*, Xuemei Yao c,*, Qin Yang d

a Department of Product Design, Guizhou Minzu University, Guiyang, 550025, China
b School of Mechanical Engineering, Guiyang University, Guiyang, 550005, China
c School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang, 550025, China
d School of Mechanical Engineering, Guizhou University, Guiyang, 550025, China

A R T I C L E   I N F O

Keywords: Sentiment analysis; Polarity classification; Machine learning; Sentiment analysis of online product reviews; Term weighting; Elman Neural Network (ENN); Bat algorithm (BA)

A B S T R A C T

Recently, online shopping has turned into a mainstream means for users to purchase as well as consume with the upsurge development of Internet technology. User satisfaction can be improved effectively by doing Sentiment Analysis (SA) of a large quantity of user reviews on e-commerce platforms. It is still challenging to envisage the accurate sentiment polarities of the user reviews because of the changes in sequence length, textual order, along with complicated logic. This paper proposes a new optimized Machine Learning (ML) algorithm called the Local Search Improvised Bat Algorithm based Elman Neural Network (LSIBA-ENN) for the SA of online product reviews. The proposed work of SA encompasses '4' major steps: i) Data Collection (DC), ii) preprocessing, iii) Features Extraction (FE) or Term Weighting (TW) and Feature Selection (FS), and iv) polarity or Sentiment Classification (SC). Initially, the Web Scraping Tool (WST) is utilized to extract the customer reviews of the products, for which the data is gathered from the e-commerce websites. Next, preprocessing is carried out on the web-scraped data. The preprocessed data go through TW and FS for additional processing by means of the Log Term Frequency-based Modified Inverse Class Frequency (LTF-MICF) and the Hybrid Mutation based Earthworm Algorithm (HMEWA). Lastly, the HMEWA output is rendered to the LSIBA-ENN, which classifies the customer reviews' sentiment as positive, negative, or neutral. For the performance analysis of the proposed and prevailing classifiers, '2' yardstick datasets are taken. The outcomes exhibit that the LSIBA-ENN attains the best performance in SC when weighted against the existing top-notch algorithms. For example, the prevailing ENN proffers a recall of 87.79 when utilizing the proposed LTF-MICF scheme, whereas it only achieves recall of 83.55, 84.03, 85.48, and 86.04 whilst utilizing the W2V, TF, TF-IDF, and TF-DFS schemes, respectively.

1. Introduction

The world-wide-web has moved from (mostly) read-only to (mostly) read-write, which for decades has made users more eager to share their emotions and opinions openly (Kiran, Kumar & Bhasker, 2020). The online reviews present on a website are expected

* Corresponding author.
E-mail addresses: [email protected] (Z. Liu), [email protected] (X. Yao).

https://doi.org/10.1016/j.ipm.2021.102656
Received 15 March 2021; Received in revised form 5 May 2021; Accepted 1 June 2021
Available online 16 June 2021
0306-4573/© 2021 Elsevier Ltd. All rights reserved.

Abbreviations

Abbreviation Expansion
SA Sentiment Analysis
ML Machine Learning
LSIBA-ENN Local Search Improvised Bat Algorithm based Elman Neural Network
DC Data Collection
FE Features Extraction
TW Term Weighting
FS Feature Selection
SC Sentiment Classification
WST Web Scraping Tool
LTF-MICF Log Term Frequency-based Modified Inverse Class Frequency
HMEWA Hybrid Mutation based Earthworm Algorithm
SP Sentiment Polarity
ABCDM Attention-centred Bidirectional CNN-RNN Deep Model
DL Deep Learning
SL Sentiment Lexicon
GRU Gated Recurrent Unit
CNN Convolutional Neural Network
TF-IDF Term Frequency-Inverse Document Frequency
SF Sentiment Features
OPSM Order-Preserving Sub-Matrix
LSTM Long Short-Term Memory
BiGRU Bidirectional Gated Recurrent Unit
GL Gensim Lemmatization
SBS Snow-Ball Stemming
NLP Natural Language Processing
MICF Modified Inverse Class Frequency
POS Parts of Speech
EWA Earthworm Algorithm
RNN Recurrent Neural Network
LSIBA Local Search Improvised BA
BA Bat Algorithm
W2V Word2Vec (word to vector)
SVM Support Vector Machine
NB Naïve Bayes
ENN Elman Neural Network

to augment user credibility, attract consumer visits, and increase both the hit ratio and the time spent on the site (Singla, Randhawa & Jain, 2017). Online reviews are a major asset for users who are deciding to buy a product, watch a movie, or go to a restaurant, and for managers who are making business decisions (Qiu, Liu, Li & Lin, 2018; Jadhav & Jadhav, 2018). These sources provide reviews and feedback on services as well as products, which also lets viewers rate them (Dong, 2020). SA has grown rapidly since the early 2000s (Nandal, Tanwar, & Pruthi, 2020). SA is a sub-domain of web content mining (Pandey & Soni, 2019). Before analysis, SA typically applies preprocessing methods such as normalization, tokenization, and duplicate removal to clean the raw review text. SA can also be utilized in market intelligence, for measuring the degree of user satisfaction with products or services and improving their weaknesses, forecasting price changes from news sentiment, developing new products and services, and promoting and improving products according to customers' reviews (Tubishat, Idris & Abushariah, 2018).
The main direction of SA is sentence-level SA (Chen, Xu, He & Wang, 2017). The emotion analysis’s accuracy could be improved by
identifying the relationship betwixt product feature words, particularly applicable to online comments on complex products (Du &
Yang, 2019). Firstly, the reviews are gathered in the process, their sentiment identified, sentiments classified, features selected, and
also lastly, sentiment polarization ascertained or computed (Chakraborty, Bhattacharyya, Bag & Hassanien, 2018). Systems are
required that can perform analysis steps automatically for evaluating subjective information efficiently, and help to generate sentiment
and aspect dictionaries (Wladislav et al., 2018). These reviews go through an assorted process and classification techniques for mining
the sentiments technically (Raj, 2019). These reviews and also ratings include both positive along with negative descriptions
(Dharaiya, Soneji, Kakkad & Tada, 2020). Thus, it is significant to know the nature of customers’ opinions, being positive, negative, or
neutral in Social Media (Bavakhani, Yari & Sharifi, 2019).


Furthermore, real noisy text data containing idioms, sarcasm, informal words and phrases, and spelling mistakes is not well understood by current SA approaches (Dashtipour et al., 2020). Automatically analyzing the subjective emotion in text with intelligent technology simplifies the consumer decision-making process when handling unstructured online reviews, and this has emerged as a hot research subject in the field of computer science (Zhang, Kong & Zhu, 2019). Currently, Deep Learning (DL) techniques, a subfield of ML, play a major part in many SA systems, and various researchers have begun to study them to improve how the data are processed (Bou Nassif, Elnagar, Shahin & Henno, 2020). This paper proposes LSIBA-ENN for the SA of online product reviews, which achieves the highest accuracy for SC. The proposed technique comes with a new TW approach and a novel FS method. The underlying optimization algorithm is based on the echolocation behavior of microbats: these species have sophisticated sound-wave tracking abilities and a remarkable steering practice that allows them to differentiate between targets and other obstacles even in total darkness. The basic Bat Algorithm serves as the starting point for bat algorithms tailored to numerous optimization purposes.
This paper is organized as follows: Section 2 examines the related work concerning the proposed method. Section 3 provides a brief description of the proposed work. Section 4 surveys the experimental outcomes. The conclusion is provided in Section 5.

2. Related work

Chen et al. (2019) proffered a TF-IDF together with FPCD-phrase feature-extraction methodology aimed at product reviews' SA. By implementing the OPSM bi-clustering methodology, local patterns of the feature vectors were identified while pondering the product reviews' different lengths. For detecting the frequent and pseudo-consecutive phrases with a high discriminative capability (the FPCD phrases) and word-order information, the PrefixSpan approach was employed. Moreover, for ameliorating the discriminative capability of Sentiment Polarity (SP), several significant factors were used, namely the separation and the discriminative capability of words. Next, the text feature extraction was done. The series of experimental and comparative outcomes evinced that the SA performance on product reviews was ameliorated. However, in the extraction of textual features, it offers less classification accuracy.
Guo, Du & Kou (2018) formed a technique for ranking manifold products through heterogeneous information from online reviews, centred on the disparate aspects of the alternative products. They combined objective together with subjective sentiment values. Whilst computing the alternative products' total scores, consumers' personalized preferences were pondered; in the interim, the comparative superiority betwixt every two products also contributed to their final scores. Thus, utilizing an enhanced PageRank method, a directed graph design was built and each product's final score was calculated. Nevertheless, the technique fails to offer an accurate answer to the customer.
Basiri et al. (2021) formed an Attention-centred Bidirectional CNN-RNN Deep Model (ABCDM) aimed at the SA of Twitter and product reviews. The ABCDM extracted both past and future contexts by pondering the temporal information flow in both directions, utilizing two independent bidirectional LSTM and GRU layers. An attention process was applied to the outputs of ABCDM's bidirectional layers for placing more or less emphasis on different words. The efficiency of ABCDM was assessed on sentiment polarity detection, the common and necessary task of SA. Studies were performed on '5' review and '3' Twitter datasets. The outcomes showed that ABCDM handled both long-review and short-tweet polarity classification. However, the system centred only on document-level SA. Polarity classification fits a variety of business technologies and applications, and it can also be extended to new tasks such as enhancing recommendation processes so that products which have received negative responses on multiple occasions are not suggested.
Huang et al. (2019) formulated an ensemble learning framework for the SC of Chinese online reviews. The Part-of-Speech Combination Pattern, the Frequent Word Sequence Pattern, and the Order-Preserving Sub-Matrix Pattern were extracted as the input features. The Random Subspace algorithm centred on Information Gain was utilized to address the problem of the massive number of features in the reviews, which could simultaneously improve the base classifiers. At last, base classifiers constructed per product attribute were adopted for associating the sentiment information of every attribute in a review to attain good performance on sentiment classification. However, this system classified using only limited product attributes.
Li et al. (2020) developed deep learning (DL) centred SA models, named lexicon-integrated two-channel CNN-LSTM family models, for SA. A sentiment-padding methodology was utilized to produce input samples of uniform size and to raise the proportion of sentiment information in every review. Sentiment padding alleviated the gradient-vanishing problem between the input layer and the first hidden layer that can arise when using zero padding, and it exploits high-quality lexicon components for SA. Extensive experiments established that the sentiment lexicon information and the parallel two-channel model contributed to the improvement of SA accuracy.
Liu, Bi & Fan (2017) formulated a methodology that relied on SA and intuitionistic fuzzy set theory for ranking products by means of online reviews. An algorithm centred upon sentiment dictionaries was formulated for recognizing the neutral, positive, and negative sentiment orientations toward each product feature of the alternative products in every review. Centred upon the identified positive, neutral, and negative sentiment orientations, an intuitionistic fuzzy number was formulated to represent the performance of an alternative product regarding a feature.
Yang, Li, Wang & Sherratt (2020) formulated an SA model, SLCABG, centred upon the sentiment lexicon (SL), an attention-centred Bidirectional Gated Recurrent Unit (BiGRU), and a Convolutional Neural Network (CNN). The SLCABG model combines the benefits of SL and DL technology. The SL was utilized for enhancing the sentiment features (SF) in the reviews. The Gated Recurrent Unit (GRU) network and the CNN were implemented to extract the chief SF and context features in the reviews and used the attention

Fig. 1. Proposed framework.

mechanism to weight them. Finally, the weighted SF were classified. However, this approach could only divide sentiment into negative and positive categories, which is not suitable in areas with high requirements for sentiment refinement.
Murugan & Devi (2019) classified streaming Twitter data based on sentiment analysis using hybridization methods; a genetic algorithm, particle swarm optimization, and a decision tree algorithm were used for the sentiment classification. For this classification, 600 tweets were gathered with the help of a URL-based security tool and feature generation.
In related work, Murugan & Devi (2019) formulated feature extraction by combining a dimensionality reduction technique with logistic regression in principal component analysis; using a linear statistical approach, spammers on Twitter were detected.

3. Proposed methodology

Here, an ML algorithm called LSIBA-ENN is proposed for carrying out the SA of online product reviews. Initially, the data openly available on e-commerce websites, such as Taobao, JD, Amazon, and eBay, are gathered. After that, the WST is utilized for capturing or extracting the customer opinion data in text format from the sites. Web scraping software is a tool adapted specifically to extract data from websites automatically. Such tools are beneficial to anyone who wants to acquire information that cannot easily be obtained from a database in any other way. They can manage proxy servers and headless browsers, run JavaScript, and rotate the proxy on each request, so the actual HTML page can be retrieved without interruption; they are simple to use for coders and non-coders alike and are widely used for e-commerce data extraction. Next, the reviews written by the customers regarding the products are filtered out of the extracted data. After that, the preprocessing (whitespace tokenization, Gensim Lemmatization (GL), and Snow-Ball Stemming (SBS)) of the customer reviews is carried out. Tokenization is the method of breaking text down into manageable chunks, called tokens; a large amount of text may be broken into words or phrases, and specific criteria for splitting the text into relevant tokens can be chosen depending on the problem at hand.

In Natural Language Processing (NLP) and machine learning, lemmatization is among the most popular data preprocessing methods. For example, the word 'better' is derived from 'good', so its lemma is 'good'. Lemmatization is more precise than stemming and suits applications such as virtual assistants, where knowing the context of the conversation is important; it improves the results at the expense of being a more time-consuming part of the information extraction process.


The LTF-MICF performs the TW of the preprocessed dataset. A word that occurs ten times in a review is not inherently ten times as relevant as a word that occurs once in the whole document, so the logarithmic function in LTF is used to dampen the intra-document frequency. Meanwhile, the Modified Inverse Class Frequency compensates, through its class-based weighting, for terms that a plain inverse-frequency measure would otherwise discount. Subsequently, the HMEWA selects the most informative features from the extracted features (the term-weighted reviews of the dataset). Finally, the chosen features are input to the proposed LSIBA-ENN for the polarity classification. The LSIBA-ENN's output signifies the '3' classes (sentiments) of the input data: i) positive, ii) negative, and iii) neutral. The proposed SC framework is exhibited in Fig. 1: the reviews given by customers for online products are gathered from the e-commerce websites, and the collected pages are processed using the web scraping tool to extract the data. Once the data are extracted, the reviewer comments are filtered by product. After that, preprocessing takes place. Further, the features of the data are weighted using the Log Term Frequency (LTF) and Modified Inverse Class Frequency (MICF), and then the feature selection is performed with the HMEWA method.

3.1. Data collection

The WST gathers the customer data from the websites Taobao, JD, Amazon, and eBay for analyzing the customers' sentiment toward the products; it is developed specifically to extract data from websites and is also termed a web harvesting tool or web data extraction tool. The textual information of the consumer data is amassed by the WST. A numeric rating indicates the general preference of a user, whereas the review text combines information about the topics of the sentences and the sentiments expressed. The preprocessing operations of the customer reviews for further processing are given in Section 3.2.
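As an illustration of this collection step (not the authors' actual tool), the following minimal Python sketch fetches a page and pulls out review text; the URL and the CSS selector are hypothetical placeholders for whatever markup the target site uses.

```python
# Minimal review-scraping sketch (illustrative only): the URL and the
# selector "div.review-text" are hypothetical placeholders, not the actual
# markup of Taobao, JD, Amazon, or eBay.
import requests
from bs4 import BeautifulSoup

def scrape_reviews(url: str, selector: str = "div.review-text") -> list[str]:
    """Fetch a product page and return the text of every matched review node."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Keep only the textual content of each matched review element.
    return [node.get_text(strip=True) for node in soup.select(selector)]

if __name__ == "__main__":
    reviews = scrape_reviews("https://example.com/product/123")
    print(f"Collected {len(reviews)} reviews")
```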

3.2. Preprocessing

Preprocessing techniques are used to remove unwanted content from the datasets. Here, preprocessing is performed as a data preparation step for SC; whitespace tokenization, GL, and SBS are the '3' preprocessing operations.

3.2.1. Tokenization
Whenever this process runs into a whitespace character, it breaks text (input customer reviews) into terms or tokens, which aid in
comprehending the context or creating the model for the NLP. It also analyzes the word sequence to interpret the text’s meaning.
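A minimal sketch of whitespace tokenization in Python (the splitting criterion here is simply any run of whitespace; the authors' exact tokenizer settings are not specified):

```python
# Whitespace tokenization: break each review into tokens wherever a
# whitespace character occurs.
def whitespace_tokenize(review: str) -> list[str]:
    return review.split()  # str.split() with no argument splits on any whitespace run

print(whitespace_tokenize("The battery life is great, totally worth it"))
# -> ['The', 'battery', 'life', 'is', 'great,', 'totally', 'worth', 'it']
```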

3.2.2. Gensim lemmatization


The GL discovers the dictionary form of a word, called the lemma, by means of vocabulary (the dictionary importance of words) and morphological analysis (word structure and grammatical relations). Morphological analysis is the process used to determine the total set of possible relations in a multidimensional set; it is also used for problem-solving in multiple dimensions. The Gensim package ($G_p^g$) provides the lemmatize method, which this paper uses to perform disparate operations, such as lower-casing and the removal of numerics, stand-alone punctuation, special characters, and stop words. In addition, it identifies words with part-of-speech tagging (POS tagging, $P_{os}$), considering nouns ($N_{ouns}$), verbs ($V_{erbs}$), adjectives ($A_{djs}$), and adverbs ($A_{verbs}$). Gensim provides its lemmatization capability on top of the pattern package: the lemmatize function in the utils module can be used, and the optional 'pattern' package must be installed for this feature to work. The function returns each lemma as a byte sequence together with its tag, from which the desired lemmatized copy of the text can be built; the lemma is the normal form to which each occurrence of the word in a sentence is linked. Moreover, the aim of part-of-speech (POS) labelling is to enhance the consistency of the review representation by screening on precision and recall; it can also be extended to the lexicons under study to delete groups of terms that are deemed insignificant or mere noise for document identification.

$G_p^g$ utilizes the English lemmatizer from the pattern library to extract the lemmas. Word-category disambiguation identifies the words in the reviews, considering both the definition and the context to identify the specific $P_{os}$.
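A minimal sketch of this step, assuming gensim < 4.0 with the optional pattern package installed (gensim.utils.lemmatize was removed in gensim 4.x, where a spaCy or NLTK lemmatizer would be the usual substitute):

```python
# Gensim lemmatization sketch: keeps nouns, verbs, adjectives, and adverbs
# and returns byte strings of the form b'lemma/POS'.
from gensim.utils import lemmatize

review = "The chargers are working nicely after several weeks"
tokens = lemmatize(review)
print(tokens)  # roughly [b'charger/NN', b'work/VB', b'nicely/RB', b'week/NN', ...]

# Split each token if the plain lemma and its POS tag are needed separately.
lemmas = [tok.decode("utf-8").split("/")[0] for tok in tokens]
```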

3.2.3. Snowball stemming


Reducing a word to its base word or stem, i.e., mapping words of a similar kind to a common stem, is called stemming. Here, the SBS is used, which is also termed the Porter2 stemming algorithm, as it is an improved adaptation of the Porter stemmer; it stems words to a more accurate stem. Porter2 is a suffix-stripping stemmer: it converts terms to stems by applying an ordered set of rules to the final letter sequence of a word. An alternative is to look up each inflected form in a database and link it to a morphological root, or to use a clustering scheme to bind the various forms to a central form. Stemming removes a small number of characters from the end of a word without checking the word's meaning, whereas lemmatization converts words into a meaningful base form without arbitrarily removing characters.
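A minimal sketch using NLTK's SnowballStemmer (the Porter2 implementation commonly used in Python):

```python
# Snowball (Porter2) stemming of a few review tokens.
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")
tokens = ["charging", "charger", "charged", "happily", "batteries"]
print([stemmer.stem(t) for t in tokens])
# roughly ['charg', 'charger', 'charg', 'happili', 'batteri']
```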

3.3. Term weighting

After the preprocessing operation is executed, the term weighting $T_w$ of the input reviews is performed. A new $T_w$ scheme is utilized by the paper, the LTF-MICF technique, which is an amalgamation of two weighting schemes, namely Log Term Frequency (LTF) and Modified Inverse Class Frequency (MICF).

Fig. 2. Pseudocode of the LTF-MICF.

Term frequency $T_f$ measures how often a term occurs within a review document. However, $T_f$ alone is inadequate, as terms that occur more frequently obtain a very large weight in a document. Supervised $T_w$ schemes have attracted rising attention for making use of the class information of reviews; thus, $T_f$ is merged here with the MICF, a supervised $T_w$ approach. Inverse class frequency $i_{cf}$ is the inverse ratio of the number of classes wherein the term occurs in the training reviews to the total number of classes.

Initially, the LTF-centred $T_w$ is calculated: $T_f$ is computed for every term in the preprocessed dataset, and then log normalization is applied to the $T_f$ values, denoted LTF or $L_{tf}$. After that, the MICF, a modified version of $i_{cf}$, is computed for each term. For each term, the disparate class-specific scores must have different contributions (importance) to the whole term score, so $i_{cf}$ is modified: different weights are assigned to the different class-specific scores, and the weighted sum across all class-specific scores is used as the whole term score. The proposed formula for $T_w$ utilizing the above combination of schemes is expressed in Eq. (1):

LTF\text{-}MICF(t_p) = L_{tf}(t_p) \cdot \sum_{q=1}^{n} \big[\, w_{pq} \cdot i_{cf}(t_p) \,\big]    (1)

where $w_{pq}$ symbolizes the class-specific weighting factor of term $t_p$ for class $c_q$, which is defined as

w_{pq} = \log\!\left(1 + \frac{r_i(\vec{t}\,)}{\max\big(1,\, r_i(\overleftarrow{t}\,)\big)} \cdot \frac{r_i(\tilde{t}\,)}{\max\big(1,\, r_i(\widehat{t}\,)\big)}\right)    (2)

The weighting factor assigns a class-dependent weight to each term of the given dataset. Here, $r_i(\vec{t}\,)$ indicates the number of reviews ($r_i$) in class $c_q$ that include the term $t_p$, $r_i(\overleftarrow{t}\,)$ is the number of reviews that have the term $t_p$ in the other classes, $r_i(\widehat{t}\,)$ is the number of reviews in class $c_q$ that do not include the term $t_p$, and $r_i(\tilde{t}\,)$ is the number of reviews that do not contain the term $t_p$ in the other classes. The constant 1 is utilized to avoid negative weights. If $r_i(\overleftarrow{t}\,) = 0$ or $r_i(\widehat{t}\,) = 0$, the corresponding denominator is set to 1 to avoid a zero denominator in this extreme case. A new term weight named $LTF\text{-}MICF(t_p)$ is produced centred on $MICF(t_p)$. The formulas for $L_{tf}(t_p)$ and $i_{cf}(t_p)$ are

L_{tf}(t_p) = \log\!\big(1 + T_f(t_p, r_i)\big)    (3)

where $T_f(t_p, r_i)$ is the raw count of a term $t_p$ in a review document $r_i$, i.e., the total number of times the term $t_p$ occurs in the set of review documents $r_i$.


i_{cf}(t_p) = \log\!\left(1 + \frac{q}{c(t_p)}\right)    (4)

where $q$ signifies the total number of classes in the set of review documents, and $c(t_p)$ is the number of classes that include the term $t_p$. After $T_w$, the features of the dataset are indicated as $F_i = \{F_1, F_2, \ldots, F_n\}$, where $F_1, F_2, \ldots, F_n$ are the $n$ weighted terms from the preprocessed dataset. The proposed LTF-MICF pseudo-code is exhibited in Fig. 2.
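The following Python sketch implements Eqs. (1)-(4) under the reading given above (the corpus is a list of token lists with parallel class labels); exact smoothing and normalization details may differ from the authors' implementation.

```python
# Sketch of the LTF-MICF term weight (Eqs. (1)-(4)); the corpus is a list of
# token lists with a parallel list of class labels.
import math
from collections import Counter

def micf_part(reviews, labels, term):
    """Class-aggregated factor: sum over classes of w_pq * icf(term)."""
    classes = sorted(set(labels))
    q = len(classes)
    # c(t): number of classes in which the term occurs at least once.
    c_t = sum(any(term in r for r, l in zip(reviews, labels) if l == cls)
              for cls in classes)
    icf = math.log(1 + q / max(1, c_t))                                  # Eq. (4)
    total = 0.0
    for cls in classes:
        in_cls_with    = sum(term in r for r, l in zip(reviews, labels) if l == cls)
        other_with     = sum(term in r for r, l in zip(reviews, labels) if l != cls)
        in_cls_without = sum(term not in r for r, l in zip(reviews, labels) if l == cls)
        other_without  = sum(term not in r for r, l in zip(reviews, labels) if l != cls)
        w_pq = math.log(1 + (in_cls_with / max(1, other_with))
                          * (other_without / max(1, in_cls_without)))    # Eq. (2)
        total += w_pq * icf
    return total

def ltf_micf(review_tokens, reviews, labels, term):
    ltf = math.log(1 + Counter(review_tokens)[term])                     # Eq. (3)
    return ltf * micf_part(reviews, labels, term)                        # Eq. (1)

# Toy example:
docs = [["good", "battery"], ["bad", "screen"], ["good", "screen"]]
labs = ["pos", "neg", "pos"]
print(ltf_micf(docs[0], docs, labs, "good"))
```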

3.4. Feature selection

The existence of irrelevant features in the data can reduce the accuracy of the classification model and force the model to learn from irrelevant features. Herein, FS is formulated as an optimization problem that detects the optimal subset of informative features from the dataset for SC. The work employs the optimization technique termed EWA to execute FS, which is detailed as follows.

3.4.1. Earthworm algorithm


To optimize a solution, the Earthworm Algorithm (EWA) utilizes a population of diverse earthworms. The EWA is inspired by the two earthworm reproduction types: the first reproduction $E_{rp}^1$ and the second reproduction $E_{rp}^2$. $E_{rp}^1$ generates just one child, whereas $E_{rp}^2$ creates one or more children at a time using improved crossover operators. Define $F_i$ as the set of features of the review documents, which is offered as input to the EWA. Each child produced by an earthworm $E_w$ (the extracted features $F_i$) carries a full genome whose length equals that of the parent $E_w$. The $E_w$ with the best fitness value is elected and cannot be changed by the operators in the upcoming generation. The EWA possesses a strong exploitation capability; at the same time, its exploration capability is weak and it easily falls into a local optimum. To overcome this demerit, a Gaussian mutation strategy is modelled and combined with a Cauchy mutation protocol to boost the algorithm's ability to escape local optima; Cauchy mutation, and a variant that is a linear combination of the two distributions, is considered to have a greater ability to escape local optima, and the activity ranges can be useful in determining appropriate parameter values. This hybrid-mutation-centred EWA is termed HMEWA. The first reproduction $E_{rp}^1$ of the proposed HMEWA is given in Eq. (5):

Q_i^x = Q_i^{\max} + Q_i^{\min} - \varphi \cdot Q_i^y, \qquad \varphi \in [0, 1]    (5)

Here, $Q_i^x$ signifies the $i$-th component of the position vector $\vec{p}$ of the newly generated earthworm $x$; $Q_i^y$ signifies the $i$-th position component of the $y$-th earthworm; $Q_i^{\max}$ and $Q_i^{\min}$ imply the upper and lower bounds of earthworm $y$'s position; and $\varphi$ signifies the similarity factor that governs the distance $d$ betwixt the child and the parent $E_w$. When $d$ is minimal, i.e., $Q^x$ is very near to $Q^y$, the local search procedure is executed. The distance $d$ is maximal whilst $\varphi$ equals 0, which reduces Eq. (5) to Eq. (6):

Q_i^x = Q_i^{\max} + Q_i^{\min}    (6)

Whilst $\varphi$ equals 1, the HMEWA's global search is executed, which is arithmetically articulated by Eq. (7):

Q_i^x = Q_i^{\max} + Q_i^{\min} - Q_i^y    (7)

The HMEWA balances the exploration and exploitation phases of reproduction by altering the value of $\varphi$. Some earthworms in the HMEWA possess the potential to produce more than a single child at a time; consequently, this stage is generalized with improved crossover operator versions. Three cases exist in $E_{rp}^2$: single-point crossover $S_c$, multi-point crossover $M_c$, and uniform crossover $U_c$. In single-point crossover, one position on the parent string is chosen as the crossover point, and all genes in the string after that point are swapped between the two parents; this operator exhibits positional bias. Two-point crossover is a special case of the N-point crossover strategy: two randomly generated points on the parent chromosomes (strings) are selected and the genetic information between them is exchanged.

Centred on the three cases above, $Q^z$ is evaluated with the help of the $n$ children, as articulated in Eq. (8); herein, the weight factor $w_i$ is given in Eq. (9), where the fitness of the $j$-th child is denoted $F_j$:

Q^z = \sum_{i=1}^{n} w_i \, Q_{n,i}    (8)

w_i = \frac{1}{n-1} \cdot \frac{\sum_{j=1,\, j \neq i}^{n} F_j}{\sum_{j=1}^{n} F_j}    (9)

Eq. (9) is further expanded as

w_i = \frac{1}{n-1} \cdot \frac{F_1 + \cdots + F_{i-1} + F_{i+1} + \cdots + F_{n-1} + F_n}{F_1 + F_2 + \cdots + F_{i-1} + F_i + F_{i+1} + \cdots + F_{n-1} + F_n}    (10)
From the two phases of reproduction, the position of earthworm $y$ for the next generation is produced by means of Eq. (11), where $\varphi$ signifies the proportional factor that balances the contributions of $Q^x$ and $Q^z$; the value of $\varphi$ at the subsequent iteration, $\varphi_{t+1}$, is expressed in Eq. (12):

Q^y = \varphi \, Q^x + (1 - \varphi) \cdot Q^z    (11)

\varphi_{t+1} = \beta \, \varphi_t    (12)

wherein $\beta$ implies a constant that is similar to a cooling factor.

Fig. 3. Structure of the LSIBA-ENN.


The hybrid mutation $H_m$ is integrated into the EWA to escape from local optima and ameliorate the search capability of the earthworms, as exhibited in Eq. (13). The hybrid mutation helps the whole earthworm population proceed to better positions and lessens the likelihood of being trapped in a local optimum:

H_m = G_m^{t+1} + C_m^{t+1}    (13)

G_m^{t+1} = Q_i^y + c_1 \cdot G(\mu, \sigma)    (14)

C_m^{t+1} = Q_i^y + c_2 \cdot C(\mu', \sigma')    (15)

wherein $G(\mu, \sigma)$ signifies a random number drawn from a Gaussian distribution, $C(\mu', \sigma')$ implies a random number drawn from a Cauchy distribution, $(\mu, \sigma)$ signify the mean and variance of the Gaussian distribution, $(\mu', \sigma')$ imply the mean and variance of the Cauchy distribution, and $c_1$ and $c_2$ signify the coefficients of the Gaussian mutation $G_m^{t+1}$ and the Cauchy mutation $C_m^{t+1}$. Subsequent to performing $H_m$, the new solutions are generated as exhibited in Eq. (16):

[Q_i^y]_{new} = Q_i^y + w_i \cdot H_m    (16)

where

w_i = \frac{1}{P_s} \sum_{y=1}^{P_s} Q_i^y    (17)

Wherein, $w_i$ implies the weight vector and $P_s$ signifies the population size. The features chosen from the extracted features are signified as $s_i\ (i = 1, 2, \ldots, n)$. The EWA's output, $(s_i) = \{s_i^1, s_i^2, \ldots, s_i^n\}$, is basically a new subset of informative features of the dataset, wherein $n$ signifies the number of unique selected features. Lastly, the result of this stage is a new subset of documents with more informative features.
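A simplified optimization sketch of the HMEWA loop in Python is given below. It keeps only the first reproduction of Eq. (5) and the hybrid Gaussian/Cauchy mutation of Eqs. (13)-(15), omits the multi-offspring crossover and the cooling of Eq. (12), and uses a placeholder fitness_fn (e.g., validation accuracy of a classifier trained on the selected feature subset); it is an assumption-laden illustration, not the authors' implementation.

```python
# Simplified hybrid-mutation earthworm optimization (HMEWA) for feature
# selection; fitness_fn maps a boolean feature mask to a score to maximize.
import numpy as np

def hmewa_select(fitness_fn, dim, pop_size=20, iters=50, c1=0.1, c2=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = 0.0, 1.0                          # position bounds Q_min, Q_max
    pop = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.array([fitness_fn(p > 0.5) for p in pop])
    for t in range(iters):
        phi = 0.9 * (1 - t / iters)            # similarity factor, shrunk each iteration
        for y in range(pop_size):
            # Reproduction 1 (Eq. 5): child mirrored inside the bounds.
            child = np.clip(hi + lo - phi * pop[y], lo, hi)
            # Hybrid Gaussian + Cauchy mutation (Eqs. 13-15), averaged here.
            gauss  = pop[y] + c1 * rng.normal(size=dim)
            cauchy = pop[y] + c2 * rng.standard_cauchy(size=dim)
            mutant = np.clip(0.5 * (gauss + cauchy), lo, hi)
            # Greedy survivor selection keeps the best of child / mutant / parent.
            for cand in (child, mutant):
                f = fitness_fn(cand > 0.5)
                if f > fit[y]:
                    pop[y], fit[y] = cand, f
    best = pop[int(np.argmax(fit))]
    return best > 0.5                          # boolean mask of selected features
```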


3.5. Sentiment classification

After performing FS, the selected features are fed into the proposed classifier LSIBA-ENN for SC. As one of the classic RNNs, the Elman neural network (ENN, $E_{nn}$) is trained in a supervised manner and optimized based on the back-propagation neural network ($BP_{nn}$). Elman networks are back-propagation networks with recurrent (context) connections and unit delays added on top. Provided there are sufficient hidden units, Elman networks with one or more hidden layers can learn any complex input-output relationship arbitrarily well, although they rely on simpler update equations at the cost of less accurate training.
The topology is generally split into '4' layers: an input layer (IL), a hidden layer (HL), a context layer, and an output layer (OL). The input layer receives the external inputs, while the context layer feeds the previous hidden-layer output back as an additional input. The hidden layer learns a stable internal representation by adjusting its weights, and the output layer produces the network's prediction, which is used during training to evaluate and improve generalization.
The context layer is utilized to remember the HL's output and acts as a one-step delay operator. Owing to this structure, $E_{nn}$ possesses an efficient dynamic information-processing capability and a rapid convergence rate, and has been applied successfully to prediction and classification tasks. Nevertheless, $E_{nn}$ is consistent with $BP_{nn}$ regarding the weight-updating technique, i.e., gradient descent, which falls easily into local minima. This work utilizes the LSIBA to optimize the initial weights of $E_{nn}$, aiming to evade such local minima; $E_{nn}$ achieves quick stability utilizing the LSIBA's implicit parallel search feature. This LSIBA-centred ENN is termed LSIBA-ENN, and Fig. 3 exhibits the proposed method's structure. Corresponding to the $E_{nn}$ structure diagram, the output of the $k$-th receiving (context) layer unit at iteration $t$ is articulated as

yc_k^{\,t} = g \times yc_k^{\,t-1} + y_j^{\,t-1}, \qquad k, j = 1, 2, \ldots, n    (18)

Here, $yc_k^{\,t}$ implies the output of the $k$-th context layer unit at iteration $t$, $y_j^{\,t-1}$ signifies the output of the $j$-th HL unit at iteration $t-1$, and $g$ implies a feedback gain factor. The neural network's non-linear state-space expression is

y^t = f\big(w_2 \cdot yc^t + w_1 \cdot s^{t-1}\big)    (19)

yc^t = g \times yc^{t-1} + y^{t-1}    (20)

z^t = q\big(w_3 \cdot y^t\big)    (21)

Here, $yc^t$ signifies the feedback (context) state vector, $s^t$ implies the input vector, $z^t$ symbolizes the output vector, $w_1$ signifies the connection weight vector betwixt the IL and HL, $w_2$ implies the connection weight vector betwixt the context layer and HL, and $w_3$ signifies the connection weight vector betwixt the HL and OL. $g(\cdot)$ is the OL's transfer function, which is frequently a linear function and is given in Eq. (23); the HL's transfer function $q(\cdot)$ is the sigmoid activation function:

q(y) = \frac{1}{1 + e^{-y}}    (22)

g(y) = y    (23)
The network error employed here is the cross-entropy loss function, calculated as

C_{EL} = -\sum_{k=1}^{m} t_k \log(z_k)    (24)

where $t_k$ implies the targeted value and $z_k$ signifies the predicted output value. The optimization of the ENN's weights utilizing the LSIBA is detailed in Section 3.5.1.
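A minimal forward-pass sketch of the Elman network of Eqs. (18)-(24) is given below. The softmax on the output layer is an addition here so that the three-class cross-entropy is well defined (the paper itself states a linear output transfer), and the training loop is omitted.

```python
# Minimal Elman-network forward pass with a context (recurrent) layer.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ElmanNet:
    def __init__(self, n_in, n_hidden, n_out=3, g=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input   -> hidden
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
        self.w3 = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden  -> output
        self.g = g                                                  # feedback gain factor
        self.context = np.zeros(n_hidden)
        self.prev_hidden = np.zeros(n_hidden)

    def step(self, x):
        # Eq. (20): context keeps a decayed copy of the previous hidden state.
        self.context = self.g * self.context + self.prev_hidden
        hidden = sigmoid(self.w2 @ self.context + self.w1 @ x)      # Eqs. (19), (22)
        self.prev_hidden = hidden
        scores = self.w3 @ hidden                                   # Eq. (21)
        probs = np.exp(scores - scores.max())
        return probs / probs.sum()                                  # positive / negative / neutral

def cross_entropy(target_onehot, probs):
    return -np.sum(target_onehot * np.log(probs + 1e-12))           # Eq. (24)
```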

3.5.1. Weight optimization of ENN using LSIBA


The Bat Algorithm (BA) is a heuristic technique with global search ability that imitates the bats' echolocation behavior. Echolocation is essentially sonar, which bats utilize to find prey and to locate the obstacles prevalent in a wholly dark environment. They produce a short or long sound pulse depending on the circumstance; whilst the pulse falls on prey or any obstacle, the bats examine the returning echo and transmute it into valuable information for measuring how distant they stand from the prey. BA faces one of the well-known complications, premature convergence, when optimizing over continuous data. To overcome drawbacks like this, this work implements a new inertia-weight ($\lambda_w$) grounded search function in the traditional BA to balance exploration and exploitation in the procedure of finding the finest solutions. The BA's improvisation utilizing $\lambda_w$ is termed the Local Search Improvised BA (LSIBA), whose arithmetical design is detailed as follows. At first, the LSIBA's population of $N$ bats is initialized randomly as

W = \{W^1, W^2, \ldots, W^N\}    (25)



Every bat looks for objects arbitrarily with velocity $V^i$ at position $P^i$. It produces pulses comprising a fixed frequency $F_m$, loudness $L_{high}$, and varying wavelength $W_L$. Additionally, it encompasses the potential to control the frequency (or wavelength) and the pulse emission rate automatically, centred on its target's closeness. It is presumed that the loudness is altered from a high value $L_{high}$ to a minimal value $L_{low}$. $V^i$ and $P^i$ are updated grounded on the above idealized rules as

F^i = F_m + (F_x - F_m) \cdot \psi(0, 1)    (26)

V^i_{t+1} = V^i_t + \big(P^i_t - p^{*}\big) \cdot F^i    (27)

P^i_{t+1} = P^i_t + V^i_{t+1}    (28)

Here, $F^i$ signifies the $i$-th bat's frequency; $F_x$ and $F_m$ signify the maximal and minimal frequencies; $\psi$ implies a random number selected betwixt the $(0, 1)$ range; and $p^{*}$ symbolizes the global best solution.
BA's local search component is implemented via a random walk. In the local search, all novel solutions are created corresponding to Eq. (29),

(p)_n = (p)_o + \beta \cdot \overline{L}(t)    (29)

Here, $\beta$ signifies a random number and $\overline{L}(t)$ implies the average loudness created by all the bats in the optimization procedure. In Eq. (29) each bat walks arbitrarily and the number of iterations is not pondered. $\lambda_w$ is proffered so that the technique possesses a sturdy global-search bias at the start, yet boosts the local-search ability as the number of iterations increments. The $\lambda_w$ procedure comprises '2' parts: the 1st part alters linearly and is utilized for balancing exploration and exploitation, whilst the 2nd part is employed for incrementing the randomness. The weighting procedure is equated as

\lambda_w = \lambda_w^{\max} - t \cdot \frac{\lambda_w^{\max} - \lambda_w^{\min}}{T} + \eta \cdot r_d    (30)

Here, $\lambda_w^{\max}$ signifies the maximal inertia weight, $\lambda_w^{\min}$ signifies the minimal inertia weight, and $\eta$ implies a constant which controls the randomness $r_d$. Next, the arithmetical design of a bat's random walk is altered as

(p)_n = (p)_o + \lambda_w \cdot \overline{L}(t)    (31)


The loudness $L_i$ and the pulse emission rate $E_i$ are equated as:

L_i(t+1) = \chi \cdot L_i(t), \qquad \chi \in (0, 1)    (32)

E_i^{t+1} = E_i^{0}\big(1 - \exp(-q \, t)\big), \qquad q \in (0, \infty)    (33)

Here, $\chi$ and $q$ imply constants; as $t \to \infty$, $L_i(t) \to 0$ and $E_i \to E_i^{0}$. $L_i$ and $E_i$ are altered whilst the bats fly nearer to the finest solutions.
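The weight-optimization loop can be sketched as follows, with loss_fn mapping a flat ENN weight vector to the network loss of Eq. (24); the parameter values are illustrative and the acceptance rules are a simplified reading of Eqs. (26)-(33), not the authors' exact procedure.

```python
# Sketch of the Local Search Improvised Bat Algorithm (LSIBA) used to seed
# the ENN weights; loss_fn is minimized over a flat weight vector.
import numpy as np

def lsiba(loss_fn, dim, n_bats=30, iters=100, f_min=0.0, f_max=2.0,
          w_max=0.9, w_min=0.4, eta=0.1, alpha=0.9, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(n_bats, dim))
    vel = np.zeros((n_bats, dim))
    loud = np.ones(n_bats)                                 # loudness L_i
    fit = np.array([loss_fn(p) for p in pos])
    best_i = int(np.argmin(fit))
    best, best_fit = pos[best_i].copy(), fit[best_i]
    for t in range(iters):
        # Inertia weight (Eq. 30): linear decay plus a small random term.
        lam = w_max - t * (w_max - w_min) / iters + eta * rng.random()
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()  # Eq. (26)
            vel[i] += (pos[i] - best) * freq               # Eq. (27)
            cand = pos[i] + vel[i]                         # Eq. (28)
            # Inertia-weighted local walk around the best solution (Eq. 31).
            if rng.random() > 0.5:
                cand = best + lam * loud.mean() * rng.normal(size=dim)
            f = loss_fn(cand)
            if f < fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, f
                loud[i] *= alpha                           # Eq. (32)
            if f < best_fit:
                best, best_fit = cand.copy(), f
    return best
```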

4. Result and discussion

Here, the performance of the proposed LSIBA-ENN method, implemented in Java, is analyzed. The proposed LSIBA-ENN is analogized with the prevalent techniques concerning the performance measures precision, f-measure, recall, and accuracy on the 2 datasets. The techniques' outcomes are analogized under 5 disparate weighting schemes: Word2Vec (W2V), Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF), TF-DFS, and the proposed LTF-MICF.
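For reference, the four metrics can be computed from predictions as in the short sketch below (scikit-learn is used here purely for illustration; the paper reports a Java implementation).

```python
# Macro-averaged precision, recall, and f-measure plus accuracy for a
# toy set of three-class predictions.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["pos", "neg", "neu", "pos", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neu", "neg"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f-measure={f1:.3f} accuracy={accuracy_score(y_true, y_pred):.3f}")
```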

4.1. Dataset description

2 disparate datasets are utilized for the performance analysis of the proposed and prevalent classifiers: i) Apps — the Apps for Android dataset, which encompasses 762,937 product reviews and metadata from Amazon, and ii) Movies — the Movies and TV dataset, which encompasses 1,677,539 product reviews and metadata from Amazon. These datasets include product records and consumer reviews collected online, where the reviews give customers' views and explain their interactions with the products and brands.
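A loading sketch is shown below, assuming the commonly distributed JSON-lines format of the Amazon review dumps with reviewText and overall (star rating) fields; the star-to-polarity mapping is an illustrative choice, and some older distributions require ast.literal_eval instead of strict JSON parsing.

```python
# Load an Amazon review dump (gzipped JSON lines) into texts and polarity labels.
import gzip, json

def load_reviews(path):
    texts, labels = [], []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            stars = record.get("overall", 3)
            label = "positive" if stars > 3 else "negative" if stars < 3 else "neutral"
            texts.append(record.get("reviewText", ""))
            labels.append(label)
    return texts, labels

# texts, labels = load_reviews("reviews_Apps_for_Android_5.json.gz")  # hypothetical filename
```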

4.2. Performance analysis of LSIBA-ENN for Apps dataset

The proposed LSIBA-ENN is analogized with the prevalent techniques, namely Support Vector Machine (SVM), Naïve Bayes (NB), and the Elman Neural Network (ENN), concerning the disparate performance metrics. SVMs are a family of supervised learning methods for classification, regression, and outlier detection that work well in high-dimensional environments. The Naïve Bayes method is a simple and quick way to forecast the class of a sample and is also good at multi-class forecasting, although in real life a set of predictors that are completely independent is practically impossible to obtain.

Table 1
Precision and recall of the classifiers.

(a) Precision
Techniques            W2V    TF     TF-IDF  TF-DFS  Proposed LTF-MICF
NB                    81.67  81.97  82.68   83.67   85.78
SVM                   82.36  83.01  84.67   85.01   86.93
ENN                   83.45  84.78  85.48   86.04   87.89
Proposed LSIBA-ENN    85.56  85.99  86.76   88.14   90.90

(b) Recall
Techniques            W2V    TF     TF-IDF  TF-DFS  Proposed LTF-MICF
NB                    82.07  82.97  83.08   84.01   85.81
SVM                   83.06  83.87  84.67   85.01   86.93
ENN                   83.55  84.03  85.48   86.04   87.79
Proposed LSIBA-ENN    84.96  85.09  86.76   87.14   90.01

Fig. 4. F-measure and accuracy of the techniques.

Concerning precision together with recall, Table 1 exhibits the techniques' performance analysis. Table 1(a) exhibits the performance of the proposed LSIBA-ENN and the traditional methods for the disparate weighting schemes. The existent NB attains precision of 81.67, 81.97, 82.68, and 83.67 for the W2V, TF, TF-IDF, and TF-DFS schemes, while with the proposed LTF-MICF scheme it achieves a precision of 85.78, which is superior when analogized with the other schemes.
Likewise, the highest precision values are achieved by the proposed LSIBA-ENN for every weighting scheme. The precision of 90.9 is
proffered by the proposed LSIBA-ENN whilst utilizing the LTF-MICF weighting scheme and it is evinced that the LSIBA-ENN utilizing
LTF-MICF weighting scheme proffers the finest precision outcomes.
Table 1(b) exhibits the recall outcomes of the LSIBA-ENN with all other prevalent classifiers. The prevailing ENN proffers recall of
87.79 when utilizing the proposed LTF-MICF scheme, whereas ENN only achieve 83.55, 84.03, 85.48, and 86.04 of recall whilst
utilizing W2V, TF, TF-IDF, and TF-DFS schemes respectively. It is evinced that when utilizing the proposed LTF-MICF scheme, the
classifiers proffer the highest recall value. When analogized with all other techniques, the highest recall value (90.01) is attained by the
proposed LSIBA-ENN with LTF-MICF. Regarding the precision along with recall, it could be observed that the LSIBA-ENN offer better
performance when analogized with the other three methods. Concerning the other metrics, Fig. 4 exhibits the techniques’
performance.
Fig. 4(a) compares the f-measure of the LSIBA-ENN with the prevailing algorithms. The f-measure proffered by the prevalent NB,
SVM along with ENN for the W2V scheme are 83.07, 84.06, and 85.05, whereas the f-score of 86.96 is attained by the LSIBA-ENN and is
high when analogized with others. The uppermost f-measure value is proffered by the proposed LSIBA-ENN with LTF-MICF when
analogizing with all others. Fig. 4 (b) analogizes the classification accuracy of the proposed LSIBA-ENN with the prevailing classifiers.
Superior outcomes are attained by the proposed LSIBA-ENN when analogized with other techniques regarding accuracy. The pre­
vailing NB, SVM and ENN proffer the accuracy of 87.81, 89.93, and 90.79, whilst an accuracy of 92.01 is attained by the LSIBA-ENN
when executing with the LTF-MICF. Thus, the proposed LSIBA-ENN ameliorates the classification accuracy of the SA process while
utilizing the LTF-MICF scheme and is evinced as of Fig. 4.


Fig. 5. Results of the proposed and existing models.

4.3. Performance analysis of LSIBA-ENN for the Movies and TV dataset

Here, the Movies and TV dataset is employed for analyzing the proposed LSIBA-ENN's performance against the prevalent techniques, as depicted in Fig. 5. Utilizing the disparate TW schemes for the Movies and TV reviews dataset, Fig. 5 (a) shows the precision outcomes of
the proposed LSIBA-ENN along with existent classifiers. The precision of 89.96 is attained by the LSIBA-ENN, whereas the precision of
85.58, 87.67, 88.48 is attained by the prevailing NB, SVM and ENN for the TF-IDF weighting scheme. Here, amongst all, LSIBA-ENN’s
precision value is the highest. Likewise, the highest precision values are attained by the LSIBA-ENN for all TW schemes. The highest
precision value of 91.29 is proffered by it utilizing LTF-MICF when analogized with all other schemes. Regarding recall metrics, Fig. 5
(b) exhibits the significant differences betwixt the proposed LSIBA-ENN and other prevalent methods. The prevalent SVM offers the
recall of 85.06, 86.87, 87.67, and 88.01 for the weighting schemes, namely, W2V, TF, TF-IDF and TF-DFS, whereas the SVM attains the
recall of 89.93 for LTF-MICF, which is higher when analogized with the recall of SVM while utilizing other weighting schemes. It is
evinced that while utilizing the proposed LTF-MICF scheme, the classifiers offer the highest performance.
Here too, the proposed classifier performs better than the others. It offered 91.02% recall whilst utilizing
the proposed LTF-MICF scheme. It is stated that the LSIBA-ENN with LTF-MICF offers the highest precision and recall when analogized
with the other techniques. Fig. 5 (c) analogizes the f-measure outcomes of the proposed with the prevailing works. The f-measure of
93.01 is attained by the proposed LSIBA-ENN for the LTF-MICF. The prevailing NB, SVM and ENN proffer the f-measure of 88.96,
89.09, 91.76, and 92.14 while utilizing LTF-MICF which is less contrasted with the proposed LTF-MICF. Likewise, every prevailing
technique proffers less f-score whilst utilizing the prevailing weighting schemes. Fig. 5 (d) exhibits that the classification accuracy of
the SA is ameliorated while utilizing the proposed LSIBA-ENN with LTF-MICF. Accuracies of 88.26, 89.19, 91.91, and 92.84 are
attained by the LSIBA-ENN for W2V, TF, TF-IDF, and TF-DFS, respectively, whereas it attains an accuracy of 93.91 while executed with the


LTF-MICF and is high contrasted with every other technique. Lastly, it is stated that the proposed LSIBA-ENN consecutively ameliorates
the SA process’ classification performance when executed with the proposed LTF-MICF scheme and is evinced as of the outcomes.

5. Conclusion

LSIBA-ENN is proffered in this work to find the online product reviews' polarities. Experiments are executed utilizing the '2' datasets, Apps and Movies and TV, both gathered from Amazon. The proposed LSIBA-ENN's performance is analogized with the existent SVM, NB, and ENN techniques regarding '4' performance metrics: precision, recall, f-measure,
and also accuracy. Centred on the TW methods, like W2V, TF, TFIDF, TF-DFS, and also LTF-MICF, the outcomes of the techniques’ ‘4’
metrics are examined. The outcomes exhibit that the LSIBA-ENN obtains the highest performance level aimed at both the datasets
whilst analogized to the existent sentiment classifiers. Additionally, whilst employing the proposed LTF-MICF, the classifiers attain
efficient outcomes regarding the classification metrics. Aimed at the App dataset, the LSIBA-ENN’s accuracy whilst employing LTF-
MICF is 92.01%; aimed at the Movie and also TV dataset, the proposed classifier obtains 93.91% accuracy. Both the accuracy values
obtained by LSIBA-ENN are greater than the existent classifiers. Consequently, as of the outcomes, it is validated that the LSIBA-ENN is
the efficient machine learning classifier to execute product data’s SA with precise outcomes. This work can be prolonged in the up­
coming future by implementing a deep learning design to execute SA. Additionally, a design can be created with the multimedia data’s
applicability aimed at SA.

CRediT authorship contribution statement

Huiliang Zhao: Data curation, Software, Validation. Zhenghong Liu: Conceptualization, Methodology, Writing - original draft.
Xuemei Yao: Methodology, Writing - review & editing, Conceptualization. Qin Yang: Software, Visualization, Investigation.

Acknowledgement

This work was financially supported by the Natural Science Foundation of the Department of Education of Guizhou Province (Grant
No. [2018]152, No. [2017]239), the Humanity and Social Science Foundation of the Guizhou Higher Education Institutions of China
(Grant No. 2018qn46), and Guiyang University Teaching Research Project (Grant No.: JT2019520206).

References

Basiri, M. E., Nemati, S., Abdar, M., Cambria, E., & Rajendra Acharya, U. (2021). ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment
analysis. Future Generation Computer Systems, 115, 279–294, 10.1016/j.future.2020.08.005.
Bavakhani, M., Yari, A., & Sharifi, A. (2019). A deep learning approach for extracting polarity from customers’ reviews. In Proceeding of IEEE 5th International
Conference on Web Research (ICWR) (pp. 276–280).
Bou Nassif, A., Elnagar, A., Shahin, I., & Henno, S. (2020). Deep learning for Arabic subjective sentiment analysis: Challenges and research opportunities. Applied Soft
Computing. , Article 106836. https://doi.org/10.1016/j.asoc.2020.106836.
Chakraborty, K., Bhattacharyya, S., Bag, R., & Hassanien, A. A. (2018). Sentiment analysis on a set of movie reviews using deep learning techniques. Social network
analytics: Computational research methods and techniques (p. 127). https://doi.org/10.1016/B978-0-12-815458-8.00007-4.
Chen, T., Xu, R., He, Y., & Wang, X. (2017). Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Systems with Applications,
72, 221–230. 10.1016/j.eswa.2016.10.065.
Chen, X., Xue, Y., Zhao, H., Lu, X., Hu, X., & Ma, Z. (2019). A novel feature extraction methodology for sentiment analysis of product reviews. Neural Computing and
Applications, 31(10), 6625–6642.
Dashtipour, K., Gogate, M., Li, J., Jiang, F., Kong, B., & Hussain, A. (2020). A hybrid Persian sentiment analysis framework: Integrating dependency grammar based rules and deep neural networks. Neurocomputing, 380, 1–10, 10.1016/j.neucom.2019.10.009.
Dharaiya, S., Soneji, B., Kakkad, D., & Tada, N. (2020). Generating positive and negative sentiment word clouds from e-commerce product reviews. In proceeding of
IEEE International Conference on Computational Performance Evaluation (ComPE) (pp. 459–463).
Dong, J. (2020). Financial investor sentiment analysis based on FPGA and convolutional neural network. Microprocessors and Microsystems, Article 103418, 10.1016/j.
micpro.2020.103418.
Du, Y., & Yang, Lu (2019). A sentiment measurement model for online reviews of complex products. In Proceeding of IEEE international conference on communications,
information system and computer engineering (CISCE) (pp. 199–202), 10.1109/CISCE.2019.00053.
Guo, C., Du, Z., & Kou, X. (2018). Products ranking through aspect-based sentiment analysis of online heterogeneous reviews. Journal of Systems Science and Systems
Engineering, 27(5), 542–558.
Huang, J., Xue, Y., Hu, X., Jin, H., Lu, X., & Liu, Z. (2019). Sentiment analysis of Chinese online reviews using ensemble learning framework. Cluster Computing, 22(2),
3043–3058.
Jadhav, HB., & Jadhav, AB. (2018). Systematic Approach towards sentiment analysis in online review’s. In International conference on computer networks, big data and
IoT (pp. 358–369). Cham: Springer. https://doi.org/10.1007/978-3-030-24643-3_43.
Kiran, R., Kumar, P., & Bhasker, B. (2020). Oslcfit (organic simultaneous LSTM and CNN Fit): A novel deep learning based solution for sentiment polarity classification
of reviews. Expert Systems with Applications, 157, Article 113488. https://doi.org/10.1016/j.eswa.2020.113488.
Li, W., Zhu, L., Shi, Y., Guo, K., & Cambria, E. (2020). User reviews: Sentiment analysis using lexicon integrated two-channel CNN–LSTM family models. Applied Soft
Computing, 94, Article 106435, 10.1016/j.asoc.2020.106435 1568-4946.
Liu, Y., Bi, J-Wu, & Fan, Z.-. P. (2017). Ranking products through online reviews: A method based on sentiment analysis technique and intuitionistic fuzzy set theory.
Information Fusion, 36, 149–161, 10.1016/j.inffus.2016.11.012.
Murugan, S., & Devi, U. (2019). Feature extraction using LR-PCA hybridization on twitter data and classification accuracy using machine learning algorithms. Cluster
Computing, 22, 13965–13974.
Murugan, S., & Devi, U. (2019). Classifying streaming of Twitter data based on sentiment analysis using hybridization. Neural Computing and Applications, 31,
1425–1433.
Nandal, N., Tanwar, R., & Pruthi, J. (2020). Machine learning based aspect level sentiment analysis for Amazon products. Spatial Information Research, 28(5), 601–607.


Pandey, P., & Soni, N. (2019). Sentiment analysis on customer feedback data: Amazon product reviews. In Proceeding of IEEE international conference on machine
learning, big data, cloud and parallel computing (COMITCon) (pp. 320–322), 10.1109/COMITCon.2019.8862258.
Qiu, J., Liu, C., Li, Y., & Lin, Z. (2018). Leveraging sentiment analysis at the aspects level to predict ratings of reviews. Information Sciences, 451, 295–309. https://doi.
org/10.1016/j.ins.2018.04.009.
Raj, V. (2019). Sentiment analysis on product reviews. In 2019 International conference on computing, communication, and intelligent systems (ICCCIS) (pp. 5–9).
Singla, Z., Randhawa, S., & Jain, S. (2017). Statistical and sentiment analysis of consumer product reviews. In Proceeding of IEEE 8th international conference on
computing, communication and networking technologies (ICCCNT) (pp. 1–6), 10.1109/ICCCNT.2017.8203960.
Tubishat, M., Idris, N., & Abushariah, M. A. M. (2018). Implicit aspect extraction in sentiment analysis: Review, taxonomy, oppportunities, and open challenges.
Information Processing & Management, 54(4), 545–563.
Wladislav, S., Johannes, Z., Christian, W., André, K., & Madjid, F. (2018). Sentilyzer: aspect-oriented sentiment analysis of product reviews. In Proceeding of IEEE
international conference on computational science and computational intelligence (CSCI) (pp. 270–273). https://doi.org/10.1109/CSCI46756.2018.00059.
Yang, Li, Li, Y., Wang, J., & Sherratt, R. S (2020). Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE
Access, 8, 23522–23530.
Zhang, W., Kong, S.-x., & Zhu, Y.-c. (2019). Sentiment classification and computing for online reviews by a hybrid SVM and LSA based approach. Cluster Computing, 22
(5), 12619–12632.

