Article
CWSXLNet: A Sentiment Analysis Model Based on Chinese
Word Segmentation Information Enhancement
Shiqian Guo 1 , Yansun Huang 2 , Baohua Huang 1, * , Linda Yang 1 and Cong Zhou 1
1 School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
2 Auditing Bureau of Xixiangtang, Nanning 530001, China
* Correspondence: [email protected]; Tel.: +86‑152‑9654‑4306
Abstract: This paper proposes a method for improving the XLNet model to address the shortcomings of its segmentation algorithm in processing Chinese, such as long sub‑word lengths, a long word list and incomplete word list coverage. To address these issues, we propose the CWSXLNet (Chinese Word Segmentation XLNet) model based on Chinese word segmentation information enhancement. The model first pre‑processes the Chinese pre‑training text with a Chinese word segmentation tool, and introduces a Chinese word segmentation attention mask mechanism by combining the PLM (Permuted Language Model) and the two‑stream self‑attention mechanism of XLNet. While performing natural language processing at single‑character granularity, it can reduce the degree of masking between masked and non‑masked characters that belong to the same word. For the Chinese sentiment analysis task, we propose the CWSXLNet‑BiGRU‑Attention model, which introduces a bi‑directional GRU as well as a self‑attention mechanism in the downstream task. Experiments show that CWSXLNet achieved 89.91% precision, 91.53% recall rate and 90.71% F1‑score, and CWSXLNet‑BiGRU‑Attention achieved 92.61% precision, 93.19% recall rate and 92.90% F1‑score on the ChnSentiCorp dataset, which indicates that CWSXLNet performs better than other models in Chinese sentiment analysis.
Keywords: sentiment analysis; Chinese word segmentation; XLNet; attention mask; machine learn‑
ing; natural language processing
1. Introduction
In the era of big data, the amount of information on the Internet has increased dramatically, including a large number of reviews posted by users on the Web. Most of these reviews express users' opinions and evaluations of products and services, which contain a lot of potential value [1]. The purpose of sentiment analysis techniques is to uncover the emotions and attitudes expressed in them, but due to the complex nature of comment data, which is diverse, colloquial and abbreviated, it is particularly important to use computational techniques to achieve automatic, in‑depth and accurate analysis and processing [2].
The development of sentiment analysis has had a significant impact on the field of natural language processing. Natural language processing techniques and text analysis methods are used to mine text and extract sentiment polarity from it [3]. Sentiment analysis has a wide range of applications, such as reputation management, market research, customer service, brand monitoring, and so on.
Based on the above, sentiment analysis is important in various fields such as business, politics and society to help people better understand and respond to different emotions and attitudes in society. Sentiment analysis has developed through three main stages: sentiment lexicons, machine learning and deep learning [4].
Sentiment lexicon‑based approaches: The earliest approaches to sentiment analysis were mainly based on sentiment lexicons, which typically contained a large number of words, each of which was tagged with a sentiment polarity such as positive, negative or
neutral. The main use of sentiment lexicons in sentiment analysis is to automatically iden‑
tify and classify the sentiment polarity of texts. The creation of a sentiment lexicon typically
involves two aspects: word selection and sentiment annotation. For word selection, a large
number of words are usually collected from different sources (e.g., network texts, human
written texts, annotated datasets, etc.). The annotators then annotate each word
with a positive, negative or neutral sentiment polarity according to predefined sentiment
classification criteria [5].
Several sentiment lexicons have been developed and are widely used in natural lan‑
guage processing. In the early years of research, Sebastiani Fabrizio et al. in [6–8] proposed
the SentiWordNet sentiment lexicon, a WordNet‑based sentiment lexicon that associates
each word with a set of sentiment strengths, including positive sentiment, negative senti‑
ment, and neutral sentiment. After a few years, Wu Xing et al. in [9], inspired by social
cognitive theories, combined basic sentiment value lexicon and social evidence lexicon to
improve the traditional polarity lexicon. In 2016, Wang Shih‑Ming et al. in [10] presented
the ANTUSD (Augmented NTU Sentiment Dictionary), which was constructed by collecting
sentiment statistics for words from several sentiment annotation exercises. A total of 26,021
Chinese words were collected in ANTUSD. In 2020, Yang Li et al. in [11] proposed a new
sentiment analysis model SLCABG based on a sentiment lexicon that combines a convolu‑
tional neural network (CNN) with a bi‑directional gated recurrent unit (BiGRU) based on
an attention mechanism. The sentiment lexicon was used to enhance the sentiment features
in the comments. CNN and BiGRU networks are used to extract the main sentiment and
contextual features from the comments and weight them using the attention mechanism.
The advantage of sentiment dictionary‑based methods is that they are simple and fast,
but they require manual construction and updating of sentiment dictionaries, and they do
not work well for some special texts (e.g., texts with complex semantics such as irony and
metaphor).
Machine learning based approaches: With the development of machine learning tech‑
niques, models such as RNN, LSTM, CRF and GRU are gradually being proposed by re‑
searchers and people are using machine learning algorithms for text sentiment analysis.
LSTM (Long Short‑Term Memory) [12] is a recurrent neural network (RNN) model
commonly used to process sequential data. LSTM can effectively solve the long‑term de‑
pendency problem in RNN by introducing a special memory unit. The BiLSTM model can
be thought of as processing the input sequence from left to right for one LSTM model and
from right to left for another LSTM model, and finally merging their outputs. The advan‑
tage of this is that not only the previous information but also the subsequent information
can be considered when processing the input at the current time step. Xiao Zheng et al.
in [13] used a bidirectional LSTM (BiLSTM) model for sentiment analysis. The experimen‑
tal results show that BiLSTM outperforms CRF and LSTM for Chinese sentiment analysis.
Similarly, Gan Chenquan et al. in [14] proposed a scalable multi‑channel dilated CNN‑BiLSTM model with an attention mechanism for Chinese text sentiment analysis, in which a CNN module is bridged with the BiLSTM model, and achieved better results on several public Chinese sentiment analysis datasets.
In addition to LSTM, some researchers select GRU (Gated Recurrent Unit) to handle
sentiment analysis tasks. As a variant of LSTM, GRU has fewer parameters than LSTM, re‑
quires less training data and has a faster training speed. Miao YaLin et al. in [15] proposed
the adoption of the application of CNN‑BiGRU model in Chinese short text sentiment anal‑
ysis, which introduced the BiGRU model based on CNN. Zhang Binlong et al. in [16] pro‑
posed Transformer‑Encoder‑GRU (T‑E‑GRU) to address the problem that the transformer, which relies on positional encoding, is naturally weaker than recurrent models at capturing the sequence features in text. Both have achieved good experimental results in
the field of Chinese sentiment analysis.
To leverage the affective dependencies of the sentence, in 2020, Liang Bin et al. in [17]
proposed a graph convolutional network based on SenticNet [18] according to the specific
aspect, called Sentic GCN, and explored a novel solution to construct the graph neural net‑
works via integrating the affective knowledge from SenticNet to enhance the dependency
graphs of sentences. Experimental results illustrate that SenticNet can beat state‑of‑the‑art
methods. In the same year, Jain Deepak Kumar et al. in [19] proposed BBSO‑FCM model
for sentiment analysis, used Binary Brain Storm Optimization (BBSO) algorithm for the
Feature Selection process and thereby achieved improved classification performance, and
Fuzzy Cognitive Maps (FCMs) were used as a classifier to classify the incidence of posi‑
tive or negative sentiments. Experimental values highlight the improved performance of
BBSO‑FCM model in terms of different measures. In 2021, Sitaula Chiranjibi et al. in [20]
proposed three different feature extraction methods and three different CNNs (Convo‑
lutional Neural Networks) to implement the features using a low resource dataset called
NepCOV19Tweets, which contains COVID‑19‑related tweets in Nepali language. By using
ensemble CNN, they ensemble the three CNNs models. Experimental results show that
proposed feature extraction methods possess the discriminating characteristics for the sen‑
timent classification, and the proposed CNN models impart robust and stable performance
on the proposed features.
However, machine learning‑based methods require large amounts of annotated data
to train the classifier, and require manual selection of features and algorithms. There is
a degree of subjectivity in feature extraction and algorithm selection, with good or bad
feature extraction directly affecting classification results [21] and not easily generalized to
a new corpus.
Deep learning‑based approaches: In recent years, the rise of deep learning techniques
has brought new breakthroughs in text sentiment analysis. In particular, the use of pre‑
trained language models (e.g., BERT, RoBERTa, XLNet, etc.) for fine‑tuning to solve sen‑
timent analysis problems has yielded very good results. This approach does not require
manual feature construction and can handle complex semantic relationships, and therefore
has very promising applications in the field of text sentiment analysis.
BERT (Bidirectional Encoder Representations from Transformers) is a pre‑trained lan‑
guage model based on the transformer structure proposed by Google in 2018, and is cur‑
rently one of the most representative and influential models in the field of natural language
processing [22]. Due to the excellent performance of the BERT model, various variants
derived from it are also widely used in the field of natural language processing, such as
RoBERTa [23], ALBERT [24], ELECTRA [25], etc. The emergence of the BERT model has
greatly promoted the development of the field of natural language processing and has
achieved leading scores in several benchmark tests, becoming an important milestone in
the field of natural language processing. Li Mingzheng et al. in [26] proposed a novel senti‑
ment analysis model for Chinese stock reviews based on BERT. This model relies on a pre‑
trained model to improve the classification accuracy. The model uses a BERT pre‑training
language model to perform sentence‑level representation of stock reviews, and then feeds
the obtained feature vector into the classifier layer for classification. Their experiments demonstrate that the method obtains higher precision, recall and F1 than TextCNN, TextRNN, Att‑BLSTM and TextCRNN, indicating its effectiveness in Chinese stock review sentiment analysis; the model also shows strong generalization ability and can perform sentiment analysis in many fields.
In 2019, Google proposed XLNet [27], which uses the Permuted Language Model
(PLM) with a two‑stream self‑attention mechanism to outperform the BERT model in 20 nat‑
ural language processing tasks, achieving the best results in 18 tasks. Currently, XLNet
is widely used in natural language processing, covering tasks such as classification and
named entity recognition [28,29].
As part of text classification, sentiment analysis was also an important application of
XLNet. Gong Xin‑Rong et al. in [30] proposed a Broad Autoregressive Language Model
(BroXLNet) to automatically process the sentiment analysis task. BroXLNet integrates the
advantage of generalized autoregressive language modeling and broad learning system,
which has the ability of extracting deep contextual features and randomly searching high‑
level contextual representation in broad spaces. BroXLNet achieved the best result of 94.0%
in sentiment analysis task of binary Stanford Sentiment Treebank.
XLNet was trained on different languages. Alduailej Alhanouf et al. in [31] proposed
AraXLNet model, which pre‑trained XLNet model in Arabic language for sentiment analy‑
sis. For Chinese language, Cui Yiming et al. in [32] published an unofficial XLNet Chinese
pre‑training model, which was trained from the Chinese Wikipedia corpus, but its word
segmentation model still suffers from the defects of excessively long sub‑words, infrequently used sub‑words, and incomplete coverage of the word list.
To address the above problems, this paper proposes the CWSXLNet (Chinese Word
Segmentation XLNet) model, which improves the XLNet model. First, the original corpus
is segmented in the text pre‑processing stage and the corresponding segmentation codes
are generated; in the pre‑training stage, the corresponding segmentation mask codes are
generated according to the random sequence of PLM. The segmentation mask codes are then combined with the two‑stream self‑attention mechanism and the attention mask, thereby enhancing Chinese word‑level information while using the single Chinese character as the processing granularity. This is designed to solve the granularity mismatch between characters and words when the XLNet model processes Chinese.
For the Chinese sentiment analysis task, this paper combines the above research
and uses the BiGRU model in the downstream task, which can further extract the feature
information of the context. In addition, the CWSXLNet‑BiGRU‑Attention model is pro‑
posed by introducing the self‑attention mechanism in the model to increase its attention to
sentiment‑weighted words. It can further capture the sentiment keywords in the text and
achieve better results in Chinese sentiment analysis tasks.
2. Related Works
2.1. XLNet
XLNet is an autoregressive language model proposed by Google in 2019. It inherits from Transformer‑XL, using the PLM, the two‑stream self‑attention mechanism and the segment‑level recurrence with relative position encoding of Transformer‑XL, so that XLNet has a better ability to read long texts.
2.1.1. PLM
Traditional autoregressive language models are trained unidirectionally: they can only predict based on the antecedent information of the predicted words, or from backwards to forwards through the posterior text of the predicted words. PLM instead generates several permutations of the text in random order and treats the words at the end of each permuted sequence as prediction targets. Since the end words of a permuted sequence can use the semantic information of all the preceding words, the model predicts them and thereby completes its training. This approach solves the
problem of the possible relationship between multiple masks being ignored in the BERT
model, and of small differences occurring in the pre‑training and fine‑tuning phases due
to the use of masks.
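As an illustration only (a minimal NumPy sketch, not the actual XLNet pre‑training code), the visibility pattern implied by one sampled factorization order can be computed as follows: a position may attend to another position only if the latter comes earlier in the sampled permutation.

```python
import numpy as np

def plm_visibility(seq_len: int, rng: np.random.Generator) -> np.ndarray:
    """Return a 0/1 matrix M where M[i, j] = 1 means position i may NOT see
    position j under one randomly sampled factorization order (PLM)."""
    order = rng.permutation(seq_len)        # a random prediction order
    rank = np.empty(seq_len, dtype=int)     # rank[pos] = step at which pos is predicted
    rank[order] = np.arange(seq_len)
    # position i may see position j only if j is predicted strictly earlier than i
    cannot_see = (rank[None, :] >= rank[:, None]).astype(int)
    return cannot_see

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(plm_visibility(5, rng))
```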
2.2. Pre‑Training Process of XLNet Model
The pre‑training process of XLNet mainly consists of two stages: the text pre‑processing stage and the pre‑training stage; the detailed flow chart is shown in Figure 1.
Figure 1. XLNet training process. (a) The text pre‑processing process; (b) The pre‑training process.
2.2.1. Text Pre‑Processing
The XLNet model uses SentencePiece [33], a tokenizer provided by Google. The SentencePiece model is trained on the original text. It uses the BPE algorithm [34] to segment sub‑words and gather statistics from the original text, obtaining the word separation strategy by continuously merging the more frequent sub‑words.
The trained SentencePiece is used to split the original text into vectors and complete the word embedding. After the transformation, merging, splitting and disrupting steps, each piece of training data is transformed into a Feature containing input, tgt, is_masked, seg_id, label and so on. The structure of input is shown in Figure 2 (the maximum length of the sentence is assumed to be 128).
Figure 2. Structure of “input”.
The first half of reuse_len is the reuse of the last 64 tokens of the previous input data, and the second half has the structure “A + <SEP> + B + <SEP> + <CLS>”, where A and B are two text vectors, each followed by a <SEP> control character that acts as a statement separator, and a <CLS> control character at the end marks the end of the entire input data.
Vector A and vector B have a 50% probability of being consecutive contexts and a further 50% probability of B being randomly chosen. The lengths of A and B are not fixed, but the length of “A + <SEP> + B + <SEP> + <CLS>” is equal to the maximum sentence length of the model.
Together with the input, tgt, is_masked, seg_id and label are generated, which together form a Feature, and Features together form TFRecord files.
Here tgt is the target vector of input, with a length of 128 tokens; the first 126 tokens are the next token corresponding to input, which is equivalent to shifting the whole of input to the left by one token length, and the last two tokens are <CLS>.
is_masked indicates which of the 128 tokens in input are masked, assigning 0 to the unmasked and 1 to the masked.
seg_id is used to distinguish vector A from vector B, where reuse + A + <SEP> is assigned 0, B + <SEP> is assigned 1 and <CLS> is assigned 2.
label is used to distinguish whether vector A is continuous with vector B.
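For illustration, the following minimal sketch (our own simplification with toy token ids and a maximum length of 16 instead of 128) shows how one such record and its seg_id and tgt vectors could be assembled; the helper name build_input and all ids are hypothetical.

```python
# Illustrative only: toy ids, not the real SentencePiece vocabulary.
SEP, CLS = 3, 4
MAX_LEN, REUSE_LEN = 16, 4            # the paper uses 128 and 64

def build_input(reuse, a, b):
    """Assemble one training record: reuse + A + <SEP> + B + <SEP> + <CLS>."""
    assert len(reuse) == REUSE_LEN
    body = a + [SEP] + b + [SEP] + [CLS]
    assert REUSE_LEN + len(body) == MAX_LEN, "A and B must fill the remaining length"
    inp = reuse + body
    # seg_id: 0 for reuse + A + <SEP>, 1 for B + <SEP>, 2 for <CLS>
    seg_id = [0] * (REUSE_LEN + len(a) + 1) + [1] * (len(b) + 1) + [2]
    # tgt: the input shifted left by one token, padded with <CLS> at the end
    tgt = inp[1:] + [CLS]
    return inp, seg_id, tgt

if __name__ == "__main__":
    reuse = [11, 12, 13, 14]
    a, b = [21, 22, 23, 24, 25], [31, 32, 33, 34]
    inp, seg_id, tgt = build_input(reuse, a, b)
    print(inp, seg_id, tgt, sep="\n")
```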
2.2.2. Pre‑Training
In the pre‑training phase, the XLNet model reads the TFRecords data generated in the pre‑processing phase, performs random reordering on each piece of data, and finally generates the perm_mask matrix corresponding to each piece of data, as shown in Figure 3, where perm_mask[i][j] = 1 (hollow circle) indicates that the ith token cannot detect the jth token after reordering; conversely, perm_mask[i][j] = 0 (solid circle) indicates that the ith token can detect the jth token.
Figure 3. An example of perm_mask matrix.
In the subsequent pre‑training process, the perm_mask matrix is transformed into the attn_mask matrix after splitting, splicing and deformation operations, and is used in the calculation of attn_score with the formula as in Equation (1):

attn_score = attn_score − 10^30 × attn_mask. (1)

If the element in attn_mask[i][j] is 0, it means that i can notice j, and attn_score[i][j] remains unchanged.
If the element in attn_mask[i][j] is 1, it means that i cannot notice j, and attn_score[i][j] becomes a large negative number, so that the probability produced by the subsequent softmax operation is close to 0, achieving the effect of the mask.
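The effect of Equation (1) can be sketched as follows (a schematic NumPy example rather than the original implementation): masked positions receive a huge negative bias, so their weight after the softmax collapses to almost zero.

```python
import numpy as np

def masked_softmax(attn_score: np.ndarray, attn_mask: np.ndarray) -> np.ndarray:
    """Apply Equation (1) and a row-wise softmax.

    attn_mask[i, j] = 1 -> token i must not attend to token j,
    attn_mask[i, j] = 0 -> the attention score is left unchanged.
    """
    score = attn_score - 1e30 * attn_mask               # Equation (1)
    score = score - score.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(score)
    return weights / weights.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(4, 4))
    mask = np.triu(np.ones((4, 4)), k=1)   # toy mask: no attention to later positions
    print(masked_softmax(scores, mask).round(3))
```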
2.3. BiGRU
The Gated Recurrent Unit (GRU) is a type of recurrent neural network. Compared to LSTM, GRU streamlines the number of gates from three (forget gate, input gate and output gate) to two (reset gate and update gate), and merges the cell state and the hidden state. In addition, GRU has fewer parameters than LSTM, requires less training data and has a faster training speed.
The GRU architecture is shown in Figure 4.
Figure 4. Structure of GRU unit.
The two gates of the GRU are calculated as follows.
Reset gate r_t:

r_t = σ(W_r x_t + U_r h_{t−1} + b_r). (2)

Update gate z_t:

z_t = σ(W_z x_t + U_z h_{t−1} + b_z). (3)

The candidate hidden state is calculated as

h̃_t = tanh(W x_t + U(r_t ⊙ h_{t−1}) + b). (4)

Finally, the hidden state at time t is calculated as

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t. (5)
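As a concrete reference for Equations (2)–(5), a minimal NumPy transcription of a single GRU step (randomly initialized toy weights, not a trained model) is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step implementing Equations (2)-(5)."""
    Wr, Ur, br = params["Wr"], params["Ur"], params["br"]
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]
    W,  U,  b  = params["W"],  params["U"],  params["b"]
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)           # Equation (2): reset gate
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)           # Equation (3): update gate
    h_cand = np.tanh(W @ x_t + U @ (r_t * h_prev) + b)   # Equation (4): candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand           # Equation (5): new hidden state

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_h = 4, 3
    params = {k: rng.normal(scale=0.1, size=(d_h, d_in) if k.startswith("W") else (d_h, d_h))
              for k in ("Wr", "Wz", "W", "Ur", "Uz", "U")}
    params.update({k: np.zeros(d_h) for k in ("br", "bz", "b")})
    h = np.zeros(d_h)
    for x in rng.normal(size=(5, d_in)):   # a toy sequence of 5 steps
        h = gru_step(x, h, params)
    print(h)
```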
In the above equations, σ is the sigmoid activation function and ⊙ is the Hadamard product operation of matrices. The Hadamard product of matrix A and matrix B is denoted as A ⊙ B. For matrix A = [a_ij] and matrix B = [b_ij], the elements of matrix A ⊙ B are defined as the product of the corresponding elements of the two matrices, i.e., (A ⊙ B)_ij = a_ij b_ij.
To capture the contextual information of the text, BiGRU, a bidirectional GRU network, can be used, which consists of a forward GRU and a reverse GRU, as shown in Figure 5. The forward GRU captures the preceding context of the text and the reverse GRU captures the following context, so that both directions of contextual information are covered.
Figure 5. Network structure of BiGRU.

2.4. Self‑Attention
Self‑attention is able to notice the interconnectedness of words within the input utterance, giving the model a stronger ability to grasp emotionally weighted words. Its matrix form is computed as follows, first computing the three matrices Q, K and V:

Q = W_q · I, (6)

K = W_k · I, (7)

V = W_v · I. (8)

W_q, W_k and W_v are trainable parameter matrices and I is the matrix composed of the input word vectors.
After calculating Q, K and V, we obtain A:

A = K^T · Q. (9)

A is then normalized with softmax to obtain A′:

A′ = softmax(A). (10)

The output is

O = V · A′. (11)

Combining each of the above steps gives the formula for self‑attention:

Attention(Q, K, V) = softmax(QK^T / √d_k) V. (12)
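Equations (6)–(12) can be summarized in a compact NumPy sketch (our own illustration; word vectors are stored as columns of I, and the √d_k scaling of Equation (12) is folded into the softmax step):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(I, Wq, Wk, Wv):
    """Self-attention in the matrix form of Equations (6)-(12).

    I has one word vector per column; the output has one context-aware
    vector per column.
    """
    Q, K, V = Wq @ I, Wk @ I, Wv @ I                      # Equations (6)-(8)
    A = K.T @ Q                                           # Equation (9): raw scores
    A_prime = softmax(A / np.sqrt(K.shape[0]), axis=0)    # Equations (10)/(12)
    return V @ A_prime                                    # Equation (11)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_k, n_words = 6, 4, 5
    I = rng.normal(size=(d_model, n_words))               # word vectors as columns
    Wq, Wk, Wv = (rng.normal(size=(d_k, d_model)) for _ in range(3))
    print(self_attention(I, Wq, Wk, Wv).shape)            # -> (4, 5)
```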
3. CWSXLNet
3.1. Deficiencies of XLNet Model in Handling Chinese
When the XLNet model is used to process Chinese, because it uses SentencePiece
for text segmentation, and SentencePiece tends to split longer sub‑words when using the
BPE method for word segmentation, these longer words, when separated from the origi‑
nal corpus, tend to be used less frequently in other text, causing the waste of the limited
positional space of the word table. Furthermore, because SentencePiece tends to segment
longer words rather than individual Chinese characters, the required word list is unusu‑
ally large. With a limited word list length of 32,000, there are still many commonly used
Chinese words not included in the word list, which can only be represented by <UNK>
during word embedding, affecting the model’s understanding of semantics and sentiment
analysis.
Figure 6 shows the sub‑words of the word list in [32], which proposed the Chinese pre‑
trained XLNet model. The numbers in Figure 6 indicate the weights. It can be seen that
SentencePiece prefers to split out longer sub‑words, which are less frequently used in the
detached pre‑trained corpus and waste space in the word list.
Figure 6. Word lists and weights in Chinese XLNet.
The basic unit of the Chinese language is the Chinese character. Several Chinese characters make up words, and Chinese words form sentences. Unlike English, Chinese words are not separated by spaces, and there are approximately 55,000 commonly used Chinese words. For comparison, the length of XLNet's word list is 32,000. To avoid the problems caused by long word lists, we reduce the granularity to the Chinese character instead of the Chinese word. However, this approach creates another problem: the relationship between characters from the same word is lost. To avoid this, we devised a way to improve the connection between characters from the same word.
First, we use a word segmentation tool to separate Chinese words from sentences, and use spaces to separate Chinese words as in English. Secondly, we reduce the granularity to the Chinese character. Finally, we need to improve the relationship between characters that form the same word, and make sure that the improvement fits naturally into the XLNet model. This is a brief introduction to our CWSXLNet model; we explain the model in more detail below.

3.2. CWSXLNet Model
In this paper, we propose the CWSXLNet model, which aims to solve the natural non‑adaptation problem of the SentencePiece model used in the XLNet model for Chinese. Improvements are made in the data pre‑processing phase and the pre‑training phase of XLNet. In the data pre‑processing stage, the model uses LTP [35] as a word segmentation tool to separate the original corpus with spaces between words. Figure 7 shows an example of the text after word separation using the LTP word segmentation tool.

Figure 7. Original text after using LTP. (a) The original text; (b) After using LTP.
In order to reduce the training granularity while retaining the word separation information in the original text, CWSXLNet trains the SentencePiece model at the granularity of a single Chinese character. First, a total of 14,516 Chinese characters in the dictionary book are crawled using a crawler tool. These are fed into the SentencePiece model as training text for character‑based partitioning training. Finally, the SentencePiece model is trained to segment Chinese texts as single characters.
The original corpus is fed into the SentencePiece model, which is trained as described above. In the subsequent training phase, the proposed model also uses the character “▁” as a word separation marker and refers to the character “▁” as a <TOK> marker, which is used to detect the boundary between words.
When the original text is processed to generate Features, the <TOK> token is used as a word delimiter to determine the position of each word. The tok_id vector of each piece of data is generated to determine which word the current character belongs to, where the tok_id of the <TOK> token is 0 and the tok_id of the control characters <SEP> and <CLS> is −1; the is_TOK vector of the data is generated at the same time, where the position of a <TOK> token is true and the rest are false, which is used to determine whether the current token is a <TOK> token or not. The above two pieces of data are stored with input, tgt, label, seg_id and is_masked in the TFRecord files.
Figure 8 shows the process of generating the corresponding tok_id and is_TOK of a text. For illustration purposes, the lengths of reuse_len and the vector B are set to 0. The original text is “今天天气真不错 (It's really nice weather today)”. The LTP separated the text into “今天 (today)”, “天气 (weather)”, “真 (really)”, “不错 (nice)”.

Figure 8. Generation of vector tok_id and is_TOK.
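The generation of tok_id and is_TOK can be sketched as follows (a minimal reconstruction of the Figure 8 example; string placeholders are used instead of vocabulary ids, and the helper name is hypothetical):

```python
TOK, SEP, CLS = "<TOK>", "<SEP>", "<CLS>"

def make_tok_vectors(words):
    """Build the character-level token list with <TOK> word separators,
    plus tok_id (word index per character, 0 for <TOK>, -1 for controls)
    and is_TOK (True at <TOK> positions)."""
    tokens, tok_id = [], []
    for word_index, word in enumerate(words, start=1):
        if word_index > 1:            # a <TOK> marker between words
            tokens.append(TOK)
            tok_id.append(0)
        for char in word:
            tokens.append(char)
            tok_id.append(word_index)
    tokens += [SEP, CLS]
    tok_id += [-1, -1]
    is_tok = [t == TOK for t in tokens]
    return tokens, tok_id, is_tok

if __name__ == "__main__":
    # "今天 天气 真 不错" is the LTP segmentation of the sentence in Figure 8
    tokens, tok_id, is_tok = make_tok_vectors(["今天", "天气", "真", "不错"])
    print(tokens)
    print(tok_id)   # [1, 1, 0, 2, 2, 0, 3, 0, 4, 4, -1, -1]
    print(is_tok)
```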
Inputs is the result of vectorizing the text by adding the <SEP> and <CLS> flags to the end of the original text, where 9 is <TOK>, 3 is <SEP> and 4 is <CLS>.
In the pre‑training phase, the tok_mask matrix is computed using the same random sequence that is used to generate the perm_mask. Following the example in Figure 8, assuming the random sequence index = [4, 6, 7, 10, 3, 11, 0, 1, 8, 9, 2, 5], then tok_index = [2, 3, 0, −1, 2, −1, 1, 1, 4, 4, 0, 0], which is the reordered version of tok_id obtained by index.
Matrix A is obtained by transposing tok_index and broadcasting along the columns, and matrix B is obtained by broadcasting the rows of tok_id. Elements in the tok_mask matrix are set to 1 if the corresponding elements of matrices A and B are equal and 0 if they are not. The calculation flowchart is shown in Figure 9.

Figure 9. Generation of the tok_mask matrix.

The structure of the tok_mask matrix is shown in Figure 10. tok_mask[i][j] = 1 (diagonal circle) means that the ith token belongs to the same Chinese word as the jth token after disordering, and conversely tok_mask[i][j] = 0 (hollow circle) means that the ith token does not belong to the same Chinese word as the jth token.

Figure 10. The tok_mask matrix.
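Under the same assumptions as the previous sketch, the broadcast‑and‑compare construction of Figure 9 can be written as:

```python
import numpy as np

def build_tok_mask(tok_id, index):
    """tok_mask[i, j] = 1 when the i-th token of the reordered sequence and the
    j-th token of the original sequence carry the same word id (Figure 9)."""
    tok_id = np.asarray(tok_id)
    tok_index = tok_id[np.asarray(index)]     # reordered version of tok_id
    # column vector (from tok_index) compared against row vector (from tok_id)
    return (tok_index[:, None] == tok_id[None, :]).astype(int)

if __name__ == "__main__":
    tok_id = [1, 1, 0, 2, 2, 0, 3, 0, 4, 4, -1, -1]   # from the Figure 8 example
    index = [4, 6, 7, 10, 3, 11, 0, 1, 8, 9, 2, 5]    # the random sequence in the text
    print(build_tok_mask(tok_id, index))
```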
This is reflected in Equation (1), which calculates the attn_score.
At this point, the element values in the attn_mask matrix can be one of the four cases 1, 0, 1 − α, α − 1, as shown in Figure 12.
Figure 12. The attn_mask matrix, where 1 (solid circle) and 0 (hollow circle) represent masked and non‑masked cases.
As we see in Figure 8, the LTP shows that the Chinese word “天气 (weather)” is separated by a space. It indicates that the Chinese characters “天 (sky)” and “气 (air)” belong to the Chinese word “天气 (weather)”, and we hope to enhance the relationship between “天 (sky)” and “气 (air)”, and, conversely, between “气 (air)” and “天 (sky)”. From the perm_mask matrix in the
middle of Figure 10, we know that “ 天 (sky)” is masked; to enhance the relationship, we
use the tok_mask matrix to reduce the degree of mask from “ 气 (air)” to “ 天 (sky)” and
increase attention from “ 天 (sky)” to “ 气 (air)”.
If the element in attn_mask[i][j] is 1‑α (solid diagonal circle), this means that originally
i could not notice j, but since i and j belong to the same Chinese word, the attentional
masking of j by i is “reduced”: the penalty subtracted from attn_score[i][j] is smaller than before, which increases the probability of the subsequent softmax prediction. For
example, the element in row 1 and column 5 in Figure 12 means that the first element “ 气
(air)” would not be able to notice the unordered fifth element “ 天 (sky)” after disordering,
because “ 天 (sky)” is masked for “ 气 (air)”. However, since “ 天 (sky)” and “ 气 (air)”
belong to the same Chinese word “ 天气 (weather)”, the extent of mask from “ 气 (air)”
to “ 天 (sky)” is “reduced”. Since “ 气 (air)” is not masked for “ 天 (sky)”, the attention is
increased from “ 气 (air)” to “ 天 (sky)”.
As we see in Figure 8, the LTP shows that Chinese word “ 不错 (nice)” is separated
by space. It indicates that Chinese character “ 不 (not)” and “ 错 (bad)” belong to Chinese
word “ 不错 (nice)”, and we hope to enhance the relationship between “ 不 (not)” and “
错 (bad)”, and, conversely, “ 错 (bad)” and “ 不 (not)”. From the perm_mask matrix in
the middle of Figure 10, we know that both “ 不 (not)” and “ 错 (bad)” are unmasked; to
enhance the relationship, we use the tok_mask matrix to increase attention from “ 不 (not)”
to “ 错 (bad)” and “ 错 (bad)” to “ 不 (not)”.
If attn_mask[i][j] is α−1 (hollow diagonal circle), this means that originally i can notice
j, but because i and j belong to the same phrase, i’s attention to j is “increased” compared
to the original, attn_score[i][j] is increased, which also increases the probability of softmax
prediction. For example, the elements in row 9 and column 10 in Figure 12 indicate that the
ninth element “ 不 (not)” after disordering is able to detect the tenth element “ 错 (bad)”,
which is not disordered. Since “ 不 (not)” and “ 错 (bad)” belong to the same Chinese word,
we increase the attention from “ 不 (not)” to “ 错 (bad)” and “ 错 (bad)” to “ 不 (not)”.
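The four cases can be reproduced with the following sketch (our own illustrative formulation rather than the exact implementation; alpha is treated as a hyperparameter between 0 and 1):

```python
import numpy as np

def combine_masks(perm_mask, tok_mask, alpha=0.5):
    """Produce the four attn_mask cases 1, 0, 1 - alpha and alpha - 1.

    Where tok_mask = 0 the perm_mask value (1 or 0) is kept; where tok_mask = 1
    a masked pair is weakened to 1 - alpha and an unmasked pair is strengthened
    to alpha - 1, as described in the text.
    """
    same_word = tok_mask == 1
    attn_mask = perm_mask.astype(float)
    attn_mask[same_word & (perm_mask == 1)] = 1.0 - alpha   # reduce the masking
    attn_mask[same_word & (perm_mask == 0)] = alpha - 1.0   # increase the attention
    return attn_mask

if __name__ == "__main__":
    perm_mask = np.array([[1, 0], [1, 0]])
    tok_mask  = np.array([[1, 0], [0, 1]])
    print(combine_masks(perm_mask, tok_mask, alpha=0.3))
    # [[ 0.7  0. ]
    #  [ 1.  -0.7]]
```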
In summary, we have reduced the granularity of Chinese natural language processing
from Chinese sub‑word to single Chinese character. To solve the problem of information
loss by splitting Chinese words into characters, we proposed a method to enhance the rela‑
tionship between characters from the same word. The embodiment of this approach in the
XLNet model is the tok_mask matrix. To further enhance the sentiment analysis capability,
we combined BiGRU and self‑attention with CWSXLNet to form CWSXLNet‑BiGRU‑Attention. In Section 4, the experimental results show that this clearly improves Chinese sentiment analysis.
4. Experiment
To demonstrate the effectiveness of the CWSXLNet model and the CWSXLNet‑BiGRU‑
Attention structure proposed in this paper, experiments were conducted on two public
Chinese sentiment analysis datasets.
Name                 Parameters
Operating System     Ubuntu 18.04
Memory               32 G
GPU                  Tesla T4
GPU Memory           16 G
4.2. Parameters
The pre‑training corpus was selected from the Chinese Wikipedia corpus, with a total
size of about 2.5 G, and the tokenizer was selected from the LTP/base model. The details
of the parameters in text pre‑processing are shown in Table 2.
4.4. Evaluation Indicators
The evaluation index consists of the precision rate P, the recall rate R and the F1‑score, which are calculated as follows:

P = TP / (TP + FP), (15)

R = TP / (TP + FN), (16)

F1 = 2RP / (R + P), (17)

where TP is the number of positive samples considered positive by the model, FP is the number of negative samples considered positive by the model, and FN is the number of positive samples considered negative by the model.
5. Results and Discussion
The experimental results on the ChnSentiCorp dataset are shown in the following Table 5 and Figure 13, and the LTP process was performed on the training corpus to maintain consistency between pre‑training and fine‑tuning.
Table 5. ChnSentiCorp dataset experimental results.

Model                          P (%)    R (%)    F1‑Score (%)
LSTM                           84.25    84.01    84.12
BiLSTM                         86.94    86.50    86.72
BERT                           89.92    89.90    89.91
XLNet                          88.83    87.92    88.37
CWSXLNet                       89.91    91.53    90.71
CWSXLNet‑BiGRU‑Attention       92.61    93.19    92.90
Figure 13. ChnSentiCorp dataset experimental results.

The experimental results of the Weibo_senti_100k dataset are shown in Table 6 and Figure 14 below, and the LTP process was performed on the training corpus to maintain consistency between pre‑training and fine‑tuning.
From the experimental results, both the CWSXLNet model and the CWSXLNet‑BiGRU‑Attention model achieve better results in dealing with Chinese sentiment analysis tasks.
On the ChnSentiCorp dataset, CWSXLNet achieved 89.91% precision, 91.53% recall rate and 90.71% F1‑score, and CWSXLNet‑BiGRU‑Attention achieved 92.61% precision, 93.19% recall rate and 92.90% F1‑score. For comparison, the Chinese pre‑trained XLNet model proposed in [32] achieved 88.83% precision, 87.92% recall rate and 88.37% F1‑score on the same dataset.
On the Weibo_senti_100k dataset, CWSXLNet achieved 95.02% precision, 94.83% recall rate and 95.01% F1‑score, and CWSXLNet‑BiGRU‑Attention achieved 95.67% precision, 95.48% recall rate and 95.57% F1‑score. For comparison, the Chinese pre‑trained XLNet model proposed in [32] achieved 94.28% precision, 94.15% recall rate and 94.21% F1‑score.
The experimental results indicate that the Chinese word separation information can help the XLNet model to understand Chinese semantics, and the performance of CWSXLNet‑BiGRU‑Attention is better than that of CWSXLNet alone, indicating that the BiGRU network and the self‑attention mechanism are more accurate and effective in capturing the sentiment keywords.
6. Conclusions
In this paper, we proposed a method to improve the XLNet model for Chinese language processing by addressing the importance of word separation in Chinese and combining it with the SentencePiece tool used by the XLNet model. Experimental evidence shows that the CWSXLNet proposed in this paper outperforms XLNet on Chinese sentiment analysis tasks. Meanwhile, the CWSXLNet‑BiGRU‑Attention structure proposed in this paper goes one step further and achieves better performance on the Chinese sentiment analysis task. However, the pre‑training method of the XLNet model proposed in this paper still has some shortcomings; for example, there is no better treatment for English and numbers. In other words, the CWSXLNet model is language‑dependent and only supports Chinese at present. In further studies, we will focus on these shortcomings and commit to constructing word lists in different languages.
Author Contributions: Conceptualization, S.G.; methodology, S.G.; formal analysis, S.G.; software,
S.G., L.Y. and C.Z.; validation, S.G.; writing—original draft, S.G.; investigation, Y.H.; data curation,
Y.H, L.Y. and C.Z.; visualization, Y.H., L.Y. and C.Z.; supervision, Y.H.; resources, Y.H.; project ad‑
ministration, B.H.; funding acquisition, B.H.; writing—review & editing, B.H. All authors have read
and agreed to the published version of the manuscript.
Funding: This research was funded by National Natural Science Foundation of China grant number
61962005.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data underlying this article will be shared upon reasonable request
to the corresponding author.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Wang, H.; Zhou, C.; Li, L. Design and Application of a Text Clustering Algorithm Based on Parallelized K‑Means Clustering.
Rev. D’intelligence Artif. 2019, 33, 453–460. [CrossRef]
2. Kiritchenko, S.; Zhu, X.; Mohammad, S.M. Sentiment analysis of short informal texts. J. Artif. Intell. Res. 2014, 50, 723–762.
[CrossRef]
3. Yadollahi, A.; Shahraki, A.G.; Zaiane, O.R. Current state of text sentiment analysis from opinion to emotion mining. ACM
Comput. Surv. (CSUR) 2017, 50, 1–33. [CrossRef]
4. Bansal, N.; Sharma, A.; Singh, R.K. An Evolving Hybrid Deep Learning Framework for Legal Document Classification. Ingénierie
Des Systèmes D’information 2019, 24, 425–431. [CrossRef]
5. Khoo, C.S.; Johnkhan, S.B. Lexicon‑based sentiment analysis: Comparative evaluation of six sentiment lexicons. J. Inf. Sci. 2018,
44, 491–511. [CrossRef]
6. Sebastiani, F.; Esuli, A. Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th Inter‑
national Conference on Language Resources and Evaluation, Genoa, Italy, 22–28 May 2006.
7. Esuli, A.; Sebastiani, F. SentiWordNet: A high‑coverage lexical resource for opinion mining. Evaluation 2007, 17, 26.
8. Baccianella, S.; Esuli, A.; Sebastiani, F. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining.
Lrec 2010, 10, 2200–2204.
9. Wu, X.; Lü, H.; Zhuo, S. Sentiment analysis for Chinese text based on emotion degree lexicon and cognitive theories. J. Shanghai
Jiaotong Univ. 2015, 20, 1–6. [CrossRef]
10. Wang, S.M.; Ku, L.W. ANTUSD: A large Chinese sentiment dictionary. In Proceedings of the Tenth International Conference on
Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23 May 2016.
11. Yang, L.; Li, Y.; Wang, J.; Sherratt, R.S. Sentiment analysis for E‑commerce product reviews in Chinese based on sentiment lexicon
and deep learning. IEEE Access 2020, 8, 23522–23530. [CrossRef]
12. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans Neural
Netw Learn Syst. 2016, 28, 2222–2232. [CrossRef]
13. Xiao, Z.; Liang, P. Chinese sentiment analysis using bidirectional LSTM with word embedding. In Proceedings of the Cloud
Computing and Security: Second International Conference, Nanjing, China, 29–31 July 2016.
14. Gan, C.; Feng, Q.; Zhang, Z. Scalable multi‑channel dilated CNN–BiLSTM model with attention mechanism for Chinese textual
sentiment analysis. Future Gener. Comput. Syst. 2021, 118, 297–309. [CrossRef]
15. Miao, Y.; Ji, Y.; Peng, E. Application of CNN‑BiGRU Model in Chinese short text sentiment analysis. In Proceedings of the 2019
2nd International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 20–22 December 2019.
16. Zhang, B.; Zhou, W. Transformer‑Encoder‑GRU (TE‑GRU) for Chinese Sentiment Analysis on Chinese Comment Text. Neural
Process. Lett. 2022, 1–21. [CrossRef]
17. Liang, B.; Su, H.; Gui, L.; Cambria, E.; Xu, R. Aspect‑based sentiment analysis via affective knowledge enhanced graph convolu‑
tional networks. Knowl. Based Syst. 2022, 235, 107643. [CrossRef]
18. Cambria, E.; Liu, Q.; Decherchi, S.; Xing, F.; Kwok, K. SenticNet 7: A commonsense‑based neurosymbolic AI framework for
explainable sentiment analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille,
France, 21–23 June 2022.
19. Jain, D.K.; Boyapati, P.; Venkatesh, J.; Prakash, M. An intelligent cognitive‑inspired computing with big data analytics framework
for sentiment analysis and classification. Inf. Process. Manag. 2022, 59, 102758. [CrossRef]
Appl. Sci. 2023, 13, 4056 18 of 18
20. Sitaula, C.; Basnet, A.; Mainali, A.; Shahi, T.B. Deep learning‑based methods for sentiment analysis on Nepali COVID‑19‑related
tweets. Comput. Intell. Neurosci. 2021, 2021, 2158184. [CrossRef] [PubMed]
21. Shang, C.; Li, M.; Feng, S.; Jiang, Q.; Fan, J. Feature selection via maximizing global information gain for text classification. Knowl.
Based Syst. 2013, 54, 298–309. [CrossRef]
22. Devlin, J.; Chang, M.‑W.; Lee, K.; Toutanova, K. Bert: Pre‑training of deep bidirectional transformers for language understanding.
arXiv 2018, arXiv:1810.04805.
23. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lweis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly
optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692.
24. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self‑supervised Learning of
Language Representations. arXiv 2019, arXiv:1909.11942.
25. Clark, K.; Luong, M.T.; Le, Q.V.; Manning, C.D. Electra: Pre‑training text encoders as discriminators rather than generators.
arXiv 2020, arXiv:2003.10555.
26. Li, M.; Chen, L.; Zhao, J.; Li, Q. Sentiment analysis of Chinese stock reviews based on BERT model. Appl. Intell. 2021, 51,
5016–5024. [CrossRef]
27. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language
understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 1–18.
28. Salma, T.D.; Saptawati, G.A.P.; Rusmawati, Y. Text Classification Using XLNet with Infomap Automatic Labeling Process. In
Proceedings of the 2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA),
Bandung, Indonesia, 29–30 September 2021.
29. Yan, R.; Jiang, X.; Dang, D. Named entity recognition by using XLNet‑BiLSTM‑CRF. Neural Process. Lett. 2021, 53, 3339–3356.
[CrossRef]
30. Gong, X.R.; Jin, J.X.; Zhang, T. Sentiment analysis using autoregressive language modeling and broad learning system. In
Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 18–
21 November 2019.
31. Alduailej, A.; Alothaim, A. AraXLNet: Pre‑trained language model for sentiment analysis of Arabic. J. Big Data 2022, 9, 72.
[CrossRef]
32. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Wang, S.; Hu, G. Revisiting pre‑trained models for Chinese natural language processing. arXiv
2020, arXiv:2004.13922.
33. Kudo, T.; Richardson, J. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text
processing. arXiv 2018, arXiv:1808.06226.
34. Sennrich, R.; Haddow, B.; Birch, A. Neural machine translation of rare words with subword units. arXiv 2015, arXiv:1508.07909.
35. Che, W.; Feng, Y.; Qin, L.; Liu, T. N‑LTP: An open‑source neural language technology platform for Chinese. arXiv 2020,
arXiv:2009.11616.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual au‑
thor(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.