Article
CWSXLNet: A Sentiment Analysis Model Based on Chinese
Word Segmentation Information Enhancement
Shiqian Guo 1 , Yansun Huang 2 , Baohua Huang 1, * , Linda Yang 1 and Cong Zhou 1
1 School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
2 Auditing Bureau of Xixiangtang, Nanning 530001, China
* Correspondence: [email protected]; Tel.: +86‑152‑9654‑4306
Abstract: This paper proposes a method for improving the XLNet model to address the shortcomings of its segmentation algorithm in processing Chinese, such as long sub‑word lengths, a long word list and incomplete word list coverage. To address these issues, we propose the CWSXLNet (Chinese Word Segmentation XLNet) model based on Chinese word segmentation information enhancement. The model first pre‑processes the Chinese pre‑training text with a Chinese word segmentation tool, and introduces a Chinese word segmentation attention mask mechanism by combining the PLM (Permuted Language Model) and the two‑stream self‑attention mechanism of XLNet. While performing natural language processing at single‑character granularity, it can reduce the degree of masking between masked and non‑masked characters that belong to the same word. For the Chinese sentiment analysis task, we propose the CWSXLNet‑BiGRU‑Attention model, which introduces a bi‑directional GRU as well as a self‑attention mechanism in the downstream task. Experiments show that CWSXLNet achieved 89.91% precision, 91.53% recall rate and 90.71% F1‑score, and CWSXLNet‑BiGRU‑Attention achieved 92.61% precision, 93.19% recall rate and 92.90% F1‑score on the ChnSentiCorp dataset, which indicates that CWSXLNet performs better than other models in Chinese sentiment analysis.
Keywords: sentiment analysis; Chinese word segmentation; XLNet; attention mask; machine learn‑
ing; natural language processing
1. Introduction
In the era of big data, the amount of information on the Internet has increased dramatically, including a large number of reviews posted by users on the Web. Most of these reviews express users' opinions and evaluations of products and services, which contain a lot of potential value [1]. The purpose of sentiment analysis techniques is to uncover the emotions and attitudes expressed in them, but due to the complex nature of comment data, which is diverse, colloquial and abbreviated, it is particularly important to use computational techniques to achieve automatic, in‑depth and accurate analysis and processing [2].
The development of sentiment analysis has had a significant impact on the field of natural language processing. Natural language processing techniques and text analysis methods are used to mine text and extract sentiment polarity from it [3]. Sentiment analysis has a wide range of applications, such as reputation management, market research, customer service, brand monitoring, and so on.
Based on the above, sentiment analysis is important in various fields such as business, politics and society to help people better understand and respond to different emotions and attitudes in society. Sentiment analysis has developed through three main stages: sentiment lexicons, machine learning and deep learning [4].
Sentiment lexicon‑based approaches: The earliest approaches to sentiment analysis were mainly based on sentiment lexicons, which typically contained a large number of words, each of which was tagged with a sentiment polarity such as positive, negative or
neutral. The main use of sentiment lexicons in sentiment analysis is to automatically iden‑
tify and classify the sentiment polarity of texts. The creation of a sentiment lexicon typically
involves two aspects: word selection and sentiment annotation. For word selection, a large
number of words are usually collected from different sources (e.g., network texts, human
written texts, annotated datasets, etc.). The annotators then annotate each word
with a positive, negative or neutral sentiment polarity according to predefined sentiment
classification criteria [5].
Several sentiment lexicons have been developed and are widely used in natural lan‑
guage processing. In the early years of research, Sebastiani Fabrizio et al. in [6–8] proposed
the SentiWordNet sentiment lexicon, a WordNet‑based sentiment lexicon that associates
each word with a set of sentiment strengths, including positive sentiment, negative senti‑
ment, and neutral sentiment. After a few years, Wu Xing et al. in [9], inspired by social
cognitive theories, combined basic sentiment value lexicon and social evidence lexicon to
improve the traditional polarity lexicon. In 2016, Wang Shih‑Ming et al. in [10] presented
the ANTUSD (Augmented NTU Sentiment Dictionary), which was constructed by collecting
sentiment statistics for words from several sentiment annotation exercises. A total of 26,021
Chinese words were collected in ANTUSD. In 2020, Yang Li et al. in [11] proposed a new
sentiment analysis model SLCABG based on a sentiment lexicon that combines a convolu‑
tional neural network (CNN) with a bi‑directional gated recurrent unit (BiGRU) based on
an attention mechanism. The sentiment lexicon was used to enhance the sentiment features
in the comments. CNN and BiGRU networks are used to extract the main sentiment and
contextual features from the comments and weight them using the attention mechanism.
The advantage of sentiment dictionary‑based methods is that they are simple and fast,
but they require manual construction and updating of sentiment dictionaries, and they do
not work well for some special texts (e.g., texts with complex semantics such as irony and
metaphor).
Machine learning based approaches: With the development of machine learning tech‑
niques, models such as RNN, LSTM, CRF and GRU are gradually being proposed by re‑
searchers and people are using machine learning algorithms for text sentiment analysis.
LSTM (Long Short‑Term Memory) [12] is a recurrent neural network (RNN) model
commonly used to process sequential data. LSTM can effectively solve the long‑term de‑
pendency problem in RNN by introducing a special memory unit. The BiLSTM model can
be thought of as processing the input sequence from left to right for one LSTM model and
from right to left for another LSTM model, and finally merging their outputs. The advan‑
tage of this is that not only the previous information but also the subsequent information
can be considered when processing the input at the current time step. Xiao Zheng et al.
in [13] used a bidirectional LSTM (BiLSTM) model for sentiment analysis. The experimen‑
tal results show that BiLSTM outperforms CRF and LSTM for Chinese sentiment analysis.
Similarly, Gan Chenquan et al. in [14] proposed a scalable multi‑channel dilated CNN‑BiLSTM model with an attention mechanism for Chinese text sentiment analysis, in which a CNN module is bridged with the BiLSTM model, and achieved better results on several public Chinese sentiment analysis datasets.
In addition to LSTM, some researchers select GRU (Gated Recurrent Unit) to handle
sentiment analysis tasks. As a variant of LSTM, GRU has fewer parameters than LSTM, re‑
quires less training data and has a faster training speed. Miao YaLin et al. in [15] proposed
the adoption of the application of CNN‑BiGRU model in Chinese short text sentiment anal‑
ysis, which introduced the BiGRU model based on CNN. Zhang Binlong et al. in [16] pro‑
posed Transformer‑Encoder‑GRU (T‑E‑GRU) to address the problem that the transformer, which relies on positional encoding, is naturally weaker than recurrent models at capturing the sequence features in text. Both have achieved good experimental results in
the field of Chinese sentiment analysis.
To leverage the affective dependencies of the sentence, in 2020, Liang Bin et al. in [17]
proposed a graph convolutional network based on SenticNet [18] according to the specific
aspect, called Sentic GCN, and explored a novel solution to construct the graph neural net‑
works via integrating the affective knowledge from SenticNet to enhance the dependency
graphs of sentences. Experimental results illustrate that SenticNet can beat state‑of‑the‑art
methods. In the same year, Jain Deepak Kumar et al. in [19] proposed BBSO‑FCM model
for sentiment analysis, used Binary Brain Storm Optimization (BBSO) algorithm for the
Feature Selection process and thereby achieved improved classification performance, and
Fuzzy Cognitive Maps (FCMs) were used as a classifier to classify the incidence of posi‑
tive or negative sentiments. Experimental values highlight the improved performance of
BBSO‑FCM model in terms of different measures. In 2021, Sitaula Chiranjibi et al. in [20]
proposed three different feature extraction methods and three different CNNs (Convo‑
lutional Neural Networks) to implement the features using a low resource dataset called
NepCOV19Tweets, which contains COVID‑19‑related tweets in Nepali language. By using
ensemble CNN, they ensemble the three CNNs models. Experimental results show that
proposed feature extraction methods possess the discriminating characteristics for the sen‑
timent classification, and the proposed CNN models impart robust and stable performance
on the proposed features.
However, machine learning‑based methods require large amounts of annotated data
to train the classifier, and require manual selection of features and algorithms. There is
a degree of subjectivity in feature extraction and algorithm selection, with good or bad
feature extraction directly affecting classification results [21] and not easily generalized to
a new corpus.
Deep learning‑based approaches: In recent years, the rise of deep learning techniques
has brought new breakthroughs in text sentiment analysis. In particular, the use of pre‑
trained language models (e.g., BERT, RoBERTa, XLNet, etc.) for fine‑tuning to solve sen‑
timent analysis problems has yielded very good results. This approach does not require
manual feature construction and can handle complex semantic relationships, and therefore
has very promising applications in the field of text sentiment analysis.
BERT (Bidirectional Encoder Representations from Transformers) is a pre‑trained lan‑
guage model based on the transformer structure proposed by Google in 2018, and is cur‑
rently one of the most representative and influential models in the field of natural language
processing [22]. Due to the excellent performance of the BERT model, various variants
derived from it are also widely used in the field of natural language processing, such as
RoBERTa [23], ALBERT [24], ELECTRA [25], etc. The emergence of the BERT model has
greatly promoted the development of the field of natural language processing and has
achieved leading scores in several benchmark tests, becoming an important milestone in
the field of natural language processing. Li Mingzheng et al. in [26] proposed a novel senti‑
ment analysis model for Chinese stock reviews based on BERT. This model relies on a pre‑
trained model to improve the classification accuracy. The model uses a BERT pre‑training
language model to perform sentence‑level representation of stock reviews, and then feeds
the obtained feature vector into the classifier layer for classification. Their experiments demonstrate that the method obtains higher precision, recall and F1 than TextCNN, TextRNN, Att‑BLSTM and TextCRNN, indicating its effectiveness in Chinese stock review sentiment analysis; the model also shows strong generalization ability and can perform sentiment analysis in many fields.
In 2019, Google proposed XLNet [27], which uses the Permuted Language Model
(PLM) with a two‑stream self‑attention mechanism to outperform the BERT model in 20 nat‑
ural language processing tasks, achieving the best results in 18 tasks. Currently, XLNet
is widely used in natural language processing, covering tasks such as classification and
named entity recognition [28,29].
As part of text classification, sentiment analysis was also an important application of
XLNet. Gong Xin‑Rong et al. in [30] proposed a Broad Autoregressive Language Model
(BroXLNet) to automatically process the sentiment analysis task. BroXLNet integrates the
advantage of generalized autoregressive language modeling and broad learning system,
which has the ability of extracting deep contextual features and randomly searching high‑
level contextual representation in broad spaces. BroXLNet achieved the best result of 94.0%
in sentiment analysis task of binary Stanford Sentiment Treebank.
XLNet was trained on different languages. Alduailej Alhanouf et al. in [31] proposed
AraXLNet model, which pre‑trained XLNet model in Arabic language for sentiment analy‑
sis. For Chinese language, Cui Yiming et al. in [32] published an unofficial XLNet Chinese
pre‑training model, which was trained from the Chinese Wikipedia corpus, but its word
segmentation model still suffers from the defects of excessively long sub‑words, infrequently used sub‑words, and incomplete coverage of the word list.
To address the above problems, this paper proposes the CWSXLNet (Chinese Word
Segmentation XLNet) model, which improves the XLNet model. First, the original corpus
is segmented in the text pre‑processing stage and the corresponding segmentation codes
are generated; in the pre‑training stage, the corresponding segmentation mask codes are
generated according to the random sequence of PLM. The segmentation mask codes are then combined with the two‑stream self‑attention mechanism and the attention mask, thereby enhancing Chinese word‑level information while using the single Chinese character as the processing granularity. This is designed to solve the granularity mismatch between characters and words when the XLNet model processes Chinese.
For the Chinese sentiment analysis task, this paper combines the above research
and uses the BiGRU model in the downstream task, which can further extract the feature
information of the context. In addition, the CWSXLNet‑BiGRU‑Attention model is pro‑
posed by introducing the self‑attention mechanism in the model to increase its attention to
sentiment‑weighted words. It can further capture the sentiment keywords in the text and
achieve better results in Chinese sentiment analysis tasks.
2. Related Works
2.1. XLNet
XLNet is an autoregressive language model proposed by Google in 2019. It inherits from Transformer‑XL, using the PLM, the two‑stream self‑attention mechanism and the segment‑level recurrence with relative position encoding of Transformer‑XL, so that XLNet has a better ability to read long texts.
2.1.1. PLM
Traditional autoregressive language models are trained unidirectionally: they can only predict based on the antecedent information of the predicted words, or from backwards to forwards through the posterior text of the predicted words. PLM instead generates several permutations of the text in random order and treats the words at the end of each permuted sequence as prediction targets. Since the end words of a permuted sequence can use the semantic information of all the preceding words, the model predicts them and thereby completes its training. This approach solves the
problem of the possible relationship between multiple masks being ignored in the BERT
model, and of small differences occurring in the pre‑training and fine‑tuning phases due
to the use of masks.
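As an illustration only (a minimal NumPy sketch, not the actual XLNet pre‑training code), the visibility pattern implied by one sampled factorization order can be computed as follows: a position may attend to another position only if the latter comes earlier in the sampled permutation.

```python
import numpy as np

def plm_visibility(seq_len: int, rng: np.random.Generator) -> np.ndarray:
    """Return a 0/1 matrix M where M[i, j] = 1 means position i may NOT see
    position j under one randomly sampled factorization order (PLM)."""
    order = rng.permutation(seq_len)        # a random prediction order
    rank = np.empty(seq_len, dtype=int)     # rank[pos] = step at which pos is predicted
    rank[order] = np.arange(seq_len)
    # position i may see position j only if j is predicted strictly earlier than i
    cannot_see = (rank[None, :] >= rank[:, None]).astype(int)
    return cannot_see

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(plm_visibility(5, rng))
```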
2.2. Pre‑Training Process of XLNet Model
The pre‑training process of XLNet mainly consists of two stages: the text pre‑processing stage and the pre‑training stage; the detailed flow chart is shown in Figure 1.
Figure 1. XLNet training process. (a) The text pre‑processing process; (b) The pre‑training process.
2.2.1. Text Pre‑Processing
The XLNet model uses SentencePiece [33], a tokenizer provided by Google. The SentencePiece model is trained on the original text. It uses the BPE algorithm [34] to segment sub‑words and gather statistics from the original text, obtaining the word separation strategy by continuously merging the more frequent sub‑words.
The trained SentencePiece is used to split the original text into vectors and complete the word embedding. After the transformation, merging, splitting and disrupting steps, each piece of training data is transformed into a Feature containing input, tgt, is_masked, seg_id, label and so on. The structure of input is shown in Figure 2 (the maximum length of the sentence is assumed to be 128).
Figure 2. Structure of “input”.
The first half of reuse_len is the reuse of the last 64 tokens of the previous input data, and the second half has the structure “A + <SEP> + B + <SEP> + <CLS>”, where A and B are two text vectors, each followed by a <SEP> control character that acts as a statement separator, and a <CLS> control character at the end marks the end of the entire input data.
Vector A and vector B have a 50% probability of being consecutive contexts and a further 50% probability of B being randomly chosen. The lengths of A and B are not fixed, but the length of “A + <SEP> + B + <SEP> + <CLS>” is equal to the maximum sentence length of the model.
Together with the input, tgt, is_masked, seg_id and label are generated, which together form a Feature, and Features together form TFRecord files.
Here tgt is the target vector of input, with a length of 128 tokens; the first 126 tokens are the next token corresponding to input, which is equivalent to shifting the whole of input to the left by one token length, and the last two tokens are <CLS>.
is_masked indicates which of the 128 tokens in input are masked, assigning 0 to the unmasked and 1 to the masked.
seg_id is used to distinguish vector A from vector B, where reuse + A + <SEP> is assigned 0, B + <SEP> is assigned 1 and <CLS> is assigned 2.
label is used to distinguish whether vector A is continuous with vector B.
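For illustration, the following minimal sketch (our own simplification with toy token ids and a maximum length of 16 instead of 128) shows how one such record and its seg_id and tgt vectors could be assembled; the helper name build_input and all ids are hypothetical.

```python
# Illustrative only: toy ids, not the real SentencePiece vocabulary.
SEP, CLS = 3, 4
MAX_LEN, REUSE_LEN = 16, 4            # the paper uses 128 and 64

def build_input(reuse, a, b):
    """Assemble one training record: reuse + A + <SEP> + B + <SEP> + <CLS>."""
    assert len(reuse) == REUSE_LEN
    body = a + [SEP] + b + [SEP] + [CLS]
    assert REUSE_LEN + len(body) == MAX_LEN, "A and B must fill the remaining length"
    inp = reuse + body
    # seg_id: 0 for reuse + A + <SEP>, 1 for B + <SEP>, 2 for <CLS>
    seg_id = [0] * (REUSE_LEN + len(a) + 1) + [1] * (len(b) + 1) + [2]
    # tgt: the input shifted left by one token, padded with <CLS> at the end
    tgt = inp[1:] + [CLS]
    return inp, seg_id, tgt

if __name__ == "__main__":
    reuse = [11, 12, 13, 14]
    a, b = [21, 22, 23, 24, 25], [31, 32, 33, 34]
    inp, seg_id, tgt = build_input(reuse, a, b)
    print(inp, seg_id, tgt, sep="\n")
```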
2.2.2. Pre‑Training
In the pre‑training phase, the XLNet model reads the TFRecords data generated in the pre‑processing phase, performs random reordering on each piece of data, and finally generates the perm_mask matrix corresponding to each piece of data, as shown in Figure 3, where perm_mask[i][j] = 1 (hollow circle) indicates that the ith token cannot detect the jth token after reordering; conversely, perm_mask[i][j] = 0 (solid circle) indicates that the ith token can detect the jth token.
Figure 3. An example of perm_mask matrix.
In the subsequent pre‑training process, the perm_mask matrix is transformed into the attn_mask matrix after splitting, splicing and deformation operations, and is used in the calculation of attn_score with the formula as in Equation (1):

attn_score = attn_score − 10^30 × attn_mask. (1)

If the element in attn_mask[i][j] is 0, it means that i can notice j, and attn_score[i][j] remains unchanged.
If the element in attn_mask[i][j] is 1, it means that i cannot notice j, and attn_score[i][j] becomes a large negative number, so that the probability produced by the subsequent softmax operation is close to 0, achieving the effect of the mask.
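The effect of Equation (1) can be sketched as follows (a schematic NumPy example rather than the original implementation): masked positions receive a huge negative bias, so their weight after the softmax collapses to almost zero.

```python
import numpy as np

def masked_softmax(attn_score: np.ndarray, attn_mask: np.ndarray) -> np.ndarray:
    """Apply Equation (1) and a row-wise softmax.

    attn_mask[i, j] = 1 -> token i must not attend to token j,
    attn_mask[i, j] = 0 -> the attention score is left unchanged.
    """
    score = attn_score - 1e30 * attn_mask               # Equation (1)
    score = score - score.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(score)
    return weights / weights.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(4, 4))
    mask = np.triu(np.ones((4, 4)), k=1)   # toy mask: no attention to later positions
    print(masked_softmax(scores, mask).round(3))
```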
2.3. BiGRU
The Gated Recurrent Unit (GRU) is a type of recurrent neural network. Compared to LSTM, GRU streamlines the number of gates from three (forget gate, input gate and output gate) to two (reset gate and update gate), and merges the cell state and the hidden state. In addition, GRU has fewer parameters than LSTM, requires less training data and has a faster training speed.
The GRU architecture is shown in Figure 4.
Figure 4. Structure of GRU unit.
The two gates of the GRU are calculated as follows.
Reset gate r_t:

r_t = σ(W_r x_t + U_r h_{t−1} + b_r). (2)

Update gate z_t:

z_t = σ(W_z x_t + U_z h_{t−1} + b_z). (3)

The candidate hidden state is calculated as

h̃_t = tanh(W x_t + U(r_t ⊙ h_{t−1}) + b). (4)

Finally, the hidden state at time t is calculated as

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t. (5)
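As a concrete reference for Equations (2)–(5), a minimal NumPy transcription of a single GRU step (randomly initialized toy weights, not a trained model) is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step implementing Equations (2)-(5)."""
    Wr, Ur, br = params["Wr"], params["Ur"], params["br"]
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]
    W,  U,  b  = params["W"],  params["U"],  params["b"]
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)           # Equation (2): reset gate
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)           # Equation (3): update gate
    h_cand = np.tanh(W @ x_t + U @ (r_t * h_prev) + b)   # Equation (4): candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand           # Equation (5): new hidden state

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_h = 4, 3
    params = {k: rng.normal(scale=0.1, size=(d_h, d_in) if k.startswith("W") else (d_h, d_h))
              for k in ("Wr", "Wz", "W", "Ur", "Uz", "U")}
    params.update({k: np.zeros(d_h) for k in ("br", "bz", "b")})
    h = np.zeros(d_h)
    for x in rng.normal(size=(5, d_in)):   # a toy sequence of 5 steps
        h = gru_step(x, h, params)
    print(h)
```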
In the above equations, σ is the sigmoid activation function and ⊙ is the Hadamard product operation of matrices. The Hadamard product of matrix A and matrix B is denoted as A ⊙ B. For matrix A = [a_ij] and matrix B = [b_ij], the elements of matrix A ⊙ B are defined as the product of the corresponding elements of the two matrices, i.e., (A ⊙ B)_ij = a_ij b_ij.
To capture the contextual information of the text, BiGRU, a bidirectional GRU network, can be used, which consists of a forward GRU and a reverse GRU, as shown in Figure 5. The forward GRU captures the preceding context of the text and the reverse GRU captures the following context, so that both directions of contextual information are covered.
Figure 5. Network structure of BiGRU.

2.4. Self‑Attention
Self‑attention is able to notice the interconnectedness of words within the input utterance, giving the model a stronger ability to grasp emotionally weighted words. Its matrix form is computed as follows, first computing the three matrices Q, K and V:

Q = W_q · I, (6)

K = W_k · I, (7)

V = W_v · I. (8)

W_q, W_k and W_v are trainable parameter matrices and I is the matrix composed of the input word vectors.
After calculating Q, K and V, we obtain A:

A = K^T · Q. (9)

A is then normalized with softmax to obtain A′:

A′ = softmax(A). (10)

The output is

O = V · A′. (11)

Combining each of the above steps gives the formula for self‑attention:

Attention(Q, K, V) = softmax(QK^T / √d_k) V. (12)
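Equations (6)–(12) can be summarized in a compact NumPy sketch (our own illustration; word vectors are stored as columns of I, and the √d_k scaling of Equation (12) is folded into the softmax step):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(I, Wq, Wk, Wv):
    """Self-attention in the matrix form of Equations (6)-(12).

    I has one word vector per column; the output has one context-aware
    vector per column.
    """
    Q, K, V = Wq @ I, Wk @ I, Wv @ I                      # Equations (6)-(8)
    A = K.T @ Q                                           # Equation (9): raw scores
    A_prime = softmax(A / np.sqrt(K.shape[0]), axis=0)    # Equations (10)/(12)
    return V @ A_prime                                    # Equation (11)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_k, n_words = 6, 4, 5
    I = rng.normal(size=(d_model, n_words))               # word vectors as columns
    Wq, Wk, Wv = (rng.normal(size=(d_k, d_model)) for _ in range(3))
    print(self_attention(I, Wq, Wk, Wv).shape)            # -> (4, 5)
```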
3. CWSXLNet
3.1. Deficiencies of XLNet Model in Handling Chinese
When the XLNet model is used to process Chinese, because it uses SentencePiece
for text segmentation, and SentencePiece tends to split longer sub‑words when using the
BPE method for word segmentation, these longer words, when separated from the origi‑
nal corpus, tend to be used less frequently in other text, causing the waste of the limited
positional space of the word table. Furthermore, because SentencePiece tends to segment
longer words rather than individual Chinese characters, the required word list is unusu‑
ally large. With a limited word list length of 32,000, there are still many commonly used
Chinese words not included in the word list, which can only be represented by <UNK>
during word embedding, affecting the model’s understanding of semantics and sentiment
analysis.
Figure 6 shows the sub‑words of the word list in [32], which proposed the Chinese pre‑
trained XLNet model. The numbers in Figure 6 indicate the weights. It can be seen that
SentencePiece prefers to split out longer sub‑words, which are less frequently used in the
detached pre‑trained corpus and waste space in the word list.
Figure 6. Word lists and weights in Chinese XLNet.
The basic unit of the Chinese language is the Chinese character. Several Chinese characters make up words, and Chinese words form sentences. Unlike English, Chinese words are not separated by spaces, and there are approximately 55,000 commonly used Chinese words. For comparison, the length of XLNet's word list is 32,000. To avoid the problems caused by long word lists, we reduce the granularity to the Chinese character instead of the Chinese word. However, this approach creates another problem: the relationship between characters from the same word is lost. To avoid this, we devised a way to improve the connection between characters from the same word.
First, we use a word segmentation tool to separate Chinese words from sentences, and use spaces to separate Chinese words as in English. Secondly, we reduce the granularity to the Chinese character. Finally, we need to improve the relationship between characters that form the same word, and make sure that the improvement fits naturally into the XLNet model. This is a brief introduction to our CWSXLNet model; we explain the model in more detail below.

3.2. CWSXLNet Model
In this paper, we propose the CWSXLNet model, which aims to solve the natural non‑adaptation problem of the SentencePiece model used in the XLNet model for Chinese. Improvements are made in the data pre‑processing phase and the pre‑training phase of XLNet. In the data pre‑processing stage, the model uses LTP [35] as a word segmentation tool to separate the original corpus with spaces between words. Figure 7 shows an example of the text after word separation using the LTP word segmentation tool.

Figure 7. Original text after using LTP. (a) The original text; (b) After using LTP.
In order to reduce the training granularity while retaining the word separation information in the original text, CWSXLNet trains the SentencePiece model at the granularity of a single Chinese character. First, a total of 14,516 Chinese characters in the dictionary book are crawled using a crawler tool. These are fed into the SentencePiece model as training text for character‑based partitioning training. Finally, the SentencePiece model is trained to segment Chinese texts as single characters.
The original corpus is fed into the SentencePiece model, which is trained as described above. In the subsequent training phase, the proposed model also uses the character “▁” as a word separation marker and refers to the character “▁” as a <TOK> marker, which is used to detect the boundary between words.
When the original text is processed to generate Features, the <TOK> token is used as a word delimiter to determine the position of each word. The tok_id vector of each piece of data is generated to determine which word the current character belongs to, where the tok_id of the <TOK> token is 0 and the tok_id of the control characters <SEP> and <CLS> is −1; the is_TOK vector of the data is generated at the same time, where the position of a <TOK> token is true and the rest are false, which is used to determine whether the current token is a <TOK> token or not. The above two pieces of data are stored with input, tgt, label, seg_id and is_masked in the TFRecord files.
Figure 8 shows the process of generating the corresponding tok_id and is_TOK of a text. For illustration purposes, the lengths of reuse_len and the vector B are set to 0. The original text is “今天天气真不错 (It's really nice weather today)”. The LTP separated the text into “今天 (today)”, “天气 (weather)”, “真 (really)”, “不错 (nice)”.

Figure 8. Generation of vector tok_id and is_TOK.
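The generation of tok_id and is_TOK can be sketched as follows (a minimal reconstruction of the Figure 8 example; string placeholders are used instead of vocabulary ids, and the helper name is hypothetical):

```python
TOK, SEP, CLS = "<TOK>", "<SEP>", "<CLS>"

def make_tok_vectors(words):
    """Build the character-level token list with <TOK> word separators,
    plus tok_id (word index per character, 0 for <TOK>, -1 for controls)
    and is_TOK (True at <TOK> positions)."""
    tokens, tok_id = [], []
    for word_index, word in enumerate(words, start=1):
        if word_index > 1:            # a <TOK> marker between words
            tokens.append(TOK)
            tok_id.append(0)
        for char in word:
            tokens.append(char)
            tok_id.append(word_index)
    tokens += [SEP, CLS]
    tok_id += [-1, -1]
    is_tok = [t == TOK for t in tokens]
    return tokens, tok_id, is_tok

if __name__ == "__main__":
    # "今天 天气 真 不错" is the LTP segmentation of the sentence in Figure 8
    tokens, tok_id, is_tok = make_tok_vectors(["今天", "天气", "真", "不错"])
    print(tokens)
    print(tok_id)   # [1, 1, 0, 2, 2, 0, 3, 0, 4, 4, -1, -1]
    print(is_tok)
```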
Inputs is the result of vectorizing the text by adding the <SEP> and <CLS> flags to the end of the original text, where 9 is <TOK>, 3 is <SEP> and 4 is <CLS>.
In the pre‑training phase, the tok_mask matrix is computed using the same random sequence that is used to generate the perm_mask. Following the example in Figure 8, assuming the random sequence index = [4, 6, 7, 10, 3, 11, 0, 1, 8, 9, 2, 5], then tok_index = [2, 3, 0, −1, 2, −1, 1, 1, 4, 4, 0, 0], which is the reordered version of tok_id obtained by index.
Matrix A is obtained by transposing tok_index and broadcasting along the columns, and matrix B is obtained by broadcasting the rows of tok_id. Elements in the tok_mask matrix are set to 1 if the corresponding elements of matrices A and B are equal and 0 if they are not. The calculation flowchart is shown in Figure 9.

Figure 9. Generation of the tok_mask matrix.

The structure of the tok_mask matrix is shown in Figure 10. tok_mask[i][j] = 1 (diagonal circle) means that the ith token belongs to the same Chinese word as the jth token after disordering, and conversely tok_mask[i][j] = 0 (hollow circle) means that the ith token does not belong to the same Chinese word as the jth token.

Figure 10. The tok_mask matrix.
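Under the same assumptions as the previous sketch, the broadcast‑and‑compare construction of Figure 9 can be written as:

```python
import numpy as np

def build_tok_mask(tok_id, index):
    """tok_mask[i, j] = 1 when the i-th token of the reordered sequence and the
    j-th token of the original sequence carry the same word id (Figure 9)."""
    tok_id = np.asarray(tok_id)
    tok_index = tok_id[np.asarray(index)]     # reordered version of tok_id
    # column vector (from tok_index) compared against row vector (from tok_id)
    return (tok_index[:, None] == tok_id[None, :]).astype(int)

if __name__ == "__main__":
    tok_id = [1, 1, 0, 2, 2, 0, 3, 0, 4, 4, -1, -1]   # from the Figure 8 example
    index = [4, 6, 7, 10, 3, 11, 0, 1, 8, 9, 2, 5]    # the random sequence in the text
    print(build_tok_mask(tok_id, index))
```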
This is reflected in Equation (1), which calculates the attn_score.
At this point, the element values in the attn_mask matrix can be one of the four cases 1, 0, 1 − α, α − 1, as shown in Figure 12.
Figure 12. The attn_mask matrix, where 1 (solid circle) and 0 (hollow circle) represent masked and non‑masked cases.
As we see in Figure 8, the LTP shows that the Chinese word “天气 (weather)” is separated by a space. It indicates that the Chinese characters “天 (sky)” and “气 (air)” belong to the Chinese word “天气 (weather)”, and we hope to enhance the relationship between “天 (sky)” and “气 (air)”, and, conversely, between “气 (air)” and “天 (sky)”. From the perm_mask matrix in the
middle of Figure 10, we know that “ 天 (sky)” is masked; to enhance the relationship, we
use the tok_mask matrix to reduce the degree of mask from “ 气 (air)” to “ 天 (sky)” and
increase attention from “ 天 (sky)” to “ 气 (air)”.
If the element in attn_mask[i][j] is 1‑α (solid diagonal circle), this means that originally
i could not notice j, but since i and j belong to the same Chinese word, the attentional
masking of j by i is “reduced”: the penalty subtracted from attn_score[i][j] is smaller than before, which increases the probability of the subsequent softmax prediction. For
example, the element in row 1 and column 5 in Figure 12 means that the first element “ 气
(air)” would not be able to notice the unordered fifth element “ 天 (sky)” after disordering,
because “ 天 (sky)” is masked for “ 气 (air)”. However, since “ 天 (sky)” and “ 气 (air)”
belong to the same Chinese word “ 天气 (weather)”, the extent of mask from “ 气 (air)”
to “ 天 (sky)” is “reduced”. Since “ 气 (air)” is not masked for “ 天 (sky)”, the attention is
increased from “ 气 (air)” to “ 天 (sky)”.
As we see in Figure 8, the LTP shows that Chinese word “ 不错 (nice)” is separated
by space. It indicates that Chinese character “ 不 (not)” and “ 错 (bad)” belong to Chinese
word “ 不错 (nice)”, and we hope to enhance the relationship between “ 不 (not)” and “
错 (bad)”, and, conversely, “ 错 (bad)” and “ 不 (not)”. From the perm_mask matrix in
the middle of Figure 10, we know that both “ 不 (not)” and “ 错 (bad)” are unmasked; to
enhance the relationship, we use the tok_mask matrix to increase attention from “ 不 (not)”
to “ 错 (bad)” and “ 错 (bad)” to “ 不 (not)”.
If attn_mask[i][j] is α−1 (hollow diagonal circle), this means that originally i can notice
j, but because i and j belong to the same phrase, i’s attention to j is “increased” compared
to the original, attn_score[i][j] is increased, which also increases the probability of softmax
prediction. For example, the elements in row 9 and column 10 in Figure 12 indicate that the
ninth element “ 不 (not)” after disordering is able to detect the tenth element “ 错 (bad)”,
which is not disordered. Since “ 不 (not)” and “ 错 (bad)” belong to the same Chinese word,
we increase the attention from “ 不 (not)” to “ 错 (bad)” and “ 错 (bad)” to “ 不 (not)”.
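The four cases can be reproduced with the following sketch (our own illustrative formulation rather than the exact implementation; alpha is treated as a hyperparameter between 0 and 1):

```python
import numpy as np

def combine_masks(perm_mask, tok_mask, alpha=0.5):
    """Produce the four attn_mask cases 1, 0, 1 - alpha and alpha - 1.

    Where tok_mask = 0 the perm_mask value (1 or 0) is kept; where tok_mask = 1
    a masked pair is weakened to 1 - alpha and an unmasked pair is strengthened
    to alpha - 1, as described in the text.
    """
    same_word = tok_mask == 1
    attn_mask = perm_mask.astype(float)
    attn_mask[same_word & (perm_mask == 1)] = 1.0 - alpha   # reduce the masking
    attn_mask[same_word & (perm_mask == 0)] = alpha - 1.0   # increase the attention
    return attn_mask

if __name__ == "__main__":
    perm_mask = np.array([[1, 0], [1, 0]])
    tok_mask  = np.array([[1, 0], [0, 1]])
    print(combine_masks(perm_mask, tok_mask, alpha=0.3))
    # [[ 0.7  0. ]
    #  [ 1.  -0.7]]
```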
In summary, we have reduced the granularity of Chinese natural language processing
from Chinese sub‑word to single Chinese character. To solve the problem of information
loss by splitting Chinese words into characters, we proposed a method to enhance the rela‑
tionship between characters from the same word. The embodiment of this approach in the
XLNet model is the tok_mask matrix. To further enhance the sentiment analysis capability,
we combined BiGRU and self‑attention with CWSXLNet to form CWSXLNet‑BiGRU‑Attention. In Section 4, the experimental results show that this clearly improves Chinese sentiment analysis.
4. Experiment
To demonstrate the effectiveness of the CWSXLNet model and the CWSXLNet‑BiGRU‑
Attention structure proposed in this paper, experiments were conducted on two public
Chinese sentiment analysis datasets.
Name                 Parameters
Operating System     Ubuntu 18.04
Memory               32 G
GPU                  Tesla T4
GPU Memory           16 G
4.2. Parameters
The pre‑training corpus was selected from the Chinese Wikipedia corpus, with a total
size of about 2.5 G, and the tokenizer was selected from the LTP/base model. The details
of the parameters in text pre‑processing are shown in Table 2.
4.4. Evaluation Indicators
The evaluation index consists of the precision rate P, the recall rate R and the F1‑score, which are calculated as follows:

P = TP / (TP + FP), (15)

R = TP / (TP + FN), (16)

F1 = 2RP / (R + P), (17)

where TP is the number of positive samples considered positive by the model, FP is the number of negative samples considered positive by the model, and FN is the number of positive samples considered negative by the model.
5. Results and Discussion
The experimental results on the ChnSentiCorp dataset are shown in the following Table 5 and Figure 13, and the LTP process was performed on the training corpus to maintain consistency between pre‑training and fine‑tuning.
Table 5. ChnSentiCorp dataset experimental results.

Model                          P (%)    R (%)    F1‑Score (%)
LSTM                           84.25    84.01    84.12
BiLSTM                         86.94    86.50    86.72
BERT                           89.92    89.90    89.91
XLNet                          88.83    87.92    88.37
CWSXLNet                       89.91    91.53    90.71
CWSXLNet‑BiGRU‑Attention       92.61    93.19    92.90
Figure 13. ChnSentiCorp dataset experimental results.

The experimental results of the Weibo_senti_100k dataset are shown in Table 6 and Figure 14 below, and the LTP process was performed on the training corpus to maintain consistency between pre‑training and fine‑tuning.
From the experimental results, both the CWSXLNet model and the CWSXLNet‑BiGRU‑Attention model achieve better results in dealing with Chinese sentiment analysis tasks.
On the ChnSentiCorp dataset, CWSXLNet achieved 89.91% precision, 91.53% recall rate and 90.71% F1‑score, and CWSXLNet‑BiGRU‑Attention achieved 92.61% precision, 93.19% recall rate and 92.90% F1‑score. For comparison, the Chinese pre‑trained XLNet model proposed in [32] achieved 88.83% precision, 87.92% recall rate and 88.37% F1‑score on the same dataset.
On the Weibo_senti_100k dataset, CWSXLNet achieved 95.02% precision, 94.83% recall rate and 95.01% F1‑score, and CWSXLNet‑BiGRU‑Attention achieved 95.67% precision, 95.48% recall rate and 95.57% F1‑score. For comparison, the Chinese pre‑trained XLNet model proposed in [32] achieved 94.28% precision, 94.15% recall rate and 94.21% F1‑score.
The experimental results indicate that the Chinese word separation information can help the XLNet model to understand Chinese semantics, and the performance of CWSXLNet‑BiGRU‑Attention is better than that of CWSXLNet alone, indicating that the BiGRU network and the self‑attention mechanism are more accurate and effective in capturing the sentiment keywords.
6. Conclusions
In this paper, we proposed a method to improve the XLNet model for Chinese language processing by addressing the importance of word separation in Chinese and combining it with the SentencePiece tool used by the XLNet model. Experimental evidence shows that the CWSXLNet proposed in this paper outperforms XLNet on Chinese sentiment analysis tasks. Meanwhile, the CWSXLNet‑BiGRU‑Attention structure proposed in this paper goes one step further and achieves better performance on the Chinese sentiment analysis task. However, the pre‑training method of the XLNet model proposed in this paper still has some shortcomings; for example, there is no better treatment for English and numbers. In other words, the CWSXLNet model is language‑dependent and only supports Chinese at present. In further studies, we will focus on these shortcomings and commit to constructing word lists in different languages.
Author Contributions: Conceptualization, S.G.; methodology, S.G.; formal analysis, S.G.; software,
S.G., L.Y. and C.Z.; validation, S.G.; writing—original draft, S.G.; investigation, Y.H.; data curation,
Y.H, L.Y. and C.Z.; visualization, Y.H., L.Y. and C.Z.; supervision, Y.H.; resources, Y.H.; project ad‑
ministration, B.H.; funding acquisition, B.H.; writing—review & editing, B.H. All authors have read
and agreed to the published version of the manuscript.
Funding: This research was funded by National Natural Science Foundation of China grant number
61962005.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data underlying this article will be shared upon reasonable request
to the corresponding author.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Wang, H.; Zhou, C.; Li, L. Design and Application of a Text Clustering Algorithm Based on Parallelized K‑Means Clustering.
Rev. D’intelligence Artif. 2019, 33, 453–460. [CrossRef]
2. Kiritchenko, S.; Zhu, X.; Mohammad, S.M. Sentiment analysis of short informal texts. J. Artif. Intell. Res. 2014, 50, 723–762.
[CrossRef]
3. Yadollahi, A.; Shahraki, A.G.; Zaiane, O.R. Current state of text sentiment analysis from opinion to emotion mining. ACM
Comput. Surv. (CSUR) 2017, 50, 1–33. [CrossRef]
4. Bansal, N.; Sharma, A.; Singh, R.K. An Evolving Hybrid Deep Learning Framework for Legal Document Classification. Ingénierie
Des Systèmes D’information 2019, 24, 425–431. [CrossRef]
5. Khoo, C.S.; Johnkhan, S.B. Lexicon‑based sentiment analysis: Comparative evaluation of six sentiment lexicons. J. Inf. Sci. 2018,
44, 491–511. [CrossRef]
6. Sebastiani, F.; Esuli, A. Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th Inter‑
national Conference on Language Resources and Evaluation, Genoa, Italy, 22–28 May 2006.
7. Esuli, A.; Sebastiani, F. SentiWordNet: A high‑coverage lexical resource for opinion mining. Evaluation 2007, 17, 26.
8. Baccianella, S.; Esuli, A.; Sebastiani, F. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining.
Lrec 2010, 10, 2200–2204.
9. Wu, X.; Lü, H.; Zhuo, S. Sentiment analysis for Chinese text based on emotion degree lexicon and cognitive theories. J. Shanghai
Jiaotong Univ. 2015, 20, 1–6. [CrossRef]
10. Wang, S.M.; Ku, L.W. ANTUSD: A large Chinese sentiment dictionary. In Proceedings of the Tenth International Conference on
Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23 May 2016.
11. Yang, L.; Li, Y.; Wang, J.; Sherratt, R.S. Sentiment analysis for E‑commerce product reviews in Chinese based on sentiment lexicon
and deep learning. IEEE Access 2020, 8, 23522–23530. [CrossRef]
12. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans Neural
Netw Learn Syst. 2016, 28, 2222–2232. [CrossRef]
13. Xiao, Z.; Liang, P. Chinese sentiment analysis using bidirectional LSTM with word embedding. In Proceedings of the Cloud
Computing and Security: Second International Conference, Nanjing, China, 29–31 July 2016.
14. Gan, C.; Feng, Q.; Zhang, Z. Scalable multi‑channel dilated CNN–BiLSTM model with attention mechanism for Chinese textual
sentiment analysis. Future Gener. Comput. Syst. 2021, 118, 297–309. [CrossRef]
15. Miao, Y.; Ji, Y.; Peng, E. Application of CNN‑BiGRU Model in Chinese short text sentiment analysis. In Proceedings of the 2019
2nd International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 20–22 December 2019.
16. Zhang, B.; Zhou, W. Transformer‑Encoder‑GRU (TE‑GRU) for Chinese Sentiment Analysis on Chinese Comment Text. Neural
Process. Lett. 2022, 1–21. [CrossRef]
17. Liang, B.; Su, H.; Gui, L.; Cambria, E.; Xu, R. Aspect‑based sentiment analysis via affective knowledge enhanced graph convolu‑
tional networks. Knowl. Based Syst. 2022, 235, 107643. [CrossRef]
18. Cambria, E.; Liu, Q.; Decherchi, S.; Xing, F.; Kwok, K. SenticNet 7: A commonsense‑based neurosymbolic AI framework for
explainable sentiment analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille,
France, 21–23 June 2022.
19. Jain, D.K.; Boyapati, P.; Venkatesh, J.; Prakash, M. An intelligent cognitive‑inspired computing with big data analytics framework
for sentiment analysis and classification. Inf. Process. Manag. 2022, 59, 102758. [CrossRef]
Appl. Sci. 2023, 13, 4056 18 of 18
20. Sitaula, C.; Basnet, A.; Mainali, A.; Shahi, T.B. Deep learning‑based methods for sentiment analysis on Nepali COVID‑19‑related
tweets. Comput. Intell. Neurosci. 2021, 2021, 2158184. [CrossRef] [PubMed]
21. Shang, C.; Li, M.; Feng, S.; Jiang, Q.; Fan, J. Feature selection via maximizing global information gain for text classification. Knowl.
Based Syst. 2013, 54, 298–309. [CrossRef]
22. Devlin, J.; Chang, M.‑W.; Lee, K.; Toutanova, K. Bert: Pre‑training of deep bidirectional transformers for language understanding.
arXiv 2018, arXiv:1810.04805.
23. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lweis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly
optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692.
24. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self‑supervised Learning of
Language Representations. arXiv 2019, arXiv:1909.11942.
25. Clark, K.; Luong, M.T.; Le, Q.V.; Manning, C.D. Electra: Pre‑training text encoders as discriminators rather than generators.
arXiv 2020, arXiv:2003.10555.
26. Li, M.; Chen, L.; Zhao, J.; Li, Q. Sentiment analysis of Chinese stock reviews based on BERT model. Appl. Intell. 2021, 51,
5016–5024. [CrossRef]
27. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language
understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 1–18.
28. Salma, T.D.; Saptawati, G.A.P.; Rusmawati, Y. Text Classification Using XLNet with Infomap Automatic Labeling Process. In
Proceedings of the 2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA),
Bandung, Indonesia, 29–30 September 2021.
29. Yan, R.; Jiang, X.; Dang, D. Named entity recognition by using XLNet‑BiLSTM‑CRF. Neural Process. Lett. 2021, 53, 3339–3356.
[CrossRef]
30. Gong, X.R.; Jin, J.X.; Zhang, T. Sentiment analysis using autoregressive language modeling and broad learning system. In
Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 18–
21 November 2019.
31. Alduailej, A.; Alothaim, A. AraXLNet: Pre‑trained language model for sentiment analysis of Arabic. J. Big Data 2022, 9, 72.
[CrossRef]
32. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Wang, S.; Hu, G. Revisiting pre‑trained models for Chinese natural language processing. arXiv
2020, arXiv:2004.13922.
33. Kudo, T.; Richardson, J. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text
processing. arXiv 2018, arXiv:1808.06226.
34. Sennrich, R.; Haddow, B.; Birch, A. Neural machine translation of rare words with subword units. arXiv 2015, arXiv:1508.07909.
35. Che, W.; Feng, Y.; Qin, L.; Liu, T. N‑LTP: An open‑source neural language technology platform for Chinese. arXiv 2020,
arXiv:2009.11616.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual au‑
thor(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.