
EXPLOITING ADAPTIVE CONTEXTUAL MASKING FOR ASPECT-BASED SENTIMENT ANALYSIS

A PREPRINT

arXiv:2402.13722v1 [[Link]] 21 Feb 2024

S M Rafiuddin∗                    Mohammed Rakib                    Sadia Kamal
Department of Computer Science    Department of Computer Science    Department of Computer Science
Oklahoma State University         Oklahoma State University         Oklahoma State University
Stillwater, Oklahoma, USA         Stillwater, Oklahoma, USA         Stillwater, Oklahoma, USA
srafiud@[Link]                    [Link]@[Link]                     [Link]@[Link]

Arunkumar Bagavathi
Department of Computer Science
Oklahoma State University
Stillwater, Oklahoma, USA
abagava@[Link]

February 22, 2024

ABSTRACT

Aspect-Based Sentiment Analysis (ABSA) is a fine-grained linguistics problem that entails the
extraction of multifaceted aspects, opinions, and sentiments from the given text. Both standalone
and compound ABSA tasks have been extensively used in the literature to examine the nuanced
information present in online reviews and social media posts. Current ABSA methods often rely on
static hyperparameters for attention-masking mechanisms, which can struggle with context adaptation
and may overlook the unique relevance of words in varied situations. This leads to challenges in
accurately analyzing complex sentences containing multiple aspects with differing sentiments. In this
work, we present adaptive masking methods that remove irrelevant tokens based on context to assist
in Aspect Term Extraction and Aspect Sentiment Classification subtasks of ABSA. We show with our
experiments that the proposed methods outperform the baseline methods in terms of accuracy and F1
scores on four benchmark online review datasets. Further, we show that the proposed methods can be
extended with multiple adaptations and demonstrate a qualitative analysis of the proposed approach
using sample text for aspect term extraction.²

Keywords Aspect-Based Sentiment Analysis · Adaptive Threshold · Adaptive Contextual Masking

[Figure 1 panel. Review: "The sound quality of the speakers is excellent, but the battery life is disappointingly
short." Labels: Aspect Term (a1) with Sentiment Polarity (s1): Positive; Aspect Term (a2) with Sentiment Polarity
(s2): Negative.]

Figure 1: Simple ABSA for an online review. The review consists of multiple aspect terms, and each aspect of the
review carries its own sentiment polarity.


∗ Corresponding Author
² For code and dataset, please contact: srafiud@[Link]
Exploiting Adaptive Contextual Masking for ABSA    A PREPRINT

1 Introduction

Aspect-Based Sentiment Analysis (ABSA) tasks are gaining traction in various online domains such as customer reviews
and social media monitoring Zhang et al. [2022a]. Traditional sentiment analysis tasks assign a single sentiment
polarity label (positive, neutral, or negative) to the given text. However, real-world text may carry multiple
sentiment polarities attached to multifaceted aspects and opinions. A simple example of the ABSA task on an online
review is depicted in Figure 1. ABSA tasks usually involve four components: aspect category, aspect terms, opinion
terms, and sentiment polarity. Several works in the literature study ABSA tasks in both standalone Phan
and Ogunbona [2020] and compound Lin et al. [2022] fashion. Standalone approaches aim to extract only one
component, whereas compound approaches jointly extract more than one component, for example an aspect term
together with its sentiment. In this paper, we focus on two standalone ABSA tasks: aspect term extraction (ATE) and
aspect sentiment classification (ASC). ASC is a supervised task that predicts the sentiment polarity of the given text
in the context of a given topic or aspect. As shown in Figure 1, the text can contain multiple aspects, and each aspect
can have a distinct sentiment polarity. Similarly, ATE is a supervised task that extracts the start and end positions of
each aspect in the text. A text commonly contains multiple aspect terms, and each aspect term can span multiple
words, as in the example depicted in Figure 1.
The introduction of attention Vaswani et al. [2017] and large language models (LLMs) broadened the scope of sentiment
analysis problems by capturing the precise context of the text. Attention and LLMs help ABSA tasks capture the local
context of words and the global contextual features of the entire text, respectively. The self-attention strategy aims
to extract useful information from text with respect to the words present in the text itself. Such techniques are
crucial for any ABSA task to map aspect categories and aspect terms to opinion terms and sentiment. One popular
approach in ABSA is to filter out noisy terms that do not carry contextual details for other relevant terms of the
given text. Such strategies use attention weights that are optimized to mask words that are not useful or are out of
context for the given aspect. In the literature, the normalized attention weights have been used in two different
forms. (1) Weights Threshold: Threshold-based approaches set a user-defined threshold or take the maximum weight as
the threshold Feng et al. [2022]. (2) Distance Based: These approaches sort attention weights and define a window size
around the aspect terms Phan and Ogunbona [2020]. Words whose attention weights do not satisfy the threshold or
distance condition are considered noisy terms and are masked with zero-vectors for any downstream ABSA task. Recently,
dynamic approaches that extract correlations between the local context and the aspects of the text have been gaining
traction in ABSA problems Zeng et al. [2019], Phan and Ogunbona [2020], Feng et al. [2022]. However, these approaches
assume a user-assigned threshold to decide which tokens to mask. In addition to the local context, pre-trained LLMs
give better contextual features of the text to cover global representations for ABSA tasks. It is common practice to
use attention and LLM representations in tandem to perform both standalone Lin and Joe [2023] and compound ABSA
tasks. In this work, we present three key contributions to overcome the challenges of current dynamic approaches in
standalone ABSA tasks:

1. Adaptive Contextual Threshold Masking (ACTM): We introduce a novel masking strategy that adjusts
mask thresholds adaptively to determine the mask ratio of text tokens based on their context and enhance
granularity in standalone ABSA tasks.
2. Adaptive Masking for ABSA: In addition to the proposed ACTM strategy, we tailored two existing distance-
based adaptive attention masking techniques exclusively for standalone ABSA tasks.
3. Experimental Validation: We demonstrate with extensive experiments that the adaptive masking approaches
outperform baseline ABSA methods across multiple SemEval benchmark datasets for two ABSA subtasks.

2 Related Work

Recent advancements in Aspect-Based Sentiment Analysis (ABSA) include joint learning approaches by Mao et al. [2021]
using shared pre-trained models and by Xu et al. [2021] employing sequence-to-sequence models. Additionally,
Yan et al. [2021] and Zhang et al. [2021] have proposed unified generative frameworks that integrate Pre-trained
Language Models (PLMs) and treat ABSA tasks as distinct text generation challenges.
Key subtasks in ABSA, such as aspect term extraction and sentiment polarity determination, are highlighted in Chen
and Qian [2020]. These tasks increasingly rely on aspect-based syntactic information (POS tags), as
evidenced in works like Wu et al. [2021], Li et al. [2022]. Phan and Ogunbona [2020] emphasize the importance
of context in ABSA by combining syntactical features with contextualized embeddings. Span-level models for aspect
sentiment extraction are explored in Chen et al. [2022], while GCN-based models for extracting syntactic
information from dependency trees are discussed in Zhang et al. [2022b]. Additionally, Tian et al. [2021] use GNNs
over dependency trees to enhance understanding of syntactic features.
Recent ABSA research emphasizes attention-based neural networks Feng et al. [2022], Liu et al. [2015], with key
advancements such as the masked attention method (AM-WORD-BERT) of Feng et al. [2022] for term-focused performance
enhancement. The AMA-GLCF model of Lin and Joe [2023] utilizes a masked attention mechanism
with Context Dynamic Mask (CDM) and Context Dynamic Weight (CDW) to prioritize aspect-relevant text in
global and local contexts. Additionally, LCFS uses CDW and CDM for improved aspect extraction and sentiment
classification by leveraging attention-based masking Phan and Ogunbona [2020]. These models use a
static threshold for masking, whereas we propose adaptive masking based on the context. Adaptive threshold masking
adjusts to the varying relevance of input features in different contexts, which keeps the model focused on the most
relevant parts of the input and thereby improves accuracy.

3 Methodology

[Figure 2 panels. (a) Aspect Term Extraction: text tokens t1, t2, t3, ..., tn; POS tag, context, and dependency
features; contextualized embeddings e1, e2, e3, ..., en; Adaptive Contextual Masking; fully connected layer; output:
aspect term. (b) Aspect Sentiment Classification: text tokens t1, t2, ..., tn and aspects a1, a2, ..., am; text vector
and aspect vector; attention weights, attention scores, and attention words; Adaptive Contextual Masking; fully
connected layer; output: sentiment polarity.]

Figure 2: Proposed structure of (a) Aspect Term Extraction and (b) Aspect Sentiment Classification. The red box
highlights the models introduced in this paper.

3.1 Problem Formulation and Motivation

We are given a text $T = \{t_i\}_{i=1}^{n} = \{t_1, t_2, \ldots, t_n\}$ of $n$ tokens with $k$ aspects
$\{A_j\}_{j=1}^{k}$, where each aspect can span multiple tokens in $T$ and each aspect maps to its own sentiment
polarity $A_j \to P_j \in \{\text{pos.}, \text{neut.}, \text{neg.}\}$. We define the aspect term extraction task as a
supervised approach $f_{ATE} : \mathcal{M}(E(T), t_i) \to \hat{P}_{t_i} \in \{\text{Begin}, \text{In}, \text{Out}\}$
and the aspect sentiment classification task as $f_{ASC} : \mathcal{M}(E(T), E(A_j)) \to \hat{P}_j$, where
$E(T) \in \mathbb{R}^{n \times d}$ is the token representations, $E(A_j) \in \mathbb{R}^{m \times d}$ with $m \ll n$
is the aspect representations, $\hat{P}_{t_i}$ is the predicted position tag of $t_i$, and $\hat{P}_j$ is the
predicted sentiment polarity. In this work, we contribute the adaptive contextual masking
$\mathcal{M} : E(T) \to E'(T)$ for the two ABSA subtasks above, where
$E' = \{[\text{MASK}], e_2, [\text{MASK}], \ldots, e_n\}$ is a masked representation of $E$.
Our motivation is to design an adaptive masking $\mathcal{M}$ that adjusts token masks to hide irrelevant terms
according to the context of $T$, rather than using a hard threshold as in dynamic masking approaches.
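To make the ATE output space concrete, the following is a minimal sketch of how aspect spans could be turned into the Begin/In/Out tags that the ATE task predicts. The helper name `bio_tags`, the whitespace tokenization, and the chosen aspect spans for the Figure 1 review are assumptions made for illustration; they are not details given in this paper.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], aspect_spans: List[Tuple[int, int]]) -> List[str]:
    """Label each token as B (begin), I (inside), or O (outside) an aspect.

    aspect_spans holds (start, end) token indices with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end in aspect_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


# Hypothetical tokenization of the Figure 1 review (lowercased, punctuation dropped).
tokens = ("the sound quality of the speakers is excellent but "
          "the battery life is disappointingly short").split()
# Assumed aspect spans: "sound quality" and "battery life".
spans = [(1, 3), (10, 12)]
tags = bio_tags(tokens, spans)

for tok, tag in zip(tokens, tags):
    print(f"{tok:18s}{tag}")
```

Each aspect then maps to its own polarity label for the ASC task, which is handled separately from the tagging shown here.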


3.2 Standalone ABSA Tasks

We briefly outline aspect term extraction (ATE) and aspect-based sentiment classification (ASC) tasks, providing context
for our proposed masking methods. Figure 2 shows the model architectures, inspired by established approaches Feng
et al. [2022], Lin and Joe [2023], Phan and Ogunbona [2020].

3.2.1 Aspect Term Extraction (ATE)


The ATE model, depicted in Figure 2a, predicts aspect terms in text $T$. The input $T$ is formatted as [CLS] + $T$
+ [SEP], where [CLS] and [SEP] mark the sentence context and endpoint. We utilize POS tag representations
($p_i$), contextual token features ($w_i$), and dependency graph-based syntactic features ($D_i$) of the tokens
$\{t_i\}_{i=1}^{n} \in T$. We concatenate these grammatical, contextual, and syntactic features to form each token
representation $e_i = w_i \oplus p_i \oplus D_i$. We then process $E(T) = \{e_1, e_2, \ldots, e_n\}$ through the
adaptive contextual masking (Sections 3.3, 3.4, 3.5) and a fully connected layer to perform the supervised task
$f_{ATE}$.
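The concatenation of the three feature types can be sketched as below. This is an illustration only: the feature dimensions and the random arrays standing in for the contextual, POS, and dependency features are hypothetical, since the paper does not state them here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens = 6                 # toy sequence length, including [CLS] and [SEP]
d_w, d_p, d_dep = 8, 3, 4    # hypothetical feature sizes, not from the paper

W = rng.normal(size=(n_tokens, d_w))    # contextual token features w_i
P = rng.normal(size=(n_tokens, d_p))    # POS tag representations p_i
D = rng.normal(size=(n_tokens, d_dep))  # dependency-based syntactic features D_i

# e_i = w_i (+) p_i (+) D_i: concatenate along the feature axis for every token.
E = np.concatenate([W, P, D], axis=-1)
print(E.shape)
```

The resulting matrix has one row per token and plays the role of E(T) that is passed to the masking step.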

3.2.2 Aspect-Based Sentiment Classification (ASC)


The ASC model, depicted in Figure 2b, predicts the sentiment polarity of each aspect $A_j \in T$. The ASC model
analyzes the sentiment of $T$ towards the given aspect $A_j$ using the input [CLS] + $T$ + [SEP] + $A_j$ + [SEP], where
[CLS] captures the overall context. We apply a transformer model, such as BERT Devlin et al. [2019], to extract
contextual representations of $T$ and $A_j$. These features are fed into the attention mechanism, which calculates the
attention weights between the aspect vectors and the text vectors. The attention mechanism assigns higher weights to
the tokens that are more relevant to the aspect in focus based on context, allowing the model to focus on the
most important information for sentiment analysis. We mask the tokens in $E(T)$ by measuring their relevance to
$E(A_j)$ using the adaptive contextual masking (Sections 3.3, 3.4, 3.5) and perform the supervised task $f_{ASC}$.
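The ASC input layout can be illustrated with plain token lists. The function names `build_asc_input` and `segment_ids` are hypothetical, and the segment ids follow the usual BERT sentence-pair convention, which is an assumption here rather than something the paper specifies.

```python
from typing import List


def build_asc_input(text_tokens: List[str], aspect_tokens: List[str]) -> List[str]:
    """Arrange tokens as [CLS] + T + [SEP] + A_j + [SEP]."""
    return ["[CLS]", *text_tokens, "[SEP]", *aspect_tokens, "[SEP]"]


def segment_ids(text_tokens: List[str], aspect_tokens: List[str]) -> List[int]:
    """0 for the [CLS] + T + [SEP] part, 1 for the A_j + [SEP] part (assumed convention)."""
    return [0] * (len(text_tokens) + 2) + [1] * (len(aspect_tokens) + 1)


text = "the battery life is disappointingly short".split()
aspect = ["battery", "life"]

sequence = build_asc_input(text, aspect)
segments = segment_ids(text, aspect)
print(sequence)
print(segments)
```

One such sequence would be built per aspect, so a review with several aspects yields several ASC inputs.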

3.3 Adaptive Contextual Threshold Masking (ACTM)

[Figure 3 diagram. Labeled elements: embedded tokens e1, e2, e3, ..., en; aspect vector A with a1, a2, a3, ..., am
(only for ASC); self-attention with Q, K, V and Softmax((Wa*ei)/√dk); attention scores; Ra(ti, Aj) with
r1, r2, r3, ..., rn (only for ASC); Aggregate(Attn); weights α and γ (γ for ASC only); adaptive threshold τ;
mask M(ti); masked output E'.]

Figure 3: Adaptive Contextual Threshold Masking strategy to adjust the masking threshold τ using the aggregated
attention scores of token representations. α and γ are learnable parameters, where γ is used in the ASC task only
for aspect relevance.

We propose the ACTM strategy, shown in Figure 3, to dynamically adjust the attention span and update the mask
threshold so as to prioritize the sentiment-bearing tokens related to aspect terms. First, we capture token
significance in $T$ by computing attention scores
$\mathrm{Attn} = \{\mathrm{attn}_1, \mathrm{attn}_2, \ldots, \mathrm{attn}_n\}$ of the token representations
$\{e_i\}_{i=1}^{n} \in E(T)$ using the self-attention mechanism, as given in Equation 1.
 
$$\mathrm{attn}_i = \mathrm{softmax}\left(\frac{W_a e_i}{\sqrt{d_k}}\right) \tag{1}$$
where $W_a$ is the weight matrix of the attention layer and $d_k$ is the dimensionality of the key vectors in the
attention mechanism. For the ATE task, we set the adaptive threshold on each token $t_i$ as
$\tau(t_i) = \alpha \cdot \mathrm{Aggregate}(\mathrm{Attn})$. Similarly, for the ASC task, we set the adaptive
threshold as $\tau(t_i) = \alpha \cdot \mathrm{Aggregate}(\mathrm{Attn}) + \gamma \cdot R_a(t_i, A_j)$,
where $R_a(t_i, A_j)$ is the aspect relevance function that measures the relevance of token $t_i$ to the given aspect
$A_j$ using the attention scores, as given in Equation 2. Here $\alpha$ and $\gamma$ are learnable parameters that
adjust the influence of the attention scores and the aspect relevance, respectively. Aggregate is a pooling function
(for example, the mean) that summarizes the attention scores into a context-dependent reference level for selecting
the most relevant terms.

$$R_a(t_i, A_j) = \frac{\exp\left(\beta \cdot \mathrm{sim}(a_i, E(A_j))\right)}{\sum_{k=1}^{n} \exp\left(\beta \cdot \mathrm{sim}(a_k, E(A_j))\right)} \tag{2}$$

where $a_i$ is the weighted contextual token vector, $E(A_j)$ is the contextual aspect vector, sim is the cosine
similarity function, and $\beta$ is a scaling factor; the sum in the denominator runs over all $n$ tokens of $T$ for
the same aspect $A_j$. Irrelevant tokens are masked for the sentiment analysis by comparing their attention scores
with the adaptive threshold $\tau$, as given in Equation 3.

$$\mathcal{M}(t_i) = \begin{cases} \mathrm{attn}_i, & \text{if } \mathrm{attn}_i \geq \tau(t_i) \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

This mask $\mathcal{M}$ results in a filtered context $E'$ that emphasizes aspect-related sentiments. The iterative
process allows the threshold $\tau$ to adapt dynamically, providing a focused analysis of the interplay between aspects
and sentiments in varying contexts. By employing adaptive contextual masking, ABSA models can concentrate on the most
informative tokens that influence the sentiment towards each aspect, which supports precise ABSA predictions.
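The thresholding steps of ACTM can be sketched in NumPy as follows. This is not the authors' implementation. It assumes that the attention layer reduces each token to one scalar score through a vector `w_a`, that the softmax runs over the tokens of the sentence, that d_k equals the embedding size, that Aggregate is the mean, that the weighted token vector a_i is attn_i times e_i, and that α, γ, and β are fixed scalars rather than learned parameters.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - np.max(x))
    return z / z.sum()


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def actm_mask(E, w_a, alpha=1.0, gamma=0.0, beta=1.0, aspect_vec=None):
    """Return (attn, tau, masked) for token representations E of shape (n, d).

    w_a: (d,) scoring vector (assumption). aspect_vec: (d,) pooled aspect
    representation; pass it only for the ASC variant.
    """
    n, d = E.shape
    attn = softmax(E @ w_a / np.sqrt(d))           # attention scores over tokens
    tau = np.full(n, alpha * attn.mean())          # ATE threshold: alpha * mean(Attn)
    if aspect_vec is not None:                     # ASC adds the aspect relevance term
        a = attn[:, None] * E                      # weighted token vectors a_i (assumption)
        sims = np.array([cosine(a_i, aspect_vec) for a_i in a])
        relevance = softmax(beta * sims)           # normalized over the n tokens
        tau = tau + gamma * relevance
    masked = np.where(attn >= tau, attn, 0.0)      # keep the score or zero it out
    return attn, tau, masked


rng = np.random.default_rng(0)
E = rng.normal(size=(7, 16))                       # toy embeddings, 7 tokens
w_a = rng.normal(size=16)

attn, tau, masked = actm_mask(E, w_a)              # ATE variant
aspect = E[2:4].mean(axis=0)                       # pretend tokens 2-3 form the aspect
attn2, tau2, masked2 = actm_mask(E, w_a, gamma=0.5, aspect_vec=aspect)  # ASC variant

print("kept (ATE):", np.count_nonzero(masked))
print("kept (ASC):", np.count_nonzero(masked2))
```

In a trainable model the scalars would be parameters updated by gradient descent, which is what lets the threshold move with the context.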

3.4 Adaptive Attention Masking (AAM)

Sukhbaatar et al. [2019] introduced an adaptive self-attention mechanism with a soft masking
function and a masking ratio $M$ in transformer models. This mechanism dynamically adjusts attention spans to
emphasize relevant terms for any downstream task. The soft masking function in the self-attention layers,

$$m_z(x) = \min\left[\max\left[\frac{1}{R}(R + z - x),\, 0\right],\, 1\right],$$

where $R$ is a flexibility hyper-parameter and $z$ is a learnable parameter, helps in this dynamic adjustment. Here
$x$ is the distance from a given position in a sequence of tokens to the position being focused on by the attention
mechanism. The adaptive masking ratio $M$, crucial for determining the span of attention, is calculated as

$$M = \frac{\sum m_z(x)}{n}.$$

Consequently, the attention weights are given by

$$\text{Attention Weights} = \mathrm{softmax}\left(\frac{m_z(x) \cdot M}{\sqrt{d_k}}\right),$$

integrating $M$ and using the scaled softmax within the span boundaries $l$ and $r$. Given a current position $p$ in
the sequence and a learned attention span $z$, the left boundary is $l = p - z$ and the right boundary is
$r = p + z$, which together define the span of attention around position $p$. This adaptive attention approach,
by incorporating the masking ratio, allows the model to capture extended dependencies, which is vital for effective ATE
and ASC tasks.
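The soft masking function and the masking ratio can be checked numerically with a short sketch. The values of z and R below are arbitrary; in the mechanism described above, z is learned rather than fixed.

```python
import numpy as np


def soft_mask(x, z: float, R: float) -> np.ndarray:
    """m_z(x) = min(max((R + z - x) / R, 0), 1)."""
    x = np.asarray(x, dtype=float)
    return np.clip((R + z - x) / R, 0.0, 1.0)


distances = np.arange(8)               # distance x from the focused position
m = soft_mask(distances, z=3.0, R=2.0)
ratio = m.sum() / len(distances)       # masking ratio M = sum(m_z(x)) / n

print(m)
print(ratio)
```

With these values the mask is 1 up to distance 3, ramps down linearly over the next R = 2 positions, and is 0 beyond that, so the ramp is what makes the span differentiable with respect to z.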

3.5 Adaptive Masking Over Masking (AMOM)

Xiao et al. [2023] introduced Adaptive Masking Over Masking (AMOM) for conditional masked language models (CMLM); we
adapt it to enhance both Aspect Term Extraction (ATE) and Aspect Sentiment Classification (ASC). In our ABSA tasks,
AMOM generates masked sequences $Y_{mask}^{ATE}$ and $Y_{mask}^{ASC}$ using the input, the aspect (only for ASC), and
the sentiment labels. It evaluates prediction correctness for ATE and ASC against the ground truth $Y$,
calculating correctness ratios $R_{ATE}$ and $R_{ASC}$. These ratios inform the adaptive masking ratios
$\mu_{ATE} = f(R_{ATE})$ and $\mu_{ASC} = f(R_{ASC})$, with $N_{mask}^{ATE} = \mu_{ATE} \cdot |Y|$ and
$N_{mask}^{ASC} = \mu_{ASC} \cdot |Y|$ indicating the number of tokens to mask and regenerate for each task. The
regenerating processes $Y'_{ATE} = G(Y_{mask}^{ATE}, X)$ and $Y'_{ASC} = G(Y_{mask}^{ASC}, X)$ adapt to the ABSA
context, with $M(X, R_{ATE})$ and $M(X, R_{ASC})$ dynamically adjusting the decoder's masking strategy for both ATE and
ASC, which helps address nuanced expressions related to specific aspects and sentiments in text. We highlight that
neither AAM nor AMOM has previously been explored for ABSA tasks, and we combine these strategies with both ATE
and ASC tasks in this work.
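The bookkeeping behind the adaptive masking ratio can be sketched as below. The text does not specify the mapping f from correctness ratio to masking ratio, so the linear `mask_ratio` function, its bounds, the random choice of positions, and the decision to mask the ground-truth sequence are all placeholders chosen for illustration.

```python
import random
from typing import List, Tuple


def correctness_ratio(pred: List[str], gold: List[str]) -> float:
    """Fraction of positions where the prediction matches the ground truth Y."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def mask_ratio(r: float, lo: float = 0.1, hi: float = 0.5) -> float:
    """Placeholder f(R): mask more tokens when fewer predictions are correct."""
    return hi - (hi - lo) * r


def remask(gold: List[str], pred: List[str], seed: int = 0,
           mask_token: str = "[MASK]") -> Tuple[List[str], int]:
    """Build a masked sequence with N_mask = mu * |Y| masked positions."""
    r = correctness_ratio(pred, gold)
    n_mask = round(mask_ratio(r) * len(gold))
    rng = random.Random(seed)
    positions = set(rng.sample(range(len(gold)), n_mask))
    y_mask = [mask_token if i in positions else tok for i, tok in enumerate(gold)]
    return y_mask, n_mask


gold = ["O", "B", "I", "O", "O", "O", "O", "B", "I", "O"]
pred = ["O", "B", "O", "O", "O", "O", "O", "B", "O", "O"]   # 8 of 10 tags correct

y_mask, n_mask = remask(gold, pred)
print(n_mask, y_mask)
```

The masked sequence would then be passed, together with the input X, to the regeneration step G described above.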

3.6 Training Procedure for ATE and ASC

We train the ATE task by minimizing the categorical cross-entropy loss employed with the BIO (Begin, Inside, Outside)
tagging scheme as given in Equation 4.
$$L = -\sum_{i=1}^{N} \sum_{c \in \{B, I, O\}} y_{i,c} \log(\hat{y}_{i,c}) \tag{4}$$

where $N$ is the number of words, $y_{i,c}$ is the binary indicator of whether class label $c$ is the correct tag for
word $i$, and $\hat{y}_{i,c}$ is the predicted probability. We train the ASC task using the multi-class loss function
with L2 regularization as given in Equation 5.

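$$L(Y, \hat{Y}, \Theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{ic} \log(\hat{y}_{ic}) + \frac{\lambda}{2} \lVert \Theta \rVert_2^2 \tag{5}$$

where $Y$ and $\hat{Y}$ are the true and predicted class probabilities, respectively, $n$ is the number of instances,
$C$ is the class count, $y_{ic}$ and $\hat{y}_{ic}$ denote the true and predicted probabilities of class $c$ for
instance $i$, respectively, $\lambda$ is the regularization coefficient, and $\lVert \Theta \rVert_2^2$ is the squared
L2 norm of the model parameters $\Theta$.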
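Both training losses can be evaluated on a toy example as a sanity check. This is a minimal NumPy sketch, not the training code: the small `eps` inside the logarithm is added for numerical safety, and using 0.01 for λ assumes that the L2 regularization value reported in the experiment setup plays that role.

```python
import numpy as np


def bio_cross_entropy(y_true, y_prob, eps: float = 1e-12) -> float:
    """Categorical cross-entropy summed over N words and the classes B, I, O."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(-np.sum(y_true * np.log(y_prob + eps)))


def regularized_cross_entropy(y_true, y_prob, params, lam: float = 0.01,
                              eps: float = 1e-12) -> float:
    """Mean cross-entropy over instances plus (lambda / 2) * squared L2 norm."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    ce = -np.sum(y_true * np.log(y_prob + eps)) / y_true.shape[0]
    l2 = sum(float(np.sum(np.asarray(p) ** 2)) for p in params)
    return float(ce + 0.5 * lam * l2)


# Two words (or instances), three classes, one-hot targets.
y_true = [[1, 0, 0], [0, 0, 1]]
y_prob = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
params = [np.array([1.0, 2.0])]        # stand-in for the model parameters

ate_loss = bio_cross_entropy(y_true, y_prob)
asc_loss = regularized_cross_entropy(y_true, y_prob, params)
print(ate_loss, asc_loss)
```

Here the ATE loss equals 2 ln 2 because each correct class has probability 0.5, and the ASC loss equals ln 2 plus the penalty 0.5 × 0.01 × 5.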

4 Experiments and Results


Datasets: We experiment with benchmark ABSA datasets from SemEval 2014³, 2015⁴, and 2016⁵. Table 1 details the
counts of reviews and the total number of positive, neutral, and negative aspects present in the overall training and
test datasets used in our experiments. These training and test splits are pre-defined in SemEval, and the same setup
is used in all our baseline models.
Experiment Setup: We use the BERT ("bert-base-cased") model Devlin et al. [2019] as the contextual feature extractor
for all of the proposed approaches. The layer normalization epsilon is set to 1 × 10⁻¹², and a dropout of 0.1 is
applied to the attention probabilities and hidden layers. For all experiments, we train for 50 epochs with a batch
size of 32, a learning rate of 2 × 10⁻⁵, and L2 regularization of 0.01. Experiments were conducted on a lab server
with 3 × NVIDIA A10 GPUs. We use Mean for the Aggregate operation in the ACTM model.
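For reproduction purposes, the reported settings can be collected in one place. The `ExperimentConfig` class and its field names are hypothetical; only the values come from the setup described above, and reading the layer normalization value as an epsilon is an interpretation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as reported in the experiment setup (field names are illustrative)."""
    encoder_name: str = "bert-base-cased"
    layer_norm_eps: float = 1e-12
    attention_dropout: float = 0.1
    hidden_dropout: float = 0.1
    epochs: int = 50
    batch_size: int = 32
    learning_rate: float = 2e-5
    l2_regularization: float = 0.01
    aggregate: str = "mean"            # Aggregate operation used by ACTM


config = ExperimentConfig()
print(asdict(config))
```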

Table 1: Summary of laptop and restaurant review datasets given by SemEval.

                                      #Aspects
Dataset        Split   #Reviews   Positive   Negative   Neutral
Laptop14       Train   1124       994        870        464
               Test    332        341        128        464
Restaurant14   Train   1574       2164       807        637
               Test    493        728        196        196
Restaurant15   Train   721        1777       334        81
               Test    318        703        192        46
Restaurant16   Train   1052       2451       532        125
               Test    319        685        93         63

Baseline Methods: (1) ATE task: BiLSTM Liu et al. [2015] is an RNN used for NLP tasks that processes sequences
bidirectionally. DTBCSNN Ye et al. [2017] leverages sentence dependency trees and CNNs for aspect extraction. BERT-AE
Devlin et al. [2019] uses BERT's pre-trained embeddings for aspect extraction. IMN He et al. [2019]
combines memory networks with aspect-context interactive attention. CSAE Phan and Ogunbona [2020] combines
contextual, dependency-tree, and self-attention components. (2) ASC task: LCF-ASC-CDM and LCF-ASC-CDW
Zeng et al. [2019], as well as LCFS-ASC-CDW and LCFS-ASC-CDM Phan and Ogunbona [2020], use Local Context Focus (LCF)
with Context Dynamic Mask (CDM) and Context Dynamic Weight (CDW) layers. Attention mask variations such as
AM Weight-BERT and AM Word-BERT are applied in ABSA to target relevant parts Feng et al. [2022]. The Unified
Generative model Yan et al. [2021] utilizes BART for multiple ABSA tasks. MGGCN-BERT Xiao et al. [2022] leverages
BERT embeddings for ASC, while AMA-GLCF Lin and Joe [2023] combines global and local text contexts using masked
attention.
Discussion on ATE and ASC results: We present the performance of the proposed adaptive contextual masking
strategies for the ATE task in Table 2. Both the AMOM and ACTM versions of the adaptive contextual masking
strategies outperform the baseline methods on three datasets (Laptop14, Restaurant15, and Restaurant16), while the
CSAE baseline remains the best on Restaurant14. The AAM version shows competitive but not leading performance,
indicating its partial effectiveness in contextual understanding for ATE tasks. The proposed ACTM strategy leads on
Laptop14 and is competitive on Restaurant15 and Restaurant16, which we attribute to its capability for nuanced
contextual interpretation by setting adaptive thresholds for the masking function. Similarly, we give the detailed
performance of the proposed adaptive strategies for the ASC task in Table 3. Unlike the ATE task, we note a
significant performance gain in both accuracy (%) and F1 with the adaptive contextual masking strategies on all
datasets. Most importantly, the proposed ACTM strategy outperforms the other two adaptive strategies in accuracy on
three datasets and gives a competing performance on the Restaurant15 dataset. Overall, our results indicate that
adaptively masking based on local text tokens on top of the global contextual representations can achieve better
results in the standalone ABSA tasks considered in this work. While the ACTM strategy leads the performance in ASC and
is competitive in ATE, the AMOM strategy is promising for ATE tasks.

³ [Link]
⁴ [Link]
⁵ [Link]

Table 2: Adaptive contextual masking strategies against baseline models on ATE. "-" signifies no results available, and
"*" denotes reproduced results.

Model                               Laptop14   Restaurant14   Restaurant15   Restaurant16
                                    F1         F1             F1             F1
BiLSTM Liu et al. [2015]            73.72      81.42          -              -
DTBCSNN Ye et al. [2017]            75.66      83.97          -              -
BERT-AE Devlin et al. [2019]        73.92      82.56          -              -
IMN He et al. [2019]                77.96      83.33          70.04          78.07*
CSAE Phan and Ogunbona [2020]       77.65      86.65          76.84*         80.63*
AAM Sukhbaatar et al. [2019]        79.27      83.49          76.45          79.34
AMOM Xiao et al. [2023]             78.13      82.98          77.49          82.09
ACTM-ATE                            80.34      82.91          77.09          81.04

Table 3: Performance of adaptive masking strategies against baseline models on ASC task. "**" indicates results
reported in the paper as mean values.

                                            Laptop14        Restaurant14    Restaurant15      Restaurant16
Model                                       Acc.    F1      Acc.    F1      Acc.     F1       Acc.     F1
LCF-BERT-CDW Zeng et al. [2019]             82.45   79.59   87.14   81.74   -        -        -        -
LCF-BERT-CDM Zeng et al. [2019]             82.29   79.28   86.52   80.40   -        -        -        -
LCFS-ASC-CDW Phan and Ogunbona [2020]       80.52   77.13   86.71   80.31   89.03*   73.31*   92.25*   76.46*
LCFS-ASC-CDM Phan and Ogunbona [2020]       80.34   76.45   86.13   80.10   88.61*   69.32*   91.84*   70.67*
AM Weight-BERT** Feng et al. [2022]         79.78   76.20   85.66   79.92   -        -        -        -
AM Word-BERT** Feng et al. [2022]           79.87   76.26   85.57   79.02   -        -        -        -
Unified Generative Yan et al. [2021]        -       76.76   -       75.56   -        73.91    -        -
MGGCN-BERT Xiao et al. [2022]               79.57   76.30   83.21   75.38   82.90    69.27    89.66    73.99
AMA-GLCF** Lin and Joe [2023]               -       76.78   -       79.33   -        -        -        77.08
AAM Sukhbaatar et al. [2019]                82.51   79.61   84.92   81.71   89.34    75.13    90.86    77.18
AMOM Xiao et al. [2023]                     81.61   79.09   85.95   80.10   91.50    74.14    92.17    76.73
ACTM-ASC                                    83.65   76.29   91.05   82.01   90.54    74.07    93.49    78.19

Table 4: F1 of ATE and ASC with multiple Aggregate operations. "Gradient-based" indicates that α and γ are learnable
parameters; otherwise α = γ = 1.

                          Laptop14        Restaurant14    Restaurant15    Restaurant16
Aggregate Function        ATE     ASC     ATE     ASC     ATE     ASC     ATE     ASC
Mean                      75.74   71.53   78.43   78.90   76.19   74.19   80.10   74.39
Median                    78.18   72.67   79.94   79.59   76.23   74.23   79.01   71.14
SD                        80.29   74.37   78.15   80.10   76.96   77.12   81.92   78.26
Gradient-based Mean       80.34   76.29   82.91   82.01   77.09   74.07   81.04   78.19
Gradient-based Median     76.05   73.72   80.17   83.06   75.73   77.86   78.15   72.24
Gradient-based SD         77.13   73.21   75.13   78.15   75.86   75.38   76.91   72.78

Discussion on Ablation Study: Since ACTM performs strongly in both ATE and ASC tasks, we examine several variants
of the proposed ACTM strategy as an ablation study. In this study, we explore different Aggregate
operators, namely Mean, Median, and Standard Deviation (SD), along with variations that use only a constant weight
α = γ = 1. We present the results of our ablation study in Table 4. Although our default setting of the Mean
aggregator with learnable α and γ performs well in most cases, we find some exceptions. A simple SD aggregator with
constant weight matches or slightly exceeds our default gradient-based Mean aggregator on the latest
Restaurant16 dataset in both ATE and ASC tasks. Similarly, the gradient-based Median aggregator gives
better ASC performance on two datasets (Restaurant14 and Restaurant15).
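How the choice of Aggregate operator moves the threshold can be seen on a small example. The attention scores below are hypothetical, α is fixed to 1, and the population standard deviation (NumPy's default) is an assumption, since the paper does not say which variant of SD is used.

```python
import numpy as np

AGGREGATORS = {"mean": np.mean, "median": np.median, "sd": np.std}


def threshold(attn, aggregate: str = "mean", alpha: float = 1.0) -> float:
    """tau = alpha * Aggregate(Attn), as in the ATE variant of ACTM."""
    return alpha * float(AGGREGATORS[aggregate](np.asarray(attn, dtype=float)))


def kept_indices(attn, aggregate: str = "mean", alpha: float = 1.0):
    """Indices of tokens whose attention score reaches the threshold."""
    tau = threshold(attn, aggregate, alpha)
    return [i for i, a in enumerate(attn) if a >= tau]


# Hypothetical attention scores for a five-token sentence.
attn = [0.10, 0.40, 0.05, 0.30, 0.15]

for name in AGGREGATORS:
    print(f"{name:7s} tau={threshold(attn, name):.4f} kept={kept_indices(attn, name)}")
```

On this toy input the mean gives the strictest threshold, while the median and SD thresholds are lower and let one more token through.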


Discussion on Qualitative Analysis: We qualitatively analyze the performance of the ACTM strategy on the ATE task to
evaluate the efficacy of the proposed method in identifying aspects from the text. Table 5a shows an
example of token masking by the proposed ACTM strategy, and Table 5b compares ATE using the adaptive threshold against
a fixed threshold. These tables illustrate that the proposed ACTM strategy can help the ATE task capture
nuanced aspect terms using the adaptive contextual masking technique.

Table 5: Qualitative data for Aspect Term Extraction

(a) Tokens with Attention Scores and Masking Status

Token        Attn. Score   Masked
the          0.0460        Yes
steak        0.1082        No
was          0.0561        Yes
incredibly   0.0867        No
tender       0.0775        No
and          0.0323        Yes
flavor       0.0265        Yes
ful          0.0319        Yes
,            0.0275        Yes
but          0.0977        No
service      0.0794        No
quite        0.0413        Yes
slow         0.0648        No
.            0.0493        Yes
Total:       0.8250
Mean:        0.0590

(b) Comparison of ATE using Predefined vs. Dynamic Threshold for sample texts.

Review instance: After numerous attempts of trying (including setting the clock in BIOS setup directly), I gave up
(I am a techie).
    ATE w/ fixed threshold: clock
    ATE w/ ACTM: clock in BIOS setup

Review instance: After really enjoying ourselves at the bar we sat down at a table and had dinner.
    ATE w/ fixed threshold: bar, dinner
    ATE w/ ACTM: bar, table, dinner

Review instance: Did not enjoy the new Windows 8 and touchscreen functions.
    ATE w/ fixed threshold: windows, touchscreen
    ATE w/ ACTM: windows 8, touchscreen functions
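The masking status in Table 5a can be recomputed from the listed attention scores. The sketch assumes the Mean aggregator with a constant weight α = 1; the learned value of α is not reported, so this is a consistency check on the table rather than a reproduction of the trained model.

```python
import numpy as np

tokens = ["the", "steak", "was", "incredibly", "tender", "and", "flavor", "ful",
          ",", "but", "service", "quite", "slow", "."]
attn = np.array([0.0460, 0.1082, 0.0561, 0.0867, 0.0775, 0.0323, 0.0265, 0.0319,
                 0.0275, 0.0977, 0.0794, 0.0413, 0.0648, 0.0493])

alpha = 1.0                    # assumed constant weight; the paper learns alpha
tau = alpha * attn.mean()      # Aggregate = Mean
masked = attn < tau            # scores below the threshold are zeroed out

for tok, score, is_masked in zip(tokens, attn, masked):
    print(f"{tok:12s}{score:.4f}  {'Yes' if is_masked else 'No'}")

kept = [tok for tok, is_masked in zip(tokens, masked) if not is_masked]
print("kept tokens:", kept)
```

Under these assumptions the kept tokens are exactly the rows marked "No" in Table 5a: steak, incredibly, tender, but, service, and slow.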

5 Conclusion

In this work, we explored Aspect-Based Sentiment Analysis (ABSA) with a focus on the standalone tasks of Aspect
Term Extraction (ATE) and Aspect Sentiment Classification (ASC) using three different adaptive masking strategies.
We introduced one of those strategies, Adaptive Contextual Threshold Masking (ACTM), and adapted two other
adaptive masking techniques for ABSA. Our experiments on benchmark datasets showed that adaptive masking
can improve the precision of ATE and ASC. In particular, our ACTM strategy demonstrated significant
effectiveness over other approaches with its adaptive contextual threshold module. For future research, we recommend
investigating adaptive masking for both standalone and compound ABSA tasks, which would benefit many applications.
Another open avenue for improvement is to explore adaptive masking for multi-modal ABSA tasks.

References
Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing, and Wai Lam. A survey on aspect-based sentiment analysis: Tasks,
methods, and challenges. IEEE TKDE, pages 11019–11038, 2022a.
Minh Hieu Phan and Philip O. Ogunbona. Modelling context and syntactical features for aspect-based sentiment
analysis. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors, ACL, pages 3211–3220, July
2020.
Ting Lin, Aixin Sun, and Yequan Wang. Aspect-based sentiment analysis through edu-level attentions. In PAKDD,
pages 156–168. Springer, 2022.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia
Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
Ao Feng, Xuelei Zhang, and Xinyu Song. Unrestricted attention may not be all you need–masked attention mechanism
focuses better on relevant parts in aspect-based sentiment analysis. IEEE Access, 10:8518–8528, 2022.
Biqing Zeng, Heng Yang, Ruyang Xu, Wu Zhou, and Xuli Han. Lcf: A local context focus mechanism for aspect-based
sentiment classification. Applied Sciences, 9(16):3389, 2019.


Te Lin and Inwhee Joe. An adaptive masked attention mechanism to act on the local text in a global context for
aspect-based sentiment analysis. IEEE Access, pages 43055–43066, 2023.
Yue Mao, Yi Shen, Chao Yu, and Longjun Cai. A joint training dual-mrc framework for aspect based sentiment analysis.
In AAAI, volume 35, pages 13543–13551, 2021.
Lu Xu, Yew Ken Chia, and Lidong Bing. Learning span-level interactions for aspect sentiment triplet extraction. In
Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors, ACL, pages 4755–4766, August 2021.
Hang Yan, Junqi Dai, Tuo Ji, Xipeng Qiu, and Zheng Zhang. A unified generative framework for aspect-based sentiment
analysis. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors, ACL, pages 2416–2429, August 2021.
Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing, and Wai Lam. Towards generative aspect-based sentiment analysis.
In ACL, pages 504–510, 2021.
Zhuang Chen and Tieyun Qian. Enhancing aspect term extraction with soft prototypes. In EMNLP, pages 2107–2117.
ACL, 2020.
Shengqiong Wu, Hao Fei, Yafeng Ren, Donghong Ji, and Jingye Li. Learn from syntax: Improving pair-wise aspect
and opinion terms extraction with rich syntactic knowledge. In Zhi-Hua Zhou, editor, Proceedings of IJCAI, pages
3957–3963, August 2021.
Jia Li, Yuyuan Zhao, Zhi Jin, Ge Li, Tao Shen, Zhengwei Tao, and Chongyang Tao. SK2: Integrating implicit sentiment
knowledge and explicit syntax knowledge for aspect-based sentiment analysis. In CIKM ’22, pages 1114–1123. ACM,
2022. ISBN 9781450392365.
Yuqi Chen, Keming Chen, Xian Sun, and Zequn Zhang. A span-level bidirectional network for aspect sentiment triplet
extraction. In EMNLP, pages 4300–4309. ACL, 2022.
Zheng Zhang, Zili Zhou, and Yanna Wang. Syntactic and semantic enhanced graph convolutional network for
aspect-based sentiment analysis. In ACL, pages 4916–4925, 2022b.
Yuanhe Tian, Guimin Chen, and Yan Song. Aspect-based sentiment analysis with type-aware graph convolutional
networks and layer ensemble. In NAACL, pages 2910–2922, 2021.
Pengfei Liu, Shafiq Joty, and Helen Meng. Fine-grained opinion mining with recurrent neural networks and word
embeddings. In EMNLP, pages 1433–1443, 2015.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers
for language understanding. In NAACL, pages 4171–4186, 2019.
Sainbayar Sukhbaatar, Édouard Grave, Piotr Bojanowski, and Armand Joulin. Adaptive attention span in transformers.
In ACL, pages 331–335, 2019.
Yisheng Xiao, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Tie-Yan Liu, and Min Zhang. AMOM: Adaptive masking
over masking for conditional masked language model. In AAAI, volume 37, pages 13789–13797, 2023.
Hai Ye, Zichao Yan, Zhunchen Luo, and Wenhan Chao. Dependency-tree based convolutional neural networks for
aspect term extraction. In Advances in Knowledge Discovery and Data Mining: 21st Pacific-Asia Conference,
PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings, Part II, pages 350–362. Springer, 2017.
Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. An interactive multi-task learning network for
end-to-end aspect-based sentiment analysis. In Proceedings of the 57th Annual Meeting of the Association for
Computational Linguistics, pages 504–515, 2019.
Luwei Xiao, Xiaohui Hu, Yinong Chen, Yun Xue, Bingliang Chen, Donghong Gu, and Bixia Tang. Multi-head
self-attention based gated graph convolutional networks for aspect-based sentiment classification. Multimedia Tools
and Applications, pages 1–20, 2022.
