

Detecting Contradiction and Entailment in Multilingual Text

1st Abhigya Verma, Indira Gandhi Delhi Technical University for Women, Delhi, India
2nd Jahnvi Srivastav, Indira Gandhi Delhi Technical University for Women, Delhi, India
3rd Pooja Gera, Indira Gandhi Delhi Technical University for Women, Delhi, India
4th A. K. Mohapatra, Indira Gandhi Delhi Technical University for Women, Delhi, India

14th ICCCNT IEEE Conference, July 6-8, 2023, IIT Delhi, Delhi (IEEE - 56998)

Abstract—With many applications, including text categorization, question-and-answer systems, and sentiment mapping, entailment and contradiction detection in multilingual content is a crucial task in the discipline of natural language processing. This study examines the capabilities of two advanced models, the BERT multilingual base (cased) model and the XLM-RoBERTa large model, in detecting entailment and contradiction in a multilingual dataset containing 15 diverse languages. While the initial performance of the BERT and XLM-RoBERTa models was 30% and 40% respectively, we successfully enhanced the XLM-RoBERTa model's accuracy to 70% using data processing techniques. These results imply that, by carefully selecting models and employing the relevant data processing techniques, it is possible to detect contradiction and entailment in multilingual text with high accuracy. The study's outcomes are relevant to various areas of natural language processing, including the analysis of multilingual text. These results lay the groundwork for future studies in language processing and can guide the development of more sophisticated models.

Index Terms—NLP, BERT, Multilingual Text, Classification

I. INTRODUCTION

As the amount of multilingual content on the internet continues to grow, accurately identifying entailment and contradiction in this type of text has emerged as a critical challenge. This rapid growth creates a pressing need for effective detection methods, which are essential in NLP applications to ensure proper understanding and interpretation of the text. Text classification, machine translation, question-answering systems, and sentiment analysis are some practical applications. This has recently been accomplished using pre-trained language models such as BERT [1], RoBERTa [2], and XLNet [3], which have been demonstrated to be effective in cross-lingual settings.

This study employed the BERT multilingual base (cased) model and the XLM-RoBERTa large model to meet our objective of identifying entailment and contradiction in multilingual text. The data was preprocessed by translating non-English texts into English to improve the accuracy of XLM-RoBERTa, which achieved up to 70% accuracy. The study also utilized diverse data analysis techniques to investigate the dataset's class distribution, language diversity, and sequence lengths, which helped us to refine our models accordingly.

The comparative examination of contradiction and entailment in multilingual text has not been extensively explored in prior research. This paper aims to fill this research gap by working on a novel dataset of multilingual data and facilitating an in-depth analysis across multiple languages. Our study distinguishes itself through the incorporation of a diverse range of models, namely BERT and RoBERTa, to attain a higher level of precision and depth in the detection of entailment and contradiction. In addition to greatly increasing the robustness of our findings, the incorporation of different encoding methods and data processing techniques increased the overall effectiveness of our investigation. The conclusions reached in this study have significant implications for the study of natural language in multilingual texts. By analyzing subtle differences in contradiction and entailment across numerous languages, our research provides valuable information and practical applications.

II. LITERATURE REVIEW

The detection of contradiction and entailment in multilingual text is a critical area of research in natural language processing. With the creation of potent language models like the BERT multilingual base model (cased) and XLM-RoBERTa, this field has progressed significantly. Through the use of techniques such as next-sentence prediction and masked language modeling, the BERT model has undergone pre-training on a vast range of 104 languages. It was first presented in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" [1]. On the other hand, XLM-RoBERTa is a scaled cross-lingual sentence encoder trained on a vast dataset comprising over 100 languages, carefully selected from Common Crawl. It was first presented in "Unsupervised cross-lingual representation learning at scale" [4]. These two models have shown significant potential in detecting entailment and contradiction in multilingual text and have been widely used in various natural language processing tasks.


TABLE I
COMPARATIVE ANALYSIS BETWEEN PRIOR RESEARCH WORKS

Paper | Dataset Used | Task Performed | Model Used
[5] | Stanford Natural Language Inference (SNLI) dataset (110,000 sentence pairs) [6] | Identifying contradictions and entailment in a single language | Encoder-decoder sequence-to-sequence recurrent neural network
[7] | Machine-translated version of the Stanford Natural Language Inference (SNLI) corpus | Detecting contradiction and entailment in multilingual text | Recurrent neural network
[8] | A Hinglish SentiWordNet created by combining the English and Hindi SentiWordNets | Detecting sarcasm and analysing sentiment in "Hinglish" language | Extended SentiWordNet 3.0 and naïve Bayes classifier
[9] | Social media data | Sentiment analysis of social media text using a system to detect text and emoticon sarcasm | Artificial neural networks
[10] | Collection of 994 text samples with sentiment and sarcasm labels along with eye-movement data from seven readers | Classification of sentiments and identifying sarcasm | DNN
[11] | Social media data and news headlines dataset | Sarcasm detection | DNN
[12] | Retrieved data from Amazon dataset | Sarcasm detection and sentiment analysis | K Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest
[13] | Twitter dataset and Internet Argument Corpus v2 | Sarcasm detection | BERT and deep learning model

Majumder et al. [10] argue that sentiment classification and sarcasm identification, though considered separate tasks, are correlated, and propose a multitask learning-based framework that models this correlation using a deep neural network. The method proposed in this study achieved a 3-4% improvement over existing methods on a benchmark dataset.

Ajnadkar et al. [11] investigated the use of deep neural networks in detecting sarcasm in social media and news headlines and highlight the importance of such detection in improving sentiment analysis. With word embeddings, bidirectional LSTM, and convolutional networks, deep neural networks achieve 88% accuracy in detecting sarcasm.

Rao et al. [12] discuss sentiment analysis of text on social media, including the detection of sarcasm. The process involves dataset selection, preprocessing, feature extraction, and classification using algorithms like SVM and Random Forest, with evaluation based on accuracy.

Eke et al. [13] presented an approach to detect sarcasm using a context-based feature technique combined with the deep learning model BERT and traditional machine learning. The technique addressed the limitations of existing models. It was tested on two Twitter datasets, achieving high precision rates of 98.5% and 98.0%, respectively, and 81.2% on the IAC-v2 dataset, demonstrating its superiority over existing approaches for sarcasm analysis. Mandal et al. [14] presented a novel approach utilizing a deep neural network architecture combining convolutional neural networks and Long Short-Term Memory layers to detect sarcasm in news headlines with 86.16% accuracy. Sengar et al. [15] proposed an innovative method for identifying sarcasm in plain text by using feature engineering based on contrasting words within sarcastic sentences. The proposed method applies a ReLU activation function neural network model to improve the F1-score, while also capturing contextual data, in contrast to traditional machine learning techniques.

Joshi [16] introduces L3Cube-MahaCorpus, a Marathi monolingual corpus with 24.8M sentences and 289M tokens, along with BERT-based models (MahaBERT, MahaAlBERT, and MahaRoBERTa), a generative pretrained transformer model (MahaGPT), and Marathi fastText embeddings (MahaFT) trained on the full corpus.

Jallad and Ghneim [17] address the challenging task of detecting contradictions in Arabic text by creating a dataset called ArNLI and proposing a novel approach that combines contradiction and language model vectors as input to a machine learning model. Promising results are achieved, with accuracies of 60%, 99%, and 75% on the SICK, PHEME, and ArNLI datasets, respectively, using a Random Forest classifier.

III. METHODOLOGY

The methodology section of our study outlines the approach we used to investigate the potential of advanced language models in detecting contradiction and entailment in multilingual text. The growing requirement for efficient natural language processing (NLP) models that manage the complexity of multilingual data served as the driving force behind our investigation. To achieve our research objectives, we leveraged the "Contradictory, My Dear Watson"¹ dataset, which consists of pairs of premises and hypotheses in 15 different languages. The dataset was analyzed using various data processing techniques, and the accuracy of different language models, including the BERT multilingual base (cased) model and the XLM-RoBERTa large model, was evaluated. The data preprocessing and model training methodology, along with the experimental setup and evaluation criteria used to assess model performance, are described in this section.

A. Description of Dataset

The dataset "Contradictory, My Dear Watson" has been leveraged as a crucial resource in our study.

¹https://www.kaggle.com/competitions/contradictory-my-dear-watson


Fig. 1. Flow diagram representing our Methodology (data from Kaggle; data pre-processing with translation and concatenation of premise and hypothesis; data analysis; tokenization; BERT bert-base-multilingual-cased and RoBERTa jplu/tf-xlm-roberta-large models; result analysis)

The dataset contains pairs of premises and their hypotheses in diverse languages, making it suitable for a broader audience. The relationship between the premise and the hypothesis in each pair of the dataset is determined using the details provided in the premise.

The dataset comprises a testing set without labels as well as a labeled training set. The training set includes premise-hypothesis pairs, their ID, label, and language, while the testing set includes the same, except for labels. The dataset exploration allowed for the identification of critical variables and connections between them.

The analysis of the dataset properties revealed several key findings. The dataset contains 4176 rows with label 0, 4064 rows with label 2, and 3880 rows with label 1, with no duplicates identified. Additionally, the study disclosed the presence of 8,209 unique premises, 12,119 unique hypotheses, and 15 unique languages in the dataset. The identification of these properties and their relationships facilitates the development of robust models and data processing techniques that account for the dataset's intricacies.
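To make the dataset exploration described above concrete, the following sketch shows how the training split could be inspected with pandas. The file name and column names follow the Kaggle competition layout summarized in Table II; they are assumptions for illustration, not code taken from the paper.

    import pandas as pd

    # Load the "Contradictory, My Dear Watson" training split (file name assumed).
    train = pd.read_csv("train.csv")

    # Class balance across the three relation labels (0, 1, 2).
    print(train["label"].value_counts())

    # Language coverage, duplicates, and unique premises/hypotheses,
    # mirroring the dataset properties reported above.
    print(train["language"].value_counts())
    print("duplicated rows:", train.duplicated().sum())
    print("unique premises:", train["premise"].nunique())
    print("unique hypotheses:", train["hypothesis"].nunique())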
B. Data Analysis and Pre-processing

The class distribution analysis (Fig. 2) revealed an equitable distribution of the dataset among the three classes, with class 0 exhibiting the largest share, trailed by class 2 and class 1.

Fig. 2. Distribution of the Different Categorical Attribute: Class

In addition, the language distribution analysis (Fig. 3) divulged that the majority of the examples in the dataset were in English, with the other languages demonstrating comparable representation. Notably, as the definition of a word varies across languages, we opted to count the number of characters and tokens instead.

Fig. 3. Distribution of the Different Categorical Attribute: Language

We performed a sequence length analysis to quantify the number of characters in the 'premise' column, as presented in Fig. 4. The results showed that the premises' length was relatively short, with the minimum and maximum being 4 and 967 characters, respectively. The median length was 96. Furthermore, the sequence lengths were similar across all three classes, as demonstrated by the generated boxplot.

Fig. 4. A boxplot is generated to observe the variation of sequence lengths by class, which indicates that the sequence lengths are quite similar across the classes.

Our data analysis uncovered a significant insight: the majority of premises were in English. To improve our accuracy, we employed a data preprocessing technique that involved creating a new data frame comprising the non-English examples. The data was translated into English using the Google Translate library, and the robust RoBERTa model was then trained. This approach not only led to an impressive increase in accuracy but also highlights the crucial role of efficient data preprocessing in improving our present models.
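The translation step described above can be sketched as follows. The paper only states that the Google Translate library was used; the googletrans package and its Translator API are assumed here for illustration, and the column names follow Table II.

    import pandas as pd
    from googletrans import Translator  # assumed client for the "Google Translate library"

    train = pd.read_csv("train.csv")
    translator = Translator()

    # Build a separate data frame with the non-English examples and translate
    # both the premise and the hypothesis into English.
    non_english = train[train["lang_abv"] != "en"].copy()
    for column in ("premise", "hypothesis"):
        non_english[column] = non_english[column].apply(
            lambda text: translator.translate(text, dest="en").text)

    # Recombine with the English rows; this translated frame is what the
    # RoBERTa model is subsequently trained on.
    train_en = pd.concat([train[train["lang_abv"] == "en"], non_english])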


TABLE II
DATASET ATTRIBUTES DESCRIPTION

Attribute | Description | Data Type
id | Unique identification character sequence | String
premise | A sentence serving as the input text for the natural language inference task. | String
hypothesis | Represents a statement that is either entailed by, contradicts, or is neutral to the premise. It is the statement that we are trying to predict whether it can be inferred from the given premise or not, and what kind of relationship it has with the premise. | String
lang_abv | Abbreviation representing the language of the premise and hypothesis text. | String
language | The full name of the language used in the premise and hypothesis text. | String
label | Represents the logical relationship between pairs of sentences; assigned values of 0, 1, or 2 to indicate whether the relationship is Entailment, Contradiction, or Neutral. | Integer

C. Tokenization

Tokenization, the technique of splitting words into smaller parts known as tokens, is a crucial stage in natural language processing. Various pre-trained tokenizer models are available, including BERT and XLM-RoBERTa, which can be used to tokenize multilingual text. Following data analysis and preprocessing, we have utilized the bert-base-multilingual-cased and jplu/tf-xlm-roberta-large tokenizers in the BERT multilingual base and XLM-RoBERTa models, respectively.
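As an illustration of this step, the sketch below encodes a premise-hypothesis pair with the two tokenizers named above via the Hugging Face transformers library; passing the two sentences together reflects the concatenation step shown in Fig. 1. The example sentence pair and the maximum length of 120 (matching the XLM-RoBERTa input shape described in the next subsection) are assumptions.

    from transformers import AutoTokenizer

    # Tokenizers corresponding to the two models used in this study.
    bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    xlmr_tokenizer = AutoTokenizer.from_pretrained("jplu/tf-xlm-roberta-large")

    premise = "The weather was cold and rainy for the whole week."
    hypothesis = "It rained during the week."

    # Premise and hypothesis are encoded as a single sequence pair,
    # padded and truncated to a fixed length of 120 tokens.
    bert_inputs = bert_tokenizer(premise, hypothesis, padding="max_length",
                                 truncation=True, max_length=120, return_tensors="tf")
    xlmr_inputs = xlmr_tokenizer(premise, hypothesis, padding="max_length",
                                 truncation=True, max_length=120, return_tensors="tf")

    # BERT produces input_ids, attention_mask and token_type_ids;
    # XLM-RoBERTa does not use token type ids.
    print(bert_inputs.keys())
    print(xlmr_inputs.keys())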
D. Model Construction

1) BERT multilingual base model (cased): The BERT model construction includes three input layers: input_word_ids, input_mask, and input_type_ids. The TFBertModel layer comprises the BERT model architecture, with a total of 177,853,440 parameters. This layer outputs a TFBaseModelOutputWithPoolingAndCrossAttentions object, including the last hidden state with shape (None, None, 768), the pooler output with shape (None, 768), and other attributes such as hidden states, attentions, past key values, and cross-attentions. The tf.__operators__.getitem layer slices the last hidden state to output a tensor with shape (None, 768). Finally, a dense layer is used for classification with 3 output classes. The total trainable parameters of the BERT model are 177,855,747.

2) XLM-RoBERTa (large sized model): The construction of the XLM-RoBERTa model includes one input layer with shape (None, 120). The tfxlm_roberta_model_1 layer comprises the XLM-RoBERTa model architecture, with a total of 559,890,432 parameters. This layer outputs a TFBaseModelOutputWithPoolingAndCrossAttentions object, including the last hidden state with shape (None, 120, 1024), the pooler output with shape (None, 1024), and other attributes such as hidden states, attentions, past key values, and cross-attentions. The tf.__operators__.getitem_1 layer slices the last hidden state to output a tensor with shape (None, 1024). The dropout_148 layer applies dropout regularization to the output of the slice layer. The subsequent dense layers dense_5, dense_6, dense_7, dense_8, and dense_9 have output sizes of 64, 32, 16, 8, and 3, respectively, and are used for classification. The total trainable parameters of the XLM-RoBERTa model are 559,958,803.

Fig. 5. XLM-RoBERTa model architecture
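A minimal Keras sketch of the XLM-RoBERTa classification head described above is given below, assuming TensorFlow and the transformers library. The input length of 120, the slice of the first token, and the dense layer sizes 64, 32, 16, 8, and 3 follow the description in this subsection; the dropout rate, activation functions, and optimizer settings are assumptions. The BERT variant is analogous, with TFBertModel as the backbone, a (None, 768) slice, and a single 3-way dense output layer.

    import tensorflow as tf
    from transformers import TFXLMRobertaModel

    MAX_LEN = 120  # matches the (None, 120) input shape described above

    def build_xlmr_classifier() -> tf.keras.Model:
        input_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")

        # Backbone: XLM-RoBERTa large with TensorFlow weights (~559M parameters).
        backbone = TFXLMRobertaModel.from_pretrained("jplu/tf-xlm-roberta-large")
        last_hidden_state = backbone(input_ids).last_hidden_state  # (None, 120, 1024)

        # Slicing the first token creates the tf.__operators__.getitem layer
        # mentioned above; then dropout and the dense stack 64-32-16-8-3.
        x = last_hidden_state[:, 0, :]           # (None, 1024)
        x = tf.keras.layers.Dropout(0.3)(x)      # dropout rate assumed
        for units in (64, 32, 16, 8):
            x = tf.keras.layers.Dense(units, activation="relu")(x)  # activation assumed
        outputs = tf.keras.layers.Dense(3, activation="softmax")(x)

        model = tf.keras.Model(inputs=input_ids, outputs=outputs)
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # settings assumed
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model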


IV. RESULTS AND DISCUSSION

The current study aims to detect contradiction and entailment in multilingual text using different models. The initial approach utilized the BERT model, which achieved an accuracy of 30%. The subsequent approach involved the RoBERTa model, which improved the accuracy to 40%. However, by employing data processing techniques such as converting non-English text to English text, the accuracy was significantly enhanced to 70%. The results provide insight into the potential of machine learning models such as BERT and RoBERTa to detect complex linguistic relationships in multilingual text. The study also highlights the significance of employing appropriate data processing techniques to improve the accuracy of natural language processing models. These findings have significant implications for developing advanced natural language processing models capable of accurately identifying entailment and contradiction in multilingual text.

TABLE III
COMPARATIVE ACCURACY FOR THE APPLIED MODELS

Model Used | Tokenizer Used | Accuracy
BERT | bert-base-multilingual-cased | 30%
RoBERTa | jplu/tf-xlm-roberta-large | 40%
RoBERTa with data processing | jplu/tf-xlm-roberta-large | 70%

V. CONCLUSION AND FUTURE SCOPE

This research has demonstrated the potential for identifying and understanding entailment and contradiction in multilingual text by leveraging advanced language models such as the BERT multilingual base (cased) model and the XLM-RoBERTa large model. While these models initially achieved accuracies of only 30% and 40% respectively, the application of data processing techniques to translate non-English text into English paved the way for significant improvements, with the XLM-RoBERTa model achieving up to 70% accuracy. The study underlines the significance of data processing techniques, and further investigation is necessary to explore more efficient methods for detecting contradiction and entailment in multilingual text, as well as developing models that can handle language variations and nuances more effectively. Future research may employ the latest advanced models and focus on training models on proprietary data prior to application. This study has opened up new possibilities for NLP research, emphasizing the importance of developing robust models for multilingual applications.

REFERENCES

[1] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[2] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[3] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," Advances in Neural Information Processing Systems, vol. 32, 2019.
[4] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, "Unsupervised cross-lingual representation learning at scale," arXiv preprint arXiv:1911.02116, 2019.
[5] R. Sifa, M. Pielka, R. Ramamurthy, A. Ladi, L. Hillebrand, and C. Bauckhage, "Towards contradiction detection in German: a translation-driven approach," in 2019 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2019, pp. 2497–2505.
[6] M. Glockner, V. Shwartz, and Y. Goldberg, "Breaking NLI systems with sentences that require simple lexical inferences," arXiv preprint arXiv:1805.02266, 2018.
[7] M. Pielka, R. Sifa, L. P. Hillebrand, D. Biesner, R. Ramamurthy, A. Ladi, and C. Bauckhage, "Tackling contradiction detection in German using machine translation and end-to-end recurrent neural networks," in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 6696–6701.
[8] A. Gupta, A. Mishra, and U. S. Reddy, "Sentiment analysis of Hinglish text and sarcasm detection," in Conference Proceedings of ICDLAIR2019. Springer, 2021, pp. 11–20.
[9] S. Gupta, R. Singh, and V. Singla, "Emoticon and text sarcasm detection in sentiment analysis," in First International Conference on Sustainable Technologies for Computational Intelligence: Proceedings of ICTSCI 2019. Springer, 2020, pp. 1–10.
[10] N. Majumder, S. Poria, H. Peng, N. Chhaya, E. Cambria, and A. Gelbukh, "Sentiment and sarcasm classification with multitask learning," IEEE Intelligent Systems, vol. 34, no. 3, pp. 38–43, 2019.
[11] O. Ajnadkar, "Sarcasm detection of media text using deep neural networks," in Computational Intelligence and Machine Learning: Proceedings of the 7th International Conference on Advanced Computing, Networking, and Informatics (ICACNI 2019). Springer, 2021, pp. 49–58.
[12] M. V. Rao and C. Sindhu, "Detection of sarcasm on Amazon product reviews using machine learning algorithms under sentiment analysis," in 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET). IEEE, 2021, pp. 196–199.
[13] C. I. Eke, A. A. Norman, and L. Shuib, "Context-based feature technique for sarcasm identification in benchmark datasets using deep learning and BERT model," IEEE Access, vol. 9, pp. 48501–48518, 2021.
[14] P. K. Mandal and R. Mahto, "Deep CNN-LSTM with word embeddings for news headline sarcasm detection," in 16th International Conference on Information Technology-New Generations (ITNG 2019). Springer, 2019, pp. 495–498.
[15] C. P. S. Sengar and S. Jaya Nirmala, "Sarcasm detection in tweets as contrast sentiment in words using machine learning and deep learning approaches," in Machine Learning, Image Processing, Network Security and Data Sciences: Second International Conference, MIND 2020, Silchar, India, July 30-31, 2020, Proceedings, Part I. Springer, 2020, pp. 73–84.
[16] R. Joshi, "L3Cube-MahaCorpus and MahaBERT: Marathi monolingual corpus, Marathi BERT language models, and resources," arXiv preprint arXiv:2202.01159, 2022.
[17] K. A. Jallad and N. Ghneim, "ArNLI: Arabic natural language inference for entailment and contradiction detection," arXiv preprint arXiv:2209.13953, 2022.

