
TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING
PURWANCHAL CAMPUS

“NEPALI DOCUMENT SUMMARIZER USING mT5”

BY
Angal Dahal(PUR076BCT010)
Kajal Singh(PUR076BCT037)
Kushal Acharya(PUR076BCT042)
Maheshwar Prasad Bhatt(PUR076BCT045)

A PROJECT SUBMITTED TO THE DEPARTMENT OF


ELECTRONICS AND COMPUTER ENGINEERING IN PARTIAL
FULFILLMENT OF THE REQUIREMENT FOR THE
BACHELOR’S DEGREE IN COMPUTER ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING


PURWANCHAL CAMPUS
DHARAN, NEPAL

March, 2024
“NEPALI DOCUMENT SUMMARIZER USING mT5”

By

Angal Dahal(PUR076BCT010)

Kajal Singh(PUR076BCT037)

Kushal Acharya(PUR076BCT042)

Maheshwar Prasad Bhatt(PUR076BCT045)

Project Supervisor

Assoc. Prof. Binaylal Shrestha

A project submitted to the Department of Electronics and Computer Engineering in


partial fulfillment of the requirements for the Bachelor’s Degree in Computer
Engineering

Department of Electronics and Computer Engineering


Purwanchal Campus, Institute of Engineering
Tribhuvan University
Dharan, Nepal

March, 2024
COPYRIGHT©

The author has agreed that the Library, Department of Electronics and Computer Engineering, Purwanchal Campus, Institute of Engineering may make this report freely available for inspection. Moreover, the author has agreed that permission for extensive copying of this project report for scholarly purposes may be granted by the supervisor(s) who supervised the project work recorded herein or, in their absence, by the Head of the Department wherein the project report was done. It is understood that due recognition will be given to the author of this report and to the Department of Electronics and Computer Engineering, Purwanchal Campus, Institute of Engineering in any use of the material of this report. Copying, publication, or any other use of this report for financial gain without the approval of the Department of Electronics and Computer Engineering, Purwanchal Campus, Institute of Engineering and the author's written permission is prohibited.

Request for permission to copy or to make any other use of the material in this report in
whole or in part should be addressed to:

Head
Department of Electronics and Computer Engineering
Purwanchal Campus, Institute of Engineering
Dharan, Sunsari
Nepal

DECLARATION

We declare that the work hereby submitted for the Bachelor of Engineering in Computer Engineering at the Institute of Engineering, Purwanchal Campus, entitled “NEPALI DOCUMENT SUMMARIZER USING mT5”, is our own work and has not been previously submitted by us at any university for any academic award.

We authorize the Institute of Engineering, Purwanchal Campus to lend this report to


other institutions or individuals for scholarly research.

Angal Dahal(PUR076/BCT/010)
Kajal Singh(PUR076/BCT/037)
Kushal Acharya(PUR076/BCT/042)
Maheshwar Prasad Bhatt(PUR076/BCT/045)

March, 2024

RECOMMENDATION

The undersigned certify that they have read and recommended to the Department of Electronics and Computer Engineering for acceptance a project entitled “Nepali Document Summarizer using mT5”, submitted by Angal Dahal, Kajal Singh, Kushal Acharya, and Maheshwar Prasad Bhatt in partial fulfillment of the requirements for the award of the degree of “Bachelor of Engineering in Computer Engineering”.

..........................................................................
Assoc. Prof. Binaylal Shrestha
Supervisor
Department of Electronics and Computer Engineering
Purwanchal Campus, Institute of Engineering, Tribhuvan University

..........................................................................
Assoc. Prof. Surendra Shrestha, PhD
External Examiner
Department of Electronics and Computer Engineering
Pulchowk Campus, Institute of Engineering, Tribhuvan University

..........................................................................
Asst. Prof. Pravin Sangroula
Head of Department
Department of Electronics and Computer Engineering
Purwanchal Campus, Institute of Engineering, Tribhuvan University

2nd March, 2024

DEPARTMENTAL ACCEPTANCE GOES HERE

ACKNOWLEDGEMENT

We would like to extend our heartfelt gratitude to the Head of the Department of Electronics and Computer Engineering, Mr. Pravin Sangroula, and to all the teachers of this department for granting us the opportunity to undertake the major project on “Nepali Document Summarizer”. Their unwavering support, guidance, and encouragement have been invaluable in shaping this endeavor. First and foremost, we would like to express our sincere appreciation to our project cluster head, Mr. Binaylal Shrestha; his expertise, valuable insights, and continuous support played a crucial role in the development and execution of this project and guided it toward a successful outcome. Moreover, we are thankful to the participants who willingly provided the data needed to train our “Nepali Document Summarizer”; their involvement and support were integral to the success of the project. Finally, we are deeply grateful to all the individuals, friends, and expert seniors who helped us in taking on this project. Their collective efforts and contributions drove the successful completion of this endeavor. Thank you all for your continued support, guidance, and encouragement.

Angal Dahal(PUR076BCT010)

Kajal Singh(PUR076BCT037)

Kushal Acharya(PUR076BCT042)

Maheshwar Prasad Bhatt(PUR076BCT045)

ABSTRACT

This project focuses on developing and evaluating a Nepali Document Summarizer using advanced Natural Language Processing techniques. The goal is to automatically generate concise summaries from diverse Nepali documents and articles, addressing the common challenge of information overload. By sourcing data from a variety of Nepali documents and articles and following an iterative development approach, including the addition of a document upload feature, the summarizer demonstrates commendable performance in distilling complex information into succinct summaries. Evaluation with the ROUGE metric supports its effectiveness, with consistently high scores. In conclusion, the project successfully addresses the challenges of summarizing Nepali-language documents, offering a valuable tool for efficient information extraction and contributing to more accessible and time-efficient information consumption.

TABLE OF CONTENTS

COPYRIGHT
DECLARATION
RECOMMENDATION
DEPARTMENTAL ACCEPTANCE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES

1 INTRODUCTION
1.1 Background
1.2 Problem statement
1.3 Objectives
1.4 Scope

2 LITERATURE REVIEW

3 METHODOLOGY
3.1 System architecture
3.2 Use Case Diagram
3.3 Dataset Selection
3.4 Model Selection
3.5 Text Processing
3.6 Model Architecture
3.6.1 Transformer
3.6.2 T5
3.6.3 mT5
3.6.4 Fine Tuning
3.6.5 Summary Generation

4 RESULT AND CONCLUSION
4.1 Loss Curve
4.2 Evaluation Report
4.3 Summarizer Output
4.3.1 Summary for article titled - "5 Star Hotel in Jhapa"
4.3.2 Summary for article titled - "Services of AAO Gorkha disrupted"
4.3.3 Summary for document
4.4 Evaluation Metrics

5 CONCLUSIONS AND LIMITATIONS
5.1 Conclusions
5.2 Limitations
5.2.1 Long time to generate the output

REFERENCES

LIST OF FIGURES

Figure 3.1: High level system architecture
Figure 3.2: Use case diagram of the model
Figure 3.3: Transformer model from "Attention Is All You Need"
Figure 4.1: Loss Curve
Figure 4.2: Output for first article
Figure 4.3: Output for second article
Figure 4.4: Output for Nepali text document
CHAPTER 1

INTRODUCTION

In today's fast-paced world, information overload is a common challenge, especially when dealing with extensive documents and textual data. Whether it is research papers, news articles, or lengthy reports, the ability to quickly extract key insights and main ideas is crucial for efficient decision-making and comprehension. This is where the Document Summarizer comes into play. The Document Summarizer is an innovative solution that leverages cutting-edge Natural Language Processing (NLP) techniques to automatically generate concise and coherent summaries from lengthy texts. By condensing the content while retaining its essential meaning, the summarizer empowers users to save time, prioritize information, and gain a comprehensive understanding of complex documents.

Before turning to text summarization, we first have to know what a summary is. A summary is a text that is produced from one or more texts, conveys the important information of the original text, and is of a shorter form. The goal of automatic text summarization is to present the source text in a shorter version while preserving its semantics. The most important advantage of using a summary is that it reduces reading time. Text summarization methods can be classified into extractive and abstractive summarization. An extractive summarization method selects important sentences, paragraphs, and other units from the original document and concatenates them into a shorter form. Abstractive summarization, in contrast, builds an understanding of the main concepts in a document and then expresses those concepts in clear natural language. There are also two different groups of text summaries: indicative and informative [1]. An indicative summary only presents the main idea of the text to the user; its typical length is 5 to 10 percent of the main text. Informative summarization systems, on the other hand, give concise information about the main text, and the length of an informative summary is 20 to 30 percent of the main text.

1.1 Background

Text summarization has evolved significantly, driven by advances in NLP and machine learning techniques. Early methods relied on handcrafted rules and struggled with nuanced meaning. Later, transformers, introduced in Vaswani et al.'s "Attention Is All You Need", revolutionized NLP by improving parallelization and capturing long-range dependencies. Models like BERT, GPT-2, GPT-3, and T5 further advanced text summarization. Various attempts have been made to summarize Nepali text, with several scholars achieving notable results using pre-trained transformer models.

1.2 Problem statement

The task of the 'Document Summarizer' project is to develop a summarization model and a web app that delivers the model's results to the end user. The task involves a large corpus of Nepali texts and their summarized forms. The goal is to achieve a good summarization model with a resulting ROUGE score comparable to the current benchmark.

1.3 Objectives

• To summarize given Nepali text while comprehending its meaning, with a ROUGE-L score near 0.443.

1.4 Scope

• Summarize a given document.

• Help users understand the content of a document.

CHAPTER 2

LITERATURE REVIEW

In 1958, early research on summarizing articles and papers used statistical information derived from word frequency and distribution, which was fed to a machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance were extracted and printed out to become the "auto-abstract" [5]. This was an early phase of summarization work in the field of NLP, using manual heuristics and a mathematical model to calculate the significance of a word in a sentence and how well it captures the overall meaning of the text, generating an extractive summary.

There have been two main approaches to summarization, namely extractive and abstractive. In either case, the use of neural networks requires minimal preprocessing and relieves the need for manual feature engineering. Some researchers argue that extractive summaries are more interpretable and are expected to give better results than abstractive summaries [8]. Other research suggests that abstractive summarizers perform better overall, and that the margin by which abstraction outperforms extraction is greater when controversiality is high, providing a context in which the need for generation-based methods is especially great [1]. This suggests that the relative effectiveness of extractive and abstractive summaries depends on the context.

Document extracts consisting of roughly 20 percent of the original can be as informative as the full text of a document, which suggests that even shorter extracts may be useful as indicative summaries. The optimal extract can also be far from unique. Numerous heuristics have been proposed to guide the selection of document extracts, yet no clear criteria have been proposed to choose among them. Evidence suggests that combinations of individual heuristics have the best performance [2].

LSTM cells have also been used for Nepali text summarization. The GloVe algorithm is used to build embeddings that capture features of a Nepali word corpus; these are fed to LSTM cells, and ROUGE-1 and ROUGE-2 are computed on the test set against human-generated summaries. LSTM cells capture features within a corpus very well. An extractive method is used to select the sentences with the most weight [4].

The transformer architecture, introduced by Vaswani et al. in 2017, marked a significant milestone in NLP. Transformers rely on self-attention mechanisms, which allow the model to weigh the importance of different words in a sequence when encoding or decoding, capturing long-range dependencies more effectively than traditional RNNs and CNNs [7]. This architecture led to the development of models like BERT, GPT, and T5, each trained on large text corpora and fine-tuned for specific NLP tasks.

Building on top of the transformer architecture, T5 introduced a unified approach: it treats every task as a text-to-text transformation, making it versatile and adaptable [6]. It was trained on a huge corpus of publicly available data; however, T5 was limited to a single-language dataset and, being a general solution to NLP problems, was not optimized for other languages. Addressing this limitation of T5, mT5 was trained on a large multilingual corpus, which handled the language limitation of T5 [9].

Leveraging neural networks like BERT, pre-trained models are fine-tuned for summarization tasks. While news-domain data dominates training, the study investigates the adaptability of these models to academic texts. Robustly pre-trained models show promise in generalization, yet human evaluation underscores the necessity for improved assessment methods and metrics, revealing potential discrepancies between metric-based and human-perceived quality in text summarization [3].

CHAPTER 3

METHODOLOGY

3.1 System architecture

The high-level system architecture is shown in the figure below.

Figure 3.1: High level system architecture

The system consists of a web client (browser) that the user interacts with to access the features of the model. A WSGI application handles the routing of each HTTP request to the proper Python function, which contains the logic to preprocess the input; in our case this involves parsing the text from the document. The parsed text is then fed to the model, and the output summary is returned to the user through the same WSGI application, where the browser presents it to the user.
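To make the request flow concrete, below is a minimal sketch of such a WSGI application, assuming Flask as the framework; the route, form field, template name, and the generate_summary helper are illustrative rather than the actual implementation.

# Minimal WSGI routing sketch (assumes Flask); names are illustrative.
from flask import Flask, request, render_template

app = Flask(__name__)  # Flask exposes a WSGI-compatible application object


def generate_summary(text: str) -> str:
    # Placeholder for the mT5 model call sketched in Section 3.6.5.
    return text[:200]


@app.route("/summarize", methods=["POST"])
def summarize():
    # Accept either an uploaded document or a plain text field.
    if "document" in request.files:
        text = request.files["document"].read().decode("utf-8")
    else:
        text = request.form.get("text", "")
    summary = generate_summary(text)
    return render_template("result.html", summary=summary)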

3.2 Use Case Diagram

The use case diagram of the system is given in the figure below. Various stages have to be performed to achieve automatic text summarization for Nepali documents.

Figure 3.2: Use case diagram of the model

3.3 Dataset Selection

For model fine-tuning and testing we used a publicly available dataset from Hugging Face. The dataset contains around 15,580 training examples and around 1,732 test examples, packaged in the DatasetDict format defined by Hugging Face. Each example consists of an article averaging about 1,000 tokens and its summary averaging about 270 tokens.
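As an illustration, a dataset in this format can be pulled from the Hugging Face Hub as sketched below; the dataset identifier and column names are placeholders, since the report does not name the exact dataset.

from datasets import load_dataset

# Hypothetical dataset ID; replace with the actual Hub identifier.
dataset = load_dataset("username/nepali-summarization")
print(dataset)               # DatasetDict with "train" and "test" splits
print(dataset["train"][0])   # e.g. {"article": "...", "summary": "..."}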

3.4 Model Selection

For the core task of Nepali text summarization we selected mT5-base as the base model to fine-tune. mT5 is a multilingual variant of the T5 model, which generalizes every NLP task as a text-to-text transformation, providing adaptability and a unified approach.

3.5 Text Processing

Text preprocessing involved removing HTML tags and English-language advertisements. A task-specific prefix was added to the text for summarization. Since the model needs to understand the syntactic and semantic context of each sentence, we kept punctuation and stop words as they are.
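A minimal preprocessing sketch along these lines is shown below, assuming the raw text may contain HTML markup and long English-only runs (advertisements); the regular expressions and the "summarize: " prefix are illustrative.

import re

def preprocess(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)                       # strip HTML tags
    text = re.sub(r"[A-Za-z][A-Za-z .,!?'-]{20,}", " ", text)  # drop long English-only runs (ads)
    text = re.sub(r"\s+", " ", text).strip()                   # normalize whitespace
    # Punctuation and Nepali stop words are deliberately kept.
    return "summarize: " + text                                # task-specific prefix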

3.6 Model Architecture

3.6.1 Transformer

The Transformer architecture follows an encoder-decoder structure but does not rely on
recurrence and convolutions in order to generate an output.

Figure 3.3: Transformer model from "Attention Is All You Need"

In short, the task of the encoder is to map an input sequence to a sequence of continuous representations, which is then fed into the decoder. The decoder receives the output of the encoder together with the decoder output at the previous time step to generate an output sequence. At each step the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next [7].

Encoder

The encoder consists of a stack of N = 6 identical layers, each of which is composed of two sublayers:

1. The first sublayer implements a multi-head self-attention mechanism. The multi-head mechanism runs h attention heads in parallel; each head receives a linearly projected version of the queries, keys, and values and produces its own output, and the h outputs are then combined to generate a final result.

2. The second sublayer is a fully connected feed-forward network consisting of two linear transformations with a Rectified Linear Unit (ReLU) activation in between:

FFN(x) = ReLU(W1 x + b1) W2 + b2    (3.1)

The six layers of the Transformer encoder apply the same linear transformations to all the words in the input sequence, but each layer employs different weight (W1, W2) and bias (b1, b2) parameters to do so. Each sublayer is also succeeded by a normalization layer, layernorm(·), which normalizes the sum computed between the sublayer input x and the output generated by the sublayer itself, sublayer(x):

layernorm(x + sublayer(x))    (3.2)

An important consideration to keep in mind is that the Transformer architecture cannot inherently capture any information about the relative positions of the words in the sequence, since it does not make use of recurrence. This information has to be injected by introducing positional encodings to the input embeddings.

The positional encoding vectors are of the same dimension as the input embeddings and are generated using sine and cosine functions of different frequencies. Then, they are simply summed to the input embeddings in order to inject the positional information.
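A small sketch of this sinusoidal encoding is given below for reference; note that T5 and mT5 actually replace sinusoidal encodings with learned relative position biases, so this illustrates the original Transformer formulation only.

import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(d_model)[None, :]             # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])          # sine on even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])          # cosine on odd dimensions
    return pe                                      # added to the input embeddings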

Decoder

The decoder shares several similarities with the encoder. The decoder also consists of a
stack of N = 6 identical layers that are each composed of three sublayers:

1. The first sublayer receives the previous output of the decoder stack, augments it
with positional information, and implements multi-head self-attention over it. While
the encoder is designed to attend to all words in the input sequence regardless of their
position in the sequence, the decoder is modified to attend only to the preceding words.
Hence, the prediction for a word at position i can only depend on the known outputs for
the words that come before it in the sequence. In the multi-head attention mechanism
(which implements multiple, single attention functions in parallel), this is achieved by
introducing a mask over the values produced by the scaled multiplication of matrices
Q and K. This masking is implemented by suppressing the matrix values that would otherwise correspond to illegal connections:

\mathrm{mask}(QK^{T}) = \mathrm{mask}\begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1n} \\ e_{21} & e_{22} & \cdots & e_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ e_{m1} & e_{m2} & \cdots & e_{mn} \end{bmatrix} = \begin{bmatrix} e_{11} & -\infty & \cdots & -\infty \\ e_{21} & e_{22} & \cdots & -\infty \\ \vdots & \vdots & \ddots & \vdots \\ e_{m1} & e_{m2} & \cdots & e_{mn} \end{bmatrix} \quad (3.3)

2. The second sublayer implements a multi-head attention mechanism similar to the one implemented in the first sublayer of the encoder. On the decoder side, this multi-head mechanism receives the queries from the previous decoder sublayer and the keys and values from the output of the encoder. This allows the decoder to attend to all the words in the input sequence.

3. The third sublayer implements a fully connected feed-forward network, similar to the one implemented in the second sublayer of the encoder.

Furthermore, the three sublayers on the decoder side also have residual connections
around them and are succeeded by a normalization layer. Positional encodings are
also added to the input embeddings of the decoder in the same manner as previously
explained for the encoder.

3.6.2 T5

The T5 model structure is a standard encoder-decoder transformer. T5 is pre-trained on text extracted from Common Crawl, to which the authors apply some simple heuristic filtering: any line that does not end in a terminal punctuation mark is removed, lines containing the word "javascript" and pages containing a curly bracket (since curly brackets often appear in code) are removed, and the dataset is deduplicated by taking a sliding window of three-sentence chunks and keeping only one occurrence of each.

Additionally, T5 employs a pre-training objective known as "denoising with span corruption", where it randomly masks spans of text and tasks the model with predicting the original content. This approach encourages the model to understand and generate coherent text by learning to fill in missing parts effectively. Moreover, T5 benefits from a large-scale dataset for pre-training, which encompasses diverse domains and linguistic structures, enabling the model to generalize well across various tasks and domains during fine-tuning. Overall, these design choices contribute to T5's robustness and versatility in natural language understanding and generation tasks. [6]

3.6.3 mT5

The objective of the mT5 model was to follow the T5 model's recipe as closely as possible, specifically the "T5.1.1" recipe. One of the most important distinctions is the use of a "line length filter" that requires pages to contain at least three lines of text with 200 or more characters. The sampling strategy for pre-training multilingual models involves balancing the representation of languages: boosting lower-resource languages by sampling according to a probability function helps prevent overfitting or underfitting. The hyperparameter α controls the degree of boosting, with values like 0.3 striking a balance between high- and low-resource language performance [9].
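In the notation of the mT5 paper [9], examples are sampled from language L with probability proportional to a power of |L|, the number of pre-training examples available for that language:

p(L) \propto |L|^{\alpha}, \qquad \alpha = 0.3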

3.6.4 Fine Tuning

Because Nepali makes up only a very small percentage (0.69%) of the data used to train the model, it suffered heavily from "accidental translation". To address this, we followed a language-specific tokenization approach, which means creating a tokenizer for a specific language or subset of languages rather than keeping the general multilingual one. Following this approach, we redefined the tokenizer to include only English and Nepali tokens. Since we work exclusively with those texts, this completely avoids cross-lingual errors and improves model performance. It also reduced the model size significantly.

Fine-tuning the pre-trained multilingual model on the Nepali dataset with Hugging Face's prebuilt trainer involved first loading the pre-trained model from the Hugging Face model hub and preparing the Nepali dataset by tokenizing the text data; batch tokenization was used for this task. Next, the trainer was configured with specific hyperparameters such as batch size, weight decay, and learning rate. The prebuilt trainer then carried out the fine-tuning process, iterating through the dataset to adjust the model's parameters and minimize the loss function. After fine-tuning, the model's performance was evaluated on a separate validation dataset to assess its adaptation to the Nepali language, and additional fine-tuning iterations or hyperparameter adjustments were made based on the evaluation results.
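The sketch below outlines this workflow using the transformers library's Seq2SeqTrainer; the dataset identifier, column names, sequence lengths, and hyperparameter values are illustrative, and the tokenizer-reduction step described above is omitted for brevity.

from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/mt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
dataset = load_dataset("username/nepali-summarization")   # hypothetical ID

def tokenize(batch):
    # Batch tokenization of prefixed articles and reference summaries.
    inputs = tokenizer(["summarize: " + a for a in batch["article"]],
                       max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(tokenize, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-nepali-summarizer",
    learning_rate=3e-4,                 # illustrative hyperparameters
    per_device_train_batch_size=4,
    weight_decay=0.01,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()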

3.6.5 Summary Generation

For summary generation in mT5, encoding entails tokenizing the input text into subword units with our tokenizer and passing it through multiple layers of self-attention and feed-forward networks in the encoder to capture contextual information. During decoding, the model generates the summary token by token; the input to the decoder includes the encoded representations from the encoder and a task-specific prefix indicating the summarization task. The decoder, consisting of multiple layers of self-attention and feed-forward networks, attends to the encoded input representations and to previously generated tokens to predict the next token in the summary sequence, with cross-attention mechanisms enabling it to incorporate relevant information from the encoded input text. This iterative process continues until an end-of-sequence token is generated or a maximum summary length is reached, resulting in concise summaries of the input text.
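A generation sketch consistent with this description is shown below; the checkpoint path, prefix, and decoding settings (beam search, length limit) are illustrative.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "mt5-nepali-summarizer"   # directory produced by fine-tuning
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def generate_summary(text: str) -> str:
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       max_length=1024, truncation=True)
    # Decoding proceeds token by token until an end-of-sequence token is
    # produced or the maximum summary length is reached.
    output_ids = model.generate(**inputs, max_length=256,
                                num_beams=4, early_stopping=True)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)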

CHAPTER 4

RESULT AND CONCLUSION

4.1 Loss Curve

The graph below plots loss against the number of training steps. The blue curve represents training loss and the yellow curve represents validation loss.

Figure 4.1: Loss Curve

The graph shows that the model starts with high training and validation loss; the loss then gradually decreases, indicating the model's ability to learn from the dataset and optimize its weights and biases. Training loss and validation loss follow the same pattern: both drop sharply during the initial steps and settle toward a base value as training proceeds.

4.2 Evaluation Report

The table below reports ROUGE-1, ROUGE-2, and ROUGE-L scores for our model.

Metric Recall Precision F-1 Score
ROUGE-1 0.635170 0.372573 0.464093
ROUGE-2 0.442052 0.257207 0.320759
ROUGE-L 0.566153 0.332508 0.414010

Table 4.1: Performance Metrics

The ROUGE scores of our model are comparable to the scores achieved in other similar research [4]. These ROUGE scores indicate the degree of unigram, bigram, and longest-common-subsequence overlap between the generated summaries and the reference summaries, providing quantitative measures of our summarization model's ability to capture content overlap at different levels of granularity.
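One way to compute such ROUGE scores is with the rouge_score package, as sketched below; the strings are placeholders, and since the package's default tokenizer targets English/ASCII text, a Devanagari-aware tokenizer may need to be supplied for Nepali.

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"])
scores = scorer.score(target="reference summary text",      # placeholder strings
                      prediction="generated summary text")
for name, result in scores.items():
    print(name, f"P={result.precision:.3f}",
          f"R={result.recall:.3f}", f"F1={result.fmeasure:.3f}")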

4.3 Summarizer Output

The Document Summarizer exhibited commendable performance on the test dataset. The summarization model successfully generated concise and informative summaries for a variety of articles. Representative examples are provided below.

4.3.1 Summary for article titled - ”5 Star Hotel in Jhapa”

Below is the summary output for a news article sourced from OnlineKhabar, used to test our model's ability to handle unseen data.

Figure 4.2: Output for first article

The summarization model effectively captured the essence of the news article, highlighting key details about the hotel's construction.

4.3.2 Summary for article titled - ”Services of AAO Gorkha disrupted”

Below is the summary output for a news article sourced from Setopati.

Figure 4.3: Output for second article

The resulting summary captured the essence of the news article.

4.3.3 Summary for document

Below is the summary output for a document containing Nepali text.

Figure 4.4: Output for Nepali text document.

4.4 Evaluation Metrics

To quantitatively assess the summarization quality, we employed the ROUGE metric. The ROUGE scores obtained for the summarization outputs were consistently high, indicating a close resemblance to the reference (human-authored) summaries. The evaluation involved precision, recall, and F1 score measurements, and the results affirmed the effectiveness of our Document Summarizer.

CHAPTER 5

CONCLUSIONS AND LIMITATIONS

5.1 Conclusions

In conclusion, the Document Summarizer successfully met the objectives set forth in
the project, particularly in the context of handling Nepali language documents. The
model demonstrated proficiency in summarizing diverse Nepali documents, offering a
valuable tool for efficiently extracting key insights from extensive textual data.

The positive results obtained from the evaluation indicate the potential for broader ap-
plications of the Document Summarizer in handling Nepali language documents, con-
tributing to more accessible and time-efficient information consumption.

5.2 Limitations

5.2.1 Long time to generate the output

The project confronts computational challenges, requiring substantial resources for computing and deploying the model. High-performance computing environments are essential, potentially escalating operational costs. This resource-intensive approach may cause delays in generating summaries in real-world settings. The dynamic nature of NLP algorithms also mandates frequent model retraining, adding to the computational burden. Balancing computational efficiency with model accuracy is imperative for addressing infrastructure limitations and managing ongoing operational costs in practical implementation.

Recognizing and addressing these limitations is crucial for the successful development,
implementation, and eventual adoption of the summarizing model. Ongoing refinement
and adaptation will be necessary to overcome these challenges and enhance the system’s
effectiveness in real-world settings.

REFERENCES

[1] G. Carenini and J. C. K. Cheung. Extractive vs. NLG-based abstractive summarization of evaluative text: The effect of corpus controversiality. In M. White, C. Nakatsu, and D. McDonald, editors, Proceedings of the Fifth International Natural Language Generation Conference, pages 33–41, Salt Fork, Ohio, USA, June 2008. Association for Computational Linguistics.

[2] H. P. Edmundson. New methods in automatic extracting. Journal of the ACM, 16(2):264–285, April 1969.

[3] E. Hermansson and C. Boddien. Using pre-trained language models for extractive text summarisation of academic papers. 2020.

[4] R. S. Khanal, S. Adhikari, and S. Thapa. Extractive method for Nepali text summarization using text ranking and LSTM. Proceedings of 10th IOE Graduate Conference, 10, October 2021.

[5] H. P. Luhn. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159–165, 1958.

[6] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2023.

[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need, 2023.

[8] A. P. Widyassari, S. Rustad, G. F. Shidik, E. Noersasongko, A. Syukur, A. Affandy, and D. R. I. M. Setiadi. Review of automatic text summarization techniques & methods. Journal of King Saud University - Computer and Information Sciences, 34(4):1029–1046, 2022.

[9] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel. mT5: A massively multilingual pre-trained text-to-text transformer, 2021.

