Sample Project Final Document
A project work
of
by
Associate Professor
April, 2024
CERTIFICATE
This is to certify that the project report entitled “Relation Extraction using
Fine-Tuned Large Language Models (LLMs)” is a bona fide work done by V.
Venkata Sai Teja (22FE1F0064) under my guidance and submitted in partial
fulfilment of the requirements for the award of the degree of Master of Computer
Applications from Jawaharlal Nehru Technological University, Kakinada. The work
embodied in this project report is not submitted to any other university or institute for
the award of any degree/diploma.
External Examiner
DECLARATION
I hereby declare that the project report entitled “Relation Extraction using
Fine-Tuned Large Language Models (LLMs)” submitted to the JNTUK, is a record
of an original work done by V. Venkata Sai Teja under the Guidance of Mr. R.
VEERA BABU, M. Tech, (Ph. D), Associate Professor of the Department of Master of Computer Applications, and this project work is submitted in partial fulfilment of the requirements for the award of the Degree of Master of Computer Applications. The results embodied in this project report are not submitted to any other University or Institute for the award of any Degree or Diploma.
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task
would be incomplete without the mention of people whose ceaseless cooperation
made it possible, whose constant guidance and encouragement crown all efforts with
success.
I am glad to express my deep sense of gratitude to Mr. R. VEERA BABU, M. Tech, (Ph. D), Associate Professor, Master of Computer Applications, for guiding me through this project and for encouraging me right from its beginning. Every interaction with him was an inspiration. At every step he was there to help me choose the right path.
I am glad to express my deep sense of gratitude to Mr. R. VEERA BABU, Associate Professor and Head of the Department, Master of Computer Applications, for giving support in the completion of this project work.
I am glad to express my deep sense of gratitude to the beloved chairman Dr.
L. RATHAIAH, and the principal Dr. K. PHANEENDRA KUMAR for their
encouragement and kind support in carrying out our work.
I thank my parents and others who have rendered help to me directly or
indirectly in the completion of project work.
ABSTRACT
In the realm of Natural Language Processing (NLP), Relation Extraction (RE)
is a pivotal task that aims to discern and categorize relationships between entities within
a text. This project proposes a novel approach to RE by leveraging the capabilities of
Large Language Models (LLMs) and the Python library, Promptify. Our approach
utilizes the power of LLMs, such as GPT models from OpenAI, to extract relationships
from unstructured text. We employ Promptify, a Python library designed to facilitate
the use of prompt-based models for structured output. This combination allows us to
harness the predictive power of LLMs while maintaining the structure and convenience
provided by Promptify. The proposed model operates by transforming the RE task into
a classification problem, where the relationships between entities are classified based
on the context provided in the text. The model is capable of handling a variety of
domains and can be easily adapted to different contexts and entity types. Previous research has explored models such as BERT, BART, T5, and the GPT family, but that work is limited by lower accuracy (around 72%) and by scalability issues across varied text formats. The proposed approach achieves a higher accuracy of 80%, works across different domains, and can handle multilingual text simultaneously when training the model.
CONTENTS
ABSTRACT v
CONTENTS vi-vii
LIST OF FIGURES viii
LIST OF TABLES viii
CHAPTER 1 INTRODUCTION 1-2
1.1 INTRODUCTION 1
1.2 PURPOSE 1
1.3 SCOPE 2
1.4 OBJECTIVE 2
CHAPTER 2 LITERATURE REVIEW 3-12
2.1 THEORETICAL BACKGROUND OF THE PROBLEM 3
2.2 RELATED RESEARCH TO SOLVE THE PROBLEM 4-12
CHAPTER 3 SYSTEM ANALYSIS 13-16
3.1 EXISTING SYSTEM 13
3.1.1 DISADVANTAGES OF EXISTING MODEL 13
3.2 PROPOSED SYSTEM 13
3.2.1 ADVANTAGES OF PROPOSED SYSTEM 14
3.3 FEASIBILITY STUDY 14
3.3.1 TECHNICAL FEASIBILITY 14
3.3.2 ECONOMIC FEASIBILITY 14-15
3.3.3 LEGAL FEASIBILITY 15
3.3.4 OPERATIONAL FEASIBILITY 15-16
CHAPTER 4 SYSTEM REQUIREMENT SPECIFICATION 17-19
4.1 HARDWARE REQUIREMENTS 17
4.2 SOFTWARE REQUIREMENTS 17
4.3 FUNCTIONAL REQUIREMENTS 17
4.4 NON-FUNCTIONAL REQUIREMENTS 18-19
CHAPTER 5 METHODOLOGY 20-27
CHAPTER 6 SYSTEM DESIGN 28-33
6.1 INPUT AND OUTPUT DESIGN 28-29
6.2 UML DIAGRAM 30-33
CHAPTER 7 IMPLEMENTATION 34-38
7.1 INTRODUCTION 34-37
7.2 CODE 37-38
CHAPTER 8 EVALUATION METRICS AND TESTING 39-49
8.1 EVALUATION METRICS 39-42
8.2 TESTING INTRODUCTION 42-43
8.3 TESTING METHODOLOGIES 43-49
CHAPTER 9 RESULTS AND DISCUSSION 50-53
9.1 RESULT 50
9.2 COMPARISON TABLE 50
9.3 COMPARISON GRAPH 51
9.4 OUTPUT ANALYSIS 51-53
CHAPTER 10 CONCLUSION AND FUTURE SCOPE 54-56
10.1 CONCLUSION 54
10.2 FUTURE SCOPE 55-56
REFERENCES 57-60
LIST OF FIGURES
Fig. No. Description Page No
2.2.1 An internal description of the feature filtering module 5
2.2.2 Architecture of ARM model 6
2.2.3 Architecture of RE framework 7
2.2.4 The architecture of the PCNNs module. 10
2.2.5 The architecture of the Bi-LSTM module. 11
2.2.6 The architecture of the MSNet based module. 12
5.1 Basic Architecture 20
5.2 Initial data Processing 21
5.3 Feature Extraction 21
5.4 Decoding Layer Processing 25
5.5 Generating Function 25
6.2.1 UML Diagram 30
9.3.1 Comparison Graph 51
LIST OF TABLES
Table No Description Page No.
9.2.1 Comparison Table 50
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Text-to-text relation extraction is a subfield of Natural Language Processing (NLP)
that focuses on identifying and classifying the relationships between pairs of text
entities. This process is crucial for understanding the semantic connections in a text,
which can range from simple relations like ‘part of’ or ‘located in’, to more complex
ones like ‘caused by’ or ‘influenced by’.
The extraction process typically involves two steps: entity recognition, where the
entities of interest are identified, and relation classification, where the type of
relationship between these entities is determined. Machine learning techniques,
particularly deep learning models, are often employed for this task due to their ability
to capture complex patterns in large amounts of data.
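As a toy illustration of these two steps (the sentence, entities, and relation below are invented examples, not output of the proposed system):

# Toy illustration of the two-step relation extraction pipeline.
# The sentence, entities, and relation label are made-up examples.
sentence = "Marie Curie was born in Warsaw."

# Step 1: entity recognition identifies the entities of interest.
entities = [("Marie Curie", "PERSON"), ("Warsaw", "LOCATION")]

# Step 2: relation classification assigns a relation type to the entity pair.
relation = {"head": "Marie Curie", "relation": "born_in", "tail": "Warsaw"}
print(relation)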
The extracted relations can be used in various applications such as information
retrieval, knowledge graph construction, and question answering systems. Despite its
potential, text-to-text relation extraction remains a challenging task due to the inherent
ambiguity and complexity of natural language. However, with the advancement of NLP
technologies, significant progress is being made in this field.
The relations that are extracted can be leveraged in a multitude of applications such
as data mining, semantic web development, and automated response systems. Despite
its immense potential, text-to-text relation extraction continues to be a formidable task
due to the intrinsic vagueness and complexity of natural language. Nonetheless, with
the progression of NLP technologies, substantial strides are being made in this domain.
1.2 PURPOSE
1.3 SCOPE
The scope of relation extraction is vast and spans across various applications. One
of the primary applications is in the field of information retrieval, where relation
extraction can help in extracting key information from a large corpus of text. This can
be particularly useful in fields like law or medicine, where large amounts of text data
need to be analyzed and understood quickly.
1.4 OBJECTIVE
Facilitate the discovery of hidden connections and insights within textual data,
empowering organizations to uncover valuable knowledge.
Ensure Accuracy and Reliability: Develop algorithms and techniques to ensure high
accuracy and reliability in relation extraction, minimizing errors and false positives.
By achieving these objectives, the relation extraction system using Promptify aims
to revolutionize the way organizations extract, analyse, and utilize information from
unstructured text data, ultimately driving innovation, efficiency, and competitiveness.
CHAPTER 2
LITERATURE REVIEW
2.1 THEORETICAL BACKGROUND OF THE PROBLEM
Relation extraction is a fundamental task in natural language processing (NLP)
that involves identifying and categorizing semantic relationships between entities
mentioned in text data. In the context of NLP, several theoretical concepts and
techniques are relevant to relation extraction:
Text Representation:
Textual data is typically represented in a structured format, such as sentences or
documents, where entities and their relationships are expressed through linguistic
patterns and syntactic structures.
Techniques for representing text include tokenization, part-of-speech tagging,
dependency parsing, and named entity recognition, which provide the foundational
elements for identifying entities and their interactions.
Semantic Analysis:
Semantic analysis involves understanding the meaning and context of words,
phrases, and sentences in text data. This includes techniques like semantic role labeling,
semantic parsing, and semantic similarity measurement.
Semantic analysis helps in identifying the semantic roles played by entities in a
sentence and inferring the nature of their relationships based on contextual cues and
linguistic patterns.
Machine Learning and Deep Learning:
Machine learning and deep learning techniques are widely used for relation
extraction, allowing models to automatically learn patterns and relationships from large
amounts of annotated data.
Supervised learning algorithms, such as support vector machines (SVM), logistic
regression, and neural networks, are commonly employed for relation extraction tasks.
Deep learning architectures, including convolutional neural networks (CNNs),
recurrent neural networks (RNNs), and transformer-based models like BERT
(Bidirectional Encoder Representations from Transformers), have shown promising
results in capturing complex patterns and dependencies in text data.
Feature Extraction:
Feature extraction involves transforming raw text data into numerical or vector
representations that can be fed into machine learning models.
Traditional feature extraction methods include bag-of-words, n-grams, and tf-
idf (term frequency-inverse document frequency), while deep learning models
automatically learn feature representations from raw text data through embedding
layers and attention mechanisms.
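A minimal sketch of tf-idf feature extraction with scikit-learn (scikit-learn is not a stated dependency of this project; it is used here only to illustrate the idea):

# Minimal tf-idf feature extraction sketch (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
]
vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
X = vectorizer.fit_transform(docs)                # sparse document-term matrix
print(X.shape)                                    # (2, number_of_features)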
Relation Classification:
Relation classification is the task of assigning predefined labels or categories to
pairs of entities based on their textual context.
Supervised learning approaches for relation classification typically involve
training classifiers to predict the type of relationship between entity pairs using features
extracted from text data.
Evaluation Metrics:
Evaluation metrics such as precision, recall, F1 score, accuracy, and area under
the ROC curve (AUC-ROC) are used to assess the performance of relation extraction
models.
These metrics measure the model’s ability to correctly identify true
relationships while minimizing false positives and false negatives.
2.2 RELATED RESEARCH TO SOLVE THE PROBLEM
1. Joint Biomedical Entity and Relation Extraction Based on Feature Filter Table
Labeling
In the research conducted by LINLIN XING, titled "Joint Biomedical Entity
and Relation Extraction Based on Feature Filter Table Labeling" and published in IEEE
Access in 2023, a novel approach known as the FiTaCNN (Feature Filtering Table with
CNN) algorithm was employed for biomedical entity and relation extraction tasks.
Through the implementation of FiTaCNN, XING achieved a commendable accuracy
rate of 78.88%, showcasing the algorithm's effectiveness in identifying and
categorizing biomedical entities and their relationships within textual data. However,
despite the promising results, the study elucidates certain limitations inherent in the
FiTaCNN approach. Specifically, challenges were encountered in the manipulation and
prediction of relations between entities, highlighting areas where further enhancements
and optimizations are warranted. Addressing these limitations could contribute to the
refinement of the FiTaCNN algorithm and its applicability in real-world scenarios,
ultimately advancing the field of biomedical text mining and facilitating the extraction
of valuable insights from biomedical literature.
Figure 2.2.2: Architecture of ARM model
3. Extraction of Poetic and Non-Poetic Relations From of-Prepositions Using
WordNet
Christiana Panayiotou's research in IEEE Access presents an innovative
approach utilizing Princeton WordNet (PWN) for extracting poetic and non-poetic
relations from of-prepositions. The paper addresses the challenge of extracting
conceptual relations from diverse resources, offering promising insights into linguistic
analysis. The proposed methodology involves semantic analysis of textual data using
WordNet to identify and categorize different types of relations expressed through of-
prepositions. However, limitations are observed in the accuracy of the algorithm,
particularly in relation extraction from contextual data. Despite its contributions, the
algorithm exhibits inaccuracies in capturing nuanced semantic relationships, indicating
the need for refinement to improve its performance in handling complex linguistic
contexts. Future research directions may involve incorporating contextual information,
leveraging deep learning techniques, or exploring alternative lexical resources to
enhance the accuracy and robustness of relation extraction algorithms based on
semantic analysis.
4. A Graph Convolutional Network with Multiple Dependency Representations
for Relation Extraction
In IEEE Access, Yanfeng Hu and co-authors introduce a Graph Convolutional
Network (GCN) equipped with multiple dependency representations for relation
extraction tasks. Their approach achieves an accuracy of 68.0% in identifying
relationships between entities, demonstrating promising results in capturing relational
semantics. The GCN model incorporates graph-based representations of textual data,
allowing it to capture both local and global dependencies between entities.
Nevertheless, the study highlights limitations concerning the applicability of GCN to
high-dimensional data. Despite its effectiveness, the GCN model faces challenges in
handling complex data structures, suggesting avenues for future research to address
scalability issues and enhance its utility across diverse datasets. Future research
directions may involve exploring alternative graph-based models, developing
techniques for dimensionality reduction, or investigating methods for handling sparse
and heterogeneous data in graph-based relation extraction approaches.
5. A Neural Relation Extraction Model for Distant Supervision in Counter-
Terrorism Scenario
Jiaqi Hou and collaborators present a neural relation extraction model based on
Bidirectional Encoder Representation from Transformers (BERT) for distant
supervision in counter-terrorism scenarios. Their model finds applications in regional
security risk assessment and terrorist event prediction. The BERT-based model utilizes
pre-trained transformer-based architectures to extract relationships between entities
from large-scale textual data sources. However, the study identifies limitations
associated with the model's dependence on large datasets for effective training. Despite
its versatility, the BERT-based model requires substantial data resources, posing
challenges in scenarios with limited labeled data availability. Future research directions
may involve exploring techniques for semi-supervised or unsupervised learning,
developing methods for transfer learning from related tasks, or investigating
approaches for domain adaptation to mitigate data scarcity issues in distant supervision
scenarios.
6. A Novel Document-Level Relation Extraction Method Based on BERT and
Entity Information
Xiaoyu Han and colleagues propose a novel document-level relation extraction
method incorporating BERT and Entity Information (DEMMT) for improved
performance. Their approach demonstrates a notable improvement of 2% in F1 score
compared to models without pre-trained representations and 5% compared to pure
BERT. The DEMMT model integrates contextual information from BERT embeddings
with entity-level features to capture document-level relationships effectively. However,
the study identifies structural complexity as a limitation of the DEMMT model. Despite
its advancements, the intricate architecture of DEMMT poses challenges in
generalizing across different datasets, suggesting the need for simplification and
optimization for broader applicability. Future research directions may involve
exploring alternative architectures, optimizing hyperparameters, or investigating
methods for fine-tuning pre-trained models to improve the robustness and scalability of
document-level relation extraction systems.
7. BERT-Based Chinese Relation Extraction for Public Security
Jiaqi Hou and co-authors present a BERT-based Chinese relation extraction
algorithm tailored for public security applications. The model effectively mines
security information from textual data, contributing to enhanced surveillance and threat
detection. The BERT-based approach leverages transformer-based architectures to
capture contextual relationships between entities in Chinese text. Nevertheless, the
study acknowledges limitations arising from the model's dependence on the
transformer’s architecture. Despite its effectiveness, the purely transformer-based
approach exhibits constraints in handling specific linguistic nuances and context-
specific variations. Future research directions may involve exploring techniques for
incorporating linguistic features, developing domain-specific pre-trained models, or
investigating methods for cross-lingual relation extraction to enhance the adaptability
and robustness of BERT-based algorithms for Chinese relation extraction in public
security applications.
8. Reducing Wrong Labels for Distantly Supervised Relation Extraction with
Reinforcement Learning
Tiantian Chen and collaborators propose a Deep Q Network (DQN)-based
denoiser to mitigate incorrect labels in distantly supervised relation extraction. Their
approach outperforms previous state-of-the-art baselines, effectively dealing with noisy
labels and improving overall model performance. The DQN-based denoiser employs
reinforcement learning techniques to learn optimal label correction strategies and
enhance the quality of training data for relation extraction models. However, the study
identifies limitations associated with the model's accuracy compared to other
approaches. Despite its efficacy, the DQN-based denoiser exhibits lower accuracy
rates, suggesting the need for further refinement to achieve optimal performance across
different datasets. Future research directions may involve exploring alternative
reinforcement learning algorithms, developing hybrid approaches combining
supervised and unsupervised learning, or investigating methods for adaptive label
correction to improve the robustness and generalization of relation extraction models
trained on noisy data.
Figure 2.2.4: The architecture of the PCNNs module.
9. Jointly Extract Entities and Their Relations from Biomedical Text
In IEEE Access, Jizhi Chen and colleagues propose a methodology combining
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for
jointly extracting entities and their relations from biomedical text. Their approach is
beneficial for biomedical text mining and the construction of biomedical knowledge
bases. The CNN-RNN model integrates both local and sequential information from
textual data to capture complex relationships between entities effectively. However,
limitations are observed in its accuracy for identifying relations between contextual
data, indicating areas for improvement in capturing complex semantic relationships.
Future research directions may involve exploring techniques for incorporating external
knowledge sources, developing multi-task learning frameworks, or investigating
methods for joint entity and relation extraction with attention mechanisms to enhance
the accuracy and robustness of biomedical text mining systems.
Entity recognition and relation extraction have become an important part of knowledge acquisition and have been widely applied in various fields, such as bioinformatics. However, prior state-of-the-art extraction models rely heavily on external features obtained from hand-crafted rules or natural language processing (NLP) tools. As a result, the performance of these models depends directly on the accuracy of the obtained features.
Figure 2.2.5: The architecture of the Bi-LSTM module.
10. MSnet: Multi-Head Self-Attention Network for Distantly Supervised Relation Extraction
Tingting Sun and co-authors introduce MSnet, a Multi-Head Self-Attention
Network-based label denoising method for distantly supervised relation extraction.
Their approach outperforms existing systems on popular evaluation datasets,
demonstrating superior performance in relation extraction tasks. The MSnet model
leverages self-attention mechanisms to capture long-range dependencies and identify
informative patterns in textual data. However, limitations arise in the form of high error
rates, which impact the robustness of the approach. Despite its advancements, the
MSnet approach requires further refinement to reduce error rates and enhance its
reliability across different datasets. Future research directions may involve exploring
techniques for fine-tuning attention mechanisms, developing ensemble learning
approaches, or investigating methods for incorporating domain-specific knowledge to
improve the accuracy and generalization of MSnet-based relation extraction systems.
Figure 2.2.6: The architecture of the MSNet based module.
CHAPTER 3
SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
In the existing system, a novel approach known as the FiTaCNN (Feature Filtering Table with CNN)
algorithm was employed for biomedical entity and relation extraction tasks. Through
the implementation of FiTaCNN, XING achieved a commendable accuracy rate of
78.88%, showcasing the algorithm's effectiveness in identifying and categorizing
biomedical entities and their relationships within textual data. However, despite the
promising results, the study elucidates certain limitations inherent in the FiTaCNN
approach. Specifically, challenges were encountered in the manipulation and prediction
of relations between entities, highlighting areas where further enhancements and
optimizations are warranted.
3.1.1 DISADVANTAGES OF EXISTING MODEL
Accuracy:
The existing Relation Extraction system yields only 78.88% accuracy.
Multi-Lingual Data: The existing system struggles with data represented in more than one language; such data is a challenge in processing and analysis, as different languages have different structures, semantics, and nuances.
Accuracy Issues: Problems or errors in the existing model that affect the correctness of the extracted relations.
3.2.1 ADVANTAGES OF PROPOSED SYSTEM
• The proposed system of Relation Extraction yields 80% accuracy.
• Work with any domain.
• Can access Multi-lingual data.
• It takes less time.
3.3 FEASIBILITY STUDY
3.3.1 TECHNICAL FEASIBILITY
Integration with Existing Systems: How well Promptify can integrate with current IT
infrastructure, including compatibility with existing databases and other software tools.
Scalability: Whether the system can handle the anticipated volume of data and user
queries without performance degradation.
3.3.2 ECONOMIC FEASIBILITY
Cost Analysis: Detailed analysis of the costs involved in licensing Promptify, possibly adapting it to specific needs, ongoing operational costs, and maintenance.
Return on Investment (ROI): Projected savings or revenue enhancements through
improved efficiency, faster information retrieval, and potentially new capabilities like
enhanced data analytics.
Budget Constraints: Whether the budget aligns with project requirements and
expected benefits.
3.3.3 LEGAL FEASIBILITY
Data Privacy: Compliance with regulations like GDPR or HIPAA, especially relevant if the extracted relations include personal data.
Intellectual Property: Ensuring that the use of third-party technologies like Promptify
does not infringe on intellectual property rights and that all licenses are in order.
3.3.4 OPERATIONAL FEASIBILITY
Operational feasibility examines how well the system will operate within the organization:
User Acceptance: The likelihood that employees and stakeholders will adopt and
efficiently use the new system.
Training Requirements: The level and extent of training needed for staff to effectively
use Promptify.
Support Structures: The internal support systems required, such as IT support and
customer service.
Schedule Feasibility
This involves determining whether the project can be completed within the desired
timelines:
Development Time: Time needed to set up, configure, and deploy the relation
extraction system.
CHAPTER 4
SYSTEM REQUIREMENT SPECIFICATION
4.1 HARDWARE REQUIREMENTS
• System : Pentium IV, 2 GHz
• Hard Disk : 40 GB
• RAM : 512 MB
• Monitor : 15-inch VGA Color
• Keyboard : Standard Keyboard
4.2 SOFTWARE REQUIREMENTS
• Platform : PYTHON TECHNOLOGY
• Tool : Python 3.6
• Back End : Jupyter
4.3 FUNCTIONAL REQUIREMENTS
In software engineering, a functional requirement defines a function of a
software system or its component. A function is described as a set of inputs, the
behaviour, and outputs. Functional requirements may be
calculations, technical details, data manipulation and processing and other specific
functionality that define what a system is supposed to accomplish. Behavioral
requirements describing all the cases where the system uses the functional requirements
are captured in use cases. Generally, functional requirements are expressed in the form
“system shall do <requirement>”. The plan for implementing functional requirements
is detailed in the system design. In requirements engineering, functional requirements
specify particular results of a system. Functional requirements drive the application
architecture of a system. A requirements analyst generates use cases after gathering and
validating a set of functional requirements. The hierarchy of functional requirements is:
user/stakeholder request -> feature -> use case -> business rule
Functional requirements may be technical details, data manipulation and processing, and other specific functionality that the project must provide to the user. The following are the functional requirements of our system:
1. The system shall accept unstructured text input from the user.
2. The system shall preprocess the input text (tokenization, normalization, and noise removal) before prompt generation.
3. The system shall generate structured prompts and obtain relation predictions from the fine-tuned LLM through the Promptify API.
4. The system shall present the extracted entities and their relationships to the user in a structured output format.
4.4 NON-FUNCTIONAL REQUIREMENTS
In systems engineering and requirements engineering, a non-functional
requirement is a requirement that specifies criteria that can be used to judge the
operation of a system, rather than specific behaviours. This should be contrasted with
functional requirements that define specific behaviour or functions. The plan for
implementing non-functional requirements is detailed in the system architecture. Generally, non-functional requirements are expressed in the form "system shall be <requirement>".
The following are the Nonfunctional requirements for our system:
AVAILABILITY:
A system’s “availability” or “uptime” is the amount of time that it is operational and available for use. It relates to whether the server is providing the service to the users when returning extraction results. As our system will be used by thousands of users at any time, it must always be available. If there are any cases of updating, they must be
performed in a short interval of time without interrupting the normal services made
available to the users.
EFFICIENCY:
Specifies how well the software utilizes scarce resources: CPU cycles, disk
space, memory, bandwidth etc. All of the above-mentioned resources can be used effectively by performing most of the validations on the client side and reducing the workload on the server.
FLEXIBILITY:
If the organization intends to increase or extend the functionality of
the software after it is deployed, that should be planned from the beginning; it influences choices made during the design, development, testing and deployment of the system. New modules can be easily integrated into our system without disturbing the
existing modules or modifying the logical database schema of the existing applications.
PORTABILITY:
Portability specifies the ease with which the software can be installed on all
necessary platforms, and the platforms on which it is expected to run. By using appropriate server versions released for different platforms, our project can easily be operated on any operating system, and hence can be said to be highly portable.
SCALABILITY:
Software that is scalable has the ability to handle a wide variety of system
configuration sizes. The nonfunctional requirements should specify the ways in which
the system may be expected to scale up (by increasing hardware capacity, adding
machines etc.). Our system can be easily expandable. Any additional requirements such
as hardware or software which increase the performance of the system can be easily
added. An additional server would be useful to speed up the application.
INTEGRITY:
Integrity requirements define the security attributes of the system, restricting
access to features or data to certain users and protecting the privacy of data entered into
the software. Certain features access must be disabled to normal users such as adding
the details of files, searching etc. which is the sole responsibility of the server. Access
can be disabled by providing appropriate logins to the users for only access.
USABILITY:
Ease-of-use requirements address the factors that constitute the capacity of the
software to be understood, learned, and used by its intended users. Hyperlinks will be
provided for each and every service the system provides through which navigation will
be easier. A system that has high usability coefficient makes the work of the user easier.
PERFORMANCE:
The performance constraints specify the timing characteristics of the software.
Extracting relations from the input text and presenting the results to the user is given high priority compared to other services and can be identified as the critical aspect of the system. Our system provides query-specific extraction that is effective and returns results within a short period, so the speed of the system is high.
CHAPTER 5
METHODOLOGY
1. Input Embedding:
Initially, our input tokens undergo a couple of encoding steps: they’re encoded
using an Embedding layer, followed by a Positional Encoding layer, and then the two
encodings are added together. The Embedding layer takes each token, which is a single
number, calculates its embedding, which is a sequence of numbers of length d_model,
and returns a tensor containing each embedding in place of the corresponding original
token.
Fig 5.2 Initial Data Processing
2. Positional Encoding:
The Positional Encoding layer adds information about the absolute position and
relative distance of each token in the sequence. Unlike recurrent neural networks
(RNNs) or convolutional neural networks (CNNs), Transformers don’t inherently
possess any notion of where in the sequence each token appears. Therefore, to capture
the order of tokens in the sequence, Transformers rely on a Positional Encoding.
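A minimal sketch of sinusoidal positional encoding added to token embeddings (written with NumPy purely for illustration; the report does not prescribe a specific framework):

# Sketch of sinusoidal positional encoding (illustrative; NumPy assumed).
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]          # token positions, shape (seq_len, 1)
    i = np.arange(d_model)[None, :]            # dimension indices, shape (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])       # even dimensions use sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])       # odd dimensions use cosine
    return pe

# Token embeddings (random here) and positional encodings are added together.
seq_len, d_model = 8, 16
token_embeddings = np.random.randn(seq_len, d_model)
encoder_input = token_embeddings + positional_encoding(seq_len, d_model)
print(encoder_input.shape)  # (8, 16)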
3. Feature Extraction:
The pre-trained GPT model inherently captures high-level features from the input text.
It learns to represent the semantic and syntactic properties of the text, encoding relevant
information that aids in the data summarization and relation extraction task. As the GPT
model processes the input prompts, it automatically extracts features from the text at
various levels of abstraction, capturing patterns, relationships, and contextual cues
necessary for predicting the relevant labels.
4. Promptify:
The "Promptify" step in the architecture refers to the process of adapting or
transforming the input text into prompts suitable for the Large Language Model (LLM),
specifically GPT-4. This step ensures that the input text is properly structured to elicit
the desired response or perform the intended task effectively when fed into the LLM.
• Adapting Text:
The input text undergoes adaptation to create a prompt that provides context
and guidance for the LLM. This adaptation may involve adding specific instructions,
keywords, or cues to prompt the LLM to generate the desired output.
• Structuring Prompt:
The adapted text is structured in a format that effectively communicates the task or question to the LLM. This format may vary depending on the nature of the task, such as text generation.
• Providing Context:
The prompt aims to provide sufficient context for the LLM to understand the task at hand and generate relevant responses. Overall, the Promptify step plays a crucial role in preparing the input text for interaction with the LLM, enabling effective communication and collaboration between the user and the language model to achieve desired outcomes in various NLP tasks, including data summarization and relation extraction, resulting in improved performance and efficiency in classification tasks involving multiple labels.
As we saw in the diagrammatic overview of the Transformer architecture, the next stage after the Embedding and Fine-tuning layers is the Decoder module. The Decoder consists of N copies of a Decoder Layer followed by a Layer Norm. The Layer Norm takes an input of shape (batch_size, seq_len, d_model) and normalizes it over its last dimension. At a high level, a Decoder Layer consists of two main steps: the attention step, which is responsible for the communication between tokens, and the feed-forward step, which is responsible for the computation of the predicted tokens.
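As an illustration of the kind of prompt this step might produce (the wording below is hypothetical; the actual template shipped with Promptify, relation_extraction.jinja, may differ):

# Hypothetical example of a structured relation-extraction prompt; the real
# Jinja template used by Promptify may be worded differently.
sentence = ("The patient is a 93-year-old female with a medical history of "
            "chronic right hip pain, osteoporosis, hypertension, depression, "
            "and chronic atrial fibrillation.")

prompt = (
    "You are a highly intelligent and accurate medical-domain relation "
    "extraction system. Extract the entities and the relationships between "
    "them from the passage below and return the output as a list of "
    "[entity_1, relation, entity_2] triples in JSON.\n\n"
    f"Passage: {sentence}\n"
    "Output:"
)
print(prompt)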
5. LLM Processing:
Large Language Models are sophisticated neural network architectures trained
on vast amounts of text data to understand, generate, and manipulate natural language.
LLM processing involves leveraging these models to perform a wide range of NLP
tasks, including text generation, question answering and text classification. LLM
processing typically involves encoding the input text, feeding it into the LLM model,
and generating predictions or outputs based on the model's learned representations and
parameters. These models, such as the GPT (Generative Pre-trained Transformer) series
by OpenAI, have achieved remarkable performance in various NLP tasks and have
become integral components in many NLP applications and systems.
6. Fine Tuning:
Fine-tuning in the context of Large Language Models (LLMs) like GPT
involves further training the pre-trained model on specific tasks or datasets to improve
its performance or adapt it to a particular use case. In data summarization and relation
extraction, fine-tuning GPT involves training the model to accurately classify text
inputs into multiple categories or labels simultaneously.
Here's why fine-tuning is used for GPT in data summarization and relation extraction:
Domain Adaptation:
Fine-tuning allows the model to adapt its parameters to better suit the
characteristics of the target task or dataset, improving its performance on data
summarization and relation extraction tasks within that domain.
Task-specific Learning:
Data summarization and relation extraction requires the model to learn task
specific patterns, features, and representations from labelled examples, enhancing its
ability to classify text accurately across multiple labels.
Performance Improvement:
Transfer Learning:
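As a simple illustration of how labelled examples for such fine-tuning could be prepared, the sketch below writes prompt/expected-output pairs to a JSONL file; the records are invented, and the exact schema required by a given fine-tuning API should be checked against its current documentation.

# Illustrative preparation of fine-tuning examples as a JSONL file.
# The example records are invented; the exact schema expected by the
# fine-tuning service must be checked against its documentation.
import json

examples = [
    {
        "prompt": "Extract relations: Barack Obama was born in Honolulu.",
        "completion": '[["Barack Obama", "born_in", "Honolulu"]]',
    },
    {
        "prompt": "Extract relations: Aspirin treats headache.",
        "completion": '[["Aspirin", "treats", "headache"]]',
    },
]

with open("relation_extraction_finetune.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")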
7. Multi-Head Attention:
The inputs to the multi-headed attention layer include three tensors called
query(Q), key (K), and value(V). In our particular model, we pass the same tensor for
all three of these parameters: the output x of the previous layer. We pre-process these
three tensors by first passing each through a linear layer, then splitting them into h
attention heads of size d_k, resulting in tensors of shape (batch_size, seq_len, h, d_k).
Attention is calculated using the following formula:
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
The last step in our Transformer is the Generator, which consists of a linear layer and a softmax executed in sequence.
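A minimal single-head sketch of this scaled dot-product attention, again using NumPy purely for illustration:

# Scaled dot-product attention, single head, written with NumPy.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)            # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, seq_len, seq_len)
    weights = softmax(scores, axis=-1)                 # attention weights
    return weights @ V                                 # (batch, seq_len, d_k)

batch_size, seq_len, d_k = 2, 5, 8
Q = np.random.randn(batch_size, seq_len, d_k)
K = np.random.randn(batch_size, seq_len, d_k)
V = np.random.randn(batch_size, seq_len, d_k)
print(scaled_dot_product_attention(Q, K, V).shape)     # (2, 5, 8)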
Figure 5.4: Decoder Layer Processing
8. Linear Function:
The purpose of the linear layer is to convert the third dimension of our tensor
from the internal-only d_model embedding dimension to the vocab_size dimension,
which is understood by the code that calls our Transformer. The result is a tensor
dimension of (batch_size, seq_len, vocab_size).
The purpose of the softmax is to convert the values in the third tensor dimension
into a probability distribution. This tensor of probability distributions is what we return
to the user.
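A matching sketch of this generator step, projecting from d_model to vocab_size and applying a softmax over the last dimension (NumPy, illustrative only):

# Generator sketch: linear projection from d_model to vocab_size, then softmax.
import numpy as np

batch_size, seq_len, d_model, vocab_size = 2, 5, 16, 100
decoder_output = np.random.randn(batch_size, seq_len, d_model)

W = np.random.randn(d_model, vocab_size)   # weights of the linear layer
b = np.zeros(vocab_size)                   # bias of the linear layer
logits = decoder_output @ W + b            # (batch_size, seq_len, vocab_size)

# Softmax over the last dimension gives a probability distribution per token.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = probs / probs.sum(axis=-1, keepdims=True)
print(probs.shape)          # (2, 5, 100)
print(probs[0, 0].sum())    # ~1.0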
Feature Extraction & Engineering:
• Tokenization:
Break down the text into individual tokens, such as words or subwords.
• Vectorization:
Convert tokens into numerical vectors that can be processed by machine learning algorithms.
• Word Embeddings:
• Sentence Embeddings:
• Feature Engineering:
Remove noise and irrelevant information from text data, such as HTML tags,
punctuation, stop words, and rare or misspelled words. Perform text normalization tasks
like lowercasing, stemming, or lemmatization to reduce vocabulary size and improve
generalization.
Sequence Modelling:
• Dimensionality Reduction:
• Cross-Validation:
CHAPTER 6
SYSTEM DESIGN
6.1 INPUT AND OUTPUT DESIGN
Input Design:
For relation extraction using Promptify, the input design involves structuring
the textual data in a format suitable for processing by the model. This typically includes
the following components:
Preprocessing:
Preprocess the raw textual data to clean and normalize it. This may involve tasks
such as tokenization, lowercasing, punctuation removal, stop word removal, and
stemming or lemmatization to standardize the text.
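A minimal preprocessing sketch along these lines, using only the Python standard library (the stop-word list is a toy example):

# Simple preprocessing sketch: lowercasing, punctuation removal, tokenization,
# and stop-word removal, using only the Python standard library.
import re

STOP_WORDS = {"a", "an", "the", "is", "of", "and", "in", "for", "with"}  # toy list

def preprocess(text):
    text = text.lower()                       # lowercasing
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation removal
    tokens = text.split()                     # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal

print(preprocess("The patient is a 93-year-old female with chronic hip pain."))
# ['patient', '93', 'year', 'old', 'female', 'chronic', 'hip', 'pain']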
Entity Recognition:
Identify and annotate entities within the text that are relevant to the relation
extraction task. This could involve named entity recognition (NER) using pre-trained
models or domain-specific dictionaries.
Text Representation:
Convert the pre-processed text into a format suitable for input into the Promptify
model. This may involve techniques such as encoding the text into numerical vectors
using word embeddings or tokenizing the text into sequences for input into the model.
Context Window:
Define the context window or scope within which relations are to be extracted.
This could involve specifying the maximum distance between entities or defining
specific sections of text where relations are expected to occur.
Output Design:
The output design for relation extraction using Promptify involves structuring
the extracted relations in a format that is useful for downstream applications. This
typically includes the following components:
Relation Extraction:
Extract relations between entities identified in the input text using the Promptify
model. This could involve identifying direct relationships, such as "X is-a Y" or "X
influences Y," or more complex relational patterns.
Relation Types:
Classify the extracted relations into predefined types or categories based on the
semantics of the relationship. This allows for better interpretation and analysis of the
extracted information.
Confidence Scores:
Output Format:
Define the format for presenting the extracted relations, such as structured data
formats (e.g., JSON, XML) or tabular formats (e.g., CSV). This facilitates easy
integration with downstream applications and analysis tools.
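As an illustration, the extracted relations could be serialized to JSON along these lines (the field names are hypothetical, not a fixed schema of the system):

# Hypothetical output structure for extracted relations; field names are
# illustrative, not a fixed schema of the system.
import json

extracted_relations = [
    {"entity_1": "patient", "relation": "has_condition", "entity_2": "osteoporosis",
     "confidence": 0.91},
    {"entity_1": "patient", "relation": "has_condition", "entity_2": "hypertension",
     "confidence": 0.88},
]
print(json.dumps(extracted_relations, indent=2))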
Visualization (Optional):
Figure 6.2.1 UML Diagram
Text Input Interface:
The text input interface serves as the entry point for users to input their raw text
data. Users can input unstructured textual information, such as articles, documents, or
any other form of text data, into the interface. This interface may take various forms,
such as a web-based form, a text file upload feature, or an API endpoint, depending on
the deployment environment and user requirements. The text input interface is designed
to be user-friendly, allowing users to easily provide the input text data for further
processing.
Preprocessing Module:
Upon receiving the raw text input, the preprocessing module performs a series of
data cleaning and preparation tasks to ensure that the text is in a suitable format for
processing. This includes steps such as removing noise and irrelevant information,
tokenization, stemming or lemmatization, and handling special characters or symbols.
Additionally, the preprocessing module may also perform tasks such as language
detection and text normalization to standardize the input text data. The goal of the
preprocessing module is to transform the raw text into a clean and structured format
that is ready for further analysis and processing.
Prompt Generator:
The prompt generator component plays a crucial role in structuring the input text
data into prompts that guide the subsequent interaction with the underlying GPT model.
Based on the preprocessed text, the prompt generator generates structured prompts
tailored to the specific task of relation extraction. These prompts may include
instructions, placeholders for entities or relation types, and contextual information
extracted from the input text. The prompt generator utilizes rule-based techniques or
machine learning models to generate prompts that effectively capture the relevant
information needed for relation extraction. The generated prompts serve as input for the
Promptify API to interact with the GPT model.
Promptify API:
The Promptify API acts as a bridge between the prompt generator component and
the underlying GPT model, facilitating the exchange of prompts and responses. Upon
receiving the structured prompts from the prompt generator, the Promptify API sends
these prompts to the GPT model for processing. The GPT model generates responses
based on the provided prompts, utilizing its language generation capabilities to produce
coherent and contextually relevant output. The Promptify API handles the
communication with the GPT model, managing the request-response cycle and
handling any errors or exceptions that may occur during the interaction.
Data Storage:
Output Interface:
The output interface serves as the endpoint for presenting the results of the
relation extraction process to the user or another system. This interface may take
various forms, such as a web-based dashboard, an API endpoint, or integration with
third-party applications. The output interface provides users with access to the extracted
relationships in a user-friendly and interpretable format, such as tables, graphs, or
visualizations. Additionally, the output interface may also offer functionalities such as
filtering, sorting, or exporting the extracted data for further analysis or downstream
processing. The goal of the output interface is to enable users to interpret and utilize the
extracted relationships effectively for their intended purposes.
In practice, the output interface might include real-time updates to reflect the
latest extraction results, ensuring that users are always working with current data. It
could also support user authentication and authorization, ensuring that only permitted
users can access or manipulate the data. Advanced features might include machine
learning integration, where users can apply additional models to the extracted
relationships for predictive analytics. Furthermore, customization options could allow
users to tailor the interface to their specific needs, enhancing the overall usability and
relevance of the information presented.
CHAPTER 7
IMPLEMENTATION
7.1 INTRODUCTION
Relation extraction using Fine-tune pipeline can be implemented by creating a
system that seamlessly integrates the capabilities of a natural language processing AI,
like GPT, to understand and extract relationships from text. First, text data input
through a user interface is pre-processed to standardize and clean it, which typically
involves tasks like tokenization, removing stop words, and normalizing text. This pre-
processed text is then used to generate specific prompts, crafted to ask the AI targeted
questions that facilitate the extraction of predefined relationships (e.g., between people,
organizations, and locations). These prompts are sent to the Promptify API, which
interacts with a GPT model to generate responses that contain the potential relationships
expressed in the text. The core of the system, the Relation Extraction Engine, analyses
these responses, employing techniques like pattern matching or advanced natural
language understanding, to identify and extract the relationships. Finally, these
relationships are stored in a structured format in a database for easy access and further
analysis, and the results are presented to the user through an output interface. This setup
leverages the advanced contextual understanding of GPT models to efficiently and
effectively pull relational data from large volumes of text.
PYTHON
Python is a high-level, interpreted, interactive and object-oriented scripting language. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.
• Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to PERL and PHP.
• Python is Interactive: You can actually sit at a Python prompt and interact with the interpreter directly to write your programs.
History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties
at the National Research Institute for Mathematics and Computer Science in the
Netherlands. Python is derived from many other languages, including ABC, Modula-3,
C, C++, Algol-68, Smalltalk, and Unix shell and other scripting languages. Python is
copyrighted. Like Perl, Python source code is now available under the GNU General
Public License (GPL). Python is now maintained by a core development team at the
institute, although Guido van Rossum still holds a vital role in directing its progress.
Python Features
• Easy-to-read: Python code is more clearly defined and visible to the eyes.
• A broad standard library: Python's bulk of the library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
• Interactive Mode: Python has support for an interactive mode which allows interactive testing and debugging of snippets of code.
• Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
• Extendable: You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
• GUI Programming: Python supports GUI applications that can be created and ported to many system calls, libraries and windows systems, such as Windows MFC, Macintosh, and the X Window System of Unix.
• Scalable: Python provides a better structure and support for large programs than shell
scripting.
• It can be used as a scripting language or can be compiled to byte-code for building large applications.
• It provides very high-level dynamic data types and supports dynamic type checking.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
These are the building blocks of OOP. A class creates a new object, and this object can be anything, from an abstract data concept to a model of a physical object. The class is the most basic component of object-oriented programming.
Previously, you learned how to use functions to make your program do something. Now
we will move into the big, scary world of Object-Oriented Programming (OOP). To be
honest, it took me several months to get a handle on objects. When I first learned C and
C++, I did great; functions just made sense for me. Having messed around with BASIC
in the early ’90s, I realized functions were just like subroutines so there wasn’t much
new to learn. However, when my C++ course started talking about objects, classes, and
all the new features of OOP, my grades definitely suffered. Once you learn OOP, you’ll
realize that it’s actually a pretty powerful tool. Plus, many Python libraries and APIs
use classes, so you should at least be able to understand what the code is doing. One
thing to note about Python and OOP: it’s not mandatory to use objects in your code; maybe you don’t need to have a full-blown class with
initialization code and methods to just return a calculation. With Python, you can get
as technical as you want. As you’ve already seen, Python can do just fine with
functions. Unlike languages such as Java, you aren’t tied down to a single way of doing
things; you can mix functions and classes as necessary in the same program. This lets you build the code in a way that works best for you. Objects are an encapsulation of variables and functions into a single
entity. Objects get their variables and functions from classes.
Python modules
Python allows us to store our code in files (also called modules). This is very
useful for more serious programming, where we do not want to retype a long function
definition from the very beginning just to change one mistake. In doing this, we are
essentially defining our own modules, just like the modules defined already in the
Python library. To support this, Python has a way to put definitions in a file and use
them in a script or in an interactive instance of the interpreter. Such a file is called a
module; definitions from a module can be imported into other modules or into the main
module.
7.2 CODE
# Imports: Promptify provides the Prompter/Pipeline/OpenAI wrappers, and
# display/HTML are used for notebook output (an IPython environment is assumed).
from promptify import Prompter, OpenAI, Pipeline
from IPython.display import display, HTML

# Define the API key for the OpenAI model (replace the placeholder with your own key).
api_key = "<YOUR_OPENAI_API_KEY>"

# Create an instance of the OpenAI model. Promptify currently supports all of OpenAI's
# models, with more generative models from Hugging Face and other platforms planned.
model = OpenAI(api_key)

# Load the relation-extraction prompt template and build the extraction pipeline.
prompter = Prompter('relation_extraction.jinja')
pipe = Pipeline(prompter, model)

senten = ("The patient is a 93-year-old female with a medical history of chronic right "
          "hip pain, osteoporosis, hypertension, depression, and chronic atrial "
          "fibrillation admitted for evaluation and management of severe nausea and "
          "vomiting and urinary tract infection.")
print(senten)

# Run relation extraction; the domain hint guides the prompt template
# ('medical' matches the example sentence above).
result = pipe.fit(senten, domain='medical', labels=None)

display(HTML('<h4>Sentence</h4>'))
print(senten)
print("\n")
display(HTML('<h4>Output</h4>'))
print(result)
CHAPTER 8
EVALUATION METRICS AND TESTING
8.1 EVALUATION METRICS
1. Precision
Precision measures the proportion of extracted relations that are actually correct, i.e. the ratio of true positives to all predicted positives. It is critical in scenarios where false positives are costly.
Precision = True Positives (TP) / (True Positives (TP) + False Positives (FP))
Potential Pitfalls:
Optimizing for precision alone may make the model overly conservative, causing it to miss many true relations and lowering recall.
2. Recall
Recall measures the ability of the model to find all relevant instances within a
dataset. It is critical in scenarios where missing a relation can have severe
consequences, such as in legal document analysis or medical record evaluations.
Recall = True Positives (TP) / (True Positives (TP) + False Negatives (FN))
Example:
Potential Pitfalls:
Optimizing for recall alone may lead to a large number of false positives, as the
model tries to capture as many relations as possible, potentially leading to an overload
of unhelpful information.
3. F1 Score
The F1 Score is a more balanced metric that combines precision and recall into
a single measure. It is particularly useful when seeking a balance between precision
and recall and there is an uneven class distribution.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Example:
Potential Pitfalls:
While F1 is a robust measure, it assumes that precision and recall are of equal
importance, which might not always be the case depending on specific application
needs.
4. Accuracy
Accuracy measures the proportion of true results (both true positives and true
negatives) in the dataset. It gives a straightforward metric of overall correctness.
Accuracy = (True Positives (TP) + True Negatives (TN)) / Total Observations
Example:
Pitfalls:
5. AUC-ROC
The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary
classifier system as its discrimination threshold is varied. The AUC measures the entire
two-dimensional area underneath the entire ROC curve from (0,0) to (1,1) and provides
an aggregate measure of performance across all possible classification thresholds.
Calculation:
The AUC-ROC involves plotting the true positive rate (Recall) against the false
positive rate at various threshold settings.
Example:
Pitfalls:
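Putting these metrics together, a small sketch of how they can be computed with scikit-learn (not a dependency stated in this report; the labels and scores below are invented for illustration):

# Computing precision, recall, F1, accuracy and AUC-ROC with scikit-learn.
# The ground-truth and predicted labels below are invented for illustration.
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             accuracy_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                    # gold relation labels (binary)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # predicted labels
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]   # predicted probabilities

print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_score))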
Advanced Considerations
Confidence Intervals:
Visualization:
Conclusion
8.2 TESTING INTRODUCTION
The testing process begins with unit testing, where individual components of the
system—such as the text preprocessing module, prompt generator, and relation
extraction engine—are tested in isolation to ensure they perform their specific functions
correctly. Following this, integration testing checks the interactions between these
components, ensuring that they work together seamlessly. System testing then evaluates
the complete system's functionality against specified requirements to ensure it behaves
as expected in a variety of scenarios.
8.3 TESTING METHODOLOGIES
1. Unit Testing
Unit testing in the context of relation extraction using the Promptify library
involves testing individual components or units of code to ensure that they perform as
expected. In the case of Promptify, unit testing would focus on verifying the
functionality of specific modules or functions responsible for tasks such as
tokenization, prompt generation, relation extraction algorithms, and output formatting.
For example, unit tests could be designed to validate the tokenization process,
ensuring that text inputs are correctly segmented into tokens according to language
rules and punctuation. This ensures that the input to the relation extraction algorithm is
properly formatted and can be processed accurately.
Additionally, unit tests could target the prompt generation mechanism within
Promptify, verifying that prompts are generated appropriately based on the input text
and desired relational patterns. This helps ensure that the prompts effectively guide the
model in identifying relevant relationships within the text.
Preprocessing Module:
Test cases might check whether text normalization converts different date formats
to a standard format or if the tokenizer correctly splits text into words or sentences.
Prompt Generator:
Verify if the generator composes prompts that are syntactically correct and
contextually appropriate based on the input from the preprocessing module.
Relation Extraction Engine:
Tests could involve checking if the engine correctly identifies and extracts named
entities and relationships from simulated API responses.
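A sketch of what such unit tests could look like with pytest; preprocess and generate_prompt below are hypothetical stand-ins for the project's preprocessing and prompt-generation modules, not functions from Promptify.

# Hypothetical unit tests for the preprocessing and prompt-generation modules,
# written with pytest. `preprocess` and `generate_prompt` are illustrative
# stand-ins, not real functions from Promptify or this project's codebase.

def preprocess(text):
    return text.lower().split()

def generate_prompt(sentence, domain):
    return f"Extract the relations in the {domain} domain: {sentence}"

def test_preprocess_tokenizes_and_lowercases():
    assert preprocess("Chronic Hip Pain") == ["chronic", "hip", "pain"]

def test_generate_prompt_contains_sentence_and_domain():
    prompt = generate_prompt("Aspirin treats headache.", domain="medical")
    assert "Aspirin treats headache." in prompt
    assert "medical" in prompt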
2. Integration Testing
Integration testing in the context of relation extraction using the Promptify library
involves verifying the interactions and interoperability between different components
of the system. Unlike unit testing, which focuses on testing individual units of code in
isolation, integration testing evaluates how these units work together as a cohesive
system.
In the case of Promptify, integration testing would assess how its various modules
and functionalities integrate with each other to perform the task of relation extraction
effectively. This includes testing the end-to-end workflow of relation extraction, from
text input to the final output of extracted relationships.
For example, integration tests may simulate the entire process of relation
extraction using Promptify by providing sample input texts, running the tokenization,
prompt generation, and relation extraction algorithms, and then verifying that the
extracted relations are accurate and properly formatted. These tests ensure that all
components of Promptify function together seamlessly to produce the desired outcome.
Test the data flow from the preprocessing module to the prompt generator and
from there to the Promptify API. For example, ensure that the pre-processed data
correctly influences the prompts and that these prompts elicit responses that are suitable
for relation extraction.
Validate the interaction between the Promptify API and the Relation Extraction
Engine to ensure that the system correctly interprets the extracted data from the
responses and populates the database accurately.
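As a hedged illustration of such an integration test, the sketch below wires preprocessing, prompt construction, and a mocked model call together. Here extract_relations() and the model.complete() interface are assumed project-level names, and the LLM is replaced by a mock so the test runs without calling the OpenAI API.

# Integration-test sketch: preprocessing -> prompt -> (mocked) model -> parsing.
# extract_relations() and model.complete() are assumed project-level names.
from unittest.mock import MagicMock


def extract_relations(text, model):
    """Illustrative end-to-end wrapper around the pipeline."""
    cleaned = " ".join(text.split())                      # simple preprocessing
    prompt = f"Extract (head, relation, tail) triples from: {cleaned}"
    raw = model.complete(prompt)                          # mocked in the test below
    return [tuple(line.split("|")) for line in raw.splitlines()]


def test_end_to_end_flow_with_mocked_model():
    model = MagicMock()
    model.complete.return_value = "patient|has_condition|osteoporosis"
    triples = extract_relations("  The patient   has osteoporosis. ", model)
    assert ("patient", "has_condition", "osteoporosis") in triples
    # The prompt actually sent must contain the pre-processed (normalised) text.
    sent_prompt = model.complete.call_args[0][0]
    assert "The patient has osteoporosis." in sent_prompt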
3. System Testing
System testing exercises the full functionality by processing a diverse dataset through the system, ensuring the software meets all specified requirements for extracting relationships. It also verifies the system's ability to handle edge cases, such as ambiguous or complex sentences that present particular challenges for relation extraction.
System testing in the context of relation extraction using the Promptify library
involves testing the entire system as a whole to ensure that it meets the specified
requirements and functions correctly in its intended environment. This type of testing
evaluates the system's behavior and performance from an end-to-end perspective,
focusing on validating its functionality, reliability, and usability.
System tests for Promptify would typically involve providing a diverse set of
input texts representing various domains and languages and evaluating the system's
ability to accurately extract relationships from them. These tests would cover a range
of scenarios and edge cases to ensure that Promptify can handle different types of
textual data effectively and produce reliable results consistently.
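A sketch of how such diverse inputs might be exercised is given below. It uses pytest parametrisation over texts from different domains and edge cases, with a stub standing in for the project's real pipeline entry point.

# System-test sketch: run texts from several domains and edge cases through the
# pipeline and check that every extracted relation is a well-formed triple.
import pytest


def extract_relations(text):
    """Stub for the real pipeline entry point; replace with the actual call."""
    return [("subject", "relation", "object")]


CASES = [
    "The patient was prescribed lisinopril for hypertension.",   # clinical text
    "Marie Curie was born in Warsaw and worked in Paris.",       # encyclopedic text
    "After the merger, Acme acquired Globex and Initech.",       # multiple relations
    "It affected them shortly after they reported it.",          # ambiguous pronouns
]


@pytest.mark.parametrize("text", CASES)
def test_every_extracted_relation_is_a_triple(text):
    for triple in extract_relations(text):
        assert len(triple) == 3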
4. Functional Testing
Functional testing in the context of relation extraction using the Promptify library
involves verifying that the system performs its intended functions correctly and in
accordance with specified requirements. This type of testing focuses on evaluating
Promptify's functionality from a user's perspective, ensuring that it accurately extracts
relationships from input text and produces the expected output.
One aspect of functional testing for Promptify is validating its ability to correctly
identify and extract relationships from different types of input text. This involves
providing a variety of input texts representing various domains and languages and
verifying that Promptify can accurately detect and extract the specified relationships,
regardless of the complexity or diversity of the input data.
5. Performance Testing
This testing assesses the system’s behavior under a significant load and its ability to
scale:
Scalability testing is another important aspect of performance testing for
Promptify. This involves evaluating how well the system can handle increasing loads,
such as processing a large volume of text or handling multiple concurrent requests. By
gradually increasing the workload and monitoring the system's performance metrics,
scalability testing helps determine Promptify's ability to scale up to meet growing
demands without compromising performance or stability.
Load Testing:
This might involve feeding large volumes of text to check how well the system
handles high demand, especially important for applications needing real-time data
processing.
Stress Testing:
Push the system beyond normal operational capacities to see how it handles
failure. This is crucial for determining the system's robustness and data integrity during
unexpected or high-load conditions.
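A rough sketch of such a load test is shown below. The extract_relations() function is a stub standing in for the real pipeline, and the simulated latency, request count, and worker count are arbitrary illustrative values.

# Load-test sketch: issue many concurrent requests against a stubbed extraction
# function and report throughput. All numbers are illustrative placeholders.
import time
from concurrent.futures import ThreadPoolExecutor


def extract_relations(text):
    """Stand-in for the real pipeline; sleep() imitates model latency."""
    time.sleep(0.05)
    return [("patient", "has_condition", "hypertension")]


def run_load_test(n_requests=200, workers=20):
    texts = [f"Synthetic clinical note {i}" for i in range(n_requests)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(extract_relations, texts))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed          # requests processed per second


if __name__ == "__main__":
    print(f"Throughput: {run_load_test():.1f} requests/second")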
6. Validation Testing
Validation testing ensures that the software meets the organization's standards
and end-user requirements, particularly focusing on the output’s accuracy:
Precision and Recall:
Test cases would calculate the precision and recall for the extracted relationships, ensuring the system reliably identifies correct relationships without missing significant ones.
F1 Score:
This harmonic mean of precision and recall is used to evaluate the system's
overall accuracy, providing a more comprehensive measurement than using either
metric alone.
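The sketch below shows how these metrics can be computed over sets of (head, relation, tail) triples. The gold and predicted triples are illustrative examples, not drawn from the project's dataset.

# Validation-metric sketch: precision, recall and F1 over sets of relation triples.
def precision_recall_f1(predicted, gold):
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


gold = {("patient", "has_condition", "osteoporosis"),
        ("patient", "has_condition", "hypertension")}
predicted = {("patient", "has_condition", "osteoporosis"),
             ("patient", "has_condition", "nausea")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")   # 0.50, 0.50, 0.50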
7. User Acceptance Testing (UAT)
UAT is the final phase of testing, conducted to ensure that the system can handle real-world tasks and is ready for production:
Users test the system in real conditions with actual data inputs. They would verify
whether the system meets their requirements and is user-friendly, providing feedback
on any issues or improvements.
User acceptance testing (UAT) in the context of relation extraction using the
Promptify library involves validating that the system meets the requirements and
expectations of its end-users. Unlike other types of testing that focus on technical
aspects, UAT is conducted by the intended users or stakeholders to ensure that
Promptify meets their needs and operates effectively in real-world scenarios.
During UAT for Promptify, users or stakeholders would interact with the system
to perform tasks related to relation extraction, such as providing input texts, reviewing
the extracted relationships, and assessing the usability of the system's interface or
integration within their workflow. The focus is on evaluating Promptify's functionality,
usability, and overall suitability for the intended use cases and user requirements.
One aspect of UAT for Promptify involves validating the accuracy and relevance
of the extracted relationships. Users would review the extracted relationships against
their expectations and domain knowledge, verifying that Promptify correctly identifies
and extracts the desired relationships from the input text. This ensures that Promptify
delivers meaningful and reliable results that align with the users' needs and objectives.
8. Continuous Testing
Regression Testing: Regularly re-run all or a subset of tests to ensure that new code
changes do not adversely affect existing functionality.
Continuously test system performance and reliability as new updates and features
are added to ensure that the quality remains high throughout the development process.
In a continuously evolving codebase, maintaining software quality is paramount. Regression testing, a key component of continuous testing, involves
regularly re-running tests to verify that recent code changes haven't introduced any
unintended side effects or regressions. By automating this process and integrating it
seamlessly into the development pipeline, teams can maintain confidence in the
stability and reliability of their codebase, even as it undergoes frequent updates and
enhancements.
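As a small sketch of how such a regression gate might be automated, the whole suite can be re-run on every change; this assumes the tests above live under a tests/ directory and that pytest is installed.

# Regression-gate sketch: re-run the full test suite and fail the build if any
# previously passing behaviour has broken. Assumes tests live under tests/.
import sys

import pytest

if __name__ == "__main__":
    # -q keeps CI logs short; a non-zero exit code fails the pipeline.
    sys.exit(pytest.main(["-q", "tests/"]))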
CHAPTER 9
RESULTS AND DISCUSSION
9.1 RESULT
Input: “The patient is a 93-year-old female with a medical history of chronic right hip
pain, osteoporosis, hypertension, depression, and chronic atrial fibrillation admitted
for evaluation and management of severe nausea and vomiting and urinary tract
infection.”
Output:
[Screenshot of the relation triples extracted by the system for the above input.]
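For reference, a minimal sketch of how such an input could be passed through Promptify is shown below. It assumes the Prompter/Pipeline interface of recent Promptify releases and a relation-extraction prompt template; the exact template name, keyword arguments, and output structure are assumptions that may differ between library versions, and a valid OpenAI API key is required.

# Hedged sketch of running the sample input through Promptify.
# The template name ("relation_extraction.jinja"), the keyword arguments, and
# the Prompter/Pipeline interface are assumptions based on recent Promptify
# releases and may need adjusting for the installed version.
import os

from promptify import OpenAI, Prompter, Pipeline

sentence = (
    "The patient is a 93-year-old female with a medical history of chronic "
    "right hip pain, osteoporosis, hypertension, depression, and chronic "
    "atrial fibrillation admitted for evaluation and management of severe "
    "nausea and vomiting and urinary tract infection."
)

model = OpenAI(os.environ["OPENAI_API_KEY"])          # LLM backend
prompter = Prompter("relation_extraction.jinja")      # assumed template name
pipeline = Pipeline(prompter, model)

# Keyword arguments follow the library's NER example and are assumptions here.
result = pipeline.fit(sentence, domain="clinical", labels=None)
print(result)   # structured list of extracted entities and relations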
9.3 COMPARISON GRAPH
[Bar chart of F1 Score (0 to 1) compared across Case 1, Case 2, Case 3, and Case 4.]
Output analysis for relation extraction using Promptify involves evaluating the
quality of the extracted relations, assessing their relevance, accuracy, and usefulness
for downstream applications. The following is an overview of the output analysis process:
Relevance Assessment:
Evaluate the coverage of the extracted relations, i.e., whether the model captures all
relevant relationships present in the input text or misses important ones.
Accuracy Evaluation:
Conduct error analysis to identify common types of errors made by the model,
such as false positives (incorrectly extracted relations) and false negatives (missed
relations).
Assess the semantic coherence of the extracted relations to ensure that they
convey meaningful and interpretable information.
Identify instances where the model exhibits high confidence or uncertainty in its
predictions and investigate the reasons behind them.
Error Analysis:
Investigate specific cases of errors to determine whether they stem from inherent
challenges in the data, model architecture, or training process.
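A small helper like the one sketched below (with hypothetical names) is often the starting point for this kind of error analysis, separating false positives from false negatives so that individual mistakes can be inspected by hand.

# Error-analysis sketch: split predictions into false positives and false
# negatives for manual inspection. Helper and variable names are illustrative.
def split_errors(predicted, gold):
    false_positives = predicted - gold     # extracted, but not in the gold data
    false_negatives = gold - predicted     # in the gold data, but missed
    return false_positives, false_negatives


gold = {("patient", "has_condition", "osteoporosis")}
predicted = {("patient", "has_condition", "nausea")}
fps, fns = split_errors(predicted, gold)
print("false positives:", fps)
print("false negatives:", fns)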
Evaluate the performance of the relation extraction model on specific use cases
or application scenarios relevant to the target domain.
Assess how well the extracted relations meet the requirements of downstream
applications, such as knowledge graph construction, information retrieval, or question
answering.
Identify areas where Promptify outperforms existing techniques or areas for
improvement compared to the current state-of-the-art.
Evaluate the scalability and efficiency of the relation extraction process using
Promptify, particularly concerning processing speed and resource utilization.
Assess the model's ability to handle large volumes of textual data efficiently
without compromising on accuracy or quality.
CHAPTER 10
CONCLUSION AND FUTURE SCOPE
10.1 CONCLUSION
In conclusion, the utilization of Promptify represents a significant advancement
in the field of natural language processing (NLP), particularly in the domain of relation
extraction. By harnessing the power of prompt-based learning methodologies and
leveraging extensive pre-training techniques, Promptify offers a transformative
solution for automatically extracting semantic relationships from unstructured textual
data. This innovative approach not only streamlines the process of identifying and
extracting relational patterns but also demonstrates remarkable efficacy across diverse
domains and languages.
10.2 FUTURE SCOPE
Looking ahead, several promising avenues for further research and development
in relation extraction using Promptify emerge:
Incremental Learning:
Extending the approach so that new relation types, entity categories, and domains can be incorporated incrementally, without re-designing prompts or retraining from scratch.
Integration with Downstream Applications:
Coupling the extracted relations more tightly with downstream applications such as knowledge graph construction, information retrieval, and question answering, so that improvements in extraction quality translate directly into better end results.
By pursuing these avenues, researchers and practitioners can further propel the
capabilities of relation extraction using Promptify, unlocking new vistas for knowledge
discovery, information extraction, and semantic understanding in the ever-evolving
landscape of natural language processing.
REFERENCES
[10] Y. S. Chan and D. Roth, ‘‘Exploiting syntactico-semantic structures for relation
extraction,’’ in Proc. 49th Annu. Meeting Assoc. Comput. Linguistics, Human
Lang. Technol., Portland, OR, USA, Jun. 2011, pp. 551–560.
[11] Y. Lin, H. Ji, F. Huang, and L. Wu, ‘‘A joint neural model for information
extraction with global features,’’ in Proc. 58th Annu. Meeting Assoc. Comput.
Linguistics, Jul. 2020, pp. 7999–8009.
[12] X. Ren, Z. Wu, W. He, M. Qu, C. R. Voss, H. Ji, T. F. Abdelzaher, and J. Han,
‘‘CoType: Joint extraction of typed entities and relations with knowledge
bases,’’ in Proc. 26th Int. Conf. World Wide Web, Apr. 2017, pp. 1015–1024.
[13] S. Zheng, F. Wang, H. Bao, Y. Hao, P. Zhou, and B. Xu, ‘‘Joint extraction of
entities and relations based on a novel tagging scheme,’’ in Proc. 55th Annu.
Meeting Assoc. Comput. Linguistics, Vancouver, BC, Canada, 2017, pp. 1227–
1236.
[14] S. Wang, Y. Zhang, W. Che, and T. Liu, ‘‘Joint extraction of entities and
relations based on a novel graph scheme,’’ in Proc. 27th Int. Joint Conf. Artif.
Intell., Stockholm, Sweden, Jul. 2018, pp. 4461–4467.
[15] M. Zhang, Y. Zhang, and G. Fu, ‘‘End-to-end neural relation extraction with
global optimization,’’ in Proc. Conf. Empirical Methods Natural Lang. Process.,
Copenhagen, Denmark, 2017, pp. 1730–1740.
[16] Q. Xia, B. Zhang, R. Wang, Z. Li, Y. Zhang, Huang, L. Si, and M. Zhang, ‘‘A unified span-based approach for opinion mining with syntactic constituents,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., 2021, pp. 1795–1804.
[17] Z. Zhong and D. Chen, ‘‘A frustratingly easy approach for entity and relation extraction,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., 2021, pp. 50–61.
[18] Y. Wang, B. Yu, Y. Zhang, T. Liu, H. Zhu, and L. Sun, ‘‘TPLinker: Single-stage joint extraction of entities and relations through token pair linking,’’ in Proc. 28th Int. Conf. Comput. Linguistics, 2020, pp. 1572–1582.
[19] Z. Li, L. Fu, X. Wang, H. Zhang, and C. Zhou, ‘‘RFBFN: A relation-first blank filling network for joint relational triple extraction,’’ in Proc. 60th Annu. Meeting Assoc. Comput. Linguistics, Student Res. Workshop, 2022, pp. 10–20.
[20] D. Ye, Y. Lin, P. Li, and M. Sun, ‘‘Packed levitated marker for entity and relation extraction,’’ in Proc. 60th Annu. Meeting Assoc. Comput. Linguistics, 2022, pp. 4904–4917.
[21] Q. Li, N. Yao, N. Zhou, J. Zhao, and Y. Zhang, ‘‘A joint entity and relation extraction model based on efficient sampling and explicit interaction,’’ ACM Trans. Intell. Syst. Technol., vol. 14, no. 5, Oct. 2023.
[22] T. Zhao, Z. Yan, Y. Cao, and Z. Li, ‘‘Entity relative position representation based multi-head selection for joint entity and relation extraction,’’ in Chinese Computational Linguistics. Hainan, China: Springer, 2020, pp. 184–198.
[23] X. Li, Y. Li, J. Yang, H. Liu, and P. Hu, ‘‘A relation aware embedding mechanism for relation extraction,’’ Appl. Intell., vol. 52, pp. 10022–10031, Jan. 2022.
[24] D. Yu, C. Zhu, Y. Yang, and M. Zeng, ‘‘JAKET: Joint pre-training of knowledge graph and language understanding,’’ in Proc. AAAI Conf. Artif. Intell., Jun. 2022, vol. 36, no. 10, pp. 11630–11638.
[25] W. Wu, Z. Zhu, J. Qi, W. Wang, G. Zhang, and P. Liu, ‘‘A dynamic graph expansion network for multi-hop knowledge base question answering,’’ Neurocomputing, vol. 515, pp. 37–47, Jan. 2023.
[26] Q. Xia, B. Zhang, R. Wang, Z. Li, Y. Zhang, Huang, L. Si, and M. Zhang, ‘‘A unified span-based approach for opinion mining with syntactic constituents,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., 2021, pp. 1795–1804.
[27] Z. Zhong and D. Chen, ‘‘A frustratingly easy approach for entity and relation extraction,’’ in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., 2021, pp. 50–61.
[28] Y. Wang, B. Yu, Y. Zhang, T. Liu, H. Zhu, and L. Sun, ‘‘TPLinker: Single-stage joint extraction of entities and relations through token pair linking,’’ in Proc. 28th Int. Conf. Comput. Linguistics, 2020, pp. 1572–1582.
[29] Z. Li, L. Fu, X. Wang, H. Zhang, and C. Zhou, ‘‘RFBFN: A relation-first blank filling network for joint relational triple extraction,’’ in Proc. 60th Annu. Meeting Assoc. Comput. Linguistics, Student Res. Workshop, 2022, pp. 10–20.
[30] D. Ye, Y. Lin, P. Li, and M. Sun, ‘‘Packed levitated marker for entity and relation extraction,’’ in Proc. 60th Annu. Meeting Assoc. Comput. Linguistics, 2022, pp. 4904–4917.