NLP for GDPR Compliance in Food Safety
NLP for GDPR Compliance in Food Safety
Abstract—This research explores the application of Large RQ2: How does the legal requirements classification
Language Models (LLMs) for automating the extraction of approach fare against baselines? To assess whether our
requirement-related legal content in the food safety domain and approach offers benefits over simpler solutions, RQ2 compares
checking legal compliance of regulatory artifacts. With Industry
4.0 revolutionizing the food industry and with the General Data our approach against two baselines: one based on LSTM and
arXiv:2404.17522v1 [[Link]] 26 Apr 2024
Protection Regulation (GDPR) reshaping privacy policies and the other based on keyword search. To answer RQ2, we report
data processing agreements, there is a growing gap between accuracy using the standard classification metrics.
regulatory analysis and recent technological advancements. This RQ3: How do state-of-the-art LLMs, both open-source
study aims to bridge this gap by leveraging LLMs, namely and closed-source, fare against one another in terms of
BERT and GPT models, to accurately classify legal provisions
and automate compliance checks. Our findings demonstrate accuracy for zero-shot learning and fine-tuning in regula-
promising results, indicating LLMs’ significant potential to en- tory compliance checking? This research question compares
hance legal compliance and regulatory analysis efficiency, notably generative LLMs in terms of their ability to understand and
by reducing manual workload and improving accuracy within apply regulatory requirements. To answer RQ3, we report
reasonable time and financial constraints. accuracy using the standard classification metrics.
Index Terms—Legal Compliance, Legal Requirements, Large
Language Models, GPT-4, GPT-3.5, BERT, GDPR, Food Safety, RQ4: Compared to traditional sentence-level analysis,
Industry 4.0, Internet of Things. how does incorporating paragraph context and the textual
specification of compliance rules enhance the performance
I. I NTRODUCTION of compliance checking? This research question aims to
The evolution of Industry 4.0 and regulations like GDPR assess the enhancement in accuracy of compliance checking
demand innovative approaches to ensure that food safety soft- brought about by the integration of paragraph-level context
ware systems and regulatory artifacts, such as Data Processing and rules, as opposed to traditional single-input sentence-
Agreement (DPAs), comply with legal standards. Our review level methods. To answer RQ4, we report accuracy using the
of the literature and existing methodologies reveals a gap in standard classification metrics.
applying LLMs to domain-specific regulatory analysis. This RQ5: What are the cost and time implications associated
study is propelled by the need to automate the interpretation with our proposed approach? To be practical, our approach
and application of legal provisions in the food industry and must generate predictions within a reasonable time. RQ5
data processing regulatory artifacts and is also motivated by provides a practical assessment of the approach’s resource
industry partners. This research aims to develop an LLM- efficiency, considering time and costs in terms of computing
based methodology for classifying food safety and checking and financial resources.
the compliance of regulatory provisions, seeking to outperform Structure. Section II presents background. Section III com-
traditional methods in efficiency, accuracy, and cost. pares the proposed method with related work. Section IV
My research is driven by two main objectives: to develop describes our proposed method. Section V provides current
an automated methodology for classifying legal provisions evaluation results. Section VI lays out the future research and
pertaining to food safety, and to enhance the efficacy and the timeline.
interoperability of compliance checking of regulatory artifacts
II. BACKGROUND
using LLMs. I propose that LLMs can significantly outperform
existing methods in these tasks, thereby enhancing legal com- In this section, the necessary legal and technical background
pliance and regulation analysis with improvements in terms of for my approach is provided.
accuracy, time, and financial cost.
A. Food-safety Regulations
A. Research Questions To ensure that the food supply is safe, countries around
The main research questions (RQs) are: the world have established regulatory frameworks to govern
RQ1: How accurate is the legal requirements classifi- the production, distribution, and consumption of food prod-
cation approach? RQ1 examines alternative realizations of ucts. Our primary focus is on the Safe Food for Canadians
our classification approach using different BERT variants and Regulations (SFCR). SFCR consolidates several federal Cana-
GPT-3.5-turbo. To answer RQ1, we report accuracy using the dian food regulations that apply, among other things, to the
standard classification metrics. import, export, and inter-provincial trade of food, including
1
ingredients. In many cases, SFCR defers the elaboration of Strategy 1: Extracting metadata or semantic frames and then
requirements for specific food products to Food-Specific Re- executing predefined rules over the metadata/frames to check
quirements and Guidance (FSRG) regulations. We use content compliance such as [1]–[3], [6], [7].
from FSRG regulations both in our qualitative analysis and in Strategy 2: Projecting regulatory texts into an embedding
our evaluation. In addition to SFCR and FSRG regulations, space and then measuring the similarity between these texts
we consider selected parts of the food-safety regulations by against (textual) compliance rules or a labelled group of
the US Food and Drug Administration (FDA) to examine the regulatory provisions. An example of research employing this
extent to which our classification solution is transferable to strategy is that of Amaral et al. [1], [2], who create sentence
jurisdictions outside Canada. embeddings and utilize semantic similarity for completeness
checking of privacy policies and DPAs.
B. General Data Protection Regulation (GDPR)
Strategy 3: Using encoder-only single-input transformer
GDPR is a privacy and data protection law enacted by models, such as BERT [8], to classify which compliance rules
the European Union to regulate the processing of personal are satisfied by each legal text. An example of work applying
data and enhance individuals’ control over their information. this strategy is that of Ilyas et al. [4], who use BERT to
We use Data Processing Agreements (DPAs), as defined by construct a multi-label classification pipeline for identifying
GDPR, to illustrate ideas and conduct evaluations. DPAs are sentences in DPAs that meet specific compliance rules.
binding agreements that specify the responsibilities and rights
of controllers and processors. Most of the content in DPAs B. Related Work on the Classification of Requirements-related
is software-related, making them of direct interest to the RE Legal Content
community [1]. For GDPR compliance, DPAs must follow
GDPR Article 28. 1) Metadata Extraction from Legal Texts: Legal documents
are typically lengthy and complex, making it difficult for indi-
III. R ELATED W ORK viduals to locate relevant information quickly and accurately.
Significant progress has been made in compliance checking This has led to the development of automated approaches for
of regulatory artifacts. Below we discuss limitations in existing information extraction from legal texts such as [9]–[12]. These
research for compliance checking fall under two categories. works, despite being amongst the first to focus on information
For our classification of requirements-related legal content extraction from legal texts, do not use ML techniques and are
from food safety regulations, we consider three prior lines of limited to privacy policies.
work: (1) extracting metadata from legal texts, (2) applying Studies such as [1]–[3] differ from ours in both domain and
LLMs to natural-language requirements, and (3) monitoring the utilization of feature-based learning as opposed to LLM-
food safety through IoT. To our knowledge, no prior work based approaches.
exists on automated classification of requirements-related pro- Some studies link privacy policies to software code [13]–
visions in food-safety regulations. [15]. In contrast, our approach focuses on food-safety regula-
tions. We use BERT variants and GPT for classification and
A. Related Work on Compliance Checking of Legal Content information extraction.
1) Reliance on sentences as units of analysis: Existing Zeni et al. [16], [17] and Sleimi et al. [18] extract semantic
approaches mostly promote a sentence-by-sentence analysis of metadata from legal texts using NLP and semantic-web tech-
legal texts [1]–[4]. We have observed three key issues caused niques, but their focus and domain differ from ours. Abualhaija
by this decision that can significantly affect accuracy. First, et al. [19] employ LLMs for question answering over GDPR;
interpreting sentences often requires contextual understanding they have different analytical goals and inputs from ours.
beyond the immediate sentence structure, as sentences can 2) LLMs and Natural-language (NL) Requirements: Sev-
be related to previously mentioned categories or definitions. eral studies use LLMs, particularly BERT, for analyzing NL
Second, it is common for a legal concept to be defined, with requirements, with tasks including classifying non-functional
subsequent sentences drawing upon the definition. Third, legal requirements [20]–[22], classifying and summarizing contrac-
texts are frequently interwoven with cross-references [5]. This tual obligations [23], [24], detecting requirements smells [25],
practice adds complexity by requiring an understanding of the classifying security requirements [26], classifying user feed-
broader content across a collection of legal documents. back [27], classifying requirements dependencies [28], detect-
2) Automation strategies lack justification for decisions and ing causality [29], classifying and clustering coreferences [30],
are either coarse-grained or entail significant manual effort [31], transforming NL requirements into formal specifica-
to build: Automation for legal texts employs one of the tions [32], predicting issue links [33], classifying issue sen-
following three strategies or a combination thereof. We argue tences [34], identifying similar requirement [35], check-
that these strategies not only fail to offer justification and ing completeness [36], and generating elicitation-interview
rationale for decisions – a capability now achievable with scripts [37]. These studies showcase the versatility of LLMs
LLMs – but they either are too coarse-grained, potentially in requirements-analysis tasks. We have benefited from the
compromising accuracy, or require considerable manual effort, best practices in the above-cited strands of work; however,
making compliance automation costly to implement. our analytical objectives set us apart.
2
3) Food-monitoring Systems: Food monitoring is a crucial Hyperparameter(s) Fine-tuning
Data
area of research for improving food safety and enhancing
supply-chain efficiency. Bouzembrak et al. [38] have con-
ducted a systematic literature review to examine the poten- Prompt 2 LLM-based 3
Construction Compliance Checking
tial of IoT for food safety and quality monitoring, as well Prompt
as food traceability and supply chain. They observe that
most existing IoT research aimed at these objectives focuses
on measurement solutions using sensors and communication
technologies. These solutions are technology-driven and not Compliance Prompt Passages from
Rule(s) Template Regulatory Artifact Compliance
directly linked or traceable to regulations. Our qualitative study Report
pr1
<latexit sha1_base64="FPXtabmdPbeQhp/pU1fr5K0aIz8=">AAACYXicZZHNSsNAFIWn8a+Nf21duilmIy5KIojbohR0V8FWoRNkMrmpo8lMmJlqdcg7uNU3c+2LOGmLEHs3czlzz8fh3ihPmdK+/11z1tY3NrfqDXd7Z3dvv9lqj5SYSgpDKlIh7yOiIGUchprpFO5zCSSLUriLni/L/7sXkIoJfqvfcggzMuEsYZRoK40wjYVWD03P7/rz6qw2wbLx0LIGD61aH8eCTjPgmqZEqTEVPAEJnEJorvv9vpaEFy6eKsgJfSYTMPOsFWmcx4mGWWgmkuSPjM6qhqlMqwKRkrwVros5vFKRZYTHBkMKZYpiHITG4EQIzYUGxd4BW7ZWifGCoiiqpoxxVhr/XFyXDnNWGL9wOx1sFwpUl3qVUsUoOyNyRucY/KJsUDgx3VMLxjmRmAvGYxvOzCHMvlGiQDJQnQXOLj74v+bVZnTaDfxucBN4vYvlCeroEB2hYxSgc9RDV2iAhoiiJ/SBPtFX7cdpOE2nvRh1akvPAaqUc/gLwxi8HA==</latexit>
sha1_base64="m94MPbhQ0jNffvSn8H3/t4KQWm4=">AAACYXicZZDNSgMxFIVvx//xr+rSTbEbcVFmBHElFqWgOwVbhaZIJnOnRmeSIUn9CwM+glt9IZ/BtS9iphVh9GxyObnn43CjPOXaBMFnzZuanpmdm1/wF5eWV1bra+s9LUeKYZfJVKqriGpMucCu4SbFq1whzaIUL6O74/L/8h6V5lJcmKccBxkdCp5wRo2zeoTF0ujrejNoBWM1/g/hz9A8/PAPXgDg7Hqt1iGxZKMMhWEp1brPpEhQoWA4sKedTscoKgqfjDTmlN3RIdpx14rVz+PE4OPADhXNbzh7rAZGKq0aVCn6VPg+EfjAZJZREVuCKZYtin44sJYkUhohDWr+jMSxjU5sMyyKohrKuOBl8DclTJmwe4UNCr/RIO6gyEzpVylVjHY7MudsjCH32hXFHdvadWCSU0WE5CJ25ewYwt0bJRoVR92Y4Nzhw79n/j/0dlth0ArPw2b7CCaah03Ygm0IYR/acAJn0AUGt/AKb/Be+/IWvLq3Pln1aj+ZDajI2/wGA1K96Q==</latexit>
sha1_base64="sROku4tHxBM7SCz7EWzFzPdrdr8=">AAACYXicZZDNSgMxFIXT8X/8q7rsptiNuCgzgrgSRSnoTsG2QlNKJnOnRmeSkKT+hXkHt7r1YXwEce2LmGlFGHs2uZzc83G4kUyZNkHwVfFmZufmFxaX/OWV1bX16sZmR4uRotCmIhXqOiIaUsahbZhJ4VoqIFmUQje6Oy3+u/egNBP8yjxJ6GdkyFnCKDHO6mAaC6MH1UbQDMaqTw/h79A4+vAP5funfzHYqLRwLOgoA25oSrTuUcETUMAp9O15q9UyivDcxyMNktA7MgQ77lqyejJODDz27VARecPoYzkwUmnZIEqRp9z3MYcHKrKM8NhiSKFokffCvrU4EcJwYUCzZ8CObXRiG2Ge5+VQxjgrgn8pboqE3c9tkPv1OnYHBWoKv0wpY7TbEZLRMQbfa1cUdm1zz4GxJApzwXjsytkxhLk3SjQoBro+wbnDh//PPD109pph0Awvw8bxCZpoEdXQNtpBITpAx+gMXaA2ougWvaBX9Fb59pa8qrc5WfUqv5ktVJJX+wGdcb9d</latexit>
Keyword-based
<latexit sha1_base64="VnNRRHtStAjW/uENRb6RNM6TAg8=">AAACaHicZVHNThsxGHSW8rctEMoBoV5WyaXiEO1GAq4gFIn2BBIBpHgVeZ1vg8WubWyHn1p+Dq70PXiRnnrvA1Q94k0Aactc/Gn8zWg8zmTBtInjX41g7sP8wuLScvjx08rqWnP985kWE0WhT0Uh1EVGNBSMQ98wU8CFVEDKrIDz7Oqwuj+/AaWZ4KfmXkJakjFnOaPEeCrFBu6Mzq1UbsiHzXbciaeI3g/Jy9Dej/5+//2v9XQ8XG/08EjQSQnc0IJoPaCC56CAU0jtt16vZxThLsQTDZLQKzIGO41cowZylPsYqR0rIi8ZvasLJqqoE0Qpcu/CEHO4paIsCR9ZDAVUKdwgSa3FuRCGCwOa/YDXJ7YT51xdVDLOKuGbiptKYXecjV0YRdj3CtRUfN2lbqP9jpCMTm3wjfZBYdt2ut4YS6IwF4yPfDg7NWH+zHINioGOZna++OT/mt8PZ91OstvZOfE/cIBmWEJfUAt9RQnaQ/voCB2jPqLoGj2gR/Sz8SdoBpvB1mw1aLxoNlANQesZ57XD5w==</latexit>
<latexit sha1_base64="HGDDMgGXvNdN/aepGWeX+nlLPlU=">AAACaHicZVHNThsxGHS2tMBC2wAHhLiskgvqIdpFAq5UVaTCCSQCSPEq8jrfpha7tms7/Fl+jl7Le/AinLjzAIgj3gSQFubiT+NvRuNxJgumTRzfNYJPM5+/zM7NhwuLX799by4tH2sxVhR6VBRCnWZEQ8E49AwzBZxKBaTMCjjJzn5V9yfnoDQT/MhcSUhLMuIsZ5QYT6XYwKXRuZXKDZJBsx134gmij0PyMrR3o8f9+6fW7cFgqdHFQ0HHJXBDC6J1nwqegwJOIbV73W7XKMJdiMcaJKFnZAR2ErlG9eUw9zFSO1JE/mH0si4Yq6JOEKXIlQtDzOGCirIkfGgxFFClcP0ktRbnQhguDGh2Da9PbCfOubqoZJxVwjcVN5XCbjkbuzCKsO8VqKn4ukvdRvsdIRmd2OBz7YPCD9vZ9MZYEoW5YHzow9mJCfNnlmtQDHQ0tfPFJ+9r/jgcb3aS7c7Wof+Bn2iKObSOWmgDJWgH7aLf6AD1EEV/0T/0H900HoJmsBqsTVeDxotmBdUQtJ4Bb53Dqg==</latexit>
Pre-processing pr2
<latexit sha1_base64="e13wsKKwUgrK022LUyket19Yr+M=">AAACaHicZVDBThsxFHSWUmChJdADqnpZJRfEIdqNRHsNQpHankBqACleRV7nbWqxaxvbgQTL38G1/Y/+SE+99wMQR7wJrbTwLh6N34zmTSYLpk0c/24EK69WX6+tb4SbW2/ebjd3ds+0mCoKAyoKoS4yoqFgHAaGmQIupAJSZgWcZ5fH1f/5NSjNBP9m5hLSkkw4yxklxlMpNjAzOrdSuVF31GzHnXgx0UuQPIF2L7r/+ueh9etktNPo47Gg0xK4oQXRekgFz0EBp5DaL/1+3yjCXYinGiShl2QCdhG5Rg3lOPcxUjtRRH5ndFYXTFVRJ4hSZO7CEHO4oaIsCR9bDAVUKdwwSa3FuRCGCwOa3cK/E9uJc64uKhlnlfC/iptKYQ+djV0YRdj3CtRUfN2lbqP9jpCMLmzwtfZB4cB2ut4YS6IwF4yPfTi7MGH+zXINioGOlna++OR5zS/BWbeTfOwcnibt3hFazjr6gFpoHyXoE+qhz+gEDRBFV+gO/UA/G3+DZrAXvF+uBo0nzTtUm6D1CHHlw6w=</latexit>
Classification
··· Classification is a compliance report, accompanied by an explanation and
prn Results
<latexit sha1_base64="FPXtabmdPbeQhp/pU1fr5K0aIz8=">AAACYXicZZHNSsNAFIWn8a+Nf21duilmIy5KIojbohR0V8FWoRNkMrmpo8lMmJlqdcg7uNU3c+2LOGmLEHs3czlzz8fh3ihPmdK+/11z1tY3NrfqDXd7Z3dvv9lqj5SYSgpDKlIh7yOiIGUchprpFO5zCSSLUriLni/L/7sXkIoJfqvfcggzMuEsYZRoK40wjYVWD03P7/rz6qw2wbLx0LIGD61aH8eCTjPgmqZEqTEVPAEJnEJorvv9vpaEFy6eKsgJfSYTMPOsFWmcx4mGWWgmkuSPjM6qhqlMqwKRkrwVros5vFKRZYTHBkMKZYpiHITG4EQIzYUGxd4BW7ZWifGCoiiqpoxxVhr/XFyXDnNWGL9wOx1sFwpUl3qVUsUoOyNyRucY/KJsUDgx3VMLxjmRmAvGYxvOzCHMvlGiQDJQnQXOLj74v+bVZnTaDfxucBN4vYvlCeroEB2hYxSgc9RDV2iAhoiiJ/SBPtFX7cdpOE2nvRh1akvPAaqUc/gLwxi8HA==</latexit>
sha1_base64="m94MPbhQ0jNffvSn8H3/t4KQWm4=">AAACYXicZZDNSgMxFIVvx//xr+rSTbEbcVFmBHElFqWgOwVbhaZIJnOnRmeSIUn9CwM+glt9IZ/BtS9iphVh9GxyObnn43CjPOXaBMFnzZuanpmdm1/wF5eWV1bra+s9LUeKYZfJVKqriGpMucCu4SbFq1whzaIUL6O74/L/8h6V5lJcmKccBxkdCp5wRo2zeoTF0ujrejNoBWM1/g/hz9A8/PAPXgDg7Hqt1iGxZKMMhWEp1brPpEhQoWA4sKedTscoKgqfjDTmlN3RIdpx14rVz+PE4OPADhXNbzh7rAZGKq0aVCn6VPg+EfjAZJZREVuCKZYtin44sJYkUhohDWr+jMSxjU5sMyyKohrKuOBl8DclTJmwe4UNCr/RIO6gyEzpVylVjHY7MudsjCH32hXFHdvadWCSU0WE5CJ25ewYwt0bJRoVR92Y4Nzhw79n/j/0dlth0ArPw2b7CCaah03Ygm0IYR/acAJn0AUGt/AKb/Be+/IWvLq3Pln1aj+ZDajI2/wGA1K96Q==</latexit>
sha1_base64="sROku4tHxBM7SCz7EWzFzPdrdr8=">AAACYXicZZDNSgMxFIXT8X/8q7rsptiNuCgzgrgSRSnoTsG2QlNKJnOnRmeSkKT+hXkHt7r1YXwEce2LmGlFGHs2uZzc83G4kUyZNkHwVfFmZufmFxaX/OWV1bX16sZmR4uRotCmIhXqOiIaUsahbZhJ4VoqIFmUQje6Oy3+u/egNBP8yjxJ6GdkyFnCKDHO6mAaC6MH1UbQDMaqTw/h79A4+vAP5funfzHYqLRwLOgoA25oSrTuUcETUMAp9O15q9UyivDcxyMNktA7MgQ77lqyejJODDz27VARecPoYzkwUmnZIEqRp9z3MYcHKrKM8NhiSKFokffCvrU4EcJwYUCzZ8CObXRiG2Ge5+VQxjgrgn8pboqE3c9tkPv1OnYHBWoKv0wpY7TbEZLRMQbfa1cUdm1zz4GxJApzwXjsytkxhLk3SjQoBro+wbnDh//PPD109pph0Awvw8bxCZpoEdXQNtpBITpAx+gMXaA2ougWvaBX9Fb59pa8qrc5WfUqv5ktVJJX+wGdcb9d</latexit>
<latexit sha1_base64="VnNRRHtStAjW/uENRb6RNM6TAg8=">AAACaHicZVHNThsxGHSW8rctEMoBoV5WyaXiEO1GAq4gFIn2BBIBpHgVeZ1vg8WubWyHn1p+Dq70PXiRnnrvA1Q94k0Aactc/Gn8zWg8zmTBtInjX41g7sP8wuLScvjx08rqWnP985kWE0WhT0Uh1EVGNBSMQ98wU8CFVEDKrIDz7Oqwuj+/AaWZ4KfmXkJakjFnOaPEeCrFBu6Mzq1UbsiHzXbciaeI3g/Jy9Dej/5+//2v9XQ8XG/08EjQSQnc0IJoPaCC56CAU0jtt16vZxThLsQTDZLQKzIGO41cowZylPsYqR0rIi8ZvasLJqqoE0Qpcu/CEHO4paIsCR9ZDAVUKdwgSa3FuRCGCwOa/YDXJ7YT51xdVDLOKuGbiptKYXecjV0YRdj3CtRUfN2lbqP9jpCMTm3wjfZBYdt2ut4YS6IwF4yPfDg7NWH+zHINioGOZna++OT/mt8PZ91OstvZOfE/cIBmWEJfUAt9RQnaQ/voCB2jPqLoGj2gR/Sz8SdoBpvB1mw1aLxoNlANQesZ57XD5w==</latexit>
3
primarily paragraphs (Fig. 2, Step ➊). This process ensures reported the average results. Boxplots were generated to visu-
that the chunks, once incorporated into the overall prompt, will ally represent the performance outcomes of these experiments.
fit within the token limit of the LLM. If a full paragraph were Our results indicate that accuracy is largely consistent across
to exceed the token limit, it would need to be either truncated BERT (Precision of 87%, Recall of 86%, and F-score of 87%)
or summarized. However, in our investigation of LLMs, we and GPT-3.5 (Precision of 89%, Recall of 83%, and F-score
did not encounter such situations due to their reasonably large of 86%), with BERT achieving slightly better overall results.
token limit. The output of this step is a set of passages each For our compliance checking of DPAs with GDPR, our
within the LLM’s token limit, ready for prompt construction. early experiments with gpt-3.5-turbo-0125 [42], Mixtral-8x7B-
Step 2) Prompt Construction. Our approach constructs tai- Instruct-v0.1 [43], and gpt-4-0125-preview [42] show promis-
lored prompts to guide the LLMs in eliciting rule-specific ing improvements when transitioning from sentence-level to
responses, based on the input compliance rules. Prompts, paragraph-level passages. On average, the improvements in
designated to extract both the rule applicability and the LLMs’ Accuracy obtained through using these three models are in
explanation and justification, consist of three parts: Prompt the ranges of 33% (from 30% and 63%), 35% (from 33% and
Template, Compliance Rules, and Passages from Regulatory 69%), and 40% (from 41% and 81%), respectively.
Artifact (Fig. 2, Step ➋).
VI. F UTURE R ESEARCH
The prompt is presented in a structured chat format for the
LLMs, specifying roles for system instructions, user inquiries, My research aims to make a case for the need to reconsider
and the assistant’s responses, following best practices where current practices in legal compliance automation in light of
applicable [40], [41]. The system role provides instructions recent advances in AI. Specifically, I argue that the enhanced
for the model to follow (detailed in the Prompt Template). capacity of modern LLMs to handle context is likely to induce
The user role presents input that model should respond to a major shift in our treatment of textual legal artifacts. This
(Passages from Regulatory Artifact and Compliance Rule(s)). shift will involve transitioning from analyzing smaller con-
The assistant role represents the model’s response to the texts, such as individual sentences and phrases, to considering
user’s input (Compliance Report). Prompt Template: You are a larger volumes of content, such as paragraphs and beyond,
legal expert trained to identify applicable {Compliance Rules} as context. I posit that the larger context will be able to
based on a given {text} within its specific {context}. When provide the prerequisite knowledge, including cross-referenced
provided with the {text} and its {context}, your response should legal materials, to create a self-contained basis for accurate
only include the rule identifier (e.g., ’R5’) if applicable. If automated decision-making regarding compliance.
there is no direct connection to any Compliance Rule within In the domain of food-safety regulations, my focus will
the context provided, respond with ’R99’. Follow this format extend to exploring concepts beyond North America, with a
strictly. Then, provide your rationale for the decision. keen interest in the regulations of countries like India and the
Step 3) LLM-based Compliance Checking. Once tailored UK, which, while operating under English law, have distinct
prompts are constructed, we employ zero-shot learning, where regulations.
the model generates responses without prior training on similar In the domain of DPAs, my future work will focus
tasks, based solely on the instructions provided in the prompt. on four main aspects: (1) enriching the DPA dataset [1]
In the future, this process may be enhanced with (optional) with paragraph-level annotations; (2) conducting compre-
fine-tuning to further refine the models’ responses to the spe- hensive empirical evaluations to validate the effectiveness
cific language of DPAs. Subsequently, the generated responses of paragraph-level context in increasing LLM accuracy;
are analyzed to determine the compliance of the DPA text (3) benchmarking against prior BERT-based approaches,
with the GDPR, identifying areas of compliance and non- e.g., [4], to showcase benefits; and (4) involving legal experts
compliance in a comprehensive report (Fig. 2, Step ➌). to critically assess LLM outputs, especially concerning their
explanations and justifications. A pertinent question arises on
V. C URRENT E MPIRICAL R ESULTS evaluating legal experts’ verification of these justifications.
We conducted (1) a qualitative study that characterizes food- I foresee dedicating at least eight months to these mentioned
safety concepts in regulatory provisions impacting modern activities, paralleling these efforts with the composition of a
software-intensive food-safety systems, and (2) an LLM- research paper for submission to RE 2025.
based approach that classifies the provisions of food-safety In addition, and time permitting, I plan to examine tech-
regulations based on their relevance to systems and software niques like Retrieval Augmented Generation [44] to facilitate
requirements. We conducted an extensive evaluation of our question answering over domains such as DPAs and Privacy
approach by instantiating it with both BERT and GPT-3.5. Policies. A preparatory period of four months will be reserved
We evaluated our models using standard classification met- for this initiative, post-investigation and prior to full-scale
rics, including Accuracy, Precision, Recall, and F-score, to implementation. As I progress, I trust that the constructive
classify legal provisions effectively. This evaluation facilitated insights garnered from our ongoing research, coupled with the
a comprehensive comparison of LLM performances, identi- sufficient time frame, will empower me to not only expand
fying strengths and highlighting areas for enhancement. To upon my thesis but also to finalize it punctually by the
account for variability, we conducted multiple experiments and culmination of my fourth academic year.
4
R EFERENCES [22] W. Alhoshan, L. Zhao, A. Ferrari, and K. J. Letsholo, “A zero-
shot learning approach to classifying requirements: A preliminary
study,” in Proceeding of 28th International Working Conference on Re-
[1] O. Amaral, S. Abualhaija, and L. Briand, “ML-based compliance quirements Engineering: Foundation for Software Quality (REFSQ).
verification of data processing agreements against GDPR,” in Proceed- Springer, 2022.
ings of 31st IEEE International Requirements Engineering Conference [23] A. Sainani, P. R. Anish, V. Joshi, and S. Ghaisas, “Extracting and
(RE), 2023. classifying requirements from software engineering contracts,” in
[2] O. Amaral, S. Abualhaija, D. Torre, M. Sabetzadeh, and L. C. Briand, Proceedings of 28th IEEE International Requirements Engineering
“AI-enabled automation for completeness checking of privacy poli- Conference (RE), 2020.
cies,” IEEE Transactions on Software Engineering, vol. 48, no. 11, [24] C. Jain, P. R. Anish, A. Singh, and S. Ghaisas, “A transformer-
2021. based approach for abstractive summarization of requirements from
[3] O. Amaral, M. I. Azeem, S. Abualhaija, and L. C. Briand, “NLP-based obligations in software engineering contracts,” in Proceedings of 31st
automated compliance checking of data processing agreements against IEEE International Requirements Engineering Conference (RE), 2023.
GDPR,” IEEE Transactions on Software Engineering, 2023. [25] M. K. Habib, S. Wagner, and D. Graziotin, “Detecting requirements
[4] M. Ilyas Azeem and S. Abualhaija, “A multi-solution study on GDPR smells with deep learning: Experiences, challenges and future work,”
AI-enabled completeness checking of DPAs,” arXiv:2311.13881, in Proceedings of 29th IEEE International Requirements Engineering
2023. Conference Workshops (REW). IEEE, 2021.
[5] N. Sannier, M. Adedjouma, M. Sabetzadeh, and L. Briand, “An auto- [26] V. Varenov and A. Gabdrahmanov, “Security requirements classifi-
mated framework for detection and resolution of cross references in cation into groups using NLP transformers,” in Proceedings of 29th
legal texts,” Requirements Engineering, vol. 22, 2017. IEEE International Requirements Engineering Conference Workshops
[6] O. Amaral, S. Abualhaija, M. Sabetzadeh, and L. Briand, “A Model- (REW). IEEE, 2021.
based conceptualization of requirements for compliance checking of [27] R. R. Mekala, A. Irfan, E. C. Groen, A. Porter, and M. Lindvall,
data processing against GDPR,” in Proceedings of 29th International “Classifying user requirements from online feedback in small dataset
Requirements Engineering Conference Workshops (REW), 2021. environments using deep learning,” in Proceedings of 29th IEEE
[7] A. Xiang, W. Pei, and C. Yue, “PolicyChecker: Analyzing the GDPR International Requirements Engineering Conference (RE). IEEE,
completeness of mobile apps’ privacy policies,” in Proceedings of the 2021.
2023 ACM SIGSAC Conference on Computer and Communications [28] G. Deshpande, B. Sheikhi, S. Chakka, D. L. Zotegouon, M. N.
Security, 2023. Masahati, and G. Ruhe, “Is BERT the new silver bullet? An empirical
[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre- investigation of requirements dependency classification,” in Proceed-
training of deep bidirectional transformers for language understand- ings of 29th IEEE International Requirements Engineering Conference
ing,” in Proceedings of 17th American Chapter of the Association for Workshops (REW). IEEE, 2021.
Computational Linguistics: Human Language Technologies (NAACL- [29] J. Fischbach, J. Frattini, A. Spaans, M. Kummeth, A. Vogelsang,
HLT), 2019. D. Mendez, and M. Unterkalmsteiner, “Automatic detection of causal-
[9] T. D. Breaux, M. W. Vail, and A. I. Antón, “Towards regulatory com- ity in requirement artifacts: The CiRa approach,” in Proceedings of
pliance: Extracting rights and obligations to align requirements with 27th International Working Conference on Requirements Engineering:
regulations,” in Proceedings of 14th IEEE International Requirements Foundation for software quality (REFSQ). Springer, 2021.
Engineering Conference (RE), 2006. [30] Y. Wang, L. Shi, M. Li, Q. Wang, and Y. Yang, “A deep context-wise
[10] T. Breaux and A. Antón, “Analyzing regulatory rules for privacy and method for coreference detection in natural language requirements,”
security requirements,” IEEE Transactions on Software Engineering, in Proceedings of 28th IEEE International Requirements Engineering
vol. 34, no. 1, 2008. Conference (RE), 2020.
[11] T. D. Breaux and D. G. Gordon, “Preserving traceability and encoding [31] ——, “Detecting coreferent entities in natural language requirements,”
meaning in legal requirements extraction,” in Proceedings of 6th Inter- Requirements Engineering, vol. 27, no. 3, 2022.
national Workshop on Requirements Engineering and Law (RELAW), [32] A. Nayak, H. P. Timmapathini, V. Murali, K. Ponnalagu, V. G.
2013. Venkoparao, and A. Post, “Req2Spec: Transforming software require-
[12] J. Bhatia and T. D. Breaux, “Semantic incompleteness in privacy ments into formal specifications using natural language processing,”
policy goals,” in Proceedings of 26th IEEE International Requirements in Proceedings of 28th International Working Conference on Require-
Engineering Conference (RE), 2018. ments Engineering: Foundation for Software Quality (REFSQ), 2022.
[13] M. Fan, L. Yu, S. Chen, H. Zhou, X. Luo, S. Li, Y. Liu, J. Liu, and [33] C. M. Luders, T. Pietz, and W. Maalej, “Automated detection of typed
T. Liu, “An empirical evaluation of GDPR compliance violations in links in issue trackers,” in Proceedings of 30th IEEE International
Android mHealth apps,” in Proceedings of 31st IEEE International Requirements Engineering Conference (RE), 2022.
Symposium on Software Reliability Engineering (ISSRE), 2020. [34] Ş. Mehder and F. B. Aydemir, “Classification of issue discussions
[14] R. E. Hamdani, M. Mustapha, D. R. Amariles, A. Troussel, S. Meeùs, in open source projects using deep language models,” in 2022 IEEE
and K. Krasnashchok, “A combined rule-based and machine learning 30th International Requirements Engineering Conference Workshops
approach for automated GDPR compliance checking,” in Proceedings (REW), 2022.
of 18th International Conference on Artificial Intelligence and Law [35] M. Abbas, A. Ferrari, A. Shatnawi, E. Enoiu, M. Saadatmand, and
(ICAIL), 2021. D. Sundmark, “On the relationship between similar requirements and
[15] F. Xie, Y. Zhang, C. Yan, S. Li, L. Bu, K. Chen, Z. Huang, and G. Bai, similar software,” Requirements Engineering, vol. 28, no. 1, 2023.
“Scrutinizing privacy policy compliance of virtual personal assistant [36] D. Luitel, S. Hassani, and M. Sabetzadeh, “Using language models
apps,” in Proceedings of 37th IEEE/ACM International Conference on for enhancing the completeness of natural-language requirements,”
Automated Software Engineering (ASE), 2022. in Proceedings of 29th International Working Conference on Re-
[16] N. Zeni, N. Kiyavitskaya, L. Mich, J. R. Cordy, and J. Mylopoulos, quirement Engineering: Foundation for Software Quality (REFSQ).
“GaiusT: supporting the extraction of rights and obligations for regu- Springer, 2023.
latory compliance,” Requirements Engineering, vol. 20, 2015. [37] B. Görner and F. B. Aydemir, “Generating requirements elicitation
[17] N. Zeni, E. A. Seid, P. Engiel, S. Ingolfo, and J. Mylopoulos, “Building interview scripts with large language models,” in Proceedings of
large models of law with NómosT,” in Proceedings of 35th Interna- the 31st IEEE International Requirements Engineering Conference
tional Conference on Conceptual Modeling (ER), 2016. Workshops (REW), 2023.
[18] A. Sleimi, N. Sannier, M. Sabetzadeh, L. Briand, and J. Dann, “Au- [38] Y. Bouzembrak, M. Klüche, A. Gavai, and H. J. Marvin, “Internet of
tomated extraction of semantic legal metadata using natural language Things in food safety: Literature review and a bibliometric analysis,”
processing,” in Proceedings of 26th IEEE International Requirements Trends in Food Science & Technology, vol. 94, 2019.
Engineering Conference (RE), 2018. [39] D. Torre, S. Abualhaija, M. Sabetzadeh, L. Briand, K. Baetens,
[19] S. Abualhaija, C. Arora, A. Sleimi, and L. C. Briand, “Automated P. Goes, and S. Forastier, “An AI-assisted approach for checking the
question answering for improved understanding of compliance re- completeness of privacy policies against GDPR,” in Proceedings of
quirements: A multi-document study,” in Proceedings of 30th IEEE 28th IEEE International Requirements Engineering Conference (RE),
International Requirements Engineering Conference (RE), 2022. 2022.
[20] T. Hey, J. Keim, A. Koziolek, and W. F. Tichy, “NoRBERT: Transfer [40] OpenAI, “OpenAIChatData,” [Link]
learning for requirements classification,” in Proceedings of 28th IEEE [41] Huggingface, “Chatcompletion,” [Link]
International Requirements Engineering Conference (RE), 2020. [42] OpenAI, “OpenAIModels,” [Link]
[21] R. Chatterjee, A. Ahmed, P. R. Anish, B. Suman, P. Lawhatre, and [43] Huggingface, “Mixtral,” 2023, [Link]
S. Ghaisas, “A pipeline for automating labeling to prediction in [44] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, and
classification of NFRs,” in Proceedings of 29th IEEE International H. Wang, “Retrieval-augmented generation for large language models:
Requirements Engineering Conference (RE), 2021. A survey,” arXiv:2312.10997, 2023.