Identifying User Requirements Using LLMs
Abstract— Saltwater intrusion caused by rising sea levels, droughts, and increased freshwater pumping in coastal and tidal regions poses a significant threat to people's daily lives and to industrial and agricultural production. Many organizations and individuals have come together to develop products that reduce its impact. A major challenge is that water users have diverse water usage types, such as irrigation, power generation, sand and gravel washing, product manufacturing, and drinking water, and it is essential to learn and understand the needs of each user group so that appropriate services can be provided and used effectively. To effectively identify the diverse requirements of water users, in this paper we propose a method that uses a Large Language Model (LLM) to analyze water user interview transcripts and construct an AI-powered chat system. The system supports quick and accurate understanding of client water usage requirements under saltwater intrusion conditions.

I. INTRODUCTION

Saltwater intrusion is an increasingly pressing climate issue, characterized by the mixing of saltwater with freshwater due to rising sea levels1, droughts, and increased freshwater pumping.2 This phenomenon poses a significant threat to people's daily lives and to industrial and agricultural production. These concerns have generated demand for products to monitor, visualize, and predict water salt concentration. However, because the demands of water usage clients are diverse and user-specific, it is important to identify each client's unique needs through user interviews, surveys, and similar methods. To quickly and correctly synthesize the results of those user interviews, we propose using LLMs to develop an interface that analyzes interview results and generates summaries in a simple and concise manner.

In this research, we plan to develop a user-friendly chat system that allows service providers and product developers to better understand water usage clients' needs and demands related to saltwater intrusion. The proposed system uses transcripts obtained from interviews with a number of clients who represent various organizations. The interview transcript data was anonymized to ensure no sensitive data is shared with LLM providers. The interview data is fed to an LLM to construct the proposed AI-powered chat system. The design of this product is inherently non-domain-specific, allowing it in principle to be used for user requirement analysis in other application domains.

We have seen a growth in models designed to help administrative and specialized industries by constructing specialized LLM architectures. One example comes from the financial software company Bloomberg, which is working on a local LLM for its own financial usage. They have shown that there is a great deal of practicality and benefit to this approach. This research aims to achieve some of the same benefits, but by using tools to improve the performance of existing LLMs, making the approach more affordable and more accessible.

The rest of this paper is organized as follows. Section II introduces the technologies used in this project. Sections III and IV describe the backend and frontend systems of this application, respectively. Section V describes the user interview processing. Section VI discusses results, and Section VII concludes this paper.

II. WHY USE LLMS?

A. What is an LLM?

LLMs are a recently developed type of AI focused on the understanding and generation of human language.3 In other words, they are capable of taking an input text and generating an output text based on it. A common application of LLMs is chat-bots, which take a user prompt and generate a text response. The most popular of these is OpenAI's ChatGPT.3

B. How do LLMs fulfill our needs?

For our research, we need to extract requirements about water usage from interview text documents. An LLM architecture is well-suited for this task due to its ability to understand contextual meaning across text. Specifically, we plan to develop an OpenAI-powered chat-bot for its simplicity, versatility, and performance. This chat-bot takes user questions, queries documents for relevant information, and generates responses that draw not only on the context the model learned during training but also on the client interview transcripts. OpenAI's models are well known for their ability to understand complex pieces of text and extract details from them, which allows the chat-bot to offer insights related to the wants and needs expressed by the clients in their interviews.

C. What is LangChain?

Due to the rapid growth and popularity of LLMs and chat-bots, user-friendly mechanisms for handling and working with these models have become necessary. LangChain has emerged as one of the most popular frameworks, providing many helpful tools and components that make it quick and simple to incorporate complex, recent innovations and strategies into modern LLM-based applications.4 Our research aligns with the
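The question-answering flow described in Section II.B (take a user question, query the documents for relevant information, then ground the LLM response in the retrieved excerpts) can be sketched in plain Python. This is an illustrative stand-in, not the system's actual implementation: the real application uses OpenAI models and LangChain components, whereas here relevance is scored by simple token overlap and the sample transcript lines are invented.

```python
# Sketch of the retrieval-then-prompt flow: score transcript chunks by
# token overlap with the question and build a grounded LLM prompt.
# The real system uses embedding-based retrieval; this is a stand-in.

def tokenize(text):
    """Lowercase and split text into bare word tokens."""
    return [w.strip(".,?!").lower() for w in text.split()]

def retrieve(question, chunks, k=2):
    """Return the k chunks sharing the most tokens with the question."""
    q_tokens = set(tokenize(question))
    scored = sorted(chunks,
                    key=lambda c: len(q_tokens & set(tokenize(c))),
                    reverse=True)
    return scored[:k]

def build_prompt(question, chunks):
    """Assemble a prompt that grounds the answer in retrieved excerpts."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return (f"Answer using these interview excerpts:\n{context}\n"
            f"Question: {question}")

# Invented example transcripts, standing in for anonymized interviews.
transcripts = [
    "Client A uses river water for irrigation and worries about salinity in dry months.",
    "Client B runs a gravel washing site and needs daily salt concentration readings.",
    "Client C manufactures beverages and requires very low salt drinking water.",
]
prompt = build_prompt("Which clients care about irrigation water salinity?", transcripts)
```

The assembled prompt would then be sent to the LLM, which answers from the retrieved transcript excerpts rather than from its training data alone.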
Our application has two main ways of handling this. First, when web searching is needed, our AI can request web documents via the DuckDuckGo API. The second is arXiv search. arXiv is an online academic and scholarly document database run by Cornell University.6 It is highly beneficial to our chat-bot, especially in areas such as water salinity where the knowledge is often highly technical and complicated; the interview data is unlikely to go into those technical details, and DuckDuckGo may not return academic resources.

IV. FRONT-END METHODS

A. User Interface

The front-end for our application has a few requirements and goals intended to make it as simple as possible for the end user and for future development. First, we wanted the front-end to be usable in many different types of applications, such as a GUI desktop application or a website, given its need to serve various areas of research and conform to their needs. We also wanted fast development and therefore needed a language that is easy to work with; this led us to the Next.js web framework due to its robust, already-existing infrastructure.

Another practical feature we implemented was the ability to save and continue previous conversations. The functionality for this, however, is more complex and was implemented in large part by us. It works by storing the conversations in a SQL database, where every saved conversation has a unique key generated at run time. These conversations are then loaded into the frontend, reconstructed into the individual chat messages, and saved into the chat history for backend purposes. The frontend is then just an array of buttons allowing the user to choose the desired chat session, as shown in Figure 3.

Figure 3. Chat selection user interface
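The save-and-continue mechanism above can be sketched with SQLite: each conversation receives a unique key generated at run time, its messages are stored under that key, and they are reloaded in order to reconstruct the session. The table and column names here are illustrative assumptions, not the application's actual schema.

```python
# Minimal sketch of conversation persistence: unique run-time key per
# session, messages stored in SQL, reloaded in order for the frontend.
import sqlite3
import uuid

def open_store(path=":memory:"):
    """Open the database and ensure the message table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               conversation_key TEXT,  -- unique key shared by one session
               position INTEGER,       -- order of the message in the session
               role TEXT,              -- 'user' or 'assistant'
               content TEXT
           )"""
    )
    return conn

def save_conversation(conn, messages):
    """Store a list of (role, content) pairs under a fresh unique key."""
    key = str(uuid.uuid4())
    for i, (role, content) in enumerate(messages):
        conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?)",
                     (key, i, role, content))
    conn.commit()
    return key

def load_conversation(conn, key):
    """Reconstruct a saved session's messages in their original order."""
    rows = conn.execute(
        "SELECT role, content FROM messages "
        "WHERE conversation_key = ? ORDER BY position",
        (key,),
    )
    return [(role, content) for role, content in rows]

conn = open_store()
key = save_conversation(conn, [
    ("user", "Which clients need daily readings?"),
    ("assistant", "Client B requested daily salinity data."),
])
history = load_conversation(conn, key)
```

The frontend's chat-selection buttons then correspond to the stored conversation keys, each one reloading a saved session.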
V. CLIENT INTERVIEW DOCUMENTS ANALYSIS

A. Client Identification

The design of specialized chat-bots such as this opens up possibilities for generating specialized algorithms and structures that maximize the intended capabilities. One of the major ideas of this project was the development of specialized documents that are easily understandable and precomputed, such that many of the questions likely to be asked by users can be answered quickly. Given the client-oriented perspective, these generated documents are a simple list of every client need and the relevant clients. The two parts identify the clients and the list of needs of each client, as shown in Figure 4.

… are not clients themselves, making the results received more consistent and accurate.

B. Need Identification

Once clients have been identified, the naive approach to identifying needs becomes significantly more logical. This is because, given client names, very few documents are likely to pertain to each of the smaller clients within the salinity field. We did, however, discover major differences between the use of smaller single-document databases and large cumulative databases. We have chosen to orient towards cumulative databases to maximize the information provided to the LLM.

With this system, document generation becomes simple. We opted for Markdown as the preferred format, allowing a document that is both human readable and LLM readable to be built with very little effort. These documents have since been loaded into the LLM.
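The precomputed Markdown document described above can be sketched as follows: given a mapping of identified clients to their extracted needs, emit a document that is both human readable and easy for the LLM to consume. The client names and needs below are invented examples, and the helper function is an illustrative assumption rather than the project's actual code.

```python
# Sketch of the precomputed needs document: render {client: [needs]}
# as a Markdown file with one heading per client and one bullet per need.

def needs_to_markdown(client_needs):
    """Render a client-to-needs mapping as a Markdown document."""
    lines = ["# Client Water Usage Needs", ""]
    for client, needs in client_needs.items():
        lines.append(f"## {client}")
        lines.extend(f"- {need}" for need in needs)
        lines.append("")  # blank line between client sections
    return "\n".join(lines)

doc = needs_to_markdown({
    "Client A": ["Irrigation water below salinity threshold",
                 "Seasonal drought forecasts"],
    "Client B": ["Daily salt concentration readings for gravel washing"],
})
```

A document of this shape is trivial to regenerate whenever new interviews are processed, and loading it into the chat-bot's context lets common questions be answered without re-analyzing the raw transcripts.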