Aisha A Custom AI Library Chatbot Using The ChatGPT API
Aisha A Custom AI Library Chatbot Using The ChatGPT API
To cite this article: Yrjo Lappalainen & Nikesh Narayanan (2023): Aisha: A Custom
AI Library Chatbot Using the ChatGPT API, Journal of Web Librarianship, DOI:
10.1080/19322909.2023.2221477
Brief Report
ABSTRACT KEYWORDS
This article focuses on the development of a custom chatbot for chatbots; ChatGPT;
Zayed University Library (United Arab Emirates) using Python OpenAI; GPT-3; GPT-3.5;
and the ChatGPT API. The chatbot, named Aisha, was designed generative pre-trained
to provide quick and efficient reference and support services to transformer; academic
libraries; artificial
students and faculty outside the library’s regular operating
intelligence; AI
hours. The article also discusses the benefits of chatbots in aca-
demic libraries, and reviews the early literature on ChatGPT's
applicability in this field. The article describes the development
process, perceived capabilities and limitations of the bot, and
plans for further development. This project represents the first
fully reported attempt to explore the potential of a ChatGPT-
based bot in academic libraries, and provides insights into the
future of AI-based chatbot technology in this context.
Introduction
Zayed University Library (United Arab Emirates) plays a crucial role in
providing access to various resources and information to support its stu-
dents and faculty’s research and academic needs. The library has been
providing an online chat service managed by Reference librarians to its
users during the working hours of the library. However, extending the
chat service when users require assistance outside of regular operating
hours is difficult. Chatbots offer a promising solution to address this issue.
A chatbot is a computer program simulating communication with human
users, typically using a message interface. Chatbots are designed to com-
prehend user inquiries and provide responses that resemble a human
conversation. They can be utilized for many things, such as work auto-
mation, information retrieval, and customer service. By leveraging chatbots,
libraries can provide quick and easy access to resources, answer research-re-
lated questions, and offer Reference help to students and faculty 24/7.
However, until recently, chatbots have been limited in their ability to
understand and respond to user queries accurately. Traditional chatbot
CONTACT Yrjo Lappalainen [email protected] Library and Learning Commons, Zayed University,
P.O. Box 19282, Dubai, United Arab Emirates
© 2023 Yrjo Lappalainen, Nikesh Narayanan
2 Y. LAPPALAINEN AND N. NARAYANAN
Literature review
Alan Turing developed the Turing Test (originally called the “imitation
game”) in the 1950s to assess the intelligence of computer programs, and
Mauldin (1994) coined the term “chatbot” to characterize systems that can
simulate human interaction and attempt to pass the Turing test. Midway
through the 1960s, the MIT Artificial Intelligence Laboratory created
ELIZA, the first chatbot capable of locating keywords in a given input
sentence and matching those keywords against predefined rules to produce
appropriate responses (Weizenbaum, 1966). Following ELIZA, the devel-
opment of increasingly intelligent chatbots advanced, most notably with
the creation of PARRY, developed by Kenneth Colby, a psychiatrist, in the
early 1970s to simulate a paranoid patient’s conversational style for use
in therapy and research (Deshpande et al., 2017). When users interact
Journal of Web Librarianship 3
AI chatbots in libraries
Since its public launch in November 2022, ChatGPT has received significant
attention with numerous articles and preprints already published about its
potential impact on various fields. The impact of ChatGPT on academia and
education, particularly in regards to academic integrity, has been an area of
major interest (see e.g. Cotton et al., 2023; King & ChatGPT, 2023; Lim et al.,
2023). Some articles have also been published about the role of ChatGPT in
the context of academic libraries. Lund and Wang (2023) examined the potential
impact of ChatGPT on academia and libraries by interviewing ChatGPT itself.
Based on its responses, they identified that ChatGPT has the capability to
improve several library services such as search and discovery, reference and
information services, cataloging and metadata generation and content creation.
However, they also emphasized that the technology needs to be used
6 Y. LAPPALAINEN AND N. NARAYANAN
responsibly and that ethical considerations such as privacy issues and bias need
to be taken into account.
Cox and Tzoc (2023) discussed the potential implications of ChatGPT
for academic libraries from a wide perspective. They suggested that
ChatGPT could complement or even replace existing search methods. They
also commented that ChatGPT can be integrated into library discovery
tools, which may lead into an “arms race” between providers as they
contend to add this functionality into their products. The authors also
highlighted the role of ChatGPT in research, where it could be used for
brainstorming and finding relevant literature. The authors suggested that
as the technology develops, AI tools could function as intelligent research
assistants that conduct virtual experiments, analyze data, do copywriting,
edit texts, and generate citations. In terms of library reference services,
the authors discussed the increasing use of AI chatbots to answer basic
reference questions, which can free up librarian time for more complex
research queries or tasks. The authors also noted that AI tools will make
information literacy and digital literacy more important than ever and
that librarians need to teach critical thinking skills to validate facts and
evaluate the quality of the answers provided by ChatGPT. They concluded
that the introduction of ChatGPT seems similar to other innovative devel-
opments such as the introduction of calculators, cell phones, the World
Wide Web, and Wikipedia, and that libraries should evaluate these new
tools and develop services to support their use.
Chen (2023) conducted a simple test where ChatGPT was asked questions
about library services, and its responses were compared with those provided
by conventional library chatbots. As a result, ChatGPT was able to suggest
specific databases, whereas the conventional chatbots did not understand
the question or only suggested visiting the library’s A-Z database page or
general instructions. The author also noted that a customized ChatGPT
might better answer local questions, such as library hours and local resources.
The author also suggested that past lessons from the adoption of Google
and Web 2.0 can guide how to approach ChatGPT. According to the author,
the library community failed to fully recognize and utilize the potential of
Google when it was first introduced and also failed to anticipate the impact
of social media in spreading misinformation. The author concluded that
the library community should avoid underestimating or underutilizing
ChatGPT's potential to enhance library services but also acknowledge and
address its potential weaknesses and pitfalls, such as plagiarism and the
possibility of erroneous output due to poor data quality.
Panda and Kaur (2023) examined the viability of ChatGPT as an alter-
native to traditional chatbot systems in library and information centers.
According to the authors, ChatGPT represents a significant advancement
over traditional chatbots because it enables more flexible and natural
Journal of Web Librarianship 7
can be processed by the language model. At the time of writing, the usage
cost of the gpt-3.5-turbo model (the same as the default publicly-available
ChatGPT model) is USD $0.002 per 1,000 tokens. A single question and
answer typically require fewer than 1,000 tokens, which means that the
model can answer at least 1,000 questions for approximately USD $2. This
pricing model offers a very low entry barrier and makes the API accessible
for a wide range of users.
For a custom chatbot, incorporating domain-specific data is crucial.
Currently, there are two ways to use custom data with the GPT models:
identified and added into the prompt. Generating embeddings with the
OpenAI API is also priced separately, and the price is USD $0.0004 per
1,000 tokens at the time of writing. Embeddings can also be created using
free alternatives such as the SentenceTransformers framework (https://
www.sbert.net).
but it can also lead to slow performance when dealing with larger datasets.
This is why OpenAI recommends using a vector database for searching
over many vectors quickly (OpenAI, 2023a). Based on this recommendation
and extremely helpful instructions published by Kim (2023), Yang (2023),
and Chase (2023), we decided to set up a vector database using Chroma
(https://www.trychroma.com), a toolkit designed for building AI applica-
tions with embeddings. It uses an in-process (serverless) DuckDB database,
allowing the storage and querying of embeddings and their metadata
without having to set up a dedicated server environment. At the same
time, we started using LangChain (https://python.langchain.com), a frame-
work for interfacing and working with various LLM. It facilitates data
ingestion, prompt management, embedding creation, and output parsing.
Above all, it can also be used to create chains, i.e., sequences of multiple
LLM calls, and advanced agents that use LLMs to interact with other
systems and tools. Chroma has a LangChain integration which makes it
possible to create a chain that queries the vector database first, before
passing the data to the OpenAI API. LangChain can also be used with
other LLM, which gives us more possibilities for further development.
• You are Aisha, a friendly and helpful library assistant at Zayed University Library.
Provide clickable links for any URLs. Answer the questions from the perspective of
Zayed University Library. Translate responses to the language of the question. Ask
follow up questions. If you don’t know the answer, say that you don’t know. Ask
for clarifications if you don’t understand the question. Provide direct links to the
mentioned library databases and services. Remind users that all databases can also
be accessed from https://zu.libguides.com/az.php. Don’t respond if the question is
not related to Zayed University Library or its resources and services. If you cannot
answer the question, recommend contacting a relevant subject librarian.
Journal of Web Librarianship 11
its knowledge (Alkaissi & McFarlane, 2023). Unfortunately, this has been
an issue in our project as well, with the bot occasionally promoting non-ex-
istent links and library services. We tried to reduce this by providing the
following additional instructions after the context: “When providing links,
prefer those that start with https://zu.libguides.com or https://zulib.idm.
oclc.org. Do not invent non-existent links or services that are not listed in
the context”. The OpenAI API also has a “temperature” setting that controls
the randomness in the generated text. Higher values will make the output
more random and lower values more predictable, closer to the training
data. Another approach was to provide one example question and the
correct answer to the bot before the actual prompt (also known as “one-
shot learning”). Revising the instructions, setting the temperature to 0 and
incorporating one-shot learning slightly reduced the frequency of the
hallucinations, but Aisha continues to occasionally generate non-existent
links and other minor hallucinations (e.g., personal names that do not
appear anywhere in the embedded source data).
There are many options for deploying Python apps online. Based on the
instructions published by Biswas (2023), we chose Streamlit (https://
streamlit.io), which is an open-source Python library for creating interactive
web applications. In Streamlit, a chatbot interface can be created with just
a couple of lines of code using the streamlit-chat component (https://pypi.
org/project/streamlit-chat). However, we decided to print the outputs using
Streamlit’s “st.markdown” function instead since this gave us more options
for customizing the look and feel of the chat. We added a custom avatar
generated by another AI model, Stable Diffusion. After initial testing, we
also added a debug mode that prints the prompt history, number of tokens,
and costs of usage. Based on Streamlit’s documentation, we also created
a Google Drive integration to record all questions and answers in a spread-
sheet that is only accessible to a few selected developers (Streamlit, 2023).
This allows us to monitor the bot’s performance and to modify the settings
based on the outputs. While Google Drive is convenient for logging the
conversations, we only intend to use it as a temporary solution during
testing as it may not meet the necessary data privacy standards in a pro-
duction environment.
Summary
Each chat query was then processed using the following steps (Figure 2):
Current limitations
Although the bot has performed well in initial testing, the implementation
still has certain limitations. First of all, the OpenAI API has token restric-
tions based on the selected model. OpenAI's gpt-3.5-turbo model currently
has a limit of 4,096 tokens, including both the input and the output. This
is not a major issue when it comes to chats, because the questions are
typically brief and require a total of 500-1,500 tokens per question, includ-
ing the question, context and the output. However, the full chat history
cannot be preserved for long since the token limit would be reached quickly.
Another issue is that the bot could be “tricked” by providing additional
data in the prompt (also known as “prompt injection”). This can result
in unreliable or questionable responses and the bot could start performing
tasks beyond its intended scope. Privacy issues may also arise if personal
Journal of Web Librarianship 15
data are passed to the OpenAI API. However, these issues could be pre-
vented by adding measures for detecting and filtering unwanted prompts
before even passing them to the OpenAI API.
One major downside of the implementation is that the bot currently
has no real-time access to information online, for example the library
website. However, this could be solved by loading and embedding specific
information automatically (such as opening hours and library events) on
a regular basis. We also expect that the ChatGPT plugins, which are cur-
rently under development, will be incorporated into the OpenAI APIs in
the coming months, making it easier to retrieve real-time information
from websites and other sources.
Librarians often receive complex Reference questions about specific papers
and topics. Since the bot currently has no real-time access to online informa-
tion, it cannot answer specific questions about individual papers. It is possible
to build functionality that allows users to upload their own documents, embed
them, and ask questions about their contents (e.g., Dara, https://www.dara.
chat). However, copyright and privacy issues could arise if all data are processed
on OpenAI's servers. One possible solution would be to embed the paper
locally and then query the OpenAI API with the relevant question and context
only, instead of passing the full text to the API. When it comes to interpreting
a paper’s content, a ChatGPT-based bot could easily outperform human librar-
ians, because it can process an entire scientific paper in a matter of seconds.
A ChatGPT-based bot could also provide general guidance in research methods
and citations, especially if such materials are available in the source data.
However, one of the most important aspects of human librarians is the ability
to provide search assistance and recommend specific library resources, which
requires a thorough and up-to-date understanding of the subject area and the
library’s collections. Achieving human-level recommendations with a chatbot
would require at least full access to the library discovery service and possibly
a memory function that keeps track of recommended resources, latest acqui-
sitions and perhaps even the latest trends in different fields. This is an inter-
esting area that calls for further research.
So far, we have only tested the bot informally among library staff - about
15 people representing all library teams. The bot saves all questions and
answers in an access-restricted Google spreadsheet, and we have used this
data to refine the bot’s instructions and source materials. After reviewing
around 500 unique questions and answers, we identified three main issues:
1) the bot often generates non-existent links, as described earlier; 2) the
bot may mistake a link from the source data as a (subscribed) library
service, although it is only mentioned as an additional resource, and; 3)
the bot cannot answer questions that require real-time data or access to a
specific resource (for example “When is the next library workshop?” or
“Can you recommend a good research article about AI?”). Despite these
16 Y. LAPPALAINEN AND N. NARAYANAN
issues, we were pleased to notice that there were very few factual errors,
and most of them were due to outdated or erroneous source data. The
generation of non-existent links remains a challenge at the time of writing,
but the latter two issues can be corrected by revising the source materials
and ingesting certain content (e.g., library event calendar) on a regular basis.
Future development
In the next phase of our project, we plan to move forward with more
formal testing by engaging Zayed University students and faculty. We are
excited to study how the bot performs in larger-scale testing and to hear
feedback and development ideas from library users. As a follow-up study,
we intend to do a more-formal analysis of the bot’s outputs and compare
it with a keyword-based chatbot solution.
During testing, we noticed that managing the embeddings can be challenging
in the long run, and another interface is needed for creating and updating
embedded contents. By developing an interface, a larger number of library
staff will be able to manage the embedded contents. We are also planning to
create a feedback mechanism in the bot that allows users to indicate their
opinions about the bot’s performance (for example, upvoting or downvoting
the response). In case of downvoting, the user could be instructed to provide
textual feedback or contact a liaison librarian for further questions.
The bot is currently instructed to provide contact details when it cannot
answer a question. This could be developed further, for example by con-
necting the user to a live chat with a librarian automatically during library
hours or by providing a form to ask further questions or report the issue.
This feedback would be valuable for further development.
Another development idea is adding a cache to reduce the number of
LLM calls and increase the bot’s performance. A solution called GPTCache
(https://gptcache.readthedocs.io) has already been developed for this pur-
pose. Another interesting direction to explore is the implementation of
AI agents which have the potential to facilitate interactions with other
library systems. With the use of agents, users could potentially carry out
various tasks such as searching library databases, renewing loans, or mak-
ing other requests, directly through the chatbot. This could greatly enhance
the user experience and streamline the overall process of accessing library
resources. Projects such as BabyAGI (https://github.com/oliveirabruno01/
babyagi-asi), Auto-GPT (https://github.com/Significant-Gravitas/Auto-GPT),
and AgentGPT (https://github.com/reworkd/AgentGPT) are already avail-
able to help with the development of intelligent agents. One particularly
interesting possibility would be to connect the chatbot to the library
discovery service, allowing it to query and recommend specific library
materials. We have also begun testing speech-to-text and text-to-speech
Journal of Web Librarianship 17
Conclusion
In this article, we described the development of Aisha, a custom ChatGPT-
powered chatbot at Zayed University Library. We also reviewed the history
of chatbots and early literature on ChatGPT in the context of academic
18 Y. LAPPALAINEN AND N. NARAYANAN
ORCID
Yrjo Lappalainen http://orcid.org/0000-0003-0942-6377
Nikesh Narayanan http://orcid.org/0000-0002-2005-1177
References
Adetayo, A. J. (2023). Artificial intelligence chatbots in academic libraries: The rise of
ChatGPT. Library Hi Tech News, 40(3), 18–21. https://doi.org/10.1108/LHTN-01-2023-
0007
Alkaissi, H., & McFarlane, S. I. (2023). Artificial hallucinations in ChatGPT: Implications
in scientific writing. Cureus, 15(2), Article e35179. https://doi.org/10.7759/cureus.35179
Allison, D. (2012). Chatbots in the library: Is it time? Library Hi Tech, 30(1), 95–107.
https://doi.org/10.1108/07378831211213238
Ashfaque, M. W. (2022). Analysis of different trends in chatbot designing and develop-
ment: A review. ECS Transactions, 107(1), 7215–7227. https://doi.org/10.1149/10701.7215ecst
Biswas, A. (2023, March 19) How to build a chatbot with ChatGPT API and a conver-
sational memory in Python. Medium. https://medium.com/@avra42/how-to-build-a-
chatbot-with-chatgpt-api-and-a-conversational-memory-in-python-8d856cda4542
Black, S., Biderman, S., Hallahan, E., Anthony, Q., Gao, L., Golding, L., He, H., Leahy,
C., McDonell, K., Phang, J., Pieler, M., Prashanth, U. S., Purohit, S., Reynolds, L., Tow,
Journal of Web Librarianship 19
Kane, D. (2019). Creating, managing and analyzing an academic library chatbot. BiD, 43
https://doi.org/10.1344/BiD2019.43.22
Kim, S. (2023, March 20) How to ensure OpenAI's GPT-3 provides an accurate answer
using embedding and semantic search. Dev Genius, https://blog.devgenius.io/how-to-
ensure-openais-gpt-3-provides-an-accurate-answer-and-stays-on-topic-af5da300ba81
King, M. R, ChatGPT. (2023). A conversation on artificial intelligence, chatbots, and
plagiarism in higher education. Cellular and Molecular Bioengineering, 16(1), 1–2. https://
doi.org/10.1007/s12195-022-00754-8
Lim, W. M., Gunasekara, A., Pallant, J. L., Pallant, J. I., & Pechenkina, E. (2023). Generative
AI and the future of education: Ragnarök or reformation? A paradoxical perspective
from management educators. The International Journal of Management Education, 21(2),
100790. https://doi.org/10.1016/j.ijme.2023.100790
Lund, B. D., & Wang, T. (2023). Chatting about ChatGPT: How may AI and GPT impact
academia and libraries? Library Hi Tech News, 40(3), 26–29. https://doi.org/10.1108/
LHTN-01-2023-0009
Mauldin, M. L. (1994). Chatterbots, TinyMUDs, and the Turing Test: Entering the Loebner
Prize competition. In Hayes-Roth, B. & Korf, R. E. (Eds.), AAAI-94: Proceedings of the 12th
national conference on artificial intelligence (pp. 16–21). AAAI Press. https://aaai.org/
papers/00016-chatterbots-tinymuds-and-the-turing-test-entering-the-loebner-prize-competition/
Mckie, I. A. S., & Narayan, B. (2019). Enhancing the academic library experience with
chatbots: An exploration of research and implications for practice. Journal of the
Australian Library and Information Association, 68(3), 268–277. https://doi.org/10.1080
/24750158.2019.1611694
McNeal, M. L., & Newyear, D. (2013). Introducing chatbots in libraries. Library Technology
Reports, 49(8), 5–10. https://www.journals.ala.org/index.php/ltr/article/view/4504/5281
Meta, AI. (2023, February 24). Introducing LLaMA: A foundational, 65-billion-parameter
language model. https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
OpenAI. (2022, November 30). Introducing ChatGPT. Blog. https://openai.com/blog/chatgpt
OpenAI. (2023a). OpenAI Documentation. Retrieved April 6, 2023, from https://platform.
openai.com/docs
OpenAI. (2023b, March 23). ChatGPT plugins. OpenAI Blog. https://openai.com/blog/
chatgpt-plugins
OpenAI. (2023c, May 12). Web browsing and Plugins are now rolling out in beta. ChatGPT
Release Notes. https://help.openai.com/en/articles/6825453-chatgpt-release-
notes#h_9894d7b0a4
Panda, S., & Kaur, N. (2023). Exploring the viability of ChatGPT as an alternative to
traditional chatbot systems in library and information centers. Library Hi Tech News,
40(3), 22–25. https://doi.org/10.1108/LHTN-02-2023-0032
Pichai, S. (2023, February 6). Google AI updates: Bard and new AI features in search.
Google Blog. https://blog.google/technology/ai/bard-google-ai-search-updates
Ren, X., Zhou, P., Meng, X., Huang, X., Wang, Y., Wang, W., Li, P., Zhang, X., Podolskiy,
A., Arshinov, G., Bout, A., Piontkovskaya, I., Wei, J., Jiang, X., Su, T., Liu, Q., & Yao,
J. (2023). Pangu-Sigma: Towards trillion parameter language model with sparse hetero-
geneous computing. ArXiv:2303.10845v1. https://arxiv.org/abs/2303.10845
Rodriguez, S., & Mune, C. (2021). Library chatbots: Easier than you think. Computers in
Libraries, 41(8), 29–32. https://www.infotoday.com/cilmag/oct21/Rodriguez-Mune–
Library-Chatbots-Easier-Than-You-Think.shtml
Soltan, S., Ananthakrishnan, S., FitzGerald, J., Gupta, R., Hamza, W., Khan, H., Peris, C.,
Rawls, S., Rosenbaum, A., Rumshisky, A., Prakash, C. S., Sridhar, M., Triefenbach, F.,
Verma, A., Tur, G., & Natarajan, P. (2022). AlexaTM 20B: Few-shot learning using a
Journal of Web Librarianship 21
5. Once you have registered, you can log in to Scival using your email address and
password.
I hope this helps! Let me know if you have any further questions or if there’s anything
else I can assist you with.
Authors: Thank you so much!
Aisha: Sure, I'm glad I could help! If you have any other questions or need further
assistance, feel free to ask.