CoLinS 2024 Cherednichenko
Abstract
Chatbots play a crucial role in modern businesses. Developing a chatbot is a tedious and
complex task that takes an enormous amount of time. Therefore, to enable businesses to develop
chatbots quickly and with limited resources, we need to explore the use of LLMs for chatbot
development. In this article, the researchers developed a prototype chatbot architecture that enables
businesses to use LLMs interchangeably. Nonetheless, it notes the prototype's limitation in tracking
conversation history, an area ripe for future enhancement. The current focus on pre-trained LLMs sets
the stage for subsequent research into personalized, fine-tuned chatbot experiences for SaaS customers.
Dozens of LLMs already exist, so it is crucial to select the most capable LLM to power the
chatbot. It is also crucial for the LLM to be cost-efficient so that it is profitable for the business. This
article evaluates three LLMs, endorsing ChatGPT for its superior speed, cost-effectiveness, and
relevance, backed by OpenAI's pioneering status.
Keywords
ChatBots, LLMs, SaaS, Langchain, Python, Node.js, Telegram Bot
1. Introduction
Modern technologies and the Internet have enabled Software as a Service (SaaS) [1] models to
emerge and dominate the global landscape; numerous multibillion-dollar companies have been
built on this model, such as Netflix, Amazon, Facebook, and Airbnb. According to 360 Research
Reports, the SaaS market size was valued at approximately 1,777,098 million dollars in 2021 and
is expected to continue growing, reaching an estimated 1,177,188 million dollars by 2027.
Developing a contemporary SaaS application presents numerous technical challenges that need
to be addressed, especially today with the widespread adoption of cloud technologies and the
emergence of new cloud-native [2] solutions. Moreover, the global chatbot market is projected to
reach 2 billion dollars by the end of 2024, growing at a compound annual growth rate (CAGR) of
29.7%.
Furthermore, the MACH Alliance [3] and the use of composable e-commerce [4] architectures
are gaining significant popularity. This approach allows businesses to construct their systems like
a modular kit, assembling various SaaS systems where each system is responsible for different e-
commerce technical capabilities, such as search, personalization, and order management, among
others.
In the dynamic landscape of technological advancements, the integration of conversational
interfaces has emerged as a critical facet for Software as a Service (SaaS) solutions. Within the
SaaS business framework, chatbots play a crucial role in enhancing customer engagement,
providing real-time support, and automating routine tasks. The integration of chatbots is not
solely a technological trend but a strategic imperative, offering businesses the opportunity to
optimize operations, reduce response times, and ultimately improve customer satisfaction.
Moreover, chatbots can collect and analyze vast amounts of data from interactions with users.
This capability allows SaaS companies to gain insights into user behavior, preferences, and
feedback. The analysis of this data can inform product development, improve user experience,
and guide strategic decision-making. In scientific terms, this iterative process of hypothesis,
experimentation, and validation is fundamental to both product improvement and understanding
user dynamics. Chatbots can also offer personalized experiences to users by leveraging AI and
machine learning algorithms. This personalization can extend to product recommendations,
support solutions, and interactive experiences tailored to the individual user's behavior and
preferences.
Emerging technologies like Large Language Models [5] are revolutionizing the way we interact
with AI, since LLMs provide a rich set of capabilities, such as contextual awareness, adaptability
to user preferences, language proficiency, deep understanding of user queries, and multilingual
support. Therefore, LLMs can be a great choice for integration into SaaS applications to deliver
truly unique customer experiences with chatbots.
2. Related works
In the research [6], Anne L. Roggeveen and Raj Sethuraman highlighted that chatlines (similar
to chatbots but with a human presence on the seller side) help retailers engage in person-to-person
digital conversations, such as on a website, and that at least half of Fortune 5000 companies
have experimented with chatbots, hoping to reduce costs and improve response quality. However,
this work does not go into the details of chatbot technology selection.
The literature review [7] underlines various weaknesses of a chatbot agent in comparison with
a human, such as limited understanding and knowledge and limited empathy and emotions. It is
worth mentioning, though, that the review was based on literature produced between 2020 and
2021, and therefore before the major breakthrough of LLMs. LLMs have great potential to
overcome the limitations mentioned in the literature review.
Evaluating chatbots and natural language generation is a well-known challenge and research
[8] proposed to use human likeness of conversational responses to evaluate chatbots. They
describe two human-evaluation setups: static, in which they benchmark models on a fixed set of
multi-turn contexts to generate responses; and interactive, where they allow humans to chat
freely with chatbots. They also introduce two key metrics: sensibleness and specificity. Moreover,
the researchers built their own multi-turn open-domain chatbot called Meena and compared it
with other technologies; however, since the article was written in 2020, none of the modern LLMs
are mentioned.
Researchers in [9] highlight challenges that chatbot applications are tightly coupled to their
intent recognition providers, hampering their maintainability, reusability and evolution.
Typically, once the chatbot designer chooses a specific chatbot development platform, she ends
up in a vendor lock-in scenario, especially with the NL engine coupled with the platform.
Similarly, current chatbot platforms lack proper abstraction mechanisms to easily integrate and
communicate with other external platforms the company may need to interact with. The work
aims to tackle all these issues by raising the level of abstraction at which chatbots are defined. To
this purpose, they introduce Xatkit, a novel model-based chatbot development framework that
aims to address this question using Model Driven Engineering (MDE) techniques: domain-specific
languages, platform independent bot definitions, and runtime interpretation. Xatkit embeds a
dedicated chatbot-specific modeling language to specify user intentions, computable actions and
callable services, combining them in rich conversation flows. Conversations can either be started
by a user awakening Xatkit or by an external event that prompts a reaction from Xatkit (e.g.
alerting a user that some event of interest fired on an external service the bot is subscribed to).
In the research work [10], Girija Attigeri, Ankit Agrawal, and Sucheta Kolekar focus on
developing a chatbot for technical university information dissemination and conducting a
comparative analysis of various Natural Language Processing (NLP) models. The aim was to
address the information needs of prospective students by providing a chatbot on the university's
website, which could offer official, uniform information accessible 24/7, thus assisting students
in making informed decisions. The researchers implemented five chatbot models using different
techniques: Neural networks, TF-IDF (Term Frequency-Inverse Document Frequency)
vectorization, Sequential modeling, Pattern matching.
The paper details the development process of the chatbots, including:
• Data collection from resources
• Pre-processing steps like tokenization, stemming, and lemmatization
• Design considerations for the neural networks, including layers and activation functions
• The importance of a vast and well-structured knowledge base for effective response
generation
This again proves that the development of a modern chatbot is a challenging task that requires
a significant amount of effort. It is worth mentioning, though, that the work did not investigate
the usage of LLMs to build the chatbots.
In the paper [11], Guanwen Mao, Jindian Su, Shanshan Yu, and Da Luo address the challenge
of matching an appropriate response with its context in retrieval-based chatbots. The proposed
Hierarchical Aggregation Network of Multi-Representation (HAMR) leverages abundant
representations of context and response to enhance the selection process.
In the work [12], the researchers address the key question "How can we ensure that AI systems,
like ChatGPT, are developed and adopted in a responsible way?". They adopted a
comprehensive approach to ensure that AI systems, specifically chatbots for financial services,
are developed in a responsible and ethical manner. They tackled the challenge of operationalizing
Responsible Artificial Intelligence (RAI) at scale by creating a pattern-oriented RAI engineering
methodology. The researchers also conducted a case study on developing chatbots for the
financial services industry to demonstrate the application of the RAI pattern catalog. This case
study outlined the chatbot development process, from planning and design to implementation,
testing, deployment, and monitoring. It highlighted how RAI patterns could mitigate risks at each
stage, ensuring the chatbots are developed responsibly. This included addressing ethical
concerns, ensuring data privacy and fairness, and incorporating diversity in development teams.
The researchers [13] conducted a comprehensive review focusing on Large Language Models
(LLMs), covering their history, architectures, applications, and challenges. The paper emphasized
the difficulty in tracking the rapid advancements in LLM research due to the substantial increase
in contributions within a short period. To address this, the researchers provided a thorough
overview of LLMs, including their evolution, fundamental concepts, architectures (particularly
transformers), training methods, and the datasets used in studies. They also explored a wide
range of LLM applications across different domains, such as biomedical and healthcare,
education, social media, business (including potential use-cases of LLMs for Chatbots), and
agriculture, highlighting how LLMs impact society and the future of AI. Furthermore, the paper
discussed open issues and challenges in deploying LLMs in real-world scenarios, offering insights
into future research directions and development.
The review of related work shows that chatbots play crucial roles in modern business, and that
developing one requires tremendous effort and knowledge to produce a bot that fulfills the
business goals and customer desires. None of the works has so far explored the usage of modern
LLMs like ChatGPT to build chatbots for SaaS applications. Therefore, the goal of this article is to
build an architecture of a chatbot that can use different LLMs interchangeably, and to select the
best LLM for use in SaaS chatbot development.
4. Experiment
The diagram (fig. 1) outlines the architecture of a chatbot system that was developed. The system
integrates with the Telegram platform and leverages Large Language Models (LLMs) through
Langchain. Here is a detailed description of each component and the flow of data:
• Users: These are the individuals interacting with the chatbot via the Telegram platform.
They send messages to the bot and receive responses from it.
• Telegram ChatBot: This represents the front end of the chatbot system where
interactions with users take place. The Telegram chatbot is configured to communicate with
users and handle incoming messages.
• Webhooks with messages: When a user sends a message to the Telegram ChatBot,
Telegram's servers use webhooks to forward this message to the designated Node.js ChatBot
Server. Webhooks are HTTP callbacks that send real-time data to other servers when an event
occurs.
• Node.js ChatBot Server: This is a server-side application written in Node.js that receives
messages from the Telegram bot via webhooks. It processes these messages, maintains the
conversation state, and decides how to respond. The server can handle various tasks such as
command parsing, message logging, user session management, and preparing context for
LLMs.
• Context/Messages: After processing the initial message, the Node.js server formulates a
context or structured prompt which includes the necessary information that the LLM needs to
generate a relevant response. This context can include the message content, conversation
history, and any other relevant data.
• Langchain Python Server: This is a Python-based server that uses Langchain, a
framework designed for building applications with LLMs. The server receives the
context/messages from the Node.js server. Langchain then constructs an appropriate prompt
to send to the LLM based on the received context.
• LLMs: These are the Large Language Models that generate responses based on the
prompts they receive. LLMs are advanced AI models capable of understanding and generating
human-like text. Once the LLM processes the prompt, it sends back a generated text response.
• Response Path: The response from the LLM is sent back to the Langchain Python Server,
which then forwards it to the Node.js ChatBot Server. The Node.js server processes this
response as needed (which may include formatting or further logic) and sends it back to the
Telegram ChatBot.
• Telegram ChatBot to Users: Finally, the Telegram ChatBot sends the processed response
from the Node.js server back to the user, completing the interaction loop.
This architecture allows for a separation of concerns, where the Node.js server handles
interaction management and the Python server with Langchain focuses on utilizing LLMs for
natural language processing tasks. This modular setup enables easy maintenance and scalability,
as each part of the system can be updated or scaled independently.
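The LLM-interchangeability claim above can be made concrete with a small abstraction layer on the Python side. The sketch below is a minimal, hypothetical illustration (the `LLMProvider` interface, class names, and prompt format are ours, not part of Langchain); a stub backend stands in for a real model so the sketch runs without API keys:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Common interface every LLM backend must implement."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class EchoLLM(LLMProvider):
    """Stub backend used here so the sketch runs without an API key."""

    def generate(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class ChatbotService:
    """Plays the Langchain-server role: builds the prompt and delegates to a provider."""

    def __init__(self, provider: LLMProvider):
        # Swapping providers only changes this constructor argument;
        # the rest of the application is untouched.
        self.provider = provider

    def answer(self, user_message: str, context: list[str]) -> str:
        # Fold the context prepared by the Node.js server and the new
        # message into a single prompt for the model.
        prompt = "\n".join(context + [user_message])
        return self.provider.generate(prompt)


service = ChatbotService(EchoLLM())
print(service.answer("What are your shipping options?", ["You are a store assistant."]))
```

In the real system, concrete subclasses would wrap the Langchain clients for ChatGPT, Cohere, and LLaMA; replacing the model then means constructing `ChatbotService` with a different provider, leaving the Node.js side unchanged.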
Below you can see examples of the Chatbot providing responses from different LLMs
integrated into our application.
Comparing LLMs involves evaluating various aspects such as text quality, model
characteristics, computational efficiency, and real-world applicability.
We have used different prompts to evaluate how each LLM performs under different
circumstances, focusing on:
• Question Answering
• Language Understanding and Generation
• Commonsense Reasoning
• Technical Understanding
• Bias and Sensitivity Analysis
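These focus areas can be organized as a small prompt suite that is run against each model under test. The example prompts below are illustrative placeholders, not the exact prompts used in our experiment:

```python
# Illustrative prompt suite; one placeholder prompt per evaluation focus area.
PROMPT_SUITE = {
    "Question Answering": "What is the capital of France?",
    "Language Understanding and Generation": "Rewrite this sentence in five words: ...",
    "Commonsense Reasoning": "If I put ice in a hot drink, what happens to the ice?",
    "Technical Understanding": "Explain what an HTTP webhook is.",
    "Bias and Sensitivity Analysis": "Describe a typical software engineer.",
}


def run_suite(llm, suite=PROMPT_SUITE):
    """Send every prompt to the given callable LLM and collect responses per category."""
    return {category: llm(prompt) for category, prompt in suite.items()}


# A trivial stand-in "LLM" so the sketch runs without an API key.
responses = run_suite(lambda prompt: f"response to: {prompt}")
```

Running the same suite against each model keeps the comparison consistent: every LLM sees identical prompts for every focus area.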
Besides prompts, we also need to consider key metrics for each LLM, namely:
• Model Size and Complexity
• Training Data
• Response Times
• Cost-effectiveness
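Cost-effectiveness, for instance, can be compared by prorating each provider's per-1,000-token output price over the tokens a response actually consumes. The prices below are placeholders for illustration only; real prices differ by provider and change over time:

```python
# Hypothetical per-1,000-output-token prices in USD -- placeholders only,
# not the actual provider pricing at the time of the experiment.
PRICE_PER_1K_OUTPUT_TOKENS = {
    "ChatGPT 3.5 Turbo": 0.0020,
    "Cohere Command": 0.0040,
    "LLaMA-13B": 0.0010,
}


def response_cost(model: str, output_tokens: int) -> float:
    """Cost of a single response, prorated from the per-1k-token price."""
    return PRICE_PER_1K_OUTPUT_TOKENS[model] * output_tokens / 1000


# Example: the cost of a 500-token answer under each hypothetical price.
costs = {model: response_cost(model, 500) for model in PRICE_PER_1K_OUTPUT_TOKENS}
cheapest = min(costs, key=costs.get)
```

Multiplying this per-response cost by the expected daily conversation volume gives a first-order estimate of what each model would cost a SaaS business in production.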
5. Results
5.1. Prompt Comparison
Let us start by comparing the LLMs' responses to different prompts focused on the specific
areas outlined in the previous section.
Table 5
Comparison of training data sizes
             ChatGPT    Cohere    Llama
Unfiltered   45 TB      3 TB      Unknown
Filtered     570 GB     200 GB    Unknown
5.2.4. Cost-effectiveness
• Cohere Command: Cohere's technology is built on large language models similar to those
developed by OpenAI, emphasizing ease of integration for developers and businesses. They
focus on natural language understanding and generation, with a strong commitment to
accessibility and ethical AI usage.
• ChatGPT 3.5 Turbo: Developed by OpenAI, ChatGPT 3.5 Turbo is a variant of the GPT-3.5
model optimized for faster response times and lower operational costs, making it suitable for
more interactive applications. It retains the GPT-3.5 model's vast knowledge base and
generative capabilities but with a focus on efficiency.
• LLaMA-13B: The LLaMA-13B model is part of Meta's LLaMA (Large Language Model
Meta AI) series. The LLaMA models are known for their state-of-the-art performance on
various NLP tasks, emphasizing efficient training and inference.
Though all LLMs were able to provide reasonable and non-biased answers, the Cohere LLM tends
towards long and detailed answers, even when this is not appropriate. It is also worth mentioning
that the Cohere LLM failed to comply with the restriction to answer in just four sentences; the
ability to follow such instructions plays a crucial role in an LLM's capabilities.
LLaMA gives the most "emotional" answers, which could be caused by a high temperature
parameter of the model. Unfortunately, we cannot control LLaMA's temperature parameter via
the provided API.
ChatGPT in general provides the most appropriate answers: short where needed, and detailed
when you ask complex and technical questions.
All LLMs gave non-biased or neutral responses, which indicates that the training data was of
good quality.
Overall, ChatGPT appears to be the fastest in terms of response time while also offering a good
price, whereas LLaMA seems to be the most economical option. Cohere, on the other hand, takes
the longest to respond and is the most expensive per 1,000 tokens of output.
It is worth mentioning, though, that when comparing response times we did not take the
average number of tokens produced into consideration.
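A fairer comparison would normalize latency by output length, i.e. seconds per generated token. The sketch below shows this normalization; the measurement figures are invented for illustration and are not results from our experiment:

```python
def latency_per_token(total_seconds: float, output_tokens: int) -> float:
    """Seconds spent per generated token; lower is better."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    return total_seconds / output_tokens


# Hypothetical measurements: (total response time in seconds, tokens produced).
measurements = {
    "ChatGPT": (1.2, 180),
    "Cohere": (4.5, 400),
    "LLaMA": (2.0, 150),
}

normalized = {
    model: latency_per_token(seconds, tokens)
    for model, (seconds, tokens) in measurements.items()
}
# A model that looks slow on wall-clock time may simply produce more tokens,
# so per-token latency can reorder the ranking.
```

Averaging this per-token latency over the whole prompt suite would give a length-independent speed ranking of the three models.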
7. Conclusions
LLMs are on the rise, and we see numerous new LLMs appearing each year. The industry is
undergoing rapid development and innovation; therefore, it is crucial to take a solid approach
when choosing the baseline LLM for your developments.
This article tries to address the problem of selecting an appropriate LLM for SaaS chatbot
development, though due to limited resources only three LLMs were examined in detail. Of these,
ChatGPT seems to be the best choice in the given circumstances, as it gives the most relevant
answers with lightning speed while charging a reasonable price per 1,000 tokens. We also need to
bear in mind that OpenAI is an industry-leading company that delivers the most innovation, so
by onboarding onto their toolset a business makes a strategic decision that will benefit it in the
long run.
To address the challenges of chatbot development, we developed a multi-layered architecture
that enables a business to scale chatbot throughput as the business grows and, more importantly,
to replace LLMs if needed without affecting the rest of the application.
It is worth mentioning, though, that the prototype we developed does not have capabilities to
manage chat history, so there is a wide range of improvements we could make to ensure that the
LLMs have the context of previous discussions with the customer.
Moreover, the work is limited to the usage of pre-trained LLMs, which is a great starting point
for any LLM integration; but to build a truly unique customer experience for your SaaS
application's customers, we need to ensure that the chatbot has all the relevant information and
is fine-tuned according to your needs. This topic is to be explored in future works.
Acknowledgements
The research study depicted in this paper is funded by the French National Research Agency
(ANR), project ANR-19-CE23-0005 BI4people (Business intelligence for the people).
References
[1] Michael J. Kavis. Architecting the Cloud: Design Decisions for Cloud Computing Service
Models (SaaS, PaaS, and IaaS). Wiley, 2014. 229 p.
[2] Thomas Erl, Ricardo Puttini, Zaigham Mahmood. Cloud Computing: Concepts, Technology &
Architecture. Pearson, 2013. 747 p.
[3] “MACH Alliance”, available at: https://en.wikipedia.org/wiki/MACH_Alliance (last accessed:
27.11.2023)
[4] “Composable Commerce”, available at: https://www.elasticpath.com/composable-
commerce (last accessed: 14.02.2024)
[5] “Large language model”, available at: https://en.wikipedia.org/wiki/Large_language_model
(last accessed: 14.02.2024)
[6] Roggeveen, A. L., & Sethuraman, R. (2020). Customer-interfacing retail technologies in 2020
& beyond: An integrative framework and research directions. Journal of Retailing, 96(3),
299-309. https://doi.org/10.1016/j.jretai.2020.08.001
[7] Bouchra El Bakkouri, Samira Raki, Touhfa Lalla Belgnaoui (2022). “The Role of Chatbots in
Enhancing Customer Experience: Literature Review”. Procedia Computer Science, 203(1),
432-437. http://dx.doi.org/10.1016/j.procs.2022.07.057
[8] Adiwardana, D., Luong, M. T., So, D. R., Hall, J., Fiedel, N., Thoppilan, R., ... & Le, Q. V. (2020).
"Towards a human-like open-domain chatbot."
https://doi.org/10.48550/arXiv.2001.09977.
[9] G. Daniel, J. Cabot, L. Deruelle and M. Derras, "Xatkit: A Multimodal Low-Code Chatbot
Development Framework," in IEEE Access, vol. 8, pp. 15332-15346, 2020,
https://doi.org/10.1109/access.2020.2966919
[10] G. Attigeri, A. Agrawal and S. Kolekar, "Advanced NLP models for Technical University
Information Chatbots: Development and Comparative Analysis," in IEEE Access.
https://doi.org/10.1109/access.2024.3368382.
[11] G. Mao, J. Su, S. Yu and D. Luo, "Multi-Turn Response Selection for Chatbots With Hierarchical
Aggregation Network of Multi-Representation," in IEEE Access, vol. 7, pp. 111736-111745,
2019, https://doi.org/10.1109/access.2019.2934149.
[12] Q. Lu, Y. Luo, L. Zhu, M. Tang, X. Xu and J. Whittle, "Developing Responsible Chatbots for
Financial Services: A Pattern-Oriented Responsible Artificial Intelligence Engineering
Approach," in IEEE Intelligent Systems, vol. 38, no. 6, pp. 42-51, Nov.-Dec. 2023.
https://doi.org/10.1109/mis.2023.3320437.
[13] M. A. K. Raiaan et al., "A Review on Large Language Models: Architectures, Applications,
Taxonomies, Open Issues and Challenges," in IEEE Access, vol. 12, pp. 26839-26874, 2024,
https://doi.org/10.1109/access.2024.3365742.
[14] Hunter, I. T. (n.d.). Distributed Node.js: Building Enterprise-Ready Backend Services. O’Reilly
Media.
[15] "Node.js – Introduction to Node.js", available at: https://nodejs.org/en/learn/getting-
started/introduction-to-nodejs (last accessed: 25.02.2024)
[16] "Introduction | Langchain", available at:
https://python.langchain.com/docs/get_started/introduction (last accessed: 25.02.2024)
[17] "ChatGPT", available at: https://openai.com/chatgpt (last accessed: 25.02.2024)
[18] "Cohere Chat", available at https://cohere.com/chat (last accessed: 25.02.2024)
[19] "Llama API", available at https://www.llama-api.com/ (last accessed: 25.02.2024)