
AI Unreliable Answers: A Case Study on ChatGPT

Ilaria Amaro, Attilio Della Greca, Rita Francese(B), Genoveffa Tortora,
and Cesare Tucci

Computer Science Department, University of Salerno, Fisciano, Italy
{iamaro,adellagreca,francese,tortora,ctucci}@unisa.it

Abstract. ChatGPT is a general-domain chatbot that is attracting great
attention and stimulating worldwide discussion on the power and the
consequences of the diffusion of Artificial Intelligence in all fields,
ranging from education, research, and music to software development, health
care, cultural heritage, and entertainment.
In this paper, we investigate whether and when the answers provided by
ChatGPT are unreliable and how this is perceived by expert users, such as
Computer Science students. To this aim, we first analyze the reliability of
the answers provided by ChatGPT by testing its narrative, problem-solving,
searching, and logic capabilities, and we report examples of answers. Then,
we conducted a user study in which 15 participants who already knew the
chatbot submitted a set of predetermined queries that generated both correct
and incorrect answers, after which we collected their satisfaction. Results
revealed that even if the present version of ChatGPT is sometimes unreliable,
people still plan to use it. Thus, it is recommended to always use the
present version of ChatGPT with the support of human verification and
interpretation.

Keywords: ChatGPT · Satisfaction · Case Study

1 Introduction
Artificial Intelligence (AI) is increasingly pervading our daily lives through
the use of intelligent software, such as chatbots [1]. There exist many
definitions of chatbots; according to [10], a chatbot is a computer program
that responds like a smart entity when conversed with through text or voice
and understands one or more human languages by means of Natural Language
Processing (NLP).
At present (January 2023), the ChatGPT¹ chatbot is fascinating the whole
world and generating much discussion on whether Artificial Intelligence can
replace human beings. There have been many positive reactions to the tool:
the New York Times stated that it is “the best artificial intelligence
chatbot ever released to the general public” [17]. An important Italian
newspaper (Corriere della Sera) stated on January 31, 2023 that “ChatGPT
answers questions, solves equations and summarizes text” even if “it provides
neither moral judgment nor intellectual content”. The big tech companies have
also been strongly affected by the ChatGPT technology: Microsoft is investing
10 billion dollars in ChatGPT, with the plan to include it in Bing to
challenge the Google hegemony and to add ChatGPT text summarization and
translation features to Microsoft Teams. On the other side, Google is worried
about losing its dominant position in the search engine field and is going to
develop a similar product [7].

¹ https://chat.openai.com/chat.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Degen and S. Ntoa (Eds.): HCII 2023, LNAI 14051, pp. 23–40, 2023.
https://doi.org/10.1007/978-3-031-35894-4_2

Many newspapers are contributing to create a halo of magic intelligence
around ChatGPT, but at the same time the first negative comments are arriving.
Many people are worried about the future of knowledge workers [12], who may
be replaced by Artificial Intelligence algorithms such as ChatGPT. Many
teachers are worried about the ability of ChatGPT to pass exams. As an
example, ChatGPT scored C+ on Law school exams in four courses, a low but
passing grade [5]. Some researchers tried to generate a literature review on
a specific topic (Digital Twin) and noted that the results were promising,
with a low percentage of detected plagiarism [3]. Others consider the GPT
technology an opportunity for health care, with due caution [11]. The use of
ChatGPT for supporting researchers has been investigated in [13]. The
advantages are many, but caution is recommended in this case too, due to
several drawbacks, such as lack of generalizability, dependence on data
quality and diversity, lack of domain expertise, limited ability to
understand context, ethical considerations, and limited ability to generate
original insights. Others, such as the Stack Overflow community, banned for
one month users who posted answers produced by ChatGPT in their discussions
[19], because the solutions produced by the chatbot seem realistic but are
often wrong and may confuse an inexpert user. Others are skeptical about its
real capabilities and tested its reliability, as we do here.
The goal of this case study is to investigate the capabilities of ChatGPT
and how reliable its answers are. In particular, we aim to answer the
following research questions:
– RQ1: Which tasks are better supported by ChatGPT?
– RQ2: How does ChatGPT behave when it does not know the answer?
– RQ3: How may we use the support offered by ChatGPT?
– RQ4: What is the opinion of expert Computer Science students on the tool?
To answer these questions we conducted:
– a preliminary investigation on the reliability of the answers provided by
ChatGPT, also considering related work;
– a user study involving fifteen expert users (Computer Science students) to
collect their satisfaction with the tool and their opinion on it.
The paper is organized as follows: in Sect. 2, we describe the ChatGPT tool
along with some details about generative transformers; in Sect. 3, we define
an exploratory method to analyze the capabilities of ChatGPT for various
types of tasks; in Sect. 4, we describe the user study. Section 5 discusses
the results of our investigation, and Sect. 6 concludes with final remarks
and future work.

2 ChatGPT
In this section we summarize the main steps that OpenAI followed to create
ChatGPT.
ChatGPT is a language model developed by OpenAI that belongs to the family
of Generative Pre-trained Transformer (GPT) models. The GPT models leverage
deep learning approaches to generate text and have attained state-of-the-art
performance on a variety of natural language processing tasks. ChatGPT is
the most recent model in this line, following the success of GPT-1, GPT-2,
and GPT-3.

GPT-1. GPT-1, released in 2018, was the first model in the GPT series
developed by OpenAI [15]. It was a cutting-edge language model that used deep
learning techniques to generate text. With 117 million parameters, a
relatively large size for its time, GPT-1 was trained on a diverse range of
internet text, allowing it to generate coherent and diverse answers. The
model was based on the transformer architecture, which has since become a
staple of the field of natural language processing.
The self-attention mechanism in GPT-1 allowed it to effectively weigh the
importance of each word in a sentence when generating text, while the fully
connected feed-forward network enabled it to produce a representation of the
input that was more suitable for the task of text generation.
Through its training on a large corpus of text, GPT-1 learned patterns in
the language and became capable of generating semantically meaningful
responses that were similar to the text it was trained on. This made it a
valuable tool for a wide range of natural language processing applications
and established the GPT series as a benchmark for language models.

GPT-2. OpenAI published GPT-2 in 2019 as the second model of its GPT series
[16]. GPT-2 considerably enlarged the architecture of its predecessor,
scaling the model up to 1.5 billion parameters. This enabled it to generate
writing that was even more human-like, as well as to carry out a variety of
linguistic tasks, including translation and summarization. It was therefore
a major improvement over GPT-1 and proved the potential of deep learning
approaches to generate human-like text and to perform a variety of
linguistic tasks. Its success paved the way for the following models of the
GPT series.

GPT-3. OpenAI’s GPT-3, introduced in 2020, was the third model in the GPT
series [4]. With 175 billion parameters, it was the largest language model
available at the time of its publication. This enabled it to display
unparalleled performance in a variety of natural language processing tasks,
such as question answering, summarization, and even coding. The fully
connected feed-forward network enabled the model to generate a
representation of the input that was better suited to the current task.
GPT-3 was trained on a wide variety of Internet text, providing it with a
broad comprehension of language and the ability to generate meaningful and
varied writing.

ChatGPT. ChatGPT is the latest of OpenAI’s GPT models, released in late
2022. With respect to GPT-3, ChatGPT is optimized for the task of
conversational AI with a smaller model size. This difference in model size
reflects the different tasks that each model is designed to perform.
Besides, while GPT-3 was trained on a diverse range of internet text, ChatGPT
was specifically trained on conversational data. This specialization in
training data has resulted in ChatGPT having a better understanding of
conversational language and being able to generate more coherent and diverse
responses to prompts. The final architecture of ChatGPT comprises numerous
layers, each of which contains a self-attention mechanism that enables the
network to evaluate the significance of certain words in a phrase while
making predictions. ChatGPT’s specific transformer design consists of a
sequence of identical layers, each of which has two sub-layers: a multi-head
self-attention mechanism and a position-wise, fully connected feed-forward
network. The self-attention mechanism enables each word in the input
sequence to attend to all other words, thereby enabling the network to
determine the relative importance of each word when making predictions. This
is achieved using multiple parallel attention heads, each of which learns to
focus on distinct portions of the input. The fully connected feed-forward
network is used to generate a representation of the input that is better
suited to the current task, such as text production. This network is
followed by layer normalization and a residual connection, which help
stabilize model training.
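
The attention sub-layer described above can be made concrete with a minimal
sketch. It is only an illustration and not OpenAI’s implementation: it uses a
single attention head, takes queries, keys and values to be the input vectors
themselves (no learned projection matrices), and omits the feed-forward
sub-layer. All function names and the toy input are assumptions made for
illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x is a list of token vectors. Assumption: queries, keys and values
    are the inputs themselves (no learned projections).
    """
    d = len(x[0])
    out = []
    for q in x:
        # Score the current token against every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def layer_norm(v, eps=1e-5):
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def attention_sublayer(x):
    """Self-attention, then a residual connection, then layer normalization."""
    attended = self_attention(x)
    return [layer_norm([a + b for a, b in zip(xi, ai)])
            for xi, ai in zip(x, attended)]


# Hypothetical 3-token input with 4-dimensional embeddings.
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
result = attention_sublayer(tokens)
```

Each output vector is a weighted mix of all input vectors, which is how every
token can take the whole sequence into account.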
The output of each layer is provided as input to the subsequent layer, enabling
the model to represent higher degrees of abstraction as the input passes through
the network. The last layer generates a probability distribution across the vocab-
ulary, which can be used to sample a coherent sequence of words.
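
The final sampling step can be sketched as follows. The vocabulary, the
scores (logits) and the `temperature` parameter are hypothetical; the decoding
settings actually used by ChatGPT are not stated here.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores (logits) into a probability distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, rng, temperature=1.0):
    """Draw one token at random according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical vocabulary and scores for the next token.
vocab = ["64", "52", "12", "116"]
logits = [2.0, 1.8, 0.3, 0.1]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
draws = [sample_token(vocab, logits, rng) for _ in range(1000)]
counts = {tok: draws.count(tok) for tok in vocab}
```

Because the next word is drawn at random from the distribution, the same
prompt can lead to different continuations on different runs.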

3 The Reliability of the Answers Provided by ChatGPT

To answer RQ1, RQ2, and RQ3 we submitted several kinds of questions to
ChatGPT, such as:

– Creativity: generation of content on specific topics.
– Search: search for information on a specific topic.
– Problem solving: resolution of an arithmetic problem, a programming
problem, and a finance problem.
– Logic: capability of providing logical proofs when all the assumptions have
been provided.

We tested the capabilities of ChatGPT in a non-exhaustive way, searching
Google for problems and their solutions.

Fig. 1. Screenshot of a story invented by ChatGPT.

3.1 Creativity
We decided to propose tasks related to the invention of a story for a child,
the creation of a programming exam track (i.e., an exam exercise) with its
solution, and the generation of lyrics and music.
The first question we proposed is reported in Fig. 1, which shows the nice
story about a fox and a bear that ChatGPT generated for a child. The result
is very impressive.
Many musicians have tried the capability of ChatGPT in generating lyrics and
music [18]. ChatGPT answers queries such as “write a lyrical verse in the
style of [artist] about [topic]”. We tested this with the question reported
in Fig. 2, where we asked it to write lyrics on the environment in John
Lennon’s style. The result is also nice, but the analysis reported in [18]
observed that the modification made to the chord progression in Fig. 3 is
not satisfying. Concerning the generation of exam tracks, we asked ChatGPT to
create a track requiring the use of two arrays and the modulo operator and
to provide the solution in the C language. Results are shown in Fig. 4.

Fig. 2. Screenshot of a song on the environment written in John Lennon’s
style by ChatGPT.

3.2 Search/Text Summarization/Translation

Using ChatGPT to search for information on a certain topic might be a quick
and simple solution. To verify this we asked ChatGPT to provide us with
related works concerning the evaluation of trust in ChatGPT itself. Results
are shown in Fig. 5. We were very happy with the results until we discovered
that all the papers were invented! We also tried the questions concerning
Prosdocimo adopted in [20], and the answers of ChatGPT vary in a
nondeterministic way.
To verify ChatGPT’s didactic capabilities we asked it to explain several
Computer Science concepts, such as the concept of pointer, with examples in
the C language. The explanation was very easy to understand.

Fig. 3. Screenshot of a chord progression modified by ChatGPT [18].

We did not assess the text summarization and translation capabilities of the
GPT-3 transformer, which are widely recognized [2,7,9].

3.3 Problem Solving

Problem solving concerns the process of examining a problem scenario and
finding a satisfactory and effective solution. In the mathematical context,
problem solving takes on even more significance, as it requires the use of
specific methods and techniques to solve complex equations and calculations.
The problems that we submitted to ChatGPT were selected from various sources,
including medical school entrance tests and Computer Science exam tracks.
These problems represent different challenges, but all require in-depth
analysis and the ability to apply appropriate knowledge to find an effective
solution. For the problem in Fig. 6, ChatGPT provided different but always
wrong solutions. We also discovered that ChatGPT sometimes provides
nondeterministic answers. We asked it to perform the following task:
“An annual subscription to a weekly, whose cover price is 3 euros, costs 104
euros. A subscription to a monthly, whose cover price is 4 euros, costs 36
euros. How much do you save in total by buying subscriptions?
(Consider the year consisting of 52 weeks)”
We ran the test 27 times, through different chats and different accounts,
and found that ChatGPT solved this task correctly in only about 50% of the
cases. As shown in Fig. 7a and 7b, the same test produced completely
different answers, one correct and one wrong.
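
For reference, the arithmetic behind the subscription task can be checked
with a few lines of code. This is a sketch under the natural reading of the
problem: 52 weekly issues are given in the text, while 12 monthly issues per
year is an assumption; the function and variable names are illustrative.

```python
WEEKS_PER_YEAR = 52    # given in the problem statement
MONTHS_PER_YEAR = 12   # assumption: one monthly issue per month


def saving(cover_price, issues_per_year, subscription_price):
    """Difference between buying every issue at cover price and subscribing."""
    return cover_price * issues_per_year - subscription_price


weekly_saving = saving(3, WEEKS_PER_YEAR, 104)    # 156 - 104
monthly_saving = saving(4, MONTHS_PER_YEAR, 36)   # 48 - 36
total_saving = weekly_saving + monthly_saving
print(total_saving)  # 64
```

Under these assumptions the total saving is 64 euros: 52 euros on the weekly
and 12 euros on the monthly.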

Fig. 4. Screenshot of a C programming exam track invented by ChatGPT.

3.4 Logic

We submitted to ChatGPT the logic problems in Figs. 8 and 9, obtaining wrong
results. As shown in Fig. 8, ChatGPT admitted its mistakes when we provided
the right solution and apologized in a very polite way.

4 The User Study


After having investigated the ChatGPT answers to different types of questions,
we conducted a user study with the aim of answering RQ4: What is the opinion
of Computer Science students on the tool?

Fig. 5. Screenshot of nonexistent references provided by ChatGPT.



Fig. 6. Screenshot of a wrong solution to a mathematical problem with
explanation provided by ChatGPT.

4.1 Study Planning


We tried to answer RQ4 by letting participants explore several types of
questions that a user may ask a general-purpose chatbot, on the basis of our
experience described in the previous section. We did not do this in an
exhaustive way, but we tried to investigate the impact of both the abilities
and the mistakes of ChatGPT on the user’s perception.

4.2 Participants
To assess user satisfaction with ChatGPT we conducted a study involving
fifteen participants.
Before the assessment was administered, we obtained the participants’
informed consent. Participants had the option to withdraw from the study at
any point if they felt uneasy or lost interest in continuing. This guaranteed
that the participants had complete control over their experience and were
free to make decisions based on their degree of comfort.
The participants were eleven men and four women. Nine participants were
between 24 and 30 years old, and six were between 18 and 23 years old.
All of them were Italian. Eleven participants had a bachelor’s degree in
Computer Science and four held a master’s degree in the same field.
In addition to their academic credentials, each participant had prior
experience interacting with ChatGPT. This provided a level of familiarity and
comfort that was beneficial during the delivery of the test.
Fig. 7. Screenshots of solutions to a mathematical problem with explanation
provided by ChatGPT: (a) right solution; (b) wrong solution.

4.3 Procedure

Before the study, we collected the participants’ satisfaction with ChatGPT.
In particular, they answered the question:
Q1: Overall, how satisfied are you with ChatGPT? The answers were scored
on a seven-point Likert scale, ranging from 1 = in no way to 7 = very very
much.
We provided the participants with a list of questions to submit to ChatGPT.
The identified questions are reported in Table 1. We selected the questions
on the basis of the analysis discussed in the previous section. Then the
study started under the supervision of two of the authors. When the
participants had completed all the tasks, they answered Q1 again and filled
in the following open question:
Q2: What is your opinion on ChatGPT?

4.4 Results

The difference between the participants’ satisfaction with ChatGPT before
and after the experience is shown in the boxplots in Fig. 10. Results
revealed that the participants’ satisfaction decreased after the experience,
but less than expected given that they had learned of the tool’s
nondeterministic behaviour. The median went from 6 (before) to 5 (after). As
an example, P7 scored seven in Q1 before the experience and six after. In
the answer to the open question, P7 stated: “Simple and intuitive to use. If
you use it with a certain constancy you can perceive the difference between
what has been formulated by it and what has not, and the structure of his
answers. Overall, it is an excellent trump card to be used in critical
situations and beyond”.
P3 stated that: “Very useful for study support, especially in carrying out and
explaining exercises; brainstorming, or, in general, when it is necessary to
discuss or elicit ideas and there is no adequate people; precise questions,
for which you want a short and immediate answer.” According to P4, “Even if
sometimes the answers may be unreliable the tool generates text which is very
realistic and polite. For this reason it is still appealing for me”. P14’s
opinion is the following: “It is a software that helps everyone and
facilitates the understanding and research of the information you need to
know. I trust it for code generation.”

Fig. 8. Screenshot of a wrong solution to a logic test provided by ChatGPT.



Table 1. Task description

Task             Question
Creativity       1.a) Write a children’s story on the fox and the bear for a
                 child
                 1.b) Generate a complex exam track of programming in C
                 language which requires the development of a program on
                 arrays by using the while loop, two arrays, the % operator
                 and provide the solution
                 Generate lyrics in John Lennon’s style.
Problem solving  3) For his aquarium Michele bought 50 fish including neons,
                 guppies, black angels and clown loaches. 46 are not guppies,
                 33 are not clown loaches and neons are one more than black
                 angels. How many neons are there?
Search activity  2.b.1) Who is Father Christopher? (he is a character of
                 “Promessi Sposi” by Alessandro Manzoni)
                 2.b.2) Give me an example of when Father Christopher does
                 the right thing
                 3.b.1) Do you know that Prosdocimo is a character in a
                 Rossini opera? [20]
                 3.b.2) Did you know that Prosdocimo is a character in “Turco
                 in Italia”? [20]
                 3.c) Please, explain the concept of pointers by using
                 examples in C language
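
The problem-solving task in Table 1 has a unique answer that can be verified
by elimination. The following sketch assumes that the four species listed
are the only fish in the aquarium; variable names are illustrative.

```python
TOTAL = 50
guppies = TOTAL - 46         # 46 fish are not guppies
clown_loaches = TOTAL - 33   # 33 fish are not clown loaches
remaining = TOTAL - guppies - clown_loaches  # neons + black angels

# Search for the split where neons are one more than black angels.
solutions = [
    (neons, remaining - neons)
    for neons in range(remaining + 1)
    if neons == (remaining - neons) + 1
]
print(solutions)  # [(15, 14)]
```

Under this assumption there are 4 guppies and 17 clown loaches, which leaves
29 fish split into 15 neons and 14 black angels.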

5 Discussion and Lesson Learned

In this section we try to answer the research questions on the basis of our
analysis and of the users’ perceptions. The primary objective of this case
study was to identify the strengths and limitations of ChatGPT and to
evaluate the effect of its reliability on user satisfaction. The findings of
the study revealed that, although the chatbot occasionally produced
incorrect responses, most users were still satisfied with it and enjoyed
interacting with it. Figure 10 depicts how the participants’ satisfaction
changed after learning that ChatGPT’s responses may not be reliable, while
Table 2 reports descriptive statistics of perceived satisfaction.

Fig. 9. Screenshot of a wrong solution to a logic problem with explanation
provided by ChatGPT.

In particular, we observed that the satisfaction of participants P2 and P3
grew after interacting with the chatbot, while the satisfaction of both P4
and P6 remained unchanged at 5 points. Concerning the other users, results
agreed with the findings in [6,8,14], which claimed that low reliability
mostly decreases the intention to use virtual AIs in laboratory and field
studies where the initial trust was very high. In our case, however, the
decrease is limited: the median was 6 before the experience and 5 after. This
also raises the question of whether users are prepared to trade accuracy and
reliability for the convenience and accessibility that chatbots give. One of
the most important takeaways from this study is that users are willing to
accept inconsistent responses from chatbots as long as they are able to
offer the desired information or text/code. It is essential to remember,
however, that this tolerance for unreliability may vary by subject. In other
words, people may be more tolerant of erroneous replies while searching for
information in fields in which they are experts, as they are able to
determine the reliability of the material themselves. The ability of ChatGPT
to serve users in a range of scenarios is an additional significant finding
of the study. Users may base critical decisions on the chatbot’s responses;
therefore, it is essential to thoroughly evaluate the accuracy of the
information supplied.

Table 2. Descriptive statistics of perceived satisfaction (N=15).

         Median  Mean  Stdev  Min  Max
Before   6       5.4   1.24   2    7
After    5       4.7   1.11   3    7

Fig. 10. Boxplot related to perceived satisfaction with ChatGPT collected Before and
After the study.

Concerning RQ1, the identification of the tasks better supported by ChatGPT,
it is difficult to give a definite answer. In many cases results are
satisfying, but unreliable or nondeterministic behavior often appears. This
includes creating fabricated or altered bibliographic references, as in the
case of the related works list, or making mistakes on logic problems while
providing believable solutions. Thus, before using its results it is crucial
to remember that ChatGPT can produce false or misleading information due to
its reliance on natural language processing and large-scale text parsing.
Therefore, it is essential to carefully confirm the veracity of information
gained through ChatGPT with reputable sources and by consulting specialists
in the field.

RQ1: Which tasks are better supported by ChatGPT?

It is commonly held that ChatGPT excels in text summarization and language
translation. We also verified that it excels in creativity tasks.
Concerning code generation tasks, the Stack Overflow community has a poor
opinion of the snippets produced, even if on simple code ChatGPT seems to
work well. Problems involving logic or mathematics are prone to yield
incorrect results. The chatbot also delivers accurate responses to questions
about general culture (provided that the fact in question occurred no later
than 2021). All the answers have to be verified with expert support or by
performing searches (in the present version of the tool).

We also observed the behaviour of ChatGPT when it has a “hallucination” and
provides erroneous information in a very credible way.

RQ2: How does ChatGPT behave when it does not know the answer?

ChatGPT always attempts to deliver a plausible response, even if it is
uncertain about the accuracy of the provided information. Rarely does the
tool admit it is unable to respond. When questioned about subjective
judgments (e.g., aesthetic evaluations), its own feelings, or methods to
damage others/commit unlawful acts, ChatGPT refuses to respond or admits it
cannot deliver a reliable response. When the user claims the response from
ChatGPT is incorrect, it almost always apologizes and modifies it.

Many newspapers, talk shows and also users consider ChatGPT as an appro-
priate general purpose supporting tool.

RQ3: How may we use the support offered by ChatGPT?

It is evident that ChatGPT generates realistic answers, and this may be very
dangerous if the person using it is not aware that they may be wrong. In
addition, the behaviour is not deterministic: the same question may receive
a correct or an incorrect answer when asked at different times.

Concerning the perception of the Computer Science students on ChatGPT, we
observed an enthusiastic attitude. Participants had already used the tool.

RQ4: What is the opinion of expert Computer Science students on the tool?

Although participants were clearly aware that the answers from ChatGPT may
sometimes be unreliable, they would continue to use it for general research
and to trust it for specific tasks. Even if some of the responses were
incorrect, the average score for user satisfaction was 4.7 out of 7, with a
median of 5.

6 Conclusion
In this paper, we presented a study investigating the reliability of the
answers generated by ChatGPT and how satisfied end users, e.g., Computer
Science students, are with this chatbot, also collecting their opinions. To
this aim we submitted problems of different types to ChatGPT. It generated
very credible text, such as stories or lyrics, but failed on some
mathematics and logic problems, always trying to provide an answer. Fifteen
Computer Science students used the tool following our indications, which
exposed them to both appropriate and unreliable answers. Surprisingly, they
continued to appreciate the tool. This study concludes by emphasizing the
trade-off between reliability and usability in the design of chatbots. Users
are ready to accept inconsistent responses in exchange for the convenience
and accessibility given by chatbots; nonetheless, it is essential to ensure
that chatbots are used responsibly and that users are aware of their limits.
This study was conducted on a small number of participants since they were
expert users. For more accurate results we plan to consider a larger sample,
also involving participants without previous knowledge of the chatbot.

References
1. Adamopoulou, E., Moussiades, L.: An overview of chatbot technology. In:
Maglogiannis, I., Iliadis, L., Pimenidis, E. (eds.) AIAI 2020. IAICT, vol.
584, pp. 373–383. Springer, Cham (2020).
https://doi.org/10.1007/978-3-030-49186-4_31
2. Aydın, N., Erdem, O.A.: A research on the new generation artificial intelligence
technology generative pretraining transformer 3. In: 2022 3rd International Infor-
matics and Software Engineering Conference (IISEC), pp. 1–6. IEEE (2022)
3. Aydın, Ö., Karaarslan, E.: OpenAI ChatGPT generated literature review: digital
twin in healthcare. Available at SSRN 4308687 (2022)
4. Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural
Information Processing Systems 33, pp. 1877–1901 (2020)
5. Choi, J.H., Hickman, K.E., Monahan, A., Schwarcz, D.: ChatGPT goes to law
school. Available at SSRN (2023)
6. Fan, X., et al.: The influence of agent reliability on trust in human-agent collabo-
ration. In: Proceedings of the 15th European Conference on Cognitive Ergonomics:
The Ergonomics of Cool Interaction, pp. 1–8 (2008)

7. Forbes: Microsoft confirms its $10 billion investment into ChatGPT, changing
how Microsoft competes with Google, Apple and other tech giants.
https://www.forbes.com/sites/qai/2023/01/27/microsoft-confirms-its-10-billion-investment-into-chatgpt-changing-how-microsoft-competes-with-google-apple-and-other-tech-giants/?sh=24dd324b3624
8. Glass, A., McGuinness, D.L., Wolverton, M.: Toward establishing trust in adaptive
agents. In: Proceedings of the 13th International Conference on Intelligent User
Interfaces, pp. 227–236 (2008)
9. Katrak, M.: The role of language prediction models in contractual interpretation:
the challenges and future prospects of GPT-3. In: Legal Analytics, pp. 47–62 (2023)
10. Khanna, A., Pandey, B., Vashishta, K., Kalia, K., Pradeepkumar, B., Das, T.: A
study of today’s AI through chatbots and rediscovery of machine intelligence. Int.
J. u- and e-Serv. Sci. Technol. 8(7), 277–284 (2015)
11. Korngiebel, D.M., Mooney, S.D.: Considering the possibilities and pitfalls of gen-
erative pre-trained transformer 3 (GPT-3) in healthcare delivery. NPJ Digit. Med.
4(1), 93 (2021)
12. Krugman, P.: Does ChatGPT mean robots are coming for the skilled jobs?
The New York Times.
https://www.nytimes.com/2022/12/06/opinion/chatgpt-ai-skilled-jobs-automation.html
13. Alshater, M.M.: Exploring the role of artificial intelligence in enhancing academic
performance: a case study of ChatGPT. Available at SSRN (2022)
14. Moran, S., et al.: Team reactions to voiced agent instructions in a pervasive game.
In: Proceedings of the 2013 International Conference on Intelligent User Interfaces,
pp. 371–382 (2013)
15. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving lan-
guage understanding by generative pre-training (2018)
16. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language
models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)
17. Roose, K.: The brilliance and weirdness of ChatGPT. The New York Times.
https://www.nytimes.com/2022/12/05/technology/chatgpt-ai-twitter.html
18. Sandzer-Bell, E.: ChatGPT music prompts for generating chords and lyrics.
https://www.audiocipher.com/post/chatgpt-music
19. Stack Overflow.
https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt-is-banned
20. Vetere, G.: Posso chiamarti prosdocimo? perché è bene non fidarsi troppo delle
risposte di ChatGPT. https://centroriformastato.it/posso-chiamarti-prosdocimo/
