
JOURNAL OF MEDICAL INTERNET RESEARCH Bibault et al

Original Paper

A Chatbot Versus Physicians to Provide Information for Patients With Breast Cancer: Blind, Randomized Controlled Noninferiority Trial

Jean-Emmanuel Bibault1*, MD, PhD; Benjamin Chaix2,3*, MSc; Arthur Guillemassé3, MSc; Sophie Cousin4, MSc,
MD; Alexandre Escande5, MSc, MD; Morgane Perrin6, MSc, MD; Arthur Pienkowski3, PharmD; Guillaume Delamon3,
PharmD; Pierre Nectoux3, MSc; Benoît Brouard3, PharmD
1 Department of Radiation Oncology, Hôpital Européen Georges Pompidou, AP-HP, Paris, France
2 ENT Department, Hôpital Gui de Chauliac, Université Montpellier 1, Montpellier, France
3 Wefight, Institut du Cerveau et de la Moelle épinière, Hôpital Pitié-Salpêtrière, Paris, France
4 Department of Medical Oncology, Institut Bergonié, Bordeaux, France
5 Department of Radiation Oncology, Centre Oscar Lambret, Lille, France
6 Department of Gynecological Oncologic Surgery, Gustave Roussy Cancer Campus, Villejuif, France
* these authors contributed equally

Corresponding Author:
Benjamin Chaix, MSc
ENT Department
Hôpital Gui de Chauliac
Université Montpellier 1
60 Avenue Augustin Fliche
Montpellier, 34264
France
Phone: 33 0467336872
Email: b-chaix@[Link]

Abstract
Background: The data regarding the use of conversational agents in oncology are scarce.
Objective: The aim of this study was to verify whether an artificial conversational agent was able to provide answers to patients
with breast cancer with a level of satisfaction similar to the answers given by a group of physicians.
Methods: This study is a blind, noninferiority randomized controlled trial that compared the information given by the chatbot,
Vik, with that given by a multidisciplinary group of physicians to patients with breast cancer. Patients were women with breast
cancer in treatment or in remission. The European Organisation for Research and Treatment of Cancer Quality of Life Group
information questionnaire (EORTC QLQ-INFO25) was adapted and used to compare the quality of the information provided to
patients by the physicians or the chatbot. The primary objective was to show that the answers given by the Vik chatbot to common
questions asked by patients with breast cancer about their therapy management are at least as satisfying as the answers given by a
multidisciplinary medical committee, by comparing the success rate in each group (defined as a score above 3). The secondary
objective was to compare the average scores obtained by the chatbot and the physicians for each INFO25 item.
Results: A total of 142 patients were included and randomized into two groups of 71. They were all female with a mean age of
42 years (SD 19). The success rate (as defined by a score >3) was 69% (49/71) in the chatbot group versus 64% (46/71) in the
physicians group. The binomial test showed the noninferiority (P<.001) of the chatbot’s answers.
Conclusions: This is the first study that assessed an artificial conversational agent used to inform patients with cancer. The
EORTC INFO25 scores from the chatbot were found to be noninferior to the scores of the physicians. Artificial conversational
agents may save patients with minor health concerns from a visit to the doctor. This could allow clinicians to spend more time
treating patients who need a consultation the most.
Trial Registration: [Link] NCT03556813, [Link]

[Link] J Med Internet Res 2019 | vol. 21 | iss. 11 | e15787

(J Med Internet Res 2019;21(11):e15787) doi: 10.2196/15787

KEYWORDS
chatbot; clinical trial; cancer

Introduction

Background

Chatbots can imitate human conversation by using a field of artificial intelligence (AI) known as natural language processing. Chatbots are now widely used in several forms as voice-based agents, such as Siri (Apple), Google Now (Google), Alexa (Amazon), or Cortana (Microsoft). Text-based chatbots are available as Messenger (Facebook) agents or as stand-alone mobile or Web apps. They provide information and create a dynamic interaction between the agent and the user, without human back-end intervention. The concept of an artificial conversational agent dates back to 1950, when Alan Turing envisioned a future where a computer would be able to express itself with a level of sophistication that would render it indistinguishable from humans [1].

In health care, the first example of a computer program used as a conversational agent was Joseph Weizenbaum's ELIZA, a program that mimicked a Rogerian psychotherapist and was able to rephrase the patient's sentences as questions and provide prerecorded answers [2]. In 1991, Dr Sbaitso was created as an AI speech synthesis program for MS-DOS personal computers. In this software, Dr Sbaitso was designed as a psychologist, with very limited possibilities [3]. Four years later, the chatbot Artificial Linguistic Internet Computer Entity was created to include 40,000 knowledge categories and was awarded the Loebner Prize thrice [4]. In 2001, SmarterChild was made available as a bot distributed across SMS networks and is now considered a precursor to Apple's Siri, which was released on iPhones in 2010. Patients can now use chatbots to check for symptoms and to monitor their health, but the relevance and validity of chatbots have rarely been assessed [5-7].

Objective

Wefight designed a chatbot named Vik for patients with breast cancer and their relatives via personalized text messages. Vik provides information about breast cancer and its epidemiology, treatments, side effects, and quality of life improvement strategies (sport, fertility, sexuality, and diet). More practical information, such as reimbursement and patients' rights, is also available. Chaix et al [8] showed that it was possible to obtain support through a chatbot, as Vik improved the medication adherence rate of patients with breast cancer. Vik is available for free on the Web, on mobile phones running iOS (Apple) or Android (Google), and on Messenger (Facebook).

This study is a blind, noninferiority randomized controlled trial that compared the information given by the Vik chatbot with that given by a multidisciplinary group of physicians (medical, radiation, and surgical oncology) to patients with breast cancer (NCT03556813). The EORTC QLQ-INFO25 questionnaire, which was validated to assess the information given to patients with cancer [9], was adapted and used to compare the quality of the information provided to the 2 groups of patients by the physicians or the chatbot.

Methods

Study Design and Participants

The study was a blind, noninterventional, noninferiority randomized study, without any risk or burden. It was conducted in France in November and December 2018.

The authors selected the 12 most frequently asked questions about breast cancer from Vik's database (Multimedia Appendix 1). These questions were then asked both to the Vik chatbot and to a multidisciplinary medical committee (oncologist surgeon, medical oncologist, and oncologist radiotherapist; Figure 1). A second, independent multidisciplinary group of physicians ensured that neither group's answers contained inaccurate information. Institutional affiliations of the coordinating team were not displayed.

Patients were recruited with the help of a French breast cancer patients association (Mon Réseau Cancer du Sein). They were screened for eligibility based on the inclusion criteria (age >18 years, female, breast cancer in treatment or remission, nonopposition, and internet literacy). Participants were compensated for their time. This study was approved by an ethics committee independently selected by the French Ministry of Health (N° ID RCB: 2018-A01365-50) and registered in the [Link] database (NCT03556813).

The data collected were anonymized and then hosted by Wefight on a server compliant with health care data storage requirements. Consent was collected online before the start of the study. In accordance with the French and European laws on information technology and civil liberties (Commission Nationale Informatique et Libertés, Règlement Général pour la Protection des Données), users had at their disposal a right of access to the data to verify their accuracy and, if necessary, to correct, complete, and update them. They also had a right to object to the use of these data and a right to delete them. General conditions of use were displayed and explained clearly, and they had to be accepted before using the questionnaire. No demographic data beyond age were collected for participation in the study.


Figure 1. Flow diagram. EORTC QLQ-INFO25: European Organisation for Research and Treatment of Cancer Quality of Life Group information questionnaire.

Chatbot Design

Wefight designed a chatbot named Vik to empower patients with cancer and their relatives via personalized text messages. Vik's answers are very diverse, and patients can find all the relevant, quality-checked medical information they need. Vik's architecture is composed of several technological parts allowing a fine analysis of the questions posed by the patients and an adapted treatment of the answer.

For a chatbot to be fully developed, both machine learning algorithms and natural language processing are required. To build a chatbot, there are 2 crucial components to be supervised: intent classification and entity recognition. To understand users' messages and send personalized answers, the conversation goes through 3 steps: the first step analyzes the sentence and identifies intents and entities by using machine learning; the second step activates modules according to the intents and entities detected by the first step; and the third step aggregates the answers of all activated modules to build the answer sent to the user and saves the conversation on the user's profile.

For the patient, the use of a chatbot is very simple. It is a classic chat on a conversation window. The patient asks a question by writing it on his or her keyboard, and the chatbot answers directly in simple and understandable language.
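The three-step pipeline described above can be sketched as follows. This is a hypothetical illustration, not Vik's actual implementation: the real system uses machine learning models for intent classification and entity recognition, whereas here a simple keyword lookup stands in so that the control flow is runnable, and all function, module, and intent names are invented.

```python
from dataclasses import dataclass

@dataclass
class Message:
    intent: str
    entities: list

def analyze(text: str) -> Message:
    # Step 1: identify the intent and entities in the sentence.
    # (Keyword matching stands in for the ML classifiers here.)
    lowered = text.lower()
    intent = "side_effects" if "side effect" in lowered else "treatment_info"
    entities = [w for w in ("chemotherapy", "radiotherapy") if w in lowered]
    return Message(intent, entities)

# Answer modules, keyed by intent; each returns a list of partial answers.
MODULES = {
    "side_effects": lambda ents: [f"Common side effects of {e}: ..." for e in ents],
    "treatment_info": lambda ents: ["General information about treatment: ..."],
}

def respond(text: str, profile: list) -> str:
    msg = analyze(text)                                           # step 1
    answers = MODULES[msg.intent](msg.entities or ["treatment"])  # step 2: activate modules
    reply = " ".join(answers)                                     # step 3: aggregate answers
    profile.append((text, reply))   # save the conversation on the user's profile
    return reply

profile = []
print(respond("What are the side effects of chemotherapy?", profile))
```

The separation of concerns mirrors the text: detection, module activation, and aggregation can each be replaced independently, which is what makes the modular architecture attractive.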

Procedures

Patients were randomized (1:1) blindly and received either the responses of the Vik chatbot or the responses of the medical committee to the 12 predefined questions, as previously explained. Participants were shown each question in order, and blinded responses were delivered directly as Web-based text messages in each group. The full answer for each question from either Vik or the experts was shown directly to the participants upon activation of each question. There was no actual conversation per question, nor any need for natural language processing for each question. Patients were then asked to complete an adapted version of the EORTC QLQ-INFO25 questionnaire online, assessing the quality of the medical information received based on each response. A total of 21 items of the EORTC QLQ-INFO25 questionnaire were included (Multimedia Appendix 1).

Outcomes

The perceived quality of the answers was assessed using the QLQ-INFO25 questionnaire, which uses a satisfaction scale graded from 1 to 4.

The primary objective was to assess the overall perceived quality of the answers given by the Vik chatbot to common questions asked by patients with breast cancer about their therapy management, compared with the answers given by a multidisciplinary medical committee (oncologist surgeon, medical oncologist, and radiotherapist oncologist), by comparing the proportions of success in the physicians' and Vik's groups. The secondary objective was to compare the average scores obtained by the chatbot and by the physicians for each individual INFO25 item. Gradings for the 21 items were averaged to define an overall score for each patient in each group. We defined success as a grade greater than or equal to 3. Descriptive statistics were used to summarize patient characteristics by treatment group.

Statistical Analysis

This study used a randomized phase III design with an alpha of .05 and a beta of .2, with a noninferiority limit of 10%. The effect size was based on a published EORTC INFO25 validation study [9]. The noninferiority limit of 10% was chosen as an acceptable difference for patient satisfaction. In view of these assumptions, the trial required at least 142 patients randomly assigned to the 2 groups. A 1-sided binomial test using the method of Miettinen and Nurminen was performed to compare the difference between the proportions of success in the 2 groups for questions 1 to 19 against the noninferiority limit. Noninferiority was declared if the P value of the test was lower than .05. For each item, the confidence interval of the difference between the proportions of success in the physicians' group and Vik's group was estimated using the Wald Z method. Noninferiority was declared when the upper limit of the 2-sided 90% CI, equivalent to a 1-sided 95% CI, did not exceed the noninferiority limit of 10%.

Results

Analysis Size

Between November and December 2018, we included a total of 142 patients, divided into 2 groups of 71. In each group, all participants who were randomly assigned received the intended responses and were analyzed for the primary outcome. A single intervention was performed for this study. All participating patients finished the evaluation. They were all female, with a mean age of 42 years (SD 19).

Descriptive Analysis

Patients responded to the questionnaire in an average of 15 min (SD 4). The first group of 71 patients received the responses from Vik, and the second group received the responses from the physicians. The average global rating was 2.86 (median 3, IQR 2-4). The success rates (as defined by a score >3) were 69% in the chatbot group versus 64% in the physicians group. Patients assessing the physicians' answers gave an average rating of 2.82, whereas patients assessing Vik's answers gave an average rating of 2.89 (Multimedia Appendix 2).

A total of 62.0% of patients (88/142) would have liked to get even more information (65% [46/71] in the physicians' group and 59% [42/71] in Vik's group), whereas only 4.2% (6/142) would have liked to get less. A total of 83.1% of patients (118/142) found the answers helpful (82% [58/71] in the physicians' group and 85% [60/71] in Vik's group), and 81.0% (115/142) were satisfied with the amount of information they had received (77% [55/71] in the physicians' group and 85% [60/71] in Vik's group).

Comparison of Patient Groups

Primary Objective

The difference between the success rates in the physicians' group and Vik's group was -0.03 (95% CI -0.07 to 0.00). Furthermore, the binomial test showed noninferiority (P<1e-14) of the perceived quality of the chatbot responses relative to that of the physicians' responses, as assessed by the EORTC INFO25.

Secondary Objective

A 2-sided 90% CI, equivalent to a 1-sided 95% CI, was computed for the difference between the proportions of success in the physicians' group and Vik's group for each item (Multimedia Appendix 2). For 12 of the items, noninferiority could be declared, as the upper limit of the CI did not exceed the 0.1 noninferiority limit (Figure 2). For the remaining 9 items, the upper limit of the CI crossed the 0.1 noninferiority limit, so noninferiority cannot be claimed for these items. These items include questions 2 and 3 about breast cancer stages and causes, question 4 about whether or not the cancer is under control, 4 questions related to treatments (types, benefits, and side effects), and 2 questions related to care outside of the hospital.
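As a sketch of the confidence-interval check described in the Statistical Analysis section, the following snippet applies the Wald Z method with a 2-sided 90% CI to the overall success counts reported here (49/71 for Vik, 46/71 for the physicians). It illustrates the procedure only; the published primary test used the Miettinen-Nurminen method, so the exact figures may differ slightly from those reported.

```python
from math import sqrt

Z = 1.6449     # normal quantile for a 2-sided 90% CI (equivalent to a 1-sided 95% CI)
MARGIN = 0.10  # noninferiority limit of 10%

def wald_ci(k_a, n_a, k_b, n_b, z=Z):
    """Wald Z confidence interval for the difference in success proportions (A minus B)."""
    p_a, p_b = k_a / n_a, k_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - z * se, diff + z * se

# Difference physicians minus Vik, using the reported success counts;
# noninferiority of Vik is declared if the upper CI limit stays below the margin.
low, high = wald_ci(46, 71, 49, 71)
print(f"difference: {46/71 - 49/71:+.3f}, 90% CI ({low:.3f}, {high:.3f})")
print("noninferior:", high < MARGIN)
```

With these counts the upper limit falls below 0.10, so the overall noninferiority conclusion is consistent with the per-item decision rule the authors describe.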


Figure 2. Noninferiority (NI) graph.

Discussion

Principal Findings

This is the first study that rigorously assessed an artificial conversational agent used to inform patients with cancer, but the study has limitations: we did not evaluate the demographic features of the patients who answered the survey, to remain in compliance with the European General Data Protection Regulation. Patients were recruited in our study through a patients association mailing list, which means that they could potentially be younger than the average population of patients with breast cancer, have more digital literacy skills, and be more open-minded toward digital tools, even if the 2 groups were blinded and did not know whether they received the answers from the chatbot or from the group of physicians.

Chatbot Assessment

A search on [Link] currently returns only 4 trials evaluating chatbots in health care. In the United Kingdom, a nonrandomized trial is being performed by the National Health Service to compare the Babylon chatbot with the nonemergency 111 telephone number [10]. Patients interact with an automatic agent to describe their symptoms; advice and information are given in return by the chatbot. The second trial, The Buddy Study (NCT02742740), evaluates an Embodied Conversational Agent (ECA) Oncology Trial Advisor for Cancer Trials that acts as an advisor to patients on chemotherapy regimens, promoting protocol adherence and retention, providing anticipatory guidance, and answering questions. The chatbot also serves as a conduit to capture information about complaints or adverse events. Usability metrics will include session time, satisfaction, and error rates. Subjects will be identified from among patients on chemotherapy regimens at the Boston Medical Center [11]. All subjects will be enrolled for 2 months and randomized to the chatbot group or the control group. The primary outcome will be treatment protocol adherence, defined as the number of treatment visits attended divided by the number of treatment visits scheduled. The secondary outcomes will be subject satisfaction; the number of adverse events reported through the ECA and directly to the clinic by the patient; the time to detect and resolve adverse events reported through the ECA and directly to the clinic by the patient; and the adverse event false alarm rate reported through the ECA and directly to the clinic by the patient. The third study, the RAISE project (NCT01458002) [12], is designed to promote exercise and sun protection. The primary aims were to develop

and assess the effectiveness of a tailored internet intervention on a national sample, to develop and assess the effectiveness of the internet intervention enhanced by a relational agent, and to determine if the intervention with the relational agent can outperform the regular tailored internet intervention. The study will include 3 groups (control, internet, and internet plus relational agent). A representative national sample of 1639 individuals at risk for both behaviors will be recruited.

Randomized studies demonstrating the superiority (or at least noninferiority) of chatbots, compared with an intervention performed by a physician, do not exist. However, if chatbots are to be safely used by a large number of patients, they must be evaluated like a medical device or even a drug. The consequences of a medical chatbot dysfunction could potentially have a significant negative impact, such as misdiagnosis, delayed diagnosis, inappropriate self-medication, or bad treatment adherence. Their use should not be promoted without conducting thorough investigations.

Conclusions

The data regarding the use of conversational agents in health care in general, and oncology in particular, are limited, which is in sharp contrast with their potential benefits for the patients and the health care system. In this phase III, blind, noninferiority, randomized controlled trial, the EORTC INFO25 scores from the chatbot were found to be noninferior to the scores of the group of physicians. Conversational agents may save patients with minor health concerns from a visit to the doctor. This could allow clinicians to spend more time treating patients who need a consultation the most. Consultations for symptoms that do not require an actual consultation could be avoided, potentially saving a significant amount of money and resources. However, if the quality of these computer programs is not rigorously assessed, they could be unable to actually detect the difference between minor and major symptoms, without anyone knowing. Health chatbots will need to be used by many and have access to rich datasets to increase their knowledge of medical terms, symptoms, and treatments. These systems will not replace physicians and should be considered as a resource to enhance the efficacy of health care interventions. If chatbots are consistently shown to be effective and safe, they could be prescribed like a drug to improve patient information, monitoring, or treatment adherence. Significant hurdles still exist in the widespread application of chatbots at this time, such as compliance with the Health Insurance Portability and Accountability Act.

Acknowledgments
BC had full access to all of the data in the study and takes responsibility for the quality, integrity of the data, and the accuracy of
the data analysis. The authors would like to thank Laure Guéroult Accolas, association Patients en Réseau, for her help in patient
recruitment.

Authors' Contributions
JEB and BC contributed equally. JEB and BC designed the study and wrote the manuscript. BC and AP performed research. AG
performed statistical analysis. The manuscript was reviewed by all the authors.

Conflicts of Interest
AG, AP, GD, BB, and PN are employed by Wefight. BC and JEB own shares of Wefight.

Multimedia Appendix 1
Questions used and adapted version of the EORTC QLQ-INFO25 questionnaire.
[DOCX File , 16 KB-Multimedia Appendix 1]

Multimedia Appendix 2
Detailed grading of each EORTC INFO25 item in each group.
[DOCX File , 18 KB-Multimedia Appendix 2]

Multimedia Appendix 3
CONSORT-EHEALTH checklist (V 1.6.1).
[PDF File (Adobe PDF File), 2281 KB-Multimedia Appendix 3]

References
1. Turing AM. Computing machinery and intelligence. Mind 1950;59(236):433-460. [doi: 10.1093/mind/LIX.236.433]
2. Weizenbaum J. ELIZA---a computer program for the study of natural language communication between man and machine.
Commun ACM 1966;9(1):36-45. [doi: 10.1145/365153.365168]
3. Wikipedia. 2018. Dr. Sbaitso. URL: [Link] [accessed 2018-10-23]


4. AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour. URL: [Link] [accessed
2018-10-23]
5. Miner AS, Milstein A, Schueller S, Hegde R, Mangurian C, Linos E. Smartphone-based conversational agents and responses
to questions about mental health, interpersonal violence, and physical health. JAMA Intern Med 2016 May 1;176(5):619-625
[FREE Full text] [doi: 10.1001/jamainternmed.2016.0400] [Medline: 26974260]
6. Brouard B, Bardo P, Bonnet C, Mounier N, Vignot M, Vignot S. Mobile applications in oncology: is it possible for patients
and healthcare professionals to easily identify relevant tools? Ann Med 2016 Nov;48(7):509-515. [doi:
10.1080/07853890.2016.1195010] [Medline: 27348761]
7. Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, et al. Conversational agents in healthcare: a systematic
review. J Am Med Inform Assoc 2018 Sep 1;25(9):1248-1258 [FREE Full text] [doi: 10.1093/jamia/ocy072] [Medline:
30010941]
8. Chaix B, Bibault J, Pienkowski A, Delamon G, Guillemassé A, Nectoux P, et al. When Chatbots meet patients: one-year
prospective study of conversations between patients with breast cancer and a Chatbot. JMIR Cancer 2019 May 2;5(1):e12856
[FREE Full text] [doi: 10.2196/12856] [Medline: 31045505]
9. Arraras JI, Greimel E, Sezer O, Chie W, Bergenmar M, Costantini A, et al. An international validation study of the EORTC
QLQ-INFO25 questionnaire: an instrument to assess the information given to cancer patients. Eur J Cancer 2010
Oct;46(15):2726-2738. [doi: 10.1016/[Link].2010.06.118] [Medline: 20674333]
10. Babylon Health. URL: [Link] [accessed 2018-05-25]
11. Clinical Trials. Study Buddy (an ECA Oncology Trial Advisor) for Cancer Trials. URL: [Link] NCT02742740 [accessed 2018-10-23]
12. Clinical Trials. Online Tailored Interventions & Relational Agents for Exercise and Sun Protection (Project RAISE). URL: [Link] [accessed 2018-10-23]

Abbreviations
AI: artificial intelligence
ECA: Embodied Conversational Agent
EORTC QLQ-INFO25: European Organisation for Research and Treatment of Cancer Quality of Life Group
information questionnaire

Edited by G Eysenbach; submitted 07.08.19; peer-reviewed by M Hall, Y Xiao; comments to author 03.10.19; revised version received
14.10.19; accepted 23.10.19; published 24.11.19
Please cite as:
Bibault JE, Chaix B, Guillemassé A, Cousin S, Escande A, Perrin M, Pienkowski A, Delamon G, Nectoux P, Brouard B
A Chatbot Versus Physicians to Provide Information for Patients With Breast Cancer: Blind, Randomized Controlled Noninferiority
Trial
J Med Internet Res 2019;21(11):e15787
URL: [Link]
doi: 10.2196/15787
PMID:

©Jean-Emmanuel Bibault, Benjamin Chaix, Arthur Guillemassé, Sophie Cousin, Alexandre Escande, Morgane Perrin, Arthur
Pienkowski, Guillaume Delamon, Pierre Nectoux, Benoît Brouard. Originally published in the Journal of Medical Internet Research
([Link] 24.11.2019. This is an open-access article distributed under the terms of the Creative Commons Attribution
License ([Link] which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete
bibliographic information, a link to the original publication on [Link] as well as this copyright and license information
must be included.
