R2 Bayesian-Based Symptom Screening For Medical Dialogue Diagnosis
Abstract—In a medical dialogue diagnosis system, the selection of symptoms for inquiry has a significant impact on diagnostic accuracy and dialogue efficiency. In a typical diagnosis process, the symptoms initially reported by users are often insufficient to support an accurate diagnosis, making it necessary to ask users about other symptoms through dialogue to form a conclusive diagnosis. In this paper, we propose a disease diagnosis algorithm based on Bayesian inference, which simulates the process of a doctor's inquiry and diagnosis by dynamically updating the list of diseases to increase the interpretability of diagnosis results. For symptom interrogation, we propose a symptom screening algorithm based on the difference of symptom sets to exclude diseases with low probability. Through the intersection and union of disease symptom sets, we can screen out the symptoms that distinguish diseases in fewer inquiry rounds. The experimental results demonstrate that the proposed method performs more efficiently than existing state-of-the-art algorithms.

Index Terms—Medical dialogue diagnosis system, symptom screening, Bayesian inference, e-health

I. INTRODUCTION

The medical dialogue diagnosis system is one of the most important artificial intelligence applications in healthcare. It simulates a doctor, questioning the patient and combining the answers with the patient's self-report to predict the potential disease. Such a system can provide pre-consultation to patients and also help doctors collect information from patients.

The main challenges are that patients describe their symptoms incompletely and that one symptom can be associated with multiple diseases, which makes diagnosis more difficult. Current research aims to make a more accurate diagnosis by holding multiple rounds of dialogue with the patient to make up for information that the patient does not actively give. For efficiency, the system should also check enough additional symptoms in as few dialogue rounds as possible.

Among traditional methods, [1] developed an expert system by extracting knowledge from datasets and human knowledge, but it has poor transferability. Machine learning approaches such as decision trees, SVM, and random forests have been used to predict a disease, but they do not achieve good classification results when there is a high overlap of symptoms between diseases.

Recently there has been an increase in research based on deep reinforcement learning. [2] defined the query and diagnosis process as a Markov decision process and used a deep Q-network (DQN) for policy learning. Based on this, [3] introduced knowledge graphs, adding a knowledge branch and a knowledge-routed graph branch, but with limited improvement. However, RL-based methods have poor interpretability, require more resources, and are also poorly transferable.

In this paper, we propose a Bayesian-based symptom screening algorithm. We obtain rough disease probabilities from the positive and negative symptoms in the patient's self-report using Bayesian inference, and rank them in descending order. Symptom screening exploits the variability between disease symptom sets to identify the symptoms that best distinguish diseases: we select high-probability symptoms of low-probability diseases to gain more additional information. Symptom screening is performed as a binary search, which reduces the number of diseases by half in each round of inquiry, greatly improving the efficiency of inquiry. The patient's additional information and self-report are then used to make the final disease prediction. Our contributions are as follows:

1) We propose a lightweight Bayesian-based symptom screening algorithm to predict patients' diseases through multiple rounds of interrogation. Its main advantages are high interpretability, low computational complexity, and low data and computational resource requirements.

2) We propose a symptom screening algorithm. The variability between sets of disease symptoms is exploited to find the symptoms that best distinguish the diseases. The number of interrogation rounds is greatly reduced by binary search. The best results are obtained on two real-world datasets.

The rest of this paper is organized as follows. Section II reviews related work. In Section III, we describe the overall framework and details of the algorithm. We then present the experimental setup and results in Section IV. Finally, we conclude in Section V.

* Yuchun Guo is the corresponding author.
2023 IEEE Symposium on Computers and Communications (ISCC) | 979-8-3503-0048-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/ISCC58397.2023.10218219
Authorized licensed use limited to: Zhejiang University. Downloaded on December 12,2024 at 13:01:50 UTC from IEEE Xplore. Restrictions apply.
IEEE Conference on ICT Solutions for eHealth (ICTS4eHealth 2023)
Note that the term $P(d = 1, S^+, S^-)$ can be factorized because the symptoms are conditionally independent given all diseases.

The Bayesian inference algorithm is summarized as Algorithm 1. Step 1 initializes an empty list $DP$ to store the probabilities of all diseases. Step 2 calculates, for each disease, its joint probability with $S^+$ and $S^-$. Step 3 calculates the probability of $S^+$ and $S^-$. Step 4 calculates the conditional probability of each disease given $S^+$ and $S^-$ and adds it to $DP$.

Algorithm 1: Bayesian inference algorithm for diagnosis
Input: Positive symptoms $S^+$ and negative symptoms $S^-$.
Output: List $DP$ of the probability of each disease
1 Initialize an empty list $DP$;
2 for $i$ in $1 \ldots m$ do
3     Compute $P(d_i = 1, S^+, S^-)$ using equation 6;
4 end
5 Compute $P(S^+, S^-)$ using equation 7;
6 for $i$ in $1 \ldots m$ do
7     Compute $P(d_i = 1 \mid S^+, S^-)$ using equation 5;
8     add it to the list $DP$;
9 end

D. Symptom screening

Generally, patients do not provide enough information for the doctor to confirm the disease, so additional information is required through multiple rounds of inquiry. Inspired by this, we propose a symptom screening algorithm based on the difference between the symptom sets of the diseases.

Patient self-reports are used to obtain rough disease probabilities with Algorithm 1, which are sorted in descending order ($P(d_0 \mid S^+, S^-) > P(d_1 \mid S^+, S^-) > \ldots > P(d_{m-1} \mid S^+, S^-)$). Then we get the high probability disease symptom set $S_H = \{S_i, i \in [1, 2 \ldots \lceil \frac{m}{2} \rceil]\}$ and the low probability disease symptom set $S_L = \{S_i, i \in [\lceil \frac{m}{2} \rceil - 1, \ldots m]\}$. Figure 1 shows our symptom screening process for each round, illustrated for the case of four diseases. Each circle represents the full set of symptoms of a disease, the hatched area is the target symptom set, and blue and orange represent the symptom sets belonging to $S_H$ and $S_L$, respectively.

Fig. 1. A round of symptom screening process (four symptom sets $S_1$ to $S_4$, with $S_H$, $S_L$ and $S_O$ marked).

Our goal is to find the symptom that best distinguishes the diseases. At each inquiry we put the symptom into $S^+$ or $S^-$ depending on the patient's positive or negative response, while removing half of the symptom sets. Because the current results are biased towards high probability diseases, the system gains less if the symptoms asked of the patient are selected from the high probability disease symptom sets; in other words, the Bayesian inference process gains more additional information when the symptoms are selected from the low probability disease symptom sets.

In order to make the screened symptoms correlated with all sets in $S_L$ and uncorrelated with all sets in $S_H$, we take the union of the $S_i$ in $S_H$:

$$S_\cup = \bigcup_{S_i \in S_H} S_i \tag{8}$$

and the intersection of the $S_i$ in $S_L$:

$$S_\cap = \bigcap_{S_i \in S_L} S_i \tag{9}$$

Then, we get the target set of symptoms:

$$S_O = S_\cap - S_\cup = \bigcap_{S_i \in S_L} S_i - \bigcup_{S_i \in S_H} S_i \tag{10}$$

After obtaining the target symptom set $S_O$, in order to make the selected symptom work in most cases, we select the most frequently occurring symptom according to the probability distribution of the symptoms:

$$s_{Omax} = \operatorname*{argmax}_{s \in S_O} P(s = 1) \tag{11}$$

So we select the symptom $s_{Omax}$ with the highest probability of being positive in $S_O$ and add it to the positive symptom set $S^+$ or the negative symptom set $S^-$ according to the patient's answer.

Figure 2 represents the complete interrogation process of the system when four diseases are present. The dashed circles indicate the eliminated symptom sets. In the first round, the system obtains the current symptom $s_{Omax}$ with the symptom screening algorithm and asks the patient about it. When $s_{Omax}$ is positive, $S_1$ and $S_2$ are eliminated and $s_{Omax}$ is put into the positive symptom set $S^+$; when $s_{Omax}$ is negative, $S_3$ and $S_4$ are eliminated and $s_{Omax}$ is put into the negative symptom set $S^-$. For the second round, take the left half as an example. At this point, $s_{Omax}$ is recalculated; when $s_{Omax}$ is positive, $S_3$ is eliminated and $s_{Omax}$ is put into the positive symptom set $S^+$; when $s_{Omax}$ is negative, $S_4$ is eliminated and $s_{Omax}$ is put into the negative symptom set $S^-$. After the second round, only one disease symptom set remains.
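As a rough illustration of one inference-and-screening round, the following Python sketch combines Algorithm 1 with equations (8) to (11). It rests on several assumptions that are not taken from the paper: the function names `posterior` and `screen_symptom`, the toy diseases and probabilities, the small floor `eps` for symptoms not listed under a disease, a naive-Bayes product for the joint term, normalization by the sum over diseases for $P(S^+, S^-)$ (equations 5 to 7 are not reproduced in this section), and a split of the ranked list into two disjoint halves as in the four-disease example.

```python
from math import ceil, prod


def posterior(prior, cond, pos, neg, eps=1e-6):
    """Return {disease: P(d | S+, S-)} under an assumed naive-Bayes factorization.

    prior: {disease: P(d = 1)}
    cond:  {disease: {symptom: P(s = 1 | d = 1)}}
    pos, neg: sets of confirmed / denied symptoms (S+ and S-).
    eps: assumed probability for a symptom not listed under a disease.
    """
    joint = {}
    for d, p_d in prior.items():
        p_pos = prod(cond[d].get(s, eps) for s in pos)
        p_neg = prod(1.0 - cond[d].get(s, eps) for s in neg)
        joint[d] = p_d * p_pos * p_neg
    evidence = sum(joint.values())  # assumed form of P(S+, S-)
    return {d: j / evidence for d, j in joint.items()}


def screen_symptom(ranked, symptom_sets, symptom_freq, asked=frozenset()):
    """Pick the next symptom to ask about, or None if no candidate exists.

    ranked: diseases sorted by descending posterior probability.
    symptom_sets: {disease: set of symptoms}.
    symptom_freq: {symptom: P(s = 1)}.
    asked: symptoms already in S+ or S-, excluded from the candidates.
    """
    half = ceil(len(ranked) / 2)
    high, low = ranked[:half], ranked[half:]
    if not low:
        return None
    s_union = set().union(*(symptom_sets[d] for d in high))           # Eq. (8)
    s_inter = set.intersection(*(set(symptom_sets[d]) for d in low))  # Eq. (9)
    target = s_inter - s_union - set(asked)                           # Eq. (10)
    if not target:
        return None
    return max(target, key=lambda s: symptom_freq.get(s, 0.0))        # Eq. (11)


# Hypothetical toy data, for illustration only.
cond = {
    "d1": {"cough": 0.8, "fever": 0.6},
    "d2": {"cough": 0.7, "sore throat": 0.5},
    "d3": {"diarrhea": 0.9, "vomiting": 0.5},
    "d4": {"diarrhea": 0.8, "bloating": 0.7, "vomiting": 0.4},
}
prior = {d: 0.25 for d in cond}
symptom_sets = {d: set(c) for d, c in cond.items()}
symptom_freq = {"cough": 0.4, "fever": 0.3, "sore throat": 0.2,
                "diarrhea": 0.3, "vomiting": 0.2, "bloating": 0.1}

post = posterior(prior, cond, pos={"cough"}, neg=set())
ranked = sorted(post, key=post.get, reverse=True)
print(screen_symptom(ranked, symptom_sets, symptom_freq, asked={"cough"}))
```

In a full dialogue loop, the answer to the returned symptom would be added to `pos` or `neg`, the eliminated half dropped from `ranked`, and both functions called again until a single disease is left.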
TABLE II
PERFORMANCE COMPARISONS ON MZ DATASET.

Method           Upper respiratory   Minor        Pediatric   Pediatric   Overall   Avg. rounds
                 tract infection     bronchitis   diarrhea    dyspepsia
SVM-ex           0.44                0.71         0.89        0.28        0.59      -
SVM-ex+im        0.52                0.93         0.91        0.34        0.71      -
Basic DQN        -                   -            -           -           0.65      -
KR-DS            -                   -            -           -           0.73      -
Bayesian-ex      0.5333              0.8235       0.77        0.3733      0.6338    1
Bayesian-ex+im   0.5667              0.8235       0.8444      0.3939      0.6761    3.26
ours             0.833               0.9412       0.9778      0.909       0.9225    2.89
TABLE III
PERFORMANCE COMPARISONS ON DX DATASET.

Method           Allergic   Upper respiratory   Pneumonia   Hand, foot and   Childhood   Overall   Avg. rounds
                 rhinitis   tract infections                mouth disease    diarrhea
Basic DQN        -          -                   -           -                -           0.731     3.92
KR-DS            -          -                   -           -                -           0.74      3.36
Bayesian-ex      0.6        0.625               0.55        0.9              1           0.7308    1
Bayesian-ex+im   0.95       0.3750              0.85        0.85             0.95        0.7788    2.67
ours             1          0.9167              0.85        1                1           0.9519    3.08