
NeuralSympCheck: A Symptom Checking and Disease Diagnostic Neural Model with Logic Regularization

Aleksandr Nesterov1[0000-0003-1126-8099], Bulat Ibragimov2[0000-0001-8540-0684],
Dmitriy Umerenkov1[0000-0003-0413-7170], Artem Shelmanov2,3[0000-0002-2151-6212],
Galina Zubkova1[0000-0001-9555-1689], and Vladimir Kokh1[0000-0002-9257-0259]

1 Sber AI Lab, 2 AIRI, 3 Skoltech
Moscow, Russia
[email protected]

arXiv:2206.00906v1 [cs.CL] 2 Jun 2022

1 Abstract
Symptom checking systems ask users about their symptoms and provide a rapid
and affordable medical assessment of their condition. Basic symptom checking
systems based on Bayesian methods, decision trees, or information gain methods
are easy to train and do not require significant computational resources.
However, their drawbacks are the low relevance of the proposed symptoms and
the insufficient quality of diagnostics. The best results on these tasks are
achieved by reinforcement learning models. Their weaknesses are the difficulty
of developing and training such systems and their limited applicability to
cases with large and sparse decision spaces. We propose a new approach based on
the supervised learning of neural models with logic regularization that
combines the advantages of the different methods. Our experiments on real and
synthetic data show that the proposed approach outperforms the best existing
methods in the accuracy of diagnosis when the number of diagnoses and symptoms
is large. The models and the code are freely available online1.
Keywords: Neural Networks, Symptom Checker, Diagnostic Model

2 Introduction
Health systems need to balance three critical qualities: accessibility, quality, and
cost. These three qualities unfortunately often compete over a limited pool of
resources, and improving one of these qualities leads to losses in others. This
is known as the “iron triangle” of healthcare. Mobile networks, big data, and
artificial intelligence are promising directions for improving quality and
accessibility while decreasing costs. In [15], the authors show that in 2012,
35% of adult US citizens used the internet for self-diagnosis at least once.
Self-diagnosis commonly starts with queries to search engines. While such
searches are highly accessible and free, the quality of the results may be
unsatisfactory, and the results may be irrelevant, inaccurate, or even harmful.

1 https://github.com/SympCheck/NeuralSymptomChecker

To increase the quality of self-diagnosis, several symptom-checker systems
have been proposed [15,16]. Such systems present users with several additional
questions about existing or potential symptoms and use this information to
suggest possible diagnoses and recommend visiting a specialist physician. The
disease diagnosing process can be modeled as a sequence of questions and an-
swers: a physician asks a patient questions about his/her symptoms and uses
the answers to identify the disease. While asking the questions, the physician
pursues two goals. Firstly, the answer to each question must be the most infor-
mative in the current context. Secondly, after a series of questions and answers,
a correct diagnosis should be identified.
This work presents a symptom checker based on a logic regularization framework
[1], which outperforms the state-of-the-art results achieved with methods
based on reinforcement learning (RL). Unlike the RL systems, the proposed
symptom checker is simple both in implementation and training. We split the
system into symptom recommendation and diagnosis prediction submodels and
implement a novel logic regularization framework that allows us to train the
submodels simultaneously with standard backpropagation and to treat symptom
suggestion as a multi-label classification task. The latter, in turn, allows
us to deal with the problem of a big and sparse symptom space by using the
Asymmetric loss [13]. In contrast to RL-based systems, the diagnoses predicted
with our system also do not depend on the order of the presented symptoms. The
contributions of the paper can be summarized as follows:
– We present a symptom checker that outperforms state-of-the-art systems
based on reinforcement learning or knowledge graphs in the task of diagnosis
prediction on both real-world and synthetic datasets.
– Instead of the RL framework, we apply logic regularization to train the
symptom checker, showing that simpler models can achieve state-of-the-art
results.
– Unlike the predictions of the RL-based diagnosis systems, the predictions of
our system do not depend on the order of the revealed symptoms.
– Our system is easier to implement and train and requires fewer computational
resources than state-of-the-art RL-based systems.
– Reframing the symptom recommendation problem as a multi-label
classification task allows dealing with the big and sparse symptom space using
the Asymmetric loss [13].

3 Related Work
Early works concerning automated symptom clarification and diagnostics were
based on the naive Bayes classifier, decision trees, and other information-gain
methods [8,9]. Due to simplicity and various drawbacks, such systems do not
achieve high diagnostics quality. There are also attempts to use rule-based expert
systems [3]. The performance of such systems depends on the quality of the rules
and medical knowledge bases. Therefore, scaling and modifying them is very
difficult.

[Figure 1: block diagram in two parts. "Symptom Suggestion and Inquiring": one-hot symptoms from a previous iteration, predicted symptoms, binarization, inquiring (from patient or ground truth). "Diagnostic and Uncertainty Estimation": predicted diagnoses, uncertainty estimation u(x), then either the next iteration or returning a predicted diagnosis.]
Fig. 1. The architecture of the symptom checker model

Several recent works [16,17,7,12,6] demonstrate the effectiveness of RL-based
methods for these tasks. In the RL framework, the symptom clarification and
diagnosis prediction tasks are framed as a Markov decision process [16,17]. This
leads to an unwanted property of RL-based systems: the final diagnosis depends
on the order in which the symptoms are revealed. Despite impressive results,
the RL-based approach is plagued with several difficulties. Firstly, the
possible symptoms and diagnoses number from hundreds to tens of thousands,
which leads to a huge decision space. Secondly, the number of symptoms present
in each case is tiny compared to the number of all possible symptoms, leading
to a sparse decision space. To overcome these difficulties, the authors of
[16,7] propose ensembling, which helps to reduce the decision space for each
individual ensemble component and to improve the quality of the results. Peng
et al. [12] address the sparsity of the decision space by proposing special
reward estimation and regularization techniques. To increase the diagnostic
performance, other models use context (age, sex, location) [7] or information
from knowledge graphs [20].
In [4,10], the authors note that the decision to stop the dialog made by an RL
agent can be sub-optimal because the agent is penalized for long conversations.
To solve the problem, they use uncertainty estimation [11] of the diagnosis as a
stopping criterion. The quality of diagnostics improves because the agent makes
more steps, and the diagnostics model receives more information.
Our logic regularisation framework is similar to RL-based methods as it both
models the dialog between the physician and the patient and achieves high-
quality results. At the same time, our system is easier to implement and train,
and its predictions are independent of the order of revealed symptoms.
4 Symptom Checker Model


We propose a symptom checker model, NeuralSympCheck, consisting of two
neural submodels: a network for suggesting symptoms that should be inquired
from a patient and a model for performing actual diagnostics. The architecture
of the model is presented in Figure 1.
The symptom checker works iteratively. On each iteration, it receives a set of
already known symptoms (as well as the information about the absence of some
symptoms) and tries to guess the most probable symptom of a patient with the
symptom suggestion submodel. The factual information about the presence of
the corresponding symptom is inquired from a patient. Then known symptoms
and the factual information about the presence/absence of the suggested symp-
tom are used by the diagnostics submodel to predict the disease. We quantify the
uncertainty of this prediction, and if it is intolerably high, we start a new iteration
of symptom clarification, in which the symptom suggestion submodel receives
extended information about symptoms. We note that despite splitting the whole
model into two submodels during the inference, they are trained jointly end-
to-end with a logic regularization mechanism: the diagnostics submodel learns
how to correctly predict diseases with limited information, while the symptom
suggestion submodel learns to suggest the most crucial evidence for diagnostics.
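
The iterative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the released implementation: the function name `run_symptom_checker`, the callables `suggest`, `diagnose`, `uncertainty`, and `ask_patient`, and the choice to skip symptoms whose status is already known are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

Probs = List[float]


def run_symptom_checker(
    present: Set[int],
    absent: Set[int],
    suggest: Callable[[Set[int], Set[int]], Probs],
    diagnose: Callable[[Set[int], Set[int]], Probs],
    uncertainty: Callable[[Probs], float],
    ask_patient: Callable[[int], bool],
    beta: float,
    max_questions: int,
) -> Tuple[int, int]:
    """Return (index of the predicted diagnosis, number of questions asked)."""
    present, absent = set(present), set(absent)  # work on copies
    asked = 0
    while True:
        # 1. Suggest the most probable symptom that is still unknown
        #    (masking known symptoms is an assumption of this sketch).
        scores = suggest(present, absent)
        known = present | absent
        candidates = [s for s in range(len(scores)) if s not in known]
        if candidates:
            symptom = max(candidates, key=lambda s: scores[s])
            # 2. Inquire the factual answer and record it.
            (present if ask_patient(symptom) else absent).add(symptom)
            asked += 1
        # 3. Diagnose with the extended information.
        probs = diagnose(present, absent)
        # 4. Stop when confident, out of budget, or out of symptoms.
        if not candidates or asked >= max_questions or uncertainty(probs) < beta:
            return max(range(len(probs)), key=lambda d: probs[d]), asked
```

In this sketch the two submodels are passed in as plain functions, so the loop itself stays independent of how they are implemented or trained.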

4.1 Symptom Suggestion and Diagnostics Submodels


The symptom suggestion submodel receives two vectors that encode information
known so far about the symptoms. The first vector accumulates the information
about present symptoms stored using one-hot encoding, while the second vector
stores one-hot encoded information about known absent symptoms.
The architecture of the submodel is a feed-forward neural network, in which
each linear layer is followed by batch normalization, a dropout layer, and a ReLU
activation. The model’s output is a probability distribution of possibly present
symptoms obtained via the softmax function.
The most probable symptom is queried from the patient (during training,
it is taken from the gold standard). Then, the actual information about the
presence or absence of the symptom is added to the corresponding vectors.
The diagnostics submodel takes the extended available information about
symptoms as input and predicts a patient’s diagnosis. The symptoms are encoded
in the same way as the input for the symptom suggestion submodel and the
architecture is also the same. The output of the diagnostics model is a probability
distribution of potential diseases obtained via the softmax function.
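
As a concrete illustration of the input encoding described above, the following sketch builds the two binary vectors for present and known-absent symptoms. The function name `encode_symptoms` and the decision to concatenate the two vectors into one network input are assumptions made for the example; the text only states that the submodels receive both vectors.

```python
from typing import Iterable, List


def encode_symptoms(
    present: Iterable[int], absent: Iterable[int], num_symptoms: int
) -> List[float]:
    """Encode known symptoms as two binary vectors, concatenated.

    Positions 0..num_symptoms-1 mark symptoms known to be present;
    positions num_symptoms..2*num_symptoms-1 mark symptoms known to be absent.
    Symptoms that have not been asked about stay 0 in both halves.
    """
    present_vec = [0.0] * num_symptoms
    absent_vec = [0.0] * num_symptoms
    for s in present:
        present_vec[s] = 1.0
    for s in absent:
        absent_vec[s] = 1.0
    return present_vec + absent_vec


# Four possible symptoms: 0 and 2 are present, 1 is absent, 3 is unknown.
vector = encode_symptoms(present=[0, 2], absent=[1], num_symptoms=4)
```

Keeping a separate "absent" vector lets the network distinguish a symptom that was denied from one that has simply not been asked about yet.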

4.2 Training with Logic Regularization


During training, we perform the same iterative process of symptom suggestion
and diagnosis prediction until the uncertainty of the latter is low enough.
The training of both submodels is performed end-to-end, so the gradient from
the diagnostics submodel is propagated into the symptom suggestion submodel.
Since the diagnostics submodel takes as input discrete data encoded in one-hot
vectors instead of differentiable softmax distributions, the straightforward
stacking of these two submodels requires non-differentiable operations. To
mitigate this problem and train the submodels with the standard backpropagation
algorithm, we use a simplified implementation of the Gumbel-softmax
approximation [5] without stochastic sampling.
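
The text does not give the details of this simplified variant. One plausible reading, offered here purely as an assumption, is that removing the Gumbel noise from the relaxation in [5] leaves a temperature-scaled softmax, which approaches a one-hot vector as the temperature goes to zero while remaining differentiable. The function name `tempered_softmax` and the temperature values below are illustrative only.

```python
import math
from typing import List


def tempered_softmax(logits: List[float], tau: float) -> List[float]:
    """Softmax of logits / tau; a small tau pushes the output towards one-hot."""
    scaled = [z / tau for z in logits]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 0.5]
soft = tempered_softmax(logits, tau=1.0)    # ordinary softmax
sharp = tempered_softmax(logits, tau=0.05)  # nearly one-hot on index 1
```

In an automatic differentiation framework, the nearly one-hot output can be fed to the next submodel so that gradients still flow back through the logits.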
The overall training loss $L$ combines two components, the symptom prediction
loss $L_s$ and the diagnosis prediction loss $L_d$: $L = \lambda L_s + L_d$,
where $\lambda > 0$ is a hyperparameter. $L_s$ is the Asymmetric loss [13],
designed for multi-label classification to mitigate the data skewness towards
particular classes:

\[
L_s = \begin{cases}
L_+ = (1 - p)^{\gamma_+} \log(p) \\
L_- = (p_m)^{\gamma_-} \log(1 - p_m),
\end{cases} \tag{1}
\]

where $p$ is a symptom prediction probability, $\gamma_+$ and $\gamma_-$ are
focusing hyperparameters, $p_m = \max(p - m, 0)$, and $m \ge 0$ is a margin
hyperparameter. $L_d$ is a simple cross-entropy loss commonly used for standard
multi-class classification.
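
To make the loss concrete, here is a minimal pure-Python sketch of Eq. (1) and of the combined objective. It assumes the sign convention of the Asymmetric loss paper [13], in which the per-label loss is $-y L_+ - (1 - y) L_-$ for a binary label $y$. The default focusing values follow the experimental setup reported later in the paper, while the margin `0.05`, the clamp `eps`, and the function names are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def asymmetric_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    gamma_pos: float = 1.0,
    gamma_neg: float = 4.0,
    margin: float = 0.05,  # hypothetical value; the text only requires m >= 0
    eps: float = 1e-8,     # clamp to avoid log(0)
) -> float:
    """Sum of -y * L_plus - (1 - y) * L_minus over all labels."""
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # L_plus = (1 - p)^gamma_pos * log(p)
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # L_minus = p_m^gamma_neg * log(1 - p_m), with p_m = max(p - m, 0)
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


def total_loss(symptom_loss: float, diagnosis_loss: float, lam: float) -> float:
    """Combined objective L = lambda * L_s + L_d."""
    return lam * symptom_loss + diagnosis_loss
```

Note how a negative label with a predicted probability below the margin contributes nothing at all, which is what lets the loss ignore the many easy negatives in a sparse symptom space.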
The suggested approach to training these two submodels lies in the paradigm
of analytic-synthetic logic regularization. In the analytic approach, a complex
model is trained as a sequence of small independent architectures, which
improves the interpretability of the solution. In contrast, the synthetic
approach trains a single model on the full target problem (end-to-end), which
increases the solution's flexibility. Taken alone, neither approach meets the
essential requirements for symptom suggestion. Firstly, symptom suggestion
cannot be viewed as a task independent of diagnosis prediction, since the goal
of this step is not to propose the most likely symptom but the symptom that
would potentially reduce the uncertainty of the second submodel the most.
Secondly, with standard end-to-end learning, the information necessary for
interpreting and identifying symptoms is lost.
The proposed architecture, in which the predictions of the first submodel
are fed to the input of the second submodel and the gradients from the second
submodel are propagated to the first submodel, is an attempt to encapsulate
both approaches within a single architecture and take advantage of the benefits
of each. Such an analytic-synthetic system benefits from two-way regularization:
using an explicit symptom prediction subproblem to solve the diagnosis detection
problem and using the end goal of the whole problem to regularize the symptom
prediction solutions. Because the proposed architecture, in a sense, imitates the
logic of decision-making by physicians in real life (additional cyclic tests until
the diagnosis is certain), the proposed framework can be considered as one of
the variants of logic regularization [14,21].

4.3 Uncertainty Estimation of the Diagnostics Submodel

Following Lin et al. [10], we quantify the uncertainty of the diagnostics submodel
and use it as a criterion for stopping “questioning” a patient about additional
symptoms. This resembles conducting diagnostics in real life, as a physician
collects more evidence only until they are sure enough about the diagnosis.
Furthermore, different diseases require a different amount of evidence to make
a reliable conclusion, as some diseases are more ambiguous than others.
Therefore, exhaustive questioning or asking about a fixed, considerable number
of symptoms is impractical, since we would like to make a reliable conclusion
as soon as possible, saving the time and effort of patients. Stopping early
also helps speed up training and prevents overfitting, which eventually leads
to better performance of the diagnostics submodel.
In this work, to quantify the uncertainty $u$ of a diagnosis $d$ for a case $x$,
we rely on the entropy of the diagnostics submodel output distribution $p(d|x)$
obtained with softmax: $u(x) = \mathbb{E}_{p(d|x)}[-\log p(d|x)]$.
We ask for more symptoms until the uncertainty of a disease prediction becomes
lower than a predefined threshold, $u(x) < \beta$, $\beta \in (0, 1)$, or we
exceed a predefined maximum number of attempts $Q$. The values $Q$ and $\beta$
are hyperparameters that are selected using a validation dataset.
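
The stopping rule can be sketched as follows. Dividing the entropy by the logarithm of the number of diagnoses, so that it lies in [0, 1], is an assumption of this example; it is suggested by the threshold range $\beta \in (0, 1)$ and by the "Entropy (normalized)" axis of Fig. 2, but the exact normalization is not stated. The function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy u(x) = E[-log p(d|x)] of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy divided by its maximum, log(K), for K possible diagnoses."""
    k = len(probs)
    return entropy(probs) / math.log(k) if k > 1 else 0.0


def should_stop(
    probs: Sequence[float], beta: float, asked: int, max_questions: int
) -> bool:
    """Stop when the prediction is certain enough or the budget Q is spent."""
    return normalized_entropy(probs) < beta or asked >= max_questions


confident = [0.97, 0.01, 0.01, 0.01]
undecided = [0.5, 0.5]
```

With a threshold of 0.3, for example, the `confident` distribution would end the dialogue, while the `undecided` one would trigger another question unless the question budget is already used up.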

5 Experiments

5.1 Data

Real-world data. The MuZhi dataset was created by Wei et al. [17]
from real dialogues on a Chinese healthcare internet portal2. This dataset
encompasses 66 symptoms and four diseases. The dataset consists of 710 records
containing the raw dialogue and the normalized symptoms checked during the
dialogue, whether found or not. The symptoms from each record are tagged either
as explicit or implicit. The explicit symptoms are the symptoms initially
presented by the patient before the beginning of the dialogue. The presence or
absence of implicit symptoms is discovered during the recorded dialogue.
The Dxy Dialogue Medical [19] dataset is based on dialogues from a popular
Chinese medical forum3. It consists of 527 unique dialogues, five diseases, and
41 symptoms. The symptoms from each record are tagged either as explicit or
implicit, as in the MuZhi dataset.

Synthetic data. The MuZhi and Dxy datasets are limited in the number of
symptoms and diseases. To check the performance of our model in the case of
large symptom and disease spaces, we used the synthetic dataset SymCat
presented in [7] with modifications from [12]. This dataset is created from the
similarly named symptom and disease database SymCat4. It contains information
about 474 symptoms and 801 related diseases.
2 https://muzhi.baidu.com/
3 https://dxy.com/
4 http://www.symcat.com/

The dataset is built with the following procedure: select a disease from the
list; select the symptoms from the a posteriori distribution using a Bernoulli
experiment for each symptom; split the symptoms into implicit and explicit
groups as in the MuZhi and Dxy datasets. As in previous works [12,4], to
evaluate the system performance on different scales, we use three versions of
the dataset with a varying number of diseases: 200, 300, and 400. We note that
we did not find the source code of the generating procedure used in [12,4].
Therefore, although we reproduced the generation process according to their
description, there might be minor deviations. Dataset statistics are presented
in Table 3 in Appendix A.
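
The three generation steps can be illustrated with a short sketch. Everything specific here is an assumption: the uniform choice of the disease, resampling when no symptom is drawn, the random split point between explicit and implicit symptoms, and the toy probability table are illustrative only and are not taken from SymCat or from the procedure in [12,4].

```python
import random
from typing import Dict, List, Tuple


def generate_case(
    disease_symptom_probs: Dict[str, Dict[str, float]], rng: random.Random
) -> Tuple[str, List[str], List[str]]:
    """Return (disease, explicit symptoms, implicit symptoms) for one record.

    Assumes every disease has at least one symptom with positive probability.
    """
    # Step 1: select a disease (uniformly here, which is an assumption).
    disease = rng.choice(sorted(disease_symptom_probs))
    # Step 2: one Bernoulli experiment per symptom; resample if none is drawn.
    symptoms: List[str] = []
    while not symptoms:
        symptoms = [
            s
            for s, p in sorted(disease_symptom_probs[disease].items())
            if rng.random() < p
        ]
    # Step 3: split into explicit (reported up front) and implicit symptoms.
    rng.shuffle(symptoms)
    n_explicit = rng.randint(1, len(symptoms))
    return disease, symptoms[:n_explicit], symptoms[n_explicit:]


# Hypothetical toy table of P(symptom | disease).
toy_table = {
    "flu": {"fever": 0.9, "cough": 0.8},
    "migraine": {"headache": 1.0, "nausea": 0.5},
}
case = generate_case(toy_table, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the generation reproducible, which matters when several versions of the dataset have to be compared.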

5.2 Experimental Setup


Hyperparameters and Training Details. Model and training hyperparameters,
including the number of hidden layers, dropout ratio, layer size, learning
rate, number of epochs, scaling coefficient of the multi-label loss, and the
uncertainty threshold, are selected on the validation datasets using the Optuna
package5. To reduce the optimization search space, we use the same number of
layers with the same size in both submodels. The selected values are presented
in Table 4 in Appendix A. Training is performed using the corrected version of
Adam with linear decay of the learning rate and warm-up. The focusing
hyperparameters are fixed: $\gamma_+ = 1$, $\gamma_- = 4$. The maximum number
of attempts is $Q = 50$.
Evaluation Metrics. The quality of disease prediction is evaluated using the
top-k accuracy metric Acc@k (k ∈ {1, 2, 3}). For each example, if the true
disease is present among the top k predictions in the output probability
distribution of the model, it is considered a correct answer of the model. In
ablation studies, we also use weighted macro F1 to evaluate symptom prediction
quality.
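
The Acc@k metric described above can be computed as in the following sketch; the function name `top_k_accuracy` and the toy predictions are hypothetical.

```python
from typing import List, Sequence


def top_k_accuracy(
    prob_rows: Sequence[Sequence[float]], true_labels: Sequence[int], k: int
) -> float:
    """Fraction of examples whose true disease is among the k most probable."""
    hits = 0
    for probs, label in zip(prob_rows, true_labels):
        # Indices of the diseases sorted by decreasing predicted probability.
        ranked: List[int] = sorted(
            range(len(probs)), key=lambda d: probs[d], reverse=True
        )
        hits += label in ranked[:k]
    return hits / len(true_labels)


predictions = [
    [0.6, 0.3, 0.1],  # true disease 0: correct at k = 1
    [0.2, 0.5, 0.3],  # true disease 2: correct only from k = 2
    [0.1, 0.2, 0.7],  # true disease 1: correct only from k = 2
]
labels = [0, 2, 1]
acc_at_1 = top_k_accuracy(predictions, labels, k=1)
acc_at_2 = top_k_accuracy(predictions, labels, k=2)
```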
Baselines. We compare the proposed NeuralSympCheck model with several
models from previous work [12,19,18,20,4,2] and with two simple baselines
based on a feedforward neural network. These baselines have the same
architecture as the submodels: several fully-connected layers with batch
normalization, ReLU activation, and dropout regularization. The first baseline
performs multi-label disease classification using only the starting explicit
symptoms (baseline ex). The second baseline uses both explicit and implicit
symptoms (baseline ex&im), which is an unrealistic and very strong assumption.

5.3 Results and Discussion


Table 1 presents the main experimental results on the small datasets MuZhi and
Dxy. On the MuZhi dataset, our NeuralSympCheck model achieves a new state of
the art, outperforming all the baselines and the models from previous work.
On the Dxy dataset, our model outperforms the first baseline and most of the
systems from previous work, only falling behind the recently proposed RL-based
systems presented in [19,4,2]. We attribute this to the fact that Dxy is
smaller, contains fewer symptoms, and has a smaller average number of implicit
symptoms that can be clarified for the final diagnosis.
As we can see from Table 2, on the more extensive synthetic datasets based
on SymCat, NeuralSympCheck also achieves state-of-the-art results,
outperforming all previous models and the first baseline. The best results are
achieved for each number of possible diagnoses. Our model does not reach the
performance of the second, unrealistic baseline trained on both explicit and
implicit symptoms. This may happen because NeuralSympCheck passes the
uncertainty threshold early and stops clarifying additional symptoms. We note
that our solution outperforms models from previous work by a significant
margin. We attribute this improvement to using a conceptually novel model
architecture that is better adapted to the big and sparse symptom space.

5 https://optuna.org

Table 1. Diagnostics Acc@1 (%) on the MuZhi and Dxy datasets

Model             MuZhi  Dxy
Baseline ex       61.3   66.4
Baseline ex&im    65.8   77.3
Peng et al. [12]  71.8   75.7
Xu et al. [19]    73.0   74.0
Xia et al. [18]   73.0   76.9
Zhao et al. [20]  69.7   74.0
He et al. [4]     72.6   81.1
Guan et al. [2]   65.5   80.8
Our best results  74.5   75.7

Table 2. Results (%) on the test part of the SymCat datasets

                  200 diseases         300 diseases         400 diseases
Model             Acc@1  Acc@3  Acc@5  Acc@1  Acc@3  Acc@5  Acc@1  Acc@3  Acc@5
Baseline ex       46.7   70.6   81.1   41.1   63.0   73.4   36.3   57.0   67.8
Baseline ex&im    82.6   96.7   99.1   78.4   94.0   98.0   74.4   92.0   97.0
Peng et al. [12]  54.8   73.6   79.5   47.5   65.1   71.8   43.8   60.8   68.9
He et al. [4]     55.6   80.7   89.3   48.2   73.8   84.2   44.6   69.2   79.5
Our best results  63.2   89.3   96.7   54.8   81.2   91.1   49.8   76.6   87.9

5.4 Ablation Studies

The goal of the first ablation study is to evaluate the effect of the symptom
prediction loss (Table 5 in Appendix B). We train our model using only the
diagnosis prediction loss with a fixed number of clarification iterations.
This helps to improve Acc@1 of the diagnostics model. However, it results in
a substantial reduction of the symptom suggestion model's F1 compared to
training with both losses. We conclude that training only with the diagnosis
classification loss lets the symptom suggestion submodel over-adjust its
predictions toward the best coherence with the predicted diagnosis.
In the second ablation study, we evaluate the effect of uncertainty estimation.
Table 5 in Appendix B shows that models using uncertainty estimation achieve
the best results in terms of the Acc@1 metric for diagnosis prediction. However,
the F1 metric for symptom prediction is significantly lower. These results can
be explained by the fact that models using uncertainty estimation conduct fewer
symptom clarification iterations, only as many as are necessary for the model
to become confident in the diagnosis.
We also test the hypothesis that additional symptoms help to reduce the
uncertainty of the diagnosis submodel predictions. Figure 2 in Appendix B shows
that, indeed, regardless of the dataset, the more iterations of symptom refinement
are performed, the less uncertain the diagnostics submodel predictions are.

6 Conclusion

We presented a novel model for symptom and diagnosis prediction based on
supervised learning. It outperforms recently proposed RL-based counterparts
and mitigates some of their limitations, such as the complexity of learning, the
fundamental flaws of the Markov process, and the complexity of applying RL-
based methods in practice. By leveraging asymmetric loss, we overcome the
problem of large and sparse symptoms space. We propose an approach that
allows training the symptom suggestion and the diagnosis prediction models in
an end-to-end fashion with standard backpropagation. We are the first to use
logic regularization for the considered task, which effectively helps to predict
relevant symptoms for diagnostics. Finally, uncertainty estimation of diagnosis
prediction is used as a stopping criterion for asking about new symptoms. Our
NeuralSympCheck model achieves the new state of the art on datasets with large
symptom and diagnosis spaces.
We want to emphasize the practical significance of this work because the pre-
sented model is relatively easy to implement, stable in training, and not demand-
ing on computational resources. This makes it possible to apply the proposed
model in real-world medical systems, which is our future work direction.
Acknowledgements

We are grateful to anonymous reviewers for their valuable feedback. The work
was supported by the RSF grant 20-71-10135.

References
1. Asai, A., Hajishirzi, H.: Logic-guided data augmentation and regularization
for consistent question answering. In: Proceedings of the 58th Annual Meet-
ing of the Association for Computational Linguistics. pp. 5642–5650 (2020).
https://doi.org/10.18653/v1/2020.acl-main.499
2. Guan, H., Baral, C.: A Bayesian approach for medical inquiry and disease inference
in automated differential diagnosis. arXiv preprint arXiv:2110.08393 (2021)
3. Hayashi, Y.: A neural expert system with automated extraction of fuzzy if-then
rules and its application to medical diagnosis. In: Advances in neural information
processing systems. pp. 578–584 (1991)
4. He, W., Mao, X., Ma, C., Hernández-Lobato, J.M., Chen, T.: BSODA: A bipar-
tite scalable framework for online disease diagnosis. In: Proceedings of ACM Web
Conference (WWW-2022) (2022)
5. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-softmax.
In: Proceedings of ICLR (2017)
6. Janisch, J., Pevný, T., Lisý, V.: Classification with costly features as a
sequential decision-making problem. Machine Learning 109(8), 1587–1615 (2020).
https://doi.org/10.1007/s10994-020-05874-8
7. Kao, H.C., Tang, K.F., Chang, E.: Context-aware symptom checking for disease
diagnosis using hierarchical reinforcement learning. In: Proceedings of the AAAI
Conference on Artificial Intelligence. vol. 32 (2018)
8. Kohavi, R., et al.: Scaling up the accuracy of naive-bayes classifiers: A decision-tree
hybrid. In: Proceedings of KDD. vol. 96, pp. 202–207 (1996)
9. Kononenko, I.: Machine learning for medical diagnosis: history, state of the
art and perspective. Artificial Intelligence in medicine 23(1), 89–109 (2001).
https://doi.org/10.1016/S0933-3657(01)00077-X
10. Lin, J., Chen, Z., Liang, X., Wang, K., Lin, L.: Towards causality-aware infer-
ring: A sequential discriminative approach for medical diagnosis. arXiv preprint
arXiv:2003.06534v4 (2022)
11. McAllister, R., Kahn, G., Clune, J., Levine, S.: Robustness to out-of-distribution
inputs via task-aware generative uncertainty. In: 2019 International Confer-
ence on Robotics and Automation (ICRA). pp. 2083–2089. IEEE (2019).
https://doi.org/10.1109/ICRA.2019.8793552
12. Peng, Y.S., Tang, K.F., Lin, H.T., Chang, E.: Refuel: Exploring sparse features in
deep reinforcement learning for fast disease diagnosis. Advances in neural informa-
tion processing systems 31, 7322–7331 (2018)
13. Ridnik, T., Ben-Baruch, E., Zamir, N., Noy, A., Friedman, I., Protter, M., Zelnik-
Manor, L.: Asymmetric loss for multi-label classification. In: Proceedings of the
IEEE/CVF International Conference on Computer Vision. pp. 82–91 (2021).
https://doi.org/10.1109/ICCV48922.2021.00015
14. Riegel, R., Gray, A., Luus, F., Khan, N., Makondo, N., Akhalwaya, I.Y., Qian,
H., Fagin, R., Barahona, F., Sharma, U., et al.: Logical neural networks. arXiv
preprint arXiv:2006.13155 (2020)
15. Semigran, H.L., Linder, J.A., Gidengil, C., Mehrotra, A.: Evaluation of symp-
tom checkers for self diagnosis and triage: audit study. BMJ 351 (2015).
https://doi.org/10.1136/bmj.h3480
16. Tang, K.F., Kao, H.C., Chou, C.N., Chang, E.Y.: Inquire and diagnose: Neural
symptom checking ensemble using deep reinforcement learning. In: NIPS Workshop
on Deep Reinforcement Learning (2016)
17. Wei, Z., Liu, Q., Peng, B., Tou, H., Chen, T., Huang, X.J., Wong, K.F., Dai, X.:
Task-oriented dialogue system for automatic diagnosis. In: Proceedings of the 56th
Annual Meeting of the Association for Computational Linguistics (Volume 2: Short
Papers). pp. 201–207 (2018). https://doi.org/10.18653/v1/P18-2033
18. Xia, Y., Zhou, J., Shi, Z., Lu, C., Huang, H.: Generative adversarial regularized
mutual information policy gradient framework for automatic diagnosis. In: Pro-
ceedings of the AAAI Conference on Artificial Intelligence. vol. 34, pp. 1062–1069
(2020). https://doi.org/10.1609/aaai.v34i01.5456
19. Xu, L., Zhou, Q., Gong, K., Liang, X., Tang, J., Lin, L.: End-to-end knowledge-
routed relational dialogue system for automatic diagnosis. In: Proceedings of
the AAAI Conference on Artificial Intelligence. vol. 33, pp. 7346–7353 (2019).
https://doi.org/10.1609/aaai.v33i01.33017346
20. Zhao, X., Chen, L., Chen, H.: A weighted heterogeneous graph-based dialog system.
IEEE Transactions on Neural Networks and Learning Systems pp. 1–6 (2021).
https://doi.org/10.1109/TNNLS.2021.3124640
21. Zhou, Y., Yan, Y., Han, R., Caufield, J.H., Chang, K.W., Sun, Y.,
Ping, P., Wang, W.: Clinical temporal relation extraction with probabilis-
tic soft logic regularization and global inference. In: Proceedings of the
AAAI Conference on Artificial Intelligence. vol. 35, pp. 14647–14655 (2021).
https://doi.org/10.1109/TNNLS.2021.3124640

A Dataset Statistics and Hyperparameters


Table 3. Dataset statistics

                                     MuZhi  Dxy  SymCat 200  SymCat 300  SymCat 400
Total dialogues                      710    527  1,110,000   1,110,000   1,110,000
Training dialogues                   568    423  1,000,000   1,000,000   1,000,000
Validation dialogues                 -      -    100,000     100,000     100,000
Testing dialogues                    142    123  10,000      10,000      10,000
Unique diagnoses                     4      5    200         300         400
Unique symptoms                      66     41   326         350         367
Average number of explicit symptoms  2.4    3.1  1.9         2.0         2.0
Average number of implicit symptoms  2.4    1.2  1.9         2.0         2.0

Table 4. Hyperparameters of the models that showed the best results on validation
datasets

Hyperparameter                MuZhi  Dxy     SymCat 200  SymCat 300  SymCat 400
Size of the first layer       6,000  10,000  8,000       8,000       8,000
Size of the second layer      3,000  -       -           -           -
Dropout probability           0.4    0.5     0.5         0.5         0.5
Multi-label loss coefficient  1.6    0.6     1           1           1
Minimum uncertainty value, β  0.5    0.3     0.3         0.4         0.4
Number of epochs              35     5       5           10          10
Learning rate                 5e-5   1e-3    1e-3        1e-2        1e-2

B Additional Experimental Results

[Figure 2: line plot of normalized entropy (y-axis, roughly 0.15 to 0.40) against the iteration number (x-axis, 0 to 20) for the MuZhi (MZ), Dxy, SymCat 200, SymCat 300, and SymCat 400 datasets.]
Fig. 2. Change of entropy value depending on iteration of symptom inquiring

Table 5. Ablation studies results (% Acc@1 by diagnosis / F1 weighted by symptoms)

                             MuZhi        Dxy          SymCat 200   SymCat 300   SymCat 400
Only diagnosis loss          67.2 / 32.7  71.9 / 24.0  70.7 / 23.7  64.1 / 24.5  57.6 / 24.4
Two losses, without entropy  70.3 / 10.2  69.0 / 32.8  59.6 / 35.0  53.8 / 33.2  47.7 / 32.3
With entropy                 68.3 / 28.4  69.1 / 19.2  63.2 / 20.7  54.8 / 15.8  49.8 / 14.5
