Explainable Lung Disease Classification from Chest
X-Ray Images Utilizing Deep Learning and XAI
Tanzina Taher Ifty*, Saleh Ahmed Shafin*, Shoeb Mohammad Shahriar, Tashfia Towhid
Department of Computer Science and Engineering
Ahsanullah University Of Science and Technology (AUST), Dhaka, Bangladesh
Email- {tanzina19taher,salehahmedshafin7,tashfia6288}@gmail.com,
[email protected]

*First two authors contributed equally to this research.

arXiv:2404.11428v1 [eess.IV] 17 Apr 2024

Abstract: Lung diseases remain a critical global health concern, and accurate, rapid diagnostic methods are essential. This work classifies lung conditions into five groups: viral pneumonia, bacterial pneumonia, COVID-19, tuberculosis, and normal lungs. Using advanced deep learning techniques, we explore a diverse range of models, including CNNs, hybrid models, ensembles, transformers, and Big Transfer. The methodology covers hyperparameter tuning, stratified k-fold cross-validation, and transfer learning with fine-tuning. Our findings show that the Xception model, fine-tuned with 5-fold cross-validation, achieves the highest accuracy of 96.21%. This result indicates that our methods identify the different lung diseases accurately. Explainable artificial intelligence (XAI) methods further clarify how these models reach their decisions, which can increase trust in their clinical applications.

Index Terms: Lung Disease Detection, Digital X-ray Images, Image Classification, Deep Learning (DL), Transformer, Explainable Artificial Intelligence (XAI).

I. INTRODUCTION

Lung diseases affect a notable portion of the world's population and are a serious health problem. Early detection and precise diagnosis of lung disorders are crucial for better patient outcomes and successful therapy. Chest X-ray (CXR) imaging is a less expensive alternative to techniques such as Polymerase Chain Reaction (PCR) testing and Computed Tomography (CT) scans for the early diagnosis of lung diseases [1]. Recent advances in artificial intelligence (AI) and deep learning have made it possible to detect lung diseases automatically from chest X-ray images, with encouraging results. The speed and efficacy of these AI algorithms are especially valuable in emergency scenarios, where they improve resource management and accelerate decision-making.

The World Health Organization (WHO) declared COVID-19 a pandemic, and it has had a devastating impact on people around the globe [2]. Pneumonia was the leading cause of death for children under five in 2019; it caused 2.5 million deaths in total, nearly a third of them in this age group [3]. Tuberculosis is the second leading cause of death from a single infectious agent worldwide, after COVID-19 and ahead of HIV/AIDS, and ranks 13th among all causes of death [4]. Against this backdrop, early detection of lung disease is crucial. Deep learning models are proficient at picking out complex patterns in X-ray images that humans might miss, thereby helping in the early diagnosis of diseases. Because traditional diagnostic practice struggles to identify subtle patterns in X-ray images, this research focuses on improving lung disease detection using advanced deep learning techniques.

Deep learning has been used extensively in recent years for the automated classification of medical images. In this paper, we develop deep learning (DL) and transformer-based classification models for the diagnosis of four types of lung disorders (bacterial pneumonia, viral pneumonia, tuberculosis, and COVID-19) as well as normal lungs. We use pre-trained CNN models, hybrid models, ensemble models, transformer models, the k-fold technique, and hyperparameter tuning to train and test our models. We use the Lungs Disease Dataset [5] obtained from the Kaggle website. We also use Explainable Artificial Intelligence (XAI) [6] techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) [7] and Local Interpretable Model-Agnostic Explanations (LIME) [8] to provide visual explanations for the models' predictions. These techniques identify the significant features and regions of the chest X-ray images that contribute to the classification decisions, which can be useful for clinicians in the diagnosis and treatment of lung diseases.

Our research makes two contributions: (1) it outperforms the baseline results in terms of multi-class classification scores, and (2) it covers a comprehensive set of popular approaches, leveraging XAI.

II. RELATED WORKS

There is a large body of work on detecting disease from chest X-ray images [9]. Of the many approaches proposed for identifying these diseases, we focus on those closely related to our task that apply different deep learning methodologies.

Mohammad Rahimzadeh et al. trained deep convolutional networks to classify CXR images into three groups: COVID-19, respiratory infections, and normal. Using two open-source datasets containing 180 COVID-19, 6054 respiratory-infection, and 8851 normal images, they reported an average accuracy of 99.50% for detecting COVID-19, a COVID-19 sensitivity of 80.53%, and an overall accuracy of 91.4% across all three classes. They trained a concatenated neural network with the categorical cross-entropy loss and the Adam optimizer. To
improve training effectiveness, they also applied data augmentation [10].

To overcome the limitations of conventional RT-PCR testing, another study proposes a deep learning-based approach for COVID-19 detection using chest X-ray images. Its CNN architecture attains testing accuracies of 99.7%, 95.02%, and 94.53% for 2-class, 3-class, and 4-class classification, respectively. However, the paper needs more detail on the dataset used, such as its size, composition, and potential biases or limitations, and it claims superiority over related works without in-depth comparisons and discussion. The high testing accuracy demonstrates the model's potential usefulness, but further analysis is needed to fully evaluate its performance against other existing approaches [11].

The research article "CoroNet: A Deep Neural Network for Detection and Diagnosis of COVID-19 from Chest X-ray Images" aims to provide a thorough and conclusive CXR-based diagnostic technique for detecting COVID-19. The paper presents CoroNet, a Convolutional Neural Network (CNN) model trained on COVID-19 and pneumonia chest X-ray images collected from several public sources. The model achieved an overall accuracy of 89.6% in the 4-class classification, with a precision of 93.5% and a recall of 98.2% for COVID-19 cases, and an accuracy of 95.5% in the 3-class classification. The study highlights how CoroNet could assist medical professionals and radiologists during the pandemic. However, it acknowledges the need for more training data and advocates further validation and assessment on external datasets to verify CoroNet's reliability and applicability [12].

Another work offers a practical method for detecting COVID-19 from chest X-ray images in resource-limited areas and discusses the challenge of identifying COVID-19 in regions without access to biotechnology-based testing. The proposed approach acquires chest X-ray images, extracts deep features (DF) from them, and then uses machine learning classifiers to assign the images to classes. The best configuration combines ResNet-50 for deep feature computation with an ensemble of subspace discriminant classifiers. For the five-class classification task, this pipeline achieves a detection accuracy of 91.6% while remaining computationally efficient. The results show that the proposed pipeline can accurately categorize the COVID-19 classes, and the dataset is available for further research and comparison [13].

Ovi Sarkar et al. developed the Multi-scale Convolutional Neural Network (MS-CNN), a deep learning framework that accurately identifies six lung-related disorders: lung opacity, COVID-19, fibrosis, tuberculosis, viral pneumonia, and bacterial pneumonia. They integrated explainable AI (XAI) techniques such as Grad-CAM and SHAP, which improved both the accuracy and the interpretability of the model. The MS-CNN model outperformed other models and was highly effective at detecting COVID-19, reaching an accuracy of 96.05% [14].

III. DATASET ANALYSIS

We used a public dataset of chest X-ray images named the Lungs Disease Dataset (4 types) [5]. The dataset comprises 10095 images divided into three directories (test, train, and validation): the Test folder has 2025 images, the Train folder has 6054 images, and the Val folder has 2016 images. Table I shows the dataset's class distribution.

The dataset is compiled from several sources: the COVID-19 Detection X-Ray Dataset [15], Lungs Dataset [16], Chest X-Ray Images (Pneumonia) [17], Chest X-Ray (Pneumonia, Covid-19, Tuberculosis) [18], Chest X-Ray 14 Dataset with Lungs Cropped [19], and the Tuberculosis (TB) Chest X-ray Database [20]. When these sources were combined, duplicate images were removed with VisiPics [21], which flags two images as duplicates even when they are stored in different formats or resolutions or differ only in minor visual details.

TABLE I
DATA DISTRIBUTION
Set         Bacterial Pneumonia  Corona Virus  Normal  Tuberculosis  Viral Pneumonia
Test        403                  407           404     408           403
Train       1205                 1218          1207    1220          1204
Validation  401                  406           402     406           401

In summary, the dataset is split into 20.06% test data, 59.97% training data, and 19.97% validation data (Table II). Our deep learning models for identifying and categorizing lung illnesses are developed and evaluated on this dataset.

TABLE II
DATASET SPLIT
Set         Test Data  Train Data  Validation Data
Percentage  20.06%     59.97%      19.97%

IV. METHODOLOGY

This section presents the architecture for identifying lung illness from chest X-ray images using explainable artificial intelligence and deep learning. Our model has two parts: the first captures visual attributes, and the second classifies the images into five classes. We now describe each step of this architectural framework.

Step-1,5) Input Image: First, we partitioned the dataset, allocating 59.97% for training, 20.06% for testing, and 19.97% for validation. In this stage, we feed the proposed model batches of lung images from the training set, with a batch size of 32.
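The split shares in Step-1,5 and Table II follow directly from the per-class counts in Table I, and the same counts fix how many batches of 32 make up one epoch. The plain-Python sketch below is illustrative only, not the authors' code: `TABLE_I`, `split_percentages`, and `steps_per_epoch` are hypothetical names, and it assumes that one iteration means one batch update with the last partial batch kept. Under that assumption, 31 epochs come to about 5890 iterations, close to the 6000 training iterations reported in the methodology.

```python
import math

# Per-class image counts copied from Table I, in the order
# Bacterial Pneumonia, Corona Virus, Normal, Tuberculosis, Viral Pneumonia.
# TABLE_I and the helpers below are hypothetical names used for illustration.
TABLE_I = {
    "Test": [403, 407, 404, 408, 403],
    "Train": [1205, 1218, 1207, 1220, 1204],
    "Validation": [401, 406, 402, 406, 401],
}


def split_percentages(table):
    """Return the total image count and each split's share of it in percent (2 decimals)."""
    split_totals = {name: sum(counts) for name, counts in table.items()}
    grand_total = sum(split_totals.values())
    shares = {name: round(100 * n / grand_total, 2) for name, n in split_totals.items()}
    return grand_total, shares


def steps_per_epoch(num_images, batch_size=32):
    """Batches needed to see every image once; assumes the last partial batch is kept."""
    return math.ceil(num_images / batch_size)


total, shares = split_percentages(TABLE_I)
train_images = sum(TABLE_I["Train"])
steps = steps_per_epoch(train_images)  # batch size 32, as in Step-1,5
epochs = 31  # epoch count reported for training

print(total, shares)  # 10095 {'Test': 20.06, 'Train': 59.97, 'Validation': 19.97}
print(train_images, steps, steps * epochs)  # 6054 190 5890
```

The recomputed shares agree with Table II.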
Fig. 1. Methodology for lung disease identification. The visual feature extractor module starts from the blocks on the left side.

a) Image Augmentation: First, to ensure that all images have the same size and to enable more efficient processing, we resize each image to 299×299×3. Normalization then scales the image pixel values to the range [0, 1], which helps stabilize the training process by keeping all inputs on a common scale. We also apply several augmentation techniques:

a) Rescaling: The parameter rescale=1./255 normalizes image values to the range [0, 1], a standard preprocessing step in image classification. This ensures uniform pixel value ranges, making the data more suitable for neural networks.

b) Shifting: Setting the width shift range and height shift range parameters to 0.2 shifts images randomly, both horizontally and vertically. This increases the diversity of the dataset and helps make the model robust to small changes in position.

c) Zooming: The zoom_range=0.2 parameter randomly zooms images by up to 20%, improving the model's ability to handle objects of different sizes as well as its accuracy and generalization on new images.

d) Flipping: The setting horizontal_flip=True randomly mirrors images from left to right, exposing the model to flipped versions of the images so that it learns to detect features regardless of their left-right position.

e) Rotation: The parameter rotation_range=10 randomly rotates images by up to 10 degrees. This increases the diversity of orientations in the dataset, improving the model's ability to adapt and perform accurately on new images.

f) Shearing: The setting shear_range=0.2 applies small random shear transformations that distort the shape of the images. This strengthens the model's robustness to changes in shape and improves its flexibility with new images.

Step-2,6) Deep Learning Methodology: Deep learning methodology is a systematic process for applying deep learning algorithms to complex problems [22]. Its stages are problem definition, data collection and preprocessing, model construction, training, evaluation, hyperparameter tuning, and model deployment. Deep neural networks extract significant information and learn complex patterns from large datasets. Through continuous testing, development, and refinement, these models learn to make predictions on previously unseen data.

a) Visual Feature Extractor: We used seven well-known models to extract visual features from the dataset: Xception, Inception-V3, VGG19, EfficientNetB7, DenseNet201, DenseNet121, and ResNet50 [23]. The training settings comprised 6000 iterations, a batch size of 32, and a learning rate of 0.0001. Training ran for a total of 31 epochs, where an epoch is the number of iterations needed to cover the whole training set once. The Adam optimizer, which works well for multi-class classification, and the categorical cross-entropy loss function were used for optimization and loss computation. The networks' parameters were learned with this training setup and then evaluated on the test data.

a) Hyper-parameter Tuning: Tuning a model's hyperparameters is essential to improving its efficacy, and we used grid search to find the best combination of settings. The selected configuration uses the Adam optimizer with a learning rate of 1e-4, a dropout rate of 0.6 to avoid overfitting, a batch size of 32 for effective learning, and 31 epochs to balance learning against over-reliance on the training data.

b) Transfer learning and Fine Tune: Transfer learning uses knowledge from a model trained on one task to adapt it to a related task. In neural networks, this frequently means taking a model that has already been trained on a generic task and applying it to a particular, related task. We fine-tune the last 50 layers of a pre-trained DenseNet121 model for our task, which allows adaptation without retraining the entire model. This is done by making these layers trainable during training so that they can adapt to the specific details of the new data.

c) Hybrid Model: We combined InceptionV3, VGG19, and Xception in hybrid models for the multi-class classification task, achieving up to 89% accuracy. Carefully chosen hyperparameters and data augmentation enriched the training process. The results highlight the impact of batch size on training dynamics and offer insights for further optimization of the model architecture.

d) Ensemble Learning: Ensemble models combine multiple individual models to improve predictions. Common strategies include averaging predictions, which assumes that every model contributes equally, and majority voting, which selects the most frequent prediction. This approach explores diverse model architectures for
enhanced image classification performance.

d) Stratified K-fold cross-validation: Stratified k-fold cross-validation is tailored to imbalanced datasets and improves the evaluation of classification models. It keeps the class proportions balanced in each fold, preventing bias toward the majority class. The dataset is split into k folds (usually k=5 or k=10), and the model is trained k times, each time using a different fold as the test set. The average performance across folds estimates the model's generalization error. In our experiments, we use two values, K=3 and K=5, with a random state of 42 for dataset folding.

e) Vision Transformer and Big Transfer: We compared ViT and BiT on the image classification task [22], training both with identical hyperparameters for the same duration (79 epochs). ViT outperformed BiT, which suggests an advantage of its architecture and training approach. Because both models shared the same standardized hyperparameters, the comparison isolates the effect of architectural differences on performance.

Step-3,7) Classification and Explanation with XAI: We compile a list of models to run for our visual tasks. We then train all of these models with the same hyperparameter configuration and save their training history, which stores the values recorded at each epoch. In step 4, the models' predictions for the five respiratory disease classes (Bacterial Pneumonia, Corona Virus Disease, Normal, Tuberculosis, and Viral Pneumonia) are visualized in detail using explainability techniques. In step 8, the process involves preparing test images, running model predictions, and applying LIME [8] and Grad-CAM [7] to samples. The resulting visualizations provide interpretable insights into the models' decisions, fostering transparency and trust in critical applications such as medical image classification.

Step-4,8) Evaluate The Models and Visualization Predictions: In step 4, a confusion matrix is used to compare performance, and the model's misclassification rate serves as one of the metrics for comparing its performance across classes. To evaluate how well the model performs overall, we use the weighted F1-score. Finally, in step 8, LIME generates local explanations for individual predictions, offering insights into the specific features that influence the model's decisions, and Grad-CAM produces a heat map that highlights the regions of the input images that are most important for the model's classifications.

V. RESULT ANALYSIS

We implemented several models; the results obtained from them are shown in Tables 3-8.

a) The Result by Pre-trained CNN Model: We conducted experiments using seven pre-trained CNN models. Xception achieves the highest accuracy at 93%, and DenseNet121 and InceptionV3 are second and third, respectively, at 92%. DenseNet201 attains 91% accuracy, whereas ResNet50, EfficientNetB7, and VGG19 range from 86% to 90%. The models perform consistently, as shown by the close alignment of their precision, recall, and F1 scores with accuracy. Xception is the best model for this task.

b) The Result by Hybrid Model: Hybrid models were created from various combinations. The InceptionV3+Xception hybrid is particularly noteworthy: it attains the highest accuracy of 89% with consistent F1 score, precision, and recall. Other combinations, such as InceptionV3+DenseNet201 and VGG19+DenseNet201, also perform well with 88% accuracy. These results highlight the value of combining different CNN architectures and show how hybrid models may improve prediction on this task.

c) The Result by Ensemble Model: Ensemble models were created from multiple combinations using two criteria: average prediction and majority voting. The highest accuracy, 93%, surpassed the other combinations. In the "Average Prediction" table, various combinations involving Xception, InceptionV3, DenseNet201, VGG19, and ResNet50 consistently yield an accuracy of 93%, with high precision, recall, and F1 scores. The "Majority Voting" table follows a similar pattern, with a top accuracy of 93% obtained through ensemble combinations.

d) The Result by Vision Transformer and Big-Transfer Model: We also implemented Vision Transformer (ViT) and Big Transfer (BiT) models. ViT achieved an accuracy of 91%, while BiT achieved an accuracy of 86% (Table III).

TABLE III
RESULT OBTAINED BY TRANSFORMER AND BIG TRANSFER MODEL
Model Name                Accuracy (%)  Precision (%)  Recall (%)  F1 (%)
Vision Transformer (ViT)  91            91             91          91
BigTransfer (BiT)         86            86             86          86

e) Stratified K-fold Cross Validation: We applied the k-fold technique to the base dataset, using both 3-fold and 5-fold cross-validation, to determine the best accuracy. We used the Xception model for this analysis because it showed the best accuracy in the single-model evaluation.

I) Using 3-fold: The model achieved accuracies of 90%, 95%, and 98% on the three folds, for an average accuracy of 94.33%. Precision, recall, and F1 scores are all high, ranging from 94% to 95%.

II) Using 5-fold: The model achieved accuracies of 90%, 95%, 98%, 99%, and 99% on the five folds, for an average accuracy of 96.20%. Precision, recall, and F1 scores are all high, ranging from 96% to 97%.

K-fold cross-validation uses the data more efficiently and minimizes bias, and in our experiments it yields higher accuracy than the other methodologies. This comprehensive approach to model evaluation substantially improves the reliability of our results, providing a robust estimate of the model's generalization capability on unseen data (Table IV).

TABLE IV
RESULT OBTAINED BY STRATIFIED K-FOLD CROSS VALIDATION MODEL
Setting  Model     Accuracy (%)  Precision (%)  Recall (%)  F1 (%)
3-fold   Xception  94.33         94             94          94
5-fold   Xception  96.20         96             96          96

VI. CONCLUSION AND FUTURE WORK

This study explores the automated diagnosis of lung disorders from chest X-ray images. It proposes models to classify and analyze lung diseases, using XAI tools such as the LIME algorithm and Grad-CAM. The Xception model achieved 96.21% accuracy with stratified 5-fold cross-validation, demonstrating its usefulness for improving lung disease diagnosis. Future research should explore integrating imaging modalities, hybrid models, image segmentation [24], and SHAP to improve accuracy and robustness. Investigating real-time applications and deploying the proposed models in clinical settings could further validate their effectiveness in streamlining the diagnostic process for lung disorders. Furthermore, collaboration between medical professionals and machine learning experts is crucial for refining the models and ensuring their practical utility in diverse healthcare scenarios.

REFERENCES

[1] M. Bhandari et al., “Explanatory classification of CXR images into COVID-19, Pneumonia and Tuberculosis using deep learning and XAI,” Computers in Biology and Medicine, vol. 150, p. 106156, 2022.
[2] M. Ciotti et al., “The COVID-19 pandemic,” Critical Reviews in Clinical Laboratory Sciences, vol. 57, no. 6, pp. 365–388, 2020.
[3] Pneumonia, “https://ourworldindata.org/pneumonia” [accessed: 27 November, 2023]
[4] Tuberculosis, “https://www.who.int/news-room/fact-sheets/detail/tuberculosis” [accessed: 27 November, 2023]
[5] Dataset, “https://www.kaggle.com/datasets/omkarmanohardalvi/lungs-disease-dataset-4-types” [accessed: 27 November, 2023]
[6] A. Barredo Arrieta et al., “Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI,” Information Fusion, vol. 58, pp. 82–115, 2020.
[7] R. R. Selvaraju et al., “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017.
[8] D. Garreau and U. von Luxburg, “Explaining the explainer: A first theoretical analysis of LIME,” in International Conference on Artificial Intelligence and Statistics, PMLR, 2020.
[9] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, pp. 1–48, 2019.
[10] M. Rahimzadeh and A. Attar, “A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2,” Informatics in Medicine Unlocked, vol. 19, p. 100360, 2020.
[11] N. N. Qaqos and O. S. Kareem, “COVID-19 diagnosis from chest X-ray images using deep learning approach,” in 2020 International Conference on Advanced Science and Engineering (ICOASE), pp. 110–116, IEEE, 2020.
[12] A. I. Khan, J. L. Shah, and M. M. Bhat, “CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images,” Computer Methods and Programs in Biomedicine, vol. 196, p. 105581, 2020.
[13] A. H. Al-Timemy, R. N. Khushaba, Z. M. Mosa, and J. Escudero, “An efficient mixture of deep and machine learning models for COVID-19 and tuberculosis detection using X-ray images in resource limited settings,” Artificial Intelligence for COVID-19, pp. 77–100, 2021.
[14] O. Sarkar, M. R. Islam, M. K. Syfullah, M. T. Islam, M. F. Ahamed, M. Ahsan, and J. Haider, “Multi-scale CNN: An explainable AI-integrated unique deep learning framework for lung-affected disease classification,” Technologies, vol. 11, no. 5, p. 134, 2023.
[15] Dataset1, “https://www.kaggle.com/datasets/darshan1504/covid19-detection-xray-dataset” [accessed: 27 February, 2024]
[16] Dataset2, “https://www.kaggle.com/datasets/muhammadrizkyperdana/ dataset” [accessed: 27 February, 2024]
[17] Dataset3, “https://www.kaggle.com/datasets/paultimothymooney/ches xray-pneumonia” [accessed: 27 February, 2024]
[18] Dataset4, “https://www.kaggle.com/datasets/jtiptj/chest-xray-pneumoniacovid19tuberculosis” [accessed: 27 February, 2024]
[19] Dataset5, “https://www.kaggle.com/datasets/iamsuyogjadhav/chest-x-ray-14-lungs-cropped” [accessed: 27 February, 2024]
[20] Dataset6, “https://www.kaggle.com/datasets/tawsifurrahman/tuberculo tb-chest-xray-dataset” [accessed: 27 February, 2024]
[21] VisiPics, “https://visipics.info/” [accessed: 27 November, 2023]
[22] I. H. Sarker, “Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions,” SN Computer Science, vol. 2, no. 6, p. 420, 2021.
[23] https://keras.io/api/applications
[24] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3523–3542, 2021.