
Journal of Computer Science and Software Development

Research Article Open Access

Detection and Classification of Lung Diseases using Machine and Deep Learning Techniques

Syamala KPL*, Niharika CS, Jenny AM, Pavani P
Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India

Received Date: January 09, 2023 Accepted Date: February 09, 2023 Published Date: February 12, 2023

*Corresponding author: Syamala KPL, Dept of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India, Email: poojitakkr@[Link]

Citation: Syamala KPL, Niharika CS, Jenny AM, Pavani P (2023) Detection and Classification of Lung Diseases using Machine and Deep Learning Techniques. J Comput Sci Software Dev 2: 1-10

Abstract

Environmental change, pollution, and habits such as smoking and drinking can lead to many lung diseases, which need early detection. Smoking harms both smokers and the people around them, most often in the form of breathing problems. This paper proposes a website that takes a patient's symptoms, determines whether a disease is present, and gives a grade indicating how severe the disease is. A user who has an X-ray image and wants cross-verification can upload the image and view the results. Thus, the user input can be either a string or an image. We focus on detecting chronic lung disease at an early stage, which in turn improves the chances of recovery and survival. We use deep learning and machine learning to predict COVID-19, tuberculosis, pneumonia, and COPD. For COVID-19 and COPD, we achieved accuracies of 96.90% and 90.32% respectively using classification algorithms, and for the image dataset we obtained an accuracy of 98.58% using EfficientNet B0, a deep learning model.

Keywords: X-ray, Dataset, Machine Learning, Classification, EfficientNet B0, Deep Learning

©2023 The Authors. Published by JScholar under the terms of the Creative Commons Attribution License [Link] by/3.0/, which permits unrestricted use, provided the original author and source are credited.

JScholar Publishers J Comput Sci Software Dev 2023 | Vol 2: 102



Introduction

Lung disease refers to a variety of medical conditions that cause the lungs to work inefficiently. The most common lung diseases are asthma, COPD, hypertension, lung cancer, pneumonia, tuberculosis, and pulmonary edema. Of these, this paper predicts pneumonia, tuberculosis, COVID-19, and COPD using machine learning and deep learning techniques. Across the globe, lung diseases cause millions of deaths each year and strain healthcare systems. A timely diagnosis is imperative for improving long-term survival and the chance of recovery.

Lung diseases have several causes, including smoking, alcohol consumption, and pollution. COPD affects 65 million people worldwide and kills 3 million people each year, making it the third most common cause of death. Almost 15 percent, or roughly one in seven, of middle-aged and older adults have lung disease.

Figure 1: Causes of Lung Diseases

During the recent COVID-19 pandemic, having a chronic lung disease meant a high risk of severe illness, complications, and death. One-fourth of COVID-19 cases involve an infection that affects both lungs.

In this paper we present machine learning algorithms that determine the severity of diseases based on their symptoms, as well as EfficientNet B0, a deep learning model used to detect diseases from chest X-ray images. We tried different algorithms for detecting each disease and chose the model with the best accuracy. Deep learning on medical images such as chest X-rays has shown great potential for detecting lung disease.

Figure 2: Different Types of Lung Diseases

Problem Statement

This project uses both machine learning and deep learning to predict whether a patient has a lung disease. It combines patient information in the form of reported symptoms with data from X-ray images, using EfficientNetB0 as a pretrained model. As technology advances and the world changes quickly, changes in environment and climate are putting growing pressure on health and raising the risk of disease. This paper focuses on one such issue, lung disease. The application is intended for use before a patient is treated in a healthcare system, and the patient information it collects can also support better service during treatment.

Database

In this paper, we present an experimental analysis of the proposed model on several popular lung disease datasets: the COVID, tuberculosis, and COPD symptom datasets and a chest X-ray dataset. All of the datasets are taken from Kaggle. Before moving to the results, we give a brief overview of each dataset.

COPD

Our COPD dataset includes 101 patients and 24 variables. It records patient characteristics such as age, gender, and smoking status, along with disease severity and co-morbidities. It also contains measures of walking ability, quality of life, anxiety, and depression. The stages of COPD in the dataset run from Gold1 to Gold4, corresponding to Mild, Moderate, Severe, and Very Severe.

Tuberculosis

The tuberculosis dataset consists of 16 columns with symptoms such as fever, coughing blood, chest pain, night sweats, and weight loss. In the Gender column, 0 refers to women and 1 to men.

COVID

The COVID dataset records symptoms such as cough, fever, sore throat, shortness of breath, and headache, together with whether the person is aged 60 or above and their gender. The categorical data is preprocessed into numerical form.
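The conversion of categorical symptom data to numerical form can be sketched as follows. This is a minimal illustration only: the column names, the sample records, and the Yes/No mapping are assumptions made for the example, not the actual schema of the Kaggle files. The gender coding follows the 0 = women, 1 = men convention stated for the tuberculosis data.

```python
# Hypothetical symptom records; the column names are illustrative only.
records = [
    {"Cough": "Yes", "Fever": "No", "Gender": "Female", "Age_60_And_Above": "No"},
    {"Cough": "No", "Fever": "Yes", "Gender": "Male", "Age_60_And_Above": "Yes"},
]

# Assumed mapping from category strings to integers.
# Gender uses 0 = women, 1 = men, as described for the tuberculosis data.
MAPPING = {"Yes": 1, "No": 0, "Male": 1, "Female": 0}


def encode(record):
    """Return a copy of the record with every categorical string replaced by an integer."""
    return {column: MAPPING[value] for column, value in record.items()}


encoded = [encode(record) for record in records]
```

A lookup table like this fails loudly with a KeyError on an unexpected category, which is usually preferable to silently encoding a typo as a new class.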

Figure 3: Detailed information on the datasets

Image dataset

The image dataset contains a total of 7135 X-ray images, organized into train, test, and val folders. Each folder holds one subfolder per image category: Normal, Pneumonia, Covid-19, and Tuberculosis. EfficientNetB0 is used to detect and classify the disease from the X-ray images.
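As a rough sketch of how such a folder layout can be turned into a table of file paths and labels, the snippet below walks a `<split>/<category>/<file>` tree. The folder names and the `list_images` helper are hypothetical, and a temporary stand-in tree is created so that the example runs without the real images.

```python
import tempfile
from pathlib import Path


def list_images(root):
    """Return sorted (split, label, filepath) rows for files at root/<split>/<label>/<file>."""
    rows = []
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file():
            rows.append((path.parent.parent.name, path.parent.name, str(path)))
    return rows


# Build a tiny stand-in directory tree; the folder names are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "test", "val"):
        for label in ("NORMAL", "PNEUMONIA", "COVID19", "TUBERCULOSIS"):
            folder = Path(tmp) / split / label
            folder.mkdir(parents=True)
            (folder / "example.jpg").touch()
    rows = list_images(tmp)

# The distinct category names become the class labels.
labels = sorted({label for _, label, _ in rows})
```

Taking the label from the parent folder name keeps the labelling tied to the directory structure, so no separate annotation file is needed.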


Figure 4: Some Sample pictures of the Image Dataset

Proposed Methodology

Our model works on both a CSV dataset, which contains disease symptoms, and an image dataset, which consists of radiology images. For the CSV data we worked on three diseases: Covid, tuberculosis, and COPD. We use decision trees for both Covid and COPD, and k-means clustering for tuberculosis. For the image dataset we use EfficientNetB0, a model pretrained on ImageNet, with ReLU as the activation function for the input layers and softmax for the output layer.

We built a website using Streamlit, an open-source Python library for creating and sharing web apps for machine learning projects. The website is private by default and must be deployed on Heroku to make it public. On the website, the disease is predicted as positive or negative based on the input given by the patient.
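To make the clustering step concrete, here is a minimal sketch of k-means in plain Python. It is not the implementation used for the tuberculosis data: the binary symptom vectors are invented, and seeding the centroids with the first k points is an assumption made so that the run is deterministic.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on lists of numbers; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical binary symptom vectors: (fever, coughing blood, night sweats).
patients = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
assignments, centroids = kmeans(patients, k=2)
```

Because k-means is unsupervised, the cluster indices carry no meaning by themselves. Turning them into positive or negative predictions requires comparing each cluster against known labels.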

Figure 5: Steps to follow for Building CNN to data


Implementation

We first import all the libraries required for building the model: NumPy for numerical calculations, Pandas for reading the CSV files, the machine learning classifiers KNeighborsClassifier and DecisionTreeClassifier for prediction, and Keras for developing and evaluating deep learning models. We then import the dataset. From Keras we also import layers such as Dense, Conv2D, MaxPooling2D, Flatten, and Dropout, and EfficientNetB0 from the Keras applications. Fig 9 shows the steps to follow for building the CNN model.

The image dataset has four categories: Covid, Pneumonia, Tuberculosis, and no disease. After importing the image dataset, the first step is to preprocess the data by creating a DataFrame with the file path and the label of each picture.

The second step is to prepare the images for training by specifying the file path and label columns and the target size.

The third step is to design a neural network model by initializing the EfficientNet B0 model, which serves as the base of the deep learning model and improves efficiency and accuracy.

In this model, EfficientNet B0 is connected to input layers such as Normalization, ZeroPadding2D, Batch Normalization, and Conv2D, which process and normalize the output of the preceding layers. In the output layer, the softmax function serves as the activation function. The loss function is categorical cross-entropy, which suits multi-class classification with two or more output labels. The optimizer is Adam, a stochastic gradient descent method for training the model.
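The softmax activation and categorical cross-entropy loss named in this section can be illustrated with a small, framework-free sketch. The logits below are invented values standing in for the output of the final dense layer. In practice Keras computes both quantities internally, so this is only meant to show what they mean.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: minus the sum of y_i * log(p_i) over the classes."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))


CLASSES = ["Normal", "Pneumonia", "Covid-19", "Tuberculosis"]
logits = [0.5, 2.0, 0.1, -1.0]  # hypothetical outputs of the final dense layer
probs = softmax(logits)
loss = categorical_cross_entropy([0, 1, 0, 0], probs)  # true class: Pneumonia
predicted = CLASSES[probs.index(max(probs))]
```

The loss depends only on the probability assigned to the true class, so it shrinks toward zero as the model becomes more confident in the correct label.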

Figure 6: The concept of Convolutional Neural Network (CNN)

Figure 7: ReLU Activation Function


Figure 8: ReLU Activation Function Derivative
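For reference, the ReLU function and its derivative shown in these two figures can be written in a few lines. The derivative is undefined at exactly zero, and the value 0 used there is a common convention assumed for this sketch.

```python
def relu(x):
    """ReLU passes positive inputs through and clamps the rest to zero."""
    return max(0.0, x)


def relu_derivative(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
slopes = [relu_derivative(x) for x in inputs]
```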

Figure 9: Steps to follow for Building CNN to data

The following are the layers of the CNN:

Convolutional Layer: Convolutional layers extract image features such as edges and intersection points, which carry rich information. The number of layers matters here.

We can change the architecture by using different activation functions with different numbers of features.

The network will include the following components:

Activation functions: Our model includes the ReLU and softmax activation functions, which are applied to the layer outputs, with softmax at the output layer.

Pooling Layers: A pooling layer is added after a convolutional layer to perform progressive dimensionality reduction, i.e., to reduce the number of parameters and computations, thereby shortening the training time and reducing overfitting.

Dense layers: The final layers of the CNN are dense layers, which take all the feature data produced by the convolutional layers and analyze it.

Dropout Layers: Deep learning networks have many weight and bias parameters, which makes them prone to overfitting. Dropout layers counter this by switching off neurons at random during training.

In the dropout layer, a random subset of features from the input layer and a random subset of neurons from the hidden layer are selected according to the value of p. The selected neurons and features are deactivated and the others remain active.

Dropout ratio: 0 ≤ p ≤ 1
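The dropout mechanism can be sketched as below. Two assumptions are made here: p is read as the probability of dropping a unit (as in the Keras `Dropout` layer's `rate` argument), and the surviving activations are rescaled by 1/(1 - p), the "inverted dropout" variant. Under that variant p = 1 would divide by zero, so the sketch restricts p to 0 ≤ p < 1.

```python
import random


def dropout(activations, p, rng):
    """Zero each activation with probability p and rescale the survivors by 1 / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 for inverted dropout")
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, p=0.5, rng=rng)
```

Rescaling the survivors keeps the expected activation unchanged, so the layer can simply be skipped at prediction time.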


Batch Normalization: This method is used when training neural network models. Batch normalization reduces the number of epochs required to train a deep neural network. It lets each layer of the network learn more independently of the others, enabling faster training.

To build the model, we use Adam as the optimizer, categorical cross-entropy as the loss, and accuracy as the metric. After building and compiling the model, the data is split into training data and validation data. Our model uses a batch size of 32 and trains for 15 epochs.

After training is complete, we evaluate the model and calculate the loss and accuracy.

To predict the disease from symptoms, we use machine learning algorithms. First, we import NumPy and Pandas for linear algebra, data processing, and reading the CSV file. We then preprocess the data and convert the categorical data to numerical data. Second, we split the data into train and test sets. Finally, we fit the algorithms to the training data. The algorithms we used are KNN and the Decision Tree Classifier, which are supervised machine learning algorithms that can solve both regression and classification problems. We developed a website where the disease can be predicted from both symptoms and radiology images. The patient enters the symptoms relevant to the disease, and the website predicts whether the disease is present using the fitted models. Alternatively, a radiology image can be given as input, in which case the prediction comes from the EfficientNetB0 model, trained on the image dataset with different activation layers and chosen for the best accuracy. Overall, the website is an interface that helps patients predict a disease from their symptoms accurately and efficiently.

Experiment Results

We use both deep learning and machine learning algorithms to predict lung diseases such as COVID-19, tuberculosis, pneumonia, and COPD. For the symptom datasets, i.e., the CSV data, we obtained accuracies of 96.90% for COVID-19 and 90.32% for COPD using a classification algorithm, decision trees. For tuberculosis we use k-means clustering.

For the image dataset, which contains radiology images of lung diseases such as Covid, pneumonia, and tuberculosis, we obtained an accuracy of 98.58% using EfficientNet B0, a deep learning model. Using the Streamlit package, we built a website that helps the user or patient detect chronic lung disease at an early stage, which in turn improves the chances of recovery and survival.
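The split, fit, and evaluate workflow for the symptom data can be sketched with a small k-nearest-neighbours classifier in plain Python. Everything here is illustrative: the encoded rows are invented, the split simply holds out the last two rows, and Hamming distance is an assumed choice for binary features. The paper's own models come from scikit-learn's KNeighborsClassifier and DecisionTreeClassifier.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training rows closest to query (Hamming distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum(a != b for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical encoded rows: (cough, fever, shortness of breath) -> 1 = positive.
rows = [
    ([1, 1, 1], 1), ([1, 1, 0], 1), ([0, 1, 1], 1),
    ([0, 0, 0], 0), ([1, 0, 0], 0), ([0, 0, 1], 0),
    ([1, 1, 1], 1), ([0, 0, 0], 0),
]
train, test = rows[:6], rows[6:]  # simple hold-out split
train_x = [x for x, _ in train]
train_y = [y for _, y in train]

predictions = [knn_predict(train_x, train_y, x) for x, _ in test]
score = accuracy([y for _, y in test], predictions)
```

Accuracy measured on held-out rows, rather than on the rows used for fitting, gives a fairer estimate of how the model behaves on new patients.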

Figure 10: User Interface of website


Figure 11: Prediction via Symptoms

Figure 12: Prediction via Radiology Images

In this project we developed a website for predicting and classifying lung diseases with better accuracy, which helps in the early treatment of the patient. We used both machine learning and deep learning algorithms for better classification and prediction.


Figure 13: Heat Map for COPD Dataset model

Table 1: Accuracy Results

Disease                                             Model                Accuracy
Covid                                               Decision Trees       96.90%
Tuberculosis                                        K-Means clustering   Error rate: 0.33
COPD                                                Decision Trees       90.32%
Radiology images (Covid, Pneumonia, Tuberculosis)   EfficientNetB0       Train: 95.58%, Test: 86.18%

Conclusion

The main aim of our research is to build a website that predicts different types of lung disease in real time from patients' symptoms and X-ray images. Our work allows the model to predict diseases from different patients' X-ray images. We use machine learning algorithms such as decision trees and clustering for prediction from symptoms, and Keras, a Python deep learning library, together with TensorFlow for predicting the severity of the disease from X-ray images. We built and trained a model capable of detecting and classifying both images and symptoms in real time. The model gave good accuracy but takes a long time to train. We conclude by hoping that this paper has given the reader a fair understanding of how lung disease prediction can be designed in real time using ML and DL, with pre-trained models such as EfficientNet B0 for better model performance.

