Final Report
A PROJECT REPORT ON
“PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL”
Submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF ENGINEERING
in
Computer Science and Engineering & Artificial Intelligence Engineering
Submitted by
CERTIFICATE
This is to certify that the project entitled “PREDICTION OF LUNG CANCER USING DEEP LEARNING
MODEL” carried out by Ms. VIMALA K V (1VE20CA024), Ms. DHRUTHI S (1VE20CA006), Ms.
AISHWARYA RAJU S (1VE20CS008), and Mr. CHARAN N (1VE20CA030), bonafide students of Sri
Venkateshwara College of Engineering, in partial fulfilment for the award of Bachelor of Engineering in
Computer Science and Engineering and Artificial Intelligence of Visvesvaraya Technological University,
Belgaum, during the academic year 2023-2024. It is certified that all corrections/suggestions indicated for Internal
Assessment have been incorporated in the report deposited in the departmental library.
External Viva-Voce
ACKNOWLEDGEMENT
The euphoria of completing this technologically advanced project would not be complete
without thanking all the people who helped us in this enthusiastic work. Submission of this
project marks a milestone in our academic career.
It is our privilege to express heartfelt gratitude to the management of SVCE and our beloved
Principal Dr. Nageswara Guptha M, Sri Venkateshwara College of Engineering Bengaluru,
for providing necessary support and encouragement.
We are thankful to Dr. Hema M S, Head, Department of Computer Science & Engineering,
and Dr. Prathima V R, Head, Department of Computer Science Engineering-Artificial
Intelligence, for their overall guidance and cooperation.
We would like to express our sincere thanks to the project coordinator, Dr. POORNIMA G R,
Professor and Dean Academics, SVCE Bengaluru, for guidance and support in bringing this
project to completion.
We would also like to express our sincere thanks to the internal guide, Dr. SUMA T, Professor,
Department of Computer Science and Engineering, SVCE, Bengaluru.
Finally, we would like to express our heartfelt thanks to our parents and friends for their
invaluable help, constant support, and motivation in helping us complete our project work
successfully.
INSTITUTE MISSION
M-1: Nurture students with a professional and ethical outlook to identify needs, analyze, design
and innovate sustainable solutions through lifelong learning in service of society, as individuals
or in a team.
M-2: Establish state-of-the-art laboratories and an Information Resource Centre for education
and research.
M-3: Collaborate with industry, government organizations, and society to align the curriculum
and outreach activities.
PROGRAM OUTCOMES
Engineering Graduates will be able to:
PO2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
PO5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities with an understanding of the limitations.
PO6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
PO9. Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.
PO11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
PO12. Life-long learning: Recognize the need for and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.
ABSTRACT
Lung cancer, responsible for over 1.8 million deaths annually, remains a critical global health
issue. Early detection significantly improves patient outcomes and reduces mortality rates, yet
current screening methods, such as chest X-rays and CT scans, have notable limitations. Chest
X-rays often fail to detect small nodules or early-stage cancers, leading to delayed diagnoses.
While CT scans, particularly low-dose CT (LDCT), are more sensitive, they are costly, expose
patients to radiation, and frequently yield false positives, resulting in unnecessary biopsies and
patient anxiety. Computer-aided detection (CAD) systems attempt to mitigate these issues by
assisting radiologists with pattern recognition and image processing, but they still produce a
high number of false positives, have limited sensitivity, and rely on radiologists' interpretations,
making the process time-consuming and prone to human error.
To address these challenges, we propose a two-stage deep learning model designed to enhance
the early detection and classification of lung cancer. In the first stage, our model uses
convolutional neural networks (CNNs) to analyze CT scan data, distinguishing between benign
and malignant lung nodules. CNNs are particularly effective in identifying spatial hierarchies
in images, crucial for accurately detecting lung nodules. The model outputs the classification
along with a confidence level, indicating the certainty of the prediction. This stage aims to
reduce the false negative rate, ensuring that more malignant cases are identified early on, while
also minimizing false positives to reduce unnecessary follow-ups.
In the second stage, the model further analyzes the identified malignant nodules, classifying
each one into one of four classes: adenocarcinoma, squamous cell carcinoma, large cell
carcinoma, or normal. This detailed classification is achieved through additional layers of CNNs,
which refine the features extracted from the malignant nodules. The output includes the specific
type of malignant nodule and a confidence level for each classification, providing a
comprehensive assessment that aids in precise diagnosis and treatment planning. This two-
stage approach is designed for integration into clinical workflows, offering real-time, detailed
assessments to assist radiologists. By enhancing the accuracy and speed of lung cancer
detection and providing detailed classifications, our model aims to facilitate timely
interventions and improve patient outcomes, ultimately contributing to a reduction in lung
cancer mortality rates.
TABLE OF CONTENTS
6 Implementation
6.1 Data Source
6.2 Dataset Collection
6.3 Data Visualization
6.4 Data Pre-Processing
6.5 Machine Learning Algorithms
6.5.1 AlexNet
6.5.2 EfficientNetB0
6.7 Accuracy
6.8 Source Code
CHAPTER 1
INTRODUCTION
Deep learning models have revolutionized lung cancer detection, significantly improving early
diagnosis and treatment outcomes. Historically, lung cancer, a leading cause of cancer-related
deaths worldwide, posed significant challenges due to late-stage diagnosis. However, with the
advent of deep learning in medical imaging, a transformative shift has occurred. Deep
learning's journey in lung cancer detection began with the pioneering work of researchers who
developed convolutional neural networks (CNNs) for image analysis. These models, inspired
by the human visual system, have shown remarkable capabilities in recognizing patterns and
features indicative of lung cancer on radiological images.
Early studies in the 2010s laid the groundwork for deep learning's application in lung cancer
detection, demonstrating its potential in accurately identifying malignant nodules from chest
X-rays and CT scans. Over the years, with the accumulation of large-scale medical image
datasets and advancements in computational power, deep learning models have become
increasingly sophisticated. The introduction of architectures like AlexNet and EfficientNet
further improved model performance, enabling finer segmentation of tumors and more precise
localization of abnormalities.
The adoption of deep learning in lung cancer diagnosis accelerated in the mid-2010s, with
researchers and healthcare institutions worldwide leveraging these models to enhance
radiologists' diagnostic capabilities. By providing automated analysis and assisting in the
interpretation of medical images, deep learning has significantly reduced the time and resources
required for diagnosis while improving accuracy and consistency.
Today, deep learning continues to play a pivotal role in lung cancer management, with models
capable of not only detecting tumours but also predicting patient prognosis, treatment response,
and disease recurrence. As deep learning algorithms become more sophisticated and data
availability increases, the future holds promise for even more accurate, efficient, and
personalized approaches to lung cancer detection and treatment.
1.1 OBJECTIVES
Early Detection: Our deep learning model utilizes advanced image analysis techniques to
detect subtle signs of lung cancer in medical scans, enabling early diagnosis and timely
intervention, crucial for improving patient outcomes.
High Accuracy: With state-of-the-art convolutional neural networks (CNNs), our model
achieves exceptional accuracy in identifying malignant lung nodules, minimizing false
positives and negatives, ensuring reliable diagnosis.
Speed and Efficiency: Leveraging parallel processing and optimized algorithms, our model
rapidly analyzes medical images, providing quick and efficient results, crucial for accelerating
the diagnostic process and facilitating prompt treatment decisions.
Reduction of Human Error: By automating the interpretation of medical scans, our model
reduces reliance on human interpretation, minimizing the risk of diagnostic errors and ensuring
consistency in lung cancer detection.
Reduced Healthcare Costs: By enabling early detection, reducing diagnostic errors, and
streamlining the diagnostic process, our model contributes to significant cost savings in
healthcare, making lung cancer screening and diagnosis more accessible and affordable.
Machine learning is an application of artificial intelligence (AI) that gives systems the ability
to learn and improve from experience automatically, without being explicitly programmed. It
focuses on the development of computer programs that can access data and use it to learn for
themselves. The process of learning starts with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in the data and make better decisions
in the future based on the examples that we provide. The primary aim is to allow computers to
learn automatically, without human intervention or assistance, and adjust their actions
accordingly. Machine learning algorithms are often categorized as supervised or unsupervised.
Supervised machine learning algorithms apply what has been learned in the past to new data,
using labelled examples to predict future events. Starting from the analysis of a known training
dataset, the learning algorithm produces an inferred function to make predictions about the
output values. After sufficient training, the system is able to provide targets for any new input.
The learning algorithm can also compare its output with the correct, intended output and find
errors in order to modify the model accordingly.
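As a minimal illustrative sketch of this fit/predict pattern, the snippet below trains a classifier on a small labelled toy dataset with scikit-learn; the dataset and model choice are for illustration only, not part of this project.

# Supervised learning: learn from labelled examples, then predict new inputs
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)  # labelled training examples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=5000)     # the inferred function
clf.fit(X_train, y_train)                   # learn from the known training dataset
print(clf.predict(X_test[:5]))              # targets for new, unseen inputs
print("test accuracy:", clf.score(X_test, y_test))  # compare output with intended output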
Unsupervised machine learning algorithms are used when the information available for
training is neither classified nor labelled. Unsupervised learning studies how systems can infer
a function to describe hidden structure from unlabelled data. The system does not figure out
the "right" output, but it can explore the data and draw inferences from the dataset to describe
the hidden structures within it.
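As a minimal sketch of this idea (toy data, chosen only for illustration), k-means clustering groups unlabelled points without ever seeing a target value.

# Unsupervised learning: infer structure from unlabelled data
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])  # no labels supplied
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment inferred for each point
print(kmeans.cluster_centers_)  # description of the hidden structure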
Reinforcement machine learning algorithms interact with their environment by producing
actions and discovering errors or rewards. Trial-and-error search and delayed rewards are the
most relevant characteristics of reinforcement learning. This method allows machines and
software agents to automatically determine the ideal behaviour within a specific context in
order to maximize performance. Simple reward feedback, known as the reinforcement signal,
is required for the agent to learn which action is best.
Machine learning can be applied to analyze massive quantities of data. While it generally
delivers faster, more accurate results in identifying profitable opportunities or dangerous risks,
it may also require additional time and resources to train properly. Combining machine
learning with AI and cognitive technologies can make it even more effective in processing
large volumes of information.
NumPy
NumPy, short for Numerical Python, is the foundational package for scientific computing in
Python; most of the libraries used in this project are built on top of it. It provides, among other
things, a fast and efficient multidimensional array object (ndarray), vectorized element-wise
operations, and tools for linear algebra and random number generation.
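The short sketch below illustrates a few of these NumPy features; the arrays are arbitrary examples.

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])    # multidimensional array (ndarray)
print(a.shape)                           # (2, 3)
print(a * 2.0)                           # vectorized, element-wise arithmetic
print(a.mean(axis=0))                    # aggregation along an axis
print(np.expand_dims(a, 0).shape)        # (1, 2, 3): add a batch dimension, as for CNN input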
Pandas
Pandas is designed to provide rich data structures and functions that make working with
structured data fast, easy, and expressive. It is one of the critical ingredients enabling Python
to be a powerful and productive data analysis environment. The primary pandas object used
here is the DataFrame, a two-dimensional, tabular, column-oriented data structure with both
row and column labels. pandas combines the high-performance array-computing features of
NumPy with the flexible data-manipulation capabilities of spreadsheets and relational
databases (such as SQL). It provides sophisticated indexing functionality that makes it easy to
reshape, slice and dice, perform aggregations, and select subsets of data. For financial users,
pandas features rich, high-performance time-series functionality and tools well suited for
working with financial data. The pandas name itself is derived from panel data, an
econometrics term for multidimensional structured datasets, and from "Python data analysis"
itself.
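A small sketch of the DataFrame in use (the columns are illustrative, not this project's actual schema):

import pandas as pd

df = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "age": [54, 61, 47],
    "diagnosis": ["Benign", "Malignant", "Benign"],
})
print(df.head())                               # inspect the first rows
print(df[df["diagnosis"] == "Malignant"])      # select a subset of rows
print(df.groupby("diagnosis")["age"].mean())   # aggregation by label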
Sklearn
scikit-learn (sklearn) is a library in Python that provides many unsupervised and supervised
learning algorithms. It is built upon some of the technology you might already be familiar with,
like NumPy, pandas, and Matplotlib. The functionality that scikit-learn provides includes
classification, regression, clustering, dimensionality reduction, model selection, and
preprocessing.
SciPy
SciPy is a collection of packages for scientific computing in Python, providing routines for
numerical integration, optimization, interpolation, statistics, and signal processing; scikit-learn
is built on top of it.
Matplotlib
Matplotlib is a Python library used to create 2D graphs and plots from Python scripts. Its
pyplot module makes plotting easy by providing features to control line styles, font properties,
axis formatting, and so on. It supports a wide variety of graphs and plots, namely histograms,
bar charts, power spectra, error charts, etc. It is used along with NumPy to provide an
environment that is an effective open-source alternative to MATLAB. Matplotlib can also be
used with graphics toolkits like PyQt and wxPython.
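A minimal pyplot sketch showing a line plot and a histogram (the data is synthetic):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), linestyle="--", label="sin(x)")  # line-style control
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.legend()
plt.savefig("line_plot.png")  # or plt.show() in an interactive session
plt.clf()

plt.hist(np.random.randn(1000), bins=30)  # histogram of random samples
plt.savefig("histogram.png")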
Tensorflow
TensorFlow provides a flexible framework for constructing various types of neural networks,
including convolutional neural networks (CNNs) commonly used for medical image analysis.
These CNNs excel at learning hierarchical features from medical images, crucial for accurately
detecting lung cancer nodules.
The library offers extensive support for GPU acceleration, allowing models to train faster and
handle large-scale medical image datasets efficiently. This speed and scalability are essential
for processing the vast amounts of data required for accurate lung cancer prediction.
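As an illustrative sketch only (a simplified stand-in, not the project architecture detailed in Chapter 6), a small Keras CNN for two-class image classification looks like this:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # local image features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),  # deeper hierarchical features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),    # e.g., benign vs malignant
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()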
Streamlit
Streamlit is an open-source Python library designed to create powerful and interactive web
applications for data science and machine learning projects. It allows you to build intuitive and
user-friendly web interfaces directly from Python scripts, enabling rapid prototyping and
deployment. Streamlit provides a straightforward and intuitive API, allowing you to create web
apps using familiar Python syntax. You can easily add widgets like sliders, buttons, text inputs,
and plots to your app. With Streamlit, you can quickly turn your Python scripts into web apps;
there is no need to write HTML, CSS, or JavaScript code. You can focus on your data analysis
or machine learning model, and Streamlit handles the rest. Streamlit widgets enable users to
interact with your data or models dynamically: users can upload files, adjust parameters, and
see real-time updates of the results. Streamlit seamlessly integrates with popular data
visualization libraries like Matplotlib, Plotly, and Altair, allowing you to create interactive
charts and plots directly in your web app.
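A minimal Streamlit sketch of the widgets described above (save as app.py and run with: streamlit run app.py):

import streamlit as st

st.title("Demo App")
threshold = st.slider("Confidence threshold", 0.0, 1.0, 0.5)  # interactive widget
uploaded = st.file_uploader("Upload a CT scan image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    st.image(uploaded, caption="Uploaded image", use_column_width=True)
    st.write(f"Current threshold: {threshold}")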
When it comes to data science, machine learning is one of the most significant tools for
extracting value from data. With Python as the data science language, exploring the basics of
machine learning becomes easy and effective. In a nutshell, machine learning is largely about
statistics, mathematical optimization, and probability, and Python has become the most
preferred machine learning tool because of how easily it lets practitioners do the mathematics.
Name any mathematical function, and there is a Python package meeting the requirement:
NumPy for numerical linear algebra, pandas for data manipulation, Matplotlib for embedding
plots into applications, seaborn for data visualization, and scikit-learn for classification,
regression, and clustering algorithms. With a grip on basic machine learning algorithms such
as logistic regression and linear regression, it becomes easy to implement machine learning
systems for prediction by way of the scikit-learn library, and easy to build neural networks and
deep learning models with libraries such as Keras, Theano, and TensorFlow.
The data science landscape is changing rapidly, and the tools used for extracting value from
data have grown along with it. The two popular languages that fight for the top spot are R and
Python. Both are revered by enthusiasts, and both come with their strengths and weaknesses.
But with tech giants like Google showing the way and with its short, easy learning curve,
Python inches ahead to become the most popular language in the data science world.
CHAPTER 2
LITERATURE SURVEY
A literature survey or literature review in a project report presents the various analyses and
research carried out in the field of interest and the results already published, taking into account
the parameters and the scope of the project. A literature survey is mainly carried out to analyze
the background of the current project, which helps to find flaws in the existing system and
guides us toward unsolved problems we can work on. The following topics therefore not only
illustrate the background of the project but also uncover the problems and flaws that motivated
us to propose solutions and work on this project. A literature survey is part of a scholarly paper
that summarizes the current knowledge on a topic, including substantive findings as well as
theoretical and methodological contributions. Literature reviews use secondary sources and do
not report new or original experimental work. Most often associated with academic literature,
such as a thesis, dissertation, or peer-reviewed journal article, a literature review usually
precedes the methodology and results sections, though this is not always the case. Literature
reviews are also common in a research proposal or prospectus (the document that is approved
before a student formally begins a dissertation or thesis). Its main goals are to situate the current
study within the body of literature and to provide context for the reader. Literature reviews are
a basis for research in nearly every academic field.
The literature survey below describes the existing work related to this project. It deals with the
problems associated with existing systems and gives the reader a clear picture of how those
problems have been approached and what solutions have been proposed.
Lung cancer is one of the major causes of death across the globe. Medical interventions with
modern healthcare facilities are widely used to treat lung cancer; however, research on early
detection is indispensable, as it has the potential to save lives. With innovations in machine
learning, Computer-Aided Detection (CAD) systems for automatic detection of lung cancer
have become an important solution. In particular, deep learning models such as the
Convolutional Neural Network (CNN) are found to have the mechanisms necessary to learn
features from Computed Tomography (CT) scan images and estimate the probability of lung
cancer. In this paper, the authors propose a CNN-based model for automatic detection of lung
cancer given a lung CT scan image. They propose an algorithm known as CNN-based
Automatic Lung Cancer Detection (CNN-ALCD), which is based on supervised learning. The
learned model is capable of detecting lung cancer in any newly arrived test sample. The
proposed solution has different mechanisms, such as pre-processing, building a CNN with
different layers, training the CNN model, and performing lung cancer detection. The empirical
study revealed that the proposed CNN-based model outperforms many existing neural-
network-based methods, with a highest accuracy of 94.11%. Therefore, the proposed system
can be integrated with a Clinical Decision Support System (CDSS) in healthcare units for
automatic diagnosis of lung cancer.
Lung cancer is the second most common cancer after breast cancer in this era, and its survival
rate is lower than that of most other cancers. Lung cancer screening can help find cancer at an
early stage; if the disease is found and treated early, the chances of recovery are higher.
Computed Tomography (CT) is the most preferred and effective way of screening for lung
cancer. However, visual interpretation of CT scan images is difficult and time-consuming and
may lead to wrong interpretation of the malignancy. Therefore, computer-aided techniques are
required for proper and accurate detection of lung diseases. Several techniques are available in
the literature. In this paper, the authors propose a novel approach to lung cancer detection and
classification by image processing of the CT scan. Different preprocessing techniques are
applied for smoothing and image enhancement, followed by thresholding and edge detection
to segment the region of interest (ROI) of the lung tumor. Finally, several geometrical features
of the extracted ROI are computed and classified into the severity levels Benign and Malignant
using a support vector machine (SVM) classifier. The authors report significant accuracy in
detecting lung cancer nodules and estimating the severity level with the proposed method.
Lung cancer is the malignant tumor with the highest morbidity and mortality and a great threat
to human health. Increasingly refined lung cancer imaging provides a great deal of useful
information for analysis and identification and is an important aid to doctors in making
accurate diagnoses. A considerable portion of lung cancers manifest as nodules in the early
stage. Pulmonary nodules are round or irregular lesions in the lungs; about 34% are lung
cancers, and the rest are benign lesions. Therefore, the detection of pulmonary nodules is very
important for detecting early lung cancer. In this paper, Computed Tomography (CT) images
from the Lung Image Database Consortium (LIDC) dataset are adopted as training and testing
data; data preprocessing is completed by pixel interception, normalization, and other methods;
data augmentation such as rotation and scaling is applied; and the pulmonary-nodule sample
library is expanded. The constructed lung-nodule sample library is used to train a
Convolutional Neural Network (CNN) model, which completes the detection and segmentation
of pulmonary nodules and extracts the nodule regions. The size and regularity features of
pulmonary nodules are extracted, and lung cancer recognition is realized according to the size
and shape of the nodules. The experimental results show that this lung cancer detection and
identification method, based on a convolutional neural network with morphological features,
achieves higher accuracy.
This paper addresses the imperative need for early lung cancer detection by developing an
automated methodology using PET/CT images. Contrast Limited Adaptive Histogram
Equalization (CLAHE) and Wiener filtering are used for image pre-processing, and the lung
regions of interest are extracted using morphological operators. Haralick statistical texture
features are extracted to enhance cancer-region delineation, and Fuzzy C-Means (FCM)
clustering is employed for classification into normal and abnormal regions. The proposed
methodology, implemented in MATLAB, demonstrated robust performance, with an overall
accuracy of 92.67% in classifying and detecting lung cancer from PET/CT images, as evaluated
using Receiver Operating Characteristic (ROC) curve analysis.
CHAPTER 3
SYSTEM REQUIREMENTS
A System Requirement Specification (SRS) is a central document that forms the foundation of
the software development process. It lists the requirements of a system and also describes its
major features. An SRS is basically an organization's understanding, in writing, of a customer's
or potential client's system requirements and dependencies at a particular point in time, usually
prior to any actual design or development work. It is a two-way insurance policy that assures
that both the client and the organization understand each other's requirements from that
perspective at a given point in time.
The SRS discusses the product, not the project that produced it; hence the SRS serves as a basis
for later enhancement of the finished product. The SRS may need to be changed, but it provides
a foundation for continued evaluation as production proceeds. In simple words, the software
requirement specification is the starting point of the software development activity. Producing
the SRS means translating the ideas in the minds of the clients (the input) into a formal
document (the output of the requirements phase). The output of the phase is thus a set of
formally specified requirements, which hopefully are complete and consistent, while the input
has none of these properties.
3.1.1 Processor
Intel Core is a brand name that Intel uses for various mid-range to high-end consumer and
business microprocessors. As of 2015, the lineup of Core processors included the Intel Core i7,
Intel Core i5, and Intel Core i3. 5th-generation Intel® Core™ i5 processors power innovations
like Intel® RealSense™ technology, bringing features such as gesture control, 3D capture and
edit, and innovative photo and video capabilities to devices, along with built-in security and an
automatic burst of speed when needed via Intel® Turbo Boost Technology 2.0.
3.1.2 RAM
Functional Requirements
This section describes the functional requirements of the system, expressed in natural-language
style.
Non-Functional Requirements
These are requirements that are not functional in nature; that is, they are constraints within
which the system must work:
• The program must be self-contained so that it can easily be moved from one computer to
another. It is assumed that a network connection will be available on the computer on which
the program resides.
• The system shall achieve 100 percent availability at all times.
• The system shall be scalable to support additional clients and volunteers.
• Maintainability.
CHAPTER 4
SYSTEM ANALYSIS
Analysis is a detailed study of the various operations performed by a system and their
relationships within and outside the system. One aspect of analysis is defining the boundaries
of the system and determining whether or not a candidate system should consider other related
systems.
During analysis, data are collected on the available files, decision points, and transactions
handled by the present system. This involves gathering information and using structured tools
for analysis. System analysis and design are the application of the systems approach to problem
solving, generally using computers. To reconstruct a system, the analyst may consider its
elements: outputs and inputs, processors, controls, feedback, and environment.
Recent advancements in deep learning have led to the development of more sophisticated
models for medical image analysis, including lung cancer detection. These models, particularly
those based on convolutional neural networks (CNNs), can analyze imaging data more
effectively, offering higher sensitivity and specificity compared to traditional CAD systems,
thus reducing false positives and false negatives. They can autonomously analyze large
volumes of imaging data, potentially easing the workload on radiologists and enabling faster
diagnoses. Deep learning models can automatically learn and extract relevant features from
imaging data, improving the detection of subtle and early-stage lung cancers. However, these
models require large, annotated datasets for training, significant computational power, and
specialized hardware, which may not be available in all clinical settings. Additionally, models
trained on specific datasets may not generalize well to data from different populations or
imaging equipment, necessitating further validation and adaptation. Despite these challenges,
deep learning approaches hold considerable promise for enhancing early detection of lung
cancer, improving patient outcomes, and reducing mortality rates.
Despite advancements, traditional lung cancer screening methods like chest X-rays and CT
scans suffer from low sensitivity, high costs, radiation exposure, and frequent false positives.
Computer-aided detection (CAD) systems improve detection but generate many false positives
and still rely on radiologists, leading to time consumption and potential human error. Deep
learning models offer higher accuracy but require large annotated datasets, significant
computational power, and may not generalize well across different populations or imaging
equipment, necessitating further validation and adaptation.
Easy to use: The main objective of this project is to develop a platform that is simple and easy
to use. One provides the patient's medical details, and based on the extracted features the
algorithm detects the lung disease and identifies its type. Because the algorithm does the task,
a well-trained model is less likely to make errors in predicting the lung disease and its type; in
short, accuracy is improved. This also saves time and makes it easier for doctors as well as
patients to determine whether a patient is prone to any type of lung disease, which would
otherwise be difficult to do without a doctor's involvement.
No human intervention required: To detect the lung disease, one must provide medical data
such as a CT scan, and the algorithm provides results based on the features extracted. The
chance of an error being made is very low since there is no human intervention, and it also
saves a lot of time for patients and doctors, who can proceed to treatment or other procedures
much faster when results are provided quickly. This in turn can make the prevention and
treatment process for lung disease much faster, saving doctors and patients crucial time for the
further treatments and precautions needed to minimize the impact of the disease.
Efficient use of available annotated data samples: It is widely agreed that successful training
of machine learning algorithms requires many thousands of annotated training samples. Hence,
we use a network and training strategy that relies on strong data pre-processing to use the
available annotated samples more efficiently. As medical data is not available in large volumes
(thousands of samples or more, by machine learning standards), we use data pre-processing to
make the most of the available data. Data pre-processing is an essential data-mining technique
that involves transforming raw data into an understandable format. Real-world medical data is
often incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely to
contain many errors. Data pre-processing is a proven method of resolving such issues and
prepares raw data for further processing.
Technical Feasibility
This is concerned with specifying the equipment and software that will successfully satisfy the
user requirements. The technical needs of the system may vary considerably but might include
the following.
Economic Feasibility
Economic analysis is the most frequently used technique for evaluating the effectiveness of a
proposed system; it is more commonly known as cost/benefit analysis. The procedure is to
determine the benefits and savings that are expected from a proposed system and compare them
with the costs. If the benefits outweigh the costs, a decision is taken to design and implement
the system; otherwise, further justification or alterations to the proposed system will have to
be made if it is to have a chance of being approved. This is an ongoing effort that improves in
accuracy at each phase of the system life cycle.
Operational Feasibility
CHAPTER 5
SYSTEM DESIGN
System design is defined as the process of translating the various requirements into a form that
permits their physical realization. Various design features are followed to develop the system:
the design specification describes the features of the system, the components or elements of
the system, and their appearance to end users.
Level: 0
Level 0 describes the overall process of this project: the dataset is passed as input, and the
system detects the lung disease using machine learning algorithms.
Level: 1
Level 1 describes the first-stage process of this project: the dataset is passed as input, and the
system extracts features from it.
Level 2:
Level 2: In the first stage of this project, we extract features from medical imaging data, such
as X-rays or CT scans, using machine learning models. These features are then used to detect
whether a person has lung cancer or not, distinguishing between benign and malignant types
of cancer.
Level 3:
In the second stage of detection, the system further analyzes the malignant cases identified in
Level 2. It classifies the malignant cancer into three categories: adenocarcinoma, squamous
cell carcinoma, and large cell carcinoma. This stage represents the final step of detection,
providing detailed information about the specific type of lung cancer present.
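A hedged sketch of this two-stage flow is shown below; the model objects and class lists are assumptions based on Chapter 6 (AlexNet for stage one, EfficientNetB0 for stage two), not verified artifacts of the project code.

import numpy as np

STAGE1_CLASSES = ["Benign", "Malignant"]
STAGE2_CLASSES = ["Adenocarcinoma", "Large Cell Carcinoma", "Normal", "Squamous Cell Carcinoma"]

def two_stage_predict(image_batch, stage1_model, stage2_model):
    # Stage one: benign vs malignant, reported with a confidence level
    p1 = stage1_model.predict(image_batch)
    label1 = STAGE1_CLASSES[int(np.argmax(p1))]
    if label1 == "Benign":
        return label1, float(np.max(p1)), None
    # Stage two: subtype classification, run only for malignant cases
    p2 = stage2_model.predict(image_batch)
    label2 = STAGE2_CLASSES[int(np.argmax(p2))]
    return label1, float(np.max(p1)), (label2, float(np.max(p2)))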
A class diagram is a type of static structure diagram that describes the structure of a system by
showing the system's classes, their attributes, operations (or methods), and the relationships
among objects. The class diagram is the main building block of object-oriented modelling. It
is used for general conceptual modelling of the structure of the application, and for detailed
modelling translating the models into programming code. Class diagrams can also be used for
data modelling. The classes in a class diagram represent both the main elements and
interactions in the application, and the classes to be programmed. A generalization indicates
that one of the two related classes (the subclass) is considered a specialized form of the other
(the superclass), and the superclass is considered a generalization of the subclass; any instance
of the subtype is also an instance of the superclass. The graphical representation of a
generalization is a hollow triangle on the superclass end of the line (or tree of lines) that
connects it to one or more subtypes.
The generalization relationship is also known as the inheritance or "is a" relationship. The
superclass in the generalization relationship is also known as the "parent", base class, or base
type. The subtype in the specialization relationship is also known as the "child", derived class,
derived type, inheriting class, or inheriting type.
Sequence Diagram:
Sequence diagram shows object interactions arranged in time sequence. It depicts the objects
and classes involved in the scenario and the sequence of messages exchanged between the
objects needed to carry out the functionality of the scenario. Sequence diagrams are typically
associated with use case realizations in the Logical View of the system under development.
Sequence diagrams are sometimes called event diagrams or event scenarios.
CHAPTER 6
IMPLEMENTATION
For the first stage of detection, the dataset includes X-ray images labeled with two classes:
"Benign" and "Malignant."
For the second stage of detection, the dataset includes detailed information about the malignant
cases identified in the first stage. It comprises features related to the specific types of lung
cancer:
➢ Large Cell Carcinoma
➢ Adenocarcinoma
➢ Squamous Cell Carcinoma
These datasets are curated to facilitate the training and evaluation of deep learning models for
lung cancer prediction.
A large amount of information represented in graphic form is easier to understand and analyze.
Some companies specify that a data analyst must know how to create slides, diagrams, charts,
and templates. In our approach, visualization is used while detecting whether or not a patient
has lung cancer.
Transformation: This involves converting data from one format to another to make it more
understandable, applying normalization, smoothing, generalization, and aggregation
techniques to the data.
Integration: The data to be processed may not come from a single source; sometimes it comes
from several different sources. If these are not integrated, problems arise during processing, so
integration is one of the important phases of data pre-processing, and several issues must be
considered in carrying it out.
Reduction: Raw data may be complex and difficult to understand, so to make it understandable
to the system we reduce it to the required format, which helps achieve good results.
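As a small sketch of the transformation step (the 2x2 pooling factor is an arbitrary illustration, not the project's actual setting), normalization and a simple smoothing/reduction pass might look like this:

import numpy as np

def transform(image_array):
    # Normalization: map raw 0-255 pixel intensities to [0, 1]
    x = image_array.astype("float32") / 255.0
    # Smoothing/reduction: 2x2 mean pooling gives a coarser, smaller representation
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return x

print(transform(np.random.randint(0, 256, (224, 224))).shape)  # (112, 112)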
6.5.1 AlexNet
The first-stage network follows the AlexNet architecture. The opening convolutional layer (96
filters of size 11x11 in the standard AlexNet configuration) is followed by a max-pooling layer
and batch normalization. The second convolutional layer uses 256 filters of size 5x5 with 'same'
padding so that spatial dimensions are preserved, followed by another max-pooling layer and
batch normalization. The third convolutional layer consists of 384 filters of size 3x3 with 'same'
padding, followed by batch normalization. The fourth and fifth convolutional layers use 384
and 256 filters of size 3x3 with 'same' padding, respectively, both followed by batch
normalization; the fifth layer is also followed by a max-pooling layer. After the convolutional
layers, the output is flattened and fed into two fully connected layers, each with 4096 units and
ReLU activation. Dropout regularization with a rate of 0.5 is applied after each fully connected
layer to prevent overfitting. The final layer is a softmax classifier with 2 units, designed for
binary classification tasks. The model is compiled with categorical cross-entropy loss and the
Adam optimizer, aiming to maximize accuracy. Training uses ImageDataGenerators for data
augmentation: the training data is rescaled by a factor of 1/255 and augmented with random
shear, zoom, and horizontal flips, while the validation data is only rescaled by a factor of 1/255.
Finally, the trained model is saved to a file named 'AlexNetModel.h5'. This AlexNet
implementation balances computational complexity and accuracy, making it suitable for image
classification tasks.
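A Keras sketch of the architecture as described above follows. It is a reconstruction from this description: the first layer's filter count and stride are taken from the standard AlexNet configuration, and the augmentation magnitudes are placeholders, so the exact project code may differ in detail.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation="relu"),       # standard AlexNet stem
    layers.MaxPooling2D(3, strides=2),
    layers.BatchNormalization(),
    layers.Conv2D(256, 5, padding="same", activation="relu"),  # spatial dims preserved
    layers.MaxPooling2D(3, strides=2),
    layers.BatchNormalization(),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),                       # regularization against overfitting
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),     # benign vs malignant
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Augmentation as described: rescale, plus random shear, zoom, and horizontal flips
train_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
val_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)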
6.5.2 EfficientNetB0
The second-stage classifier is built on the EfficientNetB0 architecture. Model performance is
evaluated using a confusion matrix, which visualizes true versus predicted labels, providing
detailed insight into the model's classification accuracy and potential areas for improvement.
This implementation of EfficientNetB0 is designed to balance speed, size, and accuracy,
making it well suited for deployment in resource-constrained environments while maintaining
robust performance in image classification tasks.
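A hedged transfer-learning sketch of an EfficientNetB0-based four-class subtype classifier is given below; the head layers and hyperparameters are assumptions, since the report's surviving text describes only the evaluation.

import tensorflow as tf

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained features frozen initially

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(4, activation="softmax"),  # adeno/squamous/large cell/normal
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Evaluation with a confusion matrix of true vs predicted labels, e.g.:
# from sklearn.metrics import confusion_matrix
# cm = confusion_matrix(y_true, np.argmax(model.predict(x_test), axis=1))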
The learning algorithm finds patterns in the training data that map the input data attributes to
the target (the answer that you want to predict), and it outputs an ML model that captures these
patterns.
20% of the data in the dataset is used for testing; in the testing phase the model is applied to a
new set of data. The training and test data are two different datasets. The goal in building a
machine learning model is to have the model perform well on the training set and generalize
well to the new data in the test set. Once the built model has been tested, real-time data is
passed to it for prediction.
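A minimal sketch of this 80/20 split with scikit-learn (the arrays here are random placeholders standing in for the project's features and labels):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 8)        # placeholder feature vectors
y = np.random.randint(0, 2, 100)  # placeholder labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # hold out 20% for testing
print(len(X_train), len(X_test))  # 80 training samples, 20 test samples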
6.7 ACCURACY
Classification accuracy is what we usually mean when we use the term accuracy: it is the ratio
of the number of correct predictions to the total number of input samples,
Accuracy = (number of correct predictions) / (total number of input samples).
In this project, accuracy was measured on a held-out set of images, computed separately for
the first-stage (benign/malignant) model and the second-stage (subtype) model.
6.8 SOURCE CODE
# Reconstructed from the report's code fragments: the sidebar button definitions,
# the body of preprocess_image, the file uploader, and the prediction step were
# filled in (marked below) to make the listing runnable; the names chosen for
# these pieces are assumptions, not verified project code.
import streamlit as st
import tensorflow as tf
import numpy as np
from PIL import Image

# Constants (IMG_SIZE retained from the original listing; the AlexNet path below
# resizes to 227x227)
IMG_SIZE = (224, 224)
CLASSES = ['Benign', 'Malignant']

# Load model (cached so the file is read from disk only once)
@st.cache_resource
def load_my_model():
    model = tf.keras.models.load_model(
        "C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer/AlexNetModel.h5"
    )
    return model

def preprocess_image(img):
    # Ensure three colour channels in case the upload is grayscale or RGBA (reconstructed)
    img_rgb = img.convert("RGB")
    # Resize the image to match the input size expected by the model (227x227)
    img_resized = img_rgb.resize((227, 227))
    # Scale pixels to [0, 1] and add a batch dimension (reconstructed)
    img_expanded = np.expand_dims(np.array(img_resized) / 255.0, axis=0)
    return img_expanded

def main():
    st.title("Early Stage Lung Cancer Detection")

    # Sidebar navigation buttons (reconstructed; labels inferred from the branches below)
    overview_button = st.sidebar.button("Overview")
    risk_factors_button = st.sidebar.button("Risk Factors")
    symptoms_button = st.sidebar.button("Symptoms")
    screening_button = st.sidebar.button("Screening")

    if overview_button:
        st.write(
            """
            Early detection of lung cancer can significantly improve outcomes and increase
            the chances of successful treatment.
            This app allows you to upload chest X-ray images for early stage detection of
            lung cancer.
            """
        )
        st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer/formatting/6.jpg",
                 use_column_width=True)
    elif risk_factors_button:
        st.write(
            """
            There are several risk factors associated with the development of lung cancer.
            These include smoking, exposure to secondhand smoke, exposure to radon gas,
            exposure to asbestos and other carcinogens, family history of lung cancer, and a
            history of certain lung diseases. It is important to be aware of these risk
            factors and take steps to reduce your risk of developing lung cancer.
            """
        )
        st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer/formatting/7.jpg",
                 use_column_width=True)
    elif symptoms_button:
        st.write(
            """
            Lung cancer may not cause any symptoms in its early stages. However, as the
            cancer progresses, it can cause symptoms such as persistent cough, coughing up
            blood, chest pain, hoarseness, shortness of breath, wheezing, fatigue,
            unexplained weight loss, and recurrent respiratory infections. It is important
            to see a doctor if you experience any of these symptoms, as they may indicate
            lung cancer or another serious condition.
            """
        )
        st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer/formatting/8.jpg",
                 use_column_width=True)
    elif screening_button:
        st.write(
            """
            Screening tests are available for the early detection of lung cancer in
            individuals at high risk. These tests may include low-dose computed tomography
            (CT) scans, chest X-rays, and sputum cytology. Screening is recommended for
            individuals aged 55 to 80 years who have a history of heavy smoking and
            currently smoke or have quit within the past 15 years. If you are at high risk
            of lung cancer, talk to your doctor about whether lung cancer screening is
            right for you.
            """
        )
        st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer/formatting/2.jpg",
                 use_column_width=True)

    # Image upload and first-stage prediction (uploader and prediction step reconstructed)
    uploaded_file = st.file_uploader("Upload a chest X-ray image", type=["jpg", "jpeg", "png"])
    if uploaded_file is not None:
        # Preprocess image
        img = preprocess_image(Image.open(uploaded_file))
        # Load model
        model = load_my_model()
        prediction = model.predict(img)
        label = CLASSES[int(np.argmax(prediction))]
        st.write(f"Prediction: {label} (confidence {float(np.max(prediction)):.2f})")

        st.write(
            """
            2. About Adenocarcinoma
            Adenocarcinoma is the most common type of lung cancer, typically originating in
            the outer regions of the lungs. It grows relatively slowly compared to other
            lung cancers, allowing for potential early detection and treatment initiation.
            Originating from cells lining the air sacs of the lungs, adenocarcinoma may form
            glandular structures. Despite its slower growth rate, adenocarcinoma can
            metastasize, emphasizing the importance of timely treatment.
            """
        )
        st.write(
            """
            1. Causes of Occurrence of Squamous Cell Carcinoma
            Squamous cell carcinoma of the lung is primarily caused by long-term smoking and
            exposure to tobacco smoke. Other risk factors include exposure to environmental
            pollutants, such as asbestos, radon, and certain chemicals. Chronic inflammation
            of the lungs due to conditions like chronic bronchitis or tuberculosis can
            increase the risk of developing squamous cell carcinoma. Individuals with a
            history of human papillomavirus (HPV) infection may have a higher risk of
            developing squamous cell carcinoma in the lung.
            """
        )
        st.write(
            """
            Large cell carcinoma tends to grow and spread rapidly, presenting challenges in
            treatment and management. Treatment options for large cell carcinoma may include
            surgery, chemotherapy, radiation therapy, or a combination of these modalities.
            Prognosis varies accordingly.
            """
        )

# Folder of normal-class test images for the second-stage model
normal_folder = "C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer/secondstagedetection/secondstagedetection/test/normal/"

if __name__ == "__main__":
    main()
CHAPTER 7
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality
of components, subassemblies, assemblies, and/or the finished product. It is the process of
exercising software with the intent of ensuring that the software system meets its requirements
and user expectations and does not fail in an unacceptable manner. There are various types of
tests, and each type addresses a specific testing requirement.
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application, and it is done after the completion of an individual unit and before integration.
This is structural testing that relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at the component level and exercise a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly defined
inputs and expected results.
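As a hypothetical sketch (pytest style) of such a unit test for this project, the case below checks the image-preprocessing helper from Section 6.8; the module name app is an assumption.

import numpy as np
from PIL import Image
from app import preprocess_image  # assumed module name for the Streamlit script

def test_preprocess_image_shape_and_range():
    img = Image.new("RGB", (512, 512))                 # synthetic input image
    batch = preprocess_image(img)                      # unit under test
    assert batch.shape == (1, 227, 227, 3)             # expected model input shape
    assert 0.0 <= batch.min() and batch.max() <= 1.0   # pixels scaled to [0, 1]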
Integration tests are designed to test integrated software components to determine whether they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components were
individually satisfactory, as shown by successful unit testing, the combination of components
is correct and consistent. Integration testing is specifically aimed at exposing the problems that
arise from the combination of components.
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
White Box Testing is testing in which the software tester has knowledge of the inner workings,
structure, and language of the software, or at least its purpose. It is used to test areas that cannot
be reached from a black-box level.
Black Box Testing is testing the software without any knowledge of the inner workings,
structure, or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a black box:
you cannot “see” into it. The test provides inputs and responds to outputs without considering
how the software works.
Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two
distinct phases.
Field testing will be performed manually and functional tests will be written in detail.
Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects. The
task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level
– interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
CHAPTER 8
SNAPSHOTS
Snapshot 1:
Snapshot 2:
Snapshot 3:
Snapshot 4:
Snapshot 5:
Snapshot 6:
Moreover, the introduction discusses the historical trend of late-stage diagnoses, leading to
limited treatment options and poorer prognoses for patients. It also addresses the subjectivity
inherent in interpreting imaging results and histopathological analyses, which can introduce
variability into diagnoses.
Recognizing these limitations, the introduction advocates for the adoption of deep learning,
specifically CNNs, renowned for their capacity to discern complex patterns within extensive
datasets autonomously. By leveraging deep learning methodologies, the study aims to
transcend the constraints of traditional diagnostic methods and achieve more accurate and
timely detection of lung cancer.
The introduction not only identifies the challenges in lung cancer diagnosis but also proposes
a promising solution through the application of deep learning techniques. It sets forth the
research initiative's objective to revolutionize lung cancer detection, ultimately aiming to
enhance patient outcomes and advance the field of oncology.
REFERENCES
[1] K. Punithavathy, M. M. Ramya, and Sumathi Poobal, "Analysis of statistical texture features
for automatic lung cancer detection in PET/CT images," International Conference on Robotics,
Automation, Control and Embedded Systems (RACE), IEEE, 18-20 February 2015.
[2] Badrul Alam Mia and Mohammad Abu Yusuf, "Detection of lung cancer from CT image
using image processing and neural network," International Conference on Electrical
Engineering and Information Communication Technology (ICEEICT), IEEE, May 2015.
[3] Anita Chaudhary and Sonit Sukhraj Singh, "Lung cancer detection on CT images using
image processing," 2012 International Conference on Computing Sciences, IEEE, 2012.
[4] Nooshin Hadavi, Md Jan Nordin, and Ali Shojaeipour, "Lung cancer diagnosis using CT-
scan images based on cellular learning automata," International Conference on Computer and
Information Sciences (ICCOINS), IEEE, 2014.
[5] D. Moitra and R. Kr. Mandal, "Classification of non-small cell lung cancer using one-
dimensional convolutional neural network," Expert Systems with Applications, vol. 159, art.
113564, pp. 1-12, 2020.
[6] "Non-Small Cell Lung Cancer," NCCN Clinical Practice Guidelines in Oncology, version
6.2020, June 15, 2020.
[7] A. Asuntha and A. Srinivasan, "Deep learning for lung cancer detection and classification,"
Multimedia Tools and Applications, pp. 1-32, 2020.