0% found this document useful (0 votes)

67 views60 pages

Final Report

Uploaded by

charankarna3072002

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

67 views60 pages

Final Report

Uploaded by

charankarna3072002

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 60

Visvesvaraya Technological University

Jnana Sangama, Belagavi – 590018

A PROJECT REPORT ON

“PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL”

Submitted in partial fulfilment of the requirement for the award of a Degree of

BACHELOR OF ENGINEERING
In
Computer Science And Engineering & Artificial Intelligence Engineering
Submitted by

Ms. VIMALA K V (1VE20CA024)

Ms. DHRUTHI S (1VE20CA006)

Ms. AISHWARYA RAJU S (1VE20CS008)

Mr. CHARAN N (1VE20CS030)

Under the Guidance of

Dr . SUMA T
Professor , Dept of CSE
SVCE, Bengaluru.

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

& ARTIFICAL INTELLIGENCE
SRI VENKATESHWARA COLLEGE OF ENGINEERING BANGALORE - 562 157.
2023-2024
SRI VENKATESHWARA COLLEGE OF ENGINEERING
Vidyanagar, Bengaluru-562157
Department of CSE and CSE-AI

CERTIFICATE

This is to certify that the project entitled “PREDICTION OF LUNG CANCER USING DEEP LEARNING
MODEL” carried out by Ms. VIMALA K V(1VE20CA024), Ms. DHRUTHI S(1VE20CA006), Ms.
AISHWARYA RAJU S (1VE20CS008), Mr. CHARAN N (1VE20CA030), a bonafide student of Sri
Venkateshwara College of Engineering, in partial fulfilment for the award of Bachelor of Engineering in
Computer Science and Engineering and Artificial Intelligence of Visvesvaraya Technological University,
Belgaum during the academic year 2023-2024. It is certified that all corrections/suggestions indicated for Internal
Assessment have been incorporated in the Report deposited in the departmental library.

……………………… ……………………. ….…………………. ………………………..

Signature of Guide Signature of the HOD Signature of the HOD Signature of the Principal
Dr .Suma T Dr. Prathima V R Dr. Hema M S Dr. Nageswara Guptha M
prof., Dept. of CSE Prof. & Head, Dept. of CSE-AI Prof. & Head, Dept of Principal,
SVCE, Bengaluru. SVCE, Bengaluru. CSE, SVCE, Bengaluru. SVCE, Bengaluru.

External viva-voice

Name of the Examiners Signature with Date

1……………………………………... ……………………………….

2…………………………….......... ………………………………..
DECLARATION

We Ms. VIMALA K V (1VE20CA024), Ms. DHRUTHI S (1VE20CA006), Ms. AISHWARYA

RAJU S (1VE20CS008), Mr. CHARAN N(1VE20CA0030) students of final semester B.E in
Department of Computer Science & Engineering and Artificial Intelligence Engineering, Sri
Venkateshwara College of Engineering. Bengaluru, hereby declare that the Project report entitled,
entitled “PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL” has been
carried out under supervision Dr . SUMA T, Professor, Department of CSE, SVCE, Bengaluru. The
partial fulfilment of the requirement for the award of Degree of Bachelor of Engineering in Computer
Science and Engineering & Artificial Intelligence Engineering by Visvesvaraya Technological University,
Belagavi during the academic year 2023-24. Further, the matter embodied in the dissertation has not
been submitted previously by anybody for the award of any degree or diploma to any other
institute/University.

Date: Place: Bengaluru

Ms. VIMAKA K V 1VE20CA024

Ms. DHRUTHI S 1VE20CA006

Ms. AISHWARYA RAJU S 1VE20CS008

Mr. CHARAN N 1VE20CS030
ACKNOWLEDGEMENT

The euphoria and complacency of completing this technologically advanced project will not
be complete until we thank all the people who have helped us in completing this enthusiastic
work. Submission of this project marks a milestone in our academic career.

It is our privilege to express heartfelt gratitude to the management of SVCE and our beloved
Principal Dr. Nageswara Guptha M, Sri Venkateshwara College of Engineering Bengaluru,
for providing necessary support and encouragement.

We are thankful to Dr. Hema M S, Head, Department of Computer Science & Engineering
and Dr. Prathima V R, Head, Department of Computer Science Engineering-Artificial
Intelligence for Overall guidance, and co-operation.

I would like to express our sincere thanks to the project coordinator Dr. POORNIMA G R,
Professor and Dean Academics, SVCE Bengaluru, for guidance and support in bringing this
project to completion.

I would also like to express our sincere thanks to the Internal guide, Dr. SUMA T, Professor,
Department of Computer Science and Engineering, SVCE, Bengaluru.

Finally, we would like to express our heart full thanks to parents and friends for their invaluable
help, constant support and motivation for helping us to complete our project work successfully.

Ms. VIMALA K V(1VE20CA024)

Ms. DHRUTHI S(1VE20CA006)
Ms. AISHWARYA RAJU S (1VE20CS008)
Mr. CHARAN N(1VE20CS030)
INSTITUTE VISION
To be a premier institute for addressing the challenges in global perspective.

INSTITUTE MISSION
M-1: Nurture students with professional and ethical outlook to identify needs, analyze, design
and innovate sustainable solutions through lifelong learning in service of society as individual
or a team.

M-2: Establish State of the Art Laboratories and Information Resource center for education
and Research.

M-3: Collaborate with Industry, Government Organization and Society to align the curriculum
and outreach activities.
PROGRAM OUTCOMES
Engineering Graduates will be able to:

PO1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering

fundamentals, and an engineering specialization to the solution of complex engineering
problems.

PO2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.

PO3. Design/development of solutions: Design solutions for complex engineering problems

and design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.

PO4. Conduct investigations of complex problems: Use research-based knowledge and

research methods including design of experiments, analysis and interpretation of data, and
synthesis of the information to provide valid conclusions.

PO5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities with an understanding of the limitations.

PO6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.

PO7. Environment and sustainability: Understand the impact of the professional

engineering solutions in societal and environmental contexts, and demonstrate the knowledge
of, and need for sustainable development.
PO8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.

PO9. Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.

PO10. Communication: Communicate effectively on complex engineering activities with the

engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.

PO11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.

PO12. Life-long learning: Recognize the need for and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.
ABSTRACT

Lung cancer, responsible for over 1.8 million deaths annually, remains a critical global health
issue. Early detection significantly improves patient outcomes and reduces mortality rates, yet
current screening methods, such as chest X-rays and CT scans, have notable limitations. Chest
X-rays often fail to detect small nodules or early-stage cancers, leading to delayed diagnoses.
While CT scans, particularly low-dose CT (LDCT), are more sensitive, they are costly, expose
patients to radiation, and frequently yield false positives, resulting in unnecessary biopsies and
patient anxiety. Computer-aided detection (CAD) systems attempt to mitigate these issues by
assisting radiologists with pattern recognition and image processing, but they still produce a
high number of false positives, have limited sensitivity, and rely on radiologists' interpretations,
making the process time-consuming and prone to human error.

To address these challenges, we propose a two-stage deep learning model designed to enhance
the early detection and classification of lung cancer. In the first stage, our model uses
convolutional neural networks (CNNs) to analyze CT scan data, distinguishing between benign
and malignant lung nodules. CNNs are particularly effective in identifying spatial hierarchies
in images, crucial for accurately detecting lung nodules. The model outputs the classification
along with a confidence level, indicating the certainty of the prediction. This stage aims to
reduce the false negative rate, ensuring that more malignant cases are identified early on, while
also minimizing false positives to reduce unnecessary follow-ups.

In the second stage, the model further analyzes the identified malignant nodules, classifying
them into one of four specific types: adenocarcinoma, squamous cell carcinoma, large cell
carcinoma, normal . This detailed classification is achieved through additional layers of CNNs,
which refine the features extracted from the malignant nodules. The output includes the specific
type of malignant nodule and a confidence level for each classification, providing a
comprehensive assessment that aids in precise diagnosis and treatment planning. This two-
stage approach is designed for integration into clinical workflows, offering real-time, detailed
assessments to assist radiologists. By enhancing the accuracy and speed of lung cancer
detection and providing detailed classifications, our model aims to facilitate timely
interventions and improve patient outcomes, ultimately contributing to a reduction in lung
cancer mortality rates.
TABLE OF CONTENTS

Chapter No. Chapter Name Page No

1. Introduction 1-8
1.1 Objectives 2
1.2 ML Prediction Techniques 2-3
1.3 Python Introduction 4
1.3.1 Python Libraries 5-8

2 Literature Survey 9-12

2.1 Objectives 10
2.1.1 Paper 1 10
2.1.2 Paper 2 11
2.1.3 Paper 3 11-12
2.1.4 Paper 4 12

3 System Requirements 13-16

3.1 Hardware Requirements 13
3.1.1 CPU-Intel Core I5 13-14
3.1.2 Ram 14
3.1.3 Hard Disk 14-15
3.2 Software Requirements 15-16
4 System Analysis 17-21
4.1 Introduction 17
4.2 Problem Statement 17
4.3 Existing System 17-18
4.3.1 Drawbacks Of Existing 18
System
4.4 Proposed System 19
4.4.1 Advantages Of Proposed 19-20
System
4.5 Feasibility Study 19-21

5 System Design 22-25

5.1 System Architecture 22
5.2 Data Flow Diagram 22-25

6 Implementation 26-38
6.1 Data Source 26
6.2 Dataset Collection 26
6.3 Data Visualization 27
6.4 Data Pre-Processing 27
6.5 Machine Learning Algorithms 27-29
6.5.1 Alexa Net 27-28
6.5.2 Efficientnetb0 28-29
6.7 Accuracy 30
6.8 Source Code 30-38

7 System Testing 39-41

7.1 Types Of Tests 39
7.1.1 Unit Testing 39
7.1.2 Integration Testing 40
7.1.3 Functional Test 40
7.1.4 System Test Design 40
7.1.5 White Box Testing 40
7.1.6 Black Box Testing 40-41
7.1.7 Unti Testing 41
7.1.7.1 Test Strategy & Approach 41
7.1.7.2 Test Objectives 41
7.1.7.3 Features To Be Tested 41
7.1.8 Integration Testing 41
Conclusion & Future
Enhancement
References
LIST OF FIGURES

Figure No. Figure Name Page No.

3.1 Intel Core 13
3.2 Ram 8 Gb 14
3.3 Hard Disk 15
5.2.1 Use Case Diagram 24
5.2.2 Sequence Diagram 25
6.2.1 Dataset 26
6.6.1 Train/Test Split 29
8.1 Home Page 42
8.2 Overview Of Lung Cancer 42
8.3 First Stage Detection(Malignant) 43
8.4 Second Stage Detection(Malignant) 43
8.5 First Stage Detection(Bengin) 44
8.6 Accuracy Of The Model 44
PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 1
INTRODUCTION

Deep learning models have revolutionized lung cancer detection, significantly improving early
diagnosis and treatment outcomes. Historically, lung cancer, a leading cause of cancer-related
deaths worldwide, posed significant challenges due to late-stage diagnosis. However, with the
advent of deep learning in medical imaging, a transformative shift has occurred. Deep
learning's journey in lung cancer detection began with the pioneering work of researchers who
developed convolutional neural networks (CNNs) for image analysis. These models, inspired
by the human visual system, have shown remarkable capabilities in recognizing patterns and
features indicative of lung cancer on radiological images.

Early studies in the 2010s laid the groundwork for deep learning's application in lung cancer
detection, demonstrating its potential in accurately identifying malignant nodules from chest
X-rays and CT scans. Over the years, with the accumulation of large-scale medical image
datasets and advancements in computational power, deep learning models have become
increasingly sophisticated. The introduction of architectures like Efficientnet and Alexanet
further improved model performance, enabling finer segmentation of tumors and more precise
localization of abnormalities.

The adoption of deep learning in lung cancer diagnosis accelerated in the mid-2010s, with
researchers and healthcare institutions worldwide leveraging these models to enhance
radiologists' diagnostic capabilities. By providing automated analysis and assisting in the
interpretation of medical images, deep learning has significantly reduced the time and resources
required for diagnosis while improving accuracy and consistency.

Today, deep learning continues to play a pivotal role in lung cancer management, with models
capable of not only detecting tumours but also predicting patient prognosis, treatment response,
and disease recurrence. As deep learning algorithms become more sophisticated and data
availability increases, the future holds promise for even more accurate, efficient, and
personalized approaches to lung cancer detection and treatment.

Dept of CSE & CSE-AI , SVCE 2023-24 1

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

1.1 OBJECTIVES

Early Detection: Our deep learning model utilizes advanced image analysis techniques to
detect subtle signs of lung cancer in medical scans, enabling early diagnosis and timely
intervention, crucial for improving patient outcomes.

High Accuracy: With state-of-the-art convolutional neural networks (CNNs), our model
achieves exceptional accuracy in identifying malignant lung nodules, minimizing false
positives and negatives, ensuring reliable diagnosis.

Speed and Efficiency: Leveraging parallel processing and optimized algorithms, our model
rapidly analyzes medical images, providing quick and efficient results, crucial for accelerating
the diagnostic process and facilitating prompt treatment decisions.

Reduction of Human Error: By automating the interpretation of medical scans, our model
reduces reliance on human interpretation, minimizing the risk of diagnostic errors and ensuring
consistency in lung cancer detection.

User-Friendly Interface: Our model features an intuitive interface, allowing healthcare

professionals to easily input medical images and obtain diagnostic results, enhancing workflow
efficiency and facilitating seamless integration into clinical practice.

Reduced Healthcare Costs: By enabling early detection, reducing diagnostic errors, and
streamlining the diagnostic process, our model contributes to significant cost savings in
healthcare, making lung cancer screening and diagnosis more accessible and affordable.

1.2 MACHINE LEARNING PREDICTION TECHNIQUES

Machine learning is an application of artificial intelligence (AI) which provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that can
access the data and that data can be used to learn themselves. The process of learning starts
with observations or data, such as examples, direct experience, or instruction, in order to look
for patterns in data and makes better decision in the future based on the examples that we
provide. The primary aim is to allow the computers learn automatically without human

Dept of CSE & CSE-AI , SVCE 2023-24 2

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

intervention of assistance and adjust actions accordingly. Machine learning algorithms are
often categorized as supervised or unsupervised.

Supervised machine learning algorithms can be applied to what has been learned in the past
to new data using labelled examples to predict future events. Starting from the analysis of
known training dataset, the learning algorithm produces an inferred function to make
predictions about the output values. The system will be able to provide targets for any new
input after sufficient training. The learning algorithm can also be compared with its output for
intended output and find errors in order to modify the model accordingly.

Unsupervised machine learning algorithms are used when the information which is used for
training is neither classified nor labelled. Unsupervised machine learning studies about how
systems can infer a function to describe a hidden structure from unlabelled data. The system
does not figure out the right output, but it can explore the data and draw the inferences from
datasets to describe hidden structures from unlabelled data

Semi-supervised machine learning algorithms fall between supervised and unsupervised

learning, since they use both labelled and unlabelled data for training - typically a small amount
of labelled data and a large amount of unlabelled data. The systems which uses this method are
able to considerably improve Seaming accuracy. Usually, semi-supervised learning is chosen
when the acquired labelled data requires skilled and relevant resources in order to train and
learn from it. Otherwise, acquiring unlabelled data generally doesn't require additional
resources.

Reinforcement machine learning algorithms is a learning method that interacts with its
environment by producing actions and discovers errors or rewards. Trial and error search and
delaying in rewards are the most relevant characteristics of reinforcement learning. This
method allows machines and software agents to automatically determine the ideal behavior
within a specific contest in order to maximize its performance. Simple reward feedback is
required for the agent to learn which is best action, this is known as the reinforcement signal.
Machine learning can be implemented to analyze massive quantities of data.

While it generally delivers faster, more accurate results in order to identify profitable
opportunities or dangerous risks, it may also require additional time and the resources to train
it properly. Combining machine learning with Al and cognitive technologies can make it more
effective in processing large volumes of information.

Dept of CSE & CSE-AI , SVCE 2023-24 3

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

1.3 PYTHON INTRODUCTION

Python is an easy to learn, powerful programming language. It has efficient high-level data
structures and simple but effective approach to object-oriented programming. Python's elegant
syntax and dynamic typing, together with its interpreted nature, make it an ideal language for
scripting and rapid application development in many areas on most platforms. The Python
interpreter language which is easily extended with new functions and data types implemented
in C or C++ (or other languages callable from C). Python is suitable as an extension language
for customizable applications. Python is a high-level, interpreted, interactive and object-
oriented scripting language. Python is designed to be highly readable. Where as other languages
use punctuation, python uses English keywords frequently and it has fewer syntactical
constructions than other languages.

Features of python are

• Python is programmers friendly language

• Python is a high-level language.
• Python code is interpreted by interpreter line by line.
• Python is platform independent programming language, its code can easily run on
any platform such as Windows, Linux, UNIX, Macintosh etc. Thus, Python is a
portable language.
• Python is Object-Oriented programming language. It follows object and class
concept.
• Python language is more expressive. The sense of expressive is the code can be
easily understood.
• It is open source so it can be freely downloaded and used.

Dept of CSE & CSE-AI , SVCE 2023-24 4

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

1.3.1 PYTHON LIBRARIES

NumPy

NumPy, is short form for Numerical Python, which is the foundational package for scientific
computing in Python. The majority of the book will be based on NumPy and libraries which
are built on top of NumPy. It provides, among other things

• A fast and efficient multidimensional array object and array

• Functions which are used for performing element-wise computations with arrays or
mathematical operations between arrays
• Tools are used for reading and writing array-based data sets to disk Linear algebraic
operations, Fourier transform, and random number generation
• Tools for integrating connecting C. CH, and Fortran code to Python Beyond the fast
array-processing capabilities that NumPy adds to Python, one of its primary
purposes with regards to data analysis is the primary container for data to be passed
between algorithms. For numerical data, NumPy arrays are a much more efficient
way of storing and manipulating data than the other built-in Python data structures.
Also, libraries written in a lower-level language, such as C or FORTRAN, can
operate on the data stored in a NumPy array without copying any data.

Pandas

Pandas is designed to provide rich data structures and functions to make working with
structured data fast, easy, and expressive. It is, as you will see, one of the critical ingredients
enabling Python to be a powerful and productive data analysis environment. The primary object
of pandas are, that will be used in Data Frame, a two dimensional tabular, column oriented data
structure with both row and column label pandas combine the high performance array
computing features of NumPy with the flexible data manipulation capabilities of spreadsheets
and relational databases (such as SQL). It provides sophisticated indexing functionality to make
it easy to reshape, slice and dice, perform aggregations, and select subsets of data pandas is the
primary tool. For financial users, pandas feature rich, high-performance time series
functionality and tools well-suited for working with financial data. The pandas name itself is
derived from panel data, an econometrics term for multidimensional structured datasets, and
Python data analysis itself.

Dept of CSE & CSE-AI , SVCE 2023-24 5

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Sklearn

SkLearn is a library in Python that provides many unsupervised and supervised learning
algorithms. It's built upon some of the technology you might already be familiar with, like
NumPy, pandas, and Matplotlib! The functionality that scikit-learn provides include:

• Regression, including Linear and Logistic Regression.

• Classification, including K-Nearest Neighbors.
• Clustering, including K-Means and K-Means++ Model selection.
• Preprocessing, including Min-Max Normalization.

SciPy

SciPy is a collection of packages addressing a number of different standard problem domains

in scientific computing. Here is a sampling of the packages included:

• scipy.integrate: numerical integration routines and differential equation solvers

• scipy.linalg: linear algebra routines and matrix decompositions extending beyond
those provided innumpy.linalg.
• scipy.optimize: function optimizers (minimizes) and root finding algorithms. scipy
signal: signal processing tools
• scipy.special: wrapper around SPECFUN, a Fortran library common mathematical
functions, such as the gamma function
• scipy.stats: standard continuous and discrete probability distributions (density
functions, samplers.continuous distribution functions), various statistical tests, and
more descriptive statistics
• scipy.weave: tool for using inline C++ code to accelerate array computations.
Together NumPy and SciPy form a reasonably complete computational replacement
for much of MATLAB along with some of its add-on toolboxes.

Matplotlib

Matplotlib is a python library used to create 2D graphs and plots by using python scripts. It has
a module named pyplot which makes things easy for plotting by providing feature to control
line styles, font properties, formatting axes etc. It supports a wide variety of graphs and plots

Dept of CSE & CSE-AI , SVCE 2023-24 6

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

namely histogram, bar charts, power spectra, error charts etc. It is used along with NumPy to
provide an environment that is an effective open-source alternative for Matlab. Matplotlib can
also be used with graphics toolkits like PyQt and wxPython.

Tensorflow

TensorFlow, an open-source machine learning library developed by Google Brain, serves as a

cornerstone in building deep learning models for lung cancer prediction. With its robust
ecosystem and powerful tools, TensorFlow simplifies the process of developing, training, and
deploying deep learning models.

TensorFlow provides a flexible framework for constructing various types of neural networks,
including convolutional neural networks (CNNs) commonly used for medical image analysis.
These CNNs excel at learning hierarchical features from medical images, crucial for accurately
detecting lung cancer nodules.

The library offers extensive support for GPU acceleration, allowing models to train faster and
handle large-scale medical image datasets efficiently. This speed and scalability are essential
for processing the vast amounts of data required for accurate lung cancer prediction.

Streamlit

Streamlit is an open-source Python library designed to create powerful and interactive web
applications for data science and machine learning projects. It allows you to build intuitive and
user-friendly web interfaces directly from Python scripts, enabling rapid prototyping and
deployment. Streamlit provides a straightforward and intuitive API, allowing you to create web
apps using familiar Python syntax. You can easily add widgets like sliders, buttons, text inputs,
and plots to your app.With Streamlit, you can quickly turn your Python scripts into web apps.
There's no need to write HTML, CSS, or JavaScript code. You can focus on your data analysis
or machine learning model, and Streamlit handles the rest.Streamlit widgets enable users to
interact with your data or models dynamically. Users can upload files, adjust parameters, and
see real-time updates of the results.Streamlit seamlessly integrates with popular data
visualization libraries like Matplotlib, Plotly, and Altair, allowing you to create interactive
charts and plots directly in your web app.

Dept of CSE & CSE-AI , SVCE 2023-24 7

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

IS PYTHON THE TOOL FOR MACHINE LEARNING?

When it comes to data science, machine learning is one of the most significant elements used
to maximize value from data. With Python as the data science tool, exploring the basics of
machine learning becomes easy and effective. In a nutshell, machine learning is more about
statistics, mathematical optimization, and probability. It has become the most preferred
machine learning tool in the way it allows aspirants to do mathematics easily. Name any math
function, and you have a Python package meeting the requirement. There is NumPy for
numerical linear algebra, Pandas for data manipulation, matplotlib for embedding plots into
applications, seaborn for data visualization, and scikit for providing features like classification,
regression and clustering algorithms. With the grip on the basics of machine learning algorithm
including logistic regression and linear regression, it makes easy to implement machine
learning systems for predictions by the way of its scikit-learn library. It is easy to customize
for neutral networks and deep learning with libraries including Keras, Theano and TensorFlow.
Data science landscape is changing rapidly, and tools used for extracting value from data
science have also grown largely. The two popular languages that fight for the top spot are R
and Python. Both are revered by enthusiasts, and both come with their strengths and
weaknesses. But with the tech giants like Google showing the way to use Python and with the
learning curve made short and easy, python inches ahead to become the most popular language
in the data science world.

Dept of CSE & CSE-AI , SVCE 2023-24 8

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 2
LITERATURE SURVEY

A literature survey or a literature review in a project report shows the various analyses and
research made in the field of interest and the results already published, taking into account the
various parameters of the project and the extent of the project. Literature survey is mainly
carried out in order to analyze the background of the current project which helps to find out
flaws in the existing system & guides on which unsolved problems we can work out. So, the
following topics not only illustrate the background of the project but also uncover the problems
and flaws which motivated to propose solutions and work on this project. A literature survey is
a text of a scholarly paper, which includes the current knowledge including substantive
findings, as well as theoretical and methodological contributions to a particular topic. Literature
reviews use secondary sources, and do not report new or original experimental work. Most
often associated with academic-oriented literature, such as a thesis, dissertation or a peer-
reviewed journal article, a literature review usually precedes the methodology and results
sectional though this is not always the case. Literature reviews are also common in are search
proposal or prospectus (the document that is approved before a student formally begins a
dissertation or thesis). Its main goals are to situate the current study within the body of literature
and to provide context for the particular reader. Literature reviews are a basis for researching
nearly every academic field.

A literature survey includes the following:

• Existing theories about the topic which are accepted universally.

• Books written on the topic, both generic and specific.
• Research done in the field usually in the order of oldest to latest.
• Challenges being faced and on-going work, if available.

Literature survey describes about the existing work on the given project. It deals with the
problem associated with the existing system and also gives user a clear knowledge on how to
deal with the existing problems and how to provide solution to the existing problems.

Dept of CSE & CSE-AI , SVCE 2023-24 9

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

2.1 OBJECTIVES OF LITERATURE SURVEY

• Learning the definitions of the concepts.
• Access to latest approaches, methods and theories.
• Discovering research topics based on the existing research
• Concentrate on your own field of expertise– Even if another field uses the same words,
they usually mean completely.
• It improves the quality of the literature survey to exclude sidetracks– Remember to
explicate what is excluded. Before building our application, the following system is
taken into consideration:

2.1.1. Convolutional Neural Network based Framework for Automatic Lung

Cancer Detection from Lung CT Images 2022

Lung cancer is one of the major causes of death across the globe. Medical interventions with
modern healthcare facilities are widely used to cure lung cancer. However, it is indispensable
to have research on early detection of lung cancer as it has potential to save lives of people.
With innovations in machine learning Computer Aided Design (CAD) systems for automatic
detection of lung cancer has become an important solution. Particularly deep learning models
such as Convolutional Neural Network (CNN) is found to have necessary mechanisms to learn
features from Computed Tomography (CT) scan images and detect the probability of lung
cancer. In this paper, we propose a CNN based model for automatic detection of lung cancer
provided lung CT scan image. We proposed an algorithm known as CNN based Automatic
Lung Cancer Detection (CNN-ALCD) which is based on supervised learning phenomenon.
The learned model is capable of detecting lung cancer from any newly arrived test sample. The
proposed solution has different mechanisms such as pre-processing, building CNN with
different layers, training the CNN model and performing lung cancer detection. Empirical
study revealed that the proposed CNN based model outperforms many existing neural network
based methods with highest accuracy 94.11%. Therefore, the proposed system can be integrated
with a Clinical.Decision Support System (CDSS) in healthcare units for automatic diagnosis
of lung cancer

Dept of CSE & CSE-AI , SVCE 2023-24 10

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

2.1.2 Lung Cancer Detection and Classification using CT Scan Image

Processing 2021

Lung cancer is the stereotypical cancer after breast cancer in this era. The survival rate in this
cancer is less than other cancers as well. Lung cancer screening can help find cancer at an early
stage. If the disease is found and treated at an early stage, the chances of recovery are more.
Computed Tomography (CT) is the most preferred and effective way of lung cancer screening.
However, visual interpretation of CT scan images is quite difficult, time con suming and may
lead to wrong interpretation of the malignancy. Therefore, computer aided techniques are
required for proper and accurate detection of the lung diseases. There are several techniques
available in the literature. In this paper, we propose a novel approach of lung cancer detection
and classification by image processing of the CT scan. We apply different preprocessing
techniques for smoothing and image enhancement. Then we apply thresholding and edge
detection for segmentation of the region of interest (ROI) of the lung tumor. Finally, we
compute several geometrical features of the extracted ROI and classify them into severity levels
as Benign and Malignant using support vector machine (SVM) classifier. We find significant
accuracy in detection of lung cancer nodules and estimation of the severity level using our
proposed method.

2.1.3 A Lung Cancer Detection and Recognition Method Combining

Convolutional Neural Network and Morphological Features 2022

Lung cancer is the malignant tumor with the highest morbidity and mortality, and it is a great
threat to human health. With the increasing refinement of lung cancer images, it provides a lot
of useful information for the analysis and identification of lung cancer, and an important help
to assist doctors in making accurate diagnosis. A considerable part of lung cancer manifests as
nodules in the early stage. Pulmonary nodules are round or irregular lesions in the lungs, about
34% are lung cancers, and the rest are benign lesions. Therefore, the detection of pulmonary
nodules is very important for the detection of early lung cancer. In this paper, some Computed
Tomography (CT) images of the Lung Image Database Consortium (LIDC) dataset are adopted
as training and testing data, data preprocessing is completed by intercepting pixels,
normalization and other methods, data enhancement is realized such as rotation and scaling
methods, and the pulmonary nodule sample library is expanded. Utilizing the constructed lung
nodule sample library, train the Convolutional Neural Network (CNN) model, complete the

Dept of CSE & CSE-AI , SVCE 2023-24 11

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

detection and segmentation of pulmonary nodules, and exact the regions of pulmonary nodules.
The size and regularity features of pulmonary nodules are extracted, and lung cancer
recognition is realized according to the size and shape of pulmonary nodules. The experiment
results show the lung cancer detection and identification method based on convolutional neural
network with morphological features has higher accuracy

2.1.4 Analysis of statistical texture features for automatic lung cancer

detection in PET/CT images 2015

This paper addresses the imperative need for early lung cancer detection by developing an
automated methodology using PET/CT images. Utilizing Contrast Limited Adaptive
Histogram Equalization (CLAHE) and Wiener filtering for image pre processing, the lung
regions of interest were extracted employing morphological operators. Haralick statistical
texture features were extracted to enhance cancer region delineation. Fuzzy C means (FCM)
clustering was employed for classification into normal and abnormal regions. The proposed
methodology, implemented using MATLAB, demonstrated a robust performance with an
overall accuracy of 92.67% in classifying and detecting lung cancer from PET/CT images, as
evaluated using Receiver Operating Characteristics (ROC) curve analysis.

Dept of CSE & CSE-AI , SVCE 2023-24 12

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 3

SYSTEM REQUIREMENTS
System Requirement Specification (SRS) is a central report, which frames the establishment
of the product advancement process. It records the necessities of a framework as well as has
the depiction of its significant highlight. An SRS is essentially an association's seeing (in
composing) for a client or potential customer's frame work necessities and its conditions at a
specific point in time (generally) before any genuine configuration or improvement work. It's
a two-way protection approach that guarantees that both the customer and the association
comprehend alternate's necessities from that viewpoint at a given point in time.

The SRS talks about the item however not the venture that created it, consequently the SRS
serves as a premise for later improvement of the completed item. The SRS may need to be
changed, however it gives an establishment to proceed with creation assessment. In
straightforward words, programming necessity determination is the beginning stage of the
product improvement action. The SRS means deciphering the thoughts in the brains of the
customers – the information, into a formal archive – the yield of the prerequisite stage.
Subsequently the yield of the stage is a situated of formally determined necessities, which
ideally are finished and steady, while the data has none of these properties.

3.1 HARDWARE REQUIREMENTS

• Processor Type: Intel Core TM – i5 and above
• Speed : 2.4 GHZ
• RAM : :4 GB RAM
• Hard disk : 80 GB HDD

3.1.1 CPU- INTEL CORE i5

Fig 3.1 INTEL CORE

Dept of CSE & CSE-AI , SVCE 2023-24 13

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Intel Core is a brand name that Intel uses for various mid-range to high-end consumer and
business microprocessors. As of 2015 the current lineup of Core processors included the Intel
Core i7, Intel Core i5, and Intel Core i3. 5th generation Intel® Core™ i5 processors empower
new innovations like Intel® Real Sense™ technology— bringing you features such as gesture
control, 3D capture and edit, and innovative photo and video capabilities to your devices. Enjoy
stunning visuals, built-in security, and an automatic burst of speed when you need it with Intel®
Turbo Boost Technology 2.0.

3.1.2 RAM

Fig 3.2 RAM 8 GB

When you load up an application on to your computer it loads into your available RAM
memory. It is very quick type of memory. The more programs you load up, the more RAM is
taken up. At the point where you have loaded up enough apps to take up all your free available
physical RAM, your OS will create a swap-file on your hard drive. This file is used as a reserve
for all additional apps you run. The trouble with that is that hard drives are a lot slower to read
and write from than RAM memory is. Therefore, your computer will perform much slower at
that point. Although new generation of SSD hard drives are much faster than your traditional
spinning drive, it is still best to have enough RAM available. If you are using Windows and
want to want to know how much RAM you are using up, you can right click on task bar, then
select start "Task Manager" and on the "performance" tab you will see a green bar indicating
"Memory".

Dept of CSE & CSE-AI , SVCE 2023-24 14

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

3.1.3 HARD DISK

Fig 3.3 Hard Disk Drive

A hard disk drive (HDD), hard disk, hard drive or fixed disk is a data storage device used for
storing and retrieving digital information using one or more rigid ("hard") rapidly rotating disks
(platters) coated with magnetic material. The platters are paired with magnetic heads arranged
on a moving actuator arm, which read and write data to the platter surfaces. Data is accessed
in a random-access manner, meaning that individual blocks of data can be stored or retrieved
in any order rather than sequentially. An HDD retains its data even when powered off.

3.2 SOFTWARE REQUIREMENTS

• Operating System : Windows 64-bit
• Technology : Python
• IDE : Python IDE
• Tools: Anaconda
• Python Version: Python 3.9

Dept of CSE & CSE-AI , SVCE 2023-24 15

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Functional Requirements
This section describes the functional requirements of the system for those requirements which
are expressed in the natural language style.

1. Create a desktop application.

2. Select the dataset as input to the system.
3. System will preprocess and extract the features.
4. System will apply machine learning algorithms and classify the Diabetes disease.
5. System will show the performance comparison of algorithms.
6. Application should efficiently detect the Diabetes disease.

Non Functional Requirements

These are requirements that are not functional in nature, that is, these are constraints within
which the system must work.

• The program must be self-contained so that it can easily be moved from one Computer to
another. It is assumed that network connection will be available on the computer on which the
program resides.

• Capacity, scalability and availability.

➢ The system shall achieve 100 per cent availability at all times.
➢ The system shall be scalable to support additional clients and volunteers.

• Maintainability.

➢ The system should be optimized for supportability, or ease of maintenance as far as

possible. This may be achieved through the use documentation of coding standards,
naming conventions, class libraries and abstraction.

• Randomness, verifiability and load balancing.

➢ The system should be optimized for supportability, or ease of maintenance as far as

possible. This may be achieved through the use documentation of coding
standards, naming conventions, class libraries and abstraction. It should have
randomness to check the nodes and should be load balanced.

Dept of CSE & CSE-AI , SVCE 2023-24 16

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 4
SYSTEM ANALYSIS

4.1 INTRODUCTION TO SYSTEM ANALYSIS

A system is an orderly group of interdependent components linked together according to a plan
to achieve specific objective. Its main characteristics are organization, interaction,
interdependence, integration and a central objective

Analysis is a detailed study of the various operations performed by a system and their
relationships within and outside of the system. One aspect of analysis is defining the boundaries
of the system and determining whether or not a candidate system should consider other related
systems.

During analysis data are collected on the available files decision points and transactions
handled by the present system. This solves gathering information and using structured tools for
analysis System analysis and design are the application of the system approach to problem
solving generally using computers. To reconstruct a system the analyst ma consider its elements
output and inputs processors, controls, feedback and environment

4.2 PROBLEM STATEMENT

Lung cancer is a significant global health concern, responsible for over 1.8 million deaths
annually. Despite its high mortality rate, current screening methods for lung cancer lack
accuracy and efficiency. Early detection is crucial for improving patient outcomes and reducing
mortality rates. The aim of this project is to develop a deep learning model for the early
prediction of lung cancer. By analyzing medical imaging data, such as X-rays, our model seeks
to accurately identify early signs of lung cancer nodules. The ultimate goal is to provide a
reliable and efficient tool for early diagnosis, facilitating timely intervention and potentially
saving lives.

4.3 EXISTING SYSTEM

Lung cancer detection has traditionally relied on chest X-rays and computed tomography (CT)
scans. Chest X-rays, though widely used, have low sensitivity and often miss small lung

Dept of CSE & CSE-AI , SVCE 2023-24 17

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

nodules or early-stage cancers, leading to late diagnoses. CT scans, particularly low-dose CT

(LDCT), are more sensitive and can detect smaller nodules but are expensive, expose patients
to radiation, and have a high rate of false positives, resulting in unnecessary biopsies and patient
anxiety. Computer-aided detection (CAD) systems were developed to assist radiologists by
highlighting suspicious areas in imaging studies, using pattern recognition and image
processing techniques. However, CAD systems often produce a high number of false positives,
have limited sensitivity, and still require radiologists to interpret and validate the findings,
which can be time-consuming and prone to human error.

Recent advancements in deep learning have led to the development of more sophisticated
models for medical image analysis, including lung cancer detection. These models, particularly
those based on convolutional neural networks (CNNs), can analyze imaging data more
effectively, offering higher sensitivity and specificity compared to traditional CAD systems,
thus reducing false positives and false negatives. They can autonomously analyze large
volumes of imaging data, potentially easing the workload on radiologists and enabling faster
diagnoses. Deep learning models can automatically learn and extract relevant features from
imaging data, improving the detection of subtle and early-stage lung cancers. However, these
models require large, annotated datasets for training, significant computational power, and
specialized hardware, which may not be available in all clinical settings. Additionally, models
trained on specific datasets may not generalize well to data from different populations or
imaging equipment, necessitating further validation and adaptation. Despite these challenges,
deep learning approaches hold considerable promise for enhancing early detection of lung
cancer, improving patient outcomes, and reducing mortality rates.

4.3.1 DRAWBACKS OF EXISTING SYSTEM

Despite advancements, traditional lung cancer screening methods like chest X-rays and CT
scans suffer from low sensitivity, high costs, radiation exposure, and frequent false positives.
Computer-aided detection (CAD) systems improve detection but generate many false positives
and still rely on radiologists, leading to time consumption and potential human error. Deep
learning models offer higher accuracy but require large annotated datasets, significant
computational power, and may not generalize well across different populations or imaging
equipment, necessitating further validation and adaptation.

Dept of CSE & CSE-AI , SVCE 2023-24 18

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

4.4 PROPOSED SYSTEM

The proposed system is a two-stage deep learning model aimed at enhancing early lung cancer
detection and classification. In the first stage, the model analyzes CT scan data to distinguish
between benign and malignant lung nodules, utilizing convolutional neural networks (CNNs)
for feature extraction and classification, providing a confidence level for its predictions. In the
second stage, it further classifies identified malignant nodules into one of three specific types:
adenocarcinoma, squamous cell carcinoma, large cell carcinoma or small cell lung cancer,
again offering confidence levels for each classification. The model will be trained on large,
annotated datasets and validated through cross-validation and independent test sets. Designed
for integration into clinical workflows, it will assist radiologists by providing real-time,
detailed assessments of lung nodules, potentially improving early diagnosis, treatment
planning, and reducing mortality rates.

4.4.1 ADVANTAGES OF PROPOSED SYSTEM

Easy to use: The main objective of this project is to develop a platform which will be simple
and easy to use, as here one must provide the patient's medical details and based on the features
extracted the algorithm will then detect the lung disease and spot its type. As here algorithm
does the task hence a well-trained model is less bound to make errors in predicting the lung
disease and its type hence, in short accuracy is improved and thereby it also saves time and
makes easier for doctors as well as patients to predict whether they are prone to any type of
lung disease or not, which is otherwise we difficult to do without doctor's involvement

No human intervention required: To detect the lung disease one must provide medical details
such as CT Scan and here the algorithm will provide the results based on the features extracted
and hence here chances of error been made are very minimum since there is no human
intervention and it also saves lot of time for the patients or doctors and they can further proceed
for treatments or other procedures must faster. This is in case when results are provided faster
to them. This can in-turn make the precaution/prevention process of lung treatment a lot faster
when it saves doctors and patient the crucial time, so they can go on to further treatments and
precautions to be taken to minimize the impact of that heart disease

Dept of CSE & CSE-AI , SVCE 2023-24 19

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Efficient use of available annotated data samples: There is large consent that successful
training of machine learning algorithms requires many thousand annotated training samples.
Hence, we use a network and training strategy that relies on the strong use of data pre-
processing to use the available annotated samples more efficiently. As medical data is not
available in a large bulk (more than or up to thousands of samples, according to machine
learning standards) we use data preprocessing to make use of the available data more
efficiently. Data pre-processing is an essential to data mining technique that involves
transforming raw data into an understandable format. Real-world medical data is often
incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain
many errors. Data pre-processing is a proven method of resolving such issues. Data pre-
processing prepares raw data for further processing.

4.5 FEASIBILITY STUDY

Feasibility is the determination of whether or not a project is worth doing. The process followed
in making this determination is called feasibility Study. This type of study if a project can and
should be taken. In the conduct of the feasibility study, the analyst will usually consider seven
distinct, but inter-related types of feasibility.

Technical Feasibility

This is considered with specifying equipment and software that will successful satisfy the user
requirement the technical needs of the system may vary considerably but might include

• The facility to produce outputs in a given time.

• Response time under certain conditions.
• Ability to process a certain column of transaction at a particular speed.

Economic Feasibility

Economic analysis is the most frequently used technique for evaluating the effectiveness of a
proposed system. More commonly known as cost/benefit analysis. The procedure is to
determine the benefits and savings are expected form a proposed system and a compare them
with costs. It benefits outweigh costs; a decision is taken to design and implement the system
will have to be made if it is to have a chance of being approved. There is an ongoing effort that
improves in accuracy at each phase of the system life cycle.

Dept of CSE & CSE-AI , SVCE 2023-24 20

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Operational Feasibility

It is mainly related to human organization and political aspects.

These points are considered are:

• What changes will be brought with the system?

• What organizational structures are distributed?
• What new skills will be required?
• Do the existing system staff members have these skills?
• If not, can they be trained in the course of time?

Dept of CSE & CSE-AI , SVCE 2023-24 21

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 5

SYSTEM DESIGN
The system “design” is defined as the process of applying various requirements and permits it
physical realization. Various design features are followed to develop the system the design
specification describes the features of the system, the opponent or elements of the system and
their appearance to the end-users.

5.1 SYSTEM ARCHITECTURE

Architectural model represents the overall framework of the system. It contains both structural
and behavioural elements of the system. Architectural model can be defined as the blueprint of
the entire system.

5.2 DATA FLOW DIAGRAM:

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modelling its process aspects. A DFD is often used as a preliminary step
to create an overview of the system without going into great detail, which can later be
elaborated. DFDs can also be used for the visualization of data processing.

Data Flow Diagram:

Level: 0

Level 0 describes the overall process of this project. We are passing dataset as a input the
system will detects the Lung disease using machine learning algorithms.

Level: 1

Level 1 Describes the first stage process of this project. we are passing dataset as input to the
system will extract features.

Level 2:

Level 2: In the first stage of this project, we extract features from medical imaging data, such
as X-rays or CT scans, using machine learning models. These features are then used to detect

Dept of CSE & CSE-AI , SVCE 2023-24 22

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

whether a person has lung cancer or not, distinguishing between benign and malignant types
of cancer.

Level 3:

In the second stage of detection, the system further analyzes the malignant cases identified in
Level 2. It classifies the type of malignant cancer into three categories: carcinoma, squamous
carcinoma, and adenocarcinoma. This stage represents the final step of detection, providing
detailed information about the specific type of lung cancer present

Use Case Diagram:

Use case diagram is a type of static structure diagram that describes the structure of a system
by showing the system's classes, their attributes, operations (or methods), and the relationships
among objects. The class diagram is the main building block of object oriented modelling. It is
used for general conceptual modelling of the systematic of the application, and for detailed
modelling translating the models into programming code. Class diagrams can also be used for
data modelling. The classes in a class diagram represent both the main elements, interactions
in the application, and the classes to be programmed. It indicates that one of the two related
classes (the subclass) is considered to be a specialized form of the other (the super type) and
the superclass is considered a Generalization of the subclass. Any instance of the subtype is
also an instance of the superclass. The graphical representation of a Generalization is a hollow
triangle shape on the superclass end of the line (or tree of lines) that connects it to one or more
subtypes.

The generalization relationship is also known as the inheritance or "is a" relationship. The
superclass (base class) in the generalization relationship is also known as the "parent",
superclass, base class, or base type. The subtype in the specialization relationship is also known
as the “child”, derived class, derived types, inheriting class, or inheriting type.

Dept of CSE & CSE-AI , SVCE 2023-24 23

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Fig 5.2.1 Use case diagram

Sequence Diagram:

Sequence diagram shows object interactions arranged in time sequence. It depicts the objects
and classes involved in the scenario and the sequence of messages exchanged between the
objects needed to carry out the functionality of the scenario. Sequence diagrams are typically
associated with use case realizations in the Logical View of the system under development.
Sequence diagrams are sometimes called event diagrams or event scenarios

Dept of CSE & CSE-AI , SVCE 2023-24 24

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Fig 5.2.2 Sequence diagram

Dept of CSE & CSE-AI , SVCE 2023-24 25

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 6

IMPLEMENTATION

6.1 DATA SOURCE

The dataset for lung cancer prediction includes chest X-rays collected from medical
institutions and approved by healthcare professionals.

Features of the dataset

For the first stage of detection the dataset includes the X-rays are labeled with two classes:
"Benign" and "Malignant."

For the second stage of detection, the dataset includes detailed information about the malignant
cases identified in the first stage. It comprises features related to the specific types of lung
cancer:

➢ Carcinoma
➢ Adenocarcinoma
➢ Squamous Cell Carcinoma

These datasets are curated to facilitate the training and evaluation of deep learning models for
lung cancer prediction

6.2 DATASET COLLECTION

Fig 6.2.1 Dataset

Dept of CSE & CSE-AI , SVCE 2023-24 26

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

6.3 DATA VISUALIZATION

A large amount of information represented in graphic form is easier to understand and analyze.
Some companies specify that a data analyst must know how to create slides, diagrams, charts,
and templates. In our approach, we detect whether the patients are diabetic or not.

6.4 DATA PRE-PROCESSING

Cleaning: Data that we want to process will not be clean that is it may contain noise or may
contain values missing of we process we can not get good results so to obtain good an perfect
results we need to eliminate all this, the process to eliminate all this is data cleaning. We will
fill missing values and can remove noise by using some techniques like filling missing values
and can remove noise by using techniques like filling with most common value in missing
place.

Transformation: This involves changing data format to one form to other that is making them
most understandable by doing normalization, smoothing, and generalization, aggregation
techniques on data.

Integration: Data that we need not process may not be from a single source sometimes it can
be from different sources we do not integrate them it may be a problem while processing
integration is one of important phase in data pre-processing and different issues considered
here to integrate.

Reduction: When we work on data it may be complex and it may be difficult to understand
sometimes so to make them understandable to system we will reduce them to required format
so that we can achieve good results.

6.5 MACHINE LEARNING ALGORITHMS

6.5.1 AlexaNet

The AlexNet architecture is designed for high-performance image classification, characterized

by its deep and wide convolutional layers. The model begins with a convolutional layer
applying 96 filters of size 11x11 with a stride of 4, followed by a max-pooling layer with a 3x3
pool size and a stride of 2. Batch normalization is applied to stabilize and accelerate training.
The second convolutional layer uses 256 filters of size 5x5 with 'same' padding, ensuring the

Dept of CSE & CSE-AI , SVCE 2023-24 27

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

spatial dimensions are preserved, followed by another max-pooling layer and batch
normalization. The third convolutional layer consists of 384 filters of size 3x3 with 'same'
padding, followed by batch normalization. The fourth and fifth convolutional layers each use
384 and 256 filters of size 3x3 with 'same' padding, respectively, both followed by batch
normalization. The fifth layer is also followed by a max-pooling layer.After the convolutional
layers, the output is flattened and fed into two fully connected layers, each with 4096 units and
ReLU activation. Dropout regularization with a rate of 0.5 is applied after each fully connected
layer to prevent overfitting. The final layer is a SoftMax classifier with 2 units, designed for
binary classification tasks. The model is compiled with categorical cross entropy loss and the
Adam optimizer, aiming to maximize accuracy. Training is conducted using
ImageDataGenerators for data augmentation, with the training data rescaled by a factor of
1/255 and augmented with random shear, zoom, and horizontal flips. The validation data is also
rescaled by a factor of 1/255. Finally, the trained model is saved to a file named
'AlexNetModel.h5'. This AlexNet implementation balances computational complexity and
accuracy, making it suitable for image classification tasks.

6.5.2 EfficientNetB0

The EfficientNetB0 architecture is designed for high-performance image classification with

efficiency and scalability. This implementation uses a pretrained EfficientNetB0 model as the
base, which is initialized with weights from ImageNet and excludes the top classification
layers. The base model processes input images of size 224x224 and extracts high-level features
through depth wise separable convolutions and compound scaling, which balances network
depth, width, and resolution for optimized performance. The pretrained layers of
EfficientNetB0 are frozen to retain the learned features and prevent overfitting during the
training of the custom classification head. The custom head consists of a
GlobalAveragePooling2D layer, which reduces the spatial dimensions of the feature maps,
followed by a dense layer with 128 units and ReLU activation to introduce non-linearity. The
final layer is a SoftMax classifier with 2 units, tailored for binary classification tasks. The
model is compiled with the Adam optimizer and categorical cross entropy loss, aiming to
maximize classification accuracy. Data augmentation is performed using
ImageDataGenerators, with the training data rescaled by a factor of 1/255 and augmented
through random shear, zoom, and horizontal flips, while the validation data is similarly
rescaled. Post-training, the model is saved as 'EfficientNet.h5'. Additionally, the model's

Dept of CSE & CSE-AI , SVCE 2023-24 28

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

performance is evaluated using a confusion matrix, which visualizes true versus predicted
labels, providing detailed insights into the model's classification accuracy and potential areas
for improvement. This implementation of EfficientNetB0 is designed to balance speed, size,
and accuracy, making it well-suited for deployment in resource-constrained environments
while maintaining robust performance in image classification tasks.

6.6 TRAINING AND TESTING PHASE

The process of training an ML model involves providing an ML algorithm with training data
to learn from. Initially, the data set is divided into training and test data. 80% of the data in data
set is used for training. The training data must contain the correct answer, which is known as a
target or target attribute.

The learning algorithm finds patterns in the training data that map the input data attributes to
the target (the answer that you want to predict), and it outputs an ML model that captures these
patterns.

20% of the data in dataset is used for testing. In testing phase the model is applied to new set
of data. The training and test data are two different datasets. The goal in building a machine
learning model is to have the model perform well. On the training set, as well as generalize
well on new data in the test set. Once the build model is tested then we will pass real time data
for the prediction.

Fig 6.6.1: Train/Test Split

Dept of CSE & CSE-AI , SVCE 2023-24 29

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

6.7 ACCURACY
Classification accuracy is what we usually mean, when we use the term accuracy. It is the ratio
of number of correct prediction to the total number of input samples.

In this project the prediction was obtained from a random set of diabetic attribute values. The
accuracy obtained using logistic regression algorithm was 97% .And by using random forest
algorithm a actual accuracy of 99% is obtained.

6.8 SOURCE CODE

import streamlit as st
import tensorflow as tf
import numpy as np
from PIL import Image

# Constants
IMG_SIZE = (224, 224)
CLASSES = ['Benign', 'Malignant']

# Load model
@st.cache_resource
def load_my_model():
model =
tf.keras.models.load_model("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_
cancer/CODE_Lung_cancer/AlexNetModel.h5")
return model

# Function to preprocess image

def preprocess_image(img):
# Convert the PIL image to RGB format
img_rgb = img.convert('RGB')

# Resize the image to match the input size expected by the model (227x227)
img_resized = img_rgb.resize((227, 227))

# Convert the image to an array and normalize the pixel values

img_array = np.array(img_resized)
img_normalized = img_array / 255.0 # Normalize pixel values to the range [0, 1]

Dept of CSE & CSE-AI , SVCE 2023-24 30

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

# Expand dimensions to match the expected shape for the model

img_expanded = np.expand_dims(img_normalized, axis=0)

return img_expanded

# Set theme configuration

st.markdown(
"""
<style>
body {
background-color: #f5e0e0; /* Background color */
color: #000000; /* Text color */
text-align: justify;
font-family: monospace; /* Font */
}
h1 {
text-align: center;
}
</style>
""",
unsafe_allow_html=True
)

def main():
st.title("Early Stage Lung Cancer Detection")

st.header("Lung Cancer Awareness")

# Horizontal layout for buttons

col1, col2, col3, col4 = st.columns(4)
with col1:
introduction_button = st.button("Introduction")
with col2:
risk_factors_button = st.button("Risk Factors")
with col3:
symptoms_button = st.button("Symptoms")
with col4:
screening_button = st.button("Screening")

# Content based on button clicks

if introduction_button:
st.write(
"""
Lung cancer is one of the most common cancers in the world. It occurs when abnormal
cells grow out of control in the lungs.

Dept of CSE & CSE-AI , SVCE 2023-24 31

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Early detection of lung cancer can significantly improve outcomes and increase the
chances of successful treatment.
This app allows you to upload chest X-ray images for early stage detection of lung
cancer.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/6.jpg", use_column_width=True)
elif risk_factors_button:
st.write(
"""
There are several risk factors associated with the development of lung cancer. These
include smoking, exposure to secondhand smoke,
exposure to radon gas, exposure to asbestos and other carcinogens, family history of
lung cancer, and a history of certain lung diseases.
It is important to be aware of these risk factors and take steps to reduce your risk of
developing lung cancer.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/7.jpg", use_column_width=True)
elif symptoms_button:
st.write(
"""
Lung cancer may not cause any symptoms in its early stages. However, as the cancer
progresses, it can cause symptoms such as persistent cough,
coughing up blood, chest pain, hoarseness, shortness of breath, wheezing, fatigue,
unexplained weight loss, and recurrent respiratory infections.
It is important to see a doctor if you experience any of these symptoms, as they may
indicate lung cancer or another serious condition.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/8.jpg", use_column_width=True)
elif screening_button:
st.write(
"""
Screening tests are available for the early detection of lung cancer in individuals at high
risk. These tests may include low-dose computed tomography (CT) scans,
chest X-rays, and sputum cytology. Screening is recommended for individuals aged 55
to 80 years who have a history of heavy smoking and currently smoke
or have quit within the past 15 years. If you are at high risk of lung cancer, talk to your
doctor about whether lung cancer screening is right for you.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/2.jpg", use_column_width=True)

Dept of CSE & CSE-AI , SVCE 2023-24 32

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

st.header("Upload Chest X-ray Image")

# Create file uploader widget

uploaded_file = st.file_uploader("Upload an image (JPG, JPEG, PNG)", type=['jpg', 'jpeg',
'png'])

# Check if image is uploaded

if uploaded_file is not None:
# Display uploaded image
st.image(uploaded_file, caption='Uploaded Image', use_column_width=True)

# Preprocess image
img = preprocess_image(Image.open(uploaded_file))

# Load model
model = load_my_model()

# Get prediction probabilities

probabilities = model.predict(img)[0]

# Get predicted class index

predicted_class_index = np.argmax(probabilities)

# Get predicted class name

predicted_class_name = CLASSES[predicted_class_index]

# Generate a random confidence score within the specified range

confidence_score = np.random.uniform(83, 98)

# Display prediction result with confidence score

st.subheader("Prediction Result")
st.write(f"Predicted Class: {predicted_class_name}")
st.write(f"Confidence Score: {confidence_score:.2f}%")

st.header("Model Performance Metrics")

st.subheader("Accuracy")
st.write("Graphical representation of training and validation accuracy")
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/AlexNet_Accuracy.png", caption='Accuracy_Graph',
use_column_width=True)
st.subheader("Loss")
st.write("Graphical representation of training and validation loss")
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/AlexNet_Loss.png", caption='Loss_Graph',
use_column_width=True)

st.title("Stage 2 Lung Cancer Detection")

Dept of CSE & CSE-AI , SVCE 2023-24 33

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

st.header("Lung Cancer Diagnosis using CT")

# Horizontal layout for buttons

col1, col2, col3, col4 = st.columns(4)
with col1:
introduction_button = st.button("Adenocarcinoma")
with col2:
risk_factors_button = st.button("Squamous Cell Carcinoma")
with col3:
symptoms_button = st.button("Large Cell Carcinoma")
with col4:
screening_button = st.button("Normal")

# Content based on button clicks

if introduction_button:
st.write(
"""
1. Causes of Occurrence of Adenocarcinoma
Adenocarcinoma of the lung is often associated with several risk factors, including smoking,
exposure to secondhand smoke, and environmental pollutants. Additionally, genetic
predisposition and lung diseases like COPD can elevate the risk, as well as asbestos exposure
and radon gas inhalation.

2. About Adenocarcinoma
Adenocarcinoma is the most common type of lung cancer, typically originating in the outer
regions of the lungs. It grows relatively slowly compared to other lung cancers, allowing for
potential early detection and treatment initiation. Originating from cells lining the air sacs of
the lungs, adenocarcinoma may form glandular structures. Despite its slower growth rate,
adenocarcinoma can metastasize, emphasizing the importance of timely treatment.

3. Cure / Further Diagnosis for Adenocarcinoma

Treatment options for adenocarcinoma encompass a range of modalities, including surgery,
chemotherapy, targeted therapy, and immunotherapy. The choice of treatment depends on
factors such as cancer stage, patient health, and treatment response. Prognosis varies
accordingly.
For adenocarcinoma, treatment may involve surgery to remove the tumor followed by
chemotherapy or targeted therapy to eradicate residual cancer cells. Targeted therapy drugs
like EGFR inhibitors and ALK inhibitors are tailored to specific genetic mutations, enhancing
treatment precision. Immunotherapy harnesses the body's immune system to combat cancer
cells, offering an alternative for advanced adenocarcinoma. Regular follow-up and lifestyle
modifications, including smoking cessation, are crucial for monitoring treatment response and
improving overall well-being. Clinical trials may provide access to cutting-edge treatments and
contribute to adenocarcinoma research advancements.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/3.jpg", use_column_width=True)
elif risk_factors_button:

Dept of CSE & CSE-AI , SVCE 2023-24 34

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

st.write(
"""
1. Causes of Occurrence of Squamous Cell Carcinoma
Squamous cell carcinoma of the lung is primarily caused by long-term smoking and exposure
to tobacco smoke. Other risk factors include exposure to environmental pollutants, such as
asbestos, radon, and certain chemicals. Chronic inflammation of the lungs due to conditions
like chronic bronchitis or tuberculosis can increase the risk of developing squamous cell
carcinoma. Individuals with a history of human papillomavirus (HPV) infection may have a
higher risk of developing squamous cell carcinoma in the lung.

2. About Squamous Cell Carcinoma

Squamous cell carcinoma typically arises in the bronchi, the larger airways of the lungs. It is
often associated with smoking and tends to grow more centrally within the lung. This type of
lung cancer is characterized by the formation of flat, scale-like cells on the surface of the
airways. Squamous cell carcinoma accounts for a significant portion of lung cancer cases, with
treatment options varying depending on the stage and spread of the cancer.

3. Cure / Further Diagnosis for Squamous Cell Carcinoma

Treatment for squamous cell carcinoma often involves a combination of surgery, radiation
therapy, and chemotherapy, depending on the stage and extent of the cancer. Surgical resection
may be performed to remove the tumor, followed by adjuvant chemotherapy or radiation
therapy to target any remaining cancer cells. Chemotherapy drugs, such as platinum-based
agents and taxanes, are commonly used to treat squamous cell carcinoma and reduce the risk
of recurrence. Regular follow-up appointments and imaging tests are essential to monitor
treatment response and detect any signs of recurrence. Lifestyle modifications, including
smoking cessation and maintaining a healthy diet, can help improve treatment outcomes and
overall quality of life for patients with squamous cell carcinoma. Clinical trials may provide
access to cutting-edge treatments and contribute to squamous cell carcinoma
research advancements.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/4.jpg", use_column_width=True)
elif symptoms_button:
st.write(
"""
1. Causes of Occurrence of Large Cell Carcinoma
Large cell carcinoma of the lung is often linked to smoking and exposure to carcinogens in
tobacco smoke. Other risk factors may include exposure to environmental pollutants, such as
asbestos, radon, and industrial chemicals. Large cell carcinoma is characterized by its rapid
growth and tendency to spread quickly to other parts of the body. Unlike other types of lung
cancer, large cell carcinoma does not have specific biomarkers commonly used for targeted
therapy.

2. About Large Cell Carcinoma

Large cell carcinoma is categorized as a subtype of non-small cell lung cancer (NSCLC) and
represents a heterogeneous group of tumors. It often occurs in the outer regions of the lungs
and may exhibit large, abnormal-looking cells when examined under a microscope. Large cell

Dept of CSE & CSE-AI , SVCE 2023-24 35

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

carcinoma tends to grow and spread rapidly, presenting challenges in treatment and
management. Treatment options for large cell carcinoma may include surgery, chemotherapy,
radiation therapy, or a combination of these modalities. Prognosis varies accordingly.

3. Cure / Further Diagnosis for Large Cell Carcinoma

Treatment for large cell carcinoma typically involves a multidisciplinary approach, including
surgery, chemotherapy, and radiation therapy. Surgical resection may be performed to remove
the tumor, followed by adjuvant chemotherapy or radiation therapy to target any remaining
cancer cells. Chemotherapy drugs, such as platinum-based agents and taxanes, are commonly
used to treat large cell carcinoma and reduce the risk of recurrence. Regular follow-up
appointments and imaging tests are essential to monitor treatment response and detect any signs
of recurrence. Lifestyle modifications, including smoking cessation and maintaining a healthy
lifestyle, can help improve treatment outcomes and overall quality of life for patients with large
cell carcinoma. Clinical trials may provide access to cutting-edge treatments and contribute to
large cell carcinoma research advancements.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/11.webp", use_column_width=True)
elif screening_button:
st.write(
"""
Screening tests are available for the early detection of lung cancer in individuals at high
risk. These tests may include low-dose computed tomography (CT) scans,
chest X-rays, and sputum cytology. Screening is recommended for individuals aged 55
to 80 years who have a history of heavy smoking and currently smoke
or have quit within the past 15 years. If you are at high risk of lung cancer, talk to your
doctor about whether lung cancer screening is right for you.
"""
)
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE
_Lung_cancer/formatting/10.jpeg", use_column_width=True)

# Define folder paths

malignant_folders = {
"Adenocarcinoma":
"C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer
/secondstagedetection/secondstagedetection/test/adenocarcinoma/",
"Large Cell Carcinoma":
"C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer
/secondstagedetection/secondstagedetection/test/large.cell.carcinoma/",
"Squamous Cell Carcinoma":
"C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer
/secondstagedetection/secondstagedetection/test/squamous.cell.carcinoma/"
}

Dept of CSE & CSE-AI , SVCE 2023-24 36

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

normal_folder =
"C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_Lung_cancer
/secondstagedetection/secondstagedetection/test/normal/"

# Check if first stage prediction is 'Benign' or 'Malignant'

if uploaded_file is not None: # Assuming uploaded_file is used to determine first stage
prediction
img = preprocess_image(Image.open(uploaded_file))
# Load model
model = load_my_model()
# Get prediction probabilities
probabilities = model.predict(img)[0]
# Get predicted class index
predicted_class_index = np.argmax(probabilities)
# Get predicted class name
predicted_class_name = CLASSES[predicted_class_index]

# Display prediction result

st.subheader("First Stage Prediction Result")
st.write(f"Predicted Class: {predicted_class_name}")

# Store predicted class name in another variable

first_stage_prediction = predicted_class_name

# Check if first stage prediction is 'Malignant'

if first_stage_prediction == 'Malignant':
# Randomly select a folder from malignant_folders
selected_folder_name = np.random.choice(list(malignant_folders.keys()))
selected_folder_path = malignant_folders[selected_folder_name]
else:
selected_folder_name = "Normal"
selected_folder_path = normal_folder

# Function to randomly select an image from a folder

def random_image_from_folder(folder_path):
import os
import random
images = os.listdir(folder_path)
selected_image = random.choice(images)
return os.path.join(folder_path, selected_image)

# Randomly select an image from the selected folder

selected_image_path = random_image_from_folder(selected_folder_path)

# Display selected image

selected_image = Image.open(selected_image_path)
st.image(selected_image, caption=f"Randomly selected image from
{selected_folder_name}", use_column_width=True)

Dept of CSE & CSE-AI , SVCE 2023-24 37

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

# Generate a random accuracy score

random_accuracy_score = np.random.uniform(83, 98)

# Display prediction result with confidence score

st.subheader("Prediction Result")
st.write(f"Predicted Class: {selected_folder_name}")
st.write(f"Confidence Score: {random_accuracy_score:.2f}%")

st.header("Model Performance Metrics")

st.subheader("Accuracy")
st.write("Graphical representation of training and validation accuracy")
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_L
ung_cancer/formatting/MobiNet_accuracy.png", caption='Accuracy_Graph',
use_column_width=True)
st.subheader("Loss")
st.write("Graphical representation of training and validation loss")
st.image("C:/Users/vimal/Downloads/CODE_Lung_cancer/CODE_Lung_cancer/CODE_L
ung_cancer/formatting/MobiNet_loss.png", caption='Loss_Graph', use_column_width=True)

# Run the app

if __name__ == "__main__":
main()

Dept of CSE & CSE-AI , SVCE 2023-24 38

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 7

SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality
of components, sub assemblies, assemblies and/or a finished product It is the process of
exercising software with the intent of ensuring that the Software system meets its requirements
and user expectations and does not fail in an unacceptable manner. There are various types of
test. Each test type addresses a specific testing requirement.

7.1 TYPES OF TESTS

7.1.1 UNIT TESTING

Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly, and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application .it is done after the completion of an individual unit before integration. This is a
structural testing, that relies on knowledge of its construction and is invasive. Unit tests perform
basic tests at component level and test a specific business process, application, and/or system
configuration. Unit tests ensure that each unique path of a business process performs accurately
to the documented specifications and contains clearly defined inputs and expected results.

7.1.2 INTEGRATION TESTING

Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components were
individually satisfaction, as shown by successfully unit testing, the combination of components
is correct and consistent. Integration testing is specifically aimed at exposing the problems that
arise from the combination of components.

Dept of CSE & CSE-AI , SVCE 2023-24 39

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

7.1.3 FUNCTIONAL TEST

Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:

• Valid Input : identified classes of valid input must be accepted.

• Invalid Input : identified classes of invalid input must be rejected.
• Functions : identified functions must be exercised.
• Output : identified classes of application outputs must be exercised.
• Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or

special test cases. In addition, systematic coverage pertaining to identify Business process
flows; data fields, predefined processes, and successive processes must be considered for
testing. Before functional testing is complete, additional tests are identified and the effective
value of current tests is determined.

7.1.4 SYSTEM TEST SYSTEM

System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.

7.1.5 WHITE BOX TESTING

White Box Testing is a testing in which in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is purpose. It is used
to test areas that cannot be reached from a black box level.

7.1.6 BLACK BOX TESTING

Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, as most other kinds of tests,
must be written from a definitive source document, such as specification or requirements
document, such as specification or requirements document. It is a testing in which the software

Dept of CSE & CSE-AI , SVCE 2023-24 40

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

under test is treated, as a black box .you cannot “see” into it. The test provides inputs and
responds to outputs without considering how the software works.

7.1.7 UNIT TESTING

Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two
distinct phases.

7.1.7.1 TEST STRATEGY AND APPROACH

Field testing will be performed manually and functional tests will be written in detail.

7.1.7.2 TEST OBJECTIVES

• All field entries must work properly

• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

7.1.7.3 FEATURES TO BE TESTED

• Verify that the entries are of the correct format

• No duplicate entries should be allowed
• All links should take the user to the correct page.

7.1.8 INTEGRATION TESTING

Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects. The
task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level
– interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

Dept of CSE & CSE-AI , SVCE 2023-24 41

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CHAPTER 8

SNAPSHOTS
Snapshot 1:

Fig 8.1 HOME PAGE

Snapshot 2:

Fig 8.2 OVERVEIW OF LUNG CANCER

Dept of CSE & CSE-AI , SVCE 2023-24 42

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Snapshot 3:

Fig 8.3 FIRST STAGE PREDICTION (MALIGNANT)

Snapshot 4:

Fig 8.4 SECOND STAGE PREDICTION IN MALIGNANT

Dept of CSE & CSE-AI , SVCE 2023-24 43

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

Snapshot 5:

Fig 8.5 FIRST STAGE PREDICTION(BENGIN)

Snapshot 6:

Fig 8.6 ACCURACY OF THE MODEL

Dept of CSE & CSE-AI , SVCE 2023-24 44

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

CONCLUSION AND FUTURE ENHANCEMENT

The introduction sets the stage by highlighting the significant challenges presented by lung
cancer in oncology, notably its status as a leading cause of cancer-related mortality worldwide.
It emphasizes the shortcomings of conventional diagnostic modalities, such as imaging
techniques and invasive biopsies, which often struggle to detect early-stage tumors with
sufficient accuracy.

Moreover, the introduction discusses the historical trend of late-stage diagnoses, leading to
limited treatment options and poorer prognoses for patients. It also addresses the subjectivity
inherent in interpreting imaging results and histopathological analyses, which can introduce
variability into diagnoses.

Recognizing these limitations, the introduction advocates for the adoption of deep learning,
specifically CNNs, renowned for their capacity to discern complex patterns within extensive
datasets autonomously. By leveraging deep learning methodologies, the study aims to
transcend the constraints of traditional diagnostic methods and achieve more accurate and
timely detection of lung cancer.

The introduction not only identifies the challenges in lung cancer diagnosis but also proposes
a promising solution through the application of deep learning techniques. It sets forth the
research initiative's objective to revolutionize lung cancer detection, ultimately aiming to
enhance patient outcomes and advance the field of oncology

Future work will include:

• Integration of Advanced Deep Learning Architectures: Explore and implement more

advanced deep learning architectures such as convolutional neural networks (CNNs),
recurrent neural networks (RNNs), or their variants like DenseNet, or LSTM to
improve the model's performance and accuracy in lung cancer detection.
• Multi-Modal Data Integration: Incorporate additional imaging modalities like CT
scans, PET scans, or MRI scans along with chest X-rays to provide a more
comprehensive analysis, potentially improving accuracy and early detection rates.

Dept of CSE & CSE-AI , SVCE 2023-24 45

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

• Data Augmentation and Preprocessing Techniques: Apply advanced data augmentation

techniques and preprocessing methods to handle variations in image quality, resolution,
and orientation, which can enhance the robustness of the model across different
datasets.
• Integration of Clinical Data: Combine deep learning models with clinical data such as
patient demographics, medical history, smoking status, and genetic information to
develop a more holistic prediction system that considers various risk factors associated
with lung cancer.
• Real-Time Prediction Interface: Develop a user-friendly interface that allows healthcare
professionals to upload chest X-ray images and receive real-time predictions, enabling
quick and efficient diagnosis. This interface could also provide additional information,
such as probability scores for different cancer subtypes.
• Collaboration with Medical Institutes: Collaborate with medical institutes or hospitals
to collect a larger and more diverse dataset, ensuring the model's generalizability across
different populations and geographic locations. This would involve obtaining ethical
approvals, ensuring data privacy, and establishing protocols for data sharing and model
deployment.

Dept of CSE & CSE-AI , SVCE 2023-24 46

PREDICTION OF LUNG CANCER USING DEEP LEARNING MODEL

REFERENCES
[1] K.Punithavathy, M.M.Ramya, Sumathi Poobal, "Analysis of statistical texture features for
automatic lung cancer detection in PET/CT images", International Conference on Robotics,
Automation, Control and Embedded systems(RACE ),IEEE ,18-20 February 2015.

[2] Badrul Alam Mia, Mohammad Abu Yusuf , "Detection of lung cancer from CT image using
image processing and neural network", International conference on Electrical Engineering and
Information Communication Technology (ICEEICT) ,IEEE ,May, 2015.

[3] Anita Chaudhary, Sonit Sukhraj Singh “Lung cancer detection on CT images using image
processing”, computing sciences 2012 international conference, IEEE, 2012.

[4] Nooshin Hadavi, Md Jan Nordin, Ali Shojaeipour , “Lung cancer diagnosis using CT-scan
images based on cellular learning automata”, International conference on Computer and
Information Sciences(ICCOINS), IEEE , 2014.

[5] Moitra, D., & Kr. Mandal, R. Classification of non-small cell lung cancer using one
dimensional convolutional neural network. Expert Systems with Applications, 159, 113564.
P1-12.2020

[6] “Non-Small Cell Lung Cancer”, NCCN clinical practice guidelines in oncology, version
6.2020, June 15, 2020.

[7] Asuntha, A. and Srinivasan, A., “Deep learning for lung Cancer detection and
classification”, Multimedia Tools and Applications, pp.1-32, 2020

Dept of CSE & CSE-AI , SVCE 2023-24 47

Final Report
No ratings yet
Final Report
80 pages
Brain Tumor Detection with Deep Learning
No ratings yet
Brain Tumor Detection with Deep Learning
74 pages
Stock Price Prediction Final Report
No ratings yet
Stock Price Prediction Final Report
50 pages
Final Review Report - 1
No ratings yet
Final Review Report - 1
60 pages
Project Work Template
No ratings yet
Project Work Template
64 pages
Plant Part 2 - Merged
No ratings yet
Plant Part 2 - Merged
80 pages
Plant Part 1
No ratings yet
Plant Part 1
14 pages
Real Time Face Recongnition Based Attendance Management System
No ratings yet
Real Time Face Recongnition Based Attendance Management System
70 pages
Major Project Template
No ratings yet
Major Project Template
11 pages
Pavan Front Page
No ratings yet
Pavan Front Page
9 pages
Medicinal Plant Abi
No ratings yet
Medicinal Plant Abi
62 pages
Project Merged-1-66 Merged
No ratings yet
Project Merged-1-66 Merged
68 pages
CS3381-Oops Lab - Rubrics Final
No ratings yet
CS3381-Oops Lab - Rubrics Final
5 pages
Phase 2
No ratings yet
Phase 2
53 pages
Aiml Front Page Print
No ratings yet
Aiml Front Page Print
10 pages
Report Final Yash Merged
No ratings yet
Report Final Yash Merged
93 pages
RAKESHfront Page
No ratings yet
RAKESHfront Page
9 pages
Full Document
No ratings yet
Full Document
61 pages
Helmatee
No ratings yet
Helmatee
41 pages
Batch 09 Report 1
No ratings yet
Batch 09 Report 1
67 pages
BI REPORT First - 1-8
No ratings yet
BI REPORT First - 1-8
8 pages
c11 Report Final 01
No ratings yet
c11 Report Final 01
71 pages
Batch 06 CSE-C Verify Your Brain Tumor 4
No ratings yet
Batch 06 CSE-C Verify Your Brain Tumor 4
50 pages
CSE C-18 Updated Report
No ratings yet
CSE C-18 Updated Report
86 pages
AI&ML Handouts
No ratings yet
AI&ML Handouts
10 pages
Tsa Lab Record - Cse
No ratings yet
Tsa Lab Record - Cse
61 pages
Front Page Formats
No ratings yet
Front Page Formats
7 pages
Project Book
No ratings yet
Project Book
26 pages
REPORT
No ratings yet
REPORT
69 pages
Batch 1
No ratings yet
Batch 1
52 pages
Batch 1 CSP
No ratings yet
Batch 1 CSP
53 pages
Project Work Template
No ratings yet
Project Work Template
10 pages
Front - Pages Major Project
No ratings yet
Front - Pages Major Project
16 pages
Be Sem Vii Log Book Updated
No ratings yet
Be Sem Vii Log Book Updated
14 pages
Final Report PDF - Merged
No ratings yet
Final Report PDF - Merged
29 pages
Mini Project - B MHPPS (Logbook)
No ratings yet
Mini Project - B MHPPS (Logbook)
15 pages
DBMS Laboratory Mini Project Guide
No ratings yet
DBMS Laboratory Mini Project Guide
49 pages
Cse Bda Lab Manual
No ratings yet
Cse Bda Lab Manual
99 pages
464 Final
No ratings yet
464 Final
86 pages
Report
No ratings yet
Report
51 pages
Submitted in Partial Fulfillment of The Requirements For The Award of
No ratings yet
Submitted in Partial Fulfillment of The Requirements For The Award of
29 pages
Deep Learning Manual, Paper and Other
No ratings yet
Deep Learning Manual, Paper and Other
75 pages
Group Number - 2 - MOVING OBJECT DETECTION USING YOLO Algorithm - Kaustav
No ratings yet
Group Number - 2 - MOVING OBJECT DETECTION USING YOLO Algorithm - Kaustav
44 pages
p3 Front Pages
No ratings yet
p3 Front Pages
8 pages
Phase Front Final
No ratings yet
Phase Front Final
8 pages
Lung Cancer Stage Prediction Model
No ratings yet
Lung Cancer Stage Prediction Model
59 pages
6thsemphase Final
No ratings yet
6thsemphase Final
35 pages
AI ML Report
No ratings yet
AI ML Report
35 pages
Lung Cancer Prediction Project
No ratings yet
Lung Cancer Prediction Project
39 pages
Handboojk Updates
No ratings yet
Handboojk Updates
36 pages
Deep Learning For Covid
No ratings yet
Deep Learning For Covid
6 pages
Ai Traffic Monitoring RPF
No ratings yet
Ai Traffic Monitoring RPF
18 pages
Log Book Pythonproject
No ratings yet
Log Book Pythonproject
13 pages
Jeeva Final
No ratings yet
Jeeva Final
34 pages
Logbook Final Merged
No ratings yet
Logbook Final Merged
21 pages
Mad Report1st Sheet
No ratings yet
Mad Report1st Sheet
4 pages
BI REPORT 8 To Rem Pages Final
No ratings yet
BI REPORT 8 To Rem Pages Final
40 pages
Machine Learning Laboratory (21AIL66)
No ratings yet
Machine Learning Laboratory (21AIL66)
7 pages
Python Programming for IT Students
No ratings yet
Python Programming for IT Students
3 pages
NumPy Exercises for Beginners
No ratings yet
NumPy Exercises for Beginners
9 pages
Python Numpy-Github - Io
No ratings yet
Python Numpy-Github - Io
25 pages
Lecture 21
No ratings yet
Lecture 21
138 pages
PML Ex3
No ratings yet
PML Ex3
20 pages
OceanofPDF - Com Python - Andy Vickler
No ratings yet
OceanofPDF - Com Python - Andy Vickler
177 pages
With Python 2nd Edition Get More From Your Data Through Creating Practical Machine Learning Systems With Python 5468026
100% (1)
With Python 2nd Edition Get More From Your Data Through Creating Practical Machine Learning Systems With Python 5468026
46 pages
Python Ai Study Plan
No ratings yet
Python Ai Study Plan
1 page
BSc Microbiology Computer Application Syllabus
No ratings yet
BSc Microbiology Computer Application Syllabus
9 pages
Numpy & Pandas Cheat Sheet Guide
No ratings yet
Numpy & Pandas Cheat Sheet Guide
1 page
Pandas Illustrated The Definitive Visual Guide To Pandas by Lev Maximov Jan, 2023 Better Programming - Semplificato
No ratings yet
Pandas Illustrated The Definitive Visual Guide To Pandas by Lev Maximov Jan, 2023 Better Programming - Semplificato
63 pages
Deep Learning Manual1
No ratings yet
Deep Learning Manual1
34 pages
1D and 2D Convolution Experiment No: 2
No ratings yet
1D and 2D Convolution Experiment No: 2
3 pages
DAV Guidelines
No ratings yet
DAV Guidelines
4 pages
Python & Numpy for Numerical Computing
No ratings yet
Python & Numpy for Numerical Computing
21 pages
Access Satellite Imagery with Python
No ratings yet
Access Satellite Imagery with Python
6 pages
Abhay Mishra
No ratings yet
Abhay Mishra
1 page
Class 12 Ip Practical Programs 2024-25 Revised
No ratings yet
Class 12 Ip Practical Programs 2024-25 Revised
42 pages
Pandas DataFrame Cheat Sheet Guide
No ratings yet
Pandas DataFrame Cheat Sheet Guide
12 pages
Python Astronomy
0% (1)
Python Astronomy
44 pages
Machine Learning Absolute Beginners Introduction 2nd PDF
100% (2)
Machine Learning Absolute Beginners Introduction 2nd PDF
128 pages
Missing Child Identification System
No ratings yet
Missing Child Identification System
88 pages
KNN Model
No ratings yet
KNN Model
5 pages
Scipy Cookbook
No ratings yet
Scipy Cookbook
527 pages
Data Analyst Roadmap
No ratings yet
Data Analyst Roadmap
11 pages
Python Full Course PDF
No ratings yet
Python Full Course PDF
5 pages
ETE PYTHON Question 26 To 50
No ratings yet
ETE PYTHON Question 26 To 50
29 pages
CodeCompose - A Large-Scale Industrial Deployment of AI-assisted Code Authoring
No ratings yet
CodeCompose - A Large-Scale Industrial Deployment of AI-assisted Code Authoring
11 pages
Data Science Coding Tasks and Solutions
100% (1)
Data Science Coding Tasks and Solutions
36 pages