Project-Based Learning Presentation
Project Title: Early-Stage Lung Cancer Detection (Using Advanced Computational Algorithms)
Team Members:
1. Kritika Dhanesh Magnani – 23FE10CDS00483
2. Vallala Shiva Sai Danush Vardhan – 23FE10CDS00484
Current Challenges
1. Lack of Diverse Training Data [1]
   • Most datasets are collected from specific regions or patient groups.
   • This makes AI models biased and less accurate for people of different ages, ethnicities, or medical histories.
2. Low Sensitivity for Small Nodules [2]
   • AI still struggles to detect very small nodules (under 6 mm), which is exactly where early detection is most critical.
3. Privacy and Data Security [3]
   • Medical imaging data is sensitive; sharing it for AI training raises major concerns about patient privacy and compliance with data protection laws.
4. Risk of Overfitting [4]
   • Models trained on limited or repetitive data may perform well during training but fail in real-world scenarios.
1. Nature Medicine, 2023; 2. Radiological Society of North America (RSNA), 2024; 3. Journal of Digital Health, 2024; 4. arXiv preprint, 2023
Project Objectives
1. Enhance Early-Stage Detection: Identifying lung cancer at an early stage improves treatment success and saves lives[5].
2. Improve Diagnostic Accuracy: Advanced AI models can detect cancer patterns more reliably than traditional methods[6].
3. Minimize False Alarms: AI helps reduce unnecessary follow-ups by distinguishing between cancerous and non-cancerous nodules[7].
4. Speed Up the Diagnosis Process: Automated systems provide faster results, supporting timely decisions by doctors[8].
5. American Lung Association, n.d.; 6. Litjens et al., 2017 – Medical Image Analysis; 7. Setio et al., 2017 – LUNA16 Challenge; 8. Shen et al., 2015 –
Steps to Increase Detection Accuracy
Deep Learning Models: Use Convolutional Neural Networks (CNNs) for image analysis and Recurrent Neural Networks (RNNs) for sequence data processing[9].
Multi-omics Data Integration: Combine different types of biological data (e.g., genomic and proteomic data) for a more comprehensive analysis.
Transfer Learning: Fine-tune models pre-trained on large datasets for specific tasks such as nodule detection[10].
Ensemble Learning: Combine predictions from multiple models to improve robustness and accuracy[11].
Data Augmentation: Increase data diversity with transformations such as flips, rotations, and scaling[12].
9. Krizhevsky et al., 2012 – NIPS; 10. Shin et al., 2016 – Deep Learning in Medical Imaging; 11. Setio et al., 2017 – LUNA16 Challenge; 12. Litjens et al., 2017
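To make the Deep Learning Models item concrete, here is a minimal, dependency-free sketch of what a single convolutional layer computes on a CT slice: a small kernel slides over the image, and each output value is a weighted sum of the pixels under it, followed by a ReLU. The 5x5 slice and the Laplacian-style kernel are hypothetical toy values; in a real CNN the kernel weights are learned during training, and a framework such as PyTorch or TensorFlow would be used instead of plain Python.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding).

    Like deep learning libraries, this computes cross-correlation
    (the kernel is not flipped), which is what a CNN layer applies.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j)
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise ReLU, the usual non-linearity after a convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 5x5 slice with a bright 3x3 blob standing in for a nodule
ct_slice = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
# Hand-picked kernel: responds where a pixel is brighter than its neighbours
kernel = [[-1, -1, -1],
          [-1, 8, -1],
          [-1, -1, -1]]
feature_map = relu(conv2d_valid(ct_slice, kernel))
print(feature_map)
```

The strongest responses land on the corners and edges of the blob and the uniform interior gives zero, so this particular kernel acts as a boundary detector; stacking many learned kernels is what lets a CNN build up nodule-specific features.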
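For the Transfer Learning item, the simplest variant keeps a pretrained network frozen and trains only a new classification head on the features it produces; full fine-tuning would additionally update the backbone weights, usually with a small learning rate. The sketch below shows that feature-extraction variant with a logistic-regression head trained by batch gradient descent. `frozen_backbone` is a hypothetical placeholder that returns two hand-made features instead of a real pretrained CNN's feature vector, and the 2x2 images are toy data.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def frozen_backbone(image: Matrix) -> List[float]:
    """Hypothetical stand-in for a pretrained network; its weights are never updated.

    Returns mean and max intensity in place of a real CNN feature vector.
    """
    flat = [v for row in image for v in row]
    return [sum(flat) / len(flat), max(flat)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features: Sequence[Sequence[float]], labels: Sequence[int],
               lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit a new logistic-regression head on frozen features (batch gradient descent)."""
    d, n = len(features[0]), len(features)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Derivative of the log-loss with respect to the logit z
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Hypothetical toy slices: label 1 = contains a bright nodule-like spot
images = [[[0, 1], [1, 1]], [[0, 0], [0, 1]], [[0, 0], [0, 0]], [[0.1, 0], [0, 0]]]
labels = [1, 1, 0, 0]
feats = [frozen_backbone(img) for img in images]  # backbone runs forward only
w, b = train_head(feats, labels)
print([predict(w, b, f) for f in feats])
```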
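The Ensemble Learning item does not say how predictions are combined. One common option, assumed here, is soft voting: average each model's malignancy probability per nodule and threshold the average. The three models' outputs below are made-up numbers for illustration.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-nodule malignancy probabilities from several models."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_items = len(model_probs[0])
    return [sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
            for i in range(n_items)]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn probabilities into 0/1 decisions (1 = flagged as malignant)."""
    return [int(p >= threshold) for p in probs]


# Hypothetical outputs of three models for the same three nodules
model_a = [0.9, 0.4, 0.2]
model_b = [0.7, 0.6, 0.1]
model_c = [0.8, 0.3, 0.6]
ensemble_probs = soft_vote([model_a, model_b, model_c])
print(to_labels(ensemble_probs))
```

Individually, model_b would flag the second nodule and model_c the third, but their averaged scores stay below the threshold, which is one way an ensemble can reduce false alarms.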
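For the Data Augmentation item, a minimal sketch of geometric transformations on a 2D slice is shown below: random horizontal flips and quarter-turn rotations, which change the orientation but not the pixel values. It assumes these transformations do not change the label (a nodule is still a nodule when flipped); scaling, small-angle rotation, or intensity jitter would need interpolation and are omitted.

```python
import random
from typing import List

Matrix = List[List[float]]


def flip_horizontal(img: Matrix) -> Matrix:
    return [row[::-1] for row in img]


def rotate90(img: Matrix) -> Matrix:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_augment(img: Matrix, rng: random.Random) -> Matrix:
    """Randomly flip and rotate a 2D slice; pixel values themselves are unchanged."""
    out = flip_horizontal(img) if rng.random() < 0.5 else img
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate90(out)
    return out


rng = random.Random(0)  # fixed seed so the augmented set is reproducible
patch = [[1, 2], [3, 4]]
augmented = [random_augment(patch, rng) for _ in range(4)]
print(augmented)
```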
Steps to Increase Detection Accuracy (continued)
Advanced Feature Engineering: Extract the most informative features using techniques such as Principal Component Analysis (PCA)[13].
Regularization Techniques: Prevent overfitting with dropout and L1/L2 regularization[14].
Hyperparameter Optimization: Use grid search or Bayesian optimization to find the best model settings.
Cross-Validation: Ensure consistent performance across different data subsets[15].
Collaborative Learning: Work with other institutions to access larger datasets[16].
13. Goodfellow et al., 2016 – Deep Learning Book; 14. Srivastava et al., 2014 – Journal of Machine Learning Research; 15. Kohavi, 1995 – IJCAI Conference; 16. Cancer Genome Atlas, 2014
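For the Advanced Feature Engineering item, the sketch below finds the first principal component with power iteration on the sample covariance matrix and projects each sample onto it, reducing two features to one. The toy features are hypothetical and perfectly correlated. Two caveats: features with different units should normally be standardized first, and the direction of highest variance is not guaranteed to be the most diagnostic one.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(data: Sequence[Sequence[float]],
                              iters: int = 100) -> Tuple[List[float], List[float]]:
    """Return (feature means, unit direction of greatest variance) via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in data]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("features have zero variance")
        v = [x / norm for x in w]
    return means, v


def project(row: Sequence[float], means: Sequence[float], v: Sequence[float]) -> float:
    """Score of one sample along the principal direction (its 1-D reduced feature)."""
    return sum((x - m) * c for x, m, c in zip(row, means, v))


# Hypothetical samples with two correlated features
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, direction = first_principal_component(features)
scores = [project(row, means, direction) for row in features]
print(direction, scores)
```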
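The Regularization Techniques item can be sketched as follows: inverted dropout (as commonly implemented after Srivastava et al.) randomly zeroes units during training and rescales the survivors so the expected activation stays the same, while L1 and L2 penalties add a cost on weight size to the training loss. The penalty strengths and the data loss of 0.5 are arbitrary example values.

```python
import random
from typing import List, Sequence


def dropout(activations: Sequence[float], p: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: during training, zero each unit with probability p and
    scale survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time (training=False) the input is returned unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l1_penalty(weights: Sequence[float], lam: float) -> float:
    return lam * sum(abs(w) for w in weights)  # tends to drive some weights to zero


def l2_penalty(weights: Sequence[float], lam: float) -> float:
    return lam * sum(w * w for w in weights)  # discourages large weights


def regularized_loss(data_loss: float, weights: Sequence[float],
                     l1: float = 0.0, l2: float = 0.0) -> float:
    return data_loss + l1_penalty(weights, l1) + l2_penalty(weights, l2)


rng = random.Random(42)
hidden = dropout([1.0] * 10, p=0.5, rng=rng)  # each entry becomes 0.0 or 2.0
loss = regularized_loss(0.5, [1.0, -2.0, 3.0], l1=0.1, l2=0.01)
print(hidden, loss)
```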
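For the Hyperparameter Optimization item, a minimal grid search tries every combination of candidate values and keeps the best-scoring one. In practice `score_fn` would train a model and score it on validation data; here `toy_validation_score` is a hypothetical function that simply peaks at lr=0.01 and dropout=0.3. Bayesian optimization, the other method the slide mentions, is not shown; dedicated libraries such as scikit-optimize provide it.

```python
import itertools
import math
from typing import Any, Callable, Dict, Sequence, Tuple


def grid_search(param_grid: Dict[str, Sequence[Any]],
                score_fn: Callable[[Dict[str, Any]], float]) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in `param_grid`; return the best params and score."""
    names = sorted(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = -math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # in practice: train, then score on validation data
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params: Dict[str, Any]) -> float:
    """Hypothetical score peaking at lr=0.01, dropout=0.3 (stands in for real training)."""
    return -(math.log10(params["lr"]) + 2) ** 2 - (params["dropout"] - 0.3) ** 2


grid = {"lr": [0.1, 0.01, 0.001], "dropout": [0.1, 0.3, 0.5]}
best, score = grid_search(grid, toy_validation_score)
print(best)
```

The number of trials grows multiplicatively with each added hyperparameter (here 3 x 3 = 9), which is the main motivation for Bayesian optimization on larger search spaces.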
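For the Cross-Validation item, one detail matters for CT data: slices from the same patient are highly correlated, so if they end up in both the training and validation sets the scores become optimistic. The sketch below, which assumes each sample is tagged with a patient ID, performs k-fold cross-validation at the patient level; scikit-learn's `GroupKFold` implements the same idea. The patient IDs are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def patient_level_k_fold(sample_patients: Sequence[str], k: int,
                         seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, validation) pairs, keeping each patient's
    samples together so no patient appears in both sets of the same split."""
    patients = sorted(set(sample_patients))
    if not 2 <= k <= len(patients):
        raise ValueError("k must be between 2 and the number of patients")
    random.Random(seed).shuffle(patients)
    patient_folds = [set(patients[i::k]) for i in range(k)]  # round-robin assignment
    splits = []
    for held_out in patient_folds:
        val = [i for i, p in enumerate(sample_patients) if p in held_out]
        train = [i for i, p in enumerate(sample_patients) if p not in held_out]
        splits.append((train, val))
    return splits


# Hypothetical slices: several may come from the same patient
slice_patients = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
for train_idx, val_idx in patient_level_k_fold(slice_patients, k=3):
    print(len(train_idx), len(val_idx))
```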
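The Collaborative Learning item does not say how institutions would share data. One option, not stated in the slides, is federated learning, which also speaks to the privacy concern listed under Current Challenges: each hospital trains locally and only model parameters are shared. The sketch shows just the server-side aggregation step of Federated Averaging (FedAvg); local training, communication, and secure aggregation are omitted, and the parameter values and dataset sizes are made up.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: average each parameter across institutions,
    weighted by the number of training samples each institution used."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(n_params)]


# Hypothetical parameters after one round of local training at two hospitals
hospital_a = [1.0, 2.0]  # trained on 100 scans
hospital_b = [3.0, 4.0]  # trained on 300 scans
global_model = federated_average([hospital_a, hospital_b], [100, 300])
print(global_model)
```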
Flowchart of the Project
Collect CT Scans → Train AI Model → Test with Real Data → Use in Hospitals
Key Features:
1. AI Models that Learn from Images: CNNs detect nodules automatically, learning the relevant features directly from the images.
2. Multi-scale Analysis: Examines scans at several scales, capturing both large structures and small details.
3. Fast Results: Can give doctors answers in real time.
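The slides do not specify how multi-scale analysis is implemented. One simple approach, assumed here, is an image pyramid: the slice is repeatedly downsampled by averaging 2x2 blocks, so coarse levels summarize large structures while the full-resolution level keeps small details; each level could then be passed to a CNN. The 4x4 patch is a toy example.

```python
from typing import List

Matrix = List[List[float]]


def downsample2x(img: Matrix) -> Matrix:
    """Halve each dimension by averaging non-overlapping 2x2 blocks
    (a trailing odd row or column is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4
             for j in range(w)]
            for i in range(h)]


def image_pyramid(img: Matrix, levels: int) -> List[Matrix]:
    """Return [full resolution, 1/2, 1/4, ...], stopping early if the image gets too small."""
    pyramid = [img]
    while len(pyramid) < levels and len(pyramid[-1]) >= 2 and len(pyramid[-1][0]) >= 2:
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


ct_patch = [[1, 1, 3, 3],
            [1, 1, 3, 3],
            [5, 5, 7, 7],
            [5, 5, 7, 7]]
for level in image_pyramid(ct_patch, levels=3):
    print(level)
```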
Workflow of the Project
Phase 1: Data Collection, Infrastructure Setup, Initial Design
Phase 2: Model Training, Optimization, Testing
Phase 3: Clinical Trials, UI Development, Deployment
Prototype Overview
Dataset Used
Source: LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative)
Format: DICOM medical imaging files and associated XML annotations
Structure: Each patient folder contains multiple CT scans and related metadata
Key Steps
1. DICOM Parsing and Loading
2. Visualization
3. Annotation Parsing
4. Preprocessing
5. Modeling
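Because each patient folder holds many DICOM slices, loading usually involves two steps: converting stored pixel values to Hounsfield units (HU = value x RescaleSlope + RescaleIntercept) and ordering slices by their position along the body axis, since directory order need not match anatomical order. The stdlib-only sketch below uses a hypothetical `CTSlice` container whose fields mirror DICOM attributes that pydicom exposes; the commented lines show how it could be filled with pydicom, and the pixel values are toy numbers.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class CTSlice:
    """One CT slice. Fields mirror DICOM attributes that pydicom exposes as
    ds.pixel_array, ds.RescaleSlope, ds.RescaleIntercept, ds.ImagePositionPatient[2]."""
    pixels: Matrix    # stored pixel values
    slope: float      # RescaleSlope
    intercept: float  # RescaleIntercept
    z: float          # slice position along the patient axis, in mm


def to_hounsfield(s: CTSlice) -> Matrix:
    """Convert stored values to Hounsfield units: HU = value * slope + intercept."""
    return [[v * s.slope + s.intercept for v in row] for row in s.pixels]


def build_volume(slices: List[CTSlice]) -> List[Matrix]:
    """Sort slices by z position, then convert each one to HU."""
    return [to_hounsfield(s) for s in sorted(slices, key=lambda s: s.z)]


# With pydicom, each slice could be filled roughly like this (not run here):
#   ds = pydicom.dcmread(path)
#   CTSlice(ds.pixel_array.tolist(), float(ds.RescaleSlope),
#           float(ds.RescaleIntercept), float(ds.ImagePositionPatient[2]))
slice_a = CTSlice([[0, 1000]], 1.0, -1024.0, z=-10.0)
slice_b = CTSlice([[24, 2048]], 1.0, -1024.0, z=-20.0)
volume = build_volume([slice_a, slice_b])  # files listed out of order on purpose
print(volume)
```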
Key Steps in Prototype
1. DICOM Parsing and Loading
   • Patient CT scans loaded using pydicom
   • Metadata extracted
   • Pixel arrays created from individual slices
2. Visualization
   • Individual 2D CT slices visualized using matplotlib
   • Helps verify image quality and the position of tumor regions
3. Annotation Parsing
   • XML files parsed to detect annotations such as nodules and their malignancy level
   • Converts annotations into usable input for the classification model
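Annotation parsing can be sketched with Python's built-in `xml.etree.ElementTree`. The tag names (`readingSession`, `unblindedReadNodule`, `malignancy`, `roi`, `edgeMap`, ...) and the `http://www.nih.gov` namespace follow the LIDC-IDRI XML layout as we understand it; they are assumptions to verify against the actual files. The `malignancy_label` rule (ratings 4-5 as malignant, 1-2 as benign, 3 left out) is one common convention, assumed here rather than taken from the slides. Since LIDC-IDRI scans are read by several radiologists, the same nodule can appear in several reading sessions; merging those is not handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Assumed default namespace of LIDC-IDRI annotation files
NS = {"lidc": "http://www.nih.gov"}


def parse_nodules(xml_text: str) -> List[Dict]:
    """Extract each reader's nodules: ID, malignancy rating, and per-slice outlines."""
    root = ET.fromstring(xml_text)
    nodules = []
    for session in root.findall("lidc:readingSession", NS):
        for nodule in session.findall("lidc:unblindedReadNodule", NS):
            rating = nodule.findtext("lidc:characteristics/lidc:malignancy",
                                     namespaces=NS)
            rois = []
            for roi in nodule.findall("lidc:roi", NS):
                xs = [int(e.text) for e in roi.findall("lidc:edgeMap/lidc:xCoord", NS)]
                ys = [int(e.text) for e in roi.findall("lidc:edgeMap/lidc:yCoord", NS)]
                rois.append({"z": float(roi.findtext("lidc:imageZposition", namespaces=NS)),
                             "outline": list(zip(xs, ys))})
            nodules.append({"id": nodule.findtext("lidc:noduleID", namespaces=NS),
                            # very small nodules may carry no characteristics at all
                            "malignancy": int(rating) if rating else None,
                            "rois": rois})
    return nodules


def malignancy_label(rating: Optional[int]) -> Optional[int]:
    """Hypothetical binarisation: 4-5 -> 1, 1-2 -> 0, 3 or missing -> None (excluded)."""
    if rating is None or rating == 3:
        return None
    return 1 if rating >= 4 else 0


SAMPLE = """<LidcReadMessage xmlns="http://www.nih.gov">
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 001</noduleID>
      <characteristics><malignancy>4</malignancy></characteristics>
      <roi>
        <imageZposition>-125.0</imageZposition>
        <edgeMap><xCoord>310</xCoord><yCoord>220</yCoord></edgeMap>
        <edgeMap><xCoord>311</xCoord><yCoord>221</yCoord></edgeMap>
      </roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>"""

nodules = parse_nodules(SAMPLE)
print(nodules[0]["id"], malignancy_label(nodules[0]["malignancy"]))
```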
4. Preprocessing
   • Intensity normalization
   • Resizing image dimensions
   • Thresholding lung regions
5. Modeling
   • Placeholder for CNN or other deep learning models for nodule detection and malignancy classification
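The three preprocessing steps can be sketched as below, assuming the slice has already been converted to Hounsfield units (HU). The [-1000, 400] HU window and the -400 HU lung threshold are assumed typical values, not taken from the slides. A real lung mask would also remove air outside the body (for example, connected regions touching the image border) and fill holes, and resizing usually uses interpolation (e.g. `scipy.ndimage.zoom`) rather than the nearest-neighbour lookup used here for simplicity.

```python
from typing import List

Matrix = List[List[float]]


def normalize_hu(hu: Matrix, lo: float = -1000.0, hi: float = 400.0) -> Matrix:
    """Intensity normalization: clip to the [lo, hi] HU window, then scale to [0, 1]."""
    span = hi - lo
    return [[(min(max(v, lo), hi) - lo) / span for v in row] for row in hu]


def resize_nearest(img: Matrix, out_h: int, out_w: int) -> Matrix:
    """Nearest-neighbour resize to out_h x out_w (e.g. to a fixed CNN input size)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def lung_mask(hu: Matrix, threshold: float = -400.0) -> List[List[int]]:
    """Thresholding: mark air-like pixels (below `threshold` HU) as candidate lung."""
    return [[1 if v < threshold else 0 for v in row] for row in hu]


hu_slice = [[-1000, -850], [40, 400]]  # toy values: two air-like, two tissue-like
print(normalize_hu(hu_slice))
print(resize_nearest(hu_slice, 4, 4))
print(lung_mask(hu_slice))
```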
Conclusion
Leveraging advanced machine learning techniques enhances lung cancer detection accuracy.
Key strategies:
• Utilize deep learning models like CNNs.
• Integrate multi-omics data for comprehensive analysis.
• Apply transfer learning with pre-trained models.
• Use ensemble methods and data augmentation for robustness.
• Employ advanced feature engineering and regularization for better generalization.
Potential increase in detection accuracy by up to 4%.
Leads to earlier and more precise diagnoses.
Improves patient outcomes and saves lives.
Continuous innovation and collaboration across institutions drive further advancements.
References
1. American Cancer Society. (2022). Key statistics for lung cancer. https://www.cancer.org/cancer/lung-cancer/about/key-statistics.html
2. American Lung Association. (n.d.). Lung cancer symptoms and risk factors. https://www.lung.org/lung-health-diseases/lung-disease-lookup/lung-cancer/symptoms-causes-risk-factors
3. World Health Organization. (2021). Cancer. https://www.who.int/news-room/fact-sheets/detail/cancer
4. Mayo Clinic. (n.d.). Lung cancer - symptoms and causes. https://www.mayoclinic.org/diseases-conditions/lung-cancer/symptoms-causes/syc-20374620
5. Lung Cancer Foundation of America. (n.d.). Lung cancer facts. https://lcfamerica.org/lung-cancer-info/lung-cancer-facts/
6. Centers for Disease Control and Prevention (CDC). (2023). Lung cancer statistics. https://www.cdc.gov/cancer/lung/statistics/index.htm
7. Cancer.Net. (2023). Lung cancer: Types of treatment. https://www.cancer.net/cancer-types/lung-cancer/types-treatment
8. WebMD. (n.d.). Types of lung cancer. https://www.webmd.com/lung-cancer/guide/lung-cancer-types
9. American Cancer Society. (2022). Types of lung cancer. https://www.cancer.org/cancer/lung-cancer/about/what-is.html
10. National Cancer Institute. (2023). Lung cancer—patient version. https://www.cancer.gov/types/lung
11. Cleveland Clinic. (n.d.). Lung cancer overview. https://my.clevelandclinic.org/health/diseases/4436-lung-cancer
12. Cancer Treatment Centers of America. (n.d.). Lung cancer diagnosis and staging. https://www.cancercenter.com/cancer-types/lung-cancer/diagnosis
13. Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., ... & Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60-88. https://doi.org/10.1016/j.media.2017.07.005
14. Shen, W., Zhou, M., Yang, F., Yang, C., & Tian, J. (2015). Multi-scale convolutional neural networks for lung nodule classification. In Information Processing in Medical Imaging (pp. 588-599). Springer. https://doi.org/10.1007/978-3-319-19992-4_46
15. Setio, A. A. A., Traverso, A., de Bel, T., Berens, M. S., van den Bogaard, C., Cerello, P., ... & van Ginneken, B. (2017). Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge. Medical Image Analysis, 42, 1-13. https://doi.org/10.1016/j.media.2017.06.015
16. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 25, 1097–1105. https://papers.nips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
Thank You