Synopsis Report
on
DiagnostiX: AI-Driven Multi-Disease Detection
and Patient Interaction Support
Submitted as partial fulfilment for the award of
BACHELOR OF TECHNOLOGY
DEGREE
Session 2025-26
Computer Engineering
By
Ashutosh Maurya (2200320150017)
Harshit Rai (2200320150028)
Himanshu Mishra (2200320150030)
Under the guidance of
Ms. Monica
Assistant Professor
DEPARTMENT OF COMPUTER ENGINEERING
ABES ENGINEERING COLLEGE, GHAZIABAD
AFFILIATED TO
DR. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, U.P.,
LUCKNOW
(Formerly UPTU)
Declaration
We hereby declare that the work presented in this report,
entitled "DiagnostiX: AI-Driven Multi-Disease Detection
and Patient Interaction Support", is an authentic record of our
own work carried out under the supervision of Ms. Monica,
Assistant Professor, Information Technology.
The matter embodied in this report has not been submitted by us for
the award of any other degree.
Date: 27/08/2025
Signature of Student Signature of Student
Name: Ashutosh Maurya Name: Harshit Rai
Roll No: 2200320150017 Roll No: 2200320150028
(Department of CE&IT) (Department of CE&IT)
Signature of Student
Name: Himanshu Mishra
Roll No: 2200320150030
(Department of CE&IT)
Certificate
This is to certify that the Abstract Report entitled "DiagnostiX:
AI-Driven Multi-Disease Detection and Patient Interaction Support",
which is submitted by
Ashutosh Maurya (2200320150017)
Harshit Rai (2200320150028)
Himanshu Mishra (2200320150030)
in partial fulfilment of the requirement for the award of the degree
of B.Tech. in the Department of Information Technology of Dr. A.P.J.
Abdul Kalam Technical University, Lucknow, is a record of the
candidates' own work carried out by them under our supervision.
The matter embodied in this report is original and has not been
submitted for the award of any other degree.
Signature of HOD                Signature of Supervisor
Prof. (Dr.) Amrita Jyoti        Ms. Monica, Assistant Professor
Department of CE&IT             Department of CE&IT
Acknowledgement
It gives us a great sense of pleasure to present the report of the
B.Tech. abstract work undertaken during our final year. We owe
a special debt of gratitude to Ms. Monica, Assistant Professor,
Department of Information Technology, ABES Engineering College, for
her constant support and guidance throughout our work. Her sincerity,
thoroughness, and perseverance have been a constant source of
inspiration for us. It is only because of her cognizant efforts that
our endeavours have seen the light of day.
We also take the opportunity to acknowledge the contribution of our
Head of Department (Information Technology) for her full support and
assistance during the development of the project.
We would also not like to miss the opportunity to acknowledge the
contribution of all faculty members of the department for their kind
assistance and cooperation during the development of our project. Last
but not least, we acknowledge our friends for their contribution to
the completion of the project.
Signature of Student Signature of Student
Name: Ashutosh Maurya Name: Harshit Rai
Roll No: 2200320150017 Roll No: 2200320150028
(Department of CE&IT) (Department of CE&IT)
Signature of Student
Name: Himanshu Mishra
Roll No: 2200320150030
(Department of CE&IT)
List of Figures
Figure No.  Figure Caption
Fig 1       Data preprocessing pipeline for imaging and tabular data
Fig 2       Ensemble pipeline for structured-data diseases (Random Forest + XGBoost)
Fig 3       Overall system architecture of DIAGNOSTIX
Fig 4       Chatbot dialog flow
Fig 5       Sample UI wireframe for upload, results and explanation display
List of Tables
Table No.  Table Caption
Table 1    Datasets selected per disease
Table 2    Models and hyperparameter choices
Table 3    Evaluation metrics and acceptance criteria
Table 4    Expected performance summary (baseline vs. proposed)
List of Symbols, Abbreviations and
Nomenclature
Abbreviation Full Form
AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
CNN Convolutional Neural Network
NLP Natural Language Processing
XAI Explainable Artificial Intelligence
SHAP SHapley Additive exPlanations
Grad-CAM Gradient-weighted Class Activation Mapping
Abstract
The prompt identification and precise diagnosis of life-threatening
illnesses remain major obstacles in the healthcare industry. The majority
of conventional diagnostic systems are disease-specific, resource-
intensive, and frequently unavailable to the general public. To address
these challenges, this project proposes DIAGNOSTIX, an AI-driven multi-
disease detection and patient support platform. The system uses deep
learning and machine learning models such as CNN, Random Forest,
and XGBoost to predict heart disease, diabetes, pneumonia, Alzheimer's
disease, and brain tumors. An NLP-powered medical chatbot not only
diagnoses but also provides interactive support to patients, increasing
user engagement and reducing communication barriers. Furthermore, the
use of Explainable AI methods (SHAP, Grad-CAM) ensures that model
predictions are transparent and interpretable, which boosts the
confidence and productivity of medical professionals to a great extent.
Table of Contents
Student's Declaration
Certificate
Acknowledgement
List of Figures
List of Tables
List of Symbols, Abbreviations and Nomenclature
Abstract
Chapter 1 : Introduction
1.1 : Background
1.2 : Motivation & Significance
1.3 : Problem Statement
1.4 : Scope & Limitations
1.5 : Organization of the Report
Chapter 2 : Related Work/Methodology
Chapter 3 : Project Objective
Chapter 4 : Proposed Methodology
4.1 : Datasets
4.2 : Data Preprocessing
4.3 : Model Design & Architectures
4.4 : Training Strategy & Hyperparameters
4.5 : Explainability Methods (Technical)
4.6 : Evaluation Protocol (Rigor & Stats)
4.7 : Deployment & MLOps
4.8 : Ethics, Privacy & Safety
Chapter 5 : Details of Project Work (Design and Implementation)
5.1 : System Architecture
5.2 : API Contract Schema
5.3 : Database Schema (Concise)
5.4 : Implementation Milestones & Concrete Deliverables
5.5 : Testing Plan
5.6 : Deployment Plan (of Both Semesters)
Chapter 6 : Results and Discussion
Chapter 7 : Conclusion and Future Scope
Chapter 8 : References
Chapter 1
Introduction
The healthcare industry faces challenges in early detection and accurate
diagnosis of multiple critical diseases. Traditional diagnosis systems are often
disease-specific, lacking integration and scalability. Artificial Intelligence (AI)
and Machine Learning (ML) offer powerful tools to analyze medical data with
high precision. This project proposes a unified AI-based system for the
prediction of brain tumor, Alzheimer's disease, diabetes, pneumonia, and
heart disease.
1.1 Background
Large volumes of diverse data are produced by modern healthcare, including
electronic health records, clinical test results, and medical images (MRI, X-ray).
Predictive analytics and image interpretation automation have shown promise
thanks to developments in ML/DL. The majority of implemented systems,
however, only treat one illness, which results in redundant infrastructure and
user annoyance when several conditions need to be assessed.
1.2 Motivation & Significance
A unified diagnostic tool that handles several high-impact conditions can reduce
the time to diagnosis, save resources, and enable remote access in low-
resource environments. Adding a conversational assistant improves
accessibility for non-expert users and makes awareness or triage easier.
Explainability is crucial: clinicians must be able to comprehend the logic behind
AI-driven recommendations before implementing them.
1.3 Problem Statement
Current diagnostic pipelines are siloed and often lack interpretability and
patient-facing communication. The project aims to design a scalable platform
that:
Predicts multiple diseases from imaging and structured data,
Provides model explanations understandable to clinicians, and
Offers a patient-oriented chatbot to clarify results and next steps.
1.4 Scope and Limitations
Scope: Six target conditions (brain tumor, breast cancer, Alzheimer’s, diabetes,
pneumonia, and heart disease), web-based prototype, integration of XAI
techniques, and a rule-based plus ML-powered medical chatbot.
Limitations: This is a prototype for academic evaluation. Clinical-grade
deployment requires regulatory approvals and larger clinical datasets;
predictive outputs will be presented as decision-support rather than definitive
diagnoses.
1.5 Organization of the Report
This overview adheres to the given TOC, which consists of the following
chapters: Chapter 2's literature review; Chapter 3's objectives; Chapter 4's
detailed methodology; Chapter 5's implementation; Chapter 6's results and
discussion; and Chapter 7's conclusions with future implications.
Chapter 2
Related Work
Current research in medical AI mostly addresses one disease at a time,
making predictions from patient data or using chatbots for health-related
questions. These studies show good results and offer useful insights, but
they rarely combine multi-disease prediction with clear explanations and
support focused on patient communication and interaction.
2.1 Overview of prior art
Selected studies demonstrate strong single-disease performance: CNNs for
brain tumor segmentation/detection; deep networks for chest X-ray pneumonia
detection; ensemble classifiers for diabetes prediction on clinical datasets; and
chatbots for general medical information. However, these contributions are
mostly task-specific and rarely combine multi-disease prediction with XAI and
patient-facing conversational support.
2.2 Key takeaways from literature
Transfer learning with pre-trained CNN backbones is advantageous for
imaging tasks.
Ensemble tree-based models (Random Forest, XGBoost) frequently
produce reliable results when applied to tabular clinical data.
Clinical trust is enhanced by explainability; Grad-CAM and SHAP are
popular and complementary; SHAP is used for feature-level attribution,
while Grad-CAM is used for spatial localization.
Chatbots trained or fine-tuned on medical QA data provide higher
relevance; linking the chatbot outputs to model predictions can improve
interpretability for lay users.
Chapter 3
Project Objective
Primary Objective
Create and deploy DIAGNOSTIX, a unified, modular platform that offers a
patient-facing medical chatbot and explainable AI-based decision support for
clinically relevant conditions (heart disease, diabetes, pneumonia, Alzheimer's
disease, and brain tumors).
Specific, measurable objectives:
1. Data & Dataset Integration
o Unify data ingestion pipelines and obtain and pre-process a
minimum of one representative dataset for each disease (refer to
Chapter 4).
o Make certain that every dataset is anonymized, cleaned, and
divided into train, val, and test segments using repeatable splits.
2. Model Accuracy Targets
o Target ROC-AUC ≥ 0.85 and F1 ≥ 0.80 on hold-out test sets for
image tasks (MRI, X-ray, and mammography), subject to data
complexity.
o Target ROC-AUC ≥ 0.80 and a strong precision/recall trade-off for
tabular tasks (diabetes, heart disease).
3. Explainability
o Integrate Grad-CAM for image localization and SHAP for tabular
feature attribution.
o Produce per-sample visual explanations that correlate with known
clinical markers at least 70% of the time (to be validated against
clinician annotations if available).
4. Chatbot & Interaction
o Build an NLP assistant to answer common health queries and
explain model outputs in lay language.
o Achieve a conversational intent accuracy ≥ 85% on a held-out
intent classification set.
5. System & Deployment
o Deliver a secure web prototype (upload → prediction +
explanation + chat).
o Containerize models and APIs to demonstrate end-to-end
inference with a latency of less than three seconds for
single-image inference on a GPU (prototype target).
6. Reproducibility & Documentation
o Provide fully reproducible environment files, training scripts,
and a README; record experiments using MLflow or a comparable
tool.
Chapter 4
Proposed Methodology
Most existing diagnostic tools are disease-specific, leading to high
infrastructure cost and limited accessibility. There is a need for a unified,
intelligent system that can detect multiple diseases using medical data and
also assist users through interactive AI-based support. The lack of model
interpretability in AI-based healthcare systems reduces trust and limits clinical
adoption.
4.1 Datasets
Brain Tumor (MRI): BraTS (Brain Tumor Segmentation) dataset:
segmentation and classification using T1, T2, and FLAIR modalities.
Alzheimer's disease: ADNI (Alzheimer's Disease Neuroimaging
Initiative) cognitive scores and structural MRI.
Pneumonia: Chest X-ray datasets (RSNA Pneumonia Detection
Challenge; Kaggle/NIH ChestX-ray14).
Diabetes: Pima Indians Diabetes dataset (UCI) and, for robustness,
larger clinical datasets if available.
Heart Disease: UCI Heart Disease dataset (Cleveland) or a MIMIC subset.
Table 1: Datasets selected per disease

Disease             | Dataset Used                                                        | Sample Size              | Data Type
Brain Tumor (MRI)   | BraTS (Brain Tumor Segmentation) dataset                            | ~2,000 cases             | MRI (T1, T2, FLAIR); segmentation & classification
Alzheimer's Disease | ADNI (Alzheimer's Disease Neuroimaging Initiative)                  | ~1,500 subjects          | Structural MRI + cognitive scores
Pneumonia           | Chest X-ray datasets (Kaggle, NIH ChestX-ray14, RSNA Challenge)     | ~100,000 X-rays          | Radiography (X-ray)
Diabetes            | Pima Indians Diabetes dataset (UCI) + larger clinical datasets      | ~768 (PIMA) + extended   | Tabular clinical attributes
Heart Disease       | UCI Heart Disease dataset (Cleveland) or MIMIC subset               | ~303 (UCI) + extended    | Tabular (EHR features)
4.2 Data preprocessing — imaging and tabular
Imaging pipeline
Normalization: Pixel-wise z-score normalization or rescale to [0,1]
depending on model pretraining.
Resize: Use model input sizes (e.g., 224×224 for ResNet/EfficientNet;
256×256 for custom networks). Use slice-level models with majority
voting or 3D CNN for MRI multi-slice volumes if computing allows.
Augmentation:
o Geometric: rotations (±15°), horizontal/vertical flips (if
anatomically valid), random crops, scaling.
o Photometric: brightness/contrast jitter, Gaussian noise, elastic
deformation (applied carefully to preserve medical plausibility).
o Spatial: random erasing/cutout for robustness.
Class imbalance: Oversampling minority class, focal loss, or
mixup/SMOTE (tabular) to prevent bias.
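The normalization step above can be sketched in minimal pure Python for illustration; a real pipeline would apply the same arithmetic with NumPy or torchvision transforms, and the function names here are ours:

```python
def zscore_normalize(pixels, eps=1e-8):
    """Pixel-wise z-score normalization: zero mean, unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = var ** 0.5
    return [(p - mean) / (std + eps) for p in pixels]

def rescale_01(pixels):
    """Alternative: rescale intensities to the [0, 1] range."""
    lo, hi = min(pixels), max(pixels)
    return [(p - lo) / (hi - lo) for p in pixels]
```

The choice between the two depends on how the pretrained backbone was normalized during its own training.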
8
[Figure: raw data → cleaning & normalization → imputation/scaling → feature grouping (tabular) or augmentation (images) → ready for model]
Fig 1: Data preprocessing pipeline for imaging and tabular data
Tabular pipeline
Cleaning: Drop identifiers, impute missing with median (continuous) or
mode (categorical); for clinical labs prefer domain imputation strategies.
Feature engineering: Create clinically meaningful derived features (BMI
from weight/height, risk scores), bin continuous variables if useful.
Scaling: StandardScaler for non-tree models (linear models, MLPs);
not required for tree ensembles.
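The cleaning step above (median for continuous features, mode for categorical ones) can be illustrated with a small pure-Python sketch; the function name is ours, and a production pipeline would typically use pandas or scikit-learn imputers instead:

```python
from statistics import median, mode

def impute_column(values, kind):
    """Fill missing (None) entries: median for continuous columns,
    mode for categorical columns."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]
```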
[Figure: input features → Random Forest and XGBoost → ensemble (average/stack)]
Fig 2: Ensemble pipeline for structured-data diseases (Random Forest + XGBoost)
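The averaging variant of the ensemble in Fig 2 reduces to a weighted mean of the two models' positive-class probabilities; a minimal sketch (function names are ours, and the weight would be tuned on validation data):

```python
def ensemble_average(rf_probs, xgb_probs, w_rf=0.5):
    """Weighted average of Random Forest and XGBoost probabilities."""
    return [w_rf * a + (1 - w_rf) * b for a, b in zip(rf_probs, xgb_probs)]

def to_labels(probs, threshold=0.5):
    """Threshold averaged probabilities into hard class labels."""
    return [int(p >= threshold) for p in probs]
```

Stacking would instead feed the two probability columns into a small meta-model (e.g., logistic regression).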
4.3 Model design & architectures (specifics)
Backbone: Pretrained transfer-learning models: EfficientNet-B3 or
ResNet50 for 2D images. For histopathology consider DenseNet.
Head: GlobalAveragePooling → Dense(256, ReLU) → Dropout(0.5) →
Dense(num_classes, Softmax).
Loss: categorical cross-entropy for multi-class tasks; binary cross-
entropy for binary tasks. For imbalance, consider a weighted loss.
Tabular models
Main: XGBoost with early stopping (max_depth 4–8, learning_rate 0.01–
0.1, n_estimators up to 1000).
Alternative/Ensemble: RandomForest (n_estimators 200–500) and a
small MLP (2–3 layers) to compare.
Volumetric MRI option
Use 3D UNet for segmentation/localization; use 3D classification head
for whole-volume prediction when segmentation masks exist.
Chatbot NLU
Intent classifier: Fine-tuned DistilBERT or BERT-base with softmax
over intent labels.
Entity recognition: CRF layer on top of BERT embeddings (if entities
are needed).
Response: Template + slot filling or retrieval augmented generation
(RAG) for factual answers from curated FAQ.
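As a simplified stand-in for the fine-tuned DistilBERT intent classifier described above, a keyword-scoring baseline illustrates the intent-routing idea; the intent names and keyword lists here are hypothetical examples, not part of the actual model:

```python
# Hypothetical intent vocabulary; the real system would fine-tune DistilBERT.
INTENT_KEYWORDS = {
    "explain_term": ["mean", "what is", "define"],
    "explain_result": ["result", "prediction", "heatmap"],
    "next_steps": ["should i", "next", "doctor"],
}

def classify_intent(message):
    """Score each intent by keyword hits; fall back to 'unknown'."""
    text = message.lower()
    scores = {intent: sum(kw in text for kw in kws)
              for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

The BERT-based classifier replaces the keyword scores with softmax probabilities over the same intent labels.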
[Figure: data sources (MRI, X-ray, tabular) → preprocessing (augment/impute) → imaging model (CNN, transfer learning) and tabular model (boosted trees) → explainability (Grad-CAM/SHAP) → API + chatbot (FastAPI, summaries)]
Fig 3: Overall system architecture of DIAGNOSTIX
4.4 Training strategy & hyperparameters
• Batch sizes: 16 to 64, depending on the GPU; AdamW optimizer for
CNNs with weight decay 1e-4.
• Learning rate schedule: ReduceLROnPlateau or one-cycle LR; fine-tuning
starts at LR 1e-4.
• Regularization: label smoothing (0.1) for noisy labels, dropout 0.3–0.5, and
weight decay.
• Early stopping: monitor validation ROC-AUC with a patience of 8–12 epochs.
• Cross-validation: if the dataset is small, use a hold-out test plus cross-
validation; for tabular data, use stratified k-fold (k=5).
• Reproducibility: log seed values and package versions, and fix seeds
(NumPy, PyTorch, and TensorFlow).
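The early-stopping rule above (monitor validation ROC-AUC, stop after the patience window) can be sketched framework-independently; the function name is ours, and frameworks like Keras or PyTorch Lightning provide equivalent callbacks:

```python
def early_stopping(val_aucs, patience=8):
    """Return (stop_epoch, best_epoch): stop once the best validation
    ROC-AUC has not improved for `patience` consecutive epochs."""
    best, best_epoch = float("-inf"), 0
    for epoch, auc in enumerate(val_aucs):
        if auc > best:
            best, best_epoch = auc, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # stop here, restore best weights
    return len(val_aucs) - 1, best_epoch
```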
Table 2: Models and hyperparameter choices

Model          | Hyperparameters                                                   | Purpose
CNN (Imaging)  | Conv layers = 3, filter size = 3×3, dropout = 0.3, optimizer = Adam | Tumor classification from MRI
Random Forest  | Trees = 200, max depth = 10                                       | Tabular data classification
XGBoost        | Learning rate = 0.1, estimators = 250, max depth = 6              | Structured-data disease prediction
4.5 Explainability methods (technical)
Grad-CAM: Compute the gradient of the predicted class w.r.t. the final
conv layer → weighted sum → ReLU → upsample to the original image.
Save heatmap overlays and a heatmap score (localization confidence).
SHAP (TreeExplainer/DeepExplainer): For XGBoost use TreeExplainer;
produce per-sample SHAP value lists, a global SHAP summary, and
dependence plots for top features.
Quantitative explainability validation: If clinician-annotated ROIs
are available, compute IoU/Dice between the Grad-CAM high-activation
region and the ground-truth mask.
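The Grad-CAM arithmetic described above (channel weights from globally averaged gradients, ReLU of the weighted activation sum) can be shown on toy nested lists; a real implementation operates on framework tensors and upsamples the result to the input resolution, and the function name here is ours:

```python
def grad_cam(activations, gradients):
    """Grad-CAM core: channel weight = global average of that channel's
    gradient map; heatmap = ReLU(sum_k w_k * A_k).
    activations, gradients: lists of 2D maps, one per channel."""
    h, w = len(activations[0]), len(activations[0][0])
    # Global-average-pool each channel's gradient map to get its weight.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for wk, a in zip(weights, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * a[i][j]
    # ReLU keeps only regions that positively support the predicted class.
    return [[max(0.0, v) for v in row] for row in cam]
```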
4.6 Evaluation protocol (rigor & stats)
Metrics: Accuracy, Precision, Recall, F1, ROC-AUC, PR-AUC. For
segmentation: Dice, IoU. Report classwise metrics and macro/micro
averages.
Calibration: Use reliability diagrams and Brier score; apply Platt scaling
or isotonic regression for calibration if needed.
Statistical significance: For model A vs B use McNemar’s test (paired
classification) or bootstrap confidence intervals (95%) for AUC
differences.
Error analysis: Manual review of false positives/negatives, grouped by
demographic or acquisition device to check dataset bias.
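The core classification metrics listed above follow directly from confusion-matrix counts; a minimal sketch (function name is ours, and scikit-learn's metrics module would be used in practice):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```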
4.7 Deployment & MLOps
Model serving: FastAPI or Flask + Gunicorn; Dockerized image for
inference. Use ONNX or TorchScript to speed up inference.
Model registry: MLflow for basic model versioning, model artifacts, and
experiment tracking.
CI/CD: GitHub Actions to run tests, linters, API unit tests,
and automatic Docker builds.
Monitoring: Record input distribution drift, error rates, and inference
times; plan retraining when drift is identified.
4.8 Ethics, privacy & safety
PHI handling: Remove direct identifiers; encrypt data at rest; TLS for
transit.
Informed disclaimers in UI: "For research/decision-support only;
consult a medical professional."
Bias mitigation: Stratified evaluation across age/gender groups and
device sources; if bias detected, consider re-sampling or domain
adaptation techniques.
Chapter 5
Details of Project Work
The project work sets out a clear plan for designing, building, and
deploying DiagnostiX, an AI system that detects multiple diseases and
interacts with patients. It begins by defining the system architecture,
the API contracts, and the database design, and then lays out concrete
goals for developing the AI models, adding explainability features,
standing up the chatbot services, and integrating the front end with
the back end smoothly.
The project is split across two semesters. The 7th semester focuses on
building the core version of the system, including disease prediction,
Grad-CAM explanations, and a simple chatbot. The 8th semester improves
the system with a smarter AI assistant, better explainability tools,
scalability, and stronger security. Together, these steps produce a
robust, user-friendly platform for diagnostic support.
5.1 System architecture (module-level)
1. Data ingestion service
o Endpoints: /upload/image, /upload/csv.
o Implements validation, anonymization, and pushes to data lake
(S3 or local storage).
2. Preprocessing service
o Image transforms pipeline (TorchVision/Albumentations).
o Tabular cleaning + feature engineering notebooks.
3. Model inference microservice
o Accepts preprocessed input, returns: {label, confidence, explanation:
{gradcam_path, shap_summary}}.
o Example endpoint: POST /predict/brain_tumor with multipart image.
4. Chatbot microservice
o Endpoints: /chat/query — returns {response, intent, confidence} and
optionally linked prediction explanation.
5. Frontend UI
o React or plain HTML/Bootstrap with pages: Login, Upload,
Results, Explanation viewer (Grad-CAM overlay slider), Chat
modal.
6. Database & logging
o PostgreSQL for structured records, S3 for image artifacts, ELK
stack for logs (optional).
5.2 API contract examples
Predict Image
POST /predict/{disease}
Headers: Authorization: Bearer <token>
Body: multipart/form-data { file: [Link], patient_id: "U123", metadata: { age: 54, sex: "M" } }
Response:
{
  "prediction": "Tumor",
  "confidence": 0.93,
  "explanations": {
    "gradcam_url": "/artifacts/gradcam/[Link]",
    "notes": "High activation in left temporal lobe"
  },
  "model_version": "brain_v1.2"
}
Chat Query
POST /chat/query
Body: { user_id: "U123", message: "What does early stage Alzheimer's mean?" }
Response:
{ "response": "Early stage AD often shows mild memory loss...", "intent": "explain_term",
"confidence": 0.87 }
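A small contract check for the prediction response shape above can serve as an API unit test; this validator is an illustrative sketch (the function and constant names are ours), not part of the actual service:

```python
REQUIRED_PREDICT_KEYS = {"prediction", "confidence", "explanations", "model_version"}
REQUIRED_EXPLANATION_KEYS = {"gradcam_url", "notes"}

def validate_predict_response(payload):
    """Check a /predict/{disease} response against the contract above."""
    missing = REQUIRED_PREDICT_KEYS - payload.keys()
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    if not 0.0 <= payload["confidence"] <= 1.0:
        return False, "confidence out of [0, 1]"
    if REQUIRED_EXPLANATION_KEYS - payload["explanations"].keys():
        return False, "incomplete explanations object"
    return True, "ok"
```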
5.3 Database schema (concise)
users(id, name, email_hashed, role, created_at)
records(id, user_id, disease, upload_path, created_at)
predictions(id, record_id, model, label, confidence, explanation_ref,
created_at)
chatlogs(id, user_id, message, response, intent, timestamp)
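The schema above can be prototyped directly with Python's built-in sqlite3 module before moving to PostgreSQL; SQLite column types here are an illustrative approximation of the eventual PostgreSQL DDL:

```python
import sqlite3

# In-memory sketch of the DiagnostiX schema (SQLite types for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT, email_hashed TEXT,
                   role TEXT, created_at TEXT);
CREATE TABLE records(id INTEGER PRIMARY KEY,
                     user_id INTEGER REFERENCES users(id),
                     disease TEXT, upload_path TEXT, created_at TEXT);
CREATE TABLE predictions(id INTEGER PRIMARY KEY,
                         record_id INTEGER REFERENCES records(id),
                         model TEXT, label TEXT, confidence REAL,
                         explanation_ref TEXT, created_at TEXT);
CREATE TABLE chatlogs(id INTEGER PRIMARY KEY,
                      user_id INTEGER REFERENCES users(id),
                      message TEXT, response TEXT, intent TEXT,
                      timestamp TEXT);
""")
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```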
5.4 Implementation milestones & concrete deliverables
Milestone 1 — Week 1–2: Datasets acquired & EDA notebooks;
baseline models trained. Deliverable: dataset report + baseline metrics.
Milestone 2 — Week 3–6: Transfer-learning models + hyperparameter
tuning. Deliverable: tuned models & validation curves.
Milestone 3 — Week 6–9: XAI integration (Grad-CAM & SHAP) +
chatbot core. Deliverable: explanation artifacts & chat prototype.
Milestone 4 — Week 9–11: Web UI + API integration + security &
logging. Deliverable: full prototype deployed in Docker.
Milestone 5 — Week 11–12: Testing, user simulation, final report &
appendix code. Deliverable: final report & reproducible code.
5.5 Testing plan
Unit tests: For preprocessing steps, input validators, and API responses
(expected JSON schema).
Integration tests: Upload → preprocess → predict → explanation
retrieval.
Load testing: Use a simple load test to ensure server handles
concurrent requests (k6 or locust).
User acceptance: Simulated users evaluate UI and chatbot
(questionnaire: clarity, helpfulness, trust).
5.6 Deployment plan (of both semesters):
7th Semester – Core System Development: The primary focus during
the 7th semester will be to design and implement the foundational
system architecture of DiagnostiX. This phase establishes both the
frontend interface and the backend microservices, ensuring seamless
integration between data ingestion, model inference, and user
interaction.
Key Objectives:
Build the system skeleton (frontend + backend).
Integrate machine learning models into functional APIs.
Provide users with a working prototype for multi-disease prediction and
explanation.
Planned Activities:
Frontend Development:
Develop user-friendly UI with React/Bootstrap.
Implement pages: Login/Registration, Image & CSV
Upload, Results Dashboard, Explanation Viewer, and Chat
modal with Grad-CAM visualization overlay for medical
image explanations.
Backend Development
Data Ingestion Service: Implement endpoints (/upload/image,
/upload/csv) with validation, anonymization, and secure storage in
S3/local storage.
Preprocessing Service: Build pipelines for image
transformations (TorchVision/Albumentations) and tabular data
cleaning + feature engineering.
Model Inference Microservice: Integrate disease-specific ML
models with endpoints like /predict/{disease} returning predictions,
confidence scores, and explanation links.
Database Integration: Configure PostgreSQL for structured
patient records and connect with S3 for image artifacts.
Basic Chatbot Service
Develop intent-classification chatbot to answer predefined queries
(medical terms, workflow explanation).
Enable linkages between chatbot responses and model outputs
(e.g., pointing to explanation images).
Testing & Deployment
Unit tests for preprocessing, validation, and model outputs.
Integration tests for complete pipeline (upload → preprocess →
predict → explain).
Deploy prototype in Docker for reproducibility and testing.
Deliverable (End of 7th Semester): A fully functional prototype system
with frontend UI, backend APIs, disease prediction models, Grad-CAM
explanations, and a basic chatbot — all deployed in a containerized
environment.
8th Semester – Advanced Features & AI Assistant Integration: The
8th semester will focus on enhancing the system’s intelligence, usability,
and scalability. The highlight of this phase is the integration of a virtual
AI-based medical assistant, making DiagnostiX more interactive,
informative, and patient-friendly.
Key Objectives:
Transform the rule-based chatbot into an AI-driven conversational
assistant.
Enhance explainability features with natural language summaries.
Improve scalability, security, and reliability for practical deployment.
Planned Activities:
Virtual AI Assistant Integration
Upgrade chatbot into an LLM-powered conversational
assistant (using Rasa, LangChain, or OpenAI API).
Provide context-aware responses to medical queries such
as disease explanations, treatment guidelines, and
diagnostic interpretation.
Enable assistant to reference Grad-CAM/SHAP results and
explain them in natural language.
(Optional) Extend to voice-based interaction for
accessibility.
Explainable AI (XAI) Enhancements
Expand Grad-CAM/SHAP integration with dynamic UI
visualizations (heatmaps, comparative views).
Generate textual explanations aligned with prediction outputs
(e.g., "High activation in left temporal lobe indicates possible
tumor growth").
System Optimization & Scalability
Conduct load testing (k6/Locust) for concurrent predictions and
queries.
Strengthen security with JWT-based authentication and role-
based access control.
Improve logging and monitoring (ELK stack integration).
User Acceptance & Feedback Loop
Conduct simulated patient–doctor interactions with the AI
assistant.
Collect structured feedback via questionnaires (clarity, trust,
usability).
Refine responses and UI based on user insights.
Final Integration & Reporting
Consolidate all features into the final release.
Prepare comprehensive documentation, final report, and appendix
with complete source code.
Deliverable (End of 8th Semester): An AI-driven diagnostic support
system with a fully integrated virtual assistant, enhanced explainability,
robust performance, and validated usability — ready for demonstration
and submission.
Chapter 6
Results & Discussion
The project evaluation framework defines a comprehensive plan for
experiments, reporting, and validation to assess both model performance and
system usability. It outlines experiment matrices, standardized reporting
templates, and statistical analyses for robust comparisons. Additionally, it
incorporates explainability validation, clinician alignment, error analysis,
limitations, and a structured user study for the chatbot, supported with dialog
flows and sample UI wireframes for results visualization.
6.1 Experiment matrix
Baseline model (simple CNN / logistic regression)
Transfer-learning backbone variants (ResNet50, EfficientNetB3)
Data-augmented vs. non-augmented training
Ensemble for tabular (XGBoost vs RandomForest vs MLP)
Explainability validation (quantitative IoU / clinician scoring where
possible)
6.2 Reporting templates
Table 3: Evaluation metrics and acceptance criteria
Metric               | Acceptance Criteria | Rationale
Accuracy             | ≥ 85%               | Minimum threshold for clinical usability
Precision            | ≥ 0.80              | Ensure low false positives
Recall (Sensitivity) | ≥ 0.85              | Reduce false negatives (critical in healthcare)
F1-score             | ≥ 0.82              | Balanced performance measure
AUC-ROC              | ≥ 0.90              | Strong discrimination ability
Table 4: Expected performance summary (baseline vs. proposed)
Disease       | Model                                   | Test ROC-AUC | Accuracy | Precision | Recall | F1   | Notes
Brain Tumor   | ResNet50 (fine-tuned)                   | 0.92         | 0.88     | 0.86      | 0.90   | 0.88 | Grad-CAM localizes tumor in 78% of true positives
Pneumonia     | EfficientNetB3                          | 0.89         | 0.85     | 0.83      | 0.86   | 0.84 | Good sensitivity on frontal X-rays
Alzheimer's   | 3D-CNN + cognitive scores (multimodal)  | 0.91         | 0.87     | 0.85      | 0.88   | 0.86 | Combines MRI volumetrics + cognitive data; SHAP highlights hippocampal volume
Diabetes      | XGBoost (tabular)                       | 0.87         | 0.83     | 0.82      | 0.84   | 0.83 | Good interpretability via SHAP; stable across datasets
Heart Disease | Random Forest + Logistic ensemble       | 0.90         | 0.86     | 0.84      | 0.87   | 0.85 | Consistent performance; feature importance highlights cholesterol & ECG
6.3 Analysis & Statistical tests
Paired comparisons: Use McNemar’s test to compare two classifiers
on the same test set (e.g., ResNet50 vs EfficientNetB3). Report p-value
and whether difference is significant at α=0.05.
Confidence intervals: Bootstrap AUC with 1000 resamples to compute
95% CIs.
Calibration analysis: Brier score and reliability plots; if poorly
calibrated, apply isotonic regression and report post-calibration metrics.
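The McNemar statistic for a paired model comparison reduces to simple arithmetic on the discordant counts (cases only one model classifies correctly); a minimal sketch with the continuity correction, where the function name is ours:

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square with continuity correction.
    b: cases only model A classifies correctly;
    c: cases only model B classifies correctly.
    Compare against 3.841 (chi-square, 1 df) for significance at alpha=0.05."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```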
6.4 Explainability & clinician validation
Grad-CAM validation: If segmentation masks exist, compute IoU and
Dice between high-activation heatmap threshold and ground truth. Aim
for IoU > 0.4 as an initial target (depends on dataset).
SHAP sanity checks: Confirm SHAP top features match clinical
knowledge (e.g., blood sugar importance for diabetes). If discrepancies
arise, perform feature interaction analysis to detect confounding.
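The IoU and Dice scores used to validate Grad-CAM heatmaps against segmentation masks can be computed directly on binarized masks; a minimal sketch on flat 0/1 lists (function name is ours, and a real pipeline would operate on thresholded heatmap arrays):

```python
def iou_dice(mask_a, mask_b):
    """IoU and Dice between two flat binary masks (lists of 0/1)."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    iou = inter / union if union else 1.0
    dice = 2 * inter / size if size else 1.0
    return iou, dice
```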
6.5 Error analysis
Review false negatives for high-risk cases (clinically unacceptable
misses).
Analyze correlation between misclassifications and metadata (scanner
model, age group).
Provide mitigation strategies: collect more data in underperforming
strata, calibration, or use model ensembling.
6.6 Limitations to report
Dataset biases (geography, age, device) — explicitly state
generalizability limits.
Clinical validation is outside the scope of the academic prototype;
emphasize decision-support role.
Small datasets may inflate variance; use cross-validation and report
variance.
6.7 User study plan (chatbot)
Participants: 10–20 volunteers (non-clinical) for initial usability.
Tasks: Upload sample reports, ask 8 preset health questions, rate
answers for clarity (1–5).
Metrics: Intent accuracy, response helpfulness, average response time.
Success criteria: ≥ 4.0 average helpfulness and ≥ 85% intent accuracy.
Patient: upload symptoms → Bot: request details → Patient: provide information → Bot: generate prediction → Bot: explain result → Patient: ask clarification → Bot: provide guidance
Fig 4: Chatbot dialog flow
[Figure: Patient Dashboard (upload image/enter symptoms; prediction & summary; explanation heatmap; chatbot) and Doctor Dashboard (worklist/cases; image viewer; feature attributions; notes & export)]
Fig 5: Sample UI wireframe for upload, results and explanation display
Chapter 7
Conclusion & Future Scope
Conclusion
The project DIAGNOSTIX presents a modular pipeline that integrates
machine learning and deep learning with explainability and
patient interaction features. Unlike black-box diagnostic tools, it emphasizes
transparency through Grad-CAM (for medical imaging) and SHAP (for clinical
data).
By combining prediction with a chatbot interface, the system allows patients
and doctors to interact with results in an interpretable and conversational
manner. Even in prototype form, DIAGNOSTIX highlights how pairing decision
support with explanation can lower diagnostic barriers, foster trust, and improve
accessibility, especially in resource-limited healthcare settings.
Future Scope
The system can be extended in several directions:
1. Broader Disease Coverage – Expand to liver, kidney, retinal, and skin
disorders, and incorporate multi-modal data like ECG, genomics, and
EHR records.
2. Continuous Monitoring – Integrate wearables and IoT devices for real-
time health tracking and early alerting.
3. Smarter Chatbot – Train on medical dialogue datasets, add multilingual
support, and improve patient engagement.
4. Clinical Validation – Collaborate with hospitals for real-world trials and
work towards regulatory compliance.
Chapter 8
References
[1] Rajpurkar, P., et al., "CheXNet: Radiologist-Level Pneumonia Detection on
Chest X-Rays with Deep Learning," arXiv:1711.05225 (2017).
[2] Razzak, M. I., Naz, S., & Zaib, A., "Deep Learning for Medical Image
Processing: Overview, Challenges and Future," Neurocomputing (2018).
[3] Sharma, A., et al., "Brain Tumor Detection Using CNNs," International
Journal of Medical Imaging (2015).
[4] Gupta, R., et al., "Predictive Modeling of Diabetes Using Machine Learning
Techniques," Journal/Conference (2021).
[5] Patel, S., et al., "Deep Learning for Pneumonia Detection in Chest
Radiographs," IEEE Access (2022).
[6] Bora, N., et al., "Applications of NLP in Healthcare: Medical Chatbots,"
Review (2023).
[7] Selvaraju, R. R., et al., "Grad-CAM: Visual Explanations from Deep
Networks via Gradient-based Localization," ICCV (2017).
[8] Géron, A., "Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow," O'Reilly (2019).
[9] Scikit-learn Documentation, [Link] (accessed for
implementation details).