0% found this document useful (0 votes)
15 views

Gen AI for Disease Prediction

The document presents a project titled 'Gen AI for Disease Prediction' that utilizes machine learning, specifically the Random Forest algorithm, to predict diseases like diabetes, heart disease, and cancer based on user-input symptoms. It describes the system's development using Python, Scikit-learn, and Django, emphasizing its web interface for user interaction and data preprocessing techniques to enhance prediction accuracy. The project aims to improve healthcare accessibility and early disease detection through automated, accurate predictions and personalized health recommendations.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views

Gen AI for Disease Prediction

The document presents a project titled 'Gen AI for Disease Prediction' that utilizes machine learning, specifically the Random Forest algorithm, to predict diseases like diabetes, heart disease, and cancer based on user-input symptoms. It describes the system's development using Python, Scikit-learn, and Django, emphasizing its web interface for user interaction and data preprocessing techniques to enhance prediction accuracy. The project aims to improve healthcare accessibility and early disease detection through automated, accurate predictions and personalized health recommendations.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760

Gen AI for Disease Prediction


M V V Krishna1*; G Sri Jaya Sairam2; P Karthik3;
M Shakeer4; G Arjun5; SD Basheer Babu6
1
Assistant Professor, CSE Dept, Sri Vasavi Engineering College, Tadepalligudem.
2,3,4,5,6
Student, CSE Dept, Sri Vasavi Engineering College, Tadepalligudem, A.P., India

Corresponding Author: M V V Krishna1*

Publication Date: 2025/04/23

Abstract: The project "Gen AI for Disease Prediction", utilizes advanced machine learning methodologies to forecast
diseases such as diabetes, heart disease, and cancer based on user-input symptoms. It employs the Random Forest algorithm,
a powerful and flexible machine learning model, ensuring accurate predictions while reducing the likelihood of overfitting.
To enhance prediction reliability, the system incorporates data preprocessing techniques such as feature selection, data
cleaning, and encoding. Developed using Scikit-learn, Python, and Django, the project integrates sophisticated machine
learning functions with an intuitive web interface. Users can conveniently select symptoms from dropdown menus, which
are then processed by the backend system. The machine learning model, trained on a well-structured dataset covering
various medical conditions and their symptoms, analyzes the input to generate predictions. Ultimately, this project delivers
a scalable and efficient disease prediction system that aids in the early detection of potential health issues.

Keywords: Random Forest Algorithm, Medical Diagnosis, Scikit-Learn, Symptom Analysis, Early Disease Detection.

How to Cite: M V V Krishna; G Sri Jaya Sairam; P Karthik; M Shakeer; G Arjun; SD Basheer Babu (2025). Gen AI for Disease
Prediction. International Journal of Innovative Science and ResearchTechnology, 10(4), 1067-1074.
https://doi.org/10.38124/ijisrt/25apr760

I. INTRODUCTION disease prediction, focusing on models like Naïve Bayes,


K-Nearest Neighbor (KNN), Logistic Regression, and
Healthcare systems across the globe struggle with the Decision Tree. Their findings showed that the Random
challenge of ensuring timely and precise disease diagnosis. Forest algorithm achieved the highest accuracy of 98.95%,
Conventional diagnostic techniques often depend on manual surpassing Support Vector Machine (96.49%) and Naïve
assessments, which can result in delays and potential Bayes (89.4%), making it a reliable choice for predictive
inaccuracies. The integration of machine learning (ML) with healthcare applications.
artificial intelligence (AI) offers a ground-breaking technique  In 2020, Marouane Ferjani explored various machine
for improving the accuracy of disease prediction. Gen AI for learning techniques, dataset processing, and performance
Disease Prediction leverages the Random Forest algorithm to metrics for disease prediction. His research highlighted
evaluate symptoms provided by users and identify potential that Support Vector Machine (SVM) was highly effective
diseases such as diabetes, heart disease, and cancer. Developed for predicting kidney diseases and Parkinson’s disease,
using Python, Scikit-learn, and Django, the system features an whereas Logistic Regression (LR) yielded the best results
intuitive web-based interface, allowing users to input for heart disease detection, demonstrating the adaptability
symptoms and receive accurate predictions, thereby of ML models in medical diagnosis.
supporting early diagnosis and timely medical intervention.  Pooja Panapana et al. (2024) developed a disease
prediction and medication recommendation system using
II. LITERATURE SURVEY Naïve Bayes, Random Forest, and Gaussian Naïve Bayes
algorithms. Their study evaluated performance metrics
A comprehensive analysis of existing research is such as accuracy, precision, recall, and F1-score. Results
essential to understand the advancements in machine learning- showed that Support Vector Machine (SVM) achieved the
based disease prediction systems. Various studies have highest accuracy of 99.63%, proving its effectiveness in
explored different machine learning algorithms to enhance disease classification and predictive analytics.
diagnostic accuracy and early detection of diseases.  Mr. Sharan L. Pais et al. (2023) investigated Decision Tree
and Random Forest classifiers for disease prediction. Their
 Dr. C. K. Gomathy and Mr. A. Rohith Naidu (2021) research focused on the development of an ML-based
conducted research on machine learning algorithms for system that utilizes ensemble learning techniques to

IJISRT25APR760 www.ijisrt.com 1067


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760
enhance predictive performance. The findings emphasized remote areas or regions with limited healthcare access face
the efficiency of Decision Tree and Random Forest models even greater difficulties in obtaining timely consultations,
in analyzing symptoms and diagnosing diseases increasing health risks due to delayed diagnosis.
accurately.
 In 2022, Md Manjurul Ahsan conducted a study on  Lack of Real-Time AI Integration
Logistic Regression, Decision Tree, and Random Forest in Traditional diagnosis systems do not integrate real-time
disease diagnosis. His review detailed how machine AI and machine learning models that can improve with
learning assists in the early detection of various diseases, continuous data input. Without automated learning
reinforcing the importance of data-driven diagnostic mechanisms, these systems remain static and do not adapt to
approaches. The study concluded that ML algorithms emerging diseases or evolving medical knowledge, making
significantly contribute to improving healthcare decision- them less effective over time.
making.
 Christina Zhuang and Ramin Ramezani (2024) examined IV. PROPOSED SYSTEM
Decision Trees, Logistic Regression, and Support Vector
Machines (SVMs) in disease prediction. Their study  Data Collection and Processing:
addressed challenges related to multiclass classification
and unbalanced datasets, highlighting the effectiveness of  Data Sources: Utilize a curated medical dataset containing
machine learning methods in enhancing diagnostic symptoms and corresponding diseases to train the
accuracy. The authors suggested further improvements prediction model. The dataset should be diverse and
through advanced data balancing techniques and comprehensive to enhance prediction accuracy.
hyperparameter tuning.  Data Cleaning: Preprocess the data by handling missing
values, removing inconsistencies, and normalizing features
These studies collectively demonstrate the effectiveness to ensure better model performance. Tools like Pandas and
of machine learning in disease prediction, emphasizing the NumPy can be used for efficient data processing.
role of Random Forest, SVM, and Decision Tree algorithms in  Feature Selection: Implement feature engineering
achieving high accuracy. The findings reinforce the techniques to identify the most relevant symptoms that
significance of artificial intelligence-driven healthcare contribute to accurate disease prediction.
solutions in facilitating early disease detection and medical
decision-making.  Machine Learning Model:

III. PROBLEMS IN EXISTING SYSTEM  Algorithm Selection: Deploy the Random Forest
algorithm, known for its high accuracy and robustness, to
 Manual and Time-Intensive Diagnosis predict diseases based on user-inputted symptoms.
The current healthcare system relies on traditional  Training and Testing: The model is trained using
medical consultations, where doctors manually assess historical patient data, validated with test datasets, and
symptoms to diagnose diseases. This time-consuming process fine-tuned to improve prediction precision.
often results in delays in treatment and increases the risk of  Performance Evaluation: Assess model accuracy using
late-stage disease detection. Additionally, medical expertise metrics such as precision, recall, F1-score, and confusion
varies among professionals, leading to subjectivity and matrix to ensure reliable predictions.
inconsistencies in diagnosis.
 User Interface Module:
 Limited Accuracy in Symptom-Based Checkers
Some online platforms provide basic symptom-checking  Web Interface: A Django-based web application provides
tools, but these systems operate on predefined rule-based an intuitive and interactive platform where users can select
algorithms rather than intelligent machine learning models. As symptoms from dropdown menus.
a result, they struggle to analyze complex symptom patterns,  User Input Handling: The system efficiently processes
often delivering generalized and unreliable predictions that do selected symptoms and sends them to the ML model for
not consider individual health variations. disease prediction.
 Personalization: Users receive customized predictions
 Delayed Disease Detection and Preventive Care along with relevant precautionary measures for the
Most conventional diagnostic methods focus on reactive diagnosed condition.
treatment rather than proactive prevention. This leads to late-
stage diagnosis, making treatment more challenging and
 AI-Powered Insights:
costly. Additionally, patients do not receive automated
insights on potential health risks based on their symptoms,
 Precautionary Measures: The system integrates
limiting their ability to take preventive actions.
OpenAI's API to offer personalized healthcare advice and
preventive suggestions for predicted diseases.
 Dependency on Medical Expertise
 Recommendation System: Based on the predicted
Disease diagnosis depends heavily on medical
professionals' experience and judgment, which introduces the disease, the system suggests next steps, including seeking
possibility of human error and misdiagnosis. Patients in medical consultation or lifestyle changes.

IJISRT25APR760 www.ijisrt.com 1068


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760
 Automated Disease Prediction: receive AI-powered predictions, making disease diagnosis
more accessible and convenient.
 Real-Time Processing: The model instantly processes
user symptoms and returns results within seconds, making VI. METHODOLOGY
the system fast and efficient.
 Scalability: The system is designed to be easily The methodology of this project follows a structured
expandable, allowing for the addition of new diseases and approach, including data preprocessing, model training using
symptoms as more data becomes available. the Random Forest algorithm, web application development
and database management. Each stage ensures the system
V. OBJECTIVES effectively predicts diseases based on user-input symptoms
and provides precautionary suggestions for better health
 Early Disease Detection – management.
The system predicts diseases such as diabetes, heart
disease, and cancer based on symptoms provided by users,  Data Preprocessing:
enabling timely medical intervention and preventive care. Data preprocessing is a fundamental step in preparing
medical datasets for disease classification. The dataset
 Accurate and Efficient Predictions – contains various symptoms associated with multiple diseases,
By utilizing the Random Forest algorithm, the system requiring cleaning, transformation, and encoding to ensure
ensures high prediction accuracy while minimizing consistency and reliability.
overfitting, leading to reliable diagnostic outcomes.
 Handling Missing Data – Missing values in the dataset
 Automated Symptom Analysis – are identified and managed through imputation techniques
The model processes user-inputted symptoms efficiently or removal to maintain data integrity.
through feature selection, data cleaning, and encoding  Feature Selection – Redundant or irrelevant features are
techniques, ensuring fast and precise disease predictions. eliminated using statistical methods to improve model
efficiency.
 AI-Generated Health Precautions –  Data Normalization – Feature scaling techniques such as
By integrating OpenAI's API, the system provides Min-Max Scaling or Standardization are applied to
personalized precautionary measures and recommendations, optimize model performance by ensuring uniformity in
helping users take proactive steps toward better health data distribution.
management.  Encoding Categorical Data – Since symptom data often
contains categorical values, encoding techniques (e.g.,
 Improved Healthcare Accessibility – One-Hot Encoding or Label Encoding) are employed to
The Django-based web interface ensures an easy-to-use convert them into numerical form for machine learning
platform, allowing users to effortlessly input symptoms and compatibility.

Fig 1 Architecture of Disease Prediction Based on Symptoms

 Machine Learning Model – Random Forest: method that combines multiple decision trees to improve
The Random Forest algorithm is used for disease prediction performance. This approach reduces the risk of
classification due to its high accuracy and ability to handle overfitting and enhances the model’s reliability.
large datasets efficiently. It is an ensemble learning

IJISRT25APR760 www.ijisrt.com 1069


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760
 Working of the Random Forest Algorithm:  Feature Randomness – During tree construction, only
a random subset of features (symptoms) is considered
 Random Sampling – A subset of data is randomly at each decision node to enhance model diversity.
selected from the original dataset through the  Voting Mechanism – Each decision tree makes a
Bootstrap Aggregation (Bagging) technique. separate disease prediction, and the final output is
 Decision Tree Construction – Each subset is used to determined based on a majority voting approach (for
train an independent decision tree, where it learns to classification tasks).
classify diseases based on symptoms.  Final Prediction – The disease with the highest votes
from the ensemble of trees is selected as the final
predicted outcome.

Fig 2 Working Flow of Random Forest Model

The bootstrap aggregation technique ensures that  User Information Storage – Securely stores user details,
different subsets of data contribute to diverse decision trees, login credentials, and past predictions.
improving model accuracy. Unlike relying on a single decision  Medical Data Handling – Stores symptom-disease
tree, the Random Forest model aggregates multiple relationships and model-generated insights.
predictions, leading to a more robust and generalized  Prediction History – Logs previous disease predictions
classification. for future reference and analysis.

 Web Application Development : VII. RESULT AND ANALYSIS


The web-based platform provides an interactive interface
for users to input symptoms and receive real-time disease The Gen AI for Disease Prediction system has been
predictions. successfully implemented and tested to evaluate its accuracy,
efficiency, and usability. The primary objective of this
 User Authentication – A secure authentication system evaluation is to determine how well the system predicts
enables users to sign up, log in, and manage their profiles. diseases based on user-input symptoms and provides relevant
 Symptom Selection – A dynamic dropdown menu allows precautionary measures. The performance of the Random
users to select symptoms, which are then processed for Forest model was assessed using various machine learning
prediction. evaluation metrics. Additionally, the web application’s
 Prediction Display – The predicted disease and associated functionality and user experience were examined to ensure
precautionary measures (retrieved using OpenAI’s API) seamless interaction and accessibility.
are displayed.
 User History Management – The system stores past  Model Performance Evaluation :
predictions, enabling users to track their health trends. To analyze the effectiveness of the Random Forest
classifier, key evaluation metrics were used, including:
 Database Management:
Efficient database management ensures smooth  Accuracy – The percentage of correctly predicted disease
functioning and structured storage of user data. cases among total cases.

IJISRT25APR760 www.ijisrt.com 1070


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760

Fig 3 Accuracy vs no of Trees

 Confusion Matrix – A tabular representation of correct and incorrect predictions, categorized as true positives, false positives,
true negatives, and false negatives.

Fig 4 Confusion Matrix of Trained Model

IJISRT25APR760 www.ijisrt.com 1071


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760
 Web Application Testing and user Experience :  Fast and Accurate Predictions: The model quickly
The web-based application was tested for functionality, processed symptom inputs and generated accurate disease
responsiveness, and user satisfaction. Several test cases were predictions.
executed to analyze how well the system handled symptom  Interactive User Interface: The web application was easy
selection, disease prediction, and precautionary measure to navigate, allowing users to seamlessly input symptoms
retrieval. Key observations included: and receive results.

Fig 5 Displays the Home Page of Disease Prediction

 Precautionary Suggestions: The system effectively fetched health recommendations using the OpenAI API, providing users
with valuable guidance.

Fig 6 Displays the Predicted Disease and along with Precautions.

IJISRT25APR760 www.ijisrt.com 1072


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760
 History Page: The system securely maintains a record of user predictions, enabling easy access to previous diagnoses and
corresponding symptoms.

Fig 7 Displays the History of the user

 User Feedback: The application received positive reviews for its simplicity, accuracy, and usefulness.

Fig 8 Mail Received by the user when user is Successfully Registered.

VIII. CONCLUSION algorithm, it automates diagnosis based on user-input


symptoms, ensuring faster and more reliable results. The web-
The Gen AI for Disease Prediction system enhances based interface allows users to select symptoms and receive
healthcare by leveraging Machine Learning (ML) and AI for instant predictions with precautionary suggestions. Integration
accurate disease prediction. Using the Random Forest with OpenAI’s API provides personalized health insights,

IJISRT25APR760 www.ijisrt.com 1073


Volume 10, Issue 4, April – 2025 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25apr760
making the system both predictive and advisory. Built with
Django and Scikit-learn, it ensures security, scalability, and
user privacy. This AI-driven approach improves early disease
detection, reducing errors and supporting data-driven medical
decision-making.

FUTURE SCOPE

Future enhancements of the Gen AI for Disease


Prediction system will focus on improving machine learning
accuracy by integrating deep learning techniques for better
disease classification. The system will also incorporate
Natural Language Processing (NLP) to allow users to input
symptoms in natural language or through voice-based
interaction for greater accessibility. Additionally, integration
with wearable and IoT devices will enable real-time health
monitoring, offering early warnings for potential health risks.
Expanding the database will allow the system to predict a
wider range of diseases, including rare conditions. Enhanced
security measures, such as end-to-end encryption and
compliance with data protection laws, will ensure user data
privacy and safety.

REFERENCES

[1]. L. Breiman, “Random Forests,” Machine Learning,


vol. 45, no. 1, pp. 5–32, 2001. [Online]. Available:
https://doi.org/10.1023/A:1010933404324
[2]. J. Han, M. Kamber, and J. Pei, Data Mining: Concepts
and Techniques, 3rd ed. Morgan Kaufmann, 2011.
[3]. A. Rajkomar, J. Dean, and I. Kohane, “Machine
Learning in Medicine,” New England Journal of
Medicine, vol. 380, no. 14, pp. 1347–1358, 2019.
[Online]. Available:
https://doi.org/10.1056/NEJMra1814259.
[4]. T. Chen and C. Guestrin, “XGBoost: A Scalable Tree
Boosting System,” in Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2016, pp. 785–794.
[5]. S. Hochreiter and J. Schmidhuber, “Long Short-Term
Memory,” Neural Computation, vol. 9, no. 8, pp.
1735–1780, 1997. [Online]. Available:
https://doi.org/10.1162/neco.1997.9.8.173
[6]. Django Software Foundation, “Django Web
Framework,” [Online]. Available:
https://www.djangoproject.com/
[7]. Scikit-learn Developers, “Scikit-learn: Machine
Learning in Python,” [Online]. Available:
https://scikit-learn.org/.
[8]. McKinsey & Company. (2021). AI in Healthcare:
Transforming Diagnosis and Treatment.
[9]. Esteva, A., et al. (2017). Dermatologist-level
classification of skin cancer with deep neural networks.
Nature, 542(7639), 115-118.
[10]. Topol, E. (2019). Deep Medicine: How Artificial
Intelligence Can Make Healthcare Human Again.
Basic Books.

IJISRT25APR760 www.ijisrt.com 1074

You might also like