Gen AI for Disease Prediction
Abstract: The project "Gen AI for Disease Prediction" uses advanced machine learning methods to forecast
diseases such as diabetes, heart disease, and cancer based on user-input symptoms. It employs the Random Forest algorithm,
a powerful and flexible machine learning model, ensuring accurate predictions while reducing the likelihood of overfitting.
To enhance prediction reliability, the system incorporates data preprocessing techniques such as feature selection, data
cleaning, and encoding. Developed using Scikit-learn, Python, and Django, the project integrates sophisticated machine
learning functions with an intuitive web interface. Users can conveniently select symptoms from dropdown menus, which
are then processed by the backend system. The machine learning model, trained on a well-structured dataset covering
various medical conditions and their symptoms, analyzes the input to generate predictions. Ultimately, this project delivers
a scalable and efficient disease prediction system that aids in the early detection of potential health issues.
Keywords: Random Forest Algorithm, Medical Diagnosis, Scikit-Learn, Symptom Analysis, Early Disease Detection.
How to Cite: M V V Krishna; G Sri Jaya Sairam; P Karthik; M Shakeer; G Arjun; SD Basheer Babu (2025). Gen AI for Disease
Prediction. International Journal of Innovative Science and Research Technology, 10(4), 1067-1074.
https://doi.org/10.38124/ijisrt/25apr760
III. PROBLEMS IN EXISTING SYSTEM

Manual and Time-Intensive Diagnosis
The current healthcare system relies on traditional medical consultations, where doctors manually assess symptoms to diagnose diseases. This time-consuming process often results in delays in treatment and increases the risk of late-stage disease detection. Additionally, medical expertise varies among professionals, leading to subjectivity and inconsistencies in diagnosis.

Limited Accuracy in Symptom-Based Checkers
Some online platforms provide basic symptom-checking tools, but these systems operate on predefined rule-based algorithms rather than intelligent machine learning models. As a result, they struggle to analyze complex symptom patterns, often delivering generalized and unreliable predictions that do not consider individual health variations.

Delayed Disease Detection and Preventive Care
Most conventional diagnostic methods focus on reactive treatment rather than proactive prevention. This leads to late-stage diagnosis, making treatment more challenging and costly. Additionally, patients do not receive automated insights on potential health risks based on their symptoms, limiting their ability to take preventive actions.

Dependency on Medical Expertise
Disease diagnosis depends heavily on medical professionals' experience and judgment, which introduces the possibility of human error and misdiagnosis. Patients in

Algorithm Selection: Deploy the Random Forest algorithm, known for its high accuracy and robustness, to predict diseases based on user-inputted symptoms.
Training and Testing: The model is trained using historical patient data, validated with test datasets, and fine-tuned to improve prediction precision.
Performance Evaluation: Assess model accuracy using metrics such as precision, recall, F1-score, and confusion matrix to ensure reliable predictions.

User Interface Module:
Web Interface: A Django-based web application provides an intuitive and interactive platform where users can select symptoms from dropdown menus.
User Input Handling: The system efficiently processes selected symptoms and sends them to the ML model for disease prediction.
Personalization: Users receive customized predictions along with relevant precautionary measures for the diagnosed condition.

AI-Powered Insights:
Precautionary Measures: The system integrates OpenAI's API to offer personalized healthcare advice and preventive suggestions for predicted diseases.
Recommendation System: Based on the predicted disease, the system suggests next steps, including seeking medical consultation or lifestyle changes.
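
The training, testing, and evaluation steps described above (Random Forest selection, validation on held-out data, and precision/recall/F1 reporting) might be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the symptom names, disease labels, disease profiles, and the small synthetic dataset are invented for the example, and `encode_symptoms` and `make_synthetic_data` are hypothetical helpers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical symptom vocabulary and disease labels (not the paper's dataset).
SYMPTOMS = ["fever", "cough", "fatigue", "chest_pain",
            "frequent_urination", "excessive_thirst"]
DISEASES = ["flu", "heart disease", "diabetes"]

# Illustrative symptom profile for each disease (assumption).
PROFILE = {
    "flu": ["fever", "cough", "fatigue"],
    "heart disease": ["chest_pain", "fatigue"],
    "diabetes": ["frequent_urination", "excessive_thirst", "fatigue"],
}


def encode_symptoms(selected):
    """Turn a list of selected symptom names into a 0/1 feature vector."""
    return np.array([1 if s in selected else 0 for s in SYMPTOMS])


def make_synthetic_data(n_per_class=60, noise=0.1, seed=0):
    """Generate noisy symptom vectors around each disease profile."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for disease in DISEASES:
        base = encode_symptoms(PROFILE[disease])
        for _ in range(n_per_class):
            flip = rng.random(len(SYMPTOMS)) < noise  # randomly flip some symptoms
            X.append(np.where(flip, 1 - base, base))
            y.append(disease)
    return np.array(X), np.array(y)


X, y = make_synthetic_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Per-disease precision, recall, and F1-score on the held-out test set.
print(classification_report(y_test, model.predict(X_test)))

# Prediction for a user who selected two symptoms from the dropdown menus.
user_vector = encode_symptoms(["frequent_urination", "excessive_thirst"]).reshape(1, -1)
prediction = model.predict(user_vector)[0]
print(prediction)
```

Encoding the dropdown selections as a fixed-length 0/1 vector keeps the web form and the model decoupled: the backend only needs the shared symptom vocabulary to turn a user's choices into model input.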
Machine Learning Model – Random Forest:
The Random Forest algorithm is used for disease classification due to its high accuracy and ability to handle large datasets efficiently. It is an ensemble learning method that combines multiple decision trees to improve prediction performance. This approach reduces the risk of overfitting and enhances the model’s reliability.
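
To make the ensemble idea concrete, the sketch below builds a tiny forest by hand: each decision tree is trained on a bootstrap sample (rows drawn with replacement), and the final label is the majority vote across trees. It is a simplified illustration rather than the project's implementation. The data come from `make_classification` and are synthetic, the helper names `fit_bagged_trees` and `predict_majority` are hypothetical, and scikit-learn's own `RandomForestClassifier` differs in detail (it averages the trees' class probabilities instead of taking a hard vote).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap row indices
        # max_features="sqrt" makes each split consider a random feature subset.
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Combine the trees by majority vote over their predicted labels."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Synthetic stand-in for a symptom dataset (assumption, not the paper's data).
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

trees = fit_bagged_trees(X_train, y_train)
ensemble_pred = predict_majority(trees, X_test)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("single tree accuracy:", (single_tree.predict(X_test) == y_test).mean())
print("bagged trees accuracy:", (ensemble_pred == y_test).mean())
```

Comparing the two printed accuracies on held-out data gives a quick, informal check of whether aggregation helps on a given dataset.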
The bootstrap aggregation technique ensures that different subsets of data contribute to diverse decision trees, improving model accuracy. Rather than relying on a single decision tree, the Random Forest model aggregates multiple predictions, leading to a more robust and generalized classification.

User Information Storage – Securely stores user details, login credentials, and past predictions.
Medical Data Handling – Stores symptom-disease relationships and model-generated insights.
Prediction History – Logs previous disease predictions for future reference and analysis.
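
As a rough sketch of the prediction-history storage described above, the snippet below uses Python's built-in `sqlite3` module in place of the Django ORM so that it runs standalone. The paper does not specify its schema, so the table name, column names, and helper functions here are all hypothetical; credential storage is not shown.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    """Create a minimal prediction-history table (hypothetical schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS prediction_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            username TEXT NOT NULL,
            symptoms TEXT NOT NULL,           -- JSON-encoded list of symptom names
            predicted_disease TEXT NOT NULL,
            created_at TEXT NOT NULL          -- ISO 8601 timestamp (UTC)
        )
        """
    )


def log_prediction(conn, username, symptoms, disease):
    """Insert one prediction; values are bound as parameters, not formatted in."""
    conn.execute(
        "INSERT INTO prediction_history "
        "(username, symptoms, predicted_disease, created_at) VALUES (?, ?, ?, ?)",
        (username, json.dumps(symptoms), disease,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def history_for(conn, username):
    """Return a user's past predictions, oldest first."""
    rows = conn.execute(
        "SELECT symptoms, predicted_disease FROM prediction_history "
        "WHERE username = ? ORDER BY id",
        (username,),
    ).fetchall()
    return [(json.loads(symptoms), disease) for symptoms, disease in rows]


conn = sqlite3.connect(":memory:")
init_db(conn)
log_prediction(conn, "alice", ["fever", "cough"], "flu")
log_prediction(conn, "alice", ["excessive_thirst"], "diabetes")
print(history_for(conn, "alice"))
```

In a Django project the same structure would more naturally be a model class with a foreign key to the user table; the plain SQL version is used here only to keep the example self-contained.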
Confusion Matrix – A tabular representation of correct and incorrect predictions, categorized as true positives, false positives,
true negatives, and false negatives.
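
For a single disease treated as the positive class, the four cells of the confusion matrix and the metrics derived from them can be computed directly. The sketch below uses made-up labels purely for illustration, and `confusion_counts` is a hypothetical helper; in practice scikit-learn's `confusion_matrix` and `classification_report` provide these values.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN for one class treated as 'positive'."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Made-up labels for illustration only.
y_true = ["diabetes", "diabetes", "diabetes", "flu", "flu", "flu", "flu", "flu"]
y_pred = ["diabetes", "diabetes", "flu", "diabetes", "flu", "flu", "flu", "flu"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive="diabetes")
precision = tp / (tp + fp)  # of the predicted positives, how many were correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, tn, fn)  # 2 1 4 1
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```

For a multi-class problem such as this one, the same counts are computed once per disease, with every other disease grouped as the negative class.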
Precautionary Suggestions: The system effectively fetched health recommendations using the OpenAI API, providing users
with valuable guidance.
User Feedback: The application received positive reviews for its simplicity, accuracy, and usefulness.
FUTURE SCOPE
REFERENCES