EPIDETECT SYSTEM USING
MACHINE LEARNING
AGENDA
• Abstract
• Introduction
• Literature survey
• Existing system
• Proposed system
• System Requirements
• Objectives
ABSTRACT
In today's fast-paced world, the healthcare industry faces numerous challenges in providing
timely and accurate diagnoses for patients. The early detection and correct identification of
diseases are crucial in ensuring effective treatment and improving patient outcomes.
However, with the vast array of medical conditions and symptoms, it can be challenging for
medical professionals to keep up-to-date with all possible diseases and their corresponding
symptoms. To address this issue, we propose the ‘EpiDetect System using Machine Learning’, a
symptom-based disease detection tool that employs machine learning techniques to assist
healthcare professionals in diagnosing diseases based on a patient's reported symptoms.
The system will act as a
decision support system, offering medical practitioners an additional resource to enhance
diagnostic accuracy and speed.
INTRODUCTION
● The development of intelligent disease detection systems has emerged as a
significant advancement in healthcare technology. These systems aim to assist
medical professionals in diagnosing diseases based on patient symptoms,
leveraging the power of machine learning algorithms.
● In this project, we propose the implementation of a disease detection system using
the Random Forest algorithm, Naive Bayes, and Support Vector Machines (SVM).
● The system will operate by accepting symptom descriptions from users through a
user-friendly interface, such as a web application.
● The collected symptom data will undergo preprocessing and feature extraction to
ensure its suitability for the machine learning models.
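As a rough illustration of the preprocessing step, the sketch below normalizes free-text symptom names and converts them into a binary feature vector that a classifier could consume. The symptom vocabulary and the function names `normalize` and `encode_symptoms` are assumptions made for this example; they are not taken from the project.

```python
from __future__ import annotations

import re

# Hypothetical vocabulary; a real system would derive it from the training data.
SYMPTOM_VOCAB = ["cough", "fever", "headache", "nausea", "skin_rash"]


def normalize(symptom: str) -> str:
    """Lowercase, trim, and join words with underscores ("Skin  Rash " -> "skin_rash")."""
    return re.sub(r"[^a-z0-9]+", "_", symptom.strip().lower()).strip("_")


def encode_symptoms(reported: list[str], vocab: list[str] = SYMPTOM_VOCAB) -> list[int]:
    """Return one 0/1 entry per vocabulary symptom; unknown symptoms are ignored."""
    present = {normalize(s) for s in reported}
    return [1 if name in present else 0 for name in vocab]


if __name__ == "__main__":
    # "dizziness" is not in the toy vocabulary, so it is silently dropped.
    print(encode_symptoms(["Fever", " skin rash", "dizziness"]))
```

Ignoring unknown symptoms keeps the vector length fixed; a production system might instead log them so the vocabulary can be extended.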
LITERATURE SURVEY
S. No: 1
Title: Generic Disease Prediction using Symptoms with Supervised Machine Learning
Year: 2022
Approach:
● Associating symptoms with diseases in a database
● Pattern matching
● Predicting the possible diseases
● Output as disease
● Algorithms used:
1. Decision tree
2. Naive Bayes
3. Random forest
Pros:
● Random Forest (RF) is reported as the most accurate ensemble learning algorithm.
● RF runs efficiently on large data sets.
● It can handle hundreds of input variables.
Cons:
● The models (Random Forest, Decision Tree, Naive Bayes) achieved an accuracy of 82.26%.
S. No: 2
Title: Symptoms Based Multiple Disease Prediction Model using Machine Learning Approach
Year: 2021
Approach:
● Data preprocessing
● Training model using KNN
● Testing
● Predict values
Pros:
● Handles multiclass problems
● Instance-based learning
Cons:
● Feature scaling
● Imbalanced data
S. No: 3
Title: Disease prediction from various symptoms using machine learning
Year: 2020
Approach:
● Data preprocessing
● Training model using Logistic Regression
● Testing
● Predict values
Pros:
● Less susceptible to overfitting
● Interpretable results
Cons:
● Sensitivity to outliers
● Independence assumption
S. No: 4
Title: Multiple Disease Prognostication Based On Symptoms Using Machine Learning Techniques
Year: 2020
Approach:
● Data gathering
● Data processing
● Model selection
● Training
● Logistic Regression
● Evaluation
● Prediction
Pros:
● Handles multiclass problems
● Instance-based learning
Cons:
● Feature scaling
● Imbalanced data
S. No: 5
Title: Django Website for Disease Prediction using Machine Learning
Year: 2021
Approach:
● Decision Tree
● KNN
● Naive Bayes
Pros:
● Data interpretation
● Data preparation
● Multiple data types are supported
● Generates robust classifiers
Cons:
● Maintenance and updates
● User interpretation and psychological impact
S. No: 6
Title: Disease Prediction using Machine Learning
Year: 2020
Approach:
● Simple Linear Regression
● Data collection
● Data preparation
● Test the data
Pros:
● Simple implementation
● Performs best on linear data
Cons:
● Underfitting
● Sensitive to outliers
● Assumes the data is independent
S. No: 7
Title: Disease Prediction using Machine Learning
Year: 2022
Approach:
● Data collection
● Data cleaning
● Model building
● Data splitting using K-Fold Cross Validation
● Naive Bayes classification
Pros:
● Simplicity
● Fast training and prediction
● Scalability
● Low risk of overfitting
Cons:
● No regression
● Limited expressive power
● Lack of probabilistic calibration
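The K-Fold Cross Validation step mentioned in this entry can be sketched as follows. This is a minimal pure-Python illustration, not the surveyed paper's code: the function name `k_fold_indices` is hypothetical, and the samples are assumed to be already shuffled (a real pipeline would usually shuffle or stratify by disease first).

```python
from __future__ import annotations


def k_fold_indices(n_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs with contiguous test folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(10, 3):
        print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one test fold, so averaging a model's score over the k folds uses every record for evaluation once.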
S. No: 8
Title: Disease Prediction using Machine Learning
Year: 2021
Approach:
● Decision Tree algorithm
● Data collection
● Data preparation
● Test the data
Pros:
● Interpretability
● Handling both numerical and categorical data
● Feature importance
● Fast prediction time
● Robust to outliers
Cons:
● Overfitting
● Lack of stability: small changes in the data can produce a different tree
● Difficulty handling continuous variables
S. No: 9
Title: Disease prediction from various symptoms using machine learning
Year: 2020
Approach:
● Designed a disease prediction system using multiple ML algorithms such as Fine Tree, Medium Tree, Coarse Tree, Fine KNN, Medium KNN, Coarse KNN, Weighted KNN and Gaussian Naive Bayes.
● The weighted KNN algorithm gave the best results compared to the other algorithms.
● The models output the predicted disease from the symptoms, age, and gender given to the processing model.
Pros:
● Virtual doctor
● More accurate diagnosis
● Prevention of the illness
Cons:
● Accuracy
● Data availability
● Privacy
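To make the weighted KNN idea in this entry concrete, here is a minimal sketch using inverse-distance weighting, which is one common weighting scheme and an assumption here (the surveyed paper's exact weighting is not stated above). The function name, the toy feature vectors, and the labels are hypothetical.

```python
from __future__ import annotations

import math
from collections import defaultdict


def weighted_knn_predict(train_x, train_y, query, k=3):
    """Predict a label by inverse-distance-weighted voting among the k nearest samples."""
    if not train_x or k < 1:
        raise ValueError("need at least one training sample and k >= 1")
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = defaultdict(float)
    for dist, label in distances[:k]:
        votes[label] += 1.0 / (dist + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy binary symptom vectors with placeholder labels.
    x = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
    y = ["disease_a", "disease_a", "disease_b", "disease_b"]
    print(weighted_knn_predict(x, y, [1, 1, 0], k=3))
```

Unlike plain majority voting, a single very close neighbour can outweigh several distant ones, which is the property that distinguishes weighted KNN from the unweighted variants.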
S. No: 10
Title: SmartCare: A Symptoms Based Disease Prediction Model Using Machine Learning Approach
Year: 2022
Approach:
● Predicts more than one disease at a time.
● The following algorithms are used in developing the Symptoms Based Disease Prediction Model:
- Decision Tree
- Random Forest
- KNN
- Naive Bayes
● Accuracy rate of the system is 97%
Pros:
● Remote accessibility
● Time saving
● Easy to use
Cons:
● Privacy problem
● Smartphone accessibility
S. No: 11
Title: Disease Prediction using Machine Learning
Year: 2021
Approach:
● Disease Predictor is a web-based system that predicts a user's disease based on the symptoms they have.
● Disease prediction is accomplished using the random forest classifier.
● Both structured and unstructured data would be considered by the proposed framework.
● In the case of unstructured text files, the random forest algorithm is used to automatically select features.
● A latent factor model is used to recreate missing data in medical records obtained from online sources.
Pros:
● Reduces the time and cost
● Reduces mortality rate
Cons:
● Accuracy is not 100%
● Smartphone accessibility
S. No: 12
Title: Symptoms based disease prediction
Year: 2022
Approach:
● Used three algorithms, viz. Support Vector Machine, Random Forest Classifier, and Naive Bayes.
● Model built using Random Forest Classifier (Accuracy = (TP + TN) / total number of samples n).
● Ensemble prediction
Pros:
● Early disease detection
● Free of cost
● Time efficient
Cons:
● Lack of specificity
● Slight chances of misprediction
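The accuracy formula quoted in this entry, (TP + TN) / total, can be computed as in the short sketch below. The function name and the confusion-matrix counts in the usage line are made-up values for illustration only, not results from the surveyed paper.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


if __name__ == "__main__":
    # Hypothetical counts: 40 true positives, 45 true negatives, 5 false positives, 10 false negatives.
    print(accuracy(tp=40, tn=45, fp=5, fn=10))
```

Accuracy alone can be misleading when some diseases are much rarer than others, which is why it is usually reported alongside per-class metrics.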
S. No: 13
Title: Disease Prediction using Machine Learning
Year: 2020
Approach:
● Algorithms
● K-Neighbours
● Random Forest
● SVM
Pros:
● Early detection
● Improved accuracy
● Personalized medicine
Cons:
● Privacy concerns
● Data quality and bias
S. No: 14
Title: Disease Prediction Application using Machine Learning
Year: 2022
Approach:
● Data preprocessing
● Logistic Regression
● Random Forest
Diseases: Diabetes, Breast Cancer, Heart Disease
Pros:
● Early detection
● Population health management
● Improved accuracy
Cons:
● Data quality and bias
● Overreliance on technology
● Privacy concerns
S. No: 15
Title: Smart Health Prediction Using Machine Learning
Year: 2021
Approach:
● Naive Bayes algorithm
Pros:
● Accessibility and convenience
● Early detection and intervention
Cons:
● Variation in symptom reporting
● Limited diagnostic accuracy
● Potential for misinterpretation and self-diagnosis
S. No: 16
Title: Disease Prediction using Machine Learning and Deep Learning
Year: 2022
Approach:
● Logistic Regression
● CNN
Diseases: Cancer, Diabetes, Heart, Liver, Kidney, Malaria, Pneumonia
Pros:
● Helps underserved and low-income people
● Increases the use of technology in medical fields
Cons:
● Large data requirements
● Overfitting
● Complexity
EXISTING SYSTEM
● The traditional method for disease detection based on symptoms typically involves a manual
and subjective approach by medical professionals.
● It relies on their expertise and clinical experience to interpret and analyze patient symptoms.
● This process depends on the individual's knowledge and ability to recognize patterns and
potential correlations between symptoms and diseases.
● Because each case is assessed manually, the traditional method can also be time-consuming.
● Additionally, it may be prone to human errors, misinterpretation, or bias.
DRAWBACKS OF EXISTING SYSTEM
● Subjectivity: Relies on the subjective interpretation and decision of healthcare
professionals.
● Limited knowledge base: Practitioners may not be aware of rare or emerging diseases, and their
knowledge may not always be up to date.
● Time-consuming: The traditional method can be time-consuming, requiring extensive
patient interviews, physical examinations, and diagnostic tests.
● Costly: The extensive use of manual examinations and diagnostic tests in the traditional
method can be costly for patients.
● Human error: Healthcare professionals are prone to human errors, such as overlooking
certain symptoms or misinterpreting information.
PROPOSED SYSTEM
● The proposed system is an intelligent disease detection system that uses machine
learning algorithms, specifically Random Forest, Naive Bayes, and Support Vector Machine
(SVM) classifiers, to predict diseases based on patient symptoms.
● The system will offer an efficient and reliable solution for assisting medical professionals in
diagnosing diseases and providing preliminary assessments.
● The system will be designed to accept symptom descriptions from users through a user-friendly
interface, allowing for easy input of relevant information.
● The collected symptoms will undergo preprocessing and feature extraction to transform the data
into a suitable format for analysis by the machine learning models.
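One simple way the three models could be combined is hard majority voting, sketched below. This is an assumption about how the ensemble might work, not the project's confirmed design: the function name `majority_vote` is hypothetical, and the per-model predictions are assumed to come from already trained Random Forest, Naive Bayes, and SVM classifiers.

```python
from __future__ import annotations

from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    """Return the most common label; ties go to the earliest model in the list."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the top count")


if __name__ == "__main__":
    # Placeholder outputs in the order: Random Forest, Naive Bayes, SVM.
    print(majority_vote(["disease_a", "disease_b", "disease_a"]))
```

Listing the most trusted model first makes the tie-breaking rule explicit when all three models disagree.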
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
● RAM : 4 GB
● Processor : Intel Core i3, 2.8 GHz minimum (i5 or above recommended)
● Memory : 1 GB or above
SOFTWARE REQUIREMENTS
● Operating system : Windows 10 or later
● Python
● VSCode editor
FUNCTIONAL REQUIREMENTS
● Symptom Input and Processing
● Disease Prediction
● Ensemble Model Integration
● Performance Metrics
● Continuous Learning and Updates
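For the Performance Metrics requirement, per-disease precision and recall are a common complement to overall accuracy. The sketch below is illustrative only; the function name and the toy labels are assumptions, and the project may choose different metrics.

```python
from __future__ import annotations


def precision_recall(y_true: list[str], y_pred: list[str], label: str) -> tuple[float, float]:
    """Compute precision and recall for one disease label (0.0 when undefined)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = ["a", "a", "b", "b", "a"]
    predicted = ["a", "b", "b", "a", "a"]
    print(precision_recall(truth, predicted, "a"))
```

Recall is often the more important of the two in a screening context, since a missed disease is usually costlier than a false alarm.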
NON-FUNCTIONAL REQUIREMENTS
● User Interface Design
● Performance
● Accuracy and Reliability
● Security
● Scalability
● Compatibility
● Ethical Considerations
OBJECTIVES
1. Accurate Disease Prediction: The system should utilize machine learning algorithms and
data analysis techniques to identify patterns and correlations between symptoms and specific
diseases, ultimately improving diagnostic accuracy.
2. Continuous Learning and Improvement: Regular updates, incorporating new research
findings, clinical data, and expert feedback, should enhance the system's prediction
capabilities over time.
3. Time and Cost Reduction: By providing a rapid and efficient preliminary assessment of
potential diseases based on symptoms alone, the system can expedite the diagnostic process
and minimize unnecessary procedures.
THANK YOU