A PROJECT REPORT
Submitted to
Bachelor of Technology
in
COMPUTER SCIENCE AND ENGINEERING (AI&ML)
Submitted by
Golla Bhavana 21KN1A4250(881186183480)
PREDICTION OF DIABETES, HEART
DISEASE AND PARKINSON’S DISEASE
USING ML
CERTIFICATE
EXTERNAL EXAMINER
DECLARATION
I hereby declare that the project report titled “PREDICTION OF DIABETES, HEART
DISEASE AND PARKINSON’S DISEASE USING ML” is a bonafide work carried out in
fulfilment for the award of the degree of Bachelor of Technology by JNTU Kakinada.
I further declare that this dissertation has not been submitted elsewhere for any Degree.
ACKNOWLEDGEMENT
I take this opportunity to thank all who have rendered their full support to my work. The
pleasure, the achievement, the glory, the satisfaction, the reward, the appreciation and the
construction of my project cannot be expressed in a few words, and I am grateful for their
valuable suggestions.
I extend my sincere thanks to Professor Dr. V TEJU for her continuous guidance
and support to complete my project successfully.
I express my heartfelt thanks to the Professor & Head of the Department, Dr. B.
DASARADHA RAM, for his continuous guidance towards completion of my project work.
I am thankful to the Principal, Dr. C. NAGA BHASKAR, for his encouragement to complete
the project work.
I extend my sincere and honest thanks to the Chairman, Dr. R. VENKATA RAO, and the
Secretary, Sri K. Sridhar, for their continuous support in completing the project work.
ABSTRACT
Healthcare has witnessed significant advancements with the integration of machine learning
(ML), enabling early disease detection and accurate predictions. This project presents a
Multiple Disease Prediction System that utilizes Random Forest, Decision Tree, and Support
Vector Machine (SVM) to diagnose Diabetes, Heart Disease, and Parkinson’s Disease using
Kaggle datasets. The system employs a data-driven approach, leveraging feature selection,
preprocessing, and model optimization to enhance predictive performance.
Among the algorithms evaluated, Random Forest achieved the highest accuracy across all three
diseases, demonstrating its robustness in handling large datasets with complex patterns. The
Decision Tree classifier was incorporated for its interpretability, providing insights into feature
importance, while SVM was effective for high-dimensional medical data classification. The
system follows a structured ML pipeline, including data cleaning, normalization,
hyperparameter tuning, and model evaluation to ensure optimal results.
TABLE OF CONTENTS
1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
3.1 Existing System
3.1.1 Disadvantages of Existing System
5. SYSTEM IMPLEMENTATION
5.1 Modules
5.2 Algorithms
5.3 Random Forest
5.4 SVM
5.5 Decision Tree
6. SYSTEM TESTING
6.1 Software Testing
6.2 Structural Testing
6.3 Behavioral Testing
6.4 Black-Box Testing
7. RESULTS
8. CONCLUSION & FUTURE WORK
9. REFERENCES
10. PUBLICATION
LIST OF FIGURES
Figure No Name of the Figure Page No
1 Introduction 1
1. INTRODUCTION
Healthcare is one of the most critical sectors where technological advancements have
significantly improved diagnosis, treatment, and patient management. Machine learning (ML) in
healthcare has gained immense importance due to its capability to examine large medical
datasets, recognize patterns, and make accurate predictions. Early detection is key to
a decisive improvement in the treatment of heart disease, and machine learning can help detect
cardiac disease in a few seconds [1]. However, traditional diagnostic methods are often
time-consuming, costly, and require extensive medical expertise. These challenges highlight the need
for automated, AI-driven healthcare solutions that can assist medical professionals and improve
diagnostic accuracy.
This work presents a Multiple Disease Prediction System that uses machine learning models and
real-world medical datasets to forecast the probability of three major diseases: Diabetes, Heart
Disease, and Parkinson’s Disease. Each of these diseases seriously affects global health:
Diabetes: A metabolic illness that causes abnormal blood sugar levels over time. If left undiagnosed or
unmanaged, it can lead to serious complications like kidney failure, heart disease, and nerve damage.
Heart Disease: A general term for various heart conditions, including coronary artery disease,
heart attacks and heart failure. As it is one of the major causes of death worldwide, early detection is
crucial for preventing serious complications and saving lives.
Parkinson’s Disease: A progressive, long-lasting neurological disease that affects mobility and
speech. It disrupts movement and control, causing symptoms such as tremor, stiffness, and
trouble with coordination. Early diagnosis can enhance quality of life [2] and help to control
symptoms properly.
By leveraging Kaggle datasets, this system is trained on real-world patient data to improve accuracy
and reliability. The project employs Support Vector Machines (SVM) for Diabetes and Parkinson’s
Disease prediction and Logistic Regression for Heart Disease classification, ensuring a tailored and
effective approach for each condition. Additionally, Decision Tree algorithms are incorporated to
refine predictions and optimize model performance.
The system aims to:
• Predict multiple diseases using an AI-driven approach, reducing dependency on costly and
time-consuming medical tests.
• Enhance diagnostic accuracy by training models on diverse datasets with real-world patient
information.
• Support early prevention of Parkinson's disease, heart disease, and diabetes. To anticipate disease
and maximise therapy choice for the real-life patient, a virtual representation of a patient is created
and receives real-time updates of a spectrum of data variables [3].
• Make the system accessible and user-friendly through an interactive web-based interface (using
Flask), allowing users to input medical parameters and receive real-time disease predictions.
• Facilitate preventive healthcare by providing an efficient tool for risk assessment, helping both
individuals and medical professionals make informed decisions.
FIGURE 1: Proposed Model Architecture
Figure 1 shows the proposed predictive disease detection system: the input datasets
(Diabetes, Heart Disease, Parkinson's Disease), the data preprocessing steps, the
classification stage (DT, SVM, RF), and evaluation using Accuracy, Precision, Recall, and
F1-score.
2. LITERATURE SURVEY
2.2 Design of an Efficient Prediction Model for Early Parkinson’s Disease Diagnosis (2024)
Author: K. Shyamala, T. M. Navamani
2.5 Deep Learning of the Retina Enables Phenome and Genome (2021)
Author: Seyedeh Maryam Zekavat, Vineet K Raghu, Mark Trinder
Date of Submission: 2021
Abstract: The microvasculature, the smallest blood vessels in the body, has key roles in
maintenance of organ health and tumorigenesis. The retinal fundus is a window for human in
vivo noninvasive assessment of the microvasculature.
2.8 Use of deep learning genomics to discriminate Alzheimer's disease and healthy controls
(2021).
Author(s): Lanlan Li, Yanru Huang, Ying Han, Jiehui Jiang.
Date of Submission: 2021
Abstract: Alzheimer's disease (AD) is the most prevalent neurodegenerative disorder and
the most common form of dementia in the elderly. Because genes are an important clinical
risk factor for AD, genomic studies, such as genome-wide association studies
(GWAS), have been widely applied in AD research.
2.9 A Deep Learning-Based Genome-Wide Polygenic Risk Score for Common Diseases
Identifies Individuals with Risk (2021).
Author(s): J. Peng, J. Li, R. Han
Date of Submission : 2021
Abstract: The paper explores deep learning-based genomics in differentiating Alzheimer’s
disease from healthy individuals. Their study underscores the potential of AI in advancing
neurodegenerative disease diagnostics.
2.10 Gait Analysis for Parkinson’s Disease Using Machine Learning (2021).
Author(s): S. W. Lee
2.11 Machine learning algorithms for predicting coronary artery disease (2021)
Author(s): Aravind Akella, Sudheer Akella
2.13 Heart disease prediction using hybrid machine learning model (2021)
Author(s): M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai, and R. S. Suraj
Abstract: Background: Studies have demonstrated that the current US guidelines based on
American College of Cardiology/American Heart Association (ACC/AHA) Pooled Cohort
Equations Risk Calculator may underestimate risk of atherosclerotic cardiovascular disease
(CVD) in certain high‐risk individuals, therefore missing opportunities for intensive therapy
and preventing CVD events.
2.18 Speech Analysis for Parkinson’s Disease Detection Using CNN and LSTM (2020).
Author(s): D. P. Kingma
2.19 Predicting Coronary Heart Disease Using an Improved LightGBM Model
Author(s): Captainozlem
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM:
• Accuracy is low.
• Limited prediction capability.
The proposed system integrates Random Forest, Decision Tree, and Support Vector
Machine (SVM) for Multiple Disease Prediction (Diabetes, Heart Disease, and
Parkinson’s). After extensive evaluation, Random Forest achieved the highest
accuracy across all three diseases, demonstrating its superior performance in handling
complex patterns and large datasets. Decision Tree provided interpretability, while
SVM performed well with high-dimensional data. The system follows a structured ML
pipeline, including data preprocessing, feature selection, and hyperparameter tuning
using Kaggle datasets. With a user-friendly interface, this system supports early
diagnosis, enhances healthcare decision-making, and promotes preventive care,
ensuring improved patient outcomes.
• Our work focuses on optimizing the PCHF mechanism to select the most important
features, leading to improved accuracy.
• We introduce an innovative approach by creating a new dataset based on the eight best-fit
features, optimized for accuracy enhancement.
3.3 SYSTEM REQUIREMENTS:
Hard Disk: 50 GB
Software: Anaconda
Database: Sqlite3
• 1. Data Collection
• 2. Data Preprocessing
• 4. Modeling
• 5. Predicting
3.5 NON-FUNCTIONAL REQUIREMENTS:
Usability Requirement
The system should have a user-friendly interface for key management and encryption
operations.
Users should be able to manage encrypted files with minimal technical knowledge.
Serviceability Requirement
The system should allow easy updates and improvements without affecting existing
security mechanisms.
3.5.1 ADVANTAGES:
3.5.2 DISADVANTAGES:
• High Computational Overhead - Encrypting files with AES and ECC, along with
blockchain transactions, can increase processing time.
• Storage Overhead - Blockchain requires significant storage space due to its
immutable record-keeping.
• Complexity - Managing dynamic AES keys and blockchain infrastructure requires
specialized technical expertise.
• Transaction Costs - Public blockchain networks may incur transaction fees,
increasing operational costs.
• Latency Issues - Writing and verifying transactions on a blockchain can introduce
delays in encryption and decryption processes.
• The biggest disadvantage of Non-functional requirement is that it may affect the various
high-level software subsystems.
The feasibility of the project is analyzed in this phase and business proposal is put
forth with a very general plan for the project and some cost estimates. During system
analysis the feasibility study of the proposed system is to be carried out. This is to
ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.
• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY
3.6.1.1 ECONOMICAL FEASIBILITY:
This study is carried out to check the economic impact that the system will
have on the organization. The amount of funds that the company can pour into the
research and development of the system is limited, so the expenditures must be justified.
The developed system is well within the budget; this was achieved because
most of the technologies used are freely available. Only the customized products had
to be purchased.
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, because that would in turn place high demands on the
client. The developed system must have modest requirements, so that only minimal or no
changes are required to implement it.
This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of acceptance
by the users depends solely on the methods employed to educate users about the
system and to make them familiar with it. Their confidence must be raised so that they
can also offer constructive criticism, which is welcomed, as they are the final users
of the system.
4. SYSTEM DESIGN
4.2 UML DIAGRAMS:
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of programming languages and development process.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
4.2.2 CLASS DIAGRAM:
The class diagram is used to refine the use case diagram and define a detailed
design of the system. The class diagram classifies the actors defined in the use case
diagram into a set of interrelated classes. The relationship or association between the
classes can be either an "is-a" or "has-a" relationship. Each class in the class diagram
may be capable of providing certain functionalities. These functionalities provided by
the class are termed "methods" of the class. Apart from this, each class may have
certain "attributes" that uniquely identify the class.
4.2.3 SEQUENCE DIAGRAM:
A sequence diagram represents the interaction between different objects in the system.
The important aspect of a sequence diagram is that it is time-ordered. This means that
the exact sequence of the interactions between the objects is represented step by step.
Different objects in the sequence diagram interact with each other by passing
"messages".
Deployment diagram:
The deployment diagram captures the configuration of the runtime elements of the application. This
diagram is by far most useful when a system is built and ready to be deployed.
4.2.4 DATA FLOW DIAGRAM:
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can
be used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the
system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It
is used to model the system components. These components are the system
processes, the data used by the processes, the external entities that interact with the
system, and the information flows in the system.
3. DFD shows how information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.
4. A DFD may be used to represent a system at any level of abstraction. DFD may be
partitioned into levels that represent increasing information flow and functional detail.
4.2.5 FLOWCHART DIAGRAM
The flowchart represents a structured workflow for a system that processes datasets using
an algorithm. It begins with a user attempting access, followed by a verification step to
check authorization. Unauthorized users are denied access, while authorized users
proceed to upload a dataset. The system then allows viewing of the dataset before running
an algorithm for analysis. Once executed, the algorithm predicts an output, which is then
visualized as a graph. The process concludes after the graphical representation, ensuring a
streamlined approach to data processing and analysis.
5. SYSTEM IMPLEMENTATION
5.1 MODULES:
• Gathering the datasets: We gather all the required data from the Kaggle website and upload it to
the proposed model.
• Generate Train & Test Model: We preprocess the gathered data and then
split it into two parts: training data (80%) and test data (20%).
• Run Algorithms: For prediction, apply the machine learning models to the dataset,
using 70 to 80% of the data for training and the remaining 30 to 20% for
testing.
• Predict output: In this module we predict the output based on the input data.
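The 80/20 split described in the modules above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function name `split_rows`, the seed value, and the toy rows are hypothetical, and the project itself may rely on a library routine for this step.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy stand-in for a dataset: 100 (features, label) rows (hypothetical data).
rows = [([i, i * 2], i % 2) for i in range(100)]
train, test = split_rows(rows)
```

Fixing the seed keeps the training and test sets identical across runs, which makes accuracy figures comparable between models.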
5.2 Algorithm:
Step 1 Choose relevant datasets for Diabetes, Heart Disease, and Parkinson’s for training
and evaluation.
Step 2 To ensure generalization, divide the dataset into training, validation, and testing sets.
Step 3 Use ML classifiers such as DT, RF, or SVM for disease classification.
Step 4 Optimize model parameters such as tree depth, number of estimators, and feature
selection.
Step 5 Assess model performance using accuracy, precision, recall, and F1-score.
Step 6 Identify the most effective model for each disease based on validation and testing
metrics.
Step 7 Test the model’s effectiveness by applying it to unseen real-world data.
Step 8 Deploy the model into a Flask-based web application with SQLite for user
interaction and real-time disease detection.
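Step 5 relies on four metrics that all follow from the counts of true and false positives and negatives. The sketch below shows one way to compute them for a binary task. The function name `classification_metrics` and the toy label lists are assumptions made for illustration and are not taken from the project code or its results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels for illustration only (1 = disease, 0 = no disease).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In a screening setting, recall is often watched most closely, because a false negative means a patient with the disease is missed.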
This combination of multiple models is called an ensemble. Ensembles use two methods:
1. Bagging: Creating different training subsets from the sample training data with
replacement is called bagging. The final output is based on majority voting.
2. Boosting: Combining weak learners into strong learners by creating sequential models
such that the final model has the highest accuracy is called boosting. Examples: AdaBoost,
XGBoost.
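The two ingredients of bagging named above, sampling with replacement and majority voting, can be sketched as follows. The helper names `bootstrap_sample` and `majority_vote` and the toy values are hypothetical. The sketch leaves out the training of a model on each subset.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bagging subset)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(votes):
    """Return the class label predicted by most models."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)
rows = list(range(10))  # toy stand-in for training rows
subsets = [bootstrap_sample(rows, rng) for _ in range(3)]

# Hypothetical predictions from three models for one patient.
final_label = majority_vote([1, 0, 1])
```

Because each subset is drawn with replacement, some rows appear more than once and others are left out, which is what makes the individual models differ.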
Fig 3
5.3 Random Forest:
Random Forest is an ensemble learning technique that uses a large number of decision
trees to reduce overfitting and improve prediction accuracy. This work uses
Random Forest to predict cases of Parkinson's disease. The method creates a large
number of low-correlation decision trees by bootstrapping the data and sampling the
features considered at each split. The more diverse trees produced by this randomisation
make the Random Forest model more resilient and less prone to overfitting. When making
predictions, the outputs of all the decision trees are combined for classification, often
using majority voting. Random Forest's ability to manage complex
Fig 4: Random forest architecture
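The feature randomisation mentioned above can be illustrated with a short sketch that picks a random subset of feature indices for one split. The square-root subset size is a common convention and is assumed here. The function name `features_for_split` and the feature count are hypothetical, and the report does not state which setting the project used.

```python
import math
import random

def features_for_split(n_features, rng):
    """Pick a random subset of feature indices to consider at one split."""
    k = max(1, int(math.sqrt(n_features)))  # square-root rule (assumed)
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)
n_features = 16  # hypothetical feature count
candidates = [features_for_split(n_features, rng) for _ in range(3)]
```

Each split sees a different handful of features, so no single strong feature dominates every tree, which lowers the correlation between trees.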
5.4 SVM
Support Vector Machine (SVM) is a supervised machine learning method widely used
for classification tasks. For diabetes prediction, SVM sorts patient data
into diabetic and non-diabetic categories. SVM determines the hyperplane that best
separates the classes. Separation is made easier by the kernel technique, which
maps complex non-linear data into a higher-dimensional space. SVMs are effective at
identifying the critical decision boundary in high-dimensional data, even in the
presence of noise. By focusing on the support vectors (the data points
closest to the decision boundary), SVMs improve diabetes prediction accuracy and
decrease classification errors.
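To make the kernel idea concrete, the sketch below evaluates a Gaussian (RBF) kernel and the resulting decision function for one new point. It is an illustration under stated assumptions: the support vectors, coefficients, bias, and `gamma` are hand-picked hypothetical values, not a trained model, and the report does not say which kernel the project used.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def decision_value(x, support_vectors, coefficients, bias, gamma=0.5):
    """Kernel SVM decision function: sum_i coef_i * K(sv_i, x) + bias."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coefficients)) + bias

# Hypothetical, hand-picked values purely for illustration.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
coefficients = [-1.0, 1.0]  # signed dual coefficients (alpha_i * y_i)
value = decision_value([1.8, 2.1], support_vectors, coefficients, bias=0.0)
label = 1 if value > 0 else 0
```

Only the support vectors enter the sum, which is why the decision boundary is determined by the points closest to it.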
5.5 Decision Tree
As a supervised machine learning method, the Decision Tree works well
for both regression and classification tasks. In this work it is used to predict heart disease.
Decision trees repeatedly split the data into subgroups based on the most informative
features, using Gini impurity or entropy. The leaves of the tree then
classify the record (e.g., "Heart Disease" or "No Heart Disease"). Because the
decision-making process is straightforward and easy to follow, Decision Trees are
interpretable and healthcare practitioners can understand their predictions.
Decision trees work well for datasets with clear feature relationships,
but they are simple models and can overfit if not tuned properly.
Fig 6: DT architecture
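The two split criteria named above can be written out directly. The sketch below computes Gini impurity and entropy for the labels in one node. The function names are chosen for illustration, and the labels are toy values rather than project data.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

mixed_node = [0, 0, 1, 1]  # toy labels: evenly mixed node
pure_node = [1, 1, 1]      # toy labels: every record in one class
mixed_gini = gini_impurity(mixed_node)
mixed_entropy = entropy(mixed_node)
```

Both measures are zero for a pure node and largest for an evenly mixed one, so a tree prefers the split whose child nodes score lowest.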
CODE:
App.py
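# Import the required Python classes and packages
import random
import joblib
import numpy as np
from flask import Flask, render_template, request

# signup() and signin() are called by the routes below; they are assumed to be
# defined elsewhere in the project (their source is not shown in this report).

# Fix the random seeds so that runs are reproducible
random_seed = 42
random.seed(random_seed)
np.random.seed(random_seed)

UPLOAD_FOLDER = 'uploads'
ALLOWED_EXTENSIONS = {'txt', 'csv'}
MAX_UPLOAD_SIZE_MB = 512  # Maximum upload size in megabytes
app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
# Set max content length (in bytes)
app.config['MAX_CONTENT_LENGTH'] = MAX_UPLOAD_SIZE_MB * 1024 * 1024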
def allowed_file(filename):
    # Accept only file names whose extension is in ALLOWED_EXTENSIONS
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
@app.route('/')
def index():
    return render_template('index.html')

@app.route('/register')
def register():
    return render_template('register.html')

@app.route('/login')
def login():
    return render_template('login.html')

@app.route("/signup")
def signup_route():
    return signup()

@app.route("/signin")
def signin_route():
    return signin()

@app.route('/home')
def home():
    return render_template('home.html')

@app.route('/heart')
def heart():
    return render_template('heart_predict.html')

@app.route('/diabetes')
def diabetes():
    return render_template('diabetes_predict.html')

@app.route('/parkinson')
def parkinson():
    return render_template('parkinsons_predict.html')

@app.route('/heart_notebook')
def heart_notebook():
    return render_template('heart_test.html')

@app.route('/parkinson_notebook')
def parkinson_notebook():
    return render_template('parkinson_test.html')

@app.route('/diabetes_notebook')
def diabetes_notebook():
    return render_template('diabetes_test.html')
@app.route('/predict', methods=['POST'])
def predict():
    # Convert the submitted form values into one feature vector
    features = [float(x) for x in request.form.values()]
    print(features, len(features))
    model = joblib.load('model/heart_model.sav')
    prediction = model.predict([np.array(features)])[0]
    if prediction == 0:
        output = "NO Heart"
        alert_class = "alert-success text-success"  # Green color for positive result
        icon = "fas fa-smile-beam text-success"  # Happy icon
        text_color = "green"
    else:
        output = "Heart"
        alert_class = "alert-danger text-danger"  # Red color for warning
        icon = "fas fa-exclamation-triangle text-danger"  # Warning icon
        text_color = "red"
    return render_template('prediction.html', output=output, alert_class=alert_class,
                           icon=icon, text_color=text_color)
@app.route('/predict2', methods=['POST'])
def predict2():
    # Convert the submitted form values into one feature vector
    features = [float(x) for x in request.form.values()]
    print(features, len(features))
    model = joblib.load('model/diabetes.sav')
    prediction = model.predict([np.array(features)])[0]
    if prediction == 0:
        output = "No Diabetes"
        alert_class = "alert-success text-success"  # Green color for positive result
        icon = "fas fa-smile-beam text-success"  # Happy icon
        text_color = "green"
    else:
        output = "Diabetes"
        alert_class = "alert-danger text-danger"  # Red color for warning
        icon = "fas fa-exclamation-triangle text-danger"  # Warning icon
        text_color = "red"
    # The listing omitted this return; it mirrors the other two predict routes
    return render_template('prediction.html', output=output, alert_class=alert_class,
                           icon=icon, text_color=text_color)
@app.route('/predict3', methods=['POST'])
def predict3():
    # Convert the submitted form values into one feature vector
    features = [float(x) for x in request.form.values()]
    print(features, len(features))
    model = joblib.load('model/parkinson.sav')
    prediction = model.predict([np.array(features)])[0]
    if prediction == 0:
        output = "NO Parkinson"
        alert_class = "alert-success text-success"  # Green color for positive result
        icon = "fas fa-smile-beam text-success"  # Happy icon
        text_color = "green"
    else:
        output = "Parkinson"
        alert_class = "alert-danger text-danger"  # Red color for warning
        icon = "fas fa-exclamation-triangle text-danger"  # Warning icon
        text_color = "red"
    return render_template('prediction.html', output=output, alert_class=alert_class,
                           icon=icon, text_color=text_color)
if __name__ == '__main__':
    app.run(debug=True)
<!DOCTYPE html>
<html>
<head>
<title>
Signup
</title>
<!--meta tags -->
<style>
/* CSS Libraries Used
*/
@import url('https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400');
body, html {
font-family: 'Source Sans Pro', sans-serif;
background-color: #ddf1e0;
padding: 0;
margin: 0;
}
#particles-js {
position: absolute;
width: 100%;
height: 100%;
}
.container{
margin: 0;
top: 50px;
left: 50%;
position: absolute;
text-align: center;
transform: translateX(-50%);
background-color: rgb(238, 243, 237);
border-radius: 9px;
border-top: 10px solid #79a6fe;
border-bottom: 10px solid #8BD17C;
width: 400px;
height: 500px;
box-shadow: 1px 1px 108.8px 19.2px rgb(25,31,53);
}
.box h4 {
font-family: 'Source Sans Pro', sans-serif;
color: #5c6bc0;
font-size: 18px;
margin-top: 94px;
}
.box h4 span {
color: #dfdeee;
font-weight: lighter;
}
.box h5 {
font-family: 'Source Sans Pro', sans-serif;
font-size: 13px;
color: #a1a4ad;
letter-spacing: 1.5px;
margin-top: -15px;
margin-bottom: 70px;
}
}
::-webkit-input-placeholder {
color: #565f79;
}
a{
color: #5c7fda;
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
.btn1 {
border:0;
background: #7f5feb;
color: #dfdeee;
border-radius: 100px;
width: 340px;
height: 49px;
font-size: 16px;
position: absolute;
top: 79%;
left: 8%;
transition: 0.3s;
cursor: pointer;
}
.btn1:hover {
background: #5d33e6;
}
.rmb {
position: absolute;
margin-left: -24%;
margin-top: 0px;
color: #dfdeee;
font-size: 13px;
}
.forgetpass {
position: relative;
float: right;
right: 28px;
}
.dnthave{
position: absolute;
top: 92%;
left: 24%;
}
font-size: 16px;
content: "\f00c";
position: absolute;
top: -4px;
color: #896cec;
left: -1px;
width: 13px;
}
.typcn {
position: absolute;
left: 339px;
top: 282px;
color: #3b476b;
font-size: 22px;
cursor: pointer;
}
.typcn.active {
color: #7f60eb;
}
.error {
background: #ff3333;
text-align: center;
width: 337px;
height: 20px;
padding: 2px;
border: 0;
border-radius: 5px;
margin: 10px auto 10px;
position: absolute;
top: 31%;
left: 7.2%;
color: white;
display: none;
}
.footer {
position: relative;
left: 0;
bottom: 0;
top: 605px;
width: 100%;
color: #78797d;
font-size: 14px;
text-align: center;
}
.footer .fa {
color: #7f5feb;
}
</style>
</head>
<body>
<div class="animated bounceInDown">
<div class="container">
<span class="error animated tada" id="msg"></span>
<form action="/signup" method="GET" class="box" >
<h1 style="color: black; "><span>Register</span></h1>
<!-- Success Message -->
{% if message %}
<p style="color: green;">{{ message }}</p>
{% endif %}
<input type="text" name="user" placeholder="Username" required autocomplete="off"
pattern="[A-Za-z].{6,}" title="Six or more Uppercase and lowercase letter req.">
<i class="typcn typcn-eye" id="eye"></i>
<input type="text" name="name" placeholder="Name" required autocomplete="off">
<i class="typcn typcn-eye" id="eye"></i>
<input type="text" name="email" placeholder="Email" required autocomplete="off"
pattern="[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}$">
<i class="typcn typcn-eye" id="eye"></i>
<input type="text" name="mobile" placeholder="Mobile" required autocomplete="off"
pattern="[6-9]{1}[0-9]{9}" title="Phone number starting with 6-9 followed by 9 digits (0-9)">
<i class="typcn typcn-eye" id="eye"></i>
<input type="password" name="password" placeholder="Password" id="pwd" required
autocomplete="off" pattern="(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}" title="Must contain at least one
number and one uppercase and lowercase letter, and at least 8 or more characters">
</form>
</div>
</div>
</body>
</html>
6. SYSTEM TESTING
6.2 Structural Testing:
It is not possible to effectively test software without running it. Structural testing, also
known as white-box testing, is required to detect and fix bugs and errors emerging
during the pre-production stage of the software development process. At this stage,
unit testing based on the software structure is performed using regression testing. In
most cases, it is an automated process working within the test automation framework
to speed up the development process at this stage. Developers and QA engineers have
full access to the software’s structure and data flows (data flows testing), so they could
track any changes (mutation testing) in the system’s behavior by comparing the tests’
outcomes with the results of previous iterations (control flow testing).
6.3 Behavioral Testing:
The final stage of testing focuses on the software’s reactions to various activities rather
than on the mechanisms behind these reactions. In other words, behavioral testing, also
known as black-box testing, presupposes running numerous tests, mostly manual, to
see the product from the user’s point of view. QA engineers usually have some specific
information about a business or other purposes of the software (the "black box") to run
usability tests, for example, and react to bugs as regular users of the product will do.
Behavioral testing also may include automation (regression tests) to eliminate human
error if repetitive activities are required. For example, you may need to fill 100
registration forms on the website to see how the product copes with such an activity,
so the automation of this test is preferable.
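The repeated registration scenario can be automated with a short script. The sketch below generates 100 synthetic registrations and checks each one against validation rules. It is an illustration under stated assumptions: the two regular expressions are modelled on the signup form's mobile and password rules, the helper names are hypothetical, and the check runs against the rules directly rather than against the running web application.

```python
import re

# Assumed validation rules, modelled on the signup form's pattern attributes.
MOBILE_RE = re.compile(r"[6-9][0-9]{9}")
PASSWORD_RE = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}")

def is_valid_registration(form):
    """Return True when the mobile number and password satisfy both rules."""
    return bool(MOBILE_RE.fullmatch(form["mobile"])) and bool(
        PASSWORD_RE.fullmatch(form["password"]))

def make_registration(i):
    """Build the i-th synthetic form; every tenth one has a bad mobile number."""
    mobile = f"9{i:09d}" if i % 10 else "12345"
    return {"user": f"tester{i:03d}", "mobile": mobile,
            "password": f"Passw0rd{i}"}

forms = [make_registration(i) for i in range(100)]
results = [is_valid_registration(form) for form in forms]
accepted = sum(results)
rejected = len(results) - accepted
```

Mixing deliberately invalid entries into the batch checks that the rules reject bad input as well as accept good input.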
2    User signin    User get login into the application    There is no process
7. RESULTS
Figure 7.1: HOME PAGE
Figure 7.4: Input for heart disease
Figure 7.6: FOR DIABETES
Figure 7.9 FOR PARKINSONS DISEASE
Figure 7.11: Output for Parkinson’s disease
• Figure 7.1 presents the Multi Disease Classification system, which predicts Diabetes, Heart
Disease, and Parkinson’s Disease using machine learning. The interface provides disease
information, promoting awareness and early detection.
• Figure 7.2 illustrates the login page for an AI-driven approach for predicting diseases like
Diabetes, Parkinson’s, and heart disease. It employs machine learning models such as SVM,
Decision Tree, and Random Forest to enhance classification accuracy.
• Figure 7.3 shows the heart disease prediction system, outlining symptoms, causes, and
prevention strategies. It reassures users by promoting a healthy lifestyle and encouraging risk
reduction measures.
• Figure 7.4 depicts the heart disease prediction system, in which users input parameters
like age, cholesterol, and blood pressure to assess their risk.
• Figure 7.5 shows the heart disease prediction system confirming no risk, reassuring
the user and encouraging a healthy lifestyle.
• Figure 7.6 shows the diabetes system, outlining symptoms, causes, and prevention
strategies. It reassures users by promoting a healthy lifestyle and encouraging risk
reduction measures.
• Figure 7.7 illustrates the diabetes prediction system, which collects inputs such as BMI
and glycated haemoglobin to evaluate diabetes risk.
• Figure 7.8 presents the diabetes prediction system confirming no diabetes, promoting
proactive health management.
• Figure 7.9 shows the Parkinson’s disease prediction system, outlining symptoms, causes, and
prevention strategies. It reassures users by promoting a healthy lifestyle and encouraging risk
reduction measures.
• Figure 7.10 highlights the Parkinson’s disease prediction system, which analyzes vocal
and neurological markers for risk assessment.
• Figure 7.11 shows the Parkinson’s disease prediction system predicting a risk and
advising the user to consult a doctor.
• Figure 7.12 shows the Diseases Notebook and Logout Button page, which provides a Colab
notebook for each disease and a logout button for the user.
Figure 7.14: Heart Disease Confusion Matrix
As seen in Figure 7.14, most heart disease predictions fall on the diagonal of the confusion
matrix, meaning that they are classified correctly. A few misclassifications do occur, so the
model performs well with slight variation in per-class accuracy. Here, 0 denotes no heart disease
and 1 denotes heart disease.
Figure 7.15 shows the classification distribution for diabetes. Overall model performance is
good, with only a few misclassifications and modest differences in per-class accuracy. Here, 0
denotes the absence of diabetes and 1 denotes diabetes.
As seen in Figure 7.16, the classes are evenly distributed and most predictions fall on the
diagonal, as intended. Overall accuracy is good, although a few misclassifications are noticeable
and may indicate class-level deviations. Here, 0 denotes non-Parkinson’s and 1 denotes
Parkinson’s disease.
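The way a binary confusion matrix is read in the discussion above can be illustrated with a minimal sketch. It assumes labels 0 (disease absent) and 1 (disease present); the helper name confusion_matrix and the label lists are hypothetical and are not the report's data or code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1.

    Rows are the actual class, columns are the predicted class,
    so correct predictions sit on the diagonal.
    """
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(0, 0)], counts[(0, 1)]],
        [counts[(1, 0)], counts[(1, 1)]],
    ]


# Hypothetical labels for illustration only (not taken from the report).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

matrix = confusion_matrix(y_true, y_pred)

# Accuracy is the share of samples on the diagonal.
correct = matrix[0][0] + matrix[1][1]
accuracy = correct / len(y_true)

print(matrix)    # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```

In this toy example, six of the eight samples lie on the diagonal, and the two off-diagonal entries are the misclassifications.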
Table 1: Comparison Accuracy of the trained model
The existing systems, including the SVM classifier and Logistic Regression, have customarily
been used for Diabetes, Heart Disease, and Parkinson’s Disease prediction. However, their
accuracy remains limited: SVM achieves nearly 79% for Diabetes and almost 89% for
Parkinson’s Disease, while Logistic Regression reaches roughly 85% for Heart Disease. These
results point to the need for more robust and more accurate models to improve disease
classification performance.
Table 2 presents the evaluation results of the proposed model, comparing Random Forest with
standard approaches such as SVM and Logistic Regression for disease classification. The
evaluation considered accuracy, precision, recall, and F1 score for cardiovascular disease,
diabetes, and Parkinson’s disease. The Random Forest algorithm achieves the best results, with
a Precision of 0.980, a Recall of 0.986, and an F1 Score of 0.983 for Diabetes. It also performs
very well in heart disease classification, attaining a Precision of 0.966, a Recall of 0.980, and
an F1 Score of 0.951. For Parkinson’s Disease detection, its Precision, Recall, and F1 Score are
0.937, 0.983, and 0.959, respectively. These results indicate that Random Forest delivers the
best classification performance, outperforming conventional classifiers such as SVM and
Logistic Regression in all disease categories.
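For reference, the F1 score is conventionally defined as the harmonic mean of precision and recall. The sketch below applies that standard definition to the precision and recall values quoted above; the function name f1_score and the dictionary layout are illustrative assumptions and are not taken from the project code. A score obtained by averaging per-class scores (for example, macro averaging) need not match this single-class formula.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the text for Random Forest.
reported = {
    "Diabetes": (0.980, 0.986),
    "Heart Disease": (0.966, 0.980),
    "Parkinson's Disease": (0.937, 0.983),
}

f1_by_disease = {
    name: round(f1_score(p, r), 3) for name, (p, r) in reported.items()
}

for name, f1 in f1_by_disease.items():
    print(f"{name}: F1 = {f1:.3f}")
# Diabetes: F1 = 0.983
# Heart Disease: F1 = 0.973
# Parkinson's Disease: F1 = 0.959
```

The Diabetes and Parkinson’s Disease values agree with the quoted F1 scores. The heart disease value computed this way is about 0.973 rather than the quoted 0.951, which may reflect a different averaging scheme in the original evaluation.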
8. CONCLUSION
The Multiple Disease Prediction System successfully leverages machine learning
algorithms to predict Diabetes, Heart Disease, and Parkinson’s Disease based on real-
world medical data. After evaluating different models, Random Forest emerged as the
most accurate algorithm across all three diseases, outperforming Support Vector
Machine (SVM) and Decision Tree algorithms.
9. REFERENCES
[1] A. Islam, M. Saleh, A. Tasnim, M. Samiun, S. Noor, and K. Bishnu, "Identification of
Cardiovascular Disease via Diverse Machine Learning Methods," Journal of Computer and
Communications, vol. 12, pp. 134–150, Dec. 2024, doi: 10.4236/jcc.2024.1212009.
[2] S. Krishnan and N. T. M, "Design of an Efficient Prediction Model for Early Parkinson's
Disease Diagnosis," IEEE Access, vol. PP, no. 99, pp. 1–1, Jan. 2024, doi:
10.1109/ACCESS.2024.3421302.
[3] G. Coorey, G. A. Figtree, D. F. Fletcher, et al., "The Health Digital Twin to Tackle
Cardiovascular Disease—A Review of an Emerging Interdisciplinary Field," NPJ Digit.
Med., vol. 5, no. 1, p. 126, 2022, doi: 10.1038/s41746-022-00640-7.
[4] S.-J. Hahn, S. Kim, Y. S. Choi, J. Lee, and J. Kang, "Prediction of Type 2 Diabetes Using
Genome-Wide Polygenic Risk Score and Metabolic Profiles: A Machine Learning Analysis
of Population-Based 10-Year Prospective Cohort Study," EBioMedicine, vol. 86, 2022, doi:
10.1016/j.ebiom.2022.104383.
[5] S. M. Zekavat, V. K. Raghu, M. Trinder, et al., "Deep Learning of the Retina Enables
Phenome- and Genome-Wide Analyses of the Microvasculature," Circulation, vol. 145, no. 2,
pp. 134–150, 2022, doi: 10.1161/CIRCULATIONAHA.121.057709.
[6] J. Shahbakhti et al., "Detection of Parkinson’s Disease Using Speech Features," IEEE
Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 487–500, 2022.
[8] L. Li, Y. Huang, and Y. Han, "Use of Deep Learning Genomics to Discriminate Alzheimer’s
Disease and Healthy Controls," in Proc. 43rd Annu. Int. Conf. IEEE Eng. Med. Biol. Soc.
(EMBC), 2021, pp. 1–4, doi: 10.1109/EMBC46164.2021.9630731.
[9] J. Peng, J. Li, R. Han, et al., "A Deep Learning-Based Genome-Wide Polygenic Risk Score
for Common Diseases Identifies Individuals with Risk," medRxiv, 2021,
doi: 10.1101/2021.11.17.21265352.
[10] S. W. Lee et al., "Gait Analysis for Parkinson’s Disease Using Machine Learning," IEEE
Journal of Translational Engineering in Health and Medicine, vol. 9, pp. 1–10, 2021.
[11] Akella and S. Akella, "Machine learning algorithms for predicting coronary artery disease:
Efforts toward an open-source solution," Future Sci. OA, vol. 7, no. 6, Jul. 2021, Art. no.
FSO698, doi: 10.2144/fsoa-2020-0206.
[13] M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai, and R. S. Suraj, "Heart disease prediction
using hybrid machine learning model," in Proc. 6th Int. Conf. Inventive Comput. Technol.
(ICICT), Coimbatore, India, Jan. 2021, pp. 1329–1333.
[14] J. Corral-Acero, F. Margara, M. Marciniak, et al., "The ‘Digital Twin’ to Enable the Vision
of Precision Cardiology," Eur. Heart J., vol. 41, no. 48, pp. 4556–4564, 2020, doi:
10.1093/eurheartj/ehaa159.
[16] J. R. Gray et al., "A Machine Learning Approach to Predict Parkinson’s Disease," IEEE
Transactions on Neural Systems and Rehabilitation Engineering, vol. 28, no. 8, pp. 1802–1810,
2020.
[17] M. M. Rahman et al., "A Deep Learning-Based Approach for Parkinson’s Disease Detection
Using Handwriting Analysis," IEEE Access, vol. 8, pp. 147953–147962, 2020.
[18] D. P. Kingma et al., "Speech Analysis for Parkinson’s Disease Detection Using CNN and
LSTM," IEEE Transactions on Biomedical Engineering, vol. 67, no. 9, pp. 2703–2714, 2020.
10. PUBLICATION