PREDICTION OF DIABETES, HEART DISEASE AND PARKINSON’S DISEASE USING ML

A PROJECT REPORT

Submitted to

Jawaharlal Nehru Technological University Kakinada, Kakinada


in partial fulfillment for the award of the degree of

Bachelor of Technology
in
COMPUTER SCIENCE AND ENGINEERING (AI&ML)
Submitted by
Golla Bhavana 21KN1A4250(881186183480)

Under the esteemed guidance of


Dr.V TEJU
Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (AI&ML)


NRI INSTITUTE OF TECHNOLOGY
Autonomous
(Approved by AICTE, Permanently Affiliated to JNTUK, Kakinada)
Accredited by NBA (CSE, ECE & EEE), Accredited by NAAC with ‘A’ Grade
ISO 9001:2015 Certified Institution
Pothavarappadu (V), (Via) Nunna, Agiripalli (M), Eluru Dist,
PIN: 521212, A.P., India.
2021-2025

CERTIFICATE

This is to certify that the Project entitled “PREDICTION OF DIABETES, HEART DISEASE AND PARKINSON’S DISEASE USING ML” is a bonafide work carried out by Golla Bhavana (21KN1A4250) in partial fulfillment for the award of the degree of Bachelor of Technology in Computer Science and Engineering (AI&ML) of Jawaharlal Nehru Technological University Kakinada, during the year 2024-2025.

Project Guide: Dr. V TEJU, Professor

Head of the Department: Dr. B. DASARADHARAM, Professor & HOD of CSE (AI&ML) Dept

EXTERNAL EXAMINER

DECLARATION

I hereby declare that the project report titled “PREDICTION OF DIABETES, HEART DISEASE AND PARKINSON’S DISEASE USING ML” is a bonafide work carried out in the Department of Computer Science and Engineering (AI&ML), NRI Institute of Technology, Agiripalli, Vijayawada, during the academic year 2024-2025, in partial fulfilment for the award of the degree of Bachelor of Technology by JNTU Kakinada.

I further declare that this dissertation has not been submitted elsewhere for any Degree.

GOLLA BHAVANA (21KN1A4250)

ACKNOWLEDGEMENT

I take this opportunity to thank all who have rendered their full support to my work. The pleasure, the achievement, the glory, the satisfaction, the reward, the appreciation and the construction of my project cannot be expressed in a few words, and I am grateful for their valuable suggestions.

I extend my sincere thanks to Professor Dr. V TEJU for her continuous guidance and support in completing my project successfully.

I express my heartfelt thanks to the Professor & Head of the Department, Dr. B. DASARADHA RAM, for his continuous guidance towards the completion of my project work.

I am thankful to the Principal, Dr. C. NAGA BHASKAR, for his encouragement to complete the project work.

I extend my sincere and honest thanks to the Chairman, Dr. R. VENKATA RAO, and the Secretary, Sri K. Sridhar, for their continuous support in completing the project work.

Finally, I thank the Administrative Officer, Staff Members, and Faculty of the Department of CSE (AI&ML), NRI Institute of Technology, and my friends, who directly or indirectly helped me in the completion of this project.

GOLLA BHAVANA (21KN1A4250)

ABSTRACT

Healthcare has witnessed significant advancements with the integration of machine learning
(ML), enabling early disease detection and accurate predictions. This project presents a
Multiple Disease Prediction System that utilizes Random Forest, Decision Tree, and Support
Vector Machine (SVM) to diagnose Diabetes, Heart Disease, and Parkinson’s Disease using
Kaggle datasets. The system employs a data-driven approach, leveraging feature selection,
preprocessing, and model optimization to enhance predictive performance.

Among the algorithms evaluated, Random Forest achieved the highest accuracy across all three
diseases, demonstrating its robustness in handling large datasets with complex patterns. The
Decision Tree classifier was incorporated for its interpretability, providing insights into feature
importance, while SVM was effective for high-dimensional medical data classification. The
system follows a structured ML pipeline, including data cleaning, normalization,
hyperparameter tuning, and model evaluation to ensure optimal results.

This research highlights the transformative impact of ML in healthcare, particularly in early
disease detection and risk assessment. By integrating advanced classification techniques, the
proposed system supports early diagnosis, minimizes diagnostic delays, and aids medical
professionals in decision-making. The combination of predictive analytics and automated
diagnosis enhances healthcare accessibility, improves patient outcomes, and paves the way for
data-driven preventive care strategies.

TABLE OF CONTENTS

S.NO TITLE

1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
3.1 Existing System
3.1.1 Disadvantages of Existing System
3.2 Proposed System
3.2.1 Advantages of Proposed System
3.3 System Requirements
3.3.1 Hardware Requirements
3.3.2 Software Requirements
3.4 Functional Requirements
3.5 Non-Functional Requirements
3.5.1 Advantages
3.5.2 Disadvantages
3.5.3 Key Learning
3.6 System Study
3.6.1 Feasibility Study
3.6.1.1 Economical Feasibility
3.6.1.2 Technical Feasibility
3.6.1.3 Social Feasibility
4. SYSTEM DESIGN
4.1 System Architecture
4.2 UML Diagrams
4.2.1 Use Case Diagram
4.2.2 Class Diagram
4.2.3 Sequence Diagram
4.2.4 Data Flow Diagram
4.2.5 Flowchart Diagram
5. SYSTEM IMPLEMENTATION
5.1 Modules
5.2 Algorithms
5.3 Random Forest
5.4 SVM
5.5 Decision Tree
6. SYSTEM TESTING
6.1 Software Testing
6.2 Structural Testing
6.3 Behavioral Testing
6.4 Black-Box Testing
7. RESULTS
8. CONCLUSION & FUTURE WORK
9. REFERENCES
10. PUBLICATION

LIST OF FIGURES

Figure No Name of the Figure

1 Introduction
3.5 Non-Functional Requirements
4.1 System Architecture
4.2.1 Use Case Diagram
4.2.2 Class Diagram
4.2.3 Sequence Diagram
4.2.4 Data Flow Diagram
4.2.5 Flow Chart Diagram
5.3 Random Forest and SVM Algorithm Diagram
5.5 Decision Tree Algorithm Diagram
7.1-7.16 Results

1. INTRODUCTION

Healthcare is one of the most critical sectors where technological advancements have
significantly improved diagnosis, treatment, and patient management. Machine learning (ML) in
healthcare has gained immense importance due to its capability to examine large medical
datasets, recognize patterns, and make accurate predictions. Early detection is key to
a decisive improvement in the treatment of heart disease, and machine learning can help to detect
cardiac disease in a few seconds [1]. However, traditional diagnostic methods are often
time-consuming, costly, and require extensive medical expertise. These challenges highlight the need
for automated, AI-driven healthcare solutions that can assist medical professionals and improve
diagnostic accuracy.

This work presents a Multiple Disease Prediction System that uses machine learning models and
real-world medical datasets to forecast the probability of three main diseases: Diabetes, Heart
Disease, and Parkinson’s Disease. Each of these diseases seriously affects global health:

Diabetes: This metabolic illness causes abnormal blood sugar levels over time. If left undiagnosed or
unmanaged, it can lead to serious complications like kidney failure, heart disease, and nerve damage.

Heart Disease: It is a general term for various heart conditions, including coronary artery disease,
heart attacks and heart failure. As one of the major causes of death worldwide, early detection is
crucial for preventing serious complications and saving lives.

Parkinson’s Disease: A progressive, long-lasting neurological disease that affects mobility and
speech. It disrupts movement and control, causing symptoms that include tremor, stiffness, and
trouble with coordination. Early diagnosis can enhance quality of life [2] and help to control
symptoms properly.

By leveraging Kaggle datasets, this system is trained on real-world patient data to improve accuracy
and reliability. The project employs Support Vector Machines (SVM) for Diabetes and Parkinson’s
Disease prediction and Logistic Regression for Heart Disease classification, ensuring a tailored and
effective approach for each condition. Additionally, Decision Tree algorithms are incorporated to
refine predictions and optimize model performance.

The system aims to:

• Predict multiple diseases using an AI-driven approach, reducing dependency on costly and time-consuming medical tests.

• Enhance diagnostic accuracy by training models on diverse datasets with real-world patient information.

• Support early prevention of Parkinson's disease, heart disease, and diabetes. To anticipate disease and maximise therapy choice for the real-life patient, a virtual representation of the patient is created and receives real-time updates of a spectrum of data variables [3].

• Make the system accessible and user-friendly through an interactive web-based interface (using Flask), allowing users to input medical parameters and receive real-time disease predictions.

• Facilitate preventive healthcare by providing an efficient tool for risk assessment, helping both individuals and medical professionals make informed decisions.

FIGURE 1: Proposed Model Architecture
Figure 1 shows the proposed predictive disease detection system: the input datasets (Diabetes,
Heart Disease, Parkinson's Disease), the data preprocessing steps, the classification process
(DT, SVM, RF), and evaluation using Accuracy, Precision, Recall, and F1-score.
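
The four evaluation measures named for Figure 1 can all be derived from the counts of true/false positives and negatives. The following is a minimal sketch in plain Python; the function name classification_metrics and the label lists are hypothetical and are not taken from the project code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (1 = disease present, 0 = absent).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a medical screening setting, recall is often watched closely because a false negative corresponds to a missed case, while precision reflects how many flagged patients actually have the disease.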

2. LITERATURE SURVEY

2.1 Identification of Cardiovascular Disease via Diverse Machine Learning Methods (2024)
Author(s): Alaa, S.A., Islam, M.I.M. and Saleh.
Date of Submission: 2024
Abstract: Highly efficient silver halide nanoparticles (AgX, X = Cl, Br NP’s) were successfully
synthesized by facile and template-free direct-precipitation method using potassium chloride, potassium
bromide and silver nitrate as reactive sources. The as-prepared AgX NP’s were characterized by FTIR,
thermogravimetric analysis, XRD, EDX and HRTEM .

2.2 Design of an Efficient Prediction Model for Early Parkinson’s Disease Diagnosis (2024)
Author(s): K. Shyamala, T. M. Navamani
Date of Submission: 2024
Abstract: Parkinson’s Disease (PD) is a long-lasting and progressive brain disorder that disrupts
the body’s nervous system pathways. This disruption leads to various issues with movement and
control, causing symptoms that include tremors, stiffness, and difficulty with movement
and coordination.

2.3 Health digital twin to tackle cardiovascular disease - a review of an emerging interdisciplinary field
Author(s): Genevieve Coorey, Gemma A Figtree, David F Fletcher
Date of Submission: 2022
Abstract: Potential benefits of precision medicine in cardiovascular disease (CVD) include
more accurate phenotyping of individual patients with the same condition or presentation,
using multiple clinical, imaging, molecular and other variables to guide diagnosis and
treatment.

2.4 Prediction of type 2 diabetes using genome (2022)
Author(s): Seok-Ju Hahn, Suhyeon Kim, Young Sik Choi, Junghye Lee
Date of Submission: 2022
Abstract: Previous work on predicting type 2 diabetes by integrating clinical and genetic
factors has mostly focused on the Western population. In this study, we use genome-wide
polygenic risk score (gPRS) and serum metabolite data for type 2 diabetes risk prediction in
the Asian population.

2.5 Deep Learning of the Retina Enables Phenome and Genome (2021)
Author(s): Seyedeh Maryam Zekavat, Vineet K Raghu, Mark Trinder
Date of Submission: 2021
Abstract: The microvasculature, the smallest blood vessels in the body, has key roles in
maintenance of organ health and tumorigenesis. The retinal fundus is a window for human in
vivo noninvasive assessment of the microvasculature.

2.6 Detection of Parkinson’s Disease Using Speech Features (2022)
Author(s): A. J. Shahbakhti
Date of Submission: 2022
Abstract: The paper investigates speech feature extraction techniques for Parkinson’s
disease detection. The authors implement a neural network-based classification model. Their
findings demonstrate the effectiveness of speech analysis in early diagnosis.

2.7 Effectively Predicting the Presence of Coronary Heart Disease (2022)
Author(s): Ch Anwar Ul Hassan, Jawaid Iqbal, Rizwana Irfan
Date of Submission: 2022
Abstract: Coronary heart disease is one of the major causes of death around the globe.
Predicting heart disease is one of the most challenging tasks in the field of clinical data
analysis. Machine learning (ML) is useful for diagnostic assistance, supporting decision
making and prediction on the basis of the data produced by the healthcare sector globally.

2.8 Use of deep learning genomics to discriminate Alzheimer's disease and healthy controls (2021)
Author(s): Lanlan Li, Yanru Huang, Ying Han, Jiehui Jiang
Date of Submission: 2021
Abstract: Alzheimer's disease (AD) is the most prevalent neurodegenerative disorder and
the most common form of dementia in the elderly. Because genes are an important clinical
risk factor for AD, genomic studies, such as genome-wide association studies
(GWAS), have been widely applied in AD research.

2.9 A Deep Learning-Based Genome-Wide Polygenic Risk Score for Common Diseases
Identifies Individuals with Risk (2021).
Author(s): J. Peng, J. Li, R. Han
Date of Submission : 2021
Abstract: The paper explores deep learning-based genomics in differentiating Alzheimer’s
disease from healthy individuals. Their study underscores the potential of AI in advancing
neurodegenerative disease diagnostics.

2.10 Gait Analysis for Parkinson’s Disease Using Machine Learning (2021)
Author(s): S. W. Lee
Date of Submission: 2021
Abstract: This study applies machine learning techniques for gait analysis in Parkinson’s
disease patients.

2.11 Machine learning algorithms for predicting coronary artery disease (2021)
Author(s): Aravind Akella, Sudheer Akella
Date of Submission: 2021
Abstract: The development of coronary artery disease (CAD), a highly prevalent disease
worldwide, is influenced by several modifiable risk factors. Predictive models built using
machine learning (ML) algorithms may assist clinicians in timely detection of CAD and may
improve outcomes.

2.12 Machine Learning Predictive Models for Coronary Artery Disease (2021)
Author(s): L J Muhammad, Ibrahem Al-Shourbaji, Ahmed Abba Haruna
Date of Submission: 2021
Abstract: Coronary artery disease (CAD) is the commonest type of heart disease, and over
80% of the deaths resulting from the disease occur in developing countries including
Nigeria, with the majority of victims below 70 years of age.

2.13 Heart disease prediction using hybrid machine learning model (2021)
Author(s): M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai, and R. S. Suraj
Date of Submission: 2021
Abstract: This study introduces a hybrid machine learning model for heart disease prediction.
Corral-Acero et al. introduce the concept of a "Digital Twin" in precision
cardiology, leveraging computational models and patient-specific data.

2.14 The 'Digital Twin' to enable the vision of precision cardiology
Author(s): Jorge Corral-Acero, Francesca Margara, Maciej Marcinia
Date of Submission: 2020

Abstract : Providing therapies tailored to each patient is the vision of precision medicine,
enabled by the increasing ability to capture extensive data about individual patients. In this
position paper, we argue that the second enabling pillar towards this vision is the increasing
power of computers and algorithms to learn, reason, and build the 'digital twin' of a patient.
2.15 Machine Learning Outperforms ACC/AHA CVD Risk Calculator in MESA (2018)
Author(s): I. Kakadiaris, Michalis Vrigkas, A. Yen
Date of Submission: 2018
Abstract: Studies have demonstrated that the current US guidelines based on the
American College of Cardiology/American Heart Association (ACC/AHA) Pooled Cohort
Equations Risk Calculator may underestimate the risk of atherosclerotic cardiovascular disease
(CVD) in certain high-risk individuals, therefore missing opportunities for intensive therapy
and for preventing CVD events.

2.16 A Machine Learning Approach to Predict Parkinson’s Disease
Author(s): J. R. Gray
Date of Submission: 2020
Abstract: This paper presents a machine learning model for predicting Parkinson’s disease using
multimodal data.

2.17 A Deep Learning-Based Approach for Parkinson’s Disease Detection Using Handwriting Analysis (2020)
Author(s): M. M. Rahman
Date of Submission: 2020
Abstract: The study develops a deep learning-based approach for Parkinson’s disease detection
using handwriting analysis.

2.18 Speech Analysis for Parkinson’s Disease Detection Using CNN and LSTM (2020)
Author(s): D. P. Kingma
Date of Submission: 2020
Abstract: This research explores the use of CNN and LSTM models for speech analysis in Parkinson’s
disease detection.

2.19 Predicting Coronary Heart Disease Using an Improved LightGBM Model
Author(s): Captainozlem
Date of Submission: 2020
Abstract: This dataset provides preprocessed data for Framingham coronary heart disease
studies.

3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM:

In the current competitive world, we require an efficient technique to summarize,
analyze, present and maintain large datasets using data mining. This requires
knowledge of all data mining techniques in order to choose the best one for the desired
datasets, and these data mining techniques can answer questions that traditionally were
too time-consuming to resolve. Research has shown that data doubles every three years.

3.1.1 DISADVANTAGES OF EXISTING SYSTEM:

• Prediction accuracy is low.
• Predictive capability is limited.

3.2 PROPOSED SYSTEM:

The proposed system integrates Random Forest, Decision Tree, and Support Vector
Machine (SVM) for Multiple Disease Prediction (Diabetes, Heart Disease, and
Parkinson’s). After extensive evaluation, Random Forest achieved the highest
accuracy across all three diseases, demonstrating its superior performance in handling
complex patterns and large datasets. Decision Tree provided interpretability, while
SVM performed well with high-dimensional data. The system follows a structured ML
pipeline, including data preprocessing, feature selection, and hyperparameter tuning
using Kaggle datasets. With a user-friendly interface, this system supports early
diagnosis, enhances healthcare decision-making, and promotes preventive care,
ensuring improved patient outcomes.
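
The preprocessing, feature selection, and hyperparameter tuning stages named above could be chained as in the sketch below. It uses scikit-learn's Pipeline and GridSearchCV; the synthetic data from make_classification stands in for a Kaggle dataset, and the parameter grid is an illustrative assumption, not the configuration used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a medical dataset (assumption: 10 numeric features).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Preprocessing -> feature selection -> classifier, fitted as one unit so that
# scaling and selection are learned only from the training folds.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", RandomForestClassifier(random_state=0)),
])

# Illustrative grid: number of kept features, number of trees, tree depth.
param_grid = {
    "select__k": [5, 10],
    "model__n_estimators": [50, 100],
    "model__max_depth": [3, None],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Wrapping the steps in a single pipeline keeps the test data out of the scaler and the feature selector, which avoids an optimistic bias in the reported accuracy.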

3.2.1 ADVANTAGES OF PROPOSED SYSTEM:

• We utilize a novel Principal Component Heart Failure (PCHF) feature engineering technique to select the most prominent features.

• Our work employs a broader spectrum of methods by utilizing nine machine learning algorithms.

• Our work focuses on optimizing the PCHF mechanism to select the most important features, leading to improved accuracy.

• We introduce an innovative approach by creating a new dataset based on the eight best-fit features, optimized for accuracy enhancement.

3.3 SYSTEM REQUIREMENTS:

3.3.1 HARDWARE REQUIREMENTS:

Operating System: Windows Only

Processor: i5 and above

RAM: 4 GB and above

Hard Disk: 50 GB

3.3.2 SOFTWARE REQUIREMENTS:

Software: Anaconda

Primary Language: Python

Web Framework (back end): Flask

Development Environment: Jupyter Notebook

Database: SQLite3

Front-End Technologies: HTML, CSS, JavaScript and Bootstrap 4

3.4 FUNCTIONAL REQUIREMENTS:

1. Data Collection

2. Data Preprocessing

3. Training and Testing

4. Modeling

5. Predicting

3.5 NON-FUNCTIONAL REQUIREMENTS:

Non-Functional Requirements (NFRs) define the quality attributes of a software
system, ensuring its reliability, security, and overall performance. These requirements
impose constraints on system design and implementation to meet operational
standards. Below are the key NFRs for the proposed blockchain-based cloud security
system:

Usability Requirement

• The system should have a user-friendly interface for key management and encryption operations.
• Users should be able to manage encrypted files with minimal technical knowledge.

Serviceability Requirement

• The system should allow easy updates and improvements without affecting existing security mechanisms.

3.5.1 ADVANTAGES:

• Enhanced Security - Blockchain decentralization eliminates single points of failure,
reducing the risk of key theft.
• Tamper-Proof Data - The immutability of blockchain prevents unauthorized
modifications to encryption keys.
• Dynamic Key Generation - AES keys change dynamically, reducing vulnerability
to brute-force attacks.
• Scalability - The system can handle increasing numbers of users and encrypted files
without compromising performance.
• High Availability - Decentralized key storage ensures continued access even if
some blockchain nodes fail.
• Regulatory Compliance - The system aligns with data privacy laws, enhancing trust
for enterprise adoption.

3.5.2 DISADVANTAGES:

• High Computational Overhead - Encrypting files with AES and ECC, along with
blockchain transactions, can increase processing time.
• Storage Overhead - Blockchain requires significant storage space due to its
immutable record-keeping.
• Complexity - Managing dynamic AES keys and blockchain infrastructure requires
specialized technical expertise.

• Transaction Costs - Public blockchain networks may incur transaction fees,
increasing operational costs.
• Latency Issues - Writing and verifying transactions on a blockchain can introduce
delays in encryption and decryption processes.

3.5.3 KEY LEARNING:


• A non-functional requirement defines a quality or performance attribute of a software system.

• Types of non-functional requirements include scalability, capacity, availability, reliability, recoverability, and data integrity.

• An example of a non-functional requirement: employees are never allowed to update their salary information, and any such attempt should be reported to the security administrator.

• A functional requirement is a verb, while a non-functional requirement is an attribute.

• The advantage of non-functional requirements is that they help to ensure a good user experience and ease of operating the software.

• The biggest disadvantage of non-functional requirements is that they may affect the various high-level software subsystems.

3.6 SYSTEM STUDY:

3.6.1 FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put
forth with a very general plan for the project and some cost estimates. During system
analysis, the feasibility study of the proposed system is carried out. This is to
ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

3.6.1.1 ECONOMICAL FEASIBILITY:

This study is carried out to check the economic impact that the system will
have on the organization. The amount of funds that the company can pour into the
research and development of the system is limited, so the expenditures must be justified.
The developed system was well within the budget, which was achieved because
most of the technologies used are freely available. Only the customized products had
to be purchased.

3.6.1.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would in turn place high demands on the client.
The developed system must have modest requirements, as only minimal or no changes are
required to implement it.

3.6.1.3 SOCIAL FEASIBILITY:

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of acceptance
by the users depends solely on the methods that are employed to educate users about the
system and to make them familiar with it. Their level of confidence must be raised so that they
are also able to offer constructive criticism, which is welcomed, as they are the final users
of the system.

4. SYSTEM DESIGN

Figure 2: Flow chart of the hybrid model.

In Figure 2, the model learns from organized data, improves through training, and is tested for accuracy.
Once ready, it examines new data to produce valid real-world predictions.

4.1 SYSTEM ARCHITECTURE:

Fig.4.1.1 System architecture

4.2 UML DIAGRAMS:

UML stands for Unified Modeling Language. UML is a standardized, general-purpose
modeling language in the field of object-oriented software engineering. The standard
was created, and is managed, by the Object Management Group.
The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying,
visualizing, constructing and documenting the artifacts of software systems, as
well as for business modeling and other non-software systems.
The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and of the
software development process. The UML uses mostly graphical notations to
express the design of software projects.

GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.

4.2.1 USE CASE DIAGRAM:

A use case diagram in the Unified Modeling Language (UML) is a type of
behavioural diagram defined by and created from a use-case analysis. Its purpose is
to present a graphical overview of the functionality provided by a system in terms of
actors, their goals (represented as use cases), and any dependencies between those use
cases. The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted.

4.2.2 CLASS DIAGRAM:

The class diagram is used to refine the use case diagram and define a detailed
design of the system. The class diagram classifies the actors defined in the use case
diagram into a set of interrelated classes. The relationship or association between the
classes can be either an "is-a" or a "has-a" relationship. Each class in the class diagram
may be capable of providing certain functionalities. These functionalities provided by
the class are termed "methods" of the class. Apart from this, each class may have
certain "attributes" that uniquely identify the class.
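
The "is-a" and "has-a" relationships map onto inheritance and composition in code. The sketch below illustrates both in Python; the class names (Classifier, ThresholdClassifier, PredictionSystem) and the threshold value are hypothetical and are not taken from the project's class diagram.

```python
class Classifier:
    """Base class: 'name' is an attribute, 'predict' is a method."""

    def __init__(self, name):
        self.name = name

    def predict(self, features):
        raise NotImplementedError


class ThresholdClassifier(Classifier):
    """'is-a' relationship: a ThresholdClassifier is a Classifier."""

    def __init__(self, name, feature, threshold):
        super().__init__(name)
        self.feature = feature
        self.threshold = threshold

    def predict(self, features):
        # Return 1 when the chosen feature exceeds the threshold, else 0.
        return int(features[self.feature] > self.threshold)


class PredictionSystem:
    """'has-a' relationship: the system holds one classifier per disease."""

    def __init__(self):
        self.classifiers = {}

    def register(self, disease, classifier):
        self.classifiers[disease] = classifier

    def predict(self, disease, features):
        return self.classifiers[disease].predict(features)


system = PredictionSystem()
# The threshold of 140 is an arbitrary illustrative value, not clinical advice.
system.register("diabetes", ThresholdClassifier("rule", "glucose", 140))
outcome = system.predict("diabetes", {"glucose": 150})
print(outcome)
```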

4.2.3 SEQUENCE DIAGRAM:

A sequence diagram represents the interaction between different objects in the system.
The important aspect of a sequence diagram is that it is time-ordered. This means that
the exact sequence of the interactions between the objects is represented step by step.
Different objects in the sequence diagram interact with each other by passing
"messages".

DIAGRAM

Deployment diagram:
The deployment diagram captures the configuration of the runtime elements of the application. This
diagram is by far the most useful when a system is built and ready to be deployed.

4.2.4 DATA FLOW DIAGRAM:

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can
be used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the
system.

2. The data flow diagram (DFD) is one of the most important modeling tools. It
is used to model the system components. These components are the system
process, the data used by the process, an external entity that interacts with the
system and the information flows in the system.

3. A DFD shows how the information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.

4. A DFD may be used to represent a system at any level of abstraction. A DFD
may be partitioned into levels that represent increasing information flow and
functional detail.

4.2.5 FLOWCHART DIAGRAM
The flowchart represents a structured workflow for a system that processes datasets using
an algorithm. It begins with a user attempting access, followed by a verification step to
check authorization. Unauthorized users are denied access, while authorized users
proceed to upload a dataset. The system then allows viewing of the dataset before running
an algorithm for analysis. Once executed, the algorithm predicts an output, which is then
visualized as a graph. The process concludes after the graphical representation, ensuring a
streamlined approach to data processing and analysis.

5. SYSTEM IMPLEMENTATION

5.1 MODULES:

• Gathering the datasets: We gather all the required data from the Kaggle website and upload it to
the proposed model.

• Generate Train & Test Model: We preprocess the gathered data and then
split it into two parts: training data (80%) and test data (20%).

• Run Algorithms: For prediction, the machine learning models are applied to the dataset, using
70 to 80% of the data for training the models and the remaining 30 to 20% for
testing.

• Obtain the accuracy: In this module we obtain the accuracy of each model.

• Predict output: In this module we predict the output based on the input data.
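
The 80/20 split described in the modules above can be sketched as follows. This is a minimal standard-library illustration only: the function name `split_train_test` and the toy records are assumptions made for the example, and the project itself may well rely on a library routine instead.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed gives a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy dataset: 10 hypothetical (feature_vector, label) records
records = [([i, i * 2], i % 2) for i in range(10)]
train, test = split_train_test(records)
print(len(train), len(test))  # 8 2
```

Fixing the seed keeps the split identical between runs, so accuracy figures obtained on the test part stay comparable.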

5.2 Algorithm:

Step 1 Choose relevant datasets for Diabetes, Heart Disease, and Parkinson’s for training
and evaluation.

Step 2 To ensure generalization, divide the dataset into training, validation, and testing sets.

Step 3 Use ML classifiers such as DT, RF, or SVM for disease classification.

Step 4 Optimize model parameters such as tree depth, number of estimators, and feature
selection.

Step 5 Assess model performance using accuracy, precision, recall, and F1-score.

Step 6 Identify the most effective model for each disease based on validation and testing
metrics.
Step 7 Test the model’s effectiveness by applying it to unseen real-world data.

Step 8 Deploy the model into a Flask-based web application with SQLite for user
interaction and real-time disease detection.

Combining multiple models in this way is called ensemble learning. Ensembles use two main methods:

1. Bagging: Creating different training subsets from the training data by sampling with
replacement and training a model on each is called Bagging. The final output is based on
majority voting.

2. Boosting: Combining weak learners into a strong learner by building models sequentially,
so that each model corrects the errors of the previous ones, is called Boosting. Examples:
AdaBoost, XGBoost.
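
Bagging with majority voting can be illustrated with a small sketch. Everything here is hypothetical and chosen for brevity: the weak learner `train_stump` is a one-feature threshold rule rather than a full decision tree, and the toy data set is invented for the example.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) records from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Hypothetical weak learner: a one-feature threshold rule."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:                     # sample contains a single class
        only = 1 if xs1 else 0
        return lambda x: only
    # Threshold halfway between the two class means
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging_predict(models, x):
    """Combine the base learners by majority voting."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
data = [(x, 1 if x >= 5 else 0) for x in range(10)]   # toy 1-D dataset
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagging_predict(models, 8), bagging_predict(models, 1))
```

Each stump sees a different bootstrap sample, so the individual thresholds vary, while the vote over all 25 stumps is stable.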

Fig 3
5.3 Random Forest:

Random Forest is an ensemble learning technique that uses a large number of decision
trees to reduce overfitting and improve prediction accuracy. This project uses
Random Forest to predict cases of Parkinson's disease. The method builds many
low-correlation decision trees by training each tree on a bootstrap sample of the data
and considering a random subset of features at each split. Because this randomisation
produces more diverse trees, the Random Forest model becomes more resilient and less
prone to overfitting. When making predictions, the outputs of all the decision trees are
combined, typically by majority voting. Random Forest's ability to manage complex

Fig 4: Random forest architecture

5.4 SVM

Support Vector Machine (SVM) is a supervised machine learning method widely used for
classification tasks. For diabetes prediction, SVM sorts patient data into diabetic and
non-diabetic categories. It determines the hyperplane that best separates the classes,
that is, the one with the largest margin. When the data is not linearly separable, a
kernel function implicitly maps it into a higher-dimensional space where separation
becomes easier. SVMs are effective at identifying the decision boundary in
high-dimensional data, even in the presence of some noise. The boundary is determined
by the support vectors, the data points closest to it, which helps improve diabetes
prediction accuracy and reduce classification errors.
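
To make the kernel idea concrete, the sketch below compares a plain dot product with a radial basis function (RBF) kernel. The report does not state which kernel was used, so the RBF choice and the `gamma` value are assumptions made purely for illustration.

```python
import math

def linear_kernel(x, z):
    """Plain dot product, i.e. no mapping to a higher-dimensional space."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

print(linear_kernel([1.0, 2.0], [3.0, 4.0]))  # 1*3 + 2*4 = 11.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))     # identical points give 1.0
```

The RBF value falls towards 0 as two points move apart, which is what lets an SVM draw non-linear boundaries in the original feature space.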

Fig 5: SVM architecture

5.5 Decision Tree

The Decision Tree is a supervised machine learning method that works well for both
regression and classification tasks. In this project it is used to predict heart disease.
A decision tree repeatedly splits the data into subgroups based on the most informative
features, measured using Gini impurity or entropy. The leaves of the tree assign the
final class (e.g., "Heart Disease" or "No Heart Disease"). Because the decision-making
process is straightforward and easy to follow, Decision Trees are interpretable and
healthcare practitioners can understand how a prediction was reached. Decision trees
work well for datasets with clear relationships between features, but they are simple
models and can overfit if not properly tuned (for example, by limiting tree depth).
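
The two split criteria named above can be computed directly from the class labels in a node. The following is a small sketch with invented label lists; the helper names are illustrative and not taken from the project code.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def split_gini(left, right):
    """Weighted Gini impurity of a candidate split into two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)

mixed = [0, 0, 1, 1]          # evenly mixed node: highest impurity for two classes
print(gini_impurity(mixed), entropy(mixed))  # 0.5 1.0
print(split_gini([0, 0], [1, 1]))            # a perfect split has impurity 0.0
```

A tree-building algorithm evaluates many candidate splits and keeps the one with the lowest weighted impurity.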

Fig 6: DT architecture
CODE:
App.py
# Import the required Python classes and packages
from flask import Flask, render_template, request
from auth_utils import *  # provides signup() and signin()
from werkzeug.utils import secure_filename
import os
import random
import numpy as np
import pandas as pd
import joblib

# Fix the random seeds for reproducibility
random_seed = 42
random.seed(random_seed)
np.random.seed(random_seed)

UPLOAD_FOLDER = 'uploads'
ALLOWED_EXTENSIONS = {'txt', 'csv'}
MAX_UPLOAD_SIZE_MB = 512  # Maximum upload size in megabytes

app = Flask(__name__)

app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['MAX_CONTENT_LENGTH'] = MAX_UPLOAD_SIZE_MB * 1024 * 1024  # Set max content length

# Ensure the upload folder exists
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

def allowed_file(filename):
    # Accept only file names whose extension is listed in ALLOWED_EXTENSIONS
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/register')
def register():
    return render_template('register.html')

@app.route('/login')
def login():
    return render_template('login.html')

@app.route("/signup")
def signup_route():
    return signup()

@app.route("/signin")
def signin_route():
    return signin()

@app.route('/home')
def home():
    return render_template('home.html')

@app.route('/heart')
def heart():
    return render_template('heart_predict.html')

@app.route('/diabetes')
def diabetes():
    return render_template('diabetes_predict.html')

@app.route('/parkinson')
def parkinson():
    return render_template('parkinsons_predict.html')

@app.route('/heart_notebook')
def heart_notebook():
    return render_template('heart_test.html')

@app.route('/parkinson_notebook')
def parkinson_notebook():
    return render_template('parkinson_test.html')

@app.route('/diabetes_notebook')
def diabetes_notebook():
    return render_template('diabetes_test.html')

@app.route('/predict', methods=['POST'])
def predict():
    # Convert the submitted form values into a numeric feature vector
    int_features = [float(x) for x in request.form.values()]
    print(int_features, len(int_features))
    final4 = [np.array(int_features)]
    model = joblib.load('model/heart_model.sav')
    prediction = model.predict(final4)[0]  # single sample, so take the first label
    if prediction == 0:
        output = "NO Heart"
        alert_class = "alert-success text-success"  # Green for a no-disease result
        icon = "fas fa-smile-beam text-success"  # Happy icon
        text_color = "green"
    else:
        output = "Heart"
        alert_class = "alert-danger text-danger"  # Red color for warning
        icon = "fas fa-exclamation-triangle text-danger"  # Warning icon
        text_color = "red"

    return render_template('prediction.html', output=output, alert_class=alert_class,
                           icon=icon, text_color=text_color)

@app.route('/predict2', methods=['POST'])
def predict2():
    # Convert the submitted form values into a numeric feature vector
    int_features = [float(x) for x in request.form.values()]
    print(int_features, len(int_features))
    final4 = [np.array(int_features)]
    model = joblib.load('model/diabetes.sav')
    prediction = model.predict(final4)[0]  # single sample, so take the first label

    if prediction == 0:
        output = "No Diabetes"
        alert_class = "alert-success text-success"  # Green for a no-disease result
        icon = "fas fa-smile-beam text-success"  # Happy icon
        text_color = "green"
    else:
        output = "Diabetes"
        alert_class = "alert-danger text-danger"  # Red color for warning
        icon = "fas fa-exclamation-triangle text-danger"  # Warning icon
        text_color = "red"

    return render_template('prediction.html', output=output, alert_class=alert_class,
                           icon=icon, text_color=text_color)

@app.route('/predict3', methods=['POST'])
def predict3():
    # Convert the submitted form values into a numeric feature vector
    int_features = [float(x) for x in request.form.values()]
    print(int_features, len(int_features))
    final4 = [np.array(int_features)]
    model = joblib.load('model/parkinson.sav')
    prediction = model.predict(final4)[0]  # single sample, so take the first label

    if prediction == 0:
        output = "NO Parkinson"
        alert_class = "alert-success text-success"  # Green for a no-disease result
        icon = "fas fa-smile-beam text-success"  # Happy icon
        text_color = "green"
    else:
        output = "Parkinson"
        alert_class = "alert-danger text-danger"  # Red color for warning
        icon = "fas fa-exclamation-triangle text-danger"  # Warning icon
        text_color = "red"

    return render_template('prediction.html', output=output, alert_class=alert_class,
                           icon=icon, text_color=text_color)

if __name__ == '__main__':
    app.run(debug=True)
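
The three prediction routes differ only in the model file and the result labels, so the shared mapping from a 0/1 prediction to the template variables could be factored out. The helper below is a hypothetical sketch; `build_result` is not part of the original App.py.

```python
def build_result(prediction, positive_label, negative_label):
    """Map a binary model prediction (0 or 1) to the variables prediction.html expects."""
    if int(prediction) == 1:
        return {
            "output": positive_label,
            "alert_class": "alert-danger text-danger",        # red warning style
            "icon": "fas fa-exclamation-triangle text-danger",
            "text_color": "red",
        }
    return {
        "output": negative_label,
        "alert_class": "alert-success text-success",          # green no-disease style
        "icon": "fas fa-smile-beam text-success",
        "text_color": "green",
    }

result = build_result(1, "Diabetes", "No Diabetes")
print(result["output"], result["text_color"])  # Diabetes red
```

Inside a route, the returned dictionary could be passed on with `render_template('prediction.html', **result)`.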
<!DOCTYPE html>
<html>

<head>
<title>
Signup
</title>
<!--meta tags -->
<style>
/* CSS Libraries Used

*Animate.css by Daniel Eden.


*FontAwesome 4.7.0
*Typicons

*/

@import url('https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400');

body, html {
font-family: 'Source Sans Pro', sans-serif;
background-color: #ddf1e0;
padding: 0;
margin: 0;
}

#particles-js {
position: absolute;
width: 100%;
height: 100%;
}

.container{
margin: 0;
top: 50px;
left: 50%;
position: absolute;
text-align: center;
transform: translateX(-50%);
background-color: rgb(238, 243, 237);
border-radius: 9px;
border-top: 10px solid #79a6fe;
border-bottom: 10px solid #8BD17C;
width: 400px;
height: 500px;
box-shadow: 1px 1px 108.8px 19.2px rgb(25,31,53);
}

.box h4 {
font-family: 'Source Sans Pro', sans-serif;
color: #5c6bc0;
font-size: 18px;
margin-top: 94px;
}

.box h4 span {
color: #dfdeee;
font-weight: lighter;
}

.box h5 {
font-family: 'Source Sans Pro', sans-serif;
font-size: 13px;
color: #a1a4ad;
letter-spacing: 1.5px;
margin-top: -15px;
margin-bottom: 70px;
}

.box input[type = "text"],.box input[type = "password"] {


display: block;
margin: 20px auto;
background: #f3f4f5;
border: 0;
border-radius: 5px;
padding: 14px 10px;
width: 320px;
outline: none;
color: #ec1010;
-webkit-transition: all .2s ease-out;
-moz-transition: all .2s ease-out;
-ms-transition: all .2s ease-out;
-o-transition: all .2s ease-out;
transition: all .2s ease-out;

}
::-webkit-input-placeholder {
color: #565f79;
}

.box input[type = "text"]:focus,.box input[type = "password"]:focus {
border: 1px solid #79A6FE;
}

a{
color: #5c7fda;
text-decoration: none;
}

a:hover {
text-decoration: underline;
}

label input[type = "checkbox"] {


display: none; /* hide the default checkbox */
}

/* style the artificial checkbox */


label span {
height: 13px;
width: 13px;
border: 2px solid #464d64;
border-radius: 2px;
display: inline-block;
position: relative;
cursor: pointer;
float: left;
left: 7.5%;
}

.btn1 {
border:0;
background: #7f5feb;
color: #dfdeee;
border-radius: 100px;
width: 340px;
height: 49px;

font-size: 16px;
position: absolute;
top: 79%;
left: 8%;
transition: 0.3s;
cursor: pointer;
}

.btn1:hover {
background: #5d33e6;
}

.rmb {
position: absolute;
margin-left: -24%;
margin-top: 0px;
color: #dfdeee;
font-size: 13px;
}

.forgetpass {
position: relative;
float: right;
right: 28px;
}

.dnthave{
position: absolute;
top: 92%;
left: 24%;
}

[type=checkbox]:checked + span:before {/* <-- style its checked state */


font-family: FontAwesome;

font-size: 16px;
content: "\f00c";
position: absolute;
top: -4px;
color: #896cec;
left: -1px;
width: 13px;
}

.typcn {
position: absolute;
left: 339px;
top: 282px;
color: #3b476b;
font-size: 22px;
cursor: pointer;
}

.typcn.active {
color: #7f60eb;
}

.error {
background: #ff3333;
text-align: center;
width: 337px;
height: 20px;
padding: 2px;
border: 0;
border-radius: 5px;
margin: 10px auto 10px;
position: absolute;
top: 31%;
left: 7.2%;
color: white;
display: none;
}

.footer {
position: relative;
left: 0;
bottom: 0;
top: 605px;
width: 100%;
color: #78797d;
font-size: 14px;
text-align: center;
}

.footer .fa {
color: #7f5feb;
}
</style>
</head>

<body>
<div class="animated bounceInDown">
<div class="container">
<span class="error animated tada" id="msg"></span>
<form action="/signup" method="GET" class="box" >
<h1 style="color: black; "><span>Register</span></h1>
<!-- Success Message -->
{% if message %}
<p style="color: green;">{{ message }}</p>
{% endif %}
<input type="text" name="user" placeholder="Username" required autocomplete="off"
pattern="[A-Za-z].{6,}" title="Must start with a letter and contain at least seven characters">
<i class="typcn typcn-eye" id="eye"></i>
<input type="text" name="name" placeholder="Name" required autocomplete="off">
<i class="typcn typcn-eye" id="eye"></i>
<input type="text" name="email" placeholder="Email" required autocomplete="off"
pattern="[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}$">
<i class="typcn typcn-eye" id="eye"></i>
<input type="text" name="mobile" placeholder="Mobile" required autocomplete="off"
pattern="[6-9]{1}[0-9]{9}" title="Ten-digit phone number starting with a digit from 6 to 9">
<i class="typcn typcn-eye" id="eye"></i>
<input type="password" name="password" placeholder="Password" id="pwd" required
autocomplete="off" pattern="(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}" title="Must contain at least one
number, one uppercase and one lowercase letter, and at least 8 characters">

<input type="submit" value="Sign Up" class="btn1">


</form>
<a href="/login" class="dnthave">Do you have an account? Sign In</a>
</div>

</div>
</body>

</html>
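
Browser-side `pattern` checks can be bypassed, so the same rules are usually repeated on the server. The sketch below mirrors the password rule (at least one digit, one lowercase letter, one uppercase letter, and eight or more characters) and the mobile number rule from the form above. It is an assumption about how `auth_utils` could validate input, not its actual code, and the function names are hypothetical.

```python
import re

# Same rules as the signup form's pattern attributes
PASSWORD_RULE = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}")
MOBILE_RULE = re.compile(r"[6-9][0-9]{9}")

def is_valid_password(value):
    """True when the whole string satisfies the password rule."""
    return PASSWORD_RULE.fullmatch(value) is not None

def is_valid_mobile(value):
    """True for a ten-digit number starting with a digit from 6 to 9."""
    return MOBILE_RULE.fullmatch(value) is not None

print(is_valid_password("Passw0rd"), is_valid_password("password"))    # True False
print(is_valid_mobile("9876543210"), is_valid_mobile("1234567890"))    # True False
```

`fullmatch` is used because the HTML `pattern` attribute must also match the entire field value.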

6. SYSTEM TESTING

System testing, also referred to as system-level testing or system-integration testing, is
the process in which a quality assurance (QA) team evaluates how the various
components of an application interact in the full, integrated system.
System testing verifies that an application performs tasks as designed.
This step, a kind of black-box testing, focuses on the functionality of an application.
System testing, for example, might check that every kind of user input produces the
intended output across the application.
Phases of system testing:
System testing examines every component of an
application to make sure that they work as a complete and unified whole. A QA team
typically conducts system testing after it checks individual modules with functional or
user-story testing and then each component through integration testing.
If a software build achieves the desired results in system testing, it gets a final check
via acceptance testing before it goes to production, where users consume the software.
An app-dev team logs all defects and establishes what kinds and number of defects are
tolerable.
6.1 Software Testing Strategies:
Optimization of the approach to testing in software engineering is the best way to make
it effective. A software testing strategy defines what, when, and how to do whatever is
necessary to make an end-product of high quality. Usually, the following software
testing strategies and their combinations are used to achieve this major objective:
Static Testing:
The early-stage testing strategy is static testing: it is performed without actually
running the product under development. Such desk-checking is required to detect
bugs and issues that are present in the code itself. This check is important at the
pre-deployment stage, as it helps avoid problems caused by errors in the code and
deficits in the software structure.

6.2 Structural Testing:
It is not possible to effectively test software without running it. Structural testing, also
known as white-box testing, is required to detect and fix bugs and errors emerging
during the pre-production stage of the software development process. At this stage,
unit testing based on the software structure is performed using regression testing. In
most cases, it is an automated process working within the test automation framework
to speed up the development process at this stage. Developers and QA engineers have
full access to the software’s structure and data flows (data flows testing), so they could
track any changes (mutation testing) in the system’s behavior by comparing the tests’
outcomes with the results of previous iterations (control flow testing).

6.3 Behavioral Testing:
The final stage of testing focuses on the software’s reactions to various activities rather
than on the mechanisms behind these reactions. In other words, behavioral testing, also
known as black-box testing, presupposes running numerous tests, mostly manual, to
see the product from the user’s point of view. QA engineers usually have some specific
information about a business or other purposes of the software (the ‘black box’) to run
usability tests, for example, and react to bugs as regular users of the product will do.
Behavioral testing also may include automation (regression tests) to eliminate human
error if repetitive activities are required. For example, you may need to fill 100
registration forms on the website to see how the product copes with such an activity,
so the automation of this test is preferable.

1   User signup                  User gets registered into the application   There is no process

2   User signin                  User gets logged into the application       There is no process

3   Enter input for prediction   Prediction result displayed                 There is no process

7. RESULTS
Figure 7.1: HOME PAGE

Figure 7.2: LOGIN PAGE

Figure 7.3: FOR HEART DISEASE

Figure 7.4: Input for heart disease

Figure 7.5: Output for heart disease

Figure 7.6: FOR DIABETES

Figure 7.7: Input for diabetes

Figure 7.8: Output for diabetes

Figure 7.9 FOR PARKINSONS DISEASE

Figure 7.10: Input for Parkinsons disease

Figure 7.11: Output for Parkinson’s disease

Figure 7.12: Diseases Notebook & Logout Button.

• Figure 7.1 presents the Multi Disease Classification system, which predicts Diabetes, Heart
Disease, and Parkinson’s Disease using machine learning. The interface provides disease
information, promoting awareness and early detection.
• Figure 7.2 illustrates the login page for an AI-driven approach for predicting diseases like
Diabetes, Parkinson’s, and heart disease. It employs machine learning models such as SVM,
Decision Tree, and Random Forest to enhance classification accuracy.
• Figure 7.3 shows the heart disease prediction system, outlining symptoms, causes, and
prevention strategies. It reassures users by promoting a healthy lifestyle and encouraging risk
reduction measures.
• Figure 7.4 depicts the heart disease prediction system, in which users input parameters
like age, cholesterol, and blood pressure to assess their risk.
• Figure 7.5 shows the heart disease prediction system confirming no risk, reassuring
the user and encouraging a healthy lifestyle.
• Figure 7.6 shows the diabetes system, outlining symptoms, causes, and prevention
strategies. It reassures users by promoting a healthy lifestyle and encouraging risk
reduction measures.
• Figure 7.7 illustrates the diabetes prediction system, which collects inputs such as BMI
and glycated haemoglobin to evaluate diabetes risk.
• Figure 7.8 presents the diabetes prediction system confirming no diabetes, promoting
proactive health management.
• Figure 7.9 shows the Parkinson’s disease prediction system, outlining symptoms, causes, and
prevention strategies. It reassures users by promoting a healthy lifestyle and encouraging risk
reduction measures.
• Figure 7.10 highlights the Parkinson’s disease prediction system, which analyzes vocal
and neurological markers for risk assessment.
• Figure 7.11 shows the Parkinson’s disease prediction system predicting a risk and
advising the user to consult a doctor.
• Figure 7.12 shows the Diseases Notebook and Logout button, which provide a Colab notebook for each
disease and a way for the user to log out.

Figure 7.13 Comparison Graph

Figure 7.13 presents a comparative analysis of classification performance for three medical
conditions: Heart Disease, Diabetes, and Parkinson’s Disease. It evaluates three ML
models (RF, DT, and SVM) across four key performance metrics:

• Accuracy (Blue): The share of all predictions that are correct.
• Precision (Red): How many of the predicted positive cases are actually positive.
• Recall (Yellow): How many of the actual positive cases are correctly identified.
• F1 Score (Purple): The harmonic mean of precision and recall, balancing the two.
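
The four metrics listed above follow directly from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below uses invented counts, not the project's results, and assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # correct share of predicted positives
    recall = tp / (tp + fn)             # detected share of actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))  # 0.85 0.8 0.889 0.842
```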

Figure 7.14 Heart Disease Confusion Matrix

Figure 7.15 Diabetes Confusion Matrix

Figure 7.16 Parkinson’s Disease Confusion Matrix

Figure 7.14 shows the heart disease confusion matrix, where 0 denotes no heart disease and 1
denotes heart disease. Most predictions fall on the diagonal, with only a few misclassifications,
which indicates good model performance with slight variation in per-class accuracy. Figure 7.15
shows the corresponding matrix for diabetes, where 0 denotes the absence of diabetes and 1
denotes diabetes. Overall performance is again good, with a few misclassifications and only modest
differences in class accuracy. Figure 7.16 shows the matrix for Parkinson's disease, where 0
denotes non-Parkinson's and 1 denotes Parkinson's disease. The classes are evenly distributed and
most predictions fall on the diagonal, although a few misclassifications are noticeable.
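
A binary confusion matrix of the kind shown in these figures can be built by counting (actual, predicted) pairs. The labels below are invented for illustration and are not taken from the project's test sets.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows are actual classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

# Hypothetical labels: 0 = no disease, 1 = disease
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
matrix = confusion_matrix_2x2(y_true, y_pred)
print(matrix)  # [[3, 1], [1, 3]]

# Correct predictions sit on the diagonal
correct = matrix[0][0] + matrix[1][1]
print(correct / len(y_true))  # 0.75
```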
Table 1: Comparison Accuracy of the trained model

Disease Name          Algorithm Name        Existing System Accuracy
Diabetes              SVM Classifier        79%
Heart Disease         Logistic Regression   85%
Parkinson’s Disease   SVM Classifier        89%

Existing systems based on the SVM Classifier and Logistic Regression have customarily been
used for Diabetes, Heart Disease, and Parkinson’s Disease prediction. However, their accuracy
remains limited: SVM achieves about 79% for Diabetes and about 89% for Parkinson’s Disease,
while Logistic Regression reaches roughly 85% for Heart Disease. These results point to the
need for stronger, more accurate models to improve disease classification performance.

Table 2: Proposed Performance Comparison for Heart Disease, Diabetes, Parkinson’s Disease

Algorithm Name   Accuracy   Precision   Recall   F1 Score   Disease Name
Random Forest    0.966      0.966       0.980    0.951      Heart Disease
Random Forest    0.970      0.980       0.986    0.983      Diabetes
Random Forest    0.958      0.937       0.983    0.959      Parkinson’s Disease

Table 2 displays the evaluation results of the proposed Random Forest model, which is compared
with standard approaches such as SVM and Logistic Regression for disease classification. The
evaluation of heart disease, diabetes, and Parkinson’s disease prediction took accuracy,
precision, recall, and F1 score into account. Random Forest achieves its best results for
Diabetes, with a Precision of 0.980, a Recall of 0.986, and an F1 Score of 0.983. It also
performs very well in heart disease classification, attaining 0.966 Precision, 0.980 Recall,
and 0.951 F1 Score. For Parkinson’s Disease detection, its Precision, Recall, and F1 Score are
0.937, 0.983, and 0.959 respectively. With accuracies of 0.966, 0.970, and 0.958 against the
85%, 79%, and 89% reported for the existing systems, these results indicate that Random Forest
outperforms customary classifiers such as SVM and Logistic Regression in all three disease
categories.
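
Because the F1 score is the harmonic mean of precision and recall, the F1 column of Table 2 can be cross-checked from the other two columns. The sketch below does this with the reported values; the helper name `f1_from` is illustrative. Under this formula the Diabetes and Parkinson’s rows reproduce the tabulated F1 scores, while the Heart Disease row gives about 0.973 rather than the tabulated 0.951, which may point to a rounding or transcription difference in one of the three figures.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as listed in Table 2
reported = {
    "Heart Disease": (0.966, 0.980, 0.951),
    "Diabetes": (0.980, 0.986, 0.983),
    "Parkinson's Disease": (0.937, 0.983, 0.959),
}

for disease, (p, r, f1) in reported.items():
    # Print the recomputed F1 next to the reported one
    print(disease, round(f1_from(p, r), 3), f1)
```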

8. CONCLUSION
The Multiple Disease Prediction System successfully leverages machine learning
algorithms to predict Diabetes, Heart Disease, and Parkinson’s Disease based on real-
world medical data. After evaluating different models, Random Forest emerged as the
most accurate algorithm across all three diseases, outperforming Support Vector
Machine (SVM) and Decision Tree algorithms.

The high accuracy of Random Forest can be attributed to:

• Its ability to handle complex, non-linear relationships in medical datasets.
• The ensemble learning approach, which reduces overfitting and improves
generalization.
• Its effectiveness in dealing with imbalanced datasets by aggregating multiple decision
trees.

9. REFERENCES
[1] A. Islam, M. Saleh, A. Tasnim, M. Samiun, S. Noor, and K. Bishnu, "Identification of
Cardiovascular Disease via Diverse Machine Learning Methods," Journal of Computer and
Communications, vol. 12,pp. 134- 150, Dec. 2024, doi: 10.4236/jcc.2024.1212009.

[2] S. Krishnan and N. T. M, "Design of an Efficient Prediction Model for Early Parkinson's
Disease Diagnosis," IEEE Access, vol. PP, no. 99,pp. 1- 1, Jan. 2024, doi:
10.1109/ACCESS.2024.3421302.

[3] G. Coorey, G. A. Figtree, D. F. Fletcher, et al., "The Health Digital Twin to Tackle
Cardiovascular Disease—A Review of an Emerging Interdisciplinary Field," NPJ Digit.
Med., vol. 5,no. 1,p. 126, 2022, doi: 10.1038/s41746-022-00640-7.

[4] S.-J. Hahn, S. Kim, Y. S. Choi, J. Lee, and J. Kang, "Prediction of Type 2 Diabetes Using
Genome-Wide Polygenic Risk Score and Metabolic Profiles: A Machine Learning Analysis
of Population-Based 10-Year Prospective Cohort Study," EBioMedicine, vol. 86, 2022, doi:
10.1016/j.ebiom.2022.104383.

[5] S. M. Zekavat, V. K. Raghu, M. Trinder, et al., "Deep Learning of the Retina Enables
Phenome-and Genome Wide Analyses of the Microvasculature," Circulation, vol. 145,no. 2,
pp. 134– 150, 2022, doi: 10.1161/CIRCULATIONAHA.121.057709.

[6] J. Shahbakhti et al., "Detection of Parkinson’s Disease Using Speech Features," IEEE
Transactions on Neural Networks and Learning Systems, vol. 33,no. 2,pp. 487-500, 2022.

[7] A. U. Hassan, J. Iqbal, R. Irfan, S. Hussain, A. D. Algarni, S. S. H. Bukhari, N. Alturki, and


S. S. Ullah, ‘‘Effectively predicting the presence of coronary heart disease using machine
learning classifiers,’’ Sensors, vol. 22,no. 19,p. 7227, Sep. 2022, doi: 10.3390/s22197227.

[8] L. Li, Y. Huang, and Y. Han, "Use of Deep Learning Genomics to Discriminate Alzheimer’s
Disease and Healthy Controls," in Proc. 43rd Annu. Int. Conf. IEEE Eng. Med. Biol. Soc.
(EMBC), 2021,pp. 1-4, doi: 10.1109/EMBC46164.2021.9630731.

[9] J. Peng, J. Li, R. Han, et al., "A Deep Learning-Based Genome-Wide Polygenic Risk Score
for Common Diseases Identifies Individuals with Risk," medRxiv, 2021,
doi:10.1101/2021.11.17.21265352.

[10] S. W. Lee et al., "Gait Analysis for Parkinson’s Disease Using Machine Learning," IEEE
Journal of Translational Engineering in Health and Medicine, vol. 9,pp. 1- 10, 2021.

[11] Akella and S. Akella, ‘‘Machine learning algorithms for predicting coronary artery disease:
Efforts toward an open-source solution,’’ Future Sci. OA, vol. 7,no. 6, Jul. 2021, Art. no.
FSO698, doi: 10.2144/fsoa-2020- 0206.

[12] L. J. Muhammad, I. Al-Shourbaji, A. A. Haruna, I. A. Mohammed, A. Ahmad, and M. B.


Jibrin, ‘‘Machine learning predictive models for coronary artery disease,’’ Social Netw.
Comput. Sci., vol. 2,no. 5,p. 350, Sep. 2021, doi: 10.1007/s42979-021-00731-4.

[13] M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai, and R. S. Suraj, ‘‘Heart disease prediction
using hybrid machine learning model,’’ in Proc. 6th Int. Conf. Inventive Comput. Technol.
47
(ICICT), Coimbatore, India, Jan. 2021,pp. 1329– 1333.
[14] J. Corral-Acero, F. Margara, M. Marciniak, et al., "The ‘Digital Twin’ to Enable the Vision
of Precision Cardiology," Eur. Heart J., vol. 41,no. 48,pp. 4556–4564, 2020, doi:
10.1093/eurheartj/ehaa159.

[15] A. Kakadiaris, M. Vrigkas, A. A. Yen, T. Kuznetsova, M. Budoff, and M. Naghavi,


"Machine Learning Outperforms ACC/AHA CVD Risk Calculator in MESA," J. Am. Heart
Assoc., vol. 7,no. 22, e009476, 2020, doi: 10.1161/JAHA.118.009476.

[16] J. R. Gray et al., "A Machine Learning Approach to Predict Parkinson’s Disease," IEEE
Transactions on Neural Systems and Rehabilitation Engineering, vol. 28,no. 8,pp. 1802-
1810, 2020.

[17] M. M. Rahman et al., "A Deep Learning-Based Approach for Parkinson’s Disease Detection
Using Handwriting Analysis," IEEE Access, vol. 8,pp. 147953- 147962, 2020.

[18] D. P. Kingma et al., "Speech Analysis for Parkinson’s Disease Detection Using CNN and
LSTM," IEEE Transactions on Biomedical Engineering, vol. 67,no. 9,pp. 2703-2714, 2020.

[19] Captainozlem. Framingham_CHD_Preprocessed_Data. Version 1. Accessed: May 5, 2020.


[Online]. Available: https://www.kaggle. com/-datasets/captainozlem/framingham-chd-
preprocesseddata/download?datasetVersionNumber=1.

48
10. PUBLICATION

Title: Prediction of Diabetes, Heart Disease, Parkinson’s Disease using ML.
Submitted to: PICET (Parul University International Conference on Engineering
and Technology)
Date of Submission: March 11, 2025
Status: This conference paper presents a digital visualization of the prediction of
diabetes, heart disease, and Parkinson’s disease, designed for real-time health monitoring.
The manuscript is currently under review (Paper ID: 681).
