
Heart Disease Detection SYNOPSIS

The document outlines a project titled 'Heart Disease Detection and Prevention' submitted by Shubha.V and K.S Sahil as part of their Bachelor of Computer Application degree requirements. It details the use of the 2020 Annual CDC Survey Data of 400,000 adults to identify risk factors and develop predictive models for heart disease, aiming to improve public health outcomes through targeted interventions and data-driven insights. The project methodology includes data analysis, machine learning model building, and deployment to predict heart disease likelihood based on various health indicators.

Uploaded by

sahilks1999

PROJECT TITLE

Heart Disease Detection and Prevention

SYNOPSIS

SUBMITTED BY:

NAME:        SHUBHA.V        K.S SAHIL
REG. NO:     U19YU22S0049    U19YU22S0014
DEPARTMENT:  BCA             BCA
SEMESTER:    6TH             6TH

UNDER THE GUIDANCE OF:

MS. NOORA SHAJAHAN


(HEAD OF DEPARTMENT OF COMPUTER SCIENCE)

DATE: 08/01/25

CERTIFICATE

This is to certify that Mr./Ms. Shubha.V / K.S Sahil,
Reg. No: U19YU22S0049 / U19YU22S0014, have done the synopsis work on
"Heart Disease Detection and Prevention", submitted to "BANGALORE
NORTH UNIVERSITY" in partial fulfilment of the requirements for the award of the
degree of Bachelor of Computer Application, as a record of work carried out under the
guidance and supervision of "Ms. NOORA SHAJAHAN".

This report does not form part of any previous dissertation or reports previously submitted
to this University or any other Universities for the award of degree or diploma.

…………………………………….                  ………………………………………..
GUIDE/HOD                                PRINCIPAL
Ms. NOORA SHAJAHAN                       Dr. LILLY THOMAS
HOD, DEPT. OF BCA                        SJES COLLEGE OF MANAGEMENT STUDIES
SJES COLLEGE OF MANAGEMENT STUDIES

DECLARATION

We hereby declare that the project work entitled "Heart Disease
Detection and Prevention", submitted to "Bangalore North
University" in partial fulfilment of the requirements for the award
of the degree of Bachelor of Computer Application, is a record of bona fide
and independent project work carried out by us under the
guidance and supervision of "Ms. NOORA SHAJAHAN", HOD,
Department of Bachelor of Computer Application, and that this report does
not form part of any previous dissertation or report previously submitted to
this University or any other University for the award of a degree.

…………………………………………

NAME: Shubha.V / Sahil K.S


REG. NO: U19YU22S0049/U19YU22S0014

ACKNOWLEDGEMENT

We take this opportunity to express our deep sense of gratitude to our
founder chairman, Sri. K. Nagaraj. We express our sincere thanks to our
respected Principal, Dr. LILLY THOMAS, and to our respected guide,
Ms. NOORA SHAJAHAN, Head of Department of BCA, for her valuable
guidance, keen interest and help during the project development. We
also express our sincere thanks to our respected teachers Ms. POOJA,
Assistant Professor, Department of BCA, Ms. MEGHA, Assistant Professor,
Department of BCA, and Mr. Bhaskar, Assistant Professor, Department of
BCA, for their care, valuable support and advice regarding project
management. We would like to thank all staff, friends and all others who
have directly or indirectly helped us in the successful
completion of this project.

…………………………………..
NAME: Shubha.V / Sahil K.S
REG. NO: U19YU22S0049/U19YU22S0014
INDEX
SL.NO  CONTENT

1      ABSTRACT
2      OBJECTIVE
3      DATA SET OVERVIEW
4      DATA SET DESCRIPTION
5      BUSINESS PROBLEM STATEMENT
6      SIGNIFICANCE OF PROBLEM STATEMENT
7      SCOPE
8      SOLUTION TO THE PROBLEM STATEMENT
9      METHODOLOGY FOLLOWED
10     CONCLUSION
ABSTRACT
The "2020 Annual CDC Survey Data of 400k Adults" is a comprehensive dataset that includes
responses from 400,000 adults in the United States, collected as part of an annual survey by the
CDC. This dataset provides valuable information on lifestyle choices, medical history, and health
indicators, which are crucial for heart disease detection and prevention. By applying machine
learning techniques, such as classification algorithms, we can build models to predict the likelihood
of heart disease based on factors like age, gender, BMI, smoking, drinking, and sleep patterns. The
data also allows for the analysis of lifestyle factors contributing to heart disease, helping to identify
patterns and develop personalized prevention strategies. Using this dataset for predictive modeling
can lead to early detection and tailored interventions, ultimately improving public health outcomes
and reducing the burden of heart disease.

OBJECTIVE:
The objective of this project is to leverage the 2020 Annual CDC Survey Data of 400k adults to
develop effective strategies for detecting and preventing heart disease. Specifically, the project aims
to:
1. Identify Risk Factors: Analyze the dataset to identify key risk factors associated with heart
disease, such as smoking, obesity, physical inactivity, and poor dietary habits.
2. Predict Heart Disease Risk: Utilize machine learning techniques to develop predictive
models that can accurately assess the risk of heart disease in individuals based on their health
behaviors and medical history.
3. Targeted Interventions: Design and propose targeted interventions to reduce the incidence of
heart disease, including lifestyle modifications and public health campaigns based on data-
driven insights.
4. Support Healthcare Solutions: Provide actionable recommendations for healthcare providers,
policy-makers, and businesses to promote heart-healthy behaviors and improve overall
cardiovascular health in the population.

DATA SET OVERVIEW


The "2020 Annual CDC Survey Data of 400k Adults" is a comprehensive dataset featuring survey
responses from 400,000 adults in the United States. It provides valuable insights into health-related
factors, including demographics, lifestyle choices, medical history, and health indicators. Key
sections include:
• Demographic Information: Age, sex, race, and other basic characteristics.
• Lifestyle Factors: Smoking, alcohol consumption, and sleep patterns.
• Medical History: Previous diagnoses like diabetes, asthma, and stroke.
• Health Indicators: BMI, physical health, mental health, and walking difficulties.
• Risk Factors: Variables such as obesity, diabetes, smoking, and sedentary lifestyle linked to
heart disease risk.
DATA SET DESCRIPTION
The version of the 2020 Annual CDC Survey Data used in this project contains 18 attributes,
capturing various health-related factors from 319,795 respondents (fewer than the roughly 400,000
adults surveyed). Key attributes include:
1. HeartDisease: History of coronary heart disease or myocardial infarction.
2. BMI: Body Mass Index.
3. Smoking: Smoking history (100+ cigarettes).
4. AlcoholDrinking: Heavy drinking (men >14 drinks/week, women >7 drinks/week).
5. Stroke: History of stroke.
6. PhysicalHealth: Days of poor physical health in the past 30 days.
7. MentalHealth: Days of poor mental health in the past 30 days.
8. DiffWalking: Difficulty walking or climbing stairs.
9. Sex: Gender.
10. AgeCategory: Age group.
11. Race: Race/ethnicity.
12. Diabetic: History of diabetes.
13. PhysicalActivity: Exercise participation in the last 30 days.
14. GenHealth: Self-reported general health.
15. SleepTime: Average sleep hours per day.
16. Asthma: History of asthma.
17. KidneyDisease: History of kidney disease (excluding stones/infections).
18. SkinCancer: History of skin cancer.
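
To make the attribute list concrete, the following is a minimal sketch of how a few of these columns might be encoded for modelling. It assumes that the binary attributes are stored as "Yes"/"No" strings; the sample rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Tiny synthetic sample using a few of the column names listed above.
# The values are invented for illustration only.
sample = pd.DataFrame({
    "HeartDisease": ["No", "Yes", "No"],
    "BMI": [16.6, 28.9, 34.3],
    "Smoking": ["Yes", "No", "Yes"],
    "SleepTime": [5.0, 7.0, 8.0],
})

# Assumption: binary attributes hold "Yes"/"No" strings and map to 1/0.
binary_cols = ["HeartDisease", "Smoking"]
encoded = sample.copy()
for col in binary_cols:
    encoded[col] = encoded[col].map({"No": 0, "Yes": 1})

print(encoded.dtypes)
```

Multi-valued attributes such as AgeCategory, Race and GenHealth would need a different treatment (for example one-hot or ordinal encoding), which this sketch does not cover.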

BUSINESS PROBLEM STATEMENT


The 2020 Annual CDC Survey Data of 400k adults offers valuable insights into heart disease
prevalence, risk factors, and associated behaviors. Businesses across various industries can leverage
this data to address the growing issue of heart disease. Healthcare providers can identify gaps in
prevention and treatment, while food companies can promote heart-healthy options based on dietary
trends. Technology companies can create solutions like wearable devices or telemedicine to monitor
and manage heart health, especially in underserved areas. By utilizing this data, businesses can help
reduce heart disease incidence and improve overall health outcomes.
Business Objective:
The goal is to develop strategies to combat heart disease using insights from the CDC survey. Key
objectives include:
• Targeted Interventions: Identify high-risk populations and create specific programs to address
risk factors like high blood pressure and cholesterol.
• Promote Healthy Eating: Use data to identify unhealthy eating habits and encourage heart-
healthy food choices in businesses like restaurants and food manufacturers.
• Innovative Solutions: Develop tech-driven solutions such as wearable devices and mobile
apps to monitor and improve heart health.
• Improve Healthcare Access: Use data to pinpoint areas with limited access to healthcare and
create initiatives like telemedicine services for better healthcare distribution.

SIGNIFICANCE OF PROBLEM STATEMENT


• Public Health Impact: Heart disease is a leading cause of death and a significant public health
issue worldwide. By addressing the problem through effective strategies and solutions,
businesses can contribute to improving population health and reducing the burden of heart
disease on individuals, families, and healthcare systems.
• Economic Implications: Heart disease imposes substantial economic costs on society,
including healthcare expenditures, lost productivity, and reduced quality of life. Developing
interventions and initiatives based on the dataset can lead to cost-effective solutions,
potentially reducing healthcare costs and improving overall economic outcomes.
• Business Opportunities: Businesses that address the problem of heart disease can tap into a
growing market for heart-healthy products, services, and technologies. By leveraging the
dataset, businesses can identify unmet needs, target specific populations, develop innovative
solutions, and gain a competitive advantage in the market.
• Stakeholder Engagement: Collaboration between businesses, healthcare providers,
policymakers, and community organizations is crucial for effectively addressing heart
disease. The dataset serves as a foundation for fostering collaborations, partnerships, and
knowledge-sharing among stakeholders, leading to coordinated efforts and greater impact in
combating heart disease.

SCOPE:
The scope of this project on Heart Disease Detection and Prevention is focused on analyzing the
2020 Annual CDC Survey Data of 400k adults to identify key factors contributing to heart disease
risk. The project will explore various health indicators, lifestyle habits, and medical histories to
assess heart disease risk in the U.S. population. Specifically, the analysis will:
• Focus on identifying risk factors such as smoking, physical inactivity, poor diet, high blood
pressure, and diabetes that contribute to heart disease.
• Use machine learning techniques to predict the likelihood of heart disease based on these
factors.
• Develop insights that can guide preventive strategies, including lifestyle modifications and
healthcare interventions.
• Aim to identify high-risk groups within different demographics, such as age, gender, race,
and health history.
• Provide recommendations for businesses and healthcare providers to develop targeted
interventions, health campaigns, and preventive solutions.

SOLUTION TO THE PROBLEM STATEMENT


• Data Analysis and Insights: Perform thorough data analysis on the survey dataset to identify
key trends, patterns, and risk factors related to heart disease. Utilize statistical analysis and
machine learning techniques to uncover correlations and potential causal relationships
between variables. This will provide valuable insights into the prevalence, impact, and
disparities of heart disease.
• Risk Factor Identification: After analysis of the survey data, we can identify the key risk
factors associated with heart disease in the surveyed population. This includes factors such as
smoking, obesity, sedentary lifestyle, high blood pressure, high cholesterol, and diabetes. We
can prioritize the identified risk factors based on their prevalence and impact on heart health.
• Data-Driven Decision Making: Continuously monitor and evaluate the effectiveness of
interventions and programs using data from the survey and other relevant sources. Use data
analytics and evaluation techniques to measure outcomes, identify areas for improvement,
and refine strategies. Utilize the insights gained from data analysis to inform future decision
making and optimize resource allocation.

METHODOLOGY FOLLOWED
As initial steps, we took the raw data and after getting a sense of its structure, we worked to prepare
the data for modelling. These steps are detailed below.
1. Data Analysis, Cleaning/ Pre-Processing:
The pre-processing of the dataset before performing Machine Learning functions involves the
following:
1.1. CSV to Python: Reading the given dataset into Python using pandas.
1.2. Preliminary Data Sanitation: Treatment of error and missing values, so that the Data Frames
are ready to be joined.
1.3. Descriptive Statistics: Here, we used Standard Data Exploration and Visualization functions
using pandas and other libraries in Python. Through this process we begin to understand the
available data by looking at measures such as variation, normality, potential outlier influence
etc. Among other uses, this helps us identify which features might require a form of
normalisation, standardisation or feature engineering treatment.
1.4. Dropping Unnecessary Columns: We removed the columns which do not contribute to the
model building or the columns which are of less, or of no importance.
1.5. Treating Missing Values: Null values in the variables were treated with suitable methods.
1.6. Refining and Transforming Features: Based on an initial understanding of the dataset, we
identified which variables were unduly impacted by outliers and removed these outliers using the
inter-quartile range or z-score methods.

Additionally, we made sure the variables were in the appropriate form for classification analysis,
and standardized those variables whose original value range could have an undue impact on
distance calculations during the model development phase.
Finally, we split the data into two sets, one for training the model and one for testing it.
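
The outlier removal and standardization described in step 1.6 can be sketched as follows. This is an illustrative example on made-up BMI values, not the project's actual code; the helper names `remove_iqr_outliers` and `standardize` and the multiplier `k=1.5` are assumptions chosen for the sketch.

```python
import pandas as pd

def remove_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

def standardize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    return (series - series.mean()) / series.std(ddof=0)

# Made-up BMI values with one obvious outlier (95.0).
df = pd.DataFrame({"BMI": [22.0, 24.0, 25.0, 26.0, 27.0, 28.0, 95.0]})

cleaned = remove_iqr_outliers(df, "BMI")
cleaned = cleaned.assign(BMI_std=standardize(cleaned["BMI"]))
print(cleaned)
```

Standardization matters mainly for distance-based models such as KNN. In a full pipeline the mean and standard deviation would normally be computed on the training set only and then applied to the test set, so that no information from the test data leaks into training.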

2. Exploratory Data Analysis:


Exploratory Data Analysis (EDA) means understanding a dataset by summarizing its main
characteristics, often by plotting them. This step is especially important before modelling the
data with machine learning.
The EDA plots included bar plots, box plots, scatter plots and others, covering univariate,
bivariate and multivariate analysis.
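
As a small illustration of univariate and bivariate analysis, the sketch below summarizes one numeric column and compares the share of heart disease cases between two groups. The data frame is synthetic and assumes HeartDisease has already been encoded as 0/1.

```python
import pandas as pd

# Synthetic rows for illustration; HeartDisease is assumed to be 0/1 encoded.
df = pd.DataFrame({
    "Smoking": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "HeartDisease": [1, 0, 0, 0, 1, 0],
    "BMI": [31.0, 27.5, 22.0, 24.5, 29.0, 23.0],
})

# Univariate: count, mean, spread and quartiles of a single variable.
bmi_summary = df["BMI"].describe()

# Bivariate: share of respondents with heart disease in each smoking group.
rate_by_smoking = df.groupby("Smoking")["HeartDisease"].mean()

print(bmi_summary)
print(rate_by_smoking)
# With matplotlib installed, rate_by_smoking.plot(kind="bar") would draw
# the corresponding bar plot.
```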

3. Data Preparation:
Train Test Split: The data is split into training and test sets in the required ratio.
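
A minimal sketch of such a split with scikit-learn is shown below. The 80/20 ratio, the random seed and the use of stratification are assumptions for illustration, since the ratio used in the project is not stated here. Stratifying keeps the share of positive cases the same in both sets, which helps when one class is much rarer than the other.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and an imbalanced 0/1 target (10 positives out of 100).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 10 + [0] * 90)

# Assumed 80/20 split; stratify=y preserves the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```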

4. Model Building:
We trained and tested the following Machine Learning (ML) models and compared their
performance:
4.1. Naive Bayes
4.2. KNN Classifier
4.3. Decision Tree
4.4. Random Forest
4.5. AdaBoost Classifier
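
The comparison of these five model families can be sketched with scikit-learn as below. The data comes from `make_classification` rather than the survey, and the hyperparameters are mostly library defaults, so the scores say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the survey features.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model on the training set and record its test-set accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
```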

5. Model Evaluation:
The following metrics were used to evaluate the classification model's performance:
5.1. Accuracy
5.2. Precision
5.3. Recall
5.4. F1-score
5.5. Confusion Matrix
5.6. ROC/AUC Score
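
The sketch below computes each of these metrics with scikit-learn on a small hand-made example, so the numbers can be checked by hand. The labels, scores and the 0.5 decision threshold are assumptions for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-made example: 3 positive and 7 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities of the positive class.
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3]
# Assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted
auc = roc_auc_score(y_true, y_score)    # uses scores, not thresholded labels

print(acc, prec, rec, f1, auc)
print(cm)
```

Here there are 2 true positives, 1 false negative, 1 false positive and 6 true negatives, so accuracy is 0.8 while precision and recall are both 2/3. When positive cases are a minority, accuracy alone can look good even for a model that misses many of them, which is why recall, F1-score and ROC/AUC are reported alongside it.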

6. Model Deployment:

In this step, we saved the best model and wrote a function that takes a U.S. resident's health
data as input and returns the predicted likelihood of heart disease as output. We deployed the
model as a Streamlit app.
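
A minimal sketch of the save-and-predict step is given below. It uses Python's `pickle` module and a toy model trained on synthetic two-feature data; the file name `heart_model.pkl` and the function `predict_risk` are hypothetical names chosen for this example, and the project's own serialization format is not specified here.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy model on synthetic data: class 1 when the two features sum above zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Save the fitted model to disk (hypothetical file name).
model_path = Path(tempfile.gettempdir()) / "heart_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

def predict_risk(features, path=model_path):
    """Load the saved model and return the predicted probability of class 1."""
    with open(path, "rb") as f:
        loaded = pickle.load(f)
    row = np.asarray(features, dtype=float).reshape(1, -1)
    return float(loaded.predict_proba(row)[0, 1])

risk = predict_risk([1.5, 1.0])
print(f"Predicted likelihood: {risk:.2f}")
```

In a Streamlit app, a function like this would typically be called with values collected from input widgets and its result shown on the page. The inputs would also need the same encoding and scaling that was applied to the training data.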

7. Conclusions:
We built a model that predicts the likelihood of heart disease for a person in the U.S. from the
given dataset, in order to support the detection and prevention of heart disease.

CONCLUSION
Value that can be added using this dataset:
• Targeted Interventions: Identifying specific risk factors and populations at higher risk for
heart disease can enable businesses to develop targeted interventions and prevention
programs. This can range from educational campaigns and behavior change initiatives to
personalized digital health solutions.
• Product and Service Innovation: Businesses can leverage the dataset to inform the
development of heart-healthy products and services. This can include food industry
innovations for healthier menu options, wearable devices for monitoring and promoting
physical activity, and telehealth solutions for remote access to healthcare services.
• Evidence-Based Decision-Making: The dataset provides valuable evidence for decision-
makers in both the public and private sectors. Policymakers can use the insights to shape
public health policies and regulations, while businesses can make data-driven decisions
regarding market strategies, resource allocation, and investments in heart disease prevention
and management.
• Health Promotion and Awareness: Businesses can use the dataset to create health promotion
campaigns and raise awareness about heart disease risk factors, prevention strategies, and the
importance of early detection. This can contribute to empowering individuals to make
informed decisions and adopt healthier lifestyles.
