BMI Analysis for Student Health

This document analyzes body mass index (BMI) data from a sample of individuals. Scatter plots and regression analysis were used to understand the relationships between age, height, weight, education, gender and BMI. The analysis found height and weight to be significant predictors of BMI, but not gender. Most individuals in the sample had a high BMI, indicating obesity may be a problem in the analyzed region. This could negatively impact education and career outcomes if not addressed.

Uploaded by

Saurabh

MARKETING ANALYTICS PROJECT REPORT

ANALYSIS OF BODY MASS INDEX (BMI)

Alisha Srivastava
Prachi Aggarwal
Anup Thakur
Gowtham Reddy
Sandeep Pal
Body mass index (BMI) is a value derived from the mass (weight) and height of
a person. The BMI is defined as the mass divided by the square of the body
height, and is universally expressed in units of kg/m², resulting from mass
in kilograms and height in metres. The BMI may also be determined using a
table or chart which displays BMI as a function of mass and height using
contour lines or colours for different BMI categories, and which may use other
units of measurement (converted to metric units for the calculation). The BMI is
a convenient rule of thumb used to broadly categorize a person
as underweight, normal weight, overweight, or obese based on tissue mass
(muscle, fat, and bone) and height. That categorization is the subject of some
debate about where on the BMI scale the dividing lines between categories
should be placed. Commonly accepted BMI ranges are underweight: under
18.5 kg/m², normal weight: 18.5 to 25, overweight: 25 to 30, obese: over 30.
BMIs under 20.0 and over 25.0 have been associated with higher all-cause
mortality, increasing risk with distance from the 20.0–25.0 range. The
prevalence of overweight and obesity is the highest in the Americas and lowest
in Southeast Asia. The prevalence of overweight and obesity in high income
and upper middle-income countries is more than double that of low and lower
middle-income countries.
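
The definition and the category ranges above can be sketched in a few lines. The report itself used Excel and RStudio, so this Python version is only an illustration. The function names are hypothetical, and the handling of values that fall exactly on a boundary (18.5, 25, 30) is an assumption, since the ranges quoted above do not say which side a boundary belongs to.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: mass divided by the square of height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the commonly accepted ranges.

    Assumption: a value exactly on a boundary goes to the higher category.
    """
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# First row of the sample data: 45 kg at 1.70 m.
value = bmi(45, 1.70)
print(round(value, 2), bmi_category(value))  # 15.57 underweight
```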

DATA SET
Age  Height (cm)  Weight (kg)  Education  Height (m)  Height^2 (m^2)  Gender
24   170          45           13         1.7         2.89            1
25   186          55           13         1.86        3.4596          0
18   152          115          12         1.52        2.3104          0
18   154          52           17         1.54        2.3716          0
20   166          87           12         1.66        2.7556          0
21   160          115          12         1.6         2.56            1
20   169          105          12         1.69        2.8561          1
19   177          87           12         1.77        3.1329          0
23   185          109          12         1.85        3.4225          0
24   175          113          12         1.75        3.0625          0
18   168          77           12         1.68        2.8224          0
24   166          68           17         1.66        2.7556          0
19   159          50           14         1.59        2.5281          1
25   159          73           15         1.59        2.5281          1
23   175          140          12         1.75        3.0625          1
21   157          104          12         1.57        2.4649          1
18   175          138          12         1.75        3.0625          0
17   161          129          12         1.61        2.5921          1
22   159          137          12         1.59        2.5281          0
22   167          49           13         1.67        2.7889          0
25   173          108          12         1.73        2.9929          0
17   183          87           12         1.83        3.3489          1
23   152          115          12         1.52        2.3104          1
23   171          120          12         1.71        2.9241          1
24   160          115          12         1.6         2.56            1

The table above shows a sample of the data set we worked on. The data contain
age, height, weight, education and gender.
Source: https://www.kaggle.com/
Tools used: Excel and RStudio
Variables: 7
Dependent variable: 1
Independent variables: 5
Here we have imported the data into RStudio, and the image attached above shows how it looks there.
The output summarises the whole data set. The minimum age was 17 and the maximum was 25.
The minimum height was 150 and the maximum was 180 (height was recorded in centimetres).
The minimum weight was 45 and the maximum was 150 (weight was recorded in kilograms).
We then converted the height to metres and squared it to obtain square metres.
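
The conversion step described above can be sketched as follows. This is an illustration in Python rather than the R workflow the report used, and it runs only on the first five rows of the sample table, so its minimum and maximum values are not the full-data summary. The names `derive` and `summary` are hypothetical.

```python
# (height_cm, weight_kg) pairs copied from the first rows of the sample table.
rows = [(170, 45), (186, 55), (152, 115), (154, 52), (166, 87)]


def derive(height_cm, weight_kg):
    """Convert height to metres, square it, and compute BMI in kg/m^2."""
    height_m = height_cm / 100
    height_m2 = height_m ** 2
    return {
        "height_m": height_m,
        "height_m2": round(height_m2, 4),
        "bmi": round(weight_kg / height_m2, 2),
    }


derived = [derive(h, w) for h, w in rows]

# Minimum and maximum of the raw columns, as in a summary table.
heights = [h for h, _ in rows]
weights = [w for _, w in rows]
summary = {
    "height_cm": (min(heights), max(heights)),
    "weight_kg": (min(weights), max(weights)),
}

for row, extra in zip(rows, derived):
    print(row, extra)
print(summary)
```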
SCATTER PLOTS

The image above shows the scatter plots between age, height, weight, education, gender
and the computed BMI.
Next we found the RESIDUALS and COEFFICIENTS. For the coefficients we obtained the
standard error and t value, which can also be examined through an ANOVA test. The p-value of
the F-statistic is < 2.2e-16, which is highly significant. This means that at least one of the predictor
variables is significantly related to the outcome variable. The coefficient for gender (-0.05) is
not significant, which means that for male and female alike it is height and weight that determine the BMI.
The equation of regression can be written as:
BMI = 70 - 0.4*HEIGHT + 0.35*WEIGHT
with HEIGHT in centimetres and WEIGHT in kilograms. The height coefficient is negative because, for a
given weight, BMI falls as height increases.
Now for the model accuracy assessment:
The adjusted R squared is 0.98, which means that about 98% of the variance in BMI is explained
by these variables. The residual standard error was 1.535 on 494 degrees of
freedom, the multiple R squared was 0.985 and the adjusted R squared was 0.9849.
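
As a rough illustration of the fit described above, the sketch below regresses BMI on height and weight by ordinary least squares. It is not the report's R model: it uses only the 25 sample rows from the DATA SET table, only two predictors, and a hand-written solver, so its coefficients will differ from the reported ones. The function names are hypothetical. The last line recomputes the adjusted R squared from the reported R squared of 0.985 with 5 predictors, assuming n = 500 observations (inferred from the 494 residual degrees of freedom). That gives about 0.9848, which agrees with the reported 0.9849 up to rounding of the inputs.

```python
# (height_cm, weight_kg) pairs copied from the sample table in the DATA SET section.
SAMPLE = [
    (170, 45), (186, 55), (152, 115), (154, 52), (166, 87),
    (160, 115), (169, 105), (177, 87), (185, 109), (175, 113),
    (168, 77), (166, 68), (159, 50), (159, 73), (175, 140),
    (157, 104), (175, 138), (161, 129), (159, 137), (167, 49),
    (173, 108), (183, 87), (152, 115), (171, 120), (160, 115),
]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(x) for x in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(xs, ys, coef):
    """Share of the variance in ys explained by the fitted values."""
    mean_y = sum(ys) / len(ys)
    fitted = [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust R squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


xs = [(h, w) for h, w in SAMPLE]
ys = [w / (h / 100) ** 2 for h, w in SAMPLE]  # BMI in kg/m^2
coef = fit_ols(xs, ys)
r2 = r_squared(xs, ys, coef)
print("intercept, height, weight:", [round(c, 3) for c in coef])
print("R squared on the sample rows:", round(r2, 3))
print("adjusted R squared:", round(adjusted_r_squared(0.985, 500, 5), 4))
```

On these rows the fitted height coefficient is negative and the weight coefficient positive, which is the direction the BMI formula implies.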
Residual plot interpretation
• In the first graph, which is the graph of model1, the residuals fall
in a symmetrical pattern and have a constant spread throughout the range.
This is how it should be.
• In the second graph, of the predicted model3, the residuals form a
non-random U shape, which indicates that some predictors are missing.
• In other words, the predictor variables fail to capture some of the
explanatory information.
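
A U-shaped residual pattern of the kind described above is easy to reproduce. The sketch below fits a straight line to a deliberately curved, hypothetical relationship (y = x squared), not to the report's data. The residuals come out positive at both ends and negative in the middle, which is the signature of a missing term.

```python
def fit_line(xs, ys):
    """Simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical curved data: the true relationship is quadratic.
xs = list(range(11))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(x, round(e, 2))
```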
NORMAL Q-Q PLOT (QUANTILE PLOT)
INTERPRETATION
• In the graph for model1 the points fall roughly on a straight line
but curve off at the extremities, which shows that the data have more
extreme values than expected.
• In the graph for model3 the points come close to a straight line,
indicating approximately normally distributed residuals.
• The residuals are normally distributed if the points follow the dotted line
closely.
• In this case the residual points follow the dotted line closely except for
observation #120, so this model's residuals have passed the test of
normality.
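
The points of a normal Q-Q plot can be computed by pairing each sorted residual with the standard normal quantile at the same rank. The sketch below shows one way to do this in Python. The function name is hypothetical, the plotting positions (i - 0.5) / n are one common convention among several, and the residual values are made up for illustration.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Return (theoretical quantile, sample value) pairs for a normal Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals, for illustration only.
for theo, sample in qq_points([3, -1, 0, 1, -3]):
    print(round(theo, 3), sample)
```

If the residuals are close to normal, these pairs lie near a straight line. Points that bend away at both ends indicate heavier tails than a normal distribution.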

SCALE LOCATION PLOTS

This is also called the spread-location plot. It shows whether the residuals
are spread equally along the ranges of the predictors, which is how we check
the assumption of equal variance (homoscedasticity). It is a good sign if we
see a horizontal line with equally (randomly) spread points.
Graphs on RESIDUALS VS LEVERAGE

Cook’s distance is used to measure how strongly each observation, in
particular an outlier, influences the fitted model. The more outliers there
are, the more they affect the mean.

In both cases there are no influential cases. We can barely see the
Cook’s distance lines (red dashed lines) because all cases are well
inside them.
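
For a regression with a single predictor, Cook's distance can be computed directly from the residuals and leverages. The sketch below uses hypothetical data with one extreme point, not the report's model, and the function name is an assumption. A common rule of thumb treats values above about 1 as influential.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point of a simple linear regression."""
    n = len(xs)
    p = 2  # fitted parameters: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point: how far its x value sits from the mean.
    leverages = [1 / n + (x - mean_x) ** 2 / sxx for x in xs]
    return [
        e ** 2 / (p * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: the last point lies far from the others in x.
xs = [1, 2, 3, 4, 5, 20]
ys = [1, 2, 3, 4, 5, 5]
distances = cooks_distances(xs, ys)
for x, d in zip(xs, distances):
    print(x, round(d, 3))
```

Here the last point has by far the largest distance, so it alone would fall outside the Cook's distance lines on a residuals vs leverage plot.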
In the following histogram the number of values is highest in the range
40 to 45, which means that overall the BMI in this data set is high
(towards obesity). The histogram is spiked, so the BMI values are not
normally distributed.
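
The counting behind such a histogram can be sketched as below. The BMI values are hypothetical and chosen only to show the mechanics. The 5-unit bin width and the function names are assumptions, not taken from the report.

```python
from collections import Counter


def bin_counts(values, width=5):
    """Count values per half-open bin [k*width, (k+1)*width)."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


def modal_bin(values, width=5):
    """Return (start, end) of the bin holding the most values."""
    counts = bin_counts(values, width)
    start = max(counts, key=counts.get)
    return start, start + width


# Hypothetical BMI values, for illustration only.
bmis = [22.1, 31.5, 41.2, 43.8, 44.9, 40.0, 36.7, 47.3]
print(bin_counts(bmis))
print(modal_bin(bmis))  # (40, 45)
```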
• After analysing this data set, we conclude that the region appears to
suffer from poor eating habits: students are prone to eating junk food
and are not physically active.
• Due to this obesity problem, education is affected: obesity harms mental
health by lowering confidence, and students who lack confidence do not
perform well.

• As more students drop out of high school, we can also predict that
they do not get good jobs with a good income, and hence their
lifestyle suffers.
