Laboratory Exercise 7

Linear Model:
Simple Regression and Correlation

Siswanto Agus Wilopo

Professor, Faculty of Medicine, Public Health and Nursing,
Universitas Gadjah Mada, Yogyakarta
and
Adjunct Full Professor, College of Health and Agricultural Sciences,
University College Dublin, Belfield, Ireland

Department of Biostatistics, Epidemiology and Population Health

Prof. Siswanto Agus Wilopo (UGM), Biostatistics I, July 23, 2024


Table of Contents

1 Learning Objectives

2 Activities

3 Class Exercise

4 Homework

5 Output

6 Required Reading

Learning Objectives

Upon completion of the course unit, students should be able to:

a. demonstrate the process of estimation and inference for simple linear regression and correlation
b. evaluate the assumptions of simple linear regression and correlation
c. apply the concept of simple linear regression to data analysis in public health research
d. demonstrate how to apply simple linear regression and correlation methods to health data
e. appraise a published article that uses simple linear regression

Activities

1. Discussion: correlation and simple linear regression
2. Laboratory session:
   1. Estimating a correlation coefficient and a simple linear regression
   2. Inference for correlation and linear regression
   3. Calculating and reading computer output from correlation and simple regression analysis
   4. How to estimate and interpret the coefficients of a simple linear regression
   5. How to assess the assumptions of a simple linear regression
   6. Analyzing data using simple linear regression

Class Exercise

Instruction

• Every student should read the journal article before the class exercise and
discuss the following questions with classmates under the guidance of
the tutor.
• In the group discussion, you are encouraged to discuss the questions
and possible answers with other students.
• During the group discussion, your tutor can help with any
concepts that you have not been exposed to before.

Lung function data


The lung function data set includes information on nonsmoking families from the UCLA
study of chronic obstructive respiratory disease (CORD). In the CORD study persons 7
years old and older from four areas (Burbank, Lancaster, Long Beach, and Glendora) were
sampled, and information was obtained from them at two time periods. The data in this
exercise are a subset including 150 families with a mother and a father, and one, two, or
three children between the ages of 7 and 17 who answered the questionnaire and took the
lung function tests at the first time period. The purpose of the CORD study was to
determine the effects of different types of air pollutants on respiratory function, but
numerous other types of studies have been performed on this data set. Data on age, sex,
height, weight, FVC, and FEV1 are included for the members of each family. Some families
have only one or two children and if there is only one child it is listed as the oldest child.
Since many families have only one child (considered the oldest), there are many missing
values in the data for the middle and youngest child.

Data analysis example

One of the major early indicators of reduced respiratory function is FEV1 or forced
expiratory volume in the first second (amount of air exhaled in 1 second). Since it is known
that taller males tend to have higher FEV1, we wish to determine the relationship between
height and FEV1. We exclude the data from the mothers as several studies have shown a
different relationship for women. The sample size is 150. These data belong to the
variable-X case, where X is height (in inches) and Y is FEV1 (in liters). Here we may be
concerned with describing the relationship between FEV1 and height, a descriptive
purpose. We may also use the resulting equation to determine expected or normal FEV1
for a given height, a predictive use.

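Create a scatterplot of the data using Stata. Here are the Stata commands:

* Load the lung function data
use "lung.dta", clear

* Rescale the father's FEV1 to liters (ffev1 appears to be stored in hundredths of a liter)
generate ffev1a = ffev1/100

* Scatterplot of father's FEV1 against height, with the fitted regression line
graph twoway (scatter ffev1a fheight, msymbol(Oh)) ///
    (lfit ffev1a fheight), ///
    xtitle("A scatterplot between Male FEV1 and Height") ///
    xlabel(58 62 to 78, grid) ylabel(2 2.5 to 6.5, angle(0)) ///
    ytitle(FFEV1) graphregion(fcolor(white))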

[Figure: A scatterplot between Male FEV1 and Height. Vertical axis: FFEV1 (2 to 6.5); horizontal axis: height in inches (58 to 78); legend: ffev1a, Fitted values.]

Data analysis example

• In your graph, height is given on the horizontal axis since it is the
independent or predictor variable, and FEV1 is given on the vertical
axis since it is the dependent or outcome variable.
• Heights were rounded to the nearest inch in the original data, and the
program marked every four inches on the horizontal axis.
• The circles in the figure represent the individual data points.
• There does appear to be a tendency for taller men to have higher
FEV1.
• The program also draws the regression line in the graph.
• The line slopes upward, indicating that we expect larger values of
FEV1 with larger values of height.

A simple regression equation

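The equation of the regression line, where X is height (in inches) and Y is FEV1 (in liters), is given by Stata as:

Y = −4.087 + 0.118X

. regress ffev1a fheight

      Source |       SS           df       MS      Number of obs   =       150
-------------+----------------------------------   F(1, 148)       =     50.50
       Model |  16.0531702         1  16.0531702   Prob > F        =    0.0000
    Residual |  47.0451258       148  .317872472   R-squared       =    0.2544
-------------+----------------------------------   Adj R-squared   =    0.2494
       Total |   63.098296       149  .423478497   Root MSE        =     .5638

------------------------------------------------------------------------------
      ffev1a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     fheight |   .1181052   .0166194     7.11   0.000     .0852633    .1509472
       _cons |  -4.086702   1.151979    -3.55   0.001    -6.363155    -1.81025
------------------------------------------------------------------------------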
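
As a check on how the pieces of this output fit together, the summary statistics can be recomputed from the sums of squares and the coefficient table. This is only a worked sketch using the numbers printed above; the subscripts follow the row labels of the output, and all results are rounded.

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
     = \frac{16.0531702}{63.098296} \approx 0.2544 \\
F(1,148) &= \frac{MS_{\text{Model}}}{MS_{\text{Residual}}}
     = \frac{16.0531702}{0.317872472} \approx 50.50 \\
t_{\text{fheight}} &= \frac{0.1181052}{0.0166194} \approx 7.11,
     \qquad t_{\text{fheight}}^2 \approx 50.5 \approx F \\
\text{Root MSE} &= \sqrt{MS_{\text{Residual}}}
     = \sqrt{0.317872472} \approx 0.5638
\end{align*}
```

In words, height explains about 25% of the variation in FEV1 among these 150 fathers, and in a simple regression the overall F test and the t test for the slope are the same test.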

Interpretation

• The coefficient 0.118 in front of X is greater than zero, indicating that as
X increases, Y is expected to increase: each additional inch of height is
associated with an average increase of 0.118 liters in FEV1.
• For example, we would expect a father who is 70 inches tall to have
an FEV1 value of FEV1 = −4.087 + (0.118)(70) = 4.173 liters.
• Questions:
  1. If the height were 66 inches, what value of FEV1 would you expect?
  2. Suppose a father were 2 feet tall (what is that in cm?). What value of
     FEV1 would you expect?
• Question 2 illustrates the danger of using the regression equation
outside the appropriate range.
• A safe policy is to restrict the use of the equation to the range of X
observed in the sample.
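
The predictions discussed above can be sketched in Stata as follows. This is a minimal sketch, not part of the original handout: it assumes lung.dta is loaded and ffev1a has been generated as in the scatterplot step, and it uses the stored coefficients _b[_cons] and _b[fheight] rather than the rounded values.

```stata
* Refit the model so that _b[] holds the estimated coefficients
regress ffev1a fheight

* Predicted FEV1 (liters) at 70 and 66 inches
display _b[_cons] + _b[fheight]*70
display _b[_cons] + _b[fheight]*66

* Convert 2 feet to inches (24) and to centimeters (60.96)
display 2*12
display 2*12*2.54

* Predicted FEV1 at 24 inches, far outside the heights in the sample
display _b[_cons] + _b[fheight]*24

* The same in-range predictions with standard errors and confidence intervals
margins, at(fheight=(66 70))
```

With the rounded equation, the prediction at 24 inches is −4.087 + (0.118)(24) = −1.255 liters. A negative volume is impossible, which is why the equation should not be used so far outside the sampled heights.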

Interpretation

To get more information about these men, we requested descriptive statistics.

. summarize ffev1a fheight

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      ffev1a |        150    4.093267    .6507523        2.5       5.85
     fheight |        150       69.26    2.779189         61         76

Note that the mean height (69.26 inches) is approximately in the middle of the observed heights, and the mean
FEV1 (4.09 liters) is approximately in the middle of the FEV1 values in the scatterplot.

Interpretation

We can compute the correlation coefficient as follows:

. pwcorr fheight ffev1, sig star(.05)

             |  fheight    ffev1
-------------+------------------
     fheight |   1.0000
             |
       ffev1 |   0.5044*  1.0000
             |   0.0000

a. Is the correlation coefficient statistically significantly different from 0?
b. How would you report this result?
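
One common way to test whether a correlation differs from zero is a t statistic with n − 2 degrees of freedom. The following worked sketch uses r = 0.5044 and n = 150 from the output above; the symbol ρ for the population correlation is introduced here for illustration, and all values are rounded.

```latex
\begin{align*}
H_0 &: \rho = 0 \quad \text{versus} \quad H_1 : \rho \neq 0 \\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
   = \frac{0.5044\sqrt{148}}{\sqrt{1-0.5044^2}}
   \approx \frac{6.136}{0.8635} \approx 7.11,
   \qquad df = 148 \\
r^2 &= 0.5044^2 \approx 0.2544
\end{align*}
```

The t statistic equals the one reported for the slope of fheight in the regression output, and r² equals the regression R-squared, so in the two-variable case the test of the correlation and the test of the slope lead to the same conclusion.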

Model Checking

1. In the lecture, methods for checking for outliers, normality,
   homogeneity of variance, and independence were presented, along
   with a brief discussion of the importance of including these checks in the
   analysis.
2. Create a normal probability plot of the residuals from the regression of
   FEV1 on height for fathers.
3. What is your conclusion?
4. Here are the Stata commands:
   • regress ffev1a fheight
   • predict resid, resid
   • qnorm resid
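
Beyond the normal probability plot, the other checks mentioned in item 1 could be sketched with standard Stata postestimation commands. This is only an illustrative sketch, not the lecture's prescribed procedure: the variable name rstd is hypothetical, and the cutoff of 3 for standardized residuals is a rule of thumb assumed here.

```stata
* Refit the model; the diagnostics below use the most recent regression
regress ffev1a fheight

* Residuals versus fitted values: look for curvature or a funnel shape
rvfplot, yline(0)

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest

* Standardized residuals (rstd is a hypothetical variable name)
predict rstd, rstandard

* List observations with large standardized residuals (assumed cutoff: 3)
list fheight ffev1a rstd if abs(rstd) > 3 & !missing(rstd)

* Shapiro-Wilk test of normality of the standardized residuals
swilk rstd
```

Formal tests such as estat hettest and swilk are best read together with the plots, since with 150 observations small departures from the assumptions can reach statistical significance without being practically important.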

Homework

The following research article is your reading assignment. Each
student needs to read it.
• Muhammad, I. N., Yasrul Izad, A. B., Akram, S., & Atif, A. B. (2021).
Correlation of anthropometric indices with lipid profile indices among
Malay obese and non-obese subjects in Malaysia. Nutrition and Food
Science, 51(2), 278-288.
https://doi.org/10.1108/NFS-01-2020-0008

Question No. 1

Look at Table 3, which presents the simple linear correlation between
anthropometric indices and serum lipid profile among Malay subjects.
a. Can you read the regression coefficients of BMI (kg/m²) with HDL (mmol/L)
and LDL (mmol/L)?
b. Can you estimate the coefficients of determination from those regression equations? How
would you explain the coefficients of determination to the reader?
c. For the regression equations involving sex, can you justify whether the relationships between
sex and HDL (mmol/L) and between sex and LDL (mmol/L) are statistically significant? Can
you conclude that males have a higher regression coefficient than females? How
would you explain this to a reader who is not familiar with statistical language?
d. The p-value for BMI with HDL is equal to .01. Can you write the formal hypotheses that this
p-value tests? What is your conclusion about this association?

Question No. 2

Load the data from the Framingham study (framingham.dta). These data
comprise 3 examination periods, coded in the variable period.
a. For the first follow-up, is systolic blood pressure in men determined by BMI? What about
in women? What do you conclude when comparing men and women? Read all the
statistical findings from your Stata output. What is the coefficient of determination?
b. For the third follow-up, do men have higher blood pressure than women?
c. You could use a t-test to compare mean systolic blood pressure between men and
women. Instead, please use a simple regression to compare systolic blood
pressure between men and women. Make the comparison for the third examination.
d. Create a scatterplot of systolic blood pressure against BMI for
men and women in a single graph, including the fitted lines. Is it consistent with
your previous analysis?
e. Please check the assumptions for those equations using a qnorm plot. Are the
assumptions met?
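
As a starting point for part c, the link between the t-test and a simple regression on a two-level variable can be sketched as below. The variable names sysbp and sex, and the coding 1 = men, 2 = women, are assumptions about framingham.dta that are not stated in this handout; only period is named above. Run describe first and adjust the names to match your copy of the data.

```stata
* ASSUMED variable names and coding: sysbp, sex (1 = men, 2 = women)
use "framingham.dta", clear
describe

* Two-sample t-test (equal variances) at the third examination
ttest sysbp if period == 3, by(sex)

* The same comparison as a simple regression on a sex indicator:
* the coefficient on sex is the difference in mean systolic blood pressure
regress sysbp i.sex if period == 3
```

Under these assumptions the t statistic for the sex coefficient should agree, apart from its sign, with the equal-variance t-test, because both compare the same two group means with the same pooled variance.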

Note

• During the laboratory exercise, teaching assistants will help students use
the computer program. Every student should turn in the
homework no later than 2 weeks after the laboratory exercise.
• Link to the class website:
https://elok.ugm.ac.id/course/view.php?id=13296

Output

Output of this laboratory exercise

1. Analysis of continuous data with simple regression
2. A graph that presents the association between two variables
3. Interpretation of the association between two variables
4. Critical appraisal of an article that uses simple regression

Required Reading

1. Lecture materials
2. Muhammad, I. N., Yasrul Izad, A. B., Akram, S., & Atif, A. B. (2021).
   Correlation of anthropometric indices with lipid profile indices among
   Malay obese and non-obese subjects in Malaysia. Nutrition and Food
   Science, 51(2), 278-288.
   https://doi.org/10.1108/NFS-01-2020-0008

END OF LABORATORY EXERCISE 07
