
01 Introduction Statr2-Lec4


Confounding and Adjustment

John McGready, PhD


Johns Hopkins University
Learning Objectives

► In this set of lectures, we will:


► Formally define confounding and give explicit examples of its impact
► Define adjustment and adjusted estimates conceptually
► Begin a discussion of the analytics of adjustment

Confounding: A Formal Definition and
Some Examples

The material in this video is subject to the copyright of the owners of the material and is being provided for educational purposes under
rules of fair use for registered students in this course only. No additional copies of the copyrighted work may be made or distributed.
Section Objectives

► Formally define confounding

► Establish conditions that can result in the confounding of an outcome/exposure relationship

► Demonstrate the potential effects of confounding via examples

Confounding (Lurking Variable)—1

► Consider results from the following (fictitious) study:


► This study was done to investigate the association between smoking and a certain
disease in male and female adults
► 210 smokers and 240 nonsmokers were recruited for the study

Diagnosis     Smokers   Nonsmokers   Total
Disease            52           64     116
No disease        158          176     334
Total             210          240     450

► Here, $\widehat{RR}_{S \text{ to } NS} = \dfrac{\hat{p}_S}{\hat{p}_{NS}} = \dfrac{52/210}{64/240} \approx 0.93$
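
The crude relative risk above can be reproduced directly from the 2x2 table. The following is a minimal sketch in Python; the function name `relative_risk` is illustrative and does not come from the lecture.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the risk in the exposed group to the risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Counts from the fictitious study: 52 of 210 smokers and
# 64 of 240 nonsmokers had the disease.
crude_rr = relative_risk(52, 210, 64, 240)
print(round(crude_rr, 2))  # 0.93
```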

Confounding (Lurking Variable)—2

► Additional information: The following table shows the distribution of smokers and nonsmokers by sex

Sex      Smokers   Nonsmokers   Total
Male         160           40     200
Female        50          200     250
Total        210          240     450

Confounding (Lurking Variable)—3

► Even more additional information: The following table shows the distribution of disease
status by sex

Sex      Disease   No disease   Total
Male          33          167     200
Female        83          167     250
Total        116          334     450
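
To see how strongly sex is tied to both smoking and disease in this sample, the row proportions of the two tables above can be computed. This is a small illustrative sketch; the dictionary names are assumptions made for the example.

```python
# Counts taken from the two sex-by-smoking and sex-by-disease tables (fictitious study).
smokers = {"male": 160, "female": 50}
diseased = {"male": 33, "female": 83}
totals = {"male": 200, "female": 250}

prop_smokers = {sex: smokers[sex] / totals[sex] for sex in totals}
prop_diseased = {sex: diseased[sex] / totals[sex] for sex in totals}

for sex in totals:
    print(f"{sex}: {prop_smokers[sex]:.1%} smokers, {prop_diseased[sex]:.1%} with disease")
```

In this sample, 80% of males but only 20% of females smoke, while about 17% of males and about 33% of females have the disease.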

Recap

► The original outcome of interest is DISEASE, and the original exposure of interest is
SMOKING

► In this sample, SEX is related to both the outcome and exposure


► This relationship is possibly impacting the overall relationship between DISEASE and
SMOKING

► How can we look at the relationship between DISEASE and SMOKING, removing any
possible “interference” from SEX?
► One approach—look at the DISEASE and SMOKING relationship separately for males
and females

Disease and Smoking: Males Only

► Here is a 2x2 of the disease/smoking relationship among males only

Diagnoses for males   Smokers   Nonsmokers   Total
Disease                    29            4      33
No disease                131           36     167
Total                     160           40     200

► Here, $\widehat{RR}_{M:S \text{ to } M:NS} = \dfrac{\hat{p}_{M:S}}{\hat{p}_{M:NS}} = \dfrac{29/160}{4/40} \approx 1.8$

Disease and Smoking: Females Only

► Here is a 2x2 of the disease/smoking relationship among females only

Diagnoses for females   Smokers   Nonsmokers   Total
Disease                      23           60      83
No disease                   27          140     167
Total                        50          200     250

► Here, $\widehat{RR}_{F:S \text{ to } F:NS} = \dfrac{\hat{p}_{F:S}}{\hat{p}_{F:NS}} = \dfrac{23/50}{60/200} \approx 1.5$
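
The two sex-specific relative risks can be computed in one pass over the strata. This is a minimal sketch using the counts from the male and female tables above; the data layout is an assumption made for the example.

```python
# (cases, group size) among smokers and nonsmokers, by sex (fictitious study).
strata = {
    "male":   {"smokers": (29, 160), "nonsmokers": (4, 40)},
    "female": {"smokers": (23, 50),  "nonsmokers": (60, 200)},
}

sex_specific_rr = {}
for sex, groups in strata.items():
    cases_s, n_s = groups["smokers"]
    cases_ns, n_ns = groups["nonsmokers"]
    # Risk among smokers divided by risk among nonsmokers, within one sex
    sex_specific_rr[sex] = (cases_s / n_s) / (cases_ns / n_ns)

print({sex: round(rr, 1) for sex, rr in sex_specific_rr.items()})
# {'male': 1.8, 'female': 1.5}
```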

Smoking, Disease, and Sex: A Recap

► The overall (sometimes called crude or unadjusted) relationship (RR) between smoking
and disease was nearly 1 (risk difference nearly 0)
$\widehat{RR}_{S \text{ to } NS} \approx 0.93$; $\widehat{RD}_{S \text{ to } NS} = \hat{p}_{S} - \hat{p}_{NS} \approx -0.02$

► The sex-specific results showed similar positive associations between smoking and disease
Males: $\widehat{RR}_{M:S \text{ to } M:NS} \approx 1.8$; $\widehat{RD}_{M:S \text{ to } M:NS} = \hat{p}_{M:S} - \hat{p}_{M:NS} \approx 0.08$
Females: $\widehat{RR}_{F:S \text{ to } F:NS} \approx 1.5$; $\widehat{RD}_{F:S \text{ to } F:NS} = \hat{p}_{F:S} - \hat{p}_{F:NS} = 0.16$

► (Note: for the moment, we are not considering statistical significance, just using the estimates to illustrate the point)

Smoking, Disease, and Sex: What Happened?

► Recall: males are more likely to be smokers, and females are more likely to have the disease

► In the crude RR comparing the risk of disease in smokers to nonsmokers, the smokers group over-represents persons with a lower risk of disease (males)
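
One way to make this over-representation concrete: each crude risk is a weighted average of the sex-specific risks, with weights given by the sex mix of that smoking group. Using the counts from the earlier tables (about 76% of smokers are male, versus about 17% of nonsmokers):

```latex
\begin{align*}
\hat{p}_{S}  &= \frac{160}{210}\cdot\frac{29}{160} + \frac{50}{210}\cdot\frac{23}{50}
              = \frac{52}{210} \approx 0.25 \\
\hat{p}_{NS} &= \frac{40}{240}\cdot\frac{4}{40} + \frac{200}{240}\cdot\frac{60}{200}
              = \frac{64}{240} \approx 0.27
\end{align*}
```

The crude risk for smokers is pulled toward the low male risks, and the crude risk for nonsmokers is pulled toward the higher female risks, which masks the positive association seen within each sex.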

Simpson’s Paradox and Confounding—1

► The nature of an association can change (and even reverse direction) or disappear when
data from several groups are combined to form a single group

► An association between an exposure $X$ and an outcome $Y$ can be confounded by another lurking (hidden) variable $Z$ (or variables $Z_1, Z_2, \ldots$)

Simpson’s Paradox and Confounding—2

► A confounder $Z$ (or set of confounders $Z_1, \ldots, Z_p$) distorts the true relation between $X$ and $Y$

► This can happen if $Z$ is related both to $X$ and to $Y$

Arm Circumference, Height, and Weight—1

► An observational study to estimate the association between arm circumference and height in Nepali children (we’ve used these data before, of course)
► 150 randomly selected subjects, ages 0–12 months, had arm circumference, weight,
and height measured
► This study is observational—it is not possible to randomize subjects to height groups!

► The data
► Arm circumference range: 7.3–15.6 cm
► Height range: 40.9–73.3 cm
► Weight range: 1.6–9.9 kg

Arm Circumference, Height, and Weight—2

► Scatterplot with regression line, $\hat{y} = 2.7 + 0.16x_1$

Arm Circumference, Height, and Weight—3

► Perhaps not surprisingly, weight is associated with both arm circumference (AC) and
height

Arm Circumference, Height and Weight-4

► Scatterplot: Arm circumference by height, after adjusting for weight

“Batch Effects” in Lab-based Analyses

► Lab-based results can be influenced by the technician, the laboratory used, the time of
day, the temperature in the lab, etc.

► If the goal of a study is to ascertain differences in lab measures between groups (for
example, diseased and non-diseased), and the group is associated with at least some of
the above characteristics, then there can be confounding

Summary

► In non-randomized studies, outcome/exposure relationships of interest may be confounded by other variables
► In such a situation, the relationship between the outcome and exposure differs after
taking into account the confounder(s) of note

► In order to confound an outcome/exposure relationship, a variable must be related to both the outcome and exposure

Adjusted Estimates: Presentation,
Interpretation, and Utility for Assessing
Confounding

Learning Objectives

► Understand how to interpret estimates of association that have been adjusted to control
for a confounder

► Compare/contrast the comparisons being made by unadjusted and adjusted association estimates

Adjustment

► Adjustment is a method for making comparable comparisons between groups in the presence of a confounder/confounding variables

► We will discuss the basics of the mechanics behind adjustment in the next lecture section

Fictitious Example—1

► Recall the results from the following (fictitious) study:


► This study was done to investigate the association between smoking and a certain
disease in male and female adults
► 210 smokers and 240 nonsmokers were recruited for the study

Diagnosis     Smokers   Nonsmokers   Total
Disease            52           64     116
No disease        158          176     334
Total             210          240     450

► Here, $\widehat{RR}_{S \text{ to } NS} = \dfrac{\hat{p}_S}{\hat{p}_{NS}} = \dfrac{52/210}{64/240} \approx 0.93$

Fictitious Example—2

► This relative risk is being influenced by the difference in sex distributions among smokers
and nonsmokers

► This relative risk compares all smokers to all nonsmokers in the sample without taking any
other factors into account: this is called the unadjusted or crude estimated association
between disease and smoking

Fictitious Example—3

► Adjustment provides a mechanism for estimating an outcome/exposure relationship after removing the potential distortion or negation that comes from a confounder or multiple confounders

► In the fictitious example, the relationship between disease and smoking can be adjusted for sex

Fictitious Example—4

► Frequently, the presentation of results from non-randomized studies will include a table of
unadjusted and adjusted measures of association
► Example: table of unadjusted and sex-adjusted relative risks from this fictitious example

Table 1: Relative risks of disease (and 95% CIs)

Smoking status   Unadjusted          Adjusted¹
Nonsmokers       ref                 ref
Smokers          0.93 (0.68, 1.27)   1.57 (1.12, 2.20)

¹ Adjusted for sex

Fictitious Example—5

► Unadjusted estimated relative risk, 0.93


► This compares the risk of disease for all smokers to all nonsmokers in the sample, regardless of sex or any other characteristic, and, hence, estimates the comparison of all smokers to all nonsmokers in the population sampled

► Adjusted estimated relative risk, 1.57


► This compares the risk of disease for smokers to nonsmokers of the same sex in the
sample and, hence, estimates the comparison of smokers to nonsmokers of the same
sex in the population sampled: male smokers to male nonsmokers and female smokers
to female nonsmokers
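
The lecture does not say how the intervals in Table 1 were computed. As an assumption, the sketch below uses the usual large-sample interval built on the log scale, which happens to reproduce the unadjusted entry; the function name `rr_with_ci` is illustrative.

```python
import math

def rr_with_ci(a, n1, c, n0, z=1.96):
    """Relative risk with a large-sample (log-scale) 95% CI.

    a / n1: cases and group size among the exposed;
    c / n0: cases and group size among the unexposed.
    """
    rr = (a / n1) / (c / n0)
    # Standard error of ln(RR) under the usual large-sample approximation
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Crude table: 52 of 210 smokers and 64 of 240 nonsmokers had the disease.
rr, lower, upper = rr_with_ci(52, 210, 64, 240)
print(f"{rr:.2f} ({lower:.2f}, {upper:.2f})")  # 0.93 (0.68, 1.27)
```

Because the interval contains 1, the crude comparison alone would not suggest an association, whereas the adjusted interval in Table 1 lies entirely above 1.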

Fictitious Example—6

► The unadjusted and adjusted associations can be compared both numerically and
qualitatively to assess confounding by (at least some of) the adjustors

Table 1: Relative risks of disease (and 95% CIs)

Smoking status   Unadjusted          Adjusted¹
Nonsmokers       ref                 ref
Smokers          0.93 (0.68, 1.27)   1.57 (1.12, 2.20)

¹ Adjusted for sex

Arm Circumference, Height, and Weight—1

► An observational study to estimate the association between arm circumference and height in Nepali children (we’ve used these data before, of course)
► 150 randomly selected subjects, ages 0–12 months, had arm circumference, weight,
and height measured
► This study is observational—it is not possible to randomize subjects to height groups!

► The data
► Arm circumference range: 7.3–15.6 cm
► Height range: 40.9–73.3 cm
► Weight range: 1.6–9.9 kg

Arm Circumference, Height, and Weight—2

► The unadjusted and adjusted associations can be compared both numerically and
qualitatively to assess confounding by (at least some of) the adjustors

Table 1: Regression slopes (and 95% CIs) from models with AC as outcome ($y = AC$)

Physical characteristic   Unadjusted          Adjusted¹
Height                    0.16 (0.13, 0.19)   −0.16 (−0.21, −0.11)
Weight                    0.80 (0.72, 0.88)   1.40 (1.21, 1.59)

Arm Circumference, Height, and Weight—3

► Unadjusted linear regression slope estimate for height, $\hat{\beta}_{\text{height}} = 0.16$


► This estimates the average difference in arm circumference (cm) between two groups
of children who differ by one centimeter in height
► The average change in arm circumference (cm) per one-centimeter increase in height

► Adjusted linear regression slope estimate for height, $\hat{\beta}^{*}_{\text{height}} = -0.16$


► This estimates the average difference in arm circumference (cm) between two groups
of children who differ by one centimeter in height and are the same weight
► The average change in arm circumference (cm) per one-centimeter increase in height
adjusted for weight
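
A sign change like this can be reproduced on a tiny example. The sketch below uses made-up toy values (not the Nepali data), constructed so that AC = 30 + 2 × weight − 0.5 × height exactly. The adjusted slope is obtained by residualizing both AC and height on weight, which gives the same height coefficient as a regression of AC on height and weight together. The helper names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Simple linear regression slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after a simple linear regression on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up toy values (NOT the lecture data):
# height (cm), weight (kg), arm circumference (cm)
height = [50, 52, 54, 56, 58, 60]
weight = [3, 4, 4, 5, 6, 6]
ac = [11, 12, 11, 12, 13, 12]

unadjusted = slope(height, ac)
# Adjusted slope: relate the part of AC not explained by weight
# to the part of height not explained by weight.
adjusted = slope(residuals(weight, height), residuals(weight, ac))
print(round(unadjusted, 2), round(adjusted, 2))
```

In this toy example the unadjusted height slope is positive (about 0.13) while the weight-adjusted slope is −0.5: taller children tend to be heavier, and heavier children have larger arms, but among children of the same weight the taller ones have smaller arm circumference.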

Arm Circumference, Height, and Weight—4

► The unadjusted and adjusted associations can be compared both numerically and
qualitatively to assess confounding by (at least some of) the adjustors

Table 1: Regression slopes (and 95% CIs) from models with AC as outcome ($y = AC$)

Physical characteristic   Unadjusted          Adjusted¹
Height                    0.16 (0.13, 0.19)   −0.16 (−0.21, −0.11)
Weight                    0.80 (0.72, 0.88)   1.40 (1.21, 1.59)

Summary

► Adjustment is a method for making comparable comparisons between groups in the presence of a confounder/confounding variables

► The group comparisons made by adjusted associations are more specific than those made
by unadjusted (crude) associations

► Contrasting crude and adjusted association estimates is useful for identifying confounding

Adjusted Estimates: The General Idea
Behind the Computations

Learning Objectives

► Gain some conceptual insight into how adjusted estimates are computed

Confounding (Lurking Variable)

► Consider results from the following (fictitious) study:


► This study was done to investigate the association between smoking and a certain
disease in male and female adults
► 210 smokers and 240 nonsmokers were recruited for the study

Diagnosis     Smokers   Nonsmokers   Total
Disease            52           64     116
No disease        158          176     334
Total             210          240     450

► Here, $\widehat{RR}_{S \text{ to } NS} = \dfrac{\hat{p}_S}{\hat{p}_{NS}} = \dfrac{52/210}{64/240} \approx 0.93$

Smoking, Disease, and Sex: A Recap

► The overall (sometimes called crude, unadjusted) relationship (RR) between smoking and
disease was nearly 1 (risk difference nearly 0)

$\widehat{RR}_{S \text{ to } NS} \approx 0.93$; $\widehat{RD}_{S \text{ to } NS} = \hat{p}_{S} - \hat{p}_{NS} \approx -0.02$

► The sex-specific results showed similar positive associations between smoking and disease

Males: $\widehat{RR}_{M:S \text{ to } M:NS} \approx 1.8$; $\widehat{RD}_{M:S \text{ to } M:NS} = \hat{p}_{M:S} - \hat{p}_{M:NS} \approx 0.08$
Females: $\widehat{RR}_{F:S \text{ to } F:NS} \approx 1.5$; $\widehat{RD}_{F:S \text{ to } F:NS} = \hat{p}_{F:S} - \hat{p}_{F:NS} = 0.16$

Computing an Adjusted Estimate, Conceptually—1

► Stratify when the confounder 𝑍𝑍 is categorical


► Compute the association between the outcome and the exposure separately for each
level (stratum) of 𝑍𝑍
► In this fictitious example, separate sex-specific estimates of the disease/smoking
relationship for males and females
► Take weighted average of stratum-specific estimates

Computing an Adjusted Estimate, Conceptually—2

► For example, to get a sex-adjusted relative risk for the smoking/disease relationship, we could weight the sex-specific relative risks by the numbers of males and females, i.e.:

$\widehat{RR}_{S \text{ to } NS,\ \text{sex adj}} = \dfrac{n_{\text{males}} \times \widehat{RR}_{M:S \text{ to } M:NS} + n_{\text{females}} \times \widehat{RR}_{F:S \text{ to } F:NS}}{n_{\text{males}} + n_{\text{females}}}$

► So, for the given results:

$\widehat{RR}_{S \text{ to } NS,\ \text{sex adj}} = \dfrac{200 \times 1.8 + 250 \times 1.5}{200 + 250} \approx 1.6$
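
The same sample-size-weighted average can be written as a short function. This is a sketch of the simple illustration above, not a recommended estimator; the function name `weighted_average` is illustrative. It also shows the effect of using unrounded sex-specific relative risks.

```python
def weighted_average(estimates, weights):
    """Weighted average of stratum-specific estimates."""
    return sum(w * e for e, w in zip(estimates, weights)) / sum(weights)

n_by_sex = [200, 250]  # males, females

# Using the rounded sex-specific RRs shown on the slide
rr_adj_rounded = weighted_average([1.8, 1.5], n_by_sex)

# Using the unrounded sex-specific RRs from the 2x2 tables
rr_male = (29 / 160) / (4 / 40)
rr_female = (23 / 50) / (60 / 200)
rr_adj_unrounded = weighted_average([rr_male, rr_female], n_by_sex)

print(round(rr_adj_rounded, 2), round(rr_adj_unrounded, 2))  # 1.63 1.66
```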

Computing an Adjusted Estimate, Conceptually—3

► There are better ways than this to take such a weighted average (for example, doing the computation on the natural log (ln) scale and weighting each stratum by the precision of its estimate, based on its standard error), but this just illustrates the concept

► Confidence intervals can be computed for these adjusted measures of association

► Multiple regression (in this case, logistic) will be a very useful tool for performing
adjustment
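
As a sketch of the log-scale approach mentioned above, the block below pools the sex-specific ln(RR) values with inverse-variance weights. The specific weighting (1 / variance of ln(RR), using the usual large-sample variance) is an assumption chosen for illustration; the lecture does not specify the method behind its adjusted estimate.

```python
import math

def log_rr_and_variance(a, n1, c, n0):
    """ln(RR) and its large-sample variance for one stratum."""
    log_rr = math.log((a / n1) / (c / n0))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n0
    return log_rr, variance

# (cases among smokers, smokers, cases among nonsmokers, nonsmokers) per sex
strata = {"male": (29, 160, 4, 40), "female": (23, 50, 60, 200)}

weighted_sum = 0.0
total_weight = 0.0
for counts in strata.values():
    log_rr, variance = log_rr_and_variance(*counts)
    weight = 1 / variance  # more precise strata get more weight
    weighted_sum += weight * log_rr
    total_weight += weight

pooled_log_rr = weighted_sum / total_weight
se_pooled = math.sqrt(1 / total_weight)
rr_adjusted = math.exp(pooled_log_rr)
ci = (math.exp(pooled_log_rr - 1.96 * se_pooled),
      math.exp(pooled_log_rr + 1.96 * se_pooled))
print(f"{rr_adjusted:.2f} ({ci[0]:.2f}, {ci[1]:.2f})")
```

Under these assumptions the pooled estimate is about 1.56 with an interval of roughly 1.1 to 2.2, close to but not identical to the 1.57 (1.12, 2.20) reported earlier, which may have been produced by a different method such as regression.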

Arm Circumference, Height, and Weight—1

► (Unadjusted) scatterplot with regression line, $\hat{y} = 2.7 + 0.16x_1$

Arm Circumference, Height, and Weight—2

► Scatterplot: Arm circumference by height, after adjusting for weight

Arm Circumference, Height, and Weight—3

► How to adjust for a continuous measure (in this case, weight)?

► The algorithm (multiple regression) conceptually breaks the data into individual weight groups
► In each specific weight stratum, a simple linear regression is fit to the AC/height data for the stratum
► The overall weight-adjusted association between AC and height is a weighted average of the AC/height slopes for each of the individual weight strata
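
The stratify-then-average idea can be sketched on made-up toy records (not the Nepali data), built so that AC = 25 + 3 × weight − 0.5 × height exactly. Weighting the stratum slopes by stratum size is an assumption made for simplicity; an actual regression fit weights strata differently.

```python
from collections import defaultdict

def slope(x, y):
    """Simple linear regression slope of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Made-up toy records (NOT the lecture data):
# (weight kg, height cm, arm circumference cm)
records = [
    (3, 50, 9), (3, 52, 8), (3, 54, 7),
    (4, 52, 11), (4, 54, 10), (4, 56, 9),
    (5, 54, 13), (5, 56, 12), (5, 58, 11),
]

# Unadjusted: one regression of AC on height, ignoring weight
unadjusted = slope([h for _, h, _ in records], [ac for _, _, ac in records])

# Adjusted: fit AC on height within each weight stratum, then average the slopes
by_weight = defaultdict(list)
for w, h, ac in records:
    by_weight[w].append((h, ac))

weighted_sum = 0.0
total_n = 0
for pairs in by_weight.values():
    stratum_slope = slope([h for h, _ in pairs], [ac for _, ac in pairs])
    weighted_sum += len(pairs) * stratum_slope  # weight by stratum size
    total_n += len(pairs)
adjusted = weighted_sum / total_n

print(unadjusted, adjusted)  # 0.25 -0.5
```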

Summary

► The adjusted association between $Y$ and $X$, adjusted for a single potential confounder $Z$, can be estimated by:
► Stratifying on $Z$ (hard to operationalize if $Z$ is continuous)
► Estimating the $Y$/$X$ relationship for each stratum of $Z$
► Taking a weighted average of all $Z$ stratum-specific $Y$/$X$ associations

► The idea can be generalized to estimating the adjusted association between $Y$ and $X$, adjusted for multiple potential confounders $Z_1, Z_2, \ldots, Z_c$

► Multiple regression methods will make the adjustment process easy and straightforward

Additional Examples

Physician Salaries and Sex of the Physician—1

► Article abstract

Source: Jagsi, R., et al. (2012). Gender differences in the salaries of physician researchers. JAMA, 307(22), 2410–2417.
Physician Salaries and Sex of the Physician—2

► Unadjusted linear regression slope estimate for sex (1=M, 0=F)

$\hat{\beta}_{\text{sex}} = \$32{,}764$

► Adjusted linear regression slope estimate for sex (1=M, 0=F)

$\hat{\beta}^{*}_{\text{sex}} = \$13{,}399$

► (*after adjustment for specialty, academic rank, leadership positions, publications, and
research time)
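
A quick way to compare the two slopes numerically is to ask how much of the unadjusted difference remains after adjustment. The arithmetic below is a simple illustration of that comparison; the variable names are illustrative.

```python
unadjusted_gap = 32764  # unadjusted mean salary difference, M minus F (dollars)
adjusted_gap = 13399    # difference after adjustment for the listed characteristics

# Portion of the crude difference accounted for by the adjustment variables
accounted_for = unadjusted_gap - adjusted_gap
share_accounted_for = accounted_for / unadjusted_gap

print(accounted_for, f"{share_accounted_for:.0%}")  # 19365 59%
```

Roughly 59% of the crude difference is accounted for by the adjustment variables, while a difference of $13,399 remains among physicians who are comparable on those characteristics.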

Example: Clinical Trial, PBC: Incidence Rate Ratio—1

► Crude (unadjusted) incidence rate ratio:

$\widehat{IRR}_{\text{DPCA to placebo}} = \dfrac{\widehat{IR}_{\text{DPCA}}}{\widehat{IR}_{\text{placebo}}} = 1.06$, with 95% CI (0.75, 1.50)

► Interpretations
► The risk of death in the DPCA group (in the study follow-up period) is 1.06 times the
risk in the placebo group
► Subjects in the DPCA group had 6% higher risk of death in the follow-up period when
compared to the subjects in the placebo group
► This comparison is not statistically significant

Source: Dickson, E., et al. (1985). Trial of penicillamine in advanced primary biliary cirrhosis. N Engl J Med, 312(16), 1011–1015.
Example: Clinical Trial, PBC: Incidence Rate Ratio—2

► Recall, patients (𝑛𝑛 = 312 total) were randomized to the DPCA or placebo group

► In a moment, the adjusted IRR, adjusted for sex and baseline bilirubin, will be presented
► How do you expect this to compare in value to the unadjusted estimate from the
previous slide? Why?

Source: Dickson, E., et al. (1985). Trial of penicillamine in advanced primary biliary cirrhosis. N Engl J Med, 312(16), 1011–1015.
Example: Clinical Trial, PBC: Incidence Rate Ratio—3

► The adjusted IRR is $\widehat{IRR}^{*}_{\text{DPCA to placebo}} = 1.01$, with 95% CI (0.70, 1.43)

► Interpretations
► The risk of death in the DPCA group (in the study follow-up period) is 1.01 times the
risk in the placebo group after adjusting for sex and baseline bilirubin
► Subjects in the DPCA group had 1% higher risk of death in the follow-up period when
compared to the subjects in the placebo group, among subjects of the same sex with
the same baseline bilirubin levels

Example: Clinical Trial, PBC: Incidence Rate Ratio—4

► Why are unadjusted and adjusted IRRs so similar?
