Epidemiology MCQ
Instructions:
o Write the last 4 digits of your ID number in the space provided on each page (top right).
o Write clearly and legibly; avoid writing on the back of these pages.
o Show all your work and include units where appropriate.
o Write all answers and computations on these pages.
1. Which of the following best describes the retrospective design where subjects are sampled
by disease status and is often used when the investigator is interested in rare diseases? (4
pts)
A. intervention trial
B. case control study
C. retrospective cohort
D. ecologic study
E. none of the above
2. Which of the following best describes the study design that can be either retrospective or
prospective and is often used when the investigators are interested in rare exposures? (4
pts)
A. intervention trials
B. cohort studies
C. prevalence studies
D. case control study
E. none of the above
3. The strength of an association is one of the criteria for evaluating the cause and effect
relationship between an exposure and outcome. Which of the following is a measure of the
strength of association? (Choose one best answer). (4 pts)
4. Incidence rates of a disease are often referred to as direct measures of risk. Can incidence
rates be calculated from case-control studies? Briefly explain in 1-2 sentences why they can
or cannot be calculated. (4 pts)
5. For each of the following epidemiological measures, indicate whether it is a rate, a
proportion or that it is neither a rate nor a proportion. Circle the best answer. (1 pt each)
____ ____ a. A "J" or "U" shaped relationship of a continuous risk factor and continuous measure of
disease suggests a Pearson product-moment correlation coefficient of near plus one or
minus one.
____ ____ b. A risk ratio measure and a correlation coefficient are both measures of association.
____ ____ c. A population attributable risk proportion depends on the prevalence of exposure and is
not directly related to the strength of an association.
____ ____ d. The study base for a case-control study consists of those people who if they developed
the disease could have been counted as cases.
____ ____ e. The Bradford Hill criterion "coherence" means that the association has been observed
repeatedly in different places, by different observers, and at different times.
____ ____ f. If an exposure is a cause of a disease, then "temporality" is the Bradford Hill criterion
for causal inference that must hold true between exposure and disease.
7. The death rates from various conditions are often compared across geographic areas. These
comparisons are usually based on directly age-standardized mortality rates. Which of the
following best describes what is meant by an age-standardized rate created by the direct
method? (Choose one best answer). (4 pts)
A. The number of events in each age stratum of a standard population is used to create
a weighted average rate.
B. The event rates in each age stratum in the standard population are used to create a
weighted average rate.
C. The event rates in the geographic area of interest are applied to the age-stratum
sizes of a standard population to create a rate that is a weighted average.
D. The event rates in the geographic area of interest are compared to the event rates of
a standard population to create a summary rate that is a weighted average.
8. In order to estimate counts and rates of work-related fatalities, the National Traumatic
Occupational Fatality system has introduced a tick-box on the death certificate to indicate
"injury at work." Kraus et al. (Am J Epidemiol 1995; 141: 973-9) attempted to validate this
"injury at work" classification system against a gold standard [International Classification of
Diseases (ICD) death certificate codes designating deaths that occurred during work-related
activities]. After reviewing a sample of 100,000 death certificates, the authors reported the
following: 1,195 true positives; 788 false positives; 97,672 true negatives; 345 false
negatives. ("Positive" indicates that the tick-box was checked; "negative" indicates that it
was not checked; "true" indicates agreement between the tick-box and the ICD code).
a. Using the counts provided above, complete the 2x2 table below: (2 pts)
                                  ICD Classification
Death Certificate      Work-related    Not work-related    TOTAL
Work-related
Not work-related
TOTAL
A. Active surveillance
B. Passive surveillance
C. Retrospective cohort surveillance
D. Cross-sectional survey surveillance
f. The sensitivity and specificity computed above are quantitative measures of which
of the following aspects of death certificate classification of work-related fatalities?
(Choose one best answer). (4 pts)
a. Which of the following best describes the research design used in this study?
(choose one best answer) (3 pts)
b. Create a 2 x 2 table where one axis is smoking status and the other is age-related
maculopathy status. (4 pts)
c. Calculate the 5-year cumulative incidence of age-related maculopathy in ever
smokers, and in never smokers. Show your work. (4 pts)
10. The following data come from a national survey of the occurrence of back pain. A case of
low back pain was defined as having at least one episode of severe back pain occurring over
a period of 6 months. The number of cases was obtained from surveys of different
occupation groups as well as a national random sample.
Cell phone manufacturing Textile manufacturing National random sample
Age Persons cases Rate Persons Cases Rate Persons Cases rate
11. The evidence supporting obesity as a risk factor for colon cancer remains inconclusive,
especially among women. A recent study (Am J Epidemiol 1999; 150:390-398) reported the
association between obesity (measured at baseline) and colon cancer morbidity as
determined from review of medical records and death certificates in a nationally
representative cohort of men and women age 25-74 years who participated in the First
National Health and Nutrition Examination Survey from 1971 to 1975 and were
subsequently followed up through 1992. The following table is from this study for men and
women combined.
<22 28 53,475
22 - <24 41 38,919
24 - <26 36 36,610
26 - <28 40 32,635
28 - <30 35 21,122
30+ 42 34,904
A. Cross-sectional survey
B. Ecological study
C. Population based case control study
D. Cohort study
E. None of the above
b. Complete the table by calculating the crude body mass index-specific incidence rates. (3
pts)
c. Calculate the relative risk (RR) of colon cancer associated with a BMI of 28-<30. Use the
lowest BMI category as referent. In one sentence interpret your answer. (2 pts)
d. Calculate the attributable risk proportion for those in the 28-<30 BMI category. In one
sentence interpret your answer. (The attributable risk formulas provided in class can be
used even though the data provided are rates.) (2 pts)
12. Analyses of data from cohort studies often have to deal with the reality that participants
have unequal lengths of follow up. Given the data below, calculate (a) the total person-time
(months) of follow up, (b) the overall incidence density rate, (c) the 13-month cumulative
incidence, and (d) the product-limit estimate of failure. Each horizontal line represents a
cohort participant. Each vertical line represents one month. Arrows indicate time of loss to
follow up. Black boxes indicate onset of disease (failure). (2 pts each)
a. ______________
b. ______________
c. ______________
d. ______________
University of North Carolina at Chapel Hill
School of Public Health
Department of Epidemiology
Fundamentals of Epidemiology (EPID 168)
Answer Guide
1. B. Case-control studies are said to use sampling by disease and are suited for studying rare
diseases.
2. B. Cohort studies can be either retrospective or prospective and are often used to study rare
exposures.
3. The ratio of odds of exposure among cases to odds of exposure among noncases is the odds
ratio, which is a measure of association.
4. Incidence rates cannot be estimated from case-control studies without additional
information. In the case-control design selection of subjects is based on disease status, so
the number of cases is under the control of the investigator. If the investigator has access to
all cases and knows the size of the population from which they arise s/he can estimate
incidence, but knowledge of the population size is not available from the case-control
design.
5.
7. C. "The event rates in the geographic area of interest are applied to the age-stratum sizes of
a standard population to create a rate that is a weighted average" describes a directly-
standardized rate.
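As an illustration of answer 7, direct standardization can be sketched in a few lines of Python. The strata, rates, and population sizes below are hypothetical and chosen only to show the weighting; none of them come from the exam.

```python
# Hypothetical age strata; none of these numbers come from the exam.
area_rates = {"<40": 0.001, "40-64": 0.005, "65+": 0.020}        # event rates in the area of interest
standard_pop = {"<40": 50_000, "40-64": 30_000, "65+": 20_000}   # stratum sizes of the standard population


def directly_standardized_rate(rates, standard):
    """Weighted average of the area's stratum rates, weighted by standard-population sizes."""
    expected_events = sum(rates[s] * standard[s] for s in standard)
    return expected_events / sum(standard.values())


dsr = directly_standardized_rate(area_rates, standard_pop)
print(f"Directly standardized rate: {dsr * 100_000:.0f} per 100,000")
```

The area supplies the rates and the standard population supplies only the weights, which is what distinguishes option C from options A, B, and D.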
8. a.
ICD Classification
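The 2x2 table for question 8 follows directly from the four counts given in the question. The sketch below (variable names are illustrative) lays out the table with its margins and computes sensitivity and specificity of the tick-box against the ICD codes.

```python
# Counts as stated in question 8.
tp, fp, tn, fn = 1195, 788, 97_672, 345

rows = [
    ("Work-related", tp, fp),        # tick-box checked
    ("Not work-related", fn, tn),    # tick-box not checked
    ("TOTAL", tp + fn, fp + tn),
]
print(f"{'Death certificate':<18}{'ICD work':>10}{'ICD not work':>14}{'TOTAL':>9}")
for label, icd_work, icd_not in rows:
    print(f"{label:<18}{icd_work:>10,}{icd_not:>14,}{icd_work + icd_not:>9,}")

sensitivity = tp / (tp + fn)   # tick-box positive among ICD work-related deaths
specificity = tn / (tn + fp)   # tick-box negative among ICD non-work-related deaths
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```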
9. D. Prospective cohort, since the investigators monitored people without the condition over
time to detect its development.
Cigarette smoking status
10.
a. Standardized event ratio (for cell phones) = SMR (cell phone) = observed/expected
= 42/{(.003)(1000) + (.06)(700) + (.08)(50)} = 42/49 = 0.86
b. Standardized event ratio (textiles) = SMR (textile) = observed/expected
= 182/{(.003)(100) + (.06)(500) + (.08)(1500)} = 182/150 = 1.2
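The two SMR calculations above can be reproduced with a short Python sketch. It assumes the three stratum rates in the formulas are the reference rates (presumably from the national random sample) and that the bracketed counts are the persons per age stratum in each occupation group; the function name is illustrative.

```python
# Stratum-specific reference rates as they appear in the formulas above.
standard_rates = [0.003, 0.06, 0.08]

groups = {
    "cell phone": {"persons": [1000, 700, 50], "observed": 42},
    "textile": {"persons": [100, 500, 1500], "observed": 182},
}


def smr(observed, persons, rates):
    """Return (observed/expected ratio, expected count) under indirect standardization."""
    expected = sum(n * r for n, r in zip(persons, rates))
    return observed / expected, expected


for name, g in groups.items():
    ratio, expected = smr(g["observed"], g["persons"], standard_rates)
    print(f"{name}: expected = {expected:.1f}, SMR = {ratio:.2f}")
```

Each group's expected count is built from its own age structure, so the two ratios rest on different weights.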
c. These two ratios cannot be compared directly. An SMR is a weighted average where
the weights (e.g., age structure) come from the population for which indirect
standardization is being carried out. So SMRs for two populations use different
weights. Unless the populations have identical age structures, the stratum-specific
rates are the same for all strata, or the stratum-specific rates for one population are a
constant multiple of those for the second population, the comparison is invalid. With
indirect standardization, it is actually the "standard population" rates that are being
"standardized" to the age distribution of the study population.
11.
Baseline body    Number of incident       Person-years    Incidence
mass index*      cases of colon cancer    of follow up    rate/100,000 PY
<22              28                       53,475          52.4
22 - <24         41                       38,919          105.3
24 - <26         36                       36,610          98.3
26 - <28         40                       32,635          122.6
28 - <30         35                       21,122          165.7
30+              42                       34,904          120.3
a. D. Cohort study
c. RR of colon cancer for BMI 28-<30 kg/m2 vs. lowest = 165.7/52.4 = 3.16
d. ARP for BMI 28-<30 kg/m2 vs. lowest = (3.16 - 1) / 3.16 = 68%
The ARP of 68% means that 68% of the incidence in the 28-<30 kg/m2 group is
attributable to elevated BMI.
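The rates, relative risk, and attributable risk proportion for question 11 can be checked with a minimal Python sketch. The case counts and person-years are those in the table; the dictionary layout and variable names are illustrative.

```python
# (incident cases, person-years of follow up) by baseline BMI category, from the table.
cohort = {
    "<22": (28, 53_475),
    "22-<24": (41, 38_919),
    "24-<26": (36, 36_610),
    "26-<28": (40, 32_635),
    "28-<30": (35, 21_122),
    "30+": (42, 34_904),
}

# Crude incidence rate per 100,000 person-years in each category.
rates = {bmi: cases / py * 100_000 for bmi, (cases, py) in cohort.items()}
for bmi, rate in rates.items():
    print(f"{bmi:>7}: {rate:6.1f} per 100,000 PY")

rr = rates["28-<30"] / rates["<22"]   # rate ratio, lowest BMI category as referent
arp = (rr - 1) / rr                   # attributable risk proportion among the exposed
print(f"RR = {rr:.2f}, ARP = {arp:.0%}")
```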
12.
a. 43 person-months
b. 3 cases/43 person-months = 7.0 cases per 100 person-months
c. 13-month CI = 3/7 = 0.43
d. Product-limit estimate of failure = 1 - [(6-1)/6 x (5-1)/5 x (3-1)/3] = 1 - 0.444 =
0.556
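The follow-up diagram for question 12 is not reproduced here, so the sketch below takes the totals and risk sets from the answers above as given (3 cases, 43 person-months, 7 participants, and risk sets of 6, 5, and 3 at the three failure times). It shows how the product-limit estimate is accumulated; names are illustrative.

```python
cases = 3
person_months = 43
cohort_size = 7

incidence_density = cases / person_months      # cases per person-month
cumulative_incidence = cases / cohort_size     # simple 13-month cumulative incidence

# (number at risk, number failing) at each failure time, as used in answer 12d.
risk_sets = [(6, 1), (5, 1), (3, 1)]

survival = 1.0
for at_risk, failed in risk_sets:
    survival *= (at_risk - failed) / at_risk   # conditional probability of surviving this failure time

failure = 1 - survival
print(f"incidence density = {incidence_density * 100:.1f} per 100 person-months")
print(f"cumulative incidence = {cumulative_incidence:.2f}")
print(f"product-limit failure = {failure:.3f}")
```

The simple cumulative incidence treats participants lost to follow up as if they were observed for the full period, whereas the product-limit estimate removes them from the risk set once they are lost, which is why the two figures differ.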
The questions on this examination are largely based on Cantor KP, Lynch CF, Hildesheim ME,
Dosemeci M, Lubin J, Alavanja M, Craun G. Drinking water source and chlorination byproducts in
Iowa. III. Risk of brain cancer. Am J Epidemiol 1999;150:552-60. You may refer to an unannotated
copy of this article during the examination.
1. Briefly discuss two reasons why a case-control study is (or is not) well suited to examine risk
factors for brain cancer. (3 pts)
2. The authors describe the study design they used as a "population-based case-control study".
Briefly explain how this is different than a non-population based case-control study. Include in
your answer issues regarding the selection of cases, selection of controls, and validity. (3 pts)
3. Cases were identified by the State Health Registry of Iowa. Which of the following categories of
study design best describes this method of case finding? Choose one best answer. (3 pts)
A. Prospective follow-up
B. Passive surveillance
C. Cross-sectional survey
D. Community-based screening
E. Hospital-based surveillance
4. The authors state that cases had to be newly diagnosed with histologically confirmed glioma
without previous diagnosis of a malignant neoplasm. Which of the following best describes an
advantage of using incident cases instead of prevalent cases? Choose one best answer. (3 pts)
A. Using incident cases allows the investigators to directly compute relative risks.
B. Using incident cases reduces the non-systematic error of case-control studies.
C. Estimates of exposure from incident cases may be less influenced by disease status.
D. Using incident cases allows for the investigation of effects on risk versus those affecting
duration.
E. Incident cases are less likely to be lost to follow up than prevalent cases.
5. Even if the investigators are careful in the selection of cases and controls, selection bias can make
interpretation of results difficult. Which of the following is NOT a situation that can produce selection
bias? Choose one best answer. (3 pts)
A. The exposure has some influence on the process by which controls are selected.
B. The exposure has some influence on the process of case ascertainment.
C. The disease status has some influence on the recall of exposures.
D. The exposed cases are reported to registries more than unexposed.
E. All of the above will produce selection bias.
6. In this study, exposure information for many of the brain cancer cases was provided by proxy
respondents. The authors did not have information from independent sources that could be
used to directly verify information provided by these surrogates. However, suppose a follow-up
questionnaire was administered to cases, and for 85 of the cases, the investigators were able to
obtain information about whether or not they used a private well directly from the cases (self
report). Assuming that self report is the best available assessment of whether they used a
private well or not, complete the table below so that it reflects a sensitivity, specificity, and
positive predictive value of a proxy response of 77%, 75%, and 57%, respectively. Assume that
26 of the cases reported that they used private wells. Show your calculation. (6 pts)
YES
NO
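One way to fill the table in question 6 is to work outward from the self-report margins. The sketch below assumes self report is the reference standard and rounds each cell to a whole person; the variable names are illustrative and the layout of the original table is not reproduced.

```python
n_cases = 85
self_yes = 26                      # cases who self-reported private well use
self_no = n_cases - self_yes       # cases who self-reported no private well use

sensitivity, specificity = 0.77, 0.75

tp = round(sensitivity * self_yes)   # proxy YES, self report YES
fn = self_yes - tp                   # proxy NO,  self report YES
tn = round(specificity * self_no)    # proxy NO,  self report NO
fp = self_no - tn                    # proxy YES, self report NO

ppv = tp / (tp + fp)                 # share of proxy YES responses confirmed by self report
print(f"proxy YES: {tp} / {fp}   proxy NO: {fn} / {tn}   PPV = {ppv:.0%}")
```

The positive predictive value is not an independent input here: once the margins, sensitivity, and specificity are fixed, it serves as a check on the completed cells.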
7. Cases in this study were histologically confirmed. This is an example of which of the following
disease classification criteria? Choose one best answer. (3 pts)
A. Causal criteria
B. Ecologic criteria
C. Manifestational criteria
D. Etiologic criteria
E. None of the above
8. Consider the data presented in Table 1 of this article. Which of the following best represents the
proportion of the risk of brain cancer in the population that is attributable to working on a farm
(farm occupation)? Assume that a farm occupation is causally related to brain cancer risk.
Choose one best answer. (4 pts)
A. 33%
B. 57%
C. 10%
D. 29%
E. Cannot be calculated from case-control studies
9. A case-control study like the one described in this paper is most useful when it helps us
understand what is happening in the study base (underlying population). Which of the
following best describes the study base in this article? Choose one best answer. (3 pts)
A. The study base is those who if they developed brain cancer could have been selected as a
case.
B. The study base is those who have an equal probability to be selected as a case or control.
C. The study base is those who are identified as cases or controls after excluding non-
responders.
D. The study base is those who if exposed would have been identified as exposed.
E. None of the above.
10. In Table 3 the odds ratios for incident brain cancer by duration of chlorinated surface water
exposure are given. The odds ratio (95% confidence interval) in men estimating the risk of
brain cancer with 1-19 years of exposure is 1.3 (0.8, 2.1) and 2.5 (1.2, 5.0) for 40 years or more
of exposure. Which of the following best describes the role of chance in observing these two
estimates? Choose one best answer. (3 pts).
A. The odds ratio for 40 years exposure is more likely due to chance because it is based on
fewer cases and controls.
B. The odds ratio for 1-19 years of exposure is more likely due to chance because the point
estimate is closer to the null value (1.0).
C. The odds ratio for 40 years exposure is more likely due to chance because the
confidence interval is so wide.
D. The odds ratio for 1-19 years of exposure is less likely due to chance because the
confidence interval is narrower.
E. The odds ratio for 40 years exposure is less likely due to chance because the confidence
interval does not include 1.0.
11. Table 3 presents odds ratios for the association of incident brain cancer with various levels of
lifetime average THM exposure. The odds ratio (95% confidence interval) for lifetime average
THM concentration of 0.8-2.2 µg/liter for men was 0.9 (0.6, 1.6). The odds ratio (95%
confidence interval) for lifetime average THM concentration of 32.6 µg/liter for women was
0.9 (0.4, 1.8). Which of the following best describes the precision of these two estimates of
risk? Choose one best answer. (3 pts)
A. The estimate is equal because the point estimates are the same.
B. The estimate is equal because neither confidence interval excludes 1.0.
C. The estimate in men is slightly more precise because the confidence interval is narrower.
D. The estimate in women is slightly more precise because the exposure level is much higher.
E. The precision of the estimates cannot be compared because they are from different
exposure groups.
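Precision of a ratio estimate is commonly judged by the width of its confidence interval on the ratio (log) scale. A small sketch using the two intervals quoted in question 11 follows; the helper name is illustrative, and the confidence limit ratio is one convenient summary rather than the only one.

```python
import math


def confidence_limit_ratio(lower, upper):
    """Upper limit divided by lower limit; a smaller value means a more precise ratio estimate."""
    return upper / lower


men = confidence_limit_ratio(0.6, 1.6)     # OR 0.9 (0.6, 1.6)
women = confidence_limit_ratio(0.4, 1.8)   # OR 0.9 (0.4, 1.8)

print(f"men:   limit ratio = {men:.2f}, log-scale width = {math.log(men):.2f}")
print(f"women: limit ratio = {women:.2f}, log-scale width = {math.log(women):.2f}")
```

Identical point estimates say nothing about precision; only the interval widths do.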
12. Using the data in Table 4, which of the following best describes the crude (unadjusted) odds
ratio estimating the risk of brain cancer associated with 40 years' exposure to chlorinated surface
water in men with above median tap water intake? Use the category of 0 years exposure to
chlorinated surface water as the reference group. Choose one best answer. (4 pts)
A. 4.0
B. 1.5
C. 3.6
D. 2.6
E. Cannot be computed from data in Table 4.
13. Table 1 shows the adjusted odds ratio estimating the risk of brain cancer by population size.
Using the 25,000 population size as a reference, calculate the crude (unadjusted) odds ratio
associated with the > 50,100 population. In 2 sentences or less explain why the two estimates
agree or disagree. (4 pts)
14. The authors state that they "found a dose-response relationship among men between brain
cancer and duration of consuming drinking water from chlorinated surface water…". Using 3
Bradford Hill criteria, in 3-4 sentences, address causality (or the lack of causality) of the
relationship of drinking water to brain cancer. (4 pts)
15. An early study of drinking water and brain cancer was an ecological study conducted by the
lead author of the present article. In this study, brain cancer mortality rates in 923 U.S. counties
were compared with average levels of THM measured in the drinking water supplies of those
counties. For counties in which the sampled water supply served at least 85% of the residents
of that county, the correlation coefficient between county-specific mortality rates from brain
cancer and trihalomethane levels was 0.24 in White men and 0.19 in White women. After
reviewing this paper, your colleague concluded that THM in drinking water are causally related
to brain cancer. However, you are more cautious in your interpretation, citing the "ecological
fallacy." Please define the ecologic fallacy (2 pts) and describe why it limits the causal
inferences that can be made from the ecological study described above (2 pts).
16. The authors used information provided by cases and controls on place of residence, primary
source of drinking water, and tap water and total fluid consumption to create an index of
cumulative lifetime exposure. However, the natural history of cancer (initiation, promotion,
conversion, and progression) may encompass many years. If drinking water is involved at the
earliest stages of brain cancer (initiation), then drinking water exposures in the recent past may
be more important than present exposures or those in the distant past (e.g., in childhood). As
defined in class, which of the following periods would be important in defining the minimal and
maximal length of time expected between drinking water exposure and diagnosis with
histologically confirmed glioma? Choose one best answer. (3 pts)
A. Induction period
B. One year case fatality
C. Latent period
D. Both A and C
E. None of the above
17. The authors included all cases of histologically confirmed malignant brain cancers, including
glioblastoma, fibrillary and gemistocytic astrocytoma, and mixed glioma. If authors suspected
that drinking water exposure was associated with only certain subtypes of brain cancer (i.e.,
disease heterogeneity), which of the following strategies could they employ at the analysis
stage? (3 pts)
A. Adjustment for cancer type using mathematical modeling (e.g., logistic regression)
B. Stratification of cases by brain cancer type
C. Direct standardization by brain cancer type
D. Indirect standardization by brain cancer type
E. Matching cases and controls by brain cancer type
18. The authors restricted their analysis to those cases and controls with at least 70 percent of
their lifetime years with a known source of drinking water. This approach was used to reduce
which type of bias? Choose one best answer (3 pts)
A. Confounding bias
B. Selection bias
C. Information bias
D. Random error
E. None of the above
19. (question was not asked)
20.
a. Using the data in Table 3, label and complete a 2x2 table for the association between brain
cancer and >=40 years’ residence with a chlorinated surface water source (versus 0
years), collapsing over sex (i.e., combine the data for men and women). (4 pts)
b. Calculate the odds ratio for your 2x2 table in part a. Show your work. (3 pts)
c. Suppose that the sex-adjusted OR for the relationship between brain cancer and >=40
years’ residence with a chlorinated surface water source is 1.1. Is sex a confounder of this
relationship? Justify your answer. (3 pts)
d. Is sex an effect modifier (assuming a multiplicative model for joint effects) of the
relationship between brain cancer and >=40 years’ residence with a chlorinated
surface water source? Justify your answer. (3 pts)
e. According to Table 1, having a farming occupation (ever vs. never) is a risk factor for
brain cancer (OR=1.5). Assume that among the controls, farming occupation is associated
with duration of residence with a chlorinated surface water source. Could farming
occupation be a confounder of the associations reported under the Total column in Table
3? Explain your answer. (3 pts)
21. Characteristics of cases and controls included in this study are shown in Table 1. Using this
information answer the following questions.
YES
NO
b. Assume that 10% of the cases that were labeled as never having worked on a farm truly had
worked in such an environment. Furthermore assume that 15% of the controls that were
labeled as having ever worked on a farm, in fact never really did work on a farm. What
would the true association be between farm occupation and brain cancer? Assume that the
classification of disease status is valid. (4 pts)
c. Which of the following best describes a comparison of the odds ratios you computed in
parts (a) and (b)? Choose one best answer. (3 pts)
23.
a. Using data in Table 1, assess whether the crude OR of brain cancer associated with
farm occupation is confounded by age and/or sex. Support your answer with
relevant calculations. Table 1 shows the adjusted odds ratios estimating the risk of
brain cancer due to having farm occupation. (2 pts)
b. What feature of the study design could have contributed to the crude ORs in Table 1
being confounded by age and/or sex? (2 pts)
The questions on this examination are largely based on Cantor KP, Lynch CF, Hildesheim ME,
Dosemeci M, Lubin J, Alavanja M, Craun G. Drinking water source and chlorination byproducts in
Iowa. III. Risk of brain cancer. Am J Epidemiol 1999; 150:552-60. You may refer to an unannotated
copy of this article during the examination.
1. Briefly discuss two reasons why a case-control study is (or is not) well suited to examine risk
factors for brain cancer. (3 pts)
2. The authors describe the study design they used as a "population-based case-control study".
Briefly explain how this is different than a non-population based case-control study. Include in
your answer issues regarding the selection of cases, selection of controls, and validity. (3 pts)
3. Cases were identified by the State Health Registry of Iowa. Which of the following categories of
study design best describes this method of case finding? Choose one best answer. (3 pts)
A. Prospective follow-up
B. Passive surveillance
C. Cross-sectional survey
D. Community-based screening
E. Hospital-based surveillance
4. The authors state that cases had to be newly diagnosed with histologically confirmed glioma
without previous diagnosis of a malignant neoplasm. Which of the following best describes an
advantage of using incident cases instead of prevalent cases? Choose one best answer. (3 pts)
A. Using incident cases allows the investigators to directly compute relative risks.
B. Using incident cases reduces the non-systematic error of case-control studies.
C. Estimates of exposure from incident cases may be less influenced by disease status.
D. Using incident cases allows for the investigation of effects on risk versus those affecting
duration.
E. Incident cases are less likely to be lost to follow up than prevalent cases.
5. Even if the investigators are careful in the selection of cases and controls, selection bias can
make interpretation of results difficult. Which of the following is NOT a situation that can
produce selection bias? Choose one best answer. (3 pts)
A. The exposure has some influence on the process by which controls are selected.
B. The exposure has some influence on the process of case ascertainment.
C. The disease status has some influence on the recall of exposures.
D. The exposed cases are reported to registries more than unexposed.
E. All of the above will produce selection bias.
6. In this study, exposure information for many of the brain cancer cases was provided by proxy
respondents. The authors did not have information from independent sources that could be
used to directly verify information provided by these surrogates. However, suppose a follow-up
questionnaire was administered to cases, and for 85 of the cases, the investigators were able to
obtain information about whether or not they used a private well directly from the cases (self
report). Assuming that self report is the best available assessment of whether they used a
private well or not, complete the table below so that it reflects a sensitivity, specificity, and
positive predictive value of a proxy response of 77%, 75%, and 57%, respectively. Assume that
26 of the cases reported that they used private wells. Show your calculation. (6 pts)
YES
NO
7. Cases in this study were histologically confirmed. This is an example of which of the following
disease classification criteria? Choose one best answer. (3 pts)
A. Causal criteria
B. Ecologic criteria
C. Manifestational criteria
D. Etiologic criteria
E. None of the above
8. Consider the data presented in Table 1 of this article. Which of the following best represents
the proportion of the risk of brain cancer in the population that is attributable to working
on a farm (farm occupation)? Assume that a farm occupation is causally related to brain
cancer risk. Choose one best answer. (4 pts)
A. 33%
B. 57%
C. 10%
D. 29%
E. Cannot be calculated from case-control studies
9. A case-control study like the one described in this paper is most useful when it helps us
understand what is happening in the study base (underlying population). Which of the
following best describes the study base in this article? Choose one best answer. (3 pts)
A. The study base is those who if they developed brain cancer could have been selected
as a case.
B. The study base is those who have an equal probability to be selected as a case or
control.
C. The study base is those who are identified as cases or controls after excluding non-
responders.
D. The study base is those who if exposed would have been identified as exposed.
E. None of the above.
10. In Table 3 the odds ratios for incident brain cancer by duration of chlorinated surface water
exposure are given. The odds ratio (95% confidence interval) in men estimating the risk of
brain cancer with 1-19 years of exposure is 1.3 (0.8, 2.1) and 2.5 (1.2, 5.0) for 40 years or
more of exposure. Which of the following best describes the role of chance in observing
these two estimates? Choose one best answer. (3 pts).
A. The odds ratio for 40 years exposure is more likely due to chance because it is
based on fewer cases and controls.
B. The odds ratio for 1-19 years of exposure is more likely due to chance because the point
estimate is closer to the null value (1.0).
C. The odds ratio for 40 years exposure is more likely due to chance because the
confidence interval is so wide.
D. The odds ratio for 1-19 years of exposure is less likely due to chance because the
confidence interval is narrower.
E. The odds ratio for 40 years exposure is less likely due to chance because the
confidence interval does not include 1.0.
11. Table 3 presents odds ratios for the association of incident brain cancer with various levels
of lifetime average THM exposure. The odds ratio (95% confidence interval) for lifetime
average THM concentration of 0.8-2.2 µg/liter for men was 0.9 (0.6, 1.6). The odds ratio
(95% confidence interval) for lifetime average THM concentration of 32.6 µg/liter for
women was 0.9 (0.4, 1.8). Which of the following best describes the precision of these two
estimates of risk? Choose one best answer. (3 pts)
A. The estimate is equal because the point estimates are the same.
B. The estimate is equal because neither confidence interval excludes 1.0.
C. The estimate in men is slightly more precise because the confidence interval is
narrower.
D. The estimate in women is slightly more precise because the exposure level is much
higher.
E. The precision of the estimates cannot be compared because they are from different
exposure groups.
12. Using the data in Table 4, which of the following best describes the crude (unadjusted) odds
ratio estimating the risk of brain cancer associated with 40 years' exposure to chlorinated
surface water in men with above median tap water intake? Use the category of 0 years
exposure to chlorinated surface water as the reference group. Choose one best answer. (4
pts)
A. 4.0
B. 1.5
C. 3.6
D. 2.6
E. Cannot be computed from data in Table 4.
13. Table 1 shows the adjusted odds ratio estimating the risk of brain cancer by population size.
Using the 25,000 population size as a reference calculate the crude (unadjusted) odds
ratio associated with the > 50,100 population. In 2 sentences or less explain why the two
estimates agree or disagree. (4 pts)
14. The authors state that they "found a dose-response relationship among men between brain
cancer and duration of consuming drinking water from chlorinated surface water…". Using
3 Bradford Hill criteria, in 3-4 sentences, address causality (or the lack of causality) of the
relationship of drinking water to brain cancer. (4 pts)
15. An early study of drinking water and brain cancer was an ecological study conducted by the
lead author of the present article. In this study, brain cancer mortality rates in 923 U.S.
counties were compared with average levels of THM measured in the drinking water
supplies of those counties. For counties in which the sampled water supply served at least
85% of the residents of that county, the correlation coefficient between county-specific
mortality rates from brain cancer and trihalomethane levels was 0.24 in White men and
0.19 in White women. After reviewing this paper, your colleague concluded that THM in
drinking water are causally related to brain cancer. However, you are more cautious in your
interpretation, citing the "ecological fallacy." Please define the ecologic fallacy (2 pts) and
describe why it limits the causal inferences that can be made from the ecological study
described above (2 pts).
16. The authors used information provided by cases and controls on place of residence, primary
source of drinking water, and tap water and total fluid consumption to create an index of
cumulative lifetime exposure. However, the natural history of cancer (initiation, promotion,
conversion, and progression) may encompass many years. If drinking water is involved at
the earliest stages of brain cancer (initiation), then drinking water exposures in the recent
past may be more important than present exposures or those in the distant past (e.g., in
childhood). As defined in class, which of the following periods would be important in
defining the minimal and maximal length of time expected between drinking water
exposure and diagnosis with histologically confirmed glioma? Choose one best answer. (3
pts)
A. Induction period
B. One year case fatality
C. Latent period
D. Both A and C
E. None of the above
17. The authors included all cases of histologically confirmed malignant brain cancers,
including glioblastoma, fibrillary and gemistocytic astrocytoma, and mixed glioma. If
authors suspected that drinking water exposure was associated with only certain subtypes
of brain cancer (i.e., disease heterogeneity), which of the following strategies could they
employ at the analysis stage? (3 pts)
A. Adjustment for cancer type using mathematical modeling (e.g., logistic regression)
B. Stratification of cases by brain cancer type
C. Direct standardization by brain cancer type
D. Indirect standardization by brain cancer type
E. Matching cases and controls by brain cancer type
18. The authors restricted their analysis to those cases and controls with at least 70 percent of
their lifetime years with a known source of drinking water. This approach was used to
reduce which type of bias? Choose one best answer (3 pts)
A. Confounding bias
B. Selection bias
C. Information bias
D. Random error
E. None of the above
20.
a. Using the data in Table 3, label and complete a 2x2 table for the association between
brain cancer and >=40 years’ residence with a chlorinated surface water source
(versus 0 years), collapsing over sex (i.e., combine the data for men and women). (4
pts)
b. Calculate the odds ratio for your 2x2 table in part a. Show your work. (3 pts)
c. Suppose that the sex-adjusted OR for the relationship between brain cancer and
>=40 years’ residence with a chlorinated surface water source is 1.1. Is sex a
confounder of this relationship? Justify your answer. (3 pts)
d. Is sex an effect modifier (assuming a multiplicative model for joint effects) of the
relationship between brain cancer and >=40 years’ residence with a chlorinated
surface water source? Justify your answer. (3 pts)
e. According to Table 1, having a farming occupation (ever vs. never) is a risk factor for
brain cancer (OR=1.5). Assume that among the controls, farming occupation is
associated with duration of residence with a chlorinated surface water source.
Could farming occupation be a confounder of the associations reported under the
Total column in Table 3? Explain your answer. (3 pts)
21. Characteristics of cases and controls included in this study are shown in Table 1. Using this
information answer the following questions.
YES
NO
b. Assume that 10% of the cases that were labeled as never having worked on a farm truly had
worked in such an environment. Furthermore assume that 15% of the controls that were
labeled as having ever worked on a farm, in fact never really did work on a farm. What
would the true association be between farm occupation and brain cancer? Assume that the
classification of disease status is valid. (4 pts)
c. Which of the following best describes a comparison of the odds ratios you computed in
parts (a) and (b)? Choose one best answer. (3 pts)
22. Which of the following is a measure of the validity of methods used to classify exposures
such as having worked on a farm? Choose one best answer. (3 pts)
A. interclass correlation coefficient
B. kappa statistic
C. standard error
D. sensitivity
E. none of the above
23.
a. Using data in Table 1, assess whether the crude OR of brain cancer associated with
farm occupation is confounded by age and/or sex. Support your answer with
relevant calculations. Table 1 shows the adjusted odds ratios estimating the risk of
brain cancer due to having farm occupation. (2 pts)
b. What feature of the study design could have contributed to the crude ORs in Table 1
being confounded by age and/or sex? (2 pts)
University of North Carolina at Chapel Hill
School of Public Health
Department of Epidemiology
1. a. Briefly summarize two criteria on which disease classifications are based. Discuss
a reason why these two criteria do not always correspond with one another. (3 pts)
1. b. List two examples of each of the two types of criteria you mentioned in 1A. (2 pts)
2. Cohort studies can form the framework for efficient substudies, using nested
case-control and case-cohort study designs. Which of the following best compares and
contrasts nested case-control studies and case-cohort studies? (3 pts)
A. Both nested case control and case-cohort studies select controls that are
matched on time of case development but only case-cohort studies allow for
multiple comparisons with different case groups.
B. Both nested case control and case-cohort studies select controls from the
entire baseline cohort, but in case-cohort studies the selection is done at
random.
C. In case-cohort studies a single group of controls can be used for comparison
with several case groups.
D. In nested case control studies, cases are selected entirely from the non-
exposed cohort group.
E. both C and D
3. Name the three component parts of any kind of incidence measure. (3 pts)
4. Over a ten-year period the number of bicycle injury events in a population increases
even as the age adjusted bicycle injury rate decreases in the population. Describe
two conditions that could cause this outcome (assume the definition of a bicycle
injury and the quality of the data remain constant over the 10 year period) (3 pts)
5. Which of the following best describes the condition(s) that are required for the odds
ratio (OR) to estimate the risk ratio (RR) in a case-control study? (choose one best
answer) (3 pts)
6. The association between induced abortion and breast cancer has been the subject of
previous epidemiological studies. Cohort studies have found no association, while at
least one case-control study has found a positive association. Possible explanations
for the different results in case-control and cohort studies of this topic include
(choose single best answer). (3 pts)
A. Case-control studies are prone to selection bias, whereas cohort studies are
not vulnerable to selection bias.
B. Recall bias might explain the association observed in a case-control study,
but this would not be a problem in prospective cohort studies.
C. The method of disease classification is different in case-control and cohort
studies.
D. All of the above
7. Swaen et al (1998) conducted a study of 6,803 males who worked for at least six
months before 1/1/80 at one of nine chemical plants in the Netherlands. The
workers were followed for mortality from 1/1/56 until 1/1/96. Before 1/1/80,
2,842 of the workers were occupationally exposed to acrylonitrile and the other
3,961 workers were not exposed to acrylonitrile. After 1/1/80, there was no
exposure to acrylonitrile. To measure the association between occupational
exposure to acrylonitrile and several outcomes, the investigators calculated
standardized mortality ratios (SMRs) for both the exposed and the unexposed
workers. Age-interval-specific person-years were generated for specific exposure
groups and were multiplied by the mortality rates for the total male population of
the Netherlands to generate expected numbers of cause specific deaths.
b. What was the (crude) cumulative incidence ratio (CIR) for mortality
comparing the exposed to the unexposed men? What are two reasons why
this measure is problematic with these data?
c. For brain cancer, the SMR for the exposed workers (SMR=173.9) was more
than twice the SMR for the unexposed workers (SMR=85.7). Why are these
two SMRs not strictly comparable? (3 pts)
d. There were 290 deaths due to all causes among the exposed group and 983
deaths due to all causes among the unexposed group. What measure of effect
could be calculated to strictly compare all-cause mortality between the
exposed and the unexposed group. (2 pts)
Background information:
A panel of experts reviewed the medical records of 525 patients discharged from the
hospital with diagnosis codes indicative of a stroke (ICD 430-438). The panel
classified strokes as either ischemic or not ischemic. Assume the diagnosis reached
by the panel is the most accurate classification possible. Of the 525 cases, 325 had a
discharge diagnosis code for ischemic stroke (ICD code 434). Of these 325 patients,
85 were determined by the panel not to be ischemic strokes. All but 20 of the
patients with discharge diagnosis codes other than 434 were determined by the
panel to have non-ischemic strokes.
Given the background information, compute the sensitivity, specificity, and positive
predictive value of a hospital discharge code for ischemic stroke (ICD code 434) in
classifying a patient as truly having an ischemic stroke.
e. If you were to use a 434 discharge code to identify a group of cases with
ischemic stroke and the sensitivity was 99% but the specificity was 40%,
which of the following would best describe your resulting case group.
(Choose one best answer). (2 pts)
10. In a community intervention study, like the Minnesota Heart Health Program,
the effectiveness of an educational intervention program was evaluated.
Which of the following best describes the unit of assignment, the unit of
observation, and the unit of analysis in these types of studies (in this order)?
(2 pts)
11. Indicate next to each statement below whether you consider it to be TRUE,
FALSE, or if you are NOT SURE. A correct answer receives 2 points, an
incorrect one zero.
12. Attributable measures are used by researchers to assess the public health
impact of a detrimental exposure, assuming causality. Given data from a
cohort study on the incidence of stroke (see below), estimate the attributable
risk proportion among the exposed (physically inactive). Explain your
answer in one sentence. Assume that physical activity is causally related to
stroke risk.
Physical activity level    Did develop a stroke    Did not develop a stroke    Person-years (PY)    Incidence per 1,000 PY
Explain:
b. Additional data from the National Health and Nutrition Examination Survey
(NHANES) suggest the prevalence of a physically active lifestyle (at least 30
minutes of moderate activity 3 days per week) is 27%. Using this information
and your answer to part (A), estimate what we can hope to accomplish with
programs to get people to be physically active in the total population. In one
sentence explain your answer. (3 pts)
Explain:
High error profile                                   Low error profile
Cause of Death    # of Deaths    Year of Death       Cause of Death    # of Deaths    Year of Death
Other 30 1970
b. Compute the incidence density rate of Alzheimer’s disease death for those
with a high error profile and for those with a low error profile. (3 pts) Show
your work.
c. Compute the incidence density ratio for the risk of Alzheimer’s disease death
associated with a high error communication profile. Explain, in two
sentences or less, what this value means. (3 pts)
d. Using data from this study compute an odds ratio for the association of a high
error communication profile with death from Alzheimer’s disease. Show a
clearly labeled 2x2 table. (2 pts)
e. Compare the odds ratio with the incidence density ratio computed in part c
and explain why they are similar or different.
Causal criteria: disease definition and classification based on the cause of the
condition,
Causal criteria: microbial diseases for which the pathogen has been identified
(syphilis, TB, malaria, yellow fever, influenza, etc.), lead poisoning, birth trauma,
2. (C)- Other choices are incorrect because controls in case-cohort studies are not
matched to cases (A), controls are selected at random with both designs (B), and
cases must be selected without regard to exposure (D).
3. New cases or events, population at risk or source population, passage of time
4. The size of the population may have grown (number increases even though rate
does not); the age distribution of the population may have changed (e.g., influx of
families with small children, outmigration of families with older children), so that
age-standardized rate may not change but a greater proportion of the population
may be in the higher risk age range (assuming that younger children have higher
injury rates).
5. (D)- All of the above - use of prevalent cases requires that duration is not related to
exposure, controls should provide estimate of exposure in study base, and rare
disease assumption is required for OR to estimate RR (though not for OR to estimate
IDR).
6. (B)- In a prospective cohort study, information on exposure is obtained before the
outcome (breast cancer, in this case) has occurred. Therefore recall bias - different
recall by cases and non-cases - is not an issue. In a case-control study, cases and
non-cases may recall and report exposure with different degrees of accuracy.
7. a. A (retrospective) cohort study.
c. SMRs are an indirect method of standardization, since they are based on weighted
averages for which the weights are taken from the population whose SMR is being
computed rather than from a "standard" population. Unless the age (and in this case,
age-calendar year interval) distributions for the populations whose SMRs are being
computed are the same, the weighted averages that make up the SMRs are
based on different sets of weights and are not strictly comparable. Since age-interval
distributions of exposed and unexposed workers may differ, their SMRs are not
strictly comparable.
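The point in 7c can be illustrated with a small numeric sketch. All rates and person-years below are invented for illustration (they are not taken from the Swaen et al. data), and the SMR is expressed as a ratio rather than multiplied by 100. Two hypothetical worker groups are given identical age-specific death rates, so there is no true difference between them, yet their SMRs differ because each SMR is weighted by that group's own person-year distribution.

```python
# Hypothetical two-stratum example; every number here is invented for illustration.
reference_rates = {"young": 0.001, "old": 0.010}  # rates in the standard (national) population
worker_rates = {"young": 0.002, "old": 0.010}     # SAME age-specific rates in both worker groups

# The groups differ only in how their person-years are spread across age strata.
person_years = {
    "group_A": {"young": 9000, "old": 1000},  # mostly young person-time
    "group_B": {"young": 1000, "old": 9000},  # mostly old person-time
}


def smr(py_by_stratum, rates, ref_rates):
    """Observed deaths divided by deaths expected under the reference rates."""
    observed = sum(py * rates[s] for s, py in py_by_stratum.items())
    expected = sum(py * ref_rates[s] for s, py in py_by_stratum.items())
    return observed / expected


smr_a = smr(person_years["group_A"], worker_rates, reference_rates)  # 28 / 19
smr_b = smr(person_years["group_B"], worker_rates, reference_rates)  # 92 / 91
print(round(smr_a, 2), round(smr_b, 2))
```

In this sketch the two SMRs differ even though the stratum-specific rates are identical, which is why a comparison of exposed and unexposed SMRs can mislead when the age-interval distributions differ.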
d. An ROC curve plots the value of sensitivity and specificity for each case definition
or cutpoint. Examining the ROC curve shows the trade-off between sensitivity and
specificity that is available for the diagnostic test or measurement method. [The
area between the identity diagonal (slope = 1.0) and the ROC curve serves as a
measure of accuracy that takes into account both sensitivity and specificity, with the
assumption that the costs of false negatives and false positives are the same.]
e. (B) - Due to the low specificity (50%), half of hemorrhagic strokes in the patient
group will be classified as ischemic strokes.
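For the sensitivity, specificity, and positive predictive value of discharge code 434, the arithmetic can be sketched as follows. This assumes the background information is read as: 325 patients coded 434, of whom 85 were judged non-ischemic by the panel, and 200 patients with other codes, of whom 20 were judged ischemic. The variable names are illustrative only.

```python
# Counts taken from the background information; the panel is treated as the gold standard.
total_patients = 525
coded_434 = 325                               # discharge code indicates ischemic stroke
false_positives = 85                          # coded 434, panel says not ischemic
true_positives = coded_434 - false_positives  # 240

coded_other = total_patients - coded_434      # 200
false_negatives = 20                          # other code, panel says ischemic
true_negatives = coded_other - false_negatives  # 180

sensitivity = true_positives / (true_positives + false_negatives)  # 240 / 260
specificity = true_negatives / (true_negatives + false_positives)  # 180 / 265
ppv = true_positives / (true_positives + false_positives)          # 240 / 325

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} PPV={ppv:.3f}")
```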
9. a. Corona del Mar has a 2.9 times higher crude accident rate than Boulder.
b. Adjusted rates -
The cell phone/pager adjusted auto accident rate for Corona del Mar was 1.6 times
that of Boulder. A portion of the difference seen in the crude rates was due to
differences in the distribution of use of cell phones and pagers between the two
cities.
The standard weights are the sum of the population sizes for the two cities. The
weighted rates are the rates for each city, weighted (multiplied) by the standard
weights. The total of the weighted rates is the directly standardized rate. A problem
in using the directly standardized rates is that there are small numbers of cellular
phone and pager users in Boulder.
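The mechanics of direct standardization described above can be sketched as follows. The populations and rates are hypothetical (the exam's data table is not reproduced here), and the city labels are placeholders. Because the weights are population counts, the sum of the weighted rates is divided by the total weight to give the standardized rate.

```python
# Hypothetical stratum-specific populations and accident rates; not the exam's data.
cities = {
    "city_1": {"users": (8000, 0.050), "never_users": (2000, 0.010)},
    "city_2": {"users": (1000, 0.040), "never_users": (9000, 0.012)},
}
strata = ["users", "never_users"]

# Standard weights: the combined population of the two cities in each stratum.
weights = {s: sum(cities[c][s][0] for c in cities) for s in strata}
total_weight = sum(weights.values())

crude = {}
standardized = {}
for city, data in cities.items():
    population = sum(data[s][0] for s in strata)
    # Crude rate: each stratum rate weighted by the city's own population.
    crude[city] = sum(data[s][0] * data[s][1] for s in strata) / population
    # Directly standardized rate: same stratum rates, common standard weights.
    standardized[city] = sum(weights[s] * data[s][1] for s in strata) / total_weight

crude_ratio = crude["city_1"] / crude["city_2"]
standardized_ratio = standardized["city_1"] / standardized["city_2"]
print(round(crude_ratio, 2), round(standardized_ratio, 2))
```

In this made-up example the crude ratio is much larger than the standardized ratio, mirroring the pattern in the answer: part of the crude difference comes from the different distribution of users between the cities.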
The higher crude rate in Corona del Mar reflects the much higher use of cellular
phones and pagers, which is associated with a much higher accident rate. The
difference is reduced for the standardized rates, since these control for the different
distributions of cellular phones and pagers between the two cities. However, this is
a situation where it is essential to examine the specific rates, since Boulder has
lower accident rates among cellular phone and pager users but a higher rate among
never-users.
Since the rates in never users are quite similar, Corona del Mar is likely to make its
greatest impact on accident rates by getting motorists to reduce cellular phone and
pager use while driving or finding some way to make such use safer (promote the use of
"designated drivers"!?).
10. (A) Community intervention trials of this type assign groups to treatments and
collect measurements from individuals. The unit of analysis must be the same as the
unit of assignment (GROUP) or both (i.e., using mixed models).
11. a. T – a cohort study enrolls people who are free of the outcome and monitors them
for the development of the outcome, so the cohort design can be used to estimate
risk of the event;
b. Not sure – the temporal sequence of exposure and disease typically cannot be
addressed in a case-control study, though in some cases it can (e.g., a genetic
characteristic or other "exposure" that can be definitively assigned to a time prior to
disease onset);
e. T – a cohort study begins with disease-free subjects and monitors them for
development of the outcome; if the outcome is rare, many subjects must be followed
to obtain an adequate number of cases;
g. T – correlational studies (another term for ecological studies) are often used to
compare disease rates across geopolitical entities using available data;
i. F – cross-sectional studies measure prevalence, not risk (of a future event); they
are the most statistically generalizable type of study when, as is often the case, the
study population is obtained through population-sampling;
j. F – the natural history of a disease is the process by which it develops over time;
descriptive information relating to person, place, and time can at best provide only
indirect information;
k. F – as used in class, the term "attributable risk" refers to the risk difference;
o. F – since case-control studies begin with people who are already cases, they avoid
having to study a large number of people for a long time in order to accumulate
enough cases; they can also compare cases and controls in respect to many
exposures; HOWEVER, they cannot readily study many outcomes, since to do so
requires enrolling cases for each of the outcomes to be studied (i.e., equivalent to
conducting several case-control studies that share the same control group);
s. F – typically, general population controls will be less motivated than cases and
sources of medical information for them will not be comparable to those for cases.
12. a. ARP = (I1 - I0) / I1 = (RR-1) / RR = (1.34-1.04) / 1.34 = 0.30 / 1.34 = 22% (after
rounding)
b. A key point here is that 27% is the prevalence of physically active people, whereas
the exposure is physical inactivity, whose prevalence is therefore 100% - 27% =
73%
(The formula PARP = (I - I0) / I can also be used by first estimating the crude
population incidence, I, as a weighted average of the incidences in exposed and
unexposed, weighting by the prevalence of exposure, e.g.: I = (0.73)(1.34) + (0.27)(1.04)
= 1.259, so PARP = (1.259 - 1.04) / 1.259 = 17%.)
All cases are exposed cases + unexposed cases. Since we do not know the population
size, let it be represented by n. Based on the prevalence of physically active people,
there are 0.73n physically inactive and 0.27n physically active people (or person-years,
if we assume a one-year period). So the total number of cases = exposed cases
+ unexposed cases = 0.73(1.34) + 0.27(1.04) = 1.259
Note that these measures can be computed more precisely by using the original
number of cases and person-years and not rounding intermediate results, but two
significant figures is adequate for the actual result, and in this case the answer does
not change.
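The ARP and PARP arithmetic above can be checked with a short sketch. It uses the rounded incidences quoted in the answer (1.34 and 1.04 per 1,000 PY) and the 73% prevalence of inactivity. The last line uses Levin's formula, p(RR - 1) / (1 + p(RR - 1)), which is an equivalent form added here for comparison and is not part of the answer as written.

```python
# Rounded incidences per 1,000 person-years, as quoted in the answer.
incidence_inactive = 1.34    # exposed (physically inactive)
incidence_active = 1.04      # unexposed (physically active)
prevalence_inactive = 1 - 0.27  # exposure prevalence in the total population

# Attributable risk proportion among the exposed.
arp = (incidence_inactive - incidence_active) / incidence_inactive

# Crude population incidence as a prevalence-weighted average.
incidence_total = (prevalence_inactive * incidence_inactive
                   + (1 - prevalence_inactive) * incidence_active)

# Population attributable risk proportion.
parp = (incidence_total - incidence_active) / incidence_total

# Equivalent form via Levin's formula (added for comparison).
rate_ratio = incidence_inactive / incidence_active
parp_levin = (prevalence_inactive * (rate_ratio - 1)
              / (1 + prevalence_inactive * (rate_ratio - 1)))

print(f"ARP={arp:.0%} PARP={parp:.0%} PARP (Levin)={parp_levin:.0%}")
```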
c. Attributable risk measures assume that the relationship is causal (i.e., that
physical inactivity does in fact cause an increase in stroke risk). Some of the above
interpretations may also require that the process be reversible, so that changing to a
physically active lifestyle brings risk down to the level of someone who was not
inactive. Another assumption is that the rates and rate ratio observed in the cohort
study hold for the entire population. Also, we have ignored the effects of other
factors, most notably age.
c. IDR= ID High / ID low = 2.24/0.651 = 3.4. Nuns with a high error communications
profile are 3.4 times more likely to die from Alzheimer's Disease than nuns with a
low error profile.
d.
Alzheimer’s Disease
e. The two are similar because the condition is fairly rare.
Most of the questions on this examination relate to the article "Individual risk factors for
hip osteoarthritis: obesity, hip injury, and physical activity" (Cyrus Cooper, Hazel Inskip,
Peter Croft, Lesley Campbell, Gillian Smith, Magnus McLaren, and David Coggon. Am J
Epidemiol 1998; 147:516-22). You may refer to this article during the examination.
1. Briefly list two reasons why a case control study is (or is not) appropriate to
examine individual risk factors for hip osteoarthritis. (2 pts)
2. The authors state that their cases come from a defined population.
List four features of the population or the study design that support this statement
or helped the authors to achieve it? (4 pts)
3. Considering the study population, study design, and other information in the
article, which of the following statements is (are) TRUE and which is (are) FALSE. (2
pts each)
b. If about 12% of the population was age 65 years or older, then about
12,000 people age 65 years or older in the two districts have radiographic
evidence of hip osteoarthritis.
c. The data in Table 1 demonstrate that women are 1.9 times as likely to
develop severe symptomatic hip osteoarthritis as are men.
d. The data in Table 2 indicate that female gender is not a risk factor for hip
osteoarthritis.
e. In this study, matching the control group to the cases on age, as opposed to
a random sample of the general adult population, probably resulted in
greater statistical power and precision.
4. The case identification process was based on a register in each district made up of
persons on a waiting list for a total hip arthroplasty (surgical reformation of the hip
joint). Waiting lists for procedures are common in societies with a national or social
medicine system. In the United States, a region-wide waiting list for a hip
arthroplasty is unlikely, as the availability of this procedure would be more
related to insurance status or ability to afford such a procedure. Explain how using
the register system in the United Kingdom to select cases either increases or
decreases the possibility of selection bias as compared to a study conducted in the
United States. (4 pts)
5. How was the diagnosis of hip osteoarthritis made in this study? Was this based on
manifestional or causal criteria? Explain your answer. (3 pts)
6. According to the authors: "For each case, a control of the same sex and age was
selected from the list of the same general practice held by the county Family Health
Service Association". State in one sentence the rationale for using a list from general
practitioners. (3 pts)
7. Eighty-four percent of the patients listed for total hip arthroplasty fulfilled the
criteria for entry into the study as cases. Which of the following best describes the
criteria: (3 pts)
a. age > 45 years, being on the waiting list for hip arthroplasty, and the
presence of Heberden’s nodes.
b. age > 45 years, pain duration at least for 36 months, and presence of
Heberden’s nodes.
c. history of hip fracture within the past year, being on the waiting list for hip
arthroplasty and reside in the study area.
d. presence of Heberden’s nodes, history of hip fracture within the past year,
and reside in the study area.
e. being on the waiting list for hip arthroplasty, reside in the study area, and
age > 45 years
8. The authors report that 89% of the eligible cases agreed to participate and 60% of
the 1060 controls approached agreed to participate. Which of the following best
states a condition regarding the non-responders that could lead to an odds ratio
reported for the risk of osteoarthritis associated with previous hip injury that is
biased away from the null (>1). Choose one best answer. (3 pts)
b. the control group would have been less representative of the study base;
f. it would have been necessary to control for age and sex in the analysis.
10. The authors selected controls who were individually matched to cases by age,
gender, and family practitioner. Matching in the design stage is usually considered
only for those variables that are known to be confounders. Under which of the
following circumstances could gender be a confounder of the association between a
risk factor (obesity) and the outcome (hip osteoarthritis)? Circle all that apply. (4
pts)
a. the prevalence of obesity and the prevalence of hip osteoarthritis are both
higher in men than in women
b. the prevalence of obesity is lower in men than women, but the prevalence
of hip osteoarthritis is higher in men than women.
c. the prevalence of obesity is higher in men than women, but the prevalence
of hip osteoarthritis is the same in men and women.
d. the prevalence of obesity is the same in men and women, but the
prevalence of hip osteoarthritis is higher in men than women.
11. The odds ratios in Table 2 are "mutually adjusted for the other two variables" by
logistic regression. The following questions concern the models used to estimate the
odds ratios in the table (ignore the fact that it was "conditional" logistic regression
and ignore the middle categories for body mass index and presence of Heberden’s
nodes) (2 pts each):
a. How many logistic models were necessary to estimate the odds ratios for
body mass index >28.0, definite Heberden’s nodes, and previous hip injury
among women.
b. The odds ratio estimate for hip injury in women was 2.8. What must the
logistic coefficient have been?
c. From this table, estimate the odds ratio for women who had both definite
Heberden’s nodes and previous hip injury compared to women who had
neither.
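The relationship behind parts b and c can be sketched numerically. In a logistic model the coefficient is the natural log of the odds ratio, and in a model with no interaction term the coefficients add, so the joint odds ratio is the product of the individual odds ratios. The hip injury odds ratio of 2.8 is from the question. The Heberden's nodes value below is a hypothetical placeholder, since the Table 2 value is not reproduced here.

```python
import math

or_hip_injury = 2.8                          # from the question (women, Table 2)
beta_hip_injury = math.log(or_hip_injury)    # logistic coefficient = ln(OR)

or_nodes_hypothetical = 2.0                  # HYPOTHETICAL placeholder, not the Table 2 value
beta_nodes = math.log(or_nodes_hypothetical)

# With no interaction term, coefficients add on the log-odds scale,
# so the joint odds ratio is the product of the two odds ratios.
joint_or = math.exp(beta_hip_injury + beta_nodes)

print(round(beta_hip_injury, 2), round(joint_or, 1))
```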
12. In this study, information on medical history, life style, and leisure time physical
activities was obtained through a "structured interviewer-administered
questionnaire" (page 517). It is possible that persons on a waiting list for a hip
arthroplasty would be more keenly aware of hip injuries they may have had in the
past than controls. If true, this is an example of which of the following? Choose one
best answer. (3 pts)
13. Among women, the odds of previous hip injury is higher among cases than
controls (Table 2; OR=2.8). As indicated in the footnotes for Table 2, the odds ratio
for previous hip injury is adjusted or controlled for the other two variables in the
Table (body mass index and Heberden’s nodes). Using the counts shown in Table 2,
calculate an unadjusted (crude) odds ratio for previous hip injury in women. (3 pts)
14. Which of the following conclusions can be made from the above results? (choose
one best answer) (3 pts)
b. since the unadjusted and adjusted odds ratios are similar, the risk factor
(hip injury) must not be associated with the adjustment variables (body mass
index and Heberden’s nodes)
c. since the unadjusted and adjusted odds ratios are similar, there is no
effect-measure modification of the association between hip injury and hip
osteoarthritis.
15. The odds ratios presented in Table 5 are adjusted for previous hip injury. Why
might they still be confounded by hip injury? (3 pts)
16. In Table 6, is the crude association between previous hip injury and risk of
unilateral hip osteoarthritis biased towards the null or away from the null? (2 pts)
17. Based on the data in Table 3, what is the odds ratio for Heberden's nodes
(definite versus none) for persons in the Upper tertile of body mass index? (3 pts)
18. Rothman has proposed that "public health synergism" is present when an
observed joint effect exceeds that expected under the additive model. Do the odds
ratios in Table 3 indicate the presence of "public health synergism" for effect of
Heberden's nodes and elevated body mass index on hip osteoarthritis? If not, do
the odds ratios conform to a multiplicative model? Include in your answer a 1-2
sentence assessment of whether these data indicate "public health synergism". (For
this question, ignore the row for "Possible" Heberden's nodes and the column for
the middle tertile of body mass index, and assume that both Heberden’s nodes and
elevated BMI reflect causal risk factors for hip osteoarthritis. Note: do not
necessarily rely on the authors' description of this table.) (6 pts)
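The comparison asked for here rests on two benchmarks for the joint effect: under an additive model the expected joint odds ratio is OR_A + OR_B - 1, and under a multiplicative model it is OR_A x OR_B. A minimal sketch follows. The odds ratios in it are hypothetical placeholders, not the Table 3 values, and the function name is illustrative.

```python
def expected_joint_or(or_a, or_b):
    """Expected joint OR under additive and multiplicative models,
    relative to a common doubly unexposed reference group."""
    additive = or_a + or_b - 1
    multiplicative = or_a * or_b
    return additive, multiplicative


# HYPOTHETICAL single-factor and observed joint odds ratios (not from Table 3).
or_nodes_only = 2.0
or_high_bmi_only = 3.0
observed_joint_or = 5.0

additive, multiplicative = expected_joint_or(or_nodes_only, or_high_bmi_only)

# "Public health synergism" in Rothman's sense: observed joint effect exceeds
# the additive expectation.
synergism = observed_joint_or > additive
print(additive, multiplicative, synergism)
```

With these placeholder values the observed joint odds ratio exceeds the additive expectation but falls short of the multiplicative one, which shows that the two assessments can give different answers for the same data.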
19. The authors investigated the association of specific sporting activities with risk
of hip osteoarthritis. Their data are presented in Table 5. Using their data, compute
separately the unadjusted (crude) odds ratio for osteoarthritis associated with playing golf
and with swimming in men and women combined. Consider those who do not
participate in any sport as the reference group and assume no missing data. Show
two appropriate 2x2 tables and your calculations. (4 pts)
19a. Compare these unadjusted (crude) odds ratios with the ones presented in
Table 3. Briefly describe and explain the comparison. (3 pts)
19b. Consider the possibility that golfers who have hip osteoarthritis are reluctant
to seek medical attention for their condition for fear it will mean the end of their
ability to play golf. Therefore, cases who golf are less likely to be selected for this
study than cases who do not golf. If the true OR associated with golf is 2.0, then
which of the following best describes the selection bias and its impact on the odds
ratio you computed. (3 pts)
d. differential selection bias resulting in an odds ratio biased toward the null.
19c. The authors state that "...the association with swimming may have arisen
because patients with hip osteoarthritis were advised to swim..." (page 521).
Suppose that 25% of the cases had been incorrectly classified as swimmers and
assume that the misclassified cases had not participated in any other sporting
activity, either. Re-compute the odds ratio for the association of hip osteoarthritis
and swimming, after re-classifying these individuals, using the number from the 2x2
table in question 19 above. Briefly discuss how your conclusion about the role of
swimming does (or does not) change. In what direction did misclassification bias
the study OR? (3 pts)
20. The odds ratio (95% confidence interval) estimating the risk of osteoarthritis
associated with a previous hip injury was 24.8 (3.1-199.3) in men and 2.8 (1.4-5.8)
in women (see Table 2).
"In a previous case-control study (17) of men aged 60-76 years, we observed
a doubling of risk for hip osteoarthritis among those in the highest third of
body mass index distribution, as compared with those in the lowest third,
although the increased risk was not statistically significant." (p519 bottom of
right column)
c. The doubling of risk was not statistically significant because a p-value was
not computed, so it is not possible for the authors to know whether the
increased risk was due to chance.
22. A medical journalist, confused by the thrust of this article, comes to you and
says: "I've read this article several times, but I can't figure out what it shows about
the relationship of body mass index, Heberden's nodes, and hip osteoarthritis. The
authors explain that 'two broad mechanisms are believed to underlie the
pathogenesis of osteoarthritis at any joint site: mechanical stress and a generalized
predisposition to the disorder' as indexed by Heberden’s nodes [p519 right column].
That seems straightforward enough, and they later conclude that the analysis
'supports the notion that this condition arises through an interaction between a
generalized predisposition to the disorder and specific mechanical insults to the hip'
[p521]. Yet on page 518 [right column], the authors state that there was 'no
statistically significant interaction' between body mass index and Heberden's nodes,
and on page 519 [left column] they refer to obesity and a tendency to polyarticular
involvement as 'independent risk factors for hip osteoarthritis'. Would you please
assess for me what this article shows about the relationship among body mass
index, Heberden's nodes, and hip osteoarthritis? I have room for 40-60 words.
Thanks!" (6 pts)
23. Write a brief statement for or against a causal relationship between hip injury
and risk of osteoarthritis. Comment specifically on at least two of Bradford Hill’s
criteria for causal inference. Support your conclusion with data or statements from
the article. (4 pts)
1. Briefly list two reasons why a case control study is (or is not) appropriate to examine
individual risk factors for hip osteoarthritis. (2 pts)
Condition rare, faster to complete than cohort study, wide range of exposures of
interest.
2. The authors state that their cases come from a defined population. List four features of
the population or the study design that support this statement or helped the authors to
achieve it? (4 pts)
1. The two health districts had a centralized orthopedic facility for assessment and
treatment of hip osteoarthritis;
2. Local orthopedic surgeons were willing to enter all patients into the study;
3. All men and women 45 years and older who were placed on the waiting list for
primary total hip arthroplasty were considered for the study;
5. The study excluded patients who lived outside the two districts.
The diverse socioeconomic profile was an advantage for generalizability but does not
make this a defined population.
3. Considering the study population, study design, and other information in the article,
which of the following statements are TRUE and which are FALSE? (2 pts each)
b. If about 12% of the population was age 65 years or older, then about 12,000
people age 65 years or older in the two districts have radiographic evidence of hip
osteoarthritis.
[TRUE - 10% population prevalence in age 65 years and older * 12% of one
million]
c. The data in Table 1 demonstrate that women are 1.9 times as likely to develop
severe symptomatic hip osteoarthritis as are men.
[FALSE - the data in Table 1 cannot demonstrate this female excess, since there
is no information about the sex ratio in the older population; this ratio may
well reflect a greater incidence of severe symptomatic hip osteoarthritis in
women, but some of the excess presumably derives from greater mortality
among men.]
d. The data in Table 2 indicate that female gender is not a risk factor for hip
osteoarthritis.
[FALSE - controls were matched to cases on gender (and age), so the sex ratio
in the controls must match that in the cases]
e. In this study, matching the control group to the cases on age, as opposed to a
random sample of the general adult population, probably resulted in greater
statistical power and precision.
[TRUE - the mean age of the cases is 70 years old, with the majority older than
60; thus, the use of general population controls without regard to age would
result in relatively little overlap between the age distributions of cases and
controls on this very important variable.]
4. The case identification process was based on a register in each district made up of
persons on a waiting list for a total hip arthroplasty (surgical reformation of the hip joint).
Waiting lists for procedures are common in societies with a national or social medicine
system. In the United States, a region-wide waiting list for a hip arthroplasty is unlikely, as
the availability of this procedure would be more related to insurance status or
ability to afford such a procedure. Explain how using the register system in the United
Kingdom to select cases either increases or decreases the possibility of selection bias as
compared to a study conducted in the United States. (4 pts)
Using the registry may reduce selection bias if affluence or ability to pay for a hip
replacement is associated with exposures like BMI, physical activity, Heberden’s nodes.
Cases selected from surgery lists in the United States system may have a differential
association with a risk factor as compared with cases not receiving this procedure, so
measures of association may be more biased in a U.S. study.
5. How was the diagnosis of hip osteoarthritis made in this study? Was this based on
manifestational or causal criteria? Explain your answer. (3 pts)
(page 517, left column, 2nd paragraph): Diagnosis of hip osteoarthritis in this study
was based on pelvic radiographs. This is based on manifestational criteria.
6. According to the authors: "For each case, a control of the same sex and age was selected
from the list of the same general practice held by the county Family Health Service
Association". State in one sentence the rationale for using a list from general practitioners.
(3 pts)
(page 517, left column, 3rd paragraph): In England and Wales, almost everyone is
registered with a general practitioner so that these lists essentially provide an
enumeration of the general population.
7. Eighty-four percent of the patients listed for total hip arthroplasty fulfilled the criteria
for entry into the study as cases. Which of the following best describes the criteria: (3 pts)
a. age > 45 years, being on the waiting list for hip arthroplasty, and the presence of
Heberden’s nodes.
b. age > 45 years, pain duration at least for 36 months, and presence of Heberden’s
nodes.
c. history of hip fracture within the past year, being on the waiting list for hip
arthroplasty and reside in the study area.
d. presence of Heberden’s nodes, history of hip fracture within the past year, and
reside in the study area.
e. being on the waiting list for hip arthroplasty, reside in the study area, and age > 45
years (answer)
8. The authors report that 89% of the eligible cases agreed to participate and 60% of the
1060 controls approached agreed to participate. Which of the following best states a
condition regarding the non-responders that could lead to an odds ratio reported for the
risk of osteoarthritis associated with previous hip injury that is biased away from the null
(>1). Choose one best answer. (3 pts)
a. control non-responders are more likely to have a history of hip injury compared to
case non-responders. (answer)
b. control non-responders are less likely to have a history of hip injury compared to
case non-responders.
b. the control group would have been less representative of the study base;
f. it would have been necessary to control for age and sex in the analysis.
Answer: d. Failure to replace controls who refused would have reduced both the
number of controls and of cases (due to the matching), with a loss of statistical power
and increase in the probability of a type II error.
10. The authors selected controls who were individually matched to cases by age, gender,
and family practitioner. Matching in the design stage is usually considered only for those
variables that are known to be confounders. Under which of the following circumstances
could gender be a confounder of the association between a risk factor (obesity) and the
outcome (hip osteoarthritis)? Circle all that apply. (4 pts)
a. the prevalence of obesity and the prevalence of hip osteoarthritis are both higher in
men than in women (true)
b. the prevalence of obesity is lower in men than women, but the prevalence of hip
osteoarthritis is higher in men than women. (true)
c. the prevalence of obesity is higher in men than women, but the prevalence of hip
osteoarthritis is the same in men and women.
d. the prevalence of obesity is the same in men and women, but the prevalence of
hip osteoarthritis is higher in men than women.
11. The odds ratios in Table 2 are "mutually adjusted for the other two variables" by
logistic regression. The following questions concern the models used to estimate the odds
ratios in the table (ignore the fact that it was "conditional" logistic regression and ignore the
middle categories for body mass index and presence of Heberden’s nodes) (2 pts each):
a. How many logistic models were necessary to estimate the odds ratios for body
mass index >28.0, definite Heberden’s nodes, and previous hip injury among
women.
"Mutually adjusted" means that each odds ratio comes from a model that
includes the other two factors, which therefore means that all three factors are
included in the same model. So one model yields an adjusted odds ratio for each
variable. So one model was used.
b. The odds ratio estimate for hip injury in women was 2.8. What must the logistic
coefficient have been?
The OR for a dichotomous or indicator variable is exp(beta), where beta
is the logistic coefficient. Therefore the coefficient was ln(2.8) = 1.0296.
c. From this table, estimate the odds ratio for women who had both
definite Heberden’s nodes and previous hip injury compared to
women who had neither.
12. In this study, information on medical history, life style, and leisure time
physical activities was obtained through a "structured interviewer-
administered questionnaire". (page 517). It is possible that persons on a
waiting list for a hip arthroplasty would be more keenly aware of hip injuries
they may have had in the past than controls. If true, this is an example of
which of the following? Choose one best answer. (3 pts)
14. Which of the following conclusions can be made from the above results?
(choose one best answer) (3 pts)
b. since the unadjusted and adjusted odds ratios are similar, the risk
factor (hip injury) must not be associated with the adjustment
variables (body mass index and Heberden’s nodes)
c. since the unadjusted and adjusted odds ratios are similar, there is
no effect-measure modification of the association between hip injury
and hip osteoarthritis.
15. The odds ratios presented in Table 5 are adjusted for previous hip injury.
Why might they still be confounded by hip injury? (3 pts)
16. In Table 6, is the crude association between previous hip injury and risk
of unilateral hip osteoarthritis biased towards the null or away from the null?
(2 pts)
17. Based on the data in Table 3, what is the odds ratio for Heberden's nodes
(definite versus none) for persons in the Upper tertile of body mass index? (3
pts)
Expected joint excess risk = excess risk for factor 1 + excess risk for factor 2
= excess risk for Heberden's nodes + excess risk for Body mass index
Expected excess risk = (OR for Heberden's nodes - 1) + (OR for Body mass
index - 1)
The substantial difference between 2.2 and 1.0 indicates that the odds
ratios in this table do not conform to an additive model for expected
joint effect.
The odds ratios do not conform to a multiplicative model, either:
Expected joint OR = (OR for Heberden's nodes) * (OR for Body mass index)
Since these odds ratios indicate a joint effect greater than that
expected under an additive model, "public health synergism" is
present, to a moderate degree (we expect a 100% increase in risk but
observe a 220% increase in risk)
YES 51 34
NO 140 162
OR = 1.7
NO 140 162
OR = 1.6
19a. Compare these unadjusted (crude) odds ratios with the ones presented
in Table 3. Briefly describe and explain the comparison. (3 pts)
Table shows 1.4 and 1.5, respectively. This suggests that BMI, nodes, and
hip injury explain very little of the association of these two sports with
hip osteoarthritis.
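The crude odds ratio for the one complete 2x2 table shown above (YES 51 34 / NO 140 162) can be checked with the cross-product formula. This is only a sketch: the function name and the cell labels a, b, c, d are illustrative, and because the row and column labels did not survive in the table, the layout is an assumption. The cross-product is the same if the table is transposed.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table laid out as

            col 1   col 2
    row 1     a       b
    row 2     c       d
    """
    return (a * d) / (b * c)


# Counts from the complete table above: YES 51 34 / NO 140 162
crude_or = odds_ratio(51, 34, 140, 162)
print(f"crude OR = {crude_or:.1f}")
```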
19b. Consider the possibility that golfers who have hip osteoarthritis are
reluctant to seek medical attention for their condition for fear it will mean
the end of their ability to play golf. Therefore, cases who golf are less likely to
be selected for this study than cases who do not golf. If the true OR associated
with golf is 2.0, then which of the following best describes the selection bias
and its impact on the odds ratio you computed. (3 pts)
19c. The authors state that "...the association with swimming may have
arisen because patients with hip osteoarthritis were advised to swim..." (page
521). Suppose that 25% of the cases had been incorrectly classified as
swimmers and assume that the misclassified cases had not participated in
any other sporting activity, either. Re-compute the odds ratio for the
association of hip osteoarthritis and swimming, after re-classifying these
individuals, using the number from the 2x2 table in question 19 above.
Briefly discuss how your conclusion about the role of swimming does (or
does not) change. In what direction did misclassification bias the study OR?
(3 pts)
20. The odds ratio (95% confidence interval) estimating the risk of
osteoarthritis associated with a previous hip injury was 24.8 (3.1-199.3) in
men and 2.8 (1.4-5.8) in women (see Table 2).
a. Which estimate indicates a stronger association? (2 pts) 24.8
22. A medical journalist, confused by the thrust of this article, comes to you
and says: "I've read this article several times, but I can't figure out what it
shows about the relationship of body mass index, Heberden's nodes, and hip
osteoarthritis. The authors explain that 'two broad mechanisms are believed
to underlie the pathogenesis of osteoarthritis at any joint site: mechanical
stress and a generalized predisposition to the disorder' as indexed by
Heberden’s nodes [p519 right column]. That seems straightforward enough,
and they later conclude that the analysis 'supports the notion that this
condition arises through an interaction between a generalized predisposition
to the disorder and specific mechanical insults to the hip' [p521]. Yet on page
518 [right column], the authors state that there was 'no statistically
significant interaction' between body mass index and Heberden's nodes, and
on page 519 [left column] they refer to obesity and a tendency to
polyarticular involvement as 'independent risk factors for hip osteoarthritis'.
Would you please assess for me what this article shows about the
relationship among body mass index, Heberden's nodes, and hip
osteoarthritis? I have room for 40-60 words. Thanks!" (6 pts)
Points to include:
2. People with both elevated BMI and Heberden's nodes have a greater
risk for hip osteoarthritis than people with only one of these risk factors
and even greater than would be expected from adding or multiplying
their individual effects (i.e., greater than expected under either the additive or the
multiplicative models).
3. The authors seem to believe and the study does not show otherwise
that most cases of hip osteoarthritis in their study result from a
combination of mechanical stress (which could be something other than
obesity) and biologic predisposition (which might not yet have
manifested in other joints).
23. Write a brief statement for or against a causal relationship between hip
injury and risk of osteoarthritis. Comment specifically on at least two of
Bradford Hill’s criteria for causal inference. Support your conclusion with
data or statements from the article. (4 pts)
1. Match the term from column A with the most appropriate topic or
concept from column B (use each term only once and each topic only
once). (1 pt each = 12 pts)
3. In the Minnesota Heart Health Program (as described in class) and many
other community intervention studies, the effectiveness of an
educational intervention program is evaluated. Which of the following
selections best describes the unit of assignment, the unit of
observation, and the unit of analysis (in this order) in studies of
these types? (Choose one best answer) (4 pts)
____ b. prevent bias introduced when the patients know what type of
treatment they are receiving
____ c. prevent bias introduced when the investigators know what type of
treatment the patients are receiving
____ d. b and c
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
-3- ID Number __-__ __ __ __
___________________________________________________________________
HIV-infected 9 31 40
___________________________________________________________________
7A. Which one answer best describes the transmission rate in the table?
(4 pts)
____ a. proportion
____ d. odds
7B. Using the data in the table, estimate the relative risk of HIV
infection for infants whose mothers took zidovudine relative to
infants of mothers who took placebo. Show formula and calculations.
(4 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
7C. Based on the data in the above table, estimate the proportion of
potential cases of perinatal HIV transmission that could be prevented
by providing zidovudine to HIV-positive, 2nd trimester pregnant women
who would otherwise not receive the drug. (Assume all women take the
medication and consider only singleton births.) Show formula or
diagram and calculations. (4 pts)
____ c. Cases are HIV-infected infants; controls are infants whose mothers
should have received zidovudine but did not.
Methods: Data were obtained from interview, exam, and lab tests.
Results:
SD = standard deviation
a. mean
b. SD
c. range
d. median
8B. Of the four variables in Table 1, which has the most symmetrical
(normal-like) distribution? (Choose one best answer.) (4 pts)
Syphilis 7/930
Gonorrhea 42/940
Chlamydia 66/957
_______________________________________________________
8C. Based on the above data and assuming that the two diseases have
the same average duration, how do their incidence rates compare in
this population? (Choose the one correct answer.) (3 pts)
8D. Based on the above data but this time assuming that the two diseases
have the same incidence, how do their average durations compare in
this population? (Choose the one correct answer.) (2 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
9C. What measure would you use to quantify the strength of association
between cigarette smoking and AA-10? Show the formula for this
measure, substitute the appropriate numbers for that formula, compute
the result, and state its meaning in one sentence. (4 pts)
a. Formula
b. Substitution
c. Result
d. Meaning ____________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
9D. Assuming that cigarette smoking is responsible for the observed excess
in AA-10, how many cases of AA-10 during the quarter are attributable
to cigarette smoking? Show a relevant formula or diagram,
intermediate computation, and result, and give a sentence stating the
meaning of the result. (4 pts)
a. Formula or diagram
b. Substitution
c. Result
d. Meaning ____________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
10. Suppose that 900 of the subjects in question #8 consent to regular STD
screening following release from detention. Subjects are counseled
about preventive measures and screened every three months for two
years. All cases are treated and cured.
Syphilis 0 1 0 3 1 2 3 4
Gonorrhea 10 8 15 21 11 12 19 24
Chlamydia 15 23 8 18 17 17 14 11
Number tested 890 870 850 810 780 760 710 630
____________________________________________________________________
(Subjects can become infected with the same organism more than once
and/or become co-infected with more than one organism.)
10B. What is the average incidence density (per 100 person months or per
100 person years) of chlamydia for the two years of follow up? Assume
that: dropouts contribute no time to follow up after the last time
they are tested; subjects remain at risk even while infected. (3 pts)
10C. Give two reasons for preferring incidence density over cumulative
incidence for assessing frequency of infection in this cohort. (6 pts)
i. ___________________________________________________________
_______________________________________________________________
ii. ___________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
11B. Many of the criteria for causal inference pertain to the evaluation of
evidence from multiple studies, but several can also apply to a single
study. Name two (2) such criteria and use them to evaluate
(quantitatively where possible) the evidence from the above study.
(6 pts)
i. ___________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
ii. ___________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
Department of Epidemiology
1. Matching (1 pt each):
   Column A - Terms                           Column B - Topics
 7 cumulative incidence (11 is ok)            1. Case-control studies
12 incidence density                          2. Causal inference
11 prevalence (7 is ok)                       3. Confounds cross-sectional data
 2 dose response                              4. Death certificate
 9 induction period                           5. Descriptive epidemiology
 1 odds ratio                                 6. Diagnostic tests
 8 preventive fraction in the exposed         7. Estimates risk
 4 underlying cause of death                  8. Measures impact
 6 positive predictive value                  9. Natural history of disease
10 detectable, pre-clinical phase            10. Population screening
 5 migrant studies                           11. Proportion
 3 cohort effect                             12. Relative rate
(Credit was also given for some other pairings.)
By diagram:
8B. a. Age at first coitus -- its mean and median are close together
and not very far from the middle of the range. The mean and
median are also close together for the number of partners in the past
4 months, but they are nowhere near the middle of the range. (4 pts)
Since both diseases have the same incidence, the ratio of their
durations equals the ratio of their prevalence odds:
9A. School absence from acute asthma and cigarette smoking (4 pts):
                               CI in smokers       8.3%
Cumulative incidence ratio = ------------------ = ------ = 1.89
                              CI in nonsmokers     4.4%
9D. Number of cases of excessive absence due to acute asthma (AA-10) that
(assuming causation) are attributable to smoking.
This question asks for the size of the shaded box in the diagram in
the "evolving text". That diagram, with numbers instead of variables
is:
|
8.3% | 8.3% = incidence
| |XXXXXXXXXXXXXXX| in exposed
Incidence | | | persons
| | 3.9% x 1,200 |
| | = 47 | 3.9% = "attributable
4.4% | |XXX XXXX| risk"
| |\\\\\\\\\\\\\\\|
| 300 | 4.4% x 1,200 | 4.4% = incidence
0| |\\ = 53 \\| in unexposed
6,800 1,200 (15%) persons
Nonsmokers Smokers
All these methods come up with approximately the same answer, the
differences being due to the rounding of intermediate results in
obtaining some of the incidences and the CIR. When the numbers
from the table are used and intermediate results not rounded, the
number of cases attributable to smoking is 47.0588
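The arithmetic in 9A and 9D can be reproduced without rounding the intermediate results. The sketch below takes the 300 cases among 6,800 nonsmokers and the 1,200 smokers from the diagram; the count of 100 cases among smokers is an assumption, inferred from 8.3% x 1,200 and from 47 + 53 in the diagram. Variable names are illustrative.

```python
smokers = 1200
nonsmokers = 6800
cases_nonsmokers = 300
cases_smokers = 100  # assumption: inferred from 8.3% x 1,200 (= 47 + 53)

# Cumulative incidence in each group
ci_smokers = cases_smokers / smokers
ci_nonsmokers = cases_nonsmokers / nonsmokers

# 9A: cumulative incidence ratio
cir = ci_smokers / ci_nonsmokers

# 9D: attributable risk (incidence difference) applied to the exposed group
attributable_risk = ci_smokers - ci_nonsmokers
attributable_cases = attributable_risk * smokers

print(f"CIR = {cir:.2f}")
print(f"attributable risk = {attributable_risk:.1%}")
print(f"cases attributable to smoking = {attributable_cases:.4f}")
```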
                       (Total) cases
Incidence density = ---------------------
                     (Total) person-time

                         123 cases
                  = ---------------------- = 0.65/100 person-months = 7.8/100 person-yrs
                     18,930 person-months
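As a check on the unit conversion, the short sketch below sums the quarterly chlamydia counts from the screening table in question 10 and divides by the person-time. The 18,930 person-months figure is taken as stated in this key rather than re-derived; variable names are illustrative.

```python
# Quarterly chlamydia cases from the screening table in question 10
chlamydia_cases = [15, 23, 8, 18, 17, 17, 14, 11]
total_cases = sum(chlamydia_cases)

person_months = 18930  # taken as stated in the key, not re-derived here

per_100_person_months = 100 * total_cases / person_months
per_100_person_years = 12 * per_100_person_months

print(f"{total_cases} cases")
print(f"{per_100_person_months:.2f} per 100 person-months")
print(f"{per_100_person_years:.1f} per 100 person-years")
```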
10C Reasons for preferring incidence density in this case (6 pts):
These diseases have an extended risk period (i.e., one longer than the
period of observation)
"I have neither given nor received help from others in completing this examination."
______________________________________________________________________________________
2. Find an example from the paper for each of the following (give the page
number and quote enough of the words to identify the point or passage; the
same point or phrase cannot be used more than once) (2 pts each)
5. The authors used the term "cohort effects" in regard to results from
previously reported studies. Which of the following best describes what is
meant by cohort effects in this context? (choose one best answer). (2 pts)
6. Cases in this study were incident cases of confirmed cancer of the breast (p.
325). Which of the following best describes the advantage of selecting incident
cases over prevalent cases (choose one best answer) (2 pts)
a. primary
b. histologically-confirmed
A. kappa coefficient
B. correlation coefficient of reproducibility
C. intraclass correlation coefficient
D. product-moment correlation
E. A or B
F. A, B, or C
13. A list of control variables for use in the logistic regression models appears on
page 325, middle of column 2. These variables have been chosen because they
(choose one best answer): (2 pts)
On page 326, 2nd column, the authors state "As shown in Table 3, the risk of
breast cancer associated with having been breastfed, was about 0.7 for both
pre- and postmenopausal women." In this context, to which of the following
epidemiologic measures does the term "risk" refer? Choose one best answer.
(2 pts)
A. Cumulative incidence
B. Incidence density
C. Attributable risk
D. Odds ratio
17. In the multiple logistic model referred to as Model 2 in Table 3, what was the
coefficient for the variable not-having-been-breastfed among all breast cancer
cases? (2 pts)
a. The odds of breast cancer vary as the product of the odds for age and
the odds for education.
b. The odds of breast cancer vary as the sum of the odds for age and the
odds for education.
c. Age, education, and not having been breastfed were independent of
(i.e., uncorrelated with) each other.
d. Breast cancer is a rare disease.
18. Suppose that cases who refused to participate in this study were less likely to
have been breastfed as infants than those who participated in the study.
Which of the following best describes what this fact would imply for the
observed relative risk associated with being breastfed compared with what would
have been observed had all persons participated in the study? (choose one best
answer). (2 pts)
A. the observed relative risk would be biased away from the null.
B. the observed relative risk would be subject to selection bias and the
direction of the bias cannot be estimated.
C. the observed relative risk would be biased toward the null.
D. the observed relative risk would be subject to misclassification bias and
the direction of the bias cannot be estimated.
19. In Table 3, the confidence intervals for the OR's for all women do not include
the value 1.0, whereas all but one of the OR's for premenopausal breast cancer
and postmenopausal breast cancer do. Mathematically, what does this pattern
reflect? (2 pts)
20. On page 324, 2nd column, the authors offer a possible explanation of why two
previous studies of breastfeeding and breast cancer found little crude
association, observing that the result may have been "confounded by a failure
to adjust for age, because of cohort effects with regard to breastfeeding
frequency". The following stratified analysis has been constructed to illustrate
a situation where cohort effects with regard to breastfeeding completely
obscure a true protective association seen when age is controlled.
Breastfed 24 67
Bottlefed 81 36
Region A Region B
Crude 2.9
Compute the following (for adjusted rates use the direct method and the total
population as a standard):
25. Assuming that this relationship is causal, why might a similar study, 50 years
from now, fail to find as strong a relationship? (2 pts)
9. A. Kappa coefficient
10. Table:
Biomarker validation of women's self-report of having been breastfed
Yes No Total
                                  Yes      No    Total
              --------------------------------------------
Self-report   Breastfed            70      26       96
              Not breastfed        80      28      108
              --------------------------------------------
              Total               150      54      204
11. a. Table:
Adult breast cancer by having been breastfed as an infant,
among premenopausal women with education beyond high school
d. False - The matching caused cases and controls to have the same
age distribution, so it did "work"; matching would not be expected
to eliminate an association between age and the exposure, since
exposure status was not known when controls were being selected and
in any case would not have been used in the matching procedure.
e. False - The matching procedure prevented an association.
From Table 2:
                          Cases                       Controls
                 -------------------------   -------------------------
Body mass        Breastfed   Not breastfed   Breastfed   Not breastfed
index (kg/m^2)   ---------   -------------   ---------   -------------
16-22                48           15             89           19
23-27               103           26            125           16
>27                  90           17             91           16
To show the details, here is a table for estimating OR's for body mass index and breast
cancer:
and the resulting OR's are [e.g., (90 * 89) / (48 * 91) = 1.83]:
The OR's in the total column are shown to illustrate that in this
case there is some confounding by breastfeeding history, at body
mass index level 23-27 kg/m^2. Within either breastfed or not
breastfed group there is no "dose-response" relationship.
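The odds ratios described above can be recomputed from the Table 2 counts. The sketch below uses the lowest body mass index level (16-22) as the reference, which matches the worked example (90 * 89) / (48 * 91) = 1.83. The "total" group pools the breastfed and not-breastfed counts; the dictionary layout and function names are illustrative, not taken from the article.

```python
# (cases, controls) by body mass index level and breastfeeding history
counts = {
    "16-22": {"breastfed": (48, 89), "not_breastfed": (15, 19)},
    "23-27": {"breastfed": (103, 125), "not_breastfed": (26, 16)},
    ">27": {"breastfed": (90, 91), "not_breastfed": (17, 16)},
}
REFERENCE = "16-22"


def cell(level, group):
    """Return (cases, controls); 'total' pools both breastfeeding groups."""
    if group == "total":
        pairs = list(counts[level].values())
        return tuple(sum(p[i] for p in pairs) for i in range(2))
    return counts[level][group]


def or_vs_reference(level, group):
    """Odds ratio for a BMI level versus the reference level, within a group."""
    cases, controls = cell(level, group)
    ref_cases, ref_controls = cell(REFERENCE, group)
    return (cases * ref_controls) / (ref_cases * controls)


ors = {
    group: {level: or_vs_reference(level, group) for level in counts}
    for group in ("breastfed", "not_breastfed", "total")
}
for group, by_level in ors.items():
    print(group, {level: round(value, 2) for level, value in by_level.items()})
```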
13. Potential confounders are factors that are known or suspected risk
factors for breast cancer or its detection, or at least proxies for
such factors.
OR = (50 x 216) / (38 x 167) = 1.7 (for zero vs. >= 3 pregnancies)
Not breastfed 41 25 66
----------------------------------
Total 189 208 397
> 165 vs. all others: OR = (148 x 68) / (396 x 41) = 0.62
16. a. Estimate RR for Not breastfed as 1/OR for Breastfed: 1 / 0.69 = 1.45
b. If you know the formula (or can derive it from the diagram and the
"grand synthesis"):

          P(E|D) (RR-1)
   PARP = --------------- and since breast cancer is rare, use OR.
               RR

                    [117/(117+112)] (1.47-1)     (0.51) (0.47)
   Premenopausal:  -------------------------- = --------------- = 0.16
                             1.47                     1.47
   AND
                     [58/(58+241)] (1.45-1)      (0.19) (0.45)
   Postmenopausal: ------------------------- = --------------- = 0.06
                             1.45                     1.45
Proportion of exposed (Not breastfed) cases that are attributable to not having been
breastfed is:
ARP = (RR-1)/RR
Since breast cancer is rare, we can estimate with
(OR-1)/OR = (1.47-1) / 1.47 = 0.3197 for premenopausal.
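The PARP and ARP calculations in answer 16 can be checked with a few lines of code. This is a sketch under the same rare-disease assumption the key uses (the OR stands in for the RR); the function and parameter names are illustrative.

```python
def parp(exposed_cases, unexposed_cases, rr):
    """Population attributable risk proportion: P(E|D) * (RR - 1) / RR."""
    p_exposed_given_disease = exposed_cases / (exposed_cases + unexposed_cases)
    return p_exposed_given_disease * (rr - 1) / rr


def arp(rr):
    """Attributable risk proportion among the exposed: (RR - 1) / RR."""
    return (rr - 1) / rr


# Exposure here is "not breastfed"; ORs are used in place of RRs
parp_premenopausal = parp(117, 112, 1.47)
parp_postmenopausal = parp(58, 241, 1.45)
arp_premenopausal = arp(1.47)

print(f"PARP premenopausal  = {parp_premenopausal:.2f}")
print(f"PARP postmenopausal = {parp_postmenopausal:.2f}")
print(f"ARP premenopausal   = {arp_premenopausal:.4f}")
```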
17. Logistic model coefficients for risk factor variables are natural
logarithms of odds ratios per one unit change in the variable.
So the coefficient was ln(0.70) = -0.3567
Assumptions:
a. True - The odds of breast cancer vary as the product of the odds
for age and the odds for education.
b. False - Only in a few special cases will the product of two odds
equal their sum (e.g., both odds equal zero or both odds equal two).
The logistic model is additive in the logit (logarithm of odds),
multiplicative in the odds.
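The two points in answer 17 can be illustrated numerically: a logistic coefficient is the natural log of the odds ratio, and adding coefficients on the logit scale is the same as multiplying odds ratios. In the sketch below only the 0.70 comes from the key; the odds ratios 2.0 and 3.0 are hypothetical values chosen for illustration.

```python
import math

# Coefficient for an odds ratio of 0.70, as in the key
coefficient = math.log(0.70)
print(f"ln(0.70) = {coefficient:.4f}")

# Hypothetical odds ratios for two factors (illustration only)
or_factor_1, or_factor_2 = 2.0, 3.0
b1, b2 = math.log(or_factor_1), math.log(or_factor_2)

# Additive in the logit ...
joint_logit_change = b1 + b2
# ... which is multiplicative in the odds
joint_or = math.exp(joint_logit_change)
print(f"joint OR = {joint_or:.1f} (product {or_factor_1 * or_factor_2:.1f}, "
      f"sum {or_factor_1 + or_factor_2:.1f})")

# The special cases named in answer b, where product and sum coincide
for odds in (0.0, 2.0):
    print(odds, odds * odds == odds + odds)
```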
18. C. The observed relative risk would be biased toward the null.
20.
AGE < 60 AGE > 60 TOTAL
----------------------------------------------------
Breast Bottle Breast Bottle Breast Bottle
------ ------ ------ ------ ------ ------
Cases 24 40 256 100 280 140
                     Region A                        Region B
            Cases  Population  Rate/1000    Cases  Population  Rate/1000
< High School Education
Age
40-50         10      7,000       1.4         10     15,000       0.7
51-60         15     10,000       1.5         20      5,000       4.0
61-65         30      3,000      10.0        600     55,000      10.9
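Direct standardization with the total population as the standard can be sketched as follows. The example uses only the three age rows that survive in the table above, so its results are not the exam's full answers; the standard is assumed to be the two regions combined within each age group, and all names are illustrative.

```python
# (cases, population) by age group, for the rows shown in the table above
strata = {
    "40-50": {"A": (10, 7000), "B": (10, 15000)},
    "51-60": {"A": (15, 10000), "B": (20, 5000)},
    "61-65": {"A": (30, 3000), "B": (600, 55000)},
}

# Assumed standard: both regions combined, within each age group
standard = {age: sum(pop for _, pop in regions.values())
            for age, regions in strata.items()}
standard_total = sum(standard.values())


def crude_rate(region):
    """Crude rate per 1,000 for the rows shown."""
    cases = sum(strata[age][region][0] for age in strata)
    population = sum(strata[age][region][1] for age in strata)
    return 1000 * cases / population


def adjusted_rate(region):
    """Directly age-adjusted rate per 1,000 using the combined standard."""
    expected = sum(
        strata[age][region][0] / strata[age][region][1] * standard[age]
        for age in strata
    )
    return 1000 * expected / standard_total


for region in ("A", "B"):
    print(region, round(crude_rate(region), 2), round(adjusted_rate(region), 2))
```

In these rows the crude rates differ far more than the adjusted rates, because Region B's population is concentrated in the oldest age group.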
25. Assuming that this relationship is causal, why might a similar study,
50 years from now, fail to find as strong a relationship? (2 pts)
A copy of this article was provided to you before this examination and can be
used in answering the following questions.
1. Briefly state the primary study question of this report. Identify the
main exposure and outcome of interest. (3 pts)
___________________________________________________________
A. Active surveillance
B. Ongoing cross-sectional survey
C. Passive surveillance
D. Follow up study of dynamic population
5. This study determined exposure and outcomes using data from "a list of
all members of the agricultural community who were certified to apply
restricted-use pesticides in 1991" (p. 394-methods) and from "all in-
wedlock live births recorded in the state for the years 1989 through
1992" (p. 394-methods). Briefly assess the strength of these data
sources in establishing the temporal sequence of pesticide exposure
and birth defects and provide support for your assessment. (4 pts)
7. The use of the term "rate" is not an infallible guide to the specific
epidemiologic measure being presented. Which one of the following
epidemiologic measures best characterizes the measure that the authors
refer to as the "rate of anomalies per 1000 live births" (Table 2 -
footnote)? Choose one best answer. (4 pts)
A. pesticide appliers had 1.37 times more births with anomalies than
did the general population.
B. pesticide appliers had more children with birth anomalies than did
the general population.
___________________________________________________________________
10. Using data in Table 1:
11. Using the data presented in Table 1, recalculate the crude odds ratio
for all births with anomalies assuming that all musculoskeletal birth
anomalies occurring among those with maternal age greater than 30 and
the "other" anomalies among maternal age > 35 were later found to
actually have occurred among persons incorrectly classified as
appliers. Explain what implications this new calculation would have
on the conclusions of the study. (3 pts)
___________________________________________________________________
12. It is possible that the pesticides examined in this study might have
reduced fecundity or increased the proportion of conceptions not
resulting in live births. Assume that both of these effects (lower
fecundity; more spontaneous abortions and stillbirths) have in
fact occurred in the pesticide applier population studied here, so
that the number of live births to pesticide applier fathers is smaller
than it would have been in the absence of pesticide exposure. Which of
the following statements is (are) TRUE and which is (are) FALSE? (2
pts each)
TRUE FALSE
____ ____ A. Since all births would be affected equally, effects on
fecundity and spontaneous abortion WOULD NOT have influenced
the size of the odds ratio presented in this study. [This
question is problematic.]
____ ____ B. If pesticides were equally likely to cause fetal loss and birth
anomalies, then the odds ratios would strongly understate the
harmful effects of pesticides.
13. Table 4 shows the frequency per 1000 births of major anomalies for the
general population by region. Which of the following best describes
the study design from which these data were obtained? (4 pts)
A. ecologic study
B. prospective cohort study
C. retrospective cohort study
D. region-specific case control study
14. The authors begin their discussion section by stating that this report
"is an initial step in the evaluation of the possible relationships
between the frequency of birth anomalies and pesticide use". They
conclude, however, by saying that these data "signify a clear-cut need
for comprehensive examination of the health issues involved". This
latter statement seems to indicate that the authors suspect a causal
relationship. Identify and describe three criteria for causal
inference for which at least some information is present in the
article. Give specific examples from the article to support your
selection. (9 pts)
___________________________________________________________________
15. Suppose that after this publication came out, another study was
conducted in Illinois to investigate the hypothesis that birth defects
occurred more often in Illinois as compared to Minnesota. However,
in this new study the authors thought that the type of water consumed
could be related to birth defects. They wanted to adjust
(standardize) the rates of defects in the two states for water type.
Data from the two studies are compared as below.
a. Calculate the crude rate and the water-type specific rates for
Illinois. Briefly describe how these two states compare in crude
rates of birth anomalies. (4 pts)
17. Which of the following statements about the present study are (is)
TRUE and which are (is) FALSE. Indicate TRUE or FALSE for each
statement. (2 pts each)
TRUE FALSE
____ ____ A. Subjects used in the analyses for Table 1 of this study were
selected on the basis of their exposure status.
____ ____ C. The age-adjusted odds ratio for all birth anomalies of 1.41 is
considered a modest association.
____ ____ D. Since birth defects of these types are rare in the general
population, a cohort study could be designed to efficiently
examine further the relationship of pesticides and birth
anomalies.
4. C. Passive surveillance
10A. Since the question does not specify absolute or relative impact, either
attributable risk (AR) or attributable risk proportion (ARP) is correct
(actually, attributable prevalence, but the term attributable risk is
typically applied to rates and prevalences as well as risks).
Meaning: 7.2 births with anomalies per 1000 live births fathered by
pesticide appliers are attributable to pesticide exposure.
Meaning: 1.8 births with anomalies per 10,000 live births to the general
(married) population are attributable to pesticide exposure in pesticide
appliers.
(Note: small differences among the results from the various methods are
primarily due to the fact that the OR of 1.37 has been rounded to fewer
significant digits than are the prevalences computed above.)
12. A. False - there is no basis for assuming that all births would be
affected equally.
15. This question underwent a revision to simplify it, but unfortunately some
parts of the previous version remained. The columns labelled
"# live births" should have included the qualifier "Normal", and the
rates for Minnesota needed to be re-computed accordingly. Due to this
problem, two alternate solutions are completely acceptable, one in which
the denominators are the numbers in the "# live births" column and one in
which the denominators equal the sum of these numbers plus the numbers of
births with anomalies. In addition, full credit is given if the rates
for Minnesota were recomputed. Here is the version in which the stated
rates were used and the # of live births column was treated as if it
meant "Total live births":
16. Yes - it is not clear from these data whether birth anomalies occurred
in people with or without exposure because exposure information was
based on group data.
17. A. False - subjects were selected from birth records for live births
B. False
C. True
D. False
E. False
F. True - (however, a correlation coefficient indicates the extent of
association in the sense of two variables moving in tandem; it does
not indicate the strength of association in the epidemiologic sense
of how great a change occurs in the response variable for a change
of a given size in the exposure variable)
G. True
19. Points in favor of action at this time are the evidence that the
relationship is causal (biological plausibility, consistency between
results of ecologic [by crop-region] and individual-based [pesticide
applier] analyses, pattern of findings [season of conception],
consistency across several epidemiologic studies) and the high
attributable risk percent (27%) among babies with birth anomalies born
to pesticide applier couples. In addition, the substantially
increased prevalences of birth anomalies among all live births in
county clusters with high use of chlorophenoxy herbicides/fungicides
(Table 4), consistent across the four regions, suggest that anomalies
due to pesticides (assuming that the relationship is causal) occur
throughout areas where these pesticides are used. Even though the
population attributable risk proportion is very small (about 1%) for
exposure due to being a pesticide applier, the proportion of all
Minnesota birth anomalies potentially attributable to residence in a
county cluster with high pesticide use is 27% [overall prevalence of
birth anomalies for all Minnesota in-wedlock births was 3791 / 183,721
= 20.63 per 1000 live births (Table 1), prevalence of birth anomalies
in low-pesticide county clusters ("unexposed") was 15 per 1000 (Table
4), so PARP = (PCrude - P0) / PCrude = (20.63 - 15) / 20.63 = 0.27].
The effects seem to be strongest for chlorophenoxy pesticides,
suggesting that at least this category should be restricted.
Moreover, there are powerful arguments for reducing pesticide use for
environmental reasons as well.
Against taking action other than continuing research are that the
evidence is still not very strong (biological mechanisms not yet
elucidated, relationship is not highly specific, epidemiologic studies
limited and not entirely consistent, experimental evidence not
available), the potential impact on agriculture and therefore food
prices is considerable, and the costs to industry and commerce from
restrictions on a major product are substantial. Moreover, the
relative weakness of the odds ratios (below 2.0) indicates a
significant possibility that other factors could be responsible for
the increase in birth anomaly prevalence seen in association with
pesticide exposure, a possibility whose investigation requires better
data on exposure and other factors that may lead to birth anomalies.
Grading of this question is based on the clarity and support for your
evaluation and recommendation.
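The 27% population attributable risk proportion quoted in answer 19 follows directly from the two prevalences. The sketch below reproduces it from the figures in the answer (3791 anomalies among 183,721 births; 15 per 1000 in low-pesticide county clusters). Variable names are illustrative.

```python
# Figures quoted in the answer key (Tables 1 and 4 of the article).
anomalies = 3791
live_births = 183_721
p_unexposed = 15.0  # per 1000 live births, low-pesticide county clusters

# Overall ("crude") prevalence per 1000 live births.
p_crude = anomalies / live_births * 1000

# PARP = (P_crude - P_0) / P_crude
parp = (p_crude - p_unexposed) / p_crude

print(round(p_crude, 2), round(parp, 2))
```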
University of North Carolina
School of Public Health
Department of Epidemiology
Fundamentals of Epidemiology (EPID 168)
Victor J. Schoenbach and Wayne D. Rosamond
NOTE: For simplicity, ignore the requirement that this study was
restricted to those persons with a telephone number.
A. Manifestational criteria
B. Causal criteria
D. Neither
A. Selection bias
B. Prevalence-incidence bias
C. Information bias
D. Surveillance bias
A. Without the exclusion the odds ratio would be closer to the null.
8. This study uses a case control design with a population based control
group. Which of the following, in general, is a strength of this
design? (Choose one best answer) (3 pts)
A. Nominal
B. Ordinal
C. Interval
D. Ratio
10. Control for age in the analyses presented in Table 2 was accomplished
through which of the following methods? (Choose one best answer)
(3 pts)
12. The authors state on page 49 that after controlling for smoking, the
relative risk for Crohn's disease among men was 1.9 for a high
consumption of sucrose and 0.7 for a high consumption of fiber. Briefly
explain why based on these data the authors state that smoking did not
confound these associations. (3 pts)
13. The data presented in Table 3 indicate that Crohn's disease is
associated with the consumption of fast foods. Suppose that when
stratified by educational attainment, the resulting data were as
follows:
Educational attainment
High Low
Fast foods
1+ times/wk 12 10 8 14
14. In the discussion (page 50), the authors state that "if the change in
diet is the same in cases as in controls, then the relative risk
estimates would be biased toward unity". This is an example of which of
the following? (Choose one best answer) (3 pts)
15. This article does not present p-values yet reports 95% confidence
intervals for all odds ratios. Which of the following best describes
what information a confidence interval conveys that a p-value does not?
(Choose one best answer) (3 pts)
A. A confidence interval puts the observed point estimate in the context
of randomness.
17. Briefly present the evidence for or against the role of fiber as a
confounder of the association of sucrose intake and Crohn's disease. (3
pts)
18. Suppose a follow-up to this study was done to estimate the rate (per
10,000 person years) of ulcerative colitis among a large sample in the
Swedish population. The table below summarizes the results.
a. Which model for the joint effect of these two food items, the
additive model or the multiplicative model, better fits the data?
Your answer should give the formula for each model and show how to
evaluate it with the above data. (5 pts)
19. This study did not differentiate between caffeinated and decaffeinated
coffee. Using the data presented in Table 4 and applying the
assumptions below, calculate the odds ratio (heavy versus no use)
associated with caffeinated coffee consumption and determine if it is
protective against ulcerative colitis. Describe in 2 sentences or less
the interpretation of this new odds ratio, ignoring issues of random
error. (4 pts)
Assumptions:
1. 20% of the heavy coffee drinkers (>=3 cups per day) among cases drink
only decaffeinated coffee.
20. Which of the following variables was NOT in the multiple logistic model
that was used to estimate the relative risk for sucrose intake in
relation to ulcerative colitis in women? (Choose best answer) (3 pts)
A. Age
B. Gender
D. Ulcerative colitis
21. In the multiple logistic model that yielded the relative risk estimate
of 0.7 for Ulcerative colitis in relation to daily vegetable consumption
(Table 4), what was the value of the coefficient for the vegetable
consumption variable assuming that it was coded as 1=daily, 0=less
frequently? Write the conversion equation of coefficient to relative
risk estimate. (3 pts)
22. Assume that the population of Stockholm County in the age range covered
by this study was 1,000,000 in 1980 and remained constant throughout the
decade. What was the average annual incidence of hospital-diagnosed
Crohn's disease during that period regardless of when their medical
record became available? (3 pts)
23. Using the data in Table 2, for which of the following two associations
is there more of an indication of confounding by age and total energy
intake in WOMEN? Support your answer with relevant data and/or
computations. (3 pts)
24. Briefly state one major strength and one major limitation of this study
(2 pts)
25. List two Bradford Hill criteria for evaluating whether dietary sucrose
intake is causally related to inflammatory bowel disease. Evaluate each
using specific facts from the article. (4 pts)
26. Which of the following statements about the data in Tables 1 and 2 are
TRUE and which are FALSE (answer TRUE or FALSE for each statement). (2
pts each)
d. The proportion of controls with high dietary fat intake was higher
for men than for women.
27. A Swedish friend of yours who lives in Stockholm has an identical twin
sister who is anything but identical in terms of her diet. Your friend,
as other health conscious Swedes, avoids fast foods and soft drinks, and
eats whole grain bread and muesli-type cereals daily. Her twin sister,
like many Swedes, often consumes fast foods and soft drinks, but never
touches whole grain bread or muesli.
Your friend comes to visit with you over the holidays, and while you are
sleeping late one morning she comes across your class notes from EPID
168. At breakfast, where she has been busily scribbling on her napkin,
she asks you this question.
"Suppose that fast foods, soft drinks, whole grain bread, and muesli-
type cereal affect Crohn's disease risk independently, and that I can
ignore other risk factors. Suppose also that the excess risks are
additive. Is my twin sister's risk of Crohn's disease 10 times my own?"
She shows you how she used the information in Table 3 to obtain that
estimate:
She goes on to explain "(3.4 -1) is the excess risk from fast foods, and
((1/0.4) - 1) is the excess risk from eating bread that is not whole
grain."
Even though you're not quite fully awake, you feel justifiable pride in
your command of epidemiologic concepts and explain to her the one big
mistake she has made. You say, " . . . ". Write a brief statement of
what you would say. (4 pts)
2. A. Manifestational criteria
3. C. Information bias
4. C. Provide an estimate of the dietary exposure in source population from
which the cases arose.
9. D. Ratio (The response scale for each item was ordinal, but in order to
create the total energy variable the authors had to convert each
response into calories.)
11. The odds ratio for 80 to 104 grams per day was 1.4, and for intakes of
greater than 105 grams per day the odds ratio was 1.3. This suggests a
tendency for cases to have a greater proportion of high fat eaters than
controls. However, the confidence intervals are broad, extending as low
as 0.4 and 0.6. Furthermore there is no suggestion of a dose response.
This is at most weak evidence of a relationship between fat intake and
ulcerative colitis.
12. a. The crude (with respect to smoking) and adjusted odds ratios are the
same. If smoking had been a confounder in the relationship between
sucrose and Crohn's disease or between fiber and Crohn's disease the
adjusted odds ratio would have been meaningfully different from the
values in Table 2.
13. a. Odds ratios: Crude = (24 x 285) / (20 x 128) = 6840 / 2560 = 2.7
among High education = (10 x 150) / (12 x 100) = 1.3
among Low education = (14 x 135) / (28 x 8) = 8.4
b. The stratum-specific odds ratios are quite different from each other,
suggesting some degree of effect modification. The crude odds ratio
is within the range of the two stratum-specific odds ratios, which
suggests that education is not so much a confounder as an effect
modifier.
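The crude and stratum-specific odds ratios in answer 13 can be recomputed as follows. This is a sketch under an assumption: the cell assignments (exposed/unexposed cases and controls in each stratum) are inferred from the cross-products shown in the answer, since the full stratified table is not reproduced here. The helper name `odds_ratio` is illustrative.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio for a case-control 2x2 table."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)


# Cell counts inferred from the cross-products in the answer key.
high = dict(exp_cases=10, unexp_cases=100, exp_controls=12, unexp_controls=150)
low = dict(exp_cases=14, unexp_cases=28, exp_controls=8, unexp_controls=135)

# The crude table is the cell-by-cell sum of the two strata.
crude = {cell: high[cell] + low[cell] for cell in high}

or_crude = odds_ratio(**crude)
or_high = odds_ratio(**high)
or_low = odds_ratio(**low)

print(round(or_crude, 2), round(or_high, 2), round(or_low, 2))
```

The unrounded values are 2.67, 1.25, and 8.44, which the key reports as 2.7, 1.3, and 8.4. The crude value lies between the two stratum-specific values, in line with the interpretation in part b.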
17. The authors state that sucrose and fiber intake could be associated with
one another as well as with Crohn's disease and thus each factor might be
a confounder of the associations between Crohn's disease and the other
("mutual confounding"). The odds ratio was 2.6 for a high sucrose intake
(bottom page 48). When adjusted for fiber the sucrose odds ratio changed
only slightly to 2.5. Therefore, fiber was only a slight confounder of
the sucrose and Crohn's disease relationship.
18. a. Under the additive model, we expect the joint excess rate of the two
factors will be equal to the sum of the excess rate from each factor
separately. The additive model can also be written in terms of rates:
expected rate of ulcerative colitis with both daily soft drink and >=2
fast foods per week = rate (daily soft drinks, without fast food) +
rate (less freq. soft drink, >=2 fast food per week) - rate (neither).
Under the multiplicative model, we expect the joint rate ratio of the
two factors to be equal to the product of the rate ratios for each
factor separately. In the above notation, the model can be expressed
as: R1,1 = (R1,0 x R0,1)/R0,0. This equation expressed with numbers
from the tables is: (9.1 x 6.8) / 3.7 = 16.7. The observed rate is
18.0. The close agreement for the observed joint rate and that
expected under the multiplicative model suggests that the relationship
among daily soft drink consumption, frequent fast food exposure, and
ulcerative colitis is closer to multiplicative than to additive.
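The two expected joint rates in answer 18 can be placed side by side with a few lines of code. The sketch uses the four rates quoted in the answer (3.7, 9.1, 6.8, and the observed 18.0 per 10,000 person years). Which single-exposure rate belongs to which food item does not change either result, so the variable names here are generic.

```python
# Rates per 10,000 person years, as quoted in the answer key.
r00 = 3.7            # neither exposure
r10 = 9.1            # one exposure only
r01 = 6.8            # the other exposure only
r11_observed = 18.0  # both exposures

# Additive model: excess rates add.
expected_additive = r10 + r01 - r00

# Multiplicative model: rate ratios multiply.
expected_multiplicative = (r10 * r01) / r00

print(round(expected_additive, 1), round(expected_multiplicative, 1))

# Distance of each expectation from the observed joint rate.
gap_additive = abs(r11_observed - expected_additive)
gap_multiplicative = abs(r11_observed - expected_multiplicative)
print(round(gap_additive, 1), round(gap_multiplicative, 1))
```

The additive expectation (12.2) is further from the observed 18.0 than the multiplicative expectation (16.7), which matches the conclusion stated in the answer.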
19. odds ratio for >=3 caffeinated coffee = (56 x 36) / (18 x 36) = 3.1
Heavy caffeinated coffee drinking now appears to be a risk factor for
Ulcerative colitis where before coffee drinking appeared to be
protective. An alternative approach would be to include the
decaffeinated coffee drinkers in the "No" (caffeinated) coffee group.
Under this model the odds ratio for >=3 cups caffeinated coffee, relative
to none or only decaffeinated = (56 x 201) / (50 x 18) = 12.5
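Both odds ratios in answer 19 are plain cross-product ratios and can be verified as below. The cell labels are not shown in this key, so the sketch uses a generic 2x2 layout [[a, b], [c, d]] and takes the four numbers of each cross-product exactly as printed. The function name `cross_product_ratio` is illustrative.

```python
def cross_product_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as [[a, b], [c, d]]: (a*d)/(b*c)."""
    return (a * d) / (b * c)


# Heavy caffeinated coffee versus no coffee: (56 x 36) / (18 x 36)
or_vs_none = cross_product_ratio(56, 18, 36, 36)

# Heavy caffeinated coffee versus none or only decaffeinated:
# (56 x 201) / (50 x 18)
or_vs_none_or_decaf = cross_product_ratio(56, 50, 18, 201)

print(round(or_vs_none, 1), round(or_vs_none_or_decaf, 1))
```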
22. 236 cases / 5,000,000 person years = 4.72 cases/100,000 person years.
Full credit was given for 236 cases / 4,000,000 person years = 5.9 cases
/ 100,000 per year. Note that the incidence is obtained from all cases
(or at least all confirmed cases), rather than from only consenting
cases.
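The incidence calculation in answer 22 is cases divided by person-time. A minimal sketch with the figures from the answer (236 cases, a constant population of 1,000,000, and either 5 or 4 years of follow-up) is shown here. The helper name `rate_per_100k` is illustrative.

```python
def rate_per_100k(cases, population, years):
    """Incidence rate per 100,000 person years for a stable population."""
    person_years = population * years
    return cases / person_years * 100_000


rate_5y = rate_per_100k(236, 1_000_000, 5)
rate_4y = rate_per_100k(236, 1_000_000, 4)

print(round(rate_5y, 2), round(rate_4y, 2))
```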
For disaccharides:
Crude OR = (30 x 66) / (35 x 45) = 1.26, versus adjusted OR of 1.2
26. a. F
b. F
c. T
d. F
27. Models of joint effects combine effects of "pure" exposures, i.e., in the
absence of other exposures. But the excess risk for each food item in
Table 2 is estimated without controlling for the effects of others. For
example, since people who eat fast foods are also likely to take soft
drinks and not to eat whole grain bread, the relative risk estimates for
fast food 2+ times/week probably already reflect frequent soft drink
consumption and low whole grain bread consumption. In order to add up
the excess risk for each food item, we need to know the excess risks for
exposure to that item in the absence of the others.