SAS Global Forum 2011 Statistics and Data Analysis

Paper 345-2011

Using SAS® Procedures FREQ, GENMOD, LOGISTIC, and PHREG to Estimate Adjusted Relative Risks – A Case Study

Jiming Fang
Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada

ABSTRACT
We present nine methods to compute an adjusted relative risk (RR). These methods evolved over the past 25 years
(1985–2010) via SAS/STAT® procedures: FREQ, GENMOD, LOGISTIC, and PHREG. We also compare the strengths
and limitations of these methods, using an observational cohort study for illustration.

INTRODUCTION
The relative risk (RR) is a common measure of the effect of treatment or exposure on a dichotomous outcome in cohort
studies. Researchers are increasingly using observational studies to estimate the effect of treatment on outcomes.
However, unlike randomized controlled trials, treated subjects in non-randomized studies often differ systematically
from untreated subjects. The effect of treatment on outcomes cannot be compared directly between groups. Therefore,
statistical methods must be used to adjust for systematic differences when estimating the effect of treatment on
outcomes. In the present paper, we illustrate nine methods to compute adjusted relative risks that were developed
over a quarter of a century via four SAS/STAT® procedures: FREQ, GENMOD, LOGISTIC, and PHREG. We also
compare the strengths and limitations of these methods, using an observational cohort study with data from the
Registry of Canadian Stroke Network.

Study Cohort
We conducted a study to investigate the impact of follow-up at a secondary prevention clinic (SPC) on 1-year mortality
in stroke patients. The study cohort was taken from the Registry of Canadian Stroke Network (RCSN), which includes
patients seen at all 11 stroke centers in Ontario, Canada between July 2003 and March 2006. Data concerning the date
of stroke onset and hospital arrival, stroke type, comorbidities, stroke severity, and outcomes at discharge were
abstracted from each patient’s chart by trained nurses using custom RCSN data entry software. The risk of 1-year death
following stroke onset was determined through linkages to a provincial administrative database. The study cohort
consisted of 9074 ischemic or transient ischemic attack (TIA) patients who were alive at discharge. Of these, 4036
patients were referred to a secondary prevention clinic follow-up (SPC=1), and 5038 were not (SPC=0). Patients with
SPC were significantly different from those without SPC in terms of their demographic and clinical characteristics (Table
1, P-values <0.05 highlighted in red).

Crude RR using Proc Freq

The crude RR provides a measure of the overall association between the risk factor and the outcome, e.g., SPC and
1-year mortality in the present study. It can be obtained easily from Proc Freq using the RelRisk option.

Proc Freq data=StudyCohort;
/* RelRisk reports row 1 relative to row 2, so SPC=1 must be the first row to obtain SPC vs. Non-SPC */
Tables SPC*Death_1year / RelRisk;
Run;

The 1-year mortality rates in SPC patients and non-SPC patients were 6.5% and 14.4%, respectively. The crude RR is
0.454 (95% CI: 0.397-0.519), suggesting the 1-year mortality rate for SPC patients was 54.6% lower than for non-SPC
patients. However, because of the differences in baseline characteristics (Table 1), we must run multivariable analyses to
adjust the RR for other factors that may be related to SPC follow-up.
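
The crude RR and its confidence interval can be checked by hand from the death counts in Table 1. The following Python sketch assumes the standard Wald interval on the log scale; the function name `relative_risk` is illustrative and not part of the paper. Under that assumption it reproduces the reported 0.454 (95% CI: 0.397-0.519).

```python
import math

def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Return (RR, lower, upper) using a Wald interval on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Counts from Table 1: 264 deaths among 4036 SPC patients,
# 726 deaths among 5038 non-SPC patients.
rr, lower, upper = relative_risk(264, 4036, 726, 5038)
print(f"Crude RR = {rr:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```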

Adjusted RR using Proc Freq – Stratified Mantel-Haenszel

We can use a stratified Mantel-Haenszel analysis to control for other categorical factors, for example,
ambulance transportation and hospital admission. This adjusted RR may identify the role of the risk factor of interest
(SPC) after the risk from the other factor(s) has been statistically removed (Greenland & Robins 1985). Here is the
Mantel-Haenszel analysis:

Proc Freq data=StudyCohort;
/* CMH requests the Mantel-Haenszel estimate of the common relative risk across the Ambulance*Admission strata */
Tables Ambulance*Admission*SPC*Death_1year / RelRisk CMH;
Run;

Table 1. Baseline comparisons

Variable Value No-SPC SPC P-value

Sample size n 5038 4036
Male n (%) 2568 (51.0%) 2171 (53.8%) 0.0076
Lives with others n (%) 3547 (70.4%) 2953 (73.2%) 0.0037
Urban residence n (%) 4403 (87.4%) 3755 (93.0%) 0.0000
Neighborhood income quintile 1 n (%) 1246 (24.7%) 788 (19.5%) 0.0000
Neighborhood income quintile 2 n (%) 1076 (21.4%) 803 (19.9%) 0.0877
Neighborhood income quintile 3 n (%) 958 (19.0%) 792 (19.6%) 0.4658
Neighborhood income quintile 4 n (%) 856 (17.0%) 724 (17.9%) 0.2368
Neighborhood income quintile 5 n (%) 902 (17.9%) 929 (23.0%) 0.0000
Presenting to ER from home n (%) 3635 (72.2%) 3233 (80.1%) 0.0000
Transport by ambulance n (%) 3176 (63.0%) 2006 (49.7%) 0.0000
Ischemic n (%) 3468 (68.8%) 2572 (63.7%) 0.0000
Weakness stroke symptom n (%) 3760 (74.6%) 2784 (69.0%) 0.0000
Stroke classification - TASC n (%) 367 (7.3%) 165 (4.1%) 0.0000
Emergency consultation n (%) 2257 (44.8%) 2031 (50.3%) 0.0000
tPA treatment n (%) 376 (7.5%) 301 (7.5%) 0.9922
Preadmission independence n (%) 3854 (76.5%) 3588 (88.9%) 0.0000
Charlson score >=2 n (%) 1689 (33.5%) 1098 (27.2%) 0.0000
Admitted to hospital n (%) 3662 (72.7%) 2102 (52.1%) 0.0000
Deficit at discharge n (%) 3263 (64.8%) 2322 (57.5%) 0.0000
Discharge to home n (%) 2732 (54.2%) 3071 (76.1%) 0.0000
Modified Rankin score 3~5 n (%) 2186 (43.4%) 975 (24.2%) 0.0000
Age (Years) Mean±SD 72.24±13.69 69.09±13.65 0.0000
Stroke severity measured by Canadian Neurological Score Mean±SD 8.91±2.76 9.77±2.27 0.0000
Length of stay (Days) Mean±SD 11.42±22.72 6.3±14.87 0.0000
Death in 1-year n (%) 726 (14.4%) 264 (6.5%) 0.0000
SD=standard deviation

The Mantel-Haenszel adjusted RR is 0.517 (95% CI: 0.451-0.591), suggesting that after controlling for ambulance
transport to the stroke centre and hospital admission, the 1-year mortality rate for SPC patients was 48.3% lower than
for non-SPC patients. Stratification is attractive when statistical control of a few categorical covariates is required.
However, it can be difficult to implement in practice when there are many confounding covariates, especially if some of
the confounders are continuous.
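
To make the idea of stratified adjustment concrete, here is a minimal Python sketch of the usual Mantel-Haenszel pooled risk ratio. The stratum counts are hypothetical and are not the study data (the per-stratum counts are not given in the paper); the function name `mantel_haenszel_rr` is also illustrative. The example is built so that the RR is 0.5 within each stratum while the crude RR is much further from 1, which is the pattern confounding produces.

```python
def mantel_haenszel_rr(strata):
    """Pooled RR from (events_exposed, n_exposed, events_unexposed, n_unexposed) tuples."""
    numerator = 0.0
    denominator = 0.0
    for events_exp, n_exp, events_unexp, n_unexp in strata:
        total = n_exp + n_unexp
        numerator += events_exp * n_unexp / total
        denominator += events_unexp * n_exp / total
    return numerator / denominator

# Hypothetical strata (NOT the study data); the RR is 0.5 in each stratum.
strata = [
    (10, 400, 10, 200),   # risks 0.025 vs 0.05
    (30, 100, 240, 400),  # risks 0.30 vs 0.60
]

pooled_rr = mantel_haenszel_rr(strata)

# Crude RR ignores the strata and pools all patients.
crude_rr = ((10 + 30) / (400 + 100)) / ((10 + 240) / (200 + 400))

print(f"Crude RR = {crude_rr:.3f}, Mantel-Haenszel RR = {pooled_rr:.3f}")
```

In this toy example the exposed group is concentrated in the low-risk stratum, so the crude RR overstates the benefit, while the pooled estimate recovers the within-stratum value.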

Adjusted RR using Proc GenMod – Log-Binomial regression Model

When we need to adjust for many covariates, including continuous covariates, we can use Log-Binomial regression
(McNutt et al. 2003; Wacholder 1986), which is implemented in the GenMod procedure. Here is the SAS program using
Log-Binomial regression to adjust for other covariates:

Proc GenMod data=StudyCohort descending;

Class SPC/param=ref ref=first;
Model Death_1year=SPC Var_1- Var_n / Dist=bin Link=log;
Estimate 'RR SPC vs. Non-SPC' SPC 1/exp;
Run;

Here Var_1 – Var_n include 24 covariates besides SPC, such as gender, age, ambulance transportation, admission,
etc. However, the program did not run successfully and produced the following warning and error messages.

WARNING: The specified model did not converge.
ERROR: The mean parameter is either invalid or at a limit of its range for some
observations.

The probability of an outcome must fall within the bounds [0, 1]. However, the log link function in Log-Binomial models
only restricts the fitted probability to be greater than zero, that is, to fall within (0, ∞), so a fitted probability can
exceed 1. Because of this mismatch between the range of the model and the allowable range of the outcome, in practice
the Log-Binomial model will routinely fail to converge and will not provide parameter estimates (Localio et al. 2007).
The failure of convergence in the Log-Binomial regression may also indicate that the data do not support the model (Tian & Liu 2006).
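
The range mismatch can be seen directly by evaluating both links at a few values of the linear predictor. This Python sketch is only an illustration; the function names and the chosen values of the linear predictor are hypothetical.

```python
import math

def risk_log_link(eta):
    """Fitted risk under a log link: exp(eta), positive but not capped at 1."""
    return math.exp(eta)

def risk_logit_link(eta):
    """Fitted risk under a logit link: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values of the linear predictor
for eta in (-4.0, -1.0, 0.5):
    print(f"eta={eta:5.1f}  log link: {risk_log_link(eta):.3f}  "
          f"logit link: {risk_logit_link(eta):.3f}")
```

Any observation whose linear predictor is positive receives a fitted "probability" above 1 under the log link, which is the situation the GENMOD error message reports.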

We found that if 15 covariates are included in the model instead of the desired 24, the model does converge. The adjusted
RR is 0.561 (95% CI: 0.490-0.642, with StdErr=0.0389).

Adjusted RR using Proc GenMod – Log-Binomial regression Model with negative intercept
When all predictors are zero or at their reference levels in the multivariable Log-Binomial regression model, the intercept
estimates log(p), which is negative because 0<p<1. It therefore makes sense to start the estimation from a negative value. A starting
value of -4 for the intercept has been found to work well in practice (Deddens et al. 2003).

Proc GenMod data=StudyCohort descending;

Class SPC/param=ref ref=first;
Model Death_1year=SPC Var_1- Var_n /Dist=bin Link=log intercept=-4;
Estimate 'RR SPC vs. Non-SPC' SPC 1/exp;
Run;

The adjusted RR from the Log-Binomial regression model with negative intercept is 0.806 (95% CI: 0.672-0.968), with
StdErr = 0.0751. However, the following SAS warning messages suggest that the convergence problem has not been
completely solved by the negative starting intercept and that the model fit is still questionable.

WARNING: The relative Hessian convergence criterion of 0.126294346 is greater than
the limit of 0.0001. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is
questionable.

Adjusted RR using Proc GenMod – Poisson regression model

In contrast to the Log-Binomial regression model, the Poisson regression model, using all 24 covariates, has no
difficulty with convergence (McNutt et al. 2003). The Poisson distribution is expected to be a good approximation to
the binomial distribution when the outcome is rare and the sample size is large. Here is the Poisson regression using
Proc GenMod:

Proc GenMod data=StudyCohort descending;

Class SPC/param=ref ref=first;
Model death_1year=SPC Var_1- Var_n / Dist=poisson Link=log;
Estimate 'RR SPC vs. Non-SPC' SPC 1 /exp;
Run;

The adjusted RR from the Poisson regression model is 0.777 (95% CI: 0.667-0.905), with StdErr = 0.0607. However,
one limitation in the Poisson approximation is that the estimated probabilities from the Poisson model may be greater
than 1, which is invalid (Deddens & Petersen 2004).
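
One way to see how close the Poisson approximation is for a 0/1 outcome is to compare the two variance functions. For a single observation with risk p, the binomial variance is p(1-p) and the Poisson variance is p, so their ratio is 1-p. The Python sketch below evaluates this at the overall 1-year death rate implied by Table 1; the function name is illustrative.

```python
def variance_ratio(p):
    """Binomial variance p(1-p) divided by Poisson variance p for one 0/1 outcome."""
    return (p * (1 - p)) / p

# Overall 1-year death rate from Table 1: 726 + 264 deaths among 5038 + 4036 patients
overall_risk = (726 + 264) / (5038 + 4036)

print(f"Overall risk = {overall_risk:.4f}")
print(f"Binomial/Poisson variance ratio = {variance_ratio(overall_risk):.4f}")
```

The closer the ratio is to 1 (that is, the rarer the outcome), the better the approximation. When the ratio is clearly below 1, the Poisson model assumes more variability than binary data can have.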

Adjusted RR using Proc GenMod – Modified Poisson regression model

Poisson regression without robust error variances may result in a conservative CI (i.e., a wider CI). A “modified Poisson”
method has been proposed to estimate the RR using a robust error variance (Zou 2004). The robust (sandwich) variance
estimate produces 95% CIs with the correct coverage. In SAS, the robust error
variances can be obtained by using the Repeated statement with the subject identifier (here PatientID), even though
there is only one observation per subject. Here is the SAS program (Spiegelman & Hertzmark 2005).

Proc GenMod data=StudyCohort descending;

Class PatientID SPC/param=ref ref=first;
Model Death_1year=SPC Var_1-Var_n / Dist=poisson Link=log;
Repeated subject=PatientID / type=Ind;
Estimate 'RR SPC vs. Non-SPC' SPC 1 /exp;
Run;

The RR from the modified Poisson regression is 0.777 (95% CI: 0.675-0.894), with StdErr=0.0555, which is smaller
than the StdErr (0.0607) from simple Poisson regression. However, this method may fail when outcomes are common,
for example, the point estimate for risk and the upper CI bounds of the expected probability may exceed 1 because the
log link does not constrain expected probabilities (Localio et al. 2007).

Adjusted RR using Proc Logistic – OR-to-RR formula

We may also obtain the adjusted RR from the adjusted odds ratio (OR) using the simple relationship (Zhang & Yu
1998):

$$\mathrm{OR} = \frac{P_T/(1-P_T)}{P_C/(1-P_C)} \quad \text{and} \quad \mathrm{RR} = \frac{\mathrm{OR}}{(1-P_C) + (P_C \times \mathrm{OR})}$$

where $P_C$ and $P_T$ are the unadjusted risks in the control and treated groups, respectively.

CIs for the RR are estimated by substituting the upper and lower CIs for the OR from the multivariate logistic regression
model (Daly 1998). In the present study, the following SAS code was used:

Proc Logistic data=StudyCohort descending;

Class SPC/param=ref ref=first;
Model Death_1year=SPC Var_1-Var_n /rl lackfit;
Run;

Adjusted OR=0.731 (95% CI: 0.618-0.866)

Pc=0.144
Adjusted RR=0.761 (95% CI: 0.654-0.883)
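
The conversion above is simple arithmetic and can be checked with a few lines of Python. The sketch applies the Zhang & Yu formula to the adjusted OR and to each of its confidence limits (the substitution approach described above); the function name `or_to_rr` is illustrative. With the rounded inputs shown here, the point estimate comes out at about 0.760, within rounding of the reported 0.761, and the limits match the reported 0.654-0.883.

```python
def or_to_rr(odds_ratio, baseline_risk):
    """Zhang & Yu conversion: RR = OR / ((1 - Pc) + Pc * OR)."""
    return odds_ratio / ((1 - baseline_risk) + baseline_risk * odds_ratio)

pc = 0.144  # unadjusted 1-year mortality in the non-SPC (control) group

rr_point = or_to_rr(0.731, pc)  # adjusted OR
rr_lower = or_to_rr(0.618, pc)  # lower 95% limit of the OR
rr_upper = or_to_rr(0.866, pc)  # upper 95% limit of the OR

print(f"RR = {rr_point:.3f} (95% CI: {rr_lower:.3f}-{rr_upper:.3f})")
```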

This RR is biased away from the null, suggesting a stronger association. If the incidence of the outcome of
interest is common in the study population (say, >10%, Figure 1), as it is here (14.4% in non-SPC patients), the adjusted OR
derived from logistic regression can no longer approximate the RR (Zhang & Yu 1998). When the outcome is common, the OR
estimated using logistic regression will be more extreme (farther from 1.0) than the RR for the same data (Altman et al. 1998; Deeks 1998). In
addition, the proposed CIs for the RR will be too narrow because this approach fails to account for variability in the
baseline risk (Localio et al. 2007; McNutt et al. 2003).

Figure 1. The relationship between relative risk and odds ratios by incidence of the outcome
(Cited from Zhang & Yu 1998)

Adjusted RR using Proc Logistic – Propensity-score matching

Propensity-score matching is frequently used in the medical literature to estimate the effect of treatments and
exposures on health outcomes (Austin 2008). The propensity score is defined as a subject’s probability of receiving the
treatment or the probability of exposure, conditional on his/her observed baseline characteristics. It is usually estimated
using a logistic regression model (Austin et al. 2010). Once the propensity score has been estimated for each subject,
treated and untreated subjects are matched on the propensity score. The most commonly-used method is to form pairs
with similar propensity scores. The next step is to use the standardized difference to examine the balance in measured
baseline variables between treated and untreated subjects. If the balance is acceptable, we then estimate the effect of
treatment on the outcome via the appropriate statistical tests (Austin 2007).

Austin and his colleagues illustrate the detailed SAS coding to obtain the RR using the propensity-score matching method
(Austin et al. 2010). In the present study, 3,114 matched pairs of patients were identified (total n=6,228, less than the
original sample size). Figures 2a and 2b show the similarity between SPC and non-SPC patients before (Figure 2a) and
after (Figure 2b) propensity-score matching.

Figure 2a. Density distribution of propensity scores – Original cohort

Figure 2b. Density distribution of propensity scores – Propensity-score matched cohort

In the present study, there were 32 matched pairs in which both subjects died within one year of the stroke, 2598
matched pairs in which neither subject died within one year of the stroke, 291 matched pairs in which the Non-SPC
patient died and the SPC patient did not die, and 193 matched pairs in which the SPC patient died and the Non-SPC
patient did not die. According to the method proposed by Agresti and Min to estimate the RR and its confidence interval
for matched data (Agresti & Min 2004), the RR of 1-year mortality for SPC patients compared to Non-SPC patients was
0.697, and the 95%CI was 0.594-0.817.

                         SPC
                    Dead        Alive
Non-SPC   Dead      32 (a)      291 (b)
          Alive     193 (c)     2598 (d)

$$\mathrm{RR} = \frac{a+c}{a+b}$$

The 95% CI of the RR is given as

$$\exp\left(\log(\mathrm{RR}) \pm 1.96\sqrt{\frac{b+c}{(a+b)(a+c)}}\right)$$

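
The matched-pair estimate can be verified from the four pair counts reported above. This Python sketch implements the RR and the log-scale interval with variance (b+c)/((a+b)(a+c)); the function name `matched_pair_rr` is illustrative. It reproduces the reported 0.697 (95% CI: 0.594-0.817).

```python
import math

def matched_pair_rr(a, b, c, z=1.96):
    """RR and CI for matched pairs.

    a: pairs in which both patients died
    b: pairs in which only the Non-SPC patient died
    c: pairs in which only the SPC patient died
    """
    rr = (a + c) / (a + b)  # deaths among SPC / deaths among Non-SPC
    se_log_rr = math.sqrt((b + c) / ((a + b) * (a + c)))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Pair counts from the propensity-score matched cohort (3,114 pairs)
rr, lower, upper = matched_pair_rr(a=32, b=291, c=193)
print(f"Matched RR = {rr:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

Note that the concordant pairs in which neither patient died (d=2598) do not enter either formula.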
Adjusted RR using Proc Logistic – Marginal Probability and Bootstrapping
Austin introduced a new method for deriving the adjusted RR from a logistic regression model (Austin 2010b). This
method involves determining the probability of the outcome if each patient in the cohort was treated, and again if each
patient was untreated. These probabilities are then averaged across the study cohort to determine the average
probability of the outcome in the population if all patients were treated, and if they were untreated.

$$\log\left(\frac{\Pr(Y_i=1)}{1-\Pr(Y_i=1)}\right) = \alpha_0 + \beta T_i + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \cdots + \alpha_k X_{ki}$$

Y=1 denoting outcome success (e.g., dead), Y=0 denoting outcome failure (e.g., alive)
T=1 denoting treatment (SPC), T=0 denoting control (Non-SPC)
X1 to Xk denote k confounding covariates
$\beta$ denotes the log-odds ratio, $e^{\beta}$ denotes the odds ratio

We can estimate the probability of the outcome if a given patient was treated, and if the same patient was untreated.

Probability of the outcome if a patient was treated:

$$\frac{e^{\alpha_0 + \beta + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \cdots + \alpha_k X_{ki}}}{1 + e^{\alpha_0 + \beta + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \cdots + \alpha_k X_{ki}}}$$

Probability of the outcome if a patient was untreated:

$$\frac{e^{\alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \cdots + \alpha_k X_{ki}}}{1 + e^{\alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \cdots + \alpha_k X_{ki}}}$$

We then compute the mean probability ($P_{T=1}$) of success in the cohort if all patients were treated, and the mean
probability ($P_{T=0}$) of success in the cohort if all patients were untreated. These are referred to as the marginal
probabilities of success for treated and untreated patients. The adjusted RR is estimated as $P_{T=1}/P_{T=0}$.

Use of marginal probabilities allows one to compare outcomes between two populations whose only difference is the
exposure. Because all patients contribute to both $P_{T=1}$ and $P_{T=0}$, there are no systematic differences in baseline
characteristics between the two populations.
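
The marginal-probability calculation can be sketched in a few lines of Python. Everything numeric here is hypothetical: the coefficients, the two covariates (a male indicator and age in decades), and the five-patient cohort are made up for illustration and are not the fitted model from this study. The sketch only shows the mechanics: predict each patient's risk twice, average each set of predictions, and take the ratio.

```python
import math

def expit(eta):
    """Inverse logit: map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def marginal_relative_risk(alpha0, beta, alphas, covariate_rows):
    """Return (RR, mean risk if all treated, mean risk if all untreated)."""
    treated, untreated = [], []
    for row in covariate_rows:
        # Covariate part of the linear predictor, shared by both scenarios
        baseline = alpha0 + sum(a * x for a, x in zip(alphas, row))
        treated.append(expit(baseline + beta))  # as if T = 1
        untreated.append(expit(baseline))       # as if T = 0
    p_treated = sum(treated) / len(treated)
    p_untreated = sum(untreated) / len(untreated)
    return p_treated / p_untreated, p_treated, p_untreated

# Hypothetical coefficients and covariates (NOT the study's fitted model)
alpha0, beta, alphas = -4.0, -0.3, [0.15, 0.35]
cohort = [(1, 6.5), (0, 7.2), (1, 8.0), (0, 5.5), (1, 9.1)]

rr, p1, p0 = marginal_relative_risk(alpha0, beta, alphas, cohort)
print(f"Marginal RR = {rr:.3f}; odds ratio = {math.exp(beta):.3f}")
```

For a protective treatment (negative $\beta$), the marginal RR lies between the odds ratio $e^{\beta}$ and 1, which is consistent with the OR being the more extreme of the two measures.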

The solution to the multivariate logistic regression model of the study cohort is:

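Log(p/(1-p)) = -3.4767 - 0.3128*SPC + 0.1516*Male + … - 0.00329*LOS

$P_{T=1}$ and $P_{T=0}$ can be computed using the following SAS program:

Data PT_PC;
Set StudyCohort;
/* Linear predictor with SPC fixed at 1 (as if treated), whatever the patient's actual SPC value */
Ln_PT=-3.4767 - 0.3128*1 + 0.1516*Male + ...-0.00329*LOS;
/* Linear predictor with SPC fixed at 0 (as if untreated) */
Ln_PC=-3.4767 + 0.1516*Male + ...-0.00329*LOS;
PT=exp(Ln_PT)/(1+exp(Ln_PT));
PC=exp(Ln_PC)/(1+exp(Ln_PC));
Run;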

However, typing this SAS code is tedious and inefficient. We show an easier way to compute $P_{T=1}$ and $P_{T=0}$. First,
we generate a population cohort which includes both the treated cohort and the control cohort.

Data Population;
Set StudyCohort (in=a)
StudyCohort (in=b);
If a then SPC=1;
If b then SPC=0;
Run;

We then run logistic regression using the score option.

Proc Logistic data=StudyCohort descending;

Class SPC/param=ref ref=first;
Model Death_1year=SPC Var_1-Var_n/rl;
Score data=Population out=Pred_risk;
Run;

We then compute the mean predicted probability of death in the population cohort twice: once as if all patients
were untreated (SPC=0), and once as if all patients were treated (SPC=1). The ratio of the treated mean to the untreated
mean is the estimated RR.

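Proc Means data=Pred_risk nway;
Class SPC;
/* Score names each predicted probability P_<response level>; P_1 assumes Death_1year is coded 0/1 */
Var P_1;
Output out=pop_risk mean=pop_risk;
Run;

Proc Transpose data=pop_risk out=pop_risk prefix=SPC_;
Id SPC;
Var pop_risk;
Run;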

Data pop_risk;
Set pop_risk;
Adjusted_RR=SPC_1/SPC_0;
Run;

Proc Print data=pop_risk;

Var Adjusted_RR;
Run;

The confidence interval of the RR can be estimated using the bootstrap method (Efron & Tibshirani 1993). A bootstrap
sample is a random sample drawn with replacement from the original sample such that the random sample has the
same size as the original sample. Constructing nonparametric bootstrap 95% CIs requires drawing a large number of
bootstrap samples (say 1000 bootstrap samples) and estimating the quantity of interest in each of the bootstrap
samples. The endpoints of the nonparametric bootstrap 95% CIs would be the 2.5th and 97.5th percentiles of that
quantity across the bootstrap samples.
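
The percentile bootstrap described above can be sketched as follows. To keep the example self-contained, the statistic is a plain crude RR computed on a small hypothetical data set, which is a stand-in for the model-based marginal RR used in the paper; in practice the logistic model would be refitted in every bootstrap sample. The function names, the data, and the seed are all illustrative assumptions.

```python
import random

def crude_rr(sample):
    """Crude RR from a list of (treated, outcome) pairs coded 0/1."""
    treated = [y for t, y in sample if t == 1]
    untreated = [y for t, y in sample if t == 0]
    return (sum(treated) / len(treated)) / (sum(untreated) / len(untreated))

def bootstrap_percentile_ci(data, statistic, n_boot=1000, seed=2011):
    """Nonparametric percentile 95% CI: 2.5th and 97.5th percentiles."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[n_boot * 25 // 1000]
    upper = estimates[n_boot * 975 // 1000 - 1]
    return lower, upper

# Hypothetical data (NOT the study cohort): 20/200 events in treated,
# 40/200 events in untreated, so the point estimate is 0.5.
data = [(1, 1)] * 20 + [(1, 0)] * 180 + [(0, 1)] * 40 + [(0, 0)] * 160

point = crude_rr(data)
lower, upper = bootstrap_percentile_ci(data, crude_rr)
print(f"RR = {point:.3f}, bootstrap 95% CI: {lower:.3f}-{upper:.3f}")
```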

Applying the method of Austin (Austin 2010b), we obtained an adjusted RR of 0.789 if all patients were referred to SPC
compared with the case where all patients were not referred to SPC (95% CI: 0.690-0.889). Thus, SPC was associated
with a 21.1% relative decrease (95% CI: 11.1%-31.0%) in the risk of 1-year mortality.

Adjusted RR using Proc Phreg – Time-to-event

Statistically, survival models for time-to-event outcomes are more powerful than logistic models for testing the impact of
treatment for event outcomes. Austin described a method to derive the RR of an event occurring within a specific
duration of follow-up using an adjusted survival model (Austin 2010a). The method allows for the estimation of
measures of treatment effect that may be more clinically meaningful than the adjusted hazard ratio that is obtained
directly from the Cox proportional hazards regression model.

The SAS program is similar to that in the previous section (see Adjusted RR using Proc Logistic – Marginal Probability
and Bootstrapping). The adjusted RR from survival models is 0.769 (95% CI: 0.671-0.865).

/* Dataset population generated above */

Proc Phreg data=StudyCohort;

Model SurvivalTime_1y*Death_1year(0)=SPC Var_1-Var_n/rl;
Baseline out=Pred_risk
covariates=Population
survival=survival/nomean;
Run;

Data Pred_risk;
Set Pred_risk;
Event_risk=1-survival;
Where SurvivalTime_1y=365;
Run;

/* RR calculation omitted (See SAS codes above) */

CONCLUSION
The crude and adjusted RRs of SPC are summarized in Figure 3. The crude RR is further away from 1 than
the adjusted ones. Thus, the impact of SPC is over-estimated by the crude RR, which suggests that risk adjustment
is necessary. Based on the whole study cohort, the point estimates of the adjusted RRs using Poisson regression
(0.777), modified Poisson regression (0.777), Logistic regression (0.789) and Cox proportional hazards model (0.769)
are quite close to one another. These adjusted RRs indicate that the ischemic stroke or TIA patients referred to SPCs
had greater survival than those without a referral to an SPC. Based on the propensity-score matched cohort, the
adjusted RR is 0.697, which suggests an even more positive impact of the SPC on patient survival.

In this article, we describe nine methods to derive adjusted RRs which were developed in the past 25 years, from 1985 to
2010, and illustrate the SAS program code to estimate the adjusted RR with each. Table 2 shows their strengths and
limitations. In general, the structure of the data largely determines which method should be used to estimate the adjusted
RR.

If there is no convergence problem, we can simply use a Log-Binomial model to get the adjusted RR. However, if there is
a convergence problem, we should apply Modified Poisson regression instead. Petersen and Deddens compared the
Log-Binomial model and Modified Poisson regression and found that (1) for very high prevalence and moderate sample size,
the Modified Poisson method yields less biased estimates of the prevalence ratios than the Log-Binomial method; (2)
for moderate prevalence and moderate sample size, the Log-Binomial method yields slightly less biased
estimates than the Modified Poisson method; and (3) in nearly all cases, the Log-Binomial method yields slightly higher
power and smaller standard errors than the Modified Poisson method (Petersen & Deddens 2008).

If computing time is not an issue and both Log-Binomial and Modified Poisson regression models are questionable,
then we can obtain the adjusted RR using a Logistic regression model or Cox proportional hazards regression model.
Using these two models, we are able to get not only the RR, but also the other meaningful measures of treatment effect,
such as the absolute risk reduction, the RR reduction and the number needed to treat (Austin 2010a; Austin 2010b).

Figure 3. Forest Plot - Comparison between RR computing methods

Table 2. Comparison between different RR computing methods

Criteria (column headings): Using Whole Study Cohort; Adjust Continuous Covariates; Converge Problem; Estimate of
Probability can exceed 1; Biased Estimation; Wide Confidence Interval; Time Consuming
[Check-mark cells were lost in extraction; only the methods and references are recoverable.]

Methods of Computing Adjusted Relative Risk      References
Stratified Mantel-Haenszel                       (Greenland & Robins 1985)
Log-Binomial model                               (Wacholder 1986)
Log-Binomial model with negative intercept       (Deddens et al. 2003)
Poisson regression                               (McNutt et al. 2003)
Modified Poisson regression                      (Zou 2004)
Adjusted OR of Logistic regression               (Zhang & Yu 1998)
Propensity-score matching                        (Austin 2008)
Logistic regression                              (Austin 2010b)
Cox proportional hazards regression              (Austin 2010a)


REFERENCES
Agresti,A. & Min,Y. 2004. Effects and non-effects of paired identical observations in comparing proportions with binary
matched-pairs data. Stat Med 23, 65-75.

Altman,D.G., Deeks,J.J. & Sackett,D.L. 1998. Odds ratios should be avoided when events are common. BMJ 317,
1318.

Austin,P.C. 2007. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic
review and suggestions for improvement. J Thorac. Cardiovasc Surg 134, 1128-1135.

Austin,P.C. 2008. The performance of different propensity-score methods for estimating relative risks. J Clin Epidemiol
61, 537-545.

Austin,P.C. 2010a. Absolute risk reductions and numbers needed to treat can be obtained from adjusted survival
models for time-to-event outcomes. J Clin Epidemiol 63, 46-55.

Austin,P.C. 2010b. Absolute risk reductions, relative risks, relative risk reductions, and numbers needed to treat can be
obtained from a logistic regression model. J Clin Epidemiol 63, 2-6.

Austin,P.C., Chiu,M., Ko,D.T., Goeree,R. & Tu,J.V. 2010. Propensity score matching for estimating treatment effects. In:
Analysis of Observational Health Care Data Using SAS (Ed. by [Link], [Link], [Link] & [Link]), pp.
23-49. Cary, NC, SAS Institute Inc.

Daly,L.E. 1998. Confidence limits made easy: interval estimation using a substitution method. Am J Epidemiol 147,
783-790.

Deddens,J.A., Petersen,M.R. & Lei,X. 2003. Estimation of prevalence ratios when PROC GENMOD does not
converge. SAS User Group International Proceedings paper 270-28.

Deddens,J.A. & Petersen,M.R. 2004. Re: "Estimating the relative risk in cohort studies and clinical trials of common
outcomes". Am J Epidemiol 159, 213-214.

Deeks,J. 1998. When can odds ratios mislead? Odds ratios should be used only in case-control studies and logistic
regression analyses. BMJ 317, 1155-1156.

Efron,B. & Tibshirani,R.J. 1993. An Introduction to the Bootstrap. New York, NY: Chapman & Hall.

Greenland,S. & Robins,J.M. 1985. Estimation of a common effect parameter from sparse follow-up data. Biometrics 41,
55-68.

Localio,A.R., Margolis,D.J. & Berlin,J.A. 2007. Relative risks and confidence intervals were easily computed indirectly
from multivariable logistic regression. J Clin Epidemiol 60, 874-882.

McNutt,L.A., Wu,C., Xue,X. & Hafner,J.P. 2003. Estimating the relative risk in cohort studies and clinical trials of
common outcomes. Am J Epidemiol 157, 940-943.

Petersen,M.R. & Deddens,J.A. 2008. A comparison of two methods for estimating prevalence ratios. BMC Med Res
Methodol. 8, 9.

Spiegelman,D. & Hertzmark,E. 2005. Easy SAS calculations for risk or prevalence ratios and differences. Am J
Epidemiol 162, 199-200.

Tian,L. & Liu,K. 2006. Re: "Easy SAS calculations for risk or prevalence ratios and differences". Am J Epidemiol 163,
1157-1158.

Wacholder,S. 1986. Binomial regression in GLIM: estimating risk ratios and risk differences. Am J Epidemiol 123,
174-184.

Zhang,J. & Yu,K.F. 1998. What's the relative risk? A method of correcting the odds ratio in cohort studies of common
outcomes. JAMA 280, 1690-1691.

Zou,G. 2004. A modified poisson regression approach to prospective studies with binary data. Am J Epidemiol 159,
702-706.

ACKNOWLEDGEMENTS

We thank Jennifer Waller, Ruth Croxford and Paul Cascagnette for helpful comments on the manuscript. We wish to
acknowledge the helpful comments of Peter Austin, Kelvin Lam and Hong Zheng. This study used data from the
Registry of Canadian Stroke Network (RCSN). The RCSN is funded by the Canadian Stroke Network and the Ontario
Ministry of Health and Long-Term Care. The Institute for Clinical Evaluative Sciences is supported by an operating grant
from the Ontario Ministry of Health and Long-Term Care.

CONTACT INFORMATION

Jiming Fang, PhD

Program Lead Analyst - Cardiovascular
Institute for Clinical Evaluative Sciences
2075 Bayview Avenue, G106
Toronto, Ontario M4N 3M5
Canada
Work Phone: 416-480-6100 Ext. 3613
Fax: 416-480-6048
E-mail: [Link]@[Link]
Web : [Link]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute
Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.

Common questions

Log-Binomial regression in SAS often faces convergence issues, especially when many covariates are included. The Log link function restricts the probability space, leading to convergence failure in practice. To address this, reducing the covariate number or starting with a negative intercept can assist convergence. Poisson regression serves as an alternative with no convergence issues, albeit with potential probabilities exceeding 1 .

Researchers may prefer Cox proportional hazards regression because it provides a direct estimate of the relative risk of event occurrence over time, making it more suitable for time-to-event data. It also allows computation of other measures like absolute risk reduction and number needed to treat, which may be more meaningful than the odds ratios provided by logistic regression, especially when outcomes are common .

Adjusted relative risk is crucial for evaluating the impact of SPC referrals on post-stroke mortality. It accounts for confounding factors like demographic and clinical characteristics, providing a more accurate picture of the SPC's effect on outcomes. For example, the adjusted RR indicated that SPC referral was associated with a significant reduction in 1-year mortality, underscoring the importance of targeted follow-up care in improving patient outcomes .

Adjusted relative risks are preferred over odds ratios in cohort studies with common outcomes because they provide a more direct and interpretable measure of association. Odds ratios can overestimate the strength of an association in such scenarios, potentially leading to misleading conclusions about the effect size .

Adjusted relative risk estimates, like those obtained from modified Poisson or logistic regression, indicate a lower risk of 1-year mortality for patients referred to secondary prevention clinics compared to those not referred. For instance, an RR of 0.789 suggests a 21.1% relative decrease in 1-year mortality for referred patients, highlighting the positive impact of follow-up care on patient outcomes .

The bootstrap method offers flexibility and does not assume normality, providing better estimates of confidence intervals, particularly in small samples or complex models. However, it can be computationally intensive and time-consuming, potentially limiting its use with large datasets or in time-sensitive analyses .

The Modified Poisson regression model, which includes robust error variance, mitigates the issue of overly conservative confidence intervals present in traditional Poisson regression, ensuring correct CI coverage. Despite this, it may still encounter limitations when outcomes are common, as point estimates might exceed 1. This occurs because the log link does not constrain probabilities within the 0 to 1 range .

SAS procedures such as FREQ, GENMOD, LOGISTIC, and PHREG can be used to compute adjusted relative risks (RR) in observational cohort studies, where treated and untreated subjects often differ systematically. Adjusting for confounders gives a more accurate estimate of the treatment effect on a dichotomous outcome. For example, FREQ provides the crude RR and a stratified Mantel-Haenszel adjusted RR, while GENMOD handles multiple covariates through log-binomial or Poisson regression. LOGISTIC produces adjusted odds ratios that can be converted to relative risks, and PHREG is used for time-to-event data.
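The two FREQ-based estimates can be sketched as below. The names Death (0/1 outcome) and AgeGroup (stratification variable) are assumptions for illustration. Sorting in descending order and using ORDER=DATA is one way to put SPC=1 in row 1 and Death=1 in column 1, so that the reported column 1 relative risk compares referred with non-referred patients.

```sas
/* ASSUMPTIONS: Death (0/1) and AgeGroup are illustrative variable names */
proc sort data=StudyCohort out=Sorted;
  by descending SPC descending Death;
run;

proc freq data=Sorted order=data;
  tables SPC*Death / relrisk;         /* crude RR (read the Col1 risk row) */
  tables AgeGroup*SPC*Death / cmh;    /* Mantel-Haenszel RR adjusted for AgeGroup */
run;
```

Stratification works well for one or two categorical confounders; with many covariates the strata become sparse, which is where the regression-based procedures come in.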

Propensity-score matching aims to balance measured covariates between treated and control groups, mimicking randomization in observational studies. It estimates the treatment effect by comparing outcomes in the matched groups, whereas logistic regression adjusts for covariates within the outcome model. Propensity-score matching can be less biased and can give a more intuitive understanding of causal effects, but it might be less effective if there are complex treatment-covariate interactions.
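The first step, estimating each patient's propensity score, can be sketched with PROC LOGISTIC. The covariate list here is an assumption for illustration (Age is a hypothetical name), and the matching step itself is not shown.

```sas
/* Model the probability of SPC referral given baseline covariates.
   ASSUMPTION: the covariate list is illustrative, not the paper's. */
proc logistic data=StudyCohort descending;
  model SPC = Age Male LOS;
  output out=PScore predicted=PS;   /* PS = estimated propensity score */
run;
```

Treated and untreated patients with similar PS values are then matched, and the RR is computed in the matched cohort.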

Although the Poisson regression model converged, it has limitations. It can produce predicted probabilities greater than 1, which are invalid. The Poisson distribution assumption also does not hold for a binary outcome, so the model-based confidence intervals tend to be too wide.


Crude RR using Proc Freq

Proc Freq data=StudyCohort;

Adjusted RR using Proc Freq – Stratified Mantel-Haenszel

Proc Freq data=StudyCohort;

Table 1. Baseline comparisons

Variable Value No-SPC SPC P-value

Adjusted RR using Proc GenMod – Log-Binomial regression Model

Proc GenMod data=StudyCohort descending;

WARNING: The specified model did not converge.

Proc GenMod data=StudyCohort descending;

WARNING: The relative Hessian convergence criterion of 0.126294346 is greater than

Adjusted RR using Proc GenMod – Poisson regression model

Proc GenMod data=StudyCohort descending;

Adjusted RR using Proc GenMod – Modified Poisson regression model

Proc GenMod data=StudyCohort descending;

Adjusted RR using Proc Logistic – OR-to-RR formula

Proc Logistic data=StudyCohort descending;

Adjusted OR=0.731 (95% CI: 0.618-0.866)
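The conversion from an odds ratio to a relative risk is commonly written as RR = OR / ((1 - P0) + P0*OR), where P0 is the risk of the outcome in the unexposed group. The sketch below applies it to the adjusted OR of 0.731; the value of P0 is a hypothetical placeholder, so the resulting RR is illustrative only and is not the paper's estimate.

```sas
data OR_to_RR;
  OddsRatio = 0.731;   /* adjusted OR reported above */
  P0 = 0.10;           /* ASSUMPTION: hypothetical risk in the no-SPC group */
  /* OR-to-RR conversion: RR = OR / ((1 - P0) + P0*OR) */
  RelRisk = OddsRatio / ((1 - P0) + P0 * OddsRatio);
  put RelRisk= 6.3;    /* writes the converted RR to the SAS log */
run;
```

The larger P0 is, the further the converted RR moves from the OR toward 1.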

Adjusted RR using Proc Logistic – Propensity-score matching

Figure 2a. Density distribution of propensity scores – Original cohort

Figure 2b. Density distribution of propensity scores – Propensity-score matched cohort

Probability of the outcome if a patient was treated:

characteristics between the two populations.

Log(p/(1-p)) = -3.4767 - 0.3128*SPC + 0.1516*Male + … - 0.00329*LOS

We then run logistic regression using the score option.

Proc Logistic data=StudyCohort descending;

Proc Means data=Pred_risk nway;

Proc Transpose data=pop_risk out=pop_risk prefix=SPC_;

Proc Print data=pop_risk;

Adjusted RR using Proc Phreg – Time-to-event

/* Dataset population generated above */

Proc Phreg data=StudyCohort descending;

/* RR calculation omitted (See SAS codes above) */

Figure 3. Forest Plot - Comparison between RR computing methods

Table 2. Comparison between different RR computing methods

Adjusted Relative Risk

Jiming Fang, PhD

Common questions

What are the challenges and solutions when using Log-Binomial regression in SAS to estimate relative risks?

Why might researchers prefer using a Cox proportional hazards regression model over logistic regression when estimating adjusted relative risks in time-to-event data?

Discuss the role of adjusted relative risk in the context of evaluating secondary prevention clinic (SPC) referrals post-stroke using the Registry of Canadian Stroke Network data.

Why might adjusted relative risks be preferred over odds ratios in cohort studies with common outcomes?

In the context of the Canadian Stroke Network study, how do adjusted relative risk estimates inform the evaluation of follow-up care's impact on mortality?

What are the advantages and disadvantages of using a bootstrap method to estimate the confidence interval for the adjusted relative risk?

How does the Modified Poisson regression model mitigate issues seen with traditional Poisson regression, and what are its limitations?

How can SAS procedures like FREQ, GENMOD, LOGISTIC, and PHREG be used to address systematic differences in cohort studies?

How does propensity-score matching compare with logistic regression for estimating adjusted relative risk?

What limitations did researchers face when using the Poisson regression model, even though it converged successfully without issues?
