
Multiple logistic regression models: an example

Introduction

ML estimation
  Dummy variable coding schemes
  disease vs. age, area & status - R summary output
  disease vs. age, area & status - Estimated odds ratios

Hypothesis testing
  $H_0: \beta_{\text{age}} = \beta_{\text{Sect2}} = \beta_{\text{Middle}} = \beta_{\text{Upper}} = 0$
  Likelihood ratio test
  $H_0: \beta_{\text{Middle}} = \beta_{\text{Upper}} = 0$
  Likelihood ratio test
  $H_0: \beta_{\text{age}} = 0$
  Likelihood ratio test

Choice of the dummy coding schemes
  Change in the coding scheme for the dependent variable
  disease vs. age, area & status - R summary output
  Change in the coding scheme for a regressor
  disease vs. age, area & status - R summary output

Introduction
In a health study investigating an epidemic outbreak of a mosquito-borne disease in a city, 98 individuals were randomly sampled (see also Kutner et al., 2005, Chapter 14). For each individual, the following variables were recorded:

■ disease: absence/presence of specific symptoms associated with the disease,
■ age: age of the individual (years),
■ area: sector of the city in which the individual lives (two categories: sector 1/sector 2),
■ status: socio-economic status of the individual's household (three categories: lower/middle/upper).

Is there a significant association between the presence of the disease symptoms and any of the regressors?

Stat. Mod. Giuliano Galimberti – 2

ML estimation
Dummy variable coding schemes
■ disease (response variable):
Category    y
absence     0
presence    1
⇒ $\pi_i = P(y_i = 1) = P(\text{disease}_i = \text{present}), \quad i = 1, \dots, n$
■ area:
Category    areaSect2
Sector 1    0
Sector 2    1
■ status:
Category    statusMiddle    statusUpper
Lower       0               0
Middle      1               0
Upper       0               1
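The schemes above are treatment (reference-cell) codings: every non-reference category gets its own 0/1 indicator, and the reference category (Sector 1, Lower) is encoded as all zeros. A minimal Python sketch of that construction follows. The helper names `dummy_code` and `encode`, the level labels `"Sect1"`/`"Sect2"` and the example record are hypothetical; this only illustrates the coding and is not how R builds its model matrix internally.

```python
# Hypothetical level labels; the notes call them "Sector 1/Sector 2" and
# "Lower/Middle/Upper".
AREA_LEVELS = ["Sect1", "Sect2"]
STATUS_LEVELS = ["Lower", "Middle", "Upper"]


def dummy_code(prefix, value, levels, reference):
    """Treatment coding: one 0/1 column per non-reference level.

    The reference level is encoded as all zeros; column names follow the
    R-style pattern prefix + level (e.g. "statusMiddle").
    """
    if value not in levels:
        raise ValueError(f"unknown level for {prefix}: {value!r}")
    return {prefix + lvl: int(value == lvl) for lvl in levels if lvl != reference}


def encode(record):
    """Turn one (hypothetical) raw record into the response y and the regressors."""
    row = {"y": 1 if record["disease"] == "present" else 0, "age": record["age"]}
    row.update(dummy_code("area", record["area"], AREA_LEVELS, reference="Sect1"))
    row.update(dummy_code("status", record["status"], STATUS_LEVELS, reference="Lower"))
    return row


if __name__ == "__main__":
    example = {"disease": "present", "age": 40, "area": "Sect2", "status": "Upper"}
    print(encode(example))
    # {'y': 1, 'age': 40, 'areaSect2': 1, 'statusMiddle': 0, 'statusUpper': 1}
```

Passing `reference="Upper"` to `dummy_code` for status gives indicators for Lower and Middle instead, i.e. the same structure as the alternative status coding discussed at the end of these notes (with different column names).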

disease vs. age, area & status - R summary output
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.618 0.613 -4.270 0.000
age 0.030 0.014 2.203 0.028
areaSect2 1.575 0.502 3.139 0.002
statusMiddle 0.714 0.654 1.092 0.275
statusUpper 0.305 0.604 0.505 0.613

Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 101.05 on 93 degrees of freedom
AIC: 111.05

Note that:
■ the null deviance corresponds, up to a constant, to minus twice the maximized log-likelihood for a logistic regression model
that contains only the intercept (without regressors)
■ the residual deviance corresponds, up to a constant, to minus twice the maximized log-likelihood of the fitted model
■ for a multiple logistic regression model, AIC is given, up to a constant, by the residual deviance plus twice the number of model
parameters
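As a quick check of the last remark, the short Python sketch below recomputes AIC from the residual deviances and the numbers of estimated coefficients (intercept included) reported in the R outputs of these notes; the reduced models are the ones fitted in the hypothesis-testing slides. For ungrouped 0/1 responses the saturated model has log-likelihood zero, so here the "constant" vanishes and the match is exact up to rounding. The function name `aic` is just an illustrative choice.

```python
# Residual deviance and number of estimated coefficients (intercept included),
# copied from the R outputs in these notes.
MODELS = {
    "disease ~ 1": (122.32, 1),
    "disease ~ age + area": (102.26, 3),
    "disease ~ area + status": (106.20, 4),
    "disease ~ age + area + status": (101.05, 5),
}


def aic(residual_deviance, n_params):
    """AIC = residual deviance + 2 * number of parameters (ungrouped binary data)."""
    return residual_deviance + 2 * n_params


for formula, (dev, p) in MODELS.items():
    print(f"{formula:32s} AIC = {aic(dev, p):.2f}")
```

The printed values (124.32, 108.26, 114.20, 111.05) agree with the AIC lines of the corresponding R outputs.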


disease vs. age, area & status - Estimated odds ratios


Regressor       $b_k$    $\exp(b_k)$
age 0.030 1.0302
areaSect2 1.575 4.8295
statusMiddle 0.714 2.0422
statusUpper 0.305 1.3570

■ the odds of an individual having contracted the disease increase by about 3.0 percent with each additional year of age, for given
city sector location and socio-economic status
■ the odds of an individual from sector 2 having contracted the disease are almost five times as great as for an individual from
sector 1, for given age and socio-economic status
■ the odds of an individual with middle socio-economic status having contracted the disease are about twice as great as for an
individual with lower socio-economic status, for given age and city sector location
■ the odds of an individual with upper socio-economic status having contracted the disease are about 36 percent larger than the
odds of an individual with lower socio-economic status, for given age and city sector location
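The odds ratios above are simply $\exp(b_k)$. A short Python sketch computes them from the estimates in the R summary output, together with 95% Wald confidence intervals $\exp(b_k \mp 1.96\, s[b_k])$; those intervals are not reported in these notes and are added only as an illustration of the standard Wald construction. Because the printed estimates are rounded to three decimals, the fourth significant digit of $\exp(b_k)$ can differ slightly from the table (e.g. 1.0305 instead of 1.0302 for age). The helper name `odds_ratio_with_ci` is hypothetical.

```python
import math

# Estimates b_k and standard errors s[b_k] as printed (3 decimals) in the R summary output.
COEFS = {
    "age": (0.030, 0.014),
    "areaSect2": (1.575, 0.502),
    "statusMiddle": (0.714, 0.654),
    "statusUpper": (0.305, 0.604),
}
Z_975 = 1.959964  # 0.975 quantile of the standard normal distribution


def odds_ratio_with_ci(b, se, z_crit=Z_975):
    """Return exp(b) and the 95% Wald interval exp(b -/+ z_crit * se)."""
    return math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se)


for name, (b, se) in COEFS.items():
    odds_ratio, lower, upper = odds_ratio_with_ci(b, se)
    print(f"{name:13s} exp(b) = {odds_ratio:.3f}   95% CI = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes 1 (as for areaSect2) corresponds to a Wald test rejecting $\beta_k = 0$ at the 5% level; the intervals for the two status dummies include 1, in line with their large p-values.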

Hypothesis testing

$H_0: \beta_{\text{age}} = \beta_{\text{Sect2}} = \beta_{\text{Middle}} = \beta_{\text{Upper}} = 0$


■ Full model:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.618 0.613 -4.270 0.000
age 0.030 0.014 2.203 0.028
areaSect2 1.575 0.502 3.139 0.002
statusMiddle 0.714 0.654 1.092 0.275
statusUpper 0.305 0.604 0.505 0.613
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 101.05 on 93 degrees of freedom
AIC: 111.05
■ Reduced model:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.771 0.217 -3.548 0.000
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 122.32 on 97 degrees of freedom
AIC: 124.32
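In the intercept-only (reduced) model the ML estimate of $\beta_0$ is the logit of the sample proportion of cases, and the null deviance is minus twice the corresponding binomial log-likelihood. The notes do not report how many of the 98 individuals show the symptoms; the reported intercept −0.771 is consistent with 31 cases, since $\log(31/67) \approx -0.771$. The Python sketch below treats that count as an assumption back-derived from the output, not as given data.

```python
import math

n = 98
cases = 31  # ASSUMPTION: back-derived from the intercept -0.771; not reported in the notes

p_hat = cases / n
b0 = math.log(p_hat / (1 - p_hat))                  # ML estimate = logit of the sample proportion
se_b0 = math.sqrt(1 / (n * p_hat * (1 - p_hat)))    # asymptotic standard error of b0
# With no regressors every fitted probability equals p_hat, so -2 * maximized log-likelihood is:
null_deviance = -2 * (cases * math.log(p_hat) + (n - cases) * math.log(1 - p_hat))

print(f"b0 = {b0:.3f}, s[b0] = {se_b0:.3f}, null deviance = {null_deviance:.2f}")
```

Under this assumption the sketch reproduces the intercept (−0.771), its standard error (0.217) and the null deviance (122.32) shown in the reduced-model output.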


Likelihood ratio test


Model Resid. Df Resid. Dev Df Deviance Pr(>Chi)
disease~1 97 122.32
disease~age+sector+status 93 101.05 4 21.26 0.0003

$2 \ln \frac{L(F)}{L(R)} = -2\left[\ln L(R) - \ln L(F)\right] = 122.32 - 101.05 \approx 21.26$ (the table value is computed from the unrounded deviances)

■ At least one of the three regressors is significantly associated with the presence of the disease (p = 0.0003, so H0 is rejected even at significance level α = 0.01)
■ Note that the degrees of freedom for this test statistic are equal to 4, since 4 regression coefficients are set equal to 0,
according to H0
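The likelihood ratio statistic is the difference between the residual deviances of the reduced and full models, referred to a χ² distribution whose degrees of freedom equal the difference in residual degrees of freedom. The Python sketch below reproduces the test above using only the standard library, so the χ² upper tail is coded by hand through the closed forms of the regularized incomplete gamma function that hold for integer degrees of freedom (in practice a statistics library would be used instead). The helper names are hypothetical. Starting from the rounded deviances gives 21.27 rather than the 21.26 printed by R.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1.

    Closed forms of the regularized upper incomplete gamma Q(df/2, x/2):
      even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
      odd df:  erfc(sqrt(x/2)) + sum_{j < (df-1)/2} (x/2)^(j+1/2) exp(-x/2) / Gamma(j+3/2)
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y**j / math.factorial(j) for j in range(df // 2))
    tail = math.erfc(math.sqrt(y))
    tail += sum(y ** (j + 0.5) * math.exp(-y) / math.gamma(j + 1.5)
                for j in range((df - 1) // 2))
    return tail


def likelihood_ratio_test(dev_reduced, df_reduced, dev_full, df_full):
    """G2 = D(R) - D(F), compared with chi-square on df_reduced - df_full df."""
    g2 = dev_reduced - dev_full
    df = df_reduced - df_full
    return g2, df, chi2_sf(g2, df)


# disease ~ 1 (reduced) versus disease ~ age + area + status (full)
g2, df, p = likelihood_ratio_test(122.32, 97, 101.05, 93)
print(f"G2 = {g2:.2f} on {df} df, p = {p:.4f}")
```

The same `likelihood_ratio_test` call, with the deviances and residual degrees of freedom of the other reduced models, covers the two tests that follow.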

$H_0: \beta_{\text{Middle}} = \beta_{\text{Upper}} = 0$
■ Full model:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.618 0.613 -4.270 0.000
age 0.030 0.014 2.203 0.028
areaSect2 1.575 0.502 3.139 0.002
statusMiddle 0.714 0.654 1.092 0.275
statusUpper 0.305 0.604 0.505 0.613
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 101.05 on 93 degrees of freedom
AIC: 111.05
■ Reduced model:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.335 0.511 -4.569 0.000
age 0.029 0.013 2.224 0.026
areaSect2 1.673 0.487 3.434 0.001
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 102.26 on 95 degrees of freedom
AIC: 108.26


Likelihood ratio test


Model Resid. Df Resid. Dev Df Deviance Pr(>Chi)
disease~age+sector 95 102.26
disease~age+sector+status 93 101.05 2 1.21 0.5474

$2 \ln \frac{L(F)}{L(R)} = -2\left[\ln L(R) - \ln L(F)\right] = 102.26 - 101.05 = 1.21$

■ There are no significant differences in the probability of having the disease among the three categories of socio-economic
status, for given age and city sector location (p = 0.547)
■ Note that the degrees of freedom for this test statistic are equal to 2, since 2 regression coefficients are set equal to 0, in order
to exclude the socio-economic status from the full model
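With 2 degrees of freedom the χ² upper tail has the simple closed form $P(X > x) = e^{-x/2}$, and the 5% critical value is $-2\ln 0.05 \approx 5.99$. The Python sketch below applies this to the rounded deviances; it gives p ≈ 0.546, slightly different from the 0.5474 in the table, which R computes from unrounded deviances.

```python
import math

# disease ~ age + area (reduced) versus disease ~ age + area + status (full)
dev_reduced, dev_full = 102.26, 101.05
g2 = dev_reduced - dev_full           # likelihood ratio statistic
df = 95 - 93                          # the two status dummies are dropped
p_value = math.exp(-g2 / 2)           # chi-square(2) upper tail: P(X > x) = exp(-x/2)
critical_05 = -2 * math.log(0.05)     # chi-square(2) 0.95 quantile, about 5.99

print(f"G2 = {g2:.2f}, df = {df}, p = {p_value:.3f}, reject at 5%: {g2 > critical_05}")
```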

$H_0: \beta_{\text{age}} = 0$
■ Full model:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.618 0.613 -4.270 0.000
age 0.030 0.014 2.203 0.028
areaSect2 1.575 0.502 3.139 0.002
statusMiddle 0.714 0.654 1.092 0.275
statusUpper 0.305 0.604 0.505 0.613
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 101.05 on 93 degrees of freedom
AIC: 111.05
■ Reduced model:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.917 0.481 -3.984 0.000
areaSect2 1.620 0.486 3.336 0.001
statusMiddle 0.713 0.636 1.120 0.263
statusUpper 0.478 0.583 0.820 0.412
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 106.20 on 94 degrees of freedom
AIC: 114.2


Likelihood ratio test


Model Resid. Df Resid. Dev Df Deviance Pr(>Chi)
disease~sector+status 94 106.20
disease~age+sector+status 93 101.05 1 5.15 0.0233

$2 \ln \frac{L(F)}{L(R)} = -2\left[\ln L(R) - \ln L(F)\right] = 106.20 - 101.05 = 5.15$
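$\left(\frac{b_{\text{age}}}{s[b_{\text{age}}]}\right)^2 = \frac{b_{\text{age}}^2}{s^2[b_{\text{age}}]} \cong 4.854$

This squared Wald statistic is the square of the z value of age in the full model ($2.203^2 \approx 4.85$; the third decimal depends on the unrounded estimates). It is close to, but not identical to, the likelihood ratio statistic 5.15; under H0 both are approximately χ² distributed with 1 degree of freedom.

The age of an individual has a significant effect on the probability of having the disease, for given city sector location and
socio-economic status, at significance level α = 0.05 (p = 0.023), but not at α = 0.01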
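With 1 degree of freedom, a χ² variable is the square of a standard normal, so $P(X > x) = P(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$. The Python sketch below compares the likelihood ratio and Wald tests for $\beta_{\text{age}} = 0$ on that basis, using the deviances and the z value printed in the outputs; the helper name `chi2_1_sf` is hypothetical.

```python
import math


def chi2_1_sf(x):
    """P(X > x) for X ~ chi-square with 1 df, i.e. P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2))


lr_stat = 106.20 - 101.05     # likelihood ratio statistic for H0: beta_age = 0
z_age = 2.203                 # Wald z value of age in the full model
wald_stat = z_age ** 2        # squared Wald statistic

for name, stat in [("LR", lr_stat), ("Wald", wald_stat)]:
    p = chi2_1_sf(stat)
    print(f"{name:4s} stat = {stat:.3f}, p = {p:.4f}, "
          f"reject at 0.05: {p < 0.05}, reject at 0.01: {p < 0.01}")
```

Both tests lead to the same conclusion here (reject at 0.05, not at 0.01); in small samples the two statistics can disagree more, and the likelihood ratio test is usually preferred.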

Choice of the dummy coding schemes

Change in the coding scheme for the dependent variable


■ Original coding scheme:
Category    y
absence     0
presence    1
■ Alternative coding scheme:
Category    y
absence     1
presence    0
⇒ $\pi_i = P(y_i = 1) = P(\text{disease}_i = \text{absent}), \quad i = 1, \dots, n$


disease vs. age, area & status - R summary output


■ Original coding scheme:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.618 0.613 -4.270 0.000
age 0.030 0.014 2.203 0.028
areaSect2 1.575 0.502 3.139 0.002
statusMiddle 0.714 0.654 1.092 0.275
statusUpper 0.305 0.604 0.505 0.613
Residual deviance: 101.05 on 93 degrees of freedom
■ Alternative coding scheme:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.618 0.613 4.270 0.000
age -0.030 0.014 -2.203 0.028
areaSect2 -1.575 0.502 -3.139 0.002
statusMiddle -0.714 0.654 -1.092 0.275
statusUpper -0.305 0.604 -0.505 0.613
Residual deviance: 101.05 on 93 degrees of freedom

The two models are equivalent (they have the same residual deviance): the change in the coding scheme only flips the signs of
the estimated regression coefficients and of their z values, while standard errors and p-values are unchanged
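Why only the signs change: if $\pi$ is the probability of presence, the recoded response models $1 - \pi$, and $\operatorname{logit}(1 - \pi) = \log\frac{1 - \pi}{\pi} = -\operatorname{logit}(\pi)$. The linear predictor, and hence every coefficient, changes sign, while the fitted probabilities of the two outcomes are unchanged. A small Python check with the full-model estimates; the individual described by `x` (age 45, sector 2, upper status) is hypothetical.

```python
import math


def inv_logit(eta):
    """Logistic function: maps a linear predictor to a probability."""
    return 1 / (1 + math.exp(-eta))


# Coefficients in the order: intercept, age, areaSect2, statusMiddle, statusUpper
b_presence = [-2.618, 0.030, 1.575, 0.714, 0.305]   # y = 1 means disease present
b_absence = [-b for b in b_presence]                # y = 1 means disease absent

x = [1, 45, 1, 0, 1]   # hypothetical individual: age 45, sector 2, upper status

eta = sum(b * xi for b, xi in zip(b_presence, x))
eta_alt = sum(b * xi for b, xi in zip(b_absence, x))
p_present = inv_logit(eta)
p_absent = inv_logit(eta_alt)
print(f"P(present) = {p_present:.3f}, P(absent) = {p_absent:.3f}, "
      f"sum = {p_present + p_absent:.3f}")
```

Since each observation contributes the same probability to the likelihood under either coding, the maximized log-likelihood, and therefore the residual deviance, is identical.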

Change in the coding scheme for a regressor
status
■ Original coding scheme:
Category    statusMiddle    statusUpper
Lower       0               0
Middle      1               0
Upper       0               1
Reference category: Lower
■ Alternative coding scheme:
Category    status1    status2
Lower       1          0
Middle      0          1
Upper       0          0
Reference category: Upper


disease vs. age, area & status - R summary output


■ Original coding scheme:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.618 0.613 -4.270 0.000
age 0.030 0.014 2.203 0.028
areaSect2 1.575 0.502 3.139 0.002
statusMiddle 0.714 0.654 1.092 0.275
statusUpper 0.305 0.604 0.505 0.613
Residual deviance: 101.05 on 93 degrees of freedom
■ Alternative coding scheme:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.313 0.643 -3.599 0.000
age 0.030 0.014 2.203 0.028
areaSect2 1.575 0.502 3.139 0.002
status1 -0.305 0.604 -0.505 0.613
status2 0.409 0.599 0.682 0.495
Residual deviance: 101.05 on 93 degrees of freedom

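The two models are equivalent (they have the same residual deviance): the change in the coding scheme for status affects only the
intercept and the coefficients of the status dummy variables. With Upper as the reference category, the new intercept is
−2.618 + 0.305 = −2.313, the coefficient of status1 (Lower vs. Upper) is −0.305, and the coefficient of status2 (Middle vs. Upper)
is 0.714 − 0.305 = 0.409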
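Equivalence can also be checked directly: for every status category the two parametrizations must give the same linear predictor, and hence the same fitted probability. The Python sketch below evaluates both linear predictors with the estimates printed in the two outputs; the values of age and sector passed in the loop are hypothetical.

```python
# Full-model coefficients under the two codings of status (from the R outputs above).
ORIG = {"intercept": -2.618, "age": 0.030, "areaSect2": 1.575,
        "statusMiddle": 0.714, "statusUpper": 0.305}
ALT = {"intercept": -2.313, "age": 0.030, "areaSect2": 1.575,
       "status1": -0.305, "status2": 0.409}


def eta_orig(age, sect2, status):
    """Linear predictor with Lower as the reference category."""
    return (ORIG["intercept"] + ORIG["age"] * age + ORIG["areaSect2"] * sect2
            + ORIG["statusMiddle"] * (status == "Middle")
            + ORIG["statusUpper"] * (status == "Upper"))


def eta_alt(age, sect2, status):
    """Linear predictor with Upper as the reference category."""
    return (ALT["intercept"] + ALT["age"] * age + ALT["areaSect2"] * sect2
            + ALT["status1"] * (status == "Lower")
            + ALT["status2"] * (status == "Middle"))


for status in ("Lower", "Middle", "Upper"):
    a, b = eta_orig(40, 1, status), eta_alt(40, 1, status)
    print(f"{status:6s} logit: original = {a:.3f}, alternative = {b:.3f}")
```

With the rounded estimates the two columns coincide exactly for each category, which is the numerical counterpart of the identical residual deviances.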
