Chapter 2

This document discusses Generalized Linear Models (GLMs), which extend ordinary regression models to accommodate nonnormal response distributions and multiple explanatory variables. It outlines the three components of GLMs: the random component (response variable and its distribution), the systematic component (explanatory variables in a linear predictor), and the link function (connecting the random and systematic components). The document also introduces specific GLMs for binary and count data, including logistic regression and Poisson loglinear models.


Generalized Linear Models

In the previous chapter we focused on methods for two-way contingency tables. Most studies, however, have several explanatory variables, which may be continuous as well as categorical. The goal is usually to describe their effects on response variables, and modeling the effects helps us do this efficiently. A good-fitting model evaluates effects, includes relevant interactions, and provides smoothed estimates of response probabilities. This chapter focuses on model building for categorical response variables. It introduces a family of generalized linear models that contains the most important models for categorical responses as well as standard models for continuous responses.

2.1 GENERALIZED LINEAR MODELS

Generalized linear models (GLMs) extend ordinary regression models to encompass nonnormal response distributions and modeling of functions of the mean. Three components specify a generalized linear model: a random component identifies the response variable Y and its probability distribution; a systematic component specifies the explanatory variables used in a linear predictor function; and a link function specifies the function of E(Y) that the model equates to the systematic component. Nelder and Wedderburn (1972) introduced the class of GLMs.

2.1.1 Components of Generalized Linear Models

The random component of a GLM consists of a response variable Y with independent observations (y_1, . . . , y_N) from a distribution in the natural exponential family. This family has probability density function or mass function of the form

f(y_i; θ_i) = a(θ_i) b(y_i) exp[ y_i Q(θ_i) ]     (eq. 1)

Several important distributions are special cases, including the Poisson and binomial. The value of the parameter θ_i may vary for i = 1, 2, . . . , N as a function of values of the explanatory variables. The term Q(θ_i) is called the natural parameter of the distribution.

The systematic component of a GLM relates a vector (η_1, η_2, . . . , η_N) to the explanatory variables through a linear model. Let x_ij denote the value of explanatory variable j (j = 0, 1, 2, . . . ) for subject i. Then

η_i = Σ_j β_j x_ij,   i = 1, 2, . . . , N

This linear combination of explanatory variables is called the linear predictor.

The third component of a GLM is a link function that connects the random and systematic components. Let μ_i = E(Y_i), i = 1, 2, . . . , N. The model links μ_i to η_i by η_i = g(μ_i), where the link function g is a monotonic, differentiable function. Thus, g links μ_i to the explanatory variables through the formula

g(μ_i) = Σ_j β_j x_ij,   i = 1, 2, . . . , N

The link function g(μ_i) = μ_i, called the identity link, has η_i = μ_i. It specifies a linear model for the mean itself. This is the link function for ordinary regression with normally distributed Y. The link function that transforms the mean to the natural parameter is called the canonical link. For it, g(μ_i) = Q(θ_i), and Q(θ_i) = Σ_j β_j x_ij.
In summary, a GLM is a linear model for a transformed mean of a response variable that has distribution in the
natural exponential family. We now illustrate the three components by introducing the key GLMs for discrete
response variables.

2.1.2 Binomial Logit Models for Binary Data


Many categorical response variables have only two categories. The observation for each subject might be classified as a "success" or a "failure". Represent these outcomes by 1 and 0. The Bernoulli distribution for binary random variables specifies probabilities for the two outcomes, for which P(Y = 1) = π and P(Y = 0) = 1 − π.

When Y has a Bernoulli distribution with parameter π, the probability mass function is

f(y; π) = π^y (1 − π)^(1−y) = (1 − π) [π/(1 − π)]^y = (1 − π) exp[ y log(π/(1 − π)) ]

for y = 0 and 1. This is in the natural exponential family, identifying θ with π, a(π) = 1 − π, b(y) = 1, and Q(π) = log[π/(1 − π)]. The natural parameter is the log odds of response outcome 1, the logit of π. This is the canonical link function. GLMs using the logit link are often called logistic regression models or logit models.
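As a quick numerical illustration (a minimal Python sketch, not from the text), the logit link and its inverse, the logistic function, can be written as:

```python
import math

def logit(p):
    """Canonical link for the Bernoulli GLM: the log odds of p."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link (logistic function): maps a linear predictor back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# The two functions are inverses of each other
p = 0.3
print(inv_logit(logit(p)))  # recovers p
```

Because inv_logit always returns a value in (0, 1), any linear predictor value is mapped to a legitimate probability.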

2.1.3 Poisson Loglinear Models for Count Data

Some response variables have counts as their possible outcomes. In a health survey, each observation might be the number of illnesses in the past year for which the subject visited a doctor. Counts also occur as entries in contingency tables.

The simplest distribution for count data is the Poisson. Like counts, Poisson variates can take any nonnegative integer value. Let Y denote a count and let μ = E(Y). The Poisson probability mass function for a count Y is

f(y; μ) = e^(−μ) μ^y / y!,   y = 0, 1, 2, . . .

This has natural exponential form (eq. 1) with θ = μ, a(μ) = exp(−μ), b(y) = 1/y!, and Q(μ) = log(μ). The natural parameter is log(μ), so the canonical link function is the log link η = log(μ). The model using this link function is

log(μ_i) = Σ_j β_j x_ij

This model is called a Poisson loglinear model.
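The exponential-family factorization of the Poisson can be checked numerically (an illustrative Python sketch, using the a(μ), b(y), and Q(μ) identified above):

```python
import math

def poisson_pmf(y, mu):
    """Standard Poisson probability mass function."""
    return math.exp(-mu) * mu**y / math.factorial(y)

def poisson_expfam(y, mu):
    """Same pmf written in natural exponential form a(mu) * b(y) * exp(y * Q(mu))."""
    a = math.exp(-mu)            # a(mu) = exp(-mu)
    b = 1.0 / math.factorial(y)  # b(y) = 1/y!
    Q = math.log(mu)             # natural parameter Q(mu) = log(mu)
    return a * b * math.exp(y * Q)

# The two expressions agree for every count y
for y in range(6):
    assert abs(poisson_pmf(y, 2.5) - poisson_expfam(y, 2.5)) < 1e-12
print("factorizations agree")
```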

Generalized Linear Models for Binary Data

Linear Probability Model

For a binary response Y, let π(x) denote P(Y = 1) at a given setting x of the explanatory variable, so that E(Y) = π(x) = P(Y = 1). The regression model

π(x) = α + βx

is called a linear probability model. The linear probability model has a major structural defect: probabilities must fall between 0 and 1, whereas linear functions take values over the entire real line.
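The defect is easy to demonstrate numerically (hypothetical intercept and slope, chosen only for illustration):

```python
# Linear probability model pi(x) = alpha + beta * x,
# with illustrative (hypothetical) coefficient values
alpha, beta = 0.02, 0.05

def linear_prob(x):
    return alpha + beta * x

print(linear_prob(10))  # still a legitimate probability
print(linear_prob(25))  # exceeds 1, so it cannot be a probability
```

For large enough x the fitted value leaves the unit interval, which motivates link functions that constrain the mean.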

Logistic Regression Model


Because of the structural problems with the linear probability model, it is more fruitful to study models implying a curvilinear relationship between x and π(x). The model

π(x) = exp(α + βx) / [1 + exp(α + βx)]

is called the logistic regression model.

 When the model holds with β = 0, the binary response is independent of x.
 When β > 0, π(x) increases as x increases.
 When β < 0, π(x) decreases as x increases.

For this model, the odds of making response "YES = 1" are

π(x) / [1 − π(x)] = exp(α + βx) = e^α (e^β)^x

This formula provides a basic interpretation for β: the odds increase multiplicatively by e^β for every unit increase in x. The log odds have the linear relationship

log[ π(x) / (1 − π(x)) ] = α + βx

For multiple predictors,

log[ π(x) / (1 − π(x)) ] = α + β_1 x_1 + β_2 x_2 + . . . + β_p x_p
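The multiplicative-odds interpretation can be verified directly (illustrative α and β, not estimates from the text):

```python
import math

alpha, beta = -2.0, 0.7  # hypothetical logistic regression coefficients

def pi(x):
    """Logistic regression model: pi(x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    eta = alpha + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(x):
    return pi(x) / (1.0 - pi(x))

# A unit increase in x multiplies the odds by exp(beta)
print(odds(3.0) / odds(2.0))
print(math.exp(beta))
```

The two printed values coincide, since the odds are exactly e^α (e^β)^x under the model.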

Binomial GLM for 2×2 Contingency Tables

Among the simplest GLMs for a binary response is the one having a single explanatory variable X that is also binary. Label its values by 0 and 1. For a given link function, the GLM

link( π(x) ) = α + βx

has the effect of X described by β = link[ π(1) ] − link[ π(0) ].

For the identity link, β=π ( 1 )−π ( 0 ) is the difference between proportions.

For the log link, β = log[ π(1) ] − log[ π(0) ] = log[ π(1)/π(0) ] is the log relative risk.

For the logit link,

β = logit[ π(1) ] − logit[ π(0) ] = log[ π(1)/(1 − π(1)) ] − log[ π(0)/(1 − π(0)) ] = log[ (π(1)/(1 − π(1))) / (π(0)/(1 − π(0))) ]

is the log odds ratio.
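For a hypothetical 2×2 table (counts invented purely for illustration), the three link functions give the three familiar effect measures:

```python
import math

# Hypothetical 2x2 table: rows are x = 1 and x = 0,
# with counts of successes and failures in each row
success = {1: 30, 0: 10}
failure = {1: 70, 0: 90}

def prop(x):
    """Sample proportion of successes in row x."""
    return success[x] / (success[x] + failure[x])

p1, p0 = prop(1), prop(0)

diff = p1 - p0                     # identity link: difference of proportions
log_rr = math.log(p1 / p0)         # log link: log relative risk
log_or = math.log((p1 / (1 - p1)) / (p0 / (1 - p0)))  # logit link: log odds ratio

print(diff, math.exp(log_rr), math.exp(log_or))
```

With these counts p1 = 0.3 and p0 = 0.1, so the difference of proportions is 0.2 and the relative risk is 3; the odds ratio is larger, illustrating that the three links measure the effect of X on different scales.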

Example: Snoring and Heart Disease

Table 2: Relationship between Snoring and Heart Disease

Snoring (score x)        Heart Disease: Yes    No
Never (0)                24                    1355
Occasionally (2)         35                    603
Nearly every night (4)   21                    192
Every night (5)          30                    224

We illustrate the linear probability model with Table 2, from an epidemiological survey of 2484 subjects to
investigate snoring as a risk factor for heart disease. Those surveyed were classified according to their spouses’
report of how much they snored. The model states that the probability of heart disease is linearly related to the level
of snoring x. We treat the rows of the table as independent binomial samples. No obvious choice of scores exists for
categories of x. We used (0, 2, 4, 5), treating the last two levels as closer than the other adjacent pairs. ML estimates
and standard errors are the same if we use a data file of 2484 binary observations or if we enter the four binomial
totals of “yes” and “no” responses listed in Table 2.

From SPSS, we get π̂(x) = 0.0172 + 0.0198x.


For nonsnorers (x = 0), the estimated proportion of subjects having heart disease is 0.0172. Table 2 shows the sample proportions and the fitted values (estimated values of E(Y) for a GLM) for this model.

For the snoring data in Table 2, SPSS reports the logistic regression fit

logit[ π̂(x) ] = −3.87 + 0.40x

The positive β̂ = 0.40 reflects the increased incidence of heart disease at higher snoring levels. Figure 1 displays the fit. The fit is close to linear over this narrow range of estimated probabilities, and results are similar to those for the linear probability model.
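The two fitted models can be compared at the snoring scores (0, 2, 4, 5); this Python sketch simply evaluates the fitted probabilities from the coefficient estimates reported above:

```python
import math

scores = [0, 2, 4, 5]  # snoring levels: never, ..., every night

def linear_fit(x):
    """Fitted linear probability model reported in the text."""
    return 0.0172 + 0.0198 * x

def logit_fit(x):
    """Fitted logistic regression model reported in the text."""
    eta = -3.87 + 0.40 * x
    return math.exp(eta) / (1.0 + math.exp(eta))

for x in scores:
    print(x, round(linear_fit(x), 4), round(logit_fit(x), 4))
```

Over this narrow range of small probabilities the two sets of fitted values are close, consistent with the near-linear fit noted above.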
