
STAT 5703

Statistical Inference and Modeling for Data Science


Dobrin Marchev

[Note: this unit covers two models, logistic and Poisson regression, starting from the exponential family, since the response is not normally distributed.]
Recall: Multivariate Exponential family

Let Y = (Y₁, …, Yₙ) be a random sample with joint pdf

$$f(\mathbf{y};\boldsymbol{\theta}) = c(\boldsymbol{\theta})\, h(\mathbf{y})\, e^{\sum_{j=1}^{k} t_j(\mathbf{y})\, q_j(\boldsymbol{\theta})}$$

where θ is a k-dimensional parameter vector.

Such a distribution is said to be in a k-parameter exponential family.
GLM

A Generalized Linear Model (GLM) extends the classic linear regression model in two ways:

1. Y|x ~ exponential family (more precisely, an exponential dispersion family, EDF).

2. A transformation between the outcome and the predictors:

$$g[E(Y \mid \mathbf{x})] = \mathbf{x}'\boldsymbol{\beta}$$

The function g(·) is called the link function; it is applied to the parameter related to the mean of the distribution.
GLM

More specifically, a GLM assumes that the response variable has an exponential dispersion model (EDM) pdf:

$$f(y_i;\theta,\phi) = a(y_i,\phi)\, e^{\frac{y_i\theta - \kappa(\theta)}{\phi}}, \quad i = 1,\dots,n$$

where ϕ is called a dispersion parameter (and could be known), and κ(θ) is called the cumulant function.

This form of the distribution is also known as the natural exponential family, because θ is the natural parameter.

Notation: Y ~ EDM(μ, ϕ), where E(Yᵢ) = μ.

Note: For a fixed value of the dispersion parameter ϕ it is a one-parameter exponential family (indexed by θ).
Example: Normal distribution

The normal distribution with unknown mean μ and variance σ²:

$$f(y;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} = \frac{e^{-\frac{y^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\, e^{\frac{y\mu - \mu^2/2}{\sigma^2}}$$

Match with the EDM form:

$$f(y;\theta,\phi) = a(y,\phi)\, e^{\frac{y\theta - \kappa(\theta)}{\phi}}$$

• θ = μ is the natural (aka canonical) parameter.
• $\kappa(\theta) = \frac{\mu^2}{2} = \frac{\theta^2}{2}$ is the cumulant function.
• ϕ = σ² is the dispersion (scale) parameter.
• $a(y,\phi) = \frac{e^{-y^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$ is the normalizing function.

Other examples are: Exponential, Gamma, Binomial, …
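One more worked case (a sketch, not from the original slide, though it anticipates logistic regression below): the Bernoulli distribution with success probability p can be matched to the same EDM form.

$$f(y;p) = p^y(1-p)^{1-y} = e^{\,y\log\frac{p}{1-p} + \log(1-p)} = e^{\,y\theta - \log(1+e^{\theta})}, \quad y \in \{0,1\}$$

so $\theta = \log\frac{p}{1-p}$ is the natural parameter (the logit), $\kappa(\theta) = \log(1+e^{\theta})$ is the cumulant function, ϕ = 1, and a(y, ϕ) = 1. This is why the logit is called the canonical link for binary data.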
GLM: Moment generating and cumulant functions

Theorem: If Y ~ EDM(μ, ϕ), then its moment generating function is

$$M(t) = e^{\frac{\kappa(\theta + t\phi) - \kappa(\theta)}{\phi}}$$

and its cumulant generating function is

$$K(t) = \log M(t) = \frac{\kappa(\theta + t\phi) - \kappa(\theta)}{\phi}$$

The cumulant generating function is often easier to work with than the mgf: its first and second derivatives evaluated at 0 give the mean and the variance, and higher-order derivatives give the higher cumulants.

Example: Normal distribution, with cumulant function $\kappa(\theta) = \frac{\theta^2}{2}$:

$$K(t) = \frac{(\mu + t\sigma^2)^2}{2\sigma^2} - \frac{\mu^2}{2\sigma^2} = \mu t + \frac{\sigma^2 t^2}{2}$$

which is exactly the cumulant generating function of the normal distribution: the first derivative at 0 gives the mean μ and the second derivative gives the variance σ².
GLM: Mean and variance

Theorem:

$$E(Y) = \mu = \frac{d\kappa(\theta)}{d\theta}$$

$$\mathrm{Var}(Y) = \phi\,\frac{d^2\kappa(\theta)}{d\theta^2}$$

where

$$\frac{d^2\kappa(\theta)}{d\theta^2} = \frac{d}{d\theta}\!\left(\frac{d\kappa(\theta)}{d\theta}\right) = \frac{d\mu}{d\theta}$$
Since $\frac{d\mu}{d\theta}$ can also be written as a function of the mean, we can define the variance function

$$V(\mu) = \frac{d\mu}{d\theta} \;\Rightarrow\; \mathrm{Var}(Y) = \phi\,V(\mu)$$
Example: Normal distribution, with $\kappa(\theta) = \frac{\theta^2}{2}$:

$$E(Y) = \frac{d}{d\theta}\frac{\theta^2}{2} = \theta = \mu, \qquad V(\mu) = 1 \;\Rightarrow\; \mathrm{Var}(Y) = \sigma^2$$
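The same recipe works for the other families on the list. For instance (a sketch, not from the slides), for the Poisson distribution:

$$f(y;\lambda) = \frac{\lambda^y e^{-\lambda}}{y!} = \frac{1}{y!}\, e^{\,y\log\lambda - \lambda}$$

so $\theta = \log\lambda$, $\kappa(\theta) = e^{\theta}$, ϕ = 1, and

$$E(Y) = \frac{d\kappa}{d\theta} = e^{\theta} = \lambda = \mu, \qquad V(\mu) = \frac{d^2\kappa}{d\theta^2} = e^{\theta} = \mu \;\Rightarrow\; \mathrm{Var}(Y) = \mu$$

recovering the familiar fact that the Poisson mean and variance are equal.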
GLM: Unit deviance

Suppose we want to write the EDM $f(y;\theta,\phi) = a(y,\phi)\,e^{\frac{y\theta - \kappa(\theta)}{\phi}}$ as a function of the mean μ. Denote

$$t(y,\mu) = y\theta - \kappa(\theta)$$

where θ is viewed as a function of μ. Then t(y, μ) has a unique maximum w.r.t. μ at μ = y. This allows us to define a very important quantity, the unit deviance:

$$d(y,\mu) = 2\,\bigl[\,t(y,y) - t(y,\mu)\,\bigr]$$

which is always nonnegative and measures the discrepancy between the observed y and the predicted μ.

Notice that d(y, μ) = 0 only when y = μ and otherwise d(y, μ) > 0. In fact, d(y, μ) increases as μ moves away from y in either direction. This shows that d(y, μ) can be interpreted as a type of distance measure between y and μ.
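For the normal distribution this recovers squared error (a quick check, not on the slide): with $t(y,\mu) = y\mu - \mu^2/2$,

$$d(y,\mu) = 2\left[\left(y^2 - \tfrac{y^2}{2}\right) - \left(y\mu - \tfrac{\mu^2}{2}\right)\right] = y^2 - 2y\mu + \mu^2 = (y-\mu)^2$$

so the unit deviance of the normal model is exactly the squared residual.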
GLM: Summary

When to use a GLM? Previously we assumed the data were normal; a GLM lets the response distribution (and its natural parameter space) match the data:

• Continuous, real-valued data: normal.
• Binary data (values are 0 or 1): binomial.
• Counts: Poisson.
• Skewed, positive data: gamma.
Classification and Logistic Regression

The linear regression model assumes the response variable Y is quantitative (numerical) and the error terms are normally distributed. If, instead, the response variable is qualitative (or categorical), the task of predicting responses is aka classification, a broad topic that also includes methods such as SVMs. In such cases the error terms are not normally distributed.

Examples of classification problems:

• Your email service determines whether to label an incoming message as "spam" or not based on the text of the email, the subject, and your history of interaction with the sender.
• Hand-written zip codes are scanned and stored as an image file and a computer is programmed to classify each digit as a "0", "1", "2", …, "9".
Linear Regression Approach for Binary Response

• Consider Yᵢ ~ Bernoulli(πᵢ), where πᵢ = P(Yᵢ = 1 | Xᵢ = xᵢ).

• We could use a linear probability model (LPM) with the usual assumptions:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{i,p} + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^2)$$

• The expected values for the linear probability model from the RHS are:

$$E(Y_i \mid X_i = x_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{i,p}$$

• But for a Bernoulli outcome, from the LHS we have:

$$E(Y_i \mid X_i = x_i) = \pi_i = P(Y_i = 1 \mid X_i = x_i)$$

which in reality is a probability, while the RHS cannot be restricted to the interval from 0 to 1.
Problems with Linear Probability Model

• Predicted values from linear regression are not constrained to the interval [0,1], even though probabilities should be. In general, the LPM is not a good idea: it violates other regression assumptions as well, as discussed next.
Problems with Linear Regression (Continued)

• Predicted values from linear regression are not constrained to the interval [0,1], even though probabilities should be.

• Residuals from the linear model are not normally distributed, because conditional on X = x they are dichotomous (discrete with two values):

$$\varepsilon_i = \begin{cases} -p(x) & \text{if } Y_i = 0 \\ 1 - p(x) & \text{if } Y_i = 1 \end{cases}$$

• Also, the variance of εᵢ is p(x)[1 − p(x)], which means it is not constant: the outcomes are not homogeneous.

• Furthermore, if the outcome has more than two categories, linear regression becomes impossible unless they are ordered and we assume the distances between all categories are identical.

• We need a method that deals with the above-mentioned deficiencies. A GLM does so through its link function: instead of restricting the predictions directly, it transforms the function of the predictors (the RHS).
Logistic Regression: Foundation

• Assume the outcome Y is coded as 0/1. Our goal is to specify a model for

$$p(X) = \Pr(Y = 1 \mid X = x)$$

• In linear regression we use a linear model in the covariates X₁, …, X_p:

$$p(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

• The model above, however, has a range equal to (−∞, ∞). To restrict our range to (0,1), we need a function f : (−∞, ∞) → (0,1). Ideally, the function will be simple to write, continuous, and monotonic. The standard logistic function is a prime candidate:

$$f(t) = \frac{\exp(t)}{1 + \exp(t)}$$

Note the numerator is always smaller than the denominator, so 0 < f(t) < 1.
Logistic/Sigmoid Function

The sigmoid function resembles an S-shaped curve. It takes real-numbered input values and converts them to values between 0 and L, shrinking from both sides: very negative inputs map near 0 and very high positive ones near L. Note that it can be decreasing as well. In the general form, a location parameter acts like an intercept and a growth parameter controls how fast the curve moves between 0 and L.
Sigmoidal Response Functions

• A sigmoid function is a function having a characteristic "S"-shaped curve or sigmoid curve. (Sigmoids also appear in neural networks, as a flexible base for more complicated models.)

• The logistic function is a prime candidate; fitting a binomial model with it gives results that are easy to interpret:

$$f(t) = \frac{\exp(t)}{1 + \exp(t)}$$

• An alternative is the probit function: Φ(t), where Φ is the cdf of the standard normal distribution.

• Theoretically, any cumulative distribution function can serve as a response function.
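A quick way to see the two response functions side by side (a minimal R sketch; base R's plogis() and pnorm() are the logistic function and the standard normal cdf):

t <- seq(-4, 4, by = 0.01)
plot(t, plogis(t), type = "l", ylab = "f(t)",
     main = "Sigmoidal response functions")   # logistic curve
lines(t, pnorm(t), lty = 2)                   # probit: standard normal cdf
legend("topleft", c("logistic", "probit"), lty = 1:2)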
Logistic Regression: Foundation

• Applying the logistic function to the linear transformation of the predictors gives three equivalent formulations. They let us interpret coefficients and switch between probability, odds, and log-odds (see the numeric check after this list).

1. Probability:

$$p(X) = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)}$$

2. Odds:

$$\frac{p(X)}{1 - p(X)} = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)$$

This is no longer an additive model: a one-unit increase in X_j multiplies the odds by the extra factor $e^{\beta_j}$, a multiplicative effect often quoted as a percentage change in the odds. The odds of the event are easier to interpret than the probability.

3. Log(odds), or "logit" (log-transform both sides):

$$\log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

On this scale the model is additive: each one-unit increase in X_j bumps the log-odds by β_j.

Exercise: Show algebraically that 3 implies 1. (This should be easy.)
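A numeric sanity check of the three forms (a sketch in R with made-up coefficients, not the bank model below):

b0 <- -2; b1 <- 0.5
x   <- 1.3
eta  <- b0 + b1 * x                 # form 3: the logit (log-odds)
p    <- exp(eta) / (1 + exp(eta))   # form 1: the probability
odds <- p / (1 - p)                 # form 2: the odds
all.equal(log(odds), eta)           # TRUE: the three forms agree
# Increasing x by 1 multiplies the odds by exp(b1):
odds2 <- exp(b0 + b1 * (x + 1))
all.equal(odds2 / odds, exp(b1))    # TRUE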
Example: Bank Marketing

Data in 15-bank-full.csv come from here:

https://archive.ics.uci.edu/dataset/222/bank+marketing

• The data are related to direct marketing campaigns of a Portuguese banking institution: people were offered a product, and the outcome is whether they agreed to buy it. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required to assess whether the product (a bank term deposit) would be ('yes') or would not be ('no') subscribed.

• Explanatory variables thought to possibly affect the outcome were:
  • Job occupation: 'admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown'
  • Housing: has a housing loan or not
  • Balance: average annual balance
  • Marital status
  • …

The data has 45,211 rows.
Logistic Regression Example

Use the "glm" function to fit GLM models. The "family" option determines the distribution to be used; everything downstream depends on the resulting likelihood. Choices are:

• binomial(link = "logit")
• gaussian(link = "identity")
• Gamma(link = "inverse")
• inverse.gaussian(link = "1/mu^2")
• poisson(link = "log")
• quasi(link = "identity", variance = "constant")
• quasibinomial(link = "logit")
• quasipoisson(link = "log")

See R code.
Logistic Regression Example: Some Results

> logit0 = glm(y ~ balance, data = bank, family = "binomial")
> summary(logit0)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.081e+00  1.595e-02 -130.50   <2e-16 ***
balance      3.958e-05  3.840e-06   10.31   <2e-16 ***

Equation: log(odds of subscribing) = −2.081 + 0.00003958·balance

Interpretation of the slope:

• If the balance increases by 1 euro, then the log of the odds (of subscribing) will increase by 0.00003958 units.
• Equivalent (and better!): the odds will be multiplied by e^0.00003958 ≈ 1.00004 for each extra 1 euro of balance.
• Final interpretation: for each extra 1000 euros of balance, the odds of obtaining the product increase by about 4% (e^0.03958 ≈ 1.04).
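These numbers can be pulled straight from the fitted object (a sketch, assuming the bank data frame and logit0 fit from above):

b <- coef(logit0)["balance"]   # estimated slope on the log-odds scale
exp(b)                         # odds multiplier per extra euro        (~1.00004)
exp(1000 * b)                  # odds multiplier per extra 1000 euros  (~1.04)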
Logistic Regression Example: Dichotomous Covariate

R output:

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.60687    0.01892  -84.93   <2e-16 ***
housingyes  -0.87696    0.03030  -28.95   <2e-16 ***

$$\log\frac{\hat{p}(X_i)}{1 - \hat{p}(X_i)} = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

A negative coefficient means the odds decrease as the variable increases: clients with a mortgage are less likely to subscribe. The model compares the odds for those who have a housing loan versus those who do not.
Logistic Regression Example: Dichotomous Covariate

• Let's use only the housing variable:

> logit0 = glm(y ~ housing, data = bank, family = "binomial")
> summary(logit0)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.60687    0.01892  -84.93   <2e-16 ***
housingyes  -0.87696    0.03030  -28.95   <2e-16 ***

• Report the regression equation:

$$\log\frac{\hat{p}}{1 - \hat{p}} = -1.60687 - 0.87696 \times \text{Housing}$$
Logistic Regression Example: Dichotomous Covariate

Compare the equation (which is on the log-odds scale) to cross-tabs results:

• No housing loan: probability of subscribing = 16.7%
• With housing loan: probability of subscribing = 7.7%

$$\log\frac{\hat{p}}{1 - \hat{p}} = -1.60687 - 0.87696 \times \text{Housing}$$

Calculate the probability, odds, and log-odds for each category. To go from log-odds to odds, exponentiate; to go from log-odds to probability, apply the sigmoid function. (A numeric sketch follows below.)

              No housing loan   Housing loan
Probability         ?                ?
Odds                ?                ?
Log-odds            ?                ?
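One way to fill in the table from the fitted coefficients (a sketch; plogis() is the sigmoid, so it converts log-odds to probabilities):

b0 <- -1.60687; b1 <- -0.87696
logodds <- c(no_housing = b0, housing = b0 + b1)   # log-odds per group
odds    <- exp(logodds)                            # exponentiate: ~0.2005, ~0.0834
prob    <- plogis(logodds)                         # sigmoid: ~0.167, ~0.077
rbind(prob, odds, logodds)                         # the completed table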
Logistic Regression Example: Dichotomous Covariate

• The odds can be interpreted as the relative risk of an event A happening versus not happening (this interpretation is IMPORTANT). That is,

$$odds(A) = \frac{prob(A)}{1 - prob(A)}$$

Odds = 0.2 = 1:5 means the likelihood of subscribing to the service is 0.2 times the likelihood of not subscribing, within the no-housing-loan group. Or, you can say that not subscribing is 5 times more likely than subscribing within the no-housing group.

Exercise: Interpret the housing loan = yes group odds.

• The odds ratio is the ratio of odds for cases with different x values.

Odds ratio = 0.200514 / 0.083423 = 2.4036

This means that in the absence of a house loan, the odds of subscribing are 2.4 times higher compared to the group which has a house loan. (Note this equals e^0.87696, the exponentiated magnitude of the housing coefficient.)
Logistic Regression: Estimating Betas

• Just as in simple regression, the coefficients β₀ and β₁ are unknown and must be estimated using the available data. Maximum likelihood is the most commonly used method for this problem.

• When $Y_i \mid x_i \sim \mathrm{Bernoulli}(p(x_i))$, the likelihood function is:

$$L(\beta_0,\beta_1) = \prod_{i=1}^{n} p(x_i)^{y_i}\,\bigl[1 - p(x_i)\bigr]^{1-y_i}$$

Since $\frac{p(x_i)}{1-p(x_i)} = e^{\beta_0+\beta_1 x_i} \;\Rightarrow\; 1 - p(x_i) = \frac{1}{1+e^{\beta_0+\beta_1 x_i}}$, and therefore

$$L(\beta_0,\beta_1) = \prod_{i=1}^{n} e^{(\beta_0+\beta_1 x_i)\,y_i}\,\frac{1}{1+e^{\beta_0+\beta_1 x_i}}$$

$$\Rightarrow\; \ell(\beta_0,\beta_1) = \sum_{i=1}^{n} y_i(\beta_0+\beta_1 x_i) - \sum_{i=1}^{n}\log\!\left(1+e^{\beta_0+\beta_1 x_i}\right)$$

(Setting the gradient to zero cannot be solved analytically.)
Logistic Regression: Estimating Betas

• The logistic regression loglikelihood is:

$$\ell(\beta_0,\beta_1) = \sum_{i=1}^{n} y_i(\beta_0+\beta_1 x_i) - \sum_{i=1}^{n}\log\!\left(1+e^{\beta_0+\beta_1 x_i}\right)$$

$$\Rightarrow\; \frac{\partial \ell(\beta_0,\beta_1)}{\partial \beta_0} = \sum_{i=1}^{n}\left(y_i - \frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0+\beta_1 x_i}}\right)$$

• Unlike the closed-form analytical solution (i.e., the normal equations) available for linear regression, the score function for logistic regression is transcendental. That is, there is no closed-form solution, so we must solve with numerical optimization methods such as Newton's method (IRLS), which is done in several iterations. (A sketch of the iteration appears below.)

• Although we have focused on simple logistic regression, this likelihood (and score) function generalize directly to the case of p predictors.
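For concreteness, a bare-bones IRLS sketch in R on simulated data (an illustration of the iteration glm performs internally; the data and starting values here are made up):

# Simulate simple logistic data.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-1 + 2 * x))
X <- cbind(1, x)                       # design matrix with intercept

beta <- c(0, 0)                        # starting values
for (iter in 1:25) {
  eta <- X %*% beta                    # linear predictor
  p   <- plogis(eta)                   # fitted probabilities
  w   <- as.vector(p * (1 - p))        # IRLS weights = Var(Y_i)
  z   <- eta + (y - p) / w             # working response
  beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))  # weighted LS step
  if (max(abs(beta_new - beta)) < 1e-8) break
  beta <- beta_new
}
cbind(irls = as.vector(beta),
      glm  = coef(glm(y ~ x, family = binomial)))  # the two should match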
Deviance

• The deviance compares the residual fit of different models against the most heavily parameterised benchmark.

• It measures the deviance of the fitted generalized linear model with respect to a perfect model for the sample.

• This perfect model, known as the saturated model, is the model that perfectly fits the data, in the sense that the fitted responses equal the observed responses. Our working model is, by comparison, under-parameterised.

• For a linear model, the deviance is the sum of squared errors (SSE) and D₀ is the total sum of squares (SST).
Deviance (continued)

• The deviance for a GLM m is defined as

$$D(\mathbf{y}, m) = -2\left[\ell(\hat{\boldsymbol{\beta}}) - \ell_S(\phi)\right]$$

where $\ell_S(\phi)$ is the loglikelihood of the saturated model.

• It measures how far the fitted generalized linear model (whatever model we want to fit) is from a perfect model for the sample: the better the fit, the closer the fitted likelihood gets to the saturated model's.

• This perfect model, known as the saturated model, is the model that perfectly fits the data, in the sense that the fitted responses equal the observed responses.

• In the linear case, D = SSE.

• In R, the "Residual Deviance" is two times the difference in the loglikelihood of the saturated (perfect) model and our model. It is the gap between the best possible fit and ours, so smaller deviance = better model.

• In R, Null Deviance = 2[LL(Saturated Model) − LL(Null Model)].

• Deviance differences behave like a likelihood ratio test statistic.

• Note, deviance is not equal to the KL distance,

• but $D(\mathbf{y}, m_2) - D(\mathbf{y}, m_1) = KL(\mathbf{y}, m_2) - KL(\mathbf{y}, m_1)$.

• See the Hastie (1987) article in The American Statistician for more details.
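In R these quantities are directly available from the fitted object (a sketch, reusing the bank housing model from above):

logit0 <- glm(y ~ housing, data = bank, family = "binomial")
deviance(logit0)        # residual deviance: 2 * [LL(saturated) - LL(logit0)]
logit0$null.deviance    # null deviance:     2 * [LL(saturated) - LL(null model)]
logLik(logit0)          # loglikelihood of the fitted model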
Likelihood Ratio Test

• Compares two nested models: refit the model with a reduced set of predictors (the reduced model must be nested in the larger one, on the same data). You don't have to remove an entire predictor at a time.

• The test needs the two deviances: the drop in deviance is compared to a chi-square distribution whose degrees of freedom equal the number of parameters removed (e.g., 4 if we removed 4 predictors).

• This can be run with R's anova function, or done manually from the deviances; a sketch follows below.
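A sketch of the comparison in R (hypothetical model formulas using variables from the bank data; anova(..., test = "Chisq") runs the likelihood ratio test for nested glm fits):

full    <- glm(y ~ balance + housing + marital, data = bank, family = "binomial")
reduced <- glm(y ~ balance, data = bank, family = "binomial")
anova(reduced, full, test = "Chisq")   # LRT: deviance drop vs chi-square

# Or manually: difference in deviances and its chi-square p-value.
stat <- deviance(reduced) - deviance(full)
df   <- df.residual(reduced) - df.residual(full)
pchisq(stat, df = df, lower.tail = FALSE)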
Logistic Regression: Model Selection

• In multiple linear regression there were many selection criteria available: R², R²-adjusted, AIC, BIC, …
• R² is based on the underlying assumption that you are fitting a linear model. If you aren't fitting a linear model, you shouldn't use it!
• Only AIC and BIC can be adapted to the GLM with the same formulas, but now using the logistic likelihood:

AIC = −2 log L + 2p
BIC = −2 log L + log(n)·p

• The rule is the same: smaller AIC or BIC means a better model.
• Note: the regsubsets function from the leaps package is applicable only for linear regression with normal errors. For GLM you must use the bestglm function from the bestglm package.
• See R code for the healthcare example.
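For instance (a sketch reusing hypothetical bank fits; AIC() and BIC() are base R and accept several fitted models at once):

m1 <- glm(y ~ balance, data = bank, family = "binomial")
m2 <- glm(y ~ balance + housing, data = bank, family = "binomial")
AIC(m1, m2)   # -2 logL + 2p for each model
BIC(m1, m2)   # -2 logL + log(n) p; smaller is better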

