DATA IN POLITICS I
MULTIPLE REGRESSION
Dr. Annie Watson
November 12, 2024
[email protected]
REVIEW
Review
• Running a regression in R is simple
• lm(y ~ x)
• Interpreting it is trickier (a sketch follows this list)
• Two main outputs: slope and intercept
• Slope (aka beta-hat, beta coefficient, regression coefficient)
• “What is the predicted change in Y when X increases by 1 unit”
• Trick: What is “one unit” for X in your analysis?
• Intercept (aka alpha-hat, Y intercept, constant)
• “What is the predicted value of Y when X is zero?”
• Trick: Is this actually meaningful?
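A minimal sketch of the one-variable case in R, using simulated data (the variable names and data-generating values here are invented for illustration, not taken from the survey data used later):

  # Simulate data where the "true" intercept is 10 and the "true" slope is 0.5
  set.seed(42)
  x <- runif(500, min = 0, max = 100)       # e.g., a 0-100 feeling thermometer
  y <- 10 + 0.5 * x + rnorm(500, sd = 15)   # Y is a linear function of X, plus noise

  fit <- lm(y ~ x)
  summary(fit)

  # coef(fit)["(Intercept)"]: predicted y when x = 0
  # coef(fit)["x"]: predicted change in y when x increases by 1 unit
  coef(fit)

Here “one unit” of x is one point on a 0-100 scale, so the slope is the predicted change in y per one-point increase in x.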
Review Example 1
Table of Regression Output

                     Socialist Thermometer
  Unions Therm       0.58
                     (0.013)
  Constant           4.58
                     (0.822)
  N                  6,354
  R²                 0.2377

• 𝑌𝑖 = 𝛼 + 𝛽1𝑋𝑖 + 𝜀𝑖
• Regression estimates α̂ and β̂
• You tell me: in this regression, what (conceptually) are 𝑋𝑖 and 𝑌𝑖?
• Which number is our estimate of α̂? How would we interpret it?
• Which number is our estimate for β̂? How would we interpret it?
Review Example 2
- Simplified from real study in Ghana
- Unit of analysis: voting precinct
- IV: Whether precinct had election monitors
  - Treatment group = 1, Control group = 0
- DV: voter turnout (as a %, 0-100)

                                   DV: % of voters who turn out in that precinct (0-100)
  Had election monitor (0 or 1)    -2.00
                                   (.012)
  Constant                         76.7
                                   (4.6)
Discuss with your group:
- What does the Intercept tell us here?
- What does it mean here for X to move from 0 to 1?
- What was the estimated treatment effect of election monitors in this
experiment?
REGRESSION AND CAUSALITY
From Math to Interpretation
• Does the slope have a causal interpretation?
• We often want it to!
• “Liking unions more causes you to like socialists more.”
• It can, but it’s certainly not foreordained.
• You can regress anything on anything else and get a result.
• E.g. you’d find that ambulance rides predict your risk of dying. This does not
mean that ambulance rides cause people to die. (Quite the opposite!)
• Even sillier: How does the number of oranges you eat per day predict the
number of your grandparents who were born abroad?
• For a causal interpretation, we need (at a minimum) an argument about:
• Why X precedes Y temporally
• Why the people high in X are similar to the people low in X.
Confounders / Omitted Variables
• Something associated with our X of interest also affects Y.
• As a result, we may have failed to isolate the effect of X on Y.
• We’ve seen this before.
• Maybe Francine didn’t go to high school, and Allison did.
• Maybe New Jersey started a new job training program around the same time that it raised the minimum
wage.
• Maybe partisanship affects both how you feel about unions, and your feelings towards socialists
• These kinds of concerns are everywhere, and a constant focus in social scientific analysis.
• In regression contexts, confounders are often called “omitted variables,” since there is some
promise that, if we cease omitting them, it could solve some of our problems (coming up).
Hypothetical Example
• Say women tend to both
• Dislike unions
• Dislike socialists
• This is a potential confounder.
• Suggests our result is biased
• Bias means our regression results are
too high or too low, compared to the
“true” effect of X on Y.
• Nothing to do with bias in terms of
ideology etc.
• In this case: maybe the true effect is 0, but we get a positive result.
[Diagram: does the group that likes unions contain more women or more men?]
Confounders / Omitted Variables
• Omitted variables:
  • X = Variable of interest
  • Y = Dependent variable
  • O = Omitted variable
• O is only a problem if it’s correlated with both the IV of interest and the DV.

                                  X and O are              X and O are              X and O are
                                  positively correlated    negatively correlated    uncorrelated
  O has a positive effect on Y    ?                        ?                        No bias!
  O has a negative effect on Y    ?                        ?                        No bias!
  O has no effect on Y            No bias!                 No bias!                 No bias!
Confounders / Omitted Variables
• If there’s only 1 confounder, we can actually say something about the direction of bias (see the simulation sketch after this table).
• Meaning: Are you getting a coefficient that’s too big, or too small?
• With multiple confounders, this gets complicated really fast.
• And what if you don’t think of a key confounder?

                                  X and O are              X and O are              X and O are
                                  positively correlated    negatively correlated    uncorrelated
  O has a positive effect on Y    Positive bias            Negative bias            No bias!
  O has a negative effect on Y    Negative bias            Positive bias            No bias!
  O has no effect on Y            No bias!                 No bias!                 No bias!
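A hedged simulation sketch of the table’s logic (all names and numbers invented): here O is positively correlated with X and has a positive effect on Y, so omitting O should bias the slope on X upward.

  set.seed(1)
  n <- 10000
  o <- rnorm(n)                      # omitted variable
  x <- 0.6 * o + rnorm(n)            # X and O are positively correlated
  y <- 2 * x + 3 * o + rnorm(n)      # O has a positive effect on Y; true effect of X is 2

  coef(lm(y ~ x))["x"]       # omits O: slope is biased upward (well above 2)
  coef(lm(y ~ x + o))["x"]   # includes O: slope is close to the true value of 2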
Dealing with Potential Confounders in Regression
1. Rely on your design
• Maybe your X-variable is randomly assigned.
• If so, you start with a strong argument that the people high in X are similar to the
people low in X.
• Could imagine an experiment where we randomly assign some treatment to change perceptions of unions, then see if that also affects feelings towards socialists
2. Subclassification
• Suppose men have more positive feelings toward unions, and more positive feelings
toward socialists.
• This is a classic confounding problem.
• It goes away if we estimate separate regressions (socialist liking = 𝛼 + 𝛽1 · union liking) within gender subgroups (a sketch follows this list).
• But this requires clear categories, and the number of regressions explodes as we attempt to “control for” more and more things.
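A minimal sketch of subclassification in R, assuming a data frame dat with hypothetical columns socialist_therm, union_therm, and gender (not the course’s actual variable names):

  # Fit the same one-variable regression separately within each gender subgroup
  fits_by_gender <- lapply(split(dat, dat$gender), function(d) {
    lm(socialist_therm ~ union_therm, data = d)
  })

  # Compare the union_therm slopes across subgroups
  lapply(fits_by_gender, coef)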
Dealing with Potential Confounders in Regression
3. Control variables
• Think about the (potential) positive association between having positive
feelings toward socialists, and Republican partisanship
• It’s possible to estimate more than one linear relationship simultaneously.
• That is, we can have more than one independent variable in a regression.
• This is what we call multiple regression.
MULTIPLE REGRESSION
Multiple Regression
• Model
• Before: 𝑌𝑖 = 𝛼 + 𝛽1𝑋𝑖 + 𝜀𝑖
• Now: 𝑌𝑖 = 𝛼 + 𝛽1𝑋1𝑖 + 𝛽2𝑋2𝑖 + ⋯ + 𝛽𝑛𝑋𝑛𝑖 + 𝜀𝑖
• We are saying that Y is a linear function of multiple X variables
• Decision rule
• Before: Find the 𝛼 and 𝛽1 that minimize the SSR.
• Now: Find the 𝛼 and 𝛽1, 𝛽2, … , 𝛽𝑛 that jointly minimize the SSR.
• Estimation
• Before: 𝛽1 = Σᵢ(𝑌𝑖 − Ȳ)(𝑋𝑖 − X̄) / Σᵢ(𝑋𝑖 − X̄)², summing over i = 1, …, n
• Now: Requires matrix algebra. (R will do this for us.)
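A quick check of that claim with simulated data (names invented): the coefficients lm() reports match the textbook matrix solution (XᵀX)⁻¹Xᵀy.

  set.seed(7)
  n  <- 200
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

  # lm() does the minimization for us
  coef(lm(y ~ x1 + x2))

  # The same estimates via the matrix formula (X'X)^{-1} X'y
  X <- cbind(1, x1, x2)                 # design matrix with a column of 1s for the intercept
  solve(crossprod(X), crossprod(X, y))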
What does multiple regression do?
• One-variable regression: Estimate of 𝛽1 comes from comparing all variation in 𝑋1 (liking unions) to variation in Y (liking socialists)
• Multiple regression: Estimates of 𝛽1 (slope of union_th) and 𝛽2 (slope of
partisanship) come only from cases where X1 and X2 depart from each other.
• “Holds constant” partisanship, and estimates effect of liking unions
• “Holds constant” liking unions, and estimates effect of partisanship
• Conceptually this is similar to subclassification! But we’re “holding constant” one variable by
estimating the linear relationship of best fit, rather than looking within categories.
• If X1 and X2 are perfectly correlated in our dataset, we can’t even do this. (We couldn’t do
subclassification, either.)
• As a result, “controlling for” another variable can change our estimate on 𝛽1.
Perhaps a lot.
Intuition for Multiple Regression
• Worried that some omitted variable, X2, is biasing our result
• So, estimate 𝑌𝑖 = 𝛼 + 𝛽1𝑋1𝑖 + 𝛽2𝑋2𝑖 + 𝜀𝑖
• In R, lm(y ~ x1 + x2)
• Behind the scenes in R: to get 𝛽1, multiple regression works like the following (sketched after this list):
• Regresses X1 on X2, and takes the residual for X1.
• This is the variation in X1 that’s uncorrelated with X2
• So, X2 no longer a confounder: uncorrelated w. residual for X1
• Then regresses Y on the residual of X1
• End result:
• 𝛼 – still the intercept. Predicted value of Y when X1 and X2 are BOTH zero
• 𝛽1 –Slope of line of best fit between X1 and Y, once we’ve accounted for X2
• 𝛽2 –Slope of line of best fit between X2 and Y, once we’ve accounted for X1
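A sketch of that residualization logic with simulated data (variable names invented): the slope on the residualized X1 matches 𝛽1 from the full multiple regression.

  set.seed(3)
  n  <- 5000
  x2 <- rnorm(n)
  x1 <- 0.5 * x2 + rnorm(n)            # x1 and x2 are correlated
  y  <- 4 + 2 * x1 - 1.5 * x2 + rnorm(n)

  # Step 1: the part of x1 that is uncorrelated with x2
  x1_resid <- resid(lm(x1 ~ x2))

  # Step 2: regress y on that residual
  coef(lm(y ~ x1_resid))["x1_resid"]

  # Same slope as the multiple regression
  coef(lm(y ~ x1 + x2))["x1"]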
Visualizing Multiple Regression (2 predictors)
• One X: Fit a line through a 2-D scatterplot
• Two X’s: Fit a plane through a 3-D cloud
Implementation in R
• In the lm() call, there is only ever 1 DV; multiple IVs are separated by “+” (a hedged example of the call appears below).
• The table that follows builds up three models, each with DV = Socialist Thermometer, adding Union Thermometer, Strong Republican, and High Education one at a time.
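A hedged version of the lm() calls these slides describe; the data frame and variable names (anes, socialist_therm, union_therm, strong_republican, high_educ) are placeholders, not the course’s actual names:

  # Model 1: one IV
  m1 <- lm(socialist_therm ~ union_therm, data = anes)

  # Model 2: add partisanship — still only 1 DV; IVs separated by "+"
  m2 <- lm(socialist_therm ~ union_therm + strong_republican, data = anes)

  # Model 3: add education
  m3 <- lm(socialist_therm ~ union_therm + strong_republican + high_educ, data = anes)

  summary(m2)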
                                 (1)              (2)              (3)
                                 Socialist        Socialist        Socialist
                                 Thermometer      Thermometer      Thermometer
  Union Thermometer (0-1)        58.22            33.06            33.16
                                 (1.268)          (1.197)          (1.21)
  Strong Republican (0-1)        --               -37.87           -37.64
                                                  (0.76)           (0.778)
  High Education (0-1)           --               --               0.2163
                                                                   (0.132)
  Intercept                      4.26             37.14            35.98
                                 (0.798)          (0.95)           (1.21)
  N                              6,689            6,674            6,582
  R²                             0.239            0.445            0.4438

• In model (2), the intercept (37.14) is the predicted value of the DV when “Union therm” = 0 AND “Strong Republican” = 0.
• In model (3), the intercept (35.98) is the predicted value of the DV when all three IVs = 0.

Partisanship has 7 levels, from 0 = Strong Democrat to 1 = Strong Republican.
Education has 15 levels, ranging from 0 = Less than 1st grade to 1 = Ph.D.
Multiple Regression
• What does the coefficient (e.g. 𝛽1 = 33.06) mean now?
  • The linear relationship between Union Therm and the DV that minimizes SSR… once we’ve accounted for the linear relationship between partisanship and the DV.
• Likewise, 𝛽2 (= -37.87) is the linear relationship between partisanship and the DV that minimizes SSR… once we’ve accounted for the linear relationship between Union Therm and the DV.
• Have we “controlled for” partisanship?
  • Yes, in a sense, but in a different way than in the subsetting approach.
  • We haven’t isolated observations that are all exactly the same with respect to partisanship. Rather, we’ve assumed the relationship is linear, and analyzed only the variance that was left over after a linear prediction. (A predicted-values sketch follows below.)

                             Socialist Thermometer
  Union Thermometer (0-1)    33.06
                             (1.197)
  Strong Republican (0-1)    -37.87
                             (0.76)
  Intercept                  37.14
                             (0.95)
  N                          6,674
  R²                         0.445

Partisanship has 7 levels, from 0 = Strong Democrat to 1 = Strong Republican.
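One way to see what “holding partisanship constant” means in practice, continuing the placeholder names from the earlier sketch (m2 is the hypothetical two-IV model): compare predicted thermometer scores as union feelings move from 0 to 1 with partisanship fixed.

  # Predicted socialist thermometer at low vs. high union feelings,
  # holding partisanship fixed at 0 (Strong Democrat) in both rows
  newdat <- data.frame(union_therm = c(0, 1), strong_republican = c(0, 0))
  predict(m2, newdata = newdat)
  # The difference between the two predictions equals the union_therm coefficient (about 33)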
Multiple Regression – The Perils
• Lots. This is just for starters.
• You might not include (or even have a measure of) all the
confounding variables.
• It is sensitive to “outlier” observations.
• If any of your variables have a nonlinear relationship with
the DV, results can be very misleading.
• If one of the X’s has a causal relationship with another one
of them, things can also be misleading.
• Imagine controlling for “Minutes exercising per week” and “VO2
Max” in a model predicting “Time in a 10-mile race.”
• A linear model is an awkward way to analyze categorical
data.
Multiple Regression – The Perils (continued)
• You can regress anything on anything, and it’s
easy to read too much into the relationships
you uncover.
• There are temptations to data-mine.
Multiple Regression – The Promise
• This basic model is highly adaptable.
• If you think there is a nonlinear relationship, there are ways to model
that.
• E.g., square an X variable and include it in the model (see the sketch after this list).
• You can examine interactive relationships between different variables of interest (also sketched after this list).
• “I think the effect of fiscal stimulus on economic recovery depends on whether a country has a parliamentary or presidential system.”
• This can be adapted to make sense of far more complex data
structures
• Children inside of schools inside of U.S. states, which develop over time. (I.e.
multilevel panel data.)
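A sketch of those two extensions in lm() syntax, with invented variable names and a hypothetical data frame dat: I() wraps a squared term, and * adds an interaction.

  # Nonlinear relationship: include X and X^2
  m_quad <- lm(recovery ~ stimulus + I(stimulus^2), data = dat)

  # Interaction: the effect of stimulus is allowed to differ by institutional system;
  # stimulus * parliamentary expands to stimulus + parliamentary + stimulus:parliamentary
  m_int  <- lm(recovery ~ stimulus * parliamentary, data = dat)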
MULTIPLE REGRESSION IN R