Regression Analysis
What is Correlational Research
Correlational study:
When we use correlational designs we can't look for cause-effect relationships because we haven't manipulated any of the variables
Way of predicting the value of one variable from another
(often using multiple questionnaires)
Allows us to estimate beyond the data we possess
Model is linear so we summarise data by using a straight line
The linear model
The only equation we ever really need is this one:
outcome_i = (model) + error_i
We also saw that we often fit a linear model, which in its
simplest form can be written as:
Y_i = b_0 + b_1 X_i + ε_i    (Eq. 1)
The fundamental idea is that an outcome for an entity can
be predicted from a model and some error associated with
that prediction.
Y_i: outcome variable
X_i: predictor variable
b_1: a parameter associated with the predictor variable that quantifies the relationship it has with the outcome variable
b_0: a parameter that tells us the value of the outcome when the predictor is zero
A linear model: Y_i = b_0 + b_1 X_i + ε_i
b1
Parameter for the predictor
Gradient (slope) of the line
Direction/Strength of Relationship/Effect
b0
The value of outcome when predictor(s) = 0
(intercept)
Linear Models: Straight Line
Any straight line can be defined by two things:
(1) Slope: the slope (or gradient) of the line (usually denoted by b1); and
(2) Intercept: the point at which the line crosses the vertical
axis of the graph (known as the intercept of the line, b0).
These parameters b1 and b0 are known as the regression
coefficients.
Regression coefficients
Slope (or gradient), b1: the steepness and direction of the line
Intercept, b0: where the line crosses the vertical (y) axis
Same b0, different b1
[Figure: three lines of Sales vs Budget ($) sharing the same intercept but with positive, zero ("none") and negative slopes.]
the gradient (b1) tells us what the model looks like (its shape) and the
intercept (b0) tells us where the model is (its location in geometric space).
Straight Lines
Y_i = b_0 + b_1 X_i + ε_i
Y_i: outcome variable
b_0: intercept (the point where the line crosses the y axis)
b_1: slope (direction/strength of the relationship)
X_i: ith participant's score on the predictor variable
ε_i: error
Example – album sales
Predict number of albums you would sell from how much
you spend on advertising
Example – album sales
If we spend nothing on advertising, we sell 50 albums (b0)
What if you spend £5 on advertising?
Sales = 50 + 100*5 = 550 albums
This value of 550 album sales is known as a predicted value.
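This prediction arithmetic can be sketched in Python (a hypothetical helper for illustration; the handout's analyses are run in SPSS):

```python
# Toy coefficients from the slide's example: b0 = 50, b1 = 100.
def predict_sales(budget, b0=50.0, b1=100.0):
    """Predicted value from the linear model: sales = b0 + b1 * budget."""
    return b0 + b1 * budget

print(predict_sales(5))  # 550.0 albums for a £5 advertising spend
```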
The linear model with several
predictors
Y_i = b_0 + b_1 X_1i + b_2 X_2i + ε_i    (Eq. 4)
album sales_i = b_0 + b_1 advertising budget_i + b_2 airplay_i + ε_i
Fitting a line to the data
Simplest Model: the mean
Without other data, the best guess of the outcome (Y) is
always the mean
Ordinary Least Squares (OLS) regression:
Fits a line of best fit to the data
Estimates the constant (b0) and parameters of each
predictor (b for each X)
SPSS finds the values of the parameters that have the least
amount of error
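As a sketch of what OLS line-fitting does, here is the same idea in Python with NumPy on made-up data (the data values are assumptions for illustration, not those in Album_sales.sav):

```python
import numpy as np

# Made-up budget/sales pairs, for illustration only.
budget = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales = np.array([2.0, 3.5, 4.0, 5.5, 7.0])

# np.polyfit with degree 1 is ordinary least squares for a straight line:
# it returns the slope (b1) and intercept (b0) that minimise the squared residuals.
b1, b0 = np.polyfit(budget, sales, 1)
print(b0, b1)
```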
Total Sum of Squares, SST
[Figure: Sales vs Budget ($) scatterplot with the deviation of each score from the mean highlighted.]
SST
Total variability (variability between scores and the mean)
Residual Sum of Squares, SSR
SSR
Residual/Error variability (variability between the regression model and the
actual data)
[Figure: Sales vs Budget ($) scatterplot with the deviation of each score from the regression line highlighted.]
Model Sum of Squares, SSM
SSM
Model variability (difference in variability between the model and the mean)
Testing the Fit of the Model
We need to see whether the model is a reasonable ‘fit’ of the
actual data.
SST
Total variability (variability between scores and the mean)
SSR
Residual/Error variability (variability between the regression
model and the actual data)
SSM
Model variability (difference in variability between the model
and the mean)
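The three sums of squares can be computed directly; this sketch uses made-up data and an OLS-fitted line (all values are illustrative assumptions):

```python
import numpy as np

# Made-up data and an OLS-fitted line (b0 = 0.8, b1 = 1.2), for illustration.
budget = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales = np.array([2.0, 3.5, 4.0, 5.5, 7.0])
predicted = 0.8 + 1.2 * budget

sst = np.sum((sales - sales.mean()) ** 2)      # total variability
ssr = np.sum((sales - predicted) ** 2)         # residual variability
ssm = np.sum((predicted - sales.mean()) ** 2)  # model variability
print(sst, ssm, ssr)  # for an OLS fit, SST = SSM + SSR
```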
Testing the Model: ANOVA
SST (total variance) = SSM (improvement due to the model) + SSR (error in the model)
Testing the Model: ANOVA
If the model results in better prediction than using
the mean, then SSM should be greater than SSR
Mean Squared Error
Sums of squares are totals, so we use mean squares (each sum of squares divided by its degrees of freedom) instead:
F = MS_M / MS_R
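A sketch of the F-ratio with assumed toy values (SSM = 14.4 with 1 predictor; SSR = 0.3 with n − 2 = 3 residual degrees of freedom for n = 5 cases):

```python
# Assumed toy sums of squares and degrees of freedom.
ss_m, df_m = 14.4, 1   # model: one predictor
ss_r, df_r = 0.3, 3    # residual: n - 2 for simple regression (n = 5)

ms_m = ss_m / df_m     # mean square for the model
ms_r = ss_r / df_r     # mean square for the residuals
f = ms_m / ms_r
print(round(f, 2))  # 144.0
```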
Testing the Model: R2
R2
The proportion of variance accounted for by the regression
model.
The squared Pearson correlation between the observed and predicted scores
Adjusted R2
An estimate of R2 in the population (shrinkage)
R² = SSM / SST
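Both quantities can be sketched with assumed toy values (n = 5 cases, k = 1 predictor):

```python
# Assumed toy sums of squares.
ss_t, ss_m = 14.7, 14.4
n, k = 5, 1  # sample size and number of predictors

r2 = ss_m / ss_t
# Adjusted R² shrinks R² to estimate the population value;
# the gap widens as predictors are added relative to sample size.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 3), round(adj_r2, 3))
```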
Summary
We can fit linear models predicting an outcome from one or
more predictors
Parameter estimates (b)
Tell us about the shape of the model
Tell us about size and direction of relationship between predictor
and outcome
Can significance test
CI tells us about population value
Use bootstrapping if assumptions are in doubt
Model Fit
ANOVA
R2
Running the analysis
FILE: Album_sales.sav
What are our IV and DV?
How many participants/data points are there?
What kind of variables do we have? (nominal, interval or scale)
Does the scatterplot (on p7) show a positive or negative relationship between the two variables?
Running the analysis
Analyse → Regression → Linear…
Predictor (IV) goes in “Independent(s)”
Outcome (DV) goes in “Dependent”
Running the analysis
Click on “Bootstrap…”
Re-runs the analysis on resampled versions of your data (1000 samples by default)
Check “Perform Bootstrapping…”
and choose BCa
“Continue” then “OK” to run
Interpretation:
Simple Regression
Navigating the output
Model Summary: how useful is our model?
ANOVA: is our model better than the mean?
Coefficients: What are the numbers?
Bootstrap for coefficients
Model summary
First, is this model better than using the mean?
For simple regression, R = correlation coefficient
Compare errors (differences between predicted and
observed values) for both the mean model and the
regression model
amount of variance explained by the model vs the mean (R2)
Expressed as a percentage
R: values range from –1 to 1, so this is a large positive correlation
R²: how much of the variability in the outcome is accounted for by the predictors. Here, the predictor accounts for 33.5% of the variance in the outcome (.335 × 100)
Adjusted R²: gives us some idea of how our model generalizes, and is ideally very close to our value for R²
ANOVA
F-ratio measures how well the model predicts the outcome
(MSM) compared to error in the model (MSR)
Tells us if using our model is significantly better than using
the mean alone
F(1, 198) = 99.59, p < .001
Coefficients
Assess individual predictors using t-tests
H0: our value of b1 is zero
The t-test should therefore be significant if the predictor is genuinely related to the outcome
If b1 = 0, the outcome would be unchanged by that predictor variable
Examines whether our value of b is big compared to its error
b0: intercept; b1: slope. Is budget a significant predictor?
T-test: Are our variables significant predictors of our outcome?
In this case, the t-test tells us the same thing as the ANOVA, because there is only one predictor
We can also use this table to form our equation
Intercept (b0): if no money is spent on advertising how
many albums will be sold? (units are in 1,000s)
134,140 albums sold when advertising is 0 (134.14 × 1000)
Coefficient (b1): if we increase our predictor by 1 unit
(£1000), how many more albums will we sell?
96 additional albums sold for each £1,000 of advertising
budget spent (0.096 * 1000)=96
Bias
We need to meet four assumptions:
Linearity: the relationship to model is actually linear
Additivity: the outcome can be predicted by adding together all
predictors
Normality: residuals to be normally distributed for optimal b
estimates, normal sampling distribution for accurate CI and
statistical tests
Homoscedasticity: the variance of the residuals is constant across values of the predictor(s)
If we meet these assumptions, we can trust our estimates of b and their associated confidence intervals and significance tests
If not, then we can bootstrap to compute robust parameters and
confidence intervals instead
The bootstrap CI: the population value for b is likely to fall
between .08 and .11
The boundaries do not include zero, so there is a genuine positive relationship between advertising budget and album sales
If it contained 0, the true value might be 0 [i.e. no effect] or a
negative number [the opposite of our sample]
The p value associated with the confidence interval is also
highly significant (p=.001)
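A rough sketch of the bootstrap idea in Python with made-up data (this is a simple percentile bootstrap; SPSS's BCa interval additionally corrects for bias and skew):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up predictor/outcome data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 6.3, 6.8, 8.1])

# Refit the model on 1000 resamples (cases drawn with replacement)
# and collect the slope each time.
slopes = []
for _ in range(1000):
    idx = rng.integers(0, len(x), size=len(x))
    b1, b0 = np.polyfit(x[idx], y[idx], 1)
    slopes.append(b1)

lo, hi = np.percentile(slopes, [2.5, 97.5])  # 95% bootstrap CI for the slope
print(lo, hi)
```

If the interval excludes zero, as it does here, we conclude the slope is genuinely positive.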
Album Sales: More Predictors
Analyse → Regression → Linear…
Add a second block
for new predictors
Using the Model
If a company wanted to spend £100,000 on advertising,
how many albums would we predict they would sell?
Hint: units are in 1,000s!
Sales = 134.14 + 0.096 × 100 (the units are 1,000s, so £100,000 = 100 units)
Sales = 143.74 (i.e. 143.74 thousand albums)
Make a prediction: approximately 143,740 albums would be sold if the company spent £100,000 on advertising
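A quick check of this calculation (coefficients taken from the simple regression output; the unit handling is the key point):

```python
# Coefficients from the simple regression output (both sales and budget
# are measured in units of 1,000).
b0, b1 = 134.14, 0.096

budget_units = 100_000 / 1_000        # £100,000 expressed in £1,000 units
sales_units = b0 + b1 * budget_units  # predicted sales, in 1,000s of albums
albums = sales_units * 1_000
print(round(albums))  # 143740
```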
Album Sales: More Predictors
Advertising only accounted for 33.5% of variance in albums
sales, leaving 66.5% variance unaccounted for
The data file includes 2 additional predictors:
Amount of airplay the band receives on the radio
The attractiveness ratings of the band
Add these to the model to see if the model improves
Interpretation:
Multiple Regression
In this output we have 2 models:
1. Only advertising
2. All 3 predictors
R: the multiple correlation coefficient between the predictors and the outcome. In a hierarchical regression, this can change with the addition of new variables into the model.
R²: how much of the variability in the outcome is accounted for by the predictors. Advertising accounts for 33.5%, while attractiveness and airplay account for an extra 33%.
Adjusted R²: how our model generalizes.
Model 1: F(1, 198) = 99.59, p < .001; Model 2: F(3, 196) = 129.50, p < .001
Both models significantly improved our ability to predict the outcome variable
compared to not fitting the model (using the mean model)
Assess the contribution of each predictor using t-tests
Advertising budget: t(196)= 12.26, p<.001
Did the other predictors contribute significantly to the model?
No. of radio plays: t(196) = 12.12, p < .001
Attractiveness of band: t(196)= 4.55, p<.001
Remember: significance tests are only reliable if we have met our
assumptions!
Advertising budget: (b1= 0.09)
As advertising budget increases by 1 unit (£1000), album sales
increase by 0.09 units
Airplay: (b2= 3.37)
As the number of plays per week on Radio 1 increases by 1 unit (1 play), album sales increase by 3.37 units
Attractiveness: (b3= 11.09)
As attractiveness rating of band increases by 1 unit album sales
increase by 11.09 units
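These per-unit interpretations can be sketched as a small helper (slopes taken from the slides; the intercept is not reported here, so the helper only gives the change in predicted sales, not the total):

```python
# Slopes from the three-predictor model (sales in units of 1,000).
b_advert, b_airplay, b_attract = 0.09, 3.37, 11.09

def sales_change(d_advert=0.0, d_airplay=0.0, d_attract=0.0):
    """Change in predicted album sales (1,000s) for given changes in the
    predictors, holding the others constant."""
    return b_advert * d_advert + b_airplay * d_airplay + b_attract * d_attract

print(sales_change(d_airplay=1))  # 3.37 -- one extra radio play per week
```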
If assumptions are not met use bootstrap CIs
Advertising: (b=0.09) [0.07, 0.10], p=.001
Number of radio plays (b=3.37) [2.80, 3.99], p=.001
Attractiveness of band (b=11.09) [6.25, 15.10], p=.001
Bootstrap CIs do not cross zero
Can conclude confidently that bs are positive (do contribute)
Tasks!
Please complete all tasks (1 – 4) at the end of the handout
Write out all tests in APA style
Complete all calculations
We will give answers at the end of the session
Task 1
Correlation of .35 between suicide and listening to heavy metal
The model explains 12.5% of variance in suicidal tendencies (.125 x 100)
The model is significantly better than the mean model at predicting
suicide rates
F(1, 2135) = 304.78, p < .001, with heavy metal listening predicting suicide
risk, t(2135) = -17.46, p <.001, [-0.70, -0.53]
There is a negative relationship between listening to heavy metal and
suicide risk
As listening increases, suicide risk decreases
Suicide risk = 16.04 + (-0.61* heavy metal listening)
As listening increases by 1 unit, suicide risk decreases by 0.61 units
Task 2
Correlation of .08 between tea drinking and cognitive function
The model explains 0.6% of variance in cognitive functioning (.006 x
100)
Drinking tea significantly predicts cognitive function
F(1, 714) = 4.33, p = .038
Positive relationship between drinking tea and cognitive scores
As tea drinking increases, so does cognitive function
t(714) = 2.08, p = .038
Cognitive function = 49.22 + (0.46 x 10)
Score after 10 cups of tea = 53.82
Task 3
Correlation of .81 between mortality and number of pubs
The model explains 64.9% of variance in mortality (.649 x 100)
Number of pubs significantly predicts mortality
F(1, 6) = 11.12, p = .016
Positive relationship between number of pubs and number of
deaths
As pubs increase, so does mortality, t(6) = 3.33, p = .016
Mortality = 3351.96 + (14.34*pubs)
Task 4
The model explains 69.1% of variance in dishonesty ratings
(.691 x 100)
The model is significantly better than using the mean to predict
dishonesty ratings
F(1, 98) = 219.10, p< .001
There is a negative relationship between rating of likeability and
ratings of dishonesty
t(98) = 14.80, p < .001
The scales are written so as likeability increases DIShonesty decreases
(honesty increases)
Dishonesty = -1.86 + (0.94* likeableness)