ECON-3312
Machine Learning for Social Scientists
Lecture 5: Regression Problem and The Linear Regression Model
Lecture Outline
Regression Problem
The Linear Regression Model
Conditional Expectation Function
Why is it called “linear”?
Evaluating Linear Regression Model – Evaluation Metrics
Training the linear regression model – OLS
Problem with OLS in large datasets
Foreword
• Before training any model…
• Should have a minimum performance threshold in mind
• That is: if my model performs this well, I will deploy it
Discussion on the board
Using OLS to find betas for simple linear regression
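A sketch of the derivation, assuming the standard least-squares setup for one feature. The board work itself is not recorded here, so the notation (x_i, y_i, \beta_0, \beta_1, n) is an assumption:

```latex
% Choose the intercept and slope that minimize the sum of squared residuals.
\min_{\beta_0,\beta_1} \; S(\beta_0,\beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% First-order conditions.
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
\qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0

% Solving the two equations gives the OLS estimates.
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```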
Illustrating Regression Problem
Illustrating Data – One Feature
[Figure: Democracy Index plotted against Stability]
Predicting Scores
• To make predictions, we need to find a formula that describes the relationship between stability and democracy in the training data.
• The simplest way to do this is to draw a line: the regression line.
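As a minimal sketch of drawing the line and predicting from it, the snippet below fits a least-squares line and evaluates it at a new stability value. The data values and the helper names `fit_line` and `predict` are hypothetical and made up for illustration; they are not the lecture's dataset.

```python
import numpy as np

# Hypothetical training data (made up for illustration, not from the lecture).
stability = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
democracy = np.array([2.1, 3.9, 6.2, 7.8, 10.0])


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(x_new, intercept, slope):
    """Read the predicted score off the regression line."""
    return intercept + slope * x_new


intercept, slope = fit_line(stability, democracy)
predicted = predict(3.5, intercept, slope)
print(f"line: democracy = {intercept:.2f} + {slope:.2f} * stability")
print(f"predicted score at stability 3.5: {predicted:.2f}")
```

Every prediction made this way lies on the fitted line by construction.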
Predicting Scores – The Regression Line
[Figure: Democracy Index plotted against Stability, with the regression line]
Predicting Scores – On the line!
[Figure: Democracy Index plotted against Stability]
Strengths and Weaknesses
The Linear Regression Model
The Conditional Expectation Function
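The slide's formula is not preserved in this text. As a hedged reminder, the standard textbook statement is sketched below; the notation (p features, error term \varepsilon_i) is an assumption and may differ from the lecture's:

```latex
% Linear regression model with p features and an error term.
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad \mathbb{E}[\varepsilon_i \mid x_{i1}, \dots, x_{ip}] = 0

% Under that assumption, the conditional expectation function is linear in the betas.
\mathbb{E}[y_i \mid x_{i1}, \dots, x_{ip}] = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}
```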
Example – Simple Linear Regression
Example – Multiple Linear Regression
Why “Linear” Regression?
• Linear in betas (parameters)
• Means parameters are weights, not powers
• The predictors/features themselves may be non-linear: transformed by squaring, taking log, etc.
• The predictors may also be discrete – dummy variables
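To illustrate the points above, here is a minimal sketch in which the target depends on a squared feature and a dummy variable, yet the model is still linear in the betas and can be fitted by ordinary least squares. All values, including `true_betas`, are made up for illustration.

```python
import numpy as np

# Hypothetical features (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # dummy variable (0 or 1)

# Made-up parameters used to generate a noise-free target.
true_betas = np.array([1.0, 2.0, 0.5, -3.0])

# Design matrix columns: intercept, x, x squared, dummy.
# The features are transformed, but each beta enters only as a weight.
X = np.column_stack([np.ones_like(x), x, x ** 2, d])
y = X @ true_betas

# Least-squares fit of y on the columns of X.
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas)
```

Because the target here is generated without noise, the fit recovers the made-up parameters up to floating-point error. A model such as y = x ** beta would not be linear in this sense, since the parameter appears as a power.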
Evaluating Regression Performance
Residuals
Performance Metric: MAE
Performance Metric: RMSE
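A minimal sketch of residuals, MAE and RMSE, assuming the usual definitions: a residual is the actual value minus the predicted value, MAE is the mean of the absolute residuals, and RMSE is the square root of the mean squared residual. The function names and the numbers are hypothetical.

```python
import numpy as np


def residuals(y_true, y_pred):
    """Actual minus predicted, one residual per observation."""
    return np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)


def mae(y_true, y_pred):
    """Mean absolute error: average size of the residuals."""
    return np.mean(np.abs(residuals(y_true, y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large residuals more than MAE."""
    return np.sqrt(np.mean(residuals(y_true, y_pred) ** 2))


# Made-up actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

Both metrics are in the units of the target. RMSE is never smaller than MAE, and the gap widens when a few residuals are much larger than the rest.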
Performance Metric: R-squared
• R-squared is also known as the coefficient of determination.
• It measures the proportion of the variability in the dependent variable that is explained
by the independent variables in the regression model.
• It lies in [0, 1] when OLS with an intercept is evaluated on its training data.
• 1 indicates perfect fit.
• Review how it is measured!
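As a reminder of how it is measured, here is a minimal sketch assuming the usual definition: one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function name `r_squared` and the numbers are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print("R-squared:", r_squared(y_true, y_pred))
```

Predicting the mean for every observation gives 0 and perfect predictions give 1. Predictions that are worse than the mean give a negative value under this formula, which can happen when a model is scored on data it was not trained on.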