A random variable is a variable whose value is unknown until it is observed. If g(X) is a function of X, then g(X) is also random.

Probability Density Function – f(x)
- Gives the probability of each possible value occurring.

Cumulative Distribution Function – F(x)
- The probability that X takes a value no greater than x: F(x) = P(X ≤ x).

Joint Probability Density Function – f(x,y)
- The joint probability of X and Y occurring together.
- Marginal probabilities are the PDFs of either the X or Y variable alone, found by summing the joint PDF over the other variable.

Conditional Probability – f(x|y)
- The probability of x occurring if y is assumed to happen: f(x|y) = f(x,y)/f(y).

Statistical Independence
- Occurs if x doesn't impact y.
- Holds if f(x|y) = f(x), or, equivalently, if f(x,y) = f(x)f(y).
- X and Y are ONLY statistically independent if either condition above is true for every pair of x and y (see the sketch after the assumptions list below).

Rules of mean: E(a) = a, E(aX) = aE(X), E(aX + bY) = aE(X) + bE(Y) for constants a and b.

The error term – Assumptions of Simple Linear Regression
1. SR1: for each value x, the mean of the error is zero: E(e) = 0.
2. SR2: equivalently, E(y) = β1 + β2x.
3. SR3: the variance of the error is constant: var(e) = σ² (homoskedasticity).
4. SR4: the covariance between any pair of random errors ei and ej is zero: cov(ei, ej) = 0.
   o Stronger version: the e's are statistically independent, therefore the values of y are statistically independent.
5. SR5: the variable x is not random, and takes at least two different values.
6. SR6 (optional): the values of e are normally distributed about their mean if the values of y are, and vice versa.
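A numeric sketch of the marginal, conditional and independence definitions above (the joint PMF values are made up for illustration):

    import numpy as np

    # Hypothetical joint PMF of X (rows) and Y (columns)
    f_xy = np.array([[0.10, 0.20],
                     [0.30, 0.40]])

    f_x = f_xy.sum(axis=1)               # marginal PDF of X: sum over y
    f_y = f_xy.sum(axis=0)               # marginal PDF of Y: sum over x
    f_x_given_y0 = f_xy[:, 0] / f_y[0]   # conditional: f(x|y) = f(x,y)/f(y)

    # Independent only if f(x,y) = f(x)f(y) for EVERY pair of x and y
    independent = np.allclose(f_xy, np.outer(f_x, f_y))
    print(f_x, f_y, f_x_given_y0, independent)   # here: not independent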
Rules of Summation
- Σa = Na; Σ(aXi) = aΣXi; Σ(Xi + Yi) = ΣXi + ΣYi.

Properties of Probability distributions
- Mean = key measure of centre: µ = E(X).
- Can calculate a conditional mean, E(Y|X = x).
- Variance = measure of dispersion: var(X) = E[(X − µ)²].

Covariance
- cov(X,Y) = E[(X − µX)(Y − µY)].

Correlation
- ρ = cov(X,Y)/(σX·σY).
- If ρ = −1 or 1: perfect negative/positive correlation.
- If ρ = 0: no correlation, and also cov(X,Y) = 0.

Normal Distribution – X ~ N(µ, σ²)
- Standard normal distribution: Z = (X − µ)/σ ~ N(0, 1).
- A weighted sum of normal variables is normally distributed.

Simple Regressions
- General form: y = β1 + β2x + e.
- Slope of simple regression: β2 = dE(y)/dx.
- Elasticities (for a 1% change in x, an {elasticity}% change in y): ε = β2·x/E(y), e.g. the elasticity of mean expenditure with respect to income.
  o Often used at a representative point on the regression line (e.g. the sample means).
- Log-linear function: ln(y) = β1 + β2x + e.
  o Slope: dy/dx = b2·y.
  o Elasticity: b2·x.
  o Semi-elasticity: 100·b2 is the % change in y for a 1-unit change in x.

Least squares regression
- General form of the fitted line: ŷ = b1 + b2x.
- Least squares residual: ê = y − ŷ = y − b1 − b2x.
- Generating the least squares estimates – minimise the function S(b1, b2) = Σ(yi − b1 − b2xi)².

The estimator b2
- b2 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)².
- Plus, E(b2) = β2 (if the model assumptions hold), hence the estimator is unbiased.

Variances/covariances of the OLS estimators
- var(b2) = σ²/Σ(xi − x̄)².
- The larger the error variance σ², the greater the uncertainty there is in the statistical model, and the larger the variances and covariance of the least squares estimators.
- The larger the sum of squares Σ(xi − x̄)², the smaller the variances of the LSE and the more precisely we estimate the unknown parameters.
- The larger the sample size N, the smaller the variances and covariances of the LSE.
- The larger the term Σxi², the larger the variance of the LSE b1.
- The absolute magnitude of the covariance increases with the magnitude of the sample mean x̄, and the covariance has a sign opposite to x̄.
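A minimal sketch of the least squares formulas above on made-up data:

    import numpy as np

    # Made-up sample
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    # b2 = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)²,  b1 = ȳ - b2·x̄
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b1 = y.mean() - b2 * x.mean()

    e_hat = y - (b1 + b2 * x)              # least squares residuals
    sse = np.sum(e_hat**2)                 # the minimised sum of squares
    elasticity = b2 * x.mean() / y.mean()  # elasticity at the sample means
    print(b1, b2, sse, elasticity)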
Gauss-Markov Theorem
- Under assumptions SR1–SR5, the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. They are BLUE – Best Linear Unbiased Estimators.

Facts about GMT
1. Estimators b1 and b2 are best compared to similar estimators – those which are linear and unbiased. It does not state that they are the best of all possible estimators.
2. Estimators are best in their class because they have minimum variance. We always want to use the one with the smallest variance, because that estimation rule gives us a higher probability of obtaining an estimate close to the true parameter value.
3. If any of assumptions SR1–SR5 do not hold, then the OLS estimators are not the best linear unbiased estimators.
4. The GMT doesn't depend on the assumption of normality (SR6).
5. The GMT applies to the LS estimators – not to the LS estimates from a single sample.

Normality assumption
- If we add SR6, the LSE are normally distributed.

CLT
- If SR1–SR5 hold, and the sample size N is sufficiently large, then the least squares estimators are approximately normal.
- Normalise by converting to Z: Z = (b2 − β2)/√var(b2) ~ N(0, 1).
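A small simulation consistent with the CLT claim (skewed, non-normal errors, yet b2 is approximately normal across samples; all numbers are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    b2s = []
    for _ in range(5000):
        e = rng.exponential(1.0, size=x.size) - 1.0  # skewed, mean-zero errors
        y = 1.0 + 0.5 * x + e
        b2s.append(np.sum((x - x.mean()) * (y - y.mean()))
                   / np.sum((x - x.mean())**2))
    b2s = np.array(b2s)
    # Sampling distribution of b2: centred on the true 0.5, roughly bell-shaped
    print(b2s.mean(), b2s.std())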
Estimating σ²
- σ̂² = Σêi²/(N − 2).

Estimating Variance and Covariance
- Replace σ² with σ̂² in the variance formulas, e.g. var̂(b2) = σ̂²/Σ(xi − x̄)².
- Variance-Covariance Matrix: collects var̂(b1), var̂(b2) and cov̂(b1, b2).

Standard error of b2
- se(b2) = √var̂(b2), i.e. the s.e. of b2 from sample to sample, used to construct the various b2s.

Goodness of fit and modelling issues
- R² – explains the proportion of the variance in y about its mean that is explained by the regression model.
- Mention omitted variables if R² is low.

Interval estimation and hypothesis testing

Point v Interval Estimates
- Point estimate – b2 is the point estimate of the unknown population parameter in the regression model.
- Interval estimate – a range of values in which the true parameter is likely to fall.

How to make an interval estimate of β2 (but we don't know the population s.d.)
- The standard normal pivot Z = (b2 − β2)/√var(b2) – BUT DO NOT USE! Use a critical t and degrees of freedom instead, because the population s.d. is unknown: t = (b2 − β2)/se(b2) ~ t(N−2).

Obtaining Interval Estimates
- Find the critical tc percentile value t(1−α/2, m) for degrees of freedom m.
- Then solve as above for the upper and lower limits: b2 ± tc·se(b2).
- 'When the procedure we used is applied to many random samples of data from the same population, then 95% of all the interval estimates constructed using this procedure will contain the true parameter.'
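A sketch of the interval computation, assuming illustrative values for b2, se(b2) and N:

    from scipy import stats

    b2, se_b2, N = 0.5, 0.04, 40               # hypothetical estimates
    tc = stats.t.ppf(1 - 0.05 / 2, df=N - 2)   # critical t(1 - α/2, N - 2)
    lower, upper = b2 - tc * se_b2, b2 + tc * se_b2
    print(lower, upper)                        # 95% interval estimate for β2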
Hypothesis Testing
Steps:
1. State the hypotheses, H0 and H1.
2. Calculate the test statistic: t = (b2 − c)/se(b2) ~ t(N−2) if H0: β2 = c is true.
3. Decide on α.
4. Calculate tc (the 1−α or 1−α/2 percentile, df = N−2) using the t-tables, and sketch the rejection region.
5. Rule: reject H0 if |t| > tc (two-tail).
6. Conclude in terms of the problem at hand.
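The same steps with made-up numbers (testing H0: β2 = 0 against a two-tail alternative):

    from scipy import stats

    b2, se_b2, N = 0.5, 0.04, 40                 # hypothetical estimates
    t_stat = (b2 - 0) / se_b2                    # step 2: test statistic
    alpha = 0.05                                 # step 3
    tc = stats.t.ppf(1 - alpha / 2, df=N - 2)    # step 4: critical value
    print(t_stat, tc, abs(t_stat) > tc)          # step 5: reject H0 if |t| > tc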
Multiple Regression
- General form: y = β1 + β2x2 + β3x3 + … + βKxK + e.
- Assumptions: as in the simple model – zero-mean errors with constant variance [i.e. homoskedastic] and zero covariance between errors.
- Assumptions about explanatory variables:
  o Expl. vars are not random (i.e. known prior to finding the value of the dependent var).
  o No expl. var is an exact linear function of another – otherwise exact collinearity and LS fails.
- Finding the OLS estimators – minimise S(β1, …, βK) = Σ(yi − β1 − β2xi2 − … − βKxiK)².
- LS estimators are random vars.
- Estimating σ²: σ̂² = Σêi²/(N − K), K = number of β parameters being estimated.
- Var-Covar matrix: as before but with σ̂² in place of σ² (use hats).
- Hypothesis testing of βk: t = (bk − c)/se(bk). *NOTE df = N−K!*
- Interval estimation: bk ± tc·se(bk), with tc from t(1−α/2, N−K).

Estimating nonlinear relationships
- Quadratic model: y = β1 + β2x² + e; the slope dE(y)/dx = 2β2x changes with x.
- Log-linear models: ln(y) = β1 + β2x + e (see also the indicator-variable examples below).

Model specification
- Omitted Variable Bias: omitting a relevant variable leads to a biased estimator. Can be viewed as setting βOmittedVar = 0.
- Irrelevant Variables: can increase the variance of the included vars, i.e. reducing the precision of those vars.
- Tips:
  1. Choose vars and functional form on the basis of theoretical and general understanding of the relationship.
  2. If the estimated equation has coefficients with unexpected signs or unrealistic magnitudes, it may be caused by misspecifications like the omission of an important var.
  3. Can perform sig. tests to decide whether a var or group of vars should be included.
  4. Can test the adequacy of a model using RESET (rejection suggests the model is not good).

Collinearity
- If x2 and x3 have corr = 1, var(b2) → infinity; likewise if x2 has no variation (i.e. collinear with the constant term), as in the sketch below.
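The variance blow-up can be seen in the two-regressor variance formula var(b2) = σ²/[Σ(x2 − x̄2)²(1 − r23²)], where r23 = corr(x2, x3); a sketch with made-up values:

    # var(b2) = σ² / [Σ(x2 - x̄2)² (1 - r23²)]
    sigma2, ss_x2 = 1.0, 100.0          # hypothetical σ² and Σ(x2 - x̄2)²
    for r23 in [0.0, 0.9, 0.99, 0.999]:
        var_b2 = sigma2 / (ss_x2 * (1 - r23**2))
        print(r23, var_b2)              # variance explodes as r23 -> 1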
Impacts of collinearity:
- With exact collinearity, cannot find the LS estimators, cannot obtain estimates of the βk.
- Estimator SEs are large, so t-tests will likely lead to the conclusion that parameter estimates are not sig. diff. from 0.
- Estimators may be very sensitive to the addition/deletion of a few obs, or to the deletion of an apparently insignificant var.
- Accurate forecasts may still be possible if the nature of the collinear relationship remains the same within the out-of-sample obs.

Testing joint hypotheses
- E.g. H0: β4 = β5 = β6 = 0; H1: any of β4, β5, β6 ≠ 0.
- Estimate the unrestricted model with all the xi.
- Estimate the restricted model with x4, x5, x6 excluded from y.
- Calc. SSER and SSEU; the F stat determines whether the reduction in SSE is large or small:
  F = [(SSER − SSEU)/J] / [SSEU/(N − K)]
- Compare with Fcrit(J, N−K) from the F-tables (J is read horizontally, across the top).
- J = # of restrictions (i.e. terms removed), N = # of obs, K = # of coef. in the unrestricted model inc. the constant.

Steps in F test
1. State H0 and H1.
2. Specify the test stat and its distribution.
3. Set sig. level, determine rejection region.
4. Calculate the sample value of the test stat.
5. State the conclusion.

Testing sig. of model (test of the overall significance of the regression model)
1. State H0 (all slope βk = 0) and H1 (at least one ≠ 0).
2. Continue as above, using the same F equation.

Relationship bw t- and F-tests
- When the F-test is for a single β, F = t².
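A sketch of the F computation with hypothetical SSE values:

    from scipy import stats

    SSE_R, SSE_U = 300.0, 280.0       # hypothetical restricted/unrestricted SSEs
    J, N, K = 3, 100, 6               # restrictions, observations, coefficients
    F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))
    Fc = stats.f.ppf(0.95, J, N - K)  # 5% critical value F(J, N-K)
    print(F, Fc, F > Fc)              # reject H0 if F > Fc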
Indicator variables
- Used to construct models in which some or all of the regression parameters, inc. the intercept, change for some obs in the sample.
- D = 0 defines the reference (base) group.
- Intercept indicator (dummy) variable: y = β1 + δD + β2x + e shifts the intercept by δ when D = 1.
  o Used to compare the difference bw two groups – the coefficient on D is the difference bw the population means.
- Dummy var trap: cannot include both L and (not L) alongside the constant – it will make exact collinearity. E.g. with regions, inc. N, S and E – and the base will be W.
- Can apply an F test to test the joint sig. of the dummies.
- Interaction variable (slope indicator/slope dummy): y = β1 + β2x + γ(D·x) + e lets the slope differ bw groups.
- Log-linear models with a dummy: ln(y) = β1 + β2x + δD + e.
  o Can approximate the % gap bw M/F by 100δ%.
  o For a better calculation, use 100(e^δ − 1)% (the exact % change bw Dummy = 1 and D = 0).

Linear Probability Model
- Probability function: y takes the values 1 and 0 with probabilities p and 1 − p; this is a Bernoulli distribution: E(y) = p, var(y) = p(1 − p).
- Model p as a linear function: p = E(y) = β1 + β2x.
- BUT the var of the error term is not homoskedastic, and p̂(x) can be < 0 or > 1 (i.e. problems with the model).

Heteroskedasticity
- When the var of e is not constant across obs – i.e. it increases/decreases or some combination. NOT about whether the residuals look randomly distributed!
- E.g. var(e) increases as x increases → y and e are heteroskedastic.
- Therefore the LS assumptions are violated – a violation of SR3, as the variance is a function of x.
- Two implications of heteroskedasticity:
  1. The LSE are still linear and unbiased – but not best; there is another better estimator.
  2. The standard errors usually computed for the LS estimators are incorrect. CIs and hyp. tests may be misleading; need to use a robust estimator of var(b2), not the usual one.
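A minimal statsmodels sketch of the robust-variance fix (simulated heteroskedastic data; HC1 is one common robust option):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(1, 10, 200)
    e = rng.normal(0, 0.5 * x)                  # var(e) grows with x
    y = 1.0 + 0.5 * x + e

    X = sm.add_constant(x)
    usual = sm.OLS(y, X).fit()                  # conventional SEs (incorrect here)
    robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs
    print(usual.bse, robust.bse)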
Detecting heteroskedasticity
- Visually (informal) – there should be no pattern in the residuals.
- Lagrange multiplier (Breusch-Pagan) Test:
  o Specify a variance function in terms of variables z2, …, zS suspected of driving the variance.
  o Sub ê²: regress ê² on the Zs – then the R² from the eqn measures the proportion of the variation in ê² explained by the Zs.
  o Use a chi-square test – test stat: χ² = N×R²; chi-crit: χ²(1−α, S−1). (See the sketch at the end of this section.)
  o BUT! Large-sample test only.
- White test:
  o Can test for hetero w/o precise knowledge of the relevant vars – sets the Zs equal to the xs, the x²s, and possibly the cross-products.
  o Use an F test, or χ² = N×R² as above.
  o NOTE: the White and Breusch-Pagan tests may give different results.

Heteroskedasticity-consistent standard errors (robust standard errors)
- Valid in large samples for both hetero- and homoskedastic errors.
- Help ensure CIs and test stats are correct when there is heteroskedasticity.
- BUT do not address the other impact of hetero – the LS estimator is no longer best.
- Failing to address this may not be too serious – with large N, the var of the LS estimators may be small enough to get adequately precise estimates.
- To find an alternative estimator with lower var, it is necessary to specify a suitable variance function (generalised least squares). Using LS with robust SEs avoids the need to specify a suitable variance function.
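A sketch of the Breusch-Pagan χ² = N×R² computation described above (simulated data; the single regressor doubles as the Z variable, so S − 1 = 1):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.uniform(1, 10, 200)
    y = 1.0 + 0.5 * x + rng.normal(0, 0.5 * x)  # heteroskedastic errors

    res = sm.OLS(y, sm.add_constant(x)).fit()
    aux = sm.OLS(res.resid**2, sm.add_constant(x)).fit()  # regress ê² on Z
    chi2 = len(y) * aux.rsquared                          # N × R²
    crit = stats.chi2.ppf(0.95, df=1)
    print(chi2, crit, chi2 > crit)                        # reject -> hetero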
Tough MCQs
- When collinear variables are included in an econometric model, coefficient estimates are: d) unbiased but have larger standard errors.
- If you reject the null hypothesis when performing a RESET test, what should you conclude? d) an incorrect functional form was used.
- How does including an irrelevant variable in a regression model affect the estimated coefficients of the other variables in the model? d) they are unbiased but have larger standard errors.
- If X has a negative effect on Y and Z has a positive effect upon Y, and X and Z are negatively correlated, what is the expected consequence of omitting Z from a regression of Y on X? a) the estimated coefficient on X will be biased downwards (too negative).
- What are the consequences of using least squares when heteroskedasticity is present? NONE of: a) no consequences, coefficient estimates are still unbiased; b) confidence intervals and hypothesis testing are inaccurate due to inflated standard errors; c) all coefficient estimates are biased for variables correlated with the error term; d) it requires very large sample sizes to get efficient estimates.
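A quick simulation of the omitted-variable MCQ above (all parameters made up: Z helps Y, X hurts Y, and X and Z are negatively correlated):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100000
    x = rng.normal(size=n)
    z = -0.8 * x + rng.normal(0, 0.6, size=n)    # corr(X, Z) < 0
    y = -1.0 * x + 1.0 * z + rng.normal(size=n)  # true effect of X is -1

    # Short regression of Y on X alone (Z omitted)
    b_x = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    print(b_x)   # ≈ -1.8: biased downwards, i.e. too negative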
Exam Qs

Suppose [equation] includes hetero – what does this mean for [CI/hyp tests]?
- For full marks, an explanation of heteroskedasticity, the consequences, and why the tests are unreliable is expected.
- E.g. heteroskedasticity is a violation of the GM assumption of constant error variance (homoskedasticity). The variance of the error term under hetero is no longer constant. (2 points) In the presence of hetero, standard errors will be biased, and test statistics therefore unreliable, since they depend on the estimates of the standard errors. (3 points)

Write down a model that allows the variance of e to differ between men and women. The variance should not depend on other factors.
- σ² = α1 + α2·male
- For full marks the variance function should be in terms of sigma squared – it doesn't matter which Greek letters are used for the coefficients.

Is the estimated variance of e higher for men or for women? [5 points]
- The estimated variance of e is lower for men than for women. The estimated coefficient suggests that the variance is lower for men by 28,849.63. Must state that it's lower AND say by how much it is lower for full marks.

Is the variance of e statistically different for men and for women? [5 points]
- Hypothesis test of the male coefficient. Required: hypotheses (1 point); test statistic / t critical / alpha OR p-value (3 points); conclusion (1 point).

Conduct an appropriate test for the presence of heteroskedasticity. What do you conclude? Show all working.
- State the equation to use for testing hetero: ê² = α1 + α2·male + v.
- Hypotheses (1 point): H0: α2 = 0 (homoskedasticity); H1: α2 ≠ 0 (heteroskedasticity).
- Test statistic (1 point): χ² = N×R² = 706×0.0016 = 1.1296.
- Level of significance, df, chi-square critical value – any level of significance can be used; for 0.05 and df = 1, the critical value is 3.841 (1 point).
- Conclusion (1 point): since the test statistic is not greater than the critical value, we cannot reject the null hypothesis of homoskedasticity. There is no heteroskedasticity in the model.

Depending on your result from part (16), what changes should be made to your model?
- Since the test in part (16) concludes that there is no hetero present, we don't need to do anything and can estimate the model as specified.

ln(WAGE) = β1 + β2EDUC + β3EDUC² + β4EXPER + β5EXPER² + β6HRSWK + e
d) Suppose you wish to test the hypothesis that a year of education has the same effect on ln(WAGE) as a year of experience. What null and alternative hypotheses would you set up? (5 marks)
- Education and experience have the same effect on ln(WAGE) if β2 = β4 and β3 = β5. The null and alternative hypotheses are: H0: β2 = β4 and β3 = β5; H1: β2 ≠ β4 or β3 ≠ β5 or both.
e) What is the restricted model, assuming that the null hypothesis is true? (5 marks)
- ln(WAGE) = β1 + β4(EDUC + EXPER) + β5(EDUC² + EXPER²) + β6HRSWK + e.
f) Given that the sum of squared errors from the restricted model is SSER = 254.1726, test the hypothesis in (d). (For SSEU use the relevant value from the table of output above. The sample size is N = 1000.)
- F = [(SSER − SSEU)/J] / [SSEU/(N − K)] = [(254.1726 − 222.6674)/2] / [222.6674/994] = 70.32. The 5% critical value is F = 3.005. Since the F statistic is greater than the F critical value, we reject the null hypothesis and conclude that education and experience have different effects on ln(WAGE).
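A quick check of the arithmetic in part (f):

    from scipy import stats

    SSE_R, SSE_U, J, N, K = 254.1726, 222.6674, 2, 1000, 6
    F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))
    Fc = stats.f.ppf(0.95, J, N - K)
    print(F, Fc)   # ≈ 70.32 and ≈ 3.005 -> reject H0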