
Econometrics Tutorial 4 Exercises

The document outlines Tutorial 4 for an Econometrics course, including theoretical and practical exercises related to regression analysis and hypothesis testing. It provides tasks involving data analysis using R, interpretation of regression outputs, and statistical testing. Additionally, it includes exercises based on datasets concerning family consumption, teacher salaries, and health expenditures.

Uploaded by

trongphan.uel2

W4479 - Econometrics, WS 2024/2025

Tutorial 4
November 06, 2024

Remark 1: Tasks marked with an asterisk (*) will not be discussed in the tutorial session; however, a
solution will be provided on PANDA.

Part A - Theoretical tasks and questions


Exercise 4.1
State with reason whether the following statements are true, false, or uncertain.
1. Even though the disturbance term in the CLRM is not normally distributed, the OLS estimators are
still unbiased.
2. If there is no intercept in the regression model, the estimated ui (= ûi) will not sum to zero.
3. The p-value and the size of a test statistic mean the same thing.
4. In a regression model that contains the intercept, the sum of the residuals is always zero.
5. If a null hypothesis is not rejected, it is true.
6. The higher the value of σ², the larger is the variance of β̂2.
7. The conditional and unconditional means of a random variable are the same thing.
8. In the two-variable PRF, if the slope coefficient β2 is zero, the intercept β1 is estimated by the sample
mean Ȳ.
9. The conditional variance, var(Yi | Xi) = σ², and the unconditional variance of Y, var(Y) = σY², will be
the same if X has no influence on Y.
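Statements 2 and 4 can be checked numerically. The sketch below is not part of the original sheet: it uses plain Python as a calculator (the tutorial itself works in R) and borrows the data from Table 1 of Exercise 4.2. It fits the regression once with an intercept and once through the origin, then sums the residuals in each case. A single data set only illustrates the statements; it does not prove them.

```python
# Data from Table 1 (Exercise 4.2): weekly consumption Y and weekly income X.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS with intercept: slope from deviations, intercept from the sample means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar
resid_with = [y - (b1 + b2 * x) for x, y in zip(X, Y)]

# OLS through the origin: slope = sum(X*Y) / sum(X^2), no intercept term.
b_origin = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
resid_without = [y - b_origin * x for x, y in zip(X, Y)]

sum_with = sum(resid_with)
sum_without = sum(resid_without)
print(f"with intercept:    sum of residuals = {sum_with:.6f}")
print(f"without intercept: sum of residuals = {sum_without:.6f}")
```

With an intercept the sum is zero up to floating-point error; through the origin it is clearly nonzero for these data.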

Exercise 4.2
The dataset “[Link]” contains hypothetical data on weekly family consumption expenditure
Y and weekly family income X. The data is given in the table below.
Table 1: The table to consider in exercise 4.2.

no.     Y     X
  1    70    80
  2    65   100
  3    90   120
  4    95   140
  5   110   160
  6   115   180
  7   120   200
  8   140   220
  9   155   240
 10   150   260

1. Calculate the sums ∑Xi, ∑Yi, ∑Xi², ∑Yi² and ∑XiYi from the data by hand.
2. Fit a linear regression to the data, using the sums from 4.2.1.
3. Calculate TSS, RSS, ESS, r² and σ̂² from the sums in 4.2.1.
4. Obtain a prediction of E(Y0 | X0) and of Y0 | X0, both for X0 = 130, and obtain the corresponding
95% prediction interval for each of these predictions (assume normality of the errors).
Note: Data files to be downloaded from the HTML version of this document.
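The hand calculation in Exercise 4.2 can be cross-checked with a short script. The following is a sketch in plain Python (not the R workflow of the tutorial) that follows the four steps in order: raw sums, OLS estimates from those sums, the variance decomposition, and the two intervals at X0 = 130. The critical value `T_CRIT = 2.306` is an assumption taken from a standard t table (0.975 quantile, 8 degrees of freedom) rather than computed.

```python
from math import sqrt

# Data from Table 1.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)

# Step 1: the five raw sums.
sum_x = sum(X)
sum_y = sum(Y)
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Step 2: OLS estimates from the sums (deviation form).
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
b2 = s_xy / s_xx
b1 = sum_y / n - b2 * sum_x / n

# Step 3: variance decomposition and error variance estimate.
tss = s_yy
ess = b2 * s_xy
rss = tss - ess
r2 = ess / tss
sigma2_hat = rss / (n - 2)

# Step 4: point prediction and 95% intervals at X0 = 130.
T_CRIT = 2.306  # assumed table value: t quantile 0.975 with n - 2 = 8 df
x0 = 130
y0_hat = b1 + b2 * x0
leverage = 1 / n + (x0 - sum_x / n) ** 2 / s_xx
se_mean = sqrt(sigma2_hat * leverage)         # for E(Y0 | X0)
se_indiv = sqrt(sigma2_hat * (1 + leverage))  # for an individual Y0 | X0
ci_mean = (y0_hat - T_CRIT * se_mean, y0_hat + T_CRIT * se_mean)
ci_indiv = (y0_hat - T_CRIT * se_indiv, y0_hat + T_CRIT * se_indiv)

print(f"sums: {sum_x}, {sum_y}, {sum_x2}, {sum_y2}, {sum_xy}")
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"TSS = {tss:.3f}, ESS = {ess:.3f}, RSS = {rss:.3f}")
print(f"r2 = {r2:.4f}, sigma2_hat = {sigma2_hat:.3f}")
print(f"prediction at X0 = {x0}: {y0_hat:.3f}")
print(f"95% interval, mean:       ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
print(f"95% interval, individual: ({ci_indiv[0]:.2f}, {ci_indiv[1]:.2f})")
```

The interval for the individual value is wider than the one for the conditional mean because it adds the variance of a single disturbance to the estimation uncertainty.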

Exercise 4.3
From the data “[Link]” on weekly family consumption expenditure Y and weekly family
income X from task 4.2, we obtained the following regression (in R):

Ŷi = 24.4546 + 0.5091 · Xi
se = (6.4138) (      )
t  = (      ) (14.243)
r² = 0.9621, n = 10, σ̂ = 6.493
1. Fill in the missing numbers.
2. How do you interpret the coefficient 0.5091?
3. Would you reject the hypothesis that family income has no effect whatsoever on consumption (assume
normality of the errors)? Which test do you use? What is the p-value of your test statistic?
4. Set up the ANOVA table for this example and test the hypothesis that the slope coefficient is zero.
Which test do you use and why?
5. Suppose in the regression given above the r² value was not given to you. Could you have obtained it
from the other information given in the regression?
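The arithmetic behind 4.3.1 and 4.3.5 relies on two identities for the two-variable model: t = estimate / se under a zero null hypothesis, and F = t² = r²(n − 2)/(1 − r²) for the slope. The sketch below, in plain Python rather than R, applies them to the printed figures only. Results inherit the rounding of the printed output.

```python
# Figures printed in the regression output of Exercise 4.3.
n = 10
b1, b2 = 24.4546, 0.5091
se_b1 = 6.4138   # standard error of the intercept (given)
t_b2 = 14.243    # t statistic of the slope (given)

# Missing entries: t = estimate / se, so se = estimate / t.
se_b2 = b2 / t_b2
t_b1 = b1 / se_b1

# r^2 recovered from the slope t statistic: F = t^2 = r^2 (n - 2) / (1 - r^2).
df = n - 2
f_stat = t_b2 ** 2
r2_from_t = f_stat / (f_stat + df)

print(f"se(b2) = {se_b2:.4f}")
print(f"t(b1)  = {t_b1:.4f}")
print(f"F      = {f_stat:.2f}")
print(f"r2     = {r2_from_t:.4f}")
```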

Exercise 4.4
Consider the following regression output:
Ŷi = 0.2033 + 0.6560 · Xi
se = (0.0976) (0.1961)
r² = 0.9621, RSS = 0.0544, ESS = 0.0358,
where Y = labor force participation rate (LFPR) of women in 1972 and X = LFPR of women in 1968. The
regression results were obtained from a sample of 19 cities in the United States.

1. How do you interpret this regression?
2. Test the hypothesis H0: β2 ≤ 1 against H1: β2 > 1. Which test do you use (assume normality of the
errors), and why? What are the underlying assumptions of the test(s) you use?
3. Calculate the 95% confidence interval for β2 (assume normality of the errors).
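A sketch of the calculations for 4.4.2 and 4.4.3, in plain Python rather than R, using only the printed figures. The critical values `T_CRIT_ONE_SIDED = 1.740` and `T_CRIT_TWO_SIDED = 2.110` are assumptions taken from a standard t table for 17 degrees of freedom. As a side check, the script also computes ESS / (ESS + RSS) from the printed sums; this gives roughly 0.397, which does not agree with the printed r² = 0.9621, so one of the printed figures may be a typo.

```python
# Figures printed in the regression output of Exercise 4.4.
b2, se_b2 = 0.6560, 0.1961
rss, ess = 0.0544, 0.0358
n = 19
df = n - 2

# Assumed table values for the t distribution with 17 df.
T_CRIT_ONE_SIDED = 1.740  # 0.95 quantile, for H1: beta2 > 1 at the 5% level
T_CRIT_TWO_SIDED = 2.110  # 0.975 quantile, for the 95% confidence interval

# One-sided t test of H0: beta2 <= 1 against H1: beta2 > 1.
t_stat = (b2 - 1.0) / se_b2
reject_h0 = t_stat > T_CRIT_ONE_SIDED

# 95% confidence interval for beta2.
half_width = T_CRIT_TWO_SIDED * se_b2
ci = (b2 - half_width, b2 + half_width)

# Consistency check on the printed sums of squares.
r2_from_sums = ess / (ess + rss)

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
print(f"95% CI for beta2: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"ESS / (ESS + RSS) = {r2_from_sums:.4f}")
```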

Part B - Practical tasks using R


Exercise 4.5
The file “[Link]” gives data on average public teacher pay (annual salary in dollars) and spending on
public schools per pupil (dollars) in 1985 for 50 states and the District of Columbia. To find out if there is
any relationship between teachers' pay and per-pupil expenditure in public schools, the following model was
suggested:
Sali = β1 + β2 · Expi + ui,
where Sal stands for teachers' salary and Exp stands for per-pupil expenditure.
1. Plot the data and eyeball a regression line.
2. Suppose that, on the basis of 4.5.1, you decide to estimate the above regression model. Obtain the estimates
of the parameters, their standard errors, r², RSS, and ESS.
3. Interpret the regression. Does it make economic sense?
4. Establish a 95% confidence interval for β2. Would you reject the hypothesis that the true slope coefficient
is 3.0?
5. Obtain the mean and individual forecast value of Sal if per-pupil spending is $5000. Also establish 95%
confidence intervals for the true mean and individual values of Sal for the given spending figure.
6. How would you test the assumption of the normality of the error term? Show the test(s) you use.
Note: Data files to be downloaded from the HTML version of this document.
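For 4.5.6 the sheet does not say which normality test is intended. One common choice is the Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4), where S and K are the sample skewness and kurtosis of the residuals. The sketch below implements that formula in plain Python (the tutorial itself uses R). The function name `jarque_bera`, the toy residual vector, and the constant `CHI2_CRIT_5PCT_2DF = 5.991` (a standard chi-square table value) are assumptions for illustration; the toy residuals are not taken from the teacher salary data.

```python
def jarque_bera(residuals):
    """Return (skewness, kurtosis, JB statistic) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments with divisor n, as in the usual JB definition.
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return skew, kurt, jb


# Assumed table value: 5% critical value of the chi-square distribution, 2 df.
CHI2_CRIT_5PCT_2DF = 5.991

# Hypothetical toy residuals, only to show the call; not real data.
toy_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt, jb = jarque_bera(toy_residuals)
reject_normality = jb > CHI2_CRIT_5PCT_2DF

print(f"skewness = {skew:.4f}, kurtosis = {kurt:.4f}, JB = {jb:.4f}")
print(f"reject normality at 5%: {reject_normality}")
```

The chi-square reference distribution is an asymptotic result, so with a sample as small as 51 observations the test is only approximate.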

Exercise 4.6(*)
Refer to the S.A.T. data given in exercise 2.8 (on practice sheet 2). Suppose you want to predict the male
math scores Y on the basis of the female math scores X by running the following regression:

Yt = β1 + β2 · Xt + ut.

1. Estimate the preceding model.
2. From the estimated residuals, find out if the normality assumption can be sustained.
3. Now test the hypothesis that β2 = 1, that is, there is a one-to-one correspondence between male and
female math scores. Even if the previous subtask indicates that the normality assumption does not hold,
assume normality of the errors for this subtask.
4. Set up the ANOVA table for this problem.
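The layout of an ANOVA table for a two-variable regression can be sketched as follows. The S.A.T. data from practice sheet 2 are not reproduced on this sheet, so the example uses Table 1 from Exercise 4.2 as a stand-in. The helper name `anova_table` is hypothetical, the code is plain Python rather than R, and `F_CRIT = 5.32` is an assumed table value (5% critical value of F with 1 and 8 degrees of freedom).

```python
def anova_table(x, y):
    """Return the ANOVA decomposition for the regression of y on x with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    b2 = s_xy / s_xx
    ess = b2 * s_xy          # explained sum of squares, 1 df
    rss = tss - ess          # residual sum of squares, n - 2 df
    ms_reg = ess / 1
    ms_res = rss / (n - 2)
    return {
        "ess": ess, "rss": rss, "tss": tss,
        "df_reg": 1, "df_res": n - 2, "df_tot": n - 1,
        "ms_reg": ms_reg, "ms_res": ms_res, "f": ms_reg / ms_res,
    }


# Stand-in data: Table 1 from Exercise 4.2.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
F_CRIT = 5.32  # assumed table value: 5% critical value of F(1, 8)

table = anova_table(X, Y)
print(f"{'Source':<12}{'SS':>12}{'df':>4}{'MSS':>12}")
print(f"{'Regression':<12}{table['ess']:>12.3f}{table['df_reg']:>4}{table['ms_reg']:>12.3f}")
print(f"{'Residual':<12}{table['rss']:>12.3f}{table['df_res']:>4}{table['ms_res']:>12.3f}")
print(f"{'Total':<12}{table['tss']:>12.3f}{table['df_tot']:>4}")
print(f"F = {table['f']:.3f}, reject H0 (slope = 0): {table['f'] > F_CRIT}")
```

In the two-variable case the F statistic equals the square of the slope's t statistic, so the F test and the two-sided t test of a zero slope lead to the same decision.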

Exercise 4.7
The data sets “OECD Health [Link]” and “OECD Life [Link]” stem from the Organisation
for Economic Co-Operation and Development (OECD) and consist of data on health expenditures and life
expectancies for the OECD countries. The data are raw (i.e. identical to the data available on the
OECD's website) and form a panel data set that needs some preprocessing. In this task, we will
prepare the data set for our investigation purposes and run some econometric analysis on the influence of
health expenditures on life expectancy.
1. Load the data and get an overview.

2. We are interested in the log-values of health expenditures as a share of the gross domestic product, the
average life expectancy at birth of the population and the corresponding countries for the year 2017.
Filter and select the observations/variables of interest.
3. Plot the log-life expectancy Y against the log-health expenditure share of the GDP X. Use colors to
key the countries. What general conclusions can you draw from this picture? Run a linear regression
and interpret the model.
4. Identify possible outliers. Clean the data from outliers and run a linear regression on the new data.
Compare your result to that from 4.7.3.
5. Are the estimated effects (slope coefficients) from 4.7.3 and 4.7.4 significant (assume normality of the
errors)?
6. Test the hypothesis that the coefficient from 4.7.3 is equal to the coefficient from 4.7.4 (assume
normality of the errors).
Note: Data files to be downloaded from the HTML version of this document.
