Assignment No: 1
Course Title: Sessional on Econometrics I
Course No.: Econ 4104

Submitted To
Prof. Khan Mehedi Hassan, PhD
Professor
Economics Discipline
Khulna University, Khulna
&
Tasnim Murad Mamun
Assistant Professor
Economics Discipline
Khulna University, Khulna

Submitted by
Group: G
Student No.: 211518, 211540, 211544, 211545
Year: 4th; Term: 1st
Economics Discipline
Khulna University, Khulna

Date of submission: 19th May 2025
Ans: A
Task: Run a simple linear regression
To run a simple linear regression model, we select the GPA1 dataset. Basic information about the dataset and variables is given in the following table:
Dataset: GPA1
Variable -- Unit of measurement -- Variable type
College grade point average (colGPA) -- four-point scale -- continuous
High school GPA (hsGPA) -- four-point scale -- continuous
Achievement test score (ACT) -- score -- continuous
Sample size: 141
Now we obtain the OLS regression line to predict college GPA from the achievement test score. The simple model is:
colGPA = β0 + β1 ACT + u--------------------------------(i)
We run the code below in Stata 17 to see the effect of ACT on colGPA in the GPA1 dataset. After running this code, we get the result below: Stata code(A)
reg colGPA ACT
Table 01: Regress colGPA on ACT
colGPA Coef. St.Err. t-value p-value [95% Conf Interval] Sig
ACT .027 .011 2.49 .014 .006 .049 **
Constant 2.403 .264 9.10 0 1.881 2.925 ***
Mean dependent var 3.057 SD dependent var 0.372
R-squared 0.043 Number of obs 141
Adj R-squared 0.036 Residual 18.577
F-test 6.207 Prob > F 0.014
*** p<.01, ** p<.05, * p<.1
Finally, we obtain a regression equation:
colGPA = 2.403 + 0.027 ACT---------------------(ii)
Interpretation:
First, if the ACT score is zero (ACT = 0), the predicted colGPA equals the intercept, 2.403. Next, we can write the
predicted change in colGPA as ∆colGPA = 0.027 (∆ACT). This means that if the achievement test score increases
by one point (∆ACT = 1), the predicted colGPA increases by about 0.027. The coefficient is statistically significant at the 5% level.
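As a quick sanity check on equation (ii), the fitted line can be evaluated for a hypothetical ACT score (the score of 25 below is an illustrative value, not a figure from the dataset):

```python
# Fitted OLS line from Table 01: predicted colGPA = 2.403 + 0.027 * ACT
b0, b1 = 2.403, 0.027

def predict_colgpa(act):
    """Predicted college GPA for a given ACT score."""
    return b0 + b1 * act

# Hypothetical student with an ACT score of 25
print(predict_colgpa(25))  # 2.403 + 0.027*25 = 3.078
```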
Note: In this document, most regression outputs were generated with the Stata 'asdoc' command. Screenshots were used in the few cases where the 'asdoc' formatting was poor.
Group G 1|Page
Ans: B
Task: Run a multiple linear regression. Interpret different terminologies from the regression result.
A multiple regression was run using the same dataset (GPA1). The model equation is given below:
colGPA = β0 + β1 hsGPA + β2 ACT + u Stata code(B)
reg colGPA hsGPA ACT
Result:
Table 02: Multiple regression colGPA
colGPA Coef. St.Err. t-value p-value [95% Conf Interval] Sig
hsGPA .453 .096 4.73 0.00 .264 .643 ***
ACT .009 .011 0.87 .383 -.012 .031
Constant 1.286 .341 3.77 0.00 .612 1.96 ***
Mean dependent var 3.057 SD dependent var 0.372
R-squared 0.176 Number of obs 141
F-test 14.781 Prob > F 0.000
Adj R-squared 0.164 Residual 15.982
*** p<.01, ** p<.05, * p<.1
S.N Terminology Explanation
1 Intercept The intercept 1.29 is the predicted college GPA when hsGPA and ACT are both set to zero.
2 Coefficient These are the estimated effects of the independent variables (hsGPA, ACT) on the
dependent variable colGPA. For example, the coefficient on hsGPA is 0.453, meaning
a one-point increase in hsGPA is associated with an average increase of 0.453 in
colGPA, holding other variables constant.
3 Standard error It measures the precision of the coefficient estimate; smaller standard
errors indicate more precise estimates. For hsGPA, the standard error is 0.096 (i.e., 0.453 ± 0.096).
4 Significance level The significance level (α) is the maximum acceptable probability of a Type I error,
set before the test. For example, hsGPA is statistically significant at the 1% level.
5 t value The t value tests whether the coefficient is significantly different from zero. It is
calculated as the coefficient divided by its standard error.
6 P value The p value is the smallest significance level at which the null hypothesis can be rejected. For
ACT, H0 can be rejected only at a significance level of about 38% or higher.
7 Confidence interval A range of values that contains the true coefficient with 95% confidence. It helps
assess the precision of the estimate. The hsGPA estimate ranges from 0.264 to 0.643.
8 R-squared Indicates the proportion of variation in the dependent variable explained by the
model; higher values suggest a better fit. In the model above, 17.6% of the variation in
colGPA is explained by hsGPA and ACT.
9 Adj R-squared The main difference between R² and adjusted R² is that adjusted R² penalizes the
addition of variables that do not improve the model.
10 Residual The residual is the difference between the actual value and the fitted value.
11 Error term (u) It contains all factors other than the explanatory variables, such as unobserved
determinants of the dependent variable.
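The t-value entry above (row 5) can be verified directly from Table 02: dividing the hsGPA coefficient by its standard error reproduces the reported t-value (only approximately, since the table shows rounded figures):

```python
# t = coefficient / standard error, using the rounded Table 02 values for hsGPA
coef, se = 0.453, 0.096
t_value = coef / se
print(round(t_value, 2))  # ~4.72, close to the reported 4.73 (rounding)
```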
Ans: C
Task: Calculate the omitted variable bias from equations (1) and (2).
Model 1: wage = β0 + β1 educ + β2 exper + u----------------(1)
Model 2: wage = β0 + β1 educ + u ---------------------------(2)
Model 3: exper = δ0 + δ1 educ + u --------------------------(3)
Stata code(C)
regress wage educ exper
estimates store Real
regress wage educ
estimates store OLS
regress exper educ
estimates store Omit
esttab Real OLS Omit, se b(%10.7f)
display (0.5413593) - (0.6442721)
display 0.0700954 * -1.4681823
First, we select three models: model (1) includes all variables, model (2) omits the 'exper'
variable, and model (3) is used to obtain δ1 (the slope from the regression of
experience on education) for the omitted variable bias formula.
Second, we ran these models in Stata 17 using the code above.
Third, we obtained the values (β2 and δ1) needed for the omitted variable bias
formula and calculated the bias.
Formula:
Omitted variable bias = β2 × δ1
= 0.0700954 × (−1.4681823)
= −0.1029128
where educ = education and exper = experience.
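The bias calculation above can be reproduced with simple arithmetic; both routes — the product formula β2 × δ1 and the direct difference between the two educ coefficients — give the same number (values copied from the Stata output shown):

```python
beta2  = 0.0700954    # coefficient on exper in model (1)
delta1 = -1.4681823   # slope from regressing exper on educ (model 3)

bias_formula = beta2 * delta1         # omitted variable bias = beta2 * delta1
bias_direct  = 0.5413593 - 0.6442721  # biased educ coef minus unbiased educ coef

print(round(bias_formula, 7))  # -0.1029128
print(round(bias_direct, 7))   # -0.1029128
```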
Ans: D
Task: Based on a factor variable test a hypothesis and interpret your result.
Null Hypothesis (H₀):
There is no difference in mean college GPA between students who own a PC and those who do not.
H0: μ(PC=1) − μ(PC=0) = 0
Alternative Hypothesis (H₁):
There is a significant difference in mean college GPA between students who own a PC and those who do not.
H1: μ(PC=1) − μ(PC=0) ≠ 0
Stata code(D)
ttest colGPA, by(PC)
❖ Before conducting the hypothesis test, it is essential to verify whether the data follow a normal
distribution, since many parametric tests (such as the t-test) assume normality. Various methods are
available to assess normality; in this task, we examined the distribution of colGPA using a histogram.
Interpretation:
1. Mean GPA Comparison:
▪ Students without a PC (group 0): Mean GPA = 2.9894
▪ Students with a PC (group 1): Mean GPA = 3.1589
2. Mean Difference:
▪ Difference = -0.1695
▪ The negative sign indicates that students without a PC have a lower GPA than those with a
PC.
3. 95% Confidence Interval for Mean Difference:
▪ Range: [-0.2934, -0.0456]
▪ Since the interval does not include 0, the difference is statistically significant.
4. t-statistic and p-value:
▪ t = -2.7045
▪ p-value (two-tailed test) = 0.0077
Since p-value < 0.05, we reject the null hypothesis.
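With 139 degrees of freedom (141 observations minus two group means), the t distribution is close to standard normal, so the reported two-tailed p-value can be sanity-checked with a normal approximation (only an approximation; Stata uses the exact t distribution):

```python
from statistics import NormalDist

t_stat = -2.7045  # reported t statistic for the PC group mean difference
# Two-tailed p-value under a standard-normal approximation to t(139)
p_approx = 2 * NormalDist().cdf(-abs(t_stat))
print(round(p_approx, 4))  # ~0.0068, in line with Stata's exact p = 0.0077
```

Either way, p < 0.05, so the decision to reject H0 is unchanged.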
Ans: E
Task: Run a level-level, level-log, log-level and log-log regression. Interpret your variable of interest.
Dataset: CEOSAL1.DTA
Stata code(E)
reg salary sales
reg salary lnsales
reg lnsalary sales
reg lnsalary lnsales
Model -- Equation -- Interpretation
Level-level: salary = β0 + β1 sales + u -- ∆salary = β1 ∆sales
Level-log: salary = β0 + β1 log(sales) + u -- ∆salary = (β1/100) %∆sales
Log-level: log(salary) = β0 + β1 sales + u -- %∆salary = (100 β1) ∆sales
Log-log: log(salary) = β0 + β1 log(sales) + u -- %∆salary = β1 %∆sales
1. Level-level
Interpretation: if sales increase by one unit, the predicted salary increases by 0.015 units on average.
2. Level-log
Interpretation: if sales increase by one percent, the predicted salary increases by 262.90/100 = 2.63 units on average.
3. Log-level
Interpretation: if sales increase by one unit, the predicted salary increases by 0.0015% (100 × 0.000015) on average.
4. Log-log
Interpretation: if sales increase by one percent, the predicted salary increases by 0.26% on average.
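The four interpretations can be tied back to the estimated slopes with quick arithmetic (the slope values 262.90 and 0.000015 are the ones quoted above):

```python
# Level-log: a 1% increase in sales changes salary by beta1/100 units
beta1_levellog = 262.90
effect_levellog = beta1_levellog / 100  # units of salary per 1% of sales
print(effect_levellog)  # 2.629, reported above as ~2.63

# Log-level: a one-unit increase in sales changes salary by 100*beta1 percent
beta1_loglevel = 0.000015
effect_loglevel = 100 * beta1_loglevel  # percent change per unit of sales
print(effect_loglevel)  # ~0.0015 percent
```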
Ans: F
Task: Run multiple linear regression by using dummy variable, quadratic and interaction term. Interpret
those variables.
To complete this task, we use the WAGE1.DTA dataset. Basic information about the dataset and variables is given in the following table:
Dataset: WAGE1.DTA
Variable -- Unit of measurement -- Variable type
Hourly wage (wage) -- dollars -- continuous
Years of education (educ) -- years -- continuous
Years of experience (exper) -- years -- continuous
Female (female) -- 1 = female, 0 = otherwise -- dummy
Experience squared (expersq) -- exper × exper -- quadratic term
Interaction term (educ_exper) -- educ × exper -- interaction
Sample size: 526
Regression Model:
wage = β0 + β1 educ + β2 exper + β3 female + β4 exper² + β5 (educ × exper) + u
Stata code(F)
gen expersq = exper*exper
gen educ_exper = educ*exper
reg wage educ exper female expersq educ_exper, robust
Table 03: Regress wage on dummy, quadratic and interaction term
wage Coef. Robust St.Err. t-value p-value [95% Conf Interval] Sig
educ .587 .089 6.61 0 .412 .761 ***
exper .278 .058 4.79 0 .164 .393 ***
female -2.1 .261 -8.04 0 -2.614 -1.587 ***
expersq -.005 .001 -6.25 0 -.006 -.003 ***
educ_exper -.002 .003 -0.44 .662 -.008 .005
Constant -2.738 1.186 -2.31 .021 -5.067 -.409 **
Mean dependent var 5.896 SD dependent var 3.693
R-squared 0.350 Number of obs 526
F-test 39.731 Prob > F 0.000
*** p<.01, ** p<.05, * p<.1
Quadratic function:
∆wage/∆exper ≈ β2 + 2β4(exper) = 0.278 − 2(0.005)(exper)
The first year of experience is worth roughly $0.28 per hour. The second year is worth slightly less, about
[0.278 − 2(0.005)(1)] ≈ $0.27. In going from 10 to 11 years of experience, the predicted wage increase is
about [0.278 − 2(0.005)(10)] ≈ $0.18.
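The diminishing-returns arithmetic above can be sketched as a small function (using the rounded coefficients 0.278 and −0.005 from Table 03):

```python
b_exper, b_expersq = 0.278, -0.005  # rounded Table 03 coefficients

def marginal_wage_effect(exper):
    """Approximate wage change from one more year of experience."""
    return b_exper + 2 * b_expersq * exper

print(round(marginal_wage_effect(1), 3))   # 0.268 (~$0.27 for the second year)
print(round(marginal_wage_effect(10), 3))  # 0.178 (~$0.18 going from 10 to 11 years)
```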
Interaction term (continuous by continuous):
Without interaction:
➢ If education increases by one year, the predicted wage increases by $0.59 on average.
➢ If experience increases by one year, the predicted wage increases by $0.28 on average.
With interaction:
∆wage/∆educ = β1 + β5(exper)
However, the interaction slightly reduces these gains, by about $0.002 for each additional year of
experience, though the interaction term is not statistically significant. For someone with 0 years of
experience, each year of education increases the predicted wage by $0.587. With 10 years of experience,
each year of education increases the predicted wage by 0.587 − (0.002 × 10) = $0.567. So the effect of
education declines with more experience, but this decline is not significant.
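The same logic for the education slope can be written out explicitly (0.587 and −0.002 are the rounded coefficients from Table 03):

```python
b_educ, b_inter = 0.587, -0.002  # rounded Table 03 coefficients

def educ_effect(exper):
    """Predicted wage gain from one more year of education, at a given experience level."""
    return b_educ + b_inter * exper

print(round(educ_effect(0), 3))   # 0.587 with no experience
print(round(educ_effect(10), 3))  # 0.567 with 10 years of experience
```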
Dummy variable:
A female worker earns about $2.10 less per hour than a male worker on average, holding other variables
constant. This effect is statistically significant at the 1% level.
Ans: G
Task: Draw the turning point of quadratic term
Now we draw the turning point of the quadratic term using Stata code.
From the regression model in Task F, the turning point is:
exper* = β2 / (2|β4|) = 30.52
(Stata computes this from the unrounded stored estimates via display -1*_b[exper]/(2*_b[expersq]); with the rounded coefficients, 0.278/(2 × 0.005) ≈ 27.8, the difference being rounding.)
Stata code(G)
reg wage educ exper female expersq educ_exper, robust
display -1*_b[exper]/(2*_b[expersq])
sum educ
scalar mean_educ = r(mean)
sum female
scalar mean_female = r(mean)
twoway ///
(scatter wage exper, mcolor(gs12)) ///
(function y = _b[_cons] ///
+ _b[educ]*`=mean_educ' ///
+ _b[exper]*x ///
+ _b[female]*`=mean_female' ///
+ _b[expersq]*x^2 ///
+ _b[educ_exper]*`=mean_educ'*x, ///
range(exper) lwidth(thick) lcolor(blue)) ///
(scatteri 7.4 30.52, msymbol(circle) msize(large) mcolor(red)) ///
, ytitle("Wage") xtitle("Experience") ///
legend(label(1 "Observed") label(2 "Fitted Line") label(3 "Turning Point")) ///
title("Quadratic relation between wage and experience")
❖ Drawing the turning point in Stata was complicated by the presence of additional variables in the model, such as education, female, and their interaction.
❖ To address this, we fixed these variables at their mean values to isolate the effect of experience.
❖ This allowed us to graph the quadratic relationship between wage and experience while controlling for the other factors.
❖ The turning point at 30.52 years of experience is the value at which the predicted wage is maximized in this model.
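The turning-point formula exper* = β2/(2|β4|) is easy to check by hand; note that with the rounded coefficients from Table 03 it gives roughly 27.8 years, whereas Stata's display command, which uses the full-precision stored estimates, reports 30.52:

```python
b_exper, b_expersq = 0.278, -0.005  # rounded Table 03 coefficients

# Turning point of the quadratic: where the marginal effect of experience is zero
turning_point = -b_exper / (2 * b_expersq)
print(round(turning_point, 1))  # 27.8 with rounded inputs (Stata: 30.52 unrounded)
```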
Ans: H
Task: Test heteroskedasticity and multicollinearity. Interpret your result. Why do you test robustness?
Regression model: colGPA = β0 + β1 hsGPA+ β2 ACT +β3 age + β4 junior + u
To test for heteroskedasticity and multicollinearity in the above model, we first run the regression and then apply the Stata commands hettest and vif.
Stata code(H)
reg colGPA hsGPA ACT age junior
hettest
vif
Table 04: Regression output for colGPA
colGPA Coef. St.Err. t-value p-value [95% Conf Interval] Sig
hsGPA .479 .099 4.84 0 .283 .675 ***
ACT .009 .011 0.85 .398 -.012 .03
age .037 .027 1.38 .17 -.016 .089
junior .05 .068 0.73 .464 -.084 .184
Constant .421 .717 0.59 .558 -.996 1.838
Mean dependent var 3.057 SD dependent var 0.372
R-squared 0.188 Number of obs 141
Adjusted R-squared 0.164 Residual 15.760
F-test 7.864 Prob > F 0.000
*** p<.01, ** p<.05, * p<.1
hettest: the p-value > 0.05, so we fail to reject the null hypothesis of homoskedasticity; there is no evidence of heteroskedasticity.
vif: the mean VIF (1.26) < 5, so there is no multicollinearity.
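The VIF reported by Stata's vif command is computed as 1/(1 − R²_aux), where R²_aux comes from regressing one explanatory variable on all the others. A small helper makes the rule concrete (the R² of 0.206 below is back-computed from the reported mean VIF of 1.26 and is purely illustrative):

```python
def vif(r2_aux):
    """Variance inflation factor from the auxiliary-regression R-squared."""
    return 1 / (1 - r2_aux)

# An auxiliary R-squared of about 0.206 corresponds to the reported mean VIF
print(round(vif(0.206), 2))  # 1.26 -- well below the common threshold of 5
```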
Why we test robustness:
➢ When heteroskedasticity is present, OLS estimators remain unbiased and consistent, but the standard
errors, t, F, and LM statistics become invalid.
➢ Robust standard errors (White, 1980) correct this issue, allowing valid inference even when the error
variance is not constant.
➢ Robust testing ensures:
▪ Reliable t-tests and F-tests.
▪ Confidence intervals are accurate.
▪ Policy conclusions are not misleading.
Ans: I
Task: Run a linear probability model and interpret the result
When the dependent variable is a dummy variable, we run a linear probability model.
Regression model:
colgpadummy = β0 + β1 age + β2 hsGPA + β3 skipped + β4 alcohol + β5 male + β6 ACT + u
where colgpadummy: 1 = colGPA of 3.00 or above (pass); 0 = otherwise.
Stata code(I)
gen colgpadummy = .
replace colgpadummy = 0 if colGPA < 3
replace colgpadummy = 1 if colGPA >= 3
label define colGPAlabel 0 "fail" 1 "pass"
label values colgpadummy colGPAlabel
reg colgpadummy age hsGPA skipped alcohol male ACT
Table 05: Linear probability model of college exam pass
colgpadummy Coef. St.Err. t-value p-value [95% Conf Interval] Sig
age .014 .032 0.43 .67 -.05 .077
hsGPA .396 .142 2.79 .006 .116 .676 ***
skipped -.119 .039 -3.07 .003 -.196 -.042 ***
alcohol .027 .032 0.85 .397 -.036 .09
male -.074 .088 -0.85 .398 -.247 .099
ACT .008 .015 0.54 .591 -.022 .039
Constant -1.137 .918 -1.24 .218 -2.953 .679
Mean dependent var 0.582 SD dependent var 0.495
R-squared 0.159 Number of obs 141
F-test 4.235 Prob > F 0.001
Adj R-squared 0.122 Residual 28.843
*** p<.01, ** p<.05, * p<.1
For every 1 unit increase in age, the probability of passing the exam increases by 0.014. However, this
effect is not statistically significant (p-value > 0.05), so age does not have a significant impact on passing the
exam. For every 1 point increase in high school GPA, the probability of passing the exam increases by 0.396.
This effect is statistically significant (p-value < 0.01), meaning a higher GPA significantly increases the
probability of passing.
For each additional class skipped, the probability of passing the exam decreases by 0.119. This effect is
statistically significant (p-value < 0.01), meaning that skipping classes reduces the chance of passing. Alcohol
consumption has a very small effect on the probability of passing the exam, and this effect is not statistically
significant (p-value > 0.05).
Therefore, alcohol consumption does not significantly affect the probability of passing. Being male
has a small negative effect on the probability of passing the exam, but this effect is not statistically significant
(p-value > 0.05). Gender does not significantly affect the probability of passing the exam.
The ACT score has a very small effect on the probability of passing the exam, and this effect is not
statistically significant (p-value > 0.05). Therefore, the ACT score does not significantly impact the probability
of passing. The constant term represents the baseline probability of passing the exam when all independent
variables are zero, but this effect is not statistically significant (p-value > 0.05).
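Since the linear probability model is just OLS, a predicted pass probability is a linear combination of the Table 05 coefficients. The student profile below (age 21, hsGPA 3.4, one class skipped, alcohol 1, male, ACT 25) is purely hypothetical; note that a well-known drawback of the LPM is that such predictions are not guaranteed to stay inside [0, 1]:

```python
# Rounded Table 05 coefficients of the linear probability model
coefs = {"_cons": -1.137, "age": 0.014, "hsGPA": 0.396,
         "skipped": -0.119, "alcohol": 0.027, "male": -0.074, "ACT": 0.008}

def predict_pass_prob(age, hsGPA, skipped, alcohol, male, ACT):
    """LPM fitted value; may fall outside [0, 1], a known limitation of the LPM."""
    return (coefs["_cons"] + coefs["age"] * age + coefs["hsGPA"] * hsGPA
            + coefs["skipped"] * skipped + coefs["alcohol"] * alcohol
            + coefs["male"] * male + coefs["ACT"] * ACT)

# Hypothetical student profile
p = predict_pass_prob(age=21, hsGPA=3.4, skipped=1, alcohol=1, male=1, ACT=25)
print(round(p, 3))  # ~0.537: predicted probability of a colGPA of 3.00 or above
```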
Ans: J
Task: Run a regression by using a time series dataset. Test autocorrelation and interpret the result.
Dataset: PRMINWGE.DTA
Stata code(J)
Model: tsset year, yearly
lprepop =β0 + β1 lmincov + β2 lprgnp + β3 lusgnp + β4 t + u reg lprepop lmincov lprgnp lusgnp t
dwstat
Table 06: Regression for time series data
lprepop Coef. St.Err. t- p- [95% Interval] Sig
value value Conf
lmincov -.212 .04 -5.29 0 -.294 -.131 ***
lprgnp .285 .08 3.54 .001 .121 .449 ***
lusgnp .486 .222 2.19 .036 .034 .938 **
t -.027 .005 -5.76 0 -.036 -.017 ***
Constant -6.663 1.258 -5.30 0 -9.223 -4.104 ***
Mean dependent var -0.944 SD dependent var 0.093
R-squared 0.889 Number of obs 38
F-test 66.234 Prob > F 0.000
Akaike crit. (AIC) -147.318 Bayesian crit. (BIC) -139.130
*** p<.01, ** p<.05, * p<.1
we can use the Durbin–Watson test to check for first-order autocorrelation in the errors.
Interpreting the Durbin–Watson Statistic
The Durbin–Watson statistic always takes a value between 0 and 4. A value of DW = 2 indicates that
there is no autocorrelation. A value below 2 indicates positive autocorrelation, and a value
above 2 indicates negative serial correlation. In this model, d = 1.013 < 2, which suggests positive
autocorrelation. To test for positive autocorrelation at significance level α, the test statistic DW is
compared to lower and upper critical values, which are given below:
The lower and upper bounds (dL and dU) depend on the sample size n and the number of explanatory variables
k (not including the intercept). For this model, n = 38 and k = 4; at the 1% significance level,
dL = 1.072 and dU = 1.515. These values are taken from Durbin–Watson significance tables
(https://shorturl.at/cGpby).
Finally, d = 1.013 < dL = 1.072. Because the Durbin–Watson statistic lies between 0 and dL, we conclude
that the regression errors exhibit positive autocorrelation.
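The comparison against dL and dU follows a fixed decision rule, which can be written out explicitly (dL = 1.072 and dU = 1.515 are the 1% critical values quoted above for n = 38, k = 4):

```python
def dw_decision(d, dl, du):
    """Durbin-Watson decision rule for first-order autocorrelation."""
    if d < dl:
        return "positive autocorrelation"
    if d < du:
        return "inconclusive"
    if d <= 4 - du:
        return "no autocorrelation"
    if d <= 4 - dl:
        return "inconclusive"
    return "negative autocorrelation"

print(dw_decision(1.013, dl=1.072, du=1.515))  # positive autocorrelation
```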