Assignment on Data Analysis Using Stata and EViews
Measures of Dispersion:
Measures of Distribution:
Age: The data are relatively symmetric, with a slight left skewness.
The distribution is more peaked than a normal distribution.
Sales: The sales data are slightly positively skewed.
The distribution is moderately peaked.
By Gender: Gender 1 (presumably male) has a mean age of 24.13 and mean sales of 45.
Gender 2 (presumably female) has a mean age of 24.92 and mean sales of 42.58.
By Region: Region 1 has the highest mean sales (58.25) and a mean age of 24.5.
Region 2 has mean sales of 37.33 and a mean age of 22.5.
Region 3 has mean sales of 30.17 and a mean age of 26.83.
Sales: Sales range from 11 to 77. The most common sales values are in the mid-range
(between 40 and 50).
Age: Ages range from 19 to 30. The most common age values are centered around 24.
Gender: 40% are in Gender 1, and 60% are in Gender 2.
Region: 40% are in Region 1, 30% in Region 2, and 30% in Region 3.
2. Hypothesis Testing
One Sample t-test:
H0: The efficiency of workers in the private and government sectors is the same.
H1: The efficiency of workers in the private and government sectors is not the same.
The mean salary for the Private Sector (42.55) is significantly higher than that for the
Government Sector (39.125), with a t-value of 2.6778 (p = 0.0105). Because the p-value is
below 0.05, we reject H0.
Oromia State University, School of Graduate Studies, Msc In
Development Economics (Burayu Center)
H0: the mean value of observation before and after is the same.
H1: the mean value of observation before and after is not the same.
We reject H0 and accept H1 because the p-value is less than 5%: the mean value of the
observation before and after is not the same, i.e. the mean after is greater than the mean
before. The mean difference in salary (before minus after) is -3.45, indicating an increase in
salary after training, and this difference is statistically significant (t = -2.3913, p = 0.0273).
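The sign of a paired difference depends on the order of subtraction, which is why a negative mean difference can go together with a higher mean after training. The sketch below uses made-up salaries (not the assignment's data) and assumes the difference is computed as before minus after; `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(before, after):
    """Return (mean difference, t statistic) for paired samples.

    Differences are taken as before - after, so a negative mean
    difference means the 'after' values are larger on average.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sample std. dev. / sqrt(n).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / se

# Illustrative values only; these are not the assignment's data.
before = [40, 42, 38, 45, 41]
after = [43, 44, 42, 47, 46]
mean_diff, t_stat = paired_t(before, after)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.4f}")
```

Swapping the two arguments flips the sign of both the mean difference and the t statistic, while the p-value of a two-sided test stays the same.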
The overall model is significant (F = 8.24, p = 0.0005), and experience has a significant effect on
salary (F = 23.90, p < 0.0001). Education and the interaction between experience and education
are not significant.
3. Correlation Analysis
Bivariate/Pairwise Correlation:
Interpretation: There is a strong positive correlation (0.7018) between salary and marks,
indicating that individuals with higher marks tend to have higher salaries.
Interpretation: There is a very strong positive correlation (0.8132) between salary and
communication awareness. This suggests that higher communication awareness is associated
with higher salaries.
Interpretation: There is a very strong positive correlation (0.8444) between salary and IQ,
indicating that individuals with higher IQ scores tend to have higher salaries.
Partial Correlation:
Partial correlation measures the relationship between two variables while controlling for the
influence of other variables. The partial correlations of salary with other variables are as follows:
Interpretation: After controlling for other variables, the partial correlation suggests a moderate
positive relationship between salary and marks.
Interpretation: After controlling for other variables, the partial correlation indicates a moderate
positive relationship between salary and communication awareness.
Interpretation: After controlling for other variables, the partial correlation suggests a weak
positive relationship between salary and awareness.
Interpretation: After controlling for other variables, the partial correlation indicates a moderate
positive relationship between salary and IQ.
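As a rough illustration of why a partial correlation can be much smaller than the pairwise one, the sketch below applies the first-order partial correlation formula (one control variable). The report controls for several variables at once, so this is a simplified case; the marks-IQ correlation of 0.60 is an assumed value for illustration and does not come from the report.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

r_salary_marks = 0.7018   # reported pairwise correlation
r_salary_iq = 0.8444      # reported pairwise correlation
r_marks_iq = 0.60         # ASSUMED for illustration; not reported in the text

# x = salary, y = marks, z = IQ
r_partial = partial_corr(r_salary_marks, r_salary_iq, r_marks_iq)
print(f"partial r(salary, marks | IQ) = {r_partial:.4f}")
```

Under this assumed value, a strong pairwise correlation between salary and marks shrinks to a moderate one once IQ is held fixed, which matches the pattern described in the interpretations.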
Size (coef = 0.2091, p = 0.003): The coefficient is positive, suggesting that as the size increases,
the performance tends to increase. The p-value indicates that the size variable is statistically
significant.
Age (coef = 0.1897, p = 0.083): The coefficient is positive, suggesting that as age increases,
performance tends to increase. However, the p-value is greater than 0.05, indicating that age is
not statistically significant at the 5% level.
Intercept (coef = -0.3568, p = 0.939): The intercept is not statistically different from zero. It
is normally kept in the model anyway, because removing it would force the fitted line through
the origin.
Model Fit:
Adjusted R-squared = 0.1375: The adjusted R-squared suggests that the model explains
approximately 13.75% of the variance in the dependent variable.
F-Statistic (p = 0.0116): The F-statistic tests the overall significance of the model. The p-value is
less than 0.05, indicating that at least one independent variable is statistically significant in
predicting the dependent variable.
Yhat (predicted values): The Yhat values have been estimated for each observation.
Residuals (prediction errors): Residuals have been calculated as the difference between the
observed and predicted values.
Shapiro-Wilk and Shapiro-Francia Tests for Residuals: Both tests show non-significant
results (p-values > 0.05), suggesting that the residuals follow a normal distribution.
Skewness/Kurtosis tests for Normality for Residuals: The joint test indicates a non-
significant result (p-value > 0.05), suggesting that the skewness and kurtosis of residuals are not
significantly different from a normal distribution.
Histogram and Kernel Density Plot for Residuals: The histogram and kernel density plot
of residuals also visually support the normality assumption.
Skewness/Kurtosis tests for Normality: The joint tests show mixed results. Age and size
have significant p-values, indicating non-normality, while performance does not exhibit
significant departure from normality.
Shapiro-Wilk Tests for Normality: Similar to the joint tests, the Shapiro-Wilk tests show
that performance is borderline significant, while age and size are significantly non-normal.
In conclusion, the residuals from the multiple linear regression model appear to be normally
distributed, satisfying one of the key assumptions of linear regression. However, the predicted
values (Yhat) and some of the predictor variables (age and size) may not be perfectly normal.
Linear regression does not require these to be normally distributed, so the departures do not by
themselves violate the model's assumptions, although transformation of variables might still be
considered if they affect the fit.
Heteroscedasticity Test:
Multicollinearity Test:
Model: The logit regression model was applied to analyze the relationship between the
performance (dependent variable) and the size and age of entities.
Results: The log-likelihood increased from -34.62 to -11.48 across iterations, indicating model
improvement. The LR chi2(2) test is significant (p < 0.001), suggesting that the model as a
whole is statistically significant. Pseudo R2 of 0.67 indicates a good fit.
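The two log-likelihood values can be tied back to the reported fit statistics. The sketch below assumes that -34.62 is the log-likelihood of the constant-only model (iteration 0) and -11.48 that of the fitted model, and computes McFadden's pseudo R2 and the likelihood-ratio statistic from them.

```python
import math

ll_null = -34.62   # log-likelihood at iteration 0 (ASSUMED constant-only model)
ll_model = -11.48  # log-likelihood of the fitted model

# McFadden's pseudo R-squared: 1 - LL_model / LL_null.
pseudo_r2 = 1 - ll_model / ll_null

# Likelihood-ratio statistic, chi-square with 2 degrees of freedom here.
lr_chi2 = 2 * (ll_model - ll_null)

# For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
p_value = math.exp(-lr_chi2 / 2)

print(f"pseudo R2 = {pseudo_r2:.4f}, LR chi2(2) = {lr_chi2:.2f}, p = {p_value:.2e}")
```

Under that assumption the pseudo R2 rounds to the reported 0.67, and the p-value is far below 0.001.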
Variable Coefficients: Size (coef: 0.32, p < 0.01) and age (coef: 0.19, p < 0.01) are positively
associated with performance. The intercept (_cons) is negative, so the baseline log-odds of
performance are below zero (a probability under 0.5) when size and age are both zero.
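Logit coefficients are on the log-odds scale, so they are often easier to read as odds ratios. A minimal sketch using the two reported coefficients:

```python
import math

# Logit coefficients as reported in the text.
coefs = {"size": 0.32, "age": 0.19}

# exp(coefficient) is the multiplicative change in the odds of the outcome
# for a one-unit increase in the predictor, holding the other fixed.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

An odds ratio of about 1.38 for size means that each additional unit of size multiplies the odds of the outcome by roughly 1.38.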
Model: The probit regression model was used to analyze the same relationship as in the logit
model.
Results: Similar to logit, LR chi2(2) is significant (p < 0.001), indicating overall model
significance. Pseudo R2 is 0.68, suggesting a good fit.
Variable Coefficients: Size (coef: 0.19, p < 0.01) and age (coef: 0.11, p < 0.01) positively
influence performance. The intercept (_cons) is negative, so the baseline probability of
performance is below 0.5 when size and age are both zero.
Model: Tobit regression was used to analyze censored data regarding performance, considering
size and age.
Results: LR chi2(2) is significant (p < 0.001), indicating model significance. Pseudo R2 is 0.56,
suggesting a moderate fit.
Variable Coefficients: Size (coef: 0.014, p < 0.001) and age (coef: 0.015, p < 0.001) positively
impact performance. The intercept (_cons) is negative, representing baseline
performance. /sigma is the estimated standard deviation of the error term in the latent-variable
equation, indicating the variability of the latent outcome around its predicted value.
Observations: 40 for each of the seven variables.
Mean: The average value of each variable over the observation period.
The mean of LNRGDP is 12.14, LNGFCF is 10.62, LNEXPO is 9.82, LNEXD is 9.57, LNEHE
is 7.48, LNAID is 8.88 and INF is 9.70.
Median: The middle value of each variable when arranged in ascending order.
The median of: LNRGDP is about 11.94, LNGFCF is about 10.53, LNEXPO is about 9.67,
LNEXD is about 10.03, LNEHE is about 7.26, LNAID is about 9.06 and INF is 7.50.
Measures of Dispersion:
Maximum and Minimum: The highest and lowest values observed for each variable.
The maximum and minimum values of each variable are:
LNRGDP: 13.43 and 11.54
LNGFCF: 12.47 and 9.56
LNEXPO: 11.72 and 8.46
LNEXD: 12.49 and 6.37
LNEHE: 10.94 and 5.34
LNAID: 11.20 and 6.79
INF: 36.40 and -10.60
Std. Dev.: The standard deviations of LNRGDP, LNGFCF, LNEXPO, LNEXD, LNEHE, LNAID and INF
are 0.56, 0.76, 0.89, 1.61, 1.67, 1.30 and 10.19 respectively. Among the log-transformed
variables, LNEHE has the highest standard deviation (approximately 1.67).
Measures of Distribution:
Skewness: Indicates the asymmetry of the distribution. Positive skewness (LNRGDP,
LNGFCF, LNEXPO, LNEHE, INF) indicates a right-skewed distribution.
Kurtosis: Measures the "tailedness" of the distribution. All variables have positive
kurtosis; values above 3 indicate heavier tails than a normal distribution, and values below 3
indicate lighter tails.
Jarque-Bera Test: Tests whether the data follows a normal distribution. The higher the
statistic, the more divergent the distribution is from normal. For all variables, the p-values
are greater than 0.05, suggesting that they do not significantly depart from normality.
At Level:
All variables (LNRGDP, LNGFCF, LNEXPO, LNEXD, LNEHE, LNAID, INF) with a constant:
None of the variables is stationary at level; the null hypothesis of a unit root cannot be
rejected (p-values > 0.05).
All variables with a constant and trend: Some variables (LNRGDP, LNEXD, LNEHE, LNAID)
exhibit stationarity (p-values < 0.05).
At First Difference:
All variables with a constant: Some variables (LNGFCF, LNEXPO, LNEXD, LNEHE, LNAID,
INF) are stationary (p-values < 0.05).
All variables with a constant and trend: All variables are stationary.
Notes:
a: (*)Significant at the 10%; (**)Significant at the 5%; (***) Significant at the 1% and (no) Not Significant
b: Lag Length based on AIC
c: Probability based on MacKinnon (1996) one-sided p-values.
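The decision rule behind these level and first-difference results can be sketched as follows. It assumes a test whose null hypothesis is a unit root (as the MacKinnon p-values in the notes suggest); `integration_order` is a hypothetical helper and the p-values in the example are made up for illustration.

```python
def integration_order(p_level, p_first_diff, alpha=0.05):
    """Classify a series from unit-root test p-values.

    The null hypothesis of these tests is that the series has a unit root,
    so a p-value below alpha means the series is treated as stationary.
    """
    if p_level < alpha:
        return "I(0)"          # stationary at level
    if p_first_diff < alpha:
        return "I(1)"          # stationary only after first differencing
    return "higher order"      # still non-stationary after one difference

# Hypothetical p-values for illustration; not taken from the report's tables.
print(integration_order(0.30, 0.01))
print(integration_order(0.02, 0.00))
```

A p-value above 0.05 at level therefore means the unit root is not rejected, and the series is only treated as stationary once the differenced series rejects it.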
At Level:
All variables with a constant: None of the variables is stationary at level; the null hypothesis
of a unit root cannot be rejected (p-values > 0.05).
All variables with a constant and trend: Some variables (LNRGDP, LNAID) exhibit unit root
(p-values > 0.05).
At First Difference
All variables with a constant and trend: All variables are stationary.
At Level
                         LNRGDP     LNGFCF     LNEXPO     LNEXD      LNEHE      LNAID      INF
With Constant & Trend
  t-Statistic            0.1998     0.2059     0.1967     0.1123     0.1970     0.1242     0.1383
  Prob.                  **         **         **         no         **         *          *
At First Difference
                         d(LNRGDP)  d(LNGFCF)  d(LNEXPO)  d(LNEXD)   d(LNEHE)   d(LNAID)   d(INF)
With Constant
  t-Statistic            0.6870     0.4544     0.3036     0.0795     0.6451     0.1218     0.0212
  Prob.                  **         *          no         no         **         no         no
With Constant & Trend
  t-Statistic            0.1312     0.2403     0.1152     0.0702     0.0510     0.0924     0.0181
  Prob.                  *          ***        no         no         no         no         no
Notes:
a: (*)Significant at the 10%; (**)Significant at the 5%; (***) Significant at the 1% and (no) Not Significant
b: Lag Length based on AIC
c: Probability based on Kwiatkowski-Phillips-Schmidt-Shin (1992, Table 1)
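Because the KPSS test compares an LM statistic with tabulated critical values rather than reporting a p-value, the significance flags can be reproduced directly. The sketch below uses the asymptotic critical values from Kwiatkowski et al. (1992, Table 1) and the level statistics with constant and trend; it assumes the level columns follow the same variable order as the first-difference columns, and `kpss_stars` is a hypothetical helper name.

```python
# Asymptotic critical values from Kwiatkowski et al. (1992, Table 1):
# (10%, 5%, 1%) for the constant-only and constant-plus-trend cases.
KPSS_CRITICAL = {
    "constant": (0.347, 0.463, 0.739),
    "trend": (0.119, 0.146, 0.216),
}

def kpss_stars(lm_stat, case):
    """Return '*', '**', '***' or 'no' for a KPSS LM statistic.

    The KPSS null hypothesis is stationarity, so stars mean that
    stationarity is rejected at the 10%, 5% or 1% level.
    """
    c10, c5, c1 = KPSS_CRITICAL[case]
    if lm_stat > c1:
        return "***"
    if lm_stat > c5:
        return "**"
    if lm_stat > c10:
        return "*"
    return "no"

# Statistics reported for the level series with constant and trend
# (variable order ASSUMED to match the first-difference columns).
names = ["LNRGDP", "LNGFCF", "LNEXPO", "LNEXD", "LNEHE", "LNAID", "INF"]
level_trend = [0.1998, 0.2059, 0.1967, 0.1123, 0.1970, 0.1242, 0.1383]
flags = [kpss_stars(s, "trend") for s in level_trend]
for name, flag in zip(names, flags):
    print(name, flag)
```

Unlike the unit-root tests earlier in the report, a starred KPSS result is evidence against stationarity, and "no" means stationarity is not rejected.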
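At Level:
All variables with a constant: Some variables (LNGFCF, LNEXD, LNEHE) are reported as
stationary.
All variables with a constant and trend: The KPSS null hypothesis is stationarity, so a
significant statistic indicates non-stationarity. Stationarity is rejected at the 5% level for
LNRGDP, LNGFCF, LNEXPO and LNEHE, and only at the 10% level for LNAID and INF; it is not
rejected for LNEXD.
At First Difference:
All variables with a constant: Stationarity is rejected at the 5% level for LNRGDP and LNEHE
and at the 10% level for LNGFCF; the remaining variables (LNEXPO, LNEXD, LNAID, INF) are
stationary.
All variables with a constant and trend: Stationarity is rejected at the 1% level for LNGFCF and
at the 10% level for LNRGDP; the remaining variables are stationary.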
The presented results are from a vector autoregression (VAR) model with various lag orders (0,
1, and 2) for the endogenous variables (LNRGDP, LNGFCF, LNEXPO, LNEXD, LNEHE,
LNAID, INF) and an exogenous variable (C). The lag order selection criteria used are the
likelihood ratio test (LR), final prediction error (FPE), Akaike information criterion (AIC),
Schwarz information criterion (SC), and Hannan-Quinn information criterion (HQC). The
optimal lag order is indicated by asterisks (*) in each criterion.
LR: Lag 2 = 55.36
FPE: Lag 0 = 2.25e-05
AIC: Lag 0 = 9.16
SC: Lag 0 = 9.47; Lag 2 = 5.06
HQC: Lag 0 = 9.27
Interpretation:
Based on the lag order selection criteria, lag 1 appears to be the optimum lag length for this VAR
model. This conclusion is supported by the significant likelihood ratio test, the lowest final
prediction error, and the lower values of AIC, SC, and HQC compared to lag 0 and lag 2.
It's important to note that the choice of lag order can significantly impact the results and
interpretation of a VAR model. In this case, lag 1 is preferred for its better fit based on the
specified criteria. Further analysis and model diagnostics should be conducted to ensure the
reliability of the chosen lag order.
4. Co-integration Test
ARDL Bounds Test:
F-statistic: 2.960692 (k = 6)
Critical Value Bounds: The F-statistic is compared to critical value bounds at different
significance levels. At 1% significance level, the critical values are 3.15 and 4.43.
Wald Test: The Wald test assesses the joint hypothesis that certain coefficients are zero. F-
statistic: 10.76525 with (6, 23) degrees of freedom, p-value: 0.0000.
Null Hypothesis Summary: C(2), C(3), C(6), C(9), C(10), C(12) coefficients are tested for
being equal to zero. All p-values are less than 0.05, suggesting rejection of the null hypothesis
for all coefficients.
Tests for the number of cointegrating equations using the trace statistic.
Another test for the number of cointegrating equations using the maximum eigenvalue.
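Interpretation: The results suggest that there are long-term relationships among the variables.
The Wald F-statistic (10.76525) exceeds the 1% upper bound of 4.43, so the null hypothesis of no
long-run relationship is rejected; note that the bounds-test F-statistic of 2.960692 on its own
lies below the 1% lower bound of 3.15.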
Cointegrating Form:
The coefficient for the variable D(LNGFCF) is 0.183663, and it is statistically significant (t-
statistic = 4.092233, p-value = 0.0004).
The coefficient for the variable D(LNEXPO(-1)) is -0.076854, and it is statistically significant
(t-statistic = -2.103602, p-value = 0.0466).
The coefficient for the variable D(LNEXD(-1)) is 0.050479, and it is statistically significant (t-
statistic = 2.286333, p-value = 0.0318).
The coefficient for the variable D(LNEHE) is 0.167963, and it is statistically significant (t-
statistic = 2.565875, p-value = 0.0173).
The coefficient for the variable D(LNAID) is -0.052981, and it is statistically significant (t-
statistic = -2.412910, p-value = 0.0242).
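The coefficient for the error-correction term CointEq(-1) is -0.630755, and it is statistically
significant (t-statistic = -4.719128, p-value = 0.0001). Its negative sign indicates that about
63% of the previous period's deviation from the long-run equilibrium is corrected each period.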
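The coefficient on CointEq(-1) is usually read as a speed of adjustment toward the long-run equilibrium. The sketch below turns the reported value into the share of a deviation that remains after one period and an approximate half-life, assuming a constant geometric rate of adjustment.

```python
import math

ect = -0.630755  # reported coefficient on CointEq(-1)

# Share of last period's disequilibrium still remaining after one period.
remaining = 1 + ect

# Periods needed for a deviation to shrink by half, ASSUMING a constant
# geometric rate of adjustment (a common back-of-envelope reading).
half_life = math.log(0.5) / math.log(remaining)

print(f"corrected per period: {-ect:.1%}")
print(f"half-life of a deviation: {half_life:.2f} periods")
```

With annual data this would mean that half of any deviation from the long-run path disappears in well under one year.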
CointEq = LNRGDP − (0.2912×LNGFCF + 0.0010×LNEXPO − 0.0809×LNEXD + 0.2663×LNEHE
+ 0.0108×LNAID + 0.0018×INF + 7.7382)
The coefficient for LNGFCF is 0.291179, indicating a positive relationship with the dependent
variable, and it is statistically significant (t-statistic = 3.877050, p-value = 0.0008).
The coefficient for LNEXD is -0.080894, indicating a negative relationship with the dependent
variable, and it is statistically significant (t-statistic = -3.591263, p-value = 0.0015).
The coefficient for LNEHE is 0.266288, indicating a positive relationship with the dependent
variable, and it is statistically significant (t-statistic = 3.995004, p-value = 0.0006).
The constant C is 7.738165, and it is statistically significant (t-statistic = 10.816402,
p-value = 0.0000).
Overall, the cointegrating equation and long-run coefficients suggest that variables such as
LNGFCF, LNEXD, and LNEHE have a significant impact on the long-run behavior of the
dependent variable LNRGDP. The signs of the coefficients provide insights into the direction of
these relationships.
Unrestricted Cointegration Rank Test (Trace) and Unrestricted Cointegration Rank Test
(Max-eigenvalue)
Columns: Hypothesized No. of CE(s), Eigenvalue, Statistic (Trace or Max-Eigen), 0.05 Critical
Value, Prob.**
The null hypothesis in each row is that there are at most the stated number of cointegrating
equations (CE), starting from none.
None of CE: Eigenvalue 0.7894, Trace Statistic 200.9355 (Critical Value 125.6154), p-value
0.0000 (Reject the null hypothesis)
At most 1 CE: Eigenvalue 0.7209, Trace Statistic 143.2911 (Critical Value 95.75366), p-value
0.0000 (Reject the null hypothesis)
At most 2 CE: Eigenvalue 0.7035, Trace Statistic 96.07462 (Critical Value 69.81889), p-value
0.0001 (Reject the null hypothesis)
At most 3 CE: Eigenvalue 0.5241, Trace Statistic 51.09665 (Critical Value 47.85613), p-value
0.0240 (Reject the null hypothesis)
The trace test indicates that there are 4 cointegrating equations at the 0.05 significance level.
The null hypothesis for each case is the same as in the trace test.
None of CE: Eigenvalue 0.7894, Max-Eigen Statistic 57.64431 (Critical Value 46.23142), p-
value 0.0021 (Reject the null hypothesis)
At most 1 CE: Eigenvalue 0.7209, Max-Eigen Statistic 47.21652 (Critical Value 40.07757), p-
value 0.0067 (Reject the null hypothesis)
At most 2 CE: Eigenvalue 0.7035, Max-Eigen Statistic 44.97797 (Critical Value 33.87687), p-
value 0.0016 (Reject the null hypothesis)
At most 3 CE: Eigenvalue 0.5241, Max-Eigen Statistic 27.47459 (Critical Value 27.58434), p-
value 0.0516 (Do not reject the null hypothesis)
The max-eigenvalue test indicates that there are 3 cointegrating equations at the 0.05 significance
level.
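The sequential reading of both Johansen tests can be sketched as below: move down the rows and stop at the first null hypothesis that is not rejected. `johansen_rank` is a hypothetical helper; the statistics and critical values are the ones reported above, and only the four listed rows are used.

```python
def johansen_rank(stats, critical_values):
    """Return the first hypothesized rank whose null is not rejected.

    Row r tests the null of at most r cointegrating equations. If every
    listed row is rejected, the number of rows is returned as a lower bound.
    """
    for r, (stat, crit) in enumerate(zip(stats, critical_values)):
        if stat <= crit:
            return r
    return len(stats)

# Rows: none, at most 1, at most 2, at most 3 (values as reported).
trace = [200.9355, 143.2911, 96.07462, 51.09665]
trace_cv = [125.6154, 95.75366, 69.81889, 47.85613]
max_eig = [57.64431, 47.21652, 44.97797, 27.47459]
max_eig_cv = [46.23142, 40.07757, 33.87687, 27.58434]

# Lower bound only for the trace test: rows beyond "at most 3" are not listed.
rank_trace = johansen_rank(trace, trace_cv)
rank_max_eig = johansen_rank(max_eig, max_eig_cv)

# Consistency check: each max-eigenvalue statistic equals the drop
# between consecutive trace statistics.
gaps = [a - b for a, b in zip(trace, trace[1:])]

print(rank_trace, rank_max_eig)
print([round(g, 4) for g in gaps])
```

The max-eigenvalue statistic for "at most 3" falls just short of its critical value (27.47 against 27.58), which is why the two tests disagree by one cointegrating equation.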
Conclusion:
The results suggest that there is evidence of cointegration among the variables. The trace test
suggests 4 cointegrating equations, while the max-eigenvalue test suggests 3 cointegrating
equations. These results are important for understanding the long-run relationships among the
variables in the time-series data.
Interpretation: In the first period, 100% of the variance in LNRGDP is attributed to itself.
Over subsequent periods, the contribution of each variable (LNGFCF, LNEXPO, LNEXD,
LNEHE, LNAID, INF) to the variance in LNRGDP is shown.
For example, in the second period, LNGFCF contributes 1.13%, LNEXPO contributes 1.04%,
LNEXD contributes 1.02%, and so on.
Interpretation: LNGFCF contributes significantly to its own variance in the first period
(55.85%). Over time, other variables like LNRGDP, LNEXPO, and INF start to contribute to the
variance in LNGFCF.
For example, in the second period, LNRGDP contributes 54.34%, LNEXPO contributes 1.13%,
and INF contributes 0.56% to the variance in LNGFCF.
Interpretation: LNEXPO has a high initial contribution to its own variance (52.67%). Over
time, other variables, such as LNRGDP, LNEHE, and INF, start to contribute to the variance in
LNEXPO.
For example, in the second period, LNRGDP contributes 14.87%, LNEHE contributes 4.13%,
and INF contributes 0.69% to the variance in LNEXPO.
Similar patterns are observed for LNEXD, LNEHE, LNAID, and INF.
[Impulse response tables: responses of LNRGDP, LNGFCF, LNEXPO, LNEXD, LNEHE, LNAID and INF,
each listed by period against shocks to LNRGDP, LNGFCF, LNEXPO, LNEXD, LNEHE, LNAID and INF.]
Interpretation: In the first period, a shock to LNRGDP has a positive impact on itself
(0.05%) and a positive impact on LNEHE (0.07%). Over time, the responses of LNRGDP to
shocks become more complex, with varying impacts on other variables.
Response of Other Variables to Shocks: Similar patterns are observed for the responses of
LNGFCF, LNEXPO, LNEXD, LNEHE, LNAID, and INF to shocks in their respective
equations.
Overall Summary: The variance decomposition analysis provides insights into the relative
contributions of different variables to the overall variability in each variable. The impulse
response function analysis illustrates how shocks to one variable affect the others over time.
These findings can be valuable for understanding the dynamics and interdependencies among the
variables in the time-series data. Further analysis and interpretation would depend on the specific
context of the data and the economic or financial model under consideration.
Mean: The mean of the residuals is very close to zero (-7.72e-16), indicating that, on average, the
residuals do not exhibit a systematic bias.
Median: The median is slightly negative (-0.001221) and very close to the mean, consistent with
a roughly symmetric distribution.
Maximum and Minimum: The range between the maximum (0.053027) and minimum (-
0.051652) values of the residuals is relatively small, indicating limited variability.
Standard Deviation: The standard deviation (0.023146) provides a measure of the dispersion of
the residuals around the mean.
Skewness: The skewness value (-0.101793) is close to zero, indicating a roughly symmetric
distribution of residuals.
Kurtosis: The kurtosis value (3.039177) is very close to 3, the value for a normal distribution,
so the tails are essentially normal.
Jarque-Bera Test:
The Jarque-Bera test statistic is 0.068055, and the associated probability is 0.966545. The high p-
value (0.966545) suggests that we do not have enough evidence to reject the null hypothesis of
normality. Therefore, based on the Jarque-Bera test, we can assume that the residuals follow a
normal distribution.
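The Jarque-Bera statistic can be recomputed from the reported skewness and kurtosis. The sketch below assumes a sample size of 38, chosen because it reproduces the reported statistic; the report itself states 40 observations for the raw series and does not give the residual sample size here.

```python
import math

n = 38                 # ASSUMED sample size; not stated next to these statistics
skewness = -0.101793   # reported
kurtosis = 3.039177    # reported

# Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).
jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

# Under normality JB is chi-square with 2 degrees of freedom,
# whose survival function is exp(-x / 2).
p_value = math.exp(-jb / 2)

print(f"JB = {jb:.6f}, p = {p_value:.6f}")
```

Both terms inside the bracket are tiny because the skewness is near 0 and the kurtosis is near 3, which is what drives the high p-value.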
In conclusion, the descriptive statistics and the Jarque-Bera test results suggest that the residuals
from the time-series analysis exhibit properties of a normal distribution. The skewness is close to
zero, the kurtosis is close to 3, and the Jarque-Bera test does not provide sufficient
evidence to reject the assumption of normality. These results are essential for understanding the
distributional properties of the residuals and can inform further analyses and modeling decisions.
F-statistic: 0.859210
Obs*R-squared: 13.04921
Interpretation: The heteroskedasticity test does not provide evidence against the null hypothesis
of homoskedasticity. The p-values are all greater than the significance level of 0.05, suggesting
that there is no significant heteroskedasticity in the model.
F-statistic: 1.472007
Obs*R-squared: 4.672255
Interpretation: The test results do not provide sufficient evidence to reject the null hypothesis
of no serial correlation. The p-values are higher than the significance level of 0.05, indicating
that serial correlation is not present.
              Value      df        Probability
t-statistic   0.250789   22        0.8043
F-statistic   0.062895   (1, 22)   0.8043
F-test summary:
                   Sum of Sq.   df   Mean Squares
Test SSR           5.65E-05     1    5.65E-05
Restricted SSR     0.019822     23   0.000862
Unrestricted SSR   0.019765     22   0.000898
Interpretation: The Ramsey RESET test assesses the functional form of the model. The results
suggest that the inclusion of squares of fitted values does not significantly improve the model fit.
The p-values are high, indicating that the null hypothesis of correct functional form is not
rejected.
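The RESET F-statistic can be checked against the sums of squared residuals in the F-test summary. The sketch below uses only the reported values; small differences from the printed F-statistic come from rounding in the reported sums of squares.

```python
test_ssr = 5.65e-05          # reported Test SSR (restricted minus unrestricted)
unrestricted_ssr = 0.019765  # reported
df_num, df_den = 1, 22       # reported degrees of freedom
t_stat = 0.250789            # reported

# F = (Test SSR / numerator df) / (Unrestricted SSR / denominator df).
f_from_ssr = (test_ssr / df_num) / (unrestricted_ssr / df_den)

# With a single added term, the F statistic is the square of the t statistic.
f_from_t = t_stat ** 2

print(f"F from SSRs = {f_from_ssr:.6f}, F from t = {f_from_t:.6f}")
```

Both routes give a value of about 0.063, which is why the t-statistic and the F-statistic share the same probability of 0.8043.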
The Johansen cointegration approach assesses the number of cointegrating equations in a vector
autoregressive (VAR) model. The results are interpreted as follows:
Unrestricted Cointegration Rank Test (Trace):
The hypothesis "None" suggests no cointegration relationship.
The test statistics and critical values are provided for different hypothesized numbers of
cointegrating equations (CE).
The trace test indicates that at the 0.05 significance level, the null hypothesis of "None" is
rejected, and there are 4 cointegrating equations.
This suggests that there is evidence of a cointegrating relationship among the variables.
Unrestricted Cointegration Rank Test (Maximum Eigenvalue):
Similar to the trace test, this test also considers different hypothesized numbers of cointegrating
equations.
The max-eigenvalue test indicates that at the 0.05 significance level, the null hypothesis of
"None" is rejected, and there are 3 cointegrating equations.
In summary, both tests provide evidence that there is cointegration among the variables in the
vector autoregressive model. The number of cointegrating equations is estimated to be either 3 or
4, depending on the test used. This suggests a long-term relationship among the variables,
indicating that they move together over time. It is essential to consider these results in the
broader context of the analysis and to interpret them in conjunction with other diagnostic tests
and the economic theory underlying the model.