Annotated Stata Regression Output
. reg y x
Left block (ANOVA Table): SSE; SSR; SST; MSE.   Right block (Model: Goodness of Fit, GOF): n; F; Prob > F; R²; adj. R²; RMSE
      Source |        SS (1a)    df (1a)   MS (1a)            Number of obs =      15
-------------+--------------------------------------          F( 1, 13)     =   80.02
       Model | SSE (1b): 15.593    1 (1e)  15.593             Prob > F      =  0.0000
    Residual | SSR (1c): 2.5332   13 (1f)  MSE (1g): .19486   R-squared     =  0.8602
-------------+--------------------------------------          Adj R-squared =  0.8495
       Total | SST (1d): 18.126   14       1.2947             Root MSE = RMSE = .44143
Parameters: OLS Estimation and Inference/Precision (I/P)
LHS and RHS variables; β̂, se, t stat, p value, 95% Confidence Interval
OLS Estimation | ------------------- Inference/Precision (I/P) ------------------------
(LHS) y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(RHS) x | -13.66858 1.527994 -8.95 0.000 -16.96961 -10.36755
_cons | 4.404741 .2924053 15.06 0.000 3.773038 5.036444
Predicted Values
. predict yhat
(option xb assumed; fitted values)
. twoway (scatter y x) (line yhat x)
Sample Regression Function (SRF): the "Fitted values" line in the figure; the set of
predicted values, given the estimated linear relationship between x and y; uses the
estimated coefficients:
SRF: ŷ = β̂₀ + β̂₁x = 4.405 − 13.669x
β̂₀: _cons (intercept) coef.; β̂₁: x (slope) coef.
[Figure: scatter of y against x with the Fitted values (yhat) line overlaid; y on the
vertical axis (0 to 4), x on the horizontal axis (0 to .3)]
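To see that the Fitted values line really is just the SRF evaluated at each observation's x, the fitted values can be rebuilt by hand from the stored coefficients. A minimal do-file sketch (assuming the same y, x, and yhat as above; yhat_byhand is an illustrative name):

* rebuild the fitted values from the estimated coefficients (run after the -reg- above)
generate yhat_byhand = _b[_cons] + _b[x]*x    // = 4.405 - 13.669*x for each observation
summarize yhat yhat_byhand                    // the two columns should match
list y x yhat yhat_byhand in 1/5              // first few observations, side by side

_b[_cons] and _b[x] refer to the most recently estimated coefficients, so these lines have to run while the reg results above are still in memory.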
1) ANOVA Table
a) SS: Sum Square(d)s; df: Degrees of freedom; MS: Mean Square(d)s
b) SS/Model = SSE: Sum Squared Explained, or Explained Sum of Squares
c) SS/Residual = SSR: Sum Squared Residuals, or Residual Sum of Squares
d) SS/Total = SST: Sum Squared Total, or Total Sum of Squares (SST = SSE + SSR)
e) df/Model = k: #RHS variables (don't count the constant term; k=1 since one RHS var, x)
f) df/Residual = (n-k-1) = (n-2): residual degrees of freedom
g) GOF #3 – MS/Residual = MSE: Mean Squared Error (MSE = SSR/(n-k-1)); sort of an average squared residual¹ (recomputed in the sketch below)
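Everything in the ANOVA block is also saved by regress in its stored e() results, so the pieces can be listed and recombined directly. A minimal do-file sketch (assuming the same y and x; e(mss) and e(rss) are Stata's names for the Model and Residual sums of squares):

* pull the ANOVA quantities out of the stored results
quietly regress y x                                    // re-run the regression, suppressing output
ereturn list                                           // everything -regress- stores in e()
display "SSE (Model SS)    = " e(mss)                  // 15.593   (1b)
display "SSR (Residual SS) = " e(rss)                  // 2.5332   (1c)
display "SST = SSE + SSR   = " e(mss)+e(rss)           // 18.126   (1d)
display "df: Model = " e(df_m) ", Residual = " e(df_r) // 1 and 13 (1e, 1f)
display "MSE = SSR/(n-k-1) = " e(rss)/e(df_r)          // .19486   (1g)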
2) Model: Goodness of Fit (GOF) Metrics
a) n: Number of observations
b) GOF #1a – R-squared = R²: Coefficient of Determination (0 ≤ R² ≤ 1)
c) GOF #1b – Adj R-squared = R̄²: R² adjusted (for df); R̄² ≤ R² ≤ 1
d) GOF #2 – Root MSE = RMSE: Root Mean Squared Error; RMSE = √MSE; sort of an
average magnitude of the residuals
e) GOF #4a – F: F Statistic for the Regression
f) GOF #4b – Prob>F: probability value for the F stat (the GOF metrics are recomputed by hand in the sketch below)
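A minimal do-file sketch of where the GOF numbers come from, using the stored sums of squares (same y and x as above; the formulas follow the definitions in 1) and 2)):

* recompute the GOF metrics by hand from the stored results
quietly regress y x
display "R-squared     = SSE/SST     = " e(mss)/(e(mss)+e(rss))             // 0.8602
display "Adj R-squared = " 1-(1-e(r2))*(e(N)-1)/e(df_r)                     // 0.8495; = 1-(1-R2)(n-1)/(n-k-1)
display "Root MSE      = sqrt(MSE)   = " sqrt(e(rss)/e(df_r))               // .44143
display "F( 1, 13)     = (SSE/k)/MSE = " (e(mss)/e(df_m))/(e(rss)/e(df_r))  // 80.02
display "Prob > F      = " Ftail(e(df_m), e(df_r), e(F))                    // 0.0000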
3) Parameters: OLS Estimation and Inference/Precision (I/P) Metrics
RHS vars x and _cons, the constant/intercept term in the model (assume data generated
according to yᵢ = β₀ + β₁xᵢ + Uᵢ, where β₀ and β₁ are the true and unknown parameter
values, to be estimated using your data and OLS… and Uᵢ is the random element/noise)
a) OLS (Ordinary Least Squares) Estimation
i) coef. = β̂ₓ = β̂₁: OLS (slope) parameter estimate (hats! … since these are estimates)
b) Inference/Precision (I/P) Metrics
i) I/P #1a – P>|t| = p value: probability value for the t stat
ii) I/P #1b – [95% Conf. Interval]: 95% Confidence Interval
iii) I/P #2 – t : t statistic under the Null Hypothesis that the true parameter value is 0
iv) I/P #3 – Std. Err. = se = seₓ: standard error of the slope estimate (the I/P metrics are rebuilt by hand in the sketch below)
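The I/P columns for the slope can be rebuilt from just the coefficient, its standard error, and the residual degrees of freedom e(df_r) = n-k-1 = 13. A minimal do-file sketch (same regression as above):

* rebuild the t stat, p value, and 95% CI for the slope by hand
quietly regress y x
display "t stat  = coef/se      = " _b[x]/_se[x]                            // -8.95
display "p value = 2*P(T > |t|) = " 2*ttail(e(df_r), abs(_b[x]/_se[x]))     // 0.000
display "95% CI lower = " _b[x]-invttail(e(df_r), .025)*_se[x]              // -16.96961
display "95% CI upper = " _b[x]+invttail(e(df_r), .025)*_se[x]              // -10.36755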
¹ Reasonable people can disagree in selecting and ordering the GOF metrics … and the I/P metrics (below).