THE UNIVERSITY OF HONG KONG
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE
STAT2301_3600 Linear Statistical Analysis (Semester 1, 2014/2015)
Example Class 1
Notations
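$$S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$
$$S_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$
$$S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$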
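The two forms of each sum (centred and computational) can be checked numerically. The following is a minimal sketch; the data and the helper names `mean`, `s_centred` and `s_shortcut` are illustrative assumptions and do not come from the handout.

```python
# Illustrative data only (not taken from the handout).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 12.0]

def mean(v):
    return sum(v) / len(v)

def s_centred(u, v):
    """Centred form: sum of (u_i - ubar) * (v_i - vbar)."""
    ubar, vbar = mean(u), mean(v)
    return sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))

def s_shortcut(u, v):
    """Computational form: sum of u_i * v_i minus n * ubar * vbar."""
    n = len(u)
    return sum(ui * vi for ui, vi in zip(u, v)) - n * mean(u) * mean(v)

# S_XX and S_YY are the special cases u = v.
S_XX, S_YY, S_XY = s_centred(x, x), s_centred(y, y), s_centred(x, y)
print(S_XX, S_YY, S_XY)
print(s_shortcut(x, x), s_shortcut(y, y), s_shortcut(x, y))
```

Both print statements should agree up to floating-point rounding. The computational form is convenient by hand but can lose precision when the mean is large relative to the spread, so the centred form is usually the safer choice in code.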
Scatter Plot / Scatter Diagram / Scatter Graph
A graphical approach to display the values of two variables, with the independent variable on the horizontal axis and the dependent variable on the vertical axis.
Sample correlation coefficient r
An indicator to measure the linear association between two variables
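$$r = \hat{\rho} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y} = \frac{\dfrac{S_{XY}}{n-1}}{\sqrt{\dfrac{S_{XX}}{n-1} \cdot \dfrac{S_{YY}}{n-1}}} = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$$
$$-1 \le r \le 1$$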
Scale-independent
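Scale independence can be illustrated with a short sketch: changing the units of either variable (a positive rescaling plus a shift) leaves r unchanged. The data and the function name `corr` are illustrative assumptions, not part of the handout.

```python
import math

def corr(x, y):
    """Sample correlation coefficient r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    s_yy = sum((yi - ybar) ** 2 for yi in y)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data only (not taken from the handout).
hours = [1.0, 2.0, 4.0, 7.0]
score = [2.0, 3.0, 7.0, 12.0]

r = corr(hours, score)

# Change of units: hours -> minutes, and every score shifted by 10.
minutes = [60.0 * h for h in hours]
shifted = [s + 10.0 for s in score]
r_rescaled = corr(minutes, shifted)

print(r, r_rescaled)
```

Multiplying one variable by a negative constant would flip the sign of r but not its magnitude.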
Simple Linear Regression Model
To study the linear relationship between an explanatory variable (independent variable / predictor variable / regressor) and a response variable (dependent variable / predicted variable) based on a sample (of size $n$) collected.
Consider a sample of $n$ observations of the form $(x_i, y_i)$, $i = 1, \dots, n$, where $x_i$ is the explanatory variable and $y_i$ is the response variable.
Assumptions of simple linear regression model:
$x_1, \dots, x_n$ are nonrandom constants,
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$,
the responses $y_1, \dots, y_n$ are independent.
$b_0 = \hat{\beta}_0$ and $b_1 = \hat{\beta}_1$ are estimators of $\beta_0$ and $\beta_1$.
Fitted value: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
Residual: $e_i = y_i - \hat{y}_i$
The least squares estimates (also the MLE) of $\beta_0$ and $\beta_1$ are
$$b_1 = \hat{\beta}_1 = \frac{S_{XY}}{S_{XX}}$$
$$b_0 = \hat{\beta}_0 = \bar{y} - b_1 \bar{x}$$
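As a minimal sketch of these formulas, the function below computes the intercept and slope and then forms the fitted values and residuals. The data and the name `fit_line` are illustrative assumptions. The last two printed quantities check a standard consequence of the least squares normal equations: the residuals sum to zero and are orthogonal to the x values.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope: S_XY / S_XX
    b0 = ybar - b1 * xbar     # intercept: ybar - b1 * xbar
    return b0, b1

# Illustrative data only (not taken from the handout).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 12.0]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(b0, b1)
print(sum(residuals))                                 # close to 0
print(sum(xi * ei for xi, ei in zip(x, residuals)))   # close to 0
```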
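$\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators for $\beta_0$ and $\beta_1$ respectively.
$$\mathrm{Var}(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{XX}} \right)$$
$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{XX}}$$
$$\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}\,\sigma^2}{S_{XX}}$$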
The mean square error (MSE) is the estimate of $\sigma^2$, i.e.
$$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}.$$
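Hence
$$\text{estimate of } \mathrm{Var}(\hat{\beta}_1) = \frac{s^2}{S_{XX}}$$
$$\text{estimate of } \mathrm{Var}(\hat{\beta}_0) = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{XX}} \right)$$
$$\text{estimate of } \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}\,s^2}{S_{XX}}$$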
Example 1.1
Show that when the line $y = \beta x$, which passes through the origin, is fitted to the data $(x_i, y_i)$, $i = 1, 2, \dots, n$, the least squares estimate of $\beta$ is
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$
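One possible route to this result, sketched under the assumption that the fitted line is written $y = \beta x$ and that $Q(\beta)$ (a name introduced here for illustration) denotes the sum of squared deviations to be minimised:

```latex
\begin{align*}
Q(\beta) &= \sum_{i=1}^{n} (y_i - \beta x_i)^2, \\
\frac{dQ}{d\beta} &= -2 \sum_{i=1}^{n} x_i (y_i - \beta x_i) = 0
  \;\Longrightarrow\;
  \hat{\beta} \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
  \;\Longrightarrow\;
  \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}, \\
\frac{d^2 Q}{d\beta^2} &= 2 \sum_{i=1}^{n} x_i^2 > 0
  \quad \text{(provided not all } x_i \text{ are zero)}.
\end{align*}
```

The positive second derivative confirms that the stationary point is a minimum.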
Example 1.2
Consider the following data on the number of hours that 10 persons studied for a French test and
their scores on the test:
Hours studied (x):    4    9   10   14    4    7   12   22    1   17
Test score (y):      31   58   65   73   37   44   60   91   21   84
The scatter plot of test score against hours studied is:
[Figure: scatter plot of test score (Y) against hours studied (X)]
a) Does a linear relationship appear reasonable?
b) Compute $S_{XX}$, $S_{YY}$ and $S_{XY}$.
c) Estimate the correlation coefficient between hours studied ($x$) and test score ($y$).
d) Write down an appropriate model according to the data given and state all the model
assumptions.
e) Fit a regression line of y on x .
f) Predict the test score of a person who studied 22.5 hours for the test. Is the prediction reliable?
g) Given: $\hat{\sigma}^2 = s^2 = 27.8848$. Find the standard errors for $b_0$ and $b_1$.
h) Estimate the covariance of $b_0$ and $b_1$.
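The numerical parts of this example can be checked with a short script. This is a sketch rather than a model answer: it reads the data as ten (hours, score) pairs in the order given, and all variable names are illustrative. It recomputes $s^2$ from the residuals, which should reproduce the value 27.8848 quoted in part (g).

```python
import math

# Data from Example 1.2, read as ten (hours, score) pairs.
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
score = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]
n = len(hours)

xbar = sum(hours) / n
ybar = sum(score) / n

# (b) Sums of squares and cross-products (computational form).
S_XX = sum(x * x for x in hours) - n * xbar ** 2
S_YY = sum(y * y for y in score) - n * ybar ** 2
S_XY = sum(x * y for x, y in zip(hours, score)) - n * xbar * ybar

# (c) Sample correlation coefficient.
r = S_XY / math.sqrt(S_XX * S_YY)

# (e) Least squares line of y on x.
b1 = S_XY / S_XX
b0 = ybar - b1 * xbar

# (f) Point prediction at 22.5 hours (just beyond the largest observed x).
y_pred = b0 + b1 * 22.5

# (g) MSE from the residuals, then the standard errors.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, score))
s2 = sse / (n - 2)
se_b1 = math.sqrt(s2 / S_XX)
se_b0 = math.sqrt(s2 * (1 / n + xbar ** 2 / S_XX))

# (h) Estimated covariance of b0 and b1.
cov_b0_b1 = -xbar * s2 / S_XX

print(S_XX, S_YY, S_XY)
print(round(r, 4), round(b0, 4), round(b1, 4), round(y_pred, 2))
print(round(s2, 4), round(se_b0, 4), round(se_b1, 4), round(cov_b0_b1, 4))
```

For part (f), note that 22.5 hours lies outside the observed range of 1 to 22 hours, so the prediction is an extrapolation and relies on the linear relationship continuing beyond the data.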
Example 1.3
A sample of $n$ boys and $n$ girls is taken from a secondary school and their heights are measured.
Let $y_1, y_2, \dots, y_n$ denote the heights of the $n$ girls, and $y_{n+1}, y_{n+2}, \dots, y_{2n}$ those of the $n$ boys,
respectively. It is believed that the random quantities $y_i$ satisfy $y_i = \alpha + \beta x_i + \varepsilon_i$,
$\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$, $i = 1, \dots, 2n$, where $\alpha$, $\beta$ are unknown parameters and the covariates $x_i$ are
defined by
$$x_i = \begin{cases} -1, & i = 1, 2, \dots, n \\ +1, & i = n+1, \dots, 2n \end{cases}$$
Find the least squares estimators of $\alpha$ and $\beta$.
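A sketch of one approach, assuming the model is $y_i = \alpha + \beta x_i + \varepsilon_i$ with $x_i = -1$ for the girls and $x_i = +1$ for the boys. The symbols $\bar{y}_G$ and $\bar{y}_B$ (sample mean heights of the girls and the boys) are introduced here for illustration.

```latex
\begin{align*}
\bar{x} &= \frac{1}{2n}\bigl(n(-1) + n(+1)\bigr) = 0, \qquad
S_{XX} = \sum_{i=1}^{2n} x_i^2 = 2n, \\
S_{XY} &= \sum_{i=1}^{2n} x_i y_i - 2n\,\bar{x}\,\bar{y}
        = \sum_{i=n+1}^{2n} y_i - \sum_{i=1}^{n} y_i
        = n\,(\bar{y}_B - \bar{y}_G), \\
\hat{\beta} &= \frac{S_{XY}}{S_{XX}} = \frac{\bar{y}_B - \bar{y}_G}{2}, \qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} = \bar{y} = \frac{\bar{y}_G + \bar{y}_B}{2}.
\end{align*}
```

Under this coding, $\hat{\alpha}$ is the overall mean height and $\hat{\beta}$ is half the difference between the boys' and girls' mean heights.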