Lecture 2

This lecture discusses analyzing relationships between two or more variables simultaneously. It covers joint and conditional distributions, covariance and correlation, and linear prediction. Key points include: 1) The joint distribution of two random variables (X, Y) is described by their simultaneous probability density f(x, y); marginal densities are obtained by integrating f(x, y) over one of the variables. 2) Dependence between X and Y is summarized by their covariance and correlation coefficient; correlation measures the strength and direction of any linear relationship between the variables. 3) The conditional density f2(y|x) describes the distribution of Y given a value of X; it can be used to find the conditional mean E(Y|x), which is the optimal predictor of Y.


Lecture 2 Program

1. Introduction
2. Simultaneous distributions
3. Covariance and correlation
4. Conditional distributions
5. Prediction
Basic ideas
We will often consider two (or more) variables simultaneously.

Examples (B&S, page 15)
There are two typical ways this can be done:

(1) The data $(x_1, y_1), \ldots, (x_n, y_n)$ are considered as independent replications of a pair of random variables $(X, Y)$.

(2) The data are described by a linear regression model
$$y_i = a + b x_i + \epsilon_i, \qquad i = 1, \ldots, n$$
Here $y_1, \ldots, y_n$ are the responses, which are considered to be realizations of random variables, while $x_1, \ldots, x_n$ are considered to be fixed (i.e. non-random) and the $\epsilon_i$ are random errors (noise).
Situation (1) occurs for observational studies, while situation (2) occurs for planned experiments (where the values of the $x_i$ are under the control of the experimenter).

In situation (1) we will often condition on the observed values of the $x_i$ and analyse the data as if they came from situation (2).

In this lecture we focus on situation (1).
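As a concrete illustration (a minimal Python sketch of my own, assuming NumPy; all parameter values are made up), data can be generated under both situations as follows:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100

# Situation (1): (x_i, y_i) are independent replications of a random pair (X, Y),
# here drawn from a bivariate normal with correlation 0.6 (illustrative values).
mean = [0.0, 0.0]
cov = [[1.0, 0.6],
       [0.6, 1.0]]
x1, y1 = rng.multivariate_normal(mean, cov, size=n).T

# Situation (2): the x_i are fixed design points chosen by the experimenter,
# and y_i = a + b * x_i + eps_i with random errors eps_i.
a, b, sigma = 1.0, 2.0, 0.5          # illustrative parameter values
x2 = np.linspace(0, 1, n)            # fixed (non-random) design points
y2 = a + b * x2 + rng.normal(scale=sigma, size=n)
```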
Joint or simultaneous distributions
The most common way to describe the simultaneous distribution of a pair of random variables $(X, Y)$ is through their simultaneous probability density $f(x, y)$.

This is defined so that
$$P((X, Y) \in A) = \iint_A f(x, y)\, dx\, dy$$

The marginal density of $X$ is obtained by integrating over all possible values of $Y$:
$$f_1(x) = \int f(x, y)\, dy$$
and similarly for the marginal density $f_2(y)$ of $Y$.

If $f(x, y) = f_1(x)\, f_2(y)$, the random variables $X$ and $Y$ are independent.

Otherwise they are dependent, which means that there is a relationship between $X$ and $Y$, so that certain realizations of $X$ tend to occur more often together with certain realizations of $Y$ than with others.
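To make the definitions concrete, here is a small symbolic sketch (my own illustrative example, assuming SymPy): the joint density f(x, y) = x + y on the unit square, its marginals, and a check that X and Y are dependent.

```python
import sympy as sp

x, y = sp.symbols('x y')

# Illustrative joint density on the unit square [0, 1] x [0, 1]
f = x + y

# Marginal densities: integrate the joint density over the other variable
f1 = sp.integrate(f, (y, 0, 1))    # f1(x) = x + 1/2
f2 = sp.integrate(f, (x, 0, 1))    # f2(y) = y + 1/2

# Sanity check: the joint density integrates to 1
print(sp.integrate(f, (x, 0, 1), (y, 0, 1)))   # 1

# Independence would require f(x, y) = f1(x) * f2(y); here the difference
# is not identically zero, so X and Y are dependent.
print(sp.simplify(f - f1 * f2))
```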
Covariance and correlation
The dependence between $X$ and $Y$ is often summarized by the covariance:
$$\sigma_{12} = \mathrm{Cov}(X, Y) = \mathrm{E}[(X - \mu_1)(Y - \mu_2)]$$
and the correlation coefficient:
$$\rho = \mathrm{corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\mathrm{sd}(X)\, \mathrm{sd}(Y)}$$

The following are important properties of the correlation coefficient:

- $\mathrm{corr}(X, Y)$ takes values in the interval $[-1, 1]$
- $\mathrm{corr}(X, Y)$ describes the linear relationship between $Y$ and $X$
- If $X$ and $Y$ are independent, then $\mathrm{corr}(X, Y) = 0$, but not (necessarily) the other way around (see the sketch below)
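As a quick numerical illustration of the last point (an example of my own, assuming NumPy): take X standard normal and Y = X². The two are clearly dependent, yet their correlation is zero.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# X standard normal, Y = X^2: strongly dependent but uncorrelated,
# since Cov(X, X^2) = E[X^3] - E[X] E[X^2] = 0 for a symmetric distribution.
x = rng.standard_normal(100_000)
y = x**2

print(np.corrcoef(x, y)[0, 1])   # close to 0
```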
Correlation: correlated data
[Figure: four scatter plots of (x, y) data illustrating correlations of 0.9, 0.5, −0.5 and −0.9.]
Correlation: uncorrelated data
[Figure: scatter plot of uncorrelated (x, y) data; correlation 0.0.]
Correlation: uncorrelated data
[Figure: another scatter plot of (x, y) data with correlation 0.0.]
Transformations
Sometimes a transformation of the variables may improve the linear relationship.
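For instance (an illustrative sketch of my own, assuming NumPy; the model and parameter values are made up): if Y depends on X roughly exponentially, the correlation between X and log Y is much closer to 1 than the correlation between X and Y.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Illustrative data with a multiplicative (exponential) relationship
x = rng.uniform(0, 3, size=500)
y = np.exp(1.0 + 2.0 * x + rng.normal(scale=0.3, size=500))

print(np.corrcoef(x, y)[0, 1])           # moderate: the relationship is curved
print(np.corrcoef(x, np.log(y))[0, 1])   # close to 1: the log transform linearizes it
```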
Sample versions of covariance and correlation

Data $(x_1, y_1), \ldots, (x_n, y_n)$ are independent replicates of $(X, Y)$.

Empirical analogues to the population concepts and basic results:

Empirical covariance:
$$\hat\sigma_{12} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x_n)(y_i - \bar y_n)$$

Empirical correlation coefficient:
$$r = \frac{\hat\sigma_{12}}{s_{1n}\, s_{2n}}$$

When $n$ increases:
$$\hat\sigma_{12} \to \sigma_{12}, \qquad r \to \rho$$
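A small NumPy sketch (with illustrative data of my own) that computes the empirical covariance and correlation directly from these formulas and checks them against the built-in functions:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)    # illustrative dependent data

# Empirical covariance: 1/(n-1) * sum of (x_i - xbar)(y_i - ybar)
cov_hat = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Empirical correlation: covariance divided by the two sample standard deviations
r = cov_hat / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Agreement with NumPy's built-ins
print(np.isclose(cov_hat, np.cov(x, y)[0, 1]))     # True
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))      # True
```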
Conditional distributions
The conditional density of $Y$ given $X = x$ is given by
$$f_2(y \mid x) = \frac{f(x, y)}{f_1(x)}$$

If $X$ and $Y$ are independent, so that $f(x, y) = f_1(x) f_2(y)$, we see that $f_2(y \mid x) = f_2(y)$. This is reasonable, and corresponds to the fact that a realization of $X$ carries no information about the distribution of $Y$.

Using the conditional density, one may find the conditional mean and the conditional variance:

Conditional mean: $\mu_{2|x} = \mathrm{E}(Y \mid x)$

Conditional variance: $\sigma^2_{2|x} = \mathrm{Var}(Y \mid x)$

When $(X, Y)$ is bivariate normally distributed, $\mu_{2|x}$ is linear in $x$, and is known as the regression of $Y$ on $X = x$ (cf. below).
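Continuing the illustrative joint density f(x, y) = x + y on the unit square from the earlier sketch (again my own example, assuming SymPy), the conditional density, mean and variance can be computed symbolically:

```python
import sympy as sp

x, y = sp.symbols('x y')

f = x + y                                   # illustrative joint density on [0, 1]^2
f1 = sp.integrate(f, (y, 0, 1))             # marginal of X: x + 1/2

f2_given_x = f / f1                         # conditional density f2(y | x)
cond_mean = sp.integrate(y * f2_given_x, (y, 0, 1))                   # E(Y | x)
cond_var = sp.integrate((y - cond_mean)**2 * f2_given_x, (y, 0, 1))   # Var(Y | x)

print(sp.simplify(cond_mean))   # equals (3*x + 2)/(6*x + 3)
print(sp.simplify(cond_var))
```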
Prediction
When $X$ and $Y$ are dependent, it is reasonable that knowledge of the value of $X$ can be used to improve the prediction of the corresponding realization of $Y$.

Let $\hat Y(x)$ be such a predictor. Then:

- $\hat Y(x) - Y$ is the prediction error
- $\hat Y_{\mathrm{opt}}(x) = \mathrm{E}(Y \mid x)$ minimizes the mean squared prediction error $\mathrm{E}[(\hat Y(x) - Y)^2]$ (see the sketch below)
- $\mathrm{E}(Y \mid x)$ will often depend on unknown parameters, and it may be complicated to compute
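A quick Monte Carlo check of the optimality claim (a sketch of my own, assuming NumPy, with a made-up correlation value): for a standardized bivariate normal pair, E(Y | X = x) = ρx, and this predictor has a smaller mean squared error than, say, using x itself.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Standardized bivariate normal pair with correlation rho, so E(Y | X = x) = rho * x
rho = 0.7
n = 1_000_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

mse_opt   = np.mean((rho * x - y) ** 2)   # optimal predictor E(Y | x)
mse_other = np.mean((x - y) ** 2)         # some other predictor, Yhat(x) = x

print(mse_opt)     # close to 1 - rho^2 = 0.51
print(mse_other)   # larger (close to 2 - 2*rho = 0.6)
```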
Linear prediction
It is convenient to consider linear predictors, i.e. predictors of the form:
$$\hat Y_{\mathrm{lin}}(x) = a + b x$$

Minimizing $\mathrm{E}[(a + bX - Y)^2]$ with respect to $a$ and $b$ yields:
$$b = \rho\, \frac{\sigma_2}{\sigma_1} \qquad \text{and} \qquad a = \mu_2 - b\, \mu_1$$

The minimum is $\mathrm{E}[(\hat Y_{\mathrm{lin}}(x) - Y)^2] = \sigma_2^2 (1 - \rho^2)$.

Note that if $\rho^2$ increases, the mean squared error decreases.
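A numerical check of these formulas (a sketch of my own with made-up parameter values, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# Illustrative parameters for (X, Y)
mu1, mu2 = 1.0, 3.0
sigma1, sigma2 = 2.0, 1.5
rho = 0.6

n = 1_000_000
x = mu1 + sigma1 * rng.standard_normal(n)
y = (mu2 + rho * sigma2 * (x - mu1) / sigma1
     + sigma2 * np.sqrt(1 - rho**2) * rng.standard_normal(n))

# Best linear predictor
b = rho * sigma2 / sigma1
a = mu2 - b * mu1

print(np.mean((a + b * x - y) ** 2))   # close to the theoretical minimum
print(sigma2**2 * (1 - rho**2))        # sigma_2^2 * (1 - rho^2) = 1.44
```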
Linear prediction, contd.
Without knowledge of the value of $X$, the best predictor is the unconditional mean of $Y$, i.e. $\hat Y_0 = \mu_2$.

This has mean squared error $\mathrm{E}[(\hat Y_0 - Y)^2] = \sigma_2^2$.

Hence, a sensible measure of the quality of a prediction is the ratio
$$\frac{\mathrm{E}[(\hat Y_{\mathrm{lin}}(x) - Y)^2]}{\mathrm{E}[(\hat Y_0 - Y)^2]} = 1 - \rho^2$$

For judging a prediction, the squared correlation coefficient is therefore the appropriate measure.

When $a$ and $b$ are unknown, we plug in the empirical counterparts:
$$\hat b = r\, \frac{s_{2n}}{s_{1n}} \qquad \text{and} \qquad \hat a = \bar y_n - \hat b\, \bar x_n$$
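These plug-in estimates coincide with the ordinary least-squares fit. A sketch (my own illustrative data, assuming NumPy) comparing them with the coefficients returned by np.polyfit:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 500
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=n)   # illustrative data

# Plug-in estimates: b_hat = r * s_2n / s_1n, a_hat = ybar - b_hat * xbar
r = np.corrcoef(x, y)[0, 1]
b_hat = r * np.std(y, ddof=1) / np.std(x, ddof=1)
a_hat = y.mean() - b_hat * x.mean()

# The same coefficients from an ordinary least-squares fit (slope first)
b_ls, a_ls = np.polyfit(x, y, deg=1)

print(np.allclose([a_hat, b_hat], [a_ls, b_ls]))   # True
```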
The bivariate normal distribution
When $(X, Y)$ is bivariate normal:

- The distribution is described by the five parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$ and $\rho$
- The marginal distributions of $X$ and $Y$ are normal: $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$
- $\mathrm{corr}(X, Y) = \rho$ and $\mathrm{Cov}(X, Y) = \rho\, \sigma_1 \sigma_2$
- The conditional distributions are normal, with
  $$\mathrm{E}(Y \mid x) = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1} (x - \mu_1)$$
  $$\mathrm{Var}(Y \mid x) = \sigma_2^2 (1 - \rho^2)$$
- $b = \rho\, \dfrac{\sigma_2}{\sigma_1} = \dfrac{\sigma_{12}}{\sigma_1^2}$ and $a = \mu_2 - b\, \mu_1$
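A simulation sketch (my own, with made-up parameter values, assuming NumPy) that checks the conditional-mean and conditional-variance formulas by conditioning approximately on X ≈ x0:

```python
import numpy as np

rng = np.random.default_rng(seed=8)

# Illustrative parameters
mu1, mu2 = 0.0, 2.0
sigma1, sigma2 = 1.0, 1.5
rho = 0.8

n = 2_000_000
mean = [mu1, mu2]
cov = [[sigma1**2,             rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2**2            ]]
x, y = rng.multivariate_normal(mean, cov, size=n).T

# Condition (approximately) on X = x0 by keeping observations with X close to x0
x0 = 1.0
sel = np.abs(x - x0) < 0.01
print(y[sel].mean())   # close to mu2 + rho * sigma2 / sigma1 * (x0 - mu1) = 3.2
print(y[sel].var())    # close to sigma2**2 * (1 - rho**2) = 0.81
```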