Lec 10

W^* = (A^T A)^{-1} A^T Y

where A is a matrix whose rows are X_i and Y is a vector whose components are y_i.
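As a quick sketch on synthetic data (the true weights and noise level below are illustrative assumptions), the closed-form solution can be computed with NumPy; solving the normal equations directly, or using `np.linalg.lstsq`, gives the same minimizer:

```python
import numpy as np

# Synthetic data: rows of A are the X_i, components of Y are the y_i
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])              # illustrative true weights
Y = A @ w_true + 0.1 * rng.normal(size=100)

# Closed form W* = (A^T A)^{-1} A^T Y via a linear solve
W_star = np.linalg.solve(A.T @ A, A.T @ Y)

# np.linalg.lstsq computes the same least-squares minimizer, more stably
W_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)
```

In practice `lstsq` (or a QR/Cholesky solve) is preferred over explicitly inverting A^T A.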
• We can also minimize J by iterative gradient descent.
• An incremental version of this gradient descent is the
LMS algorithm.
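A minimal sketch of the LMS idea, assuming one streaming sample per step and an illustrative step size η (the data and true weights are synthetic):

```python
import numpy as np

def lms_step(W, x, y, eta):
    """One LMS update: descend the gradient of the single-sample squared error."""
    return W - eta * (x @ W - y) * x

# Synthetic streaming data
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
W = np.zeros(2)
for _ in range(5000):
    x = rng.normal(size=2)
    y = x @ w_true + 0.01 * rng.normal()
    W = lms_step(W, x, y, eta=0.05)
```

Each step uses only the current sample, so no data matrix is stored; for small enough η, W drifts toward the least-squares solution.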
nE[XX^T] W − nE[Xy] = 0

• This gives us the optimal W^* as

W^* = (E[XX^T])^{-1} E[Xy]
• Note that A^T A = ∑_{i=1}^n X_i X_i^T ≈ nE[XX^T].
• Similarly, A^T Y = ∑_{i=1}^n X_i y_i ≈ nE[Xy].
• Thus we have

(A^T A)^{-1} A^T Y ≈ (nE[XX^T])^{-1} (nE[Xy])
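A small numerical sketch of these approximations, using standard normal inputs (so E[XX^T] = I) and illustrative true weights:

```python
import numpy as np

# For large n, (A^T A)/n ≈ E[XX^T] and (A^T Y)/n ≈ E[Xy]
rng = np.random.default_rng(2)
n = 200_000
A = rng.normal(size=(n, 2))                 # standard normal X, so E[XX^T] = I
Y = A @ np.array([1.0, 3.0]) + rng.normal(size=n)

AtA_over_n = A.T @ A / n                    # sample estimate of E[XX^T]
AtY_over_n = A.T @ Y / n                    # sample estimate of E[Xy]
W_star = np.linalg.solve(AtA_over_n, AtY_over_n)   # ≈ (E[XX^T])^{-1} E[Xy]
```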
f^*(X) = E[y | X]
We have

(f(X) − y)^2 = [(f(X) − E[y|X]) + (E[y|X] − y)]^2
             = (f(X) − E[y|X])^2 + (E[y|X] − y)^2 + 2(f(X) − E[y|X])(E[y|X] − y)
• Taking expectations, the cross term vanishes (given X, E[E[y|X] − y] = 0), so the expected squared error is minimized by

f^*(X) = E[y | X]
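A quick simulation of this fact: for y drawn from a fixed conditional distribution (an illustrative Exp(1) here, standing in for y | X at one X), the constant prediction with smallest mean squared error is the sample mean:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.exponential(size=100_000)              # stand-in for y | X at a fixed X
cands = np.linspace(0.5, 2.0, 151)             # candidate constant predictions
risks = np.array([np.mean((c - y) ** 2) for c in cands])
best = cands[int(np.argmin(risks))]            # lands next to y.mean()
```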
ŷ(X) = h(W^T X + w_0)
W(k+1) = W(k) − η ∑_{i=1}^n h′(X_i^T W) X_i (h(X_i^T W) − y_i)
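A sketch of this batch update, assuming h is a sigmoid (so h′ = h(1 − h)); the data below are synthetic and the step size is an illustrative choice:

```python
import numpy as np

def h(z):
    # assuming a sigmoid nonlinearity for h
    return 1.0 / (1.0 + np.exp(-z))

def gd_step(W, X, y, eta):
    z = X @ W
    hp = h(z) * (1.0 - h(z))                 # h'(z) for the sigmoid
    grad = X.T @ (hp * (h(z) - y))           # ∑_i h'(X_i^T W) X_i (h(X_i^T W) − y_i)
    return W - eta * grad

# Synthetic, linearly separable labels
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W = np.zeros(2)
loss_before = np.mean((h(X @ W) - y) ** 2)
for _ in range(2000):
    W = gd_step(W, X, y, eta=0.1)
loss_after = np.mean((h(X @ W) - y) ** 2)
```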
q_0(X) = f_0(X) p_0 / (f_0(X) p_0 + f_1(X) p_1)
       = 1 / (1 + exp(−ξ)),   where

ξ = −ln( f_1(X) p_1 / f_0(X) p_0 ) = ln( f_0(X) p_0 / f_1(X) p_1 )
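The identity can be checked numerically; the 1-D Gaussian class conditionals and priors below are hypothetical illustrative choices:

```python
from math import exp, log, pi, sqrt

def sigma(xi):
    return 1.0 / (1.0 + exp(-xi))

def gauss(x, mu, s):
    return exp(-((x - mu) ** 2) / (2 * s * s)) / (s * sqrt(2 * pi))

# Hypothetical class conditionals f_0, f_1 and priors p_0, p_1
p0, p1 = 0.4, 0.6
x = 0.3
f0, f1 = gauss(x, -1.0, 1.0), gauss(x, 1.0, 1.0)

q0_direct = f0 * p0 / (f0 * p0 + f1 * p1)
xi = log((f0 * p0) / (f1 * p1))
q0_sigmoid = sigma(xi)              # identical to q0_direct by the algebra above
```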
• This gives us

σ^2 = (1/n) ∑_{i=1}^n (y_i − X_i^T W)^2
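A sketch of this noise-variance estimate on synthetic data (true noise standard deviation 0.5 is an illustrative choice, so σ^2 = 0.25):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
X = rng.normal(size=(n, 2))
W = np.array([1.0, -1.0])
y = X @ W + rng.normal(scale=0.5, size=n)

sigma2_hat = np.mean((y - X @ W) ** 2)   # σ^2 = (1/n) ∑ (y_i − X_i^T W)^2
```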
• This gives us

A^T (AW − Y) + λW = 0
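Rearranging this stationarity condition gives the ridge solution W = (A^T A + λI)^{-1} A^T Y; a minimal sketch with synthetic data and an illustrative λ:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(100, 3))
Y = A @ np.array([1.0, 0.0, -1.0]) + 0.1 * rng.normal(size=100)

lam = 1.0                                   # illustrative regularization weight
W_ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ Y)

# W_ridge satisfies A^T(AW − Y) + λW = 0 up to floating-point error
residual = A.T @ (A @ W_ridge - Y) + lam * W_ridge
```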
f(W) = N(W | µ_0, Σ_0)
• The probability model for y is

f(Y | A, W, σ^2) = ∏_{i=1}^n N(y_i | X_i^T W, σ^2) = N(Y | AW, σ^2 I)
f(W | Y, A, σ^2) = N(W | µ_n, Σ_n)

where

µ_n = Σ_n (σ^{−2} A^T Y + Σ_0^{−1} µ_0)
Σ_n^{−1} = Σ_0^{−1} + σ^{−2} A^T A
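A sketch of this posterior update, assuming the noise variance σ^2 is known and using an illustrative standard-normal prior:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(200, 2))
w_true = np.array([0.5, -0.5])
Y = A @ w_true + 0.3 * rng.normal(size=200)

sigma2 = 0.09                      # noise variance (0.3^2), treated as known
mu0 = np.zeros(2)                  # prior mean µ_0
Sigma0_inv = np.eye(2)             # prior precision Σ_0^{-1}

Sigman_inv = Sigma0_inv + (A.T @ A) / sigma2   # Σ_n^{-1} = Σ_0^{-1} + σ^{-2} A^T A
Sigman = np.linalg.inv(Sigman_inv)
mun = Sigman @ (A.T @ Y / sigma2 + Sigma0_inv @ mu0)   # posterior mean µ_n
```

With this much data the likelihood dominates the prior, so µ_n sits close to the least-squares estimate.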
• We can use this to predict the y for any X: integrating out W, the predictive distribution is N(y | µ_n^T X, X^T Σ_n X + σ^2).
f_Lap(x | µ, b) = (1/(2b)) exp(−|x − µ|/b),   −∞ < x < ∞

• Mean is µ and variance is 2b^2.
• This is a heavy-tailed distribution.
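The stated moments can be checked by sampling (µ and b below are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(8)
mu, b = 2.0, 1.5
x = rng.laplace(loc=mu, scale=b, size=1_000_000)

sample_mean = x.mean()      # ≈ µ
sample_var = x.var()        # ≈ 2 b^2
```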