
Lecture Slides for
INTRODUCTION TO MACHINE LEARNING, 2ND EDITION
ETHEM ALPAYDIN
© The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Outline
Last class: Chapter 13, Kernel Machines
- Non-separable case: Soft Margin Hyperplane
- Kernel Trick
- Vectorial Kernels
- Multiple Kernel Learning
- Multiclass Kernel Machines
Today: finish Chapter 13 (Kernel Machines), then Chapter 16, Hidden Markov Models

SVM for Regression
Use a linear model (possibly kernelized)
$$f(x) = \mathbf{w}^T x + w_0$$

Use the ε-sensitive error function:

$$e_\varepsilon\!\left(r^t, f(x^t)\right) =
\begin{cases}
0 & \text{if } \left|r^t - f(x^t)\right| < \varepsilon \\
\left|r^t - f(x^t)\right| - \varepsilon & \text{otherwise}
\end{cases}$$

The primal problem is

$$\min \; \tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_t \left(\xi_+^t + \xi_-^t\right)$$

subject to

$$r^t - \left(\mathbf{w}^T x^t + w_0\right) \le \varepsilon + \xi_+^t$$
$$\left(\mathbf{w}^T x^t + w_0\right) - r^t \le \varepsilon + \xi_-^t$$
$$\xi_+^t,\; \xi_-^t \ge 0$$
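As an aside (not part of the original slides), the ε-insensitive loss and a kernelized regression fit can be sketched with scikit-learn's SVR; the toy data, kernel choice, and hyperparameter values below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def eps_insensitive_loss(r, f_x, eps=0.1):
    """e_eps(r, f(x)): zero inside the epsilon-tube, linear outside it."""
    return np.maximum(np.abs(r - f_x) - eps, 0.0)

# Toy 1-D regression data (illustrative only)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(50, 1)), axis=0)
r = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

# Kernelized SVM regression; C and epsilon play the same roles as in the primal above
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(X, r)
print(eps_insensitive_loss(r, svr.predict(X)).mean())
```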
Kernel Regression
Polynomial kernel Gaussian kernel

One-Class Kernel Machines
Consider a sphere with center a and radius R
$$\min \; R^2 + C \sum_t \xi^t$$

subject to

$$\left\|x^t - a\right\|^2 \le R^2 + \xi^t, \qquad \xi^t \ge 0$$

The dual is

$$L_d = \sum_t \alpha^t \left(x^t\right)^T x^t - \sum_t \sum_s \alpha^t \alpha^s \left(x^t\right)^T x^s$$

subject to

$$0 \le \alpha^t \le C, \qquad \sum_t \alpha^t = 1$$
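A minimal sketch of one-class estimation on toy data; it uses scikit-learn's OneClassSVM, which solves the closely related ν-parameterized formulation rather than the C-parameterized one above, so the correspondence is only approximate.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 2))        # "typical" data used to fit the boundary
X_test = np.array([[0.1, -0.2], [4.0, 4.0]])   # one inlier, one clear outlier

# nu roughly bounds the fraction of training points allowed outside the boundary
oc = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05)
oc.fit(X_train)
print(oc.predict(X_test))                      # +1 for inliers, -1 for outliers
```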

Kernel Dimensionality Reduction
Kernel PCA does PCA on the kernel matrix (equal to canonical PCA with a linear kernel).
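A brief sketch using scikit-learn's KernelPCA (the RBF kernel, gamma, and number of components are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.decomposition import KernelPCA, PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))

# Kernel PCA: eigendecomposition of the (centered) kernel matrix
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=0.1).fit_transform(X)

# With a linear kernel it spans the same subspace as canonical PCA
Z_lin = KernelPCA(n_components=2, kernel="linear").fit_transform(X)
Z_pca = PCA(n_components=2).fit_transform(X)
```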

Introduction
- Assumption: modeling dependencies in the input; the samples are no longer iid (independent and identically distributed)
- Sequences
  - Temporal:
    - In speech: phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language)
    - In handwriting: pen movements
  - Spatial:
    - In a DNA sequence: base pairs
- Base pairs in a DNA sequence cannot be modeled as a simple probability distribution.
Discrete Markov Process
 N states: S1, S2, ..., SN
 State at “time” t, qt = Si
 First-order Markov
P(qt+1=Sj | qt=Si, qt-1=Sk ,...) = P(qt+1=Sj | qt=Si)

 Transition probabilities
aij ≡ P(qt+1=Sj | qt=Si) aij ≥ 0 and Σj=1N aij=1

 Initial probabilities
πi ≡ P(q1=Si) and Σi=1N πi=1
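As a small illustration (not part of the slides), a first-order Markov chain can be simulated directly from Π and A; the parameter values below are taken from the balls-and-urns example that follows.

```python
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])        # pi_i = P(q1 = S_i)
A = np.array([[0.4, 0.3, 0.3],        # a_ij = P(q_{t+1} = S_j | q_t = S_i)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sample_chain(Pi, A, T, rng=np.random.default_rng(0)):
    """Draw a state sequence q_1, ..., q_T (0-based indices) from a first-order Markov chain."""
    q = [rng.choice(len(Pi), p=Pi)]
    for _ in range(T - 1):
        q.append(rng.choice(len(Pi), p=A[q[-1]]))
    return q

print(sample_chain(Pi, A, T=10))
```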

Stochastic Automaton

$$P(O = Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1}\, a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$$

For example, the state sequence Q = 3 1 2 2 3 2 1 ... has probability

$$\pi_3\, a_{31}\, a_{12}\, a_{22}\, a_{23}\, a_{32}\, a_{21} \cdots$$
Example: Balls and Urns
Three urns, each full of balls of one color:
S1: red, S2: blue, S3: green

$$\Pi = [0.5,\; 0.2,\; 0.3]^T \qquad
A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$

O = {S1, S1, S3, S3} = {red, red, green, green}

$$P(O \mid A, \Pi) = P(S_1)\, P(S_1 \mid S_1)\, P(S_3 \mid S_1)\, P(S_3 \mid S_3)
= \pi_1\, a_{11}\, a_{13}\, a_{33}
= 0.5 \times 0.4 \times 0.3 \times 0.8 = 0.048$$
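The same computation in code (a sketch; states are 0-based indices, so S1 → 0 and S3 → 2):

```python
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

O = [0, 0, 2, 2]                 # observed state sequence S1, S1, S3, S3

p = Pi[O[0]]
for prev, cur in zip(O, O[1:]):  # multiply transition probabilities along the path
    p *= A[prev, cur]
print(p)                         # 0.5 * 0.4 * 0.3 * 0.8 = 0.048
```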
Balls and Urns: Learning
Observable Markov Model
Given K example sequences of length T
How to estimate the parameters?


$$\hat{\pi}_i = \frac{\#\{\text{sequences starting with } S_i\}}{\#\{\text{sequences}\}}
= \frac{\sum_k \mathbf{1}\!\left(q_1^k = S_i\right)}{K}$$

$$\hat{a}_{ij} = \frac{\#\{\text{transitions from } S_i \text{ to } S_j\}}{\#\{\text{transitions from } S_i\}}
= \frac{\sum_k \sum_{t=1}^{T-1} \mathbf{1}\!\left(q_t^k = S_i \text{ and } q_{t+1}^k = S_j\right)}{\sum_k \sum_{t=1}^{T-1} \mathbf{1}\!\left(q_t^k = S_i\right)}$$

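A minimal counting implementation of these estimators (a sketch; each sequence is assumed to be a list of 0-based state indices, and every state is assumed to be visited at least once so no row of counts is zero):

```python
import numpy as np

def estimate_markov_params(sequences, N):
    """Maximum-likelihood estimates of pi and A from fully observed state sequences."""
    pi_counts = np.zeros(N)
    trans_counts = np.zeros((N, N))
    for q in sequences:
        pi_counts[q[0]] += 1.0
        for prev, cur in zip(q, q[1:]):
            trans_counts[prev, cur] += 1.0
    pi_hat = pi_counts / len(sequences)                             # fraction of sequences starting in S_i
    A_hat = trans_counts / trans_counts.sum(axis=1, keepdims=True)  # row-normalized transition counts
    return pi_hat, A_hat

sequences = [[0, 0, 2, 2], [1, 1, 2, 2], [0, 2, 2, 2]]   # K = 3 toy sequences
print(estimate_markov_params(sequences, N=3))
```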
Hidden Markov Models
- States are not observable.
- Discrete observations {v1, v2, ..., vM} are recorded; they are a probabilistic function of the state.
- Emission probabilities:
  bj(m) ≡ P(Ot=vm | qt=Sj)
- Example:
  - In each urn, there are balls of different colors, but with different probabilities.
  - For each observation sequence, there are multiple possible state sequences that could have generated it.

Another Example
A colored ball-choosing example:

Urn 1: 30 red, 50 green, 20 blue
Urn 2: 10 red, 40 green, 50 blue
Urn 3: 60 red, 10 green, 30 blue

Probability of transition to another urn after picking a ball:

       U1    U2    U3
U1    0.1   0.4   0.5
U2    0.6   0.2   0.2
U3    0.3   0.4   0.3
Example (contd.)
Given the transition probabilities A and the emission probabilities B:

A =        U1    U2    U3          B =         R     G     B
     U1   0.1   0.4   0.5               U1   0.3   0.5   0.2
     U2   0.6   0.2   0.2               U2   0.1   0.4   0.5
     U3   0.3   0.4   0.3               U3   0.6   0.1   0.3

Observation: R R G G B R G R
State sequence: ??
Not so easily computable.
Example (contd.)
Here:
S = {U1, U2, U3}
V = {R, G, B}
For an observation sequence O = {o1 … on} and state sequence Q = {q1 … qn}:

A =        U1    U2    U3          B =         R     G     B
     U1   0.1   0.4   0.5               U1   0.3   0.5   0.2
     U2   0.6   0.2   0.2               U2   0.1   0.4   0.5
     U3   0.3   0.4   0.3               U3   0.6   0.1   0.3

π is given by πi ≡ P(q1 = Ui)
Elements of an HMM
- N: number of states
  S = {S1, S2, ..., SN}
- M: number of observation symbols
  V = {v1, v2, ..., vM}
- A = [aij]: N by N state transition probability matrix
  aij ≡ P(qt+1=Sj | qt=Si)
- B = [bj(m)]: N by M observation probability matrix
  bj(m) ≡ P(Ot=vm | qt=Sj)
- Π = [πi]: N by 1 initial state probability vector
  πi ≡ P(q1=Si)

λ = (A, B, Π) is the parameter set of the HMM.
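For concreteness, λ = (A, B, Π) for the three-urn example can be written as plain arrays; the initial distribution Π is not given on the slides, so a uniform one is assumed here.

```python
import numpy as np

A = np.array([[0.1, 0.4, 0.5],     # a_ij = P(q_{t+1} = U_j | q_t = U_i)
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])
B = np.array([[0.3, 0.5, 0.2],     # b_j(m) = P(O_t = v_m | q_t = U_j), columns R, G, B
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])
Pi = np.array([1/3, 1/3, 1/3])     # assumed uniform initial state distribution

# Sanity checks: rows of A and B, and Pi itself, must be probability distributions
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(Pi.sum(), 1.0)
```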
Examples
• Gene regulation: O = {A, C, G, T}, S = {gene, transcription factor binding site, junk DNA, ...}
• Speech processing: O = speech signal, S = word or phoneme being uttered
• Text understanding: O = words, S = topic (e.g. sports, weather)
• Robot localization: O = sensor readings, S = discretized position of the robot

Three Basic Problems of HMMs
1. Evaluation: given λ and O, calculate P(O | λ)
2. State sequence: given λ and O, find Q* such that
   P(Q* | O, λ) = maxQ P(Q | O, λ)
3. Learning: given X = {Ok}k, find λ* such that
   P(X | λ*) = maxλ P(X | λ)
(Rabiner, 1989)
Evaluation: Naïve solution
State sequence Q = {q1, …, qT}
Assume independent observations:

$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = b_{q_1}(O_1)\, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$$

Observations are mutually independent, given the hidden states.
Evaluation: Naïve solution
Observe that:

$$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$$

And that:

$$P(O \mid \lambda) = \sum_Q P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$
Evaluation: Naïve solution
Finally:

$$P(O \mid \lambda) = \sum_Q P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$

- The above sum is over all state paths.
- There are N^T state paths, each "costing" O(T) calculations, leading to O(T N^T) time complexity.
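The naïve evaluation can be written as an explicit enumeration over all N^T paths (a sketch that is only feasible for tiny T; `A`, `B`, `Pi` are the arrays defined above, and observations are 0-based symbol indices):

```python
from itertools import product
import numpy as np

def naive_evaluate(O, A, B, Pi):
    """P(O | lambda) by summing P(O | Q, lambda) * P(Q | lambda) over all N**T state paths."""
    N, T = A.shape[0], len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):                     # all N**T state paths
        p_q = Pi[Q[0]] * np.prod([A[Q[t - 1], Q[t]] for t in range(1, T)])
        p_o_given_q = np.prod([B[Q[t], O[t]] for t in range(T)])
        total += p_q * p_o_given_q
    return total

# Observation R R G G encoded as 0 0 1 1 (R=0, G=1, B=2)
print(naive_evaluate([0, 0, 1, 1], A, B, Pi))
```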
Evaluation
- Forward variable:
  $$\alpha_t(i) \equiv P(O_1 \cdots O_t,\; q_t = S_i \mid \lambda)$$
  The probability of observing the partial sequence {O1, …, Ot} up to time t and being in Si at time t, given the model λ.
- Initialization:
  $$\alpha_1(i) = \pi_i\, b_i(O_1)$$
- Recursion:
  $$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$$
- Evaluation:
  $$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
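A direct translation of the forward recursion into code (a sketch; `A`, `B`, `Pi` as defined for the urn example, observations as 0-based symbol indices):

```python
import numpy as np

def forward(O, A, B, Pi):
    """Forward algorithm: returns alpha (T x N) and P(O | lambda) in O(T * N^2) time."""
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]                          # alpha_1(i) = pi_i * b_i(O_1)
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
    return alpha, alpha[-1].sum()                       # P(O | lambda) = sum_i alpha_T(i)

alpha, p_obs = forward([0, 0, 1, 1], A, B, Pi)
print(p_obs)   # should agree with the naive enumeration above
```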

Evaluation
- Backward variable:
  $$\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$$
  The probability of observing the remaining partial sequence {Ot+1, …, OT}, given that we are in Si at time t.
- Initialization:
  $$\beta_T(i) = 1 \quad \left(= P(O_{T+1} \mid q_T = S_i, \lambda)\right)$$
- Recursion:
  $$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$
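And the corresponding backward pass (same assumptions as the forward sketch):

```python
import numpy as np

def backward(O, A, B):
    """Backward algorithm: beta_t(i) = P(O_{t+1} ... O_T | q_t = S_i, lambda)."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                       # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])     # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta

O = [0, 0, 1, 1]
beta = backward(O, A, B)
# Consistency check: sum_i pi_i * b_i(O_1) * beta_1(i) also equals P(O | lambda)
print((Pi * B[:, O[0]] * beta[0]).sum())
```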

