Introduction to
Sequence to Sequence Model
2017.03.16 Seminar
Presenter : Hyemin Ahn
Recurrent Neural Networks : For what?
2017-03-28 CPSLAB (EECS) 2
 Humans remember and use the patterns of sequences.
• Try ‘a b c d e f g …’
• But how about ‘z y x w v u t s …’?
 The idea behind RNNs is to make use of sequential information.
 Let’s learn the pattern of a sequence and use it (to estimate, generate, etc.)!
 But HOW?
Recurrent Neural Networks : Typical RNNs
2017-03-28 CPSLAB (EECS) 3
[Figure: a typical RNN cell — the input 𝒙𝒕 enters the hidden state 𝒉𝒕, which produces the output 𝒚𝒕 and feeds back into itself through a one-step delay.]
 RNNs are called “RECURRENT” because they perform the same task for every element of a sequence, with the output depending on the previous computations.
 RNNs have a “memory” which captures information about what has been calculated so far.
 The hidden state ℎ𝑡 captures some information about the sequence.
 If we use 𝑓 = tanh, the vanishing/exploding gradient problem occurs.
 To overcome this, we use LSTM/GRU.
ℎ𝑡 = 𝑓(𝑈𝑥𝑡 + 𝑊ℎ𝑡−1 + 𝑏)
𝑦𝑡 = 𝑉ℎ𝑡 + 𝑐
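As a minimal sketch (not from the slides), the update above can be written in a few lines of NumPy; the shapes and the names 𝑈, 𝑊, 𝑉, 𝑏, 𝑐 simply follow the equations on this slide:

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    """One vanilla RNN step: h_t = tanh(U x_t + W h_{t-1} + b), y_t = V h_t + c."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)   # hidden state update
    y_t = V @ h_t + c                         # output
    return h_t, y_t

# Toy usage with assumed sizes: input dim 3, hidden dim 4, output dim 2
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
b, c = np.zeros(4), np.zeros(2)
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):             # a length-5 input sequence
    h, y = rnn_step(x, h, U, W, V, b, c)
```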
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 4
 Let’s think about a machine which guesses the dinner menu from the things in a shopping bag.
Umm,,
Carbonara!
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 5
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 6
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Forget
Some
Memories!
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 7
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Forget
Some
Memories!
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 8
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Insert
Some
Memories!
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 9
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 10
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 11
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒚 𝒕
𝒙 𝒕
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 12
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 13
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 14
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks : GRU
2017-03-28 CPSLAB (EECS) 15
LSTM
𝑓𝑡 = 𝜎(𝑊𝑓 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑓)
𝑖𝑡 = 𝜎(𝑊𝑖 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑖)
𝑜𝑡 = 𝜎(𝑊𝑜 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑜)
𝐶̃𝑡 = tanh(𝑊𝐶 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝐶)
𝐶𝑡 = 𝑓𝑡 ∗ 𝐶𝑡−1 + 𝑖𝑡 ∗ 𝐶̃𝑡
ℎ𝑡 = 𝑜𝑡 ∗ tanh(𝐶𝑡)
Maybe we can simplify this structure efficiently!
GRU
𝑧𝑡 = 𝜎(𝑊𝑧 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑧)
𝑟𝑡 = 𝜎(𝑊𝑟 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑟)
ℎ̃𝑡 = tanh(𝑊ℎ ∙ [𝑟𝑡 ∗ ℎ𝑡−1, 𝑥𝑡] + 𝑏ℎ)
ℎ𝑡 = (1 − 𝑧𝑡) ∗ ℎ𝑡−1 + 𝑧𝑡 ∗ ℎ̃𝑡
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
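For reference, a minimal NumPy sketch (an illustration, not the speaker's code) of one LSTM step and one GRU step following the equations above; 𝜎 is the logistic sigmoid and [ℎ, 𝑥] denotes concatenation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_o, W_C, b_f, b_i, b_o, b_C):
    """One LSTM step on the concatenated vector [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)              # forget gate
    i_t = sigmoid(W_i @ z + b_i)              # input gate
    o_t = sigmoid(W_o @ z + b_o)              # output gate
    C_tilde = np.tanh(W_C @ z + b_C)          # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde        # forget old memory, insert new memory
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step: cell state and hidden state are merged into h_t."""
    z_vec = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ z_vec + b_z)          # update gate
    r_t = sigmoid(W_r @ z_vec + b_r)          # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)
    return (1.0 - z_t) * h_prev + z_t * h_tilde
```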
Sequence to Sequence Model: What is it?
2017-03-28 CPSLAB (EECS) 16
[Figure: an LSTM/GRU encoder with hidden states ℎ𝑒(1) … ℎ𝑒(5) and an LSTM/GRU decoder with hidden states ℎ𝑑(1) … ℎ𝑑(𝑇𝑒); the example shown is a Western food to Korean food transition.]
Sequence to Sequence Model: Implementation
2017-03-28 CPSLAB (EECS) 17
 The simplest way to implement a sequence to sequence model is to just pass the last hidden state of the encoder 𝒉𝑻 to the first GRU cell of the decoder!
 However, this method becomes weaker when the decoder needs to generate a longer sequence.
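As a rough illustration of this simplest scheme (a sketch, not the actual code linked at the end; the layer sizes and variable names are assumptions), a PyTorch encoder–decoder where the encoder's final hidden state initializes the decoder might look like:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
input_dim, output_dim, hidden_dim = 8, 8, 32

encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
decoder = nn.GRU(output_dim, hidden_dim, batch_first=True)
readout = nn.Linear(hidden_dim, output_dim)

src = torch.randn(1, 5, input_dim)      # source sequence, length 5
tgt = torch.randn(1, 7, output_dim)     # target sequence (teacher forcing), length 7

_, h_enc = encoder(src)                 # h_enc: last hidden state of the encoder
dec_out, _ = decoder(tgt, h_enc)        # decoder starts from the encoder's summary
y_hat = readout(dec_out)                # one prediction per target step
```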
Sequence to Sequence Model: Attention Decoder
2017-03-28 CPSLAB (EECS) 18
Bidirectional
GRU Encoder
Attention
GRU Decoder
𝑐𝑡
 For each GRU cell constituting the decoder, let’s pass the encoder’s information differently!
ℎ𝑖 = [→ℎ𝑖 ; ←ℎ𝑖]   (concatenation of the forward and backward encoder hidden states)
𝑐𝑖 = Σ𝑗=1..𝑇𝑥 𝛼𝑖𝑗 ℎ𝑗
𝑠𝑖 = 𝑓(𝑠𝑖−1, 𝑦𝑖−1, 𝑐𝑖) = (1 − 𝑧𝑖) ∗ 𝑠𝑖−1 + 𝑧𝑖 ∗ 𝑠̃𝑖
𝑧𝑖 = 𝜎(𝑊𝑧 𝑦𝑖−1 + 𝑈𝑧 𝑠𝑖−1)
𝑟𝑖 = 𝜎(𝑊𝑟 𝑦𝑖−1 + 𝑈𝑟 𝑠𝑖−1)
𝑠̃𝑖 = tanh(𝑊 𝑦𝑖−1 + 𝑈 (𝑟𝑖 ∗ 𝑠𝑖−1) + 𝐶 𝑐𝑖)
𝛼𝑖𝑗 = exp(𝑒𝑖𝑗) / Σ𝑘=1..𝑇𝑥 exp(𝑒𝑖𝑘)
𝑒𝑖𝑗 = 𝑣𝑎ᵀ tanh(𝑊𝑎 𝑠𝑖−1 + 𝑈𝑎 ℎ𝑗)
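A minimal NumPy sketch (an illustration under assumed shapes, not the original code) of how the attention weights 𝛼𝑖𝑗 and the context vector 𝑐𝑖 are computed from the previous decoder state 𝑠𝑖−1 and the encoder states ℎ𝑗:

```python
import numpy as np

def attention_context(s_prev, H, W_a, U_a, v_a):
    """Compute alignment scores e_ij, weights alpha_ij, and context c_i.

    s_prev : previous decoder state s_{i-1}, shape (n,)
    H      : encoder hidden states h_1..h_Tx stacked as rows, shape (Tx, 2n)
    """
    e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])  # e_ij
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                      # softmax over encoder positions
    c = alpha @ H                             # c_i = sum_j alpha_ij * h_j
    return alpha, c

# Toy usage with assumed sizes: decoder dim 4, bidirectional encoder dim 8, Tx = 5
rng = np.random.default_rng(0)
s_prev = rng.normal(size=4)
H = rng.normal(size=(5, 8))
W_a, U_a, v_a = rng.normal(size=(6, 4)), rng.normal(size=(6, 8)), rng.normal(size=6)
alpha, c = attention_context(s_prev, H, W_a, U_a, v_a)
```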
Sequence to Sequence Model: Example codes
2017-03-28 CPSLAB (EECS) 19
Codes Here @ Github
2017-03-28 CPSLAB (EECS) 20