Introduction to
Sequence to Sequence Model
2017.03.16 Seminar
Presenter : Hyemin Ahn
Recurrent Neural Networks : For what?
2017-03-28 CPSLAB (EECS) 2
 Humans remember and use the patterns of sequences.
• Try ‘a b c d e f g …’
• But how about ‘z y x w v u t s …’?
 The idea behind RNNs is to make use of sequential information.
 Let’s learn the pattern of a sequence and use it (to estimate, generate, etc.)!
 But HOW?
Recurrent Neural Networks : Typical RNNs
2017-03-28 CPSLAB (EECS) 3
[Figure: a typical RNN cell — the input 𝒙𝒕 enters the hidden state 𝒉𝒕, which produces the output 𝒚𝒕 and feeds back into itself through a one-step delay.]
 RNNs are called “RECURRENT” because they perform the same task for every element of a sequence, with the output depending on the previous computations.
 RNNs have a “memory” which captures information about what has been calculated so far.
 The hidden state ℎ𝑡 captures some information about the sequence.
 If we use 𝑓 = tanh, the vanishing/exploding gradient problem occurs.
 To overcome this, we use LSTM/GRU.
ℎ𝑡 = 𝑓(𝑈𝑥𝑡 + 𝑊ℎ𝑡−1 + 𝑏)
𝑦𝑡 = 𝑉ℎ𝑡 + 𝑐
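As a minimal sketch (not from the slides), the update above can be written in a few lines of NumPy; the shapes and the names 𝑈, 𝑊, 𝑉, 𝑏, 𝑐 simply follow the equations on this slide:

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    """One vanilla RNN step: h_t = tanh(U x_t + W h_{t-1} + b), y_t = V h_t + c."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)   # hidden state update
    y_t = V @ h_t + c                         # output
    return h_t, y_t

# Toy usage with assumed sizes: input dim 3, hidden dim 4, output dim 2
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
b, c = np.zeros(4), np.zeros(2)
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):             # a length-5 input sequence
    h, y = rnn_step(x, h, U, W, V, b, c)
```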
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 4
 Let’s think about a machine which guesses the dinner menu from the things in a shopping bag.
Umm,,
Carbonara!
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 5
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 6
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Forget
Some
Memories!
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 7
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Forget
Some
Memories!
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 8
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
Insert
Some
Memories!
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 9
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 10
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒙 𝒕
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 11
𝑪 𝒕
Cell state,
Internal memory unit,
Like a conveyor belt!
𝒉 𝒕
𝒚 𝒕
𝒙 𝒕
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 12
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 13
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks : LSTM
2017-03-28 CPSLAB (EECS) 14
LSTM learns (1) how to forget a memory when ℎ𝑡−1 and the new input 𝑥𝑡 are given, (2) then how to add a new memory given ℎ𝑡−1 and 𝑥𝑡.
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks : GRU
2017-03-28 CPSLAB (EECS) 15
LSTM
𝑓𝑡 = 𝜎(𝑊𝑓 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑓)
𝑖𝑡 = 𝜎(𝑊𝑖 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑖)
𝑜𝑡 = 𝜎(𝑊𝑜 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑜)
𝐶̃𝑡 = tanh(𝑊𝐶 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝐶)
𝐶𝑡 = 𝑓𝑡 ∗ 𝐶𝑡−1 + 𝑖𝑡 ∗ 𝐶̃𝑡
ℎ𝑡 = 𝑜𝑡 ∗ tanh(𝐶𝑡)
Maybe we can simplify this structure efficiently!
GRU
𝑧𝑡 = 𝜎(𝑊𝑧 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑧)
𝑟𝑡 = 𝜎(𝑊𝑟 ∙ [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑟)
ℎ̃𝑡 = tanh(𝑊ℎ ∙ [𝑟𝑡 ∗ ℎ𝑡−1, 𝑥𝑡] + 𝑏ℎ)
ℎ𝑡 = (1 − 𝑧𝑡) ∗ ℎ𝑡−1 + 𝑧𝑡 ∗ ℎ̃𝑡
Figures from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
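For reference, a minimal NumPy sketch (an illustration, not the speaker's code) of one LSTM step and one GRU step following the equations above; 𝜎 is the logistic sigmoid and [ℎ, 𝑥] denotes concatenation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_o, W_C, b_f, b_i, b_o, b_C):
    """One LSTM step on the concatenated vector [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)              # forget gate
    i_t = sigmoid(W_i @ z + b_i)              # input gate
    o_t = sigmoid(W_o @ z + b_o)              # output gate
    C_tilde = np.tanh(W_C @ z + b_C)          # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde        # forget old memory, insert new memory
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step: cell state and hidden state are merged into h_t."""
    z_vec = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ z_vec + b_z)          # update gate
    r_t = sigmoid(W_r @ z_vec + b_r)          # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)
    return (1.0 - z_t) * h_prev + z_t * h_tilde
```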
Sequence to Sequence Model: What is it?
2017-03-28 CPSLAB (EECS) 16
[Figure: an LSTM/GRU encoder with hidden states ℎ𝑒(1) … ℎ𝑒(5) and an LSTM/GRU decoder with hidden states ℎ𝑑(1) … ℎ𝑑(𝑇𝑒); the example shown is a Western food to Korean food transition.]
Sequence to Sequence Model: Implementation
2017-03-28 CPSLAB (EECS) 17
 The simplest way to implement a sequence to sequence model is to just pass the last hidden state of the encoder 𝒉𝑻 to the first GRU cell of the decoder!
 However, this method becomes weaker when the decoder needs to generate a longer sequence.
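As a rough illustration of this simplest scheme (a sketch, not the actual code linked at the end; the layer sizes and variable names are assumptions), a PyTorch encoder–decoder where the encoder's final hidden state initializes the decoder might look like:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
input_dim, output_dim, hidden_dim = 8, 8, 32

encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
decoder = nn.GRU(output_dim, hidden_dim, batch_first=True)
readout = nn.Linear(hidden_dim, output_dim)

src = torch.randn(1, 5, input_dim)      # source sequence, length 5
tgt = torch.randn(1, 7, output_dim)     # target sequence (teacher forcing), length 7

_, h_enc = encoder(src)                 # h_enc: last hidden state of the encoder
dec_out, _ = decoder(tgt, h_enc)        # decoder starts from the encoder's summary
y_hat = readout(dec_out)                # one prediction per target step
```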
Sequence to Sequence Model: Attention Decoder
2017-03-28 CPSLAB (EECS) 18
Bidirectional
GRU Encoder
Attention
GRU Decoder
𝑐𝑡
 For each GRU cell constituting the decoder, let’s pass the encoder’s information differently!
ℎ𝑖 = [→ℎ𝑖 ; ←ℎ𝑖]   (concatenation of the forward and backward encoder hidden states)
𝑐𝑖 = Σ𝑗=1..𝑇𝑥 𝛼𝑖𝑗 ℎ𝑗
𝑠𝑖 = 𝑓(𝑠𝑖−1, 𝑦𝑖−1, 𝑐𝑖) = (1 − 𝑧𝑖) ∗ 𝑠𝑖−1 + 𝑧𝑖 ∗ 𝑠̃𝑖
𝑧𝑖 = 𝜎(𝑊𝑧 𝑦𝑖−1 + 𝑈𝑧 𝑠𝑖−1)
𝑟𝑖 = 𝜎(𝑊𝑟 𝑦𝑖−1 + 𝑈𝑟 𝑠𝑖−1)
𝑠̃𝑖 = tanh(𝑊 𝑦𝑖−1 + 𝑈 (𝑟𝑖 ∗ 𝑠𝑖−1) + 𝐶 𝑐𝑖)
𝛼𝑖𝑗 = exp(𝑒𝑖𝑗) / Σ𝑘=1..𝑇𝑥 exp(𝑒𝑖𝑘)
𝑒𝑖𝑗 = 𝑣𝑎ᵀ tanh(𝑊𝑎 𝑠𝑖−1 + 𝑈𝑎 ℎ𝑗)
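A minimal NumPy sketch (an illustration under assumed shapes, not the original code) of how the attention weights 𝛼𝑖𝑗 and the context vector 𝑐𝑖 are computed from the previous decoder state 𝑠𝑖−1 and the encoder states ℎ𝑗:

```python
import numpy as np

def attention_context(s_prev, H, W_a, U_a, v_a):
    """Compute alignment scores e_ij, weights alpha_ij, and context c_i.

    s_prev : previous decoder state s_{i-1}, shape (n,)
    H      : encoder hidden states h_1..h_Tx stacked as rows, shape (Tx, 2n)
    """
    e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])  # e_ij
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                      # softmax over encoder positions
    c = alpha @ H                             # c_i = sum_j alpha_ij * h_j
    return alpha, c

# Toy usage with assumed sizes: decoder dim 4, bidirectional encoder dim 8, Tx = 5
rng = np.random.default_rng(0)
s_prev = rng.normal(size=4)
H = rng.normal(size=(5, 8))
W_a, U_a, v_a = rng.normal(size=(6, 4)), rng.normal(size=(6, 8)), rng.normal(size=6)
alpha, c = attention_context(s_prev, H, W_a, U_a, v_a)
```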
Sequence to Sequence Model: Example codes
2017-03-28 CPSLAB (EECS) 19
Codes Here @ Github
2017-03-28 CPSLAB (EECS) 20