
Sequence Models-II

Many-to-Many RNNs for Unequal Length

Dr. Jasmeet Singh,
Assistant Professor,
CSED, TIET
Many-to-Many RNNs for Unequal Input-
Output Length
 For applications such as machine translation, question-answering systems, and document
summarization, the input and output sequences are generally of unequal length.
 In machine translation, the input is a source-language text and the output is a target-language
text, and the two usually differ in length.
 Similarly, in document summarization, the input is a document and the output is its summary,
which is typically much shorter than the input.
 For these kinds of applications, traditional RNNs are modified into Sequence-to-Sequence
Encoder-Decoder architectures.
Seq2Seq Encoder-Decoder Architecture
 A sequence-to-sequence (Seq2Seq) model with an encoder-decoder architecture is
commonly used for applications where the input and output lengths are unequal.
1. Encoder (Processes Input Sequence)
•Takes an input sequence x⟨1⟩, x⟨2⟩, ..., x⟨Tx⟩.
•Processes each token sequentially using recurrent units.
•The final hidden state summarizes the entire input and is passed to the decoder.
2. Decoder (Generates Output Sequence)
•Takes the final hidden state of the encoder as its initial hidden state.
•Generates the output sequence y⟨1⟩, y⟨2⟩, ..., y⟨Ty⟩.
•Each output y⟨t⟩ depends on the previous output y⟨t−1⟩ and the current decoder hidden state.
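The loop structure of this handoff can be sketched in a few lines of NumPy. Everything below is illustrative: the sizes are toy values, the weights are random, and the zero vector standing in for a start-of-sequence token and the greedy feed-back of the predicted token are assumptions of the sketch, not details fixed by the architecture described above.

import numpy as np

rng = np.random.default_rng(0)
n, V1, V2, Tx, Ty = 8, 12, 10, 5, 4          # hidden units, vocab sizes, sequence lengths

def enc_step(a_prev, x_t, Waa, Wax, ba):     # one encoder time step (detailed later)
    return np.tanh(Waa @ a_prev + Wax @ x_t + ba)

def dec_step(a_prev, y_prev, Waa, Way, ba):  # one decoder time step (detailed later)
    return np.tanh(Waa @ a_prev + Way @ y_prev + ba)

Waa, Wax = rng.normal(size=(n, n)) * 0.1, rng.normal(size=(n, V1)) * 0.1
Way, Wya = rng.normal(size=(n, V2)) * 0.1, rng.normal(size=(V2, n)) * 0.1
ba, by = np.zeros((n, 1)), np.zeros((V2, 1))

# Encoder: consume the whole input sequence, keep only the final hidden state.
xs = [np.eye(V1)[:, [rng.integers(V1)]] for _ in range(Tx)]   # one-hot source tokens
a = np.zeros((n, 1))
for x_t in xs:
    a = enc_step(a, x_t, Waa, Wax, ba)

# Decoder: start from the encoder's final state and generate Ty output distributions,
# feeding each predicted token back in as the next step's "previous output".
y_prev = np.zeros((V2, 1))                                    # stand-in for a <start> token
for _ in range(Ty):
    a = dec_step(a, y_prev, Waa, Way, ba)
    logits = Wya @ a + by
    y_hat = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
    y_prev = np.eye(V2)[:, [int(y_hat.argmax())]]             # greedy feed-back

Note that Tx and Ty never have to match: the only thing the decoder receives from the encoder is the single n × 1 vector a.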
Seq2Seq Encoder-Decoder Architecture
(Contd….)
Seq2Seq Encoder-Decoder Forward
Propagation
Encoder Forward Pass:
1. Initialize the hidden state a_enc⟨0⟩ (may be a zero vector or random values).
2. For each time step t = 1, ..., Tx in the input x:

• Compute the hidden state: a_enc⟨t⟩ = tanh(Waa a_enc⟨t−1⟩ + Wax x⟨t⟩ + ba)

• If x⟨t⟩ is the last input token (t = Tx), pass a_enc⟨Tx⟩ to the decoder.

where Waa is of shape n × n, ba is of shape n × 1, and Wax is of shape n × |V1|;

n: number of neurons in the encoder/decoder hidden layer, |V1|: size of the source-language
vocabulary.
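As a sanity check on these shapes, here is a single encoder step in NumPy; n and |V1| are small toy values and the weights are random, purely for illustration.

import numpy as np

n, V1 = 6, 9                           # toy hidden size and source-vocabulary size
rng = np.random.default_rng(1)

Waa = rng.normal(size=(n, n)) * 0.1    # n x n
Wax = rng.normal(size=(n, V1)) * 0.1   # n x |V1|
ba  = np.zeros((n, 1))                 # n x 1

a_prev = np.zeros((n, 1))              # a_enc<0>: zero initialization
x_t = np.eye(V1)[:, [3]]               # one-hot source token, |V1| x 1

a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
assert a_t.shape == (n, 1)             # the hidden state stays n x 1 at every step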
Seq2Seq Encoder-Decoder Forward
Propagation (Contd….)
Decoder Forward Pass:

1. Initialize a_dec⟨0⟩ = a_enc⟨Tx⟩ (the final encoder hidden state).

2. For each time step t = 1, ..., Ty:

 Compute the hidden state: a_dec⟨t⟩ = tanh(Waa a_dec⟨t−1⟩ + Way y⟨t−1⟩ + ba)

 Compute the output: ŷ⟨t⟩ = softmax(Wya a_dec⟨t⟩ + by)

where Wya is of shape |V2| × n, Way is of shape n × |V2|, and by is of shape |V2| × 1; |V2| being the
size of the target-language vocabulary.
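A matching single-step sketch of the decoder update and output, again with toy sizes and random weights; the hand-written softmax helper and the zero vector standing in for the previous output at t = 1 are assumptions of the sketch.

import numpy as np

n, V2 = 6, 8                             # toy hidden size and target-vocabulary size
rng = np.random.default_rng(2)

Waa = rng.normal(size=(n, n)) * 0.1      # recurrent weights, n x n
Way = rng.normal(size=(n, V2)) * 0.1     # previous-output-to-hidden weights, n x |V2|
Wya = rng.normal(size=(V2, n)) * 0.1     # hidden-to-output weights, |V2| x n
ba, by = np.zeros((n, 1)), np.zeros((V2, 1))

def softmax(z):
    e = np.exp(z - z.max())              # subtract the max for numerical stability
    return e / e.sum()

a_prev = rng.normal(size=(n, 1))         # stands in for the encoder's final state a_enc<Tx>
y_prev = np.zeros((V2, 1))               # stands in for the previous output at t = 1

a_t   = np.tanh(Waa @ a_prev + Way @ y_prev + ba)   # decoder hidden state, n x 1
y_hat = softmax(Wya @ a_t + by)                     # distribution over target vocab, |V2| x 1
assert y_hat.shape == (V2, 1) and np.isclose(y_hat.sum(), 1.0)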
Seq2Seq Encoder-Decoder Loss Function
 Compute Loss:

Use Cross-Entropy Loss (since machine translation is a classification task at each step). For one
example, the loss is computed as:

L = Σ_{t=1…Ty} L⟨t⟩(ŷ⟨t⟩, y⟨t⟩)

where L⟨t⟩(ŷ⟨t⟩, y⟨t⟩) = − Σ_i y_i⟨t⟩ · log(ŷ_i⟨t⟩), with y⟨t⟩ the one-hot target and ŷ⟨t⟩ the predicted
distribution over the target vocabulary.
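Because y⟨t⟩ is one-hot, each per-step term reduces to the negative log of the probability the model assigns to the correct token. A tiny worked example with made-up distributions for Ty = 3:

import numpy as np

# Predicted distributions y_hat<t> and one-hot targets y<t> for three decoder steps.
y_hats = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1]), np.array([0.3, 0.3, 0.4])]
ys     = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]

# L = sum_t L<t>, with L<t> = -sum_i y_i<t> * log(y_hat_i<t>)
L = sum(-(y * np.log(y_hat)).sum() for y, y_hat in zip(ys, y_hats))
print(L)   # -(log 0.7 + log 0.8 + log 0.4) ≈ 1.50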
Seq2Seq Encoder-Decoder Back-
Propagation
 During the back-propagation phase, the gradients of the loss function w.r.t. Wya (dWya), Way (dWay),
Waa (dWaa), Wax (dWax), ba (dba), and by (dby) are computed.

∂L/∂Wya = ∂L⟨1⟩/∂Wya + ∂L⟨2⟩/∂Wya + ……… + ∂L⟨Ty⟩/∂Wya

Now, ∂L⟨t⟩/∂Wya = (∂L⟨t⟩/∂ŷ⟨t⟩) · (∂ŷ⟨t⟩/∂Wya) = (ŷ⟨t⟩ − y⟨t⟩) · a_dec⟨t⟩ᵀ

Similarly, ∂L/∂by = ∂L⟨1⟩/∂by + ∂L⟨2⟩/∂by + ……… + ∂L⟨Ty⟩/∂by

∂L⟨t⟩/∂by = (∂L⟨t⟩/∂ŷ⟨t⟩) · (∂ŷ⟨t⟩/∂by) = (ŷ⟨t⟩ − y⟨t⟩) · 1 = ŷ⟨t⟩ − y⟨t⟩
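In code these two gradients are just an outer product and a vector, accumulated over the decoder time steps. The sketch below assumes the decoder hidden states a_dec<t>, predictions y_hat<t>, and one-hot targets y<t> were cached during the forward pass; the cached values here are random stand-ins with toy shapes.

import numpy as np

n, V2, Ty = 6, 8, 4
rng = np.random.default_rng(3)

# Stand-ins for quantities cached during the decoder forward pass.
a_dec  = [rng.normal(size=(n, 1)) for _ in range(Ty)]             # a_dec<t>
y_hat  = [np.full((V2, 1), 1.0 / V2) for _ in range(Ty)]          # softmax outputs
y_true = [np.eye(V2)[:, [rng.integers(V2)]] for _ in range(Ty)]   # one-hot targets

dWya = np.zeros((V2, n))
dby  = np.zeros((V2, 1))
for t in range(Ty):
    dz    = y_hat[t] - y_true[t]       # (y_hat<t> - y<t>), shape |V2| x 1
    dWya += dz @ a_dec[t].T            # dL<t>/dWya = (y_hat<t> - y<t>) · a_dec<t>^T
    dby  += dz                         # dL<t>/dby  =  y_hat<t> - y<t>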
Seq2Seq Encoder-Decoder Back-
Propagation (Contd…..)

In the same way, the gradients w.r.t. Way and Wax are sums over time of per-step chain rules:

∂L⟨t⟩/∂Way = Σ_{i=1…t} (∂L⟨t⟩/∂ŷ⟨t⟩) · (∂ŷ⟨t⟩/∂a_dec⟨i⟩) · (∂a_dec⟨i⟩/∂Way)

∂L⟨t⟩/∂Wax = Σ_{i=1…Tx} (∂L⟨t⟩/∂ŷ⟨t⟩) · (∂ŷ⟨t⟩/∂a_dec⟨t⟩) · (∂a_dec⟨t⟩/∂a_dec⟨t−1⟩) · (∂a_dec⟨t−1⟩/∂a_dec⟨t−2⟩) ⋯ (∂a_dec⟨1⟩/∂a_enc⟨i⟩) · (∂a_enc⟨i⟩/∂Wax)

(Way appears only in the decoder recurrence, so its chain runs over the decoder hidden states up to t;
Wax appears only in the encoder, so its chain must be propagated back through all decoder hidden
states and into the encoder hidden states.)
Seq2Seq Encoder-Decoder Back-
Propagation (Contd…..)
∂L/∂Waa = ∂L⟨1⟩/∂Waa + ∂L⟨2⟩/∂Waa + ……… + ∂L⟨Ty⟩/∂Waa

Now, ∂L⟨t⟩/∂Waa = Σ_{i=1…t} (∂L⟨t⟩/∂ŷ⟨t⟩) · (∂ŷ⟨t⟩/∂a_dec⟨i⟩) · (∂a_dec⟨i⟩/∂Waa)
+ Σ_{i=1…Tx} (∂L⟨t⟩/∂ŷ⟨t⟩) · (∂ŷ⟨t⟩/∂a_dec⟨t⟩) · (∂a_dec⟨t⟩/∂a_dec⟨t−1⟩) · (∂a_dec⟨t−1⟩/∂a_dec⟨t−2⟩) ⋯ (∂a_dec⟨1⟩/∂a_enc⟨i⟩) · (∂a_enc⟨i⟩/∂Waa)

(The first sum accounts for the uses of Waa in the decoder recurrence; the second propagates the
gradient back through the decoder into the encoder, where the same Waa is used again.)
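These chain rules are what a backpropagation-through-time (BPTT) loop computes step by step. The sketch below is one way to implement it in NumPy under the assumptions used above (tanh hidden units, with Waa and ba shared between encoder and decoder); feeding the ground-truth previous token into the decoder (teacher forcing) is an extra assumption of the sketch, not something specified above, and all sizes and data are toy values.

import numpy as np

rng = np.random.default_rng(4)
n, V1, V2, Tx, Ty = 5, 7, 6, 4, 3

Waa = rng.normal(size=(n, n)) * 0.1       # shared recurrent weights (encoder and decoder)
Wax = rng.normal(size=(n, V1)) * 0.1      # encoder input weights
Way = rng.normal(size=(n, V2)) * 0.1      # decoder previous-output weights
Wya = rng.normal(size=(V2, n)) * 0.1      # output weights
ba, by = np.zeros((n, 1)), np.zeros((V2, 1))

xs = [np.eye(V1)[:, [rng.integers(V1)]] for _ in range(Tx)]   # one-hot x<1..Tx>
ys = [np.eye(V2)[:, [rng.integers(V2)]] for _ in range(Ty)]   # one-hot y<1..Ty>

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# ---- forward pass, caching every hidden state ----
a_enc = [np.zeros((n, 1))]                                    # a_enc<0>
for x_t in xs:
    a_enc.append(np.tanh(Waa @ a_enc[-1] + Wax @ x_t + ba))

a_dec, y_hat = [a_enc[-1]], []                                # a_dec<0> = a_enc<Tx>
y_prev = [np.zeros((V2, 1))] + ys[:-1]                        # <start> stand-in, then y<1..Ty-1>
for t in range(Ty):
    a_dec.append(np.tanh(Waa @ a_dec[-1] + Way @ y_prev[t] + ba))
    y_hat.append(softmax(Wya @ a_dec[-1] + by))

# ---- backward pass (BPTT): decoder first, then on into the encoder ----
dWaa, dWax, dWay = np.zeros_like(Waa), np.zeros_like(Wax), np.zeros_like(Way)
dWya, dba, dby = np.zeros_like(Wya), np.zeros_like(ba), np.zeros_like(by)

da_next = np.zeros((n, 1))                    # gradient arriving from step t+1
for t in reversed(range(Ty)):                 # decoder steps t = Ty .. 1
    dz = y_hat[t] - ys[t]                     # dL<t>/d(logits) = y_hat<t> - y<t>
    dWya += dz @ a_dec[t + 1].T
    dby  += dz
    da    = Wya.T @ dz + da_next              # total gradient reaching a_dec<t>
    dpre  = (1.0 - a_dec[t + 1] ** 2) * da    # back through tanh
    dWaa += dpre @ a_dec[t].T
    dWay += dpre @ y_prev[t].T
    dba  += dpre
    da_next = Waa.T @ dpre                    # pass back to a_dec<t-1>

for t in reversed(range(Tx)):                 # encoder steps t = Tx .. 1
    dpre  = (1.0 - a_enc[t + 1] ** 2) * da_next
    dWaa += dpre @ a_enc[t].T                 # shared Waa also accumulates encoder terms
    dWax += dpre @ xs[t].T                    # the long chain back to dWax
    dba  += dpre
    da_next = Waa.T @ dpre

The two accumulation loops mirror the two sums in the dWaa expression above: the first runs over the decoder hidden states, the second carries the gradient through a_dec⟨1⟩ into the encoder and back along its hidden states.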
