
VISUAL RECOGNITION – PART 2

Lecture 5: Transformers
Image Captioning

Encoder Representation
Attention in LSTMs for Image Captioning
Sequence Modeling: Issues with the RNN Architecture

• A single hidden state from the encoder is fed into the decoder
• Every step in the decoder depends only on the previous hidden state and prediction
• Dependencies in a sequence have to be modeled ‘sequentially’
• Remedy: learn to attend to relevant regions in the ‘source’ at every step

Language Translation
http://jalammar.github.io/illustrated-transformer/
https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Self Attention – Motivation

Self-attention: a method to improve word embeddings by relating each word to the other words in the sequence.

Example: “The animal didn’t cross the street because it was too tired” (which noun does ‘it’ refer to?)

RNN-style Modeling vs. Modeling with Self-Attention


http://jalammar.github.io/illustrated-transformer/
https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Self Attention

Each input embedding (dimension d_model) is projected into three vectors, e.g. for the words “The animal”:

• Queries, of dimension d_k
• Keys, of dimension d_k
• Values, of dimension d_v

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

The scores are divided by √d_k to control the input values to the softmax function.
http://jalammar.github.io/illustrated-transformer/

Matrix Calculation of Self Attention


http://jalammar.github.io/illustrated-transformer/
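As a concrete illustration of the matrix calculation above, here is a minimal NumPy sketch of self-attention: the input embeddings X are projected into Q, K, and V, the scores QKᵀ are scaled by √d_k, passed through a softmax, and used to weight the values. The projection matrices and the toy dimensions are illustrative assumptions, not values from the lecture.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, W_q, W_k, W_v):
        """X: (seq_len, d_model) token embeddings; W_q/W_k/W_v: projection matrices."""
        Q = X @ W_q                                # (seq_len, d_k) queries
        K = X @ W_k                                # (seq_len, d_k) keys
        V = X @ W_v                                # (seq_len, d_v) values
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)            # scale to control the softmax inputs
        weights = softmax(scores, axis=-1)         # (seq_len, seq_len) attention weights
        return weights @ V                         # (seq_len, d_v) output Z

    # Toy usage (illustrative sizes): 4 tokens, d_model = 8, d_k = d_v = 4
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 8))
    W_q, W_k, W_v = [rng.standard_normal((8, 4)) for _ in range(3)]
    Z = self_attention(X, W_q, W_k, W_v)
    print(Z.shape)                                 # (4, 4)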

Multiple Attention Heads

1. Concatenate the outputs of all attention heads.
2. Multiply with a weight matrix W_O that is trained jointly with the model.
3. The resulting matrix Z captures attention from all heads.
http://jalammar.github.io/illustrated-transformer/

Multiple Attention Heads

Attention is calculated in parallel by 8 heads, each using its own Q_i, K_i, V_i. The results are concatenated and multiplied with the final output weight matrix W_O.
https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
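A hedged NumPy sketch of the multi-head computation described above: each head attends with its own Q_i, K_i, V_i, the head outputs are concatenated, and the result is multiplied with the output weight matrix W_O. All weights and sizes here are illustrative placeholders, and the code reuses the self_attention helper from the earlier sketch.

    import numpy as np

    def multi_head_attention(X, W_q, W_k, W_v, W_o):
        """W_q/W_k/W_v: lists with one projection matrix per head; W_o: (h*d_v, d_model)."""
        heads = []
        for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v):
            heads.append(self_attention(X, Wq_i, Wk_i, Wv_i))   # (seq_len, d_v) per head
        concat = np.concatenate(heads, axis=-1)                  # (seq_len, h * d_v)
        return concat @ W_o                                      # (seq_len, d_model) final Z

    # Toy usage: 8 heads, d_model = 8, d_k = d_v = 4 (illustrative sizes)
    rng = np.random.default_rng(1)
    X = rng.standard_normal((4, 8))
    W_q = [rng.standard_normal((8, 4)) for _ in range(8)]
    W_k = [rng.standard_normal((8, 4)) for _ in range(8)]
    W_v = [rng.standard_normal((8, 4)) for _ in range(8)]
    W_o = rng.standard_normal((8 * 4, 8))
    Z = multi_head_attention(X, W_q, W_k, W_v, W_o)
    print(Z.shape)                                               # (4, 8)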

Multiple Attention Heads – Illustration

Advantages:
• Parallelizable computations
• Long-range dependency modeling
http://jalammar.github.io/illustrated-transformer/

Self Attention – Illustration

Focus of a single attention head vs. focus of multiple attention heads


https://kazemnejad.com/blog/transformer_architecture_positional_encoding/

Positional Encoding – Illustration

Intuition for position representation: moving from red to orange bits, the frequency of bit reversal decreases (as in the binary representation of integers, where lower bits flip faster than higher bits).

Positional encoding uses float values instead of bits: it contains pairs of sines and cosines at different frequencies.
https://kazemnejad.com/blog/transformer_architecture_positional_encoding/
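A minimal NumPy sketch of the sinusoidal encoding described above, following the formulas from the “Attention Is All You Need” paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).

    import numpy as np

    def positional_encoding(max_len, d_model):
        """Returns a (max_len, d_model) matrix of sines and cosines at decreasing frequencies."""
        pos = np.arange(max_len)[:, None]                 # positions 0 .. max_len-1
        i = np.arange(0, d_model, 2)[None, :]             # even dimension indices
        angles = pos / np.power(10000.0, i / d_model)     # frequency decreases as i grows
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angles)                      # even dimensions: sine
        pe[:, 1::2] = np.cos(angles)                      # odd dimensions: cosine
        return pe

    # The encoding is added to the word embeddings before the first encoder layer.
    pe = positional_encoding(max_len=50, d_model=512)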

Learnable Position Embeddings


https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
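For contrast, a hedged sketch of learnable position embeddings: instead of fixed sinusoids, each position indexes into a trainable lookup table. Here a random NumPy matrix stands in for parameters that would be trained jointly with the model.

    import numpy as np

    max_len, d_model = 50, 512
    position_table = 0.02 * np.random.randn(max_len, d_model)   # stand-in for learned parameters

    def add_position_embeddings(word_embeddings):
        """word_embeddings: (seq_len, d_model); adds the learned position vector per index."""
        seq_len = word_embeddings.shape[0]
        return word_embeddings + position_table[:seq_len]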

Machine Translation: Transformer Architecture

The encoder and the decoder each consist of a stack of N = 6 identical layers.

Encoder self-attention:
Input = word embedding + position embedding

Decoder self-attention:
Input = word embedding + position embedding, with masking of future positions

Encoder-decoder attention:
Queries = output of the previous decoder layer
Keys, Values = encoder output

This allows every position in the decoder to attend over all positions in the input sequence.
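A sketch of the masking mentioned for decoder self-attention (an illustrative assumption, not the lecture's code): scores for future positions are set to negative infinity before the softmax, so each decoder position can only attend to itself and earlier positions.

    import numpy as np

    def masked_self_attention(Q, K, V):
        """Causal (decoder-side) self-attention: block attention to future positions."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)    # entries above the diagonal
        scores[future] = -np.inf                                     # hide future tokens
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V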
