Siamese Network
Shusen Wang
Learning Pairwise Similarity Scores
Reference:
• Bromley et al. Signature verification using a Siamese time delay neural network. In NIPS, 1994.
• Koch, Zemel, & Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML, 2015.
Training Set
[Figure: columns of labeled images, one column per class: Husky, Elephant, Tiger, Macaw, Car]
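Training Data
• Positive samples $(\mathbf{x}_1, \mathbf{x}_2, 1)$: two images from the same class, labeled 1.
• Negative samples $(\mathbf{x}_1, \mathbf{x}_2, 0)$: two images from different classes, labeled 0.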
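As a minimal sketch of how such labeled pairs might be drawn, the snippet below assumes a hypothetical layout where the training set is a dict from class name to a list of samples (file names stand in for images). The helper `make_pairs` and the 50/50 balance between positive and negative pairs are illustrative choices, not taken from the slides.

```python
import random


def make_pairs(dataset, num_pairs, seed=0):
    """Sample labeled pairs (x1, x2, label) for training a Siamese network.

    dataset: dict mapping class name -> list of samples (a hypothetical layout;
    samples can be image arrays, file paths, or anything else).
    label is 1 if both samples come from the same class, else 0.
    Even-indexed pairs are positive and odd-indexed pairs are negative,
    so the two kinds are balanced.
    """
    rng = random.Random(seed)
    # Only classes with at least two samples can provide a positive pair.
    pos_classes = [c for c, xs in dataset.items() if len(xs) >= 2]
    all_classes = list(dataset)
    pairs = []
    for i in range(num_pairs):
        if i % 2 == 0:
            c = rng.choice(pos_classes)
            x1, x2 = rng.sample(dataset[c], 2)  # two different samples, same class
            pairs.append((x1, x2, 1))
        else:
            c1, c2 = rng.sample(all_classes, 2)  # two different classes
            pairs.append((rng.choice(dataset[c1]), rng.choice(dataset[c2]), 0))
    return pairs


toy = {
    "Husky": ["husky_1", "husky_2", "husky_3"],
    "Tiger": ["tiger_1", "tiger_2"],
    "Car": ["car_1", "car_2"],
}
for x1, x2, label in make_pairs(toy, 4):
    print(x1, x2, label)
```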
CNN for Feature Extraction
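$\mathbf{x}$ → Conv, Pool → Conv, Pool → Flatten → $\mathbf{f}(\mathbf{x})$
The CNN $\mathbf{f}$ maps an input image $\mathbf{x}$ to a feature vector $\mathbf{f}(\mathbf{x})$.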
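To make the Conv → Pool → Flatten pipeline concrete, here is a toy, pure-Python sketch of a feature extractor $\mathbf{f}$. It is a simplification under stated assumptions: one single-channel Conv/ReLU/Pool block with hand-set kernels (the slides stack two blocks, and a real CNN learns multi-channel kernels). The helper names `conv2d`, `relu`, `max_pool2x2`, `flatten`, and `f` are hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution of a single-channel image (implemented as
    cross-correlation, the convention most deep learning libraries use)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def flatten(fmap):
    return [v for row in fmap for v in row]


def f(image, kernels):
    """Toy feature extractor: for each kernel, Conv -> ReLU -> Pool, then
    flatten and concatenate into one feature vector."""
    features = []
    for kernel in kernels:
        features.extend(flatten(max_pool2x2(relu(conv2d(image, kernel)))))
    return features


image = [[1.0] * 5 for _ in range(5)]
kernels = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, -1.0], [1.0, -1.0]]]
print(f(image, kernels))  # [4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0]
```

A 5×5 image with 2×2 kernels gives 4×4 feature maps, pooled to 2×2, so each kernel contributes 4 features. In a Siamese network, the same `f` (with the same kernels) is applied to every input image.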
Training Siamese Network
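Positive pair (target = 1):
• Feed both images through the same CNN $\mathbf{f}$ (shared weights): $\mathbf{h}_1 = \mathbf{f}(\mathbf{x}_1)$, $\mathbf{h}_2 = \mathbf{f}(\mathbf{x}_2)$.
• Take the element-wise absolute difference $\mathbf{z} = |\mathbf{h}_1 - \mathbf{h}_2|$.
• Pass $\mathbf{z}$ through dense layers and a sigmoid to get $\mathrm{sim}(\mathbf{x}_1, \mathbf{x}_2) \in (0, 1)$.
• The loss measures how far $\mathrm{sim}(\mathbf{x}_1, \mathbf{x}_2)$ is from the target 1; backpropagation updates both the dense layers and the CNN $\mathbf{f}$.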
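Negative pair (target = 0):
• The same network computes $\mathrm{sim}(\mathbf{x}_1, \mathbf{x}_2)$ for two images from different classes.
• The loss now pushes $\mathrm{sim}(\mathbf{x}_1, \mathbf{x}_2)$ toward the target 0; again, both the dense layers and $\mathbf{f}$ are updated.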
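The forward pass of the similarity head can be sketched as follows. This is a minimal sketch under two assumptions: a single dense layer stands in for the slides' several dense layers, and binary cross-entropy is used as the loss (a common choice for a sigmoid output with 0/1 targets). The function names and parameter values are hypothetical. A real implementation would let a framework's automatic differentiation backpropagate this loss into the dense layers and $\mathbf{f}$; the sketch only computes the forward pass.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def similarity(h1, h2, w, b):
    """Pairwise similarity head: z = |h1 - h2| -> dense layer -> sigmoid.

    h1, h2: feature vectors f(x1) and f(x2) from the shared CNN.
    w, b: weights and bias of ONE dense layer (simplifying assumption;
    the slides stack several dense layers).
    """
    z = [abs(a - c) for a, c in zip(h1, h2)]  # element-wise |h1 - h2|
    logit = sum(wi * zi for wi, zi in zip(w, z)) + b
    return sigmoid(logit)  # sim(x1, x2) in (0, 1)


def bce_loss(sim, target, eps=1e-12):
    """Binary cross-entropy between sim(x1, x2) and the target (1 or 0)."""
    sim = min(max(sim, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(sim) + (1 - target) * math.log(1.0 - sim))


# Hypothetical feature vectors and dense-layer parameters.
w, b = [-1.0, -1.0], 2.0
h_a, h_b = [0.5, 1.0], [0.5, 1.0]   # identical features
h_c = [3.0, -1.0]                   # very different features
print(similarity(h_a, h_b, w, b))   # sigmoid(2.0)
print(similarity(h_a, h_c, w, b))   # sigmoid(-2.5)
```

With negative weights on $\mathbf{z}$, a larger feature difference lowers the similarity, so a positive pair is rewarded for small $|\mathbf{h}_1 - \mathbf{h}_2|$ and a negative pair for large.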
One-Shot Prediction
• 6-way 1-shot prediction: the support set has 6 test classes, each with 1 sample.
• The training data (for the Siamese network) does not contain these 6 classes.
Support Set (one image per class): Fox, Squirrel, Rabbit, Hamster, Otter, Beaver
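Query: a new image whose class is unknown.
Similarity between the query and each support-set image:
Fox: 0.2, Squirrel: 0.9, Rabbit: 0.7, Hamster: 0.5, Otter: 0.3, Beaver: 0.4
Prediction: Squirrel, the class with the highest similarity (0.9).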
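The one-shot rule above is an argmax over the support set. A minimal sketch, assuming a hypothetical `predict_one_shot` helper that accepts any pairwise similarity function; since the slide's images are not available, a stand-in `sim_fn` simply looks up the similarity scores shown on the slide.

```python
def predict_one_shot(query, support_set, sim_fn):
    """k-way 1-shot prediction with a pairwise similarity function.

    support_set: dict mapping class name -> its single support sample.
    sim_fn: any function returning sim(x1, x2), e.g. a trained Siamese network.
    Returns the predicted class and the per-class scores.
    """
    scores = {c: sim_fn(query, x) for c, x in support_set.items()}
    return max(scores, key=scores.get), scores


# Stand-in for the trained network: look up the scores shown on the slide.
slide_scores = {"fox": 0.2, "squirrel": 0.9, "rabbit": 0.7,
                "hamster": 0.5, "otter": 0.3, "beaver": 0.4}
support_set = {"Fox": "fox", "Squirrel": "squirrel", "Rabbit": "rabbit",
               "Hamster": "hamster", "Otter": "otter", "Beaver": "beaver"}
label, scores = predict_one_shot("query", support_set, lambda q, x: slide_scores[x])
print(label)  # Squirrel
```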
Triplet Loss
Reference:
• Schroff, Kalenichenko, & Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.
Data for Training Siamese Network
Each training example is a triplet sampled from the training set (classes such as Elephant, Tiger, Macaw, Car):
• Anchor $\mathbf{x}^{a}$: a randomly chosen image.
• Positive $\mathbf{x}^{+}$: another image from the same class as the anchor.
• Negative $\mathbf{x}^{-}$: an image from a different class.
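Triplet sampling can be sketched in a few lines. This assumes the same hypothetical dict-of-lists layout for the training set (class name to list of samples); `sample_triplet` is an illustrative helper, and drawing the anchor class uniformly at random is an assumption.

```python
import random


def sample_triplet(dataset, rng=None):
    """Sample one (anchor, positive, negative) triplet.

    dataset: dict mapping class name -> list of samples (hypothetical layout).
    The anchor and positive are two different samples of one class; the
    negative comes from a different class.
    """
    if rng is None:
        rng = random.Random()
    anchor_class = rng.choice([c for c, xs in dataset.items() if len(xs) >= 2])
    anchor, positive = rng.sample(dataset[anchor_class], 2)
    negative_class = rng.choice([c for c in dataset if c != anchor_class])
    negative = rng.choice(dataset[negative_class])
    return anchor, positive, negative


toy = {
    "Elephant": ["elephant_1", "elephant_2"],
    "Tiger": ["tiger_1", "tiger_2", "tiger_3"],
    "Macaw": ["macaw_1", "macaw_2"],
    "Car": ["car_1", "car_2"],
}
print(sample_triplet(toy, random.Random(0)))
```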
Triplet Loss
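• Feed the anchor $\mathbf{x}^{a}$, the positive $\mathbf{x}^{+}$, and the negative $\mathbf{x}^{-}$ through the same CNN $\mathbf{f}$ to get $\mathbf{f}(\mathbf{x}^{a})$, $\mathbf{f}(\mathbf{x}^{+})$, $\mathbf{f}(\mathbf{x}^{-})$.
• $d^{+} = \|\mathbf{f}(\mathbf{x}^{+}) - \mathbf{f}(\mathbf{x}^{a})\|_2^2$ (squared distance between anchor and positive).
• $d^{-} = \|\mathbf{f}(\mathbf{x}^{a}) - \mathbf{f}(\mathbf{x}^{-})\|_2^2$ (squared distance between anchor and negative).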
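Feature Space
[Figure: $\mathbf{f}$ maps $\mathbf{x}^{a}$, $\mathbf{x}^{+}$, $\mathbf{x}^{-}$ to points $\mathbf{f}(\mathbf{x}^{a})$, $\mathbf{f}(\mathbf{x}^{+})$, $\mathbf{f}(\mathbf{x}^{-})$; $d^{+}$ is the distance from the anchor to the positive, and $d^{-}$ is the distance from the anchor to the negative.]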
Triplet Loss
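• Encourage $d^{+} = \|\mathbf{f}(\mathbf{x}^{+}) - \mathbf{f}(\mathbf{x}^{a})\|_2^2$ to be small.
• Encourage $d^{-} = \|\mathbf{f}(\mathbf{x}^{a}) - \mathbf{f}(\mathbf{x}^{-})\|_2^2$ to be big.
• If $d^{-} \geq d^{+} + \alpha$, there is no loss ($\alpha > 0$ is the margin).
• Otherwise, the loss is $d^{+} + \alpha - d^{-}$.
• $\mathrm{Loss}(\mathbf{x}^{a}, \mathbf{x}^{+}, \mathbf{x}^{-}) = \max\{0,\; d^{+} + \alpha - d^{-}\}$.
• Update the CNN (function $\mathbf{f}$) to decrease the loss.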
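The hinge form of the triplet loss can be checked numerically. A minimal sketch that takes the three feature vectors as already computed by $\mathbf{f}$; the helper names and the default margin `alpha=0.2` are illustrative assumptions, not values from the slides.

```python
def squared_distance(u, v):
    """||u - v||_2^2 for two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_anchor, f_pos, f_neg, alpha=0.2):
    """max(0, d+ + alpha - d-), given f(x^a), f(x^+), f(x^-).

    alpha=0.2 is an illustrative default margin, not a value from the slides.
    """
    d_pos = squared_distance(f_pos, f_anchor)   # d+
    d_neg = squared_distance(f_anchor, f_neg)   # d-
    return max(0.0, d_pos + alpha - d_neg)


f_a, f_p = [0.0, 0.0], [0.1, 0.0]          # hypothetical embeddings
print(triplet_loss(f_a, f_p, [1.0, 0.0]))  # d- >= d+ + alpha, so 0.0
print(triplet_loss(f_a, f_p, [0.2, 0.0]))  # about 0.17
```

In the first call the negative is already farther than the positive by more than the margin, so the triplet contributes nothing; in the second, $d^{+} + \alpha - d^{-} = 0.01 + 0.2 - 0.04 = 0.17$ and the gradient would push $\mathbf{f}(\mathbf{x}^{-})$ away from the anchor.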
One-Shot Prediction
Query:
Support Set (one image per class): Fox, Squirrel, Rabbit, Hamster, Otter, Beaver
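Query: a new image whose class is unknown.
Distance between the query and each support-set image in feature space (computed with $\mathbf{f}$):
Fox: 231, Squirrel: 19, Rabbit: 138, Hamster: 76, Otter: 122, Beaver: 94
Prediction: Squirrel, the class with the smallest distance (19).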
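With a triplet-trained $\mathbf{f}$, prediction becomes a nearest-neighbor search in feature space. A minimal sketch, assuming the query and support images have already been mapped to feature vectors; `predict_by_distance` and the 2-D vectors below are hypothetical.

```python
def predict_by_distance(query_feat, support_feats):
    """k-way 1-shot prediction by nearest feature vector.

    query_feat: f(query); support_feats: dict class name -> f(support sample).
    Uses squared Euclidean distance, matching d+ and d- in the triplet loss.
    """
    dists = {c: sum((a - b) ** 2 for a, b in zip(query_feat, v))
             for c, v in support_feats.items()}
    return min(dists, key=dists.get), dists


# Hypothetical 2-D feature vectors for three support classes.
support_feats = {"Fox": [5.0, 5.0], "Squirrel": [1.0, 2.5], "Rabbit": [3.0, 0.0]}
label, dists = predict_by_distance([1.0, 2.0], support_feats)
print(label, dists)  # Squirrel {'Fox': 25.0, 'Squirrel': 0.25, 'Rabbit': 8.0}
```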
Summary
Basic Idea of Few-Shot Learning
• Train a Siamese network on a large-scale training set.
• Given a $k$-way $n$-shot support set:
  • $k$-way means $k$ classes.
  • $n$-shot means every class has $n$ samples.
  • The training set does not contain the $k$ classes.
• Given a query, predict its class.
  • Use the Siamese network to compute similarity or distance.
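The slides work through the 1-shot case only. One simple way to handle $n > 1$ shots, sketched below as an assumption rather than the slides' method, is to compare the query with every support sample and predict the class of the most similar one. `predict_k_way_n_shot` is a hypothetical helper, and the toy example uses numbers as samples with $-|q - x|$ as a stand-in similarity.

```python
def predict_k_way_n_shot(query, support_set, sim_fn):
    """Nearest-neighbor prediction over a k-way n-shot support set.

    support_set: dict mapping each of the k class names -> list of n samples.
    sim_fn: pairwise similarity, e.g. a trained Siamese network.
    Scoring a class by its single most similar sample is one simple choice
    (an assumption; other aggregation rules are possible).
    """
    best_class, best_score = None, float("-inf")
    for c, samples in support_set.items():
        for x in samples:
            s = sim_fn(query, x)
            if s > best_score:
                best_class, best_score = c, s
    return best_class


# Toy 3-way 2-shot example: samples are numbers, similarity is -|q - x|.
support = {"A": [1.0, 1.5], "B": [4.0, 5.0], "C": [9.0, 10.0]}
print(predict_k_way_n_shot(4.4, support, lambda q, x: -abs(q - x)))  # B
```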
Siamese Network for Pairwise Similarity
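$\mathbf{x}_1, \mathbf{x}_2$ → shared CNN $\mathbf{f}$ → $\mathbf{h}_1 = \mathbf{f}(\mathbf{x}_1)$, $\mathbf{h}_2 = \mathbf{f}(\mathbf{x}_2)$ → $\mathbf{z} = |\mathbf{h}_1 - \mathbf{h}_2|$ → Dense Layers → Sigmoid → $\mathrm{sim}(\mathbf{x}_1, \mathbf{x}_2)$
Loss: compare $\mathrm{sim}(\mathbf{x}_1, \mathbf{x}_2)$ with the target (1 for a positive pair, 0 for a negative pair).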
Siamese Network with Triplet Loss
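$\mathbf{x}^{a}$ (anchor), $\mathbf{x}^{+}$ (positive), $\mathbf{x}^{-}$ (negative) → shared CNN $\mathbf{f}$ → $\mathbf{f}(\mathbf{x}^{a})$, $\mathbf{f}(\mathbf{x}^{+})$, $\mathbf{f}(\mathbf{x}^{-})$
$d^{+} = \|\mathbf{f}(\mathbf{x}^{+}) - \mathbf{f}(\mathbf{x}^{a})\|_2^2$, $d^{-} = \|\mathbf{f}(\mathbf{x}^{a}) - \mathbf{f}(\mathbf{x}^{-})\|_2^2$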