Practical Deep Learning
Tambet Matiisen
Machine Learning Meetup
29.03.2015
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
Microsoft Project Oxford
www.projectoxford.ai
• Face Detection
• Similar Face Search
• Face Grouping
• Face Identification
• Emotion Recognition
• Face Verification
Microsoft Project Oxford Demo
https://www.projectoxford.ai/demo/Emotion
Other Services
• Computer vision API
– image categorization, pornography detection, OCR
• Video API
– face tracking, motion detection
• Speech API
– speech recognition, speaker recognition
• Language API
– spell check, entity recognition, next-word prediction
Price
• 5000-10000 transactions per month free.
• Later from $0.05 to $4 per 1000 transactions.
• Good for prototyping?
• But what if
– your dataset is too big?
– your dataset is sensitive?
– the pricing model doesn’t match?
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
Caffe
• Developed at the Berkeley Vision and Learning
Center (BVLC).
• Written in C++, bindings for Python and
Matlab.
• Works under Ubuntu, OSX and, with some
effort, Windows.
• Uses GPUs (cuDNN) to accelerate learning.
caffe.berkeleyvision.org
Caffe Model Zoo
• 1000 everyday objects, including 100 dog breeds (ImageNet)
• 205 scene categories, indoors and outdoors (Places)
• Face feature extraction
• Others: age and gender classification, emotion recognition,
car model classification, flower classification, image hashing,
image segmentation, object detection (R-CNN) etc.
https://github.com/BVLC/caffe/wiki/Model-Zoo
Caffe Pretrained Model Demo
https://github.com/BVLC/caffe/blob/master/examples/00-classification.ipynb
Transfer Learning
Compare images by the squared Euclidean distance between their feature vectors:
d(x, y) = Σᵢ (xᵢ − yᵢ)²
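The transfer-learning demo retrieves similar images by comparing feature vectors taken from a pre-trained network. A minimal numpy sketch of that retrieval step, with made-up 4-dimensional "features" standing in for real network activations:

```python
import numpy as np

def nearest(query, features):
    """Index of the stored feature vector closest to `query`
    under the squared Euclidean distance d(x, y) = sum_i (x_i - y_i)^2."""
    d = ((features - query) ** 2).sum(axis=1)
    return int(d.argmin())

# toy example: three stored "feature vectors"
features = np.array([[0.0, 0.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0, 1.0],
                     [0.9, 1.1, 1.0, 0.8]])
query = np.array([1.0, 1.0, 1.0, 0.9])
print(nearest(query, features))  # → 1
```

In the poster demo the same idea applies, just with features extracted from an ImageNet-trained model instead of these toy vectors.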
ImageNet Transfer Learning
Demo
posters.dreamitget.it
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
Fine-tuning Pretrained Network
• freeze the early (pre-trained) layers, train only the last layers
• progressively unfreeze deeper layers, until finally the whole network is trained
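In code, "freeze layers, train" just means updating only the unfrozen parameters. A from-scratch numpy sketch (not the toolkit code from the slides) of fine-tuning the last layer of a tiny two-layer network on a toy regression task:

```python
import numpy as np

rng = np.random.default_rng(0)

# a "pre-trained" two-layer network: x -> relu(x W1) -> h W2
W1 = rng.normal(size=(4, 8))        # frozen: pre-trained feature extractor
W2 = rng.normal(size=(8, 1))        # trainable: task-specific output layer

X = rng.normal(size=(64, 4))
y = X.sum(axis=1, keepdims=True)    # toy regression target

H = np.maximum(X @ W1, 0.0)         # frozen layer runs forward only
loss_before = float(((H @ W2 - y) ** 2).mean())

lr = 0.01
for _ in range(300):
    err = H @ W2 - y
    W2 -= lr * (H.T @ err) / len(X)  # only W2 is updated; W1 stays frozen

loss_after = float(((H @ W2 - y) ** 2).mean())
print(loss_before, loss_after)       # loss drops while W1 is untouched
```

With real toolkits the same effect is usually achieved by marking layers as non-trainable rather than hand-writing the update.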
Example: Estonian Border Guard
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
Images labeled “Lennart Meri”:
• many people in the image
• face too small
• not facing the camera
• the person not in the image at all!
Training your own model
• Make sure you have enough labeled data
– Minimum in thousands, preferably in millions.
• Start with existing (most similar) architecture:
– Models based on ImageNet (256x256 color images)
– Models based on CIFAR-10 (32x32 color images)
• Scale down hidden layer sizes so that the ratio of
samples to parameters stays roughly the same.
– AlexNet: 1.2M training images, 61M parameters
– Your dataset: 100K images, 6M parameters
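The scaling rule above is simple arithmetic: keep the samples-to-parameters ratio roughly constant. For the numbers on the slide:

```python
# AlexNet: 1.2M training images over 61M parameters
alexnet_ratio = 1_200_000 / 61_000_000
# scaled-down model: 100K images over 6M parameters
scaled_ratio = 100_000 / 6_000_000

print(round(alexnet_ratio, 3), round(scaled_ratio, 3))  # → 0.02 0.017
```

Both come out near 0.02 samples per parameter, so the smaller model is matched to the smaller dataset.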
Python toolkits
Keras (keras.io)
• Written in Python
• Built on top of Theano; also supports TensorFlow
• Inspired by the Torch API
• Plenty of examples

Neon (neon.nervanasys.com)
• Written in Python
• Custom GPU backend, written in GPU assembler
• Fastest convolutions
• Plenty of examples
Defining architecture
Keras Neon
Training the model
Keras
Neon
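The Keras and Neon code shown on the original slides is lost in this transcript. As a toolkit-neutral stand-in, here is a from-scratch numpy sketch of the same two steps — defining a small softmax classifier and training it with gradient descent on toy data:

```python
import numpy as np

rng = np.random.default_rng(1)

# --- defining the architecture: a single softmax layer, 2 inputs -> 2 classes
W = np.zeros((2, 2))
b = np.zeros(2)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# toy, linearly separable data: class 1 iff x0 + x1 > 0
X = rng.normal(size=(200, 2))
labels = (X.sum(axis=1) > 0).astype(int)
Y = np.eye(2)[labels]                      # one-hot targets

# --- training the model: plain gradient descent on cross-entropy
lr = 0.5
for _ in range(100):
    P = softmax(X @ W.T + b)
    G = (P - Y) / len(X)                   # dL/da for softmax + cross-entropy
    W -= lr * (G.T @ X)
    b -= lr * G.sum(axis=0)

acc = float(((X @ W.T + b).argmax(axis=1) == labels).mean())
print(acc)                                  # close to 1.0 on this toy problem
```

A toolkit replaces the explicit forward pass and gradient lines with layer objects and a `fit` call, but the two phases — define, then train — are the same.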
The Good and the Bad...
Keras
• The Good: nicer API; better documentation; can be extended with Theano
• The Bad: slower convolutions; compilation time (Theano); repo stability
→ Use with text data!

Neon
• The Good: fastest convolutions; some nice gimmicks (deconvolution layer, object detection (R-CNN), guided backpropagation)
• The Bad: recurrent networks slow; documentation; repo stability
→ Use with image data!
Hyperparameter shock
• Too many hyperparameters to try – number of
layers, hidden nodes, filter size, learning rate etc.
– Start with default parameters from example
– Use adaptive learning rate (Adam, Rmsprop)
– Use batch normalization
– Turn off regularization at first
– Overfit small subset and then regularize with more
data and dropout. Consider data augmentation.
– Do greedy search, changing one parameter at a time
– If desperate, try Bayesian optimization
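The "greedy search, changing one parameter at a time" tip can be sketched as a short loop. Here `score` is a hypothetical stand-in for a full training run that returns validation accuracy; the parameter names and grid values are illustrative:

```python
grid = {
    "lr": [0.1, 0.01, 0.001],
    "hidden": [32, 128, 512],
    "dropout": [0.0, 0.5],
}

def score(params):
    # hypothetical stand-in for "train the model, return validation accuracy";
    # a synthetic score peaked at lr=0.01, hidden=128, dropout=0.5
    target = {"lr": 0.01, "hidden": 128, "dropout": 0.5}
    return -sum(abs(grid[k].index(params[k]) - grid[k].index(target[k]))
                for k in grid)

params = {k: v[0] for k, v in grid.items()}    # start from "default" values
for name in grid:                              # greedy: one parameter at a time
    params[name] = max(grid[name], key=lambda v: score({**params, name: v}))

print(params)  # → {'lr': 0.01, 'hidden': 128, 'dropout': 0.5}
```

This needs 3 + 3 + 2 = 8 training runs instead of the 18 a full grid search would take; the trade-off is that it can miss interactions between parameters.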
Example: positioning a rat
using its brain activity
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
TensorFlow
• Computational flow graph
– Automatic differentiation
• Runs on CPUs and GPUs
– Desktop, server, mobile
• Asynchronous computation
– Assign nodes to different devices
• Connect research and production
– The same code can be run everywhere
Example: Logistic Regression
a = Wx + b
p = softmax(a)
L = −Σᵢⱼ yᵢⱼ log(pᵢⱼ)

Gradients by the chain rule:
∂L/∂a = p − y
∂L/∂W = (∂L/∂a)(∂a/∂W) = (p − y)xᵀ

Computational graph: x, W → (×) → (+ b) → softmax → p; then log p, y → (×) → sum → (−) → L
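The gradient the slide derives, ∂L/∂a = p − y, can be checked numerically against the graph's forward pass (a numpy sketch, not actual TensorFlow code):

```python
import numpy as np

def forward(a, y):
    """Forward pass of the graph: softmax then cross-entropy loss."""
    e = np.exp(a - a.max())
    p = e / e.sum()
    return -(y * np.log(p)).sum(), p   # L = -sum_i y_i log p_i

rng = np.random.default_rng(0)
a = rng.normal(size=5)
y = np.eye(5)[2]                        # one-hot target

L, p = forward(a, y)
analytic = p - y                        # dL/da from the slide

# finite-difference check of each component of the gradient
eps = 1e-6
numeric = np.zeros_like(a)
for i in range(5):
    d = np.zeros(5); d[i] = eps
    numeric[i] = (forward(a + d, y)[0] - forward(a - d, y)[0]) / (2 * eps)

print(np.abs(analytic - numeric).max())  # tiny: the two gradients agree
```

This is exactly the check that automatic differentiation frameworks make unnecessary: the framework derives ∂L/∂a from the graph itself.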
Differentiable Programming
1. Express your assumptions about the problem as a
computational graph
2. Come up with a meaningful loss function
3. Optimize the hell out of it using gradient descent
4. Profit!!!
4. Profit!!!
• Automatic differentiation inefficient?
– C seemed inefficient compared to assembler
– ORMs seemed inefficient compared to SQL
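A computational graph with automatic differentiation fits in a few lines: each node stores its value plus how to pass gradients back to its parents. A toy scalar version (nothing like TensorFlow's actual implementation, just the idea):

```python
class Var:
    """Scalar node in a computational graph with reverse-mode autodiff."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(ab)/da = b, d(ab)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, grad=1.0):
        # accumulate the incoming gradient, then apply the chain rule
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)

# L = x*w + b, gradients obtained automatically
x, w, b = Var(2.0), Var(3.0), Var(1.0)
L = x * w + b
L.backward()
print(w.grad, x.grad, b.grad)  # → 2.0 3.0 1.0
```

Real frameworks add tensors, a proper topological-order backward pass, and optimized kernels, but the contract is the same: describe the graph, get the gradients for free.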
Choose the level of abstraction that
you are comfortable with!
0. Use web API
1. Use pre-trained model
2. Train your own model
3. Design your own architecture
Thank you!
tambet@ut.ee
