GluonCV

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Soji Adeshina, Machine Learning Engineer, Amazon AI
Computer Vision 101 - GluonCV

Computer Vision Architectures for Image Classification: A Brief Timeline

Convolution
• Ideal for picking up on spatial patterns in data
• Applied over and over again (layer after layer), it creates increasingly abstract spatial features
• Inspired by experiments on the visual cortex of cats
• Can be run in parallel for really fast computation
http://colah.github.io/posts/2014-07-Understanding-Convolutions/
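
As a rough illustration of how a convolution picks up a spatial pattern, here is a minimal pure-Python sketch of a stride-1, no-padding 2D convolution (strictly a cross-correlation, which is what CNN layers usually compute). The `conv2d` helper, the toy image, and the edge kernel are all assumptions made for this example, not part of the original slides.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding 2D cross-correlation over nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch whose top-left corner is (i, j).
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to left-to-right intensity increases.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, edge_kernel)
print(feature_map)
```

The output is non-zero only in the column where the edge sits. Each output value depends on its own patch alone, which is why the computation parallelizes well.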

LeNet 1995
• Challenge: Multiple convolutions blow up dimensionality
• Solution: Pooling
• AvgPooling/Subsampling - average over patches (works OK)
• MaxPooling - pick the maximum over patches (much better)
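
The two pooling variants can be sketched in a few lines of pure Python. The `pool2d` helper and the toy feature map are assumptions for illustration, using non-overlapping square patches.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size patch."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            patch = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reduce_fn(patch))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


# Toy 4x4 feature map (made-up values).
fmap = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 9, 1],
    [4, 6, 3, 3],
]

max_pooled = pool2d(fmap, 2, max)   # keep the strongest response per patch
avg_pooled = pool2d(fmap, 2, mean)  # average the responses per patch
print(max_pooled, avg_pooled)
```

Both variants shrink the 4x4 map to 2x2, which is how pooling keeps dimensionality in check between convolution layers.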

AlexNet (Krizhevsky et al., 2012)
• More convolutional layers
• More channels
• More filters
• More data
• More computation

VGG (2014)
Deep and Narrow or Wide and Shallow?
• Want to reach a receptive field of size k
• Option 1: use one large filter (linear mix of many, then nonlinearity)
• Option 2: use several small filters (many linear mixes of few) - has fewer parameters
• Simonyan & Zisserman (2014) find that deep and narrow wins
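
The parameter saving can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases. The channel count of 64 and the target receptive field of 7 are example values, not taken from the slides.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field of a stack of stride-1 convolutions."""
    # Each stride-1 layer grows the receptive field by kernel_size - 1.
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


channels = 64                # example value
k = 7                        # target receptive field
num_small = (k - 1) // 2     # number of 3x3 layers needed to cover k

one_large = conv_weights(k, channels)
stacked_small = num_small * conv_weights(3, channels)

print(receptive_field(num_small), one_large, stacked_small)
```

Three stacked 3x3 layers cover the same 7x7 receptive field with 27·C² weights instead of 49·C², and they apply three nonlinearities instead of one.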

Inception (2014)
Fancy structures - Networks of networks
• Compute different filters
• Compose one big vector from all of them
• Layer them iteratively
Szegedy et al., arxiv.org/pdf/1409.4842v1.pdf
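
The "compute, concatenate, stack" pattern can be sketched as follows. This is a deliberately simplified stand-in: real Inception branches are convolutions of different kernel sizes plus pooling, whereas the hypothetical `linear_branch` below is just a small linear map applied to one feature vector.

```python
def linear_branch(weights):
    """Hypothetical branch: one output per weight vector (like 1x1 filters at one pixel)."""
    def apply(x):
        return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return apply


def inception_block(x, branches):
    """Run every branch on the same input and concatenate the outputs."""
    merged = []
    for branch in branches:
        merged.extend(branch(x))
    return merged


x = [1.0, 2.0]

# First block: two branches with different numbers of filters.
block1 = [
    linear_branch([[1.0, 0.0]]),
    linear_branch([[0.0, 1.0], [1.0, 1.0]]),
]
h = inception_block(x, block1)

# Second block consumes the concatenated vector from the first block.
block2 = [
    linear_branch([[1.0, 1.0, 1.0]]),
    linear_branch([[1.0, 0.0, -1.0]]),
]
y = inception_block(h, block2)
print(h, y)
```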

Batch Norm (Ioffe et al., 2015)
• Loss occurs at last layer
• Last layers learn quickly
• Data is inserted at bottom layer
• Bottom layers change - everything changes
• Last layers need to relearn many times
• Slow convergence
• This is like covariate shift
• Can we avoid changing last layers while learning first layers?

Batch Norm (Ioffe et al., 2015)
• Can we avoid changing last layers while learning first layers?
• Fix the mean and variance of each layer's inputs (computed over the mini-batch) and adjust them separately with a learned shift (mean) and scale (variance)
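
A minimal sketch of the training-time computation for a single feature, assuming the standard formulation: normalize with the mini-batch mean and variance, then apply a scale `gamma` and shift `beta`. Those two are learned in a real network; here they are plain arguments, and the activations are made-up values.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    # eps guards against division by zero when the variance is tiny.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def mean_and_var(values):
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


activations = [2.0, 4.0, 6.0, 8.0]   # example mini-batch for one feature

normalized = batch_norm(activations)
rescaled = batch_norm(activations, gamma=2.0, beta=10.0)

print(mean_and_var(normalized))  # close to (0, 1)
print(mean_and_var(rescaled))    # close to (10, 4)
```

Whatever the earlier layers do to the raw activations, the next layer sees inputs whose mean and variance are set only by `beta` and `gamma`.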

ResNet (He et al., 2015)
• In a regular layer, the simplest function is f(x) = 0 (all weights zero)
• Key idea - ‘Taylor expansion’: f(x) = x + g(x), so the simplest function becomes the identity f(x) = x
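
A minimal sketch of the residual idea, f(x) = x + g(x). The hypothetical `zero_branch` stands in for a learned branch g whose weights are all zero, to show what each kind of block computes in its simplest state.

```python
def plain_block(x, g):
    """Regular layer: the output is just the learned function."""
    return g(x)


def residual_block(x, g):
    """Residual layer: the learned function is added to the input."""
    return [x_i + g_i for x_i, g_i in zip(x, g(x))]


def zero_branch(x):
    # Stand-in for a learned branch with all-zero weights.
    return [0.0 for _ in x]


x = [1.5, -2.0, 0.5]

plain_out = plain_block(x, zero_branch)        # collapses to zero
residual_out = residual_block(x, zero_branch)  # passes the input through
print(plain_out, residual_out)
```

With the skip connection, an untrained or unneeded block defaults to the identity, so adding depth does not start by erasing the signal.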

DenseNet (Huang et al., 2016)
• Simple Function
• In ResNet ‘Taylor expansion’ ends after one term
• In DenseNet use multiple steps
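
A rough sketch of the DenseNet pattern, assuming the usual formulation in which each layer's output is concatenated onto everything computed so far rather than added to it. The `sum_layer` toy layer is an assumption for illustration, where a real network would use a convolution.

```python
def dense_block(x, layers):
    """Each layer sees all earlier features and appends its own output."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)
    return features


def sum_layer(features):
    # Toy layer producing one new feature from all features so far.
    return [sum(features)]


x = [1.0, 2.0]
out = dense_block(x, [sum_layer, sum_layer, sum_layer])
print(out)
```

The original input and every intermediate feature survive to the end of the block, and the feature vector grows by one entry per layer (a growth rate of 1 in this toy setup).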

GluonCV: Deep Learning Toolkit for Computer Vision

Why GluonCV?
What is the biggest challenge you have ever encountered with deep learning?
“reproducing the best claimed results”

Real-world Stories
Back in 2016, the same ImageNet models trained with MXNet achieved on average 1% worse accuracy than those trained with Torch.
Almost everything was tried to debug this; a plugin was even developed to run Torch code inside MXNet so that the results were easier to compare.
Transcoding the training images at JPEG quality 95 rather than 85 solved the problem.

Real-world Stories
With another open-source DL framework, a similar problem occurred: the accuracy of trained models could not match that of a previous internal version.
Months were spent trying to figure out why, with no clue.
The cause: the order of data augmentation differed from the previous version.
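
To see why augmentation order can matter, here is a toy sketch with two hypothetical operations, a fixed left crop and a horizontal flip. The story does not say which augmentations were involved; these are chosen only because they visibly do not commute.

```python
def crop_left(image, width):
    """Keep the leftmost `width` columns of each row."""
    return [row[:width] for row in image]


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


# Toy 2x4 "image" (made-up pixel values).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

crop_then_flip = flip_horizontal(crop_left(image, 2))
flip_then_crop = crop_left(flip_horizontal(image), 2)

print(crop_then_flip)
print(flip_then_crop)
```

The same two operations produce different training inputs depending on their order, so a silent reordering changes the data distribution the model sees.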

Starting from scratch can be hard
• Even the most talented researchers get blocked by trivial things.
• Experience and instinct can be your enemies in certain circumstances.
• Training is time-consuming, initialization and augmentation are randomized, and tons of implementation details need to be taken care of. Debugging deep models is extremely difficult.

Embracing open source solutions can be difficult
• The quality of open-source implementations varies.
• Languages, code styles, project structures, and DL frameworks are mixed.
• Personal projects tend to focus on a specific task with specific datasets. Adapting them to your use case requires significant engineering effort.
• Community projects are frequently abandoned.

What does GluonCV provide?
• Reproductions of important papers from recent years
• Training scripts (as well as tuned hyper-parameters) to reproduce the results
• Carefully designed APIs and modules that are easy to follow and understand, so that experiments based on existing algorithms are less frustrating
• Community support: feel free to ask and discuss

What’s in GluonCV
Image Classification
• More than 20 pre-trained ImageNet models (ResNet, MobileNet, …)
• We achieved the best accuracy on some of the most popular models (e.g., ResNet) compared with other frameworks

What’s in GluonCV
• Object Detection
• SSD and YOLOv3: fastest solutions
• Faster-RCNN, RFCN and FPN: slower but more accurate, especially for tiny objects
• Mask-RCNN: simultaneous object detection and semantic segmentation

What’s in GluonCV
Semantic Segmentation
• FCN
• PSPNet
• Mask-RCNN
• DeepLab
Instance Segmentation
• Mask-RCNN

What’s in GluonCV
• Style Transfer
• MSGNet
• Generative Adversarial Networks (GAN)
• CycleGAN

Like GluonCV?
https://gluon-cv.mxnet.io
https://github.com/dmlc/gluon-cv
