© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Soji Adeshina, Machine Learning Engineer, Amazon AI
Computer Vision 101 - Gluon CV
Computer Vision Architectures for Image Classification: A Brief Timeline
Convolution
• Ideal for picking up on spatial patterns in data
• Applied over and over again (layer after layer), it creates more abstract spatial features
• Inspired by experiments on the visual cortex of a cat
• Can be run in parallel for really fast computation
http://colah.github.io/posts/2014-07-Understanding-Convolutions/
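A minimal sketch of the operation this slide describes, in plain Python so it runs without any framework. The function name `conv2d`, the toy image and the edge kernel are illustrative assumptions, not taken from the slides. It computes a "valid" cross-correlation (no padding, stride 1), which is what deep learning libraries usually call convolution.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 1x2 kernel that responds where brightness increases left to right.
edge_kernel = [[-1, 1]]

feature_map = conv2d(image, edge_kernel)
print(feature_map)  # the vertical edge shows up as the column of 1s
```

Every output position is computed independently of the others, which is why the operation parallelizes so well.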
LeNet 1995
• Challenge: Multiple convolutions blow up dimensionality
• Solution: Pooling
• AvgPooling/Subsampling - average over patches (works OK)
• MaxPooling - pick the maximum over patches (much better)
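The two pooling variants can be sketched as follows. This is a toy illustration in plain Python; `pool2d`, `mean` and the sample feature map are assumed names and data, not part of the slides. Non-overlapping 2x2 patches reduce a 4x4 map to 2x2.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply `reduce_fn` to each non-overlapping size x size patch."""
    return [
        [
            reduce_fn(
                [
                    feature_map[i + di][j + dj]
                    for di in range(size)
                    for dj in range(size)
                ]
            )
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
    [0, 0, 5, 5],
    [0, 4, 5, 5],
]

max_pooled = pool2d(feature_map, 2, max)   # keeps the strongest response per patch
avg_pooled = pool2d(feature_map, 2, mean)  # averages each patch
print(max_pooled)
print(avg_pooled)
```

Note how the isolated strong activation (8) survives max pooling intact but is diluted to 2.0 by averaging.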
AlexNet (Krizhevsky et al., 2012)
• More convolutional layers
• More channels
• More filters
• More data
• More computation
VGG (2014)
Deep and Narrow or Wide and Shallow?
• Want to reach a receptive field of size k
• Option 1: use one large filter (linear mix of many inputs, then nonlinearity)
• Option 2: use several small filters (many linear mixes of few inputs) - has fewer parameters
• Simonyan & Zisserman (2014) find that deep and narrow wins
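The parameter argument can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels in every layer and ignores biases; the helper names and the choice of 64 channels are illustrative assumptions.

```python
def receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def stacked_conv_params(kernel_size, num_layers, channels):
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 64  # assumed width, for illustration only

# Both options reach the same receptive field of 5 ...
rf_large = receptive_field(kernel_size=5, num_layers=1)
rf_small = receptive_field(kernel_size=3, num_layers=2)

# ... but two 3x3 layers need fewer weights than one 5x5 layer.
params_large = stacked_conv_params(5, 1, channels)
params_small = stacked_conv_params(3, 2, channels)
print(rf_large, rf_small, params_large, params_small)
```

The stacked version also applies a nonlinearity between the two small filters, which the single large filter cannot do.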
Inception (2014)
Fancy structures - Networks of networks
• Compute different filters
• Compose one big vector from all of them
• Layer them iteratively
Szegedy et al. arxiv.org/pdf/1409.4842v1.pdf
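A deliberately simplified 1D sketch of the idea above: several filters of different sizes run in parallel over the same input, and their outputs are concatenated into one feature vector per position. The function names, the all-ones kernels and the 1D setting are assumptions made for brevity; the real Inception module works on 2D feature maps with learned filters.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding so output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + k] * kernel[k] for k in range(len(kernel)))
        for i in range(len(signal))
    ]


def inception_block(signal, kernels):
    """Run every kernel on the same input, then concatenate per position."""
    branches = [conv1d_same(signal, kernel) for kernel in kernels]
    return [list(values) for values in zip(*branches)]


signal = [1.0, 2.0, 3.0, 4.0]
# Hypothetical branches with window sizes 1, 3 and 5.
kernels = [[1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]]

features = inception_block(signal, kernels)
print(features[0])  # one value per branch at the first position
```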
Batch Norm (Ioffe et al., 2015)
• Loss occurs at last layer
• Last layers learn quickly
• Data is inserted at bottom layer
• Bottom layers change - everything changes
• Last layers need to relearn many times
• Slow convergence
• This is like covariate shift
Can we avoid changing last layers while learning first layers?
Batch Norm (Ioffe et al., 2015)
• Can we avoid changing last layers while learning first layers?
• Fix the mean and variance of each layer's input, and adjust them separately (learned scale and shift)
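A minimal sketch of the training-time computation, assuming the standard batch normalization formulation: normalize a batch to zero mean and unit variance, then apply a separate scale and shift. The names `gamma`, `beta` and `eps` are the conventional ones and do not appear on the slide; running statistics for inference are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then rescale (gamma) and shift (beta)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]


activations = [2.0, 4.0, 6.0, 8.0]  # one feature across a batch of four

normalized = batch_norm(activations)
rescaled = batch_norm(activations, gamma=2.0, beta=3.0)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print(norm_mean, norm_var)
```

Whatever the earlier layers do to the raw activations, the next layer sees inputs whose mean and variance are set only by `beta` and `gamma`.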
ResNet (He et al., 2015)
• In a regular layer, the simple function is given by f(x) = 0
• Key idea - ‘Taylor expansion’
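The slide's equations did not survive extraction, so the sketch below assumes the standard residual form f(x) = x + g(x) from He et al. With all weights at zero, a plain layer collapses to f(x) = 0, while a residual layer falls back to the identity. The toy `linear` layer and the function names are illustrative assumptions.

```python
def linear(x, weights):
    """Toy fully connected layer without bias or nonlinearity."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def plain_layer(x, weights):
    return linear(x, weights)


def residual_layer(x, weights):
    """f(x) = x + g(x): the layer only has to learn the correction g."""
    return [xi + gi for xi, gi in zip(x, linear(x, weights))]


zero_weights = [[0.0, 0.0], [0.0, 0.0]]
x = [1.5, -2.0]

plain_out = plain_layer(x, zero_weights)        # input is lost
residual_out = residual_layer(x, zero_weights)  # input passes through unchanged
print(plain_out, residual_out)
```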
DenseNet (Huang et al., 2016)
• Simple Function
• In ResNet, the ‘Taylor expansion’ ends after one term
• DenseNet uses multiple steps
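A toy sketch of dense connectivity, assuming the usual DenseNet formulation in which each layer receives the concatenation of all earlier outputs and appends its own. The `dense_block` and `make_sum_layer` helpers are hypothetical stand-ins for real convolutional layers.

```python
def dense_block(x, layers):
    """Each layer sees every feature produced so far and appends new ones."""
    features = list(x)
    for layer in layers:
        new_features = layer(features)
        features = features + new_features  # concatenate, do not add
    return features


def make_sum_layer(growth):
    """Hypothetical toy layer: emits `growth` copies of the sum of its inputs."""
    return lambda features: [sum(features)] * growth


block_output = dense_block([1.0, 2.0], [make_sum_layer(1), make_sum_layer(1)])
print(block_output)
```

The input features are still present in the output, next to the features of every intermediate step; ResNet, by contrast, adds a single correction to the input.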
Gluon CV: Deep Learning Toolkit for Computer Vision
Why GluonCV?
What is the biggest challenge you have ever encountered
with deep learning?
“reproducing the best claimed results”
Real-world Stories
Back in 2016, the same ImageNet models trained with MXNet achieved on average 1% worse accuracy than those trained with Torch.
We tried almost everything to debug this, and even developed a plugin to run Torch code inside MXNet so that the results were easier to compare.
Transcoding the training images at JPEG quality 95 rather than 85 solved the problem.
Real-world Stories
With another open-source DL framework, a similar problem occurred: the accuracy of trained models could not match that of a previous internal version.
Months were spent trying to figure out why, with no clue.
The cause: the order of data augmentation differed from the previous version.
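Why the order of augmentation matters can be shown with a toy example. The two operations below (a horizontal flip and a fixed left crop) are illustrative assumptions; the slides do not say which augmentations were involved in the story.

```python
def hflip(image):
    """Mirror every row."""
    return [row[::-1] for row in image]


def crop_left(image, width):
    """Keep the leftmost `width` columns."""
    return [row[:width] for row in image]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

flip_then_crop = crop_left(hflip(image), 2)
crop_then_flip = hflip(crop_left(image, 2))
print(flip_then_crop)
print(crop_then_flip)
```

The same two steps produce different training images depending on their order, so two pipelines that look equivalent on paper can feed the model different data.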
Starting from scratch can be hard
• Even the most talented researchers will get blocked by trivial things.
• Experience and instinct might be your enemies in certain circumstances.
• Training is time-consuming, initialization and augmentation are randomized, and tons of implementation details need to be taken care of. Debugging deep models is extremely difficult.
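One common way to take randomness out of a debugging session is to drive every random decision from an explicit seed. The sketch below uses only Python's standard `random` module; the function name and the particular decisions it samples (a flip flag and a crop offset) are hypothetical, and a real training run would also need to seed the framework's own generators.

```python
import random


def sample_augmentations(seed, count):
    """Draw (flip?, crop_offset) decisions from a dedicated, seeded generator."""
    rng = random.Random(seed)
    return [(rng.random() < 0.5, rng.randint(0, 8)) for _ in range(count)]


run_a = sample_augmentations(seed=0, count=5)
run_b = sample_augmentations(seed=0, count=5)
print(run_a == run_b)  # same seed, same sequence of decisions
```

With the augmentation stream fixed, a difference between two runs has to come from somewhere else, which narrows the search.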
Embracing open source solutions can be difficult
• The quality of open-source implementations varies.
• Languages, code styles, project structures and DL frameworks are mixed.
• Personal projects tend to focus on a specific task with specific datasets. It requires significant engineering effort to adapt them to your use case.
• Community projects are frequently abandoned.
What does GluonCV provide?
• Reproductions of important papers from recent years
• Training scripts (as well as tuned hyperparameters) to reproduce the results
• Considerate APIs and modules that are easy to follow and understand, so that experiments based on existing algorithms are less frustrating
• Community support: feel free to ask and discuss
What’s in GluonCV
Image Classification
• More than 20 pre-trained ImageNet models (ResNet, MobileNet…)
• We achieved the best accuracy with some of the most popular models (e.g., ResNet), compared with other frameworks
What’s in GluonCV
• Object Detection
• SSD and YOLOv3: fastest solutions
• Faster-RCNN, RFCN and FPN: slower but more accurate, especially for tiny objects
• Mask-RCNN: simultaneous object detection and instance segmentation
What’s in GluonCV
Semantic Segmentation
• FCN
• PSPNet
• Mask-RCNN
• DeepLab
Instance Segmentation
• Mask-RCNN
What’s in GluonCV
• Style Transfer
• MSGNet
• Generative Adversarial Networks (GAN)
• CycleGAN
Like GluonCV?
https://gluon-cv.mxnet.io
https://github.com/dmlc/gluon-cv

Editor's Notes

  • #4: Won a nobel prize for this in 19