Use AI to Build AI
The Evolution of AutoML
Ning Jiang
CTO, OneClick.ai
2018
Ning Jiang
Co-founder of OneClick.ai, the first automated
Deep Learning platform on the market.
Previously Dev Manager at Microsoft Bing, Ning
has over 15 years of R&D experience in AI for ads,
search, and cyber security.
So, Why AutoML?
{ Challenges in AI Applications }
1. Never enough experienced data scientists
2. Long development cycle (typically 3 to 6 months)
3. High risk of failure
4. Endless engineering traps in implementation and
maintenance
{ Coming Along With Deep Learning }
1. Few experienced data scientists and engineers
2. Increasing complexity in data (mixing images, text, and numbers)
3. Algorithms need to be customized
4. Increased design choices and hyper-parameters
5. Much harder to debug
What is AutoML?
{ AutoML }
[Diagram: Controller → Model Designs → Model Training (Training Data) → Model Validation (Validation Data), with validation results fed back to the Controller]
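The loop on this slide (a controller proposes designs, each design is trained and validated, and the validation score feeds back into the next proposal) can be sketched as follows. All names are illustrative; the controller here is plain random search, standing in for the learned controllers discussed later.

```python
import random

def controller_propose(history):
    # Stand-in controller: random search. Real controllers learn
    # from the (design, score) history to propose better designs.
    return {"layers": random.randint(1, 8), "lr": 10 ** random.uniform(-4, -1)}

def train_and_validate(design):
    # Placeholder for the expensive step: train on the training data,
    # then measure accuracy on the validation data.
    return random.random()

history = []
for _ in range(20):
    design = controller_propose(history)
    score = train_and_validate(design)
    history.append((design, score))   # feedback to the controller

best_design, best_score = max(history, key=lambda t: t[1])
```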
{ Key Challenges }
1. Satisfy semantic constraints (e.g. data types)
2. Use validation feedback to improve model designs
3. Minimize number of models to train
4. Avoid local minima
5. Speed up model training
{ Neural Architecture Search }
1. Evolutionary algorithms
(ref: https://arxiv.org/abs/1703.01041)
2. Greedy search
(ref: https://arxiv.org/abs/1712.00559)
3. Reinforcement learning
(ref: https://arxiv.org/abs/1611.01578)
4. Speed up model training
(ref: https://arxiv.org/abs/1802.03268)
Greedy Search
{ Target Scenarios }
1. Image classification (on CIFAR-10 & ImageNet)
2. Using only Convolution & Pooling layers
3. This is what powers Google AutoML
{ Constraints }
1. Predefined architectures
2. N=2
3. # of filters decided by heuristics
4. NAS to find the optimal cell structure
{ Basic constructs }
Each construct has:
1. Two inputs
2. Two operators, one applied to each input
3. One combined output
[Diagram: Input 1 → Operator 1, Input 2 → Operator 2 → combined output]
{ Predefined Operators }
Why these, and only these?
1. 3×3 convolution
2. 5×5 convolution
3. 7×7 convolution
4. Identity (pass-through)
5. 3×3 average pooling
6. 3×3 max pooling
7. 3×3 dilated convolution
8. 1×7 followed by 7×1 convolution
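As an illustrative sketch (names are not from any paper's code), a basic construct with two inputs, two of the predefined operators, and one combined output can be represented as:

```python
from dataclasses import dataclass

# The 8 predefined operators listed on the slide above.
OPERATORS = (
    "3x3 conv", "5x5 conv", "7x7 conv", "identity",
    "3x3 avg pool", "3x3 max pool", "3x3 dilated conv",
    "1x7 then 7x1 conv",
)

@dataclass(frozen=True)
class Construct:
    """Two inputs, each processed by one operator; the two operator
    outputs are combined (e.g. added element-wise) into one output."""
    input1: int  # index of the feature map feeding operator 1
    input2: int  # index of the feature map feeding operator 2
    op1: str
    op2: str

block = Construct(input1=0, input2=1, op1="3x3 conv", op2="identity")
```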
{ Cells }
1. Stacking up to 5 basic constructs
2. About 5.6×10^14 cell candidates
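The roughly 5.6×10^14 figure can be reproduced from the search-space combinatorics, assuming construct b draws each of its two inputs from b + 1 candidates (the two cell-level inputs plus the b − 1 earlier constructs) and applies one of the 8 predefined operators to each:

```python
# Rough count of cell candidates: at construct b there are (b + 1)
# input choices per operand and 8 operator choices per operand.
NUM_OPERATORS = 8

total = 1
for b in range(1, 6):                    # up to 5 constructs per cell
    input_choices = b + 1
    total *= (input_choices * NUM_OPERATORS) ** 2

print(f"{total:.1e}")                    # ≈ 5.6e14 cell candidates
```

Note the b = 1 term alone is (2 × 8)² = 256, matching the "256 possibilities" for a single construct on the next slide.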
{ Greedy Search }
1. Start with a single construct (m=1)
2. There are 256 possibilities
3. Add one more construct
4. Pick the best K (K=256) cells to train
5. Repeat steps 3 and 4 until there are 5 constructs in the cell
6. 1028 models to be trained
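The greedy loop above can be sketched as follows. `train_and_score` stands in for the expensive training-plus-validation step, and the LSTM surrogate predictor (next slide) is omitted for brevity; all names are illustrative.

```python
import itertools
import random

NUM_OPERATORS = 8
K = 256  # beam width: cells kept at each step

def expand(cell, num_constructs):
    """All one-construct extensions of a partial cell. At construct m
    each operand may come from the two cell inputs or the m - 1 earlier
    constructs, i.e. m + 1 choices per operand."""
    inputs = range(num_constructs + 1)
    for i1, i2 in itertools.product(inputs, repeat=2):
        for o1, o2 in itertools.product(range(NUM_OPERATORS), repeat=2):
            yield cell + ((i1, i2, o1, o2),)

def train_and_score(cell):
    return random.random()  # placeholder for real training + validation

beam = [()]                 # start from the empty cell
for m in range(1, 6):       # grow to 5 constructs
    candidates = [c for cell in beam for c in expand(cell, m)]
    beam = sorted(candidates, key=train_and_score, reverse=True)[:K]
```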
{ Pick the best cells }
1. Cells as a sequence of choices
2. LSTM to estimate model accuracy
3. Training data are from trained models (up to 1024 examples)
4. 99.03% accuracy at m=2
5. 99.52% at m=5
[Diagram: LSTM over the cell's token sequence (Input 1, Input 2, Operator 1, Operator 2), followed by a Dense layer that predicts accuracy]
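A toy version of the LSTM accuracy estimator: the cell is serialized as the token sequence (input 1, input 2, operator 1, operator 2) per construct, run through one LSTM, and a dense sigmoid head outputs a predicted accuracy. Weights here are random; in the approach above they would be fit to the accuracies of the already-trained cells. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 16, 32          # token vocabulary and LSTM state size

Wx = rng.normal(scale=0.1, size=(4 * HIDDEN, VOCAB))
Wh = rng.normal(scale=0.1, size=(4 * HIDDEN, HIDDEN))
b = np.zeros(4 * HIDDEN)
w_out = rng.normal(scale=0.1, size=HIDDEN)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_accuracy(tokens):
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for t in tokens:                      # one LSTM step per token
        x = np.eye(VOCAB)[t]              # one-hot embedding
        z = Wx @ x + Wh @ h + b
        i, f, g, o = np.split(z, 4)       # input/forget/cell/output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return sigmoid(w_out @ h)             # dense head: accuracy in (0, 1)

# e.g. a one-construct cell: inputs 0 and 1, operators 2 and 5
score = predict_accuracy([0, 1, 2, 5])
```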
{ Summary }
1. Fewer models to train
○ Remarkable improvement over evolutionary algorithms
2. Search from simple to complex models
3. Heavy use of domain knowledge and heuristics
4. Suboptimal results due to greedy search
5. Can’t generalize to other problems
Reinforcement Learning
{ Why RL? }
1. RL is a generative model
2. RL assumes less domain knowledge about the problem
3. Trained model accuracy is used as the reward
{ RNN Controller }
1. Autoregressive RNN
2. Outputs can describe any architecture
3. Supports non-linear architectures using skip connections
{ Skip Connections }
{ Stochastic Sampling }
For example:
1. Filter size has 4 choices: 24, 36, 48, 64
2. For each convolution layer, the RNN outputs a distribution:
○ (60%, 20%, 10%, 10%)
○ With 60% chance, the filter size will be 24
3. This helps collect data to correct the controller’s mistakes
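The sampling step can be sketched as follows: instead of always taking the argmax, the controller samples from its output distribution, so sub-optimal choices are still explored occasionally and their outcomes feed back as training data.

```python
import random

# Controller output for one convolution layer, as on the slide.
FILTER_SIZES = [24, 36, 48, 64]
probs = [0.60, 0.20, 0.10, 0.10]

# Sample a filter size: 24 is chosen about 60% of the time.
filter_size = random.choices(FILTER_SIZES, weights=probs, k=1)[0]
```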
{ Training RNN Controller }
1. Use REINFORCE to update controller parameters
○ Binary rewards (0/1)
○ Trained model accuracy is the probability of the reward being 1
○ Apply cross entropy to the RNN outputs
2. Designs with higher accuracy are assigned higher probability
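A minimal sketch of the REINFORCE update for a single softmax decision over 4 hypothetical designs. A real controller chains many such decisions through an RNN, but the per-step update rule is the same; the running-mean baseline is a common variance-reduction addition, not stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(4)
lr = 0.1
baseline = 0.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical validation accuracies of the 4 designs
accuracies = np.array([0.60, 0.90, 0.70, 0.50])

for _ in range(3000):
    p = softmax(logits)
    a = rng.choice(4, p=p)                        # sample a design
    reward = float(rng.random() < accuracies[a])  # binary; P(reward=1) = accuracy
    grad_logp = -p
    grad_logp[a] += 1.0                           # grad of log p(a) for a softmax
    logits += lr * (reward - baseline) * grad_logp
    baseline += 0.01 * (reward - baseline)        # running-mean baseline

best = int(np.argmax(softmax(logits)))            # should converge to design 1
```

Designs with higher accuracy earn reward 1 more often, so their log-probability is pushed up, exactly the "higher accuracy, higher probability" behaviour described above.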
{ Speed Up Model Training }
1. When the same layers appear across architectures
2. Share the same layer parameters
3. Alternate training between models
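A sketch of the sharing scheme, assuming layer weights are pooled by (layer position, operator) so that two sampled architectures agreeing at a position reuse, and alternately update, the same tensor. Names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
shared = {}  # (position, operator) -> weight tensor

def get_params(position, op, shape=(8, 8)):
    """Fetch (or lazily create) the shared weights for one layer slot."""
    key = (position, op)
    if key not in shared:
        shared[key] = rng.normal(scale=0.1, size=shape)
    return shared[key]

# Two sampled architectures that agree at layer 0
arch_a = [(0, "3x3 conv"), (1, "5x5 conv")]
arch_b = [(0, "3x3 conv"), (1, "3x3 max pool")]

params_a = [get_params(pos, op) for pos, op in arch_a]
params_b = [get_params(pos, op) for pos, op in arch_b]

# Both architectures fetch the very same weight tensor for layer 0:
same = params_a[0] is params_b[0]
```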
{ Summary }
1. Better model accuracy
2. Can be made to work with complex architectures
3. Able to correct controller mistakes (e.g. bias)
4. Speeds up training when layers can be shared
○ From 40K down to 16 GPU hours
5. Designed for a specific type of problem
6. Still very expensive: typically 10K GPU hours
So, What is Next?
{ Challenges }
1. NAS algorithms are domain specific
2. Only neural networks are supported
3. Heavy use of human heuristics
4. Expensive (thousands of GPU hours)
5. Cold start problem: NAS has no prior knowledge about data
{ Our Answer }
[Diagram: same loop as before (Controller → Model Designs → Model Training → Model Validation), except the Controller also reads the Training Data directly]
{ Generalized Architecture Search }
1. Accumulates domain knowledge over time
2. Works with any algorithm (neural networks or not)
3. Automated feature engineering
4. Far fewer models to train
5. GAS powers OneClick.ai
Use AI to Build AI
1. Custom-built Deep Learning models for best performance
2. Model designs improved iteratively within a few hours
3. Better models in fewer attempts thanks to self-learned domain knowledge
Meta-learning evaluates millions of
deep learning models in the blink of
an eye. US patent pending
Versatile Applications
1. Data types: numeric, categorical, date/time, textual, images
2. Applications: regression, classification, time-series forecasting,
clustering, recommendations, vision
Powered by deep learning, we support
an unprecedented range of applications
and data types
Unparalleled Simplicity
1. Users need zero AI background
2. Simpler to use than Excel
3. Advanced functions available to experts via a chatbot
Thanks to a chatbot-based UX, we can
accommodate both newbie and expert
users
Use AI to Build AI
Sign up on http://oneclick.ai
ask@oneclick.ai
