Use AI to Build AI
The Evolution of AutoML
Ning Jiang
CTO, OneClick.ai
2018
Ning Jiang
Co-founder of OneClick.ai, the first automated
Deep Learning platform on the market.
Previously Dev Manager at Microsoft Bing, Ning
has over 15 years of R&D experience in AI for ads,
search, and cyber security.
So, Why AutoML?
{ Challenges in AI Applications }
1. Never enough experienced data scientists
2. Long development cycle (typically 3 to 6 months)
3. High risk of failure
4. Endless engineering traps in implementation and
maintenance
{ Coming Along With Deep Learning }
1. Few experienced data scientists and engineers
2. Increasing complexity in data (mixing images, text, and numbers)
3. Algorithms need to be customized
4. Increased design choices and hyper-parameters
5. Much harder to debug
What is AutoML?
{ AutoML }
[Diagram: Controller → Model Designs → Model Training (Training Data) → Model Validation (Validation Data), with validation results fed back to the Controller]
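The loop on this slide (a controller proposes designs, each design is trained and validated, and the validation score feeds back into the next proposal) can be sketched as follows. All names are illustrative; the controller here is plain random search, standing in for the learned controllers discussed later.

```python
import random

def controller_propose(history):
    # Stand-in controller: random search. Real controllers learn
    # from the (design, score) history to propose better designs.
    return {"layers": random.randint(1, 8), "lr": 10 ** random.uniform(-4, -1)}

def train_and_validate(design):
    # Placeholder for the expensive step: train on the training data,
    # then measure accuracy on the validation data.
    return random.random()

history = []
for _ in range(20):
    design = controller_propose(history)
    score = train_and_validate(design)
    history.append((design, score))   # feedback to the controller

best_design, best_score = max(history, key=lambda t: t[1])
```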
{ Key Challenges }
1. Satisfy semantic constraints (e.g. data types)
2. Use validation feedback to improve model designs
3. Minimize number of models to train
4. Avoid local minima
5. Speed up model training
{ Neural Architecture Search }
1. Evolutionary algorithms
(ref: https://arxiv.org/abs/1703.01041)
2. Greedy search
(ref: https://arxiv.org/abs/1712.00559)
3. Reinforcement learning
(ref: https://arxiv.org/abs/1611.01578)
4. Speed up model training
(ref: https://arxiv.org/abs/1802.03268)
Greedy Search
{ Target Scenarios }
1. Image classification (on CIFAR-10 & ImageNet)
2. Using only Convolution & Pooling layers
3. This is what powers Google AutoML
{ Constraints }
1. Predefined architectures
2. N=2
3. # of filters decided by heuristics
4. NAS to find the optimal cell structure
{ Basic constructs }
Each construct has:
1. Two inputs
2. Two operators, one applied to each input
3. One combined output
[Diagram: Input 1 → Operator 1, Input 2 → Operator 2 → combined output]
{ Predefined Operators }
Why these, and only these?
1. 3×3 convolution
2. 5×5 convolution
3. 7×7 convolution
4. Identity (pass-through)
5. 3×3 average pooling
6. 3×3 max pooling
7. 3×3 dilated convolution
8. 1×7 followed by 7×1 convolution
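As an illustrative sketch (names are not from any paper's code), a basic construct with two inputs, two of the predefined operators, and one combined output can be represented as:

```python
from dataclasses import dataclass

# The 8 predefined operators listed on the slide above.
OPERATORS = (
    "3x3 conv", "5x5 conv", "7x7 conv", "identity",
    "3x3 avg pool", "3x3 max pool", "3x3 dilated conv",
    "1x7 then 7x1 conv",
)

@dataclass(frozen=True)
class Construct:
    """Two inputs, each processed by one operator; the two operator
    outputs are combined (e.g. added element-wise) into one output."""
    input1: int  # index of the feature map feeding operator 1
    input2: int  # index of the feature map feeding operator 2
    op1: str
    op2: str

block = Construct(input1=0, input2=1, op1="3x3 conv", op2="identity")
```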
{ Cells }
1. Stacking up to 5 basic constructs
2. About 5.6×10^14 cell candidates
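The roughly 5.6×10^14 figure can be reproduced from the search-space combinatorics, assuming construct b draws each of its two inputs from b + 1 candidates (the two cell-level inputs plus the b − 1 earlier constructs) and applies one of the 8 predefined operators to each:

```python
# Rough count of cell candidates: at construct b there are (b + 1)
# input choices per operand and 8 operator choices per operand.
NUM_OPERATORS = 8

total = 1
for b in range(1, 6):                    # up to 5 constructs per cell
    input_choices = b + 1
    total *= (input_choices * NUM_OPERATORS) ** 2

print(f"{total:.1e}")                    # ≈ 5.6e14 cell candidates
```

Note the b = 1 term alone is (2 × 8)² = 256, matching the "256 possibilities" for a single construct on the next slide.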
{ Greedy Search }
1. Start with a single construct (m=1)
2. There are 256 possibilities
3. Add one more construct
4. Pick the best K (K=256) cells to train
5. Repeat steps 3 and 4 until there are 5 constructs in the cell
6. 1028 models to be trained
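The greedy loop above can be sketched as follows. `train_and_score` stands in for the expensive training-plus-validation step, and the LSTM surrogate predictor (next slide) is omitted for brevity; all names are illustrative.

```python
import itertools
import random

NUM_OPERATORS = 8
K = 256  # beam width: cells kept at each step

def expand(cell, num_constructs):
    """All one-construct extensions of a partial cell. At construct m
    each operand may come from the two cell inputs or the m - 1 earlier
    constructs, i.e. m + 1 choices per operand."""
    inputs = range(num_constructs + 1)
    for i1, i2 in itertools.product(inputs, repeat=2):
        for o1, o2 in itertools.product(range(NUM_OPERATORS), repeat=2):
            yield cell + ((i1, i2, o1, o2),)

def train_and_score(cell):
    return random.random()  # placeholder for real training + validation

beam = [()]                 # start from the empty cell
for m in range(1, 6):       # grow to 5 constructs
    candidates = [c for cell in beam for c in expand(cell, m)]
    beam = sorted(candidates, key=train_and_score, reverse=True)[:K]
```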
{ Pick the best cells }
1. Cells as a sequence of choices
2. LSTM to estimate model accuracy
3. Training data are from trained models (up to 1024 examples)
4. 99.03% accuracy at m=2
5. 99.52% at m=5
[Diagram: LSTM over the cell's token sequence (Input 1, Input 2, Operator 1, Operator 2), followed by a Dense layer that predicts accuracy]
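A toy version of the LSTM accuracy estimator: the cell is serialized as the token sequence (input 1, input 2, operator 1, operator 2) per construct, run through one LSTM, and a dense sigmoid head outputs a predicted accuracy. Weights here are random; in the approach above they would be fit to the accuracies of the already-trained cells. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 16, 32          # token vocabulary and LSTM state size

Wx = rng.normal(scale=0.1, size=(4 * HIDDEN, VOCAB))
Wh = rng.normal(scale=0.1, size=(4 * HIDDEN, HIDDEN))
b = np.zeros(4 * HIDDEN)
w_out = rng.normal(scale=0.1, size=HIDDEN)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_accuracy(tokens):
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for t in tokens:                      # one LSTM step per token
        x = np.eye(VOCAB)[t]              # one-hot embedding
        z = Wx @ x + Wh @ h + b
        i, f, g, o = np.split(z, 4)       # input/forget/cell/output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return sigmoid(w_out @ h)             # dense head: accuracy in (0, 1)

# e.g. a one-construct cell: inputs 0 and 1, operators 2 and 5
score = predict_accuracy([0, 1, 2, 5])
```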
{ Summary }
1. Fewer models to train
○ Remarkable improvement over evolutionary algorithms
2. Search from simple to complex models
3. Heavy use of domain knowledge and heuristics
4. Suboptimal results due to greedy search
5. Can’t generalize to other problems
Reinforcement Learning
{ Why RL? }
1. RL is a generative model
2. RL assumes less domain knowledge about the problem
3. Trained model accuracy is used as the reward
{ RNN Controller }
1. Autoregressive RNN
2. Outputs can describe any architecture
3. Supports non-linear architectures using skip connections
{ Skip Connections }
{ Stochastic Sampling }
For example:
1. Filter size has 4 choices: 24, 36, 48, 64
2. For each convolution layer, the RNN outputs a distribution:
○ (60%, 20%, 10%, 10%)
○ With 60% chance, the filter size will be 24
3. This helps collect data to correct the controller’s mistakes
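The sampling step can be sketched as follows: instead of always taking the argmax, the controller samples from its output distribution, so sub-optimal choices are still explored occasionally and their outcomes feed back as training data.

```python
import random

# Controller output for one convolution layer, as on the slide.
FILTER_SIZES = [24, 36, 48, 64]
probs = [0.60, 0.20, 0.10, 0.10]

# Sample a filter size: 24 is chosen about 60% of the time.
filter_size = random.choices(FILTER_SIZES, weights=probs, k=1)[0]
```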
{ Training RNN Controller }
1. Use REINFORCE to update controller parameters
○ Binary rewards (0/1)
○ Trained model accuracy is the probability of the reward being 1
○ Apply cross entropy to the RNN outputs
2. Designs with higher accuracy are assigned higher probability
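A minimal sketch of the REINFORCE update for a single softmax decision over 4 hypothetical designs. A real controller chains many such decisions through an RNN, but the per-step update rule is the same; the running-mean baseline is a common variance-reduction addition, not stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(4)
lr = 0.1
baseline = 0.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical validation accuracies of the 4 designs
accuracies = np.array([0.60, 0.90, 0.70, 0.50])

for _ in range(3000):
    p = softmax(logits)
    a = rng.choice(4, p=p)                        # sample a design
    reward = float(rng.random() < accuracies[a])  # binary; P(reward=1) = accuracy
    grad_logp = -p
    grad_logp[a] += 1.0                           # grad of log p(a) for a softmax
    logits += lr * (reward - baseline) * grad_logp
    baseline += 0.01 * (reward - baseline)        # running-mean baseline

best = int(np.argmax(softmax(logits)))            # should converge to design 1
```

Designs with higher accuracy earn reward 1 more often, so their log-probability is pushed up, exactly the "higher accuracy, higher probability" behaviour described above.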
{ Speed Up Model Training }
1. When the same layers appear across architectures
2. Share the same layer parameters
3. Alternate training between models
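A sketch of the sharing scheme, assuming layer weights are pooled by (layer position, operator) so that two sampled architectures agreeing at a position reuse, and alternately update, the same tensor. Names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
shared = {}  # (position, operator) -> weight tensor

def get_params(position, op, shape=(8, 8)):
    """Fetch (or lazily create) the shared weights for one layer slot."""
    key = (position, op)
    if key not in shared:
        shared[key] = rng.normal(scale=0.1, size=shape)
    return shared[key]

# Two sampled architectures that agree at layer 0
arch_a = [(0, "3x3 conv"), (1, "5x5 conv")]
arch_b = [(0, "3x3 conv"), (1, "3x3 max pool")]

params_a = [get_params(pos, op) for pos, op in arch_a]
params_b = [get_params(pos, op) for pos, op in arch_b]

# Both architectures fetch the very same weight tensor for layer 0:
same = params_a[0] is params_b[0]
```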
{ Summary }
1. Better model accuracy
2. Can be made to work with complex architectures
3. Able to correct controller mistakes (e.g. bias)
4. Speeds up training when layers can be shared
○ From 40K down to 16 GPU hours
5. Designed for a specific type of problem
6. Still very expensive: typically 10K GPU hours
So, What is Next?
{ Challenges }
1. NAS algorithms are domain specific
2. Only neural networks are supported
3. Heavy use of human heuristics
4. Expensive (thousands of GPU hours)
5. Cold start problem: NAS has no prior knowledge about data
{ Our Answer }
[Diagram: same loop as before (Controller → Model Designs → Model Training → Model Validation), except the Controller also reads the Training Data directly]
{ Generalized Architecture Search }
1. Accumulates domain knowledge over time
2. Works with any algorithm (neural networks or not)
3. Automated feature engineering
4. Far fewer models to train
5. GAS powers OneClick.ai
Use AI to Build AI
1. Custom-built Deep Learning models for best performance
2. Model designs improved iteratively within a few hours
3. Better models in fewer attempts thanks to self-learned domain knowledge
Meta-learning evaluates millions of
deep learning models in the blink of
an eye. US patent pending
Versatile Applications
1. Data types: numeric, categorical, date/time, textual, images
2. Applications: regression, classification, time-series forecasting,
clustering, recommendations, vision
Powered by deep learning, we support
an unprecedented range of applications
and data types
Unparalleled Simplicity
1. Users need zero AI background
2. Simpler to use than Excel
3. Advanced functions available to experts via a chatbot
Thanks to a chatbot-based UX, we can
accommodate both newbie and expert
users
Use AI to Build AI
Sign up on http://oneclick.ai
ask@oneclick.ai
