Highly-scalable Reinforcement Learning RLlib for Real-world Applications

@deanwampler
What We’ll Talk About:
● Why Ray?
● RLlib - Reinforcement Learning with Ray
● RLlib demo
● Adopting Ray and the Ray community

Two Major Trends
● Model sizes, and therefore compute requirements, are outstripping Moore’s Law: 35x every 18 months, versus 2x every 18 months for Moore’s Law.
● Python growth is driven by ML/AI and other data science workloads.
Hence, there is a pressing need for robust, easy-to-use solutions for distributed Python.
[Charts: compute requirements over time, 2013 to 2019 (CPU, GPU); Python usage (%) over time, 2012 to 2020]
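To make the gap between the two growth rates concrete, here is a small sketch that compounds each rate over the same span. Only the 2x and 35x per-18-month rates come from the slide; the function name `growth_factor` and the six-year horizon are illustrative assumptions.

```python
def growth_factor(rate_per_period: float, months: float,
                  period_months: float = 18.0) -> float:
    """Total multiplicative growth after `months`, compounding
    `rate_per_period` once every `period_months`."""
    return rate_per_period ** (months / period_months)


if __name__ == "__main__":
    months = 72  # six years = four 18-month periods (illustrative horizon)
    print("Moore's Law:   ", growth_factor(2, months))
    print("Model compute: ", growth_factor(35, months))
```

Over four periods, 2x per period compounds to 2^4 = 16x, while 35x per period compounds to 35^4 = 1,500,625x.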

The ML Landscape Today
● Featurization
● Streaming
● Hyperparam Tuning
● Training
● Simulation
● Model Serving
All require distributed implementations to scale

The Ray Vision: Sharing a Common Framework
● Domain-specific libraries for each subsystem:
○ Featurization
○ Streaming
○ Hyperparam Tuning
○ Training
○ Simulation
○ Model Serving
○ More libraries coming soon
● Framework for distributed Python (and other languages…)

Growing Adoption
● 205 contributors (228% more than last year)
● 10K stars (225% more than last year)
● Sold-out tutorials at O’Reilly AI
● Included in AWS SageMaker
See ray.io for details.

RLlib:
Reinforcement Learning with Ray
rllib.io

Reinforcement Learning - Ray RLlib
● Featurization
● Streaming
● Hyperparam Tuning
● Training
● Simulation
● Model Serving

Reinforcement Learning
agent → environment: Decisions (actions)
environment → agent: Consequences (observations, rewards)
RL applications:
● Industrial Processes
● System Optimization
● Advertising, Recommendations
● Finance
● Games
● Robotics, Autonomous Vehicles

Go as a Reinforcement Learning Problem
AlphaGo (Silver et al. 2016)
● Observations:
○ board state
● Actions:
○ where to place the stones
● Rewards:
○ 1 if win
○ 0 otherwise
agent → environment: Decisions (actions)
environment → agent: Consequences (observations, rewards)
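The agent/environment loop with a sparse win/lose reward can be sketched in a few lines. This is a toy illustration, not Go and not RLlib code: the environment `CoinGuessEnv`, the helper `run_episode`, and the example policy are all hypothetical names chosen for this sketch.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess a hidden bit within a few turns."""

    def __init__(self, max_turns=5, seed=0):
        self.rng = random.Random(seed)
        self.max_turns = max_turns
        self.secret = 0
        self.turn = 0

    def reset(self):
        self.secret = self.rng.randint(0, 1)
        self.turn = 0
        return self.turn  # observation: the current turn index

    def step(self, action):
        self.turn += 1
        won = action == self.secret
        done = won or self.turn >= self.max_turns
        reward = 1 if won else 0  # sparse reward: 1 if win, 0 otherwise
        return self.turn, reward, done


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    obs = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = policy(obs)                  # decision (action)
        obs, reward, done = env.step(action)  # consequences (observation, reward)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    # A policy that alternates its guess each turn always finds the bit.
    print(run_episode(CoinGuessEnv(), lambda obs: obs % 2))
```

The policy here is a plain function from observation to action; a learning agent would also use the rewards to improve that function over many episodes.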

RLlib: A Scalable, Unified Library for RL
● RL applications: Industrial Processes; System Optimization; Advertising, Recommendations; Finance; Games; Robotics, Autonomous Vehicles
● RL approaches: Single-Agent, Multi-Agent, Hierarchical, Offline Batch
● RLlib Training API
● Algorithms: PPO, IMPALA, QMIX, Custom Algorithms, ...
● Distributed Execution with Ray

Broad Range of Scalable Algorithms
● Gradient-free
○ Augmented Random Search (ARS)
○ Evolution Strategies
● Multi-agent specific
○ QMIX Monotonic Value Factorisation (QMIX, VDN, IQN)
● Offline
○ Monotonic Advantage Re-Weighted Imitation Learning (MARWIL)
● High-throughput architectures
○ Distributed Prioritized Experience Replay (Ape-X)
○ Importance Weighted Actor-Learner Architecture (IMPALA)
○ Asynchronous Proximal Policy Optimization (APPO)
● Gradient-based
○ Soft Actor-Critic (SAC)
○ Advantage Actor-Critic (A2C, A3C)
○ Deep Deterministic Policy Gradients (DDPG, TD3)
○ Deep Q Networks (DQN, Rainbow, Parametric DQN)
○ Policy Gradients
○ Proximal Policy Optimization (PPO)

Different RL Loop Decompositions Must Be Supported
Async DQN (Mnih et al., 2016)
● Components: several Actor-Learners and a Param Server
● Each Actor-Learner:
X <- rollout()
dθ <- grad(L, X)
sync(dθ)
Ape-X DQN (Horgan et al., 2018)
● Components: several Actors, a Replay buffer, and a Learner
● Each Actor:
θ <- sync()
rollout()
● Learner:
X <- replay()
apply(grad(L, X))
Different RL Loop Decompositions Must Be Supported
Async DQN (Mnih et al., 2016) and Ape-X DQN (Horgan et al., 2018) arrange their components differently (Actor-Learners with a Param Server; Actors with a Replay buffer and a Learner), but both are built from:
● Policy π_θ(o_t)
● Trajectory postprocessor ρ_θ(X)
● Loss L(θ, X)
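One way to picture these three pieces as a reusable interface is sketched below. This is a hypothetical interface written for illustration, not RLlib's actual classes: `PolicySketch`, `LinearValuePolicy`, the method names, and the toy loss are all assumptions.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Step = Tuple[float, int, float]  # (observation, action, reward or return)


class PolicySketch(ABC):
    """Hypothetical interface mirroring the three components on the slide."""

    @abstractmethod
    def compute_action(self, obs: float) -> int:
        """Policy: map an observation to an action."""

    @abstractmethod
    def postprocess(self, trajectory: List[Step]) -> List[Step]:
        """Trajectory postprocessor: transform collected experience."""

    @abstractmethod
    def loss(self, batch: List[Step]) -> float:
        """Loss: score the parameters on a postprocessed batch."""


class LinearValuePolicy(PolicySketch):
    """Toy instance with a single scalar parameter theta."""

    def __init__(self, theta: float = 1.0, gamma: float = 0.5):
        self.theta = theta
        self.gamma = gamma

    def compute_action(self, obs: float) -> int:
        return 1 if self.theta * obs > 0 else 0

    def postprocess(self, trajectory: List[Step]) -> List[Step]:
        # Replace each reward with the discounted return from that step on.
        processed = []
        running = 0.0
        for obs, action, reward in reversed(trajectory):
            running = reward + self.gamma * running
            processed.append((obs, action, running))
        processed.reverse()
        return processed

    def loss(self, batch: List[Step]) -> float:
        # Mean squared error between theta * obs and the discounted return.
        return sum((self.theta * obs - ret) ** 2 for obs, _, ret in batch) / len(batch)


if __name__ == "__main__":
    policy = LinearValuePolicy()
    batch = policy.postprocess([(1.0, 0, 1.0)] * 3)
    print(batch, policy.loss(batch))
```

Because the execution loop only calls these three methods, the same policy object could in principle be driven by either loop decomposition.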

So, We Need Abstractions for RL
Good abstractions decompose RL algorithms into reusable components.
Goals:
● Code reuse across deep learning frameworks
● Scalable execution of algorithms
● Easily implement, compare, and reproduce algorithms

General Purpose APIs Impose Requirements on Ray
● Training in Simulation
● Multi-Agent: several Actor Networks, each exchanging State and Action with the Environment
● Policy Serving: RLlib Policy Server
Ray was designed to be flexible for a wide range of compute and memory access patterns.

Hence, RLlib provides a unified framework for scalable RL that doesn’t compromise on performance
[Figure panels: Distributed PPO; Evolution Strategies; Ape-X Distributed DQN, DDPG]

Adopting Ray
and the Ray community

If you’re already using…
● asyncio
● joblib
● multiprocessing.Pool
…use Ray’s implementations:
● Drop-in replacements
● Change import statements
● Break the one-node limitation!
For example, from this:
from multiprocessing.pool import Pool
To this:
from ray.util.multiprocessing.pool import Pool
See these blog posts:
https://medium.com/distributed-computing-with-ray/how-to-scale-python-multiprocessing-to-a-cluster-with-one-line-of-code-d19f242f60ff
https://medium.com/distributed-computing-with-ray/easy-distributed-scikit-learn-training-with-ray-54ff8b643b33
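A minimal sketch of the import swap in context, written so the pool class is a parameter. The helper names `square` and `parallel_squares` are hypothetical. The standard-library import is active here; the Ray import from the slide is left as a comment because it requires Ray to be installed.

```python
from multiprocessing.pool import Pool
# To run the same code on Ray, swap the line above for:
# from ray.util.multiprocessing.pool import Pool


def square(x):
    """Work function; defined at module level so worker processes can import it."""
    return x * x


def parallel_squares(values, pool_factory=Pool, processes=2):
    """Map `square` over `values` using whichever Pool class is passed in."""
    with pool_factory(processes=processes) as pool:
        return pool.map(square, list(values))
```

With the standard library, call `parallel_squares(range(8))` from under an `if __name__ == "__main__":` guard so that worker processes can start cleanly on every platform. The rest of the code stays the same after the import swap, which is what makes this a drop-in replacement.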

Ray Community and Resources
● ray.io & rllib.io
● Tutorials: Anyscale Academy (coming soon)
● Need help?
○ Ray Slack: ray-distributed.slack.com
○ ray-dev group
