@deanwampler
What We’ll Talk About:
● Why Ray?
● RLlib - Reinforcement Learning with Ray
● RLlib demo
● Adopting Ray and the Ray community
Why Ray?
Two Major Trends
● Model sizes, and therefore compute requirements, are outstripping Moore’s Law: 35x every 18 months, versus 2x every 18 months for Moore’s Law
● Python growth is driven by ML/AI and other data science workloads
Hence, there is a pressing need for robust, easy-to-use solutions for distributed Python.
[Figure: compute growth, 2013 to 2019, with series labeled GPU and CPU]
[Figure: Python usage (%) over time, 2012 to 2020]
The ML Landscape Today
● Featurization
● Streaming
● Hyperparam Tuning
● Training
● Simulation
● Model Serving
All require distributed implementations to scale.
The Ray Vision: Sharing a Common Framework
[Diagram: Featurization, Streaming, Hyperparam Tuning, Training, Simulation, Model Serving]
● Framework for distributed Python (and other languages…)
● Domain-specific libraries for each subsystem
● More libraries coming soon
Growing Adoption
● 205 contributors (228% more than last year)
● 10K stars (225% more than last year)
● Sold-out tutorials at O’Reilly AI
● Included in AWS SageMaker
See ray.io for details.
RLlib: Reinforcement Learning with Ray
rllib.io
Reinforcement Learning - Ray RLlib
[Diagram: Featurization, Streaming, Hyperparam Tuning, Training, Simulation, Model Serving]
Reinforcement Learning
● agent → environment: decisions (actions)
● environment → agent: consequences (observations, rewards)
RL applications:
● Industrial Processes
● System Optimization
● Advertising, Recommendations
● Finance
● Games
● Robotics, Autonomous Vehicles
Go as a Reinforcement Learning Problem
AlphaGo (Silver et al. 2016)
● Observations:
○ board state
● Actions:
○ where to place the stones
● Rewards:
○ 1 if win
○ 0 otherwise
[Diagram: the agent sends decisions (actions) to the environment; the environment returns consequences (observations, rewards)]
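The observation/action/reward loop can be sketched in a few lines of plain Python. This is only an illustrative sketch, not AlphaGo or RLlib code: `LineWalkEnv`, `run_episode`, and `always_right` are hypothetical names, and the toy environment simply borrows the slide's sparse reward scheme (1 if win, 0 otherwise).

```python
class LineWalkEnv:
    """Hypothetical toy environment: walk from cell 0 to cell `goal`."""

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # observation (the analogue of the board state)

    def step(self, action):
        # action is -1 (left) or +1 (right); the position never goes below 0
        self.position = max(0, self.position + action)
        self.steps += 1
        won = self.position == self.goal
        done = won or self.steps >= self.max_steps
        reward = 1 if won else 0  # sparse reward: 1 if win, 0 otherwise
        return self.position, reward, done


def run_episode(env, policy):
    """Run one pass of the agent-environment loop; return the total reward."""
    observation = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = policy(observation)                  # decision (action)
        observation, reward, done = env.step(action)  # consequence
        total_reward += reward
    return total_reward


def always_right(observation):
    """A fixed policy that ignores the observation and always moves right."""
    return 1


if __name__ == "__main__":
    print(run_episode(LineWalkEnv(), always_right))
```

A learning agent would replace `always_right` with a policy whose parameters are updated from the rewards it collects.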
RLlib: A Scalable, Unified Library for RL
● RL applications: Industrial Processes; System Optimization; Advertising, Recommendations; Finance; Games; Robotics, Autonomous Vehicles
● RL approaches: Single-Agent, Multi-Agent, Hierarchical, Offline Batch
● RLlib Training API
● Algorithms: PPO, IMPALA, QMIX, Custom Algorithms, ...
● Distributed Execution with Ray
Broad Range of Scalable Algorithms
● High-throughput architectures
○ Distributed Prioritized Experience Replay (Ape-X)
○ Importance Weighted Actor-Learner Architecture (IMPALA)
○ Asynchronous Proximal Policy Optimization (APPO)
● Gradient-based
○ Soft Actor-Critic (SAC)
○ Advantage Actor-Critic (A2C, A3C)
○ Deep Deterministic Policy Gradients (DDPG, TD3)
○ Deep Q Networks (DQN, Rainbow, Parametric DQN)
○ Policy Gradients
○ Proximal Policy Optimization (PPO)
● Gradient-free
○ Augmented Random Search (ARS)
○ Evolution Strategies
● Multi-agent specific
○ QMIX Monotonic Value Factorisation (QMIX, VDN, IQN)
● Offline
○ Advantage Re-Weighted Imitation Learning (MARWIL)
RLlib Demo
Different RL Loop Decompositions Must Be Supported
Async DQN (Mnih et al., 2016)
● Components: several Actor-Learners and a Param Server
● Each Actor-Learner runs:
○ X <- rollout()
○ dθ <- grad(L, X)
○ sync(dθ)
Ape-X DQN (Horgan et al., 2018)
● Components: several Actors, a Replay, and a Learner
● Each Actor runs:
○ θ <- sync()
○ rollout()
● The Learner runs:
○ X <- replay()
○ apply(grad(L, X))
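The two loop structures can be compared with a small single-process simulation. This is a rough sketch under loud assumptions, not how either paper or RLlib implements them: the workers run sequentially rather than asynchronously, the replay is uniform and unbounded rather than prioritized, and the "environment" and loss are toy stand-ins (`rollout` draws numbers near 2.0, and the loss is L(θ, X) = mean((θ - x)²) / 2). All function names are hypothetical.

```python
import random


def rollout(rng, n=8):
    """Stand-in for environment interaction: returns a batch X of samples."""
    return [rng.gauss(2.0, 0.1) for _ in range(n)]


def grad(theta, batch):
    """Gradient of the toy loss L(theta, X) = mean((theta - x)^2) / 2."""
    return sum(theta - x for x in batch) / len(batch)


def async_dqn_style(num_workers=3, steps=200, lr=0.1, seed=0):
    """Each actor-learner rolls out, computes a gradient, and syncs it."""
    rng = random.Random(seed)
    theta = 0.0  # parameters held by the (simulated) parameter server
    for _ in range(steps):
        for _ in range(num_workers):
            batch = rollout(rng)          # X <- rollout()
            d_theta = grad(theta, batch)  # d_theta <- grad(L, X)
            theta -= lr * d_theta         # sync(d_theta): server applies it
    return theta


def apex_style(num_actors=3, steps=200, lr=0.1, batch_size=8, seed=0):
    """Actors only collect data; a single learner trains from the replay."""
    rng = random.Random(seed)
    theta = 0.0
    replay = []
    for _ in range(steps):
        for _ in range(num_actors):
            # theta <- sync(): real actors would pull the latest weights
            # here; the toy rollout does not depend on them.
            replay.extend(rollout(rng))          # rollout()
        batch = rng.sample(replay, batch_size)   # X <- replay()
        theta -= lr * grad(theta, batch)         # apply(grad(L, X))
    return theta


if __name__ == "__main__":
    print(async_dqn_style(), apex_style())
```

Both functions reuse the same `rollout` and `grad`; only the placement of gradient computation and data storage differs, which is the point of the comparison.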
Different RL Loop Decompositions Must Be Supported
Async DQN (Mnih et al., 2016) and Ape-X DQN (Horgan et al., 2018) are both built from:
● Policy π_θ(o_t)
● Trajectory postprocessor ρ_θ(X)
● Loss L(θ, X)
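As a minimal sketch of these three pieces, assuming a tabular softmax policy and a plain policy-gradient loss (neither is taken from the slides, and this is not RLlib's API), the components might look like this. The function names and the dictionary layout of `theta` are hypothetical; this postprocessor happens not to use θ, although one that estimates advantages from a learned value function would.

```python
import math


def policy(theta, observation):
    """pi_theta(o_t): action probabilities from a softmax over logits."""
    logits = theta[observation]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(trajectory, gamma=0.99):
    """rho(X): replace each step's reward with its discounted return."""
    discounted = 0.0
    processed = []
    for observation, action, reward in reversed(trajectory):
        discounted = reward + gamma * discounted
        processed.append((observation, action, discounted))
    processed.reverse()
    return processed


def loss(theta, batch):
    """L(theta, X): the mean of -log pi(a | o) * return over the batch."""
    total = 0.0
    for observation, action, ret in batch:
        total += -math.log(policy(theta, observation)[action]) * ret
    return total / len(batch)


if __name__ == "__main__":
    theta = {0: [0.0, 0.0]}                     # one observation, two actions
    trajectory = [(0, 1, 0.0), (0, 1, 1.0)]     # (observation, action, reward)
    batch = postprocess(trajectory, gamma=0.5)  # returns become 0.5 and 1.0
    print(loss(theta, batch))
```

Because the three functions only communicate through `theta` and the batch, the same definitions could in principle be driven by either loop decomposition.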
So, We Need Abstractions for RL
Good abstractions decompose RL algorithms into reusable components.
Goals:
● Code reuse across deep learning frameworks
● Scalable execution of algorithms
● Easily implement, compare, and reproduce algorithms
General Purpose APIs Impose Requirements on Ray
Ray was designed to be flexible for a wide range of compute and memory access patterns.
[Diagram panels: Policy Serving (RLlib Policy Server), Multi-Agent, Training in Simulation; labels: Actor Network, State, Action, Environment]
Hence, RLlib provides a unified framework for scalable RL that doesn’t compromise on performance
[Figure panels: Distributed PPO; Evolution Strategies; Ape-X Distributed DQN, DDPG]
Adopting Ray and the Ray community
If you’re already using…
● asyncio
● joblib
● multiprocessing.Pool
…then you can:
● Use Ray’s implementations
● Treat them as drop-in replacements
● Change only the import statements
● Break the one-node limitation!
For example, from this:
from multiprocessing.pool import Pool
To this:
from ray.util.multiprocessing.pool import Pool
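A minimal sketch of code where that one-line swap would apply is shown below. It uses the standard library import so that it runs without Ray installed; the commented-out line is the replacement import from the slide. The helpers `square` and `parallel_squares` are hypothetical names used only for illustration.

```python
from multiprocessing.pool import Pool
# To target a Ray cluster instead, the slide suggests swapping the import:
# from ray.util.multiprocessing.pool import Pool


def square(x):
    """Work function; it must be defined at module level so it can be pickled."""
    return x * x


def parallel_squares(n, processes=2):
    """Map `square` over range(n) using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(square, range(n))


if __name__ == "__main__":
    print(parallel_squares(5))
```

The rest of the program is unchanged by the swap, which is what makes the replacement "drop-in".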
See these blog posts:
https://medium.com/distributed-computing-with-ray/how-to-scale-python-multiprocessing-to-a-cluster-with-one-line-of-code-d19f242f60ff
https://medium.com/distributed-computing-with-ray/easy-distributed-scikit-learn-training-with-ray-54ff8b643b33
Ray Community and Resources
● ray.io & rllib.io
● Tutorials: Anyscale Academy (coming soon)
● Need help?
○ Ray Slack: ray-distributed.slack.com
○ ray-dev group

Highly-scalable Reinforcement Learning RLlib for Real-world Applications