DA6400:
Reinforcement Learning
E Slot
Tue 11:00–11:50, Wed 10:00–10:50, Fri 5:00–5:50
Tutorial: Thu 8:00–8:50
B. Ravindran
TAs: Returaj, Jash, Argha
Lecture 1:
Introduction to
Reinforcement Learning
B. Ravindran
Machine Learning
❏ Learn functions from inputs to outputs, given data
DA6400 Lecture 1 3
Reinforcement Learning
❏ Familiar models of machine learning
❏ Learning from data
❏ How did you learn to cycle?
❏ Trial and error!
❏ Falling down hurts!
❏ Evaluation, not instruction
❏ Reinforcement Learning
❏ Walk, Talk, etc.
Reinforcement Learning
❏ A trial-and-error learning paradigm
❏ Rewards and Punishments
❏ Learn about a system through interaction
❏ Inspired by behavioural psychology!
❏ Pavlov’s dog
Reinforcement Learning Works!
Tic-Tac-Toe
[Figure: a tree of Tic-Tac-Toe positions showing possible moves from the current board]
Supervised Learning
[Figure: Tic-Tac-Toe board positions presented as training inputs]
Supervised Learning
[Figure: Tic-Tac-Toe board positions labelled with Expert Moves as training targets]
Reinforcement Learning
❏ Learn from evaluation
❏ Win gives 1 point
❏ Loss gives -1 point
❏ Draw gives 0 points
❏ Learn from repeated play
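This evaluative setup can be sketched in a few lines. The following is a minimal illustration, not the lecture's exact method: a table of state values, nudged after each game toward the final reward, in the spirit of Sutton and Barto's opening Tic-Tac-Toe example (the names and step size are illustrative choices).

```python
import random

# Minimal sketch of learning from evaluation in Tic-Tac-Toe: keep a
# table of state values and nudge the value of every state visited in
# a game toward the final reward (+1 win, -1 loss, 0 draw).

ALPHA = 0.1          # step size for the value updates (arbitrary choice)
values = {}          # board state (tuple of 9 cells) -> estimated value

def update_from_game(states, outcome):
    """Back up the final evaluation through the states of one game."""
    target = outcome                   # +1 win, -1 loss, 0 draw
    for state in reversed(states):
        v = values.get(state, 0.0)
        v += ALPHA * (target - v)      # move the estimate toward the target
        values[state] = v
        target = v                     # earlier states chase later estimates

# "Learn from repeated play": here the games are faked with random
# outcomes purely to show values accumulating from evaluation alone.
start = (".",) * 9
for _ in range(1000):
    update_from_game([start], random.choice([1, -1, 0]))
```

Note that the update uses only the evaluation of the outcome, never an instruction about which move was correct.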
MENACE
❏ Machine Educable Noughts And Crosses Engine
❏ Michie, 1961
More Tic-Tac-Toe
[Figure: a set of similar Tic-Tac-Toe positions]
❏ Assume an imperfect opponent
❏ makes mistakes sometimes
Reinforcement Learning
❏ Simple rule to explain complex behaviors
❏ Intuition: the prediction of the outcome at time t+1 is better than the prediction at time t; hence, use the later prediction to adjust the earlier prediction
❏ Has also had profound impact in behavioral
psychology and neuroscience!
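The intuition above is the temporal-difference idea. A minimal TD(0) sketch on a toy problem (the 3-state chain and the step size below are made up purely for illustration):

```python
# Minimal TD(0) sketch: the prediction at t+1 is used to adjust the
# prediction at t. Toy 3-state chain A -> B -> end, with reward 1 on
# the final step.
ALPHA, GAMMA = 0.1, 1.0
V = {"A": 0.0, "B": 0.0, "end": 0.0}

def td0_update(s, r, s_next):
    # V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])

for _ in range(200):                # repeated episodes through the chain
    td0_update("A", 0.0, "B")       # no reward on the step A -> B
    td0_update("B", 1.0, "end")     # reward 1 on the terminal step

# Both predictions converge toward the true return of 1.
```

The prediction at B improves first (it directly sees the reward), and the prediction at A then chases it, exactly the "later prediction adjusts the earlier prediction" idea.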
TD in the Brain
Administrivia
❏ Textbook: Sutton, R. S., and Barto, A. G.
‘Reinforcement Learning: An Introduction’,
2nd Edition. MIT Press
❏ Get Moodle Access
❏ TAs: Returaj, Jash, Argha
Why RL?
❏ Complex Dynamics
❏ Helicopter control
Helicopter Control
[Link]
Why RL?
❏ Complex Dynamics
❏ Helicopter control
❏ Humanoid control
Humanoid Control
[Link]
Humanoid Control
Why RL?
❏ Complex Dynamics
❏ Helicopter control
❏ Humanoid control
❏ Complex Environments
Path Planning
❏ Learn a distance function [ICRA 2014]
Supply chain: Inventory management
❏ Hierarchical flow of products
❏ Challenges:
❏ Different time scales at different levels of the tree
❏ Capacity constraints
❏ Lead times
❏ Large number of products
❏ Selfish objectives
Green Security Games
❏ Subclass of Stackelberg Security Games used to model
strategic interactions between law enforcement agencies
(defenders) and their opponents (adversaries)
❏ Model repeated interactions [Fang, Stone, and Tambe 2015;
Fang et al. 2016; Xu et al. 2017]
❏ Defenders protect a finite set of targets (e.g., wildlife) with
limited resources
Poaching
❏ Focus on real-world scenarios
❏ Combination of allocation and patrolling
❏ MILP and LP (mixed-integer and linear programming) approaches do not scale well
❏ Use of Reinforcement Learning
CombSGPO
Cluttered Workspace
Why RL?
❏ Complex Dynamics
❏ Helicopter control
❏ Humanoid control
❏ Complex Environments
❏ Customization / Personalization
Customization
Ad Selection
Aligning LLM Responses with
Human Preferences
Prompt: Serendipity means the occurrence and development
of events by chance in a happy or beneficial way. Use the word
in a sentence.
❏ Serendipity is the ability to see something good in
something bad.
❏ Serendipity can be defined as the happy chance
occurrence of events leading to a beneficial outcome.
Both responses are technically valid completions, but which one
is better?
Aligning LLM Responses with
Human Preferences
❏ Thought experiment.
❏ Consider the prompt - “Write a strongly worded email
to your co-worker about their unfinished task”.
❏ Consider the following responses:
❏ The task is of extreme importance and I wish you took
its completion seriously.
❏ You are a terrible colleague and do not finish assigned
tasks.
❏ Which one would you prefer?
❏ Even though there are multiple correct answers, users
have a preference.
❏ How do we encode this preference in the LLM?
Aligning LLM Responses with
Human Preferences
Why RL?
❏ Complex Dynamics
❏ Helicopter control
❏ Humanoid control
❏ Complex Environments
❏ Customization / Personalization
❏ Going beyond human knowledge
❏ Learn through Self-Play
TD-Gammon
❏ TD-Gammon (Tesauro ’92, ’94, ’95)
❏ Human-level Backgammon player
❏ Beat the best human player in 1995
❏ Learnt completely by self-play
❏ Discovered new moves not recorded by humans in centuries of play
Game Playing – Arcade Games
❏ Learnt to play from video input!
❏ Learnt from scratch
❏ Used a complex neural network!
❏ Considered one of the hardest
learning problems solved by a
computer!
DQN - Breakout
AlphaGo
❏ Branching factor: ~250 for Go vs. ~35 for Chess
❏ AlphaGo defeated Lee Sedol, an 18-time world champion, 4–1 in 2016
AlphaGo - Move 37
Match #2 - Move 37
“I would be a bit thrown off by some unusual moves that
AlphaGo has played. … It’s playing moves that are definitely
not usual moves. They’re not moves that would have a high
percentage of moves in its database. So it’s coming up with
the moves on its own. … It’s a creative move”
Defense of the Ancients 2 (Dota 2)
❏ The AI bot won 1v1 matches against top players in the world at The International Dota 2 championship
❏ 100 different heroes, 100 different items, many different tactics: much more complex than board games
❏ Trained for 2 weeks using just self-play
❏ The full game is played 5v5 and requires multi-agent coordination; still being developed
AlphaZero
❏ A general AI agent, not limited to Go: superhuman performance at Chess, Shogi and Go
❏ No human data: trained from scratch with RL by playing against itself
❏ No human features: only the raw board positions are provided to the agent
❏ Simpler search: no randomized Monte Carlo rollouts; a neural network evaluates positions
❏ Beat AlphaGo Lee 100–0
❏ Beat Stockfish and Elmo at Chess and Shogi
Why RL?
❏ Complex Dynamics
❏ Helicopter control
❏ Humanoid control
❏ Complex Environments
❏ Customization / Personalization
❏ Going beyond human knowledge
❏ Learn through Self-Play
❏ Improving heuristics
Power Management
DeepMind AI reduces Google Data Centre cooling bill by 40%
AlphaTensor
DeepMind's AlphaTensor discovered new, more efficient matrix
multiplication algorithms, the first such breakthrough since Strassen's algorithm in 1969
Driver-Rider Matching
Lyft used RL to automate driver-rider matching, increasing the
revenue by $30 million per year
Other Applications
❏ Optimal Control
❏ Robot Navigation
❏ Chemical Plants
❏ Combinatorial Optimization
❏ Elevator Dispatching
❏ VLSI placement and routing
❏ Job-shop scheduling
❏ Routing algorithms
❏ Call admission control
❏ More
❏ Intelligent Tutoring Systems
❏ Computational Neuroscience
❏ Primary mechanism of learning
❏ Psychology
❏ Behavioral and operant conditioning
❏ Decision making
❏ Operations Research
❏ Approximate Dynamic Programming
❏ More
❏ Dialogue systems
What’s Next?
❏ Deep Reinforcement Learning has revived excitement in the
community
❏ But many fundamental questions still to be addressed
Administrivia
❏ Textbook: Sutton, R. S., and Barto, A. G.
‘Reinforcement Learning: An Introduction’,
2nd Edition. MIT Press
❏ Get Moodle Access
❏ TAs: Returaj, Jash, Argha