
CS 747, Autumn 2023: Lecture 3

Shivaram Kalyanakrishnan

Department of Computer Science and Engineering


Indian Institute of Technology Bombay

Autumn 2023


Shivaram Kalyanakrishnan (2023) CS 747, Autumn 2023 1 / 10


Multi-armed Bandits
The exploration-exploitation dilemma
Definitions: Bandit, Algorithm
ϵ-greedy algorithms
Evaluating algorithms: Regret
Achieving sub-linear regret
A lower bound on regret
UCB, KL-UCB algorithms
Thompson Sampling algorithm
Concentration bounds
Analysis of UCB
Understanding Thompson Sampling
Other bandit problems
A Lower Bound on Regret
Paraphrasing Lai and Robbins (1985; see Theorem 2).
Let L be an algorithm such that for every bandit instance I ∈ Ī and for every α > 0, as T → ∞: R_T(L, I) = o(T^α).

Then, for every bandit instance I ∈ Ī, as T → ∞:

R_T(L, I) / ln(T) ≥ Σ_{a : p_a(I) ≠ p⋆(I)} (p⋆(I) − p_a(I)) / KL(p_a(I), p⋆(I)),

where for x, y ∈ [0, 1), KL(x, y) := x ln(x/y) + (1 − x) ln((1 − x)/(1 − y)).
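The coefficient of ln(T) in this lower bound is easy to evaluate numerically for a concrete Bernoulli instance. A minimal sketch (the three arm means are hypothetical, chosen only for illustration):

```python
import math

def kl(x, y):
    # Bernoulli KL divergence KL(x, y), exactly as defined on the slide.
    out = 0.0
    if x > 0:
        out += x * math.log(x / y)
    if x < 1:
        out += (1 - x) * math.log((1 - x) / (1 - y))
    return out

def lai_robbins_constant(means):
    # Coefficient of ln(T) in the regret lower bound: sum over
    # suboptimal arms of (p* - p_a) / KL(p_a, p*).
    p_star = max(means)
    return sum((p_star - p) / kl(p, p_star) for p in means if p != p_star)

print(lai_robbins_constant([0.9, 0.8, 0.5]))
```

Arms closer to the optimal mean contribute more to the constant, since the gap shrinks linearly while the KL term shrinks quadratically.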


Multi-armed Bandits

1. UCB, KL-UCB algorithms

2. Thompson Sampling algorithm



Upper Confidence Bounds (UCB; Auer et al., 2002)
- At time t, for every arm a, define ucb_a^t = p̂_a^t + √(2 ln(t) / u_a^t).
- p̂_a^t is the empirical mean of rewards from arm a.
- u_a^t is the number of times a has been sampled at time t.
- Pull an arm a for which ucb_a^t is maximum.
- Achieves regret of O(log(T)): optimal dependence on T up to a constant factor.

[Figure: for each arm a, the upper confidence bound ucb_a^t lies above the empirical mean p̂_a^t.]
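The rule above can be sketched as a short simulation. This is a minimal illustration, not the lecture's code; the Bernoulli arm means [0.9, 0.5] and the horizon are hypothetical:

```python
import math
import random

def ucb_run(means, horizon, seed=0):
    # Simulate UCB (Auer et al., 2002) on a Bernoulli bandit with the
    # given true means; returns the number of pulls of each arm.
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n        # u_a^t: times arm a has been sampled
    successes = [0] * n    # total reward obtained from arm a
    for t in range(1, horizon + 1):
        if t <= n:
            a = t - 1      # pull each arm once to initialise the estimates
        else:
            # ucb_a^t = empirical mean + sqrt(2 ln(t) / u_a^t)
            a = max(range(n), key=lambda i: successes[i] / pulls[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]))
        successes[a] += 1 if rng.random() < means[a] else 0
        pulls[a] += 1
    return pulls

pulls = ucb_run([0.9, 0.5], horizon=2000)
```

Over a long horizon the suboptimal arm's pull count grows only logarithmically, which is where the O(log(T)) regret comes from.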
KL-UCB (Garivier and Cappé, 2011)
Identical to the UCB algorithm on the previous slide, except for a different definition of the upper confidence bound.

ucb-kl_a^t = max{q ∈ [p̂_a^t, 1] s.t. u_a^t · KL(p̂_a^t, q) ≤ ln(t) + c ln(ln(t))}, where c ≥ 3.
Equivalently, ucb-kl_a^t is the solution q ∈ [p̂_a^t, 1] to KL(p̂_a^t, q) = (ln(t) + c ln(ln(t))) / u_a^t.

KL-UCB algorithm: at step t, pull argmax_{a ∈ A} ucb-kl_a^t.

Observe that KL(p̂_a^t, q) monotonically increases with q, and
▶ KL(p̂_a^t, p̂_a^t) = 0;
▶ KL(p̂_a^t, 1) = ∞.
It is easy to compute ucb-kl_a^t numerically (for example, through binary search).

ucb-kl_a^t is a tighter confidence bound than ucb_a^t.

The regret of KL-UCB asymptotically matches Lai and Robbins' lower bound!
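The binary search suggested on the slide exploits exactly the monotonicity noted above: KL(p̂, q) increases from 0 toward ∞ as q grows from p̂ toward 1. A sketch (the inputs p_hat = 0.6, u = 50, t = 1000 are hypothetical):

```python
import math

def bern_kl(x, y):
    # Bernoulli KL(x, y), as defined earlier in the lecture; y is
    # clamped away from 0 and 1 to keep the logs finite.
    eps = 1e-12
    y = min(max(y, eps), 1 - eps)
    out = 0.0
    if x > 0:
        out += x * math.log(x / y)
    if x < 1:
        out += (1 - x) * math.log((1 - x) / (1 - y))
    return out

def ucb_kl(p_hat, u, t, c=3, tol=1e-6):
    # Largest q in [p_hat, 1] with u * KL(p_hat, q) <= ln(t) + c ln(ln(t)).
    # Since KL(p_hat, q) is increasing in q, binary search applies.
    # Assumes t >= 3 so that ln(ln(t)) is defined.
    rhs = (math.log(t) + c * math.log(math.log(t))) / u
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bern_kl(p_hat, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

q = ucb_kl(p_hat=0.6, u=50, t=1000)
```

Each halving of the interval costs one KL evaluation, so the bound is computed to tolerance tol in O(log(1/tol)) steps.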


Multi-armed Bandits

1. UCB, KL-UCB algorithms

2. Thompson Sampling algorithm



Background: Beta Distribution
Beta(α, β) is defined on [0, 1]. Two parameters: α and β.
Mean = α / (α + β); Variance = αβ / ((α + β)² (α + β + 1)).

[Figure: Beta pdfs for (α, β) = (1, 1), (3, 4), (5, 15) on [0, 1], alongside Gaussian pdfs for (µ, σ) = (0, 2), (0, 3), (5, 1) for comparison.]


Thompson Sampling (Thompson, 1933)
- At time t, let arm a have s_a^t successes (1's/heads) and f_a^t failures (0's/tails).
- Beta(s_a^t + 1, f_a^t + 1) represents a "belief" about the true mean of arm a.
- Mean = (s_a^t + 1) / (s_a^t + f_a^t + 2); variance = (s_a^t + 1)(f_a^t + 1) / ((s_a^t + f_a^t + 2)² (s_a^t + f_a^t + 3)).
- Computational step: for every arm a, draw a sample (in the agent's mind) x_a^t ∼ Beta(s_a^t + 1, f_a^t + 1).
- Sampling step: pull (in the real world) the arm a for which x_a^t is maximum.
- Achieves optimal regret (Kaufmann et al., 2012); is excellent in practice (Chapelle and Li, 2011).
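The two steps above can be sketched directly with the standard library's Beta sampler. A minimal illustration, not the lecture's code; the arm means [0.9, 0.5] and horizon are hypothetical:

```python
import random

def thompson_run(means, horizon, seed=0):
    # Simulate Thompson Sampling on a Bernoulli bandit with the given
    # true means; returns the number of pulls of each arm.
    rng = random.Random(seed)
    n = len(means)
    s = [0] * n        # s_a^t: successes observed on arm a
    f = [0] * n        # f_a^t: failures observed on arm a
    pulls = [0] * n
    for _ in range(horizon):
        # Computational step: sample x_a ~ Beta(s_a + 1, f_a + 1) per arm.
        x = [rng.betavariate(s[a] + 1, f[a] + 1) for a in range(n)]
        # Sampling step: pull the arm whose sample is largest.
        a = max(range(n), key=lambda i: x[i])
        if rng.random() < means[a]:
            s[a] += 1
        else:
            f[a] += 1
        pulls[a] += 1
    return pulls

pulls = thompson_run([0.9, 0.5], horizon=2000)
```

As evidence accumulates, each arm's posterior narrows; samples from suboptimal arms rarely exceed those from the best arm, so exploration tapers off automatically.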


Multi-armed Bandits
The exploration-exploitation dilemma
Definitions: Bandit, Algorithm
ϵ-greedy algorithms
Evaluating algorithms: Regret
Achieving sub-linear regret
A lower bound on regret
UCB, KL-UCB algorithms
Thompson Sampling algorithm
Concentration bounds
Analysis of UCB
Understanding Thompson Sampling
Other bandit problems
