CS 747, Autumn 2023 - Lecture 3
Shivaram Kalyanakrishnan
Autumn 2023
Concentration bounds
Analysis of UCB
where for $x, y \in [0, 1)$, $KL(x, y) \stackrel{\text{def}}{=} x \ln \frac{x}{y} + (1 - x) \ln \frac{1 - x}{1 - y}$.
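The definition of KL(x, y) above can be checked numerically with a minimal sketch. The function name kl_bernoulli is hypothetical, and the conventions for the boundary cases (0 ln 0 = 0, and an infinite value when y = 0 but x > 0) are assumptions that the slide does not state.

```python
import math


def kl_bernoulli(x: float, y: float) -> float:
    """KL(x, y) = x ln(x/y) + (1 - x) ln((1 - x)/(1 - y)) for x, y in [0, 1)."""
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        raise ValueError("x and y must lie in [0, 1)")
    if x == 0.0:
        first = 0.0  # assumed convention: 0 * ln(0 / y) = 0
    elif y == 0.0:
        return math.inf  # assumed convention: positive mass where y has none
    else:
        first = x * math.log(x / y)
    second = (1.0 - x) * math.log((1.0 - x) / (1.0 - y))
    return first + second
```

For instance, kl_bernoulli(0.5, 0.5) is zero because the two arguments coincide, and the value grows as y moves away from x in either direction.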
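Upper Confidence Bounds = UCB (Auer et al., 2002)
- At time t, for every arm a, define $\text{ucb}_a^t = \hat{p}_a^t + \sqrt{\frac{2 \ln(t)}{u_a^t}}$.
- $\hat{p}_a^t$ is the empirical mean of rewards from arm a.
- $u_a^t$ is the number of times a has been sampled at time t.
- Pull an arm a for which $\text{ucb}_a^t$ is maximum.
[Figure: for each arm a, $\hat{p}_a^t$ and $\text{ucb}_a^t$ marked on a reward axis R running from 0 to 1.]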
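The UCB rule above can be sketched as a short simulation. This is an illustration only: the names ucb_index and run_ucb, the Bernoulli reward model, the fixed seed, and the initial round-robin pass (used so that every arm has been sampled at least once before the index is evaluated) are assumptions, not details taken from the slide.

```python
import math
import random


def ucb_index(p_hat: float, pulls: int, t: int) -> float:
    """ucb_a^t = p_hat_a^t + sqrt(2 ln(t) / u_a^t)."""
    return p_hat + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb(means, horizon, seed=0):
    """Simulate UCB on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n      # u_a^t for each arm
    totals = [0.0] * n   # sum of rewards observed from each arm
    for t in range(horizon):
        if t < n:
            # Assumed initialisation: pull each arm once so that u_a^t > 0.
            arm = t
        else:
            arm = max(
                range(n),
                key=lambda a: ucb_index(totals[a] / pulls[a], pulls[a], t),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls
```

Calling run_ucb([0.1, 0.9], 2000) returns the per-arm pull counts. With a gap this large, the arm with the higher mean should receive the large majority of the pulls.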
KL-UCB (Garivier and Cappé, 2011)
Identical to the UCB algorithm on the previous slide, except for a different definition of the upper confidence bound.
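The definition of the KL-UCB bound itself does not survive in this text. As a hedged sketch, the code below uses the form given by Garivier and Cappé (2011): the largest q in [p̂, 1] such that u · KL(p̂, q) ≤ ln(t) + c ln(ln(t)). The names kl_bernoulli and kl_ucb_index, the default c = 3, the clipping constant, the clamping of the right-hand side at zero for small t, and the use of bisection are all assumptions made for illustration.

```python
import math


def kl_bernoulli(x: float, y: float) -> float:
    """KL(x, y) for Bernoulli means, clipped away from 0 and 1 (assumption)."""
    eps = 1e-12
    x = min(max(x, eps), 1.0 - eps)
    y = min(max(y, eps), 1.0 - eps)
    return x * math.log(x / y) + (1.0 - x) * math.log((1.0 - x) / (1.0 - y))


def kl_ucb_index(p_hat: float, pulls: int, t: int,
                 c: float = 3.0, iters: int = 50) -> float:
    """Approximate max q in [p_hat, 1] with pulls * KL(p_hat, q) <= budget."""
    if t > 1:
        budget = math.log(t) + c * math.log(math.log(t))
    else:
        budget = 0.0
    budget = max(budget, 0.0)  # assumed guard: ln(ln(t)) is negative for t = 2
    lo, hi = p_hat, 1.0
    # KL(p_hat, q) is increasing in q on [p_hat, 1), so bisection applies.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if pulls * kl_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

The returned value would replace ucb_a^t in the arm-selection step. Like the UCB index, it never falls below the empirical mean, and it tightens as the number of pulls grows.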
[Figure: Beta pdf(x) for x in [0, 1], with curves for (α = 1, β = 1), (α = 3, β = 4), and (α = 5, β = 15).]
[Figure: Beta pdf(x) for x in [0, 1], shown alongside a Gaussian pdf(x) for x in [-10, 10].]