Risk-Averse Multi-Armed Bandit Problems under Mean-Variance Measure

Vakili, Sattar; Zhao, Qing

doi:10.1109/JSTSP.2016.2592622

arXiv:1604.05257 (cs)

[Submitted on 18 Apr 2016 (v1), last revised 14 Aug 2017 (this version, v3)]

Abstract: Multi-armed bandit (MAB) problems have been studied mainly under the measure of expected total reward accrued over a horizon of length $T$. In this paper, we address the issue of risk in MAB problems and develop parallel results under the measure of mean-variance, a commonly adopted risk measure in economics and mathematical finance. We show that the model-specific regret and the model-independent regret, both defined in terms of the mean-variance of the reward process, are lower bounded by $\Omega(\log T)$ and $\Omega(T^{2/3})$, respectively. We then show that variations of the upper confidence bound (UCB) policy and the deterministic sequencing of exploration and exploitation (DSEE) policy, both developed for the classic risk-neutral MAB, achieve these lower bounds.
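To make the risk-averse objective concrete, the sketch below assumes the common definition of mean-variance as variance minus a risk-tolerance factor times the mean, $\sigma^2 - \rho\mu$ (smaller is better), which the abstract itself does not spell out. It runs a confidence-bound index policy in the spirit of UCB on that quantity. The function names, the constants `rho` and `b`, and the two Gaussian arms are hypothetical choices for illustration, not the paper's exact policy or parameters.

```python
import math
import random


def empirical_mean_variance(samples, rho):
    """Empirical mean-variance: sample variance minus rho times sample mean.

    Assumed definition (not stated in the abstract); smaller values are better.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return var - rho * mean


def mv_index_policy(arms, horizon, rho=1.0, b=1.0, seed=0):
    """Pull each arm once, then always pull the arm with the smallest
    lower-confidence index on its empirical mean-variance.

    `b` scales the exploration bonus; its value here is a placeholder.
    """
    rng = random.Random(seed)
    history = [[arm(rng)] for arm in arms]  # one initial pull per arm
    for t in range(len(arms) + 1, horizon + 1):
        indices = [
            empirical_mean_variance(h, rho) - b * math.sqrt(math.log(t) / len(h))
            for h in history
        ]
        chosen = indices.index(min(indices))
        history[chosen].append(arms[chosen](rng))
    return history


# Two hypothetical Gaussian arms: the first has the higher mean but is far
# riskier, so under rho = 1 the second has the smaller mean-variance.
arms = [
    lambda rng: rng.gauss(1.0, 2.0),  # true mean-variance: 4.0 - 1.0 = 3.0
    lambda rng: rng.gauss(0.8, 0.5),  # true mean-variance: 0.25 - 0.8 = -0.55
]
history = mv_index_policy(arms, horizon=2000, rho=1.0)
pull_counts = [len(h) for h in history]
print(pull_counts)
```

A risk-neutral policy would favor the first arm for its higher mean; under the mean-variance measure with `rho = 1.0`, the second arm is the better one, and the index policy should concentrate its pulls there. Raising `rho` shifts the balance back toward mean reward.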

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1604.05257 [cs.LG]
	(or arXiv:1604.05257v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1604.05257
Related DOI:	https://doi.org/10.1109/JSTSP.2016.2592622

Submission history

From: Sattar Vakili
[v1] Mon, 18 Apr 2016 17:28:41 UTC (232 KB)
[v2] Wed, 27 Jul 2016 11:35:36 UTC (572 KB)
[v3] Mon, 14 Aug 2017 21:10:14 UTC (622 KB)
