
Distributed Strategic Learning for Wireless Engineers

Item Type: Book
Authors: Tembine, Hamidou
Citation: H. Tembine, Distributed Strategic Learning for Wireless Engineers, CRC Press/Taylor & Francis, 496 pages, May 2012
DOI: 10.1201/b11896
Publisher: Informa UK Limited
Link to Item: http://hdl.handle.net/10754/292325


Distributed Strategic Learning
for Wireless Engineers

Hamidou Tembine


MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2012 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20120330

International Standard Book Number-13: 978-1-4398-7644-2 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com


Dedicated to Bourere Siguipily



Contents

List of Figures
List of Tables
Foreword
Preface
The Author Bio
Contributors

1 Introduction to Learning in Games
  1.1 Basic Elements of Games
    1.1.1 Basic Components of One-Shot Game
    1.1.4 State-Dependent One-Shot Game
      1.1.4.1 Perfectly-Known State One-Shot Games
      1.1.4.2 One-Shot Games with Partially-Known State
      1.1.4.3 State Component is Unknown
      1.1.4.4 Only the State Space Is Known
    1.1.5 Perfectly Known State Dynamic Game
    1.1.6 Unknown State Dynamic Games
    1.1.7 State-Dependent Equilibrium
    1.1.8 Random Matrix Games
    1.1.9 Dynamic Robust Game
  1.2 Robust Games in Networks
  1.3 Basic Robust Games
  1.4 Basics of Robust Cooperative Games
    1.4.0.1 Preliminaries
    1.4.0.4 Cooperative Solution Concepts
  1.5 Distributed Strategic Learning
    1.5.1 Convergence Issue
    1.5.2 Selection Issue
      1.5.2.1 How to Select an Efficient Outcome?
      1.5.2.2 How to Select a Stable Outcome?
  1.6 Distributed Strategic Learning in Wireless Networks
    1.6.1 Physical Layer
    1.6.2 MAC Layer
    1.6.3 Network Layer
    1.6.4 Transport Layer
    1.6.5 Application Layer
    1.6.6 Compressed Sensing

2 Strategy Learning
  2.1 Introduction
  2.2 Strategy Learning under Perfect Action Monitoring
    2.2.1 Fictitious Play-Based Algorithms
    2.2.2 Best Response-Based Learning Algorithms
    2.2.5 Better Reply-Based Learning Algorithms
    2.2.6 Fixed Point Iterations
    2.2.7 Cost-To-Learn
    2.2.8 Learning Bargaining Solutions
    2.2.9 Learning and Conjectural Variations
    2.2.10 Bayesian Learning in Games
    2.2.11 Non-Bayesian Learning
  2.3 Fully Distributed Strategy-Learning
    2.3.1 Learning by Experimentation
    2.3.2 Reinforcement Learning
    2.3.3 Learning Correlated Equilibria
    2.3.4 Boltzmann-Gibbs Learning Algorithms
    2.3.5 Hybrid Learning Scheme
    2.3.6 Fast Convergence of Evolutionary Dynamics
    2.3.7 Convergence in Finite Number of Steps
    2.3.8 Convergence Time of Boltzmann-Gibbs Learning
    2.3.9 Learning Satisfactory Solutions
  2.4 Stochastic Approximations
  2.5 Chapter Review
  2.6 Discussions and Open Issues

3 Payoff Learning and Dynamics
  3.1 Introduction
  3.2 Learning Equilibrium Payoffs
  3.3 Payoff Dynamics
  3.4 Routing Games with Parallel Links
  3.5 Numerical Values of Payoffs Are Not Observed

4 Combined Learning
  4.1 Introduction
  4.2 Model and Notations
    4.2.1 Description of the Dynamic Game
    4.2.2 Combined Payoff and Strategy Learning
  4.3 Pseudo-Trajectory
    4.3.1 Convergence of the Payoff Reinforcement Learning
    4.3.2 Folk Theorem
    4.3.3 From Imitative Boltzmann-Gibbs CODIPAS-RL to Replicator Dynamics
  4.4 Hybrid and Combined Dynamics
    4.4.1 From Boltzmann-Gibbs-Based CODIPAS-RL to Composed Dynamics
    4.4.2 From Heterogeneous Learning to Novel Game Dynamics
    4.4.3 Aggregative Robust Games in Wireless Networks
      4.4.3.2 Power Allocation as Aggregative Robust Games
    4.4.4 Wireless MIMO Systems
      4.4.4.1 Learning the Outage Probability
      4.4.4.2 Learning the Ergodic Capacity
  4.5 Learning in Games with Continuous Action Spaces
    4.5.1 Stable Robust Games
    4.5.2 Stochastic-Gradient-Like CODIPAS
  4.6 CODIPAS for Stable Games with Continuous Action Spaces
    4.6.1 Algorithm to Solve Variational Inequality
    4.6.2 Convergence to Variational Inequality Solution
  4.7 CODIPAS-RL via Extremum-Seeking
  4.8 Designer and Users in an Hierarchical System
  4.9 From Fictitious Play with Inertia to CODIPAS-RL
  4.10 CODIPAS-RL with Random Number of Active Players
  4.11 CODIPAS for Multi-Armed Bandit Problems
  4.12 CODIPAS and Evolutionary Game Dynamics
    4.12.1 Discrete-Time Evolutionary Game Dynamics
    4.12.4 CODIPAS-Based Evolutionary Game Dynamics
  4.13 Fastest Learning Algorithms

5 Learning under Delayed Measurement
  5.1 Introduction
  5.2 Learning under Delayed Imperfect Payoffs
    5.2.1 CODIPAS-RL under Delayed Measurement
  5.3 Reacting to the Interference
    5.3.1 Robust PMAC Games
    5.3.2 Numerical Examples
      5.3.2.1 Two Receivers
      5.3.2.2 Three Receivers
    5.3.3 MIMO Interference Channel
      5.3.3.1 One-Shot MIMO Game
      5.3.4.1 MIMO Robust Game
      5.3.4.5 Without Perfect CSI

6 Learning in Constrained Robust Games
  6.1 Introduction
  6.2 Constrained One-Shot Games
    6.2.1 Orthogonal Constraints
    6.2.2 Coupled Constraints
  6.3 Quality of Experience
  6.4 Relevance in QoE and QoS Satisfaction
  6.5 Satisfaction Levels as Benchmarks
  6.6 Satisfactory Solution
  6.7 Efficient Satisfactory Solution
  6.8 Learning a Satisfactory Solution
    6.8.3 Minkowski-Sum of Feasible Sets
  6.9 From Nash Equilibrium to Satisfactory Solution
  6.10 Mixed and Near-Satisfactory Solution
  6.11 CODIPAS with Dynamic Satisfaction Level
  6.12 Random Matrix Games
    6.12.1 Random Matrix Games Overview
    6.12.2 Zero-Sum Random Matrix Games
    6.12.4 Nonzero-Sum Random Matrix Games
      6.12.5.1 Relevance in Networking and Communication
    6.12.7 Evolutionary Random Matrix Games
    6.12.8 Learning in Random Matrix Games
    6.12.9 Mean-Variance Response
    6.12.11 Satisfactory Solution
  6.13 Mean-Variance Response and Demand Satisfaction

7 Learning under Random Updates
  7.1 Introduction
  7.2 Description of the Random Update Model
    7.2.1 Description of the Dynamic Robust Game
  7.3 Fully Distributed Learning
    7.3.1 Distributed Strategy-Reinforcement Learning
    7.3.2 Random Number of Interacting Players
    7.3.3 CODIPAS-RL for Random Updates
    7.3.4 Learning Schemes Leading to Multi-Type Replicator Dynamics
    7.3.5 Heterogeneous Learning with Random Updates
    7.3.6 Constant Step-Size Random Updates
    7.3.7 Revision Protocols with Random Updates
  7.4 Dynamic Routing Games with Random Traffic
  7.5 Extensions
    7.5.1 Learning in Stochastic Games
      7.5.2.1 Nonconvergence of Fictitious Play
      7.5.2.3 Q-learning in Zero-Sum Stochastic Games
    7.5.3 Connection to Differential Dynamic Programming
    7.5.4 Learning in Robust Population Games
      7.5.4.1 Connection with Mean Field Game Dynamics
    7.5.5 Simulation of Population Games
  7.6 Mobility-Based Learning in Cognitive Radio Networks
    7.6.1 Proposed Cognitive Network Model
    7.6.2 Cognitive Radio Network Model
      7.6.2.1 Mobility of Users
    7.6.3 Power Consumption
    7.6.4 Virtual Received Power
    7.6.5 Scaled SINR
    7.6.6 Asymptotics
    7.6.8 Performance of a Generic User
      7.6.8.1 Access Probability
      7.6.8.3 Coverage Probability
  7.7 Hybrid Strategic Learning
    7.7.1 Learning in a Simple Dynamic Game
      7.7.1.1 Learning Patterns
      7.7.1.2 Description of CODIPAS Patterns
      7.7.1.3 Asymptotics of Pure Learning Schemes
      7.7.1.4 Asymptotics of Hybrid Learning Schemes
  7.8 Quiz
    7.8.1 What is Wrong in Learning in Games?
    7.8.2 Learning the Action Space
  7.9 Chapter Review

8 Fully Distributed Learning for Global Optima
  8.1 Introduction
  8.2 Resource Selection Games
  8.3 Frequency Selection Games
    8.3.1 Convergence to One of the Global Optima
    8.3.2 Symmetric Configuration and Evolutionarily Stable State
    8.3.3 Accelerating the Convergence Time
    8.3.4 Weighted Multiplicative Imitative CODIPAS-RL
    8.3.5 Three Players and Two Frequencies
      8.3.5.1 Global Optima
      8.3.5.2 Noisy Observation
    8.3.6 Similar Learning Rate
    8.3.7 Two Time-Scales
    8.3.8 Three Players and Three Frequencies
    8.3.9 Arbitrary Number of Users
      8.3.9.1 Global Optimization
      8.3.9.2 Equilibrium Analysis
      8.3.9.3 Fairness
  8.4 User-Centric Network Selection
    8.4.1 Architecture for 4G User-Centric Paradigm
    8.4.2 OPNET Simulation Setup
    8.4.3 Result Analysis
  8.5 Markov Chain Adjustment
    8.5.1 Transitions of the Markov Chains
    8.5.2 Selection of Efficient Outcomes
  8.6 Pareto Optimal Solutions
    8.6.1 Regular Perturbed Markov Process
    8.6.2 Stochastic Potential

9 Learning in Risk-Sensitive Games
  9.1 Introduction
    9.1.1 Risk-Sensitivity
    9.1.2 Risk-Sensitive Strategic Learning
    9.1.3 Single State Risk-Sensitive Game
    9.1.4 Risk-Sensitive Robust Games
    9.1.5 Risk-Sensitive Criterion in Wireless Networks
  9.2 Risk-Sensitivity in a Dynamic Environment
    9.2.1 Description of the Risk-Sensitive Dynamic Environment
    9.2.2 Description of the Risk-Sensitive Dynamic Game
      9.2.2.8 Two-by-Two Risk-Sensitive Games
      9.2.2.9 Type I
      9.2.2.10 Type II
  9.3 Risk-Sensitive CODIPAS
    9.3.1 Learning the Risk-Sensitive Payoff
    9.3.2 Risk-Sensitive CODIPAS Patterns
      9.3.2.1 Bush-Mosteller-Based RS-CODIPAS
      9.3.2.2 Boltzmann-Gibbs-Based RS-CODIPAS
      9.3.2.3 Imitative BG CODIPAS
      9.3.2.4 Multiplicative Weighted Imitative CODIPAS
      9.3.2.5 Weakened Fictitious Play-Based CODIPAS
      9.3.2.6 Risk-Sensitive Payoff Learning
    9.3.3 Risk-Sensitive Pure Learning Schemes
    9.3.4 Risk-Sensitive Hybrid Learning Scheme
    9.3.5 Convergence Results
      9.3.5.2 Convergence to Equilibria
      9.3.5.6 Convergence Time
      9.3.5.8 Explicit Solutions
      9.3.5.9 Composed Dynamics
      9.3.5.11 Non-Convergence to Unstable Rest Points
      9.3.5.13 Dulac Criterion for Convergence
  9.4 Risk-Sensitivity in Networking and Communications
  9.5 Risk-Sensitive Mean Field Learning
  9.6 Extensions
    9.6.1 Risk-Sensitive Correlated Equilibria
    9.6.2 Other Risk-Sensitive Formulations
    9.6.3 From Risk-Sensitive to Maximin Robust Games
    9.6.4 Mean-Variance Approach
  9.7 Chapter Review
    9.7.1 Summary
    9.7.2 Open Issues

A Appendix
  A.1 Basics of Dynamical Systems
  A.2 Basics of Stochastic Approximations
  A.3 Differential Inclusion
  A.4 Markovian Noise

Bibliography

Index


List of Figures

1.1 Strategic Learning.
1.2 A generic combined learning scheme.
2.1 Convergence of best-reply.
2.2 Nonconvergence of best-reply.
2.3 Nonconvergent aggregative game.
2.4 Design of step size.
2.5 Mann iteration. Design of step size.
2.6 Multiple access game between two mobiles.
2.7 Cognitive MAC Game.
2.8 Reduced Cognitive MAC Game.
2.9 A generic RL algorithm.
4.1 A generic CODIPAS-RL algorithm.
4.2 Mixed strategy of P1 under CODIPAS-RL BG.
4.3 Mixed strategy of P2 under CODIPAS-RL BG.
4.4 Probability to play the action 1 under Boltzmann-Gibbs CODIPAS-RL.
4.5 Average payoffs of action 1 under Boltzmann-Gibbs CODIPAS-RL.
4.6 Estimated payoffs for action 1 under Boltzmann-Gibbs CODIPAS-RL.
4.7 Probability to play the action 1 under Imitative CODIPAS-RL.
4.8 Estimated payoff for action 1 under Imitative CODIPAS-RL.
4.9 Two jammers and one regular node.
5.1 A delayed CODIPAS-RL algorithm.
5.2 Heterogeneous CODIPAS-RL: Convergence of the ODEs of strategies.
5.3 CODIPAS-RL: Convergence to global optimum equilibria.
5.4 CODIPAS-RL: Convergence of payoff estimations.
5.5 CODIPAS-RL under two-step delayed payoffs.
7.1 Large population of users.
7.2 Bad RSP: Zoom around the stationary point.
7.3 Mean field simulation of good RSP ternary plot and zoom.
7.4 Typical cognitive radio scenario under consideration.
7.5 A generic Brownian mobility.
7.6 Evolution of remaining energy.
8.1 Convergence to global optimum under imitation dynamics.
8.2 Vector field of imitation dynamics.
8.3 Vector field of replicator dynamics.
8.4 Strategies.
8.5 Estimations and average payoffs.
8.6 Three users and two choices.
8.7 Three users and two actions.
8.8 Impact of the initial condition.
8.9 IMS-based integration of operators with trusted third party.
8.10 OPNET simulation scenario.
8.11 The scenario.
8.12 Evolution of randomized actions for underloaded configuration.
8.13 Evolution of randomized actions for congested configuration.
8.14 Convergence to equilibrium.
8.15 Convergence to global optimum.
8.16 Evolution of randomized actions.
8.17 Evolution of estimated payoffs.
9.1 Global optima: μj < 0.
9.2 Convergence to global optima, μj < 0.
9.3 Convergence to global optimum (1, 0, 0), μi > 0.
9.4 Two risk-averse users and one risk-seeking user. μ1 < 0, μ2 < 0, μ3 > 0.
9.5 Imitative CODIPAS-RL. Impact of initial condition. μi = −0.01.
9.6 Imitative CODIPAS-RL, μj > 0. Impact of initial condition.
9.7 Imitative CODIPAS-RL: 3D plot.


List of Tables

1.1 2 × 2 expected robust game.
1.2 Robust game with dominant strategy.
1.3 Anti-coordination robust game.
2.1 Comparison of analytical model estimates and auditory judgments (MOS).
2.2 Comparative properties of the different learning schemes.
3.1 Information assumptions.
3.2 Routing versus game theoretic parameters.
4.1 Basic assumptions for CODIPAS.
4.2 CODIPAS: information and computation assumptions.
4.3 CODIPAS: learnable data.
5.1 Assumptions for games under channel uncertainty.
6.1 2 × 2 expected robust game.
8.1 Strategic form representation of 2 nodes and 2 technologies.
8.2 Strategic form representation for 3 users - 2 frequencies.
8.3 Frequency selection game: 3 players, 3 frequencies.
8.4 QoS parameters and ranges from the user payoff function.
9.1 Asymptotic pseudo-trajectories of pure learning.
9.2 Frequency selection games.
9.3 Frequency selection games: random activity.
9.4 Risk-sensitive frequency selection games.
9.5 Frequency selection games: random activity.
9.6 Summary.


Foreword

We live today in a truly interconnected world. Viewed as a network of decision-making agents, decisions taken and information generated in one part, or one node, rapidly propagate to other nodes and have an impact on the well-being (as captured by utilities) of the agents at those other nodes. Hence, it is not only the information flow that connects different agents (or players, in the parlance of game theory), but also the cross-impact of individual actions. Individual players therefore know that their performance will be affected by decisions taken by at least a subset of the other players, just as their decisions will affect others. To expect a collaborative effort toward picking the "best" decisions is generally unreasonable, for various reasons, among which are nonalignment of individual objectives, limits on communication, incompatibility of beliefs, and the lack of a mechanism to enforce a stable cooperative solution. Sometimes a player will not even know the objective or utility functions of other players, their motivations, and the possible cross-impacts of decisions.
How can one define an equilibrium solution concept that will accommodate the different elements of such an uncertain decision-making environment? How can such an equilibrium be reached when players operate under incomplete information? Can players learn, through an iterative process and with strategic plays, the equilibrium-relevant part of the game? Would such an iterative process converge, and to the desired equilibrium, when players learn at different rates, employ heterogeneous learning schemes, receive information at different rates, and adopt different attitudes toward risk (some being risk-neutral, others being risk-sensitive)?

The questions listed above all relate to issues that sit right at the heart of multi-agent networked systems research. And this comprehensive book meets the challenge of addressing them all, in the nine chapters to follow.

Professor Tamer Başar,


Urbana-Champaign,
Illinois, 11-11-11.



Preface

Preface to the book Distributed Strategic Learning for Wireless Engineers

Much of game theory has developed within the community of economists, starting from the book "Theory of Games and Economic Behavior" by von Neumann and Morgenstern (1944). To a lesser extent, it has had an impact on biology (with the development of evolutionary games) and on road traffic engineering (triggered by the concept of Wardrop equilibrium, introduced already in 1952, along with the Beckmann potential approach introduced in 1956). Since 1999, game theory has had a remarkable penetration into computer science with the formation of the community of algorithmic game theory.

I am convinced that game theory will play a much more central role in many fields in the future, including telecommunication network engineering. I use the term Network Engineering Games (NEGs) for games that arise within the latter context. NEG is the younger brother of algorithmic game theory. NEG is concerned with competition that arises at all levels of a network. This includes aspects related to information theory, to power control and energy management, to routing, and to the transport and application layers of communication networks. It also includes competition arising in the spread of information over a network, as well as issues related to the economy of networks. Finally, it includes security issues, denial-of-service attacks, the spread of viruses in computers, and measures to fight them.

This book is the first to consider a systematic analysis of games arising in all network layers and is thus an important contribution to NEGs.
The word "game" may have connotations of "toys" or of "playing" (as opposed to decision making). But in fact it stands for decision making by several decision makers, each having her (or his) own individual objectives. Is game theory a relevant tool for research in communication networks? On 20/12/2011, I searched Google Scholar for documents containing "wireless networks" together with "power control". 20,500 documents were found. Of these, 3,380 appeared in 2011, and 1,680 dated from 2000 or earlier. I then repeated the search, restricting further to documents also containing "game theory". 2,600 documents were found. Of these, 20 dated from before 2001 and 580 dated from the single year 2011. The share of documents containing "game theory" thus increased from 1.2% (20 out of 1,680) to 17% (580 out of 3,380) within ten years.

Is game theory relevant in wireless engineering?

A user who changes some protocols in his cellular telephone may find out that a more aggressive behavior is quite beneficial and makes it possible to obtain better performance. Yet if the whole population tried to act selfishly and use more aggressive protocols, then everyone might lose performance. But in practice we do not bother to change the protocols in our cellular phones. Making such changes would require access to the hardware, skills, and training, which is too much to invest. This may suggest that game theory should be used for other networking issues, perhaps at other scales (such as auctions over bandwidth, competition between service providers, etc.). So is there a need for NEG? Here are two different angles from which one can look at this problem. First, we made here the assumption that decisions are taken concerning how to use equipment. But we can instead consider the decisions as being which equipment to buy. The user's decisions concerning which protocol to use are taken when one purchases a telephone terminal. One prefers a telephone that is known to perform better. The game is then between equipment constructors. Secondly, not all decisions require special skills and know-how. The service providers and/or the equipment constructors can often gain considerably by delegating decisions to the users. For example, when you wish to connect to the Internet from your laptop, you often go to a menu that provides you with a list of available connections along with some of their properties. The equipment provider has decided to leave us, the users, this choice. It also decides what information to let us have when we take the decision.

Leaving the decisions to the users is beneficial for service providers because of scalability issues: decentralizing a network may reduce signaling, computations, and costs. When designing a network that relies on decisions taken by users, one needs to predict the users' behavior. Learning is part of their behavior. Much of the theory of learning in games has been developed by biologists who used mathematical tools to model learning and adaptation within competing species. In NEG, one need not be restricted to describing existing learning approaches; one can propose and design learning procedures.

Why learn to play an equilibrium?

Sometimes it is better not to learn. For example, assume that there are two players, one choosing x and the other choosing y, where both x and y lie in the half-open unit interval [0, 1). Assume that both have the same utility to maximize, which is given by r(x, y) = xy. It is easily seen that this game has a unique equilibrium: (0, 0). This is the worst possible choice for both players. Any values of x and y that are strictly different from the equilibrium values give both players a strictly better utility!
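To spell out the reasoning: for any fixed y > 0, the payoff r(x, y) = xy is strictly increasing in x, so its supremum over x ∈ [0, 1) equals y but is not attained, and no best response to y exists; against y = 0, every x yields payoff 0 and is therefore a best response. Hence (0, 0) is the only pair at which both players are best-responding, with equilibrium payoff r(0, 0) = 0, whereas any common choice x = y = 1 − δ with δ ∈ (0, 1) gives both players (1 − δ)² > 0.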
When a service provider delegates some decisions to the users, it can control which parameters to let them control and what information to let them have, so as to avoid such situations. Learning to play an equilibrium may then be in the interest of the players, and exploring learning algorithms enriches the tools available for designing networks.

This book is unique among the books on learning in game theory in focusing on problems relevant to games in wireless engineering. It is a masterpiece bringing the state-of-the-art foundations of learning in games to wireless.

Professor Eitan Altman


INRIA Sophia Antipolis
February 3rd, 2012




Strategic learning has made substantial progress since the early 1950s and has become a central element in economics, engineering, and computer science. One of the most significant accomplishments in strategic decision making during the last decades has been the development of game dynamics. Learning and dynamics are necessary when the problem to be solved involves uncertainty, is time-varying, and depends on the structure of the dynamic environment. This book develops distributed strategic learning schemes in games [15, 16, 17]. It offers several examples in networking, communications, economics, and evolutionary biology in which learning and dynamics play an important role in understanding the behavior of the system.
As a first example, consider a spectrum access problem where the secondary users can sense a subset of channels. If some channels are unused by the primary users at a given time slot, then the secondary users that sensed them can access the free channels. The problem is that, even under slotted time and frames, several secondary users can simultaneously sense the same channels at the same time. We can explicitly describe this problem in terms of the channel conditions, the throughput, the set of primary users, the set of malicious users, the set of altruistic users (relays), the set of secondary users, their arrival and departure rates, and their past activities, but we are unable to say how the secondary users should share a channel they sensed at the same time. Thus, it is useful to find a learning mechanism that allows an access allocation in the long run; a minimal sketch of such a mechanism is given below.
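To make this concrete, here is a minimal MATLAB sketch of such a long-run access-allocation mechanism (MATLAB being the software used for the book's examples). It is an illustration rather than the book's algorithm: the payoff model (1 for a collision-free access, 0 otherwise), the step size, and the horizon are assumptions made for this example.

% Illustrative sketch (not the book's algorithm): two secondary users
% learn channel-selection probabilities by reinforcement. Assumed payoff
% model: a user gets 1 if it is alone on its sensed channel, 0 otherwise.
nUsers = 2; nChannels = 2; T = 5000;
lambda = 0.01;                                   % learning rate (assumed)
x = ones(nUsers, nChannels) / nChannels;         % initial mixed strategies
for t = 1:T
    a = zeros(1, nUsers);
    for j = 1:nUsers
        a(j) = find(rand < cumsum(x(j, :)), 1);  % sample a channel
    end
    for j = 1:nUsers
        r = double(sum(a == a(j)) == 1);         % 1 iff no collision
        e = zeros(1, nChannels); e(a(j)) = 1;    % unit vector of the action
        x(j, :) = x(j, :) + lambda * r * (e - x(j, :)); % reinforce success
    end
end
disp(x)  % the two strategies typically concentrate on distinct channels

Each user updates from its own realized payoff only, without observing the other user's action: this is exactly the fully distributed setting studied throughout the book.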
As a second example, consider routing a packet over a wireless ad hoc network. The wireless path maximizing the quality of service with minimal end-to-end delay from a source to a destination changes continuously as the network traffic and the topology change. A learning-based routing protocol is therefore needed to estimate the network traffic and to predict the best stochastic path.
Already there are many successful applications of learning in networked games, but also in many other domains: robotics, machine learning, bioinformatics, economics, finance, cloud computing, network security and reliability, social networks, etc. A great many textbooks have been written about learning in dynamic game theory. Most of them adopt either an economic perspective or a mathematical perspective. In the past several years, though, the application of game theory to problems in networking and communication systems has become more important. Specifically, game-theoretic models have been developed to better understand flow control, congestion control, power control, admission control, access control, network security, quality of service, quality of experience management, and other issues in wireline and wireless systems. By modeling interdependent decision makers such as users, transmitters, radio devices, nodes, designers, operators, etc., game theory allows us to model scenarios in which there is no centralized entity with a full picture of the system conditions. It also allows teams, collaborations, and coalitional behaviors among the participants. The challenges in applying game theory to networking systems have attracted a lot of attention in the last decade.

Most of the game-theoretic models can abstract away important assumptions and mask critical unanswered questions. In the absence of observations of the actions of the other participants, and under an unknown dynamic environment, the prediction of the outcome is less clear. It is our hope that this book will illuminate both the promise of learning in dynamic games as a tool for analyzing network evolution and the potential pitfalls and difficulties likely to be encountered when game theory is applied by practicing engineers, undergraduate and graduate students, and researchers. We have not attempted to cover all of learning in games or all of its applications to networking and communications. We have severely restricted our exposition to those topics that we feel are necessary to give the reader a grounding in the fundamentals of learning in games under uncertainty (robust games) and their applications to networking and communications.
communications.
As most wireless networks are dynamic and evolve in time, we are seeing a tendency toward decentralized networks, in which each node may play multiple roles at different times without relying on an access point or a base station (small base station, femtocell BS, or macrocell BS) to make decisions such as in what frequency band to operate, how much power to use during a transmission frame, when to transmit, when to go into sleep mode, when to upgrade, etc. Examples include cognitive radio networks, opportunistic mobile ad hoc networks, and sensor networks that are autonomous and self-organizing and support multihop communications. These characteristics lead to the need for distributed decision making that potentially takes into account network conditions as well as channel conditions. In such distributed systems, an individual terminal may not have access to control information regarding other terminals' actions, and network congestion may occur. We address the following questions:

• Question One: How much information is enough for effective distributed decision making?

• Question Two: Is having more information always useful in terms of system performance (the value/price of information)?

• Question Three: What are the individual learning performance bounds under outdated and imperfect measurement?

• Question Four: What are the possible dynamics and outcomes if the players adopt different learning patterns?

• Question Five: If convergence occurs, what is the convergence time of heterogeneous learning (where at least two of the players use different learning patterns)?

• Question Six: What are the issues (solution concepts, non-convergence, convergence rate, convergence time, etc.) of hybrid learning (where at least one player changes its learning pattern during the interaction)?

• Question Seven: How can one develop very fast and efficient learning schemes in scenarios where some players have more information than others?

• Question Eight: What is the impact of risk-sensitivity in strategic learning systems?

• Question Nine: How do we construct learning schemes in a dynamic environment in which one of the players does not observe a numerical value of its own payoff but only a signal of it?

• Question Ten: How can one learn "unstable" equilibria and global optima in a fully distributed manner?

These questions are discussed throughout this book. There is an explicit description of how players attempt to learn over time about the game and about the behavior of others (e.g., through reinforcement, adaptation, imitation, belief updating, estimation, or a combination of these). The focus is on both finite and infinite systems, where the interplay among the individual adjustments undertaken by the different players generates different learning dynamics: heterogeneous learning, risk-sensitive learning, and hybrid dynamics.

How to use this book?

This guide is designed to assist instructors in helping students grasp the main ideas and concepts of distributed strategic learning. It can serve as the text for learning-algorithm courses with a variety of different goals, and for courses that are organized in a variety of different manners. The instructor's notes and supporting materials were developed for use in a course on distributed strategic learning with the following goals for students:

Students will be better able to think about iterative processes for engineering problems;

Students will be better able to make use of their algorithmic, graphing, and computational skills in real wireless networks based on data;

Students will be better able to independently read, study, and understand topics that are new to them, such as solution concepts in robust games;

Students will be better able to explain and describe the learning outcomes and notions orally, and to discuss both qualitative and quantitative topics with others.
We would like to make the following remarks. The investigations of the various solutions are almost independent of each other. For example, you may study strategy dynamics by reading Chapter 2 and payoff dynamics by reading Chapter 3. If you are interested only in risk-sensitive learning, you should read Chapter 9. Similar possibilities exist for random updates, heterogeneous learning, and hybrid learning (see the Table of Contents).

If you plan an introductory course on robust game theory, then you may use Chapter 1 for introducing robust games in strategic form. Chapters 2-8 may be used for a one-semester course on distributed strategic learning.

Each chapter contains some exercises. The reader is advised to solve at least those exercises that are used in the text to complete the proofs of various results.
This book can be used for a one-semester course by sampling from the chapters and possibly by discussing extra research papers; in that case, I hope that the references at the end of the book are useful. I welcome your feedback via email to tembineh(at)gmail.com. I very much enjoyed writing this book, and I hope you will enjoy reading it.

Notation and Terminology

The book comprises nine chapters and one appendix. Each chapter is divided into sections, and sections occasionally into subsections. Section 2.3, for example, refers to the third section of Chapter 2, while Subsection 2.3.1 is the first subsection of Section 2.3.


Items such as theorems, propositions, and lemmas are identified within each chapter according to the standard numbering; Equation (7.1), for example, would be the first equation of Chapter 7.

Organization of the book

The manuscript comprises nine chapters and an appendix.


• Chapter 1 introduces basic strategic decision making and robust games. State-dependent games with different levels of information are formulated, and the associated solution concepts are discussed. Distributed strategic learning approaches in the different layers of the open systems interconnection (OSI) model, including the physical (PHY), medium access control (MAC), network, transport, and application layers, are then presented.

• In Chapter 2, we overview classical distributed learning schemes. We start with partially distributed strategy-learning algorithms and their possible implementation in wireless networks. Generically, partially distributed learning schemes, sometimes called semi-distributed schemes, assume that each player knows its own payoff function and observes the others' actions in previous stages. This is clearly not the case in many networking and communication problems of interest. Under this strong assumption, several game-theoretic formulations are possible for uncertain situations. Then, the question of how to learn the system characteristics in the presence of incomplete information and imperfect measurements is addressed. Both convergence and nonconvergence results are provided. In the other chapters of this book, we develop a strategic learning framework by assuming that each player is able to learn progressively its own action space, knows its current action, and observes a numerical (possibly noisy) value of its (delayed) payoff; the mathematical structure of the payoff functions is unknown, as are the actions of the other players. This class of learning procedures is called fully distributed strategy-learning, or model-free strategy-learning, and is presented in Section 2.3.

• Chapter 3 focuses on payoff learning and dynamics. The goal of payoff learning is to learn the payoff functions, the expected payoffs, and the risk-sensitive payoffs. In many cases, the exact payoff functions may not be known by the players, who must try to learn the unknown data through long-run interactions. This chapter complements Chapter 2.

• Chapter 4 studies combined fully distributed payoff and strategy learning (CODIPAS). This core chapter examines how evolutionary game theory can be used as a framework to analyze multi-player reinforcement learning algorithms in a heterogeneous setting. In addition, equilibrium-seeking algorithms, learning in multi-armed bandit problems, and algorithms for solving variational inequalities are presented. CODIPAS combines strategy-learning and payoff-learning; a minimal sketch of such a combined update is given after this list.

• Chapter 5 examines combined learning under delayed and unknown payoffs. Based on outdated and noisy measurements, combined learning schemes that incorporate the delays, as well as schemes that avoid the delays, are investigated. Relevant applications to wireless networks are presented.

• Chapter 6 analyzes combined learning in constrained-like games. The core of the chapter comprises two parts. The first part introduces constrained games and the associated solution concepts; we then address the challenging question of how such a game can be played: how can players choose their actions in constrained games? The second part focuses on satisfactory solutions. Instead of a robust optimization framework, we propose a robust satisfaction theory, which is relevant to quality-of-experience (QoE, application layer) and quality-of-service (QoS, network layer) problems. The feasibility conditions as well as satisfactory solutions are investigated. The last part of the chapter is concerned with random matrix games (RMGs) with a variance criterion.

• Chapter 7 extends heterogeneous learning to hybrid learning. Uncertainty, random updates, and switching between different learning procedures are presented.

• Chapter 8 develops learning schemes for global optima. The chapter provides a specific class of games in which a global optimum can be found in a fully distributed manner. Selection from larger sets, such as the Pareto optimal solutions, is discussed. A detailed MATLAB code associated with the example of resource selection games is provided.

• Chapter 9 presents risk-sensitivity aspects of learning. The classical game-theoretic approach to modeling multi-player interaction assumes that players in a game want to maximize their expected payoff. But in many settings, players instead want to maximize some more complicated function of their payoff. The expected-payoff framework for games is obviously very general, but it does exclude the possibility that players have preferences that depend on the entire distribution of the payoff, and not just on its expectation. For example, if a player is sensitive to risk, her objective might balance the expectation of the payoff against its variance; indeed, this is the recommendation of modern portfolio theory, and a version of the mean-variance objective (written out after this list) is widely used by investors in financial markets as well as in network economics. The chapter also addresses the generalization of the familiar notions of Nash and correlated equilibria to settings where players are sensitive to risk. We especially examine the impact of risk-sensitivity on the outcome.

• Background material on dynamical systems and stochastic approximations is provided in the appendix.
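As mentioned in the description of Chapter 4 above, the following is a minimal MATLAB sketch of the general shape of a combined payoff-and-strategy update for a single player (MATLAB being the software used for the book's examples). Everything numerical in it is an assumption made for illustration: the three mean payoffs, the noise level, the step size lambda, and the temperature epsilon; the function handle payoff is a hypothetical stand-in for a measured, possibly noisy payoff.

% Illustrative sketch of combined payoff-and-strategy learning (one player).
% rhat holds a payoff estimate per action (payoff learning); the played
% strategy is the Boltzmann-Gibbs (softmax) map of these estimates
% (strategy learning).
nActions = 3; T = 2000;
lambda = 0.05; epsilon = 0.1;            % step size and temperature (assumed)
means = [0.2, 0.5, 0.8];                 % hypothetical mean payoffs per action
payoff = @(a) means(a) + 0.1*randn;      % assumed noisy payoff measurement
rhat = zeros(1, nActions);               % payoff estimates
for t = 1:T
    xt = exp(rhat / epsilon); xt = xt / sum(xt);        % Boltzmann-Gibbs strategy
    a = find(rand < cumsum(xt), 1);                     % sample an action
    rhat(a) = rhat(a) + lambda * (payoff(a) - rhat(a)); % payoff-learning step
end
disp(rhat)  % the estimates single out the action with the highest mean payoff

The player never sees the payoff function itself, only realized numerical values of its own payoff, which is the informational setting assumed in most of the book.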
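Also as noted in the description of Chapter 9, the mean-variance objective can be written, for a player j with a risk-aversion weight γj > 0 (the symbol γj is used here only for illustration and is not necessarily the book's notation), as

  max over xj of  E[ rj(xj, x−j) ] − γj Var[ rj(xj, x−j) ],

so that a risk-averse player accepts a lower expected payoff in exchange for a smaller payoff variance.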



Acknowledgments

I would like to thank everyone who made this book possible. I owe a special debt of gratitude to those colleagues who gave up their time to referee the chapters. I would like to thank Professors Eitan Altman, Anatoli Iouditski, and Sylvain Sorin, who initiated my interest in learning under uncertainty. The development of this book has spanned many years. The material, as well as its presentation, has benefited greatly from the input of many bright undergraduate and graduate students who worked on this topic. I would like to thank my colleagues from Ecole Polytechnique for their comments. It is a pleasure to thank my collaborators and coauthors of the articles and papers on which parts of the chapters of this book are based. They have played an important role in shaping my thinking for many years. Their direct and indirect contributions to this work are significant. They are, of course, not responsible for the way I have assembled the material, especially the parts I have added to and subtracted from our joint works to try to make the manuscript more coherent.
Special thanks to Professor Tamer Başar, who kindly accepted my invitation to write a foreword to the book, and to Professor Eitan Altman, who kindly accepted to write a preface. My thanks go to Professors Vivek Borkar, Mérouane Debbah, Samson Lasaulce, David Leslie, Galina Schwartz, Mihaela van der Schaar, Thanos Vasilakos, and Peyton H. Young for fruitful interactions and collaborations. I am grateful to the anonymous reviewers for assistance in proofreading the manuscript.
I thank seminar participants at the University of California at Los Angeles
(UCLA), Ecole Polytechnique, University of California at Berkeley, Univer-
sity of Avignon, the National Institute for Research in Computer Science
and Control (INRIA), University of Illinois at Urbana Champaign (UIUC),
McGill University, Ecole Polytechnique Fédérale de Lausanne (EPFL), Ecole
Supérieure d’Electricité (Supelec), University of California at Santa Cruz
(UCSC), etc.

Artwork

The scientific graphs in this book were generated using MATLAB software by MathWorks, Inc., and the mean field package for the simulation of large-population games. The two figures on the cover of the book are examples of cycling learning processes.



The Author Bio

Hamidou Tembine holds two master's degrees, one in applied mathematics and one in pure mathematics, from Ecole Polytechnique and University Joseph Fourier, France, respectively. He received a PhD degree from Avignon University, France. He is an assistant professor at Ecole Supérieure d'Electricité (Supelec), France. His current research interests include evolutionary games, mean field stochastic games, and their applications. He was the recipient of many student travel grant awards and best paper awards (ACM Valuetools 2007, IFIP Networking 2008, IEEE/ACM WiOpt 2009, IEEE Infocom Workshop 2011).



Contributors

Below is the list of contributors to the book's foreword and preface.

Eitan Altman, Research Director, Institut National de Recherche en Informatique et en Automatique (INRIA), Sophia Antipolis, France.

Tamer Başar, Professor, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Illinois, US.



Symbol Description

Rk : k-dimensional Euclidean space, k ≥ 2.
N : Set of players (finite or infinite).
B(t) : Random set of active players at time t.
Aj : Set of actions of player j.
sj ∈ Aj : An element of Aj.
Δ(Aj) : Set of probability distributions over Aj.
Xj : Mixed actions, Δ(Aj).
aj,t : Action of player j at time t; an element of Aj.
xj,t : Randomized action of player j at time t; an element of Xj.
rj,t : Perceived payoff of player j at time t.
r̂j,t : Estimated payoff vector of player j at time t; an element of R|Aj|.
β̃j,ε(r̂j,t) : Boltzmann-Gibbs strategy of player j; an element of Xj.
σ̃j,ε(r̂j,t) : Imitative Boltzmann-Gibbs strategy of player j; an element of Xj.
λj,t : Learning rate of player j at time t.
esj ∈ Xj : Unit vector with 1 at the position of sj and zeros elsewhere.
1l{.} : Indicator function.
l2 : Space of sequences {λt}t≥0 such that Σt |λt|² < +∞.
l1 : Space of sequences {λt}t≥0 such that Σt |λt| < +∞.
‖.‖2 : ‖x‖2 = (Σk |xk|²)^(1/2).
⟨., .⟩ : Inner product.
W : State space (environment state).
w ∈ W : A scalar, a vector, or a matrix (of finite dimension).
2D : The set of all subsets of D.
C0(A, B) : Space of continuous functions from A to B.
N : Set of natural numbers (non-negative integers).
Z : Set of integers.
Mt : Martingale.