Game-Theoretic Learning in Distributed Control
Jason R. Marden and Jeff S. Shamma
Abstract
In distributed architecture control problems, there is a collection of interconnected decision-making components that seek to realize desirable collective behaviors through local interactions and by processing local information.
Applications range from autonomous vehicles to energy to transportation. One
approach to control of such distributed architectures is to view the components
as players in a game. In this approach, two design considerations are the
components’ incentives and the rules that dictate how components react to the
decisions of other components. In game-theoretic language, the incentives are
defined through utility functions, and the reaction rules are online learning
dynamics. This chapter presents an overview of this approach, covering basic
concepts in game theory, special game classes, measures of distributed efficiency,
utility design, and online learning rules, all with the interpretation of using game
theory as a prescriptive paradigm for distributed control design.
Keywords
Learning in games • Evolutionary games • Multiagent systems • Distributed
decision systems
This work was supported by ONR Grant #N00014-17-1-2060 and NSF Grant #ECCS-1638214
and by funding from King Abdullah University of Science and Technology (KAUST).
J.R. Marden
Department of Electrical and Computer Engineering, University of California, Santa Barbara,
CA, USA
e-mail: [email protected]
J.S. Shamma
Computer, Electrical and Mathematical Science and Engineering Division (CEMSE), King
Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
e-mail: [email protected]
Contents
1 Introduction
2 Game-Theoretic Distributed Resource Utilization
   2.1 Setup
   2.2 Prescriptive Paradigm
3 Solution Concepts, Game Structures, and Efficiency
   3.1 Solution Concepts
   3.2 Measures of Efficiency
   3.3 Smoothness
   3.4 Game Structures
   3.5 Illustrative Examples
   3.6 A Brief Review of Game Design Methodologies
4 Distributed Learning Rules
   4.1 Model-Based Learning
   4.2 Robust Distributed Learning
   4.3 Equilibrium Selection in Potential Games
   4.4 Universal Learning
5 Conclusion
References
1 Introduction
A widely used solution concept is Nash equilibrium, in which each player's choice is optimal with respect to the choices of the other players. Other solution concepts, which are generalizations of Nash equilibrium, are correlated and coarse correlated equilibria (Young 2004). Typically, a solution concept does not uniquely specify the outcome of a game (e.g., a game can have multiple Nash equilibria), and so there is the issue that some outcomes are better than others.
A remaining concern is how a solution concept emerges at all. Given the complete description of a game, an outside party can proceed to compute (modulo computational complexity considerations; Daskalakis et al. 2009) a proposed solution concept realization. In actuality, the data of a game (e.g., specific utility functions) is distributed among the players and not necessarily shared or communicated. Rather, over time players might make observations of the choices of the other players, and eventually the collective play converges to some limiting structure. This latter scenario is the topic of game-theoretic learning, for which there are multiple survey articles and monographs (e.g., Fudenberg and Levine 1998; Hart 2005; Shamma 2014; Young 2004). Under the descriptive paradigm, the learning in games discussion provides a plausibility argument for how players may arrive at a specified solution concept realization. Under the prescriptive paradigm, the learning in games discussion suggests an online algorithm that can lead agents to a desirable solution concept realization.
This chapter provides an overview of approaching distributed control from the perspective of game theory. The presentation touches on each of the aforementioned aspects of problem formulation, game design, and game-theoretic learning.
2.1 Setup
The update policy $\pi_i(\cdot)$ specifies how agent $i$ processes available information to formulate a decision. We will be more explicit about the argument of the $\pi_i(\cdot)$'s in the forthcoming discussion. For now, the information available to an agent can include both knowledge regarding previous action choices of other agents and certain system-level information that is propagated throughout the system.
The main goal is to design both the agents' utility functions and the agents' local policies $\{\pi_i\}_{i \in N}$ to ensure that the emergent collective behavior optimizes the global objective $W$ in terms of the asymptotic properties of $W(a(t))$ as $t \to \infty$.
Once the players and their choices have been set, the remaining elements in the prescriptive paradigm that are yet to be designed are (i) the agent utility functions and (ii) the update policies $\{\pi_i\}_{i \in N}$. One can view this specification in terms of the following two-step design procedure:
Step #1: Game Design. The first step of the design involves defining the underlying interaction structure in a game-theoretic environment. In particular, this choice involves defining a utility function for each agent $i \in N$ of the form $U_i : A \to \mathbb{R}$. The utility of agent $i$ for an action profile $a = (a_1, a_2, \ldots, a_n)$ is expressed as $U_i(a)$, or alternatively $U_i(a_i, a_{-i})$, where $a_{-i}$ denotes the collection of actions other than that of player $i$ in the joint action $a$, i.e., $a_{-i} = (a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$. A key feature of this design choice is the coupling of the agents' utility functions, where the utility, or payoff, of one agent is affected by the actions of other agents.
Step #2: Learning Design. The second step involves defining the decision-making rules for the agents, that is, how each agent processes available information to formulate a decision. A typical assumption in the framework of learning in games is that each agent uses historical information from previous actions of itself and other players. Accordingly, at each time $t$ the decision of each agent $i \in N$ is made independently through a learning rule of the form
$$a_i(t) = \pi_i\big(\{a(\tau)\}_{\tau = 1, \ldots, t-1};\; U_i(\cdot)\big). \qquad (2)$$
¹ Alternative agent control policies, where the policy of agent $i$ also depends on previous actions of agent $i$ or auxiliary "side information," could also be replicated by introducing an underlying state in the game-theoretic environment. The framework of state-based games, introduced in Marden (2012), represents one such framework that could accomplish this goal.
Given an initial state profile $a(0)$, the dynamics in (3) produce a sequence of state profiles $a(1), a(2), \ldots$. Whether or not the state profiles converge to consensus under the above dynamics (or variants thereof) has been extensively studied in the existing literature (Blondel et al. 2005a; Olfati-Saber and Murray 2003; Tsitsiklis et al. 1986).
Now we will present a game-theoretic design that leads to the same collective behavior. More formally, consider a game-theoretic model where each agent $i \in N$ is assigned an action set $A_i = \mathbb{R}$ and a utility function of the form
$$U_i(a_i, a_{-i}) = -\frac{1}{2\,|N_i(t)|} \sum_{j \in N_i(t)} (a_i - a_j)^2, \qquad (4)$$
where $|N_i(t)|$ denotes the cardinality of the set $N_i(t)$. Now, suppose each agent follows the well-known best-response learning rule of the form
$$a_i(t+1) \in B_i(a_{-i}(t)) = \arg\max_{a_i \in A_i} U_i(a_i, a_{-i}(t)),$$
where $B_i(a_{-i}(t))$ is referred to as the best-response set of agent $i$ to the action profile $a_{-i}(t)$. Given an initial state profile $a(0)$, it is straightforward to show that the ensuing action or state profiles $a(1), a(2), \ldots$, will be equivalent for both design choices.
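As a concrete illustration of this separation, the following sketch (not from the original text; the ring network, sequential revision order, and variable names are illustrative assumptions) implements best-response learning for the quadratic utility in (4) and shows that each best response reduces to neighborhood averaging.

```python
import numpy as np

def best_response_consensus(a0, neighbors, sweeps=100):
    """Best-response learning for U_i(a) = -(1/(2|N_i|)) * sum_{j in N_i} (a_i - a_j)^2.

    Maximizing this concave quadratic over a_i gives the average of the neighbors'
    current values, so best-response learning reproduces distributed averaging.
    Agents revise sequentially, one at a time.
    """
    a = np.array(a0, dtype=float)
    for _ in range(sweeps):
        for i, nbrs in enumerate(neighbors):
            a[i] = np.mean([a[j] for j in nbrs])   # best response = neighborhood average
    return a

# Example: four agents on a ring; all values approach a common consensus value.
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(best_response_consensus([0.0, 1.0, 2.0, 3.0], neighbors))
```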
The above example illustrates the separation between the learning rule and the
utility function. The learning rule is best-response dynamics. When the utility
function is the above quadratic form, then the combination leads to the usual
distributed averaging algorithm. If the utility function is changed (e.g., weighted,
non-quadratic, etc.), then the realization of best-response learning is altered, as well
as the structure of the game defined by the collection of the utility functions, but the
learning rule remains best-response dynamics.
An important property of best-response dynamics and other learning rules of
interest is that the actions of agent i can depend explicitly on the utility function of
agent i but not (explicitly) on the utility functions of other agents. This property
of learning rules in the learning in games literature is called being uncoupled
(Babichenko 2012; Hart and Mansour 2010; Hart and Mas-Colell 2003; Young
2004). Of course, the action stream of agent $i$, i.e., $a_i(0), a_i(1), \ldots$, does depend on the actions of other agents, but not on the utility functions behind those actions.
It turns out that there are many instances in which control policies not derived
from a game-theoretic perspective can be reinterpreted as the realization of an
uncoupled learning rule from a game-theoretic perspective. These include control
policies that have been widely studied in the cooperative control literature with
application domains such as consensus and flocking (Olfati-Saber et al. 2007;
Tsitsiklis 1987), sensor coverage (Martinez et al. 2007; Murphey 1999), and routing
information over networks (Roughgarden 2005), among many others.
While the design of such control policies can be approached from either a traditional perspective or a game-theoretic perspective, there are potential advantages associated with viewing control design from a game-theoretic perspective. In particular, a game-theoretic perspective allows for a modularized design architecture, i.e., the separation of game design and learning design, that can be exploited in a plug-and-play fashion to provide control algorithms with automatic performance guarantees:
Game Design Methodologies. There are several established methodologies for the design of agent objective functions, e.g., Shapley value and marginal contribution (Marden and Wierman 2013). These methodologies, which will be briefly reviewed in Sect. 3.6, are systematic procedures for deriving the agent objective functions $\{U_i\}_{i \in N}$ from a given system-level objective function $W$. These methodologies often provide structural guarantees on the resulting game, e.g., existence of a pure Nash equilibrium or a potential game structure, that can be exploited in distributed learning.
Learning Design Methodologies. The field of learning in games has sought to establish decision-making rules that lead to Nash equilibria or other solution concepts in strategic form games. In general, it has been shown (see Hart and Mas-Colell 2003) that there are no "natural" dynamics that converge to Nash equilibria in all games, where natural refers to dynamics that do not rely on some form of centralized coordination, e.g., exhaustive search of the joint action profiles. For example, there are no rules of the form (2) that provide convergence to a Nash equilibrium in every game. However, the same limitations do not hold when we transition from "all games" to "all games of a given structure." In particular, there are several positive results in the context of learning in games for special classes of games (e.g., potential games and variants thereof). These results, which will be discussed in Sect. 4, identify learning dynamics that yield desirable performance guarantees when applied to the realm of potential games.
In systems that include human decision-making entities, e.g., the smart grid, game theory transitions from a design choice to a necessity. The involvement of human decision-making entities in a system requires that the system operator utilize game theory for the purpose of modeling and influencing those entities to optimize system performance.
The most widely known solution concept in game theory is a pure Nash equilibrium, defined as follows.
Definition 1. An action profile $a^{ne} \in A$ is a pure Nash equilibrium if for any agent $i \in N$,
$$U_i(a_i^{ne}, a_{-i}^{ne}) \ge U_i(a_i, a_{-i}^{ne}), \qquad \forall a_i \in A_i. \qquad (5)$$
A pure Nash equilibrium represents an action profile where no agent has a unilateral incentive to alter its action provided that the behavior of the remaining agents is unchanged. A pure Nash equilibrium need not exist in a given game G.
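To make Definition 1 concrete, the short sketch below (illustrative only; the encoding of a finite game as one utility array per agent is our own choice) checks condition (5) for every agent by brute-force enumeration of unilateral deviations.

```python
import itertools
import numpy as np

def pure_nash_equilibria(utilities):
    """Return all pure Nash equilibria of a finite game.

    `utilities` is a list with one n-dimensional array per agent;
    utilities[i][a] is U_i(a) for the joint action (index tuple) a.
    """
    n = len(utilities)
    shape = utilities[0].shape
    equilibria = []
    for a in itertools.product(*[range(k) for k in shape]):
        is_ne = True
        for i in range(n):
            for ai in range(shape[i]):
                dev = a[:i] + (ai,) + a[i+1:]      # unilateral deviation by agent i
                if utilities[i][dev] > utilities[i][a]:
                    is_ne = False
                    break
            if not is_ne:
                break
        if is_ne:
            equilibria.append(a)
    return equilibria

# Example: a 2x2 coordination game has two pure Nash equilibria, (0,0) and (1,1).
U = np.array([[1.0, 0.0], [0.0, 1.0]])
print(pure_nash_equilibria([U, U]))
```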
The definition of Nash equilibrium also extends to scenarios where the agents can probabilistically choose their actions. Define a strategy of agent $i$ as $p_i \in \Delta(A_i)$, where $\Delta(A_i)$ denotes the simplex over the finite action set $A_i$. We will express a strategy $p_i$ by the tuple $\{p_i^{a_i}\}_{a_i \in A_i}$, where $p_i^{a_i} \ge 0$ for any $a_i \in A_i$ and $\sum_{a_i \in A_i} p_i^{a_i} = 1$. We will evaluate the utility of an agent $i \in N$ for a strategy profile $p = (p_1, \ldots, p_n)$ as
$$U_i(p_i, p_{-i}) = \sum_{a \in A} U_i(a)\, p_1^{a_1} \cdots p_n^{a_n}, \qquad (6)$$
which has the usual interpretation of the expected utility under independent randomized actions.
We can now state the definition of Nash equilibrium when extended to mixed (or probabilistic) strategies.
Definition 2. A strategy profile $p^{ne} = (p_1^{ne}, \ldots, p_n^{ne})$ is a mixed Nash equilibrium if for any agent $i \in N$,
$$U_i(p_i^{ne}, p_{-i}^{ne}) \ge U_i(p_i, p_{-i}^{ne}), \qquad \forall p_i \in \Delta(A_i). \qquad (7)$$
Unlike pure Nash equilibria, a mixed Nash equilibrium is guaranteed to exist in any game G.²
A common critique regarding the viability of pure or mixed Nash equilibria as a characterization of achievable behavior in multiagent systems is that the complexity associated with computing such equilibria is often prohibitive (Daskalakis et al. 2009). We now introduce a weaker solution concept, which is defined relative to a joint distribution $z \in \Delta(A)$, that does not suffer from such issues.
² Recall that we are assuming a finite set of players, each with a finite set of actions.
³ Another common equilibrium set, termed correlated equilibrium, is similar to coarse correlated equilibrium; the difference lies in the consideration of conditional deviations as opposed to the unconditional deviations considered in (8). A formal definition of correlated equilibrium can be found in Young (2004).
Fig. 1 The relationship between the three solution concepts: pure Nash equilibrium, mixed Nash
equilibrium, and coarse correlated equilibrium
The first measure that we consider is the price of anarchy, which is defined as the worst-case ratio between the performance of a pure Nash equilibrium and the optimal system behavior, i.e.,
$$\mathrm{PoA}(G) = \min_{a^{ne} \in \mathrm{PNE}(G)} \frac{W(a^{ne})}{W(a^{opt})} \le 1, \qquad (9)$$
where $a^{opt} \in \arg\max_{a \in A} W(a)$ and $\mathrm{PNE}(G)$ denotes the set of pure Nash equilibria in the game G. Note that the price of anarchy given in (9) provides a lower bound on the performance associated with any pure Nash equilibrium in the game G.
The second measure that we consider is the price of stability, which is defined as the best-case ratio between the performance of the best equilibrium and the optimal system behavior. Focusing on pure Nash equilibria for simplicity, the price of stability associated with a game G is defined as
$$\mathrm{PoS}(G) = \max_{a^{ne} \in \mathrm{PNE}(G)} \frac{W(a^{ne})}{W(a^{opt})} \le 1. \qquad (10)$$
When analyzing dynamics that converge to specific types of equilibria, e.g., the best Nash equilibrium, the price of stability may be a more reasonable characterization of the efficiency associated with the limiting behavior.
The above definitions of price of anarchy and price of stability also extend to situations where there is uncertainty regarding the structure of the specific game. To that end, let $\mathcal{G}$ denote a family of possible games. The price of anarchy and price of stability associated with the family of games are then defined as the worst-case performance over all games within that family, i.e.,
$$\mathrm{PoA}(\mathcal{G}) = \min_{G \in \mathcal{G}} \mathrm{PoA}(G), \qquad \mathrm{PoS}(\mathcal{G}) = \min_{G \in \mathcal{G}} \mathrm{PoS}(G).$$
Clearly, $1 \ge \mathrm{PoS}(\mathcal{G}) \ge \mathrm{PoA}(\mathcal{G})$. For clarity, $\mathrm{PoA}(\mathcal{G}) = 0.5$ implies that regardless of the underlying game $G \in \mathcal{G}$, any pure Nash equilibrium is at least 50% efficient when compared to the performance of the optimal allocation for that game.
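As an illustration of (9) and (10), the sketch below (illustrative; it assumes the welfare of the toy instance is the sum of utilities and reuses the brute-force equilibrium check from the previous sketch) computes the price of anarchy and price of stability of a small finite game.

```python
import itertools
import numpy as np

def poa_pos(utilities, welfare):
    """Price of anarchy and price of stability over pure Nash equilibria.

    `welfare[a]` gives W(a); both ratios are relative to max_a W(a).
    """
    n = len(utilities)
    shape = utilities[0].shape
    joint = list(itertools.product(*[range(k) for k in shape]))
    def is_ne(a):
        return all(utilities[i][a[:i] + (ai,) + a[i+1:]] <= utilities[i][a]
                   for i in range(n) for ai in range(shape[i]))
    w_opt = max(welfare[a] for a in joint)
    ne_values = [welfare[a] for a in joint if is_ne(a)]
    return float(min(ne_values) / w_opt), float(max(ne_values) / w_opt)  # (PoA, PoS)

# Example: 2x2 game where (0,0) and (1,1) are equilibria with welfare 2 and 4.
U1 = np.array([[1.0, 0.0], [0.0, 2.0]])
U2 = np.array([[1.0, 0.0], [0.0, 2.0]])
W = {a: U1[a] + U2[a] for a in itertools.product(range(2), range(2))}
print(poa_pos([U1, U2], W))  # -> (0.5, 1.0)
```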
The definitions of price of anarchy and price of stability given in (9) and (10) can be extended to broader classes of equilibria, i.e., mixed Nash equilibria or coarse correlated equilibria, in the natural manner. To perform the above analysis for broader equilibrium sets, we extend the definition of the welfare function to a distribution $z \in \Delta(A)$ as $W(z) = \sum_{a \in A} W(a)\, z^a$. Note that for a given family of games $\mathcal{G}$, the price of anarchy associated with pure Nash equilibria would be better (closer to 1) than the price of anarchy associated with coarse correlated equilibria. Since coarse correlated equilibria contain Nash equilibria, one would naturally expect that the efficiency associated with coarse correlated equilibria could be far worse than the efficiency associated with Nash equilibria. Surprisingly, it often turns out that this is not the case, as we will see below.
3.3 Smoothness
Theorem 1. Consider any game G where the agents' utility functions satisfy $\sum_{i \in N} U_i(a) \le W(a)$ for any $a \in A$. If there exist parameters $\lambda > 0$ and $\mu > -1$ such that for any two action profiles $a, a^* \in A$,
$$\sum_{i \in N} U_i(a_i^*, a_{-i}) \ge \lambda\, W(a^*) - \mu\, W(a), \qquad (13)$$
then the efficiency associated with any coarse correlated equilibrium $z^{cce} \in \Delta(A)$ of G must satisfy
$$\frac{W(z^{cce})}{W(a^{opt})} \ge \frac{\lambda}{1 + \mu}, \qquad (14)$$
where the ratio $\lambda/(1+\mu)$ is referred to as the robust price of anarchy (Roughgarden 2015). In line with the forthcoming discussion (cf. Sect. 4.4), implementing a learning rule that leads to the set of coarse correlated equilibria provides performance guarantees that conform to this robust price of anarchy.
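For completeness, here is a sketch of the standard smoothness argument (following Roughgarden 2015) showing how (13), together with the coarse correlated equilibrium condition $\mathbb{E}_{a \sim z}[U_i(a)] \ge \mathbb{E}_{a \sim z}[U_i(a_i', a_{-i})]$ for every fixed $a_i' \in A_i$, yields (14); the expectation notation is ours.

```latex
\begin{align*}
W(z^{cce}) = \mathbb{E}_{a \sim z^{cce}}[W(a)]
  &\ge \mathbb{E}_{a \sim z^{cce}}\Big[\textstyle\sum_{i \in N} U_i(a)\Big]
      && \text{since } \textstyle\sum_{i} U_i(a) \le W(a) \\
  &\ge \sum_{i \in N} \mathbb{E}_{a \sim z^{cce}}\big[U_i(a_i^{opt}, a_{-i})\big]
      && \text{coarse correlated equilibrium condition} \\
  &\ge \mathbb{E}_{a \sim z^{cce}}\big[\lambda\, W(a^{opt}) - \mu\, W(a)\big]
      && \text{smoothness condition (13) with } a^* = a^{opt} \\
  &= \lambda\, W(a^{opt}) - \mu\, W(z^{cce}),
\end{align*}
so that $(1+\mu)\, W(z^{cce}) \ge \lambda\, W(a^{opt})$, which is exactly (14).
```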
One example of an entire class of games with known price of anarchy bounds is congestion games with affine congestion functions (Roughgarden 2005) (see also Example 2). Another class is valid utility games, introduced in Vetta (2002), which is very relevant to distributed resource utilization problems. A critical property of valid utility games is a system-level objective that is submodular. Submodularity corresponds to a notion of decreasing marginal returns that is a common feature of many objective functions in engineering systems. A set-based function $f : 2^N \to \mathbb{R}$ is submodular if for any $S \subseteq T \subseteq N \setminus \{i\}$, we have
$$f(S \cup \{i\}) - f(S) \ge f(T \cup \{i\}) - f(T). \qquad (16)$$
In each of these settings, Roughgarden (2015) has derived the appropriate smoothness parameters, hence providing the price of anarchy guarantees. Accordingly, the resulting price of anarchy bounds hold for coarse correlated equilibria as well as Nash equilibria.
(iii) For any action profile $a \in A$, the sum of the agents' utilities satisfies
$$\sum_{i \in N} U_i(a) \le W(a).$$
We will refer to such a game as a valid utility game. Any valid utility game G is smooth with parameters $\lambda = 1$ and $\mu = 1$; hence, the robust price of anarchy is $1/2$ for the class of valid utility games. Accordingly, the efficiency guarantee associated with any coarse correlated equilibrium $z^{cce} \in \Delta(A)$ in a valid utility game satisfies
$$W(z^{cce}) \ge \frac{1}{2}\, W(a^{opt}).$$
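As a small illustration of (16), the sketch below (illustrative; the coverage objective and its encoding are our own choices, not taken from the chapter) exhaustively checks the decreasing-marginal-returns condition for a coverage-style objective, the prototypical submodular system-level objective behind valid utility games.

```python
from itertools import combinations

def is_submodular(f, ground_set):
    """Check f(S + i) - f(S) >= f(T + i) - f(T) for all S subset of T excluding i."""
    for i in ground_set:
        rest = [x for x in ground_set if x != i]
        for r_t in range(len(rest) + 1):
            for T in combinations(rest, r_t):
                for r_s in range(len(T) + 1):
                    for S in combinations(T, r_s):
                        gain_S = f(set(S) | {i}) - f(set(S))
                        gain_T = f(set(T) | {i}) - f(set(T))
                        if gain_S < gain_T - 1e-12:
                            return False
    return True

# Coverage objective: each agent covers a set of resources, W = number covered.
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
W = lambda group: len(set().union(*[coverage[i] for i in group])) if group else 0
print(is_submodular(W, [1, 2, 3]))  # -> True (coverage functions are submodular)
```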
The two components associated with a game-theoretic design are the agent utility
functions, which define an underlying game, and the learning rule. Both components
impact various performance objectives associated with the distributed control
design. The specification of the agent utility functions directly impacts the price
of anarchy, which can be viewed as the efficiency associated with the asymptotic
collective behavior. On the other hand, the specification of the learning algorithm
dictates the transient behavior in its attempt to drive the collective behavior to the
solution concept of interest.
At first glance it appears that the objectives associated with these two components
are unrelated to one another. For example, one could employ a design where
(i) the agents’ utility functions are chosen to optimize the price of anarchy of
pure Nash equilibria and (ii) a learning algorithm is employed that drives the
collective behavior to a pure Nash equilibrium. Unfortunately, such decoupling
is not necessarily possible due to limitations associated with (ii). As previously
discussed, there are no "natural dynamics" of the form (2) that lead to a (pure or mixed) Nash equilibrium in every game (Hart and Mas-Colell 2003), where "natural" refers to uncoupled dynamics (i.e., agents are uninformed of the utility functions of other agents) and rules out behaviors such as exhaustive search or centralized coordination.
Given such impossibility results, it is imperative that the game design component
addresses objectives beyond just price of anarchy. In particular, it is of paramount
importance that the resulting game has properties that can be exploited in distributed
learning. In this section we will review such game structures. Each of these game structures provides a degree of alignment between the agents' utility functions $\{U_i\}$ and a system-level potential function $\phi : A \to \mathbb{R}$.
The first class of games we introduce, termed potential games (Monderer and Shapley 1996), exhibits perfect alignment between the agents' utility functions and the potential function $\phi$.
Definition 4 (Potential Game). A game G is an exact potential game with potential function $\phi : A \to \mathbb{R}$ if for any agent $i \in N$, any actions $a_i, a_i' \in A_i$, and any profile $a_{-i}$ of the other agents' actions,
$$U_i(a_i, a_{-i}) - U_i(a_i', a_{-i}) = \phi(a_i, a_{-i}) - \phi(a_i', a_{-i}). \qquad (18)$$
Note that any maximizing action profile $a^* \in \arg\max_{a \in A} \phi(a)$ is a pure Nash equilibrium; hence, a pure Nash equilibrium is guaranteed to exist in any potential game. Further, as we will see in the forthcoming Sect. 4, the structure inherent to potential games can be exploited to bypass the impossibility result highlighted above. In other words, there are natural dynamics that lead to a Nash equilibrium in any potential game. We will survey some of these dynamics in Sect. 4.
There are several variants of potential games that seek to relax the equality given
in (18) while preserving the exploitability of the game structure for distributed
learning. One of the properties that is commonly exploited in distributed learning is
the monotonicity of the potential function along a better reply path, which is defined
as follows:
Definition 5 (Better Reply Path). A better reply path is a sequence of joint actions $a^1, a^2, \ldots, a^m$ such that for each $k \in \{1, \ldots, m-1\}$, (i) $a^{k+1} = (a_i, a_{-i}^k)$ for some agent $i \in N$ with action $a_i \in A_i$, $a_i \ne a_i^k$, and (ii) $U_i(a^{k+1}) > U_i(a^k)$.
Informally, a better reply path is a sequence of joint actions where each subsequent joint action is the result of an advantageous unilateral deviation. In a potential game, the potential function is monotonically increasing along a better reply path. Since the joint action set A is finite, any better reply path will lead to a pure Nash equilibrium in a finite number of iterations. This property is known as the finite improvement property (Monderer and Shapley 1996).⁴
⁴ Commonly studied variants of exact potential games, e.g., ordinal or weighted potential games, also possess the finite improvement property.
We now introduce the class of weakly acyclic games, which relaxes the finite improvement property condition: informally, a game is weakly acyclic if from every joint action there exists at least one better reply path terminating at a pure Nash equilibrium.
At first glance it may appear that the framework of potential games (or weakly acyclic games) is overly restrictive as a framework for the design of networked control systems. Here, we provide three examples of potential games, which illustrate the breadth of the problem domains that can be modeled and analyzed within this framework.
The first example focuses on distributed routing and highlights how a reasonable
model of user behavior, i.e., users seeking to minimize their experienced congestion,
constitutes a potential game.
where $|a|_e = |\{i \in N : e \in a_i\}|$ denotes the number of agents using edge $e$ in the allocation $a$.⁵ In general, a system designer would like to allocate the agents over the network to minimize the aggregate congestion given by
$$C(a) = \sum_{e \in E} |a|_e \cdot c_e(|a|_e).$$
⁵ Here, we use cost functions $J_i(\cdot)$ instead of utility functions $U_i(\cdot)$ in situations where the agents are minimizers instead of maximizers.
It is well known that any routing game of the above form, which is commonly referred to as an anonymous congestion game, is a potential game with a potential function $\phi : A \to \mathbb{R}$ of the form
$$\phi(a) = \sum_{e \in E} \sum_{k=1}^{|a|_e} c_e(k).$$
This implies that a pure Nash equilibrium is guaranteed to exist in any anonymous congestion game, namely, any action profile that minimizes $\phi(a)$. Furthermore, it is often the case that this is the unique pure Nash equilibrium with regard to aggregate behavior, i.e., $a^{ne} \in \arg\min_{a \in A} \phi(a)$. The fact that the potential function and the system cost are not equivalent, i.e., $\phi(\cdot) \ne C(\cdot)$, can lead to inefficiencies of the resulting Nash equilibria.
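The following sketch (illustrative; the two-edge network and affine cost functions are a made-up toy instance) computes the potential $\phi(a)$ of an anonymous congestion game and finds a pure Nash equilibrium by minimizing it, alongside the aggregate congestion $C(a)$.

```python
import itertools

# Two parallel edges from source to destination; each agent picks one of them.
cost = {"e1": lambda k: 2 * k,       # affine congestion cost on edge e1
        "e2": lambda k: k + 3}       # affine congestion cost on edge e2
paths = [("e1",), ("e2",)]           # each path uses a single edge here
n_agents = 4

def loads(a):
    load = {e: 0 for e in cost}
    for path in a:
        for e in path:
            load[e] += 1
    return load

def potential(a):
    load = loads(a)
    return sum(sum(cost[e](k) for k in range(1, load[e] + 1)) for e in cost)

def aggregate_congestion(a):
    load = loads(a)
    return sum(load[e] * cost[e](load[e]) for e in cost)

# Any minimizer of the potential is a pure Nash equilibrium of the congestion game.
a_ne = min(itertools.product(paths, repeat=n_agents), key=potential)
print([p[0] for p in a_ne], aggregate_congestion(a_ne))
```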
where $p_w$ is a local potential function defined over pairs of actions. One choice for this local potential function is the following:
$$p_w(x, x) = 0,$$
$$p_w(y, x) = U(y, x) - U(x, x),$$
$$p_w(x, y) = U(y, x) - U(x, x),$$
$$p_w(y, y) = \big(U(y, y) - U(x, y)\big) + \big(U(y, x) - U(x, x)\big).$$
Observe that any function $p_w' = p_w + \alpha$, where $\alpha \in \mathbb{R}$, also serves as a local potential function for the given graphical coordination game.
The first two examples show how potential games could naturally emerge in
two different types of strategic scenarios. The last example we present focuses
on an engineering-inspired resource allocation problem, termed the vehicle-target
assignment problem (Murphey 1999), where the vehicles’ utility functions are
engineered so that the resulting game is a potential game.
where $\mathcal{T}(a) \subseteq \mathcal{T}$ denotes the collection of targets that are assigned to at least one agent, i.e., $\mathcal{T}(a) = \cup_{i \in N}\, a_i$.
Note that in this engineering-based application, there is no inherent model of utility functions for the engineered vehicles. Rather, vehicle utility functions are designed with the goal of engineering desirable system-wide behavior. Consider one such design where the utility functions of the vehicles are set as the marginal contribution of the vehicles to the system-level objective, i.e., for each vehicle $i \in N$ and allocation $a \in A$ we have
$$U_i(a) = \sum_{t \in a_i} v_t \left( 1 - \prod_{j : t \in a_j} (1 - p_j) \right) - \sum_{t \in a_i} v_t \left( 1 - \prod_{j \ne i : t \in a_j} (1 - p_j) \right)$$
$$= \sum_{t \in a_i} v_t \left( p_i \prod_{j \ne i : t \in a_j} (1 - p_j) \right).$$
Given this design of utility functions, it is straightforward to verify that the resulting game is a potential game with potential function $\phi(a) = W(a)$. This immediately implies that any optimal allocation, $a^{opt} \in \arg\max_{a \in A} W(a)$, is a pure Nash equilibrium. However, other inefficient Nash equilibria may also exist due to the lack of uniqueness of Nash equilibria in such scenarios.
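To make Example 4 concrete, the sketch below (illustrative; the targets, values $v_t$, and success probabilities $p_i$ are made up, and the global objective is assumed to be the expected value of the targets destroyed) evaluates the marginal contribution utility above and numerically verifies the exact potential property $U_i(a_i, a_{-i}) - U_i(a_i', a_{-i}) = W(a_i, a_{-i}) - W(a_i', a_{-i})$.

```python
from math import prod

values = {"t1": 10.0, "t2": 6.0}          # target values v_t (illustrative)
p = {1: 0.8, 2: 0.5, 3: 0.6}              # vehicle success probabilities p_i

def W(a):
    # Assumed global objective: expected value of targets hit by some assigned vehicle.
    return sum(v * (1 - prod(1 - p[j] for j in a if t in a[j]))
               for t, v in values.items())

def U(i, a):
    # Marginal contribution of vehicle i: W(a) minus W with vehicle i removed.
    a_without_i = {j: aj for j, aj in a.items() if j != i}
    return W(a) - W(a_without_i)

# Verify the exact potential property (with potential W) for a unilateral change.
a = {1: frozenset({"t1"}), 2: frozenset({"t1"}), 3: frozenset({"t2"})}
b = {**a, 1: frozenset({"t2"})}           # vehicle 1 switches targets
print(abs((U(1, a) - U(1, b)) - (W(a) - W(b))) < 1e-12)  # -> True
```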
The examples in the previous section illustrate various settings that happen to
fall under the special category of potential games. Given that utility function
specification is a design degree of freedom in the prescriptive paradigm, it is possible
to exploit this degree of freedom to design utility functions to induce desirable
structural properties.
There are several objectives that a system designer needs to consider when
designing the game that defines the interaction framework of the agents in a
multiagent system (Marden and Wierman 2013). These goals could include (i)
ensuring the existence of a pure Nash equilibrium, (ii) ensuring that the agents’
utility functions fit into the realm of potential games, or (iii) ensuring that the agents’
utility functions optimize the price of anarchy/price of stability over an admissible
class of agent utility functions, e.g., local utility functions. While recent research has
identified the full space of methodologies that guarantee (i) and (ii) (Gopalakrishnan
et al. 2014), the existing research has yet to provide mechanisms for optimizing the
price of anarchy.
The following theorem provides one methodology for the design of agent utility functions with guarantees on the resulting game structure (Marden and Wierman 2013; Wolpert and Tumer 1999).
Theorem 3. Consider the class of resource utilization problems defined in Sect. 2.1 with agent set $N$, action sets $\{A_i\}$, and a global objective $W : A \to \mathbb{R}$. Let $\phi : A \to \mathbb{R}$ be any system-level design function, and define the marginal contribution utility function for each agent $i \in N$ and allocation $a \in A$ as
$$U_i(a) = \phi(a_i, a_{-i}) - \phi(a_i^b, a_{-i}), \qquad (21)$$
where $a_i^b \in A_i$ is a fixed baseline action for agent $i$. Then the resulting game is an exact potential game with potential function $\phi$.
A few notes are in order regarding Theorem 3. First, the assignment of the agents' utility functions is a byproduct of the chosen system-level design function $\phi$, the transformation of $\phi$ into the agents' utility functions given by (21), and the choice of the baseline action $a_i^b$ for each agent $i \in N$. Observe that the utility design presented in Example 4 is precisely the design detailed in Theorem 3 where $\phi = W$ and $a_i^b = \emptyset$ for each agent $i \in N$. While a system designer could clearly
We now turn our attention toward distributed learning rules. We can categorize the
learning algorithms into the following four areas:
Robust Learning. A learning algorithm of the form (2) defines a systematic rule
for how individual agents process available information to formulate a decision.
Many of the learning algorithms in the existing literature provide guarantees on the
asymptotic collective behavior provided that the agents follow these rules precisely.
Here, we explore the robustness of such learning algorithms, i.e., whether the asymptotic guarantees on the collective behavior are preserved when agents follow variations of the prescribed learning rules stemming from delays in information or asynchronous clock rates.
Equilibrium Selection. The price of anarchy and price of stability are two
measures characterizing the inefficiency associated with Nash equilibria. The
differences between these two measures follow from the fact that Nash equilibria are
often not unique. This lack of uniqueness of Nash equilibria prompts the question of whether distributed learning rules that favor certain types of Nash equilibria can be derived. Focusing on the framework of potential games, we will review one such algorithm that guarantees that the collective behavior will lead to the specific Nash equilibria that optimize the potential function. Note that when utility functions are
engineered, as in Example 4, a system designer can often ensure that the resulting
game is a potential game where the action profiles that optimize the potential
function coincide with the action profiles that optimize the system-level objective.
(We reviewed one such methodology in Sect. 3.6.)
The central challenge in distributed learning is dealing with the fact that each agent’s
environment is inherently nonstationary in that the environment from the perspective
of any agent consists of the behaviors of other agents, which are evolving. A
common approach in distributed learning is to have agents make decisions in a
myopic fashion, thereby neglecting the ramifications of an agent’s current decision
on the future behavior of the other agents. In this section we review two learning
algorithms of this form that we categorize as model-based learning algorithms. In
model-based learning, each agent observes the past behavior of the other agents
and utilizes this information to develop a behavioral model of the other agents.
Equipped with this behavioral model, each agent then performs a myopic best
response seeking to optimize its expected utility. It is important to stress here that the goal is not to accurately model the behavior of the other agents in the ensuing period. Rather, the goal is to derive systematic agent responses that will guide the collective behavior to a desired equilibrium.
$$q_i^{a_i}(t) = \frac{1}{t} \sum_{\tau = 0}^{t-1} I\{a_i(\tau) = a_i\}, \qquad (22)$$
and $I\{\cdot\}$ is the usual indicator function. At time $t$, each agent seeks to myopically maximize its expected utility given the belief that each agent $j \ne i$ will select its action independently according to a strategy $q_j(t)$. This update rule takes on the form
$$a_i(t) \in \arg\max_{a_i \in A_i} \sum_{a_{-i} \in A_{-i}} U_i(a_i, a_{-i}) \prod_{j \ne i} q_j^{a_j}(t). \qquad (23)$$
Theorem 4. Consider any exact potential game G. If all players follow the fictitious play learning rule, then the players' empirical frequencies of play $q_1(t), \ldots, q_n(t)$ will converge to a Nash equilibrium of the game G.
The fictitious play learning rule provides a mechanism to guide individual agent
behavior in distributed control systems when the agents (i) can observe the previous
action choices of the other agents in the system and (ii) have access to the structural
form of their utility function. Further, fictitious play provides provable guarantees
on the emergent collective behavior provided that the system can be modeled by an
exact potential game. For example, consider the distributed routing problem given
in Example 2 which can be modeled as a potential game irrespective of the number
of agents, the number of edges, the topology of the network, or the edge-specific
latency functions. Regardless of the structure of the routing problem, the fictitious
play algorithm can be employed to drive the collective system behavior to a Nash
equilibrium.
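A minimal sketch of fictitious play, per (22) and (23), is given below (illustrative; the array-based game encoding matches the earlier sketches and best-response ties are broken by lowest index).

```python
import itertools
import numpy as np

def fictitious_play(utilities, steps=200, seed=0):
    """Fictitious play: each agent best responds to the empirical frequencies
    of the other agents' past actions, cf. (22)-(23)."""
    rng = np.random.default_rng(seed)
    n = len(utilities)
    shape = utilities[0].shape
    counts = [np.zeros(k) for k in shape]                 # action counts per agent
    a = tuple(int(rng.integers(k)) for k in shape)        # arbitrary initial joint action
    for _ in range(steps):
        for i in range(n):
            counts[i][a[i]] += 1
        q = [c / c.sum() for c in counts]                 # empirical frequencies q_j(t)
        a_next = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            expected = np.zeros(shape[i])
            for a_mi in itertools.product(*[range(shape[j]) for j in others]):
                prob = np.prod([q[j][a_mi[k]] for k, j in enumerate(others)])
                for ai in range(shape[i]):
                    joint = list(a_mi)
                    joint.insert(i, ai)                   # reinsert agent i's action
                    expected[ai] += utilities[i][tuple(joint)] * prob
            a_next.append(int(np.argmax(expected)))       # myopic best response (23)
        a = tuple(a_next)
    return [c / c.sum() for c in counts]

# Example: in a 2x2 coordination game the empirical frequencies approach a Nash equilibrium.
U = np.array([[1.0, 0.0], [0.0, 1.0]])
print([np.round(f, 2) for f in fictitious_play([U, U])])
```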
While the asymptotic guarantees associated with fictitious play in distributed routing problems are appealing, the implementation of fictitious play in such settings is problematic. First, each agent must be able to observe the specific behavior of all other agents in the network each period. Second, the choice of each agent at any time given in (23) requires (i) knowledge of the structural form of the agent's utility function and (ii) computing an expectation of its utility function, which involves evaluating a weighted summation over $|A_{-i}|$ terms. In large-scale systems, such as distributed routing, each of these requirements could be prohibitive. Accordingly, research has attempted to alter the fictitious play algorithm to minimize such requirements while preserving the desirable asymptotic guarantees.
samples. The choice with the best average performance was then substituted for
the choice that maximized the agent’s expected utility in (23), and the process was
repeated. While simulations demonstrated reasonable performance even for limited
samples, unfortunately preserving the theoretical asymptotic guarantees associated
with fictitious play required that the number of samples drawn each period grew
prohibitively large.
A second variant of fictitious play focused on the underlying asymptotic guarantees given in Theorem 4, which state that the empirical frequency of play converges to a Nash equilibrium. It is important to highlight that this does not imply that the day-to-day behavior of the agents converges to a Nash equilibrium; e.g., the agents' day-to-day behavior could oscillate while yielding a frequency of play consistent with a Nash equilibrium. Furthermore, the cumulative payoff may be less than the payoff associated with the limiting empirical frequencies. With this issue in mind, Fudenberg and Levine (1995) introduced a variant of fictitious play that assures a specific payoff consistency property against arbitrary environments, i.e., not just when other agents employ fictitious play.
$$\bar{U}_i^{a_i}(t) = \frac{1}{t} \sum_{\tau = 0}^{t-1} U_i(a_i, a_{-i}(\tau)) = \frac{t-1}{t}\, \bar{U}_i^{a_i}(t-1) + \frac{1}{t}\, U_i(a_i, a_{-i}(t-1)). \qquad (24)$$
Note that this average hypothetical utility is computed under the belief that the action choices of the other agents remain unchanged. Now, consider the decision-making rule where each agent $i \in N$ independently selects its action probabilistically according to the rule
$$a_i(t) = \begin{cases} \arg\max_{a_i \in A_i} \bar{U}_i^{a_i}(t) & \text{with probability } 1 - \epsilon, \\ a_i(t-1) & \text{with probability } \epsilon, \end{cases} \qquad (25)$$
where $\epsilon \in (0, 1)$ represents the agents' inertia.
Theorem 5. Consider any exact potential game G. If all players follow the learning algorithm joint strategy fictitious play (with inertia) defined above, then the joint action profile will converge almost surely to a pure Nash equilibrium of the game G.
Hence, JSFP with inertia provides similar asymptotic guarantees to fictitious play while minimizing the computational and observational burden on the agents. The name "joint strategy fictitious play" is derived from the fact that maximizing the average hypothetical utility in (24) is equivalent to maximizing an expected utility under the belief that all agents will play collectively according to the empirical frequency of their past joint play.
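A sketch of joint strategy fictitious play with inertia, following the recursion (24) and the rule (25), is given below (illustrative; the two-route congestion example and all parameter values are our own choices).

```python
import numpy as np

def jsfp_with_inertia(utility_fn, action_sets, steps=500, epsilon=0.3, seed=0):
    """Joint strategy fictitious play with inertia.

    utility_fn(i, a) returns U_i(a) for the joint action tuple a. Each agent tracks
    the running average hypothetical utility (24) of each of its own actions and,
    with probability 1-epsilon, plays a maximizer; with probability epsilon it
    repeats its previous action, cf. (25).
    """
    rng = np.random.default_rng(seed)
    n = len(action_sets)
    avg = [np.zeros(len(A)) for A in action_sets]          # running averages \bar{U}_i^{a_i}
    a = tuple(int(rng.integers(len(A))) for A in action_sets)
    for t in range(1, steps + 1):
        # Update hypothetical utilities against the most recent joint action.
        for i in range(n):
            for ai in range(len(action_sets[i])):
                u = utility_fn(i, a[:i] + (ai,) + a[i+1:])
                avg[i][ai] = (t - 1) / t * avg[i][ai] + u / t
        a = tuple(a[i] if rng.random() < epsilon else int(np.argmax(avg[i]))
                  for i in range(n))
    return a

# Example: two agents choose between two routes with congestion-dependent costs.
cost = lambda k: [0, 1, 3][k]                              # cost with k users on a route
utility = lambda i, a: -cost(sum(1 for aj in a if aj == a[i]))
print(jsfp_with_inertia(utility, [(0, 1), (0, 1)]))        # -> agents split across routes
```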
Both fictitious play and joint strategy fictitious play are intricate decision-making
rules that provide guarantees regarding the emergent collective behavior. A natural
question that emerges when considering the practicality of such rules for control of
networked systems is the robustness of these guarantees to common implementation
issues including asynchronous clocks, noisy payoffs, and delays in information,
among others. This section highlights that the framework of potential games, or
more generally weakly acyclic games, is inherently robust to such issues.
We review the result in Young (2004) that deals with this exact issue. In particular, Young (2004) demonstrates the robustness of weakly acyclic games by identifying a broad family of learning rules, termed finite memory better reply processes, with the property that any rule within this family will provably guide the collective behavior to a pure Nash equilibrium in any weakly acyclic game.
A finite memory better reply process with inertia is any learning algorithm of the following form: at each time $t$, each agent selects its action independently according to the rule
$$a_i(t) = \begin{cases} B_i^m(h^m(t)) & \text{with probability } 1 - \epsilon, \\ a_i(t-1) & \text{with probability } \epsilon, \end{cases} \qquad (26)$$
where $m \ge 1$ is the size of the agent's memory, $\epsilon > 0$ is the agent's inertia, $h^m(t) = \{a(t-1), a(t-2), \ldots, a(t-m)\}$ denotes the previous $m$ action profiles, and $B_i^m : A^m \to \Delta(A_i)$ is the finite memory better reply process.⁶ A finite memory better reply process $B_i^m(\cdot)$ can be any process that satisfies a mild set of consistency properties.⁷ In summary, the only constraint imposed on a finite memory better reply process is that a better reply to the saturated memory $\{a, \ldots, a\}$ is consistent with a better reply to the single action profile $a$.
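The sketch below (illustrative; memory length m = 1 and uniform tie-breaking are our own choices) gives one concrete instance of a finite memory better reply process with inertia of the form (26).

```python
import numpy as np

def better_reply_with_inertia(utility_fn, action_sets, steps=1000, epsilon=0.3, seed=0):
    """One instance of a finite memory better reply process with inertia (m = 1):
    with probability epsilon an agent repeats its action; otherwise it moves to an
    action chosen uniformly among those that strictly improve its utility against
    the most recent joint action (staying put if no improvement exists)."""
    rng = np.random.default_rng(seed)
    n = len(action_sets)
    a = tuple(int(rng.integers(len(A))) for A in action_sets)
    for _ in range(steps):
        a_next = []
        for i in range(n):
            if rng.random() < epsilon:
                a_next.append(a[i])                      # inertia: repeat previous action
                continue
            current = utility_fn(i, a)
            better = [ai for ai in range(len(action_sets[i]))
                      if utility_fn(i, a[:i] + (ai,) + a[i+1:]) > current]
            a_next.append(int(rng.choice(better)) if better else a[i])
        a = tuple(a_next)
    return a

# Example: the two-route congestion game from the previous sketch settles at a pure
# Nash equilibrium (the agents end up on different routes).
cost = lambda k: [0, 1, 3][k]
utility = lambda i, a: -cost(sum(1 for aj in a if aj == a[i]))
print(better_reply_with_inertia(utility, [(0, 1), (0, 1)]))
```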
The following theorem from Young (2004) (Theorem 6.2) demonstrates the
inherent robustness of weakly acyclic games.
Theorem 6. Consider any weakly acyclic game G. If all agents follow a finite
memory better reply process defined above, then the joint action profile will
converge almost surely to a pure Nash equilibrium of the game G.
One can view this result from two perspectives. The first perspective is that
the system designer has extreme flexibility in designing learning rules for weakly
acyclic games that guarantee the agents’ collective behavior will converge to a pure
Nash equilibrium. The second perspective is that perturbations of a nominal learning
rule, e.g., agents updating asynchronously or responding to delayed or inaccurate
histories, will also satisfy the conditions above and ultimately lead behavior to a
Nash equilibrium as well. These perspectives provide the basis for our claim of
robust distributed learning.
The preceding discussion focused largely on algorithms that ensured the emergent collective behavior constitutes a (pure) Nash equilibrium. In the case where there are multiple Nash equilibria, these algorithms provide no guarantees on which equilibrium is likely to emerge. Accordingly, characterizing the efficiency associated with the emergent collective behavior is equivalent to characterizing the efficiency associated with the worst-performing Nash equilibrium, i.e., the price of anarchy.
⁶ We write $a_i(t) = B_i^m(h^m(t))$ with the understanding that this implies that the action $a_i(t)$ is chosen randomly according to the probability distribution specified by $B_i^m(h^m(t))$.
⁷ The actual definition of a finite memory better reply process considered in Young (2004) puts a further condition on the structure of $B_i^m$ in the case where the memory is not saturated, i.e., the strategy assigns positive probability to any action with strictly positive regret. However, an identical proof holds for any $B_i^m$ that satisfies the weaker conditions set forth in this chapter.
Consider the following asynchronous best-reply process:
(i) At each time $t$, a single agent $i \in N$ is randomly selected and permitted to revise its action.
(ii) The selected agent $i$ selects a best response to the actions of the other agents, i.e.,
$$a_i(t+1) \in B_i(a_{-i}(t)) = \arg\max_{a_i \in A_i} U_i(a_i, a_{-i}(t)). \qquad (27)$$
(iii) All other agents $j \ne i$ play their previous actions, i.e., $a_{-i}(t+1) = a_{-i}(t)$.
(iv) The process is then repeated.
It is straightforward to see that the above process will converge almost surely to a pure Nash equilibrium in any potential game by observing that $\phi(a(t+1)) \ge \phi(a(t))$ for all times $t$. Accordingly, the efficiency guarantees associated with the application of this algorithm to a potential game are in line with the price of anarchy of the game.
Here, a slight modification, or perturbation, of the above best-reply dynamics is introduced that ensures that the resulting behavior leads to the pure Nash equilibrium that optimizes the potential function, i.e., $a^{opt} \in \arg\max_{a \in A} \phi(a)$. The algorithm, known as log-linear learning or the logit response dynamics (Alos-Ferrer and Netzer 2010; Blume 1993, 1997; Marden and Shamma 2012; Young 1998), follows the best-reply process highlighted above where step (ii) is replaced by a noisy best response. More formally, step (ii) is now of the form:
(ii) The selected agent $i$ selects the action $a_i(t+1)$ according to a probability distribution $p_i(t) = \{p_i^{a_i}(t)\}_{a_i \in A_i} \in \Delta(A_i)$ of the form
$$p_i^{a_i}(t) = \frac{e^{(1/T)\, U_i(a_i, a_{-i}(t))}}{\sum_{\tilde{a}_i \in A_i} e^{(1/T)\, U_i(\tilde{a}_i, a_{-i}(t))}}, \qquad (28)$$
where $T > 0$ is a temperature parameter.
A few remarks are in order regarding the update protocol specified in (28). First, when $T \to \infty$, the agent's strategy is effectively a uniform distribution over the agent's action set. Second, when $T \to 0^+$, the agent's strategy is effectively the best response strategy given in (27). Lastly, we present this algorithm (and the forthcoming binary log-linear learning) with regard to a fixed temperature parameter that is common to all agents. However, there are variations of this algorithm which allow for annealing of this temperature parameter that preserve the resulting asymptotic guarantees, e.g., Zhu and Martínez (2013).
The following theorem establishes the asymptotic guarantees associated with
the learning algorithm log-linear learning in potential games (Blume 1993, 1997;
Young 1998).
Theorem 7. Consider any potential game G with potential function $\phi$. If all players follow the learning algorithm log-linear learning with temperature $T > 0$, then the resulting process has a unique stationary distribution $\pi = \{\pi^a\}_{a \in A} \in \Delta(A)$ of the form
$$\pi^a = \frac{e^{(1/T)\,\phi(a)}}{\sum_{\tilde{a} \in A} e^{(1/T)\,\phi(\tilde{a})}}. \qquad (29)$$
The stationary distribution of the process given in (29) follows the same intuition as presented for the update protocol in (28). That is, when $T \to \infty$ the stationary distribution is effectively a uniform distribution over the joint action set A. However, when $T \to 0^+$, all of the weight of the stationary distribution is concentrated on the action profiles that maximize the potential function $\phi$. The above stationary distribution provides an accurate assessment of the resulting asymptotic behavior due to the fact that the log-linear learning process is both irreducible and aperiodic; hence, (29) is the unique stationary distribution.
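A sketch of log-linear learning with the update (28) is given below (illustrative; one uniformly chosen agent revises per step, and the identical-interest example is our own); the empirical visit frequencies concentrate on the potential maximizer, consistent with (29).

```python
import numpy as np

def log_linear_learning(utility_fn, action_sets, T=0.5, steps=20000, seed=0):
    """Log-linear learning: a randomly chosen agent plays a_i with probability
    proportional to exp(U_i(a_i, a_{-i}(t)) / T), cf. (28)."""
    rng = np.random.default_rng(seed)
    n = len(action_sets)
    a = [int(rng.integers(len(A))) for A in action_sets]
    visits = {}
    for _ in range(steps):
        i = int(rng.integers(n))                          # one agent revises at a time
        utils = np.array([utility_fn(i, tuple(a[:i] + [ai] + a[i+1:]))
                          for ai in range(len(action_sets[i]))])
        w = np.exp((utils - utils.max()) / T)             # numerically stable softmax
        a[i] = int(rng.choice(len(w), p=w / w.sum()))
        visits[tuple(a)] = visits.get(tuple(a), 0) + 1
    return visits

# Example: identical-interest game, so the potential equals the common payoff and the
# visit frequencies concentrate on its maximizer for small temperatures.
payoff = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
utility = lambda i, a: payoff[a]
visits = log_linear_learning(utility, [(0, 1), (0, 1)])
print(max(visits, key=visits.get))                        # -> (1, 1)
```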
Merging log-linear learning with the marginal contribution utility design given in Theorem 3 leads to the following corollary.
Corollary 1. Consider the class of resource allocation problems defined in Sect. 2.1 with agent set $N$, action sets $\{A_i\}$, and a global objective $W : A \to \mathbb{R}$. Consider the following game-theoretic control design:
(i) Assign each agent a utility function that captures the agent's marginal contribution to the global objective, i.e., the utility in (21) with $\phi = W$.
(ii) Have each agent follow the log-linear learning rule with temperature $T > 0$.
Observe that this design rule ensures that the resulting asymptotic behavior will be concentrated around the allocations that maximize the global objective W. This fact has made this design methodology an attractive option for several domains including wind farms, sensor networks, and coordination of unmanned vehicles, among others.
Consider settings where each agent $i$, given its current action $a_i$, may only select its next action from a restricted set $R_i(a_i) \subseteq A_i$. We require the restricted action sets to satisfy the following two properties:
(i) Reversibility: Let $a_i, a_i'$ be any two action choices in $A_i$. If $a_i' \in R_i(a_i)$, then $a_i \in R_i(a_i')$.
(ii) Completeness: Let $a_i, a_i'$ be any two action choices in $A_i$. There exists a sequence of actions $a_i = a_i^0, a_i^1, \ldots, a_i^m = a_i'$ with the property that $a_i^{k+1} \in R_i(a_i^k)$ for all $k \in \{0, \ldots, m-1\}$.
One motivation for considering restricted action sets of the above form is when the individual agents have mobility limitations, e.g., mobile sensor networks.
Note that the log-linear learning update rule given in (28) has full support on the agent's action set $A_i$, thereby disqualifying this algorithm for use in the case where there are restrictions in action sets. Here, we seek to address the question of how to alter the algorithm so as to preserve the asymptotic guarantees, i.e., convergence in
the stationary distribution to the action profile that maximizes the potential function. One natural variation would be to replace (28) with a strategy of the form: for any $a_i \in R_i(a_i(t))$,
$$p_i^{a_i}(t) = \frac{e^{(1/T)\, U_i(a_i, a_{-i}(t))}}{\sum_{\tilde{a}_i \in R_i(a_i(t))} e^{(1/T)\, U_i(\tilde{a}_i, a_{-i}(t))}},$$
and $p_i^{a_i}(t) = 0$ for any $a_i \notin R_i(a_i(t))$. However, such modifications can have drastic consequences on the resulting asymptotic guarantees. In fact, such a rule is not even able to guarantee that the potential function maximizer is in the support of the limiting distribution as $T \to 0^+$ (Marden and Shamma 2012).
Here, we introduce a variation of log-linear learning, termed binary log-linear learning with restricted action sets (Marden and Shamma 2012), that preserves these asymptotic guarantees. Binary log-linear learning follows the same setup as log-linear learning where step (ii) is now of the form:
(ii) Agent $i$ selects a trial action $a_i^t \in R_i(a_i(t))$ according to any distribution with full support on the set $R_i(a_i(t))$. Conditioned on the selection of this trial action, the agent selects the action $a_i(t+1)$ according to a probability distribution $p_i(t) = \{p_i^{a_i}(t)\}_{a_i \in A_i} \in \Delta(A_i)$ of the form
$$p_i^{a_i}(t) = \begin{cases} \dfrac{e^{(1/T)\, U_i(a(t))}}{e^{(1/T)\, U_i(a(t))} + e^{(1/T)\, U_i(a_i^t, a_{-i}(t))}} & \text{for } a_i = a_i(t), \\[2mm] \dfrac{e^{(1/T)\, U_i(a_i^t, a_{-i}(t))}}{e^{(1/T)\, U_i(a(t))} + e^{(1/T)\, U_i(a_i^t, a_{-i}(t))}} & \text{for } a_i = a_i^t, \end{cases} \qquad (33)$$
with $p_i^{a_i}(t) = 0$ for all other actions $a_i \in A_i$.
Much like log-linear learning, for any temperature $T > 0$ binary log-linear learning can be modeled by an irreducible and aperiodic Markov chain over the state space A; hence, there is a unique stationary distribution, which we denote by $\pi(T) = \{\pi^a(T)\}_{a \in A}$. While log-linear learning provides the explicit form of the stationary distribution $\pi(T)$, the value of log-linear learning centers on the fact that the support of the limiting distribution is precisely the set of potential function maximizers, i.e.,
$$\lim_{T \to 0^+} \pi^a(T) > 0 \iff a \in \arg\max_{a' \in A} \phi(a').$$
The action profiles contained in the support of the limiting distribution are termed the stochastically stable states. Accordingly, log-linear learning ensures that an action profile is stochastically stable if and only if it is a potential function maximizer.
The following theorem from Marden and Shamma (2012) characterizes the long
run behavior of binary log-linear learning.
Theorem 8. Consider any potential game G with potential function $\phi$. If all players follow the learning algorithm binary log-linear learning with restricted action sets and temperature $T > 0$, then an action profile is stochastically stable if and only if it is a potential function maximizer.
This theorem demonstrates that a system designer can effectively deal with restrictions in action sets by appropriately modifying the learning rule. However, a consequence of this is that we are no longer able to provide a precise characterization of the stationary distribution as a function of the temperature parameter $T$. Unlike log-linear learning, binary log-linear learning applied to such a game does not satisfy reversibility unless there are additional constraints imposed on the agents' restricted action sets, i.e., $|R_i(a_i)| = |R_i(a_i')|$ for all $i \in N$ and $a_i, a_i' \in A_i$. Hence, in this theorem we forgo a precise analysis of the stationary distribution in favor of a coarse analysis that demonstrates roughly the same asymptotic guarantees.
variant of log-linear learning provides a mixing time that is nearly linear in the
number of agents for this class of congestion games.
where $[\cdot]^+$ denotes the projection to the positive orthant, i.e., $[x]^+ = \max\{x, 0\}$.
The following theorem characterizes the long run behavior of regret matching in any game.
Theorem 9. Consider any finite game G. If all players follow the learning algorithm regret matching defined above, then the positive regret of any agent $i \in N$ for any action $a_i \in A_i$ asymptotically vanishes, i.e.,
$$\lim_{t \to \infty} \left[ \frac{1}{t} \sum_{\tau = 0}^{t-1} \Big( U_i(a_i, a_{-i}(\tau)) - U_i(a(\tau)) \Big) \right]^+ = 0. \qquad (36)$$
The connection between the condition (36) and the definition of coarse correlated equilibria stems from the fact that an agent's regret and average utility can also be computed using the empirical frequency of play $z(t) = \{z^a(t)\}_{a \in A}$, where
$$z^a(t) = \frac{1}{t} \sum_{\tau = 0}^{t-1} I\{a(\tau) = a\}. \qquad (37)$$
Accordingly, if a sequence of play $a(0), a(1), \ldots, a(t-1)$ satisfies (36), then we know that the empirical frequency of play $z(t)$ satisfies, for every agent $i \in N$ and every action $a_i' \in A_i$,
$$\limsup_{t \to \infty} \; \sum_{a \in A} z^a(t) \big( U_i(a_i', a_{-i}) - U_i(a) \big) \le 0.$$
Hence, the limiting empirical frequency of play $z(t)$ is contained in the set of coarse correlated equilibria. Note that the convergence highlighted above does not state that the empirical frequency of play will converge to any specific coarse correlated equilibrium; rather, it merely states that the empirical frequency of play will approach the set of coarse correlated equilibria.
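The sketch below (illustrative; it implements the external-regret variant in which each action is played with probability proportional to its positive average regret, one common form of the regret matching rule of Hart and Mas-Colell (2000)) tracks the regrets and the empirical frequency (37).

```python
import numpy as np

def regret_matching(utility_fn, action_sets, steps=5000, seed=0):
    """Regret matching: each agent plays actions with probability proportional to
    their positive average regret; the empirical joint frequencies approach the
    set of coarse correlated equilibria."""
    rng = np.random.default_rng(seed)
    n = len(action_sets)
    regret = [np.zeros(len(A)) for A in action_sets]       # cumulative regret per action
    freq = {}
    a = tuple(int(rng.integers(len(A))) for A in action_sets)
    for t in range(1, steps + 1):
        freq[a] = freq.get(a, 0) + 1
        for i in range(n):
            received = utility_fn(i, a)
            for ai in range(len(action_sets[i])):
                regret[i][ai] += utility_fn(i, a[:i] + (ai,) + a[i+1:]) - received
        a_next = []
        for i in range(n):
            pos = np.maximum(regret[i], 0.0)               # [.]^+ projection
            p = pos / pos.sum() if pos.sum() > 0 else np.ones(len(action_sets[i])) / len(action_sets[i])
            a_next.append(int(rng.choice(len(p), p=p)))
        a = tuple(a_next)
    return {k: v / steps for k, v in freq.items()}         # empirical frequency z(t), cf. (37)

# Example: matching pennies; the empirical joint frequencies approach the set of coarse
# correlated equilibria, all of which have uniform marginals and expected payoff zero.
payoff = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
utility = lambda i, a: payoff[a] if i == 0 else -payoff[a]
z = regret_matching(utility, [(0, 1), (0, 1)])
print({k: round(v, 2) for k, v in z.items()})
```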
Lastly, we presented a version of regret matching that provides convergence to the set of coarse correlated equilibria. Variants of the presented regret matching could also ensure convergence to the set of correlated equilibria, which is a more restrictive solution concept than the coarse correlated equilibrium presented in Definition 3. We direct the readers to Hart and Mas-Colell (2000) and Young (2004) for the details associated with this variation.
the other three joint actions. This coarse correlated equilibrium yields an expected utility of 1/2 to each agent and is clearly more desirable. One could easily imagine other scenarios, e.g., team versus team games, where specific coarse correlated equilibria could provide significant performance improvements over any Nash equilibrium.
The problem with regret matching for exploiting this potential opportunity is that
behavior is not guaranteed to converge to any specific coarse correlated equilibrium.
Accordingly, the efficiency guarantees associated with coarse correlated equilibria
cannot be better than the efficiency bounds associated with pure Nash equilibria and
can often be quite worse. With this issue in mind, recent work in Marden (2015)
and Borowski et al. (2014) has sought to develop learning algorithms that converge
to the efficient coarse correlated equilibrium, where efficiency is measured by the
sum of the agents’ expected utilities. Here, the algorithm introduced in Marden
(2015) ensures that the empirical frequency of play will converge to the most
efficient coarse correlated equilibrium, while Borowski et al. (2014) provides an
algorithm that guarantees that the day-to-day behavior of the agents will converge to
the most efficient correlated equilibrium. Both of these algorithms view convergence
in a stochastic stability sense.
The motivation for these developments centers on the fact that joint randomization, which can potentially be characterized by correlated equilibria, can be key to providing desirable system-level behavior. One example of such a system is a peer-to-peer file sharing system where users engage in interactions with other users to transfer files of interest and satisfy demands (Wang et al. 2009). Here, Wang et al. (2009) demonstrate that the optimal system performance is actually characterized by the most efficient correlated equilibrium as defined above. Another example of such a system is the problem of access control for wireless communications, where a collection of mobile terminals compete for access to a common channel (Altman et al. 2006). Optimizing system throughput requires a level of correlation between the transmission strategies of the mobiles so as to minimize the chance of simultaneous transmissions and failures. The authors in Altman et al. (2006) study the efficiency of correlated equilibria in this context. Identifying the role of correlated equilibria (and learning strategies for attaining specific correlated equilibria) warrants further research attention.
5 Conclusion
The goal of this chapter has been to highlight a potential role of game-theoretic learning in the design of networked control systems. We reviewed several classes of learning algorithms, accentuating their performance guarantees and reliance on game structures.
It is important to reemphasize that game-theoretic learning represents just a single dimension of a game-theoretic control design. The other dimension centers on the assignment of objective functions to the individual agents. The structure of these agent objective functions not only dictates the convergence guarantees associated with
References
Alos-Ferrer C, Netzer N (2010) The logit-response dynamics. Games Econ Behav 68:413–427
Altman E, Bonneau N, Debbah M (2006) Correlated equilibrium in access control for wireless
communications. In: 5th International Conference on Networking
Babichenko Y (2012) Completely uncoupled dynamics and Nash equilibria. Games Econ Behav
76:1–14
Blondel VD, Hendrickx JM, Olshevsky A, Tsitsiklis JN (2005a) Convergence in multiagent
coordination, consensus, and flocking. In: IEEE Conference on Decision and Control,
pp 2996–3000
Blondel VD, Hendrickx JM, Olshevsky A, Tsitsiklis JN (2005b) Convergence in multiagent
coordination, consensus, and flocking. In: Proceedings of the Joint 44th IEEE Conference on
Decision and Control and European Control Conference (CDC-ECC’05), Seville
Blume L (1993) The statistical mechanics of strategic interaction. Games Econ Behav 5:387–424
Blume L (1997) Population games. In: Arthur B, Durlauf S, and Lane D (eds) The economy as an
evolving complex system II. Addison-Wesley, Reading, pp 425–460
Borowski H, Marden JR, Frew EW (2013) Fast convergence in semi-anonymous potential games.
In: Proceedings of the IEEE Conference on Decision and Control, pp 2418–2423
Borowski HP, Marden JR, Shamma JS (2014) Learning efficient correlated equilibria. In: Proceedings of the IEEE Conference on Decision and Control, pp 6836–6841
Cortes J, Martinez S, Karatas T, Bullo F (2002) Coverage control for mobile sensing networks.
In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’02),
Washington, DC, pp 1327–1332
Daskalakis C, Goldberg PW, Papadimitriou CH (2009) The complexity of computing a Nash
equilibrium. SIAM J Comput 39(1):195–259
Fudenberg D, Levine DK (1995) Consistency and cautious fictitious play. Games Econ Behav
19:1065–1089
Fudenberg D, Levine DK (1998) The theory of learning in games. MIT Press, Cambridge
Fudenberg D, Tirole J (1991) Game theory. MIT Press, Cambridge
Gopalakrishnan R, Marden JR, Wierman A (2014) Potential games are necessary to ensure pure
Nash equilibria in cost sharing games. Math Oper Res 39(4):1252–1296
Hart S (2005) Adaptive heuristics. Econometrica 73(5):1401–1430
Hart S, Mansour Y (2010) How long to equilibrium? The communication complexity of uncoupled
equilibrium procedures. Games Econ Behav 69(1):107–126
Hart S, Mas-Colell A (2000) A simple adaptive procedure leading to correlated equilibrium.
Econometrica 68(5):1127–1150
Hart S, Mas-Colell A (2003) Uncoupled dynamics do not lead to Nash equilibrium. Am Econ Rev
93(5):1830–1836
Jadbabaie A, Lin J, Morse AS (2003) Coordination of groups of mobile autonomous agents using
nearest neighbor rules. IEEE Trans Autom Control 48(6):988–1001
Kearns MJ, Littman ML, Singh SP (2001) Graphical models for game theory. In: Proceedings of
the 17th Conference in Uncertainty in Artificial Intelligence, pp 253–260
Lambert III TJ, Epelman MA, Smith RL (2005) A fictitious play approach to large-scale
optimization. Oper Res 53(3):477–489
Marden JR (2012) State based potential games. Automatica 48:3075–3088
Marden JR (2015) Selecting efficient correlated equilibria through distributed learning. In:
American Control Conference, pp 4048–4053
Marden JR, Shamma JS (2012) Revisiting log-linear learning: asynchrony, completeness and a
payoff-based implementation. Games Econ Behav 75(2):788–808
Marden JR, Shamma JS (2015) Game theory and distributed control. In: Young HP, Zamir S (eds)
Handbook of game theory with economic applications, vol 4. Elsevier Science, pp 861–899
Marden JR, Wierman A (2013) Distributed welfare games. Oper Res 61:155–168
Marden JR, Arslan G, Shamma JS (2009) Joint strategy fictitious play with inertia for potential
games. IEEE Trans Autom Control 54:208–220
Martinez S, Cortes J, Bullo F (2007) Motion coordination with distributed information. Control
Syst Mag 27(4):75–88
Monderer D, Shapley LS (1996) Fictitious play property for games with identical interests. J Econ
Theory 68:258–265
Montanari A, Saberi A (2009) Convergence to equilibrium in local interaction games. In: 50th
Annual IEEE Symposium on Foundations of Computer Science
Murphey RA (1999) Target-based weapon target assignment problems. In: Pardalos PM, Pitsoulis
LS (eds) Nonlinear assignment problems: algorithms and applications. Kluwer Academic,
Alexandra
Nisan N, Roughgarden T, Tardos E, Vazirani VV (2007) Algorithmic game theory. Cambridge
University Press, New York
Olfati-Saber R (2006) Flocking for multi-agent dynamic systems: algorithms and theory. IEEE
Trans Autom Control 51:401–420
Olfati-Saber R, Murray RM (2003) Consensus problems in networks of agents with switching
topology and time-delays. IEEE Trans Autom Control 49(9):1520–1533
Olfati-Saber R, Fax JA, Murray RM (2007) Consensus and cooperation in networked multi-agent
systems. Proc IEEE 95(1):215–233
Roughgarden T (2005) Selfish routing and the price of anarchy. MIT Press, Cambridge
Roughgarden T (2015) Intrinsic robustness of the price of anarchy. J ACM 62(5):32:1–32:42
Shah D, Shin J (2010) Dynamics in congestion games. In: ACM SIGMETRICS, pp 107–118
Shamma JS (2014) Learning in games. In: Baillieul J, Samad T (eds) Encyclopedia of systems and
control. Springer, London
Shoham Y, Powers R, Grenager T (2007) If multi-agent learning is the answer, what is the question?
Artif Intell 171(7):365–377
Touri B, Nedic A (2011) On ergodicity, infinite flow, and consensus in random models. IEEE Trans
Autom Control 56(7):1593–1605
Tsitsiklis JN (1987) Decentralized detection by a large number of sensors. Technical report. MIT,
LIDS
Tsitsiklis JN, Bertsekas DP, Athans M (1986) Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans Autom Control 35(9):803–812
Vetta A (2002) Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions. In: Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, pp 416–425
Wang B, Han Z, Liu KJR (2009) Peer-to-peer file sharing game using correlated equilibrium. In:
43rd Annual Conference on Information Sciences and Systems, CISS 2009, pp 729–734
Wolpert D, Tumer K (1999) An overview of collective intelligence. In: Bradshaw JM (ed) Handbook of agent technology. AAAI Press/MIT Press, Cambridge, USA
Young HP (1998) Individual strategy and social structure. Princeton University Press, Princeton
Young HP (2004) Strategic learning and its limits. Oxford University Press, New York
Zhu M, Martínez S (2013) Distributed coverage games for energy-aware mobile sensor networks.
SIAM J Control Optim 51(1):1–27