Matthew Sadler and Natasha Regan
Game Changer
AlphaZero’s Groundbreaking Chess
Strategies and the Promise of AI
New In Chess 2019
Contents
Explanation of symbols 6
Foreword by Garry Kasparov 7
Introduction by Demis Hassabis 11
Preface 16
Introduction 19
Part I AlphaZero’s history 23
Chapter 1 A quick tour of computer chess competition 24
Chapter 2 ZeroZeroZero 33
Chapter 3 Demis Hassabis, DeepMind and AI 54
Part II Inside the box 67
Chapter 4 How AlphaZero thinks 68
Chapter 5 AlphaZero’s style – meeting in the middle 87
Part III Themes in AlphaZero’s play 131
Chapter 6 Introduction to our selected AlphaZero themes 132
Chapter 7 Piece mobility: outposts 137
Chapter 8 Piece mobility: activity 168
Chapter 9 Attacking the king: the march of the rook’s pawn 208
Chapter 10 Attacking the king: colour complexes 235
Chapter 11 Attacking the king: 276
Chapter 12 Attacking the king: opposite-side castling 299
Chapter 13 Attacking the king: defence 321
Part IV AlphaZero’s opening choices 351
Chapter 14 AlphaZero’s opening repertoire 352
Chapter 15 The King’s Indian Sämisch 364
Chapter 16 The Carlsbad 380
Part V Conclusion 399
Chapter 17 Epilogue 400
Chapter 18 Technical note 406
Glossary 409
About the authors 413
Index of names 414
Preface
This book is about an exceptional chess player, a player whose published
the pinnacle of chess ability. A powerful attacker, capable of defeating
original strategies; and a player that developed its creative style solely by
playing games against itself.
That player is AlphaZero, a totally new kind of chess computer created by DeepMind.
Through learning about AlphaZero we can harness the new insights that
AI has uncovered in our wonderful game of chess and use them to build
on and enhance our human knowledge and skills. We talk to the people
who created AlphaZero, and discover the struggles that brilliant people
face when aiming for goals that have never before been achieved.
The authors feel extremely privileged to have worked with the creators
being right at the cutting edge of fast-developing technology that will have
Our collaboration arose following the publication of 10 AlphaZero
games during the December 2017 London Chess Classic tournament.
The previous year, Matthew and Natasha had won the English Chess Federation Book of the Year award for Chess for Life, a compilation
of interviews with icons of chess, highlighting themes and core concepts
of their games. We knew we could take a similar approach to AlphaZero,
learnings with the wider chess-playing community.
Who should read this book?
• keen chess players, looking to learn new strategies
AlphaZero’s chess is completely self-taught, stemming from millions of
games played against itself. Much of its play matches the accepted human
wisdom gathered over the past 200 years, which makes AlphaZero’s
play intuitive, allowing humans to learn from it. This book brings out
AlphaZero’s exquisite use of piece mobility and activity, with guidance
from Matthew through the simple, logical, schematic ways in which
AlphaZero builds up attacks against the opponent’s king’s position. We
believe these techniques will inspire professionals and club players alike.
•
As Demis Hassabis, CEO of DeepMind, explains, the application of AI
to games is a means to something greater: ‘We’re not doing this to just
solve games, although it’s a fun endeavour. These are challenging and
a stepping stone for us to build general-purpose algorithms that can be
deployed in all sorts of ways and in all sorts of industries to achieve great
things for society.’
Our interviews with the creative people who designed and built
AlphaZero are full of insights that, using chess as an example, help us to
• chess enthusiasts
As well as providing instructional material, this book is also a collection
of fascinating games of astonishing quality, featuring dashing attacks,
compared playing through these games to uncovering the lost notebooks
of a great attacking player of the past, such as his hero Alexander Alekhine.
How to read this book
The chess content of this book is arranged in discrete chapters and
designed to be read out of sequence, so it is perfectly possible to pick
a theme you are interested in and start in the middle of the book. The
chess content is not too heavy, with an emphasis on explanations rather
than variations. We would recommend playing through the games with
a chessboard. In our opinion, this promotes a measured pace of reading
most conducive to learning.
Acknowledgements
We would like to thank DeepMind, and in particular Demis Hassabis,
for the wonderful opportunity to study the games of AlphaZero, and for
his personal involvement in making this project a success. We would like
to thank Dave Silver, Lead Researcher on AlphaZero, as well as Thore
Graepel, Matthew Lai, Thomas Hubert, Julian Schrittwieser and Dharshan
Kumaran for their extensive technical explanations and their assistance
in running test games and test positions on AlphaZero. Nenad Tomasev
deserves a special mention for reviewing the chess content and giving us
plenty of great feedback!
A big debt of gratitude is owed to Lorrayne Bennett, Sylvia Christie,
Jon Fildes, Claire McCoy, Sarah-Jane Allen and Alice Talbert for all their
amazing work in keeping this project running and helping us with all
also like to thank everybody at DeepMind for making us feel so welcome.
Thanks are also due to Allard Hoogland and the team at New in Chess
who have published this book. They have supported our unique project
and have ensured that the book is beautifully presented.
We would like to thank our families for their enthusiasm and support and,
in the case of Matthew Selby, also for his technical expertise in extracting
All of these amazing people contributed to what has been a madly
enjoyable and memorable project.
Introduction
DeepMind published ‘Mastering Chess and shogi by Self-Play with a
General Reinforcement Learning Algorithm’. The paper described the
company’s self-learning AI AlphaZero, which, within 24 hours of starting
from random play and with no domain knowledge except the game rules,
achieved a superhuman level of play in the games of chess and shogi1.
strength from being entirely self-taught. It is momentous for chess players
which built its chess strategy independently of our own rich history
of chess development. It is also far-reaching for AI developers, with
AlphaZero achieving superhuman strength in a matter of hours without
up the possibility of using these AI techniques for applications where
In an interview later in this book, Demis Hassabis describes how the
success of AlphaZero builds on DeepMind’s earlier work creating AlphaGo,
a neural-network-based system that applied deep learning to successfully
defeat Go legend Lee Sedol in 2016, and how both are milestones in the
positively transform the world through AI. Among other things, it seeks to:
• help address the problems of climate change and energy;
• enable medical advances in diagnostics to make excellent medical care
more widely available;
•
to human well-being.
The importance of the AlphaZero story has impact far beyond DeepMind’s
and Go, developers around the world have been motivated to invest
techniques that created DeepMind’s AlphaGo to produce publicly available
professional-strength Go-playing machines, in what many consider to
be a tipping point for public participation in the advancement of AI. In
recent months the open-source Leela Chess Zero was developed based on
the AlphaZero paper, and is now a dangerous challenger to the traditional
such a central role in the development of this critical technology!
This new approach to machine self-learning in chess has given us a strong
chess player with a new style and approach, and that is the crux of this
book. AlphaZero has independently developed strategies that possess many
similarities to human wisdom, and many that are further developed or show
situations where our well-established positional ‘rules’ are ‘broken’.
In 2018, AlphaZero cannot yet explain to us directly what it has learnt
DeepMind and other groups are developing will make this possible in the
human praxis; and through detailed explanations based on illustrative
AlphaZero’s ideas can be incorporated into our own games.
This book explores the following chess themes:
• Outposts (Chapter 7): we examine the variety of ways in which
AlphaZero secures valuable posts for its pieces, from the knight and bishop
all the way up to the king itself.
• Activity (Chapter 8): AlphaZero is skilled in maximising the mobility
of its own pieces and restricting its opponent’s pieces. We pay particular
attention to the ways that AlphaZero restricts the opposing king.
• The march of the rook’s pawn (Chapter 9): AlphaZero frequently
advances its rook’s pawn as part of its attack and plants it close to the
opponent’s king.
• Colour complexes (Chapter 10): Matthew explains AlphaZero’s
fondness for positions with opposite-coloured bishops.
• : AlphaZero makes
• Opposite-side castling (Chapter 12): we consider some stunning
examples in which castling queenside was the prelude to a dangerous
AlphaZero attack.
• Defence (Chapter 13): we learn about the contrasting defensive
In addition, we have looked at the ways in which the thinking process
regularly uses engine assessments in their chess studies. We explore
we believe gives it the ability to head for generally promising positions,
have gathered have also revealed to us some features of engine analysis
knowledge should better equip chess engine users to understand their
assessments.
In the process of writing this book, we had access to previously
unpublished games2 and evaluations from AlphaZero. We believe that
there is a large amount of new and instructive material in this book that
we hope you will thoroughly enjoy reading and trying out in your games.
Matthew Sadler and
Natasha Regan,
London, November 2018
2 A description of the AlphaZero games we received and the technical settings used for
Part II – Inside the box
CHAPTER 4
How AlphaZero thinks
AlphaZero’s self-learning design is […]

In this chapter, we take a quick tour of the mechanics of AlphaZero’s thinking, as it trains and as it plays. This chapter uses information from a paper in the journal Science, released in December 2018: ‘A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play’6. David Silver, Lead Researcher on AlphaZero, explained the inner workings of AlphaZero, and research scientist Thore Graepel and research engineer Matthew Lai were on hand to answer our questions.

[…] from understanding AlphaZero’s thought process, including:

Professional chess players: Professional chess players now use engines for all of their pre- and post-game analysis, frequently switching engine according to the type of position they wish to analyse. By better understanding the skill sets of the various engines, professionals can make better use of them. AlphaZero’s thought processes can provide fresh insight into the strengths and weaknesses of traditional engines and help professional players to optimise their use of them.
6 By David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai,
Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy
Lillicrap, Karen Simonyan and Demis Hassabis. Science
1140-1144, DOI: 10.1126/science.aar6404.
Amateur chess players: AlphaZero’s thought processes are more human-like than traditional chess engines and we can pick up tips from how it makes its decisions.

[…]: […] thought processes gives pointers for making traditional engines stronger. In addition, developers are increasingly experimenting with […] using AI.

[…]: The architecture of AlphaZero is general and its combination of computational reasoning and intuition extends to many important problem domains beyond game play.

In this chapter we illustrate the AlphaZero thought process by taking a look under the bonnet at its analysis at a critical moment in the game – how deeply and widely it searches, what moves it considers, and how it evaluates the resulting positions.

In the next chapter, ‘AlphaZero’s style – meeting in the middle’, we relate the design of AlphaZero to its play and advance several hypotheses about AlphaZero’s playing style and evaluations. We then attempt to validate these hypotheses by observing AlphaZero’s games against […]

How AlphaZero works – […]

AlphaZero’s architecture is informed by the four principles that govern DeepMind’s approach […]:

1. Learning rather than being […]
The algorithm learns its strategy from examples rather than expert knowledge.
2. […]
The algorithm applies general principles and hence can be applied to multiple domains, e.g. shogi, Go, and chess.
3. […]
Learning is based on concrete observations rather than preconceived logical rules.
4. Active rather than passive
The machine explores the game rather than being instructed by a human.

By satisfying all four requirements, AlphaZero deviates considerably from traditional computer game-playing systems.

Thore Graepel described AlphaZero’s architecture as follows:
‘With approximately 10^47 […] be too computationally expensive to exhaustively search through every available move, and every possible sequence of moves that might follow in the game. Therefore, most chess engines – including AlphaZero – combine a search algorithm with an evaluation function that provides an estimate of how good a position is at any point in the game.

Traditional chess engines use variants of what is called alpha-beta tree search, enhanced by dozens of […] combine this with an evaluation function designed by expert chess players. In contrast, AlphaZero instead learns entirely on its own, developing its own evaluation function and using Monte Carlo tree search […] alternative to alpha-beta tree search, that has the added advantage of being able to take into account prior knowledge about which moves are promising and which ones are not. This allows the search to focus mostly on promising and relevant variations. Furthermore, MCTS is robust with respect to inaccuracies of the evaluation function, which […] positions.

Where, then, does the prior knowledge come from? This is where AlphaZero’s neural network – a computer system loosely modelled on the connections and neurons in the brain – comes in. The neural network takes the current game position as its input, and returns move probabilities for each possible move to be the […] – along with a value estimate for […] output guides the Monte Carlo tree search towards the most promising segments of the game tree. By reducing the number of moves considered in each position, the move probabilities cut down the breadth of the search. Being able to estimate the value of non-terminal positions in this way reduces the depth of the necessary search in the tree, because the value of the outcome of a given variation can be determined even before the end of the game is reached.

Crucially, the same algorithm is able to reach superhuman ability across several games without adapting the architecture for each […] the system displays a degree of generality: the same process of Monte Carlo tree search guided by a neural network, trained with self-play reinforcement learning, proves […]

Now, we zoom further into AlphaZero’s training for its clash […]
practical examples to illustrate the process.

How AlphaZero trains

Grandmasters might train for an upcoming match by spending many hours and days researching the latest openings and chess developments, and adopt a strict diet and exercise regime. They […] the opponent they are expecting […] involves collecting all of the prospective opponent’s games using a huge database of tournament games from all over the world, looking for weaknesses in the opponent’s play, and particularly in their opening set-ups.

In the last 20 years or so, top grandmasters have found they have to remember much more than previously, as they feel the need to adopt a wide range of openings to avoid the preparation of their opponents.

By contrast, AlphaZero’s training […] nine hours. It began training from a clean slate with no chess knowledge other than the rules of the game, […] play at all. AlphaZero also did not use any available chess opening knowledge, and instead worked out its own openings as it trained and played against itself.

At the start of these crucial nine hours, AlphaZero did play chess, but not as we know it. As anyone who has taught chess to a small child will know, random play will get you nowhere when playing chess. To avoid endless random games, those early games were stopped after a certain number of moves and called draws. Every now and then, though, some random games would end as wins for one side, and this rare signal allowed AlphaZero to learn the evaluation function […]

Intuitively, the system adjusts the parameters of the neural network such that it makes the moves played by the winning side more likely in their move probabilities, and it evaluates the positions encountered as more favourable to the winning side.

During those nine hours, AlphaZero played a total of 44 million games against itself – more than 1,000 games per second. At the same time, it continuously adjusted the parameters of its neural network so as to capture moves and outcomes from the most recent batch of games played against itself. For each move played during self-play, the MCTS performed 800 ‘simulations’, each of which extends the current search by one move while assessing the value of the resulting position. As an example, AlphaZero could begin to analyse the chess starting position like this:
[Table: AlphaZero’s first eight simulations from the starting position, showing the step in its thought process and the node considered at each step]

We’ve shown eight simulations in the table above. By the time AlphaZero has completed the 800 simulations used for each position encountered during its lightning-fast training games, it would have looked a few moves deep for the most plausible lines. However, the search during training is much shallower than during a tournament game.

Each time AlphaZero selects a variation to consider, it will be on the basis of three criteria:

1. how plausible the move is in this […]
2. how promising is the outcome of […]
3. how often this variation has been considered in the search.

If a given variation has not been considered many times before, if the move appears plausible and if the variation looks promising, then AlphaZero will tend to select the variation and its continuation for simulation. The evaluation of the initial position will then be the average of all position evaluations from each of the 800 simulations.

It should be noted that this is quite […] engines assess positions. Rather than returning an average of all lines considered, an engine such as […] the so-called principal variation, i.e. the very best line for both sides according to the current search tree. We believe that this is one of the reasons why AlphaZero plays […] traditional chess engines, often taking a more intuitive approach. We explore this further in the next chapter.

[…] traditional chess engines such […] function, which may account […] combination of positional features. […] evaluates a given position can be found in its evaluation guide. These […] position using online resources at […] Evaluation-Guide/:
[Table: a traditional engine’s evaluation broken down by component (rows include Initiative, Pieces, Space, Threats and Total), with White, Black and Total columns for the middlegame and the endgame]

[…] assesses the position using an evaluation function that is a linear combination of features, with two sets of weights, one for the middlegame and one for the endgame. […] number of factors for both White […] are showing as contributors for […] evaluation is the weighted sum of the various components. Note that beneath these groups of factors there is a more detailed list of […] taking hundreds of positional factors into consideration.

AlphaZero also has an evaluation […] chess engines of the last 50 years, which use handcrafted functions designed by grandmasters and […] represented as a linear combination of positional features, AlphaZero’s evaluation function is learnt and represented in terms of a neural network called the ‘value network’, trained to predict game outcomes based on a raw representation of chess positions.

As a result, AlphaZero is unconstrained by human design or lack of imagination and has […] features it takes into account when evaluating a given position.

But how AlphaZero’s value network works remains a bit of a mystery and cannot be explained in simple rules such as a knight being worth approximately three pawns. It is likely that AlphaZero’s value network views and assesses […] dependent way.
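To make the contrast concrete, here is a minimal Python sketch of a hand-crafted evaluation of the kind described above: a weighted sum of features, with one set of weights for the middlegame and one for the endgame. The feature names, weight values and the 0 to 24 game-phase scale are hypothetical, and blending the two sums by game phase is an assumption about how such engines typically combine their two sets of weights; this is not any particular engine’s code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureWeights:
    middlegame: float  # weight used while many pieces are on the board
    endgame: float     # weight used once most pieces have been exchanged


# Hypothetical weights, in centipawns per unit of each (hypothetical) feature.
WEIGHTS = {
    "material": FeatureWeights(middlegame=1.0, endgame=1.0),
    "mobility": FeatureWeights(middlegame=4.0, endgame=2.0),
    "space": FeatureWeights(middlegame=2.0, endgame=0.0),
}

# Hypothetical game-phase scale: 24 = full middlegame, 0 = pure endgame.
MAX_PHASE = 24


def linear_eval(features: dict[str, float], phase: int) -> float:
    """Hand-crafted evaluation: each feature contributes independently."""
    mg = sum(WEIGHTS[name].middlegame * value for name, value in features.items())
    eg = sum(WEIGHTS[name].endgame * value for name, value in features.items())
    # Blend the two weighted sums according to how far the game has progressed.
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE


features = {"material": 100.0, "mobility": 5.0, "space": 10.0}
print(linear_eval(features, phase=24))  # 140.0 (middlegame weights only)
print(linear_eval(features, phase=0))   # 110.0 (endgame weights only)
```

Because each feature is multiplied by its own fixed weight and then summed, a unit of mobility is worth the same whatever else is on the board. A learned value network is not restricted in this way and can weigh features according to the rest of the position.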
So, rather than being constrained to, say, an evaluation function that adds separate assessments of material and mobility, AlphaZero can consider the interaction of […] be very useful in understanding the overall position, and could explain how AlphaZero implements combinations of positional motifs […] we cannot understand exactly how AlphaZero is thinking, we can explore the ways in which AlphaZero generates its innovative and active plans, and how it conducts its ferocious attacks through analysing its games.

We asked Silver to explain a little deeper how the training process works and progresses:

How does AlphaZero’s neural network give a value for a position?
AlphaZero sees the chessboard as an 8x8 grid of numerical values. These values are processed by a series of computational steps known as layers of the neural network. Each layer takes the previous 8x8 board representation and constructs a new 8x8 board that can represent richer features. This process is repeated over many layers to produce ever more powerful representations of the board. The nature of each layer is determined by millions of tunable weights, which means that the system can learn for itself what features to represent. Finally, AlphaZero combines all of these features together, using even more tunable weights, to determine the […]

[…] opposite-coloured bishops, and you add those values all up to get a score. Does AlphaZero create its own function and does it have the same positional features in mind?
AlphaZero learns its own features by tuning the connections of its neural network. So while AlphaZero could in principle learn a feature […] pawns’ it could equally see the […] perhaps learning complex features that are useful to the machine but hard for a human to interpret.

How does AlphaZero improve during training?
[…] the connections in the value network are updated to evaluate each position in that game more […] same time, the connections in the policy network are strengthened so as to play more often the move recommended by AlphaZero itself, after a lot of thinking by its Monte Carlo tree search. AlphaZero plays against itself millions of times,
learning to provide better move […] something akin to ‘intuition’ for how to play the game. This process of learning for itself, solely from its interactions, is known as ‘reinforcement learning’.

If you left AlphaZero training against itself for a very long time, would it just keep getting better and better?
When we trained AlphaZero on Go, we saw its performance continue to improve over a very long training time. However, training AlphaZero on chess appears to have diminishing returns, perhaps due to the large number of draws that start occurring during self-play.

Matthew Lai explained the nature of these training games:

How does it work with those training games? Are they just very fast games?
Each training game is played very quickly, using about 40 milliseconds thinking time per move to execute a Monte Carlo tree search consisting of 800 simulations.

When AlphaZero was training against itself, did many of the games result in draws?
In the beginning, almost all games ended in draws by the 50 moves […] play is almost entirely random. Towards the end of the training we observed similar draw rates to […] when they play against themselves – about 70-80%. This increases to […]

Can you get a sense of how AlphaZero’s play develops as it trains?
Periodically during training, we take snapshots and play through some games, using AlphaZero at each given stage in its training. We don’t want to take the training games themselves because they are played at about 40 milliseconds per move, but we take a snapshot and play longer games to see how it is progressing. One interesting thing we found is that AlphaZero re-discovers opening sequences that are frequently played by human players as well. What we found most amazing is that, as training progresses, AlphaZero often discards those known variations […] them!

It looks to us like AlphaZero uses piece mobility well and is a fantastic attacker. Do you think it looks at these concepts in […] mathematically?
Those are well-known concepts in the computer chess literature, but in traditional chess engines they are usually applied with minimal or no selectivity. As a consequence, they
have to be given low weights so that they do not exert an overly strong […] AlphaZero, the highly non-linear nature of neural networks means it can potentially learn to apply them much more selectively, and with […] the features are valid. In addition, since AlphaZero maximises expected score, it is not so tied to keeping the material balance.

Is the speed of training games the sort of thing you might change if you were changing the training process? You might give it longer?
[…] good you want the moves to be […] want for training. And the more games the system can see, the […] across them and the less it will […] individual game.

It is important to comment […] between the hardware used to train AlphaZero and the hardware used by AlphaZero in match […] generate self-play games, and 16 […] to train the neural networks. These computing resources minimise the time taken to complete the training.

AlphaZero used a single machine […]

Some game-playing computers simulate outcomes as they play […] millions of games whilst training, but does not use the technique of simulating to the end of the […] play. Once AlphaZero’s training is complete, the latest neural nets […] play. As Matthew Lai explained to us, DeepMind’s earlier versions of AlphaGo used to conduct random rollouts during play. This is not necessary for AlphaZero because its value network is already so advanced that additional rollouts during play do not add any value. As a consequence, though, there is no randomness built into AlphaZero as it plays.

We asked Matthew Lai whether AlphaZero would play the same game twice:

When it’s playing, does AlphaZero have any randomness in its play?
When AlphaZero is playing against itself during training, it is very important that we see a wide variety of positions and moves. This is achieved by explicitly adding randomness to its move selection.
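The answer says only that randomness is ‘explicitly’ added to move selection during self-play; it does not say how. One common approach, assumed here purely for illustration, is to sample a move with probability proportional to how often the search visited it, sharpened or flattened by a ‘temperature’ parameter. The function name, the visit counts and the temperature values below are hypothetical.

```python
import random


def select_move(visit_counts: dict[str, int], temperature: float, rng: random.Random) -> str:
    """Choose a move from search visit counts.

    temperature == 0 gives deterministic play (always the most-visited move);
    temperature == 1 samples each move in proportion to its visit count;
    larger values flatten the distribution and add more variety.
    """
    if temperature == 0:
        return max(visit_counts, key=lambda move: visit_counts[move])
    moves = list(visit_counts)
    weights = [visit_counts[move] ** (1.0 / temperature) for move in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


rng = random.Random(2018)  # seeded so the example is reproducible
counts = {"d5": 30, "c6": 10, "Bd3": 5}  # hypothetical visit counts
print(select_move(counts, temperature=0, rng=rng))  # always d5
samples = [select_move(counts, temperature=1.0, rng=rng) for _ in range(10_000)]
print(samples.count("d5") / len(samples))  # roughly 30 / 45, i.e. about 0.67
```

With a temperature of 0 the choice is fully deterministic; any positive temperature occasionally picks less-searched moves, which is the kind of variety of positions and moves the answer describes.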
After training, when AlphaZero is playing matches, there is still some randomness due to the parallel nature of the hardware used; also we sometimes add randomness to the opening to ensure diverse […] evaluation.

The AlphaZero thought process […] move

We can illustrate AlphaZero’s thought process in a match situation […] in detail at one particular position […]

Let us follow AlphaZero’s steps as it thinks and decides what move to make next. The position we have chosen comes from the game […] ‘Exactly how to attack […]’: a fabulous decisive game which we will come back to several times in this book.

The position we have chosen occurs […]

In the above diagram AlphaZero is considering its 30th move as White.

In the opening, AlphaZero had […] after gaining the bishop pair. I had expected AlphaZero to line up […] but instead AlphaZero dedicated […] centre with the goal of opening diagonals for its bishops to support […] game at the critical moment.

In the coming pages, we will present snapshots of AlphaZero’s thinking at various points in its thought process, from the beginning – when it has searched very few branches of its tree of variations – to the end of its thought process when it has […] evaluation of the position.

To help us, Matthew Lai has provided us with trees of the moves that AlphaZero considered, together with supplementary information such as the evaluation of the move. […] after just 64 nodes of search, and walk you through the moves and the information displayed:

[see next page]
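[Search tree: root position evaluated at 0.657, with 100.0% of node searches. Each branch shows the move with its prior probability, AlphaZero’s evaluation after the move, and the share of node searches spent on it.]

Move (prior)    Evaluation   % of node searches
Bd3 (29.77%)    0.601        19.4%
Bf3 (18.82%)    0.645        13.4%
c6 (16.15%)     0.773        10.4%
d5 (10.21%)     0.871        7.5%
Bg2 (4.75%)     0.616        4.5%
f3 (3.50%)      0.673        4.5%
Bh1 (4.75%)     0.616        3.0%
Qd2 (3.50%)     0.726        3.0%
Re2 (0.41%)     0.659        3.0%
Qd3 (0.35%)     0.659        3.0%
Bc3 (0.22%)     0.508        3.0%
Rb1 (0.08%)     0.616        3.0%
Kb1 (0.07%)     0.570        3.0%
Qe2 (1.90%)     0.673        1.5%
Rg1 (1.40%)     0.659        1.5%
Qc4 (1.20%)     0.687        1.5%
Rh1 (1.03%)     0.659        1.5%
Rcd1 (0.65%)    0.687        1.5%
Qd1 (0.26%)     0.687        1.5%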
That looks scary, doesn’t it? That was exactly my thought when I saw it too, but some explanations from Matthew Lai and from my co-author Natasha helped enormously. Let’s zoom into a small part of the tree:
The root node

1. AlphaZero’s evaluation of the position
[…] AlphaZero’s evaluation of the position from White’s point of view. 0.657 means a 65.7% expected score, i.e. better for White. This expected score is made up of a combination […] For example, 65.7% wins, no draws and 34.3% losses would give a 65.7% expected score, as would 31.4% wins, 68.6% draws and no losses.

2. The total percentage of node searches spent
100% means that 100% of AlphaZero’s node searches were used to produce this result. We will always see 100% for the position at the […] As AlphaZero searches deeper down the tree into the branches, it divides up the available time and energy between moves and variations, spending the bulk of its time on the moves it considers most important. When we move on to looking at the branches, we will see percentages of less than 100%, which give the proportion of the time that AlphaZero has allocated to each possibility.
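Both examples in the text are consistent with the usual chess scoring convention of 1 for a win, ½ for a draw and 0 for a loss. Since the sentence defining the combination is incomplete, we assume that convention here; a minimal Python check:

```python
def expected_score(wins: float, draws: float, losses: float) -> float:
    """Expected score from White's point of view (win = 1, draw = 1/2, loss = 0)."""
    if abs(wins + draws + losses - 1.0) > 1e-9:
        raise ValueError("win, draw and loss probabilities must sum to 1")
    return wins + 0.5 * draws


print(round(expected_score(0.657, 0.0, 0.343), 3))  # 0.657
print(round(expected_score(0.314, 0.686, 0.0), 3))  # 0.657
```

The same expected score can therefore hide very different mixtures of wins, draws and losses.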
[…] moves, with a percentage value next to each of them.

The percentage represents the prior move probability, and as we will see throughout our trees, this number will never change for a given position.

The number indicates how likely AlphaZero believes it is at the beginning of its thought process that it would eventually choose this move. It is like showing a position to a grandmaster and asking them […] glance and which might be best. The grandmaster might come up […] them from most likely to least likely. This is just what AlphaZero is doing to prioritise its analysis.

In chess terms: AlphaZero […] 3.50% likely to be its choice in this position. Not very likely therefore, but not impossible. Compare that to the incomprehensible and illogical […] a 0.22% chance of being selected […] resources spent on that move […] AlphaZero’s evaluation of the […] is pretty good, and it spent 3.0% of its total node searches on that move. In human terms, AlphaZero […]

To recapitulate, working from top to bottom, we can see:

1. AlphaZero’s overall evaluation of the position;
2. the moves it has looked at, and how likely AlphaZero thinks it […]
3. AlphaZero’s evaluation of the position after each move;
[…] spent considering the move.
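The quantities in this recap are the raw material for the three selection criteria described earlier: the prior probability measures how plausible a move is, the evaluation how promising it looks, and the search count how often it has already been considered. The sketch below combines them with a PUCT-style rule of the kind described in DeepMind’s AlphaZero papers: a move’s average evaluation plus an exploration bonus that grows with its prior and shrinks as the move is searched more. The class and function names, the constant `c_puct = 1.5`, the value assumed for unsearched moves and the visit counts in the second example are hypothetical, and the real system’s details may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    prior: float            # prior move probability: fixed for the position
    visits: int = 0         # how often this move has been searched so far
    value_sum: float = 0.0  # sum of expected scores backed up through this move

    def mean_value(self, unvisited_value: float) -> float:
        return self.value_sum / self.visits if self.visits else unvisited_value


def select_child(children: list[Child], c_puct: float = 1.5,
                 unvisited_value: float = 0.5) -> Child:
    """Pick the child with the best average value plus exploration bonus."""
    parent_visits = sum(child.visits for child in children)
    # Use at least 1 so that the priors still rank moves before any search.
    scale = math.sqrt(max(parent_visits, 1))

    def score(child: Child) -> float:
        bonus = c_puct * child.prior * scale / (1 + child.visits)
        return child.mean_value(unvisited_value) + bonus

    return max(children, key=score)


# Before any search, plausibility (the prior) decides: Bd3 is tried first.
fresh = [Child("Bd3", 0.2977), Child("Bf3", 0.1882), Child("d5", 0.1021)]
print(select_child(fresh).move)  # Bd3

# Later, with hypothetical visit counts, a promising but less-searched move wins.
searched = [Child("Bd3", 0.2977, 13, 13 * 0.601),
            Child("Bf3", 0.1882, 9, 9 * 0.645),
            Child("d5", 0.1021, 5, 5 * 0.871)]
print(select_child(searched).move)  # d5
```

As in the trees, the prior never changes during the search; only the visit counts and average evaluations do, which is how a move with a modest prior can overtake the initial favourite.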
The alert among you will have noticed a couple of interesting points:
1. The percentages on each level of the tree don’t add up to 100%.
This is because we are not showing all the moves that AlphaZero considered.
2. The overall evaluation of the position (0.657 = 65.7% expected score) does not match the evaluation of any individual move.
A traditional engine works on the basis that the evaluation of the best move determines the evaluation of the position. AlphaZero takes a more probabilistic view. AlphaZero essentially evaluates the position as a whole by taking into account the evaluations of all the moves it looks at, giving more weight to the moves it considers more deeply.
We will discuss the advantages and disadvantages of this approach and give some practical examples. Just in general, however, such an approach might end up mimicking human intuition, where players steer for a position because ‘it feels good’ and then work out a concrete line when the position arises.
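The difference between the two views can be sketched with a few hypothetical numbers. The sketch assumes a Monte Carlo-style average: the position's value is the mean of the move evaluations, weighted by how many node searches each move received. That is one plausible reading of "giving more weight to the moves it considers more deeply". The traditional view is shown simply as taking the best move's value. Move names, visit counts and scores are invented, and a real search also folds in the network's first evaluation of the position itself, which is ignored here.

```python
def minimax_style_value(values):
    """Traditional view: the position is worth exactly what its best move is worth."""
    return max(values.values())


def visit_weighted_value(values, visits):
    """Probabilistic view (assumed averaging): move values weighted by search effort."""
    total_visits = sum(visits.values())
    return sum(values[m] * visits[m] for m in values) / total_visits


# Hypothetical expected scores (White's point of view) and node-search counts.
values = {"a": 0.60, "b": 0.70, "c": 0.40}
visits = {"a": 175, "b": 600, "c": 25}

print(minimax_style_value(values))                     # 0.7
print(round(visit_weighted_value(values, visits), 5))  # 0.66875
```

The weighted value matches none of the individual move scores, just as AlphaZero's overall 0.657 did not match any single move in the tree.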
very broad, with AlphaZero’s top
the position.
We have put the results into a table to help readability:
[Table: AlphaZero’s candidate moves, ranked by prior move probability, by evaluation (expected score) and by the actual % of node searches spent on each move]
At this very early stage of thinking, AlphaZero’s search is quite broad, and AlphaZero has spent some time searching moves which it thinks are quite unlikely and whose evaluations aren’t particularly special either.
From prior expectations, AlphaZero
most likely move, but its intuition is not borne out by the evaluation of the move. Its fourth most likely move – 30.d5 – storms to the top of the evaluation table!
Part III – Themes in AlphaZero’s play
Section B – Reducing the mobility of the
opponent’s forces to create opportunities
In this section AlphaZero reduces the opponent’s forces to passivity using
up an advanced rook’s pawn on
to be a useful advantage for the
of White’s pieces. The knight decentralises to win the pawn, the dark-squared bishop is restricted by pawn phalanxes on b6/c5 and h4/g3, and most importantly the king is rendered passive. AlphaZero converts its edge by keeping the white king pinned to the corner
useless knight on f6, a rook tied to the back rank and an unstoppable passed c-pawn to contend with.
the opponent’s activity [31…c5]
2. Reducing the opponent’s activity with pawn advances [31…c5, 32…g5, 33…g4, 34…g3]
active pieces to leave passive
AlphaZero
London 2018
Already having some experience of AlphaZero’s play, I was expecting
squared bishop for the knight on d4 and play the rook and opposite-coloured bishops middlegame, hoping to invade the weakened central light squares and play on
That probably wasn’t a bad plan, but AlphaZero’s creative strategy took me completely by surprise.
Chapter 8 – Piece mobility: activity
my eyes was one of Black’s strongest assets!
However, AlphaZero is trading this asset for a series of other dynamic plusses.
With this manoeuvre, Black has gained space on the kingside and severely restricted the freedom of the white king and dark-squared bishop.
has gained a new potential channel for entry into the White position:
White is by no means helpless as Black’s position is extended and AlphaZero has a lot of squares to protect with limited forces.
to have it all under control and
temporary activity before proceeding to push White back and reclaim all that White has gained.
in this position fails to deliver anything resembling clear equality:
for White due to the weakness of
AlphaZero’s pieces advance inexorably, increasing the
pieces and the opponent’s pieces.
its box on the kingside: the bishop stops the king escaping via f1-e2 whilst the g3-pawn covers f2 and h2.
Typical AlphaZero play, exchanging
opponent with passive pieces: we will see another example on move 57.
Playing around with the engines
on an outpost on f6 but unable
are testament to the grandeur of AlphaZero’s strategy. The c-pawn
c3 wins.
HISTORICAL PARALLEL
This encounter between the current World Champion Magnus Carlsen and the Ukrainian genius Vasily Ivanchuk has
themes of AlphaZero’s games in the
which AlphaZero built up play on the
counterplay never got going. The
the decisive invasion happens on
strongest, reminds me of AlphaZero.
2690
2750
Morelia/Linares 2007 (11)
arisen in which White’s strong centre is counter-balanced by
which tends to hang around a bit in the early middlegame (as here on a5). There is another less obvious challenge to Black’s position: his kingside has been weakened by the exchange of his king’s knight on
15.h4
The young Carlsen plays a move that AlphaZero likes too! White exerts pressure on the black kingside with the h-pawn.
this position. The queen assists
but loses sight of the kingside dark squares a little. A high-class game
Bukavshin, Aix-les-Bains 2011.
We saw this idea in AlphaZero’s
on e5 reduces the activity of Black’s
Black’s dark-squared weaknesses on that wing too by creating an outpost on f6 for a white bishop or knight.
AlphaZero chooses the same approach in this position!
Exchanging the rooks prevents the knight on a5 from activating itself
maintain the attacking tempo.
A powerful move, threatening
AlphaZero-style use of the advanced h-pawn to target the dark squares around the black king.
A fantastic switchback! Black must defend his kingside (which he was forced to weaken with 22...h6) but this gives White a tactical opportunity to exploit the pin on the black knight on the open
Magnus Carlsen: an AlphaZero-like switchback at 17.
counterplay has become White’s decisive channel of entry, as so often happens when the mobility of one side is much greater than the other.
Winning a piece.
As we discuss in the chapter on
using its queen to slow down the build-up of an opponent’s attack by harrying the opponent’s pieces and sowing confusion. The game ‘Using a
’ is a
It was a striking theme from the games released in December 2017 that AlphaZero could render
passive.
Let’s see one of those games:
A remarkable occurrence: in the middlegame the black queen is
legal move.
the opponent’s activity
AlphaZero
London 2017
to box in the black king.
Black has been under pressure since
Game: ‘Risky rooks’
is the choice of my engines when analysing for six hours or more.
make the queen wish she had not retreated to the corner.
The black queen is completely imprisoned. Black can only sit and await its fate.
HISTORICAL PARALLEL
reminded me of an idea played by
elite players until her retirement
Laurent Fressinet. The following position was reached after 24
2656
2536
Istanbul ol 2000 (9)
As Judit explains in volume two of her Best Games collection, she thought for 32 minutes after Black’s 24th move, which shows the
A fantastic move. The queen ties down the black king and the rook on h8 completely while preparing
on f6 covers d8 and stops the
but after much examination, she
the coordination of White’s pieces and take control of the e5-square: the rook cannot be taken because of mate on e1. But then inspiration and tactical genius struck!
now on e1, which means that after
Wonderful tactical ingenuity! The defence in the game also didn’t work.
for White, with queen and three enormous passed pawns on the queenside.