AlphaZero Chess Strategies Unveiled

The book 'Game Changer' by Matthew Sadler and Natasha Regan explores the revolutionary chess strategies developed by AlphaZero, an AI that learned to play chess solely through self-play. It highlights how AlphaZero's unique approach to piece mobility and activity can enhance human chess skills and understanding. The authors aim to share insights from AlphaZero's play with chess enthusiasts and professionals, emphasizing its potential impact on both chess and broader AI applications.


Matthew Sadler and Natasha Regan

Game Changer

AlphaZero’s Groundbreaking Chess


Strategies and the Promise of AI

New In Chess 2019


Contents
Explanation of symbols
Foreword by Garry Kasparov
Introduction by Demis Hassabis
Preface
Introduction

Part I AlphaZero’s history

Chapter 1 A quick tour of computer chess competition
Chapter 2 ZeroZeroZero
Chapter 3 Demis Hassabis, DeepMind and AI

Part II Inside the box

Chapter 4 How AlphaZero thinks
Chapter 5 AlphaZero’s style – meeting in the middle

Part III Themes in AlphaZero’s play

Chapter 6 Introduction to our selected AlphaZero themes
Chapter 7 Piece mobility: outposts
Chapter 8 Piece mobility: activity
Chapter 9 Attacking the king: the march of the rook’s pawn
Chapter 10 Attacking the king: colour complexes
Chapter 11 Attacking the king:
Chapter 12 Attacking the king: opposite-side castling
Chapter 13 Attacking the king: defence

Part IV AlphaZero’s opening choices

Chapter 14 AlphaZero’s opening repertoire
Chapter 15 The King’s Indian Sämisch
Chapter 16 The Carlsbad

Part V Conclusion

Chapter 17 Epilogue
Chapter 18 Technical note

Glossary
About the authors
Index of names


Preface
This book is about an exceptional chess player, a player whose published

the pinnacle of chess ability. A powerful attacker, capable of defeating

original strategies; and a player that developed its creative style solely by
playing games against itself.
That player is AlphaZero, a totally new kind of chess computer created

Through learning about AlphaZero we can harness the new insights that
AI has uncovered in our wonderful game of chess and use them to build
on and enhance our human knowledge and skills. We talk to the people
who created AlphaZero, and discover the struggles that brilliant people
face when aiming for goals that have never before been achieved.
The authors feel extremely privileged to have worked with the creators

being right at the cutting edge of fast-developing technology that will have

Our collaboration arose following the publication of 10 AlphaZero


games during the December 2017 London Chess Classic tournament.
The previous year, Matthew and Natasha had won the English Chess
Chess for Life, a compilation
of interviews with icons of chess, highlighting themes and core concepts
of their games. We knew we could take a similar approach to AlphaZero,

learnings with the wider chess-playing community.

Who should read this book?


• keen chess players, looking to learn new strategies
AlphaZero’s chess is completely self-taught, stemming from millions of
games played against itself. Much of its play matches the accepted human
wisdom gathered over the past 200 years, which makes AlphaZero’s
play intuitive, allowing humans to learn from it. This book brings out
AlphaZero’s exquisite use of piece mobility and activity, with guidance
from Matthew through the simple, logical, schematic ways in which
AlphaZero builds up attacks against the opponent’s king’s position. We
believe these techniques will inspire professionals and club players alike.



As Demis Hassabis, CEO of DeepMind, explains, the application of AI
to games is a means to something greater: ‘We’re not doing this to just
solve games, although it’s a fun endeavour. These are challenging and

a stepping stone for us to build general-purpose algorithms that can be


deployed in all sorts of ways and in all sorts of industries to achieve great
things for society.’
Our interviews with the creative people who designed and built
AlphaZero are full of insights that, using chess as an example, help us to

• chess enthusiasts
As well as providing instructional material, this book is also a collection
of fascinating games of astonishing quality, featuring dashing attacks,

compared playing through these games to uncovering the lost notebooks


of a great attacking player of the past, such as his hero Alexander
Alekhine

How to read this book


The chess content of this book is arranged in discrete chapters and
designed to be read out of sequence, so it is perfectly possible to pick
a theme you are interested in and start in the middle of the book. The
chess content is not too heavy, with an emphasis on explanations rather
than variations. We would recommend playing through the games with
a chessboard. In our opinion, this promotes a measured pace of reading
most conducive to learning.

Acknowledgements
We would like to thank DeepMind, and in particular Demis Hassabis,
for the wonderful opportunity to study the games of AlphaZero, and for
his personal involvement in making this project a success. We would like
to thank Dave Silver, Lead Researcher on AlphaZero, as well as Thore
Graepel, Matthew Lai, Thomas Hubert, Julian Schrittwieser and Dharshan
Kumaran for their extensive technical explanations and their assistance
in running test games and test positions on AlphaZero. Nenad Tomasev
deserves a special mention for reviewing the chess content and giving us
plenty of great feedback!
A big debt of gratitude is owed to Lorrayne Bennett, Sylvia Christie,
Jon Fildes, Claire McCoy, Sarah-Jane Allen and Alice Talbert for all their
amazing work in keeping this project running and helping us with all


also like to thank everybody at DeepMind for making us feel so welcome

Thanks are also due to Allard Hoogland and the team at New in Chess
who have published this book. They have supported our unique project
and have ensured that the book is beautifully presented.

We would like to thank our families for their enthusiasm and support and,
in the case of Matthew Selby, also for his technical expertise in extracting

All of these amazing people contributed to what has been a madly


enjoyable and memorable project.

Introduction

DeepMind published ‘Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm’. The paper described the
company’s self-learning AI AlphaZero, which, within 24 hours of starting
from random play and with no domain knowledge except the game rules,
achieved a superhuman level of play in the games of chess and shogi1.

strength from being entirely self-taught. It is momentous for chess players

which built its chess strategy independently of our own rich history
of chess development. It is also far-reaching for AI developers, with
AlphaZero achieving superhuman strength in a matter of hours without

up the possibility of using these AI techniques for applications where

In an interview later in this book, Demis Hassabis describes how the


success of AlphaZero builds on DeepMind’s earlier work creating AlphaGo,
a neural network based system that applied deep learning to successfully
defeat Go legend Lee Sedol in 2016, and how both are milestones in the

positively transform the world through AI. Among other things, it seeks to:
• help address the problems of climate change and energy;
• enable medical advances in diagnostics to make excellent medical care
more widely available;

to human well-being.

The importance of the AlphaZero story has impact far beyond DeepMind’s

and Go, developers around the world have been motivated to invest

techniques that created DeepMind’s AlphaGo to produce publicly available



professional-strength Go playing machines, in what many consider to


be a tipping point for public participation in the advancement of AI. In
recent months the open-source Leela Chess Zero was developed based on
the AlphaZero paper, and is now a dangerous challenger to the traditional

such a central role in the development of this critical technology!


This new approach to machine self-learning in chess has given us a strong
chess player with a new style and approach, and that is the crux of this
book. AlphaZero has independently developed strategies that possess many
similarities to human wisdom, and many that are further developed or show
situations where our well-established positional ‘rules’ are ‘broken’.
In 2018, AlphaZero cannot yet explain to us directly what it has learnt

DeepMind and other groups are developing will make this possible in the

human praxis; and through detailed explanations based on illustrative

AlphaZero’s ideas can be incorporated into our own games.

This book explores the following chess themes:


• Outposts (Chapter 7): we examine the variety of ways in which
AlphaZero secures valuable posts for its pieces, from the knight and bishop
all the way up to the king itself.
• Activity (Chapter 8): AlphaZero is skilled in maximising the mobility
of its own pieces and restricting its opponent’s pieces. We pay particular
attention to the ways that AlphaZero restricts the opposing king.
• The march of the rook’s pawn (Chapter 9): AlphaZero frequently
advances its rook’s pawn as part of its attack and plants it close to the
opponent’s king.
• Colour complexes (Chapter 10): Matthew explains AlphaZero’s
fondness for positions with opposite-coloured bishops.
• : AlphaZero makes

• : we consider some stunning


examples in which castling queenside was the prelude to a dangerous
AlphaZero attack.
• Defence (Chapter 13): we learn about the contrasting defensive


In addition, we have looked at the ways in which the thinking process

regularly uses engine assessments in their chess studies. We explore

we believe gives it the ability to head for generally promising positions,

have gathered have also revealed to us some features of engine analysis

knowledge should better equip chess engine users to understand their


assessments.
In the process of writing this book, we had access to previously
unpublished games2 and evaluations from AlphaZero. We believe that
there is a large amount of new and instructive material in this book that
we hope you will thoroughly enjoy reading and trying out in your games.

Matthew Sadler and


Natasha Regan,
London, November 2018

2 A description of the AlphaZero games we received and the technical settings used for

Part II – Inside the box

CHAPTER 4

How AlphaZero thinks


AlphaZero’s self-learning design is [...] chapter, we take a quick tour of the mechanics of AlphaZero’s thinking, as it trains and as it plays. This chapter uses information from a [...] the journal Science released in December 2018: ‘A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play’6. Silver, Lead Researcher on AlphaZero, explained the inner workings of AlphaZero and research scientist Thore Graepel and research engineer Matthew Lai were on hand to answer our questions.

[...] from understanding AlphaZero’s thought process, including:

Professional chess players:
Professional chess players now use engines for all of their pre- and post-game analysis, frequently switching engine according to the type of position they wish to analyse. By better understanding the skill sets of the various engines, professionals can make better use of [...] thought processes can provide fresh insight into the strengths and weaknesses of traditional engines and help professional players to optimise their use of them.

6 By David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai,
Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy
Lillicrap, Karen Simonyan and Demis Hassabis. Science
1140-1144, DOI: 10.1126/science.aar6404.


Amateur chess players:
AlphaZero’s thought processes are more human-like than traditional chess engines and we can pick up tips from how it makes its decisions.

[...]:
[...] thought processes gives pointers for making traditional engines stronger. In addition, developers are increasingly experimenting with using AI.

[...]:
The architecture of AlphaZero is general and its combination of computational reasoning and intuition extends to many important problem domains beyond game play.

In this chapter we illustrate the AlphaZero thought process by taking a look under the bonnet at its analysis at a critical moment in the game – how deeply and widely it searches, what moves it considers, and how it evaluates the resulting positions.

In the next chapter, ‘AlphaZero’s style – meeting in the middle’, we relate the design of AlphaZero to its play and advance several hypotheses about AlphaZero’s playing style and evaluations. We then attempt to validate these hypotheses by observing AlphaZero’s games against [...]

How AlphaZero works – [...]
AlphaZero’s architecture is informed by the four principles that govern DeepMind’s approach [...]

1. Learning rather than being [...]
The algorithm learns its strategy from examples rather than [...] expert knowledge.

[...]
The algorithm applies general principles and hence can be applied to multiple domains, e.g. shogi, Go, and chess.

[...]
Learning is based on concrete observations rather than preconceived logical rules.

4. Active rather than passive
The machine explores the game rather than being instructed by a human.

By satisfying all four requirements, AlphaZero deviates considerably from traditional computer game-playing systems.

Thore Graepel described AlphaZero’s architecture as follows:

‘With approximately 10^47 [...] be too computationally expensive to exhaustively search through every available move, and every possible sequence of moves that might follow in the game. Therefore, most chess engines – including AlphaZero – combine a search algorithm with an evaluation function that provides an estimate of how good a position is at any point in the game.
Traditional chess engines use variants of what is called alpha-beta tree search, enhanced by dozens of [...] combine this with an evaluation function designed by expert chess players. In contrast, AlphaZero instead learns entirely on its own, developing its own evaluation function and using Monte Carlo tree search [...] alternative to alpha-beta tree search, that has the added advantage of being able to take into account prior knowledge about which moves are promising and which ones are not. This allows the search to focus mostly on promising and relevant variations. Furthermore, MCTS is robust with respect to inaccuracies of the evaluation function, which [...] positions.
Where, then, does the prior knowledge come from? This is where AlphaZero’s neural network – a computer system loosely modelled on the connections and neurons in the brain – comes in.
The neural network takes the current game position as its input, and returns move probabilities for each possible move to be the [...] along with a value estimate for [...] output guides the Monte Carlo tree search towards the most promising segments of the game tree. By reducing the number of moves considered in each position, the move probabilities cut down the breadth of the search. Being able to estimate the value of non-terminal positions in this way reduces the depth of the necessary search in the tree, because the value of the outcome of a given variation can be determined even before the end of the game is reached.
Crucially, the same algorithm is able to reach superhuman ability across several games without adapting the architecture for each [...] the system displays a degree of generality: the same process of Monte Carlo tree search guided by a neural network, trained with self-play reinforcement learning, proves [...]

Now, we zoom further into AlphaZero’s training for its clash [...]


practical examples to illustrate the process.

How AlphaZero trains
Grandmasters might train for an upcoming match by spending many hours and days researching the latest openings and chess developments, and adopt a strict diet and exercise regime. They [...] the opponent they are expecting [...] involves collecting all of the prospective opponent’s games using a huge database of tournament games from all over the world, looking for weaknesses in the opponent’s play, and particularly in their openings set-ups.
In the last 20 years or so, top grandmasters have found they have to remember much more than previously, as they feel the need to adopt a wide range of openings to avoid the preparation of their opponents.
By contrast, AlphaZero’s training [...] nine hours. It began training from a clean slate with no chess knowledge other than the rules of the game, [...] play at all. AlphaZero also did not use any available chess openings knowledge, and instead worked out its own openings as it trained and played against itself.
At the start of these crucial nine hours, AlphaZero did play chess, but not as we know it. As anyone who has taught chess to a small child will know, random play will get you nowhere when playing chess.
To avoid endless random games, those early games were stopped after a certain number of moves and called draws. Every now and then, though, some random games would end as wins for one side, and this rare signal allowed AlphaZero to learn the evaluation function [...]

Intuitively, the system adjusts the parameters of the neural network such that it makes the moves played by the winning side more likely in their move probabilities, and it evaluates the positions encountered as more favourable to the winning side.
During those nine hours, AlphaZero played a total of 44 million games against itself – more than 1,000 games per second. At the same time, it continuously adjusted the parameters of its neural network so as to capture moves and outcomes from the most recent batch of games played against itself. For each move played during self-play, the MCTS performed 800 ‘simulations’, each of which extends the current search by one move while assessing the value of the resulting position. As an example, AlphaZero could begin to analyse the chess starting position like this:


[Table: Simulations 1 to 8, with columns ‘Step in thought process’, ‘Node’ and ‘[...] that node’]

We’ve shown eight simulations in the table above. By the time AlphaZero has completed the 800 simulations used for each position encountered during its lightning-fast training games, it would have looked a few moves deep for the most plausible lines. However, the search during training is much shallower than during a tournament game.

Each time AlphaZero selects a variation to consider, it will be on the basis of three criteria:
1. how plausible the move is in this [...]
2. how promising is the outcome of [...]
3. how often this variation has been considered in the search.

If a given variation has not been considered many times before, if the move appears plausible and if the variation looks promising, then AlphaZero will tend to select the variation and its continuation for simulation. The evaluation of the initial position will then be the average of all position evaluations from each of the 800 simulations. It should be noted that this is quite [...] engines assess positions. Rather than returning an average of all lines considered, an engine such as [...] the so-called principal variation, i.e. the very best line for both sides according to the current search tree. We believe that this is one of the reasons why AlphaZero plays [...] traditional chess engines, often taking a more intuitive approach. We explore this further in the next chapter.
[...] traditional chess engines such [...] function, which may account [...] combination of positional features. [...] evaluates a given position can be found in its evaluation guide. These [...] position using online resources at [...] Evaluation-Guide/:


White Black Total


White Black Total
middle middle middle
endgame endgame endgame
game game game

-11 -11 -11 -11

Initiative

-49 -16 79 18

8599 9187 8651 9493

147 151 -4 17

-36 48 -36

Pieces -61 -87 15

Space 55 -15

Threats 114 118 117 -3 -8

Total 8799 9495 9778 -8

[...] assesses the position using an evaluation function that is a linear combination of features, with two sets of weights, one for the middlegame and one for the [...] number of factors for both White [...] are showing as contributors for [...] evaluation is the weighted sum of the various components. Note that beneath these groups of factors there is a more detailed list of [...] taking hundreds of positional factors into consideration.
AlphaZero also has an evaluation [...] chess engines of the last 50 years, which use handcrafted functions designed by grandmasters and [...] represented as a linear combination of positional features, AlphaZero’s evaluation function is learnt and represented in terms of a neural network called the ‘value network’, trained to predict game outcomes based on a raw representation of chess positions.
As a result, AlphaZero is unconstrained by human design or lack of imagination and has [...] features it takes into account when evaluating a given position.
But how AlphaZero’s value network works remains a bit of a mystery and cannot be explained in simple rules such as a knight being worth approximately three pawns. It is likely that AlphaZero’s value network views and assesses [...] dependent way.
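
To make the contrast concrete, a handcrafted evaluation of the kind described above (a weighted sum of positional features, with one set of weights for the middlegame and one for the endgame) might be sketched as follows. The feature names, the weight values and the linear blend between the two weight sets are all assumptions made for illustration; they are not taken from any real engine.

```python
# Hypothetical feature groups and weights, chosen only for illustration.
MIDDLEGAME_WEIGHTS = {"material": 1.0, "mobility": 0.10, "king_safety": 0.30}
ENDGAME_WEIGHTS = {"material": 1.2, "mobility": 0.05, "king_safety": 0.05}


def linear_eval(features, weights):
    """Weighted sum of feature values, each measured as White minus Black."""
    return sum(weights[name] * value for name, value in features.items())


def blended_eval(features, phase):
    """Mix the two weight sets; phase runs from 1.0 (middlegame) to 0.0 (endgame).

    The linear blend is an assumption made for this sketch.
    """
    middlegame = linear_eval(features, MIDDLEGAME_WEIGHTS)
    endgame = linear_eval(features, ENDGAME_WEIGHTS)
    return phase * middlegame + (1.0 - phase) * endgame


# Hypothetical feature values for one position.
features = {"material": 1.0, "mobility": 4.0, "king_safety": -1.0}
print(round(blended_eval(features, phase=0.5), 3))
```

Every term here contributes independently of the others, which is exactly the limitation a learnt value network does not share: a network can, in principle, weigh one feature differently depending on the rest of the position.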


So, rather than being constrained to, say, an evaluation function that adds separate assessments of material and mobility, AlphaZero can consider the interaction of [...] be very useful in understanding the overall position, and could explain how AlphaZero implements combinations of positional motifs [...] we cannot understand exactly how AlphaZero is thinking, we can explore the ways in which AlphaZero generates its innovative and active plans, and how it conducts its ferocious attacks through analysing its games.

We asked Silver to explain a little deeper how the training process works and progresses:

How does AlphaZero’s neural network give a value for a position?
AlphaZero sees the chessboard as an 8x8 grid of numerical values. These values are processed by a series of computational steps known as layers of the neural network. Each layer takes the previous 8x8 board representation and constructs a new 8x8 board that can represent richer features. This process is repeated over many layers to produce ever more powerful representations of the board. The nature of each layer is determined by millions of tunable weights, which means that the system can learn for itself what features to represent. Finally, AlphaZero combines all of these features together, using even more tunable weights, to determine the [...]

[...] opposite-coloured bishops, and you add those values all up to get a score. Does AlphaZero create its own function and does it have the same positional features in mind?
AlphaZero learns its own features by tuning the connections of its neural network. So while AlphaZero could in principle learn a feature [...] pawns’ it could equally see the [...] perhaps learning complex features that are useful to the machine but hard for a human to interpret.

How does AlphaZero improve during training?
[...] the connections in the value network are updated to evaluate each position in that game more [...] same time, the connections in the policy network are strengthened so as to play more often the move recommended by AlphaZero itself, after a lot of thinking by its Monte Carlo tree search. AlphaZero plays against itself millions of times,


learning to provide better move [...] something akin to ‘intuition’ for how to play the game. This process of learning for itself, solely from its interactions, is known as ‘reinforcement learning’.

If you left AlphaZero training against itself for a very long time, would it just keep getting better and better?
When we trained AlphaZero on Go, we saw its performance continue to improve over a very long training time. However, training AlphaZero on chess appears to have diminishing returns, perhaps due to the large number of draws that start occurring during self-play.

Matthew Lai explained the nature of these training games:

How does it work with those training games? Are they just very fast games?
Each training game is played very quickly, using about 40 milliseconds thinking time per move to execute a Monte Carlo tree search consisting of 800 simulations.

When AlphaZero was training against itself, did many of the games result in draws?
In the beginning, almost all games ended in draws by the 50 moves [...] play is almost entirely random. Towards the end of the training we observed similar draw rates to [...] when they play against themselves – about 70-80%. This increases to [...]

Can you get a sense of how AlphaZero’s play develops as it trains?
Periodically during training, we take snapshots and play through some games, using AlphaZero at each given stage in its training. We don’t want to take the training games themselves because they are played at about 40 milliseconds per move, but we take a snapshot and play longer games to see how it is progressing. One interesting thing we found is that AlphaZero re-discovers opening sequences that are frequently played by human players as well. What we found most amazing is that, as training progresses, AlphaZero often discards those known variations [...] them!

It looks to us like AlphaZero uses piece mobility well and is a fantastic attacker. Do you think it looks at these concepts in [...] mathematically?
Those are well-known concepts in the computer chess literature, but in traditional chess engines they are usually applied with minimal or no selectivity. As a consequence, they


have to be given low weights so that they do not exert an overly strong [...] AlphaZero, the highly non-linear nature of neural networks means it can potentially learn to apply them much more selectively, and with [...] the features are valid. In addition, since AlphaZero maximises expected score, it is not so tied to keeping the material balance.

Is the speed of training games the sort of thing you might change if you were changing the training process? You might give it longer?
[...] good you want the moves to be [...] want for training. And the more games the system can see, the [...] across them and the less it will [...] individual game.

It is important to comment [...] between the hardware used to train AlphaZero and the hardware used by AlphaZero in match [...] generate self-play games, and 16 [...] to train the neural networks. These computing resources minimise the time taken to complete the training. [...] AlphaZero used a single machine [...]

Some game-playing computers simulate outcomes as they play [...] millions of games whilst training, but does not use the technique of simulating to the end of the [...] play. Once AlphaZero’s training is complete, the latest neural nets [...] play. As Matthew Lai explained to us, DeepMind’s earlier versions of AlphaGo used to conduct random rollouts during play. This is not necessary for AlphaZero because its value network is already so advanced that additional rollouts during play do not add any value. As a consequence, though, there is no randomness built into AlphaZero as it plays.
We asked Matthew Lai about whether AlphaZero would play the same game twice:

When it’s playing, does AlphaZero have any randomness in its play?
When AlphaZero is playing against itself during training, it is very important that we see a wide variety of positions and moves. This is achieved by explicitly adding randomness to its move selection.
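
The text does not say how this randomness is added. One common way to do it, shown here purely as an assumption, is to sample the move in proportion to how often the search visited each candidate, with a ‘temperature’ that controls how adventurous the choice is. The function name, the temperature parameter and the visit counts below are all hypothetical.

```python
import random


def sample_move(visit_counts, temperature=1.0, rng=random):
    """Pick a move at random, weighted by search visit counts.

    A temperature near 0 approaches always choosing the most-visited move;
    a temperature of 1 samples in proportion to the counts.
    """
    moves = list(visit_counts)
    weights = [visit_counts[move] ** (1.0 / temperature) for move in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


# Hypothetical visit counts after a search from some position.
counts = {"e4": 50, "d4": 30, "Nf3": 20}
rng = random.Random(0)
print([sample_move(counts, rng=rng) for _ in range(5)])
```

With a temperature of 1 the less-visited moves still get played from time to time, which is one way to obtain the wide variety of positions mentioned above; lowering the temperature makes the choice nearly deterministic.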


After training, when AlphaZero is playing matches, there is still some randomness due to the parallel nature of the hardware used; also we sometimes add randomness to the opening to ensure diverse evaluation.

The AlphaZero thought process [...] move
We can illustrate AlphaZero’s thought process in a match situation [...] in detail at one particular position [...]
Let us follow AlphaZero’s steps as it thinks and decides about what move to make next. The position we have chosen comes from the game ‘Exactly how to attack ’: a fabulous decisive game which we will come back to several times in this book. The position we have chosen occurs [...]

[Diagram] AlphaZero [...]

In the opening, AlphaZero had [...] after gaining the bishop pair. I had expected AlphaZero to line up [...] but instead AlphaZero dedicated [...] centre with the goal of opening diagonals for its bishops to support [...] game at the critical moment.
In the coming pages, we will present snapshots of AlphaZero’s thinking at various points in its thought process, starting at the beginning – when it has searched very few branches of its tree of variations – to the end of its thought process when it has [...] evaluation of the position.
To help us, Matthew Lai has provided us with trees of the moves that AlphaZero considered, together with supplementary information such as the evaluation of the move. In the above diagram AlphaZero is considering its 30th move as White. [...] after just 64 nodes of search, and walk you through the moves and the information displayed:

[see next page]


0.657
100.0%

Bd3 (29.77%) Bf3 (18.82%) c6 (16.15%) d5 (10.21%) Bg2 (4.75%) f3 (3.50%) Bh1 (4.75%) Qd2 (3.50%) Re2 (0.41%) Qd3 (0.35%) Bc3 (0.22%) Rb1 (0.08%) Kb1 (0.07%) Qe2 (1.90%) Rg1 (1.40%) Qc4 (1.20%) Rh1 (1.03%) Rcd1 (0.65%) Qd1 (0.26%)

0.601 0.645 0.773 0.871 0.616 0.673 0.616 0.726 0.659 0.659 0.508 0.616 0.570 0.673 0.659 0.687 0.659 0.687 0.687
19.4% 13.4% 10.4% 7.5% 4.5% 4.5% 3.0% 3.0% 3.0% 3.0% 3.0% 3.0% 3.0% 1.5% 1.5% 1.5% 1.5% 1.5% 1.5%

That looks scary, doesn’t it? That was exactly my thought when I saw it too,
but some explanations from Matthew Lai and from my co-author Natasha
helped enormously. Let’s zoom into a small part of the tree:

The root node

1. AlphaZero’s evaluation of the position

[...] AlphaZero’s evaluation of the position from White’s point of view. 0.657 means a 65.7% expected score, i.e. better for White. This expected score is made up of a combination [...] For example, 65.7% wins, no draws and 34.3% losses would give a 65.7% expected score, as would 31.4% wins, 68.6% draws and no losses.

2. The total percentage of node searches spent

100% means that 100% of AlphaZero’s node searches were used to produce this result. We will always see 100% for the position at the [...]

As AlphaZero searches deeper down the tree into the branches, it divides up the available time and energy between moves and variations, spending the bulk of its time on the moves it considers most important.

When we move on to looking at the branches, we will see percentages less than 100% and this gives the proportion of the time that AlphaZero has allocated to this possibility.
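
The expected-score arithmetic in point 1 can be checked with a few lines of Python. This is only a sketch of the calculation, assuming the usual chess scoring of one point for a win, half a point for a draw and nothing for a loss; the function name expected_score is our own and is not taken from AlphaZero.

```python
def expected_score(wins: float, draws: float, losses: float) -> float:
    """Expected score from outcome probabilities (win = 1, draw = 0.5, loss = 0)."""
    total = wins + draws + losses
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return wins + 0.5 * draws


# The two outcome mixes quoted in the text.
all_decisive = expected_score(wins=0.657, draws=0.0, losses=0.343)
mostly_drawn = expected_score(wins=0.314, draws=0.686, losses=0.0)

print(round(all_decisive, 3), round(mostly_drawn, 3))
```

Both mixes come out at the same expected score, which is why a single number such as 0.657 cannot tell us on its own how drawish AlphaZero believes the position to be.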


[...] moves, with a percentage value next to each of them. The percentage represents the prior move probability, and as we will see throughout our trees, this number will never change for a given position.

The number indicates how likely AlphaZero believes it is at the beginning of its thought process that it would eventually choose this move. It is like showing a position to a grandmaster and asking them [...] glance and which might be best. The grandmaster might come up [...] them from most likely to least likely. This is just what AlphaZero is doing to prioritise its analysis.

In chess terms:

AlphaZero

[...] 3.50% likely to be its choice in this position. Not very likely therefore, but not impossible. Compare that to the incomprehensible and illogical [...] a 0.22% chance of being selected [...]

[...] resources spent on that move

[...] AlphaZero’s evaluation of the [...] is pretty good, and it spent 3.0% of its total node searches on that move. In human terms, AlphaZero [...]

To recapitulate, working from top to bottom, we can see:
1. AlphaZero’s overall evaluation of the position;
2. the moves it has looked at, and how likely AlphaZero thinks it [...]
3. AlphaZero’s evaluation of the position after each move;
4. [...] spent considering the move.



The alert among you will have noticed a couple of interesting points:

1. The percentages on each level of the tree don’t add up to 100%.

This is because we are not showing all the moves that AlphaZero [...]

2. The overall evaluation of the position (0.657 = 65.7% expected score) does not match the evaluation of any individual move.

[...] on the basis that the evaluation of the best move determines the evaluation of the position. [...] the position.

AlphaZero takes a more probabilistic view. AlphaZero essentially evaluates the position as a whole by taking into account the evaluations of all the moves it looks at, giving more weight to the moves it considers more deeply.

[...] we will discuss the advantages and disadvantages of this approach and give some practical examples. Just in general however, such an approach might end up mimicking human intuition where players steer for a position because ‘it feels good’ and then work out a concrete line when the position arises.

[...] very broad, with AlphaZero’s top [...] have put the results into a table to help readability:

[see next page]


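Move    Prior     Actual % searches    Evaluation          Rank      Rank node    Rank
                  spent on the move    (expected score)    prior     searches     move eval

Bd3     29.77%    19.4%                0.601                1         1           17
Bf3     18.82%    13.4%                0.645                2         2           13
c6      16.15%    10.4%                0.773                3         3            2
d5      10.21%     7.5%                0.871                4         4            1
Bg2      4.75%     4.5%                0.616                5         5           14
f3       3.50%     4.5%                0.673                7         5            7
Bh1      4.75%     3.0%                0.616                5         7           14
Qd2      3.50%     3.0%                0.726                7         7            3
Re2      0.41%     3.0%                0.659               14         7            9
Qd3      0.35%     3.0%                0.659               15         7            9
Bc3      0.22%     3.0%                0.508               17         7           19
Rb1      0.08%     3.0%                0.616               18         7           14
Kb1      0.07%     3.0%                0.570               19         7           18
Qe2      1.90%     1.5%                0.673                9        14            7
Rg1      1.40%     1.5%                0.659               10        14            9
Qc4      1.20%     1.5%                0.687               11        14            4
Rh1      1.03%     1.5%                0.659               12        14            9
Rcd1     0.65%     1.5%                0.687               13        14            4
Qd1      0.26%     1.5%                0.687               16        14            4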
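
The rank columns in the table follow mechanically from the numbers in the 64-node tree. As a minimal sketch (our own illustration, not anything taken from AlphaZero’s code), the snippet below ranks the moves by prior, by node searches and by evaluation, letting tied values share the best available rank. The helper name competition_rank and the data layout are assumptions made for the example.

```python
# Values read off the 64-node search tree:
# move -> (prior %, evaluation, % of node searches)
TREE = {
    "Bd3": (29.77, 0.601, 19.4), "Bf3": (18.82, 0.645, 13.4),
    "c6": (16.15, 0.773, 10.4), "d5": (10.21, 0.871, 7.5),
    "Bg2": (4.75, 0.616, 4.5), "f3": (3.50, 0.673, 4.5),
    "Bh1": (4.75, 0.616, 3.0), "Qd2": (3.50, 0.726, 3.0),
    "Re2": (0.41, 0.659, 3.0), "Qd3": (0.35, 0.659, 3.0),
    "Bc3": (0.22, 0.508, 3.0), "Rb1": (0.08, 0.616, 3.0),
    "Kb1": (0.07, 0.570, 3.0), "Qe2": (1.90, 0.673, 1.5),
    "Rg1": (1.40, 0.659, 1.5), "Qc4": (1.20, 0.687, 1.5),
    "Rh1": (1.03, 0.659, 1.5), "Rcd1": (0.65, 0.687, 1.5),
    "Qd1": (0.26, 0.687, 1.5),
}


def competition_rank(values):
    """Rank from highest to lowest; tied values share the best rank (1, 2, 2, 4, ...)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


moves = list(TREE)
prior_rank = competition_rank([TREE[m][0] for m in moves])
eval_rank = competition_rank([TREE[m][1] for m in moves])
search_rank = competition_rank([TREE[m][2] for m in moves])

# move -> (rank by prior, rank by node searches, rank by evaluation)
ranks = {m: (p, s, e) for m, p, s, e in zip(moves, prior_rank, search_rank, eval_rank)}

for move in ("Bd3", "d5"):
    print(move, ranks[move])
```

The two moves printed are the ones the text singles out: the move with the highest prior, and the fourth most likely move that tops the evaluation column.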

At this very early stage of thinking, AlphaZero’s search is quite broad, and AlphaZero has spent some time searching moves which it thinks are quite unlikely, even when their evaluations aren’t particularly special either.

From prior expectations, AlphaZero [...] most likely move, but its intuition is not borne out by the evaluation of the move. Its fourth most likely move – 30.d5 – storms to the top of the evaluation table!
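
The probabilistic view described earlier, in which the position is judged by all the moves examined with more weight on those searched more deeply, can be illustrated with a search-weighted average. This is a hedged sketch only: it uses the percentages of node searches from the 64-node tree as weights, it covers only the moves shown there, and the function name is ours. It is not presented as AlphaZero’s actual calculation.

```python
# (move, evaluation, % of node searches), read off the 64-node tree
CHILDREN = [
    ("Bd3", 0.601, 19.4), ("Bf3", 0.645, 13.4), ("c6", 0.773, 10.4),
    ("d5", 0.871, 7.5), ("Bg2", 0.616, 4.5), ("f3", 0.673, 4.5),
    ("Bh1", 0.616, 3.0), ("Qd2", 0.726, 3.0), ("Re2", 0.659, 3.0),
    ("Qd3", 0.659, 3.0), ("Bc3", 0.508, 3.0), ("Rb1", 0.616, 3.0),
    ("Kb1", 0.570, 3.0), ("Qe2", 0.673, 1.5), ("Rg1", 0.659, 1.5),
    ("Qc4", 0.687, 1.5), ("Rh1", 0.659, 1.5), ("Rcd1", 0.687, 1.5),
    ("Qd1", 0.687, 1.5),
]


def search_weighted_evaluation(children):
    """Average the move evaluations, weighting each by its share of node searches."""
    total_share = sum(share for _, _, share in children)
    return sum(value * share for _, value, share in children) / total_share


# The 'best move decides' view versus the blended view.
best_move_eval = max(value for _, value, _ in CHILDREN)
blended = search_weighted_evaluation(CHILDREN)

print(f"best single move: {best_move_eval:.3f}")
print(f"search-weighted:  {blended:.3f}")
```

On the moves shown, the blended figure comes out at roughly 0.667, far below the 0.871 of the best single move and close to, though not identical with, the 0.657 at the root. A small difference is to be expected, since the tree does not show every move AlphaZero looked at.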

Part III – Themes in AlphaZero’s play

Section B – Reducing the mobility of the opponent’s forces to create opportunities
In this section AlphaZero reduces the opponent’s forces to passivity using [...]

[...] up an advanced rook’s pawn on [...] to be a useful advantage for the [...] of White’s pieces. The knight decentralises to win the pawn, the dark-squared bishop is restricted by pawn phalanxes on b6/c5 and h4/g3, and most importantly the king is rendered passive. AlphaZero converts its edge by keeping the white king pinned to the corner [...] useless knight on f6, a rook tied to the back rank and an unstoppable passed c-pawn to contend with.

[...] the opponent’s activity [31…c5]
2. Reducing the opponent’s activity with pawn advances [31…c5, 32…g5, 33…g4, 34…g3]
[...] active pieces to leave passive [...]

AlphaZero
London 2018

Already having some experience of AlphaZero’s play, I was expecting [...] squared bishop for the knight on d4 and play the rook and opposite-coloured bishops middlegame, hoping to invade the weakened central light squares and play on [...]

That probably wasn’t a bad plan, but AlphaZero’s creative strategy took me completely by surprise.

Chapter 8 – Piece mobility: activity

[...] my eyes was one of Black’s strongest assets! However, AlphaZero is trading this asset for a series of other dynamic plusses.

With this manoeuvre, Black has gained space on the kingside and severely restricted the freedom of the white king and dark-squared bishop. [...] has gained a new potential channel for entry into the White position: [...]

White is by no means helpless as Black’s position is extended and AlphaZero has a lot of squares to protect with limited forces. [...] to have it all under control and [...] temporary activity before proceeding to push White back and reclaim all that White has gained.

Playing around with the engines in this position fails to deliver anything resembling clear equality: [...] for White due to the weakness of [...]

AlphaZero’s pieces advance inexorably, increasing the [...] pieces and the opponent’s pieces.

[...] its box on the kingside: the bishop stops the king escaping via f1-e2 whilst the g3-pawn covers f2 and h2.

Typical AlphaZero play, exchanging [...] opponent with passive pieces: we will see another example on move 57.


[...] on an outpost on f6 but unable [...] are testament to the grandeur of AlphaZero’s strategy. The c-pawn [...] c3 wins.

HISTORICAL PARALLEL

This encounter between the current World Champion Magnus [...] Ukrainian genius Vasily Ivanchuk has [...] themes of AlphaZero’s games in the [...] which AlphaZero built up play on the [...] counterplay never got going. The [...] the decisive invasion happens on [...] strongest, reminds me of AlphaZero.

[...] 2690
[...] 2750
Morelia/Linares 2007 (11)

[...] arisen in which White’s strong centre is counter-balanced by [...] which tends to hang around a bit in the early middlegame (as here on a5). There is another less obvious challenge to Black’s position: his kingside has been weakened by the exchange of his king’s knight on [...]

15.h4
The young Carlsen plays a move that AlphaZero likes too! White exerts pressure on the black kingside with the h-pawn.

[...] this position. The queen assists [...] but loses sight of the kingside dark squares a little. A high-class game [...]


[...] Bukavshin, Aix-les-Bains 2011.

We saw this idea in AlphaZero’s [...] on e5 reduces the activity of Black’s [...] Black’s dark-squared weaknesses on that wing too by creating an outpost on f6 for a white bishop or knight. AlphaZero chooses the same approach in this position!

Exchanging the rooks prevents the knight on a5 from activating itself [...] maintain the attacking tempo.

A powerful move, threatening [...]

AlphaZero-style use of the advanced h-pawn to target the dark squares around the black king.

A fantastic switchback! Black must defend his kingside (which he was forced to weaken with 22...h6) but this gives White a tactical opportunity to exploit the pin on the black knight on the open [...] counterplay has become White’s decisive channel of entry, as so often happens when the mobility of one side is much greater than the other.

Magnus Carlsen: an AlphaZero-like switchback at 17.

Winning a piece.


As we discuss in the chapter on [...] using its queen to slow down the build-up of an opponent’s attack by harrying the opponent’s pieces and sowing confusion. The game ‘Using a [...]’ is a [...]

It was a striking theme from the games released in December 2017 that AlphaZero could render [...] passive. Let’s see one of those games:

Game: ‘Risky rooks’
A remarkable occurrence: in the middlegame the black queen is [...] legal move.

[...] the opponent’s activity

AlphaZero
London 2017

[...] to box in the black king.

Black has been under pressure since [...] is the choice of my engines when analysing for six hours or more. [...] make the queen wish she had not retreated to the corner.

The black queen is completely imprisoned. Black can only sit and await its fate.


HISTORICAL PARALLEL

[...] reminded me of an idea played by [...] elite players until her retirement [...] Laurent Fressinet. The following position was reached after 24 [...]

[...] 2656
[...] 2536
Istanbul ol 2000 (9)

[...] the coordination of White’s pieces and take control of the e5-square: the rook cannot be taken because of mate on e1. But then inspiration and tactical genius struck!

As Judit explains in volume two of her Best Games collection, she thought for 32 minutes after Black’s 24th move, which shows the [...]

A fantastic move. The queen ties down the black king and the rook on h8 completely while preparing [...] on f6 covers d8 and stops the [...] but after much examination, she [...] now on e1, which means that after [...]

Wonderful tactical ingenuity! The defence in the game also didn’t work. [...] for White, with queen and three enormous passed pawns on the queenside.


Common questions


AlphaZero has shed light on both the strengths and limitations of existing chess engines by illustrating their relatively static approach to evaluation based on predefined heuristics. It revealed the limits of traditional engines like Stockfish in positional understanding, as they often default to static evaluations and material balance. Contrarily, AlphaZero's dynamic, learning-based approach challenges these limitations, exposing the potential benefits of adaptive strategies and deep learning architectures. This knowledge pushes existing engines to consider more flexible approaches in re-evaluating positions and adapting to the nuances of dynamic play .

AlphaZero challenges traditional chess wisdom by independently discovering and using strategies that both align with and break conventional positional rules established over the years. It employs innovative concepts such as maximizing piece mobility and activity, creatively using sacrifices for long-term positional advantage, and rethinking opening strategies. AlphaZero’s style blends intuitive human-like strategies with high selectivity and dynamic maneuvering, which can surprise even experienced players by eschewing established continuations for potentially stronger and unexpected moves .

AlphaZero employs probabilistic assessments to guide its decision-making processes during games. This technique allows it to evaluate the general promise of positions rather than making purely deterministic decisions. This probabilistic strategy incorporates the potential for various outcomes and dynamically adapts to situations, differing significantly from deterministic approaches seen in traditional engines like Stockfish that rely on fixed evaluations and precise calculations. This flexibility affords AlphaZero a more intuitive and adaptable play style, often leading to creative solutions that traditional engines may not find .

AlphaZero’s approach to securing outposts is considered advanced because it seamlessly integrates these strategic positions into broader tactical plans, often involving complex inter-piece dynamics and strategic sacrifices. Unlike typical human practice which might follow more predictable and static use of outposts, AlphaZero dynamically adapts positions and utilizes outposts in conjunction with active strategies that maintain pressure and create sustained threats, overcoming static weaknesses of traditional approaches .

AlphaZero's sacrifices, highlighted during its match against Stockfish, are notable for targeting long-term positional advantages rather than immediate material gain. It executes brilliant sacrifices focusing on improving piece activity, creating advanced outposts, and opening up the board in strategically advantageous ways. AlphaZero’s use of sacrifices goes beyond typical opening gambits, involving more intricate plans to damage an opponent's structure or restrict key pieces, demonstrating its deep understanding and advanced strategic foresight .

AlphaZero's learning process is grounded in reinforcement learning, where it plays millions of games against itself and refines its strategy based on these interactions. This is different from traditional engines like Stockfish, which rely heavily on hand-crafted evaluation heuristics combined with deep search. While Stockfish applies concepts like piece mobility with minimal selectivity, AlphaZero uses neural networks to learn these concepts more selectively and with higher influence where valid. This allows AlphaZero to exhibit a more intuitive and human-like play style.

AlphaZero's development has significantly influenced public engagement with AI by demonstrating the potential of machine self-learning and intelligence beyond coded instructions. The creation of powerful open-source projects like Leela Chess Zero inspired by AlphaZero showcases the transformative possibilities of AI, sparking curiosity and interest. Its achievements in complex domains like chess make AI advancements accessible and relatable, driving public fascination and promoting broader engagement with AI technologies and research. This has set a precedent for AI as both a competitive field and a feasible partner in solving global complexity .

AlphaZero has had a profound impact on AI fields beyond just games, serving as a benchmark and inspiration for developing general-purpose algorithms. It has led to the creation of advanced AI systems in other domains, such as the open-source Leela Chess Zero, which parallels AlphaZero's methodologies. The techniques employed by AlphaZero motivate further research into machine self-learning, evidenced by various fields adopting similar reinforcement learning strategies to develop AI systems that could revolutionize industries .

AlphaZero's endgame strategy exemplifies its advanced capabilities by leveraging near-perfect positional understanding and piece coordination. It exercises a high degree of control and adaptability in limiting the opponent's options, frequently using concepts such as opposition, zugzwang, and systematic reduction of the opponent's piece activity. AlphaZero's approach often includes nuanced positional play and lengthy strategic plans, which are outcomes of a training methodology that combines probabilistic assessments with extensive self-play experience. Its ability to simplify positions and convert small advantages into wins distinguishes its endgame play from that of human players.

AlphaZero develops what can be seen as 'intuition' through its reinforcement learning methodology. During training, it updates its value network to re-evaluate each position positively or negatively based on game outcomes. The policy network is adjusted to favor moves that lead to success, as determined by many Monte Carlo tree search simulations. Over millions of self-play games, these networks help AlphaZero to form an intuition-like understanding of which moves or positions are promising, leading to a style of play that feels intuitive and aligned with strong human strategic insight .
