ML | Monte Carlo Tree Search (MCTS)
Monte Carlo Tree Search (MCTS) is an algorithm designed for problems with extremely large decision spaces, like the game Go with its roughly 10^{170} possible board states. Instead of exploring all moves, MCTS incrementally builds a search tree, using random simulations (rollouts) to guide its decisions. It balances exploration of new possibilities with exploitation of known promising paths, focusing computational effort where it matters most and making it highly efficient for complex decision-making tasks.
Consider a chess player deciding their next move: they could either pursue a line of play they know is good (exploitation) or investigate a new, potentially superior strategy (exploration). MCTS formalizes this decision-making process through statistical sampling and tree construction.
The Four-Phase Algorithm
MCTS consists of four distinct phases that repeat iteratively until a computational budget is exhausted:
Figure: MCTS steps
Selection Phase: Starting from the root node, the algorithm traverses down the tree using a selection policy. The most common approach employs the Upper Confidence Bounds applied to Trees (UCT) formula, which balances exploration and exploitation by selecting child nodes based on both their average reward and uncertainty.
Expansion Phase: When the selection phase reaches a leaf node that isn't terminal, the algorithm expands the tree by adding one or more child nodes representing possible actions from that state.
Simulation Phase: From the newly added node, a random playout is performed until reaching a terminal state. During this phase, moves are chosen randomly or using simple heuristics, making the simulation computationally inexpensive.
Backpropagation Phase: The result of the simulation is propagated back up the tree to the root, updating statistics (visit counts and win rates) for all nodes visited during the selection phase.
The selection phase relies on the UCB1 (Upper Confidence Bound) formula to determine which child node to visit next:
\text{UCB1}(i) = \bar{X}_i + c \sqrt{\frac{\ln(N)}{n_i}}
Where:
- \bar{X}_i is the average reward of node i
- c is the exploration parameter (typically √2)
- N is the total number of visits to the parent node
- n_i is the number of visits to node i
The first term encourages exploitation of nodes with high average rewards, while the second term promotes exploration of less-visited nodes. The logarithmic factor ensures that exploration decreases over time as confidence in the estimates increases.
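As a quick, standalone illustration (the helper function and the numbers below are hypothetical and not part of the Tic-Tac-Toe implementation that follows), here is how UCB1 scores might be computed for three children of a node that has been visited 100 times:
Python
import math

def ucb1(avg_reward, parent_visits, child_visits, c=math.sqrt(2)):
    """Exploitation term plus exploration bonus."""
    return avg_reward + c * math.sqrt(math.log(parent_visits) / child_visits)

# Parent visited N = 100 times; three children with different statistics
print(round(ucb1(0.6, 100, 50), 2))  # well explored, good reward   -> 1.03
print(round(ucb1(0.5, 100, 10), 2))  # moderately explored          -> 1.46
print(round(ucb1(0.9, 100, 2), 2))   # rarely visited, high reward  -> 3.05
The rarely visited child receives the largest score almost entirely because of its exploration bonus, which is exactly the behaviour the formula is designed to produce.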
Python Implementation
Here's a step-by-step implementation of MCTS for the simple game of Tic-Tac-Toe:
1. Importing Libraries
We will start by importing required libraries:
- math : to perform mathematical operations like logarithms and square roots for UCB1 calculations.
- random : to randomly pick moves during simulations (rollouts).
Python
import math
import random
2. MCTS Node Class
We create an MCTSNode class to represent each node (game state) in the search tree. This class contains methods for:
- __init__(): Initializes board state, parent node, move taken, children, visits, wins and untried moves.
- get_actions(): Returns a list of all empty cells as possible moves.
- is_terminal(): Checks if the game is over (winner or no moves left).
- is_fully_expanded(): Checks if all possible moves have been explored.
- check_winner(): Determines if any player has won the game.
Python
class MCTSNode:
    def __init__(self, state, parent=None, action=None):
        self.state = state          # Current board
        self.parent = parent        # Parent node
        self.action = action        # Move leading to this node
        self.children = []          # List of children
        self.visits = 0             # Visit count
        self.wins = 0               # Win count
        self.untried_actions = self.get_actions()  # Available moves

    def get_actions(self):
        """Return all empty cells."""
        return [(i, j) for i in range(3) for j in range(3) if self.state[i][j] == 0]

    def is_terminal(self):
        """Check if the game has ended."""
        return self.check_winner() is not None or not self.get_actions()

    def is_fully_expanded(self):
        return len(self.untried_actions) == 0

    def check_winner(self):
        """Find winner (1 or 2) or None."""
        for i in range(3):
            if self.state[i][0] == self.state[i][1] == self.state[i][2] != 0:
                return self.state[i][0]
            if self.state[0][i] == self.state[1][i] == self.state[2][i] != 0:
                return self.state[0][i]
        if self.state[0][0] == self.state[1][1] == self.state[2][2] != 0:
            return self.state[0][0]
        if self.state[0][2] == self.state[1][1] == self.state[2][0] != 0:
            return self.state[0][2]
        return None
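As a quick sanity check (a small illustration, not part of the original walkthrough), a node built from an empty board should report nine legal moves and a non-terminal state:
Python
empty_board = [[0] * 3 for _ in range(3)]
node = MCTSNode(empty_board)
print(len(node.get_actions()))  # 9 -- every cell is still empty
print(node.is_terminal())       # False -- no winner and moves remain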
3. Expansion, Selection, Rollout and Backpropagation
We now define methods that enable the core MCTS operations:
- expand() : Adds a new child node for an untried move.
- best_child() : Selects the most promising child using the UCB1 formula, balancing exploration and exploitation.
- rollout() : Plays random moves from the current state until the game ends, simulating the outcome.
- backpropagate() : Updates the node's statistics (wins and visits) and propagates them back up to the root.
Python
    # The following methods are also part of the MCTSNode class defined above.

    def expand(self):
        """Add one of the remaining actions as a child."""
        action = self.untried_actions.pop()
        new_state = [row[:] for row in self.state]
        player = self.get_current_player()
        new_state[action[0]][action[1]] = player
        child = MCTSNode(new_state, parent=self, action=action)
        self.children.append(child)
        return child

    def get_current_player(self):
        """Find whose turn it is."""
        x_count = sum(row.count(1) for row in self.state)
        o_count = sum(row.count(2) for row in self.state)
        return 1 if x_count == o_count else 2

    def best_child(self, c=1.4):
        """Select child with best UCB1 score."""
        return max(self.children, key=lambda child:
                   (child.wins / child.visits) +
                   c * math.sqrt(math.log(self.visits) / child.visits))

    def rollout(self):
        """Play random moves until the game ends."""
        state = [row[:] for row in self.state]
        player = self.get_current_player()
        while True:
            winner = self.check_winner_for_state(state)
            if winner:
                # Results are scored from player 1's (X's) perspective:
                # 1 = X wins, 0 = O wins, 0.5 = draw.
                return 1 if winner == 1 else 0
            actions = [(i, j) for i in range(3) for j in range(3) if state[i][j] == 0]
            if not actions:
                return 0.5  # Draw
            move = random.choice(actions)
            state[move[0]][move[1]] = player
            player = 1 if player == 2 else 2

    def check_winner_for_state(self, state):
        """Same winner check as check_winner(), applied to a rollout state."""
        return MCTSNode(state).check_winner()

    def backpropagate(self, result):
        """Update stats up the tree."""
        self.visits += 1
        self.wins += result
        if self.parent:
            self.parent.backpropagate(result)
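To see these pieces working together (a small illustration, not part of the original article's code), we can expand a single child from an empty board, roll it out and backpropagate the result:
Python
root = MCTSNode([[0] * 3 for _ in range(3)])
child = root.expand()          # add one child for an untried move
result = child.rollout()      # random playout from the child's state
child.backpropagate(result)   # updates both child and root statistics
print(root.visits, root.wins)  # 1 visit; wins is 1, 0.5 or 0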
4. Implementing the MCTS Search
Now we implement the mcts_search() function, which performs:
- Selection : choose a promising node.
- Expansion : add new nodes for unexplored moves.
- Simulation (Rollout) : play random games.
- Backpropagation : update nodes with results.
Python
def mcts_search(root_state, iterations=500):
    root = MCTSNode(root_state)
    for _ in range(iterations):
        node = root
        # Selection
        while not node.is_terminal() and node.is_fully_expanded():
            node = node.best_child()
        # Expansion
        if not node.is_terminal():
            node = node.expand()
        # Simulation
        result = node.rollout()
        # Backpropagation
        node.backpropagate(result)
    return root.best_child(c=0).action  # Return best move (c=0: pure exploitation)
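For example, asking the search for an opening move on an empty board (the exact move can vary between runs because the rollouts are random):
Python
board = [[0] * 3 for _ in range(3)]
move = mcts_search(board, iterations=500)
print(move)  # e.g. (1, 1) -- the centre is a frequent choice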
5. Play the Tic-Tac-Toe Game
We define the play_game() function, where:
- Player 1 (MCTS) chooses the best move using MCTS.
- Player 2 plays randomly for demonstration purposes.
Python
def play_game():
    board = [[0]*3 for _ in range(3)]
    current_player = 1
    print("MCTS Tic-Tac-Toe Demo")
    print("0 = empty, 1 = X, 2 = O\n")
    for turn in range(9):
        for row in board:
            print(row)
        print()
        if current_player == 1:
            move = mcts_search(board, iterations=500)
            print(f"MCTS plays: {move}")
        else:
            empty = [(i, j) for i in range(3) for j in range(3) if board[i][j] == 0]
            move = random.choice(empty)
            print(f"Random plays: {move}")
        board[move[0]][move[1]] = current_player
        if MCTSNode(board).check_winner():
            for row in board:
                print(row)
            print(f"Player {current_player} wins!")
            return
        current_player = 1 if current_player == 2 else 2
    print("Draw!")
6. Run the Game
Finally, we call play_game() to run one full match between the MCTS agent (player 1) and the random opponent (player 2):
Python
play_game()
Output:
Sample run output
When running the above implementation, MCTS plays strongly: with 500 iterations per move it identifies winning opportunities and avoids losing positions against the random opponent, and the quality of play improves further as the number of iterations increases.
AlphaGo, which combines MCTS with neural networks, achieved superhuman performance in Go by performing millions of simulations per move. MCTS's strength lies in its ability to focus computational resources on the most promising areas of the search space.
Practical Applications Beyond Games
MCTS has found applications in numerous domains outside of game playing:
1. Planning and Scheduling: The algorithm can optimize resource allocation and task scheduling in complex systems where traditional optimization methods struggle.
2. Neural Architecture Search: MCTS guides the exploration of neural network architectures, helping to discover optimal designs for specific tasks.
3. Portfolio Management: Financial applications use MCTS for portfolio optimization under uncertainty, where the algorithm balances risk and return through simulated market scenarios.
Limitations and Edge Cases
1. Sample Efficiency: The algorithm requires a large number of simulations to achieve reliable estimates, particularly in complex domains. This can be computationally expensive when quick decisions are needed.
2. High Variance: Random simulations can produce inconsistent results, especially in games with high variance outcomes. Techniques like progressive widening and RAVE (Rapid Action Value Estimation) help mitigate this issue.
3. Tactical Blindness: MCTS may miss short-term tactical opportunities due to its reliance on random playouts. In chess, for example, the algorithm might overlook a forced checkmate sequence if the simulations fail to explore the variations.
4. Exploration-Exploitation Balance: The UCB1 formula requires careful tuning of the exploration constant. Too much exploration leads to inefficient search, while too little can cause the algorithm to get trapped in local optima.
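A small, hypothetical illustration of this sensitivity (the numbers below are made up for demonstration): with a low exploration constant the well-explored, higher-reward child ranks first, while a larger constant shifts preference to the rarely visited child.
Python
import math

def ucb1(avg, parent_visits, child_visits, c):
    return avg + c * math.sqrt(math.log(parent_visits) / child_visits)

# Parent visited 100 times; (average reward, visit count) per child
children = {"well explored": (0.6, 80), "rarely visited": (0.4, 5)}
for c in (0.1, 1.4):
    scores = {name: round(ucb1(avg, 100, n, c), 3) for name, (avg, n) in children.items()}
    best = max(scores, key=scores.get)
    print(f"c = {c}: best = {best}, scores = {scores}")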