FrozenLake_using_Dynamic_programming5.ipynb - Colab

The document outlines a project to develop a reinforcement learning agent using dynamic programming to solve a Treasure Hunt problem in a FrozenLake environment. The agent must learn an optimal policy to navigate a 5x5 grid while collecting treasures and avoiding holes, with specific rewards assigned to various grid tiles. The project includes creating a custom environment, implementing value iteration and policy improvement algorithms, and evaluating the agent's performance.



Group No 29
Group Member Names:
1. Mohit Sharma 2023ac05887
2. Neeraj Choudhary 2023AC05998
3. Shubham Yadav 2023ac05241
4. Sooraj T S 2023ac05659

1. Problem statement:

Develop a reinforcement learning agent using dynamic programming to solve the Treasure Hunt problem in a FrozenLake environment.
The agent must learn the optimal policy for navigating the lake while avoiding holes and maximizing its treasure collection.

2. Scenario:

A treasure hunter is navigating a slippery 5x5 FrozenLake grid. The objective is to move through the lake, collecting treasures while avoiding holes, and ultimately reach the exit (goal). The grid is a 5x5 map with tiles labeled S, F, H, G, and T. The state includes the agent's current position and whether each treasure has been collected.

Objective

The agent must learn the optimal policy π* using dynamic programming to maximize its cumulative reward while navigating the lake.
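Concretely, value iteration (used later in this notebook with γ = 0.9) computes the optimal state-value function V* from the Bellman optimality backup and then reads the greedy policy off it:

V*(s) = max_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V*(s') ]

π*(s) = argmax_a Σ_{s'} P(s' | s, a) [ R(s, a, s') + γ V*(s') ]

Here P(s' | s, a) and R(s, a, s') are the transition probabilities and rewards stored in env.P in the code below.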

About the environment

The environment consists of several types of tiles:

Start (S): The initial position of the agent; safe to step on.
Frozen Tiles (F): Frozen surface; safe to step on.
Hole (H): Falling into a hole ends the episode immediately.
Goal (G): Exit point; reaching it ends the episode successfully.
Treasure Tiles (T): Added to the environment. Stepping on one of these tiles awards +5 reward but does not end the episode.

After stepping on a treasure tile, it becomes a frozen tile (F). The agent earns rewards as follows:

Reaching the goal (G): +10 reward.
Falling into a hole (H): -10 reward.
Collecting a treasure (T): +5 reward.
Stepping on a frozen tile (F): 0 reward.

States
Current position of the agent (row, column).
A boolean flag (or equivalent) for whether each treasure has been collected.
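For illustration only, here is a minimal sketch of one way such a state could be encoded as a single integer (the treasure coordinates are taken from the map defined further below; encode_state and TREASURE_POSITIONS are hypothetical helpers, not used elsewhere — the implementation in this notebook keeps just the 25 grid positions as states and instead turns a collected T tile into F):

# Illustrative encoding of (position, treasures collected) into one state index.
TREASURE_POSITIONS = [(0, 4), (2, 3), (3, 0)]
N_ROWS, N_COLS = 5, 5

def encode_state(row, col, collected):
    """collected is a tuple of booleans, one per treasure, in TREASURE_POSITIONS order."""
    pos = row * N_COLS + col                                      # 0..24
    mask = sum(1 << i for i, got in enumerate(collected) if got)  # 0..7
    return mask * (N_ROWS * N_COLS) + pos                         # 0..199

# Example: agent at (2, 3) after collecting only the treasure at (0, 4)
print(encode_state(2, 3, (True, False, False)))  # 38 = 1 * 25 + 13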

Actions

Four possible moves: up, down, left, right (FrozenLake encodes these as 0 = left, 1 = down, 2 = right, 3 = up, which is the convention used by the policy visualization below).

Rewards
Goal (G): +10.
Treasure (T): +5 per treasure.
Hole (H): -10.
Frozen tiles (F): 0.

Environment
Modify the FrozenLake environment in OpenAI Gym to include treasures (T) at certain positions. Inherit from the original FrozenLakeEnv and override the reset and step methods accordingly. Example grid:


S F F H T
F H F F F
F F F T F
T F H F F
F F F F G


Expected Outcomes:

1. Create the custom environment by modifying the existing “FrozenLakeNotSlippery-v0” in OpenAI Gym, and implement dynamic programming using value iteration and policy improvement to learn the optimal policy for the Treasure Hunt problem.
2. Calculate the state-value function (V*) for each state on the map after learning the optimal policy.
3. Compare the agent’s performance with and without treasures, discussing the trade-offs in reward maximization.
4. Visualize the agent’s direction on the map using the learned policy.
5. Calculate expected total reward over multiple episodes to evaluate performance.

Import required libraries and define the custom environment - 2 Marks
!pip install gymnasium

Requirement already satisfied: gymnasium in /usr/local/lib/python3.10/dist-packages (1.0.0)


Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from gymnasium) (1.26.4)
Requirement already satisfied: cloudpickle>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from gymnasium) (3.1.0)
Requirement already satisfied: typing-extensions>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from gymnasium) (4.12.2)
Requirement already satisfied: farama-notifications>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from gymnasium) (0.0.4)

# Import statements
import numpy as np
import gymnasium as gym
from gymnasium.envs.toy_text.frozen_lake import FrozenLakeEnv

# Custom environment that builds the given grid and overrides the functions
# needed for the problem: taking an action (step), assigning rewards, and
# detecting the end of an episode.

class CustomFrozenLake(FrozenLakeEnv):
    def __init__(self, desc=None, is_slippery=False):
        if desc is None:
            raise ValueError("A custom map (desc) must be provided.")
        self.desc = np.asarray(desc, dtype="c")  # Convert the map to a character array
        print(f"this is the shape of desc {self.desc.shape}")
        self.nrow, self.ncol = self.desc.shape
        self.nS = self.nrow * self.ncol  # Total states
        self.nA = 4                      # Total actions (left, down, right, up)

        # Custom rewards assigned by step() based on the tile that is reached.
        # Note: the transition model self.P built by the base class keeps the
        # default FrozenLake rewards (+1 only on entering G), and it is this
        # model that value iteration plans on.
        self.reward_map = {
            b'S': 0,    # Start
            b'F': 0,    # Frozen tile
            b'H': -10,  # Hole
            b'G': 10,   # Goal
            b'T': 5     # Treasure
        }

        super().__init__(desc=self.desc, map_name=None, is_slippery=is_slippery)  # Custom map

    def step(self, action):
        # Gymnasium's step returns (state, reward, terminated, truncated, info);
        # collapse the two termination flags into a single `done` for the caller.
        state, reward, terminated, truncated, info = super().step(action)
        done = terminated or truncated
        row, col = divmod(state, self.ncol)  # Get row and column from the state index
        tile = self.desc[row][col]
        reward = self.reward_map[tile]
        # Convert a treasure tile to a frozen tile once it has been collected.
        # (The map is not restored in reset(), so treasures stay collected
        # across episodes.)
        if tile == b'T':
            self.desc[row][col] = b'F'
        return state, reward, done, info  # state is a single integer index

    def reset(self, **kwargs):
        # Gymnasium's reset returns (state, info); return only the state index.
        state, info = super().reset(**kwargs)
        return state

# Define the custom 5x5 map

map_desc = [
    "SFFHT",  # Treasure tile at (0, 4)
    "FHFFF",
    "FFFTF",  # Treasure tile at (2, 3)
    "TFHFF",  # Treasure tile at (3, 0)
    "FFFFG"   # Goal at (4, 4)
]

# Create the environment
env = CustomFrozenLake(desc=map_desc, is_slippery=False)

# Verify the environment details
print(f"Rows: {env.nrow}, Columns: {env.ncol}, Total States: {env.nS}, Actions: {env.nA}")

this is the shape of desc (5, 5)


Rows: 5, Columns: 5, Total States: 25, Actions: 4

for state in env.P:
    print(f"State {state}:")
    for action in env.P[state]:
        print(f" Action {action}: {env.P[state][action]}")

State 0:
Action 0: [(1.0, 0, 0.0, False)]
Action 1: [(1.0, 5, 0.0, False)]
Action 2: [(1.0, 1, 0.0, False)]
Action 3: [(1.0, 0, 0.0, False)]
State 1:
Action 0: [(1.0, 0, 0.0, False)]
Action 1: [(1.0, 6, 0.0, True)]
Action 2: [(1.0, 2, 0.0, False)]
Action 3: [(1.0, 1, 0.0, False)]
State 2:
Action 0: [(1.0, 1, 0.0, False)]
Action 1: [(1.0, 7, 0.0, False)]
Action 2: [(1.0, 3, 0.0, True)]
Action 3: [(1.0, 2, 0.0, False)]
State 3:
Action 0: [(1.0, 3, 0, True)]
Action 1: [(1.0, 3, 0, True)]
Action 2: [(1.0, 3, 0, True)]
Action 3: [(1.0, 3, 0, True)]
State 4:
Action 0: [(1.0, 3, 0.0, True)]
Action 1: [(1.0, 9, 0.0, False)]
Action 2: [(1.0, 4, 0.0, False)]
Action 3: [(1.0, 4, 0.0, False)]
State 5:
Action 0: [(1.0, 5, 0.0, False)]
Action 1: [(1.0, 10, 0.0, False)]
Action 2: [(1.0, 6, 0.0, True)]
Action 3: [(1.0, 0, 0.0, False)]
State 6:
Action 0: [(1.0, 6, 0, True)]
Action 1: [(1.0, 6, 0, True)]
Action 2: [(1.0, 6, 0, True)]
Action 3: [(1.0, 6, 0, True)]
State 7:
Action 0: [(1.0, 6, 0.0, True)]
Action 1: [(1.0, 12, 0.0, False)]
Action 2: [(1.0, 8, 0.0, False)]
Action 3: [(1.0, 2, 0.0, False)]
State 8:
Action 0: [(1.0, 7, 0.0, False)]
Action 1: [(1.0, 13, 0.0, False)]
Action 2: [(1.0, 9, 0.0, False)]
Action 3: [(1.0, 3, 0.0, True)]
State 9:
Action 0: [(1.0, 8, 0.0, False)]
Action 1: [(1.0, 14, 0.0, False)]
Action 2: [(1.0, 9, 0.0, False)]
Action 3: [(1.0, 4, 0.0, False)]
State 10:
Action 0: [(1.0, 10, 0.0, False)]

Action 1: [(1.0, 15, 0.0, False)]
Action 2: [(1.0, 11, 0.0, False)]
Action 3: [(1.0, 5, 0.0, False)]
State 11:
Action 0: [(1.0, 10, 0.0, False)]
Action 1: [(1.0, 16, 0.0, False)]

Value Iteration Algorithm - 1 Mark


def value_iteration(env, gamma=0.9, theta=1e-6):
    # Repeatedly apply the Bellman optimality backup until the values converge.
    value_table = np.zeros(env.nS)
    while True:
        delta = 0
        for state in range(env.nS):
            v = value_table[state]
            q_values = []
            for action in range(env.nA):
                q_value = sum(prob * (reward + gamma * value_table[next_state])
                              for prob, next_state, reward, done in env.P[state][action])
                q_values.append(q_value)
            value_table[state] = max(q_values)
            delta = max(delta, abs(v - value_table[state]))
        if delta < theta:
            break
    return value_table

# Compute the optimal value function
optimal_value = value_iteration(env)
print("\nOptimal Value Function (V*):")
print(optimal_value.reshape(env.nrow, env.ncol))

Optimal Value Function (V*):


[[0.4782969 0.531441 0.59049 0. 0.729 ]
[0.531441 0. 0.6561 0.729 0.81 ]
[0.59049 0.6561 0.729 0.81 0.9 ]
[0.6561 0.729 0. 0.9 1. ]
[0.729 0.81 0.9 1. 0. ]]
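Note that every entry above is a power of the discount factor: the transition model env.P that value iteration plans on still carries the base FrozenLake reward (+1 only on entering G), so for this deterministic map

V*(s) = 0.9^(d(s) − 1),

where d(s) is the number of steps on the shortest safe path from s to the goal. The start state has d = 8, hence 0.9^7 ≈ 0.4783 in the top-left corner. The custom +5/−10/+10 rewards in reward_map are applied only by step() at execution time, not by this planning stage.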

Policy Improvement Function - 1 Mark


def policy_improvement(env, value_table, gamma=0.9):
    # Greedy policy extraction: for each state pick the action with the
    # highest one-step lookahead value under the given value table.
    policy = np.zeros([env.nS, env.nA])
    for state in range(env.nS):
        q_values = []
        for action in range(env.nA):
            q_value = sum(prob * (reward + gamma * value_table[next_state])
                          for prob, next_state, reward, done in env.P[state][action])
            q_values.append(q_value)
        best_action = np.argmax(q_values)
        policy[state, best_action] = 1.0
    return policy

# Compute the optimal policy
optimal_policy = policy_improvement(env, optimal_value)
print("\nOptimal Policy (as action probabilities):")
print(optimal_policy)


Optimal Policy (as action probabilities):


[[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 1. 0.]
[0. 0. 1. 0.]
[0. 0. 1. 0.]
[1. 0. 0. 0.]]

Visualization of the learned optimal policy - 1 Mark


def visualize_policy(env, policy):
    # Action symbols follow the FrozenLake action order: 0=left, 1=down, 2=right, 3=up.
    action_symbols = ['<', 'v', '>', '^']
    policy_grid = np.array([action_symbols[np.argmax(policy[state])] for state in range(env.nS)])
    policy_grid = policy_grid.reshape(env.desc.shape)
    print("\nOptimal Policy Directions:")
    for row in policy_grid:
        print(' '.join(row))

# Visualize the optimal policy
visualize_policy(env, optimal_policy)

Optimal Policy Directions:


v > v < v
v < v v v
v v > v v
v v < v v
> > > > <



print(f"{env}")

<CustomFrozenLake instance>


def evaluate_policy(env, policy, episodes=1000):
    # Roll out the greedy policy and average the (custom) rewards returned by step().
    total_reward = 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Safety check: the state index must be valid for the policy table
            if state >= policy.shape[0]:
                print("Invalid state:", state)
                break
            action = np.argmax(policy[state])          # Get the best action
            state, reward, done, _ = env.step(action)  # Take a step in the environment
            total_reward += reward
    return total_reward / episodes

print(env)

<CustomFrozenLake instance>

average_reward = evaluate_policy(env, optimal_policy)
print(f"\nExpected Average Reward Over 1000 Episodes: {average_reward}")

Expected Average Reward Over 1000 Episodes: 8.7
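Expected outcome 3 asks for a comparison of the agent's performance with and without treasures. A minimal sketch of how the same pipeline could be reused for that comparison (the treasure-free map simply replaces every T with F; the variable names below are illustrative and not part of the notebook's recorded run):

# Re-run planning and evaluation on the same map with all treasure tiles removed.
map_desc_no_treasure = [row.replace("T", "F") for row in map_desc]

env_no_t = CustomFrozenLake(desc=map_desc_no_treasure, is_slippery=False)
value_no_t = value_iteration(env_no_t)
policy_no_t = policy_improvement(env_no_t, value_no_t)
avg_no_t = evaluate_policy(env_no_t, policy_no_t)

print(f"Average reward with treasures:    {average_reward}")
print(f"Average reward without treasures: {avg_no_t}")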

Main Execution


if __name__ == "__main__":
    env = CustomFrozenLake(desc=map_desc, is_slippery=False)
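As written, this cell only re-creates the environment. One possible way to wire the full pipeline together inside the same guard, using only the functions already defined in this notebook (a sketch, not the notebook's own main block):

if __name__ == "__main__":
    env = CustomFrozenLake(desc=map_desc, is_slippery=False)

    # Plan with dynamic programming
    optimal_value = value_iteration(env)
    optimal_policy = policy_improvement(env, optimal_value)

    # Inspect and evaluate the learned policy
    print(optimal_value.reshape(env.nrow, env.ncol))
    visualize_policy(env, optimal_policy)
    print(f"Average reward over 1000 episodes: {evaluate_policy(env, optimal_policy)}")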


Explanation of Results:

1. Value Iteration: Finds the optimal value of each state.
2. Policy Improvement: Maps the best action for each state.
3. Policy Visualization: Displays the agent's movement on the grid.
4. Average Reward: Indicates the expected performance of the optimal policy.


Final Answer:

The notebook above contains the complete code for solving the Treasure Hunt problem in the FrozenLake environment using dynamic programming: custom environment setup, value iteration, policy improvement, visualization, and evaluation.

