Mila University Centre
M1: I2A
Uncertain decision
Work 2 (reinforcement learning)
1. Definition
Gym is an open-source Python library created by OpenAI that provides standard environments for reinforcement learning experiments.
2. Install OpenAI Gym
- pip install gym
- pip install "gym[toy_text]"
3. The Frozen Lake Environment
Frozen Lake involves crossing a frozen lake from Start (S) to Goal (G) by walking over the Frozen (F)
fields without falling into any Holes (H) (see the figure below).
 0   1   2   3
 S   F   F   F
 4   5   6   7
 F   H   F   H
 8   9  10  11
 F   F   F   H
12  13  14  15
 H   F   F   G
Figure: Frozen Lake environment (each state number is shown above its field)
import gym
# Create the Frozen Lake environment; render_mode="human" opens a graphical window.
env = gym.make("FrozenLake-v1", render_mode="human")
env.reset()   # Put the environment in its initial state (state 0).
env.render()  # Draw the current state; use render_mode="ansi" for a text rendering instead.
3.1 State space
This environment consists of 16 fields (4 by 4 grid). The states are numbered from 0 to 15 (see the figure
above). There are four types of fields: the start field (S), frozen fields (F), holes (H), and the goal field
(G). An episode ends when the agent steps on a hole or reaches the goal field.
env.observation_space
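The states are numbered row by row, so a state index can be converted to a grid position with integer division. The following is a minimal sketch, assuming the 4 by 4 layout shown in the figure above; LAKE_MAP, state_to_cell, and field_type are illustrative names and are not part of Gym.

```python
# Layout copied from the figure: one string per row of the grid.
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = 4


def state_to_cell(state):
    """Convert a state index (0 to 15) into a (row, column) pair."""
    return divmod(state, N_COLS)


def field_type(state):
    """Return the letter (S, F, H or G) of the field for a state index."""
    row, col = state_to_cell(state)
    return LAKE_MAP[row][col]


# List the states that are holes according to the map.
holes = [s for s in range(16) if field_type(s) == "H"]
print(state_to_cell(6), field_type(6), holes)
```

With a live environment, env.observation_space.n would give the number of states (16 here) instead of the hard-coded range.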
3.2 Action space
env.action_space
There are 4 possible actions: left (0), down (1), right (2), up (3).
To take a random action, use:
random_action = env.action_space.sample()
env.step(random_action)
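This function returns a tuple such as:
(1, 0.0, False, False, {'prob': 0.3333333333333333})
1: the new state; 2: the reward; 3: a Boolean (terminated) that is True when the agent reaches the goal or
falls into a hole; 4: a Boolean (truncated) that is True when the episode is cut off by the time limit;
5: a dictionary of additional information.
In this dictionary, 'prob' is the probability of the transition that actually occurred. Because the frozen
lake is slippery, the agent does not always move in the intended direction: it moves in the intended
direction or in one of the two perpendicular directions, each with probability 1/3.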
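To make the meaning of each returned value explicit, the tuple can be unpacked by name. The following is a minimal sketch that uses the example tuple above rather than a live environment. StepResult is a hypothetical helper, not a Gym class, and the field names terminated and truncated assume the five-value step API of Gym 0.26 and later.

```python
from typing import NamedTuple


class StepResult(NamedTuple):
    """Hypothetical container naming the five values returned by env.step."""
    state: int
    reward: float
    terminated: bool
    truncated: bool
    info: dict


# Example tuple copied from the text above. With a live environment one
# would write: result = StepResult(*env.step(action))
result = StepResult(1, 0.0, False, False, {"prob": 0.3333333333333333})

# The episode is over as soon as either flag is set.
episode_over = result.terminated or result.truncated
print(result.state, result.reward, episode_over, result.info["prob"])
```

Checking both flags matters when looping over steps: an episode can stop either because the agent reached a terminal field or because the step limit was hit.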
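3.3 Transition probabilities
env.unwrapped.P[s][a]
This returns the list of possible transitions from state s under action a, one tuple per outcome:
(probability, next state, reward, terminated). The next state may be s itself. For example, for s = 0 and
a = 1 (down):
[(0.3333333333333333, 0, 0.0, False), (0.3333333333333333, 4, 0.0, False), (0.3333333333333333, 1,
0.0, False)]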
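To see where such a list comes from, the transition table can be rebuilt by hand. The following is a pure-Python sketch, not Gym's own code. It assumes that on the slippery lake the agent moves in the intended direction or in one of the two perpendicular directions with probability 1/3 each, that moves off the grid leave the agent in place, and that only reaching the goal gives a reward of 1. The function names and the ordering of the outcomes are assumptions chosen to reproduce the example output above for s = 0 and a = 1.

```python
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_ROWS, N_COLS = 4, 4
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def move(state, action):
    """Deterministic move on the grid; the agent stays put at the borders."""
    row, col = divmod(state, N_COLS)
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, N_ROWS - 1)
    elif action == RIGHT:
        col = min(col + 1, N_COLS - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row * N_COLS + col


def transitions(state, action):
    """List of (probability, next_state, reward, terminated) tuples."""
    row, col = divmod(state, N_COLS)
    if LAKE_MAP[row][col] in "HG":
        # Holes and the goal are terminal: the agent does not leave them.
        return [(1.0, state, 0.0, True)]
    outcomes = []
    # Slippery ice: the actual direction is the intended one or one of
    # its two perpendicular neighbours, each with probability 1/3.
    for actual in [(action - 1) % 4, action, (action + 1) % 4]:
        next_state = move(state, actual)
        r, c = divmod(next_state, N_COLS)
        field = LAKE_MAP[r][c]
        reward = 1.0 if field == "G" else 0.0
        outcomes.append((1 / 3, next_state, reward, field in "HG"))
    return outcomes


print(transitions(0, DOWN))
```

The probabilities of each list sum to 1, and the same next state can appear more than once, for instance when two of the three directions both point off the grid.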
3.4 Closing the environment
env.close()
4. Questions
1. Simulate an episode
2. Implement an algorithm that determines the optimal policy for reaching the Goal.