
Reinforcement Learning

What is Reinforcement Learning?


Reinforcement learning (RL) is a subfield of machine learning that involves teaching
computers how to learn from experience by assigning them rewards and punishments
in response to their actions.
Scenario: Imagine a robot named RoboStudent in a classroom setting. RoboStudent's
objective is to learn how to behave in class to maximize its "student performance
score." It interacts with the classroom environment and learns over time.
Key Components:

• Agent (RoboStudent): RoboStudent is the learner and decision-maker in this scenario. It takes actions in the classroom, based on the current state, to maximize its cumulative performance score.
• Environment (Classroom): The classroom is where RoboStudent operates. It can be in various states, such as quiet, noisy, teacher speaking, or students discussing.
• Actions: RoboStudent can take actions like raising its hand, asking questions, paying attention, or chatting with classmates.
• States: The classroom can be in different states, representing different situations. For instance, the teacher might be teaching, students might be discussing, or it might be a quiet study session.
• Rewards: RoboStudent receives rewards or penalties based on its actions. For instance, it might receive a positive reward when it raises its hand and answers a question correctly, or a negative reward when it disrupts the class.
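The components above can be sketched as a minimal agent-environment loop. Everything here (the state names, action names, reward values, and the step function) is made up for illustration of the classroom example; it is not a standard API:

```python
import random

STATES = ["teacher_speaking", "students_discussing", "quiet_study"]
ACTIONS = ["raise_hand", "ask_question", "pay_attention", "chat"]

def step(state, action):
    """Toy environment dynamics: return (next_state, reward) for an action."""
    reward = -1 if action == "chat" else 1   # disrupting the class is penalized
    next_state = random.choice(STATES)       # toy transitions: random next state
    return next_state, reward

# One short episode: the agent acts, the environment responds with a new
# state and a reward, and the agent accumulates its performance score.
state = "teacher_speaking"
total_reward = 0
for _ in range(10):
    action = random.choice(ACTIONS)          # a random policy, for illustration
    state, reward = step(state, action)
    total_reward += reward
print(total_reward)
```

A real agent would replace the random action choice with a policy learned from the rewards it has seen.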
What is Q-Learning?
Q-learning is a reinforcement learning method that learns which action is best to take next, given the current state. During training it tries actions (often chosen at random, to explore) and updates its value estimates with the aim of maximizing the cumulative reward.
Q-Learning Example

We’ll call each room, including outside, a “state”, and the agent’s movement
from one room to another will be an “action”. In our diagram, a “state” is
depicted as a node, while “action” is represented by the arrows.

The reward matrix R for this example. Rows are the current state (0-5), columns are the action (the target room); -1 marks an impossible move, and actions that reach the goal state 5 are worth 100:

          Action
State    0    1    2    3    4    5
  0     -1   -1   -1   -1    0   -1
  1     -1   -1   -1    0   -1  100
  2     -1   -1   -1    0   -1   -1
  3     -1    0    0   -1    0   -1
  4      0   -1   -1    0   -1  100
  5     -1    0   -1   -1    0  100

The transition rule of Q learning is a very simple formula:

Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
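This transition rule can be written directly as a small Python function over reward and value matrices stored as lists of lists (a sketch; the example numbers below are made up):

```python
def q_update(Q, R, state, action, next_state, gamma=0.8):
    """Apply Q(state, action) = R(state, action) + gamma * Max[Q(next state, all actions)]."""
    Q[state][action] = R[state][action] + gamma * max(Q[next_state])
    return Q[state][action]

# Made-up 2-state example: moving from state 0 to state 1, where the best
# action out of state 1 is currently valued at 50.
Q = [[0.0, 0.0], [0.0, 50.0]]
R = [[0.0, 100.0], [0.0, 0.0]]
q_update(Q, R, state=0, action=1, next_state=1)   # 100 + 0.8 * 50 = 140.0
```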


The Q-learning algorithm goes as follows:

1. Set the gamma parameter, and set the environment rewards in matrix R.
2. Initialize matrix Q to zero.
3. For each episode:
   Select a random initial state.
   Do while the goal state hasn't been reached:
   • Select one among all possible actions for the current state.
   • Using this possible action, consider going to the next state.
   • Get the maximum Q value for this next state, based on all possible actions.
   • Compute: Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
   • Set the next state as the current state.
   End do
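As a sketch, these steps can be implemented in a few lines of Python. The R matrix below is the reward matrix for the rooms example (rows are states 0-5, columns are target states; -1 marks an impossible move); the episode count of 1000 is an arbitrary choice that is more than enough for this small problem:

```python
import random

GAMMA = 0.8   # step 1: set the gamma parameter
GOAL = 5      # "outside" is the goal state

# Step 1 (cont.): environment rewards in matrix R.
# -1 marks an impossible move; reaching the goal is worth 100.
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]

Q = [[0] * 6 for _ in range(6)]          # step 2: initialize matrix Q to zero

for episode in range(1000):              # step 3: for each episode...
    state = random.randrange(6)          # select a random initial state
    while state != GOAL:                 # ...until the goal state is reached
        possible = [a for a in range(6) if R[state][a] >= 0]
        action = random.choice(possible)  # select one possible action at random
        next_state = action               # here the action *is* the target room
        # Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
        Q[state][action] = R[state][action] + GAMMA * max(Q[next_state])
        state = next_state                # the next state becomes the current state

# This reproduces the worked values from the text: Q(1, 5) = 100, Q(3, 1) = 80
print(Q[1][5], Q[3][1])
```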
To run the algorithm, set the learning parameter Gamma = 0.8 and take the initial state to be Room 1.

1. Look at the second row (state 1) of matrix R. There are two possible actions from state 1: go to state 3, or go to state 5. By random selection, suppose we choose to go to state 5.

2. Now let’s imagine what would happen if our agent were in state 5. Look at the sixth
row of the reward matrix R (i.e. state 5). It has 3 possible actions: go to states 1, 4, or 5.

3. Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]


Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 * 0 = 100

The next state, 5, now becomes the current state. Because 5 is the goal state, we’ve
finished one episode.

• For the next episode, we start with a randomly chosen initial state. This time, we have
state 3 as our initial state.
• Look at the fourth row (state 3) of matrix R; it has 3 possible actions: go to states 1, 2, or 4. By random selection, suppose we choose to go to state 1.
• Now we imagine that we are in state 1. Look at the second row of reward matrix R (i.e.
state 1). It has 2 possible actions: go to state 3 or state 5. Then, we compute the Q value:

• Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]


Q(3, 1) = R(3,1) + 0.8 * Max[Q(1, 3), Q(1, 5)] = 0 + 0.8 * Max(0, 100) = 80
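The two hand computations above can be checked in a couple of lines (a sketch that stores only the Q entries used; Q(1, 5) is 100 because episode 1 set it):

```python
GAMMA = 0.8

# Q values after episode 1: everything is still 0 except Q(1, 5) = 100.
Q = {(5, 1): 0, (5, 4): 0, (5, 5): 0, (1, 3): 0, (1, 5): 100}

# Episode 1: Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)]
q_1_5 = 100 + GAMMA * max(Q[(5, 1)], Q[(5, 4)], Q[(5, 5)])
# Episode 2: Q(3, 1) = R(3, 1) + 0.8 * Max[Q(1, 3), Q(1, 5)]
q_3_1 = 0 + GAMMA * max(Q[(1, 3)], Q[(1, 5)])

print(q_1_5, q_3_1)   # the worked values: 100 and 80
```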
Questions
1. What are the key components of a reinforcement learning problem, and how do they interact with each other?
2. Distinguish between model-based and model-free reinforcement learning approaches. Illustrate the difference with specific algorithms from each category.
3. How does an agent interact with an environment in the context of reinforcement learning? Provide an example.
