Unit-5
Machine Learning Techniques
KCS 055
Reinforcement Learning
• It is a feedback-based machine learning approach.
• The agent learns from changes occurring in the environment, without any labelled data.
• Goal: to perform actions, based on observations of the environment, that maximize the positive reward received.
• Example: Chess
– Goal: to win the game
– Feedback: a reward or penalty based on each move
Reinforcement Learning
• The learning loop: the agent performs an action, gets feedback (a reward or penalty), and either remains in the same state or changes state.
Two key terms in Reinforcement Learning:
• Rewards: for every action, the agent receives a reward or a penalty.
• Episodes: an episode ends when the agent can take no new action, i.e. when it has achieved the goal or has failed.
Q-Learning Algorithm
• Q-Table: the agent maintains a Q-table holding one Q-value for every state–action pair.
• Steps: Initialize the Q-table → Choose an action → Perform the action → Measure the reward → Update the Q-table (and repeat).
Reference: https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial
Q Table
• Step 1: Initialize the Q-table (e.g. with all zeros).
Q Table
• Step 2: Choose an action. At the start, the agent chooses a random action (down or right); from the second run onward, it uses the updated Q-table to select an action.
Q Table
• Step 3: Perform an action. Initially, the agent is in exploration mode and chooses a random action to explore the environment. The Epsilon-Greedy Strategy is a simple method to balance exploration and exploitation: with probability epsilon the agent explores (takes a random action), and with probability 1 − epsilon it exploits (takes the best-known action).
Q Table
• Step 3 (continued): At the start, epsilon is high, meaning the agent is mostly in exploration mode. As it explores the environment, epsilon decreases and the agent starts to exploit instead. With every iteration, the agent becomes more confident in its Q-value estimates.
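The epsilon-greedy choice and its decay can be sketched in Python; the decay schedule and constant values below are illustrative assumptions, not values from the slides.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Pick a random action with probability epsilon (explore),
    otherwise the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

# Epsilon starts high (mostly exploring) and decays each episode,
# so the agent gradually shifts toward exploiting its Q-values.
epsilon, decay, min_epsilon = 1.0, 0.99, 0.05
for _ in range(200):
    epsilon = max(min_epsilon, epsilon * decay)
```

With epsilon = 0 the function always exploits, so `epsilon_greedy([0.1, 0.9, 0.2], 0.0)` returns action 1.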
Q Table
• Step 4: Update the Q-table.
• We update Q(St, At) using the update rule
Q(St, At) ← Q(St, At) + α [Rt+1 + γ · max_a Q(St+1, a) − Q(St, At)].
It combines the previous Q-value estimate, the learning rate α, and the Temporal-Difference (TD) error. The TD error is calculated from the immediate reward, the discounted maximum expected future reward, and the former Q-value estimate.
• The process is repeated over many episodes until the Q-table stops changing, i.e. the Q-value function has converged.
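As an end-to-end sketch, the update rule can be run on a tiny made-up line-world. The environment, constants, and the uniformly random behaviour policy below are all illustrative choices (Q-learning is off-policy, so it still learns the values of the greedy policy).

```python
import random

# Toy line-world: states 0..4, goal at state 4; actions 0 = left, 1 = right.
# All names and constants here are illustrative, not from the slides.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA = 0.5, 0.9              # learning rate and discount factor
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Move left/right, clipped to the grid; reward 1 only at the goal."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
for _ in range(500):                 # episodes
    s, done = 0, False
    while not done:
        a = random.randrange(2)      # random behaviour policy (off-policy)
        s2, r, done = step(s, a)
        # TD error: immediate reward plus discounted best future value,
        # minus the current estimate.
        td = r + GAMMA * max(Q[s2]) * (not done) - Q[s][a]
        Q[s][a] += ALPHA * td
        s = s2
```

After training, moving right has the higher Q-value in every non-goal state, and Q[3][1] approaches the true value 1.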
Deep Q-Learning
• The solution to the scalability problem of the Q-table comes from the realization that the values in the matrix have only relative importance, i.e. each value matters only with respect to the other values.
• This thinking leads to Deep Q-Learning, which uses a deep neural network to approximate the Q-values.
• The basic working step: the current state is fed into the neural network, which returns the Q-value of every possible action as its output.
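A minimal sketch of this idea, assuming a tiny two-layer network built with NumPy; all sizes and weight values are illustrative.

```python
import numpy as np

# Minimal sketch of the Deep Q-Network idea: a small two-layer network
# maps a state vector to one Q-value per action. Dimensions and random
# weights are illustrative assumptions, not from the slides.
rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 2
W1 = rng.normal(0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    """Forward pass: state in, Q-value of every possible action out."""
    h = np.maximum(0.0, state @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

state = rng.normal(size=STATE_DIM)
q = q_values(state)          # one Q-value per action
action = int(np.argmax(q))   # greedy action
```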
Deep Q-Learning
1. Initialize the main and target neural networks.
2. Choose actions with the Epsilon-Greedy exploration strategy.
3. Use the Bellman equation to update the network weights.
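The update target in step 3, and the periodic refresh of the target network, can be sketched as follows; the numbers and array shapes are made up for illustration.

```python
import numpy as np

# The main network is trained toward the Bellman target
# y = r + gamma * max_a Q_target(s', a), and the target network
# is periodically refreshed from the main one. Values are illustrative.
gamma = 0.99
reward, done = 1.0, False
q_next_target = np.array([0.2, 0.7, 0.1])   # Q_target(s', ·)

# Bellman target for the action actually taken:
y = reward + gamma * np.max(q_next_target) * (not done)

# Periodic hard update: copy main-network weights into the target network.
main_weights = np.array([0.5, -0.3])
target_weights = main_weights.copy()
```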
Genetic Algorithm
• Search-based optimization technique.
• Based on the principles of genetics and natural selection.
• It keeps evolving better solutions over successive generations until it reaches a stopping criterion.
Basic Terminologies of
Genetic Algorithm
• Gene: a single bit of a bit string.
• Chromosome: a possible solution (a bit string; a collection of genes).
• Population: the set of candidate solutions.
Basic Terminologies of
Genetic Algorithm
• Allele: the value a gene takes for a particular chromosome.
• Gene Pool: the set of all possible gene values (alleles) in the population.
Basic Terminologies of
Genetic Algorithm
• Crossover: the process of taking 2 individual bit strings (parent solutions) and producing new child bit strings (offspring) from them.
Basic Terminologies of
Genetic Algorithm
• 3 types of crossover:
– Single-point crossover: bits are swapped between the 2 parent strings after a single crossover point.
– Two-point crossover: bits between 2 chosen points are swapped.
– Uniform crossover: bits at random positions are swapped, each with equal probability.
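The three crossover types above can be sketched as small Python functions; the function names and the representation of chromosomes as Python lists are assumptions made for illustration.

```python
import random

def single_point(p1, p2, point):
    """Single-point crossover: swap everything after the crossover point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point(p1, p2, lo, hi):
    """Two-point crossover: swap only the segment between the two points."""
    return (p1[:lo] + p2[lo:hi] + p1[hi:],
            p2[:lo] + p1[lo:hi] + p2[hi:])

def uniform(p1, p2, rng=random):
    """Uniform crossover: swap each position independently with prob. 0.5."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        if rng.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2
```

For example, `single_point([1, 1, 1, 1], [0, 0, 0, 0], 2)` yields the offspring `[1, 1, 0, 0]` and `[0, 0, 1, 1]`.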
Basic Terminologies of
Genetic Algorithm
• Mutation: a small random change in a chromosome, used to introduce diversity into the genetic population.
• Types of Mutation:
– Bit Flip mutation
– Swap mutation
– Random resetting
– Scramble mutation
– Inversion mutation
Basic Terminologies of
Genetic Algorithm
• Bit Flip Mutation: One or more random bits are selected and flipped.
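A minimal sketch of bit-flip mutation; the mutation rate below is an illustrative choice.

```python
import random

# Each bit flips independently with a small probability (the mutation
# rate). The rate value here is an illustrative assumption.
def bit_flip(chromosome, rate=0.05, rng=random):
    return [(1 - b) if rng.random() < rate else b for b in chromosome]

random.seed(1)
child = bit_flip([0, 1, 0, 1, 1, 0, 0, 1], rate=0.2)
```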
Genetic Algorithm: Overall Flow
• Start with an initial population of solutions; apply selection, crossover, and mutation; then check the termination condition. If it is not met, repeat; if it is, stop and return the optimal solution as output.
Fitness Function
• Determines the fitness of an individual solution (bit string).
• Fitness refers to the ability of an individual to compete with other individuals.
• Individual solutions are selected based on their fitness score.
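The whole loop can be sketched on the classic OneMax problem, where fitness is simply the number of 1-bits; all parameter values and the tournament-selection scheme below are illustrative choices, not prescribed by the slides.

```python
import random

# GA sketch on OneMax: maximize the number of 1-bits in a 20-bit string.
random.seed(0)
BITS, POP, GENS, MUT = 20, 30, 60, 0.02

def fitness(ind):
    """Fitness score: the count of 1-bits."""
    return sum(ind)

def select(pop):
    """Tournament selection: the fitter of two random individuals."""
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    nxt = []
    while len(nxt) < POP:
        p1, p2 = select(pop), select(pop)
        point = random.randrange(1, BITS)        # single-point crossover
        child = p1[:point] + p2[point:]
        # Bit-flip mutation with a small per-bit rate.
        child = [(1 - g) if random.random() < MUT else g for g in child]
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
```

After a few dozen generations, the best individual's fitness should be at or near the maximum of 20.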
Advantages and Disadvantages of GA
Advantages:
• It has a wide solution space.
• It makes it easier to discover the global optimum.
• Multiple GAs can run together on the same CPU.
Disadvantages:
• The fitness function calculation is a limitation.
• Convergence of a GA can be too fast or too slow.
• Choosing suitable parameter values is difficult.
Example-1
a. Evaluate the fitness of each individual, showing all your workings, and arrange them in order with the fittest first and the least fit last.
b. Perform the following crossover operations:
i. Cross the two fittest individuals using one-point crossover at the middle point.
ii. Cross the second and third fittest individuals using a two-point crossover (points b and f).
iii. Cross the first and third fittest individuals using a uniform crossover.
c. Suppose the new population consists of the six offspring produced by the crossover operations above. Evaluate the fitness of the new population, showing all the workings. Has the overall fitness improved?
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
a. Evaluate the fitness of each individual, showing all your
workings, and arrange them in order with the fittest first
and the least fit last.
Sol: f(x1) = (6+5)-(4+1)+(3+5)-(3+2) = 9
f(x2) = (8+7)-(1+2)+(6+6)-(0+1) = 23
f(x3) = (2+3)-(9+2)+(1+2)-(8+5) = -16
f(x4) = (4+1)-(8+5)+(2+0)-(9+4) = -19
The order, fittest first, is: x2, x1, x3, x4.
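These hand calculations can be checked with a short script (the helper name `fit` is just for illustration):

```python
# Fitness function from the example: f(x) = (a+b)-(c+d)+(e+f)-(g+h).
def fit(x):
    a, b, c, d, e, f, g, h = x
    return (a + b) - (c + d) + (e + f) - (g + h)

xs = {
    "x1": [6, 5, 4, 1, 3, 5, 3, 2],
    "x2": [8, 7, 1, 2, 6, 6, 0, 1],
    "x3": [2, 3, 9, 2, 1, 2, 8, 5],
    "x4": [4, 1, 8, 5, 2, 0, 9, 4],
}
scores = {k: fit(v) for k, v in xs.items()}
order = sorted(xs, key=lambda k: scores[k], reverse=True)  # fittest first
```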
i. Cross the two fittest individuals using one-point crossover at the middle point.
Sol: x2 = 8 7 1 2 | 6 6 0 1    o1 = 8 7 1 2 3 5 3 2
x1 = 6 5 4 1 | 3 5 3 2    o2 = 6 5 4 1 6 6 0 1
ii. Cross the second and third fittest individuals using a two-point crossover
(points b and f).
Sol: x1 = 6 5 | 4 1 3 5 | 3 2 o3 = 6 5 9 2 1 2 3 2
x3 = 2 3 | 9 2 1 2 | 8 5 o4 = 2 3 4 1 3 5 8 5
iii. Cross the first and third fittest individuals (ranked 1st and 3rd) using a
uniform crossover.
Sol: x2 = 8 7 1 2 6 6 0 1 o5 = 2 7 1 2 6 2 0 1
x3 = 2 3 9 2 1 2 8 5 o6 = 8 3 9 2 1 6 8 5
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
o1 = 8 7 1 2 3 5 3 2
o2 = 6 5 4 1 6 6 0 1
o3 = 6 5 9 2 1 2 3 2
o4 = 2 3 4 1 3 5 8 5
o5 = 2 7 1 2 6 2 0 1
o6 = 8 3 9 2 1 6 8 5
c. Suppose the new population consists of the six offspring individuals received
by the crossover operations in the above question. Evaluate the fitness of the
new population, showing all the workings. Has the overall fitness improved?
Sol: f(o1) = (8+7)-(1+2)+(3+5)-(3+2) = 15
f(o2) = (6+5)-(4+1)+(6+6)-(0+1) = 17
f(o3) = (6+5)-(9+2)+(1+2)-(3+2) = -2
f(o4) = (2+3)-(4+1)+(3+5)-(8+5) = -5
f(o5) = (2+7)-(1+2)+(6+2)-(0+1) = 13
f(o6) = (8+3)-(9+2)+(1+6)-(8+5) = -6
Yes, the overall fitness has improved: the offspring total is 32, versus −3 for the original population.
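The offspring fitness values and the overall comparison can likewise be checked; since the populations differ in size (4 parents vs. 6 offspring), average fitness is used for the comparison.

```python
# Fitness function from the example: f(x) = (a+b)-(c+d)+(e+f)-(g+h).
def fit(x):
    a, b, c, d, e, f, g, h = x
    return (a + b) - (c + d) + (e + f) - (g + h)

old = [[6, 5, 4, 1, 3, 5, 3, 2], [8, 7, 1, 2, 6, 6, 0, 1],
       [2, 3, 9, 2, 1, 2, 8, 5], [4, 1, 8, 5, 2, 0, 9, 4]]
new = [[8, 7, 1, 2, 3, 5, 3, 2], [6, 5, 4, 1, 6, 6, 0, 1],
       [6, 5, 9, 2, 1, 2, 3, 2], [2, 3, 4, 1, 3, 5, 8, 5],
       [2, 7, 1, 2, 6, 2, 0, 1], [8, 3, 9, 2, 1, 6, 8, 5]]
new_scores = [fit(o) for o in new]
# Compare average fitness, since the population sizes differ.
improved = sum(new_scores) / len(new) > sum(fit(x) for x in old) / len(old)
```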
Reference Books
• Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
• Bishop, C., Pattern Recognition and Machine Learning, Berlin: Springer-Verlag.
Text Books
• Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, Machine Learning, Pearson.
• Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.
• John Paul Mueller and Luca Massaron, Machine Learning for Dummies.
• Dr. Himanshu Sharma, Machine Learning, S.K. Kataria & Sons, 2022.