Unit-5

The document discusses Reinforcement Learning (RL), a machine learning approach that learns from feedback in an environment without labeled data, aiming to maximize rewards through actions. It covers basic components of RL, including agents, environments, actions, states, rewards, and penalties, as well as algorithms like Q-Learning and Deep Q-Learning, which utilize neural networks for complex environments. Additionally, it introduces Genetic Algorithms as a search-based optimization technique inspired by natural selection, detailing its components and processes.


Machine Learning Techniques (KCS 055)

Reinforcement Learning
• It is a feedback-based machine learning approach.
• The agent learns from changes occurring in the environment, without any labelled data.
• Goal: to perform actions based on observations of the environment and obtain the maximum positive reward.
• Example: Chessboard
  – Goal: to win the game
  – Feedback: based on the right choice of move
Reinforcement Learning

• The agent learns from its own experience, as there is no labelled data.
• It is used to solve problems where decision making is sequential and the goal is long-term, such as game playing and robotics.
Basic Components Of Reinforcement
Learning

• Agent → A hardware/software/computer program, e.g. an AI robot or a robotic car.
• Environment → The situation or surroundings of the agent, e.g. a road or highway.
• Action → The movement of the agent inside the environment, e.g. move right/left/up/down.
• State → The situation returned by the environment after each action.
Basic Components Of Reinforcement
Learning

• Reward → Positive feedback.
• Penalty → Negative feedback.
• Policy → The agent's strategy for choosing its next action.
• Policy Map → The mapping the agent uses to select an action in each state.
Steps in Reinforcement Learning
Take an action → Get feedback (reward/penalty) → Remain in the same state or change state
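As a rough illustration of this loop, here is a minimal Python sketch of one episode of agent-environment interaction. It assumes a hypothetical Gym-style environment object (`env` with `reset()`/`step()`) and a `choose_action` function; these names are illustrative and not part of the slides.

```python
# Minimal sketch of the agent-environment feedback loop, assuming a
# hypothetical Gym-style environment with reset()/step() methods.
def run_episode(env, choose_action, max_steps=100):
    state = env.reset()                 # start in an initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)   # agent picks an action
        next_state, reward, done = env.step(action)  # environment gives feedback
        total_reward += reward          # reward or penalty accumulates
        state = next_state              # remain in / change state
        if done:                        # episode ends (goal reached or failure)
            break
    return total_reward
```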
Two types of Reinforcement Learning:

Positive Reinforcement Learning
• Recurrence of a behavior due to positive rewards.
• Such rewards increase the strength and frequency of a specific behavior and encourage the agent to execute similar actions in the future.

Negative Reinforcement Learning
• Negative rewards are used as a deterrent to weaken a behavior and to avoid it.
• These rewards decrease the strength and frequency of a specific behavior.
Markov Decision Problem
Q-Learning Algorithm

• Model-free reinforcement learning algorithm.

• Learns the value of an action in a particular state.

• The ‘Q’ stands for quality of actions.

• The quality represents the usefulness of a given action.


Q-Learning Algorithm

• States(s): the current position of the agent in the environment.

• Action(a): a step taken by the agent in a particular state.

• Rewards: for every action, the agent receives a reward and penalty.

• Episodes: the end of a stage, after which the agent cannot take a new action. It happens when the agent has achieved the goal or has failed.
Q-Learning Algorithm

• Q(St+1, a): the expected optimal Q-value of taking an action in the next state.
• Q(St, At): the current estimate, which is updated towards Q(St+1, a).
• Q-Table: the agent maintains a table of Q-values for all sets of states and actions.
• Temporal Difference (TD): used to estimate the expected value of Q(St+1, a) from the current state and action and the previous state and action.
Temporal Difference

• Model-free learning.
• A combination of the Monte Carlo (MC) method and the Dynamic Programming (DP) method.
• Monte Carlo (MC) ideas:
  – Learns directly from raw experience, i.e. without a model.
  – No predefined model.
Temporal Difference

• Dynamic Programming (DP) idea:
  – Updates estimates from partially learned estimates, rather than waiting for the final outcome.
• Two properties of Temporal Difference learning:
  – It does not require the model to be known in advance.
  – It can also be applied to non-episodic tasks.
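As a minimal illustration of these ideas, the TD(0) update nudges the state-value estimate V(s) towards the bootstrapped target r + γ·V(s′) after every step, without waiting for the final outcome and without a model. The learning rate and discount factor in this sketch are illustrative values.

```python
from collections import defaultdict

# TD(0) state-value update sketch: move V(s) towards the bootstrapped target.
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    td_target = reward + gamma * V[next_state]   # bootstrapped estimate (DP idea)
    td_error = td_target - V[state]              # temporal-difference error
    V[state] += alpha * td_error                 # move the estimate towards the target
    return V

V = defaultdict(float)                           # value estimates, default 0.0
V = td0_update(V, state="s0", reward=1.0, next_state="s1")
```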
Temporal Difference
Steps followed:

• Exploration: explore all possible paths.
• Exploitation: identify the best possible path.

Initialize Q-table → Choose an Action → Perform the Action → Measure Reward → Update Q-table

A number of iterations result in a good Q-table.


Q function
• Based on the Bellman equation.
• Takes two inputs: a state (s) and an action (a).
Updating Q-table
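The update equation shown on this slide is not reproduced in the extracted text; as a minimal sketch, the standard Bellman-based Q-learning update can be written as follows (the learning rate `alpha` and discount factor `gamma` are illustrative values):

```python
import numpy as np

# Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]
# Q is assumed to be a 2-D array of shape (n_states, n_actions).
def update_q(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = np.max(Q[next_state])                         # best future Q-value
    td_error = reward + gamma * best_next - Q[state, action]  # temporal-difference error
    Q[state, action] += alpha * td_error
    return Q
```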
Q Table
• Example: in a game
• Actions: up, down, right, left
• States: Start, End, Idle, Hole, etc.

Reference: https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial
Q-Learning Algorithm
Q Table
• Step 1: Initialize the Q-table.
Q Table
• Step 2: Choose an action. At the start, the agent will choose a random action (down or right); on the second run, it will use the updated Q-table to select the action.
Q Table
• Step 3: Perform an action. Initially, the agent is in exploration mode and chooses a random action to explore the environment. The Epsilon-Greedy Strategy is a simple method to balance exploration and exploitation: epsilon is the probability of choosing to explore, and the agent exploits when the chance of exploring is small.
Q Table
• Step 3 (continued): At the start, the epsilon rate is higher, meaning the agent is in exploration mode. As the agent explores the environment, epsilon decreases and the agent starts to exploit it. With every iteration of exploration, the agent becomes more confident in its estimates of the Q-values.
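A minimal sketch of the epsilon-greedy strategy described above is shown below; the decay rate and minimum epsilon are illustrative choices, not values from the slides.

```python
import random
import numpy as np

# With probability epsilon the agent explores (random action);
# otherwise it exploits the current Q-table.
def epsilon_greedy(Q, state, epsilon, n_actions):
    if random.random() < epsilon:
        return random.randrange(n_actions)      # explore
    return int(np.argmax(Q[state]))             # exploit

def decay_epsilon(epsilon, decay=0.995, min_epsilon=0.05):
    return max(min_epsilon, epsilon * decay)    # exploration shrinks over time
```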
Q Table
• Step 4: Update the Q-table. We update Q(St, At) using the update equation, which combines the previous estimate of the Q-value, the learning rate, and the Temporal Difference error. The Temporal Difference error is calculated from the immediate reward, the discounted maximum expected future reward, and the former Q-value estimate.
• The process is repeated multiple times until the Q-table converges and the Q-value function is maximized.
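Putting the four steps together, the following sketch assumes a hypothetical environment with discrete integer states and actions (Gym-style `reset()`/`step()`) and reuses the `update_q`, `epsilon_greedy` and `decay_epsilon` helpers sketched above; the episode count and hyperparameters are illustrative.

```python
import numpy as np

# Tabular Q-learning training loop sketch (hypothetical discrete environment).
def train_q_table(env, n_states, n_actions, n_episodes=1000):
    Q = np.zeros((n_states, n_actions))          # Step 1: initialize Q-table
    epsilon = 1.0
    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(Q, state, epsilon, n_actions)  # Step 2: choose
            next_state, reward, done = env.step(action)            # Step 3: perform
            Q = update_q(Q, state, action, reward, next_state)     # Step 4: update
            state = next_state
        epsilon = decay_epsilon(epsilon)         # shift from exploration to exploitation
    return Q
```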
Deep Q learning

• Q-Learning creates an exact matrix that the working agent can “refer to” to maximize its reward in the long run.
• This is only practical for very small environments and quickly loses its feasibility as the number of states and actions in the environment increases.
Deep Q learning

• The solution for the above problem comes from the realization
that the values in the matrix only have relative importance i.e. the
values only have importance with respect to the other values.
• Thus, this thinking leads us to Deep Q-Learning which uses a deep
neural network to approximate the values.
• The basic working step for Deep Q-Learning is that the initial state
is fed into the neural network, and it returns the Q-value of all
possible actions as an output.
Deep Q learning

• It is a variant of Q-Learning that uses a deep neural network to represent the Q-function, rather than a simple table of values.
• It can handle environments with a large number of states and actions, as well as learn from high-dimensional inputs such as images or sensor data.
• Most important features:
  – Experience replay
  – Target networks
Deep Q learning
(Experience replay)

• Experience replay is a technique where the agent


stores a subset of its experiences (state, action,
reward, next state) in a memory buffer and samples
from this buffer to update the Q-function.
• This helps to decorrelate the data and make the
learning process more stable.
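A minimal sketch of such a replay buffer is shown below, using Python's `deque`; the capacity and batch size are illustrative values, not taken from the slides.

```python
import random
from collections import deque

# Experience replay: store transitions and sample random mini-batches,
# which decorrelates the training data.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)     # oldest experiences are discarded

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)   # random, decorrelated batch
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```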
Deep Q learning
(Target Networks)

• Target networks are used to stabilize the Q-function


updates.
• In this technique, a separate network is used to
compute the target Q-values, which are then used to
update the Q-function network.
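As a framework-free illustration of this idea, the sketch below assumes `q_net` and `target_net` are objects exposing `predict()`, `get_weights()` and `set_weights()`; a real implementation would normally build the networks with a deep-learning library instead.

```python
import copy
import numpy as np

# Compute Bellman targets with the (frozen) target network.
def compute_targets(target_net, rewards, next_states, dones, gamma=0.99):
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        if done:
            targets.append(r)                                   # no future reward
        else:
            targets.append(r + gamma * np.max(target_net.predict(s_next)))
    return np.array(targets)

# Periodically copy the online network's weights into the target network.
def sync_target(q_net, target_net):
    target_net.set_weights(copy.deepcopy(q_net.get_weights()))
```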
Q learning vs Deep Q learning
Deep Q learning
1. Initialize the main and target neural networks.
2. Use an epsilon-greedy exploration strategy.
3. Use the Bellman equation to update the network weights.
Genetic Algorithm
• A search-based optimization technique.
• Based on the principles of genetics and natural selection.
• It keeps evolving better solutions over successive generations until it reaches a stopping criterion.
Basic Terminologies of
Genetic Algorithm
• Gene: a single bit of a bit string.
• Chromosome: a possible solution (a bit string, i.e. a collection of genes).
• Population: a set of solutions.
Basic Terminologies of
Genetic Algorithm
• Allele: the value a gene takes on for a particular chromosome.
• Gene Pool: the set of all possible gene values (alleles) in the population.
Basic Terminologies of
Genetic Algorithm
• Crossover: the process of taking two parent bit strings (solutions) and producing new child bit strings (offspring) from them.
Basic Terminologies of
Genetic Algorithm
• Three types of crossover:
  – Single-point crossover: data bits are swapped between the two parent strings after a chosen crossover point.
  – Two-point crossover: bits between two chosen points are swapped.
  – Uniform crossover: individual bits are swapped at random with equal probability.
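As a rough illustration, the three operators can be sketched on equal-length bit strings as follows; the crossover points are chosen at random, and the Python string representation is an assumption made for the example.

```python
import random

# Single-point crossover: swap everything after a random cut point.
def single_point(p1, p2):
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

# Two-point crossover: swap the segment between two random points.
def two_point(p1, p2):
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

# Uniform crossover: swap each bit with equal probability.
def uniform(p1, p2):
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return "".join(c1), "".join(c2)
```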
Basic Terminologies of
Genetic Algorithm
• Mutation: a small random change in a chromosome. It is used to introduce diversity into the genetic population.
• Types of mutation:
  – Bit flip mutation
  – Swap mutation
  – Random resetting
  – Scramble mutation
  – Inversion mutation
Basic Terminologies of
Genetic Algorithm

• Bit Flip Mutation: One or more random bits are selected and flipped.
Basic Terminologies of
Genetic Algorithm

• Random Resetting: an extension of the bit-flip method, used for integer representations.
Basic Terminologies of
Genetic Algorithm
• Swap Mutation: two positions on the chromosome are selected at random and their values are interchanged.
Basic Terminologies of
Genetic Algorithm
• Scramble Mutation: a subset of genes is chosen and their values are shuffled randomly.
Basic Terminologies of
Genetic Algorithm
• Inversion Mutation: a subset of genes is chosen, and the order of the genes in that subset is reversed.
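As a rough illustration, three of the mutation operators above can be sketched on a chromosome represented as a Python list of genes; the mutation rate is an illustrative value.

```python
import random

# Bit flip: flip each bit independently with a small probability.
def bit_flip(chrom, rate=0.1):
    return [1 - g if random.random() < rate else g for g in chrom]

# Swap: interchange the values at two random positions.
def swap(chrom):
    i, j = random.sample(range(len(chrom)), 2)
    chrom = chrom[:]                                 # copy before modifying
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

# Inversion: reverse the order of a randomly chosen subset of genes.
def inversion(chrom):
    i, j = sorted(random.sample(range(len(chrom)), 2))
    return chrom[:i] + chrom[i:j + 1][::-1] + chrom[j + 1:]
```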
Flow chart of GA

Start → Initial Population of Solutions → Selection of the Best Individual Solutions → Crossover → Mutation → Evolution of the next generation → Terminate? (No: repeat from Selection; Yes: output the Optimal Solution and Stop)
Fitness Function
• Determines the fitness of an individual solution (bit string).
• Fitness refers to the ability of an individual to compete with other individuals.
• An individual solution is selected based on its fitness score.
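A minimal sketch of fitness-proportionate (roulette-wheel) selection, matching the Sf(x) = f(x)/Σf(x) scheme used in the worked example later in the slides, is given below; it assumes non-negative fitness values.

```python
import random

# Individuals with higher fitness scores are more likely to be chosen as parents.
def roulette_select(population, fitness_fn, k=2):
    scores = [fitness_fn(ind) for ind in population]
    total = sum(scores)
    probs = [s / total for s in scores]                 # Sf(x) = f(x) / sum f(x)
    return random.choices(population, weights=probs, k=k)

# Example usage with the f(x) = x^2 fitness on binary chromosomes.
parents = roulette_select(["01101", "11000", "01000", "10011"],
                          fitness_fn=lambda s: int(s, 2) ** 2)
```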
Advantages and Disadvantages of GA

Advantages
• It has a wide solution space.
• It is easier to discover the global optimum.
• Multiple GAs can run together on the same CPU.

Disadvantages
• The fitness function calculation is a limitation.
• Convergence of a GA can be too fast or too slow.
• There are limits on selecting the parameters.
Example-1

• Let the population of chromosomes in a genetic algorithm be represented in binary. The fitness strength of a chromosome with decimal value x is given by Sf(x) = f(x) / Σ f(x), where f(x) = x².
• The population is given by P, where:
  P = {(01101), (11000), (01000), (10011)}
Step 1: Selection

P      Value in decimal   f(x) = x²   Sf(x) = f(x)/Σf(x)   Expected count N·Sf(x)
01101  13                 169         169/1170 = 0.14      4 × 0.14 = 0.56
11000  24                 576         576/1170 = 0.49      4 × 0.49 = 1.96
01000  8                  64          64/1170  = 0.06      4 × 0.06 = 0.24
10011  19                 361         361/1170 = 0.31      4 × 0.31 = 1.24
Total                     1170
Step 2: Crossover

P (initial)   Crossover point   After crossover   Value in decimal   f(x) = x²
0110|1        4                 01100             12                 144
1100|0        4                 11001             25                 625
11|000        2                 11011             27                 729
10|011        2                 10000             16                 256
Total                                                                1754
Step 3: Mutation

After crossover   After mutation   Value in decimal   f(x) = x²
01100             11100            28                 784
11001             11001            25                 625
11011             11011            27                 729
10000             10100            20                 400
Total                                                 2538
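The following small sketch reproduces the Step 1 selection calculations and the decimal values after mutation; note that the expected counts here use unrounded probabilities, so they differ slightly from the rounded figures in the table above.

```python
# Step 1: selection probabilities and expected counts for Example-1.
population = ["01101", "11000", "01000", "10011"]
fitness = [int(c, 2) ** 2 for c in population]        # 169, 576, 64, 361
total = sum(fitness)                                  # 1170
for chrom, f in zip(population, fitness):
    sf = f / total                                    # selection probability Sf(x)
    print(chrom, int(chrom, 2), f, round(sf, 2), round(len(population) * sf, 2))

# Step 3: decimal values and fitness of the mutated population.
mutated = ["11100", "11001", "11011", "10100"]
print([(int(c, 2), int(c, 2) ** 2) for c in mutated])  # [(28, 784), (25, 625), (27, 729), (20, 400)]
```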
Example - 2

Suppose a genetic algorithm uses chromosomes of the form x = “a


b c d e f g h” with a fixed length of eight genes. Each gene can be
any digit between 0 and 9. Let the fitness of individual x be
calculated as: f(x) = (a+b)-(c+d)+(e+f)-(g+h). Let the initial
population consist of four individuals with the following
chromosomes:
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
Example - 2

a. Evaluate the fitness of each individual, showing all your workings, and arrange them
in order with the fittest first and the least fit last.
b. Perform the following crossover operations.
i. Cross the fittest two individuals using one-point crossover at the middle point.
ii. Cross the second and third fittest individuals using a two-point crossover
(points b and f).
iii. Cross the first and third fittest individuals (ranked 1st and 3rd) using a uniform
crossover.
c. Suppose the new population consists of the six offspring individuals received by the
crossover operations in the above question. Evaluate the fitness of the new
population, showing all the workings. Has the overall fitness improved?
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
a. Evaluate the fitness of each individual, showing all your
workings, and arrange them in order with the fittest first
and the least fit last.
Sol: f(x1) = (6+5)-(4+1)+(3+5)-(3+2) = 9
     f(x2) = (8+7)-(1+2)+(6+6)-(0+1) = 23
     f(x3) = (2+3)-(9+2)+(1+2)-(8+5) = -16
     f(x4) = (4+1)-(8+5)+(2+0)-(9+4) = -19
The order, fittest first, is: x2, x1, x3, x4.
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
i. Cross the fittest two individuals using one-point crossover at the middle point.
Sol: x2 = 8 7 1 2 | 6 6 0 1  →  o1 = 8 7 1 2 3 5 3 2
     x1 = 6 5 4 1 | 3 5 3 2  →  o2 = 6 5 4 1 6 6 0 1
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
ii. Cross the second and third fittest individuals using a two-point crossover (points b and f).
Sol: x1 = 6 5 | 4 1 3 5 | 3 2  →  o3 = 6 5 9 2 1 2 3 2
     x3 = 2 3 | 9 2 1 2 | 8 5  →  o4 = 2 3 4 1 3 5 8 5
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
x1 = 6 5 4 1 3 5 3 2
x2 = 8 7 1 2 6 6 0 1
x3 = 2 3 9 2 1 2 8 5
x4 = 4 1 8 5 2 0 9 4
iii. Cross the first and third fittest individuals (ranked 1st and 3rd) using a uniform crossover.
Sol: x2 = 8 7 1 2 6 6 0 1  →  o5 = 2 7 1 2 6 2 0 1
     x3 = 2 3 9 2 1 2 8 5  →  o6 = 8 3 9 2 1 6 8 5
x = “a b c d e f g h”
f(x) = (a+b)-(c+d)+(e+f)-(g+h)
o1 = 8 7 1 2 3 5 3 2
o2 = 6 5 4 1 6 6 0 1
o3 = 6 5 9 2 1 2 3 2
o4 = 2 3 4 1 3 5 8 5
o5 = 2 7 1 2 6 2 0 1
o6 = 8 3 9 2 1 6 8 5

c. Suppose the new population consists of the six offspring individuals produced by the crossover operations above. Evaluate the fitness of the new population, showing all the workings. Has the overall fitness improved?
Sol: f(o1) = (8+7)-(1+2)+(3+5)-(3+2) = 15
     f(o2) = (6+5)-(4+1)+(6+6)-(0+1) = 17
     f(o3) = (6+5)-(9+2)+(1+2)-(3+2) = -2
     f(o4) = (2+3)-(4+1)+(3+5)-(8+5) = -5
     f(o5) = (2+7)-(1+2)+(6+2)-(0+1) = 13
     f(o6) = (8+3)-(9+2)+(1+6)-(8+5) = -6
Yes, the overall fitness has improved: the total fitness rises from -3 for the parents to 32 for the offspring.
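The fitness calculations above can be checked with a short sketch of the Example-2 fitness function; the tuples below are the parent and offspring chromosomes from the worked solution.

```python
# Fitness function of Example-2: f(x) = (a+b) - (c+d) + (e+f) - (g+h).
def fitness(ch):
    a, b, c, d, e, f, g, h = ch
    return (a + b) - (c + d) + (e + f) - (g + h)

parents = [(6,5,4,1,3,5,3,2), (8,7,1,2,6,6,0,1), (2,3,9,2,1,2,8,5), (4,1,8,5,2,0,9,4)]
offspring = [(8,7,1,2,3,5,3,2), (6,5,4,1,6,6,0,1), (6,5,9,2,1,2,3,2),
             (2,3,4,1,3,5,8,5), (2,7,1,2,6,2,0,1), (8,3,9,2,1,6,8,5)]

print([fitness(x) for x in parents])      # [9, 23, -16, -19], total -3
print([fitness(x) for x in offspring])    # [15, 17, -2, -5, 13, -6], total 32
```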
Reference Books
• Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
• Bishop, C., Pattern Recognition and Machine Learning. Berlin: Springer-Verlag.
Text Books
• Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, Machine Learning, Pearson.
• Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.
• John Paul Mueller and Luca Massaron, Machine Learning for Dummies.
• Dr. Himanshu Sharma, Machine Learning, S.K. Kataria & Sons, 2022.
