
UNIT-V

Reinforcement Learning: Overview of reinforcement learning, Getting Lost Example.


Markov Chain Monte Carlo Methods: Sampling, Proposal Distribution, Markov Chain Monte
Carlo.
Graphical Models: Bayesian Networks, Markov Random Fields, Hidden Markov Models,
Tracking Methods.
Reinforcement Learning:
• Reinforcement learning (RL) is a type of machine learning where an agent learns to
make decisions by interacting with an environment.
• The agent receives feedback in the form of rewards or penalties based on the actions it
takes, and its goal is to maximize the cumulative reward over time.

Key Components of Reinforcement Learning

1. Agent:
o The learner or decision-maker that interacts with the environment.
2. Environment:
o The external system the agent interacts with. It provides feedback based on the
agent's actions.
3. State:
o A representation of the current situation of the environment. The agent
perceives the environment through states.
4. Action:
o The set of all possible moves the agent can make in the environment.
5. Reward:
o Feedback from the environment based on the agent's actions. Positive rewards
incentivize desirable actions, while negative rewards (or penalties) discourage
undesirable actions.
6. Policy:
o A strategy used by the agent to determine the next action based on the current
state. It can be deterministic or stochastic.
7. Value Function:
o A function that estimates the expected cumulative reward of states or state-
action pairs, helping the agent to make decisions that maximize long-term
rewards.

The Learning Process

1. Exploration:
o The agent tries out different actions to discover their effects and gather
information about the environment.
2. Exploitation:
o The agent uses its knowledge to choose actions that it believes will maximize
the reward.
3. Balance:
o Effective RL requires balancing exploration and exploitation to ensure the
agent learns the optimal policy.
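A common way to strike this balance is an ε-greedy rule: explore with a small probability ε, otherwise exploit the best known action. The sketch below is illustrative; the Q-values and the value of ε are assumptions, not part of these notes.

import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

# Illustrative Q-values for four actions in one state
print(epsilon_greedy([0.2, 0.8, 0.1, 0.5], epsilon=0.1))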

Positive Reinforcement
• What it means: Giving a reward to encourage a behavior.
• Example: If a student answers a question correctly and gets a chocolate, they’ll
try to answer more questions correctly.
• Effect: It motivates the person (or agent) to repeat the good behavior.
• Note: Too much reward can sometimes confuse or reduce the impact.

Negative Reinforcement
• What it means: Removing something unpleasant to encourage a behavior.
• Example: If a loud noise stops when you press a button, you’ll press the button
more often to avoid the noise.
• Effect: It also encourages good behavior, but by taking away something bad.
• Note: It works well in some situations but only supports the minimum required
behavior.

Both are used to increase the chance of good actions, but in different ways—one by
giving rewards, the other by removing discomfort.

Algorithms in Reinforcement Learning


1. Q-Learning
➤ What is it?
Q-Learning is a model-free reinforcement learning algorithm. That means the
agent learns how to behave without knowing how the environment works. It
tries different actions and learns from the results.
It learns a Q-value, which tells it how good it is to take a certain action in a
certain situation (state).
➤ How it works:
• The agent is in a state.
• It tries an action.
• It receives a reward.
• It updates its Q-table based on the best action it could take next (even if it doesn't
take it).
➤ Formula:
Q(s, a) = Q(s, a) + α [r + γ * max(Q(s', a')) - Q(s, a)]
Where:
• s = current state
• a = current action
• r = reward
• s' = new state after action
• a' = possible next actions
• α = learning rate
• γ = discount factor (how much future rewards matter)
➤ Simple Example:
Imagine a robot in a grid trying to reach a goal.
• Grid has 4 boxes: A, B, C, and D (D is the goal).
• The robot is in box A.
• If it moves to box B, it gets a small reward.
• If it reaches D, it gets a big reward.
Using Q-Learning, the robot will:
• Try all paths randomly.
• Note which moves gave higher rewards.
• Eventually learn the best path to D, by updating the Q-values.
Even if the robot took a different path, it always updates the Q-table based on the
best possible action from the new state.
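The grid example above can be turned into a small Q-Learning sketch. The reward values, learning rate, and discount factor below are illustrative assumptions; only the boxes A, B, C, D and the goal D come from the example.

import random

# States A, B, C, D (D is the goal); actions list the boxes reachable from each state.
states = ["A", "B", "C", "D"]
actions = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": []}
rewards = {("A", "B"): 1, ("B", "D"): 10, ("C", "D"): 10}   # small reward for B, big reward for D

alpha, gamma = 0.5, 0.9                                     # learning rate, discount factor
Q = {(s, a): 0.0 for s in states for a in actions[s]}       # Q-table, initially all zeros

for episode in range(500):
    s = "A"
    while s != "D":
        a = random.choice(actions[s])                       # try paths randomly (explore)
        r = rewards.get((s, a), 0)
        s_next = a                                          # moving to box `a`
        # Q-Learning update: bootstrap from the BEST action available in the new state
        best_next = max((Q[(s_next, a2)] for a2 in actions[s_next]), default=0.0)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

print(Q)   # moves along A -> B/C -> D end up with the highest Q-values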

2. SARSA (State-Action-Reward-State-Action)
➤ What is it?
SARSA is also a model-free algorithm. It’s very similar to Q-Learning but with
one key difference:
It updates the Q-values based on the actual action taken, not the best possible
action.
➤ Formula:
Q(s, a) = Q(s, a) + α [r + γ * Q(s', a') - Q(s, a)]
Where:
• a' = action actually taken in the next state.
➤ Key Difference:
• Q-Learning: Uses the best possible future action (max Q(s', a'))
• SARSA: Uses the next action the agent actually took (Q(s', a'))
➤ Simple Example:
Using the same robot-in-grid example:
• In Q-Learning, the robot updates its path based on the best move from the new
box—even if it didn’t actually take that move.
• In SARSA, the robot updates its knowledge based on the path it really took, even if
it wasn't the best move.
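Side by side, the two update rules differ only in the bootstrap term. This is a schematic sketch reusing the Q-table layout from the Q-Learning example above; the helper names are assumptions.

def q_learning_update(Q, actions, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # Q-Learning: bootstrap from the best possible action in the next state
    best_next = max((Q[(s_next, a2)] for a2 in actions[s_next]), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # SARSA: bootstrap from the action the agent actually takes next
    next_q = Q[(s_next, a_next)] if a_next is not None else 0.0
    Q[(s, a)] += alpha * (r + gamma * next_q - Q[(s, a)])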
Getting Lost Example:
• The "getting lost" example uses reinforcement learning to show how an agent learns
to navigate an environment, avoid pitfalls, and reach its goal.
• We can imagine a scenario where an agent (like a robot) is placed in a maze and needs
to find its way to the exit.

Scenario: Robot Navigating a Maze

1. Environment:
o The maze consists of a grid with walls, open spaces, and an exit.
o The robot starts at a random position and must find the exit.
2. State:
o The current position of the robot in the maze, represented by coordinates (x,
y).
3. Actions:
o The robot can move up, down, left, or right.
4. Rewards:
o Positive reward for reaching the exit.
o Negative reward for hitting a wall.
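The scenario above can be captured by a small environment sketch. The grid size, wall positions, and reward values are illustrative assumptions.

import random

WALLS = {(1, 1), (2, 3)}                  # assumed wall cells
EXIT = (3, 3)                             # assumed exit cell
GRID_SIZE = 4
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    dx, dy = MOVES[action]
    nx, ny = state[0] + dx, state[1] + dy
    if not (0 <= nx < GRID_SIZE and 0 <= ny < GRID_SIZE) or (nx, ny) in WALLS:
        return state, -1, False           # hit a wall or the boundary: penalty, stay put
    if (nx, ny) == EXIT:
        return (nx, ny), 10, True         # reached the exit: positive reward, episode ends
    return (nx, ny), 0, False             # ordinary move: no reward

# Example: a random first move from the starting position (0, 0)
print(step((0, 0), random.choice(list(MOVES))))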

Applications of Reinforcement Learning:

Reinforcement learning is a powerful approach to building intelligent systems that can adapt
and improve through experience, opening up possibilities across a wide range of applications.

1. Game Playing: RL agents have achieved superhuman performance in games like
chess, Go, and video games (e.g., AlphaGo, OpenAI Five).
2. Robotics: Training robots to perform complex tasks such as walking, grasping
objects, and navigating environments.
3. Autonomous Vehicles: Learning to drive safely and efficiently in various traffic
conditions.
4. Healthcare: Optimizing treatment strategies, personalized medicine, and managing
clinical trials.
5. Finance: Algorithmic trading, portfolio management, and risk assessment.

Markov Chain Monte Carlo Methods:


Markov Chain Monte Carlo (MCMC) is a group of methods used to generate random samples from
complex probability distributions—especially when it’s too hard to calculate or sample from the
distribution directly.
These methods are used in Bayesian statistics, machine learning, physics, and other fields.

Key Concepts of MCMC Methods:


Markov Chain
• A Markov Chain is a series of steps (states), where:
o The next step only depends on the current step.
o It does not depend on how you got there (this is called the Markov property).
Example:
• Imagine you're on a game board.
• At each turn, you roll a die and move forward or backward based only on the current
position—not on how you got there.
Over time, as you keep playing, your location (state) settles into a pattern—this is called the
stationary distribution.
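A quick way to see the stationary distribution is to apply a transition matrix repeatedly. The 3-state matrix below is a made-up example.

import numpy as np

# Hypothetical transition matrix: row i holds the probabilities of moving from state i.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

dist = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
for _ in range(100):               # take many steps of the chain
    dist = dist @ P

print(dist)   # the distribution stops changing: this is the stationary distribution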

Monte Carlo
• Monte Carlo methods use randomness (random sampling) to solve problems.
• They are especially useful for estimating numbers or simulating events when exact
calculations are hard.
In MCMC, Monte Carlo sampling is used to estimate the shape of a complex probability
distribution.
How MCMC Works:
o Start with a point: Pick an initial value or state from the space.
o Propose a move: Generate a candidate for the next state using a proposal
rule (the proposal distribution).
o Evaluate the move: Calculate how likely this new state is compared to the
current state, based on the target distribution.
o Decide to accept or reject: If the new state is more likely, it is accepted; if it
is less likely, it is accepted with a probability proportional to how likely it is
relative to the current state.
o Repeat steps 2-4: Continue proposing and accepting/rejecting new states for
many iterations; over time the collected states (samples) come to represent the
target distribution.
o Burn-in period: Discard initial samples until the chain "forgets" its starting
point and reaches steady behaviour.
o Collect samples: Use the remaining samples to estimate properties of the
target distribution.
Common Algorithms

1. Metropolis-Hastings Algorithm:

One of the most popular MCMC methods.

Here's how it works:

o Start at a state.
o Propose a new state.
o Calculate an acceptance ratio (how good the new state is).
o Accept or reject the new state using a random decision.

If the new state is better, it's usually accepted.


If it's worse, it might still be accepted (to keep exploration flexible).
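These steps can be sketched for a one-dimensional target density. The mixture-of-Gaussians target and the symmetric Gaussian random-walk proposal below are assumptions made for illustration (with a symmetric proposal the q terms cancel in the acceptance ratio).

import numpy as np

def target(x):
    # Unnormalized target density: a mixture of two Gaussians (illustrative choice)
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)

def metropolis_hastings(n_samples=10000, step=1.0, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + rng.normal(scale=step)                  # propose a new state
        accept_prob = min(1.0, target(x_new) / target(x))   # acceptance ratio
        if rng.random() < accept_prob:                      # accept or reject
            x = x_new
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings()
print(samples[1000:].mean())   # discard burn-in, then estimate the mean of the target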

2. Gibbs Sampling:
• Special kind of MCMC used when you have multiple variables.
• Instead of proposing whole new states, it updates one variable at a time, based on the current
values of the others.
This is done in a cycle:
• Sample variable 1 (given all others fixed),
• Then sample variable 2,
• Then variable 3, and so on.
This is useful when you can easily sample from the conditional distributions.
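As a sketch, consider a standard bivariate Gaussian with correlation rho (an assumed target): each conditional distribution is a one-dimensional Gaussian, so each variable is easy to sample in turn.

import numpy as np

def gibbs_bivariate_normal(n_samples=5000, rho=0.8, seed=0):
    """Gibbs sampler for a standard bivariate Gaussian with correlation rho.
    Each conditional is Normal(rho * other, 1 - rho**2)."""
    rng = np.random.default_rng(seed)
    x1, x2, samples = 0.0, 0.0, []
    for _ in range(n_samples):
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))   # sample x1 given x2
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))   # sample x2 given x1
        samples.append((x1, x2))
    return np.array(samples)

samples = gibbs_bivariate_normal()
print(np.corrcoef(samples[500:].T)[0, 1])   # close to rho = 0.8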
Applications of MCMC

1. Bayesian Inference:
o Estimating posterior distributions of parameters when the likelihood and prior
are known.
o Useful for hierarchical models and complex data structures.
2. Statistical Physics:
o Simulating the behavior of physical systems at the atomic or molecular level.
o Estimating properties like magnetization or phase transitions.
3. Machine Learning:
o Training models with complex likelihood functions.
o Bayesian neural networks and other probabilistic models.
4. Ecology and Evolutionary Biology:
o Estimating parameters of population dynamics models.
o Studying the evolution of traits under different selection pressures.

Sampling:
Sampling is a technique used to select a subset of data from a larger population,
allowing for the analysis and inference of population characteristics without
examining the entire dataset.
Types of Sampling

1. Probability Sampling:
o Every member of the population has a known, non-zero chance of being selected. This makes
the results more reliable, and the method is often used in scientific research.
o Examples:
a. Simple Random Sampling
What it means:
Each person (or item) in the population has an equal chance of being picked, just like a lottery.
Example:
If you have 100 students and you randomly pick 10 names from a hat, that’s simple random
sampling.

b. Systematic Sampling
What it means:
You select every k-th person from a list, starting at a random point.
Example:
If you have a list of 1,000 people and pick every 10th person, starting at person #4, you would select
#4, #14, #24, etc.

c. Stratified Sampling
What it means:
You divide the population into groups (called strata) based on some feature (like age or gender), and
then take a sample from each group.
Example:
If your population has 60% men and 40% women, you make sure your sample also includes 60%
men and 40% women.

d. Cluster Sampling
What it means:
You divide the population into clusters (small groups), and then randomly pick a few entire
clusters to study.
Example:
A school has 10 classrooms. Instead of picking students from all over the school, you randomly
choose 3 full classrooms and study all the students in those classes.
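The first three schemes above (simple random, systematic, stratified) can be expressed in a few lines of code; the population of 100 IDs and the 60/40 split are made-up numbers.

import random

population = list(range(1, 101))        # a made-up population of 100 IDs

# a. Simple random sampling: every member has an equal chance
simple = random.sample(population, 10)

# b. Systematic sampling: every k-th member, starting from a random offset
k = 10
start = random.randrange(k)
systematic = population[start::k]

# c. Stratified sampling: sample proportionally from each stratum
men, women = population[:60], population[60:]   # assumed 60% men, 40% women
stratified = random.sample(men, 6) + random.sample(women, 4)

print(simple, systematic, stratified, sep="\n")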

2. Non-Probability Sampling
Meaning:
Not everyone has a fair or known chance of being chosen. This method is easier and faster, but
results may be less accurate.

a. Convenience Sampling
What it means:
You select people who are easy to reach.
Example:
Asking your friends or classmates to fill out a survey because they’re nearby.

b. Judgmental (Purposive) Sampling
What it means:
You select people based on your own judgment about who would be best for the study.
Example:
A doctor picks specific patients to study because they fit certain medical conditions she’s interested
in.

c. Quota Sampling
What it means:
You choose people to meet certain fixed numbers (quotas) for different groups.
Example:
You need 20 men and 20 women in your sample. You keep picking people until you fill both quotas.

d. Snowball Sampling
What it means:
You start with a few people, and they help you find more participants—like a chain reaction.
Example:
You interview one person in a secret group (like drug users or refugees), and they introduce you to
others for the study.

Proposal Distribution:
• A proposal distribution is a fundamental component in Markov Chain Monte Carlo
(MCMC) methods.
• It is used to generate new candidate samples from a target probability distribution,
especially when direct sampling is not feasible.
• A proposal distribution, denoted as q(x′∣x), is a probability distribution used to propose
new candidate states x' given the current state x.
• The new candidate state is then accepted or rejected based on a criterion designed to
ensure that the sequence of samples converges to the target distribution π(x).
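A common concrete choice (an assumption here, not something these notes prescribe) is a Gaussian random walk, where the candidate is a small random perturbation of the current state.

import numpy as np

def propose(x, step=0.5, rng=None):
    """Draw a candidate x' from the proposal q(x'|x) = Normal(x, step**2)."""
    rng = rng or np.random.default_rng()
    return x + rng.normal(scale=step)

print(propose(1.0))   # a candidate state near the current state x = 1.0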
Markov Chain Monte Carlo Algorithms:
Metropolis-Hastings Algorithm:

• Description: A general algorithm that generates a candidate sample from a proposal
distribution and accepts or rejects it based on an acceptance probability.
• Process:
o Initialize at a state x0.
o Generate a candidate state x′ from the proposal distribution q(x′∣x).
o Calculate the acceptance probability:
α = min(1, [π(x′) · q(x∣x′)] / [π(x) · q(x′∣x)])
o Accept x′ with probability α; otherwise, stay at x.

• Use Case: Widely applicable and flexible for various target distributions.

Gibbs Sampling:

• Description: Samples each variable in turn from its conditional distribution given the
current values of the other variables.
• Process:
1. Initialize all variables.
2. Sample each variable xi from p(xi∣other variables).
3. Repeat until convergence.
• Use Case: Effective when conditional distributions are easier to sample from.
• Example: Ideal for Bayesian networks and hierarchical models.

Graphical Models:
• Graphical models are a powerful framework for representing complex dependencies
among variables in a visual and mathematical way.

Bayesian Networks:
• Bayesian Networks (BNs) are a type of probabilistic graphical model that uses directed
acyclic graphs (DAGs) to represent a set of variables and their conditional
dependencies.
• They are particularly powerful for modeling complex systems where understanding the
relationships between variables is crucial.

Joint Probability:
• Joint probability is the probability of two or more events happening together. For
example, the joint probability of two events A and B is the probability that both events
occur, P(A∩B).
P(A ∩ B) = P(A | B) · P(B)
P(A ∩ B) = P(A) · P(B) (only if A and B are independent)
Conditional Probability:
• Conditional probability defines the probability that event B will occur, given that event
A has already occurred:
P(B | A) = P(A ∩ B) / P(A)

Example:

Burglary ‘B’ –
• P(B=T) = 0.001 (‘B’ is true, i.e. a burglary has occurred)
• P(B=F) = 0.999 (‘B’ is false, i.e. no burglary has occurred)

Fire ‘F’ –
• P(F=T) = 0.002 (‘F’ is true, i.e. a fire has occurred)
• P(F=F) = 0.998 (‘F’ is false, i.e. no fire has occurred)

Alarm ‘A’ –

B F P (A=T) P (A=F)
T T 0.95 0.05
T F 0.94 0.06
F T 0.29 0.71
F F 0.001 0.999

Person ‘P1’ –

A P (P1=T) P (P1=F)
T 0.95 0.05
F 0.05 0.95

Person ‘P2’ –

A P (P2=T) P (P2=F)
T 0.80 0.20
F 0.01 0.99

P(P1, P2, A, ~B, ~F)

= P(P1 | A) · P(P2 | A) · P(A | ~B, ~F) · P(~B) · P(~F)

= 0.95 · 0.80 · 0.001 · 0.999 · 0.998

≈ 0.00076
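The same factorization can be computed directly from the conditional probability tables above; the dictionary layout here is just one convenient way to encode them.

# Probability tables from the burglary/fire alarm network above
P_B = {True: 0.001, False: 0.999}                    # P(Burglary)
P_F = {True: 0.002, False: 0.998}                    # P(Fire)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(Alarm=T | Burglary, Fire)
       (False, True): 0.29, (False, False): 0.001}
P_P1 = {True: 0.95, False: 0.05}                     # P(P1=T | Alarm)
P_P2 = {True: 0.80, False: 0.01}                     # P(P2=T | Alarm)

# Joint probability P(P1=T, P2=T, A=T, B=F, F=F), factored along the network
p = P_P1[True] * P_P2[True] * P_A[(False, False)] * P_B[False] * P_F[False]
print(round(p, 5))   # about 0.00076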

Applications

• Medical Diagnosis: Modeling diseases and symptoms to assist in diagnostic reasoning.
• Machine Learning: Feature selection, classification, and regression.
• Natural Language Processing: Dependency parsing and language modeling.
• Econometrics: Understanding relationships between economic indicators.

Markov Random Fields:


• Markov Random Fields (MRFs), also known as Markov Networks, are a type of
probabilistic graphical model that represents the dependencies among a set of random
variables using an undirected graph.
• They are particularly useful for modeling scenarios where the exact direction of
dependency is not well-defined or when symmetrical relationships between variables
exist.

• The structure of this undirected graph determines the dependence or independence
relationships between the random variables.

Fig. Markov Random Field with four random variables

Components of Markov Random Fields

1. Nodes (Vertices):
o Each node represents a random variable.
o Nodes can represent observed data, hidden variables, or any entities in the
model.
2. Edges (Links):
o Undirected edges between nodes indicate direct dependencies.
o Unlike Bayesian Networks, MRFs use undirected edges to capture the
symmetrical nature of relationships.
3. Clique Potentials (Factors):
o Potential functions are associated with cliques (fully connected subgraphs) of
the graph.
o They represent the local dependencies among the variables in a clique.
o These potential functions are often denoted as ψ

Applications

• Image Processing: Image segmentation, denoising, and restoration.
• Natural Language Processing: Part-of-speech tagging, named entity recognition.
• Computer Vision: Object recognition, scene labeling.
• Bioinformatics: Modeling spatial dependencies in protein structures.

Hidden Markov Models:
• The hidden Markov Model (HMM) is a statistical model that is used to describe the
probabilistic relationship between a sequence of observations and a sequence of hidden
states.
• It is often used in situations where the underlying system or process that generates the
observations is unknown or hidden, hence it has the name “Hidden Markov Model.”
• It is used to predict future observations or classify sequences, based on the underlying
hidden process that generates the data.

An HMM consists of two types of variables: hidden states and observations.

• The hidden states are the underlying variables that generate the observed data, but
they are not directly observable.
• The observations are the variables that are measured and observed.

The Hidden Markov Model (HMM) describes the relationship between the hidden states and the
observations using two sets of probabilities: the transition probabilities and the emission
probabilities.

• The transition probabilities describe the probability of transitioning from one hidden
state to another.

• The emission probabilities describe the probability of observing an output given a
hidden state.

Hidden Markov Model Algorithm:

Step 1: Define the state space and observation space

The state space is the set of all possible hidden states, and the observation space is the set of
all possible observations.

Step 2: Define the initial state distribution

This is the probability distribution over the initial state.

Step 3: Define the state transition probabilities

These are the probabilities of transitioning from one state to another. This forms the
transition matrix, which describes the probability of moving from one state to another.

Step 4: Define the observation likelihoods:

These are the probabilities of generating each observation from each state. This forms the
emission matrix, which describes the probability of generating each observation from each
state.

Step 5: Train the model

The parameters of the state transition probabilities and the observation likelihoods are
estimated using the Baum-Welch algorithm, which applies the forward-backward algorithm and
iteratively updates the parameters until convergence.

Step 6: Decode the most likely sequence of hidden states

Given the observed data, the Viterbi algorithm is used to compute the most likely sequence
of hidden states. This can be used to predict future observations, classify sequences, or detect
patterns in sequential data.

Step 7: Evaluate the model

The performance of the HMM can be evaluated using various metrics, such as accuracy,
precision, recall, or F1 score.
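As a compact sketch of Step 6, the Viterbi decoder below finds the most likely hidden state sequence for a hypothetical two-state weather HMM; the states, observations, and probability tables are made up for illustration.

import numpy as np

states = ["Rainy", "Sunny"]
observations = [0, 1, 0]                   # e.g. 0 = "umbrella seen", 1 = "no umbrella"
start_p = np.array([0.6, 0.4])             # initial state distribution
trans_p = np.array([[0.7, 0.3],            # transition matrix
                    [0.4, 0.6]])
emit_p = np.array([[0.9, 0.1],             # emission matrix: P(observation | state)
                   [0.2, 0.8]])

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    delta = start_p * emit_p[:, obs[0]]    # best path probability ending in each state
    backpointers = []
    for o in obs[1:]:
        scores = delta[:, None] * trans_p  # scores[i, j]: best path reaching state j via i
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * emit_p[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(backpointers):      # walk the backpointers to recover the path
        path.insert(0, int(bp[path[0]]))
    return path

print([states[i] for i in viterbi(observations, start_p, trans_p, emit_p)])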
Tracking Methods:
• Tracking methods in machine learning, often referred to as object tracking, involve
techniques used to locate and follow an object's position over time in a sequence of
frames or images.
• These methods have applications in various fields, including computer vision, robotics,
surveillance, and augmented reality.
Kalman Filter:
• The Kalman filter is an optimal estimator for linear systems with Gaussian noise.
• It provides a recursive solution to the linear quadratic estimation problem, efficiently
processing noisy measurements to produce an estimate of the system's state.
Components:

1. State Vector (xt):
o Represents the state of the system at time t.
2. State Transition Model (A):
o Describes how the state evolves over time.
o xt = A·xt−1 + B·ut + wt, where ut is the control input and wt is the process noise.
3. Measurement Model (H):
o Relates the state to the observations.
o zt = H·xt + vt, where zt is the measurement and vt is the measurement noise.
4. Covariance Matrices (Q and R):
o Q: Process noise covariance.
o R: Measurement noise covariance.

Algorithm:

1. Prediction:
o Predict the next state
o Predict the error covariance
2. Update:
o Compute the Kalman gain
o Update the state estimate
o Update the error covariance
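A minimal one-dimensional sketch of this predict/update cycle, estimating a constant value from noisy measurements (A = 1, H = 1; the noise covariances and measurement values are assumptions):

import numpy as np

Q, R = 1e-5, 0.1 ** 2            # process and measurement noise covariances (assumed)
x_est, P = 0.0, 1.0              # initial state estimate and error covariance

measurements = np.array([0.39, 0.50, 0.48, 0.29, 0.25, 0.32, 0.34, 0.48, 0.41, 0.45])

for z in measurements:
    # Prediction
    x_pred = x_est               # xt = A·xt-1 (A = 1, no control input)
    P_pred = P + Q               # predicted error covariance
    # Update
    K = P_pred / (P_pred + R)    # Kalman gain
    x_est = x_pred + K * (z - x_pred)   # correct the estimate with the measurement
    P = (1 - K) * P_pred         # updated error covariance

print(round(x_est, 3))           # settles near the average of the noisy measurements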

Applications:

• Navigation Systems: GPS and inertial navigation.
• Control Systems: Robotics, aerospace.
• Economics: Estimating trends and cycles.

Particle Filter:
• The particle filter, or Sequential Monte Carlo (SMC) method, is used for non-linear,
non-Gaussian systems.
• It represents the posterior distribution of the state using a set of random samples
(particles) and weights.
Components

1. Particles:
o A set of samples representing possible states.
2. Weights:

o Importance weights for each particle, representing the likelihood given the
observations.

Algorithm:

1. Initialization:
o Generate an initial set of particles from the prior distribution.
o Initialize weights
2. Prediction:
o Propagate particles according to the state transition model
3. Update:
o Update weights based on the measurement likelihood
o Normalize weights
4. Resampling:
o Resample particles based on their weights to avoid degeneracy.
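The four steps above can be sketched for a one-dimensional random-walk state with noisy observations; the noise levels and the number of particles are assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 1000                                    # number of particles
true_x, proc_noise, obs_noise = 0.0, 0.2, 0.5

particles = rng.normal(0.0, 1.0, N)         # 1. Initialization: sample from the prior
weights = np.ones(N) / N

for t in range(50):
    true_x += rng.normal(0, proc_noise)     # simulate the hidden true state
    z = true_x + rng.normal(0, obs_noise)   # noisy measurement

    particles += rng.normal(0, proc_noise, N)                       # 2. Prediction: propagate particles
    weights *= np.exp(-0.5 * ((z - particles) / obs_noise) ** 2)    # 3. Update: measurement likelihood
    weights /= weights.sum()                                        #    normalize weights

    idx = rng.choice(N, size=N, p=weights)                          # 4. Resampling: avoid degeneracy
    particles, weights = particles[idx], np.ones(N) / N

print(round(float(particles.mean()), 2), round(true_x, 2))          # estimate vs. true state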

Applications:

• Robotics: Localization and mapping (SLAM).
• Computer Vision: Object tracking.
• Finance: Filtering in stochastic volatility models.

Comparison:

• Kalman Filter:
o Assumes linear dynamics and Gaussian noise.
o Computationally efficient.
o Optimal for linear systems.
• Particle Filter:
o Handles non-linear and non-Gaussian systems.
o More computationally intensive.
o Provides a flexible framework for complex systems.

*****
