
UNIT-V

Reinforcement Learning: Overview of reinforcement learning, Getting Lost Example.


Markov Chain Monte Carlo Methods: Sampling, Proposal Distribution, Markov Chain Monte
Carlo.
Graphical Models: Bayesian Networks, Markov Random Fields, Hidden Markov Models,
Tracking Methods.
Reinforcement Learning:
• Reinforcement learning (RL) is a type of machine learning where an agent learns to
make decisions by interacting with an environment.
• The agent receives feedback in the form of rewards or penalties based on the actions it
takes, and its goal is to maximize the cumulative reward over time.

Key Components of Reinforcement Learning

1. Agent:
o The learner or decision-maker that interacts with the environment.
2. Environment:
o The external system the agent interacts with. It provides feedback based on the
agent's actions.
3. State:
o A representation of the current situation of the environment. The agent
perceives the environment through states.
4. Action:
o The set of all possible moves the agent can make in the environment.
5. Reward:
o Feedback from the environment based on the agent's actions. Positive rewards
incentivize desirable actions, while negative rewards (or penalties) discourage
undesirable actions.
6. Policy:
o A strategy used by the agent to determine the next action based on the current
state. It can be deterministic or stochastic.
7. Value Function:
o A function that estimates the expected cumulative reward of states or state-
action pairs, helping the agent to make decisions that maximize long-term
rewards.

The Learning Process

1. Exploration:
o The agent tries out different actions to discover their effects and gather
information about the environment.
2. Exploitation:
o The agent uses its knowledge to choose actions that it believes will maximize
the reward.
3. Balance:
o Effective RL requires balancing exploration and exploitation to ensure the
agent learns the optimal policy.
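A common way to strike this balance is an ε-greedy rule: explore with a small probability ε, otherwise exploit the best known action. The sketch below is illustrative; the Q-values and the value of ε are assumptions, not part of these notes.

import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

# Illustrative Q-values for four actions in one state
print(epsilon_greedy([0.2, 0.8, 0.1, 0.5], epsilon=0.1))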

Positive Reinforcement
• What it means: Giving a reward to encourage a behavior.
• Example: If a student answers a question correctly and gets a chocolate, they’ll
try to answer more questions correctly.
• Effect: It motivates the person (or agent) to repeat the good behavior.
• Note: Too much reward can sometimes confuse or reduce the impact.

Negative Reinforcement
• What it means: Removing something unpleasant to encourage a behavior.
• Example: If a loud noise stops when you press a button, you’ll press the button
more often to avoid the noise.
• Effect: It also encourages good behavior, but by taking away something bad.
• Note: It works well in some situations but only supports the minimum required
behavior.

Both are used to increase the chance of good actions, but in different ways—one by
giving rewards, the other by removing discomfort.

Algorithms in Reinforcement Learning


1. Q-Learning
➤ What is it?
Q-Learning is a model-free reinforcement learning algorithm. That means the
agent learns how to behave without knowing how the environment works. It
tries different actions and learns from the results.
It learns a Q-value, which tells it how good it is to take a certain action in a
certain situation (state).
➤ How it works:
• The agent is in a state.
• It tries an action.
• It receives a reward.
• It updates its Q-table based on the best action it could take next (even if it doesn't
take it).
➤ Formula:
Q(s, a) = Q(s, a) + α [r + γ * max(Q(s', a')) - Q(s, a)]
Where:
• s = current state
• a = current action
• r = reward
• s' = new state after action
• a' = possible next actions
• α = learning rate
• γ = discount factor (how much future rewards matter)
➤ Simple Example:
Imagine a robot in a grid trying to reach a goal.
• Grid has 4 boxes: A, B, C, and D (D is the goal).
• The robot is in box A.
• If it moves to box B, it gets a small reward.
• If it reaches D, it gets a big reward.
Using Q-Learning, the robot will:
• Try all paths randomly.
• Note which moves gave higher rewards.
• Eventually learn the best path to D, by updating the Q-values.
Even if the robot took a different path, it always updates the Q-table based on the
best possible action from the new state.
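The grid example above can be turned into a small Q-Learning sketch. The reward values, learning rate, and discount factor below are illustrative assumptions; only the boxes A, B, C, D and the goal D come from the example.

import random

# States A, B, C, D (D is the goal); actions list the boxes reachable from each state.
states = ["A", "B", "C", "D"]
actions = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": []}
rewards = {("A", "B"): 1, ("B", "D"): 10, ("C", "D"): 10}   # small reward for B, big reward for D

alpha, gamma = 0.5, 0.9                                     # learning rate, discount factor
Q = {(s, a): 0.0 for s in states for a in actions[s]}       # Q-table, initially all zeros

for episode in range(500):
    s = "A"
    while s != "D":
        a = random.choice(actions[s])                       # try paths randomly (explore)
        r = rewards.get((s, a), 0)
        s_next = a                                          # moving to box `a`
        # Q-Learning update: bootstrap from the BEST action available in the new state
        best_next = max((Q[(s_next, a2)] for a2 in actions[s_next]), default=0.0)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

print(Q)   # moves along A -> B/C -> D end up with the highest Q-values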

2. SARSA (State-Action-Reward-State-Action)
➤ What is it?
SARSA is also a model-free algorithm. It’s very similar to Q-Learning but with
one key difference:
It updates the Q-values based on the actual action taken, not the best possible
action.
➤ Formula:
Q(s, a) = Q(s, a) + α [r + γ * Q(s', a') - Q(s, a)]
Where:
• a' = action actually taken in the next state.
➤ Key Difference:
• Q-Learning: Uses the best possible future action (max Q(s', a'))
• SARSA: Uses the next action the agent actually took (Q(s', a'))
➤ Simple Example:
Using the same robot-in-grid example:
• In Q-Learning, the robot updates its path based on the best move from the new
box—even if it didn’t actually take that move.
• In SARSA, the robot updates its knowledge based on the path it really took, even if
it wasn't the best move.
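Side by side, the two update rules differ only in the bootstrap term. This is a schematic sketch reusing the Q-table layout from the Q-Learning example above; the helper names are assumptions.

def q_learning_update(Q, actions, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # Q-Learning: bootstrap from the best possible action in the next state
    best_next = max((Q[(s_next, a2)] for a2 in actions[s_next]), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # SARSA: bootstrap from the action the agent actually takes next
    next_q = Q[(s_next, a_next)] if a_next is not None else 0.0
    Q[(s, a)] += alpha * (r + gamma * next_q - Q[(s, a)])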
Getting Lost Example:
• The "getting lost" example uses reinforcement learning to show how an agent learns
to navigate an environment, avoid pitfalls, and reach its goal.
• We can imagine a scenario where an agent (like a robot) is placed in a maze and needs
to find its way to the exit.

Scenario: Robot Navigating a Maze

1. Environment:
o The maze consists of a grid with walls, open spaces, and an exit.
o The robot starts at a random position and must find the exit.
2. State:
o The current position of the robot in the maze, represented by coordinates (x,
y).
3. Actions:
o The robot can move up, down, left, or right.
4. Rewards:
o Positive reward for reaching the exit.
o Negative reward for hitting a wall.
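The scenario above can be captured by a small environment sketch. The grid size, wall positions, and reward values are illustrative assumptions.

import random

WALLS = {(1, 1), (2, 3)}                  # assumed wall cells
EXIT = (3, 3)                             # assumed exit cell
GRID_SIZE = 4
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    dx, dy = MOVES[action]
    nx, ny = state[0] + dx, state[1] + dy
    if not (0 <= nx < GRID_SIZE and 0 <= ny < GRID_SIZE) or (nx, ny) in WALLS:
        return state, -1, False           # hit a wall or the boundary: penalty, stay put
    if (nx, ny) == EXIT:
        return (nx, ny), 10, True         # reached the exit: positive reward, episode ends
    return (nx, ny), 0, False             # ordinary move: no reward

# Example: a random first move from the starting position (0, 0)
print(step((0, 0), random.choice(list(MOVES))))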

Applications of Reinforcement Learning:

Reinforcement learning is a powerful approach to building intelligent systems that can adapt
and improve through experience, opening up possibilities across a wide range of applications.

1. Game Playing: RL agents have achieved superhuman performance in games like
chess, Go, and video games (e.g., AlphaGo, OpenAI Five).
2. Robotics: Training robots to perform complex tasks such as walking, grasping
objects, and navigating environments.
3. Autonomous Vehicles: Learning to drive safely and efficiently in various traffic
conditions.
4. Healthcare: Optimizing treatment strategies, personalized medicine, and managing
clinical trials.
5. Finance: Algorithmic trading, portfolio management, and risk assessment.

Markov Chain Monte Carlo Methods:


Markov Chain Monte Carlo (MCMC) is a group of methods used to generate random samples from
complex probability distributions—especially when it’s too hard to calculate or sample from the
distribution directly.
These methods are used in Bayesian statistics, machine learning, physics, and other fields.

Key Concepts of MCMC Methods:


Markov Chain
• A Markov Chain is a series of steps (states), where:
o The next step only depends on the current step.
o It does not depend on how you got there (this is called the Markov property).
Example:
• Imagine you're on a game board.
• At each turn, you roll a die and move forward or backward based only on the current
position—not on how you got there.
Over time, as you keep playing, your location (state) settles into a pattern—this is called the
stationary distribution.
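A quick way to see the stationary distribution is to apply a transition matrix repeatedly. The 3-state matrix below is a made-up example.

import numpy as np

# Hypothetical transition matrix: row i holds the probabilities of moving from state i.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

dist = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
for _ in range(100):               # take many steps of the chain
    dist = dist @ P

print(dist)   # the distribution stops changing: this is the stationary distribution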

Monte Carlo
• Monte Carlo methods use randomness (random sampling) to solve problems.
• They are especially useful for estimating numbers or simulating events when exact
calculations are hard.
In MCMC, Monte Carlo sampling is used to estimate the shape of a complex probability
distribution.
How MCMC Works:
o Start with a point: Pick an initial value or state from the space.
o Propose a move: Generate a candidate for the next state using a proposal
rule (the proposal distribution).
o Evaluate the move: Calculate how likely this new state is compared to the
current state, based on the target distribution.
o Decide to accept or reject: If the new state is more likely, it is accepted; if it
is less likely, it is accepted with a probability proportional to how likely it is
relative to the current state.
o Repeat steps 2-4: Continue proposing and accepting/rejecting new states for
many iterations; over time the collected states (samples) come to represent the
target distribution.
o Burn-in period: Discard initial samples until the chain "forgets" its starting
point and reaches steady behaviour.
o Collect samples: Use the remaining samples to estimate properties of the
target distribution.
Common Algorithms

1. Metropolis-Hastings Algorithm:

One of the most popular MCMC methods.

Here's how it works:

o Start at a state.
o Propose a new state.
o Calculate an acceptance ratio (how good the new state is).
o Accept or reject the new state using a random decision.

If the new state is better, it's usually accepted.


If it's worse, it might still be accepted (to keep exploration flexible).
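These steps can be sketched for a one-dimensional target density. The mixture-of-Gaussians target and the symmetric Gaussian random-walk proposal below are assumptions made for illustration (with a symmetric proposal the q terms cancel in the acceptance ratio).

import numpy as np

def target(x):
    # Unnormalized target density: a mixture of two Gaussians (illustrative choice)
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)

def metropolis_hastings(n_samples=10000, step=1.0, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + rng.normal(scale=step)                  # propose a new state
        accept_prob = min(1.0, target(x_new) / target(x))   # acceptance ratio
        if rng.random() < accept_prob:                      # accept or reject
            x = x_new
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings()
print(samples[1000:].mean())   # discard burn-in, then estimate the mean of the target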

2. Gibbs Sampling:
• Special kind of MCMC used when you have multiple variables.
• Instead of proposing whole new states, it updates one variable at a time, based on the current
values of the others.
This is done in a cycle:
• Sample variable 1 (given all others fixed),
• Then sample variable 2,
• Then variable 3, and so on.
This is useful when you can easily sample from the conditional distributions.
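As a sketch, consider a standard bivariate Gaussian with correlation rho (an assumed target): each conditional distribution is a one-dimensional Gaussian, so each variable is easy to sample in turn.

import numpy as np

def gibbs_bivariate_normal(n_samples=5000, rho=0.8, seed=0):
    """Gibbs sampler for a standard bivariate Gaussian with correlation rho.
    Each conditional is Normal(rho * other, 1 - rho**2)."""
    rng = np.random.default_rng(seed)
    x1, x2, samples = 0.0, 0.0, []
    for _ in range(n_samples):
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))   # sample x1 given x2
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))   # sample x2 given x1
        samples.append((x1, x2))
    return np.array(samples)

samples = gibbs_bivariate_normal()
print(np.corrcoef(samples[500:].T)[0, 1])   # close to rho = 0.8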
Applications of MCMC

1. Bayesian Inference:
o Estimating posterior distributions of parameters when the likelihood and prior
are known.
o Useful for hierarchical models and complex data structures.
2. Statistical Physics:
o Simulating the behavior of physical systems at the atomic or molecular level.
o Estimating properties like magnetization or phase transitions.
3. Machine Learning:
o Training models with complex likelihood functions.
o Bayesian neural networks and other probabilistic models.
4. Ecology and Evolutionary Biology:
o Estimating parameters of population dynamics models.
o Studying the evolution of traits under different selection pressures.

Sampling:
Sampling is a technique used to select a subset of data from a larger population,
allowing for the analysis and inference of population characteristics without
examining the entire dataset.
Types of Sampling

1. Probability Sampling:
o Every member of the population has a known, non-zero chance of being selected. This makes
the results more reliable, and the method is often used in scientific research.
o Examples:
a. Simple Random Sampling
What it means:
Each person (or item) in the population has an equal chance of being picked, just like a lottery.
Example:
If you have 100 students and you randomly pick 10 names from a hat, that’s simple random
sampling.

b. Systematic Sampling
What it means:
You select every k-th person from a list, starting at a random point.
Example:
If you have a list of 1,000 people and pick every 10th person, starting at person #4, you would select
#4, #14, #24, etc.

c. Stratified Sampling
What it means:
You divide the population into groups (called strata) based on some feature (like age or gender), and
then take a sample from each group.
Example:
If your population has 60% men and 40% women, you make sure your sample also includes 60%
men and 40% women.

d. Cluster Sampling
What it means:
You divide the population into clusters (small groups), and then randomly pick a few entire
clusters to study.
Example:
A school has 10 classrooms. Instead of picking students from all over the school, you randomly
choose 3 full classrooms and study all the students in those classes.
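The first three schemes above (simple random, systematic, stratified) can be expressed in a few lines of code; the population of 100 IDs and the 60/40 split are made-up numbers.

import random

population = list(range(1, 101))        # a made-up population of 100 IDs

# a. Simple random sampling: every member has an equal chance
simple = random.sample(population, 10)

# b. Systematic sampling: every k-th member, starting from a random offset
k = 10
start = random.randrange(k)
systematic = population[start::k]

# c. Stratified sampling: sample proportionally from each stratum
men, women = population[:60], population[60:]   # assumed 60% men, 40% women
stratified = random.sample(men, 6) + random.sample(women, 4)

print(simple, systematic, stratified, sep="\n")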

2. Non-Probability Sampling
Meaning:
Not everyone has a fair or known chance of being chosen. This method is easier and faster, but
results may be less accurate.

a. Convenience Sampling
What it means:
You select people who are easy to reach.
Example:
Asking your friends or classmates to fill out a survey because they’re nearby.

b. Judgmental (Purposive) Sampling
What it means:
You select people based on your own judgment about who would be best for the study.
Example:
A doctor picks specific patients to study because they fit certain medical conditions she’s interested
in.

c. Quota Sampling
What it means:
You choose people to meet certain fixed numbers (quotas) for different groups.
Example:
You need 20 men and 20 women in your sample. You keep picking people until you fill both quotas.

d. Snowball Sampling
What it means:
You start with a few people, and they help you find more participants—like a chain reaction.
Example:
You interview one person in a secret group (like drug users or refugees), and they introduce you to
others for the study.

Proposal Distribution:
• A proposal distribution is a fundamental component in Markov Chain Monte Carlo
(MCMC) methods.
• It is used to generate new candidate samples from a target probability distribution,
especially when direct sampling is not feasible.
• A proposal distribution, denoted as q(x′∣x), is a probability distribution used to propose
new candidate states x' given the current state x.
• The new candidate state is then accepted or rejected based on a criterion designed to
ensure that the sequence of samples converges to the target distribution π(x).
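A common concrete choice (an assumption here, not something these notes prescribe) is a Gaussian random walk, where the candidate is a small random perturbation of the current state.

import numpy as np

def propose(x, step=0.5, rng=None):
    """Draw a candidate x' from the proposal q(x'|x) = Normal(x, step**2)."""
    rng = rng or np.random.default_rng()
    return x + rng.normal(scale=step)

print(propose(1.0))   # a candidate state near the current state x = 1.0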
Markov Chain Monte Carlo Algorithms:
Metropolis-Hastings Algorithm:

• Description: A general algorithm that generates a candidate sample from a proposal
distribution and accepts or rejects it based on an acceptance probability.
• Process:
o Initialize at a state x0.
o Generate a candidate state x′ from the proposal distribution q(x′∣x).
o Calculate the acceptance probability:
α = min(1, [π(x′) · q(x∣x′)] / [π(x) · q(x′∣x)])
o Accept x′ with probability α; otherwise, stay at x.

• Use Case: Widely applicable and flexible for various target distributions.

Gibbs Sampling:

• Description: Samples each variable in turn from its conditional distribution given the
current values of the other variables.
• Process:
1. Initialize all variables.
2. Sample each variable xi from p(xi∣other variables).
3. Repeat until convergence.
• Use Case: Effective when conditional distributions are easier to sample from.
• Example: Ideal for Bayesian networks and hierarchical models.

Graphical Models:
• Graphical models are a powerful framework for representing complex dependencies
among variables in a visual and mathematical way.

Bayesian Networks:
• Bayesian Networks (BNs) are a type of probabilistic graphical model that uses directed
acyclic graphs (DAGs) to represent a set of variables and their conditional
dependencies.
• They are particularly powerful for modeling complex systems where understanding the
relationships between variables is crucial.

Joint Probability:
• Joint probability is the probability of two or more events happening together. For
example, the joint probability of two events A and B is the probability that both events
occur, P(A∩B).
P(A ∩ B) = P(A | B) · P(B)
P(A ∩ B) = P(A) · P(B) (only if A and B are independent)
Conditional Probability:
• Conditional probability defines the probability that event B will occur, given that event
A has already occurred:
P(B | A) = P(A ∩ B) / P(A)

Example:

Burglary ‘B’ –
• P(B=T) = 0.001 (‘B’ is true, i.e. a burglary has occurred)
• P(B=F) = 0.999 (‘B’ is false, i.e. no burglary has occurred)

Fire ‘F’ –
• P(F=T) = 0.002 (‘F’ is true, i.e. a fire has occurred)
• P(F=F) = 0.998 (‘F’ is false, i.e. no fire has occurred)

Alarm ‘A’ –

B F P (A=T) P (A=F)
T T 0.95 0.05
T F 0.94 0.06
F T 0.29 0.71
F F 0.001 0.999

Person ‘P1’ –

A P (P1=T) P (P1=F)
T 0.95 0.05
F 0.05 0.95

Person ‘P2’ –

A P (P2=T) P (P2=F)
T 0.80 0.20
F 0.01 0.99

P(P1, P2, A, ~B, ~F)

= P(P1 | A) · P(P2 | A) · P(A | ~B, ~F) · P(~B) · P(~F)

= 0.95 · 0.80 · 0.001 · 0.999 · 0.998

≈ 0.00076
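The same factorization can be computed directly from the conditional probability tables above; the dictionary layout here is just one convenient way to encode them.

# Probability tables from the burglary/fire alarm network above
P_B = {True: 0.001, False: 0.999}                    # P(Burglary)
P_F = {True: 0.002, False: 0.998}                    # P(Fire)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(Alarm=T | Burglary, Fire)
       (False, True): 0.29, (False, False): 0.001}
P_P1 = {True: 0.95, False: 0.05}                     # P(P1=T | Alarm)
P_P2 = {True: 0.80, False: 0.01}                     # P(P2=T | Alarm)

# Joint probability P(P1=T, P2=T, A=T, B=F, F=F), factored along the network
p = P_P1[True] * P_P2[True] * P_A[(False, False)] * P_B[False] * P_F[False]
print(round(p, 5))   # about 0.00076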

Applications

• Medical Diagnosis: Modeling diseases and symptoms to assist in diagnostic reasoning.
• Machine Learning: Feature selection, classification, and regression.
• Natural Language Processing: Dependency parsing and language modeling.
• Econometrics: Understanding relationships between economic indicators.

Markov Random Fields:


• Markov Random Fields (MRFs), also known as Markov Networks, are a type of
probabilistic graphical model that represents the dependencies among a set of random
variables using an undirected graph.
• They are particularly useful for modeling scenarios where the exact direction of
dependency is not well-defined or when symmetrical relationships between variables
exist.

• The structure of this undirected graph determines the dependence or independence
relationships between the random variables.

Fig. Markov Random Field with four random variables

Components of Markov Random Fields

1. Nodes (Vertices):
o Each node represents a random variable.
o Nodes can represent observed data, hidden variables, or any entities in the
model.
2. Edges (Links):
o Undirected edges between nodes indicate direct dependencies.
o Unlike Bayesian Networks, MRFs use undirected edges to capture the
symmetrical nature of relationships.
3. Clique Potentials (Factors):
o Potential functions are associated with cliques (fully connected subgraphs) of
the graph.
o They represent the local dependencies among the variables in a clique.
o These potential functions are often denoted as ψ

Applications

• Image Processing: Image segmentation, denoising, and restoration.
• Natural Language Processing: Part-of-speech tagging, named entity recognition.
• Computer Vision: Object recognition, scene labeling.
• Bioinformatics: Modeling spatial dependencies in protein structures.

Hidden Markov Models:
• The hidden Markov Model (HMM) is a statistical model that is used to describe the
probabilistic relationship between a sequence of observations and a sequence of hidden
states.
• It is often used in situations where the underlying system or process that generates the
observations is unknown or hidden, hence it has the name “Hidden Markov Model.”
• It is used to predict future observations or classify sequences, based on the underlying
hidden process that generates the data.

An HMM consists of two types of variables: hidden states and observations.

• The hidden states are the underlying variables that generate the observed data, but
they are not directly observable.
• The observations are the variables that are measured and observed.

The Hidden Markov Model (HMM) describes the relationship between the hidden states and the
observations using two sets of probabilities: the transition probabilities and the emission
probabilities.

• The transition probabilities describe the probability of transitioning from one hidden
state to another.

• The emission probabilities describe the probability of observing an output given a
hidden state.

Hidden Markov Model Algorithm:

Step 1: Define the state space and observation space

The state space is the set of all possible hidden states, and the observation space is the set of
all possible observations.

Step 2: Define the initial state distribution

This is the probability distribution over the initial state.

Step 3: Define the state transition probabilities

These are the probabilities of transitioning from one state to another. This forms the
transition matrix, which describes the probability of moving from one state to another.

Step 4: Define the observation likelihoods:

These are the probabilities of generating each observation from each state. This forms the
emission matrix, which describes the probability of generating each observation from each
state.

Step 5: Train the model

The parameters of the state transition probabilities and the observation likelihoods are
estimated using the Baum-Welch algorithm, which applies the forward-backward algorithm and
iteratively updates the parameters until convergence.

Step 6: Decode the most likely sequence of hidden states

Given the observed data, the Viterbi algorithm is used to compute the most likely sequence
of hidden states. This can be used to predict future observations, classify sequences, or detect
patterns in sequential data.

Step 7: Evaluate the model

The performance of the HMM can be evaluated using various metrics, such as accuracy,
precision, recall, or F1 score.
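As a compact sketch of Step 6, the Viterbi decoder below finds the most likely hidden state sequence for a hypothetical two-state weather HMM; the states, observations, and probability tables are made up for illustration.

import numpy as np

states = ["Rainy", "Sunny"]
observations = [0, 1, 0]                   # e.g. 0 = "umbrella seen", 1 = "no umbrella"
start_p = np.array([0.6, 0.4])             # initial state distribution
trans_p = np.array([[0.7, 0.3],            # transition matrix
                    [0.4, 0.6]])
emit_p = np.array([[0.9, 0.1],             # emission matrix: P(observation | state)
                   [0.2, 0.8]])

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    delta = start_p * emit_p[:, obs[0]]    # best path probability ending in each state
    backpointers = []
    for o in obs[1:]:
        scores = delta[:, None] * trans_p  # scores[i, j]: best path reaching state j via i
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * emit_p[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(backpointers):      # walk the backpointers to recover the path
        path.insert(0, int(bp[path[0]]))
    return path

print([states[i] for i in viterbi(observations, start_p, trans_p, emit_p)])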
Tracking Methods:
• Tracking methods in machine learning, often referred to as object tracking, involve
techniques used to locate and follow an object's position over time in a sequence of
frames or images.
• These methods have applications in various fields, including computer vision, robotics,
surveillance, and augmented reality.
Kalman Filter:
• The Kalman filter is an optimal estimator for linear systems with Gaussian noise.
• It provides a recursive solution to the linear quadratic estimation problem, efficiently
processing noisy measurements to produce an estimate of the system's state.
Components:

1. State Vector (xt):
o Represents the state of the system at time t.
2. State Transition Model (A):
o Describes how the state evolves over time.
o xt = A·xt−1 + B·ut + wt, where ut is the control input and wt is the process noise.
3. Measurement Model (H):
o Relates the state to the observations.
o zt = H·xt + vt, where zt is the measurement and vt is the measurement noise.
4. Covariance Matrices (Q and R):
o Q: Process noise covariance.
o R: Measurement noise covariance.

Algorithm:

1. Prediction:
o Predict the next state
o Predict the error covariance
2. Update:
o Compute the Kalman gain
o Update the state estimate
o Update the error covariance
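A minimal one-dimensional sketch of this predict/update cycle, estimating a constant value from noisy measurements (A = 1, H = 1; the noise covariances and measurement values are assumptions):

import numpy as np

Q, R = 1e-5, 0.1 ** 2            # process and measurement noise covariances (assumed)
x_est, P = 0.0, 1.0              # initial state estimate and error covariance

measurements = np.array([0.39, 0.50, 0.48, 0.29, 0.25, 0.32, 0.34, 0.48, 0.41, 0.45])

for z in measurements:
    # Prediction
    x_pred = x_est               # xt = A·xt-1 (A = 1, no control input)
    P_pred = P + Q               # predicted error covariance
    # Update
    K = P_pred / (P_pred + R)    # Kalman gain
    x_est = x_pred + K * (z - x_pred)   # correct the estimate with the measurement
    P = (1 - K) * P_pred         # updated error covariance

print(round(x_est, 3))           # settles near the average of the noisy measurements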

Applications:

• Navigation Systems: GPS and inertial navigation.
• Control Systems: Robotics, aerospace.
• Economics: Estimating trends and cycles.

Particle Filter:
• The particle filter, or Sequential Monte Carlo (SMC) method, is used for non-linear,
non-Gaussian systems.
• It represents the posterior distribution of the state using a set of random samples
(particles) and weights.
Components

1. Particles:
o A set of samples representing possible states.
2. Weights:

o Importance weights for each particle, representing the likelihood given the
observations.

Algorithm:

1. Initialization:
o Generate an initial set of particles from the prior distribution.
o Initialize weights
2. Prediction:
o Propagate particles according to the state transition model
3. Update:
o Update weights based on the measurement likelihood
o Normalize weights
4. Resampling:
o Resample particles based on their weights to avoid degeneracy.
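The four steps above can be sketched for a one-dimensional random-walk state with noisy observations; the noise levels and the number of particles are assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 1000                                    # number of particles
true_x, proc_noise, obs_noise = 0.0, 0.2, 0.5

particles = rng.normal(0.0, 1.0, N)         # 1. Initialization: sample from the prior
weights = np.ones(N) / N

for t in range(50):
    true_x += rng.normal(0, proc_noise)     # simulate the hidden true state
    z = true_x + rng.normal(0, obs_noise)   # noisy measurement

    particles += rng.normal(0, proc_noise, N)                       # 2. Prediction: propagate particles
    weights *= np.exp(-0.5 * ((z - particles) / obs_noise) ** 2)    # 3. Update: measurement likelihood
    weights /= weights.sum()                                        #    normalize weights

    idx = rng.choice(N, size=N, p=weights)                          # 4. Resampling: avoid degeneracy
    particles, weights = particles[idx], np.ones(N) / N

print(round(float(particles.mean()), 2), round(true_x, 2))          # estimate vs. true state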

Applications:

• Robotics: Localization and mapping (SLAM).
• Computer Vision: Object tracking.
• Finance: Filtering in stochastic volatility models.

Comparison:

• Kalman Filter:
o Assumes linear dynamics and Gaussian noise.
o Computationally efficient.
o Optimal for linear systems.
• Particle Filter:
o Handles non-linear and non-Gaussian systems.
o More computationally intensive.
o Provides a flexible framework for complex systems.

*****
