Lecture Notes on Reinforcement Learning Basics
By Dr. Adetokunbo MacGregor JOHN-OTUMU
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make
decisions by performing actions in an environment to maximize cumulative rewards. Unlike
supervised learning, RL does not require labeled input/output pairs and instead learns from the
consequences of actions.
Agent:
• The learner or decision-maker that interacts with the environment.
Environment:
• The world with which the agent interacts and which responds to the agent's actions.
State (s):
• A representation of the current situation of the environment.
Action (a):
• A choice made by the agent that affects the state of the environment.
Reward (r):
• Feedback from the environment based on the action taken by the agent.
Policy (π):
• A strategy that defines the action the agent should take in a given state.
Q-Function (Q):
• A function that estimates the expected cumulative reward of taking a given action in a given state.
Episode:
• A complete sequence of states, actions, and rewards that ends in a terminal state.
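The agent-environment interaction behind these concepts can be sketched as a simple loop. The toy corridor environment and the random policy below are illustrative assumptions, not part of the notes:

```python
import random

class ToyEnv:
    """Hypothetical 1-D corridor: states 0..4, reward 1 only on reaching state 4."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):                     # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        reward = 1 if self.state == 4 else 0    # sparse reward at the goal
        done = self.state == 4                  # the episode ends at the goal
        return self.state, reward, done

random.seed(0)
env = ToyEnv()
state = env.reset()
total_reward = 0
done = False
while not done:                                 # one episode of interaction
    action = random.choice([-1, 1])             # a random policy, for illustration
    state, reward, done = env.step(action)
    total_reward += reward
print(total_reward)  # 1
```

The loop makes the vocabulary concrete: the agent observes a state, chooses an action under its policy, and receives a reward; the episode is the full trajectory until the terminal state.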
Approaches to Reinforcement Learning:
• Model-Free: The agent learns directly from interactions without a model of the
environment.
• Model-Based: The agent builds a model of the environment and plans by simulating
actions.
• Value-Based: The agent learns the value function to determine the best action.
• Policy-Based: The agent directly learns the policy without using value functions.
• On-Policy: The agent learns the value of the policy it is currently following.
• Off-Policy: The agent learns the value of an optimal policy while following another policy.
4. Q-Learning Algorithm
Q-Learning Formula:
Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]
where:
• Q(s,a): the current estimate of the value of taking action a in state s.
• α: the learning rate, controlling how strongly new information overrides the old estimate.
• r: the reward received after taking action a in state s.
• γ: the discount factor, weighting future rewards relative to immediate ones.
• s′, a′: the next state and the actions available in it.
Q-Learning Steps:
1. Initialize the Q-table Q(s,a) arbitrarily (e.g., to zeros).
2. Observe the current state s.
3. Choose an action a (e.g., using an ε-greedy strategy).
4. Take action a; observe the reward r and the next state s′.
5. Update Q(s,a) using the formula above.
6. Set s ← s′ and repeat until the episode ends.
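The steps above can be sketched as a minimal tabular implementation. The corridor task (states 0..4, reward 1 at the goal) is an illustrative assumption:

```python
import random

# Hypothetical toy task: states 0..4, actions 0 (left) / 1 (right),
# reward 1 on reaching state 4, which ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
alpha, gamma, epsilon = 0.1, 0.9, 0.1

random.seed(0)
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # step 1: initialize Q-table

def step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1 if s2 == GOAL else 0), s2 == GOAL

for _ in range(500):                               # episodes
    s, done = 0, False                             # step 2: observe state
    while not done:
        # step 3: epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)                   # step 4: act, observe r and s'
        # step 5: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2                                     # step 6: move on

# After training, the greedy policy moves right (action 1) from every state.
print([max(range(N_ACTIONS), key=lambda x: Q[s][x]) for s in range(GOAL)])
```

Note that the update uses max over the next state's Q-values regardless of which action the agent actually takes next, which is what makes Q-learning off-policy.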
Exploration vs. Exploitation:
• The agent must balance exploring new actions to discover their rewards against exploiting actions already known to yield high rewards.
ε-Greedy Strategy:
• With probability ε the agent selects a random action (exploration); with probability 1 − ε it selects the action with the highest estimated Q-value (exploitation). In practice, ε is often decayed over time.
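The ε-greedy rule fits in a few lines; a minimal sketch:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore (random action); otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

random.seed(0)
# With epsilon = 0 the choice is always greedy (index of the largest Q-value).
print(epsilon_greedy([0.1, 0.5, 0.9], epsilon=0.0))  # 2
```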
5. Deep Q-Networks (DQN)
A Deep Q-Network approximates the Q-function with a neural network, Q(s,a;θ), allowing Q-learning to scale to large or continuous state spaces.
DQN Architecture:
1. Experience Replay:
o Store experiences (state, action, reward, next state) in a replay buffer.
o Sample mini-batches from the replay buffer to train the network, breaking the
correlation between consecutive experiences.
2. Target Network:
o Use a separate target network to stabilize training.
o The target network's weights are periodically updated to match the main network's
weights.
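The replay buffer described above can be sketched with a fixed-capacity deque; the capacity and the dummy transitions are illustrative assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest experiences evicted first

    def push(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        # Uniform random mini-batch breaks the correlation between
        # consecutive experiences before training the network.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

random.seed(0)
buf = ReplayBuffer(capacity=100)
for t in range(150):                           # pushing past capacity drops old items
    buf.push((t, 0, 0.0, t + 1, False))
batch = buf.sample(4)
print(len(buf), len(batch))  # 100 4
```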
Q(s,a;θ) ← Q(s,a;θ) + α[r + γ max_a′ Q(s′,a′;θ−) − Q(s,a;θ)]
where:
• θ: the weights of the main (online) network.
• θ−: the weights of the target network, periodically copied from θ.
• α, γ, r, s′, a′: as defined for tabular Q-learning.
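The role of θ− in the update above can be sketched with a toy example. The dictionary-based "networks" below stand in for real neural networks and are purely illustrative:

```python
import copy

gamma = 0.99

# Hypothetical tiny "networks": a dict mapping state -> list of Q-values.
main_q = {"s0": [0.2, 0.6], "s1": [0.5, 0.1]}
target_q = copy.deepcopy(main_q)            # theta_minus starts equal to theta

# The TD target for a transition (s0, a, r=1.0, s'=s1) uses the *target* network:
r, next_state = 1.0, "s1"
y = r + gamma * max(target_q[next_state])   # r + gamma * max_a' Q(s',a'; theta_minus)
print(round(y, 3))  # 1.495

# The main network is trained toward y; the target network stays frozen,
# and is only periodically re-synced to stabilize training:
main_q["s1"] = [0.7, 0.3]                   # stand-in for gradient updates
target_q = copy.deepcopy(main_q)            # theta_minus <- theta
```

Keeping θ− fixed between syncs means the regression target y does not shift with every gradient step, which is what stabilizes training.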
6. Applications of Reinforcement Learning
Games:
• RL agents have achieved superhuman performance in Atari games, Go (AlphaGo), and chess.
Robotics:
• Learning manipulation and locomotion skills through trial and error.
Finance:
• Trading and portfolio-management strategies that adapt to market feedback.
Healthcare:
• Optimizing treatment policies, such as dosing schedules, from patient outcomes.
Autonomous Systems:
• Decision-making for self-driving vehicles and other autonomous agents.
7. Challenges in Reinforcement Learning
Exploration-Exploitation Trade-off:
• Balancing the need to explore new actions and exploit known rewards.
Sparse Rewards:
• In many tasks the agent receives informative rewards only rarely, making it hard to determine which actions were responsible for success.
Sample Efficiency:
• RL algorithms often require a very large number of interactions with the environment to learn a good policy.
Conclusion
Reinforcement Learning is a powerful paradigm for training agents to make decisions in complex
environments. Q-Learning and its deep learning variant, DQN, have demonstrated remarkable
success in various domains. Understanding these algorithms' principles, workflows, and
applications is crucial for leveraging their full potential in solving real-world problems.