Artificial Intelligence and
Intelligent Agents (F29AI)
MDP II: Policies, Search & Utility
Arash Eshghi
Based on slides from Ioannis Konstas @HWU, Verena Rieser @HWU, Dan Klein @UC Berkeley
Markov Decision Processes
• An MDP is defined by:
• A set of states s ∈ S
• A set of actions a ∈ A
• A transition function T(s, a, s’)
• Probability that action a taken in state s leads to s’, i.e., P(s’ | s, a)
• Also called “the model”
• A reward function R(s, a, s’)
• Sometimes just R(s) or R(s’)
• A start state (or distribution)
• Maybe a terminal state
• MDPs are a family of non-deterministic search problems
• One way to solve them is with expectimax search – but we will
have a new tool soon
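As a concrete illustration of these ingredients, here is a minimal sketch of an MDP container in Python; the class and method names are illustrative, not part of the slides.

import random

class MDP:
    """Bundle of the ingredients that define an MDP (names are illustrative)."""
    def __init__(self, states, actions, transition, reward, start, terminals=()):
        self.states = states        # set of states S
        self.actions = actions      # set of actions A
        self.T = transition         # T(s, a) -> list of (s', P(s' | s, a)) pairs ("the model")
        self.R = reward             # R(s, a, s') -> reward for that transition
        self.start = start          # start state (or a draw from a start distribution)
        self.terminals = set(terminals)

    def sample_next_state(self, s, a):
        """Draw s' ~ P(. | s, a): the non-deterministic outcome of taking a in s."""
        next_states, probs = zip(*self.T(s, a))
        return random.choices(next_states, weights=probs, k=1)[0]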
Policies
• In deterministic single-agent search problems, we wanted an optimal
plan, or sequence of actions, from start to a goal
• In an MDP, we want an optimal policy π*: S → A
• A policy π gives an action for each state
• An optimal policy maximizes expected utility if followed
• An explicit policy defines a reflex agent
• Expectimax didn’t compute entire policies
• Expectimax computed actions
for a single state only!
[Figure: gridworld optimal policy when R(s, a, s’) = -0.03 for all non-terminal states s]
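Concretely, a policy can be stored as a plain dictionary from states to actions and executed reflexively; the gridworld coordinates and actions below are made up for illustration.

# A policy π maps every state to an action (states and actions here are illustrative).
policy = {
    (0, 0): "North", (0, 1): "North", (0, 2): "East",
    (1, 0): "West",                   (1, 2): "East",
    (2, 0): "West",  (2, 1): "West",  (2, 2): "East",
}

def reflex_agent(state):
    """An explicit policy defines a reflex agent: look up the current state and act."""
    return policy[state]

print(reflex_agent((0, 2)))   # -> East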
Example Optimal Policies
[Figure: four gridworld optimal policies, one each for R(s) = -0.01, R(s) = -0.3, R(s) = -0.4, and R(s) = -2.0]
Example: Racing
• A robot car wants to travel far, quickly
• Three states: Cool, Warm, Overheated
• Two actions: Slow, Fast
• Going faster gets double reward
• Break-down: Game over!
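As a sketch, the racing MDP can be written down as transition and reward tables. The slide only names the states and actions, so the probabilities and reward values below follow the usual version of this Berkeley example and should be read as assumptions.

# Racing MDP tables (numbers are assumed, following the standard version of this example).
# T[(s, a)] lists (s', probability) pairs; R[(s, a)] is the reward for taking a in s.
T = {
    ("Cool", "Slow"): [("Cool", 1.0)],
    ("Cool", "Fast"): [("Cool", 0.5), ("Warm", 0.5)],
    ("Warm", "Slow"): [("Cool", 0.5), ("Warm", 0.5)],
    ("Warm", "Fast"): [("Overheated", 1.0)],    # break-down: game over!
}
R = {
    ("Cool", "Slow"): 1.0,
    ("Cool", "Fast"): 2.0,     # going faster gets double reward
    ("Warm", "Slow"): 1.0,
    ("Warm", "Fast"): -10.0,   # assumed penalty for overheating
}
TERMINALS = {"Overheated"}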
Racing Search Tree
MDP Search Trees
Each MDP state projects an expectimax-like search tree
• s is a state
• (s, a) is a q-state
• (s, a, s’) is a transition; T(s, a, s’) = P(s’ | s, a) is its probability and R(s, a, s’) its reward
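To make the tree concrete, here is a depth-limited, expectimax-style evaluation over the racing tables T, R, TERMINALS sketched above (undiscounted and depth-limited; the function names are mine, and this is not yet the "new tool" promised earlier).

def q_value(s, a, depth):
    """Value of the q-state (s, a): expectation over the transitions (s, a, s')."""
    return sum(p * (R[(s, a)] + state_value(sp, depth - 1)) for sp, p in T[(s, a)])

def state_value(s, depth):
    """Value of a state node: maximise over its q-state children, as in the tree."""
    if s in TERMINALS or depth == 0:
        return 0.0
    return max(q_value(s, a, depth) for a in ("Slow", "Fast"))

print(state_value("Cool", 3))   # expected return of the best 3-step behaviour from Cool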
Utilities of Sequences
• What preferences should an agent have over reward
sequences?
• More or less? [1,2,2] or [2,3,4]
• Now or later? [0,0,1] or [1,0,0]
Discounting (gamma)
• It’s reasonable to maximise the sum of rewards
• It’s also reasonable to prefer rewards now to rewards later
• One solution: values of rewards decay exponentially!
• A reward is worth 1 now, γ one step from now, and γ² two steps from now
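Written out as a formula, the discounted utility of a reward sequence (standard definition; γ is the discount factor):

U([r_0, r_1, r_2, \dots]) = r_0 + \gamma r_1 + \gamma^2 r_2 + \dots = \sum_{t \ge 0} \gamma^t r_t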
Discounting
• How to discount?
• Each time we descend a level,
we multiply in the discount once
• Why discount?
• Sooner rewards probably do
have higher utility than later
rewards
• Also helps our algorithms
converge
• Example: discount of 0.5
• U([1,2,3]) = 1*1 + 0.5*2 + 0.25*3 = 2.75
• U([3,2,1]) = 3*1 + 0.5*2 + 0.25*1 = 4.25
• So U([1,2,3]) < U([3,2,1])
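The same computation in code, reproducing the γ = 0.5 example above (the function name is illustrative):

def discounted_utility(rewards, gamma):
    """Sum the rewards, multiplying in the discount once per level descended."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_utility([1, 2, 3], 0.5))   # 1*1 + 0.5*2 + 0.25*3 = 2.75
print(discounted_utility([3, 2, 1], 0.5))   # 3*1 + 0.5*2 + 0.25*1 = 4.25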