Reinforcement Learning Question Bank
(MODULE 1 & MODULE 2)
1. Why is RL known as a feedback-based machine learning technique? Explain in your own words.
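Here is a minimal sketch of the feedback loop. It assumes a hypothetical two-armed bandit whose success probabilities are made up. The agent is never told which action is correct. It only receives a scalar reward after each action and adjusts its own estimates from that feedback.

```python
import random

# Hypothetical environment: two actions with made-up success probabilities.
# The agent never sees these numbers; it only observes rewards.
TRUE_WIN_PROB = [0.3, 0.8]

def environment(action):
    """Return reward 1 with the action's hidden probability, else 0."""
    return 1.0 if random.random() < TRUE_WIN_PROB[action] else 0.0

def run(steps=5000, epsilon=0.1, seed=0):
    random.seed(seed)
    estimates = [0.0, 0.0]   # agent's running estimate of each action's reward
    counts = [0, 0]
    for _ in range(steps):
        # Act: explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max(range(2), key=lambda a: estimates[a])
        reward = environment(action)            # feedback from the environment
        counts[action] += 1
        # Learn: incremental sample-average update driven only by the reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run()
print("estimated rewards:", estimates)
print("times each action was chosen:", counts)
```

The reward signal is the only supervision. Over many interactions the agent shifts its choices toward the action that earns more reward.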
4. Contrast a deterministic policy with a stochastic policy. Define the value function.
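For reference, here are the usual textbook definitions. The notation is the common one, not taken from this question bank. A deterministic policy maps each state to a single action. A stochastic policy gives a probability distribution over actions in each state. The state-value function is the expected discounted return when starting in state s and following policy π.

```latex
\text{Deterministic policy: } \pi : \mathcal{S} \to \mathcal{A}, \qquad a = \pi(s)

\text{Stochastic policy: } \pi(a \mid s) = \Pr(A_t = a \mid S_t = s), \qquad \sum_{a \in \mathcal{A}} \pi(a \mid s) = 1

\text{State-value function: } V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right], \qquad \gamma \in [0, 1]
```

A deterministic policy is the special case of a stochastic policy that puts probability 1 on one action in every state.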
10. Differentiate between a Markov reward process (MRP) and a Markov decision process (MDP).
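One way to see the relationship: an MRP has no actions at all. Fixing a policy π in an MDP averages the actions out and leaves an MRP with P^π(s'|s) = Σ_a π(a|s) P(s'|s,a) and R^π(s) = Σ_a π(a|s) R(s,a). The sketch below performs this reduction on a hypothetical two-state MDP, then evaluates the resulting MRP. The states, actions, probabilities, rewards and policy are all made up.

```python
# Hypothetical MDP: P[s][a] = list of (next_state, probability); R[s][a] = expected reward.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]
P = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.9), ("s0", 0.1)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}
# Hypothetical stochastic policy pi(a|s).
PI = {"s0": {"stay": 0.5, "go": 0.5}, "s1": {"stay": 1.0, "go": 0.0}}

def mdp_to_mrp(P, R, pi):
    """Average out the actions: return MRP transitions P_pi[s][s'] and rewards R_pi[s]."""
    P_pi = {s: {s2: 0.0 for s2 in STATES} for s in STATES}
    R_pi = {s: 0.0 for s in STATES}
    for s in STATES:
        for a in ACTIONS:
            R_pi[s] += pi[s][a] * R[s][a]
            for s2, p in P[s][a]:
                P_pi[s][s2] += pi[s][a] * p
    return P_pi, R_pi

def evaluate_mrp(P_pi, R_pi, gamma=0.9, sweeps=1000):
    """Repeat V(s) = R(s) + gamma * sum_s' P(s'|s) V(s') for a fixed number of sweeps."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: R_pi[s] + gamma * sum(P_pi[s][s2] * V[s2] for s2 in STATES) for s in STATES}
    return V

P_pi, R_pi = mdp_to_mrp(P, R, PI)
V = evaluate_mrp(P_pi, R_pi)
print("MRP transitions:", P_pi)
print("MRP rewards:", R_pi)
print("state values:", V)
```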
19. Let X be a random variable whose outcomes are the results of throwing a die. Find the expectation E[f(X)], where f(X) = X³.
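Assume a fair six-sided die, so each face x in {1, 2, 3, 4, 5, 6} has probability 1/6. Then E[X³] = Σ x³ · (1/6) = (1 + 8 + 27 + 64 + 125 + 216) / 6 = 441/6 = 73.5. Here is a quick exact check using Python's `fractions` module:

```python
from fractions import Fraction

# Fair six-sided die (assumption): every face has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

def f(x):
    return x ** 3

# E[f(X)] = sum over outcomes of f(x) * P(X = x)
expectation = sum(f(x) * p for x in faces)
print(expectation, float(expectation))  # 147/2 73.5
```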
22. For the following grid world environment, construct the value function under a deterministic policy.
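The grid referred to in the question is not reproduced here. This sketch instead assumes a hypothetical 2x2 grid with the following setup:

* The terminal cell is (1, 1).
* Every move has reward -1 and γ = 1.
* The deterministic policy is "move right if possible, otherwise move down".

Iterative policy evaluation then applies V(s) = R + γ V(π's next state) until the values stop changing.

```python
# Hypothetical 2x2 grid world: cells are (row, col); (1, 1) is terminal.
ROWS, COLS = 2, 2
TERMINAL = (1, 1)
STEP_REWARD = -1.0
GAMMA = 1.0

MOVES = {"right": (0, 1), "down": (1, 0), "left": (0, -1), "up": (-1, 0)}

def step(state, action):
    """Deterministic transition: move unless it would leave the grid."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < ROWS and 0 <= c < COLS:
        return (r, c)
    return state  # bumping into a wall leaves the agent in place

def policy(state):
    """Hypothetical deterministic policy: right if possible, otherwise down."""
    return "right" if state[1] < COLS - 1 else "down"

def evaluate(theta=1e-9):
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == TERMINAL:
                continue  # terminal value stays 0
            v_new = STEP_REWARD + GAMMA * V[step(s, policy(s))]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

V = evaluate()
for r in range(ROWS):
    print(" ".join(f"{V[(r, c)]:6.1f}" for c in range(COLS)))
```

Each value is minus the number of steps the policy needs to reach the terminal cell.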
23. Using the model dynamics table of state A, find the optimal policy using policy iteration.
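Policy iteration alternates two steps until the policy stops changing. The equations below use standard notation for a deterministic policy π_k and model dynamics P(s'|s,a), R(s,a,s'). The dynamics table itself is not reproduced here.

```latex
\text{Policy evaluation: } V^{\pi_k}(s) = \sum_{s'} P(s' \mid s, \pi_k(s)) \left[ R(s, \pi_k(s), s') + \gamma V^{\pi_k}(s') \right]

\text{Policy improvement: } \pi_{k+1}(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{\pi_k}(s') \right]

\text{Stop when } \pi_{k+1}(s) = \pi_k(s) \text{ for every state } s.
```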
24. The model dynamics of an RL environment are given below. Using the value iteration algorithm, find V(A) after the first iteration. Assume a discount factor of 1.
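This derivation assumes the common initialization V_0(s) = 0 for every state, since the question does not state one. One value-iteration sweep applies the Bellman optimality backup. With γ = 1 and V_0 ≡ 0, the first iteration for state A reduces to the best expected immediate reward.

```latex
V_{1}(A) = \max_{a} \sum_{s'} P(s' \mid A, a) \left[ R(A, a, s') + \gamma V_{0}(s') \right]

\gamma = 1,\; V_{0} \equiv 0 \;\Rightarrow\; V_{1}(A) = \max_{a} \sum_{s'} P(s' \mid A, a)\, R(A, a, s')
```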
25. Without knowing the dynamics of the environment, an RL agent has to evaluate a policy (estimate the value of a state under that policy) using Monte Carlo methods. Which type of RL task must the agent use for this? Explain the corresponding algorithm in detail.
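Monte Carlo methods learn from complete sample returns, so they apply to episodic tasks: an episode must terminate before its return is known. Below is a minimal sketch of first-visit Monte Carlo prediction. The environment is a hypothetical random walk, chosen because its true values are known:

* Nonterminal states are 1 to 5, with terminal states 0 and 6.
* Each episode starts at state 3.
* The policy being evaluated moves left or right with equal probability.
* Reward is +1 only on exiting at state 6.

The agent only samples episodes. It never reads the transition probabilities.

```python
import random
from collections import defaultdict

# Hypothetical episodic task: random walk over states 1..5, terminals at 0 and 6.
# Reward is +1 only when the walk exits on the right (state 6), otherwise 0.
LEFT_END, RIGHT_END, START = 0, 6, 3

def generate_episode(rng):
    """Follow the policy being evaluated (left/right with prob 0.5) until termination."""
    episode = []  # list of (state, reward received on leaving that state)
    state = START
    while state not in (LEFT_END, RIGHT_END):
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == RIGHT_END else 0.0
        episode.append((state, reward))
        state = next_state
    return episode

def first_visit_mc(num_episodes=20000, gamma=1.0, seed=0):
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode(rng)
        G = 0.0
        # Walk backwards so G accumulates the return from each time step.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: count G only if the state does not occur earlier.
            if state not in (s for s, _ in episode[:t]):
                returns_sum[state] += G
                returns_count[state] += 1
    # V(s) is the average of the first-visit returns observed for s.
    return {s: returns_sum[s] / returns_count[s] for s in sorted(returns_count)}

V = first_visit_mc()
for s, v in V.items():
    print(f"V({s}) ~ {v:.3f}  (true value {s / 6:.3f})")
```

For this walk the true value of state s is s/6. The estimates approach those values as the number of episodes grows.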
27. Using the value iteration algorithm and the model dynamics of state A given in the table below, find the optimal value of state A after the first iteration.
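The table from the question is not reproduced here, so this sketch uses a hypothetical dynamics table for state A. Each action lists (probability, next state, reward) entries. The code performs one Bellman optimality backup from V_0 = 0, assuming γ = 1. It computes the expected value of each action and keeps the maximum.

```python
# Hypothetical model dynamics for state A; the question's real table is not shown.
# Each entry: action -> list of (probability, next_state, reward).
DYNAMICS_A = {
    "a1": [(0.7, "B", 2.0), (0.3, "C", -1.0)],
    "a2": [(0.5, "B", 1.0), (0.5, "A", 0.0)],
}

def first_iteration_value(dynamics, V0, gamma=1.0):
    """One backup: max over actions of sum p * (r + gamma * V0[s'])."""
    q = {
        a: sum(p * (r + gamma * V0[s_next]) for p, s_next, r in outcomes)
        for a, outcomes in dynamics.items()
    }
    best_action = max(q, key=q.get)
    return q[best_action], best_action, q

V0 = {"A": 0.0, "B": 0.0, "C": 0.0}  # common initialization: all values start at zero
v1_A, best, q = first_iteration_value(DYNAMICS_A, V0)
print("Q-values:", q)
print("V1(A) =", v1_A, "via", best)
```

With these made-up numbers, a1 is worth 0.7 · 2 + 0.3 · (−1) = 1.1 and a2 is worth 0.5. So V1(A) = 1.1 here.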
PRACTICAL QUESTIONS
3. Implement the CartPole reinforcement learning environment using a random policy and show the output of the following:
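This sketch needs no external packages. It re-implements the classic cart-pole dynamics, with constants intended to match the CartPole environment shipped with Gym/Gymnasium as far as we know, and runs a uniformly random policy. With Gymnasium installed, the environment would normally be created with `gymnasium.make("CartPole-v1")` instead. The `CartPole` class below is a hypothetical stand-in, not that library's API. The question does not list which outputs to show, so the sketch simply prints each episode's return and length.

```python
import math
import random

class CartPole:
    """Minimal cart-pole simulator (hypothetical stand-in for Gymnasium's CartPole-v1)."""
    GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
    HALF_LENGTH, FORCE_MAG, TAU = 0.5, 10.0, 0.02          # TAU = seconds per step
    X_LIMIT, THETA_LIMIT = 2.4, 12 * 2 * math.pi / 360     # episode ends beyond these
    MAX_STEPS = 500                                        # truncation limit

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def reset(self):
        # State: cart position, cart velocity, pole angle, pole angular velocity.
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        self.steps = 0
        return list(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = self.FORCE_MAG if action == 1 else -self.FORCE_MAG
        total_mass = self.CART_MASS + self.POLE_MASS
        pole_mass_length = self.POLE_MASS * self.HALF_LENGTH
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.GRAVITY * sin_t - cos_t * temp) / (
            self.HALF_LENGTH * (4.0 / 3.0 - self.POLE_MASS * cos_t ** 2 / total_mass)
        )
        x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
        # Explicit Euler integration over one time step.
        x, x_dot = x + self.TAU * x_dot, x_dot + self.TAU * x_acc
        theta, theta_dot = theta + self.TAU * theta_dot, theta_dot + self.TAU * theta_acc
        self.state = [x, x_dot, theta, theta_dot]
        self.steps += 1
        terminated = abs(x) > self.X_LIMIT or abs(theta) > self.THETA_LIMIT
        truncated = self.steps >= self.MAX_STEPS
        return list(self.state), 1.0, terminated, truncated   # +1 reward per step

def run_random_policy(episodes=5, seed=0):
    env, rng = CartPole(seed=seed), random.Random(seed + 1)
    results = []
    for ep in range(episodes):
        env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = rng.randrange(2)   # random policy: push left (0) or right (1)
            _, reward, terminated, truncated = env.step(action)
            total_reward += reward
            done = terminated or truncated
        results.append(total_reward)
        print(f"episode {ep + 1}: return = {total_reward:.0f}, steps = {env.steps}")
    return results

returns = run_random_policy()
```

A random policy ignores the pole's state, so episodes usually end after a short time. That gives a useful baseline for comparing learned policies.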
4. Implement value iteration for a simple RL environment and display the optimal policy. Submit the code and explain the results.
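Here is a sketch of value iteration, assuming a hypothetical 3x3 grid world:

* The goal cell (2, 2) is terminal.
* Moves are deterministic.
* Each move has reward -1, and moving into a wall leaves the agent in place.
* γ = 1.

The code sweeps the Bellman optimality backup until the values stop changing, then reads off the greedy policy.

```python
# Hypothetical 3x3 grid world with a terminal goal at the bottom-right corner.
ROWS, COLS = 3, 3
GOAL = (2, 2)
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
GAMMA = 1.0          # episodic task, so no discounting is needed here
STEP_REWARD = -1.0   # every move costs 1

STATES = [(r, c) for r in range(ROWS) for c in range(COLS)]

def next_state(s, a):
    """Deterministic move; bumping into a wall leaves the agent where it is."""
    dr, dc = ACTIONS[a]
    r, c = s[0] + dr, s[1] + dc
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else s

def q_value(V, s, a):
    """Return of taking a in s, then continuing with value V."""
    return STEP_REWARD + GAMMA * V[next_state(s, a)]

def value_iteration(theta=1e-9):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # terminal state keeps value 0
            best = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy (optimal) policy from the converged values.
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES if s != GOAL}
    return V, policy

V, policy = value_iteration()
for r in range(ROWS):
    print(" ".join(f"{V[(r, c)]:5.1f}" for c in range(COLS)),
          "   ", " ".join(policy.get((r, c), "G") for c in range(COLS)))
```

In this grid, each optimal value equals minus the Manhattan distance to the goal. The printed arrows (U/D/L/R, with G marking the goal) always step one cell closer. When two actions tie, the policy takes whichever comes first in `ACTIONS`.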
5. Implement policy iteration for a simple RL environment and display the optimal policy. Submit the code and explain the results.
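Here is a sketch of policy iteration, assuming a hypothetical stochastic corridor:

* States are 0 to 4, and state 4 is the terminal goal.
* The actions are L and R. Each succeeds with probability 0.8; otherwise the agent stays put.
* Entering the goal pays +10, and γ = 0.9.

The sketch starts from a deliberately poor "always left" policy. It alternates iterative policy evaluation with greedy improvement, and changes an action only when another is strictly better, so the loop cannot oscillate between tied actions.

```python
# Hypothetical 1-D corridor: states 0..4, state 4 is the terminal goal.
N_STATES, GOAL = 5, 4
ACTIONS = ("L", "R")
GAMMA, SUCCESS_P, GOAL_REWARD = 0.9, 0.8, 10.0

def transitions(s, a):
    """Return [(probability, next_state, reward), ...] for action a in state s."""
    target = max(0, s - 1) if a == "L" else min(GOAL, s + 1)
    outcomes = [(SUCCESS_P, target), (1 - SUCCESS_P, s)]   # intended move or slip
    return [(p, s2, GOAL_REWARD if s2 == GOAL else 0.0) for p, s2 in outcomes]

def q_value(V, s, a):
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in transitions(s, a))

def evaluate_policy(policy, theta=1e-10):
    """Iterative policy evaluation for a fixed deterministic policy."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(GOAL):             # terminal state keeps value 0
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

def policy_iteration():
    policy = ["L"] * GOAL                 # deliberately poor initial policy
    while True:
        V = evaluate_policy(policy)
        stable = True
        for s in range(GOAL):
            best = max(ACTIONS, key=lambda a: q_value(V, s, a))
            # Switch only if the new action is strictly better (avoids tie oscillation).
            if q_value(V, s, best) > q_value(V, s, policy[s]) + 1e-12:
                policy[s], stable = best, False
        if stable:
            return policy, V

policy, V = policy_iteration()
print("optimal policy:", policy)
print("state values:", [round(v, 3) for v in V])
```

The policy switches to R one state at a time, starting next to the goal, until every state moves right. The values then increase toward the goal. For example, V(3) = 8 / 0.82 ≈ 9.756.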