IMPLEMENTATION OF REINFORCEMENT LEARNING
ALGORITHM FOR ABALONE DATASET

OUTCOME BASED LAB TASK REPORT
Submitted by
OWSIKAN M
BANNARI AMMAN INSTITUTE OF TECHNOLOGY
(An Autonomous Institution Affiliated to Anna University, Chennai)
SATHYAMANGALAM-638401
OCTOBER 2022
DECLARATION
I affirm that the lab task work titled “IMPLEMENTATION OF REINFORCEMENT
LEARNING ALGORITHM FOR ABALONE DATASET” being submitted is the record of
original work done by me under the guidance of [Link] M.E., Ph.D.,
Department of Computer Science and Engineering.
(Signature of candidate)
OWSIKAN M
201CS240
I certify that the declaration made above by the candidates is true.
(Signature of the Guide)
[Link]
TABLE OF CONTENTS

1. OBJECTIVE OF THE TASK
2. OVERALL BLOCK DIAGRAM
3. METHODOLOGY PROPOSED / ALGORITHM
4. CODING
5. OUTPUT SCREENSHOT
6. CONCLUSION
7. REFERENCES
8. RUBRICS
9. PROCESS PLAN
10. REFLECTION SHEET
IMPLEMENTATION OF REINFORCEMENT LEARNING
ALGORITHM FOR ABALONE DATASET
1. OBJECTIVE OF THE TASK:
Implementation of reinforcement learning algorithm for
Abalone dataset.
2. OVERALL BLOCK DIAGRAM OF THE TASK:
3. METHODOLOGY PROPOSED/ALGORITHM:
STEP 1: Observe the environment
STEP 2: Decide how to act using some strategy
STEP 3: Act accordingly
STEP 4: Receive a reward or penalty
STEP 5: Learn from the experience and refine the strategy
STEP 6: Iterate until an optimal strategy is found
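The six steps above can be sketched as a minimal Q-learning loop. The toy environment below (a three-state chain with a reward on reaching the last state) and the hyperparameters are illustrative assumptions for this sketch only, not the graph used in the task itself.

```python
# A minimal sketch of the observe-decide-act-reward-learn loop (Q-learning)
# on an assumed toy 3-state chain: states 0 -> 1 -> 2, reward on reaching 2.
import random

N_STATES = 3
GOAL = 2
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # actions: 0 = stay, 1 = move right
gamma, alpha, epsilon = 0.75, 0.5, 0.2      # illustrative hyperparameters

def step(state, action):
    # STEP 3 and STEP 4: act, then receive a reward or penalty
    next_state = min(state + action, N_STATES - 1)
    reward = 100 if next_state == GOAL else 0
    return next_state, reward

random.seed(0)
for episode in range(200):                  # STEP 6: iterate
    state = 0                               # STEP 1: observe the environment
    while state != GOAL:
        # STEP 2: decide how to act (epsilon-greedy strategy)
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # STEP 5: learn from the experience (Q-learning update)
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                     - Q[state][action])
        state = next_state

# After training, the greedy policy moves right from every non-goal state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # -> [1, 1]
```

The same observe/decide/act/learn structure appears in the detective-graph code in the CODING section, where the nodes of a graph play the role of states.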
4. CODING:
# Importing the required libraries
import numpy as np
import pylab as pl
import networkx as nx

# Defining and visualising the graph
edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2),
         (1, 3), (9, 10), (2, 4), (0, 6), (6, 7),
         (8, 9), (7, 8), (1, 7), (3, 9)]

goal = 10
G = nx.Graph()
G.add_edges_from(edges)
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
pl.show()
# Defining the reward system for the bot
MATRIX_SIZE = 11
M = np.matrix(np.ones(shape=(MATRIX_SIZE, MATRIX_SIZE)))
M *= -1

for point in edges:
    print(point)
    if point[1] == goal:
        M[point] = 100
    else:
        M[point] = 0
    if point[0] == goal:
        M[point[::-1]] = 100  # reverse of point
    else:
        M[point[::-1]] = 0

M[goal, goal] = 100  # add goal point round trip
print(M)
# Defining some utility functions to be used in the training
Q = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
gamma = 0.75  # learning parameter
initial_state = 1

# Determines the available actions for a given state
def available_actions(state):
    current_state_row = M[state, ]
    available_action = np.where(current_state_row >= 0)[1]
    return available_action

available_action = available_actions(initial_state)

# Chooses one of the available actions at random
def sample_next_action(available_actions_range):
    next_action = int(np.random.choice(available_actions_range, 1))
    return next_action

action = sample_next_action(available_action)

# Updates the Q-matrix according to the path chosen
def update(current_state, action, gamma):
    max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]
    if max_index.shape[0] > 1:
        max_index = int(np.random.choice(max_index, size=1))
    else:
        max_index = int(max_index)
    max_value = Q[action, max_index]
    Q[current_state, action] = M[current_state, action] + gamma * max_value
    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    else:
        return 0

update(initial_state, action, gamma)
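The update() function above applies the standard Q-learning rule (with a learning rate of 1). Since actions here are labelled by destination nodes, the chosen action a is also the next state s':

```latex
Q(s, a) \leftarrow R(s, a) + \gamma \, \max_{a'} Q(s', a')
```

Here R is the reward matrix M, and gamma = 0.75 controls how strongly future rewards are weighted.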
# Training and evaluating the bot using the Q-matrix
scores = []
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)
    scores.append(score)

# print("Trained Q matrix:")
# print(Q / np.max(Q) * 100)
# You can uncomment the above two lines to view the trained Q matrix

# Testing
current_state = 0
steps = [current_state]

while current_state != 10:
    next_step_index = np.where(Q[current_state, ] == np.max(Q[current_state, ]))[1]
    if next_step_index.shape[0] > 1:
        next_step_index = int(np.random.choice(next_step_index, size=1))
    else:
        next_step_index = int(next_step_index)
    steps.append(next_step_index)
    current_state = next_step_index

print("Most efficient path:")
print(steps)

pl.plot(scores)
pl.xlabel('No of iterations')
pl.ylabel('Reward gained')
pl.show()

Output:
Most efficient path:
[0, 1, 3, 9, 10]
# Defining and visualizing the new graph with the environmental clues
# Defining the locations of the police and the drug traces
police = [2, 4, 5]
drug_traces = [3, 8, 9]

G = nx.Graph()
G.add_edges_from(edges)
mapping = {0: '0 - Detective', 1: '1', 2: '2 - Police', 3: '3 - Drug traces',
           4: '4 - Police', 5: '5 - Police', 6: '6', 7: '7', 8: '8 - Drug traces',
           9: '9 - Drug traces', 10: '10 - Drug racket location'}

H = nx.relabel_nodes(G, mapping)
pos = nx.spring_layout(H)
nx.draw_networkx_nodes(H, pos, node_size=200)
nx.draw_networkx_edges(H, pos)
nx.draw_networkx_labels(H, pos)
pl.show()
# Defining some utility functions for the training process
Q = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
env_police = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
env_drugs = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
initial_state = 1

# Same as above
def available_actions(state):
    current_state_row = M[state, ]
    av_action = np.where(current_state_row >= 0)[1]
    return av_action

# Same as above
def sample_next_action(available_actions_range):
    next_action = int(np.random.choice(available_actions_range, 1))
    return next_action

# Exploring the environment
def collect_environmental_data(action):
    found = []
    if action in police:
        found.append('p')
    if action in drug_traces:
        found.append('d')
    return found

available_action = available_actions(initial_state)
action = sample_next_action(available_action)
def update(current_state, action, gamma):
    max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]
    if max_index.shape[0] > 1:
        max_index = int(np.random.choice(max_index, size=1))
    else:
        max_index = int(max_index)
    max_value = Q[action, max_index]
    Q[current_state, action] = M[current_state, action] + gamma * max_value
    environment = collect_environmental_data(action)
    if 'p' in environment:
        env_police[current_state, action] += 1
    if 'd' in environment:
        env_drugs[current_state, action] += 1
    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    else:
        return 0
# Same as above
update(initial_state, action, gamma)

# Determines the available actions according to the environment
def available_actions_with_env_help(state):
    current_state_row = M[state, ]
    av_action = np.where(current_state_row >= 0)[1]
    # if there are multiple routes, dis-favour anything negative
    env_pos_row = env_matrix_snap[state, av_action]
    if np.sum(env_pos_row < 0):
        # remove the negative directions from av_action where possible
        temp_av_action = av_action[np.array(env_pos_row)[0] >= 0]
        if len(temp_av_action) > 0:
            av_action = temp_av_action
    return av_action

# Visualising the environmental matrices
scores = []
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)

# Print the environmental matrices
print('Police Found')
print(env_police)
print('')
print('Drug traces Found')
print(env_drugs)
# Training and evaluating the model
# env_matrix_snap combines the clues gathered above; its definition is not
# shown in the source, so a reasonable assumption is used here: drug traces
# count as favourable (+) and police as unfavourable (-).
env_matrix_snap = env_drugs - env_police

scores = []
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions_with_env_help(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)
    scores.append(score)

pl.plot(scores)
pl.xlabel('Number of iterations')
pl.ylabel('Reward gained')
pl.show()
5. OUTPUT SCREENSHOT:
6. CONCLUSION:
Therefore, the reinforcement learning algorithm has been successfully
implemented for the Abalone dataset task.
7. REFERENCES:
1. [Link]implementation-using-q-learning/
2. [Link]python-openai-gym/
OUTCOME BASED LAB TASKS
RUBRICS FORM (*to be filled by the lab handling faculty only)
Student name:
Register number:
Name of the laboratory:
Name of the lab handling faculty:
Name of the task:
Experiments mapped:
1.
2.
3.
S.No.    Rubrics    Reward points awarded
1.
2.
3.
4.
5.
Total (150 reward points)
PROCESS PLAN
Proposed Process Plan                              Actual Plan Executed
1. Downloading the dataset – 20 mins               1. Downloading the dataset – 20 mins
2. Importing the dataset – 5 mins                  2. Importing the dataset – 5 mins
3. Defining the reward system – 30 mins            3. Defining the reward system – 30 mins
4. Training and evaluating – 40 mins               4. Training and evaluating – 40 mins
5. Defining and visualizing the graph – 30 mins    5. Defining and visualizing the graph – 30 mins
6. Training and evaluating the model – 40 mins     6. Training and evaluating the model – 45 mins
Skill: MACHINE LEARNING LABORATORY Date: 29/10/2022 Name: OWSIKAN M
Reflection Sheet
S/N   Problems                                                 Counter measures                    Status
1.    Difficulty faced while defining the reward system        Referred to internet web sources
2.    Difficulty faced while defining some utility
      functions for the training process                       Referred to internet web sources
Date: 29/10/2022 Prepared By: OWSIKAN M
Status Legend:
Self-understood and resolved
Discussed with Trainer and resolved
Yet to discuss / find solution