1.前言
1.1一直都觉得深度强化学习(DRL Deepein Reinforcement Learning)是一个很神奇的技术,利用奖励去(Reward)诱导神经网络(Neural network)学习参数,调整策略(Policy),使得智能体(Agent)做出适合当前局面(State)的动作(Action).
1.2技术很神奇,但是学起来还是有些难度的,就上面这句话,就包含了深度强化学习的5个基本概念. DRL到目前给人的感觉就是概念异常的多, 或许是因为有一些概念难以用简练的语言描述.

1.3总而言之, 想要学好一个技术, 总得多实践, 因此,本博文参考了Github上一位大佬的项目, 做了些改进和优化, 同时根据大佬的README文档做了些笔记记录在此, 以供学习, 如有错误还望指正.
2.代码
2.1话不多说,先贴代码,后面再附上对于代码的解读:
import random
import gym
import numpy as np
from collections import deque
import tensorflow as tf
import os
EPISODES = 1000
class DQNAgent:
def __init__(self, state_size, action_size):
self.state_size = state_size
self.action_size = action_size
self.memory = deque(maxlen=2000)
self.gamma = 0.95
self.epsilon = 1.0
self.epsilon_min = 0.01
self.epsilon_decay = 0.995
self.learning_rate = 0.005
self.model = self._build_model()
def _build_model(self):
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(
12, input_dim=self.state_size, activation=tf.keras.layers.LeakyReLU(alpha=0.1)))
model.add(tf.keras.layers.Dense(
12, activation=tf.keras.layers.LeakyReLU(alpha=0.1)))
model.add(tf.keras.layers.Dense(
12, activation=tf.keras.layers.LeakyReLU(alpha=0.1)))
model.add(tf.keras.layers.Dense(
12, activation=tf.keras.layers.LeakyReLU(alpha=0.1)))
model.add(tf.keras.layers.Dense(self.action_size, activation='linear'))
model.compile(loss='mse',optimizer=tf.optimizers.Adam(lr=self.learning_rate))
return model
def memorize(self, state, action, reward, next_state, done):
self.memory.