Dueling DQN Improvements
### DQN Improvement Methods
#### Double DQN
To mitigate the overestimation problem of the original DQN algorithm, Double DQN introduces a mechanism for estimating the action-value function more accurately. It decouples action selection from action evaluation: the current (online) network picks the best next action, while the older target network estimates that action's value. This reduces the upward bias in value estimates that plain DQN tends to produce[^1].
```python
import torch
import torch.nn.functional as F

def double_dqn_loss(online_q_values, online_next_q_values, target_next_q_values,
                    actions, rewards, dones, gamma=0.99):
    # Select next-state actions with the online network, evaluate them with the target network
    best_action_indices = torch.argmax(online_next_q_values, dim=-1, keepdim=True)
    selected_target_q_values = target_next_q_values.gather(1, best_action_indices).squeeze(-1)
    expected_q_values = rewards + gamma * selected_target_q_values * (1 - dones)
    # Q-values of the actions actually taken, regressed toward the (detached) target
    chosen_q_values = online_q_values.gather(1, actions.unsqueeze(-1)).squeeze(-1)
    loss = F.smooth_l1_loss(chosen_q_values, expected_q_values.detach())
    return loss
```
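As an illustration only, here is a sketch of how this loss might be wired into one training step; `online_net`, `target_net`, `optimizer`, and the batch tensors are assumptions, not part of the original answer:
```python
# Hypothetical glue code: online_net, target_net, optimizer and the sampled batch are assumed
online_q_values = online_net(states)                    # Q(s, ·) used in the loss
with torch.no_grad():                                   # no gradients through the target side
    online_next_q_values = online_net(next_states)      # only used to pick argmax actions
    target_next_q_values = target_net(next_states)      # used to evaluate those actions
loss = double_dqn_loss(online_q_values, online_next_q_values, target_next_q_values,
                       actions, rewards, dones)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```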
#### Dueling DQN
This variant of DQN addresses the standard architecture's difficulty in judging how good a state is independently of the action taken. It decomposes the Q-value into two parts: V(s), the value of being in state s, and A(a|s), the advantage of taking a particular action a over the other available actions. This design helps the network learn the relative merit of actions in different situations and improves learning efficiency.
```python
import torch
import torch.nn as nn

class DuelingNetwork(nn.Module):
    def __init__(self, input_size, output_size):
        super(DuelingNetwork, self).__init__()
        # Shared feature extractor
        self.feature_layer = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU()
        )
        # State-value stream V(s): a single scalar per state
        self.value_stream = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, 1)
        )
        # Advantage stream A(s, a): one value per action
        self.advantage_stream = nn.Sequential(
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, output_size)
        )

    def forward(self, state):
        features = self.feature_layer(state)
        value = self.value_stream(features)
        advantages = self.advantage_stream(features)
        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
        qvals = value + (advantages - advantages.mean(dim=1, keepdim=True))
        return qvals
```
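Subtracting the mean advantage keeps V and A identifiable, since otherwise a constant could be shifted freely between the two streams without changing Q. A brief usage sketch, with made-up dimensions (a 4-dimensional state and 2 actions) that are not from the original answer:
```python
# Hypothetical usage with assumed dimensions
net = DuelingNetwork(input_size=4, output_size=2)
states = torch.randn(32, 4)          # a batch of 32 states
q_values = net(states)               # shape: (32, 2)
greedy_actions = q_values.argmax(dim=1)
```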
#### Noisy DQN
To address insufficient exploration, Noisy DQN adds learnable random noise to the weights of its linear layers, so the amount of exploration is adjusted automatically during training. This simplifies hyperparameter tuning (no epsilon-greedy schedule is needed) and lets the agent keep exploring effectively without relying on external incentives.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from functools import partial

class FactorizedNoisyLinear(nn.Module):
    def __init__(self, in_features, out_features, std_init=0.5):
        super(FactorizedNoisyLinear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.std_init = std_init
        # Learnable means and standard deviations for the weights and biases
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.register_buffer('weight_epsilon', torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer('bias_epsilon', torch.empty(out_features))
        self.reset_parameters()
        self.sample_noise()

    @staticmethod
    def scale_factor(x):
        # f(x) = sign(x) * sqrt(|x|), the scaling used for factorized Gaussian noise
        return x.sign() * x.abs().sqrt()

    def reset_parameters(self):
        mu_range = 1 / math.sqrt(self.in_features)
        self.weight_mu.data.uniform_(-mu_range, mu_range)
        self.weight_sigma.data.fill_(self.std_init / math.sqrt(self.in_features))
        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(self.std_init / math.sqrt(self.out_features))

    def sample_noise(self):
        # Factorized noise: one vector per input unit, one per output unit,
        # combined via an outer product instead of sampling a full noise matrix
        epsilon_in = self.scale_factor(torch.randn(self.in_features))
        epsilon_out = self.scale_factor(torch.randn(self.out_features))
        with torch.no_grad():
            self.weight_epsilon.copy_(torch.outer(epsilon_out, epsilon_in))
            self.bias_epsilon.copy_(epsilon_out)

    def forward(self, inputs):
        if self.training:
            # Perturbed weights drive exploration during training
            return F.linear(inputs,
                            self.weight_mu + self.weight_sigma * self.weight_epsilon,
                            self.bias_mu + self.bias_sigma * self.bias_epsilon)
        # At evaluation time only the learned means are used
        return F.linear(inputs, self.weight_mu, self.bias_mu)

noisy_linear = partial(FactorizedNoisyLinear, std_init=0.4)
```
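As a rough illustration (the network shape and sizes below are made up, not from the original answer), such a layer can stand in for an ordinary `nn.Linear` output layer, with fresh noise drawn once per training step:
```python
# Hypothetical usage: a tiny Q-network whose head is a noisy layer instead of epsilon-greedy exploration
q_net = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    noisy_linear(128, 2)
)

def resample_noise(model):
    # Call once per training step so every noisy layer draws fresh epsilon
    for module in model.modules():
        if isinstance(module, FactorizedNoisyLinear):
            module.sample_noise()

resample_noise(q_net)
q_values = q_net(torch.randn(32, 4))   # noisy Q-values for a batch of 32 states
```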
#### PER DQN
A prioritized experience replay (PER) buffer samples transitions with larger TD errors more often, so the agent corrects its largest prediction errors sooner and learning speeds up. The bias introduced by this non-uniform sampling is typically compensated with importance-sampling weights, which keeps training stable and efficient.