Gradient Computation in PyTorch, Part 1: The Gradient Formula

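This article uses a simple PyTorch code example to explain how the dynamic graph works and how gradients are computed. The example uses PyTorch's autograd module to optimize a weight w, showing how the gradient is computed and how the parameter is updated. It also discusses the concept of leaf nodes and notes that the gradients of non-leaf nodes are usually not retained, but can be kept by calling `.retain_grad()`.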


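I mainly use PyTorch and have rarely touched other frameworks. I also never studied it systematically, so I lack a precise understanding of dynamic graphs and gradient computation. Today I will go through both carefully with the help of some code.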

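import torch

torch.manual_seed(1)
x = torch.tensor([2.0])   # input: a leaf tensor that does not require gradients
y = torch.tensor([10.0])  # target: a leaf tensor that does not require gradients
loss = torch.nn.MSELoss()
# torch.autograd.Variable is deprecated; a tensor created with requires_grad=True behaves the same
w = torch.randn(1, requires_grad=True)
print(w)
for i in range(20):
    y_ = w * x  # prediction: a non-leaf tensor, because it is computed from w
    if y - y_ <= 1e-5:  # stop once the prediction is close enough to the target
        break
    else:
        l = loss(y_, y)  # MSELoss expects (input, target)
        l.backward()
        # compare the autograd result with the hand-derived derivative -2*x*(y - y_)
        print('grad of w: {}, value: {}, derivative: {}.'.format(w.grad, w, -2*x*(y-y_)))
        print('is leaf:', y_.is_leaf, y.is_leaf, w.is_leaf, x.is_leaf)
        print('grad:', y.grad, w.grad, x.grad)
        print('grad:', y_.grad)  # non-leaf tensor: gradient is not retained by default
        # rebuild w as a fresh leaf tensor that holds the updated value
        w = (w - 0.1 * w.grad).detach().requires_grad_(True)
print('round {}'.format(i))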

The output is:

tensor([0.6614], requires_grad=True)
grad of w: tensor([-34.7092]), value: tensor([0.6614], requires_grad=True), derivative: tensor([-34.7092], grad_fn=<MulBackward0>).
is leaf: False True True True
grad: None tensor([-34.7092]) None

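The initial value of w is 0.6614. Writing $\hat{y} = wx$ for the prediction `y_`, the loss and its gradient with respect to w are:

$$l = (y - \hat{y})^2 = (y - wx)^2, \qquad \frac{\partial l}{\partial w} = -2x\,(y - wx) = -2 \cdot 2 \cdot (10 - 0.6614 \cdot 2) \approx -34.71$$

There are three leaf nodes: w, x = 2, and y = 10. Of these, only w requires gradients, so w is the only leaf node whose `.grad` is populated. `y_` also depends on w and receives a gradient during backpropagation, but because it is not a leaf node, that gradient is not retained.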
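
As a quick sanity check of the formula and the leaf-node remarks above, the following minimal sketch compares the autograd result with the hand-derived expression. It hard-codes `w = 0.6614` (the rounded value printed above) instead of relying on the random seed, and the name `analytic` is introduced here purely for illustration.

```python
import torch

x = torch.tensor([2.0])
y = torch.tensor([10.0])
# Assumption: use the rounded value printed above rather than the seeded random draw.
w = torch.tensor([0.6614], requires_grad=True)

y_ = w * x                                # non-leaf: produced by an operation on w
l = torch.nn.functional.mse_loss(y_, y)   # equals (y - w*x)^2 for a single element
l.backward()

# Hand-derived gradient: dl/dw = -2 * x * (y - w * x)
analytic = (-2 * x * (y - w * x)).detach()

print(w.grad, analytic)               # the two values should agree
print(w.is_leaf, w.requires_grad)     # True True
print(x.is_leaf, x.requires_grad)     # True False
print(y_.is_leaf, y_.requires_grad)   # False True
```

Tensors created directly by the user are leaves whether or not they require gradients. Only leaves with `requires_grad=True` get a populated `.grad` after `backward()`.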

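Accessing `y_.grad` on the non-leaf tensor returns `None` and emits a warning:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.

To keep the gradient of a non-leaf tensor, call `.retain_grad()` on it before calling `l.backward()`. With `y_.retain_grad()` in place, the last print gives: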

grad: tensor([-17.3546])

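The gradient obtained here is that of the loss with respect to `y_` (written $\hat{y}$ below):

$$\frac{\partial l}{\partial \hat{y}} = -2\,(y - \hat{y}) = -2 \cdot (10 - 0.6614 \cdot 2) \approx -17.35$$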
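
The effect of `.retain_grad()` described above can be reproduced with a minimal sketch like the one below. It assumes a fixed starting weight of `0.6614` (the rounded value from the output) so that the result does not depend on the random seed.

```python
import torch

x = torch.tensor([2.0])
y = torch.tensor([10.0])
# Assumption: fixed starting weight, matching the rounded value printed above.
w = torch.tensor([0.6614], requires_grad=True)

y_ = w * x
y_.retain_grad()               # ask autograd to keep the gradient of this non-leaf tensor
l = torch.nn.MSELoss()(y_, y)
l.backward()

print(y_.is_leaf)              # False: y_ is still a non-leaf tensor
print(y_.grad)                 # now populated: dl/dy_ = -2 * (y - y_)
print(w.grad)                  # chain rule: dl/dw = dl/dy_ * x
```

Note that `.retain_grad()` has to be called before `backward()`. Calling it afterwards does not recover a gradient that has already been discarded.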

The complete sequence of updates is:

tensor([0.6614], requires_grad=True)
grad of w: tensor([-34.7092]), value: tensor([0.6614], requires_grad=True), derivative: tensor([-34.7092], grad_fn=<MulBackward0>).
grad of w: tensor([-6.9418]), value: tensor([4.1323], requires_grad=True), derivative: tensor([-6.9418], grad_fn=<MulBackward0>).
grad of w: tensor([-1.3884]), value: tensor([4.8265], requires_grad=True), derivative: tensor([-1.3884], grad_fn=<MulBackward0>).
grad of w: tensor([-0.2777]), value: tensor([4.9653], requires_grad=True), derivative: tensor([-0.2777], grad_fn=<MulBackward0>).
grad of w: tensor([-0.0555]), value: tensor([4.9931], requires_grad=True), derivative: tensor([-0.0555], grad_fn=<MulBackward0>).
grad of w: tensor([-0.0111]), value: tensor([4.9986], requires_grad=True), derivative: tensor([-0.0111], grad_fn=<MulBackward0>).
grad of w: tensor([-0.0022]), value: tensor([4.9997], requires_grad=True), derivative: tensor([-0.0022], grad_fn=<MulBackward0>).
grad of w: tensor([-0.0004]), value: tensor([4.9999], requires_grad=True), derivative: tensor([-0.0004], grad_fn=<MulBackward0>).
grad of w: tensor([-8.7738e-05]), value: tensor([5.0000], requires_grad=True), derivative: tensor([-8.7738e-05], grad_fn=<MulBackward0>).
round 9
[tensor([1.])]
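
The loop above rebuilds `w` as a fresh leaf tensor on every iteration, which also discards its old gradient. A commonly used alternative is to update `w` in place under `torch.no_grad()` and clear the accumulated gradient explicitly. The sketch below is one way to write that and is not the original author's code; the starting value `0.6614`, the name `lr`, and the `steps` counter are assumptions made for illustration.

```python
import torch

x = torch.tensor([2.0])
y = torch.tensor([10.0])
w = torch.tensor([0.6614], requires_grad=True)  # assumed start, rounded from the run above
lr = 0.1                                        # same step size as the original loop
loss_fn = torch.nn.MSELoss()

steps = 0
for _ in range(20):
    y_ = w * x
    # use the absolute error so the check also works if w overshoots the target
    if (y - y_).abs().item() <= 1e-5:
        break
    loss_fn(y_, y).backward()
    with torch.no_grad():
        w -= lr * w.grad     # in-place update: w stays the same leaf tensor
    w.grad.zero_()           # gradients accumulate across backward() calls, so reset them
    steps += 1

print(steps, w)
```

Because `w` is never replaced, `w.is_leaf` stays `True` throughout. The explicit `w.grad.zero_()` is what prevents the gradient of one iteration from being added to that of the next.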