opt.compute_gradients() 与 tf.gradients 与 tf.stop_gradient()

最新推荐文章于 2022-12-08 18:50:09 发布

YiqiangXu

最新推荐文章于 2022-12-08 18:50:09 发布

阅读量4.9k

点赞数 2

分类专栏： tensorflow

tensorflow 专栏收录该内容

10 篇文章

订阅专栏

本文详细介绍了TensorFlow中tf.gradients()函数的使用方法，包括如何计算张量的梯度、如何处理与特定变量相关的梯度计算以及如何利用grad_ys参数自定义初始梯度。同时，还探讨了tf.stop_gradient()函数的作用，即如何阻止特定节点的梯度反向传播。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

opt.compute_gradients() :

opt.apply_gradients()

You can also ask your optimizer to take gradients of specific variables. You can also modify the gradients calculated by your optimizer.

tf.gradients():

def gradients(ys, xs, grad_ys=None, name="gradients", colocate_gradients_with_ops=False, gate_gradients=False,
aggregation_method=None)
"""Constructs symbolic partial derivatives of sum of `ys` w.r.t. x in `xs`.

`ys` and `xs` are each a `Tensor` or a list of tensors. `grad_ys`
is a list of `Tensor`, holding the gradients received by the
`ys`. The list must be the same length as `ys`.

`gradients()` adds ops to the graph to output the partial
derivatives of `ys` with respect to `xs`. It returns a list of
`Tensor` of length `len(xs)` where each tensor is the `sum(dy/dx)`
for y in `ys`.

`grad_ys` is a list of tensors of the same length as `ys` that holds
the initial gradients for each y in `ys`. When `grad_ys` is None,
we fill in a tensor of '1's of the shape of y for each y in `ys`. A
user can provide their own initial `grad_ys` to compute the
derivatives using a different initial gradient for each y (e.g., if
one wanted to weight the gradient differently for each value in
each y).

Args:
ys: A `Tensor` or list of tensors to be differentiated.
xs: A `Tensor` or list of tensors to be used for differentiation.
grad_ys: Optional. A `Tensor` or list of tensors the same size as
`ys` and holding the gradients computed for each y in `ys`.
name: Optional name to use for grouping all the gradient ops together.
defaults to 'gradients'.
colocate_gradients_with_ops: If True, try colocating gradients with
the corresponding op.
gate_gradients: If True, add a tuple around the gradients returned
for an operations. This avoids some race conditions.
aggregation_method: Specifies the method used to combine gradient terms.
Accepted values are constants defined in the class `AggregationMethod`.

Returns:
A list of `sum(dy/dx)` for each x in `xs`.

Raises:
LookupError: if one of the operations between `x` and `y` does not
have a registered gradient function.
ValueError: if the arguments are invalid.

"""

tensorflow中有一个计算梯度的函数tf.gradients(ys, xs)，要注意的是，xs中的x必须要与ys相关，不相关的话，会报错。
代码中定义了两个变量w1， w2，但res只与w1相关

#wrong
import tensorflow as tf

w1 = tf.Variable([[1,2]])
w2 = tf.Variable([[3,4]])

res = tf.matmul(w1, [[2],[1]])

grads = tf.gradients(res,[w1,w2])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    re = sess.run(grads)
    print(re)
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
   1
2
3
4
5
6
7
8
9
10
11
12
13
14

错误信息
TypeError: Fetch argument None has invalid type

# right
import tensorflow as tf

w1 = tf.Variable([[1,2]])
w2 = tf.Variable([[3,4]])

res = tf.matmul(w1, [[2],[1]])

grads = tf.gradients(res,[w1])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    re = sess.run(grads)
    print(re)
#  [array([[2, 1]], dtype=int32)] # ehe
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

对于grad_ys的测试：

import tensorflow as tf

w1 = tf.get_variable('w1', shape=[3])
w2 = tf.get_variable('w2', shape=[3])

w3 = tf.get_variable('w3', shape=[3])
w4 = tf.get_variable('w4', shape=[3])

z1 = w1 + w2+ w3
z2 = w3 + w4

grads = tf.gradients([z1, z2], [w1, w2, w3, w4], grad_ys=[tf.convert_to_tensor([2.,2.,3.]),
                                                          tf.convert_to_tensor([3.,2.,4.])])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(grads))
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

[array([ 2.,  2.,  3.],dtype=float32),
 array([ 2.,  2.,  3.], dtype=float32), 
 array([ 5.,  4.,  7.], dtype=float32), 
 array([ 3.,  2.,  4.], dtype=float32)]
   1
2
3
4
   1
2
3
4

tf.stop_gradient()

阻挡节点BP的梯度

import tensorflow as tf

w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)

a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)

# b=w1*3.0*w2
b = tf.multiply(a_stoped, w2)
gradients = tf.gradients(b, xs=[w1, w2])
print(gradients)
#输出
#[None, <tf.Tensor 'gradients/Mul_1_grad/Reshape_1:0' shape=() dtype=float32>]
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
   1
2
3
4
5
6
7
8
9
10
11
12
13
14

可见，一个节点被 stop之后，这个节点上的梯度，就无法再向前BP了。由于w1变量的梯度只能来自a节点，所以，计算梯度返回的是None。

a = tf.Variable(1.0)
b = tf.Variable(1.0)

c = tf.add(a, b)

c_stoped = tf.stop_gradient(c)

d = tf.add(a, b)

e = tf.add(c_stoped, d)

gradients = tf.gradients(e, xs=[a, b])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(gradients))
#输出 [1.0, 1.0]
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

虽然 c节点被stop了，但是a，b还有从d传回的梯度，所以还是可以输出梯度值的。

import tensorflow as tf

w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)

# b=w1*3.0*w2
b = tf.multiply(a_stoped, w2)

opt = tf.train.GradientDescentOptimizer(0.1)

gradients = tf.gradients(b, xs=tf.trainable_variables())

tf.summary.histogram(gradients[0].name, gradients[0])# 这里会报错，因为gradients[0]是None
#其它地方都会运行正常，无论是梯度的计算还是变量的更新。总觉着tensorflow这么设计有点不好，
#不如改成流过去的梯度为0
train_op = opt.apply_gradients(zip(gradients, tf.trainable_variables()))

print(gradients)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(train_op))
    print(sess.run([w1, w2]))
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
   
    
   
   1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24