Automatic differentiation package - torch.autograd¶
torch.autograd provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to the existing code: you only need to declare the Tensors for which gradients should be computed with the requires_grad=True keyword. As of now, we only support autograd for floating point Tensor types (half, float, double and bfloat16) and complex Tensor types (cfloat, cdouble).
torch.autograd.backward: Compute the sum of gradients of given tensors with respect to graph leaves.
torch.autograd.grad: Compute and return the sum of gradients of outputs with respect to the inputs.
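For illustration, a minimal sketch of these two entry points (the variable names are just examples):

import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()

# torch.Tensor.backward() accumulates gradients into the leaves' .grad attributes
loss.backward()
print(x.grad)  # equals 2 * x

# torch.autograd.grad() returns the gradients instead of accumulating them
(gx,) = torch.autograd.grad((x ** 2).sum(), x)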
Forward-mode Automatic Differentiation¶
Warning
This API is in beta. Even though the function signatures are very unlikely to change, improved operator coverage is planned before we consider this stable.
Please see the forward-mode AD tutorial for detailed steps on how to use this API.
forward_ad.dual_level: Context-manager for forward AD, where all forward AD computation must occur within the dual_level context.
forward_ad.make_dual: Associate a tensor value with its tangent to create a "dual tensor" for forward AD gradient computation.
forward_ad.unpack_dual: Unpack a "dual tensor" to get both its Tensor value and its forward AD gradient.
forward_ad.enter_dual_level: Enter a new forward grad level.
forward_ad.exit_dual_level: Exit a forward grad level.
forward_ad.UnpackedDualTensor: Namedtuple returned by unpack_dual containing the primal and tangent components of the dual tensor.
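For illustration, a minimal sketch of the dual-number workflow with torch.autograd.forward_ad (sin is an arbitrary example function):

import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)   # direction for the Jacobian-vector product

with fwAD.dual_level():
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = torch.sin(dual_input)
    # unpack_dual returns a namedtuple with the value and its forward gradient
    value, jvp = fwAD.unpack_dual(dual_output)

# For sin, the forward gradient equals cos(primal) * tangent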
Functional higher level API¶
Warning
This API is in beta. Even though the function signatures are very unlikely to change, major improvements to performance are planned before we consider this stable.
This section contains the higher level API for the autograd that builds on the basic API above and allows you to compute jacobians, hessians, etc.
This API works with user-provided functions that take only Tensors as input and return only Tensors. If your function takes other arguments that are not Tensors, or Tensors that don't have requires_grad set, you can use a lambda to capture them. For example, for a function f(input, constant, flag=flag) that takes three inputs (a Tensor for which we want the jacobian, another tensor that should be considered constant, and a boolean flag), you can use it as functional.jacobian(lambda x: f(x, constant, flag=flag), input).
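A runnable sketch of that pattern (the body of f below is made up purely for illustration):

import torch
from torch.autograd import functional

def f(x, constant, flag=True):
    y = x * constant
    return y.exp() if flag else y

input = torch.randn(3)
constant = torch.randn(3)

# Capture the extra arguments with a lambda so only the Tensor of interest is differentiated
jac = functional.jacobian(lambda x: f(x, constant, flag=True), input)
print(jac.shape)  # torch.Size([3, 3])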
functional.jacobian: Compute the Jacobian of a given function.
functional.hessian: Compute the Hessian of a given scalar function.
functional.vjp: Compute the dot product between a vector v and the Jacobian of the given function at the point given by the inputs.
functional.jvp: Compute the dot product between the Jacobian of the given function at the point given by the inputs and a vector v.
functional.vhp: Compute the dot product between a vector v and the Hessian of a given scalar function at the point specified by the inputs.
functional.hvp: Compute the dot product between the scalar function's Hessian and a vector v at a specified point.
Locally disabling gradient computation¶
See Locally disabling gradient computation for more information on the differences between no-grad and inference mode as well as other related mechanisms that may be confused with the two. Also see Locally disabling gradient computation for a list of functions that can be used to locally disable gradients.
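As a brief illustration, a minimal sketch of two such mechanisms, torch.no_grad and torch.inference_mode:

import torch

x = torch.randn(3, requires_grad=True)

# Computation under no_grad is excluded from the autograd graph
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# inference_mode is a stricter, faster mode intended for pure inference
with torch.inference_mode():
    z = x * 2
print(z.requires_grad)  # False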
Default gradient layouts¶
When a non-sparse param receives a non-sparse gradient during torch.autograd.backward() or torch.Tensor.backward(), param.grad is accumulated as follows.

If param.grad is initially None:

1. If param's memory is non-overlapping and dense, .grad is created with strides matching param (thus matching param's layout).
2. Otherwise, .grad is created with rowmajor-contiguous strides.

If param already has a non-sparse .grad attribute:

3. If create_graph=False, backward() accumulates into .grad in-place, which preserves its strides.
4. If create_graph=True, backward() replaces .grad with a new tensor .grad + new grad, which attempts (but does not guarantee) matching the preexisting .grad's strides.

The default behavior (letting .grads be None before the first backward(), such that their layout is created according to 1 or 2, and retained over time according to 3 or 4) is recommended for best performance.
Calls to model.zero_grad() or optimizer.zero_grad() will not affect .grad layouts.

In fact, resetting all .grads to None before each accumulation phase, e.g.:

for _ in range(iterations):
    ...
    for param in model.parameters():
        param.grad = None
    loss.backward()

such that they're recreated according to 1 or 2 every time, is a valid alternative to model.zero_grad() or optimizer.zero_grad() that may improve performance for some networks.
Manual gradient layouts¶
If you need manual control over .grad's strides, assign param.grad = a zeroed tensor with the desired strides before the first backward(), and never reset it to None. 3 guarantees your layout is preserved as long as create_graph=False. 4 indicates your layout is likely preserved even if create_graph=True.
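A minimal sketch of this recipe, assuming for illustration that a transposed stride layout is desired:

import torch

param = torch.nn.Parameter(torch.randn(4, 5))

# Pre-assign a zeroed .grad with the desired (non-default) strides before the first backward()
param.grad = torch.zeros(5, 4).t()     # shape (4, 5), strides (1, 5)

loss = (param * 2).sum()
loss.backward()
print(param.grad.stride())  # (1, 5): layout preserved, since create_graph=False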
In-place operations on Tensors¶
Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.
In-place correctness checks¶
All Tensors keep track of in-place operations applied to them, and if the implementation detects that a tensor was saved for backward in one of the functions but was modified in-place afterwards, an error will be raised once the backward pass is started. This ensures that if you're using in-place functions and not seeing any errors, you can be sure that the computed gradients are correct.
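A minimal sketch of a case that would trip this check (torch.exp saves its output for the backward pass):

import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()     # exp saves its output to compute the backward pass
y.add_(1)       # in-place modification of a tensor saved for backward

# y.sum().backward()  # would raise a RuntimeError about a variable
#                     # modified by an inplace operation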
Variable (deprecated)¶
Warning
The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with requires_grad set to True. Below please find a quick guide on what has changed:

- Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.
- var.data is the same thing as tensor.data.
- Methods such as var.backward(), var.detach(), var.register_hook() now work on tensors with the same method names.
In addition, one can now create tensors with requires_grad=True using factory methods such as torch.randn(), torch.zeros(), torch.ones(), and others, like the following:

autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)
Tensor autograd functions¶
torch.Tensor.grad: This attribute is None by default and becomes a Tensor the first time a call to backward() computes gradients for self.
torch.Tensor.requires_grad: Is True if gradients need to be computed for this Tensor, False otherwise.
torch.Tensor.is_leaf: All Tensors that have requires_grad which is False will be leaf Tensors by convention.
torch.Tensor.backward: Computes the gradient of current tensor wrt graph leaves.
torch.Tensor.detach: Returns a new Tensor, detached from the current graph.
torch.Tensor.detach_: Detaches the Tensor from the graph that created it, making it a leaf.
torch.Tensor.register_hook: Registers a backward hook.
torch.Tensor.register_post_accumulate_grad_hook: Registers a backward hook that runs after grad accumulation.
torch.Tensor.retain_grad: Enables this Tensor to have their grad populated during backward().
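For illustration, a minimal sketch of retain_grad() on a non-leaf tensor:

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2          # y is a non-leaf tensor, so y.grad is not kept by default
y.retain_grad()    # ask autograd to populate y.grad during backward

y.sum().backward()
print(y.grad)      # tensor of ones (gradient of the sum w.r.t. y)
print(x.grad)      # tensor of twos, via the chain rule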
Function¶
- class torch.autograd.Function(*args, **kwargs)[source]¶
Base class to create custom autograd.Function.
To create a custom autograd.Function, subclass this class and implement the forward() and backward() static methods. Then, to use your custom op in the forward pass, call the class method apply. Do not call forward() directly.
To ensure correctness and best performance, make sure you are calling the correct methods on ctx and validating your backward function using torch.autograd.gradcheck().
See Extending torch.autograd for more details on how to use this class.
Examples:
>>> class Exp(Function):
>>>     @staticmethod
>>>     def forward(ctx, i):
>>>         result = i.exp()
>>>         ctx.save_for_backward(result)
>>>         return result
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         result, = ctx.saved_tensors
>>>         return grad_output * result
>>>
>>> # Use it by calling the apply method:
>>> output = Exp.apply(input)
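As noted above, gradcheck() can be used to validate the backward formula; a minimal sketch reusing the Exp example (double-precision input is recommended for the numerical check):

>>> import torch
>>> from torch.autograd import gradcheck
>>> inp = torch.randn(4, dtype=torch.double, requires_grad=True)
>>> gradcheck(Exp.apply, (inp,), eps=1e-6, atol=1e-4)
True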
Function.forward: Define the forward of the custom autograd Function.
Function.backward: Define a formula for differentiating the operation with backward mode automatic differentiation.