Introduction to Autograd
Autograd is PyTorch's automatic differentiation engine and a key feature behind its deep learning capabilities. It computes gradients automatically, which is essential for training neural networks with backpropagation.
Autograd records the operations performed on tensors in a computation graph, which it then traverses to compute gradients. This lets you focus on building and training models without deriving gradients by hand.
Key Concepts of Autograd
- Computation Graph: A dynamic graph that records the sequence of operations applied to tensors.
- Gradients: Derivatives of a tensor with respect to another tensor, typically used in optimization algorithms to minimize loss functions.
- Backward Pass: The process of computing gradients by traversing the computation graph in reverse.
- Basic Example of Autograd:
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # tell autograd to track operations on x
y = x + 2        # elementwise addition, recorded in the graph
z = y * y * 3    # elementwise multiplication, recorded in the graph
out = z.mean()   # reduce to a scalar output
print("Tensor x:", x)
print("Tensor y:", y)
print("Tensor z:", z)
print("Output:", out)
Computing Gradients
Computing gradients is a fundamental operation in training neural networks. In PyTorch, you compute the gradients of a computation by calling the backward() method on its final result. This method calculates the gradients of the output tensor with respect to the input tensors that have requires_grad=True.
Basic Gradient Computation
- Example of Backward Pass:
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()  # Computes the gradients of out with respect to x
print("Gradient of x:", x.grad)  # tensor([6., 8., 10.])
In this example, out.backward() computes the gradient of out with respect to x and stores it in x.grad.
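For this computation the gradient can also be derived by hand: out = mean(3 * (x + 2)**2), so d(out)/dx_i = 3 * 2 * (x_i + 2) / 3 = 2 * (x_i + 2), which gives [6., 8., 10.] for x = [1., 2., 3.]. The following is a minimal sketch (reusing the computation above, written in a single line) that compares autograd's result with the analytic gradient:
- Gradient Check Example (sketch):
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = (3 * (x + 2) ** 2).mean()  # same computation as above, in one line
out.backward()
expected = 2 * (x.detach() + 2)  # analytic gradient: d(out)/dx_i = 2 * (x_i + 2)
print("Autograd gradient:", x.grad)                           # tensor([ 6.,  8., 10.])
print("Matches analytic:", torch.allclose(x.grad, expected))  # True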
Understanding the Computation Graph
The computation graph is built dynamically during the forward pass. Each operation adds a node to the graph, and each tensor you create directly with requires_grad=True becomes a leaf node. When you call backward(), PyTorch traverses this graph in reverse to compute the gradients.
- Graph Visualization (Conceptual):
  - Input Tensor: x
  - Operation: y = x + 2
  - Operation: z = y * y * 3
  - Operation: out = z.mean()
- Backward Pass:
  - Compute the gradient of out with respect to z.
  - Compute the gradient of z with respect to y.
  - Compute the gradient of y with respect to x.
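These recorded operations can be inspected directly: every tensor produced by an operation on gradient-tracking tensors exposes a grad_fn attribute referencing the backward node that created it, while leaf tensors report grad_fn as None. A minimal inspection sketch, reusing the tensors from the example above:
- Graph Inspection Example (sketch):
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
print(x.is_leaf, x.grad_fn)  # True None -> leaf tensor created by the user
print(y.grad_fn)             # <AddBackward0 ...>  node recorded for y = x + 2
print(z.grad_fn)             # <MulBackward0 ...>  node recorded for z = y * y * 3
print(out.grad_fn)           # <MeanBackward0 ...> node recorded for out = z.mean()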
Retaining Graph for Multiple Backward Passes
By default, the computation graph is freed after the first backward pass to save memory. If you need to perform multiple backward passes, you can retain the graph by passing retain_graph=True to backward().
- Retain Graph Example:
out.backward(retain_graph=True)
out.backward() # Second backward pass without an error
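Note that gradients accumulate in x.grad across backward passes, so the second call adds to the result of the first; call x.grad.zero_() between passes if accumulation is not intended. A self-contained sketch of this behavior (assuming the same computation as in the earlier examples):
- Full Retain Graph Example (sketch):
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward(retain_graph=True)  # keep the graph alive for another pass
print(x.grad)                    # tensor([ 6.,  8., 10.])
out.backward()                   # second pass succeeds; gradients accumulate
print(x.grad)                    # tensor([12., 16., 20.])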