Part 18: PyTorch, Gradient Calculation with Non-Scalar Outputs

by digitaltech2.com

When dealing with non-scalar outputs (tensors with more than one element), you need to pass a gradient argument to the backward() method. This argument must be a tensor of the same shape as the output tensor; it supplies the upstream gradient for each element of the output.
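
To see why this is required, note that calling backward() on a non-scalar tensor without a gradient argument raises a RuntimeError. A minimal sketch (the exact error message may differ between PyTorch versions):

  • Missing Gradient Argument (sketch):
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
z = (x * 2) ** 2  # non-scalar output

try:
    z.backward()  # no gradient argument supplied for a non-scalar output
except RuntimeError as e:
    # PyTorch cannot implicitly create the gradient for a non-scalar output
    print("backward() failed:", e)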

Example with Non-Scalar Outputs
  • Non-Scalar Output Example:
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = x * 2
z = y ** 2

# Define a gradient tensor for the output
grad_tensor = torch.tensor([[1.0, 1.0], [1.0, 1.0]])

z.backward(grad_tensor)
print("Gradient of x:", x.grad)

In this example, z is a non-scalar tensor. To compute gradients with respect to x, we pass a gradient tensor (grad_tensor) of the same shape as z to backward(). Autograd then computes the vector-Jacobian product; with an all-ones grad_tensor this is the same as computing the gradient of z.sum().
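
As a quick sanity check (a sketch, not part of the original example): here z = (2x)^2 = 4x^2, so the elementwise derivative is dz/dx = 8x, and with an all-ones gradient tensor x.grad should equal 8 * x.

  • Sanity Check (sketch):
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
z = (x * 2) ** 2                      # z = 4 * x**2, so dz/dx = 8 * x elementwise
z.backward(torch.ones_like(z))

expected = 8 * x.detach()             # [[8., 16.], [24., 32.]]
print(torch.allclose(x.grad, expected))  # True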

Example: Summing the Output Tensor

Often, you might want to sum the output tensor before computing gradients. This results in a scalar, simplifying the gradient calculation.

  • Sum Example:
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = x * 2
z = y ** 2

z_sum = z.sum()  # Sum to get a scalar output
z_sum.backward()
print("Gradient of x after sum:", x.grad)

Example: Specifying the Gradient Argument

Specifying the gradient argument can also be useful for more control over the backward pass, especially in custom gradient calculations.

  • Custom Gradient Argument Example:
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = x * 2
z = y ** 2

# Define a custom gradient tensor
grad_tensor = torch.tensor([[0.1, 0.2], [0.3, 0.4]])

z.backward(grad_tensor)
print("Gradient of x with custom gradient argument:", x.grad)

In this example, the custom gradient tensor (grad_tensor) scales the gradients computed during the backward pass.
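
Concretely, each entry of x.grad is the local derivative 8 * x scaled by the matching entry of grad_tensor, so the expected result is [[0.8, 3.2], [7.2, 12.8]]. A quick check (a sketch that continues the example above):

  • Scaled Gradient Check (sketch):
expected = grad_tensor * 8 * x.detach()  # elementwise: [[0.8, 3.2], [7.2, 12.8]]
print(torch.allclose(x.grad, expected))  # True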

Stopping Gradient Tracking

In some situations, you might want to perform operations on tensors without tracking gradients. This is particularly useful during inference, evaluation, or when performing operations that should not affect the computation graph. PyTorch provides context managers and methods to stop gradient tracking.

Using torch.no_grad()

The torch.no_grad() context manager disables gradient tracking for all operations performed inside it: the results of those operations have requires_grad set to False, even when their inputs require gradients. This is useful for inference or evaluation, where it reduces memory usage and improves computational efficiency.

  • No Gradient Tracking Example:
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
with torch.no_grad():
    y = x * 2
    print("Tensor y without gradient tracking:", y)

In this example, y is computed without tracking gradients, so it won’t be part of the computation graph.
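
You can confirm the difference by inspecting the requires_grad flag and grad_fn of results produced inside and outside the context. A short sketch:

  • Inspecting Tracking (sketch):
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y_tracked = x * 2
print(y_tracked.requires_grad, y_tracked.grad_fn)      # True, <MulBackward0 ...>

with torch.no_grad():
    y_untracked = x * 2
print(y_untracked.requires_grad, y_untracked.grad_fn)  # False, None
print(x.requires_grad)  # still True: no_grad affects operations, not existing tensors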

Using detach()

The detach() method creates a new tensor that shares data with the original tensor but is cut off from the computation graph, so it does not require gradients. It is useful when you want to use a tensor's values in operations that should not be tracked by autograd.

  • Detach Example:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.detach()
z = y * 2
print("Tensor y (detached):", y)
print("Tensor z (operation on detached tensor):", z)

In this example, y is a detached version of x, and operations on y do not affect the computation graph of x.
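
Because the detached tensor shares storage with the original, an in-place change to one is visible in the other. A minimal sketch illustrating this (note that in-place edits to shared data can invalidate a later backward() through the original tensor):

  • Shared Storage (sketch):
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.detach()

y[0] = 10.0             # in-place change on the detached tensor
print(x)                # tensor([10., 2., 3.], requires_grad=True) -- data is shared
print(y.requires_grad)  # False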

Combining Both Methods

You can combine torch.no_grad() and detach() when you need both behaviors: a tensor cut off from an existing graph and further operations on it that should not build a new one.

  • Combined Example:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
with torch.no_grad():
    y = x.detach() * 2
    print("Tensor y with combined no_grad and detach:", y)

In this example, y is computed within a torch.no_grad() context and is detached from x.
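
A common place where these tools appear together is when pulling values out of a training computation for logging or other bookkeeping that should stay outside the autograd graph. A sketch under that assumption (the computation of loss here is arbitrary):

  • Logging Outside the Graph (sketch):
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = ((x * 2) ** 2).sum()          # some differentiable computation

with torch.no_grad():
    logged = loss.detach() * 0.5     # kept outside the graph, e.g. a smoothed metric

loss.backward()                      # the original graph is untouched
print("x.grad:", x.grad)             # tensor([ 8., 16., 24.])
print("logged value:", logged.item())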
