Autograd is PyTorch’s automatic differentiation engine. It is a core component for building and training neural networks: it records the operations performed on tensors and automatically computes gradients for them, making backpropagation straightforward to implement.
Basic Usage of Autograd
To use autograd, you enable gradient computation on tensors by setting requires_grad=True.
Enabling Gradients:
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
print("Tensor x:", x)
print("Tensor y:", y)
print("Tensor z:", z)
print("Output:", out)
Computing Gradients
Once you have performed operations on tensors with gradients enabled, you can compute the gradients by calling backward() on the final result (a scalar in this example).
Backward Pass:
out.backward()
print("Gradient of x:", x.grad)
In this example, out.backward() computes the gradient of out with respect to x and stores it in x.grad.
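For this particular computation the result is easy to verify by hand: out = mean(3 * (x + 2)^2), so d(out)/dx_i = 2 * (x_i + 2), which for x = [1, 2, 3] gives [6, 8, 10]. A quick sketch of that check (the comparison is purely illustrative):

expected = 2 * (x.detach() + 2)          # analytic gradient of out w.r.t. x
print(torch.allclose(x.grad, expected))  # True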
Stopping Gradient Tracking
Sometimes you may want to stop tracking gradients for certain computations, such as during evaluation or inference. You can do this with torch.no_grad().
No Gradient Tracking:
with torch.no_grad():
    y = x * 2
print("Tensor y without gradient tracking:", y)
Example: Simple Neural Network with Autograd
Here’s an example of a simple neural network training loop using autograd for backpropagation.
Neural Network Example:
import torch
import torch.nn as nn
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
model = SimpleNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
for epoch in range(100):
    inputs = torch.randn(64, 10)
    targets = torch.randn(64, 1)

    optimizer.zero_grad()               # Zero the gradients
    outputs = model(inputs)             # Forward pass
    loss = criterion(outputs, targets)  # Compute loss
    loss.backward()                     # Backward pass (compute gradients)
    optimizer.step()                    # Update weights

    if epoch % 10 == 0:
        print(f'Epoch [{epoch}/100], Loss: {loss.item():.4f}')
print("Training completed.")
Example: Custom Gradient Function
You can define custom gradient functions by subclassing torch.autograd.Function.
Custom Gradient Example:
class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input
relu = MyReLU.apply
x = torch.randn(5, requires_grad=True)
y = relu(x)
y.sum().backward()
print("Input:", x)
print("ReLU Output:", y)
print("Gradient of x:", x.grad)
In this example, we define a custom ReLU by implementing its forward and backward passes ourselves.
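A custom backward is easy to get subtly wrong, so it is worth checking it numerically. torch.autograd.gradcheck compares your analytic gradients against finite-difference estimates and expects double-precision inputs; a minimal sketch:

from torch.autograd import gradcheck

# Compare MyReLU's backward() against numerical gradients
test_input = torch.randn(5, dtype=torch.double, requires_grad=True)
print(gradcheck(MyReLU.apply, (test_input,), eps=1e-6, atol=1e-4))  # True if they agree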
Common Pitfalls and Best Practices with Tensors
Working with tensors in PyTorch can sometimes lead to errors or inefficient code if not done correctly. Understanding common pitfalls and adhering to best practices can help avoid these issues and improve performance.
Common Pitfalls
Incorrect Shape Handling:
- Problem: Performing operations on tensors with incompatible shapes can lead to runtime errors.
- Solution: Always check tensor shapes before performing operations, for example by printing tensor.shape while debugging.
tensor_a = torch.randn(2, 3)
tensor_b = torch.randn(3, 2)
try:
    result = tensor_a + tensor_b
except RuntimeError as e:
    print(f"Shape mismatch: {e}")
In-Place Operations:
- Problem: In-place operations (operations that modify tensors in place) can lead to unintended side effects, especially when tensors require gradients.
- Solution: Avoid in-place operations if tensors require gradients or if you are unsure of their side effects.
tensor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # floats: only floating-point (or complex) tensors can require grad
# Bad practice: tensor.add_(1)
# Good practice:
tensor = tensor + 1
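Autograd also guards against the most dangerous case itself: an in-place operation applied directly to a leaf tensor that requires gradients raises a RuntimeError. A quick illustration:

leaf = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
try:
    leaf.add_(1)  # in-place op on a leaf that requires grad
except RuntimeError as e:
    print(f"In-place error: {e}")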
Detached Tensors in Computation Graphs:
- Problem: Accidentally detaching tensors from the computation graph can prevent gradient computation.
- Solution: Ensure that tensors are not detached unless explicitly required.
tensor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
detached_tensor = tensor.detach() # Only if necessary
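The consequence is that gradients do not flow through the detached copy: anything computed from it stays outside the graph and never contributes to tensor.grad. A small sketch:

out = (detached_tensor * 2).sum()
print(out.requires_grad)  # False: the detached branch is not part of the graph
print(out.grad_fn)        # None, so out.backward() would raise an error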
Not Using torch.no_grad() for Inference:
- Problem: Failing to disable gradient computation during inference can lead to unnecessary memory usage.
- Solution: Use torch.no_grad() during inference or evaluation to reduce memory consumption.
with torch.no_grad():
    outputs = model(inputs)
print("Inference outputs:", outputs)
Best Practices
Use Device-Agnostic Code:
- Practice: Write code that can run on both CPUs and GPUs to ensure portability and flexibility.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tensor = tensor.to(device)
Utilize DataLoaders:
- Practice: Use PyTorch DataLoaders for efficient data loading, batching, and preprocessing.
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 10))
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for (batch,) in dataloader:
    print(batch.shape)
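For larger datasets, the DataLoader arguments num_workers and pin_memory are the usual throughput knobs; the values below are only illustrative and worth tuning per machine:

dataloader = DataLoader(dataset, batch_size=32, shuffle=True,
                        num_workers=2, pin_memory=torch.cuda.is_available())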
Profile Your Code:
- Practice: Use PyTorch’s profiling tools to identify bottlenecks and optimize performance.
import torch.autograd.profiler as profiler

with profiler.profile(record_shapes=True) as prof:
    outputs = model(inputs)

print(prof.key_averages().table(sort_by="cpu_time_total"))
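On recent PyTorch versions, the newer torch.profiler module exposes the same information with more options (GPU activities, scheduling, trace export). A roughly equivalent sketch:

from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    outputs = model(inputs)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))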
Save and Load Models Efficiently:
- Practice: Save and load models using state_dict to ensure portability and flexibility.
torch.save(model.state_dict(), 'model.pth')
model.load_state_dict(torch.load('model.pth'))
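When the checkpoint may be loaded on a different device than it was saved from, map_location keeps it portable, and switching to eval mode is the usual follow-up before inference. A small sketch reusing the 'model.pth' file from above:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.load_state_dict(torch.load('model.pth', map_location=device))
model.eval()  # puts layers such as dropout and batch norm into inference mode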
Example: Best Practices in a Complete Workflow
Here’s an example demonstrating some best practices in a complete workflow:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
model = SimpleNN().to(device)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
for epoch in range(100):
    for inputs, targets in dataloader:
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

    if epoch % 10 == 0:
        print(f'Epoch [{epoch}/100], Loss: {loss.item():.4f}')
torch.save(model.state_dict(), 'model.pth')
This example includes device-agnostic code, efficient data loading, proper gradient handling, and model saving.
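To round out the workflow, evaluation would reuse the same pieces under torch.no_grad(); a minimal sketch that, for illustration, reuses the training dataloader as a stand-in for a held-out validation loader:

model.eval()
total_loss = 0.0
with torch.no_grad():
    for inputs, targets in dataloader:
        inputs, targets = inputs.to(device), targets.to(device)
        total_loss += criterion(model(inputs), targets).item() * inputs.size(0)
print(f"Mean eval loss: {total_loss / len(dataset):.4f}")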