Optimizers are algorithms that adjust the parameters of a neural network to minimize the loss function. Learning rate schedulers adjust the learning rate during training to improve performance and convergence.
Common Optimizers
PyTorch provides several built-in optimizers that implement various optimization algorithms.
Stochastic Gradient Descent (SGD):
- Description: A simple and widely used optimization algorithm that updates the model parameters along the negative gradient of the loss function, optionally with momentum.
- Usage:
import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
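To make the update mechanics concrete, here is a minimal, self-contained sketch of a single SGD step; the linear model, random tensors, and MSE loss are placeholders chosen for illustration, not taken from the examples in this section.
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model and data, for illustration only
model = nn.Linear(10, 1)
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)
loss_fn = nn.MSELoss()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

optimizer.zero_grad()                   # clear gradients from the previous step
loss = loss_fn(model(inputs), targets)  # forward pass and loss
loss.backward()                         # compute gradients w.r.t. the parameters
optimizer.step()                        # apply the SGD (with momentum) update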
Adam (Adaptive Moment Estimation):
- Description: An optimization algorithm that combines the benefits of AdaGrad and RMSProp, using adaptive learning rates for each parameter.
- Usage:
optimizer = optim.Adam(model.parameters(), lr=0.001)
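Adam also exposes the decay rates of its moment estimates (betas) and a small numerical-stability constant (eps). The sketch below simply spells out what should be the library's default values; adjust them only if you have a reason to.
# betas: exponential decay rates for the first and second moment estimates
# eps: small constant for numerical stability (values shown should match the defaults)
optimizer = optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-8)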
RMSProp:
- Description: An optimization algorithm that adjusts the learning rate based on the moving average of squared gradients.
- Usage:
optimizer = optim.RMSprop(model.parameters(), lr=0.01)
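RMSProp's moving average of squared gradients is controlled by alpha, and a momentum term can optionally be layered on top. In the sketch below, alpha=0.99 should be the library default, and momentum=0.9 is an illustrative choice rather than a recommendation.
# alpha: smoothing constant of the squared-gradient moving average
# momentum: optional and off by default; 0.9 here is purely illustrative
optimizer = optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99, momentum=0.9)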
Example: Using Different Optimizers
- Optimizer Example:
# Using SGD
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Using Adam
optimizer_adam = optim.Adam(model.parameters(), lr=0.001)
# Using RMSProp
optimizer_rmsprop = optim.RMSprop(model.parameters(), lr=0.01)
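If the optimizer choice comes from a configuration value, a small helper like the hypothetical build_optimizer below keeps the rest of the training code unchanged; the function name and the hard-coded momentum are assumptions for this sketch.
import torch.optim as optim

def build_optimizer(name, params, lr):
    # Hypothetical helper that maps a config string to one of the optimizers above
    if name == 'sgd':
        return optim.SGD(params, lr=lr, momentum=0.9)
    if name == 'adam':
        return optim.Adam(params, lr=lr)
    if name == 'rmsprop':
        return optim.RMSprop(params, lr=lr)
    raise ValueError(f'Unknown optimizer: {name}')

optimizer = build_optimizer('adam', model.parameters(), lr=0.001)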
Learning Rate Schedulers
Learning rate schedulers adjust the learning rate during training to improve convergence and performance.
StepLR:
- Description: Reduces the learning rate by a multiplicative factor (gamma) every fixed number of epochs (step_size).
- Usage:
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
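You can inspect the schedule in isolation by stepping the scheduler without real training; the linear model below is a placeholder and optimizer.step() merely stands in for a training epoch.
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    optimizer.step()    # stands in for one training epoch
    scheduler.step()    # every 30 epochs the learning rate is multiplied by 0.1
    print(epoch, scheduler.get_last_lr()[0])  # drops from 0.01 to 0.001 to 0.0001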
ExponentialLR:
- Description: Decays the learning rate exponentially, multiplying it by gamma after every epoch.
- Usage:
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
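With ExponentialLR, the learning rate after n calls to scheduler.step() is the initial rate times gamma**n. A compact, self-contained sketch (placeholder model, training step elided):
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
    optimizer.step()   # stands in for one training epoch
    scheduler.step()   # learning rate after n epochs: 0.01 * 0.9**n

print(scheduler.get_last_lr()[0])  # roughly 0.01 * 0.9**10 ≈ 0.0035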
ReduceLROnPlateau:
- Description: Reduces the learning rate when a monitored metric (e.g., validation loss) has stopped improving.
- Usage:
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')
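In practice you usually also set factor (the multiplier applied when the metric plateaus) and patience (how many epochs without improvement to tolerate before reducing). The values below are illustrative choices, not necessarily the defaults.
# mode='min': the monitored metric (e.g., validation loss) is expected to decrease
# factor=0.5, patience=5: illustrative values; tune them for your task
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=5)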
Example: Combining Optimizers and Schedulers
- Combined Example:
import torch.optim as optim

# SimpleNN, loss_fn, and dataloader are assumed to be defined as in the earlier examples
model = SimpleNN()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    for batch_inputs, batch_targets in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_inputs)
        loss = loss_fn(outputs, batch_targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # update the learning rate once per epoch, after the optimizer steps
    if epoch % 10 == 0:
        print(f'Epoch [{epoch}/100], Loss: {loss.item():.4f}, LR: {scheduler.get_last_lr()[0]}')
Example: ReduceLROnPlateau Scheduler
- ReduceLROnPlateau Example:
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min')

for epoch in range(100):
    for batch_inputs, batch_targets in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_inputs)
        loss = loss_fn(outputs, batch_targets)
        loss.backward()
        optimizer.step()
    # Compute a validation loss for demonstration (val_inputs and val_targets are assumed to be defined)
    val_loss = loss_fn(model(val_inputs), val_targets)
    scheduler.step(val_loss)  # pass the monitored metric so the scheduler can detect a plateau
    if epoch % 10 == 0:
        print(f'Epoch [{epoch}/100], Loss: {loss.item():.4f}, Validation Loss: {val_loss.item():.4f}, LR: {optimizer.param_groups[0]["lr"]}')
This example demonstrates how to combine optimizers and learning rate schedulers to train a neural network more effectively.