PyTorch 101
As we've discussed in the course, PyTorch executes eagerly, meaning that we write ordinary Python programs: each operation runs as soon as it is called. For the tutorial below, please start a new Jupyter notebook and follow along in it.
Tensors and Variables
PyTorch Tensors are similar in behaviour to NumPy’s arrays.
>>> import torch
>>> a = torch.Tensor([[1,2],[3,4]])
>>> print(a)
1 2
3 4
[torch.FloatTensor of size 2x2]
>>> print(a**2)
1 4
9 16
[torch.FloatTensor of size 2x2]
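For readers coming from NumPy, a few more of the parallels are sketched below. This sketch uses torch.tensor, the constructor recommended in recent PyTorch versions (an assumption about your installed version; torch.Tensor as above also works), and assumes NumPy is installed alongside PyTorch.

```python
import numpy as np
import torch

a = torch.tensor([[1., 2.], [3., 4.]])

# Elementwise operations, as in NumPy
squared = a ** 2            # [[1, 4], [9, 16]]
doubled = a * 2             # [[2, 4], [6, 8]]

# Matrix multiplication uses @ (or torch.matmul), not *
product = a @ a             # [[7, 10], [15, 22]]

# Reductions and shape queries
total = a.sum()             # 10
col_means = a.mean(dim=0)   # [2, 3]
shape = a.shape             # torch.Size([2, 2])

# Converting to and from NumPy; for CPU tensors the two share memory
arr = a.numpy()
b = torch.from_numpy(np.array([[5., 6.], [7., 8.]]))  # dtype float64
```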
PyTorch Variables allow you to wrap a Tensor and record operations performed on it. This allows you to perform automatic differentiation.
>>> from torch.autograd import Variable
>>> a = Variable(torch.Tensor([[1,2],[3,4]]), requires_grad=True)
>>> print(a)
Variable containing:
1 2
3 4
[torch.FloatTensor of size 2x2]
>>> y = torch.sum(a**2) # 1 + 4 + 9 + 16
>>> print(y)
Variable containing:
30
[torch.FloatTensor of size 1]
>>> y.backward() # compute gradients of y wrt a
>>> print(a.grad) # print dy/da_ij = 2*a_ij for a_11, a_12, a_21, a_22
Variable containing:
2 4
6 8
[torch.FloatTensor of size 2x2]
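Since PyTorch 0.4, Variable has been merged into Tensor, so in current versions the same computation can be written by setting requires_grad=True on the tensor itself (Variable(...) still works but simply returns a tensor). The sketch below repeats the example in that style and checks the result against the analytic gradient 2a.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
y = (a ** 2).sum()          # 1 + 4 + 9 + 16 = 30
y.backward()                # populate a.grad with dy/da

# The analytic gradient of sum(a_ij^2) is 2 * a_ij
expected = 2 * a.detach()
print(y.item())             # 30.0
print(a.grad)
```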
Core Training Step
Let’s begin with a look at the heart of our training algorithm. The five lines below pass a batch of inputs through the model, calculate the loss, perform backpropagation and update the parameters. (We won't run this code yet; we'll create a model instance below.)
output_batch = model(train_batch) # compute model output
loss = loss_fn(output_batch, labels_batch) # calculate loss
optimizer.zero_grad() # clear previous gradients
loss.backward() # compute gradients of all variables wrt loss
optimizer.step() # perform updates using calculated gradients
Each of the variables train_batch, labels_batch, output_batch and loss is a PyTorch Variable, which allows derivatives to be calculated automatically.
All the other code that we write is built around this: the exact specification of the model, how to fetch a batch of data and labels, the computation of the loss and the details of the optimizer. In this post, we’ll cover how to write a simple model in PyTorch, compute the loss and define an optimizer. The subsequent posts each cover one way of fetching data: one for image data and another for text data.
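To see the five core lines in context, here is a minimal, self-contained loop on synthetic data. The linear model, the made-up regression data (y = 3x + 1 plus noise), the MSE loss and the hyperparameters are all illustrative assumptions; the model, loss function and optimizer are covered properly in the sections that follow.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical regression data: y = 3x + 1 plus a little noise
train_batch = torch.randn(64, 1)
labels_batch = 3 * train_batch + 1 + 0.01 * torch.randn(64, 1)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    output_batch = model(train_batch)            # compute model output
    loss = loss_fn(output_batch, labels_batch)   # calculate loss
    optimizer.zero_grad()                        # clear previous gradients
    loss.backward()                              # compute gradients wrt loss
    optimizer.step()                             # update parameters

print(model.weight.item(), model.bias.item())   # should approach 3 and 1
```

The same batch is reused every step here for brevity; a real loop would draw a fresh batch from a data loader each iteration.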
Models in PyTorch
A model can be defined in PyTorch by subclassing the torch.nn.Module class. The model is defined in two steps. We first specify the parameters of the model, and then outline how they are applied to the inputs. For operations that do not involve trainable parameters (activation functions such as ReLU, operations like max pooling), we generally use the torch.nn.functional module.
import torch.nn as nn
import torch.nn.functional as F
class TwoLayerNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        D_in: input dimension
        H: dimension of hidden layer
        D_out: output dimension
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)
    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must
        return a Variable of output data. We can use Modules defined in the
        constructor as well as arbitrary operators on Variables.
        """
        h_relu = F.relu(self.linear1(x))
        y_pred = self.linear2(h_relu)
        return y_pred
The __init__ function initialises the two linear layers of the model. PyTorch automatically initialises the weights and biases of each nn.Linear layer, so you do not have to do it yourself. In the forward function, we apply the first linear layer, then the ReLU activation, and then the second linear layer. The module assumes that the first dimension of x is the batch size: if the input to the network is a vector of dimension 100 and the batch size is 32, then x has shape 32 x 100. Let’s see an example of how to define a model and compute a forward pass:
#N is batch size; D_in is input dimension;
#H is the dimension of the hidden layer; D_out is output dimension.
N, D_in, H, D_out = 32, 100, 50, 10
#Create a random Tensor to hold the inputs, and wrap it in a Variable
x = Variable(torch.randn(N, D_in)) # dim: 32 x 100
#Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)
#Forward pass: Compute predicted y by passing x to the model
y_pred = model(x) # dim: 32 x 10
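Because nn.Linear layers assigned as attributes in __init__ are registered automatically, the model's parameters can be listed without any extra bookkeeping. The sketch below redefines TwoLayerNet (so the block runs on its own) with the same sizes as above, prints each registered parameter, and shows that the batch dimension is free: the same model accepts a batch of any size N.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(F.relu(self.linear1(x)))

D_in, H, D_out = 100, 50, 10
model = TwoLayerNet(D_in, H, D_out)

# nn.Linear stores its weight as (out_features, in_features)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# linear1.weight (50, 100)
# linear1.bias (50,)
# linear2.weight (10, 50)
# linear2.bias (10,)

num_params = sum(p.numel() for p in model.parameters())
print(num_params)  # 100*50 + 50 + 50*10 + 10 = 5560

# The first dimension is the batch size, so any N works
out_small = model(torch.randn(1, D_in))    # shape (1, 10)
out_large = model(torch.randn(256, D_in))  # shape (256, 10)
```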
Loss Function
PyTorch comes with many standard loss functions available for you to use in the torch.nn module. Here’s a simple example of how to calculate cross-entropy loss. Let’s say our model solves a multi-class classification problem with C labels. Then for a batch of size N, out is a PyTorch Variable of dimension N x C that is obtained by passing an input batch through the model. We also have a target Variable of size N, where each element is the class for that example, i.e. a label in [0, ..., C-1]. You can define the loss function and compute the loss as follows:
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(out, target)
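A concrete illustration with made-up numbers, N = 3 examples and C = 4 classes. Note that out holds raw, unnormalised scores (logits): nn.CrossEntropyLoss applies log-softmax internally, so the model should not apply softmax itself. The target tensor holds integer class indices (dtype torch.long).

```python
import torch
import torch.nn as nn

N, C = 3, 4
out = torch.tensor([[2.0, 0.5, 0.1, -1.0],
                    [0.2, 1.5, 0.3,  0.0],
                    [0.0, 0.0, 0.0,  3.0]])   # shape (N, C): raw scores
target = torch.tensor([0, 1, 3])              # shape (N,): class indices in [0, C-1]

loss_fn = nn.CrossEntropyLoss()               # averages over the batch by default
loss = loss_fn(out, target)
print(loss.item())
```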
PyTorch makes it very easy to extend this and write your own custom loss function. We can write our own Cross Entropy Loss function as below (note the NumPy-esque syntax):
def myCrossEntropyLoss(outputs, labels):
    batch_size = outputs.size()[0]  # batch_size
    outputs = F.log_softmax(outputs, dim=1)  # compute the log of softmax values
    outputs = outputs[torch.arange(batch_size), labels]  # pick the values corresponding to the labels
    return -torch.sum(outputs) / batch_size  # average negative log-likelihood over the batch
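A quick sanity check: on random logits and labels, a hand-written loss that divides by the batch size should agree with nn.CrossEntropyLoss up to floating-point error, since both average over the batch. The function is repeated here so the block runs on its own; the batch size of 8 and the 5 classes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def myCrossEntropyLoss(outputs, labels):
    batch_size = outputs.size(0)
    outputs = F.log_softmax(outputs, dim=1)              # log of softmax values
    outputs = outputs[torch.arange(batch_size), labels]  # log-prob of the true class
    return -torch.sum(outputs) / batch_size              # mean negative log-likelihood

torch.manual_seed(0)
logits = torch.randn(8, 5)              # batch of 8, 5 classes
labels = torch.randint(0, 5, (8,))      # integer labels in [0, 4]

mine = myCrossEntropyLoss(logits, labels)
builtin = nn.CrossEntropyLoss()(logits, labels)
print(torch.allclose(mine, builtin))    # True
```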
Optimizer
The torch.optim package provides an easy-to-use interface for common optimization algorithms. Defining your optimizer is really as simple as:
# pick an SGD optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# or pick Adam
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
You pass in the parameters of the model that need to be updated at every iteration. You can also specify more complex options, such as per-layer or even per-parameter learning rates.
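Per-layer learning rates are set by passing a list of parameter groups (dicts) instead of a flat iterator of parameters; options given inside a group override the defaults passed as keyword arguments. A sketch assuming a small two-layer model in which only the second linear layer gets a larger learning rate; the layer sizes and rates are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))

optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters()},             # uses the default lr below
        {"params": model[2].parameters(), "lr": 0.1},  # overrides lr for this layer
    ],
    lr=0.01,
    momentum=0.9,  # shared by both groups
)

for group in optimizer.param_groups:
    print(group["lr"], group["momentum"])
# 0.01 0.9
# 0.1 0.9
```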
Once gradients have been computed using loss.backward(), calling optimizer.step() updates the parameters as defined by the optimization algorithm.
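For plain SGD without momentum, the update performed by optimizer.step() is p <- p - lr * p.grad for every parameter p. The sketch below checks this on a single hypothetical parameter tensor w; other optimizers such as Adam apply more elaborate update rules.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()        # d(loss)/dw = 2w = [2.0, -4.0]
optimizer.zero_grad()
loss.backward()
optimizer.step()             # w <- w - 0.1 * [2.0, -4.0]

print(w.detach())            # w is now approximately [0.8, -1.6]
```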
Training vs Evaluation
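Before training the model, it is imperative to call model.train(). Likewise, you must call model.eval() before testing the model. These calls put layers whose behaviour differs between training and testing, such as dropout and batch normalization, into the appropriate mode: in evaluation mode, dropout is disabled and batch normalization uses its running statistics instead of per-batch statistics.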
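A small illustration of why the mode matters, using a standalone nn.Dropout layer (p = 0.5 is an arbitrary choice). model.train() and model.eval() set the same training flag on every submodule. In training mode, dropout zeroes roughly half of the entries and scales the survivors by 1/(1 - p); in evaluation mode it passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()                   # what model.train() does for every submodule
y_train = drop(x)              # entries are either 0.0 or 2.0 (= 1 / (1 - 0.5))

drop.eval()                    # what model.eval() does for every submodule
y_eval = drop(x)               # identical to x

print(y_train)
print(torch.equal(y_eval, x))  # True
```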
Painless Debugging
With its clean and minimal design, PyTorch makes debugging a breeze. You can place breakpoints using pdb.set_trace() at any line in your code. You can then execute further computations, examine the PyTorch Tensors/Variables and pinpoint the root cause of the error.
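Because forward is ordinary Python, a breakpoint can even be made conditional. The sketch below is a hypothetical variant of TwoLayerNet (DebuggableNet is an illustrative name) that only stops in pdb if the hidden activations contain NaNs; on well-behaved inputs it runs straight through.

```python
import pdb
import torch
import torch.nn as nn
import torch.nn.functional as F

class DebuggableNet(nn.Module):
    """Hypothetical variant of TwoLayerNet with a conditional breakpoint."""

    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = F.relu(self.linear1(x))
        # Eager execution means h_relu is an ordinary tensor at this point;
        # pause in the debugger only if something has gone wrong.
        if torch.isnan(h_relu).any():
            pdb.set_trace()  # inspect x, h_relu, self.linear1.weight, ...
        return self.linear2(h_relu)

model = DebuggableNet(100, 50, 10)
y_pred = model(torch.randn(32, 100))  # finite input, so no breakpoint is hit
```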