Inference and Validation

Inference, a term borrowed from statistics, is the process of using a trained model to make predictions. However, neural networks have a tendency to perform too well on the training data and fail to generalize to data they haven’t seen before. This is called overfitting, and it impairs inference performance.

To test for overfitting while training, we measure performance on data that is not in the training set, called the validation set. We reduce overfitting through regularization, such as dropout, while monitoring validation performance during training.

Let’s start by loading the dataset through torchvision. This time we’ll be taking advantage of the test set which you can get by setting train=False here:

testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', 
                                download=True, 
                                train=False, 
                                transform=transform)

The test set contains images just like the training set. Typically you’ll see 10-20% of the original dataset held out for testing and validation with the rest being used for training.
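When a dataset does not ship with a separate test split, a hold-out set can be carved out of the training data. The following is a minimal sketch of that idea using torch.utils.data.random_split; the synthetic tensors, the 20% fraction, and the variable names are assumptions for illustration only, and the rest of this section uses the test split that torchvision provides.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for a labelled image dataset: 1,000 fake 28x28 images.
features = torch.randn(1000, 1, 28, 28)
targets = torch.randint(0, 10, (1000,))
full_dataset = TensorDataset(features, targets)

# Hold out 20% for validation (an assumed fraction within the 10-20% range).
n_val = len(full_dataset) // 5
n_train = len(full_dataset) - n_val

# A seeded generator makes the random split reproducible.
train_subset, val_subset = random_split(
    full_dataset,
    [n_train, n_val],
    generator=torch.Generator().manual_seed(0),
)

print(len(train_subset), len(val_subset))  # 800 200
```

Each subset can be wrapped in its own DataLoader in the same way as trainset and testset.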

import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', 
                                 download=True, 
                                 train=True, 
                                 transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, 
                                          batch_size=64, 
                                          shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', 
                                download=True, 
                                train=False, 
                                transform=transform)
testloader = torch.utils.data.DataLoader(testset, 
                                         batch_size=64, 
                                         shuffle=True)

Use the same model as in the Fashion-MNIST classification example.

from torch import nn

model = nn.Sequential(nn.Linear(784, 256),
                      nn.ReLU(),
                      nn.Linear(256, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

The goal of validation is to measure the model’s performance on data that isn’t part of the training set. How performance is defined is up to the developer. Typically it is just accuracy, the percentage of examples the network classified correctly. Other options are precision, recall, and top-5 error rate. This example focuses on accuracy.
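To make these metrics concrete, here is a small plain-Python sketch that computes accuracy, plus precision and recall for one chosen class. The helper names and the toy label lists are hypothetical and are not used elsewhere in this section.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels for a 3-class problem (made up for illustration).
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

print(accuracy(y_true, y_pred))  # 0.625
precision, recall = precision_recall(y_true, y_pred, positive=1)
print(precision, recall)
```

Here 5 of 8 predictions are correct, and class 1 has two true positives, one false positive, and one false negative, so both its precision and recall are 2/3.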

images, labels = next(iter(testloader))
# Flatten MNIST images into a 784 long vector
images = images.view(images.shape[0], -1)

As expected, there are 10 class probabilities for each of the 64 examples.

# Get the class probabilities
ps = torch.exp(model(images))
print(ps.shape)
torch.Size([64, 10])

With the probabilities, we can get the most likely class using the ps.topk method. This returns the $k$ highest values.

Since we just want the most likely class for each of the 64 examples, we can use ps.topk(1). This returns a tuple of the top-$k$ values and the top-$k$ indices. If the highest value is the fifth element, we’ll get back 4 as the index.

As shown, there is a prediction for each of the 64 examples.

top_p, top_class = ps.topk(1, dim=1)
print(str(top_class.shape) + '\n')
print(top_class[:5,:])
torch.Size([64, 1])

tensor([[9],
        [9],
        [9],
        [9],
        [9]])
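To see what topk returns without involving the model, the following sketch applies it to a tiny hand-written probability tensor. The values are made up for illustration.

```python
import torch

# Two examples, three classes (made-up probabilities).
ps = torch.tensor([[0.1, 0.2, 0.7],
                   [0.5, 0.3, 0.2]])

# k=1 along dim=1 gives the largest value and its column index for each row.
top_p, top_class = ps.topk(1, dim=1)

print(top_p)               # the largest probability in each row
print(top_class.tolist())  # [[2], [0]]
print(tuple(top_class.shape))  # (2, 1)
```

Note that the result keeps a trailing dimension of size 1, which matters when comparing against the labels.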

Now we can check whether the predicted classes match the labels. This is simple to do by comparing top_class and labels, but we have to be careful with the shapes. Here top_class is a 2D tensor with shape (64, 1), while labels is 1D with shape (64,).

To get the equality to work out the way we want, top_class and labels must have the same shape.

If we do

equals = top_class == labels

equals will have shape (64, 64). Broadcasting compares the single element in each row of top_class with every element of labels, which returns 64 True/False values for each row. Reshaping labels to match top_class gives the element-wise comparison we want:

equals = top_class == labels.view(*top_class.shape)
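The shape issue is easier to see on a batch of four. This is a small sketch with made-up predictions and labels; only the shapes and the comparison pattern mirror the code above.

```python
import torch

top_class = torch.tensor([[1], [0], [2], [2]])  # shape (4, 1)
labels = torch.tensor([1, 2, 2, 0])             # shape (4,)

# Broadcasting compares every prediction with every label.
broadcast_equals = top_class == labels
print(tuple(broadcast_equals.shape))  # (4, 4)

# Reshaping labels to (4, 1) gives one comparison per example.
equals = top_class == labels.view(*top_class.shape)
print(tuple(equals.shape))         # (4, 1)
print(equals.squeeze(1).tolist())  # [True, False, True, False]
```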

Now we need to calculate the percentage of correct predictions. equals has binary values, either 0 or 1. This means that if we sum all the values and divide by the number of values, we get the fraction of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to torch.mean. If only it were that simple. If you try torch.mean(equals), you’ll get an error:

RuntimeError: mean is not implemented for type torch.ByteTensor

This happens because equals has type torch.ByteTensor (in recent PyTorch releases it is a boolean tensor and the error message is worded differently), and torch.mean isn’t implemented for integer or boolean tensors. So we need to convert equals to a float tensor. Note that torch.mean returns a scalar tensor; to get the actual value as a Python float, call accuracy.item().

accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')
Accuracy: 6.25%
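With a batch of 64, an accuracy of 6.25% corresponds to exactly 4 correct predictions, since 4/64 = 0.0625. The sketch below rebuilds such a comparison tensor by hand and uses .float(), which for CPU tensors is a shorter equivalent of .type(torch.FloatTensor). Which four entries are True is an arbitrary assumption made for illustration.

```python
import torch

# Hypothetical comparison result: 4 correct predictions out of 64.
equals = torch.zeros(64, 1, dtype=torch.bool)
equals[:4] = True

# Convert to float before averaging; mean is not defined for boolean tensors.
accuracy = equals.float().mean()

print(accuracy.item() * 100)  # 6.25
```

Because a single batch can only produce accuracies in steps of 1/64, one batch gives a coarse estimate; averaging over the whole test set is more reliable.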

The network is untrained so it’s making random guesses and we should see an accuracy around 10%. Now let’s train our network and include our validation pass so we can measure how well the network is performing on the test set. Since we’re not updating our parameters in the validation pass, we can speed up our code by turning off gradients using torch.no_grad():

# turn off gradients
with torch.no_grad():
    # validation pass here
    for images, labels in testloader:
        ...
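As a quick check of what torch.no_grad() does, the following sketch runs the same forward pass with and without the context manager. The single nn.Linear layer is a small stand-in for the real model and is an assumption for illustration.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)  # small stand-in for the full model
x = torch.randn(3, 4)

# Normal forward pass: autograd records the operations.
out_tracked = layer(x)

# Inside no_grad, the same computation is done without building a graph.
with torch.no_grad():
    out_untracked = layer(x)

print(out_tracked.requires_grad)    # True
print(out_untracked.requires_grad)  # False
```

The outputs are numerically the same; only the bookkeeping needed for backpropagation is skipped.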

Training

Print Total Accuracy for Each Validation Loop

The following is a helper function to show losses and accuracies over the course of training.

%matplotlib inline

import matplotlib.pyplot as plt

def plot_losses_and_accuracies(train_losses, test_losses, accuracies, title):
    fig, (ax1, ax2) = plt.subplots(figsize=(12,6), ncols=2)
    
    plt.suptitle(title)
    
    ax1.set_title('Losses')
    ax1.set_ylim(0,1)
    ax1.plot(train_losses, label='Training loss')
    ax1.plot(test_losses, label='Validation loss')
    ax1.legend(frameon=False)
    
    ax2.set_title('Accuracy')
    ax2.set_ylim(0,1)
    ax2.plot(accuracies, label='Accuracy')

The following is a helper function that performs the actual training.

from torch import optim

def train_model(model, optimizer_str):
    criterion = nn.NLLLoss()
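    if optimizer_str == 'SGD':
        optimizer = optim.SGD(model.parameters(),
                              lr=0.003)
    elif optimizer_str == 'Adam':
        optimizer = optim.Adam(model.parameters(),
                               lr=0.003)
    else:
        # Fail early instead of hitting a NameError on `optimizer` later
        raise ValueError(f"Unsupported optimizer: {optimizer_str!r}")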

    epochs = 50

    print("\tepoch\ttrain_loss\ttest_loss\taccuracy")
    train_losses, test_losses, accuracies = [], [], []
    for epoch in range(epochs):
        running_loss, test_loss, accuracy = 0, 0, 0
        for images, labels in trainloader:
            # Flatten MNIST images into a 784 long vector
            images = images.view(images.shape[0], -1)

            optimizer.zero_grad()

            log_ps = model(images)
            loss = criterion(log_ps, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        # Turn off gradients for validation, saves memory and computations
        with torch.no_grad():
            # Turn off dropout for validation
            model.eval()
            
            for images, labels in testloader:
                # Flatten MNIST images into a 784 long vector
                images = images.view(images.shape[0], -1)

                log_ps = model(images)
                # .item() accumulates a Python float rather than a tensor
                test_loss += criterion(log_ps, labels).item()

                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        
        # Turn dropout back on for training
        model.train()

        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

        accuracy = accuracy.item()/len(testloader)
        accuracies.append(accuracy)

        if epoch == 0 or (((epoch+1) % 10) == 0):
            print("\t{:5}\t{:10.3}\t{:9.3}\t{:8.3}".format(epoch+1,
                                                           train_losses[-1],
                                                           test_losses[-1],
                                                           accuracies[-1]))

    plot_losses_and_accuracies(train_losses, 
                               test_losses, 
                               accuracies, 
                               optimizer_str)

Adam Optimizer

from torch import nn

model_1 = nn.Sequential(nn.Linear(784, 256),
                        nn.ReLU(),
                        nn.Linear(256, 128),
                        nn.ReLU(),
                        nn.Linear(128, 64),
                        nn.ReLU(),
                        nn.Linear(64, 10),
                        nn.LogSoftmax(dim=1))

train_model(model_1, optimizer_str = 'Adam')
	epoch	train_loss	test_loss	accuracy
	    1	     0.516	    0.462	   0.833
	   10	     0.271	    0.364	   0.876
	   20	     0.219	     0.42	   0.877
	   30	     0.183	    0.408	   0.885
	   40	     0.162	    0.488	   0.886
	   50	     0.156	    0.515	   0.882

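[Figure: "Adam" run. Left panel "Losses": training and validation loss per epoch. Right panel "Accuracy": validation accuracy per epoch.]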

SGD Optimizer

model_2 = nn.Sequential(nn.Linear(784, 256),
                        nn.ReLU(),
                        nn.Linear(256, 128),
                        nn.ReLU(),
                        nn.Linear(128, 64),
                        nn.ReLU(),
                        nn.Linear(64, 10),
                        nn.LogSoftmax(dim=1))

train_model(model_2, optimizer_str = 'SGD')
	epoch	train_loss	test_loss	accuracy
	    1	      2.09	     1.64	    0.43
	   10	     0.485	    0.509	   0.814
	   20	       0.4	    0.445	   0.839
	   30	     0.355	    0.403	   0.854
	   40	     0.322	    0.382	   0.863
	   50	     0.296	    0.358	   0.872

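[Figure: "SGD" run. Left panel "Losses": training and validation loss per epoch. Right panel "Accuracy": validation accuracy per epoch.]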

Overfitting

If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

The network learns the training set better and better, resulting in lower training losses. However, it starts having problems generalizing to data outside the training set leading to the validation loss increasing. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive to get the lowest validation loss possible.

One option is to use the version of the model with the lowest validation loss, here the one at around 8-10 training epochs. This strategy is called early stopping. In practice, you’d save the model frequently as you’re training, then later choose the saved model with the lowest validation loss.
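The selection step can be sketched in plain Python using the validation losses printed for the Adam run above. Only the printed epochs are available here, so the exact best epoch is unknown; the early_stopping_index helper and its patience parameter are hypothetical and show one common way to decide when to stop.

```python
# Validation losses printed for the Adam run (only these epochs were shown).
val_loss_by_epoch = {1: 0.462, 10: 0.364, 20: 0.42,
                     30: 0.408, 40: 0.488, 50: 0.515}

# Pick the checkpoint with the lowest validation loss.
best_epoch = min(val_loss_by_epoch, key=val_loss_by_epoch.get)
print(best_epoch)  # 10


def early_stopping_index(val_losses, patience):
    """Return the index of the best loss seen before training would stop.

    Training stops once `patience` consecutive checks fail to improve
    on the best validation loss so far.
    """
    best_idx, best_loss, bad_checks = 0, float('inf'), 0
    for idx, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, bad_checks = idx, loss, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break
    return best_idx


losses = list(val_loss_by_epoch.values())
print(early_stopping_index(losses, patience=2))  # 1, i.e. the epoch-10 entry
```

In a real training loop, the model's state would be saved each time the validation loss improves, so the best checkpoint can be restored afterwards.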

The most common method to reduce overfitting (outside of early stopping) is dropout, where we randomly drop (zero out) a layer’s input units during training. This forces the network to share information between weights, increasing its ability to generalize to new data. Adding dropout in PyTorch is straightforward using the nn.Dropout module.

import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        
        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
        # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        
        # output so no dropout here
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x

During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So, we need to turn off dropout during validation, testing, and whenever we’re using the network to make predictions. To do this, you use model.eval(). This sets the model to evaluation mode where the dropout probability is 0. You can turn dropout back on by setting the model to train mode with model.train(). In general, the pattern for the validation loop will look like this, where you turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.

# turn off gradients
with torch.no_grad():
    
    # set model to evaluation mode
    model.eval()
    
    # validation pass here
    for images, labels in testloader:
        ...

# set model back to train mode
model.train()
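The effect of switching modes can be observed on an nn.Dropout module directly. In this sketch the all-ones input and the seed are arbitrary choices for illustration. In train mode nn.Dropout zeroes each element with probability p and scales the survivors by 1/(1-p); in eval mode it passes the input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability
dropout = nn.Dropout(p=0.2)
x = torch.ones(1, 1000)

# Train mode: about 20% of the elements are zeroed, the rest scaled to 1.25.
dropout.train()
train_out = dropout(x)

# Eval mode: dropout is a no-op.
dropout.eval()
eval_out = dropout(x)

print(train_out.unique())                      # the distinct values after dropout
print((train_out == 0).float().mean().item())  # fraction dropped, roughly 0.2
print(torch.equal(eval_out, x))                # True
```

The rescaling in train mode keeps the expected activation the same in both modes, which is why no extra correction is needed at inference time.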

Add Dropout

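Note that with dropout added, the overfitting is reduced, if not eliminated: in the runs below, the validation loss stays close to the training loss instead of pulling away from it.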

Adam Optimizer

dropout_model_1 = nn.Sequential(nn.Dropout(0.2),
                                nn.Linear(784, 256),
                                nn.ReLU(),
                                nn.Dropout(0.2),
                                nn.Linear(256, 128),
                                nn.ReLU(),
                                nn.Dropout(0.2),
                                nn.Linear(128, 64),
                                nn.ReLU(),
                                nn.Dropout(0.2),
                                nn.Linear(64, 10),
                                nn.LogSoftmax(dim=1))

train_model(dropout_model_1, optimizer_str = 'Adam')
	epoch	train_loss	test_loss	accuracy
	    1	     0.626	    0.479	   0.828
	   10	     0.435	     0.38	   0.864
	   20	     0.413	    0.376	    0.87
	   30	     0.402	    0.367	   0.871
	   40	     0.392	    0.398	   0.869
	   50	      0.38	    0.385	   0.871

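[Figure: "Adam" run with dropout. Left panel "Losses": training and validation loss per epoch. Right panel "Accuracy": validation accuracy per epoch.]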

SGD Optimizer

dropout_model_2 = nn.Sequential(nn.Dropout(0.2),
                                nn.Linear(784, 256),
                                nn.ReLU(),
                                nn.Dropout(0.2),
                                nn.Linear(256, 128),
                                nn.ReLU(),
                                nn.Dropout(0.2),
                                nn.Linear(128, 64),
                                nn.ReLU(),
                                nn.Dropout(0.2),
                                nn.Linear(64, 10),
                                nn.LogSoftmax(dim=1))

train_model(dropout_model_2, optimizer_str = 'SGD')
	epoch	train_loss	test_loss	accuracy
	    1	       2.2	     1.92	   0.365
	   10	     0.645	    0.566	   0.789
	   20	     0.534	    0.478	   0.823
	   30	     0.485	    0.434	   0.841
	   40	     0.453	    0.412	    0.85
	   50	     0.432	    0.394	   0.858

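[Figure: "SGD" run with dropout. Left panel "Losses": training and validation loss per epoch. Right panel "Accuracy": validation accuracy per epoch.]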
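One rough way to compare the four runs is the gap between validation loss and training loss at epoch 50, using the numbers printed in the tables above. This is only a sketch: for the dropout models the training loss is measured with dropout active, so it is somewhat pessimistic and the gap is not a like-for-like measure.

```python
# (train_loss, test_loss) at epoch 50, copied from the tables above.
final_losses = {
    'Adam': (0.156, 0.515),
    'SGD': (0.296, 0.358),
    'Adam + dropout': (0.38, 0.385),
    'SGD + dropout': (0.432, 0.394),
}

# A large positive gap suggests the model fits the training set
# much better than it fits unseen data.
gaps = {name: round(test_loss - train_loss, 3)
        for name, (train_loss, test_loss) in final_losses.items()}

for name, gap in gaps.items():
    print(f'{name:15} gap = {gap:+.3f}')
```

For Adam the gap shrinks from about 0.359 without dropout to about 0.005 with it, at the cost of a slightly lower final accuracy (0.871 versus 0.882).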

Inference

Now that the model is trained, it can be used for inference. We need to set the model in inference mode with model.eval() and turn off autograd with the torch.no_grad() context.
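A minimal sketch of this pattern on a single input follows. The model here is a small untrained stand-in with the same input and output sizes as the models above, and the random input is not a real image, so the predicted label is meaningless; the class names are the Fashion-MNIST labels used by the plotting helper in this section.

```python
import torch
from torch import nn

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']

# Untrained stand-in model (an assumption for illustration).
model = nn.Sequential(nn.Dropout(0.2),
                      nn.Linear(784, 10),
                      nn.LogSoftmax(dim=1))
img = torch.randn(1, 784)  # random stand-in for a flattened 28x28 image

model.eval()               # disable dropout
with torch.no_grad():      # skip gradient tracking
    ps = torch.exp(model(img))

# The most likely class is the index of the largest probability.
predicted = ps.argmax(dim=1).item()
print(class_names[predicted], ps[0, predicted].item())
```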

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

def view_classify(img, ps):
    ps = ps.data.numpy().squeeze()

    fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(['T-shirt/top',
                         'Trouser',
                         'Pullover',
                         'Dress',
                         'Coat',
                         'Sandal',
                         'Shirt',
                         'Sneaker',
                         'Bag',
                         'Ankle Boot'], size='small');
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)

    plt.tight_layout()

Use model trained with dropout and the ‘Adam’ optimizer.

dropout_model_1.eval()

dataiter = iter(testloader)
for _ in range(10):
    # Use the built-in next(); iterator .next() is not available in recent PyTorch
    images, labels = next(dataiter)
    img = images[0]
    
    # Convert 2D image to 1D vector
    img = img.view(1, 784)

    # Calculate the class probabilities (softmax) for img
    with torch.no_grad():
        # Call the model directly rather than invoking .forward()
        output = dropout_model_1(img)

    ps = torch.exp(output)

    # Plot the image and probabilities
    view_classify(img.view(1, 28, 28), ps)

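[Figures: ten test images, each shown next to a horizontal bar chart titled "Class Probability" with the predicted probability for each of the ten classes.]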