# Inference and Validation

Inference, a term borrowed from statistics, is the process of using a trained model to make predictions. However, neural networks tend to fit the training data too closely and then fail to generalize to data they have not seen before. This is called overfitting, and it impairs inference performance.

To test for overfitting while training, we measure performance on data that is not in the training set, called the validation set. We reduce overfitting through regularization such as dropout, while monitoring the validation performance during training.

Let’s start by loading the dataset through torchvision. This time we’ll be taking advantage of the test set, which you can get by setting train=False:

testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
train=False,
transform=transform)


The test set contains images just like the training set. Typically you’ll see 10-20% of the original dataset held out for testing and validation with the rest being used for training.
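When a dataset does not ship with a separate test split, a held-out set can be carved out of the training data. The following is a minimal sketch, assuming a synthetic stand-in dataset and an illustrative 20% hold-out fraction; the names full_set, train_subset and val_subset are hypothetical and not part of the original example.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for a labelled dataset: 1000 fake 28x28 images, 10 classes
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
full_set = TensorDataset(images, labels)

# Hold out 20% for validation (the fraction is an illustrative choice)
n_val = int(0.2 * len(full_set))
n_train = len(full_set) - n_val

# A seeded generator keeps the split reproducible between runs
generator = torch.Generator().manual_seed(0)
train_subset, val_subset = random_split(full_set, [n_train, n_val],
                                        generator=generator)

print(len(train_subset), len(val_subset))
```

Each subset can then be wrapped in its own DataLoader, in the same way as the training and test sets.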

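import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Download (if not already present) and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
                                 download=True,
                                 train=True,
                                 transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=64,
                                          shuffle=True)

# Download (if not already present) and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
                                download=True,
                                train=False,
                                transform=transform)
testloader = torch.utils.data.DataLoader(testset,
                                         batch_size=64,
                                         shuffle=True)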


Use the same model as in the Fashion-MNIST classification example.

from torch import nn

model = nn.Sequential(nn.Linear(784, 256),
nn.ReLU(),
nn.Linear(256, 128),
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))


The goal of validation is to measure the model’s performance on data that isn’t part of the training set. How performance is defined is up to the developer. Typically it is just accuracy, the percentage of examples whose class the network predicted correctly. Other options include precision, recall, and top-5 error rate. This example focuses on accuracy.
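For comparison, the alternative metrics mentioned above can be computed with a few tensor operations. This is a rough sketch on hand-made scores; the helper names topk_accuracy and precision_recall are hypothetical and are not used elsewhere in this example.

```python
import torch

def topk_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k highest scores."""
    _, top_classes = scores.topk(k, dim=1)                  # shape (batch, k)
    hits = (top_classes == labels.view(-1, 1)).any(dim=1)   # shape (batch,)
    return hits.float().mean().item()

def precision_recall(preds, labels, positive_class):
    """Precision and recall for one class, treated as the positive class."""
    pred_pos = preds == positive_class
    true_pos = labels == positive_class
    tp = (pred_pos & true_pos).sum().item()
    # max(..., 1) avoids division by zero when a class never occurs
    precision = tp / max(pred_pos.sum().item(), 1)
    recall = tp / max(true_pos.sum().item(), 1)
    return precision, recall

# Tiny hand-made example: 4 examples, 3 classes
scores = torch.tensor([[0.7, 0.2, 0.1],
                       [0.1, 0.6, 0.3],
                       [0.5, 0.4, 0.1],
                       [0.2, 0.3, 0.5]])
labels = torch.tensor([0, 1, 1, 2])
preds = scores.argmax(dim=1)

print(topk_accuracy(scores, labels, k=1))   # 0.75
print(topk_accuracy(scores, labels, k=2))   # 1.0
print(precision_recall(preds, labels, 1))   # (1.0, 0.5)
```

The top-k error rate is simply one minus the top-k accuracy.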

images, labels = next(iter(testloader))
# Flatten MNIST images into a 784 long vector
images = images.view(images.shape[0], -1)


As expected, the output below contains 10 class probabilities for each of the 64 examples.

# Get the class probabilities
ps = torch.exp(model(images))
print(ps.shape)

torch.Size([64, 10])


With the probabilities, we can get the most likely class using the ps.topk method. This returns the $k$ highest values.

Since we just want the most likely class for each of the 64 examples, we can use ps.topk(1). This returns a tuple of the top-$k$ values and the top-$k$ indices. If the highest value is the fifth element, we’ll get back 4 as the index.

As shown, there is a prediction for each of the 64 examples.

top_p, top_class = ps.topk(1, dim=1)
print(str(top_class.shape) + '\n')
print(top_class[:5,:])

torch.Size([64, 1])

tensor([[9],
[9],
[9],
[9],
[9]])
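To make the zero-based index convention concrete, here is a small sketch with made-up probabilities for two examples and five classes. The variable name toy_ps is an assumption chosen to avoid clashing with ps above.

```python
import torch

# Made-up probabilities: 2 examples, 5 classes
toy_ps = torch.tensor([[0.1, 0.2, 0.1, 0.1, 0.5],
                       [0.6, 0.1, 0.1, 0.1, 0.1]])

toy_top_p, toy_top_class = toy_ps.topk(1, dim=1)

print(toy_top_p.shape)     # torch.Size([2, 1])
print(toy_top_class)       # highest value is the fifth element of row 0, so index 4
```

For k=1, the indices match what toy_ps.argmax(dim=1, keepdim=True) would return.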


Now we can check whether the predicted classes match the labels. This is simple to do by comparing top_class and labels with ==, but we have to be careful with the shapes. Here top_class is a 2D tensor with shape (64, 1), while labels is 1D with shape (64,).

To get the equality to work out the way we want, top_class and labels must have the same shape.

If we do

equals = top_class == labels


equals will have shape (64, 64). Because of broadcasting, the one element in each row of top_class is compared with every element in labels, which returns 64 True/False values for each row. Reshaping labels to the shape of top_class gives the element-wise comparison we want:

equals = top_class == labels.view(*top_class.shape)
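The shape pitfall is easier to see on a tiny example. The sketch below uses three made-up predictions and labels; the names small_top_class, small_labels, wrong and right are hypothetical.

```python
import torch

small_top_class = torch.tensor([[2], [0], [1]])   # shape (3, 1), like the topk output
small_labels = torch.tensor([2, 1, 1])            # shape (3,)

# Broadcasting pairs every prediction with every label: shape (3, 3)
wrong = small_top_class == small_labels

# Reshaping the labels to (3, 1) gives one comparison per example
right = small_top_class == small_labels.view(*small_top_class.shape)

print(wrong.shape)   # torch.Size([3, 3])
print(right.shape)   # torch.Size([3, 1])
print(right)
```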


Now we need to calculate the percentage of correct predictions. equals holds binary values, so summing them and dividing by the number of values gives the fraction of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to torch.mean. Unfortunately, it is not quite that simple. If you try torch.mean(equals), you’ll get an error:

RuntimeError: mean is not implemented for type torch.ByteTensor


This happens because equals has type torch.ByteTensor, and torch.mean isn’t implemented for tensors of that type. (Recent PyTorch releases return a torch.bool tensor from the comparison and word the error differently, but the cause and the fix are the same.) So we’ll need to convert equals to a float tensor. Note that torch.mean returns a scalar tensor; to get the actual value as a float we’ll need to call accuracy.item().

accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')

Accuracy: 6.25%
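The conversion can also be written with the .float() shorthand. The following sketch uses four made-up comparison results to show the difference between the scalar tensor and the plain Python number; toy_equals and toy_accuracy are hypothetical names.

```python
import torch

# Made-up comparison results: 3 of 4 predictions correct
toy_equals = torch.tensor([[True], [False], [True], [True]])

# .float() converts the boolean tensor to torch.float32
toy_accuracy = toy_equals.float().mean()

print(toy_accuracy.dim())    # 0, a zero-dimensional (scalar) tensor
print(toy_accuracy.item())   # 0.75, a plain Python float
```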


The network is untrained, so it is making random guesses; with 10 classes we should see an accuracy of around 10%. The 6.25% above corresponds to 4 correct predictions in a single batch of 64, which is within the noise of such a small sample. Now let’s train our network and include a validation pass so we can measure how well it performs on the test set. Since we’re not updating our parameters in the validation pass, we can speed up our code by turning off gradients using torch.no_grad():

# turn off gradients
with torch.no_grad():
    # validation pass here
    ...
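As a quick check of what the context manager does, the sketch below runs the same forward pass with and without it. The single nn.Linear layer is a stand-in chosen for brevity, not the classifier used in this example.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)      # stand-in for a full model
x = torch.randn(3, 4)

# Normal forward pass: the output is attached to the autograd graph
out_tracked = layer(x)

# Inside no_grad, no graph is recorded for the output
with torch.no_grad():
    out_untracked = layer(x)

print(out_tracked.requires_grad)     # True
print(out_untracked.requires_grad)   # False
```

The computed values are the same in both cases; only the bookkeeping needed for backpropagation is skipped.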


### Training

Print Total Accuracy for Each Validation Loop

The following is a helper function to show losses and accuracies over the course of training.

%matplotlib inline

import matplotlib.pyplot as plt

def plot_losses_and_accuracies(train_losses, test_losses, accuracies, title):
    fig, (ax1, ax2) = plt.subplots(figsize=(12,6), ncols=2)

    plt.suptitle(title)

    ax1.set_title('Losses')
    ax1.set_ylim(0,1)
    ax1.plot(train_losses, label='Training loss')
    ax1.plot(test_losses, label='Validation loss')
    ax1.legend(frameon=False)

    ax2.set_title('Accuracy')
    ax2.set_ylim(0,1)
    ax2.plot(accuracies, label='Accuracy')


The following is a helper function that performs the actual training.

from torch import optim

def train_model(model, optimizer_str):
    criterion = nn.NLLLoss()
    if optimizer_str == 'SGD':
        optimizer = optim.SGD(model.parameters(),
                              lr=0.003)
    else:
        # Any other value (e.g. 'Adam') selects the Adam optimizer
        optimizer = optim.Adam(model.parameters(),
                               lr=0.003)

    epochs = 50

    print("\tepoch\ttrain_loss\ttest_loss\taccuracy")
    train_losses, test_losses, accuracies = [], [], []
    for epoch in range(epochs):
        running_loss, test_loss, accuracy = 0, 0, 0
        for images, labels in trainloader:
            # Flatten MNIST images into a 784 long vector
            images = images.view(images.shape[0], -1)

            # Clear gradients left over from the previous batch
            optimizer.zero_grad()

            log_ps = model(images)
            loss = criterion(log_ps, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        # Turn off dropout for validation
        model.eval()

        # Turn off gradients for validation, saves memory and computations
        with torch.no_grad():
            for images, labels in testloader:
                # Flatten MNIST images into a 784 long vector
                images = images.view(images.shape[0], -1)

                log_ps = model(images)
                test_loss += criterion(log_ps, labels).item()

                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

        # Turn dropout back on for training
        model.train()

        # Store the per-batch averages for this epoch
        train_losses.append(running_loss / len(trainloader))
        test_losses.append(test_loss / len(testloader))
        accuracies.append(accuracy / len(testloader))

        if epoch == 0 or (((epoch+1) % 10) == 0):
            print("\t{:5}\t{:10.3}\t{:9.3}\t{:8.3}".format(epoch+1,
                                                            train_losses[-1],
                                                            test_losses[-1],
                                                            accuracies[-1]))

    plot_losses_and_accuracies(train_losses,
                               test_losses,
                               accuracies,
                               optimizer_str)


from torch import nn

model_1 = nn.Sequential(nn.Linear(784, 256),
nn.ReLU(),
nn.Linear(256, 128),
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))

train_model(model_1, optimizer_str = 'Adam')


	epoch	train_loss	test_loss	accuracy
1	     0.516	    0.462	   0.833
10	     0.271	    0.364	   0.876
20	     0.219	     0.42	   0.877
30	     0.183	    0.408	   0.885
40	     0.162	    0.488	   0.886
50	     0.156	    0.515	   0.882


#### SGD Optimizer

model_2 = nn.Sequential(nn.Linear(784, 256),
nn.ReLU(),
nn.Linear(256, 128),
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))

train_model(model_2, optimizer_str = 'SGD')

	epoch	train_loss	test_loss	accuracy
1	      2.09	     1.64	    0.43
10	     0.485	    0.509	   0.814
20	       0.4	    0.445	   0.839
30	     0.355	    0.403	   0.854
40	     0.322	    0.382	   0.863
50	     0.296	    0.358	   0.872


## Overfitting

If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

The network learns the training set better and better, resulting in lower training losses. However, it starts having problems generalizing to data outside the training set, so the validation loss increases. In the first run above, for example, the training loss falls from 0.516 to 0.156 over 50 epochs, while the validation loss rises from 0.364 at epoch 10 to 0.515 at epoch 50. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive to get the lowest validation loss possible.

One option is to use the version of the model with the lowest validation loss, here the one around 8-10 training epochs. This strategy is called early stopping. In practice, you’d save the model frequently as you’re training, then later choose the model with the lowest validation loss.
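One way to keep the best version of the model is to copy its state_dict whenever the validation loss improves. The sketch below assumes a hypothetical helper keep_best, a stand-in nn.Linear model, and made-up validation losses; in a real run the losses would come from the validation pass.

```python
import copy
import torch
from torch import nn

def keep_best(model, val_loss, best):
    """Store a copy of the weights in `best` if val_loss is the lowest so far."""
    if val_loss < best["loss"]:
        best["loss"] = val_loss
        # deepcopy, because state_dict() returns references to the live tensors
        best["state"] = copy.deepcopy(model.state_dict())
        return True
    return False

net = nn.Linear(4, 2)        # stand-in for the classifier
best = {"loss": float("inf"), "state": None, "epoch": None}

# Made-up validation losses, one per epoch, for illustration only
for epoch, val_loss in enumerate([0.46, 0.36, 0.42, 0.41, 0.49], start=1):
    with torch.no_grad():
        net.weight.add_(1.0)  # stand-in for a real training epoch
    if keep_best(net, val_loss, best):
        best["epoch"] = epoch

# Restore the weights from the epoch with the lowest validation loss
net.load_state_dict(best["state"])
print(best["epoch"], best["loss"])   # 2 0.36
```

To persist checkpoints across sessions, the same state_dict could be written to disk with torch.save instead of being held in memory.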

The most common method to reduce overfitting (outside of early stopping) is dropout, where we randomly drop input units during training. This forces the network to share information between weights, increasing its ability to generalize to new data. Adding dropout in PyTorch is straightforward using the nn.Dropout module.

import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # output so no dropout here
        x = F.log_softmax(self.fc4(x), dim=1)

        return x


During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So, we need to turn off dropout during validation, testing, and whenever we’re using the network to make predictions. To do this, you use model.eval(). This sets the model to evaluation mode where the dropout probability is 0. You can turn dropout back on by setting the model to train mode with model.train(). In general, the pattern for the validation loop will look like this, where you turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.

# turn off gradients
with torch.no_grad():

    # set model to evaluation mode
    model.eval()

    # validation pass here
    ...

# set model back to train mode
model.train()
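The effect of the two modes can be checked directly on a standalone nn.Dropout layer. This is a small sketch using an input of ones; the names drop, y_train and y_eval are hypothetical.

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.2)
x = torch.ones(1000)

# Train mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 1.25
drop.train()
y_train = drop(x)

# Evaluation mode: dropout is the identity function
drop.eval()
y_eval = drop(x)

print(y_train.unique())        # only zeros and values of about 1.25
print(torch.equal(y_eval, x))  # True
```

The rescaling in train mode keeps the expected activation the same in both modes, which is why no extra correction is needed at inference time.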


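With dropout added, the results below show that overfitting is reduced, if not eliminated: the validation loss stays close to the training loss instead of climbing as training continues.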

dropout_model_1 = nn.Sequential(nn.Dropout(0.2),
nn.Linear(784, 256),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(256, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, 64),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))

train_model(dropout_model_1, optimizer_str = 'Adam')


	epoch	train_loss	test_loss	accuracy
1	     0.626	    0.479	   0.828
10	     0.435	     0.38	   0.864
20	     0.413	    0.376	    0.87
30	     0.402	    0.367	   0.871
40	     0.392	    0.398	   0.869
50	      0.38	    0.385	   0.871


#### SGD Optimizer

dropout_model_2 = nn.Sequential(nn.Dropout(0.2),
nn.Linear(784, 256),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(256, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, 64),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))

train_model(dropout_model_2, optimizer_str = 'SGD')

	epoch	train_loss	test_loss	accuracy
1	       2.2	     1.92	   0.365
10	     0.645	    0.566	   0.789
20	     0.534	    0.478	   0.823
30	     0.485	    0.434	   0.841
40	     0.453	    0.412	    0.85
50	     0.432	    0.394	   0.858


## Inference

Now that the model is trained, it can be used for inference. We need to set the model in inference mode with model.eval() and turn off autograd with the torch.no_grad() context.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

def view_classify(img, ps):
    ps = ps.data.numpy().squeeze()

    fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')

    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(['T-shirt/top',
                         'Trouser',
                         'Pullover',
                         'Dress',
                         'Coat',
                         'Sandal',
                         'Shirt',
                         'Sneaker',
                         'Bag',
                         'Ankle Boot'], size='small')
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)

    plt.tight_layout()


Use the model trained with dropout and the Adam optimizer.

dropout_model_1.eval()

# Iterate over batches from the test set
dataiter = iter(testloader)

for _ in range(10):
    images, labels = next(dataiter)
    img = images[0]

    # Convert 2D image to 1D vector
    img = img.view(1, 784)

    # Calculate the class probabilities (softmax) for img