# Inference and Validation

**Inference**, a term borrowed from statistics, is the process of using a trained model to make making predictions. However, neural networks have a tendency to perform *too well* on the training data and aren’t able to generalize to data that hasn’t been seen before. This is called **overfitting** and it impairs inference performance.

To test for overfitting while training, we measure the performance on data not in the training set called the **validation** set. We avoid overfitting through regularization such as dropout while monitoring the validation performance during training.

Let’s start by loading the dataset through torchvision. This time we’ll be taking advantage of the test set which you can get by setting `train=False`

here:

```
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
download=True,
train=False,
transform=transform)
```

The test set contains images just like the training set. Typically you’ll see 10-20% of the original dataset held out for testing and validation with the rest being used for training.

```
import torch
from torchvision import datasets, transforms
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
download=True,
train=True,
transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
batch_size=64,
shuffle=True)
# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
download=True,
train=False,
transform=transform)
testloader = torch.utils.data.DataLoader(testset,
batch_size=64,
shuffle=True)
```

Use the same model as was used in the classifying fashion mnist example.

```
from torch import nn
model = nn.Sequential(nn.Linear(784, 256),
nn.ReLU(),
nn.Linear(256, 128),
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))
```

The goal of validation is to measure the model’s performance on data that isn’t part of the training set. Performance here is up to the developer to define. Typically this is just accuracy, the percentage of classes the network predicted correctly. Other options are precision and recall and top-5 error rate. This example focuses on **accuracy**.

```
images, labels = next(iter(testloader))
# Flatten MNIST images into a 784 long vector
images = images.view(images.shape[0], -1)
```

As explected, below, there are 10 class probabilities for 64 examples, as expected.

```
# Get the class probabilities
ps = torch.exp(model(images))
print(ps.shape)
```

```
torch.Size([64, 10])
```

With the probabilities, we can get the most likely class using the `ps.topk`

method. This returns the $k$ highest values.

Since we just want the most likely class for each of the 64 examples, we can use `ps.topk(1)`

. This returns a tuple of the top-$k$ values and the top-$k$ indices. If the highest value is the fifth element, we’ll get back 4 as the index.

As shown, there is a prediction for each of the 64 examples.

```
top_p, top_class = ps.topk(1, dim=1)
print(str(top_class.shape) + '\n')
print(top_class[:5,:])
```

```
torch.Size([64, 1])
tensor([[9],
[9],
[9],
[9],
[9]])
```

Now we can check if the predicted classes match the labels. This is simple to do by equating `top_class`

and `labels`

, but we have to be careful of the shapes. Here `top_class`

is a 2D tensor with shape `(64, 1)`

while `labels`

is 1D with shape `(64)`

.

To get the equality to work out the way we want, `top_class`

and `labels`

must have the same shape.

If we do

```
equals = top_class == labels
```

`equals`

will have shape `(64, 64)`

. What it’s doing is comparing the one element in each row of `top_class`

with each element in `labels`

which returns 64 True/False boolean values for each row.

```
equals = top_class == labels.view(*top_class.shape)
```

Now we need to calculate the percentage of correct predictions. `equals`

has binary values, either 0 or 1. This means that if we just sum up all the values and divide by the number of values, we get the percentage of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to `torch.mean`

. If only it was that simple. If you try `torch.mean(equals)`

, you’ll get an error

```
RuntimeError: mean is not implemented for type torch.ByteTensor
```

This happens because `equals`

has type `torch.ByteTensor`

but `torch.mean`

isn’t implemented for tensors with that type. So we’ll need to convert `equals`

to a float tensor. Note that when we take `torch.mean`

it returns a scalar tensor, to get the actual value as a float we’ll need to do `accuracy.item()`

.

```
accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')
```

```
Accuracy: 6.25%
```

The network is untrained so it’s making random guesses and we should see an accuracy around 10%. Now let’s train our network and include our validation pass so we can measure how well the network is performing on the test set. Since we’re not updating our parameters in the validation pass, we can speed up our code by turning off gradients using `torch.no_grad()`

:

```
# turn off gradients
with torch.no_grad():
# validation pass here
for images, labels in testloader:
...
```

### Training

Print Total Accuracy for Each Validation Loop

The following is a helper function to show losses and accuracies over the course of training.

```
%matplotlib inline
import matplotlib.pyplot as plt
def plot_losses_and_accuracies(train_losses, test_losses, accuracies, title):
fig, (ax1, ax2) = plt.subplots(figsize=(12,6), ncols=2)
plt.suptitle(title)
ax1.set_title('Losses')
ax1.set_ylim(0,1)
ax1.plot(train_losses, label='Training loss')
ax1.plot(test_losses, label='Validation loss')
ax1.legend(frameon=False)
ax2.set_title('Accuracy')
ax2.set_ylim(0,1)
ax2.plot(accuracies, label='Accuracy')
```

The following is a helper function that performs the actual training.

```
from torch import optim
def train_model(model, optimizer_str):
criterion = nn.NLLLoss()
if optimizer_str == 'SGD':
optimizer = optim.SGD(model.parameters(),
lr=0.003)
elif optimizer_str == 'Adam':
optimizer = optim.Adam(model.parameters(),
lr=0.003)
epochs = 50
print("\tepoch\ttrain_loss\ttest_loss\taccuracy")
train_losses, test_losses, accuracies = [], [], []
for epoch in range(epochs):
running_loss, test_loss, accuracy = 0, 0, 0
for images, labels in trainloader:
# Flatten MNIST images into a 784 long vector
images = images.view(images.shape[0], -1)
optimizer.zero_grad()
log_ps = model(images)
loss = criterion(log_ps, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
# Turn off gradients for validation, saves memory and computations
with torch.no_grad():
# Turn off dropout for validation
model.eval()
for images, labels in testloader:
# Flatten MNIST images into a 784 long vector
images = images.view(images.shape[0], -1)
log_ps = model(images)
test_loss += criterion(log_ps, labels)
ps = torch.exp(log_ps)
top_p, top_class = ps.topk(1, dim=1)
equals = top_class == labels.view(*top_class.shape)
accuracy += torch.mean(equals.type(torch.FloatTensor))
# Turn on dropout for validation
model.train()
train_losses.append(running_loss/len(trainloader))
test_losses.append(test_loss/len(testloader))
accuracy = accuracy.item()/len(testloader)
accuracies.append(accuracy)
if epoch == 0 or (((epoch+1) % 10) == 0):
print("\t{:5}\t{:10.3}\t{:9.3}\t{:8.3}".format(epoch+1,
train_losses[-1],
test_losses[-1],
accuracies[-1]))
plot_losses_and_accuracies(train_losses,
test_losses,
accuracies,
optimizer_str)
```

#### Adam Optimizer

```
from torch import nn
model_1 = nn.Sequential(nn.Linear(784, 256),
nn.ReLU(),
nn.Linear(256, 128),
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))
train_model(model_1, optimizer_str = 'Adam')
```

```
epoch train_loss test_loss accuracy
1 0.516 0.462 0.833
10 0.271 0.364 0.876
20 0.219 0.42 0.877
30 0.183 0.408 0.885
40 0.162 0.488 0.886
50 0.156 0.515 0.882
```

#### SGD Optimizer

```
model_2 = nn.Sequential(nn.Linear(784, 256),
nn.ReLU(),
nn.Linear(256, 128),
nn.ReLU(),
nn.Linear(128, 64),
nn.ReLU(),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))
train_model(model_2, optimizer_str = 'SGD')
```

```
epoch train_loss test_loss accuracy
1 2.09 1.64 0.43
10 0.485 0.509 0.814
20 0.4 0.445 0.839
30 0.355 0.403 0.854
40 0.322 0.382 0.863
50 0.296 0.358 0.872
```

## Overfitting

If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

The network learns the training set better and better, resulting in lower training losses. However, it starts having problems generalizing to data outside the training set leading to the validation loss increasing. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive to get the lowest validation loss possible.

One option is to use the version of the model with the lowest validation loss, here the one around 8-10 training epochs. This strategy is called *early-stopping*. In practice, you’d save the model frequently as you’re training then later choose the model with the lowest validation loss.

The most common method to reduce overfitting (outside of early-stopping) is **dropout**, where we randomly drop input units. This forces the network to share information between weights, increasing it’s ability to generalize to new data. Adding dropout in PyTorch is straightforward using the `nn.Dropout`

module.

```
class Classifier(nn.Module):
def __init__(self):
super().__init__()
self.fc1 = nn.Linear(784, 256)
self.fc2 = nn.Linear(256, 128)
self.fc3 = nn.Linear(128, 64)
self.fc4 = nn.Linear(64, 10)
# Dropout module with 0.2 drop probability
self.dropout = nn.Dropout(p=0.2)
def forward(self, x):
# make sure input tensor is flattened
x = x.view(x.shape[0], -1)
# Now with dropout
x = self.dropout(F.relu(self.fc1(x)))
x = self.dropout(F.relu(self.fc2(x)))
x = self.dropout(F.relu(self.fc3(x)))
# output so no dropout here
x = F.log_softmax(self.fc4(x), dim=1)
return x
```

During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So, we need to turn off dropout during validation, testing, and whenever we’re using the network to make predictions. To do this, you use `model.eval()`

. This sets the model to evaluation mode where the dropout probability is 0. You can turn dropout back on by setting the model to train mode with `model.train()`

. In general, the pattern for the validation loop will look like this, where you turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.

```
# turn off gradients
with torch.no_grad():
# set model to evaluation mode
model.eval()
# validation pass here
for images, labels in testloader:
...
# set model back to train mode
model.train()
```

### Add Dropout

Note the overfitting is reduced if not eliminated.

#### Adam Optimizer

```
dropout_model_1 = nn.Sequential(nn.Dropout(0.2),
nn.Linear(784, 256),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(256, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, 64),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))
train_model(dropout_model_1, optimizer_str = 'Adam')
```

```
epoch train_loss test_loss accuracy
1 0.626 0.479 0.828
10 0.435 0.38 0.864
20 0.413 0.376 0.87
30 0.402 0.367 0.871
40 0.392 0.398 0.869
50 0.38 0.385 0.871
```

#### SGD Optimizer

```
dropout_model_2 = nn.Sequential(nn.Dropout(0.2),
nn.Linear(784, 256),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(256, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, 64),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(64, 10),
nn.LogSoftmax(dim=1))
train_model(dropout_model_2, optimizer_str = 'SGD')
```

```
epoch train_loss test_loss accuracy
1 2.2 1.92 0.365
10 0.645 0.566 0.789
20 0.534 0.478 0.823
30 0.485 0.434 0.841
40 0.453 0.412 0.85
50 0.432 0.394 0.858
```

## Inference

Now that the model is trained, it can be used for inference. We need to set the model in inference mode with `model.eval()`

and turn off autograd with the `torch.no_grad()`

context.

```
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
def view_classify(img, ps):
ps = ps.data.numpy().squeeze()
fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
ax1.axis('off')
ax2.barh(np.arange(10), ps)
ax2.set_aspect(0.1)
ax2.set_yticks(np.arange(10))
ax2.set_yticklabels(['T-shirt/top',
'Trouser',
'Pullover',
'Dress',
'Coat',
'Sandal',
'Shirt',
'Sneaker',
'Bag',
'Ankle Boot'], size='small');
ax2.set_title('Class Probability')
ax2.set_xlim(0, 1.1)
plt.tight_layout()
```

Use model trained with dropout and the ‘Adam’ optimizer.

```
dropout_model_1.eval()
dataiter = iter(testloader)
for _ in range(10):
images, labels = dataiter.next()
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)
# Calculate the class probabilities (softmax) for img
with torch.no_grad():
output = dropout_model_1.forward(img)
ps = torch.exp(output)
# Plot the image and probabilities
view_classify(img.view(1, 28, 28), ps)
```

Copyright © 2018 Udacity