In this example, I predict student admissions to graduate school at UCLA based on three pieces of data:

• GRE Scores (Test)
• GPA (Grades)
• Class Rank (1-4)

The dataset was originally taken from this link: http://www.ats.ucla.edu/ but may have been modified by Udacity.

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = pd.read_csv('neural-network-admissions-example/data/student_data.csv')
print('{} rows'.format(d.shape[0]))

400 rows


   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4

### Plot the Data

First, ignore the rank, so the plot is 2-dimensional.

def plot_points(d):
    plt.figure(figsize=[6, 4.75])

    X = np.array(d[["gre", "gpa"]])
    y = np.array(d["admit"])

    admitted = X[np.argwhere(y == 1)]
    rejected = X[np.argwhere(y == 0)]

    plt.scatter([s[0][0] for s in rejected],
                [s[0][1] for s in rejected],
                s=50,
                color='red',
                label='Denied',
                alpha=0.35)

    plt.scatter([s[0][0] for s in admitted],
                [s[0][1] for s in admitted],
                s=50,
                color='darkblue',
                label='Accepted',
                alpha=0.35)

    plt.legend(fancybox=True,
               loc=3,
               facecolor='white',
               edgecolor='white')

    plt.xlabel('Test (GRE)')
    plt.ylabel('Grades (GPA)')

plot_points(d)
plt.title('All Ranks')
plt.show()

Roughly, it looks like students with higher grades and test scores were admitted, but the data is clearly not linearly separable. The rank may help us make more sense of the data.

First, separate the ranks into different subsets of the data. Then, plot using each subset.

d1 = d[d["rank"]==1]
d2 = d[d["rank"]==2]
d3 = d[d["rank"]==3]
d4 = d[d["rank"]==4]

plot_points(d1)
plt.title("Rank 1")
plt.show()
plot_points(d2)
plt.title("Rank 2")
plt.show()
plot_points(d3)
plt.title("Rank 3")
plt.show()
plot_points(d4)
plt.title("Rank 4")
plt.show()

It appears that the lower the rank, the higher the acceptance rate. It makes sense to use the rank as one of the inputs. To do this, we will need to one-hot encode it.

### One-hot Encoding the Rank

Use pandas' get_dummies function.

one_hot_data = pd.get_dummies(d,
columns=['rank'])


admit gre gpa rank_1 rank_2 rank_3 rank_4
0 0 380 3.61 0 0 1 0
1 1 660 3.67 0 0 1 0
2 1 800 4.00 1 0 0 0
3 1 640 3.19 0 0 0 1
4 0 520 2.93 0 0 0 1
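The get_dummies call replaces the single rank column with one indicator column per category and leaves the other columns untouched. A minimal sketch on a toy frame (the values here are made up, not the admissions data):

```python
import pandas as pd

# Toy frame standing in for the admissions data (values are made up)
toy = pd.DataFrame({'admit': [0, 1, 1],
                    'rank': [3, 1, 4]})

# One-hot encode only the 'rank' column
one_hot = pd.get_dummies(toy, columns=['rank'])

# Non-encoded columns come first, then one indicator per rank value
print(list(one_hot.columns))
```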

### Scale the Data

Scale both the GRE and GPA data into the range 0-1 by dividing the grades by 4.0 and the test scores by 800.

processed_data = one_hot_data.copy()
processed_data['gre'] /= 800
processed_data['gpa'] /= 4.0


admit gre gpa rank_1 rank_2 rank_3 rank_4
0 0 0.475 0.9025 0 0 1 0
1 1 0.825 0.9175 0 0 1 0
2 1 1.000 1.0000 1 0 0 0
3 1 0.800 0.7975 0 0 0 1
4 0 0.650 0.7325 0 0 0 1

### Split the Data into Training and Testing

The example below intentionally does not use scikit-learn's train_test_split, in order to demonstrate an alternative method.

sample = np.random.choice(processed_data.index,
                          size=int(len(processed_data) * 0.9),
                          replace=False)
train_data, test_data = processed_data.iloc[sample], processed_data.drop(sample)
print('Training Data: {} rows'.format(train_data.shape[0]))
print(' Testing Data: {} rows'.format(test_data.shape[0]))

Training Data: 360 rows
Testing Data:  40 rows
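The np.random.choice approach can be sanity-checked on a toy frame. Note that it relies on the default integer index, where index labels and iloc positions coincide (the frame below is illustrative, not the admissions data):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for processed_data (contents are illustrative)
toy = pd.DataFrame({'x': range(10)})

np.random.seed(0)
sample = np.random.choice(toy.index,
                          size=int(len(toy) * 0.9),
                          replace=False)
train, test = toy.iloc[sample], toy.drop(sample)

print(len(train), len(test))   # 9 1
```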


### Split the data into Features and Labels

features = train_data.drop(columns=['admit'])
targets = train_data['admit']
features_test = test_data.drop(columns=['admit'])
targets_test = test_data['admit']


### Training the 2-Layer Neural Network

First, some utility functions.

#### Sigmoid Activation Function

def sigmoid(x):
    return 1 / (1 + np.exp(-x))


#### Derivative of Sigmoid Activation Function

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))
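A quick finite-difference check confirms the identity sigma'(x) = sigma(x)(1 - sigma(x)) that sigmoid_prime relies on (the test point x = 0.5 is arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Compare the analytic derivative to a central finite difference
x, h = 0.5, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(abs(numeric - sigmoid_prime(x)) < 1e-8)   # True
```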


#### Error Function

def error_formula(y, output):
    return -y * np.log(output) - (1 - y) * np.log(1 - output)


#### Error Term Formula

$$Error\ Term = (y-\hat{y})\,\sigma'(x)$$

def error_term_formula(y, output):
    return (y - output) * output * (1 - output)
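As a sketch of where this comes from (not part of the original notebook): for the squared error on a single record, $E = \frac{1}{2}(y - \hat{y})^2$ with $\hat{y} = \sigma(x)$, the negative gradient is exactly the error term implemented above:

$$-\frac{\partial E}{\partial x} = (y - \hat{y})\,\sigma'(x) = (y - \hat{y})\,\hat{y}\,(1 - \hat{y})$$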


### Backpropagate the Error

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

# Training function
def train_nn(features, targets, epochs, learnrate):

    # Use the same seed to make debugging easier
    np.random.seed(42)

    n_records, n_features = features.shape
    last_loss = None

    # Initialize weights
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

    for e in range(epochs):
        del_w = np.zeros(weights.shape)
        # Loop through all records; x is the input, y is the target
        for x, y in zip(features.values, targets):

            # Activation of the output unit
            #   Notice we multiply the inputs and the weights here
            #   rather than storing h as a separate variable
            output = sigmoid(np.dot(x, weights))

            # The cross-entropy error for this record (not used in the
            #   weight update below, but useful for monitoring)
            error = error_formula(y, output)

            # The error term, (y - output) * f'(h)
            #   error_term_formula re-uses the sigmoid result already
            #   stored in the output variable, which is faster than
            #   calling sigmoid_prime and recomputing the sigmoid
            error_term = error_term_formula(y, output)

            del_w += error_term * x

        # Update the weights here. The learning rate times the
        # change in weights, divided by the number of records to average
        weights += learnrate * del_w / n_records

        # Printing out the mean squared error on the training set
        if e % (epochs / 10) == 0:
            out = sigmoid(np.dot(features, weights))
            loss = np.mean((out - targets) ** 2)
            print("Epoch:", e)
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            print("=========")
    print("Finished training!")
    return weights

weights = train_nn(features, targets, epochs, learnrate)

Epoch: 0
Train loss:  0.27322921091147634
=========
Epoch: 100
Train loss:  0.2047581256002055
=========
Epoch: 200
Train loss:  0.20255452027395668
=========
Epoch: 300
Train loss:  0.20153969072910288
=========
Epoch: 400
Train loss:  0.2010100178416723
=========
Epoch: 500
Train loss:  0.20068356425817008
=========
Epoch: 600
Train loss:  0.20044728889319113
=========
Epoch: 700
Train loss:  0.20025427734771004
=========
Epoch: 800
Train loss:  0.20008402998279995
=========
Epoch: 900
Train loss:  0.199927063762658
=========
Finished training!


### Calculate the Accuracy on the Test Data

test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

Prediction accuracy: 0.625
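With trained weights in hand, a prediction for a new applicant is a single forward pass. A minimal sketch, where the weights and the applicant's values are hypothetical, not the ones learned above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical weights, one per feature:
#   [gre, gpa, rank_1, rank_2, rank_3, rank_4]
weights = np.array([0.4, 0.9, 1.2, 0.5, -0.3, -0.8])

# A hypothetical applicant: GRE 700, GPA 3.6, rank-2 school,
# scaled the same way as the training data
student = np.array([700 / 800, 3.6 / 4.0, 0, 1, 0, 0])

probability = sigmoid(np.dot(student, weights))
print(probability > 0.5)   # predict "admit" when the probability exceeds 0.5
```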