Multilayer Perceptrons
Adding a “hidden” layer of perceptrons between the inputs and the output allows the model to find solutions to linearly inseparable problems, such as XOR. An example of this architecture is shown below.
The input to the hidden layer is the same as it was before, as is the activation function.
The input to the output layer is the output of the hidden layer.
Stacking more layers lets the network learn increasingly complex patterns; this depth is where the term “deep learning” comes from.
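To make the architecture concrete, here is a minimal sketch of a forward pass through a network with one hidden layer. The sigmoid activation and the weight values are assumptions for illustration, not values from the text:

import numpy as np

def sigmoid(x):
    # Squash each unit's input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

# Hypothetical weights for 3 inputs, 2 hidden units, 1 output
weights_input_to_hidden = np.array([[ 0.1, -0.2],
                                    [ 0.4,  0.5],
                                    [-0.3,  0.2]])
weights_hidden_to_output = np.array([0.3, -0.1])

x = np.array([0.5, -0.2, 0.1])  # one input record

# The hidden layer's output becomes the output layer's input
hidden_outputs = sigmoid(np.dot(x, weights_input_to_hidden))
output = sigmoid(np.dot(hidden_outputs, weights_hidden_to_output))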
Neural Network Mathematics
The following notation will be used for the weights: $w_{ij}$ denotes the weight from input unit $i$ to hidden unit $j$. For example, the following image shows a network with input units labeled $x_1$, $x_2$, and $x_3$, and hidden units labeled $h_1$ and $h_2$. The weights need to be stored in a matrix, indexed as $w_{ij}$: the row index $i$ corresponds to the input unit and the column index $j$ to the hidden unit.
To initialize these weights, we need the shape of the matrix. If we have a 2D array called features containing the input data:
import numpy as np

n_records, n_inputs = features.shape  # features holds the input data
n_hidden = 2                          # number of hidden units
weights_input_to_hidden = np.random.normal(0, n_inputs**-0.5, size=(n_inputs, n_hidden))
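Scaling the standard deviation by n_inputs**-0.5 is a common initialization heuristic: it keeps the summed input to each hidden unit from growing with the number of input units, so the activations start out in a reasonable range.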
The result is a 2D array (or “matrix”) named weights_input_to_hidden with dimensions n_inputs by n_hidden. Recall that the input to a hidden unit is the sum of all the inputs multiplied by the hidden unit’s weights. So, for each hidden unit $h_j$:

$$h_j = \sum_i w_{ij} x_i$$
For the architecture described above, we’re multiplying the input (a row vector) by the weights (a matrix). This is accomplished by taking the dot product of the inputs with each column in the weights matrix. For the input to the first hidden unit, this works out to:

$$h_1 = x_1 w_{11} + x_2 w_{21} + x_3 w_{31}$$
The sum of products for the second hidden unit’s input would be similar, except using the dot product of the inputs with the second column of the weights matrix, and so on. Using NumPy, this calculation can be run for all the hidden units (and all the input records) simultaneously:
hidden_inputs = np.dot(inputs, weights_input_to_hidden)
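As a quick check of the shapes involved, here is a hypothetical run with three input units and two hidden units (the input values are made up for illustration):

import numpy as np

inputs = np.array([[0.5, -0.2, 0.1]])  # shape (1, 3): one record, three inputs
weights_input_to_hidden = np.random.normal(0, 3**-0.5, size=(3, 2))

hidden_inputs = np.dot(inputs, weights_input_to_hidden)
print(hidden_inputs.shape)  # (1, 2): one input value per hidden unit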
Note that the weights matrix could also be defined the other way around, with dimensions n_hidden by n_inputs, and then multiplied by the inputs arranged as a column vector. In that case the weight indices no longer match up with the labels used in the earlier diagrams, since in matrix notation the row index always precedes the column index.
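Here is a minimal sketch of that alternative layout, assuming we simply transpose the matrix from before and arrange the inputs as a column vector:

import numpy as np

weights_input_to_hidden = np.random.normal(0, 3**-0.5, size=(3, 2))

# Transposed weights: shape (n_hidden, n_inputs)
weights_hidden = weights_input_to_hidden.T

inputs_col = np.array([[0.5], [-0.2], [0.1]])  # shape (3, 1): column vector

hidden_inputs = np.dot(weights_hidden, inputs_col)  # shape (2, 1)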
The critical thing to remember with matrix multiplication is that the dimensions must match: for example, a row vector with three elements can only multiply a weights matrix with three rows. Here’s the rule (illustrated in the sketch after this list):
- If the array is multiplying from the left, it must have the same number of elements as the matrix has rows.
- If the matrix is multiplying from the left, its number of columns must equal the number of elements in the array on the right.
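A quick demonstration of both orderings with made-up shapes; when the dimensions don’t match, NumPy’s dot raises a ValueError:

import numpy as np

matrix = np.random.normal(0, 1, size=(3, 2))  # 3 rows, 2 columns
row = np.array([1.0, 2.0, 3.0])               # 3 elements
col = np.array([1.0, 2.0])                    # 2 elements

np.dot(row, matrix)    # OK: 3 elements match the 3 rows; result has 2 elements
np.dot(matrix, col)    # OK: 2 columns match the 2 elements; result has 3 elements
# np.dot(matrix, row)  # ValueError: shapes (3, 2) and (3,) are not aligned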