Project 4 - Backpropagation
Data files
Each line of a file is one pattern, which contains the input values
x and y (and z for Problem 2) followed by the expected output value
as the last number on that line.
For Problem 1 (2 inputs)
training1.txt
testing1.txt
validation1.txt
For Problem 2 (3 inputs)
training2.txt
testing2.txt
validation2.txt
Inputs to the program
Your program should accept the following either on the command line
or from standard input:
- number of hidden layers
- number of neurons in each hidden layer
- learning rate
- training, testing, and validation data files
- number of training epochs
It may also help to accept the number of inputs to the network (2
for Problem 1, 3 for Problem 2), since this is needed when reading
in all the data files.
Note that each hidden layer can have a different number of nodes.
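For concreteness, here is a minimal sketch of reading these parameters
from standard input in C. The prompts, variable names, and fixed-size
filename buffers are just one possible choice, not a required
interface.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int numinputs, numhidden, numepochs;
    double learningrate;
    char trainfile[256], testfile[256], validfile[256];

    printf("number of inputs (2 or 3): ");
    scanf("%d", &numinputs);
    printf("number of hidden layers: ");
    scanf("%d", &numhidden);

    /* each hidden layer can have a different number of nodes */
    int *numnodes = malloc(numhidden * sizeof(int));
    for (int i = 0; i < numhidden; i++) {
        printf("neurons in hidden layer %d: ", i);
        scanf("%d", &numnodes[i]);
    }

    printf("learning rate: ");
    scanf("%lf", &learningrate);
    printf("training, testing, and validation files: ");
    scanf("%255s %255s %255s", trainfile, testfile, validfile);
    printf("number of epochs: ");
    scanf("%d", &numepochs);

    /* ... allocate the network, read the data files, and train ... */
    return 0;
}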
Representing the neural network
The network can be represented as a three dimensional array for the
hidden layers and a single dimensional array for the output layer.
The first dimension of the hidden layer array gives the layer, the
second dimension gives the neuron in a given layer, and the third
dimension denotes the weight associated with an input node or a node
from the previous hidden layer. The output layer array holds the
weights associated with the nodes in the last hidden layer. Each
neuron in the hidden and output layers also needs a bias weight.
These can be stored as separate arrays or as the 0th elements of the
hidden and output arrays (I used separate arrays for the biases).
Note that the words neuron and node are used interchangeably.
Here is an example of a neural network (the diagram does not include
the biases):

[diagram: a network with two input nodes, three hidden layers, and
one output node, with four weights highlighted in red, green, blue,
and magenta]

The leftmost (smaller) two nodes are the input nodes, the rightmost
node is the output node, and the rest of the nodes are the hidden
nodes of the three hidden layers. The input nodes are connected to
the first hidden layer, and the output node is connected to the last
hidden layer. Suppose hiddenweights and outputweights are the names
of the arrays storing the weights of the network. The indexing
scheme of the weight arrays, using the colored examples in the
diagram, is as follows (assume indexing starts at 0):
red: hiddenweights[0][1][0] (hidden layer 0, node 1, input 0)
green: hiddenweights[1][1][0] (hidden layer 1, node 1, previous hidden layer's node 0)
blue: hiddenweights[2][0][2] (hidden layer 2, node 0, previous hidden layer's node 2)
magenta: outputweights[0] (last hidden layer's node 0)
You will also need a way of keeping track of the output (h and
sigma) and the delta values for each hidden node and the output
node.
You will probably also want arrays for storing the data from the
training, testing, and validation data files.
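As a sketch, the declarations below show one way to lay out these
structures in C, using separate bias arrays as described above. All
of the names are illustrative; the later sketches in this writeup
assume them.

/* hiddenweights[layer][node][k]: weight from input k (for the first
   hidden layer) or from node k of the previous hidden layer */
double ***hiddenweights;
double **hiddenbias;      /* hiddenbias[layer][node] */
double *outputweights;    /* one weight per node of the last hidden layer */
double outputbias;

/* per-node values for the forward and backward passes */
double **hiddenh;         /* weighted sums (h) */
double **hiddensigma;     /* sigmoid outputs (sigma) */
double **hiddendelta;     /* delta values */
double outputh, outputsigma, outputdelta;

/* pattern[p][0..numinputs-1] are the inputs; pattern[p][numinputs]
   is the expected output */
double **trainpatterns, **testpatterns, **validpatterns;

Pointer arrays (rather than fixed-size arrays) make it easy to give
each hidden layer a different number of nodes.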
Initializing the weights
Each weight in the network (hidden and output layers) should be
initialized to a random number between -0.1 and 0.1. This is also
true for all of the bias weights (hidden or output). Note that each
node (hidden or output) only has a single bias weight associated
with it.
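A minimal sketch of such an initialization using the standard rand()
function (the helper name is hypothetical, and you may want to seed
the generator with srand()):

#include <stdlib.h>

/* returns a uniform random double in [-0.1, 0.1] */
double randweight(void)
{
    return ((double)rand() / RAND_MAX) * 0.2 - 0.1;
}

Call this once for every element of hiddenweights, hiddenbias, and
outputweights, and once for outputbias.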
Overall logical flow
After reading the command line, input files, allocating data
structures, and initializing the network weights, main()
should have this overall logic:
for (i = 0; i < numepochs; i++) {
    trainNet();
    testNet(i);
}
evaluateNet();
evaluateNet() refers to validation. The epoch number is passed to
testNet() so that the RMSE computed in testNet() can be associated
with an epoch number, for data gathering and graphing purposes. In
other words, it facilitates creating a graph that shows the RMSE at
each epoch.
Training the network
The training method given below uses online learning, where the
weights are updated after each pattern is presented. For each epoch,
you will iterate through all the training patterns. For each
pattern, you will compute the outputs, compute the delta values, and
update the weights as follows.
First initialize all output values (h and sigma) and delta values of
each hidden and output node to 0. These need to be reset after going
through each of the training patterns.
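A sketch of this reset, assuming the global arrays from the
declaration sketch above:

/* zero h, sigma, and delta for every hidden node and the output
   node; call this before presenting each pattern */
void resetNodes(void)
{
    for (int l = 0; l < numhidden; l++)
        for (int n = 0; n < numnodes[l]; n++)
            hiddenh[l][n] = hiddensigma[l][n] = hiddendelta[l][n] = 0.0;
    outputh = outputsigma = outputdelta = 0.0;
}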
Computing the outputs
The forward pass through the network consists of computing the
outputs of each successive hidden layer and then finally the output
layer. For the hidden layers, iterate through each node of each
hidden layer. For each of these nodes, check to see whether the
current hidden layer is the first hidden layer. If so, then iterate
through the number of inputs (either 2 or 3), and update the output
value (h) of that node as follows
h += hiddenweight * input
where hiddenweight is the weight associated with the current hidden
node and the input node, and input is the value of the current
input node (x, y, or z).
If the current hidden layer is not the first one, then iterate
through all the nodes of the previous hidden layer. Update the
output value (h) of the current hidden node as follows
h += hiddenweight * sigmahiddenprevious
where hiddenweight is the weight associated with the node to be
updated and the node of the previous hidden layer, and
sigmahiddenprevious is the sigma value of the output of the node
from the previous hidden layer.
After computing the output h of the current node by going through
the previous layer (or the input layer), first add to h the bias
weight associated with the current node. Then find the sigma value
of the output h of the current node according to the following
formula
sigmahidden = 1/(1+exp(-h))
Then compute the output at the output node by iterating through the
nodes in the last hidden layer and updating the output value as
follows
output += outputweight * sigmahiddenlast
where outputweight is the weight associated with the output node
and the node from the last hidden layer, and sigmahiddenlast is the
sigma value of the output of the node from the last hidden layer.
Then add the output bias weight to the output value, and then find
the sigma value of this output value.
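Putting these steps together, the forward pass might look like the
following sketch. It assumes the globals from the earlier
declaration sketch and that resetNodes() has already zeroed the h
values.

#include <math.h>

/* forward pass for one pattern; pattern[0..numinputs-1] holds the inputs */
void computeOutputs(double *pattern)
{
    for (int l = 0; l < numhidden; l++) {
        for (int n = 0; n < numnodes[l]; n++) {
            if (l == 0) {
                /* first hidden layer: weighted sum of the inputs */
                for (int k = 0; k < numinputs; k++)
                    hiddenh[l][n] += hiddenweights[l][n][k] * pattern[k];
            } else {
                /* later layers: weighted sum of the previous layer's sigmas */
                for (int k = 0; k < numnodes[l - 1]; k++)
                    hiddenh[l][n] += hiddenweights[l][n][k] * hiddensigma[l - 1][k];
            }
            hiddenh[l][n] += hiddenbias[l][n];
            hiddensigma[l][n] = 1.0 / (1.0 + exp(-hiddenh[l][n]));
        }
    }

    /* output node: weighted sum of the last hidden layer's sigmas */
    for (int k = 0; k < numnodes[numhidden - 1]; k++)
        outputh += outputweights[k] * hiddensigma[numhidden - 1][k];
    outputh += outputbias;
    outputsigma = 1.0 / (1.0 + exp(-outputh));
}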
Computing the delta values
The backward pass through the network involves computing the delta
values first at the output layer then backwards from the last hidden
layer to the first hidden layer.
Compute delta of the output node as
deltaoutput = sigmaoutput * (1-sigmaoutput) * (expectedoutput-sigmaoutput)
where sigmaoutput is the sigma value of the output node and
expectedoutput is the expected output of the current pattern (this
is given in the training data file).
To compute the delta values for the hidden nodes, iterate through
each node of each hidden layer. Be sure to iterate through the
layers in reverse order. For each of these nodes, check whether the
current hidden layer is the last hidden layer. If so, then compute
delta for that node as follows
deltahidden = sigmahidden * (1-sigmahidden) * deltaout * outputweight
where sigmahidden is the sigma value of the output h of the current
hidden node, deltaout is the delta value of the output node, and
outputweight is the weight associated with the current hidden node
and the output node.
If the current hidden layer is not the last hidden layer, then
iterate through the number of nodes in the next (forward) hidden
layer and update delta for the current hidden node as follows
deltahidden += sigmahidden * (1-sigmahidden) * deltahiddennext * hiddenweightnext
where sigmahidden is the sigma value of the output h of the current
hidden node, deltahiddennext is the delta value of the node from
the next hidden layer, and hiddenweightnext is the weight
associated with the node from the current hidden layer and the node
from the next hidden layer.
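A sketch of the backward pass under the same assumptions (note that
the loop over layers runs in reverse, and that the deltas for
non-last layers accumulate over the next layer's nodes):

/* backward pass: compute deltas from the output layer back to layer 0 */
void computeDeltas(double expectedoutput)
{
    outputdelta = outputsigma * (1.0 - outputsigma)
                  * (expectedoutput - outputsigma);

    for (int l = numhidden - 1; l >= 0; l--) {
        for (int n = 0; n < numnodes[l]; n++) {
            double s = hiddensigma[l][n];
            if (l == numhidden - 1) {
                /* last hidden layer: one connection to the output node */
                hiddendelta[l][n] = s * (1.0 - s) * outputdelta * outputweights[n];
            } else {
                /* earlier layers: sum over the next (forward) layer */
                for (int k = 0; k < numnodes[l + 1]; k++)
                    hiddendelta[l][n] += s * (1.0 - s)
                        * hiddendelta[l + 1][k] * hiddenweights[l + 1][k][n];
            }
        }
    }
}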
Updating the weights
For updating the hidden layers, iterate through each node of each
hidden layer. For each node, check to see whether or not the current
hidden layer is the first hidden layer. If so, then iterate through
the number of inputs (2 or 3) and update the weight of the hidden
node as follows
hiddenweight += learningrate * deltahidden * input
where deltahidden is the delta value of the current hidden node and
input is the current input value.
If the current hidden layer is not the first layer, then iterate
through the nodes of the previous hidden layer and update the weight
of the hidden node as follows
hiddenweight += learningrate * deltahidden * sigmahiddenprevious
where deltahidden is the delta value of the current hidden node and
sigmahiddenprevious is the sigma value of the output h of the node
from the previous hidden layer.
After updating the weights of the current hidden node, update the
bias weight of the current hidden node as follows
biasweighthidden += learningrate * deltahidden
where deltahidden is the delta value of the current hidden node.
This assumes that the bias value is 1.
To update the weights of the output layer, iterate through the nodes
in the last hidden layer and update the weight as follows
outputweight += learningrate * deltaout * sigmahiddenprevious
where deltaout is the delta value of the output node and
sigmahiddenprevious is the sigma value of the output h of the
hidden node from the previous layer.
Then update the output bias weight as follows
biasweightoutput += learningrate * deltaout
where deltaout is the delta value of the output node. This also
assumes that the bias value is 1.
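A sketch of the update step, again assuming the globals from the
earlier sketches (learningrate is the learning rate read in at
startup):

/* update all weights and biases using the deltas just computed */
void updateWeights(double *pattern)
{
    for (int l = 0; l < numhidden; l++) {
        for (int n = 0; n < numnodes[l]; n++) {
            if (l == 0) {
                for (int k = 0; k < numinputs; k++)
                    hiddenweights[l][n][k] +=
                        learningrate * hiddendelta[l][n] * pattern[k];
            } else {
                for (int k = 0; k < numnodes[l - 1]; k++)
                    hiddenweights[l][n][k] +=
                        learningrate * hiddendelta[l][n] * hiddensigma[l - 1][k];
            }
            /* bias update (bias value assumed to be 1) */
            hiddenbias[l][n] += learningrate * hiddendelta[l][n];
        }
    }

    for (int k = 0; k < numnodes[numhidden - 1]; k++)
        outputweights[k] += learningrate * outputdelta * hiddensigma[numhidden - 1][k];
    outputbias += learningrate * outputdelta;
}

With these pieces, trainNet() is just a loop over the training
patterns that calls resetNodes(), computeOutputs(), computeDeltas(),
and updateWeights() in order for each pattern.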
Testing the network
After each epoch of the training, the network is tested by iterating
through all the patterns of the testing set and computing the output
of each hidden and output node as in the procedure for computing
outputs during training. For evaluating how well the network
performs, calculate the root mean square error of the testing data.
After presenting a testing pattern to the network, a sum is
accumulated
sum += (expectedoutput-sigmaoutput)^2
where expectedoutput is the expected output of the current testing
pattern and sigmaoutput is the sigma value of the output node.
After presenting all the testing patterns, calculate the root mean
square error as
rmse = sqrt((1/(2*numtestingpatterns))*sum)
Print to the screen or write to a file this rmse value after each
training epoch. You can use this data to generate graphs for your
report.
After the training process, apply this same technique to the
validation data.
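For illustration, testNet() might look like the sketch below;
evaluateNet() is the same loop run once over the validation
patterns. It assumes the globals and helper functions from the
earlier sketches, plus a hypothetical numtestpatterns count.

#include <stdio.h>
#include <math.h>

/* one pass over the testing set; prints the epoch number and RMSE */
void testNet(int epoch)
{
    double sum = 0.0;
    for (int p = 0; p < numtestpatterns; p++) {
        resetNodes();
        computeOutputs(testpatterns[p]);
        double err = testpatterns[p][numinputs] - outputsigma;
        sum += err * err;
    }
    double rmse = sqrt(sum / (2.0 * numtestpatterns));
    printf("%d %f\n", epoch, rmse);
}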
You might need to iterate for quite a large number of epochs (at
least 10,000, possibly up to 50,000 or more) to get lower error
values. In some cases, I got errors of around 0.17, sometimes lower,
such as 0.01. The error might stay steady for a long time, then
after many epochs, suddenly drop.
Experiments and report
Try several experiments testing combinations of different numbers
of hidden layers, different numbers of neurons in each hidden
layer, and different learning rates. Make graphs of the RMSE (and
any other measure you use) over time (epochs), and include these
graphs in your report along with a discussion of what network
architecture seems optimal for the problem and how changing the
architecture affects the network's performance. Do all this for
both Problems 1 and 2 (and the graduate part for 427 and 527
students).
Be sure to evaluate performance on both the testing sets and the
validation sets.