Project 4 - Backpropagation
Data files
Each line of a data file is one pattern: the input values x and y
(plus z for Problem 2), followed by the expected output value as the
last number on that line.
For Problem 1 (2 inputs)
training1.txt
testing1.txt
validation1.txt
For Problem 2 (3 inputs)
training2.txt
testing2.txt
validation2.txt
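As a rough sketch of reading one pattern, the function below pulls the inputs and the expected output out of a single line. It assumes the numbers are separated by whitespace, which this handout does not actually state, and the name parse_pattern is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse one line holding `numinputs` input values followed by the
 * expected output. Assumes whitespace-separated numbers (an assumption,
 * not something the data file description guarantees).
 * Returns 1 on success, 0 if the line holds too few numbers. */
int parse_pattern(const char *line, int numinputs,
                  double *inputs, double *expected)
{
    assert(numinputs > 0);
    const char *p = line;
    char *end;
    for (int i = 0; i <= numinputs; i++) {
        double v = strtod(p, &end);
        if (end == p)            /* no number could be read here */
            return 0;
        if (i < numinputs)
            inputs[i] = v;
        else
            *expected = v;       /* last number on the line */
        p = end;
    }
    return 1;
}
```

Calling this once per line read with fgets() would fill the training, testing, and validation arrays.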
Inputs to the program
Your program should accept the following either on the command line or
from standard input:
- number of hidden layers
- number of neurons in each hidden layer
- learning rate
- training, testing, and validation data files
- number of training epochs
It may help to also accept the number of inputs to the network (2 for
Problem 1 and 3 for Problem 2) for reading in all the data files.
Note that each hidden layer can have a different number of nodes.
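One possible way to collect these settings is sketched below. The NetConfig struct, the parse_config name, and the argument order are all assumptions made for illustration; the handout does not prescribe any of them, and the data file names are left out to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the run-time settings listed above. */
typedef struct {
    int numlayers;
    int *sizes;           /* sizes[l] = neurons in hidden layer l */
    double learningrate;
    int numepochs;
} NetConfig;

/* Assumed argument order (not prescribed by the handout):
 *   numlayers size_0 ... size_{numlayers-1} learningrate numepochs
 * args[0] is the first of these values (i.e. argv + 1 from main).
 * Returns 1 on success, 0 on a malformed argument list. */
int parse_config(int nargs, char **args, NetConfig *cfg)
{
    assert(args != NULL && cfg != NULL);
    if (nargs < 1)
        return 0;
    cfg->numlayers = atoi(args[0]);
    if (cfg->numlayers < 1 || nargs < cfg->numlayers + 3)
        return 0;
    cfg->sizes = malloc(cfg->numlayers * sizeof *cfg->sizes);
    if (!cfg->sizes)
        return 0;
    for (int l = 0; l < cfg->numlayers; l++) {
        cfg->sizes[l] = atoi(args[1 + l]);   /* one size per layer */
        if (cfg->sizes[l] < 1) {
            free(cfg->sizes);
            return 0;
        }
    }
    cfg->learningrate = atof(args[1 + cfg->numlayers]);
    cfg->numepochs = atoi(args[2 + cfg->numlayers]);
    return 1;
}
```

With this layout, the arguments 2 4 3 0.5 10000 would describe two hidden layers of 4 and 3 neurons, a learning rate of 0.5, and 10000 epochs.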
Representing the neural network
The network can be represented as a three-dimensional array for the
hidden layers and a one-dimensional array for the output layer. The
first dimension of the hidden layer array gives the layer, the second
dimension gives the neuron in a given layer, and the third dimension
denotes the weight associated with an input node or a node from the
previous hidden layer. The output layer array holds the weights
associated with the nodes in the last hidden layer. Each neuron in the
hidden and output layers also needs a bias weight. These can be stored
as separate arrays or as the 0th elements of the hidden and output
arrays (I used separate arrays for the biases). Note that the words neuron
and node are used interchangeably.
Here is an example of a neural network (does not include the biases):
The leftmost (smaller) two nodes are the input nodes, the rightmost node is
the output node, and the rest of the nodes are the hidden nodes of the three
hidden layers. The input nodes are connected to the first hidden layer, and
the output node is connected to the last hidden layer. Suppose
hiddenweights and outputweights are the names of
the arrays storing the weights of the network. The indexing scheme of the
weight arrays using the colored examples in the diagram is as follows
(assume indexing starts at 0):
red: hiddenweights[0][1][0] (hidden layer 0, node 1, input 0)
green: hiddenweights[1][1][0] (hidden layer 1, node 1, previous hidden layer's node 0)
blue: hiddenweights[2][0][2] (hidden layer 2, node 0, previous hidden layer's node 2)
magenta: outputweights[0] (last hidden layer's node 0)
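Because each hidden layer can have a different number of nodes, the three-dimensional array is most naturally a jagged one. The sketch below shows one way to allocate it; the function names and the sizes array (sizes[l] = number of nodes in hidden layer l) are assumptions for illustration, not part of the assignment.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate hiddenweights[layer][node][source] as a jagged array.
 * Layer 0 has one weight per network input; every later layer has one
 * weight per node of the previous hidden layer. All weights start at 0.
 * Returns NULL on allocation failure (memory allocated before the
 * failure is not reclaimed in this sketch). */
double ***alloc_hidden_weights(int numlayers, const int *sizes, int numinputs)
{
    assert(numlayers > 0 && numinputs > 0);
    double ***w = malloc(numlayers * sizeof *w);
    if (!w)
        return NULL;
    for (int l = 0; l < numlayers; l++) {
        int fanin = (l == 0) ? numinputs : sizes[l - 1];
        w[l] = malloc(sizes[l] * sizeof *w[l]);
        if (!w[l])
            return NULL;
        for (int n = 0; n < sizes[l]; n++) {
            w[l][n] = calloc(fanin, sizeof *w[l][n]);
            if (!w[l][n])
                return NULL;
        }
    }
    return w;
}

/* Release everything allocated by alloc_hidden_weights(). */
void free_hidden_weights(double ***w, int numlayers, const int *sizes)
{
    for (int l = 0; l < numlayers; l++) {
        for (int n = 0; n < sizes[l]; n++)
            free(w[l][n]);
        free(w[l]);
    }
    free(w);
}
```

The output layer needs only a plain array with one entry per node of the last hidden layer, and the bias arrays can be allocated the same way with one entry per node.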
You will also need a way of keeping track of the output (h and sigma)
and the delta values for each hidden node and the output node.
You will probably also want arrays for storing the data from the training,
testing, and validation data files.
Initializing the weights
Each weight in the network (hidden and output layers) should be initialized
to a random number between -0.1 and 0.1. This is also true for all of the
bias weights (hidden or output). Note that each node (hidden or output) only
has a single bias weight associated with it.
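A minimal sketch of this initialization using the standard rand() function follows. The helper names are made up, and the sketch assumes you call srand() once at program start if you want different weights on each run.

```c
#include <assert.h>
#include <stdlib.h>

/* Return a pseudo-random value in the range [-0.1, 0.1]. */
double random_weight(void)
{
    /* rand() / RAND_MAX lies in [0, 1]; scale to [0, 0.2], then shift. */
    return ((double)rand() / RAND_MAX) * 0.2 - 0.1;
}

/* Fill `count` weights (or bias weights) with random starting values. */
void init_weights(double *w, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        w[i] = random_weight();
}
```

The same function can be applied to each row of the hidden weight array, to the output weight array, and to the bias arrays.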
Overall logical flow
After parsing the command line, reading the input files, allocating the
data structures, and initializing the network weights, main() should have
this overall logic:
for(i=0; i<numepochs; i++){
    trainNet();
    testNet(i);
}
evaluateNet();
evaluateNet() runs the trained network on the validation data.
testNet() takes the epoch number as a parameter so that the RMSE it
computes can be associated with that epoch for data gathering and
graphing purposes. In other words, it makes it easy to create a graph
that shows the RMSE at each epoch.
Training the network
The training method given below uses online learning, where the weights
are updated after each pattern is presented. For each epoch, you
will iterate through all the training patterns. For each pattern, you
will compute the outputs, compute the delta values, and update the
weights as follows.
First initialize all output values (h and sigma) and delta values of
each hidden and output node to 0. These must be reset before each
training pattern is presented, because the formulas below accumulate
into them with +=.
Computing the outputs
The forward pass through the network consists of computing the outputs
of each successive hidden layer and then finally the output layer. For
the hidden layers, iterate through each node of each hidden layer. For
each of these nodes, check to see whether the current hidden layer is
the first hidden layer. If so, then iterate through the inputs
(either 2 or 3 of them), and update the output value (h) of that node as
follows
h += hiddenweight * input
where hiddenweight is the weight associated with the
current hidden node and the input node, and input is the value of
the current input node (x, y, or z).
If the current hidden layer is not the first one, then iterate through
all the nodes of the previous hidden layer. Update the output value
(h)
of the current hidden node as follows
h += hiddenweight * sigmahiddenprevious
where hiddenweight is the weight associated with the node
to be updated and the node of the previous hidden layer, and
sigmahiddenprevious is the sigma value of the output of the
node from the previous hidden layer.
After computing the output h of the current node by going
through the previous layer (or the input layer), first add to h
the bias weight associated with the current node. Then find the sigma value
of the output h of the current node according to the following
formula
sigmahidden = 1/(1+exp(-h))
Then compute the output at the output node by iterating through the
nodes in the last hidden layer and updating the output value as follows
output += outputweight * sigmahiddenlast
where outputweight is the weight associated with the output node
and the node from the last hidden layer, and sigmahiddenlast is
the sigma value of the output of the node from the last hidden layer.
Then add the output bias weight to the output value, and then find the sigma
value of this output value.
Computing the delta values
The backward pass through the network involves computing the delta
values first at the output layer then backwards from the last hidden
layer to the first hidden layer.
Compute delta of the output node as
deltaout = sigmaoutput * (1-sigmaoutput) * (expectedoutput-sigmaoutput)
where sigmaoutput is the sigma value of the output node and
expectedoutput is the expected output of the current
pattern (this is given in the training data file).
To compute the delta values for the hidden nodes, iterate through each
node of each hidden layer. Be sure to iterate through the layers in reverse
order. For each of these nodes, check whether the current hidden layer is
the last hidden layer. If so, then compute delta for that node as follows
deltahidden = sigmahidden * (1-sigmahidden) * deltaout * outputweight
where sigmahidden is the sigma value of the output
h of the current hidden node, deltaout is the delta
value of the output node, and outputweight is the weight
associated with the current hidden node and the output node.
If the current hidden layer is not the last hidden layer, then iterate
through the nodes of the next (forward) hidden layer and
update delta for the current hidden node as follows
deltahidden += sigmahidden * (1-sigmahidden) * deltahiddennext * hiddenweightnext
where sigmahidden is the sigma value of the output
h of the current hidden node, deltahiddennext is
the delta value of the node from the next hidden layer, and
hiddenweightnext is the weight associated with the node from the
current hidden layer and the node from the next hidden layer.
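Here is a sketch of the backward pass under the indexing scheme described earlier, where hiddenweights[l + 1][k][n] is the weight from node n of hidden layer l into node k of hidden layer l + 1. The function names and the sizes array (nodes per hidden layer) are assumptions for illustration.

```c
#include <assert.h>

/* Delta of the output node. */
double output_delta(double sigmaoutput, double expectedoutput)
{
    return sigmaoutput * (1.0 - sigmaoutput) * (expectedoutput - sigmaoutput);
}

/* Deltas of all hidden nodes, visiting the layers from last to first so
 * that the deltas of layer l + 1 are ready when layer l needs them. */
void hidden_deltas(double ***hiddenweights, const double *outputweights,
                   const int *sizes, int numlayers,
                   double **sigmahidden, double deltaout,
                   double **deltahidden)
{
    assert(numlayers > 0);
    for (int l = numlayers - 1; l >= 0; l--) {
        for (int n = 0; n < sizes[l]; n++) {
            double s = sigmahidden[l][n];
            double slope = s * (1.0 - s);
            double d = 0.0;              /* delta starts at 0 per pattern */
            if (l == numlayers - 1) {
                /* last hidden layer: fed back from the output node */
                d = slope * deltaout * outputweights[n];
            } else {
                /* accumulate over the nodes of the next hidden layer */
                for (int k = 0; k < sizes[l + 1]; k++)
                    d += slope * deltahidden[l + 1][k]
                               * hiddenweights[l + 1][k][n];
            }
            deltahidden[l][n] = d;
        }
    }
}
```

Note that all delta values are computed from the weights as they were during the forward pass; the weights are changed only afterwards.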
Updating the weights
For updating the hidden layers, iterate through each node of each hidden
layer. For each node, check whether the current hidden
layer is the first hidden layer. If so, then iterate through the
inputs (2 or 3) and update each weight of the hidden node as follows
hiddenweight += learningrate * deltahidden * input
where deltahidden is the delta value of the current hidden
node and input is the current input value.
If the current hidden layer is not the first layer, then iterate through
the nodes of the previous hidden layer and update the weight of the
hidden node as follows
hiddenweight += learningrate * deltahidden * sigmahiddenprevious
where deltahidden is the delta value of the current hidden
node and sigmahiddenprevious is the sigma value of the output
h of the node from the previous hidden layer.
After updating the weights of the current hidden node, update the bias weight
of the current hidden node as follows
biasweighthidden += learningrate * deltahidden
where deltahidden is the delta value of the current hidden
node. This assumes that the bias value is 1.
To update the weights of the output layer, iterate through the nodes in
the last hidden layer and update the weight as follows
outputweight += learningrate * deltaout * sigmahiddenlast
where deltaout is the delta value of the output node and
sigmahiddenlast is the sigma value of the output
h of the node from the last hidden layer.
Then update the output bias weight as follows
biasweightoutput += learningrate * deltaout
where deltaout is the delta value of the output node. This also
assumes that the bias value is 1.
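The update rules above might be combined into a single routine like the sketch below. As before, the function name, the sizes array (nodes per hidden layer), and the separate bias arrays are assumptions; the routine expects the sigma and delta values for the current pattern to have been computed already.

```c
#include <assert.h>

/* Online update of every weight and bias weight for one pattern. */
void update_weights(double ***hiddenweights, double **hiddenbias,
                    double *outputweights, double *outputbias,
                    const int *sizes, int numlayers,
                    const double *inputs, int numinputs,
                    double **sigmahidden, double **deltahidden,
                    double deltaout, double learningrate)
{
    assert(numlayers > 0);
    for (int l = 0; l < numlayers; l++) {
        /* first hidden layer is fed by the inputs, later layers by the
         * sigma values of the previous hidden layer */
        const double *src = (l == 0) ? inputs : sigmahidden[l - 1];
        int nsrc = (l == 0) ? numinputs : sizes[l - 1];
        for (int n = 0; n < sizes[l]; n++) {
            for (int s = 0; s < nsrc; s++)
                hiddenweights[l][n][s] +=
                    learningrate * deltahidden[l][n] * src[s];
            /* bias value is taken to be 1 */
            hiddenbias[l][n] += learningrate * deltahidden[l][n];
        }
    }
    int last = numlayers - 1;
    for (int n = 0; n < sizes[last]; n++)
        outputweights[n] += learningrate * deltaout * sigmahidden[last][n];
    *outputbias += learningrate * deltaout;
}
```

For each training pattern, this would be called after the forward pass and the delta computation.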
Testing the network
After each epoch of the training, the network is tested by iterating
through all the patterns of the testing set and computing the output of
each hidden and output node as in the procedure for computing outputs
during training. For evaluating how well the network performs,
calculate the root mean square error of the testing data. After
presenting a testing pattern to the network a sum is accumulated
sum += (expectedoutput-sigmaoutput)^2
where expectedoutput is the expected output of the current
testing pattern and sigmaoutput is the sigma value of the
output node.
After presenting all the testing patterns, calculate the root mean
square error as
rmse = sqrt((1/(2*numtestingpatterns))*sum)
Print to the screen or write to a file this rmse value after each
training epoch. You can use this data to generate graphs for your
report.
After the training process, apply this same technique to the validation
data.
You might need to iterate for quite a large number of epochs (at least
10,000, possibly up to 50,000 or more) to get lower error values. In some
cases, I got errors of around 0.17, sometimes lower, such as 0.01. The
error might stay steady for a long time, then after many epochs, suddenly
drop.
Experiments and report
Try several experiments testing combinations of a different number of
hidden layers, a different number of neurons in each hidden layer, and
different learning rates. Make graphs of the rmse (and any other
measure you use) over time (epochs). Include these graphs in your report,
along with a discussion of which network architecture seems optimal for
the problem and how changing the network architecture affects the
network's performance. Do all this for both Problems 1 and 2 (and the
graduate part for 527 students). Be sure to evaluate performance on
both the testing sets and the validation sets.