Project 4 - Backpropagation

Data files

Each line of the file is one pattern: the input values x and y (and z for Problem 2), followed by the expected output value as the last number on the line.

For Problem 1 (2 inputs)

training1.txt
testing1.txt
validation1.txt

For Problem 2 (3 inputs)

training2.txt
testing2.txt
validation2.txt
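
For reference, here is a minimal sketch in C of reading one of these files, assuming the number of patterns in the file is known in advance; the array names trainpatterns and trainexpected are hypothetical (their declarations are sketched later), and numinputs is 2 or 3 depending on the problem:

  /* requires <stdio.h>; trainpatterns[p][i] holds input i of pattern p, */
  /* trainexpected[p] holds the expected output of pattern p             */
  FILE *fp = fopen("training1.txt", "r");
  int p, i;
  for(p=0; p<numpatterns; p++){
    for(i=0; i<numinputs; i++)
      fscanf(fp, "%lf", &trainpatterns[p][i]);
    fscanf(fp, "%lf", &trainexpected[p]);  /* expected output is last on the line */
  }
  fclose(fp);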

Inputs to the program

Your program should accept the following either on the command line or from standard input:

the number of hidden layers
the number of nodes in each hidden layer
the learning rate
the number of training epochs

It may help to also accept the number of inputs to the network (2 for Problem 1 and 3 for Problem 2) for reading in all the data files.

Note that each hidden layer can have a different number of nodes.
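
A sketch of reading these from the command line in C; the argument order here is only a suggestion, not required by the assignment:

  /* usage sketch: ./backprop numinputs numlayers nodes... learningrate numepochs */
  /* requires <stdlib.h> for atoi(), atof(), and malloc()                         */
  int numinputs = atoi(argv[1]);
  int numlayers = atoi(argv[2]);
  int *numnodes = malloc(numlayers * sizeof(int));
  for(int l=0; l<numlayers; l++)
    numnodes[l] = atoi(argv[3+l]);      /* one node count per hidden layer */
  double learningrate = atof(argv[3+numlayers]);
  int numepochs = atoi(argv[4+numlayers]);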

Representing the neural network

The network can be represented as a three-dimensional array for the hidden layers and a one-dimensional array for the output layer. The first dimension of the hidden layer array gives the layer, the second dimension gives the neuron within a given layer, and the third dimension denotes the weight associated with an input node or a node from the previous hidden layer. The output layer array holds the weights associated with the nodes in the last hidden layer. Each neuron in the hidden and output layers also needs a bias weight. These can be stored as separate arrays or as the 0th elements of the hidden and output arrays (I used separate arrays for the biases). Note that the words neuron and node are used interchangeably.
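
As a concrete illustration, here is a minimal declaration and allocation sketch in C; the names hiddenweights and outputweights follow the text, while hiddenbias, outputbias, numlayers, numnodes, and numinputs are hypothetical:

  /* requires <stdlib.h>; numlayers, numnodes[], and numinputs hold the */
  /* network dimensions read in as program inputs                       */
  double ***hiddenweights;  /* [layer][node][input or previous-layer node] */
  double *outputweights;    /* [node of the last hidden layer] */
  double **hiddenbias;      /* one bias weight per hidden node */
  double outputbias;        /* one bias weight for the output node */
  int l, n, fanin;

  hiddenweights = malloc(numlayers * sizeof(double **));
  hiddenbias = malloc(numlayers * sizeof(double *));
  for(l=0; l<numlayers; l++){
    fanin = (l==0) ? numinputs : numnodes[l-1];   /* weights into layer l */
    hiddenweights[l] = malloc(numnodes[l] * sizeof(double *));
    hiddenbias[l] = malloc(numnodes[l] * sizeof(double));
    for(n=0; n<numnodes[l]; n++)
      hiddenweights[l][n] = malloc(fanin * sizeof(double));
  }
  outputweights = malloc(numnodes[numlayers-1] * sizeof(double));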

Here is an example of a neural network (does not include the biases):

[Diagram: two input nodes on the left feed three hidden layers, whose last layer feeds a single output node; four example weights are highlighted in red, green, blue, and magenta.]
The two leftmost (smaller) nodes are the input nodes, the rightmost node is the output node, and the remaining nodes are the hidden nodes of the three hidden layers. The input nodes are connected to the first hidden layer, and the output node is connected to the last hidden layer. Suppose hiddenweights and outputweights are the names of the arrays storing the weights of the network. Using the colored examples in the diagram, the indexing scheme of the weight arrays is as follows (assume indexing starts at 0):

red: hiddenweights[0][1][0] (hidden layer 0, node 1, input 0)
green: hiddenweights[1][1][0] (hidden layer 1, node 1, previous hidden layer's node 0)
blue: hiddenweights[2][0][2] (hidden layer 2, node 0, previous hidden layer's node 2)
magenta: outputweights[0] (last hidden layer's node 0)

You will also need a way of keeping track of the output (h and sigma) and the delta values for each hidden node and the output node.

You will probably also want arrays for storing the data from the training, testing, and validation data files.
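
Continuing the sketch above, the per-node values and the file data could be stored as follows (all names hypothetical, mirroring the weight arrays):

  double **h;      /* h[layer][node]: pre-sigmoid output of each hidden node */
  double **sigma;  /* sigma[layer][node]: sigmoid of h for each hidden node */
  double **delta;  /* delta[layer][node]: delta value of each hidden node */
  double outputh, outputsigma, outputdelta;  /* same three values for the output node */

  double **trainpatterns;  /* trainpatterns[pattern][input], one row per file line */
  double *trainexpected;   /* expected output of each training pattern */
  /* ...plus similar arrays for the testing and validation data */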

Initializing the weights

Each weight in the network (hidden and output layers) should be initialized to a random number between -0.1 and 0.1. This is also true for all of the bias weights (hidden or output). Note that each node (hidden or output) only has a single bias weight associated with it.
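
A sketch in C of drawing one such weight, assuming rand() has been seeded once with srand():

  /* requires <stdlib.h> */
  double randweight(void){
    return ((double)rand()/RAND_MAX) * 0.2 - 0.1;  /* uniform in [-0.1, 0.1] */
  }

Assign randweight() to every element of hiddenweights, outputweights, and the bias arrays before training begins.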

Overall logical flow

After reading the command line, input files, allocating data structures, and initializing the network weights, main() should have this overall logic:
  for(i=0; i<numepochs; i++){
    trainNet();
    testNet(i);
  }
  evaluateNet();

evaluateNet() refers to validation. The epoch number is passed to testNet() so that the RMSE computed there can be associated with an epoch for data gathering and graphing purposes. In other words, it facilitates creating a graph that shows the RMSE at each epoch.

Training the network

The training method given below uses online learning, where the weights are updated after each pattern is presented. For each epoch, you will iterate through all the training patterns. For each pattern, you will compute the outputs, compute the delta values, and update the weights as follows.

First initialize all output values (h and sigma) and delta values of each hidden and output node to 0. These need to be reset after going through each of the training patterns.
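
A sketch of this reset in C, continuing the names above:

  int l, n;
  for(l=0; l<numlayers; l++)
    for(n=0; n<numnodes[l]; n++)
      h[l][n] = sigma[l][n] = delta[l][n] = 0;
  outputh = outputsigma = outputdelta = 0;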

Computing the outputs

The forward pass through the network consists of computing the outputs of each successive hidden layer and then finally the output layer. For the hidden layers, iterate through each node of each hidden layer. For each of these nodes, check to see whether the current hidden layer is the first hidden layer. If so, then iterate through the number of inputs (either 2 or 3), and update the output value (h) of that node as follows

h += hiddenweight * input

where hiddenweight is the weight associated with the current hidden node and the input node, and input is the value of the current input node (x, y, or z).

If the current hidden layer is not the first one, then iterate through all the nodes of the previous hidden layer. Update the output value (h) of the current hidden node as follows

h += hiddenweight * sigmahiddenprevious

where hiddenweight is the weight associated with the node to be updated and the node of the previous hidden layer, and sigmahiddenprevious is the sigma value of the output of the node from the previous hidden layer.

After computing the output h of the current node by going through the previous layer (or the input layer), first add to h the bias weight associated with the current node. Then find the sigma value of the output h of the current node according to the following formula

sigmahidden = 1/(1+exp(-h))

Then compute the output at the output node by iterating through the nodes in the last hidden layer and updating the output value as follows

output += outputweight * sigmahiddenlast

where outputweight is the weight associated with the output node and the node from the last hidden layer, and sigmahiddenlast is the sigma value of the output of the node from the last hidden layer.

Then add the output bias weight to the output value, and then find the sigma value of this output value.
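
Putting the forward pass together, here is a sketch in C for one pattern, using the array names from the sketches above; x holds the current pattern's input values (e.g. x = trainpatterns[p]), and h and sigma are assumed reset to 0:

  /* requires <math.h> for exp() */
  int l, n, i, p;
  for(l=0; l<numlayers; l++){
    for(n=0; n<numnodes[l]; n++){
      if(l==0){                        /* first hidden layer reads the inputs */
        for(i=0; i<numinputs; i++)
          h[l][n] += hiddenweights[l][n][i] * x[i];
      }
      else{                            /* later layers read the previous layer's sigmas */
        for(p=0; p<numnodes[l-1]; p++)
          h[l][n] += hiddenweights[l][n][p] * sigma[l-1][p];
      }
      h[l][n] += hiddenbias[l][n];     /* add the node's bias weight */
      sigma[l][n] = 1/(1+exp(-h[l][n]));
    }
  }

  /* output node */
  outputh = 0;
  for(p=0; p<numnodes[numlayers-1]; p++)
    outputh += outputweights[p] * sigma[numlayers-1][p];
  outputh += outputbias;
  outputsigma = 1/(1+exp(-outputh));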

Computing the delta values

The backward pass through the network involves computing the delta values first at the output layer then backwards from the last hidden layer to the first hidden layer.

Compute delta of the output node as

deltaoutput = sigmaoutput * (1-sigmaoutput) * (expectedoutput-sigmaoutput)

where sigmaoutput is the sigma value of the output node and expectedoutput is the expected output of the current pattern (this is given in the training data file).

To compute the delta values for the hidden nodes, iterate through each node of each hidden layer. Be sure to iterate through the layers in reverse order. For each of these nodes, check whether the current hidden layer is the last hidden layer. If so, then compute delta for that node as follows

deltahidden = sigmahidden * (1-sigmahidden) * deltaout * outputweight

where sigmahidden is the sigma value of the output h of the current hidden node, deltaout is the delta value of the output node, and outputweight is the weight associated with the current hidden node and the output node.

If the current hidden layer is not the last hidden layer, then iterate through the number of nodes in the next (forward) hidden layer and update delta for the current hidden node as follows

deltahidden += sigmahidden * (1-sigmahidden) * deltahiddennext * hiddenweightnext

where sigmahidden is the sigma value of the output h of the current hidden node, deltahiddennext is the delta value of the node from the next hidden layer, and hiddenweightnext is the weight associated with the node from the current hidden layer and the node from the next hidden layer.
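
A sketch in C of the backward pass, continuing the names above; expectedoutput is the expected value of the current pattern, and delta is assumed reset to 0:

  int l, n, f;

  /* delta at the output node */
  outputdelta = outputsigma * (1-outputsigma) * (expectedoutput-outputsigma);

  /* deltas at the hidden nodes, iterating through the layers in reverse */
  for(l=numlayers-1; l>=0; l--){
    for(n=0; n<numnodes[l]; n++){
      if(l==numlayers-1){    /* last hidden layer feeds the output node */
        delta[l][n] = sigma[l][n] * (1-sigma[l][n]) * outputdelta * outputweights[n];
      }
      else{                  /* other layers sum over the next layer's nodes */
        for(f=0; f<numnodes[l+1]; f++)
          delta[l][n] += sigma[l][n] * (1-sigma[l][n]) * delta[l+1][f] * hiddenweights[l+1][f][n];
      }
    }
  }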

Updating the weights

For updating the hidden layers, iterate through each node of each hidden layer. For each node, check whether the current hidden layer is the first hidden layer. If so, then iterate through the number of inputs (2 or 3) and update the weights of the hidden node as follows

hiddenweight += learningrate * deltahidden * input

where deltahidden is the delta value of the current hidden node and input is the current input value.

If the current hidden layer is not the first layer, then iterate through the nodes of the previous hidden layer and update the weight of the hidden node as follows

hiddenweight += learningrate * deltahidden * sigmahiddenprevious

where deltahidden is the delta value of the current hidden node and sigmahiddenprevious is the sigma value of the output h of the node from the previous hidden layer.

After updating the weights of the current hidden node, update the bias weight of the current hidden node as follows

biasweighthidden += learningrate * deltahidden

where deltahidden is the delta value of the current hidden node. This assumes that the bias value is 1.

To update the weights of the output layer, iterate through the nodes in the last hidden layer and update the weight as follows

outputweight += learningrate * deltaout * sigmahiddenprevious

where deltaout is the delta value of the output node and sigmahiddenprevious is the sigma value of the output h of the hidden node from the previous layer.

Then update the output bias weight as follows

biasweightoutput += learningrate * deltaout

where deltaout is the delta value of the output node. This also assumes that the bias value is 1.
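
A sketch in C of the weight updates, continuing the names above; learningrate is the rate read in as a program input, and x again holds the current pattern's inputs:

  int l, n, i, p;

  /* update the hidden-layer weights and bias weights */
  for(l=0; l<numlayers; l++){
    for(n=0; n<numnodes[l]; n++){
      if(l==0){
        for(i=0; i<numinputs; i++)
          hiddenweights[l][n][i] += learningrate * delta[l][n] * x[i];
      }
      else{
        for(p=0; p<numnodes[l-1]; p++)
          hiddenweights[l][n][p] += learningrate * delta[l][n] * sigma[l-1][p];
      }
      hiddenbias[l][n] += learningrate * delta[l][n];  /* bias value assumed to be 1 */
    }
  }

  /* update the output-layer weights and bias weight */
  for(p=0; p<numnodes[numlayers-1]; p++)
    outputweights[p] += learningrate * outputdelta * sigma[numlayers-1][p];
  outputbias += learningrate * outputdelta;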

Testing the network

After each epoch of training, the network is tested by iterating through all the patterns of the testing set and computing the output of each hidden and output node, following the same procedure used for computing outputs during training. To evaluate how well the network performs, calculate the root mean square error over the testing data. After presenting a testing pattern to the network, a sum is accumulated

sum += (expectedoutput - sigmaoutput)^2

where expectedoutput is the expected output of the current testing pattern and sigmaoutput is the sigma value of the output node.

After presenting all the testing patterns, calculate the root mean square error as

rmse = sqrt((1/(2*numtestingpatterns))*sum)

Print to the screen or write to a file this rmse value after each training epoch. You can use this data to generate graphs for your report.
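
A sketch in C of testNet(), assuming a hypothetical helper forwardPass() that runs the forward pass from the training section on one pattern and returns the output node's sigma value; epoch is the parameter passed in from main(), and testpatterns and testexpected are the testing-data arrays:

  /* requires <math.h> for sqrt() and <stdio.h> for printf() */
  int p;
  double err, rmse, sum = 0;
  for(p=0; p<numtestingpatterns; p++){
    err = testexpected[p] - forwardPass(testpatterns[p]);  /* no weight updates here */
    sum += err * err;
  }
  rmse = sqrt((1/(2.0*numtestingpatterns))*sum);
  printf("%d %f\n", epoch, rmse);  /* epoch number and RMSE, ready for graphing */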

After the training process, apply this same technique to the validation data.

You might need to iterate for quite a large number of epochs (at least 10,000, possibly up to 50,000 or more) to get lower error values. In some cases, I got errors of around 0.17, sometimes lower, such as 0.01. The error might stay steady for a long time, then after many epochs, suddenly drop.

Experiments and report

Run several experiments testing combinations of different numbers of hidden layers, different numbers of neurons in each hidden layer, and different learning rates. Make graphs of the RMSE (and any other measure you use) over time (epochs), and include these graphs in your report along with a discussion of which network architecture seems optimal for the problem and how changing the architecture affects the network's performance. Do all this for both Problems 1 and 2 (and the graduate part for 427 and 527 students). Be sure to evaluate performance on both the testing sets and the validation sets.