Project 2 - How to Better Train a Neural Network (Due 09/28)
Objectives:
The objective of this project is to gain an in-depth
understanding of practical issues in training neural networks,
including CNNs. MNIST will be used as the benchmark.
Requirements:
- Task 1: Based on Nielsen's original BPNN code, "network.py",
experiment with the following
- Task 1.1: Effect of cost function using the default network
structure of [784,10] with no hidden layer
- Implement the quadratic cost function with sigmoid as the
activation function and plot the convergence curve
- Implement cross-entropy as the cost function with sigmoid as the
activation function and plot the convergence curve
- Implement log-likelihood as the cost function with softmax as the
output-layer activation function and plot the convergence curve
- In the report, comment on how each cost function behaves with
respect to the learning slowdown problem, and explain what the
learning slowdown problem is. (A sketch of the three cost functions
follows this task.)
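For reference, the three cost functions can be sketched as below, in the
style of the cost classes in Nielsen's network2.py; the class and method
names here are illustrative, and a and y are assumed to be column vectors
as in Nielsen's code.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        return sigmoid(z) * (1.0 - sigmoid(z))

    def softmax(z):
        # used as the output-layer activation when LogLikelihoodCost is chosen
        e = np.exp(z - np.max(z))  # shift for numerical stability
        return e / np.sum(e)

    class QuadraticCost(object):
        @staticmethod
        def fn(a, y):
            # C = 0.5 * ||a - y||^2
            return 0.5 * np.linalg.norm(a - y) ** 2
        @staticmethod
        def delta(z, a, y):
            # the output-layer error keeps the sigmoid_prime(z) factor,
            # which is the source of learning slowdown when neurons saturate
            return (a - y) * sigmoid_prime(z)

    class CrossEntropyCost(object):
        @staticmethod
        def fn(a, y):
            # C = -sum[y ln a + (1-y) ln(1-a)]; nan_to_num guards 0*log(0)
            return np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))
        @staticmethod
        def delta(z, a, y):
            # sigmoid_prime(z) cancels, so saturated output neurons still learn
            return a - y

    class LogLikelihoodCost(object):
        @staticmethod
        def fn(a, y):
            # C = -ln(a_y) for a softmax output a and one-hot label y
            return float(-np.log(a[np.argmax(y)]))
        @staticmethod
        def delta(z, a, y):
            # with a softmax output layer the error is again simply a - y
            return a - y

Note that the delta of both the cross-entropy and the log-likelihood/softmax
cost reduces to a - y, with no sigmoid_prime(z) factor; that cancellation is
what removes the learning slowdown seen with the quadratic cost.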
- Task 1.2: Effect of regularization using the default network
structure of [784,10] with no hidden layer, and cross-entropy as the cost function
- Add L2 regularization to the cost function and plot the
convergence curve.
- Add L1 regularization to the cost function and plot the
convergence curve. (A sketch of both regularized weight updates
follows this task.)
- Based on the L1-regularized network, expand the training dataset
using affine transformations, i.e., scaling, rotation, and
translation. Plot the convergence curve. (A dataset-expansion
sketch follows this task.)
- In the report, comment on the problem of overfitting and how
regularization terms can mitigate it. Also comment on the
difference between L1 and L2 regularization. Finally, comment on
the two ways of improving accuracy, i.e., improving the algorithm
versus improving the training set.
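For reference, the regularized weight updates can be sketched as below,
following the form of Nielsen's update_mini_batch; the function and
parameter names are illustrative rather than taken from the provided code.

    import numpy as np

    def regularized_update(w, nabla_w, eta, lmbda, n, mini_batch_size, reg="L2"):
        """One weight update for a single layer.
        w               -- current weight matrix of the layer
        nabla_w         -- gradient of the unregularized cost, summed over the mini-batch
        eta, lmbda, n   -- learning rate, regularization strength, training-set size
        """
        if reg == "L2":
            # the L2 term (lmbda/2n) * sum(w^2) adds (lmbda/n) * w to the gradient,
            # so the weights decay multiplicatively toward zero
            return (1 - eta * lmbda / n) * w - (eta / mini_batch_size) * nabla_w
        else:
            # the L1 term (lmbda/n) * sum(|w|) adds (lmbda/n) * sign(w) to the gradient,
            # so the weights shrink by a constant amount and small weights go to zero
            return w - (eta * lmbda / n) * np.sign(w) - (eta / mini_batch_size) * nabla_w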
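One possible way to expand the training set with affine transforms is
sketched below using scipy.ndimage; the specific angle, shift, and scale
values are only examples, not tuned recommendations.

    import numpy as np
    from scipy import ndimage

    def expand_image(flat_image, angle_deg=10.0, shift_px=2, scale=0.9):
        """Return affine-transformed copies (rotation, translation, scaling)
        of one 784-element MNIST image; parameter values are illustrative."""
        img = flat_image.reshape(28, 28)
        out = []
        # rotation about the image center, keeping the 28x28 shape
        out.append(ndimage.rotate(img, angle_deg, reshape=False, mode="constant"))
        # translation by a few pixels, vertically and horizontally
        out.append(ndimage.shift(img, (shift_px, 0), mode="constant"))
        out.append(ndimage.shift(img, (0, shift_px), mode="constant"))
        # scaling about the image center via an affine transform
        center = np.array([13.5, 13.5])
        matrix = np.eye(2) / scale
        offset = center - matrix @ center
        out.append(ndimage.affine_transform(img, matrix, offset=offset, mode="constant"))
        return [im.reshape(784, 1) for im in out]

Each transformed image keeps the label of its source image, so a few
transforms per digit multiply the size of the training set accordingly.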
- Task 1.3: Effect of hidden layers, based on the cross-entropy cost
with L1 regularization and the expanded training set. (No dropout needed.)
- Add one hidden layer with 30 nodes. Plot the convergence
curve.
- Add two hidden layers with 30 nodes each. Plot the
convergence curve and the change rate of each node in the hidden
layers. Use the partial derivative of the cost function with respect
to the bias as an indicator of the change rate of each node (see the
sketch after this task).
- (692 only, bonus for 599) Based on your L1 regularization
implementation and the expanded training set, add dropout. Plot the
convergence curve for different dropout percentages (a minimal
dropout sketch also follows this task).
- In the report, comment on the unstable gradient problem.
- (692 only, bonus for 599) In the report, comment on the
difference between norm-based (L1/L2) regularization and
dropout-based regularization.
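The per-node change rate can be read off as sketched below, assuming a
Nielsen-style network object whose backprop(x, y) returns the bias and
weight gradients (as network.py does); the helper names are illustrative.

    import numpy as np

    def node_change_rates(net, x, y):
        """Per-node learning-speed indicator |dC/db_j| for the hidden layers."""
        nabla_b, _ = net.backprop(x, y)
        # nabla_b holds dC/db for layers 2..L; drop the output layer and keep
        # only the hidden layers
        return [np.abs(nb).flatten() for nb in nabla_b[:-1]]

    def hidden_layer_speeds(net, x, y):
        """Aggregate speed of each hidden layer: the norm of its dC/db vector."""
        return [float(np.linalg.norm(r)) for r in node_change_rates(net, x, y)]

Plotting these values over training is one way to compare how quickly the
earlier and later hidden layers learn, which is the behavior the unstable
gradient discussion in the report should address.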
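For the dropout experiment, a minimal inverted-dropout sketch is given
below; the keep_prob parameter and where the mask is applied are
assumptions about your own implementation.

    import numpy as np

    def dropout_mask(shape, keep_prob, rng=np.random):
        """Zero each unit with probability (1 - keep_prob) and rescale the
        survivors so the expected activation is unchanged (inverted dropout)."""
        return (rng.rand(*shape) < keep_prob) / keep_prob

    # During training, multiply each hidden layer's activation (and the matching
    # delta in backprop) by the same mask; at test time apply no mask at all.
    # Example: a_hidden = sigmoid(z_hidden) * dropout_mask(z_hidden.shape, 0.8)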
- Task 2: CNN (LeNet) with TensorFlow
- Study the sample code
from Liu Liu on how to train a LeNet-5 to recognize MNIST
digits.
- Modify the hyperparameters to achieve the best performance you
can (a LeNet-5 sketch follows this task).
- In the report, provide a plot of the accuracy improvement from
the previously mentioned techniques. Also plot the convergence curve
for LeNet-5. Comment on why you think LeNet-5 further improves the
accuracy, if it does at all; and if it doesn't, why not?
Note that this is an open-ended problem. We'll have a
leaderboard updated frequently to report the best in class.
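For orientation, a LeNet-5-style model can be sketched with tf.keras as
below. This assumes TensorFlow 2.x, is only a starting point rather than
the provided sample code from Liu Liu, and every hyperparameter shown
(optimizer, learning rate, batch size, epochs, activations, pooling type)
is a candidate for the tuning asked for above.

    import tensorflow as tf

    def build_lenet5(learning_rate=1e-3):
        """A LeNet-5-style model; layer sizes follow the classic architecture,
        but this sketch may differ from the provided sample code."""
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(28, 28, 1)),
            tf.keras.layers.Conv2D(6, kernel_size=5, padding="same", activation="relu"),
            tf.keras.layers.AveragePooling2D(pool_size=2),
            tf.keras.layers.Conv2D(16, kernel_size=5, activation="relu"),
            tf.keras.layers.AveragePooling2D(pool_size=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(120, activation="relu"),
            tf.keras.layers.Dense(84, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # Load MNIST, scale pixels to [0, 1], and add a channel dimension.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

    # batch_size and epochs are example values; history.history["val_accuracy"]
    # provides the data for the convergence curve.
    model = build_lenet5()
    history = model.fit(x_train, y_train, batch_size=128, epochs=10,
                        validation_data=(x_test, y_test))

The classic LeNet-5 used tanh activations; ReLU (as above) and max pooling
are common modern substitutions worth comparing while tuning.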
Report:
Please submit the report (in .pdf) and source code (in .zip) through Canvas before midnight on the due date.