CS 420/594 — Biologically Inspired Computation
NetLogo Simulation

Perceptron Geometry



view/download model file: Perceptron-Geometry.nlogo

WHAT IS IT?

This model demonstrates the geometry of the Perceptron Learning Algorithm. It generates a linearly separable, or almost linearly separable, set of data, and shows how a weight vector can be adjusted so that a single perceptron is able to separate the positive and negative data points. Two classes of data are linearly separable if they can be separated by a straight line, flat plane, or a flat hyperplane (for dimensions greater than 3).

A perceptron is a simple artificial neuron with n inputs and one binary output. Each input k has a weight (synaptic strength) w_k, and so we can treat the n weights as a vector, w = (w_1, ..., w_n). Each neuron also has a threshold t, and if the weighted inputs are greater than the threshold, the neuron produces a 1 output, otherwise it produces a 0 output. Therefore, if the input is a vector x = (x_1, ..., x_n), then the output will be 1 if w_1 x_1 + ... + w_n x_n > t, and 0 otherwise. In vector notation, we can say the output is 1 if the dot product w.x > t.
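
To make the output rule concrete, here is a minimal Python sketch of it (illustrative only; the function and variable names are not taken from the model):

    # Illustrative sketch of the perceptron output rule (not the model's NetLogo code).
    def perceptron_output(w, x, t):
        s = sum(wk * xk for wk, xk in zip(w, x))   # weighted sum w.x
        return 1 if s > t else 0

    # Example: weights (0.5, -0.25), input (1.0, 0.4), threshold 0.1
    print(perceptron_output([0.5, -0.25], [1.0, 0.4], 0.1))   # w.x = 0.4 > 0.1, so prints 1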

Suppose we are given a dataset comprising a number p of data vectors x1, x2, ..., xp, and that some of these are positive examples of some class of patterns (i.e., they fit the pattern), and others are negative examples (they don't fit). The perceptron learning problem is to find a single threshold and set of weights so that the perceptron correctly classifies all these data (and, hopefully, other data of a similar kind). The perceptron learning algorithm solves this problem.

There are several things we can do to simplify perceptron learning. First, we can eliminate the threshold by treating it as an extra, "zeroth" weight. To do this, we add a corresponding zeroth element to the data vectors, which is always -1. In effect the zeroth input is "clamped" at a -1 value. To see why this works, define an extended weight vector W where W_0 = t, W_1 = w_1, ..., W_n = w_n. Also, let X be the "extended" vector corresponding to x: X = (-1, x_1, x_2, ..., x_n). So now we are working with n+1 dimensional vectors. Note that:

W.X = t (-1) + w_1 x_1 + ... + w_n x_n = w.x - t

Thus w.x > t if and only if W.X > 0. Therefore, since the threshold has been turned into a weight, we only have to worry about adjusting weights.
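
As a quick check of this identity, here is a small Python sketch (illustrative only, with arbitrary example values; not the model's NetLogo code):

    # Extended vectors: W = (t, w_1, ..., w_n), X = (-1, x_1, ..., x_n).
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    w, t, x = [0.5, -0.25], 0.1, [1.0, 0.4]           # arbitrary example values
    W = [t] + w                                       # threshold becomes the zeroth weight
    X = [-1.0] + x                                    # zeroth input clamped at -1
    assert abs(dot(W, X) - (dot(w, x) - t)) < 1e-12   # W.X equals w.x - t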

The second simplification is to find a way to treat the positive and negative examples the same way. If X is a positive example, we want W.X > 0, but if X is a negative example, we want W.X < 0. However, since W.(-X) = -W.X, we know that W.X < 0 if and only if W.(-X) > 0. Therefore, we can replace the negative examples by their complements and treat them like positive examples. Specifically, we will define a new set of test data Z1, ..., Zp corresponding to the original data X1, ..., Xp. If Xi is a positive sample, then Zi = Xi, but if Xi is a negative sample, then Zi = -Xi.

Therefore, we have a simplified perceptron training problem, which is to find a weight vector W such that W.Z > 0 for every (modified) sample data vector Z.
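
A minimal Python sketch of this preprocessing step (the representation of each sample as an (X, is_positive) pair is an assumption made here for illustration, not part of the model):

    # Replace each negative example by its negation, so that a correct weight vector
    # must satisfy W.Z > 0 for every modified sample Z. (Illustrative sketch only.)
    def reflect_negatives(samples):
        """samples: list of (X, is_positive) pairs, X an extended data vector."""
        return [X if is_positive else [-xi for xi in X]
                for X, is_positive in samples]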

For ease in visualization, this model uses two-dimensional data and weight vectors. It generates random test data, linearly separable into positive and negative examples. The model then converts all the data to positive examples, as explained above, and demonstrates the adjustment of the weight vector so that all the test data are on the same (positive) side of the separating line.


HOW IT WORKS

The learning procedure is as follows. The algorithm selects a random data point. If it is already classified correctly (i.e., W.Z > 0), the algorithm does nothing. If it is classified incorrectly (W.Z <= 0), then it alters the weight vector (to W') so that the classification is closer to being correct (W'.Z > W.Z). It does this by vector addition of a fraction (eta) of the data vector to the weight vector in order to produce the new weight vector: W' = W + eta * Z. The model displays this vector addition, which can be seen most clearly by stepping through the update process (clicking repeatedly in order: Test One Datum, Calculate New Weight, and Update Weight).
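
One update cycle and the surrounding loop might look like the following Python sketch (a plain rendering of the rule above, not the model's NetLogo source; the stopping test and step limit are assumptions):

    import random

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    def perceptron_learning(Z, eta, max_steps=10000):
        """Z: list of (already reflected) extended data vectors."""
        W = [random.uniform(-1, 1) for _ in range(len(Z[0]))]     # random initial weights
        for _ in range(max_steps):
            z = random.choice(Z)                                  # Test One Datum
            if dot(W, z) <= 0:                                    # misclassified
                W = [wi + eta * zi for wi, zi in zip(W, z)]       # Calculate / Update Weight
            if all(dot(W, z) > 0 for z in Z):                     # all data on the positive side
                break
        return W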

Eta determines the learning rate, but if it is too high, it may lead to instability, since the weight vector may adapt too much to the most recent datum, and "forget" previous ones.


HOW TO USE IT

NUM DATA POINTS determines the number of random test data points to be generated. They have an equal probability of being positive (shown in red) or negative (shown in blue). The two classes are all or mostly linearly separable.

CORRECT SEPARATOR sets the angle of the normal vector to the line through the origin that separates the positive from the negative samples (apart from the exceptions described next).

NON-LINEAR-SEPARABLE determines the percentage of samples that are not linearly separable. They are located randomly and are randomly positive or negative. Set this slider to 0 in order to have the positive and negative samples separated by Correct Separator.

SETUP generates the requested number of sample data points, classified according to the specified separator and percentage of exceptions. Positive and negative samples are indicated by red and blue, respectively. The initial, randomly chosen weight vector is shown in green, along with the separating line perpendicular to it.

REFLECT NEGATIVES converts all the samples to positive samples by replacing the negative samples by their negations. After Setup, you should click Reflect Negatives so that all the data are on one side of the separator.

ETA is the learning rate. It can be changed while the model is running. For example, decreasing Eta may allow a non-converging learning process to converge.

To walk through the learning process step by step, use the following three buttons:

TEST ONE DATUM picks a random data point for learning; it is shown in yellow.

CALCULATE NEW WEIGHT calculates the new weight vector (blue), which is shown as a vector sum of the old weight vector (green) and Eta times the test datum (yellow).

UPDATE WEIGHT replaces the old weight vector by the new one, and rotates the separating line accordingly.

The following controls allow the learning algorithm to be run continuously:

DELAY is the amount of time between each of the above three steps in continuous update (GO) mode. You can change the Delay while the model is running in order to speed it up or slow it down.

GO continuously cycles through the three steps, Test One Datum, Calculate New Weight, and Update Weight.

LEARNED ANGLE displays the angle of the learned weight vector, which may be compared to the angle set by Correct Separator.


THINGS TO NOTICE

Notice that if the learning rate Eta is too high, the learning process may not converge. Try decreasing Eta to see if it converges.

Observe the differences depending on whether the initial random weight vector points in roughly the same direction as the positive test data or in roughly the opposite direction.

Observe the effects of non-linearly separable data on learning convergence.

Notice how closely the angle of the learned weight vector approximates the angle of the separator used to generate the test data. Can you think of ways of altering the learning algorithm to make the approximation closer?


THINGS TO TRY

Try various learning rates (Eta). The Perceptron Learning Theorem proves that for linearly separable data the Perceptron Learning Algorithm will always converge *if the learning rate is sufficiently slow*. On the other hand, we would like learning to go as fast as possible. Run some experiments and determine a learning rate that always, or almost always, converges. Does it depend on the number of data points?

Try introducing a small percentage (e.g., 5%) of non-linearly separable data, and observe the behavior of the model with several randomly generated datasets.


EXTENDING THE MODEL

Notice that the weight vector often gets shorter as it is updated, which could lead to numerical problems with a large number of updates. Since only the direction of the weight vector matters, how could you modify the model to keep it from getting shorter?

Since all that really matters is the directions of the vectors, and not their lengths, consider modifying the algorithm to work in terms of normalized vectors. How does this new algorithm compare with the original one?

The original perceptron learning algorithm, embodied in this model, makes no weight adjustment for data that are classified correctly ("if it ain't broke, don't fix it"). Can you think of some useful adjustment that could be done for correctly classified data? Try out your idea.


NETLOGO FEATURES

This model uses the links feature to draw the vectors and the left and right segments of the separating line. The endpoints of these lines are represented by turtles (sometimes hidden).


RELATED MODELS

There are two related models in the standard Models Library (Computer Science, unverified): Perceptron, which gives a neural network view of perceptron learning, and Artificial Neural Net, which demonstrates back-propagation learning (essentially multilayer perceptron learning).


CREDITS AND REFERENCES

The perceptron, which was one of the first artificial neural net models, was developed by Frank Rosenblatt in the late 1950s. It is described in most textbooks on artificial neural networks, but this particular model follows Ballard (1997, §8.2).

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.

Rosenblatt, F. (1962). Principles of neurodynamics. Spartan Books.

Ballard, D.H. (1997). An introduction to natural computation. MIT Press.

To refer to this model in academic publications, please use: MacLennan, B.J. (2008). NetLogo Perceptron Geometry model. http://www.cs.utk.edu/~mclennan. Dept. of Electrical Engineering & Computer Science, Univ. of Tennessee, Knoxville.

In other publications, please use: Copyright 2008 Bruce MacLennan. All rights reserved. See http://www.cs.utk.edu/~mclennan/420/NetLogo/Perceptron-Geometry.html for terms of use.



Last updated: 2008-10-12.