# Perceptron Geometry

This page was automatically generated by NetLogo 4.1.


## WHAT IS IT?

This model demonstrates the geometry of the Perceptron Learning Algorithm. It generates a linearly separable, or almost linearly separable, set of data, and shows how a weight vector can be adjusted so that a single perceptron is able to separate the positive and negative data points. Two classes of data are linearly separable if they can be separated by a straight line, flat plane, or a flat hyperplane (for dimensions greater than 3).

A perceptron is a simple artificial neuron with n inputs and one binary output. Each input k has a weight (synaptic strength) w_k, and so we can treat the n weights as a vector, w = (w_1, ..., w_n). Each neuron also has a threshold t, and if the weighted inputs are greater than the threshold, the neuron produces a 1 output, otherwise it produces a 0 output. Therefore, if the input is a vector x = (x_1, ..., x_n), then the output will be 1 if w_1 x_1 + ... + w_n x_n > t, and 0 otherwise. In vector notation, we can say the output is 1 if the dot product w.x > t.
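As a concrete illustration of this definition (in Python rather than the model's NetLogo, with made-up weights and threshold):

```python
def perceptron_output(w, x, t):
    """Output 1 if the weighted input sum w.x exceeds the threshold t, else 0."""
    s = sum(wk * xk for wk, xk in zip(w, x))  # dot product w.x
    return 1 if s > t else 0

# Made-up weights and threshold for illustration:
print(perceptron_output([0.5, -0.5], [1.0, 0.2], 0.2))  # -> 1  (w.x = 0.4 > 0.2)
print(perceptron_output([0.5, -0.5], [0.2, 1.0], 0.2))  # -> 0  (w.x = -0.4 <= 0.2)
```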

Suppose we are given a dataset comprising a number p of data vectors x1, x2, ..., xp, and that some of these are positive examples of some class of patterns (i.e., they fit the pattern), and others are negative examples (they don't fit). The perceptron learning problem is to find a single threshold and set of weights so that the perceptron correctly classifies all these data (and, hopefully, other data of a similar kind). The perceptron learning algorithm solves this problem.

There are several things we can do to simplify perceptron learning. First, we can eliminate the threshold by treating it as an extra, "zeroth" weight. To do this, we add a corresponding zeroth element to the data vectors, which is always -1. In effect the zeroth input is "clamped" at a -1 value. To see why this works, define an extended weight vector W where W_0 = t, W_1 = w_1, ..., W_n = w_n. Also, let X be the "extended" vector corresponding to x: X = (-1, x_1, x_2, ..., x_n). So now we are working with n+1 dimensional vectors. Note that:

W.X = t (-1) + w_1 x_1 + ... + w_n x_n = w.x - t

Thus w.x > t if and only if W.X > 0. Therefore, since the threshold has been turned into a weight, we only have to worry about adjusting weights.
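A quick numerical check of this identity, as a Python sketch with made-up weights, threshold, and input:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w, t = [0.5, -0.5], 0.2   # made-up weights and threshold
x = [1.0, 0.2]            # made-up input vector

W = [t] + w               # extended weights: W_0 = t
X = [-1.0] + x            # extended input: zeroth input clamped at -1

# W.X = t*(-1) + w_1 x_1 + ... + w_n x_n = w.x - t
assert abs(dot(W, X) - (dot(w, x) - t)) < 1e-12
print(dot(w, x) > t, dot(W, X) > 0)  # -> True True (the two tests agree)
```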

The second simplification is to find a way to treat the positive and negative examples the same way. If X is a positive example, we want W.X > 0, but if X is a negative example, we want W.X < 0. However, since W.(-X) = -W.X, we know that W.X < 0 if and only if W.(-X) > 0. Therefore, we can replace the negative examples by their complements and treat them like positive examples. Specifically, we will define a new set of test data Z1, ..., Zp corresponding to the original data X1, ..., Xp. If Xi is a positive sample, then Zi = Xi, but if Xi is a negative sample, then Zi = -Xi.

Therefore, we have a simplified perceptron training problem, which is to find a weight vector W such that W.Z > 0 for every (modified) sample data vector Z.
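The reflection step can be sketched as follows (Python, with hypothetical sample data; each sample is a pair of an extended data vector and its positive/negative label):

```python
def reflect_negatives(samples):
    """Replace each negative example X by -X, so that the goal becomes
    W.Z > 0 for every modified sample Z."""
    return [x if positive else [-xi for xi in x] for x, positive in samples]

# Hypothetical extended data vectors (zeroth component clamped at -1):
samples = [([-1.0, 2.0, 0.5], True),    # positive example: kept as-is
           ([-1.0, -1.0, 3.0], False)]  # negative example: negated
Z = reflect_negatives(samples)
print(Z)  # -> [[-1.0, 2.0, 0.5], [1.0, 1.0, -3.0]]
```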

For ease of visualization, this model uses two-dimensional data and weight vectors. It generates random test data, linearly separable into positive and negative examples. The model then converts all the data to positive examples, as explained above, and demonstrates the adjustment of the weight vector so that all the test data are on the same (positive) side of the separating line.

## HOW IT WORKS

The learning procedure is as follows. The algorithm selects a random data point. If it is already classified correctly (i.e., W.Z > 0), the algorithm does nothing. If it is classified incorrectly (W.Z <= 0), then it alters the weight vector (to W') so that the classification is closer to being correct (W'.Z > W.Z). It does this by vector addition of a fraction (eta) of the data vector to the weight vector in order to produce the new weight vector: W' = W + eta * Z. The model displays this vector addition, which can be seen most clearly by stepping through the update process (clicking repeatedly in order: Test One Datum, Calculate New Weight, and Update Weight).

Eta determines the learning rate, but if it is too high, it may lead to instability, since the weight vector may adapt too much to the most recent datum, and "forget" previous ones.
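Putting the update rule together, here is a minimal Python sketch of the simplified training loop; the name `perceptron_step`, the toy data, and the fixed seed are illustrative, not part of the model:

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def perceptron_step(W, Z_all, eta, rng):
    """One learning step: pick a random (reflected) datum Z; if it is
    misclassified (W.Z <= 0), move the weights toward it: W' = W + eta * Z."""
    Z = rng.choice(Z_all)
    if dot(W, Z) <= 0:
        W = [wk + eta * zk for wk, zk in zip(W, Z)]
    return W

# Toy, already-reflected 2-D samples and a fixed seed, for illustration only:
rng = random.Random(0)
Z_all = [[1.0, 0.2], [0.8, -0.1], [0.9, 0.4]]
W = [-1.0, 0.5]                      # arbitrary initial weight vector
for _ in range(100):
    W = perceptron_step(W, Z_all, eta=0.5, rng=rng)
print(all(dot(W, Z) > 0 for Z in Z_all))  # -> True: all samples now on the positive side
```

Because these toy samples are linearly separable, the Perceptron Learning Theorem guarantees that only finitely many weight updates occur.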

## HOW TO USE IT

NUM DATA POINTS determines the number of random test data points to be generated. They have an equal probability of being positive (shown in red) or negative (shown in blue). The two classes are all or mostly linearly separable.

CORRECT SEPARATOR sets the angle of the normal vector to the line through the origin that will separate the positive from the negative samples (apart from the exceptions described next).

NON-LINEAR-SEPARABLE determines the percentage of samples that are not linearly separable. They are located randomly and are randomly positive or negative. Set this slider to 0 in order to have the positive and negative samples separated by Correct Separator.

SETUP generates the requested number of sample data points, classified according to the specified separator and percentage of exceptions. Positive and negative samples are indicated by red and blue, respectively. The initial, randomly chosen weight vector is shown in green, along with the separating line perpendicular to it.

REFLECT NEGATIVES converts all the samples to positive samples by replacing the negative samples by their negations. After Setup, you should click Reflect Negatives so that all the data are on one side of the separator.

ETA is the learning rate. It can be changed while the model is running; for example, decreasing Eta may allow a non-converging learning process to converge.

To walk through the learning process step by step, use the following three buttons:

TEST ONE DATUM picks a random data point for learning; it is highlighted in yellow.

CALCULATE NEW WEIGHT calculates the new weight vector (blue), which is shown as a vector sum of the old weight vector (green) and Eta times the test datum (yellow).

UPDATE WEIGHT replaces the old weight vector by the new one, and rotates the separating line accordingly.

The following controls allow the learning algorithm to be run continuously:

DELAY is the amount of time between each of the above three steps in continuous update (GO) mode. You can change the Delay while the model is running in order to speed it up or slow it down.

GO continuously cycles through the three steps, Test One Datum, Calculate New Weight, and Update Weight.

LEARNED ANGLE displays the angle of the learned weight vector, which may be compared to the angle set by Correct Separator.

## THINGS TO NOTICE

Notice that if the learning rate Eta is too high, the learning process may not converge. Try decreasing Eta to see if it converges.

Observe the differences depending on whether the initial random weight vector points in roughly the same direction as the positive test data or in the opposite direction.

Observe the effects of non-linearly separable data on learning convergence.

Notice how closely the angle of the learned weight vector approximates the angle of the separator used to generate the test data. Can you think of ways of altering the learning algorithm to make the approximation closer?

## THINGS TO TRY

Try various learning rates (Eta). The Perceptron Learning Theorem proves that for linearly separable data the Perceptron Learning Algorithm will always converge *if the learning rate is sufficiently slow*. On the other hand, we would like learning to go as fast as possible. Run some experiments and determine a learning rate that always, or almost always, converges. Does it depend on the number of data points?

Try introducing a small percentage (e.g., 5%) of non-linearly separable data, and observe the behavior of the model with several randomly generated datasets.

## EXTENDING THE MODEL

Notice that the weight vector often gets shorter as it is updated, which could lead to numerical problems after a large number of updates. Since only the direction of the weight vector matters, how could you modify the model to keep it from getting shorter?

Since all that really matters is the directions of the vectors, and not their lengths, consider modifying the algorithm to work in terms of normalized vectors. How does this new algorithm compare with the original one?

The original perceptron learning algorithm, embodied in this model, makes no weight adjustment for data that are classified correctly ("if it ain't broke, don't fix it"). Can you think of some useful adjustment that could be done for correctly classified data? Try out your idea.

## NETLOGO FEATURES

This model uses the links feature to draw the vectors and the left and right segments of the separating line. The endpoints of these lines are represented by turtles (sometimes hidden).

## RELATED MODELS

There are two related models in the standard Models Library (Computer Science, unverified): Perceptron, which gives a neural network view of perceptron learning, and Artificial Neural Net, which demonstrates back-propagation learning (essentially multilayer perceptron learning).

## CREDITS AND REFERENCES

The perceptron, which was one of the first artificial neural net models, was developed by Frank Rosenblatt in the late 1950s. It is described in most textbooks on artificial neural networks, but this particular model follows Ballard (1997, §8.2).

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.

Rosenblatt, F. (1962). Principles of neurodynamics. Spartan Books.

Ballard, D.H. (1997). An introduction to natural computation. MIT Press.

To refer to this model in academic publications, please use: MacLennan, B.J. (2008). NetLogo Perceptron Geometry model. http://www.cs.utk.edu/~mclennan. Dept. of Electrical Engineering & Computer Science, Univ. of Tennessee, Knoxville.

## PROCEDURES

```
globals [
  max-X          ; X coordinates are +- max-X
  max-Y          ; Y coordinates are +- max-Y
  test-datum     ; the current datum under test
  weight         ; the current weight vector
  new-weight     ; temporary new weight vector
  weight-change? ; set if a weight has changed
  learned-angle  ; the angle of the current weight vector
]

breed [ data datum ]                                 ; type of the data points
breed [ origins origin ]                             ; type of the origin point
breed [ weight-vectors weight-vector ]               ; type of the weight vector
directed-link-breed [ normal-vectors normal-vector ] ; normal vector to separator
undirected-link-breed [ separatrices separatrix ]    ; separator line segments
breed [ septx-ends septx-end ]                       ; end points of separator
directed-link-breed [ data-vectors data-vector ]     ; visible data vectors

data-own [ positive? ]   ; set if it is a positive example
septx-ends-own [ theta ] ; angle separator ends (+- 90)

to setup
  ca
  set max-X max-pxcor - 1
  set max-Y max-pycor - 1
  create-origins 1 [ initialize-origin ]
  create-data num_data_points [ initialize-data ]
  create-weight-vectors 1 [ initialize-weight-vector ]
  create-septx-ends 1 [ initialize-separatrix 90 ]
  create-septx-ends 1 [ initialize-separatrix -90 ]
  set weight-change? false
  set learned-angle [ atan xcor ycor ] of weight
end

to initialize-origin
  set size 2
  set shape "circle"
  set color green
  setxy 0 0
end

to initialize-data
  set size 2
  set shape "circle"
  set color red
  set xcor (random (2 * max-X) - max-X)
  set ycor (random (2 * max-Y) - max-Y)
  ifelse non-linear-separable <= random 100 ; generate lin-sep data
  [ ifelse (xcor * sin correct_separator + ycor * cos correct_separator) > 0
    [ set positive? true ]
    [ set positive? false ] ]
  [ ifelse 0 = random 2                     ; generate non-lin-sep data (fair coin flip)
    [ set positive? true ]
    [ set positive? false ] ]
  set color ifelse-value positive? [ red ] [ blue ]
end

;; Generate initial random weight vector
;; Weight-vector procedure
to initialize-weight-vector
  hide-turtle
  let init-theta random 360
  let magnitude 0.9 * max-X
  setxy (magnitude * sin init-theta) (magnitude * cos init-theta)
  create-normal-vector-from origin 0 [
    set thickness 0.5
    set color green
  ]
  set weight self
end

;; Generate separator line segments relative to weight vector
;; Separatrix endpoint (septx-end) procedure
to initialize-separatrix [init-theta] ; theta = +- 90
  hide-turtle
  set theta init-theta
  move-separatrix
  create-separatrix-with origin 0 [
    set thickness 0.5
    set color green
  ]
end

;; Reposition separatrix endpoint relative to weight vector
;; Separatrix endpoint (septx-end) procedure
to move-separatrix
  setxy 0 0
  set heading theta + towards weight
  fd max-X
end

;; Convert negative examples to positive examples
to reflect-negatives
  ask data [ if not positive? [ setxy (- xcor) (- ycor) ] ]
  wait 0.5 ; so that the color change is visible
  ask data [ set color red ]
end

;; Cycle through learning procedure steps
to go
  test-one
  wait delay
  calculate-new-weight
  wait delay
  update-weight
  wait delay
end

;; Select random datum for testing
to test-one
  set test-datum one-of data
  ask test-datum [
    set color yellow
    create-data-vector-from origin 0 [ set color yellow ]
  ]
  set weight-change? false
end

;; Calculate new weight vector (if necessary) for test datum
to calculate-new-weight
  let old_X [ xcor ] of weight
  let old_Y [ ycor ] of weight
  let datum_X [ xcor ] of test-datum
  let datum_Y [ ycor ] of test-datum
  if (old_X * datum_X + old_Y * datum_Y) <= 0 [
    create-weight-vectors 1 [ ; create temporary new weight
      set new-weight self
      setxy (old_X + eta * datum_X) (old_Y + eta * datum_Y)
      create-data-vector-from weight [ set color yellow ]   ; the update vector
      create-data-vector-from origin 0 [ set color blue ]   ; the new weight vector
    ]
    set weight-change? true
  ]
end

;; Update weight vector and update display
to update-weight
  ask test-datum [ set color red ]
  ask data-vectors [ die ]
  if weight-change? [
    ask weight [ move-to new-weight ]
    ask new-weight [ die ] ; destroy temporary new weight
    ask septx-ends [ move-separatrix ]
    set weight-change? false
    set learned-angle [ atan xcor ycor ] of weight
  ]
end
```
Last updated: 2010-10-11.