Final Questions 22 Nov 2011
Q1: The value of PSTs is in their potential with real datasets.
This is a multipart question using PSTs on real time series.
The datasets are x-y-z coordinate-sensors on a human in three activities:
walking, sitting, or lying down. The datasets are time series stored as doubles.
The original data is heavily filtered into abs(diff) values, restricted to the
range 1:10 and rounded to uint8. This means the PST p-codes can process them.
The files ta.mat (lying down), tb.mat (walking), and tc.mat (sitting)
are on the same webpage as the MATLAB p-code. Each is 150 x 3 for 150
samples of x-y-z data. Below, t- stands for any of ta, tb, or tc:
t-(1:50,:) is for one block of time; t-(51:100,:) is a different
block of time; t-(101:end,:) is again a different block of time.
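As a minimal sketch of the block layout just described, the three time blocks can be pulled out by index. The random matrix is only a stand-in so the snippet runs without the data files; with the real data one would load ta.mat instead, assuming it defines a variable named ta (as the later reference to ta(51:60,3) suggests). The names blockA, blockB, blockC, and zA are illustrative.

```matlab
% Stand-in for one 150 x 3 dataset with symbols in 1:10.
% With the real files: S = load('ta.mat'); t = S.ta;   % assumes variable 'ta'
t = randi(10, 150, 3);

blockA = t(1:50, :);      % first block of time
blockB = t(51:100, :);    % second block of time
blockC = t(101:end, :);   % third block of time

% Columns are the x, y, z coordinates; e.g. the z series of the first block:
zA = blockA(:, 3);
```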
(I) The first part of this question is whether PST models can be used
to classify the human's activity via an input sequence. The first
part of the first part is to test x, y, and z separately to see whether
one seems better than the others in classification.
(a) Use samples 1:50 of x to build three PSTs, one per activity. Compute
CompPST between each pair of them.
(b) Use sample 51:100 for x as one sequence to test for correct match.
Record the counts of correct and missed in a neat table.
(c) Now try this classification method: break sample 51:100 into
subsequences starting at indices 51:5:100 (ten subsequences of five
samples each), match each subsequence, and use the sum of
the log-probs for the subsequences to classify the sequence.
Again record the counts of correct/missed.
(d) Repeat (a)-(b)-(c) for the y coordinate.
(e) Repeat (a)-(b)-(c) for the z coordinate.
Which coordinate and which method (b) or (c) gives the best overall
performance?
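The bookkeeping for methods (b) and (c) can be sketched as follows. This is only an illustration under stated assumptions: fitModel and logProb are hypothetical stand-ins (a smoothed symbol-frequency model, not a PST) to be replaced by the PST inference and matching p-code. The training and test sequences are synthetic, and the subsequences are taken to be five samples long.

```matlab
% Stand-in scorer (NOT a PST): smoothed symbol frequencies over 1:10.
% Swap these two handles for the PST inference and matching p-code.
fitModel = @(tr) (accumarray(tr(:), 1, [10 1]) + 1) / (numel(tr) + 10);
logProb  = @(seq, m) sum(log(m(seq(:))));

% Synthetic training sequences (50 samples each), one per activity.
models = {fitModel(repmat([1;1;2;1;3], 10, 1)), ...
          fitModel(repmat([5;6;5;4;5], 10, 1)), ...
          fitModel(repmat([9;10;9;8;9], 10, 1))};

testSeq = repmat([5;5;6;4;5], 10, 1);   % synthetic test block, true class 2

% Method (b): score the whole 50-sample sequence at once.
whole = cellfun(@(m) logProb(testSeq, m), models);
[~, classB] = max(whole);

% Method (c): score length-5 subsequences and sum the log-probs.
summed = zeros(1, numel(models));
for s = 1:5:numel(testSeq)
    seg = testSeq(s:s+4);
    summed = summed + cellfun(@(m) logProb(seg, m), models);
end
[~, classC] = max(summed);
```

With this context-free stand-in the two methods give identical scores. With a PST they can differ, since each subsequence is presumably matched without the symbols that preceded it.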
(II) A potentially valuable use of PSTs is in predicting the future sequence
from the sequence seen so far. Predict.p does this by maximum likelihood,
given an input sequence and a PST. Use as
[syms,prob] = Predict(seq,IC,ah)
for given sequence 'seq', PST 'IC', and positive integer 'ah' giving the
number of symbols to predict ahead.
(a) Use seq = ta(51:60,3) to test Predict(seq,PST,10) for each
PST inferred above for the z coordinate.
(b) Compute norm(ACTUAL-syms(1:10)) for the actual 10 next symbols
ta(61:70,3) as a numerical comparison of predicted with actual.
Record these error norms in a table with clear labels.
(c) Repeat (a)-(b) using seq = tb(51:60,3) and ACTUAL = tb(61:70,3).
(d) Repeat (a)-(b) using seq = tc(51:60,3) and ACTUAL = tc(61:70,3).
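The error norm in (b) is sensitive to the type and shape of its operands, so a small sketch may help. The vectors are hypothetical stand-ins: actual plays the role of ta(61:70,3), and syms the first output of Predict(seq,PST,10). Whether Predict returns a row or a column is not stated here, so the sketch handles both.

```matlab
% Hypothetical stand-ins for the actual and predicted symbols.
actual = uint8([1 2 3 4 5 6 7 8 9 10]');   % column, integer type
syms   = [1 2 3 4 5 6 7 8 9 8];            % row, as a predictor might return

% Convert both to double column vectors so the subtraction is
% element-wise and cannot saturate, then take the Euclidean norm.
s   = syms(1:10);
d   = double(actual(:)) - double(s(:));
err = norm(d);
```

Without the (:) reshaping, subtracting a row from a column in recent MATLAB versions expands to a 10 x 10 matrix, and norm would then return a matrix norm instead of the intended vector norm.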
(III) HAVING DECIDED ON A METHOD IN (I) ABOVE, TEST ITS PERFORMANCE
WITH THE REMAINING DATA t-(101:end,?). USE PSTs INFERRED FROM
t-(1:50,?). GIVE HIT/MISS COUNTS IN NEAT TABULAR FORM. ALSO
REPEAT (II) AND TABULATE NORMS FOR THE PREDICTIONS.
-----------------------------------------------------------------------
Q2: PSTs may not be ideal for texture identification, but we can still give
it a try. On the usual webpage: im1d, im2d, im3d are hard-thresholded images
of three textures (a stone walkway, eggs, and a rug). These are *.mat files
to be loaded by MATLAB. NOTE: this is an experiment with real, hard-thresholded
data that is not claimed to be best for PSTs, so the classification performance
might be terrible!
Also, x1, x2, x3 are the corresponding
scan-line samples to use in FM3 to infer three PSTs. List the
CompPST value for each pair of PSTs.
Then test using the 80 samples im1d(41:120,:), likewise for im2d and im3d
for the same indices. As usual, record clearly the counts of hits/misses
for each class.
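One way to turn per-sample decisions into the requested hit/miss counts is a confusion table, sketched below. The label vectors are illustrative only (class 1 = im1d, 2 = im2d, 3 = im3d). In the real experiment predClass would hold the index of the winning PST for each of the 80 samples per image.

```matlab
% Illustrative labels only: true class and winning PST per test sample.
trueClass = [1 1 1 2 2 2 3 3 3];
predClass = [1 2 1 2 2 1 3 3 3];

% Rows = true class, columns = winning PST.
C = accumarray([trueClass(:) predClass(:)], 1, [3 3]);
hits   = diag(C)';
misses = sum(C, 2)' - hits;

% One output row per class: class index, hits, misses.
fprintf('class  hits  misses\n');
fprintf('%5d  %4d  %6d\n', [1:3; hits; misses]);
```

The off-diagonal entries of C also show which textures get confused with which, which may help when judging whether two classes are worth merging.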
Suppose we decide that im1d and im2d are really the same class.
Use MCPST to merge the PSTs for im1d and im2d, but keep the im3d-PST separate.
List the CompPST value for each original PST compared with this merged PST.
Then repeat the 80-sample
experiments with the three sets of data. Count a sample from im1d or im2d
as a hit if the new merged PST wins. Record the counts of hits/misses for
this two-class problem.
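The two-class scoring rule can be sketched as below. The score matrix is made up for illustration: column 1 stands for the log-score under the merged im1d/im2d PST and column 2 for the im3d PST, with one row per test sample. The variable names are hypothetical.

```matlab
% Illustrative log-scores only: column 1 = merged PST, column 2 = im3d PST.
scores    = [-3 -7; -4 -9; -8 -5; -2 -6; -9 -4; -6 -6.5];
trueClass = [1 1 2 2 3 3];          % original labels: im1d, im2d, im3d

% im1d and im2d both map to class 1; im3d becomes class 2.
twoClass = 1 + (trueClass(:) == 3);

% The winning PST is the column with the larger log-score in each row.
[~, winner] = max(scores, [], 2);

% Per-class hit and miss counts for the two-class problem.
hits   = accumarray(twoClass, double(winner == twoClass), [2 1])';
misses = accumarray(twoClass, 1, [2 1])' - hits;
```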