3. Reordering Techniques

We now consider the use of symbolic and spectral methods to permute the rows and columns of the term-document matrix defined in Equation (1). The goal of such permutations is to make document (or hypertext) clusters easier to detect directly, without resorting to high-dimensional representations such as those used in LSI. One desirable form for detecting such clusters is a banded, or nearly diagonal, matrix in which the nonzero values (weighted term frequencies) of every row and column fall within a narrow band. Specifically, the nonzero values should all lie near the main diagonal, the line running from the upper left to the lower right of the matrix. Such a nonzero structure (or pattern) facilitates the identification of term or document clusters having similar meaning and context, as demonstrated in Section 4.3.
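As a hedged illustration of what "banded" means for a term-document matrix, the sketch below applies one well-known symbolic scheme, reverse Cuthill-McKee (RCM), to the bipartite graph whose term nodes are joined to document nodes wherever the matrix has a nonzero entry. Treating the rectangular matrix this way, starting each component at a minimum-degree node rather than a pseudo-peripheral one, and measuring bandwidth as the largest index distance between connected nodes are all simplifying assumptions made here; this is not necessarily the reordering used in this paper, and spectral orderings are not shown. The function names (`bipartite_adjacency`, `reverse_cuthill_mckee`, `bandwidth`, `reorder`) and the 4 x 4 matrix are hypothetical.

```python
from collections import deque


def bipartite_adjacency(A):
    """Adjacency lists for the bipartite graph of an m x n term-document matrix.
    Nodes 0..m-1 are terms; nodes m..m+n-1 are documents (hypothetical layout)."""
    m, n = len(A), len(A[0])
    adj = [[] for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            if A[i][j] != 0:  # term i occurs in document j
                adj[i].append(m + j)
                adj[m + j].append(i)
    return adj


def reverse_cuthill_mckee(adj):
    """Simplified RCM: breadth-first search from a minimum-degree node in each
    connected component, visiting neighbours by increasing degree, then reversed."""
    visited = [False] * len(adj)
    order = []
    for start in sorted(range(len(adj)), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]


def bandwidth(adj, order):
    """Largest |position(u) - position(v)| over all edges under the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in range(len(adj)) for v in adj[u]), default=0)


def reorder(A):
    """Permute terms (rows) and documents (columns) following the RCM ordering."""
    m = len(A)
    order = reverse_cuthill_mckee(bipartite_adjacency(A))
    rows = [v for v in order if v < m]
    cols = [v - m for v in order if v >= m]
    return [[A[i][j] for j in cols] for i in rows], rows, cols


# Hypothetical matrix with two interleaved clusters: terms 0 and 2 occur only in
# documents 1 and 3; terms 1 and 3 occur only in documents 0 and 2.
A = [[0, 2, 0, 1],
     [1, 0, 3, 0],
     [0, 1, 0, 2],
     [2, 0, 1, 0]]
adj = bipartite_adjacency(A)
print(bandwidth(adj, list(range(8))))              # original ordering: 7
print(bandwidth(adj, reverse_cuthill_mckee(adj)))  # RCM ordering: 2
B, rows, cols = reorder(A)
for row in B:
    print(row)
```

For this toy input the permuted matrix is `[[1, 2, 0, 0], [3, 1, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]`: the two term-document clusters that were interleaved in the original ordering now appear as diagonal blocks, which is the kind of nonzero pattern the paragraph above describes.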

Michael W. Berry
Mon Jan 29 14:30:24 EST 1996