Four hypertext matrices used for performance comparisons
among the symbolic and spectral-based methods presented
in Sections 3.3 through 3.5 are listed
in Table 3. The first two matrices, ` MAN1` and
` MAN2`, were constructed from the * See Also* entries of the BSD 4.3
Unix manpages. The manpage of ` who`, for example, contains
the * See Also* entries ` getuid` and ` utmp`. Hence,
two links are associated with ` who`, namely
` who`` getuid` and
` who`` utmp`. Parsing all **625** manpages
produced **1853** links, and hence the rows and columns of ` MAN1`
correspond to links and manpages, respectively. The nonzero elements
of both ` MAN1` and ` MAN2` are all **1**'s and simply reflect the
incidence rather than frequency of linkage. The ` MAN2` matrix
was derived from the ` MAN1` matrix by removing duplicate links
(i.e., ` who`` getuid` is the same as
` getuid`` who`) and removing **18** manpages
(columns) whose links were not connected to the main graph of
the ` MAN1` matrix. The resulting ` MAN2` matrix had
**1426** unique links corresponding to **607** manpages.

The **850** entries under the letter A of the
Concise Columbia Encyclopedia
(1989 Second Edition, on-line version) produced the **1778** cross-references
or links for the ` CCE-A` matrix listed in
Table 3. For the **14**-th entry ABDOMEN shown below,
there are
five cross-references or links indicated by brackets: [stomach], [liver],
[gall bladder], [pancreas], and [kidneys]. Hence, the **14**-th
column of ` CCE-A` has five nonzeros (or **1**'s) in row positions
**25** ([stomach]) through **29** ([kidneys]), which correspond to the links in
order of their occurrence in the text.

ABDOMEN in vertebrates, portion of the trunk between the diaphragm and lower pelvis. In humans the abdominal cavity is lined with a thin membrane, the peritoneum, which encloses the [stomach], intestines, [liver], and [gall bladder]. The [pancreas], [kidneys], urinary bladder, and, in the female, reproductive organs are also located within the abdominal cavity.The

The density and average number of nonzeros per row ()
and column () of each of the four hypertext matrices are
also provided in Table 3. The * Density* of
each matrix is defined to be the ratio (Nonzeros) (Rows Columns).
Clearly, these matrices are quite sparse with only **1** or **2** links
associated with each document (manpage, article, or HTML page).

Table 4 lists the bandwidth , envelope size
, and the alternative bandwidth measure
(see in Section 3.1) for each of the hypertext matrices, and
Figure 1 illustrates the nonzero patterns for three of
the four matrices considered. The upper-triangular structure for the
nonzeros of matrices ` MAN2` and
` CCE-A` reflects the identification of links
(cross-references) in their order of occurrence in the original
(on-line) text.

**Table 4:** Profiles of the hypertext
matrices prior to reordering;

is the envelope size, denotes the bandwidth,
and is the alternative (non-diagonal) bandwidth measure.

The goal of the techniques in the next two sections will be to reorder rows and columns of the term-by-document matrix to reduce both and . A similar problem for symmetric matrices arises in the context of sparse Cholesky factorization. Since the Cholesky decomposition is stable under any symmetric permutation, various schemes have been proposed to reorder the rows and columns to reduce the number of generated nonzero values during the factorization. Several of the techniques derived below have their antecedents in symmetric envelope reduction algorithms. The correspondence analysis technique described in Section 3.5 is the primary exception, since its expense is too large for the Cholesky reordering problem.

Michael W. Berry (berry@cs.utk.edu)

Mon Jan 29 14:30:24 EST 1996