The envelope minimization problem for a term-by-document (or hypertext) matrix can be formulated and solved in purely symbolic terms by reordering vertices in a suitable graph representation of the matrix. The graph methods we describe in this section are based on reorderings for sparse symmetric matrices for Cholesky factorization.

Perhaps the most widely used envelope minimization method for symmetric
sparse matrices
is the Reverse Cuthill-McKee (RCM) method of Alan George [Geo71] which is
applied to the graph of the matrix.
For an symmetric matrix **B**, the graph
is undirected with **n** vertices each corresponding to
a row or column and edges corresponding to each nonzero, i.e.
**iff** . The RCM method generates
a new labeling or ordering of the rows and columns of **B**. Observe that
if , , row **u** has been labeled, but
rows **v** and **z** have not, then, to
minimize the bandwidth of row **u**, **v** should be numbered as soon as possible. Furthermore,
to minimize the bandwidth of row **z**, **z** should also be numbered as soon as possible
after **u** and **v**. In terms of , notice that **z** is adjacent to **v** which
is in turn adjacent to **u**.
The RCM method makes use of this observation. The main
step involves a modified breadth first search (level search) from a
designated starting vertex; the modification to breadth first search is
that neighbors of a given vertex are explored in increasing order of
degree. The RCM
numbering is obtained by reversing the breadth first search numbering,
i.e., if vertex **u** is the **i**-th vertex to be explored then its RCM labeling
is **n-i+1**. This reversal was shown to produce a better envelope
[LS76]. The choice of the starting vertex is very significant and
a * peripheral* vertex is desired. The implementation of
of RCM [GL81] uses an approximation to a peripheral vertex by
choosing a vertex of high * eccentricity*, i.e., a
vertex whose distance to some other vertex in the graph is
close to the maximum distance between any two vertices in the graph.

For nonsymmetric overdetermined hypertext matrices, bipartite graphs
provide a natural extension of the graph model for symmetric matrices.
For the hypertext matrix **A**, the associated undirected bipartite graph is
denoted by and has **m** row vertices and **n** column vertices. The row
vertices are labeled and the column vertices
are labeled .
The graph has an edge between row vertex and column
vertex **c** for each .
To compute reorderings of **A** we apply RCM to **H** but we maintain
two distinct numbering sequences during modified breadth first search: one for
the row vertices and another for the column vertices. We obtain
the final reordering by reversing each of these sequences.
For example, if is a row (column) vertex
numbered (**l**) during the search, then it is given the final
number (**n-l+1**).
Figure 2 illustrates the main step in RCM for
the term-by-document matrix from Section 2.2 and
Table 5 shows the reordered matrix.

The complexity of the RCM for ordering **H** is proportional to the product
of the maximum degree of any vertex in **H** and the total number of
edges (nonzeroes in the matrix **A**). For hypertext matrices
with small maximum degree, the method would be extremely fast.
The strength of the method is its low time complexity but it does
suffer from certain drawbacks. The heuristic for finding the starting
vertex is influenced by the initial numbering of vertices and so
the quality of the reordering can vary slightly for the same problem
for different initial numberings. Next, the overall method does not
accommodate dense rows (e.g., a common link used in every document),
and if a row has a significantly large
number of nonzeroes it might be best to process it separately;
i.e., extract the dense rows, reorder the remaining matrix and
augment it by the dense rows (or common links) numbered last.

Michael W. Berry (berry@cs.utk.edu)

Mon Jan 29 14:30:24 EST 1996