The envelope minimization problem for a term-by-document (or hypertext) matrix can be formulated and solved in purely symbolic terms by reordering vertices in a suitable graph representation of the matrix. The graph methods we describe in this section are based on reorderings for sparse symmetric matrices for Cholesky factorization.
Perhaps the most widely used envelope minimization method for symmetric sparse matrices is the Reverse Cuthill-McKee (RCM) method of Alan George [Geo71] which is applied to the graph of the matrix. For an symmetric matrix B, the graph is undirected with n vertices each corresponding to a row or column and edges corresponding to each nonzero, i.e. iff . The RCM method generates a new labeling or ordering of the rows and columns of B. Observe that if , , row u has been labeled, but rows v and z have not, then, to minimize the bandwidth of row u, v should be numbered as soon as possible. Furthermore, to minimize the bandwidth of row z, z should also be numbered as soon as possible after u and v. In terms of , notice that z is adjacent to v which is in turn adjacent to u. The RCM method makes use of this observation. The main step involves a modified breadth first search (level search) from a designated starting vertex; the modification to breadth first search is that neighbors of a given vertex are explored in increasing order of degree. The RCM numbering is obtained by reversing the breadth first search numbering, i.e., if vertex u is the i-th vertex to be explored then its RCM labeling is n-i+1. This reversal was shown to produce a better envelope [LS76]. The choice of the starting vertex is very significant and a peripheral vertex is desired. The implementation of of RCM [GL81] uses an approximation to a peripheral vertex by choosing a vertex of high eccentricity, i.e., a vertex whose distance to some other vertex in the graph is close to the maximum distance between any two vertices in the graph.
For nonsymmetric overdetermined hypertext matrices, bipartite graphs provide a natural extension of the graph model for symmetric matrices. For the hypertext matrix A, the associated undirected bipartite graph is denoted by and has m row vertices and n column vertices. The row vertices are labeled and the column vertices are labeled . The graph has an edge between row vertex and column vertex c for each . To compute reorderings of A we apply RCM to H but we maintain two distinct numbering sequences during modified breadth first search: one for the row vertices and another for the column vertices. We obtain the final reordering by reversing each of these sequences. For example, if is a row (column) vertex numbered (l) during the search, then it is given the final number (n-l+1). Figure 2 illustrates the main step in RCM for the term-by-document matrix from Section 2.2 and Table 5 shows the reordered matrix.
The complexity of the RCM for ordering H is proportional to the product of the maximum degree of any vertex in H and the total number of edges (nonzeroes in the matrix A). For hypertext matrices with small maximum degree, the method would be extremely fast. The strength of the method is its low time complexity but it does suffer from certain drawbacks. The heuristic for finding the starting vertex is influenced by the initial numbering of vertices and so the quality of the reordering can vary slightly for the same problem for different initial numberings. Next, the overall method does not accommodate dense rows (e.g., a common link used in every document), and if a row has a significantly large number of nonzeroes it might be best to process it separately; i.e., extract the dense rows, reorder the remaining matrix and augment it by the dense rows (or common links) numbered last.