Next: Computational Time Up: Performance on Hypertext Previous: Performance on Hypertext

## 4.1. Bandwidth Reduction

As indicated by values in Tables 8 through 10, , , and are very large for the hypertext matrices in their natural ordering. Values in Table 8 show that is substantially reduced for all orderings with the largest envelope reduction obtained by the Fiedler vector approach discussed in Section 3.4. With respect to , however, the RCM ordering achieves the greatest bandwidth reduction (see Table 9). Notice that for some matrices ( CCE-A, NHSE), the value of shown in Table 10 is significantly lower than that of . This is due to the clustering of nonzeros in a narrow band but the band itself is significantly displaced from the diagonal. Also, the orderings using Correspondence Analysis without distances (see Section 3.5) tend to produce 's more similar to those of RCM than the Fiedler approach.

Table 11 illustrates the effects of choosing different pairs of principal axes for Correspondence Analysis with distances (CACS). Since the 8-largest generalized singular values (see Equation 9) for these matrices were all approximately equal to 1 (i.e., form a cluster of generalized singular values near 1), it is not clear which pairs of principal axes best explains the variation in Equation (11). By selecting the 10-th pair (or 10-th largest) of principal axes, a reduction in of 43% and 17% can be obtained for MAN1 and NHSE, respectively. CACS(10) also achieves an average reduction of 25% for and 29% for for these matrices.

Table 8: Envelope size ( ) of the hypertext matrices after reorderings; CACS(2) denotes the use of Correspondence Analysis using distances and the second principal axes, and CANC(1) denotes the use of Correspondence Analysis with no distances and the first principal axes.

Table 9: Bandwidth () of the hypertext matrices after reorderings; CACS(2) denotes the use of Correspondence Analysis using distances and the second principal axes, and CANC(1) denotes the use of Correspondence Analysis with no distances and the first principal axes.

Table 10: Non-diagonal bandwidth () of the hypertext matrices after reorderings; CACS(2) denotes the use of Correspondence Analysis (CA) using distances and the second principal axes, and CANC(1) denotes the use of CA with no distances and the first principal axes.

Table 11: Effects of using different principal axes in Correspondence Analysis with distances. CACS(2) and CACS(10) denote the use of the second and tenth principal axes, respectively.

Figures 3 through 5 illustrate the reorderings obtained for a few of the sample hypertext matrices discussed in Section 3.2. Of particular interest is the similarity of reorderings produced by the pairs (RCM, CANC(1)) and (Fiedler, CACS(2)) for the MAN1 and MAN2 matrices. For the other two matrices ( CCE-A, NHSE) whose average number of documents per link (see from Table 3) is smaller, the similarities were not as prominent. As shown in Figure 4, the near-diagonal clusterings obtained are quite different. In particular, CANC(1) produces a definite block-diagonal pattern of nonzeros but at the expense of the largest bandwidths ( and ) among all reorderings of the four test matrices (see Tables 9 and 10). The reduction in bandwidth achieved by CACS(10), i.e., using the 10-th largest pair of principal axes or left and right generalized singular vectors of A in Equation (9), is illustrated in Figure 5 for the MAN1 matrix.

The proper selection of principal axes for Correspondence Analysis with distances when the hypertext matrix A has clustered generalized singular values ('s from Equation (11)) is problematic. Nevertheless, an iterative procedure which cycles through a subset of the largest generalized singular vectors of A as computed by a Lanczos or block Lanczos SVD method (see [B+93]) is plausible. Future research in the use of principal axes for bandwidth reduction in the presence of clustered spectra is warranted.

Next: Computational Time Up: Performance on Hypertext Previous: Performance on Hypertext

Michael W. Berry (berry@cs.utk.edu)
Mon Jan 29 14:30:24 EST 1996