
5.1 Comparison with Lanczos Algorithm

The cost and performance of the PVM implementation of CSI-MSVD have been compared with those of LAS1 and LAS2, the Lanczos algorithms implemented in SVDPACK [3]. LAS1 solves the eigenvalue problem for the two-cyclic matrix from Equation (2), and LAS2 solves the eigenvalue problem for the matrix A^T A. To estimate the cost of each method, the maximum memory requirement for any process in the virtual machine for CSI-MSVD was compared to the corresponding requirements for LAS1 and LAS2.

  


Table 2: Memory requirements for matrix-vector multiplication in CSI-MSVD (using 20 processors) and memory requirements for SVDPACK routines LAS1 and LAS2 (single processor).

The dominant memory cost for CSI-MSVD is the storage for the matrix A itself. The memory requirements for calculating the triangular matrix defined in Equation (24) and for the extraction of singular values from the resulting bidiagonal matrix in Equation (30) are bounded by the number of iterations i required for convergence. As demonstrated by Equations (22) and (21), the construction of the Jacobi matrix is accelerated by the extraction of moments at step k+1, and hence the number of iterations i is usually small. For a maximum iteration limit when approximating the 10-largest singular triplets, the upper bound on the memory requirements for the calculation of the triangular matrix is 0.191 Kbytes, and that for the bidiagonal QR iteration is 0.456 Kbytes. In contrast, the combined memory requirements for matrix A storage and the Chebyshev iteration are shown in Table 2. The memory requirement for storage of the matrix A within CSI-MSVD can, of course, be controlled by varying the number of processors used.
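To make the final extraction step concrete, the following minimal Python sketch (not from the paper; the matrix entries are hypothetical) obtains singular values from a small bidiagonal matrix, using a dense SVD as a stand-in for the bidiagonal QR iteration described above:

```python
import numpy as np

# Hypothetical bidiagonal matrix of order 4: diagonal d, superdiagonal e.
# The paper's bidiagonal QR iteration computes the singular values of
# such a matrix; a dense SVD serves here purely as an illustration.
d = np.array([4.0, 3.0, 2.0, 1.0])
e = np.array([0.5, 0.5, 0.5])
B = np.diag(d) + np.diag(e, k=1)

# Singular values returned in descending order; storage for d, e, and
# the results grows only linearly with the matrix order (i.e., with
# the iteration count i), which is why these bounds stay small.
singular_values = np.linalg.svd(B, compute_uv=False)
```

Because B is triangular with nonzero diagonal, all four singular values are strictly positive.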

The memory requirements for the Lanczos algorithms from SVDPACK are also listed in Table 2. The values (in Mbytes) reported for both LAS1 and LAS2 reflect auxiliary memory requirements beyond those required for matrix A storage, i.e., storage for Lanczos vectors and temporary work arrays. Even if the matrix A is partitioned across several processors and matrix-vector multiplication is carried out through data-parallel computation, the larger memory requirements of the SVDPACK routines (compared to CSI-MSVD) could prove to be an insurmountable bottleneck for modest computing environments.
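The data-parallel multiplication mentioned above can be sketched as follows; this is a serial simulation in Python in which each "processor" owns a contiguous block of rows of A, and is illustrative only, not the paper's PVM implementation:

```python
import numpy as np

def partitioned_matvec(A, x, n_procs):
    """Row-partitioned y = A x: each process computes its slice of y
    against the full vector x, and the slices are gathered at the end.
    (Hypothetical helper; partitioning scheme is an assumption.)"""
    blocks = np.array_split(A, n_procs, axis=0)      # one row block per process
    local_results = [block @ x for block in blocks]  # data-parallel step
    return np.concatenate(local_results)             # gather distributed pieces

A = np.arange(12.0).reshape(4, 3)
x = np.array([1.0, 2.0, 3.0])
y = partitioned_matvec(A, x, n_procs=2)
```

Row partitioning keeps the per-process storage for A at roughly 1/p of the total, which is the effect exploited by CSI-MSVD in Table 2.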

  


Figure 6: Comparison of cumulative overhead when computing singular triplets by LAS1 and CSI-MSVD. CSI-MSVD requires fewer matrix-vector multiplications to calculate each triplet, but LAS1 has a lower cumulative count for k > 8 because more than one singular triplet can be deflated at each step.

The number of matrix-vector multiplications required by CSI-MSVD and the Lanczos algorithms has been compared in [22]. When approximating the 10-largest singular values and the corresponding singular vectors of the matrices from Table 2 using a fixed error tolerance, the CSI-MSVD method may eventually require more matrix-vector multiplications. Figure 6 compares the cumulative overhead for the matrix ENCY in terms of the number of matrix-vector multiplications required to obtain the k-largest triplets using LAS1 and CSI-MSVD. Notice that the number of matrix-vector multiplications for CSI-MSVD is typically lower than the corresponding values for LAS1 for the 8-largest singular triplets.
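A cumulative comparison like the one in Figure 6 can be gathered with a simple counting wrapper around the matrix; the sketch below (an illustrative helper, not part of CSI-MSVD or SVDPACK) tallies matrix-vector products in the way such overhead counts would be collected:

```python
import numpy as np

class CountingMatvec:
    """Wrap a matrix and count matrix-vector products, mimicking how
    per-triplet matvec overhead would be tallied for a plot like Figure 6."""
    def __init__(self, A):
        self.A = A
        self.count = 0

    def matvec(self, x):
        self.count += 1          # one multiplication per call
        return self.A @ x

A = np.eye(3)
op = CountingMatvec(A)
for _ in range(13):              # e.g., 13 iterations for the first triplet
    _ = op.matvec(np.ones(3))
print(op.count)                  # 13
```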

Although CSI-MSVD requires only 13 iterations to approximate the first singular triplet (compared to the 50 iterations required by LAS1), the incremental overhead of calculating each additional triplet for CSI-MSVD will increase. Such overhead could be reduced if, at each step, it were possible to accept more than one singular triplet. However, the current implementation produces only one singular vector approximation per iteration, which precludes this reduction in overhead.






Michael W. Berry (berry@cs.utk.edu)
Sun May 19 11:34:27 EDT 1996