The cost and performance of the PVM implementation of CSI-MSVD have been compared with those of LAS1 and LAS2, the Lanczos algorithms implemented in SVDPACK. LAS1 solves the eigenvalue problem for the two-cyclic matrix from Equation (2), and LAS2 solves the eigenvalue problem for the matrix A^T A. To estimate the cost of each method, the maximum memory requirement of any process in the virtual machine for CSI-MSVD was compared to the corresponding requirements for LAS1 and LAS2.
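For reference, the operators underlying the two Lanczos variants can be written in their standard forms (reconstructed here from the usual SVDPACK formulations, not copied from Equation (2)): the eigenvalues of the two-cyclic matrix B are the singular values of A in plus/minus pairs, while A^T A has their squares.

```latex
B \;=\; \begin{bmatrix} 0 & A \\ A^{T} & 0 \end{bmatrix},
\qquad
\lambda_i(B) \;=\; \pm\,\sigma_i(A),
\qquad
\lambda_i\!\left(A^{T}A\right) \;=\; \sigma_i(A)^{2}.
```

This is why LAS2 needs less storage per step (vectors of length n rather than m+n) but squares the condition of the problem, whereas LAS1 works with the better-conditioned but larger two-cyclic matrix.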
The dominant memory cost for CSI-MSVD is the storage of the matrix A itself. The memory requirements for calculating the triangular matrix defined in Equation (24) and for extracting singular values from the resulting bidiagonal matrix in Equation (30) are bounded by the number of iterations i required for convergence. As demonstrated by Equations (22) and (21), the construction of the Jacobi matrix is accelerated by the extraction of moments at step k+1, and hence the number of iterations i is usually small. Given the maximum iteration limit used when approximating the 10 largest singular triplets, the upper bound on the memory requirement for calculating the triangular matrix is 0.191 Kbytes, and that for the bidiagonal QR iteration is 0.456 Kbytes. In contrast, the combined memory requirements for storing A and for the Chebyshev iteration are shown in Table 2. The memory required to store A within CSI-MSVD can, of course, be controlled by varying the number of processors used.
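To make the last point concrete, the following sketch (illustrative only, not taken from the paper) estimates the per-process storage for A when its rows are partitioned evenly across p processes, assuming compressed sparse row (CSR) storage with 8-byte values and 4-byte indices; the matrix dimensions and nonzero count in the example are hypothetical.

```python
def csr_bytes(n_rows, nnz):
    """Approximate CSR storage: values + column indices + row pointers.
    Assumes 8-byte floating-point values and 4-byte integer indices."""
    return nnz * 8 + nnz * 4 + (n_rows + 1) * 4

def per_process_bytes(n_rows, nnz, p):
    """Upper bound per process when rows (and, roughly, nonzeros)
    are split evenly across p processes."""
    rows_local = -(-n_rows // p)   # ceiling division
    nnz_local = -(-nnz // p)
    return csr_bytes(rows_local, nnz_local)

# Hypothetical matrix roughly the size of a term-document collection.
for p in (1, 2, 4, 8):
    mb = per_process_bytes(25_000, 500_000, p) / 2**20
    print(f"p={p}: ~{mb:.2f} Mbytes per process")
```

Doubling the number of processes roughly halves the per-process burden, which is the mechanism by which CSI-MSVD keeps any single process within the memory budget of a modest workstation.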
The memory requirements for the Lanczos algorithms from SVDPACK are also listed in Table 2. The values (in Mbytes) reported for both LAS1 and LAS2 reflect auxiliary memory requirements beyond that needed to store A, i.e., storage for Lanczos vectors and temporary work arrays. Even if the matrix A is partitioned across several processors and matrix-vector multiplication is carried out through data-parallel computation, the larger memory requirements of the SVDPACK routines (compared to CSI-MSVD) could prove to be an insurmountable bottleneck for modest computing environments.
The numbers of matrix-vector multiplications with A required by CSI-MSVD and by the Lanczos algorithms have also been compared. When approximating the 10 largest singular values and the corresponding singular vectors of the matrices from Table 2 with a fixed error tolerance, the CSI-MSVD method may eventually require more matrix-vector multiplications. Figure 6 illustrates a comparison of cumulative overhead for the matrix ENCY in terms of the number of matrix-vector multiplications required to obtain the k largest triplets using LAS1 and CSI-MSVD. Notice that the number of matrix-vector multiplications for CSI-MSVD is typically lower than the corresponding value for LAS1 through the 8 largest singular triplets.
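Operation counts like those in Figure 6 can be gathered mechanically by wrapping the matrix-vector product and counting invocations, since multiplications by A (and A^T) dominate the cost of both methods. The sketch below is a generic counting harness of my own, not part of SVDPACK or the CSI-MSVD implementation; the power iteration driving it is only a stand-in for a real SVD solver.

```python
import numpy as np

class CountingOperator:
    """Wraps a matrix and counts how many y = A @ x products it performs."""
    def __init__(self, A):
        self.A = A
        self.matvecs = 0
    def matvec(self, x):
        self.matvecs += 1
        return self.A @ x

# Stand-in solver: power iteration for the largest singular value of a
# random matrix, counting products with A and with A^T separately.
rng = np.random.default_rng(0)
op = CountingOperator(rng.standard_normal((50, 30)))
opT = CountingOperator(op.A.T)
v = rng.standard_normal(30)
for _ in range(100):
    u = op.matvec(v)            # one product with A per iteration
    v = opT.matvec(u)           # one product with A^T per iteration
    v /= np.linalg.norm(v)
sigma_max = np.linalg.norm(op.matvec(v))   # one final product with A
print(op.matvecs + opT.matvecs, "products with A or A^T")
```

Instrumenting two solvers with the same wrapper yields directly comparable cumulative counts, which is the quantity plotted in Figure 6.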
Although CSI-MSVD requires only 13 iterations to approximate the first singular triplet (compared to the 50 iterations required by LAS1), the incremental overhead of calculating each additional triplet with CSI-MSVD increases. Such overhead could be reduced if it were possible to accept more than one singular triplet at each step. However, since the current implementation produces only one singular vector approximation per iteration, this reduction in overhead is not attainable.