The cost and performance of the PVM implementation of CSI-MSVD have been compared with those of LAS1 and LAS2, the Lanczos algorithms implemented in SVDPACK [3]. LAS1 solves the eigenvalue problem for the two-cyclic matrix from Equation (2), and LAS2 solves the eigenvalue problem for the matrix **A**^T **A**. To estimate the cost of each method, the maximum memory requirement for any process in the virtual machine under CSI-MSVD was compared to the corresponding requirements for LAS1 and LAS2.
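The relationship between the two eigenproblems can be checked numerically. The sketch below (NumPy, with a small random matrix standing in for **A**) verifies that the positive eigenvalues of the two-cyclic matrix are the singular values of **A**, while the eigenvalues of **A**^T **A** are their squares; the matrix sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
m, n = A.shape

# Two-cyclic (Jordan-Wielandt) matrix, the Equation (2) form solved by LAS1:
# its positive eigenvalues are the singular values of A.
C = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])
eig_cyclic = np.sort(np.linalg.eigvalsh(C))[::-1][:n]   # n largest eigenvalues

# A^T A, the operator used by LAS2: its eigenvalues are the squared
# singular values of A.
eig_ata = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

sv = np.linalg.svd(A, compute_uv=False)
assert np.allclose(eig_cyclic, sv)
assert np.allclose(np.sqrt(eig_ata), sv)
```

Note that the two-cyclic matrix is (m+n) x (m+n) while **A**^T **A** is only n x n, which is one reason the two codes have different memory footprints.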

The dominant memory cost for CSI-MSVD is the storage of the matrix **A** itself. The memory required to compute the triangular matrix defined in Equation (24) and to extract singular values from the resulting bidiagonal matrix in Equation (30) is bounded by the number of iterations **i** needed for convergence. As demonstrated by Equations (22) and (21), the construction of the Jacobi matrix is accelerated by the extraction of moments at step **k+1**, and hence the number of iterations **i** is usually small. For the maximum iteration limit used when approximating the 10 largest singular triplets, the upper bound on the memory required to compute the triangular matrix of Equation (24) is 0.191 Kbytes, and that for the bidiagonal QR iteration is 0.456 Kbytes. In contrast, the combined memory requirements for storage of the matrix **A** and for the Chebyshev iteration are shown in Table 2. The memory required to store the matrix **A** within CSI-MSVD can, of course, be controlled by varying the number of processors used.
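A minimal sketch of why the extraction step is so cheap: after **i** iterations the bidiagonal matrix is defined by only 2**i** - 1 scalars, independent of the dimensions of **A**. The diagonal and superdiagonal entries below are arbitrary illustrative values, not output of CSI-MSVD.

```python
import numpy as np

i = 13                                   # iterations to converge (see text)
diag = np.linspace(2.0, 1.0, i)          # hypothetical diagonal entries
superdiag = np.full(i - 1, 0.1)          # hypothetical superdiagonal entries

# The i x i upper-bidiagonal matrix is assembled from those 2i - 1 scalars.
B = np.diag(diag) + np.diag(superdiag, k=1)
singular_values = np.linalg.svd(B, compute_uv=False)

# Storage for the bidiagonal data alone: (2i - 1) doubles, 8 bytes each,
# so the cost grows with the iteration count i, not with the size of A.
bytes_needed = (2 * i - 1) * 8
```

With i = 13 this is only 200 bytes of bidiagonal data, consistent with the sub-kilobyte bounds quoted above.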

The memory requirements for the Lanczos algorithms from SVDPACK
are also listed in Table 2. The values (in Mbytes)
reported for both LAS1 and LAS2 reflect auxiliary memory requirements
beyond that required for matrix **A** storage, i.e., storage for
Lanczos vectors and temporary work arrays. Even if the matrix **A** is
partitioned across several processors and matrix-vector multiplication
is carried out through data-parallel computation, the larger
memory requirements for the SVDPACK routines (compared to CSI-MSVD)
could prove to be an insurmountable bottleneck for modest computing
environments.
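The partitioning mentioned above can be sketched as a row-block matrix-vector product. The block layout and the `partitioned_matvec` helper below are assumptions for illustration, not the PVM implementation itself; each "processor" (simulated sequentially here) stores only its own block of **A**.

```python
import numpy as np

def partitioned_matvec(blocks, x):
    """y = A x, with A stored as a list of contiguous row blocks.

    Each block would live on a different processor; only the result
    vector needs to be gathered back together.
    """
    return np.concatenate([blk @ x for blk in blocks])

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
x = rng.standard_normal(4)

p = 4                                    # number of processors (assumed)
blocks = np.array_split(A, p, axis=0)    # one row block per processor
y = partitioned_matvec(blocks, x)

assert np.allclose(y, A @ x)
```

Per-process storage is roughly (m/p) x n entries of **A**, which is how increasing the processor count lowers the per-process memory requirement.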

The number of matrix-vector
multiplications of the form **A**x required by
CSI-MSVD and the Lanczos algorithms has been compared in [22].
When approximating
the 10 largest singular values and the corresponding singular vectors of the
matrices from Table 2 using
a prescribed error tolerance, the CSI-MSVD method may eventually
require more matrix-vector multiplications. Figure
6 illustrates a comparison of cumulative overhead for
the matrix `ENCY` in terms of the number of matrix-vector
multiplications required
to compute the **k** largest triplets using LAS1 and CSI-MSVD. Notice that
the number of matrix-vector multiplications for CSI-MSVD is
typically lower than the corresponding counts for LAS1 through the
8 largest singular triplets.
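Counting matrix-vector products, the cost measure used in this comparison, can be done by wrapping the operator. The `CountingOperator` class and the power-iteration stand-in below are illustrative assumptions, not CSI-MSVD or LAS1.

```python
import numpy as np

class CountingOperator:
    """Wraps a matrix and counts how many times it is applied."""
    def __init__(self, A):
        self.A = A
        self.matvecs = 0

    def matvec(self, x):
        self.matvecs += 1
        return self.A @ x

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 10))
op = CountingOperator(A.T @ A)           # operator for the LAS2-style problem

# Power iteration as a simple stand-in solver; any iterative SVD method
# could be instrumented the same way.
x = rng.standard_normal(10)
for _ in range(50):                      # fixed iteration count for the sketch
    x = op.matvec(x)
    x /= np.linalg.norm(x)

sigma_sq = x @ op.matvec(x)              # Rayleigh quotient: largest sigma^2
```

After the loop plus the final Rayleigh-quotient evaluation, `op.matvecs` holds the total operator-application count that a comparison like Figure 6 would plot.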

Although CSI-MSVD requires only 13 iterations to approximate the first singular triplet (compared to the 50 iterations required by LAS1), the incremental overhead of computing each additional triplet with CSI-MSVD increases. This overhead could be reduced if more than one singular triplet could be accepted at each step; however, the current implementation produces only one singular vector approximation per iteration, which precludes this reduction.

Michael W. Berry (berry@cs.utk.edu)

Sun May 19 11:34:27 EDT 1996