
Block Arnoldi Reductions

Let $A$ be a matrix of order $n$ and $b > 0$ be the block size. We say that
\begin{displaymath}
A V_{[m]} = V_{[m]} H_{[m]} + F_m E^{\ast}_m
\end{displaymath} (135)

is a block Arnoldi reduction of length $m$ when $V_{[m]}^\ast AV_{[m]}=H_{[m]}$ is a band upper Hessenberg matrix of order $m\cdot b$, $V_{[m]}^\ast V_{[m]}= I_{m b}$, and $V_{[m]}^\ast F_m=0$. For us, a band upper Hessenberg matrix is an upper triangular matrix with $b$ subdiagonals. The columns of $V_{[m]} = [V_1,V_2,\ldots,V_m]$ form an orthonormal basis for the block Krylov subspace

\begin{displaymath}{\cal K}^m(A,V_1) \equiv \mbox{span}\{V_1, A V_1,\ldots, A^{m-1} V_1 \}.\end{displaymath}

If $ m > \bar{m} \equiv \mbox{ceiling}(n/b)$, then $F_m = {0}$ and $H_{[m]}$ is the orthogonal reduction of $A$ into banded upper Hessenberg form. We assume, for the moment, that $F_m$ is of full rank and further suppose that the elements on the diagonal of $H_{m+1,m}$ are positive. A straightforward extension of the implicit Q theorem [198, pp. 367-368] then shows that the reduction is (uniquely) determined by the starting block $V_1$. Note that if $A=A^\ast$, then $H_{[m]}$ is a block tridiagonal matrix. Algorithm 7.11 computes a block Arnoldi reduction.
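To make the definitions concrete, the reduction can be built and its three defining properties checked in a few lines of numpy. This is our own illustrative sketch, not code from the book; the variable names mirror the notation above, and the sign fix after the QR factorization enforces the positive-diagonal convention on $H_{j+1,j}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b, m = 12, 2, 3                       # order of A, block size, length

A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, b)))   # starting block V_1
H = np.zeros((m * b, m * b))

for j in range(m):
    if j > 0:
        Q, R = np.linalg.qr(F)           # next block column from the residual
        s = np.sign(np.diag(R))
        s[s == 0] = 1.0
        Q, R = Q * s, s[:, None] * R     # make diag(H_{j+1,j}) positive
        H[j*b:(j+1)*b, (j-1)*b:j*b] = R
        V = np.hstack([V, Q])
    W = A @ V[:, j*b:(j+1)*b]            # apply A to the newest block
    C = V.T @ W                          # classical Gram-Schmidt coefficients
    F = W - V @ C
    C2 = V.T @ F                         # one reorthogonalization pass
    F -= V @ C2
    H[:(j+1)*b, j*b:(j+1)*b] = C + C2

# the three defining properties of the reduction (135):
E = np.zeros((m * b, b)); E[-b:] = np.eye(b)   # E_m: last b columns of I_{mb}
assert np.allclose(A @ V, V @ H + F @ E.T)     # the reduction itself
assert np.allclose(V.T @ V, np.eye(m * b))     # orthonormal columns
assert np.allclose(V.T @ F, 0)                 # residual orthogonal to V_[m]
```

With $m\bar{} < \mbox{ceiling}(n/b)$ and a generic $A$, the subdiagonal blocks stay nonsingular and the assertions pass; a rank-deficient $F$ would require the repair described in note (1) below.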


\begin{algorithm}{Extending a Block Arnoldi Reduction
}
{
\begin{tabbing}
(1) Compute the QR factorization $V_{m+1} H_{m+1,m} = F_m$ \\
(2) $V_{[m+1]} = [\, V_{[m]}, V_{m+1} \,]$ \\
(3) $W = A V_{m+1}$ \\
(4) Compute $V_{[m]}^\ast W$ \\
(5) $H_{m+1,m+1} = V_{m+1}^\ast W$ \\
(6) $F_{m+1} = W - V_{[m+1]}
\left[ \begin{array}{c} V_{[m]}^\ast W \\
H_{m+1,m+1}
\end{array}\right] $\end{tabbing}}
\end{algorithm}
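The steps above can be sketched in numpy as follows. This is a sketch under the section's notation, not the production implementation discussed later; `extend_block_arnoldi` is our hypothetical helper name.

```python
import numpy as np

def extend_block_arnoldi(A, V, H, F):
    """Extend a length-m block Arnoldi reduction by one block, steps (1)-(6).
    V: n x (m*b) orthonormal basis, H: (m*b) x (m*b), F: n x b residual."""
    n, b = F.shape
    mb = V.shape[1]                          # m * b
    Vnext, Hsub = np.linalg.qr(F)            # (1) V_{m+1} H_{m+1,m} = F_m
    V = np.hstack([V, Vnext])                # (2) V_{[m+1]} = [V_{[m]}, V_{m+1}]
    W = A @ Vnext                            # (3) apply A to the whole block
    G = V[:, :mb].T @ W                      # (4) V_{[m]}^* W (an xGEMM product)
    Hdiag = Vnext.T @ W                      # (5) H_{m+1,m+1}
    C = np.vstack([G, Hdiag])                # (6) one block CGS step ...
    Fnew = W - V @ C
    C2 = V.T @ Fnew                          # ... plus a second pass
    Fnew = Fnew - V @ C2
    C = C + C2
    # assemble the extended H_{[m+1]}
    Hnew = np.zeros((mb + b, mb + b))
    Hnew[:mb, :mb] = H
    Hnew[mb:, mb - b:mb] = Hsub              # H_{m+1,m}
    Hnew[:, mb:] = C
    return V, Hnew, Fnew

# seed a length-1 reduction, then extend it three times
rng = np.random.default_rng(0)
n, b = 16, 2
A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, b)))   # starting block V_1
W = A @ V
H = V.T @ W
F = W - V @ H
c = V.T @ F; F -= V @ c; H += c                    # one correction pass

for _ in range(3):
    V, H, F = extend_block_arnoldi(A, V, H, F)

resid = A @ V - V @ H          # zero except the last b columns, which equal F
```

Note that this sketch assumes $F$ has full rank; the QR factorization in step (1) would need the repair strategy of note (1) below to handle a rank-deficient residual.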
We now comment on some steps of Algorithm 7.11.

(1)
Here the QR factorization is computed via an iterated CGS (classical Gram-Schmidt) algorithm with a possible correction step. See [96] for details and for the simple test used to determine whether a correction step is necessary. One benefit of this scheme is that it allows the use of the Level 2 BLAS [133] matrix-vector multiplication subroutine xGEMV (see also §10.2). Moreover, the scheme gives a simple way to fill out a rank-deficient $F_m$. For instance, if a third orthogonalization step is needed when generating column $j$ of $V_{m+1}$, then the corresponding column of $F_m$ is linearly dependent on the previous $j-1$ columns of $V_{m+1}$. The $j$th diagonal element of $H_{m+1,m}$ is set to zero, and a random unit vector is orthogonalized against $V_{[m]}$ and the first $j-1$ columns of $V_{m+1}$.
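The column-by-column procedure, including the rank-deficiency repair, can be sketched as follows. This is our own rough rendering: `icgs_qr` is a hypothetical helper, and the constant `eta` plays the role of the simple drop test from [96] without claiming to be its exact constant.

```python
import numpy as np

def icgs_qr(V, F, eta=1.0 / np.sqrt(2.0), rng=None):
    """QR-factorize F against an orthonormal V by iterated classical
    Gram-Schmidt. Returns Q (orthonormal, orthogonal to V) and upper
    triangular R; a numerically dependent column j gets R[j, j] = 0
    and a random replacement direction, as in note (1)."""
    rng = rng or np.random.default_rng(1)
    n, b = F.shape
    Q = np.zeros((n, b))
    R = np.zeros((b, b))

    def cgs_pass(w, j):
        # one classical Gram-Schmidt pass against [V, Q[:, :j]]
        cV, cQ = V.T @ w, Q[:, :j].T @ w
        return w - V @ cV - Q[:, :j] @ cQ, cQ

    for j in range(b):
        w, old = F[:, j].copy(), np.linalg.norm(F[:, j])
        w, cQ = cgs_pass(w, j)
        R[:j, j] += cQ
        if np.linalg.norm(w) <= eta * old:     # severe cancellation: correct once
            old = np.linalg.norm(w)
            w, cQ = cgs_pass(w, j)
            R[:j, j] += cQ
        if np.linalg.norm(w) <= eta * old:     # a third step would be needed:
            R[j, j] = 0.0                      # column j is numerically dependent
            w = rng.standard_normal(n)         # random replacement direction,
            w, _ = cgs_pass(w, j)              # orthogonalized (twice) against
            w, _ = cgs_pass(w, j)              # V and the accepted columns of Q
        else:
            R[j, j] = np.linalg.norm(w)
        Q[:, j] = w / np.linalg.norm(w)
    return Q, R
```

For example, if the second column of `F` is identically zero, the test fails twice, `R[1, 1]` is set to zero, and the column is filled out with a random unit vector orthogonal to `V` and to the first column of `Q`.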

(3)
This step allows the application of $A$ to a group of vectors. This might prove essential when accessing $A$ is expensive. Clearly, the goal is to amortize the cost of applying $A$ over several vectors.

(4)
This allows the use of the Level 3 BLAS [134] matrix-matrix multiplication subroutine xGEMM for computing $V_{[m]}^\ast W.$

(6)
This is one step of block classical Gram-Schmidt (bCGS). The Level 3 BLAS [134] matrix-matrix multiplication subroutine xGEMM is used for the rank-$b$ update needed to compute $F_{m+1}$. To ensure the orthogonality of $F_{m+1}$ to $V_{[m+1]}$, a second step of bCGS is performed, except when $b=1$; in that case, the simple test in DGKS [96] is used to determine whether a second orthogonalization step is needed. See [295] for details.
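The effect of the second orthogonalization step can be reproduced in a few lines of numpy. This is a generic illustration (not code from the cited references): when the block $W$ lies almost entirely in the range of the current basis, a single classical Gram-Schmidt step leaves a noticeable orthogonality loss, and a second step reduces it by many orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, b = 200, 40, 4
V, _ = np.linalg.qr(rng.standard_normal((n, k)))   # an orthonormal V_{[m+1]}
# a block W lying almost entirely in range(V): the case where CGS struggles
W = V @ rng.standard_normal((k, b)) + 1e-8 * rng.standard_normal((n, b))

F1 = W - V @ (V.T @ W)        # one bCGS step
F2 = F1 - V @ (V.T @ F1)      # second bCGS step

loss1 = np.linalg.norm(V.T @ F1)   # orthogonality loss after one step
loss2 = np.linalg.norm(V.T @ F2)   # after two: typically far smaller
```

Relative to the size of the residual, the one-step loss here is large, which is exactly why the second pass (or the DGKS test when $b=1$) is part of the algorithm.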

The scheme given for computing $V_{[m+1]}$ is equivalent to the one proposed by Ruhe
[375] (for symmetric matrices) and is the one used by the implicitly restarted block Lanczos code [24]. Although the approach in [24] cleanly deals with the problem of a rank-deficient $F_m$, the implementation does not exploit the ability to apply $A$ to a group of vectors as in (3) above. Instead, as proposed in [375], $A$ is applied to each column of $F_m$ in turn, followed by computation of the corresponding column of $V_{m+1}$ and $H_{[m+1]}.$ Our implementation further reorganizes Ruhe's approach so that the computation of the matrix of coefficients $V_{[m]}^\ast W$ is separated from the QR factorization of $F_m.$ The advantage is that steps (3)-(5) reduce the cost of I/O by a factor of the block size and increase the number of floating point operations per memory reference.


Susan Blackford 2000-11-20