next up previous contents index
Next: CDS Matrix-Vector Product Up: Sparse BLAS Previous: Sparse BLAS   Contents   Index


CRS Matrix-Vector Product

The matrix-vector product $y=Ax$ using CRS format can be expressed in the usual way:

\begin{displaymath}y_i=\sum_j a_{i,j}x_j,\end{displaymath}

since this traverses the rows of the matrix $A$. For an $n \times n$ matrix A, the matrix-vector multiplication is given by
   for i = 1, n 
       y(i)  = 0 
       for j = row_ptr(i), row_ptr(i+1) - 1
           y(i) = y(i) + val(j) * x(col_ind(j))
       end;
   end;
Since this method only multiplies nonzero matrix entries, the operation count is $2$ times the number of nonzero elements in $A$, which is a significant savings over the dense operation requirement of $2n^2$.

For the transpose product $y=A^Tx$ we cannot use the equation

\begin{displaymath}
y_i=\sum_j (A^T)_{i,j}x_j=\sum_j a_{j,i}x_j,
\end{displaymath}

since this implies traversing columns of the matrix, an extremely inefficient operation for matrices stored in CRS format. Hence, we switch indices to

\begin{displaymath}\hbox{for all $j$, do for all $i$:}\qquad y_i\leftarrow
y_i+a_{j,i}x_j. \end{displaymath}

The matrix-vector multiplication involving $A^T$ is then given by
   for i = 1, n
       y(i) =  0
   end;
   for j = 1, n
       for i = row_ptr(j), row_ptr(j+1)-1
           y(col_ind(i)) = y(col_ind(i)) + val(i) * x(j)
       end;
   end;

Both matrix-vector products above have largely the same structure, and both use indirect addressing. Hence, their vectorizability properties are the same on any given computer. However, the first product ($y=Ax$) has a more favorable memory access pattern in that (per iteration of the outer loop) it reads two vectors of data (a row of matrix $A$ and the input vector $x$) and writes one scalar. The transpose product ($y=A^Tx$), on the other hand, reads one element of the input vector and one row of matrix $A$ and both reads and writes the result vector $y$. Unless the machine on which these methods are implemented has three separate memory paths (e.g., Cray's vector computers), the memory traffic will then limit the performance. This is an important consideration for RISC-based architectures.


next up previous contents index
Next: CDS Matrix-Vector Product Up: Sparse BLAS Previous: Sparse BLAS   Contents   Index
Susan Blackford 2000-11-20