PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004 (Home page)

Updated: 20 February 2004

Optimization of a Sparse Hypermatrix Cholesky Factorization

Jose R. Herrero and Juan J. Navarro
Universitat Politecnica de Catalunya
Spain

Abstract:

We will present our work on a sparse Cholesky factorization based on the Hypermatrix [1,2] data structure. The matrix is partitioned recursively into blocks of different sizes. The HM structure consists of several (N) levels of submatrices. The top N-1 levels hold pointer matrices which point to the next lower level submatrices. Only the last (bottom) level holds data matrices. Data matrices are stored as dense matrices and operated as such. Null pointers in pointer matrices indicate that the corresponding submatrix does not have any non-zero elements and is therefore unnecessary.

Working on dense matrices has several advantages. First, it simplifies the data structure. Second, it allows for the usage of level 3 BLAS routines. However, it also has one disadvantage, namely the storage and computation on zero elements. We have been trying to reduce the large overhead introduced by such zeros. In a recent paper [3], we showed how we could improve performance by producing very efficient operations working on small matrices. Thus, the block size could be reduced while the effective performance increased.

We have also managed to reduce the number of zeros actually used within a data submatrix. Dimensions of a data submatrix are determined by the dimensions of two diagonal blocks: the one in the same row and the one in the same column. However, a submatrix is not necessarily full. To avoid using the whole data submatrix we have modified the original way of storing data submatrices as dense matrices. Now we can keep and operate on a subset of a data submatrix specified by the top-left and bottom-right corners. Using this, we have managed to outperform a classical left-looking supernode-supernode algorithm [4] for medium and large matrices.

However, performance is still worse than that of a 2D sparse Cholesky as implemented in routine PSLDLT on an R10000 processor [5]. We will analyze the reasons and propose ways to improve our sequential code. References:

Fuchs, G., Roy, J., Schrem, E.: Hypermatrix solution of large sets of symmetric positive-definite linear equations. Comp. Meth. Appl. Mech. Eng. (1972) 197-216

Ast, M., Fischer, R., Manz, H., Schulz, U.: PERMAS: User's reference manual. INTES publication no. 450, rev.d (1997)

Herrero, J.R., Navarro, J.J.: Improving Performance of Hypermatrix Cholesky Factorization. Euro-Par 2003,LNCS2790, Springer-Verlag (2003) 461-469

Liu, J.W., Ng, E.G., Peyton, B.W.: Block sparse Cholesky algorithms on advanced uniprocessor computers. SIAM J. Sci. Comput. (1993) 1034-1056

Rothberg, E., Gupta, A.: An efficient block-oriented approach to parallel sparse Cholesky factorization. SIAM J. Sci. Comput. (1994) 1413-1439

Home page

Jerzy Wasniewski 2004-02-20