High Performance Computing
on Hewlett-Packard Systems

(Annual conference of the HP2EUG)

ZARM, University of Bremen
October 7 - 9, 2001
Bremen, Germany


AUTHORS: Hsin-Ying Lin (lin@rsn.hp.com) and Piotr Rafal Luszczek (luszczek@rsn.hp.com), HP TCD, Richardson, Texas, U.S.A.

TITLE: Tuning LINPACK N*N for PA-RISC platforms

SESSION: HP Technology Update

ABSTRACT:

The Linpack N*N benchmark is used to rank the 500 most powerful computer systems installed in the world. It is therefore very important for hardware vendors to tune the benchmark, both for their own benefit and for that of their customers.

We present highly scalable implementations of parallel algorithms for Linpack N*N. They can be run on either HP single-node shared-memory machines or constellation systems. We discuss ways to use HP's hardware and software solutions efficiently; these can contribute greatly not only to the performance of the Linpack benchmark but also to that of regular user applications. One of the key performance factors for a single CPU is the Level 3 BLAS (Basic Linear Algebra Subprograms) matrix-matrix multiply routine for 64-bit double-precision floating-point data, usually referred to as DGEMM. We analyze the performance of DGEMM in HP MLIB against its ATLAS (Automatically Tuned Linear Algebra Software) counterpart. Based on these results, we decided to incorporate the highly tuned implementation of DGEMM from HP MLIB. We then compare the performance of our LU factorization implementations against previously used algorithms.
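To illustrate why DGEMM matters so much for an LU-based benchmark, the following is a minimal sketch of a blocked, right-looking LU factorization in plain Python. It is not the authors' implementation and does not call HP MLIB or ATLAS. The function name `blocked_lu`, the block size `nb`, and the test matrix are assumptions made for illustration, and pivoting is omitted on the assumption that the matrix is diagonally dominant. Step 3 is the trailing-submatrix update, which is the matrix-matrix multiply a tuned DGEMM would perform.

```python
def blocked_lu(a, nb):
    """Factor a = L*U in place, without pivoting (illustrative assumption:
    a is diagonally dominant). Returns the flop count of the GEMM-like step."""
    n = len(a)
    gemm_flops = 0
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Panel factorization of columns k0..k1-1 (rows k0..n-1).
        for k in range(k0, k1):
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]
        # 2. Triangular solve for the row block to the right of the panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]
        # 3. Trailing update A22 -= L21 * U12: the matrix-matrix multiply
        #    that a tuned DGEMM would carry out.
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += a[i][k] * a[k][j]
                a[i][j] -= s
        gemm_flops += 2 * (n - k1) * (n - k1) * (k1 - k0)
    return gemm_flops


# Hypothetical test problem: a diagonally dominant 64 x 64 matrix.
n, nb = 64, 8
a = [[(n if i == j else 0.0) + 1.0 / (1 + i + j) for j in range(n)]
     for i in range(n)]
orig = [row[:] for row in a]
gemm_flops = blocked_lu(a, nb)
# Share of the roughly 2*n^3/3 factorization flops spent in step 3.
print(gemm_flops / (2 * n**3 / 3))
```

In this sketch the counter attributes the large majority of the roughly 2n^3/3 floating-point operations to step 3, which is why the speed of the matrix-matrix multiply largely determines the speed of the whole factorization.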

We show that our implementation achieves a performance improvement of more than 10% over previous implementations on both 64-processor HP SuperDome and Caribe systems. Furthermore, our implementation performs far better than the public domain software. We also analyze the scalability of our parallel algorithms for both single-node and constellation systems.



hiper01@hp2eug.org
Last modified: 2001-07-27