PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004 (Home page)

Updated: 29 February 2004

Hierarchical Partitioning and Dynamic Load Balancing for Scientific Computation

J. D. Teresco
Department of Computer Science
Williams College
USA

Modern three-dimensional scientific computations must execute in parallel to achieve acceptable performance. Target parallel environments range from clusters of workstations to the largest tightly-coupled supercomputers. Hierarchical and heterogeneous systems are increasingly common. Emerging grid technologies make Internet execution more likely. Software efficiency may be improved using optimizations based on system characteristics and domain knowledge. Our focus has been on resource-aware partitioning and dynamic load balancing, achieved by adjusting target partition sizes or the choice of a dynamic load-balancing procedure or its parameters, or by using a combination of load-balancing procedures. For example, consider a cluster of symmetric multiprocessor (SMP) nodes connected by Ethernet. A more costly graph partitioning can be done to partition among the nodes, to minimize communication across the slow network interface, possibly at the expense of some computational imbalance, then a fast geometric algorithm can be used to partition independently within each node.

We present the Dynamic Resource Utilization Model (DRUM), a software system that supports automatic resource-aware partitioning and dynamic load balancing for heterogeneous, non-dedicated, and hierarchical computing environments. DRUM dynamically models the computing environment using a tree structure that encapsulates the capabilities and performance of communication and processing resources. The tree is populated with performance data obtained from a priori benchmarks and dynamic monitoring agents that run concurrently with the application. It is then used to guide partition-weighted and hierarchical partitioning and dynamic load balancing. We focus here on hierarchical balancing, where different procedures are used in different parts of the computing environment. DRUM's hierarchical balancing capabilities have been developed within the freely-available Zoltan library, which provides applications with a reusable, object-oriented interface to several partitioning and dynamic load balancing procedures, including coordinate bisection, octree/space filling curve methods, and multilevel graph partitioners. DRUM's hierarchical balancing is guided by the tree representation of the computational environment. The implementation utilizes a lightweight intermediate structure and a set of callback functions that permit an automated and efficient hierarchical balancing that can use any of the procedures available within Zoltan without modification, and in any combination. Procedures to be used at each level of hierarchical balancing may be specified using DRUM's graphical configuration tool.

We have tested our procedures using a three-dimensional adaptive discontinuous Galerkin solution of a perforated shock tube on a cluster of multiprocessors. Among all combinations of traditional and hierarchical procedures, time to solution is minimized by using hierarchical load balancing where ParMetis multilevel graph partitioning is used for inter-node partitioning and inertial recursive bisection is used within each node. Studies are underway that utilize hierarchical balancing on larger clusters, on other architectures and with a wider variety of applications.

This work is in collaboration with Joseph Flaherty and Jamal Faik at Rensselaer Polytechnic Institute (Troy, New York, USA), and with Karen Devine at Sandia National Laboratories (Albuquerque, New Mexico, USA). Home page


2004-02-29