Mapping
Algorithms to the Network Topology in a Portable Manner
Dave Turner, SCL Ames Laboratory, Ames IA 50011 USA
PARA’04
Workshop on the State-of-the-Art in Scientific Computing
Abstract
One long-standing challenge in
parallel computing is mapping an algorithm to the topology of the network. Every architecture has a unique topology,
and the effective topology can change with each run because the physical
arrangement of nodes may be different, I/O nodes can intervene, and
communication performance may be affected by other applications. Most application programmers essentially
ignore the network topology. This
retains portability, but sacrifices scalability and efficiency, and typically
limits applications to systems of a few hundred processors.
When dealing with computing systems
of 1,000 – 100,000 processors, it is simply no longer possible to ignore the
topology of the underlying network.
Good algorithms are needed that take advantage of local communications,
and they must be mapped onto the network so local communications are done between
neighboring nodes over direct connections to avoid contention for communication
channels.
Many algorithms can be efficiently
mapped onto a simple topology like a 2D or 3D mesh. In turn, these meshes can be mapped onto most network topologies
in a manner that insures that neighboring nodes are directly connected to each
other. Mapping algorithms to a virtual
2D or 3D mesh can therefore provide both efficiency and portability.
A tool called NodeMap is
being developed that automatically detects the network topology at run time,
then provides an optimal mapping of a virtual 2D or 3D mesh to that network
topology. NodeMap will be called
from the MPI_Cart_create() function when the reorder flag is set. It may also be called from MPI_Init()
to find optimal mappings for the global MPI functions.
NodeMap uses a variety of techniques to
probe a wide range of network characteristics.
A simple gethostname function determines processes on the same
SMP node. The uname function and
compiler definitions are used to identify MPP systems, identifying the global
network topology. Subnet mappers can be
probed to determine the topology for Myrinet and InfiniBand networks. Latency and throughput measurements using
simple ping-pong tests can help identify the topology of Ethernet
clusters. Static configuration files
are used as a last resort.
If the programmer maps the
application to a virtual mesh, the rest will be done automatically. This should at least take one big step
towards providing a portable means to take advantage of the network topology.