next up previous
Next: Acknowledgement Up: Parallel Hoshen-Kopelman Algorithm Title Previous: Parallel Performance Results

4. Summary and Future Work

In this work, both a faster sequential implementation (SI) of the Hoshen-Kopelman cluster identification algorithm and a parallel implementation (PI) on the CM-5 have been presented. The implementations involved using a finite state machine to maintain results of prior integer comparisons in order to reduce redundant comparison. The SI achieved a time improvement ranging from 1.39 to 2.00 over the sequential HK implementation in Berry et al. (1994) and the PI achieved an average speedup of 5.71 over the fastest SI. Future optimization of particular PI functions, REDUCTION and MERGEFUN, is possible by examining the message passing mechanism in the CMOST operating system with a goal of reducing the total time spent in inter-processor communication. Although this work employed the NEWS neighborhood rule (see Section 1.3) for spatial cluster analysis, the finite state machine approach could be applied to other neighborhood rules such as the 9-cell Moore rule (Packard and Wolfram, 1985). Extensions to three-dimensional spatial analysis problems are also possible.

Improving the combination mechanism for the FSM parallel implementation is another future concern. Since the PI creates a csize working array on each PN, the range of indexes into the array (based on the PN number) is limited. For example, in a 32 PN system, would have access to the first indexes in the csize array on . The rest of the indexes would remain unused. This mechanism was chosen to make use of the global reduction operations available in the CMOST operating system. The global reduction operation is responsible for 48% of the overall cluster identification time which is more expensive than the actual cluster identification function. Furthermore, the extra memory allocated in the csize working array on each of the PN could be better used to process larger input maps.

A possible solution could be to switch to a host/node paradigm in which the host PN would allocate a csize working array large enough to support the entire map; but, the PNs would only allocate enough memory to support the PN's portion of the input map. The MERGEFUN would then be performed entirely on the host processor. The REDUCTION process would be more complicated because the host would have to map the individual csize array index numbers to a specific range of indexes; but, it should reduce the time spent passing messages.



next up previous
Next: Acknowledgement Up: Parallel Hoshen-Kopelman Algorithm Title Previous: Parallel Performance Results



Michael W. Berry (berry@cs.utk.edu)
Sat Mar 30 23:40:13 EST 1996