next up previous
Next: Summary and Future Up: Performance Results Previous: Sequential Performance Results

3.3 Parallel Performance Results

The success of the parallel implementation (PI) was evaluated by comparisons with the sequential implementation (SI) in two categories: accuracy and execution time speedup. Speedup is defined as the ratio of the sequential time to the parallel time and was calculated using the FORD map. Because the HK algorithm's working arrays (csize and matrix) were distributed across the PNs of the CM-5, the PI functions SEND, IDROWS, REDUCTION, MERGEFUN, ROW1, and CALCSIZE were necessary to duplicate the functionality of the sequential HK algorithm. Table 7 details the responsibility of each of the functions involved in cluster identification in the FSM parallel implementation. The total time spent in cluster identification in the PI is the sum of the time spent in the SEND, IDROWS, REDUCTION, MERGEFUN, ROW1, and CALCSIZE functions (see Table 8).

The PI was able to achieve a speedup ranging from 5.41 to 5.90 on the FORD map. The average speedup was 5.71 for the 31 map classes of the FORD map. Given the parallel potential of the HK algorithm, the performance of the PI met expectations. The HK algorithm only requires access to a small portion of the input map and the process of merging overlapping clusters is well defined. The PI takes advantage of the PN's local memory to partition the input map which enables cluster identification to work entirely in parallel. However, the separation of the data and working sets became a hindrance during the MERGEFUN process. The combination process, which involves merging information from each PN, turned out to be more expensive than the cluster identification process.

  


Table 7: Functional responsibilities of the functions involved in cluster identification in the FSM parallel implementation on the CM-5.

  


Figure 11: Cluster identification time for the previous implementation in Berry et al. (1994) vs. the FSM implementation. All times are in CPU seconds.

 


Table 8: Functional profile of the Parallel Implementation (PI) of the FSM cluster identification algorithm applied to the FORD map. 

The combination process consumes 48% of the total cluster identification time compared to the cluster identification's 35%. Future research concerning the combination process is discussed in the next and final section. The SI and PI results for the 31 map classes of the FORD map (see Figure 12) demonstrate a constant speedup factor near 6.

  


Figure 12: Comparison of parallel and sequential cluster identification times for 31 map classes of the FORD map. All times are elapsed CPU seconds.



next up previous
Next: Summary and Future Up: Performance Results Previous: Sequential Performance Results



Michael W. Berry (berry@cs.utk.edu)
Sat Mar 30 23:40:13 EST 1996