next up previous
Next: Performance Results Up: Performance Methodology Previous: Sequential Implementation

2.2.4 Parallel Implementation

The algorithm was parallelized by partitioning the map across the PNs of the CM-5; this partitioning involved the basic input/output functions of the operating system. In the parallel implementation (PI), I/O functions were performed in either Synchronous Broadcast Mode (SBM) or Synchronous Sequential Mode (SSM). SBM requires all PNs to issue a synchronized I/O function call, which is implemented by having one PN perform the physical I/O operation followed by a global message broadcast to the other PNs using the CM-5's internal message-passing facilities. This procedure results in each PN receiving/writing the same information and avoids physical disk contention, a phenomenon observed when more read or write operations target a physical hard disk drive than can be serviced simultaneously. SSM allows an input data set to be divided sequentially across the PNs. For example, given 32 PNs and a file to read, each PN would read b = (file size)/32 bytes: PN 0 would read the first b bytes, PN 1 would read the next b bytes, and so on, with PN 31 reading the last b bytes. SBM was used to read and distribute the 128-byte header to all PNs, while SSM was used to read and partition the input map file across all PNs.

Cluster identification on each PN is a mutually exclusive process, totally independent of the other PNs and partitions. Each PN uses a csize array large enough to hold the statistics for the entire map but is allowed access only to a specific range of indexes in that array. (Knowledge of the numbers of clusters exhibited by real landscape maps could be used to reduce the length of the csize array on each PN.) The beginning cluster label number for each PN must agree with the starting index of its assigned range. Mapping PN numbers to unique ranges of indexes in the csize array protects against the assignment of duplicate temporary cluster labels across partitions. The absence of duplicate label numbers is essential for the CMOST global integer reduction operation (GIRO) to yield meaningful results. The GIRO reduces one object on each PN, returning the result of the operation to the object on each PN; it requires access to the object on each PN and a binary operator to apply during the reduction. For example, given an integer object i on p PNs, each with a value of 1, and the binary operator addition, the GIRO would leave i = p on each PN. If the binary operator were multiplication, the result would be i = 1 on each PN. The net result of cluster identification is a csize array on each PN containing the complete map's temporary cluster statistics. Figure 8 illustrates a simple example of index range assignment on a 4-processor system: PN 0 is assigned the first 25% of the csize indexes, PN 1 the second 25%, PN 2 the third 25%, and PN 3 the last 25%. After the CMMD global integer reduction operation, every csize array contains the union of all of the csize arrays.

  


Figure 8: Illustration of index ranges in the csize array on a 4-PN system and the results of a global integer reduction operation (GIRO).

By the end of cluster identification, each PN has relabeled the first row (firstRow) and last row (lastRow) of its partition. Each PN then sends a copy of lastRow to the PN with the next higher number. The relabel process uses the pixel value as an index into the csize working array and checks the arithmetic sign of the stored value. If the stored value is positive, the pixel is correctly labeled; otherwise, the correct label value is at the tail of the linked list formed by the negative array values (see Figure 7). Each PN compares the row it receives with its relabeled firstRow and generates a list of merge-clusters ordered pairs. A merge-clusters ordered pair contains two temporary cluster labels and is maintained in an integer array, mergeArray, with each ordered pair occupying two successive array slots. Figure 9 illustrates the merge-clusters ordered-pair generation by PN 1 after receiving lastRow from PN 0. PN 1 receives lastRow from PN 0 into messageRow and then compares messageRow with the firstRow on PN 1. Temporary clusters 12 and 13 from PN 0 overlap with temporary cluster 24 from PN 1; therefore, PN 1 inserts 12, 24, 13, and 24, respectively, into mergeArray's next four available array slots. Figure 10 illustrates the resulting mergeArray on PN 1 for the example in Figure 9. Zero entries in indexes 4 and 5 are termination indicators that signal PN 0 to stop processing this array during the final merge operation. Using the CMMD message-receiving function (CMMD_receive_block), PN 0 receives the mergeArray of merge-clusters ordered pairs from every other PN and then performs a merge operation on its csize array for each ordered pair. For example, when PN 0 receives the mergeArray from PN 1 containing 12, 24, 13, and 24, PN 0 merges temporary cluster 12 into temporary cluster 24 and temporary cluster 13 into temporary cluster 24. The result of the identification process is a csize array on PN 0 containing the cluster statistics for the entire input map.

 


Figure 9: Illustration of the merge-clusters ordered-pair generation. LastRow is passed from PN 0 and received by PN 1 into messageRow. messageRow is then compared to firstRow to determine overlapping clusters.

  
Figure 10: Illustration of the merge-clusters ordered-pair array, mergeArray.







Michael W. Berry (berry@cs.utk.edu)
Sat Mar 30 23:40:13 EST 1996