
Performance of Processing Nodes

Although communication was asynchronous, the explicit synchronization in the parallel model resulted in similar execution times on every processor. Even if the synchronization points were removed, execution times would remain similar because the CMMD reduction functions are called at the end of each simulation day. To determine the actual computation time per processor (Figure 21), the CMMD timing functions were used to measure idle time, that is, the portion of time during which a processor performs no useful computation. The total number of deer processed by each processor during the 23-year simulation is shown in Figure 22.
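The idle-time accounting described above can be illustrated with a minimal sketch. It assumes that computation time is obtained by subtracting accumulated idle time from elapsed time, and that time spent inside a blocking call (such as an end-of-day reduction) is charged entirely to idle time; both are simplifying assumptions rather than details taken from the model. The class `NodeTimer` and its methods are hypothetical names and are not the CMMD timing functions.

```python
import time


class NodeTimer:
    """Hypothetical per-processor timer (illustrative only, not the CMMD API)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # injectable clock, so the timer can be tested
        self._start = clock()        # wall-clock start of the measured region
        self.idle = 0.0              # accumulated idle time in seconds

    def wait(self, blocking_call):
        """Run a blocking call and charge its whole duration to idle time."""
        t0 = self._clock()
        result = blocking_call()
        self.idle += self._clock() - t0
        return result

    def elapsed(self):
        """Total time since the timer was created."""
        return self._clock() - self._start

    def computation(self):
        """Assumed definition: elapsed time minus idle time."""
        return self.elapsed() - self.idle
```

In use, each processor would wrap its synchronization points and reductions in `wait(...)` and report `computation()` at the end of the run. Because all processors leave a reduction at roughly the same moment, `elapsed()` would be nearly identical everywhere, while `computation()` would expose the per-processor differences.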

    


Figure 21 (top): Total computation time per processor for an initial deer population of 10,000.

Figure 22 (bottom): Total number of deer processed on each processor for an initial deer population of 10,000.

The difference in computation time per processor depends not only on the number of deer residing on a processor, but also on the proximity of those deer to the processor boundary and on the grid positions of deer located on neighboring processors. The closer a deer is to a processor boundary, the more likely it is that the neighboring processor must participate in the deer's forage search. For example, in Figures 21 and 22, PN processed more deer during the simulation than PN, PN, and PN, but has a smaller computation time. Similarly, PN processed more deer than PN; however, PN has a greater computation time.
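The boundary effect can be made concrete with a small sketch. It assumes a one-dimensional decomposition of the grid into equal-width column strips, one per processor, and a forage search that covers a fixed radius around the deer; the actual decomposition and search rule of the model may differ. The function names and the parameters `strip_width`, `radius`, and `n_strips` are hypothetical.

```python
def neighbors_involved(x, strip_width, radius, n_strips):
    """Strips other than the owner that overlap the window [x - radius, x + radius]."""
    owner = x // strip_width
    lo = max(0, (x - radius) // strip_width)
    hi = min(n_strips - 1, (x + radius) // strip_width)
    return [p for p in range(lo, hi + 1) if p != owner]


def remote_searches(deer_xs, strip_width, radius, n_strips):
    """Count, per strip, the forage searches it must assist for deer it does not own."""
    counts = [0] * n_strips
    for x in deer_xs:
        for p in neighbors_involved(x, strip_width, radius, n_strips):
            counts[p] += 1
    return counts


# Four strips of width 10; deer at columns 5, 9, 15 and 19; search radius 2.
print(remote_searches([5, 9, 15, 19], strip_width=10, radius=2, n_strips=4))
# prints [0, 1, 1, 0]
```

In this toy case strip 2 owns no deer yet still assists one search, while strip 0 owns two deer and assists none. Under these assumptions, the number of deer processed and the computation time need not rank processors in the same order.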


Michael W. Berry (berry@cs.utk.edu)
Wed Oct 11 14:53:18 EDT 1995