This assessment tests your knowledge of the history of different message passing systems. The final sections test your understanding of how they work and the differences in function and semantics.
10 points
For each, give only a brief answer.
Receiver's address, data buffer, size of data, message tags, and the status of the operation or a handle for non-blocking operations.
Blocking calls return after completing some operation, while non-blocking calls return immediately and the user has to check when the operation has completed. The main point is that the data buffer associated with the operation is unsafe to use when a non-blocking call returns, but safe to use when a blocking call returns.
Local blocking calls generally block until the message/data has been handed to some lower-level subsystem (i.e. it has not been delivered yet).
Global blocking is where the data has been received before the sending call returns.
As the operation has not yet completed, the system usually returns a handle, which the user can check. Supported operations include a wait, a test and maybe a cancel.
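As a rough illustration of the handle idea, the sketch below uses Python's standard concurrent.futures module as a stand-in for a message passing library. The names isend, transfer and link_ready are hypothetical; the Future object plays the role of the handle, with done() acting as the test, result() as the wait and cancel() as the cancel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def isend(pool, buffer, link_ready):
    """Hypothetical non-blocking send: returns a handle straight away."""
    def transfer():
        link_ready.wait()       # the transport stays busy until the link is ready
        return len(buffer)      # number of bytes "sent"
    return pool.submit(transfer)


def demo():
    link_ready = threading.Event()
    data = bytearray(b"boundary row")
    with ThreadPoolExecutor(max_workers=1) as pool:
        handle = isend(pool, data, link_ready)
        pending = not handle.done()   # test: still in flight, so data is unsafe to reuse
        link_ready.set()              # let the simulated transfer finish
        sent = handle.result()        # wait: blocks until the operation completes
        return pending, sent, handle.done()
```

Only after the wait returns is it safe to overwrite the data buffer, which is the distinction drawn between blocking and non-blocking calls.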
On the Caltech Hypercube there were only a few send and receive operations; on modern systems there are upward of two hundred. Many more would be needed, but the variations are specified as arguments. There are also status, async handles and other arguments.
10 points
For each question give only a brief answer.
(1) Why was the Caltech Hypercube considered a difficult platform to program? (1)
It offered only synchronous send and receive operations of just 8 bytes.
(2) Which message passing system introduced the concept of 'virtual processes'? (1)
Distributed Process Environment on the Hypercube.
(3) What improvements were made to NX between NX and NX2? (1)
Interrupt-driven send and recv calls, as well as better tagging of messages.
(4) Which system introduced a send and receive operation in a single call, and why was it useful? (2)
CMMD on the Thinking Machines CM5. Allowed swapping of data values without having to make copies of data to prevent the system overwriting it. It was good for stencil operations.
(5) Which system(s) supported comprehensive collective/group communications? (1)
The IBM EUI had good collective operations for a vendor system. MPI does now, but these were modelled on the IBM system.
(6) On a distributed cluster, which system would I use, PVM or MPI, and why? (2)
PVM. It is better suited to heterogeneous systems and is able to handle failures of processes, hosts and network links.
(7) Is the Linda programming model message passing or shared memory? (2)
Shared memory abstraction, with some message passing semantics.
out = send and in = recv
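The out/in correspondence can be sketched with a toy tuple space. This is only an illustration under stated assumptions, not Linda itself: the class name TupleSpace is hypothetical, in_ stands for Linda's in (a reserved word in Python), and None in a pattern is assumed to act as a wildcard field.

```python
import threading


class TupleSpace:
    """Toy shared tuple space: out() deposits a tuple, in_() withdraws a match."""

    def __init__(self):
        self._tuples = []
        self._changed = threading.Condition()

    def out(self, tup):
        # Plays the role of a send: the tuple becomes visible to every process.
        with self._changed:
            self._tuples.append(tup)
            self._changed.notify_all()

    def _find(self, pattern):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def in_(self, pattern, timeout=None):
        # Plays the role of a blocking receive: waits for a match, then removes it.
        with self._changed:
            found = self._changed.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                raise TimeoutError("no matching tuple")
            match = self._find(pattern)
            self._tuples.remove(match)
            return match
```

The data lives in a space that all processes share, which is the shared memory part, while the blocking withdrawal by pattern behaves much like a tagged receive.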
10 points
For the following sample of code:
Task 1
send ( task2, data, data-length )
receive ( task2, new-data, new-data-length )
Task 2
send ( task1, data, data-length )
receive ( task1, new-data, new-data-length )
(1) Why could this code fail on some systems but not on others? (2)
It relies on both sides buffering the data being sent. Neither recv can start until the preceding send has returned, and the sends cannot return unless there is some form of 'local buffering'.
(2) Which systems would this code always work on, and why? (2)
PVM, as it always buffers sends for you.
(3) How should the code be changed to allow it to work on any system? (1)
Swap the order of the send and recv in one of the two tasks.
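The failure and the reordering fix can be reproduced in a small simulation. The sketch below assumes a transport with no local buffering, modelled by a hypothetical SyncChannel class whose send returns only once the matching receive has taken the message; the timeout exists purely so that the deadlock is reported instead of hanging.

```python
import queue
import threading


class SyncChannel:
    """One-way link whose send blocks until the receiver has taken the data."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._taken = threading.Semaphore(0)

    def send(self, data, timeout):
        self._slot.put(data)
        if not self._taken.acquire(timeout=timeout):
            raise TimeoutError("send was never matched by a receive")

    def recv(self, timeout):
        data = self._slot.get(timeout=timeout)
        self._taken.release()
        return data


def run_exchange(task2_sends_first, timeout=0.5):
    to_task2, to_task1 = SyncChannel(), SyncChannel()
    results = {}

    def task(name, out_ch, in_ch, send_first):
        try:
            if send_first:
                out_ch.send(name + "-data", timeout)
                results[name] = in_ch.recv(timeout)
            else:
                results[name] = in_ch.recv(timeout)
                out_ch.send(name + "-data", timeout)
        except (TimeoutError, queue.Empty):
            results[name] = "deadlock"

    threads = [
        threading.Thread(target=task, args=("task1", to_task2, to_task1, True)),
        threading.Thread(
            target=task, args=("task2", to_task1, to_task2, task2_sends_first)
        ),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling run_exchange(True) mirrors the original code, where both tasks send first and neither send can complete. Calling run_exchange(False) makes Task 2 receive first, so the exchange goes through.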
(4) To work under MPI without changing the order of any calls? (2)
Use the buffered send MPI_Bsend() with an attached buffer (MPI_Buffer_attach()). The buffer must be at least data-length plus any headers MPI adds (MPI_BSEND_OVERHEAD)!
(5) If using either CMMD or MPI, how could it be shortened and still work? (1)
Use a combined send-and-receive call (sendrec()), e.g. MPI_Sendrecv() in MPI.
(6) If a code is performing an exchange of boundary conditions on a 2-D grid (i.e. 4 exchanges), how could you use four non-blocking calls instead of four blocking calls? Why might it be faster? (2)
Use 4 non-blocking sends and 4 non-blocking recvs
isend()
isend()
isend()
isend()
waitall()
irecv()
irecv()
irecv()
irecv()
waitall()
If using just one wait, you should make copies of the sending data to prevent the receives from overwriting them.
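The potential speed-up comes from overlapping the transfers. The sketch below is a simulation rather than real message passing: send_edge and recv_halo are hypothetical stand-ins that simply sleep for an assumed link latency, and a thread pool plays the part of the non-blocking calls, with concurrent.futures.wait acting as the waitall.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

DIRECTIONS = ("north", "south", "east", "west")


def send_edge(direction, edge, latency):
    """Hypothetical send: occupies the link for `latency` seconds."""
    time.sleep(latency)
    return len(edge)


def recv_halo(direction, latency):
    """Hypothetical receive: the neighbour's edge arrives after `latency` seconds."""
    time.sleep(latency)
    return "halo-from-" + direction


def blocking_exchange(edges, latency):
    start = time.perf_counter()
    halos = {}
    for d in DIRECTIONS:                  # one transfer at a time
        send_edge(d, edges[d], latency)
        halos[d] = recv_halo(d, latency)
    return halos, time.perf_counter() - start


def nonblocking_exchange(edges, latency):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=8) as pool:
        sends = [pool.submit(send_edge, d, edges[d], latency) for d in DIRECTIONS]
        recvs = {d: pool.submit(recv_halo, d, latency) for d in DIRECTIONS}
        wait(sends + list(recvs.values()))     # the waitall()
        halos = {d: h.result() for d, h in recvs.items()}
    return halos, time.perf_counter() - start
```

Here the halos land in buffers separate from the outgoing edges, so a single wait is safe. The blocking version pays roughly the sum of the eight latencies, the overlapped version roughly the longest one.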
20 points
An application that has a master-slave structure is made from two pieces of source code (master.c and slave.c). To prevent errors from coding mistakes between the two codes, all constants are kept in a single header file (cons.h).
The application solves a problem by domain decomposition. The communication pattern used is:
At random times the code appears to produce a wrong result which then goes away. It is suspected that the code has a race condition in it (see additional diagram).
The form of the code is:
cons.h
/* This file contains message tags used by master/slave */
#define init-data 1
#define delta-data 2
#define result-data 3
master.c
start slaves
broadcast (all-slaves, init-data, data)
for (iterations)
    for (each slave) recv (slave, result-data, data)
end
slave.c
start
recv (master, init-data, data)
for (iterations)
    calculate delta-values
    for (each other slave) send (other slave, delta-data, delta-values)
    for (each other slave) recv (any, delta-data, delta-values)
    calculate new values
    send (master, result-data, values)
end
(1) Write a short paragraph using the additional diagram on why a race occurs. (6)
The slaves exchange data with each other without checking which iteration the senders are on, and without forcing the receives to accept exactly one message from each other slave per iteration.
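One possible arrival order that triggers the race can be written down directly. The sketch below assumes three slaves (0, 1 and 2), models a message as a (sender, iteration) pair, and looks at the mailbox of slave 1 when slave 0 has run ahead and slave 2's message is delayed. The function names are hypothetical.

```python
from collections import deque


def receive_any(mailbox, count):
    """Wildcard receive: take the next `count` messages, whoever sent them."""
    return [mailbox.popleft() for _ in range(count)]


def iteration_one_inputs():
    # Arrival order at slave 1: slave 0's iteration-1 message, then slave 0's
    # iteration-2 message, and only then slave 2's delayed iteration-1 message.
    mailbox = deque([(0, 1), (0, 2), (2, 1)])
    used = receive_any(mailbox, 2)    # the two receives of iteration 1
    return used, list(mailbox)
```

The two receives of iteration 1 return [(0, 1), (0, 2)], so slave 1 computes with an iteration-2 value from slave 0 and leaves slave 2's iteration-1 value for the next round. Whether this happens depends on timing, which is why the wrong result appears only at random.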
(2) Is it due to the programming style? (2)
Yes, using just the tags in the header file has restricted the way receive is used.
(3) Or the choice of semantics? (2)
Yes, this is the real cause. The receive should be more specific.
(4) What should be changed to fix it? Give 3 methods including the actual changes to the code. One of the methods should involve a collective operation. (7)
3 Methods
(1) Receive from each task once per iteration
for (i = 0; i < num_slaves; i++) {
    if (i != me) recv (i, tag, .....)
}
(2) Receive from each slave once per iteration by keeping count of the iterations...
recv (-1, iteration, .......)
.
.
.
/* after all receives */
iteration++
(3) Force all slaves to stay on the same iteration by using a barrier after all the receives (or before all the sends)
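The three methods can be tried out in a small simulation. This is a sketch under stated assumptions, not the application code: messages are modelled as (sender, iteration) pairs, ANY mirrors the -1 wildcard, the mailbox for methods 1 and 2 holds one problematic arrival order at slave 1, and method 3 uses Python threads with threading.Barrier standing in for a collective barrier.

```python
import queue
import threading

ANY = -1  # wildcard, as in recv (-1, ...)


def recv(mailbox, source=ANY, iteration=ANY):
    """Remove and return the first queued message matching source and iteration."""
    for i, (src, it) in enumerate(mailbox):
        if source in (ANY, src) and iteration in (ANY, it):
            return mailbox.pop(i)
    raise LookupError("no matching message queued")


def method1_by_source(mailbox, others):
    # Method 1: exactly one receive per named sender.
    return [recv(mailbox, source=s) for s in others]


def method2_by_iteration(mailbox, others, iteration):
    # Method 2: any sender, but the tag must equal the current iteration.
    return [recv(mailbox, iteration=iteration) for _ in others]


def method3_with_barrier(num_slaves=3, iterations=50):
    # Method 3: wildcard receives, with a barrier after the receives.
    mailboxes = [queue.Queue() for _ in range(num_slaves)]
    barrier = threading.Barrier(num_slaves)
    stale = []  # messages consumed in the wrong iteration

    def slave(me):
        for it in range(iterations):
            for other in range(num_slaves):
                if other != me:
                    mailboxes[other].put((me, it))
            for _ in range(num_slaves - 1):
                src, sent_it = mailboxes[me].get()
                if sent_it != it:
                    stale.append((me, src, sent_it, it))
            barrier.wait()  # nobody starts the next iteration early

    threads = [threading.Thread(target=slave, args=(s,)) for s in range(num_slaves)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stale
```

Method 1 relies on messages from the same sender arriving in the order they were sent. The barrier costs a synchronisation per iteration, so it is the simplest change but probably the slowest of the three.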
(5) How could this race be detected in the first place? (3)
Debugging tools like xpvm help.
Keep a CRC of all the message senders and tags per iteration so you know when the communication pattern has changed.
Use Promela (the modelling language of the SPIN model checker).
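The CRC check could look roughly like the sketch below. It assumes each slave logs the (sender, tag) pairs it received in every iteration; the function names and the log layout are hypothetical, and zlib.crc32 from the Python standard library supplies the CRC.

```python
import zlib


def iteration_signature(messages):
    """CRC-32 over the (sender, tag) pairs received in one iteration.

    Sorting first makes the signature independent of arrival order, so only
    a change in who sent what is flagged.
    """
    record = ";".join(f"{sender}:{tag}" for sender, tag in sorted(messages))
    return zlib.crc32(record.encode("ascii"))


def changed_iterations(log):
    """Return the indices of iterations whose signature differs from the first."""
    baseline = iteration_signature(log[0])
    return [i for i, msgs in enumerate(log) if iteration_signature(msgs) != baseline]
```

For a slave that should hear once from slaves 0 and 2 with tag 2 in every iteration, an iteration in which it received two messages from slave 0 produces a different signature and is reported.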
All answers should be emailed to me at fagg@cs.utk.edu by Tuesday the 2nd of February, 1999. I will cover the answers to this assessment on the 3rd of February.
The number in brackets at the end of each question, e.g. (2), denotes the number of marks for that question. The maximum score is 50.