CS560 Midterm Exam, March 12, 2009
Answer all questions
Question 1
This solution comes directly from the lecture notes:
- Philosopher 0 is hungry and gets the chopsticks.
- Philosopher 4 is hungry and waits.
- Philosopher 2 is hungry and gets the chopsticks.
- Philosophers 1 and 3 are hungry and wait.
- Philosopher 2 is sated and puts down the chopsticks.
- Philosopher 3 now gets the chopsticks.
- Philosopher 0 is sated and puts down the chopsticks.
- Philosopher 1 now gets the chopsticks.
- Philosophers 0 and 2 are hungry and wait.
- Philosopher 1 is sated and puts down the chopsticks.
- Philosopher 0 now gets the chopsticks.
- Philosopher 3 is sated and puts down the chopsticks.
- Philosopher 2 now gets the chopsticks.
- Philosophers 1 and 3 are hungry and wait.
- Repeat from step 5
The process is illustrated below. Note how philosopher 4 is never able to get the chopsticks, because both are never available at the same time.
There are other solutions. For example, a four-philosopher table can exhibit starvation too -- suppose eating times are long and thinking times are short. Then philosophers 0 and 2 can starve philosophers 1 and 3, so long as either philosopher 0 or philosopher 2 gets the chopsticks first.
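For reference, here is a minimal sketch of the kind of take-both-chopsticks-or-wait solution these traces assume (the pickup()/putdown()/test() names and the pthread plumbing are mine, not the lecture notes'). Starvation falls out of test(): a philosopher may eat only when neither neighbor is eating, and in the trace above one of philosopher 4's neighbors (0 or 3) is always eating.

#include <pthread.h>

#define N 5
#define LEFT(i)  (((i) + N - 1) % N)
#define RIGHT(i) (((i) + 1) % N)

enum { THINKING, HUNGRY, EATING } state[N];
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t can_eat[N] = { PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER,
                              PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER,
                              PTHREAD_COND_INITIALIZER };

/* Philosopher i may start eating only if neither neighbor is eating. */
static void test(int i)
{
  if (state[i] == HUNGRY && state[LEFT(i)] != EATING && state[RIGHT(i)] != EATING) {
    state[i] = EATING;
    pthread_cond_signal(&can_eat[i]);
  }
}

void pickup(int i)                 /* "gets the chopsticks" */
{
  pthread_mutex_lock(&lock);
  state[i] = HUNGRY;
  test(i);
  while (state[i] != EATING) pthread_cond_wait(&can_eat[i], &lock);
  pthread_mutex_unlock(&lock);
}

void putdown(int i)                /* "puts down the chopsticks" */
{
  pthread_mutex_lock(&lock);
  state[i] = THINKING;
  test(LEFT(i));                   /* a waiting neighbor may now be able to eat */
  test(RIGHT(i));
  pthread_mutex_unlock(&lock);
}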
Grading
Six points. You had to specify the order in which the philosophers drop the chopsticks
to receive full credit.
Question 2
- pthread_create(): In a user-level thread system, the system allocates
a stack for the new thread, sets it up, and then puts it on the ready queue. This
is all done in user space without the operating system's help. In a system-level
thread system, pthread_create() involves a system call so that the new
thread can be bound to an operating system thread. The operating system will
allocate the stack space. Clearly, the user-level thread system will be
quicker, since it does not involve a system call.
- Scheduling by the Operating System:
In a user-level thread system, the operating system schedules the whole process
as one program, and the fact that there are multiple threads of execution is
not taken into account. In a system-level thread system, the operating system
schedules each thread. In the user-level system, there is little burden on the
operating system, but the process is unable to perform actions in parallel.
For example, a blocking system call will block all threads. In the system-level
system, there can be multiple simultaneous blocking calls, and if there are
multiple processors, it can schedule the threads on more than one processor.
Obviously, this makes scheduling more difficult, but it makes the thread system
more powerful.
- Context switching: In the user-level system, a context switch is basically a longjmp() call that switches registers from one thread (on one stack) to another -- cheap and quick (a sketch follows this list).
In a system-level thread system, context switching is performed by the Operating
System, and a thread context switch is akin to a process context switch. It
is much more expensive, but enables all of those nice functionalities above.
- Caching behavior: The two here are identical unless multiple threads are
on multiple processors (in a system-level thread system). Then we may have
issues with keeping the various caches coherent.
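To make the longjmp() point above concrete, here is a minimal sketch of a user-level context switch. The uthread struct and switch_to() are hypothetical -- not CBThread's or any pthread library's actual internals -- and setting up the very first context on a brand-new stack takes extra machinery that is omitted here.

#include <setjmp.h>

typedef struct {
  jmp_buf context;       /* saved registers, including stack and program counter */
  /* ... stack, state, etc. ... */
} uthread;

/* Switch from the running thread to one that previously saved its context
   with setjmp().  Note that no system call is involved. */
void switch_to(uthread *me, uthread *next)
{
  if (setjmp(me->context) == 0)    /* 0 means we just saved our state */
    longjmp(next->context, 1);     /* resume next right after its setjmp() */
  /* A nonzero return means another thread has longjmp()'d back to us. */
}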
Grading
- Pthread_create(): 3 points -- you needed to mention that system level
requires a system call and that the user-level will be faster.
- Scheduling: 3 points -- you needed to mention blocking system calls
in user thread libraries, how the OS only manages one process in a user-level
library, and that multiple processors can only be exploited in a system level library.
- Context-Switching: 2 points -- Done by the OS in system level -- basically a longjmp()
call at the user level.
- Caching: 1 point -- basically identical.
Question 3
Like all blocking calls in the CBThread library, this one follows the basic structure of figuring out what to do with the current thread and then calling block_myself().
Although this thread is never going to execute again, its state may need to be
maintained for another thread to call cbthread_wait() on it.
The operations are as follows:
- First, initialize the thread system if that has not happened yet.
- Next, check to see if there is another thread calling cbthread_join() on me.
If so, put that thread onto the ready queue, free up all your state and call
block_myself().
- Otherwise, check to see if there is a cbthread_joinall() thread.
If so, simply free up your state and call block_myself(). That will execute
the joinall thread if there are no more running threads.
- Otherwise, you are a zombie -- a thread that is dead but has not been joined.
Put yourself on a queue of zombies, set your state to ZOMBIE, and call block_myself().
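Putting those steps together, a sketch of the call (presumably cbthread_exit()) might look like the code below. The globals and struct fields -- current_thread, joiner, joinall_waiter, readyq, zombies, free_thread_state() -- are assumptions for illustration and not necessarily CBThread's real internals; only block_myself() and the libfdr list calls come from the material above.

#include "cbthread.h"
#include "dllist.h"

void cbthread_exit()
{
  CBThread *me = current_thread;          /* assumed: the currently running thread */

  if (!system_initialized) initialize_thread_system();     /* step 1 */

  if (me->joiner != NULL) {
    /* Step 2: someone already called cbthread_join() on me.
       Wake the joiner, discard my state, and block forever. */
    dll_append(readyq, new_jval_v((void *) me->joiner));
    free_thread_state(me);
    block_myself();
  } else if (joinall_waiter != NULL) {
    /* Step 3: a cbthread_joinall() is pending.  Free my state and block --
       block_myself() runs the joinall thread once nothing else is runnable. */
    free_thread_state(me);
    block_myself();
  } else {
    /* Step 4: no one is waiting for me yet, so become a zombie and keep my
       state around for a later join. */
    me->state = ZOMBIE;
    dll_append(zombies, new_jval_v((void *) me));
    block_myself();
  }
}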
Grading
- Checking for a joiner & dealing correctly: 2 points
- Checking for a joinall & freeing state: 1 point
- Checking to become a zombie: set state and join zombie queue: 2 points
- Calling block_myself() at the end: 2 points
Question 4
- Round robin scheduling with 5 CPU-bound jobs. This will give each job
equal time-slices of the CPU. The jobs basically execute at a little less than 1/5 of
the CPU speed, and their turnaround time is a little more than 5 times what it would
be were there no other jobs in the system.
This is pretty effective -- the only inefficiency is that an increased
time quantum would reduce the number of context switches, decreasing the turnaround
time just a little.
- Round robin scheduling with 10 CPU-bound and 3 I/O bound jobs. This is the
pathological case for round robin scheduling. The problem is that when an I/O bound job
is done with I/O or sleeping, it has to wait 10 time quanta to get the CPU. This affects the job's
turnaround time and its interactive behavior (the maximum time that it spends on the ready queue).
The CPU bound jobs see very little difference from the case above.
- Predictive SJF scheduling, α = 0.5, with 10 CPU-bound and 3 I/O bound jobs.
This scheduling algorithm will properly identify CPU vs I/O bound jobs. This is good for the
I/O bound jobs, because they will get the CPU in preference to the CPU bound jobs, improving
their turnaround time and exhibiting excellent ready queue wait times. This algorithm has issues
with the CPU-bound jobs because the blending of the CPU-bound jobs with the I/O bound jobs will
give the CPU-bound jobs slightly differing priorities, and some will get a disproportionate share
of the CPU with respect to the others. In other words, this algorithm does not deliver good
fairness to the CPU-bound jobs, and in our example from class, one of them (the ill-fated
CPU-C job) gets completely starved.
You can fix that last situation by treating all jobs whose predictions fall within a given range in a FIFO manner
and by elongating their time quanta. That fixes the fairness issue, and also gives more CPU
time to the CPU-bound jobs.
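For reference, the prediction in the last case is the standard exponential average of CPU-burst lengths. With α = 0.5 the new prediction is just the mean of the last actual burst and the old prediction, which is why the I/O-bound jobs' short bursts pull their predictions down so quickly:

/* Exponential-average burst prediction with alpha = 0.5, as in the question.
   t_actual is the CPU burst that just finished; the return value is the
   prediction used to order the ready queue for the next burst. */
double predict_next_burst(double prev_prediction, double t_actual)
{
  const double alpha = 0.5;
  return alpha * t_actual + (1.0 - alpha) * prev_prediction;
}

For example, a job predicted at 40 ms that actually runs only 2 ms before blocking gets a new prediction of 21 ms, then 11.5 ms after another 2 ms burst, and so on, so it soon outranks the CPU-bound jobs.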
Grading
- Round-robin with 5 CPU-Bound Jobs: 3 points
- 10 CPU and 3 I/O: 3 points (CPU stays the same; I/O get killed)
- Predictive SJF: 3 points (I/O do great, CPU have fairness issues)
- Tweaks: 1 for each (quantum, FIFO for CPU-bound jobs)
Question 5
The strategy is to keep a red-black tree keyed on the name. When a sender sends
a message, it should look the name up in the red-black tree. If it's not there, then the ZB_call struct should be put on the tree. The same goes for when a receiver posts a receive.
Now, suppose that the name is on the tree and it is of the proper type (e.g.
a sender is sending a message and a receiving ZB_call is on the
tree, or a receiver is trying to receive and a sending ZB_call is on
the tree). Then you take the ZB_call off the tree, perform the proper memcpy()
and have both processes return.
The subtlety about dealing with multiple senders and receivers of messages with the
same name was a little cruel. You can deal with it by having the red-black tree's
val fields be dllists and handle it that way. I'm going to show both implementations.
Both will use the following modified jos.h:
#include "simulator.h"
#include "dllist.h"
#include "jrb.h"
#include "cbthread.h"

typedef struct {
  int regs[NumTotalRegs];
} PCB;

typedef struct {
  char *name;
  char *physical_ptr;
  int size;
  PCB *caller;
  int send_or_receive;    /* 'S' or 'R' */
} ZB_call;

typedef struct {
  Dllist readyq;          /* Ready queue */
  PCB *running;           /* Pointer to the currently running process's PCB */
  PCB *nulljob;           /* Pointer to a 'null' job -- we'll set running to
                             this when we run noop() */
  JRB zb_waiters;         /* The red-black tree of waiting senders/receivers */
} JOS_GLOBALS;

extern JOS_GLOBALS JG;

extern void JOS_Scheduler();
extern void process_zb(void *arg);
I assume that zb_waiters is initialized in the JOS initialization.
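If it is not, it is a one-liner wherever the rest of JG gets set up (make_jrb() is the usual libfdr constructor):

JG.zb_waiters = make_jrb();   /* empty red-black tree of waiting ZB_calls */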
Here's the easy solution (q5-ans.c):
#include <string.h>   /* memcpy() */
#include <stdlib.h>   /* free() */
#include "jos.h"

void process_zb(void *arg)
{
  ZB_call *zb, *other;
  JRB tmp;
  int nbytes;

  zb = (ZB_call *) arg;
  tmp = jrb_find_str(JG.zb_waiters, zb->name);

  /* If there's no matching call in the tree, insert the ZB_call and return */

  if (tmp == NULL) {
    jrb_insert_str(JG.zb_waiters, zb->name, new_jval_v((void *) zb));
    return;            /* Note, this will have the scheduler run. */
  }

  /* If the other call in the tree is of the same type, insert the ZB_call and return */

  other = (ZB_call *) tmp->val.v;
  if (other->send_or_receive == zb->send_or_receive) {
    jrb_insert_str(JG.zb_waiters, zb->name, new_jval_v((void *) zb));
    return;
  }

  /* Otherwise, we have a match.  First remove it from the tree: */

  jrb_delete_node(tmp);

  /* Next, copy the message. */

  nbytes = zb->size;
  if (other->size < nbytes) nbytes = other->size;

  if (zb->send_or_receive == 'S') {
    memcpy(other->physical_ptr, zb->physical_ptr, nbytes);
  } else {
    memcpy(zb->physical_ptr, other->physical_ptr, nbytes);
  }

  /* Set the return value (it goes in Register 2, but any one would do for this exam). */

  zb->caller->regs[Reg2] = nbytes;
  other->caller->regs[Reg2] = nbytes;

  /* Put the two PCB's onto the ready queue */

  dll_append(JG.readyq, new_jval_v((void *) zb->caller));
  dll_append(JG.readyq, new_jval_v((void *) other->caller));

  /* Finally, free up the memory */

  free(zb);
  free(other);

  return;
}
The hard solution just uses a dllist as the val field
(q5-hard-ans.c):
#include <string.h>   /* memcpy() */
#include <stdlib.h>   /* free() */
#include "jos.h"

void process_zb(void *arg)
{
  ZB_call *zb, *other;
  JRB tmp;
  Dllist l;
  int nbytes;

  zb = (ZB_call *) arg;
  tmp = jrb_find_str(JG.zb_waiters, zb->name);

  /* If there's no matching call in the tree, insert the ZB_call (on a new list) and return */

  if (tmp == NULL) {
    l = new_dllist();
    dll_append(l, new_jval_v((void *) zb));
    jrb_insert_str(JG.zb_waiters, zb->name, new_jval_v((void *) l));
    return;
  }

  l = (Dllist) tmp->val.v;
  other = (ZB_call *) l->flink->val.v;

  /* If the other calls in the tree are of the same type, append the ZB_call and return */

  if (other->send_or_receive == zb->send_or_receive) {
    dll_append(l, new_jval_v((void *) zb));
    return;
  }

  /* Otherwise, we have a match.  First remove it from the list/tree: */

  dll_delete_node(l->flink);
  if (dll_empty(l)) {
    free_dllist(l);
    jrb_delete_node(tmp);
  }

  /* The rest of the code is the same as before. */

  nbytes = zb->size;
  if (other->size < nbytes) nbytes = other->size;

  if (zb->send_or_receive == 'S') {
    memcpy(other->physical_ptr, zb->physical_ptr, nbytes);
  } else {
    memcpy(zb->physical_ptr, other->physical_ptr, nbytes);
  }

  /* Set the return value (it goes in Register 2, but any one would do for this exam). */

  zb->caller->regs[Reg2] = nbytes;
  other->caller->regs[Reg2] = nbytes;

  /* Put the two PCB's onto the ready queue */

  dll_append(JG.readyq, new_jval_v((void *) zb->caller));
  dll_append(JG.readyq, new_jval_v((void *) other->caller));

  /* Finally, free up the memory */

  free(zb);
  free(other);

  return;
}
You could have used two JRB trees instead -- one of senders and one of receivers. The logic gets convoluted, unfortunately, and a list is much easier to implement.
Grading
Clearly, this question was too hard. Sorry.
I've allocated 18 points to the
question, broken into three-point sections on the details to which you should
have tried to attend:
- A way to connect senders to receivers and vice versa, using the name field.
- A way to block senders/receivers when there is no matching call.
- A memcpy() statement to copy the message from the sender or receiver
(I accepted strcpy()).
- A way to calculate the message size and put it into the PCB's return register.
- Appending the two PCB's onto the ready queue once the message is copied.
- An attempt to handle the name collision problem (the easy/hard part).
You got points in each section according to how well or coherently you addressed
the point.
If your answer had some merit that is not covered by the sections above, or some problems
not addressed above, I adjusted your score at the end.