Lecture notes:
http://www.cs.utk.edu/~mbeck/classes/cs560/560/notes/Thread4/lecture.html
In this lecture, we go over a simulation program that uses the
producer/consumer paradigm, and that requires monitors/condition variables.
The Printer Simulation Problem
This lecture revolves around a simulation that we will write: a system
that has users and printers. In particular,
we have nusers users and nprinters printers.
We'll assume that all the printers are identical (e.g. in a machine
room) so that when a user wants to print something out, it doesn't
matter which printer it comes out on.
Now, in our simulation, every so often, a user will decide to print
something. When this happens, the print job will be submitted,
and if any printer is available, it will print the job (taking
4 seconds a page). If all the printers are printing something, then
the job will be queued until one of the printers is ready.
Our print queue will have a fixed size. If the queue is full,
then the user must wait until the queue is not full to submit the
job.
Obviously, we are going to use threads for this simulation.
Each user will have its own thread, and each printer will have
its own thread. The threads will communicate through shared
memory.
printqsim
The structure of this program is going to have a specific format,
which the threads lab will share. There will be a header file,
in this case
printqsim.h.
This defines some data structures that will be used, plus some
subroutine prototypes. In the labs, these subroutines are
the ones that you have to write to make the lab work.
There is also a driver program, in this case
printqsim.c.
This defines a main() routine which sets up the
threads. Together with your definitions of the subroutines,
the driver program will solve the problem.
You are not allowed to change either the header or driver
files. Instead, you are to provide a C file that defines the
subroutines in the header file, and when this is compiled with
the driver program, the resulting program solves the problem.
In this case, our job is to define initialize_v(),
submit_job() and get_print_job() so that together
with printqsim.c, our program performs the user/printer
simulation correctly.
Ok, let's look at printqsim.c. It takes 6 arguments:
- nusers: The number of users.
- nprinters: The number of printers.
- arrtime: The average time that users will take
between submitting print jobs.
- maxpages: The maximum size of a print job (in pages).
- bufsize: The size of the print queue.
- nevents: The number of print jobs that each
user will make.
Now, the main() routine sets up a Spq
struct. This is defined in printqsim.h. Each
user and printer thread will receive a pointer to one
of these structs as its argument, and all information that
the thread needs will be in this struct. In this way,
we won't need to use any global variables. Each of the
command line arguments has a field in the Spq struct,
plus there are the following extra fields:
- id: The user/printer's id.
- starttime: The time(0) value of when the
program began. This is useful for printing out information
while the program runs.
- v: This is a (void *) that you will get to
define in your code. You will initialize it in
initialize_v(). Each Spq struct's v
field will point to the same value. You'll see how to
use this in a bit.
After setting up one Spq struct (including a call to
initialize_v()), the main()
thread sets up a random number generator, and then
creates nusers user threads, and nprinters
printer threads. Each thread gets its own Spq struct
as its argument. The only thing that differs in each Spq
struct is the id. Everything else (including the
pointer to v) is the same. Finally, the main
thread exits, leaving only the user and printer threads.
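The setup described above can be sketched roughly as follows. The field names follow the descriptions in these notes, but the field types, their order, and the helper make_thread_arg() are assumptions made for illustration; the real printqsim.h and printqsim.c may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout: one field per command line argument,
   plus the extra fields id, starttime and v. */
typedef struct {
  int nusers;
  int nprinters;
  int arrtime;
  int maxpages;
  int bufsize;
  int nevents;
  int id;          /* the user/printer's id */
  int starttime;   /* time(0) when the program began (assumed type) */
  void *v;         /* shared state, set up by initialize_v() */
} Spq;

/* Hypothetical helper: give a thread its own copy of the template
   struct. Only the id changes; v still points to the same value. */
Spq *make_thread_arg(const Spq *base, int id)
{
  Spq *s = malloc(sizeof(Spq));
  assert(s != NULL);
  *s = *base;
  s->id = id;
  return s;
}
```

Because every copy carries the same v pointer, whatever initialize_v() allocates is visible to all user and printer threads without any global variables.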
User threads
Each user thread does the same thing. It iterates for nevents
iterations. In each iteration, it sleeps for a random period of
time (between 1 and arrtime*2 seconds -- this yields a mean waiting
time of roughly arrtime), and then submits a print job. This
job is represented by a Job struct, which has three
fields -- the user's id, a job id (which is i), and
the number of pages, which is a random number between 1 and
maxpages. The job is then submitted with submit_job.
After submitting nevents jobs, the user thread exits.
The user thread prints out when it sleeps, and when it submits
a job.
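As a rough sketch of the random choices a user thread makes, assuming integer draws that are uniform over an inclusive range: the helper names rand_between_1_and() and make_job(), and the Job field names, are hypothetical and not taken from printqsim.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed shape of the Job struct: user id, job id, number of pages. */
typedef struct {
  int jobsize;
  int userid;
  int jobid;
} Job;

/* Uniform integer in [1, n]. The small modulo bias is ignored here. */
int rand_between_1_and(int n)
{
  assert(n >= 1);
  return rand() % n + 1;
}

/* Hypothetical helper: build the job for iteration i of one user. */
Job *make_job(int userid, int i, int maxpages)
{
  Job *j = malloc(sizeof(Job));
  assert(j != NULL);
  j->userid = userid;
  j->jobid = i;
  j->jobsize = rand_between_1_and(maxpages);
  return j;
}
```

With this scheme, a user would sleep for rand_between_1_and(arrtime * 2) seconds before each submission. The integers 1 through 2*arrtime average to arrtime + 0.5, which is close to arrtime.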
Printer threads
Each printer thread does the same thing. It iterates
forever, first getting a job using get_print_job(),
and then printing that job. It simulates printing the
job by sleeping for 4 seconds for each page. After
printing, it repeats the process.
The printer thread prints out when it asks for
a job, and when it prints
a job.
A dummy solution
Now, all that's left is to write
initialize_v(),
submit_job(),
and get_print_job().
To reiterate, were this a lab, your job would be to write
these three subroutines so that they work with
printqsim.h and printqsim.c.
You would not be allowed to modify
printqsim.h and printqsim.c.
Now, look at
ps1.c.
This is one solution to the problem. It's not a
working solution, but it is one that will compile and run.
What it does is set s->v to NULL,
ignore print jobs when they are submitted, and
force the printer threads to exit.
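A do-nothing solution along these lines might look like the sketch below. The struct layouts are assumed from the field descriptions in these notes, and returning NULL from get_print_job() as the way to make a printer thread exit is also an assumption; ps1.c itself may do it differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shapes of the structs from printqsim.h. */
typedef struct {
  int nusers, nprinters, arrtime, maxpages, bufsize, nevents;
  int id;
  int starttime;
  void *v;
} Spq;

typedef struct {
  int jobsize;
  int userid;
  int jobid;
} Job;

/* There is no shared state yet, so v is simply NULL. */
void initialize_v(Spq *s)
{
  assert(s != NULL);
  s->v = NULL;
}

/* Ignore the job entirely: nothing is ever queued. */
void submit_job(Spq *s, Job *j)
{
  (void) s;
  (void) j;
}

/* Assumption: a NULL job tells the driver's printer thread to exit. */
Job *get_print_job(Spq *s)
{
  (void) s;
  return NULL;
}
```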
Try running it:
UNIX> ps1 5 3 5 5 5 3
0: user 0/000: Sleeping for 6 seconds
0: user 1/000: Sleeping for 7 seconds
0: user 2/000: Sleeping for 6 seconds
0: user 3/000: Sleeping for 1 seconds
0: user 4/000: Sleeping for 10 seconds
0: prnt 0/000: ready to print
0: prnt 0/000: Done
0: prnt 1/000: ready to print
0: prnt 1/000: Done
0: prnt 2/000: ready to print
0: prnt 2/000: Done
1: user 3/000: Submitting a job with size 4
1: user 3/001: Sleeping for 7 seconds
6: user 2/000: Submitting a job with size 5
6: user 2/001: Sleeping for 8 seconds
6: user 0/000: Submitting a job with size 2
6: user 0/001: Sleeping for 8 seconds
7: user 1/000: Submitting a job with size 5
7: user 1/001: Sleeping for 6 seconds
8: user 3/001: Submitting a job with size 2
8: user 3/002: Sleeping for 4 seconds
10: user 4/000: Submitting a job with size 5
10: user 4/001: Sleeping for 8 seconds
12: user 3/002: Submitting a job with size 3
12: user 3/003: Done
13: user 1/001: Submitting a job with size 3
13: user 1/002: Sleeping for 5 seconds
14: user 0/001: Submitting a job with size 1
14: user 0/002: Sleeping for 10 seconds
14: user 2/001: Submitting a job with size 5
14: user 2/002: Sleeping for 9 seconds
18: user 4/001: Submitting a job with size 5
18: user 4/002: Sleeping for 6 seconds
18: user 1/002: Submitting a job with size 3
18: user 1/003: Done
23: user 2/002: Submitting a job with size 4
23: user 2/003: Done
24: user 0/002: Submitting a job with size 2
24: user 0/003: Done
24: user 4/002: Submitting a job with size 4
24: user 4/003: Done
UNIX>
This created a simulation with 5 users, 3 printers,
an average of 5 seconds between print jobs, a maximum
job size of 5 pages, a print queue size of 5, and three
print jobs per user.
You'll note that the simulation did run, but not correctly.
Why? Well, the printers never printed anything, for
starters. Moreover, more than 5 print jobs were submitted
and ostensibly queued, and the subsequent print jobs were
still allowed to be submitted.
This may seem like a boneheaded example, but it illustrates
something important -- solutions to a problem may compile and
run, but you have to check their output for correctness. I will
provide "solutions" like this one for your thread labs that
will be incorrect, but give you a starting point.
Starting on a real solution
To actually solve this problem, it's pretty clear how to start.
You need to set up a queue of print jobs in your v
pointer. This queue will have bufsize elements.
When a user submits a job, if there are fewer than bufsize
elements in the queue, you will put the job there. Otherwise,
you'll have to wait for a printer to remove one of the jobs.
Since you have multiple threads accessing the buffer, you'll
need to protect it with a mutex. The above is all done in
ps2.c.
First, it defines a Buffer struct
that uses an array as a circular queue (with head/tail/njobs)
defining the state of the queue. It also has a mutex.
In initialize_v(), the buffer is allocated, and
v is set to be the buffer.
Moreover, now submit_job inserts the job into
the buffer if there's room. If there's not room, the
user thread exits. Also, nothing is done with get_print_job().
This is an example of programming incrementally -- you try one
thing and test it to make sure it works before going on.
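The incremental step just described could be sketched as below. This is not ps2.c: the struct layouts and field names are assumptions, and the split into a helper try_enqueue() (hypothetical) is there only so that the "is there room?" logic can be exercised without killing the calling thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumed shapes of the structs from printqsim.h. */
typedef struct {
  int nusers, nprinters, arrtime, maxpages, bufsize, nevents;
  int id;
  int starttime;
  void *v;
} Spq;

typedef struct {
  int jobsize;
  int userid;
  int jobid;
} Job;

/* Circular queue of job pointers, protected by a mutex. */
typedef struct {
  Job **b;       /* array of bufsize slots */
  int head;      /* index of the oldest job */
  int tail;      /* index where the next job goes */
  int njobs;     /* number of jobs currently queued */
  pthread_mutex_t lock;
} Buffer;

void initialize_v(Spq *s)
{
  Buffer *buf = malloc(sizeof(Buffer));
  assert(buf != NULL);
  buf->b = malloc(sizeof(Job *) * s->bufsize);
  assert(buf->b != NULL);
  buf->head = buf->tail = buf->njobs = 0;
  int rc = pthread_mutex_init(&buf->lock, NULL);
  assert(rc == 0);
  s->v = buf;
}

/* Hypothetical helper: returns 1 if the job was queued, 0 if full. */
int try_enqueue(Buffer *buf, int bufsize, Job *j)
{
  int ok = 0;
  pthread_mutex_lock(&buf->lock);
  if (buf->njobs < bufsize) {
    buf->b[buf->tail] = j;
    buf->tail = (buf->tail + 1) % bufsize;
    buf->njobs++;
    ok = 1;
  }
  pthread_mutex_unlock(&buf->lock);
  return ok;
}

/* At this stage a full queue simply ends the user thread. */
void submit_job(Spq *s, Job *j)
{
  if (!try_enqueue((Buffer *) s->v, s->bufsize, j)) pthread_exit(NULL);
}
```

Note that the mutex is released on both paths before the function returns, so a thread that finds the queue full never leaves the buffer locked.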
When we call this with the same arguments as before, we
see that 5 jobs get submitted, and then the users all
exit. This is what we expect, so the code is working:
UNIX> ps2 5 3 5 5 5 3
0: user 0/000: Sleeping for 10 seconds
0: user 1/000: Sleeping for 5 seconds
0: user 2/000: Sleeping for 8 seconds
0: user 3/000: Sleeping for 3 seconds
0: user 4/000: Sleeping for 6 seconds
0: prnt 0/000: ready to print
0: prnt 0/000: Done
0: prnt 1/000: ready to print
0: prnt 1/000: Done
0: prnt 2/000: ready to print
0: prnt 2/000: Done
3: user 3/000: Submitting a job with size 2
3: user 3/001: Sleeping for 1 seconds
4: user 3/001: Submitting a job with size 2
4: user 3/002: Sleeping for 6 seconds
5: user 1/000: Submitting a job with size 5
5: user 1/001: Sleeping for 6 seconds
6: user 4/000: Submitting a job with size 2
6: user 4/001: Sleeping for 2 seconds
8: user 2/000: Submitting a job with size 2
8: user 2/001: Sleeping for 6 seconds
8: user 4/001: Submitting a job with size 3
8: user 4 -- the queue is full -- exiting
10: user 3/002: Submitting a job with size 3
10: user 3 -- the queue is full -- exiting
10: user 0/000: Submitting a job with size 5
10: user 0 -- the queue is full -- exiting
11: user 1/001: Submitting a job with size 3
11: user 1 -- the queue is full -- exiting
14: user 2/001: Submitting a job with size 5
14: user 2 -- the queue is full -- exiting
UNIX>
A semi-working solution
Now the question is -- what should we do when the queue is full?
Moreover, when we start writing get_print_job(), what
do we do when the queue is empty and there are no jobs to
print? Well, ps3.c provides one
solution. It is not a good solution, but it works.
When submit_job() is called and the queue is full,
the mutex is released, and sleep(1) is called.
Then the queue is checked again. In this way, if a printer
thread calls get_print_job() during that second,
then it can take a job off the queue, and then the user's job
may be submitted. Similarly, when the queue is empty and
a printer calls get_print_job(), it sleeps for a
second and checks again. Note that it has to release the
mutex when it sleeps so that a user thread can actually
put a job on the queue.
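The polling approach described above might be sketched like this. It is an illustration rather than the contents of ps3.c; the struct layouts and field names are assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed shapes of the structs from printqsim.h. */
typedef struct {
  int nusers, nprinters, arrtime, maxpages, bufsize, nevents;
  int id;
  int starttime;
  void *v;
} Spq;

typedef struct {
  int jobsize;
  int userid;
  int jobid;
} Job;

typedef struct {
  Job **b;                 /* circular queue of bufsize slots */
  int head, tail, njobs;
  pthread_mutex_t lock;
} Buffer;

void initialize_v(Spq *s)
{
  Buffer *buf = malloc(sizeof(Buffer));
  assert(buf != NULL);
  buf->b = malloc(sizeof(Job *) * s->bufsize);
  assert(buf->b != NULL);
  buf->head = buf->tail = buf->njobs = 0;
  int rc = pthread_mutex_init(&buf->lock, NULL);
  assert(rc == 0);
  s->v = buf;
}

void submit_job(Spq *s, Job *j)
{
  Buffer *buf = s->v;
  for (;;) {
    pthread_mutex_lock(&buf->lock);
    if (buf->njobs < s->bufsize) {
      buf->b[buf->tail] = j;
      buf->tail = (buf->tail + 1) % s->bufsize;
      buf->njobs++;
      pthread_mutex_unlock(&buf->lock);
      return;
    }
    /* Full: release the mutex so a printer can get in, then poll. */
    pthread_mutex_unlock(&buf->lock);
    sleep(1);
  }
}

Job *get_print_job(Spq *s)
{
  Buffer *buf = s->v;
  for (;;) {
    pthread_mutex_lock(&buf->lock);
    if (buf->njobs > 0) {
      Job *j = buf->b[buf->head];
      buf->head = (buf->head + 1) % s->bufsize;
      buf->njobs--;
      pthread_mutex_unlock(&buf->lock);
      return j;
    }
    /* Empty: release the mutex so a user can get in, then poll. */
    pthread_mutex_unlock(&buf->lock);
    sleep(1);
  }
}
```

If the sleep(1) happened while the mutex was still held, no other thread could change the queue during that second, and the loop would never make progress.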
The code works. Try it out:
UNIX> ps3 5 3 5 5 5 3
0: user 0/000: Sleeping for 10 seconds
0: user 1/000: Sleeping for 1 seconds
0: user 2/000: Sleeping for 4 seconds
0: user 3/000: Sleeping for 1 seconds
0: user 4/000: Sleeping for 10 seconds
0: prnt 0/000: ready to print
0: prnt 0 sleeping because the queue is empty
0: prnt 1/000: ready to print
0: prnt 1 sleeping because the queue is empty
0: prnt 2/000: ready to print
0: prnt 2 sleeping because the queue is empty
1: user 1/000: Submitting a job with size 3
1: user 1/001: Sleeping for 7 seconds
1: user 3/000: Submitting a job with size 4
1: user 3/001: Sleeping for 4 seconds
1: prnt 0/000: Printing job 0 from user 1 size 3
1: prnt 1/000: Printing job 0 from user 3 size 4
1: prnt 2 sleeping because the queue is empty
2: prnt 2 sleeping because the queue is empty
3: prnt 2 sleeping because the queue is empty
4: user 2/000: Submitting a job with size 4
4: user 2/001: Sleeping for 10 seconds
4: prnt 2/000: Printing job 0 from user 2 size 4
5: user 3/001: Submitting a job with size 1
5: user 3/002: Sleeping for 2 seconds
7: user 3/002: Submitting a job with size 2
7: user 3/003: Done
8: user 1/001: Submitting a job with size 5
8: user 1/002: Sleeping for 4 seconds
10: user 4/000: Submitting a job with size 3
10: user 4/001: Sleeping for 9 seconds
10: user 0/000: Submitting a job with size 5
10: user 0/001: Sleeping for 5 seconds
12: user 1/002: Submitting a job with size 3
12: user 1 sleeping because the queue is full
13: prnt 0/001: ready to print
13: prnt 0/001: Printing job 1 from user 3 size 1
13: user 1/003: Done
14: user 2/001: Submitting a job with size 1
14: user 2 sleeping because the queue is full
15: user 0/001: Submitting a job with size 1
15: user 0 sleeping because the queue is full
15: user 2 sleeping because the queue is full
16: user 2 sleeping because the queue is full
16: user 0 sleeping because the queue is full
17: prnt 1/001: ready to print
17: prnt 1/001: Printing job 2 from user 3 size 2
17: user 0/002: Sleeping for 3 seconds
17: prnt 0/002: ready to print
...
60: prnt 1 sleeping because the queue is empty
60: prnt 2/004: ready to print
60: prnt 2 sleeping because the queue is empty
60: prnt 0 sleeping because the queue is empty
61: prnt 2 sleeping because the queue is empty
61: prnt 0 sleeping because the queue is empty
61: prnt 1 sleeping because the queue is empty
62: prnt 0 sleeping because the queue is empty
62: prnt 1 sleeping because the queue is empty
62: prnt 2 sleeping because the queue is empty
63: prnt 0 sleeping because the queue is empty
63: prnt 1 sleeping because the queue is empty
63: prnt 2 sleeping because the queue is empty
64: prnt 1 sleeping because the queue is empty
^CUNIX>
It all works fine. When all the user jobs are done,
the printer threads keep sleeping and checking the
queue, so you eventually have to Ctrl-C out of
the program.
This is a workable solution, but it is not a good one.
The technique of periodically checking the queue is called
polling. It's not really what you want because
you'd like for a printer thread to wake up and start printing
as soon as a job is inserted into the queue, instead of up
to a second afterward. Similarly, you'd like the user to
complete submitting a job as soon as a printer thread
empties a space in the queue instead of up to a second afterward.
In short, polling is OK, but not great. I show it to you
because it's good for you to see, but I don't want to see any
polling in your labs -- if you do it, you will get points
taken off.
Monitors and condition variables
Monitors and condition variables together form a very convenient tool
for synchronization. There are two ways to discuss
monitors and condition variables -- as part of a threaded language,
or as part of a threads library. The book (chapter 6) discusses them
as part of a threaded language, but I'm going to discuss them as part
of a threads library, since that's how you will use them.
A monitor is a data structure which a thread can "enter" and "exit".
Only one thread may be in the monitor at a time. This is just like
a mutex, and in pthreads, there is no entity called a "monitor". You
just use a mutex.
Condition variables allow you to do more sophisticated things with
monitors. A condition variable must be associated with a specific monitor.
There are three procedures that act on condition variables,
and whenever you call them, you must have entered the relevant monitor
(i.e. you must have locked the relevant mutex):
- pthread_cond_wait(pthread_cond_t *c, pthread_mutex_t *mon)
This says to release the mutex and block until another thread unblocks
you. This is, of course, done atomically. When pthread_cond_wait()
returns, that means that you have been woken up, and you have reacquired
the mutex.
- pthread_cond_signal(pthread_cond_t *c)
This chooses one or more of the threads that have blocked on
the condition variable, and unblocks them. If there is no thread that
has blocked on the condition variable, then pthread_cond_signal()
does nothing.
There are no guarantees about which thread gets
unblocked if more than one is blocked -- just that some thread(s)
will be unblocked. The pthreads library does not require that you
actually own the mutex when you call pthread_cond_signal().
Some threads packages do, and I think that it's a good idea, so whenever
you see me use pthread_cond_signal(), I will have locked the
relevant mutex.
- pthread_cond_broadcast(pthread_cond_t *c)
This unblocks all threads that have blocked on the condition variable.
Let me advocate testing the return values of all monitor and condition
variable calls. This is because you often make errors messing with these,
and testing the return value can save you hours of debugging.
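One way to test those return values is sketched below. The helper names check() and signal_checked() are hypothetical. The sketch relies on the fact that the pthread calls return 0 on success and an error number on failure, rather than setting errno.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: report a failed pthread call and give up. */
void check(int rc, const char *what)
{
  assert(what != NULL);
  if (rc != 0) {
    fprintf(stderr, "%s: %s\n", what, strerror(rc));
    exit(1);
  }
}

/* Enter the monitor, signal, and leave, testing every return value. */
void signal_checked(pthread_mutex_t *mon, pthread_cond_t *c)
{
  check(pthread_mutex_lock(mon), "pthread_mutex_lock");
  check(pthread_cond_signal(c), "pthread_cond_signal");
  check(pthread_mutex_unlock(mon), "pthread_mutex_unlock");
}
```

A call such as check(pthread_cond_wait(&c, &mon), "pthread_cond_wait") follows the same pattern. In signal_checked() the mutex is held while signalling and released right afterward.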
Now, here is an odd thing -- if you call
pthread_cond_signal() or pthread_cond_broadcast(),
then you should own the mutex (i.e. you should have locked
the mutex).
However, the thread that you are
unblocking will have locked the mutex when it called
pthread_cond_wait(). This at first appears to be
a contradiction, but you must remember that the waiting
thread unlocks the mutex while it is blocked. When
it is unblocked, it must relock the mutex before returning
from pthread_cond_wait.
As it turns out there are a few choices that the threads system
has in implementing condition variables.
- The unblocked thread has to wait until the thread calling
pthread_cond_signal()/pthread_cond_broadcast()
unlocks the mutex to run. I.e. the unblocking
merely makes it block on the mutex instead of the condition variable.
- The unblocked thread automatically locks the mutex and the thread
calling pthread_cond_signal()/pthread_cond_broadcast()
goes back to blocking on the mutex.
When the mutex is free, the thread will reenter it and continue
executing following the
pthread_cond_signal()/pthread_cond_broadcast() call.
Believe it or not, there are arguments for both approaches. In pthreads,
the former approach is taken. My personal philosophy on this
is that you should program in such a way that either approach will
work. One way to do this is to make sure that you unlock the
mutex immediately after calling pthread_cond_signal() or
pthread_cond_broadcast(). My code will always do this.
Read the book (chapter 6) for a further discussion of this.
Using condition variables
Now, adding condition variables to our program is straightforward.
We need two condition variables -- one for when the queue is full
and one for when the queue is empty. We'll call pthread_cond_wait()
in submit_job() when the queue is full, and
pthread_cond_signal() in get_print_job() when a printer
thread removes a job from a full queue.
Likewise,
we'll call pthread_cond_wait()
in get_print_job() when the queue is empty, and
pthread_cond_signal() in submit_job() when a user
thread inserts a job into an empty queue.
Note that submit_job() and get_print_job() both
use while loops because when pthread_cond_wait() returns,
the queue may have become full/empty in the time between when
the waiting thread unblocked and the time that it acquired
the mutex. Therefore, it may have to wait again.
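The scheme just described might look roughly like the sketch below. It is an illustration, not ps4.c: the struct layouts and field names are assumptions. It signals only when the queue goes from empty to non-empty or from full to non-full, exactly as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumed shapes of the structs from printqsim.h. */
typedef struct {
  int nusers, nprinters, arrtime, maxpages, bufsize, nevents;
  int id;
  int starttime;
  void *v;
} Spq;

typedef struct {
  int jobsize;
  int userid;
  int jobid;
} Job;

typedef struct {
  Job **b;                 /* circular queue of bufsize slots */
  int head, tail, njobs;
  pthread_mutex_t lock;
  pthread_cond_t full;     /* users wait here while the queue is full */
  pthread_cond_t empty;    /* printers wait here while it is empty */
} Buffer;

void initialize_v(Spq *s)
{
  Buffer *buf = malloc(sizeof(Buffer));
  assert(buf != NULL);
  buf->b = malloc(sizeof(Job *) * s->bufsize);
  assert(buf->b != NULL);
  buf->head = buf->tail = buf->njobs = 0;
  int rc = pthread_mutex_init(&buf->lock, NULL);
  assert(rc == 0);
  rc = pthread_cond_init(&buf->full, NULL);
  assert(rc == 0);
  rc = pthread_cond_init(&buf->empty, NULL);
  assert(rc == 0);
  s->v = buf;
}

void submit_job(Spq *s, Job *j)
{
  Buffer *buf = s->v;
  pthread_mutex_lock(&buf->lock);
  while (buf->njobs == s->bufsize)      /* re-check after every wakeup */
    pthread_cond_wait(&buf->full, &buf->lock);
  buf->b[buf->tail] = j;
  buf->tail = (buf->tail + 1) % s->bufsize;
  buf->njobs++;
  if (buf->njobs == 1)                  /* the queue was empty */
    pthread_cond_signal(&buf->empty);
  pthread_mutex_unlock(&buf->lock);
}

Job *get_print_job(Spq *s)
{
  Buffer *buf = s->v;
  pthread_mutex_lock(&buf->lock);
  while (buf->njobs == 0)               /* re-check after every wakeup */
    pthread_cond_wait(&buf->empty, &buf->lock);
  Job *j = buf->b[buf->head];
  buf->head = (buf->head + 1) % s->bufsize;
  buf->njobs--;
  if (buf->njobs == s->bufsize - 1)     /* the queue was full */
    pthread_cond_signal(&buf->full);
  pthread_mutex_unlock(&buf->lock);
  return j;
}
```

The wait is written as a while loop rather than an if because a woken thread only knows that the condition was true at some point, not that it still holds once the mutex has been reacquired.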
The code is in
ps4.c. When you run it, everything
seems to work just fine.
UNIX> !ps
ps4 5 3 5 5 5 3
0: user 0/000: Sleeping for 4 seconds
0: user 1/000: Sleeping for 10 seconds
0: user 2/000: Sleeping for 5 seconds
0: user 3/000: Sleeping for 2 seconds
0: user 4/000: Sleeping for 7 seconds
0: prnt 0/000: ready to print
0: prnt 0 blocking because the queue is empty
0: prnt 1/000: ready to print
0: prnt 1 blocking because the queue is empty
0: prnt 2/000: ready to print
0: prnt 2 blocking because the queue is empty
2: user 3/000: Submitting a job with size 5
2: user 3/001: Sleeping for 10 seconds
2: prnt 0/000: Printing job 0 from user 3 size 5
4: user 0/000: Submitting a job with size 1
4: user 0/001: Sleeping for 1 seconds
4: prnt 1/000: Printing job 0 from user 0 size 1
5: user 2/000: Submitting a job with size 4
5: user 2/001: Sleeping for 6 seconds
5: user 0/001: Submitting a job with size 3
5: user 0/002: Sleeping for 10 seconds
5: prnt 2/000: Printing job 0 from user 2 size 4
7: user 4/000: Submitting a job with size 4
7: user 4/001: Sleeping for 10 seconds
8: prnt 1/001: ready to print
8: prnt 1/001: Printing job 1 from user 0 size 3
10: user 1/000: Submitting a job with size 1
10: user 1/001: Sleeping for 6 seconds
11: user 2/001: Submitting a job with size 3
11: user 2/002: Sleeping for 1 seconds
12: user 3/001: Submitting a job with size 1
12: user 3/002: Sleeping for 10 seconds
12: user 2/002: Submitting a job with size 5
12: user 2/003: Done
...
A bug
However, there is still one problem with this code.
Suppose there are two printer threads waiting because
the queue is empty. Moreover, there are two user
threads that want to submit jobs at the same time.
The first user thread puts the job on the queue and
calls pthread_cond_signal(). This unblocks
one of the printer threads, but it then blocks so
that it can acquire the mutex. Now, the user thread
releases the mutex, but the printer thread does not
get it -- instead, the next user thread gets it.
It puts a job into the queue, but since there is
already a job there, it does not call
pthread_cond_signal(). Therefore, even though
there are two jobs to be printed, only one printer
thread is awake. This means that we've lost one
printer. This is a bug.
Look at ps4-bad.txt.
This is exactly what happens. There are three user
threads and five printer threads. Initially, all of
the printer threads block. At the 3 second mark, two
user threads submit jobs, but only one printer thread
(0) is signalled. Then, more jobs are put onto
the print queue, but since njobs is greater than
1, no more printers get awakened. This is a bug.
Fixing this bug is simple (in
ps5.c)
-- simply remove the if
statements around the pthread_cond_signal() calls.
This means that submit_job always signals the
empty condition variable, and get_print_job
always signals the full condition variable. This
works fine -- if there are no blocked threads,
pthread_cond_signal() does nothing, and if, for example,
a user thread is unblocked and there is no room on the
queue, it will simply call pthread_cond_wait() again.
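A sketch of the always-signal version follows. It is an illustration, not ps5.c: the struct layouts and field names are assumptions, and take_one() is a hypothetical stand-in for a printer thread that fetches a single job.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumed shapes of the structs from printqsim.h. */
typedef struct {
  int nusers, nprinters, arrtime, maxpages, bufsize, nevents;
  int id;
  int starttime;
  void *v;
} Spq;

typedef struct {
  int jobsize;
  int userid;
  int jobid;
} Job;

typedef struct {
  Job **b;                 /* circular queue of bufsize slots */
  int head, tail, njobs;
  pthread_mutex_t lock;
  pthread_cond_t full;     /* users wait here while the queue is full */
  pthread_cond_t empty;    /* printers wait here while it is empty */
} Buffer;

void initialize_v(Spq *s)
{
  Buffer *buf = malloc(sizeof(Buffer));
  assert(buf != NULL);
  buf->b = malloc(sizeof(Job *) * s->bufsize);
  assert(buf->b != NULL);
  buf->head = buf->tail = buf->njobs = 0;
  int rc = pthread_mutex_init(&buf->lock, NULL);
  assert(rc == 0);
  rc = pthread_cond_init(&buf->full, NULL);
  assert(rc == 0);
  rc = pthread_cond_init(&buf->empty, NULL);
  assert(rc == 0);
  s->v = buf;
}

void submit_job(Spq *s, Job *j)
{
  Buffer *buf = s->v;
  pthread_mutex_lock(&buf->lock);
  while (buf->njobs == s->bufsize)
    pthread_cond_wait(&buf->full, &buf->lock);
  buf->b[buf->tail] = j;
  buf->tail = (buf->tail + 1) % s->bufsize;
  buf->njobs++;
  pthread_cond_signal(&buf->empty);   /* unconditional: one per insert */
  pthread_mutex_unlock(&buf->lock);
}

Job *get_print_job(Spq *s)
{
  Buffer *buf = s->v;
  pthread_mutex_lock(&buf->lock);
  while (buf->njobs == 0)
    pthread_cond_wait(&buf->empty, &buf->lock);
  Job *j = buf->b[buf->head];
  buf->head = (buf->head + 1) % s->bufsize;
  buf->njobs--;
  pthread_cond_signal(&buf->full);    /* unconditional: one per removal */
  pthread_mutex_unlock(&buf->lock);
  return j;
}

/* Hypothetical printer stand-in: fetch one job and return it. */
void *take_one(void *arg)
{
  return get_print_job((Spq *) arg);
}
```

Since every insertion now produces its own signal, two back-to-back submissions wake two waiting printers (if two are waiting). A signal sent when nobody is waiting costs nothing.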
Try it out. If you look at
ps5-good.txt, you'll see
the same scenario as in ps4-bad.txt at the 27 second
mark, and that it is handled just fine.
So
So, you've learned what monitors/condition variables are,
and you've seen a detailed example of their use. You have
also seen that synchronization problems can be subtle, and
you have to examine your program's output carefully to make
sure that it is working like you think it should.