CS360 Lecture notes -- Thread #3


In this lecture, we go over race conditions in more detail, focusing on using mutexes, and the trade-off between safety and performance.

SSNSERVER

The lecture revolves around a piece of code that maintains a database of people/ages/social security numbers. The main code is in ssnserver.c. It maintains a red-black tree (t) keyed on a person's name (in the order last, first). The val field points to an Entry struct, which contains the person's name again, his/her age, and his/her social-security number, stored as a string.
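To make the data layout concrete, here is a minimal sketch of what an Entry and its tree key might look like. The field names, the helper names make_key() and new_entry(), and the exact "last first" key format are assumptions made for illustration; the real definitions are in ssnserver.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of the record that a tree node's val field points to. */
typedef struct {
  char *name;   /* "first last", kept for printing */
  int age;
  char *ssn;    /* stored as a string, e.g. "123-45-6789" */
} Entry;

/* Build a "last first" key so that entries sort by last name, then first.
   Returns a malloc()'d string that the caller must free, or NULL. */
char *make_key(const char *fn, const char *ln)
{
  size_t len = strlen(ln) + 1 + strlen(fn) + 1;
  char *key = malloc(len);

  if (key == NULL) return NULL;
  snprintf(key, len, "%s %s", ln, fn);
  return key;
}

/* Allocate an Entry, copying the strings so the caller's buffers can be reused. */
Entry *new_entry(const char *fn, const char *ln, int age, const char *ssn)
{
  size_t nlen = strlen(fn) + 1 + strlen(ln) + 1;
  Entry *e = malloc(sizeof(Entry));

  if (e == NULL) return NULL;
  e->name = malloc(nlen);
  e->ssn = malloc(strlen(ssn) + 1);
  if (e->name == NULL || e->ssn == NULL) {
    free(e->name);
    free(e->ssn);
    free(e);
    return NULL;
  }
  snprintf(e->name, nlen, "%s %s", fn, ln);
  strcpy(e->ssn, ssn);
  e->age = age;
  return e;
}
```

With a string-keyed tree, the key from make_key() would be used for insertion and lookup, and the Entry pointer would be stored as the node's val.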

Ssnserver.c creates the tree and then accepts four kinds of inputs from standard input:

  1. ADD fn ln age ssn -- This adds an entry to the tree.
  2. DELETE fn ln -- This deletes an entry from the tree.
  3. PRINT -- This prints the tree.
  4. DONE -- This causes the program to exit.
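The four commands above could be recognized with a small parser along these lines. This is only a sketch: the Command struct, the CmdType enum, the buffer sizes and parse_command() are hypothetical names chosen here, and ssnserver.c may well read its input differently.

```c
#include <stdio.h>
#include <string.h>

typedef enum { CMD_ADD, CMD_DELETE, CMD_PRINT, CMD_DONE, CMD_BAD } CmdType;

/* Hypothetical parsed form of one input line. */
typedef struct {
  CmdType type;
  char fn[64];
  char ln[64];
  int age;
  char ssn[32];
} Command;

/* Classify one line of input and pull out its arguments.
   Lines with a missing or malformed argument come back as CMD_BAD. */
Command parse_command(const char *line)
{
  Command c;
  char word[16];

  memset(&c, 0, sizeof(c));
  c.type = CMD_BAD;
  if (sscanf(line, "%15s", word) != 1) return c;

  if (strcmp(word, "ADD") == 0) {
    /* %*s skips the command word; the four remaining fields must all match. */
    if (sscanf(line, "%*s %63s %63s %d %31s", c.fn, c.ln, &c.age, c.ssn) == 4) {
      c.type = CMD_ADD;
    }
  } else if (strcmp(word, "DELETE") == 0) {
    if (sscanf(line, "%*s %63s %63s", c.fn, c.ln) == 2) c.type = CMD_DELETE;
  } else if (strcmp(word, "PRINT") == 0) {
    c.type = CMD_PRINT;
  } else if (strcmp(word, "DONE") == 0) {
    c.type = CMD_DONE;
  }
  return c;
}
```

The server's main loop would then read a line, call parse_command(), and switch on the type field.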
Try it out:

INPUTGEN

Ok, now look at inputgen.c. This is a program that I wrote to really beat on ssnserver. As input, it takes a number of events, a random number seed, and a file of last names. The file of last names that I've created is lns, which is simply /usr/dict/words copied into a local file. The program reads the last names into the array lns, and it has an array fns of 65 first names. Now, what it does is create nevents random input events for ssnserver.c. The first 50 events are random ADD events, and thereafter, it will create either ADD, DELETE or PRINT events (these in the ratio 5/5/1). It ends with a PRINT and a DONE event.

In order to create DELETE events that correspond to entries in the tree, inputgen uses an rb-tree of its own. This tree is keyed on a random number, and its val field is one of the names that it added previously. When it creates a DELETE event, it chooses the first name in the tree (which will be a random name, since the keys are random), deletes it from the tree, and then uses this name for the DELETE event.
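The idea of keying on a random number so that "the first element" is a random element can be sketched without the rb-tree library. The code below swaps in a sorted linked list for the tree, which is an assumption made to keep the example self-contained; Node, pool_add() and pool_take_first() are hypothetical names, not inputgen's.

```c
#include <stdlib.h>
#include <string.h>

/* One previously added name, ordered by a random key. */
typedef struct Node {
  long key;            /* random number chosen when the name was added */
  char *name;
  struct Node *next;
} Node;

/* Insert a copy of name, keeping the list sorted by key.
   Returns the (possibly new) head of the list. */
Node *pool_add(Node *head, long key, const char *name)
{
  Node *n = malloc(sizeof(Node));
  Node **pp = &head;

  if (n == NULL) return head;
  n->name = malloc(strlen(name) + 1);
  if (n->name == NULL) {
    free(n);
    return head;
  }
  strcpy(n->name, name);
  n->key = key;

  /* Walk to the first node whose key is larger than the new key. */
  while (*pp != NULL && (*pp)->key <= key) pp = &(*pp)->next;
  n->next = *pp;
  *pp = n;
  return head;
}

/* Remove the node with the smallest key and return its name.
   The caller frees the returned string. Returns NULL if the pool is empty. */
char *pool_take_first(Node **head)
{
  Node *n = *head;
  char *name;

  if (n == NULL) return NULL;
  *head = n->next;
  name = n->name;
  free(n);
  return name;
}
```

A generator would call pool_add(pool, rand(), name) each time it emits an ADD, and pool_take_first(&pool) each time it needs a name for a DELETE.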

So, this is a little complex, but you should be able to understand it. Inputgen is set up so that the tree that it manages will average around 50 elements, regardless of the number of events that it generates: after the first 50 ADD events, ADD and DELETE events are equally likely, so the size hovers around 50. To prove this to yourself, try it:

You'll note that the above tree has 50 elements.

Turning ssnserver into a real server

Now, look at ssnserver1.c.

What this does is turn ssnserver into a real server. It serves a socket, and then calls accept_connection(), and creates a server_thread() thread to service the connection. The server_thread() thread works just like ssnserver.c, with the exception that the tree is a global variable.
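Handing an accepted connection to a thread might look roughly like the sketch below. The body of server_thread() here is a placeholder that only writes a made-up "READY" greeting, and start_server_thread() is a hypothetical helper; the point is the pattern of passing the file descriptor through heap memory so that each thread gets its own copy.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Placeholder thread body: services one connection and then closes it.
   A real server would loop here, reading and executing commands. */
void *server_thread(void *arg)
{
  int fd = *(int *) arg;
  const char *greeting = "READY\n";   /* invented for this sketch */

  free(arg);
  if (write(fd, greeting, strlen(greeting)) < 0) {
    /* A real server would report the error here. */
  }
  close(fd);
  return NULL;
}

/* Start a thread to service the connection on fd.
   The descriptor is copied into malloc()'d memory so that the thread
   does not depend on a variable that the caller may overwrite.
   Returns 0 on success and -1 on failure. */
int start_server_thread(int fd, pthread_t *tid)
{
  int *fdp = malloc(sizeof(int));

  if (fdp == NULL) return -1;
  *fdp = fd;
  if (pthread_create(tid, NULL, server_thread, fdp) != 0) {
    free(fdp);
    return -1;
  }
  return 0;
}
```

In a server, fd would be the value returned when a connection is accepted, and the main thread would either join the new thread or detach it with pthread_detach().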

Try it out with telnet. For example, in one window on hydra4 I do:

while in another, I do:

It works just fine.

I modified inputgen.c to work as a socket client -- the code is in inclient.c. It is straightforward and uses a second thread to read the socket output and print it to standard out. Try it out on the same server:

Now, look at ssnserver2.c. This works just like ssnserver1 except that it can service multiple connections simultaneously by creating one server_thread() thread per connection. Note, however, that access to t is not protected by mutexes. This presents a problem because, for example, one thread may be adding one element to the tree while another is deleting a nearby element. If the first thread is interrupted before it finishes adding the element, then the rb-tree pointers may not be where they should be when the second thread tries to delete. This will result in an error, probably a core dump.

To help illustrate this, I wrote a shell script called kill_it.sh. This forks off a given number of inclient processes that all blast away at the given ssnserver2 server.

Try it out. On one machine, start an ssnserver2. For example, I did the following on hydra4:

Then, on hydra3, I had 5 inclients send 1000 entries simultaneously to the server:

Within a few seconds, the ssnserver2 dumped core. This doesn't always happen, but it usually does. The reason is that access to t is not protected.

Adding a mutex

Now look at ssnserver3.c. This adds a mutex that each thread locks while it processes a connection. This solves the problem with accessing t, because no two threads may access t simultaneously. Try out kill_it.sh again.

On hydra4:

And on hydra3:

No core dump!

So, this solves the mutual exclusion problem, but it is like stapling papers with a sledgehammer. By having each thread lock the mutex throughout its lifetime, we have serialized the server -- no two threads can do anything simultaneously, and this is a performance problem. Ssnserver4.c solves this problem in a very standard way. Instead of locking the mutex at all times, the thread only locks the mutex when it accesses the tree. This is within the code for ADD, DELETE and PRINT.
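The difference between the two locking strategies comes down to where the lock and unlock calls sit. Below is a minimal sketch of the fine-grained version, under the assumption that a plain counter can stand in for the shared tree t; tree_add(), tree_delete(), tree_count(), client_thread() and run_clients() are hypothetical names, not the functions in ssnserver4.c.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CLIENTS 16

/* Stand-in for the shared tree t: only its size is tracked here. */
static int tree_size = 0;
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each operation holds the mutex only while it touches the shared data. */
void tree_add(void)
{
  pthread_mutex_lock(&tree_lock);
  tree_size++;
  pthread_mutex_unlock(&tree_lock);
}

int tree_delete(void)
{
  int deleted = 0;

  pthread_mutex_lock(&tree_lock);
  if (tree_size > 0) {
    tree_size--;
    deleted = 1;
  }
  pthread_mutex_unlock(&tree_lock);
  return deleted;
}

int tree_count(void)
{
  int n;

  pthread_mutex_lock(&tree_lock);
  n = tree_size;
  pthread_mutex_unlock(&tree_lock);
  return n;
}

/* One connection's worth of work. Reading and parsing each command would
   happen in this loop with the mutex unlocked, so other threads can run.
   The coarse version would instead lock before the loop and unlock after it. */
void *client_thread(void *arg)
{
  int nevents = *(int *) arg;
  int i;

  for (i = 0; i < nevents; i++) tree_add();
  return NULL;
}

/* Run up to MAX_CLIENTS threads concurrently and return the final size. */
int run_clients(int nthreads, int nevents)
{
  pthread_t tids[MAX_CLIENTS];
  int i, started = 0;

  if (nthreads > MAX_CLIENTS) nthreads = MAX_CLIENTS;
  for (i = 0; i < nthreads; i++) {
    if (pthread_create(&tids[i], NULL, client_thread, &nevents) != 0) break;
    started++;
  }
  for (i = 0; i < started; i++) pthread_join(tids[i], NULL);
  return tree_count();
}
```

Because every update happens under the mutex, no increments are lost regardless of how the threads interleave, yet the threads are free to overlap everything they do outside the three tree functions.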

To show how this improves performance, I ran ssnserver3 on hydra4, and simultaneously ran the following clients on hydra1, hydra2, hydra3 and hydra5:

The clients took 10, 37, 81 and 145 seconds respectively. This is because they were serviced serially. I then did the same test using ssnserver4, and the times were 71, 71, 79 and 80 seconds. Obviously, ssnserver4 is better at servicing the connections simultaneously, although the average client time is better with ssnserver3 (68.25 seconds) than with ssnserver4 (75.25 seconds).

ssnserver5

Is the fact that ssnserver3 has a better average client service time than ssnserver4 surprising to you? It actually shouldn't be. One reason is that in ssnserver4, the average tree size is going to be 200 elements for all clients. In ssnserver3 the average tree size is 125 (50 for the first client, which exits before the second client runs, then 100 for the second client, 150 for the third, and 200 for the fourth). Another reason is that the server holds the mutex while printing the tree to the client. This is a time-consuming operation, and means that no other client operation may be serviced during that time.

Does the mutex really need to be locked while printing the tree? No, not really. You can do some buffering instead: create the string that you'll be using to print the tree while holding the mutex. This will take some time, but not nearly as much as writing this string to the socket. Then you release the mutex and write the string. This is done in ssnserver5.c. Note I keep the tree size in a global variable, and this helps me malloc() the buffer when needed.
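The buffering trick can be sketched as follows. This is not the code from ssnserver5.c: a linked list stands in for the tree, and Rec, MAX_LINE, add_record(), snapshot() and print_records() are hypothetical names. What it shares with the description above is the shape: size the buffer from a global count, fill it while holding the mutex, then release the mutex before the slow write().

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_LINE 128   /* assumed upper bound on one printed record */

typedef struct Rec {
  char name[64];
  int age;
  struct Rec *next;
} Rec;

static Rec *records = NULL;    /* stand-in for the shared tree */
static int nrecords = 0;       /* size kept in a global, used to size the buffer */
static pthread_mutex_t rec_lock = PTHREAD_MUTEX_INITIALIZER;

void add_record(const char *name, int age)
{
  Rec *r = malloc(sizeof(Rec));

  if (r == NULL) return;
  snprintf(r->name, sizeof(r->name), "%s", name);
  r->age = age;
  pthread_mutex_lock(&rec_lock);
  r->next = records;
  records = r;
  nrecords++;
  pthread_mutex_unlock(&rec_lock);
}

/* Build the whole printout in memory while holding the mutex.
   Returns a malloc()'d string that the caller frees, or NULL. */
char *snapshot(void)
{
  char *buf, *p;
  Rec *r;

  pthread_mutex_lock(&rec_lock);
  buf = malloc((size_t) nrecords * MAX_LINE + 1);
  if (buf != NULL) {
    p = buf;
    *p = '\0';
    for (r = records; r != NULL; r = r->next) {
      /* Each line is far shorter than MAX_LINE, so nothing is truncated. */
      p += snprintf(p, MAX_LINE, "%s %d\n", r->name, r->age);
    }
  }
  pthread_mutex_unlock(&rec_lock);
  return buf;
}

/* The slow part, writing to the client, runs with the mutex released. */
int print_records(int fd)
{
  char *buf = snapshot();
  size_t len, off = 0;
  ssize_t n;

  if (buf == NULL) return -1;
  len = strlen(buf);
  while (off < len) {
    n = write(fd, buf + off, len - off);
    if (n <= 0) {
      free(buf);
      return -1;
    }
    off += (size_t) n;
  }
  free(buf);
  return 0;
}
```

The client sees a consistent snapshot of the data as of the moment the mutex was held, and other threads can add and delete while the bytes are still going out.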

Now, when I repeat the test of having 4 clients call

I get times of 18, 50, 51 and 52 seconds, for an average of 42.75 seconds. This is a big improvement.

The lesson to be learned

The lesson to be learned here is that you need to think carefully about your use of synchronization primitives. There are two issues: correctness and performance. You want to make sure that there are no race conditions in your code, as there were in ssnserver2.c. However, you want to eliminate these race conditions in a way that maximizes performance. This should be done by making sure you only hold a mutex for as long as you need it locked. If you are performing a very time-consuming operation (such as writing to a socket or file) while holding the mutex, then you should consider the use of buffering so that you can move the time-consuming operation out of the code that holds the mutex. This is what ssnserver5 does.