CS360 Lecture notes -- A quick primer on mutexes and condition variables


The pthread_mutex_t data structure

This text is copied from the lecture notes on race conditions (thread 2).

The pthreads library provides a simple data structure called a "mutex" that allows threads to synchronize by locking and unlocking the data structure. The three procedures that you use with mutexes are as follows:

pthread_mutex_init(pthread_mutex_t *mutex, NULL);
pthread_mutex_lock(pthread_mutex_t *mutex);
pthread_mutex_unlock(pthread_mutex_t *mutex);

You create a mutex with pthread_mutex_init(). You have to have allocated memory for it ahead of time (i.e. pthread_mutex_init() does not call malloc()). Then any thread may lock the mutex, and the thread that locked it should be the one to unlock it. When a thread locks the mutex, no other thread may lock it until it is unlocked: if a thread calls pthread_mutex_lock() while the mutex is locked, then the thread will block until the mutex is unlocked. Thus, only one thread may hold the mutex at a time.

The program mutex_example_1.c is a simple illustration of a mutex in action. In this program, we create multiple threads. Each thread has access to a shared counter, and what each thread does is lock the mutex, update the counter, sleep for a bit, and then unlock the mutex. Before it unlocks the mutex, it checks to make sure that the counter has not been altered while it was asleep. The properties of the mutex data structure ensure that this works. Here's the code.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>

/* Each thread is going to have private information and 
   shared information.  Here is the shared information. */

struct shared {
  int counter;            /* A shared counter. */
  pthread_mutex_t *lock;  /* A lock to protect the counter. */
  int usleeptime;         /* Microseconds that each thread will sleep after updating the counter. */
};

/* Here is the private information */

struct info {
  int id;                 /* The thread's id. */
  struct shared *s;       /* Pointer to the shared information. */
};

/* Here is the procedure that each thread calls.  
   In a nutshell, each thread locks the mutex, increments the
   counter, then sleeps.  It then tests to make sure that the
   counter hasn't been modified, and unlocks the mutex.  It
   repeats this loop indefinitely. */

void *share_counter(void *arg)
{
  struct info *info;    /* The thread's private info. */
  struct shared *s;     /* The thread's shared info. */
  int counter;          /* A copy of the counter, to test. */

  info = (struct info *) arg;
  s = info->s;

  while (1) {

    /* Lock the mutex, update the counter and print. */

    pthread_mutex_lock(s->lock);      
    s->counter++;
    counter = s->counter;
    printf("Thread: %3d - Begin - Counter %3d.\n", info->id, s->counter);
    fflush(stdout);

    /* Sleep, and then print the counter again. */

    usleep(s->usleeptime);
    printf("Thread: %3d - End   - Counter %3d.\n", info->id, s->counter);
    fflush(stdout);

    /* Make sure the counter hasn't been modified, then unlock the mutex. */

    if (s->counter != counter) {
      printf("Thread %d - Problem -- counter was %d, but now it's %d\n",
             info->id, counter, s->counter);
      exit(1);
    }
    pthread_mutex_unlock(s->lock);

  }

  return NULL;   /* Shut the compiler up. */
}

/* The main sets up the threads, and exits. */
  
int main(int argc, char **argv)
{
  int nthreads;
  int usleeptime;
  pthread_t *tids;
  static struct shared S;   /* static: the threads still use S after main() calls pthread_exit() */
  struct info *infos;
  int i;

  if (argc != 3) {
    fprintf(stderr, "usage: mutex_example nthreads usleep_time\n");
    exit(1);
  }

  nthreads = atoi(argv[1]);
  usleeptime = atoi(argv[2]);

  tids = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
  infos = (struct info *) malloc(sizeof(struct info) * nthreads);
  for (i = 0; i < nthreads; i++) {
    infos[i].id = i;
    infos[i].s = &S;
  }
  S.counter = 0;
  S.usleeptime = usleeptime;
  S.lock = (pthread_mutex_t *) malloc(sizeof(pthread_mutex_t));
  pthread_mutex_init(S.lock, NULL);

  for (i = 0; i < nthreads; i++) {
    pthread_create(tids+i, NULL, share_counter, (void *) &infos[i]);
  }

  pthread_exit(NULL);
}

You call this with the number of threads, and the number of microseconds that each thread sleeps. Let's call it with 4 threads and 10,000 microseconds. You'll see that it works as anticipated -- the threads line up on the mutex, and each time a thread unlocks the mutex, another thread grabs it and updates the counter:

UNIX> make mutex_example_1
gcc -o mutex_example_1 mutex_example_1.c -lpthread
UNIX> mutex_example_1 4 10000 | head -n 20
Thread:   0 - Begin - Counter   1.
Thread:   0 - End   - Counter   1.
Thread:   1 - Begin - Counter   2.
Thread:   1 - End   - Counter   2.
Thread:   2 - Begin - Counter   3.
Thread:   2 - End   - Counter   3.
Thread:   3 - Begin - Counter   4.
Thread:   3 - End   - Counter   4.
Thread:   0 - Begin - Counter   5.
Thread:   0 - End   - Counter   5.
Thread:   1 - Begin - Counter   6.
Thread:   1 - End   - Counter   6.
Thread:   2 - Begin - Counter   7.
Thread:   2 - End   - Counter   7.
Thread:   3 - Begin - Counter   8.
Thread:   3 - End   - Counter   8.
Thread:   0 - Begin - Counter   9.
Thread:   0 - End   - Counter   9.
Thread:   1 - Begin - Counter  10.
Thread:   1 - End   - Counter  10.
UNIX> 

What happens if we don't use a mutex

If we don't use a mutex, then the threads don't have exclusive access to the counter. In mutex_example_2.c, I have simply commented out the pthread_mutex_lock() and pthread_mutex_unlock() calls. Take a look at the output:
UNIX> make mutex_example_2
gcc -o mutex_example_2 mutex_example_2.c -lpthread
UNIX> mutex_example_2 4 10000 | head -n 20
Thread:   0 - Begin - Counter   1.
Thread:   1 - Begin - Counter   2.
Thread:   2 - Begin - Counter   3.
Thread:   3 - Begin - Counter   4.
Thread:   0 - End   - Counter   4.
Thread 0 - Problem -- counter was 1, but now it's 4
Thread:   1 - End   - Counter   4.
Thread:   2 - End   - Counter   4.
Thread:   3 - End   - Counter   4.
UNIX>
What you see here is that all of the threads update the counter, with thread 0 updating it first. When thread 0 wakes up, the counter has been changed. Interestingly, between the time that it prints its error statement and the exit(1) call, threads 1, 2 and 3 wake up and print their counters. Then the exit(1) call kills the process.

This program is non-deterministic: its output depends on how the system schedules the threads. Here's a second run, whose output is quite different:

UNIX> mutex_example_2 4 10000 | head -n 20
Thread:   3 - Begin - Counter   3.
Thread:   1 - Begin - Counter   1.
Thread:   0 - Begin - Counter   1.
Thread:   2 - Begin - Counter   2.
Thread:   3 - End   - Counter   3.
Thread:   1 - End   - Counter   3.
Thread:   0 - End   - Counter   3.
Thread:   2 - End   - Counter   3.
Thread:   3 - Begin - Counter   4.
Thread 1 - Problem -- counter was 1, but now it's 4
Thread 0 - Problem -- counter was 1, but now it's 4
Thread 2 - Problem -- counter was 2, but now it's 4
UNIX> 
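There are two really interesting things here. First, threads 1 and 0 both report "Counter 1": each incremented the counter, but one increment was lost, because s->counter++ is a separate read, add and write, and the two threads' operations interleaved. That is why four increments left the counter at only 3. Second, threads 1, 0 and 2 print "End - Counter 3", yet their error messages report 4: thread 3 started its next iteration and incremented the counter between their printf() statements and their checks. I hope this output helps to motivate why you use mutexes to protect shared data.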

Just because you're using a mutex, it doesn't mean that your data is safe from other threads.

I also quote this text from the lecture notes on race conditions:

I want to point out here, that pthread_mutex_lock() does not actively "lock" other threads. Instead, it locks a data structure, which can be shared among the threads. The locking and unlocking of the data structure makes synchronization guarantees, which are very important to avoiding race conditions. However, I don't want you to get into the habit of thinking that pthread_mutex_lock() actively blocks other threads, or "locks them out." It doesn't -- it locks a data structure, and when other threads try to lock the same data structure, they block. Please reread this paragraph.

To illustrate this, mutex_example_3.c is exactly the same as mutex_example_1.c, except that the last thread (thread 3, when we run with four threads) does not lock or unlock the mutex. All of the others do. Take a look at the output:

UNIX> mutex_example_3 4 10000 | head -n 20
Thread:   1 - Begin - Counter   1.
Thread:   3 - Begin - Counter   2.
Thread:   1 - End   - Counter   2.
Thread:   3 - End   - Counter   2.
Thread 1 - Problem -- counter was 1, but now it's 2
Thread:   3 - Begin - Counter   3.
UNIX> 
Thread 1 locks the mutex, so threads 0 and 2 block when they try to lock it. However, since thread 3 is not calling pthread_mutex_lock(), it goes ahead and updates the counter. Thread 1 discovers this after it wakes up, and flags the error. As you can see, just because thread 1 locks the mutex, that doesn't mean that it "locks out" all of the other threads. The mutex only coordinates the threads that try to lock it.

An example of multiple threads working on one problem: Determining primes

This program is slightly contrived, but it will help you with your jtalk lab. The two files numbers-100.txt and numbers-5000.txt contain 100 and 5000 large numbers (up to 16 digits), and we want to know which ones are prime numbers. We're going to write a multithreaded program so that multiple threads can test primality of these numbers. The first of these is in prime_example_1.c. I'm only going to show the code that the threads run. The setup code in the main() thread is straightforward and very much like the code in the mutex examples.

/* Each thread will have its own private information, 
   and some shared information.  Here's the shared information. */

struct shared {
  int debug;          /* This is 0 or 1.  If 1, it will print more information. */
};

/* Here's the private information.  
   This includes a pointer to the shared information. */

struct info {
  int id;
  struct shared *s;
};

/* This is what each thread runs.  
   Each thread reads longs from standard input, and prints out the primes. */
   
void *find_primes(void *arg)
{
  struct info *info;
  struct shared *s;
  long prime_to_test, i;
  int prime;

  info = (struct info *) arg;
  s = info->s;

  while (1) {

    /* Read a number from standard input. */

    if (scanf("%ld", &prime_to_test) != 1) {
      if (s->debug) printf("Thread %d - Input_over - Exiting.\n", info->id);
      return NULL;
    }

    if (s->debug) {
      printf("Thread %d testing %ld\n", info->id, prime_to_test);
      fflush(stdout);
    }

    /* Determine if the number is prime. */

    prime = 1;
    for (i = 2; prime && i*i <= prime_to_test; i++) {
        if (prime_to_test % i == 0) prime = 0;
    }

    if (prime) {
      printf("Thread %d found prime %ld\n", info->id, prime_to_test);
    }
  }
}

We run this on one thread, and it finds two prime numbers in numbers-100.txt. It finds 122 in numbers-5000.txt, and that takes about one minute and 42 seconds on my Macbook (in 2017):

UNIX> prime_example_1 1 n < numbers-100.txt 
Thread 0 found prime 7075339107081019
Thread 0 found prime 8608832394453737
UNIX> time prime_example_1 1 n < numbers-5000.txt
Thread 0 found prime 7463817079068967
Thread 0 found prime 9023127434641013
....
Thread 0 found prime 6889406536167677
Thread 0 found prime 6371913195199121
101.312u 0.340s 1:42.21 99.4%	0+0k 0+0io 0pf+0w
UNIX> 
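When we employ multiple threads, the running time drops roughly in proportion to the number of threads up to four (from 102 seconds with one thread to 24 seconds with four; my Mac has multiple cores), and it improves only a little after that (21 seconds with eight). I'm piping the output to wc to show that I'm getting the same number of lines of output in each case.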
UNIX> time sh -c "prime_example_1 2 n < numbers-5000.txt | wc"
     122     610    4623
91.968u 0.044s 0:46.40 198.2%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "prime_example_1 3 n < numbers-5000.txt | wc"
     122     610    4623
93.900u 0.072s 0:31.81 295.4%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "prime_example_1 4 n < numbers-5000.txt | wc"
     122     610    4623
95.235u 0.088s 0:24.32 391.8%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "prime_example_1 5 n < numbers-5000.txt | wc"
     122     610    4623
112.364u 0.046s 0:23.56 477.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "prime_example_1 6 n < numbers-5000.txt | wc"
     122     610    4623
131.229u 0.069s 0:23.07 569.0%	0+0k 0+0io 0pf+0w
UNIX> time prime_example_1 8 n < numbers-5000.txt | wc
     122     610    4623
UNIX> time sh -c "prime_example_1 8 n < numbers-5000.txt | wc"
     122     610    4623
163.402u 0.119s 0:21.44 762.6%	0+0k 0+0io 0pf+0w
UNIX> 
You'll note that this code does not have any mutexes in it. That's because the standard I/O library is "thread-safe" -- you don't get race conditions on that scanf() call. You can take a look at http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01 if you want more information about that...

Suppose I have a separate procedure that reads 1000 numbers at a time into a dllist

Now, suppose that instead of reading from standard input, I have a procedure called read_numbers(), which reads the numbers, 1000 at a time, and puts them onto a shared Dllist. Here's the procedure:

int read_numbers(Dllist l)
{
  int i;
  long n;

  for (i = 0; i < 1000; i++) {
    if (scanf("%ld", &n) != 1) {
      return (i == 0) ? 0 : 1;
    }
    dll_append(l, new_jval_l(n));
  }
  return 1;
}

Now, in prime_example_2.c, I add a dllist to the shared information, and have the threads call read_numbers() to fill it when it's empty. Again, we are not using any mutexes here, so there are potential race conditions: multiple threads may call read_numbers() simultaneously, and multiple threads may access and modify that dllist simultaneously. Regardless, here is the thread code:

/* Each thread will have its own private information, 
   and some shared information.  Here's the shared information. 
   There's a dllist to hold the numbers to test. */

struct shared {
  int nthreads;
  int debug;          /* This is 0 or 1.  If 1, it will print more information. */
  Dllist numbers;
};

/* ... */

/* Find_primes now calls read_numbers() when the dllist is
   empty.  Each thread pulls numbers off the dllist to check. */

void *find_primes(void *arg)
{
  struct info *info;
  struct shared *s;
  long prime_to_test, i;
  int prime;

  info = (struct info *) arg;
  s = info->S;

  while (1) {

    if (dll_empty(s->numbers)) {
      if (s->debug) {
        printf("Thread %d calling read_numbers.\n", info->id);
        fflush(stdout);
      }
      if (read_numbers(s->numbers) == 0) {
        if (s->debug) {
          printf("Thread %d exiting.\n", info->id);
          fflush(stdout);
        }
        return NULL;
      }
    }

    prime_to_test = s->numbers->flink->val.l;
    dll_delete_node(s->numbers->flink);

    if (s->debug) {
      printf("Thread %d testing %ld\n", info->id, prime_to_test);
      fflush(stdout);
    }

    prime = 1;
    for (i = 2; prime && i*i <= prime_to_test; i++) {
      if (prime_to_test % i == 0) prime = 0;
    }

    if (prime) {
      printf("Thread %d found prime %ld\n", info->id, prime_to_test);
    }
  }
}

When you run this, sometimes it works, but often (for me), the program aborts with a malloc() error ("pointer being freed was not allocated") because of the race conditions. (To compile this, you may have to change some of the initial variables in the makefile to find dllist.h and libfdr.a.)

UNIX> prime_example_2 8 n < numbers-5000.txt
prime_example_2(1928,0x700000104000) malloc: *** error for object 0x7fcb82403410: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort
UNIX> 

Let's fix the race condition with a mutex

We can fix the race condition with a simple mutex. We make sure that the threads lock the mutex whenever they are accessing or modifying the dllist. This code is in prime_example_3.c, and I just show the thread code and the new shared data structure. The lock/unlock calls are marked with /* Lock */ and /* Unlock */ comments so that you can see them.

struct shared {
  int nthreads;
  int debug;          
  pthread_mutex_t *lock;   /* This is the lock to protect the dllist */
  Dllist numbers;
};

/* ... */

void *find_primes(void *arg)
{
  struct info *info;
  struct shared *s;
  long prime_to_test, i;
  int prime;

  info = (struct info *) arg;
  s = info->S;

  while (1) {

    pthread_mutex_lock(s->lock);       /* Lock */
    if (dll_empty(s->numbers)) {
      if (s->debug) {
        printf("Thread %d calling read_numbers.\n", info->id);
        fflush(stdout);
      }
      if (read_numbers(s->numbers) == 0) {
        if (s->debug) {
          printf("Thread %d exiting.\n", info->id);
          fflush(stdout);
        }
        pthread_mutex_unlock(s->lock);       /* Unlock */
        return NULL;
      }
    }

    prime_to_test = s->numbers->flink->val.l;
    dll_delete_node(s->numbers->flink);
    pthread_mutex_unlock(s->lock);           /* Unlock */

    if (s->debug) {
      printf("Thread %d testing %ld\n", info->id, prime_to_test);
      fflush(stdout);
    }

    prime = 1;
    for (i = 2; prime && i*i <= prime_to_test; i++) {
      if (prime_to_test % i == 0) prime = 0;
    }

    if (prime) {
      printf("Thread %d found prime %ld\n", info->id, prime_to_test);
    }
  }
}

When we run it, it doesn't have any errors. Of course, that's not a proof that it's correct, but no matter how many times we run it, we don't get errors, so that gives us a little confidence. That is one of the unfortunate parts of thread programming: testing alone can't really ensure correctness.

UNIX> time sh -c "prime_example_3 8 n < numbers-5000.txt | wc"
     122     610    4623
165.062u 0.069s 0:21.51 767.6%	0+0k 0+0io 0pf+0w
UNIX> 
You'll also note that it runs at the same speed as prime_example_1 with eight threads (21.51 seconds versus 21.44): the running time of this program is dominated by calculating the primes, not by reading the input.

The Condition Variable - A new synchronization primitive

Now, let's suppose that you want to have one thread that is responsible for output. You may want to do this to process the output, or perhaps to send it to multiple socket connections, etc. This is a great time to introduce a second synchronization primitive called the "condition variable." Like the mutex, it has three procedures that act on it:

pthread_cond_init(pthread_cond_t *cond, NULL);
pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *lock);
pthread_cond_signal(pthread_cond_t *cond);

Like the mutex, you initialize a condition variable with pthread_cond_init(). You call pthread_cond_wait() when you want your thread to block. You call it on a condition variable and a mutex, and you must have the mutex locked when you call it. pthread_cond_wait() will atomically release the mutex and block your thread.

When a thread calls pthread_cond_signal() on a condition variable, the thread system checks to see if there are any threads that are blocked because they called pthread_cond_wait() on that condition variable. If there are none, then the pthread_cond_signal() call does nothing: the signal is not saved for later. However, if there are any threads that are blocked, then the thread system unblocks at least one of them, and that thread tries to re-lock the mutex. When it does lock the mutex, it returns from the pthread_cond_wait() call that blocked it.

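Condition variables are designed for when a thread needs to block because some condition is present in the system. Another thread will unblock it, by calling pthread_cond_signal(), when the condition is no longer present. Because pthread_cond_wait() is allowed to return without any signal (a "spurious wakeup"), the woken thread should re-check the condition after pthread_cond_wait() returns, and wait again if it still holds.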

Now, in the case of our program, we are going to have the threads stop calling printf() when they find prime numbers. Instead, when a thread finds a prime number, it is going to add it to a dllist of prime numbers, and call pthread_cond_signal() on a condition variable. We are going to have the main thread be waiting on that condition variable, and when it wakes up, it is going to print the numbers on the dllist, and delete them. In that way, the main thread is the only one performing output, and the other threads communicate with it through the dllist and the condition variable. Of course, we will protect the dllist with a mutex.

The code is in prime_example_4.c. Here's the new shared data and thread code:

struct shared {
  int nthreads;
  int debug;          
  pthread_mutex_t *input_lock;    /* This is the lock to protect the numbers dllist */
  pthread_mutex_t *output_lock;   /* This is the lock to protect the primes dllist */
  pthread_cond_t  *output_cond;   /* The output thread (which is the main thread) 
                                     blocks on this condition variable. */
  Dllist numbers;
  Dllist primes;
};

/* ... */

void *find_primes(void *arg)
{
  struct info *info;
  struct shared *s;
  long prime_to_test, i;
  int prime;

  info = (struct info *) arg;
  s = info->S;

  while (1) {

    pthread_mutex_lock(s->input_lock);
    if (dll_empty(s->numbers)) {
      if (s->debug) {
        printf("Thread %d calling read_numbers.\n", info->id);
        fflush(stdout);
      }
      if (read_numbers(s->numbers) == 0) {
        if (s->debug) {
          printf("Thread %d exiting.\n", info->id);
          fflush(stdout);
        }
        pthread_mutex_unlock(s->input_lock);
        return NULL;
      }
    }

    prime_to_test = s->numbers->flink->val.l;
    dll_delete_node(s->numbers->flink);
    pthread_mutex_unlock(s->input_lock);

    if (s->debug) {
      printf("Thread %d testing %ld\n", info->id, prime_to_test);
      fflush(stdout);
    }

    prime = 1;
    for (i = 2; prime && i*i <= prime_to_test; i++) {
      if (prime_to_test % i == 0) prime = 0;
    }

    /* When we find a prime, we now add it to the primes dllist and signal 
       the condition variable to alert the printing thread to print. */

    if (prime) {
      if (s->debug) {
        printf("Thread %d found prime %ld\n", info->id, prime_to_test);
        fflush(stdout);
      }
      pthread_mutex_lock(s->output_lock);
      dll_append(s->primes, new_jval_l(prime_to_test));
      pthread_cond_signal(s->output_cond);
      pthread_mutex_unlock(s->output_lock);
    } 
  }
}

And here's the part of the main thread that processes output:

  /* ... */
  pn = 1;
  pthread_mutex_lock(S.output_lock);
  while(1) {
    /* Print whatever is already on the list before waiting, so that primes
       added before this thread first waits are not stranded. */
    while (!dll_empty(S.primes)) {
      printf("Prime number %3d: %ld\n", pn, S.primes->flink->val.l);
      dll_delete_node(S.primes->flink);
      pn++;
    }
    pthread_cond_wait(S.output_cond, S.output_lock);
  }
  pthread_exit(NULL);
}

Think about the flow of control when a thread finds a prime number. It puts the number onto the primes dllist and signals the condition variable. That causes the main thread to wake up, print what's on the dllist and clear the dllist.

When we run this, it works, but the program never exits:

UNIX> prime_example_4 1 n < numbers-100.txt 
Prime number   1: 7075339107081019
Prime number   2: 8608832394453737
^C
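That's a drag. The reason is that the main thread is blocked in pthread_cond_wait(), and once the other threads have exited, no thread will ever signal it again. A process doesn't exit until all of its threads have exited (or some thread calls exit()), so the program hangs. I tried putting that functionality into another thread, and that didn't help, so we're going to have to fix this ourselves. Fortunately, that's not too hard.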

What I'm going to do is add another shared variable called dead_threads. When a thread dies, it will increment this value and signal the condition variable. Now, after outputting the primes dllist, the main thread checks to see if all of the threads are dead. If they are, it exits.

The code is in prime_example_6.c. Here is the code when the threads exit:

      if (read_numbers(s->numbers) == 0) {
        if (s->debug) {
          printf("Thread %d exiting.\n", info->id);
          fflush(stdout);
        }
        pthread_mutex_unlock(s->input_lock);
        
        /* When a thread dies, update "dead_threads" and signal the condition. */

        pthread_mutex_lock(s->output_lock);
        s->dead_threads++;
        pthread_cond_signal(s->output_cond);
        pthread_mutex_unlock(s->output_lock);

        return NULL;
      }
    }

And here is where the main thread checks to see if all the threads are dead, and exits:

  pn = 1;
  pthread_mutex_lock(S.output_lock);
  while(1) {
    while (!dll_empty(S.primes)) {
      printf("Prime number %3d: %ld\n", pn, S.primes->flink->val.l);
      dll_delete_node(S.primes->flink);
      pn++;
    }
    if (S.dead_threads == S.nthreads) exit(0);   /* Check to see if all the threads are dead. */

    /* Wait only after printing and checking.  A signal sent before this thread
       first waits is lost, but the primes and dead_threads it announced are not. */
    pthread_cond_wait(S.output_cond, S.output_lock);
  }

  pthread_exit(NULL);
}

Now, we run to completion -- Yay!!

UNIX> prime_example_6 1 n < numbers-100.txt
Prime number   1: 7075339107081019
Prime number   2: 8608832394453737
UNIX> time sh -c "prime_example_6 8 n < numbers-5000.txt | wc"
     122     488    4257
162.931u 0.115s 0:21.52 757.6%	0+0k 0+0io 0pf+0w
UNIX> 

Lessons Learned from this Lecture

You've gotten further illustration of race conditions and of how to help prevent them with a mutex. You've also learned how to use a condition variable to control a thread that needs to block and unblock because of certain conditions. In this example, that was a thread that processed output, and that's very similar to the thread that will broadcast output in your chat server lab!