Take a look at src/race1.c. Its explanation is inline with the code:
```c
/* This program forks off two threads which share an integer,
   on which there is a race condition. */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* for sleep() */
#include <pthread.h>

/* This is information shared by the two threads. */

typedef struct {
  int i;
  int die;
} Shared_Info;

/* This is information which will be unique to each thread
   (SI is a pointer to shared data) */

typedef struct {
  int id;
  Shared_Info *SI;
} Info;

/* Here's the thread code -- pretty simple. */

void *thread(void *x)
{
  Info *I;

  I = (Info *) x;
  while (!I->SI->die) I->SI->i = I->id;
  return NULL;
}

/* The main code sets up the shared and unique info, then forks off
   two threads.  It then sleeps for two seconds and prints out the
   shared variable, si.i.  Finally, it calls pthread_join() to wait
   for the two threads to die, and prints out the shared variable again. */

int main(int argc, char **argv)
{
  pthread_t tids[2];
  Shared_Info si;
  Info I[2];
  void *retval;

  /* Set up the data to send to the threads. */

  I[0].id = 0;
  I[0].SI = &si;
  I[1].id = 1;
  I[1].SI = &si;
  si.die = 0;

  /* Create the two threads and sleep */

  if (pthread_create(tids, NULL, thread, I) != 0) { perror("pthread_create"); exit(1); }
  if (pthread_create(tids+1, NULL, thread, I+1) != 0) { perror("pthread_create"); exit(1); }
  sleep(2);

  /* Tell the threads to die, then print the shared info. */

  si.die = 1;
  printf("%d\n", si.i);

  /* Wait for the threads to die and print out the shared info again. */

  if (pthread_join(tids[0], &retval) != 0) { perror("pthread_join"); exit(1); }
  if (pthread_join(tids[1], &retval) != 0) { perror("pthread_join"); exit(1); }
  printf("%d\n", si.i);
  return 0;
}
```
OK -- this program forks off two threads. Each thread has its own Info struct, which contains an id unique to the thread (either 0 or 1). The Info struct also has a pointer to a Shared_Info struct, which is shared between the two threads. The Shared_Info struct has two variables: i, which each thread repeatedly overwrites with its id, and die, which each thread checks. When die becomes 1, the threads exit.
Ask yourself the following question: Where are the Info and Shared_Info structs stored? Heap or stack? If stack, whose stack?
The answer is that they are stored on the main thread's stack. There is no restriction on where threads can access memory -- a pointer is a pointer, and if it points to another thread's stack, so be it!
It should be pretty clear that race1 has a race condition. The two threads are wantonly overwriting I->SI->i without any synchronization. If I asked you what the output of this program will be, you would have to say that you don't really know. It can be one of four things:
```
UNIX> bin/race1
0
1
UNIX> bin/race1
1
0
UNIX> bin/race1
0
0
UNIX> bin/race1
1
1
UNIX>
```

The shell script scripts/r1.sh runs it 100 times and puts the output of each run on a single line. Each run sleeps for two seconds, so the script takes about 200 seconds. When it finishes, you can see that all four outputs have occurred:
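```
UNIX> sh scripts/r1.sh > r1-output.txt
UNIX> grep '0 0' r1-output.txt | wc
26 52 104
UNIX> grep '0 1' r1-output.txt | wc
55 110 220
UNIX> grep '1 0' r1-output.txt | wc
10 20 40
UNIX> grep '1 1' r1-output.txt | wc
9 18 36
UNIX>
```

The first number from wc is the line count. The four outputs therefore occurred 26, 55, 10 and 9 times out of 100 runs.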
This is most definitely a race condition. Is it a bad one? Not really, because this program doesn't do anything but demonstrate a race condition. Let's look at a more complex race condition:
```
UNIX> bin/race2 4 4 1
Thread 0: AAA
Thread 1: BBB
Thread 2: CCC
Thread 3: DDD
UNIX>
```

Similarly, the following make sense:
```
UNIX> bin/race2 4 4 2
Thread 0: AAA
Thread 0: AAA
Thread 1: BBB
Thread 1: BBB
Thread 2: CCC
Thread 2: CCC
Thread 3: DDD
Thread 3: DDD
UNIX> bin/race2 4 30 2
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDD
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDD
UNIX>
```

Unfortunately, that output is not guaranteed. The reason is that threads can be preempted anywhere. In particular, they may be preempted in the middle of the for() loop, or in the middle of the printf() statement. This can lead to strange output. For example, try the following:
```
UNIX> bin/race2 2 70 200000 | grep 'AB'
```

This searches for output lines where an 'A' is followed by a 'B'. When I ran this, I got:
```
UNIX> bin/race2 2 70 200000 | grep 'AB'
Thread 0: AAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 1: AAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
UNIX>
```

This shows two instances where thread 0 was interrupted by thread 1, which had been interrupted in the middle of its for loop. When thread 1 resumed, it overwrote the string with B's.
So, this program too has a race condition, this time with the shared array s. The output above is particularly confusing, which is often what happens with race conditions.
When you program with threads, you must pay attention to shared memory. If more than one thread can modify the shared memory, then you often need to protect it so that weird things do not happen to it.
In our race2 program, we can "fix" the race condition by making sure that while one thread is modifying and printing s, no other thread can modify s. This can be done with a mutex, sometimes called a "lock" or sometimes a "binary semaphore." There are three procedures for dealing with mutexes in pthreads:
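```c
int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
```

In these notes we always pass NULL for attr, which gives a mutex with the default attributes.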
All three return 0 on success and a nonzero error number if they fail. Note that they return the error number directly rather than setting errno.
You create a mutex with pthread_mutex_init(). You have to have allocated memory for it ahead of time (i.e., pthread_mutex_init() does not call malloc()). Then any thread may lock the mutex, and the thread that locked it is the one that should unlock it. When a thread locks the mutex, no other thread may lock it until it is unlocked. If a thread calls pthread_mutex_lock() while the mutex is locked, then the thread will block until the mutex is unlocked. Only one thread may hold the lock at a time.
So, we "fix" the program with src/race3.c. You'll notice that a thread locks the mutex just before modifying s and it unlocks the mutex just after printing s. This fixes the program so that the output makes sense:
```
UNIX> bin/race3 4 4 1
Thread 0: AAA
Thread 1: BBB
Thread 2: CCC
Thread 3: DDD
UNIX> bin/race3 4 4 2
Thread 0: AAA
Thread 0: AAA
Thread 2: CCC
Thread 2: CCC
Thread 1: BBB
Thread 1: BBB
Thread 3: DDD
Thread 3: DDD
UNIX> bin/race3 4 70 1
Thread 0: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 1: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Thread 2: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Thread 3: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
UNIX> bin/race3 2 70 100000 | grep AB
UNIX>
```

This last call will never have any output because of the mutex.
Our next example program is src/mutex_example_1.c. It forks off multiple threads, which share a counter and a mutex. Each thread does the following in a loop:

- locks the mutex;
- updates the counter;
- sleeps for a bit;
- checks that the counter has not been altered while it was asleep;
- unlocks the mutex.

The properties of the mutex guarantee that this check succeeds. Here's the code:
```c
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>

/* Each thread is going to have private information and shared information.
   Here is the shared information. */

struct shared {
  int counter;              /* A shared counter. */
  pthread_mutex_t *lock;    /* A lock to protect the counter. */
  int usleeptime;           /* Microseconds that each thread will sleep after updating the counter. */
};

/* Here is the private information */

struct info {
  int id;                   /* The thread's id. */
  struct shared *s;         /* Pointer to the shared information. */
};

/* Here is the procedure that each thread calls.  In a nutshell, each thread
   locks the mutex, increments the counter, then sleeps.  It then tests to make
   sure that the counter hasn't been modified, and unlocks the mutex.
   It repeats this loop indefinitely. */

void *share_counter(void *arg)
{
  struct info *info;        /* The thread's private info. */
  struct shared *s;         /* The thread's shared info. */
  int counter;              /* A copy of the counter, to test. */

  info = (struct info *) arg;
  s = info->s;

  while (1) {

    /* Lock the mutex, update the counter and print. */

    pthread_mutex_lock(s->lock);
    s->counter++;
    counter = s->counter;
    printf("Thread: %3d - Begin - Counter %3d.\n", info->id, s->counter);
    fflush(stdout);

    /* Sleep, and then print the counter again. */

    usleep(s->usleeptime);
    printf("Thread: %3d - End - Counter %3d.\n", info->id, s->counter);
    fflush(stdout);

    /* Make sure the counter hasn't been modified, then unlock the mutex. */

    if (s->counter != counter) {
      printf("Thread %d - Problem -- counter was %d, but now it's %d\n",
             info->id, counter, s->counter);
      exit(1);
    }
    pthread_mutex_unlock(s->lock);
  }
  return NULL;  /* Shut the compiler up. */
}

/* The main sets up the threads, and then exits with pthread_exit(),
   which leaves the other threads running. */

int main(int argc, char **argv)
{
  int nthreads;
  int usleeptime;
  pthread_t *tids;
  static struct shared S;   /* static, so it outlives main()'s call to pthread_exit(). */
  struct info *infos;
  int i;

  if (argc != 3) {
    fprintf(stderr, "usage: mutex_example nthreads usleep_time\n");
    exit(1);
  }

  nthreads = atoi(argv[1]);
  usleeptime = atoi(argv[2]);

  tids = (pthread_t *) malloc(sizeof(pthread_t) * nthreads);
  infos = (struct info *) malloc(sizeof(struct info) * nthreads);

  for (i = 0; i < nthreads; i++) {
    infos[i].id = i;
    infos[i].s = &S;
  }

  S.counter = 0;
  S.usleeptime = usleeptime;
  S.lock = (pthread_mutex_t *) malloc(sizeof(pthread_mutex_t));
  pthread_mutex_init(S.lock, NULL);

  for (i = 0; i < nthreads; i++) {
    pthread_create(tids+i, NULL, share_counter, (void *) &infos[i]);
  }

  pthread_exit(NULL);
}
```
You call this with the number of threads, and the number of microseconds that each thread sleeps. Let's call it with 4 threads and 10,000 microseconds. You'll see that it works as anticipated -- the threads line up on the mutex, and each time a thread unlocks the mutex, another thread grabs it and updates the counter:
```
UNIX> make bin/mutex_example_1
gcc -o bin/mutex_example_1 src/mutex_example_1.c -lpthread
UNIX> bin/mutex_example_1 4 10000 | head -n 20
Thread:   0 - Begin - Counter   1.
Thread:   0 - End - Counter   1.
Thread:   1 - Begin - Counter   2.
Thread:   1 - End - Counter   2.
Thread:   2 - Begin - Counter   3.
Thread:   2 - End - Counter   3.
Thread:   3 - Begin - Counter   4.
Thread:   3 - End - Counter   4.
Thread:   0 - Begin - Counter   5.
Thread:   0 - End - Counter   5.
Thread:   1 - Begin - Counter   6.
Thread:   1 - End - Counter   6.
Thread:   2 - Begin - Counter   7.
Thread:   2 - End - Counter   7.
Thread:   3 - Begin - Counter   8.
Thread:   3 - End - Counter   8.
Thread:   0 - Begin - Counter   9.
Thread:   0 - End - Counter   9.
Thread:   1 - Begin - Counter  10.
Thread:   1 - End - Counter  10.
UNIX>
```
Here is the output of src/mutex_example_2.c:

```
UNIX> make bin/mutex_example_2
gcc -o bin/mutex_example_2 src/mutex_example_2.c -lpthread
UNIX> bin/mutex_example_2 4 10000 | head -n 20
Thread:   0 - Begin - Counter   1.
Thread:   1 - Begin - Counter   2.
Thread:   2 - Begin - Counter   3.
Thread:   3 - Begin - Counter   4.
Thread:   0 - End - Counter   4.
Thread 0 - Problem -- counter was 1, but now it's 4
Thread:   1 - End - Counter   4.
Thread:   2 - End - Counter   4.
Thread:   3 - End - Counter   4.
UNIX>
```

What you see here is that all of the threads update the counter, with thread 0 updating it first. When thread 0 wakes up, the counter has been changed. Interestingly, threads 1, 2 and 3 wake up and print their counters between the time that thread 0 prints its error statement and its exit(1) call. Then the exit(1) call kills the process.
This program is non-deterministic -- its output depends on how the system schedules the threads. Here's a second call, whose output is quite different:
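```
UNIX> bin/mutex_example_2 4 10000 | head -n 20
Thread:   3 - Begin - Counter   3.
Thread:   1 - Begin - Counter   1.
Thread:   0 - Begin - Counter   1.
Thread:   2 - Begin - Counter   2.
Thread:   3 - End - Counter   3.
Thread:   1 - End - Counter   3.
Thread:   0 - End - Counter   3.
Thread:   2 - End - Counter   3.
Thread:   3 - Begin - Counter   4.
Thread 1 - Problem -- counter was 1, but now it's 4
Thread 0 - Problem -- counter was 1, but now it's 4
Thread 2 - Problem -- counter was 2, but now it's 4
UNIX>
```

There are two really interesting things here.

First, threads 1 and 0 both print "Begin - Counter 1", and both later report that the counter was 1. Both of their increments produced 1, so one of the first four increments was lost, and the counter reaches only 3.

Second, the lines are not printed in the order in which the counter was updated. Thread 3 printed first, even though its value of 3 means that it incremented the counter after the other three threads had.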
I want to point out here that pthread_mutex_lock() does not actively "lock" other threads. Instead, it locks a data structure, which can be shared among the threads. The locking and unlocking of the data structure makes synchronization guarantees, which are very important for avoiding race conditions. However, I don't want you to get into the habit of thinking that pthread_mutex_lock() actively blocks other threads, or "locks them out." It doesn't -- it locks a data structure, and when other threads try to lock the same data structure, they block. Please reread this paragraph.
To illustrate this, src/mutex_example_3.c is exactly the same as src/mutex_example_1.c, except that the last thread does not lock or unlock the mutex. All of the others do. Take a look at the output:
```
UNIX> bin/mutex_example_3 4 10000 | head -n 20
Thread:   1 - Begin - Counter   1.
Thread:   3 - Begin - Counter   2.
Thread:   1 - End - Counter   2.
Thread:   3 - End - Counter   2.
Thread 1 - Problem -- counter was 1, but now it's 2
Thread:   3 - Begin - Counter   3.
UNIX>
```

Thread 1 locks the mutex, which means that threads 0 and 2 block as they try to lock the mutex. However, since thread 3 does not call pthread_mutex_lock(), it goes ahead and updates the counter. Thread 1 discovers this after it wakes up, and flags the error. As you can see, just because thread 1 locks the mutex, that doesn't mean that it "locks out" all of the other threads. It only works cooperatively with the other threads that try to lock the same mutex.