CS460 Lecture notes -- Threads Lecture #1

Directory: /mahogany/homes/plank/cs460/notes/Thread1

Lecture notes -- html: file:/mahogany/homes/plank/cs460/notes/Thread1/lecture.html

Threads are covered in chapter 4 of the book. This semester, I am going to cover threads before anything else, because it will make your later programming much easier. When the time comes to read about threads in the book, you should be more or less experts about them.

Why threads?

There are 4 reasons why people program with threads:

It's fun/logical to program that way -- i.e. it might make the task of programming more efficient, even if it doesn't have any impact on the performance of the program
Multiprocessing on a uniprocessor (needs threaded OS)
Working on a shared-memory multiprocessor
Simplifies OS writing

Each of these assumes a different platform for the threads, and has different assumptions. Sometimes this gets blurred in the book, so I'll try to help keep this unblurred.

What are threads?

Threads are often called "lightweight processes". A typical process in Unix consists of CPU state (i.e. registers), memory (code, globals, heap and stack), and OS info (such as open files, a process ID, etc.). In a thread system there is instead a larger entity, called a "task", or sometimes a "pod".

The task consists of memory (code, globals, heap), OS info, and threads. Each thread is a unit of execution, which consists of a stack and CPU state (i.e. registers). Multiple threads resemble multiple processes, except that multiple threads within a task use the same code, globals and heap. Thus, while two processes in Unix can only communicate through the operating system (e.g. through files, pipes, or sockets), two threads in a task can communicate through memory.

There are various primitives that a thread system must provide. Let's start with three basic ones. In this initial discussion, I am talking about a generic thread system. We'll talk about specific ones (such as the Sun lightweight process library) later.

 tcb thread_fork(procedure, arguments);

This says to create a new thread which runs the given procedure with the given arguments. Sometimes the arguments are omitted, and sometimes only one argument (a (void *)) is allowed. It returns a pointer to the new thread (which I'll call a thread control block).

 thread_join(tcb)

This says to wait for the thread represented by tcb to finish executing. Often thread_join() returns an integer or a (void *) as the thread's exit value. You can think of thread_join() as analogous to wait() in Unix: it waits for the specified thread to complete, and gathers information about the thread's exit status.

 thread_yield()

In the thread system, some threads will be blocked (e.g. those calling thread_join()) and some will be ready to run. The ready ones will be on a ready queue. Thread_yield() says to put the currently running thread on the ready queue and run some other thread on the ready queue.

The Sun Lightweight Process Library

On our Sparcs, there is a thread system that you can use. It is the ``lwp'' library, which stands for ``lightweight process'' library. To use the lwp library, first you have to make the following #include directives:

#include <lwp/lwp.h>
#include <lwp/check.h>
#include <lwp/lwpmachdep.h>
#include <lwp/stackdep.h>

You also have to link liblwp.a with your object files. For example, if your program is in main.c, you need to do the following to build your threaded executable:

UNIX> cc -c main.c
UNIX> cc -o main main.o -llwp

There's a lot of junk in the lwp library. You can read about it in the various man pages. Start with ``man 3l intro''. The three basic primitives described above are:

int lwp_create(tid, func, prio, flags, stack, nargs, arg1, ..., argn)
thread_t *tid;
void (*func)();
int prio;
int flags;
stkalign_t *stack;
int nargs;
int arg1, ..., argn;

int lwp_join(tid)
thread_t tid;
    
int lwp_yield(tid)
thread_t tid;

Yuck. lwp_create() is not too simple. You give it a function func and arguments (defined with nargs and arg1, ..., argn), but you must also give it a priority, some flags, and a stack for it to use. A priority of 1 and flags of 0 work fine. To allocate a stack, there are procedures to help you out:

Call lwp_setstkcache(MINSTACKSZ*sizeof(stkalign_t), 4) at the beginning of your main() routine. This initializes 4 stacks and saves them. When you want a stack, you call

  stkalign_t *s;

  s = lwp_newstk();

Then you can use that stack for the lwp_create() call. Finally, lwp_create() puts the tcb info into the tid variable. This is what you use for lwp_join(). So, for example, suppose you want to fork off one thread that prints ``Hello world''. The following program will do it. (This is in hw.c)

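#include <lwp/lwp.h>
#include <lwp/check.h>
#include <lwp/lwpmachdep.h>
#include <lwp/stackdep.h>
#include <stdio.h>

/* Body of the forked thread. */
void printme()
{
  printf("Hello world\n");
}

main()
{
  stkalign_t *s;   /* stack for the new thread */
  thread_t t;      /* filled in by lwp_create(), used by lwp_join() */

  lwp_setstkcache(MINSTACKSZ*sizeof(stkalign_t), 1);   /* set up a cache of 1 stack */
  s = lwp_newstk();                                    /* get a stack for the thread */
  lwp_create(&t, printme, 1, 0, s, 0);                 /* priority 1, flags 0, no arguments */
  lwp_join(t);                                         /* wait for printme() to finish */
}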

Try copying hw.c to your home area, compiling it, and running it. It should print out ``Hello world''.

Forking multiple threads

Now, look at print4.c. This forks off 4 threads that print out ``Hi. I'm thread n'', where n goes from zero to 3. This should give you a good idea of how the lwp library works. Feel free to play with this library to get a feeling for how a thread system works. Since Unix is not multithreaded, and since your machines are not multiprocessors, the threads don't get you any extra performance. It just lets you play with threads.

Threads in the lwp library are non-preemptive. Thus, a thread does not relinquish control of the CPU unless it blocks or calls lwp_yield(). This means that you don't have to worry about synchronization the way you would if you were running on a multiprocessor or on a thread system that preemptively reschedules threads. In fact, you should be able to trace exactly how the print4 program will run.

Producer/consumer

In a producer/consumer program, you have one thread (or many) producing items that get put into a buffer or queue, and one (or many) threads consuming items from the queue. For example, look at the following program (in seqpc.c):

main()
{
  struct inputstruct *input;
  struct outputstruct *output;
  
  while(1) {
    input = read_input();
    if (input == NULL) exit(0);
    output = process_input(input);
    print_output(output);
  }
}

It's not really important what read_input(), process_input() and print_output() really do, except that they read input, process the input and make output, and print the output respectively. I've defined them as follows in stuff.c:

read_input() reads in a line from stdin, and stores it in a struct inputstruct.
process_input() takes a struct inputstruct as an argument, and assumes that it holds a character string containing four integers. It adds those four integers, then frees the inputstruct. It allocates an outputstruct, places the sum into the outputstruct, and finally returns the outputstruct.
print_output() prints out this sum.

Try out seqpc:

UNIX> seqpc
1 2 3 4
Sum: 10
1 1 1 1
Sum: 4
1 1 1 1 1
Sum: 4
1
Sum: 1
a b c d
Sum: 0

UNIX>

Now, a standard way to turn seqpc.c into a threaded program is to have separate threads do the producing (reading input and processing it) and the consuming (printing the output). Look at the program threadpc1.c. It does just this with three threads. The first one forks off a producer thread and a consumer thread. Then the producer thread reads inputstructs, processes them, and puts the outputstructs into a dlist (doubly linked list). The consumer thread takes outputstructs from the dlist and prints them out.

(In threadpc1.c, SELF is a thread_t that refers to the calling thread.)

Try copying this and running it:

UNIX> threadpc1
1 2 3 4

Nothing happens. Hmmm. Type CNTL-D though:

UNIX> threadpc1
1 2 3 4
<CNTL-D>
Sum: 10
UNIX>

What's going on? Well, in read_input(), the thread blocks on the read() call (inside gets()). Since Unix doesn't support threaded system calls, this blocks the whole process (or task). So, even though the consumer thread is ready to run, it doesn't, because the blocked producer thread never gives up the CPU. Were we working on a true multiprocessor or on an operating system that actually supports threads (rather than a user-level library like this one), the consumer thread could run even though the producer thread is blocked, so this wouldn't be a problem. There may be other problems, though, which I'll talk about later.

Look at threadpc2.c. It puts a lwp_yield() call after the dl_insert_b() call, which lets the consumer thread run.

Now you get:

UNIX> threadpc2
1 2 3 4
Sum: 10
0 0 0 2
Sum: 2

UNIX>

This is an important first lesson about threads. You have to know the parameters of your thread system. In particular:

Are threads preemptive? In other words, can a thread be interrupted in the middle of its execution so that another thread can run?
Is blocking a problem? In the Sun lightweight process library, if one thread makes a blocking system call (like read()), then the entire task blocks. In most threaded systems (such as multiprocessors or operating systems such as Solaris), blocking is not a problem. In fact, one of the main advantages of threads is on systems where one thread may block on a system call, but others may run. This presents an excellent programming interface, which we will talk about later.