CS360 Lecture notes -- Pipe

  • Jim Plank, modified by Jian Huang
  • Directory: ~huangj/cs360/notes/Pipe
  • Lecture notes: http://www.cs.utk.edu/~huangj/cs360/360/notes/Pipe/lecture.html

    Pipe()

    First, read the pipe man page. I've included the SunOS version in pipe.txt. It is one of the best 70 lines of text you will ever read. It tells you all you need to know in a very terse manner. It is beautiful.

    The pipe() system call gives parent-child processes a way to communicate with each other. It is called as follows:

    int pipe(int fd[2]);
    
    In other words, you pass it an array of two integers. It fills in that array with two file descriptors that can talk to each other. Anything that is written to fd[1] can be read from fd[0]. This is of no use in a single process. However, between processes, it gives a method of communication.
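
    As a minimal sketch of the call itself (this is not code from the lecture directory, and the helper name open_pipe is made up for illustration), the following creates a pipe, checks for failure, and prints the two descriptors the operating system handed back:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe and report its two descriptors.
   Returns 0 on success, -1 if pipe() failed. */
int open_pipe(int fd[2])
{
    if (pipe(fd) < 0) {
        perror("pipe");
        return -1;
    }
    /* fd[0] is the read end, fd[1] is the write end. */
    printf("read end: fd[0] = %d, write end: fd[1] = %d\n", fd[0], fd[1]);
    return 0;
}
```

    The actual descriptor numbers depend on what the process already has open, so don't rely on any particular values.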

    Look at pipe.c. This first calls pipe() to set up two file descriptors pipefd[0] and pipefd[1]. Anything written to pipefd[1] can be read from pipefd[0]. To put this another way, whenever you call write(fd, buf, size), your process sends size bytes starting at the address specified by buf to the operating system. The fd tells the operating system what to do with those bytes. Usually fd is a file descriptor returned by open() -- thus, your write() call tells the operating system to write those bytes to a file. However, there are other types of file descriptors. For example, when you say write(1, buf, size), you are saying to print those bytes to standard output, which often is not a disk file, but instead is a terminal. When fd is the write end of a pipe, the write() tells the operating system to hold those bytes in a buffer until some process requests them by performing a read() on the read end of the pipe.

    It's important to recognize that all interprocess communication must take place through the operating system. Pipes are a nice clean way for this to occur.

    Back to pipe.c. Once pipe() has been called, we can view the running process as having 5 open file descriptors: standard input (0), standard output (1), standard error (2), the read end of the pipe (pipefd[0]), and the write end of the pipe (pipefd[1]). Each of those file descriptors is a pointer to the operating system. We can visualize it as follows:

       pipe            file
    |----------|      desc-
    | code,    |     riptors
    | globals, |                  |---------|
    | heap.    |       0 <-----   |         |
    |          |       1 ----->   |operating|
    |          |       2 ----->   | system  |
    |          | pipefd[0] <---   |         |
    |          | pipefd[1] --->   |---------|
    |          |
    | stack    |
    |----------|
    
    Now, we first call: "write(pipefd[1], s2, strlen(s2));"

    This sends the string "Rex Morgan MD" to the operating system, which holds it in a buffer:

       pipe            file
    |----------|      desc-
    | code,    |     riptors
    | globals, |                  |---------|
    | heap.    |       0 <-----   |         |
    |          |       1 ----->   |operating|
    |          |       2 ----->   | system  |
    |          | pipefd[0] <---   |         |
    |          | pipefd[1] --->   |         |-> "Rex Morgan MD"
    |          |                  |---------|
    | stack    |
    |----------|
    
    Next, we call "i = read(pipefd[0], s, 1000);", which attempts to read up to 1000 bytes from the pipe. This extracts the string "Rex Morgan MD" from the OS and puts it into the variable s:
       pipe            file
    |----------|      desc-
    | code,    |     riptors
    | globals, |                  |---------|
    | heap.    |       0 <-----   |         |
    |          |       1 ----->   |operating|
    |          |       2 ----->   | system  |
    |          | pipefd[0] <---   |         |
    |          | pipefd[1] --->   |         |
    |          |                  |---------|
    | stack   s|-> "Rex Morgan MD"
    |----------|
    
    This is a very simple use of pipes, and is really next to useless. However, it shows the use of a pipe from within one process.
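
    pipe.c itself is not reproduced in these notes, so here is a rough sketch of the same write-then-read sequence within one process. The function name pipe_roundtrip and its parameters are assumptions made for illustration; the real program may be organized differently.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg into a fresh pipe and read it back
   into s within the same process. Returns the number of bytes read,
   or -1 on error. s is NUL-terminated on success. */
ssize_t pipe_roundtrip(const char *msg, char *s, size_t size)
{
    int pipefd[2];
    size_t len = strlen(msg);
    ssize_t n = -1;

    /* An empty message would leave read() blocked forever below. */
    if (size == 0 || len == 0 || pipe(pipefd) < 0) return -1;

    /* The operating system buffers these bytes. A message larger than
       the pipe's capacity would block here, since nobody is reading. */
    if (write(pipefd[1], msg, len) == (ssize_t) len) {
        n = read(pipefd[0], s, size - 1);
        if (n >= 0) s[n] = '\0';
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

    Calling pipe_roundtrip("Rex Morgan MD", s, 1000) with a 1000-byte array s should leave the string in s, just as in the pictures above.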

    The normal way to use pipe is to first create a pipe, then fork a child process. Since both the parent and the child have the same file descriptors, communication via the pipe can be easily implemented.

    A very important point to note here is that a pipe should be treated as half duplex, not full duplex: in most cases, communication through a pipe goes in one direction only. The correct way to program with a pipe is to have both the parent and the child close the end of the pipe that they do not use. If you don't do this, very interesting things happen. See below for an example.


    Now, look at pipe1.c.

    Again, after pipe() is called, the system looks like:

       pipe1           file
    |----------|      desc-
    | code,    |     riptors
    | globals, |                  |---------|
    | heap.    |       0 <-----   |         |
    |          |       1 ----->   |operating|
    |          |       2 ----->   | system  |
    |          | pipefd[0] <---   |         |
    |          | pipefd[1] --->   |---------|
    |          |
    | stack    |
    |----------|
    
    Now, when fork() is called, a new process is created which is a duplicate of the original pipe1 process. The file descriptors are also duplicated so that they are the same pointers into the operating system. The state now looks like:
    pipe1(parent)     file                                        pipe1(child)
    |----------|      desc-                                       |----------|
    | code,    |     riptors                                      | code,    |
    | globals, |                  |---------|                     | globals, |
    | heap.    |       0 <-----   |         |  -----> 0           | heap.    |
    |          |       1 ----->   |operating|  <----- 1           |          |
    |          |       2 ----->   | system  |  <----- 2           |          |
    |          | pipefd[0] <---   |         |  ---> pipefd[0]     |          |
    |          | pipefd[1] --->   |---------|  <--- pipefd[1]     |          |
    |          |                                                  |          |
    | stack    |                                                  | stack    |
    |----------|                                                  |----------|
    
    The parent process now calls fgets() to read lines of text from standard input, and writes them to the pipe. The child reads from the pipe, and prints each line on standard output, preceded by its line number. This code should seem straightforward to you. You should see that each write to pipefd[1] goes to the operating system, which passes it to the child process when the child calls read on pipefd[0].

    Try running pipe1:

    UNIX> pipe1
    How bout them Vols!
         1  How bout them Vols!
    Give him six!
         2  Give him six!
    Juice em, Big Dog, Juice em!
         3  Juice em, Big Dog, Juice em!
    < CNTL-D >
    UNIX>
    

    Looks good. Now, try doing the same thing to an output file:

    UNIX> pipe1 > output
    How bout them Vols!
    Give him six!
    Juice em, Big Dog, Juice em!
    < CNTL-D >
    UNIX> cat output
    UNIX>
    
    Hmmm. This appears to be a problem. Since fork() duplicates file descriptors we'd assume that the child process writes to output, as that is where standard output has been redirected. This is correct. The problem can be seen by doing a ps x (or ps aux | grep $USER):
    UNIX> ps aux | grep plank
    ...
    plank     6277  0.1  0.3  760  576 pts/22   S 09:40:25  0:00 grep plank
    plank     6241  0.0  0.2  684  436 pts/22   S 09:39:02  0:00 pipe1
    plank     6244  0.0  0.2  700  452 pts/22   S 09:39:23  0:00 pipe1
    ...
    UNIX>
    
    What's going on? Well, it can best be explained by the picture below. When the parent process receives CNTL-D, it closes pipefd[1], and then exits. The state of the system now looks as follows:
    pipe1(parent)                                                 pipe1(child)
                                                                  |----------|
    exited                                                        | code,    |
                                  |---------|                     | globals, |
                                  |         |  -----> 0           | heap.    |
                                  |operating|  <----- 1           |          |
                                  | system  |  <----- 2           |          |
                                  |         |  ---> pipefd[0]     |          |
                                  |---------|  <--- pipefd[1]     |          |
                                                                  |          |
                                                                  | stack    |
                                                                  |----------|
    
    Note that the child process still has pipefd[1] open. Thus, it is waiting to read from pipefd[0], and the operating system doesn't know that no process will be writing to pipefd[1]. So the child process just sits there doing nothing. There is nothing in the output file because the child is printing using printf(), which buffers the output. As the buffer isn't full, it hasn't performed the write(1, ...) yet, and thus we don't see anything in the output file.

    As the ps x command shows, there are two pipe1 processes -- one child from each of the two pipe1's that we called above. (i.e. "pipe1" and "pipe1 > output").

    Make sure you understand what has gone on here before you read further. The child process is hung reading from pipefd[0]. The read() call will not return because there is nothing to read, and since pipefd[1] is not closed, the OS cannot make the read() call return with a value of zero. Thus, the process is hung.
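
    One way to watch this rule in action without hanging a process is to put the read end into non-blocking mode, so that read() reports "I would have blocked" instead of actually blocking. Note that O_NONBLOCK is only a device for this demonstration and is an addition to the lecture material; the helper name empty_pipe_read is likewise made up.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read from an empty pipe and classify the result.
   Returns 0 if read() reported end-of-file,
           1 if read() would have blocked (a write end is still open),
          -1 on any other error. */
int empty_pipe_read(int close_write_end)
{
    int pipefd[2], result;
    char c;
    ssize_t n;

    if (pipe(pipefd) < 0) return -1;
    if (fcntl(pipefd[0], F_SETFL, O_NONBLOCK) < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (close_write_end) close(pipefd[1]);

    n = read(pipefd[0], &c, 1);
    if (n == 0)
        result = 0;                  /* no writers left: end-of-file */
    else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        result = 1;                  /* a writer exists: would block */
    else
        result = -1;

    close(pipefd[0]);
    if (!close_write_end) close(pipefd[1]);
    return result;
}
```

    With the write end still open, the call reports that it would block, which is the situation the pipe1 child is stuck in. With the write end closed, read() returns zero right away.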


    Now, kill those two child processes:
    UNIX> kill 6241 6244
    
    And look at pipe2.c. Pipe2.c has the parent close the file descriptors that it is not going to use, and it has the child close the file descriptors that it is not going to use. When the parent and child both enter their loops, the state of the system looks as follows, due to the closing of unused file descriptors:
    pipe2(parent)     file                                        pipe2(child)
    |----------|      desc-                                       |----------|
    | code,    |     riptors                                      | code,    |
    | globals, |                  |---------|                     | globals, |
    | heap.    |       0 <-----   |         |                     | heap.    |
    |          |                  |operating|  <----- 1           |          |
    |          |       2 ----->   | system  |  <----- 2           |          |
    |          |                  |         |  ---> pipefd[0]     |          |
    |          | pipefd[1] --->   |---------|                     |          |
    |          |                                                  |          |
    | stack    |                                                  | stack    |
    |----------|                                                  |----------|
    
    Now, when you type < CNTL-D >, the parent exits, leaving the system in the following state:
    pipe2(parent)                                                 pipe2(child)
                                                                  |----------|
    exited                                                        | code,    |
                                  |---------|                     | globals, |
                                  |         |                     | heap.    |
                                  |operating|  <----- 1           |          |
                                  | system  |  <----- 2           |          |
                                  |         |  ---> pipefd[0]     |          |
                                  |---------|                     |          |
                                                                  |          |
                                                                  | stack    |
                                                                  |----------|
    
    Note that the writing end of pipefd is gone completely. Thus, the operating system can have the child's read(pipefd[0], ...) return zero, and the child exits gracefully. So, when you call pipe2 as you did pipe1 before, the output file is correctly created, and there are no child processes left hanging around:
    UNIX> pipe2 > output
    How bout them Vols!
    Give him six!
    Juice em, Big Dog, Juice em!
    < CNTL-D >
    UNIX> cat output
         1  How bout them Vols!
         2  Give him six!
         3  Juice em, Big Dog, Juice em!
    UNIX>
    
    If you do a "ps x", you should see no pipe2 processes.
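
    Since pipe2.c is not included in these notes, the following is only a sketch of the pattern it is described as using. To keep it self-contained, it assumes the lines come from an array instead of standard input, and it has the child report its line count through its exit status (so counts above 255 would not survive). The function name number_lines is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for fdopen() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent writes lines into a pipe, the child
   prints them numbered on standard output. Returns the number of lines
   the child printed, or -1 on error. */
int number_lines(const char *const lines[], int nlines)
{
    int pipefd[2], status, i;
    pid_t pid;

    if (pipe(pipefd) < 0) return -1;
    fflush(stdout);              /* keep buffered output out of the child */

    pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {              /* child: the reader */
        char buf[1000];
        int count = 0;
        FILE *in;

        close(pipefd[1]);        /* otherwise read never returns zero */
        in = fdopen(pipefd[0], "r");
        if (in == NULL) _exit(255);
        while (fgets(buf, sizeof buf, in) != NULL)
            printf("%6d  %s", ++count, buf);
        fflush(stdout);          /* printf() buffers; push it out now */
        _exit(count);
    }

    /* parent: the writer */
    close(pipefd[0]);            /* the parent never reads */
    for (i = 0; i < nlines; i++) {
        if (write(pipefd[1], lines[i], strlen(lines[i])) < 0) break;
    }
    close(pipefd[1]);            /* lets the child's read return zero */

    if (wait(&status) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

    If you delete the close(pipefd[1]) in the child, the child holds its own write end open and you are back to the hung process from pipe1.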

    SIGPIPE

    As you see from the program above, when you try to read from a pipe that has no write end, the read() returns 0. When you try to write to a pipe that has no read end, a SIGPIPE signal is generated. If the signal isn't handled, the program exits silently. This is nice because it means that if, for example, you execute:
    UNIX> cat exec1.c | head -5 | tail -1
    
    and you kill the middle process (the head one), the other two will exit automatically -- tail will have read() return 0, and will exit, and cat will try to write to a pipe that has no read end, and thus will generate SIGPIPE and exit.
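
    To see the default behavior directly, here is a small sketch in which a child writes to a pipe that has no read end, and the parent checks how the child died. The function name sigpipe_kills_writer is invented for this example, and the child resets SIGPIPE to its default action in case the caller had changed it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child writing to a pipe with no
   read end was terminated by SIGPIPE, 0 if it ended some other way,
   -1 on setup error. */
int sigpipe_kills_writer(void)
{
    int pipefd[2], status;
    pid_t pid;

    if (pipe(pipefd) < 0) return -1;
    close(pipefd[0]);            /* before fork: nobody has the read end */

    pid = fork();
    if (pid < 0) {
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {              /* child: the doomed writer */
        signal(SIGPIPE, SIG_DFL);
        if (write(pipefd[1], "x", 1) < 0) _exit(1);
        _exit(0);                /* reached only if no SIGPIPE arrived */
    }

    close(pipefd[1]);
    if (wait(&status) < 0) return -1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE) ? 1 : 0;
}
```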

    Look at pipe3.c.

    It does the same thing as the others, but catches SIGPIPE. (If signals are unknown to you, read Chapter 10 in the book and the signal man page. We will have a lecture on signals later in the class.) So, to test it, run pipe3:

    UNIX> pipe3
    Juice em, Big Dog, Juice em!
         1  Juice em, Big Dog, Juice em!
    
    Then, in another window, kill the child process -- it will be the one with the higher pid:
    UNIX> ps aux | grep plank
    ...
    plank     7064  0.1  0.2  684  452 pts/22   S 09:44:24  0:00 pipe3
    plank     7065  0.0  0.2  684  304 pts/22   S 09:44:24  0:00 pipe3
    ...
    UNIX> kill 7065
    UNIX>
    
    You'll see nothing happen in the pipe3 window, but the child is gone. (Type "ps aux | grep $USER" again to make sure). This means that there is no process that has pipefd[0] open. Thus, if you type into the pipe3 process:
    Give Him Six!
    15454: caught a SIGPIPE
    UNIX>
    
    The write() to pipefd[1] generates SIGPIPE.
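
    pipe3.c is not shown here, so the following is just a sketch of catching SIGPIPE, under a couple of assumptions: it uses sigaction() to install the handler, and the handler only sets a flag instead of printing a message the way pipe3 evidently does. The names on_sigpipe and write_without_reader are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction() */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigpipe = 0;

/* Handler: just record that the signal arrived. */
static void on_sigpipe(int signo)
{
    (void) signo;
    got_sigpipe = 1;
}

/* Hypothetical helper: write to a pipe whose read end is closed.
   Returns 1 if SIGPIPE was caught and write() failed with EPIPE,
   0 if that did not happen, -1 on setup error. */
int write_without_reader(void)
{
    int pipefd[2], saved_errno;
    struct sigaction sa, old;
    ssize_t n;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, &old) < 0) return -1;

    if (pipe(pipefd) < 0) {
        sigaction(SIGPIPE, &old, NULL);
        return -1;
    }

    got_sigpipe = 0;
    close(pipefd[0]);                         /* no read end remains */
    n = write(pipefd[1], "Give Him Six!\n", 14);
    saved_errno = errno;

    close(pipefd[1]);
    sigaction(SIGPIPE, &old, NULL);           /* restore old handler */
    return (n < 0 && saved_errno == EPIPE && got_sigpipe) ? 1 : 0;
}
```

    Because the handler returns instead of exiting, the write() comes back with -1 and errno set to EPIPE, which gives the program a second way to notice the problem.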

    This should give you enough info to do the next lab. (Actually, you won't need SIGPIPE in the Jsh lab. That will come up later). To help you out, look at the program headsort.c -- this implements:

    "head -10 headsort.c | sort"
    
    and waits for it to finish. If we don't get to it in class, go over the code yourself. You will find it very helpful for the Jsh lab.
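
    For reference while reading headsort.c, here is a sketch of how such a two-command pipeline can be wired up. It is not the code from the lecture directory: the function name head_sort, the path parameter, and the choice of execlp() and dup2() are all assumptions. It uses wait() and not waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "head -10 path | sort" and wait for both
   commands. Returns 0 if both exited with status 0, -1 otherwise. */
int head_sort(const char *path)
{
    int pipefd[2], status, i, ok = 1;
    pid_t pid;

    if (pipe(pipefd) < 0) return -1;

    pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {                      /* first child becomes head */
        if (dup2(pipefd[1], 1) < 0) _exit(127);   /* stdout -> pipe */
        close(pipefd[0]);
        close(pipefd[1]);
        execlp("head", "head", "-10", path, (char *) NULL);
        perror("head");
        _exit(127);
    }

    pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        wait(&status);                   /* reap the first child */
        return -1;
    }
    if (pid == 0) {                      /* second child becomes sort */
        if (dup2(pipefd[0], 0) < 0) _exit(127);   /* stdin <- pipe */
        close(pipefd[0]);
        close(pipefd[1]);
        execlp("sort", "sort", (char *) NULL);
        perror("sort");
        _exit(127);
    }

    /* The parent must close both ends, or sort never sees end-of-file. */
    close(pipefd[0]);
    close(pipefd[1]);
    for (i = 0; i < 2; i++) {
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    return ok ? 0 : -1;
}
```

    Notice that three processes hold copies of the pipe descriptors here, and every copy that isn't needed has to be closed.
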

    Since a common operation is to create a pipe to another process, to either read its output or write to its input, the standard Unix I/O library (note, not ANSI C standard) provides the popen and pclose functions. These two functions handle the following steps:
           creating a pipe
           forking a child
           closing the unused ends of the pipe
           exec'ing a shell to execute a command
           waiting for the command to terminate
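
    Just so you can recognize them when you see them, here is a small sketch of popen and pclose in use, reading the output of a shell command. The function name count_output_lines is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run cmd through the shell and count the lines
   it writes to standard output. Returns the count, or -1 on error. */
int count_output_lines(const char *cmd)
{
    char buf[1000];
    int count = 0;
    FILE *p;

    p = popen(cmd, "r");         /* creates the pipe, forks, execs sh */
    if (p == NULL) return -1;

    while (fgets(buf, sizeof buf, p) != NULL) {
        if (strchr(buf, '\n') != NULL) count++;
    }

    /* pclose() closes the stream and waits for the command to end. */
    if (pclose(p) == -1) return -1;
    return count;
}
```

    Remember that this is for illustration only: the lab assignments in this class do not allow popen or pclose.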
    

    Of course, popen and pclose have extensive error checking. For your lab assignments, you are not allowed to use either popen or pclose. However, if you are ever curious about how many errors one should expect, you can consult the sample code in Chapter 14. Note that you are not allowed to use waitpid either, so don't just blindly follow that sample code.