CS360 Lecture notes -- Pipe

James S. Plank
Directory: /home/plank/cs360/notes/Pipe
Lecture notes: http://web.eecs.utk.edu/~jplank/plank/classes/cs360/360/notes/Pipe/lecture.html
Original Lecture Notes: Mid 1990's.
Most recent modification: Wed Mar 24 15:16:44 EDT 2021

Pipe()

First, read the pipe man page. I've included the SunOS version (from the 1990's) in pipe.txt. It is one of the best 70 lines of text you will ever read. It tells you all you need to know in a very terse manner. It is beautiful.

The pipe() system call gives parent-child processes a way to communicate with each other. It is called as follows:

int pipe(int fd[2]);

In other words, you pass it an array of two integers, and it fills in that array with two file descriptors that can talk to each other: anything written to fd[1] may be read from fd[0]. pipe() returns 0 on success and -1 on error. By itself, this is of no use in a single process. Between processes, however, it gives a method of communication.

Nonetheless, my first program will just be in a single process. Look at src/pipe0.c:

/* pipe0.c - Create a pipe in the current process, write to it and read from it. */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  int pipefd[2];
  int i;
  char s[1000];
  char *s2;

  /* Create the pipe. */

  if (pipe(pipefd) < 0) {
    perror("pipe");
    exit(1);
  }

  /* Write an 11-byte string to it.  This gets stored in the operating system. */

  s2 = "James Plank";
  write(pipefd[1], s2, strlen(s2));

  /* Now read the string from the pipe.  Even though we ask for 1000 bytes,
     it simply returns the 11 bytes that are in the pipe. */ 

  i = read(pipefd[0], s, 1000);
  if (i < 0) {
    perror("read");
    exit(1);
  }
  s[i] = '\0';

  printf("Read %d bytes from the pipe: '%s'\n", i, s);
  return 0;
}

This first calls pipe() to set up two file descriptors, pipefd[0] and pipefd[1]. Anything written to pipefd[1] can be read from pipefd[0]. To put this another way, whenever you call write(fd, buf, size), your process sends size bytes, starting at the address buf, to the operating system. The fd tells the operating system what to do with those bytes. Usually fd is a file descriptor returned by open(), so your write() call tells the operating system to write those bytes to a file. However, there are other types of file descriptors. For example, when you call write(1, buf, size), you are saying to print those bytes to standard output, which often is not a disk file but a terminal. When fd is the write end of a pipe, the write() tells the operating system to hold those bytes in a buffer until some process requests them by calling read() on the read end of the pipe.
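To see that buffering in isolation, here is a small sketch. The function name drain_in_chunks() and its parameters are invented for illustration. It writes a string into a fresh pipe, closes the write end, and then reads the bytes back a few at a time. It assumes the message is short enough to fit in the pipe's buffer, since nothing is reading while the write happens.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write msg into a new pipe, then drain the pipe with
   read() calls of at most `chunk` bytes each.  The bytes read are stored,
   null-terminated, in out (at most outsz - 1 of them).
   Returns the number of read() calls that returned data, or -1 on error. */
int drain_in_chunks(const char *msg, size_t chunk, char *out, size_t outsz)
{
  int pipefd[2];
  size_t len = strlen(msg);
  size_t total = 0;
  int calls = 0;
  ssize_t n;

  if (chunk == 0 || outsz == 0 || pipe(pipefd) < 0) return -1;

  /* The operating system holds these bytes until someone reads them.
     A short message fits in the pipe's buffer, so this write does not block. */
  if (write(pipefd[1], msg, len) != (ssize_t) len) {
    close(pipefd[0]);
    close(pipefd[1]);
    return -1;
  }
  close(pipefd[1]);   /* no writers remain, so read() returns 0 once the pipe is empty */

  while (total < outsz - 1) {
    size_t want = outsz - 1 - total;
    if (want > chunk) want = chunk;
    n = read(pipefd[0], out + total, want);
    if (n < 0) {
      close(pipefd[0]);
      return -1;
    }
    if (n == 0) break;  /* end of file: pipe is empty and has no write end */
    total += (size_t) n;
    calls++;
  }
  out[total] = '\0';
  close(pipefd[0]);
  return calls;
}
```

With "James Plank" and a chunk of 4, the bytes come back in order as reads of 4, 4 and 3 bytes, so the function returns 3. With a chunk of 1000, a single read() returns all 11 bytes, just as in pipe0.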

It's important to recognize that all interprocess communication must take place through the operating system. Pipes are a nice clean way for this to occur.

Back to src/pipe0.c. After the pipe() system call returns in pipe0, we can view the running process as having 5 open file descriptors: standard input (0), standard output (1), standard error (2), the read end of the pipe (pipefd[0]), and the write end of the pipe (pipefd[1]). Each of those file descriptors is a pointer to the operating system. We can visualize it as follows:

   pipe0        
|----------|      file  
| code,    |   descriptors
| globals, |                  |---------|
| heap.    |       0 <-----   |         |
|          |       1 ----->   |operating|
|          |       2 ----->   | system  |
|          | pipefd[0] <---   |         |
|          | pipefd[1] --->   |---------|
|          |
| stack    |
|----------|

Now, we first call: "write(pipefd[1], s2, strlen(s2));"

This sends the string "James Plank" to the operating system, which holds it in a buffer:

   pipe0           
|----------|      file 
| code,    |   descriptors
| globals, |                  |---------|
| heap.    |       0 <-----   |         |
|          |       1 ----->   |operating|
|          |       2 ----->   | system  |
|          | pipefd[0] <---   |         |
|          | pipefd[1] --->   |         |-> "James Plank"
|          |                  |---------|
| stack    |
|----------|

Next, we call "i = read(pipefd[0], s, 1000);", which attempts to read up to 1000 bytes from the pipe. This extracts the string "James Plank" from the OS and puts it into the variable s:

   pipe0           
|----------|      file 
| code,    |   descriptors
| globals, |                  |---------|
| heap.    |       0 <-----   |         |
|          |       1 ----->   |operating|
|          |       2 ----->   | system  |
|          | pipefd[0] <---   |         |
|          | pipefd[1] --->   |         |
|          |                  |---------|
| stack   s|-> "James Plank"
|----------|

This is a very simple use of pipes, and is not really something that you would ever do. However, it shows the use of a pipe from within one process.

Now, look at src/pipe1.c:

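/* This program shows how a parent and 
   child can communicate with a pipe. */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  int pipefd[2];
  pid_t pid;
  int i, line;
  char s[1000];

  if (pipe(pipefd) < 0) {
    perror("pipe");
    exit(1);
  }

  pid = fork();
  if (pid < 0) {
    perror("fork");
    exit(1);
  }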

  /* The parent reads lines of input from 
     standard input, and writes them 
     to the pipe. */

  if (pid > 0) {
    while(fgets(s, 1000, stdin) != NULL) {
      write(pipefd[1], s, strlen(s));
    }

  /* The child reads single characters from
     the pipe, and when it sees a newline, it
     writes the line to standard output, 
     preceded by the line number. */

  } else {
    i = 0;
    line = 1;
    while(read(pipefd[0], s+i, 1) == 1) {
      if (s[i] == '\n') {
        s[i] = '\0';
        printf("%6d  %s\n", line, s);
        line++;
        i = 0;
      } else {
        i++;
      }
    }
  }
  return 0;
} 

/* I'll comment here that you shouldn't write code
   like the child that reads one byte at a time.  Why? */

Again, after pipe() is called, the system looks like:

   pipe1           
|----------|      file 
| code,    |   descriptors
| globals, |                  |---------|
| heap.    |       0 <-----   |         |
|          |       1 ----->   |operating|
|          |       2 ----->   | system  |
|          | pipefd[0] <---   |         |
|          | pipefd[1] --->   |---------|
|          |
| stack    |
|----------|

Now, when fork() is called, a new process is created which is a duplicate of the original pipe1 process. The file descriptors are also duplicated so that they are the same pointers into the operating system. The state now looks like:

pipe1(parent)                                                 pipe1(child)
|----------|      file                                        |----------|
| code,    |   descriptors                                    | code,    |
| globals, |                  |---------|                     | globals, |
| heap.    |       0 <-----   |         |  -----> 0           | heap.    |
|          |       1 ----->   |operating|  <----- 1           |          |
|          |       2 ----->   | system  |  <----- 2           |          |
|          | pipefd[0] <---   |         |  ---> pipefd[0]     |          |
|          | pipefd[1] --->   |---------|  <--- pipefd[1]     |          |
|          |                                                  |          |
| stack    |                                                  | stack    |
|----------|                                                  |----------|

The parent process now calls fgets() to read lines of text from standard input, and writes them to the pipe. The child reads from the pipe and prints each line on standard output, preceded by its line number. This code should seem straightforward to you. Each write to pipefd[1] goes to the operating system, which passes the bytes to the child process as it calls read() on pipefd[0].

Try running pipe1:

UNIX> bin/pipe1
How bout them Vols!
     1  How bout them Vols!
Give him six!
     2  Give him six!
Juice em, Big Dog, Juice em!
     3  Juice em, Big Dog, Juice em!
<CNTL-D>
UNIX>

Looks good. Now, try the same thing with standard output redirected to a file:

UNIX> bin/pipe1 > output
How bout them Vols!
Give him six!
Juice em, Big Dog, Juice em!
<CNTL-D>
UNIX> cat output
UNIX>

Hmmm. This appears to be a problem. Since fork() duplicates file descriptors, we'd assume that the child process writes to output, as that is where standard output has been redirected. This is correct. The problem can be seen by running ps x (or ps aux | grep $USER):

UNIX> ps aux | grep plank
...
plank     6277  0.1  0.3  760  576 pts/22   S 09:40:25  0:00 grep plank
plank     6241  0.0  0.2  684  436 pts/22   S 09:39:02  0:00 pipe1
plank     6244  0.0  0.2  700  452 pts/22   S 09:39:23  0:00 pipe1
...
UNIX>

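What's going on? It can best be explained by the picture below. When the parent process reads CNTL-D (end of file), fgets() returns NULL, and the parent returns from main() and exits. Exiting closes all of its file descriptors, including its copy of pipefd[1]. The state of the system now looks as follows: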

pipe1(parent)                                                 pipe1(child)
                                                              |----------|
exited                                                        | code,    |
                              |---------|                     | globals, |
                              |         |  -----> 0           | heap.    |
                              |operating|  <----- 1           |          |
                              | system  |  <----- 2           |          |
                              |         |  ---> pipefd[0]     |          |
                              |---------|  <--- pipefd[1]     |          |
                                                              |          |
                                                              | stack    |
                                                              |----------|

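Note that the child process still has pipefd[1] open. The child is blocked in read() on pipefd[0], and because a write end of the pipe is still open (in the child itself), the operating system cannot know that no process will ever write to pipefd[1]. So the child process just sits there doing nothing. There is nothing in the output file because the child prints with printf(), which buffers its output. When standard output is a terminal, the standard I/O library typically flushes the buffer at each newline, which is why the first run printed each line right away. When standard output is a file, it flushes only when the buffer fills or the process exits normally. As the buffer isn't full, the child hasn't performed the write(1, ...) yet, and thus we don't see anything in the output file.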

As the ps output shows, there are two pipe1 processes: one child from each of the two pipe1 runs above (i.e., "bin/pipe1" and "bin/pipe1 > output").

Make sure you understand what has gone on here before you read further. The child process is hung reading from pipefd[0]. The read() call will not return because there is nothing to read, and since pipefd[1] is not closed, the OS cannot make the read() call return with a value of zero. Thus, the process is hung.

Now, kill those two child processes:

UNIX> kill 6241 6244

Now look at src/pipe2.c. It has the parent close the file descriptors that it is not going to use, and the child close the file descriptors that it is not going to use. When the parent and child both enter their loops, the state of the system looks as follows, due to the closing of unused file descriptors:

pipe2(parent)                                                 pipe2(child)
|----------|      file                                        |----------|
| code,    |   descriptors                                    | code,    |
| globals, |                  |---------|                     | globals, |
| heap.    |       0 <-----   |         |                     | heap.    |
|          |                  |operating|  <----- 1           |          |
|          |       2 ----->   | system  |  <----- 2           |          |
|          |                  |         |  ---> pipefd[0]     |          |
|          | pipefd[1] --->   |---------|                     |          |
|          |                                                  |          |
| stack    |                                                  | stack    |
|----------|                                                  |----------|

Now, when you type CNTL-D, the parent exits, leaving the system in the following state:

pipe2(parent)                                                 pipe2(child)
                                                              |----------|
exited                                                        | code,    |
                              |---------|                     | globals, |
                              |         |                     | heap.    |
                              |operating|  <----- 1           |          |
                              | system  |  <----- 2           |          |
                              |         |  ---> pipefd[0]     |          |
                              |---------|                     |          |
                                                              |          |
                                                              | stack    |
                                                              |----------|

Note that the write end of the pipe is gone completely: no process has it open. Thus, once the child has read everything left in the pipe, the operating system can have the child's read(pipefd[0], ...) return zero, and the child exits gracefully, flushing its printf() buffer to the output file. So, when you call pipe2 as you did pipe1 before, the output file is correctly created, and there are no child processes left hanging around:

UNIX> bin/pipe2 > output
How bout them Vols!
Give him six!
Juice em, Big Dog, Juice em!
<CNTL-D>
UNIX> cat output
     1  How bout them Vols!
     2  Give him six!
     3  Juice em, Big Dog, Juice em!
UNIX>

If you do a "ps x", you should see no pipe2 processes.

SIGPIPE

As you saw from the program above, when you try to read from a pipe that has no write end, read() returns 0 (once any data left in the pipe has been read). When you try to write to a pipe that has no read end, a SIGPIPE signal is generated. If the signal isn't handled, the program exits silently, since the default action for SIGPIPE is to terminate the process. This is nice because of what happens when, for example, you execute:

UNIX> cat exec1.c | head -n 5 | tail -n 1

The middle process (head) exits as soon as it has printed the first five lines of its standard input, even if cat has more to send. When it exits, the other two exit automatically: tail's read() returns 0, so tail prints its last line and exits, and cat, if it still has data to write, tries to write to a pipe that has no read end, which generates SIGPIPE and kills it.
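The default action for SIGPIPE can be checked directly. In this sketch, the hypothetical writer_killed_by() closes the read end of a pipe before forking, so the child's write() has no reader, and the parent uses waitpid() to report which signal, if any, killed the child. This is the fate of cat in the pipeline above.

```c
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical demo: fork a child that writes to a pipe nobody can read.
   Returns the number of the signal that killed the child, 0 if the child
   exited normally, or -1 on error. */
int writer_killed_by(void)
{
  int pipefd[2];
  pid_t pid;
  int status;

  if (pipe(pipefd) < 0) return -1;
  close(pipefd[0]);   /* closed before fork(), so no process has a read end */

  pid = fork();
  if (pid < 0) {
    close(pipefd[1]);
    return -1;
  }
  if (pid == 0) {
    signal(SIGPIPE, SIG_DFL);                     /* default action: terminate */
    if (write(pipefd[1], "x", 1) < 0) _exit(2);   /* reached only if SIGPIPE isn't fatal */
    _exit(0);
  }

  close(pipefd[1]);
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a POSIX system the function should return SIGPIPE: the child dies inside write() without printing anything.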

Headsort

You now have enough information to do the shell lab. To help you out, look at the program src/headsort.c -- this does what the shell does when you run:

UNIX> head -n 10 src/headsort.c | sort

If we don't get to it in class, go over the code yourself. You will find it very helpful for the Jsh lab.
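For orientation before reading src/headsort.c, here is a sketch of the usual pattern: one pipe, two fork()s, dup2() to move the pipe ends onto standard output and standard input, execvp(), and then closing every pipe descriptor that a process will not use. The function run_pipeline() and its argument vectors are hypothetical; the actual headsort.c may be organized differently.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Sketch of what a shell does for "left | right".  left and right are
   NULL-terminated argument vectors for execvp().
   Returns right's exit status, or -1 on error. */
int run_pipeline(char *const left[], char *const right[])
{
  int pipefd[2];
  pid_t lpid, rpid;
  int status;

  if (pipe(pipefd) < 0) { perror("pipe"); return -1; }
  fflush(NULL);   /* keep buffered output from being duplicated in the children */

  lpid = fork();
  if (lpid < 0) {
    perror("fork");
    close(pipefd[0]);
    close(pipefd[1]);
    return -1;
  }
  if (lpid == 0) {
    /* Left child: standard output becomes the write end of the pipe. */
    if (dup2(pipefd[1], 1) < 0) { perror("dup2"); _exit(1); }
    close(pipefd[0]);
    close(pipefd[1]);
    execvp(left[0], left);
    perror(left[0]);            /* execvp() only returns on failure */
    _exit(1);
  }

  rpid = fork();
  if (rpid < 0) {
    perror("fork");
    close(pipefd[0]);
    close(pipefd[1]);
    waitpid(lpid, &status, 0);
    return -1;
  }
  if (rpid == 0) {
    /* Right child: standard input becomes the read end of the pipe. */
    if (dup2(pipefd[0], 0) < 0) { perror("dup2"); _exit(1); }
    close(pipefd[0]);
    close(pipefd[1]);           /* otherwise right never sees end of file */
    execvp(right[0], right);
    perror(right[0]);
    _exit(1);
  }

  /* The parent uses neither end of the pipe. */
  close(pipefd[0]);
  close(pipefd[1]);

  waitpid(lpid, &status, 0);
  if (waitpid(rpid, &status, 0) < 0) { perror("waitpid"); return -1; }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the command above you would pass {"head", "-n", "10", "src/headsort.c", NULL} and {"sort", NULL}. Note that the parent closes both pipe ends: if it held on to pipefd[1], sort would hang exactly like pipe1's child.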

More on SIGPIPE

Come back to this section after you have learned about signals. I don't think that it's overly useful at this point in the class.

Look at src/pipe3.c.

It does the same thing as the others, but catches SIGPIPE (if signals are unknown to you, read Chapter 10 in the book and the signal man page; we will have a lecture on signals later in the class). To test it, run pipe3:

UNIX> bin/pipe3
Juice em, Big Dog, Juice em!
     1  Juice em, Big Dog, Juice em!

Then, in another window, kill the child process. It will usually be the one with the higher pid:

UNIX> ps aux | grep plank
...
plank     7064  0.1  0.2  684  452 pts/22   S 09:44:24  0:00 pipe3
plank     7065  0.0  0.2  684  304 pts/22   S 09:44:24  0:00 pipe3
...
UNIX> kill 7065
UNIX>

You'll see nothing happen in the pipe3 window, but the child is gone (type "ps aux | grep $USER" again to make sure). This means that no process has pipefd[0] open. Thus, if you now type a line into the bin/pipe3 window:

Give Him Six!
15454: caught a SIGPIPE
UNIX>

The write() to pipefd[1] generates SIGPIPE.