CS360 Lecture notes -- Wait and Exec


Wait()

Wait() is a very important system call. It waits for one of your child processes (that is, a process you created with fork()) to exit. When one exits, wait() returns and gives you some information about the child and how it exited. Here's the prototype (man 2 wait):

#include <sys/wait.h>

pid_t wait(int *stat_loc);   // pid_t is a signed integer type, typically an int

When you call it, if you don't have any children, it returns -1. Otherwise, it waits for one of your children to exit. When one does, wait() returns with the process id of the child that exited. It also fills in the integer pointed to by stat_loc with information on how the child exited. There are macros in the sys/wait.h include file that help you parse this integer.

Here are examples. src/forkwait0.c calls wait() with no children:

/* This shows what happens when you call wait() and have no children.  
   It will return with a value of -1. */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
  int rv, stat_loc;
  
  stat_loc = 0xabcdef;

  rv = wait(&stat_loc);
  printf("RV: %d.   Stat_loc = 0x%x\n", rv, (unsigned int) stat_loc);
  return 0;
}

It returns instantly with a return value of -1, and stat_loc remains unchanged:

UNIX> bin/forkwait0
RV: -1.   Stat_loc = 0xabcdef
UNIX> 
src/forkwait1.c forks off one child, which exits instantly with a return code of zero. The parent calls wait() and prints out a bunch of stuff about the return status:

/* Fork off one child that exits with a value of 0.
   The parent uses macros to examine the status variable of wait. */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
  int i, j, status;

  i = fork();

  if (i > 0) {
    j = wait(&status);
    printf("Parent: Child done.\n");
    printf("  Return value: %d\n", j);
    printf("  Status:       %d\n", status);
    printf("  WIFSTOPPED:   %d\n", WIFSTOPPED(status));
    printf("  WIFSIGNALED:  %d\n", WIFSIGNALED(status));
    printf("  WIFEXITED:    %d\n", WIFEXITED(status));
    printf("  WEXITSTATUS:  %d\n", WEXITSTATUS(status));
    printf("  WTERMSIG:     %d\n", WTERMSIG(status));
    printf("  WSTOPSIG:     %d\n", WSTOPSIG(status));
  } else {
    printf("Child (%d) calling exit(0)\n", getpid());
    exit(0);                                             // BTW, "return 0" will do the same thing.
  }
  return 0;
}

You can see with "WEXITSTATUS" that the child exited with a return code of zero.

UNIX> bin/forkwait1
Child (8575) calling exit(0)
Parent: Child done.
  Return value: 8575
  Status:       0
  WIFSTOPPED:   0
  WIFSIGNALED:  0
  WIFEXITED:    1
  WEXITSTATUS:  0
  WTERMSIG:     0
  WSTOPSIG:     0
UNIX> 
src/forkwait2.c is the exact same as src/forkwait1.c, except that the child calls exit(1) instead of exit(0):
UNIX> bin/forkwait2
Child (8747) calling exit(1)
Parent: Child done.
  Return value: 8747
  Status:       256
  WIFSIGNALED:  0
  WIFEXITED:    1
  WEXITSTATUS:  1
  WTERMSIG:     0
UNIX> 
With src/forkwait3.c, the child goes into an infinite loop, which means that the parent's wait() call does not return:
UNIX> forkwait3
Child (8912) doing nothing until you kill it
Kill the child with 'kill -9 8912'  or just 'kill 8912'
In another window, go ahead and kill the child process:
UNIX> kill -9 8912
Now the parent's wait() call returns, and shows that the child process was terminated by signal 9 (SIGKILL), which is the signal that "kill -9" sends:
Parent: Child done.
  Return value: 8912
  Status:       9
  WIFSTOPPED:   0
  WIFSIGNALED:  1
  WIFEXITED:    0
  WEXITSTATUS:  0
  WTERMSIG:     9
  WSTOPSIG:     0
UNIX> 
Finally, src/forkwait4.c has the child generate a segmentation violation. That is reported to the parent as terminating with "signal 11."
UNIX> bin/forkwait4
Child (9255) generating a seg fault
Parent: Child done.
  Return value: 9255
  Status:       11
  WIFSTOPPED:   0
  WIFSIGNALED:  1
  WIFEXITED:    0
  WEXITSTATUS:  0
  WTERMSIG:     11
  WSTOPSIG:     0
UNIX> 
We'll go over signals in detail in another lecture.

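Ok. Now, look at src/forkwait5.c. It does the following:

  1. It forks off a child, which prints its process id and calls exit(2).

  2. After the child has exited, the parent calls system("ps aux | grep plank") to list the processes.

  3. The parent then calls wait() and prints the status information, just like forkwait1.

Since the child has already exited by the time the parent calls system("ps aux | grep plank"), we might expect there to be no listing for the child in the "ps aux" output, and possibly that the wait() call might wait forever, since the child is gone. However, this is not the case.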

When a child exits, its process becomes a "zombie" until its parent process either dies or calls wait() for it. By "zombie," we mean that it doesn't run and holds almost no resources (essentially just its entry in the process table, which records its exit status); the operating system maintains it only so that when the parent calls wait(), it gets the proper information. Look at the output of forkwait5:

UNIX> bin/forkwait5
Child (6698) calling exit(2)
root      5286  0.0  0.1 168264  5504 ?        Ss   10:55   0:00 sshd: plank [priv]
plank     5315  0.0  0.0 168264  2428 ?        S    10:55   0:00 sshd: plank@pts/0
plank     5316  0.0  0.0 127964  2160 pts/0    Ss   10:55   0:00 -csh
plank     5617  0.0  0.1 158956  4960 pts/0    S+   10:58   0:00 vim lecture.html
root      6406  0.0  0.1 168264  5508 ?        Ss   11:04   0:00 sshd: plank [priv]
plank     6413  0.0  0.0 168264  2432 ?        S    11:04   0:00 sshd: plank@pts/1
plank     6414  0.0  0.0 127964  2160 pts/1    Ss   11:04   0:00 -csh
plank     6697  0.0  0.0   4172   360 pts/1    S+   11:06   0:00 bin/forkwait5
plank     6698  0.0  0.0      0     0 pts/1    Z+   11:06   0:00 [forkwait5] <defunct>
plank     6700  0.0  0.0 113140  1428 pts/1    S+   11:06   0:00 sh -c ps aux | grep plank
plank     6701  0.0  0.0 161372  1840 pts/1    R+   11:06   0:00 ps aux
plank     6702  0.0  0.0 112676   964 pts/1    S+   11:06   0:00 grep plank

Parent: Child done.
  Return value: 6698
  Status:       512
  WIFSTOPPED:   0
  WIFSIGNALED:  0
  WIFEXITED:    1
  WEXITSTATUS:  2
  WTERMSIG:     0
  WSTOPSIG:     2
UNIX> ps aux | grep plank
root      5286  0.0  0.1 168264  5504 ?        Ss   10:55   0:00 sshd: plank [priv]
plank     5315  0.0  0.0 168264  2428 ?        S    10:55   0:00 sshd: plank@pts/0
plank     5316  0.0  0.0 127964  2160 pts/0    Ss   10:55   0:00 -csh
plank     5617  0.0  0.1 158956  5012 pts/0    S+   10:58   0:00 vim lecture.html
root      6406  0.0  0.1 168264  5508 ?        Ss   11:04   0:00 sshd: plank [priv]
plank     6413  0.0  0.0 168264  2432 ?        S    11:04   0:00 sshd: plank@pts/1
plank     6414  0.0  0.0 127964  2160 pts/1    Ss   11:04   0:00 -csh
plank     6821  0.0  0.0 161372  1840 pts/1    R+   11:08   0:00 ps aux
plank     6822  0.0  0.0 112676   964 pts/1    S+   11:08   0:00 grep plank
UNIX> 
Process 6698 is the zombie, denoted in the "ps aux" output by the Z in its STAT column (and by <defunct>). Once forkwait5 (process 6697) calls wait(), the zombie goes away, and it no longer appears in the second ps listing.

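What happens if the parent exits without calling wait()? The child is "orphaned" and adopted by the init process (process 1), which calls wait() on its children. So the child process goes away if/when it exits, rather than lingering as a zombie.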

wait() returns whenever any child exits. If a process has more than one child, you can't force wait() to wait for a specific child; you simply get whichever child exits first. For example, see src/multichild.c. This program forks off four child processes and then calls wait() four times. The children sleep for a random period of time and then exit. As you can see, each wait() call returns whichever child exited next:

UNIX> bin/multichild
Fork 0 returned 14160
Fork 1 returned 14161
Fork 2 returned 14162
Fork 3 returned 14163
Child 1 (14161) exiting
Wait returned 14161
Child 3 (14163) exiting
Wait returned 14163
Child 0 (14160) exiting
Wait returned 14160
Child 2 (14162) exiting
Wait returned 14162
UNIX>
Now, you can use waitpid() to wait for a specific process, and with the WNOHANG option you can even have it return immediately if that process has not exited. I personally think using waitpid() is usually bad form, and using the version that returns instantly is most certainly really bad form.

You will not be allowed to use any wait() variant besides wait() in your jsh lab. I will instruct the TAs to be ruthless if you call waitpid() with WNOHANG set (or use one of the other variants, such as wait3() or wait4()).


Execve

You saw last lecture how we create new processes -- using fork(). Now, we go over how to execute programs using execve().

Execve() is simple in concept:

int execve(const char *path, char *const argv[], char *const envp[]);

Execve() assumes that path is the name of an executable file. Argv is an array of null-terminated strings whose last element is NULL, and envp is another NULL-terminated array of null-terminated strings. (I'm not going to go over envp in detail -- it holds your environment variables, which you can get and set with getenv()/setenv(). You can get envp as a third argument to main(), which is what I'm going to do here.)

Execve() overwrites the current process so that it executes the file in path with the arguments in argv, and the environment variables in envp. Execve() does not return unless it encounters an error, such as the file in path not existing, or not being an executable file.

This may seem confusing. Why does execve() not return? Well, look at the example in src/exec2.c:

/* A simple example of using execve() to run the program cat. */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp)
{
  char *newargv[3];
  int i;

  newargv[0] = "cat";
  newargv[1] = "src/exec2.c";
  newargv[2] = NULL;

  i = execve("/bin/cat", newargv, envp);
  perror("exec2: execve failed");
  exit(1);
}

Suppose we compile this to the program bin/exec2. Then we execute it with no arguments. When we get to the execve() call, the state of memory is the following:

                      |-----------------|
                      |                 |
                      | code for exec2  |
                      |                 |
                      |-----------------|
                      |                 |
                      |     globals     |
                      |    for exec2    |
                      |                 |
                      |-----------------|
                      |                 |
                      | heap for exec2  |
                      |                 |
                      |-----------------|
                            ....
                      | stack for exec2 |
                      |-----------------|
Now execve() is called. This is a system call that says "overwrite my process's memory so that it is running main() in the program /bin/cat, with the arguments "cat" and "src/exec2.c"." When execve() is done, the state of memory has been changed so that we are in the main() routine of cat, with argc and argv set properly:
                      |-----------------|
                      |                 |
                      |  code for cat   |
                      |                 |
                      |-----------------|
                      |                 |
                      |     globals     |
                      |     for cat     |
                      |                 |
                      |-----------------|
                      |                 |
                      |  heap for cat   |
                      |                 |
                      |-----------------|
                            ....
                      |  stack for cat  |
                      |-----------------|
You'll notice that everything concerning exec2.c is gone. This is because the state of memory has been overwritten to run cat. There is no trace of exec2 left. This is why execve() cannot return if it is successful -- the state to which it might have returned has been overwritten. It is gone. When cat exits, the operating system simply destroys the process.

Here is bin/exec2 running -- its output looks just like "cat src/exec2.c":

UNIX> bin/exec2
/* A simple example of using execve() to run the program cat. */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp)
{
  char *newargv[3];
  int i;

  newargv[0] = "cat";
  newargv[1] = "src/exec2.c";
  newargv[2] = NULL;

  i = execve("/bin/cat", newargv, envp);
  perror("exec2: execve failed");
  exit(1);
}
UNIX> 

So how come when you execute cat in the shell it looks like it returns to the shell? This is because the shell calls fork(). It then has the child call execve(), and the parent calls wait().

There are six exec functions -- execve() plus five variants (see the man pages for execve() and exec). I'll summarize them below:

  1. execl(const char *path, const char *arg0, const char *arg1, ..., const char *argn, NULL): This uses your current envp, and lets you specify the argv as parameters, rather than building an array of pointers. Path must specify the path name exactly.

  2. execv(const char *path, char *const argv[]): This is just like execve(), but uses your current envp.

  3. execle(const char *path, const char *arg0, const char *arg1, ..., const char *argn, NULL, char *const envp[]): This is just like execl(), but you must specify envp.

  4. execve(const char *path, char *const argv[], char *const envp[]).

  5. execlp(const char *file, const char *arg0, const char *arg1, ..., const char *argn, NULL): This is just like execl(), except that if file does not contain a slash, the directories in your PATH variable are searched to find it.

  6. execvp(const char *file, char *const argv[]): This is just like execv(), except that the PATH variable is searched to find file (again, only if file contains no slash).
Of the six, execvp() and execlp() are the most useful. All of them are implemented on top of execve(), though.

If execve() is unsuccessful (for example, there is no file with the name "path", or that file does not have the executable bit set), it will return with a value of -1. For example, look at src/exec1.c.

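This program calls execve() on the path "cat". Since execve() does not search your PATH, it looks for a file named cat in the current directory, which does not exist. Thus, the execve() call fails, and the perror() statement is executed.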

UNIX> bin/exec1
exec1: execve failed: No such file or directory
UNIX> 

This leads to:




DR. PLANK'S CARDINAL SIN OF EXEC

You should NEVER EVER EVER call execve() or any of its variants without testing the return value and exiting if it returns!!!!!!!




Why is this such a sin? Look at the execcatx.c programs. First is src/execcat1.c:

/* This program runs the "cat" program four times by calling fork() in a for loop.
   Inside the loop, the child calls execvp("cat"), and the parent calls wait(). 

   Although this program runs fine, it will turn into a fork bomb if there's a bug. */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
  char *newargv[3];
  int status, j;

  newargv[0] = "cat";
  newargv[1] = "f1.txt";
  newargv[2] = NULL;

  for (j = 0; j < 4; j++) {
    if (fork() == 0) {
      (void) execvp("cat", newargv);
    } else {
      wait(&status);
    }
  }
  return 0;
}

This program forks off four processes that all exec "cat f1.txt". Note the use of execvp(), which does not take an envp argument, and which searches the PATH variable to find "cat".

UNIX> bin/execcat1
This is file f1.txt
This is file f1.txt
This is file f1.txt
This is file f1.txt
UNIX>
Now, src/execcat2.c simply substitutes execv() for execvp(). The PATH is no longer searched, so the cat executable is not found. When you run it, ostensibly nothing happens:
UNIX> bin/execcat2
UNIX>
What's really going on? Well, the execv() call fails. This means that execv() returns -1, and the child process keeps running. It too goes through the for loop and calls fork(). In other words, the number of processes grows exponentially -- it's a fork bomb.

To help illustrate what goes on, look at src/execcat3.c, which prints the value of j and the process id at the top of each for loop iteration, and again when the process exits:

UNIX> bin/execcat3
Process 81647 - Top of the for loop.  j = 0
Process 81648 - Top of the for loop.  j = 1
Process 81649 - Top of the for loop.  j = 2
Process 81650 - Top of the for loop.  j = 3
Process 81651 exiting.
Process 81650 exiting.
Process 81649 - Top of the for loop.  j = 3
Process 81652 exiting.
Process 81649 exiting.
Process 81648 - Top of the for loop.  j = 2
Process 81653 - Top of the for loop.  j = 3
Process 81654 exiting.
Process 81653 exiting.
Process 81648 - Top of the for loop.  j = 3
Process 81655 exiting.
Process 81648 exiting.
Process 81647 - Top of the for loop.  j = 1
Process 81656 - Top of the for loop.  j = 2
Process 81657 - Top of the for loop.  j = 3
Process 81658 exiting.
Process 81657 exiting.
Process 81656 - Top of the for loop.  j = 3
Process 81659 exiting.
Process 81656 exiting.
Process 81647 - Top of the for loop.  j = 2
Process 81660 - Top of the for loop.  j = 3
Process 81661 exiting.
Process 81660 exiting.
Process 81647 - Top of the for loop.  j = 3
Process 81662 exiting.
Process 81647 exiting.
UNIX>
As you can see, fork() is called 15 times, not four, because the processes that failed the execv() call continue in the for loop. This isn't bad when the for loop stops at j = 4, but were that a 10 rather than a 4, then fork() would be called 1023 times (i.e., fork() gets called 2^n - 1 times if the 4 were an n). This can be devastating. Fix the error by calling perror() and exit() if execv() returns, as in src/execcat4.c:

    if (fork() == 0) {
      (void) execv("cat", newargv);
      perror("execcat4's execv call");  /* Here are the only changes to the code -- */
      exit(1);                          /* no longer committing the Cardinal Sin. */
    } else {
      wait(&status);
    }

UNIX> bin/execcat4
Process 81733 - Top of the for loop.  j = 0
execcat4's execv call: No such file or directory
Process 81733 - Top of the for loop.  j = 1
execcat4's execv call: No such file or directory
Process 81733 - Top of the for loop.  j = 2
execcat4's execv call: No such file or directory
Process 81733 - Top of the for loop.  j = 3
execcat4's execv call: No such file or directory
Process 81733 exiting.
UNIX> 

My personal history with the Cardinal Sin

The year is 1986, or maybe 1987. I am a junior computer science major at Yale, and I'm taking a course called "Systems Programming." Sound familiar? One difference between me in 1986 and y'all who are reading these notes is that my previous languages were BASIC, Pascal and a Lisp variant called "T". The first two weeks of Systems Programming were to go through Kernighan and Ritchie from cover-to-cover and learn C.

The professor was a big, bearded dude named Stan Eisenstat. He seemed 8000 times smarter than the rest of us, and I was scared of him. Toward the end of the semester, he gave us an assignment that you share -- write a shell. He warned us to be careful, because if we had the wrong type of bug, we could shut down the department's mainframe. This is 1986 -- departments like ours had one computer. One. Faculty and student offices had VT100 terminals that all connected to the one mainframe. Undergrads used a roomful of VT100's, again connected to the same mainframe. So you didn't want to be the undergrad who shut down the mainframe.

I shut down the mainframe. It was a Cardinal Sin bug. I tested a shell command of about 10 Unix commands connected by pipes, and I didn't test the return value of exec. Worse, whatever loop I had that was executing the commands didn't increment to the next command when exec failed, so I had a colossal fork-bomb that had no chance of exiting. Within seconds, I couldn't do a ps, because the operating system had run out of processes, and no one's fork() could succeed. Panic.

So I slinked up to Dr. Eisenstat's office, terrified.

Me: Knock knock
Him: "PLANK!"
Me: Opening the door -- "Yes. Sorry."
Him: "It's ok -- just make sure you fix the bug."
Me: "Yes. Thank you."

That was our only exchange ever, but it was certainly memorable for me, and is why I give you the Cardinal Sin.

Of course, now, when you have your fork bomb, you can just unplug your workstation or shut down your laptop, so it's not quite so devastating, but hopefully you can enjoy the story with me.

Stan Eisenstat passed away in 2020 in his 70's. They made a nice memorial page for him at https://seas.yale.edu/news-events/news/memoriam-stanley-eisenstat-professor-computer-science.