CS360 Lecture notes -- Wait and Exec


Wait()

Wait() is a very important system call. What it does is wait for one of your child processes, that you have created with fork(), to exit. When one exits, wait() returns and gives you some information about the child and how it exited. Here's the prototype (man -s wait):

#include <sys/wait.h>

pid_t wait(int *stat_loc);   // pid_t is an int or uint32_t, typically

When you call it, if you don't have any children, it returns -1. Otherwise, it will wait for one of your children to exit. When one does, the wait() call will return with the process id of the child that exited. It will also fill in the integer pointed to by stat_loc with information on how the child exited. There are macros in the sys/wait.h include file that can help you parse this integer.

Here are examples. forkwait0.c calls wait() with no children:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
  int rv, stat_loc;
  
  stat_loc = 0xabcdef;

  rv = wait(&stat_loc);
  printf("RV: %d.   Stat_loc = 0x%x\n", rv, (unsigned int) stat_loc);
  return 0;
}

It returns instantly with a return value of -1, and stat_loc remains unchanged:

UNIX> ./forkwait0
RV: -1.   Stat_loc = 0xabcdef
UNIX> 
forkwait1.c forks off one child, which exits instantly with a return code of zero. The parent calls wait() and prints out a bunch of stuff about the return status:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
  int i, j, status;

  i = fork();

  if (i > 0) {
    /* Parent: block until the child exits, then decode its status. */
    j = wait(&status);
    printf("Parent: Child done.\n");
    printf("  Return value: %d\n", j);
    printf("  Status:       %d\n", status);
    printf("  WIFSTOPPED:   %d\n", WIFSTOPPED(status));
    printf("  WIFSIGNALED:  %d\n", WIFSIGNALED(status));
    printf("  WIFEXITED:    %d\n", WIFEXITED(status));
    printf("  WEXITSTATUS:  %d\n", WEXITSTATUS(status));
    printf("  WTERMSIG:     %d\n", WTERMSIG(status));
    printf("  WSTOPSIG:     %d\n", WSTOPSIG(status));
  } else {
    printf("Child (%d) calling exit(0)\n", getpid());
    exit(0);
  }
  return 0;
}

You can see with "WEXITSTATUS" that the child exited with a return code of zero.

UNIX> ./forkwait1
Child (8575) calling exit(0)
Parent: Child done.
  Return value: 8575
  Status:       0
  WIFSTOPPED:   0
  WIFSIGNALED:  0
  WIFEXITED:    1
  WEXITSTATUS:  0
  WTERMSIG:     0
  WSTOPSIG:     0
UNIX> 

forkwait2.c is identical to forkwait1.c, except that the child calls exit(1) instead of exit(0):

UNIX> ./forkwait2
Child (8747) calling exit(1)
Parent: Child done.
  Return value: 8747
  Status:       256
  WIFSIGNALED:  0
  WIFEXITED:    1
  WEXITSTATUS:  1
  WTERMSIG:     0
UNIX> 
With forkwait3.c, the child goes into an infinite loop, which means that the parent's wait() call does not return:
UNIX> forkwait3
Child (8912) doing nothing until you kill it
Kill the child with 'kill -9 8912'  or just 'kill 8912'
In another window, go ahead and kill the child process:
UNIX> kill -9 8912
Now, the parent's wait() call returns, and shows you that the child process was terminated with "signal 9." That is the signal that "kill -9" sent to the child process:
Parent: Child done.
  Return value: 8912
  Status:       9
  WIFSTOPPED:   0
  WIFSIGNALED:  1
  WIFEXITED:    0
  WEXITSTATUS:  0
  WTERMSIG:     9
  WSTOPSIG:     0
UNIX> 
Finally, forkwait4.c has the child generate a segmentation violation. That is reported to the parent as terminating with "signal 11."
UNIX> ./forkwait4
Child (9255) generating a seg fault
Parent: Child done.
  Return value: 9255
  Status:       11
  WIFSTOPPED:   0
  WIFSIGNALED:  1
  WIFEXITED:    0
  WEXITSTATUS:  0
  WTERMSIG:     11
  WSTOPSIG:     0
UNIX> 
We'll go over signals in detail in another lecture.
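
The status macros are easiest to use in a fixed order: test WIFEXITED or WIFSIGNALED first, and only then read WEXITSTATUS or WTERMSIG. Here is a minimal sketch of that pattern. The helper names child_status() and describe_status() are made up for illustration and are not part of the forkwait programs. The child uses _exit() rather than exit() so that it does not flush stdio buffers inherited from the parent.

```c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: fork a child and return the raw status word that
   wait() reports for it.  If sig > 0, the child sends that signal to
   itself; otherwise (or if the signal does not kill it) it exits with
   the given code. */
int child_status(int code, int sig)
{
  int status;
  pid_t pid;

  pid = fork();
  if (pid < 0) { perror("fork"); exit(1); }

  if (pid == 0) {
    if (sig > 0) kill(getpid(), sig);
    _exit(code);
  }

  if (wait(&status) < 0) { perror("wait"); exit(1); }
  return status;
}

/* Hypothetical helper: say what a status word means.  WEXITSTATUS is
   only meaningful when WIFEXITED is true, and WTERMSIG only when
   WIFSIGNALED is true. */
void describe_status(int status)
{
  if (WIFEXITED(status)) {
    printf("exited normally with code %d\n", WEXITSTATUS(status));
  } else if (WIFSIGNALED(status)) {
    printf("terminated by signal %d\n", WTERMSIG(status));
  } else {
    printf("neither exited nor signaled (status 0x%x)\n",
           (unsigned int) status);
  }
}
```

For example, describe_status(child_status(3, 0)) reports an exit code of 3, and describe_status(child_status(0, SIGKILL)) reports termination by signal 9, matching the forkwait3 output above.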

Ok. Now, look at forkwait5.c. In it, the child calls exit(2), and the parent calls system("ps aux | grep plank") before it calls wait().

By the time the parent calls system("ps aux | grep plank"), the child has exited. Thus, we might expect there to be no listing for the child in the "ps aux" output, and possibly that the wait() might wait forever, since the child is gone. However, this is not the case.

When a child exits, its process becomes a "zombie" until its parent process either dies or calls wait() for it. By a "zombie", we mean that it doesn't run and takes up almost no resources (just its entry in the process table). The operating system maintains it only so that when the parent calls wait(), it will get the proper information. Look at the output of forkwait5:

UNIX> ./forkwait5
Child (6698) calling exit(2)
root      5286  0.0  0.1 168264  5504 ?        Ss   10:55   0:00 sshd: plank [priv]
plank     5315  0.0  0.0 168264  2428 ?        S    10:55   0:00 sshd: plank@pts/0
plank     5316  0.0  0.0 127964  2160 pts/0    Ss   10:55   0:00 -csh
plank     5617  0.0  0.1 158956  4960 pts/0    S+   10:58   0:00 vim lecture.html
root      6406  0.0  0.1 168264  5508 ?        Ss   11:04   0:00 sshd: plank [priv]
plank     6413  0.0  0.0 168264  2432 ?        S    11:04   0:00 sshd: plank@pts/1
plank     6414  0.0  0.0 127964  2160 pts/1    Ss   11:04   0:00 -csh
plank     6697  0.0  0.0   4172   360 pts/1    S+   11:06   0:00 ./forkwait5
plank     6698  0.0  0.0      0     0 pts/1    Z+   11:06   0:00 [forkwait5] 
plank     6700  0.0  0.0 113140  1428 pts/1    S+   11:06   0:00 sh -c ps aux | grep plank
plank     6701  0.0  0.0 161372  1840 pts/1    R+   11:06   0:00 ps aux
plank     6702  0.0  0.0 112676   964 pts/1    S+   11:06   0:00 grep plank

Parent: Child done.
  Return value: 6698
  Status:       512
  WIFSTOPPED:   0
  WIFSIGNALED:  0
  WIFEXITED:    1
  WEXITSTATUS:  2
  WTERMSIG:     0
  WSTOPSIG:     2
UNIX> ps aux | grep plank
root      5286  0.0  0.1 168264  5504 ?        Ss   10:55   0:00 sshd: plank [priv]
plank     5315  0.0  0.0 168264  2428 ?        S    10:55   0:00 sshd: plank@pts/0
plank     5316  0.0  0.0 127964  2160 pts/0    Ss   10:55   0:00 -csh
plank     5617  0.0  0.1 158956  5012 pts/0    S+   10:58   0:00 vim lecture.html
root      6406  0.0  0.1 168264  5508 ?        Ss   11:04   0:00 sshd: plank [priv]
plank     6413  0.0  0.0 168264  2432 ?        S    11:04   0:00 sshd: plank@pts/1
plank     6414  0.0  0.0 127964  2160 pts/1    Ss   11:04   0:00 -csh
plank     6821  0.0  0.0 161372  1840 pts/1    R+   11:08   0:00 ps aux
plank     6822  0.0  0.0 112676   964 pts/1    S+   11:08   0:00 grep plank
UNIX> 

The process 6698 is the zombie process, denoted in the "ps aux" output with a capital Z. When forkwait5 (process 6697) calls wait(), the zombie process goes away.
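
To see that the operating system really does hold on to the exit status of a zombie, here is a small sketch. It is not forkwait5.c: the function name wait_for_zombie() is hypothetical, and it leaves out the "ps" listing so that it depends only on fork(), sleep() and wait().

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: the child exits at once with code 2, and the
   parent deliberately waits delay_seconds before calling wait().
   Returns the exit code that wait() reports, or -1 on any error. */
int wait_for_zombie(unsigned int delay_seconds)
{
  int status;
  pid_t child, reaped;

  child = fork();
  if (child < 0) return -1;
  if (child == 0) _exit(2);      /* the child is done almost immediately */

  sleep(delay_seconds);          /* the exited child sits as a zombie here */

  reaped = wait(&status);        /* still works: the status was kept for us */
  if (reaped != child || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

Even though the child is long gone by the time the parent calls wait(), the call returns right away with the child's pid, and wait_for_zombie(1) returns 2.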

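What happens if the parent exits without calling wait()? Then the child is inherited by another process (traditionally init, process 1), which calls wait() for it. Thus, the child process goes away if/when it has exited.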

wait() returns whenever a child exits. If a process has more than one child, then you can't force wait() to wait for a specific child. You simply get whichever child exits first. For example, see multichild.c. This program forks off four child processes and then calls wait() four times. The children sleep for a random period of time, and then exit. As you see, the first wait() call returns the first child to exit:

UNIX> ./multichild
Fork 0 returned 14160
Fork 1 returned 14161
Fork 2 returned 14162
Fork 3 returned 14163
Child 1 (14161) exiting
Wait returned 14161
Child 3 (14163) exiting
Wait returned 14163
Child 0 (14160) exiting
Wait returned 14160
Child 2 (14162) exiting
Wait returned 14162
UNIX>
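
The fork-many, wait-many pattern can be sketched as follows. This is not multichild.c itself: fork_and_reap() is a hypothetical name, and a fixed delay of zero or one seconds stands in for the random sleep. The loop keeps calling wait() until it returns -1, which is what happens once there are no children left.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: fork n children, then call wait() until no
   children remain.  Returns the number of children reaped. */
int fork_and_reap(int n)
{
  int i, status, reaped = 0;
  pid_t pid;

  for (i = 0; i < n; i++) {
    pid = fork();
    if (pid < 0) { perror("fork"); break; }
    if (pid == 0) {
      sleep((unsigned int) (i % 2));   /* stand-in for a random delay */
      _exit(i);                        /* child never returns to the loop */
    }
  }

  /* wait() hands back children in the order they exit, not in the
     order they were forked.  It returns -1 when none are left. */
  while ((pid = wait(&status)) > 0) {
    printf("Wait returned %d\n", (int) pid);
    reaped++;
  }
  return reaped;
}
```

Calling fork_and_reap(4) reaps four children. The order of the "Wait returned" lines depends on when each child exits.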

Now, you can use waitpid() to wait for a specific process, and you can even have it return immediately if the specified process has not exited yet. I personally think using waitpid() is usually bad form, and most certainly using the version that returns instantly is really bad form.

You will not be allowed to use any wait() variant besides wait() in your jsh lab. I will instruct the TAs to be ruthless if you call waitpid() with WNOHANG set (or one of the waitx() equivalents).


Execve

You saw last lecture how we create new processes -- using fork(). Now, we go over how to execute programs using execve().

Execve() is simple in concept:

int execve(char *path, char **argv, char **envp);

Execve() assumes that path is the name of an executable file. Argv is an array of null-terminated strings, such that the last element is NULL, and envp is another null-terminated array of null-terminated strings. (I'm not going to go over envp -- it holds your environment variables, which you can get and set with getenv()/setenv(). You can get envp as a third argument to main(), which is what I'm going to do here).

Execve() overwrites the current process so that it executes the file in path with the arguments in argv, and the environment variables in envp. Execve() does not return unless it encounters an error, such as the file in "path" not existing, or not being an executable file.

This may seem confusing. Why does execve() not return? Well, look at the example in exec2.c:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp)
{
  char *newargv[3];
  int i;

  newargv[0] = "cat";
  newargv[1] = "exec2.c";
  newargv[2] = NULL;

  i = execve("/bin/cat", newargv, envp);
  perror("exec2: execve failed");
  exit(1);
}

Suppose we compile this to the program exec2. Then we execute it with no arguments. When we get to the execve() call, the state of memory is the following:

                      |-----------------|
                      |                 |
                      | code for exec2  |
                      |                 |
                      |-----------------|
                      |                 |
                      |     globals     |
                      |    for exec2    |
                      |                 |
                      |-----------------|
                      |                 |
                      | heap for exec2  |
                      |                 |
                      |-----------------|
                            ....
                      | stack for exec2 |
                      |-----------------|

Now, execve() is called. This is a system call that says "overwrite my process's memory so that it is running main() in the program in /bin/cat, with the arguments "cat" and "exec2.c"." When execve() is done, the state of memory has been changed so that we are in the main() routine of cat, with argc and argv set properly:

                      |-----------------|
                      |                 |
                      |  code for cat   |
                      |                 |
                      |-----------------|
                      |                 |
                      |     globals     |
                      |     for cat     |
                      |                 |
                      |-----------------|
                      |                 |
                      |  heap for cat   |
                      |                 |
                      |-----------------|
                            ....
                      |  stack for cat  |
                      |-----------------|
You'll notice that everything concerning exec2.c is gone. This is because the state of memory has been overwritten to run cat. There is no trace of exec2 left. This is why execve() cannot return if it is successful -- the state to which it might have returned has been overwritten. It is gone. When cat exits, the operating system simply destroys the process.

Here it is running -- looks just like "cat exec2.c":

UNIX> ./exec2
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp)
{
  char *newargv[3];
  int i;

  newargv[0] = "cat";
  newargv[1] = "exec2.c";
  newargv[2] = NULL;

  i = execve("/bin/cat", newargv, envp);
  perror("exec2: execve failed");
  exit(1);
}
UNIX> 

So how come when you execute cat in the shell it looks like it returns to the shell? This is because the shell calls fork(). It then has the child call execve(), and the parent calls wait().
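
Here is a minimal sketch of that fork/execve/wait pattern. The function name run_and_wait() is made up for illustration, and the exit code 127 for a failed execve() is just a common convention, not something these notes require. A real shell does a good deal more.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* the current environment, passed on as envp */

/* Hypothetical helper: run the program argv[0] (a full path name) with
   arguments argv, wait for it, and return its exit code.  Returns -1 if
   fork() or wait() fails or if the child did not exit normally. */
int run_and_wait(char **argv)
{
  int status;
  pid_t pid;

  pid = fork();
  if (pid < 0) { perror("fork"); return -1; }

  if (pid == 0) {
    /* Child: become the new program.  We only get past execve() if
       it failed, and then the child must exit. */
    execve(argv[0], argv, environ);
    perror(argv[0]);
    _exit(127);
  }

  /* Parent: this is the "return to the shell" that the user sees. */
  if (wait(&status) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, with argv set to { "/bin/sh", "-c", "exit 3", NULL }, run_and_wait(argv) returns 3.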

There are six variants of execve() -- see the man page for execve(). I'll summarize them below:

  1. execl(char *path, char *arg0, char *arg1, ..., char *argn, NULL): This uses your current envp, and lets you specify the argv as parameters, rather than building an array of pointers. Path must specify the path name exactly.

  2. execv(char *path, char **argv): This is just like execve, but uses your current envp.

  3. execle(char *path, char *arg0, char *arg1, ..., char *argn, NULL, char **envp): This is just like execl, but you must specify envp.

  4. execve(char *path, char **argv, char **envp).

  5. execlp(char *path, char *arg0, char *arg1, ..., char *argn, NULL): This is just like execl, except that if path does not contain a slash, then the directories in your PATH variable will be searched to find path.

  6. execvp(char *path, char **argv): This is just like execv, except that the PATH variable will be searched to find path.

Of the six, execvp() and execlp() are the most useful. All are implemented on top of execve() though.
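
To make the difference between the list-style calls concrete, here is a sketch that runs a shell command with either execl() or execlp(). The function run_shell_command() and its search_path flag are hypothetical. Note the (char *) cast on the terminating NULL, which is needed because these calls take a variable number of arguments.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: run "sh -c cmd" in a child and return its exit
   code, or -1 on error.  If search_path is nonzero, execlp() finds
   "sh" using PATH; otherwise execl() is given the exact path name. */
int run_shell_command(const char *cmd, int search_path)
{
  int status;
  pid_t pid;

  pid = fork();
  if (pid < 0) { perror("fork"); return -1; }

  if (pid == 0) {
    if (search_path) {
      execlp("sh", "sh", "-c", cmd, (char *) NULL);
    } else {
      execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
    }
    perror("exec of sh failed");   /* reached only if the exec failed */
    _exit(127);
  }

  if (wait(&status) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In both calls the first "sh" names the file to execute and the second "sh" becomes argv[0] of the new program.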

If execve() is unsuccessful (for example, there is no file with the name "path", or that file does not have the executable bit set), it will return with a value of -1. For example, look at exec1.c.

This program tries to execute "./cat", which does not exist. Thus, the execve() call fails, and the perror statement is executed.

UNIX> ./exec1
exec1: execve failed: No such file or directory
UNIX> 

This leads to:




DR. PLANK'S CARDINAL SIN OF EXEC

You should NEVER EVER EVER call execve() or any of its variants without testing the return value and exiting if it returns!!!!!!!




Why is this such a sin? Look at the execcatx.c programs. First is execcat1.c

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
  char *newargv[3];
  int i, j;

  newargv[0] = "cat";
  newargv[1] = "f1.txt";
  newargv[2] = NULL;

  for (j = 0; j < 4; j++) {
    if (fork() == 0) {
      i = execvp("cat", newargv);
    } else {
      wait(&i);
    }
  }
  return 0;
}

This program forks off four processes that all exec "cat f1.txt". Note the use of execvp(), which does not take an envp argument, and which searches the PATH variable to find "cat".

UNIX> ./execcat1
This is file f1.txt
This is file f1.txt
This is file f1.txt
This is file f1.txt
UNIX>
Now, execcat2.c simply substitutes execv() for execvp(). Now, the path is not searched, and the cat executable will not be found. When you run it, ostensibly nothing happens:
UNIX> ./execcat2
UNIX>

What's really going on? Well, the execv() call fails. This means that the execv() call returns with i = -1, and then the child process continues. It too will go through the for loop and call fork(). To help illustrate what goes on, look at execcat3.c, which prints out the value of j and the pid of the process before every fork() call:
UNIX> ./execcat3
I am 13796.  j = 0
I am 13797.  j = 1
I am 13798.  j = 2
I am 13799.  j = 3
I am 13798.  j = 3
I am 13797.  j = 2
I am 13802.  j = 3
I am 13797.  j = 3
I am 13796.  j = 1
I am 13805.  j = 2
I am 13806.  j = 3
I am 13805.  j = 3
I am 13796.  j = 2
I am 13809.  j = 3
I am 13796.  j = 3
UNIX>

As you can see, fork() is called 15 times, not four, because the processes that failed the execv() call continue in the for loop. This isn't too bad when the for loop stops at j = 4, but were that a 10 rather than a 4, then fork() would be called 1023 times (i.e., fork() gets called 2^n - 1 times if the 4 were an n). This can be devastating. Fix the error by calling perror() and exit() if execv() returns, as in execcat4.c:
UNIX> ./execcat4
execcat4: No such file or directory
execcat4: No such file or directory
execcat4: No such file or directory
execcat4: No such file or directory
UNIX>
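
The fix can be sketched as follows. This is a guess at the shape of execcat4.c, not its actual source: the function name fork_exec_checked() and its parameters are hypothetical, added so that the path and the loop count can be varied.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: fork n children that each try to execv() path
   with the arguments "cat f1.txt".  Returns how many children did not
   finish with exit code 0, or -1 if fork() fails. */
int fork_exec_checked(const char *path, int n)
{
  char *newargv[3];
  int j, status, failures = 0;
  pid_t pid;

  newargv[0] = "cat";
  newargv[1] = "f1.txt";
  newargv[2] = NULL;

  for (j = 0; j < n; j++) {
    pid = fork();
    if (pid < 0) { perror("fork"); return -1; }

    if (pid == 0) {
      execv(path, newargv);
      /* Only reached if execv() failed.  Report the error and exit,
         so that the child never falls back into the for loop. */
      perror("execcat4");
      exit(1);
    }

    wait(&status);
    if (!(WIFEXITED(status) && WEXITSTATUS(status) == 0)) failures++;
  }
  return failures;
}
```

Because every failed child exits right after perror(), fork() is called exactly n times no matter how many of the execv() calls fail.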

If you want a truly disastrous program, take a look at disaster.c:

int main(int argc, char **argv)
{
  char *newargv[3];
  int i, j;

  newargv[0] = "cat";
  newargv[1] = "f1.txt";
  newargv[2] = NULL;

  j = 0;
  while (j < 4) {
    if (fork() == 0) {
      execv("cat", newargv);
    } else {
      j++;
    }
  }
  return 0;
}

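Don't run that one unless you are on a machine that you can reboot. Each child whose execv() fails re-enters the loop with the same value of j that its parent had at the fork, so it forks a child of its own, and the chain of children never ends.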