CS360 Lecture notes -- Exec/Wait

Directory: /blugreen/homes/plank/cs360/notes/Exec

Lecture notes: http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Exec/lecture.html

Wait()

I'm not going to go over much about wait() in class. Read the man page (do 'man -s 2 wait') and the book. Basically, wait(int *statusp) waits for any child process to complete. When one does, wait() returns the pid of that child and fills in *statusp with information on how the child exited. You can use the macros in /usr/include/sys/wait.h (WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG and so on) to examine the status.

For example, look at the programs forkwait.c, forkwait2.c and forkwait3.c. They show examples of forking off a child, waiting for it to exit, and then examining statusp to see how it exited. They are all straightforward.

In forkwait3, the child must be killed with a signal, using the command "kill". For example:

UNIX> forkwait3 &
[1] 22326
UNIX> Child (22327) doing nothing until you kill it
Kill the child with 'kill -9 22327'  or just 'kill 22327'

Now, you can kill the child manually with "kill -9 22327", which sends it the "sure kill" signal (SIGKILL, signal number 9), or with "kill 22327", which sends it signal 15 (SIGTERM). Try both:

UNIX> forkwait3 &
[1] 22326
UNIX> Child (22327) doing nothing until you kill it
Kill the child with 'kill -9 22327'  or just 'kill 22327'

(hit return a few times)
UNIX> kill -9 22327
UNIX> Parent: Child done.
  Return value: 22327
  Status:       9
  WIFSTOPPED:   0
  WIFSIGNALED:  1
  WIFEXITED:    0
  WEXITSTATUS:  0
  WTERMSIG:     9
  WSTOPSIG:     0

Try again:

UNIX> forkwait3 &
[1] 22328
UNIX> Child (22329) doing nothing until you kill it
Kill the child with 'kill -9 22329'  or just 'kill 22329'

UNIX> kill 22329
UNIX> Parent: Child done.
  Return value: 22329
  Status:       15
  WIFSTOPPED:   0
  WIFSIGNALED:  1
  WIFEXITED:    0
  WEXITSTATUS:  0
  WTERMSIG:     15
  WSTOPSIG:     0

forkwait3a.c has the child generate a segmentation violation, and you'll see that the parent can recognize this as the child terminating with signal 11. We'll go over signals in detail in another lecture.

Ok. Now, look at forkwait4.c. What it does is have the child exit immediately, and have the parent wait 4 seconds, print out the output of the "ps x" command, and then have it call wait(). It should be clear that by the time the parent calls system("ps x"), the child has exited. Thus, we might expect there to be no listing in the "ps x" command for the child, and possibly that the wait() might wait forever, since the child is completed. However, this is not the case.

When a child exits, its process becomes a "zombie" until its parent process either dies or calls wait() for it. By a "zombie", we mean that it doesn't run and holds almost no resources (just its entry in the process table). The operating system maintains it only so that when the parent calls wait(), it will get the proper information. Look at the output of forkwait4:

UNIX> forkwait4
Child (1624) calling exit(4)
  PID TT STAT  TIME COMMAND
...
  381 p2 S     0:02 -sh (csh)
 1623 p2 S     0:00 forkwait4
 1624 p2 Z     0:00 
 1625 p2 S     0:00 sh -c ps x
 1626 p2 R     0:00 ps x
...

Parent: Child done.
  Return value: 1624
  Status:       1024
  WIFSTOPPED:   0
  WIFSIGNALED:  0
  WIFEXITED:    1
  WEXITSTATUS:  4
  WTERMSIG:     0
  WSTOPSIG:     4
UNIX> ps x
...
  381 p2 S     0:02 -sh (csh)
 1627 p2 R     0:00 ps x
...

Process 1624 is the zombie process, denoted in the "ps x" output with a capital Z. When forkwait4 (process 1623) calls wait(), the zombie process goes away.

What happens if the parent exits without calling wait()? Then the child zombie process transfers parentage to process 1 (/sbin/init). Since init calls wait() on its children, the zombie simply goes away.

wait() returns whenever a child exits. If a process has more than one child, then you can't force wait() to wait for a specific child. You simply get whichever child exits first. For example, see multichild.c. This program forks off 4 child processes and then calls wait() four times. The children sleep for a random period of time, and then exit. As you see, the first wait() call returns the first child to exit:

UNIX> multichild
Fork 0 returned 14160
Fork 1 returned 14161
Fork 2 returned 14162
Fork 3 returned 14163
Child 1 (14161) exiting
Wait returned 14161
Child 3 (14163) exiting
Wait returned 14163
Child 0 (14160) exiting
Wait returned 14160
Child 2 (14162) exiting
Wait returned 14162
UNIX>

Now, you can use waitpid() to wait for a specific process, and you can even have it return immediately if the specified process has not exited (the WNOHANG option). I personally think using waitpid() is usually bad form, and using the version that returns immediately is most certainly really bad form.

You will not be allowed to use any wait() variant besides wait() in your jsh lab. I will instruct the TAs to be ruthless if you call waitpid() with WNOHANG set.

Execve

You saw last lecture how we create new processes -- using fork(). In this lecture, we go over how to execute programs using execve().

Execve() is simple in concept:

int execve(char *path, char **argv, char **envp);

Execve() assumes that path is the name of an executable file. Argv is an array of null-terminated strings, such that the last element is NULL, and envp is another null-terminated array of null-terminated strings.

Execve() overwrites the current process so that it executes the file in path with the arguments in argv, and the environment variables in envp. Execve() does not return unless it encounters an error, such as the file in "path" not existing, or not being an executable file.

This may seem confusing. Why does execve() not return? Well, look at the example in exec2.c:

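#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv, char **envp)
{
  char *newargv[3];
  int i;

  /* Build the argument vector for cat: argv[0] is the program name. */
  newargv[0] = "cat";
  newargv[1] = "exec2.c";
  newargv[2] = NULL;

  /* If execve() succeeds, it never returns, so reaching perror() means failure. */
  i = execve("/bin/cat", newargv, envp);
  perror("exec2: execve() failed");
  exit(1);
}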

Suppose we compile this to the program exec2. Then we execute it with no arguments. When we get to the execve() call the state of memory is the following:

                      |----------------|
                      |                | 
                      | code for exec2 |
                      |                |
                      |                |
                      |----------------|
                      |                |
                      |   globals      |
                      |   for exec2    |
                      |                |
                      |----------------|
                      |                |
                      | heap for exec2 |
                      |                |
                      |----------------|
                      |                |
                            ....
                      |                |
                      |   stack        |
                      |   for exec2    |
                      |                |
                      |----------------|

Now, execve() is called. This is a system call that says "execute the program in /bin/cat" with the arguments "cat" "exec2.c". When execve() is done, the state of memory has been changed so that we are in the main() routine of cat, with argc and argv set properly:

                      |----------------|
                      |                | 
                      | code for cat   |
                      |                |
                      |                |
                      |----------------|
                      |                |
                      |   globals      |
                      |   for cat      |
                      |                |
                      |----------------|
                      |                |
                      | heap for cat   |
                      |                |
                      |----------------|
                      |                |
                            ....
                      |                |
                      |   stack        |
                      |   for cat      |
                      |                |
                      |----------------|

You'll notice that everything concerning exec2.c is gone. This is because the state of memory has been overwritten to run cat. There is no trace of exec2 left. This is why execve() cannot return if it is successful -- the state to which it might have returned has been overwritten. It is gone. When cat exits, the operating system simply destroys the process.

So how come when you execute cat in the shell it looks like it returns to the shell? This is because the shell calls fork - exec - wait.

There are six calls in the exec family (execve() and five variants) -- see the man page for execve(). I'll summarize them below:

execl(char *path, char *arg0, char *arg1, ..., char *argn, NULL): This uses your current envp, and lets you specify the argv as parameters, rather than building an array of pointers. Path must specify the path name exactly.
execv(char *path, char **argv): This is just like execve, but uses your current envp.
execle(char *path, char *arg0, char *arg1, ..., char *argn, NULL, char **envp): This is just like execl, but you must specify envp.
execve(char *path, char **argv, char **envp).
execlp(char *path, char *arg0, char *arg1, ..., char *argn, NULL): This is just like execl, except that if path does not contain a slash, then the directories in your PATH variable will be searched to find path.
execvp(char *path, char **argv): This is just like execv, except that if path does not contain a slash, the PATH variable will be searched to find path.

Of these, execvp() and execlp() are the most useful. All are implemented on top of execve(), though.

If execve() is unsuccessful (for example, there is no file with the name "path", or that file does not have the executable bit set), it will return with a value of -1. For example, look at exec1.c.

This program tries to execute "./cat", which does not exist. Thus, the execve() call fails, and the perror statement is executed.

This leads to:

DR. PLANK'S CARDINAL SIN OF EXEC

You should NEVER EVER EVER call execve() or any of its variants without testing the return value and exiting if it returns!!!!!!!

Why is this such a sin? Look at the execcatx.c programs.

execcat1.c forks off three processes that all exec "cat f1". Note the use of execvp(), which does not take an envp argument, and which searches the PATH variable to find "cat".

UNIX> execcat1
This is file f1
This is file f1
This is file f1
UNIX>

Now, execcat2.c substitutes execv() for execvp(). When you run it, ostensibly nothing happens:

UNIX> execcat2
UNIX>

What's going on? Well, the execv() call fails. This means that the execv call returns with i = -1, and then the child process continues. It too will go through the while loop and call fork(). To help illustrate what goes on, look at execcat3.c, which prints out the value of j and the pid of the process before every fork() call:

UNIX> execcat3
I am 4794.  j = 1
I am 4795.  j = 2
I am 4796.  j = 3
I am 4795.  j = 3
I am 4794.  j = 2
I am 4799.  j = 3
I am 4794.  j = 3
UNIX>

As you can see, fork() is called 7 times, not three, because the processes that failed the execv call continue in the while loop. This isn't bad for j = 3, but were the 3 a 10, then fork() would be called 1023 times. (i.e. fork gets called 2^n-1 times if the 3 were an n). This can be devastating. Fix the error by checking the return value of execv() as in execcat4.c:

UNIX> execcat4
execcat4: No such file or directory
execcat4: No such file or directory
execcat4: No such file or directory
UNIX>