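The operating system (OS) has three kinds of data structures for files: file descriptors, file table entries, and vnodes. Each file descriptor points to a file table entry, each file table entry holds its own seek pointer and points to a vnode, and there is one vnode per file.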
So, look at the following program:
    #include <fcntl.h>

    int main()
    {
      int fd1, fd2;
      fd1 = open("file1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      fd2 = open("file1", O_WRONLY);
      return 0;
    }

Now, what has happened? The OS has created two file table entries, one for each open() call, but only one vnode. This is because there is only one file. Both file table entries point to the same vnode, but they each have different seek pointers. Thus, if we expand the above program into: (This is file fs1.c)
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main()
    {
      int fd1, fd2;
      fd1 = open("file1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      fd2 = open("file1", O_WRONLY);
      write(fd1, "Jim\n", strlen("Jim\n"));
      write(fd2, "Plank\n", strlen("Plank\n"));
      close(fd1);
      close(fd2);
      return 0;
    }

Then what will happen? Well, the first write() call will write the string "Jim\n" into file1. Then the second write() call will overwrite it with "Plank\n". This is because each fd points to its own file table entry, which has its own lseek pointer, and thus the first write() does not update the lseek pointer of fd2. It is still 0 when the second write() happens.
To make this clearer, fs1a.c prints out the values of each fd's seek pointer at each step of the program. (The third argument to lseek(), 1, is SEEK_CUR, so lseek(fd, 0, 1) returns the current seek pointer without moving it.) As you can see, even though the two fd's are for the same file, since they each have their own file table entry, they each have their own seek pointer:
    UNIX> fs1
    UNIX> cat file1
    Plank
    UNIX> fs1a
    Before writing Jim: lseek(fd1, 0, 1): 0. lseek(fd2, 0, 1): 0
    Before writing Plank: lseek(fd1, 0, 1): 4. lseek(fd2, 0, 1): 0
    After writing Plank: lseek(fd1, 0, 1): 4. lseek(fd2, 0, 1): 6
    UNIX> cat file1
    Plank
    UNIX>
Look at an alteration of fs1.c (in fs2.c). Instead of calling open() to initialize fd2, it calls dup(fd1). Thus, after the first write, the lseek pointer of fd2 has been updated to reflect the write to fd1 -- this is because the two file descriptors point to the same file table entry.
Thus, after running fs2.c, the file "file2" should say "Jim\nPlank\n". Like fs1a.c, fs2a.c prints out the lseek pointers of fd1 and fd2 at each step. As you can see, the write() to fd1 updates the lseek pointer for fd2:
    UNIX> fs2
    UNIX> cat file2
    Jim
    Plank
    UNIX> fs2a
    Before writing Jim: lseek(fd1, 0, 1): 0. lseek(fd2, 0, 1): 0
    Before writing Plank: lseek(fd1, 0, 1): 4. lseek(fd2, 0, 1): 4
    After writing Plank: lseek(fd1, 0, 1): 10. lseek(fd2, 0, 1): 10
    UNIX> cat file2
    Jim
    Plank
    UNIX>
Now, when fork() is called, ALL FILE DESCRIPTORS ARE DUPLICATED, AS IF dup() WERE CALLED. Thus, look at the following program (fs3.c):
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main()
    {
      char s[1000];
      int i, fd;
      fd = open("file3", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      i = fork();
      sprintf(s, "fork() = %d.\n", i);
      write(fd, s, strlen(s));
      return 0;
    }

What should happen? Well, whichever process gets control of the CPU first after the fork() will write s to file3. Then the other process will append its string s to file3. For example:
    UNIX> fs3
    UNIX> cat file3
    fork() = 0.
    fork() = 22107.
    UNIX> fs3
    UNIX> cat file3
    fork() = 0.
    fork() = 22110.
    UNIX> fs3
    UNIX> cat file3
    fork() = 22113.
    fork() = 0.
    UNIX>

Now, this is because the file descriptor fd is duplicated across fork() calls. Were it not duplicated, but instead re-opened, then one write() would overwrite the other.
Perhaps you're thinking, ``He opened a file and then called fork(). Does he have to worry about that buffer copying problem in the last lab?'' The answer is no, because I'm using write(), which is a system call, and there is no buffering. You have to worry about the buffering problem when the standard I/O library is being used, and the buffer is not empty when fork() is called. For example, look at fs3a.c and fs3b.c. They use fprintf() instead of write(). When I call them, I get the following:
    UNIX> fs3a
    UNIX> cat file3
    fork() = 0.
    fork() = 3716.
    UNIX> fs3b
    UNIX> cat file3
    This is file3
    fork() = 3719.
    This is file3
    fork() = 0.
    UNIX>

Do you see where the copied buffer is a problem? Make sure you can explain this phenomenon.
    int dup2(int fd1, int fd2)

With dup2(), fd2 specifies the desired value of the new descriptor. If descriptor fd2 is already in use, it is first deallocated as if it were closed by close(2V).

Dup2() is most often used so that you can redirect standard input or output. When you call dup2(fd, 0) and the dup2() call is successful, then whenever your program reads from standard input, it will read from fd. Similarly, when you call dup2(fd, 1) and the dup2() call is successful, then whenever your program writes to standard output, it will write to fd.
For example, look at dup2ex.c. This opens the file file4 for writing, and then uses dup2() to redirect standard output to that file. When it's done, you'll see that everything has gone into file4:
    UNIX> dup2ex
    UNIX> cat file4
    Standard output now goes to file4
    It goes even after we closed file descriptor 3
    putchar works
    And fwrite
    And write
    UNIX>

Why did I make the fflush() call in dup2ex.c? Take it out and see. Make sure that you can explain this.
Now, suppose you want to execute, for example, "cat < f1 > f2" by calling fork(), exec() and dup2() instead of doing it from the shell. You can do this in catf1f2.c. This opens f1 for reading on stdin (fd 0), and f2 for writing on stdout (fd 1).
Study this program closely, because you will find it greatly helpful in the jsh lab.
Note: not all properties of the process would change across an exec call. In other words, the new process inherits a number of properties from the calling process:
Obviously, here we care the most about open file descriptors. While we say that exec replaces the old process with a new one, most open file descriptors remain open. Care to venture where all the above information is stored?