CS360 Lecture notes -- Dup

  • Jim Plank
  • Directory: /blugreen/homes/plank/cs360/notes/Dup
  • Lecture notes: http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Dup/lecture.html

    Open files

    First, read section 3.10 in the book. This discusses the various ways you can share open files. Basically, what it says is the following:

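    The operating system (OS) has three kinds of data structures for files:

      • File descriptors. Each process has its own table of file descriptors, and each open descriptor points to a file table entry.
      • File table entries. One is created by each open() call. It holds the seek pointer (the current offset) and points to a vnode.
      • Vnodes. There is one for each physical file that is open, and it describes the file itself.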

    The difference between a vnode and an inode is where it's located and when it's valid. Inodes are located on disk and are always valid because they contain information that is always needed such as ownership and protection. Vnodes are located in the operating system's memory, and only exist when a file is opened. However, just one vnode exists for every physical file that is opened.

    So, look at the following program:

    #include <fcntl.h>    /* open() and the O_* flags */
    
    int main(void)
    {
      int fd1, fd2;
    
      fd1 = open("file1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      fd2 = open("file1", O_WRONLY);
    
      return 0;
    }
    
    Now, what has happened? The OS has created two file table entries, one for each open() call, but only one vnode. This is because there is only one file. Both file table entries point to the same vnode, but they each have different seek pointers. Thus, if we expand the above program into: (This is file fs1.c)
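    #include <fcntl.h>    /* open() and the O_* flags */
    #include <string.h>   /* strlen() */
    #include <unistd.h>   /* write(), close() */
    
    int main(void)
    {
      int fd1, fd2;
    
      fd1 = open("file1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      fd2 = open("file1", O_WRONLY);
    
      write(fd1, "Jim\n", strlen("Jim\n"));
      write(fd2, "Plank\n", strlen("Plank\n"));
    
      close(fd1);
      close(fd2);
      return 0;
    }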
    
    Then what will happen? Well, the first write() call will write the string "Jim\n" into file1. Then the second write() call will overwrite it with "Plank\n". This is because each fd points to its own file table entry, which has its own lseek pointer, so the first write() does not update the lseek pointer of fd2.

    To make this more clear, fs1a.c prints out the values of each fd's seek pointer at each step of the program. As you can see, even though the two fd's are for the same file, since they each have their own file table entry, they each have their own seek pointer:

    UNIX> fs1
    UNIX> cat file1
    Plank
    UNIX> fs1a
    Before writing Jim:   lseek(fd1, 0, 1): 0.  lseek(fd2, 0, 1): 0
    Before writing Plank: lseek(fd1, 0, 1): 4.  lseek(fd2, 0, 1): 0
    After writing Plank:  lseek(fd1, 0, 1): 4.  lseek(fd2, 0, 1): 6
    UNIX> cat file1
    Plank
    UNIX>
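
    The file fs1a.c itself is not reproduced in these notes. As a rough sketch of what such a program might do, the function below opens one file twice and prints each descriptor's offset at the same three points as the run above. The names two_opens_demo() and show_offsets() are made up for this illustration and are not taken from fs1a.c.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: print the current offset of both descriptors. */
static void show_offsets(const char *when, int fd1, int fd2)
{
  printf("%s lseek(fd1, 0, 1): %ld.  lseek(fd2, 0, 1): %ld\n", when,
         (long) lseek(fd1, 0, SEEK_CUR), (long) lseek(fd2, 0, SEEK_CUR));
}

/* Open path twice, write through each descriptor, and return the final
   offset of fd2, or -1 on error.  Each open() gets its own file table
   entry, so the two offsets move independently. */
long two_opens_demo(const char *path)
{
  int fd1, fd2;
  long end = -1;

  fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd1 < 0) return -1;
  fd2 = open(path, O_WRONLY);
  if (fd2 < 0) { close(fd1); return -1; }

  show_offsets("Before writing Jim:  ", fd1, fd2);
  if (write(fd1, "Jim\n", strlen("Jim\n")) == 4) {
    show_offsets("Before writing Plank:", fd1, fd2);
    if (write(fd2, "Plank\n", strlen("Plank\n")) == 6) {
      show_offsets("After writing Plank: ", fd1, fd2);
      end = (long) lseek(fd2, 0, SEEK_CUR);
    }
  }

  close(fd1);
  close(fd2);
  return end;
}
```

    Calling two_opens_demo("file1") from main() should leave only "Plank\n" in the file, because fd2 starts writing at offset 0 no matter what fd1 has already written.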
    

    Dup()

    Now, the system call dup(int fd) duplicates a file descriptor fd. What this does is return a second file descriptor that points to the same file table entry as fd does. So now you can treat the two file descriptors as identical.

    Look at an alteration of fs1.c (in fs2.c). Instead of calling open() to initialize fd2, it calls dup(fd1). Thus, after the first write, the lseek pointer of fd2 has been updated to reflect the write to fd1 -- this is because the two file descriptors point to the same file table entry.

    Thus, after running fs2.c, the file "file2" should say "Jim\nPlank\n". Like fs1a.c, fs2a.c prints out the lseek pointers of fd1 and fd2 at each step. As you can see, the write() to fd1 updates the lseek pointer for fd2:

    UNIX> fs2
    UNIX> cat file2
    Jim
    Plank
    UNIX> fs2a
    Before writing Jim:   lseek(fd1, 0, 1): 0.  lseek(fd2, 0, 1): 0
    Before writing Plank: lseek(fd1, 0, 1): 4.  lseek(fd2, 0, 1): 4
    After writing Plank:  lseek(fd1, 0, 1): 10.  lseek(fd2, 0, 1): 10
    UNIX> cat file2 
    Jim
    Plank
    UNIX> 
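
    To see the shared file table entry directly, here is a small sketch in the spirit of fs2a.c (whose source is not shown in these notes). The function name dup_demo() and its mid parameter are invented for the example. It writes through fd1, samples the offset through fd2, then writes through fd2 and samples the offset through fd1.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "Jim\n" via fd1 and "Plank\n" via a dup() of fd1.
   Stores the offset seen through fd2 after the first write in *mid
   and returns the offset seen through fd1 after the second write,
   or -1 on error. */
long dup_demo(const char *path, long *mid)
{
  int fd1, fd2;
  long end = -1;

  fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd1 < 0) return -1;
  fd2 = dup(fd1);               /* same file table entry as fd1 */
  if (fd2 < 0) { close(fd1); return -1; }

  if (write(fd1, "Jim\n", strlen("Jim\n")) == 4) {
    /* fd2 was never written to, yet its offset has moved. */
    *mid = (long) lseek(fd2, 0, SEEK_CUR);
    if (write(fd2, "Plank\n", strlen("Plank\n")) == 6)
      end = (long) lseek(fd1, 0, SEEK_CUR);
  }

  close(fd1);
  close(fd2);
  return end;
}
```

    With a fresh file, *mid comes back as 4 and the return value as 10, matching the fs2a output above, and the file holds both lines.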
    

    Now, when fork() is called, ALL FILE DESCRIPTORS ARE DUPLICATED, AS IF dup() WERE CALLED. Thus, look at the following program (fs3.c):

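    #include <fcntl.h>    /* open() and the O_* flags */
    #include <stdio.h>    /* sprintf() */
    #include <string.h>   /* strlen() */
    #include <unistd.h>   /* fork(), write() */
    
    int main(void)
    {
      char s[1000];
      int i, fd;
    
      fd = open("file3", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    
      i = fork();
      sprintf(s, "fork() = %d.\n", i);
      write(fd, s, strlen(s));
      return 0;
    }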
    
    What should happen? Well, whichever process gets control of the CPU first after the fork() will write s to file3. Then the other process will append its string s to file3. For example:
    UNIX> fs3
    UNIX> cat file3
    fork() = 0.
    fork() = 22107.
    UNIX> fs3
    UNIX> cat file3
    fork() = 0.
    fork() = 22110.
    UNIX> fs3
    UNIX> cat file3
    fork() = 22113.
    fork() = 0.
    UNIX> 
    
    Now, this is because the file descriptor fd is duplicated across fork() calls. Were it not duplicated, but instead re-opened, then one write() would overwrite the other.
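
    The sketch below is one way to check this claim. It differs from fs3.c in two ways that are my own choices: the strings are fixed instead of containing the return value of fork(), and the parent waits for the child before writing so that the order is always the same. The function name fork_shared_offset_demo() is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open path, fork, and let the child and then the parent each write one
   line through the same inherited descriptor.  Returns the parent's
   offset after its own write, or -1 on error. */
long fork_shared_offset_demo(const char *path)
{
  int fd, status;
  pid_t pid;
  long end = -1;

  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return -1;

  pid = fork();
  if (pid < 0) { close(fd); return -1; }

  if (pid == 0) {
    /* Child: fd was duplicated by fork(), as if by dup(). */
    ssize_t n = write(fd, "child\n", strlen("child\n"));
    _exit(n == 6 ? 0 : 1);
  }

  /* Parent: wait first so the output order is fixed (fs3.c does not). */
  if (waitpid(pid, &status, 0) == pid &&
      write(fd, "parent\n", strlen("parent\n")) == 7)
    end = (long) lseek(fd, 0, SEEK_CUR);

  close(fd);
  return end;
}
```

    The parent never writes the child's six bytes itself, yet its own line lands after them and its offset ends at 13. That only works because both processes share one file table entry.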

    Perhaps you're thinking, "He opened a file and then called fork(). Does he have to worry about that buffer copying problem in the last lab?" The answer is no, because I'm using write(), which is a system call, and there is no buffering. You have to worry about the buffering problem when the standard I/O library is being used, and the buffer is not empty when fork() is called. For example, look at fs3a.c and fs3b.c. They use fprintf() instead of write(). When I call them, I get the following:

    UNIX> fs3a
    UNIX> cat file3
    fork() = 0.
    fork() = 3716.
    UNIX> fs3b
    UNIX> cat file3
    This is file3
    fork() = 3719.
    This is file3
    fork() = 0.
    UNIX>
    
    Do you see where the copied buffer is a problem? Make sure you can explain this phenomenon.

    Dup2()

    Dup2() is a system call that duplicates an open file descriptor onto a file descriptor number that you choose.
    int dup2(int fd1, int fd2)
    
         With dup2(), fd2 specifies the  desired  value  of  the  new
         descriptor.   If  descriptor  fd2  is  already in use, it is
         first deallocated as if it were closed by close(2V).
    
    Dup2() is most often used so that you can redirect standard input or output. When you call dup2(fd, 0) and the dup2() call is successful, then whenever your program reads from standard input, it will read from fd. Similarly, when you call dup2(fd, 1) and the dup2() call is successful, then whenever your program writes to standard output, it will write to fd.
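
    As a minimal sketch of the redirection just described, the function below points file descriptor 1 at a file, writes a message to fd 1 with write(), and then puts the original standard output back. The name redirect_stdout_demo() and the save-and-restore step are assumptions made for the example. It deliberately sticks to write() and does not use the standard I/O library.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Redirect fd 1 to path, write msg to fd 1, then restore fd 1.
   Returns 0 on success and -1 on error. */
int redirect_stdout_demo(const char *path, const char *msg)
{
  int fd, saved, ok;

  saved = dup(1);               /* remember where fd 1 pointed */
  if (saved < 0) return -1;

  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) { close(saved); return -1; }

  if (dup2(fd, 1) < 0) { close(fd); close(saved); return -1; }
  close(fd);                    /* fd 1 still refers to the file */

  ok = write(1, msg, strlen(msg)) == (ssize_t) strlen(msg);

  dup2(saved, 1);               /* put the old standard output back */
  close(saved);
  return ok ? 0 : -1;
}
```

    Note that closing fd right after dup2() does not undo the redirection, because fd 1 now points to the same file table entry.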

    For example, look at dup2ex.c. This opens the file file4 for writing, and then uses dup2 to redirect standard output to that file. When it's done, you'll see that everything has gone into file4:

    UNIX> dup2ex
    UNIX> cat file4
    Standard output now goes to file4
    It goes even after we closed file descriptor 3
    putchar works
    And fwrite
    And write
    UNIX>
    
    Why did I make the fflush() call in dup2ex.c? Take it out and see. Make sure that you can explain this.

    Now, suppose you want to execute, for example, "cat < f1 > f2" by calling fork(), exec() and dup2() instead of doing it from the shell. You can do this in catf1f2.c. This opens f1 for reading on stdin (fd 0), and f2 for writing on stdout (fd 1).

    Study this program closely, because you will find it greatly helpful in the jsh lab.
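
    Since catf1f2.c is not listed in these notes, here is a hedged sketch of the same idea rather than a copy of that file. The function name run_cat() and its two path parameters are made up for the illustration, and it assumes a cat program can be found on the PATH. The child sets up fd 0 and fd 1 with dup2() and then calls execlp(), and the redirection survives because those descriptors stay open across the exec.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "cat < in_path > out_path" in a child process.
   Returns the exit status of cat, or -1 on error. */
int run_cat(const char *in_path, const char *out_path)
{
  int fd, status;
  pid_t pid;

  pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {
    /* Child: make fd 0 refer to the input file. */
    fd = open(in_path, O_RDONLY);
    if (fd < 0) _exit(1);
    if (fd != 0) {
      if (dup2(fd, 0) != 0) _exit(1);
      close(fd);
    }

    /* Child: make fd 1 refer to the output file. */
    fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) _exit(1);
    if (fd != 1) {
      if (dup2(fd, 1) != 1) _exit(1);
      close(fd);
    }

    execlp("cat", "cat", (char *) NULL);
    _exit(127);                 /* only reached if the exec failed */
  }

  /* Parent: its own fd 0 and fd 1 are untouched. */
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Doing the redirection after fork() and before exec is the design choice that matters here: only the child's descriptors are changed, so the calling process keeps its own standard input and output.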


    Note: not every property of a process changes across an exec call. The new program inherits a number of properties from the calling process, including its process ID, its current working directory, and its open file descriptors.

    Obviously, here we care the most about open file descriptors. Although we say that exec replaces the old program with a new one, open file descriptors remain open unless they have been marked close-on-exec. Care to venture where all of the above information is stored?