CS360 Lecture notes -- Dup

  • James S. Plank
  • Directory: /home/plank/cs360/notes/Dup
  • Lecture notes: http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Dup/lecture.html
  • Original Notes: Sometime in the 1990's.
  • Last Modification: Tue Mar 29 13:01:36 EDT 2016

    Open files

    The operating system (OS) has four kinds of data structures for files:

      • File descriptors
      • File table entries
      • Vnodes
      • Inodes

    The difference between a vnode and an inode is where each is located and when each is valid. Inodes are located on disk and are always valid because they contain information that is always needed, such as ownership and protection. Vnodes are located in the operating system's memory, and exist only while a file is open. However, just one vnode exists for each physical file that is open, no matter how many times it has been opened.

    So, look at the following program (I don't have this program in a file -- it is the beginning of fs1.c).

    #include <fcntl.h>

    int main()
    {
      int fd1, fd2;

      /* Two separate open() calls on the same file. */
      fd1 = open("file1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      fd2 = open("file1", O_WRONLY);

      return 0;
    }
    

    What has happened? The OS has created two file table entries, one for each open() call, but only one vnode. This is because there is only one file. Both file table entries point to the same vnode, but they each have different seek pointers. The state of the system is as pictured:

    Now, suppose we write to these file descriptors: (This is file fs1.c)

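    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main()
    {
      int fd1, fd2;

      fd1 = open("file1", O_WRONLY | O_CREAT | O_TRUNC, 0644);
      fd2 = open("file1", O_WRONLY);

      /* Each descriptor writes at its own seek pointer, and both start at 0. */
      write(fd1, "Jim\n", strlen("Jim\n"));
      write(fd2, "Plank\n", strlen("Plank\n"));

      close(fd1);
      close(fd2);
      return 0;
    }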
    

    Then what will happen? The first write() call writes the string "Jim\n" into file1. The second write() call also starts at offset 0, so it overwrites that string with "Plank\n". This is because each fd points to its own file table entry, which has its own seek pointer, and thus the first write() does not update the seek pointer of fd2.

    To make this clearer, fs1a.c prints out the value of each fd's seek pointer at each step of the program. As you can see, even though the two fd's are for the same file, they each have their own file table entry, and therefore their own seek pointer:

    UNIX> fs1
    UNIX> cat file1
    Plank
    UNIX> fs1a
    Before writing Jim:   lseek(fd1, 0, 1): 0.  lseek(fd2, 0, 1): 0
    Before writing Plank: lseek(fd1, 0, 1): 4.  lseek(fd2, 0, 1): 0
    After writing Plank:  lseek(fd1, 0, 1): 4.  lseek(fd2, 0, 1): 6
    UNIX> cat file1
    Plank
    UNIX>
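
Here is a minimal sketch of the same experiment as a function. It is not the code of fs1a.c; the name offset_after_second_open() is made up for illustration. Note that the 1 in the lseek(fd, 0, 1) calls above is the value of SEEK_CUR, so those calls move nothing and simply report the current offset.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from fs1a.c): opens `path` twice, writes "Jim\n"
   through the first descriptor, and returns the second descriptor's offset.
   Returns -1 on error. */
long offset_after_second_open(const char *path)
{
  int fd1, fd2;
  long off2;

  fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd1 < 0) return -1;

  fd2 = open(path, O_WRONLY);          /* a second, separate file table entry */
  if (fd2 < 0) { close(fd1); return -1; }

  off2 = -1;
  if (write(fd1, "Jim\n", strlen("Jim\n")) == 4) {
    /* lseek(fd, 0, SEEK_CUR) moves nothing; it just reports the offset. */
    off2 = (long) lseek(fd2, 0, SEEK_CUR);
  }

  close(fd1);
  close(fd2);
  return off2;
}
```

Calling offset_after_second_open("file1") should return 0, matching the middle line of the fs1a output: fd1 has moved to offset 4, but fd2 has not moved.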
    

    Dup()

    The system call dup(int fd) duplicates the file descriptor fd. It returns a second file descriptor that points to the same file table entry as fd does. The two file descriptors therefore share one seek pointer, and you can treat them as identical.

    Look at an alteration of fs1.c (in fs2.c). Instead of calling open() to initialize fd2, it calls dup(fd1). Thus, after the first write, the lseek pointer of fd2 has been updated to reflect the write to fd1 -- this is because the two file descriptors point to the same file table entry.

    Thus, after running fs2.c, the file "file2" should say "Jim\nPlank\n". Like fs1a.c, fs2a.c prints out the lseek pointers of fd1 and fd2 at each step. As you can see, the write() to fd1 updates the lseek pointer for fd2:

    UNIX> fs2
    UNIX> cat file2
    Jim
    Plank
    UNIX> fs2a
    Before writing Jim:   lseek(fd1, 0, 1): 0.  lseek(fd2, 0, 1): 0
    Before writing Plank: lseek(fd1, 0, 1): 4.  lseek(fd2, 0, 1): 4
    After writing Plank:  lseek(fd1, 0, 1): 10.  lseek(fd2, 0, 1): 10
    UNIX> cat file2 
    Jim
    Plank
    UNIX> 
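
The same check as before, with dup() in place of the second open(), can be sketched as follows. Again, this is not the code of fs2a.c, and offset_after_dup() is a made-up name.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from fs2a.c): opens `path`, duplicates the
   descriptor with dup(), writes "Jim\n" through the original, and returns
   the offset reported by the duplicate.  Returns -1 on error. */
long offset_after_dup(const char *path)
{
  int fd1, fd2;
  long off2;

  fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd1 < 0) return -1;

  fd2 = dup(fd1);                      /* same file table entry as fd1 */
  if (fd2 < 0) { close(fd1); return -1; }

  off2 = -1;
  if (write(fd1, "Jim\n", strlen("Jim\n")) == 4) {
    off2 = (long) lseek(fd2, 0, SEEK_CUR);
  }

  close(fd1);
  close(fd2);
  return off2;
}
```

Here the result should be 4 rather than 0, because the write() through fd1 advanced the one seek pointer that both descriptors share.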
    

    When fork() is called, ALL FILE DESCRIPTORS ARE DUPLICATED, AS IF dup() WERE CALLED. For example, look at the following program (fs3.c):

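    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main()
    {
      char s[1000];
      int i, fd;

      fd = open("file3", O_WRONLY | O_CREAT | O_TRUNC, 0644);

      /* Parent and child both run the code below, sharing fd's seek pointer. */
      i = fork();
      sprintf(s, "fork() = %d.\n", i);
      write(fd, s, strlen(s));
      close(fd);
      return 0;
    }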
    

    What should happen? Well, whichever process gets control of the CPU first after the fork() will write s to file3. Then the other process will append its string s to file3. For example:

    UNIX> fs3
    UNIX> cat file3
    fork() = 0.
    fork() = 22107.
    UNIX> fs3
    UNIX> cat file3
    fork() = 0.
    fork() = 22110.
    UNIX> fs3
    UNIX> cat file3
    fork() = 22113.
    fork() = 0.
    UNIX> 
    
    This is because the file descriptor fd is duplicated across fork() calls. Were it not duplicated, but instead re-opened, then one write() would overwrite the other.
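
One way to check this claim without reading the file by eye is to compare the file's final size with the combined length of the two strings. The sketch below does that. It is a variation on fs3.c, not part of the lecture directory, and fork_shares_offset() is a made-up name. It uses _exit() in the child so that the child does nothing but its one write().

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the parent's and child's writes both
   survive in `path`, 0 if one overwrote the other, -1 on error. */
int fork_shares_offset(const char *path)
{
  char s[100];
  int fd, n, ok, status;
  pid_t pid;
  struct stat st;

  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return -1;

  pid = fork();
  if (pid < 0) { close(fd); return -1; }

  /* Both processes run this code; pid is 0 in the child. */
  n = snprintf(s, sizeof(s), "fork() = %d.\n", (int) pid);
  ok = (write(fd, s, n) == n);
  close(fd);

  if (pid == 0) _exit(ok ? 0 : 1);     /* the child is done */

  if (waitpid(pid, &status, 0) < 0 || status != 0 || !ok) return -1;
  if (stat(path, &st) < 0) return -1;

  /* The child wrote "fork() = 0.\n".  If neither write overwrote the
     other, the file holds the bytes of both strings. */
  return st.st_size == (off_t) (n + (int) strlen("fork() = 0.\n"));
}
```

If the two processes had separate seek pointers, both writes would start at offset 0 and the file would only be as long as the longer string.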

    Perhaps you're thinking, "He opened a file and then called fork(). Does he have to worry about that buffer copying problem that was in the Fork Lecture notes?" The answer is no, because I'm using write(), which is a system call, and there is no buffering. You have to worry about the buffering problem when the standard I/O library is being used, and the buffer is not empty when fork() is called. For example, look at fs3a.c and fs3b.c. They use fprintf() instead of write(). When I call them, I get the following:

    UNIX> fs3a
    UNIX> cat file3
    fork() = 0.
    fork() = 3716.
    UNIX> fs3b
    UNIX> cat file3
    This is file3
    fork() = 3719.
    This is file3
    fork() = 0.
    UNIX>
    
    Do you see where the copied buffer is a problem? Make sure you can explain this phenomenon.
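
The sketch below reproduces the effect in a form you can experiment with. It assumes that fs3b.c prints a line with fprintf() before calling fork(), which is what its output suggests, but it is not the code of fs3b.c. The name count_headers_after_fork() and its flush_before_fork parameter are made up for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: writes a header line with fprintf(), optionally
   flushes, forks, and has both processes print one more line.  Returns how
   many times the header ends up in the file, or -1 on error. */
int count_headers_after_fork(const char *path, int flush_before_fork)
{
  FILE *f;
  char line[200];
  pid_t pid;
  int status, count;

  f = fopen(path, "w");
  if (f == NULL) return -1;

  fprintf(f, "This is file3\n");       /* sits in f's buffer for now */
  if (flush_before_fork) fflush(f);    /* empty the buffer before it is copied */

  pid = fork();
  if (pid < 0) { fclose(f); return -1; }

  fprintf(f, "fork() = %d.\n", (int) pid);
  fclose(f);                           /* each process flushes its own copy */

  if (pid == 0) _exit(0);
  if (waitpid(pid, &status, 0) < 0) return -1;

  /* Count the header lines that reached the file. */
  f = fopen(path, "r");
  if (f == NULL) return -1;
  count = 0;
  while (fgets(line, sizeof(line), f) != NULL) {
    if (strcmp(line, "This is file3\n") == 0) count++;
  }
  fclose(f);
  return count;
}
```

With flush_before_fork set to 0, the header is still in the buffer when fork() copies the process, so each process later writes its own copy. With it set to 1, the buffer is empty at the time of the fork() and the header appears once.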

    Dup2()

    Dup2() is a system call that duplicates an open file descriptor onto a specific file descriptor number that you choose.
    int dup2(int fd1, int fd2)
    
         With dup2(), fd2 specifies the  desired  value  of  the  new
         descriptor.   If  descriptor  fd2  is  already in use, it is
         first deallocated as if it were closed by close(2V).
    
    Dup2() is most often used so that you can redirect standard input or output. When you call dup2(fd, 0) and the dup2() call is successful, then whenever your program reads from standard input, it will read from fd. Similarly, when you call dup2(fd, 1) and the dup2() call is successful, then whenever your program writes to standard output, it will write to fd.
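
The open-then-dup2 pattern can be wrapped in a small function, sketched below. The name redirect_fd() is made up for illustration and is not a library call. The one subtle case is when open() happens to return the target descriptor itself: then there is nothing to duplicate, and closing the descriptor would undo the redirection.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` with `flags` and makes it available as
   file descriptor `target`.  Returns target on success, -1 on error. */
int redirect_fd(const char *path, int flags, int target)
{
  int fd;

  fd = open(path, flags, 0644);        /* the mode only matters with O_CREAT */
  if (fd < 0) return -1;

  if (fd != target) {
    if (dup2(fd, target) < 0) { close(fd); return -1; }
    close(fd);                         /* target still refers to the open file */
  }
  return target;
}
```

For example, redirect_fd("file4", O_WRONLY | O_CREAT | O_TRUNC, 1) would send standard output to file4, and redirect_fd("f1", O_RDONLY, 0) would make standard input read from f1.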

    For example, look at dup2ex.c. This opens the file file4 for writing, and then uses dup2 to redirect standard output to that file. It then writes to standard output in a variety of ways:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main()
    {
      int fd;
      char *s;

      fd = open("file4", O_WRONLY | O_CREAT | O_TRUNC, 0666);
      if (fd < 0) { perror("file4"); exit(1); }

      /* From here on, file descriptor 1 refers to file4. */
      if (dup2(fd, 1) < 0) { perror("dup2"); exit(1); }

      printf("Standard output now goes to file4\n");

      close(fd);

      printf("It goes even after we closed file descriptor %d\n", fd);

      putchar('p'); putchar('u'); putchar('t'); putchar('c'); putchar('h');
      putchar('a'); putchar('r'); putchar(' '); putchar('w'); putchar('o');
      putchar('r'); putchar('k'); putchar('s'); putchar('\n');

      s = "And fwrite\n";

      fwrite(s, sizeof(char), strlen(s), stdout);

      fflush(stdout);

      s = "And write\n";
      write(1, s, strlen(s));
      return 0;
    }
    

    When it's done, you'll see that everything has gone into file4:

    UNIX> dup2ex
    UNIX> cat file4
    Standard output now goes to file4
    It goes even after we closed file descriptor 3
    putchar works
    And fwrite
    And write
    UNIX>
    
    Why did I make the fflush() call in dup2ex.c? Take it out and see. Make sure that you can explain this.

    The program dup2ex2.c puts a printf() statement at the beginning of the program:

    main()
    {
      int fd;
      char *s;
    
      printf("Printing this before we open anything.\n");
    
      /* Same as before.... */
    

    I'm doing this again to highlight buffering in the standard I/O library. If we run this program with standard output going to the screen, that first printf() statement goes to the screen, and the remainder to file4. This is because the standard I/O library flushes the buffer after every newline when the output is to the screen:

    UNIX> dup2ex2
    Printing this before we open anything.
    UNIX> cat file4
    Standard output now goes to file4
    It goes even after we closed file descriptor 3
    putchar works
    And fwrite
    And write
    UNIX> 
    
    However, if we redirect standard output to a file, then the standard I/O library does not flush the buffer until the buffer is full, fflush() is called, or the program exits. For that reason, the first string does not go to file5, but instead stays in the buffer. The buffer doesn't get flushed until after the dup2() call, so all of the output goes to file4, and file5 is left empty:
    UNIX> dup2ex2 > file5
    UNIX> cat file5
    UNIX> cat file4
    Printing this before we open anything.
    Standard output now goes to file4
    It goes even after we closed file descriptor 3
    putchar works
    And fwrite
    And write
    UNIX> 
    
    I've been over this buffering phenomenon a lot. Make sure you understand it.
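
If you want buffered output to go where it was headed before a redirection, flush the buffer before calling dup2(). The sketch below shows this, together with one way to undo the redirection afterward by saving the old descriptor with dup(). The function print_to_file() is made up for illustration; it is not in the lecture directory.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: prints `msg` into `path` through standard output,
   then restores the original standard output.  Returns the number of bytes
   that landed in the file, or -1 on error. */
long print_to_file(const char *path, const char *msg)
{
  int fd, saved;
  long nbytes;

  fflush(stdout);                 /* buffered output goes to the old destination */

  saved = dup(1);                 /* remember where standard output went */
  if (saved < 0) return -1;

  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) { close(saved); return -1; }
  if (dup2(fd, 1) != 1) { close(fd); close(saved); return -1; }
  close(fd);

  printf("%s", msg);
  fflush(stdout);                 /* push msg into the file before switching back */
  nbytes = (long) lseek(1, 0, SEEK_CUR);

  dup2(saved, 1);                 /* put the original standard output back */
  close(saved);
  return nbytes;
}
```

Without the first fflush(), anything still sitting in the buffer would be written to the file along with msg, which is the dup2ex2 behavior shown above.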

    You'll want to read this for jsh1

    Now, suppose you want to execute, for example, "cat < f1 > f2" by calling fork(), exec() and dup2() instead of doing it from the shell. You can do this in catf1f2.c. This opens f1 for reading on stdin (fd 0), and f2 for writing on stdout (fd 1).

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv, char **envp)
    {
      int fd1, fd2;
      int dummy;
      char *newargv[2];

      if (fork() == 0) {
        /* Child: make f1 standard input. */
        fd1 = open("f1", O_RDONLY);
        if (fd1 < 0) {
          perror("catf1f2: f1");
          exit(1);
        }

        if (dup2(fd1, 0) != 0) {
          perror("catf1f2: dup2(f1, 0)");
          exit(1);
        }
        close(fd1);

        /* Child: make f2 standard output. */
        fd2 = open("f2", O_WRONLY | O_TRUNC | O_CREAT, 0644);
        if (fd2 < 0) {
          perror("catf1f2: f2");
          exit(2);
        }

        if (dup2(fd2, 1) != 1) {
          perror("catf1f2: dup2(f2, 1)");
          exit(1);
        }
        close(fd2);

        newargv[0] = "cat";
        newargv[1] = (char *) 0;

        /* execve() only returns if it fails. */
        execve("/bin/cat", newargv, envp);
        perror("execve(/bin/cat, newargv, envp)");
        exit(1);
      } else {
        wait(&dummy);
      }
      return 0;
    }
    

    Study this program closely, because you will find it greatly helpful in the jsh lab.
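
As a starting point for generalizing catf1f2.c, here is a sketch that takes the program and file names as parameters instead of hard-coding them. The names run_redirected() and redirect_or_die() are made up for illustration. The sketch also swaps execve() for execv(), which passes the current environment along, and it returns the child's exit status to the caller.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` and makes it file descriptor `target`.
   It is only called in the child, so it exits on failure. */
static void redirect_or_die(const char *path, int flags, int target)
{
  int fd;

  fd = open(path, flags, 0644);
  if (fd < 0) { perror(path); _exit(1); }
  if (fd != target) {
    if (dup2(fd, target) != target) { perror("dup2"); _exit(1); }
    close(fd);
  }
}

/* Hypothetical helper: runs argv[0] with standard input taken from infile
   and standard output sent to outfile.  Returns the child's exit status,
   or -1 if the child could not be created or did not exit normally. */
int run_redirected(char *const argv[], const char *infile, const char *outfile)
{
  int status;
  pid_t pid;

  pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {
    redirect_or_die(infile, O_RDONLY, 0);
    redirect_or_die(outfile, O_WRONLY | O_TRUNC | O_CREAT, 1);
    execv(argv[0], argv);
    perror(argv[0]);              /* only reached if execv() fails */
    _exit(1);
  }

  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argv of { "/bin/cat", NULL }, calling run_redirected(argv, "f1", "f2") has the same effect as catf1f2.c. All of the redirection happens in the child, after fork() and before execv(), so the parent's own standard input and output are untouched.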