CS360 Lab #4 -- Jtar

  • Jian Huang
  • CS360

  • So, this is your first real systems program. You will be writing a pared-down version of tar(1). It will help you to read the man page for tar, and try using it a bit. Use ``tar cvf tarfile files'' to create a tar file, and ``tar xpofv tarfile'' to unpack a tar file. Jtar will work like ``tar cf - files'' and ``tar xpof -''.

    Description of jtar

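    Your job is to write the program jtar. Like tar, jtar can be called in one of two ways:

      • jtar c files, which writes a tar file holding the named files and directories to standard output.
      • jtar x, which reads a tar file from standard input and recreates its contents.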

    Examples

    Suppose we have an example user barkley, and suppose he cd's to the directory ~plank/cs360/labs/lab3, and does the following.
    UNIX> cd ~plank/cs360/labs/lab3
    UNIX> pwd
    /home/plank/cs360/labs/lab3
    UNIX> ls -l
    total 18
    -rw-r--r--  2 plank          27 Sep  9 11:48 f.c
    -rw-r--r--  2 plank          36 Sep  9 11:48 f.fm
    -rw-r--r--  2 plank          55 Sep  9 11:49 f.h
    -rw-r--r--  2 plank          69 Sep  9 11:48 f1.c
    -rw-r--r--  2 plank          69 Sep  9 11:48 f2.c
    -rw-r--r--  2 plank       10061 Sep  9 14:18 lab3.html
    -rw-r--r--  2 plank         490 Sep  9 15:09 makefile
    -rw-r--r--  2 plank         871 Sep  9 11:49 mysort.c
    -rw-r--r--  2 plank         155 Sep  9 12:03 mysort.fm
    UNIX> jtar c . > ~/tarfile
    UNIX> cd
    UNIX> ls -l tarfile
    -rw-r--r--  1 barkley     13303 Sep 25 11:25 tarfile
    
    What he has done is create a tarfile which holds the stuff in my directory cs360/labs/lab3. Now, he can recreate those files in a directory of his own:
    UNIX> pwd
    /home/barkley
    UNIX> mkdir notes
    UNIX> cd notes
    UNIX> ls -l
    total 0
    UNIX> jtar x < ../tarfile
    UNIX> ls -l
    total 18
    -rw-r--r--  1 barkley        27 Sep  9 11:48 f.c
    -rw-r--r--  1 barkley        36 Sep  9 11:48 f.fm
    -rw-r--r--  1 barkley        55 Sep  9 11:49 f.h
    -rw-r--r--  1 barkley        69 Sep  9 11:48 f1.c
    -rw-r--r--  1 barkley        69 Sep  9 11:48 f2.c
    -rw-r--r--  1 barkley     10061 Sep  9 14:18 lab3.html
    -rw-r--r--  1 barkley       490 Sep  9 15:09 makefile
    -rw-r--r--  1 barkley       871 Sep  9 11:49 mysort.c
    -rw-r--r--  1 barkley       155 Sep  9 12:03 mysort.fm
    UNIX> 
    
    Note, all the files are recreated with the same protection modes, and the same modification times. They also have the same access times (as you would be able to see with ls -lu). Jtar also saves hard links and directories, as well as the contents of directories. However, jtar ignores soft links and all other non-regular files. Thus, for example, suppose that barkley cd's to the directory /home/plank/cs360/labs/lab4/d1, and does the following:
    UNIX> cd /home/plank/cs360/labs/lab4/d1
    UNIX> pwd
    /home/plank/cs360/labs/lab4/d1
    UNIX> ls -l
    total 4
    -rw-r--r--   2 plank          11 Feb 20  1995 f2
    -rw-r--r--   2 plank          11 Feb 20  1995 f2-hard-link
    lrwxrwxrwx   1 root            2 Aug 23 12:42 f2-soft-link -> f2
    dr-xr-xr-x   2 plank         512 Feb 20  1995 sub_dir
    UNIX> ls -l sub_dir
    total 1
    -r--r--r--   1 plank          11 Feb 20  1995 f1
    UNIX> jtar c sub_dir/f1 > ~/tf2
    UNIX> jtar c . > ~/tf3
    UNIX> cd
    UNIX> ls -l tf*
    -rw-r--r--   1 barkley       161 Sep 12 12:11 tf2
    -rw-r--r--   1 barkley       762 Sep 12 12:11 tf3
    
    So, what's going on? The directory has 3 files and a subdirectory. F2 and f2-hard-link are the same file (i.e. both links to the same file), and f2-soft-link is a soft link to f2. Sub_dir is a subdirectory with one file. Note the protection modes of both sub_dir and f1.

    Now, note what happens when barkley extracts tf2:

    UNIX> cd
    UNIX> pwd
    /home/barkley
    UNIX> mkdir ex1
    UNIX> cd ex1
    UNIX> jtar x < ../tf2
    UNIX> ls -l
    total 1
    drwxr-xr-x   2 barkley       512 Sep 12 12:12 sub_dir
    UNIX> ls -l sub_dir
    total 1
    -r--r--r--   1 barkley        11 Feb 20  1995 f1
    UNIX>
    
    Jtar remade the file sub_dir/f1, and in doing so, had to create the directory sub_dir. It did so with the default protection. Why didn't it remake sub_dir with its original protection? Because that's not in the tar file (look again and see how tf2 was made).

    Now, note what happens when barkley extracts tf3:

    UNIX> cd
    UNIX> pwd
    /home/barkley
    UNIX> mkdir ex2
    UNIX> cd ex2
    UNIX> jtar x < ../tf3
    UNIX> ls -l
    total 3
    -rw-r--r--   2 barkley        11 Feb 20  1995 f2
    -rw-r--r--   2 barkley        11 Feb 20  1995 f2-hard-link
    dr-xr-xr-x   2 barkley       512 Feb 20  1995 sub_dir
    UNIX> ls -l sub_dir
    total 1
    -r--r--r--   1 barkley        11 Feb 20  1995 f1
    
    Jtar saved all files reachable from /home/plank/cs360/labs/lab4/d1/. Note that f2 and f2-hard-link point to the same file, and that the soft link f2-soft-link was ignored. The directory sub_dir and its file f1 were both recreated. The protection modes and file access times of all files were restored.
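
    One possible way to restore protection modes and times is sketched below. It assumes that the struct stat of each file was saved in the tar file when it was created. The helper name restore_metadata is hypothetical; chmod() and utime() are the standard calls.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <utime.h>

/* Hypothetical helper: copy the access time, modification time and
   protection mode from a saved stat record onto `path`.
   Returns 0 on success, -1 on error. */
int restore_metadata(const char *path, const struct stat *saved)
{
    struct utimbuf times;

    times.actime = saved->st_atime;      /* access time (ls -lu) */
    times.modtime = saved->st_mtime;     /* modification time (ls -l) */

    if (utime(path, &times) < 0) return -1;

    /* Keep only the permission bits, not the file-type bits. */
    if (chmod(path, saved->st_mode & 07777) < 0) return -1;

    return 0;
}
```

    For a directory such as sub_dir above, this would have to happen after its files have been created: creating f1 changes the directory's modification time, and a directory with mode dr-xr-xr-x does not allow new files to be added.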


    The cv and xv options

    Jtar should also support the options cv and xv in place of c and x. This should work just as before, only jtar should print on standard error the directories and files that are saved, along with their sizes as it does so. For example:
    UNIX> pwd
    /home/plank/cs360/labs/lab4/d1
    UNIX> jtar cv . > ~/tf3
    Directory .
    Directory ./sub_dir
    File ./sub_dir/f1    11 bytes
    File ./f2    11 bytes
    File ./f2-hard-link link to ./f2
    Ignoring Soft Link ./f2-soft-link
    UNIX> cd
    UNIX> cd ex2
    UNIX> pwd
    /home/barkley/ex2
    UNIX> jtar xv < ../tf3
    Directory: .
    Directory: ./sub_dir
    File: ./sub_dir/f1    11 bytes
    File: ./f2    11 bytes
    Link: ./f2-hard-link to ./f2
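
    Which message gets printed depends on the kind of file being looked at. A minimal sketch of that decision follows, assuming lstat() is used so that soft links are reported as links rather than followed. The enum and the function name classify are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical categories for the entries jtar runs into. */
enum entry_kind {
    ENTRY_ERROR,        /* lstat() failed */
    ENTRY_FILE,         /* regular file: saved */
    ENTRY_DIR,          /* directory: saved, then traversed */
    ENTRY_SOFT_LINK,    /* soft link: ignored */
    ENTRY_OTHER         /* any other non-regular file: ignored */
};

enum entry_kind classify(const char *path)
{
    struct stat sb;

    /* lstat() describes the link itself; stat() would follow it. */
    if (lstat(path, &sb) < 0) return ENTRY_ERROR;

    if (S_ISLNK(sb.st_mode)) return ENTRY_SOFT_LINK;
    if (S_ISDIR(sb.st_mode)) return ENTRY_DIR;
    if (S_ISREG(sb.st_mode)) return ENTRY_FILE;
    return ENTRY_OTHER;
}
```

    In the transcript above, ./f2-soft-link would come back as ENTRY_SOFT_LINK, which is where the "Ignoring Soft Link" message could be printed on standard error.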
    

    Duplicates

    If duplicate files are somehow specified, jtar should be able to recognize that, and only save one copy. For example:
    UNIX> pwd
    /home/plank/cs360/labs/lab4/d1
    UNIX> jtar c . > ~/tf3
    UNIX> jtar c . . . f2 > ~/tf4
    UNIX> cd
    UNIX> ls -l tf3 tf4
    -rw-r--r--   1 plank         762 Sep 12 12:15 tf3
    -rw-r--r--   1 plank         904 Sep 12 12:15 tf4
    UNIX>
    
    Note that tf4 is a little bigger than tf3, but not 3 times bigger. That is because it has no extra files: the extra files specified are all duplicates, and thus need not be included in the tar file. (Their names are still there, hence the extra size. As it turns out, you don't even need to have the names there.)

    The simplest way to do this is to make use of the library call realpath() (read the man page). You should maintain an rb-tree of real path names, and whenever you discover a duplicate, you can ignore it. (Note, though, that you still need an rb-tree for inodes so that you can recognize hard links.)


    A word of warning

    Always extract files into a clean directory. Do not extract files into your working directory, because if you have a bug, you may trash your files. I've heard of students trashing their home directories while working on this program. Be careful not to let that happen to you. Saving full path names or following .. in this assignment can also do this. Be careful.
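
    One defensive measure, sketched here under the assumption that every name stored in the tar file should be relative to the extraction directory, is to refuse any name that is absolute or that contains a ".." component. The helper name path_is_safe is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `path` is non-empty, relative,
   and has no ".." component; returns 0 otherwise. */
int path_is_safe(const char *path)
{
    const char *p = path;

    if (path == NULL || *path == '\0') return 0;
    if (path[0] == '/') return 0;                /* full path name */

    while (*p != '\0') {
        /* Measure the component that starts at p. */
        const char *end = strchr(p, '/');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

        if (len == 2 && p[0] == '.' && p[1] == '.') return 0;
        if (end == NULL) break;                  /* last component */
        p = end + 1;
    }
    return 1;
}
```

    Checking each name with something like this before creating anything means a buggy tar file cannot write outside the clean directory.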

    Also

    No use of the system() command is allowed. You must make use of the system calls that Unix provides for you (note that "system call" is different from the system() procedure call).

    Strategy

    So, this is a nontrivial program. The following is the strategy that I used for writing jtar. You may want to use a different strategy if you find that easier; however, this one has the benefit of creating jtar in small pieces, each of which may be tested before going on.