CS360 Lecture notes -- Files, Links and Inodes


Files

Files store information on secondary storage. This means that the information should persist even when the computer that stored it is powered off. This is as opposed to primary storage (main memory), whose contents exist only while the computer is powered up and are lost when it is powered down.

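When you create a file in Unix, quite a few things happen. In this lecture, we are going to focus on three components of a file in Unix: the bytes of the file itself, the metadata of the file (which lives in an inode), and the links that name the file in directories.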

Let's do a simple example:
UNIX> echo "This is f1.txt" > f1.txt
UNIX> ls -lai
total 20
497584377 drwxr-xr-x  2 plank guest   38 Feb  3 13:54 .
430430681 drwxr-xr-x 51 plank guest 4096 Feb  3  2014 ..
497584369 -rw-r--r--  1 plank loci    15 Feb  3 13:54 f1.txt
497584451 -rw-r--r--  1 plank guest 9896 Feb  3 13:44 lecture.html
UNIX>
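We created a file called f1.txt, and that places three things on disk: the bytes of the file, an inode (number 497584369) that holds the file's metadata, and an entry in the current directory that maps the name "f1.txt" to that inode number. We can associate a picture with this example:

[Figure: the directory entry for f1.txt, inode 497584369, and the disk block that holds the file's bytes.]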

You can see the inode for f1.txt (497584369), and how it points to block 0x4ab, which contains the bytes of the file (you don't have access to this information unless you are the operating system -- I just made up the number 0x4ab for the sake of the example). I have included the information in the directory, which is itself a file on disk, and the inode for that file. See how everything links together?

Also, you will note that I haven't put a null character at the end of the string in the disk block. That's because there is no null character -- that is only there when you are using a string inside a C program. When you write it to disk, there is no null character.
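
If you want to check this for yourself, you can count the bytes that actually land on disk. The following is a minimal sketch, assuming a POSIX shell with mktemp, wc, tr and od available; the scratch directory and the variable names are made up for illustration.

```sh
#!/bin/sh
# Sketch: count the bytes that echo writes to disk (scratch paths are hypothetical).
dir=$(mktemp -d)                       # scratch directory, so nothing real is touched
echo "This is f1.txt" > "$dir/f1.txt"

# wc -c counts bytes: 14 characters plus the newline that echo appends.
nbytes=$(wc -c < "$dir/f1.txt" | tr -d ' ')

# Keep only NUL bytes and count them; a text file written this way has none.
nuls=$(tr -cd '\000' < "$dir/f1.txt" | wc -c | tr -d ' ')

echo "bytes on disk: $nbytes, NUL bytes: $nuls"
od -c "$dir/f1.txt"                    # dump every byte so you can inspect the end of the file

rm -rf "$dir"
```

The byte count matches the size that ls reported for f1.txt above.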

When you give the -i flag to ls, it will tell you the inode number, as in the example above.

To use Unix lingo, the way we name a file is by attaching a "link" to the inode. Links are stored in "directories" -- each entry in a directory maps the name of a link to the number of the inode that points to the file. Again, you can see that in the example above.

We can have more than one link point to a file. Suppose we are in a fresh directory, and we have created the file f1 to contain the bytes "This is f1\n". Moreover, suppose that file has an inode number of 34778. And now we do the following:

UNIX> ln f1 f2
This says to create another link to the file f1, and call it "f2". That link is really an entry in the directory that maps "f2" to inode 34778. What we have now are two pointers to the same metadata and the same bytes on disk. When we do a listing:
UNIX> ls -li f1 f2
34778  -rw-r--r--  2 plank          11 Sep 16 10:12 f1
34778  -rw-r--r--  2 plank          11 Sep 16 10:12 f2
UNIX> cat f1
This is f1
UNIX> cat f2
This is f1
UNIX> 
We see that the files are exactly the same, except that the links have different names. If we change the file through either link (for example, by editing f2 with vi and changing the word "This" to "That"), then the change is seen through both f1 and f2, because both names point to the same bytes on disk:
UNIX> vi f2
...
UNIX> cat f2
That is f1
UNIX> cat f1
That is f1
UNIX> ls -li f1 f2
34778  -rw-r--r--  2 plank          11 Sep 16 10:14 f1
34778  -rw-r--r--  2 plank          11 Sep 16 10:14 f2
UNIX>
Note that even though we only modified f2, the file modification time for f1 has changed as well. That is because the modification time is stored in the inode, which both links share; thus, a change made through f2 is seen through f1 as well. The same goes for file protection modes. If we change the protection for f1, then we will see the change in f2:
UNIX> chmod 0400 f1
UNIX> ls -li f1 f2
34778  -r--------  2 plank          11 Sep 16 10:14 f1
34778  -r--------  2 plank          11 Sep 16 10:14 f2
UNIX> 
Note the third column of the ls command. It is the number of links to the file. If we make another link to f1, then this column will be updated:
UNIX> ln f1 f3
UNIX> ls -li f1 f2 f3
34778  -r--------  3 plank          11 Sep 16 10:14 f1
34778  -r--------  3 plank          11 Sep 16 10:14 f2
34778  -r--------  3 plank          11 Sep 16 10:14 f3
When we use the "rm" command, we are actually removing links. E.g.
UNIX> chmod 0644 f1
UNIX> rm f1
UNIX> ls -li f*
34778  -rw-r--r--  2 plank          11 Sep 16 10:14 f2
34778  -rw-r--r--  2 plank          11 Sep 16 10:14 f3
UNIX> 
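
The bookkeeping in the transcripts above can also be reproduced in a script. This is only a sketch, assuming a POSIX shell and a filesystem that supports hard links; inode_of and links_of are hypothetical helper names that pick single fields out of ls output.

```sh
#!/bin/sh
# Sketch: hard links share one inode, and rm only removes a name.
dir=$(mktemp -d)
echo "This is f1" > "$dir/f1"
ln "$dir/f1" "$dir/f2"                 # second name for the same inode
ln "$dir/f1" "$dir/f3"                 # third name

# Hypothetical helpers: first field of "ls -i" is the inode number,
# second field of "ls -l" is the link count.
inode_of() { ls -i "$1" | awk '{print $1}'; }
links_of() { ls -l "$1" | awk '{print $2}'; }

ino1=$(inode_of "$dir/f1")
ino2=$(inode_of "$dir/f2")
before=$(links_of "$dir/f2")           # three names point at the inode
rm "$dir/f1"                           # drops one name, not the data
after=$(links_of "$dir/f2")            # two names remain
contents=$(cat "$dir/f3")              # the bytes are still reachable through f3

echo "inodes: $ino1 $ino2  links: $before -> $after  f3: $contents"
rm -rf "$dir"
```
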
When the last link to a file is removed, the file itself, inode and all, is deleted. As long as at least one link points to a file, however, the file remains. It is interesting to see what happens when files with links are overwritten. For example, suppose I do the following:
UNIX> cat > f2
This is now file f2
^D
UNIX> cat f2
This is now file f2
UNIX> cat f3
This is now file f2
By saying you want to redirect output to the file f2, you end up changing f3. This means that when the shell performs output redirection, it opens the file and truncates it, instead of removing the file and creating it anew.

Instead, suppose you do:

UNIX> gcc -o f2 ../Stat/src/ls1.c
UNIX> ls -li f*
34779  -rwxr-xr-x  1 plank       24576 Sep 16 10:16 f2
34778  -rw-r--r--  1 plank          20 Sep 16 10:16 f3
UNIX> 
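You'll note that the C compiler gcc did the equivalent of "rm f2" before creating f2 as an executable: f2 now has a new inode number and a link count of 1, while f3 still has the old inode and the old bytes.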
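
The two behaviors can be compared side by side without involving a compiler. The sketch below assumes a POSIX shell; it imitates "remove, then create" with rm followed by a redirection, which stands in for what gcc appears to do. The helper inode_of is a hypothetical name.

```sh
#!/bin/sh
# Sketch: overwrite in place versus remove-and-recreate, seen through a second link.
dir=$(mktemp -d)
inode_of() { ls -i "$1" | awk '{print $1}'; }   # hypothetical helper

echo "original" > "$dir/f2"
ln "$dir/f2" "$dir/f3"
ino_start=$(inode_of "$dir/f2")

# Case 1: redirection truncates and rewrites the existing inode.
echo "overwritten in place" > "$dir/f2"
ino_trunc=$(inode_of "$dir/f2")
f3_after_trunc=$(cat "$dir/f3")        # f3 sees the new bytes

# Case 2: remove the name first, then create a fresh file under that name.
rm "$dir/f2"
echo "brand new file" > "$dir/f2"
ino_new=$(inode_of "$dir/f2")
f3_after_rm=$(cat "$dir/f3")           # f3 keeps the bytes it had before the rm

echo "start=$ino_start truncated=$ino_trunc recreated=$ino_new"
echo "f3 after truncate: $f3_after_trunc"
echo "f3 after rm+create: $f3_after_rm"
rm -rf "$dir"
```

The recreated f2 has to get a different inode number, because the old inode is still in use through f3.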

All directories have at least 2 links:

UNIX> mkdir test
UNIX> ls -li | grep test
34800  drwxr-xr-x  2 plank         512 Sep 16 10:17 test
UNIX> 
This is because every directory contains two entries, "." and "..". The first is a link to the directory itself, and the second is a link to the parent directory. Thus, there are two links to the directory file "test": "test" and "test/.". Similarly, suppose we make a subdirectory of test:
UNIX> mkdir test/sub
UNIX> ls -li | grep test
34800  drwxr-xr-x  3 plank         512 Sep 16 10:17 test
UNIX> 
Now there are three links to "test": "test", "test/.", and "test/sub/.."
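
You can confirm that these three names really are the same directory by comparing inode numbers. This is a sketch assuming a POSIX shell; inode_of is a hypothetical helper. It compares inode numbers rather than relying on the link count, since I would not assume that every filesystem reports directory link counts the same way.

```sh
#!/bin/sh
# Sketch: "test", "test/." and "test/sub/.." all name the same inode.
dir=$(mktemp -d)
mkdir -p "$dir/test/sub"

# Hypothetical helper: -d lists the directory itself instead of its contents.
inode_of() { ls -di "$1" | awk '{print $1}'; }

ino_test=$(inode_of "$dir/test")
ino_dot=$(inode_of "$dir/test/.")
ino_dotdot=$(inode_of "$dir/test/sub/..")

echo "test=$ino_test  test/.=$ino_dot  test/sub/..=$ino_dotdot"
ls -ld "$dir/test"                     # the second column is the link count
rm -rf "$dir"
```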

Apart from these links, which are created for you automatically, you cannot manually create hard links to directories. Instead, there is a special kind of link called a "symbolic link" (also called a "soft link"), which you make using the command "ln -s". For example, we can create a soft link to the test directory as follows:

UNIX> ln -s test test-soft
UNIX> ls -li | grep test
34800  drwxr-xr-x  3 plank         512 Sep 16 10:17 test
34801  lrwxrwxrwx  1 plank           4 Sep 16 10:18 test-soft -> test
UNIX>
Note that soft links are listed differently: the mode string starts with "l", and the listing shows the path that the link points to. Moreover, note that creating a soft link to "test" doesn't update the link count in test's inode. That count only records regular, or "hard", links.
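
The same observations can be checked in a script. This sketch assumes a POSIX shell plus the readlink utility; inode_of and links_of are hypothetical helpers built on ls.

```sh
#!/bin/sh
# Sketch: a soft link has its own inode and leaves the target's link count alone.
dir=$(mktemp -d)
mkdir "$dir/test"

# Hypothetical helpers; with -d, ls reports on the link itself rather than following it.
inode_of() { ls -di "$1" | awk '{print $1}'; }
links_of() { ls -ld "$1" | awk '{print $2}'; }

links_before=$(links_of "$dir/test")
ln -s test "$dir/test-soft"            # the stored path is relative to the link's own directory
links_after=$(links_of "$dir/test")

ino_dir=$(inode_of "$dir/test")
ino_link=$(inode_of "$dir/test-soft")
target=$(readlink "$dir/test-soft")    # the path stored inside the soft link

echo "link count: $links_before -> $links_after"
echo "inode of test: $ino_dir, inode of test-soft: $ino_link, target: $target"
rm -rf "$dir"
```

The stored path "test" is four bytes long, which is where the size of 4 in the listing above comes from.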

A soft link is a way of pointing to a file without changing the file's inode. However, soft links can do pretty much everything that hard links can do:

UNIX> cat > f1
This is f1
UNIX> ln -s f1 f2
UNIX> cat f2
This is f1
UNIX> cat > f2
This is f2
UNIX> cat f1
This is f2
UNIX> ls -l f*
-rw-r--r--  1 plank          11 Sep 16 10:19 f1
lrwxrwxrwx  1 plank           2 Sep 16 10:18 f2 -> f1
UNIX> chmod 0600 f2
UNIX> ls -l f*
-rw-------  1 plank          11 Sep 16 10:19 f1
lrwxrwxrwx  1 plank           2 Sep 16 10:18 f2 -> f1
UNIX> 
What is the main difference between hard and soft links then? Well, if the file to which the soft link points gets deleted or moved, then the link becomes unusable:
UNIX> rm f1
UNIX> ls -l f*
lrwxrwxrwx  1 plank           2 Sep 16 10:18 f2 -> f1
UNIX> cat f2
cat: f2: No such file or directory
UNIX> 
The link is called "unresolved."
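
A script can detect an unresolved link with the shell's file tests: -L is true when the path itself is a soft link, while -e follows the link and is true only when the target exists. The following is a sketch assuming a POSIX shell; link_state is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: classify a path as a resolving soft link, an unresolved one, or neither.
link_state() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "unresolved"              # the link exists, its target does not
    elif [ -L "$1" ]; then
        echo "resolves"
    else
        echo "not a soft link"
    fi
}

dir=$(mktemp -d)
echo "This is f1" > "$dir/f1"
ln -s f1 "$dir/f2"

state_before=$(link_state "$dir/f2")
rm "$dir/f1"                           # remove the target; the soft link stays behind
state_after=$(link_state "$dir/f2")

echo "before rm: $state_before, after rm: $state_after"
rm -rf "$dir"
```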


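In Unix, a hard link is just a directory entry holding an inode number, and inode numbers are only meaningful within one filesystem. Therefore, you cannot make a hard link in one filesystem to a file in another filesystem. I.e., from your student accounts, you cannot run a command such as: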
UNIX> ln /home/jplank/cs360/notes/Links/lecture.html ~/lecture.html
because your home directory is not on the same filesystem as mine. However, you can make a soft link:
UNIX> ln -s /home/jplank/cs360/notes/Links/lecture.html ~/lecture.html
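
If you don't know in advance whether two paths live on the same filesystem, one option is to try the hard link first and fall back to a soft link when it fails. The following is only a sketch, assuming a POSIX shell; link_or_symlink is a hypothetical helper, and it treats any failure of ln as a reason to fall back, which is cruder than checking for the cross-filesystem error specifically.

```sh
#!/bin/sh
# Sketch: prefer a hard link, fall back to a soft link (hypothetical helper).
# Pass an absolute source path: a relative path stored in a soft link is
# resolved relative to the link's directory, not the current directory.
link_or_symlink() {
    src=$1
    dst=$2
    if ln "$src" "$dst" 2>/dev/null; then
        echo "hard"
    elif ln -s "$src" "$dst"; then
        echo "soft"
    else
        return 1
    fi
}

dir=$(mktemp -d)                       # mktemp prints an absolute path
echo "some notes" > "$dir/lecture.html"
kind=$(link_or_symlink "$dir/lecture.html" "$dir/copy.html")
contents=$(cat "$dir/copy.html")

echo "made a $kind link; contents: $contents"
rm -rf "$dir"
```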