Manipulating the File System with Perl


  1. These notes largely summarize the Files and Directories chapter from Perl in 24 Hours by Clinton Pierce, although they contain additional material as well.


    Overview

    Perl is often used by system administrators to install software packages and to perform other types of file manipulation. These notes provide a brief overview of Perl's file manipulation commands. For basic directions on how to open, close, and read files, see perl2.


    Directories

    You've already had a brief introduction in the perl2 notes to reading/writing and opening/closing files in Perl. Sometimes you will want to perform an operation on either all of the files in a directory or a certain class of files, such as those ending with the suffix .html. You can determine which files are in a directory in one of two ways: 1) use glob, or 2) use opendir/closedir/readdir.

    Glob

    The glob operator can be used to obtain either all the names in a directory or the names of a specified class of files in a directory. Its formal syntax is:

    glob 'pattern'
         or
    <pattern> # no quotes are needed
    
    where pattern may be a C-shell type of file pattern. For example:

    @files = glob '*.html';
    @perlfiles = <*.perl>;

    Note that this type of pattern is not as powerful or flexible as regular expressions. A pattern may be composed from names, partial names, and the operators shown in the following table:

    pattern | meaning | example
    ? | matches any single character, similar to . in regular expressions | brad? matches any five-character filename that starts with brad.
    * | matches zero or more characters, similar to .* in regular expressions. A leading . in a filename must be matched explicitly. | *.perl matches all files with the suffix .perl.
    [chars] | matches any one of the characters between the brackets. Ranges like [3-6] and [a-z] are allowed. Shorthand notation like \d does not work. | *[0-9]* matches any filename that contains a numeric digit.
    {a1,a2,...,an} | matches any of the strings between the braces | *.{perl,html} matches any file with either the suffix .perl or the suffix .html. Note that there is no space between the comma and the alternative strings.
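    As a rough illustration of the patterns in the table, the following sketch creates a scratch directory containing a few made-up file names and globs them. The file names, the variable names, and the use of File::Temp for the scratch directory are assumptions made for the example only.

```perl
use strict;
use warnings;
use Cwd;
use File::Temp qw(tempdir);

# Scratch directory with made-up file names; removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(brad1 brady bradley a.perl b.html c.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

my $start = cwd();
chdir $dir or die "Cannot chdir to $dir: $!";

my @five   = glob 'brad?';          # exactly one character after "brad"
my @perl   = <*.perl>;              # angle-bracket form of glob
my @either = glob '*.{perl,html}';  # either suffix
my @digit  = glob '*[0-9]*';        # names that contain a digit

chdir $start or die "Cannot chdir back to $start: $!";

print "brad?         : @five\n";
print "*.perl        : @perl\n";
print "*.{perl,html} : @either\n";
print "*[0-9]*       : @digit\n";
```

    Note that bradley is not matched by brad?, because ? stands for exactly one character.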

    opendir/closedir/readdir

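    opendir opens a directory and associates it with a directory handle that can be used by readdir to access the filenames in the directory. opendir itself returns true or false depending on whether or not it succeeded. The formal syntax for these commands is: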

    • opendir directory_handle, directory_name: The directory handle typically consists of upper case letters, just like file handles.
    • readdir directory_handle
    • closedir(directory_handle)
    readdir will return a single file name when used in a scalar context and a list of all the remaining file names when used in a list context. The filenames are returned in whatever order the file system stores them, which is not necessarily alphabetical, so sort the list if the order matters. As an example, the following sample code prints a list of the files in the perl directory, thus roughly mimicking the UNIX ls -a command:
         opendir(MY_DIR, 'perl') || die "Cannot open perl: $!";
         # defined() keeps a file named "0" from ending the loop early
         while (defined($filename = readdir(MY_DIR))) {
           print "$filename\n";
         }
         closedir(MY_DIR);
         
         sample output
         .
         ..
         #data#
         #perl2.html#
         #perlsys.html#
         .#data
         .#perlsys.html
         add
         catinput.perl
         data
         data~
         g.perl
         g.perl~
         ...
         
    Notice that readdir does not return the full pathname of the file.
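    Because readdir returns bare names, a program that wants to open or test the files usually has to prepend the directory name itself. The following is a minimal sketch of that idea; the scratch directory, the file names, and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with made-up entries; removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(g.perl data add)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

opendir(MY_DIR, $dir) || die "Cannot open $dir: $!";
# Drop the . and .. entries and sort, since readdir's order is not guaranteed.
my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir(MY_DIR);
closedir(MY_DIR);

# Prepend the directory so the names can be used from any current directory.
my @paths = map { "$dir/$_" } @names;
print "$_\n" for @paths;
```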

    You can use the grep function to narrow the files returned by readdir to a particular category of files. grep takes a regular expression and a list of names and returns a sublist with all names that match the regular expression. For example, to find all .html files in the perl directory I could write:

         opendir(MY_DIR, 'perl') || die "Cannot open perl: $!";
         @html_files = grep(/^.+\.html$/, readdir(MY_DIR));
         closedir(MY_DIR);
         

    Changing Directories

    chdir newdir: changes to a new directory. For example:

    chdir '/home/bvz/cs594'; # changes to my cs594 directory.
    

    Things to note about chdir:

    1. it returns false if it cannot change to the directory (e.g., you do not have permission to access the directory or the directory does not exist).

    2. if you do not give chdir a directory name then it changes to your home directory.

    3. chdir affects only the Perl program itself. When the program exits, your shell is still in the directory from which you ran the program.

    4. if you want to only temporarily change to the new directory and then move back to the existing directory you will need to save the name of the existing directory. To do this put the following use statement at the beginning of your program:
           use Cwd;
           
      and then use the cwd command to return the name of your current directory:
           $save_dir = cwd;
           
      The following code will change to my cs594 directory and then change back to the current directory:
           use Cwd;
           $save_dir = cwd;
           chdir '/home/bvz/cs594' or die "Cannot chdir to cs594: $!";
           chdir $save_dir or die "Cannot chdir back to $save_dir: $!";
           
      The use statement imports the functions that the Cwd module exports by default, including cwd. We will not be covering modules in this course so do not worry about how modules get constructed.
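    The save-and-restore pattern from note 4 can be sketched as a complete script. The target directory here is a scratch directory created with File::Temp, which is an assumption standing in for a real directory such as cs594; the variable names are likewise made up.

```perl
use strict;
use warnings;
use Cwd;
use File::Temp qw(tempdir);

my $target   = tempdir(CLEANUP => 1);  # hypothetical stand-in for a real directory
my $save_dir = cwd();                  # remember where we started

chdir $target or die "Cannot chdir to $target: $!";
my $inside = cwd();                    # work in the new directory would go here
print "Now in $inside\n";

chdir $save_dir or die "Cannot chdir back to $save_dir: $!";
my $after = cwd();
print "Back in $after\n";
```

    Checking the return value of each chdir matters here: if the first chdir fails silently, any later file operations run in the wrong directory.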

    Creating and Removing Directories

    mkdir 'directory_name', permissions
    rmdir 'pathname'
    
    Notes:

    1. You can give a pathname to mkdir but the entire path up to directory_name must exist. For example, if you try to mkdir 'grades/bvz' and grades does not already exist, then mkdir will return false.
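    2. mkdir and rmdir return true or false based on whether or not they succeed. rmdir succeeds only if the directory is empty.
    3. The permissions are UNIX-style permissions written as octal numbers, and they are further restricted by your umask. For example "mkdir 'bvz', 0755" creates a directory that gives me read/write/search permission and the rest of the world read/search permission. A directory needs the execute (search) bit in order to be entered, so a mode such as 0644, which would be reasonable for a file, is usually not what you want for a directory.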
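    The notes above can be tried out with a short script. This is only a sketch: the scratch directory comes from File::Temp, and the directory names grades and bvz are reused from note 1 purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);   # scratch area for the example

# Fails: the parent directory "grades" does not exist yet.
my $nested_ok = mkdir("$base/grades/bvz", 0755);
print "nested mkdir failed: $!\n" unless $nested_ok;

# Works when each level is created in turn.
mkdir("$base/grades", 0755)     or die "mkdir grades: $!";
mkdir("$base/grades/bvz", 0755) or die "mkdir grades/bvz: $!";

# rmdir only removes empty directories, so this attempt fails.
my $parent_first = rmdir("$base/grades");
print "rmdir of non-empty directory failed: $!\n" unless $parent_first;

# Removing the child first lets the parent be removed afterwards.
rmdir("$base/grades/bvz") or die "rmdir grades/bvz: $!";
rmdir("$base/grades")     or die "rmdir grades: $!";
```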


    Removing Files

    unlink deletes files and returns the number of files it deleted:

    unlink list_of_files 
    unlink 'file_name'
    
    Notes:

    1. unlink is particularly succinct when used with the <> form of glob. For example, the following statement deletes all files in the current directory that end with the suffix .html: unlink <*.html>;
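    Since unlink returns a count rather than a simple true/false, it can be worth comparing that count with the number of files you asked it to delete. The following sketch does so; the scratch directory and the file names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with made-up files; removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.html b.html keep.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

# The third name does not exist, so unlink cannot delete it.
my @targets = ("$dir/a.html", "$dir/b.html", "$dir/missing.html");
my $deleted = unlink @targets;      # number of files actually deleted

print "deleted $deleted of ", scalar(@targets), " files\n";
```

    unlink does not report which names failed. If that matters, one option is to call unlink on one file at a time and check $! after each failure.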


    Renaming Files

    The rename command renames files:

    rename 'oldname', 'newname'
    
    Notes:

    1. rename is like the UNIX mv command.
    2. If newname exists, it gets clobbered.
    3. rename returns true/false depending on whether or not it succeeded.
    4. if rename fails then $! contains the reason why it failed.
    5. If oldname and newname are directories then rename changes the name of the directory.
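    6. You can move files from one directory to another using rename, provided that both directories are on the same file system. For example: rename 'perlsys.html', '/home/bvz/cs594/notes/perlsys.html'; moves perlsys.html to my cs594/notes directory. rename fails across file systems; the move function in the standard File::Copy module handles that case.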
    7. Here is an example of using the glob operator to find all the .doc files in a directory whose names start with "XHTML - " and renaming them so that this prefix, which contains spaces, becomes "XHTML_":
           foreach $i (<"XHTML - *.doc">) {
             $y = $i;                   # save the original file name
             $i =~ s/XHTML - /XHTML_/;  # replace " - " with a _ character
             rename($y, $i);            # rename the original file with the new name
           }
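    Notes 3, 4, and 6 can be sketched in a few lines. The example below checks rename's return value and $!, and then uses move from the standard File::Copy module, which renames when it can and otherwise copies the file and deletes the original. The scratch directory and the file names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir);

# Scratch directory with one made-up file and one subdirectory.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/perlsys.html") or die "Cannot create file: $!";
close($fh);
mkdir("$dir/notes", 0755) or die "Cannot create notes: $!";

# rename reports failure through its return value and $!.
my $ok = rename("$dir/no_such_file", "$dir/other");
print "rename failed: $!\n" unless $ok;

# move also works when source and destination are on different file systems.
move("$dir/perlsys.html", "$dir/notes/perlsys.html")
    or die "move failed: $!";
print "moved perlsys.html into notes\n";
```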


    Getting File Information

    Perl provides a great many "flag-like" commands that allow you to query a file about certain information, such as whether or not it exists, its size in bytes, and its last modification date. For example, the following code tests whether or not a file exists before trying to open it:

    if (-e $inputfile) {
      open(INPUT, $inputfile) || die "$inputfile: $!"; # we could still fail
    }
    
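    Some of the most useful flag operators and their meanings are shown in the following table. Unless otherwise noted, the return value is true/false (1 or the empty string), or undef if the file does not exist:

    operator | meaning
    -e | whether or not the file/directory exists
    -r | whether or not the script has read permission for this file/directory
    -w | whether or not the script has write permission for this file/directory
    -x | whether or not the script has execute permission for this file/directory
    -o | whether or not the file/directory is owned by the user running the script. Ownership alone does not determine whether the script can remove it; that depends on write permission for the enclosing directory.
    -s | the size of the file in bytes if the file is not empty, false if the file is empty, or undef if the file does not exist
    -M | time between the last modification and the start of the script, measured in days (may be fractional)
    -A | time between the last access and the start of the script, measured in days (may be fractional)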
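    A few of these operators can be tried on a freshly created file. This is a minimal sketch on a UNIX-like system; the scratch directory, the file name data, and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data";             # hypothetical file name
open(my $fh, '>', $file) or die "Cannot create $file: $!";
print $fh "hello\n";
close($fh);

my $size = -s $file;   # size in bytes ("hello" plus a newline on UNIX)
my $age  = -M $file;   # in days; about zero here, since the file is brand new

print "$file is $size bytes long\n";
printf "-M reports %.6f days\n", $age;
print "readable\n" if -r $file;
print "writable\n" if -w $file;

my $missing = -e "$dir/no_such_file";   # false: the file does not exist
print "no_such_file does not exist\n" unless $missing;
```

    Because -M is measured from the time the script started, a file created while the script is running can report a value slightly below zero.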

    Even more file information can be obtained via Perl's stat and lstat functions, but a description of them is beyond the scope of this course.