CS140 Lecture notes -- Fields

  • Jim Plank -- with modifications by Brad Vander Zanden
  • Directory: /blugreen/homes/plank/cs140/Spring-1999/notes/Fields
  • Lecture notes: http://web.eecs.utk.edu/~jplank/plank/classes/cs140/Spring-1999/notes/Fields/
  • Fri Sep 25 09:41:52 EDT 1998
    The fields library is a suite of routines that make reading input easier than using getchar(), scanf() or gets(). This is a library that I wrote -- it is not standard in Unix, but it should work with any C compiler (including those on DOS/Windows). If you want to take the fields library with you after class, go ahead and do so. You can get the source code at http://web.eecs.utk.edu/~jplank/plank/fields/fields.html.

    In order to use the fields procedures in this class, you should include the file fields.h, which can be found in the directory /home/cs140/spring-2004/include. Instead of including the full path name in your C file, just do:

    #include "fields.h"
    
    and then compile the program with:
    gcc -I/home/cs140/spring-2004/include
    
    Also when you link your object files to make an executable, you need to include /home/cs140/spring-2004/objs/libfdr.a.

    The makefile in this directory does both of these things for you. When you look over the file printwords.c, make sure you figure out how to compile it so that it finds fields.h, and so that the compilation links with libfdr.a.

    An Aside on the -I Flag

    The -I flag tells gcc the location of additional directories that it can search for .h files that you have defined. Recall from your introductory CS course that include files come in two flavors: 1) system-defined .h files, such as stdio.h, that you include using angle brackets (<>), and 2) user-defined .h files, such as fields.h, that you include using quotes (""). By default gcc looks in certain system-defined locations for system-defined include files, and in the current directory for user-defined .h files. However, you will often want to include .h files that are not in the current directory, such as fields.h. In that case you must tell gcc where it can find these files, and the -I flag allows you to do just that. You can include multiple -I flags in the same gcc command. For example:

    gcc -I/home/cs140/spring-2004/include -I/home/bvz/include
    


    The fields library defines and implements a data structure that simplifies input processing in C. The data structure consists of a type definition and four procedure calls. All are defined in fields.h:
    #define MAXLEN 1001
    #define MAXFIELDS 1000
    
    typedef struct inputstruct {
      char *name;               /* File name */
      FILE *f;                  /* File descriptor */
      int line;                 /* Line number */
      char text1[MAXLEN];       /* The line */
      char text2[MAXLEN];       /* Working -- contains fields */
      int NF;                   /* Number of fields */
      char *fields[MAXFIELDS];  /* Pointers to fields */
      int file;                 /* 1 for file, 0 for popen */
    } *IS; 
    
    extern IS new_inputstruct(/* FILENAME -- NULL for stdin */);
    extern IS pipe_inputstruct(/* COMMAND -- NULL for stdin */);
    extern int get_line(/* IS */); /* returns NF, or -1 on EOF.  Does not
                                      close the file */
    extern void jettison_inputstruct(/* IS */);  /* frees the IS and fcloses 
                                                    the file */
    
    To read a file with the fields library, you call new_inputstruct() with the proper filename. New_inputstruct() takes the file name as its argument (NULL for standard input), and returns an IS as a result. Note that the IS is a pointer to a struct inputstruct. This is malloc()'d for you in the new_inputstruct() call. If new_inputstruct() cannot open the file, it will return NULL, and you can call perror() to print out the reason for the failure (read the man page on perror() if you want to learn about it).
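
    To see what "malloc()'d for you" and "returns NULL on failure" look like in practice, here is a minimal sketch of a constructor in the same style. It is not the library's source: the names MiniIS, mini_new() and mini_free() are made up for illustration, and the struct keeps only a few of the real members.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the library's struct inputstruct. */
typedef struct mini_input {
  const char *name;   /* file name, or "stdin" */
  FILE *f;            /* open stream */
  int line;           /* number of lines read so far */
  int is_file;        /* 1 if we opened f ourselves, 0 for stdin */
} *MiniIS;

/* Allocate a MiniIS for filename (NULL means standard input).
   Returns NULL if the file cannot be opened or malloc() fails.
   After a failed fopen(), errno is still set, so the caller can use perror(). */
MiniIS mini_new(const char *filename)
{
  MiniIS is;
  FILE *f;

  if (filename == NULL) {
    f = stdin;
  } else {
    f = fopen(filename, "r");
    if (f == NULL) return NULL;
  }

  is = malloc(sizeof(struct mini_input));
  if (is == NULL) {
    if (filename != NULL) fclose(f);
    return NULL;
  }
  is->name = (filename == NULL) ? "stdin" : filename;
  is->f = f;
  is->line = 0;
  is->is_file = (filename != NULL);
  return is;
}

/* Close the stream (if we opened it) and free the struct. */
void mini_free(MiniIS is)
{
  assert(is != NULL);
  if (is->is_file) fclose(is->f);
  free(is);
}
```

    The calling pattern is the same one you use with the real library: check the result against NULL, call perror() with the file name if it is NULL, and release the struct when you are done.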

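    Once you have an IS, you call get_line() on it to read a line. Get_line() changes the state of the IS to reflect the reading of the line. Specifically:

      • text1 holds the line that was just read.
      • text2 holds a working copy of the line, broken up into fields (whitespace-separated words).
      • NF is the number of fields on the line.
      • The first NF entries of the fields array point to the fields.
      • line is the line number of the line.
      • Get_line() returns NF, or -1 when it reaches the end of the file.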

    Jettison_inputstruct() closes the file associated with the IS and deallocates (frees) the IS. Do not worry about pipe_inputstruct() for now.
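
    To make the roles of text1, text2, NF and fields concrete, here is a minimal sketch of how a get_line()-style routine might chop a line into words. This is an illustration, not the library's code: struct mini_line and mini_split() are hypothetical names, and the sketch assumes that fields are separated by whitespace, as the printwords output later in these notes suggests.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MINI_MAXLEN 1001
#define MINI_MAXFIELDS 1000

/* Hypothetical stand-in holding only the members that line splitting touches. */
struct mini_line {
  char text1[MINI_MAXLEN];        /* the line as read */
  char text2[MINI_MAXLEN];        /* working copy, chopped into words */
  int NF;                         /* number of words found */
  char *fields[MINI_MAXFIELDS];   /* pointers into text2 */
};

/* Copy line into ml->text1, split it on whitespace, and return NF.
   No memory is allocated: every fields[i] points inside ml->text2. */
int mini_split(struct mini_line *ml, const char *line)
{
  char *p;

  assert(ml != NULL && line != NULL);
  strncpy(ml->text1, line, MINI_MAXLEN - 1);
  ml->text1[MINI_MAXLEN - 1] = '\0';
  strcpy(ml->text2, ml->text1);

  ml->NF = 0;
  p = ml->text2;
  while (*p != '\0' && ml->NF < MINI_MAXFIELDS) {
    while (isspace((unsigned char) *p)) p++;        /* skip whitespace */
    if (*p == '\0') break;
    ml->fields[ml->NF++] = p;                       /* start of a word */
    while (*p != '\0' && !isspace((unsigned char) *p)) p++;
    if (*p != '\0') *p++ = '\0';                    /* terminate the word */
  }
  return ml->NF;
}
```

    Notice that the second call to mini_split() on the same struct overwrites text2, so any pointer you saved from fields[] after the first call now points at the new line's data.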


    These procedures are very convenient for processing input files. For example, the following program (in printwords.c) prints out every word of an input file prepended with its line number:
    #include <stdio.h>
    #include <stdlib.h>
    #include "fields.h"
    
    int main(int argc, char **argv)
    {
      IS is;
      int i;
    
      if (argc != 2) {
        fprintf(stderr, "usage: printwords filename\n");
        exit(1);
      }
     
      is = new_inputstruct(argv[1]);
      if (is == NULL) {
        perror(argv[1]);
        exit(1);
      }
    
      while(get_line(is) >= 0) {
        for (i = 0; i < is->NF; i++) {
          printf("%d: %s\n", is->line, is->fields[i]);
        }
      }
    
      jettison_inputstruct(is);
      exit(0);
    }
    
    So, for example, if the file rex.in contains the following three lines:
    June: Hi ... I missed you!
    Rex:  Same here!  You're all I could think about!
    June: I was?
    
    Then running printwords on rex.in results in the following output:
    UNIX> printwords rex.in
    1: June:
    1: Hi
    1: ...
    1: I
    1: missed
    1: you!
    2: Rex:
    2: Same
    2: here!
    2: You're
    2: all
    2: I
    2: could
    2: think
    2: about!
    3: June:
    3: I
    3: was?
    UNIX>
    

    One important thing to note about fields.o is that only new_inputstruct() calls malloc(). Get_line() simply fills in the fields of the IS structure --- it does not perform memory allocation. Therefore, suppose you wanted to print out the first word on the second-to-last line. The following program (badword.c) would not work:
    #include <stdio.h>
    #include <stdlib.h>
    #include "fields.h"
    
    int main(int argc, char **argv)
    {
      IS is;
      int i;
      char *penultimate_word;
      char *last_word;
    
      if (argc != 2) {
        fprintf(stderr, "usage: badword filename\n");
        exit(1);
      }
     
      is = new_inputstruct(argv[1]);
      if (is == NULL) {
        perror(argv[1]);
        exit(1);
      }
    
      penultimate_word = NULL;
      last_word = NULL;
    
      while(get_line(is) >= 0) {
        penultimate_word = last_word;
        if (is->NF > 0) {
          last_word = is->fields[0];
        } else {
          last_word = NULL;
        }
      }
    
      if (penultimate_word != NULL) printf("%s\n", penultimate_word);
      jettison_inputstruct(is);
      exit(0);
    }
    
    Why? Look at what happens when you execute it on rex.in:
    UNIX> badword rex.in
    June:
    UNIX>
    
    It prints ``June:'' instead of ``Rex:'' because get_line() does not allocate any new memory. Each call overwrites the same buffers inside the IS, and is->fields[0] always points into those buffers, so penultimate_word and last_word end up pointing to the same place, which holds the first word of the last line read. Make sure you understand this example, because you can get yourself into a mess of trouble otherwise. The correct version of the program is in goodword.c. Note that it is a very inefficient program because of all the strdup() and free() calls. You could do better if you wanted to. Here is goodword.c:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "fields.h"
    
    int main(int argc, char **argv)
    {
      IS is;
      int i;
      char *penultimate_word;
      char *last_word;
    
      if (argc != 2) {
        fprintf(stderr, "usage: goodword filename\n");
        exit(1);
      }
     
      is = new_inputstruct(argv[1]);
      if (is == NULL) {
        perror(argv[1]);
        exit(1);
      }
    
      penultimate_word = NULL;
      last_word = NULL;
    
      while(get_line(is) >= 0) {
        if (penultimate_word != NULL) free(penultimate_word);
        penultimate_word = last_word;
        if (is->NF > 0) {
          last_word = strdup(is->fields[0]);
        } else {
          last_word = NULL;
        }
      }
    
      if (penultimate_word != NULL) printf("%s\n", penultimate_word);
      jettison_inputstruct(is);
      exit(0);
    }
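
    If you want to convince yourself of the difference without the fields library, the following sketch reproduces both behaviors with a plain character buffer standing in for the IS. The names copy_string() and second_to_last() are hypothetical, and copy_string() is simply a hand-written equivalent of strdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc()'d copy of s (what strdup() does), or NULL on failure. */
char *copy_string(const char *s)
{
  char *copy;

  assert(s != NULL);
  copy = malloc(strlen(s) + 1);
  if (copy != NULL) strcpy(copy, s);
  return copy;
}

/* Simulate reading words[0..n-1] one at a time into a single shared buffer,
   the way each get_line() call reuses the same storage.
   If copy is 0, remember raw pointers into the buffer (the badword.c mistake);
   otherwise remember private copies (the goodword.c fix).
   Returns a malloc()'d copy of the second-to-last word, or NULL if n < 2. */
char *second_to_last(const char **words, int n, int copy)
{
  char buffer[64];            /* shared storage, overwritten on every "read" */
  char *penultimate = NULL;
  char *last = NULL;
  char *result;
  int i;

  for (i = 0; i < n; i++) {
    if (copy && penultimate != NULL) free(penultimate);
    penultimate = last;
    strncpy(buffer, words[i], sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';
    last = copy ? copy_string(buffer) : buffer;
  }

  result = (penultimate != NULL) ? copy_string(penultimate) : NULL;
  if (copy) {
    free(penultimate);
    free(last);
  }
  return result;
}
```

    Called on the three words "one", "two" and "three", the version without copying hands back "three", because both saved pointers refer to the shared buffer. The version with copying hands back "two".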
    
    Note that fields.o assumes that every input line is shorter than 1000 characters.
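
    Because line lengths are bounded, one way to avoid calling strdup() and free() on every line is to keep two fixed-size buffers and alternate between them. The sketch below is only one possible approach, not something taken from the course directory. The names struct last_two, last_two_init(), last_two_add() and last_two_penultimate() are hypothetical, and WORD_MAX simply reuses the value of MAXLEN.

```c
#include <assert.h>
#include <string.h>

#define WORD_MAX 1001   /* same value as MAXLEN in fields.h */

/* Hypothetical helper: remembers the two most recent words in fixed buffers. */
struct last_two {
  char buf[2][WORD_MAX];
  int newest;     /* index of the buffer holding the most recent word */
  int count;      /* how many words have been stored so far */
};

void last_two_init(struct last_two *lt)
{
  assert(lt != NULL);
  lt->newest = 0;
  lt->count = 0;
  lt->buf[0][0] = '\0';
  lt->buf[1][0] = '\0';
}

/* Store word by overwriting the older buffer, then mark it as the newest. */
void last_two_add(struct last_two *lt, const char *word)
{
  int slot;

  assert(lt != NULL && word != NULL);
  slot = (lt->count == 0) ? 0 : 1 - lt->newest;
  strncpy(lt->buf[slot], word, WORD_MAX - 1);
  lt->buf[slot][WORD_MAX - 1] = '\0';
  lt->newest = slot;
  lt->count++;
}

/* Return the second-to-last word stored, or NULL if fewer than two were. */
const char *last_two_penultimate(const struct last_two *lt)
{
  assert(lt != NULL);
  if (lt->count < 2) return NULL;
  return lt->buf[1 - lt->newest];
}
```

    In a program like goodword.c you would call last_two_add() once per line, passing is->fields[0] when is->NF > 0 and an empty string otherwise, and then print the penultimate word only if it is neither NULL nor empty. There is still one copy per line, but no memory is allocated or freed inside the loop.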

    tailanyf

    Now, as another example, let's write tailany from the PointMalloc lecture using the fields library. This simply uses the fields library and is->text1 instead of gets(). Everything else is pretty much the same. The code is in tailanyf.c.
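
    The same aliasing issue shows up here: is->text1 is overwritten by every get_line() call, so each line you want to keep has to be copied. The sketch below shows that idea in isolation. It assumes that tailany prints the last n lines of its input, as the name suggests, and it uses fgets() in place of get_line() so that it compiles without the library. The function last_lines() and the constant TAIL_MAXLEN are hypothetical; the real tailanyf.c may be organized differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TAIL_MAXLEN 1001   /* same value as MAXLEN in fields.h */

/* Read every line of in and keep copies of the last n in out[0..k-1],
   oldest first. Returns k (at most n), or -1 if malloc() fails.
   The caller frees each out[i]. */
int last_lines(FILE *in, int n, char **out)
{
  char text[TAIL_MAXLEN];    /* reused for every line, like is->text1 */
  char **ring;               /* circular array of the n most recent copies */
  int count = 0;             /* total number of lines read */
  int i, k, start;

  assert(in != NULL && n > 0 && out != NULL);
  ring = calloc((size_t) n, sizeof(char *));
  if (ring == NULL) return -1;

  while (fgets(text, sizeof(text), in) != NULL) {
    int slot = count % n;
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL) {
      for (i = 0; i < n; i++) free(ring[i]);
      free(ring);
      return -1;
    }
    strcpy(copy, text);
    free(ring[slot]);        /* drop the line that fell out of the window */
    ring[slot] = copy;
    count++;
  }

  k = (count < n) ? count : n;
  start = (count < n) ? 0 : count % n;   /* slot of the oldest kept line */
  for (i = 0; i < k; i++) out[i] = ring[(start + i) % n];
  free(ring);
  return k;
}
```

    With the fields library, the loop condition becomes get_line(is) >= 0 and the string being copied is is->text1; the bookkeeping around the circular array stays the same.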