Answers: CS360 Exam #1. October 25, 1996

  • Jim Plank

  • General Comments

    The test went about as expected. It was challenging, but not overwhelming. I apologize for the confusion with question 3, but most of you seemed to understand the question OK. The way I anticipated the test going was: To see the score distribution, see the histogram of final grades and Tukey plots below.

    As far as translating numbers into grades, I'd say:

    The distinction between B and A is blurry. Below 30 means you should be striving to improve (and have good lab scores).

    Histogram of exam scores

    Tukey plots of exam scores

    (the line goes from min to max, the box from 1st quartile to 3rd quartile, dot at the mean and hash lines at the median).


    Question 1 (10 points)

    Part 1 (5 points)

    Explain why you should use the standard I/O library rather than the file I/O system calls for reading lines of text from a file. (Don't just give a one word answer -- explain it).

    Part 2 (5 points)

    If you are reading one character at a time from a file, why is it better to use getchar() or fgetc() rather than fread()?

    Answer

    Part 1 (5 points)

    The standard I/O library performs buffering. That is, when you read a small number of bytes from a file with a standard I/O procedure like fgets(), the library reads a large chunk of bytes into a buffer with one read() system call. Subsequent fgets() calls then read from the buffer until the buffer is empty, at which point it is filled anew with another read() call. Since system calls are expensive, this strategy takes much less time than performing a read() call for each line of text.
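
    A minimal sketch of the difference, assuming a POSIX system. The names count_lines_read() and count_lines_stdio() are made up for illustration. The first makes one read() system call per byte; the second lets fgets() serve lines out of the standard I/O buffer.

```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

/* Count newlines with one read() system call per byte (slow). */
long count_lines_read(const char *path)
{
  char c;
  long n = 0;
  int fd = open(path, O_RDONLY);

  if (fd < 0) return -1;
  while (read(fd, &c, 1) == 1) {
    if (c == '\n') n++;
  }
  close(fd);
  return n;
}

/* Count newlines with fgets(), which is served from the stdio buffer.
   The library calls read() only when that buffer runs dry. */
long count_lines_stdio(const char *path)
{
  char line[1024];
  long n = 0;
  FILE *f = fopen(path, "r");

  if (f == NULL) return -1;
  while (fgets(line, sizeof(line), f) != NULL) {
    /* A long line arrives in pieces; only the last piece has the newline. */
    if (strchr(line, '\n') != NULL) n++;
  }
  fclose(f);
  return n;
}
```

    Both functions return the same count for the same file. The difference is in how many system calls each one makes to get there.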

    Part 2 (5 points)

    Getchar() and fgetc() are optimized to perform buffered I/O of single bytes using the standard I/O library. Think of what getchar() needs to do:
    1. Check to see if the buffer is empty.
    2. If empty, fill it with a read call.
    3. Increment the pointer to the head of the buffer, and return the byte in the old head to the user.
    Contrast that with what fread() must do:
    1. Calculate the size of the request.
    2. Check to see if there are enough characters left in the buffer to satisfy the request.
    3. If so, copy the bytes from the buffer to the given pointer.
    4. If not, copy as many bytes as you can, refill the buffer, and try again.
    While you don't have to do some of these steps when calling fread() for single bytes, there is still more processing that must be done. In fact, the pushing of 4 arguments onto the stack rather than zero (for getchar()) and one (for fgetc()) will make fread() perform poorly when compared to the others for the same tasks.
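
    The two lists of steps above can be sketched with a toy buffered reader. This is not how any real standard I/O library is written; ToyFile, toy_fill(), toy_getc() and toy_fread() are hypothetical names, and error handling is reduced to treating errors like end of file.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BUFSIZ 4096

/* Hypothetical, simplified stand-in for a stdio stream. */
typedef struct {
  int fd;                 /* underlying file descriptor */
  char buf[TOY_BUFSIZ];   /* the buffer */
  size_t pos;             /* index of the next unread byte */
  size_t len;             /* number of valid bytes in buf */
} ToyFile;

/* Refill the buffer with one read() call. Returns 0 on EOF or error. */
static int toy_fill(ToyFile *f)
{
  ssize_t n = read(f->fd, f->buf, TOY_BUFSIZ);

  f->pos = 0;
  f->len = (n > 0) ? (size_t) n : 0;
  return f->len > 0;
}

/* The getchar() path: empty check, optional refill, return one byte. */
int toy_getc(ToyFile *f)
{
  if (f->pos == f->len && !toy_fill(f)) return -1;
  return (unsigned char) f->buf[f->pos++];
}

/* The fread() path: size arithmetic, then a copy/refill loop. */
size_t toy_fread(void *ptr, size_t size, size_t nitems, ToyFile *f)
{
  size_t want = size * nitems, done = 0, avail, chunk;

  if (size == 0) return 0;
  while (done < want) {
    if (f->pos == f->len && !toy_fill(f)) break;
    avail = f->len - f->pos;
    chunk = (want - done < avail) ? want - done : avail;
    memcpy((char *) ptr + done, f->buf + f->pos, chunk);
    f->pos += chunk;
    done += chunk;
  }
  return done / size;
}
```

    A ToyFile starts with fd set to an open descriptor and pos and len both zero. Even when toy_fread() is asked for a single byte, it still does the multiplication, the loop test, the minimum calculation and the memcpy() call that toy_getc() avoids.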

    Grading

    Grading of this question was straightforward. If you said ``getchar() and fgetc() are optimized for reading one character while fread() is not'', you received 4 points on part 2. You had to give some more detail to get the final point.

    Histogram of scores


    Question 2 (12 points)

    Convert the following C code into (unoptimized) assembler. Use only registers r0 and r1 (plus of course the stack and frame pointers).
    main(int argc, char **argv)
    {
      int i, j;
    
      j = 0;
      while(argc > 0) {
        i = j + 5;
        j = main(b(j), i-1);
      }
    }
    

    Answer

    Note that I have i at [fp-4] and j at [fp]. Some of you had the opposite, but that's fine.
    main:
            push #8            / Allocate i and j, i is at [fp-4], j is at [fp]
    
            st %g0 -> [fp]     / Set j to zero
    l1:
            ld [fp+12] -> %r0  / Test for argc > 0
            cmp %r0, %g0
            ble l2         
            
            ld [fp] -> %r0       / Do i = j + 5.  Note you have to put 5 into
            mv #5 -> %r1         / a register before doing the addition
            add %r0, %r1 -> %r0
            st %r0 -> [fp-4]
    
            ld [fp-4] -> %r0     / Push i-1 onto the stack
            add %r0, %gm1 -> %r0
            st %r0 -> [sp]--
    
            ld [fp] -> %r0       / Call b(j) and put the return value
            st %r0 -> [sp]--     / (which is in r0) on the stack
            jsr b
            pop #4
            st %r0 -> [sp]--
     
            jsr main             / Call main and store the return value in j
            pop #8
            st %r0 -> [fp]
    
            b l1                 / Go back to the top of the while statement
    l2:
            ret
    

    Grading

    Grading was broken up into the following parts: If you did something that affected all parts, you received the following deductions (and these were not included in the above assessment): At the end, half points were rounded up.

    Histogram of scores


    Question 3 (14 points)

    So, as Kim pointed out in class, my definition of parent left something to be desired. What I meant to say was that parent prints out the name of the current directory as it appears in the parent directory. That is what my examples show. However, if you printed out the name of the parent directory, you received full credit.

    Part 1 (12 points)

    Write the program parent that prints the name of the parent directory of the current directory. You may not use system(), getcwd() or getenv(). The following should be the output of parent:
    UNIX> cd /mahogany/homes/plank
    UNIX> pwd
    /mahogany/homes/plank
    UNIX> parent
    plank
    UNIX> cd papers
    UNIX> pwd
    /mahogany/homes/plank/papers
    UNIX> parent
    papers
    UNIX>
    
    See the last page of this test for prototypes of C library calls and system calls that may be helpful.

    Part 2 (2 points)

    What will be the output of:
    UNIX> cd /
    UNIX> parent
    

    Answer

    Part 1 (12 points)

    What you needed to do was find out the inode number of the current directory. You do that by calling stat() on "." and using the st_ino field. Then, you traverse the .. directory and look for an entry whose inode number (kept in the d_fileno field) matches. When you find that match, print out the name of that directory entry. The code is as follows:
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <dirent.h>
      
    main()
    {
      struct stat buf;
      int i;
      int inode;
      DIR *d;
      struct dirent *de;
    
      if (stat(".", &buf) < 0) { perror("stat"); exit(1); }
    
      inode = buf.st_ino;
    
      d = opendir("..");
      if (d == NULL) { perror("opendir"); exit(1); }
      for (de = readdir(d); de != NULL; de = readdir(d)) {
        if (de->d_fileno == inode) {
          printf("%s\n", de->d_name);
          exit(0);
        }
      }
      printf("If my program gets here, life is unhappy....\n");
    }
    

    Part 2 (2 points)

    Since .. equals . in the root directory, it will print out either .. or ., depending on which it gets to first (and on our systems, it will get to .. first).
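
    The claim that .. equals . at the root can be checked with stat(). A minimal sketch, assuming a POSIX system; same_file() is a made-up helper name. It compares the device number as well as the inode number, since inode numbers are only unique within one file system.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if the two paths name the same file (same device and inode),
   0 if they differ, and -1 if either stat() call fails. */
int same_file(const char *a, const char *b)
{
  struct stat sa, sb;

  if (stat(a, &sa) < 0 || stat(b, &sb) < 0) return -1;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

    On typical Unix systems same_file("/", "/..") returns 1, which is why the loop in parent matches both . and .. when run from the root.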

    Grading

    There are really two parts to this program: performing stat on . (or .. if you tried to print out the parent directory) and saving the inode number, and traversing .. (or ../.. if you tried to print out the parent directory) and trying to match the inodes. If you did a reasonable approximation of those two parts, then you started off with 12 points, and were deducted for things that were wrong.

    If you did some random stat and directory traversal, then you started off with 8 points, and were deducted for things that were egregiously wrong. Also, if you used realpath instead of stat/opendir you started here.

    If you wrote little that made sense you got a few points, the number of which depended on how much sense you made.

    For part 2, you had to write something reasonable to get points.

    Histogram of scores


    Question 4 (14 points)

    This question concerns the program printword. This program takes a file on standard input, and sorts the words. For each word, it prints out the line numbers (sorted) that contain the word. It skips lines whose first word starts with the '#' character. It should not print out a line number twice for the same word.

    Thus, for the following file:

    UNIX> cat geh
    # Beginning
    I am Sam
    I am Sam
    Sam I am
       That Sam I am   That Sam I am    I do not like that Sam I am
     # end
    UNIX> 
    
    The output of the program should be the following:
    UNIX> printword < geh
    I: 2, 3, 4, 5
    Sam: 2, 3, 4, 5
    That: 5
    am: 2, 3, 4, 5
    do: 5
    like: 5
    not: 5
    that: 5
    UNIX>
    
    Now, behold the following code for printword:
    #include <stdio.h>
    #include "fields.h"
    #include "rb.h"
    
    main()
    {
      IS is;
      Rb_node t, tmp;
      char *s;
      int i, fnd;
    
      t = make_rb();
      is = new_inputstruct(NULL);
    
      while(get_line(is) > 0) {
        if (is->text1[0] != '#') {
          for (i = 0; i < is->NF; i++) {
            tmp = rb_find_key_n(t, is->fields[i], &fnd);
            if (!fnd || is->line != (int) (tmp->v.val)) {
              rb_insert(t, is->fields[i], (char *) (is->line));
            }
          }
        }
      }
    
      s = NULL;
      rb_traverse(tmp, t) {
        if (strcmp(s, tmp->k.key) != 0) {
          if (s != NULL) printf("\n");
          printf("%s: %d", tmp->k.key, (int) (tmp->v.val));
        } else {
          printf(", %d", (int) (tmp->v.val));
        }
        s = tmp->k.key;
      }
      printf("\n");
    }
    
    There are five bugs in this program. By ``bug'', I mean that they will cause incorrect output (or core dumpage), not inefficiency. Four of them are simple and can be fixed within the line that they occur. The fifth is a design flaw in the program. For each of these bugs: Note -- none of the bugs are syntax/compiler errors. This code will compile just fine. They are all functional errors.

    Again, prototypes of relevant C functions and structs are at the end of the test.

    Answer

    The bugs:
    1. In the statement:
        while(get_line(is) > 0) {
      
      The > should be changed to >=. Otherwise, the program will exit the first time it sees a blank line.
    2. The statement:
        if (is->text1[0] != '#') {
      
      is wrong, because it only tests the first character of each line. Lines whose first word starts with '#' after leading whitespace (such as the last line of the example file) will not be skipped. It should be:
        if (is->NF > 0 && is->fields[0][0] != '#') {
      
    3. The rb_insert statement should be called as follows:
           rb_insert(t, strdup(is->fields[i]), (char *) (is->line));
      
      Since each get_line() does not implicitly call malloc(), the string inserted into the rb-tree will be overwritten each time get_line() is called.
    4. The following line:
          if (strcmp(s, tmp->k.key) != 0) {
      
      is going to cause a segmentation violation when it is first called, since s is NULL. There are several ways around this. A simple one is to change the line to:
          if (s == NULL || strcmp(s, tmp->k.key) != 0) {
      
    5. The design flaw is that words are inserted into the rb-tree if the combination of [word,line] is not in the tree. However, when the same word is inserted more than once, the rb-tree library makes no guarantee about the relative order of the duplicate entries, so the entry that rb_find_key_n() returns need not be the one for the current line. Thus, the line numbers will not necessarily be sorted, and a line number may be inserted more than once. To fix this, you must either keep a secondary rb-tree for each word, keyed on line number, or use "rb_insertg" and have the key be a pointer to a struct holding both the word and the line number. In both cases, you'll have to alter the printing loop too.
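
    The composite-key idea behind the fix for bug 5 can be sketched without the rb-tree library. The code below swaps in an array sorted by qsort() for the tree, purely for illustration; WordLine, wl_cmp() and print_index() are hypothetical names. The comparison function plays the role that the comparison passed to rb_insertg would play: order by word, then by line number.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical composite key: one entry per [word, line] occurrence. */
typedef struct {
  const char *word;
  int line;
} WordLine;

/* Order by word first, then by line number. */
static int wl_cmp(const void *a, const void *b)
{
  const WordLine *x = a, *y = b;
  int c = strcmp(x->word, y->word);

  if (c != 0) return c;
  return (x->line > y->line) - (x->line < y->line);
}

/* Sort the entries, then print each word once followed by its distinct
   line numbers, in the format the question asks for. */
void print_index(WordLine *e, size_t n, FILE *out)
{
  size_t i;

  qsort(e, n, sizeof(WordLine), wl_cmp);
  for (i = 0; i < n; i++) {
    if (i > 0 && wl_cmp(&e[i - 1], &e[i]) == 0) continue;  /* duplicate */
    if (i == 0 || strcmp(e[i - 1].word, e[i].word) != 0) {
      if (i > 0) fprintf(out, "\n");
      fprintf(out, "%s: %d", e[i].word, e[i].line);
    } else {
      fprintf(out, ", %d", e[i].line);
    }
  }
  if (n > 0) fprintf(out, "\n");
}
```

    Because duplicates of the same [word, line] pair end up adjacent after sorting, skipping them takes one comparison with the previous entry, and the line numbers for each word come out in increasing order.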

    Grading

    Each bug was worth 4 points: one for identifying the bug, one for describing what it does, one for fixing it, and an extra one if you got all three points for a bug. At the end, you get the minimum of your score and 14.

    Histogram of scores