Answers: CS360 Exam #1. October 25, 1996

Jim Plank

General Comments

The test went about as expected. It was challenging, but not overwhelming. I apologize for the confusion with question 3, but most of you seemed to understand the question ok. The way I anticipated the test going was:

Question 1: A giveaway -- straight from the lecture notes.
Question 2: Easier than your lab -- should take a few minutes, but if you did lab 7, you should have done fine.
Question 3: Requires thought and coding. Not surprisingly, the scores showed a lot of variance.
Question 4: Subtle code reading -- a very difficult question to do on a timed test, but the grading was lenient -- you only needed to find 3.5 bugs out of 5 to get a perfect score.

To see the score distribution see the histogram of final grades and Tukey plots below.

As far as translating numbers into grades, I'd say:

Over 45: A+
35 -- 45: A
30 -- 34: B to B+
25 -- 29: C+
20 -- 24: C
15 -- 19: D
10 -- 14: F

The distinction between B and A is blurry. Below 30 means you should be striving to improve (and have good lab scores).

Histogram of exam scores

Tukey plots of exam scores

(the line goes from min to max, the box from 1st quartile to 3rd quartile, dot at the mean and hash lines at the median).

Question 1 (10 points)

Part 1 (5 points)
Explain why you should use the standard I/O library rather than the file I/O system calls for reading lines of text from a file. (Don't just give a one word answer -- explain it).
Part 2 (5 points)
If you are reading one character at a time from a file, why is it better to use getchar() or fgetc() rather than fread()?

Answer

Part 1 (5 points)

The standard I/O library performs buffering. That is, when you read a small number of bytes from a file with a standard I/O procedure like fgets(), the library reads a large chunk of bytes into a buffer with one read() system call. Subsequent fgets() calls then read from the buffer until it is empty, at which point it is filled anew with another read() call. Since system calls are expensive, this strategy takes much less time than performing a read() call for each line of text.
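As a rough illustration of the difference, here is a minimal sketch of two line counters. The function names count_lines() and count_lines_unbuffered() are made up for this example and are not from the lecture notes. The first goes through the standard I/O buffer; the second issues one read() system call per byte, which is what you are reduced to when you need line boundaries without a buffer of your own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Count newline-terminated lines with fgets().  Most calls just copy
   bytes out of the stdio buffer; the library only calls read() when
   that buffer is empty. */
long count_lines(FILE *f)
{
  char line[1024];
  long n = 0;

  assert(f != NULL);
  while (fgets(line, sizeof(line), f) != NULL) {
    /* A line longer than the array arrives in pieces; only the piece
       that ends in '\n' completes a line. */
    if (strchr(line, '\n') != NULL) n++;
  }
  return n;
}

/* Count newlines with one read() system call per byte.  Same answer,
   but every byte costs a trip into the kernel. */
long count_lines_unbuffered(int fd)
{
  char c;
  long n = 0;

  while (read(fd, &c, 1) == 1) {
    if (c == '\n') n++;
  }
  return n;
}
```

Both functions return the same count for the same input. On a large file the second one should be noticeably slower, although the exact ratio depends on the system.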

Part 2 (5 points)

Getchar() and fgetc() are optimized to perform buffered I/O of single bytes using the standard I/O library. Think of what getchar() needs to do:

Check to see if the buffer is empty.
If empty, fill it with a read call.
Increment the pointer to the head of the buffer, and return the byte in the old head to the user.

Contrast that with what fread() must do:

Calculate the size of the request.
Check to see if there are enough characters left in the buffer to satisfy the request.
If so, copy the bytes from the buffer to the given pointer.
If not, copy as many bytes as you can, refill the buffer, and try again.

While you don't have to do some of these steps when calling fread() for single bytes, there is still more processing that must be done. In fact, the pushing of 4 arguments onto the stack rather than zero (for getchar()) and one (for fgetc()) will make fread() perform poorly when compared to the others for the same task.
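To make the comparison concrete, here is a small sketch of the same byte-counting loop written both ways. The names count_with_fgetc() and count_with_fread() are hypothetical. Both versions read from the same stdio buffer, so the only difference is the per-call overhead described above.

```c
#include <assert.h>
#include <stdio.h>

/* Count occurrences of the byte `target` using fgetc(). */
long count_with_fgetc(FILE *f, int target)
{
  int c;        /* int, not char, so that EOF can be told apart from data */
  long n = 0;

  assert(f != NULL);
  while ((c = fgetc(f)) != EOF) {
    if (c == target) n++;
  }
  return n;
}

/* The same loop using fread() with a one-byte request.  Each call passes
   four arguments and runs the general-purpose size and copy logic. */
long count_with_fread(FILE *f, int target)
{
  unsigned char c;
  long n = 0;

  assert(f != NULL);
  while (fread(&c, 1, 1, f) == 1) {
    if (c == target) n++;
  }
  return n;
}
```

The two functions give identical results, so timing them on a large file is one way to see the cost of the extra work in fread().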

Grading

Grading of this question was straightforward. If you said ``getchar() and fgetc() are optimized for reading one character while fread() is not'', you received 4 points on part 2. You had to give some more detail to get the final point.

Histogram of scores

Question 2 (12 points)

Convert the following C code into (unoptimized) assembler. Use only registers r0 and r1 (plus of course the stack and frame pointers).
main(int argc, char **argv)
{
  int i, j;

  j = 0;
  while(argc > 0) {
    i = j + 5;
    j = main(b(j), i-1);
  }
}

Answer

Note that I have i at [fp-4] and j at [fp]. Some of you had the opposite, but that's fine.

main:
        push #8            / Allocate i and j, i is at [fp-4], j is at [fp]

        st %g0 -> [fp]     / Set j to zero
l1:
        ld [fp+12] -> %r0  / Test for argc > 0
        cmp %r0, %g0
        ble l2         
        
        ld [fp] -> %r0       / Do i = j + 5.  Note you have to put 5 into
        mv #5 -> %r1         / a register before doing the addition
        add %r0, %r1 -> %r0
        st %r0 -> [fp-4]

        ld [fp-4] -> %r0     / Push i-1 onto the stack
        add %r0, %gm1 -> %r0
        st %r0 -> [sp]--

        ld [fp] -> %r0       / Call b(j) and put the return value
        st %r0 -> [sp]--     / (which is in r0) on the stack
        jsr b
        pop #4
        st %r0 -> [sp]--
 
        jsr main             / Call main and store the return value in j
        pop #8
        st %r0 -> [fp]

        b l1                 / Go back to the top of the while statement
l2:
        ret

Grading

Grading was broken up into the following parts:

Allocating space for i and j: 1 point
Setting j to zero: 1 point
Doing the while statement correctly: 2 points
Doing i = j + 5 correctly: 2 points
Pushing i-1 on the stack: 1 point
Calling b correctly, including popping j off the stack at the end: 1 point
Pushing the return value of b on the stack: 1 point
Calling main correctly, including popping the arguments off the stack: 1 point
Setting j to the return value of main: 1 point
Calling ret at the end: 1 point

If you did something that affected all parts, you received the following deductions (and these were not included in the above assessment):

Accessing i and j from sp instead of from fp: -2
Using add instead of push/pop: -1
Assuming that the return values are in r1 instead of r0: -1
Pushed arguments to main in wrong order: -2

At the end, half points were rounded up.

Histogram of scores

Question 3 (14 points)

So, as Kim pointed out in class, my definition of parent left something to be desired. What I meant to say was that parent prints out the name of the current directory as it appears in the parent directory. That is what my examples show. However, if you printed out the name of the parent directory, you received full credit.
Part 1 (12 points)
Write the program parent that prints the name of the parent directory of the current directory. You may not use system(), getcwd() or getenv(). In other words, the following should be the output of parent:
UNIX> cd /mahogany/homes/plank
UNIX> pwd
/mahogany/homes/plank
UNIX> parent
plank
UNIX> cd papers
UNIX> pwd
/mahogany/homes/plank/papers
UNIX> parent
papers
UNIX>
See the last page of this test for prototypes of C library calls and system calls that may be helpful.
Part 2 (2 points)
What will be the output of:
UNIX> cd /
UNIX> parent

Answer

Part 1 (12 points)

What you needed to do was find out the inode number of the current directory. You do that by calling stat() on . and using the st_ino field. Then, you traverse the .. directory and look for an entry whose inode number (kept in the d_fileno field) matches. When you find that match, print out the name of that directory entry. The code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <dirent.h>
  
main()
{
  struct stat buf;
  int i;
  int inode;
  DIR *d;
  struct dirent *de;

  if (stat(".", &buf) < 0) { perror("stat"); exit(1); }

  inode = buf.st_ino;

  d = opendir("..");
  if (d == NULL) { perror("opendir"); exit(1); }
  for (de = readdir(d); de != NULL; de = readdir(d)) {
    if (de->d_fileno == inode) {
      printf("%s\n", de->d_name);
      exit(0);
    }
  }
  printf("If my program gets here, life is unhappy....\n");
}

Part 2 (2 points)

Since .. equals . in the root directory, it will print out either .. or ., depending on which it gets to first (and on our systems, it will get to .. first).

Grading

There are really two parts to this program: performing stat on . (or .. if you tried to print out the parent directory) and saving the inode number, and traversing .. (or ../.. if you tried to print out the parent directory) and trying to match the inodes. If you did a reasonable approximation of those two parts, then you started off with 12 points, and were deducted for things that were wrong.

If you did some random stat and directory traversal, then you started off with 8 points, and were deducted for things that were egregiously wrong. Also, if you used realpath instead of stat/opendir you started here.

If you wrote little that made sense you got a few points, the number of which depended on how much sense you made.

For part 2, you had to write something reasonable to get points.

Histogram of scores

Question 4 (14 points)

This question concerns the program printword. This program takes a file on standard input, and sorts the words. For each word, it prints out the line numbers (sorted) that contain the word. It skips lines whose first word starts with the '#' character. It should not print out a line number twice for the same word.
Thus, for the following file:
UNIX> cat geh
# Beginning
I am Sam
I am Sam
Sam I am
That Sam I am That Sam I am I do not like that Sam I am
# end
UNIX>
The output of the program should be the following:
UNIX> wordline < geh
I: 2, 3, 4, 5
Sam: 2, 3, 4, 5
That: 5
am: 2, 3, 4, 5
do: 5
like: 5
not: 5
that: 5
UNIX>
Now, behold the following code for printword:
#include <stdio.h>
#include "fields.h"
#include "rb.h"

main()
{
  IS is;
  Rb_node t, tmp;
  char *s;
  int i, fnd;

  t = make_rb();
  is = new_inputstruct(NULL);

  while(get_line(is) > 0) {
    if (is->text1[0] != '#') {
      for (i = 0; i < is->NF; i++) {
        tmp = rb_find_key_n(t, is->fields[i], &fnd);
        if (!fnd || is->line != (int) (tmp->v.val)) {
          rb_insert(t, is->fields[i], (char *) (is->line));
        }
      }
    }
  }

  s = NULL;
  rb_traverse(tmp, t) {
    if (strcmp(s, tmp->k.key) != 0) {
      if (s != NULL) printf("\n");
      printf("%s: %d", tmp->k.key, (int) (tmp->v.val));
    } else {
      printf(", %d", (int) (tmp->v.val));
    }
    s = tmp->k.key;
  }
  printf("\n");
}
There are five bugs in this program. By ``bug'', I mean that they will cause incorrect output (or core dumpage), not inefficiency. Four of them are simple and can be fixed within the line that they occur. The fifth is a design flaw in the program. For each of these bugs:

Identify the bug and state what behavior the bug will cause.
State how to fix the bug. For the four simple bugs, fix them. For the fifth, state how you have to redesign the program so that it is no longer a bug. Make sure that this new design is efficient, at least as far as CPU time is concerned (don't worry about memory usage).
Note -- none of the bugs are syntax/compiler errors. This code will compile just fine. They are all functional errors.

Again, prototypes of relevant C functions and structs are at the end of the test.

Answer

The bugs:

In the statement:
```
  while(get_line(is) > 0) {
```
The > should be changed to >=. Otherwise, the program will exit the first time it sees a blank line.
The statement:
```
  if (is->text1[0] != '#') {
```
is wrong, because it only tests the first character of each line. A line whose first word starts with '#' but is preceded by whitespace will not be skipped. It should be:
```
  if (is->NF > 0 && is->fields[0][0] != '#') {
```
The rb_insert statement should be called as follows:
```
     rb_insert(t, strdup(is->fields[i]), (char *) (is->line));
```
Since get_line() does not implicitly call malloc() for each line, the fields point into storage that is reused, and the string inserted into the rb-tree will be overwritten the next time get_line() is called.
The following line:
```
    if (strcmp(s, tmp->k.key) != 0) {
```
is going to cause a segmentation violation when it is first called, since s is NULL. There are several ways around this. A simple one is to change the line to:
```
    if (s == NULL || strcmp(s, tmp->k.key) != 0) {
```
The design flaw is that a word is inserted into the rb-tree whenever the combination of [word,line] does not appear to be in the tree. However, when the same word is inserted more than once, the rb-tree library makes no guarantee about where the first and second copies end up relative to each other, so the lookup may return a copy with a different line number. Thus, the line numbers for a word will not necessarily be sorted, and a line number may be inserted more than once. To fix this, you must either give each word a secondary rb-tree keyed on line number, or use "rb_insertg" and have the key field be a pointer to a struct holding both the word and the line number. In both cases, you'll have to alter the printing loop too.
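The second fix depends on a comparison function that orders entries by word first and by line number second. Here is a minimal sketch of that ordering. The WordLine struct and both function names are hypothetical, and the sketch uses qsort() on an array instead of the rb-tree library so that it stands alone; the same ordering is what you would want from whatever comparison function you hand to rb_insertg.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compound key: one entry per (word, line) pair. */
typedef struct {
  const char *word;
  int line;
} WordLine;

/* Order by word first, then by line number.  The signature is the one
   qsort() expects. */
int wordline_cmp(const void *a, const void *b)
{
  const WordLine *x = a;
  const WordLine *y = b;
  int c;

  assert(x->word != NULL && y->word != NULL);
  c = strcmp(x->word, y->word);
  if (c != 0) return c;
  return (x->line > y->line) - (x->line < y->line);
}

/* Sort the entries and drop duplicate (word, line) pairs in place.
   Returns the number of entries that remain. */
size_t sort_unique(WordLine *v, size_t n)
{
  size_t i, out = 0;

  qsort(v, n, sizeof(WordLine), wordline_cmp);
  for (i = 0; i < n; i++) {
    /* Keep an entry only if it differs from the last one kept. */
    if (out == 0 || wordline_cmp(&v[out - 1], &v[i]) != 0) v[out++] = v[i];
  }
  return out;
}
```

With this ordering, all entries for one word are adjacent and already sorted by line number, so the printing loop only has to notice when the word changes.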

Grading

Each bug was worth 4 points: one for identifying the bug, one for describing what it does, one for fixing it, and an extra one if you got all three points for a bug. At the end, you get the minimum of your score and 14.