Jim Plank
General Comments
The test went about as expected. It was challenging, but not overwhelming.
I apologize for the confusion with question 3, but most of you seemed to
understand the question ok. The way I anticipated the test going was:
- Question 1: A giveaway -- straight from the lecture notes.
- Question 2: Easier than your lab -- should take a few minutes, but
if you did lab 7, you should have done fine.
- Question 3: Requires thought and coding. Not surprisingly, the
scores showed a lot of variance.
- Question 4: Subtle code reading -- a very difficult question to
do on a timed test, but the grading was lenient -- you only needed
to find 3.5 bugs out of 5 to get a perfect score.
To see the score distribution, see the
histogram of final grades and Tukey plots below.
As far as translating numbers into grades, I'd say:
- Over 45: A+
- 35 -- 45: A
- 30 -- 34: B to B+
- 25 -- 29: C+
- 20 -- 24: C
- 15 -- 19: D
- 10 -- 14: F
The distinction between B and A is blurry. Below 30 means you should
be striving to improve (and have good lab scores).
Histogram of exam scores
Tukey plots of exam scores
(the line goes from min to max, the box from 1st quartile to 3rd quartile,
dot at the mean and hash lines at the median).
Question 1 (10 points)
Part 1 (5 points)
Explain why you should use the standard I/O library rather than
the file I/O system calls for reading lines of text from a file.
(Don't just give a one word answer -- explain it).
Part 2 (5 points)
If you are reading one character at a time from a file, why
is it better to use getchar() or fgetc() rather than
fread()?
Answer
Part 1 (5 points)
The standard I/O library performs buffering. That is, when you read
a small number of bytes from a file using the standard I/O library
using a procedure like fgets(),
it reads a large chunk of bytes into a buffer with one read()
system call. Subsequent fgets() calls then read from the
buffer until the buffer is empty, at which point it is filled anew
with another read() call. Since system calls are expensive,
this strategy takes much less time than performing a read()
call for each line of text.
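To make the cost difference concrete, here is a minimal sketch of the buffering idea. It is not the real standard I/O implementation: the names toy_reader, toy_init and toy_getc and the 4096-byte buffer size are assumptions made for illustration. The reader counts its own read() calls so that the count can be compared with the number of bytes delivered.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define TOY_BUFSIZE 4096   /* assumed buffer size, chosen for illustration */

/* Hypothetical toy reader: one buffer in front of a file descriptor. */
struct toy_reader {
  int fd;                  /* descriptor to read from */
  char buf[TOY_BUFSIZE];   /* bytes fetched by the last read() */
  size_t pos;              /* index of the next unread byte */
  size_t len;              /* number of valid bytes in buf */
  long nreads;             /* how many read() system calls were made */
};

void toy_init(struct toy_reader *r, int fd)
{
  assert(r != NULL);
  r->fd = fd;
  r->pos = 0;
  r->len = 0;
  r->nreads = 0;
}

/* Return the next byte, or -1 on end of file or error. */
int toy_getc(struct toy_reader *r)
{
  ssize_t n;

  assert(r != NULL);
  if (r->pos == r->len) {            /* buffer empty: refill it */
    n = read(r->fd, r->buf, TOY_BUFSIZE);
    r->nreads++;
    if (n <= 0) return -1;
    r->pos = 0;
    r->len = (size_t) n;
  }
  return (unsigned char) r->buf[r->pos++];
}
```

If every read() fills the buffer, delivering 10,000 bytes through toy_getc() costs four read() calls (three that return data and one that reports end of file), rather than the 10,000 calls needed when each byte is fetched with its own read().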
Part 2 (5 points)
Getchar() and fgetc() are optimized to perform buffered
I/O of single bytes using the standard I/O library. Think of what
getchar() needs to do:
- Check to see if the buffer is empty.
- If empty, fill it with a read call.
- Increment the pointer to the head of the buffer, and return the byte
in the old head to the user.
Contrast that with what fread() must do:
- Calculate the size of the request.
- Check to see if there are enough characters left in the buffer to
satisfy the request.
- If so, copy the bytes from the buffer to the given pointer.
- If not, copy as many bytes as you can, refill the buffer, and try
again.
While you don't have to do some of these steps when calling fread()
for single bytes, there is still more processing that must be done. In
fact, the pushing of 4 arguments onto the stack rather than zero (for
getchar()) and one (for fgetc()) will make fread()
perform poorly when compared to the others for the same tasks.
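The two call forms can be put side by side. This is a sketch for comparison only, and the function names count_with_fgetc and count_with_fread are made up for illustration. Both loops consume the same bytes; what differs is the amount of work each call does per byte.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in f by reading one character per fgetc() call. */
long count_with_fgetc(FILE *f)
{
  long n = 0;

  assert(f != NULL);
  while (fgetc(f) != EOF) n++;
  return n;
}

/* Count the bytes in f by asking fread() for one 1-byte item per call.
   fread() takes four arguments and must handle requests of any size. */
long count_with_fread(FILE *f)
{
  long n = 0;
  char c;

  assert(f != NULL);
  while (fread(&c, 1, 1, f) == 1) n++;
  return n;
}
```

Timing the two functions on a large file is one way to see the difference on a particular system; the sketch itself makes no claim about how large the gap will be.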
Grading
Grading of this question was straightforward. If you said ``getchar()
and fgetc() are optimized for reading one character while fread()
is not'', you received 4 points on part 2. You had to give some more detail
to get the final point.
Histogram of scores
Question 2 (12 points)
Convert the following C code into (unoptimized) assembler.
Use only registers r0 and r1 (plus of course the
stack and frame pointers).
main(int argc, char **argv)
{
int i, j;
j = 0;
while(argc > 0) {
i = j + 5;
j = main(b(j), i-1);
}
}
Answer
Note that I have i at [fp-4] and j at [fp].
Some of you had the opposite, but that's fine.
main:
push #8 / Allocate i and j, i is at [fp-4], j is at [fp]
st %g0 -> [fp] / Set j to zero
l1:
ld [fp+12] -> %r0 / Test for argc > 0
cmp %r0, %g0
ble l2
ld [fp] -> %r0 / Do i = j + 5. Note you have to put 5 into
mv #5 -> %r1 / a register before doing the addition
add %r0, %r1 -> %r0
st %r0 -> [fp-4]
ld [fp-4] -> %r0 / Push i-1 onto the stack
add %r0, %gm1 -> %r0
st %r0 -> [sp]--
ld [fp] -> %r0 / Call b(j) and put the return value
st %r0 -> [sp]-- / (which is in r0) on the stack
jsr b
pop #4
st %r0 -> [sp]--
jsr main / Call main and store the return value in j
pop #8
st %r0 -> [fp]
b l1 / Go back to the top of the while statement
l2:
ret
Grading
Grading was broken up into the following parts:
- Allocating space for i and j: 1 point
- Setting j to zero: 1 point
- Doing the while statement correctly: 2 points
- Doing i = j + 5 correctly: 2 points
- Pushing i-1 on the stack: 1 point
- Calling b correctly, including popping j off the
stack at the end: 1 point
- Pushing the return value of b on the stack: 1 point
- Calling main correctly, including popping the arguments
off the stack: 1 point
- Setting j to the return value of main: 1 point
- Calling ret at the end: 1 point
If you did something that affected all parts, you received the following
deductions (and these were not included in the above assessment):
- Accessing i and j from sp instead of from
fp: -2
- Using add instead of push/pop: -1
- Assuming that the return values are in r1 instead of
r0: -1
- Pushed arguments to main in wrong order: -2
At the end, half points were rounded up.
Histogram of scores
Question 3 (14 points)
So, as Kim pointed out in class, my definition of parent
left something to be desired. What I meant to say was that
parent prints out the name of the current directory as
it appears in the parent directory. That is what my examples
show. However, if you printed out the name of the parent directory,
you received full credit.
Part 1 (12 points)
Write the program parent that prints the name of the
parent directory of the current directory.
You may not use system(), getcwd() or
getenv(). In other words,
the following should be the output of parent:
UNIX> cd /mahogany/homes/plank
UNIX> pwd
/mahogany/homes/plank
UNIX> parent
plank
UNIX> cd papers
UNIX> pwd
/mahogany/homes/plank/papers
UNIX> parent
papers
UNIX>
See the last page of this test for prototypes of C library calls
and system calls that may be helpful.
Part 2 (2 points)
What will be the output of:
UNIX> cd /
UNIX> parent
Answer
Part 1 (12 points)
What you needed to do was find out the inode number of the current
directory. You do that by calling stat() and using the st_ino
field. Then, you traverse the .. directory and look for an entry
whose inode number (kept in the d_fileno field) matches. When
you find that match, print out the name of that directory entry.
The code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
main()
{
struct stat buf;
int i;
int inode;
DIR *d;
struct dirent *de;
if (stat(".", &buf) < 0) { perror("stat"); exit(1); }
inode = buf.st_ino;
d = opendir("..");
if (d == NULL) { perror("opendir"); exit(1); }
for (de = readdir(d); de != NULL; de = readdir(d)) {
if (de->d_fileno == inode) {
printf("%s\n", de->d_name);
exit(0);
}
}
printf("If my program gets here, life is unhappy....\n");
}
Part 2 (2 points)
Since .. equals ., it will print out either .. or
., depending on which it gets to first (and on our systems, it
will get to .. first).
Grading
There are really two parts to this program: performing stat on
. (or .. if you tried to print out the parent directory)
and saving the inode number, and traversing .. (or ../..
if you tried to print out the parent directory) and trying to match the
inodes. If you did a reasonable approximation of those two parts, then
you started off with 12 points, and were deducted for things that were
wrong.
If you did some random stat and directory traversal, then
you started off with 8 points, and were deducted for things that were
egregiously wrong. Also, if you used realpath instead of
stat/opendir you started here.
If you wrote little that made sense you got a few points, the number
of which depended on how much sense you made.
For part 2, you had to write something reasonable to get points.
Histogram of scores
Question 4 (14 points)
This question concerns the program printword. This
program takes a file on standard input, and sorts the words. For
each word, it prints out the line numbers (sorted) that contain
the word.
It skips lines whose first word starts with the '#' character.
It should not print out a line number twice for the same word.
Thus, for the following file:
UNIX> cat geh
# Beginning
I am Sam
I am Sam
Sam I am
That Sam I am That Sam I am I do not like that Sam I am
# end
UNIX>
The output of the program should be the following:
UNIX> wordline < geh
I: 2, 3, 4, 5
Sam: 2, 3, 4, 5
That: 5
am: 2, 3, 4, 5
do: 5
like: 5
not: 5
that: 5
UNIX>
Now, behold the following code for printword:
#include <stdio.h>
#include "fields.h"
#include "rb.h"
main()
{
IS is;
Rb_node t, tmp;
char *s;
int i, fnd;
t = make_rb();
is = new_inputstruct(NULL);
while(get_line(is) > 0) {
if (is->text1[0] != '#') {
for (i = 0; i < is->NF; i++) {
tmp = rb_find_key_n(t, is->fields[i], &fnd);
if (!fnd || is->line != (int) (tmp->v.val)) {
rb_insert(t, is->fields[i], (char *) (is->line));
}
}
}
}
s = NULL;
rb_traverse(tmp, t) {
if (strcmp(s, tmp->k.key) != 0) {
if (s != NULL) printf("\n");
printf("%s: %d", tmp->k.key, (int) (tmp->v.val));
} else {
printf(", %d", (int) (tmp->v.val));
}
s = tmp->k.key;
}
printf("\n");
}
There are five bugs in this program. By ``bug'', I mean that they
will cause incorrect output (or core dumpage), not inefficiency.
Four of them are simple
and can be fixed within the line that they occur. The fifth is
a design flaw in the program. For each of these bugs:
- Identify the bug and state what behavior the bug will
cause.
- State how to fix the bug. For the four simple bugs, fix them.
For the fifth, state how you have to redesign the program so that
it is no longer a bug. Make sure that this new design is efficient,
at least as far as CPU time is concerned (don't worry about memory
usage).
Note -- none of the bugs are syntax/compiler errors. This
code will compile just fine. They are all functional errors.
Again, prototypes of relevant C functions and structs are at the
end of the test.
Answer
The bugs:
- In the statement:
while(get_line(is) > 0) {
The > should be changed to >=. Otherwise, the program will exit the
first time it sees a blank line.
- The statement:
if (is->text1[0] != '#') {
is wrong, because it only tests the first character of each line. Lines
whose first word starts with '#' but is preceded by whitespace will not
be skipped. It should be:
if (is->NF > 0 && is->fields[0][0] != '#') {
- The rb_insert statement should be called as follows:
rb_insert(t, strdup(is->fields[i]), (char *) (is->line));
Since get_line() does not call malloc() for each line,
the string that the rb-tree key points to will be overwritten each time
get_line() is called.
- The following line:
if (strcmp(s, tmp->k.key) != 0) {
is going to cause a segmentation violation when it is first called, since
s is NULL. There are several ways around this. A simple one
is to change the line to:
if (s == NULL || strcmp(s, tmp->k.key) != 0) {
- The design flaw is that words are inserted into the rb-tree if the combination
of [word,line] is not in the tree. However, when a word is inserted more than
once, the rb-tree library guarantees nothing about the relative order of the
duplicate entries. Thus, the line numbers will not necessarily be
sorted, and a line number may be inserted more than once. To fix this,
you must either have a secondary rb-tree for each word, keyed on the line
number, or you can use "rb_insertg", and have the key field be a pointer
to a struct with both the word and the line number. In both cases,
you'll have to alter the printing loop too.
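The second fix depends on a comparison function that orders keys by word first and by line number second. A minimal sketch of such a key and comparison function follows. The names word_line, word_line_cmp and sort_unique are assumptions, and the sketch sorts an array with the standard qsort() as a stand-in for the rb-tree, since the rb library's prototypes are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical composite key: a word plus the line it appeared on. */
struct word_line {
  const char *word;
  int line;
};

/* Order by word, then by line number. The (const void *, const void *)
   signature is the one qsort() expects. */
int word_line_cmp(const void *a, const void *b)
{
  const struct word_line *x = a;
  const struct word_line *y = b;
  int c;

  assert(x != NULL && y != NULL);
  c = strcmp(x->word, y->word);
  if (c != 0) return c;
  return (x->line > y->line) - (x->line < y->line);
}

/* Sort keys[0..n-1] and drop duplicate (word, line) pairs in place.
   Returns the number of distinct pairs left at the front of the array. */
size_t sort_unique(struct word_line *keys, size_t n)
{
  size_t i, out = 0;

  qsort(keys, n, sizeof(keys[0]), word_line_cmp);
  for (i = 0; i < n; i++) {
    if (out == 0 || word_line_cmp(&keys[out - 1], &keys[i]) != 0) {
      keys[out++] = keys[i];
    }
  }
  return out;
}
```

With keys ordered this way, all entries for a word are adjacent and their line numbers ascend, so a printing loop can start a new output line whenever the word changes.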
Grading
Each bug was worth 4 points: one for identifying the bug, one for describing
what it does, one for fixing it, and an extra one if you got all three
points for a bug. At the end, you get the minimum of
your score and 14.
Histogram of scores