CS360 Midterm Exam #2. March 11, 2015. James S. Plank

Answers and Grading

Question 1: 12 Points

The only subtle part of this question was dealing with p and k. The for loop sets k so that it is:

0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07

The memcpy() statement overwrites the middle four bytes of k. Is k[2] the least significant byte of 0x90abcdef, or the most significant? We don't know yet. However, we do know that when we're done, k is going to be either:

0x00 0x01 0x90 0xab 0xcd 0xef 0x06 0x07

or

0x00 0x01 0xef 0xcd 0xab 0x90 0x06 0x07

Line zero shows us that k[2] is 0xef, so it is the second representation of k that we need to go with. That also means that when we represent integers, their least significant bytes go first. Now, we're ready to answer the question:

Line 1: Right-shifting by four bits simply deletes the rightmost hex digit, so the answer is 0x1234567.
Line 2: The best thing here is to represent h as bits:

0000 0000 0000 1111 0111 1101 1001 1100

Left-shifting by one moves every bit one position to the left and fills the rightmost bit with a zero:

0000 0000 0001 1110 1111 1011 0011 1000

The answer is 0x1efb38.
Line 3: This byte is unchanged by the memcpy(), so the answer is 0x7. I accept 0x07 too.
Line 4: From the drawing and explanation above, the answer is 0x90.
Line 5: Again, from the drawing and explanation above, the four bytes of p[0] are 0x00, 0x01, 0xef and 0xcd. The least significant byte is first (0x00), so the answer is 0xcdef0100.
Line 6: By the same reasoning, the answer is 0x70690ab.

The program is in q1.c if you want to test it:

UNIX> gcc q1.c
UNIX> a.out
Line 0: 0xef
Line 1: 0x1234567
Line 2: 0x1efb38
Line 3: 0x7
Line 4: 0x90
Line 5: 0xcdef0100
Line 6: 0x70690ab
UNIX>

Grading

Two points per line.

Here's the partial credit rubric:

Line 1: you got 0.75 points for 0x123456b8, 0.50 points for 0x123456 and 0x12345678, and 0.25 for 0x12345780.
Line 2: you got 1.75 for 0xefb38, 1.50 for 0x1dfb38, 0x1ef738, 0x1efb28, 0x1efb3a, 0x1efb3c, 0x1efbe8, 0x1efc38 and 0x1efd38, and one for 0xef338, 0xefab8, 0xefb18 and 0xfdb38.
Line 3: I gave a point for 0x0111.
Line 4: I gave half a point for 0xef, and 0.3 points for 0x5/0x05.
Line 5: I gave 1.5 for 0x0001efcd, 0x1efcd, 0xcdef0201 and 0xefcd0100. I gave 0.8 for 0x0001ef.
Line 6: 1.5 for 0xab900607, 1.2 for 0xab9067, 1 for 0xf6070 and 0xef67.

Question 2: 12 Points

I was hoping the two problems would jump out at you:

Problem #1: There is a memory leak due to strdup(). strdup() allocates a new copy of each word, and that copy is passed to strcpy(). All that strcpy() does is copy the bytes from its second argument to the first, so when strcpy() is done, the pointer to the newly allocated string is lost. That is a classic memory leak.
Problem #2: While this kind of code works in C++, it is bad in C. The reason is that C++ maintains the string's length in the string class. C does not. This means that each time you call strcat(), it has to find the end of rv by searching from the beginning. As rv grows, this becomes wasteful: if the average word size is n and the number of words is m, then this loop becomes O(m²n). That's a problem.

Fixing the first bug is easy -- don't call strdup(). To fix the second bug, you need to keep track of the end of rv, and simply call strcpy() or strcat() there. I call strcpy(), because that way I don't have to keep the string null-terminated when I add the space.

In my code, I add a second int which I name eorv. Then, I replace the second for() loop with:

  eorv = 0;
  for (i = 0; i < numwords; i++) {
    if (i != 0) {
      rv[eorv] = ' ';
      eorv++;
    }
    strcpy(rv+eorv, words[i]);
    eorv += strlen(words[i]);
  }

My loop is now O(mn). I have the old code in q2.cpp and the new code in q2-good.cpp. They both read words from standard input, build a words array, and then call build_string(). As you can see from the timings on q2-input.txt, the second is much faster:

UNIX> wc q2-input.txt
   11000   23298  174306 q2-input.txt
UNIX> g++ q2.cpp
UNIX> time a.out < q2-input.txt
1.571u 0.003s 0:01.57 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "cat q2-input.txt q2-input.txt | a.out"
6.243u 0.016s 0:06.59 94.8%	0+0k 5+2io 0pf+0w
UNIX> g++ q2-good.cpp
UNIX> time a.out < q2-input.txt
0.032u 0.002s 0:00.03 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "cat q2-input.txt q2-input.txt | a.out"
0.066u 0.011s 0:00.07 100.0%	0+0k 0+0io 0pf+0w
UNIX>

Grading

Three points for spotting each problem. Three points for fixing each problem. I gave a point for spotting problems that weren't really problems. When you see "ML" in the grading, that means "Memory Leak".

Question 3: 8 Points

Here are the two lines of code that I was anticipating:

Line 1: i = a() + b(): You need to store the return value of a() while you call b(). To do so, you need to use r2, r3 or r4, because r0 and r1 are not guaranteed to retain their values across procedure calls. Suppose you use r2. Because you use it, you must spill it, because whoever is calling you is relying on the same guarantee.
Line 2: i = k*j + m*n: Here, you need to use three registers, because you need to use one to store the product k*j, and then one each to load m and n. That means you have to use one of r2, r3 or r4. Whichever one you use, you need to spill it at the beginning of the procedure and unspill it at the end, so that its value remains the same when the procedure is done.

Four points for a line that requires spilling, and four points for your explanation.

Question 4: 10 Points

Nuts and bolts assembler from the third Assembler lab.

a:
  push #8  / Allocate i and b.  i is [fp-4] and b is [fp]

  ld [fp+16] -> %r0    / i = *y
  ld [r0] -> %r0
  st %r0 -> [fp-4]

  ld [fp+12] -> %r0    / b = *x
  ld [r0] -> %r0
  st %r0 -> [fp]

  ld [fp-4] -> %r0    / return b[i]
  mov #4 -> %r1
  mul %r0, %r1 -> %r0
  ld [fp] -> %r1
  add %r0, %r1 -> %r0
  ld [r0] -> %r0
  ret

In q4.jas, I have a main() and b() that set up the stack in jassem exactly as in the question.

Grading

The initial "Push #8" - 2 points
The first line of C code - 2 points
The second line of C code - 2 points
The last line of C code - 4 points

Question 5: 19 Points

Each part was worth a point.

A: This is fp+12: 0xfff418.
B: This is fp+16: 0xfff41c.
C: This is fp-4: 0xfff408. Half credit if you swapped your answers for C and D.
D: This is fp: 0xfff40c. Half credit if you swapped your answers for C and D.
E: This is the value of x: 0xfff42c. The grading of this part and subsequent parts was on consistency with your previous answers. So, to get full credit for this, your answer had to match your answer to part A. This means, for example, if you answered 0xfff410 for part A, then you only got credit for 0xfff41c for this part.
F: This is what's in 0xfff42c: 0xfff440. To get full credit, your answer here had to match your answer to part E.
G: This is what's in 0xfff440: 0xfff434. To get full credit, your answer here had to match your answer to part F.
H: This is the same as the value of *x: 0xfff440. To get full credit, your answer here had to equal your answer to part F.
I: This is *y, or, what's in address 0xfff428: 0x2. To get full credit, your answer here had to match your answer to part B.
J: This adds 2*4 to b, giving 0xfff448, and dereferences it: 0xfff430. To get full credit, your answer here had to match your answers to parts H and I.
K: Now you dereference 0xfff430: 0xaa. To get full credit, your answer here had to match your answer to part J.
L: The caller of a()'s frame pointer is 0xfff41c. There are four words in that stack frame. Going from bottom to top: y in a(), x in a(), the pc when a() returns and the fp when a() returns. There's no room for any local variables, so the answer is zero. I gave half credit for eight.
M: This is the word two below the frame pointer: 0x1054.
N: This is the word above that: 0xfff41c.
O: This is what happens when the pc and the fp are popped off the stack: 0xfff414.
P: Obviously, one is 0xfff40c. The next is a word below that: 0xfff41c. And the next is the word below 0xfff41c: 0xfff448.

Question 6: 18 Points

As you can see, these two procedures are identical, except one reads from ssd_buf, while the other writes to it. There are three egregious problems with this code, which all involve making too many system calls:

Both procedures call load_ssd_page() once for every byte read or written. That's too many -- you can call it once, and then do all of the reading and writing on the page, before you call it for the next page.
ssd_read() calls flush_ssd_page(), even though it never modifies the page.
ssd_write() calls flush_ssd_page() after every byte. That's the same problem as with load_ssd_page() above.

A more subtle problem involving system calls is that if ssd_read() needs to read from the page that's already loaded, there is no need to call load_ssd_page() at all.

There are more minor problems, and perhaps your compiler can figure some of them out, but why rely on that?

You are doing divisions and mod operations on every byte, when you don't need to.
You are copying byte by byte, when most machines can copy at least a 64-bit (8-byte) word at a time.

The code below solves all of these problems. You can bet that memcpy() has been written to use the widest word size possible to do its copying, so using it is much better than trying to write it yourself.

This can be improved, and were I writing these procedures for real, I would make the improvement. I'll tell you how after you see the code (which is in q6.c):

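void ssd_read(char *buf, int size, int a)
{
  int page, bytes;

  /* Split the address into a page number and an offset within the page. */
  page = a / 4096;
  a %= 4096;

  /* First page: copy from the offset to the end of the page, or fewer
     bytes if the read ends on this page. Load it only if necessary. */
  bytes = (a + size > 4096) ? (4096 - a) : size;
  if (ssd_bufid != page) load_ssd_page(page);
  memcpy(buf, ssd_buf+a, bytes);
  size -= bytes;
  buf += bytes;

  /* Remaining pages start at offset 0: one load and one memcpy() each. */
  while (size > 0) {
    page++;
    load_ssd_page(page);
    bytes = (size > 4096) ? 4096 : size;
    memcpy(buf, ssd_buf, bytes);
    size -= bytes;
    buf += bytes;
  }
}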

What's the improvement? Well, I only check to see if the first page is loaded already. It could be that one of the pages in the middle of the read is already in memory. I could read that page first, and then read the remainders.

Grading

Spotting the problems was worth 10 points -- these sum to more than 10, but you were capped at 10 points for this part:

Too many system calls: 6 points.
Copying done a character at a time: 3 points
Flush is unnecessary in read: 2 points
The calculations done on every byte are unnecessary.

Fixing the problem was worth 8 points. If your code structure was off, you started off with four points, and if it was really off, you started with two points. I took off for things like calling flush_ssd_page(), not calling memcpy(), or not having a loop in your code.