CS302 Final Exam - Answers and Grading

James S. Plank - December 11, 2012

Question 1: Warm Up

This is a garden-variety BFS:

int fastest(vector <string> &A)
{
  vector <int> distance;
  vector <int> Q;
  int i, j, N;

  N = A.size();

  for (i = 0; i < N; i++) distance.push_back(-1);
  distance[0] = 0;
  Q.push_back(0);

  for (i = 0; i < Q.size(); i++) {
    if (Q[i] == N-1) return distance[N-1];
    for (j = 0; j < N; j++) {
      if (A[Q[i]][j] == '1' && distance[j] == -1) {
        distance[j] = distance[Q[i]]+1;
        Q.push_back(j);
      }
    }
  }
  return -1;
}

I've hacked up a main() and put it in q1.cpp so you can test it yourself.
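If you don't have q1.cpp handy, here is a minimal sketch of a test harness. The contents of q1.cpp are not shown here, so the helper names (path_graph(), split_graph()) and the adjacency matrices are made up for illustration. fastest() is repeated so that the sketch compiles on its own.

```cpp
#include <string>
#include <vector>
using namespace std;

// Same BFS as above, repeated so this sketch is self-contained.
int fastest(vector <string> &A)
{
  vector <int> distance;
  vector <int> Q;
  int i, j, N;

  N = A.size();
  for (i = 0; i < N; i++) distance.push_back(-1);
  distance[0] = 0;
  Q.push_back(0);

  for (i = 0; i < (int) Q.size(); i++) {
    if (Q[i] == N-1) return distance[N-1];
    for (j = 0; j < N; j++) {
      if (A[Q[i]][j] == '1' && distance[j] == -1) {
        distance[j] = distance[Q[i]]+1;
        Q.push_back(j);
      }
    }
  }
  return -1;
}

// Hypothetical test input: the path 0 - 1 - 2 - 3.
vector <string> path_graph()
{
  vector <string> A;
  A.push_back("0100");
  A.push_back("1010");
  A.push_back("0101");
  A.push_back("0010");
  return A;
}

// Hypothetical test input: node 2 is unreachable from node 0.
vector <string> split_graph()
{
  vector <string> A;
  A.push_back("010");
  A.push_back("100");
  A.push_back("000");
  return A;
}
```

On the path graph the last node is three edges away from node 0, and on the split graph the last node can't be reached at all, so fastest() falls out of the loop and returns -1.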

Grading

8 points -- this one is rather objective -- I assign points according to how well I think you got the program.

Question 2: Topological Sort

a: A, B, C, D, E, F, G, H: Nope, F has to come before D.
b: A, B, C, G, F, D, E, H: Nope, E has to come before F.
c: A, G, B, C, E, H, F, D: Fine.
d: A, G, H, B, E, F, C, D: Nope, C has to come before E.
e: A, B, C, G, E, F, D, H: Fine.
f: A, G, H, E, F, D, B, C: Nope, B has to come before E.
g: A, B, E, G, C, F, G, D: Nope, C has to come before E.
h: A, G, H, B, E, F, C, D: Nope, C has to come before E.
i: Nope.
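Checking an ordering by hand is error prone, so here is a minimal sketch of a checker. The exam's graph is not reproduced here, so quoted_constraints() holds only the four "has to come before" constraints quoted in the answers above. Passing it is necessary but not sufficient for the real graph. The function names are made up for illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Returns true if no node is repeated and every edge (u, v) has u
// appearing before v in order.  Nodes are single characters.
bool respects_edges(const string &order,
                    const vector < pair <char, char> > &edges)
{
  map <char, int> pos;
  size_t i;

  for (i = 0; i < order.size(); i++) {
    if (pos.find(order[i]) != pos.end()) return false;   // repeated node
    pos[order[i]] = i;
  }
  for (i = 0; i < edges.size(); i++) {
    if (pos.find(edges[i].first) == pos.end()) return false;
    if (pos.find(edges[i].second) == pos.end()) return false;
    if (pos[edges[i].first] > pos[edges[i].second]) return false;
  }
  return true;
}

// Only the constraints quoted in the answers, NOT the full exam graph.
vector < pair <char, char> > quoted_constraints()
{
  vector < pair <char, char> > e;
  e.push_back(make_pair('F', 'D'));
  e.push_back(make_pair('E', 'F'));
  e.push_back(make_pair('C', 'E'));
  e.push_back(make_pair('B', 'E'));
  return e;
}
```

With these four constraints, orderings c and e pass, while a, d and f fail for the reasons given above.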

Grading: 6 points -- three for each correct answer, and -1 for each incorrect one.

Question 3: Network flow

Part 1:

a: Greedy DFS: S, A, B, F, G, C, D, E, H, I, T
b: Modified Dijkstra: S, A, G, C, D, T. Actually, once you reach D, any path that holds a flow of 23 works fine.
c: Edmonds-Karp: S, A, C, D, T.

Part 2:

a: A cut is a collection of edges, not a number. Yes, that is the weight of the minimum cut, but it's not the minimum cut itself.
b: CD is a cut, but it's not a minimum cut.
c: That is a cut whose weight is 51. Yes, it's a minimum cut.
d: Those edges sum to 51, but they are not a cut, since you can get from the source to the sink through edge DT.

Grading

2 points per path, 1.5 points per answer in part 2.

Question 4: Sorting

Part 1: This is bucket sort. You first set all elements in the temporary array to a sentinel value, like 50,000,001. Then you convert each number i to an index between 0 and 2,000,000: index = (i + 50,000,000) * 2,000,000 / 100,000,001. You find the empty element in the temporary array closest to that index and put the value there. At the end, you copy the non-empty elements of the temporary array back to the array, and use insertion sort to sort them. Since they will be nearly sorted, the insertion sort pass will run in linear time.
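Here is a minimal sketch of those steps. It assumes the values lie between -50,000,000 and 50,000,000 and that the temporary array has twice as many slots as there are elements. The name bucket_sort() and the alternating right/left probe for the nearest empty slot are choices made for this sketch, not something the exam dictates.

```cpp
#include <vector>
using namespace std;

// Sketch only: assumes every value is in [-50,000,000, 50,000,000].
void bucket_sort(vector <int> &v)
{
  const long long LO = -50000000LL;
  const long long RANGE = 100000001LL;   // number of possible values
  const int EMPTY = 50000001;            // sentinel outside the value range
  long long slots = 2 * (long long) v.size();
  vector <int> tmp(slots, EMPTY);
  long long idx, d;
  size_t i, j;

  // Drop each value into the empty slot nearest its scaled index.
  // There are always more slots than values, so the search terminates.
  for (i = 0; i < v.size(); i++) {
    idx = (v[i] - LO) * slots / RANGE;   // always in [0, slots-1]
    for (d = 0; ; d++) {
      if (idx + d < slots && tmp[idx + d] == EMPTY) { tmp[idx + d] = v[i]; break; }
      if (idx - d >= 0 && tmp[idx - d] == EMPTY) { tmp[idx - d] = v[i]; break; }
    }
  }

  // Copy the non-empty slots back; the result is nearly sorted.
  j = 0;
  for (i = 0; i < tmp.size(); i++) {
    if (tmp[i] != EMPTY) v[j++] = tmp[i];
  }

  // Insertion sort finishes the job.
  for (i = 1; i < v.size(); i++) {
    int x = v[i];
    for (j = i; j > 0 && v[j-1] > x; j--) v[j] = v[j-1];
    v[j] = x;
  }
}
```

The insertion sort pass makes the result correct for any input in range. The linear running time only holds when the values are spread out fairly evenly, which is what makes them land near their final positions.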

Part 2: If Harvey is given a sorted array, his quicksort will take O(n²) time. That's why we use the "median of three" pivot selection.
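For reference, here is a minimal sketch of a quicksort with median-of-three pivot selection. Harvey's code is not reproduced here, so this is not a patch to it. The function names and the Lomuto-style partition are choices made for this sketch.

```cpp
#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

// Returns whichever of lo, mid, hi indexes the median of the three values.
int median_of_three(const vector <int> &v, int lo, int hi)
{
  int mid = lo + (hi - lo) / 2;
  int a = v[lo], b = v[mid], c = v[hi];

  if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
  if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
  return hi;
}

// Sorts v[lo..hi], inclusive on both ends.
void quicksort(vector <int> &v, int lo, int hi)
{
  int i, store, pivot;

  if (lo >= hi) return;
  swap(v[median_of_three(v, lo, hi)], v[hi]);   // park the pivot at the end
  pivot = v[hi];
  store = lo;
  for (i = lo; i < hi; i++) {
    if (v[i] < pivot) swap(v[i], v[store++]);
  }
  swap(v[store], v[hi]);                        // pivot to its final place
  quicksort(v, lo, store - 1);
  quicksort(v, store + 1, hi);
}
```

On a sorted array, the median of the first, middle and last elements is the middle one, so each partition splits the range roughly in half instead of peeling off one element at a time.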

Part 3: Merge sort requires a second copy of the array. You call it recursively, then perform the merge to the second copy, and copy it back. The extra time and memory make it slower than quicksort.
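A minimal sketch of that structure, to make the extra array and the copy-back step concrete. The function signatures are assumptions made for this sketch.

```cpp
#include <vector>
using namespace std;

// Sorts v[lo..hi-1], using tmp (same size as v) as the second copy.
void merge_sort(vector <int> &v, vector <int> &tmp, int lo, int hi)
{
  int mid, i, j, k;

  if (hi - lo < 2) return;
  mid = lo + (hi - lo) / 2;
  merge_sort(v, tmp, lo, mid);
  merge_sort(v, tmp, mid, hi);

  // Merge the two sorted halves into the second copy...
  i = lo; j = mid; k = lo;
  while (i < mid && j < hi) tmp[k++] = (v[j] < v[i]) ? v[j++] : v[i++];
  while (i < mid) tmp[k++] = v[i++];
  while (j < hi) tmp[k++] = v[j++];

  // ...and copy the merged range back.
  for (k = lo; k < hi; k++) v[k] = tmp[k];
}

void merge_sort(vector <int> &v)
{
  vector <int> tmp(v.size());   // the extra memory
  merge_sort(v, tmp, 0, (int) v.size());
}
```

Every element gets written twice per level of recursion (once into tmp, once back), which is the extra work that quicksort's in-place partition avoids.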

Grading

Five points for part 1. Two each for parts 2 and 3.

Question 5: Dijkstra, Prim, Kruskal

Part 1: It helps to use scratch paper for this one. The order visited, with the shortest path lengths to each node:

A(0), D(10), E(15), C(35), B(40), F(40), H(55), G(60), I(85).

F can be processed before B -- it's arbitrary. You can see the progression of the algorithm in Dijkstra-Ans.pdf. The path is A, D, H, I.

Part 2: G, D, A, E, C, F, H, I, B. You can see the progression of the algorithm in Prim-Ans.pdf.

Part 3: This one adds the edges to the minimum spanning tree in increasing order: CF, AD, AE, EC, HF, HI, BC, DG.
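For reference, here is a minimal sketch of Kruskal's algorithm using a simple union-find structure. The exam's graph and edge weights are not reproduced here, so this won't regenerate the edge list above unless you supply them. The Edge struct and function names are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

struct Edge { int w, u, v; };   // weight and the two endpoints

bool by_weight(const Edge &a, const Edge &b) { return a.w < b.w; }

// Union-find lookup with path halving.
int find_root(vector <int> &parent, int x)
{
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];
    x = parent[x];
  }
  return x;
}

// Returns the spanning tree edges in the order in which they were added.
// Nodes are numbered 0 to n-1.
vector <Edge> kruskal(int n, vector <Edge> edges)
{
  vector <int> parent(n);
  vector <Edge> tree;
  size_t i;
  int a, b;

  for (a = 0; a < n; a++) parent[a] = a;
  sort(edges.begin(), edges.end(), by_weight);

  for (i = 0; i < edges.size(); i++) {
    a = find_root(parent, edges[i].u);
    b = find_root(parent, edges[i].v);
    if (a != b) {               // the edge joins two different components
      parent[a] = b;
      tree.push_back(edges[i]);
    }
  }
  return tree;
}
```

An edge is skipped exactly when both endpoints are already in the same component, which is why the answer lists the edges in increasing weight order with some of the graph's edges missing.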

Grading

Four points per part.

Question 6: Dynamic Programming

The original code with a main() is in khash-orig.cpp. It assumes that there is one command line argument, and it calls Get_Hash() on it. It sets up Base with random numbers, and Shift() performs a 31-bit circular shift on the integer.

This is not the world's best hashing algorithm, especially because it hashes all two-letter words to 0, but you can blame Khloe for that. Bigger values look pretty random, but as promised, it starts to choke as the string gets large:

UNIX> khash-orig a
27f7ad1c a
UNIX> khash-orig b
31e15f77 b
UNIX> khash-orig ab
00000000 ab
UNIX> khash-orig cde
63191872 cde
UNIX> time khash-orig abcdefghijklmnopqrstuv
20bca1f2 abcdefghijklmnopqrstuv
0.734u 0.001s 0:00.73 100.0%	0+0k 0+0io 0pf+0w
UNIX> time khash-orig abcdefghijklmnopqrstuvw
4cc3c8e2 abcdefghijklmnopqrstuvw
1.473u 0.001s 0:01.47 100.0%	0+0k 0+0io 0pf+0w
UNIX> time khash-orig abcdefghijklmnopqrstuvwx
69dfb95c abcdefghijklmnopqrstuvwx
3.084u 0.004s 0:03.08 100.0%	0+0k 0+0io 0pf+0w
UNIX>
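Notice that the running time roughly doubles with each extra character. Here is a minimal sketch that counts the calls, assuming that each call on a string longer than one character makes two recursive calls on strings one character shorter (as in the Get_Hash() code later in this answer). The function names are made up for illustration.

```cpp
// Number of Get_Hash() calls the unmemoized version makes on a string
// of n characters: calls(1) = 1, calls(n) = 1 + 2 * calls(n-1),
// which works out to 2^n - 1.
long long count_calls(int n)
{
  long long calls = 1;
  int k;

  for (k = 2; k <= n; k++) calls = 2 * calls + 1;
  return calls;
}

// Number of (start, size) pairs, i.e. substrings by position: n(n+1)/2.
long long count_substrings(int n)
{
  return (long long) n * (n + 1) / 2;
}
```

For the 24-character string above, that is 16,777,215 calls spread over only 300 substrings.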

The reason that it's blowing up is that it is called on the same string many times. For example:

Get_Hash("abcdefg") calls Get_Hash("bcdefg") and Get_Hash("abcdef").
Get_Hash("abcdef") calls Get_Hash("bcdef") and Get_Hash("abcde").
Get_Hash("bcdefg") calls Get_Hash("cdefg") and Get_Hash("bcdef").

As you can see, with just two levels of recursion, we're calling Get_Hash("bcdef") twice. You don't have to think too hard to memoize this. You can do that in two ways. First, you can simply have a map keyed on the string and use that as a cache. That's done in khash-map.cpp:

class KHash {
  public:
    KHash();
    int Get_Hash(string s);
    int Shift(int n);
  protected:
    vector <int> Base;
    map <string, int> cache;
};

int KHash::Get_Hash(string s)
{
  int i;
  int rv;

  if (cache.find(s) != cache.end()) return cache[s];
  i = s.size();
  if (i == 1) return Base[s[0]];

  rv = (Base[s[0]] ^ 
          Shift(Base[s[i-1]]) ^ 
          Shift(Get_Hash(s.substr(1, i-1))) ^
          Get_Hash(s.substr(0, i-1)));
  cache[s] = rv;
  return rv;
}

You don't even need to update the constructor. This works fine and is much faster than the original:

UNIX> khash-map a
27f7ad1c a
UNIX> khash-map ab
00000000 ab
UNIX> khash-map b
31e15f77 b
UNIX> time khash-map abcdefghijklmnopqrstuvwx
69dfb95c abcdefghijklmnopqrstuvwx
0.000u 0.001s 0:00.00 0.0%	0+0k 0+0io 0pf+0w
UNIX>

Think about the running time -- Get_Hash() ends up being called on every substring of the original string. If the string size is n, then there are n substrings of size 1, n-1 of size 2, etc. That makes O(n²) substrings. Since we're using a map, the running time of this is actually O(n² log(n)): each map operation is logarithmic in the number of entries, which is log(n²), but log(n²) is O(log(n)). Think about it.
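Spelled out, the counting goes as follows. This assumes that a map holding m keys does O(log m) comparisons per operation, and it ignores the cost of comparing and copying the strings themselves:

```latex
\sum_{k=1}^{n} (n - k + 1) = \frac{n(n+1)}{2} = O(n^2),
\qquad
\log(n^2) = 2 \log n = O(\log n),
\qquad
O(n^2) \cdot O(\log n) = O(n^2 \log n).
```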

If you really want this to be quadratic -- O(n²), you can use a cache that is a two-dimensional vector indexed by the indices of the substring. That's a little more difficult, but it is indeed O(n²). It's in khash-vector.cpp:

class KHash {
  public:
    KHash();
    int Get_Hash(string s);
    int Get_Hash_DP(int start, int size);
    int Shift(int n);
  protected:
    vector < vector <int> > cache;
    vector <int> Base;
    string S;
};
    
int KHash::Get_Hash(string s)
{
  int i;

  S = s;
  cache.resize(s.size());
  // size runs from 1 to s.size(), so each row needs s.size()+1 entries.
  for (i = 0; i < cache.size(); i++) cache[i].resize(s.size()+1, -1);
  return Get_Hash_DP(0, s.size());
}

int KHash::Get_Hash_DP(int start, int size)
{
  if (size == 1) return Base[S[start]];

  if (cache[start][size] != -1) return cache[start][size];

  cache[start][size] =  (Base[S[start]] ^ 
          Shift(Base[S[start+size-1]]) ^ 
          Shift(Get_Hash_DP(start+1, size-1)) ^
          Get_Hash_DP(start, size-1));
  return cache[start][size];
}

Finally, to do step 3, you need to remove the recursion. You do that by realizing that you are always making calls to smaller substring sizes. So you build the cache from small substrings to large. This is in khash-step3.cpp:

#include <iostream>
#include <string>
#include <vector>
#include <cstdio>
#include <cstdlib>
using namespace std;

class KHash {
  public:
    KHash();
    int Get_Hash(string s);
    int Shift(int n);
  protected:
    vector <int> Base;
    vector < vector <int> > cache;
};
    
int KHash::Get_Hash(string s)
{
  int start, size, i;

  cache.resize(s.size());
  for (i = 0; i < cache.size(); i++) cache[i].resize(s.size()+1, -1);

  for (size = 1; size <= s.size(); size++) {
    for (start = 0; start+size <= s.size(); start++) {
      if (size == 1) {
        cache[start][size] = Base[s[start]];
      } else {
        cache[start][size] =  (Base[s[start]] ^ 
                              Shift(Base[s[start+size-1]]) ^ 
                              Shift(cache[start+1][size-1]) ^
                              cache[start][size-1]);
      }
    }
  }
  return cache[0][s.size()];
}
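To convince yourself that the bottom-up version computes the same thing as the recursion, here is a minimal self-contained sketch that implements both as free functions. The KHash constructor and Shift() are not shown in this answer, so the versions here are assumptions: make_base() fills the table with a fixed pseudo-random sequence, and Shift() rotates left by one bit within 31 bits. The hash values will therefore not match the transcripts above.

```cpp
#include <string>
#include <vector>
using namespace std;

// ASSUMPTION: a one-bit left rotation within the low 31 bits.
int Shift(int n)
{
  unsigned int u = (unsigned int) n;
  return (int) (((u << 1) | (u >> 30)) & 0x7fffffffu);
}

// ASSUMPTION: stand-in for the constructor's random table (256 entries).
vector <int> make_base()
{
  vector <int> b(256);
  unsigned int x = 12345;
  int i;

  for (i = 0; i < 256; i++) {
    x = x * 1103515245u + 12345u;
    b[i] = (int) (x & 0x7fffffffu);
  }
  return b;
}

// The original exponential recursion.  s must not be empty.
int hash_recursive(const vector <int> &Base, const string &s)
{
  int n = s.size();

  if (n == 1) return Base[(unsigned char) s[0]];
  return (Base[(unsigned char) s[0]] ^
          Shift(Base[(unsigned char) s[n-1]]) ^
          Shift(hash_recursive(Base, s.substr(1, n-1))) ^
          hash_recursive(Base, s.substr(0, n-1)));
}

// The step 3 loop: fill the cache from small sizes to large.
// s must not be empty.
int hash_bottom_up(const vector <int> &Base, const string &s)
{
  int n = s.size(), start, size;
  vector < vector <int> > cache(n, vector <int>(n + 1, -1));

  for (size = 1; size <= n; size++) {
    for (start = 0; start + size <= n; start++) {
      if (size == 1) {
        cache[start][size] = Base[(unsigned char) s[start]];
      } else {
        cache[start][size] = (Base[(unsigned char) s[start]] ^
                              Shift(Base[(unsigned char) s[start+size-1]]) ^
                              Shift(cache[start+1][size-1]) ^
                              cache[start][size-1]);
      }
    }
  }
  return cache[0][n];
}
```

The two functions should agree on every non-empty string. Both return 0 on any two-letter word regardless of how Shift() is defined, because the two Base terms and the two Shift terms cancel under XOR.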

Grading

10 points if you got the real quadratic solution. 8 points if you got the map solution. 4 if you told me how to do it, but didn't get the details correct. An extra 5 if you did step 2 properly, but you had to hack it up (if you simply described it, you got a point).