CS302 Final Exam - Answers and Grading

James S. Plank - December 9, 2014

Question 1

Part A: This graph is very structured. In particular, every path from Y to Z has five edges. Therefore, this could be one of the paths. The answer is Yes.

Part B: This path has a flow of 12. It is easy to eyeball the graph and see paths with greater flow, such as the top one, whose flow is 18: ( Y → A → F → K → P → Z ). The answer is No.

Part C: The edge with the limiting flow: HN.

Part D: These are the five back edges: CY, HC, NH, SN, ZS.

Part E: 12.

Part F: Perform a DFS on the residual graph to find all nodes reachable from Y. Then the minimum cut is composed of all of the edges in the original graph that go from that set to a node outside of it: AF, GL, MR, SZ, TZ.
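
The procedure in Part F can be sketched in code. This is only a sketch under assumptions: the exam graph is not reproduced here, so the adjacency-matrix representation and the names MinCut, Reach, cap (original capacities) and res (residual capacities after the maximum flow has been found) are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>
using namespace std;

typedef vector< vector<int> > Matrix;   // hypothetical: m[u][v] = capacity, 0 if no edge

// Mark every node reachable from n through edges with residual capacity > 0.
static void Reach(const Matrix &res, int n, vector<bool> &seen)
{
  if (seen[n]) return;
  seen[n] = true;
  for (size_t v = 0; v < res.size(); v++) {
    if (res[n][v] > 0) Reach(res, (int) v, seen);
  }
}

// Return the edges (u,v) of the original graph that leave the set of
// nodes reachable from source in the residual graph.
vector< pair<int,int> > MinCut(const Matrix &cap, const Matrix &res, int source)
{
  vector<bool> seen(cap.size(), false);
  vector< pair<int,int> > cut;

  assert(cap.size() == res.size());
  Reach(res, source, seen);
  for (size_t u = 0; u < cap.size(); u++) {
    for (size_t v = 0; v < cap.size(); v++) {
      if (cap[u][v] > 0 && seen[u] && !seen[v]) {
        cut.push_back(make_pair((int) u, (int) v));
      }
    }
  }
  return cut;
}
```

Summing cap[u][v] over the returned edges gives the value asked for in Part G.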

Part G: This is the sum of the weights of the edges in the minimum cut: 145.

Grading

Part A: 1.5 points
Part B: 1.5 points
Part C: 1 point
Part D: .4 points for correct edges and -.4 for incorrect edges
Part E: 1 point for 12, .5 points for -12
Part F: .6 points for correct edges and .3 for reverse edges. -.6 for incorrect edges.

Question 2

Part 1: Let's go through Dijkstra's algorithm to find the shortest path:

Add A to the map with a distance of zero.
Process A -- remove it from the map. Its distance is zero. Add B, C and E to the map, with distances of 19, 4 and 79. The map is: (4,C),(19,B),(79,E).
Process C -- remove it from the map. Its distance is four. Add F and I to the map, with distances of 86 and 6. Also update E's distance to 60. The map is: (6,I),(19,B),(60,E),(86,F).
Process I -- remove it from the map. Its distance is six. Add L, M and O to the map, with distances of 87, 61 and 7. The map is: (7,O),(19,B),(60,E),(61,M),(86,F),(87,L).
Process O -- remove it from the map. Its distance is seven. Add P to the map with a distance of 10. The map is: (10,P),(19,B),(60,E),(61,M),(86,F),(87,L).
Process P -- you're done. Its distance is 10, and the final map is: (19,B),(60,E),(61,M),(86,F),(87,L).

Therefore, the answer is: n: B E M F L.
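
The walk-through above can be sketched as a multimap-based Dijkstra that stops as soon as the destination is removed from the map. This is a sketch under assumptions, not the lecture-notes code: the adjacency-list type Adj, the function name Dijkstra and the parameter left (which receives the nodes still in the map when the search stops) are hypothetical, and edge weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>
using namespace std;

typedef vector< vector< pair<int,int> > > Adj;   // g[u] = list of (v, weight)

// Returns the shortest distance from src to dst, or -1 if dst is unreachable.
// On return, left holds the nodes still in the map, ordered by distance.
int Dijkstra(const Adj &g, int src, int dst, vector<int> &left)
{
  vector<int> dist(g.size(), -1);
  vector<bool> done(g.size(), false);
  vector< multimap<int,int>::iterator > where(g.size());
  multimap<int,int> m;                 // key = distance, val = node
  multimap<int,int>::iterator it;

  assert(src >= 0 && (size_t) src < g.size());
  left.clear();
  dist[src] = 0;
  where[src] = m.insert(make_pair(0, src));

  while (!m.empty()) {
    it = m.begin();
    int u = it->second;
    m.erase(it);                       // "Process u -- remove it from the map"
    done[u] = true;
    if (u == dst) {
      for (it = m.begin(); it != m.end(); it++) left.push_back(it->second);
      return dist[u];
    }
    for (size_t i = 0; i < g[u].size(); i++) {
      int v = g[u][i].first;
      int d = dist[u] + g[u][i].second;
      if (done[v]) continue;
      if (dist[v] == -1 || d < dist[v]) {
        if (dist[v] != -1) m.erase(where[v]);   // update: drop the old entry
        dist[v] = d;
        where[v] = m.insert(make_pair(d, v));
      }
    }
  }
  return -1;
}
```

Each insertion or deletion on the multimap is logarithmic in the number of entries, which is where the per-edge cost discussed in Part 3 comes from.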

Part 2: We only processed 10 edges with Dijkstra's algorithm. We'd have to process all 33 edges using topological sort, so Dijkstra's algorithm will be faster. The answer is Dijkstra.

Part 3: The tradeoff is number of edges processed vs. time to process edges. Topological sort must process all of the edges of the graph; however, it takes O(1) time to process each edge. Dijkstra's algorithm quits when it finds the shortest path, and it may do so well before processing all of the edges of the graph, as in the example above. However, because it manages a multimap, it takes O(log |V|) time to process each edge.

So, if Dijkstra's algorithm processes a fraction of the total number of edges in finding the shortest path, then it will be faster. If the two algorithms must process roughly the same number of edges, the topological sort will be faster.

Please see the lecture notes on Topological Sort for a full discussion and experimental analysis of this.
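
For comparison, here is a sketch of the topological-sort approach on a directed acyclic graph. It is an illustration under assumptions rather than the lecture-notes code: the names Adj and TopSortShortest are hypothetical, and the graph is assumed to be acyclic. Note that every edge is touched, but each one costs only a comparison and an assignment.

```cpp
#include <cassert>
#include <deque>
#include <utility>
#include <vector>
using namespace std;

typedef vector< vector< pair<int,int> > > Adj;   // g[u] = list of (v, weight)

// Shortest distance from src to dst in a DAG, or -1 if dst is unreachable.
int TopSortShortest(const Adj &g, int src, int dst)
{
  vector<int> indeg(g.size(), 0);
  vector<int> dist(g.size(), -1);      // -1 means "not reached yet"
  deque<int> q;

  assert(src >= 0 && (size_t) src < g.size());

  // Count incoming edges, then start with the nodes that have none.
  for (size_t u = 0; u < g.size(); u++) {
    for (size_t i = 0; i < g[u].size(); i++) indeg[g[u][i].first]++;
  }
  for (size_t u = 0; u < g.size(); u++) {
    if (indeg[u] == 0) q.push_back((int) u);
  }

  dist[src] = 0;
  while (!q.empty()) {
    int u = q.front();
    q.pop_front();
    for (size_t i = 0; i < g[u].size(); i++) {
      int v = g[u][i].first;
      int w = g[u][i].second;
      // O(1) work per edge: relax it, then decrement v's incoming count.
      if (dist[u] != -1 && (dist[v] == -1 || dist[u] + w < dist[v])) {
        dist[v] = dist[u] + w;
      }
      if (--indeg[v] == 0) q.push_back(v);
    }
  }
  return dist[dst];
}
```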

Grading:

Part 1: 4 points for n or i. 2.5 for d, e, f, j, m, s, t or w. One for a.
Part 2: 3 points for Dijkstra
Part 3: 7 points. The following points were important:
- The interaction between the shortest path and Dijkstra
- The fact that TS processes all of the edges
- The fact that TS does O(1) work per edge.
- The fact that Dijkstra does O(log(v)) work per edge.

Question 3

Part A: To prove that NC-K is in NP, you must show that you can verify a "yes" answer in polynomial time. A "yes" answer in this case would be a circuit through the graph whose edge weights sum to less than K. You can verify this in linear time: Are there |V| edges? Do they compose a circuit? Do their edge weights sum to less than K? Each of these determinations is O(|V|), so the verification is easily in polynomial time.
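
A verifier along these lines might look like the sketch below. It rests on assumptions: the exam's exact definition of NC-K is not reproduced here, so the sketch assumes the circuit must visit every node exactly once (which the |V|-edge check suggests), that the graph is given as a weight matrix w with -1 meaning "no edge", and that the certificate is the order in which the nodes are visited. The name VerifyCircuit is hypothetical.

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Returns true if order is a circuit through every node of the graph
// whose edge weights sum to less than K. Runs in O(|V|) time.
bool VerifyCircuit(const vector< vector<int> > &w, const vector<int> &order, long K)
{
  size_t n = w.size();
  vector<bool> seen(n, false);
  long total = 0;

  // Are there |V| edges (one per node in the certificate)?
  if (n == 0 || order.size() != n) return false;

  // Do they compose a circuit? Each node must be valid and appear once...
  for (size_t i = 0; i < n; i++) {
    int u = order[i];
    if (u < 0 || (size_t) u >= n || seen[u]) return false;
    seen[u] = true;
  }

  // ...and consecutive nodes (wrapping around) must be joined by an edge.
  for (size_t i = 0; i < n; i++) {
    int weight = w[order[i]][order[(i + 1) % n]];
    if (weight < 0) return false;
    total += weight;
  }

  // Do the edge weights sum to less than K?
  return total < K;
}
```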

Part B: For the second step, you need to show that you can use NC-K to solve a known NP-Complete problem, and if you could solve NC-K in polynomial time, then you could solve the NP-Complete problem in polynomial time.

So, take an arbitrary instance of 3-SAT, and convert it to an instance of NC-K. Then, show that if you can solve the instance of NC-K in polynomial time, then you can convert that answer to solve the instance of 3-SAT. The conversion to NC-K and the conversion back to 3-SAT must both be polynomial time in the size of the 3-SAT instance.

Grading

Five points per part.

Question 4

This is a DFS, pretty much straight from the DFS lecture notes. You need to add a recursive method to help you do the DFS. The only difference between this and the lecture notes is that you use E1 and E2 rather than a normal adjacency list:

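void Graph::DFS(int n, int cn)
{
  /* Already visited: the node has a component number. */
  if (Components[n] != -1) return;
  Components[n] = cn;

  /* Visit the two neighbors stored in E1 and E2. */
  DFS(E1[n], cn);
  DFS(E2[n], cn);
}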

void Graph::CompCon()
{
  int i, cn;

  Components.clear();
  Components.resize(N, -1);
  cn = 0;
  for (i = 0; i < E1.size(); i++) {
    if (Components[i] == -1) {
      DFS(i, cn);
      cn++;
    }
  }
}
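
If the recursion depth is a concern on large graphs, the same traversal can be written with an explicit stack. This is a sketch and not part of the exam answer: the free function ComponentNumbers is a hypothetical name, and it assumes, as the code above does, that E1[n] and E2[n] are always valid node indices and that adjacency is symmetric.

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Returns a vector giving the component number of each node.
vector<int> ComponentNumbers(const vector<int> &E1, const vector<int> &E2)
{
  vector<int> comp(E1.size(), -1);
  vector<int> stack;
  int cn = 0;

  assert(E1.size() == E2.size());
  for (size_t i = 0; i < E1.size(); i++) {
    if (comp[i] != -1) continue;       // already part of an earlier component
    stack.push_back((int) i);
    while (!stack.empty()) {
      int n = stack.back();
      stack.pop_back();
      if (comp[n] != -1) continue;     // visited via another path
      comp[n] = cn;
      stack.push_back(E1[n]);
      stack.push_back(E2[n]);
    }
    cn++;                              // only after a whole component is done
  }
  return comp;
}
```

Each node is labeled once and each edge is pushed at most once per endpoint, so the running time stays linear.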

Grading

You started with 10 points and were deducted for things that you did incorrectly. Common things were:

Only finding the component reachable from node 0.
Incrementing the component number for every loop.
Having your DFS go into an infinite loop.
Using non-linear algorithms.

Question 5

Part 1: The easiest thing here is to use Kruskal's algorithm. BC, BE and BF all connect different nodes to the same component. EF doesn't, so you throw it out. DE, HI and FI are next into the tree. At this point, all of the nodes are connected to the same component, except A and G. It's easy to see that DG and AD will be the last edges in. The answer is AD,BC,BE,BF,DE,DG,HI,FI. This wasn't on the multiple choice answers, but that was rectified during the exam, as I had the students list the edges rather than use the multiple choice.

Parts 2 and 3: Both of these are O(|E| log |V|). This is equivalent to O(|E| log |E|) (see the lecture notes on Minimum Spanning Tree), so that answer is fine too.
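
To see where the running time comes from, here is a sketch of Kruskal's algorithm with a simple disjoint-set structure. It is an illustration under assumptions: the Edge struct and the names Kruskal, Find and ByWeight are hypothetical, and the exam graph's weights are not reproduced. Sorting the edges costs O(|E| log |E|), which dominates the loop that follows.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
using namespace std;

struct Edge { int w, u, v; };          // weight and the two endpoints

static bool ByWeight(const Edge &a, const Edge &b) { return a.w < b.w; }

// Find the representative of x's component, halving the path as we go.
static int Find(vector<int> &parent, int x)
{
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];
    x = parent[x];
  }
  return x;
}

// Returns the edges of a minimum spanning tree (or forest) on n nodes.
vector<Edge> Kruskal(int n, vector<Edge> edges)
{
  vector<int> parent(n);
  vector<Edge> tree;

  for (int i = 0; i < n; i++) parent[i] = i;
  sort(edges.begin(), edges.end(), ByWeight);

  for (size_t i = 0; i < edges.size(); i++) {
    assert(edges[i].u >= 0 && edges[i].u < n && edges[i].v >= 0 && edges[i].v < n);
    int a = Find(parent, edges[i].u);
    int b = Find(parent, edges[i].v);
    if (a == b) continue;              // same component: throw the edge out
    parent[a] = b;                     // otherwise merge the two components
    tree.push_back(edges[i]);
  }
  return tree;
}
```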

Grading

.625 points per correct edge; -.3 per incorrect edge.
2.5 each for the running times. O(|V| log |E|) or O(|V| log |E|) were worth 1.5 points.

Question 6

This is a dynamic program, much like the LCS() example from the Dynamic Programming lecture notes. Let's start with step one -- writing the recursive program. This is in Q6-1.cpp:

int F(string s, string t)
{
  char a, b;
  string x, y;
  int a1, a2;

  if (s.size() == 0 || t.size() == 0) return 0;
  a = s[0];
  x = s.substr(1);
  b = t[0];
  y = t.substr(1);
  if (a > b) {
    a1 = F(s, y);
    a2 = 1 + F(x, t);
  } else {
    a1 = 2 + F(s, y);
    a2 = F(x, t);
  }
  return (a1 > a2) ? a1 : a2;
}

int main()
{
  string s1, s2;

  if (!(cin >> s1 >> s2)) exit(1);
  printf("%d\n", F(s1, s2));
}

When you run it, it works, but it chokes on pretty small input, because of the exponential blowup of recursive calls:

UNIX> echo abc def | a.out
6
UNIX> echo def abc | a.out
3
UNIX> echo vjsi slek | a.out
9
UNIX> time sh -c "echo idjdjwjjejakskd fjiiwjjlal | a.out"
31
2.924u 0.011s 0:02.93 100.0%	0+0k 0+2io 0pf+0w
UNIX> time sh -c "echo idjdjwjjejakskdr fjiiwjjlalb | a.out"
35
11.598u 0.031s 0:11.64 99.8%	0+0k 0+0io 0pf+0w
UNIX>

So, let's memoize. The easiest thing is to convert the two strings into a key for the memoization cache. Since the strings are alphanumeric, we can concatenate them with a space in between to make the key. It's easiest to bundle everything up into a class here so that the recursive method has access to the cache. This is in Q6-2.cpp:

class DP {
  public:
    int F(string s, string t);
    map <string, int> Cache;
};
    
int DP::F(string s, string t)
{
  char a, b;
  string x, y, key;
  int a1, a2;

  if (s.size() == 0 || t.size() == 0) return 0;
  key = s;
  key += ' ';
  key += t;
  if (Cache.find(key) != Cache.end()) return Cache[key];

  a = s[0];
  x = s.substr(1);
  b = t[0];
  y = t.substr(1);
  if (a > b) {
    a1 = F(s, y);
    a2 = 1 + F(x, t);
  } else {
    a1 = 2 + F(s, y);
    a2 = F(x, t);
  }
  Cache[key] = ( (a1 > a2) ? a1 : a2 );
  return Cache[key];
}

int main()
{
  string s1, s2;
  DP dp;

  if (!(cin >> s1 >> s2)) exit(1);
  printf("%d\n", dp.F(s1, s2));
}

Now, we're running nice and fast:

UNIX> g++ -O3 Q6-2.cpp
UNIX> echo abc def | a.out
6
UNIX> echo def abc | a.out
3
UNIX> time sh -c "echo idjdjwjjejakskd fjiiwjjlal | a.out"
31
0.002u 0.013s 0:00.01 100.0%	0+0k 0+2io 0pf+0w
UNIX> time sh -c "echo idjdjwjjejakskdr fjiiwjjlalb | a.out"
35
0.001u 0.003s 0:00.00 0.0%	0+0k 0+0io 0pf+0w
UNIX>

However, as the string sizes get bigger, the program starts to slow down. This is because the cache holds O(n²) entries, and each key of the cache is a string of size O(n), so building and comparing the keys becomes expensive. The file Q6-Small.txt has two strings of size 950, and the program takes roughly 20 seconds to run:

UNIX> wc Q6-Small.txt
       2       2    1902 Q6-Small.txt
UNIX> time a.out < Q6-Small.txt
2764
19.279u 2.410s 0:21.69 99.9%	0+0k 0+0io 0pf+0w
UNIX>

The solution is to instead hold s and t in the class, and have F() operate on their indices. Now the cache keys are simply two numbers, so we can have the cache be a doubly-indexed vector. The code is in Q6-3.cpp:

class DP {
  public:
    string s, t;
    int F(int si, int ti);
    vector < vector <int> > Cache;
};
    
int DP::F(int si, int ti)
{
  int a1, a2;
  char a, b;

  if (si == s.size() || ti == t.size()) return 0;
  if (Cache[si][ti] != -1) return Cache[si][ti];

  a = s[si];
  b = t[ti];
  if (a > b) {
    a1 = F(si, ti+1);
    a2 = 1 + F(si+1, ti);
  } else {
    a1 = 2 + F(si, ti+1);
    a2 = F(si+1, ti);
  }
  Cache[si][ti] = ( (a1 > a2) ? a1 : a2 );
  return Cache[si][ti];
}

int main()
{
  DP dp;
  int i;

  if (!(cin >> dp.s >> dp.t)) exit(1);

  dp.Cache.resize(dp.s.size());
  for (i = 0; i < dp.Cache.size(); i++) {
    dp.Cache[i].resize(dp.t.size(), -1);
  }

  printf("%d\n", dp.F(0, 0));
}

Now it can handle strings whose sizes are nearly 8,000 in under 2 seconds:

UNIX> g++ -O3 Q6-3.cpp
UNIX> echo abc def | a.out
6
UNIX> echo def abc | a.out
3
UNIX> time sh -c "echo idjdjwjjejakskd fjiiwjjlal | a.out"
31
0.002u 0.004s 0:00.00 0.0%	0+0k 0+3io 0pf+0w
UNIX> time sh -c "echo idjdjwjjejakskdr fjiiwjjlalb | a.out"
35
0.001u 0.003s 0:00.00 0.0%	0+0k 0+0io 0pf+0w
UNIX> time a.out < Q6-Small.txt
2764
0.053u 0.003s 0:00.05 100.0%	0+0k 0+1io 0pf+0w
UNIX> wc Q6-Medium.txt 
       2       2    7448 Q6-Medium.txt
UNIX> time a.out < Q6-Medium.txt
10411
0.834u 0.049s 0:00.88 98.8%	0+0k 0+0io 0pf+0w
UNIX> wc Q6-Big.txt 
       2       2   11448 Q6-Big.txt
UNIX> time a.out < Q6-Big.txt
15990
1.887u 0.108s 0:01.99 99.4%	0+0k 0+0io 0pf+0w
UNIX>
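
The same recurrence can also be filled in bottom-up, which removes the recursion and keeps only one row of the cache at a time. This is a sketch that goes beyond what the question asked for; the function name FBottomUp is hypothetical and is not one of the Q6 files.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

// Computes the same value as F(0, 0), working from the ends of the strings
// back to the front. row[ti] holds the answer for (si+1, ti) until it is
// overwritten with the answer for (si, ti).
int FBottomUp(const string &s, const string &t)
{
  vector<int> row(t.size() + 1, 0);    // base case: all zeros when s is used up

  for (int si = (int) s.size() - 1; si >= 0; si--) {
    for (int ti = (int) t.size() - 1; ti >= 0; ti--) {
      int a1, a2;
      if (s[si] > t[ti]) {
        a1 = row[ti + 1];              // F(si, ti+1)
        a2 = 1 + row[ti];              // 1 + F(si+1, ti)
      } else {
        a1 = 2 + row[ti + 1];          // 2 + F(si, ti+1)
        a2 = row[ti];                  // F(si+1, ti)
      }
      row[ti] = (a1 > a2) ? a1 : a2;
    }
  }
  return row[0];
}
```

The running time is still O(n²) for two strings of size n, but the memory drops from O(n²) to O(n).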

Grading

You started with 15 points if you used the integer cache, and 12 if you started with the string cache. You lost points for things you did wrong or omitted. Common things were:

No main().
Calculating a, b, x and y incorrectly.
Not using a delimiter (like a space) in your memoization key.