CS302 Midterm Exam. October 20, 2011. James S. Plank

Answers and Grading Guide

Question 1

The disjoint set data structure allows one to manage a set of n items. At first all n items are in disjoint sets. However, one may call union() to coalesce two sets, and call find() on two elements to see if they are in the same set.

Here's an API in C++:

class Disjoint {
  public:
    Disjoint(int n);
    int Union(int s1, int s2);
    int Find(int element);
    void Print();
  protected:
    vector <int> links;
    vector <int> ranks;
};

The constructor takes a number of elements, n, and creates an instance of the Disjoint Set API that has n distinct sets, each with one element, numbered zero through n-1. Its running time is O(n).

Union() takes two set ids and coalesces the two sets into one. Its running time is O(1).

Find() takes an element and returns its set id. Its running time is O(α(n)), where α(n) is the inverse Ackermann function.
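The exam states only the API and the running times, not an implementation. The following is a minimal sketch of one way to meet those bounds, assuming union by rank and path compression (with which the bound on Find() is an amortized one). The class name DisjointSketch and the convention that links[i] == -1 marks a set id are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Hypothetical class name; the layout mirrors the API above.
class DisjointSketch {
  public:
    DisjointSketch(int n);
    int Union(int s1, int s2);
    int Find(int element);
  protected:
    vector <int> links;   // links[i] is i's parent, or -1 if i is a set id (assumed convention)
    vector <int> ranks;   // ranks[i] bounds the height of the tree rooted at i
};

// O(n): every element starts as the root of its own one-element set.
DisjointSketch::DisjointSketch(int n) : links(n, -1), ranks(n, 1) {}

// O(1): s1 and s2 must be set ids. Returns the id of the merged set.
int DisjointSketch::Union(int s1, int s2)
{
  int tmp;

  assert(links[s1] == -1 && links[s2] == -1);
  if (s1 == s2) return s1;
  if (ranks[s1] < ranks[s2]) { tmp = s1; s1 = s2; s2 = tmp; }
  links[s2] = s1;                            // hang the shallower tree off the deeper one
  if (ranks[s1] == ranks[s2]) ranks[s1]++;   // equal ranks: the merged tree may be one taller
  return s1;
}

// Walks up to the set id, then points every node on the path directly at it.
int DisjointSketch::Find(int element)
{
  int root, next;

  root = element;
  while (links[root] != -1) root = links[root];
  while (element != root) {
    next = links[element];
    links[element] = root;
    element = next;
  }
  return root;
}
```

Because Union() expects set ids rather than arbitrary elements, a caller would typically write d.Union(d.Find(a), d.Find(b)).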

Maze generation is one problem ideally suited to Disjoint Sets. We start with an r by c grid of cells with walls between adjacent cells. This is equivalent to an initial instance of a disjoint set with rc elements. Then we choose a random wall that separates two cells c1 and c2. If those two cells are in different disjoint sets, then we remove the wall between them and call union() on the sets. When we are left with exactly one set, we have created a maze with the maximum number of walls.
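The maze procedure can be sketched as follows. This is an illustration under stated assumptions, not the course's code: the function names find_set and make_maze, the cell numbering row*c + col, and the seed parameter are all hypothetical. Instead of repeatedly picking a random wall, the sketch shuffles the list of walls once and visits each wall in that order, which has the same effect without picking a wall twice. For brevity it links sets directly and omits ranks.

```cpp
#include <algorithm>
#include <cstdlib>
#include <utility>
#include <vector>
using namespace std;

// Find with path compression over a parent vector (links[i] == -1 marks a set id).
int find_set(vector <int> &links, int e)
{
  if (links[e] == -1) return e;
  links[e] = find_set(links, links[e]);
  return links[e];
}

// Returns the interior walls that remain, each as a pair of adjacent cell numbers.
vector < pair <int, int> > make_maze(int r, int c, unsigned int seed)
{
  vector <int> links(r * c, -1);
  vector < pair <int, int> > walls, kept;
  int row, col, i, j, s1, s2;

  // One wall between each pair of horizontally or vertically adjacent cells.
  for (row = 0; row < r; row++) {
    for (col = 0; col < c; col++) {
      if (col + 1 < c) walls.push_back(make_pair(row*c + col, row*c + col + 1));
      if (row + 1 < r) walls.push_back(make_pair(row*c + col, (row+1)*c + col));
    }
  }

  // Fisher-Yates shuffle, so the loop below sees the walls in random order.
  srand(seed);
  for (i = (int) walls.size() - 1; i > 0; i--) {
    j = rand() % (i + 1);
    swap(walls[i], walls[j]);
  }

  for (i = 0; i < (int) walls.size(); i++) {
    s1 = find_set(links, walls[i].first);
    s2 = find_set(links, walls[i].second);
    if (s1 != s2) {
      links[s1] = s2;               // different sets: remove the wall and union
    } else {
      kept.push_back(walls[i]);     // same set: removing it would create a cycle
    }
  }
  return kept;
}
```

An r by c grid has r(c-1) + c(r-1) interior walls, and going from rc sets to one set removes exactly rc-1 of them, so (r-1)(c-1) walls remain no matter which random order is used.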

Grading: 14 points

Overview: Three points (manages sets, union, identification (find))
API: Three points (one half each: constructor, union, find, links, ranks, proper public/protected)
Constructor -- what it does: Half a point
Union -- what it does: One point
Find -- what it does: One point
Constructor -- running time: Half a point
Union -- running time: One point
Find -- running time: One point
Maze generation: Three points

Question 2

This is where quicksort is ideal. You can't use bucket sort because you don't know anything about the data. Thus you have to use an n log(n) algorithm, and quicksort is the fastest of these.
Since we do know how the data is distributed, we can use that fact to sort the data in linear time with bucket sort. We can put each number in its approximate place in an oversized array, and then use insertion sort to put everything into its proper place.
Insertion sort is the fastest when the input is small. Recall our fast implementations of quicksort and merge sort both default to insertion sort for small arrays.
In the program to the right, each number at index i is between i and i+5. Thus, each number is at most five places from where it belongs in sorted order. Insertion sort runs in linear time on such input, so it is the best sorting algorithm here. Quicksort, although fast, will still be O(n log(n)).
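To make the last argument concrete, here is a small sketch that counts how many element shifts insertion sort performs. The function names and the way the test input is generated (v[i] = i + i%6, which keeps every value between i and i+5) are assumptions for illustration; the exam's actual program is not reproduced here.

```cpp
#include <vector>
using namespace std;

// Hypothetical input generator: the value at index i lies between i and i+5.
vector <int> nearly_sorted(int n)
{
  vector <int> v(n);
  int i;

  for (i = 0; i < n; i++) v[i] = i + (i % 6);
  return v;
}

// Sorts v in place and returns the number of element shifts performed.
long insertion_sort(vector <int> &v)
{
  long shifts = 0;
  int i, j, tmp;

  for (i = 1; i < (int) v.size(); i++) {
    tmp = v[i];
    for (j = i; j > 0 && v[j-1] > tmp; j--) {
      v[j] = v[j-1];     // slide the larger element one slot to the right
      shifts++;
    }
    v[j] = tmp;
  }
  return shifts;
}
```

On input of this shape, an element can only be out of order with respect to a handful of its neighbors, so the inner loop runs a bounded number of times per element and the total number of shifts grows linearly with n.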

Grading: 10 points

2.5 points per part. 1.5 for the algorithm, 1.0 for the "why". Partial credit below. If you gave the wrong algorithm but with a plausibly reasonable answer, I gave you some partial credit for the "why".

Merge and Heap sort get you 1 point.
Any of the O(n log n) algorithms gets you a half of a point.
Quicksort gets you 1 point.
Any of the O(n log n) algorithms gets you a half of a point. If you made some reasonable explanation of bucket

Question 3

This is a nuts-and-bolts map problem. You first need a map to store names and number of tests. Then you need a second map to sort by the number of tests. Here's the answer:

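#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main()
{
  map <string, int> all;                         // name -> total number of tests
  map <string, int>::iterator ait;
  multimap <int, string> sorted;                 // total -> name; totals may repeat
  multimap <int, string>::reverse_iterator sit;
  string name;
  int ntests;
  double val;                                    // third input field; read but not used

  /* The loop ends when a read fails, which includes end of file. */
  while (cin >> name >> ntests >> val) {
    all[name] += ntests;
  }

  /* Re-key by total so that the multimap orders the names by number of tests. */
  for (ait = all.begin(); ait != all.end(); ait++) {
    sorted.insert(make_pair(ait->second, ait->first));
  }
  
  /* Reverse traversal prints the largest totals first. */
  for (sit = sorted.rbegin(); sit != sorted.rend(); sit++) {
    printf("%4d %s\n", sit->first, sit->second.c_str());
  }
  exit(0);
}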

Grading: 10 points

Reading with cin: 1 point
Correctly handling eof: 1 point
Declaring a <string, int> map: 1 point
Declaring a <int, string> multimap: 1 point
Declaring iterators correctly: 1 point
Putting the tests into the map: 1 point
Incrementing the number of runs: 1 point
Inserting values correctly into the multimap: 1 point
Traversing the multimap in the right order: 1 point
Correct print statement: 1 point

You lost a bunch of points if you used an O(n²) algorithm instead of either map.

Question 4

Part A:

Part B: First, you insert the 2 into the bottom of the heap:

Then you percolate up, in this case, all the way to the top of the heap:

Part C: In this case, you replace the 3 at the top of the heap with the last element, 41, and delete the 41:

Of course, this isn't a valid heap, so you must percolate down, which will swap 41 with 4, then 13, then 39:

Part D: First, you consider the vector as a heap, which of course is not a valid heap:

Next, you call percolate down on the four elements in the penultimate row of the tree:

Next, you call percolate down on the two elements in the next higher level:

Finally, you call percolate down on the root:

Part E: Push is O(log n), since you may have to percolate up to the top of the heap.

Part F: Pop is also O(log n), since you may have to percolate down to the bottom of the heap.

Part G: This is the "gotcha" of heaps: O(n).
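Assuming Part G asks for the cost of building a heap from a vector, as done by hand in Part D, the procedure can be sketched as below. The function names percolate_down and build_heap are hypothetical, and a min-heap stored in a vector with the children of index i at 2i+1 and 2i+2 is assumed.

```cpp
#include <vector>
using namespace std;

// Restores the min-heap property at index i, assuming both subtrees of i are already heaps.
void percolate_down(vector <int> &h, int i)
{
  int n = h.size();
  int left, right, smallest, tmp;

  while (1) {
    left = 2*i + 1;
    right = left + 1;
    smallest = i;
    if (left < n && h[left] < h[smallest]) smallest = left;
    if (right < n && h[right] < h[smallest]) smallest = right;
    if (smallest == i) return;        // neither child is smaller: done
    tmp = h[i]; h[i] = h[smallest]; h[smallest] = tmp;
    i = smallest;
  }
}

// Starts at the last element that has a child and works back to the root.
void build_heap(vector <int> &h)
{
  int i;

  for (i = (int) h.size() / 2 - 1; i >= 0; i--) percolate_down(h, i);
}
```

The reason this is O(n) rather than O(n log n) is that most nodes sit near the bottom of the tree: roughly n/2^(k+1) nodes have height k, each costs O(k) to percolate down, and the sum of k/2^(k+1) over all k is bounded by a constant.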

Grading: 12 Points

2 points each for parts A-C. Three points for part D. One point each for E, F and G. There's not much partial credit available here.

Question 5

Nuts and bolts bit-arithmetic to represent sets:

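#include <iostream>
#include <string>
#include <vector>
using namespace std;

/* Bit i of subset is set when names[i] belongs to the subset. */
void print_subset(int subset, vector <string> &names)
{
  int i;

  for (i = 0; i < (int) names.size(); i++) {
    if (subset & (1 << i)) cout << names[i] << endl;
  }
}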
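As a worked illustration of the bit arithmetic, the sketch below returns the selected names instead of printing them, which makes the result easy to check. The helper name subset_members is hypothetical and not part of the exam's answer.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: collects the names whose bit is set in subset.
vector <string> subset_members(int subset, const vector <string> &names)
{
  vector <string> rv;
  int i;

  for (i = 0; i < (int) names.size(); i++) {
    if (subset & (1 << i)) rv.push_back(names[i]);   // bit i set: names[i] is in the subset
  }
  return rv;
}
```

With three names, subset 5 is 101 in binary, so it selects names[0] and names[2]; subset 0 is the empty set and subset 7 selects all three.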

Grading: 5 points

This one was subjective -- I gave you partial credit according to:

How well you understood the problem and solution.
How correct your code is.
The fact that your code was devoid of gibberish.