## Question 1

The disjoint set data structure allows one to manage a set of n items. At first all n items are in disjoint sets. However, one may call union() to coalesce two sets, and call find() on two elements to see if they are in the same set.

Here's an API in C++:

```
class Disjoint {
  public:
    Disjoint(int n);
    int Union(int s1, int s2);
    int Find(int element);
    void Print();
  protected:
    vector <int> links;
    vector <int> ranks;
};
```

The constructor takes a number of elements, n, and creates an instance of the Disjoint Set API that has n distinct sets, each with one element, numbered zero through n-1. Its running time is O(n).

Union() takes two set id's and coalesces the two sets into one. Its running time is O(1).

Find() takes an element and returns its set id. Its running time is O(α(n)), where α(n) is the inverse Ackermann function.
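To make the three operations concrete, here is a minimal sketch of one way the API could be implemented, using union by rank and path compression. It is not necessarily the implementation from class: the -1 sentinel for "no parent", the starting rank of 1, and the output format of Print() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>
using namespace std;

class Disjoint {
  public:
    Disjoint(int n);
    int Union(int s1, int s2);
    int Find(int element);
    void Print();
  protected:
    vector <int> links;   // links[i] is i's parent, or -1 if i is a set id (assumed sentinel)
    vector <int> ranks;   // ranks[i] is only meaningful when i is a set id
};

// O(n): every element starts as the root of its own one-element set.
Disjoint::Disjoint(int n) : links(n, -1), ranks(n, 1) {}

// O(1): s1 and s2 must be set ids (roots), not arbitrary elements.
int Disjoint::Union(int s1, int s2)
{
  int parent, child;

  assert(links[s1] == -1 && links[s2] == -1);
  if (s1 == s2) return s1;

  // Union by rank: the root with the smaller rank becomes the child.
  parent = s1;
  child = s2;
  if (ranks[s1] < ranks[s2]) { parent = s2; child = s1; }

  links[child] = parent;
  if (ranks[parent] == ranks[child]) ranks[parent]++;
  return parent;
}

// Walks up to the root, then points every node on the path directly at it.
int Disjoint::Find(int element)
{
  int root, next;

  root = element;
  while (links[root] != -1) root = links[root];

  // Path compression.
  while (links[element] != -1) {
    next = links[element];
    links[element] = root;
    element = next;
  }
  return root;
}

void Disjoint::Print()
{
  for (size_t i = 0; i < links.size(); i++) {
    printf("%d: link %d, rank %d\n", (int) i, links[i], ranks[i]);
  }
}
```

Because Union() expects set ids, a caller that only has elements would write something like `d.Union(d.Find(a), d.Find(b))` after checking that the two Find() results differ.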

Maze generation is one problem ideally suited to Disjoint Sets. We start with an r by c grid of cells with walls between adjacent cells. This is equivalent to an initial instance of a disjoint set with rc elements. Then we choose a random wall that separates two cells c1 and c2. If those two cells are in different disjoint sets, then we remove the wall between them and call union() on the sets. When we are left with exactly one set, every cell is reachable from every other cell, and we have created a maze with the maximum number of walls.
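The maze procedure can be sketched as follows. This is an illustration under assumptions, not a reference solution: `generate_maze`, the `DSet` helper (a compact stand-in for the Disjoint class), the row-major cell numbering, and the seeded shuffle are all hypothetical choices. Shuffling the wall list once and walking it in order is used here in place of repeatedly picking a random wall; each wall still gets considered in random order.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>
using namespace std;

// Compact stand-in for the Disjoint class (hypothetical helper).
struct DSet {
  vector <int> links, ranks;
  DSet(int n) : links(n, -1), ranks(n, 1) {}
  int Find(int e) {
    if (links[e] == -1) return e;
    return links[e] = Find(links[e]);      // path compression
  }
  int Union(int s1, int s2) {              // s1 and s2 are distinct set ids
    if (ranks[s1] < ranks[s2]) swap(s1, s2);
    links[s2] = s1;
    if (ranks[s1] == ranks[s2]) ranks[s1]++;
    return s1;
  }
};

// Returns the walls that remain in the maze, each as a pair of adjacent
// cells. Cell (row, col) is numbered row * c + col.
vector < pair <int, int> > generate_maze(int r, int c, unsigned seed)
{
  vector < pair <int, int> > walls, kept;
  int row, col, cell, s1, s2;

  assert(r > 0 && c > 0);

  // One wall between each pair of horizontally or vertically adjacent cells.
  for (row = 0; row < r; row++) {
    for (col = 0; col < c; col++) {
      cell = row * c + col;
      if (col + 1 < c) walls.push_back(make_pair(cell, cell + 1));
      if (row + 1 < r) walls.push_back(make_pair(cell, cell + c));
    }
  }

  // Consider the walls in random order.
  mt19937 rng(seed);
  shuffle(walls.begin(), walls.end(), rng);

  DSet d(r * c);
  for (size_t i = 0; i < walls.size(); i++) {
    s1 = d.Find(walls[i].first);
    s2 = d.Find(walls[i].second);
    if (s1 != s2) {
      d.Union(s1, s2);           // different sets: remove the wall
    } else {
      kept.push_back(walls[i]);  // same set: the wall stays
    }
  }
  return kept;
}
```

Connecting rc cells into one set takes exactly rc - 1 removals, so of the r(c-1) + c(r-1) initial walls, (r-1)(c-1) remain regardless of the random order.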

• Overview: Three points (manages sets, union, identification (find))
• API: Three points (one half each: constructor, union, find, links, ranks, proper public/protected)
• Constructor -- what it does: Half a point
• Union -- what it does: One point
• Find -- what it does: One point
• Constructor -- running time: Half a point
• Union -- running time: One point
• Find -- running time: One point
• Maze generation: Three points

## Question 2

1. This is where quicksort is ideal. You can't use bucket sort because you don't know anything about the data. Thus you have to use an n log(n) algorithm, and quicksort is the fastest of these.
2. Since we do know how the data is distributed, we can use that fact to sort the data in linear time with bucket sort. We can put each number in its approximate place in an oversized array, and then use insertion sort to put everything into its proper place.
3. Insertion sort is the fastest when the input is small. Recall our fast implementations of quicksort and merge sort both default to insertion sort for small arrays.
4. In the program to the right, each number at index i is between i and i+5. Thus, each number is at most five places from where it should be when it is sorted. Insertion sort will run in linear time on this, so it is the best sorting algorithm. Although fast, quicksort will still be O(n log(n)).
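The linear-time claim in part 4 can be checked with a small sketch. The function below is an ordinary insertion sort that also counts how many elements it shifts; the name `insertion_sort`, the shift counter, and the debugging assert are assumptions added for illustration. If the value at index i is between i and i+5, an element can only be larger than a later element that is fewer than five positions away, so each insertion shifts at most four elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
using namespace std;

// Sorts v in place and returns the number of element shifts performed.
long insertion_sort(vector <int> &v)
{
  long shifts = 0;
  size_t i, j;
  int tmp;

  for (i = 1; i < v.size(); i++) {
    tmp = v[i];
    // Slide larger elements one place right until tmp's slot is found.
    for (j = i; j > 0 && v[j-1] > tmp; j--) {
      v[j] = v[j-1];
      shifts++;
    }
    v[j] = tmp;
  }

  assert(is_sorted(v.begin(), v.end()));   // debugging check only
  return shifts;
}
```

On input of the kind described in part 4, the returned count is bounded by 4n, while on a reverse-sorted vector it is n(n-1)/2.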

2.5 points per part. 1.5 for the algorithm, 1.0 for the "why". Partial credit below. If you gave the wrong algorithm but with a plausibly reasonable answer, I gave you some partial credit for the "why".
1. Merge and Heap sort get you 1 point.
2. Any of the O(n log n) algorithms gets you a half of a point.
3. Quicksort gets you 1 point.
4. Any of the O(n log n) algorithms gets you a half of a point. If you made some reasonable explanation of bucket

## Question 3

This is a nuts-and-bolts map problem. You first need a map to store names and number of tests. Then you need a second map to sort the number of tests. Here's the answer:

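```
#include <cstdio>
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main()
{
  map <string, int> all;                          // name -> total number of tests
  map <string, int>::iterator ait;
  multimap <int, string> sorted;                  // number of tests -> name
  multimap <int, string>::reverse_iterator sit;
  string name;
  int ntests;
  double val;

  // Read until EOF, accumulating the number of tests for each name.
  while (cin >> name >> ntests >> val) {
    all[name] += ntests;
  }

  // Re-key by number of tests. A multimap is needed because counts may repeat.
  for (ait = all.begin(); ait != all.end(); ait++) {
    sorted.insert(make_pair(ait->second, ait->first));
  }

  // Traverse in reverse to print the largest counts first.
  for (sit = sorted.rbegin(); sit != sorted.rend(); sit++) {
    printf("%4d %s\n", sit->first, sit->second.c_str());
  }
  return 0;
}
```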

• Reading with cin: 1 point
• Correctly handling eof: 1 point
• Declaring a <string, int> map: 1 point
• Declaring a <int, string> multimap: 1 point
• Declaring iterators correctly
• Putting the tests into the map: 1 point
• Incrementing the number of runs: 1 point
• Inserting values correctly into the multimap: 1 point
• Traversing the multimap in the right order: 1 point
• Correct print statement: 1 point
You lost a bunch of points if you used an O(n²) algorithm instead of either map.

## Question 4

Part A:

Part B: First, you insert the 2 into the bottom of the heap:

Then you percolate up, in this case, all the way to the top of the heap:

Part C: In this case, you replace the 3 at the top of the heap with the last element, 41, and delete the 41:

Of course, this isn't a valid heap, so you must percolate down, which will swap 41 with 4, then 13, then 39:

Part D: First, you consider the vector as a heap, which of course is not a valid heap:

Next, you call percolate down on the four elements in the penultimate row of the tree:

Next, you call percolate down on the two elements in the next higher level:

Finally, you call percolate down on the root:

Part E: Push is O(log n), since you may have to percolate up to the top of the heap.

Part F: Pop is also O(log n), since you may have to percolate down to the bottom of the heap.

Part G: This is the "gotcha" of heaps: O(n).
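The operations in parts B through G can be sketched on a vector-based min-heap. The function names below are hypothetical, and the sketch assumes that the O(n) in part G refers to building a heap from a vector as in part D, with the children of index i stored at 2i+1 and 2i+2.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>
using namespace std;

// Swap h[i] with its parent until the parent is no larger.
void percolate_up(vector <int> &h, size_t i)
{
  size_t parent;

  while (i > 0) {
    parent = (i - 1) / 2;
    if (h[parent] <= h[i]) return;
    swap(h[parent], h[i]);
    i = parent;
  }
}

// Swap h[i] with its smaller child until neither child is smaller.
void percolate_down(vector <int> &h, size_t i)
{
  size_t left, right, smallest;

  while (true) {
    left = 2 * i + 1;
    right = left + 1;
    smallest = i;
    if (left < h.size() && h[left] < h[smallest]) smallest = left;
    if (right < h.size() && h[right] < h[smallest]) smallest = right;
    if (smallest == i) return;
    swap(h[i], h[smallest]);
    i = smallest;
  }
}

// O(log n): add at the bottom, then percolate up.
void heap_push(vector <int> &h, int val)
{
  h.push_back(val);
  percolate_up(h, h.size() - 1);
}

// O(log n): move the last element to the root, then percolate down.
int heap_pop(vector <int> &h)
{
  int top;

  assert(!h.empty());
  top = h[0];
  h[0] = h.back();
  h.pop_back();
  percolate_down(h, 0);
  return top;
}

// O(n): percolate down every non-leaf node, from the penultimate row up to the root.
void make_heap_from(vector <int> &h)
{
  for (size_t i = h.size() / 2; i-- > 0; ) percolate_down(h, i);
}
```

Building the heap this way is linear rather than O(n log n) because most nodes sit near the bottom of the tree, where percolate_down has very little distance to travel.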

2 points each for parts A-C. Three points for part D. One point each for E, F and G. There's not much partial credit available here.

## Question 5

Nuts and bolts bit-arithmetic to represent sets:

```
void print_subset(int subset, vector <string> &names)
{
  int i;

  // Bit i of subset is set exactly when names[i] belongs to the subset.
  for (i = 0; i < (int) names.size(); i++) {
    if (subset & (1 << i)) cout << names[i] << endl;
  }
}
```
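As an illustration of how this representation is typically used, the sketch below enumerates every subset of a list of names by counting from 0 to 2^n - 1. The helpers `subset_members` and `print_all_subsets` are hypothetical and not part of the exam answer, and the sketch assumes fewer than 31 names so that the shifts fit in an int.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Returns the names whose bits are set in subset.
vector <string> subset_members(int subset, const vector <string> &names)
{
  vector <string> rv;

  assert(names.size() < 31);   // keep 1 << i within a 32-bit int
  for (size_t i = 0; i < names.size(); i++) {
    if (subset & (1 << i)) rv.push_back(names[i]);
  }
  return rv;
}

// Prints all 2^n subsets, one per line, in increasing order of bit pattern.
void print_all_subsets(const vector <string> &names)
{
  assert(names.size() < 31);
  for (int subset = 0; subset < (1 << names.size()); subset++) {
    vector <string> m = subset_members(subset, names);
    cout << "{";
    for (size_t i = 0; i < m.size(); i++) cout << (i ? " " : "") << m[i];
    cout << "}" << endl;
  }
}
```

With three names, for example, the subset value 5 (binary 101) selects the names at indices 0 and 2.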