As an example, in pqueue_set.h I have a class definition for a priority queue for doubles that has a few additional methods:
#include <vector>
#include <set>
using namespace std;

class PQueue {
  public:
    PQueue();
    PQueue(vector <double> &v);
    void Push(double d);
    double Pop();
    int Size();
    int Empty();
    void Print();
  protected:
    multiset <double> h;
};
The additional methods are pretty obvious. We have two constructors. The one that takes no arguments will create an empty priority queue. The one that takes a reference to a vector will make the queue from the vector.
You may ask "But Dr. Plank, you said that reference parameters are evil. Why are you using one?" The answer is that the STL is too cumbersome otherwise. If I pass a pointer to a vector v, then I have to reference element i either with v->at(i), which is inefficient, or with (*v)[i], which is really hard to read. So I'll bite the bullet and use a reference parameter.
In class, someone asked why I think they are evil. The answer is that when I call new PQueue(v), I have to go find the header to make sure that I'm not making a copy of v. It's very easy to get into the habit of passing large data structures as arguments, and if you don't define them as reference parameters, you will make copies. Most of the time, your programs will run correctly, but they will be exceptionally inefficient. Think about this use -- if I did not specify v as a reference parameter, the code would compile and run correctly. However, it would be very inefficient. We may explore this later....
#include <cstdio>
#include <iostream>
#include <cstdlib>
#include "pqueue_set.h"

PQueue::PQueue()
{
}

void PQueue::Push(double d)
{
  h.insert(d);
}

int PQueue::Size()
{
  return h.size();
}

int PQueue::Empty()
{
  return h.empty();
}

double PQueue::Pop()
{
  multiset <double>::iterator hit;
  double retval;

  if (h.empty()) {
    cerr << "Error: Called Pop on an empty PQueue\n";
    exit(1);
  }
  hit = h.begin();
  retval = *hit;
  h.erase(hit);
  return retval;
}

PQueue::PQueue(vector <double> &v)
{
  int i;

  for (i = 0; i < v.size(); i++) h.insert(v[i]);
}

void PQueue::Print()
{
  multiset <double>::iterator hit;

  for (hit = h.begin(); hit != h.end(); hit++) {
    if (hit != h.begin()) cout << " ";
    cout << *hit;
  }
  cout << endl;
}
To test, I have written four programs that use Priority Queues:
UNIX> add_5_set
Enter five numbers to insert into the priority queue
3 5 2 4 1
1 2 3 4 5
UNIX>
UNIX> head -n 5 input.txt
782.678
270.576
447.465
501.138
714.567
UNIX> head -n 5 input.txt | sort_set
270.576
447.465
501.138
714.567
782.678
UNIX> wc input.txt
    1000    1000    7885 input.txt
UNIX> sort_set < input.txt > output1.txt
UNIX> sort -n input.txt > output2.txt
UNIX> diff output1.txt output2.txt
UNIX>
UNIX> time massive_create_set 1000 100
0.045u 0.001s 0:00.04 100.0% 0+0k 0+0io 0pf+0w
UNIX> time massive_create_set 10000 100
0.472u 0.003s 0:00.47 100.0% 0+0k 0+0io 0pf+0w
UNIX> time massive_create_set 100000 100
7.144u 0.021s 0:07.20 99.4% 0+0k 0+0io 0pf+0w
UNIX> time massive_create_set 1000000 10
15.832u 0.102s 0:16.07 99.1% 0+0k 0+0io 0pf+0w
UNIX>
The other property of a heap is that the value of each node must be less than or equal to all of the values in its subtrees. That's it. Below are two examples of heaps:
That second one has duplicate values (two sixes), but that's ok. It still fits the definition of a heap.
Below are three examples of non-heaps:
The 10 node is not smaller than the values in its subtrees.
Not a complete tree.
The last row is not filled from left to right.
When we push a value onto a heap, we know where the new node is going to go. So we insert the value there. Unfortunately, that may not result in a new heap, so we adjust the heap by "percolating up." To percolate up, we look at the value's parent. If the parent is smaller than the value, then we can quit -- we have a valid heap. Otherwise, we swap the value with its parent, and continue percolating up.
Below we give an example of inserting the value 2 into a heap:
Start.
Put 2 into the new node and percolate up.
Continue percolating, since 2 is less than 3.
Now we stop, since 1 is less than 2.
To pop a value off a heap, again, we know the shape of the tree when we're done -- we will lose the rightmost node on the last level. So what we do is put the value in that node into the root (we will return the old value of the root as the return value of the Pop() call). Of course, we may not be left with a heap, so as in the previous example, we must modify the heap. We do that by "Percolating Down." This is a little more complex than percolating up. To percolate down, we check a value against its children. If it is the smallest of the three, then we're done. Otherwise, we swap it with the child that has the minimum value, and continue percolating down.
As before, an example helps. We will call Pop() on the last heap above. This will return 1 to the user, and will delete the node holding the 7, so we put the value 7 into the root and start percolating down:
Start.
Swap 7 and 2.
Swap 7 and 3. We're done.
If we think about running time complexity, the depth of a heap with n elements is roughly log_2(n). Thus, Push() and Pop() both run with O(log(n)) complexity. That is exactly the same complexity as using a multiset above. So why do we bother?
The answer is that even though the two have the same running time complexity, the heap implementation is more efficient. Since we always add and subtract elements to and from the end of the heap, we may implement the heap as a vector. The root is at index 0, the nodes at the next level are at indices 1 and 2, etc. For example, we show how the final heap above maps to a vector:
When we Push() an element onto the heap, we perform a push_back() on the vector, and then we percolate up. When we Pop() an element off the heap, we store the root, replace it with the last element, call pop_back() on the vector, and percolate down starting at the root. A quick inspection of the indices shows that the node at index i has its children at indices 2i+1 and 2i+2, and its parent at index (i-1)/2.
void PQueue::Push(double d)
{
  int i, parent;
  double tmp;

  h.push_back(d);                // Put the new element at the end of the tree.
  i = h.size()-1;

  while (1) {                    // And percolate up.
    if (i == 0) return;
    parent = (i-1)/2;
    if (h[parent] > h[i]) {      // This code swaps a node with its parent,
      tmp = h[i];                // if the parent is greater than the node.
      h[i] = h[parent];
      h[parent] = tmp;
      i = parent;
    } else {
      return;
    }
  }
}

double PQueue::Pop()
{
  double retval, tmp;
  int index, lc, rc;

  if (h.empty()) {
    cerr << "Error: Called Pop on an empty PQueue\n";
    exit(1);
  }

  retval = h[0];                 // Store the root value to return.
  h[0] = h[h.size()-1];          // Move the last element to the root.
  h.pop_back();                  // Delete it from the heap.
  if (h.size() == 0) return retval;

  index = 0;
  while (1) {                    // And percolate down:
    lc = index*2+1;
    rc = lc+1;
    if (lc >= h.size()) return retval;          // Stop if you have no children.
    if (rc == h.size() || h[lc] <= h[rc]) {     // If you're not swapping with your right child:
      if (h[lc] < h[index]) {                   // Then either swap with your left child,
        tmp = h[lc];
        h[lc] = h[index];
        h[index] = tmp;
        index = lc;
      } else {                                  // or don't swap; you're the smallest.
        return retval;
      }
    } else if (h[rc] < h[index]) {              // Here's where you swap with your right child.
      tmp = h[rc];
      h[rc] = h[index];
      h[index] = tmp;
      index = rc;
    } else {                                    // Or you don't swap; you're the smallest.
      return retval;
    }
  }
}
Let's double-check ourselves with add_5_heap_simp:
UNIX> add_5_heap_simp
Enter five numbers to insert into the priority queue
1 2 3 4 5
1 2 3 4 5
UNIX> add_5_heap_simp
Enter five numbers to insert into the priority queue
5 4 3 2 1
1 2 4 5 3
UNIX>

In the first call, we don't need to do any percolating when we insert elements into the heap. Thus, the vector has the values in order. In the second call, the five Push() calls each percolate the new value to the root of the heap:
To verify that percolating down works, we simply ensure that sort_heap_simp works:
UNIX> head -n 5 input.txt | sort_heap_simp
270.576
447.465
501.138
714.567
782.678
UNIX> sort_heap_simp < input.txt > output3.txt
UNIX> diff output3.txt output2.txt
UNIX>
PQueue::PQueue(vector <double> &v)
{
  int i;

  for (i = 0; i < v.size(); i++) Push(v[i]);
}
Since Push() is O(log(n)), this implementation is O(n log(n)). However, you can do better: copy the vector directly into h, and then call "Percolate Down" on every node that has at least one child, working from the bottom of the tree up to the root.
Below is an example. Here's the vector that we want to turn into a heap:
We convert the vector into a tree, as pictured below, and then we call "Percolate Down" on nodes of each successively higher level. These are the nodes pictured in yellow. By the time we get to the root, we have created a valid heap:
To analyze the running time complexity, we need a little math. Suppose our tree is complete at each level. The tree has n nodes, and thus its depth is log(n). The worst case performance of "Percolate Down" for a node at level i is log(n)-i. Thus, the performance of "Percolate Down" for a node at the bottom level is 1; for a node at the penultimate level, it is 2, etc.
Then the bottom level contains (n+1)/2 nodes, which is roughly n/2. Thus, all of the Percolate Down's take (1)(n/2) = n/2 operations. At the next level, there are n/4 nodes, whose Percolate Down's take 2 operations each. Thus, all of them take 2n/4. At the next level, there are n/8 nodes whose Percolate Down's take 3 operations each. Thus, all of them take 3n/8. Do you see the pattern?
The creation of the heap can be converted into a summation:

    n/2 + 2n/4 + 3n/8 + ... = n * sum_{i=1}^{log(n)} i/2^i < n * sum_{i=1}^{infinity} i/2^i
To figure out what that second summation is, let's first write a quick C++ program (formula.cpp) to calculate its first 15 partial sums:
#include <iostream>
using namespace std;

int main()
{
  double num, den, total, n;

  num = 1;
  den = 2;
  total = 0;
  for (n = 0; n < 15; n++) {
    total += (num/den);
    num++;
    den *= 2;
    cout << total << endl;
  }
  return 0;
}
Running it:
UNIX> formula
0.5
1
1.375
1.625
1.78125
1.875
1.92969
1.96094
1.97852
1.98828
1.99365
1.99658
1.99817
1.99902
1.99948
UNIX>

Looks like it converges to 2. Let's analyze it mathematically. Consider the following sum:

    G = 1/2 + 2/4 + 3/8 + ... = sum_{i=1}^{infinity} i/2^i
Now, let's consider the equation G - G/2:

    G - G/2 = sum_{i=1}^{infinity} i/2^i - sum_{i=1}^{infinity} i/2^(i+1) = sum_{i=1}^{infinity} 1/2^i
From high school math, we know that this last summation equals one, so G - G/2 = 1, which means that G = 2.
#include <vector>
using namespace std;

class PQueue {
  public:
    PQueue();
    PQueue(vector <double> &v);
    void Push(double d);
    double Pop();
    int Size();
    int Empty();
    void Print();
  protected:
    void Percolate_Down(int index);
    vector <double> h;
};
The implementation is straightforward. We lifted the Percolate_Down() code from our previous implementation of Pop(), and then we use it in our vector constructor:
PQueue::PQueue(vector <double> &v)
{
  int i;

  h = v;                 // Copy v to h. Gotta love C++.
  for (i = h.size()/2-1; i >= 0; i--) Percolate_Down(i);
}
UNIX> sh sort_vec_time.sh 10000
0.089 0.088
UNIX>

I wrote a second shell script bigtest_1.sh that varies the number of input elements from 1000 to 100000 elements and calls sort_vec_time.sh to test the sorting programs on them. For each test it prints the number of elements and then the two timings:
UNIX> sh bigtest_1.sh | head
1000 0.02 0.019
2000 0.026 0.025
3000 0.035 0.03
4000 0.043 0.041
5000 0.054 0.045
6000 0.056 0.054
7000 0.065 0.066
8000 0.072 0.074
9000 0.081 0.082
10000 0.088 0.089
UNIX>

The full results are in test-data.txt. Do you see anything unexpected about the data? sort_vec_heap is not consistently outperforming sort_vec_heap_simp. Does that extrapolate to larger numbers of elements? We provide a graph in sort_test.jgr:
There are two properties of that graph that should speak to you. First, the squiggly curves indicate that the individual data points might have some variance, and that we should run more than one run per test. Second, there seems to be no appreciable difference between the two implementations. Let's solve the first problem by running bigtest_1.sh more times and plotting average values. The file test-data-2.txt contains the results of running bigtest_1.sh ten times. We graph it using sort_test2.jgr:
That is smoother, but again there is no appreciable difference between the two implementations. Should we then conclude that there is no difference? Not without probing further. Think about what sort_vec is doing when the number of elements is 100,000. It is reading over 900,000 bytes from disk into a vector and then creating a heap from it. The predominant source of overhead is reading the file, and not creating the heap. There is variability that comes from doing the disk reads, and this variability masks the difference between the two implementations.
We should try another approach: massive_create_heap.cpp (and massive_create_heap_simp.cpp):
#include "pqueue_heap.h"
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
  PQueue *pq;
  int i, j, nelements, niterations;
  vector <double> v;

  if (argc != 3) {
    cerr << "usage: massive_create_heap nelements niterations\n";
    exit(1);
  }
  nelements = atoi(argv[1]);
  niterations = atoi(argv[2]);

  v.resize(nelements);
  for (i = 0; i < niterations; i++) {
    for (j = 0; j < nelements; j++) v[j] = drand48();
    pq = new PQueue(v);
    delete pq;
  }
  return 0;
}
This program generates the random input in memory without reading from disk. It also performs the heap creation multiple times, so that the program runs long enough to time accurately.
The shell script massive_test.sh tests the performance of both massive_create_heap_simp and massive_create_heap, running them with enough iterations so that the tests take around half a second on my Macbook. As above, we run this for a variety of values between 10,000 and 1,000,000 and plot the timings below:
As you can see, massive_create_heap is indeed faster than massive_create_heap_simp. Those jumps are interesting, aren't they? I'll talk about them below.
You can now see why we couldn't tell any difference in the sort_vec tests. When we did the test on 100,000 elements, the program was taking 0.8 seconds, but the heap construction was taking roughly 0.01 seconds for both programs -- clearly in the experimental noise.
What should you get from this? First, that experimental computer science research can be very tricky. We had a simple goal: to demonstrate that an O(n) algorithm runs faster than an O(n log(n)) one. We tried an obvious solution, and we couldn't use it because we weren't really testing what we wanted to test. When we rewrote our testing programs to do a better job of removing the noise from the experiment, we were able to demonstrate improved performance.
Second, the O(n) algorithm is indeed faster.
The first question you should ask yourself is whether those jumps are due to instruction count or memory. When I see curves like that, I think "memory." So, I instrumented pqueue_heap.cpp to keep track of the number of times it looks at an element of v in Push(). That is a rough estimate of instruction count. If you take a look at a graph of that, you'll see that the curve is smooth:
That is a hint that it's not instruction count that is causing the jumps.
Take a look at the following graphs. In these, I have instrumented the programs so that they keep track of the elements of v that they access in each call to either Percolate_Down() or Push(). Here's the graph for creating the heap by repeatedly calling Push(). Time goes from top to bottom.
And here's the graph for creating the heap from the vector by repeatedly calling Percolate_Down():
Can you see why the first graph would cause jumps with fixed-sized caches, and the second would not? This would be a great exercise for a cache modeling assignment. We can talk about it in class when we have the time. Or, if I take the time to write about it, I'll do so for you.