CS302 Lecture Notes - Priority Queues

The priority queue data structure maintains a collection of items with fast access to the minimum. Its API supports the following operations:

- Push(): insert an item into the collection.
- Pop(): remove the minimum item from the collection and return it.
- Size(): return the number of items in the collection.
- Empty(): return whether the collection is empty.

As an example, in pqueue_set.h I have a class definition for a priority queue for doubles that has a few additional methods:

#include <vector>
#include <set>
using namespace std;

class PQueue {
  public:
    PQueue();
    PQueue(vector <double> &v);
    void Push(double d);
    double Pop();
    int Size();
    int Empty();
    void Print();
  protected:
    multiset <double> h;
};

The additional methods are pretty obvious. We have two constructors. The one that takes no arguments will create an empty priority queue. The one that takes a reference to a vector will make the queue from the vector.

You may ask "But Dr. Plank, you said that reference parameters are evil. Why are you using one?" The answer is that the STL is too cumbersome otherwise. If I pass a pointer to a vector v, then I have to reference element i either with v->at(i), which is inefficient, or with (*v)[i], which is really hard to read. So I'll bite the bullet and use a reference parameter.

In class, someone asked why I think they are evil. The answer is that when I call new PQueue(v), I have to go find the header to make sure that I'm not making a copy of v. It's very easy to get into the habit of passing large data structures as arguments, and if you don't define them as reference parameters, you will make copies. Most of the time, your programs will run correctly, but they will be exceptionally inefficient. Think about this use -- if I did not specify v as a reference parameter, the code would compile and run correctly. However, it would be very inefficient. We may explore this later....


Using a multiset to implement the Priority Queue

The easiest way to implement a priority queue is to use a multiset. This implementation is so simple that most methods take one line of code. It is in pqueue_set.cpp:

#include <cstdio>
#include <iostream>
#include <cstdlib>
#include "pqueue_set.h"

PQueue::PQueue() { }
void PQueue::Push(double d) { h.insert(d); }
int PQueue::Size() { return h.size(); }
int PQueue::Empty() { return h.empty(); }

double PQueue::Pop() 
{
  multiset <double>::iterator hit;
  double retval;
  
  if (h.empty()) {
    cerr << "Error: Called Pop on an empty PQueue\n";
    exit(1);
  }
  hit = h.begin();
  retval = *hit;
  h.erase(hit);
  return retval;
}

PQueue::PQueue(vector <double> &v) 
{ 
  int i;

  for (i = 0; i < v.size(); i++) h.insert(v[i]);
}

void PQueue::Print() 
{
  multiset <double>::iterator hit;

  for (hit = h.begin(); hit != h.end(); hit++) {
    if (hit != h.begin()) cout << " ";
    cout << *hit;
  }
  cout << endl;
}

To test, I have written four programs that use Priority Queues.


Performance, and implementing with a Heap

Typically, one implements a priority queue with a data structure called a heap. A heap is a "complete binary tree" -- which means it is a tree where every level is full, except possibly the last. The last level fills up from the left to the right. Thus, if we know how many elements are in a heap, we know exactly what it looks like. For example, a heap with 10 elements will look as follows, regardless of the values of the elements.

The other property of a heap is that the value of each node must be less than or equal to all of the values in its subtrees. That's it. Below are two examples of heaps:

That second one has duplicate values (two sixes), but that's ok. It still fits the definition of a heap.

Below are three examples of non-heaps:


The 10 node is not smaller than the values in its subtrees.

Not a complete tree.

The last row is not filled from left to right.

When we push a value onto a heap, we know where the new node is going to go. So we insert the value there. Unfortunately, that may not result in a new heap, so we adjust the heap by "percolating up." To percolate up, we look at the value's parent. If the parent is smaller than the value, then we can quit -- we have a valid heap. Otherwise, we swap the value with its parent, and continue percolating up.

Below we give an example of inserting the value 2 into a heap:

Start

Put 2 into the new node and percolate up.

Continue percolating, since 2 is less than 3.

Now we stop, since 1 is less than 2.

To pop a value off a heap, again, we know the shape of the tree when we're done -- we will lose the rightmost node on the last level. So what we do is put the value in that node into the root (we will return the old value of the root as the return value of the Pop() call). Of course, we may not be left with a heap, so as in the previous example, we must modify the heap. We do that by "Percolating Down." This is a little more complex than percolating up. To percolate down, we check a value against its children. If it is the smallest of the three, then we're done. Otherwise, we swap it with the child that has the minimum value, and continue percolating down.

As before, an example helps. We will call Pop() on the last heap above. This will return 1 to the user, and will delete the node holding the 7, so we put the value 7 into the root and start percolating down:

Start

Swap 7 and 2

Swap 7 and 3. We're Done.

If we think about running time complexity, the depth of a heap with n elements is roughly log2(n). Thus, Push() and Pop() both run with O(log(n)) complexity. This is the same complexity as the multiset implementation above. So why do we bother?

The answer is that even though the two have the same running time complexity, the heap implementation is more efficient. Since we always add and subtract elements to and from the end of the heap, we may implement the heap as a vector. The root is at index 0, the nodes at the next level are at indices 1 and 2, etc. For example, we show how the final heap above maps to a vector:

When we Push() an element onto the heap, we perform a push_back() on the vector, and then we percolate up. When we Pop() an element off the heap, we store the root, replace it with the last element, call pop_back() on the vector, and percolate down starting at the root. A quick inspection of the indices shows that the node at index i has its parent at index (i-1)/2, its left child at index 2i+1, and its right child at index 2i+2.

Because we use a vector rather than a linked data structure (like a red-black tree or AVL tree, which is how the STL implements maps and sets), the operations should be more efficient than using the multiset implementation.

The Heap Implementation

We will provide two heap implementations, which differ only in the vector constructor method. The first is in pqueue_heap_simp.h and pqueue_heap_simp.cpp. The only difference in the class description is that h is now a vector instead of a multiset. The two interesting methods are Push() and Pop(), which implement percolating up and down respectively:

void PQueue::Push(double d) 
{ 
  int i, parent;
  double tmp;

  h.push_back(d);    // Put the new element at the end of the tree.
  i = h.size()-1;
  while (1) {
    if (i == 0) return;    // And percolate up.
    parent = (i-1)/2;
    if (h[parent] > h[i]) {      // This code swaps a node with its parent,
      tmp = h[i];                // if the parent is greater than the node.
      h[i] = h[parent];
      h[parent] = tmp;
      i = parent;
    } else {
      return;
    }
  }
}

double PQueue::Pop() 
{
  double retval, tmp;
  int index, lc, rc;
  
  if (h.empty()) {
    cerr << "Error: Called Pop on an empty PQueue\n";
    exit(1);
  }
  retval = h[0];                          // Store the root value to return.
  h[0] = h[h.size()-1];                   // Move the last element to the root.
  h.pop_back();                           // Delete it from the heap.
  if (h.size() == 0) return retval;
  index = 0;
  while (1) {                             // And percolate down:
    lc = index*2+1;
    rc = lc+1;
    if (lc >= h.size()) return retval;    // Stop if you have no children.

    if (rc == h.size() || h[lc] <= h[rc]) { // If you're not swapping with your right child:
      if (h[lc] < h[index]) {               // Then either swap with your left child.
        tmp = h[lc];
        h[lc] = h[index];
        h[index] = tmp;
        index = lc;
      } else {                                   // Or don't swap; you're the smallest.
        return retval;
      }
    } else if (h[rc] < h[index]) {            // Here's where you swap with your right child.
      tmp = h[rc];
      h[rc] = h[index];
      h[index] = tmp;
      index = rc;
    } else {                                     // Or you don't swap; you're the smallest.
      return retval;
    }
  }
}

Let's double-check ourselves with add_5_heap_simp:

UNIX> add_5_heap_simp 
Enter five numbers to insert into the priority queue
1 2 3 4 5
1 2 3 4 5
UNIX> add_5_heap_simp
Enter five numbers to insert into the priority queue
5 4 3 2 1
1 2 4 5 3
UNIX> 
In the first call, we don't need to do any percolating when we insert elements into the heap. Thus, the vector has the values in order. In the second call, the five add calls each percolate the new value to the root of the heap:

To verify that percolating down works, we simply ensure that sort_heap_simp works:

UNIX> head -n 5 input.txt | sort_heap_simp
270.576
447.465
501.138
714.567
782.678
UNIX> sort_heap_simp < input.txt > output3.txt
UNIX> diff output3.txt output2.txt
UNIX> 

Constructing a heap from a vector

pqueue_heap_simp.cpp implements its vector constructor by simply traversing the vector and calling Push():

PQueue::PQueue(vector <double> &v) 
{ 
  int i;

  for (i = 0; i < v.size(); i++) Push(v[i]);
}

Since Push() is O(log(n)), this implementation is O(n(log(n))). However, you can do better. Let us do the following instead: copy the vector into h, and then call "Percolate Down" on every non-leaf node, starting with the last one (at index h.size()/2-1) and working backward to the root.

Below is an example. Here's the vector that we want to turn into a heap:

We convert the vector into a tree, as pictured below, and then we call "Percolate Down" on nodes of each successively higher level. These are the nodes pictured in yellow. By the time we get to the root, we have created a valid heap:

To analyze the running time complexity, we need a little math. Suppose our tree is complete at each level. The tree has n nodes, and thus its depth is log(n). The worst case performance of "Percolate Down" for a node at level i is log(n)-i. Thus, the performance of "Percolate Down" for a node at the bottom level is 1; for a node at the penultimate level, it is 2, etc.

Then the bottom level contains (n+1)/2 nodes, which is roughly n/2. Thus, all of the Percolate Down's take (1)(n/2) = n/2 operations. At the next level, there are n/4 nodes, whose Percolate Down's take 2 operations each. Thus, all of them take 2n/4. At the next level, there are n/8 nodes whose Percolate Down's take 3 operations each. Thus, all of them take 3n/8. Do you see the pattern?

The creation of the heap can be converted into a summation:

    (1)(n/2) + (2)(n/4) + (3)(n/8) + ... = sum_{i=1}^{log(n)} (i)(n/2^i) = n * sum_{i=1}^{log(n)} (i/2^i)

To figure out what that second summation is, let's first write a quick C++ program to calculate its value for the first 15 values of n (formula.cpp):

#include <iostream>
using namespace std;

int main()
{
  double num, den, total, n;

  num = 1;
  den = 2;
  total = 0;
  
  for (n = 0; n < 15; n++) {
    total += (num/den);
    num++;
    den *= 2;
    cout << total << endl;
  }
}

Running it:

UNIX> formula
0.5
1
1.375
1.625
1.78125
1.875
1.92969
1.96094
1.97852
1.98828
1.99365
1.99658
1.99817
1.99902
1.99948
UNIX> 
Looks like it converges to 2. Let's analyze it mathematically. Consider the following sum:

    G = sum_{i=1}^{infinity} (i/2^i) = 1/2 + 2/4 + 3/8 + 4/16 + ...

Now, let's consider the equation G - G/2:

    G - G/2 = 1/2 + (2/4 - 1/4) + (3/8 - 2/8) + (4/16 - 3/16) + ... = 1/2 + 1/4 + 1/8 + 1/16 + ...

From high school math, we know that this last summation equals one, so G - G/2 = 1, which means that G = 2.


What is the upshot of all of this??

The upshot is that creating a heap from a vector is less than or equal to Gn, which means that it is O(n) instead of O(n(log(n))). Is that significant? Let's hack it up and test. Our last priority queue implementation is in pqueue_heap.h and pqueue_heap.cpp. The header simply adds a protected Percolate_Down() method:

#include <vector>
using namespace std;

class PQueue {
  public:
    PQueue();
    PQueue(vector <double> &v);
    void Push(double d);
    double Pop();
    int Size();
    int Empty();
    void Print();
  protected:
    void Percolate_Down(int index);
    vector <double> h;
};

The implementation is straightforward. We lifted the Percolate_Down() code from our previous implementation of Pop(), and then we use it in our vector constructor:

PQueue::PQueue(vector <double> &v) 
{ 
  int i;

  h = v;    // Copy v to h.  Gotta love C++.

  for (i = h.size()/2-1; i >= 0; i--) Percolate_Down(i);
}


Performance Testing

Let's test the difference between the two implementations. First, let's try performance testing with sort_vec_heap vs sort_vec_heap_simp. I'm not going to go into great detail on how I run these experiments -- it helps to be fluent with all the Unix tools like sh/sed/grep/awk. I first wrote a simple C++ program called genrandom.cpp which generates a random input file. Then I wrote a shell script called sort_vec_time.sh which takes a number of elements, generates a random input file with that many elements, and then times the performance of sort_vec_heap_simp and sort_vec_heap on the file. For example -- with an input size of 10,000 numbers, the two programs take 0.089 and 0.088 seconds respectively on my MacBook Pro (2.16 GHz Intel Processor):
UNIX> sh sort_vec_time.sh 10000
0.089
0.088
UNIX> 
I wrote a second shell script bigtest_1.sh that varies the number of input elements from 1000 to 100000 elements and calls sort_vec_time.sh to test the sorting programs on them. For each test it prints the number of elements and then the two timings:
UNIX> sh bigtest_1.sh | head
1000 0.02 0.019
2000 0.026 0.025
3000 0.035 0.03
4000 0.043 0.041
5000 0.054 0.045
6000 0.056 0.054
7000 0.065 0.066
8000 0.072 0.074
9000 0.081 0.082
10000 0.088 0.089
UNIX> 
The full results are in test-data.txt. Do you see anything unexpected about the data? sort_vec_heap is not consistently outperforming sort_vec_heap_simp. Does that extrapolate to larger numbers of elements? We provide a graph in sort_test.jgr:

There are two properties of that graph that should speak to you. First, the squiggly curves indicate that the individual data points might have some variance, and that we should make more than one run per test. Second, there seems to be no appreciable difference between the two implementations. Let's solve the first problem by running bigtest_1.sh more times and plotting average values. The file test-data-2.txt contains the results of running bigtest_1.sh ten times. We graph it using sort_test2.jgr:

That is smoother, but again there is no appreciable difference between the two implementations. Should we then conclude that there is no difference? Not without probing further. Think about what sort_vec is doing when the number of elements is 100,000. It is reading over 900,000 bytes from disk into a vector and then creating a heap from it. The predominant source of overhead is reading the file, and not creating the heap. There is variability that comes from doing the disk reads, and this variability masks the difference between the two implementations.

We should try another approach. Instead, we should use massive_create_heap.cpp (and massive_create_heap_simp.cpp):

#include "pqueue_heap.h"
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
  PQueue *pq;
  int i, j, nelements, niterations;
  vector <double> v;

  if (argc != 3) {
    cerr << "usage: massive_create_heap nelements niterations\n";
    exit(1);
  }

  nelements = atoi(argv[1]);
  niterations = atoi(argv[2]);

  v.resize(nelements);
  for (i = 0; i < niterations; i++) {
   for (j = 0; j < nelements; j++) v[j] = drand48();
   pq = new PQueue(v);
   delete pq;
  }
}

This program generates the random input in memory without reading from disk. It also performs the heap creation multiple times, so that the program runs long enough to time accurately.

The shell script massive_test.sh tests the performance of both massive_create_heap_simp and massive_create_heap, running them with enough iterations so that the tests take around half a second on my Macbook. As above, we run this for a variety of values between 10,000 and 1,000,000 and plot the timings below:

As you can see, massive_create_heap is indeed faster than massive_create_heap_simp. Those jumps are interesting, aren't they? I'll talk about them below.

You can now see why we couldn't tell any difference in the sort_vec tests. When we did the test on 100,000 elements, the program was taking 0.8 seconds, but the heap construction was taking roughly 0.01 seconds for both programs -- clearly in the experimental noise.

What should you get from this? First, that experimental computer science research can be very tricky. We had a simple goal: to demonstrate that a O(n) algorithm runs faster than a O(n(log(n))) one. We tried an obvious solution, and we couldn't use it because we weren't really testing what we wanted to test. When we rewrote our testing programs to do a better job of removing the noise from the experiment, we were able to demonstrate improved performance.

Second, the O(n) algorithm is indeed faster.


Those jumps, and effects of memory

I will not test you on this -- this is just optional material, in case you are interested.

The first question you should ask yourself is whether those jumps are due to instruction count or memory. When I see curves like that, I think "memory." So, I instrumented pqueue_heap.cpp to keep track of the number of times it looks at an element of v in Push(). That is a rough estimate of instruction count. If you take a look at a graph of that, you'll see that the curve is smooth:

That is a hint that it's not instruction count that is causing the jumps.

Take a look at the following graphs. In these, I have instrumented the programs so that they keep track of the elements of v that they access in each call to either Percolate_Down(), or Push(). Here's the graph for creating the heap by repeatedly calling Push(). Time goes from top to bottom.

And here's the graph for creating the heap from the vector by repeatedly calling Percolate_Down():

Can you see why the first graph would cause jumps with fixed-sized caches, and the second would not? This would be a great exercise for a cache modeling assignment. We can talk about it in class when we have the time. Or, if I take the time to write about it, I'll do so for you.