CS302 Lecture Notes - Priority Queues

The priority queue data structure maintains a collection of items with fast access to the minimum. Its API supports the following operations:

In include/pqueue.hpp, I define an interface for a priority queue which holds doubles. Then I define class definitions for two implementations:

As you can see, I've added a few useful methods -- Size(), Empty() and Print() -- to the interface.

#include <vector>
#include <set>

/* The Main Priority Queue Interface. */

class PQueue {
  public:
    virtual ~PQueue() {};
    virtual void    Push(double d) = 0;
    virtual double  Pop()          = 0;
    virtual size_t  Size() const   = 0;
    virtual bool    Empty() const  = 0;
    virtual void    Print() const  = 0;
};
/* PQueueSet: Implementing the priority 
   queue with a multiset */

class PQueueSet : public PQueue {
  public:
    void    Push(double d);
    double  Pop();
    size_t  Size() const;
    bool    Empty() const;
    void    Print() const;

    PQueueSet();
  protected:
    std::multiset <double> elements;
};
/* PQueueHeap: Implementing the priority 
   queue with a binary heap.  You'll note
   that there is a second constructor
   here that creates the priority queue
   from a vector of doubles. */

class PQueueHeap : public PQueue {
  public:
    void    Push(double d);
    double  Pop();
    size_t  Size() const;
    bool    Empty() const;
    void    Print() const;

    PQueueHeap();
    PQueueHeap(const std::vector <double> &init);
  protected:
    std::vector <double> h;
    void Percolate_Down(size_t index);
};

You'll note that there is an extra constructor in the heap implementation. This is a constructor that creates a priority queue from an unsorted vector of doubles. We'll discuss that more below.


The multiset implementation

The easiest way to implement a priority queue is to use a multiset. This implementation is so simple that most methods take one line of code. It is in src/pqueue_set.cpp:

#include <cstdio>
#include <iostream>
#include <cstdlib>
#include "pqueue.hpp"
using namespace std;

/* Using a multiset to implement the priority queue makes 
   most of these methods really simple.  The first four
   here are one-liners: */

        PQueueSet::PQueueSet()    { }
void    PQueueSet::Push(double d) { elements.insert(d); }
size_t  PQueueSet::Size() const   { return elements.size(); }
bool    PQueueSet::Empty() const  { return elements.empty(); }

/* Print() is straightfoward. */

void PQueueSet::Print() const
{
  multiset <double>::const_iterator hit;

  for (hit = elements.begin(); hit != elements.end(); hit++) {
    if (hit != elements.begin()) cout << " ";
    cout << *hit;
  }
  cout << endl;
}
/* With Pop(), we error check, and then if the multiset
   is not empty, we return the first element, and erase
   it from the multiset. */

double PQueueSet::Pop() 
{
  multiset <double>::iterator hit;
  double retval;
  
  if (elements.empty()) {
    throw (string) "Called Pop() on an empty PQueue";
  }
  hit = elements.begin();
  retval = *hit;
  elements.erase(hit);
  return retval;
}

To test, I have written two simple programs that use Priority Queues. The first is src/add_5.cpp. It creates an insteance of the PQueue and then prompts the user for five numbers. It pushes them onto the priority queue and then prints the queue. It reads its command line argument to determine whether to use the multiset implementation ('s') or the heap implementation ('h').

#include "pqueue.hpp"
#include <iostream>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main(int argc, char **argv)
{
  PQueue *pq;
  int i;
  double d;

  /* Construct the proper implementation of the priority queue. */

  if (argc != 2 || (argv[1][0] != 's' && argv[1][0] != 'h')) {
    fprintf(stderr, "usage: add_5 s|h\n");
    exit(1);
  }

  switch (argv[1][0]) { 
    case 's': pq = new PQueueSet(); break;
    case 'h': pq = new PQueueHeap(); break;
    default:  exit(1);
  }
  /* Prompt for five numbers.
     Push them and print the PQueue. */

  cout << "Enter five numbers.\n";

  for (i = 0; i < 5; i++) {
    if (!(cin >> d)) exit(1);
    pq->Push(d);
  }
  pq->Print();
  return 0;
}

When we run bin/add_5 on the multiset implementation, the numbers are sorted, because the multiset sorts them:

UNIX> bin/add_5 s
Enter five numbers.
3 5 2 4 1
1 2 3 4 5
UNIX> 
The second program is src/pq_sort.cpp. This program reads doubles from standard input and sorts them using a priority queue. Like bin/add_5 it uses its command line argument to determine which implementation to use: After reading in the values, it pops them all, printing them out as it pops them. That prints the values in sorted order. Take a look at the code to error check -- it's a useful trick when you want to check whether a value belongs to a set of characters:

#include "pqueue.hpp"
#include <iostream>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main(int argc, char **argv)
{
  PQueue *pq;
  double d;
  string imps;
  vector <double> v;

  /* Error check, and then create the pq with either the set or heap implementation. */

  imps = "shl";
  if (argc != 2 || imps.find(argv[1][0]) == string::npos) {
    fprintf(stderr, "usage: pqueue_sort c where c is a character in %s\n", imps.c_str());
    exit(1);
  }

  switch (argv[1][0]) { 
    case 's': pq = new PQueueSet(); break;
    case 'h': pq = new PQueueHeap(); break;
    case 'l': pq = NULL; break;
    default:  exit(1);
  }

  /* With 's' and 'h', simply read and push onto the heap.  With 'l', read everything
     into a vector and then construct the heap from the vector. */

  if (pq == NULL) {
    while (cin >> d) v.push_back(d);
    pq = new PQueueHeap(v);
  } else {
    while (cin >> d) pq->Push(d);
  }

  /* Popping everything sorts the doubles. */

  while (!pq->Empty()) cout << pq->Pop() << endl;

  return 0;
}

Let's test it on the multiset:

UNIX> head -n 5 txt/input.txt
782.678
270.576
447.465
501.138
714.567
UNIX> head -n 5 txt/input.txt | bin/pq_sort s
270.576
447.465
501.138
714.567
782.678
UNIX> wc txt/input.txt
    1000    1000    7885 txt/input.txt
UNIX> bin/pq_sort s < txt/input.txt > txt/output1.txt
UNIX> sort -n input.txt > txt/output2.txt
UNIX> diff txt/output1.txt txt/output2.txt
UNIX> 
Now, time to learn about binary heaps:

Implementing the priority queue with a binary heap

Typically, one implements a priority queue with a data structure called a binary heap (or simply heap for short). A heap is a "complete binary tree" -- which means it is a tree where every level is full, except for the last. The last level fills up from the left to the right. Thus, if we know how many elements are in a heap, we know exactly what it looks like. For example, a heap with 10 elements will look as follows, regardless of the values of the elements.

The other property of a heap is that the value of each node must be less than or equal to all of the values in its subtrees. That's it. Below are two examples of heaps:

That second one has duplicate values (two sixes), but that's ok. It still fits the definition of a heap.

Below are three examples of non-heaps:


The 10 node is not smaller than the values in its subtrees.

Not a complete tree.

The last row is not filled from left to right.

When we push a value onto a heap, we know where the new node is going to go. So we insert the value there. Unfortunately, that may not result in a new heap, so we adjust the heap by "percolating up." To percolate up, we look at the value's parent. If the parent is smaller than the value, then we can quit -- we have a valid heap. Otherwise, we swap the value with its parent, and continue percolating up.

Below we give an example of inserting the value 2 into a heap:

Start

Put 2 into the new node and percolate up.

Continue percolating, since 2 is less than 3.

Now we stop, since 1 is less than 2.

To pop a value off a heap, again, we know the shape of the tree when we're done -- we will lose the rightmost node on the last level. So what we do is put the value in that node into the root (we will return the old value of the root as the return value of the Pop() call). Of course, we may not be left with a heap, so as in the previous example, we must modify the heap. We do that by "Percolating Down." This is a little more complex than percolating up. To percolate down, we check a value against its children. If it is the smallest of the three, then we're done. Otherwise, we swap it with the child that has the minimum value, and continue percolating down.

As before, an example helps. We will call Pop() on the last heap above. This will return 1 to the user, and will delete the node holding the 7, so we put the value 7 into the root and start percolating down:

Start

Swap 7 and 2

Swap 7 and 3. We're Done.

If we think about running time complexity, obviously the depth of a heap with n elements is log2(n). Thus, Push() and Pop() both run with O(log(n)) complexity. This is the exact same as using a multiset above. So why do we bother?

The answer is that even though the two have the same running time complexity, the heap implementation is more efficient. Since we always add and subtract elements to and from the end of the heap, we may implement the heap as a vector. The root is at index 0, the nodes at the next level are at indices 1 and 2, etc. For example, we show how the final heap above maps to a vector:

When we Push() an element onto the heap, we perform a push_back() on the vector, and then we percolate up. When we Pop() an element off the heap, we store the root, replace it with the last element, call pop_back() on the vector, and percolate down starting at the root. A quick inspection of the indices shows that:

Because we use a vector rather than a linked data structure (like a red-black tree or AVL tree, which is how the STL implements maps and sets), the operations should be more efficient than using the multiset implementation.

The Heap Implementation

The heap implementation is in src/pqueue_heap.cpp. The vector that implements the heap is named h. Below, I show Push(), which appends the element to the vector and then percolates up:

/* With Push(), we append the new element to the
   end of the vector, and then percolate up. */

void PQueueHeap::Push(double d) 
{ 
  int i, parent;
  double tmp;

  /* Put the element at the end of the vector. */

  h.push_back(d);

  /* And then percolate up.  i is always
     pointing to the newly added element. */

  i = h.size()-1;
  while (1) {
    if (i == 0) return;
    parent = (i-1)/2;
    if (h[parent] > h[i]) {
      tmp = h[i];
      h[i] = h[parent];
      h[parent] = tmp;
      i = parent;
    } else {
      return;
    }
  }
}

I have implemented Percolate_Down as a separate protected method, because it is used in two places. The first is Pop(), which moves the last element to the root of the tree and percolates down. Here is the code for both Percolate_Down(), Pop() and Print(). Percolate_Down() has to test a few cases, depending on whether there is a right child, or whether the right child's value is less than the left child's value.

/* I have implemented Percolate_Down as a separate
   protected method, because it is used in two
   places: Pop() and the vector constructor. */

void PQueueHeap::Percolate_Down(size_t index)
{
  size_t lc, rc;
  double tmp;

  /* lc is the left child, and
     rc is the right child. */

  while (1) {                
    lc = index*2+1;
    rc = lc+1;
   
    /* If index is on the bottom of the heap,
       then return, because we can't 
       percolate any more. */

    if (lc >= h.size()) return;

    /* With this code, either there is no
       right child, or the left child is
       smaller than the right child.  In 
       this case, you need to compare the
       element at index with the left child. */

    if (rc == h.size() || h[lc] <= h[rc]) {
      if (h[lc] < h[index]) {
        tmp = h[lc];
        h[lc] = h[index];
        h[index] = tmp;
        index = lc;
      } else {
        return;
      }

    /* Otherwise, we compare the element at
       index with the right child. */

    } else if (h[rc] < h[index]) {
      tmp = h[rc];
      h[rc] = h[index];
      h[index] = tmp;
      index = rc;
    } else {
      return;
    }
  }
}
/* With Pop(), after we error check, we
   move the last element to the root of
   the heap, and percolate down. */

double PQueueHeap::Pop() 
{
  double retval;
  
  if (h.empty()) {
    throw (string) "Called Pop() on an empty PQueue";
  }
  retval = h[0];
  h[0] = h[h.size()-1];
  h.pop_back();
  if (h.size() == 0) return retval;
  Percolate_Down(0);
  return retval;
}

/* The vector constructor calls Percolate_Down()
   on all non-leaf elements, going from the bottom 
   of the heap to the top. */

PQueueHeap::PQueueHeap(const vector <double> &v) 
{ 
  int i;

  h = v;

  for (i = (int) h.size()/2-1; i >= 0; i--) Percolate_Down(i);
}

/* Print is straightforward. */

void PQueueHeap::Print() const
{
  size_t i;

  for (i = 0; i < h.size(); i++) {
    if (i != 0) cout << " ";
    cout << h[i];
  }
  cout << endl;
}

Let's double-check ourselves with bin/add_5:

UNIX> bin/add_5 h
Enter five numbers.
3 5 2 4 1
1 2 3 5 4
UNIX> bin/add_5 h
Enter five numbers.
5 4 3 2 1
1 2 4 5 3
UNIX> 
In the first call, we don't need to do any percolating when we insert elements into the heap. Thus, the vector has the values in order. In the second call, the five add calls each percolate the new value to the root of the heap:

To verify that percolating down works, let's compare pq_sort with the multiset implementation:

UNIX> head -n 5 txt/input.txt | bin/pq_sort h
270.576
447.465
501.138
714.567
782.678
UNIX> bin/pq_sort h < txt/input.txt > txt/output3.txt
UNIX> diff txt/output1.txt txt/output3.txt
UNIX> 

Constructing a heap from a vector

The final piece of the implementation is the code that constructs the priority queue from an unsorted vector. Obviously, you could simply Push() each element of the vector, which would make the constructor < O(n(log(n))). However, you can do better. Let us do the following instead:

Below is an example. Here's the vector that we want to turn into a heap:

77 21 55 94 1 27 7 67 61 78

We convert the vector into a tree, as pictured below, and then we call Percolate_Down() on nodes of each successively higher level. These are the nodes pictured in yellow. By the time we get to the root, we have created a valid heap:

To analyze the running time complexity, we need a little math. Suppose our tree is complete at each level. The tree has n nodes, and thus its depth is log(n). The worst case performance of Percolate_Down() for a node at level i is log(n)-i. Thus, the performance of Percolate_Down() for a node at the bottom level is 1; for a node at the penultimate level, it is 2, etc.

The bottom level contains (n+1)/2 nodes, which is roughly n/2. Thus, all of the Percolate_Down()'s take (1)(n/2) = n/2 operations. At the next level, there are n/4 nodes, whose Percolate_Down()'s take 2 operations each. Thus, all of them take 2n/4. At the next level, there are n/8 nodes whose Percolate_Down()'s take 3 operations each. Thus, all of them take 3n/8. Do you see the pattern? To help you, try the animated gif in img/vector.gif, which shows the percolate down's for a heap with 511 elements.

The creation of the heap can be converted into a summation:

To figure out what that second summation is, let's first write a quick C++ program to calculate its value for the first 15 values of n. The code is below (src/formula.cpp), and to the right, we show its output:

#include <iostream>
using namespace std;

int main()
{
  double num, den, total, n;

  num = 1;
  den = 2;
  total = 0;
  
  for (n = 0; n < 15; n++) {
    total += (num/den);
    num++;
    den *= 2;
    cout << total << endl;
  }
}
UNIX> bin/formula
0.5
1
1.375
1.625
1.78125
1.875
1.92969
1.96094
1.97852
1.98828
1.99365
1.99658
1.99817
1.99902
1.99948
UNIX> 

Looks like it converges to 2. Let's analyze it mathematically. Consider the following sum:

Now, let's consider the equation G - G/2:

From high school math, we know that this last summation equals one, so G - G/2 = 1, which means that G = 2.


What is the upshot of all of this??

The upshot is that creating a heap from a vector is less than or equal to Gn, which means that it is O(n) instead of O(n(log(n))). Is that significant? Let's test it -- here's the vector constructor in src/pqueue_heap.cpp:

PQueueHeap::PQueueHeap(const vector <double> &v) 
{ 
  int i;

  h = v;

  for (i = (int) h.size()/2-1; i >= 0; i--) Percolate_Down(i);
}

That was simple, wasn't it? We test performance below:


Performance Testing

For our first performance test, let's use bin/pq_sort. I have written a very simple program called src/genrandom.cpp that prints a specified number of random doubles. My thought is to use bin/genrandom to generate test cases for bin/pq_sort. We'll pipe bin/genrandom into bin/pq_sort and then redirect the standard output of bin/pq_sort to /dev/null, which deletes it:
UNIX> bin/genrandom 1000 | bin/pq_sort s > /dev/null
UNIX>
If we try to time this with time, we see no output:
UNIX> time bin/genrandom 1000 | bin/pq_sort s > /dev/null
UNIX> 
That's because time's output is also going to /dev/null. To fix this, we have time execute a subshell:
UNIX> time sh -c "bin/genrandom 1000 | bin/pq_sort s > /dev/null"
0.009u 0.005s 0:00.01 0.0%	0+0k 0+0io 0pf+0w
UNIX> 
Great -- now we can write a shell script called scripts/bigtest_1.sh, which calls all three implementations with input sizes ranging from 1000 to 100,000:
UNIX> sh scripts/bigtest_1.sh | head
s 1000 0.00
s 2000 0.01
s 3000 0.02
s 4000 0.02
s 5000 0.02
s 6000 0.04
s 7000 0.04
s 8000 0.05
s 9000 0.05
s 10000 0.05
UNIX> 
It prints out the implementation, the number of doubles and the wall clock time in seconds (MacBook Pro in 2018). The full data is in txt/test-data1.txt. I graph it below (using jgraph - scripts/sort_test1.jgr):

That graph sucks. The main reason is the squiggly lines -- what they indicate is that the individual data points have some variance, and that we should run more than one run per test, averaging the results. Still, if we ignore the squiggly lines, we draw two conclusions:

  1. The heap implementations are faster than the set implementation.
  2. There is no appreciable difference between the two heap implementations.
Is the second conclusion surprising? Let's think about it some more. What's going on when we are testing 100,000 numbers? We are timing a lot of things here, in addition to what we want to be timing. In particular, we are timing a lot of I/O to and from pq_test. Let's instead try timing just what we want to time -- the time to create the heap.

I've written a testing program called src/massive_create.cpp, which takes three command line arguments:

  1. s, h or c
  2. A number of elements
  3. A number of iterations
It then iterates for the given number of iterations, creating a priority queue with the given number of random elements. Once it is done creating the priority queue, it deletes it. In that way, we're only measuring the priority queue creation time. I won't bother showing you the code -- you can get it from the link above.

I've also changed my shell script. For each test, it runs bin/massive_create with i iterations, where i * n = 10,000,000. That gives me running times on the order of .5 seconds per test. I ran this on the Linux box in my office (so that I'm not timing the 500 other things that my Macbook is doing right now), and varied the number of elements from 10,000 to 1,000,000 (just 100,000 for the multiset).

Those jumps in the red lines are interesting, aren't they? I'll talk about them below.

That's a BIG difference, isn't it? Look at the values on the axes -- these tests are running roughly 100 times faster than the previous ones! Not only are the lines smoother, but you can very clearly see the performance improvements between multiset and heap, and from the two different constructors.

What should you take from this? First, that experimental computer science research can be very tricky. We had a simple goal: to demonstrate that a O(n) algorithm runs faster than a O(n(log(n))) one. We tried an obvious solution, and we couldn't use it because we weren't really testing what we wanted to test. When we rewrote our testing programs to do a better job of removing the noise from the experiment, we were able to demonstrate improved performance.

Second, the O(n) algorithm is indeed faster.


Those jumps, and effects of memory

I will not test you on this -- this is just optional material, in case you are interested.

The first question you should ask yourself is whether those jumps are due to instruction count or memory. When I see curves like that, I think "memory." So, I instrumented pqueue_heap.cpp to keep track of the times it looks at an element of v in Push(). That is a rough estimate of instruction count. If you take a look at a graph of that, you'll see that the curve is smooth:

That is a hint that it's not instruction count that is causing the jumps.

Take a look at the following graphs. In these, I have instrumented the programs so that they keep track of the elements of v that they access in each call to either Percolate_Down(), or Push(). Here's the graph for creating the heap by repeatedly calling Push(). Time goes from top to bottom.

And here's the graph for creating the heap from the vector by repeatedly calling Percolate_Down():

Can you see why the first graph would cause jumps with fixed-sized caches, and the second would not? This would be a great exercise for a cache modeling assignment. We can talk about it in class when we have the time. Or, if I take the time to write about it, I'll do so for you.