CS302 Lecture notes -- More Priority Queues in C++

Brad Vander Zanden (heavily adapted from Jim Plank)

The Bank Simulation

The book uses a bank simulation example to motivate the need for priority queues. We will do such a simulation, more for the programming exercise than for any other reason.

Basically, we are confronted with the following problem. We are bank executives, and we need to hire tellers. We want to hire enough tellers that customers don't have to wait in line too long. However, we don't want to hire too many tellers, because we want to save money. Thus, we want to hire an optimal number of tellers.

One way to do this is to simulate a bank. To do this, we need to characterize on average when people come to a bank, and how long their transactions take. Then we can write a simulator that randomly generates people coming into a bank doing transactions, and we can see how the number of tellers impacts how long people wait, and how long tellers are idle. This will let us make a decision about the optimal number of tellers.

Now, suppose we run the following simulation:

New Person:    0 enters the bank at    1.5. Transaction time:    5.7
New Person:    1 enters the bank at    2.8. Transaction time:    1.9
New Person:    2 enters the bank at    3.3. Transaction time:    8.7
New Person:    3 enters the bank at    9.2. Transaction time:    2.7

We have two tellers, and four people.

What happens is that the simulator generates four random people that enter the bank at 1.5, 2.8, 3.3, and 9.2 minutes. Their transactions take 5.7, 1.9, 8.7, and 2.7 minutes respectively. Given these parameters and two tellers, the simulation will go as follows:

At 1.5, person 0 enters the bank. Both tellers are free, so teller 1 starts working on the person's transaction. The transaction will take 5.7 minutes, so it will be done at 7.2 minutes.
At 2.8, person 1 enters the bank. Teller 2 is free, so teller 2 starts working on the person's transaction. The transaction will take 1.9 minutes, so it will be done at 4.7 minutes.
At 3.3, person 2 enters the bank. Both tellers are busy, so person 2 must wait in line until one of the tellers is free.
At 4.7, teller 2 is free, so person 2 can get off the line, and the teller can work on person 2's transaction. Person 2's transaction will take 8.7 minutes, so it will be done at 13.4 minutes.
At 7.2, teller 1 is free. Since no one is in line, teller 1 remains free.
At 9.2, person 3 enters the bank. Teller 1 is free, so teller 1 starts working on the person's transaction. The transaction will take 2.7 minutes, so it will be done at 11.9 minutes.
At 11.9, teller 1 is free.
At 13.4, teller 2 is free. There are no more people generated by the simulation, so the simulation is complete.

Note, we can easily calculate the average waiting time for people -- zero for persons 0, 1, and 3, and 1.4 minutes for person 2, so the average waiting time is 0.35 minutes. We can also calculate the tellers' idle time. Teller 1 waits 1.5 minutes for its first person and 2 minutes for its second person. Teller 2 waits 2.8 minutes for its first person and 0 minutes for its second person. Depending on the length of the simulation, we might also say that a teller waits for some amount of time after its last person leaves. In this case let's assume that the bank "closes" at 12.0 minutes (meaning that the simulation will not generate any more persons after 12.0 minutes). Then Teller 1 waits for 0.1 minutes after its second person leaves at 11.9, while Teller 2 is still busy at closing time. The idle time for teller 1 is 3.6 minutes and for teller 2 is 2.8 minutes. The average idle time is 3.2 minutes.

Suppose we changed the situation so that there are three tellers. Now person 2 will not have to wait as they can be immediately served by teller 3. Persons 0, 1, and 3 still will not have to wait. The average waiting time therefore drops to 0 minutes. However, the teller idle time goes way up. Assuming that Teller 1 handles persons 0 and 3, then Teller 1 is idle from 0 to 1.5, 7.2 to 9.2, and 11.9 to 12.0, for a total of 3.6 minutes. Teller 2 is idle from 0 to 2.8 and then from 4.7 to 12 for a total of 10.1 minutes. Teller 3 is idle from 0 to 3.3 for a total of 3.3 minutes. The average idle time climbs from 3.2 minutes to 5.67 minutes. That's nearly half the time the bank is open and would probably be considered wasteful by the bank's executives.
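The hand calculations above can be checked with a few lines of C++. The following is a minimal sketch, not part of the lab code: Summary and summarize are hypothetical names, customers are assumed to be listed in arrival order and served first-come, first-served, the list of customers is assumed to be non-empty, and each teller's idle time is measured only up to the closing time.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

struct Summary {
  double avgWait;   // mean time a customer spends in line
  double avgIdle;   // mean time a teller is idle before closing
};

// Hypothetical helper: arrive[p] and service[p] describe customer p.
Summary summarize(const std::vector<double>& arrive,
                  const std::vector<double>& service,
                  int numTellers, double closing) {
  // Min-heap holding the time at which each teller next becomes free.
  std::priority_queue<double, std::vector<double>, std::greater<double> > freeAt;
  for (int t = 0; t < numTellers; t++) freeAt.push(0.0);

  double totalWait = 0.0;
  double totalBusy = 0.0;
  for (std::size_t p = 0; p < arrive.size(); p++) {
    // The next customer is served by whichever teller frees up first.
    double start = std::max(arrive[p], freeAt.top());
    freeAt.pop();
    double end = start + service[p];
    totalWait += start - arrive[p];
    // Count only the part of the transaction that falls before closing.
    totalBusy += std::min(end, closing) - std::min(start, closing);
    freeAt.push(end);
  }

  Summary s;
  s.avgWait = totalWait / arrive.size();
  s.avgIdle = (numTellers * closing - totalBusy) / numTellers;
  return s;
}
```

Calling summarize with the arrival times {1.5, 2.8, 3.3, 9.2}, the transaction times {5.7, 1.9, 8.7, 2.7}, a closing time of 12.0, and either 2 or 3 tellers reproduces the two scenarios worked through above. The average idle time does not depend on which free teller takes a given customer, because the total amount of work done before closing is the same either way.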

Hopefully, at this point, you understand the basics of the simulation -- why we're writing it, and what the input and output is like. Now we get into details.

Event Generation

One of the trickiest parts of writing a simulator is choosing how you generate random events. Suppose I say that the average transaction time is 10 minutes. The following sequence of transaction times fits this description (i.e. the numbers average to ten):

8 12 10 11 9 7 13

As does this sequence:

0 0 0 0 0 0 0 0 0 100

I think we would all agree that these two sequences are greatly different, even though they both average to ten.

This whole area is a big area in statistics, and there is quite a bit of math involved. I am not going to bore you with it. However, I'm going to use some statistical terms and concepts.

Random numbers are defined to fit what are known as distributions. These define how we can characterize the random numbers in ways that are more specific than, say, a mean value. We will use two such distributions in our simulation. The first is a very simple distribution. It is called the uniform distribution. For our purposes, if we choose a random number r according to a uniform distribution with a mean m, this means that r will have a value between 0 and 2m, and that every value between 0 and 2m is equally likely.

For example, suppose that our random numbers are integers. If we are choosing random numbers according to a uniform distribution with mean 3, then each time we choose a random number, it is equally likely that this number will be 0, 1, 2, 3, 4, 5, or 6.

Uniform distributions are very easy to use in C. There are two important functions, declared in <stdlib.h> on POSIX systems: srandom() and random(). srandom(i) takes an unsigned int i and uses it as a seed to the random number generator. Then each time you call random() it returns a long uniformly distributed in the range [0, 2³¹-1].

Therefore, if you want to get a random number according to a uniform distribution with a mean of m, you should use the formula:

random()%((2*m) + 1)

This formula will produce random numbers between 0 and 2m. Of course you may want to restrict this range but the idea should be clear: your range should be from [m - amount, m + amount].

Here is an example:

    /* generate 20 random numbers using a uniform distribution with a mean
       of 50 */
    int i;
    long d;

    srandom(737);   // provide an initial "seed" to the random generator
    for (i = 0; i < 20; i++) {
      d = random() % 101;   // 101 = (2 * 50) + 1, so d is in [0, 100]
      printf("%d \t %ld \n", i, d);
    }
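The remark about restricting the range to [m - amount, m + amount] can be sketched as follows. uniformAround is a hypothetical helper name, and the code assumes a POSIX system where random() is declared in <stdlib.h> and that amount is non-negative.

```cpp
#include <stdlib.h>

// Hypothetical helper: returns an integer chosen uniformly from
// [m - amount, m + amount], so the mean is still m.
long uniformAround(long m, long amount) {
  // random() % (2*amount + 1) lies in [0, 2*amount]; shift it so that
  // the range is centered on m.
  return (m - amount) + random() % (2 * amount + 1);
}
```

For instance, uniformAround(50, 10) yields values from 40 to 60. Setting amount equal to m gives back the original formula random() % ((2*m) + 1).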

We can view distribution functions using histograms. For example the following histogram shows a uniform distribution function whose mean is 5:

You can read this in the following way: If you choose a random value, the probability of it being between 0 and 1 is 0.1. Similarly, the probability of it being between 1 and 2 is 0.1. Etc.

Different distribution functions have different histograms. A very important distribution function is the exponential distribution function. This distribution function has one parameter called lambda which is one over the mean of the distribution function. Here is a histogram of an exponential whose mean is 120:

Now, this is awfully hard to make any sense of. If we instead plot the histogram on a log axis, it looks a little more palatable.

Cool, no? Now, the exponential is very important, because many real-life phenomena follow an exponential distribution. For example, light bulbs fail according to an exponential distribution. Moreover, most queueing situations (i.e. people entering a bank, cars arriving at a stop light) follow exponential distributions.

In our simulation, we are going to assume that the times between people entering our bank follow an exponential distribution.

Generating Random Events Without Too Much Math

Now, in our simulation, we're going to generate the times that people enter a bank according to an exponential distribution. We're also going to generate their transaction times according to a uniform distribution. To make this easier, we're going to define two event generator classes, histogramEventGenerator and uniformEventGenerator, which generate events from a histogram and from a uniform distribution with mean 'mean'. The declarations for these classes can be found in EventGen.h:

#include <map>
#include <string>
using namespace std;

class histogramEventGenerator {
  public:
    histogramEventGenerator(string filename);
    ~histogramEventGenerator();
    long next();   // produce the next random number
  protected:
    map<long,long> tree;
    long total;
};

class uniformEventGenerator {
  public:
    uniformEventGenerator(long mean);
    ~uniformEventGenerator();
    long next();   // produce the next random number
  protected:
    long mean;
};

Basically, you create an event generator by giving it either a mean of a uniform distribution, or the name of a histogram file. The format of this file is simply lines of x and y values of the histogram. That is, the x value is the middle of one of the histogram bars, and the y value is the height of the bar. The y values are relative frequencies. In other words, they do not have to sum to 1. For example, here is a histogram file for the uniform distribution with a mean of 1:

0  1
1  1
2  1

In other words, each value between 0 and 2 is equally likely. The exponential with a lambda of 1/120 (i.e. a mean of 120) is in the file expon_120:

UNIX> head expon_120
1      8264
2      8195
3      8127
4      8060
5      7993
6      7927
7      7861
8      7796
9      7731
10      7667
UNIX> tail expon_120
1157      1
1158      1
1159      1
1160      1
1161      1
1162      1
1163      1
1164      1
1165      1
1166      1
UNIX>

Now, with one of these histogram files, we can generate random numbers as follows:

Create a C++ STL map.
Set total to zero.
Do the following for each line of the histogram file:
- Read in a line of the file and get an x and y value.
- Add y to total
- Insert a new node into the tree with a key of total and a value of x.

Now, when you want to choose a random number, first choose a random number between 0 and total-1. Then find the node in the map whose key is the smallest key greater than this random number. Your random number is the val field of this node. The map class provides a method named upper_bound that returns an iterator to the first map entry strictly greater than the given key, which is perfect for this application.

For example, suppose you wanted to do this with the histogram file for the uniform distribution with a mean of 1. You'll insert three nodes into your tree: (key=1, val=0), (key=2, val=1), (key=3, val=2). Now, when you want to get a random number, you choose a random number using random() between 0 and 2. Suppose that number is 1. You find the node whose key is the smallest key greater than 1 -- that is the node whose key is 2. And you use that node's val, which is 1.
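The construction and lookup just described can be sketched as follows. This is only an illustration, not the lab solution: loadHistogram and lookup are hypothetical names, the histogram is read from any std::istream so that it can be tried without a file, and bars with a height of zero or less are skipped (an assumption made here, since such a bar would otherwise reuse the previous key and overwrite its value).

```cpp
#include <istream>
#include <map>
#include <sstream>

// Reads "x y" pairs and fills tree with (running total of y) -> x.
// Returns the final total.
long loadHistogram(std::istream& in, std::map<long, long>& tree) {
  long x, y;
  long total = 0;
  while (in >> x >> y) {
    if (y <= 0) continue;   // a bar with no height can never be chosen
    total += y;
    tree[total] = x;
  }
  return total;
}

// Maps an index in [0, total-1] to an x value. upper_bound returns the
// first entry whose key is strictly greater than index.
// Precondition: 0 <= index < total, otherwise the iterator is end().
long lookup(const std::map<long, long>& tree, long index) {
  std::map<long, long>::const_iterator it = tree.upper_bound(index);
  return it->second;
}
```

With the three-line histogram for the uniform distribution with a mean of 1, the tree holds (1,0), (2,1), (3,2), and the indices 0, 1, and 2 map to the values 0, 1, and 2. A bar of height y owns y consecutive indices, which is why taller bars are chosen proportionally more often.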

In sum, the next method for the uniform and histogram event generator classes will be implemented as follows:

For the uniform distribution, the random number is:

     randomNumber = random() % ((2 * mean) + 1);

For a histogram distribution, the random number is:

     index = random() % total;    // a value between 0 and total-1
     find the first node in the map whose key is greater than index
     and return the value of this node (i.e., the x-value in the histogram)

You will write these random number simulators in lab.

Writing the Simulator

Ok -- now we have enough information to write our simulator. We're going to generate times for people entering the bank according to a histogram file. The one we'll use is expon_120, which is an exponential whose mean is 120. We'll generate transaction times according to a uniform distribution.

Now, our simulator will revolve around three classes: Tellers, Persons, and Events. There will be one Teller for each teller in the bank, and we will number them starting with zero. There will be one Person for each person that enters the bank. Again, we will number them starting with zero. There will be one Event for each of the following events:

Arrival Event: A person enters the bank. This event will contain a pointer to the Person.
Departure Event: A teller finishes a transaction and the person leaves the bank. This event will contain a pointer to the Person and the Teller.

There are 5 main data structures in the program.

A queue called line of people waiting in line at the bank. A person must wait in line if he/she is in the bank and all the tellers are busy with transactions.
A queue called free_tellers. A teller is placed on free_tellers if he/she is free and if line is empty.
A priority queue called eventQueue. You can use the C++ STL priority queue in which case you will store events on the queue and provide a comparator class that compares event times. For arrival events, the event time is the time that the person entered the bank. For departure events, the event time is the time that the transaction finishes and the person leaves the bank.
An event generator that generates times from an exponential distribution. This event generator is called histogramEventGenerator.
An event generator that generates times from a uniform distribution. This event generator is called uniformEventGenerator.
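One detail worth illustrating is the comparator for eventQueue. The STL priority_queue keeps the "largest" element on top according to its comparator, so to get the event with the smallest time first, the comparison has to be reversed. The sketch below assumes a stripped-down Event with only a time and a person number; Event, EventCompare, EventQueue, and drainTimes are hypothetical names chosen for this example.

```cpp
#include <queue>
#include <vector>

struct Event {
  double time;     // when the event happens
  int personId;    // which person the event is about
};

// Returns true when a should come out AFTER b, which puts the event
// with the smallest time on top of the priority queue.
struct EventCompare {
  bool operator()(const Event& a, const Event& b) const {
    return a.time > b.time;
  }
};

typedef std::priority_queue<Event, std::vector<Event>, EventCompare> EventQueue;

// Pops every event from a copy of q and returns the times in the
// order in which they came off the queue.
std::vector<double> drainTimes(EventQueue q) {
  std::vector<double> times;
  while (!q.empty()) {
    times.push_back(q.top().time);
    q.pop();
  }
  return times;
}
```

Pushing events with times 7.2, 1.5, and 4.7 and then draining the queue yields them in the order 1.5, 4.7, 7.2, regardless of the order in which they were pushed.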

Now, what the program does is the following:

It error checks the command line arguments.
It creates line, free_tellers and eventQueue. At first all of these are empty.
It creates an instance of histogramEventGenerator and uniformEventGenerator.
Now, it creates the Tellers and puts them all on the free_tellers queue.
Next, it creates all the Persons for the simulation, generating both their arrival times and their transaction times. You will write a function called generate_persons that starts the time clock at 0. Your function should then repeatedly create customers by:
1. calling the histogramEventGenerator to generate an inter-arrival time (or elapsed time between customer arrivals)
2. adding this time to the current clock time in order to get the time when the customer arrives at the bank. If the arrival time occurs after the bank "closes" (i.e., the simulation ends), then you should discard this time and return from the function. Otherwise you should update the clock time to be equal to this customer's arrival time.
3. calling the uniformEventGenerator to get a transaction time.
4. creating an arrivalEvent record with the new person, their arrival time, and their transaction time.
5. adding the arrival event to the eventQueue.
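The five steps above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the lab's interface: the two generators are passed in as callables instead of event generator objects, the Arrival struct is hypothetical, and the arrivals are returned in a vector instead of being wrapped in events and pushed onto eventQueue.

```cpp
#include <functional>
#include <vector>

// Hypothetical record for one generated customer.
struct Arrival {
  int id;
  long arrivalTime;
  long transactionTime;
};

std::vector<Arrival> generate_persons(
    const std::function<long()>& nextInterArrival,   // stands in for the histogram generator
    const std::function<long()>& nextTransaction,    // stands in for the uniform generator
    long closingTime) {
  std::vector<Arrival> arrivals;
  long clock = 0;                                    // the time clock starts at 0
  for (int id = 0; ; id++) {
    long arrivalTime = clock + nextInterArrival();   // steps 1 and 2
    if (arrivalTime > closingTime) break;            // the bank has closed: discard and stop
    clock = arrivalTime;
    Arrival a = {id, arrivalTime, nextTransaction()};  // steps 3 and 4
    arrivals.push_back(a);                           // step 5 (a vector here, not eventQueue)
  }
  return arrivals;
}
```

Note that the loop only terminates if the inter-arrival times eventually push the clock past closingTime, so a generator that always returns 0 would loop forever.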
Now, eventQueue is processed. What happens is we grab the event with the smallest event time. This will either be a person entering the bank, or a teller finishing a transaction. If it's an arrival event, then the following actions are performed:
- the free_tellers queue is checked. If a teller is free, then that teller is removed from the queue, and a departure event is generated and put into eventQueue. The customer's transaction time should be added to the time at which the person is assigned to the teller to get the departure time for the event. The teller's cumulative idle time is also incremented by the amount of time the teller spent waiting on the free_tellers queue. This time can be calculated as the difference between the current time and the time when the teller was placed on the free_tellers queue. Finally the person's waiting time is calculated. The waiting time is equal to the time when the transaction started minus the time the person entered the bank. The waiting time is added to a counter that keeps track of cumulative waiting time. If no teller is free, then the person is put onto line.
If the event is a departure event, then the following actions occur:
- A counter that keeps track of the number of people processed is incremented.
- The person is deleted (i.e. the person leaves the bank), and the teller either processes the next person on the line, or puts itself onto the free_tellers queue. If the teller puts itself onto the free_tellers queue, then it records the time it is put on the queue.
You keep processing events until eventQueue is empty. When eventQueue is empty, the simulation is over.
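The event loop described above can be sketched in a compact form. This is an illustration under several assumptions, not the lab solution: people and tellers are identified by indices into vectors instead of pointers, the arrivals are given up front instead of coming from the event generators, and the names Person, Event, LaterEvent, Result, and simulate are hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct Person { double arrival; double transaction; };

struct Event {
  double time;
  bool isArrival;       // true: a person enters; false: a teller finishes
  std::size_t person;   // index into the people vector
  std::size_t teller;   // index of the teller (unused for arrivals)
};

struct LaterEvent {     // reversed comparison: smallest time on top
  bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

struct Result { double avgWait; std::vector<double> idle; };

Result simulate(const std::vector<Person>& people, std::size_t numTellers) {
  std::priority_queue<Event, std::vector<Event>, LaterEvent> eventQueue;
  std::queue<std::size_t> line;            // people waiting for a teller
  std::queue<std::size_t> free_tellers;    // tellers with nothing to do
  std::vector<double> freeSince(numTellers, 0.0);
  Result r;
  r.avgWait = 0.0;
  r.idle.assign(numTellers, 0.0);

  for (std::size_t t = 0; t < numTellers; t++) free_tellers.push(t);
  for (std::size_t p = 0; p < people.size(); p++)
    eventQueue.push(Event{people[p].arrival, true, p, 0});

  double totalWait = 0.0;
  while (!eventQueue.empty()) {
    Event e = eventQueue.top();
    eventQueue.pop();
    double now = e.time;
    if (e.isArrival) {
      if (free_tellers.empty()) { line.push(e.person); continue; }
      std::size_t t = free_tellers.front();
      free_tellers.pop();
      r.idle[t] += now - freeSince[t];     // time spent on the free queue
      // The person is served immediately, so the waiting time is zero.
      eventQueue.push(Event{now + people[e.person].transaction, false, e.person, t});
    } else {
      if (line.empty()) {                  // nobody to serve: go idle
        freeSince[e.teller] = now;
        free_tellers.push(e.teller);
        continue;
      }
      std::size_t p = line.front();        // serve the next person in line
      line.pop();
      totalWait += now - people[p].arrival;
      eventQueue.push(Event{now + people[p].transaction, false, p, e.teller});
    }
  }
  if (!people.empty()) r.avgWait = totalWait / people.size();
  return r;
}
```

Running simulate on the four-person example with two tellers gives an average wait of 0.35 minutes. Because idle time is only accumulated when a teller is taken off free_tellers, as in the steps above, any time a teller spends idle after its last customer is not included in this sketch's idle totals.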