CS302 Lecture notes -- More Priority Queues in C++

  • Brad Vander Zanden (heavily adapted from Jim Plank)

    The Bank Simulation

    The book uses a bank simulation example to motivate the need for priority queues. We will do such a simulation, more for the programming exercise than for any other reason.

    Basically, we are confronted with the following problem. We are bank executives, and we need to hire tellers. We want to hire enough tellers that customers don't have to wait in line too long. However, we don't want to hire too many tellers, because we want to save money. Thus, we want to hire an optimal number of tellers.

    One way to do this is to simulate a bank. To do this, we need to characterize on average when people come to a bank, and how long their transactions take. Then we can write a simulator that randomly generates people coming into a bank doing transactions, and we can see how the number of tellers impacts how long people wait, and how long tellers are idle. This will let us make a decision about the optimal number of tellers.

    Now, suppose we run the following simulation:

    New Person:    0 enters the bank at    1.5. Transaction time:    5.7
    New Person:    1 enters the bank at    2.8. Transaction time:    1.9
    New Person:    2 enters the bank at    3.3. Transaction time:    8.7
    New Person:    3 enters the bank at    9.2. Transaction time:    2.7
    
    We have two tellers, and four people.

    What happens is that the simulator generates four random people who enter the bank at 1.5, 2.8, 3.3, and 9.2 minutes. Their transactions take 5.7, 1.9, 8.7, and 2.7 minutes respectively. Given these parameters and two tellers, the simulation will go as follows:

    Note that we can easily calculate the average waiting time for people: it is zero for persons 0, 1, and 3, and 1.4 minutes for person 2 (who arrives at 3.3 but cannot start until person 1 finishes at 4.7), so the average waiting time is 0.35 minutes. We can also calculate the tellers' idle time. Teller 1 waits 1.5 minutes for its first person and 2 minutes for its second person. Teller 2 waits 2.8 minutes for its first person and 0 minutes for its second person. Depending on the length of the simulation, a teller may also wait for some amount of time after its last person leaves. In this case let's assume that the bank "closes" at 12.0 minutes (meaning that the simulation will not generate any more persons after 12.0 minutes). Then Teller 1 waits for 0.1 minutes after its second person leaves at 11.9, while Teller 2 is still busy with person 2 at closing time. The idle time for Teller 1 is 3.6 minutes and for Teller 2 is 2.8 minutes. The average idle time is 3.2 minutes.

    Suppose we changed the situation so that there are three tellers. Now person 2 will not have to wait, as they can be served immediately by Teller 3. Persons 0, 1, and 3 still will not have to wait. The average waiting time therefore drops to 0 minutes. However, the teller idle time goes way up. Assuming that Teller 1 handles persons 0 and 3, then Teller 1 is idle from 0 to 1.5, 7.2 to 9.2, and 11.9 to 12.0, for a total of 3.6 minutes. Teller 2 is idle from 0 to 2.8 and then from 4.7 to 12 for a total of 10.1 minutes. Teller 3 is idle from 0 to 3.3 for a total of 3.3 minutes. The average idle time climbs from 3.2 minutes to 5.67 minutes. That's nearly half the time the bank is open and would probably be considered wasteful by the bank's executives.

    Hopefully, at this point, you understand the basics of the simulation -- why we're writing it, and what the input and output is like. Now we get into details.


    Event Generation

    One of the trickiest parts of writing a simulator is choosing how you generate random events. Suppose I say that the average transaction time is 10 minutes. The following sequence of transaction times fits this description (i.e., the numbers average to ten):
    8 12 10 11 9 7 13
    
    As does this sequence:
    0 0 0 0 0 0 0 0 0 100
    
    I think we would all agree that these two sequences are greatly different, even though they both average to ten.

    This is a large area of statistics, and there is quite a bit of math involved. I am not going to bore you with it, but I am going to use some statistical terms and concepts.

    Random numbers are defined to fit what are known as distributions. These define how we can characterize the random numbers in ways that are more specific than, say, a mean value. We will use two such distributions in our simulation. The first is a very simple distribution. It is called the uniform distribution. For our purposes, if we choose a random number r according to a uniform distribution with a mean m, this means that r will have a value between 0 and 2m, and that every value between 0 and 2m is equally likely.

    For example, suppose that our random numbers are integers. If we are choosing random numbers according to a uniform distribution with mean 3, then each time we choose a random number, it is equally likely that this number will be 0, 1, 2, 3, 4, 5, or 6.

    Uniform distributions are very easy to use in C. There are two important functions in the C library on Unix (POSIX) systems: srandom() and random(). srandom(i) takes an unsigned integer i and uses it as a seed for the random number generator. Then each time you call random() it returns a long uniformly distributed in the range [0, 2^31 - 1].

    Therefore, if you want to get a random number according to a uniform distribution with a mean of m, you should use the formula:

    random()%((2*m) + 1)
    
    This formula will produce random numbers between 0 and 2m, inclusive. Of course you may want to restrict this range, but the idea should be clear: your range should be [m - amount, m + amount].

    Here is an example:

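        /* generate 20 random numbers using a uniform distribution with a mean
           of 50 */
        #include <cstdio>
        #include <cstdlib>

        int main() {
          srandom(737);   // provide an initial "seed" to the random generator
          for (int i = 0; i < 20; i++) {
            long d = random() % 101;   // 101 = (2*50) + 1, so d is in [0, 100]
            printf("%d \t %ld \n", i, d);
          }
          return 0;
        }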
    

    We can view distribution functions using histograms. For example the following histogram shows a uniform distribution function whose mean is 5:

    You can read this in the following way: If you choose a random value, the probability of it being between 0 and 1 is 0.1. Similarly, the probability of it being between 1 and 2 is 0.1. Etc.

    Different distribution functions have different histograms. A very important distribution function is the exponential distribution function. This distribution function has one parameter called lambda which is one over the mean of the distribution function. Here is a histogram of an exponential whose mean is 120:

    Now, this is awfully hard to make any sense of. If we instead plot the histogram on a log axis, it looks a little more palatable.
    Cool, no? Now, the exponential is very important, because many real-life phenomena follow an exponential distribution. For example, light bulbs fail according to an exponential distribution. Moreover, most queueing situations (e.g. people entering a bank, cars arriving at a stop light) follow exponential distributions.

    In our simulation, we are going to assume that people enter our bank according to an exponential distribution.
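
    For reference, and without going further into the math, the standard textbook form of the exponential density with parameter lambda, together with its mean, is:

```latex
f(x) = \lambda e^{-\lambda x}, \quad x \ge 0,
\qquad
\text{mean} = \frac{1}{\lambda}
```

    With a mean of 120, lambda is 1/120, or about 0.0083. Small values are the most likely and the probability falls off steadily as x grows, which is the shape the histogram describes.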


    Generating Random Events Without Too Much Math

    Now, in our simulation, we're going to generate the times that people enter a bank according to an exponential distribution. We're also going to generate their transactions according to a uniform distribution. To make this easier, we're going to define two event generator classes, histogramEventGenerator and uniformEventGenerator, which generate events from a histogram and from a uniform distribution with mean 'mean'. The declarations for these classes can be found in EventGen.h:
    #include <map>
    #include <string>
    using namespace std;
    
    class histogramEventGenerator {
      public:
        histogramEventGenerator(string filename);
        ~histogramEventGenerator();
        long next();   // produce the next random number
      protected:
        map<long,long> tree;
        long total;
    };
    
    class uniformEventGenerator {
      public:
        uniformEventGenerator(long mean);
        ~uniformEventGenerator();
        long next();   // produce the next random number
      protected:
        long mean;
    };
    
    Basically, you create an event generator by giving it either a mean of a uniform distribution, or the name of a histogram file. The format of this file is simply lines of x and y values of the histogram. That is, the x value is the middle of one of the histogram bars, and the y value is the height of the bar. The y values are relative frequencies. In other words, they do not have to sum to 1. For example, here is a histogram file for the uniform distribution with a mean of 1:
    0  1
    1  1
    2  1
    
    In other words, each value between 0 and 2 is equally likely. The exponential with a lambda of 1/120 (i.e. a mean of 120) is in the file expon_120:
    UNIX> head expon_120
    1      8264
    2      8195
    3      8127
    4      8060
    5      7993
    6      7927
    7      7861
    8      7796
    9      7731
    10      7667
    UNIX> tail expon_120
    1157      1
    1158      1
    1159      1
    1160      1
    1161      1
    1162      1
    1163      1
    1164      1
    1165      1
    1166      1
    UNIX> 
    
    Now, with one of these histogram files, we can generate random numbers as follows. First, read the file into a map: for each line, add the y value to a running total and insert a node whose key is the running total and whose val is the x value. When you are done, total is the sum of all the y values. Then, when you want to choose a random number, first choose a random number between 0 and total-1. Then find the node in the map whose key is the smallest key greater than this random number. Your random number is the val field of this node. The map class provides a method named upper_bound that returns an iterator to the first map entry whose key is strictly greater than the given key, which is perfect for this application.

    For example, suppose you wanted to do this with the histogram file for the uniform distribution with a mean of 1. You'll insert three nodes into your tree: (key=1, val=0), (key=2, val=1), (key=3, val=2). Now, when you want to get a random number, you choose a random number using random() between 0 and 2. Suppose that number is 1. You find the node whose key is the smallest key greater than 1 -- that is the node whose key is 2. And you use that node's val, which is 1.

    In sum, the next method for the uniform and histogram event generator classes will be implemented as follows:

    1. For the uniform distribution, the random number is:
           randomNumber = random() % ((2 * mean) + 1);
           

    2. For a histogram distribution, the random number is:
           index = random() % total;
           find the first node in the map whose key is greater than index
           and return the value of this node (i.e., the x-value in the histogram)
           

    You will write these random number simulators in lab.


    Writing the Simulator

    Ok -- now we have enough information to write our simulator. We're going to generate times for people entering the bank according to a histogram file. The one we'll use is expon_120, which is an exponential whose mean is 120. We'll generate transaction times according to a uniform distribution.

    Now, our simulator will revolve around three classes: Tellers, Persons, and Events. There will be one Teller for each teller in the bank, and we will number them starting with zero. There will be one Person for each person that enters the bank. Again, we will number them starting with zero. There will be one Event for each arrival (a person entering the bank) and each departure (a person finishing a transaction and leaving the bank).

    There are 5 main data structures in the program.
    1. A queue called line of people waiting in line at the bank. A person must wait in line if he/she is in the bank and all the tellers are busy with transactions.
    2. A queue called free_tellers. A teller is placed on free_tellers if he/she is free and if line is empty.
    3. A priority queue called eventQueue. You can use the C++ STL priority queue in which case you will store events on the queue and provide a comparator class that compares event times. For arrival events, the event time is the time that the person entered the bank. For departure events, the event time is the time that the transaction finishes and the person leaves the bank.
    4. An event generator that generates times from an exponential distribution. This event generator is called histogramEventGenerator.
    5. An event generator that generates times from a uniform distribution. This event generator is called uniformEventGenerator.
    Now, what the program does is the following: