CS302 Lecture Notes - Uniform, Exponential and Weibull Random Numbers


When we generate a sequence of random numbers, there need to be parameters involved to say what the properties that sequence has. We usually assume a "uniform" distribution between two numbers. For example, drand48() returns a number uniformly distributed between 0 and 1. This means that all numbers between 0 and 1 have an equal chance of being generated. When we roll a die, we are generating a random integer uniformly distributed between 1 and 6, meaning that each of those values has an equal chance of being generated.

A probability distribution defines the characteristics of generating random numbers. One way to characterize this is with a cumulative distribution function (CDF), defined as: F(x) = the probability that a random number is less than or equal to x for all possible x.

Think about a uniform distribution whose minimum value is a and whose maximum value is b. What is F(b)? It is one, since all values are less than or equal to b. The mean of a and b is (a+b)/2. So what about F((b+a)/2)? It is 0.5, since half of the numbers will be below and mean and half will be above the mean.

We plot CDF's with the x values on the x-axis, and the F(x) values on the Y axis. Note, the Y axis therefore only goes from zero to one, and the CDF has to be a monotonically increasing function. Here is the CDF for the uniform distribution whose minimum value is a and whose maximum value is b:

Mathematically, this is:

F(x) = (x-a)/(b-a)

We can use a CDF to generate random numbers using drand48(). Let y be a number generated with drand48(). What we do to generate the number according to a CDF is to find the value of x for which F(x) = y, and x is our random number.

Try it for the uniform distribution above. Suppose we generate y = drand48(). We need to find the value of x for which:

y = F(x) = (x-a)/(b-a)

Using algebra we can massage that properly:

y = (x-a)/(b-a)
(b-a)y = x-a
(b-a)y + a = x.

So when we want to emit random numbers, we do:

   number = drand48() * (b - a) + a;
If you think about it, that would be how you would generate uniformly distributed random numbers between a and b, even if there were no math or CDF's to guide you. It's nice when the theory matches what you would do anyway.


Exponential Distributions

Exponential Distributions come up in real-life scenarios quite a bit. Examples are arrival events, like people showing up to a store, or cars on a highway, and failure events, such as light bulbs failing. The uniform distribution above was defined by a maximum and a minimum. An exponential distribution is parameterized by a variable λ, which is the mean arrival rate/failure rate. For example, light bulbs may fail at a rate of 1 every 1000 hours, or people may enter a store at a rate of 30 people an hour.

The CDF for an exponential distribution is defined for x ≥ 0, and is:

F(x,λ) = 1- e-λx

Here are CDF's for three values of λ:

Note a few things -- the function is asymptotic, meaning any value of x, from one to a gazillion can occur. It's just that the gazillion has a really low probability of occurring.

The mean value is 1/λ. However, in looking at the graph, you will see that well over half of the values generated will be below the mean - roughly 60 percent. This means that when you buy a light bulb that is rated to last for 1000 hours, chances are 60 percent that it will fail before 1000 hours. However, there's also a chance that the light bulb will last for 1000 years. It's just a really small chance.

We can generate values from an exponential distribution by using the CDF, just as we did above:

y = F(x,&lambda)
y = 1- e-λx
y - 1 = -e-λx
1 - y = e-λx
ln(1 - y) = ln(e-λx)
ln(1 - y) = -λx
-ln(1 - y)/λ = x

So, to generate random numbers according to an exponential, we do:

  number = -1.0 * log(1.0 - drand48()) / lambda;
How about we try this out? Look at expon.cpp:

#include <stdio.h>
#include <iostream>
#include <math.h>
#include <string>
#include <set>
#include <map>
using namespace std;

main(int argc, char **argv)
{
  double lambda;
  int iterations;
  int i;
  
  if (argc != 3 || sscanf(argv[1], "%lf", &lambda) != 1 || lambda <= 0 || sscanf(argv[2], "%d", &iterations) != 1) {
    cerr << "usage: expon lambda iterations\n";
    exit(1);
  }

  srand48(time(0));

  for (i = 0; i < iterations; i++) {
    cout << log(1-drand48())/lambda*-1 << endl;
  }
}
    

Let's try it out on a simulation scenario. Suppose we buy a crate of 1000 light bulbs, each of which has an average lifetime of 1000 hours. We can simulate the failure time of each bulb by choosing 1000 numbers from an exponential distribution where λ = 1/1000 = 0.001:

UNIX> expon 0.001 1000 > bulb.txt
UNIX> head bulb.txt
286.928
120.251
1234.99
185.829
4899.49
2493.85
3980.23
1912.57
697.718
687.042
UNIX> sort -n bulb.txt | head -n 5
1.06986
1.20739
2.45384
2.62195
2.73788
UNIX> sort -n bulb.txt | tail -n 5
5608.55
5671.58
5792.11
7189.54
8034.5
UNIX> 
You can see from the first head statement that the random numbers are indeed all over the place. The two sort statements show you the five bulbs that fail the soonest, and the five that last the longest. It's amazing, no? One unlucky bulb fails after a little over an hour, while one lasts almost a year: 8035 hours = 335 days.

Suppose we sort bulb.txt, and plot the numbers as a graph: the y-axis will be the line number of the file, and the x-axis will have the value on that line:

Think to yourself -- how is that related to the CDF for an exponential? Well, divide the values on the y-axis by 1000 and what do you get? You should get an approximation for the CDF for λ = 1000. To show that, the graph below graphs two lines: the previous graph with the y-axis divided by 1000, and the CDF function y = 1 - e-0.001 x:

Awesome, no?


Weibull Distributions

A Weibull distribution is an enriched exponential that is often used to model physical components that wear out over time. It is charactertized by three variables:

The CDF for a Weibull is defined for x ≥ γ:

F(x,β,η,&gamma) = 1- e-((x-γ)/η)β

So, to generate random numbers, you do:

y = 1- e-((x-γ)/η)β
1 -y = e-(x-γ)/η)β
ln(1 -y) = -((x-γ)/η)β
(-ln(1 -y)) = ((x-γ)/η)β
(-ln(1 -y))1/&beta = (x-γ)/η
η(-ln(1 -y))1/&beta = x-γ
η(-ln(1 -y))1/&beta + γ = x

Yuck. How do you compute that 1/&beta? Use the pow() math routine (man pow). That allows us to compute a Weibull:

    tmp1 = -log(1.0 - drand48());
    tmp2 = pow(tmp1, 1.0/beta);
    return eta * tmp2 + gamma;
Use this exact generator in your labs.

You can test it in weibull.cpp

#include <stdio.h>
#include <iostream>
#include <math.h>
#include <string>
#include <set>
#include <map>
using namespace std;

main(int argc, char **argv)
{
  double eta, beta, tmp1, tmp2, gamma;
  int iterations;
  int i;
  
  if (argc != 5 || sscanf(argv[1], "%lf", &beta) != 1 || beta <= 0 || 
                   sscanf(argv[2], "%lf", &eta) != 1 || eta <= 0 || 
                   sscanf(argv[3], "%lf", &gamma) != 1 || gamma < 0 ||
                   sscanf(argv[4], "%d", &iterations) != 1) {
    cerr << "usage: weibull beta eta gamma iterations\n";
    exit(1);
  }

  srand48(time(0));

  for (i = 0; i < iterations; i++) {
    tmp1 = -log(1.0 - drand48());
    tmp2 = pow(tmp1, 1.0/beta);
    cout << tmp2 * eta + gamma << endl;
  }
}


What do Wiebull's Look Like?

Since the lifetime and minimum value parameters (&eta and &gamma) are straightforward, you may ask "so, exactly what does this shape parameter do?" We'll explore that below. While CDF's are nice for generating random numbers, I find that looking at histograms of probability distributions gives a more intuitive feeling about them.

Rather than generate these histograms mathematically, as we would do in a course on probability, we're going to generate them empirically. The program histogram.cpp takes numbers on standard input and turns them into histograms of a given granularity. We use this program to take a look at a random number generator that generates numbers uniformly between zero and 2000. The command used to generate the graph below is:

UNIX> uniform 0 2000 1000000 | histogram 100

The histogram breaks up the random numbers into "bins" of size 100. Thus, the numbers 0-100 are in the first bin, 100-200 are in the second bin, etc. The Y-axis plots the fraction of all numbers generated in each bin. Since there are 20 bins in the above graph, and numbers are generated uniformly, we would expect each bin to hold roughly 1/20 = 0.05 of the numbers, and indeed, the histogram shows that.

Let's now turn our attention to the Exponential, which is equivalent to the Weibull with β equal to one. We use the following command:

UNIX> weibull 1 1000 0 1000000 | histogram 100
to generate the following histogram:

See how fundamentally different the exponential is from the uniform number generator? The bin with the most values is the smallest -- nearly 1/10th of the values are between zero and 100, even though the mean is 1000.

Let's increase β to 1.3 and see what it does to the histogram:

It shifts the values toward the mean. In other words, more of the values generated are closer to the mean. There are fewer values in the 0-100 bin, and more in the bins closer to 1000. If we increase β to 3, we see something approximating a bell curve:

Increasing β to 50, we see that the values become completely concentrated around the mean:

If we view Weibulls as generating failure data, we say that increasing β "increases the failure rate", because as time gets closer to the mean, if you have not had a failure yet, your probability increases as β increases.

What about decreasing β? We plot three histograms below. First, β = .8:

Then β = .3:

And finally β = .05:

Each one concentrates values further towards zero. However, remember that the mean must still be 1000. This means that there are some really big values generated (or more precisely that the really big bins probably have higher frequencies than their counterparts for larger β). This is sometimes called a "decreasing" failure rate.


Can we look at CDF's?

Sure -- the program cdf.cpp takes input data and emits X/Y values for a CDF. Using that, we plot the CDF's of all the Weibulls above:

Does that give you a better intuitive feel than the histograms? I don't think so, although all of the conclusions that we drew from the histograms are present in the CDF's. Most are just harder to see. One that is easier to see is when β equals 0.05. Just above, we said that the really big values have higher frequencies than their counterparts for larger β. That is clear from the CDF, and less clear from the histogram.

I didn't expect the CDF(1000) to be independent of β, but I'm sure if you look at the math, you'll see that it is. Interesting.