CS140 Lecture notes -- Big-O

  • Jim Plank
  • Directory: /home/plank/cs140/notes/BigO
  • Lecture notes: http://www.cs.utk.edu/~plank/plank/classes/cs140/Fall-2004/notes/BigO/
  • Original Notes: Wed Sep 22 11:48:08 EDT 2004
  • Last Modification: Thu Mar 27 09:33:06 EDT 2014

    Big-O

    Big-O notation is one of the ways in which we talk about how complex an algorithm or program is. It gives us a nice way of quantifying or classifying how fast or slow a program is as a function of the size of its input, and independently of the machine on which it runs.

    Examples

    Let's look at a program (in linear1.cpp):

    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    using namespace std;
    
    int main(int argc, char **argv)
    {
      int n;
      double f;
      double count;
      int i, j;
      long t0;
    
      if (argc != 2) { 
        fprintf(stderr, "Usage: linear1 n\n");
        exit(1);
      }
    
      n = atoi(argv[1]);
    
      t0 = time(0);            /* wall-clock start time, in seconds */
      count = 0;
      f = 23.4;
      for (i = 0; i < n; i++) {
        count++;
        f = f * f + 1;         /* scramble f, then chop it back */
        j = (int) f / 1000;    /* into the range [0, 1000)      */
        f = f - j*1000;
        // printf("%g\n", f);
      }
      printf("N: %d   Count: %.0lf   Time: %ld\n", n, count, time(0)-t0);
      return 0;
    }
    
    

    What this does is compute n random-ish numbers between 0 and 1000, and if we uncommented the printf() statement, it would print them. Try it out (uncomment the print statement).

    Suppose we run this program with varying values of n. What do we expect? Well, as n increases, so will the count, and so will the running time of the program:

    (This is on my machine at home -- I don't know how fast the machines here will be.)

    UNIX> g++ -o linear1 linear1.cpp
    UNIX> linear1 1
    N: 1   Count: 1 Time: 0
    UNIX> linear1 10000
    N: 10000   Count: 10000 Time: 0
    UNIX> linear1 10000000
    N: 10000000   Count: 10000000 Time: 8
    UNIX> linear1 20000000
    N: 20000000   Count: 20000000 Time: 14
    UNIX> linear1 100000000
    N: 100000000   Count: 100000000 Time: 72
    UNIX>
    
    Just what you'd think. The running time is roughly linear:

    running time ≈ 72n/100,000,000 seconds (using the largest run to fit the constant).
    Obviously, count equals n.

    Now, look at four other programs below. I will just show their loops:

    linear2.cpp:
      ...
      for (i = 0; i < n; i++) {
        count++;
        f = f * f + 1;
        j = (int) f / 1000;
        f = f - j*1000;
      }
      for (i = 0; i < n; i++) {
        count++;
        f = f * f + 1;
        j = (int) f / 1000;
        f = f - j*1000;
      }
      ...
    
    log.cpp:
      ...
      for (i = 1; i <= n; i *= 2) {
        count++;
        f = f * f + 1;
        j = (int) f / 1000;
        f = f - j*1000;
      }
      ...
      
    nlogn.cpp:
      ...
      for (k = 0; k < n; k++) {
        for (i = 1; i <= n; i *= 2) {
          count++;
          f = f * f + 1;
          j = (int) f / 1000;
          f = f - j*1000;
        }
      }
      ...
      
    nsquared.cpp:
      ... 
      for (i = 0; i < n; i++) {
        for (k = 0; k < n; k++) {
          count++;
          f = f * f + 1;
          j = (int) f / 1000;
          f = f - j*1000;
        }
      }
      ...
      

    In the five programs, the value of count will be the following (note, I will expect you to be able to do things like tell me what count is as a function of n on tests):

      Program      count as a function of n
      -------      -------------------------
      linear1      n
      linear2      2n
      log          floor(log2(n)) + 1
      nlogn        n * (floor(log2(n)) + 1)
      nsquared     n*n

    (The doubling loops run floor(log2(n))+1 times, because i starts at 1 and doubles until it passes n.)
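    If you want to check those formulas without waiting on long runs, here is a small sketch that computes the predicted counts directly. (This is my own illustration -- the file name count_formulas.cpp is not part of the class directory.)

    #include <cstdio>
    #include <cstdlib>
    using namespace std;
    
    /* Predicted value of count for each of the five programs.
       The doubling loops run floor(log2(n))+1 times, because i
       starts at 1 and doubles until it passes n. */
    
    int main(int argc, char **argv)
    {
      long long n, logn, i;
    
      if (argc != 2) {
        fprintf(stderr, "Usage: count_formulas n\n");
        exit(1);
      }
      n = atoll(argv[1]);
    
      logn = 0;
      for (i = 1; i <= n; i *= 2) logn++;   /* floor(log2(n)) + 1 */
    
      printf("linear1:  %lld\n", n);
      printf("linear2:  %lld\n", 2*n);
      printf("log:      %lld\n", logn);
      printf("nlogn:    %lld\n", n*logn);
      printf("nsquared: %lld\n", n*n);
      return 0;
    }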

    In each program, the running time is going to be directly proportional to count. Why? Read chapter two for how to count instructions. So, what do the running times look like as you increase n to large values? I have the output of the various programs in the following table of links:

    linear1 linear2 log nlogn nsquared

    Some things you should notice right off the bat: log(n) is very, very small in comparison to n. This means that log.cpp is blazingly fast for even huge values of n. On the other end of the spectrum, n*n grows very quickly as n increases. Below, I graph all of the programs' running times as a function of n:

    So, this shows what you'd think:

    log(n) < n < 2n < n*log(n) < n*n

    Perhaps it's hard to gauge how much smaller each is than the next until you see it. Below, I plot the same graph, but zoomed in a bit so you can get a better feel for n*log(n) and n*n.




    Back to Big-O: Function comparison

    Big-O notation tries to work on classifying functions. The functions that we care about are the running times of programs. The first concept when we deal with Big-O is comparing functions. Basically, we will say that one function f(n) is greater than another g(n) if there is a value x0 such that for all x >= x0:

    f(x) >= g(x)

    Put graphically, it means that after a certain point on the x axis, as we go right, the curve for f(n) will always be higher than g(n). Thus, given the graphs above, you can see that n*n is greater than n*log(n), which is greater than 2n, which is greater than n, which is greater than log(n).
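    You can also convince yourself of a comparison numerically by finding the crossover point. The sketch below (my own example -- the functions and the file name compare.cpp are not from the notes) tabulates f(x) = x*log2(x) against g(x) = 2x; once log2(x) reaches 2, f stays on top, so x0 = 4 works in the definition above:

    #include <cstdio>
    #include <cmath>
    using namespace std;
    
    /* Tabulate f(x) = x*log2(x) and g(x) = 2x at doubling values
       of x. Once log2(x) >= 2 (that is, x >= 4), f(x) >= g(x),
       and it stays that way from then on. */
    
    int main()
    {
      double x, f, g;
    
      for (x = 2; x <= 64; x *= 2) {
        f = x * log2(x);
        g = 2 * x;
        printf("x = %4.0f   f(x) = %6.1f   g(x) = %6.1f   %s\n",
               x, f, g, (f >= g) ? "f >= g" : "f < g");
      }
      return 0;
    }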

    So, here are some functions:

    So, we can ask ourselves questions: Is b(n) > a(n)? Yes. Why? Because for any value of n, b(n) is 100, and a(n) is 1. Therefore for any value of n, b(n) is greater than a(n).

    That was easy. How about c(n) and b(n)? b(n) is greater, because we can set x0 equal to 6: For any value of x greater than 6, b(x) is 100 and c(x) is negative.

    Here's a total ordering of the above. Make sure you can prove all of these to yourselves:

    g(n) > j(n) > e(n) > f(n) > d(n) > h(n) > i(n) > b(n) > a(n) > c(n)

    Some rules:


    Big-O

    Given the above, we can now define Big-O:

    T(N) = O(f(N)) if there exist positive constants c and x0 such that c*f(N) >= T(N) for all N >= x0.

    Given the definitions of a(n) through j(n) above:


    Big Omega and Big Theta

    Note that O(f(N)) is an upper bound on T(N). That means that T(N) definitely does not grow faster than f(N). Similarly, if g(N) > f(N) and T(N) = O(f(N)), then T(N) = O(g(N)) too. That's inconvenient. Why? Well, if a program's running time is linear in the size of its input, then we'd like to say that the running time is O(n) and not O(n²). Unfortunately, it is both.

    Big Omega and Big Theta help make things more precise:

    • T(N) = Ω(f(N)) if there exist positive constants c and x0 such that T(N) >= c*f(N) for all N >= x0. In other words, f(N) is a lower bound on T(N).
    • T(N) = Θ(f(N)) if T(N) = O(f(N)) and T(N) = Ω(f(N)). In other words, f(N) bounds T(N) from above and below.

    Let me give an example. Suppose I have a program that takes 3n + 5 operations on an input of size n. We typically say that the program is O(n). That is clearly true (choose c=4 and x0=10). However, as mentioned above, the program is also O(n²) (choose c=1 and x0=10). Is it O(1)? No -- there is no constant c such that c ≥ 3n + 5 for all n past some x0.

    The program is Ω(n): choose c = 1 and x0=1 (in other words, for any x ≥ 1, 3x+5 > x). However, it is not Ω(n²), because there is no c such that 3x+5 ≥ c*x² for all x past some x0. It is, however, Ω(1): choose c = 1 and x0=1 -- it's pretty easy to see that 3x + 5 > 1.

    Now, we can put this in terms of Big-Theta. The program is Θ(n), but not Θ(n²) or Θ(1).
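    If the constants feel slippery, you can spot-check them mechanically. This sketch (mine, not part of the notes) re-checks the three claims about 3n + 5 over a large range of n:

    #include <cstdio>
    #include <cstdlib>
    using namespace std;
    
    /* Spot-check the constants from the 3n+5 example:
         O(n):     c = 4, x0 = 10  -->  4n >= 3n+5 for all n >= 10
         O(n^2):   c = 1, x0 = 10  -->  n*n >= 3n+5 for all n >= 10
         Omega(n): c = 1, x0 = 1   -->  3n+5 >= n for all n >= 1   */
    
    int main()
    {
      long long n, t;
    
      for (n = 1; n <= 1000000; n++) {
        t = 3*n + 5;
        if (n >= 10 && 4*n < t) { printf("O(n) fails at %lld\n", n); exit(1); }
        if (n >= 10 && n*n < t) { printf("O(n^2) fails at %lld\n", n); exit(1); }
        if (t < n)              { printf("Omega(n) fails at %lld\n", n); exit(1); }
      }
      printf("All three checks hold up to n = 1000000.\n");
      return 0;
    }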

    It is unfortunate that we as computer scientists quantify algorithms using Big-O rather than Big-Theta, but it is a fact of life. You need to know these definitions, and remember that most of the time, when we say something is Big-O, in reality it is also Big-Theta, which is much more precise.


    At this point, I think that giving the Wikipedia page on Big-O a scan is a good idea.

    Two Big-O Proofs

    You are not responsible for proofs like this, but it's not a bad idea to see them:

    Is n*n + n + 1 = O(n*n)? See the following PDF file for a proof.

    Generalizing, is a*n*n + b*n + d = O(n*n) for a,b,d > 1 and b > d? See the following PDF file for a proof.
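    If you just want the flavor of the second proof, here is a sketch: when n >= 1, we have n <= n*n and 1 <= n*n, so a*n*n + b*n + d <= a*n*n + b*n*n + d*n*n = (a+b+d)*n*n. Choosing c = a+b+d and x0 = 1 therefore satisfies the definition of Big-O.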


    Using Big-O to Categorize

    Although Big-O is laden with math, we use it to characterize the running times of programs and algorithms. The following Big-O characterizations are particularly useful (and they are all Big-Theta as well, even though we don't say so).