CS202 Lecture notes -- Big-O

  • Jim Plank
  • Directory: /home/jplank/cs202/notes/BigO
  • Lecture notes: http://web.eecs.utk.edu/~jplank/plank/classes/cs202/Fall-2004/notes/BigO/
  • Original Notes: Wed Sep 22 11:48:08 EDT 2004
  • Last Modification: Wed Oct 6 10:28:53 EDT 2021

    Big-O

    Big-O notation is one of the ways in which we talk about how complex an algorithm or program is. It gives us a nice way of quantifying or classifying how fast or slow a program is as a function of the size of its input, and independently of the machine on which it runs.

    Examples

    Let's look at a program (in src/linear1.cpp):

    /* This program takes a number n on standard input.  
       It then executes a for loop that iterates n times, counting the iterations.  
       It prints the number of iterations and a timing that uses the system call gettimeofday().  
       That's why you need to include <sys/time.h>. */
    
    #include <sys/time.h>
    #include <cstdio>
    #include <iostream>
    using namespace std;
    
    int main()
    {
      long long n, count, i;
      double start_time, end_time;
      struct timeval tv;
    
      if (!(cin >> n)) return 1;
    
      /* Get the starting time. */
    
      gettimeofday(&tv, NULL);
      start_time = tv.tv_usec;
      start_time /= 1000000.0;
      start_time += tv.tv_sec;
    
      /* Here's the loop, that executes n times. */
    
      count = 0;
      for (i = 0; i < n; i++) count++;
    
      /* Get the ending time. */
    
      gettimeofday(&tv, NULL);
      end_time = tv.tv_usec;
      end_time /= 1000000.0;
      end_time += tv.tv_sec;
    
      /* Print N, the iterations, and the time. */
    
      printf("N: %lld     Count: %lld    Time: %.3lf\n", n, count, end_time - start_time);
      return 0;
    }
    

    Obviously, this is a simple program. I don't want to go into gettimeofday() too much. It returns the value of a timer, which includes seconds and microseconds. I convert that to a double so that we can time the for loop and print the result. Suppose we run this program with varying values of n. What do we expect? Well, as n increases, so will the count, and so will the running time of the program:

    (This is on my Macbook, chunking along at 2.2 GHz in 2019):

    UNIX> echo 100000 | bin/linear1
    N: 100000     Count: 100000    Time: 0.000
    UNIX> echo 1000000 | bin/linear1
    N: 1000000     Count: 1000000    Time: 0.003
    UNIX> echo 10000000 | bin/linear1
    N: 10000000     Count: 10000000    Time: 0.020
    UNIX> echo 100000000 | bin/linear1
    N: 100000000     Count: 100000000    Time: 0.195
    UNIX> echo 1000000000 | bin/linear1
    N: 1000000000     Count: 1000000000    Time: 2.021
    UNIX> 
    
    Just what you'd think. The running time is roughly linear. Now, I'm also going to run this on a Raspberry Pi 3, which is a slower machine -- I'll tabulate the times below:

    n                Time on Macbook (s)   Time on Pi 3 (s)
    1,000,000              0.003                 0.021
    10,000,000             0.020                 0.143
    100,000,000            0.195                 1.201
    1,000,000,000          2.021                11.779

    As you can see, the running time on both machines scales linearly with n. The Pi is slower, but the relative behavior of the two machines is the same.

    Now, look at six other programs below. I will just show their loops:

    src/linear2.cpp:
    /* This loop executes 2n times. */
    
    count = 0;
    for (i = 0; i < 2*n; i++) count++;
    
    src/log.cpp:
    /* This loop executes log_2(n) times. */
    
    count = 0;
    for (i = 1; i < n; i *= 2) count++;
    
    src/nlogn.cpp:
    /* This loop executes n*log_2(n) times. */
    
    count = 0;
    for (j = 0; j < n; j++) {
      for (i = 1; i < n; i *= 2) {
        count++;
      }
    }
    
    src/nsquared.cpp:
    /* This loop executes n*n times. */
    
    count = 0;
    for (j = 0; j < n; j++) {
      for (i = 0; i < n; i++) {
        count++;
      }
    }
    
    src/all_i_j_pairs.cpp:
    /* This loop executes (n - 1) * n / 2 times.
       It arises when you enumerate all (i,j) pairs
       such that 0 <= i < j < n. */
    
    count = 0;
    for (j = 0; j < n; j++) {
      for (i = 0; i < j; i++) {
        count++;
      }
    }
    
    src/two_to_the_n.cpp:
    /* This loop executes 2^n times. */
    
    count = 0;
    for (i = 0; i < (1LL << n); i++) {
      count++;
    }
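
    One of those counts is worth writing out. For the all_i_j_pairs loop, the inner loop runs j times for each value of j, so the total is the standard sum (this little derivation is mine, not part of the original listing):

    \sum_{j=0}^{n-1} j \;=\; 0 + 1 + 2 + \cdots + (n-1) \;=\; \frac{(n-1)\,n}{2}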
    

    In each of the programs, I tell you how many times the loop executes in the comment. That will be the final value of count. Make sure you can calculate all of these below -- it's an easy test question:

    UNIX> echo 16 | bin/linear1
    N: 16     Count: 16    Time: 0.000
    UNIX> echo 16 | bin/linear2
    N: 16     Count: 32    Time: 0.000
    UNIX> echo 16 | bin/log
    N: 16     Count: 4    Time: 0.000
    UNIX> echo 16 | bin/nlogn
    N: 16     Count: 64    Time: 0.000
    UNIX> echo 16 | bin/nsquared
    N: 16     Count: 256    Time: 0.000
    UNIX> echo 16 | bin/all_i_j_pairs 
    N: 16     Count: 120    Time: 0.000
    UNIX> echo 16 | bin/two_to_the_n
    N: 16     Count: 65536    Time: 0.000
    UNIX> 
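
    As a sanity check, here is a small sketch of my own (it is not one of the programs above) that computes each of those counts from a formula rather than by running the loops. For n = 16 it prints the same seven numbers as the transcript:

    #include <iostream>
    using namespace std;

    int main()
    {
      long long n = 16;

      /* Iterations of "for (i = 1; i < n; i *= 2)" -- roughly log_2(n). */
      long long logn = 0;
      for (long long i = 1; i < n; i *= 2) logn++;

      cout << "linear1:       " << n          << endl;   /* n           */
      cout << "linear2:       " << 2*n        << endl;   /* 2n          */
      cout << "log:           " << logn       << endl;   /* ~log_2(n)   */
      cout << "nlogn:         " << n*logn     << endl;   /* n*log_2(n)  */
      cout << "nsquared:      " << n*n        << endl;   /* n*n         */
      cout << "all_i_j_pairs: " << n*(n-1)/2  << endl;   /* (n-1)*n/2   */
      cout << "two_to_the_n:  " << (1LL << n) << endl;   /* 2^n         */
      return 0;
    }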
    
    In each program, the running time is going to be directly proportional to count. So, what do the running times look like if you increase n to large values? To test this, I ran all of the programs with increasing values of n. I quit either when n got really large (about 10^15), or when the running time exceeded a minute. You can see the data in the following files (these are in the data directory):

    linear1 linear2 log nlogn nsquared all_i_j_pairs two_to_the_n

    It's pretty illustrative if you look at the last line of each file -- I've done some formatting:

    log            N: 7500000000000000  Count: 53          Time: 0.000
    linear1        N: 50000000000       Count: 50000000000 Time: 98.628
    linear2        N: 25000000000       Count: 50000000000 Time: 99.176
    nlogn          N: 2500000000        Count: 80000000000 Time: 167.156
    nsquared       N: 250000            Count: 62500000000 Time: 125.680
    all_i_j_pairs  N: 250000            Count: 31249875000 Time: 61.589
    two_to_the_n   N: 35                Count: 34359738368 Time: 76.261
    

    Two quick observations: log(n) is really fast. On the flip side, 2^n is really slow. Below I show some graphs of the data. The graphs all show the same data; they just have different scales, so that you can do some visual comparisons:

    So, this shows what you'd think:

    log(n) < n < 2n < n*log(n) < all_pairs(n) < n*n < 2^n


    Back to Big-O: Function comparison

    Big-O notation tries to work on classifying functions. The functions that we care about are the running times of programs. The first concept when we deal with Big-O is comparing functions. Basically, we will say that one function f(n) is greater than another g(n) if there is a value x0 so that for all x >= x0:

    f(x) >= g(x)

    Put graphically, it means that after a certain point on the x axis, as we go right, the curve for f(n) will always be at least as high as the curve for g(n). Thus, given the graphs above, you can see that n*n is greater than n*log(n), which is greater than 2n, which is greater than n, which is greater than log(n).

    So, here are some functions:

    So, we can ask ourselves questions: Is b(n) > a(n)? Yes. Why? Because for any value of n, b(n) is 100, and a(n) is 1. Therefore for any value of n, b(n) is greater than a(n).

    That was easy. How about d(n) and b(n)? d(n) is greater, and to demonstrate it, we need to pick a value of x0. We can't pick 0, because b(0) is 100 and d(0) is 0. However, if we pick x0 to be 101, now it works -- every value of d(n) is greater than 100 when n >= 101.

    Similarly, look at g(n) and d(n). For small values of n, d(n) is a lot greater. However, let's consider a large value of x0, like 1,000,000. d(x0) = 1,000,000, while g(x0) = 1,000,000,000,000 - 5,000,000, which is 999,995,000,000. That's much bigger than d(x0). Plus, as n grows beyond x0, g(n) grows more quickly than d(n). Therefore, g(n) > d(n).
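
    To make the x0 idea concrete, here is a small sketch of my own (it is not from the notes). It uses b(n) = 100 and d(n) = n, which match the values used in the discussion above, and it searches for the first point from which d(x) >= b(x) holds all the way out to a million. It prints 100; the text above picks 101 because it wants d(n) to be strictly greater than 100.

    #include <cstdio>

    /* Illustrative definitions, matching the values used in the text above. */
    double b(double n) { return 100; }
    double d(double n) { return n; }

    int main()
    {
      double x0, x;

      /* Find the first x where d(x) >= b(x). */
      for (x0 = 0; d(x0) < b(x0); x0++) ;

      /* Check that the inequality keeps holding after that point. */
      for (x = x0; x <= 1000000; x++) {
        if (d(x) < b(x)) {
          printf("d(x) dropped below b(x) at x = %.0lf\n", x);
          return 1;
        }
      }
      printf("For every x >= %.0lf (tested up to a million), d(x) >= b(x)\n", x0);
      return 0;
    }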

    Here's a total ordering of the above functions. Make sure you can prove all of these to yourselves:

    g(n) > j(n) > e(n) > f(n) > d(n) > h(n) > i(n) > b(n) > a(n) > c(n)

    Some rules:


    Big-O

    Given the above, we can now define Big-O:

    T(N) = O(f(N)) if there exist positive constants c and x0 such that c*f(N) >= T(N) for all N >= x0.
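
    Written with all of the quantifiers visible (my restatement of the same definition, in LaTeX-style notation):

    T(N) = O(f(N)) \iff \exists\, c > 0,\ x_0 \ \text{such that}\ T(N) \le c \cdot f(N) \ \text{for all}\ N \ge x_0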

    Given the definitions of a(n) through j(n) above:


    The Imprecision of Big-O

    O(f(N)) is an upper bound on T(N). That means that T(N) definitely grows no faster than f(N), up to the constant c. We'd like it to be a tight bound, but often that is too difficult to prove. However, you should be aware of it. For example, although we showed above that b(n) = O(1), it is also true that b(n) = O(n^2). Why? Well, set c equal to one, and x0 to 101, and you should see that for all values n ≥ 101, 1*n^2 ≥ 100.

    So, in some respects, Big-O is imprecise, because b(n) above is not only O(1), but also O(n), O(n^2), O(n*log(n)), O(2^n) and O(n!). In computer science, when we say that some T(n) = O(f(n)), what we really mean is that f(n) is the smallest known function for which T(n) = O(f(n)).

    As an aside, don't use the imprecision as a way to get around test questions. For example, if I ask for the Big-O complexity of sorting a vector with n elements, you shouldn't answer O(n^10), because you know that it's technically correct, and n^10 is probably bigger than any function that we teach in this class. You will not get credit for that answer, and I will cite this text when you argue with me about it...


    Big Omega and Big Theta

    Big Omega and Big Theta are two more definitions that help to clean up the above imprecision with Big-O.

    Let me give an example. Suppose I have a program that takes 3n + 5 operations on an input of size n. We typically say that the program is O(n). That is clearly true (choose c=4 and x0=10). However, as mentioned above, the program is also O(n^2) (choose c=1 and x0=10). Is it O(1)? No -- there is no constant c such that c ≥ 3n + 5 for all n beyond some x0.

    The program is Ω(n): choose c = 1 and x0=1 (in other words, for any x ≥ 1, 3x+5 > x). However, it is not Ω(n^2), because there is no c such that c(3x+5) ≥ x^2 for all large x. It is, however, Ω(1): choose c = 1 and x0=1 -- it's pretty easy to see that 3x + 5 > 1.

    Now, we can put this in terms of Big-Theta. The program is Θ(n), but not Θ(n^2) or Θ(1).
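
    Spelling out the constants from the 3n + 5 example (my summary of the argument above, in LaTeX-style notation):

    3n + 5 \le 4n \ \text{for all}\ n \ge 5, \ \text{so with}\ c = 4,\ x_0 = 10:\quad 3n + 5 = O(n)
    3n + 5 \ge n \ \text{for all}\ n \ge 1, \ \text{so with}\ c = 1,\ x_0 = 1:\quad 3n + 5 = \Omega(n)
    \text{Both together give}\quad 3n + 5 = \Theta(n)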

    It is unfortunate that we as computer scientists quantify algorithms using Big-O rather than Big-Theta, but it is a fact of life. You need to know these definitions, and remember that most of the time, when we say something is Big-O, in reality it is also Big-Theta, which is much more precise.


    At this point, I think that giving the Wikipedia page on Big-O a scan is a good idea. This includes:


    Using Big-O to Categorize

    Although Big-O is laden with math, we use it to characterize the running times of programs and algorithms. The following Big-O characterizations are particularly useful (and they are all Big-Theta as well, even though we don't say so).

    Two Big-O Proofs

    You are not responsible for proofs like this, but it's not a bad idea to see them:

    Is n*n + n + 1 = O(n*n)? See the following PDF file for a proof.

    Generalizing, is a*n*n + b*n + d = O(n*n) for a, b, d > 1 and b > d? See the following PDF file for a proof.
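
    You should look at the PDFs for the real write-ups, but the flavor of the first one is easy to sketch (this sketch is mine, not copied from the PDF): find a c and an x0 that make the definition work.

    \text{For } n \ge 1:\quad n^2 + n + 1 \;\le\; n^2 + n^2 + n^2 \;=\; 3n^2.
    \text{So with } c = 3 \text{ and } x_0 = 1,\ c \cdot n^2 \ge n^2 + n + 1 \text{ for all } n \ge x_0,
    \text{which is exactly what } n^2 + n + 1 = O(n^2) \text{ requires.}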