CS140 Lecture notes -- More Big O

  • Jim Plank with modifications by Brad Vander Zanden
  • Directory: ~cs140/www-home/notes/MoreBigO
  • Lecture notes: http://www.cs.utk.edu/~cs140/notes/MoreBigO
  • Tue Nov 3 11:05:32 EST 1998

    More Big O

    The goal with this big-O stuff is to be able to classify the running times of programs and procedures. For example, take the following code example, from section 2.4.1 of the book:
    int sum(int n)
    {
      int i, partial_sum;
    
      partial_sum = 0;                          /* Line 1 */
      for (i = 1; i <= n; i++) {                /* Line 2 */
        partial_sum += i*i*i;                   /* Line 3 */
      }
      return partial_sum;                       /* Line 4 */
    }
    
    This returns the sum from i = 1 to n of i cubed. Suppose we count the number of statements that are executed in this procedure. Line 1 contributes one statement execution and line 4 contributes one. The for loop on line 2 is easier to count if you re-write it as:
    i = 1;
    while (i <= n) {
      ...
      i++;
    }
    
    You should now be able to see that the loop's own statements are executed 2n+2 times: the initialization executes once, the test executes n+1 times (it fails on the last test), and the increment executes n times, giving 1+(n+1)+n = 2n+2. Finally, the body of the loop is executed n times, so line 3 is executed n times. The total number of statements executed is 1+(2n+2)+n+1 = 3n+4.
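    One way to convince yourself of the 3n+4 count is to instrument the code. The sketch below is our own (the counter is not part of the book's code): it bumps a global counter once per statement execution and checks that the total matches 3n+4.

    ```c
    #include <stdio.h>
    #include <assert.h>

    static int count;   /* number of statement executions -- our addition */

    int sum(int n)
    {
      int i, partial_sum;

      partial_sum = 0;        count++;   /* Line 1: executed once */
      i = 1;                  count++;   /* Line 2: initialization, once */
      while (count++, i <= n) {          /* Line 2: test, executed n+1 times */
        partial_sum += i*i*i; count++;   /* Line 3: executed n times */
        i++;                  count++;   /* Line 2: increment, executed n times */
      }
      count++;                           /* Line 4: the return, once */
      return partial_sum;
    }

    int main()
    {
      int n;
      for (n = 0; n <= 20; n++) {
        count = 0;
        sum(n);
        assert(count == 3*n + 4);        /* 1 + (2n+2) + n + 1 */
      }
      printf("statement count matches 3n+4 for n = 0..20\n");
      return 0;
    }
    ```

    The comma operator in the while test is just a trick to count the test itself; it executes n+1 times because the final, failing test is counted too.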

    In terms of big-O notation, this is O(n). Why? Informally, it is because n is the highest-degree term. Formally, it is because if we choose c=4, then cn >= 3n+4 for all n >= 4.

    Of course, it is also O(n*log(n)) and O(n*n), since n*n > n*log(n) > 3n+4 for sufficiently large n. However, to characterize it as tightly as possible, we choose the smallest such function: it is O(n).

    Given the definitions from last class, you'll also see that it is Omega(n), and therefore Theta(n) too. Why Omega(n)? Because if we choose c=1, then cn <= 3n+4 for all n. Since 3n+4 = O(n), and 3n+4 = Omega(n), 3n+4 = Theta(n) too.

    Now, in line three, should we count each execution of that statement as one or more? Certainly executing ``partial_sum += i*i*i'' should be more expensive than executing ``partial_sum = 0.'' In fact, the book counts each execution of line three as four statements -- two multiplications, an addition, and an assignment. This is reasonable. However, in terms of big-O, it doesn't matter. Why? Well, suppose you do count it as 4 statements. Then line three contributes 4n statements to the running time, and the total running time is now 1+(2n+2)+4n+1 = 6n+4.

    In terms of big-O, Omega and Theta, this changes nothing: 6n+4 = O(n) (choose c to be 7); 6n+4 = Omega(n) (choose c to be 1), and therefore 6n+4 = Theta(n) too.

    The bottom line is that in terms of big-O, you can count statements as one, and not worry about their relative execution time, as long as it is O(1).

    General Rules

    The book, in section 2.4.2, gives rules for counting the number of statements in a program or procedure. The main ones are:

      • Rule 1 -- for loops: The running time of a for loop is at most the running time of the statements inside the loop, multiplied by the number of iterations.
      • Rule 2 -- nested loops: Analyze them inside out. The total running time of a statement inside a group of nested loops is its running time multiplied by the product of the sizes of all the loops.
      • Rule 3 -- consecutive statements: These just add, which means the maximum one dominates.
      • Rule 4 -- if/else: The running time is at most the running time of the test plus the larger of the running times of the two branches.

    As you get better at this stuff, you learn to just quantify everything as big-O, and you use identities such as O(f(n)) + O(g(n)) = O(max(f(n), g(n))) and O(f(n)) * O(g(n)) = O(f(n)*g(n)). (You might think about how you could prove these).

    For example, in the sum() procedure, you see that lines 1, 3 and 4 are O(1). The for loop in line 2 iterates O(n) times. Therefore, by Rule 1 the running time of lines 2 and 3 is O(n*1) = O(n). Adding it all up, the running time of the whole program is O(1)+O(n)+O(1), which is O(n).
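    Rule 1 also handles nested loops if you apply it twice, from the inside out. Here is a hypothetical procedure (our own, not from the book): the O(1) body runs O(n) times per outer iteration, and the outer loop runs O(n) times, so the whole thing is O(n*n).

    ```c
    #include <stdio.h>

    /* pair_sum is a made-up example for illustrating Rule 1 on
       nested loops; it computes the sum of i*j over all pairs. */

    int pair_sum(int n)
    {
      int i, j, total = 0;

      for (i = 1; i <= n; i++)        /* O(n) iterations */
        for (j = 1; j <= n; j++)      /* O(n) iterations each */
          total += i * j;             /* O(1) body: runs n*n times */

      return total;
    }

    int main()
    {
      /* The body runs exactly n*n times; e.g. for n = 4 it runs 16 times,
         and pair_sum(4) = (1+2+3+4)*(1+2+3+4) = 100. */
      printf("pair_sum(4) = %d\n", pair_sum(4));
      return 0;
    }
    ```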

    Suppose line 2 of the sum program was:

       for (i = 0; i <= n; i++) 
    
    Does that change the running time of the program? No, because the for loop is still O(n). Similarly, if line 2 is
       for (i = 0; i <= n/2; i++) 
    
    then it is still O(n).
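    You can check this iteration count directly. The helper below is our own sketch: the "i <= n/2" loop iterates n/2 + 1 times, which is half the constant but still a linear -- O(n) -- number of iterations.

    ```c
    #include <stdio.h>
    #include <assert.h>

    /* Count how many times the body of "for (i = 0; i <= n/2; i++)"
       executes.  (Our own helper, not from the notes.) */

    int half_loop_iterations(int n)
    {
      int i, iters = 0;
      for (i = 0; i <= n/2; i++)
        iters++;
      return iters;
    }

    int main()
    {
      assert(half_loop_iterations(10) == 6);    /* i = 0..5 */
      assert(half_loop_iterations(100) == 51);  /* i = 0..50 */
      printf("n/2 + 1 iterations: a smaller constant, but still O(n)\n");
      return 0;
    }
    ```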

    If we change line 2 to be:

      for (i = 1; i <= n; i *= 2)
    
    then the loop iterates log(n) times. Now the running time of the program is O(1)+O(log(n))+O(1) = O(log(n)).
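    Again, you can count the iterations empirically. In this sketch (our own), i takes the values 1, 2, 4, 8, ..., so for n >= 1 the body runs floor(log2(n)) + 1 times -- O(log(n)).

    ```c
    #include <stdio.h>
    #include <assert.h>

    /* Count how many times the body of "for (i = 1; i <= n; i *= 2)"
       executes.  (Our own helper, not from the notes.) */

    int doubling_iterations(int n)
    {
      int i, iters = 0;
      for (i = 1; i <= n; i *= 2)
        iters++;
      return iters;
    }

    int main()
    {
      assert(doubling_iterations(16) == 5);     /* i = 1, 2, 4, 8, 16 */
      assert(doubling_iterations(1000) == 10);  /* i = 1, 2, ..., 512 */
      printf("the doubling loop iterates O(log(n)) times\n");
      return 0;
    }
    ```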

    Significance

    Once again, quantifying the running time of a program in terms of big-O is important, because it allows you to classify your programs and the algorithms that they use, independent of things like the speed of the machine that your program is running on. In general, you want to get it to be the lowest O(f(n)) that you can.

    Since 4n+1 and 1000n+400000 are both O(n), does this mean that you shouldn't care whether your program's running time is one or the other? No. If you can get it to be 4n+1 rather than 1000n+400000, you should. However, a more fundamental thing is to make sure that your program's running time is O(n) instead of O(n*log(n)), or O(n*log(n)) instead of O(n*n), if that is possible.