CS302 -- Lab 5
Empirical Evaluation of Time Complexity
- CS302 -- Fundamental Algorithms
- Fall, 1999
- Brad Vander Zanden
- Due: Wednesday, October 6 at 2:30PM for the Wednesday lab and
Friday, October 8 at 11:15AM for the Friday lab.
Lab Objective
The objective of this lab is to
show you how an algorithm's time complexity translates into real-world
performance. A second objective
is to give you experience using a profiler. A profiler
instruments a program and reports information about the time spent
in each function. In this lab you will use a profiling tool called
quantify.
Problem Statement
You will perform a number of
experiments comparing the relative performance of
several search algorithms: linear search, unordered
binary search trees, splay trees,
red black trees, and hash tables, on both ordered and random data.
You will be asked to graph your results and also to find the optimal
data structure for various input ranges.
Setting Up
There is no set up required for this lab.
Search Algorithms
In /ruby/homes/ftp/pub/bvz/classes/cs302/labs/lab5 you will find a
file called search, which implements the linear search, unordered
binary search tree, splay tree, red black tree, and hash table algorithms.
This program 1) generates a sequence of keys and
inserts the keys into their respective data structures, 2) searches once
for each of the generated keys (these searches will all be successful), and
3) searches for a sequence of keys not in the data structures (these
searches will all be unsuccessful). There are an equal number of
successful and unsuccessful searches.
The search program takes three arguments:
- An integer indicating the number of keys to be generated. If
n is the number of keys to be generated, then integer keys in
the range [0, 2n-1] are generated. In the ordered case, the
keys are inserted in ascending order (i.e., 0, 2, 4, ..., 2n-2). In the
random case, the keys are inserted in a random order. Once
the keys are inserted, there will be n successful and
n unsuccessful searches conducted.
- An integer seed for the random number generator.
- A switch, o for ordered and r for random, that
determines whether the keys are inserted in sorted or
random order.
For example:
search 1000 10 r
There are a number of points you should know about the program:
- You might not have quantify in your path. If you do not,
you should add /usr/local/pure/quantify to your
path.
- search does not generate any output.
- search has already been compiled for you so you don't have to
compile it or modify it.
- The hash table's size is chosen so that on average, each entry
will have five keys hash to it. Table sizes are determined
by dividing the number of keys by 5, then choosing the smallest
prime number greater than the computed number.
- You should use a seed of 10 for the search experiments.
Using quantify, you will measure three aspects of this
program's performance:
- The time to insert all n keys into the data structure.
- The time to perform n successful searches.
- The time to perform n unsuccessful searches.
The functions you want to look for have the following names:
| Data Structure     | Insertion                | Successful Search            | Unsuccessful Search            |
| Linear Search      | linarray_insert          | linarray_successful_search   | linarray_unsuccessful_search   |
| Hash Table         | HashTable::insert        | hashtable_successful_search  | hashtable_unsuccessful_search  |
| Binary Search Tree | BinarySearchTree::insert | bintree_successful_search    | bintree_unsuccessful_search    |
| Red Black Trees    | RedBlackTree::insert     | rbtree_successful_search     | rbtree_unsuccessful_search     |
| Splay Trees        | SplayTree::insert        | splaytree_successful_search  | splaytree_unsuccessful_search  |
The numbers you will report from quantify should be the number of cycles for
the function plus its descendants.
To complete this part of the lab, do the following:
- prepare two sets of graphs, one for ordered data and one for
random data. Each set consists of three graphs, one each for
insertion, successful searches, and unsuccessful searches. Each graph
should have a curve for linear search, binary search trees,
red black trees, splay trees, and hashing. The x-axis should be the
number of elements and the y-axis should be the number of cycles. You
should run the experiments with the following numbers of elements: 10, 50,
100, 500, 1000, 2000.
If the curves for some of the slower algorithms are growing too rapidly,
take the value for the next highest curve and make this value the
maximum value for the y-axis. The curves for the slower algorithms
may then leave the graph but that's ok. For example, suppose that
the binary search tree on unordered data is taking 50,000,000 cycles
to insert 2000 elements while the next slowest algorithm is taking
only 5,000,000 cycles. You should then use 5,000,000 as your maximum
value for the y-axis and terminate the binary search tree curve.
If you're not sure what it means to "cut off" a curve, look at
Figures 2.3 and 2.4 on page 46 of the Weiss book. Both figures have
graphs in which some of the faster growing curves are cut off.
Some of the experiments with
the larger numbers of elements may take several minutes because of the
poor time complexity of linear search, and of
binary search trees on ordered data, so be patient.
- Answer the following questions:
a. In general, which data structure is fastest for inserting ordered data?
b. In general, which data structure is fastest for inserting random data?
c. In general, which data structure is fastest for successful searches on ordered
data?
d. In general, which data structure is fastest for successful searches on random
data?
e. In general, which data structure is fastest for unsuccessful searches on ordered
data?
f. In general, which data structure is fastest for unsuccessful searches on random
data?
- For smaller numbers of items, the data structures that you listed
as fastest in the questions above may be slower than some of the
other algorithms. Assuming that successful and unsuccessful searches are
equally likely, prepare two tables, one for ordered data and one for
random data, that would allow a programmer to choose the fastest data
structure for any input size. To determine the fastest data structure for a
given input size, sum the insertion time, the successful search time,
and the unsuccessful search time. Your table may round the input sizes
to the nearest multiple of 10. For example:
| Number of Items | Fastest Data Structure |
| 0 - 20          | Linear Search          |
| 20 - 200        | Red Black Trees        |
| 200+            | Hash Tables            |
You should submit graphs that support this table. That is, at the points
where the optimal data structure changes from one data structure to another,
you should graph the total combined cycles for all five data structures,
along with a couple of points on either side (in increments of 10), so that
it is clear to the viewer how the curves are moving.
What To Hand In
You should hand in your answers to DJ at the beginning of lab.
For this assignment, there is
no need to submit any source files. Your graphs and answers constitute
the work for this lab.