CS302 -- Lab 5

Empirical Evaluation of Time Complexity


Lab Objective

The objective of this lab is to show you how an algorithm's time complexity translates into real-world performance. A second objective is to give you experience with using a profiler. A profiler instruments a program and reports the time spent in each function. In this lab you will use a profiling tool called quantify.

Problem Statement

You will be performing a number of experiments that compare the relative performance of several search algorithms: linear search, binary search trees, splay trees, red black trees, and hash tables, on both ordered and random data. You will be asked to graph your results and to find the optimal data structure for various input ranges.


Setting Up

There is no set up required for this lab.


Search Algorithms

In /ruby/homes/ftp/pub/bvz/classes/cs302/labs/lab5 you will find a program called search which implements the linear search, binary search tree, splay tree, red black tree, and hash table algorithms. This program 1) generates a sequence of keys and inserts the keys into each data structure, 2) searches once for each of the generated keys (these searches will all be successful), and 3) searches for a sequence of keys not in the data structures (these searches will all be unsuccessful). There are an equal number of successful and unsuccessful searches. The search program takes three arguments:

  1. An integer indicating the number of keys to be generated. If n is the number of keys to be generated, then integer keys in the range [0, 2n-1] are generated. In the ordered case, the keys are inserted in ascending order (i.e., 0, 2, 4, ..., 2n-2). In the random case, the keys are inserted in a random order. Once the keys are inserted, there will be n successful and n unsuccessful searches conducted.

  2. An integer seed for the random number generator.

  3. A switch that determines whether the program generates ordered or random keys: o for ordered (keys inserted in sorted order) and r for random.

For example:
search 1000 10 r

There are a number of points you should know about the programs:

  1. You might not have quantify in your path. If you do not, you should add /usr/local/pure/quantify to your path.

  2. search does not generate any output.

  3. search has already been compiled for you so you don't have to compile it or modify it.

  4. The hash table's size is chosen so that on average, each entry will have five keys hash to it. Table sizes are determined by dividing the number of keys by 5, then choosing the smallest prime number greater than the computed number.

  5. You should use a seed of 10 for the search experiments.

Using quantify, you will measure three aspects of each data structure's performance: insertion, successful searches, and unsuccessful searches. The functions you want to look for have the following names:
Data Structure      Insertion                Successful Search           Unsuccessful Search
Linear Search       linarray_insert          linarray_successful_search  linarray_unsuccessful_search
Hash Table          HashTable::insert        hashtable_successful_search hashtable_unsuccessful_search
Binary Search Tree  BinarySearchTree::insert bintree_successful_search   bintree_unsuccessful_search
Red Black Trees     RedBlackTree::insert     rbtree_successful_search    rbtree_unsuccessful_search
Splay Trees         SplayTree::insert        splaytree_successful_search splaytree_unsuccessful_search

The numbers you will report from quantify should be the number of cycles for the function plus its descendants. To complete this part of the lab, do the following:

  1. Prepare two sets of graphs, one for ordered data and one for random data. Each set consists of three graphs, one each for insertion, successful searches, and unsuccessful searches. Each graph should have a curve for linear search, binary search trees, red black trees, splay trees, and hashing. The x-axis should be the number of elements and the y-axis the number of cycles. You should run the experiments with the following numbers of elements: 10, 50, 100, 500, 1000, 2000.

    If the curves for some of the slower algorithms grow too rapidly, take the largest value of the next slower curve and make it the maximum value for the y-axis. The curves for the slowest algorithms will then run off the top of the graph, but that's okay. For example, suppose that the binary search tree on ordered data takes 50,000,000 cycles to insert 2000 elements while the next slowest algorithm takes only 5,000,000 cycles. You should then use 5,000,000 as the maximum value for the y-axis and terminate the binary search tree curve there. If you're not sure what it means to "cut off" a curve, look at Figures 2.3 and 2.4 on page 46 of the Weiss book; both figures have graphs in which some of the faster-growing curves are cut off.

    Some of the experiments with the larger numbers of elements may take several minutes, because linear search in general, and binary search trees on ordered data, have poor time complexity, so be patient.

  2. Answer the following questions:

    a. In general, which data structure is fastest for inserting ordered data?

    b. In general, which data structure is fastest for inserting random data?

    c. In general, which data structure is fastest for successful searches on ordered data?

    d. In general, which data structure is fastest for successful searches on random data?

    e. In general, which data structure is fastest for unsuccessful searches on ordered data?

    f. In general, which data structure is fastest for unsuccessful searches on random data?

  3. For smaller numbers of items, the data structures that you listed in question 2 as being the fastest may be slower than some of the other algorithms. Assuming that successful and unsuccessful searches are equally likely, prepare two tables, one for ordered data and one for random data, that would allow a programmer to choose the fastest data structure for any input size. To determine the fastest data structure for a given input size, sum the insertion time, the successful search time, and the unsuccessful search time. Your table may round input sizes to the nearest multiple of 10. For example:

    Number of Items  Fastest Data Structure
    0 - 20           Linear Search
    20 - 200         Red Black Trees
    200+             Hash Tables

You should submit graphs that support this table. That is, at each point where the optimal choice changes from one data structure to another, you should graph the total combined cycles for all five data structures, along with a couple of points on either side (in increments of 10), so that it is clear to the viewer how the curves are moving.


What To Hand In

You should hand in your answers to DJ at the beginning of lab. For this assignment, there is no need to submit any source files. Your graphs and answers constitute the work for this lab.