Homework 5 Solutions

  1. Hashing is the best strategy since:
    1. ordered traversals and find min/find max operations are not required.
    2. the fact that the data is almost ordered should have no impact on the performance of the hashing algorithm, since the hashing function should ensure that keys are properly randomized, and
    3. hashing's O(1) average time complexity for insert, delete, and find operations is better than either the O(log n) time complexity of balanced trees or the O(n) per-operation time complexity (O(n^2) to insert all n keys) of unbalanced trees on almost ordered data.

    1. the prefix expression is obtained from a pre-order traversal: - * * a b + c d e
    2. the infix expression is obtained from an in-order traversal: a * b * c + d - e
    3. the postfix expression is obtained from a post-order traversal: a b * c d + * e -
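The three traversals above can be sketched in C as follows. This is a minimal illustration, not part of the original solution: the `expr_node` type, the `emit` helper, and the function names are all hypothetical, and the caller is assumed to pass a zero-initialized buffer large enough to hold the output.

```c
#include <string.h>

/* Hypothetical expression-tree node: leaves hold operands,
   internal nodes hold operators. */
typedef struct expr_node {
    char symbol;
    struct expr_node *left;
    struct expr_node *right;
} expr_node;

/* Append one symbol and a trailing space to the string in out.
   The caller must supply a buffer that is large enough. */
static void emit(char *out, char symbol) {
    size_t len = strlen(out);
    out[len] = symbol;
    out[len + 1] = ' ';
    out[len + 2] = '\0';
}

/* Pre-order: node, then left subtree, then right subtree (prefix form). */
void preorder(const expr_node *n, char *out) {
    if (n == NULL) return;
    emit(out, n->symbol);
    preorder(n->left, out);
    preorder(n->right, out);
}

/* In-order: left subtree, node, right subtree (infix form, no parentheses). */
void inorder(const expr_node *n, char *out) {
    if (n == NULL) return;
    inorder(n->left, out);
    emit(out, n->symbol);
    inorder(n->right, out);
}

/* Post-order: left subtree, right subtree, then node (postfix form). */
void postorder(const expr_node *n, char *out) {
    if (n == NULL) return;
    postorder(n->left, out);
    postorder(n->right, out);
    emit(out, n->symbol);
}
```

Note that the plain in-order traversal drops the grouping that the tree encodes, so the infix string would need parentheses, as in (a * b) * (c + d) - e, to be read back unambiguously.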

  2. Result of inserting 3, 1, 4, 6, 9, 2, 5, 7 into an initially empty binary search tree:
              3
             / \
            1   4
             \   \
              2   6
                 / \
                5   9
                   /
                  7
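The tree above can be reproduced with a standard recursive insertion routine. The following is a minimal sketch under stated assumptions: the `node` layout (an integer `key` plus the `left_child` and `right_child` pointers used elsewhere in these solutions) and the name `insert` are illustrative, and duplicate keys are simply ignored.

```c
#include <stdlib.h>

/* Assumed node layout; only the child pointer names come from the
   height function later in these solutions. */
typedef struct node {
    int key;
    struct node *left_child;
    struct node *right_child;
} node;

/* Insert key into the subtree rooted at root and return the subtree's root.
   Duplicate keys are ignored. If malloc fails, the tree is left unchanged. */
node *insert(node *root, int key) {
    if (root == NULL) {
        node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->key = key;
        n->left_child = NULL;
        n->right_child = NULL;
        return n;
    }
    if (key < root->key)
        root->left_child = insert(root->left_child, key);
    else if (key > root->key)
        root->right_child = insert(root->right_child, key);
    return root;
}
```

Calling `root = insert(root, k)` for k = 3, 1, 4, 6, 9, 2, 5, 7 in that order, starting from `root = NULL`, should build the tree shown: each new key walks down from the root, going left when it is smaller and right when it is larger, until it reaches an empty position.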
    	  

  3. Consider the following tree:
                        ---------50----------
                       /                     \
                  ----25----             ----75----
                 /          \           /          \
                10          40         60          90
              /            /   \     /            /
             2           35     45  55          85
                                     \
                                      57
         
    1. Draw the tree that results from deleting 2
                          ---------50----------
                         /                     \
                    ----25----             ----75----
                   /          \           /          \
                  10          40         60          90
                             /   \     /            /
                           35     45  55          85
                                       \
                                        57
           
      When a node is a leaf, like 2, you simply remove it from its parent.

    2. Draw the tree that results from deleting 25
                          ---------50----------
                         /                     \
                    ----35----             ----75----
                   /          \           /          \
                  10          40         60          90
                 /               \     /            /
                2                 45  55          85
                                       \
                                        57
           
      When a node with two children is deleted, you must perform the following actions:

      • Find the smallest node in the deleted node's right subtree and promote it to the deleted node's position. In this case, 35 is the smallest node in 25's right subtree, so 35 takes 25's place.
      • Recursively delete the promoted node from its original position in the right subtree. In this case, 35 is a leaf, so it is simply removed from its parent, 40.

    3. Draw the tree that results from deleting 50
                          ---------55----------
                         /                     \
                    ----25----             ----75----
                   /          \           /          \
                  10          40         60          90
                /            /   \     /            /
               2           35     45  57          85
           
      When a node with two children is deleted, you must perform the following actions:

      • Find the smallest node in the deleted node's right subtree and promote it to the deleted node's position. In this case, 55 is the smallest node in 50's right subtree, so 55 becomes the new root.
      • Recursively delete the promoted node from its original position in the right subtree. In this case, 55 has only a right child, so deleting it causes 57 to be promoted to 55's old location.
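The three deletion cases (leaf, one child, two children) can be sketched in C as follows. This is an illustrative sketch rather than the course's reference implementation: the `node` layout, `make_node`, and `delete_key` are assumed names, and every node is assumed to have been allocated with malloc.

```c
#include <stdlib.h>

/* Assumed node layout with an integer key. */
typedef struct node {
    int key;
    struct node *left_child;
    struct node *right_child;
} node;

/* Hypothetical helper that allocates a node with the given children.
   Returns NULL if malloc fails. */
node *make_node(int key, node *left, node *right) {
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->key = key;
    n->left_child = left;
    n->right_child = right;
    return n;
}

/* Delete key from the subtree rooted at root; return the new subtree root. */
node *delete_key(node *root, int key) {
    if (root == NULL)
        return NULL;                      /* key not found */
    if (key < root->key) {
        root->left_child = delete_key(root->left_child, key);
    } else if (key > root->key) {
        root->right_child = delete_key(root->right_child, key);
    } else if (root->left_child != NULL && root->right_child != NULL) {
        /* Two children: find the smallest node in the right subtree. */
        node *smallest = root->right_child;
        while (smallest->left_child != NULL)
            smallest = smallest->left_child;
        /* Promote its key, then recursively delete it from the right subtree. */
        root->key = smallest->key;
        root->right_child = delete_key(root->right_child, root->key);
    } else {
        /* Leaf or one child: replace the node with its child (possibly NULL). */
        node *child = root->left_child != NULL ? root->left_child
                                               : root->right_child;
        free(root);
        return child;
    }
    return root;
}
```

This sketch promotes the key rather than relinking the node itself, which keeps the code short; an implementation that stores large records in each node might prefer to relink pointers instead.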

    1. The height of the tree in problem 4 (the tree rooted at 50) is 4. It is obtained via the path 50-75-60-55-57, which has 4 edges.

    2. The height of a tree is equal to 1 plus the maximum of the heights of the root's two subtrees:
      tree_height = 1 + max(height(root->left), height(root->right))
      	
      This definition is recursive in that the heights of the left and right subtrees can be computed in the same fashion. By convention the height of an empty tree is -1. These facts lead to the following recursive function for computing a tree's height:
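        /* Assumes a node type with left_child and right_child pointers
           and a max function or macro for ints that you define yourself. */
        int height (node *root) {
            if (root == NULL)
                return -1;    /* an empty tree has height -1 */
            else
                return 1 + max(height(root->left_child),
                               height(root->right_child));
        }

      Standard C does not provide an integer max function (math.h offers only fmax, which operates on doubles), so the version above assumes that you have defined max yourself. If you did not use a max function, and I would not expect you to, then the following function would work: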
        int height (node *root) {
            int left_height;
            int right_height;
            if (root == NULL)
                return -1;
            else {
                left_height = height(root->left_child);
                right_height = height(root->right_child);
                if (left_height > right_height)
                    return 1 + left_height;
                else
                    return 1 + right_height;
            }
        }
      	

  4. It uses a post-order traversal because the height of a node is computed only after its left and right children are processed (i.e., the heights of its left and right children are computed before the node's own height is computed).

  5. The problem with the function is that there is no statement to stop the recursion and hence it will recurse infinitely, or in practice, until the program runs out of stack space. The fix comes from recognizing that the function is computing n! and that 0! = 1. Hence the function can be rewritten properly as:
        int fact(int n) {
            if (n == 0)    /* base case: 0! = 1 stops the recursion */
                return 1;
            else
                return n * fact(n-1);
        }
    	
    If you wanted to be really careful, you could also ensure that the initial value of n is non-negative. It would be inefficient for every call to fact to check whether n is non-negative so one would write a helper function that would do the recursion and have fact itself perform the check:
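        #include <stdio.h>     /* fprintf, stderr */
        #include <stdlib.h>    /* exit */

        /* Performs the recursion; assumes n is non-negative. */
        int fact_helper(int n) {
            if (n == 0)
                return 1;
            else
                return n * fact_helper(n-1);
        }

        /* Checks the argument once, then delegates to fact_helper. */
        int fact(int n) {
            if (n < 0) {
                fprintf(stderr, "fact(%d): The argument must be non-negative\n", n);
                exit(1);
            }
            return fact_helper(n);
        }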
    	

  6. Suppose that you have a data set with 1,000,000 randomly distributed elements. Compute the average number of searches required to a) find a key that exists, b) determine that a key does not exist in the following data structures. Use the big O notation to do your calculation (e.g., if the average time was O(n^2) for a successful find, you would answer 10^12. Do not write your answers using exponents; this number was simply too big to write out):

    Data Structure                  Key exists   Key does not exist
    Ordered linked list             500,000      500,000
    Unbalanced binary search tree   20           20
    Hash table                      1            1

    For an ordered linked list, a found key will be on average halfway through the list, resulting in n/2 comparisons. For a key that does not exist, you can stop as soon as you reach an element in the list that is greater than the key. For example, if you have the list 1, 3, 8, 15, 20, 25, 30, 40, 55, 80, ... and you want to know if 17 is in the list, then you can stop as soon as you reach 20, because all the remaining elements in the list are greater than 20 and hence greater than 17. On average, you will reach the first greater element halfway through the list, which also results in n/2 comparisons.

    For an unbalanced binary search tree, since the distribution of the keys is random, the tree will be roughly balanced, and hence finds for both existing and non-existent keys will require O(log_2 n) comparisons. For n = 1,000,000, log_2 n is approximately 19.93, which rounds up to 20.

    For hash tables, the number of comparisons is O(1), so it is roughly 1 in a table where the keys are well distributed.
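The arithmetic behind the table can be checked with a few lines of C. This is only a sketch of the estimates used above; the function names are hypothetical, and the logarithm is computed with an integer loop (rounding up) rather than with a library call. It assumes n is far below the largest value an unsigned long can hold.

```c
/* Estimated comparisons for an ordered linked list: about n/2. */
unsigned long ordered_list_estimate(unsigned long n) {
    return n / 2;
}

/* Estimated comparisons for a roughly balanced binary search tree:
   the smallest k such that 2^k >= n, i.e. log2(n) rounded up. */
unsigned ceil_log2(unsigned long n) {
    unsigned k = 0;
    unsigned long power = 1;
    while (power < n) {
        power *= 2;
        k++;
    }
    return k;
}

/* Estimated comparisons for a well-distributed hash table: constant. */
unsigned long hash_table_estimate(unsigned long n) {
    (void)n;    /* the estimate does not depend on n */
    return 1;
}
```

For n = 1,000,000 these functions give 500,000, 20, and 1 respectively, matching the table: 2^19 = 524,288 is less than 1,000,000 and 2^20 = 1,048,576 is the first power of two that is not.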