CS140 Final Answers

Fall 2018

(10 points) Show the binary search tree that results when the keys are presented in the following order:

	300 150 40 80 450 600 550 200 800 20

                   ----- 300 -----
                  /               \
             ----150---           450---
            /          \                \
           40          200              600
          /  \                         /   \
         20  80                      550   800
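
The insertion rule that produces this tree can be sketched as follows. This is only an illustration, not the course's BSTree class: `Node`, `insert`, `preorder`, and `buildAndList` are hypothetical names, and I assume duplicate keys are simply ignored.

```cpp
#include <memory>
#include <vector>

// Hypothetical node type for illustration; not the course's BSTree class.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Walk down from the root: smaller keys go left, larger keys go right.
// A new key always ends up as a leaf. Duplicates are ignored (an assumption).
void insert(std::unique_ptr<Node> &root, int key) {
    if (!root) {
        root = std::make_unique<Node>(key);
    } else if (key < root->key) {
        insert(root->left, key);
    } else if (key > root->key) {
        insert(root->right, key);
    }
}

// Preorder listing: node, then left subtree, then right subtree.
void preorder(const std::unique_ptr<Node> &n, std::vector<int> &out) {
    if (!n) return;
    out.push_back(n->key);
    preorder(n->left, out);
    preorder(n->right, out);
}

// Insert the keys in the order given and return the preorder listing.
std::vector<int> buildAndList(const std::vector<int> &keys) {
    std::unique_ptr<Node> root;
    for (int k : keys) insert(root, k);
    std::vector<int> out;
    preorder(root, out);
    return out;
}
```

Calling `buildAndList` on the ten keys in the order given lists the root 300 first, then everything in its left subtree, then everything in its right subtree, which is enough to check the shape of the tree drawn above.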

(10 points) Show the binary search tree that results if susan is deleted from the tree below:

                ------nancy----------------
               /                           \
            -fred-                     ---susan-------
            /     \                   /               \
        bonnie  george             peter            zachary
               /      \                \           /
            charles  nick              sarah     yifan
                                      /          /
                                  rebecca      xavier
                                 /
                              ralph

To delete "susan" from this tree we must find the largest key in her left subtree, which is "sarah", and replace susan with sarah. We must then remove sarah's original node. That node has a single child, rebecca, so rebecca takes its place. This produces the following tree, in which sarah and rebecca are the nodes that moved:

                ------nancy----------------
               /                           \
            -fred-                     ---sarah-------
            /     \                   /               \
        bonnie  george             peter            zachary
               /      \                \           /
            charles  nick            rebecca     yifan
                                      /          /
                                   ralph      xavier
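
The deletion procedure described above can be sketched in code. This is a minimal illustration under my own assumptions, not the course's BSTree class: `Node`, `insert`, `erase`, and `preorder` are hypothetical names, and the sketch always uses the largest key in the left subtree when the deleted node has two children.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration.
struct Node {
    std::string key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string k) : key(std::move(k)) {}
};

// Standard BST insertion; duplicates are ignored.
void insert(std::unique_ptr<Node> &root, const std::string &key) {
    if (!root) root = std::make_unique<Node>(key);
    else if (key < root->key) insert(root->left, key);
    else if (key > root->key) insert(root->right, key);
}

// Delete key from the subtree whose root pointer is stored in `root`.
void erase(std::unique_ptr<Node> &root, const std::string &key) {
    if (!root) return;                       // key not present
    if (key < root->key) {
        erase(root->left, key);
    } else if (key > root->key) {
        erase(root->right, key);
    } else if (root->left && root->right) {
        // Two children: find the largest key in the left subtree...
        Node *pred = root->left.get();
        while (pred->right) pred = pred->right.get();
        std::string predKey = pred->key;     // copy before the node is freed
        // ...remove that node (it has at most one child)...
        erase(root->left, predKey);
        // ...and put its key where the deleted key was.
        root->key = predKey;
    } else {
        // Zero or one child: the child (possibly empty) takes the node's place.
        std::unique_ptr<Node> child =
            std::move(root->left ? root->left : root->right);
        root = std::move(child);
    }
}

// Preorder listing: node, then left subtree, then right subtree.
void preorder(const std::unique_ptr<Node> &n, std::vector<std::string> &out) {
    if (!n) return;
    out.push_back(n->key);
    preorder(n->left, out);
    preorder(n->right, out);
}
```

Building just susan's subtree (inserting susan, peter, zachary, sarah, yifan, rebecca, xavier, ralph in that order) and then erasing "susan" leaves sarah at the top, with rebecca as peter's right child and ralph still below rebecca.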

(10 points) Show the result of doing a single left rotation about the node 500. Do not worry if the rotation increases the height of the tree. All I care about is whether you know how to perform a rotation.
```
                            -300-
                           /     \
                         175     500
                        /       /   \
                      100     400   600
                      /      /   \
                     50    350   450
```
The left rotation will cause 300 to become a left child of 500. In so doing the left subtree of 500, which is rooted at 400, will become an orphan because 300 is taking its place as 500's left child. Therefore 300 adopts 400 as its right child, while retaining the tree rooted at 175 as its left child. Note that it is ok for 300 to adopt 400 as its right subtree because all of the values in 400's tree are greater than 300. The final tree becomes:

                   --500--
                  /       \
               -300-      600
              /     \
            175     400
            /      /   \
          100    350   450
          /
         50

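
The pointer changes in a left rotation can be sketched as below. This is an illustration only: `Node`, `insert`, `rotateLeft`, and `preorder` are hypothetical names, and `rotateLeft` takes the pointer that holds the pivot's parent (300 here), so the rotation is "about" that node's right child (500).

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node type for illustration.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Standard BST insertion; duplicates are ignored.
void insert(std::unique_ptr<Node> &root, int key) {
    if (!root) root = std::make_unique<Node>(key);
    else if (key < root->key) insert(root->left, key);
    else if (key > root->key) insert(root->right, key);
}

// Left rotation about top->right. `top` is the pointer that holds the
// pivot's parent; afterwards it holds the pivot. Assumes top->right exists.
void rotateLeft(std::unique_ptr<Node> &top) {
    std::unique_ptr<Node> pivot = std::move(top->right);
    top->right = std::move(pivot->left);   // old parent adopts the orphaned subtree
    pivot->left = std::move(top);          // old parent becomes the pivot's left child
    top = std::move(pivot);                // pivot takes the old parent's place
}

// Preorder listing: node, then left subtree, then right subtree.
void preorder(const std::unique_ptr<Node> &n, std::vector<int> &out) {
    if (!n) return;
    out.push_back(n->key);
    preorder(n->left, out);
    preorder(n->right, out);
}
```

Inserting 300, 175, 500, 100, 400, 600, 50, 350, 450 in that order reproduces the tree in the question, and `rotateLeft(root)` then yields the tree with 500 at the root.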
(12 points) If we delete 60 from the following AVL tree
```
                     --100---
                    /        \
                   40        200
                     \      /   \
                     60   150   400
                         /  \
                       125  175
  
```
we end up with the following binary search tree that is not a valid AVL tree and hence needs to be re-balanced:
```
                     --100---
                    /        \
                   40        200
                            /   \
                          150   400
                         /  \
                       125  175
  
```
- Identify the bottom-most node that violates the AVL condition and explain why that node violates the AVL condition.
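  In the tree below I have labeled each node with its height. Working upward from the leaves, 100 is the first node whose subtree heights differ by more than 1: its left subtree (rooted at 40) has height 0 and its right subtree (rooted at 200) has height 2. It is therefore the bottom-most node that violates the AVL condition.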
```
                     --100³---
                    /        \
                   40⁰        200²
                            /   \
                          150¹   400⁰
                         /  \
                       125⁰  175⁰
  
```
- In order to rebalance the tree do we have to use the zig-zig case or the zig-zag case? Justify your answer.
  We must use the zig-zag case. If we follow the subtrees with the greatest heights from 100, we see that we first go right to 200, and then go left to 150 because the height of 150 is greater than the height of its sibling, 400. This right-left traversal is a zig-zag.
- Use the proper rotation(s) to rebalance the above tree so that it becomes a legitimate AVL tree.
  We need to use a double rotation since we have a zig-zag. This requires two rotations about 100's grandchild, which is 150. We will first do a right rotation about 150, which lifts it above 200, and then do a left rotation about 150, which lifts it above 100.
  Here is the original tree:
```
                     --100---
                    /        \
                   40        200
                            /   \
                          150   400
                         /  \
                       125  175
  
```
  and here is the result of doing the two rotations:
       --100---                                   --150---
      /        \                                 /        \
     40        150           left --->         100        200
              /   \          rotation         /   \      /   \
            125   200                        40   125  175   400
                 /   \
               175   400
  (after the right rotation)                (after the left rotation)
  The right rotation makes 200 be a right child of 150. 175 becomes orphaned in this process and since 200's left child is now freed up, 200 adopts 175 as its left child. The resulting tree is shown as the leftmost tree in the above diagram.
  The left rotation makes 100 the left child of 150. 125 becomes orphaned in this process and since 100's right child is now freed up, 100 adopts 125 as its right child. The resulting, rebalanced tree is shown as the rightmost tree in the above diagram. It is also now a valid AVL tree.
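
The height check and the two rotations can be sketched in code. This is a minimal illustration rather than a full AVL implementation: `Node`, `insert`, `height`, `isAvl`, `rotateLeft`, `rotateRight`, and `preorder` are hypothetical names, `insert` does no rebalancing, and heights follow the convention used above (a leaf has height 0, so I assume an empty subtree has height -1).

```cpp
#include <algorithm>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node type for illustration.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain BST insertion with no rebalancing; duplicates are ignored.
void insert(std::unique_ptr<Node> &root, int key) {
    if (!root) root = std::make_unique<Node>(key);
    else if (key < root->key) insert(root->left, key);
    else if (key > root->key) insert(root->right, key);
}

// Height of a subtree: a leaf is 0, an empty subtree is -1.
int height(const std::unique_ptr<Node> &n) {
    if (!n) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// True if every node's subtree heights differ by at most 1.
bool isAvl(const std::unique_ptr<Node> &n) {
    if (!n) return true;
    int diff = height(n->left) - height(n->right);
    return diff >= -1 && diff <= 1 && isAvl(n->left) && isAvl(n->right);
}

// Left rotation about top->right: the right child moves up.
void rotateLeft(std::unique_ptr<Node> &top) {
    std::unique_ptr<Node> pivot = std::move(top->right);
    top->right = std::move(pivot->left);
    pivot->left = std::move(top);
    top = std::move(pivot);
}

// Right rotation about top->left: the left child moves up.
void rotateRight(std::unique_ptr<Node> &top) {
    std::unique_ptr<Node> pivot = std::move(top->left);
    top->left = std::move(pivot->right);
    pivot->right = std::move(top);
    top = std::move(pivot);
}

// Preorder listing: node, then left subtree, then right subtree.
void preorder(const std::unique_ptr<Node> &n, std::vector<int> &out) {
    if (!n) return;
    out.push_back(n->key);
    preorder(n->left, out);
    preorder(n->right, out);
}
```

With the unbalanced tree built by inserting 100, 40, 200, 150, 400, 125, 175, the zig-zag fix is `rotateRight(root->right)` (150 rises above 200) followed by `rotateLeft(root)` (150 rises above 100), after which `isAvl(root)` holds.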
(10 points) Behold the following recursive function that computes and returns the sum of an array of n numbers:
```
int sum(int numbers[], int start, int end) {
1)   int middle = (start+end)/2;
2)   return sum(numbers, start, middle) + sum(numbers, middle + 1, end);
}
  
```
Answer the following questions about this function:
1. Why is this function incorrect? You must answer in 3 sentences or less.
  The function is incorrect because it is missing a base case and hence will recurse infinitely.
2. Write the C++ code fragment that you must add to the above function to make it compute the sum correctly and indicate the line that it should be placed before or after (e.g., you might write after line x where x is the line number). You may not modify any code from the above function.
  The base case occurs when there is only one entry in the array to be added. In this case we simply return the value associated with that entry. We know there is only one entry in the array to be added when the start and end indices are the same. Hence the code fragment for the base case is:
```
if (start == end)
  return numbers[start];
	  
```
  This base case must be added before the recursive case, which means that it should be added after line 1 and before line 2.
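
For reference, here is the function with the base case in place. This sketch assumes the caller passes a non-empty range (start <= end); an empty range would still recurse forever.

```cpp
// Sum of numbers[start..end], inclusive. Assumes start <= end.
int sum(int numbers[], int start, int end) {
    int middle = (start + end) / 2;
    // Base case: a range holding a single entry is its own sum.
    if (start == end)
        return numbers[start];
    // Recursive case: both halves are non-empty and strictly smaller.
    return sum(numbers, start, middle) + sum(numbers, middle + 1, end);
}
```

For an array of n numbers the initial call would be `sum(numbers, 0, n - 1)`.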

(15 points) The greatest common divisor (gcd) of two non-zero numbers is the largest positive integer that divides the numbers with a remainder of 0. For example, the gcd of 48 and 20 is 4, the gcd of 48 and 12 is 12, and the gcd of 12 and 12 is 12.

Euclid's algorithm is a recursive algorithm for finding the gcd of two positive integers a and b. Its subtraction-based form can be written as the following C++ function:

int gcd(int a, int b) {
1)    if (a == b) return a;
2)    else if (a < b) return gcd(a, b-a);
3)    else return gcd(a-b, b);
}

Suppose I have the following main function:

int main() {
1)    int x = 48;
2)    int y = 20;
3)    cout << gcd(x, y) << endl;
}

In the stack diagram shown below, complete the stack frames that exist when the series of recursive calls finally arrives at the base case for gcd.

I have started the stack for you by showing the frame for main.
Each stack frame should show 1) the name of the executing function, 2) the line number that is executing when it makes its function call, and 3) the values of the local variables/parameters.
The final stack frame will be the stack frame for the base case. It will not call a function so for this final stack frame show the values of the variables and the return value. You should not show the return values for any other frame.
The stack is going downwards, so successive frames should appear under each other, with the frame for the base case being the bottommost stack frame you fill in.


	  |----------------------------------------|
	  | main:                                  |
	  |    line: 3                             |
	  |    x: 48                               |
	  |    y: 20                               |
	  |----------------------------------------|
	  | gcd:                                   |
	  |    line: 3                             |
	  |    a: 48                               |
	  |    b: 20                               |
	  |----------------------------------------|
	  | gcd:                                   |
	  |    line: 3                             |
	  |    a: 28                               |
	  |    b: 20                               |
	  |----------------------------------------|
	  | gcd:                                   |
	  |    line: 2                             |
	  |    a: 8                                |
	  |    b: 20                               |
	  |----------------------------------------|
	  | gcd:                                   |
	  |    line: 2                             |
	  |    a: 8                                |
	  |    b: 12                               |
	  |----------------------------------------|
	  | gcd:                                   |
	  |    line: 3                             |
	  |    a: 8                                |
	  |    b: 4                                |
	  |----------------------------------------|
	  | gcd:                                   |
	  |    line: 1                             |
	  |    a: 4                                |
	  |    b: 4                                |
	  |    return value: 4                     |
	  |----------------------------------------|
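
The frames above can be checked mechanically with an instrumented version of gcd. This is only a sketch: `Frame` and `gcdTraced` are hypothetical names, and each recorded entry holds the parameters of one call plus the line of gcd that the call executes.

```cpp
#include <vector>

// One stack frame: the parameters and the line of gcd that executes.
struct Frame {
    int a, b, line;
};

// Same logic as gcd, but each call records its frame before
// returning or recursing. Assumes a and b are positive.
int gcdTraced(int a, int b, std::vector<Frame> &frames) {
    if (a == b) {
        frames.push_back({a, b, 1});   // base case, line 1
        return a;
    } else if (a < b) {
        frames.push_back({a, b, 2});   // recursive call on line 2
        return gcdTraced(a, b - a, frames);
    } else {
        frames.push_back({a, b, 3});   // recursive call on line 3
        return gcdTraced(a - b, b, frames);
    }
}
```

Calling `gcdTraced(48, 20, frames)` records six gcd frames, in the same top-to-bottom order as the diagram.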

(12 points) Behold the following 4 fragments of code:

(a)

int i, j;
int sum = 0;
for (i = 0; i < n*n; i++) 
    sum += i;
for (j = 0; j < n/2; j++)
    sum *= j;

(b)

for (year = 0; year < 2000; year++) {
    for (day = 0; day < 365; day++) {			
        if (n == year % day) {
            printf("year = %d and day = %d\n", year, day);
        }
    }
}

(c)
int mystery(vector<int> &row, vector<int> &col) {
  int i;
  int result = 0;

  for (i = 1; i < row.size(); i *= 2) {
    result += row[i] * col[i];
  }
  return result;
}

(d)

In the following code, assume that the function f is O(n²)
int i, j;
int sum_f = 0, sum_loops = 0;
for (i = 0; i < n; i++) {
    sum_f += f(i);
}
for (i = 0; i < n; i++) {
  for (j = i; j < n; j++) {
     sum_loops += i * j;
  }
}
if (sum_f > sum_loops) 
    cout << sum_f << endl;		  
else
    cout << sum_loops << endl;

For each fragment of code, please circle its Big-O running time:

(a) O(n²): The first loop executes n² times and the loop body has a constant number of instructions, so the running time of the first loop is proportional to n². The second loop executes n/2 times and the loop body has a constant number of instructions, so the running time of the second loop is proportional to n. The loops run sequentially, so the running time of the code is T(n) = n² + n, which is O(n²).

(b) O(1): The running time is independent of n. Even though there are two nested loops, each loop is executed a constant number of times, regardless of the size of n. The loop bodies also execute a constant number of instructions and hence the overall running time of the code is constant, or O(1).

(c) O(log n): Here n is the size of the row vector (i.e., the number of elements in the row vector). Since i is doubled on each iteration, i reaches the end of the row vector after about log₂ n iterations of the loop. The loop body executes a constant number of instructions, so the running time of the code fragment is O(log n).
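
The iteration count of the doubling loop can be checked directly. This is a small sketch with a hypothetical helper name, `doublingIterations`, that runs the same loop header as `mystery` and counts how many times the body would execute.

```cpp
#include <cstddef>

// Count the iterations of: for (i = 1; i < size; i *= 2)
int doublingIterations(std::size_t size) {
    int count = 0;
    for (std::size_t i = 1; i < size; i *= 2) {
        ++count;   // stands in for the constant-time loop body
    }
    return count;
}
```

For a vector of 1024 elements the loop visits i = 1, 2, 4, ..., 512, which is 10 iterations, i.e. log₂ 1024.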

(d) O(n³): The first loop executes n times and the loop body requires O(n²) time to execute f(i), so the running time of the first loop is (n * n²) or n³. The second part of the code fragment is two nested loops. The inner loop executes n times when i is 0, (n-1) times when i is 1, (n-2) times when i is 2, and so on as follows:

    i      number of times inner loop executes
    0      n
    1      n-1
    2      n-2
    ...    ...
    n-2    2
    n-1    1

The loop body of the inner loop runs in constant time and so the running time of the inner loop is the number of iterations it performs. The running time of the outer loop is the sum of the running times of the inner loop. The sum of the running times of the inner loop is "n + (n-1) + (n-2) + ... + 2 + 1" which as shown in class is n(n+1)/2. Thus the running time of the nested loops is O(n²).
Finally the running time of the conditional at the end of the code fragment is constant time. Thus the running time of the code fragment is n³ + n² + 1 which is O(n³).
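
The n(n+1)/2 count for the nested loops can be confirmed by counting iterations. This is a sketch with a hypothetical helper name, `nestedIterations`, that uses the same loop headers as the nested loops in fragment (d).

```cpp
// Count how many times the inner loop body of fragment (d) executes.
long long nestedIterations(int n) {
    long long count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            ++count;   // stands in for the constant-time loop body
        }
    }
    return count;
}
```

For n = 4 the inner loop runs 4 + 3 + 2 + 1 = 10 times, which matches 4 * 5 / 2.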

(12 points)
```
 
     a. array
     b. vector
     c. stack
     d. deque
     e. hash table
     f. list
     g. binary search tree
     h. AVL tree
```
For each of the following questions choose the best answer from the above list. Assume that the size of an array is fixed once it is created, and that its size cannot be changed thereafter. Sometimes it may seem as though two or more choices would be equally good. In those cases think about the operations that the data structures support and choose the data structure whose operations are best suited for the problem. You may have to use the same answer for more than one question:
1. The data structure you should use if you want to implement a map that records the results of a 10K race and the keys are the last names of the runners. The keys must be kept in alphabetical order and the runners' names will be entered in the order that the runners finish the race (hence if "Joe" finishes before "Barry" then "Joe" will be inserted first and then "Barry").
  (g) binary search tree: The keys must be kept in sorted order so we will be using a tree. Finishing order is unrelated to alphabetical order, so the names of the runners will be inserted in effectively random order and a binary search tree should work fine.
2. The data structure you should use if you want to implement a map that records the results of a 10K race and the keys are the race times of the runners. The keys must be kept in sorted order and the race times will be entered in the order that the runners finish the race (hence a runner with the time 15:21 will be inserted before a runner with the time 15:36).
  (h) AVL tree: The keys must be kept in sorted order so we will be using a tree. The times will be inserted in sorted order, which is the worst case for a binary search tree. Hence we must use a balanced AVL tree.
3. The data structure you should use if you want to reverse a file by reading its lines, adding each line to the front of the data structure, and then traversing the data structure from front to back and printing the lines.
  (d) deque: When you insert and remove from the front of a queue, a deque is the fastest data structure available, faster than either a list or a vector (you should never use a vector for inserting/deleting at the front of a queue because each insert/delete takes O(n) time). Remember that a deque is optimized for insertions/deletions at the front or end of a queue. It is also faster to iterate through a deque than it is through a list.
4. The data structure you should use to store a collection of emails where the emails are ordered by the time that they arrived and they are inserted at the time that they arrived. You want to be able to print the emails in sorted order by time of arrival, insert/delete emails, and find emails.
  (h) AVL tree: You need to insert/delete/find emails and they need to be kept in sorted order by time so you will need to use a tree. Further, the emails arrive in sorted order by time, which is the worst case for a binary search tree. Hence we need to use a balanced AVL tree.
5. The data structure that is used to implement hashing with linear probing when the size of the data is known in advance (the answer is not a hash table--I want to know the data structure used to implement the hash table).
  (a) array: Hash tables are implemented using "tables" and tables can be implemented using either arrays or vectors. Since we know the size of the data in advance, we can allocate a fixed-size array that is at least 2 times the size of the data and know that the table will never become more than half full (i.e., the load factor never exceeds 0.5).
6. The best data structure to use to implement the buckets in separate chaining. Each bucket holds the key/value pairs that hash to that bucket and new key/value pairs are typically added to the end of the bucket.
  (b) vector: Buckets are kept as a list and a vector is the most efficient data structure to use for implementing a list when you only add values at the end of the list.
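
To illustrate the answer to question 5, here is a minimal sketch of linear probing on a fixed-size array. Everything in it is an assumption of mine rather than course code: the class name `ProbingTable`, storing bare int keys with no values, and hashing a key by taking it modulo the capacity.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity table of int keys using linear probing.
// Capacity is chosen up front, e.g. at least twice the number of keys.
template <std::size_t Capacity>
class ProbingTable {
    static_assert(Capacity > 0, "table needs at least one slot");

public:
    // Returns true if the key was added, false if it was already
    // present or the table is full.
    bool insert(int key) {
        if (count_ == Capacity) return false;
        std::size_t i = home(key);
        while (used_[i]) {                     // probe until an empty slot
            if (keys_[i] == key) return false;
            i = (i + 1) % Capacity;            // wrap around at the end
        }
        used_[i] = true;
        keys_[i] = key;
        ++count_;
        return true;
    }

    // Probe from the home slot until the key or an empty slot is found.
    bool contains(int key) const {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < Capacity && used_[i]; ++probes) {
            if (keys_[i] == key) return true;
            i = (i + 1) % Capacity;
        }
        return false;
    }

    double loadFactor() const {
        return static_cast<double>(count_) / Capacity;
    }

private:
    // Assumed hash function: the key modulo the table size.
    static std::size_t home(int key) {
        return static_cast<std::size_t>(key) % Capacity;
    }

    std::array<int, Capacity> keys_{};
    std::array<bool, Capacity> used_{};
    std::size_t count_ = 0;
};
```

With a `ProbingTable<8>`, the keys 3, 11, and 19 all hash to slot 3, so they land in slots 3, 4, and 5. Keeping the array at least twice the size of the data keeps these probe runs short.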

Coding Questions

Reverse String

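// Swap the outermost characters, then recurse on the substring between them.
void reverseStringHelper(string &s, int start, int end) {
   if (start >= end) return;   // zero or one character left: nothing to swap
   char saveChar = s[start];
   s[start] = s[end];
   s[end] = saveChar;
   reverseStringHelper(s, start+1, end-1);
}

void reverseString(string &s) {
   // Cast before subtracting so that an empty string gives end == -1
   // instead of relying on unsigned wraparound.
   reverseStringHelper(s, 0, static_cast<int>(s.length())-1);
}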

Leaf Counting

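// Returns the number of leaves in the subtree rooted at n.
int BSTree::recursive_leafcount(BSTNode *n) {
  if (n == sentinel)   // empty subtree: no leaves
    return 0;
  else {
    int leftSize = recursive_leafcount(n->left);
    int rightSize = recursive_leafcount(n->right);
    int numLeaves = leftSize + rightSize;
    if (numLeaves == 0)   // both subtrees are empty, so n itself is a leaf
      return 1;
    else
      return numLeaves;
  }
}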