SRM 597, D2, 250-Pointer (LittleElephantAndDouble)

James S. Plank

Sun Aug 28 14:14:17 EDT 2016

Problem Statement.
A main() with the examples compiled in.
A skeleton that compiles with main.cpp.
Problem Given in Topcoder: November, 2013
Competitors who opened the problem: 1071
Competitors who submitted a solution: 1007
Number of correct solutions: 611
Accuracy (percentage correct vs those who opened): 57.0%
Average Correct Time: 13 minutes, 53 seconds.

In case Topcoder's servers are down

You are given a vector of integers with 1 to 50 elements.
Each integer is between 1 and 1,000,000,000.
Your goal is to double each integer some number of times until all of the integers equal each other.
Return "YES" if that's possible, and "NO" if it's not.

Examples:

Example 0: {1, 2} - "YES"
Example 1: {1, 2, 3} - "NO"
Example 2: {4, 8, 2, 1, 16} - "YES"
Example 3: {94, 752, 94, 376, 1504} - "YES"
Example 4: {148, 298, 1184} - "NO"
Example 5: {7, 7, 7, 7} - "YES"

Testing yourself: The main.cpp file will generate random input with a vector size of n if you call it with a command line argument of n that is > 5. As always, there is a testing script in tests.sh, and the output should match answers.txt. It should take under 5 seconds to run tests.sh

Three Solutions

I'm going to give you three solutions along with their running times. All three will pass Topcoder's time limit, largely because the constraints are so small. The second solution is the fastest in some circumstances, and the third one is fastest in others, even though the Big-O complexity of the second is the best.

Solution #1 -- Check each pair

(In Solution-1.cpp.)

One way to do this is to check each pair of elements and make sure that you can create the larger one by doubling the smaller one successively. I did that with a simple procedure:

bool check(int x, int y)
{
  if (x > y) return check(y, x);
  while (x < y) x *= 2;
  return (x == y);
}

Ask yourself two questions about this procedure:

Will it be slow? The answer is no, because you can double at most 30 times before you hit 10⁹ (which roughly equals 2³⁰). So this loop will execute for a maximum of 30 iterations.
Should I be worried about integer overflow? As it turns out, no. The maximum integer is (2³¹-1), which is roughly 2,000,000,000. A little larger, actually. The maximum value that you'll get to double is 999,999,999, and that doubled is less than 2,000,000,000. You're good.

This solution is bad because the loop that tests each pair does way too much work. When the answer is "YES, it will check O(n²) pairs, which makes the running time O(n²).

Solution #2 -- Remove the 2's and check each answer against A[0]

(In Solution-2.cpp.)

Rather then doubling elements to see if they equal each other, you can simply divide out the two's -- while a number is even, divide it by two -- then all of the elements of A should equal each other. The second solution does this, and then checks each element against A[0]].

The running time complexity of this is O(n), because you are doing a loop that executes at most 30 times (which is O(1)) for each element of the vector.

Solution #3 -- Sort and check adjacent elements

(In Solution-3.cpp.)

This solution starts by sorting the vector. Now each element is less than or equal to the next element, so you can double it until it is greater than or equal to the next element. If it is greater, then you return "NO". Otherwise, you move onto the next element.

The running time complexity of this is O(n log n) for the sort, and then O(n) for the testing. Since O(n log n) is bigger than O(n), the complexity is O(n log n)

Interestingly, when you time solutions 2 and 3, there are times when each outperforms the other:

UNIX> cp Solution-2.cpp LittleElephantAndDouble.cpp       # This is how you compile and run Solution 2
UNIX> g++ main.cpp
UNIX> time ./a.out 2000000                                # 0.93 seconds for 2,000,000
NO
0.093u 0.005s 0:00.10 90.0%	0+0k 0+0io 0pf+0w
UNIX> time ./a.out 3000000                                # 2.47 seconds for 3,000,000
YES
0.247u 0.009s 0:00.26 92.3%	0+0k 0+0io 0pf+0w
UNIX> cp Solution-3.cpp LittleElephantAndDouble.cpp 
UNIX> g++ main.cpp
UNIX> time ./a.out 2000000                                # Slower for 2,000,000
NO
0.115u 0.005s 0:00.12 91.6%	0+0k 0+0io 0pf+0w
UNIX> time ./a.out 3000000                                # Faster for 3,000,000
YES
0.172u 0.008s 0:00.18 94.4%	0+0k 0+0io 0pf+0w
UNIX>

Can you think of why this is so? Here are my answers:

When is Solution 2 faster? This is when the answer is "NO" and you can exit the loop early. In this case, the solution is O(index of the first mismatch) rather than O(n). If the mismatch occurs at the second element, then it exits pretty much instantly. You'll note that Solution 3 starts by sorting the vector, so it always does the O(n log n) work.

When is Solution 3 faster? When the answer is "YES". This goes in the face of the big-O analysis, doesn't it? Let's think about the work performed by both solutions:

Solution 2 has to divide out the twos for every element. The way the input is generated, this is an is an average of 12 times per element.
Solution 3 has to sort the vector, and that is O(n log n), but it is implemented efficiently. Then, think about the data when the input is large and the answer is "YES" -- there are at most 30 different values that can be in the vector. Since the vector is sorted, most adjacent values are equal to each other -- there will be very little doubling. So the loop that traverses the vector is probably 12 times faster than the loop for Solution 2. That makes up for the extra time in sorting. Were we to keep scaling n, I would anticipate that solution 2 would be faster in all cases.