Leetcode.com problem 164: "Maximum Gap"

James S. Plank

Sat Oct 29 15:13:00 EDT 2022

In case Leetcode's servers are down

Here's your class prototype:

class Solution {
public:
    int maximumGap(vector<int>& nums);
};


First - some scaffolding

The challenge with this problem isn't coding a correct solution. It's coding a solution that is correct and fast. There's also a little challenge in checking the speed of your solution. If you simply put 100,000 numbers into a file, then reading the file is going to be much slower than solving the problem. So, when I solved this, I did two initial subtasks: I put that code into Skeleton.cpp. You call the program as follows:

usage: a.out check-solution(y|n|p) [seed nums.size() max(nums[i])+1]

Here are some examples:
UNIX> echo 3 6 9 1 | a.out p           # Their first example, printing out the vector
3 6 9 1 
Slow: 3.  Yours: -1. Incorrect
UNIX> echo 10 | a.out y                # Their second example, not printing out the vector
Slow: 0.  Yours: -1. Incorrect
UNIX> a.out p 1 10 100                 # 10 random elements between 0 and 100
4 45 83 33 56 0 18 99 75 36 
Slow: 19.  Yours: -1. Incorrect
UNIX> time a.out y 1 100000 1000000000 # 100,000 random elements between 0 and 1000,000,000
Slow: 114561.  Yours: -1. Incorrect

real	0m0.023s
user	0m0.019s
sys	0m0.002s
UNIX> time a.out n 1 100000 1000000000 # Same as the last one, but not precalculating the answer.
-1

real	0m0.013s
user	0m0.010s
sys	0m0.002s
UNIX> 
You'll note that sorting that vector was pretty fast. I'm not sure how leetcode tests your solution -- I'm sure they disable sort(). Maybe they test for recursion, or they figure you can't write a sort() that's as fast as the STL's quicksort.


Does Bucket Sort Really Work?

Since we're going for a linear time solution, my mind went instantly to bucket sort. I thought about something like this:

Use a vector of vectors, called buckets. Have its size be nums.size(). Calculate the max and min of nums, and then use that for your bucket sort. Your loop will do the following:

index = (nums[i] - min) * nums.size() / (max +1 - min);
buckets[index].push_back(nums[i]);

Now, your answer will either be the maximum of:

Let's do an example to help illustrate. We'll use the 10 random numbers above:

4 45 83 33 56 0 18 99 75 36 

Min is 0, max is 99, and nums.size() is 10. So, we'll have 10 buckets, which will end up being populated as follows:

    0: 4 0    
    1: 18    
    2:     
    3: 33 36     
    4: 45     
    5: 56     
    6:     
    7: 75    
    8: 83     
    9: 99    

Now, we can run through this table:

- The max difference in bucket 0 is 4.
- The max difference in bucket 3 is 3.
- The other buckets only have one element, so we won't worry about them.
- The difference between the max of bucket 0 and the min of bucket 1 is 14
- The difference between the max of bucket 1 and the min of bucket 3 is 15
- The difference between the max of bucket 3 and the min of bucket 4 is 9
- The difference between the max of bucket 4 and the min of bucket 5 is 11
- The difference between the max of bucket 5 and the min of bucket 7 is 19
- The difference between the max of bucket 7 and the min of bucket 8 is 8
- The difference between the max of bucket 8 and the min of bucket 9 is 15
And the answer is 19.

Why does this solution make me uneasy?

Well, a bucket could have as many as nums.size()-1 elements, which means that finding the maximum difference within a bucket is solving the same problem as our overall problem.

Can we cure my uneasiness?

So, the next question is -- "do you need to look for the maximum difference within a bucket?" If the answer to that is "no", then you don't even need to store vectors in the buckets -- you only need to store the min and max values within a bucket, and our solution will indeed be linear time.

To answer the question, let's consider two cases:

Case 1: (max+1-min) is less than nums.size(): In this case, we have more buckets than values. Therefore, each value will go into its own bucket. The minimum of each bucket will equal the maximum of each bucket, and therefore the maximum difference within each bucket is zero. The answer has to come from differences betwen buckets.

Nice

Case 2: (max+1-min) is greater than or equal to than nums.size(): First, we are guaranteed that the first and last buckets are non-empty. We have as many elements as buckets, so either each bucket has exactly one element, or there will be at least one empty bucket in the middle of buckets. You'll note, that is the case in the example above -- buckets two and six are empty.

So, if there are no empty buckets, then each bucket has one element, and we don't care about the max difference within a bucket.

If there are empty buckets, then the difference between the non-empty buckets that surround it must be greater than any inter-bucket difference. Let me use the example above to make that concrete. Since bucket 2 is empty, the minimum difference between buckets 1 and 3 is 11. Why? Because the maximum possible element in bucket 1 is 19, and the minimum possible element in bucket 3 is 30. That's bigger than any two elements within the same bucket. So we don't have to care about the max difference within a bucket!!


This gives us a nice linear time solution

Instead of storing vectors within buckets, just store the min and max values, and whether or not the bucket is empty. Here's our example above:

Bucket  empty   min  max     
   0      n       4    0   
   1      n      18   18   
   2      y       -    -   
   3      n      33   36   
   4      n      45   45   
   5      n      56   56   
   6      y       -    -   
   7      n      75   75   
   8      n      83   83   
   9      n      99   99   

We see that the answer is 19 -- the difference between 75 and 56.


Performance

It works, but it's not really faster than sorting. Whatever, it passes Leetcode's tests, and that's what matters!
UNIX> echo 3 6 9 1 | a.out p
3 6 9 1 
Slow: 3.  Yours: 3. Correct
UNIX> echo 10 | a.out y
Slow: 0.  Yours: 0. Correct
UNIX> a.out p 1 10 100
4 45 83 33 56 0 18 99 75 36 
Slow: 19.  Yours: 19. Correct
UNIX> time a.out y 1 100000 1000000000
Slow: 114561.  Yours: 114561. Correct

real	0m0.031s
user	0m0.028s
sys	0m0.002s
UNIX> time a.out n 1 100000 1000000000
114561

real	0m0.023s
user	0m0.020s
sys	0m0.002s
UNIX>