Leetcode.com problem 540: "Single Element in a Sorted Array"

James S. Plank

Tue Feb 21 00:32:32 EST 2023

Problem Statement.
Skeleton.cpp: A file that compiles and runs examples on stdin, but doesn't solve the problem.
tests.sh: A script for testing that works with the input files in this directory.
answers.txt: The correct output for tests.sh.
files.tgz: A gzipped tar file of all of the files here -- so you can get all of the input files if you want.
This is a medium problem.
Success Rate (as of February, 2023): 58.8%

In Case Their Servers Are Down

You are given the following class definition:

class Solution {
public:
  int singleNonDuplicate(vector <int> &nums);
};

Here are the details of the problem:

nums will be an array of integers between 0 and 100,000.
nums.size() will be between 1 and 100,000.
nums will be sorted.
Every number in nums appears exactly twice in nums, except for exactly one number which appears once.
Return the number that appears once.
Your algorithm must run in O(log n) time and O(1) space.

Examples

UNIX> echo 1 1 2 3 3 4 4 8 8 | ./a.out     # This is their example 1
2
UNIX> echo 3 3 7 7 10 11 11 | ./a.out      # This is their example 2
10
UNIX> echo 1 | ./a.out
1
UNIX> echo 1 2 2 | ./a.out
1
UNIX> echo 1 1 2 | ./a.out
2
UNIX>

There are 100 example input problems in this directory (which you can get from our EECS machines if you are at the University of Tennessee. Otherwise, you can get them from the files.tgz file linked above). They range in size from very small to hitting the constraints. For example:

UNIX> cat input-06.txt
64968
64968
73717
UNIX> ./a.out < input-06.txt
73717
UNIX> cat input-12.txt
40306
40306
45989
92737
92737
UNIX> a.out < input-12.txt
45989
UNIX> wc input-50.txt
      15      15      88 input-50.txt
UNIX> ./a.out < input-50.txt
77725
UNIX> grep 77725 input-50.txt    # This confirms that 77725 appears exactly once in the file.
77725
UNIX> wc input-90.txt
   19999   19999  117756 input-90.txt
UNIX> ./a.out < input-90.txt
18114
UNIX> grep 18114 input-90.txt    # This confirms that 18114 appears exactly once in the file.
18114
UNIX>

In case your O(n) solution is faster

A bunch of leetcoders posted in the discussion that their O(n) solution was faster than O(log n) solutions. My favorite said, "An O(N/2) solution beats 80% of the solutions. What's even the point of putting that O(logN) time constraint?"

It won't be faster if you measure it properly. The problem with the leetcode test (and with my inputs for that matter) is that generating the problem is O(n), so it may be hard to measure the runtime of the solution. If you properly measure the solution time rather than the problem generation time, you can easily show that O(log n) outperforms O(n). It's a pity the leetcode tests didn't do this.

That comment is one of those comments that won't get you hired for two reasons. First, it shows that you don't understand the value of an O(log n) solution as compared to an O(n) solution. I would pass on hiring you for that. Second, writing "O(N/2)" shows that you don't understand big-O, and I would definitely pass on hiring you for that.

If you agree with the comment above, you are missing the point, so please take some time to understand why O(log n) is vastly superior to O(n), even if the leetcode problem doesn't test it properly.

Hints

Since the requirement is O(log n) you should be thinking binary search. I'll be honest with you -- while binary searches are straightforward in concept, I think they always require care in implementation.

The binary search actually doesn't require the numbers to be sorted. It simply requires the pairs to be together. What you do is roughly the following

Look at the middle element and see if it's part of a pair. If not, you're done.
Otherwise, look at the number of elements before the pair.
If that number is an even number, then you can ignore the elements before the pair, because each of them belongs to a pair.
Otherwise, then you can ignore the elements after the pair, because the single element has to be before the pair.

I'll illustrate with an ASCII-art drawing:

      |--------------------------|------|-------------------------| 
nums: | elements-before-the-pair | pair | elements-after-the-pair |
      |--------------------------|------|-------------------------| 
        If this is odd, then the          If this is odd, then the
       single element is in here         single element is in here

Now, the concept there is straightforward. How to implement it? I maintained a while loop with four variables:

low -- This is the lowest index of the region that I'm looking at.
high -- This is the highest index of the region that I'm looking at.
mid -- This is the middle of low and high.
pli -- This is the first index of the pair that contains nums[mid].

For my ending condition, I explicitly checked for the answer when (high-low) is zero or two. And I checked to see if nums[mid] didn't belong to a pair (so I could return instantly).

Let's do some examples:

UNIX> cat input-06.txt 
64968
64968
73717            # Here's the unique value
UNIX> ./a.out < input-06.txt
low =     0.  high =     2.  high - low <= 2 - calculate the answer.
73717
UNIX> cat input-12.txt
40306
40306
45989            # Here's the unique value
92737
92737
UNIX> ./a.out < input-12.txt
low =     0.  high =     4.  mid =     2.  nums[mid] = 45989.  Not part of a pair, return.
45989
UNIX> cat input-13.txt
28868
28868
31958            # Here's the unique value
97310
97310
UNIX> cat input-14.txt
37299            # Here's the unique value
68537
68537
89939
89939
UNIX> ./a.out < input-14.txt
low =     0.  high =     4.  mid =     2.  nums[mid] = 68537.  pli:     1.
low =     0.  high =     0.  high - low <= 2 - calculate the answer.
37299
UNIX> cat input-50.txt
2268
2268
20560
20560
41719
41719
55466
55466
65085
65085
66067
66067
77725           # Here's the unique value
89233
89233
UNIX> ./a.out < input-50.txt
low =     0.  high =    14.  mid =     7.  nums[mid] = 55466.  pli:     6.
low =     8.  high =    14.  mid =    11.  nums[mid] = 66067.  pli:    10.
low =    12.  high =    14.  high - low <= 2 - calculate the answer.
77725
UNIX> cat input-47.txt
10649
10649
12866           # Here's the unique value
15839
15839
23952
23952
41625
41625
52714
52714
73624
73624
91269
91269
UNIX> ./a.out < input-47.txt
low =     0.  high =    14.  mid =     7.  nums[mid] = 41625.  pli:     7.
low =     0.  high =     6.  mid =     3.  nums[mid] = 15839.  pli:     3.
low =     0.  high =     2.  high - low <= 2 - calculate the answer.
12866
UNIX>