SRM 583, D2, 250-Pointer (SwappingDigits)

James S. Plank (with help from Allen McBride)

Sat Feb 15 10:45:08 EST 2014 (Revised Mon Sep 30 12:13:37 EDT 2019)

Problem Statement.
A main() with the examples compiled in.
A skeleton that compiles with main.cpp.
A shell script that you can use to test your answer.
Your output should match this file exactly.
Problem Given in Topcoder: 2013
Competitors who opened the problem: 1078
Competitors who submitted a solution: 892
Number of correct solutions: 474
Accuracy (percentage correct vs those who opened): 44.0%
Average Correct Time: 21 minutes, 30 seconds.

In case Topcoder's servers are down

Please use the workflow in the problem Cryptography.

Here is a summary of the problem:

You are given a string that represents a positive integer.
The integer can be huge. The topcoder constraints say up to 50 digits, but my testing program goes up to roughly 9,600 digits. Clearly, you can't store them in a long long.
The string only contains the characters from '0' to '9', and it does not start with a zero.
You are allowed to swap any two digits, although you don't have to if you don't want to.
Your goal is to produce the smallest possible integer after swapping those two digits (or not swapping, if that yields the smallest integer).
That integer may not start with '0'.
Return the string containing that integer.

The examples

Example	Input String	Answer
0	`"596"`	`"569"`
1	`"93561"`	`"13569"`
2	`"5491727514"`	`"1491727554"`
3	`"10234"`	`"10234"`
4	`"93218910471211292416"`	`"13218910471211292496"`

I am going to present three ways to solve this problem -- one is very easy to write, and will run easily within Topcoder's time limits (but not with tests.sh). The other two require more thought, but end up with programs that are better. Go ahead and program up the easy one and submit/test it. Then, try one of the other ways. It's good programming and thought practice. You should be able to analyze all of them to determine that the second and third ways are better than the first.

The Easy and Inefficient Way

If you represent two numbers as strings with the same number of characters, then comparing the strings is equivalent to comparing the numbers. So, the easiest thing to do is to try all combinations of i and j, swapping their digits, and returning the smallest number, while discarding numbers that begin with zero.

Enumerating all combinations of i and j takes roughly n² operations, so this is not a good way to solve the problem in general. However, the topcoder constraints limit the string to 50 characters, and 50*50 is a pretty small number, so it works easily.

In tests.sh, you can do the first 58 problems in under a second. The 59th, and last problem is roughly 9600 characters, and took my program roughly 9 seconds to do. You can test your solution on the first 58 problems with:

UNIX> head -n 58 tests.sh | sh

The Harder and Efficient Way

To solve this problem more efficiently, think about conditions when you swap digits. In particular:

Under what conditions would you swap the first digit?
If you don't swap the first digit, then under what conditions would you swap the second digit?
If you don't swap the second digit, how do you find the leftmost digit to swap?

If you're still stuck, I'll answer those questions:

If the first digit is greater than any other non-zero digit, then you are going to swap the first digit. Consider example 2: "5491727514". If you don't swap that initial 5, you'll end up with a number in the form "5xxxxxxxxx". If you swap the 5 and the 4, you'll end up with "4591727514". Clearly that number is smaller than any number that begins with 5. That means you have to swap the five.
What are you going to swap it with? Well, you want to maximize the effect of replacing the first digit, so you should swap it with the smallest digit that's not zero (because we are not allowed to start the number with a zero). What if there are multiple of these? Again, let's look at example 2: "5491727514". We want to swap the '5' with a '1' and there are two '1' digits. The two potential swaps yield "1495727514" and "1491727554". The second of these is smaller -- we want to swap with the rightmost '1'.
Of course, that was just proof-by-Cosmo. Can you prove to yourself that you want the rightmost, minimum non-zero digit? It is correct.
As a second example, look at example 4: "93218910471211292416". '9' is definitely bigger than some digits to its right, so you are going to swap it. There are a bunch of ones, so you will swap it with a one, because that is the smallest non-zero digit. Which '1'? The rightmost one. That's how you get the answer "13218910471211292496".
Now, if you don't swap the first digit, then you want to swap the second digit. We don't care about leading zeros now, so if the second digit is greater than any digit to its right, we will be swapping the second digit. With what do we swap it? With the rightmost, smallest digit.
If we don't swap the second digit, that's because it is less than or equal to all digits to the right. Now, you need to find the leftmost digit that is greater than any digit to its right. Swap that one with the rightmost smallest digit.

This gives us a strategy for solving the problem, but if we program it in the most natural way, the program still runs in n² operations. The most "natural" way is to work from the definition: start i at zero and have it go to the end of the string. For each value of i, you look at each value to the right of i and find the minimum, rightmost value (when i equals zero, you exclude zero). If this value is less than the digit in i, then you swap those digits and return. If the value is greater than or equal to the digit in i, then you increment i and try again.

If you think about it, when your input is a non-decreasing sequence of digits, this technique still uses n² operations. Why? Because at iteration i, you look at all of the digits greater than i, and there are n-i of those (where the number has n digits).

How can we fix this? By thinking clearly and organizing our code so that we are not doing unnecessary, nested for loops. Interestingly, I solved it in one way, and Allen McBride (who TA's the class many years ago) solved it another. I'll describe both. Allen's solution is better.

The Plank Solution

The Plank solution relies on the fact that there are only ten digits in our number. So, let's keep a vector V with ten elements. V[i] holds the rightmost index of digit 'i' in the string. If digit 'i' doesn't occur, then element i is -1. In example 1, where the string is "93561", the vector is

i	0	1	2	3	4	5	6	7	8	9
v[i]	-1	4	-1	1	-1	2	3	-1	-1	0

Now, you run through the string as before, but finding the minimum value to the right of digit i simply requires you to run through the vector, whose size is limited to ten elements. Instead of requiring n² operations in the worst case, it is linear!

When I ran tests.sh on this solution, the entire thing ran in 0.16 seconds.

The McBride Solution

In this solution, you work from the end of the string to the beginning of the string, and you maintain three variables:

min_digit is the minimum digit that you've seen so far. Start it at '0'+10 at the beginning of the loop (this is an impossible value, but since it is greater than every digit, it works nicely in the loop. This is called a "sentinel", by the way).
min_nonzero is the minimum non-zero that you've seen so far. Also start it at '0'+10 at the beginning of the loop.
lpos is the index of the leftmost digit that is greater than either min_digit or min_nonzero. If lpos = 0 it must be greater than min_nonzero. Otherwise, it must be greater than min_digit. If there is no value of lpos that works, it should be -1.

Let's give an example that is similar to example 3: "10423". You should be able to see that the answer will be "10243". Here are the variables as you run through the string from right to left:

i	min_digit	min_nonzero	lpos
Start	'0'+10	'0'+10	-1
4	'3'	'3'	-1
3	'2'	'2'	-1
2	'2'	'2'	2
1	'0'	'2'	2
0	'0'	'1'	2

In that last iteration, we don't set lpos equal to zero because the digit must be less then min_nonzero when lpos equals zero.

Now, when you're done, you are going to swap the digit at index lpos. Obviously, if lpos equals -1, you simply return the original string. If lpos equals zero, you want to find the rightmost minimum, non-zero digit, and swap with that. Otherwise, you want to find the rightmost, minimum digit that is to the right of lpos, and swap with that. In the example above, we start at index 2 (whose digit is '4'), and find the rightmost, minimum digit to the right of it -- that's the '2', which is at index 3. Swap the '2' and the '4', and you have your answer.

This solution is better than the Plank solution, because it doesn't require you to traverse the entire vector, if the problem structure is good, and it doesn't depend on a vector of digits like the Plank solution. Very nice, Allen!

When I programmed this up, it also ran tests.sh in 0.16 seconds.

When you program up one of these two solutions, test it with tests.sh.

Running Times

Let's make this general. We'll define the following quantities:

n - The number of digits in the string.
d - The number of potential digits. In this problem there are 10 potential digits (0 through 9), but one could make the problem more general by working in a different base.
m - If min_digit equals zero, then this is n minus the location of min_digit in the McBride solution. Otherwise, m equals n. This represents the number of steps to find min_digit, because if it equals 0, you can stop looking when you find it. Otherwise, you have to traverse the entire array.
z - If min_nonzero equals one, then this is n minus the location of min_nonzero in the McBride solution. Otherwise, z equals n. This represents the number of steps to find min_nonzero, and is similar to m above.
l - lpos in the McBride solution.

The easy an inefficient solution: This is O(n²), plain and simple.

The Plank solution: I traverse the entire vector to create my digit vector v, whose size is d. So my solution is O(n + d). Why the distinction? Well, suppose I'm working in base 100, but my string has two digits. Then, the d term dominates.

The McBride solution: Determining min_digit and min_nonzero is O(m+z) -- that's the number of elements that you look at to find min_digit or min_nonzero, and as always, the addition operator in the big-O equation stands for "either-or, depending on which is bigger". Finding lpos is O(l). So the running time of the overall solution is O(m+z+l). You'll note, this can be O(n) in the worst case, but if I were to generate the input randomly, and n is large, then it would be O(d). Why? Because on average we would find the rightmost 0 and 1 within the last d characters, and lpos would be a very small number.