Hints for SRM 603, D2, 250-pointer (MiddleCode)

James S. Plank

Wed Jan 29 18:28:46 EST 2014
Problem Statement.
The simplest solution is to execute the algorithm as written. You'll start with a string s and an empty string t. At each step, you find the character in s that goes into t, append it to t, and then delete it from s. You repeat this until s is empty.

For example, when s is "word", you have the following:

s        t        New s    New t
"word"   ""       "wrd"    "o"
"wrd"    "o"      "wd"     "or"
"wd"     "or"     "w"      "ord"
"w"      "ord"    ""       "ordw"

The only subtlety is how to delete the character from s. One way is to use the substr() method to create s from the substrings before the letter that you're deleting and after the letter that you're deleting. So, for example, when you delete the 'o' from "word", you do it by creating the substrings "w" and "rd", and concatenating them.
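
To make the substr() trick concrete, here is a minimal sketch of it as a stand-alone helper. The name removeAt is my own invention for illustration; it is not part of the problem or of the solution below.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (the name is made up for this sketch): returns a copy
// of s with the character at index j removed.
std::string removeAt(const std::string &s, std::size_t j) {
  assert(j < s.size());  // j must be the index of an existing character

  // substr(0, j) is everything before index j, and substr(j + 1) is
  // everything after it.  When j is the last index, j + 1 equals s.size(),
  // and substr() returns an empty string in that case.
  return s.substr(0, j) + s.substr(j + 1);
}
```

Calling removeAt("word", 1) builds "w" and "rd" and returns "wrd". Note that it creates a brand new string each time, which matters when we think about running time.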

Here's some C++ code that does the trick (i and j are ints).

  while (s.size() > 0) {
    i = s.size();
    j = i/2;

    /* If i is odd, then j is the index of the character that we're deleting, so
       we do nothing in that case.

       If i is even, then the two indices that we need to compare are j and j-1.
       We want to set j to be the index of the smaller of these two characters
       (on a tie, j stays put).  The statement below does this.

       When we're done with the statement, we want to put the character in s[j]
       onto the back of t, and then delete it from s. */

    if (i % 2 == 0 && s[j-1] < s[j]) j--;
      
    printf("S = %s.  T = %s.  i = %d.  j = %d.\n", s.c_str(), t.c_str(), i, j);

    t.push_back(s[j]);
    s = s.substr(0, j) + s.substr(j+1);
  }
  printf("T = %s\n", t.c_str());

Here's what it prints out on examples 0 and 3:

Example 0:
S = word.  T = .  i = 4.  j = 1.
S = wrd.  T = o.  i = 3.  j = 1.
S = wd.  T = or.  i = 2.  j = 1.
S = w.  T = ord.  i = 1.  j = 0.
T = ordw
Example 3:
S = shjegr.  T = .  i = 6.  j = 3.
S = shjgr.  T = e.  i = 5.  j = 2.
S = shgr.  T = ej.  i = 4.  j = 2.
S = shr.  T = ejg.  i = 3.  j = 1.
S = sr.  T = ejgh.  i = 2.  j = 1.
S = s.  T = ejghr.  i = 1.  j = 0.
T = ejghrs
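
If you want to try this yourself, here is a sketch of the same loop packaged as a function, without the print statements. The function name encodeBySimulation is an assumption of mine for illustration; it is not the method signature that Topcoder asks for.

```cpp
#include <cassert>
#include <string>

// Sketch of the "execute the algorithm as written" approach.  The name
// encodeBySimulation is hypothetical.  s is passed by value because the
// loop consumes it.
std::string encodeBySimulation(std::string s) {
  const std::size_t n = s.size();
  std::string t;

  while (!s.empty()) {
    std::size_t j = s.size() / 2;

    // Even length: compare the two middle characters and point j at the
    // smaller one.  Odd length: j is already the middle index.
    if (s.size() % 2 == 0 && s[j - 1] < s[j]) j--;

    t.push_back(s[j]);
    s = s.substr(0, j) + s.substr(j + 1);  // rebuild s without s[j]
  }

  assert(t.size() == n);  // every character of s ended up in t
  return t;
}
```

On the two examples traced above, encodeBySimulation("word") returns "ordw" and encodeBySimulation("shjegr") returns "ejghrs".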

That works fine and solves the problem for Topcoder. However, I want you to think about its running time. Suppose that s originally has n characters. Then the while() loop will run n times, and each time, it creates s from scratch. If we number the iterations 0 through n-1, then at iteration k the old s has n-k characters and the new s has n-k-1. That means that the numbers of characters that we copy to create s are n-1, n-2, n-3, ..., 1, 0. Their total is the sum of one through n-1, which is equal to n(n-1)/2, or (n^2 - n)/2. We'll learn how to characterize this later in the semester, but let's think about it for various values of n:

n         n(n-1)/2
10        45
100       4,950
1000      499,500
10000     49,995,000
100000    4,999,950,000
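
If you'd rather not trust the formula, here is a small sketch that adds up the copies one iteration at a time and checks the total against n(n-1)/2. The function name charsCopied is hypothetical, and it counts only the characters placed into the new s at each iteration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts the characters copied into the new s over
// the whole run of the loop, when s starts with n characters.
std::uint64_t charsCopied(std::uint64_t n) {
  std::uint64_t total = 0;

  // When the old s has len characters, the new s has len - 1.
  for (std::uint64_t len = n; len > 0; len--) total += len - 1;

  assert(total == n * (n - 1) / 2);  // matches the closed form
  return total;
}
```

A 64-bit type is used on purpose: the count for n = 100000 does not fit into a 32-bit int.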

That's growing pretty quickly. I gave my program a word with 10,000 characters (after taking out the print statements), and it took 0.6 seconds. On a word with 100,000 characters, it took a minute and forty seconds! That's too slow!
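
If you want to take measurements like these on your own machine, one way is to wrap the call in a timer. The sketch below uses std::chrono from the standard library; the helper name secondsToRun is my own assumption, and the numbers you get will depend on your machine and compiler flags.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical timing helper: runs work() once and returns the elapsed
// wall-clock time in seconds.
template <typename Work>
double secondsToRun(Work work) {
  auto start = std::chrono::steady_clock::now();
  work();
  auto stop = std::chrono::steady_clock::now();

  double elapsed = std::chrono::duration<double>(stop - start).count();
  assert(elapsed >= 0.0);  // steady_clock never goes backwards
  return elapsed;
}
```

You would pass it a lambda that runs your solution on a long word, for example secondsToRun([&] { solve(word); }), where solve and word stand for whatever function and test string you are timing.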

To fix this, let's try not to delete the character in s, or remake s. Instead, we can take advantage of the following observation. If the length of s is even, then the next two characters moved from s to t will be the two middle characters: the character that loses the comparison becomes the middle character of an odd-length s, and hence will be moved to t in the next round. After we have compared and moved the two middle characters, the length of s will again be even, and the two characters we compare will be the ones immediately to the left and right of the previous two (i.e., their indices in the original s will be 1 less and 1 greater than those of the left and right characters that we compared previously).

We can take advantage of this observation as follows. The first time we look at s, if its length is odd, we'll move the middle character to t. Now an even number of characters remain. Let i be the index of the right middle character and j be the index of the left one (when s starts with an even length, i is s.size()/2 and j is i-1). Those are the characters to test. If s[j] is less than s[i], we'll add s[j] to t, and then s[i]. Otherwise, we'll add s[i] to t, and then s[j]. And then we'll decrement j and increment i. If we keep doing that until j is -1, we'll have constructed t properly, and we haven't had to mess around with creating a new s every time.

Here's the relevant snippet of code:

  middle = s.size()/2;
  if (s.size()%2 == 0) {
    t = "";
    i = middle;
    j = i-1;
  }
  else {
    t = s[middle];
    i = middle+1;
    j = middle-1;
  }
  while (j != -1) {
    if (s[j] < s[i]) {
      t.push_back(s[j]);
      t.push_back(s[i]);
    } else {
      t.push_back(s[i]);
      t.push_back(s[j]);
    }
    i++;
    j--;
  }

Notice that the code does not delete the middle character from s if s starts with an odd length. Instead we simply copy this character to t, and then set j to point to the character immediately to the left of this middle character and i to point to the character immediately to the right of it. This has the same effect as deleting the middle character from s, but it does not require the characters above the middle character to be shifted down by one spot in the string.
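
Here is a sketch of the snippet above packaged as a complete function, so that you can compile it and check it against the examples. The name encodeTwoPointer is hypothetical, and the call to reserve() is an optional extra of mine, not something the snippet needs.

```cpp
#include <cassert>
#include <string>

// Sketch of the approach that never modifies s.  The name encodeTwoPointer
// is made up for this example.
std::string encodeTwoPointer(const std::string &s) {
  std::string t;
  t.reserve(s.size());  // optional: t will end up with s.size() characters

  int middle = static_cast<int>(s.size()) / 2;
  int i, j;

  if (s.size() % 2 == 0) {
    i = middle;            // right middle character
    j = middle - 1;        // left middle character
  } else {
    t.push_back(s[middle]);  // odd length: the middle character goes first
    i = middle + 1;
    j = middle - 1;
  }

  // Walk outward from the middle.  The smaller character of each pair goes
  // into t first, and the other one follows it.
  while (j >= 0) {
    if (s[j] < s[i]) {
      t.push_back(s[j]);
      t.push_back(s[i]);
    } else {
      t.push_back(s[i]);
      t.push_back(s[j]);
    }
    i++;
    j--;
  }

  assert(t.size() == s.size());
  return t;
}
```

It gives "ordw" for "word" and "ejghrs" for "shjegr", the same answers as examples 0 and 3.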

When I run it on a 100,000 character string, it takes 0.02 seconds. That's a huge difference! We're going to see things like this time and time again in this class. It's called "algorithm analysis," and it's important, because it lets you predict how quickly your program will run, and figure out how to make it much faster.