CS302 Lecture Notes - Dynamic Programming
Example program #4: ConvertibleStrings

James S. Plank

Original Notes: Thu Nov 14 21:59:54 EST 2013.
Latest revision: Mon Nov 9 10:27:28 EST 2020

This is the 500-point problem from Topcoder SRM 591, Division 2.
Problem Statement.


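In case Topcoder's servers are not working, here is a summary of the problem: You are given two strings, A and B, of the same length. You want to turn A into B by mapping characters: every instance of a given character in A must map to the same character, and no two different characters may map to the same character. When that isn't possible, you may delete indices, where each deletion removes the character at that index from both strings. Return the minimum number of indices that you have to delete.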

Examples

0: A: "DD"
   B: "FF"
   Answer: 0 -- If you change D's to F's, you're done.  No deletion is required.
    	
1: A: "AAAA"
   B: "ABCD"
   Answer: 3 -- You can only map A to one character.  Whichever it is -- A, B, C or D --
                you'll have to delete the other three indices from both strings.
    	
2: A: "AAIAIA"
   B: "BCDBEE"
   Answer: 3 -- Delete indices 1, 2 and 5, and A becomes "AAI" and B becomes "BBE".  That works.

3: A: "ABACDCECDCDAAABBFBEHBDFDDHHD"
   B: "GBGCDCECDCHAAIBBFHEBBDFHHHHE"
   Answer: 9 -- We'll have to program this one to get it right.

Approach with Dynamic Programming

This one screams dynamic programming. As always, the hard part is to spot the recursion. Here's how I thought about it. You have your two strings, A and B. Consider the first character of each. Either you are going to remove that character from each string, or you are going to keep the character, which means that you'll match the character in A with the character in B. In either case, you can solve a smaller sub-problem, and use that solution to solve your problem.

Let's think about it in terms of a concrete example. I work Example 2 to completion below, but I'm going to start with a harder one here to motivate the recursion. I've put this in the main as example 4.

A = "DEFDEDFFDEED", B = "WYZYXYWYZYXY"

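Now, consider the first character of each string -- this is the character 'D' for A, and 'W' for B. Our solution is going to be one of the following:

  1. Remove the first character from both strings. That costs one removal, and leaves the sub-problem A = "EFDEDFFDEED", B = "YZYXYWYZYXY".
  2. Keep the first characters, which means that 'D' matches 'W'. Then every other index that has a 'D' in A or a 'W' in B has to be removed, because none of them pairs 'D' with 'W'. There are five of these (indices 3, 5, 6, 8 and 11), which leaves the sub-problem A = "EFEFEE", B = "YZXYYX".

Whichever of these approaches yields the smaller number will be the answer.

Let's run through a second example, this time all the way to completion. This is Example 2 from the Topcoder problem: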

A = "AAIAIA", B = "BCDBEE"

Suppose we remove the first character from A and B. Then the number of overall removals is going to be one plus the minimum number of removals when you set A to "AIAIA" and B to "CDBEE".

Now suppose instead that we don't remove the first character. Then 'A' in A will match with 'B' in B. So, we run through both strings, and whenever there is an 'A' in A, or a 'B' in B, we'll have to decide whether this will cause us to remove the characters, or whether they match appropriately. Let's draw a picture, where 'M' marks an index that matches and 'X' marks an index that has to be removed:

        " A A I A I A " 
          | |   |   | 
          | |   |   | 
          M X   M   X 
          | |   |   | 
          | |   |   | 
        " B C D B E E " 

As you can see, two of them match, and two of them must be removed. For the recursive call, we'll remove all four of those indices, leaving us with A = "II" and B = "DE". We'll add two to the result of the recursive call, because of the two non-matching characters above.
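The matching step above can be sketched in a few lines. This is not the solution from these notes; it is a minimal Python illustration, and the function name match_first is made up for the example. It walks both strings in lockstep, skips the 'M' indices, counts the 'X' indices, and collects everything else into the sub-problem.

```python
def match_first(a, b):
    """Match a[0] with b[0]; return (removals, rest_of_a, rest_of_b).

    Hypothetical helper for illustration. Assumes a and b are non-empty
    and have the same length.
    """
    ca, cb = a[0], b[0]
    removals = 0
    rest_a, rest_b = [], []
    for x, y in zip(a, b):
        if x == ca and y == cb:
            continue              # 'M': agrees with the match, nothing left to decide
        if x == ca or y == cb:
            removals += 1         # 'X': conflicts with the match, must be removed
        else:
            rest_a.append(x)      # untouched by the match, goes to the sub-problem
            rest_b.append(y)
    return removals, "".join(rest_a), "".join(rest_b)


print(match_first("AAIAIA", "BCDBEE"))   # (2, 'II', 'DE')
```

The result agrees with the picture: two removals, and the sub-problem A = "II", B = "DE".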

So, to summarize, we are going to try two things with the first characters of A and B:

  1. Remove them from both strings and solve the sub-problem. The answer is the solution to the sub-problem, plus one.
  2. Match them. This may cause us to remove other non-matching characters in the remainder of the strings. Let's call the number of such removals R. Create the sub-problem by deleting all instances of the first character in A (and their corresponding characters in B), and all instances of the first character in B (and their corresponding characters in A). The answer is the solution to the sub-problem, plus R.

Whichever of these solutions is the minimum is our answer.

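There's your recursion. Now, this is a dynamic program, so you have to memoize. I had my cache be a map that I key on a concatenation of A and B. Since A and B always have the same length, the concatenation identifies the pair of strings unambiguously. With the example above, the first key is "AAIAIABCDBEE".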
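Here is one way the memoized recursion might look. This is a minimal Python sketch rather than the solution referenced in these notes; the name min_removals and the use of a dictionary as the cache are assumptions made for the example. The cache key is the concatenation of A and B, as described above.

```python
def min_removals(a, b, cache=None):
    """Minimum number of indices to remove so that a converts to b.

    Hypothetical sketch. Assumes a and b have the same length.
    """
    if cache is None:
        cache = {}
    if not a:
        return 0                      # nothing left, nothing to remove
    key = a + b                       # e.g. "AAIAIABCDBEE"
    if key in cache:
        return cache[key]

    # Option 1: remove the first character from both strings.
    best = 1 + min_removals(a[1:], b[1:], cache)

    # Option 2: match a[0] with b[0], removing every conflicting index.
    ca, cb = a[0], b[0]
    removals, rest_a, rest_b = 0, "", ""
    for x, y in zip(a, b):
        if x == ca and y == cb:
            continue                  # consistent with the match
        if x == ca or y == cb:
            removals += 1             # conflicts with the match
        else:
            rest_a += x
            rest_b += y
    best = min(best, removals + min_removals(rest_a, rest_b, cache))

    cache[key] = best
    return best


print(min_removals("AAIAIA", "BCDBEE"))   # 3
```

Calling it on Example 3 returns 9, which matches the answer listed in the examples.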

Hack it up. This one is a really nice practice DP problem.

My solution is here (there is a main() in that program so that you can run it from the command line).


If you want to walk through this in detail, the picture below shows the call graph of Example 2. Each node makes two recursive calls, which are represented by edges to other nodes. When an edge leaves the bottom of a node, it's because we are removing the first characters and recursively calling the procedure on the remaining characters. For that reason, the weights of those edges are always one.

When an edge leaves the right of a node, it's because we are matching the first characters, and then we have to remove any non-matching characters in the remaining strings. The weights of these edges vary. For the starting node, the edge to its right has a weight of two, because we have to remove two indices when we match 'A' to 'B'. For the node "IA EE", the edge to the right goes to the empty string with a weight of one, because when you match 'I' to 'E', you have to remove the 'A'.
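If you'd like to generate the nodes and edge weights yourself, the following Python sketch builds the call graph as a dictionary. It is an illustration only; the name call_graph and the representation (each node is a pair of strings, each edge is a weight and a child node) are assumptions made for the example. The first edge of each node is the "bottom" edge and the second is the "right" edge.

```python
def call_graph(a, b, edges=None):
    """Return {(a, b): [(weight, child), ...]} for every reachable sub-problem.

    Hypothetical sketch. Assumes a and b have the same length.
    """
    if edges is None:
        edges = {}
    if (a, b) in edges:
        return edges                  # node already expanded
    if not a:
        edges[(a, b)] = []            # the empty node makes no calls
        return edges

    # Right edge: match the first characters and count the forced removals.
    ca, cb = a[0], b[0]
    weight, rest_a, rest_b = 0, "", ""
    for x, y in zip(a, b):
        if x == ca and y == cb:
            continue
        if x == ca or y == cb:
            weight += 1
        else:
            rest_a += x
            rest_b += y

    down = (a[1:], b[1:])             # bottom edge: remove the first characters
    right = (rest_a, rest_b)
    edges[(a, b)] = [(1, down), (weight, right)]
    call_graph(down[0], down[1], edges)
    call_graph(right[0], right[1], edges)
    return edges


graph = call_graph("AAIAIA", "BCDBEE")
for weight, child in graph[("AAIAIA", "BCDBEE")]:
    print(weight, child)
```

For the starting node this prints a weight-one edge to ("AIAIA", "CDBEE") and a weight-two edge to ("II", "DE"), and graph[("IA", "EE")] holds the weight-one right edge to the empty node.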


"But Dr. Plank, is this Dynamic Programming, Topological Sort, or Dijkstra's Algorithm?"

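Good question. You'll note that the graph above is a directed acyclic graph, and you are looking for the shortest path from the starting node to the node with the null string. So you can solve it in three different ways:

  1. Dynamic programming: the recursion with memoization that is described above.
  2. Topological sort: process the nodes in topological order and relax each node's outgoing edges, which finds shortest paths in a directed acyclic graph.
  3. Dijkstra's algorithm: the edge weights are never negative, so it works here, too.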
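To make the shortest-path view concrete, here is a minimal Python sketch of Dijkstra's algorithm run on the implicit graph, where nodes are generated on demand instead of being built up front. The names successors and shortest_path_removals are made up for the example, and this is an illustration rather than the approach used in the solution for these notes.

```python
import heapq


def successors(a, b):
    """Yield (weight, next_a, next_b) for the two edges leaving node (a, b)."""
    yield 1, a[1:], b[1:]             # remove the first characters
    ca, cb = a[0], b[0]
    weight, rest_a, rest_b = 0, "", ""
    for x, y in zip(a, b):
        if x == ca and y == cb:
            continue
        if x == ca or y == cb:
            weight += 1
        else:
            rest_a += x
            rest_b += y
    yield weight, rest_a, rest_b      # match the first characters


def shortest_path_removals(a, b):
    """Shortest distance from (a, b) to the empty node, via Dijkstra."""
    dist = {(a, b): 0}
    heap = [(0, a, b)]
    while heap:
        d, x, y = heapq.heappop(heap)
        if not x:
            return d                  # reached the node with the null string
        if d > dist[(x, y)]:
            continue                  # stale heap entry
        for w, nx, ny in successors(x, y):
            nd = d + w
            if nd < dist.get((nx, ny), float("inf")):
                dist[(nx, ny)] = nd
                heapq.heappush(heap, (nd, nx, ny))
    raise AssertionError("the empty node is always reachable")


print(shortest_path_removals("AAIAIA", "BCDBEE"))   # 3
```

Some edges have a weight of zero (a match that forces no removals), which Dijkstra's algorithm handles fine, since it only requires that weights be non-negative.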

A final note on using enumeration to solve this problem

Given the constraints, enumeration is a possible way to solve this problem. Think to yourself -- what kind of enumeration will work?