SRM 621, D1, 500-Pointer (TreesAnalysis)

James S. Plank

Tue Nov 6 00:00:04 EST 2018
This is a problem that leverages topological sort very nicely. It would be an easier problem were it not for the constraints. With 4,000 nodes, you can barely get away with a quadratic solution. I know, because I tried and failed. This was my solution: Too slow.

So, I read the topcoder analysis, and their solution is really nice.


Step 1 - Turn the trees into DAG's, where each node has one edge, to its parent

This can be done with a DFS, the result of which is a vector P of integers, where P[i] is the parent of node i. You can do anything you want with the parent of the root. In these examples, I'll have it be -1.

We'll use Example 1 to illustrate. Here is how my DFS defined the two trees (edges going from the lower nodes to the higher nodes):

Parent Pointers
Node  0  1  2  3  4
T1   -1  2  0  0  1
T2   -1  0  4  4  1


Step 2 - Derive a topological ordering of the nodes

You can do this using the classic algorithm for topological sort, or you can just use the same DFS as above. A postorder traversal of the nodes yields a topological ordering, where a node appears after all of its children in the ordering. Here are orderings for the two trees above:
T1:  4 1 2 3 0
T2:  2 3 4 1 0

Step 3 - Determine the two sets for each edge of the first graph, using topological sort

I represented each set with a vector of long long's. If an element's bit was zero, it was in set A, and if was 1, it was in set B. You can do this by running through the nodes in toplogical order. Here's how it works. First set all nodes' vectors to zeros. Then, run through the topological order. For each node i, you set bit i in the vector. The resulting set will define the two sets when you remove the edge from node i to its parent.

Next, you OR node i's vector with its parents. That's how you get the sets right for the parents.

Below, I'll go through the whole process for T1. For each node that we process (which defines an edge that separates the tree into two components), I'll draw the trees, and identify the two sets as yellow (set A) and green (set B). I'll include the bit representation of the sets, both as bits and as hex.

I'm also coloring the sets when they change -- red means that that is the final set specification for the node, and blue means that this was the parent of a node that changed.

Node Edge Graph Set for 0 Set for 1 Set for 2 Set for 3 Set for 4
Start Start 00000 0x00 00000 0x00 00000 0x00 00000 0x00 00000 0x00
4 4->1 00000 0x00 10000 0x10 00000 0x00 00000 0x00 10000 0x10
1 1->2 00000 0x00 10010 0x12 10010 0x12 00000 0x00 10000 0x10
2 2->0 10110 0x16 10010 0x12 10110 0x16 00000 0x00 10000 0x10
3 3->0 11110 0x1e 10010 0x12 10110 0x16 01000 0x8 10000 0x10

You'll note, we don't process node 0, because it's the root, and has no parent pointers.


Step 4 - For each node/edge in T1, do a topological sort on T2 to count up the similarities

Ok, now what we're going to do is once again traverse each node/edge of T1. We know the two sets that result when you remove the edge, because we determined them in Step 3 above. Call the yellow set A and the green set B.

Now, we're going to traverse each node/edge of T2, also in topological order, and we're going to partition the nodes into two sets, C and D. Before we begin, all of the nodes are in C.

We store two values during the traversal:

  1. |A| is the number of nodes in set A.
  2. |B| is the number of nodes in set B.

Now, for each node/edge in T2, we are going to maintain two values:

  1. AD[n] -- This is the number of nodes that belong in sets A and D, when the edge from node n is removed from T2.
  2. BD[n] -- This is the number of nodes that belong in sets B and D, when the edge from node n is removed from T2.
We'll start with all of these being zero. However, when we process an node/edge, then we will:

T1's
Node
T1's
Edge
T1's
Graph
The set AC BC T2's
Node
T2's
Edge
T2's
Graph
AD[n] BD[n] |AIC| |AID| |BIC| |BID| Similarity
4 4->1 10000 0x10 4 1 2 2->4 1 0 3 1 1 0 9
4 4->1 10000 0x10 4 1 3 3->4 1 0 3 1 1 0 9
4 4->1 10000 0x10 4 1 4 4->1 2 1 2 2 0 1 4
4 4->1 10000 0x10 4 1 1 1->0 3 1 1 3 0 1 9
1 1->2 10010 0x12 3 2 2 2->4 1 0 2 1 2 0 4
1 1->2 10010 0x12 3 2 3 3->4 1 0 2 1 2 0 4
1 1->2 10010 0x12 3 2 4 4->1 2 1 1 2 1 1 4
1 1->2 10010 0x12 3 2 1 1->0 2 2 1 2 0 2 4

And so on.