SRM 621, D1, 500-Pointer (TreesAnalysis)

James S. Plank

Tue Nov 6 00:00:04 EST 2018

Problem Statement.
A main() with the examples compiled in.
A skeleton that compiles with main.cpp.
Problem Given in Topcoder: January, 2014
Competitors who opened the problem: 672
Competitors who submitted a solution: 212
Number of correct solutions: 137
Accuracy (percentage correct vs those who opened): 20.4%
Accuracy (percentage correct vs those who submitted): 64.6%
Average Correct Time: 33 minutes, 30 seconds.

This is a problem that leverages topological sort very nicely. It would be an easier problem were it not for the constraints. With 4,000 nodes, you can barely get away with a quadratic solution. I know, because I tried and failed. This was my solution:

For each edge, calculate the two sets of nodes using DFS. That's already O(n²), but it's fast enough.
Store the sets using bits -- with 4000 nodes, you can store each node's sets in 4000/64 = 62.5, or 63 long long's.
For each pair of nodes, calculate the intersections (using AND's and NOT's), then sum up their cardinalities. Unfortunately, that's O(n³).
Square the biggest cardinality, and add that to the total.

Too slow.

So, I read the topcoder analysis, and their solution is really nice.

Step 1 - Turn the trees into DAG's, where each node has one edge, to its parent

This can be done with a DFS, the result of which is a vector P of integers, where P[i] is the parent of node i. You can do anything you want with the parent of the root. In these examples, I'll have it be -1.

We'll use Example 1 to illustrate. Here is how my DFS defined the two trees (edges going from the lower nodes to the higher nodes):

Parent Pointers

Node  0  1  2  3  4
T1   -1  2  0  0  1
T2   -1  0  4  4  1

Step 2 - Derive a topological ordering of the nodes

You can do this using the classic algorithm for topological sort, or you can just use the same DFS as above. A postorder traversal of the nodes yields a topological ordering, where a node appears after all of its children in the ordering. Here are orderings for the two trees above:

T1:  4 1 2 3 0
T2:  2 3 4 1 0

Step 3 - Determine the two sets for each edge of the first graph, using topological sort

I represented each set with a vector of long long's. If an element's bit was zero, it was in set A, and if was 1, it was in set B. You can do this by running through the nodes in toplogical order. Here's how it works. First set all nodes' vectors to zeros. Then, run through the topological order. For each node i, you set bit i in the vector. The resulting set will define the two sets when you remove the edge from node i to its parent.

Next, you OR node i's vector with its parents. That's how you get the sets right for the parents.

Below, I'll go through the whole process for T1. For each node that we process (which defines an edge that separates the tree into two components), I'll draw the trees, and identify the two sets as yellow (set A) and green (set B). I'll include the bit representation of the sets, both as bits and as hex.

I'm also coloring the sets when they change -- red means that that is the final set specification for the node, and blue means that this was the parent of a node that changed.

Node	Edge	Graph	Set for 0	Set for 1	Set for 2	Set for 3	Set for 4
Start	Start		`00000 0x00`	`00000 0x00`	`00000 0x00`	`00000 0x00`	`00000 0x00`
`4`	`4->1`		`00000 0x00`	`10000 0x10`	`00000 0x00`	`00000 0x00`	`10000 0x10`
`1`	`1->2`		`00000 0x00`	`10010 0x12`	`10010 0x12`	`00000 0x00`	`10000 0x10`
`2`	`2->0`		`10110 0x16`	`10010 0x12`	`10110 0x16`	`00000 0x00`	`10000 0x10`
`3`	`3->0`		`11110 0x1e`	`10010 0x12`	`10110 0x16`	`01000 0x8`	`10000 0x10`

You'll note, we don't process node 0, because it's the root, and has no parent pointers.

Step 4 - For each node/edge in T1, do a topological sort on T2 to count up the similarities

Ok, now what we're going to do is once again traverse each node/edge of T1. We know the two sets that result when you remove the edge, because we determined them in Step 3 above. Call the yellow set A and the green set B.

Now, we're going to traverse each node/edge of T2, also in topological order, and we're going to partition the nodes into two sets, C and D. Before we begin, all of the nodes are in C.

We store two values during the traversal:

|A| is the number of nodes in set A.
|B| is the number of nodes in set B.

Now, for each node/edge in T2, we are going to maintain two values:

AD[n] -- This is the number of nodes that belong in sets A and D, when the edge from node n is removed from T2.
BD[n] -- This is the number of nodes that belong in sets B and D, when the edge from node n is removed from T2.

We'll start with all of these being zero. However, when we process an node/edge, then we will:

Determine whether the node (call it n) belongs to set A or B. If so, we increment AD[n] or BD[n]. Appropriately.
We also add AD[n] and BD[n] to AD[p] and BD[p] to the node's parent, p.
We can now calculate the cardinality of the four sets that we care about in this problem:
- The number of nodes in A and C is AC-AD[n].
- The number of nodes in A and D is AD[n].
- The number of nodes in B and C is BC-BD[n].
- The number of nodes in B and D is BD[n].
Calculate the biggest of these, square it, and add it to the total.

T1's Node	T1's Edge	T1's Graph	The set	AC	BC	T2's Node	T2's Edge	T2's Graph	AD[n]	BD[n]	\|AIC\|	\|AID\|	\|BIC\|	\|BID\|	Similarity
`4`	`4->1`		`10000 0x10`	4	1	`2`	`2->4`		1	0	3	1	1	0	9
`4`	`4->1`		`10000 0x10`	4	1	`3`	`3->4`		1	0	3	1	1	0	9
`4`	`4->1`		`10000 0x10`	4	1	`4`	`4->1`		2	1	2	2	0	1	4
`4`	`4->1`		`10000 0x10`	4	1	`1`	`1->0`		3	1	1	3	0	1	9
`1`	`1->2`		`10010 0x12`	3	2	`2`	`2->4`		1	0	2	1	2	0	4
`1`	`1->2`		`10010 0x12`	3	2	`3`	`3->4`		1	0	2	1	2	0	4
`1`	`1->2`		`10010 0x12`	3	2	`4`	`4->1`		2	1	1	2	1	1	4
`1`	`1->2`		`10010 0x12`	3	2	`1`	`1->0`		2	2	1	2	0	2	4

And so on.