CS494 Lab 6

The lab follows the topcoder writeup:

SRM 720, D2, 1000-Pointer (RainbowGraph)

James S. Plank

Mon Dec 3 15:25:44 EST 2018

Problem Statement.
A main() with the examples compiled in.
Problem Given in Topcoder: January, 2017
Competitors who opened the problem: 169
Competitors who submitted a solution: 8
Number of correct solutions: 1
Accuracy (percentage correct vs those who opened): 0.6%
Accuracy (percentage correct vs those who submitted): 12.5%
Average Correct Time: 44 minutes, 28 seconds.

Grad students Clara Nguyen and Natalie Bogda wrote up a very nice presentation of this problem, with some interesting commentary. The web link is http://utk.claranguyen.me/talks.php?id=bitdp. That's not how I recommend solving the problem, but it makes for some interesting reading!

In case topcoder's servers are down

Here is a summary of the problem:

You are given an undirected graph with n nodes, numbered 0 through n-1.
Each node is assigned a color, which is a number from 0 through 9. These colors are given in a vector color.
Edges in the graph are specified by two vectors a and b, where the edges connect nodes a[i] and b[i].
There are at most 2500 edges in the graph.
No more than 10 nodes may have the same color.
A walk in the graph is a path that visits every node in the graph exactly once, and may be partitioned into subpaths, where the nodes in each subpath all have the same color.
A legal walk is one where there are no two subpaths that contain nodes of the same color.
Return the number of legal walks, modulo 1,000,000,007.

Examples:

0	{0,0,0,1,1,1,2,2,2}	{0,1,2,3,4,5,6,7,8,0,3,6}	{1,2,0,4,5,3,7,8,6,3,6,0}	0
1	{0,0,0,1,1,1,2,2,2}	{0,1,2,3,4,5,6,7,8,0,4,8}	{1,2,0,4,5,3,7,8,6,3,7,2}	24
2	{0,3,9,8,6,4}	{0,0,0,0,0,1,1,1,1,2,2,2,3,3,4}	{1,2,3,4,5,2,3,4,5,3,4,5,4,5,5}	720
3	{0,0,0,0,3,3,3,6,6,9}	{9,9,9,9,9,9,9,9,9,7,7,7,7,7,7,7,4,4,4,4,0,1,2,4,5,8}	{0,1,2,3,4,5,6,7,8,0,1,2,3,4,5,6,0,1,2,3,1,2,3,5,6,7}	64
4	{3,1,4,1,5,9,2,6,5,3,5}	{1}	{2}	0
5	Too big.	See main.cpp		983979105

Introduction

This is a wonderful problem -- a mixture of DFS and dynamic programming. You have to program it carefully, or you won't get it in under the time limit!

Obviously, you are going to focus on the connected components, which are denoted by the node colors. Once a path reaches a node of one color, it must travel through every node in that color's connected component, before it can go to a node with another color. The constraints help you: There is a maximum of ten components, and each component has a maximum of ten nodes.

An Example

I put this into main.cpp as example 7:

color = { 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2 }
a =     { 0, 1, 2, 3, 3, 6, 6, 4, 7, 7, 10, 10, 8, 2, 6 }
b =     { 1, 2, 0, 4, 5, 4, 5, 5, 8, 9,  8,  9, 9, 3, 7 }

I've colored the inter-component edges black, and the intra-component edges the same color as the component nodes.

Let's logic our way through the answer. It should be clear that the only paths are going to go through the green-red-blue components in that order, or blue-red-green. Let's focus on the paths through the components, when they are going through green-red-blue.

There are two ways through the green component: 0-1-2 and 1-0-2.
There are two ways through the red component: 3-4-5-6 and 3-5-4-6.
There are four ways through the blue component: 7-8-9-10, 7-8-10-9, 7-9-8-10 and 7-9-10-8.

That means that there are 2*2*4 = 16 paths that go through green-red-blue. If you think about it, the paths that go through blue-red-green are the exact same paths as green-red-blue, just in reverse. So the answer is 32 paths.

The solution, part 1

I solved this in two parts. In the first part, I created a two-dimensional array NP. NP[i][j] is non-zero only if nodes i and j are in the same connected component. It contains the number of paths from node i to node j, where each path must contain every node in the component.

In our example above, we'll have the following:

NP[0][1] = NP[0][2] = 1.
NP[1][0] = NP[1][2] = 1.
NP[2][0] = NP[2][1] = 1.
NP[3][4] = NP[3][5] = NP[4][3] = NP[5][3] = 1.
NP[6][4] = NP[6][5] = NP[4][6] = NP[5][6] = 1.
NP[3][6] = NP[6][3] = 2.
NP[7][8] = NP[7][9] = NP[8][7] = NP[9][7] = 1.
NP[10][8] = NP[10][9] = NP[8][10] = NP[9][10] = 1.
NP[7][10] = NP[10][7] = 2.

For a given node i, you can calculate NP[i][j] for all j using an enhanced DFS which travels every intracomponent path from node i. While you're traveling each path, you keep track of the path length, and if you reach a node j and the path contains every node in the component, you increment NP[i][j].

Let me run through an example, where the starting node is node 3. When I did this, I had a V (visited) field for each node, and a variable NIP (nodes in the path). I set them all to zero. Here is what happens when I call DFS(3). I call it an "enhanced" DFS, because when you're done with a node, you set V back to zero, so it can participate in more paths.

DFS(3):        NIP:0 -- Begin.  Increment NIP and Set V[3] to 1.
DFS(3):        NIP:1 -- Will call DFS on: 4 5
DFS(3):        NIP:1 -- Calling DFS(4)
  DFS(4):      NIP:1 -- Begin.  Increment NIP and Set V[4] to 1.
  DFS(4):      NIP:2 -- Will call DFS on: 6 5
  DFS(4):      NIP:2 -- Calling DFS(6)
    DFS(6):    NIP:2 -- Begin.  Increment NIP and Set V[6] to 1.
    DFS(6):    NIP:3 -- Will call DFS on: 5
    DFS(6):    NIP:3 -- Calling DFS(5)
      DFS(5):  NIP:3 -- Begin.  Increment NIP and Set V[5] to 1.
      DFS(5):  NIP:4 -- Setting NP[3][5] to 1.
      DFS(5):  NIP:3 -- Done.  Setting V[5] = 0
    DFS(6):    NIP:2 -- Done.  Setting V[6] = 0
  DFS(4):      NIP:2 -- Calling DFS(5)
    DFS(5):    NIP:2 -- Begin.  Increment NIP and Set V[5] to 1.
    DFS(5):    NIP:3 -- Will call DFS on: 6
    DFS(5):    NIP:3 -- Calling DFS(6)
      DFS(6):  NIP:3 -- Begin.  Increment NIP and Set V[6] to 1.
      DFS(6):  NIP:4 -- Setting NP[3][6] to 1.
      DFS(6):  NIP:3 -- Done.  Setting V[6] = 0
    DFS(5):    NIP:2 -- Done.  Setting V[5] = 0
  DFS(4):      NIP:1 -- Done.  Setting V[4] = 0
DFS(3):        NIP:1 -- Calling DFS(5)
  DFS(5):      NIP:1 -- Begin.  Increment NIP and Set V[5] to 1.
  DFS(5):      NIP:2 -- Will call DFS on: 6 4
  DFS(5):      NIP:2 -- Calling DFS(6)
    DFS(6):    NIP:2 -- Begin.  Increment NIP and Set V[6] to 1.
    DFS(6):    NIP:3 -- Will call DFS on: 4
    DFS(6):    NIP:3 -- Calling DFS(4)
      DFS(4):  NIP:3 -- Begin.  Increment NIP and Set V[4] to 1.
      DFS(4):  NIP:4 -- Setting NP[3][4] to 1.
      DFS(4):  NIP:3 -- Done.  Setting V[4] = 0
    DFS(6):    NIP:2 -- Done.  Setting V[6] = 0
  DFS(5):      NIP:2 -- Calling DFS(4)
    DFS(4):    NIP:2 -- Begin.  Increment NIP and Set V[4] to 1.
    DFS(4):    NIP:3 -- Will call DFS on: 6
    DFS(4):    NIP:3 -- Calling DFS(6)
      DFS(6):  NIP:3 -- Begin.  Increment NIP and Set V[6] to 1.
      DFS(6):  NIP:4 -- Setting NP[3][6] to 2.
      DFS(6):  NIP:3 -- Done.  Setting V[6] = 0
    DFS(4):    NIP:2 -- Done.  Setting V[4] = 0
  DFS(5):      NIP:1 -- Done.  Setting V[5] = 0
DFS(3):        NIP:0 -- Done.  Setting V[3] = 0
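The trace above can be reproduced with a sketch along the following lines. This is not the lab solution as I wrote it, just a minimal stand-alone version under some assumptions: PathCounter is a hypothetical struct (it is not in RG.h), its constructor builds the Same adjacency lists itself, and it computes Target by counting nodes of the source's color. The field names V, NIP, Source, Target, Same and NP mirror the ones used in this writeup.

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Hypothetical stand-alone helper; it is not part of RG.h.
struct PathCounter {
  vector< vector<int> > Same;  // Same[i] = neighbors of i that have i's color
  vector< vector<int> > NP;    // NP[i][j] = paths from i to j covering the component
  vector<int> V;               // V[i] is 1 while node i is on the current path
  int NIP;                     // Number of nodes in the current path
  int Source;                  // The node that the current DFS started from
  int Target;                  // Number of nodes in Source's component

  // color, a and b are the problem's inputs.  Only same-color edges are kept.
  PathCounter(const vector<int> &color, const vector<int> &a, const vector<int> &b) {
    size_t n = color.size();
    assert(a.size() == b.size());
    Same.resize(n);
    NP.assign(n, vector<int>(n, 0));
    V.assign(n, 0);
    NIP = 0;
    for (size_t e = 0; e < a.size(); e++) {
      if (color[a[e]] == color[b[e]]) {
        Same[a[e]].push_back(b[e]);
        Same[b[e]].push_back(a[e]);
      }
    }
    for (size_t s = 0; s < n; s++) {          // One DFS per starting node
      Source = s;
      Target = 0;
      for (size_t j = 0; j < n; j++) if (color[j] == color[s]) Target++;
      CountPaths(s);
    }
  }

  void CountPaths(int n) {
    V[n] = 1;
    NIP++;
    if (NIP == Target) {
      NP[Source][n]++;                        // The path holds every node of the component
    } else {
      for (size_t k = 0; k < Same[n].size(); k++) {
        int next = Same[n][k];
        if (!V[next]) CountPaths(next);       // Test V before making the call
      }
    }
    NIP--;
    V[n] = 0;                                 // Release n so that it can be on other paths
  }
};
```

With the example 7 inputs, constructing a PathCounter should leave NP[3][6] at 2 and NP[3][4] and NP[3][5] at 1, matching the trace. A component with a single node ends up with NP[i][i] equal to 1, which is an assumption of this sketch about how to treat one-node components.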

This is most definitely an expensive algorithm. Think about it -- if there are ten nodes in a component, and the nodes are completely connected, then NP[i][j] will equal 8! for each pair of distinct nodes i and j. That is because there is a path for every permutation of the other eight nodes. Of course, 8! equals 40,320, which is definitely doable in the universe of topcoder.

The solution, part 2.

The second part uses dynamic programming. Define the following procedure:

long long NumWalks(int starting_node, int remaining_components);

This is going to return the number of legal paths that start with starting_node, go through starting_node's component, and then go through remaining_components. remaining_components is an integer that stores a set of components using bit arithmetic. It should not include starting_node's component. The final answer is the sum of NumWalks(n, s) over every node n, where s is the set of all of the components except for n's component.
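If the bit arithmetic is unfamiliar, here is a small sketch of the set operations that NumWalks() needs. The helper names AllColors, Contains and Without are hypothetical (they are not in RG.h), and the sketch assumes that the component for color c is stored as bit c of the integer.

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Hypothetical helpers.  Component c (a color from 0 through 9) is bit c of the set.

// The set of every color that appears in the graph.
int AllColors(const vector<int> &color) {
  int s = 0;
  for (size_t i = 0; i < color.size(); i++) s |= (1 << color[i]);
  return s;
}

// Is component c in set s?
bool Contains(int s, int c) {
  assert(c >= 0 && c < 10);
  return (s & (1 << c)) != 0;
}

// Set s with component c removed.
int Without(int s, int c) {
  assert(c >= 0 && c < 10);
  return s & ~(1 << c);
}
```

In example 7, green is color 0, red is color 1 and blue is color 2, so the set of all components is 0x7, and the set passed along with a green starting node is Without(0x7, 0), which is 0x6, or { red, blue }.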

To implement NumWalks(n, s), what you do is look at every node m in node n's component. If NP[n][m] is greater than zero, then you look at every node l which is connected to m and in a component in s. You then call NumWalks(l, s - l's component). You will add the product of that and NP[n][m] to the return value for NumWalks(n, s).

You need a base case for this -- if s = {}, then NumWalks(n, s) is the sum of all NP[n][m].

Let's do an example -- we'll calculate NumWalks(0, { red, blue } ). There are only two values of NP[0][j] that are greater than zero -- NP[0][1] and NP[0][2] both equal one. First, consider NP[0][1]. There is no edge from 1 to the red or blue components. So, there are no paths involving NP[0][1].

Next, consider NP[0][2]. There is an edge from 2 to 3, so the return value for NumWalks(0, { red, blue } ) is going to equal NP[0][2] (which is one) times NumWalks(3, { blue } ). So let's focus on NumWalks(3, { blue } ):

Although NP[3][4] and NP[3][5] equal one, neither four nor five are connected to the blue component. So, the only value of NP that matters is NP[3][6], which equals 2. NumWalks(3, { blue } ) is going to equal NP[3][6] times NumWalks(7, { } ).

So, we focus on NumWalks(7, { } ). This is the base case of the recursion -- it equals the sum of all NP[7][m]. This is 4. So NumWalks(3, { blue } ) equals 4*2 = 8. And NumWalks(0, { red, blue } ) equals 8*1 = 8.

This is dynamic programming, so you cache the return values of NumWalks(). The nodes are numbers from 0 to 99 (at most ten colors, with at most ten nodes of each color), and the component sets are numbers from 0 to 1023. So, your cache isn't that big -- best to make it a two-dimensional vector.
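The recursion, its base case and the cache can be sketched as follows. Treat it as an illustration under stated assumptions rather than as the reference solution: WalkCounter and CountAll are hypothetical names, NP is assumed to have been filled in already by part 1 and is passed in, a cache value of -1 is assumed to mean "not computed yet", and the component for color c is assumed to be bit c of setid.

```cpp
#include <cassert>
#include <vector>
using namespace std;

const long long MOD = 1000000007LL;

// Hypothetical stand-alone version; NP comes from part 1 and is passed in.
struct WalkCounter {
  vector<int> Color;
  vector< vector<int> > Diff;          // Diff[m] = neighbors of m with a different color
  vector< vector<int> > CNodes;        // CNodes[c] = all nodes whose color is c
  vector< vector<int> > NP;
  vector< vector<long long> > Cache;   // Cache[node][setid]; -1 means not computed yet

  WalkCounter(const vector<int> &color, const vector<int> &a, const vector<int> &b,
              const vector< vector<int> > &np) : Color(color), NP(np) {
    size_t n = color.size();
    assert(np.size() == n && a.size() == b.size());
    Diff.resize(n);
    CNodes.resize(10);
    Cache.assign(n, vector<long long>(1024, -1));
    for (size_t i = 0; i < n; i++) CNodes[color[i]].push_back(i);
    for (size_t e = 0; e < a.size(); e++) {
      if (color[a[e]] != color[b[e]]) {
        Diff[a[e]].push_back(b[e]);
        Diff[b[e]].push_back(a[e]);
      }
    }
  }

  long long NumWalks(int node, int setid) {
    if (Cache[node][setid] != -1) return Cache[node][setid];
    long long rv = 0;
    const vector<int> &comp = CNodes[Color[node]];
    for (size_t i = 0; i < comp.size(); i++) {
      int m = comp[i];
      if (NP[node][m] == 0) continue;
      if (setid == 0) {                        // Base case: sum of all NP[node][m]
        rv = (rv + NP[node][m]) % MOD;
        continue;
      }
      for (size_t k = 0; k < Diff[m].size(); k++) {
        int l = Diff[m][k];
        int bit = 1 << Color[l];
        if (setid & bit) {                     // l's component still has to be visited
          rv = (rv + NP[node][m] * NumWalks(l, setid & ~bit)) % MOD;
        }
      }
    }
    Cache[node][setid] = rv;
    return rv;
  }

  // Sum NumWalks(n, s) over every node n, with s = all components except n's.
  long long CountAll() {
    int all = 0;
    long long total = 0;
    for (size_t i = 0; i < Color.size(); i++) all |= (1 << Color[i]);
    for (size_t n = 0; n < Color.size(); n++) {
      total = (total + NumWalks(n, all & ~(1 << Color[n]))) % MOD;
    }
    return total;
  }
};
```

Because every recursive call removes one component from setid, the recursion always terminates, and each (node, setid) pair is computed at most once. With the NP values listed in part 1 for example 7, NumWalks(0, 0x6) should come out to 8 and CountAll() to 32.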

Below is the dynamic programming cache for example 7. Go ahead and walk through it.

n	remaining_components (int)	remaining_components (set)	NumWalks(n, remaining_components)
0	0x6	{red,blue}	8
1	0x6	{red,blue}	8
2	0x0	{}	2
2	0x4	{blue}	0
2	0x6	{red,blue}	0
3	0x4	{blue}	8
3	0x5	{green,blue}	0
4	0x5	{green,blue}	0
5	0x5	{green,blue}	0
6	0x1	{green}	4
6	0x5	{green,blue}	0
7	0x0	{}	4
7	0x1	{green}	0
7	0x3	{green,red}	0
8	0x3	{green,red}	4
9	0x3	{green,red}	4
10	0x3	{green,red}	8

CS494 Lab 6

You are only to hand in RG.cpp. You may not modify any of the other files in this lab.

Your job is to implement the RainbowGraph class in the file RG.cpp. The RainbowGraph class is defined in RG.h. You are not allowed to modify this file.

#include <string>
#include <vector>
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;

class RainbowGraph {
  public:
    int countWays(vector <int> color, vector <int> a, vector <int> b);
    string Verbose;

    vector <int> Color;              // This is a copy of the input parameter "color".
    vector < vector <int> > Same;    // Adjacency lists of intracomponent edges.
    vector < vector <int> > Diff;    // Adjacency lists of intercomponent edges.
    vector < vector <int> > CNodes;  // CNodes[i] contains all nodes whose color is i.
    vector < vector <int> > NP;      // NP[i][j] = number of paths from i to j that go
                                     // through all of the nodes in the component.

    vector <int> V;                  // The visited vector for the DFS.
    int NIP;                         // During the DFS, this is the number of nodes in the current path.
    int Source;                      // This is the initial node for each DFS call.
    int Target;                      // The size of Source's component: CNodes[Source].size()

    vector < vector <long long > > Cache;  // The DP cache.

    void CountPaths(int n);          // This is the extended DFS.  Set Source, Target, V and NIP
                                     // before you call CountPaths(Source) to set NP[Source][j].

    long long NumWalks(int node, int setid);   // Number of walks starting at node node that 
                                               // still need to go through the nodes in setid.
};

You should have RG.cpp include "RG.h", and then implement countWays() as I have described above. Besides countWays() and Verbose, you don't have to implement or use any of the member variables or methods in this class. They are the ones that I used, though, and they are all that you need. You are not allowed to add things to this class.

Besides implementing countWays() so that it works correctly, you should also implement the following inside countWays():

If the Verbose string contains the character 'N', then you should print out all non-zero values of NP, one per line, in the form:
NP[i][j] = value
If the Verbose string contains the character 'C', then you should print out your dynamic programming cache, one entry per line, in the form:
Cache[node][setid] = value
The setid should be in hex, preceded by "0x".
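The two output formats can be produced with printf-style format strings. The sketch below builds each line as a string so that it is easy to check. NPLine and CacheLine are hypothetical helper names, and printing the hex digits in lower case is an assumption: the sample output in this writeup only shows set ids below 0xa, so it does not settle the case of the letters.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
using namespace std;

// Hypothetical helper: one line of the 'N' output.
string NPLine(int i, int j, int value) {
  char buf[64];
  assert(i >= 0 && j >= 0);
  snprintf(buf, sizeof(buf), "NP[%d][%d] = %d", i, j, value);
  return string(buf);
}

// Hypothetical helper: one line of the 'C' output.  %x prints lower-case hex
// digits (an assumption about the expected format), and the "0x" is written by hand.
string CacheLine(int node, int setid, long long value) {
  char buf[64];
  assert(node >= 0 && setid >= 0);
  snprintf(buf, sizeof(buf), "Cache[%d][0x%x] = %lld", node, (unsigned int) setid, value);
  return string(buf);
}
```

For example, CacheLine(10, 0x3, 8) yields the string Cache[10][0x3] = 8, which you would then print followed by a newline.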

You can implement other functionality for debugging, but I'll only test you on those two, and that you return the proper answer. My code prints out the DFS if the verbose string contains 'D'. You don't have to do that (and you may not want to, because it may slow your code down too much).

Testing with RG-Tester.cpp, and with Grade-Timer.sh

The makefile will make two executables:

RG-Tester. This uses the main() defined in RG-Tester.cpp. You give it the verbose string as its first argument (or none) and color, a and b on standard input. These should be in the form "{ v0, v1, v2 }", where the braces and the values should be distinct words. The commas are optional -- I simply call sscanf() on the word to convert it to an integer.
a.out. This makes RainbowGraph.cpp out of RG.h and RG.cpp (concatenating them, and then deleting the line that includes RG.h), and compiles it with main.cpp, so you can test how it does with the topcoder examples. Examples 1-7 are in main.cpp, along with Example 11 from the Topcoder system test which is a challenging one time-wise.

For grading, we're only going to use your RG-Tester. The gradescript tests the 'A' 'N' and 'C' verbose flags of RG-Tester. Additionally, there is a shell script in the lab directory called Grade-Timer.sh. It times all of the gradescript examples whose numbers are one, mod three. Your RG-Tester needs to complete each example in under 2 seconds (mine works in under .75 seconds for each). Don't try this on a heavily loaded machine.

I have examples 1-7 and example 11 as files in the lab directory (example 11 is exb.txt):

UNIX> RG-Tester < ex7.txt
32
UNIX> RG-Tester N < ex7.txt
NP[0][1] = 1
NP[0][2] = 1
NP[1][0] = 1
NP[1][2] = 1
NP[2][0] = 1
NP[2][1] = 1
NP[3][4] = 1
NP[3][5] = 1
NP[3][6] = 2
NP[4][3] = 1
NP[4][6] = 1
NP[5][3] = 1
NP[5][6] = 1
NP[6][3] = 2
NP[6][4] = 1
NP[6][5] = 1
NP[7][8] = 1
NP[7][9] = 1
NP[7][10] = 2
NP[8][7] = 1
NP[8][10] = 1
NP[9][7] = 1
NP[9][10] = 1
NP[10][7] = 2
NP[10][8] = 1
NP[10][9] = 1
UNIX> RG-Tester C < ex7.txt
Cache[0][0x6] = 8
Cache[1][0x6] = 8
Cache[2][0x0] = 2
Cache[2][0x4] = 0
Cache[2][0x6] = 0
Cache[3][0x4] = 8
Cache[3][0x5] = 0
Cache[4][0x5] = 0
Cache[5][0x5] = 0
Cache[6][0x1] = 4
Cache[6][0x5] = 0
Cache[7][0x0] = 4
Cache[7][0x1] = 0
Cache[7][0x3] = 0
Cache[8][0x3] = 4
Cache[9][0x3] = 4
Cache[10][0x3] = 8
UNIX>

Four bullets of advice

Be forewarned that the makefile will clobber RainbowGraph.cpp. It will copy it to RainbowGraph.cpp.backup first, but be ready for it.
The constraints pushed my program to the limits -- my first solution used a map for the Dynamic Programming cache, and that was too slow.
My second was also too slow, because my DFS was too slow. I fixed that by testing to see if V[i] was one before calling CountPaths(i). Previously, I simply put "if (V[i] == 1) return" as the first line of CountPaths(i), and all of the extra recursive calls made my program too slow.
As a corollary, if you have printing code in your DFS to help you debug, you need to comment it out before you do the topcoder test (as opposed to doing something like "if (Verbose.find('D') != string::npos) ...").