So, for example, below is the graph from the Network Flow Lecture Notes #1:
g1.txt
SOURCE A
SINK G
EDGE A D 3
EDGE A B 3
EDGE B C 4
EDGE C A 3
EDGE C D 1
EDGE C E 2
EDGE D E 2
EDGE D F 6
EDGE E B 1
EDGE E G 1
EDGE F G 9
The file g2.txt adds an edge from A to C with a capacity of 1.
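The file format is simple enough to read token by token. Below is a minimal sketch of such a reader, not the code from netflow1.cpp: the names ParsedEdge, ParsedGraph and ParseGraph are hypothetical, and the sketch assumes that the keywords SOURCE, SINK and EDGE are the only ones that appear.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain-data view of one graph file (not the Graph class below).
struct ParsedEdge {
  std::string from, to;
  double capacity;
};

struct ParsedGraph {
  std::string source, sink;
  std::vector<ParsedEdge> edges;
};

// Reads whitespace-separated records: "SOURCE name", "SINK name",
// "EDGE from to capacity". Returns false on an unknown keyword or a
// truncated record.
bool ParseGraph(std::istream &in, ParsedGraph &g) {
  std::string key;
  while (in >> key) {
    if (key == "SOURCE") {
      if (!(in >> g.source)) return false;
    } else if (key == "SINK") {
      if (!(in >> g.sink)) return false;
    } else if (key == "EDGE") {
      ParsedEdge e;
      if (!(in >> e.from >> e.to >> e.capacity)) return false;
      g.edges.push_back(e);
    } else {
      return false;
    }
  }
  return true;
}
```

Because the reader only looks at whitespace-separated tokens, it does not care whether the records are one per line or all on one line.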
I have a program makerandom.cpp which takes one argument, a number of nodes, and creates a random graph with one source, one sink, and the given number of other nodes. There is a random edge between every pair of nodes (in a random direction) with a random capacity between zero and 10. There are edges from the source to random nodes with a 40% probability, and there are edges from random nodes to the sink with a 40% probability. Thus, this is a pretty dense graph which should be a challenge to our network flow programs.
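To make the description concrete, here is a rough sketch of a generator in that spirit. It is not makerandom.cpp: the names RandEdge, NodeName and MakeRandom are hypothetical, the use of std::mt19937 and the order in which edges are produced are assumptions, and the node names (s, t, n00, n01, ...) are modeled on the sample file that follows.

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <string>
#include <vector>

struct RandEdge {
  std::string from, to;
  double capacity;
};

// Interior nodes are named n00, n01, ... (assumed from the sample output).
std::string NodeName(int i) {
  char buf[16];
  std::snprintf(buf, sizeof(buf), "n%02d", i);
  return buf;
}

// One edge in a random direction between every pair of interior nodes, plus
// source and sink edges that each appear with probability 0.4.
std::vector<RandEdge> MakeRandom(int n, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> cap(0.0, 10.0);
  std::bernoulli_distribution coin(0.5), attach(0.4);
  std::vector<RandEdge> edges;

  for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
      bool forward = coin(gen);
      RandEdge e;
      e.from = forward ? NodeName(i) : NodeName(j);
      e.to = forward ? NodeName(j) : NodeName(i);
      e.capacity = cap(gen);
      edges.push_back(e);
    }
    if (attach(gen)) edges.push_back({"s", NodeName(i), cap(gen)});
    if (attach(gen)) edges.push_back({NodeName(i), "t", cap(gen)});
  }
  return edges;
}
```

With n interior nodes this always produces n(n-1)/2 interior edges, which is what makes the graphs dense.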
Below is an example of a graph made with makerandom 5:
g3.txt
SOURCE s
SINK t
EDGE n00 t 4.923
EDGE n01 n00 8.824
EDGE n00 n02 6.932
EDGE n00 n03 6.518
EDGE n00 n04 6.183
EDGE n01 t 8.471
EDGE n02 n01 4.929
EDGE n03 n01 5.566
EDGE n01 n04 6.661
EDGE s n02 6.263
EDGE n02 n03 0.741
EDGE n04 n02 5.840
EDGE n04 n03 4.417
EDGE s n04 8.033

class Graph {
  public:
    // Methods:
    Graph();
    void Print();
    Node *Get_Node_By_Name(string name);

    // Data:
    map <string, Node *> node_names;   // All nodes by name.
    map <string, Edge *> edge_names;   // All edges by name.
    Node *source;
    Node *sink;
};
The Node class defines a node, which has a name, an adjacency list of edges, a backedge for determining paths, and a visited flag for searching.
typedef list <class Edge *> Adjlist;

class Node {
  public:
    string name;
    Adjlist edges;
    Node *backedge;     // For determining a path in augmenting paths.
    int visited;
};
And finally, the Edge class defines an edge from n1 to n2 with the given capacity. It also contains a flow variable, so that we can represent both the flow and residual graph with the same data structure. Each edge will have a name, which is the names of n1 and n2 with an ASCII arrow between them. Finally, each edge has a pointer to its reverse edge, if it exists (NULL if there is no reverse edge), and a pointer to its entry in its node's adjacency list, in case it needs to be deleted.
class Edge {
  public:
    string name;
    Node *n1;                    // From node
    Node *n2;                    // To node
    Edge *reverse;               // This lets you find your reverse edge quickly
    double capacity;
    double flow;
    Adjlist::iterator adj_ptr;   // This lets you delete yourself
                                 // from the adjacency list quickly
};

The code to read a graph is straightforward, in netflow1.cpp. There are some subtleties in figuring out whether an edge has a reverse edge or not. Here it is running on two of the example files:
UNIX> netflow1 < g1.txt
Source: A, Sink: G
Node A, Edges: (B,3) (D,3)
Node B, Edges: (C,4)
Node C, Edges: (E,2) (D,1) (A,3)
Node D, Edges: (F,6) (E,2)
Node E, Edges: (G,1) (B,1)
Node F, Edges: (G,9)
Node G, Edges:
Edge A -> B -- No reverse edge.
Edge A -> D -- No reverse edge.
Edge B -> C -- No reverse edge.
Edge C -> A -- No reverse edge.
Edge C -> D -- No reverse edge.
Edge C -> E -- No reverse edge.
Edge D -> E -- No reverse edge.
Edge D -> F -- No reverse edge.
Edge E -> B -- No reverse edge.
Edge E -> G -- No reverse edge.
Edge F -> G -- No reverse edge.
UNIX>

UNIX> netflow1 < g2.txt
Source: A, Sink: G
Node A, Edges: (C,1) (B,3) (D,3)
Node B, Edges: (C,4)
Node C, Edges: (E,2) (D,1) (A,3)
Node D, Edges: (F,6) (E,2)
Node E, Edges: (G,1) (B,1)
Node F, Edges: (G,9)
Node G, Edges:
Edge A -> B -- No reverse edge.
Edge A -> C -- Reverse edge: C -> A
Edge A -> D -- No reverse edge.
Edge B -> C -- No reverse edge.
Edge C -> A -- Reverse edge: A -> C
Edge C -> D -- No reverse edge.
Edge C -> E -- No reverse edge.
Edge D -> E -- No reverse edge.
Edge D -> F -- No reverse edge.
Edge E -> B -- No reverse edge.
Edge E -> G -- No reverse edge.
Edge F -> G -- No reverse edge.
UNIX>
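One way to detect reverse edges while reading, as in the g2.txt output above, is to look up the reversed name in the map of edge names each time an edge is added. The sketch below illustrates that idea only; NamedEdge, EdgeName and AddEdge are hypothetical names, the exact spelling of the arrow ("->") is an assumption, and the map stores edges by value rather than by pointer to keep the example free of manual memory management.

```cpp
#include <cassert>
#include <map>
#include <string>

struct NamedEdge {
  std::string name;
  NamedEdge *reverse = nullptr;   // nullptr means "no reverse edge".
};

// An edge name is the two node names with an ASCII arrow between them.
std::string EdgeName(const std::string &n1, const std::string &n2) {
  return n1 + "->" + n2;
}

// Adds edge n1->n2. If n2->n1 was added earlier, links the two edges to each
// other. Pointers to std::map elements stay valid as the map grows.
NamedEdge *AddEdge(std::map<std::string, NamedEdge> &edge_names,
                   const std::string &n1, const std::string &n2) {
  NamedEdge &e = edge_names[EdgeName(n1, n2)];
  e.name = EdgeName(n1, n2);
  if (n1 != n2) {
    std::map<std::string, NamedEdge>::iterator rit;
    rit = edge_names.find(EdgeName(n2, n1));
    if (rit != edge_names.end()) {
      e.reverse = &rit->second;
      rit->second.reverse = &e;
    }
  }
  return &e;
}
```

The subtlety is that the link has to be made in both directions: when A -> C is read after C -> A, the earlier edge must be updated as well as the new one.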
Second, I've removed flow from edges -- we can worry about adding flow later. Third, I've removed the name field from edges, and instead added a Print() method.
More significantly, I've added three new methods to graphs: GetFlow(), which calculates the maximum flow; FindAugmentingPath(), which finds an augmenting path and then modifies the graph by reducing capacity along the path's edges and adding capacity to backedges (creating backedges in the process); and DFS(), which is a recursive procedure using standard depth-first search to find an augmenting path.
I've also added a path to the data. When DFS() returns, if it finds a path, it puts it into path and returns the flow through the path.
class Graph {
  public:
    // Methods:
    Graph();
    void Print();
    Node *Get_Node_By_Name(string name);
    double GetFlow();
    double FindAugmentingPath();
    double DFS(Node *n);

    // Data:
    list <Edge *> path;
    map <string, Node *> node_names;   // All nodes by name.
    Node *source;
    Node *sink;
};
Here's the DFS() procedure. Pretty straightforward -- the only subtlety is that when we find the path, we create it by prepending to the path list, since we create it in reverse:
double Graph::DFS(Node *n)
{
  Adjlist::iterator alit;
  Edge *e;
  double flow;

  n->visited = 1;
  for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
    e = *alit;
    if (e->capacity > 0) {
      if (e->n2 == sink) {
        path.push_front(e);
        return e->capacity;
      } else if (!e->n2->visited) {
        flow = DFS(e->n2);
        if (flow > 0) {
          path.push_front(e);
          return (flow < e->capacity) ? flow : e->capacity;
        }
      }
    }
  }
  return 0;
}
Now, below is FindAugmentingPath(), which calls DFS() to find a path, and then performs the modifications on the graph. Note, we also delete the path while we are modifying the graph. One item of concern is what happens when an edge's capacity goes to zero? Should we delete it or just leave it with a capacity of zero? In this code, we just leave it, which is why in DFS() above, we make sure to ignore zero capacity edges. We'll explore the implications of this decision in netflow3 below.
double Graph::FindAugmentingPath()
{
  map <string, Node *>::iterator nptr;
  list <Edge *>::iterator pit;
  Node *n;
  double flow;
  Edge *e;

  for (nptr = node_names.begin(); nptr != node_names.end(); nptr++) {
    n = nptr->second;
    n->visited = 0;
  }

  flow = DFS(source);

  if (flow > 0) {
    while (!path.empty()) {
      pit = path.begin();
      e = *pit;
      path.erase(pit);
      e->capacity -= flow;              // Remove flow from the edge.
      if (e->reverse == NULL) {         // Create the reverse edge if necessary
        e->reverse = new Edge;
        e->reverse->reverse = e;
        e->reverse->capacity = 0;
        e->reverse->n1 = e->n2;
        e->reverse->n2 = e->n1;
        e->n2->edges.push_front(e->reverse);
        e->reverse->adj_ptr = e->n2->edges.begin();
      }
      e->reverse->capacity += flow;     // Add capacity to the reverse edge.
    }
  }
  return flow;
}
The rest is straightforward -- GetFlow() simply finds paths as long as they exist:
double Graph::GetFlow()
{
  double maxflow;
  double flow;

  maxflow = 0;
  while (1) {
    flow = FindAugmentingPath();
    if (flow == 0) return maxflow;
    maxflow += flow;
  }
}

You'll see that this does work to find flow (I've uncommented the lines that print out the augmenting paths):
UNIX> netflow2 < g1.txt
[A->B:3][B->C:4][C->E:2][E->G:1]
[A->B:2][B->C:3][C->D:1][D->F:6][F->G:9]
[A->D:3][D->F:5][F->G:8]
Flow is 5
UNIX> netflow2 < g2.txt
[A->C:1][C->E:2][E->G:1]
[A->B:3][B->C:4][C->D:1][D->F:6][F->G:9]
[A->D:3][D->F:5][F->G:8]
Flow is 5
UNIX> netflow2 < g3.txt
[s->n04:8.033][n04->n03:4.417][n03->n01:5.566][n01->t:8.471]
[s->n04:3.616][n04->n02:5.84][n02->n03:0.741][n03->n01:1.149][n01->t:4.054]
[s->n04:2.875][n04->n02:5.099][n02->n01:4.929][n01->t:3.313]
[s->n02:6.263][n02->n01:2.054][n01->t:0.438]
[s->n02:5.825][n02->n01:1.616][n01->n00:8.824][n00->t:4.923]
Flow is 10.087
UNIX>
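As a cross-check on the g1.txt result above, here is a compact, self-contained sketch of the same augmenting-path idea. It is deliberately not the Graph/Node/Edge code: it keeps residual capacities in a dense matrix, and the names Residual, Augment, MaxFlow and MakeG1 are hypothetical. A backedge here is simply the opposite matrix entry, so nothing needs to be created on the fly.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// cap[u][v] is the capacity that can still be pushed from u to v.
typedef std::vector<std::vector<double> > Residual;

// Depth-first search for any path with spare capacity. Returns the flow
// pushed along the path (0 if the sink is unreachable), updating cap in place.
double Augment(Residual &cap, std::vector<bool> &visited,
               int u, int sink, double limit) {
  if (u == sink) return limit;
  visited[u] = true;
  for (int v = 0; v < (int) cap.size(); v++) {
    if (!visited[v] && cap[u][v] > 0) {
      double f = Augment(cap, visited, v, sink, std::min(limit, cap[u][v]));
      if (f > 0) {
        cap[u][v] -= f;   // Remove flow from the edge.
        cap[v][u] += f;   // Add capacity to the backedge.
        return f;
      }
    }
  }
  return 0;
}

// Keep finding augmenting paths until none remain.
double MaxFlow(Residual cap, int source, int sink) {
  double total = 0;
  while (true) {
    std::vector<bool> visited(cap.size(), false);
    double f = Augment(cap, visited, source, sink,
                       std::numeric_limits<double>::infinity());
    if (f == 0) return total;
    total += f;
  }
}

// The graph from g1.txt, with nodes A..G mapped to indices 0..6.
Residual MakeG1() {
  enum { A, B, C, D, E, F, G };
  Residual cap(7, std::vector<double>(7, 0.0));
  cap[A][D] = 3; cap[A][B] = 3; cap[B][C] = 4; cap[C][A] = 3;
  cap[C][D] = 1; cap[C][E] = 2; cap[D][E] = 2; cap[D][F] = 6;
  cap[E][B] = 1; cap[E][G] = 1; cap[F][G] = 9;
  return cap;
}
```

Calling MaxFlow(MakeG1(), 0, 6) should agree with the flow of 5 reported above, although the individual augmenting paths may differ because the matrix visits neighbors in index order rather than adjacency-list order.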
  ...
      if (e->capacity == 0) {
        e->reverse->reverse = NULL;
        e->n1->edges.erase(e->adj_ptr);
        delete e;
      }
  ...
}
This is why we have the adj_ptr field.
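The point of storing an iterator is that std::list can erase an element in constant time given an iterator to it, and erasing one element leaves iterators to all the others valid. A stripped-down sketch follows; MiniEdge, MiniAdjlist, Attach and Detach are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <list>

struct MiniEdge;
typedef std::list<MiniEdge *> MiniAdjlist;

struct MiniEdge {
  int id;
  MiniAdjlist::iterator adj_ptr;   // Where this edge sits in its list.
};

// Append e to the list and remember its position.
void Attach(MiniAdjlist &edges, MiniEdge *e) {
  e->adj_ptr = edges.insert(edges.end(), e);
}

// Unlink e without searching the list. Only e's own iterator is invalidated.
void Detach(MiniAdjlist &edges, MiniEdge *e) {
  edges.erase(e->adj_ptr);
}
```

Without the stored iterator, removing an edge would mean scanning the adjacency list to find it first.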
To test performance, I wrote a shell script time_one.sh which times successive iterations of one of the programs with a given graph size (I modified makerandom.cpp to take a seed for random graph generation):
UNIX> sh time_one.sh
usage: sh time_one n program iterations
UNIX> sh time_one.sh 10 netflow2 5
0 Flow is 18.711 0.004
1 Flow is 12.003 0.003
2 Flow is 11.066 0.003
3 Flow is 21.673 0.003
4 Flow is 14.788 0.003
UNIX>

I then wrote a second shell script gen_data.sh that runs time_one.sh for a given number of iterations, averages the result, and then increments the graph size by a given value. It continues doing this until the average time reaches a certain threshold. The example below averages five runs, starting with a graph size of ten, until the average reaches 1 second (I've cut off the output):

UNIX> sh gen_data.sh
usage: sh gen_data.sh program iterations start increment end-time
UNIX> sh gen_data.sh netflow2 5 10 1 1
10 0.003000
11 0.003400
12 0.003000
13 0.003600
14 0.004200
15 0.005200
16 0.004200
...

Now, I use these scripts to compare the two implementations. Here's graph number one, using ten iterations per value:

I certainly couldn't publish data that noisy, so I've repeated the test with 50 iterations per value (sh gen_data netflow2/3 50 10 4 15):
Although both graphs clearly show that netflow2 outperforms netflow3, the second is much better, and worth the extra time to produce (think about it -- 50 iterations of 10 seconds is a little over eight minutes for a single data point).
...
typedef multimap <double, class Edge *> Adjlist;
...

double Graph::DFS(Node *n)
{
  Adjlist::iterator alit;
  Edge *e;
  double flow;

  n->visited = 1;
  for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
    e = alit->second;
    ....    // Everything else is the same in this method.
}

double Graph::FindAugmentingPath()
{
  ...   // Initialization code deleted

  flow = DFS(source);

  if (flow > 0) {
    while (!path.empty()) {
      pit = path.begin();
      e = *pit;
      path.erase(pit);

      // Here is where the edge is erased and reinserted.
      e->n1->edges.erase(e->adj_ptr);
      e->capacity -= flow;
      e->adj_ptr = e->n1->edges.insert(make_pair(e->capacity, e));

      // The same thing needs to happen with the reverse edge
      if (e->reverse == NULL) {
        e->reverse = new Edge;
        e->reverse->reverse = e;
        e->reverse->capacity = flow;
        e->reverse->n1 = e->n2;
        e->reverse->n2 = e->n1;
      } else {
        e->n2->edges.erase(e->reverse->adj_ptr);
        e->reverse->capacity += flow;
      }
      e->reverse->adj_ptr = e->n2->edges.insert(make_pair(e->reverse->capacity, e->reverse));
    }
  }
  return flow;
}

Graph::Graph()    // Create a graph from standard input.
{
  ...
  // edge inserted rather than appended
  e->adj_ptr = n1->edges.insert(make_pair(cap, e));
  ...
}

void Graph::Print()
{
  ...
    for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
      e = alit->second;
      e->Print();
      edges.push_back(e);   // This is a list of edges, not an adjacency list
    }
  ...
}

int main()
{
  Graph *g;
  double flow;

  g = new Graph();
  flow = g->GetFlow();
  cout << "Flow is " << flow << endl;
}
Let's look at performance:
Ick! That's horrible!! The reason is that I'm traversing the edges from smallest to greatest in DFS(), and not the other way around. It's a two-line fix in netflow5.cpp:
double Graph::DFS(Node *n)
{
  Adjlist::reverse_iterator alit;
  Edge *e;
  double flow;

  n->visited = 1;
  for (alit = n->edges.rbegin(); alit != n->edges.rend(); alit++) {
    ...    // The rest is the same
and the resulting performance is excellent, blowing away the previous algorithms:
We also have each node store a pointer to its entry in the BFS_Q. You'll see why in a bit. Here are the changed typedefs and Node specification:

typedef list <class Edge *> Adjlist;
typedef multimap <double, class Node *> BFS_Q;

class Node {
  public:
    string name;
    Adjlist edges;
    Edge *backedge;
    double maxflow;
    BFS_Q::iterator bfsq_ptr;
};

The main change is in FindAugmentingPath(), which now performs a modified version of Dijkstra's algorithm to find the path with the maximum flow through the graph. Note how bfsq_ptr is required to delete the node from bfsq, because you are not allowed to delete with a reverse iterator.

double Graph::FindAugmentingPath()
{
  map <string, Node *>::iterator nptr;
  list <Edge *>::iterator pit;
  Adjlist::iterator alit;
  BFS_Q bfsq;
  BFS_Q::reverse_iterator bfsq_it;
  Node *n, *n2;
  double flow;
  Edge *e;

  for (nptr = node_names.begin(); nptr != node_names.end(); nptr++) {
    n = nptr->second;
    n->maxflow = -1;
  }

  source->maxflow = capsum;
  n = source;
  n->backedge = NULL;

  while (n != NULL && n != sink) {
    // cout << "BFS - processing node " << n->name << endl;
    for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
      e = *alit;
      if (e->capacity > 0) {
        n2 = e->n2;
        flow = (n->maxflow > e->capacity) ? e->capacity : n->maxflow;
        if (flow > n2->maxflow) {
          if (n2->maxflow > -1) {
            bfsq.erase(n2->bfsq_ptr);
          }
          n2->maxflow = flow;
          n2->backedge = e;
          n2->bfsq_ptr = bfsq.insert(make_pair(flow, n2));
        }
      }
    }
    if (bfsq.empty()) {
      n = NULL;
    } else {
      bfsq_it = bfsq.rbegin();
      n = bfsq_it->second;
      bfsq.erase(n->bfsq_ptr);
    }
  }

  if (n == sink) {
    flow = n->maxflow;
    while (n != source) {
      path.push_front(n->backedge);
      n = n->backedge->n1;
    }
  } else {
    flow = 0;
  }

  // The rest of the code is the same as before.
  ...
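To isolate the path-finding idea, here is a standalone sketch of the maximum-flow-path search on the g3.txt graph. It is an illustration under assumptions, not the netflow6 code: Arc, WidestPath and MakeG3 are hypothetical names, nodes are plain indices, and it returns only the bottleneck value rather than building the path. Like the code above, it reads the largest entry through rbegin() but erases through a stored forward iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <map>
#include <utility>
#include <vector>

struct Arc {
  int to;
  double capacity;
};

typedef std::vector<std::vector<Arc> > ArcGraph;

// Returns the largest bottleneck capacity over all paths from source to sink,
// or 0 if the sink cannot be reached.
double WidestPath(const ArcGraph &adj, int source, int sink) {
  typedef std::multimap<double, int> Queue;
  Queue q;
  std::vector<double> best(adj.size(), -1);
  std::vector<Queue::iterator> where(adj.size(), q.end());

  best[source] = std::numeric_limits<double>::infinity();
  where[source] = q.insert(std::make_pair(best[source], source));

  while (!q.empty()) {
    int u = q.rbegin()->second;   // Node with the largest flow so far.
    q.erase(where[u]);            // erase() needs a forward iterator.
    where[u] = q.end();
    if (u == sink) return best[u];
    for (const Arc &a : adj[u]) {
      double f = std::min(best[u], a.capacity);
      if (a.capacity > 0 && f > best[a.to]) {
        if (where[a.to] != q.end()) q.erase(where[a.to]);
        best[a.to] = f;
        where[a.to] = q.insert(std::make_pair(f, a.to));
      }
    }
  }
  return 0;
}

// g3.txt with s=0, n00=1, n01=2, n02=3, n03=4, n04=5, t=6.
ArcGraph MakeG3() {
  ArcGraph adj(7);
  adj[1].push_back({6, 4.923}); adj[2].push_back({1, 8.824});
  adj[1].push_back({3, 6.932}); adj[1].push_back({4, 6.518});
  adj[1].push_back({5, 6.183}); adj[2].push_back({6, 8.471});
  adj[3].push_back({2, 4.929}); adj[4].push_back({2, 5.566});
  adj[2].push_back({5, 6.661}); adj[0].push_back({3, 6.263});
  adj[3].push_back({4, 0.741}); adj[5].push_back({3, 5.840});
  adj[5].push_back({4, 4.417}); adj[0].push_back({5, 8.033});
  return adj;
}
```

On the unmodified g3.txt graph, WidestPath(MakeG3(), 0, 6) evaluates to 4.929, the capacity of the edge from n02 to n01.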
This algorithm is even faster than the others:
It's worse than using Dijkstra's algorithm and better than using the greedy algorithm. Moreover, it appears more reliable in its performance. Think about why -- there is a pretty clear answer.
I've modified all of the programs to print paths as they are determined. You should be able to explain all of them. First, netflow2 simply chooses random augmenting paths:
UNIX> netflow2-path < g3.txt
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:5.566] [n01->t:8.471]
Path: [s->n04:3.616] [n04->n02:5.84] [n02->n03:0.741] [n03->n01:1.149] [n01->t:4.054]
Path: [s->n04:2.875] [n04->n02:5.099] [n02->n01:4.929] [n01->t:3.313]
Path: [s->n02:6.263] [n02->n01:2.054] [n01->t:0.438]
Path: [s->n02:5.825] [n02->n01:1.616] [n01->n00:8.824] [n00->t:4.923]
Flow is 10.087
UNIX>

Netflow4 uses a greedy algorithm to perform the DFS by selecting minimum edges first:

UNIX> netflow4-path < g3.txt
Path: [s->n02:6.263] [n02->n03:0.741] [n03->n01:5.566] [n01->t:8.471]
Path: [s->n02:5.522] [n02->n01:4.929] [n01->t:7.73]
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:4.825] [n01->t:2.801]
Path: [s->n04:5.232] [n04->n03:1.616] [n03->n01:2.024] [n01->n00:8.824] [n00->t:4.923]
Flow is 10.087
UNIX>
Netflow5 fixes this by performing the DFS by selecting maximum edges first.
UNIX> netflow5-path < g3.txt
Path: [s->n04:8.033] [n04->n02:5.84] [n02->n01:4.929] [n01->n00:8.824] [n00->t:4.923]
Path: [s->n02:6.263] [n02->n04:4.923] [n04->n03:4.417] [n03->n01:5.566] [n01->t:8.471]
Path: [s->n04:3.11] [n04->n02:5.334] [n02->n03:0.741] [n03->n01:1.149] [n01->t:4.054]
Path: [s->n04:2.369] [n04->n02:4.593] [n02->n01:0.006] [n01->t:3.313]
Flow is 10.087
UNIX>
Netflow6 always chooses the path with the maximum flow. Note how the first path is not the same one chosen by netflow5, because netflow5 starts with the biggest edge from the source, and the maximum flow path is not along that edge.
UNIX> netflow6-path < g3.txt
Path: [s->n02:6.263] [n02->n01:4.929] [n01->t:8.471]
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:5.566] [n01->n00:8.824] [n00->t:4.923]
Path: [s->n04:3.616] [n04->n02:5.84] [n02->n03:0.741] [n03->n01:1.149] [n01->t:3.542]
Flow is 10.087
UNIX>
Finally, netflow7 chooses the minimum hop path:
UNIX> netflow7-path < g3.txt
Path: [s->n02:6.263] [n02->n01:4.929] [n01->t:8.471]
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:5.566] [n01->t:3.542]
Path: [s->n04:4.491] [n04->n03:0.875] [n03->n01:2.024] [n01->n00:8.824] [n00->t:4.923]
Path: [s->n02:1.334] [n02->n03:0.741] [n03->n01:1.149] [n01->n00:7.949] [n00->t:4.048]
Flow is 10.087
UNIX>
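A minimum hop path can be found with a plain breadth-first search that ignores edges with no remaining capacity. The sketch below shows that search in isolation; it is not the netflow7 source, and HopArc and MinHopPath are hypothetical names that use node indices instead of the Node class.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct HopArc {
  int to;
  double capacity;
};

// Breadth-first search over edges with spare capacity. Returns the nodes on a
// fewest-edges path from source to sink, or an empty vector if none exists.
std::vector<int> MinHopPath(const std::vector<std::vector<HopArc> > &adj,
                            int source, int sink) {
  std::vector<int> parent(adj.size(), -1);
  std::vector<bool> seen(adj.size(), false);
  std::queue<int> q;

  seen[source] = true;
  q.push(source);
  while (!q.empty() && !seen[sink]) {
    int u = q.front();
    q.pop();
    for (const HopArc &a : adj[u]) {
      if (a.capacity > 0 && !seen[a.to]) {
        seen[a.to] = true;
        parent[a.to] = u;   // Remember how we reached this node.
        q.push(a.to);
      }
    }
  }

  std::vector<int> path;
  if (!seen[sink]) return path;
  // Walk the parent links back from the sink, building the path front-first.
  for (int n = sink; n != -1; n = parent[n]) path.insert(path.begin(), n);
  return path;
}
```

On the original g3.txt graph the only three-edge path from s to t goes through n02 and n01, which matches the first path printed by netflow7-path above.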
Wouldn't a wonderful test question be to give you a graph and the output of these programs and to have you tell me which programs created which outputs?
class Node {
  public:
    string name;
    Adjlist edges;
    Adjlist original_edges;
    Edge *backedge;
    double maxflow;
    BFS_Q::iterator bfsq_ptr;
    int visited;
};

After calculating the flow, we perform a very simple depth-first search starting at the source to determine the set of reachable nodes. These nodes will have visited equal to one. We then traverse all the nodes in the visited set and print all the edges to nodes that are not in the visited set:

void Graph::DFS(Node *n)
{
  Adjlist::iterator alit;
  Edge *e;

  n->visited = 1;
  for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
    e = *alit;
    if (e->capacity > 0 && !e->n2->visited) DFS(e->n2);
  }
}

int main()
{
  Graph *g;
  double flow;
  map <string, Node *>::iterator nit;
  Adjlist::iterator alit;
  Edge *e;
  Node *n;

  g = new Graph();
  flow = g->GetFlow();
  cout << "Flow is " << flow << endl;

  // Find the set of nodes reachable from the source and mark them as visited.

  for (nit = g->node_names.begin(); nit != g->node_names.end(); nit++) {
    n = nit->second;
    n->visited = 0;
  }
  g->DFS(g->source);

  // Now, for each node in the visited set, print out original edges going to
  // nodes from the non-visited set.  Calculate their sum too.

  flow = 0;
  cout << endl;
  cout << "Cut Edges:\n";
  for (nit = g->node_names.begin(); nit != g->node_names.end(); nit++) {
    n = nit->second;
    if (n->visited) {
      for (alit = n->original_edges.begin(); alit != n->original_edges.end(); alit++) {
        e = *alit;
        if (!e->n2->visited) {
          e->Print();
          cout << endl;
          flow += e->capacity;
        }
      }
    }
  }
  cout << endl;
  cout << "Cut Capacity: " << flow << endl;
}

When we run this on g3.txt we see that there are only three nodes in the visited set (s, n02 and n04), and that the capacities of the cut edges do indeed sum to the network flow of 10.087:

UNIX> netflow6-cut < g3.txt
Flow is 10.087

Cut Edges:
[n02->n03:0.741]
[n02->n01:4.929]
[n04->n03:4.417]

Cut Capacity: 10.087
UNIX>

Pictorially: