I've also modified makerandom to create random graphs with integer capacities less than 10000. We'll evaluate this on a few graphs. First is g5.txt, which is the same as g3.txt in the last lecture, only I've multiplied each capacity by 1000 so that they are all integers. The maximum flow of this graph is 10,087:
UNIX> netflow_dfs_1 Print < g5.txt
Found path with flow of 4923: s->n02 n02->n01 n01->n00 n00->t
Found path with flow of 6: s->n02 n02->n01 n01->t
Found path with flow of 741: s->n02 n02->n03 n03->n01 n01->t
Found path with flow of 4417: s->n04 n04->n03 n03->n01 n01->t
Max flow is 10087
UNIX>

We'll also evaluate g10.txt and g100.txt, which were created with the following calls:

UNIX> makerandom 10 0 > g10.txt
UNIX> makerandom 100 0 > g100.txt
UNIX>

We'll take a look at the paths coming from g10.txt, and with g100.txt, we'll look at the number of paths and the running time:

UNIX> netflow_dfs_1 Print < g10.txt
Found path with flow of 2240: s->n01 n01->n00 n00->t
Found path with flow of 1022: s->n01 n01->n00 n00->n03 n03->n02 n02->n04 n04->n07 n07->n05 n05->t
Found path with flow of 552: s->n01 n01->n00 n00->n03 n03->n02 n02->n07 n07->n05 n05->t
Found path with flow of 2740: s->n01 n01->n00 n00->n03 n03->t
Found path with flow of 1311: s->n01 n01->n02 n02->n00 n00->n03 n03->t
Found path with flow of 237: s->n01 n01->n02 n02->n00 n00->n05 n05->n03 n03->t
Found path with flow of 455: s->n01 n01->n02 n02->n00 n00->n05 n05->n04 n04->n03 n03->t
Found path with flow of 428: s->n01 n01->n02 n02->n00 n00->n05 n05->n07 n07->n03 n03->t
Found path with flow of 1574: s->n04 n04->n00 n00->n01 n01->n02 n02->n03 n03->t
Found path with flow of 873: s->n04 n04->n00 n00->n01 n01->n02 n02->n07 n07->n03 n03->t
Max flow is 11432
UNIX> netflow_dfs_1 Print < g100.txt | wc
    4375  325064 2793151
UNIX> time netflow_dfs_1 < g100.txt
Max flow is 157463
0.145u 0.002s 0:00.14 100.0% 0+0k 0+0io 0pf+0w
UNIX>
class Edge {
  public:
    string name;
    Node *n1;
    Node *n2;
    Edge *reverse;
    int original;
    int residual;
    int flow;
    list <Edge *>::iterator iterator;
};

...

Graph::Graph()
{
  ...
  for (i = 0; i < Edges.size(); i++) {
    e = Edges[i];
    if (e->original > 0) {
      e->n1->adj.push_front(e);
      e->iterator = e->n1->adj.begin();
    }
  }
}
When I put an edge onto an adjacency list, I use push_front(), and then store the iterator to the newly created list node in the edge itself. That way, I can delete edges in constant time with "e->n1->adj.erase(e->iterator)".
The other relevant code changes are in the DFS, which now does not have to check for positive capacity edges:
int Graph::DFS(Node *n)
{
  Edge *e;
  int f;
  list <Edge *>::iterator eit;

  if (n->visited) return 0;
  n->visited = 1;
  if (n == Sink) return 1;

  for (eit = n->adj.begin(); eit != n->adj.end(); eit++) {
    e = *eit;
    if (DFS(e->n2)) {
      Path.push_back(e);
      return 1;
    }
  }
  return 0;
}
The remaining changes are in Find_Augmenting_Path(), which now has to remove zero capacity edges when it updates the residual graph. It also has to add reverse edges to the residual graph when their capacities were zero. This is where having the iterator in the edge comes in handy. The new code is the erase() call and the push_front() block that follows it:
int Graph::Find_Augmenting_Path()
{
  int i;
  int f;
  Edge *e;

  for (i = 0; i < Nodes.size(); i++) Nodes[i]->visited = 0;
  Path.clear();

  if (DFS(Source)) {
    f = Path[0]->residual;
    for (i = 1; i < Path.size(); i++) {
      if (Path[i]->residual < f) f = Path[i]->residual;
    }
    for (i = 0; i < Path.size(); i++) {
      e = Path[i];
      e->residual -= f;
      e->flow += f;
      if (e->residual == 0) e->n1->adj.erase(e->iterator);
      e->reverse->residual += f;
      if (e->reverse->residual == f) {
        e->n2->adj.push_front(e->reverse);
        e->reverse->iterator = e->n2->adj.begin();
      }
    }

    if (Print_Paths) {
      printf("Found path with flow of %d:", f);
      for (i = Path.size()-1; i >= 0; i--) printf(" %s", Path[i]->name.c_str());
      printf("\n");
    }
    return f;
  } else {
    return 0;
  }
}
When we run it, it's correct, but look at the time and number of paths!
UNIX> netflow_dfs_2 Print < g5.txt
Found path with flow of 4417: s->n04 n04->n03 n03->n01 n01->t
Found path with flow of 741: s->n04 n04->n02 n02->n03 n03->n01 n01->t
Found path with flow of 2875: s->n04 n04->n02 n02->n01 n01->t
Found path with flow of 438: s->n02 n02->n01 n01->t
Found path with flow of 1616: s->n02 n02->n01 n01->n00 n00->t
Max flow is 10087
UNIX> netflow_dfs_2 < g10.txt
Max flow is 11432
UNIX> time netflow_dfs_2 < g100.txt
Max flow is 157463
0.402u 0.003s 0:00.40 100.0% 0+0k 0+1io 0pf+0w
UNIX> time netflow_dfs_2 Print < g100.txt | wc
   41801 3628627 31364656
UNIX>

Oh my. I had first mistakenly blamed the poor performance of this on the memory operations. Instead, it's the number of paths! Now, why would the number of paths be so great? I have a hunch that it's because I'm calling push_front() and traversing edges from front to back. Think about it from a logical standpoint. Suppose my first path through the graph has 50 edges. The flow is likely to be small -- like 200. The reverse edges that are put onto the graph all have flow of 200 and they are in the *front* of the adjacency list. That means the next path will probably have a flow of less than 200. And so on -- the paths are going to have small flows because of these reverse edges.
Let's test that hunch. How about traversing the adjacency list from back to front in DFS()? I do that in netflow_dfs_2a.cpp:
UNIX> diff netflow_dfs_2.cpp netflow_dfs_2a.cpp
118c118
<   list <Edge *>::iterator eit;
---
>   list <Edge *>::reverse_iterator eit;
124c124
<   for (eit = n->adj.begin(); eit != n->adj.end(); eit++) {
---
>   for (eit = n->adj.rbegin(); eit != n->adj.rend(); eit++) {
UNIX>

When I run it, it has a lot fewer paths and runs pretty fast:
UNIX> netflow_dfs_2a Print < g5.txt
Found path with flow of 4923: s->n02 n02->n01 n01->n00 n00->t
Found path with flow of 6: s->n02 n02->n01 n01->t
Found path with flow of 741: s->n02 n02->n03 n03->n01 n01->t
Found path with flow of 4417: s->n04 n04->n03 n03->n01 n01->t
Max flow is 10087
UNIX> netflow_dfs_2a < g10.txt
Max flow is 11432
UNIX> time netflow_dfs_2a < g100.txt
Max flow is 157463
0.034u 0.003s 0:00.03 100.0% 0+0k 0+0io 0pf+0w
UNIX> time netflow_dfs_2a Print < g100.txt | wc
    1388  100213  860611
UNIX>

Wow. 1388 paths as opposed to 41801. Let's confirm our suspicion by graphing the flow in the first 100 paths:
Interesting.
int Graph::DFS(Node *n)
{
  Edge *e;
  int f;
  int i;

  if (n->visited) return 0;
  n->visited = 1;
  if (n == Sink) return 1;

  for (i = 0; i < n->non_zero_edges; i++) {
    e = n->adj[i];
    if (DFS(e->n2)) {
      Path.push_back(e);
      return 1;
    }
  }
  return 0;
}
In the constructor, we create the adjacency list in three passes. We first put on the non-zero capacity edges. Then we set non_zero_edges for each node, and then we put on the zero capacity edges:
Graph::Graph()
{
  ...
  for (i = 0; i < Edges.size(); i++) {      /* Pass 1 */
    e = Edges[i];
    if (e->original > 0) {
      e->index = e->n1->adj.size();
      e->n1->adj.push_back(e);
    }
  }
  for (i = 0; i < Nodes.size(); i++) {      /* Pass 2 */
    n = Nodes[i];
    n->non_zero_edges = n->adj.size();
  }
  for (i = 0; i < Edges.size(); i++) {      /* Pass 3 */
    e = Edges[i];
    if (e->original == 0) {
      e->index = e->n1->adj.size();
      e->n1->adj.push_back(e);
    }
  }
}
Finally, the hard code is in Find_Augmenting_Path(), where the residual graph is updated. Instead of deleting an edge when its residual capacity drops to zero, we swap it with the non-zero edge in vector position non_zero_edges-1, and then decrement non_zero_edges. Similarly, when we add flow to a zero-capacity edge, we swap it with the edge in position non_zero_edges, and then increment non_zero_edges. In that way, we maintain the two sets without ever deleting from or inserting into a list.
Yes, the code is ugly:
int Graph::Find_Augmenting_Path()
{
  int i;
  int f;
  Edge *e, *eswap;
  Node *n;
  int ie, ieswap;

  for (i = 0; i < Nodes.size(); i++) Nodes[i]->visited = 0;
  Path.clear();

  if (DFS(Source)) {
    f = Path[0]->residual;
    for (i = 1; i < Path.size(); i++) {
      if (Path[i]->residual < f) f = Path[i]->residual;
    }
    for (i = 0; i < Path.size(); i++) {
      e = Path[i];
      e->residual -= f;
      e->flow += f;
      if (e->residual == 0) {
        n = e->n1;
        n->non_zero_edges--;
        ie = e->index;
        ieswap = n->non_zero_edges;
        if (ie != ieswap) {
          eswap = n->adj[ieswap];
          n->adj[ieswap] = e;
          n->adj[ie] = eswap;
          e->index = ieswap;
          eswap->index = ie;
        }
      }
      e->reverse->residual += f;
      if (e->reverse->residual == f) {
        n = e->n2;
        ieswap = n->non_zero_edges;
        ie = e->reverse->index;
        n->non_zero_edges++;
        if (ie != ieswap) {
          eswap = n->adj[ieswap];
          n->adj[ieswap] = e->reverse;
          n->adj[ie] = eswap;
          e->reverse->index = ieswap;
          eswap->index = ie;
        }
      }
    }

    if (Print_Paths) {
      printf("Found path with flow of %d:", f);
      for (i = Path.size()-1; i >= 0; i--) printf(" %s", Path[i]->name.c_str());
      printf("\n");
    }
    return f;
  } else {
    return 0;
  }
}
When we run it, it's hard to compare to netflow_dfs_2a because netflow_dfs_3 finds more paths:
UNIX> netflow_dfs_3 Print < g5.txt
Found path with flow of 4923: s->n02 n02->n01 n01->n00 n00->t
Found path with flow of 6: s->n02 n02->n01 n01->t
Found path with flow of 741: s->n02 n02->n03 n03->n01 n01->t
Found path with flow of 4417: s->n04 n04->n03 n03->n01 n01->t
Max flow is 10087
UNIX> netflow_dfs_3 < g10.txt
Max flow is 11432
UNIX> time !!:s/10/100
time netflow_dfs_3 < g100.txt
Max flow is 157463
0.065u 0.002s 0:00.06 100.0% 0+0k 0+1io 0pf+0w
UNIX> netflow_dfs_3 Print < g100.txt | wc
    4447  357264 3080610
UNIX>

If we normalize by the number of paths, netflow_dfs_2a takes 0.000024 seconds per path, while netflow_dfs_3 takes 0.000015. When we implement Dijkstra's algorithm, we'll implement it starting with both of these to see how the two techniques fare with an apples to apples comparison.
Graph::Graph()
{
  ...
  for (i = 0; i < Edges.size(); i++) {
    e = Edges[i];
    if (e->original > 0) {
      e->iterator = e->n1->adj.insert(make_pair(e->original, e));
    }
  }
}
Next, in DFS() we traverse the multimap from back to front:
int Graph::DFS(Node *n)
{
  Edge *e;
  int f;
  multimap <int, Edge *>::reverse_iterator eit;

  if (n->visited) return 0;
  n->visited = 1;
  if (n == Sink) return 1;

  for (eit = n->adj.rbegin(); eit != n->adj.rend(); eit++) {
    e = eit->second;
    if (DFS(e->n2)) {
      Path.push_back(e);
      return 1;
    }
  }
  return 0;
}
Finally, in Find_Augmenting_Path(), we perform operations on the multimap rather than on a list:
int Graph::Find_Augmenting_Path()
{
  int i;
  int f;
  Edge *e;

  for (i = 0; i < Nodes.size(); i++) Nodes[i]->visited = 0;
  Path.clear();

  if (DFS(Source)) {
    f = Path[0]->residual;
    for (i = 1; i < Path.size(); i++) {
      if (Path[i]->residual < f) f = Path[i]->residual;
    }
    for (i = 0; i < Path.size(); i++) {
      e = Path[i];
      e->residual -= f;
      e->flow += f;
      if (e->residual == 0) e->n1->adj.erase(e->iterator);
      e->reverse->residual += f;
      if (e->reverse->residual == f) {
        e->reverse->iterator = e->n2->adj.insert(make_pair(f, e->reverse));
      }
    }

    if (Print_Paths) {
      printf("Found path with flow of %d:", f);
      for (i = Path.size()-1; i >= 0; i--) printf(" %s", Path[i]->name.c_str());
      printf("\n");
    }
    return f;
  } else {
    return 0;
  }
}
When we run it, it's a little bittersweet. Indeed it finds fewer paths, because it's looking for big edges. However, since it's managing a multimap rather than a list, the DFS() is more expensive. Therefore, the timing is slower than both DFS 2A and 3:
UNIX> netflow_greedy Print < g5.txt
Found path with flow of 4923: s->n04 n04->n02 n02->n01 n01->n00 n00->t
Found path with flow of 6: s->n04 n04->n02 n02->n01 n01->t
Found path with flow of 741: s->n04 n04->n02 n02->n03 n03->n01 n01->t
Found path with flow of 2363: s->n04 n04->n03 n03->n01 n01->t
Found path with flow of 2054: s->n02 n02->n04 n04->n03 n03->n01 n01->t
Max flow is 10087
UNIX> netflow_greedy < g10.txt
Max flow is 11432
UNIX> time netflow_greedy < g100.txt
Max flow is 157463
0.069u 0.003s 0:00.07 85.7% 0+0k 0+1io 0pf+0w
UNIX> netflow_greedy Print < g100.txt | wc
    1112   96497  835531
UNIX>
In the modified version, the multimap contains nodes along with the maximum known flow to these nodes. The node with the greatest flow is the one that we process, and we traverse its edges to see if we should update nodes in the multimap because we can get more flow to them through this node.
Here's Dijkstra() in the list version. I'm not going to walk you through the code. You should have learned enough about Dijkstra's shortest path algorithm to figure out how this works (code in netflow_dijkstra_list.cpp):
class Node {
  public:
    string name;
    list <class Edge *> adj;
    int maxflow;
    class Edge *backedge;
    multimap <int, Node *>::iterator iterator;
};

...

int Graph::Dijkstra()
{
  Edge *e;
  Node *n, *n2;
  int i, f, newflow;
  list <Edge *>::iterator eit;
  multimap <int, Node *> Q;
  multimap <int, Node *>::iterator qit;

  for (i = 0; i < Nodes.size(); i++) Nodes[i]->maxflow = 0;
  Q.insert(make_pair(0, Source));

  while (!Q.empty()) {
    qit = Q.begin();
    f = -qit->first;
    n = qit->second;
    Q.erase(qit);
    if (n == Sink) {
      while (n != Source) {
        Path.push_back(n->backedge);
        n = n->backedge->n1;
      }
      return 1;
    }

    for (eit = n->adj.begin(); eit != n->adj.end(); eit++) {
      e = *eit;
      n2 = e->n2;
      if (f == 0 || e->residual < f) {
        newflow = e->residual;
      } else {
        newflow = f;
      }
      if (newflow > n2->maxflow) {
        if (n2->maxflow != 0) Q.erase(n2->iterator);
        n2->maxflow = newflow;
        n2->backedge = e;
        n2->iterator = Q.insert(make_pair(-newflow, n2));
      }
    }
  }
  return 0;
}
When we run it, you can see that it dramatically reduces both the number of paths and the time of the program:
UNIX> netflow_dijkstra_list Print < g5.txt
Found path with flow of 4929: s->n02 n02->n01 n01->t
Found path with flow of 4417: s->n04 n04->n03 n03->n01 n01->n00 n00->t
Found path with flow of 741: s->n04 n04->n02 n02->n03 n03->n01 n01->t
Max flow is 10087
UNIX> netflow_dijkstra_list < g10.txt
Max flow is 11432
UNIX> time netflow_dijkstra_list < g100.txt
Max flow is 157463
0.027u 0.003s 0:00.03 66.6% 0+0k 0+0io 0pf+0w
UNIX> netflow_dijkstra_list Print < g100.txt | wc
      43     420    2573
UNIX>

What about list vs. vector? I've put the vector version in netflow_dijkstra_vector.cpp:

UNIX> netflow_dijkstra_vector Print < g5.txt
Found path with flow of 4929: s->n02 n02->n01 n01->t
Found path with flow of 4417: s->n04 n04->n03 n03->n01 n01->n00 n00->t
Found path with flow of 741: s->n04 n04->n02 n02->n03 n03->n01 n01->t
Max flow is 10087
UNIX> netflow_dijkstra_vector < g10.txt
Max flow is 11432
UNIX> time netflow_dijkstra_vector < g100.txt
Max flow is 157463
0.025u 0.002s 0:00.02 100.0% 0+0k 0+0io 0pf+0w
UNIX> netflow_dijkstra_vector Print < g100.txt | wc
      43     420    2573
UNIX>

The vector version wins!
UNIX> netflow_edkarp Print < g5.txt
Found path with flow of 4929: s->n02 n02->n01 n01->t
Found path with flow of 741: s->n02 n02->n03 n03->n01 n01->t
Found path with flow of 2801: s->n04 n04->n03 n03->n01 n01->t
Found path with flow of 1616: s->n04 n04->n03 n03->n01 n01->n00 n00->t
Max flow is 10087
UNIX> netflow_edkarp < g10.txt
Max flow is 11432
UNIX> time netflow_edkarp < g100.txt
Max flow is 157463
0.025u 0.002s 0:00.02 100.0% 0+0k 0+0io 0pf+0w
UNIX> netflow_edkarp Print < g100.txt | wc
      93     832    4845
UNIX>

All of this makes sense -- there are more paths than when Dijkstra's algorithm is used, because it's not finding maximum flow paths. However, the act of finding each path is faster, which is why the running time is roughly the same.
The results (on my Linux box in 2012) are plotted below:
The conclusion here is pretty clear -- for graphs of this type, Modified Dijkstra and Edmonds-Karp are clearly the best algorithms to use. I'm pretty impressed, though, at how well DFS #2A performs, given how simple it is.
Suppose I create a graph in g15.txt using makerandom:
UNIX> makerandom 15 0 > g15.txt
UNIX> grep s g15.txt
SOURCE s
EDGE s n03 2644
EDGE s n04 6510
EDGE s n05 1223
EDGE s n09 5983
EDGE s n10 509
EDGE s n12 7952
EDGE s n13 2479
UNIX>

I am going to make the following calls from the Unix prompt:
Output 1:
Found path with flow of 7952: s->n12 n12->n14 n14->n02 n02->n00 n00->n11 n11->t
Found path with flow of 6510: s->n04 n04->n07 n07->n13 n13->n08 n08->t
Found path with flow of 5983: s->n09 n09->n14 n14->n01 n01->t

Output 2:
Found path with flow of 2644: s->n03 n03->n01 n01->t
Found path with flow of 2479: s->n13 n13->n01 n01->t
Found path with flow of 2240: s->n12 n12->n00 n00->t

Output 3:
Found path with flow of 7952: s->n12 n12->n14 n14->n06 n06->n02 n02->n00 n00->n11 n11->t
Found path with flow of 313: s->n04 n04->n07 n07->n03 n03->n14 n14->n06 n06->n02 n02->n00 n00->n11 n11->t
Found path with flow of 1499: s->n04 n04->n07 n07->n03 n03->n14 n14->n06 n06->n02 n02->n00 n00->n09 n09->n11 n11->t

Output 4:
Found path with flow of 2240: s->n03 n03->n01 n01->n00 n00->t
Found path with flow of 404: s->n03 n03->n01 n01->n00 n00->n14 n14->n02 n02->t
Found path with flow of 511: s->n13 n13->n01 n01->n00 n00->n14 n14->n02 n02->t
To answer this, start with Edmonds-Karp. That will have the shortest paths (the fewest edges), so it will be output 2. Modified Dijkstra will have paths of decreasing flow, and of the remaining outputs that can only be output 1. The greedy DFS will start with the edge "s->n12", since that is the largest edge leaving s. Thus, it's output 3. That leaves output 4 for the vanilla DFS.
The answer (the "why" is above):