EDGE v1 v2 capacityThe two vertices will have strings for names, and the capacity will be a double greater than zero.
Also, we'll need to name the source and sink vertices:
SOURCE vname SINK vname
An example file is g1.nfg, which represents the graph of figure 9.39 in the book:
EDGE s b 2 EDGE s a 3 EDGE b d 2 EDGE d t 3 EDGE a c 3 EDGE c t 2 EDGE a b 1 EDGE a d 4 SOURCE s SINK t
class Vertex {
public:
string name;
rbTree<string> edges;
public:
Vertex(string n): name(n) {}
};
class Edge {
public:
string name; // Names are "v1->name v2->name"
Vertex *v1;
Vertex *v2;
double cap;
public:
Edge(Vertex *from, Vertex *to): v1(from), v2(to), cap(0.0) {
name = v1->name;
name += " ";
name += v2->name;
}
~Edge();
};
class Graph {
public:
rbTree<string> vertices;
Vertex *source;
Vertex *sink;
public:
Graph(): source(0), sink(0), maxcap(0.0) {}
Vertex *get_vertex(string name);
Edge *get_edge(Vertex *v1, Vertex *v2);
void print_graph();
void read_graph(string fname);
};
All rather self-explanatory. I give names to edges to help debugging.
Note that the adjacency list for each vertex is sorted by the name
of the vertex that it goes to. This makes it easy to find edges in
log time if you have the names of the two vertices.
Get_vertex() returns the vertex with the given name, creating it if it does not exist already. Similarly, Get_edge() returns the edge between the given vertices, creating it and giving it zero capacity if it does not exist already. These are straightforward:
Vertex *Graph::get_vertex(string name)
{
Vertex *v;
if (!vertices.find(name)) {
v = new Vertex(name);
vertices.insert(name, new_jval_v(v));
} else {
v = (Vertex *) vertices.getVal().v;
}
return v;
}
Edge *Graph::get_edge(Vertex *v1, Vertex *v2)
{
Edge *e;
if (!(v1->edges.find(v2->name))) {
e = new Edge(v1, v2);
v1->edges.insert(v2->name, new_jval_v(e));
} else {
e = (Edge *) v1->edges.getVal().v;
}
return e;
}
Ok -- the program nfreader.cpp
reads in one of these graphs and prints it out using the routine
print_graph(). Here is example output when executed
on g1.nfg:
UNIX> nfreader g1.nfg Source s, Sink t Vertex a Edge to b. Capacity 1.000 Edge to c. Capacity 3.000 Edge to d. Capacity 4.000 Vertex b Edge to d. Capacity 2.000 Vertex c Edge to t. Capacity 2.000 Vertex d Edge to t. Capacity 3.000 Vertex s Edge to a. Capacity 3.000 Edge to b. Capacity 2.000 Vertex t UNIX>
class Vertex {
public:
string name;
rbTree<string> edges;
int visited;
...
};
class Edge {
public:
string name; // Names are "v1->name v2->name"
Vertex *v1;
Vertex *v2;
double cap;
...
};
class Graph {
public:
double flow;
double maxcap;
rbTree<string> vertices;
Vertex *source;
Vertex *sink;
...
};
Now, after creating our main graph, we create a flow graph fg.
This is a graph with the same vertices as our original graph, but
no edges. We create it with the following code:
for (g->vertices.first(); !g->vertices.endOfList(); g->vertices.next()) {
v = (Vertex *) g->vertices.getVal().v;
fg->get_vertex(v->name);
}
fg->source = fg->get_vertex(g->source->name);
fg->sink = fg->get_vertex(g->sink->name);
fg->flow = 0.0;
Next, we try to find paths. We first set all the visited fields of
vertices to false by incrementing the num_dfs counter by 1.
This technique of using a counter for the visited field is a cute trick
that you should remember. When we visit a vertex we will set its visited
field to num_dfs. When we want to see if a vertex has been visited
we will compare its visited field with num_dfs. If the two
values are equal, then the vertex has been visited; otherwise it has not
been visited. If we did not use the num_dfs counter, then we
would have to go through all the vertices before each call to visit and
set their visited field to false. Note that we accomplish the same result
in one statement by incrementing num_dfs.
Now we call the visit() routine on the source. Here is visit():
bool visit(Vertex *v, Vertex *sink, dListIt does what you think it would do -- it recursively finds a path to the sink, and when the path is found, each edge is put into the list path. When we're done, path will contain a path from the source to the sink.*path) { Edge *e; if (v->visited == num_dfs) return false; if (v == sink) return true; v->visited = num_dfs; rbTree<string> *edges = &(v->edges); for (edges->first(); !edges->endOfList(); edges->next()) { e = (Edge *) edges->getVal().v; if (visit(e->v2, sink, path)) { path->prepend(e); return true; } } return false; }
Now, in our main(), we find a path, figure out its capacity (the capacity of the smallest edge), then add that to the flow graph, and remove it from the original graph (which thus becomes the residual graph). Here is the code (print statements are removed). Note that when flow is added to the flow graph, get_edge() is called to get the correct edge. This is nice because if the edge does not exist, it is created correctly.
ok = true;
while (ok) {
num_dfs++; // make all visited fields be false
/* Find a path */
path = new dList<Edge *>;
ok = visit(g->source, g->sink, path);
mincap = g->maxcap;
/* If a path is found */
if (ok) {
/* Find the maximum capacity of the path */
for (path->first(); !path->endOfList(); path->next()) {
e = path->get();
if (e->cap < mincap) mincap = e->cap;
}
/* Add that flow to the flow graph and subtract it from the
residual graph */
fg->flow += mincap;
for (path->first(); !path->endOfList(); path->next()) {
e = path->get();
e->cap -= mincap;
v1 = fg->get_vertex(e->v1->name);
v2 = fg->get_vertex(e->v2->name);
if (e->cap == 0.0) {
delete e;
}
e = fg->get_edge(v1, v2);
e->cap += mincap;
}
}
delete path;
}
printf("Total flow: %.3lf\n", fg->flow);
Now, I call augpath1 on
g1.nfg, and you'll see that it doesn't
find the maximum flow of 5:
UNIX> augpath1 g1.nfg no Total flow: 4.000 UNIX>If you call it with a last argument of yes, it prints out the paths found, plus the flow and residual graphs. The output of augpath1 on g1.nfg is in a1-g1.txt. As you can see, it finds the paths:
Path found -- s -> a -> b -> d -> t -- cap = 1.000 Path found -- s -> a -> c -> t -- cap = 2.000 Path found -- s -> b -> d -> t -- cap = 1.000Thus, it shows how the simplistic augmenting paths algorithm can fail. The file g2.nfg has the same graph with the vertices named differently so that the correct paths are found by augpath1:
UNIX> augpath1 g2.nfg no Total flow: 5.000 UNIX> augpath1 g2.nfg yes > a1-g2.nfg UNIX> grep Path a1-g2.nfg Path found -- s -> a -> b -> t -- cap = 2.000 Path found -- s -> c -> b -> t -- cap = 1.000 Path found -- s -> c -> d -> t -- cap = 2.000 UNIX>Make sure you understand this example and why it works for g2.nfg but not for g1.nfg.
/* Update the residual graph. For each edge on the path, reduce the
capacity along that edge by
mincap, and then add that capacity to the back edge */
for (path->first(); !path->endOfList(); path->next()) {
e = path->get();
e->cap -= mincap;
backedge = g->get_edge(e->v2, e->v1);
backedge->cap += mincap;
v1 = fg->get_vertex(e->v1->name);
v2 = fg->get_vertex(e->v2->name);
if (e->cap == 0.0) {
delete e;
}
/* Add flow to the flow graph */
e = fg->get_edge(v1, v2);
e->cap += mincap;
}
Note once again how get_edge() makes our life easier -- we don't
have to care whether the back edge actually exists -- if it doesn't,
get_edge() creates it.
Ok -- augpath2.cpp works now for both g1.nfg and g2.nfg. You can see how the backedge works in the output -- the third path (s->b->a->d->t) pushes flow back along the back edge.
UNIX> augpath2 g1.nfg no Total flow: 5.000 UNIX> augpath2 g1.nfg yes > a2-g1.txt UNIX> grep Path a2-g1.txt Path found -- s -> a -> b -> d -> t -- cap = 1.000 Path found -- s -> a -> c -> t -- cap = 2.000 Path found -- s -> b -> a -> d -> t -- cap = 1.000 Path found -- s -> b -> d -> t -- cap = 1.000 UNIX>
main(int argc, char **argv)
{
int nnodes;
int i, j;
double cap;
if (argc != 2) {
fprintf(stderr, "makerandom nnodes\n");
exit(1);
}
nnodes = atoi(argv[1]);
printf("SOURCE s\n");
printf("SINK t\n");
for (i = 0; i < nnodes; i++) {
j = lrand48()%10;
if (j < 4) {
printf("EDGE s n%05d %.3lf\n", i, drand48()*10.0);
} else if (j < 8) {
printf("EDGE n%05d t %.3lf\n", i, drand48()*10.0);
}
for (j = i+1; j < nnodes; j++) {
cap = drand48()*10.0;
if (drand48() < .5) {
printf("EDGE n%05d n%05d %.3lf\n", i, j, cap);
} else {
printf("EDGE n%05d n%05d %.3lf\n", j, i, cap);
}
}
}
}
What it does is put an edge in a random direction between every pair
of nodes other than the source or sink. It also has 40 percent of the
nodes connected to the source, and 40 percent connected to the sink.
Capacities are random numbers between 0 and 10.
I used this to create 10 random graph files with 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 nodes. These are in randxx.nfg.
I then timed augpath2 and other variations on these graphs (on cetus5a during the day). Interestingly, augpath2 gets pretty slow -- for rand100.nfg, it takes roughly a minute to find the flow.
The timings of the programs are graphed below, in both linear and log scale:
Interestingly, this improves performance a little for graphs under 40 nodes. For bigger graphs, the addition of all those zero capacity edges makes for extra overhead in visit(), since it now has to ignore zero capacity edges, and it actually degrades performance.
The second change was to eliminate the search for the flow graph edge. Instead, each edge in g has a pointer to its corresponding edge in the flow graph. This is in augpath4.cpp. As you see in the graphs above, again, this improves performance over augpath2 for smaller graphs (under 70 nodes, but as the graphs get bigger, those zero capacity edges become a problem).
bool visit(Vertex *v, Vertex *sink, dListOne nice thing about this is that if a vertex has 100 zero capacity edges, only the first gets tested in visit().*path) { Edge *e; if (v->visited == num_dfs) return false; if (v == sink) return true; v->visited = num_dfs; rbTree<double> *edges = &(v->e_sorted); for (edges->last(); !edges->endOfList(); edges->prev()) { e = (Edge *) edges->getVal().v; if (e->cap <= 0) return false; if (visit(e->v2, sink, path)) { path->prepend(e); return true; } } return false; }
The other important code here is when you change the capacity of an edge. When this happens, the edge needs to be removed from e_sorted before the change is made, and re-inserted afterwards. Here is that code:
// update the edge's capacity. In doing so, we must remove it
// from the vertex's sorted_edges list and re-insert it.
e->v1->remove_sorted_edge(e);
e->cap -= mincap;
e->v1->e_sorted.insert(e->cap, new_jval_v(e));
// update the backedge's capacity and update its position in
// the sorted edges list
e->backedge->v1->remove_sorted_edge(e->backedge);
e->backedge->cap += mincap; /* Note, now we don't need to find
the back edge */
e->backedge->v1->e_sorted.insert(e->backedge->cap,
new_jval_v(e->backedge));
/* Add flow to the flow graph */
e->fgedge->cap += mincap;
As you see, this improves performance in a huge way. If you grep
for paths, you'll see that augpath6 is very successful at
reducing the number of paths that you are looking for (7682 down
do 84):
UNIX> augpath2 rand100.nfg yes | grep Path | wc
7682 1106800 5426452
UNIX> augpath6 rand100.nfg yes | grep Path | wc
84 13962 68634
UNIX>
UNIX> augpath6 rand100.nfg yes | grep Path Path found -- a -> c -> d -> f -- cap = 3.000 Path found -- a -> b -> d -> f -- cap = 5.000 Path found -- a -> c -> e -> f -- cap = 3.000 Path found -- a -> b -> e -> f -- cap = 1.000 UNIX>
The first augmenting path chosen is not the one that increases the flow by the greatest amount. That honor belongs to the path a -> b -> d -> f, which increases the flow by 6.0.
The book indicates that a one-line change to Dijkstra's algorithm allows us to always find the largest augmenting path. That may be true, but the change is not obvious. We want to perform a priority-first search in which 1) each vertex keeps track of the maximum flow that can currently reach it from the source and, 2) we visit the next vertex with the maximum flow. When we visit the sink vertex we can stop because we know that all the remaining vertices have lesser flow from the source and thus a path through them cannot result in a greater flow to the sink. The modified algorithm is shown in augpath7.cpp (it modifies augpath4, not augpath6). We add two instance variables to the Vertex class: 1) a flow variable that keeps track of the largest possible flow from the source to this vertex, and 2) a path variable that stores the edge from the vertex's immediate predecessor on the augmenting path that allows this flow. We also maintain a max heap since we always want to remove the largest flow vertex from the heap. visit() now does the priority-first search just described:
bool visit(Vertex *source, Vertex *sink, dListCheck out augpath7 on g3.nfg and you can see that it now finds the largest augmenting path at each step and finishes one step sooner than augpath6:*path) { Edge *e; Vertex *currentVertex, *neighborVertex; flowHeap->insert(source->flow, new_jval_v(source)); while (!flowHeap->empty()) { currentVertex = (Vertex *)flowHeap->deleteMax().v; // for simplicity we update vertice's flow by re-inserting them into // the priority queue so there could be multiple instances of the // same vertex in the heap. If we remove an instance of a vertex that // has already been visited, we discard it if (currentVertex->visited == VISITED) continue; // compute the path to the sink if we've reached the sink by // walking backwards from the sink else if (currentVertex == sink) { for (Vertex *v = currentVertex; v != source; v = v->path->v1) path->prepend(v->path); return true; } currentVertex->visited = VISITED; rbTree *edges = &(currentVertex->edges); for (edges->first(); !edges->endOfList(); edges->next()) { e = (Edge *) edges->getVal().v; neighborVertex = e->v2; // if the neighbor vertex is not yet in the fringe, we re-set its // flow to 0 and mark it as being in the fringe if (neighborVertex->visited < FRINGE) { neighborVertex->flow = 0.0; neighborVertex->visited = FRINGE; } if ( neighborVertex->visited != VISITED ) { double candidateFlow = min(currentVertex->flow, e->cap); // if the current vertex can increase the flow to the neighbor // vertex, then note this fact by re-inserting the neighbor // vertex into the priority queue and by saving the edge from // current vertex to neighbor vertex if (candidateFlow > neighborVertex->flow) { neighborVertex->flow = candidateFlow; flowHeap->insert(candidateFlow, new_jval_v(neighborVertex)); neighborVertex->path = e; } } } } return false; }
UNIX> augpath6 rand100.nfg yes | grep Path Path found -- a -> b -> d -> f -- cap = 6.000 Path found -- a -> c -> e -> f -- cap = 3.000 Path found -- a -> c -> d -> b -> e -> f -- cap = 3.000UNIX>
The decrease in the number of steps is even more profound for larger graphs. If you grep for paths on rand100.nfg, you'll see that augpath7 halves the number of paths:
UNIX> augpath6 rand100.nfg yes | grep Path | wc
84 13962 68634
UNIX> augpath7 rand100.nfg yes | grep Path | wc
44 788 3324
The paths generated by augpath7 are also a fraction of the size of the paths generated by augpath6. As an example, here are the first augmenting paths generated by each algorithm:
augpath7: s -> n00013 -> n00080 -> n00084 -> n00081 -> n00059 -> n00096 -> n00046 -> n00041 -> n00036 -> n00002 -> n00032 -> n00094 -> n00009 -> n00037 -> t -- cap = 9.659 augpath6: s -> n00013 -> n00080 -> n00085 -> n00021 -> n00064 -> n00010 -> n00017 -> n00035 -> n00055 -> n00084 -> n00019 -> n00026 -> n00022 -> n00041 -> n00028 -> n00087 -> n00053 -> n00007 -> n00089 -> n00057 -> n00071 -> n00099 -> n00003 -> n00070 -> n00065 -> n00040 -> n00004 -> n00062 -> n00096 -> n00046 -> n00067 -> n00043 -> n00042 -> n00063 -> n00050 -> n00025 -> n00097 -> n00039 -> n00014 -> n00093 -> n00078 -> n00033 -> n00092 -> n00072 -> n00030 -> n00036 -> n00002 -> n00032 -> n00094 -> n00009 -> n00048 -> n00044 -> n00034 -> n00098 -> n00061 -> n00095 -> n00076 -> n00047 -> n00049 -> n00056 -> n00082 -> n00023 -> n00066 -> n00016 -> n00081 -> n00059 -> n00031 -> n00083 -> n00074 -> n00029 -> n00090 -> n00051 -> n00001 -> n00069 -> n00011 -> n00015 -> n00088 -> n00073 -> n00068 -> t -- cap = 8.269
The big question is, does augpath7 actually improve the speed of the algorithm? I tried timing both augpath6 and augpath7 on the 100 node random graph. augpath7 took 140ms and augpath6 took 360ms so on that particular graph, augpath7 is 2.5 times faster than augpath6 (these measurements were made several years after the graphed measurements so Moore's law had kicked in and reduced the running time of augpath6 considerably). In general this result should hold and augpath7 should be significantly faster than augpath6.