CS302 Lecture notes -- Network Flow

  • (slighly modified from Brad Vander Zanden's notes, which were adapted and expanded from Jim Plank's notes)
    Read the book on network flow. I'll hack up some of the algorithms. First, we must represent graphs with edge capacities in a file. We can do that by simply having edge specifications. These will be of the form:
    EDGE v1 v2 capacity
    
    The two vertices will have strings for names, and the capacity will be a double greater than zero.

    Also, we'll need to name the source and sink vertices:

    SOURCE vname
    SINK vname
    

    An example file is g1.nfg, which represents the graph of figure 9.39 in the book:

    EDGE s b 2
    EDGE s a 3
    EDGE b d 2
    EDGE d t 3
    EDGE a c 3
    EDGE c t 2
    EDGE a b 1
    EDGE a d 4
    SOURCE s
    SINK t
    

    Reading in the Graph

    Ok, once again, we need to figure out how to represent the graph, vertices and edges. I'm going to use the following class definitions:
    class Vertex {
      public:
        string name;
        rbTree<string> edges;
      public:
        Vertex(string n): name(n) {}
    };
    
    class Edge {
      public:
        string name;   // Names are "v1->name v2->name"
        Vertex *v1;
        Vertex *v2;
        double cap;
      public:
        Edge(Vertex *from, Vertex *to): v1(from), v2(to), cap(0.0) {
          name = v1->name;
          name += " ";
          name += v2->name;
        }
        ~Edge();
    }; 
    
    class Graph {
      public:
        rbTree<string> vertices;
        Vertex *source;
        Vertex *sink;
      public:
        Graph(): source(0), sink(0), maxcap(0.0) {}
        Vertex *get_vertex(string name);
        Edge *get_edge(Vertex *v1, Vertex *v2);
        void print_graph();
        void read_graph(string fname);
    };
    
    All rather self-explanatory. I give names to edges to help debugging. Note that the adjacency list for each vertex is sorted by the name of the vertex that it goes to. This makes it easy to find edges in log time if you have the names of the two vertices.

    Get_vertex() returns the vertex with the given name, creating it if it does not exist already. Similarly, Get_edge() returns the edge between the given vertices, creating it and giving it zero capacity if it does not exist already. These are straightforward:

    Vertex *Graph::get_vertex(string name)
    {
      Vertex *v;
    
      if (!vertices.find(name)) {
        v = new Vertex(name);
        vertices.insert(name, new_jval_v(v));
      } else {
        v = (Vertex *) vertices.getVal().v;
      }
      return v;
    }
    
    Edge *Graph::get_edge(Vertex *v1, Vertex *v2)
    {
      Edge *e;
    
      if (!(v1->edges.find(v2->name))) {
        e = new Edge(v1, v2);
        v1->edges.insert(v2->name, new_jval_v(e));
      } else {
        e = (Edge *) v1->edges.getVal().v;
      }
      return e;
    }
    
    Ok -- the program nfreader.cpp reads in one of these graphs and prints it out using the routine print_graph(). Here is example output when executed on g1.nfg:
    UNIX> nfreader g1.nfg
    Source s, Sink t
    
    Vertex a
      Edge to b.  Capacity   1.000
      Edge to c.  Capacity   3.000
      Edge to d.  Capacity   4.000
    
    Vertex b
      Edge to d.  Capacity   2.000
    
    Vertex c
      Edge to t.  Capacity   2.000
    
    Vertex d
      Edge to t.  Capacity   3.000
    
    Vertex s
      Edge to a.  Capacity   3.000
      Edge to b.  Capacity   2.000
    
    Vertex t
    
    UNIX> 
    

    Augpath1.cpp -- augmenting paths, try 1

    Ok -- our first pass at network flow is to write augpath1.cpp. First, we add a few fields to our classes (these are boldfaced):
    class Vertex {
      public:
        string name;
        rbTree<string> edges;
        int visited;
        ...
    };
    
    class Edge {
      public:
        string name;   // Names are "v1->name v2->name"
        Vertex *v1;
        Vertex *v2;
        double cap;
        ...
    }; 
    
    class Graph {
      public:
        double flow;
        double maxcap;
        rbTree<string> vertices;
        Vertex *source;
        Vertex *sink;
        ...
    };
    
    Now, after creating our main graph, we create a flow graph fg. This is a graph with the same vertices as our original graph, but no edges. We create it with the following code:
      for (g->vertices.first(); !g->vertices.endOfList(); g->vertices.next()) {
        v = (Vertex *) g->vertices.getVal().v;
        fg->get_vertex(v->name);
      }
      fg->source = fg->get_vertex(g->source->name);
      fg->sink = fg->get_vertex(g->sink->name);
      fg->flow = 0.0;
    
    Next, we try to find paths. We first set all the visited fields of vertices to false by incrementing the num_dfs counter by 1. This technique of using a counter for the visited field is a cute trick that you should remember. When we visit a vertex we will set its visited field to num_dfs. When we want to see if a vertex has been visited we will compare its visited field with num_dfs. If the two values are equal, then the vertex has been visited; otherwise it has not been visited. If we did not use the num_dfs counter, then we would have to go through all the vertices before each call to visit and set their visited field to false. Note that we accomplish the same result in one statement by incrementing num_dfs.

    Now we call the visit() routine on the source. Here is visit():

    bool visit(Vertex *v, Vertex *sink, dList *path)
    {
      Edge *e;
    
      if (v->visited == num_dfs) return false;
      if (v == sink) return true;
      v->visited = num_dfs;
    
      rbTree<string> *edges = &(v->edges);
      for (edges->first(); !edges->endOfList(); edges->next()) {  
        e = (Edge *) edges->getVal().v;
        if (visit(e->v2, sink, path)) {
          path->prepend(e);
          return true;
        }
      }
      return false;
    }
    
    It does what you think it would do -- it recursively finds a path to the sink, and when the path is found, each edge is put into the list path. When we're done, path will contain a path from the source to the sink.

    Now, in our main(), we find a path, figure out its capacity (the capacity of the smallest edge), then add that to the flow graph, and remove it from the original graph (which thus becomes the residual graph). Here is the code (print statements are removed). Note that when flow is added to the flow graph, get_edge() is called to get the correct edge. This is nice because if the edge does not exist, it is created correctly.

      ok = true;
      while (ok) {
        num_dfs++;  // make all visited fields be false
    
        /* Find a path */
    
        path = new dList<Edge *>;
        ok = visit(g->source, g->sink, path);
        mincap = g->maxcap;
    
        /* If a path is found */
        if (ok) {
    
          /* Find the maximum capacity of the path */
          for (path->first(); !path->endOfList(); path->next()) {
            e = path->get();
            if (e->cap < mincap) mincap = e->cap;
          }
    
          /* Add that flow to the flow graph and subtract it from the 
             residual graph */
    
          fg->flow += mincap;
          for (path->first(); !path->endOfList(); path->next()) {
            e = path->get();
            e->cap -= mincap;
            v1 = fg->get_vertex(e->v1->name);
            v2 = fg->get_vertex(e->v2->name);
            if (e->cap == 0.0) {
              delete e;
            }
            e = fg->get_edge(v1, v2);
            e->cap += mincap;
          }
        }
        delete path;
      }
      printf("Total flow: %.3lf\n", fg->flow);
    
    Now, I call augpath1 on g1.nfg, and you'll see that it doesn't find the maximum flow of 5:
    UNIX> augpath1 g1.nfg no
    Total flow: 4.000
    UNIX> 
    
    If you call it with a last argument of yes, it prints out the paths found, plus the flow and residual graphs. The output of augpath1 on g1.nfg is in a1-g1.txt. As you can see, it finds the paths:
    Path found -- s -> a -> b -> d -> t -- cap = 1.000
    Path found -- s -> a -> c -> t -- cap = 2.000
    Path found -- s -> b -> d -> t -- cap = 1.000
    
    Thus, it shows how the simplistic augmenting paths algorithm can fail. The file g2.nfg has the same graph with the vertices named differently so that the correct paths are found by augpath1:
    UNIX> augpath1 g2.nfg no
    Total flow: 5.000
    UNIX> augpath1 g2.nfg yes > a1-g2.nfg
    UNIX> grep Path a1-g2.nfg
    Path found -- s -> a -> b -> t -- cap = 2.000
    Path found -- s -> c -> b -> t -- cap = 1.000
    Path found -- s -> c -> d -> t -- cap = 2.000
    UNIX> 
    
    Make sure you understand this example and why it works for g2.nfg but not for g1.nfg.

    Augpath2.cpp -- fixing the algorithm with back edges

    As explained in the book, you can fix the algorithm by adding back edges to the residual graph. When ever you subtract flow from an edge in the residual graph, you add it to the back edge. Augpath2.cpp contains straightforward code to do that:
          /* Update the residual graph. For each edge on the path, reduce the 
               capacity along that edge by 
               mincap, and then add that capacity to the back edge */
    
          for (path->first(); !path->endOfList(); path->next()) {
            e = path->get();
            e->cap -= mincap;
            backedge = g->get_edge(e->v2, e->v1);
            backedge->cap += mincap;
    
            v1 = fg->get_vertex(e->v1->name);
            v2 = fg->get_vertex(e->v2->name);
            if (e->cap == 0.0) {
              delete e;
            }
    
            /* Add flow to the flow graph */
            e = fg->get_edge(v1, v2);
            e->cap += mincap;
          }
    
    Note once again how get_edge() makes our life easier -- we don't have to care whether the back edge actually exists -- if it doesn't, get_edge() creates it.

    Ok -- augpath2.cpp works now for both g1.nfg and g2.nfg. You can see how the backedge works in the output -- the third path (s->b->a->d->t) pushes flow back along the back edge.

    UNIX> augpath2 g1.nfg no
    Total flow: 5.000
    UNIX> augpath2 g1.nfg yes > a2-g1.txt
    UNIX> grep Path a2-g1.txt
    Path found -- s -> a -> b -> d -> t -- cap = 1.000
    Path found -- s -> a -> c -> t -- cap = 2.000
    Path found -- s -> b -> a -> d -> t -- cap = 1.000
    Path found -- s -> b -> d -> t -- cap = 1.000
    UNIX> 
    

    Speed

    Augpath2.cpp is a fine piece of code, but it has two big inefficiencies:
    1. Finding backedges could be done in constant time, but instead it does a red-black-tree find operation.
    2. Finding flow-graph edges could be done in constant time, but instead it does a red-black-tree find operation.
    Fixing these is not too hard, but it makes the code yucky, and interestingly, less efficient in some cases. First, to test speed, I had to create some big graphs. I did this by writing makerandom.cpp, which creates a random graph with a source, a sink, and a given number of other nodes:
    main(int argc, char **argv)
    {
      int nnodes;
      int i, j;
      double cap;
    
      if (argc != 2) {
        fprintf(stderr, "makerandom nnodes\n");
        exit(1);
      }
    
      nnodes = atoi(argv[1]);
    
      printf("SOURCE s\n");
      printf("SINK t\n");
    
      for (i = 0; i < nnodes; i++) {
        j = lrand48()%10;
        if (j < 4) {
          printf("EDGE s n%05d %.3lf\n", i, drand48()*10.0);
        } else if (j < 8) {
          printf("EDGE n%05d t %.3lf\n", i, drand48()*10.0);
        }
    
        for (j = i+1; j < nnodes; j++) {
          cap = drand48()*10.0;
          if (drand48() < .5) {
            printf("EDGE n%05d n%05d %.3lf\n", i, j, cap);
          } else {
            printf("EDGE n%05d n%05d %.3lf\n", j, i, cap);
          }
        }
      }
    }
    
    What it does is put an edge in a random direction between every pair of nodes other than the source or sink. It also has 40 percent of the nodes connected to the source, and 40 percent connected to the sink. Capacities are random numbers between 0 and 10.

    I used this to create 10 random graph files with 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 nodes. These are in randxx.nfg.

    I then timed augpath2 and other variations on these graphs (on cetus5a during the day). Interestingly, augpath2 gets pretty slow -- for rand100.nfg, it takes roughly a minute to find the flow.

    The timings of the programs are graphed below, in both linear and log scale:


    I went about eliminating the two inefficiencies. The first change is in augpath3.cpp, where I have a pointer to each edge's back edge. These are only used in the main graph, and are created in read_graph(). To make my life simpler, I don't delete edges anymore -- instead, all edges have back edges, perhaps with capacity zero. This mandates a change to visit() because it should not search on zero capacity edges. Now, the code to set backedge capacities does not need to actually find the backedge.

    Interestingly, this improves performance a little for graphs under 40 nodes. For bigger graphs, the addition of all those zero capacity edges makes for extra overhead in visit(), since it now has to ignore zero capacity edges, and it actually degrades performance.

    The second change was to eliminate the search for the flow graph edge. Instead, each edge in g has a pointer to its corresponding edge in the flow graph. This is in augpath4.cpp. As you see in the graphs above, again, this improves performance over augpath2 for smaller graphs (under 70 nodes, but as the graphs get bigger, those zero capacity edges become a problem).


    Augpath6.cpp -- The greedy algorithm

    The first algorithm that the book tries is a greedy algorithm that does a depth-first search in order of edge capacities (remember that this algorithm is not optimal but it still shows how a greedy algorithm can significantly improve performance). In other words, instead of sorting the edges by alphabetical order, as done in the above programs, you sort them by capacity, and in visit() you visit the larger edges first. This is done in augpath6.cpp. We maintain two adjacency trees in each vertex -- one sorted in alphabetical order, and one sorted by capacity. Then, visit() traverses the sorted tree in reverse order:
    bool visit(Vertex *v, Vertex *sink, dList *path)
    {
      Edge *e;
    
      if (v->visited == num_dfs) return false;
      if (v == sink) return true;
      v->visited = num_dfs;
    
      rbTree<double> *edges = &(v->e_sorted);
      for (edges->last(); !edges->endOfList(); edges->prev()) {  
        e = (Edge *) edges->getVal().v;
        if (e->cap <= 0) return false;
        if (visit(e->v2, sink, path)) {
          path->prepend(e);
          return true;
        }
      }
      return false;
    }
    
    One nice thing about this is that if a vertex has 100 zero capacity edges, only the first gets tested in visit().

    The other important code here is when you change the capacity of an edge. When this happens, the edge needs to be removed from e_sorted before the change is made, and re-inserted afterwards. Here is that code:

    	// update the edge's capacity. In doing so, we must remove it
    	// from the vertex's sorted_edges list and re-insert it.
    	e->v1->remove_sorted_edge(e);
            e->cap -= mincap;
    	e->v1->e_sorted.insert(e->cap, new_jval_v(e));
    
    	// update the backedge's capacity and update its position in
    	// the sorted edges list
    	e->backedge->v1->remove_sorted_edge(e->backedge);
            e->backedge->cap += mincap;        /* Note, now we don't need to find
                                                  the back edge */
    	e->backedge->v1->e_sorted.insert(e->backedge->cap, 
    					 new_jval_v(e->backedge));
    
            /* Add flow to the flow graph */
            e->fgedge->cap += mincap;
    
    As you see, this improves performance in a huge way. If you grep for paths, you'll see that augpath6 is very successful at reducing the number of paths that you are looking for (7682 down do 84):
    UNIX> augpath2 rand100.nfg yes | grep Path | wc
        7682 1106800 5426452
    UNIX> augpath6 rand100.nfg yes | grep Path | wc
          84   13962   68634
    UNIX>
    

    Augpath5 -- some bad code

    Augpath5.cpp is just like augpath6.cpp, only it traverses the e_sorted tree in the wrong order in visit(). Obviously, a bad idea, as you can see from the graphs. This is just an example where a very tiny change in the code can affect performance drastically.


    Augpath7 -- the optimal algorithm

    As well as augpath6 does, we can do better still. The greedy algorithm in Augpath6 does not always find the augmenting path that guarantees the largest increase in flow, and we can do better if we always choose this path. To see that the greedy algorithm does not always find the augmenting path with the largest flow, check out g3.nfg. When we run augpath6 on it, we get:

    UNIX> augpath6 rand100.nfg yes | grep Path
    Path found -- a -> c -> d -> f -- cap = 3.000
    Path found -- a -> b -> d -> f -- cap = 5.000
    Path found -- a -> c -> e -> f -- cap = 3.000
    Path found -- a -> b -> e -> f -- cap = 1.000
    UNIX> 
    

    The first augmenting path chosen is not the one that increases the flow by the greatest amount. That honor belongs to the path a -> b -> d -> f, which increases the flow by 6.0.

    The book indicates that a one-line change to Dijkstra's algorithm allows us to always find the largest augmenting path. That may be true, but the change is not obvious. We want to perform a priority-first search in which 1) each vertex keeps track of the maximum flow that can currently reach it from the source and, 2) we visit the next vertex with the maximum flow. When we visit the sink vertex we can stop because we know that all the remaining vertices have lesser flow from the source and thus a path through them cannot result in a greater flow to the sink. The modified algorithm is shown in augpath7.cpp (it modifies augpath4, not augpath6). We add two instance variables to the Vertex class: 1) a flow variable that keeps track of the largest possible flow from the source to this vertex, and 2) a path variable that stores the edge from the vertex's immediate predecessor on the augmenting path that allows this flow. We also maintain a max heap since we always want to remove the largest flow vertex from the heap. visit() now does the priority-first search just described:

    bool visit(Vertex *source, Vertex *sink, dList *path)
    {
      Edge *e;
      Vertex *currentVertex, *neighborVertex;
    
      flowHeap->insert(source->flow, new_jval_v(source));
    
      while (!flowHeap->empty()) {
        currentVertex = (Vertex *)flowHeap->deleteMax().v;
    
        // for simplicity we update vertice's flow by re-inserting them into
        // the priority queue so there could be multiple instances of the
        // same vertex in the heap. If we remove an instance of a vertex that
        // has already been visited, we discard it
        if (currentVertex->visited == VISITED)
          continue;
    
        // compute the path to the sink if we've reached the sink by 
        // walking backwards from the sink
        else if (currentVertex == sink) {
          for (Vertex *v = currentVertex; v != source; v = v->path->v1)
    	path->prepend(v->path);
          return true;
        }
    
        currentVertex->visited = VISITED;
    
        rbTree *edges = &(currentVertex->edges);
        for (edges->first(); !edges->endOfList(); edges->next()) {  
          e = (Edge *) edges->getVal().v;
          neighborVertex = e->v2;
    
          // if the neighbor vertex is not yet in the fringe, we re-set its
          // flow to 0 and mark it as being in the fringe
          if (neighborVertex->visited < FRINGE) {
    	neighborVertex->flow = 0.0;
    	neighborVertex->visited = FRINGE;
          }
          if ( neighborVertex->visited != VISITED ) {
    	double candidateFlow = min(currentVertex->flow, e->cap);
    
    	// if the current vertex can increase the flow to the neighbor
    	// vertex, then note this fact by re-inserting the neighbor
    	// vertex into the priority queue and by saving the edge from
    	// current vertex to neighbor vertex
    	if (candidateFlow > neighborVertex->flow) {
    	  neighborVertex->flow = candidateFlow;
    	  flowHeap->insert(candidateFlow, new_jval_v(neighborVertex));
    	  neighborVertex->path = e;
    	}
          }
        }
      }
      return false;
    }
    
    Check out augpath7 on g3.nfg and you can see that it now finds the largest augmenting path at each step and finishes one step sooner than augpath6:

    UNIX> augpath6 rand100.nfg yes | grep Path
    Path found -- a -> b -> d -> f -- cap = 6.000
    Path found -- a -> c -> e -> f -- cap = 3.000
    Path found -- a -> c -> d -> b -> e -> f -- cap = 3.000
    
    UNIX>

    The decrease in the number of steps is even more profound for larger graphs. If you grep for paths on rand100.nfg, you'll see that augpath7 halves the number of paths:

    UNIX> augpath6 rand100.nfg yes | grep Path | wc
          84   13962   68634
    UNIX> augpath7 rand100.nfg yes | grep Path | wc
          44     788    3324
    

    The paths generated by augpath7 are also a fraction of the size of the paths generated by augpath6. As an example, here are the first augmenting paths generated by each algorithm:

    augpath7:
    s -> n00013 -> n00080 -> n00084 -> n00081 -> n00059 -> n00096 -> n00046 
      -> n00041 -> n00036 -> n00002 -> n00032 -> n00094 -> n00009 -> n00037 
      -> t -- cap = 9.659
    
    augpath6:
    s -> n00013 -> n00080 -> n00085 -> n00021 -> n00064 -> n00010 -> n00017 
      -> n00035 -> n00055 -> n00084 -> n00019 -> n00026 -> n00022 -> n00041 
      -> n00028 -> n00087 -> n00053 -> n00007 -> n00089 -> n00057 -> n00071 
      -> n00099 -> n00003 -> n00070 -> n00065 -> n00040 -> n00004 -> n00062 
      -> n00096 -> n00046 -> n00067 -> n00043 -> n00042 -> n00063 -> n00050 
      -> n00025 -> n00097 -> n00039 -> n00014 -> n00093 -> n00078 -> n00033 
      -> n00092 -> n00072 -> n00030 -> n00036 -> n00002 -> n00032 -> n00094 
      -> n00009 -> n00048 -> n00044 -> n00034 -> n00098 -> n00061 -> n00095 
      -> n00076 -> n00047 -> n00049 -> n00056 -> n00082 -> n00023 -> n00066 
      -> n00016 -> n00081 -> n00059 -> n00031 -> n00083 -> n00074 -> n00029 
      -> n00090 -> n00051 -> n00001 -> n00069 -> n00011 -> n00015 -> n00088 
      -> n00073 -> n00068 -> t -- cap = 8.269
    

    The big question is, does augpath7 actually improve the speed of the algorithm? I tried timing both augpath6 and augpath7 on the 100 node random graph. augpath7 took 140ms and augpath6 took 360ms so on that particular graph, augpath7 is 2.5 times faster than augpath6 (these measurements were made several years after the graphed measurements so Moore's law had kicked in and reduced the running time of augpath6 considerably). In general this result should hold and augpath7 should be significantly faster than augpath6.