
CS302 Lecture Notes - Network Flow

Lecture 2: Programming the Ford-Fulkerson Algorithm (with DFS)

James S. Plank
Original Notes: April 2, 2008.
Major Overhaul: November, 2014.
Latest revision: October, 2020.
Directory: /home/plank/cs302/Notes/Netflow-All

In this lecture, we are going to program the generic Ford-Fulkerson algorithm for network flow. We will represent directed, weighted graphs in text files, read them into a graph data structure, then find the maximum flow. We won't put any smarts into finding augmenting paths. We will simply use DFS.

Graph Representation and Generating Random Graphs

We are going to represent graphs with a very flexible file format. The file will contain a stream of words, which should be in the following format:

EDGE n1 n2 capacity -- This specifies that there is an edge from node n1 to node n2 with the given capacity. Capacities must be positive integers. If an edge is specified multiple times, then the capacities are added. It is fine to have edges in both directions between a pair of nodes.
SOURCE name -- This specifies the name of the source node.
SINK name -- This specifies the name of the sink node.

There must be a single source and sink.

So, for example, below are two files that represent the graph from the first set of network flow lecture notes. They differ in the order in which they specify edges. Thus, the graph is the same, but the order of the adjacency lists will differ.

g1.txt:

SOURCE S
SINK T
EDGE A B 5
EDGE A T 5
EDGE B T 8
EDGE C B 13
EDGE C D 10
EDGE D T 12
EDGE S A 10
EDGE S C 14

g1a.txt:

SOURCE S
SINK T
EDGE D T 12
EDGE A T 5
EDGE S C 14
EDGE S A 10
EDGE C D 10
EDGE B T 8
EDGE A B 5
EDGE C B 13
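Because the format is just a stream of words, line breaks and edge order don't matter, and repeated edges accumulate. Here is a minimal C++ sketch of that reading rule. It assumes a hypothetical GraphSpec struct and ParseGraph() function, neither of which is in the notes' code. It is also stricter than the reader shown later, which silently skips unrecognized words; this one throws std::runtime_error instead.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical parsed form of a graph file (not a type from the notes).
struct GraphSpec {
  std::string source, sink;
  std::map<std::pair<std::string, std::string>, int> cap;  // (n1, n2) -> total capacity
};

// Reads the word stream. operator>> skips all whitespace, so line breaks are
// irrelevant. Repeated EDGEs on the same ordered pair have their capacities
// added; A->B and B->A stay separate entries.
GraphSpec ParseGraph(std::istream &in) {
  GraphSpec g;
  std::string word;
  while (in >> word) {
    if (word == "SOURCE" || word == "SINK") {
      std::string name;
      if (!(in >> name)) throw std::runtime_error("missing node name");
      std::string &slot = (word == "SOURCE") ? g.source : g.sink;
      if (!slot.empty()) throw std::runtime_error("duplicate " + word);
      slot = name;
    } else if (word == "EDGE") {
      std::string n1, n2;
      int c;
      if (!(in >> n1 >> n2 >> c) || c <= 0) throw std::runtime_error("bad EDGE");
      g.cap[std::make_pair(n1, n2)] += c;
    } else {
      throw std::runtime_error("unknown word: " + word);
    }
  }
  if (g.source.empty() || g.sink.empty())
    throw std::runtime_error("need exactly one SOURCE and one SINK");
  return g;
}
```

Fed g1.txt and g1a.txt, this produces identical capacity maps, which is exactly the sense in which the two files describe the same graph.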

I also have a program that generates random graphs, called src/makerandom.cpp. This program takes one or two arguments. The first argument is the number of nodes. The second argument is an optional seed to srand48(). The program creates a random graph with one source, one sink, and the given number of other (interior) nodes. There is an edge between every pair of interior nodes, in a random direction, with a random integer capacity between 1 and 10,000. In addition, each interior node gets an edge from the source with 40% probability, and an edge to the sink with 40% probability. Thus, this is a pretty dense graph which should be a challenge to our network flow programs.
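To make that description concrete, here is a hedged sketch of a generator with the same shape. It is not src/makerandom.cpp: MakeRandomGraph() is a hypothetical name, std::mt19937 stands in for srand48(), and the zero-padded n00, n01, ... naming is an assumption based on g5.txt below.

```cpp
#include <cassert>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Hypothetical makerandom-style generator: n interior nodes, one edge per
// interior pair in a random direction, and 40% chances of source/sink edges.
std::string MakeRandomGraph(int n, unsigned seed) {
  assert(n >= 0);
  std::mt19937 rng(seed);                                   // stand-in for srand48()
  std::uniform_int_distribution<int> cap(1, 10000);         // capacities 1..10,000
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  int width = (int)std::to_string(n > 1 ? n - 1 : 0).size();
  if (width < 2) width = 2;                                 // n00, n01, ... (assumed)
  auto name = [&](int i) {
    std::ostringstream os;
    os << 'n' << std::setw(width) << std::setfill('0') << i;
    return os.str();
  };

  std::ostringstream out;
  out << "SOURCE s\nSINK t\n";
  for (int i = 0; i < n; i++) {
    if (coin(rng) < 0.4) out << "EDGE s " << name(i) << ' ' << cap(rng) << '\n';
    if (coin(rng) < 0.4) out << "EDGE " << name(i) << " t " << cap(rng) << '\n';
    for (int j = i + 1; j < n; j++) {
      // Exactly one edge per interior pair, pointing in a random direction.
      if (coin(rng) < 0.5) out << "EDGE " << name(i) << ' ' << name(j);
      else                 out << "EDGE " << name(j) << ' ' << name(i);
      out << ' ' << cap(rng) << '\n';
    }
  }
  return out.str();
}
```

With n interior nodes there are always n(n-1)/2 interior edges, so the graph is dense no matter which source and sink edges the coin flips produce.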

Below is an example of a graph made with makerandom 5. I think we can all agree that finding the maximum flow through this graph will be a bit of a challenge. To help you, I've colored the edges in the minimum cut red:

g5.txt:

SOURCE s
SINK t
EDGE n00 t 4923
EDGE n01 n00 8824
EDGE n00 n02 6932
EDGE n00 n03 6518
EDGE n00 n04 6183
EDGE n01 t 8471
EDGE n02 n01 4929
EDGE n03 n01 5566
EDGE n01 n04 6661
EDGE s n02 6263
EDGE n02 n03 0741
EDGE n04 n02 5840
EDGE n04 n03 4417
EDGE s n04 8033

And here's a graph showing the maximum flow of 10,087.

The Implementations

I have eight network flow implementations. I will only explain a few of them in detail; however, I will explain how they each differ, and I will do a performance comparison.

These are written in an "old-school" manner, which will be useful for you to read over so you can prepare for CS360. I may update them someday, but not this year...

The Classes in the Program

I will start my implementation with the program src/netflow_skeleton.cpp. This sets up the classes and reads the graph. We will modify it to calculate the maximum flow.

My program has three basic classes: Node, Edge and Graph. Let's start with the Node and Edge classes:

class Node {
  public: 
    string name;                    // The node's name
    vector <class Edge *> adj;      // The node's adjacency list in the residual graph.
    int visited;                    // A visited field for depth-first search.
};

class Edge {
  public:
    void Print() const;             // This prints the edge's name and residual flow.
    string name;                    // The edge's name (to store in the edge map).
    Node *n1;                       // The "from" node
    Node *n2;                       // The "to" node
    Edge *reverse;                  // The reverse edge, for processing the residual graph
    int original;                   // The original edge capacity.
    int residual;                   // The edge capacity in the residual graph.
};

Nodes are pretty simple. They have names, and adjacency lists. For now, we will represent the adjacency lists with vectors. Nodes also have a visited field, which helps us when we use DFS to find paths.

Edges are a little meatier. They have names, and pointers to the nodes that they connect. We store their original capacities and their residual capacities. If we wanted to maintain flow, we could do that too. However, I'm not doing that, simply to keep the program cleaner. When you do your lab, you're going to need to maintain the flow.

Each edge has a pointer to its reverse edge. This simplifies the task of processing the residual graph. I'm going to make sure that each edge always has a reverse edge, regardless of whether that reverse edge really exists in the residual graph. If the reverse edge really doesn't exist, I'm going to create it anyway, and give it an original and residual capacity of zero. When I "remove" an edge from the residual graph, I simply set its residual capacity to zero.
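The payoff of always having a reverse edge is that pushing flow is just two updates, and the sum of an edge's residual and its reverse's residual never changes. Here is a minimal sketch using a hypothetical ResEdge struct that keeps only the fields involved; MakePair(), Push() and FlowOn() are likewise illustrative names, not functions from the notes.

```cpp
#include <cassert>

// Trimmed-down stand-in for the notes' Edge (hypothetical).
struct ResEdge {
  int original;      // original capacity
  int residual;      // capacity in the residual graph
  ResEdge *reverse;  // always non-NULL once paired
};

// Creates a forward edge and its zero-capacity reverse, linked to each other.
void MakePair(ResEdge &fwd, ResEdge &rev, int cap) {
  fwd = {cap, cap, &rev};
  rev = {0, 0, &fwd};
}

// Pushes f units along e. "Removing" e from the residual graph just means its
// residual reaches zero; nothing is deleted.
void Push(ResEdge *e, int f) {
  assert(f > 0 && f <= e->residual);
  e->residual -= f;
  e->reverse->residual += f;
}

// Flow on an original edge whose reverse had capacity zero.
int FlowOn(const ResEdge &fwd) { return fwd.original - fwd.residual; }
```

Pushing along the reverse edge cancels flow on the forward edge, which is how the residual graph lets a later path undo an earlier choice. FlowOn() assumes the reverse edge started at zero; if the file has edges in both directions, original - residual can go negative, meaning net flow runs the other way.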

In this initial implementation, I'm going to have all edges go onto their nodes' adjacency lists, regardless of whether their capacities are zero. That makes the programming easier to begin with. We'll fix this later.

Finally, each edge has a name and a Print() method, which is useful, because there are several places where we print out edges.

You'll note, we're keeping pointers to nodes and edges. This is because there are several different places where we want to access nodes and edges, and to avoid making copies, we have exactly one copy of each node and edge, and we manipulate pointers to them.

Here is the definition of the Graph class:

class Graph {
  public:
     Graph();
     ~Graph();
     void Print() const;                 /* Print the graph to stdout. */
     string Verbose;                     /* G = print graph at each step. 
                                            P = print augmenting paths at each step.
                                            B = Basic: print graph at beginning and end. */
     Node *Get_Node(const string &s);    /* Create a node, or return it if already created. */
     Edge *Get_Edge(Node *n1, Node *n2); /* Create an edge or return it if already created. */
     int MaxFlow();                      /* Do the max flow calculation for the graph. */
     int Find_Augmenting_Path();         /* Find and process an augmenting path. */
  
     bool DFS(Node *n);                  /* DFS to find an augmenting path - returns success. */
     vector <Edge *> Path;               /* The augmenting path. */

     int NPaths;                         /* Number of paths for the calculation. */
     int MaxCap;                         /* Maximum edge capacity */

     Node *Source;                       /* Source and sink node. */
     Node *Sink;

     vector <Node *> Nodes;              /* All of the nodes. */
     vector <Edge *> Edges;              /* All of the edges. */

     map <string, Node *> N_Map;         /* The nodes, keyed by name. */
     map <string, Edge *> E_Map;         /* The edges, keyed by name. */
};

There's quite a bit here. Let's get the simple things out of the way. There is a constructor, a destructor, and a Print() method. I keep a string Verbose, which lets me adjust what I print:

If the string has a G, then I'm going to print the residual graph before finding each augmenting path.
If the string has a P, then I'm going to print each augmenting path.
If the string has a B, then I will print the graph at the beginning and the end.
I will always print the maximum flow and the number of augmenting paths found at the end of the program.

I have variables to store the number of paths, the source, the sink, and MaxCap, which is meant to be larger than any residual capacity an edge can reach. MaxCap is a sentinel: when I compute the flow along a path, I start with MaxCap and take the minimum over the path's edges.

The next four fields are how I hold nodes and edges. I have a vector, which enables me to easily access every node and edge, and I have a map, which I only use when I'm reading in the file. The map lets me look up nodes and edges by name. (Note in 2023 -- When I shifted to C++-11, I should have changed this map to an unordered_map. So, if you were thinking "why isn't he using an unordered_map?" you are thinking correctly).

Get_Node() and Get_Edge() are really convenient procedures. They look up the node or edge in the relevant map, and if it's there, they return a pointer to it. If it's not there, then they create it, insert it into the map, and return it.
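Following up on the unordered_map note, here is a hedged sketch of get-or-create that makes a single emplace() call, rather than find() followed by operator[] (which searches the map twice). NamedNode and NodeTable are hypothetical names, and ownership through std::unique_ptr replaces the explicit deletes in the destructor shown later.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct NamedNode {  // hypothetical, trimmed-down Node
  std::string name;
};

class NodeTable {
 public:
  // Returns the node called s, creating it on first use.
  NamedNode *Get(const std::string &s) {
    auto res = map_.emplace(s, nullptr);  // inserts only if s is absent
    if (res.second) {                     // s was absent: create the node
      nodes_.push_back(std::unique_ptr<NamedNode>(new NamedNode{s}));
      res.first->second = nodes_.back().get();
    }
    return res.first->second;
  }
  std::size_t size() const { return nodes_.size(); }

 private:
  std::unordered_map<std::string, NamedNode *> map_;  // lookup by name
  std::vector<std::unique_ptr<NamedNode>> nodes_;     // owns every node
};
```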

The last three methods and Path are for doing the network flow calculation.

Reading in the Graph

As always, reading in the graph is a bit of a pain. First, here are Get_Node() and Get_Edge(). They are straightforward:

Node *Graph::Get_Node(const string &s)
{
  Node *n;

  if (N_Map.find(s) != N_Map.end()) return N_Map[s];

  n = new Node;
  n->name = s;
  n->visited = 0;
  N_Map[s] = n;
  Nodes.push_back(n);
  return n;
}

Edge *Graph::Get_Edge(Node *n1, Node *n2)
{
  string en;
  Edge *e;

  en = n1->name + "->";
  en += n2->name;

  if (E_Map.find(en) != E_Map.end()) return E_Map[en];
  e = new Edge;
  e->name = en;
  E_Map[en] = e;
  Edges.push_back(e);
  e->original = 0;
  e->residual = 0;
  e->n1 = n1;
  e->n2 = n2;
  e->reverse = NULL;
  return e;
}

The only real subtlety here is that I don't create reverse edges in Get_Edge(). Instead, I do it in the graph constructor.

The graph constructor is also straightforward. I'll include the code here, but I'm not going to explain it much:

Graph::Graph()
{
  string s, nn, nn2, en;
  int cap;
  Node *n1, *n2;
  Edge *e, *r;

  MaxCap = 0;
  Source = NULL;
  Sink = NULL;

  while (cin >> s) {
    if (s == "SOURCE" || s == "SINK") {   /* Set the source or sink (error checking for duplicates). */
      if (!(cin >> nn)) { cerr << "Missing node name after " << s << ".\n"; exit(1); }
      n1 = Get_Node(nn);
      if (s == "SOURCE") {
        if (Source != NULL) { cerr << "Two sources.\n"; exit(1); }
        Source = n1;
      } else {
        if (Sink != NULL) { cerr << "Two sinks.\n"; exit(1); }
        Sink = n1;
      }
                                    /* When you read an edge, create it, potentially creating */
    } else if (s == "EDGE") {       /* the nodes, set the capacities and reverse edges. */
      if (!(cin >> nn >> nn2 >> cap)) { cerr << "Bad EDGE specification.\n"; exit(1); }
      if (cap <= 0) { cerr << "Capacities must be positive.\n"; exit(1); }

      n1 = Get_Node(nn);
      n2 = Get_Node(nn2);
      e = Get_Edge(n1, n2);
      e->original += cap;
      e->residual += cap;
      if (e->residual >= MaxCap) MaxCap = e->residual + 1;   /* Track the largest capacity, plus one. */
       
      if (e->reverse == NULL) {  /* This means that the edge was just created */
        r = Get_Edge(n2, n1);
        e->reverse = r;
        r->reverse = e;
        n1->adj.push_back(e);
        n2->adj.push_back(r);
      }
    }
  }
  MaxCap *= 2;        /* A residual can reach an edge's capacity plus its reverse's capacity. */
  if (Source == NULL) { cerr << "No Source.\n"; exit(1); }
  if (Sink == NULL) { cerr << "No Sink.\n"; exit(1); }
}

When I call Get_Edge(), I check reverse to see if it's NULL. If it is, then I have just created the edge for the first time. That is when I create the reverse edge, and put both of them on their respective adjacency lists.

I also have a destructor, which deletes all of the nodes and edges. This is good form, because if I create an instance of a graph and then delete it, all of the memory associated with the graph will be released:

Graph::~Graph()
{
  int i;

  for (i = 0; i < Nodes.size(); i++) delete Nodes[i];
  for (i = 0; i < Edges.size(); i++) delete Edges[i];
}

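You'll note that I don't have a copy constructor or assignment operator. I probably should declare them and have them throw errors, because the graph holds pointers, which means that the compiler-generated defaults will not work: a copy would share the same Node and Edge objects, and both destructors would delete them.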
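In C++11 and later, a simpler option than throwing is to delete the copy operations, so an accidental copy fails to compile. Here is a sketch on a hypothetical Owner class that, like Graph, owns raw pointers. For Graph itself, the two declarations would be Graph(const Graph &) = delete; and Graph &operator=(const Graph &) = delete;.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

struct Item { int value; };  // hypothetical stand-in for Node or Edge

// Owns raw pointers, like Graph. A compiler-generated copy would duplicate the
// pointers, and both destructors would delete the same objects.
class Owner {
 public:
  Owner() = default;
  Owner(const Owner &) = delete;             // copying is now a compile error
  Owner &operator=(const Owner &) = delete;  // so is copy assignment
  ~Owner() {
    for (std::size_t i = 0; i < items.size(); i++) delete items[i];
  }
  std::vector<Item *> items;
};
```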

The Print() method prints each node and its adjacency list, showing the name and residual capacity of each edge. And main() simply reads the graph and prints it if Verbose contains a 'B'. Here are both:

void Edge::Print() const
{
  printf("[%s:%d]", name.c_str(), residual);
}

void Graph::Print() const
{
  int i, j;
  Node *n;

  printf("Graph:\n");
  for (i = 0; i < Nodes.size(); i++) {
    n = Nodes[i];
    printf("  ");
    printf("Node: %s - ", n->name.c_str());
    for (j = 0; j < n->adj.size(); j++) n->adj[j]->Print();
    printf("\n");
  }
}

int main(int argc, char **argv)
{
  Graph *G;

  if (argc > 2) {
    cerr << "usage: netflow verbosity(BGP) - graph on stdin\n";
    exit(1);
  }

  G = new Graph();

  if (argc == 2) G->Verbose = argv[1];
  if (G->Verbose.find('B') != string::npos) G->Print();

  delete G;   /* Doing this just to make sure that the destructor works */
  return 0;
}

With all of these pieces in place, let's print out the graphs in g1.txt and g5.txt:

UNIX> netflow_skeleton B < g1.txt
Graph:
  Node: S - [S->A:10][S->C:14]
  Node: T - [T->A:0][T->B:0][T->D:0]
  Node: A - [A->B:5][A->T:5][A->S:0]
  Node: B - [B->A:0][B->T:8][B->C:0]
  Node: C - [C->B:13][C->D:10][C->S:0]
  Node: D - [D->C:0][D->T:12]
UNIX> netflow_skeleton B < g5.txt
Graph:
  Node: s - [s->n02:6263][s->n04:8033]
  Node: t - [t->n00:0][t->n01:0]
  Node: n00 - [n00->t:4923][n00->n01:0][n00->n02:6932][n00->n03:6518][n00->n04:6183]
  Node: n01 - [n01->n00:8824][n01->t:8471][n01->n02:0][n01->n03:0][n01->n04:6661]
  Node: n02 - [n02->n00:0][n02->n01:4929][n02->s:0][n02->n03:741][n02->n04:0]
  Node: n03 - [n03->n00:0][n03->n01:5566][n03->n02:0][n03->n04:0]
  Node: n04 - [n04->n00:0][n04->n01:0][n04->n02:5840][n04->n03:4417][n04->s:0]
UNIX>

Again, every edge has a reverse edge, and these edges are stored on the adjacency lists.

Calculating the Maximum Flow

The program src/netflow_dfs_v_no_delete.cpp is the simplest implementation that calculates the maximum flow. This program implements the three methods that were left unimplemented in the skeleton program. The first of these is MaxFlow(), which simply calls Find_Augmenting_Path() repeatedly, until no more augmenting paths can be found. MaxFlow() is the simplest of the three methods:

int Graph::MaxFlow()
{
  int mf, f;

  NPaths = 0;
  mf = 0;
  while (1) {
    f = Find_Augmenting_Path();
    mf += f;
    if (f == 0) return mf;
    NPaths++;
  }
}

The next method is Find_Augmenting_Path(). This calls DFS() to find an augmenting path from Source to Sink. DFS() returns true if it succeeds and false if it fails. If it succeeds, the path is in the vector Path. The order of the edges in Path is immaterial here: as long as all of the edges are in the vector, Find_Augmenting_Path() can process it. As it turns out, DFS() (like the path-finding code in all of the other implementations) stores the edges in reverse order, from the sink back to the source. Fortunately, that's not important.

int Graph::Find_Augmenting_Path()
{
  int i, f;
  Edge *e;

  /* Clear visited fields, and the path. Then find an augmenting path with DFS. */

  for (i = 0; i < Nodes.size(); i++) Nodes[i]->visited = 0;
  Path.clear();
  if (Verbose.find('G') != string::npos) Print();
  if (DFS(Source)) {

    /* Calculate the flow through the path */

    f = MaxCap;
    for (i = 0; i < Path.size(); i++) {
      if (Path[i]->residual < f) f = Path[i]->residual;
    }

    /* The path is in reverse order, so we print the vector from back to front */

    if (Verbose.find('P') != string::npos) {
      printf("Path with flow %d: ", f);
      for (i = Path.size()-1; i >= 0; i--) Path[i]->Print();
      printf("\n");
    }

    /* Process the residual Graph */

    for (i = 0; i < Path.size(); i++) {
      e = Path[i];
      e->residual -= f;
      e->reverse->residual += f;
    }

    return f;
  }
  return 0;
}

After finding a path, Find_Augmenting_Path() does three things. First, it calculates the flow, which is the minimum residual capacity of the edges on the path. Then it optionally prints the path, and last, it processes the residual graph, removing flow from the forward direction and adding it to the reverse direction. This is all straightforward, because we don't bother deleting edges with zero residual capacity from the adjacency lists.

All that's left is to find an augmenting path. Here is the depth-first search. It is a straightforward recursive DFS. The only important thing is that we need to ignore zero capacity edges, because we want to find paths with flow:

bool Graph::DFS(Node *n)
{
  int i;
  Edge *e;

  if (n->visited) return false;
  if (n == Sink) return true;
  n->visited = 1;

  for (i = 0; i < n->adj.size(); i++) {
    e = n->adj[i];
    if (e->residual != 0) {            /* Only follow edges that can carry flow. */
      if (DFS(e->n2)) {
        Path.push_back(e);
        return true;
      }
    }
  }
  return false;
}

We add an edge to Path whenever a recursive call succeeds. In other words, the path is built only after we have actually reached the sink, as the recursion unwinds, which is why its edges end up in reverse order.
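For experimentation, here is a compact, self-contained sketch of the same algorithm (Ford-Fulkerson with a recursive DFS) that compiles without the rest of the program. It is not the notes' code: FlowGraph is a hypothetical class, and instead of Edge objects with reverse pointers it keeps residual capacities in one vector where edge k and edge k^1 are each other's reverses. Unlike Get_Edge(), repeated or opposite edges just become separate entries, which does not change the maximum flow.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical compact Ford-Fulkerson. Edge ids 2j and 2j+1 are a forward edge
// and its reverse, so e ^ 1 plays the role of Edge::reverse.
class FlowGraph {
 public:
  int AddNode(const std::string &name) {  // get-or-create a node id
    std::map<std::string, int>::iterator it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    int id = (int)adj_.size();
    ids_[name] = id;
    adj_.push_back(std::vector<int>());
    return id;
  }

  void AddEdge(const std::string &from, const std::string &to, int cap) {
    int u = AddNode(from), v = AddNode(to);
    adj_[u].push_back((int)to_.size()); to_.push_back(v); residual_.push_back(cap);
    adj_[v].push_back((int)to_.size()); to_.push_back(u); residual_.push_back(0);
  }

  // Repeatedly find an augmenting path and push its bottleneck flow.
  int MaxFlow(const std::string &source, const std::string &sink) {
    int s = AddNode(source), t = AddNode(sink);
    assert(s != t);
    int total = 0;
    while (true) {
      std::vector<bool> visited(adj_.size(), false);
      std::vector<int> path;  // edge ids in sink-to-source order
      if (!Dfs(s, t, visited, path)) return total;
      int f = residual_[path[0]];
      for (std::size_t i = 1; i < path.size(); i++)
        if (residual_[path[i]] < f) f = residual_[path[i]];
      for (std::size_t i = 0; i < path.size(); i++) {
        residual_[path[i]] -= f;      // forward direction loses f
        residual_[path[i] ^ 1] += f;  // reverse direction gains f
      }
      total += f;
    }
  }

 private:
  bool Dfs(int n, int sink, std::vector<bool> &visited, std::vector<int> &path) {
    if (visited[n]) return false;
    if (n == sink) return true;
    visited[n] = true;
    for (std::size_t i = 0; i < adj_[n].size(); i++) {
      int e = adj_[n][i];
      if (residual_[e] != 0 && Dfs(to_[e], sink, visited, path)) {
        path.push_back(e);  // recorded as the recursion unwinds
        return true;
      }
    }
    return false;
  }

  std::map<std::string, int> ids_;      // node name -> id
  std::vector<std::vector<int> > adj_;  // node id -> ids of its edges
  std::vector<int> to_;                 // edge id -> "to" node
  std::vector<int> residual_;           // edge id -> residual capacity
};
```

Loading g1.txt edge by edge and calling MaxFlow("S", "T") should give 23, and g5.txt should give 10,087, matching the flows quoted in these notes.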

Let's see how this works on our example graph, which I'll reproduce below:

g1.txt:

SOURCE S
SINK T
EDGE A B 5
EDGE A T 5
EDGE B T 8
EDGE C B 13
EDGE C D 10
EDGE D T 12
EDGE S A 10
EDGE S C 14

g1a.txt:

SOURCE S
SINK T
EDGE D T 12
EDGE A T 5
EDGE S C 14
EDGE S A 10
EDGE C D 10
EDGE B T 8
EDGE A B 5
EDGE C B 13

Because these two graph files create adjacency lists in different orders, they find different paths when calculating the maximum flow. The flow in both graphs is the same, however.

UNIX> netflow_dfs_v_no_delete PB < g1.txt
Graph:
  Node: S - [S->A:10][S->C:14]
  Node: T - [T->A:0][T->B:0][T->D:0]
  Node: A - [A->B:5][A->T:5][A->S:0]
  Node: B - [B->A:0][B->T:8][B->C:0]
  Node: C - [C->B:13][C->D:10][C->S:0]
  Node: D - [D->C:0][D->T:12]
Path with flow 5: [S->A:10][A->B:5][B->T:8]
Path with flow 5: [S->A:5][A->T:5]
Path with flow 3: [S->C:14][C->B:13][B->T:3]
Path with flow 10: [S->C:11][C->D:10][D->T:12]
Max flow is 23 - Paths: 4
Graph:
  Node: S - [S->A:0][S->C:1]
  Node: T - [T->A:5][T->B:8][T->D:10]
  Node: A - [A->B:0][A->T:0][A->S:10]
  Node: B - [B->A:5][B->T:0][B->C:3]
  Node: C - [C->B:10][C->D:0][C->S:13]
  Node: D - [D->C:10][D->T:2]
UNIX> netflow_dfs_v_no_delete PB < g1a.txt
Graph:
  Node: S - [S->C:14][S->A:10]
  Node: T - [T->D:0][T->A:0][T->B:0]
  Node: D - [D->T:12][D->C:0]
  Node: A - [A->T:5][A->S:0][A->B:5]
  Node: C - [C->S:0][C->D:10][C->B:13]
  Node: B - [B->T:8][B->A:0][B->C:0]
Path with flow 10: [S->C:14][C->D:10][D->T:12]
Path with flow 4: [S->C:4][C->B:13][B->T:8]
Path with flow 5: [S->A:10][A->T:5]
Path with flow 4: [S->A:5][A->B:5][B->T:4]
Max flow is 23 - Paths: 4
Graph:
  Node: S - [S->C:0][S->A:1]
  Node: T - [T->D:10][T->A:5][T->B:8]
  Node: D - [D->T:2][D->C:10]
  Node: A - [A->T:0][A->S:9][A->B:1]
  Node: C - [C->S:14][C->D:0][C->B:9]
  Node: B - [B->T:0][B->A:4][B->C:4]
UNIX>

It is not a bad idea for you to process these paths yourself to reinforce how to process the residual graph. To help you, you can run the program with "PG", and that lets you see the residual graph at each stage of the calculation.
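One way to check your hand-processing is to recover the flow from the final residual graph: on an original edge, flow = original - residual. Below is a sketch (FinalEdge and NetOutflow() are hypothetical names) that you can feed with the residual values printed at the end of each run. It only works as written because g1.txt has no pair of nodes with edges in both directions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One original edge with its capacity and its final residual capacity.
struct FinalEdge {
  std::string from, to;
  int original, residual;
};

// Net outflow of every node, with flow = original - residual on each edge.
// Flow conservation means every node except the source and sink nets to zero.
std::map<std::string, int> NetOutflow(const std::vector<FinalEdge> &edges) {
  std::map<std::string, int> net;
  for (std::size_t i = 0; i < edges.size(); i++) {
    int flow = edges[i].original - edges[i].residual;
    net[edges[i].from] += flow;
    net[edges[i].to] -= flow;
  }
  return net;
}
```

Both runs end with different flows on the individual edges (for example, S->C carries 13 in the first and 14 in the second), yet both send 23 units out of S and conserve flow at A, B, C and D.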

How to represent adjacency lists?

The representation of the adjacency list has a profound impact on the performance of Network Flow. The solution above was easy to code. However, keeping zero edges on the adjacency list is going to slow down our DFS. On the flip side, deleting edges from the adjacency lists will incur its own overhead. Not to mention, it makes the programming harder. I have five different implementations, and although I'm not going to go over the code specifically with you, the code is there so that you can answer your own questions:

src/netflow_dfs_v_no_delete.cpp: This is the version above.
src/netflow_dfs_v_delete.cpp: In this version, I delete zero edges from the adjacency lists. I don't delete the Edge class though, which means that each edge will have a valid reverse edge. It just so happens that if the reverse edge has zero residual capacity, it is not on an adjacency list.
To delete edges, I have each edge store an integer index, which is the index of the edge in its node's adjacency list. When you delete an edge, you first swap it with the last edge in the node's adjacency list (adjusting that edge's index). Then you call pop_back() to remove the edge from the adjacency list. I do this because it makes deletion constant time. What I don't want to do is, for example, shift every later edge over by one and then resize the vector, because that takes linear time, not constant time.
src/netflow_dfs_list_pf.cpp: In this version, I use a list rather than a vector for the adjacency list. Instead of storing an index, I store the edge's iterator on the list, in a field named pointer. This lets me delete easily using the erase() method, which is constant time on lists.
There is an interesting implementational issue arising from (in my opinion) bad design of the STL. Because push_back() doesn't return an iterator to the new node (like it should), and because rbegin() returns a reverse_iterator, I use push_front() whenever I put a new edge onto an adjacency list. This is because I can now grab its iterator easily with begin().
src/netflow_dfs_list_pb.cpp: This is identical to the last version, but now I am calling push_back(), and grabbing the iterator by calling end() and then decrementing. This ends up being quite a significant change.
src/netflow_dfs_edge_list.cpp: Only read this code if you want to torture yourself. Because push_back() and erase() have some overhead to them, what I did here was implement a class called an EdgeList, which stores an adjacency list, but makes sure that the zero capacity edges are at the end. When you set an edge's capacity to zero, you swap it with the last non-zero capacity edge. I thought this might be more efficient than src/netflow_dfs_v_delete.cpp, because it eliminates all of the push_back() and pop_back() calls. Functionally, it is equivalent to src/netflow_dfs_v_delete.cpp.
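The constant-time deletion used in src/netflow_dfs_v_delete.cpp (swap with the last element, fix its index, pop_back()) can be sketched on its own. AdjEdge, AdjAdd() and AdjRemove() are hypothetical names; the idea is the one described for the index field above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trimmed-down edge: index is its position in the adjacency vector.
struct AdjEdge {
  int id;
  int index;
};

// Appends e to adj and records its position.
void AdjAdd(std::vector<AdjEdge *> &adj, AdjEdge *e) {
  e->index = (int)adj.size();
  adj.push_back(e);
}

// Removes e in O(1): move the last edge into e's slot, fix that edge's index,
// then pop_back(). Works even when e is itself the last edge.
void AdjRemove(std::vector<AdjEdge *> &adj, AdjEdge *e) {
  assert(e->index >= 0 && e->index < (int)adj.size() && adj[e->index] == e);
  AdjEdge *last = adj.back();
  adj[e->index] = last;
  last->index = e->index;
  adj.pop_back();
  e->index = -1;  // marks "not on any adjacency list"
}
```

The cost is that the adjacency order changes, which changes which paths DFS finds but not the final flow. For the list versions, note that std::list::insert(pos, value) returns an iterator to the inserted element, so calling insert(end(), e) hands back the iterator for a push_back-style insertion directly.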

When you run these on a big graph, the results are stunningly not what you'd think. Here are timings on my MacBook Pro (2.4 GHz Intel Core i5 processor - I got the machine in 2011 or so) on g200.txt, which is a graph with 200 interior nodes:

UNIX> time netflow_dfs_v_no_delete < g200.txt
Max flow is 316197 - Paths: 35860
4.451u 0.013s 0:04.46 100.0%	0+0k 0+1io 0pf+0w
UNIX> time netflow_dfs_v_delete < g200.txt
Max flow is 316197 - Paths: 39037
1.485u 0.009s 0:01.49 99.3%	0+0k 0+0io 0pf+0w
UNIX> time netflow_dfs_list_pf < g200.txt
Max flow is 316197 - Paths: 205014
11.953u 0.022s 0:11.97 100.0%	0+0k 0+0io 0pf+0w
UNIX> time netflow_dfs_list_pb < g200.txt
Max flow is 316197 - Paths: 5389
0.361u 0.009s 0:00.37 97.2%	0+0k 0+0io 0pf+0w
UNIX> time netflow_dfs_edge_list < g200.txt
Max flow is 316197 - Paths: 39037
1.996u 0.011s 0:02.01 99.5%	0+0k 0+0io 0pf+0w
UNIX>

First, they all find the same flow of 316197. Yay.

Second, since I know no one looks at those numbers but me, here they are graphed:

In terms of the two programs that use vectors, the one that deletes zero capacity edges is far preferable to the one that doesn't. That's because all of those zero capacity edges on the adjacency lists take up time in DFS(). The "Edge List" version does a worse job than simply calling pop_back(), so we don't have to worry about that one any more. (also Yay).

What about the two list implementations -- that is amazing, isn't it? The one that calls push_front() took nearly 12 seconds, and the one that calls push_back() took the shortest amount of time of any of the implementations: Just 0.361 seconds.

The reason why has nothing to do with how the data structures are implemented. Instead, it has to do with the number of paths that are generated. The push_front() version generated 205,014 paths, while the push_back() version generated just 5,389. The other versions generated roughly 40,000 each. I want you to think about why the push_front() version generates lots of paths, and why the push_back() version generates few. It has to do with the fact that when you process an augmenting path and add an edge to an adjacency list, that edge is likely to have a small amount of capacity. So, if you put it at the front of a list, then you're more likely to find a new path with small flow rather than large. And paths with small flow are bad (remember the pathological example from the first set of lecture notes).

Now, if you divide the running times by the number of paths (rightmost graph), then you get 0.000038 seconds per path with netflow_dfs_v_delete, and 0.000067 seconds per path with netflow_dfs_list_pb. So using a vector and deleting is the fastest way to process paths, when you normalize by the number of paths. We are therefore going to use src/netflow_dfs_v_delete.cpp as our starting point for the remaining implementations.
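For completeness, here is that normalization carried out for all five runs above: user CPU seconds divided by the number of paths, expressed in microseconds per path and rounded. Only the timings and path counts come from the runs; the division is simple arithmetic on them.

```latex
\begin{array}{lrrr}
\text{Program} & \text{User time (s)} & \text{Paths} & \mu\text{s per path} \\ \hline
\texttt{netflow\_dfs\_v\_no\_delete} & 4.451  & 35{,}860  & \approx 124 \\
\texttt{netflow\_dfs\_v\_delete}     & 1.485  & 39{,}037  & \approx 38  \\
\texttt{netflow\_dfs\_list\_pf}      & 11.953 & 205{,}014 & \approx 58  \\
\texttt{netflow\_dfs\_list\_pb}      & 0.361  & 5{,}389   & \approx 67  \\
\texttt{netflow\_dfs\_edge\_list}    & 1.996  & 39{,}037  & \approx 51
\end{array}
```

Per path, the deleting vector version is the cheapest, and the non-deleting vector version, which walks over every zero capacity edge during DFS(), is the most expensive.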