CS302 Lecture Notes

CS302 Lecture Notes - Network Flow Part 2: Hacking up the Algorithms

April 3, 2008
James S. Plank
Directory: /home/plank/cs302/Notes/Netflow2

Graph Representation and Generating Random Graphs

We are going to represent graphs with a very flexible file format. The file will contain a stream of words, which should be in the following format:

EDGE n1 n2 capacity -- This specifies that there is an edge from node n1 to node n2 with the given capacity. Capacities must be positive; there may only be one edge from n1 to n2, and it is fine to have edges in both directions between a pair of nodes.
SOURCE name -- This specifies the name of the source node.
SINK name -- This specifies the name of the sink node.

There must be a single source and sink.

So, for example, below is the graph from the Network Flow Lecture Notes #1:

g1.txt

SOURCE A
SINK G
EDGE A D 3
EDGE A B 3
EDGE B C 4
EDGE C A 3
EDGE C D 1
EDGE C E 2
EDGE D E 2
EDGE D F 6
EDGE E B 1
EDGE E G 1
EDGE F G 9

The file g2.txt adds an edge from A to C with a capacity of 1.
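
To make the format concrete, here is a minimal sketch of a reader for it. This is not the code in netflow1.cpp: the names ParsedEdge, ParsedGraph and ParseGraph are made up for illustration, and the sketch does not check for duplicate edges.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical plain-data types for this sketch; netflow1.cpp uses its own classes.
struct ParsedEdge {
  std::string from, to;
  double capacity;
};

struct ParsedGraph {
  std::string source, sink;
  std::vector<ParsedEdge> edges;
};

// Read a stream of words: EDGE n1 n2 capacity | SOURCE name | SINK name.
ParsedGraph ParseGraph(std::istream &in)
{
  ParsedGraph g;
  std::string key;

  while (in >> key) {
    if (key == "EDGE") {
      ParsedEdge e;
      if (!(in >> e.from >> e.to >> e.capacity)) throw std::runtime_error("Bad EDGE");
      if (e.capacity <= 0) throw std::runtime_error("Capacity must be positive");
      g.edges.push_back(e);
    } else if (key == "SOURCE") {
      // A second SOURCE is an error, as is a SOURCE with no name after it.
      if (!g.source.empty() || !(in >> g.source)) throw std::runtime_error("Bad SOURCE");
    } else if (key == "SINK") {
      if (!g.sink.empty() || !(in >> g.sink)) throw std::runtime_error("Bad SINK");
    } else {
      throw std::runtime_error("Unknown keyword: " + key);
    }
  }
  if (g.source.empty() || g.sink.empty()) throw std::runtime_error("Need a SOURCE and a SINK");
  return g;
}
```

Because the input is treated as a stream of words, line breaks don't matter: passing std::cin reads a file like g1.txt from standard input, and a std::istringstream works just as well for testing.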

I have a program makerandom.cpp which takes one argument, a number of nodes, and creates a random graph with one source, one sink, and the given number of other nodes. There is an edge between every pair of these nodes, in a random direction, with a random capacity between zero and 10. Each node gets an edge from the source with a 40% probability, and an edge to the sink with a 40% probability. Thus, this is a pretty dense graph which should be a challenge to our network flow programs.

Below is an example of a graph made with makerandom 5:

g3.txt

SOURCE s
SINK t
EDGE n00 t 4.923
EDGE n01 n00 8.824
EDGE n00 n02 6.932
EDGE n00 n03 6.518
EDGE n00 n04 6.183
EDGE n01 t 8.471
EDGE n02 n01 4.929
EDGE n03 n01 5.566
EDGE n01 n04 6.661
EDGE s n02 6.263
EDGE n02 n03 0.741
EDGE n04 n02 5.840
EDGE n04 n03 4.417
EDGE s n04 8.033
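
The generator itself is not listed here, so the following is only a guess at its general shape, written from the description above. The function names, the seed parameter, the order of the output lines and the lower bound of 0.001 (used to keep capacities positive) are all assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: node i is named n00, n01, ...
std::string Node_Name(int i)
{
  char buf[32];
  std::snprintf(buf, sizeof(buf), "n%02d", i);
  return buf;
}

// Hypothetical helper: format one EDGE line with three decimal places.
std::string Edge_Line(const std::string &from, const std::string &to, double capacity)
{
  char buf[128];
  std::snprintf(buf, sizeof(buf), "EDGE %s %s %.3f", from.c_str(), to.c_str(), capacity);
  return buf;
}

// Return the lines of a random graph with n nodes plus a source s and a sink t.
std::vector<std::string> Make_Random(int n, unsigned seed)
{
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> cap(0.001, 10.0);  // Assumed bounds.
  std::bernoulli_distribution flip(0.5), forty(0.4);
  std::vector<std::string> lines;

  lines.push_back("SOURCE s");
  lines.push_back("SINK t");
  for (int i = 0; i < n; i++) {
    // Source and sink edges each appear with 40% probability.
    if (forty(gen)) lines.push_back(Edge_Line("s", Node_Name(i), cap(gen)));
    if (forty(gen)) lines.push_back(Edge_Line(Node_Name(i), "t", cap(gen)));
    // One edge per pair of nodes, in a random direction.
    for (int j = i + 1; j < n; j++) {
      if (flip(gen)) lines.push_back(Edge_Line(Node_Name(i), Node_Name(j), cap(gen)));
      else lines.push_back(Edge_Line(Node_Name(j), Node_Name(i), cap(gen)));
    }
  }
  return lines;
}
```

With n nodes there are always n(n-1)/2 edges among the nodes themselves, which is why these graphs are dense: g3.txt has ten such edges for its five nodes.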

Reading and representing graphs

The program netflow1.cpp shows the basic data structures that we are going to use for holding our graph and determining maximum flow. First, let's look at the Graph class. It is straightforward, holding nodes and edges referenced by name, plus having pointers to the source and sink nodes. There are three methods -- a constructor, which will create a graph from standard input, a printing method, and Get_Node_By_Name(), which gets a node by name. The last one is convenient, because it accepts a string and returns a node. If the node already exists, it simply returns the node. Otherwise, it creates the node, inserts it into node_names, and returns it.

class Graph {
  public:
    // Methods:
    Graph();
    void Print();
    Node *Get_Node_By_Name(string name);

    // Data:
    map <string, Node *> node_names;            // All nodes by name.
    map <string, Edge *> edge_names;            // All edges by name.
    Node *source;
    Node *sink;
};
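
The get-or-create behavior of Get_Node_By_Name() can be sketched as follows. The Node and Graph here are pared-down stand-ins, and the body is one plausible way to write the method rather than the actual code in netflow1.cpp.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Pared-down stand-ins for the classes in these notes.
struct Node {
  std::string name;
};

struct Graph {
  std::map<std::string, Node *> node_names;
  Node *Get_Node_By_Name(const std::string &name);
};

// Return the node with the given name, creating and registering it on first use.
Node *Graph::Get_Node_By_Name(const std::string &name)
{
  std::map<std::string, Node *>::iterator nit = node_names.find(name);
  if (nit != node_names.end()) return nit->second;

  Node *n = new Node;
  n->name = name;
  node_names.insert(std::make_pair(name, n));
  return n;
}
```

The payoff is that the reading code never has to care whether it has seen a node before: every EDGE, SOURCE or SINK line can simply ask for its nodes by name.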

The Node class defines a node, which has a name, an adjacency list of edges, a backedge for determining paths, and a visited flag for searching.

typedef list <class Edge *> Adjlist;

class Node {
  public:
    string name;
    Adjlist edges;
    Node *backedge;          // For determining a path in augmenting paths.
    int visited;
};

And finally, the Edge class defines an edge from n1 to n2 with the given capacity. It also contains a flow variable, so that we can represent both the flow and residual graph with the same data structure. Each edge will have a name, which is the names of n1 and n2 with an ASCII arrow between them. Finally, each edge has a pointer to its reverse edge, if it exists (NULL if there is no reverse edge), and a pointer to its entry in its node's adjacency list, in case it needs to be deleted.

class Edge {
  public:
    string name;
    Node *n1;                     // From node
    Node *n2;                     // To node
    Edge *reverse;                // This lets you find your reverse edge quickly
    double capacity;
    double flow;
    Adjlist::iterator adj_ptr;    // This lets you delete yourself 
                                  // from the adjacency list quickly
};

The code to read a graph is straightforward, in netflow1.cpp. There are some subtleties in figuring out whether an edge has a reverse edge or not. Here it is running on two of the example files:

UNIX> netflow1 < g1.txt
Source: A, Sink: G
Node A, Edges: (B,3) (D,3)
Node B, Edges: (C,4)
Node C, Edges: (E,2) (D,1) (A,3)
Node D, Edges: (F,6) (E,2)
Node E, Edges: (G,1) (B,1)
Node F, Edges: (G,9)
Node G, Edges:
Edge A -> B -- No reverse edge.
Edge A -> D -- No reverse edge.
Edge B -> C -- No reverse edge.
Edge C -> A -- No reverse edge.
Edge C -> D -- No reverse edge.
Edge C -> E -- No reverse edge.
Edge D -> E -- No reverse edge.
Edge D -> F -- No reverse edge.
Edge E -> B -- No reverse edge.
Edge E -> G -- No reverse edge.
Edge F -> G -- No reverse edge.
UNIX>

UNIX> netflow1 < g2.txt
Source: A, Sink: G
Node A, Edges: (C,1) (B,3) (D,3)
Node B, Edges: (C,4)
Node C, Edges: (E,2) (D,1) (A,3)
Node D, Edges: (F,6) (E,2)
Node E, Edges: (G,1) (B,1)
Node F, Edges: (G,9)
Node G, Edges:
Edge A -> B -- No reverse edge.
Edge A -> C -- Reverse edge: C -> A
Edge A -> D -- No reverse edge.
Edge B -> C -- No reverse edge.
Edge C -> A -- Reverse edge: A -> C
Edge C -> D -- No reverse edge.
Edge C -> E -- No reverse edge.
Edge D -> E -- No reverse edge.
Edge D -> F -- No reverse edge.
Edge E -> B -- No reverse edge.
Edge E -> G -- No reverse edge.
Edge F -> G -- No reverse edge.
UNIX>
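
One way to handle the reverse-edge bookkeeping is to look up the opposite name whenever an edge is created. The sketch below assumes that approach: Add_Edge is a hypothetical helper, the Edge is a stand-in with only the two relevant fields, and the arrow is written as "->" with no spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Stand-in with only the fields this sketch needs.
struct Edge {
  std::string name;
  Edge *reverse;
};

// Hypothetical helper: register the edge n1->n2, and link it with n2->n1
// in both directions if that edge has already been read.
Edge *Add_Edge(std::map<std::string, Edge *> &edge_names,
               const std::string &n1, const std::string &n2)
{
  Edge *e = new Edge;
  e->name = n1 + "->" + n2;
  e->reverse = NULL;

  std::map<std::string, Edge *>::iterator eit = edge_names.find(n2 + "->" + n1);
  if (eit != edge_names.end()) {
    e->reverse = eit->second;
    eit->second->reverse = e;
  }
  edge_names[e->name] = e;
  return e;
}
```

The subtlety is that the link has to be made from whichever edge arrives second, since the first one cannot know that its reverse is coming. In g2.txt that is the A to C edge, which finds the C to A edge already in the map.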

The Ford-Fulkerson Algorithm with no smartness to selecting the Augmenting Path

The program in netflow2.cpp implements the Ford-Fulkerson algorithm. I've changed a few things from the previous program. First, I've removed edge_names from the graph and instead only use it when reading a graph.

Second, I've removed flow from edges -- we can worry about adding flow later. Third, I've removed the name field from edges, and instead added a Print() method.

More significantly, I've added three new methods to graphs: GetFlow(), which calculates the maximum flow; FindAugmentingPath(), which finds an augmenting path and modifies the graph by reducing capacity along the path's edges and adding capacity to backedges (creating backedges in the process); and DFS(), which is a recursive procedure using standard depth-first search to find an augmenting path.

I've also added a path to the data. When DFS() returns, if it finds a path, it puts it into path and returns the flow through the path.

class Graph {
  public:
    // Methods:
    Graph();
    void Print();
    Node *Get_Node_By_Name(string name);
    double GetFlow();
    double FindAugmentingPath();
    double DFS(Node *n);

    // Data:
    list <Edge *> path;
    map <string, Node *> node_names;            // All nodes by name.
    Node *source;
    Node *sink;
};

Here's the DFS() procedure. Pretty straightforward -- the only subtlety is that when we find the path, we create it by prepending to the path list, since we create it in reverse:

double Graph::DFS(Node *n)
{
  Adjlist::iterator alit;
  Edge *e;
  double flow;

  n->visited = 1;

  for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
    e = *alit;
    if (e->capacity > 0) {
      if (e->n2 == sink) {
        path.push_front(e);
        return e->capacity;
      } else if (!e->n2->visited) {
        flow = DFS(e->n2);
        if (flow > 0) {
          path.push_front(e);
          return (flow < e->capacity) ? flow : e->capacity;
        } 
      }
    }
  }
  return 0;
}

Now, below is FindAugmentingPath(), which calls DFS() to find a path, and then performs the modifications on the graph. Note, we also delete the path while we are modifying the graph. One item of concern is what happens when an edge's capacity goes to zero? Should we delete it or just leave it with a capacity of zero? In this code, we just leave it, which is why in DFS() above, we make sure to ignore zero capacity edges. We'll explore the implications of this decision in netflow3 below.

double Graph::FindAugmentingPath()
{
  map <string, Node *>::iterator nptr;
  list <Edge *>::iterator pit;
  Node *n;
  double flow;
  Edge *e;

  for (nptr = node_names.begin(); nptr != node_names.end(); nptr++) {
    n = nptr->second;
    n->visited = 0;
  }
  flow = DFS(source);
  if (flow > 0) {
    while (!path.empty()) {
      pit = path.begin();
      e = *pit;
      path.erase(pit);
      e->capacity -= flow;          // Remove flow from the edge.
      if (e->reverse == NULL) {     // Create the reverse edge if necessary
        e->reverse = new Edge;
        e->reverse->reverse = e;
        e->reverse->capacity = 0;
        e->reverse->n1 = e->n2;
        e->reverse->n2 = e->n1;
        e->n2->edges.push_front(e->reverse);
        e->reverse->adj_ptr = e->n2->edges.begin();
      }
      e->reverse->capacity += flow;  // Add capacity to the reverse edge.
    }
  }
  return flow;
}

The rest is straightforward -- GetFlow() simply finds paths as long as they exist:

double Graph::GetFlow()
{
  double maxflow;
  double flow;
  
  maxflow = 0;
  while (1) {
    flow = FindAugmentingPath();
    if (flow == 0) return maxflow;
    maxflow += flow;
  }
}

You'll see that this does work to find flow (I've uncommented the lines that print out the augmenting paths):

UNIX> netflow2 < g1.txt
[A->B:3][B->C:4][C->E:2][E->G:1]
[A->B:2][B->C:3][C->D:1][D->F:6][F->G:9]
[A->D:3][D->F:5][F->G:8]

Flow is 5
UNIX> netflow2 < g2.txt
[A->C:1][C->E:2][E->G:1]
[A->B:3][B->C:4][C->D:1][D->F:6][F->G:9]
[A->D:3][D->F:5][F->G:8]

Flow is 5
UNIX> netflow2 < g3.txt
[s->n04:8.033][n04->n03:4.417][n03->n01:5.566][n01->t:8.471]
[s->n04:3.616][n04->n02:5.84][n02->n03:0.741][n03->n01:1.149][n01->t:4.054]
[s->n04:2.875][n04->n02:5.099][n02->n01:4.929][n01->t:3.313]
[s->n02:6.263][n02->n01:2.054][n01->t:0.438]
[s->n02:5.825][n02->n01:1.616][n01->n00:8.824][n00->t:4.923]

Flow is 10.087
UNIX>
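
For experimenting outside the class framework, the same idea fits in a few lines if the residual graph is stored as a map of maps. This is a simplified sketch rather than netflow2.cpp: Residual, Augment and MaxFlow are names invented here, there are no Edge objects, and the path is augmented as the recursion unwinds instead of being stored in a list.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Residual capacity from node a to node b is r[a][b].
typedef std::map<std::string, std::map<std::string, double> > Residual;

// Depth-first search for any path with spare capacity.  "limit" is the
// smallest capacity seen so far.  Returns the flow pushed, or 0 if no path.
double Augment(Residual &r, std::set<std::string> &visited,
               const std::string &n, const std::string &sink, double limit)
{
  if (n == sink) return limit;
  visited.insert(n);

  std::map<std::string, double> &out = r[n];
  std::map<std::string, double>::iterator it;
  for (it = out.begin(); it != out.end(); ++it) {
    if (it->second > 0 && !visited.count(it->first)) {
      double f = Augment(r, visited, it->first, sink, std::min(limit, it->second));
      if (f > 0) {
        it->second -= f;         // Remove flow from the forward edge.
        r[it->first][n] += f;    // Add capacity to the reverse edge.
        return f;
      }
    }
  }
  return 0;
}

// Keep finding augmenting paths until there are none left.
double MaxFlow(Residual r, const std::string &source, const std::string &sink)
{
  double total = 0;
  while (true) {
    std::set<std::string> visited;
    double f = Augment(r, visited, source, sink, 1e300);
    if (f == 0) return total;
    total += f;
  }
}
```

Filling the map with the eleven edges of g1.txt and calling MaxFlow(g, "A", "G") gives 5, matching the output above. The paths it finds may differ from netflow2's, because the map visits neighbors in name order.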

Deleting edges or making them zero capacity?

Earlier we made the decision to have zero capacity edges rather than delete them when their capacity reaches zero. How does this compare to a version where we delete the edges? The code for this is in netflow3.cpp. There is only one relevant change, to FindAugmentingPath():

...
      
      if (e->capacity == 0) {
        e->reverse->reverse = NULL;
        e->n1->edges.erase(e->adj_ptr);
        delete e;
      }
...
}

This is why we have the adj_ptr field.
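
The trick in isolation, as a small sketch: each element remembers its own position in the list, so removing it does not require a search. Item, Add and Remove are illustrative names, with self playing the role of adj_ptr.

```cpp
#include <cassert>
#include <list>

struct Item {
  int id;
  std::list<Item *>::iterator self;   // Plays the role of adj_ptr.
};

// Insert at the front, remembering where the item lives.
void Add(std::list<Item *> &l, Item *it)
{
  l.push_front(it);
  it->self = l.begin();
}

// Constant-time removal: no traversal of the list is needed.
void Remove(std::list<Item *> &l, Item *it)
{
  l.erase(it->self);
}
```

This works because std::list iterators stay valid when other elements are inserted or erased. The same pattern would not be safe with a vector.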

To test performance, I wrote a shell script time_one.sh which times successive iterations of one of the programs with a given graph size (I modified makerandom.cpp to take a seed for random graph generation):

UNIX> sh time_one.sh 
usage: sh time_one n program iterations
UNIX> sh time_one.sh 10 netflow2 5
0 Flow is 18.711 0.004
1 Flow is 12.003 0.003
2 Flow is 11.066 0.003
3 Flow is 21.673 0.003
4 Flow is 14.788 0.003
UNIX>

I then wrote a second shell script gen_data.sh that runs time_one.sh for a given number of iterations, averages the result, and then increments the graph size by a given value. It continues doing this until the average time reaches a certain threshold. The example below averages five runs, starting with a graph size of ten, until the average reaches 1 second (I've cut off the output):

UNIX> sh gen_data.sh 
usage: sh gen_data.sh program iterations start increment end-time
UNIX> sh gen_data.sh netflow2 5 10 1 1
10 0.003000
11 0.003400
12 0.003000
13 0.003600
14 0.004200
15 0.005200
16 0.004200
...

Now, I use these scripts to compare the two implementations. Here's graph number one, using ten iterations per value:

I certainly couldn't publish data that noisy, so I've repeated the test with 50 iterations per value (sh gen_data netflow2/3 50 10 4 15):

Although both graphs clearly show that netflow2 outperforms netflow3, the second is much better, and worth the extra time to produce (think about it -- 50 iterations of 10 seconds is a little over eight minutes for a single data point).

Greedy Ford-Fulkerson

A simple technique to improve the path selection of the Ford-Fulkerson algorithm is to keep the edge capacities sorted, and traverse them in the depth-first search algorithm in decreasing order. We do this (almost) in netflow4.cpp. I only include the relevant changes from netflow2.cpp. First, the type of Adjlist is changed to a multimap instead of a list. Then, whenever edges are created, or edge capacities are modified, the edges are deleted from the node's adjacency list and reinserted with the new capacity.

...

typedef multimap <double, class Edge *> Adjlist;

...

double Graph::DFS(Node *n)
{
  Adjlist::iterator alit;
  Edge *e;
  double flow;

  n->visited = 1;

  for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
    e = alit->second;

     .... // Everything else is the same in this method.
}

double Graph::FindAugmentingPath()
{
  ... // Initialization code deleted

  flow = DFS(source);
  if (flow > 0) {
    while (!path.empty()) {
      pit = path.begin();
      e = *pit;
      path.erase(pit);

      // Here is where the edge is erased and reinserted.

      e->n1->edges.erase(e->adj_ptr);      
      e->capacity -= flow;
      e->adj_ptr = e->n1->edges.insert(make_pair(e->capacity, e));

      // The same thing needs to happen with the reverse edge

      if (e->reverse == NULL) {
        e->reverse = new Edge;
        e->reverse->reverse = e;
        e->reverse->capacity = flow;
        e->reverse->n1 = e->n2;
        e->reverse->n2 = e->n1;
      } else {
        e->n2->edges.erase(e->reverse->adj_ptr);
        e->reverse->capacity += flow;
      }
      e->reverse->adj_ptr = e->n2->edges.insert(make_pair(e->reverse->capacity, e->reverse));
    }
  }
  return flow;
}

Graph::Graph()       // Create a graph from standard input.
{
  ...
        // edge inserted rather than appended
        e->adj_ptr = n1->edges.insert(make_pair(cap, e));  
  ...
}
  
void Graph::Print()
{
   ...
    for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
      e = alit->second;
      e->Print();
      edges.push_back(e);  // This is a list of edges, not an adjacency list
    }
    ...
}
    
int main()
{
  Graph *g;
  double flow;

  g = new Graph();
  flow = g->GetFlow();
  cout << "Flow is " << flow << endl;
}
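
The erase-and-reinsert step is needed because the key of a multimap entry cannot be changed in place. Here is that step on its own, as a sketch with strings standing in for Edge pointers; ByCapacity and Change_Capacity are names made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

typedef std::multimap<double, std::string> ByCapacity;

// Change an entry's key by erasing it and inserting it again.  The caller
// must store the returned iterator in place of the old one, which is no
// longer valid (this is what happens to adj_ptr above).
ByCapacity::iterator Change_Capacity(ByCapacity &m, ByCapacity::iterator it, double new_cap)
{
  std::string value = it->second;
  m.erase(it);
  return m.insert(std::make_pair(new_cap, value));
}
```

Keep in mind that a multimap iterates in increasing key order, so begin() is the smallest capacity and rbegin() is the largest.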

Let's look at performance:

Ick! That's horrible!! The reason is that I'm traversing the edges from smallest to greatest in DFS(), and not the other way around. It's a two-line fix in netflow5.cpp:

double Graph::DFS(Node *n)
{
  Adjlist::reverse_iterator alit;
  Edge *e;
  double flow;

  n->visited = 1;

  for (alit = n->edges.rbegin(); alit != n->edges.rend(); alit++) {
  ...  // The rest is the same

and the resulting performance is excellent, blowing away the previous algorithms:

Ford-Fulkerson with the maximum capacity augmenting path

Another well-known technique to improve Ford-Fulkerson is to use a modification of Dijkstra's shortest path algorithm to find the maximum capacity augmenting path. This is no longer a depth-first search, but a breadth-first search, where we maintain a map of the flows to nodes not currently in our set. When we are in a position to remove the sink from the map, that means that we have found our flow. The code is in netflow6.cpp. We first change our adjacency lists back to lists and add a flow to our nodes. We also define a BFS_Q, which is the map that maintains the nodes not currently in our breadth-first-search set, but that have edges from nodes in our breadth-first-search set. The map is sorted by maximum flow to each node.

We also have each node store a pointer to its entry in the BFS_Q. You'll see why in a bit. Here are the changed typedefs and Node specification:

typedef list <class Edge *> Adjlist;
typedef multimap <double, class Node *> BFS_Q;

class Node {
  public:
    string name;
    Adjlist edges;
    Edge *backedge;         
    double maxflow;
    BFS_Q::iterator bfsq_ptr;
};

The main change is in FindAugmentingPath(), which now performs a modified version of Dijkstra's algorithm to find the maximum capacity augmenting path through the graph. Note how bfsq_ptr is required to delete the node from bfsq: we find the largest entry with a reverse iterator, but you are not allowed to delete with a reverse iterator.

double Graph::FindAugmentingPath()
{
  map <string, Node *>::iterator nptr;
  list <Edge *>::iterator pit;
  Adjlist::iterator alit;
  BFS_Q bfsq;
  BFS_Q::reverse_iterator bfsq_it;
  Node *n, *n2;
  double flow;
  Edge *e;

  for (nptr = node_names.begin(); nptr != node_names.end(); nptr++) {
    n = nptr->second;
    n->maxflow = -1;
  }

  source->maxflow = capsum;
  n = source;
  n->backedge = NULL;

  while (n != NULL && n != sink) {
    // cout << "BFS - processing node " << n->name << endl;
    for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
      e = *alit;
      if (e->capacity > 0) {
        n2 = e->n2;
        flow = (n->maxflow > e->capacity) ? e->capacity : n->maxflow;
        if (flow > n2->maxflow) {
          if (n2->maxflow > -1) {
            bfsq.erase(n2->bfsq_ptr);
          }
          n2->maxflow = flow;
          n2->backedge = e;
          n2->bfsq_ptr = bfsq.insert(make_pair(flow, n2));
        }
      }
    }
    if (bfsq.empty()) {
      n = NULL;
    } else {
      bfsq_it = bfsq.rbegin();
      n = bfsq_it->second;
      bfsq.erase(n->bfsq_ptr);
    }
  }
  if (n == sink) {
    flow = n->maxflow;
    while (n != source) {
      path.push_front(n->backedge);
      n = n->backedge->n1;
    }
  } else {
    flow = 0;
  }

  // The rest of the code is the same as before.
  ...
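
To see the search on its own, here is a stripped-down sketch that computes only the capacity of the best path, without building the path or modifying the graph. It is not netflow6.cpp: Capacities, Queue and Widest_Path are invented names, and it uses std::prev(q.end()) to get an ordinary iterator to the largest entry, which is one way around the reverse iterator restriction.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, std::map<std::string, double> > Capacities;
typedef std::multimap<double, std::string> Queue;

// Return the largest bottleneck capacity over all paths from source to sink,
// or 0 if the sink cannot be reached.
double Widest_Path(const Capacities &g, const std::string &source, const std::string &sink)
{
  std::map<std::string, double> best;             // Best flow found so far to each node.
  std::map<std::string, Queue::iterator> where;   // Queue entry of each queued node, like bfsq_ptr.
  Queue q;

  best[source] = 1e300;
  where[source] = q.insert(std::make_pair(best[source], source));

  while (!q.empty()) {
    Queue::iterator top = std::prev(q.end());     // Entry with the largest flow.
    std::string n = top->second;
    double width = top->first;
    q.erase(top);
    where.erase(n);
    if (n == sink) return width;

    Capacities::const_iterator git = g.find(n);
    if (git == g.end()) continue;
    std::map<std::string, double>::const_iterator e;
    for (e = git->second.begin(); e != git->second.end(); ++e) {
      double w = std::min(width, e->second);
      std::map<std::string, double>::iterator b = best.find(e->first);
      if (w > 0 && (b == best.end() || w > b->second)) {
        // Found a better way to reach this node: replace its queue entry.
        std::map<std::string, Queue::iterator>::iterator old = where.find(e->first);
        if (old != where.end()) q.erase(old->second);
        best[e->first] = w;
        where[e->first] = q.insert(std::make_pair(w, e->first));
      }
    }
  }
  return 0;
}
```

On the fourteen edges of g3.txt, Widest_Path(g, "s", "t") returns 4.929, the capacity of the n02 to n01 edge. No path from s to t can carry more than that in one augmentation.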

This algorithm is even faster than the others:

The Edmonds-Karp Algorithm

As explained in the First network flow lecture notes, the Edmonds-Karp algorithm uses unweighted shortest paths to find the minimum-hop path at each augmenting path step. My code is in netflow7.cpp, but I don't allow access to it, because you are going to have to write Edmonds-Karp in your lab. The performance is on par with the last three:

It's worse than using Dijkstra's algorithm and better than using the greedy algorithm. Moreover, it appears more reliable in its performance. Think about why -- there is a pretty clear answer.

Looking at paths in g3.txt

Recall again g3.txt:

I've modified all of the programs to print paths as they are determined. You should be able to explain all of them. First, netflow2 simply takes whatever augmenting path its depth-first search finds first:

UNIX> netflow2-path < g3.txt
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:5.566] [n01->t:8.471]
Path: [s->n04:3.616] [n04->n02:5.84] [n02->n03:0.741] [n03->n01:1.149] [n01->t:4.054]
Path: [s->n04:2.875] [n04->n02:5.099] [n02->n01:4.929] [n01->t:3.313]
Path: [s->n02:6.263] [n02->n01:2.054] [n01->t:0.438]
Path: [s->n02:5.825] [n02->n01:1.616] [n01->n00:8.824] [n00->t:4.923]

Flow is 10.087
UNIX>

Netflow4 uses a greedy algorithm to perform the DFS by selecting minimum edges first:

UNIX> netflow4-path < g3.txt
Path: [s->n02:6.263] [n02->n03:0.741] [n03->n01:5.566] [n01->t:8.471]
Path: [s->n02:5.522] [n02->n01:4.929] [n01->t:7.73]
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:4.825] [n01->t:2.801]
Path: [s->n04:5.232] [n04->n03:1.616] [n03->n01:2.024] [n01->n00:8.824] [n00->t:4.923]

Flow is 10.087
UNIX>

Netflow5 fixes this by performing the DFS by selecting maximum edges first.

UNIX> netflow5-path < g3.txt
Path: [s->n04:8.033] [n04->n02:5.84] [n02->n01:4.929] [n01->n00:8.824] [n00->t:4.923]
Path: [s->n02:6.263] [n02->n04:4.923] [n04->n03:4.417] [n03->n01:5.566] [n01->t:8.471]
Path: [s->n04:3.11] [n04->n02:5.334] [n02->n03:0.741] [n03->n01:1.149] [n01->t:4.054]
Path: [s->n04:2.369] [n04->n02:4.593] [n02->n01:0.006] [n01->t:3.313]

Flow is 10.087
UNIX>

Netflow6 always chooses the path with the maximum flow. Note how the first path is not the same one chosen by netflow5, because netflow5 starts with the biggest edge from the source, and the maximum flow path is not along that edge.

UNIX> netflow6-path < g3.txt
Path: [s->n02:6.263] [n02->n01:4.929] [n01->t:8.471]
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:5.566] [n01->n00:8.824] [n00->t:4.923]
Path: [s->n04:3.616] [n04->n02:5.84] [n02->n03:0.741] [n03->n01:1.149] [n01->t:3.542]

Flow is 10.087
UNIX>

Finally, netflow7 chooses the minimum hop path:

UNIX> netflow7-path < g3.txt
Path: [s->n02:6.263] [n02->n01:4.929] [n01->t:8.471]
Path: [s->n04:8.033] [n04->n03:4.417] [n03->n01:5.566] [n01->t:3.542]
Path: [s->n04:4.491] [n04->n03:0.875] [n03->n01:2.024] [n01->n00:8.824] [n00->t:4.923]
Path: [s->n02:1.334] [n02->n03:0.741] [n03->n01:1.149] [n01->n00:7.949] [n00->t:4.048]

Flow is 10.087
UNIX>

Wouldn't a wonderful test question be to give you a graph and the output of these programs and to have you tell me which programs created which outputs?

The Minimum Cut

As explained in the First network flow lecture notes, you can use the final residual flow graph to determine the minimum cut. We'll do that in netflow6-cut.cpp. First, we add a second set of edges to each node, which are the original edges that will not be modified by the network flow. They are created in the graph constructor.

class Node {
  public:
    string name;
    Adjlist edges;
    Adjlist original_edges;
    Edge *backedge;         
    double maxflow;
    BFS_Q::iterator bfsq_ptr;
    int visited;
};

After calculating the flow, we perform a very simple depth-first search starting at the source to determine the set of reachable nodes. These nodes will have visited equal to one. We then traverse all the nodes in the visited set and print all of their original edges to nodes that are not in the visited set:

void Graph::DFS(Node *n)
{
  Adjlist::iterator alit;
  Edge *e;

  n->visited = 1;

  for (alit = n->edges.begin(); alit != n->edges.end(); alit++) {
    e = *alit;
    if (e->capacity > 0 && !e->n2->visited) DFS(e->n2);
  }
}
  
int main()
{
  Graph *g;
  double flow;
  map <string, Node *>::iterator nit;
  Adjlist::iterator alit;
  Edge *e;
  Node *n;

  g = new Graph();
  flow = g->GetFlow();
  cout << "Flow is " << flow << endl;
  
  // Find the set of nodes reachable from the source and mark them as visited.

  for (nit = g->node_names.begin(); nit != g->node_names.end(); nit++) {
    n = nit->second;
    n->visited = 0;
  }
  g->DFS(g->source);

  // Now, for each node in the visited set, print out original edges going to 
  // nodes from the non-visited set.  Calculate their sum too.
  flow = 0;
  cout << endl;
  cout << "Cut Edges:\n";

  for (nit = g->node_names.begin(); nit != g->node_names.end(); nit++) {
    n = nit->second;
    if (n->visited) {
      for (alit = n->original_edges.begin(); alit != n->original_edges.end(); alit++) {
        e = *alit;
        if (!e->n2->visited) { 
          e->Print(); 
          cout << endl; 
          flow += e->capacity;
        }
      }
    }
  }
  cout << endl;
  cout << "Cut Capacity: " << flow << endl;
}

When we run this on g3.txt we see that there are only three nodes in the reachable set (s, n02 and n04), and that the capacities of the three cut edges do indeed sum to the network flow of 10.087:

UNIX> netflow6-cut < g3.txt
Flow is 10.087

Cut Edges:
[n02->n03:0.741]
[n02->n01:4.929]
[n04->n03:4.417]

Cut Capacity: 10.087
UNIX>
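
The same two steps can be tried on a graph small enough to check by hand. The sketch below assumes that both the original and the residual capacities are stored as maps of maps. Capacities, Reach and Cut_Capacity are illustrative names, not part of netflow6-cut.cpp.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

typedef std::map<std::string, std::map<std::string, double> > Capacities;

// Mark every node reachable from n through residual edges with capacity left.
void Reach(const Capacities &residual, const std::string &n, std::set<std::string> &visited)
{
  visited.insert(n);
  Capacities::const_iterator it = residual.find(n);
  if (it == residual.end()) return;
  std::map<std::string, double>::const_iterator e;
  for (e = it->second.begin(); e != it->second.end(); ++e) {
    if (e->second > 0 && !visited.count(e->first)) Reach(residual, e->first, visited);
  }
}

// Sum the original capacities of the edges that leave the reachable set.
double Cut_Capacity(const Capacities &original, const Capacities &residual,
                    const std::string &source)
{
  std::set<std::string> visited;
  double total = 0;

  Reach(residual, source, visited);
  for (Capacities::const_iterator n = original.begin(); n != original.end(); ++n) {
    if (!visited.count(n->first)) continue;
    std::map<std::string, double>::const_iterator e;
    for (e = n->second.begin(); e != n->second.end(); ++e) {
      if (!visited.count(e->first)) total += e->second;
    }
  }
  return total;
}
```

Take the two-edge graph s to a with capacity 2 and a to t with capacity 1. Its maximum flow is 1, which leaves residual capacities of 1 on s to a, 1 on a to s, 0 on a to t, and 1 on t to a. The reachable set is then s and a, the only cut edge is a to t, and Cut_Capacity returns 1, equal to the flow.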

Pictorially: