CS302 Lecture Notes

CS302 Lecture Notes - Network Flow Part 2: Hacking up the Augmenting Path Algorithm

April 3, 2008. Latest revision: November, 2012
James S. Plank
Directory: /home/plank/cs302/Notes/Netflow2

In this lecture, we are going to program several solutions to network flow. We will represent directed, weighted graphs in text files, read them into a graph data structure, then find the maximum flow and the minimum cut.

Graph Representation and Generating Random Graphs

We are going to represent graphs with a very flexible file format. The file will contain a stream of words, which should be in the following format:

EDGE n1 n2 capacity -- This specifies that there is an edge from node n1 to node n2 with the given capacity. Capacities must be positive. If an edge is specified multiple times, then the capacities are added. It is fine to have edges in both directions between a pair of nodes.
SOURCE name -- This specifies the name of the source node.
SINK name -- This specifies the name of the sink node.

There must be a single source and sink.
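To make the format concrete, here is a minimal sketch of a reader for it. This is not the lecture's code (that comes below, built on the Graph class): ParsedGraph, parse_graph and parse_graph_text are names I made up for illustration, and rejecting unknown keywords is my own choice. The part to notice is that repeated EDGE lines for the same pair of nodes have their capacities added.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical container for a parsed file; not part of the lecture's code.
struct ParsedGraph {
  std::string source, sink;
  std::map<std::pair<std::string, std::string>, double> capacity;
};

// Reads the stream of words described above.
ParsedGraph parse_graph(std::istream &in) {
  ParsedGraph g;
  std::string word, from, to;
  double cap;

  while (in >> word) {
    if (word == "SOURCE") {
      if (!g.source.empty()) throw std::runtime_error("Can't specify two sources");
      in >> g.source;
    } else if (word == "SINK") {
      if (!g.sink.empty()) throw std::runtime_error("Can't specify two sinks");
      in >> g.sink;
    } else if (word == "EDGE") {
      // Capacities must be positive.
      if (!(in >> from >> to >> cap) || cap <= 0) throw std::runtime_error("Bad EDGE");
      g.capacity[std::make_pair(from, to)] += cap;  // repeated edges accumulate
    } else {
      // Assumption: this sketch treats any other word as an error.
      throw std::runtime_error("Unknown keyword: " + word);
    }
  }
  if (g.source.empty()) throw std::runtime_error("No Source.");
  if (g.sink.empty()) throw std::runtime_error("No Sink.");
  return g;
}

// Convenience wrapper so the parser can be tried on a string.
ParsedGraph parse_graph_text(const std::string &text) {
  std::istringstream in(text);
  return parse_graph(in);
}
```

For instance, parsing "SOURCE A SINK B EDGE A B 2 EDGE A B 1.5 EDGE B A 4" yields two entries: A to B with capacity 3.5, and B to A with capacity 4.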

So, for example, below is the graph from the Network Flow Lecture Notes #1:

g1.txt

SOURCE A
SINK G
EDGE A D 3
EDGE A B 3
EDGE B C 4
EDGE C A 3
EDGE C D 1
EDGE C E 2
EDGE D E 2
EDGE D F 6
EDGE E B 1
EDGE E G 1
EDGE F G 9

The file g2.txt adds an edge from A to C with a capacity of 1.

I have a program makerandom.cpp which takes one argument, a number of nodes, and creates a random graph with one source, one sink, and the given number of other nodes. There is a random edge between every pair of nodes (in a random direction) with a random capacity between zero and 10. There are edges from the source to random nodes with a 40% probability, and there are edges from random nodes to the sink with a 40% probability. Thus, this is a pretty dense graph which should be a challenge to our network flow programs.
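The sketch below is one way a generator like that could be written. It is not makerandom.cpp, whose source isn't shown in these notes. The function names, the seed parameter, the use of std::mt19937, and the reading that each node independently gets a source edge and a sink edge with probability 0.4 are all my assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <random>
#include <string>

// Hypothetical naming helper: interior node i becomes "n00", "n01", ...
std::string node_name(int i) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "n%02d", i);
  return std::string(buf);
}

// Formats one EDGE line with the capacity printed to three decimals.
std::string edge_line(const std::string &from, const std::string &to, double cap) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "EDGE %s %s %.3f\n", from.c_str(), to.c_str(), cap);
  return std::string(buf);
}

// Returns the text of a random graph file with n nodes besides s and t.
std::string make_random_graph(int n, unsigned seed) {
  assert(n >= 0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> capacity(0.0, 10.0);
  std::bernoulli_distribution attach(0.4);   // source and sink edges
  std::bernoulli_distribution forward(0.5);  // direction of each pair's edge

  // My choice: clamp so that no capacity prints as 0.000, since the
  // file format requires positive capacities.
  auto draw = [&]() { return std::max(0.001, capacity(rng)); };

  std::string out = "SOURCE s\nSINK t\n";
  for (int i = 0; i < n; i++) {
    if (attach(rng)) out += edge_line("s", node_name(i), draw());
    if (attach(rng)) out += edge_line(node_name(i), "t", draw());
    for (int j = i + 1; j < n; j++) {   // one edge per pair of nodes
      if (forward(rng)) out += edge_line(node_name(i), node_name(j), draw());
      else              out += edge_line(node_name(j), node_name(i), draw());
    }
  }
  return out;
}
```

With n = 5 this always produces the 10 edges between pairs of nodes, plus anywhere from 0 to 10 edges touching s or t.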

Below is an example of a graph made with makerandom 5. I think we can all agree that finding the maximum flow through this graph will be a bit of a challenge. To help you, I've colored the edges in the minimum cut red (they are n02->n01, n02->n03 and n04->n03):

g3.txt

SOURCE s
SINK t
EDGE n00 t 4.923
EDGE n01 n00 8.824
EDGE n00 n02 6.932
EDGE n00 n03 6.518
EDGE n00 n04 6.183
EDGE n01 t 8.471
EDGE n02 n01 4.929
EDGE n03 n01 5.566
EDGE n01 n04 6.661
EDGE s n02 6.263
EDGE n02 n03 0.741
EDGE n04 n02 5.840
EDGE n04 n03 4.417
EDGE s n04 8.033

Take a minute to study that graph for a bit. Try to convince yourself that the edges in the minimum cut have to be in any maximum flow through the graph. If you take my word for it that these edges compose the minimum cut, it's pretty easy to construct a maximum flow graph:

By the way, the maximum flow of g1.txt is 5 (see the first set of Network Flow lecture notes), and the maximum flow of g3.txt is 10.087.

Reading and representing graphs (netflow1.cpp)

Setting up the data structures in this lecture is a challenge, and I'll be honest that it has taken me several years to get it correct. We will have three different classes: nodes, edges and graphs. We'll start with the simplest, the node:

class Node {
  public: 
    string name;
    vector <class Edge *> adj;
    int visited;
};

Each node has a name, an adjacency list of edges, and a visited field which helps us perform depth-first search. We're using a vector instead of a list, because as it turns out, once we create an edge, we never delete it. Therefore, using a vector makes life easier than deques or lists. You have to say "class Edge" because the definition for an Edge is below that of a Node.

Edges are a little meatier. Each edge has a name, pointers to the two nodes which it connects, a pointer to its reverse edge, and three weights:

original is the edge's capacity in the original graph.
residual will be the edge's capacity in the residual graph.
flow will be the edge's capacity in the final flow graph.

By setting edges up in this way, you maintain all three graphs -- original, residual and flow, with one set of nodes and edges. It makes life easier. Here is the Edge definition:

class Edge {
  public:
    string name;
    Node *n1;
    Node *n2;
    Edge *reverse;
    double original;
    double residual;
    double flow;
};
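To see how the three weights move together, here is a small sketch that pushes flow along an edge. MiniEdge is a stripped-down stand-in for the Edge class (no name or node pointers), and init_pair, push_flow and weights_consistent are names I made up. The update in push_flow is the same one this lecture applies later when it processes an augmenting path.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the lecture's Edge class, keeping only the weights
// and the pointer to the reverse edge.
struct MiniEdge {
  double original;
  double residual;
  double flow;
  MiniEdge *reverse;
};

// Makes fwd and rev each other's reverse, with no flow yet:
// each residual starts out equal to the original capacity.
void init_pair(MiniEdge &fwd, double fwd_cap, MiniEdge &rev, double rev_cap) {
  fwd = MiniEdge{fwd_cap, fwd_cap, 0.0, &rev};
  rev = MiniEdge{rev_cap, rev_cap, 0.0, &fwd};
}

// Pushes f units of flow along e. The residual of e shrinks, and the
// residual of its reverse edge grows so that the flow can be undone.
void push_flow(MiniEdge &e, double f) {
  assert(f >= 0 && f <= e.residual);
  e.residual -= f;
  e.flow += f;
  e.reverse->residual += f;
}

// Checks residual == original - flow + (flow on the reverse edge).
bool weights_consistent(const MiniEdge &e, double eps = 1e-9) {
  double expected = e.original - e.flow + e.reverse->flow;
  return std::fabs(e.residual - expected) < eps;
}
```

For example, take an edge with capacity 3 whose reverse has capacity 0. Pushing 1 unit leaves the edge with residual 2 and flow 1, and gives the reverse edge a residual of 1 even though its original capacity is zero.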

Finally, we'll have a Graph class. We're going to start with the following definition:

class Graph {
  public:

     Graph();
     ~Graph();
     void Print();
     Node *Get_Node(string &s);
     Edge *Get_Edge(Node *n1, Node *n2);

     Node *Source;
     Node *Sink;
     vector <Node *> Nodes; 
     vector <Edge *> Edges; 
     map <string, Node *> N_Map;
     map <string, Edge *> E_Map;
};

There are some design decisions here, which I'd like to go over. First, look at the data. There's a source and a sink, and two vectors that contain pointers to all of the nodes and edges. I have these vectors so that whenever you want to perform an operation on all of the nodes or edges, you can do it with these vectors. They are also convenient because I can use them to delete nodes and edges in the destructor. In fact, I'll show my destructor now, since all it does is delete nodes and edges:

Graph::~Graph()
{
  int i;

  for (i = 0; i < Nodes.size(); i++) delete(Nodes[i]);
  for (i = 0; i < Edges.size(); i++) delete(Edges[i]);
}

The maps N_Map and E_Map store nodes and edges by their names. They are only used when we read in the graph, because when we specify a node or an edge, it may exist already. To figure out whether it exists already, we construct a name and then check N_Map or E_Map.

To exemplify, here's the code for Get_Node(), which either finds a node and returns a pointer to it, or determines that the node doesn't exist, in which case it creates the node, puts it into Nodes and N_Map, and returns it:

Node *Graph::Get_Node(string &s)
{
  map <string, Node *>::iterator nit;
  Node *n;

  nit = N_Map.find(s);
  if (nit != N_Map.end()) return nit->second;

  n = new Node;
  n->name = s;
  Nodes.push_back(n);
  N_Map.insert(make_pair(s, n));
  return n;
}

Next, our constructor reads graph files from standard input. For the moment, we're just going to have it create nodes and not edges:

Graph::Graph()
{
  string s, f, t;
  double cap;
  Node *n1, *n2;

  Source = NULL;
  Sink = NULL;

  while (cin >> s) {
    if (s == "SOURCE") {
      cin >> s;
      if (Source != NULL) { fprintf(stderr, "Can't specify two sources\n"); exit(1); }
      Source = Get_Node(s);
    } else if (s == "SINK") {
      cin >> s;
      if (Sink != NULL) { fprintf(stderr, "Can't specify two sinks\n"); exit(1); }
      Sink = Get_Node(s);
    } else if (s == "EDGE") {
      cin >> f >> t >> cap;

      n1 = Get_Node(f);
      n2 = Get_Node(t);   /* We're not creating edges yet */
    }
  }

  if (Source == NULL) { fprintf(stderr, "No Source.\n"); exit(1); }
  if (Sink == NULL) { fprintf(stderr, "No Sink.\n"); exit(1); }
}

Finally, we have a print method that prints the nodes, and a main() that creates a graph and prints it. All of the above code is in netflow1.cpp:

void Graph::Print()
{
  Node *n;
  int i;

  printf("Source: %s\n", Source->name.c_str());
  printf("Sink:   %s\n", Sink->name.c_str());
  printf("Nodes: ");

  for (i = 0; i < Nodes.size(); i++) {
    n = Nodes[i];
    printf(" %s", n->name.c_str());
  }
  printf("\n");
}

int main()
{
  Graph *g;

  g = new Graph();
  g->Print();
}

When we run this on g1.txt and g3.txt, it prints all of the node names, plus the source and sink:

UNIX> make netflow1
g++ -O -c netflow1.cpp
g++ -O -o netflow1 netflow1.cpp
UNIX> netflow1 < g1.txt
Source: A
Sink:   G
Nodes:  A G D B C E F
UNIX> netflow1 < g3.txt
Source: s
Sink:   t
Nodes:  s t n00 n01 n02 n03 n04
UNIX>

Netflow2.cpp - Reading in edges

Our next task is to read in edges, and there are three major design decisions that we make that will simplify our lives quite a bit:

We will never delete edges -- we simply have their capacities be zero, and we ignore them when we do things like calculate flow.
Whenever we have an edge from A to B, we will make sure that we also have a reverse edge from B to A. This edge may have all three capacities equal to zero -- that's ok. The fact that we guarantee that this edge exists makes our lives easier.
We make sure that each edge e has its pointer e->reverse defined so that e->reverse->reverse equals e.

Given these decisions, writing Get_Edge() is pretty straightforward. We calculate an edge's key from the names of the nodes at its endpoints. (The code is in netflow2.cpp).

Edge *Graph::Get_Edge(Node *n1, Node *n2)
{
  map <string, Edge *>::iterator eit;
  Edge *e;
  string name;

  name = n1->name + "->";
  name += n2->name;

  eit = E_Map.find(name);
  if (eit != E_Map.end()) return eit->second;

  e = new Edge;
  e->name = name;
  e->n1 = n1;
  e->n2 = n2;
  e->original = 0;
  e->residual = 0;
  e->flow = 0;
  e->reverse = NULL;
  
  Edges.push_back(e);
  E_Map.insert(make_pair(name, e));
  return e;
}

We use Get_Edge() in our constructor, which is a little subtle:

Graph::Graph()
{
  string s, f, t;
  double cap;
  Node *n1, *n2;
  Edge *e;

  Source = NULL;
  Sink = NULL;

  while (cin >> s) {
    if (s == "SOURCE") {
      cin >> s;
      if (Source != NULL) { fprintf(stderr, "Can't specify two sources\n"); exit(1); }
      Source = Get_Node(s);
    } else if (s == "SINK") {
      cin >> s;
      if (Sink != NULL) { fprintf(stderr, "Can't specify two sinks\n"); exit(1); }
      Sink = Get_Node(s);
    } else if (s == "EDGE") {
      cin >> f >> t >> cap;

      n1 = Get_Node(f);
      n2 = Get_Node(t);

      e = Get_Edge(n1, n2);
      e->original += cap;

      if (e->reverse == NULL) {
        e->reverse = Get_Edge(n2, n1);
        e->reverse->reverse = e;
        n1->adj.push_back(e);
        n2->adj.push_back(e->reverse);
      }
    }
  }

  if (Source == NULL) { fprintf(stderr, "No Source.\n"); exit(1); }
  if (Sink == NULL) { fprintf(stderr, "No Sink.\n"); exit(1); }
}

The subtlety is that we only put an edge onto its node's adjacency list when we first create it, which we test by testing whether e->reverse is NULL. If e->reverse is non-NULL, then we know that both the edge and the reverse edge have been created before, and therefore are already on their nodes' adjacency lists.
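If you want to convince yourself that this bookkeeping works, the sketch below rebuilds just the edge-pairing logic in miniature and checks the two properties we care about: every edge's reverse points back at it, and every edge sits on its node's adjacency list exactly once. MiniGraph, MNode, MEdge and invariants_hold are hypothetical names, and I use std::unique_ptr instead of the destructor loop. Self-loops are not handled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct MEdge;
struct MNode { std::string name; std::vector<MEdge *> adj; };
struct MEdge { MNode *n1; MNode *n2; MEdge *reverse; double original; };

// Hypothetical miniature of the Graph class: just nodes, edges and pairing.
struct MiniGraph {
  std::map<std::string, std::unique_ptr<MNode>> nodes;
  std::map<std::string, std::unique_ptr<MEdge>> edges;

  // Finds the node named s, creating it if it doesn't exist yet.
  MNode *get_node(const std::string &s) {
    std::unique_ptr<MNode> &slot = nodes[s];
    if (!slot) { slot.reset(new MNode); slot->name = s; }
    return slot.get();
  }

  // Finds the edge a->b, creating it with zero capacity if needed.
  MEdge *get_edge(MNode *a, MNode *b) {
    std::unique_ptr<MEdge> &slot = edges[a->name + "->" + b->name];
    if (!slot) slot.reset(new MEdge{a, b, nullptr, 0.0});
    return slot.get();
  }

  // Same steps as the EDGE case of the constructor above.
  void add_edge(const std::string &from, const std::string &to, double cap) {
    MNode *n1 = get_node(from);
    MNode *n2 = get_node(to);
    MEdge *e = get_edge(n1, n2);
    e->original += cap;
    if (e->reverse == nullptr) {        // first time we see this pair
      e->reverse = get_edge(n2, n1);
      e->reverse->reverse = e;
      n1->adj.push_back(e);
      n2->adj.push_back(e->reverse);
    }
  }

  // True if each edge has a proper reverse and is listed once on n1's adj.
  bool invariants_hold() const {
    for (const auto &kv : edges) {
      const MEdge *e = kv.second.get();
      if (e->reverse == nullptr || e->reverse->reverse != e) return false;
      int count = 0;
      for (const MEdge *x : e->n1->adj) if (x == e) count++;
      if (count != 1) return false;
    }
    return true;
  }
};
```

Adding A->C with capacity 1, then C->A with capacity 3, then A->C again with capacity 2 leaves exactly two edges, one on each node's adjacency list, both with an original capacity of 3.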

To test our program, we also use the input file g2.txt, which is identical to g1.txt, except that there is an additional edge from A to C.

UNIX> make netflow2
g++ -O -c netflow2.cpp
g++ -O -o netflow2 netflow2.cpp
UNIX> netflow2 < g1.txt
Source: A
Sink:   G
Node A: (A->D:3.000) (A->B:3.000) (A->C:0.000)
Node G: (G->E:0.000) (G->F:0.000)
Node D: (D->A:0.000) (D->C:0.000) (D->E:2.000) (D->F:6.000)
Node B: (B->A:0.000) (B->C:4.000) (B->E:0.000)
Node C: (C->B:0.000) (C->A:3.000) (C->D:1.000) (C->E:2.000)
Node E: (E->C:0.000) (E->D:0.000) (E->B:1.000) (E->G:1.000)
Node F: (F->D:0.000) (F->G:9.000)
UNIX> netflow2 < g2.txt
Source: A
Sink:   G
Node A: (A->D:3.000) (A->B:3.000) (A->C:1.000)
Node G: (G->E:0.000) (G->F:0.000)
Node D: (D->A:0.000) (D->C:0.000) (D->E:2.000) (D->F:6.000)
Node B: (B->A:0.000) (B->C:4.000) (B->E:0.000)
Node C: (C->B:0.000) (C->A:3.000) (C->D:1.000) (C->E:2.000)
Node E: (E->C:0.000) (E->D:0.000) (E->B:1.000) (E->G:1.000)
Node F: (F->D:0.000) (F->G:9.000)
UNIX> netflow2 < g3.txt
Source: s
Sink:   t
Node s: (s->n02:6.263) (s->n04:8.033)
Node t: (t->n00:0.000) (t->n01:0.000)
Node n00: (n00->t:4.923) (n00->n01:0.000) (n00->n02:6.932) (n00->n03:6.518) (n00->n04:6.183)
Node n01: (n01->n00:8.824) (n01->t:8.471) (n01->n02:0.000) (n01->n03:0.000) (n01->n04:6.661)
Node n02: (n02->n00:0.000) (n02->n01:4.929) (n02->s:0.000) (n02->n03:0.741) (n02->n04:0.000)
Node n03: (n03->n00:0.000) (n03->n01:5.566) (n03->n02:0.000) (n03->n04:0.000)
Node n04: (n04->n00:0.000) (n04->n01:0.000) (n04->n02:5.840) (n04->n03:4.417) (n04->s:0.000)
UNIX>

Netflow3.cpp - Setting up the augmenting paths, and finding one with DFS

To find the max flow, we create a new graph method Find_Max_Flow(). This starts by clearing all of the flow and copying each residual from the original. It then repeatedly calls another new method Find_Augmenting_Path(), which finds an augmenting path through the residual graph and updates both the flow and residual graphs accordingly. Find_Augmenting_Path() returns the flow of the augmenting path, which allows Find_Max_Flow() to terminate when no more flow is found. This code is in netflow3.cpp:

double Graph::Find_Max_Flow()
{
  double total;
  double f;
  Edge *e;
  int i;

  for (i = 0; i < Edges.size(); i++) {
    e = Edges[i];
    e->flow = 0;
    e->residual = e->original;
  }
  total = 0;
  while (1) {
    f = Find_Augmenting_Path();
    if (f == 0) {
      return total;
    } else {
      total += f;
    }
  }
}

So that we can program incrementally, we write Find_Augmenting_Path() so that it simply calls a depth-first search to find a path from the source to the sink, and then it exits the program:

double Graph::Find_Augmenting_Path()
{
  int i;

  for (i = 0; i < Nodes.size(); i++) Nodes[i]->visited = 0;
  if (DFS(Source)) {
    printf("Quitting.\n");
  }
  exit(0);
}

Finally, our depth-first search finds a path to the sink and then returns, printing the edges along the path in reverse order:

int Graph::DFS(Node *n)
{
  int i;
  Edge *e;
  double f;

  if (n->visited) return 0;
  n->visited = 1;
  if (n == Sink) return 1;

  for (i = 0; i < n->adj.size(); i++) {
    e = n->adj[i];
    if (e->residual > 0) {
      if (DFS(e->n2)) {
        printf("Found a path to the sink.  Edge %s.\n", e->name.c_str());
        return 1;
      }
    }
  }
  return 0;
}

We call Find_Max_Flow() in main(), and it's time to test.

int main()
{
  Graph *g;
  double f;

  g = new Graph();
  f = g->Find_Max_Flow();
  printf("Max flow is %.3lf\n", f);
}

It should find a valid path from the source to the sink:

UNIX> make netflow3
g++ -O -c netflow3.cpp
g++ -O -o netflow3 netflow3.cpp
UNIX> netflow3 < g1.txt
Found a path to the sink.  Edge E->G.
Found a path to the sink.  Edge D->E.
Found a path to the sink.  Edge A->D.
Quitting.
UNIX> netflow3 < g3.txt
Found a path to the sink.  Edge n00->t.
Found a path to the sink.  Edge n01->n00.
Found a path to the sink.  Edge n02->n01.
Found a path to the sink.  Edge s->n02.
Quitting.
UNIX>

Here are the paths:

Netflow4.cpp - Processing the augmenting paths

Now, instead of printing the path, we're going to put a Path vector into the graph class, and have DFS create the path upon discovery:

int Graph::DFS(Node *n)
{
  int i;
  Edge *e;
  double f;

  if (n->visited) return 0;
  n->visited = 1;
  if (n == Sink) return 1;

  for (i = 0; i < n->adj.size(); i++) {
    e = n->adj[i];
    if (e->residual > 0) {
      if (DFS(e->n2)) {
        Path.push_back(e);
        return 1;
      }
    }
  }
  return 0;
}

In Find_Augmenting_Path(), we process the path, figuring out the flow and then modifying the flow and residual graphs accordingly:

double Graph::Find_Augmenting_Path()
{
  int i;
  double f;
  Edge *e;

  for (i = 0; i < Nodes.size(); i++) Nodes[i]->visited = 0;
  Path.clear();

  if (DFS(Source)) {
    f = Path[0]->residual;
    for (i = 1; i < Path.size(); i++) {
      if (Path[i]->residual < f) f = Path[i]->residual;
    }
    for (i = 0; i < Path.size(); i++) {
      e = Path[i];
      e->residual -= f;
      e->flow += f;
      e->reverse->residual += f;
    }
    printf("Found path with flow of %.3lf:", f);
    for (i = Path.size()-1; i >= 0; i--) printf(" %s", Path[i]->name.c_str());
    printf("\n");
    return f;
  } else {
    return 0;
  }
}

We test it, and it's successful at finding the flow in each graph:

UNIX> make netflow4
g++ -O -c netflow4.cpp
g++ -O -o netflow4 netflow4.cpp
UNIX> netflow4 < g1.txt
Found path with flow of 1.000: A->D D->E E->G
Found path with flow of 2.000: A->D D->F F->G
Found path with flow of 1.000: A->B B->C C->D D->F F->G
Found path with flow of 1.000: A->B B->C C->E E->D D->F F->G
Max flow is 5.000
UNIX> netflow4 < g3.txt
Found path with flow of 4.923: s->n02 n02->n01 n01->n00 n00->t
Found path with flow of 0.006: s->n02 n02->n01 n01->t
Found path with flow of 0.741: s->n02 n02->n03 n03->n01 n01->t
Found path with flow of 4.417: s->n04 n04->n03 n03->n01 n01->t
Max flow is 10.087
UNIX>
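If you'd like to play with the algorithm without the pointer-based classes, here is a minimal sketch of the same idea (DFS for an augmenting path, then update the residuals) using integer node ids. It is not the code in netflow4.cpp. FlowNet, Arc, add_edge and max_flow are names I made up, repeated edges become parallel arcs instead of being merged, and each arc finds its reverse by flipping the low bit of its index.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Index-based variant: arc 2k is a real edge, arc 2k+1 is its reverse.
struct FlowNet {
  struct Arc { int to; double residual; };
  std::vector<Arc> arcs;
  std::vector<std::vector<int>> adj;   // adj[u] holds indices into arcs
  std::vector<char> visited;

  explicit FlowNet(int n) : adj(n), visited(n, 0) {}

  void add_edge(int from, int to, double cap) {
    assert(cap > 0);
    adj[from].push_back((int)arcs.size());
    arcs.push_back(Arc{to, cap});
    adj[to].push_back((int)arcs.size());
    arcs.push_back(Arc{from, 0.0});     // reverse arc starts empty
  }

  // Returns the flow pushed along a path from u to sink, or 0 if none.
  // limit is the smallest residual seen on the path so far.
  double dfs(int u, int sink, double limit) {
    if (visited[u]) return 0;
    visited[u] = 1;
    if (u == sink) return limit;
    for (int id : adj[u]) {
      if (arcs[id].residual <= 0) continue;
      double f = dfs(arcs[id].to, sink, std::min(limit, arcs[id].residual));
      if (f > 0) {
        arcs[id].residual -= f;         // use up capacity on this arc
        arcs[id ^ 1].residual += f;     // and give it to the reverse arc
        return f;
      }
    }
    return 0;
  }

  // Repeats the search until no augmenting path is left.
  double max_flow(int source, int sink) {
    double total = 0;
    while (true) {
      std::fill(visited.begin(), visited.end(), 0);
      double f = dfs(source, sink, std::numeric_limits<double>::infinity());
      if (f == 0) return total;
      total += f;
    }
  }
};
```

Numbering the nodes of g1.txt as A=0 through G=6 and adding its eleven edges, max_flow(0, 6) returns 5, matching the output above.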

Netflow5.cpp - Finding the minimum cut

While we're at it, let's find the minimum cut. As it turns out, when Find_Augmenting_Path() fails to find a path to the sink, all of the nodes reachable from the source will have their visited fields set. That allows us to find the cut in Find_Max_Flow() -- we'll put it into a variable Cut, and then print the cut in the main program (in netflow5.cpp):

double Graph::Find_Max_Flow()
{
  double total;
  double f;
  Edge *e;
  Node *n;
  int i, j;

  for (i = 0; i < Edges.size(); i++) {
    e = Edges[i];
    e->flow = 0;
    e->residual = e->original;
  }
  total = 0;
  while (1) {
    f = Find_Augmenting_Path();
    if (f == 0) {
      Cut.clear();
      for (i = 0; i < Nodes.size(); i++) {
        n = Nodes[i];
        if (n->visited) {
          for (j = 0; j < n->adj.size(); j++) {
            e = n->adj[j];
            if (e->original > 0 && !e->n2->visited) Cut.push_back(e);
          }
        }
      }
      return total;
    } else {
      total += f;
    }
  }
}

int main()
{
  Graph *g;
  double f;
  int i;

  g = new Graph();
  f = g->Find_Max_Flow();
  printf("Max flow is %.3lf\n", f);
  printf("Cut:");
  for (i = 0; i < g->Cut.size(); i++) printf(" %s", g->Cut[i]->name.c_str());
  printf("\n");
}

It properly finds the cuts in our two examples:

UNIX> make netflow5
g++ -O -c netflow5.cpp
g++ -O -o netflow5 netflow5.cpp
UNIX> netflow5 < g1.txt
Found path with flow of 1.000: A->D D->E E->G
Found path with flow of 2.000: A->D D->F F->G
Found path with flow of 1.000: A->B B->C C->D D->F F->G
Found path with flow of 1.000: A->B B->C C->E E->D D->F F->G
Max flow is 5.000
Cut: A->D C->D E->G
UNIX> netflow5 < g3.txt
Found path with flow of 4.923: s->n02 n02->n01 n01->n00 n00->t
Found path with flow of 0.006: s->n02 n02->n01 n01->t
Found path with flow of 0.741: s->n02 n02->n03 n03->n01 n01->t
Found path with flow of 4.417: s->n04 n04->n03 n03->n01 n01->t
Max flow is 10.087
Cut: n02->n01 n02->n03 n04->n03
UNIX>
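A handy sanity check is that the capacities of the edges in the cut add up to the maximum flow. The sketch below does that check for g1.txt on its own, without the Graph class. CapEdge, cut_edges, cut_capacity and g1_edges are hypothetical names. The visited set {A, B, C, E} is what I get by working through the residual graph by hand after the four paths printed above; it is not printed by netflow5.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One edge of the original graph.
struct CapEdge { std::string from, to; double capacity; };

// Edges that start at a visited node and end at an unvisited one.
std::vector<CapEdge> cut_edges(const std::vector<CapEdge> &edges,
                               const std::set<std::string> &visited) {
  std::vector<CapEdge> cut;
  for (const CapEdge &e : edges) {
    if (e.capacity > 0 && visited.count(e.from) && !visited.count(e.to)) {
      cut.push_back(e);
    }
  }
  return cut;
}

// Sum of the capacities in a cut.
double cut_capacity(const std::vector<CapEdge> &cut) {
  double total = 0;
  for (const CapEdge &e : cut) total += e.capacity;
  return total;
}

// The edges of g1.txt.
std::vector<CapEdge> g1_edges() {
  return {
    {"A", "D", 3}, {"A", "B", 3}, {"B", "C", 4}, {"C", "A", 3},
    {"C", "D", 1}, {"C", "E", 2}, {"D", "E", 2}, {"D", "F", 6},
    {"E", "B", 1}, {"E", "G", 1}, {"F", "G", 9}
  };
}
```

With visited = {A, B, C, E}, cut_edges() picks out A->D, C->D and E->G, and their capacities sum to 3 + 1 + 1 = 5, the same as the max flow.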

Roundoff Error

If you try large graphs, you'll notice some behavior that's troubling:

UNIX> makerandom 50 0 | netflow5 | grep 0\.000 | head -n 1
Found path with flow of 0.000: s->n22 n22->n01 n01->n00 n00->n06 n06->n02 n02->n04 n04->n03 n03->n08 n08->n05 n05->n07 n07->n15 n15->n10 n10->n09 n09->n11 n11->n12 n12->n13 n13->n14 n14->n16 n16->t
UNIX>

Should this ever happen? No. Why? Because makerandom spits out capacities to three decimal digits. Therefore, no flow should be less than 0.001. What's going on is roundoff error: most three-digit decimals have no exact representation as a double, and the tiny errors accumulate as many flow values are subtracted from residuals. I'm not going to worry about chasing down the problem and fixing it. However, if you ever wonder how or when roundoff error occurs, this is it. For that reason, in the next lecture, we're going to switch to integers.
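Here is a tiny illustration of the problem and of two common ways around it. Neither is taken from the lecture code: to_thousandths and is_effectively_zero are names I made up, and the scale factor of 1000 assumes capacities with exactly three decimal digits, as makerandom produces.

```cpp
#include <cassert>
#include <cmath>

// Converts a capacity with three decimal digits into an exact integer
// count of thousandths. Rounding absorbs the representation error.
long long to_thousandths(double capacity) {
  return std::llround(capacity * 1000.0);
}

// Alternative if you stay with doubles: treat tiny values as zero.
// The default eps is an arbitrary choice, far below 0.001.
bool is_effectively_zero(double x, double eps = 1e-9) {
  return std::fabs(x) < eps;
}

// True if a + b compares exactly equal to expected as doubles.
bool sum_is_exact(double a, double b, double expected) {
  return a + b == expected;
}
```

With IEEE doubles, sum_is_exact(0.1, 0.2, 0.3) is false, even though the difference is effectively zero. After scaling, the same arithmetic is exact: to_thousandths(4.929) - to_thousandths(4.923) is 6, which is the 0.006 flow on the second path through g3.txt.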