CS302 Lecture notes
Topological Sort / Cycle Detection


Here's a little code for topological sort and cycle detection. Before going into it, note that whenever you represent graphs in files, you have to decide how to format them. A good way is to specify vertices by name and then to specify the edges between them. An example is a file that specifies the prerequisite structure for courses; for instance, CS140 is a prerequisite for CS302. The file schedule.ts defines (more or less) the prerequisite structure of all the computer science classes:
CLASS CS140
  PREREQ CS102

CLASS CS160
  PREREQ CS102

CLASS CS302
  PREREQ CS140

CLASS CS311
  PREREQ MATH300
  PREREQ CS302
...
Note, the class names are vertices, and are defined either by a "CLASS" line, or by a "PREREQ" line (for example, there is no "CLASS" line for CS102). The "PREREQ" lines also define an edge from the specified class to the most recently defined class. For example, the first "PREREQ" line above defines an edge from CS102 to CS140, because CS102 is a prerequisite for CS140.

Thus, the above file defines a directed graph.
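
To make the format concrete, here is a minimal sketch of a reader for it that uses only the standard library. The names PrereqGraph and readPrereqs are made up for this sketch, and it assumes that every non-blank line consists of exactly two whitespace-separated words. Each vertex maps to the list of classes it is a prerequisite for.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type for this sketch: each class name maps to the
// classes that it is a prerequisite for (its outgoing edges).
using PrereqGraph = std::map<std::string, std::vector<std::string>>;

// Reads CLASS/PREREQ lines from `in` and returns the directed graph.
// Throws std::runtime_error on a malformed line.
PrereqGraph readPrereqs(std::istream &in) {
  PrereqGraph g;
  std::string line;
  std::string current;  // most recently defined CLASS, empty if none yet
  int lineNumber = 0;

  while (std::getline(in, line)) {
    lineNumber++;
    std::istringstream fields(line);
    std::string keyword, name, extra;
    if (!(fields >> keyword)) continue;  // skip blank lines
    std::string where = "line " + std::to_string(lineNumber) + ": ";
    if (!(fields >> name) || (fields >> extra)) {
      throw std::runtime_error(where + "expected a keyword and one name");
    }
    if (keyword == "CLASS") {
      current = name;
      g[current];  // operator[] creates the vertex if it is not there yet
    } else if (keyword == "PREREQ") {
      if (current.empty()) {
        throw std::runtime_error(where + "PREREQ with no current class");
      }
      g[name].push_back(current);  // edge from the prerequisite to the class
    } else {
      throw std::runtime_error(where + "lines must be CLASS or PREREQ");
    }
  }
  return g;
}
```

Calling readPrereqs(std::cin) on the fragment above would give CS102 two outgoing edges (to CS140 and CS160), even though CS102 never appears on a "CLASS" line.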

The first challenge when dealing with a graph is to read it in. The program GraphReader.cpp does just that. First, it defines classes for vertices and edges:

class Edge;   // forward declaration: Vertex stores pointers to Edge

class Vertex {
protected:
    string name;
    dList<Edge *> edges;
public:
  Vertex(string n) {
    name = n;
  }
  string getName() { return name; }

  // methods for iterating through the edges. We do not make the edge
  // list available since we might want to change the implementation at
  // a later time and therefore we do not want the program to depend on
  // the edges being implemented as a dlist
  void firstEdge() { edges.first(); }
  void nextEdge() { edges.next(); }
  bool endOfEdges() { return edges.endOfList(); }
  Edge *getEdge() { return edges.get(); }

  void addEdge(Edge *e) { edges.append(e); }
};

class Edge {
public:
  Edge(Vertex *vtx1, Vertex *vtx2) {
    v1 = vtx1;
    v2 = vtx2;
  }
  Vertex *getVertex1() { return v1; }
  Vertex *getVertex2() { return v2; }

protected:
    Vertex *v1;
    Vertex *v2;
};
Note, this is an adjacency list representation. Now, a graph is simply a list of vertices. Since we access vertices by name in the specification file, it is actually better to maintain a red-black tree of vertices, so that we can find any vertex by name in O(log n) time, where n is the number of vertices.
class Graph {
protected:
    rbTree<string> vertices;
public:
  // find the vertex with the given name 
  Vertex *find(string name) {
    if (vertices.find(name)) 
      return (Vertex *)vertices.getVal().v;
    else
      return 0;
  }

  // insert a vertex with the given name into the graph
  Vertex *insert(string name) {
    vertices.insert(name, new_jval_v(new Vertex(name)));
    return (Vertex *)vertices.getVal().v;
  }

  // methods for iterating through the vertices in the graph
  void firstVertex() { vertices.first(); }
  void nextVertex() { vertices.next(); }
  bool endOfVertices() { return vertices.endOfList(); }
  Vertex *getVertex() { return (Vertex *)vertices.getVal().v; }
};
In a real application we would probably not define the graph class since it simply adds an unnecessary layer of complexity. Instead we would simply define a variable called vertices that is an rbtree. Note how all the methods in the graph class essentially replicate methods in the rbtree class that holds the vertices.

Ok -- here is GraphReader.cpp. Pretty straightforward. When it is done, it prints out all the nodes and their edges.

int main()
{
  Fields *f;
  Graph g;
  Vertex *v;
  Vertex *v2;
  Edge *e;
  string s;

  v = 0;

  f = new Fields();

  while (f->get_line() >= 0) {
    if (f->get_NF() > 0) {
      if (f->get_field(0) == "CLASS") {
        if (f->get_NF() != 2) {
          fprintf(stderr, "%d: CLASS name\n", f->get_line_number());
          exit(1);
        } 
        s = f->get_field(1);
        v = g.find(s);
        if (v == 0) {
          v = g.insert(s);
        } 
      } else if (f->get_field(0) == "PREREQ") {
        if (f->get_NF() != 2) {
          fprintf(stderr, "%d: PREREQ class\n", f->get_line_number());
          exit(1);
        } 
        if (v == 0) {
          fprintf(stderr, "%d: PREREQ -- no current vertex\n", 
                  f->get_line_number());
          exit(1);
        } 
        s = f->get_field(1);
        v2 = g.find(s);
        if (v2 == 0) {
          v2 = g.insert(s);
        } 
        e = new Edge(v2, v);
        v2->addEdge(e);
    
      } else {
        fprintf(stderr, "%d: lines must be CLASS or PREREQ\n",
                f->get_line_number());
        exit(1);
      }
    }
  }

  for (g.firstVertex(); !g.endOfVertices(); g.nextVertex()) {
    v = g.getVertex();
    printf("Class %s\n", v->getName().c_str());
    for (v->firstEdge(); !v->endOfEdges(); v->nextEdge()) {
      e = v->getEdge();
      printf("   is a prereq for %s\n", e->getVertex2()->getName().c_str());
    }
    printf("\n");
  }
}

Topological Sort

The book describes topological sort. Read it. In the example of classes and prerequisites, a topological sort will return a schedule of classes that does not violate the prerequisite structure. As the book says, a simple way to do this is to first find a class with no incoming edges (i.e. no prerequisites). Print that out, and then remove it and its outgoing edges from the graph. Repeat until the graph is empty.

The code in TS1.cpp does this. First, it adds a field nincident to each vertex: the number of prerequisites (incoming edges) that the vertex has. This is set when the graph is created from the input file. Then, the routine find_zero_incident() returns a pointer to the RBNode of a vertex with no prerequisites. Note, it does this by traversing the tree, so each call takes time linear in the number of vertices.
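
As an illustration of the same idea (this is not the code in TS1.cpp), here is a minimal sketch built on standard containers. PrereqGraph and simpleTopologicalSort are names invented for the sketch; the count map plays the role of the nincident field, and the linear scan stands in for find_zero_incident().

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical type for this sketch: each class name maps to the
// classes that it is a prerequisite for (its outgoing edges).
using PrereqGraph = std::map<std::string, std::vector<std::string>>;

// Repeatedly scans for a vertex with no remaining prerequisites, emits
// it, and removes it and its outgoing edges. If the result is shorter
// than the number of vertices, the remaining vertices lie on or behind
// a cycle.
std::vector<std::string> simpleTopologicalSort(const PrereqGraph &g) {
  // Count the incoming edges of every vertex, including vertices that
  // appear only as the target of an edge.
  std::map<std::string, int> nincident;
  for (const auto &entry : g) {
    nincident[entry.first];  // make sure the vertex has a count (0)
    for (const auto &to : entry.second) nincident[to]++;
  }

  std::vector<std::string> order;
  while (!nincident.empty()) {
    // Linear scan for a vertex with zero incoming edges.
    auto it = nincident.begin();
    while (it != nincident.end() && it->second != 0) ++it;
    if (it == nincident.end()) break;  // every vertex left has a prereq

    order.push_back(it->first);
    // "Remove" the outgoing edges by decrementing the targets' counts.
    auto out = g.find(it->first);
    if (out != g.end()) {
      for (const auto &to : out->second) nincident[to]--;
    }
    nincident.erase(it);  // remove the vertex itself
  }
  return order;
}
```

Because every pass rescans the remaining vertices, the sketch does work proportional to the square of the number of vertices in the worst case, which is the cost that a list of zero-count vertices avoids.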

Make sure you understand this code. This is very simple graph code. Test it on schedule.ts:

UNIX> TS1 < schedule.ts
CS102
CS140
CS160
CS302
CS340
CS360
CS365
CS530
CS560
MATH231
CS370
MATH300
CS311
CS380
CS411
CS580
UNIX> 
Ok, now, as the book says, you can improve this by instead maintaining a list of nodes with zero incident edges. Then the task of finding a vertex with zero incident edges is constant time. This code is in TS2.cpp. Again, make sure that you can trace through this. Note that when you run it, the output is different than the output for TS1. Both lists represent topological orders and show that topological orders are not necessarily unique.
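
The improvement can be sketched as follows, again with standard containers rather than the classes used in TS2.cpp. PrereqGraph and queueTopologicalSort are names invented for the sketch, and the choice of a FIFO queue for the list is an assumption; any container that hands back some zero-count vertex would do.

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical type for this sketch: each class name maps to the
// classes that it is a prerequisite for (its outgoing edges).
using PrereqGraph = std::map<std::string, std::vector<std::string>>;

// Keeps a queue of vertices whose prerequisites have all been emitted,
// so picking the next vertex never requires a scan. A result shorter
// than the number of vertices means the graph has a cycle.
std::vector<std::string> queueTopologicalSort(const PrereqGraph &g) {
  // Count incoming edges for every vertex.
  std::map<std::string, int> nincident;
  for (const auto &entry : g) {
    nincident[entry.first];  // make sure the vertex has a count (0)
    for (const auto &to : entry.second) nincident[to]++;
  }

  // Start with every vertex that has no prerequisites at all.
  std::queue<std::string> ready;
  for (const auto &entry : nincident) {
    if (entry.second == 0) ready.push(entry.first);
  }

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::string v = ready.front();
    ready.pop();
    order.push_back(v);

    auto out = g.find(v);
    if (out == g.end()) continue;  // v has no outgoing edges
    for (const auto &to : out->second) {
      // A target becomes ready when its last prerequisite is emitted.
      if (--nincident[to] == 0) ready.push(to);
    }
  }
  return order;
}
```

The order in which vertices enter the queue determines which of the valid topological orders comes out, which is one reason two correct programs can print different schedules for the same input.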


Cycle Detection

One problem with both TS1.cpp and TS2.cpp is that if you give them an input file with a cycle, such as cycle.ts, they cannot produce a complete ordering:
UNIX> TS1 < cycle.ts
MATH231
CS370
MATH300
Problems.....
UNIX> TS2 < cycle.ts
MATH300
MATH231
CS370
The standard way to recognize cycles in a graph is to do a depth-first search, marking vertices along the way. If you hit a vertex that is marked as still being visited (that is, one on the current search path), then you have a cycle. This is done in CycleTest1.cpp. The important code is visit(), which does the depth-first search. Note, we set visited to DONE_VISITED (2) once a vertex has been fully explored, to show that no cycle is reachable from it and that we have already checked it:
const int NOT_VISITED = 0;
const int BEING_VISITED = 1;
const int DONE_VISITED = 2;

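void Vertex::visit()
{
  Edge *e;

  // Reaching a vertex that is still being visited means we have followed
  // edges back onto the current search path: a cycle.
  if (visited == BEING_VISITED) {
    printf("Cycle detected\n");
    exit(1);
  }
  // Already fully explored: no cycle is reachable from here.
  if (visited == DONE_VISITED) return;

  visited = BEING_VISITED;
  for (edges.first(); !edges.endOfList(); edges.next()) {
    e = edges.get();
    e->getVertex2()->visit();
  }
  visited = DONE_VISITED;
  return;
}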
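Finally, CycleTest2.cpp actually prints out the cycle that is detected. This is done by a simple modification to visit(), which now returns the vertex at which the cycle was detected, or 0 if there is none (the main changes are marked with -->):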
  Vertex *Vertex::visit()
  {
    Edge *e;
    Vertex *v2;

    if (visited == BEING_VISITED) {
-->      printf("Cycle: %s", name.c_str());
-->      return this;
    }
    if (visited == DONE_VISITED) return 0;

    visited = BEING_VISITED;
    for (edges.first(); !edges.endOfList(); edges.next()) {
      e = edges.get();
      v2 = e->getVertex2()->visit();
-->      if (v2 != 0) {
-->        printf(" <- %s", name.c_str());
-->        if (v2 == this) { printf("\n"); exit(1); }
-->        return v2;
      }
    }
    visited = DONE_VISITED;
    return 0;
  }
See how it works on cycle.ts:
UNIX> CycleTest2 < cycle.ts
Cycle: CS102 <- CS580 <- CS380 <- CS311 <- CS302 <- CS140 <- CS102
UNIX>
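
For comparison, here is a minimal self-contained sketch of the same three-state depth-first search on standard containers. It is not the code in CycleTest2.cpp: PrereqGraph, dfsVisit and findCycle are names invented for the sketch, and instead of printing and exiting it returns the cycle as a list of vertices in edge direction (prerequisite first), which is the reverse of the "<-" order printed above.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical type for this sketch: each class name maps to the
// classes that it is a prerequisite for (its outgoing edges).
using PrereqGraph = std::map<std::string, std::vector<std::string>>;

enum VisitState { NOT_VISITED = 0, BEING_VISITED = 1, DONE_VISITED = 2 };

// Depth-first search from v. `path` holds the vertices currently being
// visited, in search order. Returns true if a cycle was found, in which
// case `path` is trimmed to the cycle and ends where it starts.
bool dfsVisit(const std::string &v, const PrereqGraph &g,
              std::map<std::string, VisitState> &state,
              std::vector<std::string> &path) {
  if (state[v] == DONE_VISITED) return false;  // already checked
  if (state[v] == BEING_VISITED) {
    // v is already on the path: drop everything before it and close
    // the loop by appending v again.
    path.erase(path.begin(), std::find(path.begin(), path.end(), v));
    path.push_back(v);
    return true;
  }

  state[v] = BEING_VISITED;
  path.push_back(v);
  auto out = g.find(v);
  if (out != g.end()) {
    for (const auto &to : out->second) {
      if (dfsVisit(to, g, state, path)) return true;
    }
  }
  path.pop_back();
  state[v] = DONE_VISITED;  // no cycle is reachable from v
  return false;
}

// Returns one cycle in the graph, or an empty vector if it is acyclic.
std::vector<std::string> findCycle(const PrereqGraph &g) {
  std::map<std::string, VisitState> state;  // missing keys are NOT_VISITED
  for (const auto &entry : g) {
    std::vector<std::string> path;
    if (dfsVisit(entry.first, g, state, path)) return path;
  }
  return {};
}
```

Returning the cycle rather than calling exit() leaves it to the caller to decide how to report it, at the cost of carrying the path through the recursion.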