CS302 Lecture Notes - Topological Sort


Some additional material:
Topological sorts work on directed, acyclic graphs, and they are very simple. A topological sort is an ordering of the vertices of a graph such that if there is an edge from a to b, then a comes before b in the ordering. Since the graph is acyclic, a topological sort is guaranteed to exist, although it is not guaranteed to be unique.

For example, consider the following graph (from the Topcoder problem ConvertibleStrings, which we use as an example of dynamic programming). The numbers in green show a valid topological sort of the graph:

As I said before, the sortings do not have to be unique. For example, you could swap the 3rd and 4th nodes in the sort, and you would still have a valid sort.
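
To make the definition concrete, here is a small C++ sketch that checks whether a proposed ordering is a valid topological sort. The function name is_topological_order and the edge-list representation are assumptions made for illustration; they don't come from any of the programs in these notes.

```cpp
#include <cassert>
#include <utility>
#include <vector>
using namespace std;

// Returns true if `order` lists each of the nodes 0..n-1 exactly once,
// and for every edge (a,b), a appears before b in `order`.
bool is_topological_order(int n,
                          const vector< pair<int,int> > &edges,
                          const vector<int> &order)
{
  vector<int> position(n, -1);   // position[v] = index of node v in order

  if ((int) order.size() != n) return false;
  for (int i = 0; i < n; i++) {
    int v = order[i];
    if (v < 0 || v >= n || position[v] != -1) return false;
    position[v] = i;
  }

  for (size_t i = 0; i < edges.size(); i++) {
    int a = edges[i].first;
    int b = edges[i].second;
    assert(a >= 0 && a < n && b >= 0 && b < n);   // Sanity check on the input
    if (position[a] >= position[b]) return false;
  }
  return true;
}
```

On a hypothetical four-node "diamond" graph with edges 0->1, 0->2, 1->3 and 2->3, both 0,1,2,3 and 0,2,1,3 pass the check, which is the non-uniqueness described above. The ordering 1,0,2,3 fails, because the edge from 0 to 1 points backward.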

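To perform a topological sort, you maintain a list of nodes with no incoming edges. Then, until that list is empty, you do the following: remove a node from the list and append it to the sort. Then remove each of that node's outgoing edges, and whenever that leaves the node at the other end with no incoming edges, add that node to the list.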

That's how I got the ordering in the graph above.

This is guaranteed to work, because the graph is acyclic. The running time is O(|V|+|E|). Like DFS and BFS, it visits each node once, and each edge once.
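
Here is a minimal C++ sketch of that procedure: start with the nodes that have no incoming edges, repeatedly take one off the list, and release its outgoing edges. This approach is commonly known as Kahn's algorithm. The code is not taken from topo.cpp; the function name topological_sort and the adjacency-list representation are assumptions made for illustration. Rather than physically deleting edges, the sketch keeps a count of incoming edges for each node and decrements it, which has the same effect.

```cpp
#include <cassert>
#include <deque>
#include <vector>
using namespace std;

// adj[v] holds the nodes that v has edges to.
// Returns the nodes in a topological order.  If the graph has a cycle,
// the returned vector has fewer than adj.size() entries.
vector<int> topological_sort(const vector< vector<int> > &adj)
{
  int n = adj.size();
  vector<int> incoming(n, 0);   // Number of unprocessed incoming edges
  deque<int> ready;             // Nodes with no incoming edges left
  vector<int> order;

  for (int v = 0; v < n; v++) {
    for (size_t j = 0; j < adj[v].size(); j++) {
      assert(adj[v][j] >= 0 && adj[v][j] < n);   // Sanity check on the input
      incoming[adj[v][j]]++;
    }
  }
  for (int v = 0; v < n; v++) {
    if (incoming[v] == 0) ready.push_back(v);
  }

  while (!ready.empty()) {
    int v = ready.front();
    ready.pop_front();
    order.push_back(v);

    // "Remove" each edge from v by decrementing its destination's count.
    for (size_t j = 0; j < adj[v].size(); j++) {
      int to = adj[v][j];
      incoming[to]--;
      if (incoming[to] == 0) ready.push_back(to);
    }
  }
  return order;
}
```

Each node enters the ready list at most once, and each edge is decremented exactly once, which is where the O(|V|+|E|) running time comes from. The size check on the result doubles as a cycle detector.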

There are some problems that you can solve with topological sort:


Shortest paths with Dijkstra or Topological Sort?

If our graph is directed and acyclic, then we can calculate shortest paths using either Dijkstra's algorithm or topological sort. If all we cared about was worst-case running time, we'd use topological sort, because O(|E|+|V|) is better than O(|E|log|V|). However, we are not always dealing with worst-case running times. Think about it:

Let's explore this a little.

What I've done is write two programs: topo.cpp and dijkstra.cpp (I don't let you see dijkstra.cpp). These take the following command line arguments:

topo|dijkstra n maxcap window seed print(y|n)

The programs create random directed, acyclic graphs with n nodes, numbered 0 through n-1. The edges all have random capacities uniformly distributed between 1 and maxcap. Each node i has edges to nodes i+1 through i+window (or through n-1, if i+window goes past the last node). These are interesting graphs. For example, take a look at a small graph:

UNIX> topo 4 50 2 1 y
Node 0: [1,35][2,44]
Node 1: [2,26][3,6]
Node 2: [3,48]
Node 3: 
Total edges in graph:           5
Shortest Path:                 41
Edges Processed:                5
Graph Creation Time:        0.000
Shortest Path Time:         0.000
UNIX> dijkstra 4 50 2 1 y
Node 0: [1,35][2,44]
Node 1: [2,26][3,6]
Node 2: [3,48]
Node 3: 
Total edges in graph:           5
Shortest Path:                 41
Edges Processed:                4
Graph Creation Time:        0.000
Shortest Path Time:         0.000
UNIX> 

It's pretty easy to see that the shortest path is 0 -> 1 -> 3. And you can see the difference between topological sort and Dijkstra's algorithm: topological sort has to process every edge. Dijkstra's algorithm, on the other hand, never processes node 2, because the shortest path to node 3 (length 41) is shorter than the shortest path to node 2 (length 44), so it reaches node 3 first and stops. For that reason, the edge from 2 to 3 is not processed.
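
To see where the two edge counts come from, here is a C++ sketch of both approaches that counts the edges each one processes. This is not topo.cpp or dijkstra.cpp. The names Edge, Graph, Result, add_edge, topo_shortest_path and dijkstra_shortest_path are made up for illustration. The sketch assumes that the path goes from node 0 to node n-1, and that every edge goes from a lower-numbered node to a higher-numbered one, so that 0, 1, ..., n-1 is already a topological order. The Dijkstra half uses a multimap and simply skips stale entries, which is one common way to write it and not necessarily how dijkstra.cpp does it.

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <utility>
#include <vector>
using namespace std;

struct Edge { int to; int weight; };
typedef vector< vector<Edge> > Graph;
struct Result { long long distance; int edges_processed; };

void add_edge(Graph &g, int from, int to, int weight)
{
  Edge e;
  e.to = to;
  e.weight = weight;
  g[from].push_back(e);
}

// Processes the nodes in the order 0..n-1 (assumed to be a topological
// order) and relaxes every edge exactly once.
Result topo_shortest_path(const Graph &g)
{
  int n = g.size();
  assert(n > 0);
  vector<long long> dist(n, LLONG_MAX);
  Result r;
  r.edges_processed = 0;
  dist[0] = 0;

  for (int v = 0; v < n; v++) {
    for (size_t j = 0; j < g[v].size(); j++) {
      const Edge &e = g[v][j];
      r.edges_processed++;
      if (dist[v] != LLONG_MAX && dist[v] + e.weight < dist[e.to]) {
        dist[e.to] = dist[v] + e.weight;
      }
    }
  }
  r.distance = dist[n-1];
  return r;
}

// Dijkstra's algorithm, stopping as soon as node n-1 comes off the multimap.
Result dijkstra_shortest_path(const Graph &g)
{
  int n = g.size();
  assert(n > 0);
  vector<long long> dist(n, LLONG_MAX);
  vector<bool> done(n, false);
  multimap<long long, int> frontier;   // Keyed on distance, val = node
  Result r;
  r.edges_processed = 0;
  dist[0] = 0;
  frontier.insert(make_pair(0LL, 0));

  while (!frontier.empty()) {
    long long d = frontier.begin()->first;
    int v = frontier.begin()->second;
    frontier.erase(frontier.begin());
    if (done[v]) continue;             // Stale entry for a finished node
    done[v] = true;
    if (v == n-1) break;               // Destination reached

    for (size_t j = 0; j < g[v].size(); j++) {
      const Edge &e = g[v][j];
      r.edges_processed++;
      if (d + e.weight < dist[e.to]) {
        dist[e.to] = d + e.weight;
        frontier.insert(make_pair(dist[e.to], e.to));
      }
    }
  }
  r.distance = dist[n-1];
  return r;
}
```

If you build the four-node graph above with add_edge, both functions report a distance of 41. The topological version processes 5 edges and the Dijkstra version processes 4, because node 3 comes off the multimap (at distance 41) before node 2 does (at distance 44).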

Let's look at a larger example to see a class of graphs where Dijkstra's algorithm will outperform topological sort: Those where window equals n. Here's an example where n equals 8:

UNIX> topo 8 10 8 8 y
Node 0: [1,7][2,7][3,1][4,10][5,3][6,9][7,4]
Node 1: [2,7][3,5][4,3][5,3][6,2][7,6]
Node 2: [3,5][4,7][5,1][6,2][7,4]
Node 3: [4,3][5,6][6,1][7,2]
Node 4: [5,4][6,6][7,5]
Node 5: [6,8][7,3]
Node 6: [7,1]
Node 7: 
Total edges in graph:          28
Shortest Path:                  3
Edges Processed:               28
Graph Creation Time:        0.000
Shortest Path Time:         0.000
UNIX> dijkstra 8 10 8 8 n
Total edges in graph:          28
Shortest Path:                  3
Edges Processed:               14
Graph Creation Time:        0.000
Shortest Path Time:         0.000
UNIX> 

To help visualize this, I'm drawing the graph below, where the edges are colored according to their weights:

What you can see here is that while topological sort has to process all 28 edges, Dijkstra's algorithm processes only 14: the edges from nodes 0, 3 and 6, plus the two edges from node 5, which ties with node 7 at distance 3 and gets processed first. Let's extrapolate and time. In each of these tests, maxcap is 1000 and window is equal to n.

I'm not super-proud of that graph, BTW: the Dijkstra numbers are averages of 50 runs each, but there's still enough randomness in the graphs that you see wavy lines. However, what you are seeing is that Dijkstra's algorithm processes so many fewer edges than topological sort that it is over ten times faster on the larger graphs.

Now, let's instead construct graphs that favor topological sort. Let's make n big, but limit window to 64. Here's an example:

UNIX> topo 10000 1000 64 1 n
Total edges in graph:      637920
Shortest Path:               2102
Edges Processed:           637920
Graph Creation Time:        0.048
Shortest Path Time:         0.010
UNIX> dijkstra 10000 1000 64 1 n
Total edges in graph:      637920
Shortest Path:               2102
Edges Processed:           635321
Graph Creation Time:        0.053
Shortest Path Time:         0.015
UNIX> 

Now, you can see that Dijkstra's algorithm is processing nearly all of the edges in the graph. Since it has to do map operations, which are O(log m) (where m is the size of the map), it is slower than topological sort, which is doing O(1) operations for each edge.

Let's look at how the timings scale with n when we keep window fixed at 64:

It's no longer a 10-fold improvement, but the topological sort clearly outperforms Dijkstra.

These are nice examples of how the structure of the graph impacts the performance of the two algorithms.