CS302 Lecture Notes - Random Graphs and Depth-First Search




Generating Random Graphs & Graph Representation

For some of our examples, we are going to generate random undirected graphs. This is a relatively simple matter, but it does take a little care. First, think about a file format for our graphs. A simple format is to first specify the number of nodes and assume that the nodes are labeled with numbers from zero to the number of nodes minus one. Then we specify the edges by specifying the two nodes that each edge connects. That specification makes it easy to write code to read the graph, since you can allocate all the nodes after reading the first line.

There are other ways, of course, to represent graphs, which you will see in subsequent lectures and labs.

Our graph generation program gen_graph takes two arguments: the number of nodes and the number of edges. It emits the number of nodes and then generates the appropriate number of random edges. There are two pitfalls in writing gen_graph. The first is that you don't want to generate edges from a node to itself, and the second is that you don't want to generate duplicate edges. The first pitfall is taken care of easily by checking that the second random node generated does not equal the first.

To address the second pitfall, we use a set. When we generate a random edge, we turn it into a string composed of the id of the smaller node followed by a space and then the id of the larger node. We check the set for that string, and if it is there, then we have a duplicate edge and must throw it out and try again.

The code is in gen_graph.cpp. Note that it error checks to make sure that e ≤ n(n-1)/2. Think about why:

#include <iostream>
#include <string>
#include <set>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
using namespace std;

int main(int argc, char **argv)
{
  int n;
  int e;
  int i;
  int n1, n2;
  set <string> edges;       // Edges generated so far, as "smaller larger"
  string s;
  char edge[100];

  if (argc != 3) {
    cerr << "usage: gen_graph n e\n";
    exit(1);
  }

  n = atoi(argv[1]);
  
  e = atoi(argv[2]);
  if (e > (n-1) * n / 2) {
    cerr << "e is too big\n";
    exit(1);
  }
  srand48(time(0));         // Seed with the current time, in seconds

  cout << "NNODES " << n << endl;
  for (i = 0; i < e; i++) {
    do {
      n1 = lrand48()%n;
      do n2 = lrand48()%n; while (n2 == n1);    // No edges from a node to itself

      // Smaller id first, so that (3,5) and (5,3) give the same string.
      if (n1 < n2) {
        snprintf(edge, sizeof(edge), "%d %d", n1, n2);
      } else {
        snprintf(edge, sizeof(edge), "%d %d", n2, n1);
      }
      s = edge;
    } while (edges.find(s) != edges.end());     // Duplicate edge: try again
    
    edges.insert(s);

    cout << "EDGE " << s << endl;
  }
  return 0;
}

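It works as it should. Here we generate two random graphs, each with ten nodes. The sleep 1 is there because gen_graph seeds the random number generator with time(0), which only changes once per second; without it, both calls could get the same seed and start with the same edges.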

UNIX> gen_graph 10 6 > g1.txt
UNIX> sleep 1
UNIX> gen_graph 10 9 > g2.txt
Here are the two graph files:

g1.txt

NNODES 10
EDGE 4 9
EDGE 4 6
EDGE 4 7
EDGE 6 8
EDGE 3 5
EDGE 1 3

g2.txt

NNODES 10
EDGE 5 9
EDGE 1 2
EDGE 5 8
EDGE 3 7
EDGE 2 7
EDGE 0 3
EDGE 5 7
EDGE 6 8
EDGE 2 9

You'll note that g1 has six edges, four connected components and no cycles, while g2 has nine edges, two connected components and one cycle (2, 7, 5, 9, 2).

(As an aside, is the above program really a good one? Ask yourself: when is it good, and when is it bad? If you aren't sure, ask me in class.)


Depth First Search To Count Connected Components

Two nodes are in the same connected component if there is a path between them, so a graph can be partitioned into its connected components. To discover all the nodes connected to a given node, you perform a depth first search, marking a visited field on each node that you encounter. When you encounter a node that you have already visited, you return. Otherwise, you mark the node as visited and recursively visit each of its neighbors. When you are done, the nodes that you marked are exactly the nodes in that node's connected component.

This maps into a fairly simple algorithm for counting connected components. First, you read in a graph. Then you mark every node as unvisited (in the code below, a component field of -1 means "unvisited"). Then you traverse all the nodes in the graph, and whenever you encounter an unvisited one, you perform the connected component depth first search on it. The total number of depth first searches started this way equals the number of connected components in the graph.

The code is in concomp.cpp:

#include <iostream>
#include <string>
#include <vector>
#include <stdlib.h>
using namespace std;

class Node {
  public:
    int id;
    vector <int> edges;     // Adjacency list: ids of this node's neighbors
    int component;          // Component number, or -1 if not yet visited
};

class Graph {
  public:
    vector <Node *> nodes;
    void Print();
    void Component_Count(int index, int cn);
};

// Depth first search: give component number cn to every node reachable
// from nodes[index] that does not already have a component.

void Graph::Component_Count(int index, int cn)
{
  Node *n;
  int i;

  n = nodes[index];
  if (n->component != -1) return;
  n->component = cn;
  for (i = 0; i < n->edges.size(); i++) Component_Count(n->edges[i], cn);
}

void Graph::Print()
{
  int i, j;
  Node *n;

  for (i = 0; i < nodes.size(); i++) {
    n = nodes[i];
    cout << "Node " << i << ": " << n->component << ":";
    for (j = 0; j < n->edges.size(); j++) {
      cout << " " << n->edges[j];
    }
    cout << endl;
  }
}


int main(int argc, char **argv)
{
  Graph g;
  string s;
  int nn, n1, n2, i, c;
  Node *n;

  // Read the number of nodes and create them, all unvisited.
  cin >> s;
  if (s != "NNODES") { cerr << "Bad graph\n"; exit(1); }
  if (!(cin >> nn) || nn < 0) { cerr << "Bad graph\n"; exit(1); }

  for (i = 0; i < nn; i++) {
    n = new Node;
    n->component = -1;
    n->id = i;
    g.nodes.push_back(n);
  }

  // Read the edges. Each undirected edge goes on both endpoints' adjacency lists.
  while (!cin.fail()) {
    cin >> s >> n1 >> n2;
    if (!cin.fail()) {
      if (s != "EDGE") { cerr << "Bad graph\n"; exit(1); }
      if (n1 < 0 || n1 >= nn || n2 < 0 || n2 >= nn) { cerr << "Bad graph\n"; exit(1); }
      g.nodes[n1]->edges.push_back(n2);
      g.nodes[n2]->edges.push_back(n1);
    }
  }

  // Every search started from an unvisited node finds a new component.
  c = 0;
  for (i = 0; i < g.nodes.size(); i++) {
    if (g.nodes[i]->component == -1) {
      c++;
      g.Component_Count(i, c);
    }
  }

  g.Print();
  return 0;
}

As we can see, it works fine on our two example files. Pay attention to the output: each line prints a node, its connected component number, and its adjacency list. Make sure you understand the output and how it relates to the edges in g1.txt and g2.txt.

UNIX> concomp < g1.txt
Node 0: 1:
Node 1: 2: 3
Node 2: 3:
Node 3: 2: 5 1
Node 4: 4: 9 6 7
Node 5: 2: 3
Node 6: 4: 4 8
Node 7: 4: 4
Node 8: 4: 6
Node 9: 4: 4
UNIX> concomp < g2.txt
Node 0: 1: 3
Node 1: 1: 2
Node 2: 1: 1 7 9
Node 3: 1: 7 0
Node 4: 2:
Node 5: 1: 9 8 7
Node 6: 1: 8
Node 7: 1: 3 2 5
Node 8: 1: 5 6
Node 9: 1: 5 2
UNIX> 
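The first call identifies the connected components of g1 as {0}, {1, 3, 5}, {2} and {4, 6, 7, 8, 9}. The second identifies the components of g2 as {0, 1, 2, 3, 5, 6, 7, 8, 9} and {4}.

It's not a bad idea to copy this file over and put some print statements in so that you can visualize the depth first search.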

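What's the running time? O(|V| + |E|). The loop in main() looks at each of the |V| nodes once. Component_Count() returns immediately when a node already has a component, so each node's adjacency list is scanned only once; since every undirected edge appears on two adjacency lists, each edge is examined exactly twice. The |V| term matters when the graph is sparse (g1 has two nodes with no edges at all), and the |E| term matters when it is dense, since |E| can be as large as |V|(|V|-1)/2.

Thus, we say that counting connected components is linear in the number of vertices and edges.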

Depth First Search To Perform Cycle Detection

Cycle detection is another depth first search. Here we also set a visited field; however, if we now encounter a node whose visited field is already set, we assume that the node is part of a cycle, and we return that fact. Again, it's a simple search, and I put the relevant code below (in cycledet0.cpp):

class Graph {
  public:
    vector <Node *> nodes;
    void Print();
    int is_cycle(int index);
};

int Graph::is_cycle(int index)
{
  Node *n;
  int i;

  n = nodes[index];
  if (n->visited) return 1;
  n->visited = 1;
  for (i = 0; i < n->edges.size(); i++) {
    if (is_cycle(n->edges[i])) return 1;
  }
  return 0;
}

int main(int argc, char **argv)
{
  ...

  for (i = 0; i < g.nodes.size(); i++) {
    if (!g.nodes[i]->visited) {
      if (g.is_cycle(i)) {
        cout << "There is a cycle reachable from node " << i << endl;
      } else {
        cout << "No cycle reachable from node " << i << endl;
      }
    }
  }
}

Note that unlike connected components, this procedure has a return value, and it uses that return value to truncate the search when a cycle is found.

When we run it, we see that it doesn't work correctly, as it says that g1 has a bunch of cycles, when we know that it doesn't:

UNIX> cycledet0 < g1.txt
No cycle reachable from node 0
There is a cycle reachable from node 1
No cycle reachable from node 2
There is a cycle reachable from node 4
There is a cycle reachable from node 6
There is a cycle reachable from node 7
There is a cycle reachable from node 8
UNIX> 
Hmmm -- in cycledet1.cpp I put a print statement at the beginning of is_cycle():
UNIX> cycledet1 < g1.txt
Called is_cycle(0)
No cycle reachable from node 0
Called is_cycle(1)
Called is_cycle(3)
Called is_cycle(5)
Called is_cycle(3)
There is a cycle reachable from node 1
...
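There's the bug. The program first visits node 0 and finds no cycle. Then it visits node 1 and recursively visits nodes 3 and 5. Node 5's only edge is the edge 3-5 that we just used to get there; because the graph is undirected, that edge is on node 5's adjacency list too. So is_cycle() goes right back to node 3, finds it visited, and reports a "cycle" that is really the same edge traversed twice. How do we fix this bug?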

One simple way is to pass is_cycle() the node that called it, so that is_cycle() does not walk back along the edge it just came in on, and therefore will not detect "cycles" that use the same edge twice. Here's the changed procedure and the call from main() in cycledet2.cpp:

int Graph::is_cycle(int index, int from)
{
  Node *n;
  int i;

  n = nodes[index];
  if (n->visited) return 1;
  n->visited = 1;
  for (i = 0; i < n->edges.size(); i++) {
    if (n->edges[i] != from) {      // Don't go back along the edge we came in on
      if (is_cycle(n->edges[i], index)) return 1;
    }
  }
  return 0;
}

int main(int argc, char **argv)
{
  ...
  for (i = 0; i < g.nodes.size(); i++) {
    if (!g.nodes[i]->visited) {
      if (g.is_cycle(i, -1)) {
        cout << "There is a cycle reachable from node " << i << endl;
      } else {
        cout << "No cycle reachable from node " << i << endl;
      }

    }
  }
}

All works well now:

UNIX> cycledet2 < g1.txt
No cycle reachable from node 0
No cycle reachable from node 1
No cycle reachable from node 2
No cycle reachable from node 4
UNIX> cycledet2 < g2.txt
There is a cycle reachable from node 0
No cycle reachable from node 4
UNIX> 
If you want to print the cycle, start printing when you first detect it (when is_cycle() reaches a node that has already been visited), keep printing nodes as the recursion unwinds, and stop when you get back to the node where you detected the cycle. That's in cycledet3.cpp. Note that when I detect the cycle, I set the visited field to two. That is how I know when to stop printing and exit the program:

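int Graph::is_cycle(int index, int from)
{
  Node *n;
  int i;

  n = nodes[index];

  // We've reached a node that is already on the search path: the cycle starts
  // here. Set visited to 2 so the unwinding recursion knows where the cycle ends.
  if (n->visited) {
    n->visited = 2;
    cout << "Cycle: " << index;
    return 1;
  }
  n->visited = 1;
  for (i = 0; i < n->edges.size(); i++) {
    if (n->edges[i] != from) {
      if (is_cycle(n->edges[i], index)) {

        // Print each node as the recursion unwinds. When we get back to the
        // node marked with 2, the cycle is complete, so we stop.
        cout << " " << index;
        if (n->visited == 2) {
          cout << endl;
          exit(1);
        }
        return 1;
      }
    }
  }
  return 0;
}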

UNIX> cycledet3 < g2.txt
Cycle: 7 5 9 2 7
UNIX>