Graph Definitions

Glossary

Graph: A collection of vertices and edges
Vertex: A simple object that can have a name and other properties
Edge: A connection between two vertices
Path: A list of vertices in which successive vertices are connected by edges in the graph
Connected Graph: A graph in which there is a path from every vertex to every other vertex in the graph.
Connected Component: A set of vertices within a graph in which:
1. there is a path from every vertex within the set to every other vertex in the set, and
2. there is not a path from any vertex within the set to any vertex not in the set
If a graph is not connected, then it has at least two connected components.
Cycle: A path in which the first and last vertices are the same (i.e., a path from a vertex back to itself)
Undirected Graph: A graph in which the edges are undirected (i.e., bidirectional).
Directed Graph: A graph in which the edges are directed (i.e., unidirectional). If edges are directed, then we speak of the edge as going from one vertex to another vertex.
Dense Graph: Roughly speaking, a graph in which the number of edges is greater than or equal to V lg V, where V is the number of vertices.
Sparse Graph: Roughly speaking, a graph in which the number of edges is < V lg V, where V is the number of vertices.
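
The definitions of connected graph and connected component can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the graph is undirected, vertices are numbered 0..V-1, and edges are given as a vector of neighbor lists. The function names count_components and is_connected are hypothetical and are not part of these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the connected components of an undirected
// graph. adj[v] holds the numbers of the vertices adjacent to vertex v.
int count_components(const std::vector<std::vector<int>> &adj) {
    std::vector<bool> visited(adj.size(), false);
    int components = 0;
    for (std::size_t start = 0; start < adj.size(); ++start) {
        if (visited[start]) continue;
        ++components;  // an unvisited vertex begins a new component
        std::vector<int> stack{static_cast<int>(start)};
        visited[start] = true;
        while (!stack.empty()) {
            int v = stack.back();
            stack.pop_back();
            for (int w : adj[v]) {
                assert(w >= 0 && static_cast<std::size_t>(w) < adj.size());
                if (!visited[w]) {
                    visited[w] = true;
                    stack.push_back(w);
                }
            }
        }
    }
    return components;
}

// A graph is connected when it has at most one component
// (an empty graph is treated as connected here by convention).
bool is_connected(const std::vector<std::vector<int>> &adj) {
    return count_components(adj) <= 1;
}
```

Every vertex reached from a starting vertex has a path to it, so each outer-loop iteration that finds an unvisited vertex discovers exactly one component.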

Graph Representation

Data Structures for Vertices

Vertices are typically represented as a set. Four commonly used representations for this set are a hash table, an array, a list, and a binary search tree (such as a red-black tree). A hash table is generally preferred because it has O(1) expected access time and supports dynamic insertions and deletions. The one drawback of a hash table is that it is a little inefficient for visiting all the vertices in a graph, since you have to walk through each hash table entry and, for each entry, walk through its list of vertices. Since most hash tables have some empty entries, the visits to the empty entries are wasted.

An array works well when there are a static number of vertices and this number is known in advance. In this case the array can be pre-allocated. If the vertices are numbered consecutively, then the array can support direct access; otherwise a binary search may be required. An advantage of an array is that all the vertices can be visited by simply walking through the array entries.

A list is typically the least preferable of the data structures. Its advantage is that all the vertices can be visited by simply walking through the list. However, finding a vertex requires O(V) time, and inserting a vertex may also require O(V) time if the list is kept sorted.

The red-black tree has the advantage of finding a vertex in O(lg V) time.
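
As a rough illustration of the first two choices, the sketch below stores vertices in a hash table keyed by name and, alternatively, in a pre-allocated array indexed by vertex number. It assumes the standard library's std::unordered_map and std::vector stand in for the hash table and array; NamedVertex, VertexTable, and make_vertex_array are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical vertex record; the fields are illustrative only.
struct NamedVertex {
    std::string name;
    bool visited = false;
};

// Hash table representation: expected O(1) find, insert, and delete by name.
class VertexTable {
  public:
    // Returns the vertex with the given name, creating it if it is missing.
    NamedVertex &find_or_insert(const std::string &name) {
        auto result = table.emplace(name, NamedVertex());
        result.first->second.name = name;  // harmless if it already existed
        return result.first->second;
    }
    bool contains(const std::string &name) const {
        return table.count(name) > 0;
    }
    void erase(const std::string &name) { table.erase(name); }
    std::size_t size() const { return table.size(); }

  private:
    std::unordered_map<std::string, NamedVertex> table;
};

// Array representation: the number of vertices is known in advance and the
// vertices are numbered consecutively, so vertex i is simply element i.
std::vector<NamedVertex> make_vertex_array(std::size_t count) {
    std::vector<NamedVertex> vertices(count);
    for (std::size_t i = 0; i < count; ++i) {
        vertices[i].name = "v" + std::to_string(i);
    }
    return vertices;
}
```

The table suits graphs whose vertices come and go, while the array suits a fixed, consecutively numbered vertex set that is often visited in full.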

Data Structures for Edges

The two most common representations for edges are a matrix and an adjacency list.

Adjacency Matrix

An adjacency matrix is a two-dimensional V x V array, where V is the number of vertices.

Values of the entries

If the graph is undirected, then the entries adj[x][y] and adj[y][x] are 1 if and only if there is an edge connecting vertices x and y. Otherwise the entry is 0.
If the graph is directed, then the entry adj[x][y] is 1 if and only if there is a directed edge from vertex x to vertex y.
If the problem does not allow a vertex to have an edge to itself, then the diagonal of the adjacency matrix is set to either 1's or 0's, depending on which is more convenient for the algorithm.
If the edge requires more information, then an edge class is created and instances are allocated for each edge. In this case an entry points either to the appropriate edge instance or is null, indicating the lack of an edge.

Implementation: If the language supports a bit data type, then matrices are typically stored as bit arrays. However, if either the language does not support a bit data type or the edges are represented as instances, then an integer or edge array is typically used.
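
The rules above for 0/1 entries can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes vertices are numbered 0..V-1 and uses std::vector<bool>, which is typically bit-packed, as the bit array. The class name AdjacencyMatrix and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class; vertices are assumed to be numbered 0..V-1.
class AdjacencyMatrix {
  public:
    // Allocating and clearing the V x V entries takes O(V^2) time.
    AdjacencyMatrix(std::size_t vertex_count, bool is_directed)
        : n(vertex_count), directed(is_directed),
          adj(vertex_count * vertex_count, false) {}

    // Records an edge from x to y. For an undirected graph the mirrored
    // entry adj[y][x] is set as well.
    void add_edge(std::size_t x, std::size_t y) { set(x, y, true); }

    void remove_edge(std::size_t x, std::size_t y) { set(x, y, false); }

    // O(1) test for the presence of an edge from x to y.
    bool has_edge(std::size_t x, std::size_t y) const {
        assert(x < n && y < n);
        return adj[x * n + y];
    }

  private:
    void set(std::size_t x, std::size_t y, bool value) {
        assert(x < n && y < n);
        adj[x * n + y] = value;  // row-major layout of adj[x][y]
        if (!directed) adj[y * n + x] = value;
    }

    std::size_t n;
    bool directed;
    std::vector<bool> adj;
};
```

If edges carried more information, the bool entries would be replaced by pointers to edge instances, with a null pointer meaning that no edge exists.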

Adjacency List

An adjacency list is a list of vertices to which a vertex has connections (i.e., it is a list of vertices that are attached to this vertex by edges). In an adjacency list representation, each vertex typically points to its adjacency list of edges. In an undirected graph, if there is an edge from x to y, then the adjacency list for x will have an entry for y and the adjacency list for y will have an entry for x.
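
A minimal sketch of this representation for an undirected graph is shown below. It assumes vertices are numbered 0..V-1 and uses std::list for each adjacency list; the class name AdjacencyLists and its methods are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical class; adj[x] is the adjacency list of vertex x.
class AdjacencyLists {
  public:
    explicit AdjacencyLists(std::size_t vertex_count) : adj(vertex_count) {}

    // An undirected edge between x and y appears in both adjacency lists.
    void add_undirected_edge(std::size_t x, std::size_t y) {
        assert(x < adj.size() && y < adj.size());
        adj[x].push_back(y);
        if (x != y) adj[y].push_back(x);  // do not list a self-loop twice
    }

    // Scans the adjacency list of x, so the cost grows with its length.
    bool has_edge(std::size_t x, std::size_t y) const {
        assert(x < adj.size() && y < adj.size());
        const std::list<std::size_t> &row = adj[x];
        return std::find(row.begin(), row.end(), y) != row.end();
    }

    const std::list<std::size_t> &neighbors(std::size_t x) const {
        assert(x < adj.size());
        return adj[x];
    }

  private:
    std::vector<std::list<std::size_t>> adj;
};
```

Visiting the neighbors of a vertex takes time proportional to the number of its edges rather than to V, which is what makes this representation attractive for sparse graphs.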

Matrices versus Lists

As a rough rule of thumb, matrices are typically used for dense graphs and adjacency lists for sparse graphs. The reason is that matrices consume less space for dense graphs and adjacency lists consume less space for sparse graphs. However, space is only one consideration. Other factors must also be taken into account:

Inserting an edge, determining if an edge exists, and deleting an edge all take O(1) time in a matrix but potentially O(V) time in an adjacency list. If time is of the essence and these operations predominate, then a matrix may be the best choice, period.
Initializing a matrix takes O(V^2) time. Consequently, while subsequent operations may be fast, any algorithm involving a matrix has an O(V^2) start-up cost and is therefore at least an O(V^2) algorithm. Important: This is a case where Big-O notation can trip you up. In a long-running system like an airline reservation system, the most important consideration is generally how long the steady-state operations take, not how long the initialization step takes. Consequently, while an algorithm may appear to be O(V^2) because of the initialization time, for all intents and purposes it may act like an O(V) or O(1) algorithm if it runs for a long period of time. Nonetheless, in computing the Big-O running time of an algorithm, the time devoted to the initialization steps must be included.
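
The space-based rule of thumb at the start of this section can be sketched in code using the glossary's rough V lg V threshold. The functions is_dense and suggest_representation and the EdgeRepresentation enum are hypothetical names. The result is only a starting point, since the other factors listed above may override it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: applies the glossary's rough definition, under which
// a graph is dense when its number of edges is at least V lg V.
bool is_dense(std::size_t vertices, std::size_t edges) {
    if (vertices < 2) return true;  // V lg V is 0 when V is 1
    double v = static_cast<double>(vertices);
    return static_cast<double>(edges) >= v * std::log2(v);
}

enum class EdgeRepresentation { Matrix, AdjacencyList };

// Rule of thumb only: it considers space and ignores which operations
// predominate and how long the system runs.
EdgeRepresentation suggest_representation(std::size_t vertices,
                                          std::size_t edges) {
    return is_dense(vertices, edges) ? EdgeRepresentation::Matrix
                                     : EdgeRepresentation::AdjacencyList;
}
```

For example, with 1024 vertices the threshold is 1024 * 10 = 10240 edges, so a graph with 5000 edges would be steered toward adjacency lists and one with 20000 edges toward a matrix.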

Design of the Graph Data Structure

As already noted, a graph consists of two elements, vertices and edges. To create a graph data structure we need to create classes for vertices and edges. We also need to create a container class for a graph. The graph class will provide the various methods that an application can use such as depth-first search, breadth-first search, and shortest path methods.

There are two types of graphs, directed and undirected graphs. Since both types of graphs support many of the same operations, we will define an abstract superclass called Graph and make DirectedGraph and UndirectedGraph inherit from Graph:

                Graph
                  |
          -----------------
          |               |
    DirectedGraph  UndirectedGraph

The methods in Graph will be declared as pure virtual. For example:

class Graph {
  public:
    virtual void dfs( Vertex *v ) = 0;
    ...
};

Ordinarily implementations of these methods would be provided in DirectedGraph and UndirectedGraph. However, these classes should also be abstract base classes because they will have different implementations depending on whether they are dense or sparse graphs. Consequently DirectedGraph might have subclasses labeled DenseDirectedGraph and SparseDirectedGraph and UndirectedGraph might have subclasses labeled DenseUndirectedGraph and SparseUndirectedGraph. These subclasses would then provide concrete implementations of the methods declared by Graph.
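
The three-level hierarchy described above can be sketched as follows. To keep the example self-contained it uses a simplified, non-template Vertex whose adjacency list holds vertex pointers directly. That Vertex, the visit_order member, and the recursive dfs body are assumptions made for illustration, not the design developed in the rest of these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified vertex used only for this sketch.
struct Vertex {
    int id;
    bool visited;
    std::vector<Vertex *> adj_list;  // vertices this vertex has edges to
    explicit Vertex(int id_) : id(id_), visited(false) {}
};

// Abstract superclass: declares the operations every graph supports.
class Graph {
  public:
    virtual ~Graph() {}
    virtual void dfs(Vertex *v) = 0;
};

// Still abstract, because dfs is not overridden at this level.
class DirectedGraph : public Graph {};

// Concrete subclass: supplies the adjacency-list implementation of dfs.
class SparseDirectedGraph : public DirectedGraph {
  public:
    void dfs(Vertex *v) override {
        v->visited = true;
        visit_order.push_back(v->id);
        for (Vertex *w : v->adj_list) {
            if (!w->visited) dfs(w);
        }
    }
    std::vector<int> visit_order;  // ids in the order dfs reached them
};
```

An application can hold a Graph pointer and call dfs without knowing whether the underlying graph is dense or sparse; a DenseDirectedGraph would override the same method using a matrix.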

The design of the vertex and edge classes also must be handled with care. Different real world problems require different types of information to be stored with a vertex or an edge. We want a design that will accommodate the different needs of these problems without having to rewrite our graph classes for each different problem. One way to accomplish this task is to define abstract base classes for Vertex and Edge that contain information needed by many of the algorithms. For example, many of the algorithms require a visited field for a vertex so we will declare a visited field in the Vertex class. Similarly edges typically have a weight so we will declare a weight field in the Edge class.

The vertex class can be used with either directed or undirected graphs. However, depending on the design we choose for the edge class, it may or may not be usable by both types of graphs. For example, an undirected edge might maintain pointers to both of its vertices whereas a directed edge only needs to maintain a pointer to the vertex to which it points. Our solution will be to define a single vertex pointer field in the edge class. This field will suffice for a directed edge. An undirected edge can then be handled in one of two ways:

Represent an undirected edge using two edge records. For example, if there is an undirected edge between vertices v and w, then v will have an edge object that points to w and w will have an edge object that points to v. The remaining edge information will be duplicated in both objects. This duplication could be wasteful of space, which is a disadvantage of this approach.
Declare an undirected edge to be a subclass of an edge and add a second pointer field. The advantage of this approach is that it does not duplicate the edge information. The disadvantage is that if we are following an edge from vertex v to vertex w, we will need to check both of the vertex pointers to determine which one points to w. This will make the code less elegant.

Because storage does not tend to be that much of a concern with graphs and because the former method leads to more elegant code, we will use the former approach in the algorithms used in this course.
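
The two-record approach might look like the sketch below. SimpleVertex, SimpleEdge, and the two helper functions are hypothetical, non-template stand-ins chosen to keep the example short, and std::list is assumed in place of the adjacency list type used elsewhere in these notes.

```cpp
#include <cassert>
#include <list>

struct SimpleVertex;

// An edge record stores only the vertex it points to, plus its weight.
struct SimpleEdge {
    SimpleVertex *adj_vtx;
    double weight;
};

struct SimpleVertex {
    int id;
    std::list<SimpleEdge> adj_list;
};

// A directed edge is a single record in the source vertex's list.
void add_directed_edge(SimpleVertex &from, SimpleVertex &to, double weight) {
    from.adj_list.push_back(SimpleEdge{&to, weight});
}

// An undirected edge is two records, one in each direction. The weight is
// duplicated, which is the space cost of this approach.
void add_undirected_edge(SimpleVertex &v, SimpleVertex &w, double weight) {
    add_directed_edge(v, w, weight);
    add_directed_edge(w, v, weight);
}
```

Because every record points away from the vertex that owns it, code that follows an edge never has to ask which end it is standing on.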

The vertex and edge data structures can now be roughed out using templates as follows:

template <class vtx_id_type, class weight_type> class Edge;

template <class id_type, class weight_type>
class Vertex {
  protected:
    id_type id;
    bool visited;
};

// A vertex for a sparse graph uses an adjacency list
template <class id_type, class weight_type>
class SparseVertex : public Vertex<id_type, weight_type> {
  // We want graphs to have access to a vertex's variables. Unfortunately
  // we have to declare each type of graph to be a friend.
  template <class I, class W> friend class SparseDirectedGraph;
  template <class I, class W> friend class SparseUndirectedGraph;
  ...
  protected:
     Dlist<Edge<id_type, weight_type> *> adj_list;
};

template <class vtx_id_type, class weight_type>
class Edge {
  // We want graphs to have access to an edge's variables. Unfortunately
  // we have to declare each type of graph to be a friend.
  template <class I, class W> friend class SparseDirectedGraph;
  template <class I, class W> friend class DenseDirectedGraph;
  ...
  protected:
     Vertex<vtx_id_type, weight_type> *adj_vtx;
     weight_type weight;   
};

A graph must take the vertex id type and the weight type as parameters so it will need to be a template class as well.
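
One possible shape for such a template class is sketched below. It is a self-contained illustration rather than the course's actual graph class: the name SparseDirectedGraphSketch, the nested Vtx and Edg records, the use of std::map for the vertex set and std::list for adjacency lists, and the add_edge and edge_weight methods are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>

// Hypothetical sketch of a graph parameterized by vertex id and weight type.
// id_type must support operator< (for std::map) and operator==.
template <class id_type, class weight_type>
class SparseDirectedGraphSketch {
  public:
    struct Vtx;
    struct Edg {
        Vtx *adj_vtx;        // vertex this edge points to
        weight_type weight;
    };
    struct Vtx {
        id_type id;
        bool visited;
        std::list<Edg> adj_list;
    };

    // Adds a directed edge, creating either endpoint if it is missing.
    void add_edge(const id_type &from, const id_type &to,
                  const weight_type &weight) {
        Vtx &src = vertex(from);
        Vtx &dst = vertex(to);
        src.adj_list.push_back(Edg{&dst, weight});
    }

    std::size_t vertex_count() const { return vertices.size(); }

    // Returns true and stores the weight in out if an edge from -> to exists.
    bool edge_weight(const id_type &from, const id_type &to,
                     weight_type &out) const {
        auto it = vertices.find(from);
        if (it == vertices.end()) return false;
        for (const Edg &e : it->second.adj_list) {
            if (e.adj_vtx->id == to) {
                out = e.weight;
                return true;
            }
        }
        return false;
    }

  private:
    // Finds the vertex with this id, inserting it first if necessary.
    Vtx &vertex(const id_type &id) {
        auto result = vertices.emplace(id, Vtx{id, false, {}});
        return result.first->second;
    }

    // Pointers to std::map elements stay valid as other vertices are added.
    std::map<id_type, Vtx> vertices;
};
```

A graph of named cities with integer distances would then be declared as SparseDirectedGraphSketch<std::string, int>, while a graph of numbered vertices with real-valued weights would use SparseDirectedGraphSketch<int, double>.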

Note that while we have provided these definitions using templates, it is not necessary to use templates in your own code. See examples in these notes for definitions of vertex and edge classes that do not use templates.