Graph Definitions

Glossary

Graph: A collection of vertices and edges
Vertex: A simple object that can have a name and other properties
Edge: A connection between two vertices
Path: A list of vertices in which successive vertices are connected by edges in the graph
Connected Graph: A graph in which there is a path from every vertex to every other vertex in the graph.
Connected Component: A set of vertices within a graph in which:
1. there is a path from every vertex within the set to every other vertex in the set, and
2. there is not a path from any vertex within the set to any vertex not in the set
If a graph is not connected, then it has at least two connected components.
Cycle: A path in which the first and last vertices are the same (i.e., a path from a vertex back to itself)
Undirected Graph: A graph in which the edges are undirected (i.e., bidirectional).
Directed Graph: A graph in which the edges are directed (i.e., unidirectional). If edges are directed, then we speak of the edge as going from one vertex to another vertex.
Dense Graph: Roughly speaking, a graph in which the number of edges is greater than or equal to V lg V, where V is the number of vertices.
Sparse Graph: Roughly speaking, a graph in which the number of edges is less than V lg V, where V is the number of vertices.
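To make the connected component definition concrete, here is a minimal Python sketch that groups the vertices of an undirected graph into components. The function name connected_components, the dict-of-sets input format, and the use of a stack-based search are all assumptions made for illustration; they are not prescribed by these notes.

```python
def connected_components(adj):
    """Return a list of sets, one per connected component.

    adj maps each vertex to the set of its neighbors (undirected graph),
    so every edge x-y appears in both adj[x] and adj[y].
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Collect everything reachable from start using an explicit stack.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


# Hypothetical example graph with two components: {a, b, c} and {d, e}.
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": {"e"},
    "e": {"d"},
}
components = connected_components(graph)
is_connected = len(components) == 1  # a connected graph has exactly one component
```

Because the example graph is not connected, the sketch finds two components, which matches the remark at the end of the Connected Component entry.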

Graph Representation

Data Structures for Vertices

Vertices are typically represented as a set. Four commonly used representations for this set are a hash table, an array, a list, and a binary search tree (such as a red-black tree). A hash table is generally preferred because it has expected O(1) access and supports dynamic insertions and deletions. The one drawback of a hash table is that it is a little inefficient for finding all the vertices in a graph: assuming collisions are resolved by chaining, you have to walk through each hash table entry and, for each entry, walk through its list of vertices. Since most hash tables have some empty entries, the visits to the empty entries are wasted.
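As a rough illustration of the hash table option, the sketch below uses Python's built-in dict (which is a hash table) as the vertex set. The Vertex class, the helper names add_vertex and remove_vertex, and the sample vertex names are hypothetical and chosen only for this example.

```python
class Vertex:
    """A simple object with a name and optional extra properties."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties


# The vertex set: vertex name -> Vertex object.
vertices = {}


def add_vertex(name, **properties):
    # Expected O(1) insertion; re-adding a name keeps the existing vertex.
    if name not in vertices:
        vertices[name] = Vertex(name, **properties)
    return vertices[name]


def remove_vertex(name):
    # Expected O(1) deletion; unknown names are ignored.
    vertices.pop(name, None)


add_vertex("a", color="red")
add_vertex("b", color="blue")
add_vertex("c", color="green")
remove_vertex("c")

found = vertices.get("b")      # expected O(1) lookup by name
all_names = sorted(vertices)   # visiting every vertex walks the whole table
```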

An array works well when the number of vertices is fixed and known in advance. In this case the array can be pre-allocated. If the vertices are numbered consecutively, then the array can support direct access; otherwise the array must be kept sorted by vertex identifier and a binary search may be required. An advantage of an array is that all the vertices can be visited by simply walking through the array entries.
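The two array cases can be sketched as follows, assuming Python lists stand in for arrays. The variable names, the sample vertex ids, and the find_vertex helper are illustrative assumptions; bisect_left is the standard library's binary search routine.

```python
from bisect import bisect_left

# Case 1: vertices are numbered 0..V-1, so the number is the array index.
V = 5
labels = [None] * V            # pre-allocated because V is known in advance
for i in range(V):
    labels[i] = f"v{i}"
direct = labels[3]             # direct access in O(1)

# Case 2: vertex ids are not consecutive; keep them sorted and binary search.
sorted_ids = [4, 17, 23, 42, 99]


def find_vertex(vertex_id):
    """Return the array index of vertex_id, or -1 if it is absent (O(lg V))."""
    i = bisect_left(sorted_ids, vertex_id)
    if i < len(sorted_ids) and sorted_ids[i] == vertex_id:
        return i
    return -1


index_of_23 = find_vertex(23)
index_of_5 = find_vertex(5)

# Visiting every vertex is a plain walk over the array entries.
visited = [vertex_id for vertex_id in sorted_ids]
```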

A list is typically the least preferable of the data structures. Its advantage is that all the vertices can be visited by simply walking through the list. However, finding a vertex requires O(V) time, and inserting a vertex may also require O(V) time if the list is kept sorted.

The red-black tree has the advantage of finding a vertex in O(lg V) time.

Data Structures for Edges

The two most common representations for edges are a matrix and an adjacency list.

Adjacency Matrix

An adjacency matrix is a two-dimensional array in which each dimension equals the number of vertices (i.e., a V x V array).

Values of the entries

If the graph is undirected, then the entries adj[x][y] and adj[y][x] are both 1 if and only if there is an edge connecting vertices x and y. Otherwise both entries are 0.
If the graph is directed, then the entry adj[x][y] is 1 if and only if there is a directed edge from vertex x to vertex y. Otherwise the entry is 0.
If the problem does not allow a vertex to have an edge to itself, then the diagonal of the adjacency matrix is set to either 1's or 0's, depending on which is more convenient for the algorithm.
If the edge requires more information, then an edge class is created and an instance is allocated for each edge. In this case an entry either points to the appropriate edge instance or is null, indicating the lack of an edge.
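The rules above can be sketched in Python as follows, assuming vertices are numbered 0 to V-1. The helper names make_matrix, add_edge, and has_edge, and the Edge class with its weight field, are hypothetical names introduced only for this example.

```python
def make_matrix(num_vertices):
    """Create a V x V matrix filled with 0 (no edges yet)."""
    return [[0] * num_vertices for _ in range(num_vertices)]


def add_edge(adj, x, y, directed=False):
    """Record an edge from x to y; mirror it when the graph is undirected."""
    adj[x][y] = 1
    if not directed:
        adj[y][x] = 1


def has_edge(adj, x, y):
    return adj[x][y] == 1


undirected = make_matrix(4)
add_edge(undirected, 0, 2)               # sets both [0][2] and [2][0]

directed = make_matrix(4)
add_edge(directed, 0, 2, directed=True)  # sets only [0][2]


class Edge:
    """Extra per-edge information (here just a weight, as an assumption)."""

    def __init__(self, weight):
        self.weight = weight


# When edges carry data, entries hold an Edge instance or None (no edge).
weighted = [[None] * 3 for _ in range(3)]
weighted[0][1] = Edge(weight=7)
```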

Implementation: If the language supports a bit data type, then matrices are typically stored as bit arrays. However, if either the language does not support a bit data type or the edges are represented as instances, then an integer or edge array is typically used.
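Python has no dedicated bit type, but its integers can serve as bit arrays. The sketch below packs each row of a directed adjacency matrix into one integer, so each entry costs one bit. The BitMatrix class and its method names are assumptions for illustration, not a standard library API.

```python
class BitMatrix:
    """Adjacency matrix for a directed graph with one bit per entry."""

    def __init__(self, num_vertices):
        self.num_vertices = num_vertices
        # Row x is an int whose bit y is 1 when there is an edge x -> y.
        self.rows = [0] * num_vertices

    def add_edge(self, x, y):
        self.rows[x] |= 1 << y

    def remove_edge(self, x, y):
        self.rows[x] &= ~(1 << y)

    def has_edge(self, x, y):
        return ((self.rows[x] >> y) & 1) == 1


m = BitMatrix(8)
m.add_edge(1, 5)
m.add_edge(1, 6)
m.remove_edge(1, 6)
```

For an undirected graph, the same class could be used by calling add_edge and remove_edge in both directions.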

Adjacency List

An adjacency list is a list of vertices to which a vertex has connections (i.e., it is a list of vertices that are attached to this vertex by edges). In an adjacency list representation, each vertex typically points to its own adjacency list. In an undirected graph, if there is an edge between x and y, then the adjacency list for x will have an entry for y and the adjacency list for y will have an entry for x.
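A minimal sketch of an adjacency list for an undirected graph follows, assuming a Python dict maps each vertex to its list of neighbors. The names adjacency, add_undirected_edge, and has_edge are hypothetical and used only here.

```python
from collections import defaultdict

# Each vertex maps to the list of vertices it is connected to.
adjacency = defaultdict(list)


def add_undirected_edge(x, y):
    # An undirected edge appears in both endpoints' lists.
    adjacency[x].append(y)
    adjacency[y].append(x)


def has_edge(x, y):
    # Scans x's list, so this can take O(V) time in the worst case.
    # .get avoids creating an empty list for an unknown vertex.
    return y in adjacency.get(x, [])


add_undirected_edge("x", "y")
add_undirected_edge("x", "z")
```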

Matrices versus Lists

As a rough rule of thumb, matrices are typically used for dense graphs and adjacency lists for sparse graphs. The reason is that matrices consume less space for dense graphs and adjacency lists consume less space for sparse graphs. However, space is only one consideration. Other factors must also be taken into account:

Inserting an edge, determining if an edge exists, and deleting an edge all take O(1) time in a matrix but potentially O(V) time in an adjacency list. If time is of the essence and these operations predominate, then a matrix may be the best choice, period.
Initializing a matrix takes O(V^2) time. Consequently, while subsequent operations may be fast, the initial start-up time for any algorithm involving a matrix requires O(V^2) time. Hence, any algorithm involving a matrix takes at least time proportional to V^2. Important: This is a case where Big-O notation can trip you up. In a long-running system like an airline reservation system, the most important consideration is generally how long the steady-state operations take, not how long the initialization step takes. Consequently, while an algorithm may appear to be O(V^2) because of the initialization time, for all intents and purposes the algorithm may act like an O(V) algorithm or an O(1) algorithm if it runs for a long period of time. Nonetheless, in computing the Big-O running time of an algorithm, the time devoted to the initialization steps must be included.
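The steady-state argument can be checked with a little arithmetic. The sketch below assumes a simplified cost model in which initializing the matrix costs V^2 unit steps and every later edge operation costs one unit step; the function name and the chosen values of V and the operation counts are illustrative assumptions.

```python
def average_cost_per_operation(num_vertices, num_operations):
    """Average unit steps per operation when a V x V matrix is initialized once.

    Assumed cost model: V^2 steps for initialization, 1 step per operation.
    """
    init_cost = num_vertices ** 2
    total = init_cost + num_operations
    return total / num_operations


V = 1_000
few = average_cost_per_operation(V, 10)        # initialization dominates
many = average_cost_per_operation(V, 10**9)    # initialization is amortized away
```

With only 10 operations the average is about 100,000 steps per operation, while with a billion operations it is about 1.001 steps per operation, which is why a long-running system behaves as if each operation were O(1).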