Graph Definitions
See also Chapter 9 of Weiss.
Glossary
Graph Representation
Data Structures for Vertices
Vertices are typically represented as a set. Four commonly used
representations for this set are a hash table, an array, a list, and a
binary search tree (such as a red-black tree). A hash table is generally
preferred because it has expected O(1) access and supports dynamic
insertions and deletions. The one drawback of a hash table is that it is
a little inefficient for finding all the vertices in a graph, since you
have to walk through each hash table entry and, for each entry, walk
through its list of vertices (this assumes collisions are resolved by
chaining). Since most hash tables have a few empty entries, the visits
to the empty entries are wasted.
An array works well when the number of vertices is fixed and known in
advance. In this case the array can be pre-allocated.
If the vertices are numbered consecutively, then the array can support
direct access; otherwise the array must be kept sorted by vertex key and
a binary search is required. An advantage
of an array is that all the vertices can be visited by simply walking
through the array entries.
A list is typically the least preferable of the data structures. Its
advantage is that all the vertices can be visited by simply walking
through the list. However, finding a vertex requires O(V) time, and
inserting a vertex may also require O(V) time if the list is sorted.
The red-black tree has the advantage of finding a vertex in O(lg V) time,
and insertions and deletions also take O(lg V) time.
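As a minimal sketch of the first two options, the following Python fragment stores the same vertices in a hash table (a dict) and in a pre-allocated array (a list). The Vertex class and the variable names are hypothetical and chosen only for illustration.

```python
class Vertex:
    """Hypothetical vertex record used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.adj = []  # placeholder for this vertex's adjacency list


# Hash table keyed by vertex name: expected O(1) lookup, insertion, deletion.
vertices_by_name = {}
for name in ["a", "b", "c"]:
    vertices_by_name[name] = Vertex(name)

# Array indexed by vertex number: direct access when the vertices are
# numbered consecutively 0..V-1 and V is known in advance.
num_vertices = 3
vertices_by_index = [Vertex(str(i)) for i in range(num_vertices)]

found = vertices_by_name["b"]    # hash lookup by key
second = vertices_by_index[1]    # direct array access by number

# Visiting every vertex is a simple walk in either structure.
all_names = [v.name for v in vertices_by_index]
```

The dict supports adding and removing vertices at any time, while the list version relies on the vertex count being fixed.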
Data Structures for Edges
The two most common representations for edges are a matrix and an
adjacency list.
Adjacency Matrix
An adjacency matrix is a two-dimensional V x V array, where V is the
number of vertices.
Values of the entries
- If the graph is undirected, then the entries
adj[x][y] and adj[y][x] are 1 if and only if there
is an edge connecting vertices x and y. Otherwise the entry is 0.
- If the graph is directed, then the entry adj[x][y] is 1 if
and only if there is a directed edge from vertex x to vertex y.
- If the problem does not allow a vertex to have an edge to itself, then the
diagonal of the adjacency matrix is set to either 1's or 0's,
depending on which is more convenient for the algorithm.
- If the edge requires more information, then an edge class is created
and instances are allocated for each edge. In this case an entry points
either to the appropriate edge instance or is null, indicating the
lack of an edge.
Implementation: If the language supports a bit data type, then
matrices are typically stored as bit arrays. However, if either
the language does not support a bit data type or the edges are
represented as instances, then an integer or edge array is
typically used.
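The entry rules above can be sketched in Python as follows. This is an illustrative version that uses a list of integer lists rather than a bit array, and it assumes the vertices are numbered 0..V-1; the class and method names are not from the text.

```python
class AdjacencyMatrix:
    """Illustrative 0/1 adjacency matrix for vertices numbered 0..V-1."""

    def __init__(self, num_vertices, directed=False):
        self.directed = directed
        # Allocating and zeroing the matrix takes O(V^2) time.
        self.adj = [[0] * num_vertices for _ in range(num_vertices)]

    def add_edge(self, x, y):
        self.adj[x][y] = 1
        if not self.directed:
            self.adj[y][x] = 1  # undirected: keep the matrix symmetric

    def remove_edge(self, x, y):
        self.adj[x][y] = 0
        if not self.directed:
            self.adj[y][x] = 0

    def has_edge(self, x, y):
        return self.adj[x][y] == 1  # O(1) lookup


undirected = AdjacencyMatrix(3)
undirected.add_edge(0, 2)

directed = AdjacencyMatrix(3, directed=True)
directed.add_edge(0, 2)
```

In the undirected graph both adj[0][2] and adj[2][0] are set, while in the directed graph only adj[0][2] is set.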
Adjacency List
An adjacency list is a list of vertices to which a vertex has connections
(i.e., it is a list of vertices that are attached to this vertex by edges).
In an adjacency list representation, each vertex typically points to
its adjacency list of edges. In an undirected graph, if there is an edge
from x to y, then the adjacency list for x will have an entry for y and
the adjacency list for y will have an entry for x.
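A minimal Python sketch of this representation is shown below. It assumes each vertex is identified by a hashable key and keeps the adjacency lists in a dict; the class and method names are hypothetical.

```python
class AdjacencyListGraph:
    """Illustrative adjacency-list graph; vertices are any hashable keys."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = {}  # vertex -> list of adjacent vertices

    def add_vertex(self, v):
        self.adj.setdefault(v, [])

    def add_edge(self, x, y):
        self.add_vertex(x)
        self.add_vertex(y)
        self.adj[x].append(y)
        if not self.directed:
            self.adj[y].append(x)  # undirected: record the edge on both lists

    def has_edge(self, x, y):
        # Linear scan of x's list: O(degree of x), which is at most O(V).
        return y in self.adj.get(x, [])

    def neighbors(self, x):
        return list(self.adj.get(x, []))


g = AdjacencyListGraph()
g.add_edge("x", "y")
g.add_edge("x", "z")
```

After these two insertions the list for x holds y and z, and the lists for y and z each hold x.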
Matrices versus Lists
As a rough rule of thumb, matrices are typically used for dense graphs and
adjacency lists for sparse graphs. The reason is that matrices consume
less space for dense graphs and adjacency lists consume less space for
sparse graphs. However, space is only one consideration. Other factors
must also be taken into account:
- Inserting an edge, determining if an edge exists, and deleting an
edge all take O(1) time in a matrix but potentially O(V) time
in an adjacency list. If time is of the essence and these
operations predominate, then a matrix may be the best choice
regardless of how dense the graph is.
- Initializing a matrix takes O(V^2) time. Consequently, while subsequent
operations may be fast, the initial start-up time for any algorithm
involving a matrix requires O(V^2) time. Hence, any algorithm
involving a matrix is minimally an O(V^2) algorithm.
Important: This is a case where Big-O notation can trip
you up. In a long-running system like an airline reservation system,
the most important consideration is generally how long the steady-state
operations take, not how long the initialization step takes.
Consequently, while an algorithm may appear to be O(V^2) because
of the initialization time, for all intents and purposes the
algorithm may act like an O(V) algorithm or an O(1) algorithm if
it runs for a long period of time. Nonetheless, in computing the
Big-O running time of an algorithm, the time devoted to the
initialization steps must be included.
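The point about steady-state behavior can be illustrated with a deliberately simplified cost model. The sketch below assumes that initializing the matrix costs V^2 unit steps and that each later edge operation costs one unit step; the function names and the chosen value of V are hypothetical.

```python
def matrix_total_cost(num_vertices, num_ops):
    """Unit-cost model: V^2 steps to initialize, then 1 step per operation."""
    return num_vertices ** 2 + num_ops


def matrix_cost_per_op(num_vertices, num_ops):
    """Average cost per operation once initialization is spread over all ops."""
    return matrix_total_cost(num_vertices, num_ops) / num_ops


V = 1000
few_ops = matrix_cost_per_op(V, 10 ** 3)    # initialization dominates
many_ops = matrix_cost_per_op(V, 10 ** 6)   # initialization mostly amortized
```

Under this model the average cost per operation is 1001 steps after a thousand operations but only 2 steps after a million, which is why a long-running system behaves as if each operation were O(1) even though the total running time still includes the O(V^2) start-up cost.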