Term-document matrices are sparse, nonsymmetric and typically, highly overdetermined. As mentioned above, it is desirable that these matrices be reordered so that nonzeroes are clustered evenly about a line from the upper left to the lower right corner of the matrix. This line, though visually a diagonal, is not the conventional diagonal of a nonsquare matrix. We define metrics suitable for evaluating reorderings by adapting some well established metrics used in symmetric matrix computations.

The * bandwidth* () and * envelope size* (
) are two measures
used in the context of symmetric sparse matrices reordered to a band form.
Let **C** be an symmetric matrix;
, the bandwidth of row **i**, is the
difference between **i** (the row number) and the smallest column number **j**
such that . Let be the maximum of bandwidth values
over all rows, i.e., .
The * bandwidth* is defined as .
We chose this definition over other alternatives
(such as defining the bandwidth as )
because its natural extension to overdetermined term-document matrices
seems more suitable as a metric for evaluating reorderings.
The * envelope* of **C** incorporates the variation in
bandwidth over all rows. The envelope size
is defined by

Observe that these terms capture the * distance* from the diagonal
to the farthest nonzero on each row of the matrix.

To extend these definitions to evaluate reorderings of
a nonsymmetric term-document matrix
**A**, we consider the straight line from the upper left
(row **1**, column **1**) to the lower right (row **m**, column **n**) corner.
The equation to this visual diagonal line can be easily computed;
the * diagonal* subscript of a row is defined as the abscissa
obtained using as the ordinate value in this equation.
To account for nonsymmetry, we define
, the bandwidth of row **i**, as the
largest difference between and any column number **j**
such that .
The bandwidth and the envelope size
are as defined
as earlier but using the the new definition of .
Defining as twice the maximum over row bandwidths
measures how evenly the nonzero clusters are
centered about the * diagonal.*

We also provide values of
, a quantity which measures the size of
the nonzero band but unlike does not take into account
the displacement from the * diagonal*.
Let be the difference
between the largest and smallest nonzero subscript in row
**i** of the matrix. Define as the maximum of
over all rows; now differs from in
not being relative to the diagonal. However, the sum
of over all rows is still the same
as
which was defined earlier in terms of .

Michael W. Berry (berry@cs.utk.edu)

Mon Jan 29 14:30:24 EST 1996