** Next:** Introduction

# Sparse Matrix Reordering Schemes for Browsing Hypertext

**Michael W. Berry**

Department of Computer Science,

107 Ayres Hall,

University of Tennessee,

Knoxville, TN 37996-1301,

berry@cs.utk.edu

**Bruce Hendrickson**

Sandia National Laboratories,

M/S 1110,

Albuquerque, NM 87185-1110,

bah@cs.sandia.gov

**Padma Raghavan **

Department of Computer Science,

107 Ayres Hall,

University of Tennessee,

Knoxville, TN 37996-1301,

padma@cs.utk.edu

Proceedings of the AMS-SIAM Summer Seminar on *Mathematics of Numerical
Analysis: Real Number Algorithms*, Park City, UT, July 17 - August
11, 1995. Published in
**Lectures in Applied Mathematics (LAM)**
Vol. 32: The Mathematics of Numerical
Analysis, J. Renegar, M. Shub, and S. Smale (eds.),
American Mathematical Society, (1996), pp. 99-123.

### Abstract

*Many approaches for retrieving documents
from electronic databases depend on the literal matching of
words in a user's query to the keywords defining
database objects. Since there is great diversity in
the words people use to describe the same object, literal- or lexical-
based methods can often retrieve irrelevant documents. Another approach
to exploit the implicit higher-order structure
in the association of terms with text objects is to compute the
singular value decomposition (SVD)
of large sparse term by text-object matrices. Latent Semantic
Indexing (LSI) is a conceptual indexing method which employs
the SVD to represent terms and objects by dominant singular
subspaces so that user queries can be matched in a lower-rank
semantic space. This paper considers a third, intermediate approach
to facilitate the immediate detection of document (or term) clusters.
We demonstrate that both traditional sparse matrix reordering
schemes (e.g., Reverse Cuthill-McKee) and spectral-based
approaches (e.g., Correspondence Analysis or Fiedler vector-based
spectral bisection) can be used to permute original
term by document (hypertext) matrices
to a narrow-banded form suitable for the detection
of document (or term) clusters. Although this approach would
not exploit the higher-order semantic structure in the database,
it can be used to develop browsing tools for hypertext and
on-line information at a reduced computational cost.
*

The first author's research was supported by the
National Science Foundation under grant Nos. NSF-CDA-91-15428 and
NSF-ASC-92-03004.

The second author's research was supported by the
U.S. Dept. of Energy, Office of Energy Research
(Mathematical Sciences program) under contract No. DE-AC04-76DP00789.

The third author's research was supported by the
National Science Foundation under grant No. NSF-ASC-94-11394, and by
the Advanced Research Projects Agency under grant Army-DAAL03-91-C-0047.

Michael W. Berry (berry@cs.utk.edu)

Mon Jan 29 14:30:24 EST 1996