next up previous
Next: Introduction

Sparse Matrix Reordering Schemes for Browsing Hypertext

Michael W. Berry
Department of Computer Science,
107 Ayres Hall,
University of Tennessee,
Knoxville, TN 37996-1301,
berry@cs.utk.edu

Bruce Hendrickson
Sandia National Laboratories,
M/S 1110,
Albuquerque, NM 87185-1110,
bah@cs.sandia.gov

Padma Raghavan
Department of Computer Science,
107 Ayres Hall,
University of Tennessee,
Knoxville, TN 37996-1301,
padma@cs.utk.edu


Proceedings of the AMS-SIAM Summer Seminar on Mathematics of Numerical Analysis: Real Number Algorithms, Park City, UT, July 17 - August 11, 1995. Published in Lectures in Applied Mathematics (LAM) Vol. 32: The Mathematics of Numerical Analysis, J. Renegar, M. Shub, and S. Smale (eds.), American Mathematical Society, (1996), pp. 99-123.

Abstract

Many approaches for retrieving documents from electronic databases depend on the literal matching of words in a user's query to the keywords defining database objects. Since there is great diversity in the words people use to describe the same object, literal- or lexical- based methods can often retrieve irrelevant documents. Another approach to exploit the implicit higher-order structure in the association of terms with text objects is to compute the singular value decomposition (SVD) of large sparse term by text-object matrices. Latent Semantic Indexing (LSI) is a conceptual indexing method which employs the SVD to represent terms and objects by dominant singular subspaces so that user queries can be matched in a lower-rank semantic space. This paper considers a third, intermediate approach to facilitate the immediate detection of document (or term) clusters. We demonstrate that both traditional sparse matrix reordering schemes (e.g., Reverse Cuthill-McKee) and spectral-based approaches (e.g., Correspondence Analysis or Fiedler vector-based spectral bisection) can be used to permute original term by document (hypertext) matrices to a narrow-banded form suitable for the detection of document (or term) clusters. Although this approach would not exploit the higher-order semantic structure in the database, it can be used to develop browsing tools for hypertext and on-line information at a reduced computational cost.





The first author's research was supported by the National Science Foundation under grant Nos. NSF-CDA-91-15428 and NSF-ASC-92-03004.

The second author's research was supported by the U.S. Dept. of Energy, Office of Energy Research (Mathematical Sciences program) under contract No. DE-AC04-76DP00789.

The third author's research was supported by the National Science Foundation under grant No. NSF-ASC-94-11394, and by the Advanced Research Projects Agency under grant Army-DAAL03-91-C-0047.


Michael W. Berry (berry@cs.utk.edu)
Mon Jan 29 14:30:24 EST 1996