2.2. Sample Term-by-Document Matrix

  For purposes of comparing the reordering schemes discussed in the next section, consider the small database of Bellcore technical memoranda first presented in [DDF+90]. Table 1 lists nine technical memorandum titles. Five of them (c1–c5) concern human-computer interaction, and four (m1–m4) concern graph theory. The bold-faced words in Table 1 are the keywords used to index the titles. The parsing rule for this sample database required that a keyword appear in more than one title. Alternative parsing strategies can, of course, increase or decrease the number of indexing keywords (or terms).
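The parsing rule above keeps a word as a keyword only if its document frequency (the number of titles containing it) exceeds one. The following is a minimal sketch of that rule. It assumes a simple lowercase, letters-only tokenizer and a caller-supplied stopword list. Both are illustrative assumptions and not the parser used for Table 1. The five titles are also hypothetical and do not come from the Bellcore database.

```python
import re
from collections import Counter


def tokenize(title):
    # Assumption: lowercase and keep runs of letters only; this is a
    # simplifying stand-in, not the tokenizer used for Table 1.
    return re.findall(r"[a-z]+", title.lower())


def select_keywords(titles, stopwords=frozenset()):
    """Return the terms that appear in more than one title (document frequency > 1)."""
    doc_freq = Counter()
    for title in titles:
        # A set counts each term at most once per title.
        doc_freq.update(set(tokenize(title)) - set(stopwords))
    return sorted(term for term, df in doc_freq.items() if df > 1)


# Hypothetical titles, for illustration only.
titles = [
    "User interface design for computer systems",
    "Response time of interactive computer systems",
    "Measuring user response time",
    "Paths and cycles in random graphs",
    "Trees as minimal connected graphs",
]
stopwords = {"for", "of", "and", "in", "as"}

keywords = select_keywords(titles, stopwords)
print(keywords)  # ['computer', 'graphs', 'response', 'systems', 'time', 'user']
```

Words such as "interface" or "trees" occur in only one of these titles, so the rule drops them. A looser rule, such as keeping every non-stopword, would enlarge the term set.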


Table 1: Database of titles from Bellcore technical memoranda. Bold-faced keywords appear in more than one title. 


Table 2: The term-by-document matrix corresponding to the technical memoranda titles in Table 1.

Table 2 shows the term-by-document matrix corresponding to the text in Table 1. Each element of this matrix is the frequency with which a term occurs in a document (title). For example, in title c5, which corresponds to the fifth column of the matrix, the terms response, time, and user each occur once. For simplicity, no term weighting was applied when constructing this sample matrix.
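As a rough sketch of how such an unweighted matrix can be assembled, the function below places the raw count of keyword i in title j at row i, column j. The tokenizer, the function name `term_document_matrix`, and the titles are hypothetical illustrations and are not the data of Tables 1 and 2. The first title repeats "user" so that the example shows a frequency greater than one.

```python
import re
from collections import Counter


def term_document_matrix(titles, keywords):
    """Return rows of raw counts: entry [i][j] = occurrences of keywords[i] in titles[j].

    No local or global term weighting is applied.
    """
    # Assumption: the same letters-only, lowercase tokenization for every title.
    counts = [Counter(re.findall(r"[a-z]+", t.lower())) for t in titles]
    # Counter returns 0 for absent terms, giving the zero entries.
    return [[c[term] for c in counts] for term in keywords]


# Hypothetical titles and keywords, for illustration only.
titles = [
    "A user survey of user interfaces for computer systems",
    "Response time of interactive computer systems",
    "Measuring user response time",
    "Paths and cycles in random graphs",
    "Trees as minimal connected graphs",
]
keywords = ["computer", "graphs", "response", "systems", "time", "user"]

A = term_document_matrix(titles, keywords)
for term, row in zip(keywords, A):
    print(f"{term:<10}", row)
```

The third column, `[row[2] for row in A]`, has a 1 in the response, time, and user rows and 0 elsewhere. It plays the same role that the fifth column plays for title c5 in Table 2. For a larger collection, a sparse format (for example, compressed sparse column storage) would usually replace the dense nested lists.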

