Correspondence analysis [Gre84] is a geometric-based method for
displaying the rows and columns of a matrix (or a two-way contingency
table) as points in dual low-dimensional vector spaces. In contingency
tables [Gif90] or term-by-document matrices such as the matrix **A**
defined in Equation (1), the cell or matrix element
contains the frequency with which row category (keyword) **i** co-occurs
with column category (document) **j**.

Define as the vector of all **1**'s. Then, and
define vectors of row and column sums, respectively. It then
follows that is the sum of all the
nonnegative
matrix elements . Let and define diagonal matrices
composed of the elements of vectors **r** and **c**, respectively. As
originally described by Benzécri in [Ben73], the goal
of Correspondence Analysis is to find another matrix representation
(say matrix **X**) of the rows of **A** such that the Euclidean
distances between rows in **X** approximate certain
* profile* distances between the rows of **A**.
The term *profile* as used in [Gre84]
refers to a set of relative frequencies
found in the representation of a data matrix as a long, flat
table having frequencies in each row expressed as percentages
of their respective row sums. Simultaneously,
another matrix representation (say matrix **Y**) of the columns
of **A** is desired whose Euclidean distances among columns approximate
certain * profile* distances between columns of **A**.

The squared distance or distance between rows **i** and **j**
of the matrix **A** is defined as

where and denote the **i**-th and **k**-th elements of
the column vectors **r** and **c**, respectively. It can easily be shown
that is the same squared Euclidean distance
between rows **i** and **j** for the matrix whose
elements are defined by

The matrix **B** can also be written as
so that for any Euclidean representation **X** of **B** (see [Gif90]).

One derivation of the Euclidean representation **X** is given by the
SVD (see Equation (3)) of defined by

where , and ). By defining

it follows that

Using Equation (7), it follows that

and hence the Euclidean distances among the rows of **X** are equal to the
Euclidean distances among the rows of **B**.

The matrix **X** by construction will have one column of constant elements
(or equal to with appropriate scaling). This can be easily shown
by first noting that the matrix has a singular triplet
corresponding to the largest singular value , i.e.,
. This triplet can be derived
from the following equalities:

Consequently, the scaled right singular vector (corresponding to
) given by
has unit length so that
is a column
of the matrix **X**. This constant column of **X** does not contribute
to the distance between any two rows of **X**, and can be removed
by computing the SVD

which deflates or prevents the trivial singular vectors (, ) from occurring. Hence, this correction yields a Euclidean representation of

rather than that of the matrix **B**.

Whereas the rows of the matrix **X** (see Equation (8)) have the
same profile distances of the rows of matrix **A**, the columns of the
matrix have the same profile distances
of the columns of **A**. Note that . As discussed
in [Gif90], this suggests that the row elements of the matrix **X** are,
in fact, the * center of gravity* or * centroid* of the elements
of the column elements of **A**, weighted by their frequency in the
row profile. Benzécri
[Ben73] referred to this as
*le principle barycentrique.*

If the matrix representation
**Y** for the matrix **A** had initially been sought, distances
between columns of **A** (as opposed to the rows of **A**) would be used to
produce column elements of the matrix **Y** which are at the centroid of the
row elements of **A**. Using the same SVD from Equation (7),
the derived matrix **Y** and corresponding matrix **X** are given by

An alternative formulation for the matrix **X** determined by
correspondence analysis is discussed in [Gre84]. Here
the * generalized* singular value decomposition [GL89]

is used to minimize

where and are the **i**-th rows of the matrices **A**
and **X**, respectively, and .
The **k**-largest * generalized* singular triplets
from Equation (9) provide the optimal matrix **X** in
Equation (10) of rank **k** [Mir60].

For the right generalized singular vectors
from
Equation (9), the first **k** column vectors
are referred to as the **k**
principal axes of the rows of **A**. The total * variation* or
* inertia* of the matrix **A** or how well **A** is represented along
the **k** principal axes is given by

where is the **i**-th largest generalized singular
value (diagonal element of ) from Equation (9).
The * unexplained* variation
when approximating **A** via , where the subscript **k** denotes the first **k** columns
of each factor of the generalized singular value decomposition in
Equation (9), is given by

Since the total inertia is decomposed along the principal axes
[Gre84], the **i**-th principal axis accounts for an amount
of of the total inertia in Equation (11).

When used as permutation vectors for the rows and columns of an
matrix **A**, the **k** principal axes of the rows and
columns of **A** (i.e., and
, respectively) as determined
by Equations (7) or (9) can produce a
reordering of the original matrix **A** having more banded (or block diagonal)
form [Gre84]. Typically, the elements of the second largest left
and right generalized singular vectors () are
sorted in ascending order to produce the required row and column permutations.
Table 7 illustrates the reordering using the second
largest generalized singular triplets from Equation (9)
when **A** is the matrix
defined in Table 2.

Michael W. Berry (berry@cs.utk.edu)

Mon Jan 29 14:30:24 EST 1996