Correspondence analysis [Gre84] is a geometric-based method for displaying the rows and columns of a matrix (or a two-way contingency table) as points in dual low-dimensional vector spaces. In contingency tables [Gif90] or term-by-document matrices such as the matrix A defined in Equation (1), the cell or matrix element contains the frequency with which row category (keyword) i co-occurs with column category (document) j.
Define as the vector of all 1's. Then, and define vectors of row and column sums, respectively. It then follows that is the sum of all the nonnegative matrix elements . Let and define diagonal matrices composed of the elements of vectors r and c, respectively. As originally described by Benzécri in [Ben73], the goal of Correspondence Analysis is to find another matrix representation (say matrix X) of the rows of A such that the Euclidean distances between rows in X approximate certain profile distances between the rows of A. The term profile as used in [Gre84] refers to a set of relative frequencies found in the representation of a data matrix as a long, flat table having frequencies in each row expressed as percentages of their respective row sums. Simultaneously, another matrix representation (say matrix Y) of the columns of A is desired whose Euclidean distances among columns approximate certain profile distances between columns of A.
The squared distance or distance between rows i and j of the matrix A is defined as
where and denote the i-th and k-th elements of the column vectors r and c, respectively. It can easily be shown that is the same squared Euclidean distance between rows i and j for the matrix whose elements are defined by
The matrix B can also be written as so that for any Euclidean representation X of B (see [Gif90]).
One derivation of the Euclidean representation X is given by the SVD (see Equation (3)) of defined by
where , and ). By defining
it follows that
Using Equation (7), it follows that
and hence the Euclidean distances among the rows of X are equal to the Euclidean distances among the rows of B.
The matrix X by construction will have one column of constant elements (or equal to with appropriate scaling). This can be easily shown by first noting that the matrix has a singular triplet corresponding to the largest singular value , i.e., . This triplet can be derived from the following equalities:
Consequently, the scaled right singular vector (corresponding to ) given by has unit length so that is a column of the matrix X. This constant column of X does not contribute to the distance between any two rows of X, and can be removed by computing the SVD
which deflates or prevents the trivial singular vectors (, ) from occurring. Hence, this correction yields a Euclidean representation of
rather than that of the matrix B.
Whereas the rows of the matrix X (see Equation (8)) have the same profile distances of the rows of matrix A, the columns of the matrix have the same profile distances of the columns of A. Note that . As discussed in [Gif90], this suggests that the row elements of the matrix X are, in fact, the center of gravity or centroid of the elements of the column elements of A, weighted by their frequency in the row profile. Benzécri [Ben73] referred to this as le principle barycentrique.
If the matrix representation Y for the matrix A had initially been sought, distances between columns of A (as opposed to the rows of A) would be used to produce column elements of the matrix Y which are at the centroid of the row elements of A. Using the same SVD from Equation (7), the derived matrix Y and corresponding matrix X are given by
An alternative formulation for the matrix X determined by correspondence analysis is discussed in [Gre84]. Here the generalized singular value decomposition [GL89]
is used to minimize
where and are the i-th rows of the matrices A and X, respectively, and . The k-largest generalized singular triplets from Equation (9) provide the optimal matrix X in Equation (10) of rank k [Mir60].
For the right generalized singular vectors from Equation (9), the first k column vectors are referred to as the k principal axes of the rows of A. The total variation or inertia of the matrix A or how well A is represented along the k principal axes is given by
where is the i-th largest generalized singular value (diagonal element of ) from Equation (9). The unexplained variation when approximating A via , where the subscript k denotes the first k columns of each factor of the generalized singular value decomposition in Equation (9), is given by
Since the total inertia is decomposed along the principal axes [Gre84], the i-th principal axis accounts for an amount of of the total inertia in Equation (11).
When used as permutation vectors for the rows and columns of an matrix A, the k principal axes of the rows and columns of A (i.e., and , respectively) as determined by Equations (7) or (9) can produce a reordering of the original matrix A having more banded (or block diagonal) form [Gre84]. Typically, the elements of the second largest left and right generalized singular vectors () are sorted in ascending order to produce the required row and column permutations. Table 7 illustrates the reordering using the second largest generalized singular triplets from Equation (9) when A is the matrix defined in Table 2.