
POLAR DECOMPOSITIONS, FACTOR ANALYSIS AND PROCRUSTES PROBLEMS
IN FINITE DIMENSIONAL INDEFINITE SCALAR PRODUCT SPACES

ULRIC KINTZEL†

Abstract. A criterion for the existence of H-polar decompositions based on comparing canonical
forms is presented, and a numerical procedure is explained for computing H-polar decompositions
of a matrix for which the product of itself with its H-adjoint is diagonalisable. Furthermore, the
H-orthogonal or H-unitary Procrustes problem is stated and solved by application of H-polar
decompositions.

Key words. Indefinite scalar products, polar decompositions, factor analysis, Procrustes problems.

AMS subject classifications. 15A63, 15A23.

1. Introduction. Let F be the field of the real numbers R or of the complex
numbers C and let Fn be an n-dimensional vector space over F. Furthermore, let
H be a fixed regular symmetric or hermitian matrix of Fn×n and let x =
(x1, . . . , xn)T, y = (y1, . . . , yn)T be column vectors of Fn. Then the bilinear or
sesquilinear functional

    [x, y] = (Hx, y)   where   (x, y) = Σ_{α=1}^{n} x^α ȳ^α   (ȳ^α = y^α if F = R)

defines an indefinite scalar product in Fn. Indefinite scalar products have almost
all the properties of ordinary scalar products, except for the fact that the squared
norm [x, x] of a vector x ≠ 0 can be positive, negative or zero. A corresponding
vector is called positive (space-like), negative (time-like) or neutral (isotropic, light-
like) respectively. If A is an arbitrary matrix of Fn×n then its H-adjoint A[∗] is
characterised by the property that

    [Ax, y] = [x, A[∗]y]   for all x, y ∈ Fn.

This is equivalent to the relationship

    A[∗] = H−1 A∗ H

between the H-adjoint A[∗] and the ordinary adjoint A∗ = ĀT. If in particular
A[∗] = A or A∗ H = HA, one speaks of an H-selfadjoint or


H-symmetric or H-hermitian matrix, and an invertible matrix U with U[∗] = U−1 or
U∗ HU = H is called an H-isometry or H-orthogonal matrix or H-unitary matrix. If
A is a given matrix of Fn×n , then a representation such as

A = UM with U∗ HU = H and M∗ H = HM

is called an H-polar decomposition of A.
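For computational experiments these notions translate directly into code. The following Python/NumPy sketch (the function names are ours, not part of any established library) encodes the H-adjoint and the two derived predicates; it assumes H is regular and symmetric or hermitian:

    import numpy as np

    def h_adjoint(A, H):
        """Return the H-adjoint A[*] = H^{-1} A* H of A."""
        return np.linalg.solve(H, A.conj().T @ H)

    def is_h_selfadjoint(A, H, tol=1e-12):
        """Check A[*] = A, equivalently A* H = H A."""
        return np.allclose(A.conj().T @ H, H @ A, atol=tol)

    def is_h_unitary(U, H, tol=1e-12):
        """Check U* H U = H."""
        return np.allclose(U.conj().T @ H @ U, H, atol=tol)

With these helpers a proposed H-polar decomposition A = UM can be verified by testing is_h_unitary(U, H), is_h_selfadjoint(M, H) and np.allclose(U @ M, A).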

† Institut für Mathematik, MA 4-5, TU Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany;

email: UKintzel@aol.com
Decompositions of this kind have been investigated in detail in the publications
[BMRRR1-3] and [MRR] as well as in the further references specified there. More spe-
cialised results concerning polar decompositions of H-normal matrices, i.e. matrices
which commute with their H-adjoint, are discussed in [LMMR].
H-polar decompositions are also the central subject of this paper, in which theo-
retical as well as practical questions are discussed. For example, the more theoretical
Chapter 3 is primarily concerned with finding a further criterion for the existence of
H-polar decompositions, whereas the more practical Chapter 4 presents a numerical
procedure for computing H-polar decompositions of a matrix A for the case in which
the matrix A[∗] A is diagonalisable. In both chapters some statements are required
concerning subspaces of the Fn , which are first of all derived in Chapter 2, whereby
some numerical questions are already examined in outlook for Chapter 4. In the final
Chapter 5, two applications from a branch of mathematics known in psychology as
factor analysis or multidimensional scaling, are ported into the environment of indef-
inite scalar products. This involves on the one hand the task of constructing sets of
points which take up given distances, and on the other hand the task of matching
two sets of such points in the sense of the method of least squares or the Procrustes
problem¹, which is achieved with the help of an H-polar decomposition.
In a typical application of multidimensional scaling (MDS, for example see [BG])
test persons are first of all requested to estimate the dissimilarity (or similarity)
of specified objects which are selected terms describing the subject of the analysis.
For example, if professions are to be analysed, terms such as politician, journalist,
physician, etc. can be used as the objects. In this way the comparison of N objects
in pairs produces the similarity measures, called proximities, pkl , 1 ≤ k, l ≤ N ,
from which the distances dkl = f (pkl ) are then determined using a function f , for
example f (x) = ax + b, which is called the MDS model. Using these distances, the
coordinates of points xk in an n-dimensional Euclidean space are constructed such
that ‖xk − xl‖ = dkl, where ‖·‖ stands for the Euclidean norm. Thus each object
is now represented by a point in a coordinate system and the data can be analysed
with regard to their geometric properties. For example, it can thereby be attempted
to interpret the basis vectors of the space in the sense of psychological factors, such
as social status in the given example of professions.
The results of interrogating the test persons are often categorised in M groups,
e.g. according to gender and/or age, producing several descriptive constellations of
points x_k^{(r)}, 1 ≤ r ≤ M, in a Euclidean space of dimension n = max{n^{(r)}} which
must be mutually compared in the analysis. To make such a comparison of two con-
stellations xk and yk possible, it is first of all necessary to compensate for irrelevant
differences resulting from the different locations in space. This is done with an orthog-
onal transformation U devised such that Σ_k ‖Uxk − yk‖² is minimised. Thereafter
the constellations x̃k = Uxk and yk can be analysed.
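In the Euclidean case this minimisation has a classical closed-form solution via the singular value decomposition (the orthogonal Procrustes problem); a minimal NumPy sketch, with the points stored as columns of X and Y and the function name ours:

    import numpy as np

    def procrustes_rotation(X, Y):
        """Orthogonal U minimising sum_k ||U x_k - y_k||^2 in the
        ordinary Euclidean scalar product, via an SVD of Y X^T."""
        W, _, Vt = np.linalg.svd(Y @ X.T)
        return W @ Vt

Chapter 5 replaces the orthogonal U by an H-isometry, which is where the H-polar decomposition enters.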
Thus the MDS model f is chosen in particular by adding a constant b (and
by making further assumptions such as dkk = 0), so that the triangle inequality is
fulfilled and therefore the points can be embedded in a Euclidean space [BG, Chapter
18]. This restriction is not required mathematically if a pseudo-Euclidean geometry is

1 Procrustes, a robber in Greek mythology, who lived near Eleusis in Attica. Originally he was
called Damastes or Polypemon. He was given the name Procrustes ("the stretcher") because he
tortured his victims to fit them into a bed. If they were too tall, he chopped off their limbs or beat
them into shape with a hammer. If they were too small, he stretched them. He was overcome by
Theseus, who meted out the same fate to him by chopping off his head to fit him into the bed.
admitted. This is the subject of the investigations in Chapter 5, where the stated
tasks of constructing points from given distances and of rotating data in the sense of
an optimum matching are considered in the environment of indefinite scalar products.
The following notation is used in the course of this work: The kernel, the image
and the rank of a matrix A are designated ker A, im A and rank A respectively. If
the matrix A is square, then tr A, det A and σ(A) are its trace, determinant and
spectrum respectively. Furthermore, the abbreviation A−∗ = (A∗ )−1 = (A−1 )∗ is
used. The symbol 0 is used for zero vectors as well as for zero matrices. In some
places it is additionally provided with size attributes 0p,q ∈ Fp×q or 0p ∈ Fp×p ,
whereby lower indices may also be intended as enumeration indices. This is evident
from the respective context. Ip , Np and Zp respectively designate the p × p identity
matrix, the p × p matrix with ones on the superdiagonal and otherwise zeros, and
the p × p matrix with ones on the antidiagonal and otherwise zeros. In particular
Jp (λ) = λIp + Np specifies an upper Jordan block for the eigenvalue λ. Moreover,
A1 ⊕ . . . ⊕ Ak stands for the block diagonal matrix consisting of the specified blocks,
and diag(α1 , . . . , αk ) stands for a diagonal matrix with the specified diagonal elements.
Even when no further specifications are made, a regular (real) symmetric or (complex)
hermitian matrix is always meant by H, and instead of A[∗] sometimes AH is written
to specify the matrix on which the scalar product is based. Lastly, the direct sum of
two subspaces X, Y ⊂ Fn is denoted by X ⊕ Y .
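For later experiments the structure matrices just introduced are easily generated; a small NumPy sketch (helper names ours):

    import numpy as np

    def N(p):            # p x p, ones on the superdiagonal
        return np.diag(np.ones(p - 1), 1)

    def Z(p):            # p x p, ones on the antidiagonal
        return np.fliplr(np.eye(p))

    def J(p, lam):       # upper Jordan block J_p(lambda)
        return lam * np.eye(p) + N(p)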
2. Subspaces. The properties of subspaces of an indefinite scalar product space
over the field of the real or complex numbers are discussed in detail in [GLR, Chapter
I.1]. In this chapter some additional properties are described which are required in
the course of the further considerations.
Let F = R or F = C and let [., .] be an indefinite scalar product of Fn with the
underlying regular symmetric or hermitian matrix H ∈ Fn×n . A subspace M ⊂ Fn is
said to be positive (non-negative, neutral, non-positive, negative) if

[x, x] > 0 ([x, x] ≥ 0, [x, x] = 0, [x, x] ≤ 0, [x, x] < 0)

is satisfied for all 0 ≠ x ∈ M. The set defined by

M [⊥] = {x ∈ Fn : [x, y] = 0 for all y ∈ M }

is also a subspace of Fn and is termed the H-orthogonal companion of M , for which


the important equations

(M [⊥] )[⊥] = M and dim M + dim M [⊥] = n

hold. A subspace M is called non-degenerate if x ∈ M and [x, y] = 0 for all y ∈ M


imply that x = 0, otherwise M is called degenerate. Furthermore, the equations

M ∩ M [⊥] = {0} and M ⊕ M [⊥] = Fn

are satisfied if and only if M is non-degenerate. (This is an essential difference com-


pared with the ordinary scalar product, for which these equations are always fulfilled
for the ordinary orthogonal complement M ⊥ .) It is now shown in [GLR, Theorem
1.4] that every non-negative (non-positive) subspace is a direct sum of a positive (neg-
ative) and a neutral subspace. However, the following more general theorem, whose
proof is based on statements made in [GR, Chapter IX, §2], is also true.
Theorem 2.1 (Decomposition of subspaces).
1. Every non-degenerate subspace M ⊂ Fn can be expressed as a direct sum
M = M+ ⊕ M− whereby M+ is positive, M− is negative and both spaces are
H-orthogonal.
2. Every subspace M ⊂ Fn can be expressed as a direct sum M = M0 ⊕
M1 whereby M0 is neutral, M1 is non-degenerate and both spaces are H-
orthogonal.
Proof. 1. Assume that M+ is a positive subspace of M with maximum dimension.
Then M+ is non-degenerate and M+ ⊕ M+^[⊥] = Fn. Thus a representation

    M+ ⊕ (M ∩ M+^[⊥]) = M,   M− = M ∩ M+^[⊥],

exists with two H-orthogonal summands, and it remains to show that M− is negative.
Suppose that a vector x ∈ M− exists with [x, x] > 0. Then it would follow that
[x + y, x + y] = [x, x] + [y, y] > 0 for all y ∈ M+ . But this would mean that the
subspace M+ ⊕ span{x} is also positive, in contradiction to the maximality of M+ .
Thus M− is non-positive and the Schwarz inequality [GLR, Chapter I.1.3]2

|[x, y]|2 ≤ [x, x][y, y]

can be applied. Now assume that x0 ∈ M− with [x0, x0 ] = 0. Then the Schwarz
inequality shows that [x0, x] = 0 must be fulfilled for all x ∈ M− . Since it is also true
that [x0, y] = 0 for all y ∈ M+ , it follows that [x0, z] = 0 for all z ∈ M . Thus x0 = 0,
because M is non-degenerate.
2. Let M0 = M ∩ M [⊥]. Then M0 is neutral, because if a vector x ∈ M0 ⊂ M
were to exist with [x, x] ≠ 0, it would follow that x ∉ M [⊥] ⊃ M0. Now let M1 be a
complementary subspace, so that

(M ∩ M [⊥] ) ⊕ M1 = M, M0 = M ∩ M [⊥] ,

with two H-orthogonal summands applies. To show that M1 is non-degenerate, let


x0 ∈ M1 with [x0 , x] = 0 for all x ∈ M1 . Furthermore, [x0 , y] = 0 for all y ∈ M0 , so
that [x0 , z] = 0 for all z ∈ M . Thus it follows that x0 ∈ M1 and x0 ∈ M0 , so that
x0 = 0.
On combining the two statements of the theorem, it is clear that every subspace
M ⊂ Fn can be expressed in the form

M = M+ ⊕ M− ⊕ M0

with a positive, a negative and a neutral subspace, mutually H-orthogonal. In order


to deduce the dimensions of these spaces, we can refer to the following classical result
[GR, Chapter IX, §2].
Remark 2.2 (Projection onto subspaces). Let M = span{x1 , . . . , xm } be a
subspace of Fn. Then every vector y ∈ M,

    y = Σ_{μ=1}^{m} η_μ x_μ,

can be represented uniquely by its coordinates ỹ = (η1, . . . , ηm)T ∈ Fm with respect


to the given basis of M . Now if X = [x1 . . . xm ] ∈ Fn×m is a matrix whose columns
are the basis vectors, then y = Xỹ and for H̃ = X∗HX ∈ Fm×m we obtain

    (Hy, z)_n = (HXỹ, Xz̃)_n = (X∗HXỹ, z̃)_m = (H̃ỹ, z̃)_m   where   (x, y)_k = Σ_{α=1}^{k} x^α ȳ^α.

2 There is a typing error contained in equation (1.8): It must be read |(Hy, z)|² ≤ (Hy, y)(Hz, z).

Consequently all properties of the non-degenerate scalar product H : Fn × Fn → F


in the subspace M can be studied with the help of the possibly degenerate scalar
product H̃ : Fm × Fm → F. In particular, if σ(H̃) contains p positive and q negative
eigenvalues, and if r = m − p − q is the multiplicity of the eigenvalue 0, then for the
decomposition of M described above,

dim M+ = p, dim M− = q, dim M0 = r.

The dimensions of the subspaces are uniquely determined. This is a consequence of


Sylvester’s law of inertia, according to which the numbers of positive, negative and
vanishing elements are invariant in all diagonal representations of H̃. Furthermore,
the subspace M0 is uniquely determined by the nullspace ker H̃ of the degenerate
scalar product. Finally, if M is a non-degenerate subspace, then det H̃ ≠ 0, i.e.
r = 0, and the maximum dimension of a neutral subspace of M is given by min(p, q).
This is not explicitly shown here, but the proof is given in [GLR, Theorem 1.5]. ♦
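Remark 2.2 can be used directly as a numerical routine: compress the scalar product to H̃ = X∗HX and read the inertia off its eigenvalues. A sketch (the rank tolerance tol is our assumption, needed because vanishing eigenvalues are only approximate in floating point):

    import numpy as np

    def subspace_inertia(X, H, tol=1e-10):
        """Dimensions (p, q, r) of the positive, negative and neutral
        parts of M = span of the columns of X with respect to H."""
        Ht = X.conj().T @ H @ X          # compressed scalar product H~
        w = np.linalg.eigvalsh(Ht)       # real, since H~ is hermitian
        p = int(np.sum(w > tol))
        q = int(np.sum(w < -tol))
        return p, q, Ht.shape[0] - p - q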
The following theorem describes the interesting fact that a single subspace induces
a decomposition of the entire space into four complementary subspaces.
Theorem 2.3 (Decomposition of the space). Let M ⊂ Fn. Then four subspaces
M1, M2, M0′, M0″ of Fn exist with the following properties:
1. Fn = M0 ⊕ M1 ⊕ M2 with M0 = M0′ ⊕ M0″.
2. M0′ = M ∩ M [⊥] and M = M1 ⊕ M0′ as well as M [⊥] = M2 ⊕ M0′.
3. M0, M1, M2 are non-degenerate and H-orthogonal in pairs.
4. M0′, M0″ are neutral and dim M0′ = dim M0″.
Proof. Suppose that M1 and M2 are the complements of M0′ which exist according
to Theorem 2.1 and for which the assertion 2. is fulfilled. Then M1 ⊂ M and
M2 ⊂ M [⊥] are H-orthogonal and non-degenerate, so that M1 ⊕ M2, too, is non-
degenerate. Consequently

    Fn = (M1 ⊕ M2) ⊕ (M1 ⊕ M2)[⊥]

and, moreover, M0′ ⊂ (M1 ⊕ M2)[⊥]. If it is now chosen that M0 = (M1 ⊕ M2)[⊥] =
M0′ ⊕ M0″, then the assertions 1. and 3. are fulfilled too. From

    Fn = (M1 ⊕ M2 ⊕ M0′) ⊕ M0″ = (M + M [⊥]) ⊕ M0″

it furthermore follows that

    dim M0″ = n − dim(M + M [⊥])
            = n − (dim M + dim M [⊥] − dim(M ∩ M [⊥]))
            = dim M0′,

whereby the equations

    dim M + dim N = dim(M + N) + dim(M ∩ N)   and   dim M + dim M [⊥] = n,

which are valid for all subspaces, have been applied [GR, Chapter I.1.21], [GLR,
Chapter I.1.2].
It remains to show that M0″ is neutral. Let r = dim M0′ = dim M0″. Then M0
is a 2r-dimensional non-degenerate subspace of Fn, which can be split according
to Theorem 2.1 into a positive and a negative subspace M0 = M0+ ⊕ M0−. Let
p = dim M0+ and q = dim M0−. Since M0 must contain the r-dimensional neutral
subspace M0′ it follows that r ≤ min(p, q) and thus p ≥ r and q ≥ r [GLR, Theorem
1.5]. On the other hand p + q = 2r, so that p = q = r. Thus for the subspace M0 the
representations

    M0 = M0+ ⊕ M0− = M0′ ⊕ M0″   with   dim M0+ = dim M0− = dim M0′ = dim M0″

exist with H-orthogonal spaces M0+, M0−, so that the three bases

    M0+ = span{x1+, . . . , xr+},   M0− = span{x1−, . . . , xr−},   M0′ = span{x1′, . . . , xr′}

with

    [xk+, xl+] > 0,   [xk−, xl−] < 0,   [xk+, xl−] = 0   and   [xk′, xl′] = 0   for 1 ≤ k, l ≤ r

can now be chosen. Since M0+ is positive, M0− is negative and M0′ is neutral, it must
also be true that M0+ ∩ M0′ = M0− ∩ M0′ = {0}, so that each basis vector of M0′ can
be expressed in the form

    xk′ = Σ_{i=1}^{r} αki xi+ + Σ_{i=1}^{r} βki xi−   with   (αk1, . . . , αkr)T ≠ 0, (βk1, . . . , βkr)T ≠ 0.

Furthermore, the vectors defined by

    x̃k+ = Σ_{i=1}^{r} αki xi+   and   x̃k− = Σ_{i=1}^{r} βki xi−

can be used as new basis vectors for M0+, M0−, because if it is assumed that
constants (λ1, . . . , λr) ≠ 0 with λ1 x̃1+ + . . . + λr x̃r+ = 0 exist, then 0 ≠ λ1 x1′ + . . . +
λr xr′ = λ1 (x̃1+ + x̃1−) + . . . + λr (x̃r+ + x̃r−) = λ1 x̃1− + . . . + λr x̃r− ∈ M0−, and thus
M0− ∩ M0′ ≠ {0}. The linear independence of the vectors x̃1−, . . . , x̃r− can be shown
analogously. Finally, defining

    xk″ = x̃k+ − x̃k−   for 1 ≤ k ≤ r   and   M0″ = span{x1″, . . . , xr″},

then M0″ is on the one hand a neutral subspace because

    [xk″, xl″] = [x̃k+ − x̃k−, x̃l+ − x̃l−] = [x̃k+, x̃l+] + [x̃k−, x̃l−]
              = [x̃k+ + x̃k−, x̃l+ + x̃l−] = [xk′, xl′] = 0

and on the other hand

    M0 = M0+ ⊕ M0−
       = span{x̃1+, . . . , x̃r+} ⊕ span{x̃1−, . . . , x̃r−}
       = span{x̃1+ + x̃1−, . . . , x̃r+ + x̃r−} ⊕ span{x̃1+ − x̃1−, . . . , x̃r+ − x̃r−}
       = M0′ ⊕ M0″,
so that the 4th assertion of the theorem is fulfilled too.
Whereas the statements have been proved so far without reference to particular
bases, we will also continue to use H-orthogonal bases. The following two theorems
contain corresponding generalisations of the Schmidt orthonormalisation method.
They will in particular be used to construct H-orthogonal bases of eigenspaces.
Theorem 2.4 (Orthonormalisation of bases). Let F = R or F = C and let X be
a subspace of Fn with dim X = m. Then there exists a basis {u1 , . . . , um } of X such
that

    [uk, ul] = ε_k δ_kl,   where   ε_k = +1 for 1 ≤ k ≤ p,
                                   ε_k = −1 for p + 1 ≤ k ≤ p + q,
                                   ε_k = 0  for p + q + 1 ≤ k ≤ p + q + r,

and p + q + r = m. In particular, if X is non-degenerate, then r = 0.


Proof. (Complete induction) Let X initially be non-degenerate and let {x1, . . . ,
xm} be a basis of X. Also let k, l be two indices of {1, . . . , m} such that |[xk, xl]| is
maximised. Then it necessarily follows that [xk, xl] ≠ 0, because otherwise X would
be degenerate. For the case k = l let the basis which is obtained by interchanging
x1 and xk still be designated as {x1, . . . , xm}. Otherwise, on account of the polar
identities which are valid for all x, y ∈ Fn,

    [x, y] = (1/4)([x + y, x + y] − [x − y, x − y])   if F = R   and

    [x, y] = (1/4)([x + y, x + y] − [x − y, x − y])
           + (i/4)([x + iy, x + iy] − [x − iy, x − iy])   if F = C,

the following selection can be made: Let

    z+ = xk + xl,  z− = xk − xl,   if F = R, or if F = C and | Re[xk, xl]| ≥ | Im[xk, xl]|,
    z+ = xk + ixl, z− = xk − ixl,  otherwise,

and

    x̃k = z+, x̃l = z−,   if |[z+, z+]| ≥ |[z−, z−]|,
    x̃k = z−, x̃l = z+,   otherwise.

Then span{x̃k, x̃l} = span{xk, xl} and [x̃k, x̃k] ≠ 0. Let the particular basis obtained
by replacing xk, xl with x̃k, x̃l and then exchanging x1 and x̃k still be designated as
{x1, . . . , xm}. If now we set

    u1 = x1/√|[x1, x1]|   and   ε_1 = sign[x1, x1] ∈ {+1, −1},

then [u1, u1] = [x1, x1]/|[x1, x1]| = ε_1, and for the vectors defined by

    xi′ = xi − ε_1 [xi, u1] u1   for 2 ≤ i ≤ m

we obtain [xi′, u1] = [xi, u1] − ε_1 [xi, u1][u1, u1] = 0. Thus X can be expressed as
direct sum of its H-orthogonal subspaces span{u1} and X′ = span{x2′, . . . , xm′}, so
that X′, too, is non-degenerate. Now according to the induction hypothesis there exists
a basis {u2, . . . , um} of X′ with the demanded properties, so that finally {u1, . . . , um}
is the wanted basis of X, if a suitable sorting is also made in the case of ε_1 = −1.
If X is a degenerate subspace, the same construction can be applied, but it then
terminates after a certain number of steps, namely when no more non-vanishing scalar
products can be found. The remaining r vectors xi′ thus satisfy [xi′, xj′] = 0 for
m − r + 1 ≤ i, j ≤ m.
Theorem 2.5 (Orthonormalisation of pairs of bases). Let F = R or F = C and
let X, Y be two neutral subspaces of Fn with dim X = dim Y = m and X ∩ Y = {0}.
Then there exists a basis {u1 , . . . , um } of X and a basis {v1 , . . . , vm } of Y such that
    [uk, vl] = ε_k δ_kl,   where   ε_k = 1 for 1 ≤ k ≤ p,
                                   ε_k = 0 for p + 1 ≤ k ≤ p + r,

and p + r = m. In particular, if X ⊕ Y is non-degenerate, then r = 0.
Proof. (Complete induction) Let X ⊕ Y initially be non-degenerate and let {x1 ,
. . . , xm } be a basis of X and {y1 , . . . , ym } be a basis of Y . Also let k, l be two
indices from {1, . . . , m} so that |[xk , yl ]| is maximised. Then it necessarily follows that
[xk, yl] ≠ 0, because otherwise X ⊕ Y would be degenerate. Let the particular bases
obtained by exchanging x1 and xk or y1 and yl still be designated as {x1, . . . , xm}
and {y1, . . . , ym} respectively. In the case F = R now let ε_1 ∈ {+1, −1} be such that
λ1 = [x1, ε_1 y1] > 0 and let

    u1 = x1/√λ1   as well as   v1 = ε_1 y1/√λ1,

so that [u1, v1] = [x1, ε_1 y1]/[x1, ε_1 y1] = 1; in the case F = C let λ1 = [x1, y1] and let

    u1 = x1/ω1   as well as   v1 = y1/ω̄1   where   ω1² = λ1,

so that [u1, v1] = [x1, y1]/[x1, y1] = 1. For the vectors defined by

    xi′ = xi − [xi, v1] u1   as well as   yi′ = yi − [yi, u1] v1   for 2 ≤ i ≤ m

we obtain [xi′, v1] = [xi, v1] − [xi, v1][u1, v1] = 0 and [yi′, u1] = [yi, u1] − [yi, u1][v1, u1] = 0.
Thus X ⊕ Y can be expressed as direct sum of its H-orthogonal subspaces
span{u1, v1} and X′ ⊕ Y′ = span{x2′, . . . , xm′} ⊕ span{y2′, . . . , ym′}, so that X′ ⊕
Y′, too, is non-degenerate. Now according to the induction hypothesis, two bases
{u2, . . . , um} and {v2, . . . , vm} of X′ and Y′ exist with the demanded properties,
so that finally {u1, . . . , um} and {v1, . . . , vm} are the wanted bases of X and Y. If
X ⊕ Y is a degenerate subspace, the same construction can be applied, but it then
terminates after a certain number of steps, namely when no more non-vanishing scalar
products can be found. The remaining 2r vectors xi′, yi′ thus satisfy [xi′, yj′] = 0 for
m − r + 1 ≤ i, j ≤ m.
Numerical Procedure 2.6 (Orthonormalisation of bases). The proofs of
the Theorems 2.4 and 2.5 were formulated by choice of the maximised scalar product
(pivoting) such that they can be implemented directly as stable numerical methods.
To limit the coordinates of the neutral basis vectors

    u_{p+q+1}, . . . , u_m   or   u_{p+1}, . . . , u_m and v_{p+1}, . . . , v_m,

they should additionally be brought to the Euclidean length ‖x‖ = √(x, x) = 1
after completing the orthogonalisation process. Furthermore, the normalisation in
the method of Theorem 2.5 can be modified by a factor α:

    u1 = (α/√λ1) x1,   v1 = (ε_1/(α√λ1)) y1   (F = R)   or
    u1 = (α/ω1) x1,    v1 = (1/(αω̄1)) y1     (F = C).
In particular, if the choice α = √(‖y1‖/‖x1‖) is made, then ‖u1‖ = ‖v1‖ is ensured,
which is found to be particularly advantageous in practice. This can be demonstrated
with the following example, in which ‖A‖ = √tr(A∗A) denotes the Frobenius norm
and cond A = ‖A‖ ‖A−1‖ denotes the condition number of a matrix A:
Let H = diag(1, −1) and x = (x, x)T, y = (y, −y)T with x, y ∈ C\{0}. Then
X = span{x} and Y = span{y} are neutral subspaces of equal dimension such that
X ∩ Y = {0} and Theorem 2.5 can be applied. Let λ = [x, y] = 2xȳ and let ω be one
of the two square roots of λ. Then the columns [x′ y′] of the matrix X1 where

    X1 = [x/ω, y/ω̄; x/ω, −y/ω̄],   i.e.   X1−1 = (|ω|²/(2xy)) [y/ω̄, y/ω̄; x/ω, −x/ω],

are the vectors obtained by orthonormalisation without modification, and it is true
that

    cond X1 = (|x|² + |y|²)/(|x||y|)   because
    ‖X1‖² = 2(|x|² + |y|²)/|ω|²   and   ‖X1−1‖² = |ω|²(|x|² + |y|²)/(2|x|²|y|²).

If now α = √(‖y‖/‖x‖) = √(|y|/|x|), then the columns [x″ y″] of the matrix X2 where

    X2 = [αx/ω, y/(αω̄); αx/ω, −y/(αω̄)],   i.e.   X2−1 = (|ω|²/(2xy)) [y/(αω̄), y/(αω̄); αx/ω, −αx/ω],

are the vectors obtained by orthonormalisation with modification, and it is true that

    cond X2 = (α⁴|x|² + |y|²)/(α²|x||y|) = 2   because
    ‖X2‖² = 2(α⁴|x|² + |y|²)/(α²|ω|²)   and   ‖X2−1‖² = |ω|²(α⁴|x|² + |y|²)/(2α²|x|²|y|²).

But for arbitrary real numbers a, b we find that 0 ≤ (a − b)² = a² − 2ab + b² or
2ab ≤ a² + b², from which in the case of ab > 0 the inequality 2 ≤ (a² + b²)/ab follows,
so that in particular cond X1 ≥ 2 and thus

    cond X1 ≥ cond X2

is fulfilled. Therefore, the matrix X2 obtained by modification with the factor α is
at least as well conditioned as X1; but in the case of |x| ≠ |y| it is always better
conditioned. ♦
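A possible NumPy realisation of the pivoted orthonormalisation from the proofs of Theorems 2.4 and 2.5, including the Euclidean normalisation of the neutral rest suggested in Procedure 2.6, might look as follows. This is a sketch for Theorem 2.4 only; the names, the tolerance and the treatment of the degenerate rest are our choices:

    import numpy as np

    def h_orthonormalise(X, H, tol=1e-12):
        """Pivoted H-Gram-Schmidt on the columns of X. Returns (U, eps)
        with [u_k, u_l] = eps_k * delta_kl, eps_k in {+1, -1, 0}; the
        neutral rest (eps_k = 0) is scaled to Euclidean length 1."""
        ip = lambda x, y: np.vdot(y, H @ x)          # [x, y] = (Hx, y)
        V = [X[:, j].astype(complex) for j in range(X.shape[1])]
        U, eps = [], []
        while V:
            m = len(V)
            G = np.array([[ip(V[k], V[l]) for l in range(m)]
                          for k in range(m)])
            k, l = np.unravel_index(np.argmax(np.abs(G)), G.shape)
            if abs(G[k, l]) <= tol:                  # only a neutral rest is left
                for v in V:
                    U.append(v / np.linalg.norm(v)); eps.append(0)
                break
            if k != l:                               # polar identity: build a diagonal pivot
                zp, zm = ((V[k] + V[l], V[k] - V[l])
                          if abs(G[k, l].real) >= abs(G[k, l].imag)
                          else (V[k] + 1j * V[l], V[k] - 1j * V[l]))
                V[k], V[l] = ((zp, zm) if abs(ip(zp, zp)) >= abs(ip(zm, zm))
                              else (zm, zp))
            v = V.pop(k)
            s = ip(v, v).real
            u = v / np.sqrt(abs(s))
            U.append(u); eps.append(1 if s > 0 else -1)
            V = [w - eps[-1] * ip(w, u) * u for w in V]  # x_i' = x_i - eps [x_i, u] u
        return np.column_stack(U), eps

A final sorting of the columns by their signs then yields exactly the arrangement stated in Theorem 2.4.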
It is easily seen that Theorem 2.4 provides a special basis for the statements from
Theorem 2.1. A corresponding basis representation of Theorem 2.3 can be formulated
as follows.
Theorem 2.7 (Extension of bases). Let F = R or F = C and let X be a subspace
of Fn with dim X = m. Then there exists a basis {u1 , . . . , un } of Fn which has the
following properties:
1. If U = [u1 . . . un] is a matrix whose columns are the basis vectors, then

    U∗HU = [Ip, 0; 0, −Iq] ⊕ [0, Ir; Ir, 0] ⊕ [Is, 0; 0, −It]

with p + q + r = m and p + q + 2r + s + t = n.
2. If the subspaces X1, X0′, X0″, X2 are defined by

    X1  = span{u1, . . . , up+q},
    X0′ = span{up+q+1, . . . , up+q+r},
    X0″ = span{up+q+r+1, . . . , up+q+2r},
    X2  = span{up+q+2r+1, . . . , up+q+2r+s+t},

then X0′, X0″ are neutral subspaces with the same dimension, X1, X2 and
X0 = X0′ ⊕ X0″ are non-degenerate and mutually H-orthogonal, and Fn =
X0 ⊕ X1 ⊕ X2 as well as

    X = X1 ⊕ X0′,   X [⊥] = X2 ⊕ X0′,   X ∩ X [⊥] = X0′.

Proof. Let p + q + r = m and let E = {e_i}, 1 ≤ i ≤ m, be a basis of X which exists
according to Theorem 2.4 such that

    [e_i, e_j] = +1 for r + 1 ≤ i = j ≤ r + p,
    [e_i, e_j] = −1 for r + p + 1 ≤ i = j ≤ r + p + q,
    [e_i, e_j] = 0  otherwise.

Furthermore, let Ẽ = {ẽ_i}, 1 ≤ i ≤ m, be a dual basis with respect to E, i.e.

    [e_i, ẽ_j] = δ_ij   for 1 ≤ i, j ≤ m.

Then the vectors defined by

    ê_k = ẽ_k − (1/2) Σ_{μ=1}^{r} [ẽ_k, ẽ_μ] e_μ   for 1 ≤ k ≤ r

satisfy the equations³

    [ê_k, ê_l] = 0   for 1 ≤ k, l ≤ r   and
    [e_i, ê_k] = δ_ik   for 1 ≤ i ≤ m, 1 ≤ k ≤ r.

If now we set

    e_k′ = (1/√2)(e_k + ê_k)   and   e_k″ = (1/√2)(e_k − ê_k)   for 1 ≤ k ≤ r,

then it is true that

    [e_k′, e_l′] = δ_kl,   [e_k″, e_l″] = −δ_kl,   [e_k′, e_l″] = 0   for 1 ≤ k, l ≤ r   and
    [e_i, e_k′] = 0,   [e_i, e_k″] = 0   for r + 1 ≤ i ≤ m, 1 ≤ k ≤ r.

3 The same construction is also specified in [BMRRR2] and in [BR] within the scope of the proof
of Witt's theorem. However, the necessary orthonormalisation of the vectors ê_k is there not carried
out completely, so that the basis {e_i, e_k′, e_k″} also constructed there is not orthonormalised.
Thus the set of the vectors

    {u1, . . . , um+r} = {e_i}_{i=r+1}^{m} ∪ {e_k′}_{k=1}^{r} ∪ {e_k″}_{k=1}^{r}

constitutes an orthonormalised basis of a non-degenerate subspace Y which can be
complemented with n − m − r further vectors um+r+1, . . . , un to give an orthonor-
malised basis of Fn:

    [u_i, u_j] = ε_i δ_ij,   ε_i ∈ {+1, −1}   for 1 ≤ i, j ≤ n.

For the matrix U consisting of these basis vectors it is true that

    U∗HU = (Ip ⊕ −Iq) ⊕ (Ir ⊕ −Ir) ⊕ (Is ⊕ −It),

whereby s specifies the number of positive, t specifies the number of negative com-
plementing vectors, and a suitable sorting method is assumed. Instead of the basis
{u1, . . . , un} it is also possible to use the basis

    {u1, . . . , up+q, ũp+q+1, . . . , ũp+q+2r, up+q+2r+1, . . . , un}
    with {ũp+q+1, . . . , ũp+q+2r} = {e_k}_{k=1}^{r} ∪ {ê_k}_{k=1}^{r}.

For the matrix Ũ consisting of these basis vectors it is true that

    Ũ∗HŨ = [Ip, 0; 0, −Iq] ⊕ [0, Ir; Ir, 0] ⊕ [Is, 0; 0, −It],

and evidently the second part of the assertion is fulfilled too by this basis.
An important application of this result is the following Theorem of Witt con-
cerning the extension of isometries, whose proof has been taken over from the papers
[BR, Theorem 4.1] and [BMRRR2, Theorem 2.1]. Thereby π(H) denotes the number
of positive eigenvalues of a hermitian matrix H.
Theorem 2.8 (Witt, extension of isometries). Let F = R or F = C and let [., .]1 ,
[., .]2 be two indefinite scalar products of Fn with the underlying regular symmetric or
hermitian matrices H1 , H2 ∈ Fn×n for which π(H1 ) = π(H2 ) is fulfilled. If X1 and
X2 are subspaces of Fn and U0 : X1 → X2 is a regular transformation such that

[U0 x, U0 y]2 = [x, y]1 for all x, y ∈ X1 ,

then there exists a regular transformation U : Fn → Fn such that

[Ux, Uy]2 = [x, y]1 for all x, y ∈ Fn and Ux = U0 x for all x ∈ X1 .

Proof. Let dim X1 = m and let {e1 , . . . , em } be an orthonormalised (according


to Theorem 2.4) basis of X1 with

    [ek, el] = ε_k δ_kl,   where   ε_k = +1 for 1 ≤ k ≤ p,
                                   ε_k = −1 for p + 1 ≤ k ≤ p + q,
                                   ε_k = 0  for p + q + 1 ≤ k ≤ p + q + r,

and p + q + r = m. Then {f1, . . . , fm} with fk = U0 ek for 1 ≤ k ≤ m is an
orthonormalised basis of X2, and both bases can be extended to bases of Fn according
to Theorem 2.7. For the matrices R1 = [e1 . . . en] and R2 = [f1 . . . fn] consisting of
the extended basis vectors it follows that

    R1∗H1R1 = R2∗H2R2 = [Ip, 0; 0, −Iq] ⊕ [0, Ir; Ir, 0] ⊕ [Is, 0; 0, −It]

with r+s+t = n−m, since the number of positive and the number of negative vectors,
s and t respectively, must be identical for both bases, because of the assumption that
the signatures of the matrices H1 and H2 are the same. Thus the transformation
defined by

    UR1 = R2   or   U = R2 R1−1

fulfils the assertion of the theorem.
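In matrix form the constructed extension is simply a change between the two extended bases; assuming R1 and R2 have been assembled as in the proof, a minimal Python sketch (name ours):

    import numpy as np

    def witt_extension(R1, R2):
        """Isometry U with U R1 = R2, given extended bases R1, R2
        satisfying R1* H1 R1 = R2* H2 R2 (proof of Theorem 2.8)."""
        return R2 @ np.linalg.inv(R1)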


The singular value decomposition is a suitable tool for practical application of
the Theorems 2.7 and 2.8. It can be used for constructing a dual basis as well as for
complementing a basis of a subspace to give a basis of Fn .
Numerical Procedure 2.9 (Extension of bases and isometries). Let X =
[x1 . . . xm] be an n × m matrix (m < n) whose columns constitute a basis of X1, and
let

    HX = [Ũ1 Ũ2] [Σ̃; 0] Ṽ∗ = Ũ1 Σ̃ Ṽ∗

be a singular value decomposition of the matrix HX. Then the columns [y1 . . . ym] of

    Y = [Ũ1 Ũ2] [Σ̃−1; 0] Ṽ∗ = Ũ1 Σ̃−1 Ṽ∗

constitute a basis which is dual with respect to the columns of X, because Y∗HX =
ṼΣ̃−1Ũ1∗Ũ1Σ̃Ṽ∗ = Im. Thus if it is assumed that

    X∗HX = Ip ⊕ −Iq ⊕ 0r   with   p + q + r = m,

as can always be achieved by orthonormalisation of the columns of X according to
Theorem 2.4, then it is true for the matrix X′ = [x1 . . . xm yp+q+1′ . . . ym′] with

    yk′ = yk − (1/2) Σ_{μ=p+q+1}^{m} [yk, yμ] xμ   for p + q + 1 ≤ k ≤ m

that

    (X′)∗H(X′) = Ip ⊕ −Iq ⊕ [0r, Ir; Ir, 0r],

and for the matrix defined by X″ = [x1 . . . xp+q xp+q+1′ . . . xm′ xp+q+1″ . . . xm″] with

    xk′ = (1/√2)(xk + yk′),   xk″ = (1/√2)(xk − yk′)   for p + q + 1 ≤ k ≤ m

it is found that

    (X″)∗H(X″) = Ip ⊕ −Iq ⊕ Ir ⊕ −Ir.

If the columns of X″ are renamed as [x1″ . . . xm+r″] and if

    X″ = [U1 U2] [Σ; 0] V∗

is a further singular value decomposition, then the columns [um+r+1 . . . un] of U2
complement the columns of X″ to give a basis of Fn, and for (X″|U2′) = [x1″ . . . xm+r″
um+r+1′ . . . un′] with the vectors obtained by orthonormalisation

    uk′ = uk − Σ_{μ=1}^{m+r} εμ [uk, xμ″] xμ″,   εμ = sign[xμ″, xμ″] = [xμ″, xμ″],   for m + r + 1 ≤ k ≤ n,

it follows that

    (X″|U2′)∗H(X″|U2′) = Ip ⊕ −Iq ⊕ Ir ⊕ −Ir ⊕ (U2′)∗H(U2′).

Finally, the columns of U2′ can be orthonormalised, giving a matrix U2″ for which

    (X″|U2″)∗H(X″|U2″) = Ip ⊕ −Iq ⊕ Ir ⊕ −Ir ⊕ Is ⊕ −It = Z

with p + q + 2r + s + t = n, whereby π(H) = p + r + s specifies the positive index of
inertia of H.
Starting with the matrix Y = [y1 . . . ym] = [U0 x1 . . . U0 xm], a matrix (Y″|V2″)
can be constructed in the same manner, for which, too,

    (Y″|V2″)∗H(Y″|V2″) = Ip ⊕ −Iq ⊕ Ir ⊕ −Ir ⊕ Is ⊕ −It = Z,

as is also possible by using two scalar products [., .]1 and [., .]2 with π(H1) = π(H2).
Thus the matrix

    U = (Y″|V2″)(X″|U2″)−1 = (Y″|V2″) Z (X″|U2″)∗ H

gives the wanted isometry. ♦
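The SVD steps of Procedure 2.9 map directly onto library routines; a hedged NumPy sketch of the dual-basis and complementation steps (function names ours, full column rank of X assumed):

    import numpy as np

    def dual_basis(X, H):
        """Y with Y* H X = I_m, via a thin SVD of HX."""
        U1, s, Vt = np.linalg.svd(H @ X, full_matrices=False)
        return U1 @ np.diag(1.0 / s) @ Vt

    def complement_basis(X):
        """Columns completing those of X to a basis of F^n:
        the last n - m left singular vectors of a full SVD of X."""
        U, _, _ = np.linalg.svd(X)
        return U[:, X.shape[1]:]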


3. Canonical forms and H-polar decompositions. The following theorem
reviews a well-known statement concerning a transformation of the form (A, H) →
(S−1 AS, S∗ HS), which is designated as canonical form. It goes back to results of
Kronecker and Weierstrass and is fundamental for handling H-selfadjoint matrices.
Theorem 3.1 (Canonical form). Let H ∈ Cn×n be regular and hermitian and let
A ∈ Cn×n be H-hermitian. Then there exists a regular matrix S ∈ Cn×n such that

S−1 AS = A1 ⊕ . . . ⊕ Ak and S∗ HS = H1 ⊕ . . . ⊕ Hk (3.1a)

whereby the blocks Aj and Hj are of equal size and each pair (Aj , Hj ) has one and
only one of the following forms:
1. Pairs belonging to real eigenvalues

    Aj = Jp(λ)   and   Hj = εZp                                              (3.1b)

with λ ∈ R, p ∈ N and ε ∈ {+1, −1}.
2. Pairs belonging to non-real eigenvalues

    Aj = [Jp(λ), 0; 0, Jp(λ̄)]   and   Hj = [0, Zp; Zp, 0]                   (3.1c)

with λ ∈ C\R, Im(λ) > 0 and p ∈ N.


Moreover, the canonical form (S−1 AS, S∗ HS) of (A, H) is uniquely determined up
to the permutation of blocks.
Proof. see [GLR, Theorem 3.3].
The ordered set of the signs ε appearing in the blocks (3.1b) is an invariant of
the canonical form and is called its sign characteristic. Furthermore, an analogous
form also exists for real matrices [GLR, Theorem 5.3], but this is not required for the
following discussion.
If now F = R or F = C and A ∈ Fn×n , then a representation of the kind A = UM,
in which U ∈ Fn×n is an H-isometry and M ∈ Fn×n is H-selfadjoint, is called an H-
polar decomposition of A. Decompositions of this kind have been studied in detail
in the papers [BMRRR1-3] and [MRR] as well as in the references cited therein. The
following results are important for these investigations.
Theorem 3.2 (H-polar decomposition of real matrices). Let H ∈ Rn×n be regular
and symmetric, and let A ∈ Rn×n . Then A admits a real H-polar decomposition
A = Ur Mr (Ur ∈ Rn×n is H-orthogonal, Mr ∈ Rn×n is H-symmetric) if and only
if it admits a complex H-polar decomposition A = Uc Mc (Uc ∈ Cn×n is H-unitary,
Mc ∈ Cn×n is H-hermitian).
Proof. see [BMRRR1, Lemma 4.2].
Theorem 3.3 (Existence of H-polar decompositions). Let F = R or F = C and
let A ∈ Fn×n . Then A admits an H-polar decomposition if and only if there exists an
H-selfadjoint matrix M ∈ Fn×n such that M² = A[∗]A and ker M = ker A.
Proof. see [BMRRR1, Theorem 4.1] and [BR, Lemma 4.1].
Theorem 3.4 (Existence of H-selfadjoint square roots). Let F = R or F = C and
let A ∈ Fn×n. Then there exists an H-selfadjoint matrix M such that M² = A[∗]A and
ker M = ker A if and only if the canonical form of (A[∗] A, H) satisfies the following
conditions:
1. Blocks belonging to a negative real eigenvalue λ < 0 can be represented in the
form

    ( ⊕_{i=1}^{r} [Jpi(λ) ⊕ Jpi(λ)],  ⊕_{i=1}^{r} [Zpi ⊕ −Zpi] ).

2. Blocks belonging to the eigenvalue 0 can be represented in the form (J1 ⊕ J2 ⊕
J3, Z1 ⊕ Z2 ⊕ Z3) where

    (J1, Z1) = ( ⊕_{i=1}^{r} [Npi ⊕ Npi],  ⊕_{i=1}^{r} [Zpi ⊕ −Zpi] )   with pi ≥ 1,
    (J2, Z2) = ( ⊕_{j=1}^{s} [Npj ⊕ Npj−1],  ⊕_{j=1}^{s} [εj Zpj ⊕ εj Zpj−1] )   with pj > 1,
    (J3, Z3) = ( ⊕_{k=1}^{t} 0,  ⊕_{k=1}^{t} εk ).

3. If a basis in which the blocks from 2. exist is designated with E1 ∪ E2 ∪ E3,

    E1 = {e_{i,k}^{(1)}},  1 ≤ i ≤ r,  1 ≤ k ≤ 2pi,
    E2 = {e_{i,k}^{(2)}},  1 ≤ i ≤ s,  1 ≤ k ≤ 2pi − 1,
    E3 = {e_{i,1}^{(3)}},  1 ≤ i ≤ t,

then such a basis must exist in which

    ker A = span{e_{i,1}^{(1)} + e_{i,pi+1}^{(1)}}_{i=1}^{r} ⊕ span{e_{i,1}^{(2)}}_{i=1}^{s} ⊕ span{e_{i,1}^{(3)}}_{i=1}^{t}.

(Remark: From this condition it follows that ker M = ker A.)


When all conditions are fulfilled, the following relationships exist between the
canonical form of (M², H) and of (M, H):
a. Non-real eigenvalues. If the canonical form of (M², H) contains a block of
the form

(Jp (α + iβ) ⊕ Jp (α − iβ), Z2p ) with α, β ∈ R and β > 0,

then the canonical form of (M, H) contains a block of the form

    (Jp(λ) ⊕ Jp(λ̄), Z2p)   or   (Jp(−λ) ⊕ Jp(−λ̄), Z2p)   with λ² = α + iβ.

b. Positive real eigenvalues. If the canonical form of (M², H) contains a block


of the form

    (Jp(α²), εZp)   with α > 0,

then the canonical form of (M, H) contains a block of the form

    (Jp(α), εZp)   or   (Jp(−α), (−1)^{p+1} εZp).

c. Negative real eigenvalues. If the canonical form of (M², H) contains a block


of the form

    (Jp(−β²) ⊕ Jp(−β²), Zp ⊕ −Zp)   with β > 0,

then the canonical form of (M, H) contains a block of the form

(Jp (iβ) ⊕ Jp (−iβ), Z2p ).

d. First case with eigenvalue 0. If the canonical form of (M², H) contains a


block of the form

(Np ⊕ Np , Zp ⊕ −Zp ) ∈ (J1 , Z1 ),

then the canonical form of (M, H) contains a block of the form

(N2p , Z2p ) or (N2p , −Z2p ).

e. Second case with eigenvalue 0. If the canonical form of (M², H) contains a


block of the form

    (Np ⊕ Np−1, εZp ⊕ εZp−1) ∈ (J2, Z2),

then the canonical form of (M, H) contains a block of the form

    (N2p−1, εZ2p−1).
f. Third case with eigenvalue 0. If the canonical form of (M², H) contains a
block of the form

    (0, ε) ∈ (J3, Z3),

then the canonical form of (M, H) contains a block of the form

    (0, ε).

Proof. see [BMRRR1, Theorem 4.4, Lemma 7.8], [BMRRR3, Errata] and Theo-
rem 4.4.
Whereas Theorem 3.2 makes it possible to transfer results concerning complex
H-polar decompositions to real decompositions, Theorem 3.3 - whose proof is based
on Witt’s theorem - and Theorem 3.4 constitute the essential result for the existence
of H-polar decompositions. Note that there is an error in [BMRRR1, Theorem 4.4]
which is pointed out by the following example.
Example 3.5. With the designations used in [BMRRR1], let

    H = [1, 0; 0, −1],   X = (1/√(1 − ξ²)) [1 + ξ, −1 − ξ; 1 + ξ, −1 − ξ],   X[∗]X = [0, 0; 0, 0]

with −1 < ξ < 1. Then according to statement (ii) of Theorem 4.4 the equation
(X[∗]X, H) = (B0, H0) is satisfied and ker B0 = span{e1, e2}. However, ker X =
span{e1 + e2} ≠ ker B0, so that according to statement (iii) of the theorem the
H-polar decomposition

    X = UA,   U = (1/√(1 − ξ²)) [1, ξ; ξ, 1],   A = [1, −1; 1, −1]

should not exist. ♦


This error is corrected by making a change in the second condition of Theorem
3.4 for the size of the blocks from (J1, Z1), so that now the condition pi ≥ 1 is imposed
instead of the original condition pi > 1. This correction is also made in [BMRRR3,
Errata], and Theorem 4.4 contained in the next chapter, too, emphasises the need
for making this change. (Since Theorem 4.4 is proved independently of Theorem 3.4,
the forward reference given here is not problematic.)
A further approach for showing the existence of H-polar decompositions will now
be pointed out. It will be investigated only for complex matrices with regard to
Theorem 3.2 and is based on the following observation. (Since H- as well as Z-
hermitian matrices will be encountered in the following discussion, the abbreviating
notation AH = A[∗]H will be used from now on.)
Lemma 3.6. If A ∈ Cn×n admits an H-polar decomposition, then the canonical
forms of the pairs (AH A, H) and (AAH , H) are identical.
Proof. Let A = UM be an H-polar decomposition of A. Then UH = U−1 and
MH = M imply

    U−1AAHU = U−1(UM)(MHUH)U = M² = (MHUH)(UM) = AHA,

so that AH A and AAH are H-unitary similar. If now (R−1 AH AR, R∗ HR) = (J, Z)
is the canonical form of the pair (AH A, H) and if S = UR, then (S−1 AAH S, S∗ HS)
= (R−1 U−1 AAH UR, R∗ U∗ HUR) = (J, Z) also gives the canonical form of the pair
(AAH , H).
The question now arises whether the converse of this statement is also true. It
can be answered immediately as follows for regular matrices.
Theorem 3.7. Let A ∈ Cn×n be regular and let the canonical forms of the pairs
(AHA, H) and (AAH, H) be identical. Then A admits an H-polar decomposition.

Proof. Let R and S be regular matrices from Cn×n so that

(J, Z) = (R−1 AH AR, R∗ HR) = (S−1 AAH S, S∗ HS)

gives the canonical form of the pairs (AH A, H) and (AAH , H). Then the regular
matrix defined by

B = S−1 AR

is Z-normal, as is given with Z−1 = Z∗ = Z from

    (ZB∗Z)B = Z(R∗A∗S−∗)Z(S−1AR) = ZR∗A∗HAR = R−1H−1A∗HAR = J,
    B(ZB∗Z) = (S−1AR)Z(R∗A∗S−∗)Z = S−1AH−1A∗S−∗Z = S−1AH−1A∗HS = J.

Let f (B) denote an arbitrary polynomial in B. Then the commutability of B and


ZB∗ Z furthermore yields (ZB∗ Z)f (B) = f (B)(ZB∗ Z) or (Zf (B)∗ Z)B = B(Zf (B)∗ Z)
and

(Zf (B)∗ Z)f (B) = f (B)(Zf (B)∗ Z) or


(Zf (B)−∗ Z)f (B) = f (B)(Zf (B)−∗ Z)

respectively. The last one of these equations is obtained from the last but one equa-
tion and the fact that the commutability of two matrices P and Q also implies the
commutability of P−1 and Q, provided that the inverse exists.
If now X−1BX = Jp1(λ1) ⊕ . . . ⊕ Jpk(λk) is the Jordan normal form of B, a matrix
√B with (√B)² = B can be chosen such that it can be expressed as a polynomial
f(B) = √B, namely

    √B = X (√Jp1(λ1) ⊕ . . . ⊕ √Jpk(λk)) X−1,

where √Jp(λ) is the upper triangular Toeplitz matrix

    √Jp(λ) = Σ_{j=0}^{p−1} (f^(j)(λ)/j!) Np^j,   f(λ) = √λ,                  (3.2)

whereby in all blocks belonging to the same eigenvalue λ ∈ σ(B) ⊂ C\{0} the same
branch (!) of the multivalued function √λ must be chosen [WED, Chapter VII,
Theorem 2]⁴. Thus for the matrices defined by
    K = [Z(√B)∗Z](√B) = (√B)[Z(√B)∗Z],
    T = (√B)[Z(√B)−∗Z] = [Z(√B)−∗Z](√B)

on the one hand

    TK = (√B)[Z(√B)−∗Z][Z(√B)∗Z](√B) = (√B)² = B,
    KT = (√B)[Z(√B)∗Z][Z(√B)−∗Z](√B) = (√B)² = B

and on the other hand the conditions

    K∗Z = (√B)∗Z(√B) = ZK,
    T∗ZT = [Z(√B)−1Z](√B)∗Z(√B)[Z(√B)−∗Z]
         = Z(√B)−1(√B)[Z(√B)∗Z][Z(√B)−∗Z] = Z

are fulfilled too, so that B = TK = KT gives a Z-polar decomposition of B with (in
this case) commuting factors. Finally, if

    M = RKR−1   and   U = STR−1,

then

    UM = (STR−1)(RKR−1) = SBR−1 = A,
    M∗H = (R−∗K∗R∗)(R−∗ZR−1) = (R−∗ZR−1)(RKR−1) = HM,
    U∗HU = (R−∗T∗S∗)H(STR−1) = R−∗(T∗ZT)R−1 = R−∗ZR−1 = H
is the wanted H-polar decomposition of A.
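The Taylor scheme (3.2) is also easy to evaluate numerically. The following Python sketch (names ours) forms the primary square root of a single Jordan block Jp(λ), λ ≠ 0, for a chosen branch of √λ; the full √B is then assembled blockwise and transformed back with X:

    import numpy as np

    def sqrt_jordan_block(lam, p, branch=+1):
        """Primary square root of J_p(lam), lam != 0, via (3.2):
        upper triangular Toeplitz with entries f^(j)(lam)/j! for
        f(z) = branch * sqrt(z)."""
        f = branch * np.lib.scimath.sqrt(lam)
        S = np.zeros((p, p), dtype=complex)
        c = 1.0
        for j in range(p):
            # f^(j)(lam)/j! = binom(1/2, j) * branch * lam^(1/2 - j)
            S += (c * f / lam**j) * np.diag(np.ones(p - j), j)
            c *= (0.5 - j) / (j + 1)
        return S

One checks np.allclose(S @ S, lam*np.eye(p) + np.diag(np.ones(p-1), 1)) for S = sqrt_jordan_block(lam, p).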


A matrix A satisfying AH A = AAH is called an H-normal matrix. It is a
trivial fact that for such matrices the canonical forms of (AH A, H) and (AAH , H)
are identical, and R = S for the transformations to the normal form used for the
proof in Theorem 3.7. The next result follows immediately from this, as has been
proved in a different way in [LMMR, Theorem 29].
Corollary 3.8. Every regular H-normal matrix from Cn×n admits an H-polar
decomposition with commuting factors.
In the case of singular matrices the square root cannot be built according to (3.2)
and the question of the validity of a statement corresponding to Theorem 3.7 has
not yet been clarified completely. But it can be answered for the case in which all
blocks of the canonical form (J, Z) belonging to the eigenvalue 0 take on the size 1
(nilpotency of index 1).
Lemma 3.9. Let A ∈ Cn×n such that AH A = 0. Then A admits an H-polar
decomposition if and only if also AAH = 0.
Proof. [⇒]: Let A = UM be an H-polar decomposition of A. Then M² =
AHA = 0. Thus also AAH = UM²UH = 0.

[⇐]: First of all, for all matrices A ∈ Cn×n the easily proved equations [GLR,
Proposition 2.1]

im AH = (ker A)[⊥] and ker AH = (im A)[⊥]


4 For a deeper understanding of these statements, the corresponding fundamentals can be studied

in [G, Chapter V and Chapter VIII] as well as [WED, Chapter VII and Chapter VIII].
are true, so that AAH = 0 on the one hand implies

(ker A)[⊥] = im AH ⊂ ker A, i.e. (ker A)[⊥] = ker A ∩ (ker A)[⊥] ,

and AH A = 0 on the other hand implies

im A ⊂ ker AH = (im A)[⊥] , i.e. im A = (im A)[⊥] ∩ im A.

Thus if

r = rank A = dim(im A) = dim(im H−1 A∗ H) = dim(im AH ) = rank AH

then Cn can be expressed according to Theorem 2.3 in the form

    Cn = X1 ⊕ X0′ ⊕ X0″   with   ker A = X1 ⊕ X0′   and   X0′ = ker A ∩ (ker A)[⊥]

as well as

    Cn = Y2 ⊕ Y0′ ⊕ Y0″   with   (im A)[⊥] = Y2 ⊕ Y0′   and   Y0′ = (im A)[⊥] ∩ im A,

whereby X0′, X0″, Y0′, Y0″ are neutral subspaces of dimension r and X1, Y2 are non-
degenerate subspaces of dimension p + q = n − 2r. If now {x1, . . . , xp+q, x1′, . . . , xr′} is
taken to be an orthonormal basis of ker A, then this basis can be extended according
to Theorem 2.7 with r further vectors x1″, . . . , xr″ to give a complete basis of the Cn,
and for the matrix X = [x1 . . . xp+q x1′ . . . xr′ x1″ . . . xr″] consisting of these basis vectors
    X∗HX = Z   with   Z = [Ip, 0; 0, −Iq] ⊕ [0r, Ir; Ir, 0r]   and   AX = [0_1 . . . 0_{p+q+r} y1′ . . . yr′]

apply. Therein {y1′, . . . , yr′} is a basis of the neutral space im A, which, too, can be ex-
tended according to Theorem 2.7 with p + q + r further vectors y1, . . . , yp+q, y1″, . . . , yr″
to give a complete basis of the Cn, so that for the matrix Y = [y1 . . . yp+q y1′ . . . yr′ y1″
. . . yr″] consisting of these basis vectors, the equations

    Y∗HY = Z   and   YK = [0_1 . . . 0_{p+q+r} y1′ . . . yr′]   with   K = 0p+q ⊕ [0r, Ir; 0r, 0r]

are satisfied. Moreover, the matrices Z and K thus introduced fulfil the equations
AX = YK, Z−1 = Z∗ = Z and K∗Z = 0p+q ⊕ 0r ⊕ Ir = ZK. Finally, if

M = XKX−1 and U = YX−1 ,

then

    UM = Y(X−1X)KX−1 = YKX−1 = A,
    M∗H = X−∗K∗(X∗H) = X−∗(K∗Z)X−1 = (X−∗Z)KX−1 = HXKX−1 = HM,
    U∗HU = X−∗(Y∗HY)X−1 = X−∗ZX−1 = H
is the wanted H-polar decomposition of A.
Remark 3.10. In addition to the H-polar decomposition of A given in the
proof,

    AH = ŨM̃   with   M̃ = UMU−1 = YKY−1   and   Ũ = U−1 = XY−1

is an H-polar decomposition of AH and, furthermore, AHY = XK, as can easily
be verified using AH = H−1A∗H. Thus the basis vectors can be assigned to the
subspaces as follows:

    ker A = span{x1, . . . , xp+q, x1′, . . . , xr′},   im AH = span{x1′, . . . , xr′},
    ker AH = span{y1, . . . , yp+q, y1′, . . . , yr′},   im A  = span{y1′, . . . , yr′},

as is also evident from the equations

    im AH ⊂ ker A = (im AH)[⊥]   and   (ker AH)[⊥] = im A ⊂ ker AH,

which were not used in the proof but are valid too. ♦
Example 3.11.
1. If λ ∈ C\{0} and

    H = [0, Ip; Ip, 0],   A = [0p, 0p; λIp, λIp],   AH = [λ̄Ip, 0p; λ̄Ip, 0p],

then AHA = 0 but AAH ≠ 0. Therefore no H-polar decomposition of A
exists.
2. If λ ∈ C\{0} and

    H = [0, Ip; Ip, 0],   A = [0p, 0p; 0p, λIp],   AH = [λ̄Ip, 0p; 0p, 0p],

then AHA = 0 and AAH = 0. Therefore an H-polar decomposition of A
exists, namely

    A = UM   with   U = [0, λ̄−1Ip; λIp, 0],   M = [0p, Ip; 0p, 0p].

3. If λ ∈ R and

    H = [0, Zp; Zp, 0],   A = [0p, 0p; 0p, Jp(λ)],   AH = [Jp(λ), 0p; 0p, 0p],

then AHA = 0 and AAH = 0. Therefore an H-polar decomposition of A
exists, namely

    A = UM   with   U = [0, Ip; Ip, 0],   M = [0p, Jp(λ); 0p, 0p].

4. If λ ∈ C\R, J2p(λ, λ̄) = Jp(λ) ⊕ Jp(λ̄) and

    H = [0, Z2p; Z2p, 0],   A = [02p, 02p; 02p, J2p(λ, λ̄)],   AH = [J2p(λ, λ̄), 02p; 02p, 02p],

then AHA = 0 and AAH = 0. Therefore an H-polar decomposition of A
exists, namely

    A = UM   with   U = [0, I2p; I2p, 0],   M = [02p, J2p(λ, λ̄); 02p, 02p]. ♦
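Case 1 can be checked numerically in a few lines; a sketch for p = 1 with an arbitrary value of λ:

    import numpy as np

    lam = 2.0
    H = np.array([[0., 1.], [1., 0.]])          # H = [0, I1; I1, 0]
    A = np.array([[0., 0.], [lam, lam]])
    AH = np.linalg.solve(H, A.conj().T @ H)     # H-adjoint of A
    print(AH @ A)   # zero matrix
    print(A @ AH)   # [[0, 0], [2*lam**2, 0]] != 0: no H-polar decomposition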

These examples are here listed in detail because the cases 3. and 4. are important
in connection with normal forms of H-normal matrices, but this will not be considered
further here [LMMR, Theorem 10, Example 11]. The following sufficient condition for
the existence of H-polar decompositions can now be proved with the help of Lemma
3.9.
Theorem 3.12. Let A ∈ Cn×n and let the canonical forms of the pairs (AH A,
H) and (AAH , H) be identical. Furthermore, let all blocks of the canonical form
belonging to the eigenvalue 0 be of size 1. Then A admits an H-polar decomposition.
Proof. Let R, S, J, Z and B be as in the proof of Theorem 3.7, so that B^Z B =
B B^Z = J can be assumed. Furthermore, let

    J = J1 ⊕ J0,  J0 ∈ Cm×m   and   Z = Z1 ⊕ Z0,  Z0 ∈ Cm×m,

whereby J0, Z0 designates the part of the canonical form belonging to the eigenvalue
0. Then the spectra of the blocks J1 and J0 are disjoint and also the matrices J and
B commute, so that also B must take the form

    B = B1 ⊕ B0,  B0 ∈ Cm×m.

Therefore B1 is regular because of B1^{Z1} B1 = B1 B1^{Z1} = J1, and it consequently
admits a Z1-polar decomposition constructed according to Theorem 3.7

    B1 = T1K1 (= K1T1)   with   T1∗Z1T1 = Z1   and   K1∗Z1 = Z1K1.

Moreover, the prerequisites of the theorem yield⁵ B0^{Z0} B0 = B0 B0^{Z0} = J0 = 0m and
therefore B0 admits a Z0-polar decomposition

    B0 = T0K0   with   T0∗Z0T0 = Z0   and   K0∗Z0 = Z0K0

constructed according to Lemma 3.9. Finally, if

    T = T1 ⊕ T0,  U = STR−1   and   K = K1 ⊕ K0,  M = RKR−1,

then TK is a Z-polar decomposition of B and UM is an H-polar decomposition of
A.
Corollary 3.13. Let A ∈ Cn×n be H-normal and let all blocks of the canonical
form of the pair (AH A, H) belonging to the eigenvalue 0 be of size 1. Then A admits
an H-polar decomposition.
A further criterion for the existence of H-polar decompositions of H-normal matri-
ces is given in Theorem 34 of [LMMR]. In Theorem 35 also contained there, H-polar
decompositions of singular H-normal matrices X = UA ∈ Cn×n with UH = U−1,
AH = A are presented for all possible nontrivial cases in which H has exactly two
negative eigenvalues and whose existence is not guaranteed by Theorem 34. Thereby,
in the listed cases, the index of nilpotency k of the matrices XHX = XXH is

    k = 1 in the cases (I), (VI)-(VII),
    k = 2 in the cases (II)-(III), (VIII)-(XII),
    k = 3 in the cases (IV)-(V),

as can easily be verified. Thus, the existence of the given H-polar decompositions in
the cases (I), (VI)-(VII) is ensured by Corollary 3.13 and, moreover, the hypothesis
expressed in footnote 5 is supported, too.

5 If it could be proved that every H-normal matrix A with (AHA)^k = 0 for some k ∈ N admits
an H-polar decomposition, then the present restrictions regarding the block sizes for the eigenvalue
0 would no longer be needed.

On the other hand, for
α ∈ R and

    H = [0, 0, 1; 0, 1, 0; 1, 0, 0],   X = [0, 1, iα; 0, 0, 1; 0, 0, 0]
    with   XHX = XXH = [0, 0, 1; 0, 0, 0; 0, 0, 0],

the existence of the H-polar decomposition

    X = UA   with   U = [1, iα, −α²/2 + iβ; 0, 1, iα; 0, 0, 1],   A = [0, 1, 0; 0, 0, 1; 0, 0, 0]   (β ∈ R)
is guaranteed by Theorem 34(ii) but not by Corollary 3.13, so that the two criteria
are mutually supplementary.
4. Numerical computation of H-polar decompositions of a matrix A
for which A[∗] A is diagonalisable. The practical computation of H-polar decom-
positions of a matrix A ∈ Cn×n is a tedious task consisting of the following steps6 :
1. Computation of the canonical form of the pair (A[∗] A, H),
2. Computation of an H-selfadjoint matrix M such that M2 = A[∗] A and
ker M = ker A,
3. Computation of an H-isometry U such that A = UM.
This chapter specifies a numerical method for the case that the matrix A[∗] A is
diagonalisable. This requires a simplified canonical form for a pair (A, H) where A
is a diagonalisable H-selfadjoint matrix and H is a regular hermitian matrix, which
is derived based on the following facts taken from [GLR, Chapter I.2.2]:
If A ∈ Cn×n is H-selfadjoint then for every non-real eigenvalue λ ∈ σ(A) it follows
that also λ̄ ∈ σ(A) and the Jordan normal form of A contains blocks of the same size
for both eigenvalues. Let

    EA(λ) = {x ∈ Cn : (A − λI)^k x = 0 for some k ∈ N}

be the generalised eigenspace for the eigenvalue λ. If λ1, . . . , λr are the real and
λr+1, . . . , λs are the non-real eigenvalues with positive imaginary parts, and if Xi = EA(λi)

6 In the case H = I the polar decomposition of a square matrix A is obtained from its singular

value decomposition. Namely if A = UΣV∗ , then UV∗ is the isometric and VΣV∗ is the selfadjoint
factor of a polar decomposition of A. This suggests that in the case of an indefinite scalar product,
too, a similar approach can be adopted to obtain an H-polar decomposition. An analogy to the
singular value decomposition is examined in the paper [BR]. In Chapter 8 thereof the statements
regarding H-polar decomposition are derived, on which the present chapter is based: The existence of
an H-polar decomposition of A is inferred there, too, from the canonical form of the pair (A[∗] A, H).
for 1 ≤ i ≤ r and Xi,1 = EA(λi), Xi,2 = EA(λ̄i) for r + 1 ≤ i ≤ s, then the Cn can
be expressed as the direct sum of the non-degenerate generalised eigenspaces

    Cn = X1 ⊕ . . . ⊕ Xr ⊕ Xr+1 ⊕ . . . ⊕ Xs   with   Xi = Xi,1 ⊕ Xi,2 for r + 1 ≤ i ≤ s,

for which the following equations are satisfied:

    [xk, xl] = 0   for xk ∈ Xk, xl ∈ Xl and 1 ≤ k ≠ l ≤ s,
    [xk, yk] = 0   for xk, yk ∈ Xk,1 or xk, yk ∈ Xk,2 and r + 1 ≤ k ≤ s.

Therefore, if R ∈ Cn×n is a matrix whose column vectors are bases of the subspaces
Xi, then R is regular and

    R−1AR = A1 ⊕ . . . ⊕ Ar ⊕ [Ar+1,1, 0; 0, Ar+1,2] ⊕ . . . ⊕ [As,1, 0; 0, As,2],
    R∗HR = H1 ⊕ . . . ⊕ Hr ⊕ [0, Hr+1; Hr+1∗, 0] ⊕ . . . ⊕ [0, Hs; Hs∗, 0],

whereby the size of the blocks is given by pi = dim Xi for 1 ≤ i ≤ r and pi =
dim Xi,1 = dim Xi,2 for r + 1 ≤ i ≤ s. In the special case of a diagonalisable matrix A
the generalised eigenspaces contain exclusively eigenvectors and we get the simplified
form

    R−1AR = λ1 Ip1 ⊕ . . . ⊕ λr Ipr ⊕ [λr+1 Ipr+1, 0; 0, λ̄r+1 Ipr+1] ⊕ . . . ⊕ [λs Ips, 0; 0, λ̄s Ips],
    R∗HR = H1 ⊕ . . . ⊕ Hr ⊕ [0, Hr+1; Hr+1∗, 0] ⊕ . . . ⊕ [0, Hs; Hs∗, 0].

The following result is obtained directly from this representation whose proof will be
given in two ways in order to, on the one hand, show the connection to Theorem 3.1
and, on the other hand, to provide the foundation for the corresponding numerical
method.
Theorem 4.1 (Simplified canonical form). Let A ∈ Cn×n be H-selfadjoint and
diagonalisable. Then there exists a regular matrix S ∈ Cn×n such that

    S−1AS = A1 ⊕ . . . ⊕ Ak   and   S∗HS = H1 ⊕ . . . ⊕ Hk,                  (4.1a)

whereby the blocks Aj and Hj are of equal size and the pairs (Aj, Hj) have one and
only one of the following forms:
1. Pairs belonging to real eigenvalues

    Aj = λIp   and   Hj = [Ip−q, 0; 0, −Iq]                                  (4.1b)

with λ ∈ R and p, q ∈ N, q ≤ p.
2. Pairs belonging to non-real eigenvalues

    Aj = [λIp, 0; 0, λ̄Ip]   and   Hj = [0, Ip; Ip, 0]                       (4.1c)

with λ ∈ C\R, Im(λ) > 0 and p ∈ N.
Moreover, the form (S−1 AS, S∗ HS) of (A, H) is uniquely determined up to the per-
mutation of blocks.
First proof. According to Theorem 3.1 a regular matrix S ∈ Cn×n exists such
that the pair (A, H) is in the canonical form (3.1). Because of the assumed diagonal-
isability, the size of the blocks appearing therein is always p = 1. Now combining all
p blocks of the form (3.1b) or (3.1c) which belong to the same eigenvalue λ ∈ R or
λ ∈ C\R respectively, then after a suitable permutation it is always possible to build
one block of the form (4.1b) or (4.1c). From p − q blocks of (3.1b) with ε = +1 and
q blocks of (3.1b) with ε = −1 this gives one block of the form (4.1b).
Second proof. For all real eigenvalues λρ ∈ σ(A), 1 ≤ ρ ≤ r, let {u1 , . . . , up }ρ
be an orthonormalised (according to Theorem 2.4) basis of eigenvectors of EA (λρ ),
arranged such that (Huj , uj ) = 1 for 1 ≤ j ≤ p − q and (Huj , uj ) = −1 for p −
q + 1 ≤ j ≤ p. For all non-real eigenvalues λσ , λσ ∈ σ(A), r + 1 ≤ σ ≤ s, let
{u1 , . . . , up }σ and {v1 , . . . , vp }σ be two orthonormalised (according to Theorem 2.5)
bases of eigenvectors of EA (λσ ) and EA (λσ ). On now combining these bases as
columns of the matrix S, the pair (S−1 AS, S∗ HS) takes on the assumed form.
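For a diagonalisable H-selfadjoint matrix with purely real spectrum the second proof can be followed literally in code. A sketch reusing the hypothetical h_orthonormalise from Chapter 2; the crude grouping by a tolerance tau is our stand-in for the sorting method of Procedure 4.3 below:

    import numpy as np

    def simplified_canonical_form_real(A, H, tau=1e-8):
        """(S^-1 A S, S* H S) as in Theorem 4.1 for a diagonalisable
        H-selfadjoint A with real spectrum: one block lambda*I_p with
        H_j = I_{p-q} (+) -I_q per numerical eigenvalue."""
        w, V = np.linalg.eig(A)
        order = np.argsort(w.real)
        w, V = w[order], V[:, order]
        cols, i = [], 0
        while i < len(w):                        # group eigenvalues within tau
            j = i
            while j + 1 < len(w) and abs(w[j + 1] - w[i]) < tau:
                j += 1
            U, eps = h_orthonormalise(V[:, i:j + 1], H)
            srt = np.argsort(-np.asarray(eps))   # signature +1 first, then -1
            cols.append(U[:, srt])
            i = j + 1
        S = np.column_stack(cols)
        return np.linalg.solve(S, A @ S), S.conj().T @ H @ S

Non-real eigenvalues would additionally require the pairing of Theorem 2.5, as in the second proof above.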
Example 4.2 (Non-defective matrix pencils). If ρH − G is a non-defective
matrix pencil with regular hermitian matrices H and G from Cn×n , then by definition
regular matrices P and Q exist such that

ΛH = P−1 HQ and ΛG = P−1 GQ

are diagonal matrices. Thus the matrix H−1 G = QΛ−1H ΛG Q


−1
is diagonalisable and
because (H G) H = H(H G) it is H-hermitian. The canonical form (S−1 H−1 GS,
−1 ∗ −1

S∗ HS) of the pair (H−1 G, H) can therefore be expressed according to Theorem 4.1
in the simplified form
µM
r ¶ µ M
s · ¸¶
−1 −1 λj Ipj
S H GS = λj Ipj ⊕ ,
λj Ipj
j=1 j=r+1
µMr · ¸¶ µ M
s · ¸¶
Ipj −qj Ipj
S∗ HS = ⊕ ,
−Iqj Ipj
j=1 j=r+1
µMr · ¸¶ µ Ms · ¸¶
∗ Ipj −qj λj Ipj
S GS = λj ⊕
−Iqj λj Ipj
j=1 j=r+1

with λ1 , . . . , λr ∈ R\{0} and λr+1 , . . . , λs ∈ C\R [MMX, Corollary 2.4]. ♦


From the second proof of Theorem 4.1 the following numerical procedure can be
derived for computing a simplified canonical form.
Numerical Procedure 4.3 (Simplified canonical form). First of all the eigen-
values λ1 , . . . , λn and a corresponding basis of the Cn consisting of the eigenvectors
r1 , . . . , rn of A must be computed. For example, the LR method [PW], the QR
method [F] or a method based on Jacobi rotations and shears according to Eberlein
[E] can be used for this purpose. Thereafter the eigenvalues must be combined in
groups of numerical eigenvalues λ∗1 , . . . , λ∗k . This can be done with eigenvalue esti-
mates via Gerschgorin circles [KR], with a modified cluster analysis algorithm [K] or
simply by the following sorting method:
for j = 1 . . . n − 1 do
k=j
for i = j + 1 . . . n do
if | Re(λi ) − Re(λk )| < τ then
if | Im(λi ) − Im(λk )| < τ or | Im(λi ) + Im(λk )| < τ then
if Im(λi ) > Im(λk ) then
k=i
end if
else if | Im(λi )| > | Im(λk )| then
k=i
end if
else if | Re(λi ) + Re(λk )| < τ then
if Re(λi ) > Re(λk ) then
k=i
end if
else if | Re(λi )| > | Re(λk )| then
k=i
end if
end for
if k 6= j then
SWAP(λj , λk )
for i = 1 . . . n do
SWAP(rij , rik )
end for
end if
end for
Thereby τ > 0 is a control parameter, with which the sorted eigenvalues are combined
in groups whose mean value is interpreted as the numerical eigenvalue:

    λ1, . . . , λp1 → λ1∗;   λp1+1, . . . , λp1+p2 → λ2∗;   . . . ;   λp1+...+pk−1+1, . . . , λp1+...+pk → λk∗.

A group limit is reached when either | Re(λi) − Re(λi+1)| ≥ τ or | Im(λi) − Im(λi+1)| ≥
τ. Furthermore, the sorting method is devised such that pairs of non-real numeri-
cal eigenvalues always come consecutively, so that either λi is a real eigenvalue or
λi, λi+1 is a pair of conjugate non-real eigenvalues. Thereby the parameter τ also
decides whether an eigenvalue is to be considered as real (| Im(λi)| < τ) or as non-real
(| Im(λi)| ≥ τ), and the eigenvalue 0 is additionally characterised by | Re(λi)| < τ; it
is always sorted to the end, entailing organisational advantages. The eigenvectors
belonging to an eigenvalue λ∗i or the eigenvectors belonging to a pair of eigenvalues

λ∗i , λ∗i+1 = λ̄∗i ,

$$r_{m_{i-1}+1},\dots,r_{m_{i-1}+p_i} \qquad\text{or}\qquad r_{m_{i-1}+1},\dots,r_{m_{i-1}+p_i} \ \text{ and }\ r_{m_i+1},\dots,r_{m_i+p_{i+1}}$$

with mi = p1 + p2 + . . . + pi can be orthonormalised by method 2.6, which is possible


in the second case only when pi = pi+1 . Otherwise an error situation has arisen which
might possibly be correctable by choosing a more suitable value for τ (provided that
the matrix A is H-hermitian, as is of course assumed here). Empirically it was found

that in most cases a very good result can be achieved with τ ≈ εmachine (machine
accuracy). A generalised method for numerical computation of the canonical form of
a pair (A, H) with a non-diagonalisable matrix A is described in [K]. ♦
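For illustration, the grouping step can be sketched in a few lines of NumPy; the function below is a simplified rendering of the procedure (it assumes the eigenvalues have already been sorted as described above, and its name and defaults are illustrative, not the author's C implementation).

    import numpy as np

    def group_eigenvalues(eigvals, tau=1e-6):
        """Combine sorted eigenvalues into numerical eigenvalues.

        A new group starts when the real or the imaginary parts of two
        consecutive eigenvalues differ by at least tau; each group is
        represented by its mean value (the numerical eigenvalue)."""
        groups, current = [], [eigvals[0]]
        for lam in eigvals[1:]:
            prev = current[-1]
            if abs(lam.real - prev.real) >= tau or abs(lam.imag - prev.imag) >= tau:
                groups.append(current)
                current = [lam]
            else:
                current.append(lam)
        groups.append(current)
        # mean value of each group and its multiplicity p_i
        return [np.mean(g) for g in groups], [len(g) for g in groups]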
The existence of an H-hermitian square root of the matrix A[∗] A can also be shown
in simplified form if it is diagonalisable. The complete proof of the specialisation of
Theorem 3.4 is given for clarity and in order to make the following numerical procedure
comprehensible.
Theorem 4.4 (Existence of H-selfadjoint square roots). Let A ∈ Cn×n and let
B = A[∗] A be diagonalisable. Then there exists an H-selfadjoint matrix M such that
M2 = B and ker M = ker A if and only if the following conditions are satisfied:
1. The part of the (simplified) canonical form of the pair (B, H) belonging to
negative real eigenvalues λ = −α2 consists of blocks of the form
$$B_j = -\alpha^2 I_{2p} \quad\text{and}\quad H_j = \begin{bmatrix} I_p & 0 \\ 0 & -I_p \end{bmatrix}$$
with α > 0 and p ∈ N.
2. The part of the (simplified) canonical form of the pair (B, H) belonging to the
eigenvalue 0 consists of the blocks
$$B_j = 0_p \quad\text{and}\quad H_j = \begin{bmatrix} I_{r+s} & 0 \\ 0 & -I_{r+t} \end{bmatrix}$$
with p, r, s, t ∈ N, 2r + s + t = p, and if {e1 , . . . , er+s , f1 , . . . , fr+t } stand for
the basis in which these blocks exist, then two permutations (ρ1 , . . . , ρr+s ) of
(1, . . . , r + s) and (σ1 , . . . , σr+t ) of (1, . . . , r + t) exist, so that
ker A = span{eρ(1) + fσ(1) , . . . , eρ(r) + fσ(r) }
⊕ span{eρ(r+1) , . . . , eρ(r+s) } ⊕ span{fσ(r+1) , . . . , fσ(r+t) }
is fulfilled.
Proof. [⇒]: Let B ∈ Cn×n be a diagonalisable matrix with σ(B) = {λ1 , . . . , λk }
and let
R−1 BR = λ1 Ip1 ⊕ . . . ⊕ λk Ipk
be its Jordan normal form whereby the columns of R form a basis of the Cn consisting
of eigenvectors of B. Then every matrix M with M2 = B can be expressed in the
form
$$M = R\big(\sqrt{\lambda_1}\, I_{p_1} \oplus \dots \oplus \sqrt{\lambda_k}\, I_{p_k}\big)R^{-1}$$
on setting
$$\sqrt{\lambda I_p} = X_p\big(\sqrt{\lambda}\, I_{p-q} \oplus -\sqrt{\lambda}\, I_q\big)X_p^{-1} \quad\text{for } \lambda \neq 0 \quad\text{and}$$
$$\sqrt{0_p} = X_p\left(\begin{bmatrix} 0_r & I_r \\ 0_r & 0_r \end{bmatrix} \oplus 0_{p-2r}\right)X_p^{-1} \quad\text{for } \lambda = 0$$
where Xp ∈ Cp×p designates an arbitrary regular matrix. The simplified canonical
form of an H-selfadjoint matrix M whose square is diagonalisable
R−1 MR = M1 ⊕ . . . ⊕ Mk , R∗ HR = H1 ⊕ . . . ⊕ Hk
therefore consists of blocks (Mj , Hj ) of the form
$$\left(\begin{bmatrix} \lambda I_p & 0 \\ 0 & \bar{\lambda} I_p \end{bmatrix},\ \begin{bmatrix} 0 & I_p \\ I_p & 0 \end{bmatrix}\right) \quad\text{for } \lambda \in \mathbb{C}\setminus\mathbb{R} \quad\text{or}$$
$$\big(\lambda I_{p+q},\ I_p \oplus -I_q\big) \quad\text{for } \lambda \in \mathbb{R}\setminus\{0\} \quad\text{or}$$
$$\left(\begin{bmatrix} 0 & I_{p+q} \\ 0 & 0 \end{bmatrix} \oplus 0_{s+t},\ \begin{bmatrix} 0 & I_p \oplus -I_q \\ I_p \oplus -I_q & 0 \end{bmatrix} \oplus I_s \oplus -I_t\right) \quad\text{for } \lambda = 0,$$
whereby the blocks belonging to the eigenvalue 0 have been combined in evident
manner to the ordinary canonical form
$$M_0 = \Big(\bigoplus_{i=1}^{p+q} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\Big) \oplus \Big(\bigoplus_{i=1}^{s+t} [\,0\,]\Big),$$
$$H_0 = \Big(\bigoplus_{i=1}^{p} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\Big) \oplus \Big(\bigoplus_{i=1}^{q} \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}\Big) \oplus \Big(\bigoplus_{i=1}^{s} [\,1\,]\Big) \oplus \Big(\bigoplus_{i=1}^{t} [\,-1\,]\Big).$$

Now if such a matrix M is given, and if the basis in which the (simplified) canonical
form exists is designated as {g1 , . . . , gn }, then the (simplified) canonical form of the
pair (M2 , H) has the following properties:
1. If λ ∈ C\(R ∪ iR) and if the canonical form of (M, H) contains the blocks

$$M_{j_1} \oplus M_{j_2} = \begin{bmatrix} \lambda I_{p_1} & 0 \\ 0 & \bar{\lambda} I_{p_1} \end{bmatrix} \oplus \begin{bmatrix} -\lambda I_{p_2} & 0 \\ 0 & -\bar{\lambda} I_{p_2} \end{bmatrix},$$
$$H_{j_1} \oplus H_{j_2} = \begin{bmatrix} 0 & I_{p_1} \\ I_{p_1} & 0 \end{bmatrix} \oplus \begin{bmatrix} 0 & I_{p_2} \\ I_{p_2} & 0 \end{bmatrix},$$

then the canonical form of (M2 , H) contains a pair of blocks of the form

$$M_j^2 = \begin{bmatrix} \lambda^2 I_{p_1+p_2} & 0 \\ 0 & \bar{\lambda}^2 I_{p_1+p_2} \end{bmatrix}, \qquad H_j = \begin{bmatrix} 0 & I_{p_1+p_2} \\ I_{p_1+p_2} & 0 \end{bmatrix}. \ \text{[End of 1.]}$$

2. If λ ∈ R\{0} and if the canonical form of (M, H) contains the blocks

$$M_{j_1} \oplus M_{j_2} = \lambda I_{p_1+q_1} \oplus -\lambda I_{p_2+q_2},$$
$$H_{j_1} \oplus H_{j_2} = (I_{p_1} \oplus -I_{q_1}) \oplus (I_{p_2} \oplus -I_{q_2}),$$

then the canonical form of (M2 , H) contains a pair of blocks of the form

$$M_j^2 = \lambda^2 I_{(p_1+p_2)+(q_1+q_2)}, \qquad H_j = I_{p_1+p_2} \oplus -I_{q_1+q_2}. \ \text{[End of 2.]}$$

3. If λ = iα ∈ iR\{0} and if the canonical form of (M, H) contains a pair of
blocks

$$M_j = \begin{bmatrix} i\alpha I_p & 0 \\ 0 & -i\alpha I_p \end{bmatrix}, \qquad H_j = \begin{bmatrix} 0 & I_p \\ I_p & 0 \end{bmatrix},$$

then the canonical form of (M2 , H) contains a pair of blocks


$$M_j^2 = \begin{bmatrix} -\alpha^2 I_p & 0 \\ 0 & -\alpha^2 I_p \end{bmatrix} \quad\text{and}\quad H_j.$$

If a new basis {e1 , . . . , ep , f1 , . . . , fp } is now chosen with

$$e_k = \tfrac{1}{\sqrt{2}}(g_k + g_{k+p}), \qquad f_k = \tfrac{1}{\sqrt{2}}(g_k - g_{k+p}) \qquad\text{for } 1 \le k \le p,$$
then

(Hj ek , el ) = δkl , (Hj ek , fl ) = 0, (Hj fk , fl ) = −δkl for 1 ≤ k, l ≤ p


and the following blocks appear
$$\tilde{M}_j^2 = M_j^2 \quad\text{and}\quad \tilde{H}_j = \begin{bmatrix} I_p & 0 \\ 0 & -I_p \end{bmatrix}. \ \text{[End of 3.]}$$

4. If λ = 0 and if the canonical form of (M, H) contains a pair of blocks


$$M_j = \begin{bmatrix} 0 & I_{p+q} \\ 0 & 0 \end{bmatrix} \oplus 0_{s+t}, \qquad H_j = \begin{bmatrix} 0 & I_p \oplus -I_q \\ I_p \oplus -I_q & 0 \end{bmatrix} \oplus I_s \oplus -I_t,$$

then the canonical form of (M2 , H) contains a pair of blocks

M2j = 02p+2q+s+t and Hj .

If a new basis {e1 , . . . , ep+q+s , f1 , . . . , fp+q+t } is now chosen with

$$e_k = \tfrac{1}{\sqrt{2}}(g_k + g_{k+p+q}), \qquad f_k = \tfrac{1}{\sqrt{2}}(g_k - g_{k+p+q}) \qquad\text{for } 1 \le k \le p,$$
$$e_k = \tfrac{1}{\sqrt{2}}(g_k - g_{k+p+q}), \qquad f_k = \tfrac{1}{\sqrt{2}}(g_k + g_{k+p+q}) \qquad\text{for } p+1 \le k \le p+q,$$
$$e_{k+p+q} = g_{k+2p+2q} \qquad\text{for } 1 \le k \le s,$$
$$f_{k+p+q} = g_{k+2p+2q+s} \qquad\text{for } 1 \le k \le t,$$

then

(Hj ek , el ) = δkl , (Hj ek , fν ) = 0, (Hj fµ , fν ) = −δµν


for 1 ≤ k, l ≤ p + q + s and 1 ≤ µ, ν ≤ p + q + t

and the following blocks appear


$$\tilde{M}_j^2 = M_j^2 \quad\text{and}\quad \tilde{H}_j = \begin{bmatrix} I_{p+q+s} & 0 \\ 0 & -I_{p+q+t} \end{bmatrix}.$$

Moreover, it is true that

ker M = span{g1 , . . . , gp+q , g2p+2q+1 , . . . , g2p+2q+s+t }


= span{e1 + f1 , . . . , ep+q + fp+q }
⊕ span{ep+q+1 , . . . , ep+q+s } ⊕ span{fp+q+1 , . . . , fp+q+t }.

[⇐]: Let B be diagonalisable and let R−1 BR = B1 ⊕ . . . ⊕ Bk , R∗ HR = H1 ⊕


. . . ⊕ Hk be the (simplified) canonical form of the pair (B, H). Furthermore, let the
conditions 1. and 2. be satisfied, and let Σp be diagonal matrices with diagonal
elements of {+1, −1}. Then the matrix M can be constructed as follows:
1. If λ = ω² ∈ C\R and if the canonical form contains a pair of blocks of the form

$$B_j = \begin{bmatrix} \lambda I_p & 0 \\ 0 & \bar{\lambda} I_p \end{bmatrix} \quad\text{and}\quad H_j = \begin{bmatrix} 0 & I_p \\ I_p & 0 \end{bmatrix},$$

then for

$$M_j = \begin{bmatrix} \omega \Sigma_p & 0 \\ 0 & \bar{\omega} \Sigma_p \end{bmatrix}$$
the equations M2j = Bj and M∗j Hj = Hj Mj are satisfied. [End of 1.]
2. If λ ∈ R ∩ (0, ∞) and if the canonical form contains a pair of blocks of the form
$$B_j = \lambda I_p \quad\text{and}\quad H_j = \begin{bmatrix} I_{p-q} & 0 \\ 0 & -I_q \end{bmatrix},$$

then for

$$M_j = \sqrt{\lambda}\,\Sigma_p$$
the equations M2j = Bj and M∗j Hj = Hj Mj are satisfied. [End of 2.]
3. If λ = −α² ∈ R ∩ (−∞, 0) and if the canonical form contains a pair of blocks
of the form

$$B_j = \begin{bmatrix} -\alpha^2 I_p & 0 \\ 0 & -\alpha^2 I_p \end{bmatrix} \quad\text{and}\quad H_j = \begin{bmatrix} I_p & 0 \\ 0 & -I_p \end{bmatrix},$$
then for
$$\tilde{M}_j = \begin{bmatrix} i\alpha \Sigma_p & 0 \\ 0 & -i\alpha \Sigma_p \end{bmatrix}, \qquad \tilde{H}_j = \begin{bmatrix} 0 & I_p \\ I_p & 0 \end{bmatrix} \qquad\text{and}\qquad S_j = \frac{1}{\sqrt{2}}\begin{bmatrix} I_p & I_p \\ I_p & -I_p \end{bmatrix}$$

the equations Sj−1 Bj Sj = Bj = M̃2j and Sj∗ Hj Sj = H̃j are satisfied (Sj = Sj∗ = Sj−1 ).
Therefore on setting

$$M_j = S_j \tilde{M}_j S_j^{-1} = \begin{bmatrix} 0 & i\alpha \Sigma_p \\ i\alpha \Sigma_p & 0 \end{bmatrix},$$

we obtain M2j = Bj and M∗j Hj = Hj Mj . [End of 3.]


4. If λ = 0, then it is always possible to achieve an arrangement of the basis
vectors in the order
{eρ(1) , . . . , eρ(r) , fσ(1) , . . . , fσ(r) , eρ(r+1) , . . . , eρ(r+s) , fσ(r+1) , . . . , fσ(r+t) }
such that the blocks Bj and Hj exist in the form
$$B_j = 0_p \quad\text{and}\quad H_j = \begin{bmatrix} I_r & 0 \\ 0 & -I_r \end{bmatrix} \oplus \begin{bmatrix} I_s & 0 \\ 0 & -I_t \end{bmatrix}$$
and for
$$\tilde{M}_j = \begin{bmatrix} 0 & \Sigma_r \\ 0 & 0 \end{bmatrix} \oplus 0_{s+t}, \qquad \tilde{H}_j = \begin{bmatrix} 0 & I_r \\ I_r & 0 \end{bmatrix} \oplus \begin{bmatrix} I_s & 0 \\ 0 & -I_t \end{bmatrix}$$
$$\text{and}\qquad S_j = \frac{1}{\sqrt{2}}\begin{bmatrix} I_r & I_r \\ I_r & -I_r \end{bmatrix} \oplus I_{s+t}$$

the equations Sj−1 Bj Sj = Bj = M̃2j and Sj∗ Hj Sj = H̃j are satisfied (Sj = Sj∗ = Sj−1 ).
Therefore on setting
$$M_j = S_j \tilde{M}_j S_j^{-1} = \frac{1}{2}\begin{bmatrix} \Sigma_r & -\Sigma_r \\ \Sigma_r & -\Sigma_r \end{bmatrix} \oplus 0_{s+t}$$
we obtain M2j = Bj and M∗j Hj = Hj Mj and also

ker M = span{eρ(1) + fσ(1) , . . . , eρ(r) + fσ(r) ,


eρ(r+1) , . . . , eρ(r+s) , fσ(r+1) , . . . , fσ(r+t) } = ker A.
Using the notation of Theorem 3.4, the part of the canonical form belonging to
the eigenvalue 0 in the basis {e1 , f1 , . . . , er , fr , er+1 , . . . , er+s , fr+1 , . . . , fr+t } can be
expressed as
$$\Big(\bigoplus_{i=1}^{r} (N_1 \oplus N_1)\Big) \oplus \Big(\bigoplus_{i=1}^{s} [\,0\,]\Big) \oplus \Big(\bigoplus_{i=1}^{t} [\,0\,]\Big),$$
$$\Big(\bigoplus_{i=1}^{r} (Z_1 \oplus -Z_1)\Big) \oplus \Big(\bigoplus_{i=1}^{s} [\,1\,]\Big) \oplus \Big(\bigoplus_{i=1}^{t} [\,-1\,]\Big).$$

This confirms again the correction of Theorem 3.4 explained with Example 3.5. Fur-
thermore, the diagonal matrices Σp used in the proof given above comply with the
relationships between the canonical forms of (M2 , H) and (M, H) listed in Theorem
3.4(a-f).
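The case distinctions above translate directly into block constructions. The following NumPy sketch (the function name is illustrative and Σp is taken to be the identity; any diagonal matrix of ±1 entries would also work) builds the square-root block Mj for a single eigenvalue of B = A[∗] A along the lines of Theorem 4.4 and the [⇐] part of its proof.

    import numpy as np

    def sqrt_block(lam, p):
        """Square-root block M_j for one eigenvalue lam of B = A^[*]A.

        lam > 0       : B_j = lam I_p,                 M_j = sqrt(lam) I_p
        lam = -a^2 < 0: B_j = -a^2 I_{2p}, H_j = I_p (+) -I_p,
                        M_j = [[0, i a I_p], [i a I_p, 0]]
        lam non-real  : B_j = diag(w^2 I_p, conj(w)^2 I_p), H_j antidiagonal,
                        M_j = diag(w I_p, conj(w) I_p), w = sqrt(lam), Re w > 0"""
        I, Z = np.eye(p), np.zeros((p, p))
        lam = complex(lam)
        if abs(lam.imag) < 1e-14 and lam.real > 0:
            return np.sqrt(lam.real) * I
        if abs(lam.imag) < 1e-14 and lam.real < 0:
            a = np.sqrt(-lam.real)
            return np.block([[Z, 1j * a * I], [1j * a * I, Z]])
        w = np.sqrt(lam)   # principal square root, Re(w) > 0
        return np.block([[w * I, Z], [Z, np.conj(w) * I]])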
Summing up, the following procedure can now be described using the designation
X ¦ Y = [x1 . . . xp y1 . . . yq ] ∈ Cn×(p+q) for given matrices X = [x1 . . . xp ] ∈ Cn×p
and Y = [y1 . . . yq ] ∈ Cn×q having the specified columns.
Numerical Procedure 4.5 (H-polar decomposition). Let A ∈ Cn×n and let
[∗]
A A be diagonalisable. Then an H-polar decomposition of A can be constructed as
follows.
1st step: First of all the (simplified) canonical form of the pair (A[∗] A, H) must
be determined using the procedure 4.3 (see the footnote at the end of this step), which
is given as example by

$$R^{-1}A^{[*]}AR = J = J_3 \oplus J_2 \oplus J_1 \oplus J_0,$$
$$J = \begin{bmatrix} \omega^2 I_{p_3} & 0 \\ 0 & \bar{\omega}^2 I_{p_3} \end{bmatrix} \oplus \begin{bmatrix} \alpha^2 I_{p_2} & 0 \\ 0 & \alpha^2 I_{q_2} \end{bmatrix} \oplus \begin{bmatrix} -\beta^2 I_{p_1} & 0 \\ 0 & -\beta^2 I_{p_1} \end{bmatrix} \oplus \begin{bmatrix} 0_{r+s} & 0 \\ 0 & 0_{r+t} \end{bmatrix},$$
$$R^{*}HR = Z_J = Z_{J,3} \oplus Z_{J,2} \oplus Z_{J,1} \oplus Z_{J,0},$$
$$Z_J = \begin{bmatrix} 0 & I_{p_3} \\ I_{p_3} & 0 \end{bmatrix} \oplus \begin{bmatrix} I_{p_2} & 0 \\ 0 & -I_{q_2} \end{bmatrix} \oplus \begin{bmatrix} I_{p_1} & 0 \\ 0 & -I_{p_1} \end{bmatrix} \oplus \begin{bmatrix} I_{r+s} & 0 \\ 0 & -I_{r+t} \end{bmatrix},$$
$$R = R_3 ¦ R_2 ¦ R_1 ¦ R_0$$

with ω ∈ C\(R ∪ iR) and α, β ∈ R ∩ (0, ∞). Thereby the (rectangular) blocks Rj ,
0 ≤ j ≤ 3, which belong to the corresponding eigenvalue contain the (normalised)
eigenvectors. (Footnote: If the matrix A[∗] A has only non-real and positive eigenvalues,
then the canonical form does not necessarily have to be determined. If in this case K
is a diagonal matrix with square roots of the eigenvalues, then M = RKR−1 and
U = AM−1 are given directly as H-polar decomposition of A. The canonical form is
required in the general case considered here, in order to decide whether an H-hermitian
square root exists in the case of negative eigenvalues, and in order to find a suitable
kernel transformation in the case of the eigenvalue 0.)
2nd step: Now an H-hermitian square root M of the matrix A[∗] A can be deter-
mined with

$$S^{-1}MS = K = K_3 \oplus K_2 \oplus K_1 \oplus K_0,$$
$$K = \begin{bmatrix} \omega I_{p_3} & 0 \\ 0 & \bar{\omega} I_{p_3} \end{bmatrix} \oplus \begin{bmatrix} \alpha I_{p_2} & 0 \\ 0 & \alpha I_{q_2} \end{bmatrix} \oplus \begin{bmatrix} 0 & i\beta I_{p_1} \\ i\beta I_{p_1} & 0 \end{bmatrix} \oplus \left(\begin{bmatrix} 0_r & I_r \\ 0_r & 0_r \end{bmatrix} \oplus 0_{s+t}\right),$$
$$S^{*}HS = Z_K = Z_{K,3} \oplus Z_{K,2} \oplus Z_{K,1} \oplus Z_{K,0},$$
$$Z_K = \begin{bmatrix} 0 & I_{p_3} \\ I_{p_3} & 0 \end{bmatrix} \oplus \begin{bmatrix} I_{p_2} & 0 \\ 0 & -I_{q_2} \end{bmatrix} \oplus \begin{bmatrix} I_{p_1} & 0 \\ 0 & -I_{p_1} \end{bmatrix} \oplus \left(\begin{bmatrix} 0 & I_r \\ I_r & 0 \end{bmatrix} \oplus \begin{bmatrix} I_s & 0 \\ 0 & -I_t \end{bmatrix}\right),$$
$$S = R_3 ¦ R_2 ¦ R_1 ¦ R''_0 \qquad (R''_0 \text{ according to the kernel transformation below}).$$

In the case of the negative eigenvalue −β 2 this construction is possible only if condition
1. of Theorem 4.4 is fulfilled, i.e. only if the number of negative eigenvectors from
R1 is the same as the number of positive ones. It would then also be possible to use the
blocks
$$K'_1 = \begin{bmatrix} i\beta I_{p_1} & 0 \\ 0 & -i\beta I_{p_1} \end{bmatrix}, \qquad Z'_{K,1} = \begin{bmatrix} 0 & I_{p_1} \\ I_{p_1} & 0 \end{bmatrix} \qquad\text{and}\qquad R'_1 = \frac{1}{\sqrt{2}}\, R_1 \begin{bmatrix} I_{p_1} & I_{p_1} \\ I_{p_1} & -I_{p_1} \end{bmatrix},$$

but this is found to be less convenient when constructing the isometry in the third
step. Furthermore, in the blocks of K diagonal matrices Σp3 , Σp2 , Σq2 , Σp1 , Σr with
diagonal elements from {+1, −1} could also be used instead of the identity matrices,
which would then produce another H-polar decomposition of A. Finally, the required
treatment of the eigenvalue 0 in this step is based on the following procedure.
Kernel transformation: Let R0 = R+ ¦ R− , where R+ contains the r + s positive
and R− contains the r + t negative vectors from ker A[∗] A, so that on the one hand

R∗+ HR+ = Ir+s and R∗− HR− = −Ir+t

and because of ZJ J = (R∗ HR)(R−1 A[∗] AR) = R∗ A∗ HAR = (AR)∗ H(AR) on


the other hand

(AR+ )∗ H(AR+ ) = 0r+s and (AR− )∗ H(AR− ) = 0r+t .

Furthermore, let U∗+ (AR+ )V+ = Σ+ and U∗− (AR− )V− = Σ− be singular value de-
compositions of the matrices AR+ and AR− . Then if the number r of non-vanishing
singular values σi ≥ τ (see procedure 4.3) in Σ+ and Σ− is the same, i.e.

$$(AR_+)V_+ = U_+\Sigma_+ = [a_1^+ \dots a_r^+\ \ 0_1 \dots 0_s],$$
$$(AR_-)V_- = U_-\Sigma_- = [a_1^- \dots a_r^-\ \ 0_1 \dots 0_t],$$

and if the transformation defined by

$$(U_+\Sigma_+)_r\, W = -(U_-\Sigma_-)_r, \qquad W = -\big(\Sigma_+^{-1} U_+^{*} U_- \Sigma_-\big)_r$$
(the index r stands for submatrices)

is unitary, W∗ W = WW∗ = Ir , then

R̂+ ¦ Ř+ = R+ V+ (W ⊕ Is ) and R̂− ¦ Ř− = R− V−

with R̂+ , R̂− ∈ Cn×r can be set. The matrix given by

R′0 = R̂+ ¦ R̂− ¦ Ř+ ¦ Ř−

then satisfies the equations

$$AR'_0 = [-a_1^- \dots -a_r^-\ \ a_1^- \dots a_r^-\ \ 0_1 \dots 0_{s+t}], \qquad (AR'_0)^{*} H (AR'_0) = 0_{2r+s+t}$$
$$\text{and}\quad (R'_0)^{*} H (R'_0) = (I_r \oplus -I_r) \oplus (I_s \oplus -I_t),$$
and for
$$R''_0 = \frac{1}{\sqrt{2}}\,(\hat{R}_+ ¦ \hat{R}_-)\begin{bmatrix} I_r & I_r \\ I_r & -I_r \end{bmatrix} \;¦\; (\check{R}_+ ¦ \check{R}_-)$$
we get

$$AR''_0 = -\sqrt{2}\,[0_1 \dots 0_r\ \ a_1^- \dots a_r^-\ \ 0_1 \dots 0_{s+t}], \qquad (AR''_0)^{*} H (AR''_0) = 0_{2r+s+t}$$
$$\text{and}\quad (R''_0)^{*} H (R''_0) = \begin{bmatrix} 0 & I_r \\ I_r & 0 \end{bmatrix} \oplus \begin{bmatrix} I_s & 0 \\ 0 & -I_t \end{bmatrix}.$$
If the numbers of non-vanishing singular values in Σ+ and Σ− differ or the transfor-
mation W is not unitary, then ker A cannot be expressed in the just constructed form
ker A = span{e1 + f1 , . . . , er + fr , er+1 , . . . , er+s , fr+1 , . . . , fr+t } with
R̂+ = [e1 . . . er ], Ř+ = [er+1 . . . er+s ], R̂− = [f1 . . . fr ], Ř− = [fr+1 . . . fr+t ].
In that case the condition 2. of Theorem 4.4 is violated and an H-polar decomposition
of A then does not exist.
3rd step: After the second step M = SKS−1 is the H-hermitian factor, and
in the regular case, in which no blocks of the form J0 , ZJ,0 and K0 , ZK,0 exist,
U = AM−1 = ASK−1 S−1 is the H-unitary factor of an H-polar decomposition of A.
The inverse of the matrix S which thereby appears can be obtained in the numerical
evaluations of the equations using S−1 = ZK S∗ H, which follows from S∗ HS = ZK .
In the singular case let
$$\tilde{K} = K_3 \oplus K_2 \oplus K_1 \oplus \tilde{K}_0 \qquad\text{with}\qquad \tilde{K}_0 = \tilde{K}_0^{-1} = \begin{bmatrix} 0 & I_r \\ I_r & 0 \end{bmatrix} \oplus I_{s+t}$$

and also let

S0 = S′0 ¦ S″0 ¦ S̃0 with S′0 , S″0 ∈ Cn×r and S̃0 ∈ Cn×(s+t)

be a partitioning of the part of the matrix S belonging to the eigenvalue 0. Then on
the one hand AS0 = 0n,r ¦ A′0 ¦ 0n,s+t with A′0 = −√2 [a−1 . . . a−r ], because the
columns of S′0 and S̃0 constitute a basis of ker A after the kernel transformation, and
on the other hand S0 K0 = 0n,r ¦ S′0 ¦ 0n,s+t . Therefore the matrices ASK̃−1 and
MSK̃−1 take on the form

ASK̃−1 = AS3 K3−1 ¦ AS2 K2−1 ¦ AS1 K1−1 ¦ AS0 K̃0−1 with AS0 K̃0−1 = A′0 ¦ 0n,r+s+t ,
MSK̃−1 = S3 K3 K3−1 ¦ S2 K2 K2−1 ¦ S1 K1 K1−1 ¦ S0 K0 K̃0−1 with S0 K0 K̃0−1 = S′0 ¦ 0n,r+s+t

and their respective first m = n − r − s − t columns

(ASK̃−1 )m = AS3 K3−1 ¦ AS2 K2−1 ¦ AS1 K1−1 ¦ A′0 ,
(MSK̃−1 )m = S3 ¦ S2 ¦ S1 ¦ S′0 ,

are bases of im A or im M for which

$$(AS\tilde{K}^{-1})_m^{*}\, H\, (AS\tilde{K}^{-1})_m = (MS\tilde{K}^{-1})_m^{*}\, H\, (MS\tilde{K}^{-1})_m = (Z_K)_m = \begin{bmatrix} 0 & I_{p_3} \\ I_{p_3} & 0 \end{bmatrix} \oplus \begin{bmatrix} I_{p_2} & 0 \\ 0 & -I_{q_2} \end{bmatrix} \oplus \begin{bmatrix} I_{p_1} & 0 \\ 0 & -I_{p_1} \end{bmatrix} \oplus 0_r.$$
Now both bases can be extended according to Theorem 2.7 to bases of the Cn such
that

(ASK̃−1 )n = AS3 K3−1 ¦ AS2 K2−1 ¦ AS1 K1−1 ¦ (A′0 ¦ A″0 ¦ Ã0 ),
(MSK̃−1 )n = S3 ¦ S2 ¦ S1 ¦ (S′0 ¦ S″0 ¦ S̃0 ),

so that (ASK̃−1 )∗n H(ASK̃−1 )n = (MSK̃−1 )∗n H(MSK̃−1 )n = ZK is fulfilled. This


is obviously trivial in the case of im M because there (MSK̃−1 )n = S can be chosen.
In the case of im A it is convenient to start with the matrix
$$C_m = \frac{1}{\sqrt{2}}\, AS_3 K_3^{-1} \begin{bmatrix} I_{p_3} & I_{p_3} \\ I_{p_3} & -I_{p_3} \end{bmatrix} \;¦\; AS_2 K_2^{-1} \;¦\; AS_1 K_1^{-1} \;¦\; A'_0 \qquad\text{with}$$
$$C_m^{*} H C_m = \begin{bmatrix} I_{p_3} & 0 \\ 0 & -I_{p_3} \end{bmatrix} \oplus \begin{bmatrix} I_{p_2} & 0 \\ 0 & -I_{q_2} \end{bmatrix} \oplus \begin{bmatrix} I_{p_1} & 0 \\ 0 & -I_{p_1} \end{bmatrix} \oplus 0_r,$$

whose columns already contain an orthonormal basis of im A. Finally

$$U = (AS\tilde{K}^{-1})_n\, (MS\tilde{K}^{-1})_n^{-1} = (AS\tilde{K}^{-1})_n\, S^{-1} = (AS\tilde{K}^{-1})_n\, (Z_K S^{*} H)$$

gives the wanted H-isometry for which UM is an H-polar decomposition of A. ♦
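For the regular special case mentioned in the footnote to the 1st step, namely A[∗] A diagonalisable with only positive real or non-real eigenvalues, the whole procedure collapses to a few lines. The following NumPy sketch implements exactly this special case under those assumptions; it is not a replacement for the general Procedure 4.5, and the function name is illustrative.

    import numpy as np

    def h_polar_simple(A, H):
        """H-polar decomposition A = U M for the special case in which
        B = A^[*]A is diagonalisable with only positive real or non-real
        eigenvalues: M = R K R^{-1} and U = A M^{-1}."""
        B = np.linalg.solve(H, A.conj().T @ H @ A)   # B = A^[*]A = H^{-1} A* H A
        lam, R = np.linalg.eig(B)
        k = np.sqrt(lam.astype(complex))             # principal square roots ...
        k = np.where(k.real < 0, -k, k)              # ... chosen with Re > 0, cf. (5.8)
        M = R @ np.diag(k) @ np.linalg.inv(R)
        U = A @ np.linalg.inv(M)
        return U, M

One can then check numerically that U∗ HU ≈ H and M∗ H ≈ HM hold within rounding error.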


In order to be able to assess the numerical properties of the described procedure,
a corresponding implementation was tested using the programming language C. For
this purpose the canonical forms (K, ZK ) were given according to step 2 with

p3 = p2 = q2 = p1 = r = s = t = 2, i.e. n = 20,

and random values for ω, α, β. Using randomly chosen transformations S̃ and ran-
domly chosen H-isometries Ũ, test examples of the kind

M̃ = S̃−1 KS̃, H = S̃∗ ZK S̃, Ũ∗ HŨ = H and A = ŨM̃

were constructed, always based on normally distributed random numbers from the
interval [−2, 2]. Finally H-polar decompositions UM of the test matrices A were
computed, whose numerical accuracy can be estimated via the residuals or the con-
dition number, respectively,

rA = ‖A − UM‖, rM = ‖M∗ H − HM‖, rU = ‖U∗ HU − H‖, cU = ‖U‖ ‖U−1 ‖


where ‖A‖ = √tr(A∗ A) is the Frobenius norm. The results of two statistical ex-
periments with 50 repetitions are shown in the following table, in which µ is the
(empirical) mean value, σ 2 the (empirical) variance and min/max specify the respec-
tive minimum and maximum value. The machine accuracy and the parameter τ were
thereby given as

εmachine = 2.2204460492503131 · 10−16 and τ = 10−6 .

For the first variant the Gauss-Jordan method of matrix inversion [ST, Chapter
4.2] was used to compute the matrix S−1 ; for the second variant the inversion was
accomplished with the equation S−1 = ZK S∗ H. The eigenvalues were computed in
the depicted cases using a generalisation of the Jacobi method according to Eberlein
[E]. In other experiments the LR or the QR method was used. It was found that all
methods produce nearly the same numerical results.
Table 4.1
Results of two statistical experiments

Variant (k = 50) µ σ² min max 10^µ


Jacobi log rA -10.444 1.623 -12.249 -6.954 3.597e-11
with log rM -8.696 1.681 -10.565 -5.674 2.014e-09
inverse log rU -8.303 1.919 -10.225 -5.303 4.977e-09
log cU 4.498 0.868 2.986 6.452 31 477
Jacobi log rA -8.890 2.560 -11.260 -5.040 1.288e-09
without log rM -12.296 0.097 -12.805 -11.216 5.058e-13
inverse log rU -8.297 1.904 -10.140 -5.287 5.047e-09
log cU 4.498 0.868 2.986 6.452 31 477
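As a sketch of how such an experiment can be checked, the residuals of Table 4.1 can be evaluated as follows (NumPy, Frobenius norm throughout, as in the text; the function name is illustrative).

    import numpy as np

    def residuals(A, H, U, M):
        """Residuals r_A, r_M, r_U and condition number c_U of Table 4.1."""
        fro = lambda X: np.linalg.norm(X, 'fro')
        rA = fro(A - U @ M)
        rM = fro(M.conj().T @ H - H @ M)
        rU = fro(U.conj().T @ H @ U - H)
        cU = fro(U) * fro(np.linalg.inv(U))
        return rA, rM, rU, cU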

5. Factor analysis and procrustes problems. This chapter shows how pro-
crustes problems in indefinite scalar product spaces are solved by application of H-
polar decompositions. As motivation for this task, it will first of all be shown how to
construct points from given distances. This is a generalisation of the work [YH] for
complex vector spaces and indefinite scalar products.
Let F = R or F = C and let [., .] be an indefinite scalar product of the Fn with
the underlying regular symmetric or hermitian matrix H ∈ Fn×n . Then for arbitrary
vectors x, y ∈ Fn in the case F = R
$$[x, y] = \tfrac{1}{2}\big([x, x] + [y, y] - [x - y, x - y]\big) \tag{5.1a}$$
and in the case F = C
$$\operatorname{Re}[x, y] = \tfrac{1}{2}\big([x, x] + [y, y] - [x - y, x - y]\big) = \tfrac{1}{2}\big([x, x] + [y, y] - [iy - ix, iy - ix]\big),$$
$$\operatorname{Im}[x, y] = \tfrac{1}{2}\big([x, x] + [y, y] - [x - iy, x - iy]\big) = -\tfrac{1}{2}\big([x, x] + [y, y] - [y - ix, y - ix]\big), \tag{5.1b}$$
so that the scalar products of the vectors can be expressed in terms of their norm
and distance squares. Now let the vectors x1 , . . . , xN ∈ Fn be given and let X =
[x1 . . . xN ]∗ ∈ FN ×n be a matrix whose rows are the conjugate transposed vectors.
Then
W = XHX∗
is the Gramian matrix of the xk and for the dimension of the set of points which they
set up m = rank X = rank W. In particular, if N ≥ n and span{x1 , . . . , xN } = Fn ,
then the number of positive and the number of negative eigenvalues of H and W are
equal, and furthermore the eigenvalue 0 appears in σ(W) with the multiplicity N − n
(Sylvester’s law of inertia). Moreover, the elements wkl = [xl , xk ] of the matrix W
according to (5.1) can be expressed in the form
$$w_{kl} = \tfrac{1}{2}\big(\rho_k^2 + \rho_l^2 - \sigma_{kl}^2\big) \qquad\text{if } F = \mathbb{R} \quad\text{or} \tag{5.2a}$$
$$w_{kl} = \tfrac{1}{2}\big(\rho_k^2 + \rho_l^2 - \sigma_{kl}^2\big) + \tfrac{i}{2}\big(\rho_k^2 + \rho_l^2 - \tau_{kl}^2\big) \qquad\text{if } F = \mathbb{C}. \tag{5.2b}$$
where

$$\rho_k^2 = [x_k, x_k], \qquad \sigma_{kl}^2 = [x_l - x_k, x_l - x_k], \qquad \tau_{kl}^2 = [x_l - ix_k, x_l - ix_k] \tag{5.3}$$

with ρk² , σkl² , τkl² ∈ R and σkl² = σlk² , σkk² = 0 or τkl² + τlk² = 2(ρk² + ρl²) for 1 ≤ k, l ≤ N .
Conversely, let the norm and distance squares be given such that (5.3) is satisfied,
and let the elements of a matrix W be defined by (5.2). Then this matrix is symmetric
or hermitian and can therefore be written in the form

W = RΛR∗ .

Thereby Λ is a diagonal matrix of the real eigenvalues λ1 , . . . , λN of W and R =


[r1 . . . rN ] is a matrix whose columns form a basis of the FN consisting of orthonor-
malised eigenvectors. Now if p is the number of positive and n − p is the number of
negative eigenvalues and if it is assumed that

λ1 , . . . , λp > 0, λp+1 , . . . , λn < 0 and λn+1 = . . . = λN = 0,

then the matrices defined by

Λ1 = diag(λ1 , . . . , λp , λp+1 , . . . , λn ) and R1 = [r1 . . . rn ]

satisfy W = R1 Λ1 R∗1 too. Consequently if we make


$$\Sigma = \operatorname{diag}\big(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_p}, \sqrt{-\lambda_{p+1}}, \dots, \sqrt{-\lambda_n}\big) \quad\text{and}\quad H = \operatorname{diag}(\underbrace{+1, \dots, +1}_{p}, \underbrace{-1, \dots, -1}_{n-p})$$

then the matrix

X = R1 Σ ∈ FN ×n

fulfils on the one hand rank X = n and on the other hand XHX∗ = R1 ΣHΣ∗ R∗1
= R1 Λ1 R∗1 = W. Therefore the conjugate transposed rows x1 , . . . , xN ∈ Fn of
the matrix X constitute a generating system of the Fn , and for the indefinite scalar
product defined by [x, y] = (Hx, y) we have wkl = [xl , xk ]. This means that also

[xk , xk ] = wkk = ρk² ,
[xl − xk , xl − xk ] = wkk + wll − wkl − wlk = σkl² ,
[xl − ixk , xl − ixk ] = wkk + wll + iwkl − iwlk = τkl² (if F = C),

so that the constructed points correspond to the given norm and distance squares.
We thus have proved the following theorem.
Theorem 5.1 (Construction of vectors from norm and distance squares). Let
F = R or F = C and let ρk² , σkl² be real numbers such that σkl² = σlk² and σkk² = 0 for
all k, l from {1, . . . , N }. Furthermore, for the case F = C let τkl² be real numbers such
that τkl² + τlk² = 2(ρk² + ρl²) for all k, l from {1, . . . , N }. Then the following statements
are equivalent:
1. There exist vectors x1 , . . . , xN ∈ Fn constituting a generating system for the
Fn , for which [xk , xk ] = ρk² as well as [xl − xk , xl − xk ] = σkl² , and in the case
F = C also [xl − ixk , xl − ixk ] = τkl² is satisfied. Thereby [., .] is an indefi-
nite scalar product of the Fn with underlying regular symmetric or hermitian
matrix H ∈ Fn×n which has p positive eigenvalues.
2. The symmetric or hermitian matrix W ∈ FN ×N whose elements wkl are de-
fined by (5.2) has p positive and n−p negative eigenvalues, and the eigenvalue
0 appears with multiplicity N − n.
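The constructive direction of the proof is easy to mirror numerically. The following sketch builds X and H from a given hermitian matrix W; the truncation tolerance tol and the function name are assumptions for illustration, not part of the theorem.

    import numpy as np

    def construct_points(W, tol=1e-10):
        """Construct X with X H X* = W following the proof of Theorem 5.1."""
        lam, R = np.linalg.eigh(W)        # real eigenvalues, orthonormal eigenvectors
        order = np.argsort(-lam)          # positive first, then zero, then negative
        lam, R = lam[order], R[:, order]
        keep = np.abs(lam) > tol          # drop the N - n (numerically) zero eigenvalues
        lam1, R1 = lam[keep], R[:, keep]
        X = R1 * np.sqrt(np.abs(lam1))    # X = R1 Sigma
        H = np.diag(np.sign(lam1))        # p entries +1, n - p entries -1
        return X, H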
A real vector space provided with an indefinite scalar product defined by a matrix

Hq = diag(+1, . . . , +1, −1, . . . , −1)


| {z } | {z }
n−q q

is also called a pseudo-Euclidean space; a corresponding complex vector space is called


a pseudo-unitary space. In both cases q is termed the index of inertia [GR, Chapter
IX, §4]. In the case of q = 0 we have an Euclidean or unitary space for which we
immediately get the following corollary.
Corollary 5.2. The vectors which exist according to Theorem 5.1 are vectors
of an Euclidean or unitary space if and only if the matrix W is positive semidefinite.
In the case N = 2 of a real space, the vectors
$$x_1 = \overrightarrow{OP_1} \quad\text{and}\quad x_2 = \overrightarrow{OP_2}$$

describe a triangle with the corner points O, P1 , P2 . Furthermore


$$\det W = \tfrac{1}{2}\big(\rho_1^2 \rho_2^2 + \rho_1^2 \sigma_{12}^2 + \rho_2^2 \sigma_{12}^2\big) - \tfrac{1}{4}\big(\rho_1^4 + \rho_2^4 + \sigma_{12}^4\big) = \tfrac{1}{4}\big(\sigma_{12}^2 - (\rho_1 - \rho_2)^2\big)\big((\rho_1 + \rho_2)^2 - \sigma_{12}^2\big)$$
and for ρ1 , ρ2 , σ12 ≥ 0 this determinant is non-negative if and only if

|ρ1 − ρ2 | ≤ σ12 and σ12 ≤ ρ1 + ρ2

and the relations obtained by cyclic exchange of the variables are fulfilled. But this
is just the triangle inequality, so that Corollary 5.2 contains a generalisation of this
essential property of Euclidean geometry.
Remark 5.3 (Factor analysis and multidimensional scaling). If the coordinates
origin is designated x0 and writing σk0² = σ0k² instead of ρk² , then Theorem 5.1 shows
how N +1 points (objects) must be arranged in a coordinates system so that they take
up given distances. This constitutes the basis for an entire discipline in psychology
which is there called factor analysis [H] or multidimensional scaling [D]. Essentially
this is a procedure for geometric modelling of cognitive processes and analysing the
resulting constellations of objects with regard to the geometric invariants (dimension,
signature, lengths, volumes, angles). It would thereby also be possible in principle to
interpret physical invariants, because the matrix
$$T = X^{*}X, \qquad T = \big[\,T^{\alpha\beta}\,\big] \quad\text{with}\quad T^{\alpha\beta} = \sum_{k=1}^{N} x_k^{\alpha}\,\bar{x}_k^{\beta} \quad\text{for } 1 \le \alpha, \beta \le n$$

can be interpreted as (contravariant) tensor of inertia in the sense of Hermann Weyl


who discovered in this connection an interesting confusion of an antisymmetric tensor
with a vector product, which is possible only in R3 [WEY, §6]. Unfortunately this
error has so far not been corrected in the textbooks of classical physics, so that the
dynamics of rotational motion could partly be presented in a simplified form.
At any rate, for the points constructed in Theorem 5.1 T = Σ∗ R∗1 R1 Σ = Σ2 is
a diagonal matrix, so that the xk are oriented along its inertial axes and the absolute
values of the eigenvalues give the associated (contravariant) moments of inertia. It
must also be mentioned here that the scalar products of N points x1 , . . . , xN ∈ Fn
whose centroid lies at the coordinates origin, i.e.
$$\sum_{k=1}^{N} x_k = 0,$$

are determined by the equations


$$[x_k, x_l] = \frac{1}{2}\left(\frac{1}{N}\sum_{j=1}^{N} \|x_j - x_k\|_H^2 + \frac{1}{N}\sum_{i=1}^{N} \|x_l - x_i\|_H^2 - \|x_l - x_k\|_H^2 - \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \|x_j - x_i\|_H^2\right) \quad\text{if } F = \mathbb{R},$$
$$[x_k, x_l] = \frac{1}{2}\left(\frac{1}{N}\sum_{j=1}^{N} \big(\|x_j - x_k\|_H^2 + i\|x_j - ix_k\|_H^2\big) + \frac{1}{N}\sum_{i=1}^{N} \big(\|x_l - x_i\|_H^2 + i\|x_l - ix_i\|_H^2\big)\right.$$
$$\left.\qquad\qquad - \big(\|x_l - x_k\|_H^2 + i\|x_l - ix_k\|_H^2\big) - \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \big(\|x_j - x_i\|_H^2 + i\|x_j - ix_i\|_H^2\big)\right) \quad\text{if } F = \mathbb{C}$$

whereby for brevity ‖x‖H² = [x, x] has been set [T]. From the given distance squares
σkl² = ‖xl − xk ‖H² and τkl² = ‖xl − ixk ‖H² it is therefore possible to calculate the
elements wkl = [xl , xk ] of a matrix W whose row and column sums vanish. Using the
construction of Theorem 5.1 this produces points whose centroid lies at the origin so
that the resulting set of points is oriented along its central main inertial axes. However,
in the complex case this is possible only when the τkl² satisfy a rather complicated
condition arising from the centroid situation. ♦
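In the real case the centroid formulas above amount to the familiar double-centering step of multidimensional scaling; a minimal sketch, assuming a matrix D2 of squared H-distances, reads:

    import numpy as np

    def gram_from_distances(D2):
        """Real case: scalar products [x_k, x_l] from squared H-distances
        D2[k, l] = ||x_l - x_k||_H^2, with the centroid placed at the origin
        (double centering, cf. the formulas above and [T])."""
        row = D2.mean(axis=1, keepdims=True)   # (1/N) sum_j ||x_j - x_k||^2
        col = D2.mean(axis=0, keepdims=True)   # (1/N) sum_i ||x_l - x_i||^2
        return 0.5 * (row + col - D2 - D2.mean())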
After this brief excursion, the main result of this chapter will now be derived. For
this purpose let x1 , . . . , xN still be the vectors, constructed according to Theorem 5.1,
of a Fn provided with an indefinite scalar product [., .] = (H., .). For every H-isometry
U ∈ Fn×n it then follows that

[Uxl , Uxk ] = [xl , xk ] = wkl for U∗ HU = H,

which can also be expressed in matrix equation form

XVHV∗ X∗ = XHX∗ = W for VHV∗ = H

by making V = U∗ . Thus the conjugate transposed rows x0k = Uxk contained in the
matrix X0 = XV take on the specified distances, too.
Now let two N −tuples of vectors (x1 , . . . , xN ) and (y1 , . . . , yN ) of the Fn be given,
which one can consider as having arisen, for example, by measuring the distances of
a dynamical system of objects at two different times. Then, on comparing the two
constellations, the question arises, what part of the observed differences is due to the
different position in space, and what part is due to actual differences in the inner
structure of the constellations. Expressed mathematically, the task is to determine
an H-isometry U ∈ Fn×n which solves the optimising problem
$$f(U) = \sum_{k=1}^{N} [Ux_k - y_k,\, Ux_k - y_k] \to \min, \qquad h(U) = U^{*}HU - H = 0. \tag{5.4a}$$

The sum of distance squares arising therein can be expressed in the form of a trace,
so that an alternative expression with

f (V) = tr[(XV − Y)∗ (XV − Y)H] → min, h(V) = VHV∗ − H = 0 (5.4b)

is given, whereby as above X = [x1 . . . xN ]∗ , Y = [y1 . . . yN ]∗ and V = U∗ were set.


Within the scope of Euclidean vector spaces a solution of this problem was found in
the paper [S] where it was called the orthogonal procrustes problem. In the present
context of indefinite scalar products it is furthermore called the H-orthogonal or H-
unitary procrustes problem. By introducing a matrix of the (unknown) Lagrange
multipliers L ∈ Fn×n , the side condition can then be stated in the form

h(V) = tr[L(VHV∗ − H)] (5.5)

and the necessary first order condition for solving the problem is

$$\frac{\partial}{\partial V}(f + h) = 0.$$
Differentiation of the trace [DP] gives

$$\frac{\partial f}{\partial V} = 2X^{*}XVH - 2X^{*}YH \qquad\text{and}\qquad \frac{\partial h}{\partial V} = (L + L^{*})VH,$$

so that V must satisfy the equation

$$X^{*}XVH + \Lambda VH = X^{*}YH \quad\text{for}\quad \Lambda = \frac{L + L^{*}}{2} = \Lambda^{*}$$
and U must satisfy the equation

HUX∗ X + HUΛ = HY∗ X or UX∗ XH + UΛH = Y∗ XH. (5.6)

Now defining

M = (X∗ X + Λ)H and A = Y∗ XH,

it follows that M∗ H = H(X∗ X + Λ)H = HM and the necessary condition takes the
form

A = UM with A = Y∗ XH and U∗ HU = H, M∗ H = HM. (5.7)

Thus the solution of the problem can be determined by an H-polar decomposition of


the matrix A. Strictly speaking, the trace should not be differentiated in the complex
case, because the complex derivatives do not exist. But there the necessity for the
last equation can be derived too by determining the real derivatives.
If A = A1 + iA2 ∈ Cm×n is a complex matrix, then its real and imaginary parts
are designated below with A1 and A2 respectively. The same applies to complex
scalars λ = λ1 + iλ2 ∈ C. Furthermore, let the real representations of a matrix of the
first and second kind be defined by
$$A^{\wedge} = \begin{bmatrix} A_1 & -\varepsilon A_2 \\ \varepsilon A_2 & A_1 \end{bmatrix} \in \mathbb{R}^{2m \times 2n} \qquad\text{and}\qquad A^{\vee} = \begin{bmatrix} A_2 & \varepsilon A_1 \\ -\varepsilon A_1 & A_2 \end{bmatrix} \in \mathbb{R}^{2m \times 2n}$$

whereby ε ∈ {+1, −1} is called the characteristic of the depiction. For the operators
∧ and ∨ introduced this way, the following calculation rules apply.
Lemma 5.4 (Real depiction of complex matrix equations). Let X, Y ∈ Cm×n ,
Z ∈ Cn×k , A, B ∈ Cn×n and λ ∈ C. Then
1. X∧ = (iX)∨ , X∨ = (−iX)∧ ,
2. (λX)∧ = λ1 X∧ − λ2 X∨ , (λX)∨ = λ1 X∨ + λ2 X∧ ,
3. (X + Y)∧ = X∧ + Y∧ , (X + Y)∨ = X∨ + Y∨ ,
4. (XZ)∧ = X∧ Z∧ = −X∨ Z∨ , (XZ)∨ = X∧ Z∨ = X∨ Z∧ ,
5. (X∗ )∧ = (X̄ᵀ)∧ = (X∧ )ᵀ , (X∗ )∨ = (X̄ᵀ)∨ = −(X∨ )ᵀ ,
6. A∧ = (A∧ )T , A∨ = −(A∨ )T , if A∗ = A,
7. B∧ = −(B∧ )T , B∨ = (B∨ )T , if B∗ = −B,
8. (A−1 )∧ = (A∧ )−1 , (A−1 )∨ = −(A∨ )−1 , if det(A) ≠ 0,
9. 2 tr(A) = tr(A∧ ) + i tr(A∨ ),
10. | det(A)|2 = det(A∧ ) = det(A∨ ).
Proof. The proofs are obtained by simple verification demonstrated by some
examples. Proof of 2 and 3 :

(λX)∧ = [(λ1 X1 − λ2 X2 ) + i(λ1 X2 + λ2 X1 )]∧


· ¸
λ1 X 1 − λ2 X 2 −²(λ1 X2 + λ2 X1 )
=
²(λ1 X2 + λ2 X1 ) λ1 X 1 − λ2 X 2
· ¸ · ¸
X1 −²X2 X2 ²X1
= λ1 − λ2
²X2 X1 −²X1 X2
= λ1 X∧ − λ2 X∨ analogously (λX)∨ = λ1 X∨ + λ2 X∧ ,
(X + Y) = [(X1 + Y2 ) + i(X2 + Y2 )]∧

· ¸
X1 + Y1 −²(X2 + Y2 )
=
²(X2 + Y2 ) X1 + Y1
· ¸ · ¸
X1 −²X2 Y1 −²Y2
= +
²X2 X1 ²Y2 Y1
= X∧ + Y∧ analogously (X + Y)∨ = X∨ + Y∨ .

Proof of 8 : Let B = A−1 . Then I+i0 = AA−1 = AB = (A1 +iA2 )(B1 +iB2 ) =
(A1 B1 − A2 B2 ) + i(A1 B2 + A2 B1 ) and therefore
$$A^{\wedge}(A^{-1})^{\wedge} = A^{\wedge}B^{\wedge} = \begin{bmatrix} A_1 & -\varepsilon A_2 \\ \varepsilon A_2 & A_1 \end{bmatrix}\begin{bmatrix} B_1 & -\varepsilon B_2 \\ \varepsilon B_2 & B_1 \end{bmatrix} = \begin{bmatrix} A_1 B_1 - \varepsilon^2 A_2 B_2 & -\varepsilon(A_1 B_2 + A_2 B_1) \\ \varepsilon(A_1 B_2 + A_2 B_1) & A_1 B_1 - \varepsilon^2 A_2 B_2 \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix},$$
$$A^{\vee}(A^{-1})^{\vee} = A^{\vee}B^{\vee} = \begin{bmatrix} A_2 & \varepsilon A_1 \\ -\varepsilon A_1 & A_2 \end{bmatrix}\begin{bmatrix} B_2 & \varepsilon B_1 \\ -\varepsilon B_1 & B_2 \end{bmatrix} = \begin{bmatrix} A_2 B_2 - \varepsilon^2 A_1 B_1 & \varepsilon(A_1 B_2 + A_2 B_1) \\ -\varepsilon(A_1 B_2 + A_2 B_1) & A_2 B_2 - \varepsilon^2 A_1 B_1 \end{bmatrix} = \begin{bmatrix} -I & 0 \\ 0 & -I \end{bmatrix}.$$
Proof of 10 : On the one hand

$$|\det(A)|^2 = \det(A)\,\overline{\det(A)} = \det(A)\det(\bar{A}) = \det(A\bar{A}) = \det[(A_1 + iA_2)(A_1 - iA_2)] = \det[A_1^2 + A_2^2]$$

and on the other hand

$$\det(A^{\wedge}) = \det\begin{bmatrix} A_1 & -\varepsilon A_2 \\ \varepsilon A_2 & A_1 \end{bmatrix} = \det\begin{bmatrix} A_1 - i\varepsilon A_2 & -\varepsilon A_2 \\ \varepsilon A_2 + iA_1 & A_1 \end{bmatrix} = \det\begin{bmatrix} A_1 - i\varepsilon A_2 & -\varepsilon A_2 \\ 0 & A_1 + i\varepsilon A_2 \end{bmatrix}$$
$$= \det[(A_1 - i\varepsilon A_2)(A_1 + i\varepsilon A_2)] = \det[A_1^2 + A_2^2],$$
$$\det(A^{\vee}) = \det\begin{bmatrix} A_2 & \varepsilon A_1 \\ -\varepsilon A_1 & A_2 \end{bmatrix} = \det\begin{bmatrix} A_2 + i\varepsilon A_1 & \varepsilon A_1 \\ -\varepsilon A_1 + iA_2 & A_2 \end{bmatrix} = \det\begin{bmatrix} A_2 + i\varepsilon A_1 & \varepsilon A_1 \\ 0 & A_2 - i\varepsilon A_1 \end{bmatrix}$$
$$= \det[(A_2 + i\varepsilon A_1)(A_2 - i\varepsilon A_1)] = \det[A_1^2 + A_2^2].$$
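The operators ∧ and ∨ are easily realised numerically, which also allows the calculation rules of Lemma 5.4 to be spot-checked; the following sketch (with characteristic ε = +1) is purely illustrative.

    import numpy as np

    def wedge(A, eps=1):
        """Real representation of the first kind, A^wedge."""
        A1, A2 = A.real, A.imag
        return np.block([[A1, -eps * A2], [eps * A2, A1]])

    def vee(A, eps=1):
        """Real representation of the second kind, A^vee."""
        A1, A2 = A.real, A.imag
        return np.block([[A2, eps * A1], [-eps * A1, A2]])

    # e.g. rule 4: (XZ)^wedge = X^wedge Z^wedge = -X^vee Z^vee
    X = np.random.rand(3, 3) + 1j * np.random.rand(3, 3)
    Z = np.random.rand(3, 3) + 1j * np.random.rand(3, 3)
    assert np.allclose(wedge(X @ Z), wedge(X) @ wedge(Z))
    assert np.allclose(wedge(X @ Z), -vee(X) @ vee(Z))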

If now the abbreviation Z = XV−Y is used, the equations (5.4b) and (5.5) in the
case F = C can be brought with the Lemma 5.4 into the equivalent real representation

2f (V) = 2 tr(Z∗ ZH) = 2 tr(ZHZ∗ )


= tr[(ZHZ∗ )∧ ] + i tr[(ZHZ∗ )∨ ]
= tr[(ZH)∧ (Z∗ )∧ ] + i tr[(ZH)∨ (Z∗ )∧ ]
= tr[Z∧ H∧ (Z∧ )T ] + i tr[Z∧ H∨ (Z∧ )T ]
= tr[(X∧ V∧ − Y∧ )H∧ (X∧ V∧ − Y∧ )T ]
= tr[(X∧ V∧ − Y∧ )T (X∧ V∧ − Y∧ )H∧ ]
= f˜(V∧ )

and

2h(V) = tr[L∧ (VHV∗ − H)∧ ] + i tr[L∨ (VHV∗ − H)∧ ]


= tr[L∧ (V∧ H∧ (V∧ )T − H∧ )] + i tr[L∨ (V∧ H∨ (V∧ )T − H∨ )]
= h̃1 (V∧ ) + ih̃2 (V∧ ) = h̃(V∧ ).

Thereby the antisymmetry of H∨ was taken into account in the first transformation,
from which the vanishing of the imaginary part follows. The necessary first order
condition for an optimum is

$$\frac{\partial \tilde{f}}{\partial V^{\wedge}} + \frac{\partial \tilde{h}_1}{\partial V^{\wedge}} = 0 \qquad\text{and}\qquad \frac{\partial \tilde{h}_2}{\partial V^{\wedge}} = 0$$
and the differentiation of the trace gives

$$\frac{\partial \tilde{f}}{\partial V^{\wedge}} = 2(X^{\wedge})^{T} X^{\wedge} V^{\wedge} H^{\wedge} - 2(X^{\wedge})^{T} Y^{\wedge} H^{\wedge} = 2(X^{*}XVH - X^{*}YH)^{\wedge},$$
$$\frac{\partial \tilde{h}_1}{\partial V^{\wedge}} = [L^{\wedge} + (L^{\wedge})^{T}]\, V^{\wedge} H^{\wedge} = [(L + L^{*})VH]^{\wedge},$$
$$\frac{\partial \tilde{h}_2}{\partial V^{\wedge}} = [L^{\vee} - (L^{\vee})^{T}]\, V^{\wedge} H^{\wedge} = [(L - L^{*})VH]^{\vee},$$
so that the equations now written again in complex form
$$X^{*}XVH - X^{*}YH + \frac{L + L^{*}}{2}\,VH = 0 \qquad\text{and}\qquad \frac{L - L^{*}}{2}\,VH = 0$$
must be satisfied. The second equation demands that the antihermitian part of the
Lagrange multipliers vanishes, and the first can be expressed with the abbreviation
Λ for the hermitian part and with V∗ = U in the form

HUX∗ X + HUΛ = HY∗ X.

This is just (5.6), so that in the complex case too the necessary condition (5.7) must
be fulfilled.
Lastly, to also get a sufficient criterion for the minimum of the function f , let
A = UM be an H-polar decomposition of the matrix A = Y∗ XH and let
$$(R^{-1}A^{[*]}AR,\ R^{*}HR) = (J, Z_J) \quad\text{with}\quad J = \bigoplus_{i=1}^{k} J_{p_i}(\lambda_i),$$
$$(S^{-1}MS,\ S^{*}HS) = (K, Z_K) \quad\text{with}\quad K = \bigoplus_{j=1}^{m} J_{p_j}(\kappa_j)$$

be the canonical forms of the pairs (A[∗] A, H) and (M, H), respectively. Returning
therewith to the initial equation (5.4b), we find that

f (V) = tr[(XV − Y)∗ (XV − Y)H]


= tr(V∗ X∗ XVH − V∗ X∗ YH − Y∗ XVH + Y∗ YH)
= tr(X∗ XH) + tr(Y∗ YH) − 2 Re[tr[(Y∗ XH)(H−1 U∗ H)]]
= τ − 2 Re[tr(AU−1 )] = τ − 2 Re[tr(UMU−1 )] = τ − 2 Re[tr(SKS−1 )]
= τ − 2 Re[tr(K)] for τ = tr(X∗ XH) + tr(Y∗ YH).

Thus the value of f (V) is minimised when in the calculation of the H-polar decompo-
sition of A the square roots κρ of the positive real eigenvalues λρ of A[∗] A are chosen
positive (Theorem 3.4, case b)
$$\kappa_\rho = +\sqrt{\lambda_\rho} \qquad\text{for } \lambda_\rho > 0 \tag{5.8a}$$

and when the square roots κσ , κ̄σ of the non-real eigenvalues λσ , λ̄σ of A[∗] A are
chosen such that their real parts are positive (Theorem 3.4, case a)
$$\kappa_\sigma = +\sqrt{\alpha_\sigma}\,\Big(\cos\tfrac{\varphi_\sigma}{2} + i\sin\tfrac{\varphi_\sigma}{2}\Big) \quad\text{with}\quad \alpha_\sigma = |\lambda_\sigma|,\ \varphi_\sigma = \arg(\lambda_\sigma) \qquad\text{for } \lambda_\sigma \in \mathbb{C}\setminus\mathbb{R}. \tag{5.8b}$$

The roots of the negative real eigenvalues and of the eigenvalue 0, which are fixed by
specification anyway, make no contribution to the optimised value. Thus altogether
the following theorem applies.
Theorem 5.5 (Existence of solutions of the H-isometric procrustes problem).
A solution of the H-orthogonal or H-unitary procrustes problem (5.4) exists if the
matrix A = Y∗ XH admits an H-polar decomposition. In this case the H-isometry U
or V = U∗ contained in the decomposition A = UM minimises the function f when
the eigenvalues of M are chosen according to (5.8).
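Reduced to code, the solution of (5.4) is therefore a thin layer on top of any H-polar decomposition routine; in the sketch below, h_polar stands for such a routine (for instance one built along the lines of Procedure 4.5) and is an assumed ingredient, not a library function.

    import numpy as np

    def h_procrustes(X, Y, H, h_polar):
        """Solve the H-unitary procrustes problem (5.4): form A = Y* X H,
        take an H-polar decomposition A = U M with the eigenvalue choice
        (5.8), and return the optimising H-isometry U."""
        A = Y.conj().T @ X @ H
        U, M = h_polar(A, H)   # assumed H-polar decomposition routine
        return U

With the square roots chosen according to (5.8) the minimised value is then f = τ − 2 Re[tr(K)], as derived above.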
Therefore, in the indefinite case a situation can occur in which no solution for
(5.4) can be found. Moreover, there is also another problem which will be pointed
out as follows:
In a pseudo-Euclidean or pseudo-unitary space with index of inertia q the scalar
product of two vectors x = (x1 , . . . , xn )T and y = (y 1 , . . . , y n )T whose coordinates
are referred to the canonical basis {e1 , . . . , en } are defined by
$$(x, y)_q = \sum_{\alpha=1}^{n-q} x^{\alpha} \bar{y}^{\alpha} - \sum_{\alpha=n-q+1}^{n} x^{\alpha} \bar{y}^{\alpha} \qquad (\bar{y}^{\alpha} = y^{\alpha} \text{ if } F = \mathbb{R}).$$

Therefore if {h1 , . . . , hn } is an arbitrary non-orthogonal basis and if x′ = (ξ 1 , . . . , ξ n )T
and y′ = (η 1 , . . . , η n )T are the coordinates of the vectors with respect to this basis,
so that
$$x = \sum_{\beta} x^{\beta} e_{\beta} = \sum_{\beta} \xi^{\beta} h_{\beta} \qquad\text{and}\qquad y = \sum_{\alpha} y^{\alpha} e_{\alpha} = \sum_{\alpha} \eta^{\alpha} h_{\alpha},$$

then the scalar product satisfies


$$(x, y)_q = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n} \xi^{\beta} \bar{\eta}^{\alpha}\, (h_{\beta}, h_{\alpha})_q = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n} h_{\alpha\beta}\, \xi^{\beta} \bar{\eta}^{\alpha} = (Hx', y') = [x', y'],$$

whereby the components of the matrix H, which is called the metric tensor in tensor
algebra, are given by hαβ = (hβ , hα )q for 1 ≤ α, β ≤ n. From this viewpoint the
H-orthogonal or H-unitary procrustes problem is a generalisation of the orthogonal
procrustes problem for arbitrary non-orthogonal coordinate systems and arbitrary
index of the scalar product of a real or complex vector space. Whereas the solution
of the problem according to Theorem 5.5 is satisfactory in the case of a positive (or
negative) definite matrix H, in the case of an indefinite matrix H it must be taken into
consideration that the minimum of the function f can have a considerable negative
contribution, so that the desired goal - convergence of two tuples of points in the sense
of an optimum congruence - cannot be achieved this way in most cases. However, if
G and H are regular symmetric or hermitian matrices from Fn×n and if the geometry
within the tuples (x1 , . . . , xN ) and (y1 , . . . , yN ) is measured with the scalar product
[., .]G = (G., .), but the geometry between the tuples is measured with the scalar
product [., .]H = (H., .), then the problem can be expressed, instead of (5.4), as
$$f(U) = \sum_{k=1}^{N} [Ux_k - y_k,\, Ux_k - y_k]_H \to \min \quad\text{with} \tag{5.9a}$$
$$g(U) = U^{*}GU - G = 0 \quad\text{and}\quad h(U) = U^{*}HU - H = 0$$

or in matrix notation
f (V) = tr[(XV − Y)∗ (XV − Y)H] → min with (5.9b)
g(V) = VGV∗ − G = 0 and h(V) = VHV∗ − H = 0,

which is still called the (G,H)-orthogonal or (G,H)-unitary procrustes problem. Con-


sidering the vectors xk and yk as produced by a construction according to Theorem
5.1, the internal metric G is given, but the external metric H can then be chosen
positive definite, whereby a sum of non-negative distance squares is optimised and
thus a solution of the problem - which then always exists - also corresponds to the
visualisable optimum reconciliation.
If again K, L ∈ Fn×n are matrices of the (unknown) Lagrange multipliers and if
the side conditions are brought into the form

g(V) = tr[K(VGV∗ − G)] and h(V) = tr[L(VHV∗ − H)],

then the necessary condition


$$\frac{\partial}{\partial V}(f + g + h) = 0$$
leads in the same way as above to the equation

$$GUA + HUB = \tilde{C} \quad\text{with}\quad \tilde{C} = HY^{*}X \quad\text{and}$$
$$A = \frac{K + K^{*}}{2} = A^{*}, \qquad B = X^{*}X + \frac{L + L^{*}}{2} = B^{*}.$$
Furthermore GUG−1 = U−∗ = HUH−1 , so that the transformations

C̃ = GUG−1 GA + HUB = HUH−1 GA + HUB = HU(H−1 GA + B),


C̃ = GUA + HUH−1 HB = GUA + GUG−1 HB = GU(A + G−1 HB)

can be made, giving

UM = H−1 C̃H + G−1 C̃G = C with
M = H−1 GAH + BH + AG + G−1 HBG. (5.10)

If the matrices G and H are now subjected to the additional condition

H−1 G = µ2 G−1 H for a µ ∈ R\{0}, (5.11)

then on the one hand

M∗ H − HM = GBHG−1 H − HG−1 HBG = µ−2 (GBG − GBG) = 0,


M∗ G − GM = HAGH−1 G − GH−1 GAH = µ2 (HAH − HAH) = 0

and on the other hand (5.10) implies

HCH−1 = C̃ + HG−1 C̃GH−1 = (µ2 /µ2 )GH−1 C̃HG−1 + C̃ = GCG−1


or H−1 C∗ H = G−1 C∗ G.

The necessary condition for solving (5.9), with the additional prerequisite (5.11), thus
finally takes on the form

UM = C with C = Y∗ XH + G−1 HY∗ XG (5.12)
and C^H = C^G , U^H = U^G = U^{−1} , M^H = M^G = M.

Calling this representation of the matrix C a (G,H)-polar decomposition and calling


the factor U a (G,H)-isometric (-orthogonal or -unitary) matrix and the factor M
a (G,H)-selfadjoint (-symmetric or -hermitian) matrix, this result can be expressed
by the following theorem, in which the argumentation regarding the minimum of the
function f is taken over unchanged from above.
Theorem 5.6 (Existence of solutions of the (G,H)-isometric procrustes problem).
A solution of the (G,H)-orthogonal or (G,H)-unitary procrustes problem (5.9) with the
additional condition (5.11) exists if the matrix C = Y∗ XH + G−1 HY∗ XG admits a
(G,H)-polar decomposition. In this case the (G,H)-isometry U or V = U∗ contained
in the decomposition C = UM minimises the function f when the eigenvalues of M
are chosen according to (5.8).
Before some final remarks can be made concerning (G,H)-polar decompositions,
the matrices G and H which satisfy (5.11) must be characterised.
Lemma 5.7. Let F = R or F = C and let G, H be regular symmetric or hermitian
matrices from Fn×n . Then H−1 G = µ2 G−1 H for a µ ∈ R\{0} if and only if there
exists a regular matrix S ∈ Fn×n such that

S∗ HS = Ip ⊕ −Iq ⊕ Ir ⊕ −Is and S∗ GS = µ(Ip ⊕ −Iq ⊕ −Ir ⊕ Is )

for suitable constants p, q, r, s ∈ N with p + q + r + s = n.


Proof. [⇒]: Let A ∈ Fn×n be a regular matrix such that A = µ2 A−1 for a
µ ∈ R\{0}. Then A2 = µ2 I so that the Jordan normal form of A must take on the
form

P−1 AP = J = diag(±µ).

In particular, if A = H−1 G, then it follows that

(P∗ HP)−1 (P∗ GP) = P−1 H−1 GP = J


= J∗ = P∗ GH−1 P−∗ = (P∗ GP)(P∗ HP)−1 .

Thus the selfadjoint matrices P∗ HP and P∗ GP commute and can therefore be di-
agonalised simultaneously, so that an orthogonal or unitary matrix Q consisting of
eigenvectors of P∗ HP (or P∗ GP) can now be chosen for which

P∗ HP = QΛH Q∗ and P∗ GP = QΛG Q∗

where ΛH , ΛG are diagonal matrices containing the real eigenvalues. This means that

$$\mu^2 I = (Q^{*}JQ)^2 = \big(Q^{*}(Q\Lambda_H Q^{*})^{-1}(Q\Lambda_G Q^{*})Q\big)^2 = (\Lambda_H^{-1}\Lambda_G)^2$$

and consequently Λ_H^{−1} Λ_G too can be written in the form⁸

$$\Lambda_H^{-1}\Lambda_G = \mu\Sigma \quad\text{with}\quad \Sigma = \operatorname{diag}(\pm 1).$$

Thus, setting ΛH = |ΛH |ΣH and ΛG = |ΛG |ΣG , whereby ΣH and ΣG contain the
signs of the eigenvalues, we obtain ΣH ΣG = Σ as well as |ΛH |−1 |ΛG | = µI and for
S = PQ√|ΛH |⁻¹ we finally have

$$S^{*}HS = \sqrt{|\Lambda_H|}^{\,-*}\, Q^{*}P^{*}HPQ\, \sqrt{|\Lambda_H|}^{\,-1} = \sqrt{|\Lambda_H|}^{\,-1}\, \Lambda_H\, \sqrt{|\Lambda_H|}^{\,-1} = \Sigma_H,$$
$$S^{*}GS = \sqrt{|\Lambda_H|}^{\,-*}\, Q^{*}P^{*}GPQ\, \sqrt{|\Lambda_H|}^{\,-1} = \sqrt{|\Lambda_H|}^{\,-1}\, \Lambda_G\, \sqrt{|\Lambda_H|}^{\,-1} = \mu\Sigma_G.$$

8 The matrices µΣ and J have the same diagonal elements, but their arrangement may be different.
The asserted form can always be obtained by suitable permutation. (The magnitudes
|Λ| and the square roots √|Λ| must always be taken element by element.)
[⇐]: It is true that H−1 G = µS(Ip+q ⊕ −Ir+s )S−1 and G−1 H = µ−1 S(Ip+q ⊕
−Ir+s )S−1 , from which the assertion follows directly.
Evidently a (G,H)-polar decomposition A = UM with U^H = U^G = U^{−1} , M^H =
M^G = M can exist only if H−1 A∗ H = H−1 M∗ HH−1 U∗ H = G−1 M∗ GG−1 U∗ G =
G−1 A∗ G or A^H = A^G , which has already been shown for the case (5.12). Matrices
with this property can be characterised as follows.
Lemma 5.8. Let F = R or F = C and let G, H be regular symmetric or hermitian
matrices from Fn×n such that H−1 G = µ2 G−1 H for a µ ∈ R\{0}. Furthermore let
A ∈ Fn×n with GAG−1 = HAH−1 (or G−1 A∗ G = H−1 A∗ H). Then there exists a
regular matrix S ∈ Fn×n such that

S−1 AS = A1 ⊕ A2 , S∗ HS = J1 ⊕ J2 , S∗ GS = µJ1 ⊕ −µJ2

with A1 ∈ F(p+q)×(p+q) , A2 ∈ F(r+s)×(r+s) and J1 = Ip ⊕ −Iq , J2 = Ir ⊕ −Is .


Proof. For the regular matrix S ∈ Fn×n existing according to Lemma 5.7, the
matrices S∗ HS and S∗ GS take on the asserted form and H−1 G = SFS−1 where
F = µIp+q ⊕ −µIr+s . According to the prerequisite we also have F(S−1 AS) =
S−1 (H−1 GA)S = S−1 (AH−1 G)S = (S−1 AS)F, which is possible only when S−1 AS
too has the asserted form.
Now if A is a matrix such that A^H = A^G and if it admits an H-polar de-
composition A = UM, U^H = U^{−1} , M^H = M, then although G−1 M∗ U∗ G =
H−1 M∗ HH−1 U∗ H = MU−1 or M∗ U∗ GU = GM it cannot be concluded that
it also admits a G- or a (G,H)-polar decomposition. The needed criterion for the
existence of a (G,H)-polar decomposition is provided in the following statement.
Lemma 5.9. Let F = R or F = C and let G, H be regular symmetric or hermitian
matrices from Fn×n such that H−1 G = µ2 G−1 H for a µ ∈ R\{0}. Furthermore, let
A ∈ Fn×n with GAG−1 = HAH−1 (or G−1 A∗ G = H−1 A∗ H) and let S ∈ Fn×n
be a regular matrix so that S−1 AS, S∗ HS and S∗ GS take on the form from Lemma
5.8. Then A admits a (G,H)-polar decomposition if and only if A1 admits a J1 −polar
decomposition and A2 admits a J2 −polar decomposition.
Proof. Let A = UM be a (G,H)-polar decomposition. Then U^H = U^G and
M^H = M^G imply

S−1 US = U1 ⊕ U2 and S−1 MS = M1 ⊕ M2 ,

whereby the blocks A1 , J1 , U1 , M1 and A2 , J2 , U2 , M2 have the same size. Conse-


quently

A1 ⊕ A2 = (S−1 US)(S−1 MS) = U1 M1 ⊕ U2 M2 ,


U∗1 J1 U1 ⊕ U∗2 J2 U2 = (S−1 US)∗ (S∗ HS)(S−1 US) = S∗ HS = J1 ⊕ J2 ,
M∗1 J1 ⊕ M∗2 J2 = (S−1 MS)∗ (S∗ HS) = (S∗ HS)(S−1 MS) = J1 M1 ⊕ J2 M2 .

If conversely A1 = U1 M1 and A2 = U2 M2 are given J1 − and J2 −polar decompo-


sitions, then these are also (µJ1 )− and (−µJ2 )−polar decompositions and therefore
A = UM with U = S(U1 ⊕ U2 )S−1 and M = S(M1 ⊕ M2 )S−1 is a (G,H)-polar
decomposition.
Now if F = C and the only demand imposed on the regular hermitian matrices
G and H is that ρH − G ∈ Cn×n is a non-defective matrix pencil, then Example 4.2
provides a generalisation of Lemma 5.7. Then if A ∈ Fn×n is a matrix with A^H = A^G ,
it can easily be shown instead of Lemma 5.8 that a regular matrix S ∈ Fn×n exists
for which

S−1 AS = A1 ⊕ . . . ⊕ Ak , S∗ HS = H1 ⊕ . . . ⊕ Hk , S∗ GS = G1 ⊕ . . . ⊕ Gk

with

Aj ∈ Cp×p , Hj = Ip−q ⊕ −Iq , Gj = µ(Ip−q ⊕ −Iq )

for µ ∈ R\{0} or
$$A_j = \begin{bmatrix} A_{j,1} & 0 \\ 0 & A_{j,2} \end{bmatrix} \in \mathbb{C}^{2p \times 2p}, \qquad H_j = \begin{bmatrix} 0 & I_p \\ I_p & 0 \end{bmatrix}, \qquad G_j = \begin{bmatrix} 0 & \bar{\mu} I_p \\ \mu I_p & 0 \end{bmatrix}$$

for µ ∈ C\R. The corresponding generalisation of Lemma 5.9 states that the matrix A
admits a (G,H)-polar decomposition if and only if each block Aj admits a Hj −polar
decomposition. In conclusion, the statements of the lemmas will now be explained
with the help of two examples.
Example 5.10. 1. Let H = Ip ⊕ Ir and G = Ip ⊕ −Ir . Then a matrix
C ∈ F(p+r)×(p+r) with C^H = C^G takes on the form C = C1 ⊕ C2 where C1 ∈ Fp×p
and C2 ∈ Fr×r . If now

C1 = P1 Σ1 Q∗1 and C2 = P2 Σ2 Q∗2

are singular value decompositions of the blocks of C and if we make

U = P1 Q∗1 ⊕ P2 Q∗2 and M = Q1 Σ1 Q∗1 ⊕ Q2 Σ2 Q∗2 ,

then C = UM is the particular (G,H)-polar decomposition which solves a correspond-


ing procrustes problem. This example represents the most important application.
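A sketch of this most important special case, with the two ordinary singular value decompositions spelled out in NumPy (block sizes p and r as above; the function name is illustrative):

    import numpy as np

    def gh_polar_definite(C, p, r):
        """(G,H)-polar decomposition for H = I_{p+r}, G = I_p (+) -I_r
        (Example 5.10, part 1): C must be block diagonal, C = C1 (+) C2,
        and each block is factored by an ordinary SVD."""
        C1, C2 = C[:p, :p], C[p:, p:]
        U1, s1, V1h = np.linalg.svd(C1)     # C1 = P1 Sigma1 Q1*
        U2, s2, V2h = np.linalg.svd(C2)     # C2 = P2 Sigma2 Q2*
        Zpr, Zrp = np.zeros((p, r)), np.zeros((r, p))
        U = np.block([[U1 @ V1h, Zpr], [Zrp, U2 @ V2h]])
        M = np.block([[V1h.conj().T @ np.diag(s1) @ V1h, Zpr],
                      [Zrp, V2h.conj().T @ np.diag(s2) @ V2h]])
        return U, M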
2. Let α, β, µ be real numbers with α ≠ β, µ ≠ 0 and let

H = diag(1, −1, 1, −1) and G = µ diag(1, −1, −1, 1).

The matrix
$$A_1 = \begin{bmatrix} 0 & \beta \\ \alpha & 0 \end{bmatrix} \oplus \begin{bmatrix} 0 & \alpha \\ \beta & 0 \end{bmatrix}, \qquad B_1 = A_1^{H} A_1 = A_1^{G} A_1 = \operatorname{diag}(-\alpha^2, -\beta^2, -\beta^2, -\alpha^2),$$

admits the H-polar decomposition


   
$$A_1 = U_1 M_1 \quad\text{with}\quad U_1 = \begin{bmatrix} 0 & 0 & -i & 0 \\ 0 & 0 & 0 & -i \\ -i & 0 & 0 & 0 \\ 0 & -i & 0 & 0 \end{bmatrix}, \qquad M_1 = \begin{bmatrix} 0 & 0 & 0 & i\alpha \\ 0 & 0 & i\beta & 0 \\ 0 & i\beta & 0 & 0 \\ i\alpha & 0 & 0 & 0 \end{bmatrix},$$

but U∗1 GU1 = −G and M∗1 G = −GM1 . A G-polar decomposition cannot exist,
because the pair (B1 , G) which is present in the canonical form does not fulfil the
condition 1 of Theorem 3.4 (or Theorem 4.4). The matrix
$$A_2 = \begin{bmatrix} 0 & \beta \\ \alpha & 0 \end{bmatrix} \oplus \begin{bmatrix} 0 & \beta \\ \alpha & 0 \end{bmatrix}, \qquad B_2 = A_2^{H} A_2 = A_2^{G} A_2 = \operatorname{diag}(-\alpha^2, -\beta^2, -\alpha^2, -\beta^2)$$
admits the G-polar decomposition
   
$$A_2 = U_2 M_2 \quad\text{with}\quad U_2 = \begin{bmatrix} 0 & 0 & 0 & -i \\ 0 & 0 & -i & 0 \\ 0 & -i & 0 & 0 \\ -i & 0 & 0 & 0 \end{bmatrix}, \qquad M_2 = \begin{bmatrix} 0 & 0 & i\alpha & 0 \\ 0 & 0 & 0 & i\beta \\ i\alpha & 0 & 0 & 0 \\ 0 & i\beta & 0 & 0 \end{bmatrix},$$

but U∗2 HU2 = −H and M∗2 H = −HM2 . An H-polar decomposition cannot exist,
because the pair (B2 , H) which is present in the canonical form does not fulfil the
condition 1 of Theorem 3.4 (or Theorem 4.4). But if we now make α = β, i.e.
$$A = \begin{bmatrix} 0 & \alpha \\ \alpha & 0 \end{bmatrix} \oplus \begin{bmatrix} 0 & \alpha \\ \alpha & 0 \end{bmatrix}, \qquad B = A^{H} A = A^{G} A = \operatorname{diag}(-\alpha^2, -\alpha^2, -\alpha^2, -\alpha^2),$$

then
$$A = UM \quad\text{with}\quad U = \begin{bmatrix} -i & 0 \\ 0 & -i \end{bmatrix} \oplus \begin{bmatrix} -i & 0 \\ 0 & -i \end{bmatrix}, \qquad M = \begin{bmatrix} 0 & i\alpha \\ i\alpha & 0 \end{bmatrix} \oplus \begin{bmatrix} 0 & i\alpha \\ i\alpha & 0 \end{bmatrix}$$

is a (G,H)-polar decomposition and evidently Lemma 5.9 is fulfilled. However, U1 M1


and U2 M2 still remain only H- or G-polar decompositions, respectively. ♦
6. Conclusions. After some preparatory considerations, a criterion for the ex-
istence of H-polar decompositions of a given matrix A, based on the comparison of
canonical forms, was specified with Theorem 3.7. This criterion, initially valid only for
regular matrices, was extended with Theorem 3.12 to an important class of singular
matrices. In this connection it remains to investigate the suspected fact that every
H-normal matrix A has an H-polar decomposition provided that A[∗] A is nilpotent.
Successful confirmation of this statement could then bring Lemma 3.6 and Theo-
rem 3.12 together, to constitute a simple characterisation of the existence of H-polar
decompositions.
Furthermore, Procedure 4.3 and 4.5 introduced numerical methods for computing
the canonical form of a pair (A, H) when A is diagonalisable, and an H-polar decom-
position of A when A[∗] A is diagonalisable. These methods cover numerous cases of
practical importance, some of which were investigated in Chapter 5.
Thereby Theorem 5.1 generalised a method for constructing points from given
distances, and Theorem 5.5 as well as 5.6 brought a solution for the procrustes prob-
lems (5.4) and (5.9). This constitutes a fundamental background for multidimensional
scaling in the environment of indefinite scalar products. The concept of (G,H)-polar
decompositions was thereby introduced for solving the problem (5.9).
7. Acknowledgements. I would like to thank Peter Benner, Technische Uni-
versität Berlin, for valuable discussions, for reading the manuscript and for his sugges-
tions which enhanced the quality of my presentation. I also thank Martin L. Michaelis
for his assistance in translating this document into the English language.

REFERENCES

[BG] I. Borg, and P. Groenen, Modern Multidimensional Scaling: Theory and Applications,
Springer, New York, 1997.
[BMRRR1] Y. Bolshakov, C.V.M. van der Mee, A.C.M. Ran, B. Reichstein, and L. Rodman, Po-
lar decompositions in finite dimensional indefinite scalar product spaces: General
Theory, Linear Algebra Appl. 261, 91-141, 1997.
[BMRRR2] Y. Bolshakov, C.V.M. van der Mee, A.C.M. Ran, B. Reichstein, and L. Rodman, Exten-
sion of isometries in finite-dimensional indefinite scalar product spaces and polar
decompositions, SIAM J. Matrix Anal. Appl. 18, 752-774, 1997.
[BMRRR3] Y. Bolshakov, C.V.M. van der Mee, A.C.M. Ran, B. Reichstein, and L. Rodman, Polar
decompositions in finite-dimensional indefinite scalar product spaces: special cases
and applications, in: Recent Developments in Operator Theory and its Applica-
tions, OT 87 (I. Gohberg, P. Lancaster, P.N. Shivakumar, Eds.), Birkhäuser, Basel,
1996, 61-94. Errata, Integral equations and Operator Theory 17, 497-501, 1997.
[BR] Y. Bolshakov, and B. Reichstein, Unitary equivalence in an indefinite scalar product:
an analogue of singular-value decomposition, Linear Algebra Appl. 222, 155-226,
1995.
[D] M. Davidson, Multidimensional Scaling, Wiley, New York, 1983.
[DP] P.S. Dwyer and M.S. McPhail, Symbolic Matrix Derivatives, Ann. Math. Statist. 19,
517-534, 1948.
[E] P.J. Eberlein, Solution to the Complex Eigenproblem by a Norm Reducing Jacobi Type
Method, Numer. Math. 14, 232-245, 1970.
[F] J.F.G. Francis, The QR transformation. An unitary analogue to the LR transformation,
Computer J. 4, 265-271, 332-345, 1961/62.
[G] F.R. Gantmacher, The Theory of Matrices (Vol. I), Chelsea, New York, 1959.
[GLR] I. Gohberg, P. Lancaster, and L. Rodman, Matrices and Indefinite Scalar Products,
Birkhäuser, Basel, 1983.
[GR] W. Greub, Linear Algebra (3rd Ed.), Springer, Berlin, 1967.
[H] H. Harman, Modern Factor Analysis (3rd Ed.), Univ. of Chicago Press, Chicago, 1976.
[K] U. Kintzel, CNF, An algorithm for numerical computation of the canonical form of a
pair (A, H) consisting of an H-hermitian matrix A and a regular hermitian matrix
H, submitted to ACM Transactions on Mathematical Software (TOMS), 2003.
[KR] B. Kågström, and A. Ruhe, An Algorithm for Numerical Computation of the Jordan
Normal Form of a Complex Matrix, ACM Transactions on Mathematical Software
(TOMS) Vol. 6, No. 3, 398-419, 1980.
[LMMR] P. Lins, P. Meade, C. Mehl, and L. Rodman, Normal Matrices and Polar Decompo-
sitions in Indefinite Inner Products, Linear and Multilinear Algebra 49, 45-89,
2001.
[MMX] C. Mehl, V. Mehrmann, and H. Xu, Canonical forms for doubly structured matrices
and pencils, Electron. J. Linear Algebra 7, 112-151, 2000.
[MRR] C.V.M. van der Mee, A.C.M. Ran, and L. Rodman, Stability of self-adjoint square
roots and polar decompositions in indefinite scalar product spaces, Linear Algebra
Appl. 302-303, 77-104, 1999.
[PW] G. Peters, and J.H. Wilkinson, Eigenvectors of Real and Complex Matrices by QR and
LR triangularizations, Numer. Math. 16, 181-204, 1970.
[S] P. Schönemann, A generalized solution of the Orthogonal Procrustes Problem, Psy-
chometrika, Vol. 31, No. 1, 1-10, 1966.
[ST] J. Stoer, Numerische Mathematik 1 (5. Aufl.), Springer, Berlin, 1989.
[T] W.S. Torgerson, Theory and methods of scaling, Wiley, New York, 1958.
[WEY] H. Weyl, Raum, Zeit, Materie: Vorlesungen über allg. Relativitätstheorie (7. Aufl.),
Springer, Berlin, 1988.
[WED] J.H.M. Wedderburn, Lectures on Matrices, AMS, Vol. 17, New York, 1934.
[YH] G. Young, and A.S. Householder, Discussion of a set of points in terms of their mutual
distances, Psychometrika, Vol. 3, No. 1, 19-22, 1938.
