I. INTRODUCTION
and define

$$A_k = U_k \Sigma_k V_k^T. \qquad (1)$$
a) Latent Semantic Indexing (LSI): LSI [3], [4] is an effective information retrieval technique which computes the relevance scores for all the documents in a collection in response to a user query. In LSI, a collection of documents is represented as a term-document matrix X = [x_{ij}], where each column vector represents a document and x_{ij} is the weight of term i in document j. A query is represented as a pseudo-document in a similar form: a column vector q. By performing dimension reduction, each document x_j (the j-th column of X) becomes P_k^T x_j and the query becomes P_k^T q in the reduced-rank representation, where P_k is the matrix whose column vectors are the k major left singular vectors of X. The relevance score between the j-th document and the query is then computed as the cosine of the angle between P_k^T x_j and P_k^T q:
$$\cos(P_k^T x_j,\ P_k^T q) \;=\; \frac{(P_k^T x_j)^T (P_k^T q)}{\|P_k^T x_j\|\,\|P_k^T q\|} \;=\; \frac{(X_k e_j)^T q}{\|X_k e_j\|\,\|P_k^T q\|}, \qquad (3)$$

where X_k is the rank-k truncated SVD approximation of X and e_j is the j-th column of the identity matrix.
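To make this concrete, here is a minimal NumPy sketch of the reduced-rank cosine scoring; the toy matrix, query, and all variable names are ours for illustration, not from the paper.

```python
import numpy as np

# Toy term-document matrix X (terms x documents) and a query vector q.
X = np.array([[1., 0., 0., 1.],
              [0., 1., 1., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.]])
q = np.array([1., 0., 1., 0.])

k = 2
# P_k: the k major left singular vectors of X.
P, _, _ = np.linalg.svd(X, full_matrices=False)
Pk = P[:, :k]

docs_k = Pk.T @ X   # columns are the reduced documents P_k^T x_j
q_k = Pk.T @ q      # the reduced query P_k^T q

# Relevance scores: cosine of the angle between P_k^T x_j and P_k^T q.
scores = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0)
                             * np.linalg.norm(q_k))
print(scores)       # one relevance score per document
```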
3) We strengthen the observation made in [14] that the difference between Ab and the approximation vector s_i monotonically decreases, by proving rigorously that this difference decays very rapidly in the major singular directions of A. Illustrations in Section VII relate the convergence behavior of the method to the good separation of the largest singular values (eigenvalues).
4) Several practical issues which were not discussed in [14] are explored in detail. These include the choice of the approximation scheme depending on the shape of the matrix, and other practical choices related to memory versus computational cost trade-offs.
5) The technique described in this paper has a broad variety
of applications and is not limited to information retrieval.
Indeed it can be applied whenever the problem can be
formulated as a filtered mat-vec multiplication. This paper
presents two such applications.
B. Organization of the paper
The symmetric Lanczos algorithm is reviewed in Section II, and
the two proposed Lanczos approximation schemes are introduced
in Section III. Section IV provides a detailed analysis of the
convergence of the proposed techniques and their computational
complexities. Section V discusses how to incorporate entry scaling into the approximation vectors. Applications of the proposed
approximation schemes to information retrieval and face recognition are illustrated in Section VI. Finally, extensive experiments
are shown in Section VII, followed by concluding remarks in
Section VIII.
II. THE SYMMETRIC LANCZOS ALGORITHM
Given a symmetric matrix M ∈ ℝ^{n×n} and an initial unit vector q_1, the Lanczos algorithm builds an orthonormal basis of the Krylov subspace

$$\mathcal{K}_k(M, q_1) = \operatorname{range}\{q_1,\ M q_1,\ M^2 q_1,\ \ldots,\ M^{k-1} q_1\}.$$
$$Q_k^T M Q_k = T_k = \begin{bmatrix}
\alpha_1 & \beta_2 & & & \\
\beta_2 & \alpha_2 & \beta_3 & & \\
 & \beta_3 & \ddots & \ddots & \\
 & & \ddots & \alpha_{k-1} & \beta_k \\
 & & & \beta_k & \alpha_k
\end{bmatrix}. \qquad (4)$$
1:  q_0 ← 0, β_1 ← 0
2:  for i = 1, …, k do
3:    w_i ← M q_i − β_i q_{i−1}
4:    α_i ← ⟨w_i, q_i⟩
5:    w_i ← w_i − α_i q_i
6:    β_{i+1} ← ‖w_i‖_2
7:    if β_{i+1} = 0 then
8:      stop
9:    end if
10:   q_{i+1} ← w_i / β_{i+1}
11: end for
A. Re-orthogonalization
It is known that in practice the theoretical orthogonality of the computed Lanczos vectors q_i is quickly lost. This loss of orthogonality is triggered by the convergence of one or more eigenvectors [17]. There has been much research devoted to reinstating orthogonality. The simplest technique is to re-orthogonalize each new vector against each of the previous basis vectors. This amounts to adding the following line of pseudocode immediately after line 5 of Algorithm 1:
$$w_i \leftarrow w_i - \sum_{j=1}^{i-1} \langle w_i, q_j\rangle\, q_j.$$
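For concreteness, the following NumPy sketch implements Algorithm 1 together with this (optional) full re-orthogonalization step; the function name and the operator-style interface are our choices, not the paper's.

```python
import numpy as np

def lanczos(matvec, n, q1, k, reorth=True):
    """Run k steps of the symmetric Lanczos algorithm on the n x n
    symmetric operator v -> matvec(v), with optional full
    re-orthogonalization (a sketch; the interface is ours)."""
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k + 1)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for i in range(k):
        w = matvec(Q[:, i])                  # line 3: M q_i ...
        if i > 0:
            w = w - beta[i] * Q[:, i - 1]    # ... minus beta_i q_{i-1}
        alpha[i] = w @ Q[:, i]               # line 4
        w = w - alpha[i] * Q[:, i]           # line 5
        if reorth:
            # re-orthogonalize against all previous basis vectors
            w = w - Q[:, :i + 1] @ (Q[:, :i + 1].T @ w)
        beta[i + 1] = np.linalg.norm(w)      # line 6
        if beta[i + 1] == 0:                 # lines 7-9
            return Q[:, :i + 1], alpha[:i + 1], beta[:i + 2]
        Q[:, i + 1] = w / beta[i + 1]        # line 10
    return Q[:, :k], alpha, beta
```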
III. LANCZOS APPROXIMATION

Let Q_k = [q_1, …, q_k] contain the Lanczos vectors obtained by applying the above procedure to M = AA^T, so that

$$Q_k^T A A^T Q_k = T_k. \qquad (6)$$

The left projection approximation to Ab is

$$s_k = Q_k Q_k^T A b, \qquad (7)$$

whose partial sums obey

$$s_i = s_{i-1} + q_i q_i^T A b. \qquad (8)$$

Algorithm 2 computes s_k accordingly:

1:  Compute the Lanczos vectors q_1, …, q_k of M = AA^T
2:  s ← Ab
3:  s_0 ← 0
4:  for i = 1, …, k do
5:    s_i ← s_{i−1} + ⟨s, q_i⟩ q_i
6:  end for

Analogously, let Q̃_k = [q̃_1, …, q̃_k] contain the Lanczos vectors of M̃ = A^T A, so that Q̃_k^T A^T A Q̃_k = T̃_k. Define

$$t_k = A \tilde{Q}_k \tilde{Q}_k^T b, \qquad (9)$$

and call this the right projection approximation to Ab. The vector t_i is related to t_{i−1} by

$$t_i = t_{i-1} + A \tilde{q}_i \tilde{q}_i^T b. \qquad (10)$$
Similar to the left projection approximation, the matrix M̃ does not need to be explicitly formed, and the vector t_k is also a good approximation to A_k b (shown later). Algorithm 3 summarizes the steps to compute t_k.
Again, note that lines 2–6 of the algorithm are equivalent to computing t_k directly by t_k = A(Q̃_k(Q̃_k^T b)).
1:  Compute the Lanczos vectors q̃_1, …, q̃_k of M̃ = A^T A
2:  t_0 ← 0
3:  for i = 1, …, k do
4:    t_i ← t_{i−1} + ⟨b, q̃_i⟩ q̃_i
5:  end for
6:  t_k ← A t_k
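A compact sketch of both schemes, reusing the lanczos() function above (the fixed-seed random start vectors and the function names are our choices):

```python
def left_approx(A, b, k):
    """Algorithm 2 sketch: s_k = Q_k (Q_k^T (A b)), with Q_k the Lanczos
    vectors of M = A A^T, applied as an operator and never formed."""
    m = A.shape[0]
    rng = np.random.default_rng(0)  # reproducible start vector
    Q, _, _ = lanczos(lambda v: A @ (A.T @ v), m, rng.standard_normal(m), k)
    s = A @ b
    return Q @ (Q.T @ s)

def right_approx(A, b, k):
    """Algorithm 3 sketch: t_k = A (Q~_k (Q~_k^T b)), with Q~_k the
    Lanczos vectors of M~ = A^T A."""
    n = A.shape[1]
    rng = np.random.default_rng(0)
    Qt, _, _ = lanczos(lambda v: A.T @ (A @ v), n, rng.standard_normal(n), k)
    return A @ (Qt @ (Qt.T @ b))
```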
IV. ANALYSIS
A. Convergence

The reason why s_k and t_k are good approximations to A_k b is the fact that the sequences {s_i} and {t_i} both rapidly converge to Ab in the major left singular directions of A. This fact implies that s_k and t_k, when projected onto the subspace spanned by the major left singular vectors of A, are very close to the projection of Ab if k is sufficiently large. Since the goal of performing a filtered mat-vec A_k b in data analysis applications is to preserve the accuracy of Ab in these directions, the vectors s_k and t_k are good alternatives to A_k b. The rapid convergence is analyzed next.
Let u_j be the j-th left singular vector of A. The rapid convergence of the sequence {s_i} can be shown by establishing the inequality

$$|\langle Ab - s_i,\ u_j\rangle| \;\le\; c_j\, \|Ab\|\, T_{i-j}^{-1}(\gamma_j), \qquad (11)$$

and that of the sequence {t_i} by the analogous inequality

$$|\langle Ab - t_i,\ u_j\rangle| \;\le\; \sigma_j\, c_j\, \|b\|\, T_{i-j}^{-1}(\tilde{\gamma}_j), \qquad (12)$$

where the constants c_j, γ_j and γ̃_j are given in the Appendix, and T_l(x) is the degree-l Chebyshev polynomial of the first kind, which for x ≥ 1 satisfies

$$T_l(x) \;=\; \frac{e^{l\xi} + e^{-l\xi}}{2} \;\ge\; \frac{1}{2}\, e^{l\xi} \;=\; \frac{1}{2}\exp\bigl(l \operatorname{arccosh}(x)\bigr), \qquad \xi = \operatorname{arccosh}(x).$$
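As a quick empirical illustration (ours, not from the paper), one can apply the sketches above to a synthetic matrix with well-separated large singular values and watch the residual in the u_1 direction fall with i:

```python
# Observe the decay bounded by (11): |<Ab - s_i, u_1>| versus i,
# for a synthetic A with a geometrically decaying spectrum.
rng = np.random.default_rng(1)
U0, _, Vt0 = np.linalg.svd(rng.standard_normal((300, 200)),
                           full_matrices=False)
A = U0 @ np.diag(np.geomspace(1.0, 1e-3, 200)) @ Vt0
b = rng.standard_normal(200)

Q, _, _ = lanczos(lambda v: A @ (A.T @ v), 300,
                  rng.standard_normal(300), 40)
Ab, s = A @ b, np.zeros(300)
for i in range(Q.shape[1]):
    s = s + (Ab @ Q[:, i]) * Q[:, i]        # s_i = s_{i-1} + <Ab, q_i> q_i
    print(i + 1, abs((Ab - s) @ U0[:, 0]))  # residual in the u_1 direction
```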
B. Computational complexities
Compared with the truncated SVD, the two Lanczos approximation schemes are far more economical both in memory usage and in time consumption (for the preprocessing phase). This section analyzes the computational complexities of these algorithms; the results are summarized in Table I. We assume that the matrix A is of size m × n with nnz non-zero elements. The situation when A is non-sparse is not typical, thus we omit the analysis for this case here.¹ Section IV-D has further discussions on the trade-offs between space and time in practical implementations.
In practice, both Algorithms 2 and 3 are split into two phases:
preprocessing and query response. The preprocessing phase consists of the computation of the Lanczos vectors, which is exactly
line 1 of both algorithms. The rest of the algorithms computes
sk (or tk ) by taking in a vector b. Multiple query vectors b can
be input and different query responses sk (or tk ) can be easily
produced provided that the Lanczos vectors have been computed
and saved. Hence the rest of the algorithms, excluding line 1,
constitutes the query response phase.
Since Algorithms 2 and 3 are very similar, it suffices to show the analysis of Algorithm 2. In the preprocessing phase, this algorithm computes two sparse mat-vec products (once for multiplying q_i by A^T, and once for multiplying this result by A) and performs a few length-m vector operations during each iteration. Hence with k iterations, the time cost is O(k(nnz + m)). The space cost consists of the storage of the matrix A and all the Lanczos vectors; thus it is O(nnz + km).

The bulk of the preprocessing phase for computing A_k b lies in the computation of the truncated SVD of A. A typical way of computing this decomposition is to go through the Lanczos process as in (5) with k_0 > k iterations, and then compute the eigen-elements of T_{k_0}. Usually the number of Lanczos steps k_0 is far greater than k in order to guarantee the convergence of the first k Ritz values and vectors. Hence the costs for the truncated SVD, both in time and space, involve a value of k_0 that is much larger than k. Furthermore, the costs have an additional factor for performing frequent eigen-decompositions and convergence tests.
For Algorithm 2, the query response phase involves one sparse mat-vec (Ab) and k vector updates. Hence the total storage cost is O(nnz + km) and, to be more exact, the time cost is nnz + 2km. On the other hand, the filtered mat-vec product A_k b can be computed in three ways: (a) U_k(U_k^T(Ab)); (b) A(V_k(V_k^T b)); or (c) U_k(Σ_k(V_k^T b)). Approach (a) requires O(nnz + km) space and nnz + 2km time, approach (b) requires O(nnz + kn) space and nnz + 2kn time, while approach (c) requires O(k(m + n)) space and k(m + n) time. Depending on the relative size of k and nnz/n (or nnz/m), the best way can be chosen in actual implementations.
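A sketch of the three options (function names ours), given dense truncated factors Uk (m × k), the length-k vector of singular values sk, and Vk (n × k):

```python
def filtered_matvec_a(A, Uk, b):       # (a): U_k (U_k^T (A b))
    return Uk @ (Uk.T @ (A @ b))

def filtered_matvec_b(A, Vk, b):       # (b): A (V_k (V_k^T b))
    return A @ (Vk @ (Vk.T @ b))

def filtered_matvec_c(Uk, sk, Vk, b):  # (c): U_k (Sigma_k (V_k^T b))
    return Uk @ (sk * (Vk.T @ b))
```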
To conclude, the two Lanczos approximation schemes consume
significantly fewer computing resources than the truncated SVD
in the preprocessing phase, and they can be as economical in
the query response phase. As a remark, due to the split of the algorithms into these two phases, the storage of the k Lanczos vectors {q_i} (or {q̃_i}) is an issue to consider. These vectors have to be stored permanently after preprocessing, since they are needed to produce the query responses, as will be discussed next.
¹ Roughly speaking, if nnz is replaced by m × n in the big-O notations, we obtain the complexities for the non-sparse case. This is true for the algorithms in this paper.
TABLE I
COMPARISONS OF COMPUTATIONAL COSTS.

                      Left approx.     Right approx.    Trunc. SVD^a
preprocessing space   O(nnz + km)      O(nnz + kn)      O(nnz + k_0 m + T_s) or O(nnz + k_0 n + T_s)
preprocessing time    O(k(nnz + m))    O(k(nnz + n))    O(k_0(nnz + m) + T_t) or O(k_0(nnz + n) + T_t)
query space           O(nnz + km)      O(nnz + kn)      O(nnz + km) or O(nnz + kn) or O(k(m + n))
query time            nnz + 2km        nnz + 2kn        nnz + 2km or nnz + 2kn or k(m + n)

^a T_s (resp. T_t) is the space (resp. time) cost for eigen-decompositions of tridiagonal matrices and convergence tests. These two costs are both complicated and cannot be neatly expressed using factors such as nnz, k_0, m and n. Also, note that k_0 > k and empirically k_0 is a few times larger than k.
In the left projection approximation, update formula (8) can be rewritten as

$$s_i = s_{i-1} + \langle b,\ A^T q_i\rangle\, q_i, \qquad (13)$$

so storing the vectors {A^T q_i} in place of A will result in a query response without any mat-vec operations. Similarly, in the right projection approximation, update formula (10) can be rewritten as

$$t_i = t_{i-1} + \langle b,\ \tilde{q}_i\rangle\, (A \tilde{q}_i). \qquad (14)$$

The vectors {Aq̃_i}, byproducts of the preprocessing phase, are stored in place of A.
In summary, depending on the relative significance of space and time in practical situations, the two approximation schemes can be flexibly implemented in three different ways. From low storage requirement to high storage requirement (or from high tolerance in time to low tolerance in time), they are: (a) storing only the initial vector q_1 (or q̃_1) and regenerating the Lanczos vectors in the query response phase; (b) the standard Algorithm 2 (or Algorithm 3); (c) storing two sets of vectors {q_i} and {A^T q_i} (or {q̃_i} and {Aq̃_i}) and discarding the matrix A.² In all situations these alternatives are far more efficient than those based on the truncated SVD.
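For instance, option (c) for the right projection reduces each query to two dense products; a sketch, assuming Qt holds the vectors {q̃_i} and AQt holds the byproducts {Aq̃_i} as columns:

```python
def right_query_no_A(Qt, AQt, b):
    """Option (c) sketch: answer a query from the stored {q~_i} (columns
    of Qt) and {A q~_i} (columns of AQt), without touching A; this sums
    update formula (14) over i = 1, ..., k."""
    return AQt @ (Qt.T @ b)
```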
V. ENTRY SCALING
As indicated in the discussions of the two sample applications in Section I, it is usually necessary to scale the entries of the filtered mat-vec product. This amounts to dividing each entry of the vector A_k b by the norm of the corresponding row of A_k. In parallel, in the proposed Lanczos approximations, we may need to divide each entry of the approximation vector s_k = Q_k Q_k^T Ab (resp. t_k = A Q̃_k Q̃_k^T b) by the norm of the corresponding row of Q_k Q_k^T A (resp. A Q̃_k Q̃_k^T). This section illustrates a way to compute these row norms at minimal cost. Similar to the computation of s_k and t_k, the row norms are computed in an iterative update fashion.
For the left projection approximation, define φ_j^{(i)} to be the norm of the j-th row of Q_i Q_i^T A, i.e.,

$$\phi_j^{(i)} = \big\| e_j^T Q_i Q_i^T A \big\|. \qquad (15)$$

Then (φ_j^{(i)})² is related to (φ_j^{(i−1)})² by

$$\begin{aligned} \big(\phi_j^{(i)}\big)^2 &= e_j^T Q_i Q_i^T A A^T Q_i Q_i^T e_j \\ &= e_j^T Q_i T_i Q_i^T e_j \\ &= e_j^T \big( Q_{i-1} T_{i-1} Q_{i-1}^T + \alpha_i q_i q_i^T + \beta_i q_i q_{i-1}^T + \beta_i q_{i-1} q_i^T \big)\, e_j \\ &= \big(\phi_j^{(i-1)}\big)^2 + \alpha_i (q_i)_j^2 + 2\beta_i (q_i)_j (q_{i-1})_j, \end{aligned} \qquad (16)$$
2 Depending on the relative size of k and nnz/n (or nnz/m), method (c)
may or may not need more storage than (b). In any case, method (c) should
run faster than (b).
where (v)_j denotes the j-th element of vector v. Hence the row norms of Q_k Q_k^T A, namely φ_j^{(k)}, j = 1, …, m, can be computed using the update formula (16) in the preprocessing phase. Specifically, the following line of pseudocode should be added immediately after line 4 of Algorithm 1:

$$\big(\phi_j^{(i)}\big)^2 \leftarrow \big(\phi_j^{(i-1)}\big)^2 + \alpha_i (q_i)_j^2 + 2\beta_i (q_i)_j (q_{i-1})_j. \qquad (17)$$
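A vectorized sketch of this accumulation (with the conventions β_1 = 0 and q_0 = 0; the arrays match the lanczos() sketch above, and the identity assumes the exact Lanczos relation (6)):

```python
def left_row_norms(Q, alpha, beta):
    """Row norms of Q_k Q_k^T A via update (16)/(17), accumulated from
    the Lanczos quantities alone (no extra products with A)."""
    m, k = Q.shape
    phi2 = np.zeros(m)                       # (phi_j^(0))^2 = 0
    for i in range(k):
        q_prev = Q[:, i - 1] if i > 0 else np.zeros(m)
        phi2 += alpha[i] * Q[:, i] ** 2 + 2 * beta[i] * Q[:, i] * q_prev
    return np.sqrt(np.maximum(phi2, 0.0))    # clip tiny negative roundoff
```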
For the right projection approximation, define ψ_j^{(i)} to be the norm of the j-th row of A Q̃_i Q̃_i^T, i.e.,

$$\psi_j^{(i)} = \big\| e_j^T A \tilde{Q}_i \tilde{Q}_i^T \big\|.$$

Then (ψ_j^{(i)})² is related to (ψ_j^{(i−1)})² by

$$\begin{aligned} \big(\psi_j^{(i)}\big)^2 &= e_j^T A \tilde{Q}_i \tilde{Q}_i^T A^T e_j \\ &= e_j^T \big( A \tilde{Q}_{i-1} \tilde{Q}_{i-1}^T A^T + A \tilde{q}_i \tilde{q}_i^T A^T \big)\, e_j \\ &= \big(\psi_j^{(i-1)}\big)^2 + (A \tilde{q}_i)_j^2, \end{aligned} \qquad (18)$$

and the corresponding update,

$$\big(\psi_j^{(i)}\big)^2 \leftarrow \big(\psi_j^{(i-1)}\big)^2 + (A \tilde{q}_i)_j^2,$$

can be performed in the preprocessing phase, since the vector Aq̃_i is available as an intermediate product when M̃q̃_i = A^T(Aq̃_i) is formed. The row norms ψ_j^{(k)} of A Q̃_k Q̃_k^T are thus obtained at the end of the Lanczos iterations.
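The right projection analogue is even simpler; a sketch, assuming the Lanczos vectors of M̃ = A^T A are the columns of Qt:

```python
def right_row_norms(A, Qt):
    """Row norms of A Q~_k Q~_k^T via update (18): accumulate (A q~_i)_j^2
    one Lanczos vector at a time (A q~_i is a preprocessing byproduct)."""
    psi2 = np.zeros(A.shape[0])
    for i in range(Qt.shape[1]):
        psi2 += (A @ Qt[:, i]) ** 2
    return np.sqrt(psi2)
```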
VII. EXPERIMENTS

A. Approximation quality

This section shows the convergence behavior of the sequences {s_i} and {t_i} on three matrices, which were from the data sets MED, CRAN, and NPL, to be introduced in the next section. We respectively label the three matrices A_MED, A_CRAN, and A_NPL.
Figure 1 shows the convergence behavior of {s_i} and {t_i} in the major left singular directions u_1, u_2 and u_3 of the matrix A_MED, i.e., the residuals |⟨Ab − s_i, u_j⟩| (or |⟨Ab − t_i, u_j⟩|). As shown by (a) and (c), if no re-orthogonalization was performed, the {s_i} and {t_i} sequences diverged starting at around the 10-th iteration, which indicates that the loss of orthogonality in the Lanczos procedure appeared fairly soon. Hence it is necessary that re-orthogonalization be performed in order to yield accurate results. Plots (b) and (d) show the expected convergence pattern: the residuals rapidly fell towards the level of machine epsilon (≈ 10⁻¹⁶).
To show convergence results on more matrices, we plot in Figure 2 the residual curves for A_CRAN and A_NPL. For all cases re-orthogonalization was performed. Note that the shapes of the two matrices are different: A_CRAN has more columns than rows while for A_NPL the opposite is true. Figure 2 implies two facts: (a) the approximation sequences converged in all three directions, with u_1 being the fastest direction; (b) the approximations did not differ much in terms of rate of convergence or approximation quality. The first fact can be further investigated by plotting the eigenvalue distribution of the matrices M = AA^T as in Figure 3. The large gap between the first two eigenvalues explains the quick convergence in the u_1 direction. The second fact is guaranteed by the similar residual bounds (cf. inequalities (11) and (12)). It further shows that approximation quality is not a factor in the choice of left or right projection.
For k = 300, we plot in Figure 4 the residuals |⟨A_{300} b − s_{300}, u_j⟩| (or |⟨A_{300} b − t_{300}, u_j⟩|) in the j-th left singular direction for j = 1, …, 300. All the plots exhibit a similar pattern: A_{300} b and s_{300} (or t_{300}) were almost identical when projected onto the first 100 singular directions. This indicates that s_k and t_k can be good alternatives to A_k b, since they preserve the quality of A_k b in a large number of the major singular directions.
B. LSI for information retrieval
1) Data sets: Four data sets were used in the experiments.
Statistics are shown in Table II.
a) MEDLINE and CRANFIELD³: These are two early and well-known benchmark data sets for information retrieval. A typical characteristic is that the number of distinct terms exceeds the number of documents, i.e., for the matrix A (or equivalently X^T), m < n.
3 ftp://ftp.cs.cornell.edu/pub/smart/
[Figure 1 (plots not reproduced): Convergence of the {s_i} (resp. {t_i}) sequence to the vector Ab in the major singular directions u_1, u_2 and u_3. Matrix: A_MED.]
[Figure 2 (plots not reproduced): Convergence of the {s_i} (resp. {t_i}) sequence to the vector Ab in the major singular directions u_1, u_2 and u_3. Matrices: A_CRAN, A_NPL. All with re-orthogonalization.]
[Figure 3 (plots not reproduced): Eigenvalue distribution of M = AA^T on the real axis, where A^T is the term-document matrix. Panels: (a) A_MED, (b) A_CRAN, (c) A_NPL.]
[Figure 4 (plots not reproduced): Differences between A_{300} b and s_{300} (or t_{300}) in the major left singular directions. For the first one hundred directions, the differences have decreased to almost zero, while for other less significant directions the differences have not dropped to this level.]
b) NPL³: This data set is larger than the previous two, with the property that the number of documents is larger than the number of distinct terms, i.e., m > n. For these three data sets (including the above two), the term-document matrices are readily available from the provided links, so we did not perform additional processing on the data.
c) TREC⁴: This is a large data set which is popularly used for experiments in serious text mining applications. Similar to NPL, the term-document matrix extracted from this data set has more documents than distinct terms, i.e., m > n.

⁴ http://trec.nist.gov/data/docs_eng.html, http://trec.nist.gov/data/topics_eng/, http://trec.nist.gov/data/qrels_eng/
TABLE II
INFORMATION RETRIEVAL DATA SETS.

                  MED     CRAN    NPL      TREC
# terms           7,014   3,763   7,491    138,232
# docs            1,033   1,398   11,429   528,030
# queries         30      225     93       50
avg terms/doc     52      53      20       129
Had the Lanczos vectors been stored rather than being computed on the fly, the query times would have been identical for both approaches.
We also performed tests on the large data set TREC. Since we had no access to a fair environment (which requires as much as 16GB of working memory and no interruptions from other users) for time comparisons, we tested only the accuracy. Figure 6 shows the precision-recall curves at different values of k (400, 600 and 800) for the two approaches. As illustrated by the figure, the Lanczos approximation yielded much higher accuracy than the truncated SVD approach. Meanwhile, according to the rigorous analysis in Section IV-B, the former approach should consume far fewer computing resources and be more efficient than the latter.
C. Eigenfaces for face recognition
1) Data sets: Three data sets were used in the experiments.
Statistics are shown in Table III.
a) ORL⁸: The ORL data set [23] is small but classic. A dark homogeneous background was used during the image acquisition process. We did not perform any image processing on this data set.
b) PIE⁹: The PIE data set [24] consists of a large set of images taken under different poses, illumination conditions and facial expressions. We downloaded the cropped version from Deng Cai's homepage¹⁰. This version of the data set contains only five near-frontal poses for each subject. The images had been non-uniformly scaled to a size of 32 × 32.
c) ExtYaleB¹¹: Extended from a previous database, the extended Yale Face Database B (ExtYaleB) [25] contains images for 38 human subjects. We used the cropped version [26] that can be downloaded from the database homepage. We further uniformly resized the cropped images to half their size. The resizing can be conveniently performed in Matlab by using the command

img_small = imresize(img, .5, 'bicubic');

without compromising the quality of the images.
TABLE III
FACE RECOGNITION DATA SETS.

             ORL        PIE       ExtYaleB
# subjects   40         68        38
# imgs       400        11,554    2,414
img size     92 × 112   32 × 32   84 × 96
[Figure 5 (plots not reproduced): (Information retrieval) performance tests comparing lanczos, tsvd, and term matching; panels show average precision versus k, precision-recall curves, preprocessing time, and average query time.]

[Figure 6 (plots not reproduced): Precision-recall curves on TREC comparing lanczos and tsvd: (a) k = 400, (b) k = 600, (c) k = 800.]
APPENDIX
PROOF OF THE CONVERGENCE OF THE APPROXIMATION VECTORS
By the convergence theory of the Lanczos algorithm [27],

$$\big\|(I - Q_k Q_k^T)\, u_j\big\| \;\le\; K_j\, \frac{\big\|(I - Q_1 Q_1^T)\, u_j\big\|}{T_{k-j}(\gamma_j)}, \qquad (19)$$

where

$$\gamma_j = 1 + 2\,\frac{\lambda_j - \lambda_{j+1}}{\lambda_{j+1} - \lambda_n}, \qquad K_j = \begin{cases} 1, & j = 1,\\[4pt] \displaystyle\prod_{i=1}^{j-1} \frac{\lambda_i - \lambda_n}{\lambda_i - \lambda_j}, & j \neq 1, \end{cases}$$

and the λ's are the eigenvalues of M = AA^T. Setting c_j = K_j \|(I − Q_1 Q_1^T) u_j\| gives

$$\big\|(I - Q_k Q_k^T)\, u_j\big\| \;\le\; c_j\, T_{k-j}^{-1}(\gamma_j). \qquad (20)$$
[Figure 7 (plots not reproduced): (Face recognition) Performance tests on ORL, PIE and ExtYaleB: error rate, for varying numbers of training images per subject.]
For the left projection approximation,

$$|\langle Ab - s_i,\ u_j\rangle| = |\langle (I - Q_i Q_i^T) Ab,\ u_j\rangle| = |\langle (I - Q_i Q_i^T) u_j,\ Ab\rangle| \le \big\|(I - Q_i Q_i^T) u_j\big\|\, \|Ab\| \le c_j\, \|Ab\|\, T_{i-j}^{-1}(\gamma_j).$$

For the right projection approximation,

$$|\langle Ab - t_i,\ u_j\rangle| = |\langle A (I - \tilde{Q}_i \tilde{Q}_i^T) b,\ u_j\rangle| = |\langle (I - \tilde{Q}_i \tilde{Q}_i^T) b,\ A^T u_j\rangle| = \sigma_j\, |v_j^T (I - \tilde{Q}_i \tilde{Q}_i^T) b| \le \sigma_j\, \big\|(I - \tilde{Q}_i \tilde{Q}_i^T) v_j\big\|\, \|b\|.$$

The factor \|(I − Q̃_i Q̃_i^T) v_j\| can be bounded by a Chebyshev polynomial in the same manner, yielding the result

$$|\langle Ab - t_i,\ u_j\rangle| \;\le\; \sigma_j\, c_j\, \|b\|\, T_{i-j}^{-1}(\tilde{\gamma}_j).$$
[Figure 8 (plots not reproduced): (Face recognition) Performance tests on ORL, PIE and ExtYaleB: time (in seconds); panels show preprocessing time and average query time.]
REFERENCES
[1] C. Eckart and G. Young, "The approximation of one matrix by another of lower rank," Psychometrika, vol. 1, no. 3, pp. 211–218, 1936.
[2] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins, 1996.
[3] S. C. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. A. Harshman, "Indexing by latent semantic analysis," J. Am. Soc. Inf. Sci., vol. 41, no. 6, pp. 391–407, 1990.
[4] M. W. Berry and M. Browne, Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, June 1999.
[5] M. Turk and A. Pentland, "Face recognition using eigenfaces," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.
[6] D. I. Witter and M. W. Berry, "Downdating the latent semantic indexing model for conceptual information retrieval," The Computer J., vol. 41, no. 8, pp. 589–601, 1998.
[7] H. Zha and H. D. Simon, "On updating problems in latent semantic indexing," SIAM J. Sci. Comput., vol. 21, no. 2, pp. 782–791, 1999.
[8] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra Appl., vol. 415, no. 1, pp. 20–30, 2006.
[9] J. E. Tougas and R. J. Spiteri, "Updating the partial singular value decomposition in latent semantic indexing," Comput. Statist. Data Anal., vol. 52, no. 1, pp. 174–183, 2007.
[10] E. Kokiopoulou and Y. Saad, "Polynomial filtering in latent semantic indexing for information retrieval," in Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004, pp. 104–111.
[11] J. Erhel, F. Guyomarc'h, and Y. Saad, "Least-squares polynomial filters for ill-conditioned linear systems," University of Minnesota Supercomputing Institute, Tech. Rep., 2001.
[12] Y. Saad, "Filtered conjugate residual-type algorithms with applications," SIAM J. Matrix Anal. Appl., vol. 28, no. 3, pp. 845–870, August 2006.
[13] M. W. Berry, "Large scale sparse singular value computations," International Journal of Supercomputer Applications, vol. 6, no. 1, pp. 13–49, 1992.
[14] K. Blom and A. Ruhe, "A Krylov subspace method for information retrieval," SIAM J. Matrix Anal. Appl., vol. 26, no. 2, pp. 566–582, 2005.
[15] G. Golub and W. Kahan, "Calculating the singular values and pseudo-inverse of a matrix," SIAM J. Numer. Anal., vol. 2, no. 2, pp. 205–224, 1965.
[16] Y. Saad, Numerical Methods for Large Eigenvalue Problems. Halstead Press, New York, 1992.
[17] B. N. Parlett, The Symmetric Eigenvalue Problem. Prentice-Hall, 1998.
[18] R. M. Larsen, "Efficient algorithms for helioseismic inversion," Ph.D. dissertation, Dept. Computer Science, University of Aarhus, Denmark, October 1998.
[19] H. D. Simon, "Analysis of the symmetric Lanczos algorithm with reorthogonalization methods," Linear Algebra Appl., vol. 61, pp. 101–131, 1984.
[20] H. D. Simon, "The Lanczos algorithm with partial reorthogonalization," Mathematics of Computation, vol. 42, no. 165, pp. 115–142, 1984.
[21] D. Zeimpekis and E. Gallopoulos, "TMG: A MATLAB toolbox for generating term-document matrices from text collections." Springer, Berlin, 2006, pp. 187–210.
[22] M. W. Berry, "Large-scale sparse singular value computations," International Journal of Supercomputer Applications, vol. 6, no. 1, pp. 13–49, 1992.
[23] F. Samaria and A. Harter, "Parameterisation of a stochastic model for human face identification," in 2nd IEEE Workshop on Applications of Computer Vision, 1994.
[24] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression database," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 12, pp. 1615–1618, 2003.
[25] A. Georghiades, P. Belhumeur, and D. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 6, pp. 643–660, 2001.
[26] K. Lee, J. Ho, and D. Kriegman, "Acquiring linear subspaces for face recognition under variable lighting," IEEE Trans. Pattern Anal. Machine Intell., vol. 27, no. 5, pp. 684–698, 2005.
[27] Y. Saad, "On the rates of convergence of the Lanczos and the block-Lanczos methods," SIAM J. Numer. Anal., vol. 17, no. 5, pp. 687–706, October 1980.