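$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1q} \\ A_{21} & A_{22} & \cdots & A_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ A_{p1} & A_{p2} & \cdots & A_{pq} \end{bmatrix} \begin{matrix} \}\, n_1 \\ \}\, n_2 \\ \\ \}\, n_q \end{matrix}$$
(block columns of widths $n_1, n_2, \ldots, n_q$)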
y = x
for k = 1:t
    y = A_k y
end
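A minimal Python sketch of this loop, assuming each factor is stored as a dense list of rows; the helper names (`matvec`, `matmul`, `apply_factored`) and the two small factors are hypothetical. A real implementation would store the sparse factors compactly so that each pass costs work proportional to the number of nonzeros rather than $n^2$.

```python
def matvec(A, x):
    """Dense product A x, with A stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul(A, B):
    """Dense product A B (used here only to check the factored loop)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def apply_factored(factors, x):
    """Compute y = A_t ... A_2 A_1 x, where factors = [A_1, A_2, ..., A_t].

    Mirrors:  y = x; for k = 1:t, y = A_k y; end
    """
    y = list(x)
    for A in factors:          # A_1 is applied first, A_t last
        y = matvec(A, y)
    return y


# Two small illustrative factors, each with 2 nonzeros per row.
A1 = [[1, 1, 0, 0], [1, -1, 0, 0], [0, 0, 1, 1], [0, 0, 1, -1]]
A2 = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, -1, 0], [0, 1, 0, -1]]
x = [1, 2, 3, 4]

y = apply_factored([A1, A2], x)
print(y)                                   # [10, -2, -4, 0]
print(y == matvec(matmul(A2, A1), x))      # True
```

The product $A_2A_1$ is formed only to confirm the loop; the point of the factored form is that it never has to be.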
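$$y = F_8x = \begin{bmatrix}
\omega_8^0 & \omega_8^0 & \omega_8^0 & \omega_8^0 & \omega_8^0 & \omega_8^0 & \omega_8^0 & \omega_8^0 \\
\omega_8^0 & \omega_8^1 & \omega_8^2 & \omega_8^3 & \omega_8^4 & \omega_8^5 & \omega_8^6 & \omega_8^7 \\
\omega_8^0 & \omega_8^2 & \omega_8^4 & \omega_8^6 & \omega_8^8 & \omega_8^{10} & \omega_8^{12} & \omega_8^{14} \\
\omega_8^0 & \omega_8^3 & \omega_8^6 & \omega_8^9 & \omega_8^{12} & \omega_8^{15} & \omega_8^{18} & \omega_8^{21} \\
\omega_8^0 & \omega_8^4 & \omega_8^8 & \omega_8^{12} & \omega_8^{16} & \omega_8^{20} & \omega_8^{24} & \omega_8^{28} \\
\omega_8^0 & \omega_8^5 & \omega_8^{10} & \omega_8^{15} & \omega_8^{20} & \omega_8^{25} & \omega_8^{30} & \omega_8^{35} \\
\omega_8^0 & \omega_8^6 & \omega_8^{12} & \omega_8^{18} & \omega_8^{24} & \omega_8^{30} & \omega_8^{36} & \omega_8^{42} \\
\omega_8^0 & \omega_8^7 & \omega_8^{14} & \omega_8^{21} & \omega_8^{28} & \omega_8^{35} & \omega_8^{42} & \omega_8^{49}
\end{bmatrix} x$$
$$\omega_8 = \cos(2\pi/8) - i\cdot\sin(2\pi/8)$$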
The DFT Matrix In General...
$$[F_n]_{pq} = \omega_n^{pq} = \bigl(\cos(2\pi/n) - i\cdot\sin(2\pi/n)\bigr)^{pq} = \cos(2pq\pi/n) - i\cdot\sin(2pq\pi/n)$$
Fact:
$$F_n^HF_n = nI_n$$
Thus, $F_n/\sqrt{n}$ is unitary.
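The fact can be checked numerically. This sketch builds $F_n$ entry by entry from $[F_n]_{pq} = \omega_n^{pq}$ with $\omega_n = \exp(-2\pi i/n)$ and tests $F_n^HF_n = nI_n$ up to rounding; the function names and the tolerance are illustrative choices.

```python
import cmath


def dft_matrix(n):
    """F_n as a list of rows: [F_n]_{pq} = w^(pq), w = exp(-2*pi*i/n)."""
    return [[cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n)]
            for p in range(n)]


def gram(F):
    """Return F^H F for a square matrix F."""
    n = len(F)
    return [[sum(F[k][p].conjugate() * F[k][q] for k in range(n))
             for q in range(n)] for p in range(n)]


def equals_n_identity(G, n, tol=1e-9):
    """True if G equals n * I_n up to rounding error."""
    return all(abs(G[p][q] - (n if p == q else 0)) < tol
               for p in range(n) for q in range(n))


F8 = dft_matrix(8)
print(equals_n_identity(gram(F8), 8))   # True
# omega_8 = cos(2*pi/8) - i*sin(2*pi/8) sits in position (1, 1):
print(abs(F8[1][1] - (cmath.cos(cmath.pi / 4) - 1j * cmath.sin(cmath.pi / 4))) < 1e-12)  # True
```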
Data Sparse Matrices
Example 1.
A has lots of zeros. (“Traditional Sparse”)
Example 2.
A is Toeplitz...
$$A = \begin{bmatrix} a & b & c & d \\ e & a & b & c \\ f & e & a & b \\ g & f & e & a \end{bmatrix}$$
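A Toeplitz matrix is data sparse because $A_{ij}$ depends only on $i - j$: the $n^2$ entries are fixed by $2n - 1$ numbers (the first column and first row). A minimal sketch, with the hypothetical name `toeplitz_matvec`, computes $y = Ax$ from those $2n - 1$ numbers alone. This direct loop still costs $O(n^2)$ flops; the saving shown here is in storage.

```python
def toeplitz_matvec(col, row, x):
    """y = A x for the n-by-n Toeplitz A with first column col and first row row.

    A[i][j] = col[i - j] if i >= j else row[j - i], so only 2n - 1 numbers
    are stored.  col[0] and row[0] are both the diagonal entry.
    """
    n = len(x)
    assert len(col) == n and len(row) == n and col[0] == row[0]
    y = []
    for i in range(n):
        s = 0
        for j in range(n):
            s += (col[i - j] if i >= j else row[j - i]) * x[j]
        y.append(s)
    return y


# The 4-by-4 example with a, b, c, d, e, f, g = 1, 2, 3, 4, 5, 6, 7:
# first column (a, e, f, g), first row (a, b, c, d).
print(toeplitz_matvec([1, 5, 6, 7], [1, 2, 3, 4], [1, 1, 1, 1]))   # [10, 11, 14, 19]
```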
More Examples of Data Sparse Matrices
$$A = \begin{bmatrix} b_{11}C & b_{12}C \\ b_{21}C & b_{22}C \end{bmatrix}$$
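This block matrix is the Kronecker product $B \otimes C$, so it is fixed by the entries of $B$ and $C$. A sketch of why that matters: the identity $(B\otimes C)\,\mathrm{vec}(X) = \mathrm{vec}(CXB^T)$ lets us multiply without ever forming $B\otimes C$. It assumes column-major `vec` (MATLAB's `X(:)`); all helper names are hypothetical.

```python
def kron(B, C):
    """Explicit Kronecker product B ⊗ C: block (i, j) equals B[i][j] * C."""
    m, n = len(C), len(C[0])
    return [[B[i // m][j // n] * C[i % m][j % n] for j in range(len(B[0]) * n)]
            for i in range(len(B) * m)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def vec(X):
    """Stack the columns of X (column-major order, like MATLAB's X(:))."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def unvec(x, m, n):
    """Inverse of vec: reshape x into an m-by-n array, column by column."""
    return [[x[i + j * m] for j in range(n)] for i in range(m)]


def kron_matvec(B, C, x):
    """y = (B ⊗ C) x via (B ⊗ C) vec(X) = vec(C X B^T); B ⊗ C is never formed."""
    X = unvec(x, len(C[0]), len(B[0]))
    return vec(matmul(matmul(C, X), transpose(B)))


B = [[1, 2], [3, 4]]
C = [[0, 1], [1, 1]]
x = [1, 2, 3, 4]
print(kron_matvec(B, C, x))                                        # [10, 17, 22, 37]
print([sum(a * b for a, b in zip(row, x)) for row in kron(B, C)])  # [10, 17, 22, 37]
```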
$$A = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{\ell=1}^{n} S(i,j,k,\ell)\cdot\underbrace{(2\text{-by-}2)\otimes\cdots\otimes(2\text{-by-}2)}_{d\ \text{times}}$$
$$F_{1024} = A_{10}\cdots A_2A_1P_{1024}$$
where each $A$-matrix has 2 nonzeros per row and $P_{1024}$ is a permutation.
From Factorization to Algorithm
If $n = 2^{10}$ and
$$F_n = A_{10}\cdots A_2A_1P_n$$
then
y = P_n x
for k = 1:10
    y = A_k y      ← 2n flops
end
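$$F_8(:,[0\ 2\ 4\ 6\ 1\ 3\ 5\ 7]) = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & \omega_8 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \omega_8^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & \omega_8^3 \\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -\omega_8 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & -\omega_8^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & -\omega_8^3
\end{bmatrix}\begin{bmatrix} F_4 & 0 \\ 0 & F_4 \end{bmatrix}$$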
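The splitting can be confirmed numerically for any even $n$. This sketch compares $F_n$ with its columns reordered (even indices first, then odd) against the butterfly matrix times $\mathrm{diag}(F_{n/2}, F_{n/2})$; the helper names are hypothetical.

```python
import cmath


def dft_matrix(n):
    return [[cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n)]
            for p in range(n)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def butterfly(n):
    """[[I_m, Omega_m], [I_m, -Omega_m]], m = n/2, Omega_m = diag(1, w, ..., w^(m-1)), w = omega_n."""
    m = n // 2
    B = [[0j] * n for _ in range(n)]
    for k in range(m):
        wk = cmath.exp(-2j * cmath.pi * k / n)
        B[k][k], B[k][m + k] = 1, wk
        B[m + k][k], B[m + k][m + k] = 1, -wk
    return B


def two_copies(F):
    """diag(F, F) = I_2 ⊗ F."""
    m = len(F)
    Z = [[0j] * (2 * m) for _ in range(2 * m)]
    for i in range(m):
        for j in range(m):
            Z[i][j] = Z[m + i][m + j] = F[i][j]
    return Z


def split_error(n):
    """Largest entry of |F_n(:, evens then odds) - butterfly(n) diag(F_{n/2}, F_{n/2})|."""
    order = list(range(0, n, 2)) + list(range(1, n, 2))   # [0 2 4 6 1 3 5 7] for n = 8
    lhs = [[row[c] for c in order] for row in dft_matrix(n)]
    rhs = matmul(butterfly(n), two_copies(dft_matrix(n // 2)))
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(n) for j in range(n))


print(split_error(8) < 1e-12)   # True
```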
function y = fft(x, n)
if n == 1
    y = x
else
    m = n/2; ω = exp(−2πi/n)
    Ω = diag(1, ω, ..., ω^(m−1))
    zT = fft(x(0:2:n−1), m)
    zB = Ω · fft(x(1:2:n−1), m)
    y = [I_m I_m; I_m −I_m] [zT; zB]
end
Overall: 5n log n flops.
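A runnable Python version of the recursive `fft` pseudocode, as a sketch: it uses the same sign convention ($\omega = \exp(-2\pi i/n)$) and 0-based even/odd slicing, and assumes the length is a power of 2. The direct $O(n^2)$ `dft` is included only as a check.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT: returns F_n x with omega_n = exp(-2*pi*i/n); len(x) = 2^t."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    m = n // 2
    zT = fft(x[0::2])      # transform of x(0:2:n-1)
    zB = fft(x[1::2])      # transform of x(1:2:n-1)
    # zB <- Omega * zB with Omega = diag(1, w, ..., w^(m-1)), w = exp(-2*pi*i/n)
    zB = [cmath.exp(-2j * cmath.pi * k / n) * zB[k] for k in range(m)]
    # y = [I_m I_m; I_m -I_m] [zT; zB]
    return [zT[k] + zB[k] for k in range(m)] + [zT[k] - zB[k] for k in range(m)]


def dft(x):
    """Direct O(n^2) evaluation of F_n x, for checking."""
    n = len(x)
    return [sum(x[q] * cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n))
            for p in range(n)]


x = [complex(k, -k) for k in range(16)]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))) < 1e-9)   # True
```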
The Divide-and-Conquer Picture
(0:1:15)
(0:2:15)  (1:2:15)
(0:4:15)  (2:4:15)  (1:4:15)  (3:4:15)
(0:8:15)  (4:8:15)  (2:8:15)  (6:8:15)  (1:8:15)  (5:8:15)  (3:8:15)  (7:8:15)
[0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]
Towards a Nonrecursive Implementation
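If $n = 2m$ and
$$\Omega_m = \mathrm{diag}(1, \omega_n, \ldots, \omega_n^{m-1}),$$
then
$$F_n\Pi_n = \begin{bmatrix} F_m & \Omega_mF_m \\ F_m & -\Omega_mF_m \end{bmatrix} = \begin{bmatrix} I_m & \Omega_m \\ I_m & -\Omega_m \end{bmatrix}(I_2\otimes F_m).$$
Applying this recursively with $n = 2^t$ gives
$$F_n = A_t\cdots A_1P_n,$$
with
$$\Omega_{L/2} = \mathrm{diag}(1, \omega_L, \ldots, \omega_L^{L/2-1}),\qquad \omega_L = \exp(-2\pi i/L).$$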
The Bit Reversal Permutation
[0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]
Bit Reversal
x(0)  = x(0000) → x(0000) = x(0)
x(1)  = x(0001) → x(1000) = x(8)
x(2)  = x(0010) → x(0100) = x(4)
x(3)  = x(0011) → x(1100) = x(12)
x(4)  = x(0100) → x(0010) = x(2)
x(5)  = x(0101) → x(1010) = x(10)
x(6)  = x(0110) → x(0110) = x(6)
x(7)  = x(0111) → x(1110) = x(14)
x(8)  = x(1000) → x(0001) = x(1)
x(9)  = x(1001) → x(1001) = x(9)
x(10) = x(1010) → x(0101) = x(5)
x(11) = x(1011) → x(1101) = x(13)
x(12) = x(1100) → x(0011) = x(3)
x(13) = x(1101) → x(1011) = x(11)
x(14) = x(1110) → x(0111) = x(7)
x(15) = x(1111) → x(1111) = x(15)
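The table can be generated mechanically: entry $k$ of $P_nx$ is $x(r(k))$, where $r(k)$ reverses the $t$ bits of $k$ and $n = 2^t$. A minimal sketch with hypothetical function names:

```python
def bit_reverse(k, t):
    """Reverse the t-bit binary representation of k (e.g. 0001 -> 1000 for t = 4)."""
    r = 0
    for _ in range(t):
        r = (r << 1) | (k & 1)   # shift the lowest bit of k into r
        k >>= 1
    return r


def bit_reversal_permutation(x):
    """Return P_n x: entry k of the result is x(bit_reverse(k, t)), n = 2^t."""
    n = len(x)
    t = n.bit_length() - 1
    assert n == 1 << t, "length must be a power of 2"
    return [x[bit_reverse(k, t)] for k in range(n)]


print(bit_reversal_permutation(list(range(16))))
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

Because reversing the bits twice gives back $k$, $P_n$ is its own inverse.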
Butterfly Operations
This matrix is block diagonal...
$$A_q = I_r\otimes\begin{bmatrix} I_{L/2} & \Omega_{L/2} \\ I_{L/2} & -\Omega_{L/2} \end{bmatrix},\qquad L = 2^q,\quad r = n/L$$
r copies of things like this
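$$\begin{bmatrix}
1 & & & & \times & & & \\
 & 1 & & & & \times & & \\
 & & 1 & & & & \times & \\
 & & & 1 & & & & \times \\
1 & & & & \times & & & \\
 & 1 & & & & \times & & \\
 & & 1 & & & & \times & \\
 & & & 1 & & & & \times
\end{bmatrix}$$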
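Putting $P_n$ and the block-diagonal factors $A_q$ together gives a nonrecursive Cooley-Tukey FFT. In this sketch (hypothetical names, power-of-2 length assumed), each stage applies the $r = n/L$ copies of the butterfly block in place, and the result is checked against a direct DFT.

```python
import cmath


def bit_reverse(k, t):
    """Reverse the t-bit binary representation of k."""
    r = 0
    for _ in range(t):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def fft_cooley_tukey(x):
    """y = A_t ... A_1 P_n x, where A_q = I_r ⊗ B_L, L = 2^q, r = n/L (sketch)."""
    n = len(x)
    t = n.bit_length() - 1
    assert n == 1 << t, "length must be a power of 2"
    y = [complex(x[bit_reverse(k, t)]) for k in range(n)]     # y = P_n x
    for q in range(1, t + 1):
        L = 1 << q
        h = L // 2
        # diagonal of Omega_{L/2} = diag(1, w_L, ..., w_L^(L/2-1)), w_L = exp(-2*pi*i/L)
        omega = [cmath.exp(-2j * cmath.pi * j / L) for j in range(h)]
        for start in range(0, n, L):           # the r diagonal blocks of A_q
            for j in range(h):
                a = y[start + j]
                wb = omega[j] * y[start + h + j]
                y[start + j] = a + wb          # butterfly: a + w b
                y[start + h + j] = a - wb      # butterfly: a - w b
    return y


def dft(x):
    n = len(x)
    return [sum(x[q] * cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n))
            for p in range(n)]


x = [complex(k % 5, k % 3) for k in range(32)]
print(max(abs(a - b) for a, b in zip(fft_cooley_tukey(x), dft(x))) < 1e-9)   # True
```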
At the Scalar Level...
$$\begin{bmatrix} a \\ b \end{bmatrix} \longmapsto \begin{bmatrix} a + \omega b \\ a - \omega b \end{bmatrix}$$
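Counting real flops in this butterfly recovers the $5n\log n$ estimate quoted for the recursive `fft`. This assumes a complex multiplication costs 6 real flops (4 multiplications, 2 additions), a complex addition costs 2, and the cost of generating the twiddle factors $\omega$ is ignored:

$$\underbrace{6}_{\omega b} + \underbrace{2 + 2}_{a \pm \omega b} = 10 \text{ flops per butterfly},$$
$$\frac{n}{2}\ \text{butterflies per factor } A_q \;\Rightarrow\; 5n \text{ flops},\qquad t = \log_2 n \text{ factors} \;\Rightarrow\; 5n\log_2 n \text{ flops}.$$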
Signal Flow Graph (n = 8)
[Figure: signal flow graph for n = 8. The inputs enter in bit-reversed order x0, x4, x2, x6, x1, x5, x3, x7; three stages of butterflies with twiddle factors ω8^0, ω8^1, ω8^2, ω8^3 produce the outputs y0, ..., y7 in natural order.]
The Transposed Stockham Factorization
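If $n = 2^t$, then
$$F_n = S_t\cdots S_2S_1,$$
where for $q = 1{:}t$ the factor $S_q = A_q\Gamma_{q-1}$ is defined by
$$A_q = I_r\otimes B_L,\qquad \Gamma_{q-1} = \Pi_{2r}\otimes I_{L/2},\qquad L = 2^q,\quad r = n/L,$$
and $B_L = \begin{bmatrix} I_{L/2} & \Omega_{L/2} \\ I_{L/2} & -\Omega_{L/2} \end{bmatrix}$ is the butterfly block of $A_q$. For example, with $n = 8$ and $q = 2$: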
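$$(\Pi_4\otimes I_2)\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix} = \begin{bmatrix} x_0 \\ x_1 \\ x_4 \\ x_5 \\ x_2 \\ x_3 \\ x_6 \\ x_7 \end{bmatrix}$$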
Cooley-Tukey Array Interpretation
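Step $q$: regard $x^{(q-1)}$ as an $L_*\times r_*$ array and $x^{(q)} = A_qx^{(q-1)}$ as an $L\times r$ array, where
$$L_* = 2^{q-1},\quad r_* = n/L_*,\qquad L = 2^q,\quad r = n/L.$$
Columns $2k$ and $2k+1$ of the old array combine to form column $k$ of the new one:
$$x^{(q)}(:,k) = B_L\begin{bmatrix} x^{(q-1)}(:,2k) \\ x^{(q-1)}(:,2k+1) \end{bmatrix}.$$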
Reshaping
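$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_7 \end{bmatrix} \;\longrightarrow\; x_{2\times 4} = \begin{bmatrix} x_0 & x_2 & x_4 & x_6 \\ x_1 & x_3 & x_5 & x_7 \end{bmatrix}$$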
Transposed Stockham Array Interp
Columns $k$ and $k+r$ of
$$x^{(q-1)}_{L_*\times r_*} = F_{L_*}\,x^T_{r_*\times L_*},\qquad L_* = 2^{q-1},\quad r_* = n/L_*,$$
produce column $k$ of
$$x^{(q)}_{L\times r} = F_L\,x^T_{r\times L},\qquad L = 2^q,\quad r = n/L,$$
where $x^{(q)} = S_qx^{(q-1)}$.
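The array interpretation translates directly into a loop that needs no bit reversal, at the price of a second work array. A minimal sketch, assuming column-major storage (entry $(i, c)$ of an $L_*$-row array lives at `i + c*L_*`) and a power-of-2 length; the function name is hypothetical.

```python
import cmath


def fft_stockham_transposed(x):
    """Transposed Stockham radix-2 FFT (sketch).

    After step q the vector, viewed column-major as an L-by-r array
    (L = 2^q, r = n/L), equals F_L times the transpose of x viewed as r-by-L.
    Column k is built from columns k and k + r of the previous L_*-by-2r array.
    """
    n = len(x)
    t = n.bit_length() - 1
    assert n == 1 << t, "length must be a power of 2"
    y = [complex(v) for v in x]              # x^(0): a 1-by-n array
    for q in range(1, t + 1):
        L = 1 << q
        Ls = L // 2                          # L_* = 2^(q-1)
        r = n // L
        z = [0j] * n
        for j in range(Ls):
            w = cmath.exp(-2j * cmath.pi * j / L)
            for k in range(r):
                a = y[k * Ls + j]            # row j, column k of x^(q-1)
                b = w * y[(k + r) * Ls + j]  # row j, column k + r, scaled by w_L^j
                z[k * L + j] = a + b         # top half of column k of x^(q)
                z[k * L + Ls + j] = a - b    # bottom half of column k of x^(q)
        y = z
    return y


def dft(x):
    n = len(x)
    return [sum(x[q] * cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n))
            for p in range(n)]


x = [complex(k, k % 4) for k in range(16)]
print(max(abs(a - b) for a, b in zip(fft_stockham_transposed(x), dft(x))) < 1e-9)  # True
```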
2 × 2 × 2 Basic Radix-2 Versions
If
$$F_n = A_t\cdots A_1P_n^T,$$
then
$$F_n = F_n^T = P_nA_1^T\cdots A_t^T$$
and we can compute $y = F_nx$ as follows...
y = x
for k = t:−1:1
    y = A_k^T y
end
y = P_n y
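A sketch of this transposed ordering (the variant usually called decimation in frequency). Because $\Omega_{L/2}$ is diagonal, $B_L^T = \begin{bmatrix} I & I \\ \Omega & -\Omega \end{bmatrix}$, so each butterfly maps $(u, v)$ to $(u + v,\ \Omega(u - v))$ and the bit reversal moves to the end. Function names are hypothetical and a power-of-2 length is assumed.

```python
import cmath


def bit_reverse(k, t):
    r = 0
    for _ in range(t):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def fft_transposed(x):
    """y = P_n A_1^T ... A_t^T x (sketch), with A_q^T = I_r ⊗ B_L^T."""
    n = len(x)
    t = n.bit_length() - 1
    assert n == 1 << t, "length must be a power of 2"
    y = [complex(v) for v in x]
    for q in range(t, 0, -1):                  # for k = t:-1:1, y = A_k^T y
        L = 1 << q
        h = L // 2
        for start in range(0, n, L):
            for j in range(h):
                u, v = y[start + j], y[start + h + j]
                y[start + j] = u + v
                y[start + h + j] = cmath.exp(-2j * cmath.pi * j / L) * (u - v)
    return [y[bit_reverse(k, t)] for k in range(n)]   # y = P_n y


def dft(x):
    n = len(x)
    return [sum(x[q] * cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n))
            for p in range(n)]


x = [complex(k % 3, 1) for k in range(32)]
print(max(abs(a - b) for a, b in zip(fft_transposed(x), dft(x))) < 1e-9)   # True
```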
Convolution and Other Apps
$$\begin{bmatrix} I & iI & -I & -iI \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = (a - c) + i(b - d)$$
[Figure: splitting tree for n = 96. The root 96 splits into four subproblems of size 24, and each 24 splits into three subproblems of size 8.]
Multiple DFTs
$X \leftarrow F_{n_1}X$
$X \leftarrow XF_{n_2}$
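These two steps compute a 2-D DFT: the first transforms every column, and since $F_{n_2}$ is symmetric the second transforms every row. A sketch that checks this against the double-sum definition; it uses a direct $O(n^2)$ `dft` so that $n_1$ and $n_2$ need not be powers of 2, and all names are hypothetical.

```python
import cmath


def dft(v):
    """Direct O(n^2) product F_n v (any length n)."""
    n = len(v)
    return [sum(v[q] * cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n))
            for p in range(n)]


def dft2(X):
    """Return F_{n1} X F_{n2} for an n1-by-n2 array X given as a list of rows."""
    n1, n2 = len(X), len(X[0])
    cols = [dft([X[i][j] for i in range(n1)]) for j in range(n2)]   # X <- F_{n1} X
    X = [[cols[j][i] for j in range(n2)] for i in range(n1)]
    return [dft(row) for row in X]     # X <- X F_{n2}, row by row (F_{n2} is symmetric)


def dft2_direct(X):
    """Y(a, b) = sum over i, j of X(i, j) * omega_{n1}^(a*i) * omega_{n2}^(b*j)."""
    n1, n2 = len(X), len(X[0])
    return [[sum(X[i][j] * cmath.exp(-2j * cmath.pi * (a * i / n1 + b * j / n2))
                 for i in range(n1) for j in range(n2))
             for b in range(n2)] for a in range(n1)]


X = [[1, 2, 3], [4, 5, 6]]
Y, Z = dft2(X), dft2_direct(X)
print(max(abs(Y[a][b] - Z[a][b]) for a in range(2) for b in range(3)) < 1e-9)  # True
```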
Blocked Multiple DFTs
$X \leftarrow F_{n_1}X$ becomes
$$[\,X_1 \mid X_2 \mid \cdots \mid X_p\,] \leftarrow [\,F_{n_1}X_1 \mid F_{n_1}X_2 \mid \cdots \mid F_{n_1}X_p\,]$$
The 4-Step Framework
Initial:
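$$X = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} \\ X_{10} & X_{11} & X_{12} & X_{13} \\ X_{20} & X_{21} & X_{22} & X_{23} \\ X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}.$$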
Transpose each block:
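$$X \leftarrow \begin{bmatrix} X_{00}^T & X_{01}^T & X_{02}^T & X_{03}^T \\ X_{10}^T & X_{11}^T & X_{12}^T & X_{13}^T \\ X_{20}^T & X_{21}^T & X_{22}^T & X_{23}^T \\ X_{30}^T & X_{31}^T & X_{32}^T & X_{33}^T \end{bmatrix}.$$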
Now regard as 2-by-2 and block transpose each block:
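$$X \leftarrow \begin{bmatrix} X_{00}^T & X_{10}^T & X_{02}^T & X_{12}^T \\ X_{01}^T & X_{11}^T & X_{03}^T & X_{13}^T \\ X_{20}^T & X_{30}^T & X_{22}^T & X_{32}^T \\ X_{21}^T & X_{31}^T & X_{23}^T & X_{33}^T \end{bmatrix}.$$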
Now do a 2-by-2 block transpose:
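$$X \leftarrow \begin{bmatrix} X_{00}^T & X_{10}^T & X_{20}^T & X_{30}^T \\ X_{01}^T & X_{11}^T & X_{21}^T & X_{31}^T \\ X_{02}^T & X_{12}^T & X_{22}^T & X_{32}^T \\ X_{03}^T & X_{13}^T & X_{23}^T & X_{33}^T \end{bmatrix}.$$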
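The block-transpose steps can be sketched for a square array whose size is $N = b\cdot 2^k$: transpose every $b\times b$ block, then, at each coarser level, swap the upper-right and lower-left halves of every super-block. The result is $X^T$. The function name and the list-of-rows storage are assumptions for illustration.

```python
def blocked_transpose(X, b):
    """Transpose an N-by-N array (list of rows), N = b * 2^k, one block level at a time."""
    N = len(X)
    Y = [row[:] for row in X]
    # Step 1: transpose each b-by-b block in place.
    for bi in range(0, N, b):
        for bj in range(0, N, b):
            for i in range(b):
                for j in range(i + 1, b):
                    Y[bi + i][bj + j], Y[bi + j][bj + i] = Y[bi + j][bj + i], Y[bi + i][bj + j]
    # Later steps: regard each s-by-s super-block as 2-by-2 and swap its off-diagonal halves.
    s = 2 * b
    while s <= N:
        h = s // 2
        for si in range(0, N, s):
            for sj in range(0, N, s):
                for i in range(h):
                    for j in range(h):
                        Y[si + i][sj + h + j], Y[si + h + i][sj + j] = (
                            Y[si + h + i][sj + j], Y[si + i][sj + h + j])
        s *= 2
    return Y


X = [[10 * i + j for j in range(8)] for i in range(8)]   # 4-by-4 blocks of size 2
print(blocked_transpose(X, 2) == [list(col) for col in zip(*X)])   # True
```

With $N = 4b$ there are exactly the two coarser levels shown above: first within each 2-by-2 group of blocks, then across the whole array.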
Factorization and Transpose
$$x_{n\times m} \leftarrow x^T_{m\times n}$$
corresponds to
$$x \leftarrow P(m,n)\,x$$
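A sketch of $P(m,n)$ as an index map, assuming column-major storage: entry $(i, j)$ of the $m\times n$ array sits at position $i + jm$, and after the transpose it becomes entry $(j, i)$ of the $n\times m$ array, at position $j + in$. The function name is hypothetical.

```python
def transpose_permutation(x, m, n):
    """Return P(m, n) x: view x as an m-by-n array (column-major), transpose it,
    and store the n-by-m result column-major."""
    assert len(x) == m * n
    y = [None] * (m * n)
    for i in range(m):
        for j in range(n):
            y[j + i * n] = x[i + j * m]
    return y


print(transpose_permutation(list(range(8)), 2, 4))   # [0, 2, 4, 6, 1, 3, 5, 7]
```

With $m = 2$ this is the even/odd sort $[x(0{:}2{:}n-1);\ x(1{:}2{:}n-1)]$ used by the recursive FFT, and $P(n,m)$ undoes $P(m,n)$.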
Option 1.
$X \leftarrow F_{n_1}X$
$X \leftarrow XF_{n_2}$
Given $X(1{:}n_1, 1{:}n_2, 1{:}n_3)$, apply the DFT in each of the three dimensions. If
x = reshape(X(1:n1, 1:n2, 1:n3), n1*n2*n3, 1)
then the result is $y = (F_{n_3}\otimes F_{n_2}\otimes F_{n_1})\,x$.
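A small numerical check of the Kronecker form, assuming MATLAB's column-major layout with 0-based indices (entry $(\alpha_1, \alpha_2, \alpha_3)$ at position $\alpha_1 + n_1(\alpha_2 + n_2\alpha_3)$). It forms $F_{n_3}\otimes F_{n_2}\otimes F_{n_1}$ explicitly, which is only sensible for tiny sizes; all names are hypothetical.

```python
import cmath


def dft_matrix(n):
    return [[cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n)]
            for p in range(n)]


def kron(B, C):
    """Explicit Kronecker product B ⊗ C."""
    m, n = len(C), len(C[0])
    return [[B[i // m][j // n] * C[i % m][j % n] for j in range(len(B[0]) * n)]
            for i in range(len(B) * m)]


def dft3_kron(x, n1, n2, n3):
    """(F_{n3} ⊗ F_{n2} ⊗ F_{n1}) x with the Kronecker matrix formed explicitly."""
    K = kron(dft_matrix(n3), kron(dft_matrix(n2), dft_matrix(n1)))
    return [sum(k * v for k, v in zip(row, x)) for row in K]


def dft3_direct(x, n1, n2, n3):
    """3-D DFT straight from the definition, input and output in the same layout."""
    def idx(a1, a2, a3):
        return a1 + n1 * (a2 + n2 * a3)

    y = [0j] * (n1 * n2 * n3)
    for b1 in range(n1):
        for b2 in range(n2):
            for b3 in range(n3):
                y[idx(b1, b2, b3)] = sum(
                    x[idx(a1, a2, a3)]
                    * cmath.exp(-2j * cmath.pi * (a1 * b1 / n1 + a2 * b2 / n2 + a3 * b3 / n3))
                    for a1 in range(n1) for a2 in range(n2) for a3 in range(n3))
    return y


x = [complex(k, 1) for k in range(12)]          # X is 2-by-3-by-2
err = max(abs(a - b) for a, b in zip(dft3_kron(x, 2, 3, 2), dft3_direct(x, 2, 3, 2)))
print(err < 1e-9)   # True
```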
Sample for d = 5:
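$$\begin{array}{c|ll}
\mu & & \\ \hline
1 & X(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5) & F_{n_1} \\
  & X(\alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_1) & \Pi^T_{n_1,n} \\
2 & X(\alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_1) & F_{n_2} \\
  & X(\alpha_3, \alpha_4, \alpha_5, \alpha_1, \alpha_2) & \Pi^T_{n_2,n} \\
3 & X(\alpha_3, \alpha_4, \alpha_5, \alpha_1, \alpha_2) & F_{n_3} \\
  & X(\alpha_4, \alpha_5, \alpha_1, \alpha_2, \alpha_3) & \Pi^T_{n_3,n} \\
4 & X(\alpha_4, \alpha_5, \alpha_1, \alpha_2, \alpha_3) & F_{n_4} \\
  & X(\alpha_5, \alpha_1, \alpha_2, \alpha_3, \alpha_4) & \Pi^T_{n_4,n} \\
5 & X(\alpha_5, \alpha_1, \alpha_2, \alpha_3, \alpha_4) & F_{n_5} \\
  & X(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5) & \Pi^T_{n_5,n}
\end{array}$$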
FFTW: http://www.fftw.org