
Fast and Parallel Algorithms for Multiple-Scattering Imaging

Mert Hidayetoglu
University of Illinois at Urbana-Champaign, IL, USA

Chow Yei Ching Building 204K


10 am, 17 August 2016
Inverse-Scattering Problems

[Figure: a transceiver illuminates an unknown object (?); a receiver (sensor) records the scattered field, and the object is reconstructed from the measurements.]

Elements of the problem:
• Measured Scattered Field Data
• Mathematical and Physical Modeling
• Algorithm Design
• Multiple Scattering Solutions
• Inverse Solver on Parallel Computers
• Reconstructed Object

Imaging pipeline:
• Data Acquisition (Experimental Setup Design)
• Measurement Pre-Processing (Signal Processing Methods)
• Numerical Reconstruction (Parallel and High-Performance Computing)
• Image Post-Processing (Signal Processing and Statistical Methods)
Case 1: Limited Angle Sensing (Edge Detector)
• 128 Receivers (RX), 0° incidence
• Freq. Range: 2.2 MHz – 8.6 MHz
• DC component is missing!
[Figure: Fourier relation ℱ maps the object to its frequency coverage; ℱ⁻¹ gives the reconstruction. Axis: Frequency.]

Case 2: Full Angle Sensing
• Monochromatic
• 128 Receivers (RX), 0° incidence
• DC component is captured by these sensors
• DC component is captured!
[Figure: Fourier relation ℱ maps the object to its frequency coverage; ℱ⁻¹ gives the reconstruction. Axis: Frequency; scale: dB; scale bar: 6λ.]
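Distorted Born Approximation

Scattering Equation:
$G = G_0 + G_0 O G$

Under Perturbation $\delta O$:
$O = O_b + \delta O$
$G = G_b + \delta G$
$G_b = G_0 + G_0 O_b G_b$

Substituting into the scattering equation:
$G_b + \delta G = G_0 + G_0 (O_b + \delta O)(G_b + \delta G)$
$G_b + \delta G = G_0 + G_0 O_b G_b + G_0 \delta O G_b + G_0 O_b \delta G + G_0 \delta O \delta G$

Subtracting the background equation and using $G_b = (I - G_0 O_b)^{-1} G_0$:
$\delta G = G_b \delta O G_b + G_b \delta O \delta G$
The second term, $G_b \delta O \delta G$, is the higher-order variation.

Distorted Born Approximation (omits higher-order variations):
$\delta G \approx G_b \delta O G_b$

When it operates on a source, with $\phi_b = G_b \phi^{inc}$ and $\delta\phi = \delta G \phi^{inc}$:
$\delta G \phi^{inc} \approx G_b \delta O G_b \phi^{inc}$
$\delta\phi \approx G_b \delta O \phi_b$

Each $G_b$ operation requires a forward-scattering solution.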
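The derivation above can be checked numerically on a toy problem. The sketch below is only an illustration: the small dense matrices standing in for $G_0$, $O_b$, and the perturbation direction are random and hypothetical (they are not from the talk), and the names `total_green`, `compare`, and `relative_error` are made up for this example. It verifies the exact identity $\delta G = G_b \delta O G_b + G_b \delta O \delta G$ and shows the error of the first-order term shrinking as $\delta O$ shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # toy number of unknowns (assumption)

# Hypothetical dense stand-ins for the discretized operators G0, O_b and the
# perturbation direction D; none of these values come from the slides.
G0 = 0.05 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
Ob = 0.5 * np.diag(rng.standard_normal(n))
D = np.diag(rng.standard_normal(n))
identity = np.eye(n)


def total_green(O):
    """Solve G = G0 + G0 O G, i.e. G = (I - G0 O)^{-1} G0."""
    return np.linalg.solve(identity - G0 @ O, G0)


Gb = total_green(Ob)


def compare(eps):
    """Return (dO, exact dG, first-order estimate Gb dO Gb) for dO = eps * D."""
    dO = eps * D
    dG_exact = total_green(Ob + dO) - Gb
    dG_dba = Gb @ dO @ Gb
    return dO, dG_exact, dG_dba


def relative_error(eps):
    """Relative error of the distorted Born estimate of dG."""
    _, dG_exact, dG_dba = compare(eps)
    return np.linalg.norm(dG_exact - dG_dba) / np.linalg.norm(dG_exact)


for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps = {eps:g}: relative DBA error = {relative_error(eps):.2e}")
```

The printed error should fall roughly in proportion to `eps`, which is what dropping the term $G_b \delta O \delta G$ predicts.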
Fast Solvers for Forward-Scattering Problems
Direct Methods: $O(N^3)$
Iterative Methods: $O(N^2)$
Fast Multipole Method: $O(N^{1.4})$ to $O(N^{1.5})$
Multilevel Fast Multipole Algorithm: $O(N)$ to $O(N \log N)$
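To get a feel for these growth rates, the sketch below tabulates leading-order operation counts with all constants ignored. The function name `cost_estimates`, the base-2 logarithm, and the 4096 x 4096 pixel grid are illustrative assumptions, so the numbers indicate relative scale only, not measured cost.

```python
import math


def cost_estimates(n):
    """Leading-order operation counts for n unknowns (constants ignored)."""
    return {
        "direct O(N^3)": n ** 3,
        "iterative O(N^2)": n ** 2,
        "FMM O(N^1.5)": n ** 1.5,
        "MLFMA O(N log N)": n * math.log2(n),  # log base is an assumption
    }


n = 4096 ** 2  # illustrative grid of 4096 x 4096 pixels (assumption)
for name, cost in cost_estimates(n).items():
    print(f"{name:18s} ~ {cost:.3e}")
```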
The Multilevel Fast Multipole Algorithm (MLFMA)

[Figure: MLFMA tree over Levels 1 to 3. Region A holds the basis functions and multipole expansions; Region B holds the testing functions and local expansions. Far-field evaluations: P2M at Level 1, M2M up the tree, M2L between regions, L2L down the tree, L2P at Level 1. Near-field evaluations: P2P.]

Fast Solvers for Forward-Scattering Problems
Radiated Field from a Point Source
Number of Levels: 11
Domain Size: 409.6λ
Number of Pixels: 16,777,216
Multiplication Time: 2.16 sec.
Max. Error: 1e-6
Fast Algorithms for Born Solvers

[Figures: Matrix-Vector Multiplication (MVM) Time and Matrix-Vector Multiplication Memory, w/ 16 threads. Computing node has 32 GB memory.]

Annotated problem sizes:
• Dimension: 12.8λ, Unknowns: 16.38 k
• Dimension: 409.6λ, Unknowns: 16.78 M
Annotated memory values: 16.4 GB, 22.8 GB, 49.4 MB
Annotated MVM times: 282 ms, 17 s, 15 ms


MLFMA MPIxOpenMP Parallelization Performance
4,194,304 Unknowns, 9 Levels

[Figures: Execution Time and Speedup, including a Pure MPI curve.]
Execution Time: 49.89 s (1x1), 3.44 s (16x1), 0.56 s (16x32)
Speedup: 14.51 (16x1), 85.20 (16x16), 89.89 (16x32)
Other annotated values: 282.9 (16x32), 235.3 (16x16), 11.93 (16x32), 7.45 (16x1)


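Single Illumination:
$\bar{\mathbf{F}} \cdot \mathbf{O} = \boldsymbol{\phi}$

Normal Equation:
$(\mu^2 \bar{\mathbf{I}} + \bar{\mathbf{F}}^{\dagger} \cdot \bar{\mathbf{F}}) \cdot \mathbf{O} = \bar{\mathbf{F}}^{\dagger} \cdot \boldsymbol{\phi}$

Multiple Illuminations:
$\begin{bmatrix} \bar{\mathbf{F}}_1 \\ \bar{\mathbf{F}}_2 \\ \vdots \\ \bar{\mathbf{F}}_I \end{bmatrix} \cdot \mathbf{O} = \begin{bmatrix} \boldsymbol{\phi}_1 \\ \boldsymbol{\phi}_2 \\ \vdots \\ \boldsymbol{\phi}_I \end{bmatrix}$

Normal Equation:
$\mu^2 \mathbf{O} + \begin{bmatrix} \bar{\mathbf{F}}_1^{\dagger} & \bar{\mathbf{F}}_2^{\dagger} & \cdots & \bar{\mathbf{F}}_I^{\dagger} \end{bmatrix} \cdot \begin{bmatrix} \bar{\mathbf{F}}_1 \\ \bar{\mathbf{F}}_2 \\ \vdots \\ \bar{\mathbf{F}}_I \end{bmatrix} \cdot \mathbf{O} = \begin{bmatrix} \bar{\mathbf{F}}_1^{\dagger} & \bar{\mathbf{F}}_2^{\dagger} & \cdots & \bar{\mathbf{F}}_I^{\dagger} \end{bmatrix} \cdot \begin{bmatrix} \boldsymbol{\phi}_1 \\ \boldsymbol{\phi}_2 \\ \vdots \\ \boldsymbol{\phi}_I \end{bmatrix}$

$\mu^2 \mathbf{O} + (\bar{\mathbf{F}}_1^{\dagger} \cdot \bar{\mathbf{F}}_1 + \bar{\mathbf{F}}_2^{\dagger} \cdot \bar{\mathbf{F}}_2 + \cdots + \bar{\mathbf{F}}_I^{\dagger} \cdot \bar{\mathbf{F}}_I) \cdot \mathbf{O} = \bar{\mathbf{F}}_1^{\dagger} \cdot \boldsymbol{\phi}_1 + \bar{\mathbf{F}}_2^{\dagger} \cdot \boldsymbol{\phi}_2 + \cdots + \bar{\mathbf{F}}_I^{\dagger} \cdot \boldsymbol{\phi}_I$

$\mu^2 \mathbf{O} + \bar{\mathbf{F}}_1^{\dagger} \cdot \bar{\mathbf{F}}_1 \cdot \mathbf{O} + \bar{\mathbf{F}}_2^{\dagger} \cdot \bar{\mathbf{F}}_2 \cdot \mathbf{O} + \cdots + \bar{\mathbf{F}}_I^{\dagger} \cdot \bar{\mathbf{F}}_I \cdot \mathbf{O} = \bar{\mathbf{F}}_1^{\dagger} \cdot \boldsymbol{\phi}_1 + \bar{\mathbf{F}}_2^{\dagger} \cdot \boldsymbol{\phi}_2 + \cdots + \bar{\mathbf{F}}_I^{\dagger} \cdot \boldsymbol{\phi}_I$

Multiplications $\bar{\mathbf{F}}_i^{\dagger} \cdot \bar{\mathbf{F}}_i \cdot \mathbf{O}$ and $\bar{\mathbf{F}}_i^{\dagger} \cdot \boldsymbol{\phi}_i$ can be performed independently.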
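The independence of the per-illumination products can be illustrated with a small numerical check. This is a minimal sketch under stated assumptions: the operators and measurements are random toy data, the sizes and the value of `mu` are arbitrary, and the helper names `apply_normal_operator` and `normal_rhs` are hypothetical. It confirms that summing the separately computed terms matches the stacked normal equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_unknowns, n_receivers, n_illum = 8, 5, 3  # toy sizes (assumptions)
mu = 0.5                                    # regularization parameter (assumption)


def random_complex(*shape):
    """Random complex array used as hypothetical stand-in data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# Hypothetical per-illumination operators F_i and measurements phi_i.
F = [random_complex(n_receivers, n_unknowns) for _ in range(n_illum)]
phi = [random_complex(n_receivers) for _ in range(n_illum)]


def apply_normal_operator(O):
    """mu^2 O + sum_i F_i^H (F_i O); each term needs only its own F_i."""
    return mu ** 2 * O + sum(Fi.conj().T @ (Fi @ O) for Fi in F)


def normal_rhs():
    """sum_i F_i^H phi_i; each term needs only its own F_i and phi_i."""
    return sum(Fi.conj().T @ p for Fi, p in zip(F, phi))


# Reference: stack all illuminations into one tall system.
F_stacked = np.vstack(F)
phi_stacked = np.concatenate(phi)
O_test = random_complex(n_unknowns)
lhs_stacked = mu ** 2 * O_test + F_stacked.conj().T @ (F_stacked @ O_test)
rhs_stacked = F_stacked.conj().T @ phi_stacked

print(np.allclose(apply_normal_operator(O_test), lhs_stacked))
print(np.allclose(normal_rhs(), rhs_stacked))
```

Because each term in the sums touches only one illumination, the terms could be assigned to different processes and combined with a single reduction.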
Solution Time (Inverse Sol. + Forward Sol.):
$O(NRFT) + O(N^2 FT) = O(NRI) + O(N^2 I)$

With parallelization:
$O(NR) + O(NI) + O(N^2)$

With fast algorithms:
$O(R) + O(NI) + O(N)$

Symbols:
$F$: # Frequencies
$T$: # Transmitters
$R$: # Receivers
$N$: # Unknowns
$I = FT$: # Illuminations
$M = IR$: # Measurements

Naïve & Sequential: $O(IN^2) + O(INR)$
Fast & Parallel: $O(IN) + O(R)$
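Born Iterative Method
BIM: $\delta\phi \approx G_0 \delta O \phi_b$
Normal Equation: $(\mu^2 \bar{\mathbf{I}} + \bar{\mathbf{F}}^{\dagger} \cdot \bar{\mathbf{F}}) \cdot \mathbf{O} = \bar{\mathbf{F}}^{\dagger} \cdot \Delta\boldsymbol{\phi}$
[Figure: Cubic Scatterers, $\delta\epsilon_r = \epsilon_r - \epsilon_b$]

Effect of Regularization Parameter (BIM)
$(\mu^2 \bar{\mathbf{I}} + \bar{\mathbf{F}}^{\dagger} \cdot \bar{\mathbf{F}}) \cdot \mathbf{O} = \bar{\mathbf{F}}^{\dagger} \cdot \Delta\boldsymbol{\phi}$
[Figure panels: reconstructions for μ = 13, 15, 20, 30, 40, 60]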
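The role of $\mu$ in the regularized normal equation can be seen on a toy problem. The sketch below is not the solver from the talk: it uses a small random real matrix as a hypothetical operator, a dense direct solve, and arbitrary values of `mu` chosen for this toy scale. It shows the usual trade-off, where a larger $\mu$ gives a smaller solution norm and a larger data residual.

```python
import numpy as np


def tikhonov_solve(F, dphi, mu):
    """Solve (mu^2 I + F^H F) O = F^H dphi with a dense direct solver."""
    n = F.shape[1]
    A = mu ** 2 * np.eye(n) + F.conj().T @ F
    return np.linalg.solve(A, F.conj().T @ dphi)


rng = np.random.default_rng(2)
F = rng.standard_normal((20, 30))  # hypothetical underdetermined operator
O_true = np.zeros(30)
O_true[10:15] = 1.0                # toy object (assumption)
dphi = F @ O_true + 0.05 * rng.standard_normal(20)  # noisy toy data

results = []
for mu in (0.1, 1.0, 10.0):        # values chosen for this toy scale only
    O_mu = tikhonov_solve(F, dphi, mu)
    solution_norm = np.linalg.norm(O_mu)
    residual = np.linalg.norm(F @ O_mu - dphi)
    results.append((mu, solution_norm, residual))
    print(f"mu = {mu:5.1f}: |O| = {solution_norm:.3f}, residual = {residual:.3f}")
```

The $\mu$ values on the slide apply to the actual imaging operator, so they are not comparable to the toy values used here.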
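Born Iterative Method
BIM: $\delta\phi \approx G_0 \delta O \phi_b$
Normal Equation: $(\mu^2 \bar{\mathbf{I}} + \bar{\mathbf{F}}^{\dagger} \cdot \bar{\mathbf{F}}) \cdot \mathbf{O} = \bar{\mathbf{F}}^{\dagger} \cdot \Delta\boldsymbol{\phi}$
[Figure: Error, μ = 60]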
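BIM vs. DBIM
BIM: $\delta\phi \approx G_0 \delta O \phi_b$
DBIM: $\delta\phi \approx G_b \delta O \phi_b$
Normal Equation: $(\mu^2 \bar{\mathbf{I}} + \bar{\mathbf{F}}^{\dagger} \cdot \bar{\mathbf{F}}) \cdot \delta\mathbf{O} = \bar{\mathbf{F}}^{\dagger} \cdot \delta\boldsymbol{\phi}$

[Figures: BIM and DBIM reconstructions for μ = 40, 60, 80]

μ     Method   # MVM        Total Time (s)
40    BIM      307,744      59.93
40    DBIM     667,840      147.65
60    BIM      266,778      46.91
60    DBIM     940,366      206.76
80    BIM      619,594      111.94
80    DBIM     1,756,948    385.20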
Parallel Environment
[Figure: a master node and compute nodes, each with multiple processor cores, connected through an Infiniband network switch. Execution alternates between MPI regions (p processes, with communication) and MPIxOpenMP regions (p×t threads).]
Parallel Born Solutions
Forward Solution(s)
[Figure: MLFMA forward solutions distributed over T/X frequencies and T/X positions. Each MPI process (p processes) runs its MLFMA solutions with OpenMP threads (p×t threads).]
No communication here.
Parallel Born Solutions
Inverse Solution
[Figure: products $\bar{\mathbf{F}}_{1,1} \cdot \mathbf{O}, \ldots, \bar{\mathbf{F}}_{T,F} \cdot \mathbf{O}$ distributed over T/X frequencies and T/X positions, with results $\mathbf{b}_{1,1}, \ldots, \mathbf{b}_{F,T}$. $\mathbf{O}$ is communicated between the master process and the slave processes in the MPI regions (p processes); the products run in the MPIxOpenMP region (p×t threads).]
Scaling on CPU Nodes
• 1 Thread: 11.5 Hours
• 16 Threads: 1.2 Hours
• 32 Threads: 50 Minutes
• 2,048 Threads: 56 Seconds
• 4,096 Threads: 38 Seconds
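Speedup and parallel efficiency follow directly from the quoted timings. The short sketch below uses only the numbers on this slide; the helper name `scaling_table` is hypothetical, and efficiency is taken as speedup divided by thread count, which is a common convention rather than something stated in the talk.

```python
# Wall-clock times from the slide, converted to seconds and keyed by thread count.
timings_s = {
    1: 11.5 * 3600,
    16: 1.2 * 3600,
    32: 50 * 60,
    2048: 56,
    4096: 38,
}


def scaling_table(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    t1 = timings[1]
    return {p: (t1 / t, t1 / t / p) for p, t in timings.items()}


table = scaling_table(timings_s)
for threads, (speedup, efficiency) in table.items():
    print(f"{threads:5d} threads: speedup {speedup:8.1f}, efficiency {efficiency:6.1%}")
```

With these figures, the 4,096-thread run is roughly 1,090 times faster than the sequential run, which corresponds to a parallel efficiency of about 27%.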
NCSA Blue Waters

• Supported by NSF and UIUC


• Maintained by NCSA
• Operational by November 2012
• To all users by March 2013
• 237 Racks of CPU Nodes (22,640 XE Nodes)
• 32 Racks of GPU Nodes (4,228 XK Nodes)
OpenMP implementation on Blue Waters (within a single node)
1,048,576 Unknowns, 8 Levels
Example: 16 OpenMP threads

[Figure: Interlagos processor (8 Bulldozer compute units) with its integer units and floating point units. Cores are numbered 0 to 31 under the default option -j0 and 0 to 15 under the -j1 option.]
[Figures: Execution Time and Speedup for -j0 (default) and -j1]
CPU Nodes vs. GPU Nodes
Sequential Execution: 11.5 Hours
CPU Nodes: 50 min. (32 threads), 38 sec. (4,096 threads)
GPU Nodes: 10 min. (1 GPU), 7 sec. (128 GPUs)
Synthetic Reconstructions
[Figures: synthetic object and its reconstructions, $\delta\epsilon(\mathbf{r}) = \epsilon(\mathbf{r}) - \epsilon_b$, domain size 102.4λ]

Computational Resources:
256 Computing Nodes
256 MPI Processes
32 Threads per Node
8,192 OpenMP Threads
[Figure: nodes, each running an MPI process on its CPUs]

Number of Nodes: 256
Number of threads: 8,192
Execution Time: 313.55 seconds (40 Born Iterations)
Number of MVMs: 482,488 (0.6 ms per MVM)
Number of Pixels: 1,048,576
Preprocessing of the Measurement Data
[Figures: Raw Measurements (Time Domain) at 0° (Broadside), 15° Angle, 30° Angle]

Time Gating
[Figures: Raw Measurements (Time Domain) and After Time Gating (Time Domain) at 0° (Broadside), 15° Angle, 30° Angle]
[Figures: Raw Measurements (Freq. Domain) and After Time Gating (Freq. Domain) at 0° (Broadside), 15° Angle, 30° Angle]
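Time gating keeps only the part of each received trace where the target echo is expected and suppresses everything else before the data go to the frequency domain. The sketch below is a generic illustration, not the preprocessing used in the talk: the sampling rate, pulse shapes, gate limits, raised-cosine ramps, and the function name `time_gate` are all assumptions, and the 5 MHz center frequency is simply picked from inside the quoted 2.2 MHz – 8.6 MHz range.

```python
import numpy as np


def time_gate(signal, fs, t_start, t_stop, ramp=0.0):
    """Zero samples outside [t_start, t_stop]; optional raised-cosine edges."""
    t = np.arange(signal.size) / fs
    gate = np.zeros(signal.size)
    gate[(t >= t_start) & (t <= t_stop)] = 1.0
    if ramp > 0:
        rise = (t >= t_start) & (t < t_start + ramp)
        fall = (t > t_stop - ramp) & (t <= t_stop)
        gate[rise] = 0.5 * (1 - np.cos(np.pi * (t[rise] - t_start) / ramp))
        gate[fall] = 0.5 * (1 - np.cos(np.pi * (t_stop - t[fall]) / ramp))
    return signal * gate


fs = 50e6   # sampling rate in Hz (assumption)
f0 = 5e6    # pulse center frequency in Hz (assumption)
n = 4096
t = np.arange(n) / fs


def pulse(t0, amplitude):
    """Gaussian-windowed tone burst centered at time t0 (toy echo model)."""
    return amplitude * np.exp(-((t - t0) / 1e-6) ** 2) * np.cos(2 * np.pi * f0 * (t - t0))


raw = pulse(20e-6, 1.0) + pulse(60e-6, 2.0)  # wanted echo plus a later unwanted arrival
gated = time_gate(raw, fs, t_start=15e-6, t_stop=25e-6, ramp=1e-6)

# Move the gated trace to the frequency domain.
spectrum = np.fft.rfft(gated)
freqs = np.fft.rfftfreq(n, d=1 / fs)
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(f"spectral peak after gating: {peak_freq / 1e6:.2f} MHz")
```

The tapered edges are a design choice to limit spectral leakage from an abrupt cut; a rectangular gate corresponds to `ramp=0.0`.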
Calibration
[Figures: Before Calibration (Freq. Domain) and After Calibration (Freq. Domain), with regularization, at 0° (Broadside), 15° Angle, 30° Angle]
Application Example: Ultrasound Imaging
Freq. Range: 2.2 MHz – 8.6 MHz
128 Frequencies
128 Receivers (RX)
1 Illumination

[Figures, four slides, each comparing Numerical Scatterer and Real Scatterer reconstructions on a linear scale:
1. Line Target, 0°
2. Line Target, +30°
3. no target or angle label recovered
4. +30°, no target label recovered]
Current Problems
• Regularization of Born solvers
• Convergence problems with strong scatterers
• Limited-angle measurements
Future Plans
• Near-real-time imaging
• New & practical applications
Acknowledgments

• Carl Pearson and Prof. Wen-Mei Hwu

• Anthony Podkowa and Dr. Michael Oelze


• New program timings
• Data access index
• Basis – testing functions
