Accelerating 3D Brain MRI Analysis Using FPGAs

ACCELERATING A MEDICAL 3D BRAIN MRI ANALYSIS ALGORITHM USING A
HIGH-PERFORMANCE RECONFIGURABLE COMPUTER

Jahyun J. Koo , Alan C. Evans and Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec, Canada
McConnell Brain Imaging Center, Montreal Neurological Institute, Montreal, Quebec, Canada
email : {jkoo10,wjgross}@macs.ece.mcgill.ca, alan@bic.mni.mcgill.ca
ABSTRACT
Many automatic algorithms have been proposed for analyzing magnetic resonance imaging (MRI) data sets. These
algorithms allow clinical researchers to generate quantitative data analyses with consistently accurate results. With
the increasingly large data sets being used in brain mapping,
there has been a significant rise in the need for methods to
accelerate these algorithms, as their computation time can
consume many hours. This paper presents the results from a
recent study on implementing such quantitative analysis algorithms on High-Performance Reconfigurable Computers
(HPRCs). A brain tissue classification algorithm for MRI,
the Partial Volume Estimation (PVE), is implemented on
an SGI RASC RC100 system using the Mitrion-C HighLevel Language (HLL). The CPU-based PVE algorithm is
profiled and computationally intensive floating-point functions are implemented on FPGA-accelerators. The images
resulting from the FPGA-based algorithm are compared to
those generated by the CPU-based algorithm for verification. The Similarity Indexes (SI) for pure tissues are calculated to measure the accuracy of the images resulting from
the FPGA-based implementation. The portion of the PVE
algorithm that was implemented on hardware achieved a 11
performance improvement over the CPU-based implementation. The overall performance improvement of the FPGAaccelerated PVE algorithm was 3.5 with four FPGAs.
1. INTRODUCTION
There has been a significant increase in the need for quantitative analysis of 3D magnetic resonance imaging (MRI)
data [1]. Such analysis provides an enormous amount of
information about the anatomy of the human brain to researchers. Traditionally, MRI data sets for the human brain
were manually analyzed by researchers. However, over the
past decade, automatic analysis methods have been proposed
and developed that reduce computation time and improve
data consistency. Even though the algorithms are automatic,
some can take many hours to process a single subject due
to complexity of the algorithms. As a result, acceleration
of the algorithms is important due to the increasing use of a

very large number of data sets in brain mapping.
MRI data analysis algorithms can be accelerated using
an architecture, High-Performance Reconfigurable Computers (HPRCs) which combine high performance CPUs with
reprogrammable accelerators such as Field Programmable
Gate-Arrays (FPGAs). Using HPRC, the computationally
intensive portion of the algorithm can be implemented on
FPGAs to exploit fine-grained parallelism, while other portions are implemented on the conventional CPU. Implementing the MRI data analysis algorithm on HPRC would be a
challenging task for researchers with only a software background, as basic low-level hardware description languages
(HDLs) and hardware knowledge are required to program
FPGAs. A number of commercial high-level languages have
been proposed to help such individuals program FPGAs by
abstracting away the hardware-dependent details.
In this study the 3D MRI tissue classification algorithm,
Partial Volume Estimation (PVE) is implemented on an SGI
Altix 350 with two RASC RC100 systems using the MitrionC HLL. An overview of HPRC system and HLL used in
this study are provided in Section 2. A general overview of
the PVE algorithm is described in Section 3. In Section 4,
the design strategies used to implement the PVE algorithm
on the FPGA-accelerator are proposed. The performance
comparison and differences of output images between the
FPGA-based and CPU-based implementation are also presented in Section 4. The differences between the two implementations are discussed and future works are described in
Section 5. Conclusions are provided in Section 6.
2. HPRC SYSTEM AND HIGH-LEVEL LANGUAGE
2.1. RASC RC100 System
In this study, the SGI RASC RC100 FPGA system is used to
accelerate the selected algorithm. Two RASC RC100 systems are connected to the Altix 350 multiprocessor via a
NUMAlink interconnect network. The Altix 350 has eight
1.5GHz Intel Itanium 2 CPUs with 16GB of shared physi-
Bank 0/1
8MB 8MB
QDR QDR
SRAM SRAM
Bank 4
Bank 2/3
8MB
8MB 8MB
QDR QDR QDR
SRAM SRAM SRAM
1.6 GB/s
Algorithm FPGA 0
3.2 GB/s
Each direction
TIO Interface ASIC

NUMALink 4
(3.2 GB/s)
Loader
FPGA
Bank 0/1
8MB 8MB
QDR QDR
SRAM SRAM
Bank 4
Bank 2/3
8MB
8MB 8MB
QDR QDR QDR
SRAM SRAM SRAM
1.6 GB/s
Algorithm FPGA 1
3.2 GB/s
Each direction
TIO Interface ASIC

NUMALink 4
(3.2 GB/s)
Fig. 1. RASC RC100 hardware.

cal memory. Each RC100 system is composed of two Xilinx
Virtex-4 LX200 FPGAs, two TIO interface ASICs and a bitstream loader FPGA [2]. Each computational FPGA has five
8MB QDR SRAM DIMM memory banks, although the core
services currently provide access to 32MB per FPGA. These
banks can be accessed by the FPGA at a rate of 128-bits per
clock cycle. The architecture of a single RASC RC100 system is illustrated in Fig. 1.
The SGI RASC RC100 system provides two special features, Streaming and Wide-scaling, which help to efficiently
execute algorithms on large data sets. The streaming feature
allows the algorithm to reduce the overhead of data transfer
by overlapping data loading and unloading while the algorithm is executing. The wide-scaling feature allows the algorithm to be automatically scaled over multiple FPGAs, by
equally distributing large data sets to the available FPGAs
and processing them simultaneously. This technique, therefore provides an implementation to easily exploit coarsegrained parallelism.
2.2. Mitrion-C
Dataflow-oriented Mitrion-C is a parallel HLL with C-like
syntax. Therefore, the programmer does not need to consider hardware-based concerns such as timing and can focus
on expressing the dataflow of the algorithm. Parallelism is
expressed using explicit language constructs and can be exploited in the form of vectorization and/or pipelining, based
on the selection of data types and the parallel constructs provided by Mitrion-C [3]. Vectorization offers explicit parallelism by replicating processing elements (PEs), however
this approach rapidly consumes FPGA resources. On the
other hand, pipelining generates a single PE and uses it to
process the input data sequentially. An algorithm can use a
combinations of both architectures to exploit available hardware resources and parallelism.
Mitrion machine code, generated by the Mitrion compiler, can be used to simulate the implementation or to gen-
erate VHDL code from the Mitrion processor configurator

with the information of target FPGA architecture. During
simulation, design error and performance bottlenecks can
be identified from a graphical representation of the hardware
data flow graph created by the Mitrion debugger/simulator.
The generated VHDL code represents the Mitrion Virtual
Processor (MVP), which runs at a fixed frequency of 100
MHz. The MVP is wrapped in the RASC RC100 core services which provide memory and IO interfaces between the
FPGA external SRAMs and shared memory on the Altix 350
system. Finally, this processor is synthesized and placedand-routed to generate a bitstream.
3. PARTIAL VOLUME ESTIMATION (PVE)
Brain clinical studies are typically interested in white matter
(WM), gray matter (GM) and cerebrospinal fluid (CSF). The
process of classifying these tissue types in 3D MR images of
the human brain is made difficult due to the partial volume
effect. This effect blurs the intensity distinction of tissues at
the boundary areas as shown in Fig. 2. Voxels (3D pixels)
that demonstrate this effect are composed of a mixture of
several different tissue types and are called mixels [4]. An
algorithm, known as Partial Volume Estimation (PVE), has
been developed to estimate the proportion of each tissue type
present in a voxel more accurately.
Several different PVE algorithms have been proposed,
distinguished based on how they model the partial volume
effect. In this study, the PVE algorithm developed in [5] is
implemented on the FPGA-accelerators. It uses a statistical
model, based on Markov Random field (MRF) theory, to address the partial volume effect. This model, first proposed in
[4], assumes that the image intensity value of each mixel can
be described as a sum of weighted random variables (RVs).
Let the acquired 3D MR image be denoted by X = {xi :
i = 1, ... , N } where N is the total number of voxels. Let
j = {1, ... , M } represent the set of possible pure tissues
present in the image and lj represent the RV describing the
tissue type j. The content of a mixel, xi can there be expressed as:
M

xi =
wij lj
(1)
j=1
where the weighting terms wij , known as partial volume coefficients (PVCs), represent the fractionalamount of tissue
M
type j present in voxel i (wij [0, 1] and j=1 wij = 1). If
a partial volume context image is denoted as W = {wij : i =
1, ... , N and j = 1, ... M }, an estimate W* for the true
partial volume context image can be determined from the
given MR image X using the maximum a posteriori (MAP)
criterion [5]. The PVE algorithm is therefore divided into
two different stages. The first stage estimates the partial volume context image W*, while the second stage estimates the
between the voxel i and its neighbour voxel k. The value of

aik is determined based on the tissue relationship between
voxel i and k. Reference [6] proposes to use aik terms with
a curvature image to overcome the oversmoothing artifacts
on the boundaries between GM and CSF tissues in the deep
sulcal regions (Fig. 2). This is demonstrated in Equation 5.
WM
GM
CSF
Sulcal
Region
Voxels with Partial

Volume Effect
Fig. 2. Partial volume effect on Sulcal region.

fractional amount (PVCs) of each pure tissue type present in
each voxel.
3.1. Partial Volume (PV) classification
In the PV classification stage, each voxel is classified and
labeled as one of four pure tissue types (BG, CSF, WM and
GM) or three mixed tissue types (WMGM, GMCSF and
CSFBG). If C is the context image to be estimated and X
is the observed image, then using MAP criterion, the estimated classified image can be expressed as:
C = arg max P (C|X) = arg max P (C)
C
where
N
N
P (xi |ci )
i=1
(2)
p(xi |ci ) and P(C) are the probability densities and
i=1
the prior information for all possible tissue classes, respectively. The probability densities for mixed tissue classes can
be expressed as Equation 3 where {j, k} indicates a mixel
that
is composed by tissue type j and k, while (w) and
(w) are the mean and the covariance of the mixel, respectively.

p(xi |ci = {j, k}) =

g xi ; (w),
(w) dw
(3)
Since adjacent voxels can be assumed that they likely

have similar tissue types in data sets generated by the 1 mm
resolution MRI scanner, a MRF model with 26-neighbour
voxels is used to represent the prior information due to its
flexibility in defining the neighborhood system [4]-[6]. The
prior information can be modeled using a MRF as:

N
aik
(4)
P (C) exp
d(i, k)
i=1
kNi
where is the MRF weighting parameter, i is the index of

a voxel in the image X, N is the total number of voxels in
image X, Ni is the 26-neighbour voxels around voxel i, k
is the index of a voxel from Ni and d(i,k) is the distance
ci = ck
ci and ck share a component
/ {GM CSF, CSF } or C(i) 0
ci
aik =
f
(C(i))
c
{GM CSF, CSF } and C(i) < 0
1
otherwise
(5)
where C(i) is the value of the curvature at voxel i and f(x) is
a function, provided by [6], to calculate an appropriate MRF
weight term for the voxel in the sulcal region.
In order to estimate the context image, C* from Equation
2, the iterative conditional modes (ICM) algorithm is used.
During each iteration, the ICM technique re-estimates the
tissue labels based on the labels calculated in the previous
iteration until convergence is reached.
3.2. PVC Estimation

The fraction of each pure tissue present within a particular
voxel is calculated in the PVC estimation stage. If voxel i
is a mixel containing tissue j and k, the fractional amount
of each pure tissue within the voxel i can be calculated by
employing the maximum-likelihood principle as shown in
[5]. A simple grid search algorithm is used to solve the
maximum-likelihood PVC estimation [5].
3.3. Overview of the PVE Algorithm

An overview of the described PVE algorithm is provided in
Fig. 3. Three input images are required to perform the PVE
algorithm on a single 3D MR brain image. The T1 image
is a normal T1-weighted MR image from an MR scanner.
The curvature image contains the information of CSF in the
sulcal region. The classified image is used to calculate the
parameters of the pure tissue classes and to reduce the computation complexity by eliminating the voxels in the background. The algorithm is terminated when the total number
of changed voxels during the previous iteration is lower than
the total number of changed voxels during the current iteration or the maximum number of iterations (20) is reached.
The PVE algorithm generates four output images, such as
PVE classified, PVE CSF, PVE WM and PVE GM images.
Each voxel in the PVE classified image is labeled to the
dominant tissue type among all possible tissue types. The
PVE CSF, WM and GM images provide the proportion of
CSF, WM and GM present in each voxel, respectively.
PV
PV
Classification
classification Stage
stage
Estimate probability
densities for pure and
mixed tissue types
Table 1. PVE profiling report.

Functions
CPU-based FPGA-based
Estimating
301s (81%) 28.4s (28%)
the prior information
Estimating
49.8s (13%) 49.8s (50%)
the probability densities
PVC Estimation
3.8s (1%)
3.8s (4%)
IO functions
18.2s (5%)
18.2s (18%)
Total
372.8s
100.2s
PVE Classified
T1 image
Compute the prior

information P(C)
PVE CSF
Curvature
Termination
criteria
No
Yes
Estimate the
fractional amount of
each tissue within
each voxel
Classified
Image
Input images
PVC
estimation stage
PVE Algorithm
Two subsections within the PV classification stage
PVE WM
PVE GM
Output images
Fig. 3. PVE algorithm.

4. HARDWARE IMPLEMENTATION OF PVE
The CPU-based PVE algorithm was first profiled using GNU
gprof to identify the computationally intensive portions and
where data-parallelism could be exploited on the FPGA. The
report generated by the tool is shown in Table 1 and indicates that estimation of the prior information for all tissue
types in the PV classification stage described in Section 3.1
is the primary performance bottleneck.
4.1. Implementing PV Classification
The PV classification, with the exception of the portion that
computes the probability densities for all tissue types, was
the most ideal candidate to be implemented on RASC RC100
FPGA-accelerators over other functions, as it is performed
iteratively and consumes approximately 80% of the total
PVE algorithm computation time. To run the FPGA-based
PV classification function, three images are required: the
PVE classified image, the curvature image and an image that
contains the floating-point probability densities of different
tissue classes. Due to the limited size of the FPGA external SRAMs, all three images could not be sent to the FPGA
at once. The RASC Streaming feature has therefore been
utilized to reduce the overhead of data transfer between the
memory in Altix 350 and FPGA external SRAMs.
Since the prior information for each voxel can be computed independently, the algorithm execution is distributed
to the four available FPGA-accelerators to exploit coarsegrained parallelism. Unfortunately, the RASC Wide-scaling
feature in Section 2.1 could not be used in this algorithm be-
cause the bitstreams will be uploaded onto the FPGAs are

not identical. Therefore, new bitstreams are generated from
the design which is manually partitioned into four different
parts, such that each FPGA can work on a horizontally partitioned segment of a single slice from the 3D input image.
The PV classification of the C-kernel is reprogrammed
using the Mitrion-C HLL to generate the bitstreams for the
FPGAs. First, the hardware-implemented algorithm reads
all of the data required to process one row from the input SRAM and stores it in the internal Block RAMs. Nine
rows (three each from the front, current and back slices) of
the PVE classified image are used to determine the weighting term aik in Equation 5. In the CPU-based PVE algorithm, the PVE classified image is stored in an array of unsigned char (8-bits) data type. However, to reduce hardware resources and achieve better performance, each row element is stored in 4-bit internal Block RAMs for the FPGAaccelerated implementation because the voxels in the classified image are used to represent all possible tissue classes.
Both a single row of the curvature image and the floatingpoint probability densities for all possible tissue types are
also read and stored in the internal Block RAMs.
After all of the required data is stored in the internal
Block RAMs, the P (C) values are calculated for all tissue types. Since the P (C) values of each tissue type are
independent, they are computed in a pipelined fashion. A
aik
term in Equation 4 is
PE which generates a single d(i,k)
designed and nine of them are implemented in parallel to
compute 26 distinct terms from 26-neighbour voxels due to
the limited hardware resources, as shown in Fig. 4. Each
PE first verifies the relationship between the reference tissue
class and one of its 26-neighbour voxels and then determines
the appropriate aik term as Equation 5. The weighting term
is multiplied by the inverse of the distance between the voxel
i and its neighbour voxel k, instead of dividing, to avoid
costly floating-point division operations. The output of each
PE is accumulated using a four-level single-precision reduction tree instead of a data-dependent for-loop. A sequential loop-dependent accumulator is added at the end of the
reduction tree. The final P (C) is then generated using a
Reference
PE
tissue class
PE
Pre-calculated
P(C)P(x|c)
aik
d (i, k )
P (C )
26neighbour
PE
voxels
PE
PE
PE
EXP
a
ik
kN i d (i, k )
Compute
P(C)P(x|c) for all
possible tissue
types
Evaluate the tissue
type which has the
maximum
P(C)P(x|c) value
Table 3. Average speedup achieved from 11 test MRI.

Software Hardware-PVE
Total time
362s
100.2s
Time per itr
26.7s
2.3s
Speedup per itr
11.6
Overall speedup
3.6
New classified
tissue
PE
PE
PE
(a) CPU-based
Fig. 4. Hardware-implemented PV classification.
(b) FPGA-based
(c) Difference overlaid on the T1 image
Fig. 5. Resulting images from two implementations.

Table 2. Hardware resources (Xilinx Virtex-4 LX200).
Resource (total)
Hardware-PVE
Slices (89,088)
77,954 (87%)
Flip Flops (178,176)
82,159 (46%)
4-inputs LUTs (178,176) 96,479 (54%)
Multipliers (96 1818)
65 (67%)
18kb Block RAMs (336)
37 (11%)
single-precision floating-point exponent operation as shown

in Fig. 4.
The generated P (C)s are multiplied by the corresponding single-precision tissue probability densities in a pipelined
fashion and the tissue type which has the maximum P (C) is
determined. Finally, the voxel i is re-classified as the determined tissue type and is written to the output RAM.
4.2. Results
The original CPU-based PVE algorithm and the host program of the FPGA-based implementation are developed in
ANSI-C and run on a single 1.5 GHz Itanium 2. The FPGAimplemented sections are developed using SGI RASC Library 2.1 and Mitrion-C 1.2. Xilinx ISE 8.2i is used to synthesize and place-and-route the implementations in Section
4.1. The average time spent for the synthesis and placeand-route for the implementation is approximately 18 hours.
The resources consumed by each implementation are given
in Table 2.
The FPGA-based PVE algorithm is performed on several real human brain MR images to measure the performance improvement. An image, colin27, is generated by
averaging the 27 distinct MR scans of a single subject to
increase the contrast between tissues and to minimize undesired noise [7]. Moreover, ten normal human images from
the International Consortium for Brain Mapping (ICBM)
database are randomly selected to compare the execution
time spent by the CPU-based and FPGA-based implementation. Each image is composed of 181 slices and each slice
contains 39277 of 1 mm3 voxels (217 181). The average
execution times and speedup obtained by both implementations on these subjects are shown in Table 3. To allow for
a fair comparison, the hardware execution time includes all
overhead associated with the preparation of the input buffer,
sending of data, running of bitstream and receiving of data.
The very same colin27 and ten ICBM images are used to
verify the accuracy of the resulting images from the FPGAimplemented PVE algorithm. A single slice of the resulting
image from each implementation is shown in Fig. 5(a) and
Fig. 5(b). The mismatched 1 mm3 voxels between the two
implementations are represented as black dots and overlaid
on the T1 image in Fig. 5(c). For a more accurate measure of
the differences, the Similarity Index (SI) is calculated over
the whole volume. The SI is derived from a reliability measure known as the kappa statistic [8] and defined as:
SI = 2
n{ASW AHW }
n{ASW } + n{AHW }
(6)
where n{A} represents the number of elements in set A

while ASW and AHW are the set of voxels classified as the
tissue type A resulting from the software and hardware implementation, respectively. If the two results are identical,
its SI is 1. If there is no common elements between the results, its SI becomes 0. The evaluated SI for each pure tissue
class is provided in Table 4.
Table 4. Similarity Index (SI).

Tissue colin27 ICBM
CSF
0.994
0.995
WM
0.999
0.999
GM
0.998
0.997
rithm is expected to be increased by a factor of 8 since the

number of voxels in an image is increased by a factor of 8.
With some minor memory structure modifications and new
partitioning strategies, the proposed implementation can be
performed on the 0.5 mm brain MRI data sets.
6. CONCLUSION
5. DISCUSSION
The hardware PVE algorithm was profiled again and the new
report indicated that estimation of the probability densities
is the new performance bottleneck as shown in Table 1. The
percentage of the total time spent for this function increased
from 13% to 50%. As a result, although the average speedup
of the portions implemented on the FPGAs achieved 11.6
per iteration over the CPU-based parts, the actual overall
speedup only varied from 3.5 to 4.2 since different number of iterations required for each subject (average 3.6).
To achieve better overall speedup over the CPU-based implementation, a new design is currently under development,
which will also implement the function that estimates the
probability densities on the FPGA-accelerators.
The mismatched voxels overlaid on the T1 image in Fig.
5(c) and the calculated SI values indicate that the results
from the two implementations were not identical. However,
the difference was very slight and would not cause any significant errors in clinical studies. These mismatches were
due to the different strategies used for updating the estimated
tissue type between the two implementations. In the CPUbased algorithm, the new classified tissue type for a voxel
is instantly stored into memory. The 26-neighbour voxels
of the current voxel are also instantly updated as they are
changed. However, the FPGA-based implementation first
stores the updated tissue types in the FPGA external SRAMs
and then transfers them to the memory on the Altix 350
system when the algorithm has processed all voxels. As a
result, the newly updated voxels were computed from the
26-neighbour voxels generated during the previous iteration.
The updating scheme of the hardware-implemented algorithm was necessary to fully exploit the parallelism offered
by FPGAs. Since there were no data-dependencies between
voxels, all voxels were computed in a pipelined fashion.
While double-precision floating-point was used to calculate P (C)P (x|c) terms during the PV classification in the
CPU-based PVE algorithm, single-precision floating-point
was used in the FPGA-based implementation to reduce the
required hardware resources. This difference did not cause
any errors in the output images because the minimal precision error will not degrade determining the tissue class that
generates the maximum P (C)P (x|c) term.
As the resolution of MR imaging increases from 1 mm
to 0.5 mm, the computation time of the proposed PVE algo-
In the described study, a quantitative analysis algorithm for

3D MR imaging was accelerated using HPRC. The computationally intensive portions of the PVE algorithm were implemented on RASC RC100 FPGA-accelerators using the
fine-grained optimization techniques provided by MitrionC HLL. The CPU-based and FPGA-based PVE algorithms
were used to process on several real human brain MR imaging data set. The images generated by the FPGA-based algorithm were then compared to those generated by the CPUbased algorithm as a means of verification. Moreover, the
SI values for pure tissue classes were calculated to measure
the accuracy of the resulting images. The PV classification function implemented on four FPGAs achieved approximately 11 speedup over the CPU-based implementation.
The overall performance improvement of the FPGA-based
PVE algorithm was approximately 3.5 with four FPGAs.
7. REFERENCES
[1] A. P. Zijdenbos, R. Forghani, and A. C. Evans, Automatic
pipeline analysis of 3-d mri data for clinical trials: application to multiple sclerosis, IEEE Transactions on Medical
Imaging, vol. 21, no. 10, pp. 1280 1291, Oct 2002.
[2] Silicon graphics Inc., Reconfigurable Application-specific
Computing Users Guide, 007th ed., 2006.
[3] S. Mohl, The Mitrion-C Programming Language, Mitrionics
Inc.
[4] H. S. Choi, D. R. Haynor, and Y. Kim, Partial volume tissue classification of multichannel magneticresonance imagesa mixel model, IEEE Transactions on Medical Imaging,
vol. 10, no. 3, pp. 395407, Sep 1991.
[5] J. Tohka, A. Zijdenbos, and A. Evans, Fast and robust parameter estimation for statistical partial volume models in brain
mri, NeuroImage, vol. 23, no. 1, pp. 8497, September 2004.
[6] V. Singh, Use of a non-stationary markov random field in
brain tissue partial volume segmentation, masters thesis,
Dept. of Electrical and Computer Eng, McGill University, Feb
2005.
[7] C. J. Holmes, R. Hoge, L. Collins, R. Woods, A. W. Toga, and
A. C. Evans, Enhancement of mr images using registration for
signal averaging, Journal of computer assisted tomography,
vol. 22, no. 2, pp. 324333, Mar 1998.
[8] A. P. Zijdenbos, B. M. Dawant, R. A. Margolin, and A. C.
Palmer, Morphometric analysis of white matter lesions in mr
images: Method and validation, IEEE Transactions on Medical Imaging, vol. 13, no. 4, pp. 716724, Dec 1994.

Accelerating 3D Brain MRI Analysis Using FPGAs

Transféré par

Informations du document

Description originale:

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Accelerating 3D Brain MRI Analysis Using FPGAs

Transféré par

Droits d'auteur :

Formats disponibles

ACCELERATING A MEDICAL 3D BRAIN MRI ANALYSIS ALGORITHM USING A

HIGH-PERFORMANCE RECONFIGURABLE COMPUTER

of the algorithms is important due to the increasing use of a

TIO Interface ASIC

TIO Interface ASIC

Fig. 1. RASC RC100 hardware.

erate VHDL code from the Mitrion processor configurator

between the voxel i and its neighbour voxel k. The value of

Voxels with Partial

Fig. 2. Partial volume effect on Sulcal region.

p(xi |ci ) and P(C) are the probability densities and

Since adjacent voxels can be assumed that they likely

where is the MRF weighting parameter, i is the index of

{GM CSF, CSF } and C(i) < 0

3.2. PVC Estimation

3.3. Overview of the PVE Algorithm

Table 1. PVE profiling report.

Compute the prior

Two subsections within the PV classification stage

Fig. 3. PVE algorithm.

cause the bitstreams will be uploaded onto the FPGAs are

Table 3. Average speedup achieved from 11 test MRI.

Fig. 4. Hardware-implemented PV classification.

(c) Difference overlaid on the T1 image

Fig. 5. Resulting images from two implementations.

single-precision floating-point exponent operation as shown

where n{A} represents the number of elements in set A

Table 4. Similarity Index (SI).

rithm is expected to be increased by a factor of 8 since the

In the described study, a quantitative analysis algorithm for

Vous aimerez peut-être aussi