
Information Sciences 432 (2018) 516–529


A novel multi-modality image fusion method based on image decomposition and sparse representation

Zhiqin Zhu a,b, Hongpeng Yin a,b,∗, Yi Chai b, Yanxia Li a,b, Guanqiu Qi b,c

a Key Laboratory of Dependable Service Computing in Cyber Physical Society of Ministry of Education, Chongqing University, Chongqing 400030, China
b College of Automation, Chongqing University, Chongqing 400044, China
c School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, Arizona 85287, USA
∗ Corresponding author at: Key Laboratory of Dependable Service Computing in Cyber Physical Society of Ministry of Education, Chongqing University, Chongqing 400030, China. Tel.: +8618623081817. E-mail addresses: zhiqinzu@outlook.com (Z. Zhu), yinhongpeng@gmail.com (H. Yin), cqchaiyi@cqu.edu.cn (Y. Chai), liyanxia106@gmail.com (Y. Li), guanqiuq@asu.edu (G. Qi).

ARTICLE INFO

Article history:
Received 14 December 2016
Revised 1 September 2017
Accepted 3 September 2017
Available online 6 September 2017

Keywords:
Sparse representation
Dictionary construction
Multi-modality image fusion
Cartoon-texture decomposition

ABSTRACT

Multi-modality image fusion is an effective technique for fusing the complementary information of multi-modality images into a single integrated image. The additional information not only enhances visibility to human eyes but also mutually complements the limitations of each image. To preserve the structure information and present the detailed information of the source images, a novel image fusion scheme based on image cartoon-texture decomposition and sparse representation is proposed. In the proposed method, the source multi-modality images are decomposed into cartoon and texture components. For the cartoon components, a spatial-domain method with an energy-based fusion rule is presented to preserve the morphological structure information of each source image. For the texture components, a sparse-representation based method is proposed, for which a dictionary with strong representation ability is trained. Finally, the fused cartoon and texture components are integrated according to a texture-enhancement fusion rule. Experimental results clearly show that the proposed method outperforms state-of-the-art methods in terms of both visual and quantitative evaluations.

© 2017 Elsevier Inc. All rights reserved.

1. Introduction

Multi-modality image fusion combines the complementary information from multi-modality sensors to enhance visibility to human eyes or to mutually complement the limitations of each image. Diverse modalities of images, such as infrared-visible images, multi-focus images, and medical images, are utilized in visual surveillance systems for visibility enhancement and better situation awareness [12,17]. A large amount of research effort has been devoted to improving image fusion performance in the last two decades. Spatial-domain and transform-domain fusion are two typical branches of image fusion.
Spatial-domain based methods directly choose clear pixels, blocks, or regions of the source images to compose a fused image without any transformation [14]. Based on a measurement of image clarity, the pixels or regions with higher clarity are selected to construct the fused image. To reduce such constraints, averaging and max-pixel schemes operate on single pixels to generate the fused image. However, the contrast and edge intensity of the fused image may decrease. In general, spatial-domain methods may lead to blurred edges, reduced contrast, and loss of sharpness [25]. Some methods, such as block-based and region-based algorithms [13], have been proposed to improve the quality of the fused image. Although block-based algorithms improve the detailed expression of the fused image, the sharpness of the fused image may still be undesirable, and block-based algorithms may cause block effects when applied in spatial-domain methods.
Transform-domain based methods first use a transform tool to decompose the source images into coefficients and transform bases. The coefficients are then fused by diverse fusion rules for different applications. Finally, the fused image is obtained by inversely transforming the fused coefficients and transform bases. Multi-scale transform (MST) is one of the most popular fusion techniques in multi-modal image fusion. Starting with the Discrete Wavelet Transform (DWT) [28], a variety of transforms including the Dual-Tree Complex Wavelet Transform (DT-CWT) [11], Curvelet Transform (CVT) [27], Shearlet Transform [37] and Non-Subsampled Contourlet Transform (NSCT) [15] have been used in multi-modal image fusion. Although transform coefficients can reasonably represent the important features of an image, each transform has its own merits and limitations depending on the context of the input images [42]. Thus, selecting an optimal transform basis is not a trivial problem, as it relies on scene contexts and applications [17].
In recent years, sparse representation (SR) has been successfully applied to image classification [2,19], image super-resolution [44], image recognition [21], image feature extraction [20], image deblurring [30], image object recognition [18,20] and multi-modality information fusion [47]. As a transform-based method, SR was first applied to image fusion by Li and Yang [41], who used the DCT transform to build the dictionary and proposed an SR-based fusion framework. A medical image fusion and de-noising method based on group sparse representation was introduced in [17]; however, this method was not tested on color medical images. Yang and Liu [42] proposed several kinds of mathematical models that were utilized to construct hybrid dictionaries for image fusion. The hybrid dictionaries can reflect several specific structures well, but still lack the adaptability to represent different types of images. For this reason, learning-based adaptive dictionaries were introduced for SR-based image fusion [36]. The KSVD-based method is the most widely used adaptive dictionary construction method for SR-based image fusion [26,45]. Multi-focus image fusion methods based on KSVD were proposed by Yin [45] and Nejati [26], and showed state-of-the-art performance. Yin [47] also proposed a KSVD-based multi-modality medical image fusion method, which enhances the representation of image details. Nonparametric Bayesian adaptive dictionary learning was proposed in [40] for remote-sensing image fusion.
Although SR achieves great performance in image fusion, it still has two limitations in multi-modality image fusion. The first limitation is that the commonly used Max-L1 sparse coefficient fusion rule may cause spatial inconsistency in the integrated multi-modality images [22]. The second limitation is that a single, generically trained dictionary cannot accurately reflect the complex structures of the input images [17].
For the first limitation, decomposing the source images into high- and low-frequency components is the most common solution. The Gaussian filter, as a commonly used filter, has been applied to image decomposition in an SR-based image fusion method [10], in which SR with the Max-L1 rule is used to fuse the high-frequency components so that more detail is retained. A multi-scale transform (MST) filter [22] has also been used in an SR-based image fusion method. However, the MST-based filter [22] shows limitations in decomposing specific kinds of images, and the Gaussian filter [10] also has limited ability to separate the detailed and structural information of the input images.
For the second limitation, Kim [10] first clustered training samples into a few structural groups by the k-means method and then trained a specific sub-dictionary for each group. In this way, each sub-dictionary fits a particular structure and the whole dictionary has strong representation ability. Similarly, Wang et al. separately constructed spectral and spatial dictionaries for the fusion of multi-spectral and panchromatic images [39]. However, Kim's method requires the number of clusters to be set before clustering, and Wang's method can only be used for remote-sensing image fusion.
In this paper, a novel image fusion method based on image decomposition and sparse representation is proposed to address the two limitations mentioned above. For the second limitation, a compact and informative dictionary learning method for texture component fusion is proposed. In the dictionary learning process, the input image pixels are clustered into a few groups for sub-dictionary learning, and the sub-dictionary of each pixel group fits the particular structures of different image features. To cluster image pixels with different features, the local regression weight of the steering kernel (SKR) [33] is adopted. As a sophisticated and effective feature, the SKR feature can reflect local image structures effectively even in the presence of noise [10]; it is therefore used as the pixel feature for clustering. For image pixel clustering, the local density peaks (LDP) clustering method [1] is implemented, which can group image pixels without setting the number of clusters in advance. A few compact sub-dictionaries are then trained by extracting the underlying information of each pixel cluster.
For the first limitation, a cartoon-texture based image fusion framework is proposed. According to the properties of the cartoon and texture contents, proper fusion methods are constructed to fuse the cartoon and texture components respectively. For the texture components, an SR-based method is used; for the cartoon components, an energy-based fusion rule is implemented to preserve the geometric structure information of all the source images. A gradient-information based method is proposed to integrate the cartoon and texture components.
The main contributions can be summarized as follows:

1. We propose a novel dictionary training method to enhance the sparse-representation ability. In the proposed method, image pixels are clustered and a sub-dictionary is trained for each cluster. The sub-dictionaries are combined to construct the final dictionary.

Fig. 1. The proposed fusion framework.

2. We propose proper fusion rules to fuse the cartoon and texture components for structure and detail preservation. For the cartoon components, an energy-based spatial fusion method is proposed to preserve structure information. For the texture components, an SR-based fusion method is implemented to preserve the details of the source images.
3. We propose a novel combination rule for the cartoon and texture components to improve the texture information of the fused image. The proposed combination rule estimates the gradient information of the texture components and uses it as a reference for enhancing the texture information in the fused image.

The rest of this paper is structured as follows: Section 2 presents the proposed framework; Section 3 simulates the
proposed solutions and analyzes experiment results; and Section 4 concludes this paper.

2. The novel multi-modality image fusion method

Fig. 1 shows the proposed cartoon-texture decomposition based image fusion framework. In the proposed framework, the source images are first decomposed into cartoon and texture components; a total-variation based method is implemented to obtain them. The decomposed cartoon components are fused by a spatial-domain fusion rule, which consists of two parts: proportion calculation and spatial fusion. The proportion calculation rule uses the Sum-Modified-Laplacian (SML) of the cartoon components. The texture components are fused by an SR-based method composed of three steps: dictionary learning, sparse coding, and sparse coefficient fusion. In the dictionary learning process, image pixels are clustered based on morphological similarity, and sub-dictionaries are then trained for the clusters using a PCA-based dictionary learning method. Finally, the fused cartoon and texture components are combined to generate the integrated image by the proposed integration rule. In Fig. 1, the two combination operators denote the cartoon-texture component integration rule and the sparse coefficient reconstruction, respectively.

2.1. Image cartoon-texture decomposition

Cartoon-texture decomposition decomposes an image into a texture component and a cartoon component, which mainly describe the detailed information and the structure information, respectively [24]. In the proposed fusion framework, the input images are first decomposed into cartoon and texture components, and the two kinds of components are then fused by appropriate fusion rules. In this work, the Vese–Osher (VO) model [35] is implemented for image decomposition. The VO model is shown in Eq. (1):

\inf_{u,\vec{g}} VO_p(u, \vec{g}) = |u|_{TV} + \lambda \left\| f - u - \mathrm{div}\,\vec{g} \right\|_{L^2}^2 + \mu \left\| \vec{g} \right\|_{L^p},   (1)

where \vec{g} = (g_1, g_2) is a vector in the space G used to represent digital images, \lambda and \mu are regularization parameters, u represents the cartoon component of the image, and f is the input image. \|\vec{g}\|_{L^p} denotes the L^p norm of \vec{g}, which is calculated by Eq. (2):

\left\| \vec{g} \right\|_{L^p} = \left( \int \left( \sqrt{g_1^2 + g_2^2} \right)^p dx\, dy \right)^{1/p}.   (2)

This model can quickly compute the cartoon component u of image f by setting p between 1 and 10 [35]. Once the cartoon component u is calculated, the texture component v is simply obtained as v = f - u.
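The VO model requires an iterative variational solver; as a hedged illustration of the decomposition step only, the sketch below substitutes a total-variation smoother (scikit-image's denoise_tv_chambolle) for the VO model, so that u is a piecewise-smooth cartoon approximation and v = f - u is the texture residual. The function name and the weight value are our assumptions, not part of the paper.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def cartoon_texture_split(f, weight=0.15):
    """Approximate cartoon-texture decomposition (stand-in for the VO model).

    f      : 2-D float image in [0, 1]
    weight : TV regularization strength (assumed value, tune per image)
    Returns (u, v), with u the cartoon (piecewise-smooth) part and
    v = f - u the texture/detail residual.
    """
    f = f.astype(np.float64)
    u = denoise_tv_chambolle(f, weight=weight)  # TV smoothing keeps edges, removes texture
    v = f - u                                   # texture component as the residual
    return u, v
```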

2.2. Fusion rule for cartoon component

For the fusion of the cartoon components, the fundamental principle is to preserve the geometric structure information of the source images. An energy-based fusion rule is proposed for this purpose. In this work, the energy of the cartoon components is estimated by the SML, which reflects the energy of local image information [5]. The SML is defined in Eqs. (3) and (4). The modified Laplacian ML(i, j) of each pixel of image u is defined as Eq. (3):

ML(i, j) = |2u(i, j) - u(i - step, j) - u(i + step, j)| + |2u(i, j) - u(i, j - step) - u(i, j + step)|,   (3)

where i and j are the spatial coordinates of the image pixel and u is the cartoon component of each image. The SML is defined as Eq. (4):

SML(i, j) = \sum_{p=-P}^{P} \sum_{q=-Q}^{Q} [ML(i + p, j + q)]^2,   (4)

where P and Q are the parameters that determine the window size (2P + 1) \times (2Q + 1). In this work, P and Q are both set to 2. The energy of an image cartoon component is defined as N = \sum_{i,j} \| SML(i, j) \|_1, where \| \cdot \|_1 is the L1 norm. Supposing there are n source images, the energy-based fusion rule for the cartoon components is constructed by Eq. (5):

u_f = \sum_{i=1,2,\ldots,n} \frac{N_i}{N_1 + N_2 + \cdots + N_n} u_i,   (5)

where u_f is the fused cartoon component and u_i is the cartoon component of the ith source image.
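The following is a minimal sketch of Eqs. (3)-(5), assuming step = 1 and the P = Q = 2 window stated above; the function names and the border handling (edge replication, windowed mean times window size) are our choices, not the authors'.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def modified_laplacian(u, step=1):
    """ML(i, j) of Eq. (3), computed with edge-replicated shifts."""
    pad = np.pad(u, step, mode='edge')
    c = pad[step:-step, step:-step]
    up, down = pad[:-2 * step, step:-step], pad[2 * step:, step:-step]
    left, right = pad[step:-step, :-2 * step], pad[step:-step, 2 * step:]
    return np.abs(2 * c - up - down) + np.abs(2 * c - left - right)

def sml(u, P=2, Q=2, step=1):
    """SML(i, j) of Eq. (4): windowed sum of squared ML values."""
    ml2 = modified_laplacian(u, step) ** 2
    win = (2 * P + 1, 2 * Q + 1)
    # uniform_filter gives the windowed mean; multiply back to get the windowed sum
    return uniform_filter(ml2, size=win) * (win[0] * win[1])

def fuse_cartoons(cartoons):
    """Energy-weighted cartoon fusion of Eq. (5)."""
    energies = np.array([sml(u).sum() for u in cartoons])   # N_i per source image
    weights = energies / energies.sum()
    return sum(w * u for w, u in zip(weights, cartoons))
```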

2.3. Dictionary learning for texture component fusion

Texture components mainly describe the detailed information of a natural image. Thus, the detailed information of the fused texture component should be preserved as completely as possible. Since SR-based fusion with a trained dictionary shows remarkable improvement in the detail of the integrated image [10,36,45], it is implemented for the fusion of the texture components. Dictionary learning is the key to SR-based fusion, and the proposed dictionary learning method consists of two steps. First, the SKR feature of each image pixel is calculated as its clustering feature; using the calculated features, the image pixels are clustered into a few groups. Second, PCA is conducted to extract a sub-dictionary from each group. The final dictionary used for sparse representation is constructed from these sub-dictionaries.

2.3.1. Image pixels clustering


In the proposed solution, image pixels are clustered into a few groups according to their image features for specific sub-dictionary learning. The proposed clustering method uses the SKR feature for clustering. Traditional clustering methods do not usually take local image structures into consideration [3]. The SKR feature is a sophisticated feature that reflects local image structures effectively [10]. The SKR feature vector s_i^l of the ith pixel of the lth source image can be defined with the steering covariance matrix as Eq. (6):

s_i^l = \left[ s_{i1}^l, s_{i2}^l, \ldots, s_{im}^l \right], \quad s_{ij}^l = \frac{\sqrt{\det(C_j)}}{2\pi h_{steer}^2} \exp\left( -\frac{(x_i^l - x_j^l)^T C_j (x_i^l - x_j^l)}{2 h_{steer}^2} \right),   (6)

where m denotes the total number of pixels in the kernel and hence also the length of the feature vector s_i^l, x_i^l represents the spatial location of the ith pixel of the lth source image, h_{steer} is a global smoothing parameter that adjusts the support of the kernel [33], and C_j is the steering covariance matrix estimated from the local gradients of the neighboring patch centered at the jth pixel.
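A simplified sketch of Eq. (6) follows. It assumes the steering covariance C_j is estimated from a box-averaged outer product of local gradients with a small regularizer; the window size, the smoothing parameter h_steer and the regularizer are assumed values rather than the settings used in [33] or in this paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def steering_covariances(img, win=5, reg=1e-2):
    """Per-pixel 2x2 steering covariance estimated from local gradients."""
    gy, gx = np.gradient(img.astype(np.float64))
    gxx = uniform_filter(gx * gx, size=win) + reg   # regularized gradient covariance
    gxy = uniform_filter(gx * gy, size=win)
    gyy = uniform_filter(gy * gy, size=win) + reg
    C = np.empty(img.shape + (2, 2))
    C[..., 0, 0], C[..., 0, 1] = gxx, gxy
    C[..., 1, 0], C[..., 1, 1] = gxy, gyy
    return C

def skr_feature(img, C, i, j, radius=2, h_steer=2.4):
    """Feature vector of pixel (i, j): steering-kernel weights, cf. Eq. (6),
    against every pixel in its (2*radius+1)^2 neighborhood."""
    H, W = img.shape
    feats = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y = int(np.clip(i + dy, 0, H - 1))
            x = int(np.clip(j + dx, 0, W - 1))
            Cj = C[y, x]
            d = np.array([i - y, j - x], dtype=np.float64)
            w = np.sqrt(np.linalg.det(Cj)) / (2 * np.pi * h_steer ** 2) \
                * np.exp(-d @ Cj @ d / (2 * h_steer ** 2))
            feats.append(w)
    return np.array(feats)
```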

Fig. 2. LDP clustering for image pixels.

By using the SKR features of the pixels from all source images as the input of the LDP clustering algorithm [1], the proposed pixel clustering method can group pixels without presetting the number of clusters. As the essential step of LDP clustering, the cluster centers are selected by Algorithm 1.

Algorithm 1 Image pixel clustering.

Input: SKR feature s_i^l of each pixel p_i^l
Output: Selected cluster centers
1: Calculate the SKR feature distance between each pair of pixels: d_{lk}^{ij} = \| s_i^l - s_j^k \|_2;
2: Calculate the local density of the ith pixel of the lth source image: \rho_l^i = \sum_{k=1,2,\ldots,n} \sum_{j=1,2,\ldots,w} \chi( d_{lk}^{ij} - d_c );
3: Calculate the minimum distance between the ith pixel of the lth image and any other pixel with higher density: \delta_l^i = \min_{(k,j): \rho_k^j > \rho_l^i} ( d_{lk}^{ij} );
4: for l = 1 to x do
5:    for i = 1 to z do
6:       if \delta_l^i \gg \delta_{most} and \rho_l^i > \rho_{most} then
7:          select p_i^l as a cluster center
8:       end if
9:    end for
10: end for

First, the SKR feature distances are calculated. k and l index the images, and j and i index the pixels. \| \cdot \|_2 is the L2 norm. d_{lk}^{ij} is the SKR feature distance between the ith pixel of the lth source image and the jth pixel of the kth source image. The SKR feature distance vector is d_l^i = [d_{l1}^i, d_{l2}^i, \ldots, d_{ln}^i], where d_{lk}^i = [d_{lk}^{i1}, d_{lk}^{i2}, \ldots, d_{lk}^{iw}] collects the SKR feature distances between the ith pixel of the lth source image and the pixels of the kth source image.
Second, the local density \rho_l^i of the ith pixel of the lth source image is calculated. n is the total number of input images and w is the number of pixels in each image. \chi(x) = 1 if x < 0 and \chi(x) = 0 otherwise, and d_c is a cutoff distance. Basically, \rho_l^i equals the number of pixels whose feature distance to the ith pixel of the lth source image is less than d_c. x and z in Algorithm 1 are the total number of source images and the number of pixels in each source image, respectively. In practice, the algorithm is only sensitive to the relative magnitude of \rho_l^i across different points, which implies that the results are robust to the choice of d_c for large datasets.
Third, the minimum distance \delta_l^i between the ith pixel of the lth image and any other pixel with higher density is calculated. In practice, a few pixels have a much larger \delta_l^i than the typical nearest-neighbor distance; these pixels are local or global maxima of the density [1].
Finally, the cluster centers are recognized as the pixels with anomalously large relative distance \delta and relatively large density \rho, where \rho_{most} and \delta_{most} denote the typical values of \rho and \delta. As shown in Fig. 2, the \delta and \rho values of the image pixels are plotted in (b) as a decision graph; the pixels with anomalously large \delta and large local density \rho, labeled by squares, are the cluster centers.
Once the cluster centers are selected, the remaining pixels are merged to their nearest cluster centers according to the SKR feature distance, as sketched below.
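The sketch below covers the density-peak steps of Algorithm 1; it operates on a feature matrix whose rows are the pooled SKR feature vectors of all pixels. The cutoff distance and the quantile thresholds standing in for rho_most and delta_most are assumed heuristics, not values given in the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def ldp_cluster(features, dc_quantile=0.02, rho_q=0.6, delta_q=0.98):
    """Density-peaks clustering sketch on SKR feature rows.

    features : (N, m) array, one SKR feature vector per pixel.
    Returns (labels, center_indices).
    """
    d = squareform(pdist(features))                  # pairwise feature distances
    dc = np.quantile(d[d > 0], dc_quantile)          # cutoff distance d_c (heuristic)
    rho = (d < dc).sum(axis=1) - 1                   # local density rho_i (exclude self)
    # delta_i: distance to the nearest point of higher density
    delta = np.empty(len(rho))
    order = np.argsort(-rho)                         # indices by decreasing density
    delta[order[0]] = d[order[0]].max()              # densest point gets the max distance
    for k, i in enumerate(order[1:], start=1):
        delta[i] = d[i, order[:k]].min()
    # centers: anomalously large delta AND relatively large rho
    centers = np.where((delta > np.quantile(delta, delta_q)) &
                       (rho > np.quantile(rho, rho_q)))[0]
    if centers.size == 0:                            # fallback: at least one center
        centers = np.array([int(np.argmax(rho))])
    # merge the remaining pixels to their nearest center in feature space
    labels = np.argmin(d[:, centers], axis=1)
    return labels, centers
```

Unlike classic density-peaks clustering, the final assignment here follows the text above and merges each remaining pixel to the nearest selected center by SKR feature distance.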

2.3.2. Sub-dictionary training and dictionary construction


After the image pixels are clustered into a few groups, principal component analysis (PCA) is conducted to obtain the sub-dictionaries. The PCA-based sub-dictionary learning method reduces redundant information and builds compact and informative sub-dictionaries [4]. Since the clustered image groups contain different image features [48], a specific sub-dictionary is trained for each group to fit the particular structures of those features. In this work, the top n most informative PCA bases are chosen to form the sub-dictionary of each pixel cluster. The PCA bases are chosen by Eq. (7):

S_i = [s_{i1}, s_{i2}, \ldots, s_{in}], \quad \text{s.t.} \quad n = \arg\max_n \left\{ \sum_{j=n+1}^{w} L_j > \tau \right\},   (7)

where S_i denotes the sub-dictionary of the ith cluster, which consists of n eigenvectors, and L_j is the eigenvalue corresponding to the jth eigenvector s_{ij}. The eigenvalues are sorted in descending order (L_1 > L_2 > \cdots > L_w > 0). \tau is used to control the amount of approximation with rank n. In the proposed sub-dictionary learning method, \tau is set to 0.9, which keeps the sub-dictionary informative while avoiding data noise. Once the sub-dictionaries are obtained, the dictionary D for sparse representation is constructed by Eq. (8):

D = [S_1, S_2, \ldots, S_o],   (8)

where o is the number of image pixel groups.
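The sketch below illustrates Eqs. (7) and (8) under one reading of the selection rule, namely keeping the smallest number of leading eigenvectors whose cumulative eigenvalue energy reaches tau; that reading and the patch handling are our assumptions, not the paper's exact procedure.

```python
import numpy as np

def pca_sub_dictionary(patches, tau=0.9):
    """Sub-dictionary S_i of one pixel cluster (cf. Eq. (7)).

    patches : (num_patches, patch_dim) array of vectorized patches from the cluster.
    Keeps the smallest number of leading eigenvectors whose cumulative
    eigenvalue energy reaches tau (one reading of the selection rule).
    """
    X = patches - patches.mean(axis=0, keepdims=True)
    cov = X.T @ X / max(len(X) - 1, 1)
    vals, vecs = np.linalg.eigh(cov)                 # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]           # sort in descending order
    vals = np.clip(vals, 0.0, None)
    cum = np.cumsum(vals) / vals.sum()
    n = int(np.searchsorted(cum, tau)) + 1           # smallest n with cumulative energy >= tau
    return vecs[:, :n]                               # columns are the sub-dictionary atoms

def build_dictionary(cluster_patches, tau=0.9):
    """D = [S_1, S_2, ..., S_o] of Eq. (8): concatenate all sub-dictionaries."""
    return np.concatenate([pca_sub_dictionary(p, tau) for p in cluster_patches], axis=1)
```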

2.4. SR-based fusion method for texture components

In order to fuse the texture components of the source images accurately, an SR-based fusion method consisting of four steps is implemented.
First, the texture components of the input images are split into image patches B = [b_1, b_2, \ldots, b_n], where n is the number of texture-component patches.
Second, the texture-component patches are coded into sparse coefficients Z = [z_1, z_2, \ldots, z_n] using the trained dictionary and the OMP algorithm [31].
Third, the Max-L1 fusion rule [41] is conducted to integrate the coefficients of the different source images:

z^f = \sum_{k=1}^{m} z_k \cdot O^k, \quad \text{where} \quad O^k = \begin{cases} 1, & \max( \|z_1\|_1, \|z_2\|_1, \ldots, \|z_m\|_1 ) = \|z_k\|_1 \\ 0, & \text{otherwise} \end{cases}.   (9)

In Eq. (9), z_k is the sparse coefficient vector corresponding to image patch b_k, z^f is the fused coefficient vector, and \| \cdot \|_1 denotes the L1 norm.
In the last step, the fused coefficients are used to reconstruct the fused texture-component patches, which are then combined to obtain the fused texture component v_f.
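A minimal sketch of the four steps is shown below, using scikit-learn's orthogonal_mp for sparse coding and simple averaging of overlapping patches for reconstruction. The patch size, stride (overlap of 6) and error tolerance echo the settings reported in Section 3.1, but the reconstruction details and the exact meaning of the tolerance parameter are our assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def extract_patches(v, size=8, stride=2):
    """Vectorize sliding patches (stride 2 gives the overlap of 6 used in Section 3.1)."""
    H, W = v.shape
    patches, coords = [], []
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            patches.append(v[i:i + size, j:j + size].ravel())
            coords.append((i, j))
    return np.array(patches).T, coords               # (size*size, num_patches)

def fuse_textures(textures, D, size=8, stride=2, tol=0.14):
    """SR-based texture fusion sketch: OMP coding + Max-L1 rule (Eq. (9))."""
    coeffs, coords = [], None
    for v in textures:
        P, coords = extract_patches(v, size, stride)
        # tol is sklearn's squared-residual threshold, used here as an
        # approximation of the error tolerance quoted in Section 3.1
        coeffs.append(orthogonal_mp(D, P, tol=tol))  # shape (num_atoms, num_patches)
    Z = np.stack(coeffs)                             # (num_images, num_atoms, num_patches)
    winner = np.abs(Z).sum(axis=1).argmax(axis=0)    # Max-L1 winner per patch position
    Z_fused = np.zeros_like(Z[0])
    for p in range(Z.shape[2]):
        Z_fused[:, p] = Z[winner[p], :, p]
    rec = D @ Z_fused                                # reconstructed fused patches
    H, W = textures[0].shape
    out, weight = np.zeros((H, W)), np.zeros((H, W))
    for k, (i, j) in enumerate(coords):              # average the overlapping patches
        out[i:i + size, j:j + size] += rec[:, k].reshape(size, size)
        weight[i:i + size, j:j + size] += 1
    return out / np.maximum(weight, 1)
```

In practice the dictionary atoms should be unit-norm (the PCA eigenvectors above already are), which keeps the L1 comparison in the Max-L1 rule meaningful.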

2.5. Fusion rule for cartoon and texture components

In image processing, gradient information has been widely used in texture analysis [9,34] for image classification. As a form of texture enhancement, image de-noising is an important application of gradient-information based texture analysis [49]. To solve the over-smoothing problem of de-noised images, Zuo and Zhang [49] proposed a gradient-information estimation and preservation method for texture enhancement in image de-noising. Inspired by these gradient-information based texture analysis methods [9,34,49], a gradient-information based texture enhancement rule is proposed here. The proposed gradient-information based fusion rule is shown in Eq. (10):

I_f = u_f + \frac{ \sum_{i=1,2,\ldots,n} \| G_i \|_1 }{ \| G_f \|_1 } v_f,   (10)

where \| \cdot \|_1 denotes the L1 norm, n is the total number of source images, and u_f and v_f are the cartoon and texture components of the fused image, respectively. G_f and G_i are the gradient strengths of the fused texture component v_f and of the texture component v_i of the ith input image, respectively. In this work, the gradient strength G of an image is defined as the sum of the gradient amplitudes of its pixels. In image k, the gradient amplitude g_k(i, j) of the pixel at spatial position (i, j) is calculated by g_k(i, j) = \sqrt{ g_{kx}(i, j)^2 + g_{ky}(i, j)^2 }, where g_{kx}(i, j) and g_{ky}(i, j) are the gradients along the x and y directions at position (i, j). The gradient strength G_k of an n \times m image k is calculated by Eq. (11):

G_k = \sum_{i,j} g_k(i, j).   (11)
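A direct sketch of Eqs. (10) and (11) follows; the gradient operator is not specified in the text, so the use of np.gradient is an assumption.

```python
import numpy as np

def gradient_strength(v):
    """G of Eq. (11): sum of per-pixel gradient amplitudes of image v."""
    gy, gx = np.gradient(v.astype(np.float64))
    return np.sqrt(gx ** 2 + gy ** 2).sum()

def combine_components(u_f, v_f, source_textures):
    """Texture-enhanced cartoon/texture combination of Eq. (10)."""
    g_sum = sum(gradient_strength(v) for v in source_textures)   # sum_i ||G_i||_1
    g_f = gradient_strength(v_f)                                  # ||G_f||_1
    scale = g_sum / g_f if g_f > 0 else 1.0
    return u_f + scale * v_f
```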

3. Experiments and analyses

3.1. Experimental setup

To test the efficiency of the proposed method, diverse image pairs of different sensor modalities are prepared. These image pairs, including multi-modality medical image pairs, infrared/near-infrared (NIR)-visible image pairs, and multi-focus image pairs, are used to test the proposed fusion method. The multi-modality medical image pairs can be obtained from http://www.med.harvard.edu/aanlib/home.html. The “street view” infrared and visible image pair is originally from http://www.imagefusion.org. The remote-sensing NIR and RGB-visible image pairs are from “IKONOS” of the Global Land Cover Facility, http://www.glcf.umiacs.umd.edu/data/. The multi-focus images are from the Lytro multi-focus dataset, http://mansournejati.ece.iut.ac.ir [26], and http://www.imagefusion.org. In the comparison experiments, the fusion performance of the proposed framework is compared with other state-of-the-art SR-based methods, including KSVD [45], MST-SR [22] and JCPD [10]. The sparse representation error tolerance is set to 0.14, which is considered an optimal value for both representation accuracy and noise tolerance. The patch size of all SR-based methods is set to 8 × 8 and the overlap in all experiments is set to 6. The dictionary size of the KSVD and MST-SR methods is set to 256. All the experiments are programmed in Matlab 2014a on an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60 GHz laptop with 12.00 GB RAM.
Five mainstream objective evaluation metrics are implemented for the quantitative evaluation: mutual information (MI) [38], edge retention (QAB/F) [29], visual information fidelity (VIF) [32], Yang's fusion metric (QY) [23,43] and the Chen-Blum metric (QCB) [6,23]. For a fused image, larger values of MI, QAB/F, VIF, QY and QCB indicate better fusion results.

3.1.1. Mutual information


MI is defined in Eq. (12). It describes the common information between the source images and the fused image:

MI = \sum_{i=1}^{L} \sum_{j=1}^{L} h_{A,F}(i, j) \log_2 \frac{ h_{A,F}(i, j) }{ h_A(i) h_F(j) },   (12)

where L is the number of gray levels, h_{A,F}(i, j) is the joint gray-level histogram of images A and F, and h_A(i) and h_F(j) are the marginal histograms of images A and F, respectively. The MI of the fused image is calculated by Eq. (13):

MI(A, B, F) = MI(A, F) + MI(B, F),   (13)

where MI(A, F) is the MI value between input image A and fused image F, and MI(B, F) is the MI value between input image B and fused image F.
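A small sketch of Eqs. (12) and (13) for 8-bit images is given below; normalizing the joint and marginal histograms to probabilities and using L = 256 bins are our assumptions.

```python
import numpy as np

def mutual_information(a, f, levels=256):
    """MI(A, F) of Eq. (12) from the joint gray-level histogram of A and F."""
    joint, _, _ = np.histogram2d(a.ravel(), f.ravel(), bins=levels,
                                 range=[[0, levels], [0, levels]])
    p_af = joint / joint.sum()                        # joint probabilities
    p_a = p_af.sum(axis=1, keepdims=True)             # marginal of A
    p_f = p_af.sum(axis=0, keepdims=True)             # marginal of F
    nz = p_af > 0                                     # avoid log(0)
    return float((p_af[nz] * np.log2(p_af[nz] / (p_a @ p_f)[nz])).sum())

def fusion_mi(a, b, f):
    """MI(A, B, F) of Eq. (13)."""
    return mutual_information(a, f) + mutual_information(b, f)
```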

3.1.2. QAB/F
The QAB/F metric is a gradient-based quality index that measures how much edge information of the source images is transferred to the fused image. QAB/F is formalized in Eq. (14):

Q^{AB/F} = \frac{ \sum_{i,j} \left( Q^{AF}(i, j) w^A(i, j) + Q^{BF}(i, j) w^B(i, j) \right) }{ \sum_{i,j} \left( w^A(i, j) + w^B(i, j) \right) },   (14)

where Q^{AF} = Q_g^{AF} Q_o^{AF}, with Q_g^{AF} and Q_o^{AF} the edge strength and orientation preservation values at location (i, j). Q^{BF} is computed similarly to Q^{AF}. w^A(i, j) and w^B(i, j) are the weights of Q^{AF} and Q^{BF}, respectively.

3.1.3. Visual information fidelity


VIF is a full-reference image quality metric. It quantifies the mutual information between the reference and test images based on natural scene statistics (NSS) theory and a human visual system (HVS) model, and can be expressed as the ratio between the distorted test image information and the reference image information. The calculation of VIF is shown in Eq. (15):

VIF = \frac{ \sum_{i \in \text{subbands}} I( \vec{C}_{N,i}; \vec{F}_{N,i} ) }{ \sum_{i \in \text{subbands}} I( \vec{C}_{N,i}; \vec{E}_{N,i} ) },   (15)

where I(\vec{C}_{N,i}; \vec{F}_{N,i}) and I(\vec{C}_{N,i}; \vec{E}_{N,i}) represent the mutual information extracted from a particular subband of the reference and test images, respectively. \vec{C}_N denotes N elements from a random field, and \vec{E}_N and \vec{F}_N are the visual signals at the output of the HVS model for the reference and test images, respectively.
To evaluate the VIF of a fused image, the average of the VIF values between each input image and the integrated image is used [32]. The evaluation function of VIF for image fusion is shown in Eq. (16):

VIF(A, B, F) = \frac{ VIF(A, F) + VIF(B, F) }{2},   (16)

where VIF(A, F) is the VIF value between input image A and fused image F, and VIF(B, F) is the VIF value between input image B and fused image F.
Z. Zhu et al. / Information Sciences 432 (2018) 516–529 523

3.1.4. QY
QY is a structural-similarity-based metric for fusion assessment [43]. The definition of QY is shown in Eq. (17):

Q_Y = \begin{cases} \lambda(\omega) SSIM(A, F|\omega) + (1 - \lambda(\omega)) SSIM(B, F|\omega), & SSIM(A, B|\omega) \ge 0.75 \\ \max\{ SSIM(A, F|\omega), SSIM(B, F|\omega) \}, & SSIM(A, B|\omega) < 0.75 \end{cases},   (17)

where \lambda(\omega) is a local weight and SSIM(A, B) is the structural similarity index of images A and B. The details of \lambda(\omega) and SSIM(A, B) can be found in [43] and [23].

3.1.5. QCB
QCB is a human-perception-inspired fusion metric. The computation of QCB consists of five steps.
In the first step, the image I(i, j) is filtered in the frequency domain. I(i, j) is transformed to the frequency domain to obtain I(m, n), which is filtered by the contrast sensitivity function (CSF) filter S(r) [6], where r = \sqrt{m^2 + n^2}. In this image fusion metric, S(r) is given in polar form, and the filtered image is obtained as \tilde{I}(m, n) = I(m, n) S(r).
In the second step, the local contrast is computed. For the QCB metric, Peli's contrast is used, defined as

C(i, j) = \frac{ \phi_k(i, j) * I(i, j) }{ \phi_{k+1}(i, j) * I(i, j) } - 1.   (18)

A common choice for \phi_k(i, j) is the Gaussian

\phi_k(i, j) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left( -\frac{i^2 + j^2}{2 \sigma_k^2} \right),   (19)

with standard deviation \sigma_k = 2.
In the third step, the masked contrast map of input image I_A(i, j) is calculated as

C'_A = \frac{ t\,(C_A)^p }{ h\,(C_A)^q + Z },   (20)

where t, h, p, q and Z are real scalar parameters that determine the shape of the nonlinear masking function [6].
In the fourth step, the saliency map of I_A(i, j) is calculated by Eq. (21):

\lambda_A(i, j) = \frac{ C'_A(i, j)^2 }{ C'_A(i, j)^2 + C'_B(i, j)^2 }.   (21)

The information preservation value is computed by Eq. (22):

Q_{AF}(i, j) = \begin{cases} \dfrac{C'_A(i, j)}{C'_F(i, j)}, & \text{if } C'_A(i, j) < C'_F(i, j) \\ \dfrac{C'_F(i, j)}{C'_A(i, j)}, & \text{otherwise} \end{cases}.   (22)

In the fifth step, the global quality map is calculated:

Q_{GQM}(i, j) = \lambda_A(i, j) Q_{AF}(i, j) + \lambda_B(i, j) Q_{BF}(i, j).   (23)

The value of QCB is then obtained by averaging the global quality map:

Q_{CB} = \mathrm{mean}_{i,j}\, Q_{GQM}(i, j).   (24)

3.2. Experimental results and analysis

In the experiments, multi-modality medical, infrared-visible, and multi-focus image pairs are used for testing. Positron emission tomography (PET)-magnetic resonance imaging (MRI) and single-photon emission computed tomography (SPECT)-MRI image pairs are used for the multi-modality medical image fusion test. Infrared and visible images of a street are used to test infrared-visible image fusion, and remote-sensing NIR and visible image pairs are also used for this test. Both color and gray-level multi-focus image pairs are used for multi-focus image fusion.

Fig. 3. Fusion results of PET-MRI image pair with different image fusion methods.

Table 1
Objective evaluations of the PET-MRI image pair fusion experiments.

Method      QAB/F    MI       VIF      QY       QCB
KSVD        0.2689   1.9477   0.2939   0.9201   0.7127
MST-SR      0.2943   1.9405   0.3129   0.9156   0.6880
JCPD        0.2790   1.9364   0.3737   0.9153   0.7481
Proposed    0.2990   2.0520   0.3090   0.9178   0.7576

3.2.1. Multi-modality medical image fusion


In the multi-modality medical image fusion experiments, two pairs of multi-modality medical images are used for testing; all of them are brain-slice images. The first test pair is the PET-MRI medical image pair. The MRI image of the PET-MRI pair gives a clear soft-tissue image, and the PET image provides a three-dimensional image of the tracer concentration within the body. The second test pair is the SPECT-MRI image pair. For the SPECT-MRI pair, the MRI image also shows the brain slices with soft-tissue information, while the SPECT image provides true 3D information of the brain slices, typically presented as cross-sectional slices through the patient. The fused PET-MRI image can be used to analyze the general condition of an organ; in addition, the fusion result of the PET-MRI image pair can help detect lesions and diagnose diseases at an early stage. SPECT-MRI image fusion serves a similar function. Both the PET-MRI and SPECT-MRI image pairs used for testing have a size of 256 × 256.
The PET-MRI fusion results are demonstrated in Fig. 3, where (a) and (b) are the MRI and PET source images, respectively, and (c) to (f) are the fusion results of KSVD, MST-SR, JCPD and our proposed method. Fusion details are shown in the right-top corner of each image in Fig. 3. Comparing the details of the images fused by the different methods, it can be found that detailed MRI information is missing in (e). The details of fused image (d) show a black area around the region where the MRI and PET information is integrated. Fused image (f) shows good performance in both color restoration and contrast.
The fusion performance measured by the objective metrics is shown in Table 1. As shown in Table 1, the proposed fusion approach outperforms the other methods in terms of the evaluation criteria QAB/F, MI, QY and QCB. These metrics indicate that the fused image obtained by our proposed method not only contains more detailed information but is also more suitable for human visual perception. Although its VIF value is not as good as those of the JCPD and MST-SR based methods, our proposed method achieves comparable performance. Thus, the proposed fusion approach is superior to the compared SR-based methods.
The fusion results of SPECT-MRI are demonstrated in Fig. 4, where (a) and (b) are the MRI and SPECT source images, respectively, and (c) to (f) are the images fused by KSVD, MST-SR, JCPD and our proposed method. The detailed information of a specific part of the images is shown in the right-top corner of Fig. 4(a) to (f). In Fig. 4, the contrast and brightness of (c) are poor, and the MRI information in (c) is also not clear. The detailed part of image (d) in Fig. 4 shows that some MRI detail is missing. Thus, the images integrated by our proposed method and JCPD have better fusion performance.
The metrics of the proposed and compared methods are listed in Table 2, which shows that the proposed fusion approach achieves the best performance on all the image fusion metrics, including QAB/F, MI, VIF, QY and QCB.

Fig. 4. Fusion results of SPECT-MRI image pair with different image fusion methods.

Fig. 5. Fusion results of infrared and visible image pair “street view” with different image fusion methods.

Table 2
Objective evaluations of the SPECT-MRI image pair fusion experiments.

Method      QAB/F    MI       VIF      QY       QCB
KSVD        0.3164   1.7722   0.3241   0.3541   0.4807
MST-SR      0.3563   1.7958   0.3444   0.4193   0.4992
JCPD        0.3552   1.7090   0.3146   0.4111   0.4947
Proposed    0.3575   1.8105   0.3470   0.4227   0.5050

Since the proposed method shows better performance in both subjective visual quality and objective metrics, it outperforms the compared methods.

3.2.2. Infrared/NIR and visible image fusion


Infrared and visible image fusion is widely used to enhance performance for human visual perception, object detection, and target recognition. Visible sensors capture reflected light with abundant appearance information, whereas infrared sensors capture the thermal radiation of objects. Integrating infrared and visible information into a fused image allows a more complete and accurate description of the scene to be constructed [46]. Near-infrared spectroscopy is based on molecular overtone and combination vibrations. Typical applications of NIR and visible image fusion include medical and chemical analysis [7,8]. In practice, fusing NIR and visible remote-sensing images can be used for soil organic carbon prediction [8], estimation of crop gross primary production [7], etc.
In this experiment, to test the performance of the proposed method on infrared and visible image fusion, a surveillance infrared and visible image pair, “street view”, is used. Additionally, a remote-sensing NIR and visible image pair, “IKONOS”, is also used for testing. The size of the image pair “street view” is 240 × 320. The remote-sensing image pair was taken over Wenchuan, China by the “IKONOS” satellite, and its size is 500 × 500.

Table 3
Objective evaluations of the infrared and visible image pair “street view” fusion experiments.

Method      QAB/F    MI       VIF      QY       QCB
KSVD        0.6246   1.7508   0.3601   0.7095   0.5233
MST-SR      0.5181   1.6100   0.3626   0.5890   0.4460
JCPD        0.4975   1.8782   0.3877   0.6352   0.4812
Proposed    0.7055   1.8611   0.4159   0.8305   0.5484

Fig. 6. Fusion results of NIR-visible image pair “IKONOS” with different image fusion methods.

Table 4
Objective evaluations of the NIR-visible image pair “IKONOS” fusion experiments.

Method      QAB/F    MI       VIF      QY       QCB
KSVD        0.8193   1.6310   0.4227   0.9057   0.5637
MST-SR      0.8232   1.5888   0.4112   0.9105   0.6024
JCPD        0.4975   1.7301   0.4427   0.9369   0.5805
Proposed    0.8375   1.7453   0.5041   0.9474   0.6425

The source images of the infrared and visible image pair “street view” are shown in Fig. 5(a) and (b), which are the visible and infrared images, respectively. The fusion results of KSVD, MST-SR, JCPD and our proposed method are shown in Fig. 5(c)-(f), respectively. A pedestrian on the street is shown in the left-bottom corner of Fig. 5(a)-(f) for detailed comparison. In Fig. 5, the images integrated by KSVD, MST-SR and JCPD do not show as good contrast as that of our proposed method, and their sharpness is also inferior. Additionally, the detail of the pedestrian shows that our proposed method achieves better brightness and contrast in the image details.
Table 3 shows the objective metrics of the fused image “street view”. In Table 3, our proposed method shows the best performance on the metrics QAB/F, VIF, QY and QCB, which indicates better edge retention, human visual quality and structural similarity to the source images. For MI, the image integrated by the JCPD algorithm gives the best result; however, the contrast of the image integrated by JCPD is slightly poorer than that of the image fused by our proposed method. Therefore, the proposed fusion approach outperforms the compared methods.
The source images and fusion results of the image pair “IKONOS” are shown in Fig. 6. The source images are shown in Fig. 6(a) and (b): Fig. 6(a) is the RGB image captured by the visible sensor on “IKONOS”, and the NIR image is shown in Fig. 6(b). The images integrated by KSVD, MST-SR, JCPD and our proposed method are shown in Fig. 6(c)-(f). Specific details of the fused and source images are shown in the right-top corner of Fig. 6(a)-(f). Comparing the details of the images in Fig. 6, MST-SR and our proposed method show better performance on detailed information than the other integrated images. Our proposed method also shows great performance on image sharpness, which is an important feature for fusing remote-sensing images.
The objective metrics of “IKONOS” are shown in Table 4. As shown in Table 4, the proposed fusion approach achieves the best scores among the compared methods on the objective evaluations. In conclusion, for the remote-sensing image pair “IKONOS”, our proposed method shows the best performance on both objective and subjective evaluations.

Fig. 7. Fusion results of multi-focus image pair “book” with different image fusion methods.

Table 5
Objective evaluations of the multi-focus image pair “book” fusion experiments.

Method      QAB/F    MI       VIF      QY       QCB
KSVD        0.8051   5.2185   0.9311   0.9201   0.8542
MST-SR      0.8221   5.1470   0.9498   0.9156   0.8478
JCPD        0.8068   5.5800   0.9423   0.9153   0.8488
Proposed    0.8197   5.6112   0.9501   0.9362   0.8649

Table 6
Objective evaluations of the multi-focus image pair “HK and card” fusion experiments.

Method      QAB/F    MI       VIF      QY       QCB
KSVD        0.5943   4.3009   0.7561   0.7275   0.6222
MST-SR      0.7903   4.4598   0.7402   0.9345   0.6967
JCPD        0.6091   4.3012   0.7399   0.6921   0.6070
Proposed    0.7997   4.8016   0.7624   0.9625   0.7140

3.2.3. Multi-focus image fusion


Multi-focus image fusion has emerged as a major topic in image processing. Its goal is to increase the depth of field of the camera lens by integrating multiple images focused at different depths. In this test, a gray-level image pair of size 240 × 320 and an RGB image pair “HK and card” of size 520 × 520 are used for the multi-focus image fusion tests.
The source images of the multi-focus image pair “book” are shown in Fig. 7(a) and (b). In Fig. 7(a), the left book is out of focus, while in Fig. 7(b) the right book is out of focus. The images fused by KSVD, MST-SR, JCPD and our proposed method are shown in Fig. 7(c)-(f). Details of the book on the right side are shown in the left-top corner of Fig. 7(a)-(f). Since all the fused images of “book” appear quite similar to the human visual system, objective metrics are a better way to measure the fusion performance.
As shown in Table 5, our proposed method leads in four of the five objective metrics, namely MI, VIF, QY and QCB. Therefore, our proposed method achieves the best performance on information preservation of the source images; additionally, it also shows the best performance on the human-visual-system inspired metrics VIF and QCB. MST-SR achieves the best score on QAB/F, which means its edge retention is better than that of the proposed method. Nevertheless, although the QAB/F score of the proposed method is slightly lower, the proposed fusion method shows the best performance on information preservation and visualization.
The source images of the RGB multi-focus image pair “HK and card” are shown in Fig. 8(a) and (b). In Fig. 8(a), the heart-shaped card is in focus and the vista of Hong Kong city is out of focus; conversely, in Fig. 8(b), the vista of Hong Kong city is in focus and the heart-shaped card is unclear. The images fused from Fig. 8(a) and (b) by KSVD, MST-SR, JCPD and our proposed method are shown in Fig. 8(c)-(f). The detail of the texture on the heart-shaped card is shown in the corner of Fig. 8(a)-(f). Comparing these details, it can be found that the details of KSVD and JCPD are slightly blurry, while for the remaining two integrated images it is hard to distinguish the difference with the human visual system.
As shown in Table 6, our proposed method shows the best performance on all the assessments, including QAB/F, MI, VIF, QY and QCB. In brief, the objective evaluation results illustrate that our proposed fusion approach performs better than the other compared methods.

Fig. 8. Fusion results of multi-focus image pair “HK and card” with different image fusion methods.

4. Conclusion and discussion

This paper proposes a novel image fusion method based on image decomposition and sparse representation. In the proposed method, an image fusion framework based on cartoon-texture decomposition is introduced, which decomposes the source images into cartoon and texture components. For the cartoon components, an energy-based fusion rule is proposed to preserve structure information. For the texture components, an SR-based fusion method is implemented: a dictionary learning method is applied to the source images, the image pixels are clustered into a few groups by SKR features and the LDP clustering method, a sub-dictionary is trained for each group, and these sub-dictionaries with different features are combined into an informative dictionary for sparse representation. In the comparison experiments, the proposed method achieves better performance than the other compared SR-based methods.
However, since sparse coding and dictionary learning involve plentiful matrix computations, the processing time of SR-based image fusion methods is a major limitation. In the comparison experiments, the average processing times of KSVD, JCPD, MST-SR and our proposed method are 151.77 s, 90.9 s, 81.55 s and 95.55 s, respectively. The processing time of these SR-based methods is longer than that of most transform-based and spatial-based fusion methods. Shortening the computing time is one of the most important research subjects for SR-based fusion methods. In future work, we will try to use GPU and multi-thread computing to speed up the matrix computations in SR-based fusion methods.
Sparse representation is also widely used in huge-scale data processing [16]. In many applications, running sparse representation on huge-scale datasets exceeds the computing capacity of a single machine running single-threaded approaches. Parallelizing the sparse representation processes for huge-scale data can reduce the load on each thread and the computation time. In future work, we will also try to parallelize the image sparse representation and fusion processes on a high-performance computing platform for huge-scale datasets.

Acknowledgments

We would like to thank the support of the National Natural Science Foundation of China (61633005, 61773080), the China Postdoctoral Science Foundation (2012M521676), the China Central Universities Foundation (106112015CDJXY170003, 106112016CDJZR175511), the Chongqing Special Funding in Postdoctoral Scientific Research Project (XM2013007) and the Natural Science Foundation of Chongqing (cstc2015jcyjB0569).

References

[1] A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks, Science 344 (6191) (2014) 1492–1496.
[2] J. Cao, J. Hao, X. Lai, C.M. Vong, M. Luo, Ensemble extreme learning machine and sparse representation classification, J. Frankl. Inst. 353 (17) (2016)
4526–4541.
[3] X. Chang, F. Nie, Z. Ma, Y. Yang, X. Zhou, A convex formulation for spectral shrunk clustering, 4 (2015) 2532–2538. http://arxiv.org/abs/1411.6308.
[4] X. Chang, F. Nie, Y. Yang, C. Zhang, H. Huang, Convex sparse PCA for unsupervised feature learning, ACM Trans. Knowl. Discov. Data. 11 (1) (2016) 3.
[5] X. Chang, Y. Yang, Semi-supervised feature analysis by mining correlations among multiple tasks, IEEE Trans. Neural Netw. (2014) 1–12.
[6] Y. Chen, R.S. Blum, A new automated quality assessment algorithm for image fusion, Image Vis. Comput. 27 (10) (2009) 1421–1432.
[7] A.A. Gitelson, Y. Peng, J.G. Masek, D.C. Rundquist, S. Verma, A. Suyker, J.M. Baker, J.L. Hatfield, T. Meyers, Remote estimation of crop gross primary
production with landsat data, Remote Sens. Environ. 121 (2012) 404–414.
[8] C. Gomez, R. Rossel, A. Mcbratney, Soil organic carbon prediction by hyperspectral remote sensing and field vis-NIR spectroscopy: an australian case
study, Geoderma 148 (3–4) (2008) 403–411.
[9] E. Hadjidemetriou, M.D. Grossberg, S.K. Nayar, Multiresolution histograms and their use for recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (7)
(2004) 831–847.

[10] M. Kim, D.K. Han, H. Ko, Joint patch clustering-based dictionary learning for multimodal image fusion, Inf. Fus. 27 (2016) 198–214.
[11] J.J. Lewis, R.J. O’Callaghan, S.G. Nikolov, D.R. Bull, C.N. Canagarajah, Pixel- and region-based image fusion with complex wavelets, Inf. Fus. 8 (2) (2007)
119–130.
[12] H. Li, X. Li, Z. Yu, C. Mao, Multifocus image fusion by combining with mixed-order structure tensors and multiscale neighborhood, Inf. Sci. (Ny)
349–350 (2016) 25–49.
[13] H. Li, X. Liu, Z. Yu, Y. Zhang, Performance improvement scheme of multifocus image fusion derived by difference images, Signal Process. 128 (2016)
474–493.
[14] H. Li, H. Qiu, Z. Yu, B. Li, Multifocus image fusion via fixed window technique of multiscale images and non-local means filtering, Signal Process. 138
(2017) 71–85.
[15] H. Li, H. Qiu, Z. Yu, Y. Zhang, Infrared and visible image fusion scheme based on NSCT and low-level visual features, Infrared Phys. Technol. 76 (2016)
174–184.
[16] Q. Li, S. Qiu, S. Ji, P.M. Thompson, J. Ye, J. Wang, Parallel lasso screening for big data optimization, in: Proceedings of the ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2016, pp. 1705–1714.
[17] S. Li, H. Yin, L. Fang, Group-sparse representation with dictionary learning for medical image denoising and fusion, IEEE Trans. Biomed. Eng. 59 (12)
(2012) 3450–3459.
[18] H. Liu, D. Guo, F. Sun, Object recognition using tactile measurements: kernel sparse coding methods, IEEE Trans. Instrum. Meas. 65 (3) (2016) 656–665.
[19] H. Liu, Y. Liu, F. Sun, Traffic sign recognition using group sparse coding, Inf. Sci. (Ny) 266 (10) (2014) 75–89.
[20] H. Liu, Y. Liu, F. Sun, Robust exemplar extraction using structured sparse coding, IEEE Trans. Neural Netw. Learn. Syst. 26 (8) (2015) 1816–1821.
[21] H. Liu, Y. Yu, F. Sun, J. Gu, Visualtactile fusion for object recognition, IEEE Trans. Autom. Sci. Eng. 14 (2) (2017) 996–1008.
[22] Y. Liu, S. Liu, Z. Wang, A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fus. 24 (2015) 147–164.
[23] Z. Liu, E. Blasch, Z. Xue, J. Zhao, R. Laganiere, W. Wu, Objective assessment of multiresolution image fusion algorithms for context enhancement in
night vision: a comparative study, IEEE Trans. Pattern Anal. Mach. Intell. 34 (1) (2011) 94–109.
[24] Z. Liu, Y. Chai, H. Yin, J. Zhou, Z. Zhu, A novel multi-focus image fusion approach based on image decomposition, Inf. Fus. 35 (2017) 102–116.
[25] K.A. May, M.A. Georgeson, Blurred edges look faint, and faint edges look sharp: the effect of a gradient threshold in a multi-scale edge coding model,
Vis. Res. 47 (13) (2007) 1705–1720.
[26] M. Nejati, S. Samavi, S. Shirani, Multi-focus image fusion using dictionary-based sparse representation, Inf. Fus. 25 (2015) 72–84.
[27] F. Nencini, A. Garzelli, S. Baronti, L. Alparone, Remote sensing image fusion using the curvelet transform, Inf. Fus. 8 (2) (2007) 143–156.
[28] G. Pajares, J.M. de la Cruz, A wavelet-based image fusion tutorial, Pattern Recognit 37 (9) (2004) 1855–1872.
[29] V.S. Petrovic, Subjective tests for image fusion evaluation and objective metric validation, Inf. Fus. 8 (2) (2007) 208–216.
[30] T. Qiao, W. Li, B. Wu, J. Wang, A chaotic iterative algorithm based on linearized Bregman iteration for image deblurring, Inf. Sci. 272 (2014) 198–208.
[31] R. Rubinstein, M. Zibulevsky, M. Elad, Efficient implementation of the k-SVD algorithm using batch orthogonal matching pursuit, Cs Tech. 40 (2008).
[32] H.R. Sheikh, A.C. Bovik, Image information and visual quality, IEEE Trans. Image Process. 15 (2) (2006) 430–444.
[33] H. Takeda, S. Farsiu, P. Milanfar, Kernel regression for image processing and reconstruction, IEEE Trans. Image Process. 16 (2) (2007) 349–366.
[34] M. Varma, A. Zisserman, A statistical approach to texture classification from single images, Int. J. Comput. Vis. 62 (1–2) (2005) 61–81.
[35] L.A. Vese, S. Osher, Image denoising and decomposition with total variation minimization and oscillatory functions, J Math Imaging Vis 20 (1–2)
(2004) 7–18.
[36] K. Wang, G. Qi, Z. Zhu, Y. Chai, A novel geometric dictionary construction approach for sparse representation based image fusion, Entropy 19 (7) (2017)
306.
[37] L. Wang, B. Li, L. Tian, Multi-modal medical image fusion using the inter-scale and intra-scale dependencies between image shift-invariant shearlet
coefficients, Inf. Fus. 19 (2014) 20–28.
[38] Q. Wang, Y. Shen, Y. Zhang, J.Q. Zhang, Fast quantitative correlation analysis and information deviation analysis for evaluating the performances of
image fusion techniques, IEEE Trans. Instrum. Meas. 53 (5) (2004) 1441–1447.
[39] W. Wang, L. Jiao, S. Yang, Fusion of multispectral and panchromatic images via sparse representation and local autoregressive model, Inf. Fus. 20
(2014) 73–87.
[40] J. Xie, Y. Huang, J. Paisley, X. Ding, X. Zhang, Pan-sharpening based on nonparametric Bayesian adaptive dictionary learning, in: Proceedings of the
IEEE International Conference on Image Processing, ICIP 2013, Melbourne, Australia, September 15–18 2013, 2013, pp. 2039–2042.
[41] B. Yang, S. Li, Multifocus image fusion and restoration with sparse representation, IEEE Trans. Instrum. Meas. 59 (4) (2010) 884–892.
[42] B. Yang, S. Li, Pixel-level image fusion with simultaneous orthogonal matching pursuit, Inf. Fus. 13 (1) (2012) 10–19.
[43] C. Yang, J.Q. Zhang, X.R. Wang, X. Liu, A novel similarity based quality metric for image fusion, Inf. Fus. 9 (2) (2008) 156–160.
[44] J. Yang, Z. Wang, Z. Lin, S. Cohen, T.S. Huang, Coupled dictionary training for image super-resolution, IEEE Trans. Image Process. 21 (8) (2012)
3467–3478.
[45] H. Yin, Y. Li, Y. Chai, Z. Liu, Z. Zhu, A novel sparse-representation-based multi-focus image fusion approach, Neurocomputing 216 (2016) 216–229.
[46] Z. Zhou, B. Wang, S. Li, M. Dong, Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with gaussian and
bilateral filters, Inf. Fus. 30 (2015) 15–26.
[47] Z. Zhu, Y. Chai, H. Yin, Y. Li, Z. Liu, A novel dictionary learning approach for multi-modality medical image fusion, Neurocomputing 214 (2016) 471–482.
[48] Z. Zhu, G. Qi, Y. Chai, P. Li, A geometric dictionary learning based approach for fluorescence spectroscopy image fusion, Appl. Sci. 7 (2) (2017) 161.
[49] W. Zuo, L. Zhang, C. Song, D. Zhang, H. Gao, Gradient histogram estimation and preservation for texture enhanced image denoising, IEEE Trans. Image
Process. 23 (6) (2014) 2459–2472.
