


An Efficient Method for the Fusion of Light Field Refocused Images
Yingqian Wang, Jungang Yang, Chao Xiao, Wei An
College of Electronic Science, National University of Defense Technology, Changsha, China

ABSTRACT

Light field cameras have drawn much attention due to the advantage of post-capture adjustments such as refocusing after exposure. Because of the large equivalent aperture, the depth of field of refocused images is always shallow; as a result, a large number of multi-focus images are obtained and an all-in-focus image is demanded. Most multi-focus image fusion algorithms are not designed for large numbers of source images, and the traditional DWT-based fusion approach has serious problems when dealing with many multi-focus images, causing color distortion and ringing artifacts. To address this problem, this paper proposes an efficient multi-focus image fusion method based on the stationary wavelet transform (SWT), which can deal with a large quantity of multi-focus images with shallow depths of field. We compare the SWT-based approach with the DWT-based approach in various situations, and the results demonstrate that the proposed method performs much better both visually and quantitatively.
Keywords: Light field, Depth of field, Multi-focus image fusion, Stationary wavelet transform

1. INTRODUCTION
Light field cameras have recently drawn much attention from both industry and academia, and a variety of light field cameras have been designed in the past decade. Light field cameras can capture the full 4D spatial-angular information within a single photographic exposure, which enables post-capture adjustments such as refocusing at arbitrary depths [1]. However, due to the large equivalent aperture, the depth of field of refocused images is always shallow, so a large number of multi-focus source images are required to cover the whole depth range. To obtain an image with all objects in focus, a multi-focus image fusion technique is needed.
As an important branch of image fusion, multi-focus image fusion addresses the problem that a clear image with all objects in focus cannot be obtained in a single snapshot because of the camera's limited depth of field. The goal of the technique is therefore to generate an all-in-focus image from multi-focus images [2]. So far, many algorithms have been proposed, which can be roughly divided into two categories: spatial-domain fusion approaches and transform-domain fusion approaches [3]. Spatial-domain methods often perform less satisfactorily due to low contrast and serious blocking artifacts [4]. The general idea of multi-focus image fusion in the transform domain is to decompose the source images into several scales and then integrate the coefficients according to certain rules to generate the fusion result.
As a powerful tool, the discrete wavelet transform (DWT) has been deeply researched and widely applied in multi-focus image fusion [5]. However, due to the down-sampling process, DWT-based methods often produce distorted results [6], especially when dealing with a large number of images with shallow depth of field. To solve this problem, we propose a novel multi-focus image fusion algorithm that uses the stationary wavelet transform (SWT) to avoid the down-sampling process [7]. We use the SWT to decompose the source images, then integrate and select the wavelet coefficients at different scales. Finally, we combine the selected coefficients by the inverse SWT (ISWT) to create a composite all-in-focus image. The proposed method is more robust in complex light field fusion situations. We employ both synthetic datasets and real-world light field datasets to demonstrate the better performance of the approach, both visually and quantitatively.
This paper is organized as follows. Section 2 reviews the DWT and the SWT. Section 3 describes the details and the process of the algorithm. Experimental results and analysis are presented in Section 4. Finally, the conclusion is drawn in Section 5.

Ninth International Conference on Graphic and Image Processing (ICGIP 2017), edited by Hui Yu, Junyu Dong,
Proc. of SPIE Vol. 10615, 1061536 · © 2018 SPIE · CCC code: 0277-786X/18/$18 · doi: 10.1117/12.2302687



2. DWT AND SWT DECOMPOSITION
Wavelet transforms have been successfully used in many fusion schemes, and a common choice in multi-focus image fusion is the DWT. The DWT is a combined spatial-frequency decomposition that provides flexible multi-resolution analysis of an image [8]. When a source image is decomposed with the two-dimensional DWT, approximate components, horizontal details, vertical details and diagonal details are obtained. The same process is then applied recursively to the down-sampled low-frequency components, yielding detail coefficients at different scales.
The main difference between the SWT and the DWT is that the down-sampling of the output coefficients in the DWT is replaced by interpolation (up-sampling) of the filters in the SWT. The details at different scales are extracted by applying the interpolated multi-scale filters to the source images. Because it avoids down-sampling, the SWT is shift-invariant [9], which overcomes the ringing artifacts and color distortion seen in DWT-based image fusion. The main process of SWT-based image decomposition is illustrated in Figure 1.

[Figure: at each level, the rows and columns of the approximation cA_j are convolved with the up-sampled (↑2) low-pass filter F_{j+1} and high-pass filter G_{j+1}, without down-sampling, to produce the next approximation cA_{j+1} and the detail sub-bands cD_{j+1}.]
Figure 1. SWT-based decomposition process.

After each sub-band has been processed according to the corresponding fusion rule, the coefficients are combined by reversing the SWT decomposition.
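As a minimal illustration of this difference (a sketch assuming Python with NumPy and the PyWavelets package, which the paper itself does not mention), the snippet below decomposes a single image once with the DWT and once with the SWT: the DWT sub-bands are roughly half the input size, while the SWT sub-bands keep the full resolution.

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)  # stand-in for one grayscale source image

# One level of 2D DWT: sub-bands are down-sampled (roughly half the input size).
cA, (cH, cV, cD) = pywt.dwt2(img, 'bior4.4')
print('DWT approximation:', cA.shape)

# One level of 2D SWT: no down-sampling, every sub-band keeps the input size,
# which is what makes the transform shift-invariant.
(cA_s, (cH_s, cV_s, cD_s)), = pywt.swt2(img, 'bior4.4', level=1)
print('SWT approximation:', cA_s.shape)   # (256, 256)
```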

3. METHODOLOGY
To describe the proposed method, we assume that the source multi-focus images $I_n$ $(n = 1, 2, \ldots, N)$ are decomposed into $k$ levels with the SWT, which yields approximate components $A$, horizontal details $H$, vertical details $V$ and diagonal details $D$ at each level. Suppose the size of $I_n$ is $P \times Q$; since the down-sampling process is avoided, all the SWT coefficients $A$, $H$, $V$, $D$ at each level are of size $P \times Q$ as well. Different fusion rules are then employed to select the coefficients.



Usually, the 'max absolute value' (Max Abs) rule is used to select the detail coefficients, because in-focus areas correspond to edges and textures, whose detail coefficients have large absolute values. The Max Abs rule can be expressed as

$$H^{k}(i,j) = \arg\max_{H_n^{k}(i,j)} \left| H_n^{k}(i,j) \right|, \quad 1 \le n \le N, \qquad (1)$$

where $N$ is the total number of source multi-focus images, $n$ is the index of the source image (ranging from 1 to $N$), $(i,j)$ is the coordinate in the decomposed coefficient images, and $k$ is the decomposition level. $H_n^{k}$ denotes the horizontal detail coefficients of image $n$ at the $k$-th level, and $H^{k}$ denotes the selected horizontal details at the $k$-th level. The same selection process is applied to the coefficients $V$ and $D$.
As for the approximate components, which contain large-scale information, the 'average' (Avg) rule is commonly used in many algorithms and is defined as

$$A^{k}(i,j) = \frac{1}{N}\sum_{n=1}^{N} A_n^{k}(i,j), \qquad (2)$$

where $A_n^{k}$ denotes the approximate-component coefficients of image $n$ at the $k$-th level and $A^{k}$ denotes the integrated approximate-component coefficients at the $k$-th level.
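In NumPy terms (a hedged sketch rather than the authors' code), rules (1) and (2) reduce to an element-wise arg-max over the stack of detail coefficients and an element-wise mean over the stack of approximation coefficients:

```python
import numpy as np

def fuse_details_max_abs(detail_stack):
    """Rule (1): detail_stack has shape (N, P, Q); at each pixel keep the
    coefficient whose absolute value is largest across the N source images."""
    idx = np.abs(detail_stack).argmax(axis=0)
    return np.take_along_axis(detail_stack, idx[np.newaxis], axis=0)[0]

def fuse_approx_average(approx_stack):
    """Rule (2): element-wise average of the N approximation coefficients."""
    return approx_stack.mean(axis=0)
```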

Both rules have proven effective in practice. Finally, the integrated coefficients are combined by the ISWT to obtain the output image $I_f$. The image fusion process is illustrated in Figure 2.
[Figure: the N source images are decomposed by the SWT; at each level (1st to k-th) the sub-bands A, H, V and D are combined by their respective fusion rules, and the fused coefficients are reconstructed by the inverse SWT.]
Figure 2. Proposed methodology.
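Putting the pieces together, a minimal end-to-end sketch of this pipeline could look as follows (assuming Python with PyWavelets and grayscale images whose sides are divisible by 2**levels, e.g. 512x512 with 5 levels; the two fusion rules from the previous sketch are inlined):

```python
import numpy as np
import pywt

def swt_fuse(images, wavelet='bior4.4', levels=5):
    """SWT-based multi-focus fusion sketch: decompose every source image,
    fuse the coefficients level by level, then reconstruct with the ISWT."""
    coeffs = [pywt.swt2(img.astype(float), wavelet, level=levels) for img in images]
    fused = []
    for lev in range(levels):
        # 'Avg' rule for the approximation coefficients of this level.
        approx = np.stack([c[lev][0] for c in coeffs])
        fused_a = approx.mean(axis=0)
        # 'Max Abs' rule for the horizontal, vertical and diagonal details.
        fused_d = []
        for d in range(3):
            stack = np.stack([c[lev][1][d] for c in coeffs])
            idx = np.abs(stack).argmax(axis=0)
            fused_d.append(np.take_along_axis(stack, idx[np.newaxis], axis=0)[0])
        fused.append((fused_a, tuple(fused_d)))
    return pywt.iswt2(fused, wavelet)

# Usage: all_in_focus = swt_fuse([img1, img2, ..., imgN])
```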



4. EXPERIMENTS AND RESULTS
Synthetic datasets and real-world light field datasets are both applied to the investigated approaches. Since an all-in-focus reference image is difficult to obtain due to the physical limitations of optical lenses, quantitative metrics are hard to acquire with real optical multi-focus images. Synthetic datasets are therefore needed, in which a Gaussian blurring kernel is used to simulate the optical blur. However, although synthetic datasets allow the algorithms to be evaluated accurately, they alone cannot test the performance of the approach on real multi-focus images with extremely shallow depths of field. Consequently, real-world light field datasets are also needed.
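As an illustration of how such a synthetic pair can be produced (a sketch assuming a grayscale reference and an arbitrarily chosen Gaussian width; the paper does not state its blur parameters), one copy is blurred on the left half and the other on the right half:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_left_right_blur_pair(reference, sigma=3.0):
    """Turn one all-in-focus reference into two 'multi-focus' images by
    blurring the left half of one copy and the right half of the other."""
    ref = reference.astype(float)
    blurred = gaussian_filter(ref, sigma)          # simulated optical blur
    half = ref.shape[1] // 2
    left_blurred = ref.copy()
    left_blurred[:, :half] = blurred[:, :half]     # left half out of focus
    right_blurred = ref.copy()
    right_blurred[:, half:] = blurred[:, half:]    # right half out of focus
    return left_blurred, right_blurred
```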
4.1 Evaluation criteria
The root mean square error (RMSE) between the reference image $R$ and the fused image $F$ measures the difference between the fused image and the reference image [10] and is defined as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl[R(i,j) - F(i,j)\bigr]^{2}}{MN}}.$$

The peak signal-to-noise ratio (PSNR) [11] between the reference image $R$ and the fused image $F$ is defined as

$$\mathrm{PSNR} = 10\lg\frac{255^{2}\,MN}{\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl[R(i,j) - F(i,j)\bigr]^{2}},$$

where $M \times N$ is the image size.

4.2 Experimental results


Following the preceding idea, we blur the original reference images 'Tarot Cards' and 'Lena' into two 'multi-focus' images each (left blurred and right blurred), and then apply the investigated methods to the source images. In the experiment, 5 decomposition levels and the 'Bior 4.4' wavelet are used, which proved to produce the best results in practice. The fusion results are shown in Figure 3 and the corresponding quantitative metrics are listed in Table 1.
[Figure: fusion results for the 'Tarot Cards' (top row) and 'Lena' (bottom row) scenes, panels (a)-(d) in each row.]
Figure 3. Fusion results of 'Tarot Cards' and 'Lena'. (a) Left-blurred source image. (b) Right-blurred source image. (c) Fusion result of the DWT-based method. (d) Fusion result of the SWT-based method.



Table 1. Quality assessment of the fused images in Figure 3

                    Tarot Cards               Lena
               RMSE       PSNR         RMSE       PSNR
  DWT-based    1.9971     42.0087      1.3737     42.0823
  SWT-based    1.1253     46.9913      0.6834     48.1408

Since only two source images are involved and the fusion situation is simple, it is difficult to distinguish the fused images in Figure 3 visually. From the quality assessment in Table 1, however, it can be seen that the proposed approach yields better quantitative metrics, with a smaller RMSE and a larger PSNR.
To demonstrate the superiority of the proposed method when dealing with large numbers of multi-focus images containing complicated details, we use light field datasets captured by a camera array at Stanford University [12] and refocus them at various depths [13]. Twenty-one refocused images named 'Lego Knights' and fourteen refocused images named 'Jelly Beans' are shown in Figure 4 and Figure 5; each image has a shallow depth of field. The resolutions of 'Lego Knights' and 'Jelly Beans' are 512×512 and 512×256, respectively, which limit the maximum decomposition level to 9 and 8. Due to the lack of quantitative metrics in the light field multi-focus fusion situation, we vary the decomposition level from 1 to 8 in both scenes to observe its influence on the results. The 'Bior 4.4' wavelet is still adopted, and Figure 6 shows the fusion results produced by the investigated methods.

[Figure: 21 refocused 'Lego Knights' images, panels (a)-(u).]
Figure 4. Multi-focus source images of 'Lego Knights'. Panels (a) to (u) are the refocused images at various depths (from shallow to deep).



[Figure: 14 refocused 'Jelly Beans' images, panels (a)-(n).]
Figure 5. Multi-focus source images of 'Jelly Beans'. Panels (a) to (n) are the refocused images at various depths (from deep to shallow).
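For context, refocused images such as those in Figures 4 and 5 can be generated from camera-array data by synthetic-aperture (shift-and-add) rendering. The sketch below is a simplified, hypothetical stand-in for the refocusing of Ref. [13], not the exact procedure used here:

```python
import numpy as np

def shift_and_add_refocus(views, positions, slope):
    """views:     list of (H, W) grayscale images from the camera array
    positions: (N, 2) array of camera coordinates (u, v) on the array plane
    slope:     disparity per unit camera displacement; sweeping it moves the
               synthetic focal plane through the scene.
    Integer shifts are used for brevity; sub-pixel interpolation is typical."""
    acc = np.zeros_like(views[0], dtype=float)
    for img, (u, v) in zip(views, positions):
        dy = int(round(slope * v))
        dx = int(round(slope * u))
        acc += np.roll(np.roll(img.astype(float), dy, axis=0), dx, axis=1)
    return acc / len(views)
```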

[Figure: fusion results for 'Lego Knights' (top) and 'Jelly Beans' (bottom); rows A1-A8 and B1-B8 in each scene.]
Figure 6. Fusion results of light field refocused images at various decomposition levels. Results in row A are produced by the DWT-based method, and results in row B by the SWT-based method. The column number corresponds to the wavelet decomposition level (from 1 to 8).

It can clearly be seen that the DWT-based approach has serious problems: it produces unsatisfactory fusion results with ringing artifacts at levels 2, 3 and 4, and color distortion at levels 6, 7 and 8. The proposed SWT-based method obviously performs better, with no ringing artifacts at low levels and less color distortion at high levels. This confirms that the SWT-based approach is more robust to the choice of decomposition level and performs well in extremely complicated fusion situations.

5. CONCLUSION
This paper first reviews the traditional DWT-based fusion method and then proposes a novel approach to fuse complicated multi-focus images. In the proposed method, the wavelet coefficients are obtained by SWT decomposition and different fusion rules are employed to integrate them: the 'Max Abs' rule is adopted for the detail coefficients and the 'Avg' rule for the approximate coefficients. Finally, the investigated methods are tested experimentally and compared. How to reduce the computational complexity of the algorithm and how to optimize its parameters are worth further research.



REFERENCES

[1] Wang Y, Hou G, Sun Z, “A simple and robust super resolution method for light field images,” IEEE International
Conference on Image Processing. IEEE, 1459-1463, (2016)
[2] Li S, Kang X, Fang L, “Pixel-level image fusion: A survey of the state of the art,” Information Fusion. 33, 100-112,
(2017)
[3] Kannan K, Perumal S A, Arulmozhi K. “The Review of Feature Level Fusion of Multi-Focused Images Using
Wavelets,” Recent Patents on Signal Processing, 2(1), 28-38, (2010)
[4] Bai X, Liu M, Chen Z, “Multifocus image fusion through gradient-based decision map construction and
mathematical morphology,” IEEE Access, 4, 4749-4760, (2016)
[5] Yang Y, Yang M, Huang S, “Multi-focus Image Fusion Based on Extreme Learning Machine and Human Visual
System,” IEEE Access, 5(99), 6989-7000, (2017)
[6] Wang H, Jing Z, Li J, "An image fusion approach based on discrete wavelet frame," Proceedings of the Sixth International Conference on Information Fusion, IEEE, 1490-1493, (2003)
[7] Zhou Z, Lin J, Jin W, “Image fusion by combining SWT and variational model,” International Congress on Image
and Signal Processing. IEEE, 1907-1910, (2011)
[8] Lewis J J, O'Callaghan R J, Nikolov S G, “Pixel- and region-based image fusion with complex wavelets,”
Information Fusion, 8(2), 119-130, (2007)
[9] Li S, Yang B, “Hybrid multiresolution method for multisensor multimodal image fusion,” IEEE Sensors Journal,
10(9), 1519-1526, (2010)
[10] Zhang Z, Blum R S, “A categorization of multiscale-decomposition-based image fusion schemes with a performance
study for a digital camera application,” Proceedings of the IEEE, 87(8), 1315-1326, (1999)
[11] Chu H, Li J, Zhu W L, "Image fusion scheme based on local gradient," International Conference on Communications, Circuits and Systems, Vol. 1, IEEE, 528-532, (2005)
[12] Stanford, "The (new) Stanford light field archive," http://lightfield.stanford.edu, (2008)
[13] Wilburn B, Joshi N, Vaish V, “High performance imaging using large camera arrays,” ACM Transactions on
Graphics, 24(3), 765-776, (2005)
