
Signal Processing 103 (2014) 142–154


Generalized joint kernel regression and adaptive dictionary learning for single-image super-resolution
Chen Huang*, Yicong Liang, Xiaoqing Ding, Chi Fang*
State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
* Corresponding authors. Tel.: +86 10 62772369-645. E-mail addresses: yach23@gmail.com (C. Huang), liangyicong@ocrserv.ee.tsinghua.edu.cn (Y. Liang), dxq@ocrserv.ee.tsinghua.edu.cn (X. Ding), fangchi@ocrserv.ee.tsinghua.edu.cn (C. Fang).
http://dx.doi.org/10.1016/j.sigpro.2013.11.042

Article history: Received 4 June 2013; Received in revised form 30 September 2013; Accepted 18 November 2013; Available online 27 December 2013.

Keywords: Single-image super-resolution; Face hallucination; Face recognition; Joint kernel regression; Dictionary learning.

Abstract

This paper proposes a new approach to single-image super-resolution (SR) based on generalized adaptive joint kernel regression (G-AJKR) and adaptive dictionary learning. The joint regression prior aims to regularize the ill-posed reconstruction problem by exploiting the local structural regularity and nonlocal self-similarity of images. It is composed of multiple locally generalized kernel regressors defined over similar patches found in the nonlocal range, which are combined, thus simultaneously exploiting both image statistics in a natural manner. Each regression group is then weighted by a regional redundancy measure we propose, to control their relative effects of regularization adaptively. This joint regression prior is further generalized to the range of multi-scales and rotations. For robustness, adaptive dictionary learning and a dictionary-based sparsity prior are introduced to interact with this prior. We apply the proposed method to both general natural images and human face images (face hallucination), and for the latter we incorporate a new global face prior into SR reconstruction while preserving face discriminativity. In both cases, our method outperforms other related state-of-the-art methods qualitatively and quantitatively. Besides, our face hallucination method also outperforms the others when applied to face recognition applications.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Single-image super-resolution (SR) refers to the task of estimating a high-resolution (HR) image $X \in \mathbb{R}^n$ from a single low-resolution (LR) image $Y \in \mathbb{R}^m$ (both lexicographically ordered vectors, with $m < n$). SR techniques are central to various applications, such as medical imaging, satellite imaging and video surveillance. They are especially necessary for face recognition applications in video surveillance systems, because face resolution is normally low in surveillance video, causing the loss of essential facial features for recognition purposes. The SR of face images is also called face hallucination [1-5].

The imaging model in the SR problem is generally expressed as

$$Y = DHX + V, \tag{1}$$

where $D \in \mathbb{R}^{m \times n}$ and $H \in \mathbb{R}^{n \times n}$ are the downsampling matrix and blurring matrix respectively, and $V \in \mathbb{R}^m$ is assumed to be an additive white Gaussian noise vector. Recovering an HR $X$ from the input LR $Y$ is then an ill-posed problem, and the optimal HR image $X$ can be found by maximizing the posterior probability $p(X|Y)$ based on the maximum a posteriori (MAP) criterion and the Bayes rule:

$$\hat{X} = \arg\max_X p(X|Y) = \arg\max_X \frac{p(Y|X)\,p(X)}{p(Y)}. \tag{2}$$

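(Illustrative note, not part of the original paper.) The forward model of Eq. (1) is straightforward to simulate; a minimal sketch is given below, using a truncated Gaussian blur for $H$, decimation for $D$, and additive white Gaussian noise for $V$. The blur width, scale factor and noise level follow the experimental settings reported in Section 4; the function and parameter names are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(x_hr, scale=3, sigma_blur=1.6, sigma_noise=5.0, rng=None):
    """Simulate Y = DHX + V of Eq. (1): truncated Gaussian blur (H),
    decimation by `scale` (D), additive white Gaussian noise (V)."""
    rng = np.random.default_rng(rng)
    # truncate so the kernel support is 7x7 for sigma_blur = 1.6
    blurred = gaussian_filter(x_hr.astype(np.float64), sigma_blur,
                              truncate=3.0 / sigma_blur)        # H X
    y = blurred[::scale, ::scale]                               # D (H X)
    y += rng.normal(0.0, sigma_noise, size=y.shape)             # + V
    return y

# usage: y_lr = degrade(x_hr) for a 2-D grayscale numpy array x_hr
```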
Generally, $p(Y|X)$ is modeled by a Gaussian distribution, so maximizing $p(Y|X)$ boils down to minimizing the data constraint [2] $\|Y - DHX\|_2^2$. On the other hand, $p(X)$ encodes the prior knowledge that we want to impose in the HR space. Typically, the task of SR reconstruction is formulated as a regularized least-squares optimization problem:

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda\, C(X), \tag{3}$$

where $\lambda$ is the parameter balancing the effects of the data constraint and the regularization term $C(X)$. Most past works focus on designing different formulations of $C(\cdot)$ to regularize the ill-posed reconstruction problem.

Currently, single-image SR methods can be mainly categorized into three classes: interpolation-based methods, reconstruction-based methods, and example-based methods. Interpolation techniques (e.g. [6]) are simple and fast but tend to blur fine details. The reconstruction-based methods (e.g. [7-10]) follow the form of Eq. (3), and how to design a good image prior is always an essential issue; $C(\cdot)$ is usually a smoothness constraint. The example-based methods (e.g. [9,11-15]) hallucinate detailed textures from a training set of LR/HR image or patch pairs. However, such methods rely strongly on the chosen dataset for satisfactory results.

Many example-based methods directly or implicitly use a co-occurrence prior to constrain the correspondence between LR and HR patches. For example, Yang et al. [11] explored the sparse representation of LR patches over an LR dictionary, and used the same representation coefficients to generate the HR output, but the result usually suffers from inconsistency between neighboring patches. Other natural image priors have also been studied in the literature. The gradient profile prior is developed in [8] to preserve sharp edges, but is limited in modeling the visual complexity of real images. Later, priors of image self-similarity and local/nonlocal regularity were exploited for more robust estimation. In [9], the nonlocal self-similarity properties both within and across spatial scales are fully exploited, but the local regularities are neglected. Zhang et al. [10] improved on this by assembling a Steering Kernel Regression (SKR) [16] based local prior and a Nonlocal Means (NLM) [17] based nonlocal prior, whose connection, however, remains loose.

Another trend in SR is to combine the reconstruction- and example-based methods (usually dictionary induced) into a unified framework to produce more compelling results. In fact, SR can be viewed as a regression problem aiming to map LR images to target HR images. In this sense, dictionary-based methods do local regression using bases learned from an external database or the input image itself, while regression models directly estimate HR pixels (kernel learning) or regularize the estimator. As for the regression models, examples include SKR [16], Gaussian Process Regression (GPR) [12], Kernel Ridge Regression (KRR) [13] and Non-Local Kernel Regression (NLKR) [14], and they can all be effectively exploited as priors for SR reconstruction. Among them, NLKR overcomes the drawbacks of [10] by unifying the local and nonlocal priors into a single model in a complementary way, but it discards the further potential enabled by higher-order statistics. Besides, it needs a separate deblurring process, which is ill-posed by itself. In our previous work [18], we proposed an Adaptive Joint Kernel Regression (AJKR) method, combining a set of coherent NLM-generalized local regressors in the nonlocal range with higher-order information (i.e. regional redundancy) injected in. By further integrating adaptive dictionary learning under the MAP framework, this algorithm produces superior results to NLKR and removes the need for separate deblurring. However, it only builds kernel regressors at the same scale and rotation. To exploit the full potential offered by such joint regression, the core algorithm should be generalized.

For face hallucination, the above methods dealing with general natural images cannot be readily applied because they ignore the special properties of face images. This problem was first addressed in the pioneering work of Baker and Kanade [1], who adopted an image pyramid to learn a prior on the gradient distribution of frontal face images using Bayesian theory. However, their HR image prediction is pixelwise, which causes discontinuities and artifacts. To generate high-quality HR face images, current face hallucination methods usually involve two steps. The first step reconstructs a global face in the face subspace using the MAP criterion [11,2] or manifold learning methods. Principal Component Analysis (PCA) [2,3] is widely used for face modeling; classic manifold learning methods include Locality Preserving Projections (LPP) [5], Canonical Correlation Analysis (CCA) [4] and so on. When neighborhood preservation and correlation maximization are the only concerns, discriminativity is often lost, and the frequently used PCA, for example, yields results resembling the mean face. The second step produces a residue image to recover details [2,4,5,11].

This paper focuses on SR from a given LR version of a general natural image or a face image. Similar to our previous AJKR method [18], we address this problem from the viewpoints of learning good regression priors and robust dictionaries. However, we generalize AJKR in two ways: (1) we extend the regression range to multi-scales and rotations, obtaining a Generalized AJKR (G-AJKR) method, and (2) we incorporate a new global structure prior of human faces into the G-AJKR method for face hallucination, while preserving individual discriminativity based on Partial Least Squares (PLS) [19], which is very important for face recognition applications.

The remainder of the paper is organized as follows. Section 2 reviews related works on dictionary and manifold learning, as the development that follows relies on them. Section 3 details our G-AJKR framework and its extension to face hallucination. Experimental results of SR on generic and face images, with applications in face recognition, are provided in Section 4. We conclude the paper in Section 5.

2. Related works

2.1. Dictionary learning

Learning a good dictionary is important for example-based methods to do local regression using the learned bases.
Fig. 1. Two-dimensional embeddings of the PCA coefficients of LR (first row) and HR (second row) face images by different subspace methods.

Traditional choices are the analytically designed wavelet bases, which lack sufficient flexibility for a given image. Other methods learn a dictionary (usually overcomplete) from an image database using techniques like K-SVD [20] and PCA [21], but the flexibility is still limited since the dictionary is only learned to perform well on average. Online dictionary learning from the given image itself offers a promising alternative that exploits the rich information contained in the input [22,23]. One drawback is that the learning process easily runs the risk of building dictionaries with many artifacts under image corruptions.

2.2. Manifold learning for face hallucination

The most popular modeling method in face hallucination is PCA [2,3], but it is holistic and tends to yield faces like the mean. Since face images have been shown to reside on a low-dimensional nonlinear manifold, researchers are inspired to use manifold learning to hallucinate global faces. Typical methods include Locally Linear Embedding (LLE) [24,25] and LPP [5], projecting onto a subspace that preserves neighborhood relationships. A major assumption in applying all these methods to infer HR faces is that the LR and HR manifolds have similar local topologies, so that the HR image can be represented as a linear combination of neighbors using the same weights derived in the LR space. To strengthen this assumption in practice, CCA [4] finds an optimal subspace that maximizes the correlation between LR and HR images. Specifically, CCA finds two bases $U$ and $V$ to linearly map the two sets of vectors $X$ and $Y$ to a common subspace where the correlation is maximized:

$$\{U, V\} = \arg\max_{U,V}\ \mathrm{corr}(XU, YV)^2 = \arg\max_{U,V} \frac{\mathrm{cov}(XU, YV)^2}{\mathrm{var}(XU)\,\mathrm{var}(YV)}, \tag{4}$$

where $\mathrm{cov}(\cdot,\cdot)$ is the covariance operator.

Unfortunately, CCA and the other subspace methods mentioned above suffer from a common drawback: too much effort on neighborhood preservation or correlation maximization may congregate different neighbors together in the projected subspace, making local topology recovery therein very difficult. Taking CCA as an example, imagine an extreme case where the first coordinates of $X$ and $Y$ are perfectly correlated while the others are almost uncorrelated; CCA will give the first coordinate as the principal direction, which projects all the data points in $X$ and $Y$ to a common single point, making it impossible to recover the neighborhood structure. This will result either in wrong neighbor selection, with recovery in the subspace $YV$ giving an unfaithful face, or in indistinguishable reconstruction results in the subspace $XU$ with a near-mean face, both hampering face recognition through reduced discriminativity. Recently, Partial Least Squares (PLS) [19] was proposed and successfully applied to face recognition and multi-modal recognition [26]. It finds normalized bases $U$ and $V$ to maximize the covariance:

$$\{U, V\} = \arg\max_{U,V}\ \mathrm{cov}(XU, YV)^2 = \arg\max_{U,V}\ \mathrm{var}(XU)\cdot \mathrm{corr}(XU, YV)^2 \cdot \mathrm{var}(YV), \quad \text{s.t.}\ \|U\| = \|V\| = 1. \tag{5}$$

Clearly, PLS tries to maintain correspondence as well as preserve variance. Fig. 1 shows LR and HR subspaces obtained from the PCA coefficients of real face images. As can be seen, the correlation (local neighborhood similarity assumption) does not hold well for LPP and LLE, and little discrimination is preserved either. Although the correlation is perfectly maintained by CCA, the projections still congregate. On the other hand, PLS preserves both correlation and discriminativity very well.
Fig. 2. (a) Graphical illustration of the G-AJKR framework, where the reference patch is marked as R; (b) block diagram of our generic image SR algorithm; (c) extension to face hallucination by introducing a new global face structure prior based upon PLS.

3. Proposed G-AJKR framework for single-image SR

To make better use of image local structural regularity and nonlocal self-similarity, a G-AJKR method is proposed in this section. An overview of the proposed method is shown in Fig. 2. A new joint regression prior is learned across scales and rotations and is weighted by the regional redundancy measure (Fig. 2(a)), and adaptive dictionary learning is integrated (Fig. 2(b)). We also study how to introduce a new global face structure prior into G-AJKR so that it can be tailored towards face hallucination (Fig. 2(c)).

3.1. Generic image SR

3.1.1. Review of the previous AJKR method
The AJKR method in our previous work [18] combines cues from local and nonlocal image priors, which are inspired by SKR [16] and NLM [17], respectively. It enables more reliable and robust results than similar unifying methods by simultaneously exploiting both image priors in a higher-order collaborative manner. Let $x_i$ denote the location of the $i$-th pixel in the HR grid, $Y_i$ the pixel observation at $x_i$, and $\mathbf{Y}_i$ the patch vector of pixels in $x_i$'s local neighborhood $N(x_i)$. The joint kernel regression model is then formulated as

$$\hat{a}_i = \arg\min_a \sum_{j \in P(x_i)} w^N_{ij}\, \|\mathbf{Y}_j - \Phi a\|^2_{W^N_j}, \tag{6}$$

where $\mathbf{Y}_j$ is a patch similar to the one at $x_i$ (including itself) found in a nonlocal range $P(x_i)$ within the same image scale. The model performs regression on each similar patch using the polynomial bases $\Phi$ (say, second-order) from a Taylor expansion with regression coefficients $a$, and combines the regressors using the patch similarity weights $w^N_{ij}$ defined by

$$w^N_{ij} = \exp\!\left(-\frac{\|\mathbf{Y}_i - \mathbf{Y}_j\|^2_{W_G}}{h_n^2}\right), \tag{7}$$

where $W_G$ is the weight matrix of a Gaussian kernel and $h_n$ is the decay parameter. Note that the combined local regressors are generalized from NLM as in [27], so the kernel weight matrix is $W^N_j = \mathrm{diag}(w^N_{j1}, w^N_{j2}, \ldots, w^N_{jJ})$, $J = |N(x_j)|$, instead of the popular spatial kernel matrix $W^K_j$ in SKR [16]. The kernel weight $w^N_{ji}$ is calculated in the same way as in Eq. (7), but in the local neighborhood $N(x_j)$. The aim is to harmonize such nonlocal fusion in a complete nonlocal sense.

The joint regression scheme presented in Eq. (6) exploits the local and nonlocal priors simultaneously and collaboratively. Clearly, the local kernel regression regularizes the observations found by nonlocal search via structural regression, while the nonlocal self-similarity enhances the robustness of local estimation by providing redundancies. Besides, the kernel regressors generalized from NLM are more consistent with this nonlocal fusion. The pixel estimate at $x_i$ is the first element of the vector solution of Eq. (6):

$$\hat{z}(x_i) = e_1^T \hat{a}_i = e_1^T \left[\sum_{j \in P(x_i)} \Phi^T w^N_{ij} W^N_j \Phi\right]^{-1} \sum_{j \in P(x_i)} \Phi^T w^N_{ij} W^N_j \mathbf{Y}_j, \tag{8}$$

where $e_1$ is a vector with the first element equal to one and the rest zero.

Defining the row vectors $k_{ij}^T = e_1^T \big[\sum_{j \in P(x_i)} \Phi^T w^N_{ij} W^N_j \Phi\big]^{-1} \Phi^T w^N_{ij} W^N_j$ to be the equivalent kernels with which we perform the regression for $x_i$, we can plug them into the SR optimization function in Eq. (3) to act as the regularization $C(X)$:

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \sum_{i=1}^n \Big\|X_i - \sum_{j \in P(x_i)} k_{ij}^T\, \mathbf{X}_j\Big\|_2^2, \tag{9}$$

where $X_i$ denotes the pixel to be estimated at location $x_i$, and $\mathbf{X}_j$ is its similar patch centered at $x_j$ found in the nonlocal range. By properly arranging the $k_{ij}^T$ into an equivalent kernel matrix $K$, we obtain the matrix form

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \|(I - K)X\|_2^2, \tag{10}$$

where $I$ is the identity matrix.

Considering that the degree of patch redundancy for the joint regression in Eq. (9) varies significantly across different regions within an image, we further propose an explicit measure of regional redundancy to determine the confidence of each regression group:

$$R_i = \sum_{j \in P(x_i)} w^N_{ij}, \qquad w^N_{ij} = \exp\!\left(-\frac{\|\mathbf{X}_i - \mathbf{X}_j\|^2_{W_G}}{h_n^2}\right), \tag{11}$$

which can also be regarded as penalizing the patch distances, due to the way the $w^N_{ij}$ are calculated. Obviously, the smaller the distances (the larger $R_i$), the more similar the grouped patches are and the more patch redundancy there is in the nonlocal region. Usually, edges and smooth areas have large values while textures have small values. We then use it for adaptive regularization:

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \|(I - K)X\|_R^2, \tag{12}$$

where $R = \mathrm{diag}(R_1, R_2, \ldots, R_n)$ is a diagonal matrix. This makes our idea of nonlocally joint regression more adaptive and complete by building a global image-region view at a higher level. The regional redundancy measure also brings out the inherent dissimilarity between the AJKR method and the NLKR method [14], i.e. it can account for the inter-group variance rather than working in a blindly group-wise way as in NLKR. Compared with Zhang et al. [10], we enjoy the advantage of better capturing both image priors in an adaptive and collaborative framework instead of crudely imposing two penalty terms.

3.1.2. Proposed G-AJKR method

Image self-similarity tends to occur both at the same scale and rotation, and across different scales [9,15] and rotations. To attain the full power of joint kernel regression enabled by image self-similarity, we extend the AJKR method to a larger range of multi-scales and rotations in addition to just translations. Concretely, we additionally compare the upright unscaled reference patch with target patches that are scaled and rotated around their centers. To search a range of scales $s \in [0, S]$ and a range of rotations $\theta \in [\theta_1, \theta_2]$, the search space of the original algorithm is extended from $(x_i)$ to $(x_i, \theta, s)$, generalizing the translation field to a mapping $f: \mathbb{R}^2 \to \mathbb{R}^4$.

Patch rotation is simply achieved by bicubic interpolation. To search at multiple scales, we generate a multi-resolution image hierarchy of decreasing resolutions $\{X^s\}$ scaled down by operators $D^s$ with scale factors $1.25^{-s}$, $s \in [0, S]$. Here, $D^s$ is a patch downsampling operator which keeps the patch center on the LR grid, while $D^{sT}$ is a patch upsampling operator with zero-padding [14]. Then, for the reference patch centered at $x_i$ on the current image plane, we compare to find its nonlocal neighbors $\mathbf{X}^s_j$ of the same patch size $J$ at locations $1.25^{-s} x_j$ on the image plane of $X^s$. This leads to a generalized nonlocally similar patch set $P^s(x_i)$, and further to $P^{s,\theta}(x_i)$ when the rotated patch versions $\mathbf{X}^{s,\theta}_j$ are considered (see Fig. 2(a) for an example).
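(Illustrative note, not part of the original paper.) The 4D candidate generation just described — a $1.25^{-s}$ pyramid plus bicubic rotations — can be sketched as follows. For brevity, this toy version compares only co-located patches (omitting the nonlocal translation search) and uses plain rather than Gaussian-masked patch distances in the weights of Eq. (14); all names are our own:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def patch_at(img, cy, cx, size=7):
    r = size // 2
    return img[cy - r:cy + r + 1, cx - r:cx + r + 1]

def candidates_4d(x, cy, cx, S=3, angles=(-90, -45, 0, 45, 90), size=7):
    """Enumerate scaled/rotated patch candidates around (cy, cx):
    pyramid levels X^s with factors 1.25**-s and bicubic-spline rotations."""
    pyramid = [x] + [zoom(x, 1.25 ** -s, order=3) for s in range(1, S + 1)]
    for s, xs in enumerate(pyramid):
        f = 1.25 ** -s
        py, px = int(round(cy * f)), int(round(cx * f))
        for th in angles:
            big = patch_at(xs, py, px, size + 6)   # pad, rotate, re-crop
            if big.shape != (size + 6, size + 6):
                continue                            # skip out-of-bounds patches
            rot = rotate(big, th, reshape=False, order=3)
            c = rot.shape[0] // 2
            yield rot[c - size // 2:c + size // 2 + 1,
                      c - size // 2:c + size // 2 + 1]

def redundancy(x, cy, cx, h=15.0, size=7):
    """Similarity weights w (cf. Eq. (14), unmasked) and R_i = sum of w."""
    ref = patch_at(x, cy, cx, size).ravel()
    w = np.array([np.exp(-np.sum((ref - p.ravel()) ** 2) / h ** 2)
                  for p in candidates_4d(x, cy, cx, size=size)])
    return w, w.sum()
```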
This way, we obtain the generalized joint kernel regression model

$$\hat{a}_i = \arg\min_a \sum_{s=0}^{S} \sum_{\theta=\theta_1}^{\theta_2} \sum_{j \in P^{s,\theta}(x_i)} w^{N,s,\theta}_{ij}\, \|\mathbf{X}^{s,\theta}_j - \Phi a\|^2_{W^{N,s,\theta}_j}, \tag{13}$$

where $W^{N,s,\theta}_j$ is the corresponding kernel weight matrix, and the patch similarity together with the induced regional redundancy of Eq. (11) become

$$w^{N,s,\theta}_{ij} = \exp\!\left(-\frac{\|\mathbf{X}_i - \mathbf{X}^{s,\theta}_j\|^2_{W_G}}{h_n^2}\right), \qquad R_i = \sum_{s=0}^{S} \sum_{\theta=\theta_1}^{\theta_2} \sum_{j \in P^{s,\theta}(x_i)} w^{N,s,\theta}_{ij}. \tag{14}$$

The equivalent kernels for the regression-based regularization in Eq. (9) can be rewritten as

$$(k^{s,\theta}_{ij})^T = e_1^T \left[\sum_{s=0}^{S} \sum_{\theta=\theta_1}^{\theta_2} \sum_{j \in P^{s,\theta}(x_i)} \Phi^T w^{N,s,\theta}_{ij} W^{N,s,\theta}_j \Phi\right]^{-1} \Phi^T w^{N,s,\theta}_{ij} W^{N,s,\theta}_j. \tag{15}$$

To derive the solution in matrix form, it should be noted that the regression kernels $(k^{s,\theta}_{ij})^T$ are applied to the pyramid $\{X^s\}$ of different resolutions, not just $X^0 = X$. We therefore add them up using the patch downsampling operator $D^s$ and upsampling operator $D^{sT}$ and pack them into a single kernel matrix $\tilde{K}$. By plugging it into Eq. (12), our final G-AJKR model is given as

$$\hat{X} = \arg\min_X \|Y - DHX\|_2^2 + \lambda \|(I - \tilde{K})X\|_R^2. \tag{16}$$

Fig. 2(b) shows the block diagram of our G-AJKR algorithm, which is actually solved iteratively with our dictionary learning (see next) by the iterative shrinkage algorithm [28]. At each iteration $t$, we construct the image hierarchy from the current HR image estimate $X^t$ (so we get $\{X^{t,0}, \ldots, X^{t,S}\}$). Since at the first iteration the HR image is not available, we initialize it by bicubic interpolation to generate the pyramid images. Generally, the richer redundancies offered by the 4D search across translations, rotations and scales give rise to the stability and robustness of joint structural regression.

3.1.3. Adaptive dictionary learning

The above G-AJKR framework can further benefit from dictionary-based methods that introduce natural image priors from learned bases. We here adopt an adaptive dictionary learning scheme combining an offline dictionary $B_0$ (learned from an external database, see Fig. 3(a)) and an online dictionary $B_1$ (learned from the input image, see Fig. 3(b)). In doing so, $B_1$ goes beyond the universal nature of $B_0$ and hence gives rise to adaptivity to any given patch; on the other hand, $B_0$ introduces robustness through the learned rich priors that counterbalance the possible outliers in $B_1$ caused by input corruptions. The superiority of this scheme is illustrated in Fig. 3(c).

Fig. 3. (a) Centroids of the offline PCA dictionary B0; (b) examples of the online PCA dictionaries B1, where the first 8 atoms are shown; (c) PSNR curves (×3) versus iterations for different dictionary learning schemes on the Lena image corrupted with a Gaussian blur (σb = 2) and Gaussian noise (σn = 5).

Specifically, we adopt the adaptive PCA strategy in [21] to learn $B_0$, and also apply PCA to the already grouped similar patches to learn $B_1$ as in [22,23]. Note that $B_1$ is learned from patches only at the same scale and rotation in the above G-AJKR process, to avoid large variance or reduced descriptiveness of the dictionary. Once the dictionary $B = [B_0, B_1] \in \mathbb{R}^{J \times d}$ is built, we can represent an image patch $\mathbf{X}_i \in \mathbb{R}^J$ as a linear combination of the atoms in $B$ such that $\mathbf{X}_i = B\alpha_i$, $\alpha_i \in \mathbb{R}^d$. Letting the whole image be the average of all the overlapping patch estimates, $\alpha$ be the concatenation of all the $\alpha_i$, and $\circ$ be the representation operator as in [21,22], we reach our final optimization function for SR:

$$\hat{\alpha} = \arg\min_\alpha \|Y - DHB \circ \alpha\|_2^2 + \lambda \|(I - \tilde{K})B \circ \alpha\|_R^2 + \eta \|\alpha\|_1, \tag{17}$$

where $\eta$ is the regularization parameter of the additional sparsity term.
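(Illustrative note, not part of the original paper.) Eq. (17) is an $\ell_1$-regularized least-squares problem of the kind solved by the iterative shrinkage algorithm of [28]. The sketch below shows that scheme on a generic objective of this form; the quadratic G-AJKR term would enter only through its gradient, left here as an optional callback (`quad_grad` is our own name), so this is a structural sketch rather than the paper's full solver:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal map of the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, eta, n_iter=200, quad_grad=None):
    """Iterative shrinkage [28] for  min_a 0.5*||y - A a||^2 + eta*||a||_1.
    `quad_grad(a)`, if given, adds the gradient of an extra smooth
    quadratic term (e.g. the G-AJKR prior lambda*||(I - K)B a||_R^2)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the
    a = np.zeros(A.shape[1])             # data-term gradient
    for _ in range(n_iter):
        g = A.T @ (A @ a - y)
        if quad_grad is not None:
            g = g + quad_grad(a)
            # (a safe step size would re-estimate L to include this term)
        a = soft(a - g / L, eta / L)
    return a
```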
What distinguishes our dictionary scheme from others is its ability to adapt to the G-AJKR prior in response to our regional redundancy measure within a unified framework (Fig. 2(b)). Specifically, if no or few similar patches are found for a given patch, its near-zero redundancy measure $R_i$ cancels out the erroneous joint regression prior, thus reducing Eq. (17) to sparse coding only at that patch; the coding is then performed mainly over the bases in $B_0$, because those in $B_1$, learned online from mutually dissimilar patches, deviate too far from the true signal space to represent the patches accurately. In the case of high patch redundancy, by contrast, the large $R_i$ imposes a strong effect of the joint regression prior, while the online dictionary $B_1$ dominates in enforcing the remaining sparse representation prior. This adaptive mechanism not only guarantees the best possible solution for individual patches (never worse than sparse coding alone), but also makes our framework more tolerant to noise or blur that may cause e.g. grouping errors.

3.2. Face hallucination

Face images differ from general natural images in that they are more regular in structure; therefore, introducing a global facial structure prior can be conducive. Following [2,4,5,11], we also propose a two-step face hallucination method, where the first step constrains and reconstructs the global face in a discriminativity-preserving subspace. The learned subspace can introduce into $p(X)$ in Eq. (2) a strong facial prior describing the main characteristics of face images.

As mentioned in Section 2.2, the commonly used manifold learning methods LLE and LPP potentially suffer from a great loss of discriminative ability. Since CCA only correlates the LR and HR subspaces, it can also fail to differentiate between the subspace projections. As a result, the facial features hallucinated by these subspace methods may not be faithful to the ground truth but are more like mean features or unexpected ones. We here propose to use PLS [19] to learn an intermediate subspace, which strikes a balance between the objectives of CCA and PCA by maintaining correlation while capturing the projection variations (discrimination). Under the co-occurrence assumption, neighbor-based reconstruction in such a subspace will generate faithful (or unique) visual features, which are crucial for the task of face recognition. We first learn PCA models for the LR/HR training images, with corresponding mean faces $\mu_L$ and $\mu_H$ and orthogonal eigenvectors $E_L$ and $E_H$. LR/HR faces are thus represented as linear combinations of eigenfaces using coefficients $b^L$ and $b^H$:

$$Y = \mu_L + E_L b^L, \qquad X = \mu_H + E_H b^H. \tag{18}$$

Suppose we have collected from the training set the PCA coefficients $\{b^L_i\}_{i=1}^Q$ and $\{b^H_i\}_{i=1}^Q$; PLS is then applied to them to find two normalized bases $V$ and $U$ that maximize the covariance using Eq. (5). Projecting the PCA coefficients $b^L_i$ and $b^H_i$ by these bases into a common subspace, we have

$$c^L_i = V^T b^L_i, \qquad c^H_i = U^T b^H_i, \tag{19}$$

where $c^L_i$ and $c^H_i$ are the subspace projections whose correlation is maximized with the discriminativity preserved. Fig. 2(c) illustrates this relationship, which is beneficial to the neighbor-based reconstruction. Given an input LR face image $Y$ with its PCA coefficients $b^l$ computed, we obtain its PLS projection $c^l = V^T b^l$. Then, for $c^l$ we seek its $M$ nearest neighbors $\{c^L_{i'}\}_{i'=1}^M$ in the trained subspace and the corresponding weights $\{\beta_{i'}\}_{i'=1}^M$:

$$\arg\min_{\{\beta_{i'}\}} \left\| c^l - \sum_{i'=1}^M \beta_{i'} c^L_{i'} \right\|_2^2, \qquad \text{s.t.}\ \sum_{i'=1}^M \beta_{i'} = 1. \tag{20}$$

The closed-form solution for the weights is given by [25]

$$\beta_{i'} = \frac{\sum_{j'} A^{-1}_{i'j'}}{\sum_{l,m} A^{-1}_{lm}}, \qquad A_{i'j'} = (c^l - c^L_{i'})^T (c^l - c^L_{j'}). \tag{21}$$

Using the same weights for the corresponding $\{c^H_{i'}\}_{i'=1}^M$, we can reconstruct the HR projection features in the intermediate subspace:

$$c^h = \sum_{i'=1}^M \beta_{i'} c^H_{i'}. \tag{22}$$

Subsequently, we can reconstruct the PCA coefficients $b^h$ of the HR image and the hallucinated HR global face image $\hat{X}$ as

$$b^h = (UU^T)^{-1} U c^h, \qquad \hat{X} = \mu_H + E_H b^h. \tag{23}$$

To the best of our knowledge, this is the first time that PLS has been used for the face hallucination problem. The PLS-based face reconstruction, instead of simple bicubic interpolation, is implemented as the first step, as shown in Fig. 2(c), to incorporate a good global face prior. The proposed G-AJKR algorithm then follows to recover details. From the Bayesian viewpoint of Eq. (2), G-AJKR further imposes local/nonlocal priors on $p(X)$ and applies the reconstruction constraint for $p(Y|X)$ using Eq. (17).
4. Experimental results

In this section we evaluate the effectiveness of our G-AJKR method for single-image SR. We first give an illustrative example to demonstrate the benefit of the generalization to the range of multi-scales and rotations. We then compare with several related as well as state-of-the-art algorithms on several standard test images and real-world images, both qualitatively and quantitatively. For color images, all the tested algorithms are applied to the luminance channel only, and the quantitative results are reported in terms of PSNR and the Structural SIMilarity index (SSIM) [29]. Since in real-world SR tasks the observed LR images are often contaminated by noise, the robustness of the SR methods with respect to noise is also evaluated. Finally, the face hallucination and recognition performance is presented.

In all experiments we use HR patches of 7×7 pixels (J = 49) with a 4-pixel overlap for both local kernel regression and patch matching. We set the support of the nonlocal search to the 15 nearest neighbors in a window of size 21×21, across all pyramid levels. Since the high dimensionality of the 4D search space imposes a large computational burden, we only set S = 3 (i.e. 4 hierarchy levels) and the rotation range θ ∈ {−90°, −45°, 0°, 45°, 90°}, without much performance degradation. Besides, we apply the 4D search only to patches with high intensity variance (threshold 16), because the SR effect is expressed mostly in highly detailed image regions (e.g. edges and textures) rather than in uniform, low-frequency ones (for which we apply a 2D search). As a result, it typically takes about 4 min to process a 256×256 image on a PC (3.2 GHz, Pentium IV) with our unoptimized MATLAB code. The other parameters are set as λ = 40, η = 0.25, h_n = 15. The synthetic LR images were generated from the original images by a truncated 7×7 Gaussian kernel (σ_b = 1.6), downsampling by a factor of 3, and adding Gaussian noise (σ_n = 5). The optimization problem in Eq. (17) is solved with the iterative shrinkage algorithm [28].

Our face hallucination experiments were performed on frontal-view face images from the CAS-PEAL [30] and FERET [31] databases. For the CAS-PEAL database, we randomly selected Q = 500 images (one per person) with normal expression to train the PLS subspace, and another 40 images (disjoint from the training set) for testing; for the FERET database, Q = 800 images were randomly selected for subspace training, with another 403 images for testing. All the face images were aligned by the eye positions and mouth center and cropped to 128×96 pixels (HR) and 32×24 pixels (LR). The training set for PLS learning is thus composed of LR/HR image pairs, and we zoom the 32×24 LR test images by a factor of 4. To prepare the offline dictionary B_0 for our G-AJKR algorithm in this scenario, we sampled patches from the HR training images and learned the dictionary following [21]. In the global face reconstruction phase, we use PCA retaining 98% of the variance, and 200 PLS bases are kept for the subsequent projection. The neighborhood size is set to M = 50.

4.1. Generic image SR experiments

Fig. 4 validates the efficacy of our 4D generalization of the basic algorithm on the noisy Parrot image. We compare the results of AJKR and G-AJKR using different scales and rotations. The multi-scale-and-rotation version preserves sharper edges and more faithful details than the original AJKR, with G-AJKR across rotations only lying in between.

Fig. 4. Comparison of SR results (×3) on the noisy Parrot image obtained by AJKR [18] and G-AJKR using different scales and rotations. (a) LR input; (b) AJKR (PSNR: 29.62 dB, SSIM: 0.892); (c) G-AJKR across rotations but at the same scale (S = 0) (PSNR: 29.67 dB, SSIM: 0.896); (d) G-AJKR across rotations and scales (PSNR: 29.78 dB, SSIM: 0.903).

Next we compare our method against four regression-based methods, GPR [12], KRR [13], NLKR [14] and Zhang et al.'s method [10]; two dictionary-based methods, Centralized Sparse Representation (CSR) [22] and Sparse Coding (SC) [11]; and three state-of-the-art methods, Shan et al. [7], Glasner et al. [9] and Freedman and Fattal [15].
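(Illustrative note, not part of the original paper.) PSNR/SSIM figures of the kind reported in Tables 1 and 2 can be computed with scikit-image; note that the SSIM window and constant settings may differ slightly from the original implementation of [29]:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sr(x_true, x_est):
    """PSNR/SSIM on the luminance channel, for 8-bit image ranges."""
    x_true = x_true.astype(np.float64)
    x_est = x_est.astype(np.float64)
    psnr = peak_signal_noise_ratio(x_true, x_est, data_range=255)
    ssim = structural_similarity(x_true, x_est, data_range=255)
    return psnr, ssim
```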

Table 1
Comparison of SR results (×3, PSNR/SSIM) for the noiseless case.

Methods              Bike         Butterfly    Girl         Parrot       Plants
GPR                  21.86/0.635  22.96/0.796  31.71/0.765  26.64/0.854  29.85/0.832
Freedman and Fattal  22.83/0.711  23.61/0.841  32.01/0.782  27.45/0.876  31.12/0.872
Shan et al.          22.78/0.681  24.79/0.847  31.90/0.761  27.90/0.868  30.74/0.841
Glasner et al.       22.95/0.699  24.61/0.837  32.49/0.797  28.28/0.878  31.12/0.867
SC                   23.30/0.739  24.59/0.821  30.93/0.804  28.40/0.883  31.26/0.879
KRR                  23.16/0.717  24.75/0.855  32.49/0.788  28.42/0.883  31.48/0.872
Zhang et al.         24.11/0.786  26.85/0.896  32.96/0.810  29.60/0.900  32.92/0.897
CSR                  24.72/0.802  28.19/0.921  33.68/0.826  30.68/0.918  34.00/0.921
G-AJKR               25.48/0.839  28.57/0.943  33.94/0.844  30.86/0.940  34.63/0.940

Table 2
Comparison of SR results (×3, PSNR/SSIM) for the noisy case.

Methods              Bike         Butterfly    Girl         Parrot       Plants
GPR                  21.75/0.620  22.63/0.775  31.03/0.741  26.36/0.829  29.44/0.807
Freedman and Fattal  22.68/0.695  23.42/0.821  30.93/0.742  27.04/0.826  30.22/0.828
Shan et al.          22.67/0.672  24.61/0.837  31.48/0.749  27.72/0.856  30.38/0.829
Glasner et al.       22.78/0.682  24.47/0.830  31.34/0.749  27.95/0.860  30.24/0.827
SC                   22.97/0.698  24.29/0.787  30.35/0.734  27.73/0.800  30.00/0.795
KRR                  22.91/0.685  24.44/0.816  31.28/0.736  27.84/0.816  30.33/0.805
Zhang et al.         23.23/0.711  25.71/0.850  31.46/0.749  28.45/0.856  30.77/0.825
CSR                  23.78/0.736  26.84/0.888  32.03/0.764  29.47/0.878  31.73/0.860
G-AJKR               24.37/0.771  27.16/0.906  32.25/0.790  29.78/0.903  32.08/0.873

Among them, GPR and KRR are two recent regression methods that capture from the input image the mapping between LR and HR patches, via Gaussian process regression and sparse kernel regression respectively. All the compared results were reproduced with code provided by the authors or downloaded from their websites, using the default parameter setups.

Fig. 5. Visual comparison of SR results (×3). (a) Noiseless case; (b) noisy case. The PSNR and SSIM results of all compared methods are given in Tables 1 and 2.

Fig. 6. Visual comparison of SR results (×4) on real LR images.



Fig. 7. Visual comparison of hallucinated global faces by different subspace methods on the CAS-PEAL database. (a) LR input; (b) reconstruction result directly in the PCA space without projection onto other subspaces; (c) PCA+LPP; (d) PCA+LLE; (e) PCA+CCA; (f) PCA+PLS; (g) original HR image.

Fig. 8. Visual comparison of our two-step face hallucination algorithm with the G-AJKR algorithm alone on the CAS-PEAL database. (a) LR input; (b) global hallucination result; (c) result of the two-step algorithm; (d) result of the G-AJKR algorithm only; (e) original HR image.

Tables 1 and 2 show the quantitative results for the noiseless and noisy cases, respectively. Since the implementation of NLKR is not available, we do not include this method in our quantitative comparisons; a visual comparison with its reported result on a real-world image is shown next. As can be seen from the tables, our method consistently outperforms the others across all metrics in both cases, with the largest average PSNR improvements (over GPR) of 4.09 dB (noiseless) and 2.89 dB (noisy), which are quite significant. For a noisy input, not only must image details be recovered, but the noise must also be suppressed. Methods like SC tend to magnify noise during image up-sampling, which leads to lower results than ours.

Fig. 5 visually illustrates the performance differences between our method and the others. In Fig. 5(a), our method synthesizes more visual details and sharper edges without blur, compared with the three regression methods GPR, KRR and Zhang et al.'s method. This is due to our coherent and collaborative use of regression priors and the rich redundancies found across scales and rotations. Our result is also free of the jaggy and ringing artifacts of the SC and CSR methods respectively, which validates the benefits of our adaptive dictionary learning scheme (it usually improves the baseline performance of G-AJKR by about 0.15 dB, as in our previous AJKR case [18]). In comparison to the state-of-the-art methods of Freedman and Fattal, Shan et al. and Glasner et al., our improvements are also evident. Fig. 5(b) shows the noise robustness of our method while preserving edges and small details. Again, this can be attributed to the full and adaptive exploitation of image self-similarity and the responsive dictionary scheme. More challenging results (×4 magnification) on real LR images are shown in Fig. 6. We can see that our method outperforms all the others in terms of visual plausibility.

4.2. Face hallucination and recognition performance

Face hallucination can handle more challenging tasks than generic image SR owing to the regular face structure. Our two-step method first hallucinates a global face in the learned face subspace, which incorporates the special properties of faces and compensates for the information lost in the input. Fig. 7 compares different subspace methods for global face hallucination, whose subspace dimensions and neighborhood sizes are the same as ours.

Fig. 9. Comparison of hallucination results on the CAS-PEAL (rows 1-2) and FERET (rows 3-4) databases along with the corresponding average PSNR values. (a) LR input; (b) Wang and Tang (average PSNR: 25.33 dB); (c) Zhuang et al. (average PSNR: 29.27 dB); (d) NE (average PSNR: 30.32 dB); (e) SC (average PSNR: 30.94 dB); (f) our method (average PSNR: 32.84 dB); (g) original HR image.

Clearly, after projecting PCA coefficients onto subspaces learned by LPP, LLE and CCA, the resultant faces do not look like the original face but like a fused one, which means the
discriminativity is lost. Moreover, unlike CCA, both LPP and LLE cannot maintain the LR-to-HR correspondence, so using the co-occurrence prior to infer the HR face is usually not feasible, making the results deviate further from the ground truth (more like the mean face). Direct reconstruction in the PCA space also suffers from the failure of this co-occurrence assumption, which results in perceptually distracting artifacts. PLS realizes the tradeoff between PCA and CCA to generate smooth and faithful faces, which are not only distinct but also similar to the original faces.

The proposed G-AJKR algorithm follows the global hallucination step to enhance the edges and recover more details (see the results in Fig. 8(b) and (c)). Fig. 8(c) and (d) compare the two-step approach with the direct G-AJKR method alone to validate the necessity of the first step. As can be seen, a direct application of the generic G-AJKR method cannot generate sharp edges and sufficient details without the global facial structure prior.

A more thorough comparison with recent representative methods is conducted on both the CAS-PEAL and FERET databases. We compare our method with the methods of Wang and Tang [3], Zhuang et al. [5], Neighbor Embedding (NE) [24] and Sparse Coding (SC) [11]. Wang and Tang [3] proposed a global hallucination method based on eigentransformation using PCA; we used 300 training image pairs for it and PCA retaining 99% of the variance. Zhuang et al. [5] used manifold learning techniques for both global face reconstruction (based on LPP) and local detail compensation; we used the default parameter setup for this method. NE and SC are two patch-based methods for generic image SR, selected for comparison here due to their similar use of neighbor-based reconstruction and the co-occurrence assumption. The NE method applies LLE to the LR neighbor patches (with the neighborhood size set to 150) and estimates the HR image patch by patch using the co-occurrence prior. The SC method differs in applying this prior to the sparse representations of image patches. The default parameters were used for these two methods.

Fig. 9 shows that our method produces clean face regions with noticeably sharper edges and more faithful details than the others do. The improvements can be seen on both databases, suggesting the robustness of our method to face variations (e.g. illumination). This impression is confirmed by the corresponding average PSNR values.

It is worth noting that tailoring face SR algorithms purely to good visual quality does not necessarily lead to high face recognition performance by machine. To demonstrate the advantage of the discriminativity preservation offered by our face hallucination method, we finally conducted face recognition experiments on the FERET database. The training (800 images) and testing (403 images) sets were as described before for PLS learning. For recognition, Gabor features are fed into the classic PCA+LDA classifier (reduced to 600 dimensions) using the cosine similarity measure. We then compare the verification rates produced from the hallucinated HR face images of different methods, namely the representative method of Zhuang et al., SC and our method. The performance on the LR and original HR images is used as a benchmark.

Fig. 10. Comparison of recognition results (ROC curves) for different face hallucination methods on the FERET database.

The Receiver Operating Characteristic (ROC) curves are plotted in Fig. 10. They show that the gap between the verification rates of the LR and HR images is around 20% when the false accept rate is 0.1%. Our method significantly improves the LR performance, by around 18%, whereas the performance of Zhuang et al.'s method and SC is only slightly better. Since our method can first infer a global face without undesirable artifacts and noise while preserving unique facial features using PLS, the discriminativity is maintained, which is beneficial to the face recognition task.

5. Conclusions

This paper introduces a generalized joint kernel regression framework for single-image super-resolution. It combines multiple coherent local kernel regressors to exploit the local and nonlocal image priors in a higher-order collaborative manner, and is further generalized to the range of multi-scales and rotations. An adaptive dictionary learning scheme is also integrated to interact with the regression prior for robustness. A large variety of experiments shows that the proposed algorithm achieves state-of-the-art performance and is also successfully applied to the specific domain of human faces. The extension to face hallucination tasks distinguishes itself by incorporating a discriminativity-preserving global face prior based on Partial Least Squares, which leads to faithful hallucination results as well as high face recognition performance.

Acknowledgments

This work was supported by the National Basic Research Program of China (973 program) under Grant No. 2013CB329403.

References

[1] S. Baker, T. Kanade, Hallucinating faces, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 83-88.
[2] C. Liu, H.Y. Shum, W.T. Freeman, Face hallucination: theory and practice, Int. J. Comput. Vis. 75 (2007) 115-134.
[3] X. Wang, X. Tang, Hallucinating face by eigentransformation, IEEE Trans. Syst. Man Cybern. Part C 35 (3) (2005) 425-434.
[4] H. Huang, H. He, X. Fan, J. Zhang, Super-resolution of human face image using canonical correlation analysis, Pattern Recognit. 43 (7) (2010) 2532-2543.
[5] Y. Zhuang, J. Zhang, F. Wu, Hallucinating faces: LPH super-resolution and neighbor reconstruction for residue compensation, Pattern Recognit. 40 (11) (2007) 3178-3194.
[6] X. Li, M.T. Orchard, New edge-directed interpolation, IEEE Trans. Image Process. 10 (10) (2001) 1521-1527.
[7] Q. Shan, Z. Li, J. Jia, C.K. Tang, Fast image/video upsampling, ACM Trans. Graph. 27 (5) (2008) 1-7.
[8] J. Sun, Z. Xu, H. Shum, Image super-resolution using gradient profile prior, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
[9] D. Glasner, S. Bagon, M. Irani, Super-resolution from a single image, in: Proceedings of the IEEE International Conference on Computer Vision, 2009, pp. 349-356.
[10] K. Zhang, X. Gao, D. Tao, X. Li, Single image super-resolution with non-local means and steering kernel regression, IEEE Trans. Image Process. 21 (11) (2012) 4544-4556.
[11] J. Yang, J. Wright, T.S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Trans. Image Process. 19 (11) (2010) 2861-2873.
[12] H. He, W.C. Siu, Single image super-resolution using Gaussian process regression, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 449-456.
[13] K.I. Kim, Y. Kwon, Single-image super-resolution using sparse regression and natural image prior, IEEE Trans. Pattern Anal. Mach. Intell. 32 (6) (2010) 1127-1133.
[14] H. Zhang, J. Yang, Y. Zhang, T.S. Huang, Non-local kernel regression for image and video restoration, in: Proceedings of the European Conference on Computer Vision, 2010, pp. 566-579.
[15] G. Freedman, R. Fattal, Image and video upscaling from local self-examples, ACM Trans. Graph. 28 (3) (2010) 1-10.
[16] H. Takeda, S. Farsiu, P. Milanfar, Kernel regression for image processing and reconstruction, IEEE Trans. Image Process. 16 (2) (2007) 349-366.
[17] A. Buades, B. Coll, J.M. Morel, A non-local algorithm for image denoising, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 60-65.
[18] C. Huang, X. Ding, C. Fang, Single-image super-resolution via adaptive joint kernel regression, in: Proceedings of the British Machine Vision Conference, 2013.
[19] R. Rosipal, N. Krämer, Overview and recent advances in partial least squares, in: Proceedings of the International Conference on Subspace, Latent Structure and Feature Selection, 2006, pp. 34-51.
[20] J. Mairal, M. Elad, G. Sapiro, Sparse representation for color image restoration, IEEE Trans. Image Process. 17 (1) (2008) 53-69.
[21] W. Dong, L. Zhang, G. Shi, X. Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization, IEEE Trans. Image Process. 20 (7) (2011) 1838-1857.
[22] W. Dong, L. Zhang, G. Shi, Centralized sparse representation for image restoration, in: Proceedings of the IEEE International Conference on Computer Vision, 2011, pp. 1259-1266.
[23] P. Chatterjee, P. Milanfar, Clustering-based denoising with locally learned dictionaries, IEEE Trans. Image Process. 18 (7) (2009) 1438-1451.
[24] H. Chang, D.Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2004, pp. 275-282.
[25] L.K. Saul, S.T. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. 4 (2003) 119-155.
[26] A. Sharma, D.W. Jacobs, Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 593-600.
[27] P. Chatterjee, P. Milanfar, A generalization of non-local means via kernel regression, in: Proceedings of IS&T-SPIE Computational Imaging VI, 2008, p. 68140P.
[28] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57 (11) (2004) 1413-1457.
[29] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600-612.
[30] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, D. Zhao, The CAS-PEAL large-scale Chinese face database and baseline evaluations, IEEE Trans. Syst. Man Cybern. Part A 38 (1) (2008) 149-161.
[31] P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss, The FERET evaluation methodology for face-recognition algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 22 (10) (2000) 1090-1104.
