

Blind Quality Metric for Multidistortion Images Based on Cartoon and Texture Decomposition

Feiyan Zhang and Badri Roysam

Abstract: In this letter, a no-reference (NR) hybrid image quality assessment (IQA) metric based on cartoon-texture decomposition (CTD) is presented. Focusing on images distorted by both blur and noise, the method exploits the properties of CTD to separate an image into a cartoon part containing the salient edges and a texture part containing the noise. The blur degree and the noise level can then be estimated separately from the two parts. Combined with a prediction of the joint effect between blur and noise distortions, this yields a cartoon-texture decomposition-based blind metric (CTDBBM). Comparative studies with classical full-reference IQA metrics and state-of-the-art NR metrics are conducted on the multidistortion image database LIVEMD. Experimental results show that CTDBBM performs well and is highly consistent with the human opinions given in the database.
Index Terms: Blind/no-reference, cartoon and texture decomposition (CTD), image quality assessment (IQA), joint effect, multidistortion.

I. INTRODUCTION

WIDESPREAD use of imaging devices such as digital cameras and smartphones produces a truly massive number of images every day, and the Internet makes sharing these images easy and fast. While passing through the acquisition, storage, transmission, processing, and display phases, images unavoidably endure various types of distortion, which makes image quality assessment (IQA) research fundamentally important in modern multimedia systems. Since human observers are the final users of images, the most dependable way to evaluate image quality is human opinion. However, subjective evaluation of image quality is costly and time consuming, which is unacceptable in real-time applications; this is why predicting perceived visual quality with objective metrics has attracted considerable interest.
Depending on how much information about the original undistorted image they require, objective IQA metrics can be divided into three categories, which also correspond to the three developmental phases of IQA research.

Manuscript received May 19, 2016; revised June 29, 2016; accepted July 12, 2016. Date of publication July 27, 2016; date of current version August 08, 2016. This work was supported in part by the National Natural Science Foundation of China under Grant 61201452. The associate editor coordinating the review of this manuscript and approving it for publication was M. Cagnazzo.
F. Zhang is with the Department of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004, China (e-mail: zhangfy@zjnu.cn).
B. Roysam is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77204 USA (e-mail: broysam@central.uh.edu).
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2016.2594166

Fig. 1. Flowchart of the proposed metric.

With the entire information of the undistorted image available, full-reference (FR) IQA has been greatly promoted since the emergence of the structural similarity (SSIM) index [1] and its several improved versions [2]-[4]. However, the pristine image is unavailable in most applications, which has spurred the development of reduced-reference IQA metrics [5], [6], which rely on partial information from the original image, and of no-reference (NR) metrics [7]-[9], which rely only on the distorted image. The best NR metrics come from natural scene statistics (NSS) analysis; extracting statistical features in the spatial, DCT, and DWT domains has given rise to several state-of-the-art metrics such as BRISQUE [7], BLIINDS-II [8], and NIQE [9].
The fast development of IQA algorithms has given rise to further research interests, one of which is the quality measurement of multiply distorted images. Despite the efficiency of NR metrics on images with a single distortion type, most of them have difficulty estimating the quality of multidistortion images. Without original image information, NR metrics rely only on how features change in distorted images; when it comes to multidistortion images, however, the feature changes become far more complex. In [10], Chandler mentioned some of the possible joint effects or interference between different distortion types on image features, and pointed out that multidistortion IQA is one of seven challenges to be solved in the future. The recently released LIVE multiply distorted (LIVEMD) image database by Jayaraman et al. [11] is the first publicly released image database concerning multidistortion IQA. The database includes two groups of double-distortion images: 1) images that are first blurred and then compressed by a JPEG encoder; and 2) images that are first blurred and then corrupted by white noise. Each subset has 225 images generated from 15 pristine images.
This letter is specifically interested in the IQA of images with both blur and noise distortions, for the following reasons. First, blur and noise are the two most common distortion types occurring in real image communication systems: acquisition, compression, and transmission. Second, the impacts of blur and noise on image features are usually conflicting, which makes it very difficult to build an IQA model in the way NR metrics do for a single distortion type.



Fig. 2. CTD. (a) Lake building. (b) Cartoon. (c) Texture.

To solve this problem, a pre-step called cartoon-texture decomposition (CTD) is conducted to separate the image into a cartoon part and a texture part. The cartoon part, which retains all the salient edges of the image, can be used for sharpness/blur estimation, while the texture part, which contains all the textures and the noise (if present), can be used to estimate the noise strength. Then, with joint-effect prediction and a suitable pooling strategy, we propose a cartoon-texture decomposition-based blind metric (CTDBBM) for blur- and noise-distorted IQA. The flowchart of the algorithm is shown in Fig. 1.
II. ALGORITHM

Fig. 3. Separation of edges and noise by CTD. (a) Cartoon part of the undistorted image. (b) Texture part of the undistorted image. (c) Cartoon part of the noisy image. (d) Texture part of the noisy image.

A. Cartoon-Texture Decomposition

1) Basic Idea: CTD is an especially interesting image content separation problem that targets the decomposition of an image into texture and piecewise-smooth (cartoon) parts. Such separation has been applied in image compression, image analysis, synthesis, and more. Fig. 2 shows the image lakebuilding from LIVEMD decomposed into a cartoon part and a texture part using the algorithm proposed in [12]. As can be seen, the salient edges of the original image are well preserved in the cartoon part, while the small textures remain in the texture part.
2) Edge-Noise Separation: Since there is no ground truth in CTD, it is difficult to evaluate CTD results numerically, so some researchers demonstrate the advantages of their algorithms through applications such as image deblurring and image restoration from missing pixels [13], [14]. In this letter, we also use this methodology to choose among various recently proposed CTD algorithms [12]-[18]. The aim of applying CTD here is to separate the salient edges and the noise in the image, so two criteria have been set: first, the salient edges must stay as unaffected as possible in the cartoon part to allow accurate blur estimation; second, the noise should be separated completely into the texture part to obtain an accurate noise-level estimate and to avoid its impact on blur estimation.
By comparing four CTD algorithms with source code provided by the authors, AD-aBLMV-ADE [12], BNN [16], TV-L1 [17], and DF [18], we found that AD-aBLMV-ADE proposed in [12] gives the best results according to the two criteria. Fig. 3 shows its ability at edge-noise separation.
As shown in Fig. 3(a) and (c), the salient edges are well preserved in the cartoon part, whereas texture and noise are completely excluded from it; the noise added to the image before decomposition leaves no visible trace, as shown in Fig. 3(c). Comparison between Fig. 3(b) and (d) shows that, in CTD, noise is treated as small texture and preserved in the texture part. A conclusion can be drawn from this analysis: by applying CTD to a multidistortion image, we can separate the edges and the noise almost perfectly. Then, sharpness/blur estimation based on the salient edges and noise-level estimation based on flat blocks can be accomplished, respectively.
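As a rough illustration of this pre-step, the following sketch splits an image using total-variation denoising as a simple stand-in for the AD-aBLMV-ADE algorithm of [12], whose full procedure is considerably more involved; the weight value is an assumption for illustration only.

```python
from skimage import img_as_float
from skimage.restoration import denoise_tv_chambolle

def cartoon_texture_split(image, weight=0.15):
    """Return (cartoon, texture) with image = cartoon + texture."""
    image = img_as_float(image)
    # TV denoising keeps the piecewise-smooth structure (salient edges);
    # the residual carries the fine texture and any noise.
    cartoon = denoise_tv_chambolle(image, weight=weight)
    texture = image - cartoon
    return cartoon, texture
```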
B. Blur Estimation
A five-step blur estimation metric, the perceptual sharpness index (PSI) presented in [19], is applied to the cartoon part of the multidistortion image to obtain the blur index.
Step 1: The Sobel filter is applied to the image patches of the cartoon part to generate an edge map, followed by an adaptive threshold that focuses the analysis on the most significant edges of the image.
Step 2: Considering the vertical gradient only, the edge width is computed as

$$w(x) = \left( w_{\mathrm{up}}(x) + w_{\mathrm{down}}(x) \right) \cos(\alpha(x)) \qquad (1)$$

where w_up(x) and w_down(x) are the distances between the detected edge pixel x and the traced local maximum I_max(x) and local minimum I_min(x) pixels, respectively, and cos(α(x)) compensates for the deviation α(x) of the edge gradient direction from the tracing direction.


Step 3: The edge width is refined according to human perception as

$$m(x) = \frac{I_{\max}(x) - I_{\min}(x)}{w(x)} \qquad (2)$$

$$w_{\mathrm{PSI}}(x) = \begin{cases} w(x) - m(x), & \text{if } w(x) \ge w_{\mathrm{JNB}} \\ w(x), & \text{otherwise} \end{cases} \qquad (3)$$

where w_JNB = 3 is the width of an edge with just noticeable blurriness.
Step 4: Compute the average edge width w_patch of each image patch.
Step 5: To avoid the defocused regions of the image, only a percentage of the sharpest blocks are averaged to get the final value of PSI:

$$\mathrm{PSI} = \frac{K}{\sum_{k=1}^{K} w_{\mathrm{patch}}^{k}} \qquad (4)$$

where K equals the number of chosen patches.
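To make steps 1-5 concrete, the sketch below condenses them into plain Python. The patch size, the adaptive-threshold rule, and the fraction of sharpest patches are assumptions for illustration (the letter defers these details to [19]); the slant correction cos(α(x)) of (1) and the handling of both edge polarities are omitted for brevity, and the cartoon part is assumed scaled to [0, 1].

```python
import numpy as np
from scipy import ndimage

W_JNB = 3  # just-noticeable-blur width used in (3)

def refined_width(col, r):
    """Trace from edge row r to the local max above and local min below."""
    up = r
    while up > 0 and col[up - 1] > col[up]:
        up -= 1
    down = r
    while down < len(col) - 1 and col[down + 1] < col[down]:
        down += 1
    w = (r - up) + (down - r)             # w_up + w_down as in (1)
    if w == 0:
        return np.nan                     # wrong edge polarity for this tracer
    m = (col[up] - col[down]) / w         # slope m(x) of (2)
    return w - m if w >= W_JNB else w     # perceptual refinement (3)

def psi(cartoon, patch=32, frac=0.2):
    gy = ndimage.sobel(cartoon, axis=0)   # step 1: vertical Sobel edges
    thresh = np.abs(gy).mean() + 2.0 * np.abs(gy).std()  # assumed adaptive rule
    widths = np.full(cartoon.shape, np.nan)
    for r, c in zip(*np.nonzero(np.abs(gy) > thresh)):
        widths[r, c] = refined_width(cartoon[:, c], r)   # steps 2 and 3
    w_patch = []                          # step 4: per-patch mean width
    for i in range(0, cartoon.shape[0] - patch + 1, patch):
        for j in range(0, cartoon.shape[1] - patch + 1, patch):
            block = widths[i:i + patch, j:j + patch]
            if np.isfinite(block).any():
                w_patch.append(np.nanmean(block))
    if not w_patch:
        return np.nan
    k = max(1, int(frac * len(w_patch))) # step 5: keep only the sharpest patches
    sharpest = np.sort(w_patch)[:k]      # smallest widths = sharpest
    return k / sharpest.sum()            # Eq. (4)
```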
C. Noise-Level (NL) Estimation
It has been shown in [20] that one can easily estimate the additive Gaussian noise level of an image by PCA decomposition through the equation

$$\lambda_{\min}\left(\Sigma_y\right) = \lambda_{\min}\left(\Sigma_z\right) + \sigma_n^2 \qquad (5)$$

where Σ_y denotes the covariance matrix of a patch y_i in the noisy image, Σ_z denotes the covariance matrix of the noise-free patch z_i, and λ_min(·) represents the minimum eigenvalue of the matrix.
Since natural images are redundant, the data of an image span only a low-dimensional subspace. By selecting weakly textured/low-rank patches from the noisy image, one can assume that the minimum eigenvalue of the noise-free covariance is λ_min(Σ_z) = 0. The NL indicated by the noise variance can then be obtained as

$$\mathrm{NL} = \lambda_{\min}\left(\Sigma_y\right) = \sigma_n^2 \qquad (6)$$

In this letter, the texture part produced by CTD can itself be regarded as a weakly textured image, so we can even skip the patch-selection step and still get a reliable NL index.
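A compact sketch of this estimator is given below; it builds patch vectors directly from the texture part and takes the minimum eigenvalue of their covariance as in (5)-(6). The patch size is an assumption for illustration.

```python
import numpy as np

def noise_level(texture, patch=7):
    """Estimate sigma_n^2 as lambda_min of the patch covariance, Eq. (6)."""
    h, w = texture.shape
    vecs = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            vecs.append(texture[i:i + patch, j:j + patch].ravel())
    X = np.asarray(vecs, dtype=np.float64)   # one vectorized patch per row
    X -= X.mean(axis=0)                      # center before the covariance
    cov = X.T @ X / max(len(X) - 1, 1)       # Sigma_y of Eq. (5)
    # No patch selection: the texture part is already weakly textured, so
    # lambda_min(Sigma_z) is taken as 0 and lambda_min(Sigma_y) = sigma_n^2.
    return float(np.linalg.eigvalsh(cov).min())
```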
D. Joint Effect Prediction
The human visual system has the ability to separate noise from background information, so noise estimation is not usually affected by other factors. The joint-effect analysis therefore focuses on the impact that noise has on blur estimation. We randomly chose six reference images from the LIVEMD database, computed the PSI values of these images distorted by the same blur but different noise levels, and plot the change of PSI versus NL in Fig. 4.
As can be seen, for the same blur distortion, the estimated PSI value grows as NL increases. The reason is quite clear: although most of the noise is excluded from the cartoon part, the noise added to the image edges cannot be recognized as texture by CTD; it remains on the edges and makes them appear sharper. To reduce this impact, two factors have to be considered.

Fig. 4. PSI versus NL.

First, the NL of the image: the stronger the noise, the more severe the impact. Second, the complexity of the image: more edges mean more inseparable noise, and hence a more severe impact.
According to this analysis, we introduce the image entropy as a modifier to reduce the impact, because its value correlates highly with both factors, noise strength and image complexity. The modified PSI is computed as

$$\mathrm{PSI}_j = \mathrm{PSI} \cdot f(E) \qquad (7)$$

where E is the image entropy and f is a linear function of E determined by experiments.
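The sketch below shows one way this correction could be applied. The entropy is the usual Shannon entropy of the gray-level histogram; the letter states only that f is linear and experimentally fit, so both coefficients below (including the sign of the slope) are placeholders, not values from the paper.

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of the gray-level histogram of img in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                         # ignore empty bins in the sum
    return float(-(p * np.log2(p)).sum())

def modified_psi(psi_value, img, a=-0.05, b=1.2):
    """PSI_j = PSI * f(E) with f(E) = a*E + b, Eq. (7); a, b are hypothetical."""
    return psi_value * (a * image_entropy(img) + b)
```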
E. Pooling Strategy
The blur index PSI and the noise index NL are not dimensionally homogeneous, so a linear pooling model is not a good choice for producing accurate final quality scores. To solve this problem, we use nonlinear regression to map the scores into the subjective score (DMOS) space based on the four-parameter logistic function [21]

$$Q(q) = \frac{\beta_1 - \beta_2}{1 + \exp\left( -\frac{q - \beta_3}{\beta_4} \right)} + \beta_2 \qquad (8)$$

where q and Q are the scores before and after mapping, respectively, and β_i (i = 1, 2, 3, 4) are the mapping parameters.
The details are described as follows.
Step 1: A training strategy is adopted: by applying the PSI metric to the 145 blurred images and the NL metric to the 145 noisy images in the LIVE2 database, we obtain two objective IQA vectors, q_blur and q_noise. The corresponding subjective quality scores dmos_blur and dmos_noise are given in the database. Then, according to (8), the mapping parameters β_i (i = 1, 2, 3, 4) for the blur and noise quality vectors can be obtained, respectively.
Step 2: With these mapping parameters, the PSI_j scores of the cartoon parts and the NL scores of the texture parts are mapped into PSI_m and NL_m using (8).
Step 3: A linear model then gives the final IQA index for the multidistortion image:

$$\mathrm{CTDBBM} = w_1 \cdot \mathrm{PSI}_m + w_2 \cdot \mathrm{NL}_m \qquad (9)$$


Fig. 5. Scatter plots of DMOS versus the classical PSNR, SSIM, MS-SSIM, and GMSD, the state-of-the-art BRISQUE, SSEQ, and NIQE, and the proposed CTDBBM on LIVEMD.

where [w_1, w_2] = [0.55, 0.45] is determined experimentally.
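A sketch of this training/mapping/pooling pipeline is shown below. The logistic form is Eq. (8) and the pooling weights are the stated [0.55, 0.45], but the curve-fitting procedure and the initial parameter guesses are assumptions; the letter does not specify them.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic4(q, b1, b2, b3, b4):
    """Four-parameter logistic mapping of Eq. (8)."""
    return (b1 - b2) / (1.0 + np.exp(-(q - b3) / b4)) + b2

def fit_mapping(q, dmos):
    """Step 1: fit beta_1..beta_4 so logistic4(q) approximates the DMOS values."""
    q, dmos = np.asarray(q, float), np.asarray(dmos, float)
    p0 = [dmos.max(), dmos.min(), np.median(q), np.std(q) + 1e-6]  # heuristic start
    beta, _ = curve_fit(logistic4, q, dmos, p0=p0, maxfev=10000)
    return beta

def ctdbbm(psi_j, nl, beta_blur, beta_noise, w=(0.55, 0.45)):
    """Steps 2-3: map both indices into DMOS space, then pool linearly, Eq. (9)."""
    psi_m = logistic4(psi_j, *beta_blur)   # PSI_m
    nl_m = logistic4(nl, *beta_noise)      # NL_m
    return w[0] * psi_m + w[1] * nl_m
```

Here beta_blur would be fit once from (q_blur, dmos_blur) on LIVE2 via fit_mapping, and likewise beta_noise from (q_noise, dmos_noise).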
III. EXPERIMENT AND DISCUSSION
A. Performance Evaluation Parameters
The performance of an image quality metric is typically judged by how well it correlates with human opinions of quality. Five criteria are employed to evaluate the correlation between subjective and objective quality scores [21]: 1) the Pearson linear correlation coefficient (PLCC); 2) the Spearman rank-order correlation coefficient (SROCC); 3) the PLCC after a nonlinear modified logistic mapping; 4) the root-mean-squared error (RMSE) between the DMOS values and the model predictions; and 5) the mean absolute error (MAE). PLCC, RMSE, and MAE are employed to evaluate prediction accuracy; SROCC is used to assess prediction monotonicity. A good objective quality measure is expected to achieve high PLCC and SROCC values and low RMSE and MAE values.
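Under the assumption that `pred` holds a metric's outputs and `dmos` the subjective scores, the five criteria can be computed as in the following sketch; the logistic used for criteria 3-5 mirrors Eq. (8).

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def logistic4(q, b1, b2, b3, b4):
    return (b1 - b2) / (1.0 + np.exp(-(q - b3) / b4)) + b2

def evaluate(pred, dmos):
    pred, dmos = np.asarray(pred, float), np.asarray(dmos, float)
    p0 = [dmos.max(), dmos.min(), np.median(pred), np.std(pred) + 1e-6]
    beta, _ = curve_fit(logistic4, pred, dmos, p0=p0, maxfev=10000)
    mapped = logistic4(pred, *beta)                   # nonlinearly mapped scores
    return {
        "PLCC": stats.pearsonr(pred, dmos)[0],             # 1) accuracy
        "PLCC (mapping)": stats.pearsonr(mapped, dmos)[0], # 3) after mapping
        "SROCC": stats.spearmanr(pred, dmos)[0],           # 2) monotonicity
        "RMSE": float(np.sqrt(np.mean((mapped - dmos) ** 2))),  # 4)
        "MAE": float(np.mean(np.abs(mapped - dmos))),            # 5)
    }
```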
B. Comparisons
In this section, we test the performance of the proposed metric and compare it with several classical and state-of-the-art IQA metrics: PSNR, SSIM [1], MS-SSIM [2], GMSD [22], BRISQUE [7], SSEQ [23], and NIQE [9]. Among them, PSNR, SSIM, MS-SSIM, and GMSD are FR metrics, while BRISQUE, SSEQ, and NIQE are NR metrics. All the competitors are general-purpose IQA metrics that perform well on singly distorted images; their ability to evaluate multidistortion images is tested on the LIVEMD database and compared with the proposed metric. The scatter plots in Fig. 5 show the linear fitting results of DMOS versus the comparison metrics and our CTDBBM.
Table I presents the performance evaluation parameters. Overall, the FR metrics perform better than the NR metrics. The NR metrics BRISQUE and SSEQ perform very poorly because blur and noise cause contrary changes in the image features selected to build their prediction models.
TABLE I
PERFORMANCE MEASURES OF THE PROPOSED CTDBBM AND COMPARISON METRICS ON LIVEMD (THE BEST TWO PER ROW ARE MARKED WITH *)

IQA metric      PSNR     SSIM     MS-SSIM  GMSD     BRISQUE  SSEQ     NIQE     Proposed
Type            FR       FR       FR       FR       NR       NR       NR       NR
PLCC            0.7736   0.7473   0.8398   0.8518*  0.2869   0.4662   0.8210   0.8550*
PLCC (mapping)  0.7751   0.7725   0.8926*  0.8647   0.4596   0.4713   0.8483   0.8670*
SROCC           0.7088   0.7022   0.8646*  0.8366   0.2992   0.4370   0.7945   0.8467*
RMSE            11.7866  11.8467  8.4110*  9.2815*  16.5675  16.4529  9.8792   9.2967
MAE             9.3410   9.4329   6.6754*  7.2868   13.3621  13.4414  7.7779   7.2416*

A detailed discussion can be found in [11]. The proposed NR metric CTDBBM, however, correlates very well with human opinions: it significantly outperforms several FR metrics and all the NR metrics, and its only real competitor is MS-SSIM, an FR metric. Furthermore, the proposed CTDBBM is a universal model, since the blur and noise estimation components within our framework can be replaced with other, more effective dedicated blind metrics to improve the prediction accuracy.

IV. CONCLUSION

This letter is devoted to one specific problem in multidistortion IQA studies: the quality evaluation of double-distortion images degraded by both blur and noise. Due to the contrary effects of blur and noise on many image features, this problem is particularly difficult for the traditional approach based on feature selection. By applying the CTD operation to the distorted image, we successfully decompose the image into a cartoon part and a texture part, which are used for blur estimation and noise estimation, respectively. The evaluation on the publicly released LIVEMD image database shows that CTDBBM correlates highly with human subjective opinions and outperforms several powerful FR metrics and state-of-the-art NR metrics on multidistortion images.


REFERENCES
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[2] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multi-scale structural similarity for image quality assessment," in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., Nov. 2003, pp. 1398–1402.
[3] Z. Wang and Q. Li, "Information content weighting for perceptual image quality assessment," IEEE Trans. Image Process., vol. 20, no. 5, pp. 1185–1198, May 2011.
[4] K. Gu, G. Zhai, X. Yang, W. Zhang, and M. Liu, "Structural similarity weighting for image quality assessment," in Proc. IEEE Int. Conf. Multimedia Expo Workshops, Jul. 2013, pp. 1–6.
[5] G. Zhai, X. Wu, X. Yang, W. Lin, and W. Zhang, "A psychovisual quality metric in free-energy principle," IEEE Trans. Image Process., vol. 21, no. 1, pp. 41–52, Jan. 2012.
[6] A. Rehman and Z. Wang, "Reduced-reference image quality assessment by structural similarity estimation," IEEE Trans. Image Process., vol. 21, no. 8, pp. 3378–3389, Aug. 2012.
[7] A. Mittal, A. K. Moorthy, and A. C. Bovik, "No-reference image quality assessment in the spatial domain," IEEE Trans. Image Process., vol. 21, no. 12, pp. 4695–4708, Dec. 2012.
[8] M. A. Saad, A. C. Bovik, and C. Charrier, "Blind image quality assessment: A natural scene statistics approach in the DCT domain," IEEE Trans. Image Process., vol. 21, no. 8, pp. 3339–3352, Aug. 2012.
[9] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a 'completely blind' image quality analyzer," IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.
[10] D. M. Chandler, "Seven challenges in image quality assessment: Past, present, and future research," ISRN Signal Process., vol. 2013, Art. no. 905685, 2013.
[11] D. Jayaraman, A. Mittal, A. K. Moorthy, and A. C. Bovik, "Objective quality assessment of multiply distorted images," in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., Nov. 2012, pp. 1693–1697.
[12] D. Szolgay and T. Sziranyi, "Adaptive image decomposition into cartoon and texture parts optimized by the orthogonality criterion," IEEE Trans. Image Process., vol. 21, no. 8, pp. 3405–3415, Aug. 2012.
[13] H. Schaeffer and S. Osher, "A low patch-rank interpretation of texture," SIAM J. Imag. Sci., vol. 6, no. 1, pp. 226–262, 2013.
[14] M. K. Ng, X. Yuan, and W. Zhang, "Coupled variational image decomposition and restoration model for blurred cartoon-plus-texture images with missing pixels," IEEE Trans. Image Process., vol. 22, no. 6, pp. 2233–2246, Jun. 2013.
[15] V. Duval, J.-F. Aujol, and L. Vese, "Mathematical modeling of textures: Application to color image decomposition with a projected gradient algorithm," J. Math. Imag. Vis., vol. 37, no. 3, pp. 232–248, 2010.
[16] S. Ono, T. Miyata, and I. Yamada, "Cartoon-texture image decomposition using blockwise low-rank texture characterization," IEEE Trans. Image Process., vol. 23, no. 3, pp. 1128–1142, Mar. 2014.
[17] V. Le Guen, "Cartoon + texture image decomposition by the TV-L1 model," Image Process. On Line, vol. 4, pp. 204–219, 2014.
[18] A. Buades and J. Lisani, "Directional filters for cartoon + texture image decomposition," Image Process. On Line, vol. 6, pp. 75–88, 2016.
[19] C. Feichtenhofer, H. Fassold, and P. Schallauer, "A perceptual image sharpness metric based on local edge gradient analysis," IEEE Signal Process. Lett., vol. 20, no. 4, pp. 379–382, Apr. 2013.
[20] X. Liu, M. Tanaka, and M. Okutomi, "Single-image noise level estimation for blind denoising," IEEE Trans. Image Process., vol. 22, no. 12, pp. 5226–5237, Dec. 2013.
[21] VQEG, "Final report from the video quality experts group on the validation of objective models of video quality assessment," Mar. 2000. [Online]. Available: http://www.vqeg.org/
[22] W. Xue, L. Zhang, X. Mou, and A. C. Bovik, "Gradient magnitude similarity deviation: A highly efficient perceptual image quality index," IEEE Trans. Image Process., vol. 23, no. 2, pp. 684–695, Feb. 2014.
[23] L. Liu, B. Liu, H. Huang, and A. C. Bovik, "No-reference image quality assessment based on spatial and spectral entropies," Signal Process., Image Commun., vol. 29, no. 8, pp. 856–863, Sep. 2014.
