Yang Xu, Jun Li, Jianbin Chen, Guangtian Shen, Yangjian Gao
Automation School, Chongqing University, Chongqing, China
e-mail: {xuyang, lijun_777, chenjianbin, shenguangtian, gaoyangjian}@cqu.edu.cn
Abstract-Visual saliency detection is usually a prerequisite for image processing tasks like object segmentation, object recognition and information compression. In this work, we first present a novel method that combines cognitive-based objectness and image-based saliency to obtain a better saliency map with less negligible background information. Then, by introducing some top-down attention priors, we further propose a computational selective attention model for the object segmentation task on the saliency map. Experiments have shown that our method clearly refines the saliency maps of several state-of-the-art algorithms, and that our selective attention model meanwhile evidently improves salient object segmentation performance on the challenging saliency benchmark, the Extended Complex Scene Saliency Dataset (ECSSD).
I. INTRODUCTION

Figure 1. Illustration of our method: (a) input image (b) saliency map (c) objectness map (d) overall saliency map (e) most salient region (f) ground-truth mask.

Saliency detection originates from the visual attention of primates, which enables us to quickly distinguish attractive objects from common ones or from the background. The attention mechanism is of great importance for us to filter and refine information, and so is saliency detection for handling explosively growing image data. Specifically, finding salient objects or regions in an image facilitates tasks such as object recognition, object segmentation, content-aware image editing, and adaptive compression of images.

A saliency map is frequently used in previous work to quantify the visual attention distribution in an image. Two classes of knowledge dominate our attention assignment: bottom-up (BU) and top-down (TD) knowledge. BU knowledge focuses on cues that we are naturally interested in, like colors, orientations, density [1] and frequency [2,6], while TD knowledge is not innate but learned from tasks; for example, our eye fixations basically favor objects with big volume [4] or with higher similarity to something we know [3,5]. Previous work has also shown that a purely BU fashion can hardly generate object-level saliency maps in a given task, and that a reasonable map should take both image-based BU cues and cognitive-based TD cues into consideration for detecting the most salient objects in images.

In this paper, we attempt to refine the saliency map by incorporating cognitive-based objectness and image-based saliency, for the purpose of combining both TD object-level information and BU pixel-level appearance in saliency map generation. Objectness quantifies how likely an image window is to actually contain an object of any class [5,9]. Unlike the category-dependent objectness in early work [7,8], we only care about category-independent representations. With the optimization of objectness, our saliency map shows considerably better edges than the results of conventional algorithms, and its advantages extend to the following segmentation operation. Objectness enables us to find more spatial cues in the saliency map; it overcomes the drawback of scattered saliency in many algorithms [1,10,11], and this potentially improves segmentation performance. After obtaining the saliency map, we further propose a TD attention-based segmentation model; it is more intuition-driven and outperforms the state-of-the-art adaptive threshold segmentation method [6] on the ECSSD benchmark (see illustration in Figure 1).
Saliency segmentation is the last step for selecting a concrete object, and segmentation performance largely depends on the saliency map generated above: regions with higher saliency values are more likely to be chosen. However, besides taking saliency as an object's inherent attribute, we also obey some shared attention habits or principles when exposed to a dynamic scene; for example, we prefer to give an object right in the middle of our view more attention than those near the edge [4]. To be more persuasive, the segmentation process should bring these TD attention priors into play so that a more intuitively favorable object can be selected. With the involved saliency map, our segmentation performance transcends method [6] in precision, recall and F-measure by considerable margins.

II. RELATED WORK

The term saliency was introduced by Olshausen and Anderson in their work [12]. Itti and Koch adopt the center-surround and inhibition-of-return principles from the primate visual system. In our objectness formulation, I(p_{i,j} ∈ B) is a 0-or-1 indicator function denoting whether pixel p_{i,j} is inside the bounding box; the accumulated map is further normalized.
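The objectness-map construction referenced by the indicator function I(p_{i,j} ∈ B) is only partially preserved in this excerpt. A minimal sketch, assuming each pixel accumulates the scores of the objectness bounding boxes that contain it and the map is then normalized (the function name and score-weighted accumulation are our assumptions, not the paper's exact formula), could look like:

```python
import numpy as np

def objectness_map(h, w, boxes, scores):
    """Accumulate bounding-box objectness into a per-pixel map.

    boxes:  list of (x0, y0, x1, y1) bounding-box candidates.
    scores: objectness score of each candidate box.
    Each pixel sums the scores of the boxes containing it, i.e. the
    indicator I(p_ij in B) weighted by the box score; the map is then
    normalized to [0, 1].
    """
    om = np.zeros((h, w), dtype=np.float64)
    for (x0, y0, x1, y1), s in zip(boxes, scores):
        om[y0:y1, x0:x1] += s          # add s where I(p in B) = 1
    if om.max() > 0:
        om /= om.max()                  # normalize to [0, 1]
    return om
```

With only three box candidates, as used later in the experiments, this map is cheap to build and simply highlights where candidate objects overlap.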
In this paper, we choose four traditional saliency detection algorithms, including the one proposed by Itti et al., to compute bottom-up saliency maps with low-level pixel cues. The choice of these algorithms is motivated by the following reasons. Itti's center-surround model [1] is the most classic model and is ground-breaking for computational saliency maps. Yan et al. present a tree model [12] to decompose image information into three layers and capture hierarchical saliency; it shows remarkable performance on many saliency detection datasets such as the MSRA-1000 and MSRA-5000 datasets [6]. Wei et al. [11] take an opposite perspective: they focus more on the background instead of the object, and exploit two common priors about backgrounds in natural images, boundary and connectivity, to form a novel saliency measure called geodesic saliency. Perazzi et al. [10] base their saliency estimation on uniqueness and spatial distribution contrast, formulated in a unified way using high-dimensional Gaussian filters; this largely simplifies the conception and leads to an efficient implementation with linear complexity. We refer to these four detectors as IT, HS, GS and SF below.

For each image, we will have four preliminary saliency maps derived from the aforementioned algorithms, as demonstrated by Figure 3.

C. Overall Saliency Map (OSM) Generation

After the objectness map and the preliminary saliency maps are obtained, a linear approach is used to combine them into the overall saliency map:

OSM_{i,j} = a × OM_{i,j} + (1 − a) × SM_{i,j}    (3)

where OSM stands for the overall saliency map and a is the weight for the objectness map. Likewise, the result needs to be normalized: we normalize OSM so that it obeys a normal distribution and set negatives to zeros; in this way the pixel scores become smoother and gain stronger expressive ability. Figure 4 illustrates the overall fusing process.

Figure 4. General framework to produce the overall saliency map. Four overall saliency maps are created from four different saliency maps and one identical objectness map.

IV. SELECTIVE ATTENTION MODEL

A. Introduction of Top-down Factors

Besides objectness, four additional top-down factors are introduced based on principles or habits that human visual attention abides by when detecting salient objects in a scene. Together with the overall saliency map, they are:
• Pixel saliency value: fusing object-level information and bottom-up saliency.
• Area size: assigning more attention to bigger objects.
• Object position: assigning more attention to objects near the optical center.
• Average saliency: assigning more attention to overall salient objects.
• Regional variance: eliminating the effect of abnormal points with high saliency values.

B. Feature Quantification and Merging

Before quantifying features, a binary segmentation is performed on the overall saliency map with an adaptive threshold:

B_{i,j} = { 1, OSM_{i,j} ≥ r × sal_mean; 0, others }    (5)

where B_{i,j} is the binary value after segmentation, sal_mean is the mean value of the overall saliency map, and r is the threshold coefficient.

After building the binary image, we also filter out tiny regions that have few pixels. Now, the five nominated factors will be quantified for our computational attention model. Because we have one or more regions in a segmented binary map, when talking about a region R, all pixel values in it are labeled as 1 and pixels outside as 0; for a specific pixel, label_{i,j} = 1 or label_{i,j} = 0.

The second factor is measured by the pixels involved in each region. We further denote the proportion of every area size to the whole map as AreaP, and a scale parameter 2 is added for the area importance in saliency detection. With Num_R denoting the number of pixels in a region, we have this feature:
Num_R = Σ_{i=1}^{m} Σ_{j=1}^{n} label_{i,j}    (7)

AreaP_R = 2 × Num_R / (m × n)    (8)

The third factor is intuitively represented by the Euler distance of a region to the image center. By averaging the pixel distance bias from the image center, we have:

EUDist_R = (1 / Num_R) × Σ_{label_{i,j}=1} sqrt((i − midX)² + (j − midY)²)    (9)

where (midX, midY) is the center coordinate of the binary map:

midX = m / 2, midY = n / 2    (10)

The fourth feature is obtained by averaging the saliency scores in every region to measure its overall salient degree:

Avg_R = (1 / Num_R) × Σ_{label_{i,j}=1} OSM_{i,j}    (11)

Var_R = (1 / Num_R) × Σ_{label_{i,j}=1} (OSM_{i,j} − Avg_R)²    (12)

Finally, we design a normalization method for our model:

feature'_R = (feature_R − min(feature)) / (max(feature) − min(feature))    (13)

feature'_{p,R} = (max(feature_p) − feature_{p,R}) / (max(feature_p) − min(feature_p))    (14)

Thereafter, attention values can be projected to region saliency, and conversely, the most salient object can be selected by picking the region with the highest attention score.

V. EXPERIMENTS

We evaluated our method and model on the very challenging ECSSD saliency benchmark containing 1000 complex images, together with salient object annotations given by ground-truth masks. We separately compare our refined overall saliency maps with the saliency maps from four conventional saliency detectors, and our selective attention model with a highly recognized segmentation method. In our experiments, we use only three objectness bounding box candidates to form the objectness map with less computation budget. Also, we let a be 0.5 to equally weigh OM and SM in the OSM. For binary segmentation, the threshold r is set to 1.2.

A. Saliency Map Comparison

We mainly adopted the criteria presented in [6] to evaluate the performance of the saliency algorithms using precision-recall (PR) curves. The PR curve needs a fixed threshold T to obtain the precision and recall values for all images and calculate the average. Varying T from 0 to 255 gives us the average PR curves in Figure 5.

Figure 5. The PR curves of saliency algorithms (IT, HS, GS, SF) and our improved ones. A name with OB is from our method supplemented with objectness.

As shown in Figure 5, the introduction of objectness information significantly improves the performance of IT [1] and SF [10] and slightly improves GS [11], but degrades a little for HS [12]. Considering that we use minimal objectness bounding boxes, this result is quite reasonable, because the former

The adaptive threshold used by the segmentation method [6] is:

T_a = (2 / (m × n)) × Σ_{x=1}^{m} Σ_{y=1}^{n} Sal(x, y)    (16)

Our selective attention model is very different: we design a computational approach for selecting salient regions using the region attention score; see Figure 6 and Table I for an illustration.
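The binary segmentation of Eq. (5) and the region features of Eqs. (7)-(14) can be sketched as follows. This is a minimal sketch assuming regions are supplied as boolean masks (e.g. connected components of the binary map B); all helper names are ours, not the paper's:

```python
import numpy as np

def binary_segmentation(osm, r=1.2):
    """Eq. (5): B_ij = 1 where OSM_ij >= r * sal_mean, 0 otherwise."""
    return osm >= r * osm.mean()

def region_features(osm, mask):
    """Quantify one region's features on the overall saliency map.

    osm:  overall saliency map (m x n), values in [0, 1].
    mask: boolean map of the region (label 1 inside, 0 outside),
          e.g. one connected component of binary_segmentation(osm).
    Returns the features of Eqs. (8), (9), (11) and (12).
    """
    m, n = osm.shape
    ii, jj = np.nonzero(mask)
    num = ii.size                                    # Num_R, Eq. (7)
    area_p = 2.0 * num / (m * n)                     # AreaP_R, Eq. (8)
    mid_x, mid_y = m / 2.0, n / 2.0                  # Eq. (10)
    eudist = np.sqrt((ii - mid_x) ** 2 +
                     (jj - mid_y) ** 2).sum() / num  # EUDist_R, Eq. (9)
    avg = osm[mask].sum() / num                      # Avg_R, Eq. (11)
    var = ((osm[mask] - avg) ** 2).sum() / num       # Var_R, Eq. (12)
    return area_p, eudist, avg, var

def minmax(values, invert=False):
    """Eqs. (13)/(14): min-max normalize one feature across regions;
    invert=True for the distance feature, where smaller means more
    attention."""
    v = np.asarray(values, dtype=np.float64)
    rng = v.max() - v.min()
    if rng == 0:
        return np.zeros_like(v)
    out = (v - v.min()) / rng
    return 1.0 - out if invert else out
```

A region's attention score would then merge the normalized features; the merging weights are not preserved in this excerpt, so they are left unspecified here.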
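The fixed-threshold PR protocol of Section V-A, and the adaptive threshold of Eq. (16) used by the baseline method [6], can be sketched as below; the helper names are ours, and the per-image averaging follows the description above:

```python
import numpy as np

def pr_at_threshold(sal_maps, gt_masks, t):
    """Average precision/recall over images at a fixed threshold t."""
    ps, rs = [], []
    for sal, gt in zip(sal_maps, gt_masks):
        pred = sal >= t                              # binarize at t
        tp = np.logical_and(pred, gt).sum()          # true positives
        ps.append(tp / max(pred.sum(), 1))           # precision
        rs.append(tp / max(gt.sum(), 1))             # recall
    return np.mean(ps), np.mean(rs)

def pr_curve(sal_maps, gt_masks):
    """Vary t from 0 to 255 to trace the average PR curve."""
    return [pr_at_threshold(sal_maps, gt_masks, t) for t in range(256)]

def adaptive_threshold(sal):
    """Eq. (16): T_a = (2 / (m * n)) * sum of saliency, i.e. twice the
    mean saliency of the map."""
    return 2.0 * sal.mean()
```

The selective attention model replaces this single global threshold with per-region attention scores, which is what the comparison in Figure 6 and Table I illustrates.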
TABLE I. ATTENTION SCORE FOR TOP THREE REGIONS

In future work, we will try more bounding box candidates to thoroughly investigate the effect of object-level information. In addition, we plan to introduce more top-down factors to our selective attention model to make it practical and task-oriented when facing scenes with more salient objects. Lastly, our methods bring in a noticeable time budget, and we will explore more time-saving approaches.
ACKNOWLEDGMENT