ABSTRACT
The community-contributed media contents over the Internet have become one of the primary sources for online advertising. However, conventional ad-networks such as Google AdSense treat image and video advertising as general text advertising, without considering the inherent characteristics of visual content. In this work, we propose an innovative contextual advertising system driven by images, which automatically associates relevant ads with an image, rather than with the entire text of a Web page, and seamlessly inserts the ads into the nonintrusive areas within each individual image. The proposed system, called ImageSense, represents the first attempt towards contextual in-image advertising. The relevant ads are selected based on not only textual relevance but also visual similarity, so that the ads yield contextual relevance to both the text in the Web page and the image content. The ad insertion positions are detected based on image saliency to minimize intrusiveness to the user. We evaluate ImageSense on three photo-sharing sites with around one million images and on 100 Web pages collected from several major sites, and demonstrate the effectiveness of ImageSense.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Selection process; H.3.5 [Information Storage and Retrieval]: Online Information Services—Web-based services

General Terms
Algorithms, Experimentation, Human Factors.

Keywords
Web page segmentation, image saliency, image advertising.

1. INTRODUCTION
The proliferation of digital capture devices and the explosive growth of online social media (especially along with the so-called Web 2.0 wave) have led to countless private image collections on local computing devices such as personal computers, cell phones, and personal digital assistants (PDAs), as well as huge and still growing public image collections on the Internet. Compared with text and video, image has some unique advantages: it is more attractive than plain text and has been found to be more salient than text [9], so it can grab users' attention instantly; it carries more information that can be comprehended more quickly — as the old saying goes, a picture is worth a thousand words; and it can be delivered to users faster than video. As a result, image has become one of the most pervasive media formats on the Internet [13]. On the other hand, we have witnessed a fast and consistently growing online advertising market in recent years. Motivated by the huge business opportunities in this market, people are now actively investigating new Internet advertising models. To take advantage of the image form of information representation, image-driven contextual advertising, which associates advertisements with an image or a Web page containing images, has become an emerging online monetization strategy.

Many existing ad-networks, such as Google AdSense [1], Yahoo! [32], and BritePic [4], provide contextual advertising services around images. However, conventional image advertising primarily uses text content rather than image content to match relevant ads. In other words, image advertising is usually treated as general text advertising, without considering the potential advantages that could be brought by images. There is no existing system that automatically monetizes the opportunities brought by individual images. As a result, the ads are only generally relevant to the entire Web page rather than specific to the images it contains. Moreover, the ads are embedded at a predefined position in the Web page adjacent to the image, which normally destroys the visually appealing appearance and structure of the original Web page, and fails to capture and monetize the users' attention aroused by compelling images. Figure 1 (a) and (b) show some exemplary image ad-networks, where Google AdSense [1] displays the ads side by side with the image, while BritePic [4] embeds the ads at a corner of the image. It is worth noting that although BritePic [4] also supports embedding the ads within an image, the ad positions are most likely predefined, without considering whether these positions would be intrusive to the users.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'08, October 26–31, 2008, Vancouver, British Columbia, Canada.
Copyright 2008 ACM 978-1-60558-303-7/08/10 ...$5.00.

(a) AdSense [1] (b) BritePic [4] (c) ImageSense
Figure 1: The exemplary image advertising schemes. The highlighted rectangle areas indicate the associated ads. The existing image ad-networks like (a) and (b) match ads with Web pages based only on textual relevance and embed the ads at a reserved position, while (c) associates ads not only with the Web page but also with the image content, and seamlessly embeds the ads at a non-intrusive position within the image.

We believe that it is not appropriate to treat image advertising as general text advertising. One point we want to make here is that online image content can be viewed as an effective information carrier for commercials. The more
compelling the image content, the more audience it will attract, and the more revenue will be generated. The following distinctions between image and text advertising motivate a new advertising model dedicated to images.

• We believe that beyond the traditional media of Web pages and videos, images can be more powerful and effective carriers of online advertising.

• We believe that the ads should be dynamically embedded at the appropriate positions within each individual image (i.e., in-image) rather than at a predefined position in the Web page. Based on this claim, it is reasonable to assume that the ads should mostly be image logos, rather than just textual descriptions, as logos are better suited to being seamlessly embedded into images.

• We believe that the ads should be locally relevant to the image content and its surrounding text, rather than globally relevant to the entire Web page. Therefore, it is reasonable to assume that the image content and its surrounding text should contribute much more to the relevance matching than the whole Web page.

Motivated by the above observations, we propose in this paper an innovative in-image advertising model to automatically deliver contextually relevant ads through an online advertising system. The proposed system, named ImageSense, supports contextual image advertising by associating the most relevant ads with an online image and seamlessly embedding the ads at the most appropriate positions within this image. The ads are selected to be globally relevant to the Web page content, as well as locally relevant to the image content and its surrounding text. Meanwhile, the ads are embedded at the most non-salient positions within the image. By leveraging visual content analysis, we are better positioned to address two challenges in online advertising, namely ad relevance and ad position. Therefore, ImageSense is able to achieve effective advertising that is a powerful complement to conventional contextual advertising. To the best of our knowledge, this is one of the first attempts to investigate how visual content (both high-level concepts and low-level appearance) can benefit online image advertising. Figure 1 (c) gives an example of ImageSense. It is observed that a relevant ad (i.e., the NBA logo) has been inserted into a region with less information within this image. Note that this ad logo can also appear progressively and then disappear after a few seconds without affecting the viewing experience, which is less intrusive and more visually pleasant than conventional advertising.

The rest of the paper is organized as follows. Section 2 reviews closely related research on advertising. Section 3 provides a system overview of ImageSense. The details of ImageSense are described in Section 4. Section 5 presents the experiments, followed by conclusions in Section 6.

2. RELATED WORK
Driven by the huge online business opportunities, online advertising has become an emerging area of research. One of the fundamental problems in online advertising is ad relevance: irrelevant ads detract from the user experience, while relevant ads increase the probability of user reaction [19]. The other is how to pick suitable keywords or Web pages for advertising. This literature review therefore focuses on two key problems: 1) ad keyword selection, and 2) ad relevance matching.

Typical advertising systems analyze a Web page or query to find prominent keywords or categories, and then match these keywords or categories against the words for which advertisers bid. If there is a match, the corresponding ads are displayed to the user through the Web page. Yih et al. studied a learning-based approach to automatically extracting appropriate keywords from Web pages for advertisement targeting [35]. Instead of dealing with general Web pages, Li et al. proposed a sequential pattern mining-based method to discover keywords from a specific broadcasting content domain [16]. In addition to Web pages, queries also play an important role in paid search advertising. In [29], queries are classified into an intermediate taxonomy so that the selected ads are more targeted to the query.

Research on ad relevance has proceeded along three directions, from the perspective of what the ads are matched against: (1) keyword-targeted advertising (also called "paid search advertising" or "sponsored search"), in which the ads are matched against the originating query [12] [20]; (2) content-targeted advertising (also called "contextual advertising"), in which the ads are associated with the Web page content rather than the keywords [5] [26]; and (3) user-targeted advertising (also called "audience intelligence"), in which the ads are driven by user profile and demography [11], or by behavior [27]. Although the paid search market has developed more quickly than the contextual advertising market, and most textual ads are still characterized by "bid phrases," there has been a drift towards contextual advertising, as it supports a long-tail business model [15]. For example, a recent work [26] examines a number of strategies to match ads to Web pages based on extracted keywords. To alleviate the problem of exact keyword matching in conventional contextual advertising, Broder et al. propose to integrate semantic phrases into traditional keyword matching [5]. Specifically, both the pages and the ads are classified into a common large taxonomy, which is then used to narrow the search of keywords down to concepts. Most recently, Hu et al. propose to predict user demographics from browsing behavior [11]. The intuition is that while user demographics are not easy to obtain, browsing behaviors indicate a user's interests and profile.

Recently, researchers have invented context-aware video ads that can be inserted into video clips using intelligent video content analysis techniques [22] [30]. However, due to current network bandwidth limitations, video is still not as popular in Web pages as text and image.

Figure 2: The framework of ImageSense: vision-based page segmentation and image block detection, global and local textual relevance matching, image saliency detection, ad insertion point detection, and relevance optimization.

Candidate ad insertion positions are detected in each image through visual saliency analysis. The ads are then reranked by simultaneously considering the local content relevance, which is derived from the visual similarity between each ad and the ad insertion point, as well as the intrusiveness of the ad insertion point. Finally, the most contextually relevant ads are embedded at the most non-intrusive position within each image by a relevance optimization module.

4. IMAGESENSE
In this section, we describe the implementation of ImageSense. We will show how the tasks of associating ads with the triggering page and image are formulated as an optimization problem, and how this problem is practically solved using a heuristic search approach.

4.1 Problem Definition
Without loss of generality, we define the task of ImageSense as the association of ads with a Web page. We will show that it is easy to extend this to the tasks of image- and website-based advertising in a single formulation. Let I denote an image in the Web page P, which consists of N_b ad insertion points, represented by B = {b_i}, i = 1, ..., N_b. These insertion points are usually blocks or regions in the image. Let A denote the ad database containing N_a ads, represented by A = {a_j}, j = 1, ..., N_a. The task of ImageSense can be described as
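The objective itself (Eq. (1)) does not survive in this excerpt. As a hedged sketch of the kind of formulation described here — score every (insertion point, ad) pair with a fused relevance measure and pick the best assignment — one might write the following; all names and scores below are illustrative, not the paper's:

```python
from itertools import product

def select_ad_and_position(blocks, ads, relevance):
    """Return the (block, ad) pair with the highest relevance score.

    blocks: candidate ad insertion points B = {b_i} within the image
    ads: the ad database A = {a_j}
    relevance: a scoring function assumed to fuse global/local textual
    relevance, visual similarity, and (negated) intrusiveness
    """
    return max(product(blocks, ads), key=lambda pair: relevance(*pair))

# Toy usage with a made-up score table (illustrative only).
scores = {("b1", "nba_logo"): 0.9, ("b1", "car_ad"): 0.2,
          ("b2", "nba_logo"): 0.4, ("b2", "car_ad"): 0.3}
best = select_ad_and_position(["b1", "b2"], ["nba_logo", "car_ad"],
                              lambda b, a: scores[(b, a)])
# best == ("b1", "nba_logo")
```

An exhaustive scan over all N_b × N_a pairs is only a stand-in here; the paper describes solving the real optimization with a heuristic search.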
Figure 3: Vision-based structure of a sample Web page. “VB” indicates the visual block. The surrounding
texts are extracted within the image blocks, i.e., “VB-1-2-1-2.”
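As a sketch of how surrounding text might be gathered for an image block in such a vision-based (VIPS-style [6]) block tree — the node layout below is a hypothetical simplification, not the paper's actual data structure:

```python
def surrounding_text(block_tree, image_block_id):
    """Collect the text of sibling blocks around a given image block.

    block_tree: a VIPS-style visual block tree; each node is assumed to be
    a dict {"id": str, "text": str, "children": [...]} (hypothetical layout).
    """
    def find_parent(node, target):
        # Depth-first search for the node whose child has the target id.
        for child in node.get("children", []):
            if child["id"] == target:
                return node
            found = find_parent(child, target)
            if found:
                return found
        return None

    parent = find_parent(block_tree, image_block_id)
    if parent is None:
        return ""
    return " ".join(c["text"] for c in parent["children"]
                    if c["id"] != image_block_id and c.get("text"))

# Toy block tree (illustrative only).
tree = {"id": "VB-1", "text": "", "children": [
    {"id": "VB-1-1", "text": "caption here", "children": []},
    {"id": "VB-1-2", "text": "", "children": []},  # the image block
]}
text = surrounding_text(tree, "VB-1-2")  # -> "caption here"
```

Restricting the search to siblings of the image block mirrors the idea in the caption above: the surrounding text is taken from within the visual block that contains the image, not from the whole page.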
a result, the textual content I^(T) can be decomposed into three parts, i.e., I^(T) = I^(T1∪T2∪T3) = {I^(T1), I^(T2), I^(T3)}, where I^(T1) = {title, description}, I^(T2) = {expansion words of I^(T1)}, and I^(T3) = {concepts detected based on I^(V)}. Figure 4 shows an example of the original surrounding text, expansion text, and concept text. The corresponding image is extracted from "VB-1-2-1-2" in Figure 3.

Correspondingly, each ad a_j can be represented by a_j = {a_j^(V), a_j^(T)}, where a_j^(V) and a_j^(T) denote the visual and textual information associated with a_j, respectively. Note that a_j^(V) corresponds to the visual content of the ad logo, and a_j^(T) to the original text provided by the advertiser, without expansion words and concepts. The textual information a_j^(T) consists of four structural parts: a title, keywords, a textual description, and a hyperlink.

We use the Okapi BM25 algorithm [28], which is widely used in information retrieval, to compute the relevance between an ad landing page and an ad [21]. Given a query q with n terms {q_1, q_2, ..., q_n}, the BM25 score of a document d is

    sim(q, d) = Σ_{i=1}^{n} idf(q_i) × tf(q_i, d) × (k + 1) / (tf(q_i, d) + k(1 − b + b × ndl(d)))    (2)

where tf(q_i, d) is the term frequency of q_i in d; ndl(d) = |d|/avgdl is the normalized document length, with |d| the length of document d and avgdl the average document length in the collection; k and b are two parameters, usually chosen as k = 2.0 and b = 0.75; and idf(q_i) is the inverse document frequency of the query term q_i, given by

    idf(q_i) = log( (M − m(q_i) + 0.5) / (m(q_i) + 0.5) )

where M is the total number of documents in the collection, and m(q_i) is the number of documents containing q_i.

Let P^(T) denote the textual information of Web page P; the global textual relevance R_g(P, a_j) is then given by

    R_g(P, a_j) = sim(P^(T), a_j^(T))    (3)

By considering P^(T) as the query and a_j^(T) as the document, we can rank the ads with regard to the Web page P, which is widely applied in most existing contextual advertising systems [1] [5] [14] [23] [26]. We adopt the global textual relevance for ad ranking as the baseline in ImageSense. Accordingly, the local textual relevance R_ℓ(I, a_j) is

    R_ℓ(I, a_j) = sim(I^(T), a_j^(T)) = sim(I^(T1∪T2∪T3), a_j^(T))    (4)

The fusion of global and local textual relevance can be regarded as ad reranking. We will investigate how local relevance can benefit ad selection in the experiments.

4.4 Local Content Relevance
Note that ImageSense embeds the contextually relevant ads at certain spatial positions within an image. Through the analysis of local content relevance, we aim to find the non-intrusive ad positions within an image and to select the ads whose logos are visually similar, or have a similar visual style, to the image, in order to minimize intrusiveness and improve the user experience. Specifically, the candidate ad insertion positions (usually the blocks in B) are detected based on a salience measure derived from an image saliency map, while visual similarity is measured based on the HSV color feature. These two measures are combined to compute the local content relevance R_c(b_i, a_j).

Given an image I represented by a set of blocks B = {b_i}, i = 1, ..., N_b, a saliency map S representing the importance of the visual content is extracted for I by investigating the effects of contrast in human perception [18]. Figure 5(b) shows an example of the saliency map of an image.
¿ÀÁÂà à áâÂâãÁä åæááçæãèâã éÃêë ìêíÁãîâçã éÃêë
¥¦§ ¨©ª«¬ ˱°¾Êº µÊ±½Ëº γ°ÎÔ˺ ±¹¾Ëκ ·¶¶´º
®¯°±² ³´µ ¶· ¸ ³¹±º »¼¯½¾ ½¯°±² ³´º ¼Ë¶¼ °±½Ê¯º ¾°³±Í¼ Ô˺
Ħ¬ÅªÆÇÈÆ©É Ë¼¯½¾º Ì·¶º °±³ÍÓ¶Òº µÌÍ°³µËº
¾ÊË ²¶µ¾ ·±²¶Ìµ ²¶Í̲Ë;µ ¶· ² ̲ ²¯º ½°³µ²º ´³Í¶µ±Ì°º ˲ ½³°Ëº
±ÍγË; »¼¯½¾Ï ÐÊËµË ²±µµ³ÑË ØÊ̷̺ ¾±Ó±º ÎÊ˶½µº ¼°Ë±¾
µ¾¶ÍË µ¾°Ìξ̰˵ Ò Ë°Ë Ó̳Ծ ½¯°±² ³´º ´±Ê±Óº ±µÒ ±Íº ±Î³Ë;º
±°¶ÌÍ´ ÕÖ×× ¯Ë±°µ ±¼¶ ¶Í ± °¶Îد ѱÔÔ˯ ¶· ¾ÊË Ø³Í¼µº ½¯°³² ³´º
´ËµË°¾ ½Ô±¾Ë±Ì ÎÔ¶µË ¾¶ ¾ÊË Ù³ÔËÏ ¾±Íµ³Øº ½±Í¼Ë±º µ±ßß±°±º
Ú Ì¾ ¾ÊË ³Í¾°³¼ ̳ͼ »¼¯½¾³±Í ÊÌ°¼Ê±´±º ¼ °ËËÎ˺ ˼¯½¾³±Íº
½¯°±² ³´µ Ò Ë°Ë ²¶°Ë ¾Ê±Í Û Ìµ¾ ±·°³Î±º ´ÌÓ±³º ί½°Ìµº Û±²±³Î±º
¾¶² Óµ ·¶° سͼµÏ ÐÊË ²¯µ¾Ë°³Ëµ ¼ ʱͱ
µÌ°°¶ÌÍ´³Í¼ ¾Ê˳°
µ¯² Ó¶Ô³µ²º ´Ëµ³¼ Í ±Í´ ½Ì°½¶µË
ʱÑË ³Íµ½³°Ë´ ½±µµ³¶Í±¾Ë ´ËÓ±¾ËÏ ïçãðà íë éÃêë
ܾ ³µ Ô³ØËÔ¯ ¾Ê±¾ ²±Í¯ ¶· ¾Ê˵Ë
²¯µ¾Ë°³Ëµ Ò ³ÔÔ ÍËÑË° ÓË µ¶ÔÑË´ ÏÏÏ Ý Ì¾´¶¶°º Þ¶Ì;±³Íº ®Ë°µ¶Í
Figure 4: An example of the original surrounding text, expansion text, and concept text. The image is
extracted from “VB-1-2-1-2” in Figure 3.
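The salience-based detection of candidate insertion positions described in Section 4.4 can be sketched as follows; the fixed grid partition and pure-Python map representation are simplifications for illustration, not the paper's actual block detection:

```python
def least_salient_blocks(saliency, grid=(3, 3), top_k=2):
    """Rank grid blocks of a saliency map by mean saliency, lowest first.

    saliency: 2-D list of floats in [0, 1] (e.g. from a contrast-based
    attention model such as [18]); a low mean saliency marks a block as a
    non-intrusive candidate position for an ad logo.
    """
    h, w = len(saliency), len(saliency[0])
    gh, gw = grid

    def block_mean(i, j):
        # Average saliency over the (i, j)-th cell of the gh x gw grid.
        vals = [saliency[r][c]
                for r in range(i * h // gh, (i + 1) * h // gh)
                for c in range(j * w // gw, (j + 1) * w // gw)]
        return sum(vals) / len(vals)

    scores = {(i, j): block_mean(i, j) for i in range(gh) for j in range(gw)}
    return sorted(scores, key=scores.get)[:top_k]

# Illustrative 6x6 map: only the top-left 2x2 region is salient.
smap = [[1.0 if r < 2 and c < 2 else 0.0 for c in range(6)] for r in range(6)]
candidates = least_salient_blocks(smap)  # excludes the salient block (0, 0)
```

The visual-similarity half of R_c(b_i, a_j) (HSV color features) would then be computed only for these low-saliency candidates.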
Figure 6: The evaluations on ad relevance.
Figure 7: Precision-recall curves.
Table 2: Evaluations on ad position and satisfaction.
page corresponding to the image search results of "scene." It is observed that after image block detection, there are five images selected for advertising.

6. CONCLUSIONS
We have presented an innovative system for contextual in-image advertising. The system, called ImageSense, is able to automatically decompose a Web page into several coherent blocks, select the suitable images from these blocks for advertising, detect the nonintrusive ad insertion positions within the images, and associate the relevant advertisements with these positions. In contrast to conventional ad-networks, which treat image advertising as general text advertising, ImageSense is fully dedicated to images and aims to monetize the visual content rather than the surrounding text. We formalize the association of ads with images as an optimization problem that maximizes the overall contextual relevance. We have conducted extensive experiments over large-scale real-world image data and achieved promising results in terms of ad relevance and user experience. To the best of our knowledge, this is one of the first works showing how visual content analysis at different levels (i.e., low-level appearance and high-level semantics) can yield more effective image advertising, especially from the perspectives of ad position detection and contextual relevance.

7. REFERENCES
[1] AdSense. http://www.google.com/adsense/.
[2] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] BritePic. http://www.britepic.com/.
[5] A. Broder, M. Fontoura, V. Josifovski, and L. Riedel. A semantic approach to contextual advertising. In Proceedings of SIGIR, 2007.
[6] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. VIPS: a vision-based page segmentation algorithm. Microsoft Technical Report MSR-TR-2003-79, 2003.
[7] CNN. http://www.cnn.com/.
[8] Flickr. http://www.flickr.com/.
[9] R. J. Gerrig and P. G. Zimbardo. Psychology and Life (16th Edition). Allyn & Bacon, 2001.
[10] X. He, D. Cai, J.-R. Wen, W.-Y. Ma, and H.-J. Zhang. Clustering and searching WWW images using link and page layout analysis. ACM Transactions on Multimedia Computing, Communications, and Applications, 3(2), 2007.
[11] J. Hu, H.-J. Zeng, H. Li, C. Niu, and Z. Chen. Demographic prediction based on user's browsing behavior. In Proceedings of WWW, 2007.
[12] A. Joshi and R. Motwani. Keyword generation for search engine advertising. In Proceedings of the Workshops of IEEE International Conference on Data Mining, 2006.
[13] L. Kennedy, M. Naaman, S. Ahern, R. Nair, and T. Rattenbury. How Flickr helps us make sense of the world: Context and content in community-contributed media collections. In Proceedings of ACM Multimedia, Augsburg, Germany, 2007.
[14] A. Lacerda, M. Cristo, M. A. Goncalves, et al. Learning to advertise. In Proceedings of SIGIR, 2006.
[15] H. Li, S. M. Edwards, and J.-H. Lee. Measuring the intrusiveness of advertisements: scale development and validation. Journal of Advertising, 31(2):37–47, 2002.
[16] H. Li, D. Zhang, J. Hu, H.-J. Zeng, and Z. Chen. Finding keyword from online broadcasting content for targeted advertising. In International Workshop on Data Mining and Audience Intelligence for Advertising, 2007.
[17] Live Image Search. http://www.live.com/?&scope=images.
[18] Y.-F. Ma and H.-J. Zhang. Contrast-based image attention analysis by using fuzzy growing. In Proceedings of ACM Multimedia, pages 374–381, Nov. 2003.
[19] S. McCoy, A. Everard, P. Polak, and D. F. Galletta. The effects of online advertising. Communications of the ACM, 50(3):84–88, 2007.
[20] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani. AdWords and generalized on-line matching. Journal of the ACM, 54(5), Oct. 2007.
[21] T. Mei, X.-S. Hua, W. Lai, L. Yang, et al. MSRA-USTC-SJTU at TRECVID 2007: High-level feature extraction and search. In TREC Video Retrieval Evaluation Online Proceedings, 2007.
[22] T. Mei, X.-S. Hua, L. Yang, and S. Li. VideoSense: Towards effective online video advertising. In Proceedings of ACM Multimedia, pages 1075–1084, Augsburg, Germany, 2007.
[23] V. Murdock, M. Ciaramita, and V. Plachouras. A noisy channel approach to contextual advertising. In International Workshop on Data Mining and Audience Intelligence for Advertising, 2007.
[24] MySpace. http://www.myspace.com/.
[25] photoSIG. http://www.photosig.com/.
[26] B. Ribeiro-Neto, M. Cristo, P. B. Golgher, and E. S. Moura. Impedance coupling in content-targeted advertising. In Proceedings of SIGIR, 2005.
[27] M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: Estimating click-through rate for new ads. In Proceedings of WWW, 2007.
[28] S. E. Robertson, S. Walker, and M. Hancock-Beaulieu. Okapi at TREC-7. In Proceedings of the Seventh Text REtrieval Conference (TREC-7), Gaithersburg, USA, Nov. 1998.
[29] D. Shen, J.-T. Sun, Q. Yang, and Z. Chen. Building bridges for web query classification. In Proceedings of SIGIR, 2006.
[30] S. H. Srinivasan, N. Sawant, and S. Wadhwa. vADeo - video advertising system. In Proceedings of ACM Multimedia, 2007.
[31] TRECVID. http://www-nlpir.nist.gov/projects/trecvid/.
[32] Yahoo! Image. http://images.search.yahoo.com/images.
[33] B. Yang, T. Mei, X.-S. Hua, L. Yang, S.-Q. Yang, and M. Li. Online video recommendation based on multimodal fusion and relevance feedback. In Proceedings of CIVR, 2007.
[34] Y. Yang and X. Liu. A re-examination of text categorization methods. In Proceedings of SIGIR, 1999.
[35] W.-T. Yih, J. Goodman, and V. R. Carvalho. Finding advertising keywords on web pages. In Proceedings of WWW, 2006.