ACM Transactions on Multimedia Computing, Communications and Applications
2011, Volume 7S, Number 1
Articles 20-33

Special Section on ACM Multimedia 2010 Best Paper Candidates

Article 20 (2 pages): S. Shirmohammadi, J. Luo, J. Yang, A. El Saddik. Introduction
Article 21 (21 pages): S. Bhattacharya, R. Sukthankar, M. Shah. A Holistic Approach to Aesthetic Enhancement of Photographs
Article 22 (22 pages): S. Tan, J. Bu, C. Chen, B. Xu, C. Wang, X. He. Using Rich Social Media Information for Music Recommendation via Hypergraph Model
Article 23 (21 pages): S. Milani, G. Calvagno. A Cognitive Approach for Effective Coding and Transmission of 3D Video
Article 24 (19 pages): R. Hong, M. Wang, X.-T. Yuan, M. Xu, J. Jiang, S. Yan, T.-S. Chua. Video Accessibility Enhancement for Hearing-Impaired Users

SPECIAL ISSUE ON SOCIAL MEDIA

Article 25 (2 pages): S. Boll, R. Jain, J. Luo, D. Xu. Introduction
Article 26 (16 pages): Y.-C. Lin, Y.-H. Yang, H. H. Chen. Exploiting Online Music Tags for Music Emotion Classification
Article 27 (18 pages): M. Rabbath, P. Sandhaus, S. Boll. Automatic Creation of Photo Books from Stories in Social Media
Article 28 (24 pages): W. Hu, H. Zuo, O. Wu, Y. Chen, Z. Zhang, D. Suter. Recognition of Adult Images, Videos, and Web Page Bags
Article 29 (22 pages): Y.-R. Lin, K. S. Candan, H. Sundaram, L. Xie. SCENT: Scalable Compressed Monitoring of Evolving Multirelational Social Networks
Article 30 (18 pages): J. Sang, C. Xu. Browse by Chunks: Topic Mining and Organizing on Web-Scale Social Media
Article 31 (22 pages): R. Ji, Y. Gao, B. Zhong, H. Yao, Q. Tian. Mining Flickr Landmarks by Modeling Reconstruction Sparsity
Article 32 (18 pages): M. I. Mandel, R. Pascanu, D. Eck, Y. Bengio, L. M. Aiello, R. Schifanella, F. Menczer. Contextual Tag Inference
Article 33 (21 pages): J.-I. Biel, D. Gatica-Perez. VlogSense: Conversational Behavior and Social Attention in YouTube
ACM Transactions on Multimedia Computing, Communications and Applications
ACM
2 Penn Plaza, Suite 701
New York, NY 10121-0701
Tel.: (212) 869-7440
Fax: (212) 869-0481
Home Page: http://tomccap.acm.org/

Editor-in-Chief
Ralf Steinmetz Technische Universität Darmstadt / Darmstadt, Germany / http://www.kom.e-technik.tu-darmstadt.de/People/Staff/Ralf_Steinmetz/ralf_steinmetz.html / email: steinmetz.eic@kom.tu-darmstadt.de

Associate Editors
Kiyoharu Aizawa University of Tokyo / Tokyo, Japan / email: aizawa@hal.t.u-tokyo.ac.jp
Grenville Armitage Swinburne University of Technology / Melbourne, Australia / http://caia.swin.edu.au/cv/garmitage / email: garmitage@swin.edu.au
Susanne Boll University of Oldenburg / Oldenburg, Germany / http://medien.informatik.uni-oldenburg.de / email: susanne.boll@informatik.uni-oldenburg.de
Wolfgang Effelsberg University of Mannheim / Mannheim, Germany / http://www.informatik.uni-mannheim.de / email: effelsberg@informatik.uni-mannheim.de
Abdulmotaleb El Saddik University of Ottawa / Ottawa, Canada / email: abed@mcrlab.uottawa.ca
Gerald Friedland University of California / Berkeley, CA / http://www.icsi.berkeley.edu/~fractor/homepage/About_Me.html / email: fractor@icsi.berkeley.edu
Carsten Griwodz University of Oslo / Oslo, Norway / http://www.simula.no/portal_memberdata/griff / email: griff@simula.no
Mohamed Hefeeda Simon Fraser University / Surrey, BC V3T 0A3, Canada / http://www.cs.sfu.ca/~mhefeeda / email: mhefeeda@cs.sfu.ca
Mohan S. Kankanhalli National University of Singapore / Singapore / http://www.comp.nus.edu.sg/%7Emohan / email: mohan@comp.nus.edu.sg
Karrie Karahalios University of Illinois / Urbana-Champaign, IL / email: kkarahal@cs.uiuc.edu
Rainer Lienhart University of Augsburg / Augsburg, Germany / http://www.lienhart.de/ / email: rainer.lienhart@informatik.uni-augsburg.de
Ketan Mayer-Patel University of North Carolina / Chapel Hill, NC / http://www.cs.unc.edu/%7Ekmp / email: kmp@cs.unc.edu
Klara Nahrstedt University of Illinois / Urbana-Champaign, IL / http://cairo.cs.uiuc.edu/%7Eklara/home.html / email: klara@cs.uiuc.edu
Thomas Plagemann University of Oslo / Oslo, Norway / http://heim.ifi.uio.no/%7Eplageman / email: plageman@ifi.uio.no
Yong Rui Microsoft Research / Redmond, WA / http://research.microsoft.com/%7Eyongrui / email: yongrui@microsoft.com
Shervin Shirmohammadi University of Ottawa / Ottawa, Ontario, Canada / http://www.site.uottowa.ca/%7Eshervin / email: shervin@site.uottawa.ca
Hari Sundaram Arizona State University / Tempe, AZ / http://www.public.asu.edu/%7Ehsundara / email: Hari.Sundaram@asu.edu
Svetha Venkatesh Curtin University of Technology / Australia / http://www.computing.edu.au/%7esvetha/ email: s.venkatesh@exchange.curtin.edu.au
Michelle X. Zhou IBM Research Almaden / San Jose, CA / email: mzhou@us.ibm.com
Roger Zimmerman National University of Singapore / Singapore / http://www.comp.nus.edu.sg/%7Erogerz/roger.html / email: rogerz@comp.nus.edu.sg

Information Director
Lasse Lehmann AGT Group (R&D) GmbH / Darmstadt, Germany / email: lasse.lehmann@kom.tu-darmstadt.de
Sebastian Schmidt Technische Universität Darmstadt / Darmstadt, Germany / http://www.kom.tu-darmstadt.de/en/kom-multimedia-communications-lab/people/staff/sebastian-schmidt / email: TOMCCAP@kom.tu-darmstadt.de

Headquarters Staff
Laura Lander Journal Manager
Irma Strolia Editorial Assistant
Media Content Marketing Production

The ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) (ISSN: 1551-6857) is published quarterly in Spring, Summer, Fall,
and Winter by the Association for Computing Machinery (ACM), 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Printed in the U.S.A. POSTMASTER: Send
address changes to ACM Transactions on Multimedia Computing, Communications and Applications, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701.
For manuscript submissions, subscription, and change of address information, see inside back cover.
Copyright © 2011 by the Association for Computing Machinery (ACM). Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy
otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from: Publications
Department, ACM, Inc. Fax +1 212-869-0481 or email permissions@acm.org.
For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee
indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
Cover images from “A Holistic Approach to Aesthetic Enhancement of Photographs,” by S. Bhattacharya, R. Sukthankar, and M. Shah, in this issue.

ACM Transactions on Multimedia Computing, Communications and Applications
http://tomccap.acm.org/
Guide to Manuscript Submission

Submission to the ACM Transactions on Multimedia Computing, Communications and Applications is done electronically
through http://acm.manuscriptcentral.com. Once you are at that site, you can create an account and password with which
you can enter the ACM Manuscript Central manuscript review tracking system. From a drop-down list of journals, choose
ACM Transactions on Multimedia Computing, Communications and Applications and proceed to the Author Center to submit
your manuscript and your accompanying files.
You will be asked to create an abstract that will be used throughout the system as a synopsis of your paper. You will also be
asked to classify your submission using the ACM Computing Classification System through a link provided at the Author Center.
For completeness, please select at least one primary-level classification followed by two secondary-level classifications. To make
the process easier, you may cut and paste from the list. Remember, you, the author, know best which area and sub-areas are
covered by your paper; in addition to clarifying the area where your paper belongs, classification often helps in quickly identifying
suitable reviewers for your paper. So it is important that you provide as thorough a classification of your paper as possible.
The ACM Production Department prefers that your manuscript be prepared in either LaTeX or MS Word format. Style files
for manuscript preparation can be obtained at the following location: http://www.acm.org/pubs/submissions/submission.htm.
For editorial review, the manuscript should be submitted as a PDF or Postscript file. Accompanying material can be in
any number of text or image formats, as well as software/documentation bundles in zip or tar-gzipped formats.
Questions regarding the editorial review process should be directed to the Editor-in-Chief. Questions regarding the
post-acceptance production process should be addressed to the Journal Manager, Laura Lander, at lander@hq.acm.org.

Subscription, Single Copy, and Membership Information.
Send orders to:
ACM Member Services Dept.
General Post Office
PO Box 30777
New York, NY 10087-0777
For information, contact:
Mail: ACM Member Services Dept.
2 Penn Plaza, Suite 701
New York, NY 10121-0701
Phone: +1-212-626-0500
Fax: +1-212-944-1318
Email: acmhelp@acm.org
Catalog: http://www.acm.org/catalog
Subscription rates for ACM Transactions on Multimedia Computing, Communications and Applications are $40 per year for
ACM members, $35 for students, and $140 for nonmembers. Single copies are $18 each for ACM members and $40 for
nonmembers. Your subscription expiration date is coded in four digits at the top of your mailing label; the first two digits
show the year, the last two show the month of expiration.
About ACM. ACM is the world’s largest educational and scientific computing society, uniting educators, researchers and
professionals to inspire dialogue, share resources and address the field’s challenges. ACM strengthens the computing
profession’s collective voice through strong leadership, promotion of the highest standards, and recognition of technical
excellence. ACM supports the professional growth of its members by providing opportunities for life-long learning, career
development, and professional networking.
Visit ACM's Website: http://www.acm.org.
Change of Address Notification: To notify ACM of a change of address, use the addresses above or send an email to
coa@acm.org.
Please allow 6–8 weeks for new membership or change of name and address to become effective. Send your old label with
your new address notification. To avoid interruption of service, notify your local post office before change of residence.
For a fee, the post office will forward 2nd- and 3rd-class periodicals.
Browse by Chunks: Topic Mining and Organizing
on Web-Scale Social Media
JITAO SANG and CHANGSHENG XU, Institute of Automation, China and China-Singapore Institute
of Digital Media, Singapore
The overwhelming number of Web videos returned by search engines makes effective browsing and search a challenging task.
Rather than a conventional ranked list, it becomes necessary to organize the retrieved videos in alternative ways. In this article,
we explore topic mining and the organization of retrieved web videos into semantic clusters. We present a framework
for clustering-based video retrieval and build a visualization user interface. A hierarchical topic structure is exploited to encode
the characteristics of the retrieved video collection, and a semi-supervised hierarchical topic model is proposed to guide the
discovery of the topic hierarchy. Carefully designed experiments on a web-scale video dataset collected from video sharing websites
validate the proposed method and demonstrate that clustering-based video retrieval is a practical way to help users browse effectively.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—
Clustering
General Terms: Algorithms, Design, Experimentation, Performance
Additional Key Words and Phrases: Hierarchical topic model, search result clustering, semisupervised learning, social media,
topic mining, video retrieval
ACM Reference Format:
Sang, J. and Xu, C. 2011. Browse by chunks: Topic mining and organizing on web-scale social media. ACM Trans. Multimedia
Comput. Commun. Appl. 7S, 1, Article 30 (October 2011), 18 pages.
DOI = 10.1145/2037676.2037687 http://doi.acm.org/10.1145/2037676.2037687
1. INTRODUCTION
With the development of multimedia technology and the increasing proliferation of social media in Web 2.0,
an overwhelming volume of professional and user-generated videos has been posted to video sharing
websites. YouTube,1 one of the most popular video sharing websites, announced that its users upload
about 65,000 new videos and view more than 100 million videos each day. To detect and track hot events
or topics, more and more people prefer to search for and watch videos on the web, which is timely and
convenient. With the explosion of shared videos, there is a strong demand for an effective way for users to
retrieve and access videos of interest. The goal of this work is to offer a novel topic mining
and organizing solution and to build a visualization user interface that displays topics as hierarchical
semantic clusters, helping users browse the retrieved videos and locate interesting ones.
1 http://www.youtube.com.
This work was supported by the National Natural Science Foundation of China (Grant No. 90920303) and 973 Program (Project
No. 2012CB316304).
Authors’ address: J. Sang and C. Xu (corresponding author): National Lab of Pattern Recognition, Institute of Automation, CAS,
Beijing 100190 China; email: {jtsang, csxu}@nlpr.ia.ac.cn.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided
that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page
or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to
lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be
requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481,
or permissions@acm.org.
© 2011 ACM 1551-6857/2011/10-ART30 $10.00
DOI 10.1145/2037676.2037687 http://doi.acm.org/10.1145/2037676.2037687
ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 7S, No. 1, Article 30, Publication date: October 2011.
Fig. 1. An example page from YouTube for the query ‘9/11 attack’. 7,800 videos are returned. Alternative search options are also
shown.
Conventional video search engines order the retrieved videos according to their relevance to the
query. When a user issues a query, search engines return a ranked list including hundreds or thousands
of matches. Users have to painstakingly browse through the long list to judge whether the results
match their requirements and then locate the interesting videos. One question naturally arises: beyond
a ranked list, is there a more effective way to organize the retrieved videos?
Clustering the returned videos into semantically consistent groups and visualizing those groups offers an
alternative. Clustering the retrieved videos helps users get a quick overview of the retrieved video set
and thus locate interesting videos more easily. YouTube provides several options that allow users to
filter search results by Upload date, Category, Duration and Features (see Figure 1). While these coarse
groups capture generic categories of the videos, they give users little information about
the internal structure and semantic meaning of the returned video collection. There have also
been research attempts [Liu et al. 2008b; Ramachandran et al. 2009] to employ clustering to assist
video retrieval. The strategy was to build a static clustering of the entire collection and then match the
query to the cluster centroids. This is so-called preretrieval clustering. From the perspective of feature
selection, preretrieval clustering is based on features that are frequent in the whole collection but
unrelated to the query, whereas postretrieval clusters are tailored to the characteristics of the query
and make use of query-specific features. We cannot expect a single clustering to play a one-size-fits-all
classification role. Therefore, it is more reasonable to apply clustering as a postprocessing step. In this
article we propose a postretrieval web video clustering method for cluster-based video retrieval (see
Figure 2 for illustration).
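The difference between the two strategies can be made concrete with a minimal sketch, assuming each video is represented only by a list of tags. The function names and the toy corpus are hypothetical, and the first-match grouping is a deliberately simple stand-in for a real clustering algorithm; it is not the method proposed in this article. The point is only that the grouping features are chosen from the retrieved set, after the query is known.

```python
from collections import Counter

def retrieve(corpus, query_terms):
    """Return ids of documents whose tags contain every query term."""
    q = set(query_terms)
    return [doc_id for doc_id, tags in corpus.items() if q <= set(tags)]

def query_specific_features(corpus, hit_ids, query_terms, top_n=2):
    """Rank non-query terms by document frequency inside the retrieved set only."""
    q = set(query_terms)
    df = Counter()
    for doc_id in hit_ids:
        df.update(set(corpus[doc_id]) - q)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]

def cluster_hits(corpus, hit_ids, features):
    """Toy grouping: file each hit under the first selected feature it contains."""
    clusters = {}
    for doc_id in hit_ids:
        tags = set(corpus[doc_id])
        label = next((f for f in features if f in tags), "other")
        clusters.setdefault(label, []).append(doc_id)
    return clusters

# Hypothetical tagged videos; v5 is never retrieved for this query.
corpus = {
    "v1": ["9/11", "attack", "rescue", "live"],
    "v2": ["9/11", "attack", "rescue", "firefighters"],
    "v3": ["9/11", "attack", "investigation", "report"],
    "v4": ["9/11", "attack", "investigation"],
    "v5": ["cooking", "pasta"],
}
query = ["9/11", "attack"]
hits = retrieve(corpus, query)
features = query_specific_features(corpus, hits, query)
clusters = cluster_hits(corpus, hits, features)
```

In this toy run the selected features are terms such as "rescue" and "investigation", which would be rare in a large, mixed collection and would therefore be unlikely to drive a preretrieval clustering.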
Fig. 2. Visualization of the user interface for cluster-based video retrieval: (top) A user submits a query; the underlying topic
hierarchy is exploited and displayed on the left as a complementary view to the conventional flat list. (bottom) When a user chooses
one subtopic (video cluster), the included videos are shown on the right in order of their relevance to this subtopic as computed
by Equation (2).
Our method is motivated by the following observation. Glancing at the example
in Figure 1, we find that almost all the returned videos contain words like 9/11, attack, terrorism,
WTC, etc. This implies that although diverse topics are involved in the retrieved video
collection, they usually share one common topic referred to in the query; we call this shared topic
the parent topic. We elaborate this idea with the same example in Figure 2 (top). On the left, the
circle illustrates the latent semantic structure of the retrieved video collection. Each color along the
circle denotes a subtopic, annotated with a tag cloud of its eight most probable terms. The length of
each arc is proportional to the number of videos belonging to the subtopic. The parent topic, at the
root node of the hierarchy, is located in the center. Each retrieved video can be viewed as a combination
of the parent topic and one child topic (a subtopic; here the subtopics can be enumerated as Live attack and rescue
video, Domestic and international response afterwards, Investigation, and The Else: long-term effect and
memorial).
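The two quantities that drive this view, the arc length of each subtopic and its tag cloud, are simple to compute once cluster sizes and per-subtopic term probabilities are available. The following sketch assumes both are given as dictionaries; the helper names, the cluster sizes, and the term probabilities are hypothetical and serve only to illustrate the proportionality rule and the top-eight selection.

```python
def arc_angles(cluster_sizes):
    """Map each subtopic to an arc angle (degrees) proportional to its video count."""
    total = sum(cluster_sizes.values())
    if total == 0:
        raise ValueError("no videos to display")
    return {name: 360.0 * n / total for name, n in cluster_sizes.items()}

def top_terms(term_probs, k=8):
    """Return the k most probable terms of a subtopic for its tag cloud."""
    ranked = sorted(term_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical sizes for the four subtopics named in the text.
sizes = {
    "Live attack and rescue video": 30,
    "Domestic and international response afterwards": 20,
    "Investigation": 40,
    "The Else: long-term effect and memorial": 10,
}
angles = arc_angles(sizes)

# Hypothetical term probabilities for one subtopic.
investigation = {"report": 0.30, "commission": 0.25, "evidence": 0.20, "inquiry": 0.25}
cloud = top_terms(investigation, k=3)
```

With these sizes, the Investigation subtopic holds 40 of 100 videos and therefore occupies 144 of the 360 degrees.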
Inspired by this observation, we extend the hierarchical topic model [Blei et al. 2010] to exploit a two-level
topic tree in the retrieved video collection and cluster the collected videos into the leaf-level subtopics.
Compared with flat-structure clustering methods (e.g., k-means, LDA), the hierarchical
topic model prevents the shared topic from being mixed into the other topics and thus safeguards
clustering performance. Furthermore, we encode the consistency between the query and the root-level
topic (which we denote as the query-root-topic knowledge in this paper) as prior information to form a
semi-supervised hierarchical topic model.
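To make the two-level structure concrete, the sketch below generates the words of one video from a root topic shared by the whole result set and a single leaf topic. This is a strongly simplified illustration, not the model of Blei et al. [2010] nor the semi-supervised model proposed here: the fixed mixing weight `root_weight`, the function name, and all term distributions are assumptions made for the example.

```python
import random

def sample_video_words(root_topic, leaf_topic, n_words, root_weight=0.5, seed=0):
    """Draw n_words terms for one video from the root topic and one leaf topic."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    words = []
    for _ in range(n_words):
        # Assumption: a fixed weight decides which level of the tree emits the word.
        topic = root_topic if rng.random() < root_weight else leaf_topic
        terms = list(topic)
        weights = [topic[t] for t in terms]
        words.append(rng.choices(terms, weights=weights, k=1)[0])
    return words

# Hypothetical term distributions for the '9/11 attack' example.
ROOT = {"9/11": 0.4, "attack": 0.3, "terrorism": 0.2, "WTC": 0.1}
LEAVES = {
    "rescue": {"firefighters": 0.6, "live": 0.4},
    "investigation": {"report": 0.7, "commission": 0.3},
}

rescue_video = sample_video_words(ROOT, LEAVES["rescue"], n_words=50)
```

Every sampled video contains root-topic terms regardless of its leaf, which is the pattern a flat clustering would have to absorb into each cluster; keeping the shared terms in a separate root node leaves the leaves free to capture only what distinguishes the subtopics.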
Since there are no established metrics for evaluating the performance of cluster-based video retrieval, we
draw on text search result clustering and employ objective metrics as well as user study tasks to assess
the performance of the proposed method.
2. RELATED WORK
In this section, we review previous research on Web video mining and search result clustering,
and discuss how it relates to our work.
2.1 Web Video Mining

In an effort to keep up with the tremendous growth of Web videos, much work has targeted
analyzing web video content and structure to help users find desired information more efficiently
and accurately.
Topic detection and tracking (TDT), first proposed in the 1990s for news documents, has attracted
increasing attention for web video analysis [Liu et al. 2008a; Yuan et al. 2008; Cao et al. 2010; Yuan
et al. 2010]. By automatically filtering out topic candidates and tracking “hot” topics, TDT strives to
organize large-scale Web videos into topics, helping users and advertisers efficiently browse and
track the evolution of topics. Other work that clusters the whole collection into semantic
topics [Liu et al. 2008b; Ramachandran et al. 2009] can also be grouped into this category. We notice that
in TDT, video clustering is performed in advance and on the whole document collection. The number
of topics is also predefined. Since the web is a dynamic environment, static, precomputed clusters
would have to be constantly updated.
Near-duplicate video clustering and elimination [Cheung and Zakhor 2004; Wu et al. 2007] is another
way to help users retrieve and access web videos. With the explosion of the web video pool, video search
engines tend to return similar or near-duplicate videos together in the result lists. Clustering the
search results according to their content and visual similarities is considered a practical way to
help users browse quickly. However, video clips in the same near-duplicate cluster are basically
derived from the same original copy. Such clustering cannot be used for topic-level browsing and therefore
does not solve the problem addressed in this paper.
2.2 Search Result Clustering

Search result clustering [Carpineto et al. 2009], which clusters the raw result set into different
semantic groups, has been investigated in text retrieval [Cutting et al. 1992; Zamir and Etzioni 1998;
Kummamuru et al. 1998] and image retrieval [Cai et al. 2004; Jing et al. 2006]. By grouping
the results returned by a conventional search engine into labeled clusters, it allows better topic
understanding and favors systematic exploration of the search results. The work in this paper can be
regarded as video search result clustering.
To the best of our knowledge, the only work so far addressing the problem of video search result
clustering is Hindle et al. [2010]. They clustered the top returned videos based on visual similarity
of low-level appearance features and textual similarity of term vector features. Their video clusters
resemble near-duplicate groups. Our experiments demonstrate that the clusters derived by their method
are much smaller than those of the underlying subtopics, and that the number of clusters is
relatively large.
We notice that most previous search result clustering methods are devoted to resolving the
ambiguity that results from nonspecific queries. The queries mostly involve general objects or names,
and the cluster labels correspond to alternative interpretations of the query. Examples are the query
apple, with the interpretations computer, ipod, logo, and fruit [Cai et al. 2004], and the query sting,
with the interpretations musician, wrestler, and film [Hindle et al. 2010]. In this article we focus on
more complex queries concerning political and social events or issues. The semantic clusters inside the
returned videos are diverse aspects of the query-corresponding events (e.g., the query 9/11 attack, see
Figure 2) or different viewpoints on controversial issues (e.g., the query abortion, with the opposing
viewpoints ‘pro-life’ and ‘pro-choice’). In this case, a few general terms are insufficient for users to
understand the subtopics; a subtopic is best described by a set of representative keywords. In this paper,
we introduce a topic model to describe each subtopic with a probability distribution over terms in a
large vocabulary.
In addition, motivated by the observation that the returned results share one common topic, we
explicitly incorporate this basic characteristic into the clustering process and exploit the inherent
hierarchical topic structure.
Fig. 3. System framework of video search result clustering.
3. FRAMEWORK
In this article, we propose a hierarchical topic model based framework for clustering-based video re-
trieval. The framework contains two steps, query expansion and hierarchical topic model based topic
hierarchy discovery. The input of our algorithm is web videos collected from video sharing websites,
and the output is the generated video clusters as well as the topic hierarchy. This is shown in Figure 3.
When video sharing websites (e.g. YouTube, Metacafe, Vimeo, etc.) capture a query submission from
a user, the search engine will return a raw ranked list of the videos. Metadata around each video are
collected and represented as a document-term matrix.
Hierarchical Latent Dirichlet Allocation (hLDA) [Blei et al. 2010; Blei et al. 2004] is a generalization
of the (flat) Latent Dirichlet Allocation (LDA) model [Blei et al. 2003]. We employ hLDA for unsu-
pervised discovery of the topic hierarchy in the retrieved video collection. To effectively incorporate
query-relevant terms into the root topic, we employ association mining as well as WordNet conceptual
relations between words to expand the query words, resulting in a seed word set. The seed word set
is viewed as supervision information (query-root-topic knowledge), and an extension to the standard
hLDA, semi-supervised hLDA (SShLDA), is proposed to guide the inference of the topic hierarchy.
Fig. 4. Hierarchy relation of word ‘attack’ in WordNet3.0.
After probabilistic inference of topic modeling, each video is assigned a single path from the root
node to a leaf node. The videos assigned to the same path will be grouped together to form a cluster
and the subtopics in the leaf node constitute the description for the corresponding video clusters.
The contributions of this article are summarized as follows: 1) We propose a novel solution frame-
work for clustering-based video retrieval. Hierarchical topic model is introduced to explore the inherent
hierarchical topic structure in the retrieved video collection. 2) Query-root-topic knowledge is incorpo-
rated to guide the topic hierarchy discovery and a semi-supervised extension to the standard hierarchi-
cal topic model is presented. 3) For cluster representation, topics characterized by term distributions
are utilized to deal with complex queries of political and social events or issues.
4. QUERY EXPANSION
Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance
in information retrieval operations. For Web search engines, query expansion involves evaluating a
user’s input and expanding the search query to match additional documents. In our case, we employ
query expansion, combining WordNet and association mining to extend the query terms into a seed
word set S = {s1 , . . . , sC }, which composes the root topic of the derived topic hierarchy.
WordNet [Miller et al. 1990] is an online lexical dictionary that describes word relationships along
three dimensions: hypernym, hyponym, and synonym. It is organized conceptually. As shown in Figure 4,
fight is a hypernym of the verb attack and bombing is a hyponym of the noun attack. Gong et al.
[2005] utilized the hypernym/hyponym and synonym relations between WordNet nouns to expand
queries. To avoid bringing in noisy terms, they supplemented their method with a term semantic
network to filter out low-frequency and unusual words. Under our mechanism for incorporating
the supervision information (detailed in Section 5), adding noisy words not included in the vocabulary
will not detract from the topic modeling process. This means we may extend the query as
much as we can, provided that no words concerned with subtopics are mixed in. Therefore, we exclude
words having hyponym or troponym relations to the query in WordNet. In addition, instead of removing
unusual words, we employ association mining and add high-frequency words to the seed word set.
We utilize WordNet as the basic rule to extend the query along two dimensions: hypernym
and synonym relations. The original query 9/11 attack, for instance, may be expanded to include 911
attack assault aggress assail fight struggle contend onslaught onset attempt operation approach event.
Since WordNet has narrow coverage for domain-specific queries [Chandramouli et al. 2008], we use
association rules to exploit collection-dependent word relationships. We examine the vocabulary and
add to the query expansion the words that rank in the top 10 by both confidence and support with
respect to the original query words. The final seed word set of the query 9/11 attack may be
S = {911 attack assault aggress assail fight struggle contend onslaught onset attempt operation
approach event wtc world trade center terrorist terrorism 9-11}.
Fig. 5. (a) LDA graphical model. (b) Hierarchical LDA graphical model. (c) Semi-supervised hierarchical LDA graphical model.
μ controls the strength of our constraint derived from the seed set. The proposed SShLDA differs from standard hLDA
in the way w is generated.
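The association-mining step of the query expansion can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `expand_by_association` is hypothetical, support is assumed to be the fraction of documents in which a word co-occurs with at least one query term, and confidence is assumed to be that co-occurrence count divided by the document frequency of the word. The text does not spell out these definitions, so both are assumptions.

```python
from collections import Counter


def expand_by_association(docs, query_terms, top_k=10):
    """Return words ranking in the top_k by both support and confidence.

    docs: list of token lists (one per video's metadata).
    Support and confidence follow the assumed definitions in the lead-in.
    """
    query_terms = set(query_terms)
    doc_sets = [set(d) for d in docs]
    n_docs = len(doc_sets)

    # Document frequency of every word.
    df = Counter(w for s in doc_sets for w in s)

    # Number of documents in which a word co-occurs with any query term.
    co = Counter()
    for s in doc_sets:
        if s & query_terms:
            for w in s - query_terms:
                co[w] += 1

    support = {w: c / n_docs for w, c in co.items()}
    confidence = {w: c / df[w] for w, c in co.items()}

    def top(score):
        # Ties are broken alphabetically so the result is deterministic.
        return set(sorted(score, key=lambda w: (-score[w], w))[:top_k])

    return sorted(top(support) & top(confidence))


# Toy corpus (made up for illustration only).
toy_docs = [
    ["911", "attack", "wtc", "terrorist"],
    ["911", "attack", "wtc", "video"],
    ["attack", "terrorist", "news"],
    ["music", "video", "news"],
    ["music", "video"],
]
seed_extension = expand_by_association(toy_docs, ["911", "attack"], top_k=2)
```

The words returned by such a step would be appended to the WordNet-derived expansion to form the seed word set S.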
5. SEMI-SUPERVISED HIERARCHICAL TOPIC MODEL

We begin by briefly reviewing LDA and the standard hLDA. Then we introduce our extension to hLDA,
SShLDA, and derive the parameter estimation and prediction algorithm. We will describe the models
using the original terms ‘documents’ (in our case, each video corresponds to one document) and ‘words’
as used in the topic model literature.
5.1 Latent Dirichlet Allocation and Hierarchical Topic Model

Suppose we have a corpus of M documents, $\{w_1, w_2, \ldots, w_M\}$, containing words from a vocabulary of
V terms. We further assume that the order of words in a particular document is ignored. This is a
“bag-of-words” model.
LDA. The Latent Dirichlet Allocation model [Blei et al. 2003] assumes that documents are generated
from a set of K latent topics (K needs to be predefined). In a document, each word $w_i$ is associated with
a hidden variable $z_i \in \{1, \ldots, K\}$ indicating the topic from which $w_i$ was generated. The probability of
word $w_i$ is expressed as

$$P(w_i) = \sum_{j=1}^{K} P(w_i \mid z_i = j)\, P(z_i = j), \tag{1}$$

where $P(w_i \mid z_i = j) = \beta_{ij}$ is the probability of word $w_i$ in topic j and $P(z_i = j) = \theta_j$ is a
document-specific mixing weight indicating the proportion of topic j.
LDA treats the multinomial parameters β and θ as latent random variables sampled from Dirichlet
priors with hyperparameters η and α, respectively. The corresponding graphical model is shown in
Figure 5(a).
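As a small worked instance of Equation (1), the snippet below evaluates the mixture for a two-topic toy model. The topic-word probabilities and mixing weights are made-up numbers, and the helper name `word_probability` is hypothetical.

```python
def word_probability(word, beta, theta):
    """Equation (1): P(w) = sum_j P(w | z = j) * P(z = j) for one document.

    beta:  list of dicts, beta[j][word] = probability of `word` in topic j.
    theta: list of per-document topic proportions, theta[j] = P(z = j).
    """
    return sum(beta[j][word] * theta[j] for j in range(len(theta)))


# Two toy topics over a three-word vocabulary (illustrative values only).
beta = [
    {"attack": 0.5, "rescue": 0.1, "memorial": 0.4},
    {"attack": 0.2, "rescue": 0.7, "memorial": 0.1},
]
theta = [0.75, 0.25]

p_attack = word_probability("attack", beta, theta)  # 0.5*0.75 + 0.2*0.25
```

Because each topic is a distribution over the vocabulary and θ sums to one, the resulting word probabilities also sum to one over the vocabulary.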
Hierarchical LDA. The LDA model we have described has a flat topic structure. Each document is
a superposition of all K topics with document specific mixture weights. The hierarchical LDA model
organizes topics in a tree of fixed depth L. Each node in the tree has an associated topic and each
document is assumed to be generated by topics on a single path from the root to a leaf through the
tree. Note that all documents share the topic associated with the root node; this feature of hLDA is
consistent with the characteristics of the search result collection mentioned in Section 1.
The merit of the hLDA model is that both the topics and the structure of the tree are learnt from
the training data. This is achieved by placing a nested Chinese restaurant process (nCRP) [Teh et al.
2006] prior on the tree structure. nCRP specifies a distribution on partitions of documents into paths
in a fixed depth L-level tree. To generate a tree structure from nCRP, assignments of documents to
paths are sampled sequentially, where the first document forms an initial L-level path, i.e. a tree with
a single branch. The probability of creating novel branches is controlled by parameter γ , where smaller
values of γ result in a tree with fewer branches.
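The sequential path assignment under the nCRP prior can be sketched as below. This is a simplified illustration under stated assumptions: the tree is stored as a dictionary from a path prefix to the visit counts of its children (a hypothetical data layout), and at each node an existing child is chosen with probability proportional to its count while a new child is opened with probability proportional to γ.

```python
import random


def sample_ncrp_path(tree, depth, gamma, rng):
    """Sample one root-to-leaf path in a tree of `depth` levels and update counts.

    tree: dict mapping a path prefix (tuple of child indices) to a list of
          visit counts, one entry per existing child of that node.
    Returns the tuple of child indices chosen below the (shared) root.
    """
    path = ()
    for _ in range(depth - 1):          # the root is shared by all documents
        counts = tree.setdefault(path, [])
        total = sum(counts) + gamma
        r = rng.random() * total
        choice = len(counts)            # default: open a new branch
        acc = 0.0
        for i, c in enumerate(counts):
            acc += c
            if r < acc:                 # existing child i, prob. c / total
                choice = i
                break
        if choice == len(counts):
            counts.append(0)
        counts[choice] += 1
        path = path + (choice,)
    return path


rng = random.Random(0)
tree = {}
paths = [sample_ncrp_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(50)]
```

The first document always creates the initial path, and smaller values of `gamma` make new branches less likely, which matches the behaviour described above.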
In hLDA, each document is assumed to be drawn from the following process.
i. Pick an L-level path $c_d$ from the nCRP prior: $c_d \sim \mathrm{nCRP}(\gamma)$.
ii. Sample an L-dimensional topic proportion vector $\theta_d \sim \mathrm{GEM}(m, \pi)$.
iii. For each word $w_{d,n} \in w_d$:
(a) Choose a level $z_{d,n} \in \{1, \ldots, L\}$, $z_{d,n} \sim \mathrm{Discrete}(\theta_d)$;
(b) Sample a word $w_{d,n} \sim \mathrm{Discrete}(\beta_{c_d} \mid z_{d,n})$, which is parameterized by the topic at level $z_{d,n}$ on the
path $c_d$.
The corresponding graphical model is shown in Figure 5(b). Further details of hLDA can be found in
Blei et al. [2010].
5.2 Semi-Supervised Hierarchical LDA Model
When we utilize hierarchical topic model for the video clustering task, one subtopic corresponds to
one cluster. The cluster membership of each video is decided by its posterior path assignment cd. The
videos in a cluster are sorted by their proportion on the subtopic, computed as

$$\frac{\sum_{w_{d,n} \in w_d} |z_{d,n} = 2|}{N_d}, \tag{2}$$

where $|\cdot|$ is the indicator function, the numerator counts the words allocated to the leaf level, and
$N_d$ denotes the number of words in document d.
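The ranking by Equation (2) can be illustrated with a short sketch. It assumes a two-level tree (leaf level 2), as in the equation, and represents each video by the list of level allocations of its words; the function names and the toy cluster are hypothetical.

```python
def leaf_proportion(levels, leaf_level=2):
    """Equation (2): fraction of a document's words allocated to the leaf level.

    levels: non-empty list of level allocations z_{d,n}, one per word.
    """
    return sum(1 for z in levels if z == leaf_level) / len(levels)


def rank_cluster(cluster):
    """Sort (video_id, levels) pairs by decreasing leaf-level proportion."""
    return sorted(cluster, key=lambda item: leaf_proportion(item[1]), reverse=True)


# Toy cluster: three videos with made-up level allocations.
toy_cluster = [
    ("a", [1, 1, 2, 1]),   # proportion 0.25
    ("b", [2, 2, 1, 2]),   # proportion 0.75
    ("c", [2, 1]),         # proportion 0.50
]
ranked_ids = [video_id for video_id, _ in rank_cluster(toy_cluster)]
```

Videos whose words are mostly explained by the subtopic rather than by the shared root topic are thus listed first within their cluster.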
To incorporate the query-root-topic knowledge into the hierarchical topic modeling, we propose an
extension to the standard hLDA, which we call Semi-Supervised Hierarchical LDA model (SShLDA).
The supervised information we add is the seed word set derived from query expansion, S = {s1 , . . . , sC }.
We jointly model the documents and the seed word set, in order to guide the discovery of topic hierarchy
so that the words in the seed word set will have high probability in the root topic and low probability
in subtopics.
We first explain how query-root-topic knowledge can be incorporated into the topic modeling process.
In the standard hLDA, the topic level allocation zd,n for word n in document d is a latent variable and
needs to be inferred through the model learning process. Assume we have the supervised information
of zd,n, that is, the topic level allocation for a given word in a given document. This can be seen as
similar to semi-supervised learning with labeled features [Druck et al. 2008]. In our case, we denote
it as hard constraint when the seed set words are restricted to be shown only in the root topic. In
practical applications, each word tends to be generated from every topic with different probabilities.
Therefore, we relax this strong assumption. Instead of providing topic level allocation zd,n for each seed
word, we modify the generative process of standard hLDA so that sampling seed words from root topic
and subtopics will have different probabilities.
Specifically, the proposed SShLDA differs from hLDA in the way wd,n is generated. The generative
process of SShLDA is:
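i. Pick an L-level path $c_d$ from the nCRP prior: $c_d \sim \mathrm{nCRP}(\gamma)$.
ii. Sample an L-dimensional topic proportion vector $\theta_d \sim \mathrm{GEM}(m, \pi)$.
iii. For each word $w_{d,n} \in w_d$:
(a) Choose a level $z_{d,n} \in \{1, \ldots, L\}$, $z_{d,n} \sim \mathrm{Discrete}(\theta_d)$;
(b) Sample a word $w_{d,n} \sim \mathrm{Constraint}(\mu, z_{d,n}) \cdot \mathrm{Discrete}(\beta_{c_d} \mid z_{d,n})$.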
The corresponding graphical model is shown in Figure 5(c). $\mathrm{Constraint}(\mu, z_{d,n})$ is the soft constraint
function defined as follows:

$$\mathrm{Constraint}(\mu, z_{d,n}) = \begin{cases} \mu\,\delta(w_{d,n} \in S) + 1 - \mu, & z_{d,n} = 1, \\ \mu\,\delta(w_{d,n} \notin S) + 1 - \mu, & z_{d,n} \neq 1, \end{cases} \tag{3}$$

where δ(·) is an indicator function and μ (0 ≤ μ ≤ 1) is the strength parameter of the supervision. Setting
μ = 0 reduces the model to standard hLDA, and μ = 1 recovers the hard constraint.
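A direct transcription of the soft constraint may help to see its effect. The sketch assumes that the first case of Equation (3) applies at the root level (level 1) and the second case at every other level; the function name `constraint` and the toy seed set are illustrative.

```python
def constraint(word, level, seed_words, mu):
    """Soft constraint weight of Equation (3); level 1 is the root topic.

    At the root, seed words keep weight 1 and other words are damped to 1 - mu.
    Below the root (assumed to be the second case), the roles are swapped.
    """
    in_seed = word in seed_words
    favoured = in_seed if level == 1 else not in_seed
    return mu * (1.0 if favoured else 0.0) + 1.0 - mu


seed = {"911", "attack", "terrorism"}

w_root_seed = constraint("attack", 1, seed, mu=0.5)   # seed word at the root
w_leaf_seed = constraint("attack", 2, seed, mu=0.5)   # seed word in a subtopic
```

With μ = 0 every weight equals 1, so the sampler is unchanged; with μ = 1 a seed word receives weight 0 in every subtopic, which corresponds to the hard constraint.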
This formulation provides a flexible way to insert prior domain knowledge into the inference of
latent topics through different definitions of the constraint function. For instance, given prior
information on the latent subtopics, S can be set independently for each specific subtopic.
5.3 Inference and Learning
Given the SShLDA model, we need to perform posterior inference [Bishop 2006], that is, to invert
the generative process of documents described above to estimate the hidden topical structure. We
modify the Gibbs sampling algorithm of hLDA to approximate the posterior for the SShLDA model.
The goal is to obtain samples from the posterior distribution of the latent tree structure T, the
level allocations z of all words, and the path assignments c of all documents, conditioned on the
observed collection w and the seed word constraint S. In a Gibbs sampler, each latent variable is
iteratively sampled conditioned on the observations and all the other latent variables. Collapsed Gibbs
sampling [Liu 1994] is employed, in which we marginalize out the topic parameters β and the per-document
topic proportions $\theta_d$ to speed up convergence. Therefore, the posterior we need to approximate
is $p(c_{1:D}, z_{1:D} \mid \gamma, m, \pi, \eta, \mu, w_{1:D})$, where γ and η are the hyperparameters of the nCRP and the
topic-word distribution, respectively, {m, π} are the stick-breaking parameters for the topic proportions,
and μ controls the strength of the seed word set constraint. These parameters can be fixed according to
analysis of and prior expectations about the data, as discussed in the Experiments section.
The state of the Markov chain for a single document is illustrated in Figure 6. (The assignments
are taken at the approximate mode of the SShLDA posterior conditioned on search results metadata
collection of query ‘9/11 attack’). For each document, the process of Gibbs sampler is divided into two
steps: resample the per-word level allocations to topics zd,n and resample the per-document paths cd.
Sampling Level Allocations. Given the current path assignments, we need to resample the level
allocation variable $z_{d,n}$ for word n in document d:

$$p(z_{d,n} \mid z_{-(d,n)}, c, w, m, \pi, \eta) \propto p(w_{d,n} \mid z, c, w_{-(d,n)}, \eta)\, p(z_{d,n} \mid z_{d,-n}, m, \pi), \tag{4}$$

where $z_{-(d,n)}$ and $w_{-(d,n)}$ are the vectors of level allocations and observed words leaving out $z_{d,n}$ and
$w_{d,n}$, respectively, and $z_{d,-n}$ denotes the level allocations in document d leaving out $z_{d,n}$. This is the
same notation as in Blei et al. [2010].
The first term in Equation (4) is the probability of a given word based on a possible assignment. In
standard hLDA, it is assumed that the topic parameters β are generated from a symmetric Dirichlet
distribution, thus the frequency of seeing word $w_{d,n}$ allocated to the topic at level $z_{d,n}$ of the path $c_d$ is:

$$p(w_{d,n} \mid z, c, w_{-(d,n)}, \eta) \propto \#[z_{-(d,n)} = z_{d,n},\ c_{z_{d,n}} = c_{d,z_{d,n}},\ w_{-(d,n)} = w_{d,n}] + \eta, \tag{5}$$

where #[·] counts the elements of an array satisfying a given condition.
Let

$$q_{d,n} = \#[z_{-(d,n)} = z_{d,n},\ c_{z_{d,n}} = c_{d,z_{d,n}},\ w_{-(d,n)} = w_{d,n}] + \eta.$$
Fig. 6. A state of the Markov chain in the Gibbs sampler for the title and tag of “Mossad follow up - start asking questions
why this isnt being exposed.” The document is associated with a path $c_d$ through the hierarchy, and each node in the hierarchy
is associated with a distribution over words. Finally, each word $w_{d,n}$ in the title and tag is associated with a level $z_{d,n}$ in the
path $c_d$, with 1 being the root level and 2 being the leaf level. Other words without level allocations are removed as stop words
in preprocessing. As the constrained sampling proceeds, seed words like 911, attack, terrorism, etc. become more and more
likely to be generated from the root topic.
We now incorporate the supervision of the seed word set. We impose a soft constraint by modifying the
Gibbs sampling process so that seed words tend to be generated from the root topic ($z_{d,n} = 1$):

$$\hat{p}(w_{d,n} \mid z, c, w_{-(d,n)}, \eta) \propto q_{d,n} \cdot \mathrm{Constraint}(\mu, z_{d,n}), \tag{6}$$

where $\mathrm{Constraint}(\mu, z_{d,n})$ is defined in Equation (3). Following this sampling process, the words
relevant to the query are guaranteed to have a higher probability of being assigned to the root topic,
leaving the subtopics to focus more on refined terms. We emphasize that SShLDA also accommodates the
case in which the derived vocabulary V does not include some terms of the seed set.
The second term in Equation (4) is a distribution over levels, which arises from the GEM
distribution of the stick-breaking process. We keep it unchanged:

$$p(z_{d,n} = k \mid z_{d,-n}, m, \pi) = \frac{m\pi + \#[z_{d,-n} = k]}{\pi + \#[z_{d,-n} \ge k]} \prod_{j=1}^{k-1} \frac{(1 - m)\pi + \#[z_{d,-n} > j]}{\pi + \#[z_{d,-n} \ge j]}. \tag{7}$$
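The level prior of Equation (7) is easy to evaluate directly. The sketch below is a plain transcription under the assumption that the other level allocations of the document are available as a list of integers; the function name `level_prior` is hypothetical. In a sampler truncated to L levels, the product of this term and the word term of Equation (6) would be normalized over k = 1, ..., L before drawing a level.

```python
def level_prior(k, other_levels, m, pi):
    """Equation (7): p(z_{d,n} = k | z_{d,-n}, m, pi).

    other_levels: level allocations of the other words in the same document.
    """
    def count(pred):
        return sum(1 for z in other_levels if pred(z))

    # Probability of stopping at level k ...
    p = (m * pi + count(lambda z: z == k)) / (pi + count(lambda z: z >= k))
    # ... after passing every level j < k.
    for j in range(1, k):
        p *= ((1 - m) * pi + count(lambda z: z > j)) / (pi + count(lambda z: z >= j))
    return p


# With no other words in the document, the prior is geometric in m.
p1 = level_prior(1, [], m=0.1, pi=10)   # equals m
p2 = level_prior(2, [], m=0.1, pi=10)   # equals m * (1 - m)
```

With an empty document the prior depends only on m, which shows why a small m shifts probability mass away from the root level.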
Sampling Path Assignments. Keeping the level allocation variables z fixed, we resample the path
assignment $c_d$ associated with each document, which can result in the deletion or creation of a branch
in the tree. This step is the same as in standard hLDA [Blei et al. 2010]:

$$p(c_d \mid w, c_{-d}, z, \eta, \gamma) \propto p(c_d \mid c_{-d}, \gamma)\, p(w_d \mid c, w_{-d}, z, \eta), \tag{8}$$

where the first term is the prior on paths implied by the nCRP, and the second one is the probability of
the data given a particular choice of path.
With these conditional distributions, the full Gibbs sampling process is specified. Given the current
state of the sampler, $\{c^{(t)}_{1:D}, z^{(t)}_{1:D}\}$, we iteratively sample each variable conditioned on the rest. After
running for sufficiently many iterations, the chain approaches its stationary distribution, which is the
conditional distribution of the latent variables in the SShLDA model given the corpus and the seed word set.
Table I. Collected Video Sharing Web Sites Dataset Information

ID  Query                      Videos retrieved  Videos collected  Vocabulary  Total words
1   9/11 attack                8,361             791               2140        38747
2   gay rights                 602,885           799               2048        35538
3   abortion                   66,606            797               1770        33144
4   Iraq war invasion          4,425             702               1778        36760
5   Beijing Olympics           202,511           787               1718        32370
6   Israel palestine conflict  252,746           798               1814        38499
7   US president election      36,037            731               1792        33249
6. EXPERIMENTS
Among the different metadata around a video, the title, tags, and description are the most likely to be
informative in revealing its semantic meaning. Other metadata (e.g., comments) could also be mined,
but we leave this for future research. We present two experiments to demonstrate the performance
of the proposed clustering-based video retrieval framework. First, following text search result
clustering, we compare subtopic reach time against state-of-the-art algorithms on a benchmark dataset.
Then we assess the retrieval effectiveness on a Web-scale video dataset collected from video
sharing websites.
6.1 Dataset
Text subtopic retrieval dataset. We utilized a benchmark dataset for evaluating text search result
clustering, AMBIENT (see footnote 2). AMBIENT consists of 44 topics, each with a set of subtopics and a
list of 100 search results with corresponding subtopic relevance judgments. The topics were selected from
the list of ambiguous Wikipedia entries. The 100 search results associated with each topic were collected
from Yahoo, and their relevance to each of the Wikipedia subtopics was manually assessed.
Video sharing Web site dataset. Since the goal of this paper is to present a clustering-based browsing
algorithm for Web video retrieval, it is important to evaluate its performance
on real video sharing websites. After careful examination of the hottest topics on YouTube, Google
Zeitgeist, and Twitter, we selected seven social and political topics as queries. We issued these queries
to YouTube, Metacafe, and Vimeo, and crawled the top 500, 150, and 150 returned videos (when available)
for experiments, respectively. We focused on the topmost search results to avoid bringing in too many
unrelated videos. Videos with no tags were filtered out. The videos collected for each query form a video
set. The queries and information about the corresponding video sets are listed in Table I.
6.2 Parameter Settings
The work in Hindle et al. [2010] (we refer to it as BCS) has a similar motivation, but our work differs from
theirs in several aspects: 1) BCS employs a flat-structure clustering algorithm; 2) BCS uses the cluster
centroid to represent the cluster and provides no mechanism for deriving the cluster labels. Since
this is the work most relevant to ours, we ran their method on our dataset for comparison.
The most important parameters for BCS are the weights of the adopted features: visual, tag, title, and
description. Affinity propagation (AP) and normalized cut (NC) are utilized as the clustering algorithms,
and they demonstrated that AP generally outperforms NC. Therefore, we fixed the set of feature weights
showing the best performance with AP clustering: visual-0.3, tag-0.49, title-0.07, description-0.14.
To further evaluate the advantage of exploiting a hierarchical topic structure, we also implemented
LDA and compared it with hLDA and SShLDA. Topic models make assumptions about the topic
structure through the settings of hyperparameters. We empirically fixed the hyperparameters according
to the prior expectation about the data. The hyperparameter η controls the smoothing/sparsity of the
topic-word distribution. Small η encourages more words to have high probability in each topic. (For LDA,
it requires fewer topics to explain the data. For hLDA and SShLDA, it leads to a small tree with compact
topics.) Motivated by this, we empirically chose a relatively small value, η = 0.5. Both
hLDA and SShLDA have an additional hyperparameter, the CRP parameter γ, which decides the size of
the inferred tree. As in Blei et al. [2004], we set γ = 1 to reduce the likelihood of choosing new paths
when traversing the nCRP.
2 http://credo.fub.it/ambient.
Fig. 7. Average subtopic number error as μ changes.
The Dirichlet prior hyperparameter α for LDA and the GEM parameters m, π for hLDA and SShLDA
control the mixing of the document-topic distribution. For LDA, our goal is to group documents
into topic-specific clusters according to the dominant topic proportions. Therefore, α is fixed to a value
much larger than 1 (α = 50) to encourage high mixing of topics. For hLDA and SShLDA, the GEM
parameters m and π define the stick-breaking distribution. We set m to a small value (m = 0.1), so that
the posterior is more likely to assign more words to the leaf level of the inferred tree. Setting the
variance parameter π to a small value (π = 10) means that the word allocation adheres to the parameter
settings, which accelerates convergence.
For the choice of the supervision strength parameter μ, we divided the AMBIENT dataset into two
subsets: one consisting of 10 topics for determining μ and one consisting of 34 topics for evaluating
the clustering performance. We assume that an appropriate μ does not perturb the hierarchical
topic discovery process, so the derived topic tree should be consistent with the latent hierarchical
structure. Therefore, we analyzed the error between the ground-truth subtopic number and the
derived subtopic number over different values of μ (see Figure 7). μ = 0.5 achieves the least error,
so we fixed μ = 0.5 in the following experiments. We also compared the retrieval
performance with respect to various μ in Section 6.4, and found that the performance for the different
queries shares a similar variation pattern: the results deteriorate as μ approaches 0 or 1, and there is
little difference when μ ∈ [0.3, 0.6]. Therefore, for practical implementations where a training set is not
available, we suggest setting μ to 0.5.
6.3 Experiments on a Text Subtopic Retrieval Dataset

We first performed experiments on AMBIENT. To evaluate the retrieval performance of search result
clustering, we borrowed the metric of subtopic reach time (SRT) [Carpineto et al. 2009], which
models the time taken to locate a relevant document. For each subtopic of a query, the subjects
first select the most appropriate label (or topic representation) created by the clustering algorithm.
The SRT value is then computed by summing the number of clusters and the position of the first
relevant result in the selected cluster. For instance, the SRT for the subtopic Live attack and rescue
video in Figure 2(b) would be 5, given by the number of clusters (4) plus the position (1) of the first
relevant search result (Never before seen Video of WTC 9/11 attack) within the selected cluster.
Table II. Comparison of subtopic reach time of state-of-the-art text search results
clustering with LDA, hLDA and SShLDA on the AMBIENT text collection.

CREDO  Lingo  Lingo3G  STC    TRSC   LDA    hLDA  SShLDA
14.96  15.05  13.11    15.82  17.46  15.73  12.7  10.92
When no appropriate cluster fits the subtopic at hand, or the selected cluster does not contain any
relevant result, SRT is given by the number of clusters plus the position of the first result relevant to
the subtopic in the ranked list.
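The SRT computation, including the fallback case, can be written down in a few lines. In this sketch the selected cluster and the ranked list are represented as lists of booleans marking which results are relevant to the subtopic, positions are 1-based, and the ranked list is assumed to contain at least one relevant result; the function name and this representation are illustrative choices, not part of the original metric definition.

```python
def subtopic_reach_time(num_clusters, selected_cluster, ranked_list):
    """Subtopic reach time for one subtopic.

    selected_cluster: relevance flags of the results in the cluster chosen by
                      the subject, or None when no cluster label fits.
    ranked_list:      relevance flags of the original ranked result list
                      (assumed to contain at least one relevant result).
    """
    def first_relevant(flags):
        for position, relevant in enumerate(flags, start=1):
            if relevant:
                return position
        return None

    position = None
    if selected_cluster is not None:
        position = first_relevant(selected_cluster)
    if position is None:
        # Fallback: no fitting cluster, or the chosen cluster has no relevant result.
        position = first_relevant(ranked_list)
    return num_clusters + position


# The example from the text: 4 clusters, first result in the cluster is relevant.
srt_example = subtopic_reach_time(4, [True, False, False], [False, True])
```

Averaging this value over all subtopics of all test queries gives one figure per algorithm, as in the comparison tables.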
We note that Hindle et al. [2010] use visual features in the clustering process, so it would not be
fair to evaluate their method on the text-based AMBIENT dataset. Therefore, we compared only the results
(averaged over the test set of 34 queries) of LDA, hLDA, and the proposed SShLDA with state-of-the-art
text search result clustering algorithms in Table II (the results of the text search result clustering
algorithms are taken from Carpineto et al. [2009]). Four graduate students participated in the user study
task as subjects. The best performance is achieved by SShLDA, followed by hLDA, which we attribute to the
separation of the shared common topic from the subtopics. It is interesting to note that the SRT for LDA
is relatively high. The topics included in AMBIENT are mostly general terms, e.g., Eos, Cube, B-52, so the
descriptive power of topic models for complex queries cannot be exploited.
6.4 Experiments on Video Sharing Web Sites

Visualization of the discovered subtopics. We visualize the discovered subtopics of the video collections
for the test queries in Figure 13. For the query ‘9/11 attack’, the subtopics derived from LDA and the topic
hierarchies derived from hLDA and SShLDA are presented together for comparison. LDA mixes
common words like attack, 911, September, terrorist into different subtopics and fails to discover
the shared topic. The topic hierarchy recovered by hLDA finds the shared topic at the root level.
However, without a constraint on the topic distribution over the seed word set, words describing the
shared topic, for instance, wtc, terrorist, 11, attack, also appear in subtopics. This contaminates the
subtopics and limits their power for detecting subevents or viewpoints. By incorporating the supervision
information, SShLDA prevents seed words from being generated from the subtopics, and results in a topic
hierarchy with concise subtopics focusing on the refined themes.
Comparing different clustering methods. For evaluation, human assessors created ground-truth
subtopic themes after browsing the retrieved videos of each query-corresponding video set. For example,
the subtopic themes inside the video collection derived from the query abortion are summarized
as pro-abortion, anti-abortion, and neutral. Videos are manually labeled as belonging to a certain
subtopic (cluster). The ground-truth subtopic number and the subtopic (cluster) numbers derived by BCS,
hLDA, and SShLDA for the test queries are shown in Figure 8 (left). We can see that all three models
fail to recover the ground-truth subtopic number for some video sets. The reason is that the
ground-truth subtopic themes created by subjective assessment may not reflect the nature of the video set,
especially when unrelated noisy videos are involved. We also notice that SShLDA and hLDA perform
better than BCS, whose curve lies far above the ground truth. This is due to its near-duplicate-style
clustering mechanism, which results in small clusters of duplicate videos.
We first compare the SRT of LDA, hLDA, and SShLDA on the collected video dataset in Figure 8
(right). The result is consistent with the experimental result on the AMBIENT dataset: SShLDA and
hLDA achieve lower SRT than LDA.
In addition to SRT, which aims to assess the retrieval performance, we use four criteria to quantify
the clustering quality: purity [Tan et al. 2005], F measure, cluster description readability, and
computational efficiency. Figure 9 (left) shows the cluster purity for BCS, LDA, hLDA, and SShLDA. We find
that BCS noticeably outperforms the other algorithms. However, high purity is easy to achieve when the
number of clusters is large, so purity cannot be used to trade off the quality of the clustering against
the number of clusters. A measure that makes this trade-off is the F measure [Steinbach et al. 2000]. We
penalize false negatives and false positives evenly, i.e., we use the F1 measure (Figure 9 (right)). BCS
performs poorly on the F1 measure, even much worse than LDA. The reason is that BCS focuses
on clustering duplicate or near-duplicate videos, which limits the cluster size and forces a considerable
number of semantically similar videos to be assigned to different clusters.
Fig. 8. (Left) The ground-truth subtopic number and automatically derived cluster number for test queries. (Right) Subtopic
reach time (SRT) for test queries.
Fig. 9. (Left) Purity rates. (Right) F1 measure for test queries.
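The contrast between purity and the F1 measure can be reproduced on toy data. The sketch below computes purity in the usual way and, as an assumption, uses the pair-counting formulation of F1 (a pair of videos is a true positive when it shares both cluster and ground-truth subtopic); the exact F measure variant used in the experiments may differ. All names and data are illustrative.

```python
from collections import Counter
from itertools import combinations


def purity(clusters, labels):
    """Fraction of videos that belong to the majority subtopic of their cluster."""
    per_cluster = {}
    for c, l in zip(clusters, labels):
        per_cluster.setdefault(c, Counter())[l] += 1
    return sum(max(cnt.values()) for cnt in per_cluster.values()) / len(labels)


def pairwise_f1(clusters, labels):
    """Pair-counting F1 (assumed variant): FP and FN are penalized evenly."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_label = labels[i] == labels[j]
        if same_cluster and same_label:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_label:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


truth = ["a", "a", "a", "b", "b", "b"]
singletons = [0, 1, 2, 3, 4, 5]      # one cluster per video
coarse = [0, 0, 1, 1, 1, 1]          # two clusters, one video misplaced

purity_singletons = purity(singletons, truth)   # perfect purity
f1_singletons = pairwise_f1(singletons, truth)  # but no pair is recovered
```

The singleton clustering reaches perfect purity while recovering no same-subtopic pair, which mirrors the behaviour reported for a method that produces many small clusters.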
The quality of the cluster description is crucial to the usability of clustering-based video retrieval. If a cluster cannot be described, it is presumably of no value to the user. BCS uses the cluster centroid as the cluster representation, which provides no real description and is of little use in helping the user understand the cluster content and locate videos of interest. The cluster description readability is evaluated as follows. Each cluster's corresponding subtopic, characterized by its 5 most probable words, was shown to the participants together with the 3 top-ranked videos in this subtopic. The participants were asked to evaluate the cluster description readability in two respects: “whether the topic description itself is sensible, comprehensive and compact” (question 1) and “whether the topic description is consistent with the representative videos” (question 2). For each question, participants gave a rating from 1 to 5, where 5 is best. The average ratings are shown in Figure 10. The proposed SShLDA is superior at generating meaningful cluster descriptions, especially sensible, comprehensive and compact representations (question 1). We note that the ratings for query 5, Beijing Olympics, are relatively low. The retrieved video set for Beijing Olympics involves diverse events or subtopics, for instance the opening ceremony, game videos, athlete interviews, the torch relay, etc. The discovered topic structure is sparse and less meaningful. In addition, some unrelated videos concerning the Tibet issue are included.
Fig. 10. Mean ratings of cluster description readability for (left) Question 1 and (right) Question 2.
Table III. Average Time Cost of Different Clustering Algorithms
               BCS   LDA   hLDA   SShLDA1   SShLDA2
Time Cost (s)  0.7   3.5   6.8    4.7       6.1
Fig. 11. Mean rating score of Youtube and our method.
Fig. 12. Subtopic reach time as strength parameter μ changes.
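The cluster descriptions shown to participants (the 5 most probable words of a subtopic plus its 3 top-ranked videos) can be assembled along the following lines. This is only a sketch: it assumes each topic is available as a word-to-probability mapping and each video has a per-topic proportion, and the names `describe_topic`, `top_videos` and all example values are hypothetical.

```python
def describe_topic(word_probs, top_n=5):
    """Return the top_n most probable words of a topic, most probable first.

    Ties are broken alphabetically so that the output is deterministic.
    """
    ranked = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def top_videos(topic_proportions, top_n=3):
    """Return the ids of the top_n videos with the largest topic proportion."""
    ranked = sorted(topic_proportions.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video_id for video_id, _ in ranked[:top_n]]


# Hypothetical topic and video weights, not values from the experiments.
topic = {"ceremony": 0.21, "opening": 0.19, "stadium": 0.08,
         "fireworks": 0.07, "torch": 0.05, "medal": 0.02}
videos = {"v1": 0.9, "v2": 0.4, "v3": 0.75, "v4": 0.6}

print(describe_topic(topic))
print(top_videos(videos))
```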
For clustering-based video retrieval, clustering is performed online, which requires a short response time. We focus on the efficiency of the clustering algorithms and do not consider the video acquisition time. We assume that the visual features used in BCS are extracted offline, and we do not count the text preprocessing time. Table III reports the average time cost of the clustering algorithms (SShLDA1 denotes the clustering time only; SShLDA2 also includes the query expansion time using a locally stored WordNet). Since BCS uses AP for clustering, it achieves a lower time cost than the generative topic models. SShLDA is faster than hLDA because the incorporated prior guides the seed words to be generated gradually from the root set and thus speeds up convergence. We noticed that the computational cost increases dramatically when dealing with large-scale web videos; we will address this in future work.
Fig. 13. Discovered subtopics from the video collection of seven queries from Youtube. (a) 9/11 attack, comparison between LDA, hLDA and SShLDA. For SShLDA, we also present 2 video examples having the largest proportion associated with the topics; (b) gay rights; (c) abortion; (d) Iraq war invasion; (e) Beijing Olympics; (f) Israeli Palestine conflicts; (g) US president election.
Clustering versus ranked lists. To compare the proposed clustering-based video retrieval with existing video search engines such as YouTube, we designed a specific task. The participant is asked to act as a news editor who wants to introduce a hot event or topic to users from all sides and must search for 10 Web videos for this purpose. Participants complete the task with both YouTube and the proposed clustering-based interface, in random order. After the task, participants rate each system by selecting one of four options: very satisfied (4), somewhat satisfied (3), unsatisfied (2) and very unsatisfied (1). The average ratings are shown in Figure 11. For five out of seven test queries, participants prefer the proposed clustering-based method to the ranked-list-based search engine.
Clustering performance with respect to strength parameter μ. To analyze the influence of the strength parameter μ on the clustering performance, we evaluated the SRT while varying μ over [0, 1] in steps of 0.1. From the results in Figure 12, we make three observations: 1) As μ changes from 0 to 1, the retrieval performance varies similarly across queries, with query 1 behaving slightly differently. A rough conclusion is that different datasets share a common pattern with respect to the choice of μ. 2) The results deteriorate dramatically when μ = 1, which supports our assumption that a hard constraint is not practical. 3) While the results deteriorate as μ approaches 0 or 1, there is little difference for μ ∈ [0.3, 0.6]. This indicates that incorporating prior knowledge is effective and that our algorithm does not depend heavily on the choice of μ. The discovered subtopics are shown in Figure 13.
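The sweep over μ described above can be sketched as follows. The sketch assumes that a lower SRT is better and that some evaluation routine returns the SRT for a given μ; `sweep_strength`, `stable_range` and the toy curve `toy_srt` are hypothetical stand-ins and do not reproduce the measurements in Figure 12.

```python
def sweep_strength(evaluate_srt, steps=10):
    """Evaluate the score for mu = 0.0, 0.1, ..., 1.0 and return {mu: score}."""
    return {i / steps: evaluate_srt(i / steps) for i in range(steps + 1)}


def stable_range(scores, tolerance):
    """Return the sorted mu values whose score lies within `tolerance` of the
    best (lowest) score, i.e. the region where the choice of mu matters little."""
    best = min(scores.values())
    return sorted(mu for mu, score in scores.items() if score - best <= tolerance)


def toy_srt(mu):
    """Toy stand-in for a real SRT evaluation; NOT experimental data."""
    return 10.0 + 40.0 * (mu - 0.5) ** 2


scores = sweep_strength(toy_srt)
print(stable_range(scores, tolerance=1.0))
```

Reporting a plateau of near-optimal values rather than a single best μ mirrors the third observation: what matters is that performance is insensitive to μ over a reasonably wide interval.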
7. CONCLUSIONS
In this article, we have presented a framework for clustering-based web video retrieval built on a hierarchical topic model. Instead of showing a long ranked list of videos, we explore the hierarchical topic structure in the retrieved video collection and present users with videos organized into semantic clusters. Experiments demonstrate the effectiveness of the proposed method.
In the future, we will improve our current work along three directions. 1) Unrelated videos in the retrieved video collections affect the clustering performance. We will develop a noisy-subtopic-aware hierarchical topic model to reduce the influence of noise and to remove unrelated videos. 2) Some summary videos cover several aspects of the query-related topic; for instance, an introductory video may describe the 3 main viewpoints on the issue of abortion: pro-life, pro-choice and neutral. In this case, the video cannot be assigned to any single subtopic. SShLDA needs to be extended to a multipath assignment version, in which each document exhibits multiple paths through the tree and the topic depth L can vary from document to document. 3) So far our experiments have been based on textual analysis and use no visual information. Web videos carry rich visual content, which provides important clues for video clustering and should not be ignored. We are now working towards incorporating visual information into the hierarchical topic modeling framework.
REFERENCES
BISHOP, C. M. 2006. Pattern Recognition and Machine Learning. Springer.
BLEI, D., NG, A., AND JORDAN, M. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022.
BLEI, D. M., GRIFFITHS, T. L., AND JORDAN, M. I. 2010. The nested Chinese restaurant process and Bayesian nonparametric
inference of topic hierarchies. J. ACM 57, 2, 1–30.
BLEI, D. M., GRIFFITHS, T. L., JORDAN, M. I., AND TENENBAUM, J. 2004. Hierarchical topic models and the nested Chinese restaurant
process. In Advances in Neural Information Processing Systems. MIT Press, 17–24.
CAI, D., HE, X., LI, Z., MA, W. Y., AND WEN, J. R. 2004. Hierarchical clustering of WWW image search results using visual, textual
and link information. In Proceedings of the ACM Multimedia Conference (MM). 952–959.
CAO, J., NGO, C.-W., ZHANG, Y.-D., ZHANG, D.-M., AND MA, L. 2010. Trajectory-based visualization of web video topics. In Proceed-
ings of the ACM Multimedia Conference (MM). 1639–1642.
CARPINETO, C., OSINSKI, S., ROMANO, G., AND WEISS, D. 2009. A survey of web clustering engines. ACM Comput. Surv. 41, 3, 1–38.
CHANDRAMOULI, K., KLIEGR, T., NEMRAVA, J., SVATEK, V., AND IZQUIERDO, E. 2008. Query refinement and user relevance feedback
for contextualized image retrieval. In Visual Information Engineering, Xian, China, 452–458.
CHEUNG, S. S. AND ZAKHOR, A. 2004. Fast similarity search and clustering of video sequences on the world-wide-web. IEEE
Trans. Multimedia 7, 3, 524–537.
CUTTING, D. R., PEDERSEN, J. O., KARGER, D. R., AND TUKEY, J. W. 1992. Scatter/gather: a cluster-based approach to browsing
large document collections. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development
in Information Retrieval (SIGIR). 318–329.
DRUCK, G., MANN, G., AND MCCALLUM, A. 2008. Learning from labeled features using generalized expectation criteria. In Pro-
ceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).
595–602.
GONG, Z., CHEANG, C. W., AND U, L. H. 2005. Web query expansion by WordNet. In Proceedings of the International Conference on
Database and Expert Systems Applications (DEXA). Springer-Verlag, 166–175.
HINDLE, A., SHAO, J., LIN, D., LU, J., AND ZHANG, R. 2010. Clustering web video search results based on integration of multiple
features. In Proceedings of the International World Wide Web Conference (WWW), 1–21.
JING, F., WANG, C., YAO, Y., DENG, K., ZHANG, L., AND MA, W. Y. 2006. Igroup: web image search results clustering. In Proceedings
of the ACM Multimedia Conference (MM). 377–384.
KUMMAMURU, K., LOTIKAR, R., AND ETZIONI, O. 1998. Web document clustering: A feasibility demonstration. In Proceedings of the
21st International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 46–54.
LIU, J. 1994. The collapsed Gibbs sampler in Bayesian computations with application to a gene regulation problem. J. Amer.
Stat. Assoc. 89, 958–966.
LIU, L., RUI, Y., SUN, L.-F., YANG, B., ZHANG, J., AND YANG, S.-Q. 2008b. Topic mining on web-shared videos. In Proceedings of the
International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2145–2148.
LIU, L., SUN, L.-F., RUI, Y., SHI, Y., AND YANG, S.-Q. 2008a. Web video topic discovery and tracking via bipartite graph reinforce-
ment model. In Proceedings of the International World Wide Web Conference (WWW). 1009–1018.
MILLER, G. A., BECKWITH, R., FELLBAUM, C., GROSS, D., AND MILLER, K. 1990. Introduction to WordNet: An On-line Lexical Database.
Vol. 3. Oxford University Press.
RAMACHANDRAN, C., MALIK, R., JIN, X., GAO, J., AND HAN, J. 2009. Videomule: a consensus learning approach to multi-label
classification from noisy user-generated videos. In Proceedings of the ACM Multimedia Conference (MM).
STEINBACH, M., KARYPIS, G., AND KUMAR, V. 2000. A comparison of document clustering techniques. In Proceedings of the ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. 35–42.
TAN, P., STEINBACH, M., AND KUMAR, V. 2005. Introduction to Data Mining. Vol. 19. Addison Wesley.
TEH, Y. W., JORDAN, M. I., BEAL, M. J., AND BLEI, D. M. 2006. Hierarchical Dirichlet processes. J. Amer. Stat. Assoc. 101, 476,
1566–1581.
WU, X., HAUPTMANN, A. G., AND NGO, C.-W. 2007. Practical elimination of near-duplicates from web video search. In Proceedings
of the ACM Multimedia Conference (MM). 218–227.
YUAN, J., LUO, J., AND WU, Y. 2010. Mining compositional features from GPS and visual cues for event recognition in photo
collections. IEEE Trans. Multimedia 12, 7, 705–716.
YUAN, J., MENG, J., WU, Y., AND LUO, J. 2008. Mining recurring events through forest growing. IEEE Trans. Circuits Syst. Video
Techn. 18, 11, 1597–1607.
ZAMIR, O. AND ETZIONI, O. 1998. Web document clustering: A feasibility demonstration. In Proceedings of the 21st International
ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 46–54.
Received September 2010; revised March 2011; accepted July 2011
