
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 47, NO. 4, APRIL 2017
Digital Object Identifier 10.1109/TSMC.2017.2679694

Guest Editorial
Introduction to the Special Issue on Large-Scale
Video Analytics for Enhanced Security:
Algorithms and Systems

DUE TO the rapid increase in the number of cameras used in video surveillance and the huge needs of smart cities and public security, video surveillance by human beings alone is no longer feasible. Hence, since the end of the last century, video analytics for security, or visual surveillance, has become one of the hottest research topics. Wide-area video surveillance systems can have extremely high data rates and high data volumes. Therefore, the challenge of video analytics is to extract meaningful information efficiently from the huge flow of video data in order to produce high-level semantic descriptions of the activities occurring in the area under surveillance.
Video analytics is also an interdisciplinary research area that covers abundant research interests in computer vision, signal processing, and machine learning, as well as diverse applications. Its research subtopics include tracking, object detection, object classification, behavior recognition, pose estimation, and semantic scene modeling, among others. Its applications include public safety, intelligent transportation, smart retail, crowd management, and the maintenance of secure sites. Video analytics for enhanced security can provide a cost-effective way of monitoring wide areas and capturing abundant semantic information within them.
There has been great progress in large-scale video analytics over the past decades. For example, deep neural networks have brought substantial performance gains on various vision tasks, such as object recognition and action recognition. However, many challenges have not yet been solved well in real surveillance scenes, such as robustness, adaptability, and scalability. More advanced solutions are thus needed to meet emerging visual surveillance applications in the real world.
This special issue provides a forum to present both state-of-the-art methodologies and applications of advanced developments in large-scale video analytics for enhanced security, and to discuss future research directions and application issues in visual surveillance.
The 11 accepted papers span a variety of basic and hot topics in large-scale video analytics for enhanced security, such as object tracking, action recognition, pedestrian counting, image semantic segmentation, baggage inspection, abnormal event detection, and so on.
Besides research at the algorithm level, work at the level of system engineering is also very important, especially in the era of Big Data. A surveillance system needs the capability to handle a large amount of raw video data from large-scale monitoring and to discover meaningful high-level events from the abundant changes or objects detected by lower-level processing. On this issue, the first paper, by Fan, Wang, and Huang, presents a wide-area monitoring system that handles the challenges of large-scale and heterogeneous data at the sensor, algorithm integration, and event visualization levels. A Web-based smart interface is proposed to visualize the high-level events detected from a diversity of multimodal information. Within a condensed interface, the system can show a wealth of information, not only 2-D, 3-D, and geographical data but also important spontaneous alerts and events, which helps security personnel maintain event awareness in wide-area monitoring.
Object tracking is a basic and persistently hot research topic in video analytics. The second paper, by Ding, Yu, Oh, and Chen, proposes an environment-dependent vehicle dynamic modeling approach that considers interactions between the noisy control input of a dynamic model and the environment in order to make the best use of domain knowledge. Based on this modeling, a new domain knowledge-aided moving horizon estimation (DMHE) method is put forward for ground moving target tracking. The proposed method incorporates different types of domain knowledge into the estimation process, considering both environmental physical constraints and interaction behaviors between targets and the environment. Furthermore, to deal with the data association ambiguity of multiple target tracking in a cluttered environment, the DMHE is combined with a multiple hypothesis tracking structure. Numerical simulation results show that the proposed DMHE-based method and its extension achieve better performance than traditional tracking methods that use no domain knowledge or only simple physical constraint information.
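The DMHE formulation itself is developed in the paper; purely as a rough, self-contained sketch of the general idea of constrained moving horizon estimation (not the authors' model), the following Python snippet fits a constant-velocity trajectory to a sliding window of noisy position measurements while enforcing a simple physical constraint, namely that the target stays inside a known corridor. The corridor bounds, noise level, and window length are arbitrary assumptions.

# Minimal sketch of constrained moving horizon estimation (MHE) for 1-D
# ground-target tracking. NOT the DMHE method from the paper, only an
# illustration of estimating a trajectory over a sliding window while
# enforcing a physical (road-corridor) constraint. All numbers are made up.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
dt, horizon = 1.0, 8                     # sample period and window length
true_v = 0.9                             # true (unknown) velocity
truth = 1.0 + true_v * dt * np.arange(horizon)
meas = truth + rng.normal(scale=0.5, size=horizon)   # noisy position measurements
corridor = (0.0, 10.0)                   # assumed physical constraint: 0 <= x <= 10

def cost(theta):
    """Sum of squared residuals of a constant-velocity model over the window."""
    p0, v = theta
    pred = p0 + v * dt * np.arange(horizon)
    return np.sum((pred - meas) ** 2)

# Constraint: every predicted position in the window stays inside the corridor.
cons = [
    {"type": "ineq", "fun": lambda th: (th[0] + th[1] * dt * np.arange(horizon)) - corridor[0]},
    {"type": "ineq", "fun": lambda th: corridor[1] - (th[0] + th[1] * dt * np.arange(horizon))},
]
res = minimize(cost, x0=np.array([meas[0], 0.0]), method="SLSQP", constraints=cons)
p0_hat, v_hat = res.x
print("estimated start %.2f, velocity %.2f (true 1.00, %.2f)" % (p0_hat, v_hat, true_v))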

The growing popularity of Kinect and inertial sensors poses new problems for human action recognition. Guo et al. present a multiview learning based approach to fuse the multiple features extracted from multimodal sensors. By encoding the complementary multiview features into a unified space, a new unsupervised feature fusion framework, termed Multiview Cauchy Estimator Feature Embedding (MCEFE), is proposed for human action recognition. To further enhance robustness to outliers, the Cauchy estimator is adopted for the data reconstruction. Moreover, ensemble manifold regularization is enforced on the projection matrices to avoid overfitting. Extensive experiments on a new multimodal human action dataset, CAS-YNU-MHAD, demonstrate the effectiveness of the MCEFE method.
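The MCEFE objective is defined in the paper; the toy snippet below only illustrates the property the authors exploit, namely that a Cauchy loss grows logarithmically with the residual and is therefore far less sensitive to outliers than a squared loss. The residual values and the scale parameter c are arbitrary assumptions.

# Why a Cauchy estimator helps with outliers: its loss grows logarithmically
# with the residual, so a single corrupted sample cannot dominate the fit
# the way it does under a squared loss. Purely illustrative; c is arbitrary.
import numpy as np

residuals = np.array([0.1, 0.2, -0.3, 0.15, 8.0])   # last entry is an outlier
c = 1.0                                              # assumed Cauchy scale

squared_loss = residuals ** 2
cauchy_loss = np.log1p((residuals / c) ** 2)

print("squared loss per sample:", np.round(squared_loss, 3))
print("cauchy  loss per sample:", np.round(cauchy_loss, 3))
# The outlier contributes 64.0 to the squared loss but only about 4.2 to the
# Cauchy loss, so the learned embedding is far less distorted by it.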
To address the complex and inefficient optimization of sparse-representation-based methods in visual tracking, the fourth paper, by Cheng et al., proposes a temporal consistency dictionary learning tracking algorithm that enables efficient dictionary learning and tracking execution. First, they present an objective function that introduces a fixed dictionary and a variance dictionary to reconstruct the object's appearance. In particular, the proposed method takes temporal consistency into account by adding a regularization term to the objective function that constrains the difference of the object's appearance at adjacent frames; the optimization problem is then solved iteratively. Second, an effective observation likelihood function is developed based on the proposed model. Finally, an appearance updating strategy is presented to adapt to the object's appearance variations through online dictionary learning. Experimental evaluation on the TB50 and TB100 datasets shows that the proposed tracking method outperforms sparse-representation-related visual trackers as well as other state-of-the-art tracking methods.
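The exact objective is given in the paper; purely to fix ideas, one generic form of such a temporally regularized sparse-coding objective (the notation and weights here are ours, not the authors') is

\min_{\mathbf{c}_t} \; \big\lVert \mathbf{y}_t - [\mathbf{D}_f, \mathbf{D}_v]\,\mathbf{c}_t \big\rVert_2^2 \;+\; \lambda_1 \lVert \mathbf{c}_t \rVert_1 \;+\; \lambda_2 \big\lVert \mathbf{c}_t - \mathbf{c}_{t-1} \big\rVert_2^2,

where y_t is the observed target appearance at frame t, D_f and D_v denote the fixed and variance dictionaries, and the last term enforces temporal consistency by penalizing large changes of the representation coefficients between adjacent frames.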
As few existing global regression frameworks are concerned with mitigating the target drift caused by imbalanced data distributions, the fifth contribution, from Chen and Zhang, proposes a novel counting-by-regression framework that improves robustness against an inconsistent feature-target relationship based on learning with privileged information. The concept of back propagation is used to consider how to select more informative samples that contribute to robust fitting performance. Moreover, the direction of target drift along the continuously changing target dimension is discovered by learning local classifiers under different pedestrian density situations. Experimental evaluation on the public UCSD and Shopping Mall benchmarks verifies that the proposed method significantly beats state-of-the-art counting-by-regression frameworks.
In large-scale vision learning, reducing the cost of labeling large amounts of training data is an essential problem, and weakly supervised learning methods are specifically desirable in such cases. The sixth paper, by Li, Guo, Kao, and He, presents an interesting weakly supervised approach to image semantic segmentation. They propose a conditional random field (CRF)-based framework to infer a predefined category label for each pixel in an image. First, similar image pixels are merged into superpixels, i.e., a larger-piece-based image representation. Then, a piece library is constructed from pieces of all the training images, where a CRF is adopted to associate certain pieces with appropriate semantic labels. In the test phase, the superpixels of test images are compared with the image pieces in the library and assigned their labels. Experiments are performed on large-scale image datasets, including the PASCAL VOC 2007, MSRC-21, and VOC 2012 databases. The results show that outstanding or comparable performance can be achieved by the proposed method.
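The CRF inference is beyond a short snippet; the sketch below only illustrates the matching step in which superpixels of a test image are compared with pieces in the library and inherit their labels, using plain nearest-neighbor distance on precomputed feature vectors. The features, library size, and label set are synthetic assumptions, not the authors' data.

# Sketch of the piece-library matching step only (the CRF that associates
# library pieces with semantic labels during training is omitted). Each
# superpixel and each library piece is represented by a feature vector;
# a test superpixel inherits the label of its nearest library piece.
# All data here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
library_feats = rng.normal(size=(500, 64))        # features of library pieces
library_labels = rng.integers(0, 21, size=500)    # e.g., 21 PASCAL VOC classes

test_superpixels = rng.normal(size=(30, 64))      # features of test-image superpixels

# Squared Euclidean distance between every superpixel and every library piece.
d2 = ((test_superpixels[:, None, :] - library_feats[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)
predicted_labels = library_labels[nearest]        # label transferred per superpixel
print(predicted_labels[:10])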
The seventh paper, by Zhang, Gao, Chen, Luo, and Sang, presents a midlevel feature-learning method for human action recognition in videos. Different from previous work on semantic part mining in human actions, the proposed method adopts a saliency-driven max-pooling scheme to represent a video, so that space-time invariance can be learned. Furthermore, a group sparse classifier model is developed to exploit the relations between different part detectors and thereby select discriminative parts. They also conduct feature selection based on the entry magnitudes of the model coefficients to further enhance the discriminative ability of the representation. Extensive experiments are performed on four challenging datasets: KTH, Olympic Sports, UCF50, and HMDB51. The results show that the proposed method significantly outperforms the state-of-the-art methods.
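As a rough illustration of saliency-driven max-pooling (a toy version, not the authors' exact scheme), the snippet below modulates per-location part-detector responses with a saliency map and then max-pools over space and time, so the pooled video descriptor is dominated by responses in salient regions. Shapes and values are arbitrary assumptions.

# Toy saliency-driven max-pooling: detector responses are modulated by a
# saliency map before spatio-temporal max-pooling, so background responses
# are suppressed. Shapes and values are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(2)
T, H, W, P = 16, 12, 12, 8                  # frames, height, width, part detectors
responses = rng.random((T, H, W, P))        # per-location part-detector scores
saliency = rng.random((T, H, W))            # per-location saliency in [0, 1]

weighted = responses * saliency[..., None]  # emphasize salient locations
video_descriptor = weighted.reshape(-1, P).max(axis=0)   # max-pool over space-time
print(video_descriptor.shape)               # one P-dimensional descriptor per video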
The eighth paper, by Li et al., presents a grayscale-thermal object tracking method in a Bayesian filtering framework based on multitask Laplacian sparse representation. First, they extract a set of overlapping local patches within each blob and then pursue a multitask joint sparse representation for the grayscale and thermal modalities. Next, the representation coefficients of the two modalities are concatenated into a vector to represent the feature of the blob. Finally, the similarity between each patch pair is used to refine the representation coefficients in the sparse representation, which can be formulated as a Laplacian sparse representation. They also incorporate modal reliability into the Laplacian sparse representation to achieve an adaptive fusion of the different source data.
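The multitask Laplacian refinement is more involved than a short snippet allows; the sketch below shows only the simpler backbone described above: sparse coding a patch independently against a grayscale dictionary and a thermal dictionary (scikit-learn's Lasso is used here as a generic L1 solver, which is our substitution) and concatenating the two coefficient vectors into one blob feature. Dictionaries and patches are random placeholders.

# Sketch of the per-modality sparse coding + concatenation step (the
# Laplacian refinement and reliability weighting from the paper are
# omitted). Lasso is used as a generic L1 solver; data are random.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
D_gray = rng.normal(size=(100, 40))     # grayscale dictionary (100-dim atoms, 40 atoms)
D_therm = rng.normal(size=(100, 40))    # thermal dictionary
patch_gray = rng.normal(size=100)       # grayscale appearance of one local patch
patch_therm = rng.normal(size=100)      # thermal appearance of the same patch

def sparse_code(D, y, alpha=0.1):
    """L1-regularized coefficients of y over dictionary D."""
    return Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(D, y).coef_

c_gray = sparse_code(D_gray, patch_gray)
c_therm = sparse_code(D_therm, patch_therm)
blob_feature = np.concatenate([c_gray, c_therm])   # fused grayscale-thermal feature
print(blob_feature.shape)                           # (80,)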
X-ray images provide another important sensing modality for some surveillance applications. Building on computer vision techniques proposed in recent years, Mery et al. develop an X-ray screening system for baggage inspection at security checkpoints. They tested ten classical image classification algorithms, including bag of words, sparse representations, and deep learning, for the specific task of recognizing three threat objects, i.e., handguns, shuriken, and razor blades, in X-ray images. To evaluate these methods, they performed extensive experiments on the same database, and, to make the comparisons fair, a common experimental protocol based on training, validation, and testing data is also proposed. In their experiments, the recognition of the three threat objects is tested; the high recognition performance (more than 95% accuracy) achieved by visual vocabularies and deep features illustrates a strong potential to develop an automated system that aids the human inspection task using these computer vision algorithms.
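None of the ten evaluated algorithms is reproduced here; purely as a hedged sketch of one of the simpler families tested, a bag-of-visual-words classifier, the code below quantizes local descriptors with k-means, represents each image as a histogram of visual words, and trains a linear SVM. The descriptors are random stand-ins for real X-ray patches, and the vocabulary size, image count, and labels are arbitrary assumptions.

# Sketch of a bag-of-visual-words classifier, one of the classical families
# evaluated for X-ray threat recognition. Real local descriptors (e.g., patches
# from X-ray images) are replaced by random vectors; the vocabulary size,
# number of images, and labels are arbitrary assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
n_images, descr_per_image, descr_dim, vocab_size = 60, 50, 32, 16

# Synthetic local descriptors per training image and a binary label
# (e.g., "contains a handgun" vs. "benign").
descriptors = [rng.normal(size=(descr_per_image, descr_dim)) for _ in range(n_images)]
labels = rng.integers(0, 2, size=n_images)

# 1) Build the visual vocabulary by clustering all descriptors.
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors))

# 2) Encode each image as a normalized histogram of visual-word assignments.
def bow_histogram(d):
    words = kmeans.predict(d)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X = np.array([bow_histogram(d) for d in descriptors])

# 3) Train a linear SVM on the histograms.
clf = LinearSVC(C=1.0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))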
The tenth contribution, by Zhang et al., also focuses on object tracking. They propose an output constraint transfer (OCT) method that models the distribution of the correlation response in a Bayesian optimization framework and is thereby able to mitigate the drifting problem of the kernelized correlation filter (KCF) tracker. OCT builds upon the reasonable assumption that the correlation response to the target image follows a Gaussian distribution. OCT is rooted in a theory that transfers the data distribution to a constraint on the optimized variable, leading to an efficient framework for calculating correlation filters. Extensive experiments on a commonly used tracking benchmark show that the proposed method significantly improves KCF and achieves better performance than other state-of-the-art trackers.
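OCT's derivation is given in the paper; to make the underlying object concrete, the snippet below builds a plain single-channel correlation filter in the Fourier domain (a MOSSE-style filter, not the kernelized tracker or the OCT constraint itself) and evaluates its response map on a new patch, which is the quantity whose distribution OCT models as Gaussian. The patch contents and the regularization weight are arbitrary assumptions.

# A plain single-channel correlation filter (MOSSE-style), shown only to make
# the "correlation response" that OCT constrains concrete. This is NOT the
# kernelized (KCF) tracker nor the OCT method itself; data and the
# regularization weight are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(5)
H, W, lam = 32, 32, 1e-2

x = rng.normal(size=(H, W))                    # training patch centered on the target
yy, xx = np.mgrid[0:H, 0:W]
g = np.exp(-((yy - H // 2) ** 2 + (xx - W // 2) ** 2) / (2 * 2.0 ** 2))  # desired Gaussian output

X, G = np.fft.fft2(x), np.fft.fft2(g)
Hconj = (G * np.conj(X)) / (X * np.conj(X) + lam)   # closed-form filter in the Fourier domain

z = x + 0.1 * rng.normal(size=(H, W))          # new patch (target slightly perturbed)
response = np.real(np.fft.ifft2(Hconj * np.fft.fft2(z)))

print("peak at", np.unravel_index(response.argmax(), response.shape))
print("response mean %.3f, std %.3f" % (response.mean(), response.std()))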

The last contribution, by Yu, Liu, and Sun, focuses on abnormal event detection in videos, where a low-rank-based sparse reconstruction method is presented to detect abnormal events. First, a dictionary of normal videos is learned, based on the low-rank property, from multiscale 3-D gradient features, which represent important characteristics of normal behavior patterns. Second, a sparsity algorithm is adopted to select the relevant dictionary bases adaptively, which can interpret the corresponding dynamic scene semantics efficiently. Finally, based on the idea that normal behavior patterns should be assigned a lower cost to reconstruct testing samples, the weighted sparse reconstruction method can be used to detect abnormal events effectively. In the experiments, the proposed method is evaluated on public datasets in comparison with state-of-the-art methods, and comparable performance is achieved.
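As a much-simplified, hedged illustration of the detection principle (normal behavior reconstructs cheaply from a dictionary of normal patterns, abnormal behavior does not), the sketch below sparse-codes test feature vectors against a dictionary built only from normal data using orthogonal matching pursuit, and flags samples whose reconstruction error exceeds a threshold. The low-rank dictionary learning and weighting scheme of the paper are not reproduced; the data, sparsity level, and threshold are arbitrary assumptions.

# Simplified abnormal-event scoring: reconstruct a test feature from a
# dictionary built on normal data; a large residual suggests an abnormal
# event. Orthogonal matching pursuit stands in for the paper's weighted
# sparse reconstruction; all data and thresholds are synthetic assumptions.
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(6)
dim, n_atoms = 64, 128

# Dictionary of normal patterns (columns are atoms), standing in for a
# dictionary learned from multiscale 3-D gradient features of normal videos.
D = rng.normal(size=(dim, n_atoms))
D /= np.linalg.norm(D, axis=0)

# A "normal" test sample is a sparse combination of dictionary atoms plus
# a little noise; an "abnormal" one is unrelated to the dictionary.
coef_true = np.zeros(n_atoms)
coef_true[rng.choice(n_atoms, size=5, replace=False)] = rng.normal(size=5)
normal_test = D @ coef_true + 0.01 * rng.normal(size=dim)
abnormal_test = rng.normal(size=dim)

def reconstruction_error(y, n_nonzero=10):
    coef = orthogonal_mp(D, y, n_nonzero_coefs=n_nonzero)
    return np.linalg.norm(y - D @ coef)

threshold = 0.5                                  # arbitrary decision threshold
for name, y in [("normal", normal_test), ("abnormal", abnormal_test)]:
    err = reconstruction_error(y)
    print(f"{name}: error {err:.3f} -> {'abnormal' if err > threshold else 'normal'}")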
The papers in this special issue are thought to be representative of the state of the art and to give excellent insight into the basic and hot problems in large-scale video analytics for enhanced security. It is also expected that this special issue will inspire more researchers to enter this exciting research area.
ACKNOWLEDGMENT

The guest editors would like to thank all the authors for their contributions and all the reviewers for their valuable comments and suggestions on the submitted manuscripts. They also gratefully acknowledge the strong support of the Editor-in-Chief of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, C. L. Philip Chen, and all kinds of help from his editorial colleagues.

KAIQI HUANG, Guest Editor
Institute of Automation
Chinese Academy of Sciences
Beijing 100049, China

TIENIU TAN, Guest Editor
Institute of Automation
Chinese Academy of Sciences
Beijing 100049, China

STEVE MAYBANK, Guest Editor
Department of Computer Science and Information Systems
University of London
London WC1E 7HX, U.K.

RAMA CHELLAPPA, Guest Editor
Department of Electrical and Computer Engineering
University of Maryland
College Park, MD 20742 USA

JAKE AGGARWAL, Guest Editor
Department of Electrical and Computer Engineering
University of Texas
Austin, TX 78712 USA

Kaiqi Huang (SM'09) received the B.Sc. and M.Sc. degrees from the Nanjing University of Science and Technology, Nanjing, China, and the Ph.D. degree from Southeast University, Nanjing.
He is with the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China, where he is currently a Professor. His current research interests include visual surveillance, digital image processing, pattern recognition, and biologically based vision. He has published over 150 papers in important international journals and conferences such as the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B: CYBERNETICS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, Pattern Recognition, Computer Vision and Image Understanding, ICCV, ECCV, CVPR, the International Conference on Image Processing, and the International Conference on Pattern Recognition.
Dr. Huang received the Best Student Paper Award from ACPR'10, the winner prizes of the detection task in both PASCAL VOC'10 and PASCAL VOC'11, the honorable mention prize of the classification task in PASCAL VOC'11, and the winner prize of the classification task with additional data in ILSVRC 2014. He was the Deputy General Secretary of the IEEE Beijing Section from 2006 to 2008 and serves as an Associate Editor for journals such as the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS.

Tieniu Tan received the B.Sc. degree in electronic engineering from Xi'an Jiaotong University, Xi'an, China, in 1984, and the M.Sc. and Ph.D. degrees in electronic engineering from Imperial
College London, London, U.K., in 1986 and 1989, respectively.
In 1989, he joined the Computational Vision Group, Department of Computer Science,
University of Reading, Reading, U.K., where he was a Research Fellow, a Senior Research Fellow,
and a Lecturer. In 1998, he returned to China to join the National Laboratory of Pattern Recognition
(NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China, where he is cur-
rently a Professor and the Director of the Center for Research on Intelligent Perception and Computing. He was the Director of NLPR from 1998 to 2013 and the Director General of the Institute from 2000 to 2007. He is currently the Vice President of the Chinese Academy
of Sciences. He has published 13 edited books or monographs and over 500 research papers in
refereed international journals and conferences in the areas of image processing, computer vision,
and pattern recognition. He has an H-index of 66 (as of March 2015). His current research interests include biometrics, image
and video understanding, and information content security.
Steve Maybank received the B.A. degree in mathematics from King's College, Cambridge, U.K.,
in 1976, and the Ph.D. degree in computer science from Birkbeck College, University of London,
London, U.K., in 1988.
He was a Research Scientist with GEC, Coventry, U.K., from 1980 to 1995, and joined the
Department of Computer Science, University of Reading, Reading, U.K., as a Lecturer, in 1995.
He became a Reader in 1998. He has published over 80 scientific papers and one book. His
current research interests include nearest neighbor pattern classification, geometry of multiple
images, applications of invariants to computer vision, closed-circuit television surveillance, and
applications of Fisher information to computer vision.

Rama Chellappa received the B.E. (Hons.) degree from the University of Madras, Chennai, India,
in 1975, the M.E. (with Distinction) degree from the Indian Institute of Science, Bengaluru, India,
in 1977, and the M.S.E.E. and Ph.D. degrees in electrical engineering from Purdue University,
West Lafayette, IN, USA, in 1978 and 1981, respectively.
Since 1991, he has been a Professor of electrical engineering and an Affiliate Professor of
computer science with the University of Maryland, College Park, MD, USA, where he is also
affiliated with the Center for Automation Research, as the Director, and the Institute for Advanced
Computer Studies, as a Permanent Member. In 2005, he was named a Minta Martin Professor of
Engineering. He was an Assistant Professor from 1981 to 1986, an Associate Professor from 1986 to 1991,
and the Director of the Signal and Image Processing Institute with the University of Southern
California, Los Angeles, CA, USA, from 1988 to 1990. Over the last 30 years, he has published
numerous book chapters and peer-reviewed journal and conference papers. He has co-authored and
co-edited books on Markov random fields, face and gait recognition, and collected works on image processing and analysis.
His current research interests include face and gait analysis, markerless motion capture, 3-D modeling from video, image and
video-based recognition and exploitation, compressive sensing, and hyperspectral processing.
Dr. Chellappa has served as a Co-Editor-in-Chief of Graphical Models and Image Processing.
Jake Aggarwal (F'76) received the B.Sc. degree from the University of Bombay, Mumbai, India,
in 1957, the B.Eng. degree from the University of Liverpool, Liverpool, U.K., in 1960, and the
M.S. and Ph.D. degrees from the University of Illinois, Urbana, IL, USA, in 1961 and 1964, respectively.
He has served on the Faculty of the College of Engineering, University of Texas at Austin,
Austin, TX, USA, from 1964 to 2014, where he is currently the Cullen Trust Endowed Professor
Emeritus with the Department of Electrical and Computer Engineering. He is still active in
research. His current research interests include computer vision, pattern recognition, and image
processing focusing on human motion and activities. He has authored or edited seven books and
52 book chapters, authored over 200 journal papers, as well as numerous proceeding papers and
technical reports.
Dr. Aggarwal was a recipient of the Best Paper Award of the Pattern Recognition Society in
1975, the Senior Research Award of the American Society of Engineering Education in 1992, the IEEE Computer Society
Technical Achievement Award in 1996, the 2004 K. S. Fu Prize of the IAPR, the 2005 Leon K. Kirchmayer Graduate Teaching
Award of the IEEE, and the 2007 Okawa Prize of the Okawa Foundation of Japan. He became a Fellow of the IAPR in 1998 and of the AAAS in 2005.
