ABSTRACT
In general, distributed processing is not well suited to image data streams because transmitting
frames between nodes creates a heavy network load. For this reason, image data streams have
commonly been processed on a single node. However, in the big data era, the growing quantity and
quality of multimedia data make it necessary to process image data streams in a distributed
environment. In this paper, we present a real-time pedestrian detection methodology that
processes an image data stream on the Apache Storm framework. It achieves a speedup by
distributing frames across several processing tasks called bolts, each of which processes a
different region of the image frames. Moreover, it reduces the synchronization overhead because
the computation bolts return only the processing results to the merging bolts.
KEYWORDS
Distributed Stream Processing, Apache Storm, Image Processing, Pedestrian Detection.
1. INTRODUCTION
Recently, the volume of data in the IT industry has been increasing dramatically and
continuously. In particular, digital pictures are larger and video resolution is higher than
before. In addition, the sharp increase in data generated by services in the era of the Internet
of Things (IoT) makes it difficult to process an image data stream in real time on a single node.
Therefore, it is essential to process large-scale image data streams in a distributed environment.
In this paper, we propose a pedestrian detection methodology that is an efficient model for
dealing with an image data stream in a distributed environment. For the implementation we use
Apache Storm [1], a distributed real-time stream processing framework. A Storm cluster runs
topologies, which consist of spouts and bolts. A spout is a streaming task that produces a
sequence of tuples, the data model of Storm, and bolts are tasks that perform the processing jobs.
The target of our distributed image stream processing is pedestrian detection. Pedestrian
detection is an important technique that can help prevent accidents in the automotive field and
can create profit by analysing the number of customers in shops. We therefore focus on
pedestrian detection and propose an efficient way to detect pedestrians in a distributed
environment.
David C. Wyld et al. (Eds) : NETCOM, NCS, WiMoNe, CSEIT, SPM - 2015
pp. 211-218, 2015. CS & IT-CSCP 2015
DOI : 10.5121/csit.2015.51618
Our methodology has several advantages. First, it speeds up processing by distributing frames
across several computation bolts, which operate in parallel on the nodes. Second, it reduces the
overall computation load by dividing each frame into several regions in which detection is
performed. Finally, it reduces the overhead caused by synchronization by returning only the
processing results.
The rest of this paper is organized as follows. Section 2 describes related work on the Apache
Storm framework and the pedestrian detection algorithm we selected. Section 3 presents a model
for efficient pedestrian detection in a distributed environment and explains the topology, that
is, the workflow that runs on the Storm cluster. Section 4 reports our experimental results.
2. RELATED WORKS
2.1. DISTRIBUTED STREAM PROCESSING
Recently, big data systems such as Hadoop [2] have been used in various fields. Hadoop is a
popular big data platform. However, applications have become more diverse, and users need
processing results more quickly. Therefore, many stream processing platforms have appeared to
provide services that process big data in real time.
Big data must be processed in a distributed and parallel environment because, unlike data stored
in a database, it is not structured. To address this, Hadoop emerged, which processes big data
using MapReduce [3], a parallel processing framework. However, as data from sources such as RFID,
tweets and CCTV has increased, batch systems like Hadoop have reached their limits in processing
such large-scale data. Various techniques have been suggested to resolve this problem.
Storm's data model is an unbounded sequence of tuples, each of which consists of fields and
values. The stream of tuples flows through topologies, which are directed graphs. The vertices of
a topology represent computations and the edges represent the data flow. The vertices are divided
into two types, spouts and bolts. A spout is a supplier of a stream: it reads tuples from sources
and provides them to the topology. A bolt is a processing unit. All of the work in Storm is done
in bolts, and each bolt emits the results of its processing to other bolts.
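The dataflow described above can be illustrated with a small framework-free sketch. It does not use the Storm API: the `Topology` class, the `double` and `label` bolts, and the tuple fields are hypothetical names chosen for illustration. Tuples are modelled as dictionaries of named fields, the spout as a list of tuples, and bolts as functions that receive one tuple and emit a list of tuples along the edges of a directed graph.

```python
from collections import defaultdict, deque


class Topology:
    """Toy directed graph of processing steps (an illustration, not the Storm API)."""

    def __init__(self):
        self.bolts = {}                  # bolt name -> function(tuple) -> list of tuples
        self.edges = defaultdict(list)   # vertex name -> names of downstream bolts

    def add_bolt(self, name, func, sources):
        """Register a bolt and subscribe it to the given upstream vertices."""
        self.bolts[name] = func
        for source in sources:
            self.edges[source].append(name)

    def run(self, spout_name, spout_tuples):
        """Push every spout tuple through the graph; return tuples emitted by leaf bolts."""
        results = []
        # Each queue entry is (vertex that emitted the tuple, tuple).
        queue = deque((spout_name, tup) for tup in spout_tuples)
        while queue:
            emitter, tup = queue.popleft()
            downstream = self.edges.get(emitter, [])
            if not downstream and emitter != spout_name:
                results.append(tup)      # no subscribers: this is a final result
                continue
            for bolt_name in downstream:
                for emitted in self.bolts[bolt_name](tup):
                    queue.append((bolt_name, emitted))
        return results


def double(tup):
    # Hypothetical bolt: doubles the "value" field.
    return [{"frame_id": tup["frame_id"], "value": tup["value"] * 2}]


def label(tup):
    # Hypothetical bolt: turns the value into a text label.
    return [{"frame_id": tup["frame_id"], "label": f"v={tup['value']}"}]


topology = Topology()
topology.add_bolt("double", double, sources=["spout"])
topology.add_bolt("label", label, sources=["double"])

stream = [{"frame_id": 1, "value": 3}, {"frame_id": 2, "value": 5}]
results = topology.run("spout", stream)
print(results)
```

In an actual Storm topology the stream is unbounded and the bolts run as parallel tasks across the cluster; the sketch only shows how tuples move from a spout through a chain of bolts.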
We used Haar cascades as our detector for finding pedestrians, since the purpose of this paper is
to perform detection in a distributed environment rather than to focus on detection accuracy.
Figure 1. Consecutive frames of a 15 fps video: (a) Frame 1, (b) Frame 2, (c) Frame 3 and (d) Frame 4
3.2. WORKFLOW
In this section, we explain the topology that runs on the Storm cluster. To reduce the network
load and the amount of work, features are delivered instead of frames. If the frames produced by
detection were delivered, the frames would have to be synchronized, and delivering every frame
again would add network load.
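A minimal sketch of this idea follows, assuming that the delivered results are bounding boxes expressed relative to the region a computation bolt worked on. The function names, the tuple fields and the 640x480 frame split into 320x240 regions are assumptions made for illustration, and the detector output is passed in directly rather than computed.

```python
def detect_in_region(region_name, offset, local_boxes):
    """Stand-in for a computation bolt: emits only boxes, never pixels.

    local_boxes are (x, y, w, h) tuples relative to the region's top-left
    corner; a real bolt would obtain them from a detector.
    """
    return {"region": region_name, "offset": offset, "boxes": list(local_boxes)}


def merge_results(frame_id, region_results):
    """Stand-in for a merging bolt: shifts each box by its region offset."""
    merged = []
    for result in region_results:
        dx, dy = result["offset"]
        for (x, y, w, h) in result["boxes"]:
            merged.append((x + dx, y + dy, w, h))
    return {"frame_id": frame_id, "boxes": sorted(merged)}


# Assumed example: two regions of a 640x480 frame report one detection each.
upper_left = detect_in_region("upper_left", (0, 0), [(10, 20, 40, 80)])
bottom_right = detect_in_region("bottom_right", (320, 240), [(5, 5, 40, 80)])

merged = merge_results(frame_id=7, region_results=[bottom_right, upper_left])
print(merged)
```

For a sense of scale under these assumptions, a raw 640x480 frame with three bytes per pixel is 921,600 bytes, whereas a box of four 32-bit integers is 16 bytes, so emitting boxes keeps the traffic between bolts small.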
4. IMPLEMENTATION
4.1. EXPERIMENT ENVIRONMENT
The system described above is implemented on a laptop with an Intel Celeron B800 CPU @ 1.50 GHz
running Windows 8. The program is developed in Java with Eclipse Luna, using the Storm framework
and the OpenCV library. The distributed environment is built with Oracle VM VirtualBox: three
virtual machines are created, each running 64-bit Ubuntu Linux 14.04. Apache Storm 0.9.5 and
ZooKeeper 3.4.6 are installed to run Storm.
Figure 4. Pedestrian detection on each part of the sampled frames: (a) upper left, (b) upper right,
(c) bottom left, (d) bottom right
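The four parts named in Figure 4 could be computed as in the following sketch, which assumes a simple non-overlapping split at the centre of the frame. The function name and the (x, y, width, height) rectangle convention are assumptions, not details taken from the implementation.

```python
def quadrants(width, height):
    """Split a frame into four non-overlapping (x, y, w, h) rectangles."""
    half_w, half_h = width // 2, height // 2
    return {
        "upper_left": (0, 0, half_w, half_h),
        "upper_right": (half_w, 0, width - half_w, half_h),
        "bottom_left": (0, half_h, half_w, height - half_h),
        "bottom_right": (half_w, half_h, width - half_w, height - half_h),
    }


regions = quadrants(640, 480)
for name, rect in regions.items():
    print(name, rect)
```

The right and bottom parts absorb the extra pixel when a dimension is odd, so the four rectangles always cover the whole frame exactly once.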
ACKNOWLEDGEMENTS
This research was supported by Korea University and MSIP (Ministry of Science, ICT and Future
Planning), Korea, under the ITRC (Information Technology Research Center) support program
(IITP-2015-H8501-15-1004) supervised by the IITP (Institute for Information & communications
Technology Promotion).
REFERENCES
[1] Toshniwal, Ankit, et al. "Storm@twitter." Proceedings of the 2014 ACM SIGMOD International
Conference on Management of Data. ACM, 2014.
[2] Shvachko, Konstantin, et al. "The Hadoop distributed file system." Mass Storage Systems and
Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010.
[3] Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters."
Communications of the ACM 51.1 (2008): 107-113.
[4] Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX
Annual Technical Conference. Vol. 8. 2010.
[5] Bradski, Gary. "OpenCV." Dr. Dobb's Journal of Software Tools (2000).
[6] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." Computer
Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1.
IEEE, 2005.
[7] Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features."
Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE
Computer Society Conference on. Vol. 1. IEEE, 2001.
[8] Kothiya, Shraddha V., and Kinjal B. Mistree. "A review on real time object tracking in video
sequences." Electrical, Electronics, Signals, Communication and Optimization (EESCO), 2015
International Conference on. IEEE, 2015.
[9] Im, Dong-Hyuck, Cheol-Hye Cho, and IlGu Jung. "Detecting a large number of objects in real-time
using apache storm." Information and Communication Technology Convergence (ICTC), 2014
International Conference on. IEEE, 2014.
[10] Benenson, Rodrigo, et al. "Pedestrian detection at 100 frames per second." Computer Vision
and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
AUTHORS
Du-Hyun Hwang is currently working towards a master's degree at the Department of
Electrical Engineering, Korea University. His current research interests are distributed
parallel computing, computer vision and GPU processing.
Yoon-Ki Kim is currently working towards a PhD degree in Electronic and Computer
Engineering at Korea University. His research interests include real-time distributed
and parallel data processing, IoT, sensor processing and computer vision.