
Proceedings of the 8th World Congress on Intelligent Control and Automation July 6-9 2010, Jinan, China

Energy-Based Surveillance Systems for ATM Machines


Ning Ding, Yongquan Chen, Zhi Zhong and Yangsheng Xu
Mechanical and Automation Engineering Department, The Chinese University of Hong Kong, Hong Kong, China Shenzhen Institute of Advanced Technology, Shenzhen, Guangdong, China {nding, yqchen, zzhong, ysxu}@mae.cuhk.edu.hk

Abstract: This paper presents a video surveillance system that can detect and respond to typical abnormal behaviors at Automatic Teller Machines (ATMs), such as fraud and robbery. Based on a case study of video records of violent incidents, a weighted kinetic energy extraction approach for violence identification is proposed. In this approach, the motion field is weighted with angle coefficients, reducing the video stream to a one-dimensional energy series. Experimental results show that the ATM video surveillance system with the energy approach is effective for typical incident classification and that the corresponding alarm signal is reliable.

Index Terms: Intelligent surveillance system, feature extraction, violent behavior detection, Automatic Teller Machine (ATM)

Fig. 1 Real case snapshots. These figures show two real crimes at ATMs. The upper three figures show three criminals stealing a card by distracting the customer's attention, and the lower figures show a robbery at an ATM.

I. INTRODUCTION

With the growing number of ATMs put into use comes the added responsibility of their maintenance and, more importantly, their security. To address the problem, the banking industry has used surveillance systems (sometimes called CCTV systems) extensively over the past 20 years. In the ATM scene, generally speaking, there are two typical kinds of ATM-related crime. The first is non-violent crime, such as fraud and password theft. The second is violent crime, such as robbery. Fig. 1 shows two actual crime cases. The upper three figures show three criminals (highlighted with yellow lines) swapping the victim's card by distracting his attention. Compared with non-violent activities, some crimes are more violent and more damaging to the cardholder or to the ATM. Examples are vandalism, in the form of violent damage for purely destructive purposes or for stealing money, and robbery, where a cardholder is physically threatened and forced by the robbers to draw money from the ATM, as the lower three figures in Fig. 1 show. These crimes mostly take place around midnight at or near unguarded ATMs offering 24-hour service. However, because they depend on manual monitoring, the majority of surveillance systems still serve only for after-the-fact review. We therefore propose a video surveillance system to detect and recognize abnormal behaviors at ATMs.

Many researchers have proposed approaches to human abnormal behavior detection in video surveillance systems, e.g., Cupillard
This work was supported by the grants from International S&T This work is partially supported by National Natural Science Foundation of China Grant#90820017, National Basic Research Program of China #2007cb311005, and Hong Kong Innovation and Technology Fund #GHP/006/09SZ

et al. [1] give a methodology for human behavior recognition and a framework for a video surveillance system. Aiming at abnormal behaviors, Datta et al. [2] place emphasis on studying human gait. Chen et al. [3] and Wu et al. [4] adopt SVM methods to identify whether the gait of an object is normal or not, as do Geng et al. [5]. In [6], Boiman and Irani use chunks of data extracted from a database to recognize whether the video content is irregular or not.

The motion features in non-violent cases are diverse, but they share one characteristic: all criminals have to get close to their victims before they start to act. Accordingly, we use object tracking to detect these behaviors. Haritaoglu et al. [7] have made great efforts in background estimation and object tracking.

Although the probability of violence occurring is limited, the consequences are very serious. Violent acts are usually rapid and complex physical movements. For human aggressive behavior detection, Nam et al. [8] and Zajdel et al. [9] present abnormality detection that fuses a video energy approach with an abnormal audio detection method to detect aggressive events at metro stations. These fusion methods belong to multimodal surveillance systems [10]. In our earlier research work, Zhong et al. adopted an energy-based algorithm [11], [12] for crowd estimation in video surveillance.

In this paper, we propose a weighted kinetic energy extraction approach to describe the degree of drastic motion of an object in a macroscopic way. The algorithm is embedded in the video surveillance system framework at ATMs. Experimental results show

978-1-4244-6712-9/10/$26.00 2010 IEEE


Fig. 2 The framework of the Video Energy-based ATM Surveillance System. The system is modeled in Matlab and Simulink.

reduce the occurrence of abnormal events, drawing a yellow line is reasonable and acceptable. With object tracking and trail analysis, the system can detect several typical kinds of fraud, password theft, abnormal loitering, etc., at ATMs. The sensitive area monitoring subsystem takes the current video frame and the background from the previous model as input signals, and exports object positions to the logical decision-making subsystem. Its functions can be easily realized with blob detection and tracking. There are three steps in this subsystem: 1) setting the Region Of Interest (ROI); 2) foreground extraction; 3) blob tracking and labeling.

B. Aggressive Behaviors Detection Subsystem

Identification of violent actions places high demands on real-time performance. However, detection methods based on posture details and body characteristics of the target cannot meet this requirement [2]. In this subsystem, a 4-dimensional video stream is converted into a 1-dimensional energy curve through a novel video energy mining approach. The method is, in nature, a dimension reduction approach. Violent acts are not confined to the region being monitored, which requires real-time analysis of the whole scene. The input of the subsystem is the current frame, and the output is the energy value of the frame. Processing functions include: 1) optical flow computation; 2) mask setting; 3) motion field and angle field extraction; 4) an energy mining model based on the Weighted Kinetic Energy Extraction algorithm, which is presented in detail in Section III.

C. Logical Decision Subsystem

As Fig. 3 shows, the logical decision subsystem identifies the video content based on the information from the two preceding subsystems and determines the corresponding alarm level. The decision rules are: 1) Violence detection possesses the highest weight according to the features of the energy curve.
2) The ROI has no more than one blob, to prevent interference from other people waiting behind the yellow line while the customer is operating; 3) The size of the blob accords with the normal activities of a human being. This rule prevents the ATM from being operated by more than one person at a time; 4) The time the blob stays in the area should not exceed the threshold, to prevent malicious operations. The system defines three levels of alarm to describe the degree of abnormality. The Red Alarm means violence. The Yellow Alarm means a minor violation, e.g., crossing the yellow line and occasional interruptions to

that the system using the novel energy algorithm performs well in abnormality detection.

The paper is organized as follows. In Section II, we briefly introduce the architecture of the ATM video surveillance system. In Section III, details of the weighted kinetic energy algorithm are explained. We present the experiment design and procedure in Section IV. Finally, experimental results and conclusions are given in Section V.

II. ATM VIDEO SURVEILLANCE SYSTEM

The framework of the ATM video surveillance system is built on the Matlab Simulink development platform. As shown in Fig. 2, the system includes Signal Input, Result Output and Analysis subsystems. The Output is used to show results and alarms. The Input includes the following functions: 1) Gaussian background modeling; 2) adaptive background updating; 3) video resolution resizing to speed up processing. The Analysis subsystem includes the following functions: 1) sensitive area monitoring subsystem; 2) monitoring subsystem for violent behavior; 3) logical decision-making subsystem.

A. Sensitive Area Monitoring Subsystem

To define abnormal behaviors at ATMs, we first have to know what normal behaviors are. Under normal circumstances, people should queue up at ATMs. When someone is operating an ATM, other customers should wait behind a yellow line. In fact, strict implementation of this rule can prevent many problems. Until a customer leaves the ATM booth with his card and cash, it is better for him to keep a safe distance from others. We can set aside a region within the yellow line in front of the ATM as the sensitive area, which customers are allowed to enter one at a time. In practice, there is often more than one person at an ATM in service since, for instance, couples often go to the ATM together and kids tend to stand next to their parents. To


long. Taking the real-time requirement into consideration, we use the Lucas-Kanade [14] and Horn-Schunck [15] optical flow algorithms. We have previously adopted energy-based methods for crowd estimation in a metro video surveillance system. Here, we review the Motion Features based on the optical flow motion field, shown in Table I.
TABLE I
MOTION FEATURES

No.  Symbol   Description
1    x        horizontal position of Motion Feature
2    y        vertical position of Motion Feature
3    dx       derivative of x
4    dy       derivative of y
5    v        magnitude of the velocity
6    angle    orientation of the velocity
7    Δangle   difference orientation
8    d²x      derivative of dx
9    d²y      derivative of dy

Fig. 3 Logical Flowchart
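The decision rules and the three alarm levels of the logical decision subsystem can be illustrated with a small sketch. It is not the authors' Simulink implementation: the function name `decide_alarm`, the input names, and the exact mapping of each rule to an alarm level are assumptions made for illustration. The default energy thresholds (0.5 × 10^4 and 0.7 × 10^4) and the 10 s dwell limit are the values reported in the experiments and in Table II.

```python
from enum import Enum


class Alarm(Enum):
    NORMAL = 0
    YELLOW = 1   # minor violation
    ORANGE = 2   # serious violation, interim state before violence
    RED = 3      # violence


def decide_alarm(energy, blob_count, dwell_seconds, line_crossed,
                 orange_energy=0.5e4, red_energy=0.7e4, max_dwell=10.0):
    """Map per-frame observations to an alarm level (hypothetical helper).

    energy        : weighted kinetic energy of the current frame
    blob_count    : number of tracked blobs inside the sensitive area
    dwell_seconds : how long the current blob has stayed in the area
    line_crossed  : True if someone briefly steps over the yellow line
    """
    # Rule 1: violence detection has the highest weight.
    if energy >= red_energy:
        return Alarm.RED
    # Serious violations: fast-rising energy, more than one person
    # in the sensitive area, or staying beyond the time threshold.
    if energy >= orange_energy or blob_count > 1 or dwell_seconds > max_dwell:
        return Alarm.ORANGE
    # Minor violation: crossing the yellow line.
    if line_crossed:
        return Alarm.YELLOW
    return Alarm.NORMAL
```

Checking the energy first mirrors the rule that violence detection carries the highest weight, so a high energy value is never masked by a lower-level condition.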

customers in operation. The Orange Alarm means a serious violation, which is the interim state between normal and violence. The following situations fall in the category of Orange Alarm: 1) loitering in the sensitive area; 2) more than one person operating the ATM, or exceeding the time threshold; 3) dramatic fluctuation and fast rising in the energy curve.

III. VIDEO ENERGY

A video is a stream of varying images. The difference between adjacent frames reflects part of the posture changes and movement of objects in view. Object motion is a process of energy release and exchange. Different human behaviors show different energy features. Therefore, video energy can be used to describe the characteristics of human behaviors in view, and mining more energy from the video stream is the core of this system. Relative to slow-paced normal behaviors, an action with unusually high speed is considered an abnormality. Clearly, an object moving at high speed contains more kinetic energy, so motion speed is the feature we need. Furthermore, violence is a complex body motion, and describing this complexity is our essential task. Thus, a weighted kinetic energy is proposed.

A. Definition of Weighted Kinetic Energy

In previous work, we discussed Motion Features obtained by tracking corners in a video stream with an optical flow algorithm. Recently, more accurate optical flow algorithms have been proposed. For instance, Xiao et al. [13] give a bilateral filtering based approach that builds the motion field with high accuracy. But the computational time is rather

We extract the kinetic energy from video to estimate crowd degree according to the equation below:
$$E(n) = \sum_{i=1}^{W} \sum_{j=1}^{H} w_{i,j}(n) \, v_{i,j}(n)^2 \qquad (1)$$

The parameter $v_{i,j}(n)$ is the velocity of the $(i, j)$-th pixel in the $n$-th frame (width $W$, height $H$), and the coefficient $w_{i,j}(n)$ is obtained from the blob area of the current frame to describe the number of people in the blob. This equation has been defined and discussed at length in [11], [12], and the same equation is used in this paper. For the identification of violence, however, the kinetic energy coefficient $w_{i,j}(n)$ should be redefined, because collision, rotation and vibration occur constantly in the course of human body movements. Our question therefore focuses on how to describe the discordance and complexity of motion. For this purpose, we analyze the angle component of the optical flow motion field.

B. Angle Field Analysis

The phase angle component of a motion field, referred to as the Angle Field for short, is obtained by the following steps:
1) Calculate the horizontal component $u_{i,j}(n)$ and vertical component $v_{i,j}(n)$ of every pixel in the current frame and combine them in complex form, $u_{i,j}(n) + j\,v_{i,j}(n)$.
2) Get the velocity component as the complex modulus (magnitude) of the complex matrix, $V_{i,j}(n) = |u_{i,j}(n) + j\,v_{i,j}(n)|$. Make a $MASK(n)$ by applying a threshold $V_{min}$ to the velocity component matrix.
3) Get the angle component as the phase angle of the complex matrix. Use $MASK(n)$ to obtain the final Angle Field, $A_{i,j}(n) = \arctan\left(\frac{u_{i,j}(n)}{v_{i,j}(n)}\right) \cdot MASK_{i,j}(n)$. The Mask is used to reduce noise from the background.
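The three Angle Field steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' Matlab code: the function name `angle_field` is hypothetical, the flow components are assumed to arrive as nested lists, and the phase is taken as the four-quadrant phase of u + j v via `math.atan2(v, u)`, which is an assumption about the intended angle convention. The default threshold 0.01 is the mask velocity threshold listed in Table II.

```python
import math


def angle_field(u, v, v_min=0.01):
    """Return (speed, mask, angle) grids for one frame of optical flow.

    u, v  : 2-D lists holding the horizontal and vertical flow components
    v_min : speed threshold used to build the mask
    """
    speed, mask, angle = [], [], []
    for row_u, row_v in zip(u, v):
        s_row, m_row, a_row = [], [], []
        for uu, vv in zip(row_u, row_v):
            s = math.hypot(uu, vv)        # modulus |u + j v|
            m = 1 if s > v_min else 0     # mask keeps only moving pixels
            a = math.atan2(vv, uu) * m    # phase of u + j v, zeroed where masked
            s_row.append(s)
            m_row.append(m)
            a_row.append(a)
        speed.append(s_row)
        mask.append(m_row)
        angle.append(a_row)
    return speed, mask, angle
```

For example, a pixel with flow (u, v) = (0, 2) has speed 2 and angle π/2, while a nearly static pixel with (u, v) = (0.001, 0) falls below the threshold and is masked out.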


Fig. 4 Motion Field and Angle Distribution. The left four images describe normal behaviors and the right four describe abnormal behaviors. Sub-figure (a) is the current frame. Sub-figure (b) is the optical flow field. The red point in (b) indicates the pixel with the highest velocity. Its position is mostly located on the head and limbs, and changes frequently with the posture of the object. Sub-figure (d) is the zoomed image of the red box in (b), showing detail around the red point. Sub-figure (c) is the histogram of the angle field.

Fig. 4 shows the angle distributions of the normal and abnormal situations separately. The angles concentrate in a certain direction when the object moves normally in view. When aggressive events occur, the angle field is spread evenly across directions. Next, we use two coefficients to describe the angle discordance.

C. Weighted Coefficients Design

In our algorithm, two coefficients are adopted. The first is ΔAngle in Table I, which indicates the angle difference between adjacent frames. The second is designed to represent the inconsistency of angles within the current frame. To achieve this, we need a benchmark angle. Three kinds of angle can be considered as the benchmark. The first is the average direction, which represents the common motion direction of the body parts. The second is the motion direction of the object centroid, which shows the general motion trend of the object. The third, which we adopt, is the direction of the pixel with the highest speed, because it is situated in the region of most intense conflict, to which we should pay the most attention. We name the coefficient AngleM, which is obtained by the following steps:
1) Find the pixel with the highest speed, and use its angle as the benchmark angle $AngleMax$;
2) Calculate the difference between every pixel in the flow field and $AngleMax$ by Eq. (2):

$$AngleM_{i,j}(n) = A_{i,j}(n) - AngleMax \qquad (2)$$

3) Wrap the elements of $AngleM_{i,j}(n)$ into the range $(-\pi, \pi)$: if a value of $AngleM_{i,j}(n)$ is less than $-\pi$ or greater than $\pi$, add $2\pi$ or $-2\pi$ to it, respectively. $AngleM_{i,j}(n)$ is the absolute difference from the benchmark angle.
4) Before application, normalize the coefficients into the range $(0, 1)$: take the absolute value of $AngleM_{i,j}(n)$ and divide it by $\pi$.

In order to exaggerate the weight effect for better classification performance, we can use two weighting approaches, (A) Eq. (3):

$$w_{i,j}(n) = \left(1 + \frac{|\Delta Angle_{i,j}(n)|}{\pi} + \frac{|AngleM_{i,j}(n)|}{\pi}\right)^2 \qquad (3)$$

and (B) Eq. (4):

$$w_{i,j}(n) = \left(\frac{|\Delta Angle_{i,j}(n)|}{\pi} \times 10\right)^2 + \left(\frac{|AngleM_{i,j}(n)|}{\pi} \times 10\right)^2 \qquad (4)$$

Compared with Eq. (1), the Weighted Kinetic Energy $wE$ of the $n$-th frame is obtained from Eq. (5) below:

$$wE(n) = \sum_{i=1}^{W} \sum_{j=1}^{H} w_{i,j}(n) \cdot v_{i,j}(n)^2 \cdot MASK_{i,j}(n) \qquad (5)$$

TABLE II
EXPERIMENT DESIGN

No.  Parameters                        Value
Video signal
1    Video resolution                  576 × 720 pix
2    Down-sampled resolution           120 × 160 pix
3    Frame rate                        25 fps
4    Time threshold of the blob stay   10 s
5    Velocity threshold for Mask       0.01
Software platform
1    OS                                Windows XP
2    Simulation environment            Matlab 2007a, Simulink
Hardware platform
1    CPU                               Intel D-Core 2140
2    Memory                            DDR II 667 2G
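The weighting steps of Section III-C and the weighted energy of Eq. (5) can be sketched as follows. This is an illustrative reading of the text under stated assumptions rather than the authors' implementation: the names `wrap_angle` and `weighted_kinetic_energy` are hypothetical, ΔAngle is assumed to be the wrapped per-pixel difference between the current and previous angle fields, and the benchmark pixel is searched only among unmasked pixels.

```python
import math


def wrap_angle(a):
    """Wrap an angle difference into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def weighted_kinetic_energy(speed, angle, prev_angle, mask, approach="B"):
    """Weighted kinetic energy of one frame, following Eqs. (2) to (5).

    speed, angle, prev_angle, mask are 2-D lists of equal shape.
    approach selects weighting (A), Eq. (3), or (B), Eq. (4).
    """
    # Step 1: benchmark angle = angle of the fastest unmasked pixel.
    best_speed, angle_max = -1.0, 0.0
    for i, row in enumerate(speed):
        for j, s in enumerate(row):
            if mask[i][j] and s > best_speed:
                best_speed, angle_max = s, angle[i][j]

    energy = 0.0
    for i, row in enumerate(speed):
        for j, s in enumerate(row):
            if not mask[i][j]:
                continue  # MASK factor in Eq. (5)
            # Steps 2 to 4: differences, wrapped and normalized by pi.
            d_angle = abs(wrap_angle(angle[i][j] - prev_angle[i][j])) / math.pi
            angle_m = abs(wrap_angle(angle[i][j] - angle_max)) / math.pi
            if approach == "A":
                w = (1.0 + d_angle + angle_m) ** 2                  # Eq. (3)
            else:
                w = (10.0 * d_angle) ** 2 + (10.0 * angle_m) ** 2   # Eq. (4)
            energy += w * s * s
    return energy
```

With approach (B), pixels that move coherently in the benchmark direction and keep their direction between frames receive zero weight, so only discordant motion contributes to the energy. That is one way to read the observation that (B) is quieter on normal footage and more sensitive to abnormal motion.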


Fig. 5 Comparison of Energy Curves. These curves are obtained from different optical flow algorithms and weighting approaches. The comparison helps to choose the best energy extraction method. The red line indicates the time when the abnormality occurs. The sub-figure under the chart shows a snapshot of a clip containing violence.

IV. EXPERIMENTS

We randomly surveyed 30 ATM outlets in the NanShan district of ShenZhen, a big city in south China, for installation information and experimental videos of ATM surveillance systems. The position and number of cameras depend on the shape and space of the ATM booth, so as to meet the requirements of face identification and event recording. A realistic video dataset of robberies and other abnormal events at ATMs is rather difficult to collect. When we viewed and analyzed clips of actual cases downloaded from the Internet, we found that those cases occur in different scenes, and that the quality of the video varies considerably with the position of the camera and the location where the case occurs. For the comparability of experimental results, we imitate those real events and act them out at the same camera location to test our algorithm.

A. Experiment Design

A traditional CCTV video surveillance system is installed in the laboratory to simulate the ATM scene, as shown in Fig. 6. The door of the lab is taken as the location of the ATM, and a yellow line is drawn 1 m away from it outside the door. The camera is placed at the top of the wall close to the ATM to make sure that the customer operating on


Fig. 6 ATM scene plan view.

Fig. 7 Wavelet Analysis. The upper sub-figure is the energy curve computed by the Horn-Schunck algorithm and weighting approach (B). The 3-level approximation in the second sub-figure indicates that there is more energy when violence occurs, and the 1-level detail indicates that the energy varies sharply in the abnormal part.

the ATM does not overlap the yellow line in the camera view. The experimental material consists of 21 video clips running 20 to 30 seconds each, containing normal states in the first part and several typical abnormal events such as fraud, fighting and robbery in the second part. Experiment information and system parameters are listed in Table II.

B. Comparison and Analysis

Fig. 5 shows how the Weighted Kinetic Energy describes aggressive behaviors in a clip recording four people fighting. As the figure shows, when the abnormality occurs (around the 269th frame, as the red line indicates), the Original Kinetic Energy curve [9] (oKEn), the energy without weights, remains in a stationary state as in the preceding normal situation, but the Weighted Kinetic Energy curve (wKEn) fluctuates drastically and rises to a high value within a few frames. Two high-energy events occur around the 80th and 180th frames. There, the value of the Original Kinetic Energy curve is close to 2500 and the value of the Weighted Kinetic Energy curve is 2800, a ratio of about 1 : 1.2. When the real abnormal event occurs, the ratio rises to between 1 : 3 and 1 : 6. This means that the weighted method restrains the weight when a high-speed but not complex motion happens, and boosts the weight when a notable event occurs. The difference between the weighted and original curves is enlarged in the abnormal part because of the degree of disorder in the motion direction. Sub-figures (1) and (2) are the energy curves obtained with the Lucas-Kanade and Horn-Schunck methods, respectively. We choose the Horn-Schunck algorithm because its curves between the 150th and 200th frames prove to be rather robust in the normal part. Another comparison is made between the two weighting approaches, (A)

Eq. (3) and (B) Eq. (4). Note that, compared with (A), the curve based on weighting approach (B) is more robust in the normal part and more sensitive in the abnormal part, which is what we want. Finally, the energy extraction approach we choose is based on weighting method (B) and the Horn-Schunck optical flow algorithm.

Furthermore, the Stationary Wavelet Transform (SWT) [16] is employed to analyze the energy curve, using the sym4 wavelet of the Symlets family to perform a 3-level signal decomposition. In Fig. 7, the 3rd-level approximation coefficient of the energy curve shows that more energy is generated when aggressive behavior occurs. As for the 1st-level detail coefficient (the lowest sub-figure), when a 1-dimensional variance adaptive threshold is applied to it, the boundary (located at the 271st frame) between the two parts with different variance is quite close to the frame where the abnormal event starts. From these results it is clear that we do not even need machine learning approaches to distinguish the abnormal from the normal, as an energy threshold is good enough to do the job. In this experiment, the threshold values of the Orange and Red alarms are defined as $0.5 \times 10^4$ and $0.7 \times 10^4$, respectively.

V. RESULT AND DISCUSSION

The group of figures in Fig. 8 shows the alarms that respond to the corresponding video contents, and the statistics of the experimental results are given in Table III. The system performs with a low rate of False Negatives (FN.) and a high rate of False Positives (FP.), which means that the system is quite sensitive to abnormal events, but sometimes it overreacts to


TABLE III
RESULT

No.  Case Type  Clips Num.  Frame Num.  FP.  FN.
1    Normal     4           2000        3.8  0
2    Fraud      5           3775        2.7  1.5
3    Fight      7           4200        2.1  1.7
4    Robbery    4           1700        3.6  2.3

with other pixels, which represents the important feature of aggressive behavior. The structure of the system has been proven effective in non-violent and violent abnormality detection at ATMs. To enhance the robustness of the ATM surveillance system, our future work will concentrate on the following three aspects. Firstly, the energy mining approach should be improved to describe abnormal situations in specific scenes. Secondly, a robust calibration algorithm should be considered to reduce the false positive rate. Finally, appropriate feature extraction and analysis methods for the energy curve will discover more valuable information about abnormality in video. The intelligence level of the system should be raised by introducing machine learning and fuzzy decision rules in complex event recognition.

ACKNOWLEDGMENT

The authors would like to acknowledge Dr. Weizhong Ye of The Chinese University of Hong Kong. The authors also wish to thank Mr. Huihuan Qian for his insightful suggestions, and other colleagues in the Advanced Robotics Laboratory at The Chinese University of Hong Kong for their help and encouragement.

REFERENCES
[1] F. Cupillard, F. Bremond, M. Thonnat. Behaviour Recognition for individuals, groups of people and crowd, Intelligence Distributed Surveillance Systems, IEE Symposium on, vol. 7, pp. 1-5, 2003.
[2] Ankur Datta, Mubarak Shah, Niels Da Vitoria Lobo. Person-on-Person Violence Detection in Video Data, In Proceedings of the 16th International Conference on Pattern Recognition, vol. 1, pp. 433-438, 2002.
[3] Yufeng Chen, Guoyuan Liang, Ka Keung Lee, and Yangsheng Xu. Abnormal Behavior Detection by Multi-SVM-Based Bayesian Network, Proceedings of the 2007 International Conference on Information Acquisition, vol. 1, pp. 298-303, 2007.
[4] Xinyu Wu, Yongsheng Ou, Huihuan Qian, and Yangsheng Xu. A Detection System for Human Abnormal Behavior, IEEE International Conference on Intelligent Robot Systems, vol. 1, no. 1, pp. 589-1593, 2005.
[5] Xin Geng, Gang Li, Yangdong Ye, Yiqing Tu, Honghua Dai. Abnormal Behavior Detection for Early Warning of Terrorist Attack, Lecture Notes in Computer Science, vol. 4304, pp. 1002-1009, 2006.
[6] Oren Boiman, Michal Irani. Detecting Irregularities in Images and in Video, International Journal of Computer Vision, vol. 74, no. 1, pp. 17-31, 2007.
[7] I. Haritaoglu, D. Harwood, L.S. Davis. A fast background scene modeling and maintenance for outdoor surveillance, In Proceedings of International Conference of Pattern Recognition, vol. 4, pp. 179-183, 2000.
[8] Jeho Nam, Masoud Alghoniemy, Ahmed H. Tewfik. Audio-Visual Content-based Violent Scene Characterization, In Proceedings of International Conference on Image Processing (ICIP), vol. 1, pp. 353-357, 1998.

Fig. 8 Snapshots of experimental results. The semitransparent yellow region is the sensitive area defined at the beginning. A rectangle on a person means that their motion is being tracked. The Yellow Alarm indicates that someone is striding over the yellow line while a customer is operating the ATM. The Red Alarm warns of the occurrence of violence. The left sub-figure of the Orange Alarm indicates more than one customer in the area, and the right sub-figure shows the interim stage of violent behavior.

normal situations. Due to imperfections of the calibration algorithm, such False Positives occur mainly in the normal part of the video when a customer walks close to the ATM. The False Negative rate of non-violent cases is lower than that of violent events because non-violent crime detection mainly relies on object tracking, which is more robust than the energy approach for aggressive behavior detection. The system output frame rate is around 9 to 11 fps, which satisfies the real-time requirement.

The proposed energy-based algorithm shows prominent performance in solving the problem of aggressive behavior detection. The novel weighted method is not just a description of the entropy of the velocity histogram. It focuses on the pixel with the maximum velocity in the field and its relationship


[9] W. Zajdel, J.D. Krijnders, T. Andringa, D.M. Gavrila. CASSANDRA: audio-video sensor fusion for aggression detection, IEEE Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), vol. 1, pp. 200-205, 2007.
[10] Thanassis Perperis, Sofia Tsekeridou. A Knowledge Engineering Approach for Complex Violence Identification in Movies, International Federation for Information Processing (IFIP), vol. 247, pp. 357-364, 2007.
[11] Z. Zhong, W.Z. Ye, S.S. Wang, M. Yang and Y.S. Xu. Crowd Energy and Feature Analysis, IEEE International Conference on Integration Technology (ICIT07), vol. 1, pp. 144-150, 2007.
[12] Z. Zhong, W.Z. Ye, M. Yang, S.S. Wang and Y.S. Xu. Energy Methods for Crowd Surveillance, IEEE International Conference on Information Acquisition (ICIA07), vol. 1, pp. 504-510, 2007.
[13] Jiangjian Xiao, Hui Cheng, Harpreet S. Sawhney, Cen Rao, Michael A. Isnardi. Bilateral Filtering-Based Optical Flow Estimation with Occlusion Detection, ECCV (1): pp. 211-224, 2006.
[14] B. Lucas and T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision, In Proceedings of the International Joint Conference on Artificial Intelligence, vol. 1, pp. 674-679, 1981.
[15] J.L. Barron, D.J. Fleet, S.S. Beauchemin, and T.A. Burkitt. Performance of optical flow techniques, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 236-242, 1992.
[16] R.R. Coifman, D.L. Donoho. Translation invariant de-noising, Lecture Notes in Statistics, vol. 103, pp. 125-150, 1995.

