
IEEE SIGNAL PROCESSING LETTERS, VOL. 19, NO. 1, JANUARY 2012

Video Steganalysis Exploiting Motion Vector Reversion-Based Features


Yun Cao, Xianfeng Zhao, and Dengguo Feng
Abstract: Unlike traditional image or video steganography in the spatial or transform domain, motion vector (MV)-based methods target the internal dynamics of video compression and embed messages while performing motion estimation. However, we have noticed that some existing methods adopt nonoptimal selection rules and modify MVs in somewhat arbitrary ways that substantially violate the encoding principles. Targeting these weaknesses, we design a calibration-based approach and propose MV reversion-based features for steganalysis. Experimental results demonstrate that the proposed features are very sensitive to the tendency of MV reversion during calibration and can be used to effectively detect some typical MV-based steganography even at low embedding rates.

Index Terms: Calibration, motion vector, MPEG, steganalysis, video.

I. INTRODUCTION

In recent years, networked multimedia applications have been significantly facilitated by high-performance networking and compression technologies. Steganography using compressed video streams can easily achieve a large capacity even with low embedding rates. Moreover, covert communications via internet television, video telephony, or video conferencing are unlikely to arouse suspicion. The proposed steganalytic approach targets video steganography that makes use of MVs. We focus on these methods for the following two reasons. First, since the MV values are leveraged as the information carrier, the statistical characteristics of the spatial/frequency coefficients are only indirectly affected. Second, because motion compensation is adopted by most advanced compression standards and the MVs are losslessly coded, little degradation of the reconstructed visual quality is introduced [2]. These advantages make MV-based steganography less detectable than methods that use spatial/frequency coefficients directly. Typical MV-based steganographic methods share some common features, i.e., they first select a subset of MVs following a predefined selection rule (SR), then make certain modifications
Manuscript received June 16, 2011; revised November 02, 2011; accepted November 07, 2011. Date of publication November 15, 2011; date of current version November 28, 2011. This work was supported by the Natural Science Foundation of Beijing under Grant 4112063 and the Natural Science Foundation of China under Grant 61170281. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dimitrios Tzovaras.
Y. Cao is with the State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China, and also with the Graduate University of the Chinese Academy of Sciences, Beijing 100049, China (e-mail: caoyun@is.iscas.ac.cn).
X. Zhao and D. Feng are with the State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China (e-mail: xfzhao@is.iscas.ac.cn; feng@is.iscas.ac.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2011.2176116

to them for data hiding. Xu et al. [3] suggested embedding message bits in the MV magnitudes: the LSBs of the MVs' horizontal or vertical components are used for embedding. Fang and Chang [4] designed a method utilizing the MVs' phase angles. These two schemes select candidate MVs (CMVs) according to their magnitudes, on the assumption that modifications applied to MVs with larger magnitudes introduce less distortion. Later, however, Aly pointed out that the magnitude-based SR cannot ensure minimum prediction errors [2]. He hence designed a new selection rule by which MVs associated with large prediction errors are chosen, and message bits are hidden in the LSBs of both their horizontal and vertical components. To further enhance steganographic security, in our latest work we suggested adopting nonshared SRs and minimizing the embedding impact by perturbed motion estimation [5].

Compared to the efforts devoted to image steganalysis, video steganalysis remains largely unexplored. Most current steganalyzers (e.g., [6]–[9]) model the video data as successive still images and the embedding process as the addition of independent zero-mean Gaussian noise. The reliability of this model is likely to deteriorate when MV values are used for data hiding. Since MVs are leveraged, detectable changes in MV statistics should be exploited when designing specific steganalysis. To the best of our knowledge, the only achievement in this direction was made by Zhang et al. [13]. Their steganalytic features are drawn directly from certain MV statistics, but are not very effective at low embedding strengths, as tested in our experiments.

In this letter, we design a calibration-based approach to perform dynamic steganalysis. As will be demonstrated later, if we decompress a stego video to the spatial domain and compress it again with no embedding involved, the altered MVs are inclined to revert to their prior values. A strong tendency of MV reversion would signal the existence of hidden messages.
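The reversion effect described above can be illustrated with a deliberately idealized toy example. The sketch below is not the authors' implementation: the ramp-shaped reference frame, the block size `N`, the quantizer step `Q`, the search range `SEARCH`, the noise pattern, and the LSB flip used as "embedding" are all assumptions chosen so that the outcome can be verified by hand. In particular, the reference samples are multiples of `Q`, so quantizing the residual removes the small noise exactly; in real video, reversion is only a statistical tendency.

```python
# Toy illustration of MV reversion under recompression (all values assumed).
Q = 8        # quantizer step
N = 4        # block size in pixels
SEARCH = 1   # motion search range: dx, dy in [-1, 1]

def block(frame, top, left):
    """N x N block whose top-left corner is (top, left)."""
    return [row[left:left + N] for row in frame[top:top + N]]

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def motion_search(cur, ref, top, left):
    """Full search; returns ((dx, dy), sad) of the minimum-SAD candidate."""
    best = None
    for dy in range(-SEARCH, SEARCH + 1):
        for dx in range(-SEARCH, SEARCH + 1):
            cost = sad(cur, block(ref, top + dy, left + dx))
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

def quantize(residual):
    """Uniform quantizer: divide by Q, round to nearest, rescale."""
    return [[Q * ((v + Q // 2) // Q) for v in row] for row in residual]

# Reference frame: an 8 x 8 ramp whose samples are multiples of Q.
ref = [[Q * (x + 3 * y) for x in range(8)] for y in range(8)]
top, left = 2, 2

# Current block: the reference block at offset (1, 0) plus noise in [-3, 3].
r_true = block(ref, top, left + 1)
cur = [[r_true[i][j] + (3 * i + 5 * j) % 7 - 3 for j in range(N)]
       for i in range(N)]

# First compression: regular motion estimation finds the best MV.
v0, sad0 = motion_search(cur, ref, top, left)

# Embedding: flip the LSB of the horizontal component (LSB replacement).
v1 = (v0[0] ^ 1, v0[1])
r1 = block(ref, top + v1[1], left + v1[0])
d1 = [[c - r for c, r in zip(rc, rr)] for rc, rr in zip(cur, r1)]

# Decompression: the decoder only sees the quantized residual.
cur_rec = [[r + d for r, d in zip(rr, rd)] for rr, rd in zip(r1, quantize(d1))]

# Recompression without embedding: motion estimation on the decoded block.
v2, sad2 = motion_search(cur_rec, ref, top, left)
print("original MV:", v0, "stego MV:", v1, "recompressed MV:", v2)
print("reverted:", v2 == v0)
```

With these assumed values, the first search returns the MV (1, 0), the embedding changes it to (0, 0), and the search on the decoded block returns (1, 0) again, because the decoded block is still much closer to the originally best reference block than to the one selected by the modified MV.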
Calibration is therefore done by recompression, and MV reversion-based features are derived from the differences between the original and the calibrated videos. The rest of the letter is organized as follows. In the next section, we explain the basic concepts of motion-compensated prediction and the phenomenon of MV reversion. In Section III, we give details on the feature definitions and describe the implementation of the steganalyzer. In Section IV, comparative experiments are conducted to show the performance of the proposed steganalytic features. Finally, concluding remarks are given in Section V with some future research directions.

II. THE PHENOMENON OF MV REVERSION

A. Motion-Compensated Prediction

Motion-compensated prediction is an integral part of video compression, and its basic idea is to predict the frame to be coded using one or more previously coded frames. This is possible in practice because video data is essentially a series of highly

1070-9908/$26.00 © 2011 IEEE


Fig. 1. Generic structure of the inter-MB coding.

correlated still images, and the temporal redundancy can be greatly reduced by inter-frame coding. State-of-the-art video coding standards try to remove the temporal redundancy via motion estimation applied to macroblocks (MBs) of pixels. A generic structure of the inter-MB coding is depicted in Fig. 1. To encode the current MB $C$, the encoder uses one previously coded frame as the reference and searches for $C$'s approximation within it. To measure the prediction error between $C$ and one candidate MB $R$, the sum of absolute differences (SAD) is commonly used:

$$\mathrm{SAD}(C, R) = \sum_{i,j} \left| c_{i,j} - r_{i,j} \right| \tag{1}$$

where $c_{i,j}$ and $r_{i,j}$ are luminance values. As a result, the MB with minimum SAD is taken as $C$'s best prediction and denoted by $R_0$. Consequently, only the MV $V_0$ representing the spatial displacement offset and the differential signal block $D_0 = C - R_0$ need to be further coded and transmitted.

B. MV Reversion During Recompression

Calibration [1] is well known as an image steganalytic concept that estimates the macroscopic properties of the cover from the stego image. A typical calibration-based steganalyzer reconstructs an estimate of the cover from the stego object and draws features from the difference between the two. Since our target steganographic methods are MV-based, we are interested in certain statistical characteristics of MVs. For compressed videos, calibration can be done by decompressing the videos to the spatial domain and compressing them again with no message embedded. As will be demonstrated in detail, the MVs altered in the first compression are inclined to revert to their prior values. Therefore, the MVs of the calibrated videos have most macroscopic features similar to those of the clean videos.

We focus on one inter-MB $C$ whose MV has been changed for embedding, and take a close look at what happens during the calibration. Suppose that $V_0$ was modified to $V_1$ and the differential signal was calculated based on the MB $R_1$ associated with $V_1$, i.e., $D_1 = C - R_1$. Subsequently, $D_1$ instead of $D_0$ underwent DCT transformation, quantization, and entropy coding before transmission, as depicted in Fig. 1. The first step of calibration is decompression, during which $C$ is retrieved as $\hat{C} = R_1 + \hat{D}_1$, where $\hat{D}_1$ is $D_1$'s reconstruction. In the second step, recompression is performed without embedding. As in the first compression, when applying motion estimation to $\hat{C}$,¹ $\mathrm{SAD}(\hat{C}, R_0)$ and $\mathrm{SAD}(\hat{C}, R_1)$ will be calculated for comparison as

$$\mathrm{SAD}(\hat{C}, R_0) = \sum_{i,j} \left| r^{(1)}_{i,j} + \hat{d}^{(1)}_{i,j} - r^{(0)}_{i,j} \right| = \sum_{i,j} \left| d^{(0)}_{i,j} - \left( d^{(1)}_{i,j} - \hat{d}^{(1)}_{i,j} \right) \right| \tag{2}$$

and similarly

$$\mathrm{SAD}(\hat{C}, R_1) = \sum_{i,j} \left| \hat{d}^{(1)}_{i,j} \right| = \sum_{i,j} \left| d^{(1)}_{i,j} - \left( d^{(1)}_{i,j} - \hat{d}^{(1)}_{i,j} \right) \right| \tag{3}$$

where $r^{(k)}_{i,j}$, $d^{(k)}_{i,j}$, and $\hat{d}^{(1)}_{i,j}$ denote the elements of $R_k$, $D_k$, and $\hat{D}_1$, respectively. Lemma 1 tells us that the variable $d^{(1)}_{i,j} - \hat{d}^{(1)}_{i,j}$ has zero mean; thus the expectations of $\mathrm{SAD}(\hat{C}, R_0)$ and $\mathrm{SAD}(\hat{C}, R_1)$ can be estimated as

$$E\left[\mathrm{SAD}(\hat{C}, R_0)\right] \approx \sum_{i,j} \left| d^{(0)}_{i,j} \right| = \mathrm{SAD}(C, R_0) \tag{4}$$

and similarly

$$E\left[\mathrm{SAD}(\hat{C}, R_1)\right] \approx \sum_{i,j} \left| d^{(1)}_{i,j} \right| = \mathrm{SAD}(C, R_1) \tag{5}$$

Since during the first compression the inequality $\mathrm{SAD}(C, R_0) < \mathrm{SAD}(C, R_1)$ holds, we have $E[\mathrm{SAD}(\hat{C}, R_0)] < E[\mathrm{SAD}(\hat{C}, R_1)]$. We can therefore conclude that, for an inter-MB whose MV has been changed for data hiding, the MV has an inclination to revert during recompression.

Lemma 1: For the differential coefficients, the difference between the original value and its reconstruction has zero mean, i.e., $E[d^{(1)}_{i,j} - \hat{d}^{(1)}_{i,j}] = 0$.

Proof: In the first compression, the differential signal $D_1$ has to be DCT coded and quantized. Bellifemine et al. [10] pointed out that, if motion compensation is used, the 2D-DCT coefficients of the differential signal tend to be less correlated. Thus the distribution of the DCT coefficients of $D_1$ can be well modeled with the Laplacian probability density function [11] as

$$f(x) = \frac{\lambda}{2} e^{-\lambda |x|} \tag{6}$$

Since the commonly used quantizer divides the sample by an integer $Q$ and rounds to the nearest integer, the probability that a sample will be quantized to $kQ$ is simply the probability that the sample is between $kQ - Q/2$ and $kQ + Q/2$, calculated as

$$p_k = \int_{kQ - Q/2}^{kQ + Q/2} f(x)\, dx \tag{7}$$

¹ Strictly speaking, the reference frame of $C$ has to be coded again before it is used as the reference frame of $\hat{C}$. Because it is already a compressed frame, a second compression under the same settings won't introduce much distortion compared to the first compression. Here the reference frame of $C$ is used as the reference frame of $\hat{C}$ as a close approximation.
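The zero-mean claim of Lemma 1 can be checked numerically under the Laplacian model of (6) and the quantizer model of (7). The following is a minimal sketch, assuming a Laplacian parameter `LAM`, a quantizer step `Q`, a finite range of cells `-K..K`, and a midpoint-rule integration with `M` subintervals per cell; none of these values come from the letter. It evaluates the cell probabilities and the expected difference between a sample and its reconstruction.

```python
import math

LAM = 0.2   # Laplacian parameter (assumed)
Q = 8       # quantizer step (assumed)
K = 20      # quantization cells -K..K are covered (tails beyond are negligible)
M = 400     # midpoint-rule subintervals per cell

def laplace_pdf(x):
    """Zero-mean Laplacian density with parameter LAM."""
    return 0.5 * LAM * math.exp(-LAM * abs(x))

def cell_integral(k, g):
    """Midpoint-rule integral of g(x) * f(x) over [kQ - Q/2, kQ + Q/2]."""
    lo = k * Q - Q / 2
    h = Q / M
    total = 0.0
    for m in range(M):
        x = lo + (m + 0.5) * h
        total += g(x) * laplace_pdf(x)
    return total * h

# Probability that a sample is quantized to k*Q.
probs = {k: cell_integral(k, lambda x: 1.0) for k in range(-K, K + 1)}

# Expected difference between a sample x and its reconstruction k*Q.
mean_err = sum(cell_integral(k, lambda x, k=k: x - k * Q)
               for k in range(-K, K + 1))

total = sum(probs.values())
print("sum of cell probabilities:", total)
print("expected quantization difference:", mean_err)
```

Because both the density and the quantization cells are symmetric about zero, the contributions of cells `k` and `-k` cancel, so the computed expectation is zero up to floating-point rounding, while the cell probabilities sum to approximately one.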

Then the expectation of the difference introduced by quantization is

$$E[x - \hat{x}] = \sum_{k=-\infty}^{\infty} \int_{kQ - Q/2}^{kQ + Q/2} (x - kQ)\, f(x)\, dx = 0 \tag{8}$$

where $x$ is a DCT coefficient, $\hat{x}$ is its reconstruction, and $f(x)$ is the Laplacian density of (6); i.e., the difference between a coefficient and its reconstruction has zero mean. Because the differential signal and its reconstruction are linear combinations of the original and the reconstructed coefficients, respectively, the difference between the differential signal and its reconstruction also has zero mean.

Now, if one specific MV changes to a different value after calibration, we call this reversion a shift. The shift distance is the distance between the MV before and after calibration. Given a compressed video with hidden messages, according to the above analysis, the modified MVs are likely to have nonzero shifts after recompression. So, compared to the corresponding nonstego video, the stego one is expected to have a lower proportion of zero-shift MVs and higher proportions of nonzero-shift MVs. Fig. 2 shows the proportion differences between stego and nonstego videos caused by different embedding methods. The embedding methods and the test videos used are described in Section IV-A.

Fig. 2. Proportion differences between stego (50 bpf) and nonstego videos.

III. PROPOSED STEGANALYZER

A. MV Reversion-Based Features

Based on the fact that modified MVs are inclined to revert during recompression, we define a differential operator applied to an inter-MB as

(9)

which returns a tuple computed from the MB's MVs and prediction errors before and after recompression. The first element of the tuple measures the MV shift distance, whereas the second measures the change in prediction error. Generally speaking, larger values of the operator indicate a larger probability that the MB's MV has once been modified. Given a group of compressed inter-frames and the inter-MBs they consist of, three types of features are defined as follows, with the MV shift distance limited by an upper bound.

1) Features of Type 1: These features estimate the probabilities of MV shift distances, defined as

(10)

using the cardinality of a set.

2) Features of Type 2: These features are proportions corresponding to given shift distances, defined as

(11)

3) Features of Type 3: These features are derived from the type 2 features by taking the MV shift distances into account, and are defined as

(12)

B. Steganalyzer Design

We choose five features from each of the three types and form a 15-dimensional feature vector for training and classification. To be more specific, the first five features are of type 1, and

(13)

The second and the last five features are of type 2 and type 3, respectively, and are processed analogously. The classifier is implemented using Chang and Lin's support vector machine (SVM) library [12] with the polynomial kernel.

IV. EXPERIMENTS

A. Experimental Setup

1) Test Sequences: A database of 22 CIF video sequences in the 4:2:0 YUV format is used for the experiments. Since the sequences vary in length from 90 to 2000 frames and most of them have 300 frames, each sequence is divided into nonoverlapping 75-frame subsequences, and the total number of subsequences is 111.

2) Steganographic Methods: Our experiments focus on attacking four MV-based steganographic methods, i.e., Aly's [2], Xu et al.'s [3], Fang and Chang's [4], and our recently proposed [5] methods, referred to as Tar1, Tar2, Tar3, and Tar4, respectively. These four targets are implemented using a well-known

MPEG-4 video codec, Xvid [14]. As the message bits are embedded into MVs, the embedding strength is measured by the average number of embedded bits per inter-frame (bpf). In our work, the considered embedding strength is 50 bpf.

3) Training and Classification: In our experiments, 15 YUV sequences consisting of 77 subsequences are randomly selected for training, and the remaining seven sequences are left for testing. All subsequences are compressed by the Xvid encoder with standard settings to produce the class of clean videos. On the other hand, for a given steganographic method, all subsequences are compressed with random messages embedded to create the class of stego videos. We use a fixed-size sliding window to scan each subsequence without overlapping, and the steganalytic features representing the clean or stego class are extracted from the frames within the window. It can be expected that as the window size increases, more stable statistical features can be obtained, whereas the resolution of the steganalyzer will decrease.

B. Performance Results

Besides our proposed features, C. Zhang et al.'s [13] steganalytic features are also used for comparison. The true negative (TN) rates and true positive (TP) rates are computed by counting the number of detections after a complete scan over each subsequence in the test set. The performance of the steganalyzers with different sliding window sizes is tested, and the corresponding results are recorded in Table I. As an example, the receiver operating characteristic (ROC) curves of the steganalyzers using an 8-frame sliding window are plotted in Fig. 3.

TABLE I. PERFORMANCE COMPARISON BETWEEN ZHANG'S (S1) AND OUR PROPOSED FEATURES (S2) USING DIFFERENT SLIDING WINDOW SIZES (WS), IN %

Fig. 3. ROC curves of steganalyzers using Zhang's and our proposed features.

In general, our proposed features outperform C. Zhang's by a significant margin. It is also observed that there is an evident drop in detection accuracy when testing Tar4. For Tar1–Tar3, we have pointed out in [5] that their SRs cannot guarantee minimum distortion, and the selected CMVs are modified in nonoptimal manners (e.g., LSB replacement) that substantially violate the encoding principles. As illustrated in Fig. 2(a)–(c), the modified MVs are more likely to revert after recompression. As for Tar4, since both the SR and the embedding process have been optimized in the hope that the introduced perturbations will be confused with normal motion estimation deviations, the tendency of MV reversion is suppressed, as can be seen from Fig. 2(d).

V. CONCLUSION AND FUTURE WORK

In this letter, we have presented a calibration-based steganalytic scheme against MV-based steganography. We have shown with both mathematical analysis and experiments that the perturbation of regular motion estimation causes MV reversion during recompression. The proposed features, which measure the tendency of MV reversion, can be used to effectively detect some typical MV-based steganography even at a low embedding strength. However, since our proposed features are sensitive to the tendency of MV reversion, if optimized measures are adopted to weaken this embedding effect, the detection performance is likely to drop. In our future work, we'd like to investigate how to improve the adaptability of the proposed features. Possible approaches include using higher-order features and adopting certain feature selection/fusion techniques.

REFERENCES

[1] J. Fridrich, "Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes," in Proc. IH'04, Lecture Notes in Computer Science, 2004, vol. 3200/2005, pp. 67–81.
[2] H. Aly, "Data hiding in motion vectors of compressed video based on their associated prediction error," IEEE Trans. Inf. Forensics Secur., vol. 6, no. 1, pp. 14–18, 2011.
[3] C. Xu, X. Ping, and T. Zhang, "Steganography in compressed video stream," in Proc. ICICIC'06, 2006, pp. 269–272.
[4] D. Fang and L. Chang, "Data hiding for digital video with phase of motion vector," in Proc. Int. Symposium on Circuits and Systems (ISCAS), 2006, pp. 1422–1425.
[5] Y. Cao, X. Zhao, D. Feng, and R. Sheng, "Video steganography with perturbed motion estimation," in Proc. IH'11, Lecture Notes in Computer Science, 2011, vol. 6958, pp. 193–207.
[6] U. Budhia, D. Kundur, and T. Zourntos, "Digital video steganalysis exploiting statistical visibility in the temporal domain," IEEE Trans. Inf. Forensics Secur., vol. 1, pp. 43–55, 2006.
[7] J. S. Jainsky, D. Kundur, and D. R. Halverson, "Towards digital video steganalysis using asymptotic memoryless detection," in Proc. MM&Sec'07, 2007, pp. 161–168.
[8] C. Zhang, Y. Su, and C. Zhang, "Video steganalysis based on aliasing detection," Electron. Lett., vol. 44, no. 13, pp. 801–803, 2008.
[9] V. Pankajakshan, G. Doerr, and P. K. Bora, "Detection of motion-incoherent components in video streams," IEEE Trans. Inf. Forensics Secur., vol. 4, no. 1, pp. 49–58, 2009.
[10] F. Bellifemine, A. Capellino, A. Chimienti, R. Picco, and R. Ponti, "Statistical analysis of the 2D-DCT coefficients of the differential signal for images," Signal Process.: Image Commun., vol. 4, no. 6, pp. 477–488, 1992.
[11] M. J. Gormish and J. T. Gill, "Computation-rate-distortion in transform coders for image compression," SPIE Vis. Commun. Image Process., pp. 146–152, 1993.
[12] C. Chang and C. Lin, LIBSVM: A Library for Support Vector Machines, 2001 [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[13] C. Zhang, Y. Su, and C. Zhang, "A new video steganalysis algorithm against motion vector steganography," in Proc. WiCOM'08, 2008, pp. 1–4.
[14] Xvid Codec 1.1.3, 2009 [Online]. Available: http://www.xvid.org/
