
JOURNAL OF TELECOMMUNICATIONS, VOLUME 28, ISSUE 1, NOVEMBER 2014

Performance Analysis of Variable Block-Size Motion Estimation in H.264/AVC
S. M. Shamsul Alam and Sehrish Khan
Abstract: Motion estimation with variable macro block sizes in H.264 plays a vital role in improving coding efficiency over schemes that use fixed macro block size motion compensation. However, searching for motion vectors with variable macro block sizes incurs different computational complexities. This paper describes different motion estimation algorithms applied with variable macro block sizes and multiple frame differences. Extensive simulation results show that exhaustive search with a smaller macro block size outperforms the other search techniques. However, the computational complexity of exhaustive search is much higher than that of the three step search (TSS), new three step search (NTSS), four step search (FSS) and diamond search (DS) techniques. To find an optimum solution, this paper compares different motion estimation techniques on diverse picture sequences. In this comparison, DS shows very low computational complexity with low search parameters. Based on these observations, an optimization technique is required to select the macro block (MB) mode, the search parameter (SP) size, and the block matching algorithm. To obtain this adaptation, affine transformation based motion prediction is applied. In addition, the PSNR (Peak Signal to Noise Ratio) is plotted for multiple frame differences. Simulation results show that PSNR depends on how many similar sequences are present in the reference picture frames. These simulations are useful for designing new motion estimation algorithms that use multiple reference frames.
Index Terms: Motion Estimation, Exhaustive Search (ES), Three Step Search (TSS), New Three Step Search (NTSS), Four Step Search (FSS), Diamond Search (DS), Peak Signal to Noise Ratio (PSNR), Macro Block Size.

1 INTRODUCTION

In general video coding systems, motion estimation (ME) can efficiently eliminate temporal redundancy between adjacent frames. At the same time, ME is regarded as a vital component of a video coder because it consumes a large amount of computational resources. In the H.264 video coding standard in particular, ME accounts for most of the encoder complexity because of its seven different block sizes and 1/4-pel accuracy fractional pixel search. Therefore, simplifying the ME process is essential for real-time applications [1]. Differing device constraints and network limitations are the main challenges in broadcasting video to different terminal devices, which have different resolutions. For that reason, down-sampling video transcoding is commonly used to adapt the bit-rates of video streams to device requirements [2]. In video processing, 2D and 3D ME are very popular due to their simplicity and effectiveness. All ME algorithms are based on temporal changes in image intensity. The observed 2D motion based on intensity changes may not be the same as the actual 2D motion. The velocity field of the observed or apparent 2D motion vectors is referred to as optical flow. Fig. 1 shows ME between two given frames, (x, y, t1) and (x, y, t2); the motion vector (MV) at x between times t1 and t2 is defined as the displacement of this point from t1 to t2. In Fig. 1, the tracked frame is the frame that follows (forward ME) or precedes (backward ME) the anchor frame.
[Figure 1: an anchor frame at time t containing a point x, with tracked frames at time t-Δt (backward motion estimation, displacement d(x; t-Δt)) and at time t+Δt (forward motion estimation, displacement d(x; t+Δt)).]
Fig. 1. Forward and backward ME [1].

2 VARIABLE BLOCK-SIZE MOTION ESTIMATION


Variable block-size motion estimation (VBSME) is a new coding technique that provides more accurate predictions than traditional fixed block-size motion estimation (FBSME) [3]. In VBSME, one 16 x 16 macro block is further partitioned into smaller blocks, as shown in Fig. 2 [2]. If an MB contains two objects with different motion directions, the coding performance for this MB is worse under FBSME. If the MB can instead be divided into smaller blocks that fit the different motion directions, as in VBSME, the coding performance improves significantly. For that reason, in H.264/AVC an MB can be divided into seven kinds of blocks: 4 x 4, 4 x 8, 8 x 4, 8 x 8, 8 x 16, 16 x 8, and 16 x 16.

S. M. Shamsul Alam is with the Electronics and Communication Engineering Discipline, Khulna University, Khulna, Bangladesh.
Sehrish Khan is with the Electronics and Communication Engineering Discipline, Khulna University, Khulna, Bangladesh.
© 2014 JOT, www.journaloftelecommunications.co.uk

Fig. 2. Variable block-size motion estimation in H.264/AVC [2].

Although VBSME gives good results and a higher compression ratio, its computational complexity is very high and its hardware implementation is difficult. This paper analyzes the performance of exhaustive search (ES), three step search (TSS), new three step search (NTSS), four step search (FSS) and diamond search (DS) for different block sizes.
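
As a small illustration of the seven block sizes listed above, the sketch below counts how many blocks (and hence motion vectors) result when one 16 x 16 MB is tiled uniformly with each size. The function and variable names are hypothetical, and uniform tiling is a simplifying assumption: H.264/AVC chooses the sub-8x8 sizes separately within each 8x8 sub-block.

```python
# Block sizes (width, height) that H.264/AVC allows inside one 16 x 16 macro block.
BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
MB_SIZE = 16


def blocks_per_macroblock(width, height, mb_size=MB_SIZE):
    """Number of width x height blocks needed to tile one macro block uniformly."""
    return (mb_size // width) * (mb_size // height)


def partition_table():
    """Map each block size to its block count (one motion vector per block)."""
    return {(w, h): blocks_per_macroblock(w, h) for (w, h) in BLOCK_SIZES}


if __name__ == "__main__":
    for (w, h), count in partition_table().items():
        print(f"{w:>2} x {h:<2}: {count:>2} block(s), {count} motion vector(s)")
```

The count grows from 1 motion vector at 16 x 16 to 16 at 4 x 4, which is one reason smaller blocks cost more to search and to signal.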

3 BLOCK MATCHING ALGORITHMS

In motion estimation, a block matching algorithm (BMA) is a technique for locating, in a previous frame, the block that best matches a block of the current frame. It is used to exploit the temporal redundancy in the video sequence. In a BMA, the current frame is divided into a matrix of macro blocks, each of which is compared with the corresponding block and its neighbors in the previous frame to create a motion vector. These vectors specify the movement of the macro blocks relative to the reference frame, and the predicted frame is generated by applying the motion vectors to the reference frame. For this reason, finding the best-matching vectors is key to obtaining a predicted frame that is nearly identical to the current one. The search area for a good macro block match is formed as shown in Fig. 3: p pixels on all four sides of the corresponding macro block in the previous frame are searched, where p is known as the search parameter [4].
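
The search window described above can be sketched as an exhaustive search (ES) over every displacement within p pixels, using the mean absolute difference (MAD) that this paper adopts as the matching cost. This is a minimal sketch, not the authors' implementation: frames are assumed to be lists of rows of grey-level integers, the block at (cx, cy) is assumed to lie fully inside the frame, and all function names (including the synthetic-frame helper) are hypothetical.

```python
def mad(cur, ref, cx, cy, rx, ry, n):
    """Mean absolute difference between the n x n block of `cur` at (cx, cy)
    and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
    return total / (n * n)


def exhaustive_search(cur, ref, cx, cy, n=16, p=7):
    """Return (dx, dy, cost) for the best match of the block at (cx, cy),
    testing every displacement with |dx| <= p and |dy| <= p."""
    height, width = len(ref), len(ref[0])
    # Start from zero displacement so ties favour the shortest vector seen first.
    best = (0, 0, mad(cur, ref, cx, cy, cx, cy, n))
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that would fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = mad(cur, ref, cx, cy, rx, ry, n)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best


def make_test_frames(size=16, shift_x=2, shift_y=1):
    """Hypothetical synthetic data: `cur` shows the content of `ref`
    displaced by (shift_x, shift_y)."""
    def f(x, y):
        return (7 * x + 13 * y + x * y) % 256
    ref = [[f(x, y) for x in range(size)] for y in range(size)]
    cur = [[f(x + shift_x, y + shift_y) for x in range(size)] for y in range(size)]
    return cur, ref


if __name__ == "__main__":
    cur, ref = make_test_frames()
    print(exhaustive_search(cur, ref, 4, 4, n=4, p=3))
```

On the synthetic frames the search recovers the displacement that was built in, with zero cost. The fast algorithms compared in this paper (TSS, NTSS, FSS, DS) evaluate the same cost at far fewer candidate positions.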

[Figure 3: the current macro block (16 x 16) centered in the search block, which extends p = 7 pixels beyond the macro block on every side.]
Fig. 3. A macro block of size 16 x 16 with a search parameter p of 7 pixels [4].

In this paper, we use different search parameters (SP) with different macro block (MB) sizes. The best match between the reference and current MB depends on the cost function. In this paper, the mean absolute difference (MAD) is used as the cost function, which is given as

\[ \mathrm{MAD} = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right| \qquad (1) \]

where N is the size of the MB, and C_ij and R_ij are the pixels being compared in the current and reference blocks, respectively [4]. Using (1), the minimum cost value indicates the best match between the current and reference frames. In this paper, the peak signal to noise ratio (PSNR) is used to measure the quality of the motion compensated image. The compensated image is compared with the original image to measure the error introduced by the compression, and the PSNR value is calculated as

\[ \mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}^2}{\mathrm{MSE}} \right) \qquad (2) \]

where MAX is the maximum possible pixel value of the image and the mean square error (MSE) is calculated between the original and compensated images [5]. A higher PSNR generally indicates a higher quality reconstructed image. For each MB size and SP, we have applied different BMA techniques. Although many BMAs exist, in this paper ES, TSS, NTSS, FSS and DS are applied with 4x4, 8x8, 16x16, 8x16 and 16x8 MB sizes on different picture sequences. In addition, we have applied different SP values, i.e. p = 7, 5 and 3. Moreover, multiple reference frames are taken into account when generating the estimated motion vectors.

4 SIMULATION RESULTS

We compare five block-matching algorithms with different macro block sizes on the 30-frame picture sequences "missa", "gs", "caltrain" and "surfside". ES, TSS, NTSS, FSS and DS are analyzed with different macro block sizes and different frame differences d = 2, 3, 4, 5, 6, 7, 8. Fig. 4 shows the performance of ES, TSS, NTSS, FSS and DS ME.

Fig. 4. ME with variable macro block-size (Picture Seq.: missa).
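
The PSNR measure of (2), which all the comparisons in this section rely on, can be sketched as follows. This is an illustrative version only: images are assumed to be equally sized lists of rows of grey-level integers, MAX is assumed to be 255 (8-bit samples), and the function names are hypothetical.

```python
import math


def mse(original, compensated):
    """Mean square error between two equally sized grey-level images."""
    rows, cols = len(original), len(original[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            diff = original[y][x] - compensated[y][x]
            total += diff * diff
    return total / (rows * cols)


def psnr(original, compensated, max_value=255):
    """PSNR in dB as in (2); max_value=255 assumes 8-bit pixels."""
    error = mse(original, compensated)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    original = [[100] * 4 for _ in range(4)]
    compensated = [[105] * 4 for _ in range(4)]
    print(round(psnr(original, compensated), 2))
```

A uniform error of 5 grey levels gives an MSE of 25 and a PSNR of about 34.15 dB. A smaller residual error after motion compensation yields a higher PSNR.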

Fig. 5. ME with variable macro block-size (Picture Seq.: gs).

Fig. 6. ME with variable search parameter (Picture Seq.: missa, left, MB size 16x16; gs, middle, MB size 4x4; gs, right, MB size 16x16).

From Fig. 4, it can be seen that as the block size is gradually reduced, the PSNR value gradually increases. For MB size 4 x 4, the PSNR value is very high compared to the other block sizes. Similarly, Fig. 5 depicts the performance of the BMAs for different block sizes. From this figure, for macro block size 4 x 4, FSS performs better than the others. These figures show that, for picture sequences "missa" and "gs", neither ES nor FSS is best everywhere. In picture sequence "missa", the response of FSS with MB size 4x4 is good between frames 20 and 25. On the other hand, in picture sequence "gs", the response of ES is good between frames 20 and 25. In a nutshell, the performance of the BMAs depends on the changes in the picture sequences. That means the mode of an MB is first set to 16 x 16, and its PSNR value is calculated after the motion vectors are obtained. If this MB size fails to reach a sufficiently high PSNR value, the MB is divided into smaller blocks (size 8x8 or 4x4). The mode of the MB is thus selected based on the correlation between coherent picture sequences.

Fig. 6 shows the response of the motion estimation algorithms for variable search parameters p = 3, 5, 7 and MB size 16x16. The response to the search parameter depends on the coherent change of the picture sequences. A lower p value does not always give a good PSNR value. For example, in picture sequence "missa", search parameters p = 3 and 5 give high PSNR values. On the other hand, in picture sequence "gs", p = 7 gives a very good response compared to lower p values. Here, for MB size 4x4 and picture sequence "gs", ES outperforms DS under search parameter p = 7. But for MB size 16x16 and picture sequence "gs", DS outperforms NTSS under search parameter p = 7. Therefore, a further tradeoff between SP, MB size and coherent motion change is required.

Figs. 4, 5 and 6 include only two picture sequences, "missa" and "gs". The experiment was also carried out on other picture sequences such as "caltrain" and "surfside". Across all these results, smaller macro blocks show higher performance, that is, the PSNR value is very high for small macro block sizes. However, PSNR values depend on the individual frame information of the different picture sequences, and in terms of computational complexity ES is more complex than the other techniques. Fig. 7 depicts the response of the different BMA techniques in terms of computation.

Since ES is a full search algorithm, it requires a higher number of computations. DS, on the other hand, requires far fewer computations than the other algorithms. Fig. 7 is simulated using the picture sequence "gs" with 4x4 MB size and search parameter p = 3. Fig. 8 shows the number of computations for various MB sizes, and hence the complexity behavior of the different algorithms under different MB sizes.
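
To give a feel for why ES is costly, the sketch below counts candidate positions per block. ES tests every displacement in the window, i.e. (2p + 1)^2 positions. The classic three step search with step sizes 4, 2, 1 (the usual choice for p = 7) tests 9 positions in the first step and 8 new ones in each later step. The counts are upper bounds that ignore frame borders, the function names are hypothetical, and DS is left out because its count depends on the picture content.

```python
def es_search_points(p):
    """Candidate positions tested by exhaustive search with search parameter p."""
    return (2 * p + 1) ** 2


def tss_search_points(initial_step=4):
    """Candidate positions tested by the three step search when the step size
    is halved each step (4, 2, 1 for the usual p = 7 window)."""
    points = 9  # first step: centre plus its 8 neighbours
    step = initial_step // 2
    while step >= 1:
        points += 8  # the centre is already known; only 8 new neighbours
        step //= 2
    return points


if __name__ == "__main__":
    for p in (3, 5, 7):
        print(f"ES,  p = {p}: {es_search_points(p)} positions per block")
    print(f"TSS, p = 7: {tss_search_points(4)} positions per block")
```

For p = 7 this gives 225 positions for ES against 25 for TSS, and each position costs one MAD evaluation over the block.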

Fig. 7. Number of computations for different BMA techniques.

Fig. 8. Number of computations for different BMA techniques using different MB sizes.


From Fig. 8, it is clear that the number of computations increases abruptly with respect to the MB size. Furthermore, the mode of the MB depends on the coherent correlation of the picture sequences. Therefore, an optimization technique is required to select the MB mode, the search parameter (SP) size, and the block matching algorithm; Fig. 8 also shows that DS requires far fewer computations than the other search algorithms. To obtain this adaptation, affine transformation based motion prediction is applied. Fig. 9 is simulated with the BMA techniques together with affine transform based motion prediction, using the "gs" picture sequence with MB size 8x8. In the vicinity of frame number 10, the affine transform based motion prediction shows a very low PSNR value.

For this reason, ES based motion estimation is applied to calculate the motion vectors there. On the other hand, in the vicinity of frame number 20, affine transform based motion prediction shows a very high PSNR value, and because of its low computational complexity, DS based motion estimation is applied. Fig. 10 shows the simulation of ES based ME for different frame differences. From this figure, a low frame difference does not always give a good result. A smaller motion estimated residual error corresponds to a higher PSNR value. So the similarity of the motion estimated block depends on how many frames are included in the reference frames, that is, on how many forward and backward frames are taken while matching the motion vector. Fig. 10 shows that no single frame difference gives good results over the entire 100-frame simulation. For example, in the vicinity of frame number 20, frame difference 2 gives a very poor result and frame difference 7 gives a higher PSNR value than the others.

On the other hand, at frame number 60, it is better to create the motion vector using the previous or second previous frame as reference. Based on this simulation result, it is wise to take multiple frames into account when generating the current motion vectors. However, if multiple reference frames are included, the complexity increases dramatically. Different algorithms are therefore required to reduce the complexity and ease hardware implementation.

Fig. 9. Affine transform based motion estimation.

Fig. 10. ES ME with variable frame differences (Picture Seq.: missa).

5 CONCLUSIONS

We have presented five motion estimation algorithms with different macro block sizes and different frame differences. From the above discussion, no single block matching algorithm is best for motion estimation in all cases. Given the performance of different MB sizes, the effect of the search parameter p, the efficiency of the BMA techniques and the correlative change between coherent picture sequences, an optimization is required to make the system adaptive. Following this concept, affine transform based motion prediction has been applied to minimize the computational complexity without decreasing the image quality. These concepts are widely used in high-level video coding. From these results, engineers can obtain useful information for designing motion estimation algorithms that use multiple reference frames and variable macro block sizes.

REFERENCES
[1] P. Muralidhar, C. B. Rama Rao, and I. Ranjith Kumar, "Efficient Architecture for Variable Block Size Motion Estimation of H.264 Video Encoder," International Conference on Solid-State and Integrated Circuit (ICSIC), Singapore, IPCSIT vol. 32, pp. 1-7, 2012.
[2] Q. Tang, P. Nasiopoulos, and R. Ward, "Efficient motion vector re-estimation for MPEG-2 to H.264/AVC transcoding with arbitrary down-sizing ratios," IEEE International Conference on Image Processing, Egypt, pp. 3689-3692, 2009.
[3] C. Y. Chen, S. Y. Chien, Y. W. Huang, T. C. Chen, T. C. Wang, and L. G. Chen, "Analysis and Architecture Design of Variable Block-Size Motion Estimation for H.264/AVC," IEEE Trans. Circuits and Systems, vol. 53, pp. 578-593, March 2006.
[4] A. Barjatya, "Block Matching Algorithms For Motion Estimation," DIP 6620, Spring 2004.
[5] S. E. Kim, J. K. Han, and J. G. Kim, "An Efficient Scheme for Motion Estimation Using Multireference Frames in H.264/AVC," IEEE Trans. Multimedia, vol. 8, no. 3, pp. 457-466, 2006.

S. M. Shamsul Alam received the B.Sc. (Engg.) degree in Electronics and Communication Engineering from Khulna University, Khulna, Bangladesh, in 2004 and the M. Engg. degree from the Department of Information and Communication Engineering, Chosun University, Gwangju, Korea, under a Global IT, NIPA scholarship program in 2013. From 2011 to 2013, he worked as a Research Assistant with the Department of Information and Communication Engineering, Chosun University, Gwangju, Korea. Currently, he is a Faculty Member in the Electronics and Communication Engineering (ECE) Discipline, Khulna University, Khulna, Bangladesh. His research interests include Chip Design and Application Specific Processor Design for communication systems.

Sehrish Khan received the B.Sc. (Engg.) degree in Electronics and Communication Engineering from Khulna University, Khulna, Bangladesh, in 2002 and the M. Engg. degree in Telecommunications from the Asian Institute of Technology (AIT), Thailand, under a France Government Scholarship and AIT Fellowship in 2006. Currently, she is a Faculty Member in the Electronics and Communication Engineering (ECE) Discipline, Khulna University, Khulna, Bangladesh. Her research interests include Image Processing, Wireless Communication and Ad-hoc networks.
