Polytechnic University, Dept. of Electrical and Computer Engineering


EE4414 Multimedia Communication System II
Fall 2005, Yao Wang
___________________________________________________________________________________

Homework 6 Solution (Video Coding Standards)

Reading Assignment:
• Lecture slides
• K. R. Rao, Z. S. Bojkovic, D. A. Milovanovic, Multimedia Communication Systems: Techniques,
Standards, and Networks, Prentice Hall PTR, 2002. (Chap.5)
• Excerpts from EL514-Multimedia Lab Manual on video coding

Written Assignment (Quiz based on selected problems on 11/1)

1. What is the target application of the H.320 standard? What is the video coding standard used in
H.320?

H.320 is the standard for audio-visual conferencing/telephony over ISDN channels. It is mainly
used for teleconferencing in business and education. H.261 was developed as the video coding
standard at the time H.320 was developed, but newer systems can also use H.263.

2. What is the target application of the H.323 standard? What is the video coding standard used in
H.323?

H.323 is the standard for audio-visual conferencing/telephony over packet-switched networks
that do not provide guaranteed quality of service, mainly the Internet. It allows both the H.261
and H.263 standards for video coding, but H.263 is preferred.

3. What is the target application of the H.324 standard? What is the video coding standard used in
H.324?

H.324 is the standard for audio-visual conferencing/telephony over circuit-switched telephone
networks, using either wired or wireless phone modems. It allows both the H.261 and H.263
standards for video coding, but H.263 is preferred.

4. What are the main differences between H.320, H.323 and H.324 applications in terms of
available bandwidth and delay variation?

The H.320 and H.324 standards are targeted at circuit-switched networks, which allocate a
dedicated channel to each communication session. Therefore the available bandwidth is fixed
and the delay variation is small. Because of this fixed bandwidth and delay, the quality of the
received audio and video stays fairly constant over time. H.320 uses ISDN channels, with rates
much higher than those affordable by either the wired or wireless modems used by an H.324
system. ISDN channels are also very reliable (very low bit error rates). The H.320 system is
mainly used within large corporations. The channel quality for H.324 applications depends on
the underlying connection: wired channels are more reliable than wireless channels. For
wireless channels, a large portion of the bandwidth is used for channel error correction, so the
bandwidth available for sending audio and video signals is much lower.

The H.323 standard is targeted at packet-switched networks that do not guarantee quality of
service, with large variations in available bandwidth and end-to-end delay. Because of this
variation, the quality of the received audio and video signals can vary greatly over time.

5. H.261 and H.263 video coding standards differ mainly in how motion estimation is performed.
Describe some of the techniques adopted in H.263 that helped improve its coding efficiency
over H.261.

The H.261 standard performs motion estimation at integer-pel accuracy only, and with a
relatively small search range (-16, 16). H.263 allows half-pel accuracy motion estimation and a
larger maximum search range (-32, 32). It also allows a 16x16 macroblock to be divided into four
8x8 blocks for more accurate motion estimation, which is helpful when a macroblock contains
multiple objects with different motions. It further allows, as an option, overlapped block motion
compensation, which can suppress blocking artifacts in predicted images. All of these techniques
help to improve the prediction accuracy and consequently the coding efficiency.
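
To make the half-pel idea concrete, here is a minimal Python sketch (using NumPy; the function
names, block size, and search window are illustrative choices, not from either standard). It runs
an integer-pel full search, then checks the eight surrounding half-pel positions obtained by
bilinear interpolation of the reference frame:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences, the usual block-matching criterion."""
    return np.abs(a - b).sum()

def interp_block(ref, y2, x2, n):
    """Extract an n x n block at half-pel position (y2/2, x2/2) using
    bilinear interpolation, as half-pel compensation requires."""
    iy, fy = divmod(y2, 2)
    ix, fx = divmod(x2, 2)
    a = ref[iy:iy + n + 1, ix:ix + n + 1].astype(float)
    if fx:
        a = (a[:, :-1] + a[:, 1:]) / 2   # horizontal half-pel average
    if fy:
        a = (a[:-1, :] + a[1:, :]) / 2   # vertical half-pel average
    return a[:n, :n]

def half_pel_search(ref, cur, by, bx, n=16, rng=4):
    """Toy search for the block of cur at (by, bx): integer-pel full search
    over a (2*rng+1)^2 window, then half-pel refinement around the winner.
    Returns (SAD, dy, dx) with dy, dx in half-pel units."""
    block = cur[by:by + n, bx:bx + n].astype(float)
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= ref.shape[0] - n and 0 <= x <= ref.shape[1] - n:
                c = sad(block, ref[y:y + n, x:x + n])
                if best is None or c < best[0]:
                    best = (c, dy, dx)
    c0, dy0, dx0 = best
    best2 = (c0, 2 * dy0, 2 * dx0)
    # Refine at the 8 half-pel neighbours of the integer-pel winner.
    for hy in (-1, 0, 1):
        for hx in (-1, 0, 1):
            y2, x2 = 2 * (by + dy0) + hy, 2 * (bx + dx0) + hx
            if (0 <= y2 and y2 // 2 + n < ref.shape[0]
                    and 0 <= x2 and x2 // 2 + n < ref.shape[1]):
                c = sad(block, interp_block(ref, y2, x2, n))
                if c < best2[0]:
                    best2 = (c, y2 - 2 * by, x2 - 2 * bx)
    return best2
```

Returning the motion vector in half-pel units mirrors how H.263 transmits half-pel vectors; a
real encoder would also add the 8x8 sub-block mode and, optionally, overlapped compensation.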

6. What is the target application of MPEG-1? What are the different parts of MPEG-1 standard?

MPEG-1 was initially developed to enable storage of a 2-hour movie on a CD, and it is the
standard used to produce VCDs. MPEG-1 is now also used to distribute video together with audio
over the Internet. The MPEG-1 standard contains several parts, including a video coding part, an
audio coding part, and a systems part that deals with how to synchronize the audio and video.

7. Describe some of the differences between the MPEG-1 and H.261/H.263 video coding standards.

H.261 and H.263 are targeted at two-way video conferencing/telephony, which has a stringent
delay requirement. This low-delay requirement rules out coding a frame as a B-frame, which
requires coding a future frame first and causes a fairly large delay. Both the encoder and the
decoder have to process the video in real time to enable effective communication between people
at different locations, so neither can be overly complex. Also, the bit stream should have a
fairly constant rate, so as not to cause large delay variations during transmission. This
requirement prevents the encoder from inserting I-frames periodically; instead it inserts
I-blocks only when necessary (either for coding efficiency or for error resilience). MPEG-1, on
the other hand, is targeted at viewing a video that is either pre-compressed or compressed live,
but does not involve two-way communication. The encoder is located at the originator of the
video content and can be fairly complex. The bit rate can vary widely as long as the decoder has
a large buffer. The viewer just needs a decoder to view the compressed video, but the compressed
bitstream should allow random access (fast forward, rewind, etc.). In MPEG-1 (and MPEG-2), this
is enabled by organizing the video frames into groups of pictures (GoPs), with each group
starting with an I-frame followed by P-frames. Between successive anchor frames, the encoder can
also use B-frames for enhanced coding efficiency and random-access capability.

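As a concrete illustration of the GoP structure, the Python sketch below (illustrative code, not
part of any standard; GoP length and anchor spacing are free parameters) generates the frame
types in display order and the corresponding coding order, in which each B-frame is coded after
both anchor frames it predicts from:

```python
def gop_types(n, m):
    """Frame types in display order for a GoP of n frames with an
    anchor (I or P) every m frames, e.g. n=9, m=3 -> I B B P B B P B B."""
    return ['I' if i == 0 else ('P' if i % m == 0 else 'B') for i in range(n)]

def coding_order(types):
    """Reorder frame indices so each B-frame follows the anchors it
    predicts from: emit the anchor first, then the B-frames before it."""
    order, pending = [], []
    for i, t in enumerate(types):
        if t == 'B':
            pending.append(i)
        else:
            order.append(i)        # anchor is coded first...
            order.extend(pending)  # ...then the B-frames it closes off
            pending = []
    return order + pending
```

For `gop_types(9, 3)` the display order is I B B P B B P B B and `coding_order` returns
[0, 3, 1, 2, 6, 4, 5, 7, 8]; in a real encoder the trailing B-frames would be coded after the
first frame of the next GoP, which this sketch simply appends at the end.
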
8. What is the target application of MPEG-2? What are the different parts of MPEG-2 standard?

When the MPEG-2 standard was first developed, the major target application was to store a 2-hour
video at BT.601 resolution (704x480) on a DVD with quality comparable to or better than that of
broadcast TV. Later on, the scope of the standard was expanded to cover broadcasting of video
(SD and HD) over the air, over cable, and over other types of networks. The MPEG-2 standard
includes a systems part, a video coding part, an audio coding part, and several parts dealing
with transmission aspects.

9. Describe some of the differences between the MPEG-1 and MPEG-2 video coding standards.

The main difference is that MPEG-2 must handle interlaced video. For this purpose, different
modes of motion estimation and DCT coefficient scanning were developed that handle interlaced
sequences more efficiently. In addition, the MPEG-2 standard has options that allow a video to
be coded into two layers. It also has a profile dealing with how to code stereo video or, more
generally, multiple-view video.

10. What are the different ways that MPEG-2 uses to generate two layer video? Explain each
briefly.

MPEG-2 has four ways to generate layered video:

1) Data partitioning: the video is first coded with the conventional method into one bit stream.
The bits for each MB are then split between a base layer and an enhancement layer. The base
layer includes the header and motion information and the first few low-frequency DCT
coefficients; the enhancement layer includes the remaining coefficients. The base layer alone
yields a somewhat blurred version of the original video; the enhancement layer carries the
detail information and, when added to the base layer, provides a clearer representation.

2) SNR scalability: each frame is first coded with the conventional method but with a large
quantization step size; the resulting bits constitute the base layer. The quantization error of
the DCT coefficients is then quantized again with a smaller step size; the resulting bits
constitute the enhancement layer. The base layer alone yields a coarsely quantized version of
the original video; the enhancement layer together with the base layer yields a more accurate
version.

3) Spatial scalability: the base layer codes a down-sized version of the original video. The
enhancement layer codes the original size, with each frame predicted from the past coded frame
at the original size, from the interpolated version of the current frame produced by the base
layer, or from a weighted sum of both. The enhancement layer together with the base layer yields
the original-size video (with coding artifacts).

4) Temporal scalability: the base layer codes the original video (say 30 frames/s) at a lower
frame rate (say 10 frames/s); the enhancement layer codes the skipped frames, using either the
coded frames in the base layer or the past coded frames in the enhancement layer for
motion-compensated temporal prediction.
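The SNR-scalability idea (method 2 above) can be demonstrated numerically. The sketch below uses
made-up DCT coefficient values and step sizes, chosen purely for illustration:

```python
import numpy as np

def quantize(x, step):
    """Uniform quantizer: reconstruct as round(x / step) * step."""
    return np.round(x / step) * step

# Hypothetical DCT coefficients for one block.
coeffs = np.array([103.0, -41.0, 7.5, -2.2])

# Base layer: coarse quantization with a large step size.
base = quantize(coeffs, 16.0)

# Enhancement layer: requantize the base layer's quantization error
# with a smaller step size.
refinement = quantize(coeffs - base, 4.0)

# Two-layer reconstruction: closer to the original than the base alone.
enhanced = base + refinement
```

Here the base layer reconstructs the coefficients only coarsely (errors up to half the large
step), while adding the refinement shrinks every error to at most half the small step.
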

11. MPEG-4 video coding standard uses the so-called “object-based coding”. Describe what it is
and how a receiving user may make use of it? What are the three types of information contained
in each object?

With object-based coding, a video sequence is decomposed into multiple objects, and each object
is coded separately. This enables the encoder to code different objects with different accuracy;
for example, a foreground moving object can be coded more accurately than the background. The
receiver can compose the objects as desired: it can choose not to decode certain objects, change
the viewing angle of one or several objects when displaying the sequence, or replace an object
with other pre-stored objects. The information transmitted for each object includes its shape,
its motion, and its texture (the color intensities in the initial frame and the prediction
errors in the following frames).

12. Describe some techniques incorporated in H.264 that helped improve its coding efficiency
over H.263/MPEG-4.

1) Intra-prediction: in the prior standards, the pixels of an INTRA-mode block are coded
directly through transform coding, which does not exploit the spatial correlation between pixels
in the block and pixels in adjacent blocks. Intra-prediction in H.264 makes use of this
correlation, with different intra-prediction modes exploiting correlation along different
directions.

2) Integer transform: instead of the DCT, H.264 uses an integer transform that approximates the
DCT but whose computations can all be done with integer operations. This eliminates any
numerical mismatch between the forward transform at the encoder and the inverse transform at
the decoder. The transform block size can also vary from block to block, depending on which
size gives the best representation.

3) More accurate motion estimation, with quarter-pel motion vector accuracy and variable block
sizes from 16x16 down to 4x4. Also, instead of a single reference frame, the encoder can choose
among several reference frames, and bidirectional prediction is generalized to allow prediction
from any two reference frames (both may even be past frames) with arbitrary weights.

4) More efficient entropy coding, including context-based adaptive arithmetic coding.

5) In-loop deblocking filtering to remove blocking artifacts in reconstructed images.
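The integer-transform point can be illustrated with the 4x4 forward core transform matrix used
in H.264. The sketch below shows that the forward transform is computed entirely in integers
and that, because the rows of the matrix are orthogonal, it can be inverted exactly with a
transpose plus a diagonal scaling. (In the real codec this scaling is folded into quantization,
and the standard's inverse transform matrix differs slightly from the plain transpose used
here; this is a demonstration of the invertibility property, not the bit-exact specification.)

```python
import numpy as np

# Forward core transform matrix of the H.264 4x4 integer transform.
Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def forward(X):
    """2-D forward transform Y = Cf X Cf^T: integer in, integer out."""
    return Cf @ X @ Cf.T

# Rows of Cf are mutually orthogonal: Cf @ Cf.T = diag(4, 10, 4, 10),
# so inversion needs only a transpose and per-row scaling.
D_inv = np.diag(1.0 / np.array([4.0, 10.0, 4.0, 10.0]))

def inverse(Y):
    """Exact inverse via transpose plus diagonal scaling, so encoder and
    decoder reconstructions cannot drift apart numerically."""
    return Cf.T @ D_inv @ Y @ D_inv @ Cf
```

Because `forward` never leaves integer arithmetic, every conforming implementation produces
bit-identical coefficients, which is exactly the mismatch-free behavior the floating-point DCT
of the earlier standards could not guarantee.
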
