
Video coding (Part 5) Microsoft Windows Media and VC-1

Yi-Shin Tung National Taiwan University (NTU)


- Windows Media family and its evolution
- WMV applications
- Video coding tools
- Comparison with MPEG-2, H.264/AVC
- Performance evaluations
- Conclusions

Goal and applications


- Focuses on streaming compressed audio and video over the Internet to personal computers.
- Has a vision of enabling effective delivery of digital media through any network to any device.
- Applications include:
  - Internet-based applications such as Web broadcast and VOD
  - Consumer electronics such as DVD, car audio and mobile phones
  - Terrestrial and satellite broadcast (DVB-T and DVB-S)

WM end-to-end delivery

- Windows SDK
- WM porting kit

Windows Media Codecs


Audio codecs
- Windows Media Audio 9 (mono/stereo, 8 kHz to 48 kHz, 5 kbps to 320 kbps; CD quality at 48 to 128 kbps)
- Windows Media Audio 9 Professional (5.1 or 7.1 channels, up to 96 kHz, up to 24 bits/sample, 128 kbps and above)
- Windows Media Audio 9 Lossless (2:1 ratio for stereo)
- Windows Media Audio 9 Voice (mono, 4 kbps to 20 kbps, hybrid CELP/transform coding)

Video codecs
- Windows Media Video 7 and 8 (non-standard versions of MPEG-4)
- Windows Media Video 9 (VC-9, VC-1): 160x120 at 10 kbps, BT.601 at 2 Mbps, 720p at 4 to 6 Mbps, 1080i at 6 to 20 Mbps
- Windows Media Video 9 Screen (generally 28 kbps; 100 kbps for images)
- Windows Media Video 9 Image (slide shows and transitions)

Encoding operational modes


- One-pass CBR (live encoding and transmission)
- Two-pass CBR (offline encoding for on-demand streaming)
- One-pass VBR (live capture)
- Two-pass VBR (download-and-play applications)
- Peak-constrained VBR (constrained reading speed)
  - Average/maximum/minimum bitrates are specified.
- Multiple bitrate encoding (MBR)

WMV status

- HD movies were commercially released in 2003.
- WMV-9 is under consideration by the SMPTE C-24 group to become VC-1 (Sep 2003); it was promoted to CD in March 2004.
  - Previously named "Proposed SMPTE Standard for Television: VC-9 Compressed Video Bitstream Format and Decoding Process".


VC-1 becomes a mandatory codec for the two major HD video disc formats:
- HD-DVD: "Microsoft on every DVD", Feb 2004 (http://news.com.com/2100-1041_3-5166786.html?tag=nefd_top)
- BD (Blu-ray): H.264 and VC-1 added to the Blu-ray standard (http://www.digitmag.co.uk/news/index.cfm?NewsID=4382)

MPEG-LA announces plan for joint VC-1 license

A call for essential patents is the first step (http://www.mpegla.com/pid/vc9/)

Decoding process block diagram

[Figure: VC-1 decoding process block diagram. Conforming implementation: bitstream parsing, inverse VLC, inverse quantization, inverse transform, overlap smoothing and loop filter, decoded frame; motion compensation (MV inverse VLC, prediction, fractional-pel interpolation, 4MV) from a frame buffer with 1-frame delay, with intensity compensation and range re-mapping. Implementation-specific: out-of-loop processing (post-filtering, color conversion).]

Same structure

- Internal color format is 8-bit 4:2:0.
- Block-based motion compensation and spatial transform.
- I/P/B definitions are similar to MPEG-4 (unlike H.264).

Design criteria

Design metrics

- Rate-distortion curve
- Visual feedback from cinema testing
- Drift-free design for bit-exact reconstruction
- Computational complexity vs. coding gain

- Floating-point arithmetic is ruled out.
- A 16-bit word size is preferred.
- Conditional statements should be minimized.

Guideline: any inefficiency in signal processing operations tends to have a big impact on R-D performance at high rates, whereas any inefficiency in entropy coding has more impact on the low-rate part of the R-D curve.
- Signal processing operations: motion compensation, transform, loop filtering.
- Entropy coding: zigzag scanning, motion vector prediction.

Salient innovations of WMV-9


- Adaptive block size transform
- Limited precision transform set
- Adaptive motion compensation
- Adaptive quantization
- Advanced entropy coding
- Loop filtering
- Advanced B frame coding
- Interlace coding
- Overlap smoothing
- Low-rate tools
- Fading compensation

Adaptive block size transform


Large transform vs. small transform

- Pros (large): good at capturing trends and periodicities.
- Cons (large): spreading of local transients, ringing artifacts.

- Trends and textures are better preserved by a large transform, while areas of discontinuity are better handled by a small transform.
- One 8x8, two 8x4, two 4x8 or four 4x4 transforms can be used to code a block, so the size best suited to the underlying data can be chosen.
- The transform type can be signaled at the frame, macroblock or block level.
- Intra blocks always use the 8x8 transform.
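To make the choice among the four layouts concrete, here is a minimal sketch of a hypothetical encoder decision. It assumes a floating-point orthonormal DCT (not the WMV-9 integer transform), a fixed quantizer step `q`, and "fewest nonzero quantized coefficients" as the cost; the slide does not describe the real encoder's criterion, and `PARTITIONS` and `choose_transform` are illustrative names.

```python
import math

# Layouts for one 8x8 block, given as (width, height) of each sub-block.
# Assumption: "8x4" means 8 wide by 4 tall (top/bottom halves), "4x8" the reverse.
PARTITIONS = {"8x8": (8, 8), "8x4": (8, 4), "4x8": (4, 8), "4x4": (4, 4)}


def dct_1d(v):
    """Orthonormal DCT-II (float reference, not the WMV-9 integer transform)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(v)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def nonzero_count(block, x0, y0, w, h, q):
    """Number of coefficients of one w x h sub-block that survive quantization."""
    sub = [row[x0:x0 + w] for row in block[y0:y0 + h]]
    return sum(1 for row in dct_2d(sub) for c in row if round(c / q) != 0)


def choose_transform(block, q=8.0):
    """Hypothetical heuristic: fewest nonzero coefficients wins; ties keep the earlier entry."""
    best_name, best_cost = None, None
    for name, (w, h) in PARTITIONS.items():
        cost = sum(nonzero_count(block, x, y, w, h, q)
                   for y in range(0, 8, h) for x in range(0, 8, w))
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


flat = [[100] * 8 for _ in range(8)]             # smooth area
edge = [[0] * 4 + [200] * 4 for _ in range(8)]   # vertical edge in the middle
print(choose_transform(flat), choose_transform(edge))
```

A flat block is coded most cheaply by one 8x8 transform (a single DC coefficient), while a block split by a vertical edge is cheapest with two 4x8 transforms, mirroring the "large for trends, small for discontinuities" argument.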

Adaptive block size transform (contd)


- Large transforms retain texture information. Although the R-D gain is not huge, they provide major subjective quality benefits, especially for subtle textures, film details and grain noise.
- H.264 High profile adds an adaptive transform size, acknowledging this benefit.

16-bit transform

Design constraints
- Full 16-bit operation, where both sums and products of two 16-bit values produce results within 16 bits.
- Forward and inverse transforms form an orthogonal pair: VU = diag(D).
- The transform approximates a DCT.
- Norms of basis functions within one transform type are identical.
- Norms of basis functions between transform types are identical.

The 8x8 inverse transform places the tightest constraint.
- WMV-9 relaxes the last two constraints. The norms are in the ratio 288:289:292 (about 1% difference), which is compensated during encoding.
- Order of operations: row inverse transform => rounding => column inverse transform => rounding.
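A quick arithmetic check of the 288:289:292 ratio. The two matrices below are the VC-1 8-point and 4-point integer inverse-transform matrices as we recall them from SMPTE 421M; treat them as an assumption to verify against the specification. The sketch confirms that the rows of each matrix are mutually orthogonal (the "VU = diag(D)" property) and that their squared norms take only three distinct values.

```python
# VC-1 integer inverse-transform matrices as we recall them from SMPTE 421M
# (an assumption; verify against the specification).
T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]
T4 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_orthogonal(t):
    """True if all rows are mutually orthogonal, i.e. T times its transpose is diagonal."""
    n = len(t)
    return all(dot(t[i], t[j]) == 0 for i in range(n) for j in range(n) if i != j)


# Distinct squared row norms across both matrices.
norms = sorted({dot(row, row) for row in T8 + T4})

print(is_orthogonal(T8), is_orthogonal(T4))  # True True
print(norms)                                 # [1152, 1156, 1168]
print([n // 4 for n in norms])               # [288, 289, 292]
```

Dividing the squared norms 1152, 1156 and 1168 by 4 gives exactly 288:289:292, so the basis functions are not all of equal norm, which is why the encoder has to compensate.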

Motion compensation

- 8x8 or 16x16 prediction.
- Motion vectors of up to ¼-pel resolution are used.
- An adaptive motion mode, derived from three criteria (MV resolution, block size, filter type), is signaled at the frame level:
  - Mixed block size (16x16 and 8x8), ¼-pel, bicubic [high bitrate]
  - 16x16, ¼-pel, bicubic
  - 16x16, ½-pel, bicubic
  - 16x16, ½-pel, bilinear [low bitrate]

Bicubic filtering

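Direct filtering approach, where the 4-tap filters are as follows (P2 and P3 are the integer pixels on either side of the interpolated position, r is a rounding control term):

$$y_{1/2} = (-P_1 + 9P_2 + 9P_3 - P_4 + 8 - r) \gg 4$$
$$y_{1/4} = (-4P_1 + 53P_2 + 18P_3 - 3P_4 + 32 - r) \gg 6$$
$$y_{3/4} = (-3P_1 + 18P_2 + 53P_3 - 4P_4 + 32 - r) \gg 6$$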

¼-pel bilinear filtering is applied to the chrominance components. ½-pel bilinear is optional for low-complexity applications.
[Figure: interpolation cases 1 to 8 for sub-pel positions around integer locations]
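A minimal 1-D sketch of the three bicubic filters for one row of pixels. It takes the taps, offsets and shifts from the formulas above, clips the result to 8 bits (an assumption), and ignores the two-pass horizontal/vertical structure and intermediate precision that a real VC-1 decoder uses. `bicubic_1d` and `FILTERS` are illustrative names.

```python
# Sub-pel position -> (taps, rounding offset, right shift), from the formulas above.
FILTERS = {
    "1/4": ((-4, 53, 18, -3), 32, 6),
    "1/2": ((-1, 9, 9, -1), 8, 4),
    "3/4": ((-3, 18, 53, -4), 32, 6),
}


def bicubic_1d(p1, p2, p3, p4, pos, r=0):
    """Interpolate between integer pixels p2 and p3 at sub-pel position `pos`.

    r is the rounding-control value (0 or 1); clipping to 0..255 is an assumption.
    """
    taps, offset, shift = FILTERS[pos]
    acc = sum(c * p for c, p in zip(taps, (p1, p2, p3, p4)))
    return max(0, min(255, (acc + offset - r) >> shift))


row = (10, 20, 30, 40)
print([bicubic_1d(*row, pos) for pos in ("1/4", "1/2", "3/4")])  # [23, 25, 28]
```

Each tap set sums to a power of two (16 or 64), so a flat area is reproduced exactly and the division is a plain right shift, consistent with the "no floating point, few conditionals" design criteria.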

Adaptive quantization

- The same quantization rule applies to all four transform sizes.
- Two quantization modes, decided for each frame:
  - Dead-zone quantization, suitable for low bitrates: {-kQ-D, 0, kQ+D}
  - Regular uniform quantization, for high bitrates: {kQ}
  - The mode adapts to the running QP.
- On the encoder side, a dead zone is always present.



[Figure: quantizer characteristics, dead-zone vs. regular uniform quantization]
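A minimal sketch of the two reconstruction rules listed above, plus an encoder-side quantizer that truncates toward zero and therefore always has a dead zone. The symbols Q and D follow the slide; the truncating quantizer and the function names are assumptions for illustration, not the exact WMV-9 arithmetic.

```python
def quantize(x, q):
    """Encoder: truncate toward zero, so |x| < q maps to 0 (a dead zone of width 2q)."""
    level = int(abs(x) // q)
    return level if x >= 0 else -level


def dequant_uniform(level, q):
    """Regular uniform reconstruction: levels {kQ}."""
    return level * q


def dequant_deadzone(level, q, d):
    """Dead-zone reconstruction: levels {-(kQ+D), 0, kQ+D}."""
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    return sign * (abs(level) * q + d)


for x in (3.9, 9, -9):
    k = quantize(x, 4)
    print(x, k, dequant_uniform(k, 4), dequant_deadzone(k, 4, 2))
```

Shifting every nonzero reconstruction level outward by D moves it toward the middle of the wider cells that truncation produces, which is why the dead-zone rule suits the low-bitrate case.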

Entropy coding: Context adaptive multiple VLCs


In WMV-9, up to 8 tables (coding sets) are available for coding each symbol, and the table is selected per frame. For example, there are 8 tables for transform AC coefficients. This differs from H.264: symbols are coded adaptively by selecting among several tables designed for different symbol distributions.
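A minimal sketch of per-frame coding-set selection, assuming the encoder simply tries every table on the whole frame and signals the index of the cheapest one. The three tables are hypothetical toy prefix codes, not the WMV-9/VC-1 tables, and the function names are illustrative.

```python
# Hypothetical coding sets: each maps a symbol to a prefix-free codeword.
CODING_SETS = [
    {0: "1", 1: "01", 2: "001", 3: "000"},   # tuned for mostly-small symbols
    {0: "00", 1: "01", 2: "10", 3: "11"},    # flat distribution
    {0: "000", 1: "001", 2: "01", 3: "1"},   # tuned for large symbols
]


def frame_cost(symbols, table):
    """Total bits needed to code the frame's symbols with one table."""
    return sum(len(table[s]) for s in symbols)


def choose_coding_set(symbols):
    """Encoder side: evaluate every set on the frame and pick the cheapest index."""
    costs = [frame_cost(symbols, t) for t in CODING_SETS]
    return min(range(len(costs)), key=costs.__getitem__)


def encode_frame(symbols):
    """Return (signaled table index, concatenated codewords)."""
    idx = choose_coding_set(symbols)
    return idx, "".join(CODING_SETS[idx][s] for s in symbols)


print(encode_frame([0, 0, 1]))
```

Only the small table index is sent per frame, so the side-information cost is negligible compared with the bits saved when the frame's statistics match one table much better than the others.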
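Coding set correspondence for PQINDEX <= 8

| Index | Y blocks table | Cb and Cr blocks table |
|---|---|---|
| 0 | High Rate Intra | High Rate Inter |
| 1 | High Motion Intra | High Motion Inter |
| 2 | Mid Rate Intra | Mid Rate Inter |

Coding set correspondence for PQINDEX > 8

| Index | Y blocks table | Cb and Cr blocks table |
|---|---|---|
| 0 | High Rate Intra | High Rate Inter |
| 1 | High Motion Intra | High Motion Inter |
| 2 | Mid Rate Intra | Mid Rate Inter |

run_before codewords (columns indexed by zerosLeft)

| run_before | 1 | 2 | 3 | 4 | 5 | 6 | >6 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 11 | 11 | 11 | 11 | 111 |
| 1 | 0 | 01 | 10 | 10 | 10 | 000 | 110 |
| 2 | | 00 | 01 | 01 | 011 | 001 | 101 |
| 3 | | | 00 | 001 | 010 | 011 | 100 |
| 4 | | | | 000 | 001 | 010 | 011 |
| 5 | | | | | 000 | 101 | 010 |
| 6 | | | | | | 100 | 001 |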
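The run_before table above appears to be H.264's CAVLC table, where the decoder's current zerosLeft value selects the column, i.e. the VLC table is switched by context rather than per frame. A minimal decoding sketch using exactly the codewords shown (runs longer than 6 are omitted, so their longer codewords are rejected); `decode_run_before` is an illustrative name.

```python
# run_before codewords from the table above, keyed by zerosLeft;
# key 7 stands for zerosLeft > 6. Runs longer than 6 are not included.
RUN_BEFORE = {
    1: ["1", "0"],
    2: ["1", "01", "00"],
    3: ["11", "10", "01", "00"],
    4: ["11", "10", "01", "001", "000"],
    5: ["11", "10", "011", "010", "001", "000"],
    6: ["11", "000", "001", "011", "010", "101", "100"],
    7: ["111", "110", "101", "100", "011", "010", "001"],
}


def decode_run_before(bits, zeros_left):
    """Return (run_before, bits_consumed) using the table selected by zerosLeft."""
    table = RUN_BEFORE[min(zeros_left, 7)]
    for run, code in enumerate(table):  # each column is prefix-free
        if bits.startswith(code):
            return run, len(code)
    raise ValueError("codeword not in this excerpt of the table")


print(decode_run_before("01", 3), decode_run_before("0011", 9))
```

The same bit string decodes differently depending on zerosLeft ("01" is run 2 when zerosLeft is 3 but run 1 when it is 2), which is the essence of context-selected VLC tables.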

Entropy coding: Bitplane coding


Some symbols, e.g. the MB type, are spatially correlated. Bitplane coding encodes such per-MB binary symbols efficiently by exploiting the spatial dependency among their bits. It has 7 modes: Raw, RowSkip, ColSkip, Norm-2, Norm-6, Diff-2 and Diff-6.

[Figure: Norm-2, Diff-2, Norm-6, Diff-6, Row-skip and Col-skip examples on the MB-type bitplane of a P-VOP]
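A minimal sketch of the RowSkip and ColSkip modes, assuming the simple rule "emit 0 for an all-zero row (column), otherwise 1 followed by the raw bits of that row (column)". The actual VC-1 bitplane syntax has further details (for example an invert flag, and the tiled codes of Norm-6) that are not modeled here; the function names are illustrative.

```python
def rowskip_encode(plane):
    """'0' for an all-zero row, else '1' followed by the row's raw bits."""
    out = []
    for row in plane:
        if any(row):
            out.append("1" + "".join(str(b) for b in row))
        else:
            out.append("0")
    return "".join(out)


def rowskip_decode(bits, width, height):
    """Inverse of rowskip_encode for a plane of the given size."""
    plane, pos = [], 0
    for _ in range(height):
        if bits[pos] == "0":
            plane.append([0] * width)
            pos += 1
        else:
            plane.append([int(b) for b in bits[pos + 1:pos + 1 + width]])
            pos += 1 + width
    return plane


def colskip_encode(plane):
    """ColSkip applies the same rule to columns."""
    return rowskip_encode([list(col) for col in zip(*plane)])


plane = [[0, 0, 0], [1, 0, 1]]  # e.g. skipped-MB flags for a 3x2-MB picture
print(rowskip_encode(plane), colskip_encode(plane))
```

RowSkip pays off when whole MB rows share the value 0 (for example static background), while ColSkip suits vertically aligned regions; the encoder can pick whichever mode yields fewer bits for the frame.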


Loop filtering

Independent block coding leads to:
- Visible blocky artifacts
- Reduced quality of the reference frames

- An in-loop deblocking filter is used, as in H.264.
- Filtering is applied at every 4th, 8th, 12th, etc. pixel row or column, depending on the transform type.
- Adaptive filtering rule: a shortcut to save computation.
- The filtering energy is smaller than that of H.264.
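A sketch of which vertical edges get filtered, under the assumption that 8x8 block boundaries are always candidates and an extra edge at offset 4 appears inside a block wherever its transform splits it horizontally. The width x height naming convention and the helper names are assumptions, and the adaptive per-segment filter decision is not modeled.

```python
# Assumption: "WxH" names sub-blocks W pixels wide and H pixels tall,
# so "8x4" splits an 8x8 block into top/bottom halves.
def internal_edges(transform):
    """Return (vertical_edge_columns, horizontal_edge_rows) inside one 8x8 block,
    measured from its top-left corner; the block boundary itself is not listed."""
    w, h = (int(s) for s in transform.split("x"))
    return list(range(w, 8, w)), list(range(h, 8, h))


def filtered_columns(transforms):
    """Vertical edges filtered across one row of 8x8 blocks (picture edge excluded)."""
    cols = []
    for i, t in enumerate(transforms):
        base = 8 * i
        if i > 0:
            cols.append(base)  # 8x8 block boundary
        cols.extend(base + x for x in internal_edges(t)[0])
    return cols


print(filtered_columns(["8x8", "4x4", "4x8"]))
```

In this model, blocks coded with 8x8 transforms contribute only the 8-pixel grid, while 4-wide transforms add the in-between columns, giving the "every 4th, 8th, 12th" pattern described above.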


Interlace coding

Field picture coding mode

- Intra MBs are coded as in the progressive case.
- Inter MBs may be predicted by one MV (16x16) or four MVs (one per 8x8 block), where each MV can refer to either of the two previously encoded fields.

Interlace coding (contd)


Frame picture coding mode

- Intra MBs may be coded with frame DCT or field DCT.
- Inter MBs may use frame prediction (1 or 4 MVs) or field prediction (2 or 4 MVs), in addition to the choice of frame or field DCT.
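A minimal sketch of field-DCT preparation: the rows of a macroblock are regrouped into the top field (even rows) followed by the bottom field (odd rows) before the transform. The frame/field decision rule shown (compare vertical activity) is a common encoder heuristic assumed here for illustration; the slide does not say how the encoder chooses, and the function names are illustrative.

```python
def to_field_order(mb):
    """Regroup rows: top-field (even) rows first, then bottom-field (odd) rows."""
    return mb[0::2] + mb[1::2]


def vertical_activity(rows):
    """Sum of absolute differences between vertically adjacent samples."""
    return sum(abs(a - b) for r0, r1 in zip(rows, rows[1:]) for a, b in zip(r0, r1))


def choose_dct_mode(mb):
    """Hypothetical heuristic: use field DCT if it lowers vertical activity."""
    half = len(mb) // 2
    frame_cost = vertical_activity(mb)
    fields = to_field_order(mb)
    field_cost = vertical_activity(fields[:half]) + vertical_activity(fields[half:])
    return "field" if field_cost < frame_cost else "frame"


# Interlaced motion: the two fields differ strongly, each field is flat.
combed = [[0] * 16 if y % 2 == 0 else [100] * 16 for y in range(16)]
smooth = [[y] * 16 for y in range(16)]
print(choose_dct_mode(combed), choose_dct_mode(smooth))
```

When the two fields were captured at different times, adjacent frame rows alternate between fields and look like a high-frequency comb; transforming each field separately removes that artificial vertical energy.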

Advanced B-frame coding



- Explicit coding of the B frame's temporal position relative to its two reference frames (variable velocity model)
- Intra-coded B frames
- Improved MV coding efficiency
- The bottom B field may refer to the top B field
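To illustrate why the explicitly coded temporal position matters, here is a sketch of temporal-direct-style MV scaling: given the B frame's fractional position between its references (called `bfraction` here, an assumed name), the co-located motion vector is split into forward and backward parts. This is the generic idea shared with earlier MPEG direct modes; VC-1's exact fraction set and integer rounding are not reproduced.

```python
from fractions import Fraction


def direct_mvs(colocated_mv, bfraction):
    """Split the co-located MV (previous ref -> next ref) at the B frame's position.

    bfraction is the B frame's distance from the previous reference as a fraction
    of the reference spacing; exact rational arithmetic avoids rounding questions.
    """
    mvx, mvy = colocated_mv
    fwd = (mvx * bfraction, mvy * bfraction)
    bwd = (mvx * (bfraction - 1), mvy * (bfraction - 1))
    return fwd, bwd


print(direct_mvs((6, -3), Fraction(1, 3)))
```

With a fixed "halfway" assumption, a B frame that actually sits one third of the way between its references would get poorly scaled vectors; signaling the position lets the decoder scale them correctly.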

Overlap smoothing

- Another technique to reduce blocking artifacts in intra areas.
- Drawback of deblocking filtering: it is purely a decoder process, which operates equally on block-aligned true edges and on apparent block edges. It is usually disabled in the less complex profiles.

- The lapped transform is another way to remove blocking effects.
- A spatial-domain approach implements the lapped transform as pre- and post-processing.
- Adaptive application rule: applied at lower bitrates; it can also be switched on or off on a per-MB basis.
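[Figure: filter taps p0, p1, q1, q0 (labeled a0, a1, b1, b0) around a block edge]

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \left( \begin{bmatrix} 7 & 0 & 0 & 1 \\ -1 & 7 & 1 & 1 \\ 1 & 1 & 7 & -1 \\ 1 & 0 & 0 & 7 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} r_0 \\ r_1 \\ r_0 \\ r_1 \end{bmatrix} \right) \gg 3$$

where x0 to x3 are the four input pixels across the edge and r0, r1 are rounding constants.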
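A minimal sketch of the overlap smoothing filter above applied to one group of four pixels. It assumes x1 and x2 lie on either side of the block edge and uses (r0, r1) = (4, 3) as the default rounding pair; both are assumptions rather than details given on the slide. Every matrix row sums to 8, so flat areas pass through unchanged while a step edge becomes a ramp.

```python
# Matrix from the equation above; rows sum to 8, so the >> 3 preserves DC.
M = [
    [7, 0, 0, 1],
    [-1, 7, 1, 1],
    [1, 1, 7, -1],
    [1, 0, 0, 7],
]


def overlap_smooth(x, r0=4, r1=3):
    """Apply y = (M x + [r0, r1, r0, r1]) >> 3 to four pixels across a block edge.

    The default rounding pair (4, 3) is an assumption of this sketch.
    """
    rnd = (r0, r1, r0, r1)
    return [(sum(m * v for m, v in zip(row, x)) + r) >> 3 for row, r in zip(M, rnd)]


print(overlap_smooth([100, 100, 100, 100]))  # [100, 100, 100, 100]
print(overlap_smooth([0, 0, 80, 80]))        # [10, 20, 60, 70]
```

Unlike a deblocking filter, this operation is applied before the forward transform at the encoder and inverted after the inverse transform at the decoder, which is what makes it a lapped transform rather than a post-filter.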

Low-rate tools (Rate control tools)


Dynamic range reduction (intensity resolution)

- Luminance and chrominance values may be scaled down by a factor of 2 before coding.

Dynamic frame resizing (spatial resolution)

- The coded frame size may be halved vertically, horizontally or both, to further reduce the rate and meet the constant-bitrate requirement.

[Figure: examples of intensity range reduction and frame resizing]
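A minimal sketch of dynamic range reduction, assuming the encoder halves each sample's distance from 128 and the decoder doubles it again with clipping. This matches our understanding of VC-1's range-reduction mode but is an assumption to check against the specification; the function names are illustrative.

```python
def reduce_range(v):
    """Encoder side: halve the sample's distance from 128 (floor division)."""
    return (v - 128) // 2 + 128


def expand_range(v):
    """Decoder side: double the distance from 128 again and clip to 8 bits."""
    return max(0, min(255, (v - 128) * 2 + 128))


print(reduce_range(200), expand_range(reduce_range(200)))  # 164 200
```

The round trip loses at most the least significant bit of each sample, trading intensity precision for fewer bits at very low rates.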


Fading compensation

Effective for global illumination changes:
- Natural illumination changes
- Artificial transition effects, such as fade-to-black, fade-from-black, dissolves, blending, cross-fades and morphing

- The encoder detects fading prior to motion compensation by comparing an error measure with a threshold.
- Encoder and decoder use quantized fading parameters of a linear first-order function to transform the original reference frame into a new reference frame.
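A minimal sketch of fading compensation on a flattened frame. The least-squares fit of the scale and shift, the mean-absolute-error detector and its threshold are assumptions for illustration; the slide only states that the encoder compares an error measure with a threshold and that both sides apply a quantized first-order model. Parameter quantization is omitted here.

```python
def fit_fade(ref, cur):
    """Least-squares fit of cur ~= scale * ref + shift (estimation method assumed)."""
    n = len(ref)
    mr, mc = sum(ref) / n, sum(cur) / n
    var = sum((r - mr) ** 2 for r in ref)
    cov = sum((r - mr) * (c - mc) for r, c in zip(ref, cur))
    scale = cov / var if var else 1.0
    return scale, mc - scale * mr


def remap_reference(ref, scale, shift):
    """New reference frame from the linear first-order model, clipped to 8 bits."""
    return [max(0, min(255, round(scale * r + shift))) for r in ref]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_fade(ref, cur, threshold=4.0):
    """Hypothetical detector: flag fading if the plain frame difference is large."""
    return mean_abs_error(ref, cur) > threshold


ref = [0, 20, 40, 60]
cur = [10, 20, 30, 40]  # ref faded by half toward a gray level
if detect_fade(ref, cur):
    scale, shift = fit_fade(ref, cur)
    print(scale, shift, remap_reference(ref, scale, shift))
```

After remapping, the reference matches the faded frame, so motion compensation no longer has to code the global brightness change as residual.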

Video smoothing

- Interpolates missing frames after decoding; also referred to as frame interpolation.
- Uses an advanced optical flow estimation technique (on a per-pixel basis), along with warping, to synthesize new frames.
- Requires a 733 MHz CPU to interpolate a 320x240 video clip from 10 to 30 fps.
- J. Ribas-Corbera and J. Sklansky, "Interframe interpolation of cinematic sequences," Journal of Visual Communication and Image Representation, Dec 1993.

Profiles and levels


- Simple profile
- Main profile
- Advanced profile

Comparison among HD-DVD video candidates

| Feature | MPEG-2 Video | SMPTE VC-9 | H.264/AVC |
|---|---|---|---|
| Prediction coding | | | |
| Motion res. & interpolation | ½-pel, bilinear | up to ¼-pel, adaptive bilinear + 4-tap FIR/direct | ¼-pel, 6-tap FIR/cascaded |
| Motion block size | 16x16 | 16x16, 8x8 | 16x16, 16x8, ..., 4x4 |
| Brightness change | N/A | Intensity compensation (P/B) | Weighted prediction (B) |
| Ref. frame num (P/B) | 1/2 | 1/2 | M/M |
| Generalised B | N/A | N/A | Y |
| Intra prediction | Freq-domain pred. (DC) | Freq-domain pred. (DC/AC) | Spatial-domain prediction |
| Inter-intra mixed | N/A | Y | N/A |
| Transform coding, entropy coding & post-processing | | | |
| Transform size & type | 8x8 float | 8x8, 8x4, 4x8, 4x4 integer | 4x4 integer (only +, >>) |
| CA multiple VLCs | N/A | Y | Y |
| Bitplane coding | N/A | Y | N/A |
| Arithmetic coding | N/A | N/A | Y (Main profile) |
| Post-processing | Optional | In-the-loop deblocking, overlapped transform | In-the-loop deblocking |
| Rate control | | | |
| Quantization | uniform | Adaptive uniform and non-uniform | log scale |
| Dynamic frame resizing | N/A | Y | N/A |
| Dynamic range reduction | N/A | Y | N/A |
| Streaming & error resilience | | | |
| Data partitioning | N/A | N/A | Slice level partitioning |
| Bitstream switching | N/A | System level | SI/SP frames |

WMV vs. MPEG-2

WMV vs. MPEG-4 SP

WMV vs. H.264

[Figure: rate-distortion curves comparing WMV9 and H264-1ref; vertical axis 29 to 42, horizontal axis 0 to 600]





Conclusions

- Software and hardware components can be developed based on the SDKs or the WM hardware porting kits.
- WM 9 provides a variety of state-of-the-art audio and video codecs for different applications.
- The quality of WMV-9 is competitive with H.264/AVC, and arguably superior according to several independent tests, at significantly lower computational complexity.
- The reading-assignment paper explains why some of the tools unique to WMV-9 provide an intrinsic quality benefit over H.264/AVC.

Reading assignment

Sridhar Srinivasan et al. (Windows Digital Media Division, Microsoft Corporation), "Windows Media Video 9: overview and applications," Signal Processing: Image Communication, Oct 2004.

7. A composite symbol represents different properties of one MB and tries to exploit their joint occurrence probability. Bitplane coding collects the same symbol for all MBs and removes the correlation between them. Can you think of a way to take advantage of both simultaneously?