
Aspect Processing

The Shape of Things to Come

Mike Knee & Roberta Piroddi


Snell & Wilcox, UK

Putting Pictures to Work™

snellwilcox.com

Abstract
The proliferation of video source formats, delivery mechanisms and display devices has significantly increased broadcasters’ needs for aspect ratio conversion. We have taken this as an opportunity to develop aspect processing, a framework for producing an arbitrary shape intelligently and flexibly. Aspect processing extends aspect ratio conversion to the definition of optimum viewing for a given screen size and shape. This paper presents new aspect processing technologies, which introduce a set of efficient criteria to automatically identify regions of interest in the picture and to process the picture accordingly.

This paper also introduces a unique and novel technology, video seam carving, which works by altering the perspective of the image sequence to change between aspect ratios.

These new technologies allow broadcasters to access new platforms and audiences with maximum re-use of systems and source material, resulting in a simpler and cheaper infrastructure.

Introduction
Once upon a time, television content was produced with a 4:3 aspect ratio and delivered at standard definition through a mild compression system, for example PAL, to a 4:3 CRT display that was probably between about 12 and 24 inches wide. All the broadcaster had to do was to ensure reliable, faithful transmission of the content to the display.

Now, the proliferation of source formats, delivery mechanisms and screen sizes and shapes leads to a complex set of issues for the broadcaster to address.

Broadcast programmes may be produced at standard or high definition, at a 4:3 or 16:9 aspect ratio, and may include content from anything from mobile phones to ultra-high-definition widescreen film scanners working with aspect ratios up to 2.35:1 or even wider.

The content will generally be compressed for delivery, at bit rates ranging from many tens of Mbit/s for a premium HDTV service, to a few tens of kbit/s for mobile content.

The results will be viewed on anything from a 1.5-inch mobile phone screen to a 50-inch plasma display, or even bigger displays if we consider video projectors or outdoor screens.

The aspect ratio of the display may be anything from nearly square (or even a “portrait” format) to the 16:9 shape now common in home TV displays. Projection systems have greater flexibility to match very widescreen cinema film formats.

The multiplicity of delivery and display platforms means that aspect ratio conversion has become an important part of the broadcast chain.

In this paper, we first review conventional approaches to aspect ratio conversion and discuss their advantages and disadvantages. To address the shortcomings, the wider concept of aspect processing is introduced, in which we attempt to optimize the viewing experience for a given display shape and size. Two sections then introduce new content-aware technologies for aspect processing: dynamic reframing and video seam carving.

Dynamic reframing is a technique for content-dependent cropping and resizing, geared especially towards small displays.

Seam carving is a much more radical approach, in which regions considered to be less visually important are quite literally removed from the picture.

Aspect Ratio Conversion
In this section, we shall base our examples on 16:9 to 4:3 conversion, but the comments apply equally to conversion in the other direction and between other ratios. Examples of two 16:9 frames, which we shall use throughout this paper, are shown in Figure 1.

Figure 1: Two 16:9 frames from the film “Mission Antarctique”

The first and simplest approach to handling different aspect ratios is to allow the source image to change shape to fill the display. This is known as anamorphic conversion. It involves no processing at all, as each scan line traverses the width of the screen by design. Anamorphic conversion has the advantage that no information is lost, but everything looks the wrong shape, as shown in Figure 2.
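As a small worked example of the shape error that anamorphic conversion introduces, the sketch below computes the width-to-height distortion when a 16:9 source is made to fill a 4:3 display. The function name `anamorphic_shape_factor` is ours, chosen purely for illustration.

```python
from fractions import Fraction


def anamorphic_shape_factor(src_aspect, dst_aspect):
    """Width-to-height distortion of picture content after anamorphic conversion.

    A value below 1 means objects look horizontally squeezed;
    a value above 1 means they look horizontally stretched.
    """
    return Fraction(dst_aspect) / Fraction(src_aspect)


factor = anamorphic_shape_factor(Fraction(16, 9), Fraction(4, 3))
print(factor)         # 3/4
print(float(factor))  # 0.75
```

In this case a circle in the source would be displayed as an ellipse only 75% as wide as it is tall.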
Figure 2: Anamorphic conversion

At another extreme of the set of possible techniques, we have cropping. We remove the sides (or the top and bottom) of the image to achieve the desired display aspect ratio. Cropping retains the shape of material that remains, but at the expense of losing what might be important information, as shown in Figure 3.

Figure 3: Cropping

The letterbox or, for conversion from narrower to wider pictures, pillarbox approach maintains the integrity of both shape and content of the source by resizing the source to fit within the display window and filling the remainder of the display window with black or, less commonly, with patterns or text. Letterboxing is often favoured by film makers as it preserves the artistic intent, but this is at the cost of a loss or waste of resolution. Figure 4 shows letterboxing applied to our example images.

Figure 4: Letterboxing

A compromise approach sometimes used successfully in consumer TV sets is to combine all three of the above techniques, “sharing the pain” between them. In our example, the required 25% reduction in aspect ratio could be achieved by a 9.14% reduction from each of the three techniques. This would produce the result shown in Figure 5.

Figure 5: Combination of anamorphic conversion, cropping and letterboxing

Another technique sometimes encountered is non-linear anamorphic conversion. The idea is to perform a stretch or shrink conversion with a variable degree of shape distortion, so that the centre portion of the picture undergoes no distortion, at the cost of a greater than average distortion nearer the sides of the picture. In the example shown in Figure 6, the central 60% of the output picture undergoes no distortion, but outside this region the distortion goes up progressively to a maximum value in such a way as to achieve the desired overall conversion ratio.

Figure 6: Non-linear anamorphic conversion

This produces a pleasing result if the objects of interest are near the centre of the picture, but can be disastrous if objects of interest are too close to the sides.

The techniques for aspect ratio conversion described so far have two drawbacks. They do not take into account the size of the display as distinct from its shape, and they fail to make use of any knowledge about the relative importance of different regions in the picture. Aspect processing can overcome these two drawbacks.

Aspect Processing
Aspect processing is a generalization of aspect ratio conversion. Aspect ratio conversion may be thought of as using a map function that links pixel locations in the input space to pixel locations in the output space. In the examples above, this map is more or less complicated but is fixed. In aspect processing, we allow the map function to vary smoothly from frame to frame in dependence on variations in content. Aspect processing is therefore a content aware technology.
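To make the idea of a fixed map concrete, the following sketch expresses the combined conversion (anamorphic, cropping and letterboxing in equal shares) as a function from display positions to source positions. It assumes the 25% reduction is split multiplicatively, which is what the 9.14% figure implies, since 0.9086 cubed is approximately 0.75. The names `shared_factor` and `combined_map` and the normalized coordinate convention are our own illustrative choices, not part of the original work.

```python
def shared_factor(src_aspect, dst_aspect, n_techniques=3):
    """Per-technique factor when the aspect change is shared equally (multiplicatively)."""
    return (dst_aspect / src_aspect) ** (1.0 / n_techniques)


def combined_map(u, v, src_aspect=16 / 9, dst_aspect=4 / 3):
    """Map a display position (u, v) to a source position (x, y).

    All coordinates are normalized to the range 0..1.
    Returns None for display positions inside the black letterbox bars.
    """
    f = shared_factor(src_aspect, dst_aspect)
    crop = f      # fraction of the source width that is kept
    active = f    # fraction of the display height that carries picture
    # The remaining shape change, (dst/src) / (crop * active), is the
    # anamorphic share and is also equal to f.
    if abs(v - 0.5) > active / 2:
        return None                   # letterbox bar
    x = 0.5 + (u - 0.5) * crop        # central part of the source width
    y = 0.5 + (v - 0.5) / active      # full source height fills the active area
    return (x, y)


f = shared_factor(16 / 9, 4 / 3)
print(round(100 * (1 - f), 2))   # 9.14
print(combined_map(0.5, 0.5))    # (0.5, 0.5)
print(combined_map(0.5, 0.01))   # None
```

Because the map is a fixed function of position only, it can be evaluated once and reused for every frame; aspect processing differs in letting the map change with content.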
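Non-linear anamorphic conversion is an example of a more complicated, but still fixed, map. The sketch below is one possible horizontal map for 16:9 to 4:3 conversion in which the central 60% of the output is undistorted. The paper says only that the distortion rises "progressively" outside that region; the linear ramp of the local scale used here, and the name `nonlinear_anamorphic_map`, are assumptions made for illustration.

```python
import math


def nonlinear_anamorphic_map(u, src_aspect=16 / 9, dst_aspect=4 / 3, flat=0.6):
    """Map an output x position u in [-1, 1] to an input x position in [-1, 1].

    Inside |u| <= flat the local scale is s0 = dst/src, which leaves shapes
    undistorted when the full picture height is kept. Outside, the local scale
    ramps linearly (an assumption) from s0 up to s_max, chosen so that the
    edge of the output maps exactly to the edge of the input.
    """
    s0 = dst_aspect / src_aspect
    a = abs(u)
    if a <= flat:
        x = s0 * a
    else:
        span = 1.0 - flat
        t = a - flat
        s_max = 2.0 * (1.0 - s0 * flat) / span - s0
        x = s0 * flat + s0 * t + 0.5 * (s_max - s0) / span * t * t
    return math.copysign(x, u)


print(round(nonlinear_anamorphic_map(0.3), 4))  # 0.225
print(round(nonlinear_anamorphic_map(0.8), 4))  # 0.6625
print(round(nonlinear_anamorphic_map(1.0), 4))  # 1.0
```

With these numbers the local scale at the picture edge works out at 2.0 against 0.75 in the centre, which illustrates why objects close to the sides can look badly distorted.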

One important feature of aspect processing is that the map approach provides a convenient framework for these content aware methods to be applied together and, if desired, in conjunction with the fixed aspect ratio conversion techniques described above.

Two approaches to aspect processing will now be presented. In dynamic reframing, the map function defines a rectangular window of variable position and size that attempts to encompass the region of greatest interest in the scene. In seam carving, the map function is much more complex and attempts to remove (or to expand) areas all over the picture that have least interest while maintaining the integrity of the areas of greatest interest.

Dynamic Reframing
In order to do dynamic reframing, the first problem is to segment the image into a region of greatest interest, or foreground, and the rest of the picture, the background. We have developed three methods of generic foreground-background segmentation, which can be combined and further modified by genre-specific enhancements.

Foreground-background segmentation by clustering
In one method of segmentation, we use a clustering approach [1], based on representing the pixels as “point masses” in a multidimensional space. Each dimension represents a measurable feature of the pixel, crucially including its x and y co-ordinates in the picture, but also including some or all of the colour components, motion vectors, texture measures and any other quantity that is likely to be shared by pixels in the same segment. Each of the two segments is represented by a “centroid” in the space, which can also be thought of as an entry in a vector quantization codebook. The process of assigning pixels to segments is then equivalent to vector quantization. Each pixel is assigned to the segment to whose codebook entry it is closest, producing a partition of the space.

Having assigned pixels to segments, the centroids themselves can be updated to take into account the new partition of the space. This is an iterative process that can also be carried over from one frame of a sequence to the next. Overall performance can be improved by introducing a probabilistic or “soft” notion of segment membership.

One problem with finding the “nearest” centroid is that the multidimensional space consists of quantities that are measured in different, incompatible units. We can overcome this by normalizing each dimension according to its variance. We can also take account of dependencies between dimensions by normalizing according to the covariance matrices of the pixels in each segment; this is the Mahalanobis distance.

A further refinement of the approach is to include in the distance measure a suitably normalized measure of the badness of fit of the pixel to a motion model calculated for each segment. As the segment membership is updated, the motion models can be refined using an iterative gradient approach such as that described in [2].

Foreground-background segmentation from motion estimation
Another approach to foreground-background segmentation is based more directly on motion information. Motion estimation can be applied to many image processing tasks, for example standards conversion, image restoration or compression. In many cases, a motion estimator is therefore already available.

The use of motion estimation for segmentation is based on the assumption that the background motion can be modelled by a single, global, vector. The foreground is then an object or a set of objects that move in a markedly different way from the background. The larger the difference between the movement of the object and that of the background, the more likely the object is to be interesting and to be part of the foreground.

In our work, we used a motion estimator based on phase correlation [3], in which large blocks of successive pictures are compared in the two-dimensional Fourier domain, leading to a correlation surface with peaks corresponding to the different speeds and directions of motion present in the scene, thereby giving us a list of candidate motion vectors. Vectors are then assigned to individual pixels by finding out which candidate vector best models the local motion. The global motion vector may be obtained by taking the sum of all the correlation surfaces generated for each block in the picture and taking the highest peak.

In order to find out if an object is moving very differently from the background, we do not need to segment and fit a motion model. Instead, we estimate such a difference on a pixel basis. The probability that a pixel is part of the background may be expressed as the inverse exponential of the Euclidean distance between the global vector and the pixel’s local assigned vector.

Further information from the motion estimation process can be used in segmentation. Additionally, for increased accuracy and quality of the segmentation results, measures of intra-frame interest may be successfully integrated in the framework. In our experience, the most useful measures of spatial interest are the ones that are perceptually motivated. We draw on the concept of pre-attentive saliency of objects, favouring the colour elements that are more likely to attract the viewer’s attention, independently from the content of the scene.
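The pixel-wise background probability described above, the inverse exponential of the Euclidean distance between the global vector and the local vector, can be sketched as follows. The `scale` parameter, which sets how quickly the probability falls off, is an assumption of ours; the text does not state a normalization. Vectors are taken to be (horizontal, vertical) displacements in pixels per frame.

```python
import math


def background_probability(local_vector, global_vector, scale=1.0):
    """Soft background membership for one pixel.

    `scale` is an assumed tuning parameter (pixels per frame), not a value
    taken from the paper.
    """
    dx = local_vector[0] - global_vector[0]
    dy = local_vector[1] - global_vector[1]
    return math.exp(-math.hypot(dx, dy) / scale)


global_vector = (2.0, 0.0)  # hypothetical slow camera pan
for local_vector in [(2.0, 0.0), (2.0, 1.0), (-3.0, 4.0)]:
    p = background_probability(local_vector, global_vector)
    print(local_vector, round(p, 3))
```

A pixel moving exactly with the global vector gets probability 1, and the value decays towards 0 as the local motion departs from it, so one minus this value can serve as a soft foreground measure.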
Genre-specific enhancements
The segmentation can be further enhanced using a priori knowledge of the programme material. For example, in football, simple detection of green areas can indicate background, as can a more sophisticated crowd detection algorithm. In drama programmes, flesh tone can indicate foreground.

Results of a foreground-background segmentation process for our example pictures are shown in Figure 7.

Figure 7: Foreground-background segmentation

In this case, a clustering based algorithm was used. Some compromises are evident between an “ideal” motion based approach, which would find just the bicycles, and picking up some of the detail in the background.

Dynamic pan-scan
The segmentation results can be used to control a dynamic pan-scan algorithm. In our example, a full-height 4:3 window into the input picture is steered using the segmentation information. The results, shown first as a highlighted window into a thumbnail of the input picture, are given in Figure 8.

Figure 8: Dynamic pan-scan

Comparing with Figure 3, the dynamic pan-scan algorithm gives a slight advantage over fixed cropping. With general source material, we find that the dynamic pan-scan output nearly always gives framing that is better than fixed cropping.

Dynamic zoom
For a small display device such as a mobile phone, it may be advantageous to zoom dynamically into the picture so that the output is filled by the foreground. We can use the segmentation results to control an adaptive zoom process. Figure 9 shows some results, again as highlights followed by output pictures.

Figure 9: Dynamic reframing

The key improvement here for small displays is that, where there is only one object of interest, the system is able to zoom in on it, together with the most interesting area of the nearby background.

Background softening
In many delivery systems, the picture signal is compressed at a very low bit-rate. The segmentation results can be used to control an adaptive filter that softens background areas, making them easier to compress. The result is a saving of between 10% and 30% of bit-rate for a given level of perceived quality.

Dynamic pre-warping
One disadvantage of dynamic reframing is that it removes all control of the framing from the display. Some viewers may prefer the full reframing suggested by the segmentation process, while others may prefer a milder reframing or none at all. To give flexibility back to the display, it is necessary to transmit the whole picture (or at least a full-height, dynamic pan-scan picture), but if this is done at the display resolution and the display tries to do its own reframing, the result will have unacceptably low resolution.

A novel solution to this problem is to pre-warp the picture [4], assigning more pixels and hence higher resolution to the region of interest, at the expense of fewer pixels outside the region of interest. If the display chooses to zoom into the region of interest, sufficient resolution will then be available, while if it chooses to display the whole picture, the only penalty is a softening of the picture outside the region of interest. Pre-warping is an attractive idea but it does produce a non-standard transmitted picture, so the display has to include an inverse warping function which is controlled by metadata and by user input.

Figure 10 shows a pre-warped transmitted picture and examples of zoomed-in and full-picture displays, both derived from the same transmitted picture.

Figure 10: Pre-warping (top left) and two resulting display options

Seam Carving
Seam carving is a far more radical approach to aspect processing. Unlike dynamic reframing, which removes areas of least interest from the edges of the picture, seam carving removes such areas from throughout the picture.

First we give a brief, informal description of the seam carving algorithm for still pictures. Much more detailed descriptions are given in [5] and [6].

Suppose we wish to shrink a picture horizontally. Seam carving is applied repeatedly, shrinking the picture by one pixel width at a time. Each pass of the algorithm operates as follows. We calculate an energy or activity function for each pixel in the picture. We then find a seam (a set of connected pixels) of minimum energy extending from the top to the bottom of the picture.

The minimum-energy seam can be found using a recursive technique in which we calculate best partial seams leading to each pixel on successive rows of the picture until we have a minimum-energy seam leading to each pixel on the bottom row. We then take the bottom-row pixel with the lowest total energy and trace its seam back to the top. We simply remove all pixels belonging to this seam from the picture, shifting the rest of the picture into the gap to make a new picture one pixel narrower than before. This is the process of “carving” a seam from the picture.

Video seam carving
In our application of seam carving to aspect processing of moving sequences, we have made three powerful improvements to the original algorithm.

The first is that an element of motion compensated recursion is introduced, in which the energy function is weighted to favour placing a seam where we would have expected the corresponding seam in the previous picture to have moved to [6].

The second improvement relies on thinking of seam carving as a process that generates a map function linking input and output pixel locations, which is effectively a set of instructions for a rendering engine. We can manipulate the map function by filtering, scaling or mixing to make the seam carving process smoother and also to enable the seam carving analysis to be performed on a downsampled picture to save computation [6].

The third improvement extends the map approach to give smooth seam carving. We take note of the amount of energy removed from the picture by each seam in the original analysis phase. This then affects the width of picture material that is actually removed. The first few seams will carve through the plainest, lowest-energy areas of the picture, so we remove more picture material from these areas.

Figure 11 gives an indication of the energy levels of seams in our example pictures and shows the resulting output pictures.

Figure 11: Smooth, motion compensated recursive video seam carving
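One pass of the still-picture algorithm described above can be sketched in a few lines. This is a minimal illustration only: the energy function (sum of absolute differences to the four neighbours) is an assumption, since the text asks only for "an energy or activity function", and none of the three video improvements is included. Pictures are plain lists of rows of luminance values, and the function names are ours.

```python
def energy_map(img):
    """Assumed activity measure: sum of absolute differences to the 4 neighbours."""
    h, w = len(img), len(img[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            neighbours = (img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)],
                          img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x])
            energy[y][x] = sum(abs(c - n) for n in neighbours)
    return energy


def find_vertical_seam(energy):
    """Return one x position per row for a minimum-energy top-to-bottom seam."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]  # best partial seam cost ending at each pixel
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Start from the cheapest pixel on the bottom row and trace back upwards.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def carve_seam(img, seam):
    """Remove one pixel per row, making the picture one pixel narrower."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


picture = [[10, 200, 10, 10, 10] for _ in range(3)]  # bright stripe on a plain background
seam = find_vertical_seam(energy_map(picture))
print(seam)                       # [3, 3, 3]
print(carve_seam(picture, seam))  # stripe preserved, one plain column removed
```

Repeating the three steps shrinks the picture one column at a time; in the toy picture the seam runs through the plain area and leaves the bright stripe untouched.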
If seam carving is successful, it is difficult to tell where information has been removed from the picture. The effect often seems to be equivalent to a subtle change in perspective of the scene. However, there are cases where the result can look unnatural. These problems can be addressed by combining foreground-background segmentation with seam carving. The segmentation results can be used to weight the energy levels of parts of the picture that we wish to preserve. Seams will then be less likely to cross those areas, so picture material there is less likely to be removed.

The improvements that we have made to seam carving for video sequences allow a great deal of flexibility. We can perform seam carving independently in both horizontal and vertical dimensions and combine the resulting maps as described in [6]. We can use seam carving to expand or contract the picture in either or both dimensions. For example, Figure 12 shows the result of seam carving both horizontally and vertically, the aim this time being to retain the 16:9 aspect ratio but to produce a smaller picture in which objects of interest retain their original size and shape. In this example, the smooth seam carving approach is not used, so the first pictures show the “hard” horizontal and vertical seams that are removed from the picture.

Figure 12: Motion compensated recursive video seam carving for size reduction

There are parallels here with pre-warping, except that here the “warped” picture is designed to be viewed directly and does not require an inverse warping operation in the display.

Conclusions
This paper has introduced and demonstrated the concept of aspect processing, in which new technologies of dynamic reframing and video seam carving can be combined with conventional aspect ratio conversion techniques. The result is a comprehensive toolkit for optimizing the television viewing experience over a wide range of display platforms, delivery mechanisms and source characteristics.

References
1. Knee, M.J. Image segmentation algorithms for video re-purposing. Paper presented at CVMP, London, November 2006.
2. Vlachos, T. and Hill, L. Optimal search in Hough parameter hyperspace for estimation of complex motion in image sequences. IEE Proc. Vision, Image and Signal Processing, vol. 149, no. 2, April 2002, pp. 63-71.
3. Knee, M.J. International HDTV content exchange. Proc. IBC 2006.
4. Knee, M.J. Video transmission. International patent application PCT/GB2008/050158, filed 5 March 2008, to be published.
5. Avidan, S. and Shamir, A. Seam carving for content-aware image resizing. ACM Transactions on Graphics, vol. 26, no. 3, 2007.
6. Knee, M.J. Seam carving for video. Proc. NAB 2008.

Acknowledgements
The authors would like to thank the Directors of Snell & Wilcox Ltd. for their permission to publish this paper.


© 2008 Snell & Wilcox Limited

Snell & Wilcox and Putting Pictures to Work are trademarks of the Snell & Wilcox Group. All other trademarks are duly acknowledged.

Code Corp50 08/08 v1
