2d 3d Conversion

3D-TV Content Creation: Automatic 2D-to-3D Video Conversion
Presented By, PADMASHRI N.K 1BJ08CS027

Area of the paper: Broadcasting
Year of Publication: 2011
Authors: Liang Zhang, Carlos Vazquez, Sebastian Knorr
Terminologies
Broadcasting, data conversion, image generation, stereo displays, stereo synthesis, stereo vision, three-dimensional displays, 3D-TV
Outline
Abstract
Introduction
System Design
Existing System
Proposed System
Problem definitions
Computational model
Summary of characteristics of depth representation
Challenges
Conclusion
References
Abstract
Three-dimensional television (3D-TV) is the next major revolution in television. A successful rollout of 3D-TV will require a backward-compatible transmission/distribution system, inexpensive 3D displays, and an adequate supply of high-quality 3D program material. With respect to the last factor, the conversion of 2D images/videos to 3D will play an important role. This paper provides an overview of automatic 2D-to-3D video conversion with a specific look at a number of approaches for both the extraction of depth information from monoscopic images and the generation of stereoscopic images. Some challenging issues for the success of automatic 2D-to-3D video conversion are pointed out as possible research topics for the future.
Introduction (Digital TV Technology Trend Overview)
Introduction
Three-dimensional television (3D-TV) is anticipated to be the next step in the advancement of television. The term 3D in this context denotes stereoscopic, meaning a two-view system is used for visualization. The successful adoption of 3D-TV by the general public will depend not only on technological advances in 3D displays and 3D-TV broadcasting systems, but also on the availability of a wide variety of program content in stereoscopic 3D (S3D) format for 3D-TV services. The potential market is attracting many companies to invest manpower and money in developing 2D-to-3D conversion techniques. The fundamental principle of 2D-to-3D conversion techniques rests on the fact that stereoscopic viewing involves binocular processing of two slightly dissimilar images.
Introduction (continued..)
Thus, converting 2D images to stereoscopic 3D images rests on the principle of horizontally shifting pixels to create a new image, so that there are horizontal disparities between the original image and the new version of it. Various approaches for 2D-to-3D conversion have been proposed. These approaches can be classified into three schemes, namely: manual, human-assisted and automatic conversion.
System Design
System Design (continued..)

2D-to-3D video conversion involves the generation of new images from a single 2D image or a single stream of 2D images (video sequence). Seen from this perspective, the framework commonly used for automatic 2D-to-3D video conversion consists of two elements: the extraction of depth information, and the generation of stereoscopic images in accordance with both the estimated depth information and the expected viewing conditions.
The extraction of depth information:

The extraction of depth information aims to exploit pictorial cues and motion parallax, contained in a single 2D image or video, to recover the depth structure of the scene. The retrieved depth information is then converted into a suitable representation for usage in the 2D-to-3D video conversion process.
System Design (Continued..)

A sparse 3D scene structure and a depth map are two representations of incomplete geometry of a captured scene that are commonly used. A sparse 3D scene structure usually consists of a number of 3D real world coordinates, while a depth map is essentially a two-dimensional (2D) function that provides the depth, with respect to the camera position, as a function of the image coordinates.
The generation of stereoscopic images:

The generation of stereoscopic images is the step that involves warping textures from original images in accordance with the depth information retrieved to create a new image or video sequence for the second eye.
Existing system
The manual scheme:
The manual scheme is to shift the pixels horizontally, using an artistically chosen depth value for each region/object in the image, to generate a new image. Hand drawing produces high-quality depth, but is very time consuming and expensive.
The human-assisted scheme:

The human-assisted scheme is to convert 2D images to stereoscopic 3D with some corrections made manually by an operator. Even though this scheme reduces the time consumed in comparison to the manual conversion scheme, a significant amount of human engagement is still required to complete the conversion.
Proposed System
Automatic 2D-to-3D Video Conversion:
To convert the vast collection of available 2D material into 3D in an economical manner, an automatic conversion scheme is desired. The automatic conversion scheme exploits depth information originating from a single image or from a stream of images to generate a new projection of the scene with a virtual camera at a slightly different (horizontally shifted) viewpoint. It may be done in real time or in a more time-consuming off-line process. The quality of the resulting product is related to the level of processing involved.
Problem definitions
There are two key issues to consider for automatic 2D-to-3D conversion techniques:
1) how to retrieve depth information from a monoscopic image or video;
2) how to generate high-quality stereoscopic images at new virtual viewpoints.
Computational model
Extraction of scene depth information

Each depth image stores depth information as 8-bit grey values, with grey level 0 indicating the furthest value and grey level 255 specifying the closest value. Human beings exploit a variety of depth cues to perceive the world in three dimensions. These are typically classified into binocular and monocular depth cues.
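The 8-bit depth convention described above can be illustrated with a minimal sketch. The function name depth_to_grey, the parameters z_near and z_far, and the linear quantization are assumptions made for illustration; the slides only state the convention that 0 is furthest and 255 is closest.

```python
import numpy as np

def depth_to_grey(depth, z_near, z_far):
    """Map metric depth to 8-bit grey values: 255 = closest, 0 = furthest.

    The linear mapping is an assumption; other quantizations are possible.
    """
    depth = np.clip(np.asarray(depth, dtype=np.float64), z_near, z_far)
    grey = 255.0 * (z_far - depth) / (z_far - z_near)
    return np.round(grey).astype(np.uint8)

# Hypothetical depths in metres; values beyond z_far are clipped to "furthest".
depth_m = np.array([[1.0, 4.0], [10.0, 20.0]])
grey = depth_to_grey(depth_m, z_near=1.0, z_far=10.0)
print(grey)
```

Storing depth this way keeps the depth map compatible with ordinary greyscale image and video tools.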
Computational model (Continued..)

The extraction of scene depth information aims to convert monocular depth cues contained in video sequences into quantitative depth values of a captured scene. Monocular depth cues can be subdivided into pictorial and motion cues.
A. Depths From Pictorial Cues

Pictorial depth cues are the elements in an image that allow us to perceive depth in a 2D representation of the scene. Depth perception could be related to the physical characteristics of the Human Visual System (HVS) or could be learned from experience. The three categories of pictorial cues commonly used to extract depth information are described in the following subsections.

Classification of depth cues.
Pictorial depth cues in a 2D image. Visible depth cues: linear perspective, relative and known size, texture gradient, atmospheric scattering, relative height in picture, and interposition.

1) Depth from focus/defocus:
The amount of blur of an object in an image depends on its distance from the focused plane. This mechanism can be exploited to generate depth information from captured images that contain a focused plane and objects outside the focused plane. There are two main approaches to implementing it. The first employs several images with different focus characteristics. The second tries to extract the blur information from a single image by measuring the amount of blur associated with each pixel and then mapping the blur measure to the depth of that pixel.
Although recovering depth from focus/defocus is relatively simple, it suffers from a major drawback: it cannot distinguish the foreground from the background when their amounts of blur are similar.
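The single-image approach can be sketched as follows. This is only an illustration under stated assumptions: local sharpness is measured as the mean squared Laplacian (one of many possible blur measures), and sharper regions are assumed to be closer, which is exactly the foreground/background ambiguity noted above. The helper names are hypothetical.

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian with edge replication."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, computed with an integral image."""
    k = 2 * radius + 1
    p = np.pad(img, radius, mode="edge")
    s = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / (k * k)

def sharpness_to_depth(img, radius=2):
    """Scale local sharpness to 0..255, assuming sharper means closer."""
    img = np.asarray(img, dtype=np.float64)
    sharp = box_mean(laplacian(img) ** 2, radius)
    peak = sharp.max()
    if peak == 0:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255.0 * sharp / peak).astype(np.uint8)

# Synthetic frame: textured (in focus) left half, flat (defocused) right half.
yy, xx = np.mgrid[0:16, 0:32]
img = np.where(xx < 16, ((xx + yy) % 2) * 255.0, 127.0)
depth = sharpness_to_depth(img)
print(depth[8, 4], depth[8, 30])
```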

2) Depth from geometric cues:
Geometry-related pictorial depth cues are linear perspective, known size, relative size, height in picture, interposition, and texture gradient. Linear perspective refers to the property of parallel lines appearing to converge at an infinite distance. Height in picture denotes that objects closer to the bottom of the image are generally closer to the viewer than objects at the top of the picture. Aside from linear perspective and height in picture, it is also possible to recover depth from texture (called shape-from-texture), which aims to estimate the shape of a surface based on cues from markings on the surface or its texture. Those methods, however, are normally restricted to specific types of images and cannot be applied to 2D-to-3D conversion of general video content.
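The height-in-picture cue lends itself to a very small sketch: a vertical ramp used as a depth map, with the bottom row closest. The function name and the purely linear ramp are assumptions for illustration; as noted above, such a prior only suits specific types of images.

```python
import numpy as np

def height_in_picture_depth(height, width):
    """Vertical ramp depth map: 0 (far) on the top row, 255 (near) on the bottom row."""
    ramp = np.round(np.linspace(0.0, 255.0, num=height)).astype(np.uint8)
    return np.tile(ramp[:, None], (1, width))

depth = height_in_picture_depth(4, 3)
print(depth)
```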

3) Depth from color and intensity cues:
Variations in the amount of light arriving at the eye can also provide information about the depth of objects. This type of variation is reflected in captured images as variations of intensity or changes in color. Atmospheric scattering refers to the scattering of light rays by the atmosphere, which gives objects in the far distance a bluish tint and less contrast, and objects at close range better contrast. Light and shadow distribution refers to the information provided by shadows about the position and shape of objects relative to other objects and the background. Figure-ground perception is another mechanism that helps in the perception of depth. Edges and regions in the image are the depth cues providing this information.

B. Depths From Motion Cues
Motion parallax refers to the relative motions of objects across the retina. For a moving observer, near objects move faster across the retina than far objects, so relative motion provides an important depth cue. This is the principle behind the depth-from-motion-parallax approach, which relies on the fact that objects with different motions usually have different depths. In principle, only video sequences captured by a freely moving camera have motion parallax that is closely related to the captured scene structure. Different camera motions lead to different strengths of depth perception. The approach consists of two parts: determination of motion parallax from the sequence, and mapping of motion parallax into depth information.

1) Motion Parallax Between Images:
Motion parallax allows perceiving depth from the differences between two frames in a video sequence. These differences are observed in the video as image motion. By extracting this image motion, the motion parallax can be recovered. Image motion may relate to the whole image (global motion estimation) or to specific parts, such as rectangular blocks, arbitrarily shaped patches or even single pixels. Motion estimation methods can generally be classified into direct and indirect methods.
2) Conversion of Motion Parallax Into Depth Information:
Depending on the depth representation, motion parallax estimated from a video sequence is then converted into depth information in terms of a 2D depth map or a 3D scene structure.
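The two steps above, block-wise motion estimation followed by a motion-to-depth mapping, can be sketched as follows. This is a minimal illustration, not the method of the paper: it assumes exhaustive block matching with a sum-of-absolute-differences (SAD) criterion and the simplest possible mapping, in which larger motion magnitude is treated as closer. Function names, block size and search range are hypothetical.

```python
import numpy as np

def block_motion(prev, curr, block=8, search=4):
    """For each block of `curr`, find the displacement (dy, dx) into `prev`
    with the smallest SAD. Ties are resolved in favour of the smallest shift."""
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    h, w = curr.shape
    offsets = sorted(
        ((dy, dx) for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)),
        key=lambda o: o[0] ** 2 + o[1] ** 2,
    )
    rows, cols = h // block, w // block
    motion = np.zeros((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            target = curr[y:y + block, x:x + block]
            best_sad = np.inf
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                    continue  # candidate block falls outside the frame
                sad = np.abs(target - prev[yy:yy + block, xx:xx + block]).sum()
                if sad < best_sad:
                    best_sad = sad
                    motion[r, c] = (dy, dx)
    return motion

def motion_to_depth(motion):
    """Treat motion magnitude as depth: largest motion -> 255 (closest)."""
    mag = np.hypot(motion[..., 0], motion[..., 1])
    peak = mag.max()
    if peak == 0:
        return np.zeros(mag.shape, dtype=np.uint8)
    return np.round(255.0 * mag / peak).astype(np.uint8)

# Synthetic pair: the left half is static, the right half moves 2 pixels.
rng = np.random.default_rng(0)
prev = rng.random((16, 16))
curr = prev.copy()
curr[:, 8:] = prev[:, 6:14]
motion = block_motion(prev, curr)
depth = motion_to_depth(motion)
print(depth)
```

The result is a block-resolution depth map with a plausible depth order rather than metric depth values.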

a) 2D depth map reconstruction:
A 2D depth map can be reconstructed from image motion. The magnitudes of the motion vectors within each video frame are directly treated as depth values when consecutive video frames are taken in almost parallel viewing or are acquired with a small baseline distance between two consecutive viewpoints. Such a motion-to-depth mapping might not generate correct depth magnitudes, but would have a correct depth order.
b) Sparse 3D scene structure reconstruction:
A sparse 3D scene structure is represented by a set of 3D feature points in a reconstructed 3D world.
Sparse 3D scene structure and camera track determined by structure from motion (SfM) and positioning of a virtual stereo camera.
Depth map enhancement
Figure panels: depth map estimated by block-matching motion estimation; depth fusion; enhanced depth map; color segmented frame.
Generation of stereoscopic images

The procedures for the generation of stereoscopic images vary with the representations of depth information.
A. Approaches Based on 2D Depth Maps:
To generate the 3D video, DIBR is used to synthesize the second-view video from the estimated depth map and the 2D video input. Depth image based rendering (DIBR) permits the creation of novel images, using information from depth maps, as if they were captured with a camera from different viewpoints. The DIBR system usually consists of three steps:
i) Pre-processing
ii) 3D image warping
iii) Hole filling:
-> Detect holes
-> Fill holes by averaging textures from neighboring pixels
-> Fill holes by linear interpolation

Depth image based rendering (DIBR)
Block diagram: 2D video input -> depth estimation -> depth map pre-processing -> 3D image warping -> hole filling; outputs: left view and right view.
Stereoscopic image synthesis with DIBR. (a) Original color image "Interview", (b) corresponding depth image, and (c) rendered image without hole-filling. Holes are marked with a green color.
Major challenges in DIBR:
Occlusion: Two different points in the image plane of the real view can be warped to the same location in the virtual view. To resolve this, the point that appears closer to the camera in the virtual view is used.
Disocclusion: An area that is occluded in the real view may become visible in the virtual view. Disocclusion can be resolved by (1) hole filling and (2) depth map pre-processing.
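Both effects show up in even a very small warping sketch. The code below is an illustration under stated assumptions: disparity is taken to be proportional to the 8-bit depth value, pixels are shifted to the left to form the virtual view, and the parameter max_disparity is hypothetical. Occlusion is resolved by keeping the closer pixel, and disoccluded positions are returned as a hole mask.

```python
import numpy as np

def warp_view(image, depth, max_disparity=4):
    """Shift pixels horizontally by a depth-dependent disparity.

    Returns the virtual view and a boolean mask of holes (disocclusions).
    """
    image = np.asarray(image)
    depth = np.asarray(depth)
    h, w = depth.shape
    disparity = np.round(max_disparity * depth.astype(np.float64) / 255.0).astype(int)
    virtual = np.zeros_like(image)
    zbuf = np.full((h, w), -1, dtype=int)  # depth value already written, -1 = empty
    for y in range(h):
        for x in range(w):
            xv = x - disparity[y, x]
            # Occlusion rule: the closer point (larger grey value) wins.
            if 0 <= xv < w and depth[y, x] > zbuf[y, xv]:
                virtual[y, xv] = image[y, x]
                zbuf[y, xv] = depth[y, x]
    holes = zbuf < 0
    return virtual, holes

# One image row: a near object (depth 255) in front of a far background (depth 0).
image = np.array([[10, 20, 30, 40, 50, 60]])
depth = np.array([[0, 0, 255, 255, 0, 0]], dtype=np.uint8)
virtual, holes = warp_view(image, depth, max_disparity=2)
print(virtual)
print(holes)
```

In this example the near object covers two background pixels (occlusion) and leaves two empty positions behind it (disocclusion).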

Hole filling by interpolation:
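A minimal sketch of this step, assuming a greyscale view and a boolean hole mask such as the one produced by the warping stage: each hole pixel is filled by linear interpolation between the nearest valid pixels on the same row. The function name is hypothetical, and row-wise interpolation is only one of several possible filling strategies.

```python
import numpy as np

def fill_holes_rowwise(view, holes):
    """Fill hole pixels by linear interpolation along each image row."""
    filled = np.asarray(view, dtype=np.float64).copy()
    holes = np.asarray(holes, dtype=bool)
    cols = np.arange(filled.shape[1])
    for y in range(filled.shape[0]):
        valid = ~holes[y]
        if valid.any() and not valid.all():
            # np.interp holds the end values constant beyond the valid range.
            filled[y, holes[y]] = np.interp(cols[holes[y]], cols[valid], filled[y, valid])
    return filled

view = np.array([[10.0, 0.0, 0.0, 40.0]])
holes = np.array([[False, True, True, False]])
filled = fill_holes_rowwise(view, holes)
print(filled)
```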
B. Approaches Based on Sparse 3D Scene Structure:

The basic idea is to determine the transformation between the original and virtual views, based on the sparse 3D scene structure, to enable the generation of virtual views. This procedure consists of the following three steps:
1) Setup of Virtual Stereo Rig
2) Determination of Planar Homographies for Image Warping
3) Virtual View Generation
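The homography step can be illustrated with a short sketch. It assumes that projecting the sparse 3D points into the original and the virtual camera yields pairs of corresponding image points, and it estimates the planar homography from those pairs with the standard direct linear transform (DLT). This is a stand-in for illustration, not necessarily the procedure used in the paper; the point coordinates and the matrix h_true are made up.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of H such that dst ~ H @ src in homogeneous coordinates.
    Needs at least four correspondences, no three of them collinear."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    h = vt[-1].reshape(3, 3)  # null vector of the constraint matrix
    return h / h[2, 2]

def apply_homography(h, points):
    """Map 2D points through H and divide by the homogeneous coordinate."""
    pts = np.asarray(points, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical correspondences generated from a known homography.
h_true = np.array([[1.0, 0.05, 0.5], [0.0, 1.0, 0.0], [0.01, 0.0, 1.0]])
src = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
dst = apply_homography(h_true, src)
h_est = estimate_homography(src, dst)
print(np.round(h_est, 3))
```

Once H is known, the virtual view is generated by warping the original image with it.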
Demo : 2D video sample
Demo : converted 3D video sample
Summary of characteristics of depth representation
Challenges
Even though much research has been done to enable automatic 2D-to-3D conversion, the techniques are still far from mature. Most available products and methods are only successful in certain circumstances. The following are some key challenging issues to be solved. One issue that directly affects the image quality is the occlusion/disocclusion problem during the generation of the stereoscopic images. The depth ambiguity from monocular depth cues is one issue that impacts the depth quality. The depth ambiguity originates from the violation of the principles of depth generation. The integration of various depth cues is another issue affecting the success of automatic 2D-to-3D video conversion. The real-time implementation of 2D-to-3D conversion is also a critical issue for the adoption of the proposed techniques by the general public.
Conclusion
This paper summarized current technical advances related to the development of automatic 2D-to-3D video conversion. The underlying principle of the conversion is to horizontally shift the pixels of an original image to generate a new version of it. To enable this conversion, different approaches for the extraction of depth information from monoscopic images and the generation of stereoscopic images were reviewed. A number of challenging issues that have to be solved for the success of automatic 2D-to-3D video conversion were pointed out as possible research topics. With the development of more advanced techniques for 2D-to-3D video conversion, the vast collection of 2D material currently available will be converted into stereoscopic 3D to boost the general public interest in purchasing 3D displays and 3D-TV services.
References
IEEE Transactions on Broadcasting, vol. 57, no. 2, June 2011.
S. Yano and I. Yuyama, Stereoscopic HDTV: Experimental system and psychological effects, J. SMPTE, vol. 100, pp. 14-18, 1991.
N. S. Holliman, N. A. Dodgson, and G. Favalora, Three-dimensional display technologies: An analysis of technical performance characteristics, IEEE Trans. Broadcast., 2011, to be published.
P. Harman, J. Flack, S. Fox, and M. Dowley, Rapid 2D to 3D conversion, in SPIE Conf. Stereoscopic Displays and Virtual Reality Systems IX, 2002, vol. 4660, pp. 78-86.
