$\mathbf{L}_v$) and FEC rate of depth $R_{dFEC}(\mathbf{L}_d)$ for obtaining different 3D visual qualities, where $\mathbf{L}_v$ denotes the FEC rate allocation vector for the selected video layers and $\mathbf{L}_d$ denotes the FEC rate allocation vector for the selected depth layers.
By integrating the FEC rate allocation with the video/depth rate allocation, the optimal video/depth/FEC rate allocation can be expressed as

$$\bigl(R_v^{opt},\, R_d^{opt},\, R_{vFEC}^{opt}(\mathbf{L}_v^{opt}),\, R_{dFEC}^{opt}(\mathbf{L}_d^{opt})\bigr) = \arg\max_{\substack{R_v<R,\; R_d<R,\\ R_{vFEC}(\mathbf{L}_v)<R,\; R_{dFEC}(\mathbf{L}_d)<R}} Q_{3D}\bigl(R_v, R_d, R_{vFEC}(\mathbf{L}_v), R_{dFEC}(\mathbf{L}_d)\bigr)$$

$$\text{subject to}\quad R_v + R_d + R_{vFEC}(\mathbf{L}_v) + R_{dFEC}(\mathbf{L}_d) \le R, \qquad (1)$$
where $Q_{3D}(R_v, R_d, R_{vFEC}(\mathbf{L}_v), R_{dFEC}(\mathbf{L}_d))$ denotes the general 3D visual quality under the combination of $R_v$, $R_d$, $R_{vFEC}(\mathbf{L}_v)$ and $R_{dFEC}(\mathbf{L}_d)$. For the same rate amount $R_{vFEC}(\mathbf{L}_v)$ there are several possible $\mathbf{L}_v$; likewise, there are several possible $\mathbf{L}_d$ for the same rate amount $R_{dFEC}(\mathbf{L}_d)$. $\mathbf{L}_v^{opt}$ and $\mathbf{L}_d^{opt}$ are the optimal $\mathbf{L}_v$ and $\mathbf{L}_d$, i.e., those that result in the optimal 3D visual quality.
In (1), the 3D visual quality is obtained by coarse pooling of the video quality and the depth quality. In this work we adopt the simple pooling process suggested in [10], substituting the depth map for the disparity map:

$$Q_{3D} = a\,(Q_v)^d + b\,(Q_d)^e + c\,(Q_v)^d (Q_d)^e, \qquad (2)$$
where the video quality $Q_v$ and depth quality $Q_d$ need to be estimated before transmission, and $a$–$e$ are pooling model parameters obtained by off-line training. In our experiment, we set $a = 2.381$, $b = 0.001$, $c = 0.001$, $d = 2.127$, $e = 3.821$. This kind of video and depth quality pooling can approximately reflect the 3D visual quality, and its effectiveness has been confirmed in [10] through extensive subjective tests.
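As a quick illustration, the pooling of Eq. (2) can be sketched in a few lines of Python, hard-coding the trained parameter values listed above (the function name is ours):

```python
# Minimal sketch of the pooling model in Eq. (2); default parameters are
# the trained values reported in the text.
def q3d(q_v, q_d, a=2.381, b=0.001, c=0.001, d=2.127, e=3.821):
    """Pool video quality q_v and depth quality q_d into one 3D quality score."""
    return a * q_v**d + b * q_d**e + c * (q_v**d) * (q_d**e)
```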
To obtain the overall 3D visual quality, the qualities of video and depth in terms of signal distortion are independently estimated by predicting their packet loss probabilities. Since layer-based FEC is used, the packet loss rate for each layer must first be estimated. Based on the Bernoulli error model, the packet loss rate of the $m$-th layer ($m \le M_v$ for video and $m \le M_d$ for depth) after RS($N_m$, $K_m$) encoding, without considering layer dependency, can be obtained by
$$\varepsilon_m = \sum_{j=N_m-K_m+1}^{N_m} \binom{N_m}{j}\, \varepsilon^{\,j} (1-\varepsilon)^{N_m-j}, \qquad (3)$$
where the packet error probability $\varepsilon$ is assumed equal for every packet, i.e., packet losses are identically distributed across all source data and FEC packets. It should be mentioned that the packet error probability $\varepsilon$ can be regarded as the average packet loss rate in the statistical sense.
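Under these assumptions, the post-FEC layer loss rate of Eq. (3) can be sketched as follows (a minimal Python sketch; the function name is ours):

```python
from math import comb

# Sketch of Eq. (3): residual loss rate of a layer protected by RS(N_m, K_m),
# i.e. the probability that more than N_m - K_m of its N_m packets are lost,
# given an average per-packet loss probability eps.
def layer_loss_rate(n_m, k_m, eps):
    return sum(comb(n_m, j) * eps**j * (1.0 - eps)**(n_m - j)
               for j in range(n_m - k_m + 1, n_m + 1))
```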
Further, the probability $\pi_m$ that only the first $m$ layers are decodable among the total $M_v$ layers of the texture video can be expressed as

$$\pi_m = \begin{cases} \varepsilon_m \prod_{j=1}^{m-1}(1-\varepsilon_j), & 1 < m < M_v \\ \varepsilon_m, & m = 1 \\ \prod_{j=1}^{M_v}(1-\varepsilon_j), & m = M_v. \end{cases} \qquad (4)$$
Likewise, the probability $\pi_m$ for depth can be computed by substituting $M_d$ for $M_v$ in (4). With the estimated PLRs of the different layers, the total signal distortion of video or depth can be further estimated by
$$D = \sum_{m=1}^{M} \pi_m D_m, \qquad (5)$$
where $D_m$ denotes the video or depth distortion when only the first $m$ layers are decodable, and $M$ denotes $M_v$ for video distortion estimation and $M_d$ for depth distortion estimation. Finally, the overall $Q_{3D}$ can be computed with (2) using the PSNRs ($Q_v$ and $Q_d$) of video and depth, which are converted from their estimated signal distortions.
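The distortion estimation chain of Eqs. (4)-(5) can be sketched as below. This is a self-contained illustration, not the paper's code: it takes the per-layer post-FEC loss rates from Eq. (3) as input, additionally includes the outcome where no layer is decodable so that the probabilities sum to one, and assumes an 8-bit peak value for the MSE-to-PSNR conversion; all names are ours.

```python
import math

def outcome_probs(eps):
    """P(exactly the first m layers are decodable), for m = 0..M.

    eps[m] is the post-FEC loss rate of layer m+1; a layer is usable only
    if it and all layers below it decode.
    """
    M = len(eps)
    probs = [math.prod(1.0 - e for e in eps[:m]) * eps[m] for m in range(M)]
    probs.append(math.prod(1.0 - e for e in eps))  # all M layers decodable
    return probs

def expected_distortion(probs, dist):
    # Eq. (5): dist[m] is the MSE when exactly m layers are decodable
    return sum(p * d for p, d in zip(probs, dist))

def mse_to_psnr(mse, peak=255.0):
    # Convert the expected MSE into the PSNR fed to the pooling of Eq. (2)
    return 10.0 * math.log10(peak * peak / mse)
```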
With equations (1) to (5), the specific rate allocation can be performed sequentially on a group-of-pictures (GOP) basis. For each GOP, we monitor the transmission PLR $\varepsilon$ and then obtain the optimal video/depth/FEC rate allocation by exhaustively searching all candidate rate allocation cases and maximizing the 3D visual quality.
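The per-GOP full search can be sketched as follows. The candidate grids and the toy quality function in the usage below are placeholders; in the paper, the quality of each candidate combination comes from Eqs. (2)-(5), and all names are ours.

```python
from itertools import product

def best_allocation(candidates, total_rate, quality_fn):
    """Full search of Eq. (1): candidates yields (R_v, R_d, R_vFEC, R_dFEC)."""
    best, best_q = None, float("-inf")
    for alloc in candidates:
        if sum(alloc) > total_rate:   # budget constraint of Eq. (1)
            continue
        q = quality_fn(*alloc)        # stand-in for Q_3D of Eqs. (2)-(5)
        if q > best_q:
            best, best_q = alloc, q
    return best, best_q
```

For example, with a toy linear quality model `lambda rv, rd, rvf, rdf: rv + 0.5*rd + 0.2*(rvf + rdf)` and rate grids `[0, 2, 4, 6]` under a budget of 10, the search returns the allocation that spends the budget on the most valuable components first.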
2.3. 3D saliency based FEC assignment
With the joint video/depth/FEC rate allocation, the optimal GOP-level rate combination of video, depth and their FEC rates can be selected. This optimal selection is based on the assumption that all packets in a frame are equally important. However, this is usually not the case in practical 3D video viewing, where people pay more attention to the interesting parts of a picture than to the rest. Hence, 3D saliency characteristics can be incorporated into the FEC rate allocation within the picture to further enhance the FEC performance.
To deliver a high 3D visual experience over a bandwidth-limited channel, more FEC rate needs to be allocated to the salient regions to which users pay attention. For saliency-based FEC rate allocation, the 3D visual saliency map must first be generated. Unlike 2D visual saliency, 3D visual saliency involves the depth range of interest [9]. Therefore, we estimate the 3D saliency by fusing the 2D saliency and the depth saliency. We could process the video and depth separately to generate their corresponding saliency maps and then use them to guide the FEC rate allocation for video and depth. However, such an FEC scheme with individual saliency maps cannot provide the optimal overall 3D saliency in the final 3D video viewing. The 2D saliency map and the depth saliency map therefore need to be merged into one uniform 3D saliency map to guide the FEC rate assignment for video and depth.
[Block diagram: the video is fed to 2D video saliency computation and the depth to depth saliency computation; the resulting 2D saliency map and depth saliency map are merged into the 3D saliency map.]
Fig. 3. 3D saliency map generation
The complete 3D saliency map generation flow is shown in Fig. 3. The 2D visual saliency map is first extracted using the AIM model [11], which is based on the premise that localized saliency computation serves to maximize the information sampled from one's environment. After that, depth-based visual attention analysis is used to generate the depth saliency map. The depth of an object often correlates with its saliency level. Generally, the captured scene depth does not fully conform to the perceived depth range on the target display, so regions within appropriate depth ranges need to be given high saliency to guarantee the perceived depth sensation [12]. Foreground content often attracts more human attention than background content [13]. Therefore, we allocate high saliency values to regions close to the minimal value of the depth range and low saliency to those close to the maximal value. Thus, the depth saliency map is generated by simply mapping the appropriate depth range to a saliency map.
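The depth-to-saliency mapping described above can be sketched as below. The linear ramp is our assumption, since the text only fixes the ordering (near depth high, far depth low), and the function name is ours:

```python
# Map a flat list of depth values to saliency: depth values near the
# minimum of the depth range (foreground) receive saliency 1.0, values
# near the maximum (background) receive saliency 0.0.
def depth_saliency(depth):
    d_min, d_max = min(depth), max(depth)
    if d_max == d_min:
        return [1.0] * len(depth)  # flat depth: treat everything as salient
    return [(d_max - d) / (d_max - d_min) for d in depth]
```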
Finally, we obtain the 3D saliency map $S_{3D}$ by linearly merging the 2D saliency map $S_v$ and the depth saliency map $S_d$ as

$$S_{3D} = (1-\lambda)\, S_v + \lambda\, S_d, \qquad (6)$$
where $\lambda$ is a leverage factor between the 2D saliency and the depth saliency, which we set to 0.5. Currently, it is not well understood how 2D saliency and depth saliency are fused into the actual 3D saliency in the human brain. However, the simple linear weighting of 2D saliency and depth saliency can approximately reflect the actual 3D saliency. Fig. 4 shows the 2D saliency map, the depth saliency map, the merged 3D saliency map, and the corresponding segmented high-3D-saliency region. It can be seen from Fig. 4 that the merged 3D saliency map basically reflects human 3D saliency.
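For completeness, the linear merge of Eq. (6) amounts to a pixel-wise weighted sum (a minimal sketch over flat saliency arrays; the names are ours):

```python
# Eq. (6): merge the 2D saliency map s_v and the depth saliency map s_d
# element-wise, with leverage factor lam (0.5 in the paper).
def merge_saliency(s_v, s_d, lam=0.5):
    return [(1.0 - lam) * v + lam * d for v, d in zip(s_v, s_d)]
```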
Fig. 4. 3D saliency map for the Balloons sequence. (a) Balloons image, (b) depth map, (c) 2D saliency map, (d) depth saliency map, (e) merged 3D saliency map, (f) segmented high-3D-saliency region
At the current stage, the 3D saliency map is generated off-line before the 3D video streaming and is used as complementary data to assist the streaming. Given the 3D saliency map, more FEC rate can be allocated to the high-3D-saliency regions in each frame of video and depth.
3. EXPERIMENTAL RESULTS
To evaluate the proposed method, 3D video streaming with video/depth/FEC rate allocation is simulated in NS2. The video and depth are independently encoded by the SVC software JSVM 9.19 with 4 quality layers (1 CGS layer and 3 MGS layers). To facilitate concentrated protection of the dispersed saliency regions, flexible macroblock ordering (FMO) is used to encode the CGS layer of video and depth. In the experiments, view 4 of Lovebirds1 with 200 frames (view 5 is the virtual view) and view 1 of Balloons (view 2 is the virtual view) with 300 frames are used. To estimate the packet-loss-induced signal distortion, the Open SVC Decoder [14] with default error concealment is used to decode the error-corrupted scalable 3D video stream.
For FEC coding, we use RS(10, k) with dynamic k to regulate the FEC redundancy rate. To illustrate the performance of the proposed joint video/depth/FEC rate allocation method, a fixed-ratio 5:1 video/depth rate allocation method with a fixed FEC redundancy rate ratio of 0.2, and joint video/depth/FEC rate allocation without 3D saliency guidance, are used as the comparison solutions for the 3D video streaming.
Currently, for 3D saliency guided FEC rate allocation, we only add 3D saliency guided FEC to the CGS layer of video and depth, since FMO encoding of the MGS layers is not well supported in JSVM 9.19. Hence, the effectiveness of the 3D saliency based video/depth/FEC rate allocation is verified on the CGS layer, which is not involved in the layer-based rate allocation.
To verify the performance of the proposed joint video/depth/FEC rate allocation method, subjective 3D quality evaluation is performed with a 15.6-inch lenticular-lens-based stereoscopic display (TOSHIBA Qosmio F750 laptop computer). The subjects sat in front of the screen at a comfortable viewing distance, and the field of view was about 15°. The SSIS (Single Stimulus Impairment Scale) method described in ITU-R BT.500 [15] and five-grade MOS scales are used in the subjective tests. 15 subjects participated in the subjective tests, and the average MOS of their scores is used to evaluate the experimental results.
Fig. 5 shows the performance of the proposed joint rate allocation method under a constant channel rate of 4500 kbps with different network PLRs. It can be seen from Fig. 5 that the subjective 3D visual quality of the fixed-ratio video/depth/FEC rate allocation method degrades rapidly with increasing PLR, while the subjective quality of the joint video/depth/FEC rate allocation degrades much more slowly. This suggests that the proposed method can restrain the degradation of the 3D quality by regulating the trade-off among the FEC rates, video rates and depth rates. This is also confirmed by the objective quality evaluation of the virtual view shown in Fig. 6. Although the proposed rate allocation method is based on subjective 3D visual quality evaluation, the video and depth distortions are also indirectly used in the rate allocation optimization. Therefore, the PSNR of the virtual view synthesized from the corresponding reference texture video and depth can also reflect the efficiency of the proposed method.
[Two MOS-vs-PLR plots (PLR 0 to 0.1): (a) Balloons, (b) Lovebirds1; each compares joint rate allocation against fixed ratio rate allocation.]
Fig. 5. 3D perceptual quality comparison between the joint video/depth/FEC rate allocation and the fixed-ratio video/depth/FEC rate allocation under a constant rate constraint of 4500 kbps with different network PLRs
[Two PSNR(dB)-vs-PLR plots (PLR 0 to 0.1): (a) Balloons, (b) Lovebirds1; joint rate allocation vs. fixed ratio rate allocation.]
Fig. 6. Objective quality (PSNR value) of the synthesized virtual view video
The proposed joint rate allocation method dynamically regulates the FEC rate ratios to adapt to the transmission packet loss situation and thus improve the error-resilience performance. Fig. 7 shows the FEC rate ratio variations for video and depth under time-varying network PLRs (dynamically varying from 0 to 0.1) and a constant rate constraint of 4500 kbps. The FEC rate in Fig. 7 for video or depth includes the allocated rates of all selected layers of video or depth. The FEC rate ratio changes with the dynamic PLR in the temporal dimension, and the variations for video and depth are distinctly different. This indicates that the FEC rate regulation balances video against depth to achieve the optimal 3D visual quality.
Generally, the available channel bandwidth is often time-varying for video streaming. Fig. 8 shows the performance of the proposed joint rate allocation method under time-varying channel rate constraints from 3000 kbps to 5000 kbps with different PLRs. It can be seen that the proposed method adapts dynamically to the channel fluctuation and PLR variation. The dynamic rate allocation between video and depth with balanced FEC rate assignment consistently provides superior 3D visual quality under fluctuating channel conditions.
For the proposed rate allocation method, the FEC rate can also be unequally assigned by considering the 3D saliency within the picture. Fig. 9 shows the subjective quality comparison between the joint video/depth/FEC rate allocations with and without 3D saliency guidance under a constant rate constraint of 3500 kbps (only the CGS
[Plot of FEC rate ratio (0 to 0.5) vs. GOP number (0 to 35) for the Balloons sequence, with separate curves for video and depth.]
Fig. 7. FEC rate ratio variation with increasing GOP number
[Two MOS-vs-PLR plots (PLR 0 to 0.1): (a) Balloons, (b) Lovebirds1; joint rate allocation vs. fixed ratio rate allocation.]
Fig. 8. 3D perceptual quality comparison between the joint video/depth/FEC rate allocation and the fixed-ratio video/depth/FEC rate allocation under time-varying channel rate constraints from 3000 kbps to 5000 kbps
layer is involved in the rate allocation). From the figure, it can be seen that the MOS values of the 3D saliency-based rate allocation are slightly higher than those without 3D saliency guidance. This verifies that the 3D saliency based FEC provides stronger protection for the higher-saliency regions, so that the overall received 3D perceptual quality is better. Fig. 10 shows the red-cyan stereoscopic anaglyph 3D pictures of the Balloons sequence at the 117th frame under a transmission PLR of 5%. In the pictures, the region enclosed by the ellipse shows visibly higher-quality stereopsis for the 3D saliency based FEC rate allocation than for the allocation without 3D saliency consideration. By comparison, the FEC rate allocation without saliency consideration cannot provide the key protection for the region enclosed by the ellipse and may result in packet loss there. Consequently, the perceptual 3D quality of FEC rate allocation without 3D saliency consideration is usually worse than that of the 3D saliency-based unequal FEC rate allocation.
4. CONCLUSION
This paper presents a joint video/depth/FEC rate allocation method with 3D saliency guidance for scalable 3D video streaming. By utilizing end-to-end 3D perceptual quality estimation, the video rate, depth rate and the corresponding FEC rates are optimally assigned. Unequal protection between video and depth, as well as across different data layers, is performed. Further, with the 3D saliency analysis, unequal FEC rate assignment within the frame is also utilized to improve the FEC performance for 3D video streaming. Experimental results showed that the proposed 3D saliency guided joint video/depth/FEC rate allocation method
[Two MOS-vs-PLR plots (PLR = 0.03, 0.05, 0.10): (a) Balloons, (b) Lovebirds1; each compares rate allocation with and without 3D saliency.]
Fig. 9. 3D visual quality comparison between the joint video/depth/FEC rate allocations with and without 3D saliency guidance (95% confidence interval)
can provide good error resilience performance for scalable 3D video streaming and correspondingly achieve higher perceptual 3D quality than the fixed-ratio video/depth/FEC rate allocation method.
5. REFERENCES
[1] W. Zou, An Overview for Developing End-to-End Standards for 3-D TV in the Home, Information Display, vol. 25, no. 7, July 2009.
[2] P. Merkle, Y. Wang, K. Müller, A. Smolic, and T. Wiegand, Video plus Depth Compression for Mobile 3D Services, Proc. of IEEE 3DTV Conference 2009, Potsdam, Germany.
[3] A. Vetro, A. M. Tourapis, K. Müller, and T. Chen, 3D-TV content storage and transmission, IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 384-394, June 2011.
[4] L. Karlsson and M. Sjöström, Layer Assignment Based on Depth Data Distribution for Multiview-Plus-Depth Scalable Video Coding, IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 6, pp. 742-754, 2011.
[5] C.-H. Lin, Y.-C. Wang, C.-K. Shieh, and W.-S. Hwang, An Unequal Error Protection Mechanism for Video Streaming over IEEE 802.11e WLANs, Computer Networks, vol. 56, no. 11, pp. 2590-2599, July 2012.
[6] D. Jurca, P. Frossard, and A. Jovanovic, Forward Error Correction for Multipath Media Streaming, IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 9, pp. 1315-1326, 2009.
[7] W.-T. Tan and A. Zakhor, Video multicast using layered FEC and scalable compression, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 373-386, 2001.
[8] Y. Liu, Q. Huang, S. Ma, D. Zhao, and W. Gao, Joint Video/Depth Rate Allocation for 3D Video Coding based on View Synthesis Distortion Model, Signal Processing: Image Communication, vol. 24, no. 8, pp. 666-681, Aug. 2009.
[Anaglyph image pair: (a) without 3D saliency guidance, (b) with 3D saliency guidance.]
Fig. 10. Stereoscopic anaglyph (red-cyan) comparison between the FEC rate allocations with 3D saliency guidance and without 3D saliency guidance
[9] Y. Niu, Y. Geng, X. Li, and F. Liu, Leveraging Stereop-
sis for Saliency Analysis, IEEE CVPR, Providence, RI,
June 2012.
[10] J. You, L. Xing, A. Perkis, et al., Perceptual Quality Assessment for Stereoscopic Images Based on 2D Image Quality Metrics and Disparity Analysis, International Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, USA, 2010.
[11] N. Bruce and J. Tsotsos, Saliency Based on Infor-
mation Maximization, Advances in Neural Information
Processing Systems, vol. 18, pp. 155-162, 2006.
[12] N. Holliman, Mapping perceived depth to regions of
interest in stereoscopic images, Stereoscopic Displays
and Virtual Reality Systems XI, Proceedings of SPIE
5291, 2004.
[13] M. Lang, A. Hornung, O. Wang, S. Poulakos, A. Smolic, and M. Gross, Nonlinear disparity mapping for stereoscopic 3D, ACM Trans. Graph., vol. 29, no. 4, July 2010.
[14] http://sourceforge.net/projects/opensvcdecoder/
[15] ITU-R Recommendation BT.500-11, Methodology for
the subjective assessment of the quality of television pic-
tures, 2002.