BINAURAL ANALYSIS/SYNTHESIS OF INTERIOR AIRCRAFT SOUNDS

Charles Verron, Philippe-Aubert Gauthier, Jennifer Langlois, Catherine Guastavino

Multimodal Interaction Lab, McGill University, Montreal, Canada
Centre for Interdisciplinary Research on Music Media and Technology, Montreal, Canada
Groupe Acoustique, Université de Sherbrooke, Sherbrooke, Canada
ABSTRACT
A binaural sinusoids+noise synthesis model is proposed for reproducing interior aircraft sounds. First, a method for spectral and spatial characterization of binaural interior aircraft sounds is presented. This characterization relies on a stationarity hypothesis and involves four estimators: left and right power spectra, interaural coherence and interaural phase difference. Then we present two extensions of the classical sinusoids+noise model for the analysis and synthesis of stationary binaural sounds. First, we propose a binaural estimator using relevant information in both left and right channels for peak detection. Second, the residual modeling is extended to integrate two interaural spatial cues, namely coherence and phase difference. The resulting binaural sinusoids+noise model is evaluated on a recorded aircraft sound.

Index Terms— Binaural modeling, aircraft sound, spectral envelope, interaural coherence, interaural phase difference.
1. INTRODUCTION
Simulation of aircraft noise has received increased attention in the last decade, primarily as a means to generate stimuli for studies on annoyance caused by aircraft flyover noise. Synthesis methods have been proposed to reproduce such sounds with broadband, narrowband, and sinusoidal components, including the time-varying aircraft position relative to the observer, directivity patterns, Doppler shift, and atmospheric and ground effects [1, 2]. Realistic synthesis of interior aircraft sounds has, however, received less attention, despite its potential for flexible sound generation in flight simulators. Sources of noise in aircraft cabins were reviewed in [3]; the primary sources are the aircraft engines and the turbulent boundary layer noise. A promising approach for simulating aircraft sounds as a combination of sinusoidal and noise components has been proposed in [4] for monophonic signals. Compared to raw aircraft recordings, such a model has the advantage of enabling parametric transformations: individual modifications of the sinusoidal and noise components make it possible to investigate precisely their impact on passenger comfort.
In this paper, we investigate the analysis/synthesis of binaural interior aircraft sounds. Our contribution here is to extend previous models to reproduce both the spectral and the spatial (binaural) characteristics of the aircraft sounds. To do so, we propose an extension of the sinusoids+noise analysis/synthesis model: we present a binaural peak detection estimator and propose two additional binaural cues for the residual modeling, the interaural coherence and the interaural phase difference.
Correspondence: Catherine.guastavino@mcgill.ca
Figure 1: Binaural recording positions in a CRJ900 Bombardier aircraft. Three rows were recorded (04, 12, and 22), with 7 positions per row (indicated by green dots).
The paper is organized as follows: first we present the spectral and spatial descriptors used for the characterization of binaural aircraft sounds; then we present the binaural analysis/synthesis algorithm and its evaluation on a recorded aircraft sound.
2. CHARACTERIZATION OF BINAURAL AIRCRAFT
SOUNDS
The original sound recordings were provided by Bombardier. Twenty-one binaural recordings were made at different positions inside a CRJ900 Bombardier aircraft (see figure 1). The data were recorded at 16 bits and 48 kHz with a SQuadriga recorder from Head Acoustics, with BHS I binaural headset microphones mounted on a human head. All recordings were 16 seconds long and the flight conditions were constant across the measurements: height 35,000 feet, speed Mach 0.77. To characterize these binaural recordings, we first compute the short-time Fourier transform (STFT), defined for a monophonic signal x(n), ∀m ∈ [1; M], ∀k ∈ [0; Na − 1], by:

X(m, k) = Σ_{n=0}^{Na−1} wa(n) x(mMa + n) e^{−j2πkn/Na}

where wa is an analysis window of size Na, Ma is the analysis hop size and M is the total number of blocks. The STFTs XL and XR are calculated for the left and right signals respectively. Figure 2 illustrates the magnitude of the STFT for the binaural signal recorded in the aircraft at position SAG (row 22). It shows that interior aircraft sounds have stationary spectral properties. They contain
2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 16-19, 2011, New Paltz, NY
[Figure 2 panels: left and right spectrograms, time (s) vs. frequency (Hz, log scale 10 Hz to 10 kHz), magnitude color scale −80 to 0 dB.]
Figure 2: Binaural signal recorded at position SAG (row 22). The time-frequency representations (Na = 8192) of the left and right signals exhibit the salient components of interior aircraft sounds: spectral lines (sinusoids) and stationary broadband noise. The presence of sinusoidal components with very close frequencies (e.g., around 100 Hz) appears as a slow amplitude modulation.
sinusoidal components (spectral lines) and broadband background noise. The presence of sinusoids at very close frequencies can be seen as a slow amplitude modulation in the STFT. We will see in section 3.2 that detecting these close frequencies precisely requires very high frequency precision. A compact representation can be obtained by characterizing binaural aircraft sounds in terms of stationary spectral and spatial cues. For each sound we compute the left and right power spectra (also called spectral envelopes throughout the text) and two frequency-dependent binaural cues: the interaural coherence (IC) and the interaural phase difference (IPD). The IPD and the differences between left and right power spectra are related to the perceived position of sound sources, while the IC is related to the perceived auditory source width [5, 6]. Due to the stochastic nature of aircraft sounds, a single short-time spectrum is not sufficient to estimate these quantities properly. However, the estimation can be done by averaging short-time spectra across time, as presented below.
The power spectrum is estimated with Welch's method [7]. The STFT (calculated with an analysis hop size Ma = Na/2) is time-averaged to get the estimate of the power spectrum:

S(k) = (1/M) Σ_{m=1}^{M} |X(m, k)|²

Spectral envelopes SL and SR are calculated for the left and right signals respectively. Note that the frequency-dependent interaural level difference (ILD) can be deduced from the binaural spectral envelopes by ILD(k) = SR(k) − SL(k).
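As an illustration, the time-averaged estimator above can be sketched in a few lines of numpy. This is a minimal sketch, not the authors' code: the helper names `stft` and `welch_power_spectrum` are ours, and the toy binaural signal simply attenuates the right channel by 6 dB so that the ILD is known in advance.

```python
import numpy as np

def stft(x, Na, Ma, wa):
    """STFT X(m, k): frames of size Na, hop Ma, window wa."""
    M = (len(x) - Na) // Ma + 1
    frames = np.stack([wa * x[m * Ma : m * Ma + Na] for m in range(M)])
    return np.fft.fft(frames, axis=1)

def welch_power_spectrum(x, Na, wa):
    """S(k) = (1/M) sum_m |X(m, k)|^2, with hop Ma = Na/2."""
    X = stft(x, Na, Na // 2, wa)
    return np.mean(np.abs(X) ** 2, axis=0)

fs, Na = 48000, 1024
rng = np.random.default_rng(0)
left = rng.standard_normal(fs)          # 1 s of noise
right = 0.5 * left                      # right channel 6 dB below left
wa = np.hanning(Na)
wa /= wa.sum()                          # unit-sum normalization, as in the text
SL = welch_power_spectrum(left, Na, wa)
SR = welch_power_spectrum(right, Na, wa)
ILD_dB = 10 * np.log10(SR / SL)         # frequency-dependent ILD in dB
```

Since the right channel is an exact scaled copy of the left, `ILD_dB` comes out at −6.02 dB in every bin.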
The coherence function [8] gives a measure of correlation between two signals, per frequency bin. It is a real function of frequency, with values ranging between 0 and 1. Here it is estimated between the right and left signals by:

IC(k) = | Σ_{m=1}^{M} XR(m, k) XL*(m, k) |² / ( Σ_{m=1}^{M} |XR(m, k)|² · Σ_{m=1}^{M} |XL(m, k)|² )

The phase difference between the right and left signals is estimated by:

IPD(k) = ∠ ( Σ_{m=1}^{M} XR(m, k) XL*(m, k) )
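These two estimators can be sketched as follows (a minimal numpy sketch under our own naming; note that XL enters the cross-spectrum conjugated). The toy check uses a right channel equal to the left with a constant 0.3 rad phase shift, so that IC = 1 and IPD = 0.3 everywhere.

```python
import numpy as np

def interaural_cues(XL, XR):
    """IC(k) and IPD(k) from the STFTs XL(m, k) and XR(m, k)."""
    cross = np.sum(XR * np.conj(XL), axis=0)   # cross-spectrum sum_m XR XL*
    IC = np.abs(cross) ** 2 / (np.sum(np.abs(XR) ** 2, axis=0)
                               * np.sum(np.abs(XL) ** 2, axis=0))
    IPD = np.angle(cross)                      # phase difference in rad
    return IC, IPD

rng = np.random.default_rng(1)
M, K = 64, 256
XL = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
XR = XL * np.exp(1j * 0.3)      # right = left with a constant 0.3 rad shift
IC, IPD = interaural_cues(XL, XR)
```

With fully independent channels the same estimator instead drives IC toward 0, which is what makes it a useful source-width cue for the residual model below.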
When analyzing these signal properties, the choice of the analysis window is critical since it has a direct impact on the spectral resolution. For our study we considered the digital prolate spheroidal window (DPSW) family, which is a particular case of the "discrete prolate spheroidal sequences" developed by Slepian [9]. We chose an optimal frequency concentration under fc = 3.5/Na, resulting in a DPSW window having nearly all its energy contained in its first lobe, which is 7 bins wide.
Based on these spectral and spatial estimators, we propose a sinusoids+noise model for interior aircraft sounds: a binaural sinusoidal extraction method is proposed, then the residual is modeled in terms of spectral envelopes, IC and IPD cues.
3. SYNTHESIS MODEL
3.1. Sinusoids+noise model
In [10] an analysis/synthesis system based on a deterministic plus stochastic decomposition of a monophonic sound was presented. The deterministic part d(t) is a sum of I(t) sinusoids whose instantaneous amplitudes ai(t) and frequencies fi(t) vary slowly in time, while the stochastic part is modeled as a time-varying filtered noise s(t). This deterministic plus stochastic modeling (also called the sinusoids+noise model) has been used extensively for the analysis, parametric transformation and synthesis of speech, musical and environmental sounds (see for example [11]). Extracting relevant information from both left and right channels can improve the reliability of the sinusoids+noise analysis; this was used in [12] to improve partial tracking. Here we propose a binaural peak detection method using combined left/right information. We also extend the residual modeling to the binaural case by considering IC and IPD cues.
3.2. Binaural extraction of stationary sinusoidal components
Among the various estimation methods available in the literature for power spectrum estimation and sinusoidal detection (see for example [13, 14, 15]), we chose the modified periodogram first proposed by Welch [7] (see section 2). The proposed binaural peak detection is performed on an averaged (binaural) Welch spectral estimator:

SLR(k) = (1/M) Σ_{m=1}^{M} ( |XL(m, k)|² + |XR(m, k)|² ) / 2    (1)
A sinusoid is detected in SLR(k) at frequency ki if three conditions are satisfied:
• SLR(ki) is a local maximum
• ∃ k ∈ [ki − 2K, ki] | 20 log(SLR(ki)) > 20 log(SLR(k)) + ∆
• ∃ k ∈ [ki, ki + 2K] | 20 log(SLR(ki)) > 20 log(SLR(k)) + ∆
where ∆ is the peak detection threshold, set to 9 dB in this study.
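A sketch of this detection rule, under our own naming and with a Blackman window standing in for the DPSW (an assumption; numpy ships no prolate window), run on a toy signal where the sinusoid is present in the left channel only, i.e. the case where a single-channel detection on the right channel would fail:

```python
import numpy as np

def detect_peaks(SLR, K=7, delta_db=9.0):
    """Bins of SLR passing the three conditions: local maximum, and at least
    one bin delta_db lower within 2K bins on each side."""
    LdB = 10 * np.log10(SLR)
    peaks = []
    for ki in range(2 * K, len(SLR) - 2 * K):
        if not (LdB[ki] > LdB[ki - 1] and LdB[ki] > LdB[ki + 1]):
            continue
        if (np.any(LdB[ki] > LdB[ki - 2 * K : ki] + delta_db)
                and np.any(LdB[ki] > LdB[ki + 1 : ki + 2 * K + 1] + delta_db)):
            peaks.append(ki)
    return peaks

fs, Na = 48000, 4096
rng = np.random.default_rng(0)
n = np.arange(16 * Na)
# 1 kHz sinusoid in the LEFT channel only, independent noise in both channels
left = np.sin(2 * np.pi * 1000 / fs * n) + 0.05 * rng.standard_normal(len(n))
right = 0.05 * rng.standard_normal(len(n))
wa = np.blackman(Na)
wa /= wa.sum()                      # unit-sum normalization

def welch(x):
    hop = Na // 2
    M = (len(x) - Na) // hop + 1
    frames = np.stack([wa * x[m * hop : m * hop + Na] for m in range(M)])
    return np.mean(np.abs(np.fft.fft(frames, axis=1)) ** 2, axis=0)

SLR = 0.5 * (welch(left) + welch(right))    # averaged binaural spectrum (1)
peaks = detect_peaks(SLR[: Na // 2])        # 1 kHz sits near bin 1000*Na/fs
```

Even though the tone is absent from the right channel, the averaged spectrum still exposes it, and both channels then share the same detected frequency bin.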
Figure 3: Binaural peak detection. The peak detection algorithm is applied to the averaged binaural power spectrum (1) to catch peaks that would not be detectable on a single channel.
Using the averaged binaural power spectrum (1) for the peak detection ensures that each sinusoid is detected at the same frequency in both channels (see figure 3). A per-channel peak detection would miss a sinusoid attenuated in one channel, or detect it at a slightly different frequency, leading to errors at the resynthesis stage. After the peak detection, the normalized frequency of the sinusoid is fi = ki/Na and its amplitudes are derived for each channel by aiL = 2√(SL(ki)) and aiR = 2√(SR(ki)). Note that wa is normalized to have a unit sum (i.e., Σ_{n=−∞}^{+∞} wa(n) = 1) so that the amplitude of the peaks is correctly estimated. Following the peak detection stage, two FFTs XL(k) and XR(k) are computed on the entire 16-second left and right signals (Ntot = 768000 samples). The phase of each detected sinusoid is estimated by piL = ∠XL(ki) and piR = ∠XR(ki), where ki is the nearest integer to fi Ntot. To remove the sinusoids from the original sounds, XL(k) and XR(k) are forced to 0 on (K/2 + 1) Ntot/Na bins around ki. Then the IFFT is computed to get the binaural residual, which contains the background noise, while the deterministic components are resynthesized in the time domain by:

dL(n) = Σ_{i=1}^{I} aiL cos(2πfi n + piL)

where dL is the left-ear signal (the same process is applied to synthesize dR).
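The deterministic resynthesis is a direct sum of cosines. A minimal sketch (with a hypothetical helper name), using two equal sinusoids 1 Hz apart to reproduce the slow beating characteristic of these sounds:

```python
import numpy as np

def synth_sinusoids(freqs, amps, phases, N):
    """d(n) = sum_i a_i cos(2*pi*f_i*n + p_i), f_i in cycles per sample."""
    n = np.arange(N)
    d = np.zeros(N)
    for f, a, p in zip(freqs, amps, phases):
        d += a * np.cos(2 * np.pi * f * n + p)
    return d

fs = 48000
# two equal sinusoids 1 Hz apart -> amplitude envelope beating at 1 Hz
dL = synth_sinusoids([100 / fs, 101 / fs], [1.0, 1.0], [0.0, 0.0], 2 * fs)
```

At n = 0 the two cosines add constructively (amplitude 2); half a beat period later, at n = 24000 (0.5 s), they cancel, which is the envelope modulation visible in figure 2.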
Reproducing interior aircraft sounds requires very fine frequency resolution, since sinusoidal components are sometimes separated by only a few Hertz, producing beating sounds [3]. The beating is perceptually salient and should be reproduced correctly. To capture very close sinusoids, we compute the STFT with a 2-s-long analysis window wa (Na = 96000 samples at 48 kHz) using the DPSW, so that each sinusoidal component is represented by only K = 7 bins, i.e., 3.5 Hz in the 96000-bin spectra. Note that very narrow bands of noise (4K = 15 Hz) are also modeled as sinusoids with this approach. The frequency resolution can be further increased by zero-padding wa. As an example, zero-padding the window up to 8 s (Na = 384000 samples at 48 kHz) provides a spacing of 0.125 Hz between frequency bins, which is sufficient to reproduce the beating components of our sound database.
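The resolution argument can be checked numerically. A sketch assuming a Hann window in place of the DPSW: two equal tones 3.5 Hz apart are cleanly separated by a 2-s window zero-padded to 8 s (0.125 Hz bin spacing).

```python
import numpy as np

fs, Na, Nfft = 48000, 96000, 384000      # 2 s window, zero-padded to 8 s
n = np.arange(Na)
# two equal sinusoids 3.5 Hz apart, as found around 100 Hz in the recordings
x = np.sin(2 * np.pi * 100.0 * n / fs) + np.sin(2 * np.pi * 103.5 * n / fs)
wa = np.hanning(Na)                      # stand-in for the paper's DPSW
mag = np.abs(np.fft.rfft(wa * x, Nfft))  # bin spacing fs/Nfft = 0.125 Hz
b1, b2, mid = 800, 828, 814              # bins of 100 Hz, 103.5 Hz, 101.75 Hz
```

The magnitude at both tone bins stands well above the magnitude at the midpoint bin, i.e. the two components are resolved; with a 2-s Hann window the main lobe is about 2 Hz wide, comfortably below the 3.5 Hz separation.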
3.3. Binaural residual modeling
The binaural residual signal is modeled as a stationary stochastic process characterized by its left and right power spectra SL(k) and SR(k), along with the interaural cues IC(k) and IPD(k). These quantities are estimated from the residual as described in section 2. Note that the analysis window wa is normalized to have a unit power (i.e., Σ_{n=−∞}^{+∞} wa²(n) = 1) to compensate for its global effect on power spectrum magnitudes [7]. Here the analysis window size is called Ns. Synthesis of the stochastic residual is performed in the time-frequency domain as proposed in [10], but we extend the method to integrate the IC and IPD cues. Binaural synthetic signals are constructed in the time-frequency domain by:

YL(m, k) = √(SL(k)) [g1(m, k) + j g2(m, k)]

YR(m, k) = √(SR(k)) { [g1(m, k) + j g2(m, k)] √(IC(k)) e^{j IPD(k)} + [g3(m, k) + j g4(m, k)] √(1 − IC(k)) }    (2)

where the first term is the coherent part and the second the non-coherent part, and g1, ..., g4 are independent Gaussian noise sequences with zero mean and standard deviation σ = √(Ns/2) for k ∈ [1; Ns/2 − 1], and σg1 = σg3 = √Ns and σg2 = σg4 = 0 for k = 0 or Ns/2. Note that the spectra are conjugate-symmetric, so we only consider positive frequencies here (i.e., k ∈ [0; Ns/2]). For each frame the IFFT is computed and the resulting short-time segments y^m are overlapped and added to get the synthesized stochastic signal s:

y^m_L(n) = (1/Ns) Σ_{k=0}^{Ns−1} YL(m, k) e^{j2πkn/Ns}    ∀n ∈ [0; Ns − 1]

sL(n) = Σ_{m=−∞}^{+∞} ws(n − mMs) y^m_L(n − mMs)

where Ms is the synthesis hop size (the same process is applied to get the right signal sR). The Ns-point synthesis window ws must satisfy Σ_{m=−∞}^{+∞} ws²(n − mMs) = 1 ∀n so that the synthesized stochastic process has a constant variance [15]. We choose ws to be the square root of the Hanning window, with a synthesis hop size Ms = Ns/2.
3.4. Evaluation of the synthesis model
The proposed analysis/synthesis method was tested on the in-flight binaural recordings of our database. Sound examples are available online [16]. The results are illustrated in figure 4 for the recording at position SAG (row 22) in the aircraft cabin. Spectral envelopes (including sinusoidal peaks) and IC cues are correctly reconstructed. The IPD is also well reproduced in frequency regions with high IC (elsewhere the influence of the IPD becomes negligible in the model, as indicated by (2)).
In addition, since the residual modeling does not require the very high frequency resolution needed for the sinusoidal analysis stage, we used analysis/synthesis window sizes Ns ranging between 128 and 8192 samples as a reasonable latency/CPU/resolution trade-off. The effect of the window size on the resulting sound quality was further investigated in a formal listening test reported in [17]. The results confirmed that the original and resynthesized sounds were indistinguishable for window sizes greater than or equal to 1024 samples.
[Figure 4 panels vs. frequency (Hz, log scale 10 Hz to 10 kHz): left and right power spectra (dB), IC, and IPD (rad); curves: ref, synth, error.]
Figure 4: Result of the binaural analysis/synthesis algorithm for a
binaural signal recorded at position SAG (row 22) in the aircraft.
4. CONCLUSION
We presented a binaural model for interior aircraft sounds. The analysis stage uses a binaural peak detection to extract sinusoidal components, and the residual is modeled by spectral envelopes, interaural coherence and interaural phase difference. The model was validated on in-flight binaural recordings. Our method correctly reproduces the spectral and spatial cues of the original sounds. A formal listening test is presented in a companion paper [17] to further validate the model perceptually.
5. ACKNOWLEDGEMENT
This research was jointly supported by an NSERC grant (CRDPJ 35713507) and research funds from CRIAQ, Bombardier and CAE to A. Berry and C. Guastavino.
6. REFERENCES
[1] D. A. McCurdy and R. E. Grandle, "Aircraft noise synthesis system," NASA Technical Memorandum 89040, 1987.
[2] D. Berckmans, K. Janssens, H. V. der Auweraer, P. Sas, and W. Desmet, "Model-based synthesis of aircraft noise to quantify human perception of sound quality and annoyance," Journal of Sound and Vibration, vol. 311, no. 3-5, pp. 1175–1195, 2008.
[3] J. F. Wilby, "Aircraft interior noise," Journal of Sound and Vibration, vol. 190, no. 3, pp. 545–564, 1996.
[4] K. Janssens, A. Vecchio, and H. V. der Auweraer, "Synthesis and sound quality evaluation of exterior and interior aircraft noise," Aerospace Science and Technology, vol. 12, no. 1, pp. 114–124, 2008.
[5] J. Blauert, Spatial Hearing. The MIT Press, 1997.
[6] C. Faller, "Parametric multichannel audio coding: synthesis of coherence cues," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 299–310, 2006.
[7] P. D. Welch, "The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms," IEEE Transactions on Audio and Electroacoustics, vol. 15, pp. 70–73, 1967.
[8] J. O. Smith, Mathematics of the Discrete Fourier Transform (DFT). W3K Publishing, 2007. [Online]. Available: http://www.w3k.org/books/
[9] D. Slepian, "Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V: The discrete case," Bell System Technical Journal, vol. 57, pp. 1371–1430, 1978.
[10] X. Serra and J. O. Smith, "Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition," Computer Music Journal, vol. 14, no. 4, pp. 12–24, 1990.
[11] J. W. Beauchamp, Ed., Analysis, Synthesis, and Perception of Musical Sounds: The Sound of Music. Springer, 2007.
[12] M. Raspaud and G. Evangelista, "Binaural partial tracking," in Proc. of the 11th Int. Conference on Digital Audio Effects (DAFx'08), 2008.
[13] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. Englewood Cliffs, NJ: Prentice Hall International, 1996.
[14] H. C. So, Y. T. Chan, Q. Ma, and P. C. Ching, "Comparison of various periodograms for sinusoid detection and frequency estimation," IEEE Transactions on Aerospace and Electronic Systems, vol. 35, pp. 945–950, 1999.
[15] S. Marchand, "Advances in spectral modeling of musical sound," Habilitation à Diriger des Recherches, Université Bordeaux 1, 2008.
[16] [Online]. Available: http://mil.mcgill.ca/waspaa2011/model/
[17] J. Langlois, C. Verron, P.-A. Gauthier, and C. Guastavino, "Perceptual evaluation of interior aircraft sound models," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2011), 2011.