2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics October 16-19, 2011, New Paltz, NY

BINAURAL ANALYSIS/SYNTHESIS OF INTERIOR AIRCRAFT SOUNDS
Charles Verron, Philippe-Aubert Gauthier, Jennifer Langlois, Catherine Guastavino

Multimodal Interaction Lab, McGill University, Montreal, Canada
Centre for Interdisciplinary Research on Music Media and Technology, Montreal, Canada
Groupe Acoustique, Université de Sherbrooke, Sherbrooke, Canada
ABSTRACT
A binaural sinusoids+noise synthesis model is proposed for repro-
ducing interior aircraft sounds. First, a method for spectral and spa-
tial characterization of binaural interior aircraft sounds is presented.
This characterization relies on a stationarity hypothesis and involves
four estimators: left and right power spectra, interaural coherence
and interaural phase difference. Then we present two extensions of
the classical sinusoids+noise model for the analysis and synthesis
of stationary binaural sounds. First, we propose a binaural estima-
tor using relevant information in both left and right channels for
peak detection. Second, the residual modeling is extended to inte-
grate two interaural spatial cues, namely coherence and phase dif-
ference. The resulting binaural sinusoids+noise model is evaluated
on a recorded aircraft sound.
Index Terms— Binaural modeling, aircraft sound, spectral envelope, interaural coherence, interaural phase difference.
1. INTRODUCTION
Simulation of aircraft noise has received increased attention in the last decade, primarily as a means to generate stimuli for studies on annoyance caused by aircraft flyover noise. Synthesis methods have been proposed to reproduce such sounds with broadband, narrowband, and sinusoidal components, including the time-varying aircraft position relative to the observer, directivity patterns, Doppler shift, and atmospheric and ground effects [1, 2]. Realistic synthesis of interior aircraft sounds has, however, received less attention, despite its potential for flexible sound generation in flight simulators. Sources of noise in aircraft cabins were reviewed in [3]. The primary sources are the aircraft engines and the turbulent boundary layer noise. A promising approach for simulating aircraft sounds as a combination of sinusoidal and noisy components has been proposed in [4] for monophonic signals. Compared to raw aircraft recordings, such a model has the advantage of enabling parametric transformations: individual modifications of the sinusoidal and noisy components make it possible to investigate precisely their impact on passenger comfort.
In this paper, we investigate the analysis/synthesis of binaural
interior aircraft sounds. Our contribution here is to extend previous
models to reproduce both spectral and spatial (binaural) character-
istics of the aircraft sounds. To do so, we propose an extension to
the sinusoids+noise analysis/synthesis model: we present a binaural
peak detection estimator and propose two additional binaural cues
for the residual modeling: the interaural coherence and the interau-
ral phase difference.
Correspondence: Catherine.guastavino@mcgill.ca
Figure 1: Binaural recording positions in a CRJ900 Bombardier aircraft. Three rows were recorded (04, 12, and 22), with 7 positions per row (indicated by green dots).
The paper is organized as follows: first we present spectral
and spatial descriptors used for characterization of binaural aircraft
sounds, then we present the binaural analysis/synthesis algorithm
and its evaluation on a recorded aircraft sound.
2. CHARACTERIZATION OF BINAURAL AIRCRAFT
SOUNDS
The original sound recordings were provided by Bombardier. Twenty-one binaural recordings were made at different positions inside a CRJ900 Bombardier aircraft (see figure 1). The data were recorded at 16 bits and 48 kHz with a Head Acoustics SQuadriga recorder and BHS I binaural headset microphones mounted on a human head. All recordings were 16 seconds long and the flight conditions were constant across the measurements: altitude 35,000 feet, speed Mach 0.77. To characterize these binaural recordings, we first compute the short-time Fourier transform (STFT), defined for a monophonic signal x[n], ∀m ∈ [1; M], ∀k ∈ [0; Na − 1], by:

$$X(m,k) = \sum_{n=0}^{N_a-1} w_a(n)\, x(mM_a + n)\, e^{-j2\pi kn/N_a}$$

where wa is an analysis window of size Na, Ma is the analysis hop size and M is the total number of blocks. STFTs XL and XR are calculated for the left and right signals, respectively. Figure 2 illustrates the magnitude of the STFT for the binaural signal recorded in the aircraft at position SAG (row 22). It shows that interior aircraft sounds have stationary spectral properties.
Figure 2: Binaural signal recorded at position SAG (row 22). The time-frequency representations (Na = 8192) of the left and right signals exhibit the salient components of interior aircraft sounds: spectral lines (sinusoids) and stationary broadband noise. The presence of sinusoidal components with very close frequencies (e.g., around 100 Hz) appears as a slow amplitude modulation.
They contain sinusoidal components (spectral lines) and broadband background noise. The presence of sinusoids at very close frequencies can be seen as a slow amplitude modulation in the STFT. We will see in section 3.2 that detecting these close frequencies precisely requires very high frequency resolution. A compact representation can be obtained by characterizing binaural aircraft sounds in terms of stationary spectral and spatial cues. For each sound we compute left and right power spectra (also called spectral envelopes throughout the text) and two frequency-dependent binaural cues: the interaural coherence (IC) and the interaural phase difference (IPD). The IPD and the differences between left and right power spectra are related to the perceived position of sound sources, while the IC is related to the perceived auditory source width [5, 6]. Due to the stochastic nature of aircraft sounds, a single short-time spectrum is not sufficient to estimate these quantities properly. However, the estimation can be done by averaging short-time spectra across time, as presented below.
The power spectrum is estimated with Welch's method [7]. The STFT (calculated with an analysis hop size Ma = Na/2) is time-averaged to obtain the estimate of the power spectrum:

$$S(k) = \sqrt{\frac{1}{M}\sum_{m=1}^{M} |X(m,k)|^2}$$

Spectral envelopes SL and SR are calculated for the left and right signals, respectively. Note that the frequency-dependent interaural level difference (ILD) can be deduced from the binaural spectral envelopes by ILD(k) = SR(k) − SL(k).
The coherence function [8] gives a measure of correlation between two signals, per frequency bin. It is a real function of frequency, with values ranging between 0 and 1. Here it is estimated between the right and left signals by:

$$IC(k) = \frac{\left|\sum_{m=1}^{M} X_R(m,k)\, X_L^{*}(m,k)\right|^2}{\sum_{m=1}^{M} |X_R(m,k)|^2 \;\sum_{m=1}^{M} |X_L(m,k)|^2}$$
The phase difference between the right and left signals is estimated by:

$$IPD(k) = \angle\left(\sum_{m=1}^{M} X_R(m,k)\, X_L^{*}(m,k)\right)$$
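For illustration, the four estimators above can be computed directly with NumPy. The sketch below is not the authors' implementation: the framing code and function names are ours, and the window is assumed to have been normalized beforehand as discussed in sections 3.2 and 3.3.

```python
import numpy as np

def stft(x, win, hop):
    """Short-time Fourier transform X(m, k) with analysis window win and hop size hop."""
    n_a = len(win)
    frames = [x[start:start + n_a] * win
              for start in range(0, len(x) - n_a + 1, hop)]
    return np.fft.fft(np.asarray(frames), axis=1)          # shape (M, n_a)

def binaural_estimators(x_left, x_right, win, hop):
    """Left/right spectral envelopes, interaural coherence and phase difference."""
    X_L = stft(x_left, win, hop)
    X_R = stft(x_right, win, hop)
    # Welch-style RMS spectral envelopes (time average of the squared magnitudes)
    S_L = np.sqrt(np.mean(np.abs(X_L) ** 2, axis=0))
    S_R = np.sqrt(np.mean(np.abs(X_R) ** 2, axis=0))
    # Cross spectrum summed over frames
    cross = np.sum(X_R * np.conj(X_L), axis=0)
    # Magnitude-squared coherence in [0, 1] and interaural phase difference
    IC = np.abs(cross) ** 2 / (np.sum(np.abs(X_R) ** 2, axis=0)
                               * np.sum(np.abs(X_L) ** 2, axis=0) + 1e-20)
    IPD = np.angle(cross)
    return S_L, S_R, IC, IPD
```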
When analyzing these signal properties, the choice of the analysis window is critical since it has a direct impact on the spectral resolution. For our study we considered the digital prolate spheroidal window (DPSW) family, a particular case of the discrete prolate spheroidal sequences developed by Slepian [9]. We choose an optimal frequency concentration below fc = 3.5/Na, resulting in a DPSW window having nearly all its energy contained in its main lobe, which is 7 bins wide.
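A window of this family can be obtained, for example, with scipy.signal.windows.dpss. The sketch below assumes that the time-half-bandwidth product NW = Na · fc = 3.5 is the intended reading of the concentration bound above; this is our interpretation, not a detail given in the text.

```python
import numpy as np
from scipy.signal.windows import dpss

Na = 8192                  # analysis window length (as in figure 2)
NW = 3.5                   # time-half-bandwidth product, i.e. fc = 3.5 / Na
wa = dpss(Na, NW)          # first (most concentrated) DPSS taper
wa = wa / wa.sum()         # unit-sum normalization used for sinusoidal amplitude estimation
```

With NW = 3.5, the main lobe spans about 2 × 3.5 = 7 frequency bins, consistent with the 7-bin width stated above.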
Based on these spectral and spatial estimators, we propose a
sinusoids+noise model for interior aircraft sounds: a binaural sinu-
soidal extraction method is proposed, then the residual is modeled
in terms of spectral envelopes, IC and IPD cues.
3. SYNTHESIS MODEL
3.1. Sinusoids+noise model
In [10] an analysis/synthesis system based on a deterministic plus
stochastic decomposition of a monophonic sound was presented.
The deterministic part d(t) is a sum of I(t) sinusoids whose in-
stantaneous amplitude ai(t) and frequency fi(t) vary slowly in
time, while the stochastic part is modeled as a “time-varying” fil-
tered noise s(t). This deterministic plus stochastic modeling (also
called sinusoids+noise model) has been used extensively for analy-
sis, parametric transformation and synthesis of speech, musical and
environmental sounds (see for example [11]). Extracting relevant
information from both left and right channels can improve the re-
liability of the sinusoids+noise analysis. This was used in [12] to
improve partial tracking. Here we propose a binaural peak detection
method using combined left/right information. We also extend the
residual modeling to the binaural case, by considering IC and IPD
cues.
3.2. Binaural extraction of stationary sinusoidal components
Among the various estimation methods available in the literature for power spectrum estimation and sinusoidal detection (see for example [13, 14, 15]), we choose the modified periodogram first proposed by Welch [7] (see section 2). The proposed binaural peak detection is performed on an averaged (binaural) Welch spectral estimator:

$$S_{LR}(k) = \sqrt{\frac{1}{M}\sum_{m=1}^{M} \frac{|X_L(m,k)|^2 + |X_R(m,k)|^2}{2}} \qquad (1)$$
A sinusoid is detected in SLR(k) at frequency bin ki if three conditions are satisfied:
• SLR(ki) is a local maximum
• ∃k ∈ [ki − 2K, ki] such that 20 log(SLR(ki)) > 20 log(SLR(k)) + ∆
• ∃k ∈ [ki, ki + 2K] such that 20 log(SLR(ki)) > 20 log(SLR(k)) + ∆
where ∆ is the peak detection threshold, set to 9 dB in this study.
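The detection rule can be sketched as follows, operating on the estimator of Eq. (1); the helper names, the epsilon guard and the default arguments (K = 7, ∆ = 9 dB from the text) are illustrative.

```python
import numpy as np

def binaural_welch(X_L, X_R):
    """Averaged binaural Welch estimator of Eq. (1), from the left/right STFTs."""
    return np.sqrt(np.mean(0.5 * (np.abs(X_L) ** 2 + np.abs(X_R) ** 2), axis=0))

def detect_peaks(S_LR, K=7, delta_db=9.0):
    """Return bins k_i where S_LR is a local maximum standing out by at least
    delta_db somewhere within 2K bins on each side."""
    S_db = 20.0 * np.log10(S_LR + 1e-20)
    peaks = []
    for k in range(1, len(S_LR) - 1):
        if not (S_LR[k] > S_LR[k - 1] and S_LR[k] > S_LR[k + 1]):
            continue                                    # not a local maximum
        left = S_db[max(0, k - 2 * K):k]
        right = S_db[k + 1:k + 2 * K + 1]
        if (left.size and S_db[k] > left.min() + delta_db and
                right.size and S_db[k] > right.min() + delta_db):
            peaks.append(k)                             # stands out on both sides
    return peaks
```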
Figure 3: Binaural peak detection. The peak detection algorithm is applied to the averaged binaural power spectrum (1) to capture peaks that would not be detectable in a single channel.
Using the averaged binaural power spectrum (1) for the peak detection ensures that each sinusoid is detected at the same frequency in both channels (see figure 3). A per-channel peak detection would miss a sinusoid attenuated in one channel, or detect it at a slightly different frequency, leading to errors at the resynthesis stage. After the peak detection, the normalized frequency of the sinusoid is fi = ki/Na and its amplitudes are derived for each channel by aiL = 2 SL(ki) and aiR = 2 SR(ki). Note that wa is normalized to have a unit sum (i.e., Σn wa(n) = 1) so that the amplitude of the peaks is correctly estimated. Following the peak detection stage, two FFTs XL(k) and XR(k) are computed over the entire 16-s-long left and right signals (Ntot = 768000 samples). The phase of each detected sinusoid is estimated by piL = ∠XL(ki) and piR = ∠XR(ki), where ki is the nearest integer to fi·Ntot. To remove each sinusoid from the original sounds, XL(k) and XR(k) are forced to 0 on (K/2 + 1)·Ntot/Na bins around ki. The IFFT is then computed to obtain the binaural residual, which contains the background noise, while the deterministic components are resynthesized in the time domain by:

$$d_L(n) = \sum_{i=1}^{I} a_{iL}\cos(2\pi f_i n + p_{iL})$$

where dL is the left-ear signal (the same process is applied to synthesize dR).
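A compact sketch of this per-channel parameter estimation, deterministic resynthesis and residual extraction is given below; the function name, the loop structure and the exact interpretation of the bin-zeroing width are ours.

```python
import numpy as np

def extract_sinusoids(x_L, x_R, peaks, S_L, S_R, Na, K=7):
    """Estimate per-channel amplitudes/phases of the detected peaks, resynthesize
    the deterministic part, and return the binaural residual (sketch)."""
    N_tot = len(x_L)
    X_L, X_R = np.fft.fft(x_L), np.fft.fft(x_R)        # full-length FFTs for the phases
    n = np.arange(N_tot)
    d_L, d_R = np.zeros(N_tot), np.zeros(N_tot)
    width = int((K // 2 + 1) * N_tot / Na)             # bins zeroed around each peak (our reading)
    for k_i in peaks:
        f_i = k_i / Na                                 # normalized frequency of the peak
        a_L, a_R = 2 * S_L[k_i], 2 * S_R[k_i]          # amplitudes (unit-sum analysis window)
        k_tot = int(round(f_i * N_tot))                # nearest bin of the full-length FFT
        p_L, p_R = np.angle(X_L[k_tot]), np.angle(X_R[k_tot])
        d_L += a_L * np.cos(2 * np.pi * f_i * n + p_L)
        d_R += a_R * np.cos(2 * np.pi * f_i * n + p_R)
        idx = np.arange(k_tot - width // 2, k_tot + width // 2 + 1) % N_tot
        for X in (X_L, X_R):                           # remove the sinusoid from the spectrum
            X[idx] = 0.0
            X[(-idx) % N_tot] = 0.0                    # mirrored negative-frequency bins
    r_L, r_R = np.fft.ifft(X_L).real, np.fft.ifft(X_R).real   # binaural residual
    return d_L, d_R, r_L, r_R
```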
Reproducing interior aircraft sounds requires a very good frequency resolution since sinusoidal components are sometimes separated by only a few hertz, producing beating sounds [3]. The beating is perceptually salient and should be reproduced correctly. To capture very close sinusoids, we compute the STFT with a 2-s-long analysis window wa (Na = 96000 samples at 48 kHz) using the DPSW, so that each sinusoidal component is represented by only K = 7 bins, i.e., 3.5 Hz in the 96000-bin spectra. Note that very narrow bands of noise (4K = 15 Hz) are also modeled as sinusoids with this approach. The frequency resolution can be further increased by zero-padding wa. As an example, zero-padding the window up to 8 s (Na = 384000 samples at 48 kHz) provides a spacing of 0.125 Hz between frequency bins, which is sufficient to reproduce the beating components of our sound database.
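As a quick check of the bin-spacing arithmetic above (the numerical values come from the text; the variable names are ours):

```python
import numpy as np

fs, Na, Nfft = 48000, 96000, 384000    # 48 kHz, 2 s DPSW window, zero-padded to 8 s
print(fs / Na)                         # 0.5   Hz bin spacing of the raw window
print(fs / Nfft)                       # 0.125 Hz bin spacing after zero-padding
# e.g., spectrum = np.fft.rfft(wa * frame, n=Nfft) evaluates the windowed frame
# on the finer grid; the main lobe keeps its 3.5 Hz width.
```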
3.3. Binaural residual modeling
The binaural residual signal is modeled as a stationary stochastic process characterized by its left and right power spectra SL(k) and SR(k), along with the interaural cues IC(k) and IPD(k). These quantities are estimated from the residual as described in section 2. Note that the analysis window wa is normalized to have unit power (i.e., Σn wa²(n) = 1) to compensate for its global effect on the power spectrum magnitudes [7]. Here the analysis window size is called Ns. Synthesis of the stochastic residual is performed in the time-frequency domain as proposed in [10], but we extend the method to integrate the IC and IPD cues. Binaural synthetic signals are constructed in the time-frequency domain by:

$$Y_L(m,k) = S_L(k)\,\big[g_1(m,k) + j\,g_2(m,k)\big]$$

$$Y_R(m,k) = S_R(k)\,\Big[\underbrace{\big(g_1(m,k) + j\,g_2(m,k)\big)\sqrt{IC(k)}\,e^{j\,IPD(k)}}_{\text{coherent part}} \;+\; \underbrace{\big(g_3(m,k) + j\,g_4(m,k)\big)\sqrt{1 - IC(k)}}_{\text{non-coherent part}}\Big] \qquad (2)$$
where g1, ..., g4 are independent Gaussian noise sequences with zero mean and standard deviation σ = √(Ns/2) for k ∈ [1; Ns/2 − 1], and σg1 = σg3 = √Ns and σg2 = σg4 = 0 for k = 0 or k = Ns/2. Note that the spectra are conjugate symmetric, so we only consider positive frequencies here (i.e., k ∈ [0; Ns/2]).
For each frame the IFFT is computed and the resulting short-time segments y^m are overlapped and added to obtain the synthesized stochastic signal s:

$$y_L^m(n) = \frac{1}{N_s}\sum_{k=0}^{N_s-1} Y_L(m,k)\, e^{j2\pi kn/N_s}, \quad \forall n \in [0;\,N_s-1]$$

$$s_L(n) = \sum_{m=-\infty}^{+\infty} w_s(n - mM_s)\, y_L^m(n - mM_s)$$

where Ms is the synthesis hop size (the same process is applied to obtain the right signal sR). The Ns-point synthesis window ws shall satisfy Σm ws²(n − mMs) = 1 for all n, so that the synthesized stochastic process has a constant variance [15]. We choose ws to be the square root of the Hanning window with a synthesis hop size Ms = Ns/2.
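A sketch of this stochastic synthesis stage (Eq. (2) followed by IFFT and overlap-add) is given below. It assumes rfft-style half-spectra of length Ns/2 + 1 and uses np.hanning for the square-root Hann window, which only approximately satisfies the constant-variance condition; the function name is ours.

```python
import numpy as np

def synthesize_residual(S_L, S_R, IC, IPD, n_frames, Ns):
    """Stochastic binaural synthesis of Eq. (2) with 50% overlap-add and a
    square-root Hann synthesis window (sketch). S_L, S_R, IC, IPD are arrays
    of length Ns//2 + 1 (positive frequencies only)."""
    Ms = Ns // 2                                    # synthesis hop size
    ws = np.sqrt(np.hanning(Ns))                    # sqrt-Hann, ~COLA at 50% overlap
    out_L = np.zeros(n_frames * Ms + Ns)
    out_R = np.zeros(n_frames * Ms + Ns)
    for m in range(n_frames):
        # Independent zero-mean Gaussian sequences g1..g4 for this frame
        g = np.random.normal(0.0, np.sqrt(Ns / 2), size=(4, Ns // 2 + 1))
        g[1, [0, -1]] = 0.0                         # imaginary parts vanish at DC and Nyquist
        g[3, [0, -1]] = 0.0
        g[0, [0, -1]] *= np.sqrt(2.0)               # sigma = sqrt(Ns) at DC and Nyquist
        g[2, [0, -1]] *= np.sqrt(2.0)
        G1, G3 = g[0] + 1j * g[1], g[2] + 1j * g[3]
        Y_L = S_L * G1
        Y_R = S_R * (G1 * np.sqrt(IC) * np.exp(1j * IPD)    # coherent part
                     + G3 * np.sqrt(1.0 - IC))              # non-coherent part
        y_L, y_R = np.fft.irfft(Y_L, n=Ns), np.fft.irfft(Y_R, n=Ns)
        out_L[m * Ms:m * Ms + Ns] += ws * y_L       # overlap-add with synthesis window
        out_R[m * Ms:m * Ms + Ns] += ws * y_R
    return out_L, out_R
```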
3.4. Evaluation of the synthesis model
The proposed analysis/synthesis method was tested on the in-flight
binaural recordings of our database. Sound examples are available
online [16]. The results are illustrated on figure 4 for the record-
ing at position SAG (row 22) in the aircraft cabin. Spectral en-
velopes (including sinusoidal peaks) and IC cues are correctly re-
constructed. IPD is also well reproduced in frequency regions with
high IC (elsewhere the influence of IPD becomes negligible in the
model as indicated by (2)).
In addition, since the residual modeling does not require the very high frequency resolution needed for the sinusoidal analysis stage, we used analysis/synthesis window sizes Ns ranging between 128 and 8192 samples as a reasonable latency/CPU/resolution trade-off. The effect of the window size on the resulting sound quality was further investigated in a formal listening test reported in [17]. The results confirmed that the original and resynthesized sounds were indistinguishable for window sizes greater than or equal to 1024 samples.
Figure 4: Result of the binaural analysis/synthesis algorithm for a binaural signal recorded at position SAG (row 22) in the aircraft. [Four panels versus frequency (10 Hz to 10 kHz): left power spectrum (dB), right power spectrum (dB), IC, and IPD (rad), each comparing the reference, the synthesis, and the error.]
4. CONCLUSION
We presented a binaural model for interior aircraft sounds. The
analysis stage uses a binaural peak detection to extract sinusoidal
components and the residual is modeled by spectral envelopes, in-
teraural coherence and phase difference. The model was validated
on in-flight binaural recordings. Our method correctly reproduces the spectral and spatial cues of the original sounds. A formal listening test is presented in a companion paper [17] to further validate the model perceptually.
5. ACKNOWLEDGEMENT
This research was jointly supported by an NSERC grant (CRDPJ
357135-07) and research funds from CRIAQ, Bombardier and CAE
to A. Berry and C. Guastavino.
6. REFERENCES
[1] D. A. McCurdy and R. E. Grandle, “Aircraft noise synthesis
system,” NASA technical memorandum 89040, 1987.
[2] D. Berckmans, K. Janssens, H. V. der Auweraer, P. Sas, and
W. Desmet, “Model-based synthesis of aircraft noise to quan-
tify human perception of sound quality and annoyance,” Jour-
nal of Sound and Vibration, vol. 311, no. 3-5, pp. 1175–1195,
2008.
[3] J. F. Wilby, “Aircraft interior noise,” Journal of Sound and
Vibration, vol. 190, no. 3, pp. 545–564, 1996.
[4] K. Janssens, A. Vecchio, and H. V. der Auweraer, “Synthesis
and sound quality evaluation of exterior and interior aircraft
noise,” Aerospace Science and Technology, vol. 12, no. 1, pp.
114–124, 2008.
[5] J. Blauert, Spatial Hearing. The MIT press edition, 1997.
[6] C. Faller, “Parametric multichannel audio coding: synthesis
of coherence cues,” IEEE Transactions on Audio, Speech, and
Language Processing, vol. 14, no. 1, pp. 299–310, 2006.
[7] P. D. Welch, “The Use of Fast Fourier Transform for the Esti-
mation of Power Spectra: A Method Based on Time Averag-
ing Over Short, Modified Periodograms,” IEEE Transactions
on Audio and Electroacoustics, vol. 15, pp. 70–73, 1967.
[8] J. O. Smith, Mathematics of the Discrete Fourier Transform (DFT). W3K Publishing, 2007. [Online]. Available: http://www.w3k.org/books/
[9] D. Slepian, “Prolate spheroidal wave functions, Fourier anal-
ysis, and uncertainty. V- The discrete case,” Bell System Tech-
nical Journal, vol. 57, pp. 1371–1430, 1978.
[10] X. Serra and J. O. Smith, “Spectral modeling synthesis: A
sound analysis/synthesis system based on a deterministic plus
stochastic decomposition,” Computer Music Journal, vol. 14,
no. 4, pp. 12–24, 1990.
[11] J. W. Beauchamp, Ed., Analysis, Synthesis, and Perception of
Musical Sounds: Sound of Music. Springer, 2007.
[12] M. Raspaud and G. Evangelista, “Binaural partial tracking,”
in Proc. of the 11th Int. Conference on Digital Audio Effects
(DAFx’08), 2008.
[13] J. G. Proakis and D. G. Manolakis, Digital signal processing:
Principles, algorithms, and applications, 3rd ed. Englewood
Cliffs, NJ.: Prentice Hall International, 1996.
[14] H. C. So, Y. T. Chan, Q. Ma, and P. C. Ching, “Comparison
of various periodograms for sinusoid detection and frequency
estimation,” IEEE Transactions on Aerospace and Electronic
Systems, vol. 35, pp. 945–950, 1999.
[15] S. Marchand, “Advances in spectral modeling of musical
sound,” Habilitation à Diriger des Recherches, Université Bor-
deaux 1, 2008.
[16] [Online]. Available: http://mil.mcgill.ca/waspaa2011/model/
[17] J. Langlois, C. Verron, P.-A. Gauthier, and C. Guastavino,
“Perceptual evaluation of interior aircraft sound models,” in
Proceedings of the IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics (WASPAA 2011), 2011.
