1. INTRODUCTION
The Mixed Excitation Linear Prediction (MELP) vocoder is one of the most
recognized and widely used speech coding methods owing to its speech quality,
compression rate, and robustness to adverse working conditions such as ambient
noise or imperfect transmission channels, a key requirement for military
applications. Its applications include digital voice in high frequency (HF)
transceivers and secure voice communication. MELP was standardized by the US
Department of Defense in 1997 as MIL-STD-3005 [1]. The vocoder was improved and
re-standardized between 1998 and 2001 under the name MELPe (enhanced MELP), with
key additional features: a new rate of 1200 bits per second, improvements in the
coding and decoding processes, noise preprocessing to remove background noise,
transcoding between the 2400 bits/s and 1200 bits/s rates, and a new post filter
[8]. In this paper, only the original MELP version is studied.
To gauge the demand for the MELP vocoder, a survey was conducted on high-quality
HF transceivers compliant with NATO standards, either currently in service with
the Vietnamese defense forces or produced by prestigious HF manufacturers. The
survey found that most high-end HF transceivers use the MELP coding standard, for
example the HF6000 of Tadiran Communications Ltd, the TR2400 of Grintek, and the
NGT SRx of Codan Radio [10]. This observation indicates the quality and
prevalence of MELP relative to other vocoders used in HF transceivers. It is
worth noting that MELP has been evaluated with many languages of North Atlantic
Treaty Organization (NATO) countries, such as English, French, and German. MELP
has also been used for Vietnamese with practically no major problems reported;
however, there have been no official reports on this.
Among hardware platforms for MELP implementations, the Texas Instruments (TI)
C5000 DSP family is a good candidate; virtually all commercial MELP products on
the market support this platform. The TMS320C5000 DSP family provides
fixed-point, low-power 16-bit DSPs with clock rates up to 300 MHz. The family is
also rich in peripherals and has a large amount of on-chip memory, which reduces
the overall system cost. For these reasons, C5000 devices are well suited to a
variety of low-power, cost-effective signal processing solutions, including
portable devices in audio, voice, medical, and biometric applications [9]. Texas
Instruments provides not only the C5000 DSP chips but also a large set of
supporting hardware and software resources that help developers accomplish their
tasks rapidly, including a variety of DSK (DSP starter kit) and EVM (evaluation
module) boards, the TMS320C55x DSP Library (DSPLIB), the C5000 Chip Support
Library (CSL), and numerous application reports.
Given the strong demand for low-bit-rate MELP speech coding and the survey of
available hardware platforms, the research group decided to study and implement a
real-time MELP vocoder system on the C55x, in particular on the C5509 and C5510,
and obtained significant results that are presented in this paper. The paper is
structured as follows. Section 1 presents the importance of MELP for military
applications and gives a quick introduction to the low-power, low-cost TI C5000
DSPs. Section 2 briefly describes the MELP algorithm. Section 3 analyzes the
C5000 systems used to develop the MELP speech coder. Section 4 evaluates the
system with detailed experimental results. Finally, Section 5 gives conclusions
and future work.
2. MELP VOCODER ALGORITHM DESCRIPTION
MELP belongs to the group of vocoders based on the Linear Predictive Coding
(LPC) model. Well-known coders in this group include CELP, LPC-10, LPC-10e, and
MELP. MELP provides equivalent or better performance than the 4800 bits per
second CELP coder (Federal Standard 1016) at a lower bit rate [4]. MELP was
developed from LPC-10 (FS-1015, STANAG 4198) with five major changes: mixed
excitation, aperiodic pulses (a new voicing state for jittery voiced frames),
pulse dispersion, adaptive spectral enhancement, and Fourier magnitude modeling
[1-3].
Mixed excitation is the combination of a pulse train and random noise. It
distinguishes MELP from the conventional LPC model, in which the excitation
source at any given time is either a pulse train or noise. The combination is
implemented with a multi-band mixing model that simulates frequency-dependent
voicing strengths. The goal of multi-band mixed excitation is to reduce the buzz
usually associated with LPC vocoders, especially in broadband acoustic noise [1].
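As a rough illustration of the idea, the sketch below blends a pulse train and white noise using a single voicing strength. It is a deliberately simplified single-band version, not the multi-band mixing model of MIL-STD-3005; the function name, the pitch period of 60 samples, and the strength values are illustrative assumptions.

```python
import random


def mixed_excitation(num_samples, pitch_period, voicing_strength, seed=0):
    """Blend a periodic pulse train with white noise (single-band sketch).

    voicing_strength lies in [0, 1]: 1.0 gives a pure pulse train and
    0.0 gives pure noise. MELP applies this kind of mix per frequency band.
    """
    if not 0.0 <= voicing_strength <= 1.0:
        raise ValueError("voicing_strength must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    excitation = []
    for n in range(num_samples):
        pulse = 1.0 if n % pitch_period == 0 else 0.0
        noise = rng.uniform(-1.0, 1.0)
        excitation.append(voicing_strength * pulse
                          + (1.0 - voicing_strength) * noise)
    return excitation


# Fully voiced frame of 180 samples: only the pulse positions are non-zero.
voiced = mixed_excitation(180, pitch_period=60, voicing_strength=1.0)
print([n for n, x in enumerate(voiced) if x != 0.0])  # -> [0, 60, 120]
```

Intermediate strengths such as 0.5 produce a pulse train with a noise floor, which is the effect that softens the buzz of a purely pulsed excitation.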
Aperiodic pulses are used in the excitation model, in which voiced speech is
classified as either voiced (periodic) or jittery voiced (aperiodic). Jittery
voiced speech is often observed in the transition regions between voiced and
unvoiced segments of the speech signal. This feature allows the synthesizer to
reproduce erratic glottal pulses without introducing tonal noise [1]. Pulse
dispersion is implemented with a fixed pulse dispersion filter based on a
spectrally flattened triangle pulse. The filter spreads the excitation energy
within a pitch period, which in turn reduces the harsh quality of the synthetic
speech [1]. The adaptive spectral enhancement filter enhances the formant
structure of the synthetic speech; it is constructed from the poles of the LPC
vocal tract filter. This filter improves the match between synthetic and natural
bandpass waveforms and gives the speech output a more natural quality [1].
Besides these improvements, another feature worth noting is the use of Fourier
magnitudes, which provide a more accurate excitation source and therefore model
the speech production process better than conventional LPC models [1].
Block diagrams of the MELP coding (analysis) and decoding (synthesis) processes,
taken from [1], are presented in Figures 1 and 2, respectively. In the analysis
process, one heavy and important procedure that is used repeatedly is pitch
determination, which includes integer pitch search and fractional pitch
refinement [1, section A5.2.4], [5]. Together with pitch determination, the
quantization of the LPC coefficients, which consists of converting the LPC
coefficients to Line Spectrum Frequency (LSF) form [1, 7] and multi-stage vector
quantization (MSVQ) of the LSFs [1, 6], is the most computationally heavy part
of the MELP algorithm. It should also be noted that the Fourier magnitudes of
the first 10 pitch harmonics are computed from the prediction residual generated
by the quantized prediction coefficients (the quantized LSFs are converted back
to prediction coefficients). Therefore, this step has to be done after the LPC
quantization. In the decoding process, pitch is decoded first since it carries
the mode information: voiced, unvoiced, or frame erasure. If a frame is detected
as an erasure, either from the pitch information or by error detection, a frame
repeat mechanism is applied: all parameters for the current frame are replaced
with the parameters from the previous frame. The decoding process generally
follows the steps of the coder in reverse order, with the difference that it
interpolates parameters pitch-synchronously for each synthesized pitch period.
The interpolated parameters are the gain (in dB), LSFs, pitch, jitter, Fourier
magnitudes, bandpass voicing strengths, and the spectral tilt coefficient for
the adaptive spectral enhancement filter.
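To make the cost of pitch determination concrete, the following is a minimal sketch of an integer pitch search based on normalized autocorrelation. It is not the exact procedure of [1]: the function name is hypothetical, the lag range of 40 to 160 samples (200 Hz down to 50 Hz at 8000 samples per second) is an assumption, and the fractional refinement step is omitted.

```python
import math


def integer_pitch_search(signal, min_lag=40, max_lag=160):
    """Return (lag, score) for the lag with the highest normalized
    autocorrelation in [min_lag, max_lag]. Ties keep the smallest lag."""
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        n = len(signal) - lag  # number of overlapping samples
        if n <= 0:
            break
        cross = sum(signal[i] * signal[i + lag] for i in range(n))
        energy_a = sum(signal[i] ** 2 for i in range(n))
        energy_b = sum(signal[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(energy_a * energy_b)
        score = cross / denom if denom > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic test signal: a unit pulse every 50 samples (160 Hz at 8 kHz).
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(360)]
print(integer_pitch_search(pulses))  # -> (50, 1.0)
```

Each candidate lag needs three sums over the analysis window, so the search grows with the product of window length and lag range, which is consistent with pitch determination being one of the heaviest analysis steps.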
Figure 1. Block diagram of the MELP coding (analysis) process [1]. Blocks: input
speech; pitch calculation; bandpass voicing analysis; LPC residual calculation;
peakiness calculation; fractional pitch refinement; final pitch calculation;
pitch doubling check; aperiodic flag; linear prediction analysis; computation of
LSFs from LPC coefficients; gain calculation; average pitch update; quantization
of gain, pitch, LSFs, and bandpass voicing; computation and quantization of
Fourier magnitudes; MELP frame.
MELP has a frame size of 22.5 ms, which corresponds to 180 samples at a sampling
rate of 8000 samples per second; each sample has a resolution of 16 bits. The
recommended analog voice signal range is 100 Hz to 3800 Hz.
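MELP bit allocation per 22.5 ms frame (parameter names follow MIL-STD-3005 [1]):

Parameter                 Voiced   Unvoiced
LSF parameters              25        25
Fourier magnitudes           8         -
Gain (2 per frame)           8         8
Pitch, overall voicing       7         7
Bandpass voicing             4         -
Aperiodic flag               1         -
Error protection             -        13
Sync bit                     1         1
Total bits per frame        54        54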
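The frame figures above fix the bit rate. The short calculation below, a sketch that uses only the numbers quoted in the text (variable names are illustrative), checks that both bit allocations total 54 bits and that 54 bits every 22.5 ms gives 2400 bits/s.

```python
SAMPLE_RATE_HZ = 8000      # samples per second
SAMPLES_PER_FRAME = 180    # samples in one MELP frame
BITS_PER_SAMPLE = 16       # input PCM resolution
BITS_PER_FRAME = 54        # coded bits per frame

# Per-parameter bit counts for voiced and unvoiced frames, as listed above.
voiced_bits = [25, 8, 8, 7, 4, 1, 1]
unvoiced_bits = [25, 8, 7, 13, 1]

frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
# Integer arithmetic avoids rounding error from dividing by 0.0225 s.
coded_rate_bps = BITS_PER_FRAME * SAMPLE_RATE_HZ // SAMPLES_PER_FRAME
pcm_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

print(frame_ms)                       # -> 22.5
print(sum(voiced_bits), sum(unvoiced_bits))  # -> 54 54
print(coded_rate_bps)                 # -> 2400
print(pcm_rate_bps)                   # -> 128000
```

Relative to the 128000 bits/s of the uncompressed 16-bit input, the 2400 bits/s stream is a reduction by a factor of about 53.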
Within the C5000 family, the C5509 (full name TMS320VC5509A) and the C5510 (full
name TMS320VC5510A) are among the most high-end products. With a sophisticated
DSP architecture focused on parallelism and power reduction, these devices can
execute algorithms of high complexity efficiently and in real time. Key hardware
features are: a complex internal bus structure composed of one program bus, three
data read buses, two data write buses, and additional buses dedicated to
peripheral and DMA activity, which together allow up to three data reads and two
data writes in a single cycle; two multiply-accumulate (MAC) units, each capable
of 17-bit x 17-bit multiplication in a single cycle; a central 40-bit
arithmetic/logic unit (ALU) supported by an additional 16-bit ALU; and a fully
protected pipeline structure with predictive branching capability. Both the
C5509 and the C5510 have a set of valuable peripherals such as Timer (2), McBSP
(3), DMA (6), and a programmable phase-locked loop clock generator. The C5509A is
richer in interfaces, with USB 1.1 and I2C, while the C5510 has more on-chip
memory: 64K bytes of dual-access RAM (DARAM) and 256K bytes of single-access RAM
(SARAM), compared with 64K bytes of DARAM and 192K bytes of SARAM on the C5509A
[9]. Help and guidance on hardware design and software programming for these two
devices are well documented and easy to find, which greatly helps those starting
to work with TI DSPs.
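The value of a wide accumulator behind the MAC units can be illustrated with a small behavioral sketch of a fixed-point dot product. It is not a model of the C55x instruction set: the Q15 operand format, the saturating 40-bit accumulator, and all function names are assumptions chosen for illustration.

```python
ACC_BITS = 40                          # assumed accumulator width
ACC_MAX = (1 << (ACC_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))


def to_q15(x):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating."""
    value = int(round(x * 32768))
    return max(-32768, min(32767, value))


def mac_q15(a, b):
    """Multiply-accumulate two Q15 vectors.

    Each Q15 x Q15 product is a Q30 value; the running sum is kept in a
    saturating accumulator of ACC_BITS bits and returned in Q30 format.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
        acc = max(ACC_MIN, min(ACC_MAX, acc))
    return acc


def q30_to_float(acc):
    """Convert a Q30 accumulator value back to a float."""
    return acc / float(1 << 30)


a = [to_q15(0.5)] * 4
b = [to_q15(0.25)] * 4
print(q30_to_float(mac_q15(a, b)))  # -> 0.5
```

Because each Q30 product fits in 31 bits, a 40-bit accumulator leaves headroom for several hundred worst-case products before saturation, which is what makes long filter and correlation sums practical in 16-bit fixed-point arithmetic.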
The integrated development environment (IDE) was Code Composer Studio (CCS)
version 3.3, provided by TI. CCS 3.3 includes compilers for each of TI's device
families, a source code editor, a project build environment, a debugger, a
profiler, simulators, a real-time operating system (DSP/BIOS), and many other
features. CCS 3.3 is also a powerful IDE capable of excellent compiler
optimization, with a range of options that help developers quickly speed up the
implemented algorithms [9]. The system used to develop the MELP vocoder and the
online real-time model are presented in Figures 3 and 4, respectively.
Speech Quality   Level of Distortion
Excellent        Imperceptible
Good             Just perceptible, but not annoying
Fair             Perceptible and slightly annoying
Poor             Annoying but not objectionable
Bad              Very annoying and objectionable
Filename                   Language     Male/Female
Eng_M.wav [13]             English      Male
Eng_F.wav [13]             English      Female
Vn_M.wav                   Vietnamese   Male
Vn_F.wav                   Vietnamese   Female
Vov1.wav                   Vietnamese   Female
reference_64p0k.wav [14]   English      Both
No.   Filename              C55x MELP   Commercial products
1     Eng_M.wav             2.641       2.666 [13]
2     Eng_F.wav             2.384       2.445 [13]
3     Vn_M.wav              2.631       Unavailable
4     Vn_F.wav              2.267       Unavailable
5     Vov1.wav              2.713       Unavailable
6     reference_64p0k.wav   3.106       2.970 (*)
ACKNOWLEDGMENTS
This work is supported by project 118/2013/H NT (2013-2014), funded by the
Ministry of Science and Technology of Vietnam.
REFERENCES
[1] U.S. DoD, MIL-STD-3005, Department of Defense Telecommunications Systems
Standard, 1999.
[2] A. V. McCree and T. P. Barnwell III, "A mixed excitation LPC vocoder model
for low bit rate speech coding," IEEE Transactions on Speech and Audio
Processing, vol. 3, no. 4, pp. 242-250, 1995.
[3] L. M. Supplee, R. P. Cohn, J. S. Collura, and A. V. McCree, "MELP: the new
federal standard at 2400 bps," in Proc. IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP-97), 1997, vol. 2, pp.
1591-1594.
[4] M. Kohler, "A comparison of the new 2400 bps MELP federal standard with
other standard coders," in Proc. IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP-97), 1997, vol. 2, pp. 1587-1590.
[5] Y. Medan, E. Yair, and D. Chazan, "Super resolution pitch determination of
speech signals," IEEE Transactions on Signal Processing, vol. 39, no. 1, pp.
40-48, 1991.
[6] W. P. LeBlanc, B. Bhattacharya, S. A. Mahmoud, and V. Cuperman, "Efficient
search and design procedures for robust multi-stage VQ of LPC parameters for 4
kb/s speech coding," IEEE Transactions on Speech and Audio Processing, vol. 1,
no. 4, pp. 373-385, 1993.
[7] P. Kabal and R. P. Ramachandran, "The computation of line spectral
frequencies using Chebyshev polynomials," IEEE Transactions on Acoustics, Speech
and Signal Processing, vol. 34, no. 6, pp. 1419-1426, 1986.
[8] MELP and MELPe Vocoder on Wikipedia,
http://en.wikipedia.org/wiki/Mixed-excitation_linear_prediction
[9] TI official websites on C5000, TMS320VC5510A, TMS320VC5509A, and CCS 3.3.
[10] Technical specifications of Tadiran HF6000, Grintek TR2400, and Codan NGT
SRx.
[11] ITU P.862, "Perceptual evaluation of speech quality (PESQ), an objective
method for end-to-end speech quality assessment of narrowband telephone networks
and speech codecs," ITU Recommendation P.862, 2000.
[12] P. C. Loizou, Speech Enhancement: Theory and Practice, 1st ed. CRC Press,
2007.
[13] MELP commercial product provided by Signalogic,
http://www.signalogic.com/index.pl?page=codec_samples
[14] MELP commercial product provided by Vocal,
http://www.vocal.com/audio-examples/other-speech-coder-audio-examples/