
UNIT-IV

MODULATION & SIGNAL PROCESSING


Analog and digital modulation techniques,
Performance of various modulation techniques
Spectral efficiency, Error rate, Power Amplification,
Equalization/Rake receiver concepts,
Diversity and Space-time processing,
Speech coding and channel coding.

2002 Pearson Education, Inc. Commercial use, distribution, or sale prohibited.


Modulation Techniques for Mobile Radio

Analog and Digital Modulation, Line Coding, AM/FM/SSB

General structure of a communication system

[Figure: simplified link. Source (info.) -> Transmitter -> transmitted signal -> Channel (noise added) -> received signal -> Receiver -> received info. -> User]

[Figure: detailed block diagram. Transmitter: Formatter -> Source encoder -> Channel encoder -> Modulator. Receiver: Demodulator -> Channel decoder -> Source decoder -> Formatter]

Learning fundamental issues in designing a digital communication system (DCS):

Utilized techniques:
Formatting and source coding
Modulation (baseband and bandpass signaling)
Channel coding
Equalization
Synchronization
....
Design goals
Trade-off between various parameters

Digital Fundamentals: The Sampling Theorem, Nyquist Sampling

Key Digital Modulation Techniques: BPSK, OFDM, GMSK

Spread Spectrum: DS-SS, FH-SS, Coding Gain, Fading Margins

Amplitude Modulation

Double Sideband Spectrum

SSB Modulators

Tone-in Band SSB

Product Detection

VCO circuit

Wideband FM generation

Slope Detector for FM

Digital Demod for FM

PLL Demod for FM

Phase-shift quadrature FM demod

FM Demod circuit

Line Coding spectra

RZ and NRZ Line Codes

Nyquist Pulses for zero-ISI

Raised Cosine Spectrum

Spectrum of Raised Cosine pulse

Raised Cosine pulses

RF signal using Raised Cosine

Gaussian pulse-shapes

BPSK constellation

Virtue of pulse shaping

BPSK Coherent demodulator

Differential PSK encoding

DPSK modulation

DPSK receiver

QPSK constellation diagrams

Virtues of Pulse Shaping

QPSK modulation

QPSK receiver

Offset QPSK waveforms

Pi/4 QPSK signaling

Pi/4 QPSK phase shifts

Pi/4 QPSK transmitter

Differential detection of pi/4 QPSK

IF Differential Detection

FM Discriminator detector

FSK Coherent Detection

Noncoherent FSK

Minimum Shift Keying spectra

MSK modulation

MSK reception

GMSK spectral shaping

GMSK spectra shaping

Simple GMSK generation

GMSK Demodulator

Digital GMSK demodulator

8-PSK Signal Constellation

Bandwidth vs. Power Efficiency

Pulse Shaped M-PSK

16-QAM Signal Constellation

QAM efficiencies

M-ary FSK efficiencies

PN Sequence Generator

Direct Sequence Spread Spectrum

Direct Sequence Spreading

Frequency Hopping Spread Spectrum

CDMA Multiple Users

Effects of Fading

Irreducible Bit Error Rate due to multipath

Irreducible Bit Error Rate due to multipath

Simulation of Fading and Multipath

Irreducible BER due to fading

Irreducible BER due to fading

BER due to fading & multipath

Outline

Diversity
Linear Code
Hamming Code Revisit
Reed-Muller code
Cyclic Code
CRC Code
BCH Code
RS Code
Term paper

Equalization, Diversity, and Channel Coding

Three techniques are used independently or in tandem to improve receiver signal quality:
Equalization compensates for ISI created by multipath within time-dispersive channels (signal bandwidth W > coherence bandwidth B_C)
Changes the overall response to remove ISI
Diversity also compensates for fading channel impairments, and is usually implemented by using two or more receiving antennas
Multiple received copies: spatial diversity, antenna polarization diversity, frequency diversity, time diversity
Reduces the depth and duration of the fades experienced by a receiver in a flat-fading (narrowband) channel
Channel coding improves mobile communication link performance by adding redundant data bits to the transmitted message
Channel coding is used by the Rx to detect or correct some (or all) of the errors introduced by the channel (post-detection technique)
Block codes and convolutional codes

Overcoming Channel Impairments

[Figure labels: Deep Fading, Channel Coding]

Diversity Techniques

Requires no training overhead
Can provide significant link improvement with little added cost
Diversity decisions are made by the Rx, and are unknown to the Tx
Diversity concept
If one radio path undergoes a deep fade, another independent path may have a strong signal
By having more than one path to select from, both the instantaneous and average SNRs at the receiver may be improved, often by as much as 20 dB to 30 dB
Diversity order
How many independent copies are received
How many links must fail to bring down the system

EQUALIZATION, DIVERSITY & CODING

DIVERSITY

I. Introduction

Mobile radio channel (MRC) impairments:
1) ACI/CCI: system-generated interference
2) Shadowing: large-scale path loss from LOS obstructions
3) Multipath fading: rapid small-scale signal variations
4) Doppler spread: due to motion of the mobile unit

All can lead to significant distortion or attenuation of the Rx signal
Degrade the bit error rate (BER) of a digitally modulated signal

Three techniques are used to improve Rx signal quality and lower BER:
1) Equalization
2) Diversity
3) Channel coding
Used independently or together
We will consider diversity and channel coding

These techniques improve mobile radio link performance
Effectiveness of each varies widely in practical wireless systems
Cost & complexity are also important issues
Complexity in mobile vs. in base station

III. Diversity Techniques

Diversity: primary goal is to reduce the depth & duration of small-scale fades
Spatial or antenna diversity: most common
Use multiple Rx antennas in mobile or base station
Why would this be helpful?
Even a small antenna separation (on the order of the wavelength λ) changes the phase of the signal, so the constructive/destructive nature of the multipath sum is changed
Other diversity types: polarization, frequency, & time

Exploits random behavior of the mobile radio channel (MRC)
Goal is to make use of several independent (uncorrelated) received signal paths
Why is this necessary?
Select the path with the best SNR, or combine multiple paths to improve overall SNR performance

Microscopic diversity: combats small-scale fading
Most widely used
Use multiple antennas separated in space
At a mobile, signals are independent if separation > λ/2
But it is not practical to have a mobile with multiple antennas separated by λ/2 (7.5 cm apart at 2 GHz)
Can have multiple receiving antennas at base stations, but they must be separated on the order of ten wavelengths (1 to 5 meters).
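The spacing figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 2 GHz carrier comes from the slide, while the function name and the use of the free-space speed of light are assumptions made for illustration.

```python
C = 299_792_458.0  # speed of light in m/s (free-space value, assumed)

def wavelength_m(freq_hz):
    """Free-space wavelength in metres for a carrier frequency in Hz."""
    return C / freq_hz

lam = wavelength_m(2e9)          # 2 GHz carrier from the slide
half_wave_cm = 100 * lam / 2     # mobile-side decorrelation distance
ten_wave_m = 10 * lam            # base-station spacing rule of thumb

print(f"lambda/2 at 2 GHz : {half_wave_cm:.2f} cm")
print(f"10 lambda at 2 GHz: {ten_wave_m:.2f} m")
```

At 2 GHz, ten wavelengths come to about 1.5 m, which sits inside the 1 to 5 meter range given on the slide; lower carrier frequencies push the required spacing toward the upper end of that range.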

Since reflections occur near the mobile, independent signals spread out a lot before they reach the base station.
A typical antenna configuration for 120-degree sectoring: for each sector, a transmit antenna is in the center, with two diversity receiving antennas on each side.
If one radio path undergoes a deep fade, another independent path may have a strong signal.
By having more than one path to select from, both the instantaneous and average SNRs at the receiver may be improved.

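Spatial or Antenna Diversity: 4 basic types
M independent branches
Variable gain & phase at each branch: G∠θ
Each branch has the same average SNR:
$$\mathrm{SNR} = \Gamma = \frac{E_b}{N_0}\,\overline{\alpha^2}$$
Instantaneous SNR $= \gamma_i$, the pdf of $\gamma_i$:
$$p(\gamma_i) = \frac{1}{\Gamma}\, e^{-\gamma_i/\Gamma}, \qquad \gamma_i \ge 0 \tag{6.155}$$
$$\Pr[\gamma_i \le \gamma] = \int_0^{\gamma} p(\gamma_i)\, d\gamma_i = \int_0^{\gamma} \frac{1}{\Gamma}\, e^{-\gamma_i/\Gamma}\, d\gamma_i = 1 - e^{-\gamma/\Gamma}$$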

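The probability that the Rx signals on all M independent diversity branches are simultaneously less than some specific SNR threshold γ:
$$\Pr[\gamma_1, \ldots, \gamma_M \le \gamma] = \left(1 - e^{-\gamma/\Gamma}\right)^M = P_M(\gamma)$$
The probability that at least one branch exceeds the threshold:
$$\Pr[\gamma_i > \gamma] = 1 - P_M(\gamma) = 1 - \left(1 - e^{-\gamma/\Gamma}\right)^M$$
The pdf of γ:
$$p_M(\gamma) = \frac{d}{d\gamma} P_M(\gamma) = \frac{M}{\Gamma}\left(1 - e^{-\gamma/\Gamma}\right)^{M-1} e^{-\gamma/\Gamma}$$
Average SNR improvement offered by selection diversity:
$$\bar{\gamma} = \int_0^{\infty} \gamma\, p_M(\gamma)\, d\gamma = \Gamma \int_0^{\infty} M x \left(1 - e^{-x}\right)^{M-1} e^{-x}\, dx, \qquad x = \gamma/\Gamma$$
$$\frac{\bar{\gamma}}{\Gamma} = \sum_{k=1}^{M} \frac{1}{k}$$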
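The closed-form average SNR improvement of selection diversity (the sum of 1/k for k = 1..M) can be cross-checked against the integral it comes from. The sketch below uses a simple midpoint rule; the function names, integration limit, and step count are illustrative choices, not part of the slides.

```python
import math

def selection_gain(m):
    """Closed form: mean output SNR / mean branch SNR = sum_{k=1}^{M} 1/k."""
    return sum(1.0 / k for k in range(1, m + 1))

def selection_gain_numeric(m, x_max=40.0, steps=200_000):
    """Midpoint-rule estimate of the integral of M x (1 - e^-x)^(M-1) e^-x over [0, x_max]."""
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += m * x * (1.0 - math.exp(-x)) ** (m - 1) * math.exp(-x) * dx
    return total

for m in (1, 2, 3, 4):
    gain = selection_gain(m)
    print(m, round(gain, 4), round(10 * math.log10(gain), 2), "dB")
```

The gain grows only as a harmonic sum, so each extra branch adds less average SNR than the previous one; the main benefit of selection diversity is the reduced probability of deep fades rather than the mean SNR.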

Space diversity methods:
1) Selection diversity
2) Feedback diversity
3) Maximal ratio combining
4) Equal gain diversity

1) Selection Diversity: simple & cheap
Rx selects the branch with the highest instantaneous SNR
A new selection is made at intervals equal to the reciprocal of the fading rate
This causes the system to stay with the current signal until it is likely that the signal has faded
SNR improvement: $\bar{\gamma}/\Gamma = \sum_{k=1}^{M} 1/k$
$\bar{\gamma}$ is the new avg. SNR
$\Gamma$: avg. SNR in each branch

Example:
Average SNR is 20 dB
Acceptable SNR is 10 dB
Assume four-branch diversity
Determine the probability that one signal has SNR less than 10 dB
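The example can be worked numerically from the Rayleigh-fading expressions used in this section: a single branch falls below threshold γ with probability 1 - e^(-γ/Γ), and all M independent branches do so with probability (1 - e^(-γ/Γ))^M. The sketch below assumes those expressions apply; the function names are hypothetical.

```python
import math

def prob_branch_below(threshold_db, mean_db):
    """Probability that one Rayleigh-faded branch has SNR below the threshold."""
    ratio = 10 ** ((threshold_db - mean_db) / 10)   # gamma / Gamma as a linear ratio
    return 1.0 - math.exp(-ratio)

def prob_all_below(threshold_db, mean_db, branches):
    """Probability that all independent branches are below the threshold at once."""
    return prob_branch_below(threshold_db, mean_db) ** branches

p1 = prob_branch_below(10, 20)    # single branch, gamma/Gamma = 0.1
p4 = prob_all_below(10, 20, 4)    # four-branch selection diversity

print(f"one branch   : {p1:.4f}")
print(f"four branches: {p4:.2e}")
```

With these numbers a single branch is below 10 dB roughly 9.5% of the time, while four-branch selection diversity is below it less than 0.01% of the time, an improvement of about three orders of magnitude.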

2) Scanning Diversity
Scan each antenna until a signal is found that is above a predetermined threshold
If the signal drops below the threshold, rescan
Only one Rx is required (since only one signal is received at a time), so it is less costly, but multiple antennas are still needed

3) Maximal Ratio Diversity
Signal amplitudes are weighted according to each branch SNR
Summed in-phase
Most complex of all types
A complicated mechanism, but modern DSP makes this more practical, especially in the base station Rx, where battery power to perform computations is not an issue

The resulting signal envelope applied to the detector:
$$r_M = \sum_{i=1}^{M} G_i r_i$$
Total noise power:
$$N_T = N \sum_{i=1}^{M} G_i^2$$
SNR applied to the detector:
$$\gamma_M = \frac{r_M^2}{2 N_T}$$

The voltage signals $r_i$ from each of the M diversity branches are co-phased to provide coherent voltage addition and are individually weighted to provide optimal SNR
($\gamma_M$ is maximized when $G_i = r_i / N$)
The SNR out of the diversity combiner is the sum of the SNRs in each branch.
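The claim that maximal ratio combining delivers the sum of the branch SNRs can be checked directly from the combiner expressions r_M = Σ G_i r_i, N_T = N Σ G_i², and γ_M = r_M² / (2 N_T). The branch envelopes and noise power below are made-up illustrative values, and the function name is hypothetical.

```python
def combiner_snr(envelopes, gains, noise_power):
    """SNR at the detector: gamma_M = r_M^2 / (2 N_T)."""
    r_m = sum(g * r for g, r in zip(gains, envelopes))   # co-phased weighted sum
    n_t = noise_power * sum(g * g for g in gains)        # total noise power
    return r_m ** 2 / (2.0 * n_t)

envelopes = [1.0, 0.5, 0.2]    # hypothetical branch envelopes r_i
noise_power = 0.1              # hypothetical per-branch noise power N

branch_snrs = [r * r / (2.0 * noise_power) for r in envelopes]
mrc_gains = [r / noise_power for r in envelopes]   # G_i = r_i / N
egc_gains = [1.0] * len(envelopes)                 # equal gain, for comparison

mrc_snr = combiner_snr(envelopes, mrc_gains, noise_power)
egc_snr = combiner_snr(envelopes, egc_gains, noise_power)

print("sum of branch SNRs:", sum(branch_snrs))
print("MRC output SNR    :", mrc_snr)
print("equal-gain SNR    :", egc_snr)
```

For these values the MRC output matches the sum of the branch SNRs, while unit gains on every branch give a lower figure because the weak branches contribute as much noise as the strong one.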

The probability that $\gamma_M$ is less than some specific SNR threshold $\gamma$

Maximal ratio combining gives the optimal SNR improvement:
$\Gamma_i$: avg. SNR of each individual branch
$\Gamma_i = \Gamma$ if the avg. SNR is the same for each branch
$$\bar{\gamma}_M = \sum_{i=1}^{M} \Gamma_i = \sum_{i=1}^{M} \Gamma = M\,\Gamma$$

4) Equal Gain Diversity
Combine multiple signals into one
G = 1, but the phase is adjusted for each received signal so that the signals from each branch are co-phased and the vectors add in-phase
Better performance than selection diversity

IV. Time Diversity

Time diversity: transmit the information repeatedly at different time spacings
Time spacing > coherence time (coherence time is the time over which a fading signal can be considered to have similar characteristics)
So the signals can be considered independent
Main disadvantage is that BW efficiency is significantly worsened, since the signal is transmitted more than once
BW must increase to obtain the same Rd (data rate)

If the data stream is repeated twice, then either
1) BW doubles for the same Rd, or
2) Rd is reduced by 1/2 for the same BW

RAKE Receiver
Powerful form of time diversity available in direct-sequence (DS) spread spectrum systems such as CDMA
Signal is only transmitted once
Propagation delays in the mobile radio channel (MRC) provide multiple copies of the Tx signal delayed in time

The RAKE receiver attempts to collect the time-shifted versions of the original signal by providing a separate correlation receiver for each of the multipath signals.
Each correlation receiver may be adjusted in time delay, so that a microprocessor controller can cause different correlation receivers to search in different time windows for significant multipath.
The range of time delays that a particular correlator can search is called a search window.

If the time delay between multiple signals > chip period of the spreading sequence (Tc), the multipath signals can be considered uncorrelated (independent)
In a basic system, these delayed signals only appear as noise, since they are delayed by more than a chip duration, and are ignored.
Multiplying by the chip code results in noise because of the time shift.
But this can also be used to our advantage, by shifting the chip sequence to receive that delayed signal separately from the other signals.
** The RAKE Rx is a time diversity Rx that collects time-shifted versions of the original Tx signal **

M branches or fingers = # of correlation Rxs
Separately detect the M strongest signals
Weighted sum computed from the M branches
Faded signal: low weight
Strong signal: high weight
Overcomes fading of a signal in a single branch
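The finger structure described above can be sketched in a few lines. This is a toy, noise-free model rather than an IS-95 receiver: the 31-chip sequence from a 5-stage shift register, the two-path channel (delays of 0 and 3 chips, gains 1.0 and 0.6), and all function names are assumptions chosen for illustration, and the finger delays and weights are taken as perfectly known.

```python
def lfsr_pn(n_chips=31):
    """31-chip +/-1 sequence from a 5-stage feedback shift register."""
    state = [1, 0, 0, 0, 0]
    chips = []
    for _ in range(n_chips):
        chips.append(1 if state[4] else -1)
        feedback = state[4] ^ state[1]
        state = [feedback] + state[:4]
    return chips

def spread(bits, pn):
    """Multiply each +/-1 data bit by the whole chip sequence."""
    return [b * c for b in bits for c in pn]

def multipath(tx, paths):
    """Sum delayed, scaled copies of the chip stream; paths = [(delay, gain), ...]."""
    max_delay = max(d for d, _ in paths)
    rx = [0.0] * (len(tx) + max_delay)
    for delay, gain in paths:
        for n, chip in enumerate(tx):
            rx[n + delay] += gain * chip
    return rx

def rake_receive(rx, pn, fingers, n_bits):
    """One correlator per finger, each aligned to its own delay, then a weighted sum."""
    decisions = []
    for k in range(n_bits):
        combined = 0.0
        for delay, weight in fingers:
            start = k * len(pn) + delay
            corr = sum(rx[start + n] * pn[n] for n in range(len(pn)))
            combined += weight * corr        # strong path: high weight; faded path: low weight
        decisions.append(1 if combined >= 0 else -1)
    return decisions

pn = lfsr_pn()
bits = [1, -1, -1, 1, 1, -1, 1, -1]
paths = [(0, 1.0), (3, 0.6)]                 # hypothetical two-path channel
rx = multipath(spread(bits, pn), paths)
decoded = rake_receive(rx, pn, fingers=paths, n_bits=len(bits))
print(decoded == bits)
```

Because the second path is delayed by more than one chip, each finger sees the other path only as a small correlation residue, which is what lets the fingers be combined as separate diversity branches.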

In outdoor environments
The delay between multipath components is usually large, and the low autocorrelation properties of a CDMA spreading sequence can assure that multipath components will appear nearly uncorrelated with each other.

In indoor environments
The RAKE receiver in IS-95 CDMA has been found to perform poorly, since the multipath delay spreads in indoor channels (≈100 ns) are much smaller than an IS-95 chip duration (≈800 ns).
In such cases, a RAKE will not work since the multipath is unresolvable
Rayleigh flat fading typically occurs within a single chip period.

Diversity Motivation

Aim: Reduce effects of fast fading
Concept:
Multiple branches, independent fading
Process branches to reduce fading probability
If the probability of a deep fade on one channel is p, the probability of a simultaneous deep fade on N independent channels is p^N.
e.g. a 10% chance of losing contact on one channel becomes 0.1^3 = 0.001 = 0.1% with 3 channels

Requirements for Diversity
Multiple branches
Low correlation between branches
Similar mean powers
Efficient combiner

Diversity Example

Different Diversity

Spatial diversity
Multiple-input multiple-output (MIMO) systems
Beamforming, smart antennas
Space-time coding
Horizontal and vertical combining
Frequency diversity
Frequency diversity transmits information on more than one carrier frequency
Frequencies separated by more than the coherence bandwidth of the channel will not experience the same fades
Time diversity
Time diversity repeatedly transmits information at time spacings that exceed the coherence time of the channel
Polarization diversity
Multi-user diversity

Space Diversity

Large antenna spacing or large scatterer spacing produces large path length differences
Hence multipath will combine differently at each antenna

Time Diversity

Retransmit with time separation
Advantage: needs only one receiver
Disadvantage: wastes bandwidth, adds delay

Frequency Diversity

Wideband channel
Simultaneous transmission
Wastes power and bandwidth
Equalizers
[Figure: channel spectrum versus frequency]

Performance

[Figure: Graph of probability distributions of SNR = γ threshold for M-branch selection diversity. The term Γ represents the mean SNR on each branch.]

Example 7.4

Interleaving

Block interleaver where source bits are read into columns and out as n-bit rows
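A block interleaver of this kind can be sketched as follows: source bits are written into the array column by column and read out row by row, and the de-interleaver inverts the mapping. The array dimensions and function names are illustrative assumptions.

```python
def interleave(bits, rows, cols):
    """Write source bits down the columns, read them out along the rows."""
    if len(bits) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    # source bit i sits in column i // rows, row i % rows
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

def deinterleave(bits, rows, cols):
    """Inverse mapping: write along the rows, read down the columns."""
    if len(bits) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    out = [None] * (rows * cols)
    for idx, b in enumerate(bits):
        r, c = divmod(idx, cols)
        out[c * rows + r] = b
    return out

source = list(range(12))                  # symbol labels 0..11 stand in for bits
channel = interleave(source, rows=3, cols=4)
burst = channel[4:7]                      # three consecutive symbols hit by a burst
print(channel)
print("source positions hit by the burst:", sorted(burst))
```

A burst that corrupts adjacent symbols on the channel lands on source positions spaced `rows` apart after de-interleaving, so a code that corrects isolated errors gets a chance to handle each one separately.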

Hamming Code

H(n,k): k is the information bit length, n is the overall code length
n = 2^m - 1, k = 2^m - m - 1:
H(7,4), rate 4/7; H(15,11), rate 11/15; H(31,26), rate 26/31
H(7,4): distance d = 3, correction ability 1, detection ability 2.
Remember that it is good to have a larger distance and a larger rate.
Larger n means larger delay, but usually a better code
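The Hamming code family parameters follow directly from n = 2^m - 1 and k = 2^m - m - 1. A short sketch (the function name is hypothetical) tabulates the first few members and the capabilities implied by the minimum distance d = 3:

```python
def hamming_params(m):
    """Return (n, k, rate) for the Hamming code with m parity bits."""
    n = 2 ** m - 1
    k = n - m
    return n, k, k / n

D_MIN = 3                          # minimum distance of every Hamming code
correctable = (D_MIN - 1) // 2     # errors that can be corrected
detectable = D_MIN - 1             # errors that can be detected

for m in (3, 4, 5):
    n, k, rate = hamming_params(m)
    print(f"H({n},{k}): rate {rate:.3f}")
```

The rate approaches 1 as m grows while the distance stays at 3, which is the rate-versus-distance trade-off the slide refers to.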

Combining Techniques

To combine the multiple received copies:
Selection diversity
Feedback diversity
Maximal ratio combining
Equal gain diversity

Hamming Code Example

H(7,4)
Generator matrix G: the first 4-by-4 block is the identity matrix
Message information vector p
Transmission vector x
Received vector r and error vector e
Parity check matrix H

Error Correction

If there is no error, the syndrome vector z is all zeros
If there is one error at location 2
The new syndrome vector z equals the second column of H. Thus, an error has been detected in position 2, and can be corrected
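Syndrome decoding for H(7,4) can be sketched as below. The slides' actual matrices were not recoverable from the extraction, so the parity block P here is an assumed systematic choice (G = [I | P], H = [P^T | I]); any P whose rows give seven distinct nonzero columns of H would serve equally well. Function names are hypothetical.

```python
# Assumed parity block; not taken from the slides.
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1],
     [1, 0, 1]]

# Generator matrix G = [I4 | P]; parity-check matrix H = [P^T | I3]
G = [[1 if i == j else 0 for j in range(4)] + P[i] for i in range(4)]
H = [[P[i][j] for i in range(4)] + [1 if j == k else 0 for k in range(3)]
     for j in range(3)]

def encode(p):
    """x = p G (mod 2)."""
    return [sum(p[i] * G[i][c] for i in range(4)) % 2 for c in range(7)]

def syndrome(r):
    """z = H r^T (mod 2); all zeros when r is a codeword."""
    return [sum(H[j][c] * r[c] for c in range(7)) % 2 for j in range(3)]

def correct(r):
    """Flip the bit whose column of H matches a nonzero syndrome."""
    z = syndrome(r)
    fixed = list(r)
    if any(z):
        columns = [[H[j][c] for j in range(3)] for c in range(7)]
        fixed[columns.index(z)] ^= 1
    return fixed

p = [1, 0, 1, 1]
x = encode(p)
r = list(x)
r[1] ^= 1                      # single error at location 2 (index 1)
print("syndrome:", syndrome(r))
print("corrected == transmitted:", correct(r) == x)
```

The syndrome depends only on the error pattern, not on the message, which is why a single lookup against the columns of H is enough to locate the flipped bit.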

Exercise

Same problem as the previous slide, but p = (1001) and the error occurs at location 4 instead.
Pause for 5 minutes
Might be 4 points in the finals.

Important Codes

Hamming (7,4,3)-code. It has 16 codewords of length 7, out of 2^7 = 128 possible 7-bit words, and can be used to correct 1 error.
Golay (23,12,7)-code. It has 4 096 codewords, out of 2^23 = 8 388 608 possible words, and can correct 3 errors.
Quadratic residue (47,24,11)-code. It has 16 777 216 codewords, out of 2^47 = 140 737 488 355 328 possible words, and can correct 5 errors.

Reed-Muller code

Cyclic code

Cyclic codes are of interest and importance because
They possess a rich algebraic structure that can be utilized in a variety of ways.
They have extremely concise specifications.
They can be efficiently implemented using simple shift registers.
Many practically important codes are cyclic.

In practice, cyclic codes are often used for error detection (cyclic redundancy check, CRC)
Used for packet networks
When an error is detected by the receiver, it requests retransmission (ARQ)

BASIC DEFINITION of Cyclic Code

FREQUENCY of CYCLIC CODES

EXAMPLE of a CYCLIC CODE

POLYNOMIALS over GF(q)

EXAMPLE

Cyclic Code Encoder

Similar structure as multiplier for encoder

Cyclic Code Decoder

Divider

Cyclic Redundancy Checks (CRC)

Example of CRC

Checking for errors

Capability of CRC

An error E(x) is undetectable if it is divisible by G(x). The following can be detected:
All single-bit errors, if G(x) has more than one nonzero term
All double-bit errors, if G(x) has a factor with three terms
Any odd number of errors, if G(x) contains a factor x+1
Any burst with length less than or equal to n-k
A fraction of error bursts of length n-k+1; the fraction is 1-2^(-(n-k-1)).
A fraction of error bursts of length greater than n-k+1; the fraction is 1-2^(-(n-k)).

Powerful error detection; more computational complexity compared to the Internet checksum
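CRC generation and checking both reduce to polynomial division modulo 2 by G(x). The sketch below uses bit lists (most significant bit first) and an assumed generator G(x) = x^4 + x + 1 with a 10-bit message; neither comes from the slides, and the function names are hypothetical.

```python
def mod2_remainder(bits, generator):
    """Remainder of bits(x) divided by generator(x) over GF(2), MSB first."""
    work = list(bits)
    r = len(generator) - 1                 # degree of G(x) = number of check bits
    for i in range(len(work) - r):
        if work[i]:                        # leading term present: subtract (XOR) G(x)
            for j, g in enumerate(generator):
                work[i + j] ^= g
    return work[-r:]

def crc_encode(message, generator):
    """Append n-k check bits so that the codeword is divisible by G(x)."""
    r = len(generator) - 1
    return list(message) + mod2_remainder(list(message) + [0] * r, generator)

def crc_ok(codeword, generator):
    """A received word passes the check when its remainder is all zeros."""
    return not any(mod2_remainder(codeword, generator))

generator = [1, 0, 0, 1, 1]                # assumed G(x) = x^4 + x + 1
message = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
codeword = crc_encode(message, generator)

corrupted = list(codeword)
corrupted[5] ^= 1                          # single-bit error
print(codeword, crc_ok(codeword, generator), crc_ok(corrupted, generator))
```

Since this G(x) has more than one nonzero term, every single-bit error leaves a nonzero remainder, which is the first capability listed above.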

BCH Code
Bose, Ray-Chaudhuri, Hocquenghem

Multiple error correcting ability
Ease of encoding and decoding
Most powerful cyclic code
For any positive integer m and t < 2^(m-1), there exists a t-error-correcting (n,k) code with n = 2^m - 1 and n-k <= mt.

Industry standards
(511, 493) BCH code in ITU-T Rec. H.261, "Video codec for audiovisual services at p x 64 kbit/s", a video coding standard used for video conferencing and video phones.
(40, 32) BCH code in ATM (Asynchronous Transfer Mode)

BCH Performance

Reed-Solomon Codes

An important subclass of non-binary BCH codes
Wide range of applications
Storage devices (tape, CD, DVD)
Wireless or mobile communication
Satellite communication
Digital television/Digital Video Broadcast (DVB)
High-speed modems (ADSL, xDSL)

1971: Mariner 9
Mariner 9 used a [32,6,16] Reed-Muller code to transmit its grey images of Mars.
Camera rate: 100,000 bits/second
Transmission speed: 16,000 bits/second

1979+: Voyagers I & II
Voyagers I & II used a [24,12,8] Golay code to send their color images of Jupiter and Saturn.
Voyager 2 traveled further to Uranus and Neptune. Because of the higher error rate it switched to the more robust Reed-Solomon code.

Modern Codes
More recently, Turbo codes were invented, which are used in 3G cell phones, (future) satellites, and in the Cassini-Huygens space probe [1997].
Other modern codes: Fountain, Raptor, LT, online codes

Error Correcting Codes

[Figure caption: The imperfectness of a given code is defined as the difference between the code's required Eb/No to attain a given word error probability (Pw) and the minimum possible Eb/No required to attain the same Pw, as implied by the sphere-packing bound for codes with the same block size k and code rate r.]

Speech-Coding Techniques

Introduction
Efficient speech-coding techniques
Advantages for VoIP
Digital streams of ones and zeros
The lower the bandwidth, the lower the quality
RTP payload types
Processing power
The better quality (for a given bandwidth) uses a more complex algorithm
A balance between quality and cost

Voice Quality
Bandwidth is easily quantified
Voice quality is subjective
MOS, Mean Opinion Score
ITU-T Recommendation P.800
Excellent: 5
Good: 4
Fair: 3
Poor: 2
Bad: 1
A minimum of 30 people
Listen to voice samples or take part in conversations

P.800 recommendations
The selection of participants
The test environment
Explanations to listeners
Analysis of results
Toll quality
A MOS of 4.0 or higher

Subjective and objective quality-testing techniques
PSQM: Perceptual Speech Quality Measurement
ITU-T P.861
Aims to faithfully represent human judgement and perception
Algorithmic comparison between the output signal and a known input
Factors: type of speaker, loudness, delay, active/silence frames, clipping, environmental noise

A Little About Speech
Speech
Air is pushed from the lungs past the vocal cords and along the vocal tract
The basic vibrations come from the vocal cords
The sound is altered by the disposition of the vocal tract (tongue and mouth)
Model the vocal tract as a filter
The shape changes relatively slowly
The vibrations at the vocal cords
The excitation signal

Speech sounds
The vocal cords vibrate open and close
Interrupt the air flow
Quasi-periodic pluses of air
The rate of the opening and closing the
pitch
A high degree of periodicity at the pitch period
2-20 ms
3-151

2002 Pearson Education, Inc. Commercial use, distribution, or sale prohibited.

Voiced sound


Voiced speech
Power spectrum density

Forcing air at high velocities through a constriction
The glottis is held open
Noise-like turbulence
Show little long-term periodicity
Short-term correlations still present


Unvoiced sounds


unvoiced speech
Power spectrum density

A complete closure in the vocal tract


Air pressure is built up and released suddenly

A vast array of sounds


The speech signal is relatively predictable
over time
The reduction of transmission bandwidth can
be significant

Plosive sounds

Voice Sampling
Take discrete samples of the waveform and represent each sample by some number of bits
A signal can be reconstructed if it is sampled at a minimum of twice the maximum frequency

Human speech
300-3800 Hz
8000 samples per second


A-to-D

Each sample is encoded into an 8-bit PCM code word (e.g. 01100101)
=> 8000 x 8 bit/s = 64 kbit/s
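The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical; the 3800 Hz upper edge, the 8000 samples per second and the 8-bit code word come from the slides.

```python
def nyquist_rate_hz(max_freq_hz):
    """Minimum sampling rate needed to reconstruct a band-limited signal."""
    return 2 * max_freq_hz

def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of PCM: one code word per sample."""
    return sample_rate_hz * bits_per_sample

speech_max_hz = 3800          # upper edge of the speech band quoted above
sample_rate_hz = 8000         # telephony sampling rate quoted above

print("Nyquist rate:", nyquist_rate_hz(speech_max_hz), "Hz")
print("Sampling rate is sufficient:", sample_rate_hz >= nyquist_rate_hz(speech_max_hz))
print("8-bit PCM:", pcm_bit_rate(sample_rate_hz, 8), "bit/s")
```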


Quantization

The difference between the actual level of the input analog signal and its quantized level

More bits to reduce


Diminishing returns

Uniform quantization levels


Louder talkers sound better
11.2 quantized to 11 vs. 2.2 quantized to 2: the same 0.2 error is relatively larger for the quiet signal

How many bits are used to represent each sample


Quantization noise

Smaller quantization steps at smaller signal levels
Spread signal-to-noise ratio more evenly
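A minimal sketch of non-uniform quantization, assuming the continuous mu-law companding curve with mu = 255 (the value is an assumption, not given in the slides). This is the textbook formula, not the segmented bit-exact G.711 encoder. It compares the error on a quiet sample with and without companding.

```python
import math

MU = 255.0  # assumed companding constant; not stated in the slides

def compress(x):
    """Continuous mu-law curve: maps x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_uniform(x, bits=8):
    """Round x to the nearest multiple of 1 / 2**(bits - 1), clamped to [-1, 1]."""
    steps = 2 ** (bits - 1)
    return max(-1.0, min(1.0, round(x * steps) / steps))

def quantize_mu_law(x, bits=8):
    """Compress, quantize uniformly, then expand: finer steps near zero."""
    return expand(quantize_uniform(compress(x), bits))

quiet = 0.01  # a low-level sample, full scale being 1.0
print("uniform error:", abs(quantize_uniform(quiet) - quiet))
print("mu-law error: ", abs(quantize_mu_law(quiet) - quiet))
```

With the same 8 bits, the companded quantizer spends more of its levels on small amplitudes, which is the "smaller steps at smaller signal levels" idea.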


Non-uniform quantization

DTX is Discontinuous Transmission


Voice activity detector (VAD) detects if there is
active speech or not.
When there is no active speech, different DTX procedures can be used:
No transmission at all
Comfort Noise (CN) using RFC 3389
Codec built-in CN, such as AMR SID (Silence Descriptor)

Frequency of Comfort Noise packets varies but is usually some fraction of the normal packet rate
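The DTX idea can be sketched with a simple energy-based detector. This is only an illustration: real VADs use several signal parameters, and the threshold and comfort-noise interval below are hypothetical values chosen for the example.

```python
import math

def frame_energy_db(frame):
    """Mean power of a frame of samples in [-1, 1], in dB relative to full scale."""
    power = sum(s * s for s in frame) / len(frame)
    return 10 * math.log10(power) if power > 0 else float("-inf")

def dtx_decisions(frames, threshold_db=-40.0, sid_interval=8):
    """Return 'SPEECH', 'SID' or 'NONE' for each frame.

    During silence a comfort-noise (SID) frame is sent at the start of the
    silent run and then every sid_interval frames; nothing is sent otherwise.
    """
    decisions = []
    silent_run = 0
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            silent_run = 0
            decisions.append("SPEECH")
        else:
            decisions.append("SID" if silent_run % sid_interval == 0 else "NONE")
            silent_run += 1
    return decisions

# One 20 ms frame of a 400 Hz tone followed by ten near-silent frames (8 kHz sampling).
speech = [0.5 * math.sin(2 * math.pi * 400 * n / 8000) for n in range(160)]
silence = [0.0001] * 160
print(dtx_decisions([speech] + [silence] * 10))
```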

DTX and Comfort Noise

Types of Speech Coders


Sample and code
High-quality and not complex
Large amount of bandwidth

source codecs (vocoders)


Match the incoming signal to a math model
Linear-predictive filter model of the vocal tract
A voiced/unvoiced flag for the excitation
The information is sent rather than the signal
Low bit rates, but sounds synthetic
Higher bit rates do not improve much

Waveform codecs

Attempt to provide the best of both


Perform a degree of waveform matching
Utilize the sound production model
Quite good quality at low bit rate


Hybrid codecs

G.711
The most commonplace codec

If uniform quantization
12 bits * 8 k/sec = 96 kbps

Non-uniform quantization
64 kbps DS0 rate
mu-law
North America

A-law
Other countries, a little friendlier to lower signal levels

An MOS of about 4.3



Used in circuit-switched telephone network


PCM, Pulse-Code Modulation

DPCM
Only transmit the difference between the predicted value and the actual value
Voice changes relatively slowly
It is possible to predict the value of a sample based on the values of previous samples
The receiver performs the same prediction
The simplest form
No prediction

No algorithmic delay
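A minimal DPCM sketch, assuming the simplest possible predictor (the previous reconstructed sample) and a fixed quantizer step; both are illustrative choices rather than any standardized codec. The encoder tracks the decoder's reconstruction so that both sides make the same prediction.

```python
def dpcm_encode(samples, step=4):
    """Quantize the difference between each sample and its predicted value."""
    codes = []
    prediction = 0                      # predictor: previous reconstructed sample
    for s in samples:
        code = round((s - prediction) / step)
        codes.append(code)
        prediction += code * step       # mirror what the decoder will rebuild
    return codes

def dpcm_decode(codes, step=4):
    """Rebuild the signal by running the same predictor as the encoder."""
    out = []
    prediction = 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out

samples = [0, 10, 22, 30, 33, 31, 24]   # slowly varying, speech-like toy input
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
print("codes:  ", codes)
print("decoded:", decoded)
```

Because the encoder predicts from reconstructed values, the error on each output sample stays within half a quantizer step and does not accumulate.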


DPCM, Differential PCM

ADPCM
ADPCM, Adaptive DPCM
Past samples
Factoring in some knowledge of how speech varies
over time

The error is quantized and transmitted


Fewer bits required

G.721
32 kbps

G.726
A-law/mu-law PCM -> 16, 24, 32, 40 kbps
An MOS of about 4.0 at 32 kbps

Predicts sample values based on

Analysis-by-Synthesis (AbS)
Codecs
Fill the gap between waveform and source codecs
The most successful and commonly used

Time-domain AbS codecs


Not a simple two-state, voiced/unvoiced
Different excitation signals are attempted
Closest to the original waveform is selected
MPE, Multi-Pulse Excited
RPE, Regular-Pulse Excited
CELP, Code-Excited Linear Predictive

Hybrid codec

G.728 LD-CELP
CELP codecs
A vector = a set of elements representing various characteristics of the excitation

Transmit
Filter coefficients, gain, a pointer to the vector chosen

Low Delay CELP


Backward-adaptive coder
Use previous samples to determine filter coefficients
Operates on five samples at a time
Delay < 1 ms

Only the pointer is transmitted



A filter; its characteristics change over time


A codebook of acoustic vectors

LD-CELP encoder
Minimize a frequency-weighted mean-square
error


1024 vectors in the code book


10-bit pointer (index)
16 kbps

An MOS score of about 3.9


One-quarter of G.711 bandwidth
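The LD-CELP figures can be cross-checked with a little arithmetic. The sketch assumes 8 kHz sampling and takes the five-sample vector length and 1024-entry codebook from the slides; the function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000       # assumption: telephone-band sampling
SAMPLES_PER_VECTOR = 5      # the coder operates on five samples at a time
CODEBOOK_SIZE = 1024        # vectors in the codebook
G711_RATE_BPS = 64000

def index_bits(codebook_size):
    """Bits needed for a pointer into the codebook."""
    return math.ceil(math.log2(codebook_size))

def bit_rate_bps(sample_rate_hz, samples_per_vector, bits_per_vector):
    """One codebook index is sent per vector of samples."""
    return sample_rate_hz // samples_per_vector * bits_per_vector

def buffering_delay_ms(sample_rate_hz, samples_per_vector):
    """Time needed to collect one vector of samples."""
    return 1000 * samples_per_vector / sample_rate_hz

bits = index_bits(CODEBOOK_SIZE)
rate = bit_rate_bps(SAMPLE_RATE_HZ, SAMPLES_PER_VECTOR, bits)
print("pointer bits:", bits)
print("bit rate:", rate, "bit/s")
print("fraction of G.711:", rate / G711_RATE_BPS)
print("vector delay:", buffering_delay_ms(SAMPLE_RATE_HZ, SAMPLES_PER_VECTOR), "ms")
```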

LD-CELP decoder

G.723.1 ACELP
6.3 or 5.3 kbps

The coder
A band-limited input speech signal
Sampled at 8 kHz, 16-bit uniform PCM quantization
Operate on blocks of 240 samples at a time
A look-ahead of 7.5 ms
A total algorithmic delay of 37.5 ms + other delays
A high-pass filter to remove any DC component

Both mandatory
Can change from one to another during a
conversation

Linear prediction coefficients


Gain parameters
Excitation codebook index
24-octet frames at 6.3 kbps, 20-octet frames at 5.3 kbps

Various operations to determine the appropriate


filter coefficients
5.3 kbps, Algebraic Code-Excited Linear Prediction
6.3 kbps, Multi-pulse Maximum Likelihood
Quantization
The transmission

Silence Insertion Descriptor (SID) frames of size four octets

The two LSBs of the first octet indicate the frame type

Bits  Frame type  Size
00    6.3 kbps    24 octets/frame
01    5.3 kbps    20 octets/frame
10    SID frame   4 octets/frame
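A receiver can use those two bits to split a payload into frames. The sketch below is illustrative only: the function name is hypothetical, and only the bit-to-size mapping comes from the slide.

```python
# Frame type selected by the two least significant bits of the first octet.
FRAME_TYPES = {
    0b00: ("6.3 kbps", 24),
    0b01: ("5.3 kbps", 20),
    0b10: ("SID", 4),
}

def split_frames(payload: bytes):
    """Split a byte string of concatenated G.723.1 frames into (type, bytes) pairs."""
    frames = []
    pos = 0
    while pos < len(payload):
        kind = payload[pos] & 0b11
        if kind not in FRAME_TYPES:
            raise ValueError(f"unsupported frame type bits {kind:02b}")
        name, size = FRAME_TYPES[kind]
        if pos + size > len(payload):
            raise ValueError("truncated frame")
        frames.append((name, payload[pos:pos + size]))
        pos += size
    return frames

# Toy payload: one frame of each type, with all other bits zero.
payload = bytes([0b00] + [0] * 23) + bytes([0b01] + [0] * 19) + bytes([0b10, 0, 0, 0])
print([(name, len(data)) for name, data in split_frames(payload)])
```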

An MOS of about 3.8


At least 37.5 ms delay

G.723.1 Annex A

8 kbps
Input frames of 10 ms, 80 samples for 8 kHz sampling rate
5 ms look-ahead
Algorithmic delay of 15 ms

An 80-bit frame for 10 ms of speech


A complex codec
G.729.A (Annex A), a number of simplifications
Same frame structure
Encoder/decoder, G.729/G.729.A
Slightly lower quality

G.729

G.729.B
Based on analysis of several parameters of the input
The current frames plus two preceding frames

DTX, Discontinuous Transmission


Send nothing or send an SID frame
SID frame contains information to generate comfort noise

CNG, Comfort Noise Generation

G.729, an MOS of about 4.0


G.729A an MOS of about 3.7

VAD, Voice Activity Detection

a lower-rate extension
6.4 kbps; 10 ms speech samples, 64 bits/frame
MOS 6.3 kbps G.723.1

G.729 Annex E
a higher bit rate enhancement
the linear prediction filter of G.729 has 10 coefficients
that of G.729 Annex E has 30 coefficients
the codebook of G.729 has 35 bits
that of G.729 Annex E has 44 bits
118 bits/frame; 11.8 kbps

G.729 Annex D

Other Codecs
Variable-rate coder
Two most common rates
The high rate, 13.3 kbps
A lower rate, 6.2 kbps

Silence suppression
For use with RTP, RFC 2658


CDMA QCELP defined in IS-733

GSM 06.60
An enhanced version of GSM Full-Rate
ACELP-based codec
The same bit rate and the same overall
packing structure
12.2 kbps

Support discontinuous transmission


For use with RTP, RFC 1890

GSM Enhanced Full-Rate (EFR)

20 ms coding delay
Eight different modes
4.75 kbps to 12.2 kbps
12.2 kbps, GSM EFR
7.4 kbps, IS-641 (TDMA cellular systems)
Change the mode at any time
Offer discontinuous transmission
The SID (Silence Descriptor) is sent in every 8th frame and is 5 bytes in size

The coding choice of many 3G wireless networks



GSM Adaptive Multi-Rate (AMR) codec

G.711 does not deal with lost packets


G.729 can accommodate a lost frame by
interpolating from previous frames
But cause errors in subsequent speech frames

Processing Power
G.728 or G.729, 40 MIPS
G.726 10 MIPS

The MOS values are for laboratory conditions

a FREE codec for robust VoIP


13.33 kbit/s with an encoding frame length of 30 ms, or 15.20 kbit/s with a frame length of 20 ms
Computational complexity in the range of G.729A


iLBC

Speex

narrowband (8 kHz sampling rate)


2.15-24.6 kb/s
delay of 30 ms

wideband (16 kHz sampling rate)


4-44.2 kb/s
delay of 34 ms

ultra-wideband (32 kHz sampling rate)

intensity stereo encoding


variable bit rate (VBR) possible
voice activity detection (VAD)

Open-source patent-free speech codec


CELP (code-excited linear prediction) codec
operating modes:

E.g., G.711 stream -> G.729 encoder/decoder


Might not even come close to G.729

Each coder only generates an approximation of the incoming signal
Audio samples
http://www.cs.columbia.edu/~hgs/audio/codecs.html

Cascaded Codecs


Effects of packetization

Tones, Signal, and DTMF Digits


Other data may need to be transmitted
Tones: fax tones, dialing tone, busy tone
DTMF digits for two-stage dialing or voice-mail

G.711 is OK
G.723.1 and G.729 can be unintelligible
The ingress gateway needs to intercept
The tones and DTMF digits
Use an external signaling system

The hybrid codecs are optimized for human speech

Encode the tones differently from the speech


Send them along the same media path
An RTP packet provides the name of the tone and
the duration
Or, a dynamic RTP profile; an RTP packet containing
the frequency, volume and the duration
RFC 2198
An RTP payload format for redundant audio data
Sending both types of RTP payload

Easy at the start of a call


Difficult in the middle of a call

An Internet Draft
Both methods described before
A large number of tones and events
DTMF digits, a busy tone, a congestion tone, a
ringing tone, etc.

The named events


E: the end of the tone, R: reserved
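The named-event payload can be sketched as follows, assuming the field layout later standardized in RFC 2833: an 8-bit event code, the E bit, the R bit, a 6-bit volume and a 16-bit duration in timestamp units. The slide itself only names the E and R bits, so the other fields and the helper names are assumptions.

```python
import struct

# Event codes 0-15: digits 0-9, then *, #, A, B, C, D (RFC 2833 numbering).
DTMF_EVENTS = {ch: code for code, ch in enumerate("0123456789*#ABCD")}

def pack_event(digit, end, volume, duration):
    """Build the 4-byte named-event payload for one DTMF digit."""
    if not 0 <= volume <= 63:
        raise ValueError("volume is a 6-bit field")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration is a 16-bit field")
    flags = (0x80 if end else 0) | volume   # R bit (0x40) stays zero
    return struct.pack("!BBH", DTMF_EVENTS[digit], flags, duration)

def unpack_event(payload):
    """Decode a 4-byte named-event payload into its fields."""
    event, flags, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": bool(flags & 0x80),
        "volume": flags & 0x3F,
        "duration": duration,
    }

packet = pack_event("5", end=True, volume=10, duration=800)
print(packet.hex(), unpack_event(packet))
```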

RTP Payload Format for DTMF Digits


Payload format

message
linguistic code (~ 50 b/s)

motor control
speech production

SPEECH SIGNAL (~50 kb/s)


speech perception
cognitive processes
linguistic code (~ 50 b/s)
message


Human Speech Communication

PCM (Pulse Code Modulation)

dynamic range of speech is about 50-60 dB


11 bits/sample

maximum frequency in telephone speech is 3.4 kHz
sampling frequency 8 kHz

8000 x 11 = 88 kb/s
Simple and universal but not very efficient


Transmit value of each speech sample

Less quantization noise for weaker signals
[Figure: quantizer characteristic, input (IN) vs. output (OUT)]

Better quantization ?

Logarithmic PCM (μ-law, A-law)

A-law

Finer quantization for each individual small amplitude sample


how about small signal samples surrounded by large ones?
it is the instantaneous signal energy which should determine the step

μ-law

Differential coding

time

For many natural signals, the difference between successive samples quantizes better than the samples themselves
Even better, predict the current sample from the past ones and transmit the error of the prediction

current sample

DPCM
a single predictor reflecting global predictability of speech
predictor order up to 4-5
delta modulation
gross quantization of prediction error into 1 bit (typically requires up-sampling well over the Nyquist rate)
adaptive DPCM
new predictor for every new speech block
predictor needs to be transmitted together with the prediction error

Differential predictive coding


Speech Coders


Linear model of speech production

A.G. Bell got it almost right

source
filter
speech

changes slowly

linear model of speech

short-term prediction: resonance of vocal tract
long-term prediction: periodicity of voiced speech (vocal cord vibration)
[Figure: waveform over time, with the short-term and long-term spans used to predict the current sample]
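Both kinds of prediction can be sketched from the autocorrelation of the signal. The code below is an illustrative sketch, not any codec's procedure: the short-term predictor uses the Levinson-Durbin recursion (autocorrelation method), and the long-term lag is simply the autocorrelation peak within an assumed 2-20 ms pitch range at an assumed 8 kHz sampling rate.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def short_term_predictor(x, order):
    """Levinson-Durbin recursion on the autocorrelation of x.

    Returns coefficients a such that x[n] is predicted as
    sum(a[k] * x[n - 1 - k] for k in range(order)).
    Assumes x is not all zeros and not perfectly predictable.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)             # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def pitch_lag(x, sample_rate_hz=8000, min_ms=2, max_ms=20):
    """Long-term prediction lag: the autocorrelation peak in the pitch range."""
    lo = sample_rate_hz * min_ms // 1000
    hi = sample_rate_hz * max_ms // 1000
    r = autocorrelation(x, hi)
    return max(range(lo, hi + 1), key=lambda k: r[k])

decay = [0.9 ** n for n in range(200)]                        # x[n] = 0.9 * x[n-1]
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]    # period of 50 samples
print("short-term coefficient:", short_term_predictor(decay, 1))
print("pitch lag in samples:", pitch_lag(pulses))
```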

The same principle as in H. Dudley's Vocoder

Used by US Government (LPC-10) - 2.4 kb/s

LPC vocoder

Residual Excited LPC (RELP)

Transmitter:
simplify prediction error (low-pass filter and down-sample)

Receiver:
re-introduce high frequencies in the simplified residual (nonlinear distortion)

Identical synthesizer in coder and in decoder

change parameters in coder


use for synthesizing speech
compare synthesized speech with real speech
when close enough, send parameters to the receiver
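The loop above can be sketched as a codebook search: each candidate excitation is passed through the same synthesis filter the decoder would use, and the candidate whose output is closest to the target is chosen. This is a simplified illustration with a toy codebook and a plain squared error; practical coders use a perceptually weighted error.

```python
def synthesize(excitation, a):
    """All-pole synthesis filter: y[n] = e[n] + sum(a[k] * y[n - 1 - k])."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                acc += coeff * y[n - 1 - k]
        y.append(acc)
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the excitation that best matches target."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, a)
        energy = sum(s * s for s in synth)
        # Optimal gain for this candidate in the least-squares sense.
        gain = sum(s * t for s, t in zip(synth, target)) / energy if energy else 0.0
        error = sum((t - gain * s) ** 2 for s, t in zip(synth, target))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Hypothetical three-entry codebook and one-tap filter, for illustration only.
codebook = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [1, -1, 1, -1, 1]]
filter_coeffs = [0.5]
target = [2 * s for s in synthesize(codebook[1], filter_coeffs)]
index, gain, error = search_codebook(target, codebook, filter_coeffs)
print("chosen index:", index, "gain:", gain, "error:", error)
```

Only the index and gain (plus the filter parameters) would need to be sent, since the decoder holds an identical synthesizer.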


Analysis-by-synthesis

Future in speech coding?


No need to transmit what we do not hear
No need to transmit what is predictable
speech production mechanism
speaker characteristics
linguistic code (recognition-synthesis)
thought-to-speech


study human hearing, especially masking

Automatic recognition of
speech
electric signal
(more than 50 kb/s)

prior knowledge
( textbook )
acquired knowledge
( data )
linguistic message

phoneme string
(below 50 b/s)

reduce information = decrease entropy

Automatic speech recognition (ASR)


derive proper response from
speech stimulus

Auditory perception
how do biological systems
respond to acoustic stimuli

Knowledge of auditory
perception ?


knowledge

Principle of stochastic ASR


Using a model of the speech production process, generate all possible acoustic sequences $w_i$ for all legal linguistic messages

$$\hat{w} = \arg\max_{i} P(M(w_i) \mid x)$$

1. What is the model $M(w_i)$?

2. What is the form of the data $x$?

Compare all generated sequences with the unknown acoustic input x to find which one is the most similar

One (simple) model


[Figure: model of the utterance "hello world"]

Two dominant sources of variability in speech


1. people say the same thing at different speeds (temporal variability)
2. different people sound different, and the communication environment differs (feature variability)

Doubly stochastic process (Hidden Markov Model)


Speech as a sequence of hidden states - recover the
state sequence
1. never know for sure in which state we are
2. never know for sure which data can be
generated from a given state
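Recovering the hidden state sequence is commonly done with the Viterbi algorithm. The sketch below uses a two-state model, male (m) and female (f), observed through pitch values; all probabilities, Gaussian means and observations are invented for illustration.

```python
import math

def log_gauss(x, mean, std):
    """Log of a Gaussian density, used as the emission score."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely hidden state sequence for the observations."""
    score = {s: log_start[s] + log_emit(s, obs[0]) for s in states}
    back = []
    for x in obs[1:]:
        prev = score
        score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit(s, x)
            pointers[s] = best
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical model parameters (not taken from the slides).
STATES = ["m", "f"]
MEAN_F0 = {"m": 130.0, "f": 210.0}
LOG_START = {"m": math.log(0.5), "f": math.log(0.5)}
LOG_TRANS = {
    "m": {"m": math.log(0.7), "f": math.log(0.3)},
    "f": {"m": math.log(0.3), "f": math.log(0.7)},
}

def emit(state, f0):
    return log_gauss(f0, MEAN_F0[state], 30.0)

f0_track = [110, 140, 120, 240, 200, 230]   # made-up pitch observations in Hz
print(viterbi(f0_track, STATES, LOG_START, LOG_TRANS, emit))
```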


Hidden Markov Model


[Figure: the greeting "hi" spoken with different fundamental frequencies f0 (110, 140, 160, 170, 200, 240 Hz)]
The model: states m (male) and f (female) with probabilities pm and pf, transitions pm-f and pf-m, and emissions P(sound|gender)
Sequence of male and female groups?

What should x be?

x
units of speech
(phonemes)


[Figure: f0 values in Hz: 160 170 160 170 200 110 140 240 170 190]

Reflects changes in acoustic pressure


its original purpose is reconstruction of speech
does carry relevant information
always also carries some irrelevant information
additional processing is necessary to alleviate it

Speech signal ?

[Figure: speech signal, histogram and correlations]

Where Is The Message ?

[Figure: vowel labels /uw/ /ao/ /ah/ /eh/ /ih/ /iy/; example word "beer"]
It is in the spectrum!

Isaac Newton

averaged FFT spectra of some vowels from 3 hours of fluent speech

Internal Combustion Engine (2003)



Inertia in engineering
Steam Engine (1769)

get spectral components
[Figure: spectrogram (frequency vs. time) of the phoneme sequence /j/ /u/ /ar/ /j/ /o/ /j/ /o/]

Short-term Spectrum
[Figure: 10-20 ms analysis window along the time axis]

[Figure: short-term speech spectral envelope, histogram and correlations]

[Figure: logarithmic short-term speech spectral envelope, histogram and correlations]

[Figure: cosine transform of logarithmic short-term speech spectral envelope (cepstrum), histogram and correlations]
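As a small illustration of the cepstrum named here, the sketch below takes the logarithm of a spectral envelope and applies an unnormalized DCT-II. The envelope values are invented, and the function names are hypothetical.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II: c[k] = sum(v[i] * cos(pi * k * (i + 0.5) / N))."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def cepstrum(envelope, num_coeffs):
    """Cosine transform of the log spectral envelope, truncated to num_coeffs."""
    log_env = [math.log(max(e, 1e-12)) for e in envelope]   # floor avoids log(0)
    return dct_ii(log_env)[:num_coeffs]

envelope = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 1.0]   # made-up envelope samples
quiet = cepstrum(envelope, 4)
loud = cepstrum([10.0 * e for e in envelope], 4)
print("quiet:", quiet)
print("loud: ", loud)
```

Scaling the envelope by a constant gain only changes the first coefficient, since the logarithm turns the gain into an additive constant.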

What Is Wrong With the Short-term Spectrum?


1) inconsistent (same message, different representation)

[Figure: short-term spectrum -> auditory-like modifications -> auditory-like spectrum (frequency axis)]

Frequency resolution of human ear decreases with frequency

Pitch of the tone (Mel scale)
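One widely used closed-form approximation of the mel scale is mel = 2595 log10(1 + f/700). The slides do not give a formula, so this choice is an assumption. The sketch converts between Hz and mel and shows that bands of equal mel width become wider in Hz as frequency rises.

```python
import math

def hz_to_mel(f_hz):
    """Assumed mel approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel()."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def band_edges_hz(f_low, f_high, num_bands):
    """Edges of num_bands bands that are equally wide on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / num_bands
    return [mel_to_hz(lo + i * step) for i in range(num_bands + 1)]

edges = band_edges_hz(0.0, 4000.0, 10)
widths = [b - a for a, b in zip(edges, edges[1:])]
print("band edges (Hz):", [round(e) for e in edges])
print("band widths (Hz):", [round(w) for w in widths])
```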

Emulating frequency resolution of human ear with FFT
[Figure: FFT -> critical-band energy]

Equal Loudness Curves


Perceptual Linear Prediction (PLP) [Hermansky 1990]

Auditory-like modifications of short-term speech spectrum prior to its approximation by all-pole autoregressive model
critical-band spectral resolution
equal-loudness sensitivity
intensity-loudness nonlinearity
Today applied in virtually all state-of-the-art experimental ASR systems

Spectral Basis from LDA

[Figure: spectrogram (frequency vs. time) of the phoneme sequence /j/ /u/ /ar/ /j/ /o/ /j/ /o/]
LDA gives basis for projection of spectral space

[Figure labels: 63 %, 12 %, 16 %, 2 %]

Spectral resolution of LDA-derived spectral basis is higher at low frequencies
Critical bands of human hearing are narrower at lower frequencies

LDA vectors from Fourier Spectrum

Sensitivity to Spectral Change
(Malayath 1999)
LDA-derived bases

Critical-band fi

Cosine basis

Combination of channel and signal spectrum should be as flat (as random-like) as possible.
Shannon, Communication in the Presence of Noise (1949)

if the receiver could be controlled
put more resources (introduce less noise) where there is more signal
biological system optimized for information extraction from sensory signals

if the signal could be controlled (e.g. in communication)
put more signal where there is less noise
sensory signal optimized for a given communication channel

[Figures: energy of the signal and level of noise in the channel, plotted over resource space]

What Is Wrong With the Short-term Spectral Envelope?
2) Fragile (easily corrupted by minor disturbances)

additive band-limited noise: ignore the noisy parts of the spectrum
linear (high-pass) filtering: remove means from parts of the spectrum

Simultaneous Masking
Nonlinear frequency resolution of hearing
Critical bands
up to ~600 Hz constant bandwidth
above 1 kHz constant Q
[Figure: threshold of perception of a tone at f versus the bandwidth of band-pass filtered noise centered at f, with the critical bandwidth marked]

More Important Outcome of Masking Experiments

Independent processing of parts of the spectrum ?

Replace spectral vector by a matrix of posterior probabilities of acoustic events
[Figure: spectrum S(frequency) replaced by posteriors {p(f)} = pf1, pf2, pf3, pf4, pf5, pf6]
(Hermansky, Sharma and Pavel 1996; Bourlard and Dupont 1996)

What happens outside the critical band does not affect detection of events within the band!

What Is Wrong With the Short-term Spectral Envelope?
3) Coarticulation (inertia of organs of speech production)

coarticulation
human auditory perception

Masking in Time
[Figure: masker followed by signal; a stronger masker; time span of 200 ms]

suggests ~200 ms buffer in auditory system


also seen in perception of loudness, detection of short stimuli, gaps in tones, auditory afterimages, binaural release from masking, ...
what happens outside this buffer does not affect detection of signal within the buffer

time

increase
in threshold

Short-term Features? (~10 ms)
Longer time span? (~250 ms?)
[Figure: data x along the time axis]

processing

time-frequency distribution of the linear component of the most efficient stimulus that excites the given auditory neuron

Average of the first two principal components (83% of variance) along temporal axis from about 180 cortical receptive fields (from D. Klein 2004, unpublished)

Cortical Receptive Fields

Data for Deriving Posterior Probabilities of Speech Events
[Figure: time-frequency plane (FREQUENCY vs. TIME [s]); data window of 250-1000 ms by 1-3 critical bands]

How to Get Estimates of Temporal Evolution of Spectral Energy?
all-pole model of part of the time-frequency plane
[Figure: data x spanning 200-1000 ms in time and 1-3 Bark in frequency, compared with 10-20 ms frames]
- with M. Athineos, D. Ellis (Columbia Univ), and P. Fousek (CTU Prague)

All-pole Model of Temporal Trajectory of Spectral Energy

conventional LP: the signal -> signal power spectrum -> all-pole model of the power spectrum
spectral domain LP: DCT of the signal -> Hilbert envelope of the signal -> all-pole model of the Hilbert envelope

signal -> discrete cosine transform
low frequency -> prediction -> all-pole model of low-frequency Hilbert envelope
high frequency -> prediction -> all-pole model of high-frequency Hilbert envelope

All-pole Models of Sub-band Energy Contours

[Figure: Critical-band Spectrum From FFT (tonality vs. time)]
[Figure: Critical-band Spectrum From All-pole Models Of Hilbert Envelopes in Critical Bands (tonality vs. time)]

Putting It All Together


TRAP-TANDEM
data-guided features based on frequency-independent
processing of relatively long spans of signal

[Figure: time-frequency data (frequency vs. time) -> per-band processing (trained NN) -> class posteriors -> processing (trained NN) -> some function of phoneme posteriors]

with S. Sharma, P. Jain, S. Sivadas, ICSI Berkeley and TU Brno
