
Chapter 6 Basics of Digital Audio

6.1 Digitization of Sound
6.2 MIDI: Musical Instrument Digital Interface
6.3 Quantization and Transmission of Audio
6.4 Further Exploration

6.1 Digitization of Sound

What is Sound?
• Sound is a wave phenomenon like light, but is macroscopic and involves molecules of air being compressed and expanded under the action of some physical device.
a) For example, a speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.
b) Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones.

c) Even though such pressure waves are longitudinal, they still have ordinary wave properties and behaviors, such as reflection (bouncing), refraction (change of angle when entering a medium with a different density) and diffraction (bending around an obstacle).
d) If we wish to use a digital version of sound waves we must form digitized representations of audio information.
→ Link to physical description of sound waves.

• Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
• Fig. 6.1 shows the 1-dimensional nature of sound: amplitude values depend on a 1D variable, time. (Note that images depend instead on a 2D set of variables, x and y.)
Sampling and Quantization
• The graph in Fig. 6.1 has to be made digital in both time and amplitude. To digitize, the signal must be sampled in each dimension: in time, and in amplitude.
a) Sampling means measuring the quantity we are interested in, usually at evenly-spaced intervals.
b) The first kind of sampling, using measurements only at evenly spaced time intervals, is simply called sampling. The rate at which it is performed is called the sampling frequency (see Fig. 6.2(a)).
c) For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem, discussed later.
d) Sampling in the amplitude or voltage dimension is called quantization. Fig. 6.2(b) shows this kind of sampling.

Fig. 6.1: An analog signal: continuous measurement of pressure wave.

• Thus to decide how to digitize audio data we need to answer the following questions:
1. What is the sampling rate?
2. How finely is the data to be quantized, and is quantization uniform?
3. How is audio data formatted? (file format)

Fig. 6.2: Sampling and Quantization. (a): Sampling the analog signal in the time dimension. (b): Quantization is sampling the analog signal in the amplitude dimension.
Nyquist Theorem
• Signals can be decomposed into a sum of sinusoids. Fig. 6.3 shows how weighted sinusoids can build up quite a complex signal.

Fig. 6.3: Building up a complex signal by superposing sinusoids
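The superposition idea of Fig. 6.3 can be sketched numerically. This is an illustrative example only: the particular weights and frequencies below are invented, not taken from the figure.

```python
import math

def superpose(t, components):
    # sum of weighted sinusoids: components is a list of (weight, frequency_hz)
    return sum(w * math.sin(2 * math.pi * f * t) for w, f in components)

# a fundamental plus two weighted harmonics builds a more complex waveform
components = [(1.0, 1.0), (0.5, 2.0), (0.25, 3.0)]
samples = [superpose(n / 100.0, components) for n in range(100)]
```

Because each harmonic is an integral multiple of the fundamental, the summed waveform repeats with the fundamental's period.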

• Whereas frequency is an absolute measure, pitch is generally relative — a perceptual subjective quality of sound.
a) Pitch and frequency are linked by setting the note A above middle C to exactly 440 Hz.
b) An octave above that note takes us to another A note. An octave corresponds to doubling the frequency. Thus with the middle "A" on a piano ("A4" or "A440") set to 440 Hz, the next "A" up is at 880 Hz, or one octave above.
c) Harmonics: any series of musical tones whose frequencies are integral multiples of the frequency of a fundamental tone: Fig. 6.3.
d) If we allow non-integer multiples of the base frequency, we allow non-"A" notes and have a more complex resulting sound.

• The Nyquist theorem states how frequently we must sample in time to be able to recover the original sound.
a) Fig. 6.4(a) shows a single sinusoid: it is a single, pure, frequency (only electronic instruments can create such sounds).
b) If the sampling rate just equals the actual frequency, Fig. 6.4(b) shows that a false signal is detected: it is simply a constant, with zero frequency.
c) Now if we sample at 1.5 times the actual frequency, Fig. 6.4(c) shows that we obtain an incorrect (alias) frequency that is lower than the correct one — it is half the correct one (the wavelength, from peak to peak, is double that of the actual signal).
d) Thus for correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.
Fig. 6.4: Aliasing. (a): A single frequency. (b): Sampling at exactly the frequency produces a constant. (c): Sampling at 1.5 times per cycle produces an alias perceived frequency.

• Nyquist Theorem: If a signal is band-limited, i.e., there is a lower limit f1 and an upper limit f2 of frequency components in the signal, then the sampling rate should be at least 2(f2 − f1).
• Nyquist frequency: half of the Nyquist rate.
– Since it would be impossible to recover frequencies higher than the Nyquist frequency in any event, most systems have an antialiasing filter that restricts the frequency content in the input to the sampler to a range at or below the Nyquist frequency.
• The relationship among the Sampling Frequency, True Frequency, and the Alias Frequency is shown in Fig. 6.5.

Apparent Frequency
• In general, the apparent frequency of a sinusoid is the lowest frequency of a sinusoid that has exactly the same samples as the input sinusoid. Fig. 6.5 shows the relationship of the apparent frequency to the input frequency.

Fig. 6.5: Folding of sinusoid frequency which is sampled at 8,000 Hz. The folding frequency, shown dashed, is 4,000 Hz.
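The folding behavior plotted in Fig. 6.5 can be sketched as a small helper. This is an illustrative sketch; the function name is my own.

```python
def apparent_frequency(f_true_hz, f_sampling_hz):
    # fold the true frequency into [0, f_sampling/2]: content above the
    # folding frequency aliases down, mirroring about f_sampling/2
    f = f_true_hz % f_sampling_hz
    return f if f <= f_sampling_hz / 2 else f_sampling_hz - f

# with an 8,000 Hz sampling rate the folding frequency is 4,000 Hz
assert apparent_frequency(3000, 8000) == 3000   # below folding: unchanged
assert apparent_frequency(5500, 8000) == 2500   # folded back down
```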
Signal to Noise Ratio (SNR)
• The ratio of the power of the correct signal and the noise is called the signal to noise ratio (SNR) — a measure of the quality of the signal.
• The SNR is usually measured in decibels (dB), where 1 dB is a tenth of a bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:

SNR = 10 log10(V²signal / V²noise) = 20 log10(Vsignal / Vnoise)

a) The power in a signal is proportional to the square of the voltage. For example, if the signal voltage Vsignal is 10 times the noise, then the SNR is 20 log10(10) = 20 dB.
b) In terms of power, if the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB.
c) To know: Power — 10; Signal Voltage — 20.
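The two dB rules above (20 for voltage ratios, 10 for power ratios) can be checked in a few lines; the function names are my own.

```python
import math

def snr_db(v_signal, v_noise):
    # voltage ratio -> decibels: factor 20, since power goes as voltage squared
    return 20 * math.log10(v_signal / v_noise)

def power_ratio_db(p1, p2):
    # power ratio -> decibels: factor 10
    return 10 * math.log10(p1 / p2)

assert snr_db(10, 1) == 20.0          # signal voltage 10x the noise
assert power_ratio_db(10, 1) == 10.0  # ten violins vs. one
```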

Signal to Quantization Noise Ratio (SQNR)
• The usual levels of sound we hear around us are described in terms of decibels, as a ratio to the quietest sound we are capable of hearing. Table 6.1 shows approximate levels for these sounds.

Table 6.1: Magnitude levels of common sounds, in decibels

• Aside from any noise that may have been present in the original analog signal, there is also an additional error that results from quantization.
a) If voltages are actually in 0 to 1 but we have only 8 bits in which to store values, then effectively we force all continuous values of voltage into only 256 different values.
b) This introduces a roundoff error. It is not really "noise". Nevertheless it is called quantization noise (or quantization error).
• The quality of the quantization is characterized by the Signal to Quantization Noise Ratio (SQNR).
a) Quantization noise: the difference between the actual value of the analog signal, for the particular sampling time, and the nearest quantization interval value.
b) At most, this error can be as much as half of the interval.
c) For a quantization accuracy of N bits per sample, the SQNR can be simply expressed:

SQNR = 20 log10(2^N) = 6.02 N (dB) (6.3)

• Notes:
a) We map the maximum signal to 2^(N−1) − 1 (≈ 2^(N−1)) and the most negative signal to −2^(N−1).
b) Eq. (6.3) is the Peak signal-to-noise ratio, PSQNR: peak signal and peak noise.

c) The dynamic range is the ratio of maximum to minimum absolute values of the signal: Vmax/Vmin. The max abs. value Vmax gets mapped to 2^(N−1) − 1; the min abs. value Vmin gets mapped to 1. Vmin is the smallest positive voltage that is not masked by noise. The most negative signal, −Vmax, is mapped to −2^(N−1).
d) The quantization interval is ∆V = (2 Vmax)/2^N, since there are 2^N intervals. The whole range from Vmax down to (Vmax − ∆V/2) is mapped to 2^(N−1) − 1.
e) The maximum noise, in terms of actual voltages, is half the quantization interval: ∆V/2 = Vmax/2^N.

• 6.02 N dB is the worst case.
• If the input signal is sinusoidal, the quantization error is statistically independent, and its magnitude is uniformly distributed between 0 and half of the interval, then it can be shown that the expression for the SQNR becomes:

SQNR = 6.02 N + 1.76 (dB) (6.4)
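Eq. (6.4) can be checked empirically. The sketch below assumes a full-scale sinusoidal input and a uniform mid-rise quantizer (my own choices; the function name is invented), and measures signal power against quantization-error power.

```python
import math

def measured_sqnr_db(n_bits, num_samples=8000):
    # quantize a full-scale sinusoid in [-1, 1] with 2^N uniform levels
    # and measure signal power / quantization-error power
    step = 2.0 / (2 ** n_bits)
    sig_power = err_power = 0.0
    for n in range(num_samples):
        s = math.sin(2 * math.pi * 440.7 * n / 8000.0)   # non-synchronous tone
        q = (math.floor(s / step) + 0.5) * step          # mid-rise reconstruction
        sig_power += s * s
        err_power += (s - q) ** 2
    return 10 * math.log10(sig_power / err_power)

# for N = 8 this lands close to 6.02*8 + 1.76 = 49.92 dB
```

The non-integer tone frequency keeps the quantization error approximately uniform, which is the assumption behind the 1.76 dB term.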
Linear and Non-linear Quantization
• Linear format: samples are typically stored as uniformly quantized values.
• Non-uniform quantization: set up more finely-spaced levels where humans hear with the most acuity.
– Weber's Law stated formally says that equally perceived differences have values proportional to absolute levels:

ΔResponse ∝ ΔStimulus/Stimulus (6.5)

– Inserting a constant of proportionality k, we have a differential equation that states:

dr = k (1/s) ds (6.6)

with response r and stimulus s.
– Integrating, we arrive at a solution

r = k ln s + C (6.7)

with constant of integration C. Stated differently, the solution is

r = k ln(s/s0) (6.8)

where s0 is the lowest level of stimulus that causes a response (r = 0 when s = s0).
• Nonlinear quantization works by first transforming an analog signal from the raw s space into the theoretical r space, and then uniformly quantizing the resulting values.
• Such a law for audio is called µ-law encoding (or u-law). A very similar rule, called A-law, is used in telephony in Europe.
• The equations for these very similar encodings are as follows:

Fig. 6.6 shows these curves. The parameter µ is set to µ = 100 or µ = 255; the parameter A for the A-law encoder is usually set to A = 87.6.
• The µ-law in audio is used to develop a nonuniform quantization rule for sound: uniform quantization of r gives finer resolution in s at the quiet end.
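µ-law companding can be sketched in code as follows (the standard µ-law transform expressed directly; function names are my own):

```python
import math

def mu_law(s, mu=255.0):
    # compand s in [-1, 1] into r in [-1, 1]:
    # r = sgn(s) * ln(1 + mu*|s|) / ln(1 + mu)
    return math.copysign(math.log(1 + mu * abs(s)) / math.log(1 + mu), s)

def mu_law_expand(r, mu=255.0):
    # inverse transform, applied after uniform quantization of r
    return math.copysign(((1 + mu) ** abs(r) - 1) / mu, r)

# the transform round-trips, and quiet inputs are stretched (finer resolution)
assert abs(mu_law_expand(mu_law(0.3)) - 0.3) < 1e-12
assert mu_law(0.01) > 0.1
```

Because small |s| values are expanded before uniform quantization, the quiet end of the signal receives more of the available levels, matching the curves in Fig. 6.6.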
Audio Filtering
• Prior to sampling and AD conversion, the audio signal is also usually filtered to remove unwanted frequencies. The frequencies kept depend on the application:
a) For speech, typically from 50 Hz to 10 kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies.
b) An audio music signal will typically contain from about 20 Hz up to 20 kHz.
c) At the DA converter end, high frequencies may reappear in the output — because of sampling and then quantization, the smooth input signal is replaced by a series of step functions containing all possible frequencies.
d) So at the decoder side, a lowpass filter is used after the DA circuit.

Audio Quality vs. Data Rate
• The uncompressed data rate increases as more bits are used for quantization. Stereo: double the bandwidth to transmit a digital audio signal.

Synthetic Sounds
1. FM (Frequency Modulation): one approach to generating synthetic sound:
→ Link to details.

Fig. 6.7: Frequency Modulation. (a): A single frequency. (b): Twice the frequency. (c): Usually, FM is carried out using a sinusoid argument to a sinusoid. (d): A more complex form arises from a carrier frequency, 2πt, and a modulating frequency, 4πt, cosine inside the sinusoid.
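The "sinusoid argument to a sinusoid" idea in Fig. 6.7(c)–(d) can be sketched as follows. This is an illustrative example; the parameter names and defaults are my own (chosen to echo the 2πt carrier and 4πt modulator of the caption).

```python
import math

def fm_sample(t, f_carrier=1.0, f_mod=2.0, index=1.0):
    # FM: the modulating sinusoid sits inside the argument of the carrier
    return math.cos(2 * math.pi * f_carrier * t
                    + index * math.cos(2 * math.pi * f_mod * t))

# one second of the waveform, sampled 1,000 times
wave = [fm_sample(n / 1000.0) for n in range(1000)]
```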
2. Wave Table synthesis: A more accurate way of generating sounds from digital signals. Also known, simply, as sampling.
– In this technique, the actual digital samples of sounds from real instruments are stored. Since wave tables are stored in memory on the sound card, they can be manipulated by software so that sounds can be combined, edited, and enhanced.
→ Link to details.

6.2 MIDI: Musical Instrument Digital Interface
• Use the sound card's defaults for sounds: use a simple scripting language and hardware setup called MIDI.
• MIDI Overview
a) MIDI is a scripting language — it codes "events" that stand for the production of sounds. E.g., a MIDI event might include values for the pitch of a single note, its duration, and its volume.
b) MIDI is a standard adopted by the electronic music industry for controlling devices, such as synthesizers and sound cards, that produce music.

c) The MIDI standard is supported by most synthesizers, so sounds created on one synthesizer can be played and manipulated on another synthesizer and sound reasonably close.
d) Computers must have a special MIDI interface, but this is incorporated into most sound cards. The sound card must also have both D/A and A/D converters.

MIDI Concepts
• MIDI channels are used to separate messages.
a) There are 16 channels, numbered from 0 to 15. The channel forms the last 4 bits (the least significant bits) of the message.
b) Usually a channel is associated with a particular instrument: e.g., channel 1 is the piano, channel 10 is the drums, etc.
c) Nevertheless, one can switch instruments midstream, if desired, and associate another instrument with any channel.
• System messages
a) Several other types of messages, e.g. a general message for all instruments indicating a change in tuning or timing.
b) If the first 4 bits are all 1s, then the message is interpreted as a system common message.
• The way a synthetic musical instrument responds to a MIDI message is usually by simply ignoring any "play sound" message that is not for its channel.
– If several messages are for its channel, then the instrument responds, provided it is multi-voice, i.e., can play more than a single note at once.

• It is easy to confuse the term voice with the term timbre — the latter is MIDI terminology for just what instrument is trying to be emulated, e.g. a piano as opposed to a violin: it is the quality of the sound.
a) An instrument (or sound card) that is multi-timbral is one that is capable of playing many different sounds at the same time, e.g., piano, brass, drums, etc.
b) On the other hand, the term voice, while sometimes used by musicians to mean the same thing as timbre, is used in MIDI to mean every different timbre and pitch that the tone module can produce at the same time.
• Different timbres are produced digitally by using a patch — the set of control settings that define a particular timbre. Patches are often organized into databases, called banks.

• General MIDI: A standard mapping specifying what instruments (what patches) will be associated with what channels.
a) In General MIDI, channel 10 is reserved for percussion instruments, and there are 128 patches associated with standard instruments.
b) For most instruments, a typical message might be a Note On message (meaning, e.g., a keypress and release), consisting of what channel, what pitch, and what "velocity" (i.e., volume).
c) For percussion instruments, however, the pitch data means which kind of drum.
d) A Note On message consists of a "status" byte — which channel, what pitch — followed by two data bytes. It is followed by a Note Off message, which also has a pitch (which note to turn off) and a velocity (often set to zero).
→ Link to General MIDI Instrument Patch Map
→ Link to General MIDI Percussion Key Map

• The data in a MIDI status byte is between 128 and 255; each of the data bytes is between 0 and 127. Actual MIDI bytes are 10-bit, including a 0 start and 0 stop bit.

Fig. 6.8: Stream of 10-bit bytes; for typical MIDI messages, these consist of {Status byte, Data Byte, Data Byte} = {Note On, Note Number, Note Velocity}
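The three-byte Note On message of Fig. 6.8 can be assembled directly. A sketch: 0x9 is the standard Note On opcode nibble, and the helper name is my own.

```python
def note_on(channel, note, velocity):
    # status byte 1001nnnn: Note On opcode in the high nibble, channel in the
    # low nibble; the two data bytes (note number, velocity) must stay in 0..127
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, note, velocity])

msg = note_on(channel=0, note=60, velocity=64)  # middle C, medium volume
assert msg == bytes([0x90, 60, 64])
assert msg[0] >= 128 and msg[1] < 128 and msg[2] < 128
```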
Hardware Aspects of MIDI
• A MIDI device often is capable of programmability, and also can change the envelope describing how the amplitude of a sound changes over time.
• Fig. 6.9 shows a model of the response of a digital instrument to a Note On message.
• The MIDI hardware setup consists of a 31.25 kbps serial connection. Usually, MIDI-capable units are either Input devices or Output devices, not both.
• A traditional synthesizer is shown in Fig. 6.10.

Fig. 6.9: Stages of amplitude versus time for a music note
Fig. 6.10: A MIDI synthesizer

• The physical MIDI ports consist of 5-pin connectors for IN and OUT, as well as a third connector, THRU:
a) MIDI communication is half-duplex.
b) MIDI IN is the connector via which the device receives all MIDI data.
c) MIDI OUT is the connector through which the device transmits all the MIDI data it generates itself.
d) MIDI THRU is the connector by which the device echoes the data it receives from MIDI IN. Note that it is only the MIDI IN data that is echoed by MIDI THRU — all the data generated by the device itself is sent via MIDI OUT.
• A typical MIDI sequencer setup is shown in Fig. 6.11.

Fig. 6.11: A typical MIDI setup
Structure of MIDI Messages
• MIDI messages can be classified into two types: channel messages and system messages, as in Fig. 6.12.

Fig. 6.12: MIDI message taxonomy

• A. Channel messages: can have up to 3 bytes:
a) The first byte is the status byte (the opcode, as it were); it has its most significant bit set to 1.
b) The 4 low-order bits identify which channel this message belongs to (for 16 possible channels).
c) The 3 remaining bits hold the message. For a data byte, the most significant bit is set to 0.
• A.1. Voice messages:
a) This type of channel message controls a voice, i.e., sends information specifying which note to play or to turn off, and encodes key pressure.
b) Voice messages are also used to specify controller effects such as sustain, vibrato, tremolo, and the pitch wheel.
c) Table 6.3 lists these operations.
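The status-byte layout just described (MSB set, 3 opcode bits, 4 channel bits) can be decoded in a couple of lines; the function name is my own.

```python
def parse_channel_status(status):
    # 1xxxnnnn: verify the MSB, then split opcode (3 bits) from channel (4 bits)
    assert status & 0x80, "channel status bytes have the most significant bit set"
    opcode = (status >> 4) & 0x07
    channel = status & 0x0F
    return opcode, channel

assert parse_channel_status(0xB3) == (3, 3)  # Control Change (&HBn) on channel 3
```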

• Table 6.3: MIDI voice messages.
(** &H indicates hexadecimal, and 'n' in the status byte hex value stands for a channel number. All values are in 0..127 except Controller number, which is in 0..120)

• A.2. Channel mode messages:
a) Channel mode messages: special case of the Control Change message → opcode B (the message is &HBn, or 1011nnnn).
b) However, a Channel Mode message has its first data byte in 121 through 127 (&H79–7F).
c) Channel mode messages determine how an instrument processes MIDI voice messages: respond to all messages, respond just to the correct channel, don't respond at all, or go over to local control of the instrument.
d) The data bytes have meanings as shown in Table 6.4.
Table 6.4: MIDI Mode Messages

• B. System Messages:
a) System messages have no channel number — they are commands that are not channel specific, such as timing signals for synchronization, positioning information in pre-recorded MIDI sequences, and detailed setup information for the destination device.
b) Opcodes for all system messages start with &HF.
c) System messages are divided into three classifications, according to their use:

• B.1. System common messages: relate to timing or positioning.

Table 6.5: MIDI System Common messages.

• B.2. System real-time messages: related to synchronization.

Table 6.6: MIDI System Real-Time messages.
• B.3. System exclusive messages: included so that the MIDI standard can be extended by manufacturers.
a) After the initial code, a stream of any specific messages can be inserted that apply to their own product.
b) A System Exclusive message is supposed to be terminated by a terminator byte &HF7, as specified in Table 6.
c) The terminator is optional and the data stream may simply be ended by sending the status byte of the next message.

General MIDI
• General MIDI is a scheme for standardizing the assignment of instruments to patch numbers.
a) A standard percussion map specifies 47 percussion sounds.
b) Where a "note" appears on a musical score determines what percussion instrument is being struck: a bongo drum, a cymbal.
c) Other requirements for General MIDI compatibility: the MIDI device must support all 16 channels; a device must be multi-timbral (i.e., each channel can play a different instrument/program); a device must be polyphonic (i.e., each channel is able to play many voices); and there must be a minimum of 24 dynamically allocated voices.
• General MIDI Level 2: An extended General MIDI has recently been defined, with a standard .smf "Standard MIDI File" format defined — inclusion of extra character information, such as karaoke lyrics.

MIDI to WAV Conversion
• Some programs, such as early versions of Premiere, cannot include .mid files — instead, they insist on .wav format files.
a) Various shareware programs exist for approximating a reasonable conversion between MIDI and WAV formats.
b) These programs essentially consist of large lookup files that try to substitute pre-defined or shifted WAV output for MIDI messages, with inconsistent success.

6.3 Quantization and Transmission of Audio
• Coding of Audio: Quantization and transformation of data are collectively known as coding of the data.
a) For audio, the µ-law technique for companding audio signals is usually combined with an algorithm that exploits the temporal redundancy present in audio signals.
b) Differences in signals between the present and a past time can reduce the size of signal values and also concentrate the histogram of sample values (differences, now) into a much smaller range.
c) The result of reducing the variance of values is that lossless compression methods produce a bitstream with shorter bit lengths for more likely values (→ expanded discussion in Chap. 7).
• In general, producing quantized sampled output for audio is called PCM (Pulse Code Modulation). The differences version is called DPCM (and a crude but efficient variant is called DM). The adaptive version is called ADPCM.

Pulse Code Modulation
• The basic techniques for creating digital signals from analog signals are sampling and quantization.
• Quantization consists of selecting breakpoints in magnitude, and then re-mapping any value within an interval to one of the representative output levels.
→ Repeat of Fig. 6.2:

a) The set of interval boundaries are called decision boundaries, and the representative values are called reconstruction levels.
b) The boundaries for quantizer input intervals that will all be mapped into the same output level form a coder mapping.
c) The representative values that are the output values from a quantizer are a decoder mapping.
d) Finally, we may wish to compress the data, by assigning a bit stream that uses fewer bits for the most prevalent signal values (Chap. 7).

Fig. 6.2: Sampling and Quantization.
• Every compression scheme has three stages:
A. The input data is transformed to a new representation that is easier or more efficient to compress.
B. We may introduce loss of information. Quantization is the main lossy step: we use a limited number of reconstruction levels, fewer than in the original signal.
C. Coding. Assign a codeword (thus forming a binary bitstream) to each output level or symbol. This could be a fixed-length code, or a variable-length code such as Huffman coding (Chap. 7).

• For audio signals, we first consider PCM for digitization. This leads to Lossless Predictive Coding as well as the DPCM scheme; both methods use differential coding. As well, we look at the adaptive version, ADPCM, which can provide better compression.

PCM in Speech Compression
• Assuming a bandwidth for speech from about 50 Hz to about 10 kHz, the Nyquist rate would dictate a sampling rate of 20 kHz.
a) Using uniform quantization without companding, the minimum sample size we could get away with would likely be about 12 bits. Hence for mono speech transmission the bit-rate would be 240 kbps (20K × 12 bits).
b) With companding, we can reduce the sample size down to about 8 bits with the same perceived level of quality, and thus reduce the bit-rate to 160 kbps.
c) However, the standard approach to telephony in fact assumes that the highest-frequency audio signal we want to reproduce is only about 4 kHz. Therefore the sampling rate is only 8 kHz, and the companded bit-rate thus reduces to 64 kbps.

• However, there are two small wrinkles we must also address:
1. Since only sounds up to 4 kHz are to be considered, all other frequency content must be noise. Therefore, we should remove this high-frequency content from the analog input signal. This is done using a band-limiting filter that blocks out high, as well as very low, frequencies.
– Also, once we arrive at a pulse signal, such as that in Fig. 6.13(a) below, we must still perform DA conversion and then construct a final output analog signal. But, effectively, the signal we arrive at is the staircase shown in Fig. 6.13(b).
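The bit-rate arithmetic above is simple enough to check directly; the function name is my own.

```python
def pcm_bit_rate_bps(sampling_rate_hz, bits_per_sample, channels=1):
    # uncompressed PCM bit-rate; stereo doubles it
    return sampling_rate_hz * bits_per_sample * channels

assert pcm_bit_rate_bps(20_000, 12) == 240_000  # uncompanded speech: 20 kHz x 12 bits
assert pcm_bit_rate_bps(8_000, 8) == 64_000     # companded telephony: 8 kHz x 8 bits
```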
2. A discontinuous signal contains not just frequency components due to the original signal but also a theoretically infinite set of higher-frequency components:
a) This result is from the theory of Fourier analysis, in signal processing.
b) These higher frequencies are extraneous.
c) Therefore the output of the digital-to-analog converter goes to a low-pass filter that allows only frequencies up to the original maximum to be retained.

Fig. 6.13: Pulse Code Modulation (PCM). (a) Original analog signal and its corresponding PCM signals. (b) Decoded staircase signal. (c) Reconstructed signal after low-pass filtering.

Differential Coding of Audio
• The complete scheme for encoding and decoding telephony signals is shown as a schematic in Fig. 6.14. As a result of the low-pass filtering, the output becomes smoothed; Fig. 6.13(c) above showed this effect.
• Audio is often stored not in simple PCM but instead in a form that exploits differences — which are generally smaller numbers, so offer the possibility of using fewer bits to store.
a) If a time-dependent signal has some consistency over time ("temporal redundancy"), the difference signal, subtracting the current sample from the previous one, will have a more peaked histogram, with a maximum around zero.
b) For example, as an extreme case, the histogram for a linear ramp signal that has constant slope is flat, whereas the histogram for the derivative of the signal (i.e., the differences, from sampling point to sampling point) consists of a spike at the slope value.
c) So if we then go on to assign bit-string codewords to differences, we can assign short codes to prevalent values and long codewords to rarely occurring ones.

Lossless Predictive Coding
• Predictive coding: simply means transmitting differences — predict the next sample as being equal to the current sample; send not the sample itself but the difference between previous and next.
a) Predictive coding consists of finding differences, and transmitting these using a PCM system.
b) Note that differences of integers will be integers. Denote the integer input signal as the set of values fn. Then we predict values as simply the previous value, and define the error en as the difference between the actual and the predicted signal: the predicted value is fn−1, and en = fn − fn−1.

c) But it is often the case that some function of a few of the previous values, fn−1, fn−2, fn−3, etc., provides a better prediction. Typically, a linear predictor function is used.

• The idea of forming differences is to make the histogram of sample values more peaked.
a) For example, Fig. 6.15(a) plots 1 second of sampled speech at 8 kHz, with magnitude resolution of 8 bits per sample.
b) A histogram of these values is actually centered around zero, as in Fig. 6.15(b).
c) Fig. 6.15(c) shows the histogram for corresponding speech signal differences: difference values are much more clustered around zero than are sample values.
d) As a result, a method that assigns short codewords to frequently occurring symbols will assign a short code to zero and do rather well: such a coding scheme will much more efficiently code sample differences than samples themselves.
• One problem: suppose our integer sample values are in the range 0..255. Then differences could be as much as −255..255 — we've increased our dynamic range (ratio of maximum to minimum) by a factor of two → we need more bits to transmit some differences.
a) A clever solution for this: define two new codes, denoted SU and SD, standing for Shift-Up and Shift-Down. Some special code values will be reserved for these. Define SU and SD as shifts by 32.
b) Then we can use codewords for only a limited set of signal differences, say only the range −15..16. Differences which lie in the limited range can be coded as is, but with the extra two values for SU, SD, a value outside the range −15..16 can be transmitted as a series of shifts, followed by a value that is indeed inside the range.
c) For example, 100 is transmitted as: SU, SU, SU, 4, since 100 = 3 × 32 + 4, where (the codes for) SU and for 4 are what are transmitted (or stored).

Fig. 6.15: Differencing concentrates the histogram. (a): Digital speech signal. (b): Histogram of digital speech signal values. (c): Histogram of digital speech signal differences.
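The shift-code scheme can be sketched directly, under the stated assumptions (shifts of 32, in-range differences −15..16); the token names and function name are my own.

```python
SU, SD = "SU", "SD"  # stand-in tokens for the reserved shift codes

def shift_encode(diff, shift=32, lo=-15, hi=16):
    # emit SU/SD codes until the residual falls inside lo..hi, then emit it
    codes = []
    while diff > hi:
        codes.append(SU)
        diff -= shift
    while diff < lo:
        codes.append(SD)
        diff += shift
    codes.append(diff)
    return codes

assert shift_encode(100) == [SU, SU, SU, 4]  # 100 = 3*32 + 4
```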

Lossless Predictive Coding
• The decoder produces the same signals as the original. As a simple example, suppose we devise a predictor for the signal values.
• Let's consider an explicit example. Suppose we wish to code the sequence f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22. For the purposes of the predictor, we'll invent an extra signal value f0, equal to f1 = 21, and first transmit this initial value, uncoded.
• The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. Fig. 6.16 shows a typical schematic diagram used to encapsulate this type of system:

Fig. 6.16: Schematic diagram for Predictive Coding encoder and decoder.
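The slides' specific predictor equations are not reproduced here, so the sketch below runs the example sequence through the simplest previous-sample predictor — an assumption of mine, not necessarily the predictor used in Fig. 6.16.

```python
def predictive_encode(samples):
    # send the first value uncoded, then the error en = fn - f(n-1)
    stream = [samples[0]]
    for prev, f in zip(samples, samples[1:]):
        stream.append(f - prev)
    return stream

def predictive_decode(stream):
    # the decoder reproduces the original exactly: the coding is lossless
    out = [stream[0]]
    for e in stream[1:]:
        out.append(out[-1] + e)
    return out

signal = [21, 22, 27, 25, 22]
stream = predictive_encode(signal)
assert stream == [21, 1, 5, -2, -3]         # errors cluster around zero
assert predictive_decode(stream) == signal  # lossless round-trip
```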

• Differential PCM is exactly the same as Predictive Coding, except that it incorporates a quantizer step.
a) One scheme for analytically determining the best set of quantizer steps, for a non-uniform quantizer, is the Lloyd-Max quantizer, which is based on a least-squares minimization of the error term.
b) Our nomenclature: signal values: fn — the original signal; f̂n — the predicted signal; and f̃n — the quantized, reconstructed signal.
c) DPCM: form the prediction f̂n; form an error en by subtracting the prediction from the actual signal; then quantize the error to a quantized version, ẽn = Q[en]. The decoder reconstructs f̃n = f̂n + ẽn.
Then codewords for quantized error values ẽn are produced using entropy coding, e.g. Huffman coding (Chapter 7).
d) The main effect of the coder-decoder process is to produce reconstructed, quantized signal values f̃n = f̂n + ẽn.

• The distortion is the average squared error, (1/N) Σ ( f̃n − fn )²; one often plots distortion versus the number of bit-levels used. A Lloyd-Max quantizer will do better (have less distortion) than a uniform quantizer.

• For speech, we could modify quantization steps adaptively by estimating the mean and variance of a patch of signal values, and shifting quantization steps accordingly, for every block of signal values. That is, starting at time i we could take a block of N values fn and try to minimize the quantization error:

    min Σ ( fn − Q[fn] )²,   n = i .. i + N − 1
• This is a least-squares problem, and it can be solved iteratively: the result is the Lloyd-Max quantizer.

• Since signal differences are very peaked, we could model them using a Laplacian probability distribution function, which is strongly peaked at zero: for variance σ² it looks like

    l(x) = ( 1 / √(2σ²) ) exp( −√2 |x| / σ )

• So typically one assigns quantization steps for a quantizer with nonuniform steps by assuming signal differences dn are drawn from such a distribution, and then choosing steps to minimize

    min Σ ( dn − Q[dn] )² l(dn),   n = i .. i + N − 1

• Schematic diagram for DPCM:

Fig. 6.17: Schematic diagram for DPCM encoder and decoder.
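The iterative solution can be sketched in a few lines of Python. This is a generic Lloyd-style design run on empirical samples (not the book's closed-form derivation), alternating between placing decision boundaries midway between reconstruction values and moving each reconstruction value to the centroid of its region; the Laplacian-like test data are made up:

```python
import random

def lloyd_max(samples, levels, iters=50):
    # Lloyd-Max sketch: alternate between
    #  (1) decision boundaries at midpoints between reconstruction values,
    #  (2) reconstruction values at the mean (centroid) of each region.
    samples = sorted(samples)
    lo, hi = samples[0], samples[-1]
    recon = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        bounds = [(a + b) / 2 for a, b in zip(recon, recon[1:])]
        bins = [[] for _ in range(levels)]
        for s in samples:
            bins[sum(s > b for b in bounds)].append(s)  # region index
        recon = [sum(b) / len(b) if b else r for b, r in zip(bins, recon)]
    return recon

# Peaked, zero-centred "differences": cubing a Gaussian exaggerates the peak.
random.seed(1)
data = [random.gauss(0, 1) ** 3 for _ in range(2000)]
print(lloyd_max(data, 4))
```

On such peaked data the inner reconstruction levels crowd toward zero while the outer ones spread out, which is exactly the non-uniform step assignment the slide describes.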
• First, we note that the error is in the range −255..255; i.e., there are 511 possible levels for the error term. The quantizer simply divides the error range into 32 patches of about 16 levels each. It also makes the representative reconstructed value for each patch equal to the midway point of each group of 16 levels.
• Table 6.7 gives output values for any of the input codes: 4-bit codes are mapped to 32 reconstruction levels in a staircase fashion.

Table 6.7: DPCM quantizer reconstruction levels.

• As an example stream of signal values, consider the set of values:

[table of example signal values omitted]
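The description above can be written directly as a one-line quantizer. The exact formula here is my reading of that description (16-wide patches over −255..255, each reconstructed at its midpoint), not a copy of Table 6.7:

```python
def q(e):
    # Divide the 511-level error range -255..255 into 32 patches of 16
    # levels each, and return the midpoint of the patch containing e.
    return 16 * ((255 + e) // 16) - 256 + 8

# Staircase behaviour at a few probe points:
print(q(-255), q(-1), q(0), q(1), q(255))  # -248 -8 -8 8 248
```

Every error value lands within 8 of its reconstruction level, as expected for 16-wide patches with midpoint representatives.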
• DM (Delta Modulation): a simplified version of DPCM. Often used as a quick A/D converter.

1. Uniform-Delta DM: use only a single quantized error value, either positive or negative.

a) ⇒ a 1-bit coder. Produces coded output that follows the original signal in a staircase fashion. The set of equations is:

    f̂n = f̃n−1
    en = fn − f̂n = fn − f̃n−1
    ẽn = +k if en > 0, −k otherwise
    f̃n = f̂n + ẽn

• Note that the prediction simply involves a delay.

b) Consider actual numbers: suppose the signal values are 10, 11, 13, 15 (with step size k = 4, and the first value transmitted uncoded).

c) The reconstructed set of values 10, 14, 10, 14 is close to the correct set 10, 11, 13, 15.

d) However, DM copes less well with rapidly changing signals. One approach to mitigating this problem is to simply increase the sampling rate, perhaps to many times the Nyquist rate.
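The DM equations can be checked with a few lines of Python, reproducing the numbers from the example (first value sent uncoded, k = 4; function name is mine):

```python
def delta_modulate(f, k=4):
    # Uniform DM: 1-bit coder; quantized error is +k or -k.
    recon = [f[0]]             # first value transmitted uncoded
    bits = []
    for x in f[1:]:
        pred = recon[-1]       # prediction is just a delay: f^n = f~(n-1)
        e = x - pred
        e_q = k if e > 0 else -k
        bits.append(1 if e_q > 0 else 0)
        recon.append(pred + e_q)
    return bits, recon

bits, recon = delta_modulate([10, 11, 13, 15], k=4)
print(recon)   # [10, 14, 10, 14]: the staircase oscillates around the input
```

The reconstruction 10, 14, 10, 14 shows both the strength (it tracks the trend with one bit per sample) and the weakness (granular oscillation around a slowly rising signal) of uniform DM.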
2. Adaptive DM: If the slope of the actual signal curve is high, the staircase approximation cannot keep up. For a steep curve, one should change the step size k adaptively.

◦ One scheme for analytically determining the best set of quantizer steps, for a non-uniform quantizer, is the Lloyd-Max quantizer.

• ADPCM (Adaptive DPCM) takes the idea of adapting the coder to suit the input much farther. There are two pieces that make up a DPCM coder: the quantizer and the predictor.

1. In Adaptive DM, we adapt the quantizer step size to suit the input. In DPCM, we can change the step size as well as the decision boundaries, using a non-uniform quantizer. We can carry this out in two ways:

a) Forward adaptive quantization: use the properties of the input signal.

b) Backward adaptive quantization: use the properties of the quantized output. If quantized errors become too large, we should change the non-uniform quantizer.
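One simple backward-adaptive step-size rule, shown below as an illustration of adaptive DM (the multiplier values are my own; real coders such as CVSD use rules of this flavour), grows the step when output bits repeat (the staircase is lagging a steep slope) and shrinks it when they alternate. Because the update depends only on the transmitted bits, the decoder can track the step size with no side information:

```python
def adaptive_dm(f, step=2.0, grow=1.5, shrink=0.66):
    # Backward-adaptive DM sketch: both ends see the same bit stream,
    # so both can update the step size from the output alone.
    recon = [f[0]]                # first value sent uncoded
    bits = []
    last_bit = None
    for x in f[1:]:
        bit = 1 if x > recon[-1] else 0
        bits.append(bit)
        if last_bit is not None:
            # repeated bits -> grow the step; alternating bits -> shrink it
            step = step * grow if bit == last_bit else step * shrink
        last_bit = bit
        recon.append(recon[-1] + (step if bit else -step))
    return bits, recon

# On a steep ramp, the growing step lets the staircase catch up;
# fixed-step DM with step 2 would only reach 10 by the last sample.
print(adaptive_dm([0, 10, 20, 30, 40, 50])[1])
```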
2. We can also adapt the predictor, again using forward or backward adaptation. Making the predictor coefficients adaptive is called Adaptive Predictive Coding (APC). The prediction uses M previous values:

    f̂n = Σ aj fn−j,   j = 1 .. M

and a least-squares approach chooses the coefficients aj by minimizing, over a block of N values,

    min Σn ( fn − Σj aj fn−j )²

• However, we can get into a difficult situation if we try to change the prediction coefficients that multiply previous quantized values f̃n−j, because that makes a complicated set of equations to solve for these coefficients; using the original values fn−j keeps the problem a standard least-squares one.
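For forward adaptation over a block of original (unquantized) samples, the least-squares fit reduces to solving a small set of normal equations. A self-contained sketch (names are mine; shown for M = 2):

```python
def apc_coefficients(f, M=2):
    # Least-squares predictor coefficients a_j minimizing
    #   sum_n ( f[n] - sum_j a_j * f[n-j] )^2   over the block f.
    # Build the MxM normal equations  R a = r  from correlation sums.
    N = len(f)
    R = [[sum(f[n - i] * f[n - j] for n in range(M, N))
          for j in range(1, M + 1)] for i in range(1, M + 1)]
    r = [sum(f[n] * f[n - i] for n in range(M, N)) for i in range(1, M + 1)]
    # Solve the small system by Gaussian elimination.
    for col in range(M):
        piv = R[col][col]
        for row in range(col + 1, M):
            m = R[row][col] / piv
            for k in range(col, M):
                R[row][k] -= m * R[col][k]
            r[row] -= m * r[col]
    a = [0.0] * M
    for row in reversed(range(M)):
        a[row] = (r[row] - sum(R[row][k] * a[k]
                               for k in range(row + 1, M))) / R[row][row]
    return a
```

In a forward-adaptive coder these coefficients must be sent to the decoder as side information for each block; that overhead is the price of the simpler, linear fit.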
• Fig. 6.18 shows a schematic diagram for the ADPCM coder and decoder:

Fig. 6.18: Schematic diagram for the ADPCM encoder and decoder.
6.4 Further Exploration

→ Link to Further Exploration for Chapter 6.

• The useful links included are:

– An excellent discussion of the use of FM to create synthetic sound.
– An extensive list of audio file formats.
– CD audio file formats are somewhat different. The main music format is called "red book audio." A good description of various CD formats is on the website.
– A General MIDI Instrument Patch Map, along with a General MIDI Percussion Key Map.
– A link to a good tutorial on MIDI and wave-table music.
– A link to a Java program for decoding MIDI streams.
– A good multimedia/sound page, including a source for locating Internet sound/music materials.