
Audio Coding Techniques (I)

 Introduction
 How is audio different from speech?
 Human auditory system
 Lossless audio coding
 Reversibility of closed-loop DPCM
 Inter-channel decorrelation
 Perceptual audio coding
 Psychoacoustics
 Perceptual entropy

EE493Q: Digital Speech Processing


Introduction to Audio
 What is audio?
 “Of or relating to high-fidelity sound reproduction”
 How is audio different from speech?
 Higher sampling rate
 CD-quality music: bandwidth up to 20 kHz (44.1 kHz sampling)
 Wideband speech in video conferencing: 7 kHz bandwidth (16 kHz sampling)
 Higher accuracy
 12-16 bits per sample
 Multi-channel
 Mono, stereo, 6-channel (5.1 surround)
 Requires much more bandwidth
 Raw data rate ~700 kbps per channel

EE493Q: Digital Speech Processing


Stereo-Audio

EE493Q: Digital Speech Processing


Audio Compression
 How is it different from speech
compression?
 Requirement
 Lossless compression is important in some
applications (e.g., archiving and mixing of high-
quality recordings in professional environments)
 Perceptually lossless compression (e.g., MP3 music)

 Principles
 No physical model exists for audio production
 Instead, more emphasis is put on the human auditory system, in particular the psychoacoustic masking effect

EE493Q: Digital Speech Processing


Sound Quality Requirements

EE493Q: Digital Speech Processing


Review Question (I)

Q: Given audio sampled at 32 kHz, its 25th subband (12-12.5 kHz) has an SPL of 10 dB. Can the human ear hear it?

A: NO – at 12-12.5 kHz the absolute threshold of hearing is above 10 dB SPL, so this subband is inaudible and need not be coded.
EE493Q: Digital Speech Processing


Review Question (II)

Q: Given a masker tone at 2 kHz and 60 dB SPL, if a test tone is played in the 15th critical band (CB) with an SPL of 50 dB, is it masked?

A: NO – the 15th CB (roughly 2320-2700 Hz) lies about two critical bands above the masker, where the spread masking threshold has dropped well below 50 dB.

EE493Q: Digital Speech Processing


Review Question (III)

Q: Consider the echo-hiding scheme for audio watermarking. Do we want to insert echoes before or after the masker?

A: AFTER – temporal post-masking (forward masking) lasts much longer than pre-masking, so echoes placed after the masker are far less audible.

EE493Q: Digital Speech Processing


Audio Coding Techniques (I)
 Introduction
 How is audio different from speech?
 Human auditory system
 Lossless audio coding
 Reversibility of closed-loop DPCM
 Inter-channel decorrelation
 Perceptual audio coding
 Psychoacoustics
 Perceptual entropy

EE493Q: Digital Speech Processing


Overview

Hans and Schafer, “Lossless Compression of Digital Audio,” IEEE Signal Processing Magazine, July 2001

Note: this approach does not take inter-channel correlation into account, so it is unlikely to be optimal
EE493Q: Digital Speech Processing
Intra-Channel Decorrelation

(rounding)

Notes: the prediction residues e(n) are integers due to rounding; A(z) is the autoregressive (AR) model and B(z) the moving-average (MA) model

EE493Q: Digital Speech Processing


Justification of Reversibility
Recall: quantization is not invertible, so how can we achieve lossless
compression despite the rounding operation?

Encoder:  x̂(n) = Σ_{k=1..K} a_k x(n−k),   e(n) = x(n) − Q[x̂(n)]

Decoder:  x̂(n) = Σ_{k=1..K} a_k x(n−k),   x(n) = e(n) + Q[x̂(n)]

(Q[·] denotes rounding to the nearest integer)

Answer: closed-loop DPCM guarantees the reversibility
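
A minimal Python sketch of this argument (the predictor order and coefficients are illustrative assumptions, not values from the lecture): the decoder regenerates exactly the same rounded prediction Q[x̂(n)] from the already-reconstructed past samples, so adding back the integer residue e(n) restores x(n) bit-exactly.

```python
import random

def predict(past, a):
    # x_hat(n) = sum_k a_k * x(n - k); past[-1] is x(n-1), past[-2] is x(n-2), ...
    return sum(ak * past[-k] for k, ak in enumerate(a, start=1))

def dpcm_encode(x, a):
    K, e = len(a), []
    for n, xn in enumerate(x):
        if n < K:
            e.append(xn)                                   # warm-up samples sent verbatim
        else:
            e.append(xn - round(predict(x[n - K:n], a)))   # e(n) = x(n) - Q[x_hat(n)]
    return e

def dpcm_decode(e, a):
    K, x = len(a), []
    for n, en in enumerate(e):
        if n < K:
            x.append(en)
        else:
            x.append(en + round(predict(x[n - K:n], a)))   # same Q[.], same past samples
    return x

a = [1.5, -0.6]                                            # illustrative 2nd-order predictor
x = [random.randint(-32768, 32767) for _ in range(1000)]   # 16-bit integer samples
assert dpcm_decode(dpcm_encode(x, a), a) == x              # bit-exact despite the rounding
```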

EE493Q: Digital Speech Processing


Inter-Channel Decorrelation

(figure: left/right channels L, R mapped to average s and difference d)

Average: s = (L + R)/2    Difference: d = R − L
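
A small sketch of one common lossless integer variant of this average/difference idea (illustrative, not necessarily the exact mapping intended here): the floor in the average appears to lose one bit, but that bit equals the parity of the difference d, so the pair (s, d) remains perfectly reversible.

```python
def ms_encode(L, R):
    s = (L + R) >> 1          # "average": floor((L + R) / 2), seems to drop one bit...
    d = R - L                 # ...but the dropped bit equals the parity of d
    return s, d

def ms_decode(s, d):
    L = s + ((d & 1) - d) // 2    # recover L exactly using d's parity
    R = L + d
    return L, R

# exhaustive check over a small range of integer sample values
for L in range(-64, 65):
    for R in range(-64, 65):
        assert ms_decode(*ms_encode(L, R)) == (L, R)
```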

EE493Q: Digital Speech Processing


Stereo Recording
Techniques*
 X-Y technique: two directional microphones are placed coincidentally (at nearly the same point), typically at an angle of 90 degrees or more to each other
 Mono-compatible
 A-B technique: two omni-directional microphones are placed at some distance from each other (from about 20 centimeters up to several meters)
 Adding a third microphone at the center turns this into the “Decca Tree” arrangement
EE493Q: Digital Speech Processing
Audio Coding Techniques (II)
 MP3 Audio Compression
 Filter bank/Modified DCT
 Psychoacoustic Models
 Bit Allocation
 Advanced Audio Coding (AAC)
Techniques
 MPEG-1,2,4
 SONY ATRAC
 Lucent PAC
 Dolby AC-3
EE493Q: Digital Speech Processing
Introduction
 What does ISO MPEG-1 Audio provide?
A perceptually transparent, lossy audio compression system that exploits the weaknesses of the human ear.
 Can provide compression by a factor of 6 and retain
sound quality.
 One part of a three-part standard that includes audio, video, and audio/video synchronization
 MPEG-2 and MPEG-4 have advanced audio
coding (AAC) options
 ITU-T has its own standardized algorithm for
wideband speech (audio)
EE493Q: Digital Speech Processing
MPEG-I Audio Features
 PCM sampling rate of 32, 44.1, or 48 kHz
 Four channel modes:
 Monophonic and Dual-monophonic
 Stereo and Joint-stereo
 Three modes (layers in MPEG-I speak):
 Layer I: Computationally cheapest, bit rates > 128 kbps
 Layer II: Bit rate ~128 kbps, used in VCD
 Layer III: Most complicated encoding/decoding, bit rates ~64 kbps, originally intended for streaming audio

EE493Q: Digital Speech Processing


MPEG-I Encoder Architecture

EE493Q: Digital Speech Processing


MPEG-I Encoder
Architecture
 Polyphase Filter Bank: Transforms PCM samples
to frequency domain signals in 32 subbands
 Psychoacoustic Model: Identifies the acoustically irrelevant parts of the signal
 Bit Allocation: Allots bits to subbands according to the output of the psychoacoustic calculation.
 Frame Creation: Generates an MPEG-I compliant
bit stream.

EE493Q: Digital Speech Processing


What is a Filter Bank?

(figure: analysis filter bank and synthesis filter bank)
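
A toy illustration of the analysis/synthesis structure, assuming the simplest possible bank: a critically sampled block DFT (the MPEG-1 polyphase filter bank uses a long prototype filter instead, but the analysis-then-synthesis idea is the same).

```python
import numpy as np

M = 32                                                    # number of subbands
x = np.random.default_rng(0).standard_normal(10 * M)     # test signal, length a multiple of M

# analysis: each block of M samples becomes one sample in each of M subbands
subbands = np.fft.fft(x.reshape(-1, M), axis=1)

# synthesis: the inverse transform and reassembly recover the signal
x_rec = np.fft.ifft(subbands, axis=1).real.reshape(-1)

print(np.max(np.abs(x - x_rec)))                          # ~1e-16: perfect reconstruction
```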
EE493Q: Digital Speech Processing
Filter Bank Illustration

EE493Q: Digital Speech Processing


Modified Discrete Cosine
Transform

Forward Transform

Inverse Transform
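
For reference, a minimal numerical sketch of the standard MDCT/IMDCT pair with a sine window and 50% overlap-add (the transform length N and the window choice are illustrative; this is not the MPEG Layer III implementation).

```python
import numpy as np

def mdct_basis(N):
    n, k = np.arange(2 * N)[:, None], np.arange(N)[None, :]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))     # 2N x N cosine basis

def mdct(frame, N):
    return frame @ mdct_basis(N)                  # 2N windowed samples -> N coefficients

def imdct(coeffs, N):
    return (2.0 / N) * (mdct_basis(N) @ coeffs)   # N coefficients -> 2N aliased samples

N = 16
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))   # sine window (Princen-Bradley)

rng = np.random.default_rng(0)
x = rng.standard_normal(8 * N)
y = np.zeros_like(x)

# 50%-overlapped frames: window -> MDCT -> IMDCT -> window -> overlap-add
for start in range(0, len(x) - 2 * N + 1, N):
    X = mdct(win * x[start:start + 2 * N], N)        # only N coefficients kept per frame
    y[start:start + 2 * N] += win * imdct(X, N)      # time-domain aliasing cancels on overlap

print(np.max(np.abs(x[N:-N] - y[N:-N])))             # ~1e-15 for the fully overlapped samples
```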
EE493Q: Digital Speech Processing
Pre-Echo Distortion

EE493Q: Digital Speech Processing


MPEG-I Psychoacoustic
Models
 MPEG-I standard defines two models:
 Psychoacoustic Model 1:
 Less computationally expensive
 Makes some serious compromises in what it
assumes a listener cannot hear
 Psychoacoustic Model 2:
 Provides more features suited to Layer III coding, assuming, of course, increased processor bandwidth.

EE493Q: Digital Speech Processing


Step 1: Spectral Analysis and
SPL Normalization
 Convert samples to the frequency domain
 Use a Hann weighting and then a DFT
 This simply gives a frequency-domain representation free of edge artifacts (caused by the finite window size).
 Model 1 uses a 512-sample (Layer I) or 1024-sample (Layers II and III) window.
 Model 2 uses a 1024-sample window and two calculations per frame.
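
A rough sketch of this step, assuming one 1024-sample frame and a crude normalization that pins the frame maximum at 96 dB SPL (the standard instead references full-scale 16-bit PCM; the constants here are illustrative).

```python
import numpy as np

fs, N = 44100, 1024
x = np.random.default_rng(0).standard_normal(N)       # one analysis frame

w = np.hanning(N)                                     # Hann weighting suppresses edge artifacts
X = np.fft.rfft(w * x)
power_db = 10 * np.log10(np.abs(X) ** 2 + 1e-12)      # power spectrum in dB

spl = power_db + (96.0 - power_db.max())              # crude SPL normalization (max -> 96 dB)
freqs = np.fft.rfftfreq(N, 1 / fs)                    # frequency of each spectral line in Hz
```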

EE493Q: Digital Speech Processing


Step 2: Identification of Tonal
and Noise Maskers
 Need to separate sound into “tones” and “noise”
components
 Model 1:
 Local peaks are treated as tonal maskers; the remaining spectrum in each critical band is lumped into a single noise masker at a representative frequency (see the sketch below).

Example: (figure)

 Model 2:
 Calculates a “tonality” index to estimate the likelihood that each spectral line is a tone
 Based on the previous two analysis windows
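
A simplified sketch of the Model 1 rule: spectral lines that are local maxima and stand well above their neighborhood are labeled tonal, and the rest of each critical band is lumped into one noise masker. The 7 dB margin and the ±2-bin neighborhood are illustrative stand-ins for the standard's exact peak-picking rules.

```python
import numpy as np

def find_tonal_maskers(spl, margin_db=7.0):
    # a line is "tonal" if it is a local peak and exceeds its +-2-bin neighbors by margin_db
    tonal = []
    for k in range(2, len(spl) - 2):
        is_peak = spl[k] > spl[k - 1] and spl[k] >= spl[k + 1]
        stands_out = all(spl[k] - spl[k + j] >= margin_db for j in (-2, 2))
        if is_peak and stands_out:
            tonal.append(k)
    return tonal

def noise_maskers(spl, band_edges, tonal_bins):
    # lump the remaining (non-tonal) energy of each critical band into one noise masker
    out = []
    for lo, hi in band_edges:
        bins = [k for k in range(lo, hi) if k not in tonal_bins]
        energy = np.sum(10.0 ** (np.asarray(spl)[bins] / 10)) if bins else 0.0
        out.append(10 * np.log10(energy + 1e-12))
    return out

spl = [20, 25, 60, 24, 22, 30, 31, 29, 28, 27]
print(find_tonal_maskers(spl))                    # -> [2]: only the 60 dB line is tonal
print(noise_maskers(spl, [(0, 5), (5, 10)], {2}))
```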

EE493Q: Digital Speech Processing


Graphic Illustration

X: tonal
O: noise

EE493Q: Digital Speech Processing


Three Types of Frequency
Masking
 Noise-Masking-Tone (NMT): SMR = 4 dB
 Tone-Masking-Noise (TMN): SMR = 24 dB
 Noise-Masking-Noise (NMN): SMR = 26 dB

Asymmetry: noise is a far stronger masker than a tone. For example, a 60 dB tone masks noise in its critical band only up to about 60 − 24 = 36 dB, whereas a 60 dB narrow-band noise masks a tone up to about 60 − 4 = 56 dB.
(figure: NMT vs. TMN masking curves illustrating the asymmetry)
EE493Q: Digital Speech Processing
Step 3: Decimation and
Reorganization of Maskers
 “Smear” each masker across its critical band
 Use either a masking function (Model 1) or a spreading function (Model 2).
 Adjust the calculated threshold by incorporating a “quiet” mask – the masking threshold at each frequency when no other sound is present (the absolute threshold of hearing).
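
Two widely used closed-form curves that can play the roles described above (a sketch; the MPEG psychoacoustic models use their own tabulated functions): Terhardt's approximation of the threshold in quiet, and Schroeder's spreading function for smearing a masker across neighboring critical bands.

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    # Terhardt's approximation of the absolute threshold of hearing (the "quiet" mask)
    f = np.asarray(f_hz, dtype=float) / 1000.0
    return 3.64 * f**-0.8 - 6.5 * np.exp(-0.6 * (f - 3.3)**2) + 1e-3 * f**4

def spreading_db(delta_bark):
    # Schroeder's spreading function: attenuation (in dB) at a distance of delta_bark
    # critical bands from the masker; approximately 0 dB at the masker itself
    dz = np.asarray(delta_bark, dtype=float)
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474)**2)

print(threshold_in_quiet_db(12000.0))                           # ~21 dB SPL at 12 kHz
print(60 + spreading_db(np.array([-1.0, 0.0, 1.0, 2.0, 3.0])))  # a 60 dB masker smeared over nearby barks
# (the tonal/noise offsets from the NMT/TMN slide are then subtracted to get the threshold)
```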

EE493Q: Digital Speech Processing


Step 4: Calculation of
Individual Masking Thresholds
 Calculate a masking threshold for each subband in the
polyphase filter bank
 Model 1:
 Selects the minimum of the masking threshold values within the range of each subband
 Inaccurate at higher frequencies – recall that the subbands are linearly (uniformly) distributed, while critical bands are NOT!
 Model 2:
 If the subband is wider than the critical band:
 Use the minimum masking threshold within the subband
 If the critical band is wider than the subband:
 Use the average masking threshold within the subband
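
A small sketch of the Model 2 rule above, mapping per-line thresholds onto the 32 uniform subbands; the band widths used here are illustrative, not the standard's tables.

```python
import numpy as np

def subband_thresholds(line_thresh_db, n_subbands, cb_width_bins):
    # cb_width_bins[sb]: approximate width (in spectral lines) of the critical band
    # that covers subband sb
    lines_per_sb = len(line_thresh_db) // n_subbands
    out = []
    for sb in range(n_subbands):
        lines = line_thresh_db[sb * lines_per_sb:(sb + 1) * lines_per_sb]
        if lines_per_sb > cb_width_bins[sb]:   # subband wider than the critical band: be safe
            out.append(np.min(lines))
        else:                                  # critical band wider than the subband
            out.append(np.mean(lines))
    return np.array(out)

thr = np.linspace(10, 40, 512)                 # dummy per-line masking thresholds (dB)
cbw = np.r_[np.full(8, 4), np.full(24, 64)]    # narrow critical bands at low f, wide at high f
print(subband_thresholds(thr, 32, cbw))
```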

EE493Q: Digital Speech Processing


Graphic Illustration

Tonal components Noise components

EE493Q: Digital Speech Processing


Step 5: Calculating Global
Masking Thresholds
 The hard work is done – now we just calculate the signal-to-mask ratio (SMR) per subband
 SMR = signal energy / masking threshold (i.e., in dB, SMR = signal level − masking threshold)
 The calculated SMR is used by the audio codec to determine how many bits to spend on each subband
 This is where most of the compression occurs – if a coefficient is below the masking threshold, it needs no bits at all!
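
A toy end-to-end illustration of this step: compute the per-subband SMR and let it drive a greedy bit allocation. The 6 dB-per-bit rule of thumb, the bit budget, and the subband values below are illustrative assumptions, not the MPEG-1 allocation tables.

```python
import numpy as np

signal_db = np.array([70.0, 62.0, 55.0, 40.0, 30.0])   # per-subband signal energy (dB)
mask_db   = np.array([45.0, 50.0, 58.0, 35.0, 42.0])   # per-subband masking threshold (dB)

smr = signal_db - mask_db            # SMR in dB (energy ratio expressed as a difference in dB)
bits = np.zeros(len(smr), dtype=int)
mnr = bits * 6.0 - smr               # mask-to-noise ratio: each extra bit buys ~6 dB of SNR
budget = 12

# keep giving one bit to the subband whose quantization noise is currently most audible,
# until the budget runs out or all noise has dropped below the mask
while budget > 0 and np.min(mnr) < 0:
    sb = int(np.argmin(mnr))
    bits[sb] += 1
    mnr[sb] += 6.0
    budget -= 1

print(smr)     # [ 25.  12.  -3.   5. -12.]
print(bits)    # [5 2 0 1 0]: subbands already below the mask (SMR <= 0) get no bits
```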
EE493Q: Digital Speech Processing
Graphic Illustration

EE493Q: Digital Speech Processing


Psychoacoustic Model
Summary
input audio frame

Spectral Analysis and SPL Normalization

Identification of Tonal and Noise Maskers

Decimation and Reorganization of Maskers

Calculation of Individual Masking Thresholds

Calculating Global Masking Thresholds

Signal-to-Masking Ratios (SMR)


EE493Q: Digital Speech Processing
Example: Calculating Signal
Energy

EE493Q: Digital Speech Processing


Calculating Masking
Thresholds

EE493Q: Digital Speech Processing


SMR Results

EE493Q: Digital Speech Processing


How Is Perceptually Lossless
Compression Achieved?

(figure: spectral coefficients A, B, C and D plotted against the global masking threshold)

Coefficient A requires bits; coefficient B does not, because it is masked


Question: how about coefficients C and D?

EE493Q: Digital Speech Processing


Summary of Perceptual Audio
Coding
 Psychoacoustics
 Frequency dependency: Human ears are most sensitive to 2-4 kHz
 Masking: A tone could be inaudible because of the
presence of another one (close in frequency or time)
 Asymmetry: Noise-masking-tone is easier than tone-
masking-noise
 MP3
 Time-to-frequency transformation by filter bank or
modified Discrete Cosine Transform
 Psychoacoustic Model I or II produces Signal-to-Masking
Ratio (SMR) that guides the bit allocation process for
each subband
 Perceptually lossless at bit rates of 64-128 kbps
EE493Q: Digital Speech Processing
Headphone Technology

http://www.technologyreview.com/read_article.aspx?id=17642&ch=infotech

EE493Q: Digital Speech Processing


Audio Coding Techniques (II)
 MP3 Audio Compression
 Filter bank/Modified DCT
 Psychoacoustic Models
 Bit Allocation
 Beyond technical issues
 Legal, practical, and ethical issues
 Open discussions

EE493Q: Digital Speech Processing


Legal Issues Surrounding
MP3
 It's a civil offense, punishable by fine, if
you distribute music that you don't own
the rights to.
 It's a criminal offense to copy music
illegally and then redistribute it for
financial gain.
 There is a great deal of uncertainty about
how copyright laws should function in the
digital world, but the laws themselves are
clear

EE493Q: Digital Speech Processing


The Story of Home-Taping
Nightmare
 In the 1970s, tapes became easy to duplicate at home – yet nobody was caught as a copyright violator, right?
 The economics of the entire system actually collapsed, and it was only revived by the forced introduction of an entirely new audio format, the compact disc (CD).
 A tax on blank tapes and taping equipment was created under the Audio Home Recording Act of 1992 to offset lost revenues

EE493Q: Digital Speech Processing


Now Comes the MP3
Nightmare
 The internet makes downloading and sharing MP3s easy
 Those mammoth companies are still going to sue every college student they find with MP3s on their site.
 On 9 October 1998, the RIAA filed for a temporary restraining order to prevent San Jose-based Diamond Multimedia from selling its new MP3 player. Called the "Rio," the player retailed for $199 and was essentially a Walkman for MP3 files.

EE493Q: Digital Speech Processing


What is Legal?
 Most MP3 files on the internet are illegal, except for
 Recorded works to which you personally own the copyrights.
 Recorded works in the public domain.
 As long as you keep your MP3s in the privacy of your own hard drive and not on the Web, you are very hard to catch and relatively harmless.

EE493Q: Digital Speech Processing


MP3: The Transformation of
Recording Industry
 Why didn't the ISO/IEC address the copyright issue when developing MPEG-1?
 Its members weren't necessarily thinking about the legal ramifications but instead focused on creating an effective technology.
 Unsuccessful fight-back strategy by the RIAA: search and destroy
 Only through systematic, consistent, massive legal action can the record companies possibly hope to win this war.

EE493Q: Digital Speech Processing


Watermarking: a Technical
Savior?
 The RIAA's Secure Digital Music Initiative (SDMI) came to nothing around 2000
 Your midterm project might have shown this
 None of the existing watermarking techniques is robust enough to win this war against piracy
 Ultimately, it's all about a workable revenue model. Once that's been established, perhaps the quality and convenience of the MP3 format can be seen as a boon to the industry instead of a threat.

EE493Q: Digital Speech Processing


Ethic Issues
 Copying music for your own private use is
cool, but posting that music to a Web site
or distributing it in any way is not. By
doing this you are robbing people who
worked very hard to create the music you
like.
 Think about it: is there any difference between downloading an album from the internet for free and stealing an album from the local store?
EE493Q: Digital Speech Processing
Open Discussions
 Who should get paid?
 Is there a better business model?
 Is there any better technical solution than
watermarking to fight against piracy?
 Which side will you take? A defender of
RIAA or a hacker?

EE493Q: Digital Speech Processing
