Audio Compression & Synthesis Technology

Audio Compression & Synthesis Technology Overview
Adam Chang, MSEE Product Marketing Manager
Contents

- Audio Compression Technology Overview
- Audio Synthesis Technology Overview
- Speech Compression Overview
- MXIC Solution to Digital Audio & Speech Applications
  - Speech Product Offering
  - Digital Audio Product Offering
- Summary
Audio Compression Technologies

- A wide range of audio compression technologies is available, but few of them have been commercialized.
- Driven by internet music, MPEG-1/Audio Layer 3 (MP3) has become the most successful audio compression technology.
- Beyond internet music, audio compression technologies are applied to:
  - Portable solid-state audio recorders
  - Internet radio
  - DAB (Digital Audio Broadcast) systems
  - Audio accessories for portable devices (cell phone, PDA, ...)
MPEG-1/Audio Layer 3 Coding

- MPEG-1/Audio Layer 3 is now well known as MP3
- Low bit-rate application: 64 Kbps for a mono channel
- Sampling frequency: 32, 44.1, or 48 kHz
- Lossy compression algorithm: 12-to-1 compression ratio
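As a rough check of the 12-to-1 figure, the sketch below compares the bit rate of uncompressed PCM with the 64 Kbps mono rate quoted above. The 16-bit sample width is an assumption (the slide does not state it), and both function names are hypothetical helpers.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(sample_rate_hz, bits_per_sample, channels, coded_bit_rate):
    """Ratio of the uncompressed PCM bit rate to the coded bit rate."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) / coded_bit_rate


# Assumed: 16-bit mono PCM input, coded at 64 Kbps (64,000 bits per second).
for rate in (32000, 44100, 48000):
    print(rate, compression_ratio(rate, 16, 1, 64000))
```

Under these assumptions the ratio is exactly 12 at 48 kHz and roughly 11 at 44.1 kHz, which is in line with the quoted figure.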
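Audio Encoding System Overview

[Block diagram: Digital Audio Input -> Filter Bank -> Bit or Noise Allocation -> Bitstream Formatting -> Encoded Bitstream. The Psychoacoustic Model also receives the digital audio input and supplies the SMR (Signal-to-Mask Ratio) to the Bit or Noise Allocation block.]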
Hybrid Filter Bank

- A polyphase filter bank divides the audio signal into 32 equal-width frequency subbands.
- The filter outputs are then processed with an MDCT (Modified Discrete Cosine Transform).
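The second stage can be sketched with the textbook MDCT formula, which maps a block of 2N samples to N coefficients. This is a minimal illustration only: windowing and the 50% block overlap are omitted, and the block length of 36 samples per subband (giving 18 coefficients, so 32 x 18 = 576 in total) is the MP3 long-block case rather than something stated on the slide.

```python
import math


def mdct(block):
    """Direct-form MDCT: 2N input samples -> N coefficients (no window)."""
    if len(block) % 2 != 0:
        raise ValueError("block length must be even")
    n = len(block) // 2  # number of output coefficients
    return [
        sum(
            block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n)
        )
        for k in range(n)
    ]


# One subband, one long block: 36 samples in, 18 coefficients out.
subband_block = [math.sin(0.3 * t) for t in range(36)]
coefficients = mdct(subband_block)
print(len(coefficients), 32 * len(coefficients))
```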
Psychoacoustic Model

- The incoming signal is transformed from the time domain to the frequency domain for analysis.
- The psychoacoustic model calculates the SMR (Signal-to-Mask Ratio) of each band using properties of auditory perception such as simultaneous masking, temporal masking, and the absolute threshold of hearing.
- The SMR of each band directly affects the compression rate and the audio quality.
- Different psychoacoustic models are chosen depending on the trade-off between audio quality and compression rate.
Noise/Bit Allocation

- Based on the SMR from the psychoacoustic model and the bit rate restriction, the 576 frequency coefficients are grouped into scale factor bands.
- Each scale factor band performs noise (or bit) allocation by repeatedly adjusting its scale factor and the global gain until distortion is minimized.
- Non-uniform quantization and Huffman coding are then applied.
Audio Compression Technologies Comparison

| Technology | Bit rate (Kbit/sec) | Advantages | Drawbacks |
|---|---|---|---|
| MP3 | 128 | (1) Internet music standard (2) Easy to implement in silicon (LSI) | (1) Bit rate is too high |
| MP3PRO | 64 | (1) High compression ratio (2) Extension of MP3 | (1) IP owned by Thomson Multimedia (2) No encoder IC available |
| AAC | 96 | (1) Excellent audio quality (2) High compression ratio | (1) Not an internet music standard (2) No encoder IC available |
| WMA | 96 | (1) Excellent audio quality (2) High compression ratio | (1) IP owned by Microsoft (2) No encoder IC available |
| ATRAC3 | 72 | (1) Excellent audio quality (2) High compression ratio | (1) IP owned by Sony (2) No encoder IC available |
Audio Compression Technologies Brief Summary

- MP3 is the most mature technology, and its encoder is easy to implement in silicon (LSI).
- Among newly developed audio compression technologies, MP3PRO is the most promising, because:
  - It is backward compatible with MP3.
  - It needs the lowest bit rate for audio quality comparable to MP3.
  - Its encoder is easier to implement in silicon (LSI).
  - Thomson Multimedia is aggressively promoting it as the new internet music standard.
Audio Synthesis Technologies

- Audio synthesis is a method of producing sound without using an acoustic sound source.
- Among audio synthesis technologies, FM (Frequency Modulation) and wavetable synthesis are now the mainstream.
- Audio synthesis technologies are now widely applied, for example in:
  - Music keyboards
  - Cell phone sound generators
  - Toys
  - Melody accessories
Wavetable Synthesis Technology

- u-Law Compression
- Sound Model
- Loop
- Envelope Control
- Pitch shift
- Interpolation
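u-Law Compression

- Converts linear 16-bit samples into 8-bit codes:

  $$\hat{s} = \operatorname{sign}(s)\,\frac{\log(1 + 255\,|s|)}{\log(1 + 255)}$$

- Assume all samples are fractional values between -1 and 1. For $s \ge 0$, compression and its inverse (expansion) are:

  $$\hat{s} = \frac{\log(1 + 255\,s)}{\log 256}, \qquad s = \frac{256^{\hat{s}} - 1}{255}$$

[Figure: mapping between 16-bit linear samples and 8-bit u-Law codes]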
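The compression and expansion formulas can be sketched as follows. This is a minimal illustration of the continuous u-Law curve with mu = 255; the scaling to a signed 8-bit code in -127..127 is an assumption made for the example and is not the bit layout of any particular codec standard. All function names are hypothetical.

```python
import math

MU = 255  # u-Law parameter used in the formulas above


def mulaw_compress(s):
    """Map a fractional sample in [-1, 1] to a compressed value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(s)) / math.log1p(MU), s)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_16_to_8(sample):
    """16-bit linear sample -> signed 8-bit code (assumed scaling by 127)."""
    s = max(-1.0, min(1.0, sample / 32768.0))
    return int(round(mulaw_compress(s) * 127))


def decode_8_to_16(code):
    """Signed 8-bit code -> approximate 16-bit linear sample."""
    return int(round(mulaw_expand(code / 127.0) * 32767))


for sample in (0, 100, 1000, 10000, 32767):
    code = encode_16_to_8(sample)
    print(sample, code, decode_8_to_16(code))
```

Small samples keep fine resolution while large samples are quantized more coarsely, which is the purpose of the logarithmic curve.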
A Typical Waveform of Sound

[Figure: a typical waveform of sound]
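Sound Model

- ADSR model: A (Attack), D (Decay), S (Sustain), R (Release)
- For non-percussive instruments (e.g. violin):

[Figure: ADSR envelope for a non-percussive instrument. Vertical axis: amplitude attenuation, with 0 dB at the top; horizontal axis: time. Segments A, D, S, and R are marked, together with the "note on" and "note off" events.]

Sound Model

- ADSR model for percussive instruments (e.g. piano, drum):

[Figure: ADSR envelope for a percussive instrument. Vertical axis: amplitude attenuation, with 0 dB at the top; horizontal axis: time. Segments A, D, S, and R are marked, together with the "note on" and "note off" events.]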
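The ADSR model can be sketched as a piecewise-linear envelope. This is an illustrative assumption: the slides do not give the segment shapes, and real synthesizers often use exponential segments. The function names, the linear amplitude scale (1.0 standing for 0 dB), and the parameter values are hypothetical.

```python
def held_level(t, attack, decay, sustain):
    """Envelope level while the note is held (linear scale, 1.0 = 0 dB)."""
    if t < attack:
        return t / attack  # A: rise from 0 to full level
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay  # D: fall to sustain
    return sustain  # S: hold until note off


def adsr_level(t, attack, decay, sustain, release, note_off):
    """Envelope level at time t (seconds) for a note released at note_off."""
    if t < 0.0:
        return 0.0
    if t < note_off:
        return held_level(t, attack, decay, sustain)
    start = held_level(note_off, attack, decay, sustain)
    remaining = 1.0 - (t - note_off) / release  # R: fade to silence
    return max(0.0, start * remaining)


# Assumed example: a sustained, violin-like envelope.
for t in (0.05, 0.1, 0.5, 1.25, 2.0):
    print(t, adsr_level(t, attack=0.1, decay=0.2, sustain=0.6, release=0.5, note_off=1.0))
```

A percussive sound could be approximated in the same sketch with a very short attack and a low or zero sustain level.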
Loop
Envelope Control
Pitch shift

- Use one sample, or a limited set of sampled notes, to generate every note to be played.
- Access the stored sample memory at different rates during playback.

[Figure: a memory pointer stepping through sample memory. Reading at rate fs gives some particular pitch; reading at rate 2fs shifts the pitch up by one octave.]
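Reading sample memory at a different rate can be sketched with a fractional read pointer. The linear interpolation between neighboring samples and the wrap-around (looping) at the end of the table are assumptions chosen for the example; `pitch_shift` and its parameters are hypothetical names.

```python
def pitch_shift(table, ratio, n_out):
    """Read `table` with a pointer that advances `ratio` samples per output.

    ratio = 2.0 raises the pitch by one octave, ratio = 0.5 lowers it by one.
    Fractional positions are linearly interpolated; the table is looped.
    """
    size = len(table)
    out = []
    pos = 0.0
    for _ in range(n_out):
        i = int(pos)
        frac = pos - i
        a = table[i % size]
        b = table[(i + 1) % size]
        out.append(a + frac * (b - a))
        pos += ratio
        while pos >= size:  # wrap the pointer to loop the stored sample
            pos -= size
    return out


ramp = [0, 1, 2, 3, 4, 5, 6, 7]
print(pitch_shift(ramp, 2.0, 4))  # every second stored sample
print(pitch_shift(ramp, 0.5, 4))  # interpolated in-between values
```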
Interpolation
Wavetable System Implementation

[Block diagram with the blocks MIDI IN, Micro Processor, Program ROM, RAM, Wavetable Synthesizer, Wavetable ROM, and DAC, ending in Audio Out (L) and Audio Out (R).]
FM (Frequency Modulation)

- FM is the process of varying the frequency of a signal, often periodically.

FM (Frequency Modulation)

- The fundamental principle of an FM sound generator is to synthesize tones by combining a modulation signal with a carrier signal.

[Figure: a modulator and a sine-wave carrier, each set by parameters, feed the FM modulation block to produce the output sound. Example output waves are labeled "pyramidal", "saw toothed", and "oblong".]

FM (Frequency Modulation)

- A device that produces a carrier or a modulator is called an operator.
- At least two operators are required to generate the sound of a musical instrument.
- For percussion instruments, at least 4 operators are required for decent instrumental sound quality.
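A two-operator FM voice (one modulator, one carrier) can be sketched as below. It uses the common formulation in which the modulator output is added to the phase of a sine carrier; the frequencies, the modulation index, and the function names are assumptions for illustration, not values from the slides.

```python
import math


def fm_sample(t, carrier_hz, modulator_hz, index):
    """One output sample of a two-operator FM voice at time t (seconds)."""
    modulator = math.sin(2.0 * math.pi * modulator_hz * t)  # operator 1
    return math.sin(2.0 * math.pi * carrier_hz * t + index * modulator)  # operator 2


def fm_tone(carrier_hz, modulator_hz, index, sample_rate, n_samples):
    """Generate n_samples of the FM tone at the given sample rate."""
    return [
        fm_sample(i / sample_rate, carrier_hz, modulator_hz, index)
        for i in range(n_samples)
    ]


# Assumed example parameters: 440 Hz carrier, 220 Hz modulator, index 2.0.
tone = fm_tone(440.0, 220.0, 2.0, 8000, 16)
print([round(x, 3) for x in tone[:4]])
```

With an index of 0 the output is a plain sine wave at the carrier frequency; raising the index adds sidebands and makes the timbre brighter.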
Audio Synthesis Comparison

- Theoretically, FM and wavetable synthesis can achieve the same audio quality.

| Technology | Advantages | Drawbacks |
|---|---|---|
| Wavetable Synthesis | (1) Easy to implement (2) Consistent quality | (1) Cost |
| Frequency Modulation | (1) Cost | (1) Not easy to implement (2) Inconsistent quality |
Speech Compression Technologies

- The last decade has seen rapid progress in speech technologies.
- Current speech coders tend to be source-specific and hearing-specific in order to reach low bit rates.
- Speech compression technologies are now widely applied, for example in:
  - Digital telecommunication devices (cell phone, ISDN, DECT, SST, DAM, ...)
  - Digital voice recording accessories for cell phones, PDAs, DSCs, ...
  - Electronic language learning solutions
  - Toys
Quality Measures

- Unlike audio compression, speech compression has a widely used quality measure: MOS (Mean Opinion Score).

| MOS | Impairment scale |
|---|---|
| 5 | Imperceptible |
| 4 | Perceptible, but not annoying |
| 3 | Slightly annoying |
| 2 | Annoying |
| 1 | Very annoying |
Major Speech Coders

| Type of coder | Bit rate (Kb/sec) | MOS |
|---|---|---|
| PCM | 64 | 4.3 |
| ADPCM | 32 | 4.1 |
| GSM | 13 | 3.8 |
| CELP | 4.8 | 3.3 |
| LPC | 2.4 | 2.6 |
Waveform Coding

- PCM (Pulse Code Modulation)

[Figure: an analog input sampled and quantized to 4-bit levels, shown from 0000 to 0111. Quantized output sequence: 0001, 0100, 0110, 0110, 0100, 0011, 0101, 0110, 0111, 0111, 0111, 0101, 0010, 0000.]
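The quantization step of PCM can be sketched as a uniform quantizer that maps each sample to a fixed-width binary code. The normalization of the input to the range 0..1 and the 4-bit width (matching the code words in the figure) are assumptions of this example; `pcm_quantize` is a hypothetical name.

```python
def pcm_quantize(x, bits=4):
    """Quantize a sample normalized to [0, 1] into a `bits`-wide binary code."""
    levels = 1 << bits
    x = min(max(x, 0.0), 1.0)  # clip out-of-range input
    code = min(int(x * levels), levels - 1)  # uniform step size of 1 / levels
    return format(code, "0{}b".format(bits))


# Assumed example input: a few samples of a slowly varying signal.
samples = [0.0, 0.07, 0.26, 0.40, 0.49, 1.0]
print([pcm_quantize(s) for s in samples])
```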
Waveform Coding

- ADPCM (Adaptive Differential Pulse Code Modulation)
  - Analysis of speech waveforms shows a high sample-to-sample correlation.
  - ADPCM was developed to reduce the bit rate further while preserving overall speech quality.

[Block diagram, encoder and decoder: the encoder subtracts X(n-1), the estimate of the last input sample, from the linear input signal X(n) to form the difference d(n), which is encoded as the ADPCM output sample L(n). A step size calculation derives the adjusted step size ss(n+1) from the current step size ss(n) through a one-sample delay (Z^-1). The decoder rebuilds X(n) from L(n) with a matching delay loop.]
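The encoder and decoder loops can be sketched with a deliberately simplified ADPCM. This is not a standard ADPCM codec: the 4-bit code range (-8..7), the initial step size of 16, and the step adaptation rule are all assumptions chosen to keep the example short. The variable names follow the diagram labels (d(n), L(n), ss(n)).

```python
def _next_step(step, code):
    """ss(n+1): grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 4:
        return min(2048, step * 2)
    return max(1, step * 3 // 4)


def _clamp16(value):
    return max(-32768, min(32767, value))


def adpcm_encode(samples):
    """Encode 16-bit linear samples X(n) into 4-bit codes L(n)."""
    estimate, step, codes = 0, 16, []
    for x in samples:
        d = x - estimate  # d(n): difference from the previous estimate
        code = max(-8, min(7, int(round(d / step))))  # L(n)
        codes.append(code)
        estimate = _clamp16(estimate + code * step)  # same update as the decoder
        step = _next_step(step, code)
    return codes


def adpcm_decode(codes):
    """Rebuild an approximation of X(n) from the codes L(n)."""
    estimate, step, out = 0, 16, []
    for code in codes:
        estimate = _clamp16(estimate + code * step)
        out.append(estimate)
        step = _next_step(step, code)
    return out


original = [0, 10, 20, 30, 40, 50]
decoded = adpcm_decode(adpcm_encode(original))
print(decoded)
```

Because the encoder tracks the same estimate and step size that the decoder will compute, only the small codes need to be transmitted.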
Source Coding

- Speech is produced when air is forced from the lungs through the vocal cords and along the vocal tract.
- Voiced sounds are produced when the vocal cords vibrate open and closed, producing quasi-periodic pulses.
- Unvoiced sounds result when the excitation is noise-like turbulence.

[Figure: A, periodic signal; B, variable signal; C, output sound]
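Source Coding

- LPC (Linear Predictive Coder)

[Block diagram: a pulse generator and a white noise generator, selected by a voiced/unvoiced control, provide the excitation. The excitation is scaled by an amplitude multiplier and passes through stages with parameters P1, P2, P3, ..., Pn to produce the speech signal. Other labels in the figure: amplitude, formant frequency, bandwidth.]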
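The LPC decoder structure can be sketched as an excitation source feeding an all-pole filter. This is a minimal sketch under stated assumptions: the predictor coefficients, gain, and pitch period are made-up inputs (a real coder estimates them from the input speech), and `lpc_synthesize` is a hypothetical name.

```python
import random


def lpc_synthesize(coeffs, gain, n_samples, voiced, pitch_period=80, seed=0):
    """Synthesize speech-like samples from LPC parameters.

    coeffs: predictor coefficients P1..Pn applied to the previous outputs.
    voiced: True selects the pulse generator, False the white noise generator.
    """
    rng = random.Random(seed)
    history = [0.0] * len(coeffs)  # previous outputs, most recent first
    out = []
    for i in range(n_samples):
        if voiced:
            excitation = 1.0 if i % pitch_period == 0 else 0.0
        else:
            excitation = rng.uniform(-1.0, 1.0)
        s = gain * excitation + sum(p * h for p, h in zip(coeffs, history))
        out.append(s)
        history = ([s] + history)[:len(coeffs)]
    return out


# Assumed toy example: a one-coefficient filter driven by a single pulse.
print(lpc_synthesize([0.5], 1.0, 4, voiced=True, pitch_period=10))
```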
Hybrid Coding

- Hybrid coding is an analysis-by-synthesis approach.
- The encoder analyzes the input speech by synthesizing many different approximations to it. It then transmits information representing the synthesis filter parameters and the excitation to the decoder.

[Block diagram, encoder: Excitation Generation -> u(n) -> Synthesis Filter -> synthesized s(n). The synthesized signal is compared with the input speech s(n) to give the error e(n), which passes through Error Weighting to give ew(n) and then through Error Minimization, which controls the excitation generation.]
[Block diagram, decoder: Excitation Generation -> u(n) -> Synthesis Filter -> s(n), the reproduced speech.]
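The analysis-by-synthesis loop can be sketched as a search over candidate excitations. This is an assumption-laden toy: it uses a small fixed codebook, a plain squared error without the perceptual error weighting shown in the diagram, and made-up filter coefficients. The function names are hypothetical.

```python
def synthesize(excitation, coeffs):
    """Pass the excitation u(n) through an all-pole synthesis filter."""
    history = [0.0] * len(coeffs)  # previous outputs, most recent first
    out = []
    for u in excitation:
        s = u + sum(a * h for a, h in zip(coeffs, history))
        out.append(s)
        history = ([s] + history)[:len(coeffs)]
    return out


def search_codebook(target, codebook, coeffs):
    """Return (index, error) of the excitation whose synthesis best matches target."""
    best_index, best_error = -1, float("inf")
    for index, excitation in enumerate(codebook):
        candidate = synthesize(excitation, coeffs)
        error = sum((t - c) ** 2 for t, c in zip(target, candidate))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Assumed toy data: three candidate excitations and a one-coefficient filter.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, -1, 0]]
coeffs = [0.5]
target = synthesize(codebook[2], coeffs)
print(search_codebook(target, codebook, coeffs))
```

Only the winning index and the filter parameters would need to be sent; the decoder repeats the same `synthesize` step to reproduce the speech.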
Speech Compression Technologies Brief Summary

- Waveform coding (like ADPCM) is typically used at high bit rates and gives very good speech quality.
- Source coding (like LPC) operates at very low bit rates but tends to produce speech that sounds synthetic.
- Hybrid coding (like CELP) uses techniques from both source and waveform coding and gives good speech quality at intermediate bit rates.

[Figure: speech quality (MOS, 1 to 5) versus bit rate (1, 2, 4, 8, 16, 32, 64 Kbps) for hybrid coding, waveform coding, and source coding]
Speech Product Offering - I: ELL (Electronic Language Learning)

ELL System Block Diagram

[Block diagram: an MCU core (6502, Z80, 8051) with I/O & peripherals (keyboard, battery, ...), SRAM (data buffer, PIM), LCD module (display), D/A (voice output), Flash ROM (program, data), DSP (LRC, synthesizer), A/D (voice input, pen input), and USB / memory card / PC connections. In the original slide, red blocks mark the components or technologies that MXIC can provide.]
MXIC ELL Product Features

THV - True Human Voice

- What is True Human Voice?
  [Flow: record human voice -> compression on PC -> code stored in ROM -> DSP decodes and plays back]
- What can MXIC provide for a THV solution?
  - MXIC has a 1.2K/2.0Kbps LRC (Low-Rate Coder) with excellent speech quality.
  - Over 50,000 THV words can be stored in a 64Mb ROM using the 1.2Kbps LRC.
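The 50,000-word figure can be checked with simple arithmetic. The sketch below assumes that 64Mb means 64 x 2^20 bits, that the whole ROM holds speech data, and that one spoken word lasts about one second; none of these assumptions is stated on the slide, and the function name is hypothetical.

```python
def seconds_of_speech(rom_bits, bit_rate_bps):
    """Seconds of coded speech that fit into rom_bits at the given bit rate."""
    return rom_bits / bit_rate_bps


ROM_BITS = 64 * 2**20  # assumed meaning of "64Mb"
LRC_BPS = 1200         # 1.2 Kbps Low-Rate Coder

total_seconds = seconds_of_speech(ROM_BITS, LRC_BPS)
print(int(total_seconds))             # seconds of speech in the ROM
print(int(total_seconds / 1.0))       # words, at an assumed 1.0 s per word
```

Under these assumptions the ROM holds roughly 55,900 seconds of speech, which is consistent with "over 50,000" words of about one second each.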
MXIC ELL Product Features

THV - True Human Voice

- Why is a Sequential ROM interface so important in ED applications?
  - ED needs ever larger Mask ROM density, because:
    - Content keeps growing
    - True Human Voice is being added
  - The MCU needs only 20 pins to address up to 4Gb of Sequential ROM. The lower pin count means a smaller die size.
  - Sequential ROM is the most cost-effective option.

[Figure: MXIC Sequential ROM compared with conventional ROM]
Worldwide ED Market Size

| Region | Quantity (K sets) | Share |
|---|---|---|
| China | 4,000 | 44% |
| Taiwan | 600 | 7% |
| HK | 300 | 3% |
| Korea | 200 | 2% |
| Japan | 4,000 | 44% |
| Total | 9,100 | |

Source: MXIC, 2001
Worldwide ED Market Size

| Region | 1999 | 2000 | 2001 | 2002 | 2003 | CAGR |
|---|---|---|---|---|---|---|
| China | 2,600 | 3,600 | 4,000 | 4,600 | 5,400 | 20.05% |
| Taiwan | 500 | 550 | 600 | 630 | 660 | 7.19% |
| HK | 250 | 280 | 300 | 320 | 350 | 8.78% |
| Korea | 180 | 200 | 200 | 220 | 240 | 7.46% |
| Japan | 3,000 | 3,500 | 4,000 | 4,500 | 5,000 | 13.62% |
| Total | 6,530 | 8,130 | 9,100 | 10,270 | 11,650 | 15.57% |

Source: MXIC, 2001
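The CAGR column appears to follow the usual compound annual growth rate formula over the four yearly steps from 1999 to 2003. The sketch below reproduces two of the table values under that assumption; `cagr` is a hypothetical helper name.

```python
def cagr(first_value, last_value, periods):
    """Compound annual growth rate over the given number of yearly periods."""
    return (last_value / first_value) ** (1.0 / periods) - 1.0


# 1999 -> 2003 spans four yearly periods.
print(round(cagr(2600, 5400, 4) * 100, 2))   # China
print(round(cagr(6530, 11650, 4) * 100, 2))  # Total
```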
ELL Product Road Map

[Road map figure covering Q1/2001 to Q4/2002, with the banner "MCU & Speech Processor for ED & PDA". Items shown:
- All-in-one ED Processor (MCU + DSP + S-ROM I/F)
- MX93L551 DVR Processor with LRC
- MX93L552 DVR Processor with VR
- ARM7TDMI embedded ED Controller
- 6502 embedded ED Controller
- Z80 embedded 3-in-1 ED Controller
- LRC decoder]

Notes:
- Rectangles mark existing products; circles mark products under development.
- The left edge of a circle is the project start date, and the right edge is the commercial sample date.
- DVR stands for Digital Voice Recorder; VR stands for Voice Recognition.
MXIC ELL Solution Advantage

- We can provide a THV (True Human Voice) solution.
- We can provide an MCU ASSP with an effective Sequential ROM interface for program and data storage in ED with the THV (True Human Voice) feature.
- We can provide a Sequential ROM family (64Mb ~ 256Mb) for ED and E-Book.
Speech Product Offering - II: DVR (Digital Voice Recorder)

[Block diagram: Digital Voice Recorder (DSP Engine Chip) with Micro controller, LCD Display, Keypad, Speaker, MIC, and Flash]
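DVR (Digital Voice Recorder)

- Message management:
  - Playback, Fast Forward, Rewind
  - Forward/backward search within a specific message
  - Repeat

[Figure: message timeline from 00:00 to 05:30 illustrating rewind (RW), fast forward (FF), backward search (BS), forward search (FS), and repeat. Other labels: 02:15, 05:10, 200ms.]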
DVR (Digital Voice Recorder)

- PSA (Playback Speed Adjustment) ranges from 50% to 200%.

[Figure: playback at 50% (slow playback), 100% (normal playback), and 200% (fast playback)]
MXIC DVR Solution Advantage

- We can provide switchable speech compression rates (4.8K/12.8K/32Kbps) for different speech recording systems.
- We can provide flexible speech manipulation, such as:
  - Folder management
  - Playback, pause, FF, RW, repeat, forward/backward search, append, PSA (Playback Speed Adjustment)
- We can provide a Total System Solution (MCU, DSP, Flash).
Speech Product Offering - III: DAM (Digital Answering Machine)

[Block diagram: All-in-one DAM Controller connected to the Telephone Line, Microphone, Speaker, MCU (system control code), AFlash (voice prompt), Keypad, and Display]
DAM (Digital Answering Machine)

- The key success factor is an excellent speech CODEC:
  - Switchable compression rate: 4.8K/12.8K/32Kbps
  - MRC (Multi-Rate Coder): 3.6Kbps ~ 14.2Kbps
- Full-duplex speakerphone is a highlighted feature in this application.
- Telecom signal processing (tone generation/detection) is also included.

[Block diagram: DAM Engine Chip with PCM Codec-1 on the acoustic coupling side (speaker, microphone, SPK driver, MIC gain) and PCM Codec-2 on the line coupling side (line gain, line driver, 4-2 wire coupling)]
Worldwide DAM Size

| Region | Quantity (M sets) | Share |
|---|---|---|
| North America | 22 | 60% |
| Europe | 6 | 16% |
| Japan | 7 | 19% |
| Others | 2 | 5% |
| Total | 37 | |

Source: MXIC, 2001
Worldwide DAM Size

Unit: K sets

| Region | 1999 | 2000 | 2001 | 2002 | 2003 | CAGR |
|---|---|---|---|---|---|---|
| US | 20,500 | 21,500 | 22,000 | 22,500 | 23,000 | 2.92% |
| Japan | 6,800 | 7,000 | 7,000 | 7,100 | 7,200 | 1.44% |
| Europe | 6,200 | 6,000 | 6,000 | 5,800 | 5,800 | -1.65% |
| China | 180 | 200 | 200 | 600 | 1,000 | 53.53% |
| Others | 1,800 | 2,000 | 2,000 | 2,100 | 2,200 | 5.14% |
| Total | 35,480 | 36,700 | 37,200 | 38,100 | 39,200 | 2.52% |

Source: MXIC, 2001
DAM Product Road Map

[Road map figure covering Q1/2002 to Q4/2003, with the banner "DAM Solution". Items shown:
- DAM processor embedded 1Mb MTP
- MX93L132A and MX93132 (labels: 3V MRC DAM w/ CID/SPK; 5V DAM w/ CID/SPK)
- MX93L108 Entry level DAM Processor
- MX93L111A and MX93111 (labels: 5V DAM; 3V MRC DAM)
- DAM embedded 4Mb Flash]

Notes:
- Rectangles mark existing products; circles mark products under development.
- The left edge of a circle is the project start date, and the right edge is the commercial sample date.
- MRC stands for Multi-Rate Coder, CID stands for Caller ID, and SPK stands for Speakerphone.
MXIC DAM Solution Advantages

- MXIC has a different kind of solution for each DAM market segment.
- MXIC is the leader in the mid-range segment and a Top 2 DAM IC vendor in the world.
- MXIC provides a one-stop shopping service (DSP, MCU, AFlash) for DAM applications.

[Figure: DAM market segments from high-end to low-end, with the labels "MRC (Multi-Rate Coder) + 8/16Mb AFlash", "12.8K/32Kbps + 64/128Mb SDRAM", and "MXIC"]
Audio Product Offering - I: AIR™ (Audio IC Recorder)

[Block diagram: audio devices (audio input), Host controller, Audio Encoder/Decoder Processor, 16-bit Audio Codec, Power Amplifier, Speaker, Headphone, Memory (MMC, CF, SD, Memory Stick, ...), Audio ROM, and Flash]
AIR™ (Audio IC Recorder)

- AIR™ is a brand new audio product concept.
- With built-in S/PDIF, audio data can be saved directly into the MP3 player through real-time MP3 encoding. This removes the need for the complicated PC download method.

[Figure: the PC path (CD, compression, download) compared with direct recording from audio devices over S/PDIF]
AIR™ (Audio IC Recorder)

- Mini component systems and portable audio:
  - Upgrade conventional models to fully digital audio (MP3)
  - Align with the younger generation's portable MP3 players

[Figure: MX92L600 Audio IC Recorder linking mini component systems and portable audio with portable MP3 players. Other labels: Cassette, Memory Cards.]
Audio Product Offering - II: Wavetable Sound Generator

- MIDI for Sound Generator:

[Block diagram: MIDI IN -> Sound Generator ASSP (Micro Processor, Wavetable Synthesizer) with Program ROM, Wavetable ROM, and SRAM -> Audio DAC]
Digital Audio Product Road Map

[Road map figure covering Q1/2001 to Q4/2002, with the banner "MP3/AAC Player & Recorder Solution". Items shown:
- Audio ROM derivatives
- MX92L600 MP3 Codec
- Promotional Singles (8MB embedded)
- MX92L500 MP3 decoder]

Notes:
- Rectangles mark existing products; circles mark products under development.
- The left edge of a circle is the project start date, and the right edge is the commercial sample date.
- DVR stands for Digital Voice Recorder; LRC stands for Low-Rate Coder.
MXIC Digital Audio Advantages

- Professional MIDI technology (General MIDI V1.0 sound set, 32-voice polyphony, and 32-part multi-timbre) provides a superior sound generator solution for mobile phone, PDA, ED, and toy applications.
- Complete solution for MP3 players and recorders.
- In-house Sequential ROM, Flash, and memory card support.
Summary

- Among audio compression technologies, MP3 is the most mature, while MP3PRO is regarded as a future star.
- FM and wavetable synthesis are the mainstream audio synthesis technologies, and wavetable synthesis appears superior in practice.
- Different speech technologies suit different applications. Among them, hybrid coding, reinforced by DSP technology, is superior.
- MXIC focuses on audio and speech technologies, and several related products were presented.
Audio Compression & Synthesis Technology Overview
Moving Toward IA, Moving with Us!
