Contents
- Audio Compression Technology Overview
- Audio Synthesis Technology Overview
- Speech Compression Overview
- MXIC Solution to Digital Audio & Speech Applications
- Summary
Audio Compression Technology Overview
A wide range of audio compression technologies is available, but few of them have been truly commercialized.
Owing to internet music, MPEG-1 Audio Layer 3 (so-called MP3) has become the most successful audio compression technology.
Applications include:
- Portable solid-state audio recorders
- Internet radio
- DAB (Digital Audio Broadcast) systems
- Audio accessories for portable devices (cell phone, PDA, ...)
MPEG Audio Layer 3 is now well known as MP3
- Low bit-rate applications: 64 kbps for a mono channel
- Sampling frequency: 32, 44.1, or 48 kHz
- Lossy compression algorithm with a 12-to-1 compression ratio
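As a rough check on the 12-to-1 figure, the ratio can be worked out from the uncompressed PCM bit rate. The sketch below assumes a 16-bit source, which the slides do not state; `pcm_bit_rate` and `compression_ratio` are illustrative names.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(sample_rate_hz, coded_bps, bits_per_sample=16, channels=1):
    """Ratio of the uncompressed PCM bit rate to the coded bit rate."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) / coded_bps


if __name__ == "__main__":
    # 16-bit samples are an assumption; 64 kbps mono comes from the slide.
    for rate in (32_000, 44_100, 48_000):
        print(rate, compression_ratio(rate, 64_000))
```

Under that assumption, 48 kHz mono coded at 64 kbps gives exactly 12:1, while 44.1 kHz gives roughly 11:1.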
[Figure: MP3 encoder block diagram with Filter Bank, Psychoacoustic Model, Bitstream Formatting, and Encoded Bitstream blocks]
A polyphase filter bank divides the audio signal into 32 equal-width frequency subbands. The filter outputs are then processed with an MDCT (Modified Discrete Cosine Transform).
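The MDCT step can be sketched directly from its textbook definition. This is a minimal, unwindowed O(N^2) version for illustration only; a real Layer 3 encoder applies a window first and uses a fast algorithm. The 36-sample block length is an assumption based on the long-block case, where each subband yields 18 coefficients and 32 subbands give 576 coefficients in total.

```python
import math


def mdct(block):
    """Direct MDCT: 2N input samples produce N coefficients (no windowing)."""
    two_n = len(block)
    n = two_n // 2
    return [
        sum(
            block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n)
        )
        for k in range(n)
    ]


if __name__ == "__main__":
    # One subband's long block: 36 samples in, 18 coefficients out.
    subband_block = [math.sin(2 * math.pi * i / 12) for i in range(36)]
    coefficients = mdct(subband_block)
    print(len(coefficients), 32 * len(coefficients))
```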
Psychoacoustic Model
- The incoming signal is transformed from the time domain to the frequency domain for analysis.
- The psychoacoustic model calculates the SMR (Signal-to-Mask Ratio) of each band using auditory-perception effects such as simultaneous masking, temporal masking, and the absolute threshold of hearing.
- The SMR of each band directly affects the compression rate and the audio quality.
- Different psychoacoustic models are chosen according to the trade-off between audio quality and compression rate.
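To make the SMR idea concrete, the sketch below computes a per-band SMR using only the absolute threshold of hearing as the mask. It deliberately ignores simultaneous and temporal masking, so it is far simpler than any model used in a real encoder. The threshold curve is Terhardt's widely quoted approximation, which is an assumption here and not taken from the slides; all function names are illustrative.

```python
import math


def absolute_threshold_db(freq_hz):
    """Approximate threshold in quiet (dB SPL), Terhardt's formula."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def signal_to_mask_ratio_db(signal_db, mask_db):
    """SMR in dB: how far the signal sits above the masking threshold."""
    return signal_db - mask_db


def smr_per_band(band_levels_db, band_centers_hz):
    """SMR per band, using only the threshold in quiet as the mask."""
    return [signal_to_mask_ratio_db(level, absolute_threshold_db(freq))
            for level, freq in zip(band_levels_db, band_centers_hz)]
```

A band with a high SMR needs more bits to keep its quantization noise below the mask, while a band whose SMR is negative can in principle be dropped.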
Noise/Bit Allocation
- Based on the SMR from the psychoacoustic model and the bit-rate restriction, the 576 frequency coefficients are grouped into scale factor bands.
- Each scale factor band performs noise (or bit) allocation by repeatedly adjusting its scale factor and the global gain until distortion is minimized.
- Quantization is non-uniform, and the quantized values are Huffman coded.
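The rate-control half of this loop can be sketched as follows. It is a simplified illustration, not the standard's procedure: the quantizer uses a 3/4 power law to stand in for non-uniform quantization, the bit count is a crude substitute for Huffman coding, and the per-band scale factor (distortion) loop is omitted. The names `quantize`, `estimated_bits`, and `rate_loop` are hypothetical.

```python
def quantize(coeffs, global_gain):
    """Power-law (non-uniform) quantizer: larger values get coarser steps."""
    step = 2.0 ** (global_gain / 4.0)
    return [(1 if c >= 0 else -1) * int(round((abs(c) / step) ** 0.75))
            for c in coeffs]


def estimated_bits(quantized):
    """Crude stand-in for the Huffman coding cost (an assumption)."""
    return sum(abs(q).bit_length() + 1 for q in quantized)


def rate_loop(coeffs, bit_budget, max_gain=255):
    """Raise the global gain until the quantized values fit the bit budget."""
    for gain in range(max_gain + 1):
        quantized = quantize(coeffs, gain)
        if estimated_bits(quantized) <= bit_budget:
            return gain, quantized
    raise ValueError("bit budget cannot be met")
```

A tighter bit budget forces a larger gain, which coarsens the quantization and raises the noise; a full encoder would then adjust scale factors band by band to push that noise under the masking threshold.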
Advantages and drawbacks of the compared technologies (one entry per technology):
1. Advantages: (1) Internet music standard (2) Easy to implement in silicon (LSI). Drawbacks: (1) Bit rate is too high.
2. Advantages: (1) High compression ratio (2) Extension of MP3. Drawbacks: (1) IP owned by Thomson Multimedia (2) No encoder IC available.
3. Advantages: (1) Excellent audio quality (2) High compression ratio. Drawbacks: (1) Not an internet music standard (2) No encoder IC available.
4. Advantages: (1) Excellent audio quality (2) High compression ratio. Drawbacks: (1) IP owned by Microsoft (2) No encoder IC available.
5. Advantages: (1) Excellent audio quality (2) High compression ratio. Drawbacks: (1) IP owned by Sony (2) No encoder IC available.
MP3 is the most mature technology, and its encoder is easy to implement in silicon (LSI).
Among newly developed audio compression technologies, MP3PRO is the most promising, because:
- It is backward compatible with MP3
- It needs the lowest bit rate for the same audio quality as MP3
- Its encoder is easier to implement in silicon (LSI)
- Thomson Multimedia is aggressively promoting it as the new internet music standard
Audio Synthesis Technology Overview
Audio synthesis is a method of producing sound without using an acoustic sound source.
Among audio synthesis technologies, FM (Frequency Modulation) and wavetable synthesis are now the mainstream.
Audio synthesis technologies are now widely used in many applications.
µ-Law Compression
Sound Model
ADSR Model
[Figure: ADSR envelope (Attack, Decay, Sustain, Release). Amplitude, peaking at 0 dB, plotted against time, with note-on and note-off markers.]
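An ADSR envelope like the one in the figure can be sketched as a piecewise-linear function of time. The segment durations and sustain level below are arbitrary illustrative defaults, not values from the slides, and a peak of 1.0 in linear amplitude corresponds to the 0 dB level in the figure.

```python
def _held_level(t, attack, decay, sustain):
    """Envelope level while the note is held (attack, decay, then sustain)."""
    if t < attack:
        return t / attack
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    return sustain


def adsr(t, note_off, attack=0.05, decay=0.10, sustain=0.6, release=0.20):
    """Linear ADSR amplitude (0.0 to 1.0) at t seconds after note on."""
    if t < 0.0:
        return 0.0
    if t < note_off:
        return _held_level(t, attack, decay, sustain)
    # After note off, fade linearly from the level reached at note off.
    start = _held_level(note_off, attack, decay, sustain)
    fade = 1.0 - (t - note_off) / release
    return max(0.0, start * fade)
```

Multiplying each output sample of a stored or synthesized tone by `adsr(t, note_off)` shapes its loudness over the life of the note.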
Loop
Envelope Control
Pitch shift
- Use one sample, or a limited number of samples, of recorded notes to generate every note to be played
- Access the stored sample memory at different rates during playback
Interpolation
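Pitch shifting by reading sample memory at a different rate, combined with interpolation between stored points, can be sketched as below. This assumes linear interpolation and a sample that loops end to end; real wavetable chips may use higher-order interpolation and separate loop points. The equal-tempered semitone ratio is a standard convention, not something stated in the slides.

```python
def semitone_ratio(semitones):
    """Playback-rate ratio for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def resample(sample, ratio, length):
    """Read `sample` at `ratio` times its recorded rate, looping at the end."""
    n = len(sample)
    out = []
    phase = 0.0
    for _ in range(length):
        i = int(phase)
        frac = phase - i
        a = sample[i % n]
        b = sample[(i + 1) % n]
        # Linear interpolation between the two nearest stored samples.
        out.append(a + (b - a) * frac)
        phase = (phase + ratio) % n
    return out
```

For example, `resample(wave, semitone_ratio(12), len(wave) // 2)` plays the stored note one octave higher, in half the number of output samples.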
[Figure: block diagram with MIDI IN, RAM, and DAC blocks]
FM (Frequency Modulation)
The fundamental principle of an FM sound generator is to synthesize tones by combining a modulation signal with a carrier signal.
[Figure: FM sound generation block diagram with Parameter inputs, Modulator, FM Modulation, and Output Sound]
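A two-operator FM voice (one modulator, one carrier) can be sketched with the usual textbook formula, in which the modulator output is added to the carrier phase. The parameter names and the sample rate are illustrative assumptions; the slides do not give a specific formula.

```python
import math


def fm_sample(t, carrier_hz, modulator_hz, index, amplitude=1.0):
    """One FM output sample at time t: the modulator varies the carrier phase."""
    modulation = index * math.sin(2 * math.pi * modulator_hz * t)
    return amplitude * math.sin(2 * math.pi * carrier_hz * t + modulation)


def fm_tone(carrier_hz, modulator_hz, index, num_samples, sample_rate=8000):
    """Generate num_samples of a two-operator FM tone."""
    return [fm_sample(i / sample_rate, carrier_hz, modulator_hz, index)
            for i in range(num_samples)]
```

With `index` set to 0 the output is a plain sine at the carrier frequency; raising it adds sidebands spaced at the modulator frequency, which is what gives FM its range of timbres.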
For percussion instruments, at least four operators are required for decent instrumental sound quality.
Theoretically, FM and Wavetable synthesis can achieve the same audio quality.
[Table: advantages and drawbacks of wavetable synthesis and Frequency Modulation. Recovered entries: (1) Easy to implement (2) Consistent quality; (1) Cost; (1) Cost]
Speech Compression Overview
The last decade has seen rapid progress in speech technologies.
Present speech coders tend to be source-specific and hearing-specific in order to reach low bit rates.
Speech compression technologies are now widely used in many applications, such as:
- Digital telecommunication devices (cell phone, ISDN, DECT, SST, DAM, ...)
- Digital voice recording accessories for cell phones, PDAs, DSCs, ...
- Electronic language-learning solutions
- Toys
Quality Measures
Unlike audio compression technologies, speech coding does have a standardized quality measure, called MOS (Mean Opinion Score).

MOS   Impairment scale
5     Imperceptible
4     Perceptible, but not annoying
3     Slightly annoying
2     Annoying
1     Very annoying
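As a small illustration, a MOS is simply the arithmetic mean of the individual listener ratings on the five-point scale. The function name and the input validation are illustrative choices.

```python
def mean_opinion_score(ratings):
    """Average a list of listener ratings, each an integer from 1 to 5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)
```

For instance, four listeners rating a coder 5, 4, 4, and 3 give a MOS of 4.0.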
Waveform Coding
[Figure: an analog input waveform sampled and quantized into 4-bit codes: 0001, 0100, 0110, 0110, 0100, 0011, 0101, 0110, 0111, 0111, 0111, 0101, 0010, 0000]
Analysis of speech waveforms shows a high sample-to-sample correlation. ADPCM (Adaptive Differential Pulse Code Modulation) was developed to further reduce the bit rate while preserving the overall speech quality. It codes the difference between each sample and a prediction, using a step size that adapts to the signal.
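[Figure: ADPCM encoder and decoder. The difference d(n) between the input X(n) and the delayed (Z^-1) reconstruction is quantized with step size ss(n); a step-size calculation produces the adjusted step size ss(n+1) for the next sample.]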
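The encoder and decoder structure in the figure can be sketched with a deliberately simplified adaptive differential coder. This is not G.726 or IMA ADPCM: the 7-level code, the previous-sample predictor, and the step multipliers (2.0 and 0.75) are assumptions chosen for readability. The point it illustrates is that both sides run the same state update, so the decoder tracks the encoder without any side information.

```python
def _update(prediction, step, code, min_step=1.0, max_step=4096.0):
    """Shared state update: reconstruct the sample, then adapt the step size."""
    prediction += code * step
    if abs(code) >= 3:
        step *= 2.0      # large difference: widen the step
    elif abs(code) <= 1:
        step *= 0.75     # small difference: narrow the step
    return prediction, min(max(step, min_step), max_step)


def encode(samples, step=16.0):
    """Code each sample as a small integer (-3..3) times the current step."""
    prediction, codes = 0.0, []
    for x in samples:
        code = max(-3, min(3, round((x - prediction) / step)))
        codes.append(code)
        prediction, step = _update(prediction, step, code)
    return codes


def decode(codes, step=16.0):
    """Rebuild the waveform by replaying the same state updates."""
    prediction, out = 0.0, []
    for code in codes:
        prediction, step = _update(prediction, step, code)
        out.append(prediction)
    return out
```

Each code fits in 3 bits, compared with the 8 or more bits per sample of plain PCM, which is where the bit-rate saving comes from.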
Source Coding
- Speech is produced when air is forced from the lungs through the vocal cords and along the vocal tract.
- Voiced sounds are produced when the vocal cords vibrate open and closed, producing quasi-periodic pulses.
- Unvoiced sounds result when the excitation is noise-like turbulence.
[Figure: speech production model. A: periodic signal; B: variable signal; C: output sound.]
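The source-filter view of speech can be sketched as an excitation generator feeding an all-pole filter that stands in for the vocal tract. The pulse-train period, the noise generator, and the single filter coefficient used in the example are illustrative assumptions; a real LPC coder estimates around ten coefficients per frame from the input speech.

```python
import random


def excitation(num_samples, voiced, pitch_period=80, seed=0):
    """Pulse train for voiced sounds, white noise for unvoiced sounds."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def all_pole_filter(signal, coeffs):
    """Vocal tract model: y[n] = x[n] + sum over k of coeffs[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(signal):
        feedback = sum(a * out[n - k]
                       for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x + feedback)
    return out


if __name__ == "__main__":
    voiced_speech = all_pole_filter(excitation(160, voiced=True), [0.5])
    print(voiced_speech[:4])
```

Because only the voiced/unvoiced decision, the pitch period, and the filter coefficients need to be transmitted, this kind of coder reaches very low bit rates.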
[Figure: formant model of speech. Labels: Amplitude, Formant frequency, Bandwidth; parameters P1, P2, P3, ..., Pn are combined (X) to produce the speech signal.]
Hybrid Coding
Hybrid coding is an analysis-by-synthesis approach. The encoder analyzes the input speech by synthesizing many different approximations to it, then transmits information representing the synthesis filter parameters and the excitation to the decoder.
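[Figure: analysis-by-synthesis coder. Encoder: Excitation Generation produces u(n), which drives the Synthesis Filter; the synthesized s(n) is compared with the input speech s(n), and the weighted error ew(n) feeds Error Minimization, which controls the excitation. Decoder: Excitation Generation produces u(n), which drives the Synthesis Filter to give the reproduced speech s(n).]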
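The analysis-by-synthesis loop can be sketched as an exhaustive codebook search: each candidate excitation is passed through the synthesis filter, and the one whose output is closest to the target is selected. This sketch minimizes plain squared error and omits the perceptual weighting filter implied by ew(n); the tiny codebook and the filter coefficient in the example are hypothetical.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis filter: y[n] = x[n] + sum of coeffs[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a * out[n - k]
                       for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x + feedback)
    return out


def search_codebook(target, codebook, coeffs):
    """Return (index, gain) of the excitation that best reproduces target."""
    best = None
    for index, candidate in enumerate(codebook):
        synth = synthesize(candidate, coeffs)
        energy = sum(s * s for s in synth)
        # Optimal gain for this candidate in the least-squares sense.
        gain = sum(t * s for t, s in zip(target, synth)) / energy if energy else 0.0
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[0]:
            best = (error, index, gain)
    return best[1], best[2]
```

Only the winning index and gain, together with the filter parameters, are sent to the decoder, which repeats the synthesis step to reproduce the speech.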
Typically, waveform coding (such as ADPCM) is used at high bit rates and gives very good speech quality. Source coding (such as LPC) operates at very low bit rates but tends to produce speech that sounds synthetic. Hybrid coding (such as CELP) uses techniques from both source and waveform coding and gives good speech quality at intermediate bit rates.
[Figure: speech quality (MOS, 1 to 5) versus bit rate (1 to 64 kbps) for hybrid coding, waveform coding, and source coding]
MXIC Solution to Digital Audio & Speech Applications
* Red blocks indicate the components or technologies that MXIC can provide.
Compression in PC
- MXIC has a 1.2 kbps/2.0 kbps LRC (Low-Rate Coder) with excellent speech quality.
- Over 50,000 THV (True Human Voice) words can be stored in a 64 Mb ROM using the 1.2 kbps LRC.
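The storage claim can be checked with simple arithmetic. The sketch assumes that 1 Mb means 2^20 bits and that the whole ROM holds speech data, neither of which the slides state; `storage_seconds` is an illustrative name.

```python
def storage_seconds(rom_megabits, coder_bps, bits_per_megabit=2 ** 20):
    """Seconds of coded speech that fit in a ROM of the given size."""
    return rom_megabits * bits_per_megabit / coder_bps


if __name__ == "__main__":
    total = storage_seconds(64, 1200)   # 64 Mb ROM, 1.2 kbps coder
    print(total, total / 3600, total / 50_000)
```

Under those assumptions, a 64 Mb ROM holds about 55,900 seconds (roughly 15.5 hours) of speech at 1.2 kbps, or a little over 1.1 seconds for each of 50,000 words.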
- The MCU needs just 20 pins to address up to 4 Gb of Sequential ROM.
- This saves pin count, which in turn saves die size.
- Sequential ROM is the most cost-effective option.
[Figure: comparison with conventional ROM]
Share by region: Japan 44%, China 44%, Taiwan 7%, HK 3%, Korea 2%

         1999    2000    2001    2002    2003
China    2,600   3,600   4,000   4,600   5,400
Taiwan     500     550     600     630     660
HK         250     280     300     320     350
Korea      180     200     200     220     240
Japan    3,000   3,500   4,000   4,500   5,000
Total    6,530   8,130   9,100  10,270  11,650
[Figure: product roadmap by quarter (Q1/2001, Q2/2002, Q3/2002, Q4/2002)]
* Rectangles denote existing products; circles denote products under development.
* The left edge of a circle marks the project start date, and the right edge marks the commercial sample date.
* DVR stands for Digital Voice Recorder; VR stands for Voice Recognition.
We can provide a THV (True Human Voice) solution!
We can provide an MCU ASSP with:
- An efficient Sequential ROM interface for program and data storage in ED with the THV feature
We can provide a Sequential ROM family (64 Mb to 256 Mb) for ED and E-Book.
[Figure: block diagram with Keypad, Microcontroller, Speaker, MIC, and Flash blocks]
Message management:
- Playback, Fast Forward, Rewind
- Forward/backward search within a specific message
- Repeat
[Figure: message timeline from 00:00 to 05:30 with RW, FF, BS, FS, and Repeat operations, markers at 02:15 and 05:10, and a 200 ms interval]
[Figure: playback at 50%, 100%, and 200%]
We can provide switchable speech compression rates (4.8/12.8/32 kbps) for different speech recording systems.
We can provide flexible speech manipulation features, such as:
- Folder management
- Playback, pause, FF, RW, repeat, forward/backward search, append, and PSA (Playback Speed Adjustment)
[Figure: block diagram with Speaker and Display blocks]
- A full-duplex speakerphone is the highlight of this application.
- Telecom signal processing (tone generation/detection) is also included.
[Figure: analog front end with SPK Driver, MIC Gain, Line Gain, Line Driver, PCM Codec-1, and PCM Codec-2]
Share by region: Europe 16%, Japan 19%, Others 5%
North America 22, Europe 6, Japan 7, Others 2, Total 37
2003      CAGR
23,000    2.92%
7,200     1.44%
5,800     -1.65%
1,000     53.53%
2,200     5.14%
39,200    2.52%   (total)
[Figure: DAM solution roadmap, Q1/2002 to Q4/2003. Product labels: MX93L132A, MX93132, MX93L108, MX93L111A, MX93111. Descriptions: DAM processor with embedded 1Mb MTP; 3V MRC DAM with CID/SPK; 5V DAM with CID/SPK; entry-level DAM processor; 3V MRC DAM; 5V DAM.]
* Rectangles denote existing products; circles denote products under development.
* The left edge of a circle marks the project start date, and the right edge marks the commercial sample date.
* MRC stands for Multi-Rate Coder, CID stands for Caller ID, and SPK stands for Speakerphone.
- MXIC has solutions for each DAM market segment.
- MXIC is the leader in the mid-range segment and a top-2 DAM IC vendor worldwide.
- MXIC provides a one-stop shopping service (DSP, MCU, AFlash) for DAM applications.
[Figure: DAM market segments from high-end to low-end. Labels: MRC (Multi-Rate Coder) + 8/16Mb AFlash; 12.8K/32Kbps + 64/128Mb SDRAM; MXIC]
[Figure: block diagram with Audio Devices, Host controller, Audio ROM, Flash, Speaker, and Headphone blocks]
With built-in S/PDIF, audio data can be saved directly into the MP3 player through real-time MP3 encoding. Say goodbye to the complicated PC download method!
[Figure: audio paths into the player. Labels: CD, Compression, Download, Audio Devices, S/PDIF]
- Upgrade conventional models to fully digital audio (MP3).
- Align with the younger generation's portable MP3 players!
[Figure: portable audio media. Labels: Portable Audio, Cassette, Memory Cards]
MIDI for Sound Generator:
[Figure: sound generator block diagram. Labels: Sound Generator ASSP, MIDI IN, Micro Processor, Wavetable Synthesizer, SRAM, Program ROM, Wavetable ROM, Audio DAC]
[Figure: product roadmap by quarter, Q1/2001 to Q4/2002]
* Rectangles denote existing products; circles denote products under development.
* The left edge of a circle marks the project start date, and the right edge marks the commercial sample date.
* DVR stands for Digital Voice Recorder; LRC stands for Low-Rate Coder.
- Professional MIDI technology (General MIDI V1.0 sound set, 32-voice polyphony, and 32-part multi-timbre) provides a superior sound generator solution for mobile phone, PDA, ED, and toy applications.
- Complete solution for MP3 players and recorders
- In-house Sequential ROM, Flash, and memory card support
Summary
- Among audio compression technologies, MP3 is the most mature, while MP3PRO is regarded as a future star.
- FM and wavetable synthesis are the mainstream audio synthesis technologies, and wavetable synthesis seems superior in practice.
- Different speech technologies suit different applications. Among them, hybrid coding, reinforced by DSP technology, is superior.
- MXIC focuses on audio and speech technologies, and several products related to audio and speech were presented.