Digital Audio
Overview
The principles that underlie almost all digital audio applications and devices, be they digital signal processing, digital recording, CD, HDBRS, iPod playback, Internet broadcasting or digital broadcasting, rest on the basic concepts that follow. New forms of playback, file formats, compression and data storage change on a seemingly daily basis, but the underlying mechanisms for converting real-world sound into digital values, manipulating those data and finally converting them back into real-world sound have not varied much.
Analog-to-digital (A/D) conversion
Sampling
Aliasing
Nyquist criterion
Sample rates
Quantization
Bit rates
Dither
Encoding: PCM or any other algorithm, such as MP3, AAC, etc.
DEEPAK JOSHI DY.DIRECTOR STI(T)
A to D / D to A conversion
Digital recording converts the analog wave into a stream of numbers and records the numbers instead of the wave. The conversion is done by a device called an analog-to-digital converter (ADC). To play back the music, the stream of numbers is converted back to an analog wave by a digital-to-analog converter (DAC). The analog wave produced by the DAC is amplified and fed to the speakers to produce the sound.
When you sample the wave with an analog-to-digital converter, you have control over two variables:
The sampling frequency controls how many samples are taken per second. This is the sample rate, which determines the highest frequency that can be captured.
The sampling precision controls how many different gradations (quantization levels) are possible when taking a sample. This is the bit depth or bit resolution, which defines the amplitude resolution. A higher bit depth means greater dynamic range, a lower noise floor and higher fidelity.
Sampling
To convert an analog signal to a digital format, the voltage is sampled at regular intervals, thousands of times per second. The value of each sample is rounded to the nearest integer on a scale that varies according to the resolution of the signal. The integers are then converted to binary numbers. The sampling frequency, or sampling rate, is the number of times per second that the voltage of the analog signal is measured (sampled).
In the adjoining figure, let's assume that the sampling rate is 1,000 per second and the precision is 10. The green rectangles represent samples, taken every one-thousandth of a second. When the DAC recreates the wave from these numbers, you get the blue line shown. You can see that the blue line has lost quite a bit of the detail originally found in the red line, which means the fidelity of the reproduced wave is not very good. This is the sampling error.
In the next figures, both the rate and the precision have been improved by a factor of 2 (20 gradations at a rate of 2,000 samples per second), and then doubled again (40 gradations at 4,000 samples per second). You can see that as the rate and precision increase, the fidelity (the similarity between the original wave and the DAC's output) improves. The higher the sampling frequency, the closer the shape of the digital waveform is to that of the original analogue signal.
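The effect of the rate and precision settings can be tried numerically. The following is only a sketch: the 100 Hz test tone, the unit amplitude and the function names are assumptions made for illustration, not values taken from the figures.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, levels, duration_s=0.01):
    """Sample a unit-amplitude sine and round each sample to one of `levels` gradations."""
    n_samples = int(round(sample_rate_hz * duration_s))
    step = 2.0 / (levels - 1)  # spacing between gradations across the range [-1, 1]
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        exact = math.sin(2 * math.pi * freq_hz * t)
        quantized = round((exact + 1.0) / step) * step - 1.0
        samples.append((t, exact, quantized))
    return samples

def max_rounding_error(samples):
    """Largest gap between an exact sample value and its quantized value."""
    return max(abs(exact - quantized) for _, exact, quantized in samples)

# The three settings discussed in the text: 10, 20 and 40 gradations.
for rate, levels in [(1000, 10), (2000, 20), (4000, 40)]:
    error = max_rounding_error(sample_and_quantize(100, rate, levels))
    print(f"{rate} samples/s, {levels} gradations: max rounding error {error:.4f}")
```

The rounding error can never exceed half the spacing between gradations, so doubling the number of gradations roughly halves the worst-case error.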
Nyquist criterion
Let V(t) be the value of the signal at an arbitrary time t, and let V(nTs) be its samples at the discrete times tn = nTs, where n = ..., -1, 0, 1, 2, 3, ... and fs = 1/Ts is the sampling frequency. The Nyquist theorem states that if a signal V(t) does not contain frequencies higher than fs/2, then it can be fully recovered from its sampled values.
The sampling rate must be at least twice as high as the highest frequency to be reproduced. Equivalently, the usable upper limit of a piece of digital audio is half the sample rate. So if you are recording at 44.1 kHz, the highest frequency that can be captured is about 22 kHz, which is 2 kHz higher than a typical human with excellent hearing can hear. The Red Book standard for CDs uses a sample rate of 44.1 kHz, or 44,100 slices every second, so CD-quality audio has a Nyquist frequency of 22.05 kHz.
Current standard sampling rates for full-quality (i.e. not compressed) digital audio are:

Rate     Use(s)
32K      Older DATs, voice quality
44.1K    CD, DAT, digital recording software/hardware
48K      DAT, digital recording software/hardware
96K      Digital recording software/hardware
192K     Digital recording software/hardware
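The relation between sample rate and the highest reproducible frequency is simple enough to check in a few lines. This is a minimal sketch; the function names are hypothetical.

```python
def nyquist_frequency_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2

def minimum_sample_rate_hz(highest_frequency_hz):
    """Lowest sample rate that satisfies the Nyquist criterion for a given bandwidth."""
    return 2 * highest_frequency_hz

# The standard rates listed in the table above.
STANDARD_RATES_HZ = [32_000, 44_100, 48_000, 96_000, 192_000]

for rate in STANDARD_RATES_HZ:
    print(f"{rate} Hz sampling -> Nyquist frequency {nyquist_frequency_hz(rate)} Hz")
```

For a 20 kHz audio band, `minimum_sample_rate_hz(20_000)` gives 40 kHz, which is why the CD rate of 44.1 kHz leaves a small margin for the anti-aliasing filter.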
Why did the CD standard settle on 44.1K rather than, say, 48K? Rumor has it that video equipment already had clocks that ran at 44.1K that could be integrated into the first CD players. It is also said that Herbert von Karajan complained to Sony that Beethoven's 9th would not fit on the early CD specification. By lowering the rate to 44.1K, 74 minutes could be recorded onto a CD using 16-bit samples, enough to do the trick.
Aliasing
If the audio signal contains components above half the sample frequency, aliasing will occur. First consider the example below, which shows the frequency bands created when a 20 kHz audio band is sampled at 48 kHz.
The sampling process produces a signal at the sampling frequency and bands of frequencies that spread above and below it by an amount equal to the audio bandwidth. Further sets of bands are created around twice the sample frequency, three times the sample frequency, and so on.
If the sample frequency is reduced to 32 kHz but the audio band remains at 20 kHz, the 20 kHz bands on either side of the sample frequency will overlap with the audio band. Aliasing will occur, generating additional sidebands that can be extremely audible. Intermodulation can also produce further frequencies, and the sound can become highly distorted.
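Where an out-of-band tone ends up after sampling can be estimated by folding its frequency around multiples of the sample rate. The sketch below assumes an ideal sampler with no anti-aliasing filter and a single pure tone; the function name is hypothetical.

```python
def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after ideal sampling (no anti-aliasing filter)."""
    folded = tone_hz % sample_rate_hz
    # Frequencies above half the sample rate mirror back into the lower half.
    return min(folded, sample_rate_hz - folded)

# A 20 kHz tone is preserved at 48 kHz sampling but folds down at 32 kHz.
print(alias_frequency_hz(20_000, 48_000))
print(alias_frequency_hz(20_000, 32_000))
```

At 32 kHz the 20 kHz component reappears at 12 kHz, well inside the audible band, which is the overlap described above.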
The diagram above indicates that even at a sampling rate of 44.1K (the CD rate), some audible frequencies are attenuated by the filter. Attenuation is determined by the quality of the filter and its steepness factor (or 'Q'). Note that the higher the sampling rate, the less noticeable the impact of the filter rolloff on audible frequencies becomes, as more and more of the rolloff lies above the audio range.
Over-sampling
Using a sample frequency that is double the one finally required moves the sidebands further away from the audio band and allows simpler filters to be used. Commercial designs can sample at many times the required rate.
Quantization
When a waveform is digitised, the amplitude is converted to a numeric value. Samples are assigned numeric values that the computer or digital circuit can use or store, in a process called quantization. Quantization is the process of selecting whole numbers to represent the voltage level of each sample. The A/D converter must select the whole number that is closest to the signal level at the instant it is sampled. This produces small rounding errors that cause distortion. The number of available values is determined by the number of bits (0s and 1s) used for each sample, also called bit depth or bit resolution.
Bit depth refers to the number of bits used to capture each audio sample. The easiest way to envision this is as a series of levels at which the audio energy can be sliced at any given moment in time. The smaller the number of bits used per sample, the greater the distance over which the analog values must be rounded off.
Bit Depth.
Following the pioneering work of Nakajima (1983) and Mieszkowski (1987), the maximum signal-to-noise ratio (S/N) in decibels, that is, the ratio of the maximum representable signal amplitude to the maximum quantization error, of an ideal ADC or DSP-based digital system is calculated as:
$\mathrm{S/N}\ (\mathrm{dB}) = 6.02\,n + 1.76$
The 1.76 dB term is based on sinusoidal waveform statistics and would vary for other waveforms, while n represents the data word length in bits of the converter or the digital signal processor. In undithered DSP-based systems, the SNR definition above is not directly applicable, since there is no noise present when there is no signal. Therefore, when SNR or dynamic range is quoted in terms of DSP data word size and quantization error, both terms mean the same thing. Note that the "6-dB-per-bit rule" is an approximation to the actual dynamic range for a given word width.
As a rule of thumb, each bit of precision used in quantization adds 6 dB to the SQNR and dynamic range. Thus, converter resolutions of 8, 12, 16 and 20 bits allow S/N ratios or dynamic ranges of 48, 72, 96 and 120 dB respectively. Sound sampled with 16-bit precision ("CD quality") has a theoretical SQNR of 96 dB, which is quite good and much better than traditional tape recording.
Why does bit depth matter? It is a matter of the human ear, which can only resolve frequencies from 20 to 20,000 Hz but can resolve about 130 dB of dynamic range. In practice the dynamic range of CDs and 16-bit digital audio is, at best, 90 dB; the dynamic range of 24-bit digital audio is 109-120 dB, depending on the quality of the converters. The DVD audio standard therefore allows a wider range of amplitudes, or volumes, than CDs, which is its main improvement.
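The 6-dB-per-bit figures can be compared with the commonly quoted ideal-converter value for a full-scale sine wave, 6.02n + 1.76 dB. The sketch below assumes that formula and an ideal, noise-free converter; the function names are hypothetical.

```python
import math

def ideal_snr_db(bits):
    """Ideal quantization S/N for a full-scale sine: 20*log10(2^n) + 10*log10(1.5)."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def rule_of_thumb_db(bits):
    """The '6 dB per bit' approximation used in the text."""
    return 6 * bits

for bits in (8, 12, 16, 20, 24):
    print(f"{bits:2d} bits: rule of thumb {rule_of_thumb_db(bits)} dB, "
          f"ideal {ideal_snr_db(bits):.2f} dB")
```

The two differ by only a couple of decibels, which is why the rule of thumb is adequate for quick comparisons between word lengths.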
CD quality recordings are done with a sample size of 16 bits. Some newer audio devices like DVDs are capable of working with the sample size of 24 bits. One of the advantages of using digital recorders with more bits is the ability to directly record uncontrolled microphone signals.
The bit-rate of uncompressed audio can be calculated by multiplying the sampling rate by the resolution (8-bit, 16-bit, etc.) and the number of channels. For example, CD Audio (or a WAV file extracted from a CD) has a sampling rate of 44,100 times per second, a resolution of 16 bits and two channels. The bit-rate would be approximately 1.4 million bits per second (1,411 kbps).
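The product described above can be written out directly. This is a minimal sketch with a hypothetical function name; it simply multiplies the three quantities named in the text.

```python
def uncompressed_bit_rate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bit-rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_rate = uncompressed_bit_rate_bps(44_100, 16, 2)
print(f"CD audio: {cd_rate} bits per second ({cd_rate / 1000} kbps)")
```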
For compressed sound, the bit-rate is lower than the above product.
Bit Depth   Sample Rate   Number of mono tracks   Size per mono track   Size per song
16          44,100        8                       15.1 MB               121 MB
16          48,000        8                       16.5 MB               132 MB
24          96,000        8                       49.5 MB               396 MB
16          44,100        16                      15.1 MB               242 MB
16          48,000        16                      16.5 MB               264 MB
24          96,000        16                      49.5 MB               792 MB
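Track sizes like those in the table follow from the sample rate, the bytes per sample and the track length. The sketch below assumes a three-minute track and binary megabytes (1,048,576 bytes); neither is stated in the table, but together they come close to its 16-bit rows. The function name is hypothetical.

```python
BYTES_PER_MEGABYTE = 1024 * 1024  # assumption: sizes are in binary megabytes

def track_size_mb(bit_depth, sample_rate_hz, seconds, channels=1):
    """Storage needed for an uncompressed PCM track, in megabytes."""
    size_bytes = sample_rate_hz * (bit_depth // 8) * channels * seconds
    return size_bytes / BYTES_PER_MEGABYTE

TRACK_SECONDS = 180  # assumption: a three-minute song

for depth, rate in [(16, 44_100), (16, 48_000), (24, 96_000)]:
    per_track = track_size_mb(depth, rate, TRACK_SECONDS)
    print(f"{depth}-bit, {rate} Hz: {per_track:.1f} MB per mono track, "
          f"{8 * per_track:.0f} MB for 8 tracks, {16 * per_track:.0f} MB for 16 tracks")
```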
Bit-rates: 1,411 kbps (CD Audio), 80 kbps, 128 kbps, 160 kbps, 192 kbps, 256 kbps, 320 kbps
The question arises: which is the purest digital audio? Can we afford to record and store it in that format?
What about bandwidth and ease of transfer? What is the solution?
Compression?
Format                  Bit-rate           Compression
CD                      1.4 Mbps           None
MPEG Layer-I            384 kbps           3.6:1
MPEG Layer-II           256 kbps           5.5:1
MPEG Layer-III (MP3)    192 kbps           7.3:1
MPEG Layer-III (MP3)    VBR Normal/High    7:1 to 10:1
MPEG AAC                128 kbps           11:1
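The compression ratios in the table are approximately the CD bit-rate divided by the compressed bit-rate. A minimal sketch, assuming the CD reference rate of 44,100 x 16 x 2 bits per second; the names are hypothetical.

```python
CD_BIT_RATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps, the uncompressed reference

def compression_ratio(compressed_kbps):
    """How many times smaller the compressed stream is than CD audio."""
    return CD_BIT_RATE_KBPS / compressed_kbps

for name, kbps in [("MPEG Layer-I", 384), ("MPEG Layer-II", 256),
                   ("MPEG Layer-III (MP3)", 192), ("MPEG AAC", 128)]:
    print(f"{name}: {compression_ratio(kbps):.2f}:1")
```

The computed values differ slightly from the table in the last digit because the table rounds them.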
Format                Attributes
DSP TrueSpeech        8.0 kHz, 1 bit, mono
MP3*                  11.5 kHz, 16 kbit/s, mono
PCM                   44.1 kHz, 16 bit, stereo
PCM (High Quality)    96 kHz, 24 bit, stereo (available for Professional and Developer Edition users only)
WMA voice             20 kbit/s, 22.05 kHz, mono
WMA lossless          VBR Quality 100, 44 kHz, 2 channel, 16 bit
Flac lossless         96 kHz, 16 bit, stereo
Audio Formats
Audio formats can be broken down into three main categories: uncompressed formats, lossless compression formats and lossy compression formats.
Uncompressed audio formats (often referred to as PCM formats) are, just as the name suggests, formats that use no compression. All the data is available, at the cost of large file sizes. A WAV audio file is an example of an uncompressed audio file.
A lossless compression format applies compression to an uncompressed audio file, but it does not lose information or degrade the quality of the digital audio. WMA Lossless and FLAC are lossless formats.
A lossy compression format results in some loss of data, as the compression algorithm eliminates redundant or unnecessary information; basically, it tosses out what it sees as irrelevant. Lossy compression has become popular online because its small file sizes are easier to transmit over the Internet. MP3 and Real Audio files use lossy compression.
.aif or AIFF (Audio Interchange File Format)
Apple Computer developed the Audio Interchange File Format (AIFF). It is the gold standard of 16-bit audio and travels well between almost all computers and software. It is also capable of 24-bit and 32-bit resolution, and can be used to store high-quality sampled audio and musical instrument information. It uses a lot of disk space: about 10 MB for one minute of stereo audio. Extension: .aif
A compressed version of AIFF was thought to be a candidate to supersede AIFF, as it is a non-lossy compression format; compression is obtained by manipulating the bit-rate.
.bwf or BWF (Broadcast Wave Format)
BWF is a standard audio format created by the European Broadcasting Union as a successor to WAV. BWF allows metadata to be stored in the file. It is the primary recording format used in many professional audio workstations in the television and film industry. Extension: .bwf
WAV (.wav)
WAV is a format for storing sound in files, developed jointly by Microsoft and IBM. Support for WAV files was built into Windows 95, making it the de facto standard for sound on PCs. Windows uses the Wave Form Audio (WAV) file format to store sounds as waveforms. One minute of Pulse Code Modulation (PCM)-encoded sound can occupy as little as 644 kilobytes (KB) or as much as 27 megabytes (MB) of storage. The size depends on the sampling frequency, the type of sound (mono or stereo) and the number of bits used per sample. WAV sound files end with a .wav extension and can be played by nearly all Windows applications that support sound. Extension: .wav
WMA - Windows Media Audio (.wma)
WMA, short for Windows Media Audio, is a Microsoft file format for encoding digital audio files. It is similar to MP3 but can compress files at a higher rate than MP3. WMA files are Advanced Systems Format (.asf) files containing audio that is compressed with the Windows Media Audio (WMA) codec. WMA files, which use the ".wma" file extension, can be of any size and can be compressed to match many different connection speeds, or bandwidths. Extension: .wma
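The dependence of PCM WAV size on sampling frequency, channel count and bits per sample can be observed with Python's standard `wave` module. This is only a sketch: it writes zero-valued samples into memory rather than real audio, and the helper name is hypothetical.

```python
import io
import wave

def pcm_wav_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Build an in-memory PCM WAV file of zero-valued samples and return its bytes."""
    bytes_per_sample = bits_per_sample // 8
    n_frames = sample_rate_hz * seconds
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bytes_per_sample)
        wav.setframerate(sample_rate_hz)
        # One frame holds one sample for every channel.
        wav.writeframes(bytes(n_frames * channels * bytes_per_sample))
    return buffer.getvalue()

# One second of audio at three different settings.
for rate, bits, channels in [(11_025, 8, 1), (44_100, 16, 2), (96_000, 24, 2)]:
    size = len(pcm_wav_bytes(rate, bits, channels, seconds=1))
    print(f"{rate} Hz, {bits}-bit, {channels} channel(s): {size} bytes per second")
```

Apart from a small file header, the size is simply the sample rate times the bytes per sample times the number of channels times the duration.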
MP3 (.mp3)
MP3 is both the file extension and the name of the file type for MPEG audio layer 3. Layer 3 is one of three coding schemes (layer 1, layer 2 and layer 3) for the compression of audio signals. Layer 3 uses perceptual audio coding and psychoacoustic compression to remove superfluous information (more specifically, the redundant and irrelevant parts of a sound signal: the stuff the human ear doesn't hear anyway). It also adds an MDCT (Modified Discrete Cosine Transform) that implements a filter bank, increasing the frequency resolution to 18 times that of layer 2. In real terms, layer 3 shrinks the original sound data from CD-quality sound (with a bit rate of 1411.2 kilobits per second of stereo music) by a factor of about 12 (down to 112-128 kbps) with almost the same fidelity. Extension: .mp3
Real Audio (.ra .ram .rm)
Real Audio is a proprietary format developed by RealNetworks and used for streaming audio online, letting you play digital audio files in real time. To use this type of file you must have RealPlayer (for Windows or Mac), which you can download for free. Extension: .ra .ram .rm
Comparison: Analogue vs Digital
Table I summarizes the comparison of a studio-quality reel-to-reel analog tape recorder with a 16-bit digital recorder. These data are derived from specifications by various manufacturers of analog and digital audio products. The table implies that the digital recorder has many advantages over its analog counterpart.
Performance parameters compared: S/N ratio, total harmonic distortion, wow and flutter, frequency response, and loss of S/N during copying.
What is AES/EBU?
It is among the most popular audio standards. It is a bit-serial communication protocol for transmitting digital audio. It provides up to two channels of 24 bits per sample. It provides both professional and consumer modes.
AES/EBU
Originally published in 1985. Ratified by the EBU with provision for transformer coupling. Later, the IEC combined the professional and consumer standards.
It provides two channels of audio data, a method for communicating control and status information, and some error detection capabilities. AES/EBU is a bit-serial communications protocol for transmitting digital audio data through a single transmission line. Clocking information is derived from the AES/EBU bit stream and is thus controlled by the transmitter. The standard mandates the use of 32 kHz, 44.1 kHz or 48 kHz sample rates, but some interfaces can be made to work at other sample rates.
Sampling frequencies
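[Figure: AES/EBU frame format. A sequence of blocks (Block 1 to Block 4), each made up of Frame 0 to Frame 191. Each frame holds two sub-frames; each sub-frame starts with an X, Y or Z preamble and ends with the V, U, C and P bits in time slots 28 to 31.]
V = Validity, U = User, C = Channel status, P = Parity
32 bits per sub-frame: 4 preamble bits, 24 data bits, 1 validity bit, 1 user bit, 1 channel status bit and 1 parity bit.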
Highlights
The LSB is sent first, before the MSB, which simplifies arithmetic. Even parity is employed for each sub-frame. Two sub-frames make one frame. The first sub-frame carries the L channel and the second sub-frame carries the R channel.
Highlights Contd..
At 48 kHz the bit rate is 3.072 Mbit/s. Two separate synchronizing patterns are used so that the L and R channels can be separated easily. When the resolution required is only 20 bits, the remaining 4 bits can be used for talkback.
Frame Sequence
One block = 192 frames. One frame = 2 sub-frames. The start of a block is indicated by the Z preamble.
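The frame arithmetic above fixes both the interface bit rate and the duration of a block. A minimal sketch using only the figures given in the text (32 bits per sub-frame, 2 sub-frames per frame, 192 frames per block); the function names are hypothetical.

```python
BITS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2
FRAMES_PER_BLOCK = 192

def interface_bit_rate_bps(sample_rate_hz):
    """One frame is sent per audio sample period, so the rate is 64 bits per sample."""
    return BITS_PER_SUBFRAME * SUBFRAMES_PER_FRAME * sample_rate_hz

def block_duration_s(sample_rate_hz):
    """Time taken to send one 192-frame block."""
    return FRAMES_PER_BLOCK / sample_rate_hz

for rate in (32_000, 44_100, 48_000):
    print(f"{rate} Hz: {interface_bit_rate_bps(rate)} bit/s, "
          f"block every {block_duration_s(rate) * 1000:.3f} ms")
```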
Sub-frame Structure
Preamble
4-bit synchronizing word: X (channel 1), Y (channel 2) or Z (channel 1 and start of the 192-frame block).
Aux
4 bits used for auxiliary data or for the 4 LSBs of a 24-bit audio word.
Audio Data
24-bit audio or 20-bit audio.
Parity (P)
Produces even parity over time slots 4-31. Simple error detection.
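The sub-frame layout can be illustrated by assembling time slots 4 to 31 for a 20-bit sample. This is a sketch under stated assumptions: the preamble (slots 0 to 3) is left out, the audio word is treated as an unsigned 20-bit value, and the function name is hypothetical.

```python
def build_subframe_slots(audio_word, aux=0, validity=0, user=0, channel_status=0):
    """Return time slots 4..31 of one sub-frame as a list of 28 bits (preamble omitted)."""
    if not 0 <= audio_word < (1 << 20):
        raise ValueError("audio_word must fit in 20 bits")
    if not 0 <= aux < (1 << 4):
        raise ValueError("aux must fit in 4 bits")
    bits = [(aux >> i) & 1 for i in range(4)]           # slots 4-7, LSB first
    bits += [(audio_word >> i) & 1 for i in range(20)]  # slots 8-27, LSB first
    bits += [validity, user, channel_status]            # slots 28-30
    parity = sum(bits) % 2                              # slot 31: makes the number of ones even
    bits.append(parity)
    return bits

slots = build_subframe_slots(audio_word=0x12345, channel_status=1)
print(len(slots), "bits, number of ones:", sum(slots))
```

A receiver can detect a single-bit error by checking that the number of ones in slots 4 to 31 is still even.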
Auxiliary data
Used for studio talkback communication. Sampled at one third of the audio sampling frequency, with a 12-bit sample word divided into 4-bit nibbles. 192 x 4 = 768 bits per channel per block, and 64 x 12 = 768, giving 64 talkback samples per block.
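The talkback figures above can be cross-checked with a few lines of arithmetic. A minimal sketch using only the numbers in the text (192 frames per block, 4 auxiliary bits per sub-frame, 12-bit talkback words); the names are hypothetical.

```python
FRAMES_PER_BLOCK = 192
AUX_BITS_PER_SUBFRAME = 4
TALKBACK_WORD_BITS = 12

def talkback_samples_per_block():
    """Number of 12-bit talkback words carried by the aux bits of one channel per block."""
    aux_bits_per_block = FRAMES_PER_BLOCK * AUX_BITS_PER_SUBFRAME
    return aux_bits_per_block // TALKBACK_WORD_BITS

def talkback_sample_rate_hz(audio_sample_rate_hz):
    """Talkback sample rate implied by the number of words per block."""
    return audio_sample_rate_hz * talkback_samples_per_block() / FRAMES_PER_BLOCK

print(talkback_samples_per_block(), "talkback samples per block")
print(talkback_sample_rate_hz(48_000), "Hz talkback rate at 48 kHz audio")
```

Each talkback word takes three sub-frames of one channel to send, which is where the one-third rate comes from.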
USER BIT
Has a flexible frame length. User bit management is in byte 1 of the channel status. Much professional equipment does not use the user bit; in such cases it is set to zeros.
SPDIF
Sony/Philips Digital Interface Format. Refers to the AES/EBU standard operated in consumer mode. An unbalanced RCA connector is used. Not used for embedding in digital video. Used for CD and R-DAT (44.1 and 48 kHz).