
Audio Compression

INTRODUCTION

Large storage requirements limit the amount of audio data that can be stored on compact discs, flash memory, and other media. Large file sizes also mean long download times when retrieving songs from the internet. For these reasons (and others), there is considerable interest in shrinking the storage requirements of sampled sound.

1. A simple way of looking at audio compression

An audio signal can be compressed using a digital filter. Digital filtering reduces the storage requirements of digital audio by simply lopping off the parts of the data that correspond to specific frequencies. Cutting out frequencies may affect the sound quality. However, the human ear is not equally sensitive to all frequencies, so digital filtering is an effective technique for compressing audio data in many situations.

There are more effective ways to reduce the storage required for digital audio data while still maintaining high-quality sound. One idea is this: rather than cutting out less-important frequencies altogether, we can store the corresponding model coefficients with lower precision, that is, with fewer bits. This technique is called quantization. The less-important frequencies are identified by the magnitude of their DCT model coefficients: coefficients of small magnitude contribute little to the signal.
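To make the idea of storing coefficients with fewer bits concrete, the following sketch quantizes a handful of made-up coefficient values to signed 4-bit codes and checks the resulting error. The values in c, the choice of 4 bits, and the use of simple rounding are assumptions for illustration only; they are not taken from any particular codec.

```matlab
% Illustrative values only: five hypothetical model coefficients.
c = [0.91 -0.42 0.07 -0.003 0.25];
bits = 4;                         % assumed precision for this example

m = max(abs(c));                  % scale factor so that c/m lies in [-1, 1]
levels = 2^(bits-1) - 1;          % 7 positive steps for a signed 4-bit code
codes = round(c/m * levels);      % integer codes in the range -7..7
c_hat = codes/levels * m;         % values recovered from the codes

err = max(abs(c - c_hat));        % worst-case reconstruction error
bound = m/(2*levels);             % rounding error is at most half a step
fprintf('codes: %s\n', mat2str(codes));
fprintf('max error %.4f (bound %.4f)\n', err, bound);
```

Each code fits in 4 bits instead of the 64 bits of a double-precision value, at the cost of an error no larger than half a quantization step.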

MP3-like compression with MATLAB

clc
clear all

% Load an audio sample data set
% (wavread was removed from recent MATLAB releases; audioread replaces it)
[b,R] = wavread('cantget.wav');
%sound(b,R)
N = length(b);

% Compute the interpolation model coefficients
c = dct(b);
w = sqrt(2/N);
f = linspace(0,R/2,N);

% Let's look at the weighted coefficients and pick a cut-off value
plot(f,w*c)

% Pick a cut-off value and split the coefficients into low- and
% high-precision sets:
cutoff = 0.00075;
mask = (abs(w*c)<cutoff);
low = mask.*c;
high = (1-mask).*c;

% This plot nicely illustrates the cut-off region:
plot(f,w*high,'-r',f,w*low,'-b')

% Now pick a precision (in bits) for the low-precision data set:
lowbits = 8;

% We won't quantize the high-precision set of coefficients (high), only the
% low-precision part (the quantization steps are written inline below):
m = max(abs(low));
y = low/m;
y = floor((2^lowbits - 1)*y/2);
y = 2*y/(2^lowbits - 1);
low = m*y;

% Finally, let's reconstruct our compressed audio sample and listen to it!
z = idct(low+high);
sound(z,R)

2. Audio coding based on a psychoacoustic model

Some coders are considered lossless because they retain all of the original audio data while reducing the bit rate. Other coders are called lossy because they discard portions of the audio stream that cannot easily be heard; many lossy coders are in common use today, including MP3, WMA, AAC, and PAC.

The psychoacoustic model is based on many studies of human perception. The two main properties of the human auditory system that make up the psychoacoustic model are:

- the absolute threshold of hearing
- auditory masking

Absolute Threshold of Hearing

Humans can hear frequencies in the range from 20 Hz to 20,000 Hz. However, this does not mean that all frequencies are heard equally well. Hearing a tone becomes more difficult as its frequency nears either extreme. The frequency range from 20 Hz to 20 kHz can be broken up into critical bands, whose widths are non-uniform, non-linear, and dependent on the level of the incoming sound.
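Critical bands are commonly indexed on the Bark scale. As a rough sketch, the conversion formula used by the bark function in the codec listing later in this document can be evaluated for a few test frequencies. The frequencies chosen here are arbitrary.

```matlab
% Frequency in Hz -> Bark value (same formula as bark() in the codec listing)
hz2bark = @(f) 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);

f = [100 500 1000 4000 10000 20000];   % arbitrary test frequencies in Hz
z = hz2bark(f);
band = floor(z) + 1;                   % 1-based critical band index

for i = 1:numel(f)
    fprintf('%6d Hz -> %5.2f Bark (band %2d)\n', f(i), z(i), band(i));
end
```

The mapping is roughly linear at low frequencies and compresses at high frequencies, so the whole audible range up to 20 kHz fits into 25 bands, which matches the 25 sub-bands assumed by the codec listing.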

Many portions of an audio stream cannot actually be heard. Any sound with an intensity below a certain threshold (called the threshold in quiet) cannot be heard, due to the limits of the ear's sensitivity. Sometimes, sounds above the threshold in quiet cannot be heard because other sounds cover them up. This is due to a psychoacoustic phenomenon known as masking. If two separate tones are close enough in frequency, one tone may actually cover up the other. The tone that is heard is called the masker, and the tone that is not heard is called the maskee.
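The threshold in quiet is often approximated by a closed-form curve. The sketch below evaluates the approximation that appears in the allocate function of the codec listing later in this document (frequency in kHz, result in dB SPL) at a few arbitrary frequencies, and checks whether a hypothetical 1 kHz tone at 0 dB SPL would fall below it.

```matlab
% Threshold in quiet (dB SPL) as a function of frequency in kHz
threshold_in_quiet = @(f_kHz) 3.64*f_kHz.^(-0.8) ...
    - 6.5*exp(-0.6*(f_kHz - 3.3).^2) + 1e-3*f_kHz.^4;

f_kHz = [0.1 1 3.3 10 16];             % arbitrary test frequencies
A = threshold_in_quiet(f_kHz);
for i = 1:numel(f_kHz)
    fprintf('%5.1f kHz: %6.1f dB SPL\n', f_kHz(i), A(i));
end

tone_level_dB = 0;                     % hypothetical 1 kHz tone level
audible = tone_level_dB > threshold_in_quiet(1);
fprintf('1 kHz tone at %d dB SPL audible: %d\n', tone_level_dB, audible);
```

The curve is lowest near 3.3 kHz, where the ear is most sensitive, and rises steeply toward both ends of the audible range.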

The above phenomenon is known as simultaneous masking. There is another phenomenon known as temporal masking. Temporal masking is the masking of a sound before or after the masker event occurs.
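How far a masker's effect spreads to neighbouring frequencies can be modelled with a spreading function. The following sketch uses the Schroeder spreading function and the level offset (14.5 dB plus the masker's Bark value) that appear in the codec listing later in this document. The masker and probe tones are made-up values, and the Bark distance is taken here as maskee minus masker, which is an assumption of this sketch; the listing computes the difference in the opposite order.

```matlab
bark   = @(f) 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);   % Hz -> Bark
spread = @(dz) 15.81 + 7.5*(dz + 0.474) - 17.5*sqrt(1 + (dz + 0.474).^2);

masker_f = 1000;  masker_spl = 70;   % hypothetical masker: 1 kHz at 70 dB SPL
probe_f  = 1200;  probe_spl  = 35;   % hypothetical quieter tone nearby

dz = bark(probe_f) - bark(masker_f);              % assumed sign convention
downshift = 14.5 + bark(masker_f);                % offset used in the listing
threshold = masker_spl - downshift + spread(dz);  % masking level at the probe

if probe_spl < threshold
    disp('The probe tone is masked.');
else
    disp('The probe tone is audible.');
end
```

The spreading function is about 0 dB at zero Bark distance and falls off on both sides, so only tones close to the masker in frequency are covered up.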


2.1 Frames

The first step in designing an audio coder is to segment the audio stream into frames. A frame is a short section of audio, typically less than 50 ms long. At a sampling frequency of 44.1 kHz, a frame of 2048 samples is about 46 ms long. Enframing the audio stream allows the engineer to treat each frame as a relatively stationary sound. Frame lengths longer than 50 ms are typically not used, since pleasant-sounding audio is non-stationary. The coder presented in this paper uses a fixed frame length of 2048 samples.

2.2 Signal-to-Mask Ratio

There are many ways to calculate the masking threshold. In general, the masking threshold varies with the frequency and intensity of the masker signal. To calculate the masking threshold, the first step is to compute the FFT of the frame and find the spectral peaks. To find the peaks, simply search for every point where the slope of the spectrum changes from positive to negative. Each of these peaks corresponds to an individual frequency in the signal. The signal-to-mask ratio (SMR) is then the amount, in dB, by which the signal spectrum exceeds the masking threshold.
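The framing and peak-picking steps described above can be sketched as follows. The two-tone test signal, the explicit framing loop, and the Hann window applied before the FFT are assumptions made for this illustration; they stand in for a real recording and for the enframe helper used in the codec listing.

```matlab
Fs = 44100;  N = 2048;  hop = N/2;          % frame length and 50% overlap
t = (0:4*N-1)/Fs;
x = sin(2*pi*1000*t) + 0.5*sin(2*pi*5000*t); % hypothetical test signal

% Split the signal into overlapping frames, one frame per row
numFrames = floor((length(x) - N)/hop) + 1;
frames = zeros(numFrames, N);
for m = 1:numFrames
    frames(m,:) = x((m-1)*hop + (1:N));
end
frame_ms = 1000*N/Fs;                        % duration of one frame in ms

% Power spectrum of the first frame (Hann window assumed)
win = 0.5 - 0.5*cos(2*pi*((1:N) - 0.5)/N);
P = abs(fft(frames(1,:).*win)).^2;
P = P(1:N/2);                                % keep non-negative frequencies

% A peak is a point where the slope changes from positive to negative
peaks = find(diff(sign(diff(P))) == -2) + 1;

% Report the two strongest peaks as frequencies in Hz
[~, order] = sort(P(peaks), 'descend');
strongest = peaks(order(1:2));
peak_freqs = sort((strongest - 1)*Fs/N);
fprintf('%d frames of %.1f ms, peaks near %.0f Hz and %.0f Hz\n', ...
    numFrames, frame_ms, peak_freqs(1), peak_freqs(2));
```

The slope test also flags small side-lobe maxima, which is why this sketch keeps only the strongest peaks; the frequency estimates are accurate to within one FFT bin (Fs/N, about 21.5 Hz here).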

2.3 Bit Allocation

From the SMR, it can be determined which frequency bands should receive the most bits. As a general rule, each bit increases the signal-to-noise ratio by about 6 dB. Therefore, allocating one bit for each 6 dB of SMR would ensure that the quantization noise is below the masking threshold, and thus inaudible. However, there may not be enough bits available to do this, so bits must be allocated where they are needed most. The water-filling bit allocation algorithm allocates bits by looking for the maximum value of the SMR, allocating a bit to that subband, subtracting 6 dB from the SMR at that frequency, and repeating as long as bits are available to allocate.

2.4 Quantization

After determining where bits should be allocated, the next step is to quantize the audio signal to the appropriate number of bits. This audio coder is based on the Modified Discrete Cosine Transform (MDCT), so it is the MDCT coefficients that are quantized. The MDCT of the original time-domain frame must first be computed. The coefficients must then be attenuated, because values as large as those typically found in the MDCT cannot be quantized directly. An attenuation factor is therefore chosen equal to the maximum value found in the MDCT, which reduces the maximum value that needs to be quantized to unity. After the coefficients are attenuated, they are quantized according to the bit allocation scheme determined earlier.

2.5 Reading/Writing the files

Once the MDCT coefficients are quantized, they can be written to a file. In addition to the MDCT coefficients, the gain factor must be specified, as well as the number of bits allocated to each band. In this coder, a file header is also included, which contains information such as the sampling frequency, frame length, bit rate, number of bits used for writing the gain factor, and the number of frames in the file. Because only a few bits are used to represent the gain factor, the logarithm of the gain is written to the file.
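The water-filling allocation described in section 2.3 can be sketched on a toy example. The four SMR values and the budget of five bits are made up, and the sketch charges one bit per allocation step, whereas the allocate function in the listing below charges one bit for every sample in the chosen sub-band; both simplifications are assumptions made to keep the example short.

```matlab
smr = [20 7 0 13];          % hypothetical SMR per sub-band, in dB
bits_available = 5;         % hypothetical bit budget

alloc = zeros(size(smr));   % bits granted to each sub-band
remaining = smr;            % SMR not yet covered by allocated bits

while bits_available > 0
    [peak, band] = max(remaining);      % sub-band that needs bits most
    if peak <= 0
        break;                          % everything is already masked
    end
    alloc(band) = alloc(band) + 1;      % grant one bit ...
    remaining(band) = remaining(band) - 6;  % ... worth about 6 dB of SNR
    bits_available = bits_available - 1;
end

fprintf('allocation: %s\n', mat2str(alloc));
```

With these numbers the loop grants three bits to the first sub-band and one bit each to the second and fourth, while the third sub-band, whose SMR is 0 dB, receives none.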


function Fs = new_codec()
% yourfile.wav is the input file
% yourfile.jon is the encoded file
% decoded_yourfile.wav is the decoded output file
%
% If you make any modifications to this code, I would
% like to hear about it.
% - Jon Boley (jdb@jboley.com)

clear all;

scalebits = 4;
bitrate = 128000;
N = 2048; % framelength

original_filename = sprintf('yourfile.wav');
coded_filename = sprintf('yourfile.jon');
decoded_filename = sprintf('decoded_yourfile.wav');

[Y,Fs,NBITS] = wavread(original_filename);
tone = Y;

num_subbands = floor(fftbark(N/2,N/2,Fs))+1;
bits_per_frame = floor(((bitrate/Fs)*(N/2)) - (scalebits*num_subbands));

% Reference level: a full-scale 1 kHz sine is defined as 96 dB SPL
sig = sin(2*pi*1000*[1/Fs:1/Fs:(N/2)/Fs]);
win = (0.5 - 0.5*cos((2*pi*([1:(N/2)]-0.5))/(N/2)));
fftmax = max(abs(fft(sig.*win))); % defined as 96dB

% Enframe Audio (frames of N samples with a hop of N/2; enframe is not
% part of base MATLAB and must be supplied separately)
FRAMES = enframe(tone,N,N/2);

% Write File Header
fid = fopen(coded_filename,'w');
fwrite(fid, Fs, 'ubit16'); % Sampling Frequency
fwrite(fid, N, 'ubit12'); % Frame Length
fwrite(fid, bitrate, 'ubit18'); % Bit Rate
fwrite(fid, scalebits, 'ubit4'); % Number of Scale Bits per Sub-Band
fwrite(fid, length(FRAMES(:,1)), 'ubit26'); % Number of frames

% Computations
for frame_count=1:length(FRAMES(:,1))

    if mod(frame_count,10) == 0
        outstring = sprintf('Now Encoding Frame %i of %i', frame_count, length(FRAMES(:,1)));
        disp(outstring);
    end

    fft_frame = fft(FRAMES(frame_count,:));

    if all(fft_frame == 0)
        % Silent frame: nothing to encode
        Gain = zeros(1,floor(fftbark(N/2,N/2,Fs))+1);
        bit_alloc = zeros(1,floor(fftbark(N/2,N/2,Fs))+1);
    else
        len = length(fft_frame);
        peak_width = zeros(1,len);
        peak_points = cell(len,len);
        peak_min_value = zeros(1,len);

        % Find Peaks
        centers = find(diff(sign(diff( abs(fft_frame).^2) )) == -2) + 1;
        spectral_density = zeros(1,length(centers));

        for k=1:length(centers)
            peak_max(k) = centers(k) + 2;
            peak_min(k) = centers(k) - 2;
            peak_width(k) = peak_max(k) - peak_min(k);

            for j=peak_min(k):peak_max(k)
                if (j > 0) & (j < N)
                    spectral_density(k) = spectral_density(k) + abs(fft_frame(j))^2;
                end
            end
        end

        % This gives the amplitude squared of the original signal
        modified_SD = spectral_density / ((N^2)/8);
        SPL = 96 + 10*log10(modified_SD);

        % TRANSFORM FFT'S TO SPL VALUES
        fft_spl = 96 + 20*log10(abs(fft_frame)/fftmax);

        % Threshold in Quiet
        f_kHz = [1:Fs/N:Fs/2];
        f_kHz = f_kHz/1000;
        A = 3.64*(f_kHz).^(-0.8) - 6.5*exp(-0.6*(f_kHz - 3.3).^2) + (10^(-3))*(f_kHz).^4;

        % Masking Spectrum
        big_mask = max(A,Schroeder(centers(1)*(Fs/2)/N,fft_spl(centers(1)),...
            14.5+bark(centers(1)*(Fs/2)/N)));
        for peak_count=2:length(centers)
            try
                big_mask = max(big_mask,Schroeder(centers(peak_count)*(Fs/2)/N,fft_spl(centers(peak_count)),...
                    14.5+bark(centers(peak_count)*(Fs/2)/N)));
            catch
                peak_count=peak_count;
            end
        end

        % Signal Spectrum - Masking Spectrum (with max of 0dB)
        New_FFT = fft_spl(1:N/2)-big_mask;
        New_FFT_indices = find(New_FFT > 0);
        New_FFT2 = zeros(1,N/2);
        for i=1:length(New_FFT_indices)
            New_FFT2(New_FFT_indices(i)) = New_FFT(New_FFT_indices(i));
        end

        if frame_count == 55
            semilogx([0:(Fs/2)/(N/2):Fs/2-1],fft_spl(1:N/2),'b');
            hold on;
            semilogx([0:(Fs/2)/(N/2):Fs/2-1],big_mask,'m');
            hold off;
            title('Signal (blue) and Masking Spectrum (pink)');
            figure;
            semilogx([0:(Fs/2)/(N/2):Fs/2-1],New_FFT2);
            title('SMR');
            figure;
            stem(allocate(New_FFT2,bits_per_frame,N,Fs));
            title('Bits perceptually allocated');
        end

        bit_alloc = allocate(New_FFT2,bits_per_frame,N,Fs);
        [Gain,Data] = p_encode(mdct(FRAMES(frame_count,:)),Fs,N,bit_alloc,scalebits);
    end % end of If-Else Statement

    % Write Audio Data to File
    qbits = sprintf('ubit%i', scalebits);
    fwrite(fid, Gain, qbits);
    fwrite(fid, bit_alloc, 'ubit4');
    for i=1:25
        indices = find((floor(fftbark([1:N/2],N/2,Fs))+1)==i);
        qbits = sprintf('ubit%i', bit_alloc(i)); % bits(floor(fftbark(i,framelength/2,48000))+1)
        if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1))
            fwrite(fid, Data(indices(1):indices(end)) ,qbits);
        end
    end
end % end of frame loop

fclose(fid);

% RUN DECODER
disp('Decoding...');
p_decode(coded_filename,decoded_filename);
disp('Okay, all done!');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%          FFTBARK          %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b=fftbark(bin,N,Fs)
% b=fftbark(bin,N,Fs)
% Converts fft bin number to bark scale
% N is the fft length
% Fs is the sampling frequency
f = bin*(Fs/2)/N;
b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%         SCHROEDER         %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function m=Schroeder(freq,spl,downshift)
% Calculate the Schroeder masking spectrum for a given frequency and SPL
% (the frequency grid below is hard-coded for N = 2048 and Fs = 48000)
N = 2048;
f_kHz = [1:48000/N:48000/2];
f_kHz = f_kHz/1000;
A = 3.64*(f_kHz).^(-0.8) - 6.5*exp(-0.6*(f_kHz - 3.3).^2) + (10^(-3))*(f_kHz).^4;
f_Hz = f_kHz*1000;

% Schroeder Spreading Function
dz = bark(freq)-bark(f_Hz);
mask = 15.81 + 7.5*(dz+0.474) - 17.5*sqrt(1 + (dz+0.474).^2);
New_mask = (mask + spl - downshift);
m = New_mask;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%           BARK            %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b=bark(f)
% b=bark(f)
% Converts frequency to bark scale
% Frequency should be specified in Hertz
b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%         ALLOCATE          %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x=allocate(y,b,N,Fs)
% x=allocate(y,b,N)
% Allocates b bits to the 25 subbands
% of y (a length N/2 MDCT, in dB SPL)

% Bits wanted per subband: one bit per 6 dB of the largest SMR in the band
bits(floor(bark( (Fs/2)*[1:N/2]/(N/2) )) +1) = 0;
for i=1:N/2
    bits(floor(bark( (Fs/2)*i/(N/2) )) +1) = max(bits(floor(bark( (Fs/2)*i/(N/2) )) +1) , ceil( y(i)/6 ));
end
indices = find(bits(1:end) < 2);
bits(indices(1:end)) = 0;

% NEED TO CALCULATE SAMPLES PER SUBBAND
n = 0:N/2-1;
f_Hz = n*Fs/N;
f_kHz = f_Hz / 1000;
A_f = 3.64*f_kHz.^-.8 - 6.5*exp(-.6*(f_kHz-3.3).^2) + 1e-3*f_kHz.^4; % *** Threshold in Quiet
z = 13*atan(0.76*f_kHz) + 3.5*atan((f_kHz/7.5).^2); % *** bark frequency scale
crit_band = floor(z)+1;
num_crit_bands = max(crit_band);
num_crit_band_samples = zeros(num_crit_bands,1);
for i=1:N/2
    num_crit_band_samples(crit_band(i)) = num_crit_band_samples(crit_band(i)) + 1;
end

% Water-filling: give one bit per sample to the neediest subband
% while enough bits remain
x=zeros(1,25);
bitsleft=b;
[blah,i]=max(bits);
while bitsleft > num_crit_band_samples(i)
    [blah,i]=max(bits);
    x(i) = x(i) + 1;
    bits(i) = bits(i) - 1;
    bitsleft=bitsleft-num_crit_band_samples(i);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%         P_ENCODE          %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Quantized_Gain,quantized_words]=p_encode(x2,Fs,framelength,bit_alloc,scalebits)

% Normalize each subband by a power-of-two gain and store log2 of the gain
for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
    indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
    Gain(i) = 2^(ceil(log2((max(abs(x2(indices(1):indices(end))+1e-10))))));
    if Gain(i) < 1
        Gain(i) = 1;
    end
    x2(indices(1):indices(end)) = x2(indices(1):indices(end)) / (Gain(i)+1e-10);
    Quantized_Gain(i) = log2(Gain(i));
end

% Quantize every coefficient with the number of bits given to its subband
for i=1:length(x2)
    quantized_words(i) = midtread_quantizer(x2(i), max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0)+1e-10); % 03/20/03
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%    MIDTREAD_QUANTIZER     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ret_value] = midtread_quantizer(x,R)
Q = 2 / (2^R - 1);   % step size for R bits over the range [-1, 1]
q = quant(x,Q);      % round x to the nearest multiple of Q
s = q<0;             % sign flag, stored in the top bit
ret_value = uint16(abs(q)./Q + s*2^(R-1));

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   MIDTREAD_DEQUANTIZER    %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ret_value] = midtread_dequantizer(x,R)
sgn = (2 * (x < 2^(R-1))) - 1;   % +1 if the top (sign) bit is clear, else -1
Q = 2 / (2^R - 1);
x_uint = uint32(x);
x = bitset(x_uint,R,0);          % clear the sign bit to recover the magnitude
x = double(x);
ret_value = sgn * Q .* x;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%         P_DECODE          %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Fs=p_decode(coded_filename,decoded_filename)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%     READ FILE HEADER      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
fid = fopen(coded_filename,'r');
Fs = fread(fid,1,'ubit16'); % Sampling Frequency
framelength = fread(fid,1,'ubit12'); % Frame Length
bitrate = fread(fid,1,'ubit18'); % Bit Rate
scalebits = fread(fid,1,'ubit4' ); % Number of Scale Bits per Sub-Band
num_frames = fread(fid,1,'ubit26'); % Number of frames

for frame_count=1:num_frames

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %    READ FILE CONTENTS     %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    qbits = sprintf('ubit%i', scalebits);
    gain = fread(fid,25,qbits);
    bit_alloc = fread(fid,25,'ubit4');
    for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1))
            qbits = sprintf('ubit%i', bit_alloc(i));
            InputValues(indices(1):indices(end)) = fread(fid, length(indices) ,qbits);
        else
            InputValues(indices(1):indices(end)) = 0;
        end
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %     DEQUANTIZE VALUES     %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i=1:length(InputValues)
        if InputValues(i) ~= 0
            if max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0) ~= 0
                InputValues(i) = midtread_dequantizer(InputValues(i),...
                    max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0));
            end
        end
    end
    for i=1:25
        gain2(i) = 2^gain(i);
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %        APPLY GAIN         %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        InputValues(indices(1):indices(end)) = InputValues(indices(1):indices(end)) * gain2(i);
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %       INVERSE MDCT        %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    x2((frame_count-1)*framelength+1:frame_count*framelength) = imdct(InputValues(1:framelength/2));
end
status = fclose(fid);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%     RECOMBINE FRAMES      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Overlap-add the decoded frames with a hop of half a frame
x3 = zeros(1,floor((length(x2)-1)/2)+1);
for i=0:0.5:floor(length(x2)/(2*framelength))-1
    x3(i*framelength+1 : (i+1)*framelength) = x3(i*framelength+1 : (i+1)*framelength) + x2((2*i)*framelength+1 : (2*i+1)*framelength);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%        WRITE FILE         %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
wavwrite(x3/2,Fs,decoded_filename);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%           MDCT            %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = mdct(x)
x=x(:);
N=length(x);
n0 = (N/2+1)/2;
wa = sin(([0:N-1]'+0.5)/N*pi);
y = zeros(N/2,1);
x = x .* exp(-j*2*pi*[0:N-1]'/2/N) .* wa;
X = fft(x);
y = real(X(1:N/2) .* exp(-j*2*pi*n0*([0:N/2-1]'+0.5)/N));
y=y(:);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%           IMDCT           %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = imdct(X)
X=X(:);
N = 2*length(X);
ws = sin(([0:N-1]'+0.5)/N*pi);
n0 = (N/2+1)/2;
Y = zeros(N,1);
Y(1:N/2) = X;
Y(N/2+1:N) = -1*flipud(X);
Y = Y .* exp(j*2*pi*[0:N-1]'*n0/N);
y = ifft(Y);
y = 2*ws .* real(y .* exp(j*2*pi*([0:N-1]'+n0)/2/N));
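Frames can overlap by 50% without doubling the amount of data because each frame of 2M samples yields only M MDCT coefficients, and the aliasing this introduces cancels when neighbouring frames are overlap-added. The sketch below checks that property numerically. It is not the FFT-based mdct/imdct pair listed above: it is a direct matrix form of the textbook MDCT definition with a sine window, and the small frame size (M = 8), the 2/M scaling convention, and the test signal are assumptions chosen to keep the example short.

```matlab
M = 8;                                   % coefficients per frame (hop size)
L = 6*M;                                 % length of the test signal
x = cos(0.3*(0:L-1)') + 0.01*(0:L-1)';   % arbitrary test signal (column)

n = (0:2*M-1)';                          % time index within a frame
k = 0:M-1;                               % coefficient index
C = cos(pi/M * (n + 0.5 + M/2) * (k + 0.5));   % 2M-by-M MDCT basis
w = sin(pi*(n + 0.5)/(2*M));             % sine window: w(n)^2 + w(n+M)^2 = 1

y = zeros(L,1);
for s = 0:M:L-2*M                        % frames overlap by M samples
    idx = s + (1:2*M)';
    X = C' * (w .* x(idx));              % forward MDCT: 2M samples -> M values
    y(idx) = y(idx) + (2/M) * w .* (C * X);   % inverse MDCT and overlap-add
end

% Samples covered by two frames are reconstructed; the first and last M
% samples are covered by only one frame and are left aliased.
interior = M+1:L-M;
err = max(abs(y(interior) - x(interior)));
fprintf('max reconstruction error on interior samples: %.2e\n', err);
```

The first and last half-frames are not recovered because they lack a neighbouring frame to cancel their aliasing; a practical coder would typically pad the signal at both ends.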