
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% 9. Computer assignment.
%
% The following assignment requires you to record, in your own voice,
% the digits zero to nine, each spoken in isolation, and then use
% Matlab and the waveform files for analysis.
%
% (a) Record the digits at 16 kHz sampling rate and 16 bits per
% sample in a single channel, ensuring that background noise is
% minimum. What is the estimated range of SNR for digit zero, where
% the noise is due to the background? Explain.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Sol> I recorded my voice at a 16 kHz sampling rate and 16 bits per sample in a single channel, with minimum noise. To get the SNR range, I need to split the zero sound into 10 ms segments. I chose one segment for noise, one for the minimum signal, and one for the maximum signal, so that I could calculate the SNR for the maximum and the minimum value, which gives the SNR range of the zero sound.

Fig a-1. Time waveform of zero

Fig a-2. Maximum segment (10 ms): /i/ sound in /z-rou/, amplitude from -0.5332 to 0.3367

Fig a-3. Minimum segment (10 ms): /z/ sound in /z-rou/, amplitude from -0.04285 to 0.02557

Fig a-4. Noise segment (10 ms), amplitude from -0.003723 to 0.004272

So, I could estimate the values from the 10 ms segments shown in Fig a-2, a-3 and a-4. The signal power of a segment is taken as the sum of the squared sample values. All three segments have the same length (160 samples), so the ratio of these sums equals the ratio of the average powers.

Signal power = sum(segment amplitude.^2);
max_sig_pwr = 1.9333
min_sig_pwr = 0.0238
noise_sig_pwr = 3.2553e-004

In dB scale, 10log10(signal power):
db_max_sig = 2.8630 dB
db_min_sig = -16.2391 dB
db_noise_sig = -34.8741 dB

The SNR is the signal power in dB minus the noise power in dB. Thus, the SNR ranges from 18.635027 dB to 37.737136 dB.
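The same calculation can be tried without the recording. The sketch below is only an illustration on synthetic data, not the script used for the figures: the 200 Hz tone, the three amplitudes and the 1300 Hz tone that stands in for background noise are all assumptions, and `segment_db` is a hypothetical helper name.

```matlab
% Minimal sketch on synthetic data (assumed values, not the recording).
fs = 16000;                 % sampling rate in Hz
n  = round(0.010 * fs);     % 10 ms -> 160 samples
t  = (0:n-1)' / fs;         % time axis of one segment in seconds

loud_segment  = 0.5   * sin(2*pi*200*t);    % stands in for the loudest 10 ms
quiet_segment = 0.03  * sin(2*pi*200*t);    % stands in for the quietest speech
noise_segment = 0.002 * cos(2*pi*1300*t);   % stands in for background noise

% Level of a segment in dB: 10*log10(sum of squared samples)
segment_db = @(x) 10 * log10(sum(x.^2));

% SNR in dB = signal level in dB - noise level in dB
max_snr = segment_db(loud_segment)  - segment_db(noise_segment);
min_snr = segment_db(quiet_segment) - segment_db(noise_segment);

fprintf('SNR range: %.2f dB to %.2f dB\n', min_snr, max_snr);
```

Because all segments have the same number of samples, subtracting the dB levels gives the same result as taking 10log10 of the ratio of average powers.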

Matlab Source code -

clear
% Note: wavread was removed from recent MATLAB releases; audioread is its replacement.
[y, fs, nbits] = wavread('zero.wav');
time=(1:length(y))/fs;    % Time vector on x-axis
figure('name','zero : time waveform')
plot(time, y)             % Plot the waveform w.r.t. time
xlabel('time(sec)'); ylabel('amplitude');
pause

maximum=max(y);
% finding maximum value's index
for i=1:length(y)
    if(y(i) == maximum)
        maximum_middle_index=i;
    end
end

% in case of zero, starting point for 'z' sound is minimum.
% so, in this case, I find start point whose amplitude is larger than 0.02.
for i=1:length(y)
    if(y(i) > 0.02)
        minimum_starting_index=i;
        break
    end
end

% Sampling period of the recording is 1/16000 sec, so 10 ms is 160 samples.
time_segment=1:160;
maximum_segment=zeros(160,1);   % declare maximum segment in array
minimum_segment=zeros(160,1);   % declare minimum segment in array
noise_segment=zeros(160,1);     % declare noise segment in array

% maximum segment: 160 samples centered on the maximum sample
for i=1:160
    maximum_segment(i)=y(maximum_middle_index+i-81);
end
% minimum segment: 160 samples from the start of the 'z' sound
for i=1:160
    minimum_segment(i)=y(minimum_starting_index+i-1);
end
% noise segment: first 160 samples of the recording
for i=1:160
    noise_segment(i)=y(i);
end

figure('name','maximum segments')
plot(time_segment, maximum_segment)
xlabel('sample'); ylabel('amplitude');
pause
figure('name','minimum segments')
plot(time_segment, minimum_segment)
xlabel('sample'); ylabel('amplitude');
pause
figure('name','noise segments')
plot(time_segment, noise_segment)
xlabel('sample'); ylabel('amplitude');

max_sig_pwr=sum(maximum_segment.^2);
min_sig_pwr=sum(minimum_segment.^2);
noise_sig_pwr=sum(noise_segment.^2);

db_max_sig=10*log10(max_sig_pwr);
db_min_sig=10*log10(min_sig_pwr);
db_noise_sig=10*log10(noise_sig_pwr);

max_snr=db_max_sig-db_noise_sig;
min_snr=db_min_sig-db_noise_sig;

fprintf('The variation of SNR range is %f dB to %f dB.\n', min_snr, max_snr);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (b) Using (i) time waveform, and, (ii) narrowband spectrograms,
% obtain your fundamental frequency from suitable voiced segments of
% two digits. Identify the segment on the waveform in your
% submission.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Sol> There are two ways to get the fundamental frequency. One is to analyze the time waveform and find the pitch period; the other is to analyze the narrowband spectrogram and find the harmonics.

(i) Time waveform

I need to observe the pitch period in a periodic part of the signal, which is the voiced sound. After obtaining the pitch period, I get the fundamental frequency as 1 / (pitch period). The pitch period is almost the same in any voiced signal from the same speaker. Unlike the formants, which depend on the phoneme, the pitch period depends on the speaker, so it could be used for speaker recognition.

Fig b-1. Zero

Fig b-2. One

Fig b-3. Four pitch cycles of the zero sound: 0.0310 sec (pitch period = span / 4)

Fig b-4. Four pitch cycles of the one sound: 0.0325 sec (pitch period = span / 4)
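As a cross-check on the readings in Fig b-3 and b-4, the conversion from a four-cycle span to a fundamental frequency can be sketched as below. The variable names are illustrative assumptions; only the two spans and the cycle count come from the figures. Working with unrounded periods gives values slightly below those obtained when each period is first rounded to four decimals.

```matlab
% Spans of four pitch cycles read off Fig b-3 and Fig b-4, in seconds.
span_zero = 0.0310;
span_one  = 0.0325;
cycles    = 4;

period_zero = span_zero / cycles;   % pitch period of "zero" in seconds
period_one  = span_one  / cycles;   % pitch period of "one" in seconds

f0_zero = 1 / period_zero;          % fundamental frequency in Hz
f0_one  = 1 / period_one;
f0_mean = mean([f0_zero, f0_one]);  % average of the two estimates

fprintf('zero: %.1f Hz, one: %.1f Hz, mean: %.1f Hz\n', f0_zero, f0_one, f0_mean);
```

Averaging over four cycles reduces the effect of reading error at the two marked points compared with measuring a single cycle.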

Note that I used a continuous time axis (seconds) on the x-axis. If I had used the discrete time axis, which is the sample index, I would have to take the difference between the two pitch marks and divide it by fs (sampling rate: 16000 samples/sec). That could be more accurate, but the difference is very small. So, in my case, I just need to read the time span between the two marks, which is 0.0310 sec for zero and 0.0325 sec for one. Both measurements span 4 pitch cycles, so I need the average over these cycles. Dividing by 4 gives 0.0077 sec and 0.0081 sec. This is my pitch period P. My fundamental frequency F is then

F = 1/P
129.8701 = 1 / 0.0077, or
123.4568 = 1 / 0.0081

Thus, my fundamental frequency from the time domain is approximately 126.7 Hz.

(ii) Narrowband spectrogram

Fig b-5. Narrowband spectrogram of zero sound

Fig b-6. Narrowband spectrogram of one sound

Fig b-7. Horizontal lines between 0 and 1000 Hz for zero: 8 harmonics

Fig b-8. Horizontal lines between 0 and 1000 Hz for one: 8 harmonics

To obtain the narrowband spectrogram, I have to decide the window size and the overlap size. Usually a window of 3 to 10 ms is used for wideband and a window of 20 to 30 ms for narrowband. The overlap should be larger than half of the window size and smaller than the window size. If I want to use a 25 ms window, I have to convert it to a number of samples: 10 ms is 160 samples at this sampling rate, so 25 ms is 400 samples. To get a cleaner picture I set the window slightly longer. Here I used 512 samples (32 ms) for the window and 480 samples for the overlap.

As you can see in Fig b-7 and b-8, there are 8 harmonics between 0 and 1000 Hz. The fundamental frequency is therefore approximately 1000 divided by 8, which is 125 Hz.

Thus, from the time-domain waveform I got 126.7 Hz for the fundamental frequency of my voice, and from the narrowband spectrogram I got 125 Hz. A voice signal is dynamic and changes every time, even for the same speaker and the same phoneme. Still, this question gave me some idea of speaker recognition: even though many features differ, the fundamental frequency is fairly stable.
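The window arithmetic and the harmonic count above can be checked with a few lines. This is a minimal sketch, not part of the original scripts; `ms_to_samples` is a hypothetical helper, and the harmonic estimate assumes the 8 harmonics fill the 0 to 1000 Hz band evenly.

```matlab
fs = 16000;                              % sampling rate in Hz

% Convert a window duration in milliseconds to a sample count.
ms_to_samples = @(ms) round(ms * 1e-3 * fs);

narrow_25ms = ms_to_samples(25);         % narrowband example from the text
wide_3ms    = ms_to_samples(3);          % shortest wideband window quoted

% Narrowband settings used in this report.
window_len = 512;
overlap    = 480;
window_ms  = 1000 * window_len / fs;               % window duration in ms
hop_ms     = 1000 * (window_len - overlap) / fs;   % step between frames in ms

% Harmonics are spaced F0 apart, so band width / harmonic count
% approximates F0 (assumption: the harmonics fill the band).
band_hz     = 1000;
n_harmonics = 8;
f0_estimate = band_hz / n_harmonics;

fprintf('25 ms = %d samples, 3 ms = %d samples\n', narrow_25ms, wide_3ms);
fprintf('window %.0f ms, hop %.0f ms, F0 about %.0f Hz\n', window_ms, hop_ms, f0_estimate);
```

A longer window resolves individual harmonics in frequency at the cost of time resolution, which is why the narrowband setting is the one used for counting harmonics.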

Matlab Source code

clear
[y, fs] = wavread('zero.wav');
time=(1:length(y))/fs;    % Time vector on x-axis
figure('name','zero : time waveform')
plot(time, y)             % Plot the waveform w.r.t. time
xlabel('Time(sec)'); ylabel('Amplitude');
pause
figure('name','zero : narrowband spectrogram')
spectrogram(y,512,480,1024,fs,'yaxis')   % display spectrogram in new window
xlabel('Time(sec)'); ylabel('Frequency(Hz)');
pause

[x, fs] = wavread('one.wav');
time=(1:length(x))/fs;    % Time vector on x-axis
figure('name','one : time waveform')
plot(time, x)             % Plot the waveform w.r.t. time
xlabel('Time(sec)'); ylabel('Amplitude');
pause
figure('name','one : narrowband spectrogram')
spectrogram(x,512,480,1024,fs,'yaxis')   % display spectrogram in new window
xlabel('Time(sec)'); ylabel('Frequency(Hz)');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (c) Take a printout of the time-waveform of each digit. Mark the
% boundaries of voiced, unvoiced and silence segments, and label the
% segments.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Sol> I plotted all digits and marked the boundaries of the voiced, unvoiced and silence segments. The segment labels were added by pen. The results follow.

Fig c-1. Time waveform of zero

Fig c-2. Time waveform of one

Fig c-3. Time waveform of two

Fig c-4. Time waveform of three

Fig c-5. Time waveform of four

Fig c-6. Time waveform of five

Fig c-7. Time waveform of six

Fig c-8. Time waveform of seven

Fig c-9. Time waveform of eight

Fig c-10. Time waveform of nine

Matlab Source code -

clear
names = {'zero','one','two','three','four','five','six','seven','eight','nine'};
for k = 1:length(names)
    [y, fs] = wavread([names{k} '.wav']);
    figure('name', names{k})
    plot(y)               % Plot the waveform against the sample index
    xlabel('Sample'); ylabel('Amplitude');
    if k < length(names)
        pause             % wait for a key press before the next digit
    end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (d) From any one digit that displays large intensity variation,
% extract two 10 ms segments - one each from the maximum and minimum
% intensity regions (excluding silence regions). Obtain the energy in
% the two segments, and express the variation in dB SPL. Label the
% phonetic utterance associated with the two segments.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Sol> First, I need to get the maximum and minimum segments. The results follow.

Fig d-1. Time waveform of seven. Max is /e/, min is /s/ as phonetic utterance

Fig d-2. Maximum segments of /e/ sound in /sevn/

Fig d-3. Minimum segments of /s/ sound in /sevn/

To obtain the energy in these two segments, I calculate the energy of the maximum and of the minimum segment, which is the sum of the squared samples of each segment. With these values, the dB scale follows from 10log10(energy of the segment).

max_pwr=sum(maximum_segment.^2);
db_max=10*log10(max_pwr);
min_pwr=sum(minimum_segment.^2);
db_min=10*log10(min_pwr);

So, the results are:
Maximum segment energy = 0.6422
Minimum segment energy = 8.1978e-004

Thus, the variation in dB SPL is
28.9397 dB = -1.9233 dB - (-30.8630 dB)

The difference between two levels does not depend on the reference value, so the variation is the same whether or not the levels are calibrated to dB SPL.
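The arithmetic can be verified directly from the two reported energies. The sketch below is a worked check only; the variable names are assumptions, and the two energy values are the ones quoted in the text.

```matlab
% Segment energies reported in the text for the digit "seven".
max_energy = 0.6422;        % /e/ segment
min_energy = 8.1978e-004;   % /s/ segment

db_max = 10 * log10(max_energy);
db_min = 10 * log10(min_energy);

% The variation is the difference of the two levels, which equals
% 10*log10 of the energy ratio; any common reference cancels out.
variation_db = db_max - db_min;
ratio_db     = 10 * log10(max_energy / min_energy);

fprintf('max %.4f dB, min %.4f dB, variation %.4f dB\n', db_max, db_min, variation_db);
```

Both routes, subtracting the levels and taking the level of the ratio, give the same number, which is a quick way to catch a sign error.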

Matlab Source code -

clear
[y, fs, nbits] = wavread('seven.wav');
figure('name','seven')
plot(y)                   % Plot the waveform against the sample index
xlabel('Sample'); ylabel('Amplitude');
pause

maximum=max(y);
% finding maximum value's index
for i=1:length(y)
    if(y(i) == maximum)
        maximum_middle_index=i;
    end
end

% in case of seven, starting point for 's' sound is minimum.
% so, in this case, I find start point which is larger than 0.005.
for i=1:length(y)
    if(y(i) > 0.005)
        minimum_starting_index=i;
        break
    end
end

% Sampling period of the recording is 1/16000 sec, so 10 ms is 160 samples.
time_segment=1:160;
maximum_segment=zeros(160,1);   % declare maximum segment in array
minimum_segment=zeros(160,1);   % declare minimum segment in array

% maximum segment: 160 samples centered on the maximum sample
for i=1:160
    maximum_segment(i)=y(maximum_middle_index+i-81);
end
% minimum segment: 160 samples from the start of the 's' sound
for i=1:160
    minimum_segment(i)=y(minimum_starting_index+i-1);
end

figure('name','maximum segments')
plot(time_segment, maximum_segment)
xlabel('Sample'); ylabel('Amplitude');
pause
figure('name','minimum segments')
plot(time_segment, minimum_segment)
xlabel('Sample'); ylabel('Amplitude');

max_pwr=sum(maximum_segment.^2);
db_max=10*log10(max_pwr);
min_pwr=sum(minimum_segment.^2);
db_min=10*log10(min_pwr);

% to label the phonetic utterance associated with the two segments.
% maximum segment's phonetic is '[e]' among [sevn].
% minimum segment's phonetic is '[s]' among [sevn].

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (e) From the printout of the wideband spectrogram of the digits,
% identify one segment each of following types: (a) vowel,
% (b) fricative, and, (c) stop consonant, by marking their boundaries.
% Study the spectrogram and list all properties that you can identify
% from these segments. Give the spectrogram settings used. Mark all
% axes properly.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Sol> For the wideband spectrogram I used a window size of 3 ms, which is 48 samples, since 10 ms is 160 samples. I set the overlap to half of the window size, which is 24 samples. From these wideband spectrograms I could identify the speaker's gender (mostly), formant resonances, vowels, fricatives, stop consonants, voiced, unvoiced and silence regions, maxima, fundamental frequency, and pitch period. The items I could identify are marked by pen on the figures below.

Fig e-1. Wideband spectrogram of zero

Fig e-2. Wideband spectrogram of one

Fig e-3. Wideband spectrogram of two

Fig e-4. Wideband spectrogram of three

Fig e-5. Wideband spectrogram of four

Fig e-6. Wideband spectrogram of five

Fig e-7. Wideband spectrogram of six

Fig e-8. Wideband spectrogram of seven

Fig e-9. Wideband spectrogram of eight

Fig e-10. Wideband spectrogram of nine

Matlab Source code

clear
names = {'zero','one','two','three','four','five','six','seven','eight','nine'};
for k = 1:length(names)
    [x, fs] = wavread([names{k} '.wav']);
    figure('name', names{k})             % display spectrogram in new window
    spectrogram(x,48,24,1024,fs,'yaxis'), colorbar
    xlabel('Time(sec)'); ylabel('Frequency(Hz)');
    if k < length(names)
        pause             % wait for a key press before the next digit
    end
end
