Project Report
Written by:
Jordan D. Ulmer, Joshua Behnken
Instructor:
Dr. Tan
Design Initiated:
04/26/2015
Report Submission:
05/05/2015
Background:
In today's technology-driven world, communication is essential. Signal processing, and more specifically
voice data processing, is a key component of communications at large. Computer-automated telephone
systems are frequently employed to direct the customers of a business to the department most
applicable to their needs.
By characterizing many common, albeit simple, vocal patterns, complex vocal signals can be digitally
reproduced. This vocal synthesis is useful in text-to-speech and other speech-based methods of
communication. Furthermore, voice audio may be processed with digital filters, which is useful in
speech-to-text transcription.
Introduction:
This brief study of digital filters and their application to voice synthesis consisted of three parts. First, a
resonator filter was designed and tested on a simulated signal. Second, the waveform resulting from the
voiced phrase "arr" was analyzed, and a cascade of resonator filters was designed to simulate the original
waveform. Third, the waveform resulting from the unvoiced phrase "Shh" was analyzed, and a cascade
of resonator filters was designed to simulate the original waveform.
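$$\omega(f_0) = \frac{f_0}{f_s/2}\,\pi \in [0, \pi] \tag{1}$$
$$H(z) = \frac{1}{\left(z - R(\cos\omega + j\sin\omega)\right)\left(z - R(\cos\omega - j\sin\omega)\right)} \tag{2}$$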
A resonator filter (7) was designed for the center frequency in (3) at the sampling frequency in (4), yielding the
normalized frequency in (5), with the chosen responsivity R defined in (6).
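$$f_0 = 600\ [\mathrm{Hz}] \tag{3}$$
$$f_s = 15\ [\mathrm{kHz}] \tag{4}$$
$$\omega(f_0) = \frac{600\ [\mathrm{Hz}]}{15\ [\mathrm{kHz}]/2}\,\pi \approx 0.25133\ [\mathrm{rad}] \tag{5}$$
$$R = 0.9 \tag{6}$$
$$H_{50,500,5k}(z) = \frac{1}{\left(z - 0.9(\cos 0.25133 + j\sin 0.25133)\right)\left(z - 0.9(\cos 0.25133 - j\sin 0.25133)\right)} \tag{7}$$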
The frequency response (Bode plot) of the system in (7) was plotted in Figure 1.
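As a rough illustration of the design steps above, the following MATLAB sketch computes the normalized frequency and evaluates the resonator's magnitude response at a few frequencies using only base MATLAB functions. The variable names (`f0`, `w0`, `den`, `f_eval`) and the choice of evaluation frequencies are illustrative assumptions, not taken from the project code.

```matlab
% Design values stated in the text (600 Hz center, 15 kHz sampling, R = 0.9)
f0 = 600;        % center frequency [Hz]
fs = 15e3;       % sampling frequency [Hz]
R  = 0.9;        % responsivity (pole radius)

% Normalized frequency: w = f*pi/(fs/2)
w0 = pi * f0 / (fs/2);            % [rad/sample]

% Resonator with a complex-conjugate pole pair at R*exp(+/- j*w0)
p   = R * exp(1j * w0);           % upper pole
den = real(poly([p, conj(p)]));   % z^2 - 2*R*cos(w0)*z + R^2

% Evaluate H(z) = 1/den(z) on the unit circle (illustrative frequencies)
f_eval = [50 500 600 5000];                   % [Hz]
z      = exp(1j * pi * f_eval / (fs/2));
H      = 1 ./ polyval(den, z);
mag_dB = 20 * log10(abs(H));

fprintf('w0 = %.5f rad/sample\n', w0);
fprintf('|H| at %4d Hz: %6.2f dB\n', [f_eval; mag_dB]);
```

Evaluating the polynomial directly with `polyval` avoids any toolbox dependency; a toolbox function for frequency responses could be substituted where available.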
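A test waveform (8), shown in Figure 2, was then generated using $1.5 \times 10^3$ [ ] as a combination of
sinusoids with frequencies $f_1 = 50$ [Hz], $f_2 = 500$ [Hz] and $f_3 = 5$ [kHz], where $\omega_i$ is the normalized frequency of $f_i$.
$$x[n] = \cos(\omega_1 n) + \cos(\omega_2 n) + \cos(\omega_3 n) \tag{8}$$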
The resonator filter in (7) was then applied to the test waveform, and the frequency-domain response is
shown in Figure 3. At 600 [Hz], the magnitude of the test waveform's frequency response increased from
-64 [dB] in the unfiltered waveform to 6.2 [dB] in the filtered waveform. Notably, the 50 [Hz] and
500 [Hz] peaks were amplified, while the 5 [kHz] peak was slightly attenuated, as shown in Figure 3.
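The filtering experiment described above can be sketched in a few lines of base MATLAB. This is a minimal sketch under stated assumptions: a one-second test signal (so that each FFT bin is 1 Hz wide) and illustrative variable names; it is not the project's own script.

```matlab
fs = 15e3;                         % sampling frequency [Hz]
t  = (0:fs-1) / fs;                % assumed duration: 1 [s]
x  = cos(2*pi*50*t) + cos(2*pi*500*t) + cos(2*pi*5000*t);

% Resonator centered at 600 Hz with R = 0.9
R  = 0.9;
w0 = pi * 600 / (fs/2);
a  = [1, -2*R*cos(w0), R^2];       % denominator in powers of z^-1
b  = [0, 0, 1];                    % 1/(z^2 + ...) equals z^-2/(1 + ...)
y  = filter(b, a, x);

% Compare spectra at the three tone frequencies
X = abs(fft(x));
Y = abs(fft(y));
bins    = [50 500 5000] + 1;       % 1 [Hz] per bin; MATLAB indices start at 1
gain_dB = 20 * log10(Y(bins) ./ X(bins));
fprintf('Gain at %4d Hz: %6.2f dB\n', [50 500 5000; gain_dB]);
```

The per-tone gains printed here can be compared against the before and after cursor values reported for Figure 3.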
[Figure 3 panels, System #(0): unfiltered and filtered simulated signal, dB(H_{50,500,5k} * X_Z(z)); magnitude [dB] and phase [degrees] versus frequency [Hz]; resonator filter center frequency at 600 [Hz].]
[Data cursors, unfiltered magnitude: 37.5 [dB] at 51, 501 and 5001 [Hz]; -64.07 [dB] at 600 [Hz]. Filtered magnitude: 61.07 [dB] at 51 [Hz], 63.89 [dB] at 501 [Hz], 29.03 [dB] at 5001 [Hz], 6.159 [dB] at 600 [Hz].]
Figure 3: [Preliminary Design] Simulated Test Waveform Frequency Response Before and After Filtering In Decibels
Voice Characterization:
During the characterization stage, the voiced phrase "arr" and the unvoiced phrase "Shh" were analyzed.
Two voices enunciated these phrases to provide a basis of comparison, and the corresponding
fundamental frequencies of the enunciated phrases were identified.
(9)
The fundamental frequency of the spoken phrase "arr" was identified empirically (Table 1), and the
significant harmonics are listed in Table 2.
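One simple way to locate a fundamental frequency empirically is to find the largest peak in the FFT magnitude. The sketch below uses a synthetic three-harmonic signal as a stand-in for a recording; the amplitudes, the 5 second duration, and all variable names are assumptions made for illustration.

```matlab
fs = 8000;                         % sampling frequency [Hz]
N  = 5 * fs;                       % assumed 5 [s] of samples
t  = (0:N-1) / fs;

% Synthetic stand-in for a voiced recording (hypothetical amplitudes)
f_true = 123;
y = cos(2*pi*f_true*t) + 0.4*cos(2*pi*2*f_true*t) + 0.2*cos(2*pi*3*f_true*t);

% One-sided magnitude spectrum and its frequency axis
Y      = abs(fft(y));
f_axis = (0:N-1) * fs / N;         % bin spacing fs/N = 0.2 [Hz]
half   = 1:floor(N/2);

% Largest peak taken as the fundamental estimate
[~, k]  = max(Y(half));
f0_est  = f_axis(k);
harmonics_expected = f0_est * (2:5);   % where to look for harmonics
fprintf('Estimated fundamental: %.1f Hz\n', f0_est);
```

The strongest peak is not guaranteed to be the fundamental in a real recording, so a visual check against the harmonic spacing is still advisable.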
Table 1: [Characterization](arr-Jordan)(arr-Josh) Fundamental Frequency of the Spoken Phrase "arr"
Harmonic UID    Jordan Empirical Harmonics [Hz]    Josh Empirical Harmonics [Hz]
0               123                                68
1               246                                135
2               371                                204
3               492                                273
4               611                                341
5               -                                  409
6               -                                  475
7               -                                  541

Peak UID        Jordan Significant Frequencies [Hz]    Josh Significant Frequencies [Hz]
1               183                                    85
2               1731                                   1801
(2,3)           2041 (weighted average)                1947 (weighted average)
3               2164                                   2067
4               3125                                   3000
[Figure: "arr" (Jordan) time waveform, magnitude versus time sample (0 [s] - 5 [s]) at f_sampling = 8000 [samples/second], and FFT magnitude |y(jw)| versus frequency [Hz]. Fundamental frequency: 123 [Hz]. Empirical harmonics: (1) 246 [Hz], (2) 371 [Hz], (3) 492 [Hz], (4) 611 [Hz].]
[Figure: "arr" (Josh) time waveform and FFT magnitude, same axes. Fundamental frequency: 68 [Hz]. Empirical harmonics: (1) 135 [Hz], (2) 204 [Hz], (3) 273 [Hz], (4) 341 [Hz], (5) 409 [Hz], (6) 475 [Hz], (7) 541 [Hz].]
[Figure: "Shh" (Jordan) time waveform and FFT magnitude, same axes. Significant peaks: (1) 183 [Hz], (2) 1731 [Hz], (3) 2164 [Hz], (4) 3125 [Hz]. Weighted average: 2041 [Hz].]
[Figure: "Shh" (Josh) time waveform and FFT magnitude, same axes. Significant peaks: (1) 85 [Hz], (2) 1801 [Hz], (3) 2067 [Hz], (4) 3000 [Hz].]
Voice Synthesis:
During the synthesis stage, resonator filters were designed for both the voiced phrase "arr" and the
unvoiced phrase "Shh". The frequency response (Bode plot) of each resonator filter was plotted against the
original frequency response of the respective spoken phrase, and the significant peaks of the filter were
validated. Each filter was applied to a unique input signal, and in this way the voiced phrase "arr" and the
unvoiced phrase "Shh" were synthesized.
Thus, three resonator filters were designed in cascade. For both filters, a sampling frequency of 8 [kHz] was
used. The normalized frequencies resulting from the aforementioned significant center frequencies and the
chosen responsivities of each resonator filter of the form described in (2) are detailed in Table 4.
Table 4: [Initial Design](arr-Jordan)(arr-Josh) Resonator Filter Parameters for the Spoken Sound "arr"
                                         Single Filter    Cascaded Filter
Parameter Name          Symbol  Units    Jordan Filter-1  Josh Filter-1  Josh Filter-2  Josh Filter-3
Harmonic #              -       -        0                1              5              7
Center Frequency        f_0     [Hz]     123              135            409            541
Sampling Frequency      f_s     [kHz]    8                8              8              8
Normalized Frequency    w       [rad]    0.0966           0.106          0.321          0.425
Responsivity            R       -        0.95             0.9            0.6            0.2
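The normalized frequencies in Table 4 follow directly from w = f*pi/(fs/2), and a cascade of resonators can be represented by multiplying the second-order denominators together. The sketch below shows this for the three Josh filters; the variable names and the use of `conv` to build the cascade are illustrative choices rather than the project's own implementation.

```matlab
fs  = 8000;                        % sampling frequency [Hz]
f_c = [135 409 541];               % Josh Filter-1..3 center frequencies [Hz]
R   = [0.9 0.6 0.2];               % corresponding responsivities

% Normalized frequencies [rad/sample]
w = pi * f_c / (fs/2);

% Cascade: multiply the second-order denominators z^2 - 2R*cos(w)*z + R^2
den = 1;
for i = 1:numel(f_c)
    den = conv(den, [1, -2*R(i)*cos(w(i)), R(i)^2]);
end

fprintf('w = %.4f rad/sample\n', w);
fprintf('Cascade order: %d\n', numel(den) - 1);
```

Because polynomial multiplication is equivalent to cascading the sections, `den` is the sixth-order denominator of the overall filter.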
$$x_1[n] = \sum_{k=1}^{K} \cos\left[2\pi k f_0\, n / f_s\right] \tag{10}$$
2. The input magnitude was scaled to resemble the magnitude of the recorded inputs (Figure 4 and
Figure 6).
$$x_2[n] = x_1[n]\left(\frac{1}{100}\right) \tag{11}$$
3. Using MATLAB, random noise (12) was added to the signal (14) with a specified signal-to-noise ratio
(13), resulting in a slight positive DC bias.
$$n[n] = \mathrm{rand}(\,\cdot\,); \tag{12}$$
$$\frac{x_2[n]}{n[n]} = 30 \tag{13}$$
$$x_3[n] = x_2[n] + n[n] \tag{14}$$
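The input-generation steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the number of harmonics `K` and the noise amplitude `noise_amplitude` are hypothetical values chosen for illustration, and the variable names do not come from the project code. It also shows why uniform noise from `rand` introduces a positive DC bias: its samples lie between 0 and 1, so their mean is positive.

```matlab
fs = 8000;                         % sampling frequency [Hz]
t  = (1:fs) / fs;                  % 1 [s] time vector
f0 = 123;                          % fundamental frequency [Hz]
K  = 5;                            % hypothetical number of harmonics

% Step 1: sum of harmonically related cosines
x1 = zeros(size(t));
for k = 1:K
    x1 = x1 + cos(2*pi*k*f0*t);
end

% Step 2: scale to resemble the recorded magnitude
x2 = x1 / 100;

% Step 3: add uniform random noise (hypothetical amplitude)
noise_amplitude = 1e-3;
noise = noise_amplitude * rand(size(t));
x3 = x2 + noise;

% rand() is uniform on (0,1), so the noise mean is positive (DC bias)
fprintf('Mean of x2: %.2e, mean of x3: %.2e\n', mean(x2), mean(x3));
```

Subtracting `mean(noise)` from the noise, or using zero-mean noise, would remove the bias if it is not wanted.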
(15)
Parameter Name          Symbol  Units         Jordan Single Filter  Josh Single Filter
Harmonic #              -       -             0                     0
Center Frequency        f_0     [Hz]          123                   68
Sampling Frequency      f_s     [kHz]         8                     8
Normalized Frequency    w       [rad/sample]  0.0966                0.0534
Responsivity            R       -             0.75                  0.55
[Figure: Bode plot of the single resonator filter for "arr" (Jordan), magnitude [dB] and phase [degrees] versus frequency [Hz]. At 123 [Hz]: magnitude = 22.48 [dB], phase = -28.61 [degrees].]
[Figure: Bode plot of the single resonator filter for "arr" (Josh), magnitude [dB] and phase [degrees] versus frequency [Hz]. At 68 [Hz]: magnitude = 13.74 [dB], phase = -7.373 [degrees].]
[Figures: simulated input time waveforms, magnitude versus time sample (0 [s] - 1 [s]) at f_sampling = 8000 [samples/second].]
The frequency-domain effects resulting from the application of the resonator filters H_arrJDU and H_arrJOSH
are depicted in Figure 16 and Figure 17, respectively.
[Figure 16 panels, System #(1): original simulated signal, dB(X_z(r) + Gaussian noise), and filtered simulated signal, dB(H_arrJDU * X_Z(z)), versus the recording arr-Jordan.mat; magnitude [dB] and phase [degrees] versus frequency [Hz]. Data cursors: unfiltered 32.04 [dB] at 124, 370 and 1231 [Hz]; filtered 54.48 [dB] at 124 [Hz], 50.06 [dB] at 370 [Hz], 35.05 [dB] at 1231 [Hz].]
[Figure 17 panels, System #(2): original simulated signal and filtered simulated signal, dB(H_arrJOSH * X_Z(z)), versus the recording arr-Josh.mat; magnitude [dB] and phase [degrees] versus frequency [Hz]. Data cursors: unfiltered 32.04 [dB] at 69, 205 and 681 [Hz]; filtered 45.74 [dB] at 69 [Hz], 45.23 [dB] at 205 [Hz], 40.98 [dB] at 681 [Hz].]
An absolute frequency response plot of the resulting filtered signal and the original recorded
waveform for the spoken phrase "arr" is plotted in Figure 18.
Notably, the basic spectral trend was matched and the resulting voice synthesis was aurally verified using the MATLAB code placed in the Appendix.
Figure 18: [Final Design "arr"] Side-by-Side Frequency Response Comparison in Absolute Units
Left (arr-Jordan) and Recording (Red); Right (arr-Josh) and Recording (Red)
frequency response. For both filters, a sampling frequency of 8 [kHz] was used. The normalized frequencies
resulting from the aforementioned significant center frequencies and the chosen responsivities of each
resonator filter of the form described in (2) are detailed in Table 6.
Table 6: [Initial Design] Resonator Filter Parameters for the Spoken Sound "Shh"
                                         Cascaded Filter (Jordan)          Cascaded Filter (Josh)
Parameter Name          Symbol  Units    Jordan Filter-1  Jordan Filter-2  Josh Filter-1  Josh Filter-2
Center Frequency        f_0     [Hz]     183              2041             85             1947
Sampling Frequency      f_s     [kHz]    8                8                8              8
Normalized Frequency    w       [rad]    0.144            1.60             0.0668         1.53
Responsivity            R       -        0.9              0.4              0.9            0.5
(16)
$$x_3[n] = \mathrm{awgn}(\mathrm{zeros}(\,\cdot\,), \mathrm{SNR}); \tag{17}$$
Steps 4, 5 and 6 are identical to those in the section titled "Filter Design Process For arr":
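For readers without the Communications Toolbox, the noise input for the unvoiced sound can be approximated with base MATLAB. The sketch below swaps in `randn` for `awgn`; it assumes the default `awgn` behavior of treating the signal power as 0 [dBW], so that an SNR of 10 [dB] corresponds to a noise power of -10 [dBW]. The variable names are illustrative.

```matlab
fs     = 8000;                     % sampling frequency [Hz]
N      = fs;                       % 1 [s] of samples
SNR_dB = 10;                       % value used in the appendix code

% Noise power relative to an assumed 0 dBW signal power
noise_power = 10^(-SNR_dB/10);     % linear power, here 0.1

% Zero-mean white Gaussian noise with that power (randn replaces awgn)
x3 = sqrt(noise_power) * randn(1, N);

fprintf('Sample mean: %.4f, sample variance: %.4f\n', mean(x3), var(x3));
```

Unlike the uniform `rand` noise used for the voiced input, this noise is zero-mean, so it does not introduce a DC bias.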
Parameter Name        Symbol  Units  Jordan    Jordan    Jordan    Jordan    Jordan    Jordan
                                     Filter-1  Filter-2  Filter-3  Filter-4  Filter-5  Filter-6
Peak UID              -       -      1         1         3         3         4         4
Center Frequency      f_0     [Hz]   183       183       2164      2164      3125      3125
Sampling Frequency    f_s     [kHz]  8         8         8         8         8         8
Normalized Frequency  w       [rad]  0.144     0.144     1.70      1.70      2.45      2.45
Responsivity          R       -      0.9       0.8       0.95      0.8       0.8       0.8
Table 8: [Final Design](Shh-Josh) Resonator Filter Parameters for the Spoken Sound "Shh"
Parameter Name        Symbol  Units  Josh      Josh      Josh      Josh      Josh      Josh      Josh      Josh
                                     Filter-1  Filter-2  Filter-3  Filter-4  Filter-5  Filter-6  Filter-7  Filter-8
Peak UID (Table 3)    -       -      1         1         2*        2*        3**       3**       4         4
Center Frequency      f_0     [Hz]   85        85        1751*     1751*     2107**    2107**    3000      3000
Sampling Frequency    f_s     [kHz]  8         8         8         8         8         8         8         8
Normalized Frequency  w       [rad]  0.0668    0.0668    1.38      1.38      1.70      1.70      2.36      2.36
Responsivity          R       -      0.83      0.83      0.8       0.8       0.85      0.85      0.6       0.6
*  Shifted from Peak UID 2: 1751 = (1801) - 50
** Shifted from Peak UID 3: 2107 = (2067) + 100
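The cascaded designs in the tables above can be summarized in one expression. The following is a sketch that assumes the cascade is a plain product of M resonator sections with a forward gain k (the appendix code defines a variable named `forward_gain_k`); the symbols M, k, R_i and \omega_i are introduced here for illustration.

```latex
H_{\text{cascade}}(z) = k \prod_{i=1}^{M}
  \frac{1}{\left(z - R_i e^{j\omega_i}\right)\left(z - R_i e^{-j\omega_i}\right)},
\qquad \omega_i = \frac{f_i}{f_s/2}\,\pi

20\log_{10}\left|H_{\text{cascade}}(e^{j\omega})\right|
  = 20\log_{10} k + \sum_{i=1}^{M} 20\log_{10}\left|H_i(e^{j\omega})\right|,
\qquad k > 0
```

Since the magnitudes add in decibels, listing the same center frequency twice doubles that section's contribution in dB, which is one way to read the repeated filter pairs in the final designs.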
[Figure: Bode plot of the cascaded resonator filter for "Shh" (Jordan), magnitude [dB] and phase [degrees] versus frequency [Hz]. Data cursors: 35.07 [dB] at 0.04 [Hz]; 33.99 [dB], -77.15 [degrees] at 183 [Hz]; 25.97 [dB], 21.6 [degrees] at 2164 [Hz]; 13.34 [dB], -49.9 [degrees] at 3125 [Hz].]
[Figure: Bode plot of the cascaded resonator filter for "Shh" (Josh), magnitude [dB] and phase [degrees] versus frequency [Hz]. Data cursors: 38.75 [dB] at 0.04 [Hz]; 37.36 [dB], -46.27 [degrees] at 85 [Hz]; 30.88 [dB], -95 [degrees] at 2107 [Hz]; -2.68 [dB], -157.3 [degrees] at 3000 [Hz].]
[Figure panels, System #(3): filtered simulated signal, dB(H_ShhJDU * X_Z(z)), versus the recording Shh-Jordan.mat; magnitude [dB] and phase [degrees] versus frequency [Hz].]
[Figure panels, System #(4): filtered simulated signal, dB(H_ShhJOSH * X_Z(z)), versus the recording Shh-Josh.mat; magnitude [dB] and phase [degrees] versus frequency [Hz].]
An absolute frequency response plot of the resulting filtered signal and the original recorded waveform for the spoken phrase "Shh" is depicted in Figure 23.
Notably, the basic spectral trend was matched very closely, and the resulting voice synthesis
was aurally verified using the MATLAB code placed in the Appendix.
Figure 23: [Final Design Shh] Side By Side Frequency Response Comparison In Absolute Units
Left (Shh-Jordan) and Recording (Red) ; Right (Shh-Josh) and Recording (Red)
Conclusions:
In this study, two specific sounds from two unique speakers, the voiced "arr" and the unvoiced "Shh", were characterized.
Subsequently, four resonator filters were designed in order to synthesize each sound generated by each
speaker. Each resonator filter used to generate the "arr" sound was applied to an impulse train with the
fundamental frequency of each speaker. However, each resonator filter used to generate the "Shh" sound
was applied to zero-biased white Gaussian noise.
In the preliminary resonator filter designed for 600 [Hz], it was found that resonator filters have large
bandwidths and thus may have limited frequency selectivity. The frequency responses of the "arr" test
waveforms consisted of dense frequency components varying greatly in magnitude, which posed a problem in
the initial cascaded filter design.
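To give a feel for how wide these resonators are, the sketch below applies a common rule of thumb for a two-pole resonator with pole radius R close to 1: the 3 dB bandwidth is roughly -ln(R)*fs/pi in Hz. The rule is only an approximation and becomes rough for smaller R; the list of R values is taken from the design tables, and the variable names are illustrative.

```matlab
fs = 8000;                         % sampling frequency [Hz]
R  = [0.95 0.9 0.75 0.55];         % responsivities used in the designs

% Approximate 3 dB bandwidth of a two-pole resonator (valid for R near 1)
B_hz = -log(R) * fs / pi;

for i = 1:numel(R)
    fprintf('R = %.2f -> bandwidth of roughly %.0f Hz\n', R(i), B_hz(i));
end
```

With harmonics spaced only 68 [Hz] or 123 [Hz] apart, bandwidths of this order mean a single resonator spans several harmonics, which is consistent with the limited selectivity noted above.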
Due to the large bandwidth of resonator filters, only a single resonator filter was utilized to cover the
entire bandwidth of the desired signal. When the "arr" sounds were analyzed aurally, no difference in
pitch (to moderately trained ears) was found between the real audio sample and the replicated
signal; however, the tonal quality was not exactly the same. The replicated signal was missing certain high-end
frequencies, which distinguished it from a real voice. Plots of the replicated spectra in comparison to
the desired spectrum were generated for the "arr" sound, and the final voice syntheses were visually and
aurally verified.
Plots of the replicated spectrum were also compared to the desired spectrum of the "Shh" sound. This
sound was much easier to replicate, as the magnitude of its frequency components changes smoothly
with frequency. This enabled distinct filters to be placed at all frequencies
where a large magnitude was desired, without also amplifying unwanted frequencies. When these sounds
were analyzed aurally, it was difficult to distinguish the recorded sound from the
replicated one. The replicated spectrum's low-end frequencies were amplified just above the
desired magnitude, whereas the high-end frequencies were slightly attenuated.
Potential applications of these results lie in the fields of voice generation and speech recognition
for automated voice messaging systems, as well as in the analysis of voice spectra to verify that a certain
individual said something that was recorded, such as in a criminal investigation. This
field of signal processing has not yet reached its full potential, and it will continue to be an area of research
for scientists and engineers in the future.
"MathWorks - MATLAB and Simulink for Technical Computing." Web. 2 May 2015.
Member Contributions:
Jordan:
Wrote a large portion of the MATLAB code.
Wrote the report other than the conclusions.
Josh:
Wrote a small portion of the MATLAB code.
Wrote the report conclusions.
Edited the report.
Both:
Filter design.
Appendices:
A Noise Analysis:
To better characterize the voice signals, a noise sample was taken at the end of the data collection period
under the same conditions as the data presented in the main body of this report.
The surrounding noise was measured in the time domain, as depicted in Figure 24, and analyzed in the
frequency domain using an FFT and a spectrogram, as depicted in Figure 25 and Figure 26, respectively.
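A DC bias in a noise sample shows up as the zero-frequency FFT bin, whose magnitude equals the number of samples times the absolute sample mean. The sketch below demonstrates this on synthetic noise; the noise level and offset are hypothetical stand-ins for the recorded sample, and the variable names are illustrative.

```matlab
fs = 8000;                         % sampling frequency [Hz]
N  = 5 * fs;                       % 5 [s] of samples

% Synthetic stand-in for the recorded noise (hypothetical level and offset)
noise = 2e-4 * randn(1, N) + 1e-4;

% The DC bias is the sample mean; it appears in FFT bin 1 (0 Hz)
dc = mean(noise);
Y  = abs(fft(noise));
Y0 = Y(1);                         % equals N * |dc|

% Remove the bias before further spectral analysis
noise_zero_mean = noise - dc;

fprintf('DC bias: %.2e, FFT bin at 0 Hz: %.3f\n', dc, Y0);
```

Removing the mean in this way leaves the rest of the spectrum unchanged, so the remaining peaks can be inspected without the zero-frequency spike.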
[Figure 24: noise time waveform, magnitude (x 10^-4) versus time sample (0 [s] - 5 [s]) at f_sampling = 8000 [samples/second].]
[Figure 25: Noise - FFT, |y(jw)| versus frequency [Hz]. Data cursors: 3.353 at 0.2 [Hz] (potential DC bias); 1.489 at -325.4 [Hz]; 1.266 at 136.4 [Hz]; 0.464 at -60 [Hz]; 0.2963 at 651 [Hz].]
B MATLAB Code:
MATLAB Code [Part 1] Attaining Voice Recordings:
%% EE317 - Signals and Systems 2
% Project 2 - Voice Synthesis
% Written By: Jordan D. Ulmer and Joshua Behnken
% Start Date: 04/17/2015
% Revision Date: 04/27/2015
%% Project 1 Code
clear all
close all
sampling_frequency = 8000;                          % [Samples/Second]
recObj = audiorecorder(sampling_frequency,16,1);    % 16-bit, single channel
disp('Start speaking.')
recordblocking(recObj, 5);                          % Record for 5 [s]
disp('End of Recording.');
%play(recObj);
y = getaudiodata(recObj);
% Time Waveform
figure(1);
% x_time_axis = [1:round(size(y,1)/sampling_frequency)];
plot(y);
% xlabel('Time [s]');
xlabel({['Time Sample (0 [s] - ',num2str(size(y,1)/sampling_frequency),' [s])'],...
    ' @ f_s_a_m_p_l_i_n_g= 8000 [Samples/Second]'})
ylabel('Magnitude');
title('Voice - Time Waveform');
% FFT (the computation and plot call are missing from this listing;
% the three lines below are a minimal reconstruction)
figure(2);
frequency_xaxis = (-numel(y)/2 : numel(y)/2 - 1) * sampling_frequency / numel(y);
plot(frequency_xaxis, abs(fftshift(fft(y))));
xlabel('Frequency [Hz]');
ylabel('|y(jw)|');
title('Voice - FFT');
% axis([0 440 -1 600]) %looking for 261 for middle c
% Spectrogram
figure()
spectrogram(y,128,120,1000,sampling_frequency);
view(-90,90)
title('Voice - Spectrogram')
set(gca,'ydir','reverse')
figures.parameters.legend.extra_arg
'toggle' 'off'
% INPUTS
inputs.parameters.system_number.state = inputdlg(['System Number (Default:',...
    num2str(inputs.parameters.system_number.DEFAULT),'): ']);
inputs.parameters.BODE_UNITS.state = inputdlg(['Bode Units "db" or "abs" (Default:',...
    figures.plot_parameters.bode_units.DEFAULT,'): ']);
figures.plot_parameters.grid.state = inputdlg(['Grids "on" or "off" (Default:',...
    figures.plot_parameters.grid.DEFAULT,'): ']);
% Set Defaults If Needed
try
    if(isempty(inputs.parameters.system_number.state{1}))
        inputs.parameters.system_number.state = inputs.parameters.system_number.DEFAULT; % Set Default
    else
        inputs.parameters.system_number.state = str2num(inputs.parameters.system_number.state{1});
    end
catch
    inputs.parameters.system_number.state = inputs.parameters.system_number.DEFAULT; % Set Default
end
% Set Defaults If Needed
try
    if(isempty(inputs.parameters.BODE_UNITS.state{1}))
        inputs.parameters.BODE_UNITS.state = figures.plot_parameters.bode_units.DEFAULT; % Set Default
    else
        inputs.parameters.BODE_UNITS.state = inputs.parameters.BODE_UNITS.state{1};
    end
catch
    inputs.parameters.BODE_UNITS.state = figures.plot_parameters.bode_units.DEFAULT; % Set Default
end
% Set Defaults If Needed
try
    if(isempty(figures.plot_parameters.grid.state{1}))
        figures.plot_parameters.grid.state = figures.plot_parameters.grid.DEFAULT; % Set Default
    else
        figures.plot_parameters.grid.state = figures.plot_parameters.grid.state{1};
    end
catch
    figures.plot_parameters.grid.state = figures.plot_parameters.grid.DEFAULT; % Set Default
end
filters.resonator.arr.jordan.simulation.cos_frequencies = ...
    [inputs.arr.jordan.simulation.cos_frequencies(1)]; % w = f *pi/(fs/2)
filters.resonator.arr.jordan.simulation.R_responsivity = [0.75]; % R
% Set Chosen Values
current.parameters.sampling_frequency = inputs.parameters.normal_sampling_frequency;
current.inputs.simulation.cos_frequencies = inputs.arr.jordan.simulation.cos_frequencies;
current.filters.resonator.simulation.cos_frequencies = filters.resonator.arr.jordan.simulation.cos_frequencies;
current.filters.resonator.simulation.R_responsivity = filters.resonator.arr.jordan.simulation.R_responsivity;
current.filters.resonator.filter_name = filters.resonator.arr.jordan.filter_name;
current.original_recording.data = inputs.arr.jordan.recording.data;
current.original_recording.filename = inputs.arr.jordan.recording.filename;
case 2 %-----------------------------------
    % External Recording File
    try % The Recording .mat May Not Be Stored Locally
        inputs.arr.josh.recording.filename = 'arr_Josh.mat';
        inputs.arr.josh.recording.emperical_cos_frequencies = ...
            [68 135 204 273 341 409 475 541]; % Significant Frequencies (Empirical)
        inputs.arr.josh.recording.data = ...
            load([filelepath.recordings.root,'\',inputs.arr.josh.recording.filename],'-mat'); % Get original recording data
    catch
        disp('COULD NOT FIND RECORDING')
        disp('ERROR! NO STANDARD OF REFERENCE!!!!!!!')
    end
    % Simulation Waveform: cos(2*pi*f1*t)+cos(2*pi*f2*t)+...
    % Simulated Input Using Fundamental Frequency
    inputs.arr.josh.simulation.cos_frequencies = ...
        inputs.arr.josh.recording.emperical_cos_frequencies(1)...
        *[1:inputs.parameters.max_harmonic_number];
    % Resonator Filter
    filters.resonator.arr.josh.filter_name = 'H_a_r_r_J_O_S_H';
    filters.resonator.arr.josh.simulation.cos_frequencies = ...
        [inputs.arr.josh.simulation.cos_frequencies(1)]; % w = f *pi/(fs/2)
    filters.resonator.arr.josh.simulation.R_responsivity = [0.55]; % R
    % Set Chosen Values
    current.parameters.sampling_frequency = inputs.parameters.normal_sampling_frequency;
    current.inputs.simulation.cos_frequencies = inputs.arr.josh.simulation.cos_frequencies;
    current.filters.resonator.simulation.cos_frequencies = filters.resonator.arr.josh.simulation.cos_frequencies;
    current.filters.resonator.simulation.R_responsivity = filters.resonator.arr.josh.simulation.R_responsivity;
    current.filters.resonator.filter_name = filters.resonator.arr.josh.filter_name;
    current.original_recording.data = inputs.arr.josh.recording.data;
    current.original_recording.filename = inputs.arr.josh.recording.filename;
case 3 %-----------------------------------
    % External Recording File
    try % The Recording .mat May Not Be Stored Locally
        inputs.shh.jordan.recording.filename = 'Shh_Jordan.mat';
        inputs.shh.jordan.recording.emperical_cos_frequencies = ...
            [183 1731 2164 3125]; % Significant Frequencies (Empirical)
        inputs.shh.jordan.recording.weighted_avg_of_last_two_emperical_cos_frequencies = 2041;
        inputs.shh.jordan.recording.data = ...
            load([filelepath.recordings.root,'\',inputs.shh.jordan.recording.filename],'-mat'); % Get original recording data
    catch
        disp('COULD NOT FIND RECORDING')
        disp('ERROR! NO STANDARD OF REFERENCE!!!!!!!')
    end
    % Simulation Waveform: cos(2*pi*f1*t)+cos(2*pi*f2*t)+...
    % Simulated Input Using Fundamental Frequency
    inputs.shh.jordan.simulation.cos_frequencies = ...
        inputs.shh.jordan.recording.emperical_cos_frequencies(1)...
        *[1:inputs.parameters.max_harmonic_number];
    % Resonator Filter
    filters.resonator.shh.jordan.filter_name = 'H_S_h_h_J_D_U';
    filters.resonator.shh.jordan.simulation.cos_frequencies = ...
        [inputs.shh.jordan.simulation.cos_frequencies(1),...
        inputs.shh.jordan.simulation.cos_frequencies(1),...
        inputs.shh.jordan.recording.emperical_cos_frequencies(3),...
        inputs.shh.jordan.recording.emperical_cos_frequencies(3),...
        inputs.shh.jordan.recording.emperical_cos_frequencies(4),...
        inputs.shh.jordan.recording.emperical_cos_frequencies(4)]; % w = f *pi/(fs/2)
    filters.resonator.shh.jordan.simulation.R_responsivity = [0.9 , 0.8 , 0.95 , 0.8 , 0.8 , 0.8]; % R
    % Set Chosen Values
    current.parameters.sampling_frequency = inputs.parameters.normal_sampling_frequency;
    current.inputs.simulation.cos_frequencies = inputs.shh.jordan.simulation.cos_frequencies;
    current.filters.resonator.simulation.cos_frequencies = filters.resonator.shh.jordan.simulation.cos_frequencies;
    current.filters.resonator.simulation.R_responsivity = filters.resonator.shh.jordan.simulation.R_responsivity;
    current.filters.resonator.filter_name = filters.resonator.shh.jordan.filter_name;
    current.original_recording.data = inputs.shh.jordan.recording.data;
    current.original_recording.filename = inputs.shh.jordan.recording.filename;
    current.filters.forward_gain_k = 3;
case 4 %-----------------------------------
    % External Recording File
    try % The Recording .mat May Not Be Stored Locally
        inputs.shh.josh.recording.filename = 'Shh_Josh.mat';
        inputs.shh.josh.recording.emperical_cos_frequencies = ...
            [85 1801 2067 3000]; % Significant Frequencies (Empirical)
        inputs.shh.josh.recording.weighted_avg_of_last_two_emperical_cos_frequencies = 1947;
        inputs.shh.josh.recording.data = ...
            load([filelepath.recordings.root,'\',inputs.shh.josh.recording.filename],'-mat'); % Get original recording data
    catch
        disp('COULD NOT FIND RECORDING')
        disp('ERROR! NO STANDARD OF REFERENCE!!!!!!!')
    end
    % Simulation Waveform: cos(2*pi*f1*t)+cos(2*pi*f2*t)+...
    % Simulated Input Using Fundamental Frequency
    inputs.shh.josh.simulation.cos_frequencies = ...
        inputs.shh.josh.recording.emperical_cos_frequencies(1)...
        *[1:inputs.parameters.max_harmonic_number];
    % Resonator Filter
    filters.resonator.shh.josh.filter_name = 'H_S_h_h_J_O_S_H';
    filters.resonator.shh.josh.simulation.cos_frequencies = ...
        [inputs.shh.josh.simulation.cos_frequencies(1),...
        inputs.shh.josh.simulation.cos_frequencies(1),...
        inputs.shh.josh.recording.emperical_cos_frequencies(2)-50,...
        inputs.shh.josh.recording.emperical_cos_frequencies(2)-50,...
        inputs.shh.josh.recording.emperical_cos_frequencies(3)+100,...
        inputs.shh.josh.recording.emperical_cos_frequencies(3)+100,...
        inputs.shh.josh.recording.emperical_cos_frequencies(4),...
        inputs.shh.josh.recording.emperical_cos_frequencies(4)]; % w = f *pi/(fs/2)
    filters.resonator.shh.josh.simulation.R_responsivity = ...
        [0.83 , 0.83 , 0.8 , 0.8 , 0.85 , 0.85, 0.6 , 0.6]; % R
    % Set Chosen Values
    current.parameters.sampling_frequency = inputs.parameters.normal_sampling_frequency;
    current.inputs.simulation.cos_frequencies = inputs.shh.josh.simulation.cos_frequencies;
    current.filters.resonator.simulation.cos_frequencies = filters.resonator.shh.josh.simulation.cos_frequencies;
    current.filters.resonator.simulation.R_responsivity = filters.resonator.shh.josh.simulation.R_responsivity;
    current.filters.resonator.filter_name = filters.resonator.shh.josh.filter_name;
    current.original_recording.data = inputs.shh.josh.recording.data;
    current.original_recording.filename = inputs.shh.josh.recording.filename;
otherwise %-----------------------------------
    disp('NOT AN OPTION - THAT SYSTEM DOES NOT EXIST')
    break
end % The system has been chosen
% Time Vector
current.simulation.time_duration = 1;
current.simulation.time_vector = [1/current.parameters.sampling_frequency:...
    1/current.parameters.sampling_frequency:...
    current.simulation.time_duration];
switch inputs.parameters.system_number.state
    case {0,1,2} %-----------------------------------
        % Time Domain Signal
        current.simulation.x_t = zeros(1,numel(current.simulation.time_vector));
        % Summation of Cosines
        for h = 1:numel(current.inputs.simulation.cos_frequencies)
            current.simulation.x_t = current.simulation.x_t + ...
                cos(2*pi*current.inputs.simulation.cos_frequencies(h)*current.simulation.time_vector);
        end
        % Scale Time Domain Input
        current.simulation.time_amplitude_scaling_factor = 1/100;
        current.simulation.x_t = current.simulation.x_t * ...
            current.simulation.time_amplitude_scaling_factor; % Scale
        % Add Uniform Random Time Domain Noise
        current.simulation.SNR = (1/5)*current.simulation.time_amplitude_scaling_factor;
        current.simulation.noise_t = (rand(1,numel(current.simulation.x_t)))...
            .*current.simulation.SNR.*current.simulation.time_amplitude_scaling_factor;
        current.simulation.x_t = current.simulation.x_t + current.simulation.noise_t; % Add Noise
    case {3,4} %-----------------------------------
        % Time Domain Signal
        current.simulation.x_t = zeros(1,numel(current.simulation.time_vector));
        % Add White Gaussian Noise
        current.simulation.SNR = 10 ;
        current.simulation.x_t = awgn(current.simulation.x_t,current.simulation.SNR);
end
% Decibles % DEFAULT
semilogx(W_Frequency_Hz,db(abs(H_Filter)))
ylabel('Magnitude [dB]')
end
xlabel('Frequency [Hz]')
title({[current.filters.resonator.filter_name,' ; ',current.system.name],...
['Cascade Resonator Filter H(z)=1/((z-[R*cos(\omega)+j*R*sin(\omega)])*(z-[R*cos(\omega)j*R*sin(\omega)]))'] ,...
['R = ',num2str(current.filters.resonator.simulation.R_responsivity,'%.2e ; ')],...
['\omega = ',num2str(current.filters.resonator.w_filter,'%.2e [rad/s] ; ')],...
[' f = ',num2str(current.filters.resonator.simulation.cos_frequencies,'%.2e [Hz] ; ')]})
eval(['grid ' ,figures.plot_parameters.grid.state])
subplot(2,1,2)
semilogx(W_Frequency_Hz,rad2deg(angle(H_Filter)))
xlabel('Frequency [Hz]')
ylabel('Phase [Degrees]')
eval(['grid ' ,figures.plot_parameters.grid.state])
% Plot BODE Original Signal
figure(3)
set(gcf,'Units','normalized','Position',figures.position.rightScreen_fit_normalized,...
'Name',['Task (1) . Simulation'])
suptitle([current.system.name]);
subplot(2,2,[1,1])
% Use Absolute or Decibels?
if strncmpi(figures.plot_parameters.bode_units.state,figures.plot_parameters.bode_units.DB,2)==0 % If first two chars don't match
% Absolute
semilogx(figures.xaxis.positive_and_negative_frequency,abs(current.simulation.X_z));
title({'Original Simulated Signal [ABS]' , '|X_z(r) + Gaussian Noise|'},'horizontalAlignment','right')
xlabel('Frequency [Hz]')
ylabel('Magnitude [Absolute]')
eval(['grid ' ,figures.plot_parameters.grid.state])
else
% Decibels % DEFAULT
semilogx(figures.xaxis.positive_and_negative_frequency,db(abs(current.simulation.X_z)));
title({'Original Simulated Signal [dB]' , 'dB(X_z(r) + Gaussian Noise)'},'horizontalAlignment','right')
xlabel('Frequency [Hz]')
ylabel('Magnitude [dB]')
eval(['grid ' ,figures.plot_parameters.grid.state])
end
subplot(2,2,[2,2])
semilogx(figures.xaxis.positive_and_negative_frequency,rad2deg(angle((current.simulation.X_z))));
title({'Original Simulated Signal' ,['\angle( X_z(r) + Gaussian Noise )']},'horizontalAlignment','left')
xlabel('Frequency [Hz]')
ylabel('Phase [Degrees]')
eval(['grid ' ,figures.plot_parameters.grid.state])
% Plot BODE of Filtered Signal
figure(3)
subplot(2,2,[3,3])
% Use Absolute or Decibels?
if strncmpi(figures.plot_parameters.bode_units.state,figures.plot_parameters.bode_units.DB,2)==0 % If first two chars don't match
% Absolute
try % The Recording .mat May Not Be Stored Locally
current.simulation.Y_z=fft(current.original_recording.data.y);
semilogx(current.original_recording.data.frequency_xaxis,abs(current.simulation.Y_z),'r');
hold on % Enable hold only after the first semilogx so the logarithmic x-axis scale is kept
semilogx(figures.xaxis.positive_and_negative_frequency,abs(current.simulation.X_Z_filtered))
hold off
legend(strrep(current.original_recording.filename,'_',''),[current.filters.resonator.filter_name,'*X_Z(z)'])
legend('Location',figures.parameters.legend.location)
legend(figures.parameters.legend.extra_arg)
title({'Filtered Simulated Signal [ABS]',['|',current.filters.resonator.filter_name,'*X_Z(z)| Vs. Recording']})
catch
hold off
semilogx(figures.xaxis.positive_and_negative_frequency,abs(current.simulation.X_Z_filtered))
title({'Filtered Simulated Signal [ABS]',['|',current.filters.resonator.filter_name,'*X_Z(z)|']})
disp('COULD NOT FIND RECORDING')
disp('ERROR! NO STANDARD OF REFERENCE!!!!!!!')
end
eval(['grid ' ,figures.plot_parameters.grid.state])
xlabel('Frequency [Hz]')
ylabel('Magnitude [Absolute]')
else
% Decibels
try % The Recording .mat May Not Be Stored Locally
current.simulation.Y_z=fft(current.original_recording.data.y);
semilogx(current.original_recording.data.frequency_xaxis,db(abs(current.simulation.Y_z)),'r');
hold on
semilogx(figures.xaxis.positive_and_negative_frequency,db(abs(current.simulation.X_Z_filtered)))
hold off
legend(strrep(current.original_recording.filename,'_',''),[current.filters.resonator.filter_name,'*X_Z(z)'],'Location','Best')
legend('Location',figures.parameters.legend.location)
legend(figures.parameters.legend.extra_arg)
title({'Filtered Simulated Signal [dB]',['dB(',current.filters.resonator.filter_name,'*X_Z(z)) Vs. Recording']})
catch
semilogx(figures.xaxis.positive_and_negative_frequency,db(abs(current.simulation.X_Z_filtered)))
title({'Filtered Simulated Signal [dB]',['dB(',current.filters.resonator.filter_name,'*X_Z(z))']})
disp('COULD NOT FIND RECORDING')
disp('ERROR! NO STANDARD OF REFERENCE!!!!!!!')
end
xlabel('Frequency [Hz]')
ylabel('Magnitude [dB]')
eval(['grid ' ,figures.plot_parameters.grid.state])
end
subplot(2,2,[4,4])
% otherwise
end
% Change Figure
set(figure(open_figures(f_index)), 'Position', figures.save.position.current); % Position
set(findall(gca,'-property','linewidth'),'linewidth',figures.save.parameters.LINEWIDTH.current) % Line Width
set(findall(f_index,'-property','FontSize'),'FontSize',figures.save.parameters.FONTSIZE.current) % Font Size
% Save Figure
set(gcf, 'PaperUnits', figures.save.paper_units.current );
set(gcf, 'PaperPosition', figures.save.paper_position.current);
saveas(f_index, filepath.save.current ,'fig')
saveas(f_index, filepath.save.current ,'tif')
set(figure(open_figures(f_index)), 'WindowStyle', 'docked') % Dock window in order
end
% Save MATLAB
save([filepath.save.figs,current.system.name,'MATLAB']);
disp('All figures have been docked and are now ordered sequentially for your convenience...')
% end % END Overall For Loop To Generate Figures
Table of Contents:
EE 317 Signals and Systems II
Background:
Introduction:
Initial Resonator Filter Design:
Voice Characterization:
Characterizing the Voice Waveform for the Spoken Phrase "arr":
Characterizing the Voice Waveform for the Spoken Phrase "Shh":
Voice Synthesis:
Synthesizing a Waveform for the Spoken Phrase "arr":
Synthesizing a Waveform for the Spoken Phrase "Shh":
Conclusions:
Works Cited & Works Consulted:
Member Contributions:
Appendices:
A Noise Analysis:
B MATLAB Code:
MATLAB Code [Part 1] Attaining Voice Recordings:
MATLAB Code [Part 2] Data Analysis:
B [TOC] Table of Contents:
Figure List:
Table List:
Table of Contents: