Abstract: Speech recognition systems (SRS) have been implemented on various processors, including
digital signal processors (DSPs) and field-programmable gate arrays (FPGAs), and their performance has been
reported in the literature. The fundamental purpose of speech is communication, i.e., the transmission of
messages. In the case of speech, the fundamental analog form of the message is an acoustic waveform, which we
call the speech signal. Speech signals can be converted to an electrical waveform by a microphone, further
manipulated by both analog and digital signal processing, and then converted back to acoustic form by a
loudspeaker, a telephone handset or headphones, as desired. The recognition of speech requires feature extraction
and classification. Systems that use speech as input require a microcontroller to carry out the desired
actions. In this paper, the Cypress Programmable System on Chip (PSoC) is studied and used to
implement an SRS. Of the available PSoCs, the PSoC5, which contains an ARM Cortex-M3 CPU, is used.
Noise is first suppressed in the speech signals using LogMMSE filtering. The filtered signals are then
sent to the PSoC5, where the speech is recognized and the desired actions are performed.
Keywords: PSoC, LogMMSE, Speech Recognition
I. INTRODUCTION
The basic purpose of speech is the transmission of messages. A message can be represented as a sequence of
discrete symbols, so that its information content can be quantified in bits and the rate at which information is
transmitted in bits per second (bps). Speech recognition techniques have come to be seen as an efficient and
convenient means of human-machine interaction. Speech recognition systems with fixed vocabularies have been
deployed in many applications [8], [9]. Speech recognition systems for voice-operated applications have been
implemented on various hardware platforms such as DSPs [3], FPGAs [5] and microprocessors [10]. In speech production,
the information to be transmitted is encoded in the form of a continuously varying analog waveform that can be
transmitted, recorded, manipulated, and ultimately decoded by a human listener. This analog signal is the speech
signal. These signals tend to be corrupted by noise in the real world. If the noise can be estimated from the noise
source, the estimated noise can be subtracted from the primary channel, leaving the desired signal. This
task is usually done by linear filtering. In real-time situations, however, the corrupting noise is a nonlinearly
distorted version of the source noise, so a nonlinear filter should be a better choice. Speech enhancement is
performed to reduce the influence of noise on the speech. The recognition algorithm proved to be less efficient
without the enhancement algorithms.
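The linear-filtering approach described above (estimate the noise from the noise source and subtract it from the primary channel) can be illustrated with a small adaptive noise canceller. This is only a sketch: the paper does not name a specific linear filter, so the LMS update, the tap count, the step size and the three-tap noise path used below are all assumptions chosen for illustration.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=4, step=0.01):
    """Subtract a linearly filtered copy of `reference` from `primary`.

    primary   : desired signal plus noise (list of floats)
    reference : noise measured at the noise source (list of floats)
    Returns the error signal, which approximates the desired signal.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps      # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate      # estimate of the clean sample
        # LMS update: move the weights along the negative error gradient.
        weights = [w + 2.0 * step * e * h for w, h in zip(weights, history)]
        cleaned.append(e)
    return cleaned


random.seed(0)
n = 4000
clean = [0.5 * math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
source_noise = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical noise path: the noise reaches the primary microphone
# through a short FIR channel (an assumption for this demo).
path = [0.8, -0.3, 0.1]
noise_at_mic = [
    sum(path[k] * source_noise[t - k] for k in range(len(path)) if t - k >= 0)
    for t in range(n)
]
primary = [c + v for c, v in zip(clean, noise_at_mic)]
cleaned = lms_noise_canceller(primary, source_noise)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


tail = slice(n // 2, n)             # compare after the filter has adapted
error_before = mse(primary[tail], clean[tail])
error_after = mse(cleaned[tail], clean[tail])
```

Because the demo noise path is itself linear, the linear filter can cancel it well. When the path is nonlinear, as the text notes for real situations, a linear canceller of this kind leaves residual noise, which is the motivation for nonlinear enhancement.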
Programmable Systems on Chip (PSoCs) have been, and are being, employed in a number of applications. They
are cost effective, but consequently have limited storage and computational power. In this context,
recognition accuracy becomes an important concern for PSoC-based systems and is addressed in this paper.
II.
The Programmable System on Chip (PSoC) was designed and implemented by Cypress Semiconductor
[2], [4]. Every PSoC contains a microcontroller, programmable analog blocks such as ADCs, DACs and I/O
drivers, and digital blocks such as Universal Digital Blocks (UDBs), CAN, I2C and PWM on a single chip.
Embedded development kits from Cypress contain one of three PSoCs: PSoC1, PSoC3 or PSoC5. The
processing performance, functionality, internal memory and configurability increase from
PSoC1 through PSoC5.
www.irjes.com
1 | Page
Device  CPU                Memory        Communication                              ADCs                   DACs       I/Os
PSoC1   8-bit M8C core     4 to 32 kB    I2C, SPI, UART, FS USB 2.0                 1 Delta-Sigma          2 (6-bit)  64
PSoC3   8-bit 8051         8 to 64 kB    I2C, SPI, UART, LIN, FS USB 2.0, I2S, CAN  1 Delta-Sigma          4 (8-bit)  72
PSoC5   32-bit ARM Cortex  32 to 256 kB  I2C, SPI, UART, LIN, FS USB 2.0, I2S, CAN  1 Delta-Sigma, 2 SARs  4 (8-bit)  72
III.
IV. TECHNIQUES USED
The performance of an SRS degrades when it is deployed in a real-world environment. This degradation
is due to acoustic model mismatch, i.e., the difference between the
environment in which the SRS is tested and the actual environment in which it is deployed. The mismatch can include
echoes, background noise, speaker variability and transmission effects.
1.
Fig. 4 MFCC
In speech recognition, the Mel-frequency cepstrum (MFC) is a representation of the short-term power
spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of
frequency.
MFCC = DCT [ LOG [ ABS [ FFT (SPEECH) ]]]
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. The difference
between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally
spaced on the mel scale, which approximates the human auditory system's response more closely than the
linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for better
representation of sound.
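The chain DCT[LOG[ABS[FFT(SPEECH)]]] can be sketched for a single frame as follows. The formula as printed does not show the mel-spaced filterbank that the surrounding text describes, so the sketch inserts it between the magnitude and logarithm steps. The Hamming window, the 8 kHz sampling rate, the 20 filters, the 13 coefficients and all function names are assumptions made for illustration, not values taken from the paper.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2 (naive O(N^2) DFT, adequate for one short frame)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for i in range(1, num_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for b in range(lo, mid):            # rising slope
            row[b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):            # falling slope, peak of 1 at mid
            row[b] = (hi - b) / (hi - mid)
        bank.append(row)
    return bank


def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    n = len(frame)
    # A Hamming window reduces spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    power = power_spectrum(windowed)
    bank = mel_filterbank(num_filters, n, sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct2(log_energies, num_coeffs)


sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc(frame, sample_rate)
```

On an embedded target such as the PSoC5 the naive DFT would normally be replaced by an FFT routine; it is written out here only to keep the sketch free of dependencies.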
V.
The goal of the system training and inventory design stage is twofold: we need to divide the inventory
into collections of phonetically similar segments of varying lengths, and we need to arrive at a statistical
description that tells us which set of collections is most likely to contain the inventory subsection that best
matches the clean frame underlying an incoming noisy frame. The division of the inventory into
collections is performed step by step. First, the inventory is segmented and all silent segments are removed. The
non-silent part of the inventory is then divided into sections that each belong to one of 40 phonetic classes.
We apply feature extraction to the entire segment stream of the inventory. Because the inventory
signal is assumed to be virtually undistorted, it is sufficient to retain only the resulting short-time MFCC feature
means and to discard the associated variance estimates. The feature means thereby become, in effect, feature
vectors in their own right, and we can develop a cluster model for them. We use the means rather than
the actual cepstral vectors at this stage to ensure that the impact of the mean-extraction processing is
captured in our feature representation.
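The step of keeping only the per-segment feature means and then building a cluster model over them can be sketched as below. The paper does not say which clustering method is used, so plain k-means stands in here as an assumption. The synthetic three-dimensional "segments", the two clusters (rather than 40 phonetic classes) and every function name are likewise hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:             # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return centroids, labels


rng = random.Random(1)


def make_segment(centre, frames=5):
    """Synthetic stand-in for the per-frame feature vectors of one segment."""
    return [[rng.gauss(centre, 0.1) for _ in range(3)] for _ in range(frames)]


segments = [make_segment(0.0) for _ in range(20)] + [make_segment(5.0) for _ in range(20)]
# Retain only the mean feature vector of each segment; variances are discarded.
feature_means = [mean_vector(seg) for seg in segments]
centroids, labels = kmeans(feature_means, k=2)
```

In this toy setting the two groups of segments are well separated, so each group ends up in its own cluster. With real MFCC means the number of clusters and the distance measure would need to be chosen to suit the inventory.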
VI. RESULTS
VII. CONCLUSION
In this paper, speech recognition is performed and appliances are controlled using speech commands. The
commands are given by the user in the form of speech. These signals are filtered using LogMMSE filtering,
which suppresses environmental noise. MFCCs are then used for feature extraction. A database is
created in which all the commands are saved. These commands are then given as input to the PSoC5 kit, where the
PSoC is programmed to produce the desired result. Following the commands recognized by the PSoC, the robot
connected to it moves in the specified directions.
REFERENCES
[1]. V. Naresh, B. Venkataramani, Abhishek Karan and J. Manikandan, "PSoC based isolated speech
recognition system," International Conference on Communication and Signal Processing, April 3-5,
2013.
[2]. R. Namba, K. Kobayashi, T. Ohkubo and Y. Kurihara, "Development of PSoC microcontroller based
solar energy storage system," Proceedings of SICE Annual Conference (SICE), 2011, pp. 718-721,
2011.
[3]. J. Manikandan, B. Venkataramani, K. Girish, H. Karthic, V. Siddharth, "Hardware Implementation of
Real-Time Speech Recognition System Using TMS320C6713 DSP," 24th International Conference
on VLSI Design (VLSI Design), pp. 250-255, 2011.
[4]. Jingchuan Wang and Weidong Chen, "Integration of PSoC technology with educational robotics,"
International Conference on Field-Programmable Technology (FPT), 2010, pp. 332-336, 2010.
[5]. Cheng-Yuan Chang, Ching-Fa Chen, Shing-Tai Pan, Xu-Yu Li, "The speech recognition chip
implementation on FPGA," Mechanical and Electronics Engineering (ICMEE), 2nd International
Conference, 2010.
[6]. V. Amudha, B. Venkataramani, J. Manikandan, "FPGA implementation of isolated digit recognition
system using modified back propagation algorithm," International Conference on Electronic Design
ICED 2008, pp. 1-6.
[7]. V. Amudha, B. Venkataramani, R. Vinoth Kumar and S. Ravishankar, "SOC Implementation of HMM
Based Speaker Independent Isolated Digit Recognition System," in Proc. of IEEE Int. Conf. on
VLSI Design (VLSI07), 2007, pp. 848-853.
[8]. Trihandyo, A. Belloum, A. Kun-Mean Hou, "A real-time speech recognition architecture for a
multichannel interactive voice response system," International Conference on Acoustics, Speech, and Signal
Processing (ICASSP-97), vol. 2, pp. 1527-1530, 1997.
[9]. Mike Wald, "Using Automatic Speech Recognition to Enhance Education for All Students: Turning a
Vision into Reality," in Proceedings of 34th ASEE/IEEE Frontiers in Education Conference, S3G,
Indianapolis, Indiana, 2005, pp. 22-25.
[10]. N. Hataoka, H. Kokubo, Y. Obuchi, and A. Amano, "Compact and robust speech recognition for
embedded use on microprocessors," IEEE Workshop on Multimedia Signal Processing, pp. 288-291,
9-11 Dec. 2002.