
1. INTRODUCTION

Speech is a natural mode of communication for people.

We learn all the relevant skills during early childhood, without instruction, and we continue to rely on speech communication throughout our lives. It comes so naturally to us that we don't realize how complex a phenomenon speech is. The human vocal tract and articulators are biological organs with nonlinear properties, whose operation is not just under conscious control but also affected by factors ranging from gender to upbringing to emotional state. As a result, vocalizations can vary widely in terms of their accent, pronunciation, articulation, roughness, nasality, pitch, volume, and speed; moreover, during transmission, our irregular speech patterns can be further distorted by background noise and echoes, as well as by electrical characteristics (if telephones or other electronic equipment are used). All these sources of variability make speech recognition, even more than speech generation, a very complex problem.

What makes people so good at recognizing speech? Intriguingly, the human brain is known to be wired differently from a conventional computer; in fact, it operates under a radically different computational paradigm. While conventional computers use a very fast and complex central processor with explicit program instructions and locally addressable memory, the human brain uses a massively parallel collection of slow and simple processing elements (neurons), densely connected by weights (synapses) whose strengths are modified with experience, directly supporting the integration of multiple constraints and providing a distributed form of associative memory.

Speech recognition is a multileveled pattern recognition task, in which acoustical signals are examined and structured into a hierarchy of subword units (e.g., phonemes), words, phrases, and sentences. Each level may provide additional temporal constraints, e.g., known word pronunciations or legal word sequences, which can compensate for errors or uncertainties at lower levels. This hierarchy of constraints can best be exploited by combining decisions probabilistically at all lower levels and making discrete decisions only at the highest level.

1.1 Introduction of Project

Here we are designing a security-based army tank. An army tank has mainly four authenticated users, namely the driver, the navigator, the loader, and the fireman. It is a desirable feature of the tank that it should not be usable by unauthorized persons. In this project the fingerprint module first takes the fingerprint of the person wishing to enter the tank and compares it with the fingerprints available in the microcontroller database. If the fingerprint matches, further access is granted; if it does not match, an error is displayed. The authorized person can then control the locomotion of the tank by voice commands. The voice command given by the person is compared with the voice commands stored in the microcontroller, and if the command matches, the tank operates accordingly; if the command does not match, an error is displayed. The microcontroller then rotates the DC motors via the H-bridge motor drivers. The supported commands are forward, reverse, right turn, left turn, load, fire, lights on, and stop.

In this project we control the tank using speech commands. For this we have a voice recognition system based on the VRbot module, which has built-in speech recognition and control capability. Using this voice recognition system we control the movements of the tank through the microcontroller. Many systems have been developed for voice recognition and application control. The main drawback of these systems is that they are implemented as software algorithms and thus require a computer to run them; therefore they are inconvenient for mobile applications such as moving robots. To overcome this difficulty we decided to design a speech recognition system using hardware only. The advantage of this system is that it is cheap compared to currently available systems. This system is also more user-friendly than software applications.

A voice recognition system performs two fundamental operations: signal modelling and pattern matching. Signal modelling is the process of converting the speech signal into a set of parameters. Pattern matching is the task of finding the parameter set in memory which most closely matches the parameter set obtained from the input speech signal. We have designed a very basic speaker-dependent voice recognition system that identifies isolated spoken words. Strictly speaking, voice is also a physiological trait because every person has a

different vocal tract, but voice recognition is mainly based on the study of the way a person speaks, and so it is commonly classified as behavioral. Biometrics comprises methods for uniquely recognizing humans based upon one or more intrinsic physical or behavioral traits. In computer science, in particular, biometrics is used as a form of identity access management and access control. It is also used to identify individuals in groups that are under surveillance. Biometric characteristics can be divided into two main classes:

1. Physiological characteristics are related to the shape of the body. Examples include, but are not limited to, fingerprint, face recognition, DNA, palm print, hand geometry, iris recognition (which has largely replaced retina scanning), and odor/scent.

2. Behavioral characteristics are related to the behavior of a person. Examples include, but are not limited to, typing rhythm, gait, and voice. Some researchers have coined the term behaviometrics for this class of biometrics.

A biometric system can operate in the following two modes:

1. Verification: a one-to-one comparison of a captured biometric with a stored template to verify that the individual is who he claims to be. It can be done in conjunction with a smart card, username, or ID number.

2. Identification: a one-to-many comparison of the captured biometric against a biometric database in an attempt to identify an unknown individual. The identification only succeeds if the comparison of the biometric sample to a template in the database falls within a previously set threshold.

The first time an individual uses a biometric system is called enrollment. During enrollment, biometric information from the individual is stored. In subsequent uses, biometric information is detected and compared with the information stored at the time of enrollment. Note that it is crucial that storage and retrieval within such systems themselves be secure if the biometric system is to be robust.
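To make the matching idea concrete, the sketch below shows, in C, how a one-to-one verification against a stored template might look: a captured sample is reduced to a fixed-length feature vector, compared to the enrolled template by Euclidean distance, and accepted only if the distance falls within a preset threshold. The feature length, threshold value, and function names are illustrative assumptions, not the actual interface of our modules.

```c
#include <math.h>
#include <stdio.h>

#define FEAT_LEN 16            /* illustrative feature-vector length */
#define MATCH_THRESHOLD 4.0    /* assumed acceptance threshold */

/* Euclidean distance between a captured feature vector and a template. */
static double feature_distance(const double sample[FEAT_LEN],
                               const double stored[FEAT_LEN])
{
    double sum = 0.0;
    for (int i = 0; i < FEAT_LEN; i++) {
        double d = sample[i] - stored[i];
        sum += d * d;
    }
    return sqrt(sum);
}

/* One-to-one verification: accept only if distance is within threshold. */
static int verify(const double sample[FEAT_LEN],
                  const double enrolled[FEAT_LEN])
{
    return feature_distance(sample, enrolled) <= MATCH_THRESHOLD;
}

int main(void)
{
    double enrolled[FEAT_LEN] = {0};  /* stored at enrollment time    */
    double sample[FEAT_LEN]   = {0};  /* captured at verification time */

    if (verify(sample, enrolled))
        printf("Access granted\n");
    else
        printf("Error: no match\n");
    return 0;
}
```

Identification (one-to-many) would simply run the same distance against every enrolled template and take the best score, still subject to the threshold.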

1.2 Motivation of Project

The most vital component in the success of any scientific advancement, innovation, or breakthrough is putting the technology to use in application-specific programs for the welfare of human beings. Recent developments in speech technology and biomedical engineering the world over have diverted the attention of researchers and technocrats towards the design and development of simple, cost-effective, and technically viable solutions for the welfare of society. Speech, being the most natural and fastest means of communication, is now being applied to machines, which respond on recognition of voice commands. One such augmentative, and in certain cases alternative, approach to ensuring full, equal, and active participation is the development of a voice-operated intelligent motorized wheelchair [4]. Motivated by this concept, we decided to create an army tank operated completely by voice commands. The tank is not only capable of basic locomotion, that is, moving in the forward and reverse directions, but is also capable of performing actions such as load and attack simply by obeying voice commands.

The problem that then arose was the security issue: since it is an army tank, security measures have to be taken so that no unwanted person can enter the tank and misuse it. To solve this problem we have included a fingerprint module, which is used solely for the authentication process. Extensive research has been done on fingerprints in humans. Two of the fundamentally important conclusions that have arisen from this research are:

1. A person's fingerprint will not naturally change structure after about one year after birth.

2. The fingerprints of individuals are unique. Even the fingerprints of twins are not the same; in practice, two humans with the same fingerprint have never been found.

Since the fingerprint is a form of biometric security, we can rest assured that the army tank can now be operated only by the desired personnel.

1.3 Organization of Project

This project demonstrates the concept of controlling a robot by voice instruction and is a first step towards the design of voice-based robotic automation projects. The speech recognition system, the heart of our project, is an easy-to-use programmable speech recognition circuit. It is programmable in the sense that the user trains the system on the words (or vocal utterances) the circuit is to recognize.

We have also used a fingerprint scanner for authentication purposes. A fingerprint sensor is an electronic device used to capture a digital image of the fingerprint pattern. The captured image is called a live scan. This live scan is digitally processed to create a biometric template (a collection of extracted features) which is stored and used for matching. Capacitance sensors utilize the principles associated with capacitance in order to form fingerprint images. In this method of imaging, each pixel of the sensor array acts as one plate of a parallel-plate capacitor, the dermal layer (which is electrically conductive) acts as the other plate, and the non-conductive epidermal layer acts as the dielectric.

Passive capacitance: A passive capacitance sensor uses the principle outlined above to form an image of the fingerprint patterns on the dermal layer of skin. Each sensor pixel is used to measure the capacitance at that point of the array. The capacitance varies between the ridges and valleys of the fingerprint because, in the valleys, the volume between the dermal layer and the sensing element contains an air gap. The dielectric constant of the epidermis and the area of the sensing element are known values, so the measured capacitance values can be used to distinguish between fingerprint ridges and valleys.

Active capacitance: Active capacitance sensors use a charging cycle to apply a voltage to the skin before measurement takes place. The application of voltage charges the effective capacitor, and the electric field between the finger and the sensor follows the pattern of the ridges in the dermal skin layer. On the discharge cycle, the voltage across the dermal layer and sensing element is compared against a reference voltage in order to calculate the capacitance. The distance values are then calculated mathematically from the capacitance relation and used to form an image of the fingerprint. Active capacitance sensors measure the ridge patterns of the dermal layer, like the ultrasonic method.
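Both sensing schemes reduce to the standard parallel-plate relation; the symbols follow the description above (A = pixel plate area, d = plate separation, epsilon_r = effective dielectric constant of the epidermis), and treating each pixel as an ideal parallel-plate capacitor is a simplifying assumption:

```latex
C = \frac{\varepsilon_0 \,\varepsilon_r\, A}{d}
\qquad\Longleftrightarrow\qquad
d = \frac{\varepsilon_0 \,\varepsilon_r\, A}{C}
```

Since C is inversely proportional to d, a ridge touching the pixel (small d) reads a higher capacitance than a valley, where the air gap increases the effective separation and lowers the effective dielectric constant. Inverting the relation, as on the right, is what yields the "distance values" an active sensor uses to form the image.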

We have used an LCD module to show whether the fingerprint matched, or to show an error message if an error has occurred; it is used for debugging purposes. LCD displays utilize two sheets of polarizing material with a liquid crystal solution between them. An electric current passed through the liquid causes the crystals to align so that light cannot pass through them. Each crystal, therefore, is like a shutter, either allowing light to pass or blocking it. The LCD thus illustrates whether there is an error or not, and also displays whether the owner is correct or not.

We have used a transistorized relay circuit for switching between the two main modules of our project, namely the fingerprint module and the voice module. The use of a relay becomes essential because the microcontroller has only one serial (COM) port, so we cannot interface both modules with it directly. The DC motor, by its very nature, has a high torque versus falling speed characteristic, which enables it to deal with high starting torques and to absorb sudden rises in load easily; the speed of the motor adjusts to the load.

1.4 ELECTRICAL SPECIFICATION OF PROJECT

1. Voice recognition module: 5 V, 12 mA
2. Fingerprint module: 3.3 V, 100 mA
3. Microcontroller: 4.4 V-5.5 V, 30 mA
4. DC geared motor: 12 V DC, 500 mA, 30 rpm
5. LCD display: 5 V, 0.2 mA
6. MAX232: 5 V, 10 mA
7. Relay: 5 V, 25 mA
8. DC motor driver IC: 4.5 V-36 V, 600 mA per channel
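A minimal sketch of how this single-port arrangement might be driven is shown below, in C. The relay-select routines, UART call, and match codes are all hypothetical placeholders (an actual build would use the registers of the specific microcontroller); the point is only the sequencing: authenticate over the shared serial port first, then switch the relay so the same port listens to the voice module.

```c
#include <stdint.h>

/* Hypothetical hardware-access stubs; on real hardware these would
 * read and write the microcontroller's port and UART registers. */
static void relay_select_fingerprint(void) { /* drive relay coil off */ }
static void relay_select_voice(void)       { /* drive relay coil on  */ }
static uint8_t uart_read_byte(void)        { return 0; /* placeholder */ }
static void lcd_print(const char *msg)     { (void)msg; }

/* Assumed one-byte status codes from the two modules. */
#define FP_MATCH   0x01   /* fingerprint matched a stored template */
#define CMD_NONE   0x00   /* no recognized voice command           */

int main(void)
{
    /* Phase 1: the shared serial port talks to the fingerprint module. */
    relay_select_fingerprint();
    if (uart_read_byte() != FP_MATCH) {
        lcd_print("Error: access denied");
        return 1;                          /* unauthorized: stop here */
    }
    lcd_print("Owner verified");

    /* Phase 2: switch the relay so the port now talks to the VRbot. */
    relay_select_voice();
    for (;;) {
        uint8_t cmd = uart_read_byte();    /* recognized-word index */
        if (cmd == CMD_NONE) {
            lcd_print("Error: unknown command");
            continue;
        }
        /* Map cmd to motor actions: forward, reverse, left, right,
         * load, fire, lights on, stop (H-bridge drive not shown). */
    }
}
```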

2. LITERATURE SURVEY

Introduction

Speech is the primary means of communication between people. For reasons ranging from technological curiosity about the mechanisms for mechanical realization of human speech capabilities, to the desire to automate simple tasks that inherently require human-machine interaction, research in automatic speech recognition (and speech synthesis) by machine has attracted a great deal of attention over the past five decades. The desire for automation of simple tasks is not a modern phenomenon, but one that goes back more than one hundred years. By way of example, in 1881 Alexander Graham Bell, his cousin Chichester Bell, and Charles Sumner Tainter invented a recording device that used a rotating cylinder with a wax coating, on which up-and-down grooves could be cut by a stylus that responded to incoming sound pressure (in much the same way as the microphone that Bell had invented earlier for use with the telephone). Based on this invention, Bell and Tainter formed the Volta Graphophone Co. in 1888 in order to manufacture machines for the recording and reproduction of sound in office environments. The American Graphophone Co., which later became the Columbia Graphophone Co., acquired the patent in 1907 and trademarked the term Dictaphone. At just about the same time, Thomas Edison invented the phonograph using a tinfoil-based cylinder, which was subsequently adapted to wax, and developed the Ediphone to compete directly with Columbia. The purpose of these products was to record dictation of notes and letters for a secretary (likely in a large pool that offered the service, as shown in Figure 1) who would later type them out (offline), thereby circumventing the need for costly stenographers. This turn-of-the-century concept of office mechanization spawned a range of electric and electronic implements and improvements, including the electric typewriter, which changed the face of office automation in the mid-part of the twentieth century. It does not take much imagination to envision the obvious interest in creating an automatic typewriter that could directly respond to and transcribe a human's voice without having to deal with the annoyance of recording and handling the speech on wax cylinders or other recording media. A similar kind of automation took place a century later, in the 1990s, in the area of call centers. A call center is a concentration of agents or associates that handle telephone calls from

customers requesting assistance. Among the tasks of such call centers is routing the incoming calls to the proper department, where specific help is provided or transactions are carried out. One example of such a service was the AT&T Operator line, which helped a caller place calls, arrange payment methods, and conduct credit card transactions. The number of agent positions (or stations) in a large call center could reach several thousand. Automatic speech recognition technologies provided the capability of automating these call handling functions, thereby reducing the large operating cost of a call center. By way of example, the AT&T Voice Recognition Call Processing (VRCP) service, which was introduced into the AT&T network in 1992, routinely handles about 1.2 billion voice transactions with machines each year, using automatic speech recognition technology to appropriately route and handle the calls [3].

Speech recognition technology has also been a topic of great interest to the broad general population since it became popularized in several blockbuster movies of the 1960s and 1970s, most notably Stanley Kubrick's acclaimed movie 2001: A Space Odyssey. In this movie, an intelligent computer named HAL spoke in a natural-sounding voice and was able to recognize and understand fluently spoken speech and respond accordingly. This anthropomorphism of HAL made the general public aware of the potential of intelligent machines. In the famous Star Wars saga, George Lucas extended the abilities of intelligent machines by making them mobile as well as intelligent, and droids like R2D2 and C3PO were able to speak naturally, recognize and understand fluent speech, and move around and interact with their environment, with other droids, and with the human population at large. More recently (in 1988), in the technology community, Apple Computer created a vision of speech technology and computers for the year 2011, titled Knowledge Navigator, which defined the concepts of a Speech User Interface (SUI) and a Multimodal User Interface (MUI), along with the theme of intelligent voice-enabled agents. This video had a dramatic effect on the technical community and focused technology efforts, especially in the area of visual talking agents.

Figure 1: An early 20th-century transcribing pool at Sears, Roebuck and Co. The women are using cylinder dictation machines and listening to the recordings with ear-tubes (David Morton, The History of Sound Recording).

Today speech technologies are commercially available for a limited but interesting range of tasks. These technologies enable machines to respond correctly and reliably to human voices and provide useful and valuable services. While we are still far from having a machine that converses with humans on any topic like another human, many important scientific and technological advances have taken place, bringing us closer to the Holy Grail of machines that recognize and understand fluently spoken speech. This section attempts to provide a historic perspective on key inventions that have enabled progress in speech recognition and language understanding, briefly reviews several technology milestones, and enumerates some of the remaining challenges that lie ahead of us.

One of the earliest speech recognition devices was the IBM Shoebox [2], exhibited at the 1964 New York World's Fair. One of the most notable domains for the commercial application of speech recognition in the United States has been healthcare, in particular medical transcription (MT), the process of generating medical records using speech. Many Electronic Medical Records (EMR) applications can be more effective, and may be operated more easily, when deployed in conjunction with a speech recognition engine. Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard.

Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), the program in France on installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system.

Automatic speech recognition systems are currently being used as answering machines in telephone systems. These systems take voice commands from users and perform specific operations. People with disabilities can also benefit from speech recognition programs; speech recognition is especially useful for people who have difficulty using their hands. Such systems include voice-operated wheelchairs and voice-operated computers [3]. H.R. Singh, Abdul Mobin, Sanjiv Kumar, Sandip Chuhan, and S.S. Agrwal have published a paper on the design and development of a voice/joystick-operated microcontroller-based intelligent motorised wheelchair. They developed a system, built around the N80C196KC microcontroller, which received coded signals from an isolated word recognition system and a mechanical joystick mechanism to control the various movements and operations of the chair. Their main contribution is a highly accurate, language-independent, and speaker-adaptable isolated word recognition (IWSR) system. Masato Nishimori, Takeshi Saitoh, and Ryosuke Konishi of Tottori University, Tottori, Japan, developed a voice-controlled wheelchair to assist physically handicapped persons [4]. The user can control the wheelchair by voice commands, such as susume (run forward) in Japanese. A grammar-based recognition parser named Julian is used in their system. Three types of commands are given: the basic reaction command, the short moving

reaction command, and the verification command. They evaluated speech recognition with Julian and obtained successful recognition rates of 98.3% for the movement commands and 97.0% for the verification commands. A running experiment with three persons was also carried out.

The technology of Automatic Speech Recognition (ASR) and transcription has progressed greatly over the past few years. Ever since research on this technology began in 1936, the largest barriers to the speed and accuracy of speech recognition were computer speed and power (or the lack thereof). With average CPUs now at or above a Pentium III and RAM at 500 MB and up, accuracy levels have reached 95% and better, with transcription speeds of over 160 words per minute. As mentioned above, the study of automatic speech recognition [5] and transcription began in 1936 at AT&T's Bell Labs. At that time, most research was funded and performed by universities and the U.S. government (primarily by the military and DARPA, the Defense Advanced Research Projects Agency). It was not until the early 1980s that the technology reached the commercial market. Like most emerging technologies, there were several competing research "camps", each working independently to develop speech recognition. The first company to launch a commercial product was COVOX in 1982. COVOX brought digital sound (via The Voice Master, Sound Master and The Speech Thing) to the Commodore 64, Atari 400/800, and finally to the IBM PC in the mid-80s. Along with (or bundled with) this introduction of sound to computers came speech recognition. Another company that was founded in 1982, and whose eventual product has become the overwhelming leader in the speech recognition market, was Dragon Systems. ScanSoft, Inc. now owns and manufactures this product, Dragon NaturallySpeaking. Dragon Systems was founded in 1982 by James and Janet Baker to commercialize speech recognition technology. As graduate students at Rockefeller University in 1970, they became interested in speech recognition while observing waveforms of speech on an oscilloscope. At the time, systems were in place for recognizing a few hundred words of discrete speech, provided the system was trained on the speaker and the speaker paused between words. There were not yet

techniques that could sort through naturally spoken sentences. James Baker saw the waveforms, and the problem of natural speech recognition, as an interesting pattern-recognition problem. Rockefeller had neither experts in speech understanding nor suitable computing power, and so the Bakers moved to Carnegie Mellon University (CMU), a prime contractor for DARPA's Speech Understanding Research program. There they began to work on natural speech recognition capabilities. Their approach differed from that of other speech researchers, most of whom were attempting to recognize spoken language by providing contextual information, such as the speaker's identity, what the speaker knew, and what the speaker might be trying to say, in addition to rules of English. The Bakers' approach was based purely on statistical relationships, such as the probability that any two or three words would appear one after another in spoken English. They created a phonetic dictionary with the sounds of different word groups and then set to work on an algorithm to decipher a string of spoken words based on phonetic sound matches and the probability that someone would speak the words in that order. Their approach soon began outperforming competing systems. After receiving their doctorates from CMU in 1975, the Bakers joined IBM's T.J. Watson Research Center, one of the only organizations at the time working on large-vocabulary, continuous speech recognition. The Bakers developed a program that could recognize speech from a 1,000-word vocabulary, but it could not do so in real time: running on an IBM System 370 computer, it took roughly an hour to decode a single spoken sentence. Nevertheless, the Bakers grew impatient with what they saw as IBM's reluctance to develop simpler systems that could be more rapidly put to commercial use. They left in 1979 to join VERBEX Voice Systems, a subsidiary of Exxon Enterprises that had built a system for collecting data over the telephone using spoken digits. Less than 3 years later, however, Exxon exited the speech recognition business. With few alternatives, the Bakers decided to start their own company, Dragon Systems. The company survived its early years through a mix of custom projects, government research contracts, and new products that relied on the more mature discrete speech recognition technology. In 1984, they provided Apricot Computer, a British company, with the first speech recognition capability for a personal computer (PC). It allowed users to open files and run programs using spoken commands. But Apricot folded shortly thereafter. In 1986, Dragon

Systems was awarded the first of a series of contracts from DARPA to advance large-vocabulary, speaker-independent continuous speech recognition, and by 1988, Dragon had conducted the first public demonstration of a PC-based discrete speech recognition system, boasting an 8,000-word vocabulary. In 1990, Dragon demonstrated a 5,000-word continuous speech system for PCs and introduced DragonDictate 30K, the first large-vocabulary speech-to-text system for general-purpose dictation. It allowed control of a PC using voice commands only, and found acceptance among the disabled. The system had limited appeal in the broader marketplace because it required users to pause between words. Other federal contracts enabled Dragon to improve its technology. In 1991, Dragon received a contract from DARPA for work on machine-assisted translation systems, and in 1993, Dragon received a federal Technology Reinvestment Project award to develop, in collaboration with Analog Devices Corporation, continuous speech recognition systems for desktop and hand-held personal digital assistants (PDAs). Dragon demonstrated PDA speech recognition in the Apple Newton MessagePad 2000 in 1997. Late in 1993, the Bakers realized that improvements in desktop computers would soon allow continuous voice recognition, and they quickly began setting up a new development team to build such a product. To finance the needed expansion of its engineering, marketing, and sales staff, Dragon brokered a deal whereby Seagate Technologies bought 25 percent of Dragon's stock. By July 1997, Dragon had launched Dragon NaturallySpeaking, a continuous speech recognition program for general-purpose use with a vocabulary of 23,000 words. The package won rave reviews and numerous awards. IBM quickly followed suit, offering its own continuous speech recognition program, ViaVoice, in August after a crash development program. By the end of the year, the two companies combined had sold more than 75,000 copies of their software. Other companies, such as Microsoft Corporation and Lucent Technologies, were expected to introduce products shortly thereafter, and analysts projected a $4 billion worldwide market by 2001. In 2000, Lernout & Hauspie acquired Dragon Systems. In 2001, ScanSoft, Inc. acquired all rights to Lernout & Hauspie's speech recognition products, including Dragon NaturallySpeaking. In 2003, ScanSoft acquired SpeechWorks. ScanSoft is presently the world leader in speech recognition technology in the commercial market.

2.1 Early Automatic Speech Recognizers

Early attempts to design systems for automatic speech recognition were mostly guided by the theory of acoustic-phonetics, which describes the phonetic elements of speech (the basic sounds of the language) and tries to explain how they are acoustically realized in a spoken utterance. These elements include the phonemes and the corresponding place and manner of articulation used to produce the sound in various phonetic contexts. For example, in order to produce a steady vowel sound, the vocal cords need to vibrate (to excite the vocal tract), and the air that propagates through the vocal tract results in sound with natural modes of resonance, similar to what occurs in an acoustic tube. These natural modes of resonance, called the formants or formant frequencies, are manifested as major regions of energy concentration in the speech power spectrum. In 1952, Davis, Biddulph, and Balashek of Bell Laboratories built a system for isolated digit recognition [5] for a single speaker, using the formant frequencies measured (or estimated) during the vowel regions of each digit. In other early recognition systems of the 1950s, Olson and Belar of RCA Laboratories built a system to recognize 10 syllables of a single talker, and at MIT Lincoln Lab, Forgie and Forgie built a speaker-independent 10-vowel recognizer. In the 1960s, several Japanese laboratories demonstrated their capability of building special-purpose hardware to perform a speech recognition task. Most notable were the vowel recognizer of Suzuki and Nakata at the Radio Research Lab in Tokyo, the phoneme recognizer of Sakai and Doshita at Kyoto University, and the digit recognizer of NEC Laboratories. The work of Sakai and Doshita involved the first use of a speech segmenter for analysis and recognition of speech in different portions of the input utterance. In contrast, an isolated digit recognizer implicitly assumed that the unknown utterance contained a complete digit (and no other speech sounds or words) and thus did not need an explicit segmenter. Kyoto University's work could be considered a precursor to a continuous speech recognition system. In another early recognition system, Fry and Denes, at University College in England, built a phoneme recognizer to recognize 4 vowels and 9 consonants. By incorporating statistical information about allowable phoneme sequences in English, they increased the overall phoneme recognition accuracy for words consisting of two or more

phonemes. This work marked the first use of statistical syntax (at the phoneme level) in automatic speech recognition. An alternative to the use of a speech segmenter was the concept of adopting a non-uniform time scale for aligning speech patterns. This concept started to gain acceptance in the 1960s through the work of Tom Martin at RCA Laboratories and Vintsyuk in the Soviet Union. Martin recognized the need to deal with the temporal non-uniformity in repeated speech events and suggested a range of solutions, including detection of utterance endpoints, which greatly enhanced the reliability of recognizer performance. Vintsyuk proposed the use of dynamic programming for time alignment between two utterances in order to derive a meaningful assessment of their similarity. His work, though largely unknown in the West, appears to have preceded that of Sakoe and Chiba, as well as others who proposed more formal methods, generally known as dynamic time warping, for speech pattern matching. Since the late 1970s, mainly due to the publication by Sakoe and Chiba, dynamic programming, in numerous variant forms (including the Viterbi algorithm, which came from the communication-theory community), has become an indispensable technique in automatic speech recognition.
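As an illustration of the dynamic-programming idea, the sketch below computes a classic dynamic time warping (DTW) distance between two feature sequences in C. The recursion D(i,j) = cost(i,j) + min(D(i-1,j), D(i,j-1), D(i-1,j-1)) is the textbook formulation; the one-dimensional features and symmetric step pattern are simplifying assumptions rather than any specific historical system.

```c
#include <math.h>
#include <stdio.h>

/* DTW distance between 1-D sequences a[0..n-1] and b[0..m-1].
 * D[i][j] = local cost + min of the three allowed predecessors. */
static double dtw(const double *a, int n, const double *b, int m)
{
    enum { MAX = 64 };            /* illustrative size bound */
    static double D[MAX + 1][MAX + 1];

    for (int i = 0; i <= n; i++)
        for (int j = 0; j <= m; j++)
            D[i][j] = HUGE_VAL;   /* unreachable by default */
    D[0][0] = 0.0;

    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            double cost = fabs(a[i - 1] - b[j - 1]);
            double best = D[i - 1][j - 1];              /* diagonal  */
            if (D[i - 1][j] < best) best = D[i - 1][j]; /* insertion */
            if (D[i][j - 1] < best) best = D[i][j - 1]; /* deletion  */
            D[i][j] = cost + best;
        }
    }
    return D[n][m];
}

int main(void)
{
    /* The same "word" spoken at two different speeds. */
    double slow[] = {1, 1, 2, 3, 3, 2, 1, 1};
    double fast[] = {1, 2, 3, 2, 1};
    printf("DTW distance: %f\n", dtw(slow, 8, fast, 5));
    return 0;
}
```

The path that achieves this minimum is exactly the non-uniform time alignment described above; recognition then amounts to picking the stored template with the smallest DTW distance to the input.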

2.1.1 Technology Drivers since the 1970s

In the late 1960s, Atal and Itakura independently formulated the fundamental concepts of Linear Predictive Coding (LPC), which greatly simplified the estimation of the vocal tract response from speech waveforms. By the mid-1970s, the basic ideas of applying fundamental pattern recognition technology to speech recognition, based on LPC methods, were proposed by Itakura, Rabiner and Levinson, and others (see the LPC sketch below). Also during this time period, based on his earlier success at aligning speech utterances, Tom Martin founded the first commercial speech recognition company, Threshold Technology, Inc., and developed the first real ASR product, the VIP-100 system. The system was only used in a few simple applications, such as by television faceplate manufacturing firms (for quality control) and by FedEx (for package sorting on a conveyor belt), but its main importance was the way it influenced the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense to fund the Speech Understanding Research (SUR) program during the early 1970s.
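The core of LPC is the assumption that each speech sample can be approximated as a weighted sum of the previous p samples, s[n] ~ a1*s[n-1] + ... + ap*s[n-p]; the predictor coefficients then summarize the vocal tract response for that frame. A minimal sketch of the standard autocorrelation method with the Levinson-Durbin recursion is shown below in C; the order and frame handling are illustrative choices, not those of any particular recognizer.

```c
#include <stdio.h>

#define ORDER 8   /* illustrative LPC order */

/* Autocorrelation method: r[k] = sum_n s[n] * s[n+k], k = 0..ORDER. */
static void autocorr(const double *s, int n, double *r)
{
    for (int k = 0; k <= ORDER; k++) {
        r[k] = 0.0;
        for (int i = 0; i + k < n; i++)
            r[k] += s[i] * s[i + k];
    }
}

/* Levinson-Durbin recursion: solve for predictor coefficients a[1..p]
 * from the autocorrelation sequence r[0..p]. Returns residual energy. */
static double levinson(const double *r, double *a)
{
    double e = r[0];
    for (int i = 1; i <= ORDER; i++) {
        double k = r[i];                    /* reflection coefficient */
        for (int j = 1; j < i; j++)
            k -= a[j] * r[i - j];
        k /= e;

        a[i] = k;
        for (int j = 1; j <= i / 2; j++) {  /* symmetric in-place update */
            double tmp = a[j] - k * a[i - j];
            a[i - j] -= k * a[j];
            a[j] = tmp;
        }
        e *= (1.0 - k * k);                 /* prediction error shrinks */
    }
    return e;
}

int main(void)
{
    double frame[160];                      /* one 20 ms frame at 8 kHz */
    for (int i = 0; i < 160; i++)           /* toy stand-in for speech  */
        frame[i] = (i % 16) / 8.0 - 1.0;

    double r[ORDER + 1], a[ORDER + 1] = {0};
    autocorr(frame, 160, r);
    double err = levinson(r, a);
    printf("residual energy %f, a[1] = %f\n", err, a[1]);
    return 0;
}
```

The coefficient set a[1..p] (or a distance between coefficient sets, such as the Itakura distance mentioned below) then serves as the frame's parameter set for pattern matching.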

Among the systems built by the contractors of the ARPA program was Carnegie Mellon University's Harpy, which was shown to be able to recognize speech using a vocabulary of 1,011 words with reasonable accuracy. One particular contribution of the Harpy system was the concept of doing a graph search, where the speech recognition language was represented as a connected network derived from lexical representations of words, with syntactical production rules and word boundary rules. In the Harpy system, the input speech, after going through a parametric analysis, was segmented, and the segmented parametric sequence of speech was then subjected to phone template matching using the Itakura distance. The graph search, based on a beam search algorithm (sketched below), compiled, hypothesized, pruned, and then verified the recognized sequence of words (or sounds) that satisfied the knowledge constraints with the highest matching score (smallest distance to the reference patterns). The Harpy system was perhaps the first to take advantage of a finite-state network to reduce computation and efficiently determine the closest matching string. However, methods which optimized the resulting finite-state network (FSN), for performance as well as to eliminate redundancy, did not come about until the early 1990s.

Other systems developed under DARPA's SUR program included CMU's Hearsay-II and BBN's HWIM. Neither Hearsay-II nor HWIM (Hear What I Mean) met the DARPA program's performance goal at its conclusion in 1976. However, the approach proposed by Hearsay-II, of using parallel asynchronous processes that simulate the component knowledge sources in a speech system, was a pioneering concept. The Hearsay-II system extended sound identity analysis (to higher-level hypotheses) given the detection of a certain type of (lower-level) information or evidence, which was provided to a global blackboard where knowledge from parallel sources was integrated to produce the next level of hypothesis. BBN's HWIM system, on the other hand, was known for its interesting ideas, including a lexical decoding network incorporating sophisticated phonological rules (aimed at phoneme recognition accuracy), its handling of segmentation ambiguity by a lattice of alternative hypotheses, and the concept of word verification at the parametric level. Another system worth noting from the time was the DRAGON system by Jim Baker, who moved to Massachusetts to start a company with the same name in the early 1980s.
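The beam idea that Harpy popularized is simple to state in code: at each step, keep only the hypotheses whose score is within a fixed margin of the current best, and expand only those. The sketch below, in C, prunes a scored hypothesis list this way; the scores, beam width, and fixed-size list are illustrative assumptions, not Harpy's actual data structures.

```c
#include <stdio.h>

#define MAX_HYPS 128
#define BEAM_WIDTH 5.0   /* assumed pruning margin (distance units) */

typedef struct {
    int    state;        /* node in the finite-state network     */
    double score;        /* accumulated distance, lower is better */
} Hyp;

/* Keep only hypotheses within BEAM_WIDTH of the best score.
 * Returns the number of survivors, compacted to the front. */
static int beam_prune(Hyp *hyps, int n)
{
    double best = hyps[0].score;
    for (int i = 1; i < n; i++)
        if (hyps[i].score < best)
            best = hyps[i].score;

    int kept = 0;
    for (int i = 0; i < n; i++)
        if (hyps[i].score <= best + BEAM_WIDTH)
            hyps[kept++] = hyps[i];
    return kept;
}

int main(void)
{
    Hyp hyps[MAX_HYPS] = {
        {1, 10.0}, {2, 11.5}, {3, 22.0}, {4, 14.9}, {5, 30.3},
    };
    int n = beam_prune(hyps, 5);
    printf("%d hypotheses survive the beam\n", n);  /* states 1, 2, 4 */
    return 0;
}
```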


In parallel to the ARPA-initiated efforts, two broad directions in speech recognition research started to take shape in the 1970s, with IBM and AT&T Bell Laboratories essentially representing two different schools of thought as to the applicability of automatic speech recognition systems for commercial applications. IBM's effort, led by Fred Jelinek, was aimed at creating a voice-activated typewriter (VAT), the main function of which was to convert a spoken sentence into a sequence of letters and words that could be shown on a display or typed on paper. The recognition system, called Tangora, was essentially a speaker-dependent system (i.e., the typewriter had to be trained by each individual user). The technical focus was on the size of the recognition vocabulary (as large as possible, with a primary target being the vocabulary used in office correspondence) and on the structure of the language model (the grammar), which was represented by statistical syntactical rules describing how likely, in a probabilistic sense, a sequence of language symbols (e.g., phonemes or words) was to appear in the speech signal. This type of speech recognition task is generally referred to as transcription. The set of statistical grammatical or syntactical rules was called a language model, of which the n-gram model, which defined the probability of occurrence of an ordered sequence of n words, was the most frequently used variant. Although both the n-gram language model and a traditional grammar are manifestations of the rules of the language, their roles were fundamentally different. The n-gram model, which characterized the word relationships within a span of n words, was purely a convenient and powerful statistical representation of a grammar. Its effectiveness in guiding a word search for speech recognition, however, was strongly validated by the famous word game of Claude Shannon, which involved a competition between a human and a computer: both are asked to sequentially guess the next word in an arbitrary sentence. The human guesses based on native experience with the language; the computer uses accumulated word statistics to make its best guess based on maximum probability from the estimated word frequencies. It was shown that once the span of words, n, exceeded 3, the computer was very likely to win (make better guesses as to the next word in the sequence) over the human player. Since their introduction in the 1980s, n-gram language models and their variants have become indispensable in large-vocabulary speech recognition systems; a small sketch of the idea follows.
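As a concrete, if toy-sized, illustration of an n-gram model, the C sketch below estimates bigram probabilities from word-ID counts and scores a candidate word sequence; real systems add smoothing for unseen pairs and work in log probabilities, both of which are omitted here for brevity.

```c
#include <stdio.h>

#define VOCAB 4   /* toy vocabulary: 0=<s>, 1=open, 2=the, 3=file */

/* Raw counts gathered from a (hypothetical) training corpus. */
static int unigram[VOCAB];
static int bigram[VOCAB][VOCAB];

static void train(const int *words, int n)
{
    for (int i = 0; i + 1 < n; i++) {
        unigram[words[i]]++;
        bigram[words[i]][words[i + 1]]++;
    }
}

/* P(w2 | w1) estimated by relative frequency (no smoothing). */
static double bigram_prob(int w1, int w2)
{
    if (unigram[w1] == 0) return 0.0;
    return (double)bigram[w1][w2] / unigram[w1];
}

/* Probability of a whole sequence as a product of bigram terms. */
static double sequence_prob(const int *words, int n)
{
    double p = 1.0;
    for (int i = 0; i + 1 < n; i++)
        p *= bigram_prob(words[i], words[i + 1]);
    return p;
}

int main(void)
{
    int corpus[] = {0, 1, 2, 3, 0, 1, 2, 3, 0, 2, 3};
    train(corpus, 11);

    int hyp[] = {0, 1, 2, 3};   /* "<s> open the file" */
    printf("P = %f\n", sequence_prob(hyp, 4));
    return 0;
}
```

In decoding, such sequence probabilities are what let the language model rescue a word whose acoustic evidence is uncertain, exactly the role described above.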

At AT&T Bell Laboratories, the goal of the research program was to provide automated telecommunication services to the public, such as voice dialling, and command and control for the routing of phone calls. These automated systems were expected to work well for a vast population (literally tens of millions) of talkers without the need for individual speaker training. The focus at Bell Laboratories was on the design of a speaker-independent system that could deal with the acoustic variability intrinsic in speech signals coming from many different talkers, often with notably different regional accents. This led to the creation of a range of speech clustering algorithms for creating word and sound reference patterns (initially templates, but ultimately statistical models) that could be used across a wide range of talkers and accents. Furthermore, research into understanding and controlling the acoustic variability of various speech representations across talkers led to the study of a range of spectral distance measures (e.g., the Itakura distance) and statistical modelling techniques that produced sufficiently rich representations of the utterances from a vast population. (The mixture-density hidden Markov model has since become the prevalent representation of speech units for speaker-independent continuous speech recognition.) Since applications such as voice dialling and call routing usually involved only short utterances of limited vocabulary, consisting of only a few words, the research at Bell Laboratories emphasized what is generally called the acoustic model (the spectral representation of sounds or words) over the language model (the representation of the grammar or syntax of the task).

Also of great importance in the Bell Laboratories approach was the concept of keyword spotting as a primitive form of speech understanding. The technique of keyword spotting aimed at detecting a keyword or key-phrase of some particular significance embedded in a longer utterance in which the other words carried no semantic significance. The need for such keyword spotting was to accommodate talkers who preferred to speak in natural sentences rather than using rigid command sequences when requesting services (i.e., as if they were speaking to a human operator). For example, a telephone caller requesting a credit card charge might speak the sentence "I'd like to charge it to my credit card" rather than just say "credit card". In a limited-domain application, the presence of the key-phrase "credit card" in an otherwise naturally spoken sentence was generally sufficient to indicate the caller's intent to make a credit card call. The detected keyword or key-phrase would then trigger a prescribed action (or sequence of actions) as part of the service, in response to the talker's spoken utterance.

The technique of keyword spotting required extension of the usual pattern recognition paradigm to one that supported hypothesis testing. The IBM and AT&T Bell Laboratories approaches to speech recognition both had a profound influence on the evolution of human-machine speech communication technology over the following two decades. One common theme between these efforts, despite their differences, was that mathematical formalism and rigor started to emerge as distinct and important aspects of speech recognition research. While the difference in goals led to different realizations of the technology in various applications, the rapid development of statistical methods in the 1980s, most notably the hidden Markov model (HMM) framework, caused a certain degree of convergence in system design. Today, most practical speech recognition systems are based on the statistical framework and results developed in the 1980s, with significant additional improvements made in the 1990s.

2.1.2 Technology Directions in the 1980s and 1990s

Speech recognition research in the 1980s was characterized by a shift in methodology from the more intuitive template-based approach (a straightforward pattern recognition paradigm) towards a more rigorous statistical modelling framework. Although the basic idea of the hidden Markov model (HMM) [5] was known and understood early on in a few laboratories (e.g., IBM and the Institute for Defence Analyses (IDA)), the methodology was not complete until the mid-1980s, and it wasn't until after widespread publication of the theory that the hidden Markov model became the preferred method for speech recognition. The popularity and use of the HMM as the main foundation for automatic speech recognition and understanding systems has remained constant over the past two decades, especially because of the steady stream of improvements and refinements of the technology.

The hidden Markov model, which is a doubly stochastic process, models the intrinsic variability of the speech signal (and the resulting spectral features) as well as the structure of spoken language in an integrated and consistent statistical modelling framework. As is well known, a realistic speech signal is inherently highly variable (due to variations in pronunciation and accent, as well as environmental factors such as reverberation and noise). When people speak the same word, the acoustic signals are not identical (in fact they may even be remarkably

different), even though the underlying linguistic structure, in terms of pronunciation, syntax, and grammar, may (or may not) remain the same. The formalism of the HMM is a probability measure that uses a Markov chain to represent the linguistic structure and a set of probability distributions to account for the variability in the acoustic realization of the sounds in the utterance. Given a set of known (text-labelled) utterances representing a sufficient collection of the variations of the words of interest (called a training set), one can use an efficient estimation method, called the Baum-Welch algorithm, to obtain the best set of parameters that define the corresponding model or models. The estimation of the parameters that define the model is equivalent to training and learning. The resulting model is then used to provide an indication of the likelihood (probability) that an unknown utterance is indeed a realization of the word (or words) represented by the model; a sketch of this likelihood computation, the forward algorithm, is given below. The probability measure represented by the hidden Markov model is an essential component of a speech recognition system that follows the statistical pattern recognition approach, and has its roots in Bayes decision theory. The HMM methodology represented a major step forward from the simple pattern recognition and acoustic-phonetic methods used earlier in automatic speech recognition systems.

The idea of the hidden Markov model appears to have first come out in the late 1960s at the Institute for Defence Analyses (IDA) in Princeton, N.J. Len Baum referred to an HMM as a set of probabilistic functions of a Markov chain, which, by definition, involves two nested distributions, one pertaining to the Markov chain and the other to a set of probability distributions, each associated with a state of the Markov chain. The HMM attempts to address the characteristics of a probabilistic sequence of observations that may not be a fixed function but instead changes according to a Markov chain. This doubly stochastic process was found to be useful in a number of applications, such as stock market prediction and cryptanalysis of a rotary cipher, which was widely used during World War II. Baum's modelling and estimation technique was first shown to work for discrete observations (i.e., ones that assume values from a finite set and thus are governed by discrete probability distributions) and then for random observations that were well modelled using log-concave probability density functions. The technique was powerful but limited.
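To make the "probability measure" idea concrete, the C sketch below implements the standard forward algorithm for a discrete HMM: it sums, over all state paths, the probability that the model generated an observation sequence, which is exactly the likelihood score used to compare word models against an unknown utterance. The two-state model and its numbers are made up for illustration.

```c
#include <stdio.h>

#define N 2   /* hidden states       */
#define M 2   /* observation symbols */
#define T 4   /* sequence length     */

static const double pi[N]   = {0.6, 0.4};                 /* initial     */
static const double A[N][N] = {{0.7, 0.3}, {0.4, 0.6}};   /* transitions */
static const double B[N][M] = {{0.9, 0.1}, {0.2, 0.8}};   /* emissions   */

/* Forward algorithm: returns P(obs | model), summing over all paths. */
static double forward(const int *obs)
{
    double alpha[T][N];

    for (int i = 0; i < N; i++)                 /* initialization */
        alpha[0][i] = pi[i] * B[i][obs[0]];

    for (int t = 1; t < T; t++) {               /* induction */
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int i = 0; i < N; i++)
                sum += alpha[t - 1][i] * A[i][j];
            alpha[t][j] = sum * B[j][obs[t]];
        }
    }

    double p = 0.0;                             /* termination */
    for (int i = 0; i < N; i++)
        p += alpha[T - 1][i];
    return p;
}

int main(void)
{
    int obs[T] = {0, 0, 1, 1};   /* a quantized feature sequence */
    printf("P(obs | model) = %g\n", forward(obs));
    return 0;
}
```

Replacing the sum over predecessor states with a max yields the Viterbi algorithm mentioned earlier; Baum-Welch training iteratively re-estimates pi, A, and B from such forward (and backward) passes.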

Liporace, also of IDA, relaxed the log-concave density constraint to an elliptically symmetric density constraint (thereby including the Gaussian and Cauchy densities), with help from an old representation theorem by Fan. Baum's doubly stochastic process started to find applications in the speech area, initially in speaker identification systems, in the late 1970s. As more people attempted to use the HMM technique, it became clear that the constraint on the form of the density functions imposed a limitation on the performance of the system, particularly for speaker-independent tasks where the speech parameter distribution was not sufficiently well modelled by a simple log-concave or elliptically symmetric density function. In the early 1980s at Bell Laboratories, the theory of the HMM was extended to mixture densities, which have since proven vitally important in ensuring satisfactory recognition accuracy, particularly for speaker-independent, large-vocabulary speech recognition tasks.

The HMM, being a probability measure, was amenable to incorporation in a larger speech decoding framework which included a language model. The use of a finite-state grammar in large-vocabulary continuous speech recognition represented a consistent extension of the Markov chain that the HMM utilized to account for the structure of the language, albeit at a level that accounted for the interaction between articulation and pronunciation. Although these structures (for the various levels of the language constraints) were at best crude approximations to real speech, the combination of the HMM (with its statistical consistency, particularly in handling acoustic variability) and the finite-state network (with its search and computational efficiency, particularly in handling word sequence hypotheses) was an important, although not unexpected, technological development in the mid-1980s.

Figure: A composite finite-state network for the utterance "show all alerts".

Another technology that was (re)introduced in the late 1980s was the idea of artificial neural networks (ANN). Neural networks were first introduced in the 1950s, but failed to produce notable results initially. The advent, in the 1980s, of a parallel distributed processing (PDP) model, which was a dense interconnection of simple computational elements, and a corresponding training method, called error back-propagation, revived interest in the old idea of mimicking the human neural processing mechanism. A particular form of PDP, the multilayer perceptron, received perhaps the most intense attention then, not because of its analogy to neural processing but due to its capability of approximating any function of the input to an arbitrary precision, provided no limitation in the complexity of the processing configuration was

imposed. If a pattern recognizer is viewed as one that performs a function mapping an input pattern to its class identity, the multilayer perceptron was then a readily available candidate for this purpose (a minimal forward-pass sketch appears at the end of this subsection). Early attempts at using neural networks for speech recognition centred on simple tasks like recognizing a few phonemes or a few words (e.g., isolated digits), with good success. However, as the problem of speech recognition inevitably requires the handling of temporal variation, neural networks in their original form have not proven to be extensible to this task. Ongoing research focuses on integrating neural networks with the essential structure of a hidden Markov model to take advantage of the temporal handling capability of the HMM.

In the 1990s, a number of innovations took place in the field of pattern recognition. The problem of pattern recognition, which traditionally followed the framework of Bayes and required estimation of distributions for the data, was transformed into an optimization problem involving minimization of the empirical recognition error. After all, the objective of a recognizer design should be to achieve the least recognition error rather than the best fit of a distribution function to the given (known) data set, as advocated by the Bayes criterion. The concept of minimum classification (or empirical) error subsequently spawned a number of techniques, among which discriminative training and kernel-based methods such as the support vector machine (SVM) have become popular subjects of study.

In the 1990s great progress was also made in the development of software tools that enabled many individual research programs all over the world. As systems became more sophisticated (many large-vocabulary systems now involve tens of thousands of phone unit models and millions of parameters), a well-structured baseline software system became indispensable for further research and development to incorporate new concepts and algorithms. The system made available by the Cambridge University team (led by Steve Young), called the Hidden Markov Model Toolkit (HTK), was (and remains today) one of the most widely adopted software tools for automatic speech recognition research.
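As a minimal illustration of the multilayer perceptron as a trainable function approximator, the C sketch below runs a forward pass through one hidden layer with a sigmoid nonlinearity; the layer sizes and weights are arbitrary illustrative values, and training (error back-propagation) is omitted.

```c
#include <math.h>
#include <stdio.h>

#define N_IN  3   /* input features */
#define N_HID 4   /* hidden units   */
#define N_OUT 2   /* output classes */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* One forward pass: input -> hidden (sigmoid) -> output (sigmoid). */
static void mlp_forward(const double in[N_IN],
                        const double w1[N_HID][N_IN],  const double b1[N_HID],
                        const double w2[N_OUT][N_HID], const double b2[N_OUT],
                        double out[N_OUT])
{
    double hid[N_HID];

    for (int j = 0; j < N_HID; j++) {
        double s = b1[j];
        for (int i = 0; i < N_IN; i++)
            s += w1[j][i] * in[i];
        hid[j] = sigmoid(s);
    }
    for (int k = 0; k < N_OUT; k++) {
        double s = b2[k];
        for (int j = 0; j < N_HID; j++)
            s += w2[k][j] * hid[j];
        out[k] = sigmoid(s);
    }
}

int main(void)
{
    /* Arbitrary illustrative weights; real ones come from training. */
    double w1[N_HID][N_IN] = {{0.2,-0.1,0.4},{0.7,0.3,-0.5},
                              {-0.3,0.8,0.1},{0.5,-0.6,0.2}};
    double b1[N_HID] = {0.1, -0.2, 0.0, 0.3};
    double w2[N_OUT][N_HID] = {{0.6,-0.4,0.3,0.1},{-0.2,0.5,-0.7,0.4}};
    double b2[N_OUT] = {0.0, 0.1};

    double in[N_IN] = {0.5, -1.0, 0.25}, out[N_OUT];
    mlp_forward(in, w1, b1, w2, b2, out);
    printf("class scores: %f %f\n", out[0], out[1]);
    return 0;
}
```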


Figure 2.2: Milestones in Speech Recognition and Understanding Technology

Further developed systems using speech recognition include: automatic translation, automotive speech recognition, telematics (e.g., vehicle navigation systems), home automation, and automatic wheelchairs for disabled persons.

Fingerprint Module

Prehistoric picture writing of a hand with ridge patterns was discovered in Nova Scotia. In ancient Babylon, fingerprints were used on clay tablets for business transactions. In ancient China, thumb prints were found on clay seals. In 14th-century Persia, various official government papers bore fingerprints (impressions), and one government official, a doctor, observed that no two fingerprints were exactly alike.

Marcello Malpighi, 1686. In 1686, Marcello Malpighi, a professor of anatomy at the University of Bologna, noted in his treatise the ridges, spirals, and loops in fingerprints. He made no mention of their value as a tool for individual identification. A layer of skin, approximately 1.8 mm thick, was named after him: the "Malpighi" layer.

John Evangelist Purkinji, 1823. In 1823, John Evangelist Purkinji, a professor of anatomy at the University of Breslau, published his thesis discussing 9 fingerprint patterns, but he too made no mention of the value of fingerprints for personal identification.

Sir William Herschel, 1856. The English first began using fingerprints in July of 1858, when Sir William Herschel, Chief Magistrate of the Hooghly district in Jungipoor, India, first used fingerprints on native contracts. On a whim, and with no thought toward personal identification, Herschel had Rajyadhar Konai, a local businessman, impress his hand print on the back of a contract. The idea was merely ". . . to frighten [him] out of all thought of repudiating his signature." The native was suitably impressed, and Herschel made a habit of requiring palm prints (and later, simply the prints of the right index and middle fingers) on every contract made with the locals. Personal contact with the document, they believed, made the contract more binding than if they simply signed it. Thus, the first wide-scale, modern-day use of fingerprints was predicated not upon scientific evidence, but upon superstitious beliefs. As his fingerprint collection grew, however, Herschel began to note that the inked impressions could, indeed, prove or disprove identity. While his experience with fingerprinting was admittedly limited, Herschel's private conviction that all fingerprints were unique to the individual, as well as permanent throughout that individual's life, inspired him to expand their use.

Around 1870 a French anthropologist devised a system to measure and record the dimensions of certain bony parts of the body. These measurements were reduced to a formula which, theoretically, would apply to only one person and would not change during his or her adult life. This Bertillon System, named after its inventor, Alphonse Bertillon, was generally accepted for thirty years.

Dr. Henry Faulds, 1880. During the 1870s, Dr. Henry Faulds, the British Surgeon-Superintendent of Tsukiji Hospital in Tokyo, Japan, took up the study of "skin-furrows" after

noticing finger marks on specimens of "prehistoric" pottery. A learned and industrious man, Dr. Faulds not only recognized the importance of fingerprints as a means of identification, but devised a method of classification as well. In 1880, Faulds forwarded an explanation of his classification system, and a sample of the forms he had designed for recording inked impressions, to Charles Darwin. Darwin, in advanced age and ill health, informed Dr. Faulds that he could be of no assistance to him, but promised to pass the materials on to his cousin, Francis Galton. Also in 1880, Dr. Faulds published an article in the scientific journal Nature. He discussed fingerprints as a means of personal identification, and the use of printer's ink as a method for obtaining such fingerprints. He is also credited with the first fingerprint identification: a greasy fingerprint left on an alcohol bottle.

Gilbert Thompson, 1882. In 1882, Gilbert Thompson of the U.S. Geological Survey in New Mexico used his own fingerprints on a document to prevent forgery. This is the first known use of fingerprints in the United States.

Mark Twain (Samuel L. Clemens), 1883. In Mark Twain's book "Life on the Mississippi", a murderer was identified by the use of fingerprint identification. In a later book by Mark Twain, "Pudd'n Head Wilson", there was a dramatic court trial on fingerprint identification. A more recent movie was made from this book.

Sir Francis Galton, 1888. Sir Francis Galton, a British anthropologist and a cousin of Charles Darwin, began his observations of fingerprints as a means of identification in the 1880s. In 1892, he published his book "Fingerprints", establishing the individuality and permanence of fingerprints. The book included the first classification system for fingerprints. Galton's primary interest in fingerprints was as an aid in determining heredity and racial background. While he soon discovered that fingerprints offered no firm clues to an individual's intelligence or genetic history, he was able to scientifically prove what Herschel and Faulds already suspected: that fingerprints do not change over the course of an individual's lifetime, and that no two fingerprints are exactly the same. According to his calculations, the odds of two individual fingerprints being the same were 1 in 64 billion. Galton identified the characteristics by which fingerprints can be identified. These same characteristics (minutiae) are basically still in use today, and are often referred to as Galton's Details.

Juan Vucetich, 1891. In 1891, Juan Vucetich, an Argentine police official, began the first fingerprint files based on Galton's pattern types. At first, Vucetich included the Bertillon System (see above) with the files. In 1892, Vucetich made the first criminal fingerprint identification: he was able to identify a woman named Rojas, who had murdered her two sons and cut her own throat in an attempt to place the blame on another. Her bloody print, left on a door post, proved her identity as the murderer.

1901. Fingerprints were introduced for criminal identification in England and Wales, using Galton's observations as revised by Sir Edward Richard Henry. Thus began the Henry Classification System, used even today in all English-speaking countries.

1902. First systematic use of fingerprints in the U.S., by the New York Civil Service Commission for testing; Dr. Henry P. DeForrest pioneered U.S. fingerprinting.

1903. The New York State Prison system began the first systematic use of fingerprints in the U.S. for criminals.

1904. The use of fingerprints began at Leavenworth Federal Penitentiary in Kansas and at the St. Louis Police Department, assisted by a sergeant from Scotland Yard who had been on duty at the St. Louis Exposition guarding the British display.

1905. The U.S. Army began using fingerprints; the U.S. Navy followed two years later, joined the next year by the Marine Corps. During the next 25 years more and more law enforcement agencies adopted fingerprints as a means of personal identification, and many began sending copies of their fingerprint cards to the National Bureau of Criminal Identification, which was established by the International Association of Police Chiefs.

1918. In 1918, Edmond Locard wrote that if 12 points (Galton's Details) were the same between two fingerprints, it would suffice as a positive identification. This is where the often-quoted "12 points" standard originated. Be aware, though, that there is NO universally required number of points for an identification: some countries have set their own minimum standards, but the United States has not.

In 1924, an Act of Congress established the Identification Division of the FBI. The National Bureau and Leavenworth files were consolidated to form the nucleus of the FBI fingerprint files. By 1946, the FBI had processed 100 million fingerprint cards in manually maintained files, and by 1971, 200 million cards. With the introduction of AFIS technology, the files were split into computerized criminal files and manually maintained civil files. Many of the manual files were duplicates, however; the records actually represented somewhere in the neighborhood of 25 to 30 million criminals, plus an unknown number of individuals in the civil files. By 1999, the FBI had planned to stop using paper fingerprint cards (at least for newly arriving civil fingerprints) inside its new Integrated AFIS (IAFIS) site at Clarksburg, WV. IAFIS initially held individual computerized fingerprint records for approximately 33 million criminals. Old paper fingerprint cards for the civil files are still manually maintained in a warehouse facility (rented shopping-center space) in Fairmont, WV. Since the Gulf War, most military fingerprint enlistment cards received have been filed only alphabetically by name; the FBI hopes someday to classify and file these cards so they can be of value for identifying unknown casualties (or amnesiacs) when no passenger or victim list is available. As of 2005, paper fingerprint cards were still in use and being processed for all identification purposes.

In the United States, the FBI maintains the biggest biometric database in the world, with fingerprint files and criminal history files that can be accessed and searched around the clock by law enforcement agencies. According to the FBI, by 2010 there were records of more than 55 million subjects on file. Electronic scanning has shortened the time it takes for a subject's rolled and flat print images to be uploaded into the database, from several weeks to two hours for criminal records and 24 hours for civilian records. Fans of TV crime dramas know what it means to be "in the system", because the effect of a match to a latent print found at a crime scene is the inevitable interrogation of the identified suspect. Ironically, the enthrallment with forensic science in books, movies, and television has produced a peculiar phenomenon in the real-world justice system, termed by some the "CSI effect": jury members who are fans of fictional crime-solving may attribute more credibility and trustworthiness to the professionals who appear in a real criminal trial than is justified. The general public watching these television series may see a fingerprint image on a computer screen in a show and assume that the computer does the analysis and that the resulting ID would be all that needs to be admitted as proof. That might be plausible if suspects left full, perfect prints at the scene, because the matching software operates with a high degree of accuracy. The reality, however, is this: latent prints are created by oils and residues on the fingertips or contact surfaces, they may be only a partial image, and they may be of poor quality due to smudging. In practice, fingerprint identification relies on an examiner to make a visual comparison and confirm or negate any preliminary match identified by the database search software. Despite the high-level technology behind biometric ID systems, fingerprint recognition has suffered some blemishes on its reputation. In 2004, shortly after a series of passenger train bombings in Spain, the FBI held Brandon Mayfield, a Portland, Oregon, lawyer, for two weeks after a false fingerprint identification linked him to these overseas acts of terrorism. The case turned a spotlight on the factor of human fallibility in this field of forensic science. Researchers have been disappointed to find that practicing fingerprint examiners may not conduct their analyses entirely objectively, and seem to be influenced to some extent by the context in which the prints are evaluated. In one experiment, subjects who had previously analyzed certain latent fingerprints were presented with the same prints again; the second time, however, the prints were accompanied by photos of graphic violence. Surprisingly, two-thirds of the examiners drew a conclusion about the match that differed from their first findings. The researcher, a British neuroscientist named Itiel Dror, concluded that examiners are vulnerable to cognitive and psychological factors that distinctly influence the outcome. Despite some evidence of human error in evaluating latent prints of irregular quality, the good news is that automated fingerprint identification technology using direct scanning for security ID functions is enjoying great success. Mainstream applications are used to lock the contents of valuable portable consumer electronics products such as laptops and mobile phones: if these devices are stolen, the thieves will not be able to access the information and files on them.


The pattern-matching programs that examine the minutiae of fingertip ridges for criminal investigations have paved the way for automated biometric ID software and systems that are even more sophisticated and more capable of protecting us and our property, such as hand geometry and iris scanning.

CHAPTER 3: MODELLING/DEVELOPMENT OF SYSTEM

3.1. Block diagram:


3.2. HARDWARE DESCRIPTION:

3.2.1 Speech Recognition Module


The heart of this system is the VeeaR speech recognition IC. The VRbot module is designed to easily add versatile voice-command functionality to robots (e.g. ROBONOVA-I, RoboZak, POP-BOT) or any other host (e.g. PIC or Arduino boards).

3.2.1.1 VRbot features

- A host of built-in speaker-independent (SI) commands for ready-to-run basic controls.
- Supports up to 32 user-defined speaker-dependent (SD) triggers or commands, as well as voice passwords; SD custom commands can be spoken in ANY language.
- Easy-to-use and simple graphical user interface to program voice commands.
- Languages currently supported for SI commands: English (U.S.), Italian, Japanese and German; more languages will become available in the near future.
- The module can be used with any host with a UART interface (powered at 3.3V-5V).
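To illustrate how a host consumes the module's output over that UART, the sketch below polls the 8051 serial port for a recognition result. The one-byte status/index framing shown here is an assumption made for illustration only; the actual VRbot wire protocol is documented in the VeeaR manual and differs from this.

    /* Illustrative only: assumes the module delivers one status byte
     * followed by one command-index byte after each recognition.
     * The real VRbot protocol differs; consult the VeeaR manual.    */
    #include <reg52.h>

    unsigned char uart_getc(void)
    {
        while (!RI)          /* wait until a byte has been received */
            ;
        RI = 0;              /* clear the receive flag              */
        return SBUF;         /* read the byte from the UART buffer  */
    }

    /* Returns the recognized command index, or 0xFF on error. */
    unsigned char read_voice_command(void)
    {
        unsigned char status = uart_getc();   /* hypothetical status byte */
        if (status != 0x00)                   /* hypothetical "OK" value  */
            return 0xFF;
        return uart_getc();                   /* hypothetical index byte  */
    }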

4.1.1. Working

Recognition is based on Hidden Markov Models (HMMs). The process comprises the steps of providing a database of labeled speech data; providing a prototype HMM definition to fix the characteristics of each HMM; and parameterizing speech utterances using either linear prediction parameters or Mel-scale filter bank parameters. A frame period is then selected for accommodating the parameters, and HMMs are generated for the specified speech utterances by having the user utter predefined training phrases for each HMM. The generated HMMs are statistically combined with the prototype HMM to give a set of fully trained HMMs, one per utterance, characteristic of the speaker. For recognition, the trained HMMs are used by computing Laplacian distances (via distance-table lookup) for the speaker's utterances during the selected frame period, and by iteratively decoding the node transitions corresponding to the spoken utterance to determine which predefined utterance is present.
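The "Laplacian distances via distance-table lookup" step can be pictured as scoring each incoming frame against each HMM state by summing per-dimension absolute differences, with the absolute-difference operation precomputed into a table. A minimal conceptual sketch follows (desktop C, not 8051 firmware; the dimension count is an assumption):

    #include <stdlib.h>

    #define DIM 12  /* assumed number of parameters per frame */

    /* |a - b| precomputed for all 8-bit parameter values, so scoring a
     * frame against a state template reduces to lookups and additions. */
    static unsigned char dist_tab[256][256];

    void build_distance_table(void)
    {
        int a, b;
        for (a = 0; a < 256; a++)
            for (b = 0; b < 256; b++)
                dist_tab[a][b] = (unsigned char)abs(a - b);
    }

    /* City-block ("Laplacian") distance between one quantized speech
     * frame and one HMM state template.                               */
    unsigned long frame_distance(const unsigned char frame[DIM],
                                 const unsigned char state[DIM])
    {
        unsigned long d = 0;
        int i;
        for (i = 0; i < DIM; i++)
            d += dist_tab[frame[i]][state[i]];
        return d;
    }

The decoder accumulates these frame distances along candidate state paths and picks the utterance whose model yields the smallest total.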

4.1. Microcontroller: AT89S52

The AT89S52 is a low-power, high-performance CMOS 8-bit microcontroller with 8K bytes of in-system programmable Flash memory. The AT89S52 provides the following standard features:

- 4.0V to 5.5V operating range
- Fully static operation: 0 Hz to 33 MHz
- Three-level program memory lock
- 256 x 8-bit internal RAM
- 32 programmable I/O lines
- Three 16-bit timer/counters
- Eight interrupt sources
- Full-duplex UART serial channel
- Low-power Idle and Power-down modes
- Interrupt recovery from Power-down mode
- Watchdog timer
- Dual data pointer
- Power-off flag
- Fast programming time
- Flexible ISP programming (byte and page mode)
- Green (Pb/halide-free) packaging option

In addition, the AT89S52 is designed with static logic for operation down to zero frequency and supports two software-selectable power-saving modes. Idle mode stops the CPU while allowing the RAM, timer/counters, serial port, and interrupt system to continue functioning. Power-down mode saves the RAM contents but freezes the oscillator, disabling all other chip functions until the next interrupt or hardware reset.

WORKING: The microcontroller first takes input from the voice recognition section (the recognized code of a controlling command) and controls the army tank according to that code. The microcontroller compares the incoming command code against its reference table and decides in which direction the tank should move. Seven commands are handled by the microcontroller (a dispatch sketch follows this list):
a) Forward: the tank proceeds in the forward direction.
b) Back: the tank moves in the backward direction.
c) Turn Right: the tank turns right.
d) Turn Left: the tank turns left.
e) Stop: the tank stops moving.
f) Lock: the whole tank system is locked.
g) Fire: an LED glows, indicating firing.
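In outline, the firmware can map each recognized command code to an action with a simple switch. The numeric codes 1-7 and the helper routine names below are illustrative placeholders, not the actual firmware; pin-level motor control is sketched in the L293D section.

    /* Command codes and helper routines are illustrative placeholders. */
    void motors_forward(void);
    void motors_reverse(void);
    void motors_turn_right(void);
    void motors_turn_left(void);
    void motors_stop(void);
    void system_lock(void);
    void fire_led_on(void);
    void lcd_show_error(void);

    void handle_command(unsigned char code)
    {
        switch (code) {
        case 1:  motors_forward();    break;  /* Forward    */
        case 2:  motors_reverse();    break;  /* Back       */
        case 3:  motors_turn_right(); break;  /* Turn Right */
        case 4:  motors_turn_left();  break;  /* Turn Left  */
        case 5:  motors_stop();       break;  /* Stop       */
        case 6:  system_lock();       break;  /* Lock       */
        case 7:  fire_led_on();       break;  /* Fire       */
        default: lcd_show_error();    break;  /* unknown    */
        }
    }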

4.2. MAX232

The MAX232 is an integrated circuit that converts signals from an RS-232 serial port to signals suitable for use in TTL-compatible digital logic circuits. The MAX232 is a dual driver/receiver and typically converts the RX, TX, CTS and RTS signals. The drivers provide RS-232 voltage-level outputs (approximately +/-7.5 V) from a single +5 V supply via on-chip charge pumps and external capacitors. This makes it useful for implementing RS-232 in devices that otherwise do not need any voltages outside the 0 V to +5 V range, as the power supply design does not need to be made more complicated just to drive RS-232.
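Since the MAX232 only level-shifts, the serial timing comes from the 89S52's on-chip UART. A common initialization for 9600 baud, using timer 1 in auto-reload mode 2, is sketched below; the 11.0592 MHz crystal frequency is an assumption (a different crystal needs a different reload value).

    #include <reg52.h>

    /* UART mode 1 (8-bit), baud rate from timer 1 in mode 2 (auto-reload).
     * TH1 = 0xFD gives 9600 baud with an 11.0592 MHz crystal (assumed).   */
    void uart_init(void)
    {
        SCON = 0x50;   /* mode 1, receiver enabled         */
        TMOD &= 0x0F;  /* clear timer-1 control bits       */
        TMOD |= 0x20;  /* timer 1, mode 2                  */
        TH1  = 0xFD;   /* reload value for 9600 baud       */
        TR1  = 1;      /* start timer 1                    */
        TI   = 1;      /* mark the transmitter as ready    */
    }

    void uart_putc(unsigned char c)
    {
        while (!TI)    /* wait for the previous byte       */
            ;
        TI = 0;
        SBUF = c;      /* start transmitting this byte     */
    }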

Figure 4.1 MAX232

The 89S52 has a built-in serial port that makes it very easy to communicate with the PC's serial port, but the 89S52 outputs are 0 and 5 volts, whereas roughly +10 and -10 volts are needed to meet the RS-232 serial port standard. The easiest way to get these levels is to use the MAX232. The MAX232 acts as a buffer driver for the processor: it accepts the standard digital logic values of 0 and 5 volts and converts them to the RS-232 levels of +10 and -10 volts. It also helps protect the processor from possible damage from static electricity that may come from people handling the serial port connectors. It includes a charge pump, which generates +10 V and -10 V from a single 5 V supply. The IC also includes two receivers and two transmitters in the same package, which is handy when you only want to use the transmit and receive data lines: you don't need two chips, one for the receive line and one for the transmit line. All this convenience comes at a price, but compared with the cost of designing a new power supply it is very cheap. There are also many variations of these devices: large-value capacitors are not only bulky but also expensive, so other devices are available which use smaller capacitors, and some even have built-in capacitors. However, the MAX232 is the most common, and thus we use this RS-232 level converter in our project. The MAX232 requires five external 1 µF capacitors, which are used by the internal charge pump to create the +10 V and -10 V rails.

4.3. LCD Display

Short for liquid crystal display, an LCD is a type of display used in digital watches and many portable computers. LCD displays utilize two sheets of polarizing material with a liquid crystal solution between them. An electric current passed through the liquid causes the crystals to align so that light cannot pass through them. Each crystal, therefore, acts like a shutter, either allowing light to pass through or blocking it. In this project the LCD is used for debugging purposes: to show whether there is an error, and to display whether the owner is authenticated.

FEATURES:

- 5 x 8 dots with cursor
- Built-in controller (KS0066 or equivalent)
- +5 V power supply
- 1/16 duty cycle
- Backlight (B/L) driven by pin 1, pin 2 or pin 15, pin 16, or A.K (LED)
- N.V. optional for +3 V power supply

Item              Standard value   Unit
Module dimension  80.0 x 36.0      mm
Viewing area      66.0 x 16.0      mm
Dot size          0.56 x 0.66      mm
Character size    2.96 x 5.56      mm
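The KS0066 controller is register-compatible with the ubiquitous HD44780, so writing a byte to the display amounts to placing it on the data bus and pulsing Enable, with RS selecting command or character data. The pin wiring below (data on P2, RS on P3.5, E on P3.7, R/W tied low) is an assumption for illustration and must be adjusted to the actual schematic.

    #include <reg52.h>

    /* Assumed wiring: 8-bit data bus on P2, RS on P3.5, E on P3.7,
     * R/W tied low (write-only). Adjust to the actual schematic.   */
    sbit LCD_RS = P3^5;
    sbit LCD_E  = P3^7;

    static void lcd_pulse(void)
    {
        unsigned int i;
        LCD_E = 1;
        for (i = 0; i < 50; i++) ;   /* crude enable-pulse delay    */
        LCD_E = 0;
        for (i = 0; i < 500; i++) ;  /* wait for the controller     */
    }

    void lcd_cmd(unsigned char c)    /* RS = 0 selects command      */
    {
        LCD_RS = 0;
        P2 = c;
        lcd_pulse();
    }

    void lcd_data(unsigned char d)   /* RS = 1 selects character    */
    {
        LCD_RS = 1;
        P2 = d;
        lcd_pulse();
    }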

4.4. Biometric Fingerprint Module SM12


Optical Fingerprint Sensor

Absorption. Figure 5.1 explains the basic principle of absorption in an optical fingerprint sensor.

* Figure 5.1 Principle of absorption in a fingerprint sensor

An absorption optical fingerprint sensor is composed of a right-angled triangular prism (4), a light source (20), a diffusion plate (3), a lens group and an image sensor (6). When a fingerprint is placed on the contact surface, its ridges are pressed closely onto the surface while its valleys stand off from it. The light radiated from the source becomes uniform after passing through the diffusion plate and reaches the fingerprint contact surface through the prism. Where the light meets a valley, total internal reflection occurs, and the light reaches the image sensor, composed of a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) element, after passing through the lens group. Where the light meets a ridge pressed closely onto the surface, only some of the light reaches the image sensor after total internal reflection, because part of it is absorbed by the ridge. There is thus a difference in luminous intensity between light reflected from valleys and light reflected from ridges, and the image sensor obtains the fingerprint image by measuring this difference in reflected intensity. The absorption optical fingerprint sensor needs several LEDs (15-20), since the light should be two-dimensionally uniform after passing through the diffusion plate. To capture a fingerprint image without distortion caused by differing optical paths, sufficient distance is required between the prism and the image sensor.

Scattering

* Figure 5.2 Principle of the scattering fingerprint sensor

The scattering optical fingerprint sensor is mainly comprised of a right-angled triangular prism (13), a light source (20), a lens group (15), and an image sensor (16). When a fingerprint is placed on the contact surface, its ridges are pressed closely onto the surface while its valleys stand off from it. The light radiated from the source passes through the prism and reaches the surface. Unlike in the absorption sensor, the light passes perpendicularly through the surface. Where the light meets a valley, it passes through the surface and radiates to the outside. Where it meets a ridge, scattering occurs at the ridge, and the scattered light reaches the image sensor (a CCD or CMOS element) through the lens group. The light radiated to the outside near the valleys seldom reaches the image sensor; only the light scattered near the ridges does. As a result, a fingerprint image can be captured in which the valley areas are dark and the ridge areas are bright. The scattering optical fingerprint sensor does not need a diffusion plate, and its contrast is good.

The SM12 fingerprint module, with its precision optical fingerprint sensor, integrates fingerprint acquisition and a single-chip processor. It features small size, low power consumption, simple ports, high reliability, a small fingerprint template (512 bytes), large fingerprint capacity, and, notably, a self-learning function.

FEATURES:

- UART communication port
- Fingerprint enrollment and verification with minimal storage
- Optical reflection fingerprint sensor
- 1:1 verification and 1:N identification functions
- Self-learning of the fingerprint features
- Configurable baud rate and equipment number
- Standby mode entered by instruction control
- Sensor: CMOS image sensor GC0303, optical reflection
- Scan area: 18 mm x 20 mm
- Fingerprint image: 210 x 250 pixels
- Fingerprint capacity: 3000
- Fingerprint template: 512 bytes
- Identification speed: 1:1 < 0.5 s; 1:N (2000 fingerprints) < 0.9 s

WORKING: To verify the identity of a user by automatically extracting minutiae from his or her fingerprint image, a fingerprint recognition algorithm is required. The algorithm is composed of two main technologies: image processing, which captures the characteristics of the fingerprint by passing the image through several stages, and matching, which authenticates the identity by comparing feature data composed of minutiae against templates in a database. Figure 5.5 below shows the overall block map of the fingerprint recognition algorithm consisting of these two technologies.

* Figure 5.5 Block map of the fingerprint recognition algorithm

Image Processing

This part consists of six stages. At the image enhancement stage, noise in the input fingerprint image is eliminated and contrast is strengthened for the sake of the later stages. At the image analysis stage, areas where the fingerprint is severely corrupted are cut out to prevent adverse effects on recognition. The binarization stage converts the gray-level fingerprint image to a binary image (a simplified sketch follows). The thinning stage thins the binarized image. The ridge reconstruction stage reconstructs the ridges by removing pseudo-minutiae. At the last stage, minutiae are extracted from the reconstructed ridge image.
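As a taste of the binarization stage, the simplest approach thresholds every pixel against the mean gray level. This is a deliberately simplified stand-in for the adaptive, locally computed thresholds a real fingerprint pipeline would use:

    #include <stddef.h>

    /* Global-mean thresholding: a simplified stand-in for the adaptive
     * binarization used in production fingerprint pipelines.           */
    void binarize(const unsigned char *gray, unsigned char *bin,
                  size_t w, size_t h)
    {
        size_t i, n = w * h;
        unsigned long sum = 0;
        unsigned char thr;

        for (i = 0; i < n; i++)
            sum += gray[i];
        thr = (unsigned char)(sum / n);        /* mean gray level  */

        for (i = 0; i < n; i++)
            bin[i] = (gray[i] < thr) ? 1 : 0;  /* 1 = ridge (dark) */
    }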


Matching

After feature data has been obtained for a specific fingerprint, it is compared with the templates of the enrolled users already stored in the database. If the fingerprint is severely damaged and only general ridges, not minutiae, can be recognized, two algorithms can be used in parallel: an algorithm based on minutiae and an algorithm based on the overall ridge shape.

Matching stages differ considerably between implementations even when they are based on the same minutiae. Here, the most well-known matching algorithm is briefly explained. The matching process consists of four main stages. First, the minutiae analysis stage analyzes geometric characteristics, such as the distance and angle between a standard minutia and its neighboring minutiae, based on the image-processed feature data. After this analysis, every minutiae pair has a geometric relationship with its neighboring minutiae, and this relationship is used as the basic information for local similarity measurement. In Figure 5.7, picture (a) shows the feature data of the input fingerprint, and (b) shows the stored template; finding a minutiae pair in (b) similar to a minutiae pair in (a) is the local similarity measurement (a sketch of this test follows). Global similarity measurement then calculates the similarity of the two fingerprints by collecting the minutiae pairs found by local similarity measurement in both sets of feature data and selecting the best-matching pairs. Lastly, a final matching score is calculated from the global similarity value and compared with a previously set critical value to verify the identity of the user.
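The local-similarity test described above can be sketched as checking whether two minutiae pairs (one from the input, one from the template) agree in inter-minutia distance and relative angle within tolerances. The structures and tolerance values here are illustrative assumptions; a production matcher is considerably more involved.

    #include <math.h>

    /* Illustrative structures and tolerances only. */
    typedef struct { double x, y, angle; } minutia;

    #define DIST_TOL  8.0    /* pixels  (assumed tolerance)  */
    #define ANGLE_TOL 0.26   /* radians (about 15 degrees)   */

    /* Distance between a standard minutia and one neighbour. */
    static double pair_dist(const minutia *a, const minutia *b)
    {
        return hypot(a->x - b->x, a->y - b->y);
    }

    /* Local similarity: do an input pair (a1, a2) and a template
     * pair (b1, b2) agree in distance and relative angle?        */
    int pairs_match(const minutia *a1, const minutia *a2,
                    const minutia *b1, const minutia *b2)
    {
        double dd = fabs(pair_dist(a1, a2) - pair_dist(b1, b2));
        double da = fabs((a2->angle - a1->angle) -
                         (b2->angle - b1->angle));
        return dd < DIST_TOL && da < ANGLE_TOL;
    }

Global similarity then counts how many such pairs agree across the whole feature set, and the count feeds the final matching score.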

4.5. DC MOTOR DRIVER:

As the MCU ports are not powerful enough to drive a DC motor directly, we need some kind of driver. A very safe and easy option is the popular L293D chip. The L293D is a quadruple push-pull four-channel driver capable of delivering 600 mA (1.2 A peak surge) per channel. It is ideal for controlling the forward/reverse/brake motions of small DC motors from a microcontroller such as a PIC or BASIC Stamp. The L293D is a high-voltage, high-current four-channel driver designed to accept standard TTL logic levels and drive inductive loads (such as relays, solenoids, DC and stepping motors) and switching power transistors. It is suitable for use in switching applications at frequencies up to 5 kHz.

Features include:

- 600 mA output current capability per driver
- Pulsed current 1.2 A per driver
- Wide supply voltage range: 4.5 V to 36 V
- Separate input-logic supply
- NE package designed for heat sinking
- Thermal shutdown and internal ESD protection
- High-noise-immunity inputs
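Driving one motor channel pair of the L293D from the 8051 then reduces to setting two direction inputs: IN1/IN2 = 1/0 gives forward, 0/1 gives reverse, and 0/0 stops (brakes) the motor. The pin assignments below are assumptions for illustration; the enable pin is taken as tied high.

    #include <reg52.h>

    /* Assumed wiring of one motor to the L293D: IN1 = P1.0,
     * IN2 = P1.1, EN1 tied high. Adjust to the real schematic. */
    sbit M1_IN1 = P1^0;
    sbit M1_IN2 = P1^1;

    void motor1_forward(void) { M1_IN1 = 1; M1_IN2 = 0; }
    void motor1_reverse(void) { M1_IN1 = 0; M1_IN2 = 1; }
    void motor1_stop(void)    { M1_IN1 = 0; M1_IN2 = 0; }

The tank's left and right motors each get such a channel pair, so turning is a matter of driving the two sides in opposite directions.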

4.6. DC MOTORS:

A designer always has two options: a DC motor or a stepper motor. When it comes to speed, weight, size and cost, DC motors are preferred over stepper motors. There are also many things you can do with a DC motor when it is interfaced with a microcontroller, such as controlling its speed and direction of rotation, and DC motors have a faster response time than stepper motors. Direct current (DC) motors draw a stable and continuous current; they were the first motors used in industry and are often found in appliances around the home. A DC motor can deal with high starting torques and absorb sudden rises in load easily: the speed of the motor adjusts to the load. DC motors are available in a wide range of sizes, but their use is generally restricted to a few low-speed, low-to-medium-power applications, such as machine tools and rolling mills, because of problems with mechanical commutation at large sizes. They are also restricted to clean, non-hazardous areas because of the risk of sparking at the brushes.

4.7. RELAY:

Relays are components which allow a low-power circuit to switch a relatively high current on and off, or to control signals that must be electrically isolated from the controlling circuit itself. Newcomers to electronics sometimes want to use a relay for this type of application but are unsure about the details of doing so; here is a quick rundown. To make a relay operate, you have to pass a suitable pull-in and holding current (DC) through its energizing coil. Relay coils are designed to operate from a particular supply voltage, often 12 V or 5 V in the case of many of the small relays used for electronics work. In each case the coil has a resistance which draws the right pull-in and holding currents when it is connected to that supply voltage. So the basic idea is to choose a relay with a coil designed to operate from the supply voltage you are using for your control circuit (and with contacts capable of switching the currents you want to control), and then provide a suitable relay driver circuit so that your low-power circuitry can control the current through the relay's coil. Typically this current will be somewhere between 25 mA and 70 mA.

DESIGNING OF POWER SUPPLY:

A) Design of step-down transformer:

The following information must be available to the designer of the transformer.


1. Power output.
2. Operating voltage.
3. Frequency range.
4. Efficiency and regulation.

The size of the core is one of the first considerations with regard to the weight and volume of a transformer. It depends on the type of core and the winding configuration used. Generally the following formula is used to find the area (size) of the core:

Ai = √Wp / 0.87

where Ai = area of cross-section in square cm and Wp = primary wattage. For our project we require a +5 V output, so the transformer secondary winding rating is 9 V, 500 mA. The secondary power wattage is therefore

P2 = 9 V x 500 mA = 4.5 W

so

Ai = √4.5 / 0.87 ≈ 2.4 sq. cm

Generally about 10% should be added to the core area, so we take Ai = 2.8 sq. cm.

a) Turns per volt: the turns per volt of the transformer are given by the relation

Turns per volt = 10^4 / (4.44 x f x Bm x Ai)

where f = frequency in Hz, Bm = flux density in Wb per square metre, and Ai = net area of cross-section in square cm. The following table gives the turns per volt for a 50 Hz supply.
Flux density (Wb/sq. m)   Turns per volt
1.14                      40 / Ai
1.01                      45 / Ai
0.91                      50 / Ai
0.83                      55 / Ai
0.76                      60 / Ai


Generally, the lower the flux density, the better the quality of the transformer. For our project we have taken Bm = 0.91 Wb/sq. m, giving 50/Ai turns per volt from the table above:

Turns per volt = 50 / Ai = 50 / 2.8 = 17.85

Thus the turns for the primary winding are 220 x 17.85 ≈ 3927, and for the secondary winding 9 x 17.85 ≈ 160.

b) Wire size: as stated above, the wire size depends upon the current to be carried by the winding, which in turn depends upon the chosen current density. For our transformer one can safely use a current density of 3.1 A/sq. mm; for lower copper loss, 1.6 A/sq. mm or 2.4 A/sq. mm may be used. Generally even-numbered wire gauges are used.

The r.m.s. voltage at the transformer secondary is 9 V, so the peak voltage Vm across the secondary is

Vm = 9 x 1.414 ≈ 12.73 V

The DC output voltage after full-wave rectification is

Vdc = 2Vm / π = 2 x 12.73 / 3.14 ≈ 8.1 V

The P.I.V. rating of each diode is

PIV = 2Vm = 2 x 12.73 ≈ 25.45 V

The maximum forward current through each diode is 500 mA. From these parameters, we select the 1N4007 diode from the diode selection manual.

B) Design of filter capacitor:

The formula for calculating the filter capacitor of a full-wave rectifier is

C = 1 / (4√3 x r x f x R1)

where r = ripple present at the rectifier output (at most 0.1 for a full-wave rectifier), f = frequency of the AC mains, and R1 = input impedance of the voltage regulator IC:

C = 1 / (4√3 x 0.1 x 50 x 28) ≈ 1030 µF ≈ 1000 µF

The voltage rating of the filter capacitor should be greater than the input Vdc (the rectifier output), so we choose a 1000 µF / 25 V filter capacitor.

C) Specification of voltage regulator IC:

Parameter                     Rating
Available output DC voltage   +5 V
Line regulation               0.03
Load regulation               0.5
Vin maximum                   16.16 V
Ripple rejection              60-80 dB
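The arithmetic above can be cross-checked with a few lines of C; the values are hardcoded from this design and simply reproduce the calculations already shown.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double PI = 3.141592653589793;
        double Wp = 9.0 * 0.5;           /* secondary wattage: 9 V x 500 mA   */
        double Ai = sqrt(Wp) / 0.87;     /* core area, sq. cm (~2.4)          */
        double tpv, Vm, Vdc, C;

        Ai = Ai * 1.1;                   /* ~10% margin; the text takes 2.8   */
        tpv = 50.0 / 2.8;                /* turns/volt at Bm = 0.91 Wb/sq. m  */
        Vm  = 9.0 * sqrt(2.0);           /* peak secondary voltage            */
        Vdc = 2.0 * Vm / PI;             /* full-wave average DC output       */
        C   = 1.0 / (4.0 * sqrt(3.0) * 0.1 * 50.0 * 28.0);  /* farads         */

        printf("Ai ~ %.2f sq.cm, turns/volt = %.2f\n", Ai, tpv);
        printf("primary ~ %.0f turns, secondary ~ %.0f turns\n",
               220.0 * tpv, 9.0 * tpv);
        printf("Vm = %.2f V, Vdc = %.2f V, PIV = %.2f V\n", Vm, Vdc, 2.0 * Vm);
        printf("filter C ~ %.0f uF (use 1000 uF)\n", C * 1e6);
        return 0;
    }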


SOFTWARE DESCRIPTION:
ALGORITHM (a firmware sketch follows):
1. Switch on the power supply.
2. Check the owner's authentication using the fingerprint module.
3. If the fingerprint matches, give supply inputs to all parts; otherwise deny access.
4. Train the voice recognition part from voice commands.
5. Present the recognized command as a digital 8-bit input to the microcontroller.
6. Compare the command with the microcontroller's standard commands.
7. If the command is valid, the motors act accordingly (left, right, forward, fire, load, stop).
8. If not, report an error in the command.
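Putting the algorithm together, the firmware's top level might look like the sketch below. Every helper name is an illustrative placeholder for routines sketched elsewhere in this report, not the actual source code.

    /* Top-level control flow; all helpers are illustrative placeholders. */
    extern int  fingerprint_matches(void);          /* SM12, via UART  */
    extern void power_up_peripherals(void);
    extern unsigned char read_voice_command(void);  /* VRbot, via UART */
    extern void handle_command(unsigned char code);
    extern void lcd_show_error(void);

    void main(void)                    /* Keil C51 style entry point */
    {
        if (!fingerprint_matches()) {  /* owner authentication first */
            lcd_show_error();          /* ERROR 0000 plus buzzer     */
            while (1) ;                /* block further access       */
        }
        power_up_peripherals();

        while (1) {                    /* command loop               */
            unsigned char code = read_voice_command();
            if (code != 0xFF)
                handle_command(code);  /* forward/back/left/right... */
            else
                lcd_show_error();
        }
    }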


FLOWCHART:


3.4 PCB Designing

A printed circuit is an electronic circuit mounted on a base material. The circuit, made of copper foil, is so thin that it needs a base to support it. The base is also a mounting device, used to fasten the complete package to its case. The type and shape of the actual electronic circuit are limited only by the imagination of the person designing the board. The name "printed circuit" arose because the electronic circuit appears to be printed on the base material. In ordinary printing, ink is deposited on paper; an electronic printed circuit gives the same appearance, although the circuit is actually a thin layer of copper. The shape of the copper is determined by the layout, or artwork, required for the actual circuit. The final shape is developed by etching, that is, chemically removing some copper from the surface of a blank board; the remaining copper and the base material form the complete printed circuit board, abbreviated PCB. The printed circuit board was developed by the electronics industry so that mass-production techniques could be applied to electronic assemblies. Using PCBs gives a high rate of reliability in production: all the circuits are uniform in layout, eliminating the wiring errors common to hand-wired electronic circuits. An etched or printed circuit consists of a thin layer of copper foil; the final circuit is shaped by etching the copper in a chemical. The copper foil acts as the wire, or conductor, in the circuit. Component parts such as resistors, transistors and capacitors are soldered to the conductive foil to complete the electrical path and circuit.

Functions:
1. It provides the necessary mechanical support for the components in the circuit.
2. It provides the necessary electrical inter-connections.

3.4.1 The Base Material of PCB

Although the number of different printed-circuit base materials in common use is finite, the problem of material selection and quality control is almost limitless. The laminate is the base material from which boards are made, and can be simply described as the product obtained by pressing layers of a filler material with resin under heat and pressure. The commonly used fillers are a variety of papers or glass in various forms, such as cloth and continuous-filament mat. The commonly used resins are phenolic, epoxy, polyester, Teflon, etc.

3.4.2 Conducting Material of PCB

The commonly used conducting materials are copper, aluminium and silver. But aluminium is difficult to solder and silver is a costly material, so copper (or silver-coated copper) is most commonly used for the manufacture of PCBs. The copper foil is pressed or bonded onto the insulating base material (laminate) to form the copper-clad laminate.

3.4.3 Production of PCB

The transfer of the conductor pattern on the film master onto the copper-clad laminate is done by two methods:
1. Photo printing
2. Screen printing

3.4.4 Where and Why are PCBs Used?

Printed circuit boards are used to route electrical currents and signals through copper tracks which are firmly bonded to an insulating base. The advantages of a PCB over point-to-point wiring are as follows:
1. PCBs are necessary for interconnecting a large number of electronic components in a very small area with minimum parasitic wiring effects.
2. PCBs are suitable for mass production with less chance of wiring errors.
3. Small components can be easily mounted on a PCB.
4. Wiring microphony is avoided.
5. Servicing is simplified.
6. Construction is neat, small and truly a work of art.
7. By using a PCB, the electronic equipment becomes more reliable, smaller in size and less costly.

3.4.5 Advantages of PCB

1. Once the board layout has been proved, there is no need to check each unit built for correct routing of inter-connections.
2. The weight of an electronic system employing a PCB is reduced by as much as 10:1 compared to external wiring.
3. All the signals are accessible for testing at any point along the length of a track without any risk of a short circuit caused by wires touching each other.
4. With a PCB, troubleshooting and fault finding become easier.
5. Miniaturized and sensitive components such as ICs, chips and transistors can be mounted easily.

3.4.6 Material Required

- A phenolic laminate board
- Ferric chloride (in powder form)
- Enamel paint / alcohol-based marker / nail polish
- Set of very fine brushes (sizes 0-3)
- Petrol, spirit or acetone (nail polish remover)
- Hacksaw
- File and drill with various bit sizes (1-5 mm)
- Varnish
- Steel wool

3.4.7 Procedure

1. The board surface is cleaned with petrol, spirit or any other cleaning agent.
2. Scrubbing with steel wool can also help.
3. The shape and size into which the board is to be cut are marked; mounting holes are drilled and the board is cut.
4. Draw the layout or photocopy the design (don't forget to invert, i.e. mirror, the IC pins). Two coats of paint are applied and left to dry for at least three hours.
5. Suitable holes of the desired sizes are drilled for inserting components.
6. Hot water (about 85 °C) is taken in a flat-bottomed plastic tray.
7. Then 30-50 g of ferric chloride is added to it.
8. The board, with the copper side facing up, is fully immersed in the solution.
9. A few drops of hydrochloric acid are added to speed up the process.
10. The etching takes about 30-60 minutes to complete, depending on the size of the PCB. The backside of the board should be clearly visible; if not, it should be etched for a few more minutes.
11. After the etching is complete, the board is cleaned under running water and the paint is removed using acetone or spirit.
12. The rough surface around the holes is scrubbed.
13. Components are soldered.
14. A coat of varnish is applied to prevent oxidation.

3.4.8 Precautions

Ferric chloride is a very harmful chemical. It is hygroscopic (it absorbs moisture from the air), so it should be kept in an airtight container at all times. Gloves should be used to handle it, and it should not be allowed to come into contact with metal, which it corrodes. Etching should be done in a well-ventilated area, as the fumes produced are extremely poisonous.

3.4.9 Solder Gun

Soldering is the process used for joining metal parts, using a molten filler metal known as solder. The melting temperature of solder is below that of the metals being joined, so their surfaces are only wetted, without melting. In this process, the relative positioning of the surfaces to be joined, the wetting of these surfaces with molten solder, and the cooling time for solidification are important. The surfaces must be cleaned for good electrical contact.

3.4.10 Soldering Techniques

There are two main soldering techniques.

Hand Soldering

53

This is used in small-scale production: using a soldering iron, each component and contact is fixed to the PCB. The wattage of the soldering iron depends upon the thickness of the solder pad and the contact leads. For earthing points and heat-sink mounts, a 35 W to 60 W iron is used; for normal components, 15 W to 25 W; for ICs, 10 W. The iron consists of an insulating handle connected via a metal shank to the bit. The functions of the bit are:
1. To store heat and convey it to the component.
2. To store and deliver molten solder and flux.
3. To remove surplus solder from the joint.
The soldering bit is made of copper because it has good wetting, good heat capacity and good thermal conductivity. It may erode after long-term use; to avoid this, a coating of nickel or tin is used.

Image 3.8 Soldering gun

Soldering With an Iron

The surfaces to be soldered must be cleaned and fluxed. The soldering iron is switched on and allowed to reach soldering temperature. The solder, in the form of wire, is applied near the component to be soldered and heated with the iron until the surfaces to be joined are filled. The iron is then removed and the joint is cooled without being disturbed. According to standardization, testing and quality control requirements, solder joints are supposed to:
1. Provide a permanent low-resistance path.
2. Make a robust mechanical link between the PCB and the leads of components.
54

3. Allow heat flow between components, joining elements and the PCB.
4. Retain adequate strength with temperature variation.

The other methods of soldering are mass soldering, dip soldering and wave soldering.

(Figure: examples of the right amount of solder (minimum, optimal, excessive) and of bad solderability of the terminal wire, the PCB, or both.)

Key Points to Remember
1. Always keep the tip coated with a thin layer of solder.
2. Use fluxes that are as mild as possible but still provide a strong solder joint.
3. Keep the temperature as low as possible while still soldering a joint quickly (2 to 3 seconds maximum for electronic soldering).
4. Match the tip's size to the work.
5. Use a tip with the shortest reach possible for maximum efficiency.


Figure 3.3 Soldering method

CHAPTER 4: RESULT AND DISCUSSION


4.1 TESTING PROCEDURE:

4.1.1 Measuring instrument: DMM (digital multimeter).

4.1.2 Checking voice commands in VRbot:

To check a spoken command using the VRbot GUI software, connect the robot to your PC and turn on your ROBONOVA. Select the serial port to use (the same as in the RoboBasic Editor) from the toolbar or the File menu, then use the Connect command.


The user can add a new command by first selecting the group in which the command is to be created and then using the toolbar icons or the Edit menu. Selecting a command group shows the commands already trained by the user; to train a command, Train Command is selected from the Edit menu.

If an error occurs, command training is cancelled. Errors may occur when the user's voice is not heard correctly, or when the second word heard is too different from the first. The selected group of commands can also be tested, using the toolbar icon or the Tools menu, to make sure the trained commands are recognized successfully. Then disconnect the VRbot GUI, open the file with the RoboBasic Editor, make the required changes to customize the behavior, and finally download and run it on the ROBONOVA controller.

4.1.3 To check fingerprint authentication:


1) Press your thumb on the fingerprint module.
2) The module compares the stored fingerprints with the applied print.
3) If the applied print matches a stored fingerprint, the module displays the fingerprint number on the LCD and grants access to the system.
4) If the fingerprint does not match, ERROR 0000 is displayed with a buzzer sound, indicating an unauthorized user, who is blocked from using the system further.


4.2 RESULT:
1. The fingerprint module scans the owner's fingerprint and displays the result on the LCD module.
2. When the speaker gives the command "Forward", the tank moves forward.
3. When the speaker gives the command "Back", the tank moves back.
4. When the speaker gives the command "Left", the tank turns left.
5. When the speaker gives the command "Right", the tank turns right.
6. When the speaker gives the command "Fire", the LEDs attached to the tank start glowing, indicating firing.
7. When the speaker gives the command "Lock", the tank is locked, blocking further access.

4.3 Advantages, Disadvantages and Applications

4.3.1 Advantages:

1) In our project we have used the voice module because of the following advantages:
   1. Speech is a preferred input because it does not require training and it is much faster than other inputs. Information can also be given while the person is engaged in other activities, and it can be fed via telephone or microphone, which are relatively cheap compared to current input systems.
   2. Voice recognition allows the user to speak to the computer instead of typing a command or pointing with a mouse; combined with voice dictation, it could virtually eliminate all current input devices.
   3. We can control the tank and give commands from a distance.
2) By using fingerprint authentication we increase the security level of the tank by prohibiting other users.
3) Fingerprint authentication is simple, portable and less complicated than other security mechanisms.
4) This project


4.3.2 Disadvantages:

1. High initial cost: since we have used two modules, the voice module and the thumbprint module, to keep the project simple, the cost of these modules is added to the system.

2. Voice module system: the disadvantage of using voice recognition is that it cannot understand every word we speak, even after hours of training, and its stored templates can become corrupted over time. The reference phonemes are recorded in isolation, and their spectra differ from the phonemes in the input speech, because the latter are affected by neighbouring phonemes. Thus, it does not recognize words effectively even when a little background noise is present.

3. Fingerprint module: if a fingerprint is placed on the module at a different angle than the one at which it was stored, the scan results do not match and an error is shown even if the owner is the correct person.


