Human Computer Interaction Where Controlling Computer And Applications Using Image Processing and Voice Recognition

CHAPTER 1
1. INTRODUCTION
One of the important challenges in Human Computer Interaction is to develop more
intuitive and more natural interfaces. Computing environments at present are strongly tied to the
availability of a high-resolution pointing device with a single, discrete two-dimensional cursor.
The modern graphical user interface (GUI), the current standard interface on personal
computers (PCs), is well defined and provides an efficient way for a user to operate various
applications on a computer. GUIs, combined with devices such as mice and track pads, are
extremely effective at reducing the richness and variety of human communication down to a
single point.
While the utility of such devices in today's interfaces cannot be denied, many
users find the capability of a GUI rather limited when they try to perform tasks
using gestures. There are opportunities to apply other kinds of sensors and techniques to enrich
the experience of such users. For example, video cameras and computer vision techniques
may be used to capture many details of human shape and movement. The shape of the hand may
be analyzed over time to manipulate an onscreen object in a way analogous to the hand's
manipulation of paper on a desk. Such an approach may lead to a faster, more natural, and more
fluid style of interaction for certain tasks.
Ubiquitous computing is devoted to changing the relationship between humans and the
computers with which we interact, allowing computers to become invisible and recede
into the periphery of people's lives.
Our project, Human Computer Interaction Where Controlling Computer and
Applications Using Image Processing and Voice Recognition, is an attempt at ubiquitous
computing. Here, we will be using colored tapes on our fingers. One of the tapes will be used for
controlling cursor movement, while the relative distance between the two colored tapes will be
used for the click events of the mouse, and the center colored tape will be used for gestures. Also,
we will be enriching our system with voice recognition capability to perform basic actions like
shutdown, search and surfing. Thus, the system will provide a new experience for users in
interacting with the computer.

CHAPTER 2
2. PROBLEM DEFINITION

The project that we are trying to develop will completely change the way people use the
computer system. In our system, a camera detects hand gestures and a microphone captures
voice commands to control the computer and its applications.
This would lead to a new era of Human Computer Interaction
(HCI) where no physical contact with the device is required. It can be used in many media
applications as well as in new product designs. It can be used in the advertising industry to build
natural user interfaces, so that users can be connected with advertisers more effectively.

CHAPTER 3
3. LITERATURE SURVEY
3.1 RELATED WORK
A lot of research is being done in the fields of Human Computer Interaction (HCI) and
robotics. Researchers have tried to control mouse movement using video devices for HCI;
however, each of them used a different method for mouse cursor movement and clicking
events.
3.1.1 A Method for Controlling Mouse Movement using a Real-Time Camera, Hojoon Park [1]

One approach, by Hojoon Park [1], used the index finger for cursor movement and the angle
between the index finger and thumb for clicking events.
Working:
Hojoon Park [1] used index-finger movement to move the mouse cursor on the computer,
employing an effective algorithm to detect the fingers, and showed that the angle between the
index finger and the thumb can be used for clicking events.
To recognize whether a finger is inside the palm area or not, he used a convex hull algorithm,
which finds the smallest convex polygon that includes all the given points. Using this property,
fingertips can be detected on the hand, and the algorithm can also recognize whether a finger is
folded or not. To recognize those states, he multiplied the hand radius by 2 (a factor obtained
through multiple trials) and checked the distance between the hand center and each point in the
convex hull set. If the distance is greater than this threshold, the finger is spread. In addition, if
two or more such points exist in the result, the farthest vertex is regarded as the index finger, and
the gesture is treated as a click when the number of resulting vertices is two or more.
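The following is a minimal Java sketch of the spread-finger test described above, assuming the hand center, hand radius, and convex-hull points have already been computed by earlier stages; the class and method names are illustrative, not Park's actual code.

    import java.awt.Point;
    import java.util.ArrayList;
    import java.util.List;

    public class FingerSpreadTest {

        // Returns the hull points farther than 2 x radius from the hand
        // center, i.e. the candidate spread fingertips (the factor 2 was
        // found by trial in [1]).
        public static List<Point> spreadFingertips(Point center, double handRadius,
                                                   List<Point> hullPoints) {
            List<Point> fingertips = new ArrayList<Point>();
            double threshold = 2.0 * handRadius;
            for (Point p : hullPoints) {
                if (center.distance(p) > threshold) {
                    fingertips.add(p);
                }
            }
            return fingertips;
        }

        // Per [1]: two or more spread points are treated as a click gesture.
        public static boolean isClick(Point center, double handRadius,
                                      List<Point> hullPoints) {
            return spreadFingertips(center, handRadius, hullPoints).size() >= 2;
        }
    }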

Advantages:
He developed a system to control the mouse cursor using a real-time camera and implemented
all mouse tasks such as left and right clicking, double clicking, and scrolling. The system is
based on computer vision algorithms and can perform all mouse tasks.
Hojoon Park [1] used a simple, effective and uncomplicated design; the algorithms are simple
and take little time. The system is easy to use and can be used to control small applications.
Limitations:
In this project, the problem was that the detected finger position shook a lot. Since real-time
video was used, the illumination changes every frame, and hence the detected position of the
hand changes every frame. Thus, the fingertip position detected by the convex hull algorithm
also changes, and the mouse cursor pointer shakes rapidly. To fix this problem, he added a
constraint that the cursor does not move if the difference between the previous and current
fingertip positions is within 5 pixels. This constraint worked well, but it makes it difficult to
control the mouse cursor precisely. Another problem caused by illumination is segmentation of
the background for extracting the hand shape. Since the hand reflects all light sources, the hand
color changes from place to place. If the hand shape is not segmented well, the algorithm cannot
work, because it assumes a well-segmented hand shape; without it, the radius of the hand cannot
be estimated.
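A minimal Java sketch of that 5-pixel dead zone is shown below, using java.awt.Robot to move the real cursor; the fingertip detection that produces (x, y) each frame is assumed to exist elsewhere.

    import java.awt.AWTException;
    import java.awt.Robot;

    public class CursorStabilizer {
        private static final int DEAD_ZONE = 5; // pixels, per [1]
        private final Robot robot;
        private int lastX = -1, lastY = -1;

        public CursorStabilizer() throws AWTException {
            this.robot = new Robot();
        }

        // Called once per frame with the detected fingertip position.
        public void onFingertip(int x, int y) {
            if (lastX >= 0 && Math.abs(x - lastX) <= DEAD_ZONE
                           && Math.abs(y - lastY) <= DEAD_ZONE) {
                return; // jitter: ignore movement within the dead zone
            }
            lastX = x;
            lastY = y;
            robot.mouseMove(x, y);
        }
    }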
3.1.2 Virtual Mouse Vision Based Interface, Robertson P., Laddaga R., Van Kleek M., January 2004 [3]

Working:
Their solution was to develop a virtual mouse that enables users to control a kiosk with hand
signs and movements. The kiosk has a standard visual user interface, with an arrow cursor to
indicate pointer movement. The user walks up to the kiosk; people approaching it are tracked by
a robotic head called IGOR (Intelligent Gaze Oriented Robot) [3]. When the user makes a
recognized hand sign, the kiosk allows movement of the hand to move the mouse pointer on the
kiosk display, and separate hand signs allow clicking of the mouse buttons for making selections
on the kiosk display.
Note that the arrow pointer is the only feedback the user gets as to where the user is pointing.
The user can use that feedback, adjusting to imperfections in tracking, without the distraction of
a distinct and different second signal.

Advantages:
In the face-tracking state, face recognition has a higher priority than sign recognition; in the
sign-tracking state, optical flow is given the higher priority. In this way, good responsiveness
is achieved in both user tracking and gesture tracking.
Optical flow allows smooth tracking of the hand gestures. It is robust because no recognition is
required to achieve mouse motion, and it also provides smooth motion estimates.
Limitations:
The system was not capable of complex operations and cannot control high-end systems and
applications, which limits its scope.

3.1.3 Real-time Hand Tracking and Finger Tracking for Interaction, Shahzad Malik, CSC2503F Project Report, December 18, 2003 [4]
Working:
This system is primarily based on the single-hand tracker presented in [Segen99] [4].
The system can extract the 3D position and 2D orientation of the index finger of each hand and,
when present, the pose of the thumb as well. In interactive applications, a single pointing gesture
could then be used for selection operations, while the thumb and index finger together could be
used for pinching gestures [4] in order to grasp and manipulate virtual objects.
Advantages:
This project presents the implementation and analysis of a real-time stereo vision hand-tracking
system that can be used for interaction purposes. The system uses two low-cost web cameras
mounted above the work area, facing downward. In real time, the system can track the 3D
position and 2D orientation of the thumb and index finger of each hand without special markers
or gloves, resulting in up to 8 degrees of freedom for each hand.
Limitations:
A misclassification problem occurs when the two hands appear close together in the captured
images. This results in a single large region being segmented by the background subtraction and
skin detection phases, so the contour detector interprets the two hands as a single hand and the
fingers are all labelled as belonging to a right hand.

3.1.4 Portable Vision-Based HCI: A Real-time Hand Mouse System on Handheld Devices, Chu-Feng Lien [9]
Working:
They assume the widespread availability of low-resolution cameras on handheld devices. Using
the embedded camera, the system detects a user's hand motion in real time, resulting in
autonomous manipulation of the corresponding programs on the device. To run a vision-based
HCI system on handheld devices, the computing power available for frame processing is a
critical concern. Although the AdaBoost method proposed by Viola and Jones [9] seemed a
natural choice for the project, it did not give good results with a low-resolution camera. Instead
of gesture recognition methods, they use Motion History Images; by grabbing and processing
image pixels directly, they obtain an efficient way of computing.
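A minimal sketch of a Motion History Image update is shown below, assuming a per-pixel motion mask from frame differencing is already available; pixels where motion occurs are stamped with the current time, and stale entries decay to zero. This follows the standard MHI formulation, not necessarily the authors' exact implementation.

    public class MotionHistoryImage {
        private final float[][] mhi;
        private final float duration; // seconds of motion history to keep

        public MotionHistoryImage(int width, int height, float duration) {
            this.mhi = new float[height][width];
            this.duration = duration;
        }

        // motionMask[y][x] is true where the current frame differs from the last.
        public void update(boolean[][] motionMask, float timestamp) {
            for (int y = 0; y < mhi.length; y++) {
                for (int x = 0; x < mhi[y].length; x++) {
                    if (motionMask[y][x]) {
                        mhi[y][x] = timestamp;          // fresh motion
                    } else if (mhi[y][x] < timestamp - duration) {
                        mhi[y][x] = 0f;                 // too old: forget
                    }
                }
            }
        }
    }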
Advantages:
Their system can find the projected screen, which is very useful for projection purposes.
The system is convenient because it can be used on all portable devices, and low-resolution
cameras are supported.
Limitations:
A high error rate on fast motion (an issue that can be improved by increasing the frame rate)
results in many false-positive detections. If the speaker walks around the projected area
continuously, the system can adapt to this behaviour and performs well, but if the speaker
suddenly stops in the middle of the screen, the system raises a false alarm. If the
environmental lighting changes or a shadow is projected within the camera's field of view, the
system may be misled into a different result, and if the edge color of the projected screen is
similar to neighbouring objects, the screen will not be detected well.

CHAPTER 4
4. SOFTWARE REQUIREMENT SPECIFICATION
4.1 INTRODUCTION
As computer technology continues to develop, people have smaller and smaller electronic
devices and want to use them ubiquitously. There is a need for new interfaces designed
specifically for use with these smaller devices. Increasingly we are recognizing the importance of
human computer interaction (HCI), and in particular vision-based gesture and object
recognition. Simple interfaces already exist, such as the embedded keyboard, folder-keyboard
and mini-keyboard. However, these interfaces need some amount of space to use and cannot be
used while moving. Touch screens are also a good control interface and nowadays they are used
globally in many applications. However, touch screens cannot be applied to desktop systems
because of cost and other hardware limitations. By applying vision technology and controlling
the mouse by natural hand gestures, we can reduce the work space required. In this report, we
propose a novel approach that uses a video device to control the mouse system. This mouse
system can control all mouse tasks, such as clicking (right and left), double-clicking and
scrolling. We employ several image processing algorithms to implement this.
Our project, Human Computer Interaction Where Controlling Computer and
Applications Using Image Processing and Voice Recognition, is an attempt at ubiquitous
computing. Here, we will be using colored tapes on our fingers. One of the tapes will be used for
controlling cursor movement, while the relative distance between the two colored tapes will be
used for the click events of the mouse, and the center colored tape will be used for gestures. Also,
we will be enriching our system with voice recognition capability to perform basic actions like
shutdown, search and surfing. Thus, the system will provide a new experience for users in
interacting with the computer.
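As a rough sketch of this tape-based scheme, the code below moves the cursor with one tape's centroid and synthesizes a left click when the two tapes come closer than a threshold, using java.awt.Robot; the centroid extraction and the threshold value are assumptions for illustration.

    import java.awt.AWTException;
    import java.awt.Point;
    import java.awt.Robot;
    import java.awt.event.InputEvent;

    public class TapeClickDetector {
        private static final double CLICK_DISTANCE = 40.0; // pixels (assumed)
        private final Robot robot;

        public TapeClickDetector() throws AWTException {
            this.robot = new Robot();
        }

        // Called once per frame with the two detected tape centroids.
        public void onFrame(Point cursorTape, Point clickTape) {
            robot.mouseMove(cursorTape.x, cursorTape.y); // tape 1 drives the cursor
            if (cursorTape.distance(clickTape) < CLICK_DISTANCE) {
                robot.mousePress(InputEvent.BUTTON1_MASK);   // JDK 1.6-era constant
                robot.mouseRelease(InputEvent.BUTTON1_MASK);
            }
        }
    }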

4.2 SYSTEM FEATURES

We introduce an effective way of Human Computer Interaction where the proposed system has
the following modules:

Controlling the mouse movements through hand movements.
Controlling media player options through hand gestures.
Controlling applications like games, maps and image viewers.
Performing basic operations through voice recognition, like search, shutdown, restart, etc.

Here we capture the user's hand gestures through a webcam and process them using image
processing techniques, and we use voice recognition to make interaction with the system more
natural.

4.3 EXTERNAL INTERFACE REQUIREMENT

There are many types of interfaces, such as the user interface, software interface and hardware
interface.

User Interfaces
The user interface for the software shall be compatible with the Windows operating system.
The user interface is developed using the Java Media Framework, and the camera should be
compatible with the system.

Software Interfaces
The system utilizes the JDK (1.6) framework, which provides the necessary components to
build system components and objects, plus the required data access components.

Communication Interfaces
A graphical user interface is the most convenient way to interact with the system, so in our
proposed system we have designed and developed a GUI using Java Swing classes.
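A minimal sketch of such a Swing entry point is shown below; the window title, label and layout are illustrative placeholders, not the project's actual UI.

    import javax.swing.JFrame;
    import javax.swing.JLabel;
    import javax.swing.SwingUtilities;

    public class HciMainWindow {
        public static void main(String[] args) {
            // Build the UI on the event dispatch thread, as Swing requires.
            SwingUtilities.invokeLater(new Runnable() {
                public void run() {
                    JFrame frame = new JFrame("HCI - Gesture and Voice Control");
                    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                    frame.add(new JLabel("Camera feed and status go here"));
                    frame.setSize(640, 480);
                    frame.setVisible(true);
                }
            });
        }
    }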

4.4 FUNCTIONAL REQUIREMENT

Class                  Function           Requirement
Color_Tape_Detection   Color_Extraction   HSV Color Detection Algorithm
                       blob_Detection     Histogram-based Skin Classifier
Cursor_Move                               Mapping Cursor Control Algorithm,
                                          Weighted Speed Cursor Control Algorithm
Voice_Recognition      Voice_Recognise    Dynamic Time Warping Algorithm

Table 1
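As an illustration of the HSV Color Detection Algorithm named in Table 1, the sketch below thresholds a frame by hue using java.awt.Color.RGBtoHSB and returns the centroid of matching pixels. The hue window and the saturation/brightness cut-offs are assumed placeholder values, not the project's tuned parameters.

    import java.awt.Color;
    import java.awt.Point;
    import java.awt.image.BufferedImage;

    public class HsvTapeDetector {

        // Returns the centroid of pixels whose hue falls in [hueMin, hueMax],
        // or null if no such pixel is found. Hue, saturation and brightness
        // are all in [0, 1] as returned by Color.RGBtoHSB.
        public static Point findTape(BufferedImage frame, float hueMin, float hueMax) {
            float[] hsb = new float[3];
            long sumX = 0, sumY = 0, count = 0;
            for (int y = 0; y < frame.getHeight(); y++) {
                for (int x = 0; x < frame.getWidth(); x++) {
                    int rgb = frame.getRGB(x, y);
                    Color.RGBtoHSB((rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF,
                                   rgb & 0xFF, hsb);
                    // keep saturated, reasonably bright pixels in the hue window
                    if (hsb[0] >= hueMin && hsb[0] <= hueMax
                            && hsb[1] > 0.5f && hsb[2] > 0.3f) {
                        sumX += x;
                        sumY += y;
                        count++;
                    }
                }
            }
            return count == 0 ? null
                              : new Point((int) (sumX / count), (int) (sumY / count));
        }
    }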

4.5 NON FUNCTIONAL REQUIREMENT

Performance Requirements

High speed: the system should process voice messages in parallel for various users to give a
quick response, rather than making users wait for process completion.

Better component design: to get better performance at peak time.

Security Requirements

Secure access to confidential data (user details). Information security means protecting
information and information systems from unauthorized access, use, disclosure,
disruption, modification or destruction.

1. User passwords must be stored in encrypted form for security reasons.
2. All user details shall be accessible only to highly authorized persons.
3. Access will be controlled with usernames and passwords.
4. The voice database must be secure.

Extensibility

Extensibility allows adding new components to the system or replacing existing ones
without affecting the components already in place. A flexible, service-based architecture
is highly desirable for future extension.

Scalability

The solution should be able to accommodate a high number of customers and brokers, both
of which may be geographically distributed.

Compatibility

Compatibility is the measure of how easily one type of application can be extended with, or
used alongside, another.

Serviceability

In software engineering and hardware engineering, serviceability, also known as
supportability, is one of the aspects of IBM's RASU (Reliability, Availability,
Serviceability, and Usability). It refers to the ability of technical support personnel to
install, configure, and monitor computer products, identify exceptions or faults, debug or
isolate faults to their root cause, and provide hardware or software maintenance in
pursuit of solving a problem and restoring the product to service.

4.6 ANALYSIS MODEL


4.6.1 DFD level 0

Figure 1

4.6.2 DFD level 1

Figure 2

4.6.3 DFD level 2

Figure 3

4.6.4 Class Diagram:

Figure 4

4.6.5 ER Diagram

Figure 5

4.7 SPECIFIC REQUIREMENTS

The system will be Windows-based, supporting Windows XP onwards. The minimum
configuration required for the system is:

4.7.1 Hardware:

CPU: 2.4 GHz (Intel or AMD)
Hard disk: 80 GB
RAM: 1 GB
Camera: 2 megapixel (minimum) at 30 FPS (frames per second)

4.7.2 Software:

JDK 1.6
NetBeans IDE
Java Advanced Imaging
Java Media Framework
SAPI

4.8 SYSTEM IMPLEMENTATION PLAN

Sr. No.  Planning                                      Start Date            Completion Date
1.       Topic Search and Finalization                 4th July 2014         25th July 2014
2.       Literature Survey                             8th August 2014       21st August 2014
3.       Objective and Planning                        22nd August 2014      28th August 2014
4.       Software, Hardware Requirements & Budget      29th August 2014      5th September 2014
5.       DFD, UML Diagrams                             8th September 2014    10th September 2014
6.       Algorithm Design                              10th September 2014   13th September 2014
7.       Algorithm Analysis                            15th September 2014   21st October 2014
8.       Preliminary Report                            22nd October 2014     29th October 2014
9.       Study of Project-related Technology           1st November 2014     3rd January 2015
10.      Coding and Implementation of Project          5th January 2015      20th February 2015
11.      Working Model and Testing                     24th February 2015    15th March 2015
12.      Tested and Executable Project Model           16th March 2015       29th March 2015
13.      Final Report and Deployment of Project        2nd April 2015        20th April 2015

Table 2

4.9 BUDGET

Sr. No.  Product               Quantity  Cost
1.       Computer              1         20000
2.       WebCamera             1         2500
3.       Windows XP            1         3000
4.       NetBeans IDE 7.0.1              OSS
5.       JDK 1.6                         OSS
6.       STAR UML                        OSS
         Total                           27500

Note: OSS (Open Source Software)

Table 3

CHAPTER 5
5. SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE:
In our proposed system we have three main blocks:
1. Functional Block
2. Voice Recognition Module
3. Technology
The functional block takes a camera feed as input and processes it through image processing
algorithms, while the voice recognition module takes voice commands as input and processes
them. Both the functional block and the voice recognition module perform their processing
using the technology block.

Figure 6
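As a rough illustration of this structure, the sketch below models the three blocks as Java interfaces wired through a shared technology block. All type and method names are illustrative assumptions, not the project's actual classes.

    import java.awt.image.BufferedImage;

    public class ArchitectureSketch {

        // Shared services used by both input pipelines (the "Technology" block).
        interface TechnologyBlock {
            void execute(String action); // e.g. move cursor, press key, shut down
        }

        // Image-processing pipeline fed by the camera (the "Functional" block).
        interface FunctionalBlock {
            void processFrame(BufferedImage frame, TechnologyBlock tech);
        }

        // Voice-command pipeline fed by the microphone.
        interface VoiceRecognitionModule {
            void processCommand(String command, TechnologyBlock tech);
        }
    }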

5.2 UML DIAGRAMS

5.2.1 Use Case Diagram
Use case diagrams are one of the five diagrams in the UML for modelling the dynamic
aspects of systems (activity diagrams, statechart diagrams, sequence diagrams, and collaboration
diagrams are four other kinds of diagrams in the UML for modelling the dynamic aspects of
systems). Use case diagrams are central to modelling the behaviour of a system, a subsystem, or
a class. Each one shows a set of use cases and actors and their relationships. You apply use case
diagrams to model the use case view of a system. For the most part, this involves modelling the
context of a system, subsystem, or class, or modelling the requirements of the behaviour of these
elements. Use case diagrams are important for visualizing, specifying, and documenting the
behaviour of an element. They make systems, subsystems, and classes approachable and
understandable by presenting an outside view of how those elements may be used in context. Use
case diagrams are also important for testing executable systems through forward engineering and
for comprehending executable systems through reverse engineering.

Figure 7

ISB&M School of Technology Pune

Page 17

Human Computer Interaction Where Controlling Computer And Applications Using Image Processing and Voice Recognition

5.2.2 Activity Diagram

An activity diagram shows the flow from activity to activity within a system: the activities
performed, the sequential or branching flow between them, and the objects that act and are
acted upon. Activity diagrams address the dynamic view of a system and are especially
important in modelling its function, emphasizing the flow of control among objects.

Figure 8

5.2.3 Sequence Diagram

A sequence diagram is an interaction diagram that emphasizes the time-ordering of messages.
It shows a set of objects and the messages sent and received by those objects. Sequence
diagrams address the dynamic view of a system and are especially useful in modelling the
behaviour of an interface, class, or collaboration.

Figure 9

CHAPTER 6
6. TECHNICAL SPECIFICATION
6.1 TECHNOLOGY USED IN PROJECT
6.1.1 JDK 1.6
JDK (Java Development Kit) is a free software development package from Sun
Microsystems that implements the basic set of tools needed to write, test and debug Java
applications and applets.

6.1.2 Java Advanced Imaging


The Java Advanced Imaging API extends the Java 2 platform by allowing sophisticated,
high-performance image processing to be incorporated into Java applets and applications. It is a
set of classes providing imaging functionality beyond that of Java 2D and the Java Foundation
classes, though it is designed for compatibility with those APIs. This API implements a set of
core image processing capabilities including image tiling, regions of interest, deferred execution
and a set of core image processing operators, including many common point, area, and frequency
domain operators.
The Java Advanced Imaging API is intended to meet the needs of technical (medical,
seismological, remote sensing, etc.) as well as commercial imaging (such as document
production and photography). The API can benefit all Java developers who want to incorporate
imaging into their Java applets and applications.
Features of JAI:

Rich set of functionality for digital imaging.
High level of extensibility to allow arbitrary processing capabilities.
Support for a wide variety of data types.
Deferred execution.
Remote imaging and truly distributed imaging.
Multiple implementations with different trade-offs of memory usage, operator
optimization, and hardware acceleration.
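As a small illustration of JAI's deferred-execution style, the sketch below chains a file load with the standard "invert" operator; pixels are computed only when the result is materialized. The file name is a placeholder.

    import java.awt.image.BufferedImage;
    import java.awt.image.renderable.ParameterBlock;
    import javax.media.jai.JAI;
    import javax.media.jai.RenderedOp;

    public class JaiExample {
        public static void main(String[] args) {
            // Declare the operation chain; nothing is computed yet.
            RenderedOp source = JAI.create("fileload", "frame.jpg");
            ParameterBlock pb = new ParameterBlock();
            pb.addSource(source);
            RenderedOp inverted = JAI.create("invert", pb);
            // Pixels are actually computed here, when the result is requested.
            BufferedImage result = inverted.getAsBufferedImage();
            System.out.println("Size: " + result.getWidth() + "x" + result.getHeight());
        }
    }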

6.1.3 Java Media Framework

JMF is a framework for handling streaming media in Java programs. It is an optional
package for the Java 2 standard platform and provides a unified architecture and messaging
protocol for managing the acquisition, processing and delivery of time-based media. JMF
enables Java programs to:

Present (play back) multimedia content,
Capture audio through a microphone and video through a camera,
Do real-time streaming of media over the Internet,
Process media (such as changing the media format or adding special effects),
Store media into a file.

Features of JMF:

JMF supports many popular media formats such as JPEG, MPEG-1, MPEG-2, QuickTime,
AVI, WAV, MP3, GSM, G723, H263, and MIDI, and popular media access protocols such as
file, HTTP, HTTPS, FTP, RTP, and RTSP.

JMF uses a well-defined event reporting mechanism that follows the Observer design pattern,
and the Factory design pattern to simplify the creation of JMF objects. JMF supports the
reception and transmission of media streams using the Real-time Transport Protocol (RTP),
including the management of RTP sessions.

JMF scales across different media data types, protocols and delivery mechanisms, and provides
a plug-in architecture that allows it to be customized and extended. Technology providers can
extend JMF to support additional media formats, and high-performance custom implementations
of media players or codecs, possibly using hardware accelerators, can be defined and integrated
with JMF.
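As a brief illustration of how a webcam might be opened with JMF, the sketch below lists RGB-capable capture devices and starts a Player on the first one. Device availability depends on the installed JMF packs, and error handling is simplified.

    import java.util.Vector;
    import javax.media.CaptureDeviceInfo;
    import javax.media.CaptureDeviceManager;
    import javax.media.Manager;
    import javax.media.Player;
    import javax.media.format.VideoFormat;

    public class JmfCaptureExample {
        public static void main(String[] args) throws Exception {
            // List devices that can deliver RGB video (e.g. a webcam).
            Vector devices =
                CaptureDeviceManager.getDeviceList(new VideoFormat(VideoFormat.RGB));
            if (devices.isEmpty()) {
                System.out.println("No RGB capture device found");
                return;
            }
            CaptureDeviceInfo info = (CaptureDeviceInfo) devices.get(0);
            // A realized Player can render the stream or hand frames to processing.
            Player player = Manager.createRealizedPlayer(info.getLocator());
            player.start();
        }
    }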

6.2 ADVANTAGES

The hand becomes an acceptable tool for controlling the computer cursor.
It will enable people to interact with computers without physical contact.
It will give a more natural way to interact with the computer.
Finger mouse and voice control can benefit other applications such as commercials and
interactive advertisements.

6.3 APPLICATIONS

Our system can be used for human computer interaction.
It can also be used to control the computer.
It can be used to control various software applications.
Our system can be used to control PowerPoint presentations.
Our system can be used to control or play games.

CHAPTER 7
7. CONCLUSION
The product that we are trying to develop will improve the way people use the computer
system. Presently, the webcam, microphone and mouse are integral parts of the computer
system. Our product, which uses only two of them, i.e. the webcam and microphone, may
eliminate the mouse. This would lead to a new era of Human Computer Interaction (HCI)
where no physical contact with the device is required.
This technology can be further enhanced for use in robotics, gaming and in developing
systems that can understand human behaviour based on their way of interaction.

CHAPTER 8
REFERENCES
1. Hojoon Park, "A Method for Controlling Mouse Movement using a Real-Time Camera".

2. Hart Lambur, Blake Shaw, "Gesture Recognition", CS4731 Project, December 21, 2004.

3. Robertson P., Laddaga R., Van Kleek M., "Virtual Mouse Vision Based Interface", January
2004.

4. Shahzad Malik, "Real-time Hand Tracking and Finger Tracking for Interaction", CSC2503F
Project Report, December 18, 2003.

5. A. Erdem, E. Yardimci, Y. Atalay, V. Cetin, "Computer Vision Based Mouse", Proceedings
of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP),
2002.

6. Y. Sato, Y. Kobayashi, H. Koike, "Fast Tracking of Hands and Fingertips in Infrared Images
for Augmented Desk Interface", Proceedings of the IEEE International Conference on
Automatic Face and Gesture Recognition (FG), 2000, pp. 462-467.

7. J. Segen, S. Kumar, "Shadow Gestures: 3D Hand Pose Estimation Using a Single Camera",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
1999, Vol. 1, pp. 479-485.

8. James M. Rehg, Takeo Kanade, "DigitEyes: Vision-Based Hand Tracking for Human-Computer
Interaction", Proceedings of the IEEE Workshop on Motion of Non-Rigid and
Articulated Objects, Austin, Texas, November 1994, pp. 16-22.

9. Chu-Feng Lien, "Portable Vision-Based HCI: A Real-time Hand Mouse System on
Handheld Devices".

10. Asanterabi Malima, Erol Ozgur, Mujdat Cetin, "A Fast Algorithm for Vision-Based
Hand Gesture Recognition for Robot Control".

11. Stephen Tu, "HSV Color Detection Algorithm", white paper.

ANNEXURE
ANNEXURE A:
Project Analysis of Algorithm Design
Project Analysis:

Figure 10

Let G be a closed graph that represents our system of mouse simulation and application
control, such that G = {E, V}, where E represents the set of edges, E = {e0, e1, e2, e3, ..., e14},
and V is the set of vertices, V = {v0, v1, v2, v3, ..., v10}.
In the graphical representation of the system, the vertices in the set V represent the modules,
which are connected through directed edges in the set E representing the inputs/outputs of the
modules. We define the vertices as,

VERTEX   MODULE
v0       Initialize
v1       Image Capture
v2       Voice Capture
v3       Finger Tape Colour Detect
v4       Voice Recognition
v5       Event Detect
v6       Mouse Operation Events
v7       Operation Gesture Events
v8       Key Strokes
v9       Voice Operations
v10      Aggregation

Table 4

We define the edges as,

EDGE   INPUT/OUTPUT
e0     Call to camera
e1     Call to microphone
e2     Image frames
e3     Voice records
e4     Pixel position
e5     Distance between tapes and wait time
e6     Voice command
e7     Operation gesture detected
e8     Gesture for key stroke
e9     Voice command
e10    Call to aggregation module
e11    -------------------------------
e12    -------------------------------
e13    -------------------------------
e14    Iteration call

Table 5

Let fe be a mapping from E into V such that, for a given edge, it returns the vertex that edge
leads to: fe(E) → V. Thus, for our system,

fe(e0) = {v1} ...... v1 is called using e0 to capture an image.
fe(e1) = {v2} ...... v2 is called using e1 to capture voice.
fe(e2) = {v3} ...... frames are passed to v3 using e2 for detection.
fe(e3) = {v4} ...... voice data is passed to v4 using e3 for recognition.
fe(e4) = {v6} ...... the position is passed to v6 using e4 for cursor movement.
fe(e5) = {v5} ...... the distance between coloured tapes or the wait time is passed to v5 using e5
for event detection.
fe(e6) = {v5} ...... the voice command is passed to v5 using e6 for event detection.
fe(e7) = {v7} ...... v7 is called for an operation gesture event using e7.
fe(e8) = {v8} ...... v8 is called for a key stroke event using e8.
fe(e9) = {v9} ...... v9 is called for a voice operation event using e9.
fe(e10) = {v10}
fe(e11) = {v10}
fe(e12) = {v10} ...... e10, e11, e12, e13 aggregate at v10.
fe(e13) = {v10}
fe(e14) = {v0} ...... v0 is called again to iterate using e14.
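For illustration only, the mapping fe can be represented directly as a lookup table; the sketch below shows a few of the entries in Java (this is not part of the system's code).

    import java.util.HashMap;
    import java.util.Map;

    public class SystemGraph {
        public static void main(String[] args) {
            // fe maps each edge to the vertex (module) it invokes.
            Map<String, String> fe = new HashMap<String, String>();
            fe.put("e0", "v1");  // call to camera     -> Image Capture
            fe.put("e1", "v2");  // call to microphone -> Voice Capture
            fe.put("e2", "v3");  // image frames       -> Finger Tape Colour Detect
            fe.put("e14", "v0"); // iteration call     -> Initialize
            System.out.println("fe(e0) = " + fe.get("e0")); // prints v1
        }
    }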

Overloading on {e4, e5, e6}

The mouse movement and events are overloaded using the pixel position (e4), the distance
between coloured tapes and the wait time (e5), and voice commands (e6).

COMPLEXITY
Our system involves three main modules:

Image recognition and analysis (v3)
Voice recognition and analysis (v4)
Event selection (v5)

For a standard image recognition and analysis module/system, the complexity is log-linear
and is given as
___________________________________________________________________________
O(mn + (mn/k^2) log(mn/k^2))
where m × n is the width and height of the image and k × k is the size of the segmentation blob.
___________________________________________________________________________
Also, for a standard voice recognition module/system, the complexity is quadratic in
nature and is given as
___________________________________________________________________________
O(n^2 v)
where v is the number of words in the dictionary and n is the length of the input sequence.
___________________________________________________________________________
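To make the quadratic cost concrete, the following is a minimal Java sketch of Dynamic Time Warping (the algorithm named in Table 1) between two feature sequences; comparing an input of length n against each of v dictionary templates gives the O(n^2 v) bound above. The one-dimensional feature representation is a simplification for illustration.

    public class DynamicTimeWarping {

        // Returns the DTW distance between sequences a and b.
        public static double distance(double[] a, double[] b) {
            int n = a.length, m = b.length;
            double[][] d = new double[n + 1][m + 1];
            for (int i = 0; i <= n; i++)
                for (int j = 0; j <= m; j++)
                    d[i][j] = Double.POSITIVE_INFINITY;
            d[0][0] = 0;
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= m; j++) {
                    double cost = Math.abs(a[i - 1] - b[j - 1]);
                    // best of insertion, deletion, and match
                    d[i][j] = cost + Math.min(d[i - 1][j],
                              Math.min(d[i][j - 1], d[i - 1][j - 1]));
                }
            }
            return d[n][m];
        }
    }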

The complexity of the event selection module depends on the number of events involved;
thus, for n events, the complexity is
___________________________________________________________________________
O(n)
___________________________________________________________________________
Thus, total complexity of our system is given as
___________________________________________________________________________
Total complexity = Image recognition complexity + Voice recognition complexity
                 + Event selection complexity
                 = O(mn + (mn/k^2) log(mn/k^2)) + O(n^2 v) + O(n)
___________________________________________________________________________
Hence, the overall complexity of our system comes out to be nearly O(n^2).

P Class Problem
A problem is in class P if it is solvable in polynomial time by a deterministic algorithm.
For our system, the algorithms are deterministic and the overall complexity is O(n^2), which
shows that it is in class P.

ANNEXURE B:
Project Quality and Reliability Testing of Project Design
Testing is the process of evaluating a system or its component(s) with the intent of finding
whether it satisfies the specified requirements or not. This activity reports the actual results, the
expected results, and the difference between them. In simple words, testing is executing a
system in order to identify any gaps, errors or missing requirements contrary to the actual
requirements.

There are different types of testing which may be used to test software during the SDLC.
Manual testing: this type includes testing the software manually, i.e. without using
any automated tool or script.
Automation testing: automation testing, also known as test automation, is when
the tester writes scripts and uses other software to test the software.

There are different methods which can be used for software testing.
Black box testing: the technique of testing without any knowledge of the interior
workings of the application.
White box testing: the detailed investigation of the internal logic and structure of the code;
white box testing is also called glass testing or open box testing.
Grey box testing: a technique to test the application with limited knowledge of the internal
workings of the application.

ANNEXURE C
ABBREVIATIONS
A
AMD: Advanced Micro Devices
API: Application Programming Interface
AVI: Audio Video Interleave

C
CPU: Central Processing Unit

D
DFD: Data Flow Diagram

E
ER Diagram: Entity Relationship Diagram

F
FPS: Frames per Second
FTP: File Transfer Protocol

G
GB: GigaByte
GUI: Graphical User Interface
GSM: Global System for Mobile

H
HCI: Human Computer Interaction
HSV: Hue Saturation Value
HTTP: Hyper Text Transfer Protocol
HTTPS: Hyper Text Transfer Protocol Secure

I
IGOR: Intelligent Gaze Oriented Robot
IBM: International Business Machines
IDE: Integrated Development Environment

ISB&M School of Technology Pune

Page 31

Human Computer Interaction Where Controlling Computer And Applications Using Image Processing and Voice Recognition

J
JDK (1.6): Java Development Kit 1.6
JAI: Java Advanced Imaging
JMF: Java Media Framework
JPEG: Joint Photographic Experts Group

M
MPEG-1: Moving Picture Experts Group 1
MPEG-2: Moving Picture Experts Group 2
MP3: MPEG-1 Audio Layer III
MIDI: Musical Instrument Digital Interface

O
OSS: Open Source Software

P
PC: Personal Computer

R
RASU: Reliability, Availability, Serviceability, and Usability
RAM: Random Access Memory
RTP: Real-Time Transport Protocol
RTSP: Real-Time Streaming Protocol

S
SAPI: Speech Application Programming Interface
SDLC: Software Development Life Cycle

U
UML: Unified Modelling Language

W
WAV: Waveform Audio Format
Numbers
3D: Three Dimensional
2D: Two Dimensional
