P. Thirumal Reddy, CSE 3rd Year, Sri Sai Institute of Technology, Kadapa. Email: ptirumal584@gmail.com
Abstract:
Human Computer Interaction, in the field of input and output techniques, has produced a lot of new techniques over the last few years. With the recently released full multi-touch tablets and notebooks, the way people interact with the computer is reaching a new dimension. As humans are used to handling things with their hands, the technology of multi-touch displays and touchpads brings much more convenience for use in daily life. Human speech recognition will certainly also play an important part in the future of human computer interaction. This paper introduces techniques and devices that use human hand gestures with multi-touch tablets and video recognition, as well as techniques for voice interaction. Gesture and speech recognition take an important role here, as these are the main communication methods between humans, and they could disrupt the keyboard and mouse as we know them today.
1 Introduction
As mentioned before, much work has been done in recent years in the sector of human computer interaction, in the field of input and output techniques. Since the release of multi-touch tablets and notebooks, some of these developed multi-touch techniques are coming into practical usage. It will surely not take much time until sophisticated techniques for human gesture or voice detection enhance them further. These two new methods will certainly play an important role in how HCI will change in the future and how people can interact more easily with their computer in daily life. Hewett, et al. defined that "Human-computer interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena
surrounding them." Since the invention of the graphical human computer interface in the 1970s at Xerox PARC, we have been used to having a mouse and a keyboard to interact with the computer and a screen as a simple output device. With upcoming new technologies, these devices are more and more converging with each other, or sophisticated methods are replacing them. This paper therefore mainly deals with these new developments, how they could be implemented in the future, and how they could influence and change daily computer interaction. The screen has recently become an input and output tool in one device, so there is no need for extra input devices. Even this fact is a completely new situation, as we are used to having more than just one device. Section 2.2 covers another promising method of human gesture interaction, the detection of gestures via video devices. Section 2.3 then points out a further future technique, which is concerned with human speech detection as an input method. Section 2.4 deals with a method that combines video and speech detection. After these sections about the different types of recent human computer interaction work, Section 3 deals with the opportunities of these new techniques, how they can be used in the future and especially how daily life can be changed by them. It is also pointed out in which fields these new developments can be adopted.
2 Recent Developments
Human gestures and human speech are the most intuitive means which humans use to communicate with each other. However, after the invention of the mouse and the keyboard, no further devices that could replace these two objects as computer input methods were developed for a long time. Because gestures and speech are more human-like methods, a lot of research has been done on how they can be used for communication between computers and human beings. As there are different ways of how gestures can be used as input, this section is divided into multi-touch, video, speech and multi-modal interaction sections. Nowadays there are already many tablets with touch screens available, and with the new Apple iPad a full multi-touch product has been released. There is also a noticeable trend that these methods can be used on bigger screens, like the approach of Miller [2] or Microsoft's Surface [3], to mention only these two. Thus the trend is going more and more not only in the direction of merging input and output devices but rather towards using every surface as an input and output facility. In this context, video becomes of greater interest, as it covers the full range of human motion gestures and is usable on any surface. Finally, input via speech also plays a part in HCI, as it is the human's easiest way to communicate, but it is certainly something completely different compared with the other two types of input and output methods, as it is more an algorithm than a device. The combination of different input methods, called multi-modal interaction, is then described in the last section.
2.1 Multi-Touch Devices:
As mentioned before, this section deals with the technique of the recently released multi-touch devices and with some new enhanced approaches. This method is now becoming common in tablet PCs, for example the new Apple iPad, and in the desktop sector with the HP TouchSmart. Thereby the screen becomes an input and output device in one. Multi-touch is also used today in many normal touchpads which offer four-finger navigation. With this invention of a new kind of human computer interaction, much more work has been done in this sector and should sooner or later come into practical use.

Figure 1: Example of multi-touch devices.

Nowadays the usage of touch screens and multi-touch pads seems to be really common, and it appears that this is going to be the future of human computer interaction, but there are certainly more enhancements, which can be seen in many approaches. In the field of multi-touch products, the trend to bigger touchpads in terms of multi-touch screens can be seen. Thereby the single-touch touchpad technique known from former notebooks is enhanced, and more fingers offering natural human hand gestures can be used. The user can use up to 10 fingers to fully control things with both hands, as with the 10/GUI system which R. Clayton Miller introduced in 2009. With another upcoming tool, the Displax Multitouch Technology from DISPLAX Interactive Systems, even any surface can be used as a touch screen in the future. From these examples it can be clearly seen that new high-potential techniques are pushing into the market and are going to compete with Apple's iPad and Microsoft's Surface. In the following sections these new tools, and also the related devices which are already on the market, are described in detail.
2.1.1 iPad:
The recently introduced iPad from Apple is one of many implementations of a full multi-touch display, and it is a completely new way for people to interact with their computer. The possibility to use the screen not only as a single-touch display takes Human Computer Interaction to the next level. With the iPad it is possible to use all the finger movements which are also possible with the built-in multi-touch touchpads of the Apple MacBooks, as described on the Apple homepage. The user is able to use up to four fingers at the same time to navigate through the interface; for example, two fingers can be used to zoom and four fingers to browse through windows. By using the screen as a big touchpad, the techniques of the normal touchpad have been enhanced. This technique is just the beginning of the new multi-touch display revolution, which will surely be expanded by increasing display sizes.
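As a small illustration of how such a two-finger zoom gesture can be derived from raw touch points, consider the following Python sketch; the touch representation is invented for illustration and is not Apple's actual API:

    import math

    def distance(p1, p2):
        # Euclidean distance between two touch points (x, y)
        return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

    def pinch_zoom_factor(start_touches, current_touches):
        # The zoom factor is the ratio of the current finger spread
        # to the spread when the two fingers first touched the screen.
        d0 = distance(start_touches[0], start_touches[1])
        d1 = distance(current_touches[0], current_touches[1])
        return d1 / d0 if d0 > 0 else 1.0

    # Fingers moving apart yield a factor > 1, i.e. "zoom in":
    print(pinch_zoom_factor([(100, 100), (200, 100)],
                            [(80, 100), (220, 100)]))  # prints 1.4

A real gesture recognizer additionally has to track touch identities over time and distinguish a pinch from a swipe, but the core of the zoom gesture is this ratio.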
2.1.2 Microsoft Surface:

Microsoft's Surface [3] is a multi-touch tabletop computer that can, in addition to hand gestures, recognize tagged objects placed on its screen. When such an item is placed on it, the tabletop PC then provides more information and interaction. It is, for example, possible to browse through different information menus about the placed item and obtain more digital information. On the other hand, the size of the display and the infrared cameras needed underneath lead to an increase in size.
2.1.3 10/GUI:
The 10/GUI system, which Miller [2] invented, is an enhanced touchpad for desktop computers which can recognize 10 fingers.
With this opportunity, human beings can interact with the computer with both hands and use it as a tracking and maybe also as a keyboard device. Miller designed this new touch surface tool especially for use in the desktop field. To get the most ergonomic position, he argues that it is better to have a full multi-touch pad in front of a screen as the keyboard and mouse replacement than having the whole screen as an input device, as known from other touch screens. This novel system is a wholly different method of computer interaction and is extended with a special graphical user interface for the full use of all 10 fingers. The most remarkable thing about this touchpad, besides the recognition of more fingers, is the pressure detection of every finger, which is directly indicated on the screen. Therefore every finger can be used as a pointing device like the mouse. With this feature it could in future also be used as a keyboard in the daily used position of the hands, without the need to select letters with only one finger. This first development of the 10/GUI is mainly concerned with having the 10 fingers act instead of the mouse, "...as many activities today need only a mouse and windowed information display on top of that...". The other innovation of this system is its specially designed user interface for the usage of these 10 fingers. Thereby Miller proposes a way to solve the problem of multiple open windows. His solution is a linear arrangement of the active windows, where it is possible to browse through them with two buttons on the outer left and right side of the touch panel.
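The following Python sketch illustrates the kind of per-finger event data such a pressure-sensitive 10-finger pad could report; the field names and the click threshold are illustrative assumptions, not Miller's actual specification:

    from dataclasses import dataclass

    @dataclass
    class FingerTouch:
        finger_id: int   # stable id 0-9, one per tracked finger
        x: float         # position on the pad, normalized 0..1
        y: float
        pressure: float  # normalized pressure 0..1, echoed on screen per finger

    def pressed_fingers(touches, threshold=0.2):
        # Fingers pressed harder than the threshold act as "clicks";
        # lighter touches remain hovering pointers on the screen.
        return [t for t in touches if t.pressure >= threshold]

    touches = [FingerTouch(0, 0.31, 0.62, 0.05),
               FingerTouch(1, 0.45, 0.60, 0.71)]
    print(pressed_fingers(touches))  # only finger 1 counts as a click

The point of the per-finger pressure field is exactly what the text describes: every finger is an independent pointer, and pressure distinguishes resting or hovering fingers from deliberate activation.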
2.1.4 Displax Multitouch Technology:

DISPLAX Interactive Systems developed with their Displax Multitouch Technology an ultra-thin transparent multi-touch film that can be applied to nearly any surface. With this possibility you can work directly on a big screen by just using your hands. This can be seen as the main advantage over the Microsoft Surface tool, which is tied to a more or less stationary place. Additionally this interface allows the usage of 16 fingers at the same time, so that more than just one user can work on the screen simultaneously. With a weight of just 300 g it is also a very portable tool, besides the fact that it is quite durable, as the film is placed on the back of the surface to protect it from scratches and other damage. Figure 1 illustrates a detail of the thin touch-screen surface, where even the usage on transparent surfaces is possible.
2.2 Video Devices:
Great efforts have also been made in the section of video input and output devices, for example SixthSense, which Pranav Mistry, et al. presented in 2009. The main purpose of such devices lies in the potential of having even more interaction with the computer than with normal touch screens or touchpads. These techniques aim to recognize human gestures, such as hand movements, without the need for additional hand-held pointing devices. In the approach of Feng, et al. an algorithm for real-time natural hand gesture detection has already been provided, with which it is possible to use only the hand as a pointing device like the mouse. Pranav Mistry, et al. improved this input pointing method with the development of a wearable gesture interface which is not only an input but also an output tool and will surely have an impact on future human computer interaction. Within this section these new methods of handling the interaction are now explained.

2.2.1 SixthSense:

Pranav Mistry, et al. [5] developed with SixthSense clearly the beginning of new techniques for human computer interaction. In their approach they use a wearable gesture interface, shown in Figure 2, taking HCI to the next level, where communication is done with the hands without any handheld tools. Just as easy is the output of the device, done by a small projector which is included in the interface. With this simple projector the output is directly projected onto any surface you like, which is clearly the main advantage over other devices which are tied to a certain place. For the whole communication with the computer they use a simple webcam for the user's input and a simple projector for the output. The input can range from simple hand gestures for sorting or editing images to more specific tasks. This video recognition tool deals with human motion gestures; especially, it recognizes hand gestures and lets people interact via their hand gestures over a certain output device. The input can also be a keyboard which is projected by the small beamer, fulfilling output and input at the same time. The video recognition has also been extended to react to other things than just human hand gestures. For instance, it may recognize that the user is reading a newspaper and add additional media to this topic by projecting the media directly onto the newspaper.
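A minimal sketch of the underlying hand-tracking idea is given below in Python with OpenCV (assuming the opencv-python package and a webcam); this is a crude skin-color heuristic for illustration only, not the actual SixthSense or Feng et al. algorithm:

    import cv2

    def hand_centroid(frame):
        # Very rough skin-color segmentation in HSV space; the published
        # algorithms are considerably more robust than this.
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, (0, 40, 60), (20, 150, 255))
        m = cv2.moments(mask)
        if m["m00"] == 0:
            return None  # no skin-colored blob found
        return (m["m10"] / m["m00"], m["m01"] / m["m00"])  # blob center (x, y)

    cap = cv2.VideoCapture(0)
    for _ in range(300):  # track for a few hundred frames
        ok, frame = cap.read()
        if not ok:
            break
        pos = hand_centroid(frame)
        if pos is not None:
            print("pointer at", pos)  # would be mapped to cursor coordinates
    cap.release()

Mapping the blob center to screen coordinates already turns the bare hand into a mouse-like pointing device, which is the core idea these systems build on.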
2.2.2 Skinput:

Researchers at Carnegie Mellon University and Microsoft have developed another highly topical approach called Skinput, which uses the human arm as an input surface. They use a small projector fixed around the user's biceps to project buttons onto the user's arm, which at first sight seems like the same approach as the before-mentioned SixthSense tool. But when the user taps on a button, an acoustic biosensor built into a wristband detects the pushed button. This technique works because it detects the different acoustic sounds, which vary due to the underlying bones in the forearm.

Figure 6: Acoustic biosensor.
Detailed results of how accurately this new method can work in practical use will be announced at the 28th ACM Conference on Human Factors in Computing Systems this year. So far it is stated that the researchers were able to reach an "...accuracy of 95.5% with the controllers when five points on the arm were designated as buttons." With the different fingers it is then possible to operate even more complex user interfaces, like using a scrolling button or even playing Tetris on the palm. An example of what Skinput looks like and its technical components are shown in Figure 5.
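The published system pairs its biosensor with a trained classifier; the following Python sketch with scikit-learn only illustrates that general calibrate-then-classify pipeline, with toy features and random placeholder data instead of real sensor traces:

    import numpy as np
    from sklearn.svm import SVC

    def features(signal):
        # Toy features from one acoustic sensor trace: total spectral
        # energy and the dominant frequency bin. Skinput derives far
        # richer features from several mechanical vibration sensors.
        spectrum = np.abs(np.fft.rfft(signal))
        return [float(np.sum(spectrum ** 2)), float(np.argmax(spectrum))]

    # Calibration phase: X holds one feature vector per recorded tap,
    # y holds which of the 5 arm "buttons" (labels 0..4) was tapped.
    rng = np.random.default_rng(0)
    X = [features(rng.standard_normal(256)) for _ in range(50)]
    y = rng.integers(0, 5, size=50)

    clf = SVC().fit(X, y)
    print(clf.predict([features(rng.standard_normal(256))]))  # predicted button

The reported 95.5% accuracy with five buttons is plausible exactly because the bone and tissue structure under each tap location colors the acoustic signal differently, giving the classifier separable feature clusters.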
2.2.3 g-speak:

Oblong Industries invented with g-speak a full gesture input/output device with a 3D interface. It can of course be compared with the before-mentioned SixthSense and Skinput devices. The disparity between these methods is the target group: while SixthSense and Skinput are mainly designed for mobile usage, g-speak comes with a more sophisticated user interface and is designed for usage with big screens that use a lot more space. The user wears a sort of glove to control the interface, as seen in Figure 6. The gestural motions are detected with an accuracy of 0.1 mm at 100 Hz, and two-handed as well as multi-user input is supported.

Figure 7: Usage of g-speak on a large-scale screen.

Additionally, the system comes with a high-definition graphical output which can be projected onto any screen in front of the user. For detailed operations the user can also drag objects from the big screens to a smaller screen in front of him, operate on its touch screen, and drag the objects back to the big output facilities. The most important aspect is the fact that the system can be used with any device you want to choose, for example a desktop screen, touch screen or hand-held device. Furthermore, with the support of multiple users this interface allows several users to interact with the system and work together very easily. It can thus be seen as a form of multi-modal computer interaction interface, where the user has the opportunity to interact with the computer through more than just one device.

2.3 Speech Detection:

Speech detection is always mentioned as the most common straightforward way, after gestural motion, of how people interact with each other. This fact of course also impacts the design of human computer interfaces. Within the section of speech detection, the main point is of course the software, for in speech recognition itself you only need a normal microphone. The only thing you then have to consider is noise, which will be recorded along with the actually intended voice. The major task is to create a good algorithm, not only to separate the noise from the actual voice, but rather to detect what humans are actually saying. In this section I am introducing two of the various approaches which can be used for speech detection. First, an algorithm to improve spoken natural language detection is described, and second, a more common architecture from Microsoft.
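As a toy illustration of the noise problem just described, the following Python sketch (a deliberate simplification of my own, not one of the approaches discussed in this section) keeps only those frames of a recording whose energy rises clearly above an estimated noise floor:

    import numpy as np

    def voice_frames(samples, rate, frame_ms=20, factor=3.0):
        # Split the recording into short frames and keep those whose
        # energy clearly exceeds the noise floor, estimated here as the
        # median frame energy. Real recognizers use far more elaborate
        # statistical models of both speech and noise.
        n = int(rate * frame_ms / 1000)
        frames = [samples[i:i + n] for i in range(0, len(samples) - n, n)]
        energies = np.array([float(np.mean(f ** 2)) for f in frames])
        noise_floor = np.median(energies)
        return [f for f, e in zip(frames, energies) if e > factor * noise_floor]

    # Example: one second of noise with a louder burst of "speech" inside.
    rng = np.random.default_rng(1)
    audio = rng.standard_normal(16000) * 0.1
    audio[6000:9000] += rng.standard_normal(3000)  # louder voiced region
    print(len(voice_frames(audio, 16000)))  # roughly the burst's frames

Such voice activity detection is only the first stage; the harder problem the text points to, deciding what was actually said, happens in the recognizer that consumes these frames.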
The main advantage of Microsoft's Speech API compared to the approach of Florez Choque, et al. is the fact that this tool comes with a package for nearly every common language, starting with English, German, French, Spanish, and so on. On the one hand this API is available on every Windows OS, but on the other hand it is only made for native Windows applications. So only Windows applications "...can listen for speech, recognize content, process spoken commands, and speak text...". A lot more information about the development of this tool can be found on the Microsoft Speech Technology page itself.
Figure 8: Speech Application Programming Interface.
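To give a flavor of the command-and-control style of interaction the quoted capabilities describe, here is a deliberately simplified Python sketch; the command table and action names are invented for illustration and have nothing to do with the real SAPI interface:

    # Toy command grammar in the spirit of command-and-control speech
    # interfaces; an illustration only, not the actual Speech API.
    COMMANDS = {
        ("open", "browser"): "start_browser",
        ("close", "window"): "close_active_window",
        ("read", "text"): "speak_selected_text",
    }

    def match_command(recognized_words):
        # The recognizer delivers a word sequence; we look for a known
        # command phrase anywhere inside it.
        words = [w.lower() for w in recognized_words]
        for phrase, action in COMMANDS.items():
            if all(w in words for w in phrase):
                return action
        return None

    print(match_command(["please", "open", "the", "browser"]))  # start_browser

Restricting recognition to a small grammar like this is also why command navigation already works reliably today, while free dictation remains the hard case discussed later in Section 3.3.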
2.4 Multi-Modal Interaction:

A combination of video and speech detection can be seen in the approach which Microsoft introduced in their vision of a future home. This was originally developed for usage in a future kitchen, but it describes perfectly how future Human Computer Interaction may look. In this approach they build multi-modal interfaces by combining video and speech recognition: they use video to detect goods in the kitchen and video projection to display the user interface directly on the kitchen surface. The detection, for instance, can be imagined such that the system recognizes which ingredient is placed on the surface. For the navigation through the interface they then combine this video detection method with speech detection. The whole demonstration of this device can be found on Microsoft's Future Home website itself. It is obvious that this technology of multi-modal interfaces will also influence "normal" Human Computer Interaction, as the same methods can be used in a day-to-day user interface.
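How such a combination can work at the software level is sketched below in Python; the event types and the "show recipes" command are hypothetical stand-ins for whatever the kitchen system actually uses:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class VisionEvent:
        item: str      # ingredient recognized on the kitchen surface

    @dataclass
    class SpeechEvent:
        command: str   # spoken command returned by the speech module

    def fuse(vision: Optional[VisionEvent], speech: SpeechEvent) -> str:
        # A multi-modal interface resolves an underspecified spoken
        # command against what the camera currently sees.
        if speech.command == "show recipes" and vision is not None:
            return f"display recipes containing {vision.item}"
        return "ask user to repeat or clarify"

    print(fuse(VisionEvent("tomato"), SpeechEvent("show recipes")))
    # -> display recipes containing tomato

The design point is that neither modality alone carries the full intent: the camera knows the object, the voice carries the action, and the fusion step joins the two.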
3 Applications

It can be clearly seen that the way people interact with their computer has changed over the years and that the trend goes towards a more convenient way to interact with the PC. As mentioned in the sections before, there are mainly three different types of how the interaction with the computer could look in the future: multi-touch, video gesture and pointing recognition, and speech recognition are the categories in which Human Computer Interaction will make great efforts. The combination of these techniques will then lead to multi-modal interfaces, where all these types are used together in one system. The main question is where these techniques and devices can find their application in our daily use. Within this section some possible applications of these new devices are shown. The ordering is based on the fact that multi-touch devices, with the iPad as cutting edge, are already pushing into the market.

3.1 Multi-Touch:
Some of these new multi-touch technologies have until now not reached the critical price at which the wide mass could afford to buy them. A wide range of applications for such models is certainly available, especially for tools like the 10/GUI system [2] that is made for home computer services. Most of us would appreciate having a full multi-touch pad in front of the computer screen with which everything could be operated. No unnecessary cables and no different devices for pointing and typing input would be needed. It is clear that this device could disrupt the mouse and the keyboard as the typical input devices. Another well-known product example is Microsoft's Surface [3]. With their multi-touch large-screen tabletop computer they intended to create a device for many different applications. With the opportunity to work without any additional devices, such as a mouse, it is very intuitive to use. Microsoft sees appliances in the industries of financial services, hospitality, retail and the public sector. With these examples it is obvious that this technique has a wide target group. We see the main target group in the public or even more in the entertainment sector, because of the relatively large size of the device, where other devices use a lot less space. On the other hand, its main advantage over all other devices is truly the usage of tagged objects. With this possibility not only human gestures are tracked, but also more sophisticated methods are used. A completely different way of touch screen usage is supported by the Displax Multitouch Technology, with its ultra-thin transparent film that can be spanned over any surface you like. The fact that the whole device only weighs 300 g also makes it very mobile, so that it can be used everywhere, for example at conferences or meetings where more people can collaborate with each other. That the system also allows the usage of 16 fingers brings even more advantages for this interaction. These appliances are all mainly for personal or business use, but in what other areas could there be an employment? There is no specific answer to this question. Certainly any surface with this film placed on it could then be an interactive input device. The DISPLAX company itself sees its potential customers in retail and diverse industries such as telecoms, museums, property, broadcast, pharma or finance. As this technology was primarily developed to integrate a touch screen into displays, it "...will also be available for LCD manufacturers, audio visual integrators or gaming platforms."
3.2 Video Gesture and Pointing Recognition:

Video gesture and pointing recognition devices use even more sophisticated methods to interact with the computer. The basic idea behind these techniques is that the devices are controlled by human gestures and motions. It can be seen from the recent developments that there are two types of applications for video recognition devices; the disparity lies in the mobility aspect. While mobility plays an important role in SixthSense and Skinput, g-speak is mainly designed for stationary purposes. Nevertheless, all three have the potential to become future devices for Human Computer Interaction, but let us start with the possible appliances of the g-speak tool. Oblong Industries designed with their tool a completely new way to allow free hand gestures as input and output. Besides the gestural part, they also constructed this platform for real-space representation of all input objects and on-screen constructs on multiple screens, as shown in Figure 4. With this tool they not only use so-called mid-air detection, which recognizes the human hand gestures and operates the interface, but also a multi-touch tabletop PC, as described in the previous section.
Figure 10: Video gesture and pointing recognition.
Skinput, as introduced in Section 2.2.2, is a new way of how interaction with human fingers can look in the future. Skinput was designed around the fact that mobile devices often do not have very large input displays. Therefore it uses the human body, or more precisely the human forearm, as an input surface with several touch buttons. As can be seen for now, this system contains some new interesting technology but is limited to pushing buttons. Thus it will find its use in areas where a user operates an interface with just buttons, and so it is not very likely that it will replace the mouse or keyboard in general. The SixthSense tool, in contrast, offers a much wider range of usage in future human computer interaction. As already mentioned, this tool uses human gestures as an input method, but it offers many more possible appliances. To name only a few examples of many: it can project a map through which the user can then zoom and navigate, it can be used as a photo organizer, or as a free painting application when displayed on a wall. Taking pictures by forming the hands into a frame, displaying a watch on the wrist or a keyboard on the palm, and displaying detailed information in newspapers or current flight information on flight tickets are just a few of the many opportunities. Some of these applications are shown in Figure 5. This list of many appliances highlights the high potential of this new technology, and with the up-to-date components in the prototype costing about $300, it is even affordable for the global market. Thus this device most likely has the highest probability to be the first to push into the market and become the beginning of the future interaction with the computer.
3.3 Speech Recognition:

Human speech, as mentioned before, is the easiest and most convenient way people are used to communicating with each other. But how can this fact be used for the interaction between humans and the computer? As described in the sections above, there are different approaches to solving the problem of getting a good accuracy of correctly detected words. Some of these techniques are already in practical use, like Microsoft's Speech Application, which comes with all Windows operating systems. Although Microsoft in particular has done a lot of work in this sector, until now they were not able to develop a method with an accuracy high enough that it could be used as a 100% reliable input technique. As long as 100% reliability is not reached, this technique will probably find no practical use as the sole human computer interaction alternative. When this goal is achieved, this method will surely find its place in many areas. A large target group in general is the business sector, where automatic speech detection would relieve a lot of work; especially speech-to-text recognition is an important part here, as it represents the hardest task. For simpler tasks, some approaches are already usable in practice. For instance, for a simple navigation through operating systems these techniques are just good enough and help many people, especially in the medical sector. They are really helpful for people with physical disabilities or where the usage of the hands is unsuitable, because the common way to interact with the computer is definitely going in the direction of using human gesture detection or the usage of multi-modal interfaces, as described in the next section.
3.4 Multi-Modal Interfaces:

A possible application of multi-modal interfaces can be seen in Microsoft's Future Home example, where they combine video and audio detection for the interaction between the computer and humans. It shows in detail the advantages of adding the audio part to the interaction. Microsoft presents in their example an outstanding way of how the communication with the computer can look in the future; with the usage of video and audio detection, this is brought to the next level. In their approach they introduce these methods mainly for kitchen appliances. For practical usage it can be clearly seen that this combination brings more extensibility into the interaction. This refers especially to tasks where it is not possible to use the hands to interact with the computer. In Microsoft's example this location is, as mentioned before, the kitchen, where you often need the hands for cooking and it is very helpful to navigate through an interface with your voice. A few further examples are the medical sector for people with disabilities, the car, the plane, and many more. Unfortunately, most appliances demand a high level of accuracy, which speech recognition for now cannot achieve. Of course this technique is sophisticated enough to be used for simple navigation through interfaces, as in the kitchen example. On the other hand, the video object recognition in this example is also a good addition to create a powerful tool.
4 Conclusion

The previous sections provided details about the tools and the new approaches and applications. This leads to the question of which tools will come into practical use in the near future. With the recently released iPad and also Microsoft's Surface, the first step into a future of multi-touch and video detection devices has been made. The iPad, with its affordable price, is the first one really pushing into the global market, and compared to the other products it provides the most advantages with its features. The ultra-thin film developed by Displax extends this multi-touch display to a large screen, with the great benefit that it allows multiple users. Nevertheless, the SixthSense tool and g-speak use a completely new technology. The only problem is the matter of price: until now these tools have not reached the critical price at which everybody can afford to buy them. With the specified prototype price of the SixthSense of only $300, it could be a big competitor for these multi-touch devices. We will have to see whether this price is really marketable; if so, it has the highest potential to reach the global market within the next few years. The main advantage of this tool is definitely its wide range of usage, with which none of the other devices can compete. For a more sophisticated adoption, g-speak is a better solution. It delivers an enhanced interface and the possibility of multi-user interaction, which is its big advantage, but on the other hand it serves more the business or education sector, as it would be expensive for individuals. The 10/GUI system comes with a solution for home computer use. It enhances the multi-touch pad with an individually designed interface and could thereby displace the mouse and keyboard in the home PC sector. All considered, all these methods deliver many new opportunities for future human computer interaction. The main factor will be the price, and SixthSense seems to offer the best price for an adoption in the near future.
This paper introduced a number of approaches for new future Human Computer Interaction methods, and also devices or prototypes in which these techniques are already in use. We have seen that we are tending towards disrupting the usage of the mouse and the keyboard, which we have been used to as computer input devices for the last three decades. Many new methods go in the direction of using human hand gestures and even multi-modal methods to interact with the computer. With the tools described we have seen that some of these methods are already included in the recently released iPad and Microsoft's Surface. Certainly more sophisticated methods will push into the market soon. With the SixthSense and g-speak tools, two enhanced methods of human computer interfaces were developed. With this opportunity, and the fact that we are used to acting with our hands and communicating with our voice, these parts will play a major role in our interaction with the computer. On the other hand, it can be seen that there is still a lot of work left, especially in the sector of human voice detection, though video and multi-touch detection also leave some space for expansion. Many new approaches will be presented at this year's 28th ACM Conference on Human Factors in Computing Systems, and we will see how other devices will take part in this new technology.