
IMPROMPTU : Audio Applications for Mobile IP

Proposal for the degree of Master of Science, Fall 2000

Kwan Hong Lee
kwan@media.mit.edu
December 11, 2000

Thesis Advisor
Christopher M. Schmandt
Principal Research Scientist
MIT Media Laboratory

Thesis Reader
Mark Ackerman
Principal Research Scientist
MIT Laboratory for Computer Science

Thesis Reader
Brian Smith
LG Electronics Assistant Professor
MIT Media Laboratory


ABSTRACT

IMPROMPTU is an Internet Protocol (IP) based audio platform for audio communication and mobile audio applications. Although mobile phones have brought a great amount of mobility into our lives, they were fundamentally not designed with extensibility in mind. With IMPROMPTU, a variety of audio services beyond those offered on current mobile telephones can be realized, enriching the always-on audio experience. Peers can learn about one's availability before they make calls; audio news can be streamed to a number of people in real time as a reporter covers an incident; one may listen to a news story and decide that it should be stored in an audio archive to be played back later; parents can monitor their baby much more closely while away from home and away from their desk.

In my project, IMPROMPTU, I propose an extensible audio platform composed of several distributed components working together to provide various audio services to a client device. The client device will support a high bandwidth wireless connection and will communicate with the distributed components to manage multiple audio applications. IMPROMPTU addresses the limitations of current mobile telephony by supporting multiple audio applications with a coherent audio user interface. The research results will provide guidelines for designing audio interfaces for managing and interacting with multiple audio applications on a mobile device, which will be useful for future broadband mobile communications.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
INTRODUCTION
RELATED WORK
APPLICATIONS AND THEIR CHARACTERISTICS
    Monitoring Applications
    Real Time Applications
    Interactive Applications
    Multi User Applications
CLIENT AND USER INTERFACE
    Hardware
    Managing Multiple Applications
    Awareness User Interface
    Real Time and Non Real Time
    Background Chat
SYSTEM ARCHITECTURE AND ITS REQUIREMENTS
    Requirements
    Component Descriptions
PLAN AND EVALUATION
    Deliverables
    Timeline
REFERENCES
READER BIOS
    Mark S. Ackerman

INTRODUCTION
Personal computers have been a multipurpose platform for a wide range of applications. Combined with the Internet, they have become the general platform for different types of communication applications, ranging from instant messaging, e-mail and newsgroups to IRC chats. However, these applications have been designed with the assumption that users are seated, stationary and focused on the application. With increasing mobility, audio communication becomes more important because one's eyes and hands are needed for other tasks.

Recent growth in mobile phone usage has proven the value of voice communication for people on the move. Nonetheless, there are fundamental problems with current mobile telephone systems that limit their functionality. These systems are based on the traditional circuit-switched phone network and are not designed to handle multiple applications or multiple communication channels. The call setup process is based on a simple alerting model, which requires a caller to dial a number and wait for the recipient to respond to an alert before a call is set up or forwarded to voice mail.

In my project, IMPROMPTU, I propose to build an IP based mobile platform, which consists of a wireless client device and distributed services that support multiple audio applications. I plan to explore different kinds of audio applications that would be interesting for mobile users when high bandwidth wireless connectivity is available. These applications will impose requirements on the user interface and the system architecture. By looking into these requirements, I hope to overcome some of the limitations of current mobile audio communications.

RELATED WORK
Internet telephony protocols have been developed over the last few years to send voice over the IP (Internet Protocol) network[1]. The two main protocols being standardized are H.323[2] from the International Telecommunication Union (ITU) and the Session Initiation Protocol (SIP)[3] from the Internet Engineering Task Force (IETF). These protocols specify how audio and video can be transported over the Internet and what kind of signaling protocol is necessary. Recent development efforts in SIP also include support for presence-related protocols to provide awareness between callers[4]. These efforts will provide the standard protocols for transporting voice over the IP network.

In mobile telephony, the WAP Forum[5], NTT's i-mode[6] and the Voice Browser Activity[7] are the main efforts currently under way to provide visual applications and interactive voice portals for mobile users. Their work will provide different services that are useful to current and future mobile phone users. However, their applications are limited to mostly visual or speech based interactive applications. Other projects that attempt to change the landscape of the Internet and telecommunications are the ICEBERG[8] project at UC Berkeley and the Oxygen[9] project at the MIT Laboratory for Computer Science, which are larger in scale and scope; they also explore the more fundamental levels of the computing and communication infrastructure. The Ektara[10] architecture proposed by the Wearables group at the MIT Media Lab likewise encompasses a larger scope in wearable and ubiquitous computing.

IMPROMPTU will provide guidelines for managing multiple audio applications on a mobile device. In addition, by integrating the functionality of instant messaging, phone, radio, walkman, audio recorder and other audio applications on a mobile device, it will provide a general platform for building and deploying audio applications for a mobile user.

APPLICATIONS AND THEIR CHARACTERISTICS


Audio applications can be classified by their timeliness (real time or archived), their mode of interaction (interactive, active or passive), the number of users involved (single user or multi-user), and their accessibility (private or public). These characteristics are not mutually exclusive, as the following table indicates. The characteristics of these applications impose different requirements on the system.

Table 1: Characteristics of Applications
[Table 1 is a matrix marking which characteristics apply to each application. Applications (rows): baby monitor; presence info (awareness); news broadcast (radio); community news; personal notes/reminders; call screening (preview); public chat (IRC type); private chat; audio books; music. Characteristics (columns): real time (generated in real time); archived (originally archived, since any audio content can be archived); delay permitted; continuous (no breaks); tolerates breaks; regular/timely (the content is regularly streamed); interactive (no UI); interactive (UI); active (pushed); passive (polled); single user; multi user; limited access; public access.]
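
As a rough illustration of how such a classification could be represented inside the system, the sketch below models an application descriptor carrying a set of characteristic flags. All names here (Characteristic, AudioApplicationDescriptor, and the flags chosen for the baby monitor) are my own, not part of the proposal.

```java
import java.util.EnumSet;

// Hypothetical sketch: the characteristics from Table 1 as flags on an application descriptor.
enum Characteristic {
    REAL_TIME, ARCHIVED, DELAY_PERMITTED, CONTINUOUS, TOLERATES_BREAKS, REGULAR,
    INTERACTIVE_NO_UI, INTERACTIVE_UI, ACTIVE_PUSHED, PASSIVE_POLLED,
    SINGLE_USER, MULTI_USER, LIMITED_ACCESS, PUBLIC_ACCESS
}

class AudioApplicationDescriptor {
    final String name;
    final EnumSet<Characteristic> traits;

    AudioApplicationDescriptor(String name, EnumSet<Characteristic> traits) {
        this.name = name;
        this.traits = traits;
    }

    // The application manager could use queries like this when deciding whether
    // an application may interrupt the audio stream currently in the foreground.
    boolean has(Characteristic c) { return traits.contains(c); }
}

class DescriptorExample {
    public static void main(String[] args) {
        AudioApplicationDescriptor babyMonitor = new AudioApplicationDescriptor(
                "Baby monitor",
                EnumSet.of(Characteristic.REAL_TIME, Characteristic.CONTINUOUS,
                           Characteristic.ACTIVE_PUSHED, Characteristic.SINGLE_USER,
                           Characteristic.LIMITED_ACCESS));
        System.out.println(babyMonitor.name + " is real time: "
                + babyMonitor.has(Characteristic.REAL_TIME));
    }
}
```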

Monitoring Applications

The first application uses the always-on connection to monitor environments such as a baby's room and deliver the audio to the user, who is always connected and subscribed to the particular application. The creator of the application can restrict access to certain users or broadcast the audio to everybody in a community. Moreover, it can alert the user when sudden changes occur in the environment: a baby's crying can immediately open a real time audio channel to the mother or father, depending on the application settings. The audio can also stream constantly in the background of the currently active application so the user can remain aware of certain places or certain people. The application provides a level of awareness in voice communication before an actual interaction is established, which is not possible with traditional telephony.

Real Time Applications

Radio and timely news information are real time audio applications. The audio will be delivered according to the user's preferences at a certain time of day or depending on the user's availability. As a user listens to a certain program for a certain length of time, its priority level will be raised. A slightly different application is the audio book, where a user actively accesses the application and decides whether to listen to it or not. It also differs in that it is archived and interactive. While radio news can also be archived for later access, it would not be as structured as an audio book, which can be navigated easily through its content.

Interactive Applications

Interactive applications are mostly transaction-based applications, and their primary modality of interaction would be speech. These applications will be similar to current VoiceXML applications. VoiceXML is a markup language for speech based interactive applications; it is analogous to HTML, except that HTML is used to mark up visual content while VoiceXML is used to mark up speech dialogs. These applications would utilize the speech service (one of the distributed services in the system) to interact with the user. Keywords describing the content of the application can be used to profile the user.
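
As a rough sketch of how such an interactive application might drive a dialog through the speech service, consider the fragment below. The SpeechService interface, its methods and the sample prompts are purely illustrative assumptions, not part of the proposed system's API.

```java
// Hypothetical speech-driven interaction, in the spirit of a VoiceXML dialog:
// the application prompts the user, the speech service recognizes the reply,
// and the application decides what to do next.
interface SpeechService {
    void say(String prompt);                   // rendered to the user via text to speech
    String listen(String... expectedPhrases);  // speech recognition over a small grammar
}

class WeatherLookupDialog {
    private final SpeechService speech;

    WeatherLookupDialog(SpeechService speech) { this.speech = speech; }

    void run() {
        speech.say("Which city would you like the weather for?");
        String city = speech.listen("Boston", "New York", "San Francisco");
        speech.say("Getting the weather for " + city + ".");
        // Keywords heard here (for example the city name) could also be passed
        // to the user profiler to learn about the user's interests.
    }
}
```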

Multi User Applications

Finally, there is the chat application, which is a multi-user audio application. Multiple users access the application, and parts of the conversation can be archived for future reference. Alternatively, a user might need to switch to another application during a chat session, and later refer to the archived chat contents to catch up with the current conversation. In addition, multiple users could tune into an audio source together to have a discussion while listening to the source at the same time.
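
The following sketch shows what catching up on a missed stretch of a chat might look like from the client's side. The ChatService and AudioClip types and their methods are hypothetical placeholders; the proposal does not define this interface.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical client-side view of the chat service's archival capability.
interface ChatService {
    void join(String sessionId, String userId);
    void leave(String sessionId, String userId);
    // Audio archived since a given time, e.g. the moment the user stepped away.
    List<AudioClip> archivedSince(String sessionId, Instant since);
}

class AudioClip {
    final String speaker;
    final byte[] samples;
    AudioClip(String speaker, byte[] samples) { this.speaker = speaker; this.samples = samples; }
}

class ChatCatchUp {
    // On rejoining, fetch and queue the audio recorded during the user's absence.
    static void rejoinAndCatchUp(ChatService chat, String sessionId, String user, Instant leftAt) {
        chat.join(sessionId, user);
        for (AudioClip clip : chat.archivedSince(sessionId, leftAt)) {
            // In the real client this would be handed to the audio playback queue,
            // so the user can review what was missed before joining the live talk.
            System.out.println("Queueing " + clip.samples.length + " bytes from " + clip.speaker);
        }
    }
}
```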

CLIENT AND USER INTERFACE


Hardware

The user will use a wireless device, probably through a headset that includes a microphone. The client system will be based on the Intel StrongARM[11] board, which is the same hardware platform as the Compaq iPAQs[12] available in the market. The device will have an LCD screen to support a limited visual interface, and it will support wireless LAN to simulate high bandwidth wireless connectivity.

Managing Multiple Applications

The major challenge lies in managing multiple applications and presenting them coherently so that users can easily get a sense of what is going on in their audio space. Distinctive alerts are required for each application, and these alerts will be customized by the application creator or mapped according to user preferences. The audio interface will be modeled after Sawhney's work on audio interfaces and contextual alerting[13, 14]. However, I will utilize the screen to give appropriate visual feedback indicating which applications are active or in the background. The screen is currently a touch screen, but a jog-dial type of control, as on current radios, would minimize the need for visual focus when switching between applications.

Awareness User Interface

To provide awareness to trusted friends and family members about one's communication status, audio can be used to publish presence information. The always-on connection of the Internet implies that a user's device will always be monitoring one's audio environment. The user can choose to let others know about one's status by allowing the device to send out garbled audio to peers[15]. Any peer who wants to talk to the user will be able to listen to this audio before making a decision to disturb the user. As this scenario suggests, the flexibility of the IP network introduces various forms of call control, where the call setup decision is made not only by the receiving party, but also by the network services (by consulting a user profile or presence service) or by the calling party through the use of the awareness information. When calls are diverted by the system, an auditory or visual cue will be used to notify the user about the event.

Real Time and Non Real Time

In presenting audio streams, an audio connection can change from non real time to real time and vice versa depending on the need for the user to respond. Call screening is an application where the audio connection changes from non real time to real time during communication. Initially, when a caller is routed to voice mail, non real time audio can be simultaneously transmitted to the user to screen the call. At this point, the user can decide that it is important to talk to the person and choose to get connected. As a result, the audio that was being forwarded through voice mail is routed directly to the user and the connection changes to a real time connection.

Background Chat

Finally, utilizing the archival capabilities of the system, a user can retrieve audio data from the past. This would be useful in cases where a user has to step away from a chat session because of an urgent call from his wife. When the user returns and engages back in the chat, he can catch up on what he missed because the chat session was recorded during his absence. Users can also easily share these clips of audio with other users through the system.
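
As an illustration of the call screening hand-off described in the Real Time and Non Real Time subsection above, the sketch below models the transition from one-way screening audio to a real time connection. The states and method names are my own assumptions for illustration.

```java
// Hypothetical state machine for call screening: the caller's audio first flows to
// voice mail and is mirrored to the user as a non real time stream; if the user
// accepts, the stream is promoted to a real time, two-way connection.
public class ScreenedCall {
    enum State { SCREENING, CONNECTED, DECLINED }

    private State state = State.SCREENING;

    // Invoked for each audio frame while the caller is leaving a voice mail message.
    void onVoiceMailAudio(byte[] frame) {
        if (state == State.SCREENING) {
            forwardAsBackgroundAudio(frame); // the user listens in, one-way
        }
    }

    // The user decides the call is important and chooses to connect.
    void accept() {
        state = State.CONNECTED;
        openRealTimeChannel(); // from here on, audio is routed directly, two-way
    }

    void decline() {
        state = State.DECLINED; // the caller simply finishes leaving voice mail
    }

    private void forwardAsBackgroundAudio(byte[] frame) { /* send over the screening channel */ }
    private void openRealTimeChannel() { /* negotiate a direct audio connection */ }
}
```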

SYSTEM ARCHITECTURE AND ITS REQUIREMENTS


Requirements

The applications and the user interface impose the following requirements on the system architecture:

- The system must support the different types of audio applications described in the Applications section.
- The system must support audio conferencing.
- The service components must have a distributed architecture to support different services and applications (extensibility) and to reduce the processing requirements on the client.
- The system must support publishing and subscribing to presence information.
- The system must profile a user.
- The system must be able to archive audio and make it easily accessible to the user.
- The system must support interactive applications that require speech recognition and text to speech.

Successful implementation of the system, showing that it supports the user interfaces and applications described above, will validate my architecture.

Figure 1: System Architecture Diagram

Component Descriptions

The application manager manages multiple audio applications, multiple peer connections, and the communication between the client and the services listed above. Essentially, it is the communication hub for the services, the applications and the client.

The presence service maintains users' availability status and the permissions that determine which peers may access audio awareness information; this information is used by the application manager during call control. The application manager decides whether an application or peer can be connected to the user depending on the information from the presence service. Peers can also subscribe to and get notified about a user's presence.

The chat service manages audio conferencing sessions. It will maintain information about currently active chat sessions and the users involved in them. Users will interact with the chat service to set up, invite, join and leave chat sessions.

The user profiler profiles a user's activity through the client device by monitoring his application usage. It also maintains the priority of applications and peers according to their activity. This information is also used for controlling incoming calls by preventing certain peers or applications from interrupting when one is busy. The service will be designed with extensibility in mind so that, in the future, more detailed user profiling can easily be implemented and used with other services. Users should also be able to set their profiles themselves.

The speech service is the component that will handle speech recognition and text to speech. It will also be combined with audio storage so that, in the future, stored audio can be processed by the speech recognizer to gather data for profiling purposes and to deliver voice messages in text format.

Finally, the lookup service will maintain the currently active clients and applications so that they can find each other. It is similar to the domain name service and will hopefully be replaced by existing naming directory services.
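
Since the system will be built in Java using RMI (see Deliverables), the service contracts could look roughly like the sketch below. The interface and method names are my own guesses at what the components described above would expose; they are not defined in the proposal.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

// Hypothetical RMI contract for the presence service described above.
interface PresenceService extends Remote {
    void publish(String userId, String status) throws RemoteException;        // "available", "busy", ...
    String query(String userId) throws RemoteException;                       // consulted during call control
    void subscribe(String watcherId, String userId) throws RemoteException;   // notify watcher of changes
    boolean mayAccessAwarenessAudio(String peerId, String userId) throws RemoteException;
}

// Hypothetical RMI contract for the lookup service: who is currently active, and where.
interface LookupService extends Remote {
    void register(String name, String address) throws RemoteException;
    String resolve(String name) throws RemoteException;
    List<String> activeClients() throws RemoteException;
}
```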

PLAN AND EVALUATION


Deliverables

Building a system that implements the proposed architecture is the first step in this project. However, due to time constraints, only a limited subset of the system will be implemented. The system will be programmed in Java, using RMI, its built-in support for distributed applications[16]. I will begin with the development of the application manager, which is required for the other components to work together. The presence service is the next most important element and will be implemented in order to support the awareness user interface and call control based on availability. The chat service will be next in priority, to support audio conferencing. Then, as time permits, the speech service and the profiler service will be explored. The system's success will be evaluated by its ability to support at least three applications: a baby monitor, a chat application and a timely news delivery application.
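
The sketch below shows how a service might be exported and found through the RMI registry, using a deliberately simplified presence interface. The class names, registry name and host are placeholders chosen for illustration.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical RMI wiring: a trivial in-memory presence service is exported and
// registered under a well-known name so other components can look it up.
interface SimplePresence extends Remote {
    void publish(String userId, String status) throws RemoteException;
    String query(String userId) throws RemoteException;
}

class InMemoryPresence implements SimplePresence {
    private final Map<String, String> status = new ConcurrentHashMap<>();
    public void publish(String userId, String s) { status.put(userId, s); }
    public String query(String userId) { return status.getOrDefault(userId, "unknown"); }
}

public class ServiceBootstrap {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.createRegistry(1099);   // default RMI registry port
        SimplePresence stub =
                (SimplePresence) UnicastRemoteObject.exportObject(new InMemoryPresence(), 0);
        registry.rebind("PresenceService", stub);
        System.out.println("Presence service registered; waiting for remote calls.");

        // A remote component (e.g. the application manager) would then do:
        // Registry r = LocateRegistry.getRegistry("services-host");   // placeholder host
        // SimplePresence p = (SimplePresence) r.lookup("PresenceService");
        // String availability = p.query("kwan");
    }
}
```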


In parallel with the backend service development, I will be building the client software on the Intel StrongARM device, which supports full duplex audio and wireless Ethernet (IEEE 802.11). The device will be running Linux, and a client application that enables an always-on audio connection will be implemented. It will support the different audio interfaces described above.

Timeline

Dec 2000: Requirements for the system must be refined and made clear. Design goals must be clear in order to come up with a good implementation plan. Two to three UROPs are going to be hired in order to aid in the development process.

Jan 2001: Complete the application manager. Peer-to-peer audio communication between clients operational. Simple baby monitor built and tested to work together with telephony functions. Start writing thesis.

Feb: Refine client user interface. Make presence service operational. Integrate audio applications and callers to interact with the presence service for call control.

Mar: Make chat service operational. Refine client user interface based on feedback from users. Demo, work on thesis draft and review with readers.

Apr: Explore user profiling and speech service if progressing on schedule. System must be robust (run at least 24 hours without breaking). Demo, work on thesis draft and review with readers.

May: Thesis final draft, review with readers.

REFERENCES
[1] International Telecommunication Union, "IP Telephony Workshop," IPTEL/03, May 29, 2000.
[2] P. E. Jones, "H.323 Information Site," 2000. See http://www.packetizer.com/iptel/h323/
[3] H. Schulzrinne and J. Rosenberg, "Internet Telephony: Architecture and Protocols -- an IETF Perspective," Computer Networks, vol. 31, 1999, pp. 237-255.
[4] R. Bennett and J. Rosenberg, "Integrating Presence with Multi-Media Communications," dynamicsoft, Inc., 2000.
[5] WAP Forum, "WAP Forum," 2000. See http://www.wapforum.org/
[6] NTT DoCoMo, "All About i-mode," 2000. See http://www.nttdocomo.com/i/index.html
[7] World Wide Web Consortium, ""Voice Browser" Activity," 2000. See http://www.w3.org/Voice/
[8] H. J. Wang, B. Raman, C.-N. Chuah, R. Biswas, R. Gummadi, B. Hohlt, X. Hong, E. Kiciman, Z. Mao, J. S. Shih, L. Subramanian, B. Y. Zhao, A. D. Joseph, and R. H. Katz, "ICEBERG: An Internet-core Network Architecture for Integrated Communications," IEEE Personal Communications, Special Issue on IP-based Mobile Telecommunication Networks, vol. 7, 2000, pp. 10-19.
[9] M. L. Dertouzos, "The Oxygen Project: The Future of Computing," Scientific American, 1999.
[10] R. W. DeVaul and S. Pentland, "The Ektara Architecture: The Right Framework for Context-Aware Wearable and Ubiquitous Computing Applications," The Media Laboratory, Massachusetts Institute of Technology, 2000.
[11] Intel, "Handheld/Wireless Applied Computing Platforms," 2000. See http://developer.intel.com/platforms/applied/hhwless/index.htm?iid=ipshomepic+hhwless&
[12] Compaq, "Handhelds Home," 2000. See http://www5.compaq.com/products/handhelds/
[13] N. Sawhney and C. Schmandt, "Speaking and Listening on the Run: Design for Wearable Audio Computing," presented at ISWC: International Symposium on Wearable Computing, 1998.
[14] N. Sawhney and C. Schmandt, "Nomadic Radio: Scaleable and Contextual Notification for Wearable Audio Messaging," presented at ACM SIGCHI, Pittsburgh, Pennsylvania, 1999.
[15] S. Marti, N. Sawhney, and C. Schmandt, "GarblePhone: Auditory Lurking," 1998. See http://www.media.mit.edu/speech/projects/garblephone.html
[16] Sun Microsystems, "Java Remote Method Invocation," 2000. See http://java.sun.com/products/jdk/rmi/index.html


READER BIOS
Mark S. Ackerman

Mark Ackerman is currently at the MIT Laboratory for Computer Science as a Principal Research Scientist in the Oxygen Project. He is on leave from the University of California, Irvine (UCI), where he is an Associate Professor in the Department of Information and Computer Science. He was involved in the initial stages of IMPROMPTU from February 2000 to August 2000, and worked closely with us in brainstorming about applications and designing the architecture for IMPROMPTU.

Research Areas: Computer Supported Cooperative Work (CSCW), Human-Computer Interface, Sociology of Information, Social Analysis of Computing Systems, Sociology of Programming. Particular Topics: Organizational Memory, Collaborative Help, Electronic Social Spaces, Computer-Mediated Communications.

The following is a description of his research interests in his own words: "I primarily investigate the interplay of the social world with our software systems. I'm interested in the two phases of this: how we can incorporate elements of the social world within software systems (such as with computer-supported cooperative work systems), and how systems affect our society and lives in return. This requires a dual emphasis on both the technology and the social structures of its use." [From his website at the Department of Information and Computer Science, University of California, Irvine: http://www.ics.uci.edu/~ackerman/]

