
Mobile Facial Animation

ACKNOWLEDGEMENT

I take this opportunity to thank the Almighty for keeping me on the right path and for the immense blessings towards the successful completion of my Seminar.

I wish to express my sincere gratitude to Smt. Geetha Ranjin, H.O.D., Department of Electronics and Communication Engineering, for her expert guidance, constant encouragement and valuable suggestions for the completion of this Seminar.

I am also grateful to my Staff-in-Charge Mr. Ranjith Ram and Mr. Vinod Kumar, Department of Electronics and Communication Engineering, for always being there to hand out invaluable pieces of advice.

Last of all, I thank all my teachers and friends who extended every possible assistance they could.

ROSHITH P.

Govt. College of Engg., Kannur

2005


ABSTRACT
Three-dimensional facial model coding can be employed in various mobile applications to provide an enhanced user experience. Instead of directly encoding the video using conventional coding techniques such as MPEG-2, a one-time 3D computer model of the caller is transmitted at the beginning of the telephone call. Thereafter, capturing 3D movements and mimicry parameters with the camera is all that is required to continually see and hear a true-to-life, synchronized caller on the display. The 3D models are interchangeable, which means that one person can be displayed on the screen with the movements of another. The technique is suitable for use in conjunction with various mobile networks, from GSM to UMTS. What is less clear, however, is the sensitivity of the 3D-coded data to channel errors.


CONTENTS
Chapter 1.  INTRODUCTION
Chapter 2.  SYSTEM OVERVIEW
Chapter 3.  FACIAL ANIMATION AND SPECIFICATION
            3.1 MPEG-4 standard
            3.2 Face animation parameters
            3.3 Facial animation parameter units
            3.4 Face feature points
            3.5 MPEG-4 facial animation delivery
Chapter 4.  CODING OF FAPs
            4.1 Arithmetic coding of FAPs
            4.2 DCT coding of FAPs
            4.3 Interpolation and extrapolation
Chapter 5.  SYSTEM ARCHITECTURE
Chapter 6.  CHANNEL MODELS FOR FAP
            6.1 GPRS
            6.2 EDGE
            6.3 Results
Chapter 7.  ERRORS IN MOBILE FACIAL ANIMATION
Chapter 8.  APPLICATIONS
            8.1 Embodied agents in spoken dialogue systems
            8.2 Language training with talking heads
            8.3 Synthetic faces as aids in communication
Chapter 9.  CONCLUSION
REFERENCES


CHAPTER 1

INTRODUCTION
Facial animation and virtual human technology in computer graphics has made considerable advances during the past decades and has become a research topic attracting an increasing number of commercial applications, such as mobile platforms, telecommunications, tele-presence via the Internet and digital entertainment. A number of mobile applications may benefit from the enhancement that 3-D video can bring, including message services and e-commerce. Despite the possible advantages of such technologies, the effect of the mobile link on 3-D video was not considered in the design of its syntax. Another issue is delivering the coded bit stream over the wireless network; the required bandwidth should be as narrow as possible. MPEG-4 is the first international standard for real-time multimedia communication that covers natural and synthetic audio, video and 3D graphics. Face models are defined within the BIFS framework of MPEG-4, and within BIFS, FAP coding provides a low bit rate representation of face models. The possible channel models for delivering such services are GPRS and EDGE; however, the coded data is sensitive to channel errors, and this sensitivity must be considered. The next chapters give an overview of the relevant parts of FAP technology, the coding of FAPs, and the different mobile network technologies. This is followed by results obtained when FAP data is delivered over GPRS and EDGE channels, together with a comparison of their channel errors. Other noticeable issues in this technology and the applications of facial animation on mobile terminals are also discussed.


CHAPTER 2

SYSTEM OVERVIEW

Figure 2.1 System overview

Mobile facial animation can be described using the above block diagram. Using a projection camera, the 3D input surfaces or facial models are produced. Using a facial animation technique, the movements of the face are tracked. An MPEG-4 FAP encoder encodes the high-resolution facial models, and the resulting data stream is transmitted over the wireless network. GPRS and EDGE channel models are preferred here because of their data rates and bandwidth, with EDGE offering the higher data rates. At the receiver, the data stream is received using the same protocol stack as at the transmitter, but in the inverse order. It is then decoded using an MPEG-4 FAP decoder, and the face model is reconstructed.
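To make this flow concrete, the sketch below mimics the transmit/receive chain in plain Python. The camera, tracker, codec and channel are stand-in stubs invented for the illustration; none of them are real MPEG-4 APIs.

```python
import random

# Illustrative stand-ins: a real system would use a camera, a face tracker
# and an MPEG-4 FAP codec here. These stubs only mimic the data flow.

def track_face(frame_no):
    """Pretend to extract a few FAP values (normalized displacements) per frame."""
    random.seed(frame_no)
    return [round(random.uniform(-1.0, 1.0), 3) for _ in range(5)]

def fap_encode(faps):
    """Stub encoder: serialize FAPs to bytes (a real coder uses arithmetic/DCT)."""
    return ",".join(str(v) for v in faps).encode()

def fap_decode(payload):
    return [float(v) for v in payload.decode().split(",")]

def main():
    channel = []                       # stands in for the GPRS/EDGE link
    channel.append(b"3D-MODEL")        # one-time 3D model, sent at call setup

    for frame_no in range(3):          # transmitter: track, encode, send
        channel.append(fap_encode(track_face(frame_no)))

    model = channel.pop(0)             # receiver: inverse of the transmit stack
    print("received", model.decode())
    for payload in channel:
        print("animate model with FAPs:", fap_decode(payload))

if __name__ == "__main__":
    main()
```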


CHAPTER 3

FACIAL ANIMATION AND SPECIFICATION


3.1 MPEG-4 STANDARD
The MPEG-4 Systems standard:
1) Contains a method of representing and encoding 3-D scenes, called the Binary Format for Scenes (BIFS).
2) Is based on the Virtual Reality Modeling Language (VRML), which specifies a language for describing 3D scenes.
3) Provides, through BIFS, a method for compressing VRML-type data and animating the 3-D objects in a scene.
4) Within BIFS, provides Facial Animation Parameter (FAP) encoding.
5) FAP encoding specifically allows the representation, animation, and binary encoding of facial models. This can be performed using techniques of varying complexity.

Like the VRML standard, MPEG-4 BIFS describes 3-D scenes using a series of nodes. Nodes can describe various scene aspects including object shape, rotation, and translation. They can also contain other nodes; it is common for scenes to contain hierarchical node trees. Although scenes are described using VRML-type structures, BIFS includes a number of features that are not present in VRML:
- Data streaming
- Scene updates
- Compression of scene data

A combination of the first two features allows elements within scenes to be animated. BIFS also allows scenes to be displayed as the data arrives at the client, while VRML requires the whole scene to be downloaded before anything is shown.
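The hierarchical node trees that BIFS and VRML use can be pictured with a few lines of Python. The node fields below are illustrative only and do not reproduce the actual BIFS node definitions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy scene node: a name, a transform, and child nodes (illustrative only)."""
    name: str
    translation: tuple = (0.0, 0.0, 0.0)
    rotation: tuple = (0.0, 0.0, 0.0)
    children: list = field(default_factory=list)

def print_tree(node, depth=0):
    print("  " * depth + f"{node.name} t={node.translation}")
    for child in node.children:
        print_tree(child, depth + 1)

# A small hierarchical scene: moving 'head' also moves 'eyes' and 'mouth'.
scene = Node("scene", children=[
    Node("head", translation=(0.0, 1.6, 0.0), children=[
        Node("left_eye", translation=(-0.03, 0.05, 0.08)),
        Node("right_eye", translation=(0.03, 0.05, 0.08)),
        Node("mouth", translation=(0.0, -0.06, 0.09)),
    ]),
])
print_tree(scene)
```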

3.2 FACE ANIMATION PARAMETERS



In an effort to standardize face model parameterization, originally for the purposes of efficient model-based coding of moving images, the MPEG consortium developed the MPEG-4 facial animation standard. This standard defines 68 facial animation parameters (FAPs) and 84 facial feature points. The facial feature points are well-defined landmark points on the human face. Face Animation Parameters (FAPs) have been designed to be independent of any particular facial model. In other words, essential facial gestures and visual speech derived from a particular performer will produce good results on other faces unknown at the time the encoding takes place. The 68 parameters are categorized into 10 groups related to parts of the face (Table 3.1). FAPs represent a complete set of basic facial actions including head motion and tongue, eye, and mouth control. They allow the representation of natural facial expressions, and can also be used to define facial action units. Exaggerated values permit the definition of actions that are normally not possible for humans, but are desirable for cartoon-like characters. The FAP set contains two high-level FAPs and 66 low-level FAPs. The high-level FAPs are visemes and expressions (FAP group 1). A viseme is a visual correlate of a phoneme. Only 14 static visemes that are clearly distinguishable are included in the standard set. In order to allow for coarticulation of speech and mouth movement, transitions from one viseme to the next are defined by blending the two visemes with a weighting factor. Similarly, the expression parameter defines 6 high-level facial expressions such as joy and sadness (Figure 3.1). In contrast to visemes, facial expressions are animated with a value defining the excitation of the expression, and two facial expressions can be blended with a weighting factor. Since expressions are high-level animation parameters, they allow unknown models to be animated with high subjective quality.


Figure 3.1 Facial Expressions

Table 3.1: FAP groups

Group                                              Number of FAPs
1: Visemes and expressions                         2
2: Jaw, chin, inner lowerlip, cornerlips, midlip   16
3: Eyeballs, pupils, eyelids                       12
4: Eyebrow                                         8
5: Cheeks                                          4
6: Tongue                                          5
7: Head rotation                                   3
8: Outer lip positions                             10
9: Nose                                            4
10: Ears                                           4
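The viseme and expression blending described above amounts to a weighted mix of two parameter sets. A minimal sketch follows, with invented displacement values that are not taken from the standard:

```python
def blend(params_a, params_b, weight):
    """Blend two FAP-like parameter sets: weight for A, (1 - weight) for B."""
    return {k: weight * params_a[k] + (1.0 - weight) * params_b[k]
            for k in params_a}

# Invented low-level displacements for two visemes (FAPU-normalized units).
viseme_p = {"open_jaw": 0.10, "lower_lip": 0.05, "stretch_cornerlip": 0.00}
viseme_a = {"open_jaw": 0.60, "lower_lip": 0.30, "stretch_cornerlip": 0.10}

# Transition from /p/ to /a/: sample the blend at a few weights.
for w in (1.0, 0.5, 0.0):
    print(f"w={w:.1f} ->", blend(viseme_p, viseme_a, w))
```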


3.3 FACIAL ANIMATION PARAMETER UNITS


The MPEG-4 standard uses the set of parameters (FAPs) already explained. As noted, FAPs are defined independently of a face model. This is accomplished by defining each parameter in a normalized space referred to as FAP Units, or FAPUs. In a system, FAPUs are computed by measuring the distances between key feature points on the neutral high-resolution model. The figure below shows these key standard measurements: the Eye Separation (ES), Iris Diameter (IRISD), Eye Nose Separation (ENS), Mouth Nose Separation (MNS), and the Mouth Width (MW). FAPUs derived from these measurements can be scaled and adjusted to produce a "visual volume" best suited for the target face.

Figure 3.3.1 MPEG-4 FAPUs measured on a person's face.
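A sketch of how FAPUs could be derived from neutral-face feature points follows. The division of each key distance by 1024 follows the commonly cited MPEG-4 convention (stated here from memory, so treat it as an assumption), and all coordinates are invented:

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Invented neutral-face landmark coordinates (metres, model space).
pts = {
    "left_pupil": (-0.032, 0.05, 0.08), "right_pupil": (0.032, 0.05, 0.08),
    "nose_tip": (0.0, 0.0, 0.10), "eye_midpoint": (0.0, 0.05, 0.08),
    "mouth_mid": (0.0, -0.035, 0.09),
    "left_mouth_corner": (-0.026, -0.035, 0.085),
    "right_mouth_corner": (0.026, -0.035, 0.085),
    "iris_top": (0.032, 0.056, 0.08), "iris_bottom": (0.032, 0.044, 0.08),
}

# Key distances, each divided by 1024 to form the FAPU (assumed convention).
fapu = {
    "ES0":    dist(pts["left_pupil"], pts["right_pupil"]) / 1024,   # eye separation
    "ENS0":   dist(pts["eye_midpoint"], pts["nose_tip"]) / 1024,    # eye-nose sep.
    "MNS0":   dist(pts["nose_tip"], pts["mouth_mid"]) / 1024,       # mouth-nose sep.
    "MW0":    dist(pts["left_mouth_corner"], pts["right_mouth_corner"]) / 1024,
    "IRISD0": dist(pts["iris_top"], pts["iris_bottom"]) / 1024,     # iris diameter
}
for name, value in fapu.items():
    print(f"{name} = {value:.6e}")
```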


3.4 FACE FEATURE POINTS


An MPEG-4 compliant face model is a 3D mesh that includes 84 well-defined feature points (Figure 3.4.1). The 68 Facial Animation Parameters describe the displacements of the feature points, together with global head rotations around the x, y and z axes. It is up to the animation player to reconstruct realistic movements for the remaining points of the face model. The FAPs are normalized (in FAPUs) so that the same stream can be used to animate different face models.

Figure 3.4.1 An MPEG-4 compliant face model, the dots representing the 84 feature points (left). An example of varying FAPs 4, 5, 6 and 12 to describe mouth opening (right).

Some feature points, like the ones along the hairline, are not affected by FAPs. They are nevertheless required for defining the shape of a proprietary face model using feature points. Feature points are arranged in groups such as cheeks, eyes, and mouth (Table 3.1). The location of these feature points has to be known for any MPEG-4 compliant face model.
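Given a FAPU, a decoded FAP value moves its feature point by value × FAPU along the axis the standard assigns to that FAP. A toy illustration follows; the axis choice and all numbers are invented for the example:

```python
# Toy illustration: displace a feature point by fap_value * fapu along one axis.

def apply_fap(point, fap_value, fapu, axis):
    """Return the feature point moved by fap_value * fapu along axis 0/1/2."""
    moved = list(point)
    moved[axis] += fap_value * fapu
    return tuple(moved)

neutral_midlip = (0.0, -0.030, 0.09)     # neutral position of a lip point
mns0 = 3.4e-5                            # mouth-nose separation FAPU (invented)

# A jaw-opening-style FAP expressed in FAPUs: the same integer value produces
# proportionally the same motion on any conformant model (negative = downward).
for fap_value in (0, 200, 500):
    print(fap_value, "->", apply_fap(neutral_midlip, -fap_value, mns0, axis=1))
```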

Figure 3.4.2 Facial Feature Points

3.5 MPEG-4 FACIAL ANIMATION DELIVERY



This discussion aims at analyzing the issues in the transmission of MPEG-4 compliant facial animation streams over lossy packet networks, such as wireless LANs, the Internet, or third generation mobile networks. Many web-based applications now exploit three-dimensional, animated virtual characters to enrich their user interface. However, the HTTP/TCP protocols used for animation transport in the majority of existing systems fail to guarantee fast interaction over a wide range of network conditions. The use of unreliable, connectionless transport protocols, such as the Real-time Transport Protocol (RTP) over UDP, for the delivery of multimedia content has been proposed in order to reduce end-to-end latency and improve robustness against network congestion. The MPEG-4 standard allows for the encoding and representation of a wide range of natural and synthetic audio and video sources. A major difference from previous multimedia standards lies in its object-based approach, in which a scene is composed of several AudioVisual Objects, each of them represented by an elementary bit stream. One such object is the Face Object, a three-dimensional face model (either human- or cartoon-like) that may be animated by a set of Facial Animation Parameters (FAPs). Because of the complexity of implementing a complete MPEG-4 Systems architecture, a common approach in web-based applications is to carry a single elementary stream directly over the lightweight RTP protocol. Face models require very low bit rates, and a model-based, variable-length, predictive encoding is used: MPEG-4 employs highly efficient arithmetic and DCT (Discrete Cosine Transform) coding algorithms to reduce temporal redundancy in FAP streams. Bit rates as low as 2 Kbps can be achieved; thus, the frame size becomes comparable with the size of the RTP/UDP/IP headers. The use of these algorithms also means that the loss or late arrival of a single packet may destroy a significant amount of information, hence requiring the use of error resilience and/or concealment techniques. Finally, the specific bit stream syntax often requires a significant amount of look-ahead in the decoding process: if a packet is lost or corrupt, the decoding process is interrupted up to the next reference frame. The following chapters investigate these effects on bandwidth and error robustness.
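To make the header-overhead point concrete, the sketch below wraps a small FAP payload in a standard 12-byte RTP header (per RFC 3550). The payload type value and payload layout are assumptions for the example, not the real MPEG-4 FAP payload format:

```python
import struct

def rtp_packet(payload, seq, timestamp, ssrc=0x1234, payload_type=96):
    """Build a minimal 12-byte RTP header (RFC 3550) followed by the payload.

    payload_type 96 is just a dynamic type chosen for the example; the real
    MPEG-4 FAP payload format is not reproduced here.
    """
    v_p_x_cc = 2 << 6                      # version 2, no padding/extension/CSRC
    m_pt = payload_type & 0x7F             # marker bit 0
    header = struct.pack("!BBHII", v_p_x_cc, m_pt, seq, timestamp, ssrc)
    return header + payload

fap_frame = b"\x12\x34\x56"                # pretend 3-byte compressed FAP frame
pkt = rtp_packet(fap_frame, seq=1, timestamp=3000)
print(f"payload {len(fap_frame)} B, packet {len(pkt)} B "
      f"(header overhead {len(pkt) - len(fap_frame)} B, "
      f"plus 28 B of UDP/IPv4 headers on the wire)")
```

At 2 Kbps and roughly 11 frames per second, a compressed FAP frame is only about 20 bytes, so the 40 bytes of RTP/UDP/IPv4 headers can easily exceed the payload itself.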


CHAPTER 4

CODING OF FAPs

One key issue in making use of FAP technology is how FAP parameters are obtained ready for encoding. The standard MPEG-4 FAP encoder software uses text-based FAP files as input. These text-based files contain various parameters specifying how the face moves, from which the binary encoded FAP stream is produced. Generation of these FAP files may be achieved in two ways: manually, or automatically by employing image processing algorithms. A number of image processing techniques have been proposed which are capable of identifying and tracking facial features. For coding facial animation parameters, MPEG-4 provides two tools. Coding of quantized and temporally predicted FAPs using an arithmetic coder allows FAPs to be coded with only a small delay. Using a discrete cosine transform (DCT) to encode a sequence of FAPs introduces significant delay but achieves higher coding efficiency.

4.1 ARITHMETIC CODING OF FAPs


The figure below shows the block diagram for encoding FAPs. The first set of FAP values, FAP(i)0 at time instant 0, is coded in intra mode. Thereafter, the value of an FAP at time instant k, FAP(i)k, is predicted using the previously decoded value FAP(i)k-1. The prediction error e is quantized using a quantization step size that is specified for each FAP, multiplied by a quantization parameter FAP_QUANT. FAP_QUANT is identical for all FAP values of one time instant k. Using the FAP-dependent quantization step size together with FAP_QUANT ensures that quantization errors are subjectively evenly distributed between the different FAPs. The quantized prediction error e is arithmetically encoded using a separate adaptive probability model for each FAP. Since the encoding of the current FAP value depends only on one previously coded FAP value, this coding scheme allows for low-delay communications. At the decoder, the received data is arithmetically decoded, dequantized and added to the previously decoded value in order to recover the encoded FAP value.
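Leaving out the adaptive arithmetic coder itself, the prediction/quantization loop just described can be sketched as below. The step sizes and trajectory are invented; note that the encoder mirrors the decoder's reconstruction so the two predictions never drift apart:

```python
def encode_fap_track(values, base_step, fap_quant):
    """Predictively quantize one FAP track; returns symbols + reconstruction."""
    step = base_step * fap_quant          # FAP-specific step scaled by FAP_QUANT
    symbols, decoded = [], []
    prev = 0.0                            # instant 0 predicted from zero here,
                                          # standing in for the intra-coded start
    for v in values:
        e = v - prev                      # prediction error vs. last DECODED value
        q = round(e / step)               # uniform quantization
        symbols.append(q)                 # these integers would feed the
                                          # adaptive arithmetic coder
        prev = prev + q * step            # decoder-side reconstruction
        decoded.append(prev)
    return symbols, decoded

track = [0.0, 0.8, 1.5, 1.9, 1.7, 1.0]   # invented FAP trajectory
symbols, decoded = encode_fap_track(track, base_step=0.05, fap_quant=2)
print("symbols:", symbols)
print("decoded:", [round(d, 2) for d in decoded])
```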


In order to avoid transmitting all FAPs for every frame, the encoder can transmit a mask indicating for which groups FAP values are transmitted. The encoder can also specify for which FAPs within a group values will be transmitted. This allows the encoder to send incomplete sets of FAPs to the decoder.

Figure 4.1.1 Block diagram of the encoder using arithmetic coding for FAPs.
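A simplified picture of the masking idea follows: one bit per FAP within a group, with values carried only for the set bits. The real bitstream syntax is more involved; this is just a sketch.

```python
def pack_masked(values_by_index, group_size):
    """Build (bitmask, value list) so only the FAPs actually sent are carried."""
    mask = 0
    values = []
    for index in sorted(values_by_index):
        assert 0 <= index < group_size
        mask |= 1 << index                # bit i set => FAP i present
        values.append(values_by_index[index])
    return mask, values

def unpack_masked(mask, values, group_size):
    out, it = {}, iter(values)
    for index in range(group_size):
        if mask & (1 << index):
            out[index] = next(it)
    return out

# Send only 3 of the 16 FAPs in group 2 (jaw/chin/lips); indices invented.
mask, values = pack_masked({0: 0.4, 5: -0.1, 12: 0.2}, group_size=16)
print(f"mask={mask:016b}", "->", unpack_masked(mask, values, 16))
```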

4.2 DCT CODING OF FAPs


The second coding tool provided for coding FAPs is the discrete cosine transform, applied to 16 consecutive FAP values (Figure 4.2.1). This introduces a significant delay into the coding and decoding process. Hence, this coding method is mainly useful for applications where animation parameter streams are retrieved from a database. This coder replaces the coder used for arithmetic coding. After computing the DCT of 16 consecutive values of one FAP, DC and AC coefficients are coded differently. Whereas the DC value is coded predictively, using the previous DC coefficient as the prediction, the AC coefficients are coded directly. The AC coefficients and the prediction error of the DC coefficient are linearly quantized. The quantizer step size can be controlled, but the ratio between the quantizer step size of the DC coefficients and that of the AC coefficients is fixed. The quantized AC coefficients are encoded with one variable length code word (VLC) defining the number of zero coefficients prior to the next non-zero coefficient, and one VLC for the amplitude of this non-zero coefficient. The handling of the decoded FAPs is otherwise unchanged.


Figure 4.2.1 Block diagram of the FAP encoder using DCT. DC coefficients are predictively coded. AC coefficients are directly coded.
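A rough sketch of this DCT path follows. The step sizes and signal are invented, and the run-length VLC stage is only noted in a comment; the point is to show the 16-sample segmenting, the predictive DC coding, and the direct AC quantization:

```python
import math

N = 16  # the standard groups 16 consecutive values of one FAP per segment

def dct(x):        # DCT-II
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def idct(X):       # inverse of the DCT-II above
    return [X[0] / N + (2.0 / N) * sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                                       for k in range(1, N))
            for n in range(N)]

def code_segment(x, prev_dc, dc_step, ac_step):
    """Quantize one 16-sample segment: DC predictively, ACs directly."""
    X = dct(x)
    dc_sym = round((X[0] - prev_dc) / dc_step)      # DC prediction error
    ac_syms = [round(c / ac_step) for c in X[1:]]   # most become zero, which is
                                                    # what the run-length VLC exploits
    dc_rec = prev_dc + dc_sym * dc_step
    rec = idct([dc_rec] + [s * ac_step for s in ac_syms])
    return dc_sym, ac_syms, dc_rec, rec

# Invented smooth FAP trajectory over 16 frames.
x = [math.sin(2 * math.pi * n / N) * 0.5 + 1.0 for n in range(N)]
dc_sym, ac_syms, dc_rec, rec = code_segment(x, prev_dc=0.0, dc_step=0.2, ac_step=0.4)
print("nonzero AC symbols:", sum(1 for s in ac_syms if s))
print("max reconstruction error:", round(max(abs(a - b) for a, b in zip(x, rec)), 3))
```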

4.3 INTERPOLATION AND EXTRAPOLATION


The encoder may allow the decoder to extrapolate the values of some FAPs from the transmitted FAPs. Alternatively, the encoder can specify the interpolation rules using FAP interpolation tables (FIT). A FIT allows a smaller set of FAPs to be sent during a facial animation. This small set can then be used to determine the values of other FAPs, using a rational polynomial mapping between parameters. For example, the top inner lip FAPs can be sent and then used to determine the top outer lip FAPs; the inner lip FAPs would be mapped to the outer lip FAPs using a rational polynomial function that is specified in the FIT. The decoder can also extrapolate values of unspecified FAPs in order to create a more complete set of FAPs. The standard is vague in specifying how the decoder is supposed to extrapolate FAP values. Examples are that if only FAPs for the left half of a face are transmitted, the corresponding FAPs of the right side have to be set such that the face moves symmetrically, and that if the encoder only specifies motion of the inner lip (FAP group 2), the motion of the outer lip (FAP group 8) has to be extrapolated. Letting the decoder extrapolate FAP values may create unexpected results unless FAP interpolation functions are defined.
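A FIT mapping can be pictured as evaluating a rational polynomial of the transmitted FAP. The toy rule below uses invented coefficients, not values from any real FIT:

```python
def rational_poly(x, num_coeffs, den_coeffs):
    """Evaluate P(x)/Q(x) with polynomial coefficients in ascending order."""
    p = sum(c * x ** i for i, c in enumerate(num_coeffs))
    q = sum(c * x ** i for i, c in enumerate(den_coeffs))
    return p / q

# Invented FIT rule: the outer-lip FAP follows the inner lip, saturating
# slightly for large openings: outer = 0.9*inner / (1 + 0.05*inner).
def outer_from_inner(inner):
    return rational_poly(inner, num_coeffs=[0.0, 0.9], den_coeffs=[1.0, 0.05])

for inner in (0.0, 1.0, 4.0, 8.0):
    print(f"inner={inner:4.1f}  ->  outer={outer_from_inner(inner):5.2f}")
```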


CHAPTER 5


SYSTEM ARCHITECTURE
On the transmitting machine (TX), an uncompressed FAP file is encoded in real time. A dedicated hardware motion capture system could also serve as the FAP source. The transmitter is responsible for applying the desired encoding parameters and implementing the packetization policy.

Figure 5.1 System Architecture

The HTTP/TCP protocols used for animation transport in the majority of existing systems fail to guarantee fast interaction over a wide range of network conditions. The use of unreliable, connectionless transport protocols, such as the Real-time Transport Protocol (RTP) over UDP, for the delivery of multimedia content has been proposed in order to reduce end-to-end latency and improve robustness against network congestion. On the receiving terminal (RX), a network buffer is used to compensate for jitter and out-of-order arrival of packets. As soon as a sufficient quantity of packets is received (typically 12 packets, or about 1 second; the exact number may be adjusted to fit network conditions), the decoder starts processing the received stream. After reassembling the bit stream, the receiver has to detect and hide network errors from the animation player. As soon as an error or packet loss is detected, the decoder starts a search for the next reference frame in the bit stream. When this is reached, the decoding process can restart. Another issue is the generation of 3-D face models that resemble the speaker. FAP data can be decoded and applied either to default facial models on the end user's terminal, or to models downloaded for a particular session that more accurately represent the speaker. Various methods have been proposed to produce 3-D models of human faces from camera images.
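A toy model of this receiver behaviour is sketched below: buffer a few packets, reorder them, decode, and on a detected sequence gap freeze until the next intra (reference) frame. The frame format and buffering threshold are invented for the example (the text above suggests about 12 packets in practice):

```python
# Toy receiver: buffer packets, then decode; on a sequence-number gap,
# skip forward to the next intra (reference) frame. Frame format invented.

START_THRESHOLD = 4            # packets to buffer before decoding starts

def receive(packets):
    buffer, decoding, expected_seq, synced = [], False, 0, True
    for pkt in sorted(packets, key=lambda p: p["seq"]):   # reorder out-of-order pkts
        buffer.append(pkt)
        if not decoding and len(buffer) < START_THRESHOLD:
            continue
        decoding = True
        while buffer:
            frame = buffer.pop(0)
            if frame["seq"] != expected_seq:              # loss detected
                synced = False
            expected_seq = frame["seq"] + 1
            if frame["type"] == "I":                      # reference frame: resync
                synced = True
            if synced:
                print(f"decode {frame['type']}-frame {frame['seq']}")
            else:
                print(f"freeze  (lost data before {frame['seq']}, waiting for I)")

# Frames 0..7, I-frame every 4th; frame 2 lost, frames 5/6 swapped in transit.
frames = [{"seq": s, "type": "I" if s % 4 == 0 else "P"} for s in range(8)]
del frames[2]
frames[4], frames[5] = frames[5], frames[4]
receive(frames)
```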


To find an effective compromise between bandwidth and video quality, the choice of encoding and packetization parameters must take into account the characteristics of the channel on which the animation is transmitted. The channel must therefore provide enough bandwidth.


CHAPTER 6


CHANNEL MODELS FOR FAP

For FAP delivery, the suitable channel models are the GPRS channel and the EDGE (Enhanced Data rates for GSM and TDMA/136 Evolution) channel. The following table gives a relevant comparison of the requirements in different application environments; from it, it is clear that low error rates demand adequate bandwidth, which GPRS and EDGE can provide.

Table 6.1: Comparison of requirements in different application environments

Application environment            Packet loss   IFD   Frames/packet   RTP animation bitrate   Buffering for error concealment
Uni-directional applications       <5%           5-7   2               5 Kbps                  ~500 ms
Interactive applications           <5%           3-5   2               6 Kbps                  ~200 ms
Low delay wireless/mobile networks 10-15%        1-3   2               9 Kbps                  ~120 ms

6.1 GPRS
GPRS is a wireless packet-based network architecture using GSM radio systems. The original design of GPRS was driven by non-real-time requirements. Nevertheless, the adaptive multislot capability of GPRS, which allows for dynamic allocation of timeslots to a given terminal, provides enough bandwidth for the support of a limited set of multimedia-enabled services. Furthermore, the native support of the IP protocol allows simple interfacing of current IP/RTP-based multimedia applications, such as facial animation streaming, to a GPRS network. For the GPRS channel model, the propagation conditions were those specified in GSM 05.05 as TU50 Ideal Frequency Hopping at 900 MHz. The TU50 channel model represents the multi-path propagation conditions found in typical urban environments. Four channel coding schemes are specified for GPRS, three of which were employed here. The frames are convolutionally coded at different rates: in convolutional coding, when v output symbols are produced for each input symbol, the code rate is 1/v; when k input symbols, held in k shift registers, produce v output symbols, the code rate is k/v.


The schemes used for GPRS are labeled CS-1, CS-2 and CS-3, and respectively correspond to convolutional code rates of 1/2, 2/3 and 3/4. The figure below shows the results of GPRS simulations performed using the various channel coding schemes, at a number of C/I ratios. PSNR values above 45 dB generally indicate very infrequent error bursts. Values between 40 and 45 dB indicate more frequent errors, but overall quality is likely to be acceptable to many users. Taking this as a guide, it is clear that acceptable quality is achievable using all of the channel coding schemes tested. However, relatively high C/I ratios are required when using CS-3, making the use of this scheme undesirable.

Figure 6.1.1: PSNR results of FAP transmission over a GPRS channel at 11 frames per second.
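For readers unfamiliar with the code-rate notation, the sketch below implements a rate-1/2 convolutional encoder. The generator polynomials 1+D³+D⁴ and 1+D+D³+D⁴ are the classic GSM pair, quoted here from memory and therefore to be treated as illustrative; the punctured CS-2/CS-3 rates are obtained by deleting some of these output bits rather than by a different encoder.

```python
def conv_encode_half_rate(bits):
    """Rate-1/2 convolutional encoder, constraint length 5.

    Generators (assumed, the classic GSM pair):
      g0 = 1 + D^3 + D^4,   g1 = 1 + D + D^3 + D^4
    Two output bits per input bit => code rate 1/2 (CS-1). CS-2 (~2/3) and
    CS-3 (~3/4) are obtained by puncturing, i.e. not transmitting some bits.
    """
    reg = [0, 0, 0, 0]                     # reg[i] = input bit delayed i+1 steps
    out = []
    for b in bits:
        out.append(b ^ reg[2] ^ reg[3])             # g0 taps: D^0, D^3, D^4
        out.append(b ^ reg[0] ^ reg[2] ^ reg[3])    # g1 taps: D^0, D^1, D^3, D^4
        reg = [b] + reg[:3]                         # shift register
    return out

data = [1, 0, 1, 1, 0, 0, 1, 0]
coded = conv_encode_half_rate(data)
print(f"{len(data)} input bits -> {len(coded)} coded bits "
      f"(rate {len(data)}/{len(coded)} = 1/2)")
```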


6.2 EDGE
Beyond GPRS, EDGE (Enhanced Data rates for GSM Evolution) is a generation-2.5 air interface, which represents a step towards UMTS. It provides higher data rates than GPRS and introduces a new modulation technique called eight-phase-shift keying (8-PSK), which allows much higher bit rates and automatically adapts to radio conditions. EDGE shares its available bandwidth among the users on one carrier in a sector; the rate ranges from several tens of kbps to 384 kbps, depending on various conditions such as propagation, interference, and traffic load. The network chooses a maximum number of retransmissions that may be attempted for each link layer segment. Link adaptation is used in EDGE so that the system can select the most efficient modulation and coding scheme for each mobile based on its current channel condition. EDGE uses 8 different channel coding schemes, some of which are based on convolutional coding with corresponding error correction capabilities. For the EDGE channel model, the propagation conditions were again those specified in GSM 05.05 with ideal frequency hopping; however, for this model the mobile terminal speed was set to 3 km/hr. Eight joint modulation-coding schemes are specified, which make use of two different modulation schemes and various convolutional coding rates. Modulation is either GMSK, as used in GSM and GPRS, or 8-PSK, which gives higher data rates. Two GMSK schemes are used here: MCS-1 and MCS-2, corresponding to convolutional code rates of 0.53 and 0.66. Two 8-PSK schemes are also tested: MCS-5 and MCS-6, corresponding to convolutional code rates of 0.37 and 0.49. The other modulation-coding schemes resulted in the transmitted data being subjected to error rates too high to consider for the transmission of FAPs. Results with the EDGE channel model are shown in the figure below. They show that transmission of FAPs using the 8-PSK modulation scheme is likely to result in unacceptable quality unless the C/I ratio is greater than 18 dB. Even with GMSK modulation, acceptable quality decoding of FAPs may only realistically be possible using MCS-1, unless a C/I ratio greater than 15 dB can be guaranteed. In terms of error rates, EDGE provides a more hostile environment for multimedia than GPRS.


Figure 6.2.1 PSNR results of FAP transmission over an EDGE channel at 11 frames per second.
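The link adaptation idea discussed above can be sketched as a threshold table: pick the most efficient modulation-coding scheme whose C/I requirement is met. The thresholds below are illustrative values loosely based on the results quoted above, not figures from the EDGE specification.

```python
SCHEMES = [                       # (name, modulation, min C/I in dB), best first
    ("MCS-6", "8-PSK", 21.0),
    ("MCS-5", "8-PSK", 18.0),
    ("MCS-2", "GMSK", 15.0),
    ("MCS-1", "GMSK", 9.0),
]

def select_mcs(cir_db):
    """Return the highest-throughput scheme usable at this C/I (toy rule)."""
    for name, modulation, threshold in SCHEMES:
        if cir_db >= threshold:
            return name, modulation
    return None, None             # channel too poor: defer / retransmit

for cir in (7, 12, 16, 20, 24):
    print(f"C/I = {cir:2d} dB -> {select_mcs(cir)}")
```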

6.3 RESULTS
Two kinds of error behaviour were observed in the decoded animations:

1. Freezing of the animation: corrupted data is detected before it is displayed. The display freezes while the decoder searches for the next resync code.

2. Catastrophic display of corrupted data: corrupted data is not detected before it is displayed. This leads to highly obvious, "catastrophic" errors being visible in the decoder display (see figure below).


CHAPTER 7


ERRORS IN MOBILE FACIAL ANIMATION


A barrier to the introduction of FAP technology to mobile devices is computational complexity. This is an issue for both the encoding and decoding terminals. At the decoder, the 3-D model must be reconstructed and rendered. Fortunately, MPEG-4 FAP models are relatively simple compared to many modern 3-D applications, and can be rendered on relatively cheap, low-power hardware. Producing a compressed FAP bit stream from a text-based FAP file consumes very little processing power. However, some of the image processing algorithms required to produce the FAP file parameters are complex. This does not necessarily prohibit the use of FAP encoding in mobile devices: if the application is not real-time, the processing could be performed in the background by the mobile device. Predictive coding means that errors encountered in one P-frame propagate to the following P-frames. This makes the regular insertion of I-frames vital for combating the effects of channel errors. Channel errors also cause loss of synchronization. The effects of synchronization loss are commonly limited through the insertion of resynchronization code words. For MPEG-4 natural video coding, resync code words are inserted at the beginning of every frame, and also at regular locations within each frame when the error resilience modes are used. However, because FAPs can be compressed down to such low bit rates, it would be inappropriate to insert lengthy resync code words at such a frequency. In the absence of effective error detection and concealment algorithms, resync code words were inserted before every I-frame. Undetected errors can often cause very serious problems in the displayed output; P-frames following such serious errors would not improve the quality, and can therefore be skipped. In the receiver, error resilience can be achieved through two complementary mechanisms: early error detection and interpolation-based concealment. The first guarantees that packet losses and bit stream errors are signaled to the concealment module as soon as possible. The second is responsible for recovering missing or corrupted frames.
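The resync mechanism can be pictured as below: a marker is inserted before each I-frame, and after an error the decoder scans forward to the next marker. The marker value here is illustrative, not the actual 32-bit code word defined by the MPEG-4 standard, and the frame payloads are invented.

```python
RESYNC = b"\x00\x00\xC5\xA3"   # illustrative 32-bit marker, NOT the standard's value

def build_stream(frames):
    """Prefix every I-frame with a resync marker; P-frames are appended raw."""
    out = bytearray()
    for kind, payload in frames:
        if kind == "I":
            out += RESYNC
        out += payload
    return bytes(out)

def resync_after_error(stream, error_pos):
    """After an error at error_pos, skip ahead to the next resync marker."""
    idx = stream.find(RESYNC, error_pos)
    return idx if idx >= 0 else len(stream)   # end of stream if none found

frames = [("I", b"IFRAME0"), ("P", b"p1"), ("P", b"p2"),
          ("I", b"IFRAME1"), ("P", b"p3")]
stream = build_stream(frames)
pos = resync_after_error(stream, error_pos=6)  # pretend byte 6 was corrupted
print(f"error at byte 6: freeze, resume decoding at byte {pos}")
```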


Typically, an error is detected only indirectly, after some frames, and perfect localization of the error is often impossible. It is thus very important to detect the error as early as possible. With this in mind, the bit stream syntax was analyzed in order to pinpoint the places where an error could be detected. Based on several assumptions that hold in the case of most on-line transmissions, optional checks were introduced for the values of certain fields of the stream. While these fields, such as the gender bit, coding type, and object mask, are theoretically unconstrained, in practice they are not supposed to change during a single session. On the one hand, the need for an error concealment module is increased by the fact that the loss of a single P-frame prevents the correct decoding of the following ones. On the other hand, given that the facial animation parameters represent 1-D displacements of the feature points, and that loss bursts are typically comparable with the length of a phoneme (during which the mouth position, or viseme, does not vary significantly), the use of error concealment techniques based on interpolation proves effective in reconstructing FAP trajectories. For MPEG-4, different software implementations of FAP encoding and decoding are being developed for mobile platforms, but it is necessary for them to add the following functionality:

Error resilient decoding: when errors are detected, the decoder freezes the display and searches for the next resync code. There is no error concealment built in; it has to be added.

Regular insertion of resync codes: a 32-bit resync code specified in the MPEG-4 standard is inserted before every I-frame to limit the effects of synchronization loss.

Output of decoded visual data to file: the displayed output is written to a series of bitmap files, to aid quality evaluation and comparison of test results.
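The interpolation-based concealment just described can be sketched as a simple linear fill across a lost burst, between the last and next correctly received frames. The FAP track below is invented, and at least one good frame is assumed to exist:

```python
def conceal(frames):
    """Linearly interpolate FAP values across runs of lost (None) frames.

    Works because loss bursts are short relative to a viseme, so the
    trajectory between the last and next good frames is roughly linear.
    """
    out = list(frames)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                         # j = first good frame after the burst
            left = out[i - 1] if i > 0 else out[j]
            right = out[j] if j < len(out) else left
            for k in range(i, j):              # fill the gap linearly
                t = (k - i + 1) / (j - i + 1)
                out[k] = round(left + t * (right - left), 3)
            i = j
        i += 1
    return out

received = [0.0, 0.2, None, None, 0.8, 1.0, None, 0.6]   # invented FAP track
print(conceal(received))
```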


CHAPTER 8


APPLICATIONS
8.1 EMBODIED AGENTS IN SPOKEN DIALOGUE SYSTEMS
Using facial animation we can create talking heads which deliver their services through mobile phones. Users were able to ask the talking heads on their mobile phones questions about available services; examples of the services are timetables for trains and the accommodation and location of hotels. The system may use a graphical interface. Besides providing lip movements to accompany the synthesized voice output, the head was capable of deictic movements: when information (e.g. a timetable) was presented somewhere in the graphical interface, the face would look and turn towards that location on the screen, thereby guiding the user's attention.

Figure 8.1.1 Talking head for service assistance

8.2 LANGUAGE TRAINING WITH TALKING HEADS


Using a multimedia communication device, facial animation can be used as a language training tool. Rather than aiming at building a fixed set of speech training applications, this work focused on integrating a number of relevant technologies into an interactive, easy-to-use environment, making it possible for teachers, parents and other interested parties to construct applications involving multimodal speech technology. Using the graphical user interface (GUI), users could select different views of the face and tongue.



Figure 8.2.1 Software tool for language training (remote assistance); talking head animated for mouth movements.

8.3 SYNTHETIC FACES AS AIDS IN COMMUNICATION


This application aims at a communication device that, in a speaker-independent fashion, translates telephone-quality speech signals into visible articulatory motion in a synthetic talking head, with sufficient accuracy to provide significant speech-reading support to the hearing-impaired user, improving his or her ability to communicate over mobile phones. Two factors make this difficult:

I. The device has to be speaker independent.
II. It has to work in real time, with no more than about 100 ms delay.

Efforts are ongoing to solve these problems.


CHAPTER 9


CONCLUSION
FAP coding provides a method of supplying animated 3-D representations of speakers at very low bandwidths. Although the processing power involved in acquiring FAP information appropriate for encoding may be challenging for mobile terminals, trading quality for complexity may produce feasible solutions. Simulations carried out using the GPRS and EDGE channel models revealed that FAP coded streams are reasonably robust to errors when compared to conventionally coded video. However, certain channel errors produce highly disturbing effects, which indicates the need for efficient error detection and concealment schemes. Investigation of more advanced resynchronization code insertion schemes is also recommended.


REFERENCES

[1] Jochen Schiller, Mobile Communications, 2nd Edition, Addison Wesley, 2003.
[2] www.apple.com/mpeg4
[3] www.vidiator.com
[4] www.visagetechnologies.com
