
Time Frequency Analysis Tutorial

Gabor Feature and Its Application

Professor: Jian-Jiun Ding R99942057

Outline
Abstract
Chapter 1 Introduction
Chapter 2 Short Time Fourier Transform
Chapter 3 Gabor Wavelet Transform
Chapter 4 Gabor Feature in Speech Recognition
Chapter 5 Gabor Feature for Face Recognition
Chapter 6 Conclusion
References

Abstract
The Fourier transform is a common method for signal analysis. However, it describes a signal only in the frequency domain and cannot show how the frequency content varies with time. The short time Fourier transform (STFT) was proposed to analyze frequency as a function of time. The best-known STFT is the Gabor transform, which uses a Gaussian function as its window and is widely used in signal analysis today. The Gabor transform has been applied to speech recognition, acoustics, signal sampling, image recognition, and other areas. In this paper we introduce the Gabor transform and explain how it differs from the traditional Fourier transform. We also cover how Gabor filters can be used to extract Gabor features for speech recognition and face recognition.

Chapter 1

Introduction

Although the Fourier transform of an entire time series contains information about its spectral components, it cannot show how the different frequencies are distributed in time, so it is unsuitable for a large class of practical applications. Time-frequency analysis was proposed for these situations, and the STFT is the most commonly used method. The paper is organized as follows. Chapter 2 gives a brief introduction to the short time Fourier transform and its advantages in signal analysis. Chapter 3 introduces the Gabor transform and its applications. Chapter 4 gives an overview of how Gabor filters can be used in speech recognition, which is the key point of this paper. Chapter 5 introduces the use of Gabor features in image recognition. The last part gives the conclusion and some future work for Gabor features.

Chapter 2 Short Time Fourier Transform (STFT)


2.1 Introduction:
The conventional Fourier Transform, named in honor of Joseph Fourier, is a linear operator that maps a signal into a set of frequency components. Its problem is that it does not show where in time these components occur. For a stationary signal the conventional Fourier Transform is a favorable representation, since the frequency components of a stationary signal do not change with time. In the real world, however, almost all signals are non-stationary, so the conventional Fourier Transform may not be suitable for analyzing real signals.

Instead of representing a signal as a function of either time or frequency alone, in 1946 Dennis Gabor proposed the Gabor expansion, in which any signal can be expressed as a summation of time-shifted and frequency-shifted Gaussian functions. The Gabor expansion is one kind of sampled Short Time Fourier Transform (STFT). The STFT, the earliest time-frequency representation (TFR) [3-5], uses a sliding window to select a local portion of the signal and then transforms the masked signal into the frequency domain. The best-known TFRs are the STFT, the Wigner transform, the wavelet transform, Cohen's class, and the S-transform. The STFT is also the simplest TFR: it is a two-dimensional representation created by computing the Fourier Transform under a sliding temporal window. By using the STFT we can observe how the frequency content of the signal changes with time.

2.2 Conventional Fourier Transform


The conventional Fourier Transform expresses the signal $x(t)$ as a linear combination of the orthogonal basis $\{\exp(j 2\pi f t)\}$, $f \in (-\infty, \infty)$. The formula is

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt \tag{2.1}$$

and the inverse Fourier Transform is

$$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j 2\pi f t}\, df \tag{2.2}$$

where the frequency $f$ ranges over $(-\infty, \infty)$. The conventional Fourier transform does a good job when analyzing a stationary or periodic signal. Because a continuous signal cannot be processed directly by a computer, we have to sample the signal with sampling frequency $f_s = 1/T$ and implement the Fourier Transform in discrete form. The relation and formulas are

$$x[n] = x(nT) \tag{2.3}$$

$$X(f) \approx T \sum_{n=-\infty}^{\infty} x[n]\, e^{-j 2\pi f n T} \tag{2.4}$$

In order to avoid aliasing, the sampling frequency should be more than twice the signal's bandwidth. If the signal is time-limited, we can use the FFT to implement (2.4) and reduce the computing time. For non-stationary signals, such as a chirp signal, the conventional Fourier Transform is less informative. Consider a non-stationary signal that has two frequency components, 0.5 Hz and 1.0 Hz, which occur at different times, as shown in Figure 2.1(a):

$$x_1(t) = \begin{cases} \cos(2\pi \cdot 0.5\, t), & 10 \le t < 20 \\ \cos(2\pi \cdot 1.0\, t), & 30 \le t < 40 \\ 0, & \text{otherwise} \end{cases} \tag{2.5}$$

The power spectrum shown in Figure 2.1(b) has two peaks at 0.5 Hz and 1.0 Hz.

Figure 2.1: (a) The non-stationary signal $x_1(t)$ in the time domain. (b) The power spectrum of $x_1(t)$.
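The behaviour in Figure 2.1 can be reproduced with a short numerical sketch. It assumes a 20 Hz sampling rate, a 50 s record, and tone intervals of 10 s to 20 s and 30 s to 40 s; these values are illustrative choices and are not taken from the original figure.

```python
import numpy as np

fs = 20.0                       # sampling frequency in Hz (assumed)
t = np.arange(0, 50, 1 / fs)    # 50 s record (assumed)

# Assumed test signal: 0.5 Hz during 10-20 s, 1.0 Hz during 30-40 s, zero elsewhere.
x1 = (np.where((t >= 10) & (t < 20), np.cos(2 * np.pi * 0.5 * t), 0.0)
      + np.where((t >= 30) & (t < 40), np.cos(2 * np.pi * 1.0 * t), 0.0))

power = np.abs(np.fft.rfft(x1)) ** 2
freqs = np.fft.rfftfreq(len(x1), d=1 / fs)

def peak_in_band(lo, hi):
    """Frequency of the strongest spectral bin between lo and hi (Hz)."""
    band = (freqs >= lo) & (freqs < hi)
    return freqs[band][np.argmax(power[band])]

peak_low = peak_in_band(0.25, 0.75)
peak_high = peak_in_band(0.75, 1.25)
print(peak_low, peak_high)
```

The two peaks identify which frequencies are present, but nothing in `power` indicates when each tone was active.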

Several problems of the conventional Fourier Transform should be noted. The first is that we cannot tell where in time these frequency components occur. The second is that the spectrum shows leakage into neighboring frequencies that are not present in the signal defined in (2.5). For stationary or periodic signals the conventional Fourier Transform does a good job, but the above example shows that it is not a suitable tool for analyzing non-stationary and non-periodic signals. Since almost all real-world signals are non-stationary and non-periodic, the conventional Fourier Transform is of limited use in real-world signal analysis. Fortunately, we have another tool for non-stationary signals, the short time Fourier transform, which is introduced in the next section.

The STFT uses a fixed sliding window to mask the signal and then transforms it to the frequency domain. The STFT of $x_1(t)$ is shown in Figure 2.2, and we can observe the two frequency components: 0.5 Hz appears around 10 s to 20 s, and 1.0 Hz appears around 30 s to 40 s. The time-frequency location is almost the same as defined in (2.5). With the help of the STFT, we can observe how the frequency content of the signal changes with time. Therefore, TFRs are more suitable than the conventional Fourier Transform for analyzing real-world signals.

Figure 2.2: The STFT amplitude of $x_1(t)$ as defined in (2.5) and shown in Figure 2.1(a).
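A minimal sliding-window sketch of the STFT illustrates the same point numerically. The Hamming window, the 4 s window length, the 1 s hop, and the helper names `stft` and `dominant_freq` are assumptions made for this example only.

```python
import numpy as np

def stft(x, fs, win_len, hop):
    """Magnitude STFT with a sliding Hamming window; one row per frame."""
    window = np.hamming(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    times = (starts + win_len / 2) / fs          # frame centres in seconds
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    return times, freqs, np.abs(np.fft.rfft(frames, axis=1))

fs = 20.0
t = np.arange(0, 50, 1 / fs)
# Same assumed two-tone signal: 0.5 Hz in 10-20 s, 1.0 Hz in 30-40 s.
x1 = (np.where((t >= 10) & (t < 20), np.cos(2 * np.pi * 0.5 * t), 0.0)
      + np.where((t >= 30) & (t < 40), np.cos(2 * np.pi * 1.0 * t), 0.0))

times, freqs, mag = stft(x1, fs, win_len=80, hop=20)   # 4 s window, 1 s hop

def dominant_freq(at_time):
    """Strongest frequency in the frame whose centre is closest to at_time."""
    frame = np.argmin(np.abs(times - at_time))
    return freqs[np.argmax(mag[frame])]

print(dominant_freq(15.0), dominant_freq(35.0))
```

Unlike the global power spectrum, each row of `mag` is tied to a frame centre in `times`, so the two tones can be located in time as well as in frequency.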

2.3 Short Time Fourier Transform (STFT)


In 1946 Gabor suggested representing a signal in two dimensions, with time and frequency as coordinates [1]. This expansion is called the Gabor expansion, and it is a sampled STFT with a Gaussian window. Many papers state that the STFT was proposed by Gabor, but in fact Gabor did not propose it. Unlike the conventional Fourier Transform, the STFT multiplies the signal by a symmetric sliding window, $w(t-\tau)$, and then transforms the product to the frequency domain. The STFT is defined as

$$\begin{aligned} \mathrm{STFT}(\tau, f) &= \int_{-\infty}^{\infty} x(t)\, w(t-\tau)\, e^{-j 2\pi f t}\, dt \\ &= \int_{-\infty}^{\infty} X(\nu + f)\, W(\nu)\, e^{j 2\pi \nu \tau}\, d\nu \end{aligned} \tag{2.6}$$

where $X(f)$ and $W(f)$ are the Fourier transforms of the signal $x(t)$ and the sliding window $w(t)$. In the usual case, we choose $w(t)$ to be centered around $t = 0$ with $w(t) \approx 0$ for $|t|$ beyond some range, in order to get the local spectrum. The STFT is also called the Windowed Fourier transform, because we can regard $\mathrm{STFT}(\tau, f)$ as the Fourier Transform of the masked signal $x(t)\, w(t-\tau)$. Figure 2.3 shows the detailed operation of the STFT.

Figure 2.3: Block diagram illustrating how the STFT works. $\tau_1, \tau_2, \ldots, \tau_n$ are the discrete values of $\tau$.

Unlike the conventional Fourier Transform, which maps a one-dimensional signal to another one-dimensional signal and has a unique inverse formula, the STFT maps a one-dimensional signal to a two-dimensional representation and is highly redundant. Inverting the STFT is therefore a real difficulty, especially when we want to use the STFT for time-frequency filtering. In general form, we can express the inverse of the STFT as

$$x(t) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \mathrm{STFT}(\tau, f)\, w_i(t-\tau)\, e^{j 2\pi f t}\, d\tau\, df \tag{2.7}$$

where $w_i(t-\tau)$ is the inverse weighting function, which has to fulfill the following relation

$$\int_{-\infty}^{\infty} w(t-\tau)\, w_i(t-\tau)\, d\tau = 1 \tag{2.8}$$

Besides the inverse formula, one important thing remains: the choice of the sliding window $w(t-\tau)$. In order to have a local property around $\tau$, the window has to fulfill the following restriction

$$w(t-\tau) \approx 0 \quad \text{for } |t-\tau| \text{ beyond some range} \tag{2.9}$$

The earliest and most common window is the rectangular window, which has value 1 within some range and 0 outside it. Figure 2.4 shows the rectangular window and its amplitude response.

Figure 2.4: (a) The rectangular analysis window function $w_{\mathrm{rect}}(t)$. (b) The amplitude response of $w_{\mathrm{rect}}(t)$.

From (2.6), we can regard the upper equation as the Fourier Transform of the masked signal $x(t)\, w(t-\tau)$. If the window narrows, the time resolution becomes better. On the other hand, we can regard the lower equation of (2.6) as the inverse Fourier Transform of $X(\nu + f)\, W(\nu)$; a narrower window $w(t)$ has a wider spectrum $W(\nu)$, so the frequency resolution becomes worse. A narrow window therefore gives better time resolution and worse frequency resolution. Hence the width of the window function balances the time resolution against the frequency resolution, and this balance is governed by the Uncertainty Principle, which states that

$$\sigma_t\, \sigma_f \ge \frac{1}{4\pi} \tag{2.10}$$

where $\sigma_t$ denotes the standard deviation of the signal in the time domain (uncertainty in time), and $\sigma_f$ denotes the standard deviation of the signal in the frequency domain (uncertainty in frequency). It is impossible to have an STFT that provides both fine time resolution and fine frequency resolution. Figure 2.5 and Figure 2.6 demonstrate the tradeoff between time resolution and frequency resolution with different window sizes for the signal in (2.5).
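The bound $\sigma_t \sigma_f \ge 1/(4\pi)$ can be checked numerically. The sketch below is an illustration under assumed sampling parameters, and `time_bandwidth_product` is a hypothetical helper name. It estimates both standard deviations from the energy densities $|x(t)|^2$ and $|X(f)|^2$ for two Gaussian windows of different widths and for a triangular window.

```python
import numpy as np

def time_bandwidth_product(x, dt):
    """Estimate sigma_t * sigma_f of a sampled signal x with sample spacing dt."""
    n = len(x)
    t = (np.arange(n) - n // 2) * dt
    f = np.fft.fftshift(np.fft.fftfreq(n, d=dt))
    pt = np.abs(x) ** 2
    pf = np.abs(np.fft.fftshift(np.fft.fft(x))) ** 2
    pt, pf = pt / pt.sum(), pf / pf.sum()          # treat as probability weights
    sigma_t = np.sqrt(np.sum((t - np.sum(t * pt)) ** 2 * pt))
    sigma_f = np.sqrt(np.sum((f - np.sum(f * pf)) ** 2 * pf))
    return sigma_t * sigma_f

dt = 0.01                                   # assumed sample spacing in seconds
t = (np.arange(4096) - 2048) * dt
bound = 1 / (4 * np.pi)

gauss_narrow = time_bandwidth_product(np.exp(-np.pi * (t / 0.5) ** 2), dt)
gauss_wide = time_bandwidth_product(np.exp(-np.pi * (t / 2.0) ** 2), dt)
triangle = time_bandwidth_product(np.clip(1 - np.abs(t) / 2.0, 0, None), dt)
print(bound, gauss_narrow, gauss_wide, triangle)
```

Changing the width of the Gaussian trades $\sigma_t$ against $\sigma_f$ but leaves the product at the bound, while the triangular window stays above it.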

Figure 2.5: Demonstration of the tradeoff between time resolution and frequency resolution. (a) The time function of the wide Hamming window. (b) The Fourier Transform of the wide Hamming window. (c) The amplitude of the STFT.

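In Figure 2.5 we can observe that the wide window gives poor time resolution but good frequency resolution, while Figure 2.6 shows good time resolution but poor frequency resolution. This result agrees with what we have mentioned above. Besides the rectangular window, two other common windows are the Gaussian window and the Hamming window. Their continuous formulas are

$$w_{\mathrm{gauss}}(t) = e^{-\pi a t^2} \tag{2.11}$$

$$w_{\mathrm{hamming}}(t) = \begin{cases} 0.54 + 0.46 \cos(2\pi t / T), & |t| \le T/2 \\ 0, & \text{otherwise} \end{cases} \tag{2.12}$$

where $a > 0$ controls the width of the Gaussian window and $T$ is the length of the Hamming window. If the Gaussian window is used in the STFT, the result is also called the Gabor Transform, which is widely used because of its lower leakage in the time-frequency domain. In speech analysis, people usually prefer the Hamming window, which will be introduced in the next chapter.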

2.4 Applications and Conclusion


About twenty or thirty years ago, the conventional Fourier Transform was the major tool for signal analysis, but during the past two decades the STFT has become more and more important, owing to the growth of computer science. The major advantage of the STFT is that it provides the time-frequency location of the signal components we are interested in. Some of its applications are listed as follows:

1. Signal analysis: By using the TFR we can learn about the signal's time-frequency components, and then analyze them to get more information that cannot be observed directly from the time or frequency domain. Examples include medical, geologic, power quality, optical, speech, and image signal analysis.
2. Time-frequency filtering: If two chirp signals that occupy different locations in the time-frequency domain are mixed, we can use a time-frequency mask to filter the mixture and recover the desired signal. Time-frequency filtering is now widely used, especially with the Wavelet Transform and the STFT.
3. Pattern recognition: Many signals have their own time-frequency patterns. For example, each musical instrument usually has a time-frequency pattern unlike the others, and different words and different speakers have their own voice patterns. We can recognize these time-frequency patterns to decide which entry in our database the signal belongs to.

In this chapter, we have introduced the conventional Fourier Transform and its limitations when dealing with non-stationary signals. The STFT is more suitable for non-stationary signal analysis. We have also introduced the framework of the STFT, which will be extended to the Gabor transform in the next chapter.

Chapter 3 Gabor Transform and Gabor Wavelet Transform


3.1 Introduction
In the previous chapter we introduced the use and advantages of the Short Time Fourier Transform. In this chapter we focus on the most commonly used STFT, the Gabor Transform. We give a brief introduction to the Gabor Transform and explain its advantages in signal analysis. Then we introduce the wavelet transform and the Gabor wavelet transform, which will be used for feature extraction. Gabor features can be used in speech recognition, image recognition, and other tasks.

3.2 Gabor Transform


The definition of the Gabor transform is

$$G_x(\tau, f) = \int_{-\infty}^{\infty} x(t)\, e^{-\pi (t-\tau)^2}\, e^{-j 2\pi f t}\, dt \tag{3.1}$$

The Gabor transform is a special case of the short time Fourier transform. We can see that the Gabor transform kernel is the Fourier transform kernel multiplied by a Gaussian function. By replacing the Gaussian with other window functions we can construct many transforms similar to the Gabor transform. Since the Gaussian function is more concentrated than the rectangular function in the frequency domain, the frequency resolution of the Gabor transform is much better than that of the short time Fourier transform with a rectangular window (compare Fig. 3-1 and Fig. 3-2).

Figure 3-1: (a) The Gabor transform of (3.2). (b) The Gabor transform of (3.3). Axes: time (sec) and frequency (Hz).

(3.2)

$$x(t) = \begin{cases} \cos(2\pi t), & t < 10 \\ \cos(6\pi t), & 10 \le t < 20 \\ \cos(4\pi t), & t \ge 20 \end{cases} \tag{3.3}$$

Figure 3-2: (a) The short time Fourier transform of (3.2). (b) The short time Fourier transform of (3.3).

From the above figures, we can easily see that the frequency resolution of the Gabor transform is much better than that of the short time Fourier transform.
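As a rough numerical companion to these figures, the sketch below evaluates a Gabor transform with the unit-width Gaussian window $e^{-\pi (t-\tau)^2}$ by direct summation on the piecewise-cosine test signal (1 Hz, then 3 Hz, then 2 Hz). The sampling step, the analysis grid, and the function name `gabor_transform` are assumptions made for illustration.

```python
import numpy as np

def gabor_transform(x, t, taus, freqs):
    """Riemann-sum estimate of G(tau, f) = integral of x(t) exp(-pi (t - tau)^2) exp(-j 2 pi f t) dt."""
    dt = t[1] - t[0]
    kernel = np.exp(-2j * np.pi * np.outer(freqs, t))    # one row per analysis frequency
    out = np.empty((len(taus), len(freqs)), dtype=complex)
    for i, tau in enumerate(taus):
        windowed = x * np.exp(-np.pi * (t - tau) ** 2)   # Gaussian window centred at tau
        out[i] = (kernel @ windowed) * dt
    return out

t = np.arange(0, 30, 0.05)                               # 20 Hz sampling (assumed)
x = np.where(t < 10, np.cos(2 * np.pi * t),
             np.where(t < 20, np.cos(6 * np.pi * t), np.cos(4 * np.pi * t)))

taus = np.array([5.0, 15.0, 25.0])                       # one instant inside each segment
freqs = np.arange(0, 5.01, 0.1)
G = gabor_transform(x, t, taus, freqs)
dominant = freqs[np.argmax(np.abs(G), axis=1)]
print(dominant)
```

Each row of `G` is the local spectrum around one value of $\tau$, so the dominant frequency follows the segment that the window is centred in.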

3.3 Wavelet Theory


Wavelet-based analysis of signals is an interesting and relatively recent tool. Similar to Fourier series analysis, where sinusoids are chosen as the basis functions, wavelet analysis is based on a decomposition of a signal using an orthonormal family of basis functions. Unlike a sine wave, a wavelet has its energy concentrated in time. Sinusoids are useful in analyzing periodic and time-invariant phenomena, while wavelets are well suited for the analysis of transient, time-varying signals. Most standard wavelets are based on one wavelet function, $\psi(x)$, which has some special properties. The wavelet function oscillates, which is expressed mathematically by an integral equal to zero:

$$\int_{-\infty}^{\infty} \psi(x)\, dx = 0 \tag{3.4}$$

A wavelet basis is a two-parameter family of functions that are related to the function $\psi(x)$. They are defined by the set $\{\psi_{j,k}(x)\}$ of wavelets given by

$$\psi_{j,k}(x) = 2^{j/2}\, \psi(2^{j} x - k) \tag{3.5}$$

The variables $j$ and $k$ are integers that scale and displace the function $\psi(x)$ to generate a succession of wavelets. The scale index $j$ indicates the wavelet's width, and the location index $k$ gives its position. Notice that the function $\psi(x)$ is rescaled, or dilated, by powers of two and translated by integers. Once we know the function $\psi(x)$, we know everything about the basis. The wavelet functions $\psi_{j,k}(x)$ for all $k \in \mathbb{Z}$ span a subspace called $W_j$. That is

$$W_j = \overline{\operatorname{span}_{k}\{\psi_{j,k}(x)\}} \tag{3.6}$$

And if $f(x) \in W_j$, it can be expanded as

$$f(x) = \sum_{k} a_{j,k}\, \psi_{j,k}(x) \tag{3.7}$$

In real applications the coefficients $\{a_{j,k} \mid j, k \in \mathbb{Z}\}$ are computed by the discrete wavelet transform (DWT), which is an implementation of the wavelet transform using a filter bank. The discrete sequences processed by the DWT constitute a multi-resolution representation. As shown in Fig. 3-3, $W_\psi(j, k)$ and $W_\varphi(j, k)$ are the detail and approximation coefficients at scale $j$, and $W_\varphi(j+1, k)$ are the approximation coefficients at scale $j+1$. $h_\varphi(-n)$ and $h_\psi(-n)$ are the time-reversed low-pass and high-pass filters associated with the scaling function $\varphi$ and the wavelet function $\psi$, respectively.

Fig. 3-3 DWT using filter bank
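One analysis stage of such a filter bank can be sketched with the Haar wavelet. The Haar filters are chosen here only because they are the shortest orthonormal pair; the text does not name a specific wavelet, and the function names are hypothetical. The stage assumes an even-length input.

```python
import numpy as np

def haar_dwt(x):
    """One filter-bank stage: low-/high-pass filtering followed by down-sampling by 2."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)    # low-pass branch (approximation)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)    # high-pass branch (detail)
    return approx, detail

def haar_idwt(approx, detail):
    """Synthesis stage that undoes haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 0.0, 2.0])
approx, detail = haar_dwt(signal)
restored = haar_idwt(approx, detail)
print(approx, detail)
```

Applying `haar_dwt` again to `approx` gives the next coarser scale, which is how the multi-resolution representation is built up.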


We can easily extend the one-dimensional transform to the two-dimensional case. In two dimensions, an image is filtered and decomposed into an approximation image and detail images by applying a separable filter bank. The original image is split into the approximation $W_\varphi(j, m, n)$ and the details $W_\psi^{H}(j, m, n)$, $W_\psi^{V}(j, m, n)$, $W_\psi^{D}(j, m, n)$ at level $j$, in the horizontal, vertical, and diagonal directions. Like the one-dimensional discrete wavelet transform, the two-dimensional wavelet transform can be implemented using digital filters and down-sampling, as shown in Fig. 3-4. A resulting decomposition is shown in Fig. 3-5, in which the wavelet sub-images consist of three sub-bands in the vertical, horizontal, and diagonal directions.

Fig. 3-4 The two-dimensional DWT using filter bank

Fig. 3-5 Wavelet decomposition of a synthetic image
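The separable two-dimensional decomposition can be sketched in the same way, again assuming Haar filters and even image dimensions. The naming of the horizontal and vertical sub-bands follows one common convention (the sub-band that responds to horizontal edges is called horizontal); other texts swap the two.

```python
import numpy as np

def haar_step(a, axis):
    """Haar low-/high-pass filtering and down-sampling by 2 along one axis."""
    a = np.moveaxis(a, axis, 0)
    low = (a[0::2] + a[1::2]) / np.sqrt(2)
    high = (a[0::2] - a[1::2]) / np.sqrt(2)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def haar_dwt2(image):
    """One level of a separable 2-D DWT: filter along columns index first, then rows index."""
    low, high = haar_step(np.asarray(image, dtype=float), axis=1)
    approx, horizontal = haar_step(low, axis=0)      # horizontal: responds to horizontal edges
    vertical, diagonal = haar_step(high, axis=0)     # vertical: responds to vertical edges
    return approx, horizontal, vertical, diagonal

# Synthetic 4x4 image with a single horizontal edge below the first row.
image = np.ones((4, 4))
image[0, :] = 0.0
approx, horizontal, vertical, diagonal = haar_dwt2(image)
print(approx)
print(horizontal)
```

Each sub-image has half the size of the input in both directions, and the four sub-images together preserve the energy of the image.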


3.4 Gabor Wavelets


The use of Gabor filters in image analysis is biologically motivated, as they model the response of the receptive fields of the orientation-selective simple cells in the human visual cortex. Furthermore, they provide the best possible tradeoff between spatial and frequency resolution. The Gabor wavelet transform (GWT) has been used as an effective time-frequency analysis tool for identifying the rapidly varying characteristics of some dispersive wave signals. The effectiveness of the GWT is strongly influenced by the wavelet shape, which controls the time-frequency localization property. Therefore, it is very important to choose the right Gabor wavelet shape for the given signals. The Gabor filter representation has also been reported to give better performance for classifying facial actions. Daugman pioneered the use of the 2D Gabor wavelet representation in computer vision in the 1980s.

In this section, we review the basics of Gabor wavelets and discuss the Gabor feature representation of images. Complex Gabor functions were first introduced by Gabor. They are complex exponentials with a Gaussian envelope, or equivalently Gaussians modulated by complex harmonics. In this tutorial, we take the Gabor wavelets (kernels, filters) from Chengjun Liu's paper, defined as follows

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\!\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[ e^{i k_{\mu,\nu} \cdot z} - e^{-\sigma^2/2} \right] \tag{3.8}$$

We explain each term in the formula as follows. $z = (x, y)$ is the position in the image, and $e^{i k_{\mu,\nu} \cdot z}$ is the oscillation function, whose real part is a cosine function and whose imaginary part is a sine function, as shown in Fig. 3-6 and Fig. 3-7. The phase of the oscillation function is shown in Fig. 3-8. $\exp\!\left(-\|k_{\mu,\nu}\|^2 \|z\|^2 / (2\sigma^2)\right)$ is the Gauss function shown in Fig. 3-9. The Gauss window reflects the localization of the Gabor filter in both the spatial and frequency domains, and limits the range of the oscillation function. Thanks to the Gauss window, the Gabor filter can tolerate slight image distortion. The amplitude part and the phase part of a Gabor filter are shown in Fig. 3-10 and Fig. 3-11, respectively.

Fig. 3-6 Real part of the oscillation function

Fig. 3-7 Imaginary part of the oscillation function



Fig. 3-8 Phase angle of the oscillation function

Fig. 3-9 The Gauss function

Fig. 3-10 Gabor function - amplitude part

Fig. 3-11 Gabor function - phase part. (The phase is in fact a plane; here it is wrapped to $(-\pi, \pi]$.)

$e^{-\sigma^2/2}$ is the DC component; subtracting it makes the filter free of DC response. $k_{\mu,\nu}$ is the wave vector of the filter corresponding to orientation $\mu$ and scale $\nu$. By choosing a series of $k_{\mu,\nu}$, a set of Gabor filters can be obtained. $\sigma$ is a constant that, together with $k_{\mu,\nu}$, determines the width of the Gauss window. Here we choose $\sigma = 2\pi$. $k_{\mu,\nu}$ can be further written as

$$k_{\mu,\nu} = k_\nu\, e^{i \varphi_\mu} \tag{3.9}$$

where $k_\nu = k_{\max} / f^{\nu}$ and $\varphi_\mu = \pi \mu / 8$. $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between kernels in the frequency domain. Different values of $\nu$ describe different wavelengths of the Gauss window and thus control the scale of sampling; we can also say that $\nu$ controls the frequency. Different values of $\mu$ describe oscillation functions with different directions and thus control the direction of sampling. In this thesis, we use Gabor wavelets at five different scales, $\nu \in \{0, \ldots, 4\}$, and eight orientations, $\mu \in \{0, \ldots, 7\}$. The morphology of the 40 Gabor filters is shown in Fig. 3-12.


Fig. 3-12 Morphology of 40 Gabor filters

Five different scales and eight orientations generate 40 filters. Fig. 3-13 shows the real part of the 40 Gabor kernels at scales $\nu = 0, \ldots, 4$ and orientations $\mu = 0, \ldots, 7$ with the following parameters: $\sigma = 2\pi$, $k_{\max} = \pi/2$, and $f = \sqrt{2}$. The filters demonstrate the desirable properties of spatial locality and orientation selectivity.

Fig. 3-13 Gabor wavelets. (a) The real part of the Gabor kernels at five scales and eight orientations for $\sigma = 2\pi$, $k_{\max} = \pi/2$, and $f = \sqrt{2}$. (b) Magnitudes of the Gabor kernels at five different scales.

Further uses of Gabor features are explained in Chapter 5.
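A bank of 40 kernels of this kind can be generated with a few lines of code. The sketch below assumes the kernel form of Liu's Gabor wavelets (Gaussian envelope times a DC-compensated complex plane wave) with $\sigma = 2\pi$, $k_{\max} = \pi/2$, and $f = \sqrt{2}$; the 33 x 33 pixel support and the function name `gabor_kernel` are illustrative assumptions. The largest scales have a wider envelope than this support, so a real application would likely use a larger grid.

```python
import numpy as np

def gabor_kernel(mu, nu, size=33, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex Gabor kernel for orientation index mu (0..7) and scale index nu (0..4)."""
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / 8)   # wave vector as a complex number
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2 = np.abs(k) ** 2
    z2 = x ** 2 + y ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    # Plane wave exp(i k.z) minus the DC compensation term exp(-sigma^2 / 2).
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

# Five scales times eight orientations give the 40 filters.
bank = [gabor_kernel(mu, nu) for nu in range(5) for mu in range(8)]
print(len(bank), bank[0].shape)
```

Convolving an image with each kernel in `bank` and taking the magnitude gives one response map per scale and orientation.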


Chapter 4 Gabor Feature in Speech Recognition


4.1 Introduction
There are many categories in the speech field, such as speech recognition, speech enhancement, speech synthesis, and speaker recognition, each of which involves many techniques. For speech recognition, feature extraction is often a key process, and the mel-frequency cepstral coefficient (MFCC) has been one of the most widely used speech features for many years. In deriving the MFCC, the short-time Fourier transform (STFT) is applied. However, because of its time-frequency properties, the STFT is not well suited to analyzing a discontinuous or non-stationary signal like speech, which implies that the resulting MFCC is not always optimal for representing the speech signal and may give lower recognition accuracy. Because of these possible drawbacks of the STFT in constructing MFCC, the wavelet transform has been used in place of the STFT in recent years. The basis of the wavelet transform represents discontinuous or non-periodic signals better than that of the STFT. Furthermore, the multi-scale and multi-resolution analysis of the wavelet transform can present the "rough part" of the signal in the low frequencies and the "detailed part" in the high frequencies. For speech feature extraction, the wavelet transform therefore seems to be a better choice than the STFT. In comparison, the wavelet transform performs better in analyzing non-periodic signals, while the STFT performs better in representing periodic signals. If we can take advantage of both at the same time, the performance of the resulting new features could be better. In addition, constructing more noise-robust speech features is another direction of this work. In real-world application environments, a speech recognition system is often influenced by noise. To overcome this problem, researchers have proposed many speech enhancement and robustness techniques to enhance the speech or alleviate the effect of the noise.

We find that wavelet analysis can also be applied to constructing noise-robust speech features. In past research of our speech laboratory (led by Professor Lin-shan Lee), the wavelet transform was applied to the temporal speech feature stream and good recognition performance was achieved.

4.2 Traditional Mel-Frequency Cepstral Coefficients (MFCC)


The procedure for constructing mel-frequency cepstral coefficients (MFCCs) is illustrated in Fig. 4-1 and described as follows. First, a recorded time-domain speech signal is passed through a pre-emphasis filter, which is high-pass and thus enhances the higher-frequency parts of the signal. Second, the signal is segmented into many overlapping frames (often of equal width). Third, each frame is multiplied by a window function and then transformed into the spectral domain with the fast Fourier transform (FFT). The resulting magnitude spectrum (the phase is discarded) is further processed with a mel-frequency filter bank, and each filter output is the weighted sum of the magnitude values within its pass-band. Finally, all the filter outputs are processed by the logarithm operation and the discrete cosine transform (DCT). The resulting parameters are the mel-frequency cepstral coefficients (MFCCs) for that frame. In addition to the MFCCs of a single frame, we often combine the MFCCs of several adjacent frames to obtain the delta and delta-delta MFCCs, which are used together with the original MFCCs as the final feature vector for that frame.
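The steps above can be sketched as follows. This is a minimal illustration and not a reference implementation: the pre-emphasis coefficient 0.97, the 400-sample frames with a 160-sample hop, the 512-point FFT, the 26 filters, the 13 coefficients, and the mel formula $2595 \log_{10}(1 + f/700)$ are common conventions assumed here, not values given in the text. Delta features are omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Overlapping triangular filters with centres equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):          # rising edge of the triangle
            fb[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge of the triangle
            fb[i, k] = (right - k) / (right - centre)
    return fb

def mfcc(signal, fs, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # 1. Pre-emphasis: first-order high-pass filter.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Overlapping frames and 3. window function (Hamming assumed).
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 4. Magnitude spectrum (phase discarded), 5. mel filter bank, 6. logarithm.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    log_energies = np.log(magnitude @ mel_filterbank(n_filters, n_fft, fs).T + 1e-10)
    # 7. DCT-II across the filter outputs; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ dct.T

rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000), fs=16000)   # 1 s of noise as a stand-in signal
print(features.shape)
```

Each row of `features` is the MFCC vector of one frame; delta and delta-delta features would be computed from differences between neighbouring rows.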

Figure 4-1: The flowchart of MFCC feature extraction

Although MFCC performs quite well and is thus widely used for speech recognition, we believe it can be further enhanced by taking some points into consideration. First, since the speech signal is a non-stationary random process, dividing it into frames and applying the short-time Fourier transform (STFT) to obtain MFCC may provide only an estimate of the underlying characteristics. As we said in the previous chapter, the discrete Fourier transform gives a better analysis for periodic signals than for signals containing sudden bursts. Second, using overlapping triangular mel-filters in deriving MFCC is computationally efficient, but is not optimal in any sense.


4.3 Introduction to Gabor Features


Speech is characterized by its fluctuations across time and frequency. The latter reflect the characteristics of the human vocal cords and tract and are commonly exploited in automatic speech recognition (ASR) by using short-term spectral representations such as cepstral coefficients. The temporal properties of speech are targeted in ASR by dynamic (delta and delta-delta) features and by temporal filtering and feature extraction techniques like RASTA and TRAPS. Nevertheless, speech clearly exhibits combined spectro-temporal modulations. This is due to intonation, coarticulation, and the succession of several phonetic elements, e.g., in a syllable. Formant transitions, for example, result in diagonal features in a spectrogram representation of speech. This kind of pattern is explicitly targeted by the feature extraction method used in this tutorial paper. Recent findings from a number of physiological experiments in different mammal species showed that a large percentage of neurons in the primary auditory cortex respond differently to upward- versus downward-moving ripples in the spectrogram of the input. Each individual neuron is tuned to a specific combination of spectral and temporal modulation frequencies, with a spectro-temporal response field that may span up to a few hundred milliseconds in time and several critical bands in frequency, and may have multiple peaks. A psychoacoustical model of modulation perception was built based on that observation and inspired the use of two-dimensional Gabor functions as a feature extraction method for ASR in this study. Gabor functions are localized sinusoids known to model the characteristics of neurons in the visual system.

The use of Gabor features for ASR has been proposed earlier and shown to be relatively robust in combination with a simple classifier. Automatic feature selection methods have been described, and the resulting parameter distribution has been shown to remarkably resemble neurophysiological and psychoacoustical data as well as modulation properties of speech. Other approaches to targeting spectro-temporal variability in feature extraction include time-frequency filtering (tiffing). Still, this approach of spectro-temporal processing using localized sinusoids most closely matches the neurobiological data and also incorporates other features as special cases: purely spectral Gabor functions perform subband cepstral analysis (modulo the windowing function), and purely temporal ones can resemble TRAPS or the RASTA impulse response and its derivatives in terms of temporal extent and filter shape.

4.4 Gabor Feature Extraction


A spectro-temporal representation of the input signal is processed by a number of Gabor functions used as 2-D filters. The filtering is performed by correlating each input frequency channel over time with the corresponding part of the Gabor function (with the Gabor function centered on the current frame and desired frequency channel) and then summing over frequency. This yields one output value per frame per Gabor function (we call these output values the Gabor features) and is equivalent to a 2-D correlation of the input representation with the complete filter function followed by selection of the desired frequency channel of the output. In this study, log mel-spectrograms serve as input features for Gabor feature extraction. This representation was chosen for its widespread use in ASR and because the logarithmic compression and mel-frequency scale might be considered a very simple model of peripheral auditory processing. Any other spectro-temporal representation of speech could be used instead, and more sophisticated auditory models in particular might be a good choice for future experiments.

The two-dimensional complex Gabor function $g(t,f)$ is defined as the product of a Gaussian envelope $n(t,f)$ and the complex Euler function $e(t,f)$. The envelope width is defined by the standard deviation values $\sigma_f$ and $\sigma_t$, while the periodicity is defined by the radian frequencies $\omega_f$ and $\omega_t$, with $f$ and $t$ denoting the frequency and time axis, respectively. The two independent parameters $\omega_f$ and $\omega_t$ allow the Gabor function to be tuned to particular directions of spectro-temporal modulation, including diagonal modulations. Further parameters are the centers of mass of the envelope in time and frequency, $t_0$ and $f_0$. In this notation the Gaussian envelope $n(t,f)$ is defined as

$$n(t,f) = \frac{1}{2\pi \sigma_f \sigma_t} \exp\!\left[ -\frac{(f - f_0)^2}{2\sigma_f^2} - \frac{(t - t_0)^2}{2\sigma_t^2} \right] \tag{4.1}$$

and the complex Euler function $e(t,f)$ as

$$e(t,f) = \exp\!\left[ i\,\omega_f (f - f_0) + i\,\omega_t (t - t_0) \right] \tag{4.2}$$

It is reasonable to set the envelope width depending on the modulation frequencies $\omega_f$ and $\omega_t$ to keep the same number of periods $T$ in the filter function for all frequencies. Here, the spread of the Gaussian envelope in dimension $x$ was set to $\sigma_x = \pi/\omega_x = T_x/2$. The infinite support of the Gaussian envelope is cut off at between $\sigma_x$ and $2\sigma_x$ from the center. For time-dependent features, $t_0$ is set to the current frame, leaving $f_0$, $\omega_f$ and $\omega_t$ as free parameters.

From the complex results of the filter operation, real-valued features may be obtained by using the real or imaginary part only. In this case, the overall DC bias was removed from the template. The magnitude of the complex output can also be used. Special cases are temporal filters ($\omega_f = 0$) and spectral filters ($\omega_t = 0$).
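The following sketch builds one such 2-D Gabor filter and correlates it with a synthetic spectro-temporal ripple. It assumes a normalized Gaussian envelope with spread $\sigma = \pi/\omega$ in each dimension, a cutoff at $1.5\sigma$, nonzero $\omega_t$ and $\omega_f$, and synthetic cosine ripples in place of a log mel-spectrogram. The function names are hypothetical, and DC removal is left out.

```python
import numpy as np

def gabor_2d(omega_t, omega_f, cutoff=1.5):
    """Complex 2-D Gabor filter (envelope times complex exponential), centred at (0, 0)."""
    sigma_t, sigma_f = np.pi / abs(omega_t), np.pi / abs(omega_f)
    half_t, half_f = int(cutoff * sigma_t), int(cutoff * sigma_f)
    t = np.arange(-half_t, half_t + 1)[:, None]      # time axis in frames
    f = np.arange(-half_f, half_f + 1)[None, :]      # frequency axis in channels
    envelope = np.exp(-t ** 2 / (2 * sigma_t ** 2) - f ** 2 / (2 * sigma_f ** 2))
    envelope /= 2 * np.pi * sigma_t * sigma_f
    carrier = np.exp(1j * (omega_t * t + omega_f * f))
    return envelope * carrier

def gabor_feature(spec, g, t0, f0):
    """Correlate the filter with the patch of spec centred on frame t0 and channel f0."""
    ht, hf = g.shape[0] // 2, g.shape[1] // 2
    patch = spec[t0 - ht:t0 + ht + 1, f0 - hf:f0 + hf + 1]
    return np.sum(patch * np.conj(g))

omega_t, omega_f = 2 * np.pi / 10, 2 * np.pi / 8     # periods of 10 frames and 8 channels
g = gabor_2d(omega_t, omega_f)

frames = np.arange(100)[:, None]
channels = np.arange(40)[None, :]
ripple_same = np.cos(omega_t * frames + omega_f * channels)       # same direction as the filter
ripple_opposite = np.cos(omega_t * frames - omega_f * channels)   # opposite diagonal direction

matched = abs(gabor_feature(ripple_same, g, 50, 20))
mismatched = abs(gabor_feature(ripple_opposite, g, 50, 20))
print(matched, mismatched)
```

The filter responds strongly to the ripple that moves in its own direction and only weakly to the opposite one, which is the direction selectivity described above.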


Chapter 5 Gabor Features For Face Recognition


5.1 Introduction
Pattern recognition and computer vision have witnessed a growing interest in face recognition problems. It is an important research topic, and many researchers are working toward successful results. A large number of commercial, security, and forensic applications require face recognition technologies, such as access control, intelligent surveillance, human-computer interfaces, and image/video retrieval. Face recognition is one of the most important applications of Gabor wavelets. The face image is convolved with a set of Gabor wavelets and the resulting images are further processed for recognition. In the context of applications, Gabor wavelets are usually called Gabor filters. There has been a great amount of research on face recognition recently. In this chapter we give a brief introduction to the use of Gabor features in face recognition.

5.2 2D Gabor Wavelet Representation of Faces


Since face recognition is not a difficult task for human beings, selection of biologically motivated Gabor filters is well suited to this problem. Gabor filters, modeling the responses of simple cells in the primary visual cortex, are simply plane waves restricted by a Gaussian envelope function.

Figure 5-1: Gabor filters corresponding to 5 spatial frequencies and 8 orientations.

An image can be represented by the Gabor wavelet transform, which allows the description of both the spatial frequency structure and the spatial relations. Convolving the image with complex Gabor filters at 5 spatial frequencies ($\nu = 0, \ldots, 4$) and 8 orientations ($\mu = 0, \ldots, 7$) captures the whole frequency spectrum, both amplitude and phase (Figure 5-1). In Figure 5-2, an input face image and the amplitude of the Gabor filter responses are shown.

Figure 5-2: Example of a facial image response to the above Gabor filters. (a) Original face image (from the Stirling database). (b) Filter responses.

One of the techniques used in the literature for Gabor-based face recognition uses the responses on a grid representing the facial topography to code the face. Instead of using the graph nodes, high-energized points can be used in comparisons, which forms the basis of this work. This approach not only reduces computational complexity but also improves performance in the presence of occlusions.

5.3 Feature extraction


The feature extraction algorithm for the proposed method has two main steps (Figure 5-4): (1) feature point localization and (2) feature vector computation.

5.3.1 Feature point localization

In this step, feature vectors are extracted from points with high information content on the face image. In most feature-based methods, the facial features are assumed to be the eyes, nose and mouth. Here, however, neither the locations nor the number of feature points is fixed. The feature vectors and their locations can vary in order to better represent the diverse facial characteristics of different faces, such as dimples and moles, which are also features that people might use for recognizing faces (Figure 5-3).

Figure 5-3: Facial feature points found as the high-energized points of Gabor wavelet responses.

From the responses of the face image to the Gabor filters, peaks are found by searching the locations in a window W0 of size W × W by the following procedure. A feature point is located at (x0, y0) if

$$R_j(x_0, y_0) = \max_{(x, y) \in W_0} R_j(x, y) \qquad (5.1)$$

$$R_j(x_0, y_0) > \frac{1}{N_1 N_2} \sum_{x=1}^{N_1} \sum_{y=1}^{N_2} R_j(x, y) \qquad (5.2)$$

where Rj is the response of the face image to the jth Gabor filter (j = 1, …, 40), N1 × N2 is the size of the face image, and the window W0 is centered at (x0, y0). The window size W is one of the important parameters of the proposed algorithm: it must be chosen small enough to capture the important features and large enough to avoid redundancy. Equation (5.2) is applied so that the search does not get stuck on a minor local maximum instead of finding the true peaks of the responses.
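The peak search can be sketched for a single response map as below. The sketch assumes that condition (5.1) requires the response at (x0, y0) to be the maximum inside the W × W window centered there, and that condition (5.2) requires it to exceed the mean response over the whole N1 × N2 image. Clipping the window at the image border and the function name `find_feature_points` are choices made for this illustration only.

```python
def find_feature_points(response, window=3):
    """Return (x, y) points that are window maxima and above the image mean.

    response is a 2D list of non-negative response magnitudes, indexed
    as response[y][x]. The window is clipped at the image border, which
    is an assumption of this sketch.
    """
    n_rows = len(response)
    n_cols = len(response[0])
    mean = sum(sum(row) for row in response) / (n_rows * n_cols)
    half = window // 2
    points = []
    for y0 in range(n_rows):
        for x0 in range(n_cols):
            value = response[y0][x0]
            # Condition (5.2): the response must exceed the image-wide mean
            if value <= mean:
                continue
            neighbourhood = [
                response[y][x]
                for y in range(max(0, y0 - half), min(n_rows, y0 + half + 1))
                for x in range(max(0, x0 - half), min(n_cols, x0 + half + 1))
            ]
            # Condition (5.1): the response must be the maximum in the window
            if value == max(neighbourhood):
                points.append((x0, y0))
    return points


# Toy response map: one strong peak and one weak bump below the mean
response = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 9.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0],
]
print(find_feature_points(response, window=3))
```

In the toy map the weak bump in the bottom-left corner is a maximum of its own window but lies below the image mean, so only the strong peak is kept. Running the same search on each of the 40 response maps and collecting the results would give the full set of feature points.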

Figure 5-4: Flowchart of the feature extraction stage of the facial images

5.3.2 Feature vector generation

Feature vectors are generated at the feature points as a composition of Gabor wavelet transform coefficients. The kth feature vector of the ith reference face is defined as

$$v_{i,k} = \{\, x_k,\; y_k,\; R_{i,j}(x_k, y_k),\; j = 1, \dots, 40 \,\} \qquad (5.3)$$

Since there are 40 Gabor filters, each feature vector has 42 components. The first two components represent the location of the feature point by storing its (x, y) coordinates. Because no other information about the locations of the feature vectors is available, these two components are very important during the matching (comparison) process. The remaining 40 components are the samples of the Gabor filter responses at that point. Although one may use some edge information for feature point selection, it is important here to construct the feature vectors from the coefficients of the Gabor wavelet transform. As samples of the Gabor wavelet transform at the feature points, the feature vectors represent both the spatial frequency structure and the spatial relations of the local image region around the corresponding feature point.
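A 42-component feature vector of this kind can be assembled as in the short sketch below. It assumes that the 40 response maps are already available and that the stored samples are response magnitudes; both the function name `feature_vector` and the use of magnitudes are assumptions of this illustration.

```python
def feature_vector(responses, point):
    """Build [x, y, |R_1(x, y)|, ..., |R_n(x, y)|] for one feature point.

    responses is a list of 2D response maps (one per Gabor filter),
    each indexed as responses[j][y][x]; point is an (x, y) pair.
    Storing magnitudes is an assumption of this sketch.
    """
    x, y = point
    return [x, y] + [abs(r[y][x]) for r in responses]


# Toy data: 40 constant 3 x 3 complex maps, map j has magnitude j + 1
responses = [[[complex(0, j + 1)] * 3 for _ in range(3)] for j in range(40)]
v = feature_vector(responses, (2, 1))
print(len(v))  # 2 coordinates + 40 filter samples
```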

Chapter 6 Conclusion
This tutorial report introduces the well-known Gabor wavelet transform, the Gabor features derived from it, and their applications. The multi-resolution and multi-orientation properties of the Gabor wavelet transform make it a popular method for feature extraction despite its intrinsic non-orthogonality. Among the works based on Gabor wavelets, face recognition and speech recognition are the most prominent applications, and other research has likewise used Gabor wavelets mainly for feature extraction. Several Matlab implementations are presented in this tutorial to show both the theoretical and the application aspects of Gabor wavelets. There seems to be little need to modify the formula of the Gabor wavelets further, while feature representation and additional applications remain open for future work.


References
[1] F. Smeraldi and J. Bigun, "Facial feature detection by saccadic exploration of the Gabor decomposition," Proc. Int'l Conf. Image Processing, pp. 163-167.
[2] F. Samaria and F. Fallside, "Face identification and feature extraction using Hidden Markov Models," Image Processing: Theory and Applications, 1993.
[3] M. Kleinschmidt, "Methods for capturing spectro-temporal modulations in ASR," Acustica united with Acta Acustica, 2002.
[4] M. Kleinschmidt, "Spectro-temporal Gabor features as a front end for ASR," in Proc. Forum Acusticum Sevilla, 2002.
[5] T. S. Lee, "Image representation using 2D Gabor wavelets," IEEE Trans. Pattern Analysis and Machine Intelligence, 18(10), 1996.
[6] L. Shen and L. Bai, "A review of Gabor wavelets for face recognition," Patt. Anal. Appl. 9: 273-292, 2006.
[7] B. S. Manjunath, R. Chellappa, and C. von der Malsburg, "A feature based approach to face recognition," Proc. IEEE Conf. CVPR '92: 373-378, 1992.
[8] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," Proc. Int'l Conf. Automatic Face and Gesture Recognition, pp. 200-205, 1998.
