
SICE Annual Conference 2008, August 20-22, 2008, The University of Electro-Communications, Japan

Steering of Camera by Stepper Motor Towards Active Speaker Using Microphone Array

Manoj Kumar Mukul1, Rajkishore Prasad2, M. M. Choudhary3 and Fumitoshi Matsuno4
1 Electronics and Communication Engineering, Birla Institute of Technology Mesra, Ranchi, Jharkhand, India (E-mail: mkm@hi.mce.uec.ac.jp, mkm@bitmesra.ac.in)
2 University Department of Electronics, BRA Bihar University, Muzaffarpur, Bihar, India
3 National Institute of Technology, Patna, Bihar, India
4 The University of Electro-Communications, Tokyo, Japan

Abstract: This paper describes camera steering that uses a microphone array to bring the active speaker into focus. Automatic camera steering has many important applications, such as video conferencing, surveillance and monitoring. The system proposed here uses a linear microphone array to capture the speech signals of the speakers and a stepper-motor-based Camera Steering Unit (CSU). The captured speech signals are analyzed to estimate the Directions of Arrival (DOAs) of the speakers using the MUltiple SIgnal Classification (MUSIC) algorithm. The estimated DOA is then fed to the CSU, which contains a stepper motor that steers the camera to the estimated direction. The experimental results suggest that the high-resolution MUSIC algorithm can steer the camera effectively even in a noisy environment if the number of sensors in the array is increased; however, this also increases the computational cost.
Keywords: MUSIC algorithm, Microphone array, Stepper Motor Controller

Automatically steering a camera towards an active speaker is an interesting problem, and an efficient system for it has many applications. It is very useful in teleconferencing [1], acoustic surveillance [2] and human-machine communication [3]. The key problem in such a system is localizing the active speaker in a real environment [4], and localization becomes very hard in noisy and reverberant conditions. The speech signal coming from a speaker carries spatial cues, and its DOA is the same as the direction of the speaker. Different algorithms have been developed to extract DOA information from speech captured by distributed microphones or by a microphone array [4]-[9]. The aim of this paper is to describe our system for automatic camera control based on microphone-array DOA estimation. Here a small linear microphone array of three elements has been used to capture the speech signal [10]. The captured speech signals are analyzed in the frequency domain, and the DOA is estimated with the MUSIC algorithm [11]-[12]. The estimated DOA information is used by a self-designed stepper motor controller circuit to rotate the camera. The motor controller unit has been built from a buffer IC, Darlington pairs and a few discrete components, and it is interfaced with the parallel port of the PC. The interfacing and control software were written in MATLAB [13], and the PC interface uses its Data Acquisition Toolbox.



Fig. 1 Block diagram of the proposed system.

The rest of the paper is organized as follows. Section 2 provides an overview of the system. Section 3 deals with DOA estimation using a microphone array. Section 4 presents the design of the stepper motor controller circuit. Section 5 describes the experiments and results, which are followed by the conclusions and references.

The proposed method of automatic camera steering consists of two major parts. The first is DOA estimation from the captured speech signal; the second is a controller that rotates the stepper motor, on which the camera is mounted, to the estimated DOA. The block diagram of the proposed system is shown in Fig. 1. For DOA estimation, a three-element linear microphone array, which can resolve up to two DOAs, has been used. The speech signal captured by the array is transformed into the frequency domain using the Short Time Fourier Transform (STFT). The DOA is estimated in each frequency bin using the MUSIC algorithm. The averaged DOA estimate is converted by software into a rotation command and transmitted to the external motor controller hardware, which is interfaced with the parallel port of the PC.


PR0001/08/0000-0019 400 2008 SICE


The microphone array captures the spatial features of the incoming signal through spatial sampling. As a result, the speech signals captured by different elements of the array are delayed relative to one another, depending on the relative distance of the speaker from each sensor and on the velocity of sound. Let us consider a linear microphone array of $m$ elements with uniform inter-element spacing $d$. We consider here the far-field case, so that the wavefronts arriving at the microphones are planar. The delay between the signals captured by any two successive microphones from the $k$th source can be obtained from the geometry shown in Fig. 2 as $\tau = d \sin\theta_k / v$, where $v$ is the velocity of sound in air. If the first microphone is taken as the reference, the delay at the $i$th microphone is

$$\tau_i = (i-1)\,\tau = \frac{d_i \sin\theta_k}{v}, \tag{1}$$

where $d_i = (i-1)d$ is the linear separation of the $i$th microphone from the reference.

Fig. 2 Simple source localization problem.

Assuming now that $K$ sources and some noise arrive at each microphone simultaneously, the total signal received at the $i$th microphone is the combination of the noise and the $K$ incoming signals:

$$x_i(t) = \sum_{k=1}^{K} s_k(t)\, e^{-j 2\pi d_i \sin\theta_k / \lambda} + n_i(t), \tag{2}$$

where $\theta_k$ is the direction of arrival of the $k$th source, to be estimated from the array output signals, and $\lambda$ is the wavelength of the incoming narrowband signal. In matrix form, Eq. (2) becomes

$$X(t) = A(\Theta)\, S(t) + n(t), \tag{3}$$

where $\Theta = [\theta_1, \theta_2, \ldots, \theta_K]$; $X(t) = [x_1(t), x_2(t), \ldots, x_m(t)]^T$ is the $m \times 1$ vector whose $i$th component $x_i(t)$ is the signal from the $i$th microphone; $S(t) = [s_1(t), s_2(t), \ldots, s_K(t)]^T$ is the $K \times 1$ vector whose $k$th component $s_k(t)$ is the signal of the $k$th sound source; $n(t) = [n_1(t), \ldots, n_m(t)]^T$ is the noise vector; and $A(\Theta)$ is the $m \times K$ matrix whose $k$th column $a(\theta_k)$, $1 \le k \le K$, is parameterized by the angle $\theta_k$ (the DOA), which specifies the position of the $k$th source with respect to the reference sensor. For a uniform linear array, $A(\Theta)$ is a Vandermonde matrix:

$$A(\Theta) = [a(\theta_1), a(\theta_2), \ldots, a(\theta_K)], \qquad a(\theta) = \left[1,\ e^{-j\beta\sin\theta},\ \ldots,\ e^{-j(m-1)\beta\sin\theta}\right]^T,$$

with $\beta = 2\pi f d / v$.

The MUSIC algorithm estimates the DOA by exploiting the orthogonality between the array manifold $a(\theta)$, which spans the signal subspace, and the noise subspace of the captured signal. The array manifold is known from the array geometry, and the noise subspace is obtained from the eigenvalue decomposition of the covariance matrix $R = E[X(t)X^H(t)]$ of the captured signal: with the eigenvalues arranged in descending order, the eigenvectors corresponding to the $m-K$ smallest eigenvalues span the noise subspace. The estimated DOAs $\theta_M$ are the locations of the peaks of the MUSIC pseudo-spectrum, the inverse of the squared projection of the array manifold onto the noise subspace:

$$P_M(\theta) = \frac{1}{\sum_{i=K+1}^{m} \left| e_i^H\, a(\theta) \right|^2}, \tag{4}$$

where $e_i$, $i = K+1, \ldots, m$, are the eigenvectors corresponding to the noise subspace.

Using Eq. (4), the DOA is estimated in the frequency domain using the Time Frequency Series of Speech (TFSS). The process of obtaining the TFSS by $P$-point STFT analysis is shown in Fig. 3. The speech signal from each microphone is segmented into pseudo-stationary segments of 20 ms duration using a Hanning window with 50% overlap. As Fig. 3 shows, a time-frequency series consists of the spectral components of the same frequency from all analysis frames in time succession. The MUSIC method is applied in each frequency bin to estimate the DOA, and the final estimated DOA $\hat{\theta}$ is the average of the DOAs estimated in all bins:

$$\hat{\theta} = \frac{\theta_M(f_1) + \theta_M(f_2) + \cdots + \theta_M(f_{P/2-1})}{P/2-1}. \tag{5}$$
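The per-bin MUSIC estimate of Eq. (4) and the averaging of Eq. (5) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the STFT (the TFSS of each microphone) is assumed to be computed elsewhere and passed in as an array of shape (m, F, T); the number of sources K is assumed known; the speed of sound is taken as 343 m/s; the DOA is searched on a 1° grid; and the function names `steering_vector`, `music_spectrum`, `pick_peaks` and `estimate_doas` are hypothetical. The spacing d must stay below half a wavelength at the highest bin used, otherwise the pseudo-spectrum shows aliased peaks.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(theta_deg, f, m, d, v=SPEED_OF_SOUND):
    """a(theta) for an m-element uniform linear array at frequency f (Hz).

    Element i (0-based) gets phase -i * beta * sin(theta), beta = 2*pi*f*d/v,
    matching the e^{-j 2 pi d_i sin(theta) / lambda} term of Eq. (2).
    """
    beta = 2.0 * np.pi * f * d / v
    return np.exp(-1j * beta * np.arange(m) * np.sin(np.deg2rad(theta_deg)))


def music_spectrum(X_bin, f, num_sources, d, grid_deg, v=SPEED_OF_SOUND):
    """MUSIC pseudo-spectrum of Eq. (4) for one frequency bin.

    X_bin: complex array of shape (m, T), the TFSS of each microphone in this bin.
    """
    m, T = X_bin.shape
    R = X_bin @ X_bin.conj().T / T           # sample estimate of E[X X^H]
    _, eigvecs = np.linalg.eigh(R)           # eigh returns eigenvalues in ascending order
    En = eigvecs[:, : m - num_sources]       # noise subspace: the m-K smallest eigenvalues
    spectrum = np.empty(len(grid_deg))
    for n, theta in enumerate(grid_deg):
        a = steering_vector(theta, f, m, d, v)
        spectrum[n] = 1.0 / np.sum(np.abs(En.conj().T @ a) ** 2)
    return spectrum


def pick_peaks(spectrum, grid_deg, num_sources):
    """Angles of the num_sources largest interior local maxima, sorted by angle."""
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] >= spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    peaks = sorted(peaks, key=lambda i: spectrum[i], reverse=True)[:num_sources]
    return sorted(float(grid_deg[i]) for i in peaks)


def estimate_doas(stft, freqs, bins, num_sources, d, grid_deg):
    """Average the per-bin MUSIC estimates over the selected bins (Eq. (5)).

    stft: complex array of shape (m, F, T); freqs: centre frequency of each bin (Hz).
    Bins in which fewer than num_sources peaks are found are skipped.
    """
    per_bin = []
    for b in bins:
        spec = music_spectrum(stft[:, b, :], freqs[b], num_sources, d, grid_deg)
        doas = pick_peaks(spec, grid_deg, num_sources)
        if len(doas) == num_sources:
            per_bin.append(doas)
    return np.mean(np.array(per_bin), axis=0)
```

For example, with three microphones, two sources simulated at -40° and 30°, and bins at 1, 1.5 and 2 kHz (hypothetical test values, with d = 4 cm), `estimate_doas` should return values close to the simulated directions. Passing only a subset of bins in `bins` is one way to realize the reduction in computational cost mentioned in the text.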

Fig. 3 Process of obtaining TFSS by STFT analysis.

A stepper motor [14] converts digital pulses into mechanical shaft rotation. The pulses rotate the shaft in small angular steps. Such stepped rotation is achieved by aligning certain 'teeth' of the rotor with certain poles of the stator, depending on which windings are energized at any given time. As a result, there are only specific equilibrium points at which the rotor can rest. Every time a new set of pulses is delivered, the rotor rotates to the next equilibrium point, and its angular position with respect to the stator stays locked there until a new set of pulses arrives. A stepper motor controller must therefore generate and condition the pulses needed to produce the rotational steps.

A typical stepper motor controller consists of three basic elements: an indexer, which generates the low-level step-pulse and direction signals; a motor driver circuit, which converts the low-level signals into power signals that energize the windings of the stepper motor; and an interface to a PC or microcontroller. In our system, the controller is interfaced with a PC. For this purpose we designed a driver circuit consisting of the Darlington array ULN2003 and the isolation buffer IC 74LS244; we selected the ULN2003 after considering numerous other options for interfacing the stepper motor. The type of stepper motor is very important in designing the drive circuit, because different types of stepper motors require different drive circuits. We used a bifilar permanent-magnet stepper motor with a step angle of 1.8 degrees, i.e., 200 steps per revolution. The other considerations in designing the drive circuit are the voltage and current required to energize the stator windings: the stepper motor used in our lab requires 5 V and 500 mA for each phase winding. The driver circuit with its parallel-port interface is shown in Fig. 4.

Fig. 4 Complete interfaced circuit for stepper motor control.

The driver uses the ULN2003 IC, which contains an array of seven power Darlington pairs, each capable of driving 500 mA. The device has built-in base resistors, allowing direct connection to any common logic family. All the emitters are tied together and brought out to a separate terminal. Output protection diodes are included, so the device can drive inductive loads with minimal extra components. Only four of the Darlington pairs are used here. The first pin is connected to D0 of the parallel port, and each successive pin of the stepper motor is connected to the next data line. If this order is not correct, the motor will not rotate but will wiggle from side to side. Isolation is provided by the 74LS244 octal three-state buffer. Such isolation is important because the PC is a low-power circuit while the driver is a high-power circuit. Also, when the motor coils are switched off, the collapsing magnetic field produces a back EMF. A 5 V Zener diode has been connected to the power supply pin (pin 9) of the chip to absorb this back EMF.
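The role of the indexer, producing the step patterns that appear on data lines D0-D3, can be sketched in Python as follows. This is a hypothetical illustration, not the authors' MATLAB code: it assumes a full-step (two-phase-on) sequence, which the text does not specify, and it assumes the common lead of each coil goes to the positive supply, as is usual with the ULN2003, so that a 1 bit energizes the corresponding winding (the 74LS244 does not invert, and a high ULN2003 input switches its output on). `write_port` is a hypothetical callback standing in for the hardware-specific port write.

```python
import time

# Hypothetical full-step (two-phase-on) sequence for a 4-phase unipolar
# (bifilar) motor; bit 0 drives the winding wired to D0, bit 3 the one on D3.
FULL_STEP = (0b0011, 0b0110, 0b1100, 0b1001)


def step_patterns(n_steps, phase=0, sequence=FULL_STEP):
    """Data-port patterns for |n_steps| steps; the sign selects the direction.

    `phase` is the index of the pattern currently applied; the new index is
    returned so that successive moves continue the sequence without a jump.
    """
    direction = 1 if n_steps >= 0 else -1
    patterns = []
    for _ in range(abs(n_steps)):
        phase = (phase + direction) % len(sequence)
        patterns.append(sequence[phase])
    return patterns, phase


def drive(n_steps, write_port, phase=0, step_delay_s=0.01):
    """Send the patterns one at a time through the hypothetical write_port."""
    patterns, phase = step_patterns(n_steps, phase)
    for p in patterns:
        write_port(p)                 # put the pattern on D0-D3
        time.sleep(step_delay_s)      # let the rotor settle at each step
    return phase
```

Each pattern shares one energized winding with the next, so the rotor is pulled to the adjacent equilibrium point; if the windings were wired to the data lines out of order, consecutive patterns would not be adjacent, which is consistent with the text's remark that the motor then only wiggles.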


For the experiment, we used a three-element linear microphone array with an inter-element spacing of 0.02 cm. The speaker was placed at a fixed distance of 1.2 m from the array, as shown in Fig. 5. The height of the microphones relative to the speakers was also fixed, since we are interested only in estimating the azimuth. The sampling frequency was 12 kHz. Speech from two speakers situated at a distance of 1.5 m, in the directions of -40 and 30 degrees, was captured and is shown in Fig. 6. As described above, for DOA estimation the signal from each channel was subjected to a 1024-point STFT analysis to generate the TFSS. In each frequency bin, the covariance matrix of the TFSS was formed and decomposed into signal and noise subspaces by eigenvalue decomposition. The eigenvalues were arranged in descending order, and the eigenvectors corresponding to the smaller eigenvalues were taken as the noise subspace. The number of larger eigenvalues indicates the number of active sources. The results of the eigenvalue decomposition are shown in Fig. 6, in which the two larger eigenvalues indicate two active sources.
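The text determines the number of active sources by inspecting the larger eigenvalues; it does not state a numerical rule. A minimal Python sketch of one possible heuristic is shown below, assuming a hypothetical threshold `ratio` relative to the smallest eigenvalue, which is taken as the noise floor. The function name `count_sources` is illustrative; information-theoretic criteria such as MDL are a common, more principled alternative.

```python
import numpy as np


def count_sources(R, ratio=10.0):
    """Heuristic source count from the eigenvalues of a covariance matrix R.

    Eigenvalues are sorted in descending order, as in the text; those larger
    than `ratio` times the smallest eigenvalue (the assumed noise floor) are
    counted as sources. `ratio` is a hypothetical tuning parameter. The count
    is capped at m-1 so that MUSIC keeps at least one noise eigenvector.
    """
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    noise_floor = max(eigvals[-1], np.finfo(float).tiny)
    k = int(np.sum(eigvals > ratio * noise_floor))
    return min(k, len(eigvals) - 1), eigvals
```

The returned count can be passed as the number of sources K when selecting the noise eigenvectors for Eq. (4).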

Fig. 5 Experimental setup.

Fig. 5 Speech signal captured by a linear array with three microphones from two speakers (panels: Mic No 1, Mic No 2, Mic No 3; signal value vs. time in s).

Fig. 6 Eigenvalues corresponding to the signal subspace (two signals) and the noise subspace (frequency = 1.5 kHz).

The DOAs estimated using Eq. (4) are shown in Fig. 7; the two peaks correspond to the two source directions. The DOAs estimated in different frequency bins are not exactly the same, so their average is taken as the estimated DOA. The computational load can be reduced by using only selected frequency bins from different frequency ranges for DOA estimation. The implemented software decides the direction of rotation from the estimated DOA and the current angular position of the motor, and the corresponding pulses are passed to the motor driver circuit. The rotational accuracy was found to be limited by the step size of the motor, which is 1.8° for the motor used in our experiment. Nevertheless, with this step size and at the speaker distance used, the camera brought the active speaker into focus satisfactorily.

Fig. 7 Two peaks corresponding to the two source directions in the 1.5 kHz frequency bin.

The situation worsens as reverberation increases, and DOA estimation in highly reverberant conditions is very difficult. The performance of the MUSIC algorithm on data captured with different reverberation times (RT) is shown in Fig. 8 and Fig. 9. The estimated DOA is strongly affected by the reverberation time: as reverberation increases, DOA estimation becomes less accurate, so the motor, and thus the camera, rotates to a wrong position relative to the active speaker, and the speaker is only partially framed.
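The rotation logic described above (choosing the direction from the estimated DOA and the current motor position, with 1.8° steps) can be sketched in Python as follows. The sketch is hypothetical: it assumes positions are counted in whole steps from the array broadside (0°), that the camera sits at the array, and that a positive step count means one fixed direction of rotation; the names `rotation_command` and `framing_offset_m` are illustrative. It also quantifies why the 1.8° step was acceptable: the worst-case quantization error is half a step, 0.9°.

```python
import math

STEP_ANGLE_DEG = 1.8  # step angle of the motor used in the text (200 steps/rev)


def rotation_command(target_deg, current_steps, step_angle=STEP_ANGLE_DEG):
    """Plan a move from the current motor position to an estimated DOA.

    Returns (delta_steps, new_position_steps, residual_deg): the sign of
    delta_steps gives the direction of rotation, and residual_deg is the
    pointing error left by rounding the DOA to whole steps (|.| <= step/2).
    """
    target_steps = round(target_deg / step_angle)
    delta_steps = target_steps - current_steps
    residual_deg = target_deg - target_steps * step_angle
    return delta_steps, target_steps, residual_deg


def framing_offset_m(distance_m, error_deg):
    """Lateral offset at the speaker caused by a pointing error of error_deg."""
    return distance_m * math.tan(math.radians(error_deg))
```

Starting from 0 steps, `rotation_command(30.0, 0)` gives 17 steps (30.6°, a residual of about -0.6°), and a following move to -40° gives -39 steps to position -22 (-39.6°). At the 1.2 m speaker distance, the worst-case 0.9° error corresponds to `framing_offset_m(1.2, 0.9)`, about 0.019 m, i.e., roughly 2 cm at the speaker.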

Fig. 8 Change in estimated DOA peaks for speakers with increasing reverberation.


Fig. 9 Change in estimated DOA peaks for speakers with increasing reverberation.


In this paper we have presented a stepper motor control system for automatically steering a camera based on the DOA estimates of incoming speech signals obtained with a linear microphone array. We also presented the frequency-domain MUSIC algorithm and the interfacing of the stepper motor driver circuit with the parallel port of the PC. The proposed system works offline because the MUSIC method is computationally intensive; however, the computational cost can be reduced by performing DOA estimation only in well-selected frequency bins. Another possibility for reducing the computational cost is to use an adaptive technique for the eigenvalue decomposition. We plan to explore these possibilities in future work to make the system real-time.

[1] D. Giuliani, M. Matassoni, M. Omologo, P. Svaizer, "Hands free continuous speech recognition in noisy environment using a four microphone array," Proc. ICASSP, vol. 2, pp. 273-276, Detroit, 1995.
[2] M. Omologo, P. Svaizer, "Acoustic event localization using a cross-power spectrum phase based technique," Proc. ICASSP, 1994.
[3] R. K. Prasad, H. Saruwatari, K. Shikano, "Robots that can hear, understand and talk," Advanced Robotics, vol. 18, no. 5, pp. 533-564, 2004.
[4] T. Gustafsson, B. D. Rao, M. Trivedi, "Source localization in reverberant environments: modeling and statistical analysis," IEEE Trans. Speech and Audio Processing, vol. 11, no. 6, Nov. 2003.
[5] M. S. Brandstein and D. Ward (Eds.), Microphone Arrays, Springer, 2001.
[6] M. S. Brandstein and H. F. Silverman, "A practical methodology for speech source localization with microphone arrays," Computer Speech and Language, vol. 11, no. 2, pp. 91-126, Apr. 1997.
[7] J. E. Adcock, M. S. Brandstein, and H. F. Silverman, "A closed-form estimator for use with room environment microphone arrays," IEEE Trans. Speech and Audio Processing, vol. 5, pp. 45-50, Jan. 1997.
[8] H. C. Schau and A. Z. Robinson, "Passive source localization employing intersecting spherical surfaces from time-of-arrival differences," IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, pp. 1223-1225, Aug. 1987.
[9] T. Nishiura, T. Yamada, S. Nakamura, and K. Shikano, "Localization of multiple sound sources based on a CSP analysis with a microphone array," Proc. ICASSP 2000, pp. 1053-1056, Jun. 2000.
[10] Bart H. McGuyer, "Small microphone array sound localization technique and application," The Kinkaid School, Houston, TX, Oct. 2, 2001.
[11] R. Schmidt, "Multiple emitter location and signal parameter estimation," RADC Spectrum Estimation Workshop, pp. 243-258, 1979.
[12] Ding Sang, "Performance analysis of MUSIC method for DOA estimation," The University of Texas at Dallas, 1992.
[13] MATLAB, the language of technical computing, http://www.mathworks.com/products/matlab/.
[14] V. V. Athani, Stepper Motors: Fundamentals, Applications and Design.
