
Real-time Tracking of Multiple People Using Stereo


David Beymer, Kurt Konolige, Bob Bolles, Chris Eveland

Artificial Intelligence Center, SRI International

Problem: people tracking for surveillance


return coarse 3D locations of people
real-time on standard hardware
multiple people in scene
stationary camera

Approach
consider: template-based tracking
maintain template of object, $T(x, y)$
correlation used to update object position $(x_p, y_p)$
template is recursively updated to handle changing object appearance:

$$T(x, y) = \alpha\, T(x, y) + (1 - \alpha)\, I(x + x_p,\; y + y_p)$$
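The recursive template update can be sketched as a running average of the stored template and the image patch at the tracked position. This is a minimal sketch under stated assumptions: the function name `update_template`, the nested-list image representation, the convention that `(x_p, y_p)` is the top-left corner of the patch, and the default weight `alpha=0.9` are all illustrative and do not come from the slides.

```python
def update_template(template, image, x_p, y_p, alpha=0.9):
    """Blend the old template with the image patch at (x_p, y_p).

    template, image: 2D lists of gray levels (rows of floats).
    (x_p, y_p): assumed to be the top-left corner of the patch in the image.
    alpha: weight kept on the old template (illustrative value).
    """
    updated = []
    for y, row in enumerate(template):
        new_row = []
        for x, old in enumerate(row):
            patch = image[y + y_p][x + x_p]
            # Running average: old appearance decays, new appearance blends in.
            new_row.append(alpha * old + (1.0 - alpha) * patch)
        updated.append(new_row)
    return updated
```

Because every update mixes in whatever lies under the tracked position, background pixels can gradually leak into the template, which is one way to see where template drift comes from.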

limitations/problems

1) object initialization/detection
2) template drift

Goal: add modality of stereo


segmentation: background subtraction on stereo disparities to detect foreground

[Figure: four panels: left image, background, disparities, foreground]

detection: person templates encoding head and torso shape


tracking:
person templates used to avoid drift
stereo segmentation used to add support template

Approach
[Figure: block diagram with boxes for background init, stereo, background subtraction, left intensity, foreground, person templates, detection, and tracking]

detection
segment foreground into depth layers
correlate with person templates

tracking
intensity and "support" templates are recursively updated
Kalman filtering on person location in 3D
person templates used to avoid drift

Related Work
Companies

Teleos Research/Autodesk, People Tracker
DEC/Compaq, Smart Kiosk [Rehg et al., 1997]
Interval, Morphin' Mirror [Darrell et al., 1998]
Sarnoff [IUW, 1998]
Texas Instruments [Flinchbaugh, 1998]
Electric Planet

Universities

MIT, Pfinder [Wren et al., 1997]
Toronto [Fieguth and Terzopoulos, 1997]
Maryland, W4S [Haritaoglu et al., 1998]
MIT, Forest of Sensors [Grimson et al., 1998]
CMU [Kanade et al., 1998]
Columbia/Lehigh [Nayar and Boult, 1998]
Boston Univ. [Rosales and Sclaroff, 1998]

Stereo module: SRI's Small Vision System (SVS)

Hardware
two CMOS cameras
low power (150 mW), inexpensive ($100 components)
adjustable baseline: 2.7'' to 6.2'' in 1'' increments
another version with DSP processing onboard

Software
stereo algorithm is area correlation based
optimized C and MMX code
20 Hz on 320x240 image, 24 disparities, 400 MHz Pentium II

SVS Stereo Results

[Figure: two panels labeled left and right]

notation:
$d(x, y)$: current disparities
$d_0(x, y)$: background disparities estimate

Background subtraction
look for disparities closer than background

$$f(x, y) = \begin{cases} d(x, y) & \text{if } d(x, y) - d_0(x, y) > \text{thresh, or } d(x, y) \text{ defined and } d_0(x, y) \text{ undefined} \\ 0 & \text{otherwise} \end{cases}$$

[Figure: left image, background $d_0(x, y)$, disparities $d(x, y)$, foreground $f(x, y)$]
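The background subtraction rule can be sketched per pixel as follows. This is an illustrative sketch, not the authors' code: the function name `foreground`, the use of `None` to mark an undefined disparity, and the default threshold value are assumptions made for the example.

```python
def foreground(d, d0, thresh=2.0):
    """Keep disparities that are closer to the camera than the background.

    d:  current disparities, 2D list; None marks an undefined disparity.
    d0: background disparity estimate, same shape and convention.
    thresh: disparity margin (illustrative value).
    Returns a 2D list holding d(x, y) for foreground pixels and 0 elsewhere.
    """
    out = []
    for row, row0 in zip(d, d0):
        out_row = []
        for cur, bg in zip(row, row0):
            if cur is None:
                out_row.append(0)        # nothing measured at this pixel
            elif bg is None or cur - bg > thresh:
                out_row.append(cur)      # new surface, or closer than background
            else:
                out_row.append(0)        # consistent with background
        out.append(out_row)
    return out
```

A larger disparity means a closer surface, so the test `cur - bg > thresh` is what "closer than background" amounts to here. Note that this sketch uses 0 as the "no foreground" value, so a genuine disparity of 0 cannot be represented in the output.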

using stereo disparities versus intensities
+ less sensitive to lighting changes, shadows
- more computationally expensive
+ can segment people at different depths
- tends to blur & expand object boundaries

Handling scale
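idea: range info from stereo can be used to fix scale of processing
avoid search over scale parameter
person width is proportional to disparity

[Figure: pinhole geometry with center of projection (COP), focal length $f$, depth $z$, and image width $w'$]

from similar triangles:
$$\frac{z}{f} = \frac{w}{w'} \;\Rightarrow\; z\, w' = f\, w = \text{const}$$

stereo equation:
$$d = \frac{b f}{z} \;\Rightarrow\; w' = d K$$

d: disparity
b: baseline
K: constant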
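Combining the two relations gives the constant explicitly: substituting z = bf/d into w' = fw/z yields w' = (w/b) d, so K = w/b when w' and d are measured in the same units (pixels). The sketch below checks this numerically. The function names, the assumed person width of 0.5 m, and the camera parameters in the usage are illustrative assumptions, not values from the slides.

```python
def image_width_via_depth(disparity_px, focal_px, baseline_m, person_width_m):
    """Two-step route: disparity -> depth z -> projected width w'."""
    z = baseline_m * focal_px / disparity_px      # stereo equation, z = b f / d
    return focal_px * person_width_m / z          # similar triangles, w' = f w / z


def image_width_direct(disparity_px, baseline_m, person_width_m):
    """One-step route: w' = K d with K = w / b (focal length cancels)."""
    k = person_width_m / baseline_m
    return k * disparity_px


if __name__ == "__main__":
    # Hypothetical setup: 0.5 m wide person, 0.1 m baseline, 500 px focal length.
    for d in (5, 10, 20):
        print(d, image_width_direct(d, 0.1, 0.5))
```

Since the focal length cancels, the template scale can be set from the measured disparity alone, which is what removes the search over scale.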

Detection
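[Flowchart: detection loop]
1) start from foreground f(x,y)
2) histogram the foreground disparities (count vs. disparity)
3) another peak? if no, exit
4) if yes, threshold disparities to get layer(x,y)
5) correlate layer(x,y) with person template
6) found person? if yes, remove person from layer(x,y) and correlate again; if no, look for another peak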
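The histogram and layer-extraction part of this loop can be sketched as below. This is a simplified illustration under several assumptions: disparities are integers, 0 marks non-foreground pixels, a "peak" is simply the most populated remaining disparity bin, and the names `extract_layers`, `min_count`, and `half_width` are hypothetical. The person-template correlation and person-removal steps are left out.

```python
from collections import Counter


def disparity_histogram(fg):
    """Count foreground pixels per disparity value (0 means background)."""
    counts = Counter()
    for row in fg:
        for d in row:
            if d > 0:
                counts[d] += 1
    return counts


def extract_layers(fg, min_count=3, half_width=1):
    """Return [(peak_disparity, mask)] for each histogram peak, largest first.

    A layer mask is 1 where the pixel's disparity lies within half_width
    of the peak, which plays the role of the thresholding step.
    """
    counts = disparity_histogram(fg)
    layers = []
    used = set()
    for peak, n in counts.most_common():
        if n < min_count:
            break                      # no more significant peaks: exit
        if peak in used:
            continue                   # already covered by an earlier layer
        used.update(range(peak - half_width, peak + half_width + 1))
        mask = [[1 if d > 0 and abs(d - peak) <= half_width else 0 for d in row]
                for row in fg]
        layers.append((peak, mask))
    return layers
```

Each returned mask would then be correlated with the person template; that step depends on the template representation, which the slides do not specify in detail.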

Detection example
during detection, extract intensity and support template
from layer(x,y)

Tracking -- coordinate space


[Figure: top view of the stereo head with left and right cameras; image coordinates (x, disparity) correspond to 3D coordinates (X, Z)]
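The mapping from image coordinates (x, disparity) to ground-plane coordinates (X, Z) can be sketched with the stereo equation plus a pinhole projection. This is an assumed formulation: the slides give only the two coordinate pairs, so the function name `image_to_ground`, the principal-point parameter `cx_px`, and the relation X = (x - cx) Z / f are standard pinhole-camera assumptions rather than details from the source.

```python
def image_to_ground(x_px, disparity_px, focal_px, baseline_m, cx_px=0.0):
    """Convert (x, disparity) in the image to (X, Z) in metres.

    Z comes from the stereo equation Z = b f / d.
    X comes from back-projecting the column through a pinhole camera.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to recover depth")
    z = baseline_m * focal_px / disparity_px
    x = (x_px - cx_px) * z / focal_px
    return x, z


if __name__ == "__main__":
    # Hypothetical camera: 500 px focal length, 0.1 m baseline.
    print(image_to_ground(50, 10, 500.0, 0.1))
```

Filtering in (X, Z) rather than in image coordinates lets a motion model work in metric units, while the predicted state can be mapped back to a disparity for segmentation.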

Tracking Steps
prediction
predict Kalman filter (X, Z)
predict person disparity

segmentation
select foreground layer around predicted disparity

localization
correlate gray level template against left image, weighted by support template [coarse localization]
correlate head/torso shape template against segmented foreground layer [re-centering step that addresses template drift]

update
Kalman filter
recursive update of intensity and support templates
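The coarse localization step, matching a gray level template while weighting each pixel by the support template, can be sketched as below. The slides say "correlate" without naming the matching measure, so this sketch swaps in a support-weighted sum of squared differences as a stand-in; the function names and the explicit list of candidate positions are also assumptions.

```python
def weighted_ssd(image, template, support, x0, y0):
    """Support-weighted mean squared difference at top-left corner (x0, y0).

    support holds per-pixel weights: near 1 on the person, near 0 on background.
    """
    total = 0.0
    weight_sum = 0.0
    for y, (t_row, s_row) in enumerate(zip(template, support)):
        for x, (t, s) in enumerate(zip(t_row, s_row)):
            diff = image[y0 + y][x0 + x] - t
            total += s * diff * diff
            weight_sum += s
    return total / weight_sum if weight_sum > 0 else float("inf")


def localize(image, template, support, candidates):
    """Return the candidate (x0, y0) with the lowest weighted mismatch."""
    return min(candidates,
               key=lambda p: weighted_ssd(image, template, support, p[0], p[1]))
```

Pixels with zero support contribute nothing to the score, so background clutter inside the template's bounding box does not pull the match away from the person.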

Tracking Videos
recursive template update

walking figure eight

running


Tracking Videos

visualizing tracks from map view

tracking under multiple occlusions


Tracking: quantitative results


Sequence  # people  # occlusions  TR    FP     MTD
1         1         0             96%   0%     6.0
2         1         0             98%   0%     4.0
3         1         0             96%   0%     10.0
4         2         0             89%   10%    2.5
5         2         0             92%   6%     11.0
6         2         1             86%   0%     9.0
7         3         2             79%   3%     7.7
8         4         2             85%   2%     5.0
9         3         6             84%   4%     5.8
10        5         10            78%   1.3%   6.6
11        4         9             69%   5.6%   7.0
12        5         20            68%   3.2%   5.4
13        5         28            70%   6.7%   6.2

TR = tracking rate
FP = false positive rate
MTD = mean time to detect
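As a quick check on the per-sequence figures, the sketch below averages TR and FP over the 13 sequences. It assumes an unweighted mean across sequences; the slides do not say how their summary means are computed, so this is only one plausible reading.

```python
# Per-sequence values copied from the results table (percentages).
TR = [96, 98, 96, 89, 92, 86, 79, 85, 84, 78, 69, 68, 70]
FP = [0, 0, 0, 10, 6, 0, 3, 2, 4, 1.3, 5.6, 3.2, 6.7]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


mean_tr = mean(TR)
mean_fp = mean(FP)

if __name__ == "__main__":
    print(f"mean TR = {mean_tr:.1f}%, mean FP = {mean_fp:.1f}%")
```

Under this assumption the mean tracking rate is about 84% and the mean false positive rate is about 3%.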

Evaluating use of stereo in tracker


Experiment: disable stereo in tracker
code modifications:
disable re-centering step
replace weighted intensity correlation with unweighted correlation

results:
mean tracking rate (TR) drops 4%
mean false positive rate (FP) increases from 3% to 10%
(qualitative) template drift causes people to be lost and re-detected

Conclusion
Stereo is an effective segmentation tool:
detection: provides a foreground segmented into different depth layers
tracking: helps to avoid template drift by focusing on foreground pixels at the object's depth

Combine segmentation with priors on person shape (i.e., head/torso templates) for person localization.
