I. INTRODUCTION
IN COMPUTER vision applications, such as video surveillance, human motion analysis, human-machine interaction,
and object-based video encoding (e.g., MPEG-4), the objects of interest are often the moving foreground objects in an image sequence. One effective way of extracting foreground objects is
to suppress the background points in the image frames [1]–[6].
To achieve this, an accurate and adaptive background model is
often desirable.
Background usually contains nonliving objects that remain
passive in the scene. The background objects can be stationary
objects, such as walls, doors and room furniture, or nonstationary objects such as wavering bushes or moving escalators.
Manuscript received June 19, 2003; revised January 29, 2004. The associate
editor coordinating the review of this manuscript and approving it for publication was Dr. Luca Lucchese.
L. Li, W. Huang, and Q. Tian are with the Institute for Infocomm Research,
Singapore 119613 (e-mail: lyli@i2r.a-star.edu.sg; wmhuang@i2r.a-star.edu.sg;
tian@i2r.a-star.edu.sg).
I. Y.-H. Gu is with the Department of Signals and Systems, Chalmers
University of Technology, SE-412 96 Göteborg, Sweden (e-mail: irenegu@
s2.chalmers.se).
Digital Object Identifier 10.1109/TIP.2004.836169
2) A new formula of Bayes decision rule is derived for background and foreground classification.
3) The background is represented using statistics of principal features associated with stationary and nonstationary background objects.
4) A novel method is proposed for learning and updating
background features that adapts to both gradual and once-off background changes.
5) The convergence of the learning process is analyzed and
a formula is derived to select a proper learning rate.
6) A new real-time algorithm is developed for foreground
object detection from complex environments.
Further, a wide range of tests is conducted on a variety of
environments, including offices, campuses, parks, commercial
buildings, hotels, subway stations, airports, and sidewalks.
The remainder of the paper is organized as follows.
After a brief literature review of existing work in Section I-A,
Section II describes the statistical modeling of complex background based on principal features. First, a new formula of
Bayes decision rule for background and foreground classification is derived. Based on this formula, an effective data
structure to record the statistics of principal features is established. Principal feature representation for different background
objects is addressed. In Section III, the method for learning
and updating the statistics of principal features is described.
Strategies to adapt to both gradual and sudden once-off
background changes are proposed. Properties of the learning
process are analyzed. In Section IV, an algorithm for foreground
object detection based on the statistical background modeling
is described. It contains four steps: change detection, change
classification, foreground segmentation, and background maintenance. Section V presents the experimental results on various
environments. Evaluations and comparisons with an existing
method are also included. Finally, conclusions are given in
Section VI.
A. Related Work
A simple and direct way to describe the background at each
pixel is to use the spectral information, i.e., the gray-scale
or color of the background pixel. Early studies describe
background features using an average of gray-scale or color
intensities at each pixel. Infinite impulse response (IIR) or
Kalman filters [7], [14], [15] are employed to update slow
and gradual changes in the background. These methods are
applicable to backgrounds consisting of stationary objects. To
tolerate the background variations caused by imaging noise,
illumination changes, and the motion of nonstationary objects,
the statistical models are used to represent the spectral features
at each background pixel. The frequently used models include
Gaussian [8], [16]–[22] and mixture of Gaussians (MoG) [4],
[23]–[25]. In these models, one or a few Gaussians are used to
represent the color distributions at each background pixel. A
mixture of Gaussian distributions can represent various background appearances, e.g., road surfaces under the sun or in the
shadows [23]. The parameters (mean, variance, and weight) for
each Gaussian are updated using an IIR filter to adapt to gradual
background changes. Moreover, by replacing an old Gaussian
with a newly learned color distribution, MoG can adapt to once-off background changes.
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 11, NOVEMBER 2004
Using (5), the pixel s with an observed feature vector v_t at time t can
be classified as a background or a foreground point, provided
that the prior and conditional probabilities P(b|s), P(v_t|s), and P(v_t|b,s)
are known in advance.

TABLE I
CLASSIFICATION OF PREVIOUS METHODS AND THE PROPOSED METHOD

If the feature vectors are properly selected and quantized, the feature observations at a background pixel s concentrate on a small number of states, i.e., there exists a small integer N1 such that

Σ_{i=1}^{N1} P(v_i|b,s) ≈ 1  and  Σ_{i=1}^{N1} P(v_i|f,s) ≈ 0    (6)

The value of N1 and the existence of such a compact representation (6) depend
on the selection and quantization of the feature vectors. The
feature vectors {v_1, ..., v_{N1}} are defined as the principal features of the
background at the pixel s.

To learn and update the prior and conditional probabilities for
the principal feature vectors, a table of statistics for the possible
principal features is established for each feature type at s. The
table is denoted as

T_v^s = { p_v^s(b), { S_{v,i}^s : i = 1, ..., N } }    (7)

where p_v^s(b) is the learned prior probability P(b|s)
based on the observation of the features, and S_{v,i}^s
records the statistics of the i-th most significant features
at pixel s. Each element contains three components

S_{v,i}^s = { p_{v,i}^s, p_{v,i}^s(b), v_i = [a_{i,1}, ..., a_{i,n}]^T }    (8)

where n is the dimension of the feature vector v, p_{v,i}^s is the
learned P(v_i|s), and p_{v,i}^s(b) is the learned P(v_i|b,s). The elements S_{v,i}^s in the table T_v^s
are sorted in descending order with respect
to the value p_{v,i}^s. The first N1 (N1 < N) elements from the table, p_{v,i}^s and p_{v,i}^s(b),
together with p_v^s(b), are used in (5) for background and foreground classification.

Fig. 1. One example of learned principal features for a static background pixel in a busy scene. The left image shows the position of the selected pixel. The two
right images are the histograms of the statistics for the most significant colors and gradients, where the height of a bar is the value of p_{v,i}^s, the light gray part is
p_{v,i}^s(b), and the top dark gray part is p_{v,i}^s - p_{v,i}^s(b). The icons below the histograms are the corresponding color and gradient features.
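As a concrete illustration, the per-pixel table of (7) and (8) and the classification step can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the class and parameter names are ours, and it assumes the decision rule (5) takes the form 2 P(v_t|b,s) P(b|s) > P(v_t|s).

```python
import numpy as np

class FeatureStats:
    """Statistics table T_v^s for one feature type at one pixel (cf. (7), (8)).

    Keeps N entries sorted by p_vi = P(v_i|s); each entry also stores
    p_vi_b = P(v_i|b,s) and the feature vector v_i itself.
    """
    def __init__(self, n_entries, n1, dim):
        self.p_b = 0.0                        # learned prior P(b|s)
        self.p_vi = np.zeros(n_entries)       # learned P(v_i|s)
        self.p_vi_b = np.zeros(n_entries)     # learned P(v_i|b,s)
        self.vecs = np.zeros((n_entries, dim))
        self.n1 = n1                          # only the first N1 entries enter (5)

    def classify(self, matched):
        """Bayes decision: background if 2*P(v_t|b,s)*P(b|s) > P(v_t|s).

        `matched` is a boolean mask over the first N1 entries marking which
        principal features match the observed vector v_t.
        """
        p_v = self.p_vi[:self.n1][matched].sum()      # estimate of P(v_t|s)
        p_v_b = self.p_vi_b[:self.n1][matched].sum()  # estimate of P(v_t|b,s)
        return 2.0 * p_v_b * self.p_b > p_v           # True -> background point
```

A pixel whose observed feature matches no principal feature yields zero sums and is therefore classified as foreground, which matches the behavior described for (26) below.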
C. Feature Selection
The next essential issue for principal feature representation
is feature selection. The significant features of different background objects are different. To achieve effective and accurate
representation of background pixels with principal features,
the employment of proper types of features is important. Three
types of features, the spectral, spatial, and temporal features,
are used for background modeling.
1) Features for Static Background Pixels: For a pixel belonging to a stationary background object, the stable and most
significant features are its color and local structure (gradient).
Hence, two tables, T_c^s and T_e^s, are used to learn the principal features, with c and e representing the color and gradient vectors, respectively. Since the
gradient is less sensitive to illumination changes, the two types
of feature vectors can be integrated under the Bayes framework
as follows.
Let v = [c^T, e^T]^T and assume that c and e are independent; the Bayes decision rule (5) then becomes

2 P(b|s) P(c_t|b,s) P(e_t|b,s) > P(c_t|s) P(e_t|s)    (9)
For the features from static background pixels, the quantization
measure should be less sensitive to illumination changes. Here,
a normalized distance measure based on the inner product of
two vectors is employed for both color and gradient vectors. The
distance measure is

d(v_t, v_i) = 1 - 2 v_t^T v_i / (||v_t||^2 + ||v_i||^2)    (10)

where v can be c or e, respectively. If d(v_t, v_i) is less than
a small value δ, v_t and v_i are matched to each other. The robustness of the distance measure (10) to illumination changes
and imaging noise is shown in [2]. The color vector is directly
obtained from the input images with 256 resolution levels for
each component, while the gradient vector is obtained by applying the Sobel operator to the corresponding gray-scale input images with 256 resolution levels. With the parameter settings listed in Table II, the representation is
found accurate enough to learn the principal features for static
background pixels. An example of principal feature representation for a static background pixel is shown in Fig. 1, where the
histograms of the most significant color and gradient features
in T_c^s and T_e^s are displayed. The histogram of the color
features shows that only the first two are the principal colors for
the background, and the histogram of the gradients shows that
the first six, excluding the fourth, are the principal gradients for
the background.
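The matching test can be sketched as follows; this assumes the normalized inner-product measure of (10) takes the form d(v1, v2) = 1 - 2 v1·v2 / (|v1|^2 + |v2|^2), and the threshold value is illustrative, not the paper's setting.

```python
import numpy as np

def feature_distance(v1, v2):
    """Normalized inner-product distance of (10):
    d = 1 - 2*<v1, v2> / (|v1|^2 + |v2|^2).

    Identical vectors give d = 0; a common brightness scaling perturbs d far
    less than a Euclidean distance would, which is why such a measure is
    less sensitive to illumination changes.
    """
    denom = np.dot(v1, v1) + np.dot(v2, v2)
    if denom == 0.0:            # both vectors zero: treat as identical
        return 0.0
    return 1.0 - 2.0 * np.dot(v1, v2) / denom

def matches(v1, v2, delta=0.02):
    """Two feature vectors match when d(v1, v2) < delta (delta illustrative)."""
    return feature_distance(v1, v2) < delta
```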
2) Features for Dynamic Background Pixels: For dynamic
background pixels associated with nonstationary objects, color
co-occurrences are used as their dynamic features. This is because the color co-occurrence between consecutive frames has
been found to be suitable to describe the dynamic features associated with nonstationary background objects, such as moving
tree branches or a flickering screen [12]. Given an interframe
change from the color c_{t-1} to c_t at the time instant t and the pixel s,
the feature vector of color co-occurrence is defined as
cc_t = [r_{t-1}, g_{t-1}, b_{t-1}, r_t, g_t, b_t]^T. Similarly, a table of
statistics for color co-occurrence, T_cc^s, is maintained at each
pixel. The color co-occurrence vector is generated by
quantizing the color components of the input images to low resolution. For example,
by quantizing the color resolution to 32 levels for each component and selecting the table sizes listed in Table II, one may obtain a good
principal feature representation for dynamic background pixels.
An example of the principal feature representation with color
co-occurrence for a flickering screen is shown in Fig. 2. Compared with the quantized color co-occurrence feature space of
32^6 cells, this implies that, with a very small number of
feature vectors, the principal features are capable of modeling
the dynamic background pixels.
Fig. 2. One example of learned principal features for dynamic background pixels. The left image shows the position of the selected pixel. The right image is the
histogram of the statistics for the most significant color co-occurrences in T_cc^s, where the height of a bar is the value of p_{cc,i}^s, the light gray part is p_{cc,i}^s(b), and
the top dark gray part is p_{cc,i}^s - p_{cc,i}^s(b). The icons below the histogram are the corresponding color co-occurrence features. In the screen, the color changes periodically among
white, dark blue, and light blue.
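The quantization step can be sketched as follows; the exact component layout of the co-occurrence vector is an assumption for illustration.

```python
import numpy as np

def cooccurrence_vector(prev_rgb, cur_rgb, levels=32):
    """Build a color co-occurrence feature for one pixel: the previous and
    current RGB colors, each component quantized from 256 input levels down
    to `levels` bins (32 in the text), stacked into one 6-D vector."""
    step = 256 // levels                    # bin width: 8 when levels = 32
    q_prev = np.asarray(prev_rgb) // step
    q_cur = np.asarray(cur_rgb) // step
    return np.concatenate([q_prev, q_cur])  # [r',g',b',r,g,b], each in 0..levels-1

# The full quantized space has 32**6 = 1,073,741,824 cells, which is why
# keeping only a handful of principal co-occurrences is such a large saving.
```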
The statistics are learned and updated gradually at each time step. When the input feature vector v_t is observed at pixel s, the table entries are updated as

p_v^s(b) ← (1 - α) p_v^s(b) + α M_t
p_{v,i}^s ← (1 - α) p_{v,i}^s + α M_t^i,  i = 1, ..., N    (11)
p_{v,i}^s(b) ← (1 - α) p_{v,i}^s(b) + α M_t M_t^i

where the learning rate α is a small positive number and
0 < α < 1. In (11), M_t = 1 means that s is classified as a
background point at time t in the final segmentation; otherwise,
M_t = 0. Similarly, M_t^i = 1 means that the i-th vector of the
table T_v^s matches the input feature vector v_t, and otherwise
M_t^i = 0.

The above updating operation states the following. If the pixel
s is labeled as a background point at time t, p_v^s(b) is slightly
increased due to M_t = 1. Further, the probabilities
for the matched feature vector are also increased due to M_t^i = 1.
However, if M_t^i = 0, then the statistics for the unmatched
feature vectors are slightly decreased. If there is no match between the feature vector v_t and the vectors in the table T_v^s, the
N-th (least significant) vector in the table is replaced by a new feature vector

S_{v,N}^s = { p_{v,N}^s = α, p_{v,N}^s(b) = α M_t, v_N = v_t }    (12)

If the pixel s is labeled as a foreground point at time t, p_v^s(b) and
p_{v,i}^s(b) are slightly decreased with M_t = 0. However, the
p_{v,i}^s for the matched vector in the table is slightly increased.
The updated elements in the table T_v^s are resorted in descending order with respect to p_{v,i}^s, such that the table keeps
the most significant features in its first positions.

These probabilities are learned gradually with the operations described by (11) and (12) at each pixel s. Since
P(v_i|s) = P(v_i|b,s)P(b|s) + P(v_i|f,s)P(f|s), summing over the principal features gives

Σ_{i=1}^{N1} p_{v,i}^s = p_v^s(b) Σ_{i=1}^{N1} p_{v,i}^s(b) + (1 - p_v^s(b)) Σ_{i=1}^{N1} P(v_i|f,s)    (13)

When a once-off background change has happened, the new background appearance soon becomes dominant after the change. With the replacement operation (12), the gradual accumulation operation (11),
and resorting at each time step, the learned new features will be
gradually moved to the first few positions in T_v^s. After some
time duration, the term on the left-hand side of (13) becomes large
(≈ 1) and the first term on the right-hand side of (13) becomes very
small, since the new background features are classified as foreground. From (6) and (13), a new background appearance at s can
be found if

(1 - p_v^s(b)) Σ_{i=1}^{N1} P(v_i|f,s) > T Σ_{i=1}^{N1} p_{v,i}^s    (14)

In (14), b denotes the previous background before the once-off
change and f denotes the new background appearance after the
once-off change. The factor T, a percentage close to 1, prevents errors caused by
a small number of foreground features. Using the notation in (7)
and (8) and substituting (13), the condition (14) becomes

Σ_{i=1}^{N1} p_{v,i}^s - p_v^s(b) Σ_{i=1}^{N1} p_{v,i}^s(b) > T Σ_{i=1}^{N1} p_{v,i}^s    (15)

Once the above condition is satisfied, the statistics for the foreground should be tuned to become the new background appearance.
According to (4), the once-off learning operation is performed
as follows:

p_{v,i}^s(b) ← (p_{v,i}^s - p_v^s(b) p_{v,i}^s(b)) / (1 - p_v^s(b))  for i = 1, ..., N1,  and  p_v^s(b) ← 1 - p_v^s(b)    (16)
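The gradual update of (11) and the replacement of (12) can be sketched as follows. The function name, the value of α, and the initialization used for a replaced entry are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def update_stats(p_b, p_vi, p_vi_b, vecs, v_t, is_background, match_idx,
                 alpha=0.005):
    """One learning step at a pixel (cf. (11), (12)); alpha is illustrative.

    `match_idx` is the index of the matched table entry, or None when no
    entry matches v_t. Every probability decays by (1 - alpha); the labels
    M_t (background?) and M_t^i (matched?) decide which entries get the
    alpha increment back.
    """
    m_t = 1.0 if is_background else 0.0
    p_b = (1 - alpha) * p_b + alpha * m_t
    p_vi = (1 - alpha) * p_vi
    p_vi_b = (1 - alpha) * p_vi_b
    if match_idx is not None:
        p_vi[match_idx] += alpha           # matched feature gains weight
        p_vi_b[match_idx] += alpha * m_t   # ... as background only if M_t = 1
    else:
        # Replacement (12): the least significant entry becomes the new
        # feature with a small initial weight (initialization assumed).
        p_vi[-1], p_vi_b[-1], vecs[-1] = alpha, alpha * m_t, v_t
    order = np.argsort(-p_vi)              # resort by p_vi, descending
    return p_b, p_vi[order], p_vi_b[order], vecs[order]
```

Repeated application of this step reproduces the qualitative behavior described above: frequently observed features drift toward the front of the table, while unmatched ones decay exponentially.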
The operations (11), (12), and (16) maintain

Σ_{i=1}^{N1} p_{v,i}^s(b) ≈ 1    (17)

which implies that the sum of the conditional probabilities of the
principal features being background will remain equal or close
to 1 during the evolution of the learning process.
Let us suppose Σ_{i=1}^{N1} p_{v,i}^s(b) < 1 at time t_0 due to some reasons,
such as the disturbance from foreground objects or the operation
of once-off learning. If the input feature vector v_t keeps matching one of the first N1 vectors
in T_v^s and s is classified as background, then from (11) we have

Σ_{i=1}^{N1} p_{v,i}^s(b)(t) = 1 - (1 - α)^(t - t_0) (1 - Σ_{i=1}^{N1} p_{v,i}^s(b)(t_0))    (18)

i.e., the sum converges back toward 1 exponentially.

Now consider a once-off background change at time t_0, after which the old background features stay in the first N1 positions and the new features fall into the next N - N1 elements of
T_v^s. Then, the statistics at time t can be described as

p_v^s(b)(t) = (1 - α)^(t - t_0) p_v^s(b)(t_0)    (19)
Σ_new p_{v,i}^s(t) = 1 - (1 - α)^(t - t_0) (1 - Σ_new p_{v,i}^s(t_0))    (20)

Since the new background appearance at pixel s after time t_0
is classified as foreground before the once-off updating with
(16), p_v^s(b) and Σ_{i=1}^{N1} p_{v,i}^s(b) decrease exponentially,
whereas the weight of the new features increases exponentially and will be
shifted to the first N1 positions in the updated table
with the sorting at each time step. Once the condition of (15) is
met at time t, the new background state is learned. To make
the expression simpler, let us assume that there is no resorting
operation. Then the condition (15) becomes

(1 - T) Σ_{i=1}^{N1} p_{v,i}^s(t) > p_v^s(b)(t) Σ_{i=1}^{N1} p_{v,i}^s(b)(t)    (21)

From (11), (19), and (20), it follows that at time t, the following
conditions hold:

p_v^s(b)(t) ≤ (1 - α)^(t - t_0)    (22)
Σ_{i=1}^{N1} p_{v,i}^s(t) ≥ 1 - (1 - α)^(t - t_0)    (23)
Σ_{i=1}^{N1} p_{v,i}^s(b)(t) ≤ (1 - α)^(t - t_0)    (24)

By substituting (22)-(24) into (21) and rearranging terms, one can
obtain

(1 - T)(1 - (1 - α)^n) > (1 - α)^(2n)    (25)

where n = t - t_0 is the number of frames required to learn the new background appearance. Equation (25) implies that if one wishes
the system to learn the new background state in no later than n
frames, one should choose α such that (25) is satisfied. For example, if the system is to respond to a once-off background
change in 20 s with the frame rate being 20 fps, (25) should be satisfied with n = 400.
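Under the IIR update of (11), a constantly observed new feature accumulates probability mass roughly as 1 - (1 - α)^n after n frames, which gives a quick way to sanity-check a chosen learning rate. The 80% target and the α value below are illustrative, not the paper's settings.

```python
import math

def frames_to_reach(alpha, target=0.8):
    """Smallest n with 1 - (1 - alpha)**n >= target: the time for a
    constantly observed new feature (starting from weight 0) to accumulate
    the target probability mass under the IIR update of (11)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - alpha))

# e.g. alpha = 0.005 needs 322 frames (~16 s at 20 fps) to reach 80% mass,
# so it would satisfy a 400-frame (20 s at 20 fps) responsiveness budget.
```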
IV. FOREGROUND OBJECT DETECTION: THE ALGORITHM
With the Bayesian formulation of background and foreground
classification, as well as the background representation with
principal features, an algorithm for foreground object detection
from complex environments is developed. It consists of four
parts: change detection, change classification, foreground object
segmentation, and background maintenance. The block diagram
of the algorithm is shown in Fig. 3. The white blocks from left to
right correspond to the first three steps, and the blocks with gray
shades correspond to background maintenance. In the first step,
unchanged background pixels in the current frame are filtered out.

Fig. 3. Block diagram of the proposed foreground object detection algorithm.
A. Change Detection

In this step, simple adaptive image differencing is used to
filter out non-change background pixels. The minor variations
of colors caused by imaging noise are filtered out to save the
computation for further processing.

Let I_t be the input image and B_t be the reference background image maintained at
time t, each with three color components. The
background difference is obtained as follows. First, image differencing and thresholding for each color component are performed, where the threshold is automatically generated using
the least median of squares (LMedS) method [31]. The background difference F_bd is then obtained by fusing the results
from the three color components. Similarly, the temporal (or interframe) difference F_td between two consecutive frames I_{t-1}
and I_t is obtained. If both differences indicate no change, the pixel is classified as a non-change background point. In general, more than 50% of the pixels would be
filtered out in this step.

B. Change Classification

If an interframe change is detected at a pixel s, it is classified as
a dynamic point; otherwise, it is classified as a static point. A
change that occurs at a static point could be caused by illumination changes, once-off background changes, or a temporarily
motionless foreground object. A change detected at a dynamic
point could be caused by a moving background or foreground
object. They are further classified as background or foreground
by using the Bayes decision rule and the statistics of the corresponding principal features.

Let v_t be the input feature vector at s and time t. The probabilities are estimated as

P(b|s) = p_v^s(b),  P(v_t|s) = Σ_{i∈U} p_{v,i}^s,  P(v_t|b,s) = Σ_{i∈U} p_{v,i}^s(b)    (26)

where U is the set of principal features in T_v^s
which match the input vector v_t, i.e.,

U = { i : i ≤ N1 and d(v_t, v_i) < δ }    (27)

If no principal feature vector in the table T_v^s matches v_t,
both P(v_t|s) and P(v_t|b,s) are set to 0. Then, the change point is
classified as background or foreground as follows.

Classification of Static Point: For a static point, the probabilities for both color and gradient features are estimated by (26)
with T_c^s and T_e^s, respectively, where the vector distance
measure in (27) is calculated as (10). In this work, the
statistics of the two types of principal features (T_c^s and T_e^s)
are learned separately. In general cases, the Bayes decision rule (9) can be applied for background and foreground classification. In some complex cases,
Fig. 4. Summary of the complete algorithm for foreground object detection.
With (30), the reference background image can follow the dynamic background changes, e.g., the changes of color between
tree branch and sky, as well as once-off background changes.
E. Memory Requirement and Computational Time
The complete algorithm is summarized in Fig. 4. The major
part of memory usage is to store the tables of the statistics
(T_c^s, T_e^s, and T_cc^s) for each pixel. In our implementation, the memory requirement for each pixel is approximately
1.78 KB. For a video with images sized 160 × 120 pixels, the
required memory is approximately 33.4 MB, while for images
sized 320 × 240 pixels, 133.5 MB is required. For a
standard PC, this is still feasible. With a 1.7-GHz Pentium CPU
PC, real-time processing of image sequences is achievable at a
rate of about 15 frames per second (fps) for images sized 160 ×
120 pixels and at a rate of 3 fps for images sized 320 × 240
pixels.
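The quoted memory figures follow directly from the per-pixel table cost; a quick check (the helper name is ours):

```python
def model_memory_mb(width, height, kb_per_pixel=1.78):
    """Total memory for the per-pixel statistics tables, in MB
    (1 KB = 1024 bytes, 1 MB = 1024 KB)."""
    return width * height * kb_per_pixel / 1024.0

# 160 x 120 -> ~33.4 MB and 320 x 240 -> ~133.5 MB, matching the text.
```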
Fig. 5. Experimental results on a meeting room environment (MR) with wavering curtains in the wind. The two examples are the results for frames 1816 and 2268.
Fig. 6. Experimental results on a lobby environment (LB) in an office building with switching on/off lights. Upper row: a frame before switching off some lights
(364). Lower row: the frame 15 s after switching off some lights (648).
V. EXPERIMENTAL RESULTS
The proposed method has been tested on a variety of indoor
and outdoor environments, including offices, campuses, parking
lots, shopping malls, restaurants, airports, subway stations, sidewalks, and other private and public sites. It has also been tested
on image sequences captured in various weather conditions, including sunny, cloudy, and rainy weather, as well as night and
crowd scenes. In all the tests, the proposed method was automatically initialized (bootstrapped) with empty tables of statistics
(i.e., p_v^s(b) = 0, p_{v,i}^s = 0, p_{v,i}^s(b) = 0, and v_i = 0 for
i = 1, ..., N and each feature type). The system gradually learned the most significant features for both stationary and nonstationary background
objects. Once the once-off updating is performed, the system
is able to separate the foreground from the background well.
MoG [4] is a widely-used adaptive background subtraction
method. It performs quite well for both stationary and nonstationary backgrounds among the existing methods [6]. The proposed method has also been compared with MoG in the experiments. The same learning rate was used for both the proposed
method and MoG in each test.1 Further, for a fair comparison,
the post processing used in the proposed method was applied
for the MoG method as well.
1A similar analysis of the learning process and dynamic performance for
MoG can be made as in Sections III-C and III-D.
The visual examples and quantitative evaluations of the experiments are described in the following two subsections, respectively.
A. Examples on Various Environments
Selected results on five typical indoor and outdoor environments are displayed in this section. The typical environments are
offices, campuses, shopping malls, subway stations, and sidewalks. In the figures of this subsection, pictures are arranged in
rows. In each row, the images from left to right are the input
frame, the background reference image maintained by the proposed method at that moment, the manually generated ground truth, and the results of the proposed method and
MoG.
1) Office Environments: Office environments include offices, laboratories, meeting rooms, corridors, lobbies, and
entrances. An office environment is usually composed of
stationary background objects. The difficulties for foreground
detection in these scenes can be caused by shadows, changes
of illumination conditions, and camouflage foreground objects
(i.e., the color of the foreground object is similar to that of the
covered background). In some cases, background may consist
of dynamic objects, such as waving curtains, running fans,
and flickering screens. Examples from two test sequences are
shown in Figs. 5 and 6, respectively.
Fig. 7. Experimental results on a campus environment (CAM) containing wavering tree branches in strong winds. They are frames 1019, 1337, and 1393.
Fig. 8. Experimental results on shopping mall environments which contain specular ground surfaces. The three examples came from a busy shopping center (SC),
an airport (AP), and a buffet restaurant (BR), respectively.
Fig. 9. Experimental results on a subway station environment (SS). The examples are frames 1993 and 2634.
Fig. 10. Experimental results of pedestrian detection from a sidewalk environment (SW) around the clock. From top to bottom are the frames from sunny, cloudy,
rainy, night, and crowd scenes.
TABLE II
PARAMETERS USED FOR ALL TEST EXAMPLES
TABLE III
LEARNING RATES USED IN THE TEST EXAMPLES
In the previous work [6], the results were evaluated quantitatively by comparison with the ground truths in terms of
1) false negative error: the number of foreground pixels that
are missed;
2) false positive error: the number of background pixels that
are misdetected as foreground.
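The two error counts can be computed directly from boolean masks; a small sketch (the function name is ours):

```python
import numpy as np

def segmentation_errors(detected, truth):
    """False negatives: ground-truth foreground pixels that are missed;
    false positives: background pixels misdetected as foreground.
    Both inputs are boolean foreground masks of the same shape."""
    fn = np.count_nonzero(truth & ~detected)
    fp = np.count_nonzero(~truth & detected)
    return fn, fp
```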
However, it is found that when averaging the measures over various environments, they are not accurate enough. In this paper,
a new similarity measure is introduced to evaluate the results of
foreground segmentation. Let A be a detected region and B be the corresponding ground truth region.
Fig. 11. Some examples of matching images with different similarity measure values. In the images, the bright color indicates the intersection of the detected
regions and the ground truths, the dark gray color indicates the false negatives, and the light gray color indicates the false positives.
TABLE IV
QUANTITATIVE EVALUATION AND COMPARISON RESULTS
Foreground objects are detected through foreground and background classification under the Bayesian framework. Our test results have shown that the principal features are effective in representing the spectral, spatial, and temporal characteristics of
the background. A learning method to adapt to the time-varying
background features has been proposed and analyzed. Experiments have been conducted on a variety of environments, including offices, public buildings, subway stations, campuses,
parking lots, airports, and sidewalks. The experimental results
have shown the effectiveness of the proposed method. Quantitative evaluation and comparison with the existing method have
shown that improved performance for foreground object detection in complex backgrounds has been achieved. Some limitations of the method have been discussed, with suggestions for
possible improvements.
ACKNOWLEDGMENT
The authors would like to thank R. Luo, J. Shang, X. Huang,
and W. Liu for their work in generating the ground truths for
evaluation.
REFERENCES
[1] D. Gavrila, "The visual analysis of human movement: A survey,"
Comput. Vis. Image Understanding, vol. 73, no. 1, pp. 82–98, 1999.
[2] L. Li and M. Leung, "Integrating intensity and texture differences for
robust change detection," IEEE Trans. Image Processing, vol. 11, pp.
105–112, Feb. 2002.
[3] E. Durucan and T. Ebrahimi, "Change detection and background extraction by linear algebra," Proc. IEEE, vol. 89, pp. 1368–1381, Oct. 2001.
[4] C. Stauffer and W. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp.
747–757, Aug. 2000.
[5] I. Haritaoglu, D. Harwood, and L. Davis, "W4: Real-time surveillance
of people and their activities," IEEE Trans. Pattern Anal. Machine Intell.,
vol. 22, pp. 809–830, Aug. 2000.
[6] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and practice of background maintenance," in Proc. IEEE Int. Conf.
Computer Vision, Sept. 1999, pp. 255–261.
[7] K. Karmann and A. von Brandt, "Moving object recognition using an
adaptive background memory," in Time-Varying Image Processing and Moving
Object Recognition, vol. 2, 1990, pp. 289–296.
[8] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Trans. Pattern Anal. Machine
Intell., vol. 19, pp. 780–785, July 1997.
[9] A. Elgammal, D. Harwood, and L. Davis, "Non-parametric model for
background subtraction," in Proc. Eur. Conf. Computer Vision, 2000.