Dissertation

submitted for the degree of Doktor-Ingenieur (Dr.-Ing.) at the Faculty of Electrical Engineering and Information Technology of the Ruhr-Universität Bochum

Submitted by

Bochum 2006
Contents

1 Introduction
  1.1 Problem description
  1.2
    1.2.1 Segmentation
    1.2.2 Classification
    1.2.3 Tracking
  1.3
    1.3.1
    1.3.2
    1.3.3 Classification
    1.3.4 Tracking
  1.4 Selected approach
    1.4.1 Image preprocessing
    1.4.2 Initial detection
    1.4.3 Classification
    1.4.4 Tracking
  1.5 Contributions
  1.6
2 Initial detection
  2.1 Image pre-processing
    2.1.1 Image enhancement
    2.1.2 Image features
  2.2
    2.2.1
    2.2.2
  2.3
3 Classification
  3.1 Classifiers
    3.1.1 Neural networks
    3.1.2 Support vector machines
  3.2
    3.2.1 Rectangle features
    3.2.2
  3.3
  3.4
    3.4.1 Bootstrapping
    3.4.2 Component classification
    3.4.3
  3.5 Feature selection
    3.5.1
    3.5.2 Adaboost
    3.5.3 Multi-objective optimization
  3.6
4 Tracking
  4.1
  4.2
  4.3
  4.4
  4.5
5 Experimental results
  5.1 Initial detection
    5.1.1 Results on fir images
    5.1.2
  5.2 Classification
    5.2.1 Results on fir images
  5.3 Tracking
    5.3.1 Results on fir images
6 Discussion
  6.1
    6.1.1 Initial detection
    6.1.2 Classification
    6.1.3 Tracking
  6.2 Further work
  6.3

List of Figures
  2.1 Rectangle filters
  2.8 A scanning window
  5.12 Results of support vector machine classification with gradients and orientations features

List of Algorithms
Chapter 1
Introduction
1.1 Problem description
This thesis describes a complete system for the detection of pedestrians in monocular camera images and monocular fir (far infrared) images recorded from a moving car. The objective of this work is to have a system that can warn the driver of the car when a pedestrian crosses the street. Two examples of a street scene and the output of the detection system are shown in figure 1.1. There are a few important properties such a system should have:
- It should be very reliable with respect to environmental conditions; it should be able to operate in a complex city environment as well as on a country road.
- It should be able to operate in real-time on moderate hardware. The aim of the system presented in this work is to run at at least 20 frames per second on a 1 GHz desktop processor. This processing speed is roughly what can be expected to be available for driver assistance systems in cars in the coming years.
- An image pre-processing component which calculates relevant features from the image.
- A tracking component which should be able to handle the motion of the car and the motion of the pedestrian. The tracking component makes it possible to stabilize detections over time and to predict the time of contact in case of a possible collision.
1.2

1.2.1 Segmentation
The purpose of segmentation is to extract interesting regions from the image.

Given a graph $G = (V, E)$ with a weight $w((v_i, v_j))$ for each edge, a segmentation $S$ is a partition of $V$ into components, where each component $C \in S$ is a connected component in a graph $G' = (V, E')$ with $E' \subseteq E$.
If the intensity in the image at point $(p_x, p_y)$ at time $t$ is $I(p_x, p_y, t)$, the derivative of $I$ with respect to $t$ is

$$\frac{dI}{dt} = \frac{\partial I}{\partial p_x}\frac{dp_x}{dt} + \frac{\partial I}{\partial p_y}\frac{dp_y}{dt} + \frac{\partial I}{\partial t}.$$

The assumption made in the calculation of the optic flow is that the intensity of a point remains constant over time, so that $\frac{dI}{dt} = 0$. This gives

$$-\frac{\partial I}{\partial t} = \frac{\partial I}{\partial p_x}\frac{dp_x}{dt} + \frac{\partial I}{\partial p_y}\frac{dp_y}{dt}. \qquad (1.1)$$

The optic flow in each point of a region $S_p$ around a point $p$ is constrained by 1.1. The optic flow vector $v = (\frac{dp_x}{dt}, \frac{dp_y}{dt})$ can therefore be calculated by measuring the gradients in the region $S_p$ and minimizing

$$E_p = \sum_{x, y \in S_p} \left( \frac{\partial I}{\partial p_x} v_x + \frac{\partial I}{\partial p_y} v_y + \frac{\partial I}{\partial t} \right)^2.$$
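The minimization of $E_p$ above is a linear least-squares problem in $v = (v_x, v_y)$. A minimal numpy sketch of this step (central differences for the derivatives and a square window $S_p$ are illustrative assumptions, not necessarily the exact implementation used in this work):

```python
import numpy as np

def lucas_kanade_flow(I0, I1, x, y, r=2):
    """Estimate the optic flow vector v = (vx, vy) at (x, y) by minimizing
    E_p over a (2r+1) x (2r+1) region S_p around the point."""
    # Spatial gradients (central differences) and temporal derivative.
    Iy, Ix = np.gradient(I0.astype(float))
    It = I1.astype(float) - I0.astype(float)
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)  # gradients in S_p
    b = -It[sl].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares minimizer of E_p
    return v  # (vx, vy)
```

For a pattern translated by one pixel, the recovered vector points along the motion direction.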
The Active Contour Model (snake) represents a contour as a spline with an associated energy

$$E_{Snake} = \int E_{int} + E_{image} + E_{ext} \, ds$$

where $E_{int}$ represents the internal energy of the contour, $E_{image}$ the image forces, and $E_{ext}$ the external constraint forces. $E_{Snake}$ is minimized using variational calculus. The Active Contour Model is much used in medical image processing, for example for the segmentation of organs in images recorded by a medical scanner.

An alternative to the spline representation of Active Contour Models is the Level Set representation [10], which can detect contours with discontinuities. The Active Contour Model from [10] is based on the Mumford-Shah functional [29] for image segmentation. The energy function $E(c_1, c_2, \phi)$ is:

$$E(c_1, c_2, \phi) = \int (f - c_1)^2 H(\phi) + \int (f - c_2)^2 (1 - H(\phi)) + \mu \int |\nabla H(\phi)|$$

where $f$ is the image, $c_1$ and $c_2$ are the average values of $f$ inside and outside the contour, and $H(\phi)$ is the Heaviside function of the level set function $\phi$. $E(c_1, c_2, \phi)$ is minimized with respect to $c_1$, $c_2$, and $\phi$.
1.2.2 Classification

The purpose of classification in the context of computer vision is to learn to distinguish between images of a target class of objects and images of a non-target class of objects. A classifier is trained on example images of both classes. A k-nearest neighbour classifier, for example, selects the k example points closest to the example being classified and votes between those example points to determine the class of the example. Neural networks and support vector machines are explained in section 3.1.1 and section 3.1.2 respectively.

An important choice is which kind of image features are used for classification.
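The voting scheme of the k-nearest neighbour classifier can be sketched in a few lines; Euclidean distance and binary labels are illustrative assumptions:

```python
import numpy as np

def knn_classify(train_x, train_y, x, k=3):
    """Classify x by a majority vote among the k nearest training examples."""
    d = np.linalg.norm(train_x - x, axis=1)   # Euclidean distances to all examples
    nearest = np.argsort(d)[:k]               # indices of the k closest points
    votes = train_y[nearest]
    return 1 if votes.sum() * 2 > k else 0    # majority vote for labels {0, 1}
```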
1.2.3 Tracking

The purpose of tracking in the context of computer vision is usually to determine some properties (usually position and scale) of an object in the next time step or next image frame. Tracking usually involves an estimation and a confirmation of the estimation. Tracking makes it possible to integrate information about an object through time and to estimate the position and scale of the object in the near future. This can be useful for estimating the time to contact, for example. In addition, it makes it possible to limit processing in the next frame to the area in the image around where the tracker estimated the object to be. This is important in real-time image processing, for example.

Two commonly used methods for estimating the state of a system from measurements are the Kalman filter and Condensation [25]. They consist of two main steps: a prediction of the state in the next time step, and an update of this prediction with the new measurements.
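The prediction/update structure shared by these methods can be sketched for a linear Kalman filter with a simple constant-velocity model; the matrices below are illustrative assumptions, not the models used in this thesis:

```python
import numpy as np

# State = [position, velocity]; we measure only the position.
F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition (time step = 1)
H = np.array([[1.0, 0.0]])              # measurement matrix
Q = np.eye(2) * 1e-4                    # process noise covariance (assumed)
R = np.array([[0.25]])                  # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    # Prediction step: propagate the state estimate and its covariance.
    x, P = F @ x, F @ P @ F.T + Q
    # Update step: correct the prediction with the new measurement z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (np.atleast_1d(z) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Fed with positions of an object moving at constant speed, the filter converges to the true position and velocity.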
1.3
Recently, there has been a lot of interest in driver assistance applications like lane detection, car detection, traffic sign recognition, and pedestrian detection. The reasons for this are the desire to make traffic safer, to make driving more comfortable for drivers, and the recent technological progress in camera systems and computer hardware which makes this research possible.
The leaves of the hierarchy contain all templates and the nodes of the hierarchy contain the prototypes. The hierarchy of templates is scanned at a coarse-to-fine scale through the image. If a template at a certain node in the hierarchy matches (the distance measure between template and image is below a certain threshold), the templates under the node are processed.

... from intensity, color, and orientation features are used for initial detection.

In [7] and [3], a combination of a shape-based method and stereo is used for initial detection. An edge image is calculated from a grayscale image.
Usually, the legs of a pedestrian are combined into the same cluster. The temporal change in the shape of such a cluster is used to distinguish it from clusters belonging to other objects in the image. The limitation of using motion for detection is that it can only detect moving pedestrians. Also, the clustering of a color image is time consuming and does not always provide good segmentation results if the input image has a complex background.
1.3.3 Classification

It is possible to completely omit the initial detection step using a pattern recognition approach. In [31], a support vector machine classifier is scanned through the image at every scale and every location. The classifier is trained on Haar wavelets calculated from a database of pedestrian and non-pedestrian color images.
Image regions generated by a stereo algorithm searching for the legs of a pedestrian are fed into a feed-forward time delay neural network. The input to the network consists of pixel values from multiple frames. Neurons in a higher layer are only connected to a subselection of neurons in the lower layer, called receptive fields, which makes it possible to detect specific leg poses and motion patterns.
1.3.4 Tracking

One approach to tracking is building a model of human shapes and matching this model with image data.

1.4 Selected approach
dependent on the size of the scan window. This makes the initial detection routines suitable for detecting objects at all scales. These routines are mainly interesting for fir images. In grayscale images, there is usually so much gradient response from background structures that it is difficult to segment pedestrians based on gradient information alone.

The region based initial detection routines operate directly on the intensity values.
1.4.3 Classification

The classification of pedestrians is challenging because of the high variability among objects in the pedestrian class. Also, there are many pedestrian-like objects in traffic scenes, for example trees, poles, parts of buildings, and parts of cars. Therefore, an advanced classification component is required.
The image features used for classification are rectangle features [39] and histograms of gradients and orientations calculated from the image gradients.
1.4.4 Tracking

Reliable tracking of pedestrians is also challenging: the contrast between the pedestrian and the background is often low, the resolution of pedestrians in traffic scenes is often low, and there are many background objects that resemble pedestrians.
1.5 Contributions

1.6
The rest of this thesis is structured in the following way: chapter two contains a description of the initial detection routines and the image preprocessing methods they use. Chapter three describes the methods used for classification and optimization of the classification. Chapter four describes the methods used for tracking. Chapter five contains the experimental results. Chapter six contains the discussion and conclusion.
Chapter 2
Initial detection
The goal of the initial detection is to find regions of interest in the image. This chapter describes the image pre-processing methods and the initial detection routines which use the image gradients for generating initial detections.
2.1 Image pre-processing
Contrast stretching
Contrast stretching enhances an image by stretching the range of
intensity values to the maximum possible range. It does this by
applying a linear scaling to the image. The pixel with minimum
intensity in the image is set to the lowest possible value, the pixel
with maximum intensity in the image is set to the highest possible
value, and the other pixels are interpolated between the lowest and
highest possible values.
To perform contrast stretching, the minimum intensity value min and the maximum intensity value max are determined from the original image. Each pixel of the output image $j$ is then calculated from the input image $i$ using

$$ j(x, y) = \begin{cases} \text{lower} & \text{if } i(x, y) \le \min \\ \text{lower} + \frac{i(x, y) - \min}{\max - \min}\,(\text{upper} - \text{lower}) & \text{if } \min < i(x, y) < \max \\ \text{upper} & \text{if } i(x, y) \ge \max \end{cases} $$

where lower is the lowest possible intensity value and upper is the highest possible intensity value.
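The stretching above can be sketched directly in numpy; clamping against min and max happens implicitly because the image minimum and maximum are used as the stretch endpoints:

```python
import numpy as np

def contrast_stretch(img, lower=0, upper=255):
    """Linearly stretch the intensity range of img to [lower, upper]."""
    img = img.astype(float)
    mn, mx = img.min(), img.max()
    if mx == mn:                      # flat image: nothing to stretch
        return np.full(img.shape, lower, dtype=np.uint8)
    out = lower + (img - mn) / (mx - mn) * (upper - lower)
    return out.round().astype(np.uint8)
```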
Smoothing

A convolution kernel for smoothing is usually constructed with the 2D Gaussian

$$ G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}. $$

For performance reasons, in this work smoothing is performed with two 1-dimensional approximations to a Gaussian,

$$ G(y) = \{0.25,\ 0.5,\ 0.25\} $$

for a kernel size of three and

$$ G(y) = \{0.0625,\ 0.25,\ 0.375,\ 0.25,\ 0.0625\} $$

for a kernel size of five, with identical kernels $G(x)$ in the horizontal direction. The floating point values in the kernels are selected in a way that a convolution with a kernel can be performed efficiently with integer math (for a kernel size of three, for example, $G(x) = \{1, 2, 1\}$ followed by a division by 4).
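The separable, integer-only smoothing described above can be sketched as two passes with the {1, 2, 1} kernel; edge-replicated borders are an illustrative assumption:

```python
import numpy as np

def smooth(img):
    """Separable smoothing with the integer kernel {1, 2, 1} / 4,
    applied along rows and then along columns (replicated borders)."""
    img = img.astype(np.int64)
    p = np.pad(img, 1, mode="edge")
    # Horizontal pass, then vertical pass; the divisor 4 per pass keeps
    # all arithmetic in integers.
    h = (p[:, :-2] + 2 * p[:, 1:-1] + p[:, 2:]) // 4
    v = (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) // 4
    return v
```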
The rectangle features are calculated by summing pixels in rectangular image regions: for example, the sum of the pixels in the white region is subtracted from the sum of the pixels in the dark region. The filter in figure 2.1(a) gives a high output at a vertical edge, the filter in figure 2.1(b) gives a high output at a horizontal edge, and the filter in figure 2.1(c) gives a high output at a diagonal edge. The motivation for using these features is that they are extremely fast to calculate. They can be calculated in constant time regardless of their size using an integral image representation, related to summed area tables from texture mapping in computer graphics. The integral image at location $x, y$ is the sum of the pixels above and to the left of $x, y$:

$$ ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y') $$

where $ii(x, y)$ is the integral image and $i(x, y)$ is the original image. Using the cumulative row sum $s(x, y) = s(x, y - 1) + i(x, y)$ with $s(x, -1) = 0$, and $ii(x, y) = ii(x - 1, y) + s(x, y)$ with $ii(-1, y) = 0$, the integral image can be computed in one pass over the image. With the integral image, the sum of the pixels in any rectangular region can be calculated in only four array references. An example calculation is shown in figure 2.2.
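The one-pass construction and the four-reference rectangle sum can be sketched as follows (cumulative sums along both axes implement the recurrences above):

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of all pixels above and to the left of (x, y), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0:y1+1, x0:x1+1] using four references into the integral image."""
    total = ii[y1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total
```

The cost of `rect_sum` is independent of the rectangle size, which is what makes the rectangle features so fast.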
Gradients

The image gradients are calculated by convolving the smoothed image with two 1-dimensional kernels. For a filter size of 5, the horizontal gradient image $g_x$ is calculated with the kernel

$$ \{-1, -2, 0, 2, 1\} $$

applied in the horizontal direction, and the vertical gradient image $g_y$ with the same kernel applied in the vertical direction. An approximation to the energy image is

$$ E = |g_x| + |g_y|. $$

The gradient images and energy images can be used for edge detection. Also, the angle of orientation $\theta$ can be calculated from $g_x$ and $g_y$:

$$ \theta = \arctan\left(\frac{g_y}{g_x}\right). $$

An example of an image, its vertical gradients, its horizontal gradients, and its energy is shown in figure 2.3.
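A compact sketch of the gradient and energy computation (the 'same' boundary mode is an illustrative assumption):

```python
import numpy as np

def gradients(img):
    """Correlate rows and columns with the size-5 derivative kernel and
    return (gx, gy, energy); borders are handled with numpy's 'same' mode."""
    k = np.array([-1, -2, 0, 2, 1])
    img = img.astype(float)
    # np.convolve flips the kernel, so pass k reversed to get correlation with k.
    gx = np.apply_along_axis(np.convolve, 1, img, k[::-1], mode="same")
    gy = np.apply_along_axis(np.convolve, 0, img, k[::-1], mode="same")
    energy = np.abs(gx) + np.abs(gy)
    return gx, gy, energy
```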
2.2
Two different gradient based initial detection routines are described in this section. The first segments regions of interest directly from the image gradients. The second is based on scanning a vertical edge detector through the image at every location and scale.
Often, in the case of fir images, the feet and the head of pedestrians are brighter than the upper body because of the insulation of the coat. These upper body regions do not generate strong gradient magnitudes. Therefore, the routine tries to combine a region found for the feet with the head: when another region of interest is found, the two regions are combined and it is tested whether the combined region matches the width to height ratio of a pedestrian. An example of this is shown in figure 2.7. In figure 2.7(b), the complete pedestrian cannot be segmented as one region. After the combination of the feet with the head, the pedestrian is segmented as one object (figure 2.7(c)). The complete initial detection method is described in algorithm 2.
If the width to
height ratio of the combined region matches the width to height ratio
of a pedestrian, add the coordinates of the combined region to the
output list.
Figure 2.9: Scanning an edge detector through the image at every position
and scale.
Figure 2.10: Scanning an edge detector through the image at every position
and scale.
2.3
Chapter 3
Classification
The purpose of classification is to determine which of the regions of interest, provided for example by the initial detection routine, contain pedestrians and which do not. The classification consists of the following steps:

- the generation of representative, mutually exclusive training examples and validation examples of positive examples (pedestrians) and negative examples (non-pedestrians)
- the evaluation of the generalization performance of the classifier on the validation examples
3.1 Classifiers
After training, the classifier performs the forward propagation: the assignment of a class label (positive or negative) to an unseen example. In this work, the classification is always a two-class problem. The classifier learns to separate between two classes of objects, pedestrians and non-pedestrians. So, the output of the classifier is always interpreted as a binary value, pedestrian or non-pedestrian.
Two types of classifiers are considered in this work. A feed-forward neural network consists of:

- one input layer which receives input from outside the network,
- zero, one or more hidden layers which receive input from the input layer and from each other (in the case that there are multiple hidden layers),
- and an output layer.
The activation $y_k$ of a neuron $k$ at time $t$ is

$$ y_k(t) = F_k\left( \sum_j w_{jk}(t-1)\, y_j(t-1) + \theta_k(t) \right) $$

where $w_{jk}$ is the weight of the connection from neuron $j$ to neuron $k$, $\theta_k(t)$ is the bias of neuron $k$, and $F_k$ is the activation function of neuron $k$.
The activation function $F$ is usually a sigmoid:

$$ F(s_k) = \frac{1}{1 + e^{-s_k}}. $$

For a training example $p$, let $y^p$ be the output of the network and $d^p$ the desired output. The error over all training examples is defined as

$$ E = \frac{1}{2} \sum_p (d^p - y^p)^2. $$
For a neural network with at least one hidden layer, the error is minimized with error back-propagation. The change in a weight is proportional to the negative of the derivative of the error:

$$ \Delta_p w_j = -\gamma \frac{\partial E^p}{\partial w_j} $$

where $\gamma$ is the learning rate. Applying the chain rule gives

$$ \Delta_p w_{jk} = \gamma\, \delta_k^p\, y_j^p, \qquad \delta_k^p = -\frac{\partial E^p}{\partial s_k^p} $$

where $s_k^p$ is the weighted input of neuron $k$ for example $p$. For an output unit $o$ it holds that

$$ \delta_o^p = -\frac{\partial E^p}{\partial s_o^p} = (d_o^p - y_o^p)\, F_o'(s_o^p) $$

and for a hidden unit $h$ it holds that

$$ \delta_h^p = F_h'(s_h^p) \sum_{o=1}^{N_o} (d_o^p - y_o^p)\, F_o'(s_o^p)\, w_{ho}. $$

For more information about neural networks, see for example [6].
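The forward propagation and the back-propagation rules above can be sketched for a single-hidden-layer network; the layer sizes, the random initialization and the learning rate are illustrative assumptions:

```python
import numpy as np

# One hidden layer, sigmoid activations, squared error E = 1/2 (d - y)^2.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 1, (2, 3))   # input -> hidden weights
W2 = rng.normal(0, 1, (3, 1))   # hidden -> output weights

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x):
    h = sigmoid(x @ W1)          # hidden activations
    y = sigmoid(h @ W2)          # network output
    return h, y

def backprop_step(x, d, gamma=0.5):
    """One gradient step; returns the squared error before the update."""
    global W1, W2
    h, y = forward(x)
    err = 0.5 * np.sum((d - y) ** 2)
    delta_o = (d - y) * y * (1 - y)            # delta at the output unit
    delta_h = h * (1 - h) * (delta_o @ W2.T)   # deltas at the hidden units
    W2 += gamma * np.outer(h, delta_o)         # Delta w_jk = gamma * delta_k * y_j
    W1 += gamma * np.outer(x, delta_h)
    return err
```

Repeated steps on a fixed example drive the error down, which is the behaviour the derivation above guarantees for a small enough learning rate.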
A support vector machine constructs a separating hyperplane with maximal margin. This is done by solving

$$ \text{minimize } \frac{1}{2}\|w\|^2 $$

with respect to the parameters $w, b$, subject to the constraints

$$ y_i \left( (w \cdot \Phi(x_i)) + b \right) \ge 1 $$

where $\Phi$ is a mapping of the input vectors into a feature space. The constrained problem is solved by minimizing the Lagrangian

$$ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i ((w \cdot \Phi(x_i)) + b) - 1 \right) $$

where the $\alpha_i$ are Lagrange multipliers. The resulting decision function is

$$ f(x) = \mathrm{sign}\left( \sum_{i=1}^{l} \alpha_i y_i k(x, x_i) + b \right) $$

with the kernel $k(x, x_i) = \Phi(x) \cdot \Phi(x_i)$. Two examples of kernels for classification are a radial basis function kernel

$$ k(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}} $$

and a polynomial kernel

$$ k(x, y) = (x \cdot y)^d. $$
3.2
3.3
Once the available data has been divided into subdatasets, the image features are calculated and stored as feature vectors. For each of the initial detection routines, feature vectors are generated.

During training, the classification error on the validation dataset is monitored. During a certain amount of iterations after the training has started, both the training error and the validation error decrease. From a certain iteration on, the training error keeps decreasing but the validation error increases; training is stopped at this point to prevent overfitting.

For support vector machines, the parameters are selected to trade off the margin of the separating hyperplane and training error minimization, and in the case of a radial basis function kernel to find a suitable kernel parameter.
3.4
The standard classication with neural networks and support vector machines is often not optimal.
3.4.1 Bootstrapping

The class of negative training examples is usually not well defined. To generate a dataset of representative negative examples, a bootstrapping technique can be applied. This works as follows: A training dataset is generated from all positive training examples and a small number of negative examples. A classifier is trained on this data and is tested on a dataset not used for training. The false positives from the test dataset are added to the training dataset and a new classifier is trained on the training dataset. Again, this classifier is tested on another test dataset and the false positives from this dataset are added to the training examples. This procedure is repeated until a desired false positive rate is achieved.
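The bootstrapping loop above can be sketched generically; `train_fn` stands for any trainable classifier, and the batch-wise splitting of the negative pool into test sets is an illustrative assumption:

```python
import numpy as np

def bootstrap(train_fn, pos, neg_pool, rounds=3, batch=100):
    """Iteratively grow the negative training set with false positives.
    train_fn(X, y) must return a predict(X) -> {0, 1} function."""
    neg, rest = neg_pool[:batch], neg_pool[batch:]
    for _ in range(rounds):
        X = np.vstack([pos, neg])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
        predict = train_fn(X, y)                # train on the current dataset
        if len(rest) == 0:
            break
        test, rest = rest[:batch], rest[batch:]
        fp = test[predict(test) == 1]           # false positives on unseen data
        if len(fp):
            neg = np.vstack([neg, fp])          # add them to the training set
    return neg
```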
To obtain a reduced set of vectors which approximate the original decision surface, the algorithm from [8] is applied. The original decision surface is determined by the $N_s$ support vectors $s_a$ with coefficients $\alpha_a$ and labels $y_a \in \{-1, 1\}$:

$$ \Psi = \sum_{a=1}^{N_s} \alpha_a y_a \Phi(s_a). $$

This is approximated with a reduced set of vectors $z_k$ of size $N_z < N_s$ with coefficients $\beta_k$:

$$ \Psi' = \sum_{k=1}^{N_z} \beta_k \Phi(z_k) $$

where the $z_k$ and $\beta_k$ are determined with an unconstrained optimization method.
3.5 Feature selection
One approach to feature selection is to reduce the dimensionality of the complete set of features. The method for generating a linear subspace of a lower dimension is the Karhunen-Loève transform, a standard technique from statistical pattern recognition. Principal component analysis on a dataset consists of the following steps:

1. Calculation of the mean of each dimension in the dataset.
2. Subtraction of the mean of each dimension in the dataset.
3. Calculation of the covariance matrix.
4. Calculation of the eigenvectors and eigenvalues of the covariance matrix.
5. Selection of the eigenvectors with the highest eigenvalues as the basis of the reduced feature space.

For classification this is used as follows: principal component analysis is applied on the feature vectors of pedestrian images in the training dataset. The matrix containing the eigenvectors for transforming the training dataset into the reduced features dataset is stored. The transformation matrix is used to transform the feature vectors of the training data to a lower dimension and a classifier is trained on the reduced set of feature vectors. Before each forward propagation through the classifier, the transformation is applied to the feature vector using the stored matrix.
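Steps 1 to 5 and the stored transformation can be sketched as:

```python
import numpy as np

def pca_fit(X, n_components):
    """Center the data, form the covariance matrix, and keep the eigenvectors
    with the highest eigenvalues as the transformation matrix."""
    mean = X.mean(axis=0)
    C = np.cov((X - mean).T)                    # covariance matrix
    vals, vecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    W = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return mean, W

def pca_transform(X, mean, W):
    # The stored (mean, W) are applied before each forward propagation.
    return (X - mean) @ W
```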
3.5.2 Adaboost

Adaboost [19] is a learning algorithm which constructs a classifier from a number of weak classifiers. Adaboost is explained in algorithm 5. Adaboost can be used for feature selection by limiting the number of iterations in the algorithm. The features which are the most discriminative are found in increasing iteration order. In this work, a single layer perceptron with a sigmoidal activation function is used as weak learner.
Algorithm 5 Adaboost.

Input: $n$ training examples $(x_i, y_i)$ with $y_i \in \{0, 1\}$.
Output: a strong classifier $h(x)$.

1. Initialize the weights $w_{1,i} = \frac{1}{n}$ for all $i$.
2. For $t = 1, ..., T$:
3. Normalize the weights: $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$.
4. For each feature $j$, train a weak learner $h_j$ and evaluate its weighted error $\epsilon_j = \sum_i w_{t,i}\, |h_j(x_i) - y_i|$.
5. Choose the classifier $h_t$ with the lowest error $\epsilon_t$ and set $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.
6. Update the weights: $w_{t+1,i} = w_{t,i}\, \beta_t^{1 - e_i}$, where $e_i = 0$ if $x_i$ is classified correctly and $e_i = 1$ if $x_i$ is classified incorrectly.
7. The final strong classifier is

$$ h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} $$

where $\alpha_t = \log \frac{1}{\beta_t}$.
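Algorithm 5 can be sketched with simple threshold stumps standing in for the single layer perceptrons used in this work (the stumps and the exhaustive threshold search are illustrative assumptions):

```python
import numpy as np

def adaboost(X, y, T):
    """Adaboost with one threshold stump per feature as the weak learner."""
    n, n_features = X.shape
    w = np.full(n, 1.0 / n)                      # step 1: uniform weights
    chosen = []                                  # (feature, threshold, polarity, alpha)
    for _ in range(T):
        w = w / w.sum()                          # step 3: normalize the weights
        best = None
        for j in range(n_features):              # step 4: one weak learner per feature
            for thr in X[:, j]:
                for pol in (1, -1):
                    h = (pol * (X[:, j] - thr) > 0).astype(int)
                    err = np.sum(w * np.abs(h - y))
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, h)
        err, j, thr, pol, h = best               # step 5: lowest weighted error
        beta = max(err, 1e-10) / (1.0 - err)
        w = w * beta ** (1 - np.abs(h - y))      # step 6: reweight the examples
        chosen.append((j, thr, pol, np.log(1.0 / beta)))
    def classify(x):                             # step 7: weighted vote
        s = sum(a * int(p * (x[j] - t) > 0) for j, t, p, a in chosen)
        return int(s >= 0.5 * sum(a for *_, a in chosen))
    return classify
```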
A genetic algorithm can also be used for feature selection. In [17], a multi-objective feature selection is used with classification performance and feature dimension as fitness criteria. In [33], the individuals of the population encode the selected features. The initial population of individuals is selected randomly. Uniform crossover is used to generate an offspring population from the parent population. This means each gene on a chromosome from an offspring is randomly selected from the corresponding genes on the parent chromosomes. The probability of crossover is set to 0.7.
An important choice in multi-objective optimization is how individuals with multiple fitness values are sorted before reproduction. In this work, the method from [16] is used. This method is based on a comparison operator which uses two attributes of an individual: a rank and a crowding distance. The calculation of the rank is shown in algorithm 6, the calculation of the crowding distance is shown in algorithm 7. In the crowding distance calculation, $f_m^{max}$ and $f_m^{min}$ are the maximum and minimum value of objective $m$, respectively. The operator $\prec$ in algorithm 6 is the domination operator: an individual $p$ with $n$ objectives $\{p_1, ..., p_n\}$ dominates an individual $q$ if $p$ is at least as good as $q$ in every objective and better in at least one. The crowded comparison $i \prec_n j$ is defined as: $(i_{rank} < j_{rank})$, or $(i_{rank} = j_{rank})$ and $(i_{distance} > j_{distance})$.
Algorithm 6 Fast-non-dominated-sort.

Input: A population $P$.
Output: The rank $p_{rank}$ of each individual $p \in P$.

1. For each $p \in P$: $S_p = \emptyset$, $n_p = 0$.
2. For each $q \in P$:
3. If $(p \prec q)$ then $S_p = S_p \cup \{q\}$.
4. Else if $(q \prec p)$ then $n_p = n_p + 1$.
5. If $n_p = 0$ then $p_{rank} = 1$ and $F_1 = F_1 \cup \{p\}$.
6. $i = 1$.
7. While $F_i \ne \emptyset$:
8. $Q = \emptyset$.
9. For each $p \in F_i$ and each $q \in S_p$:
10. $n_q = n_q - 1$.
11. If $n_q = 0$ then $q_{rank} = i + 1$ and $Q = Q \cup \{q\}$.
12. $i = i + 1$.
13. $F_i = Q$.
Algorithm 7 Crowding-distance-assignment.

Input: A population $P$ of size $N$.
Output: The crowding distance of each individual.

1. For $i = 1, ..., N$: $P(i)_{distance} = 0$.
2. For each objective $m$:
3. Sort $P$ by objective $m$ into $I$.
4. $I(1)_{distance} = I(N)_{distance} = \infty$.
5. For $i = 2, ..., N - 1$: $I(i)_{distance} = I(i)_{distance} + (I(i+1).m - I(i-1).m) / (f_m^{max} - f_m^{min})$.

3.6
Algorithm 8 Classification of the whole image at every scale and location.

Input: An image.
Output: A list of positive classifications.

1. Repeat steps 2 until 4 for every scale and image position.
2. In the current scanning window, calculate the image features for classification.
3. Classify the feature vector from the current scanning window.
4. If the classification output is positive, add the coordinates of the scanning window to the output list.

Figure 3.4: Classifying the whole image at every scale and resolution.
Chapter 4
Tracking
The purpose of tracking is to keep track of an object through successive image frames after a positive classification. In the case of pedestrians, this is challenging. One reason for this is that the background in urban traffic scenes is complex and often contains many pedestrian-like objects.
4.1

The Hausdorff distance measures the inequality between two sets of points $P = \{p_1, ..., p_m\}$ and $Q = \{q_1, ..., q_n\}$. It is the maximum of the two directed distances $d_1 = \max_{p \in P} \min_{q \in Q} \|p - q\|$ and $d_2 = \max_{q \in Q} \min_{p \in P} \|q - p\|$.

The partial Hausdorff distance can be used to measure the inequality between subsets of two sets of points. For $P = \{p_1, ..., p_m\}$ and $Q = \{q_1, ..., q_n\}$, the partial directed Hausdorff distance is

$$ H_k(P, Q) = K^{th}_{p \in P} \min_{q \in Q} \|p - q\| $$

where $1 \le k \le m$ and $K^{th}_{p \in P}$ denotes the $k^{th}$ ranked value over all $p \in P$. Here, the feature set $P$ (intensity features) of the object being tracked is matched against the feature set $Q$ calculated from the next frame to estimate the position and scale of the pedestrian in that frame.
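The partial directed Hausdorff distance above can be sketched directly from its definition:

```python
import numpy as np

def partial_hausdorff(P, Q, k):
    """k-th ranked value of min_q ||p - q|| over p in P (partial directed
    Hausdorff distance H_k)."""
    # Pairwise distances between the two point sets.
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    nearest = d.min(axis=1)            # distance from each p to its closest q
    return np.sort(nearest)[k - 1]     # k-th smallest (1-indexed)
```

Choosing $k < m$ makes the measure robust against outlier points, which is the reason the partial distance is preferred for matching noisy image features.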
Given the position $p_{t-1}$ of the object in the previous frame, the image $I$, and the model feature set $P$, the new position $p_t$ is the position at which the distance between the model $P$ and the image feature set $Q$ is minimal. The model is then updated: a list $L_1$ of image features which are present in the model but do not correspond to an image feature in the current frame is removed from the model, and a list $L_2$ of image features which are present in the current frame but not in the model is added to the model.
4.2

Mean shift tracking [11] is based on the assumption that the position and scale of an object will not change much from one frame to the next. A target model with image feature density function $\hat{q}_z$ is compared with a target candidate at position $y$ whose feature distribution has density $\hat{p}_z(y)$, using the Bhattacharyya coefficient

$$ \hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \int \sqrt{\hat{p}_z(y)\, \hat{q}_z}\, dz. \qquad (4.1) $$

In practice, the densities are represented by an $m$-bin histogram: $\hat{q} = \{\hat{q}_u\}_{u=1,...,m}$ with $\sum_{u=1}^{m} \hat{q}_u = 1$ for the model, and $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,...,m}$ with $\sum_{u=1}^{m} \hat{p}_u = 1$ for the candidate.
The histograms are calculated with a weighting by the Epanechnikov kernel

$$ k(x) = \begin{cases} \frac{1}{2} c_d^{-1} (d + 2)(1 - x) & \text{if } x < 1 \\ 0 & \text{otherwise.} \end{cases} $$

The new position estimate $\hat{y}_1$ is calculated from the current estimate $\hat{y}_0$ as

$$ \hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left( \left\| \frac{\hat{y}_0 - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n_h} w_i\, g\!\left( \left\| \frac{\hat{y}_0 - x_i}{h} \right\|^2 \right)} \qquad (4.2) $$

where $g(x) = -k'(x)$ and the $x_i$ are the $n_h$ pixel locations in the candidate region around $\hat{y}_0$. The weights $w_i$ are

$$ w_i = \sum_{u=1}^{m} \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}\, \delta[b(x_i) - u] $$

where $b(x_i)$ is the histogram bin of pixel $x_i$.
Given the target model $\{\hat{q}_u\}$ and the position $\hat{y}_0$ of the target in the previous frame, one mean shift iteration is:

1. Calculate the candidate distribution $\{\hat{p}_u(\hat{y}_0)\}_{u=1,...,m}$ in the current frame at $\hat{y}_0$, and evaluate $\rho[\hat{p}(\hat{y}_0), \hat{q}]$ with 4.1.
2. Calculate the weights $w_i$.
3. Calculate the new position $\hat{y}_1$ with 4.2, and evaluate $\rho[\hat{p}(\hat{y}_1), \hat{q}]$ at the new position $\hat{y}_1$.
4. Repeat step 5 while $\rho[\hat{p}(\hat{y}_1), \hat{q}] < \rho[\hat{p}(\hat{y}_0), \hat{q}]$.
5. $\hat{y}_1 = \frac{1}{2}(\hat{y}_0 + \hat{y}_1)$.
6. If $\|\hat{y}_1 - \hat{y}_0\| > \epsilon$, set $\hat{y}_0 = \hat{y}_1$ and go to step 1.

4.3
The Condensation algorithm estimates the posterior density $p(x_t|Z_t)$ of the object state $x_t$ at timestep $t$ from the measurements $Z_t = \{z_1, ..., z_t\}$. The density is represented by a set of weighted samples. In each timestep, samples are drawn from the sample set of the previous timestep; samples with a high weight may be drawn multiple times. Each sample is propagated with a deterministic step followed by a random step. This gives the prior density $p(x_t|Z_{t-1})$ of timestep $t$. Weighting the samples with the observation density $p(z_t|x_t)$ gives the posterior density $p(x_t|Z_t)$ of timestep $t$.

Here, $x$ position, $y$ position, and scale are the parameters for tracking. Each sample represents a position and scale. For calculating the observation density from the image data, the contour based approach from [25] is followed. A small set of model contours is calculated by performing a principal component analysis on a set of manually labeled contours (as described in [1]). An example of a model contour is shown in figure 4.1. A sample represents a contour at a certain position and scale. The distance from the contour to the closest image feature along a fixed set of $M$ normals determines the observation density $p(z|x)$ of a sample:

$$ p(z|x) \propto \exp\left( -\frac{1}{2rM} \sum_{m=1}^{M} \min\left( \|z_1(s_m) - r(s_m)\|^2;\ \mu \right) \right) $$

where $r(s_m)$ is a point on a model contour, $z_1(s_m)$ is the closest feature to $r(s_m)$ along normal $m$, and $r$ and $\mu$ are constants. In this work, image edges are used as features for tracking. The complete outline of the Condensation algorithm is shown in algorithm 11.
4.4
A very simple model-free tracking method is based on the assumption that if a pedestrian is present in a certain frame, it will be present at approximately the same location at approximately the same scale in the next image. To find the pedestrian in the next image, the output from the initial detection routine is used. The positions and scales of the regions of interest from the initial detection routine are evaluated. If a matching region of interest is found, it is selected as the location of the pedestrian in the next frame.
The matching region of interest is found by evaluating, for each of the $N$ regions of interest, the distance $d_i = \sqrt{(x_{current} - x_i)^2 + (y_{current} - y_i)^2}$ and the scale difference $d_s = s_{current} - s_i$, and selecting the region $j$ with the smallest distance: $x_{new} = x_j$, $y_{new} = y_j$ and $s_{new} = s_j$.
(a) A positive classification in a fir image. (b) The initial detections in the next frame.
4.5
This tracking method is based on the assumption that the classification output of a pedestrian does not change much from one frame to another. Tracking a pedestrian then means locating the highest peak in the classification outputs near the position of the pedestrian in the previous frame. This method can possibly reduce the false positive detection rate, because it can be expected that false positives do not generate a similar peak in classification output as a pedestrian does. An example of the classification outputs at a few scales was shown in figure 3.4.
Chapter 5
Experimental results
5.1 Initial detection

To evaluate the initial detection routines, their output is compared with a ground truth database. There is a true positive detection if a region of interest output by the initial detection routine corresponds to a region of interest in the ground truth database. This is shown by the overlap of the red outlined rectangle with the blue outlined rectangle in figure 5.1. There is a false positive detection if the initial detection routine outputs a region of interest and there is no corresponding region of interest in the ground truth database. This is shown by the blue outlined rectangle in figure 5.1 for which there is no corresponding red outlined rectangle. There is a false negative detection if a region of interest in the ground truth database has no corresponding detection.
In addition, the ground truth database is produced by manually labeling image sequences, so there is always a certain amount of inaccuracy in the ground truth data.
Figure 5.2: Suitability of initial detections for classification. (a) Initial detections suitable for classification; (b) initial detections unsuitable for classification.
In order to properly measure the performance of the initial detection
routines, the ground-truth database should contain a large set
of representative sequences. The FIR-image ground-truth database
for initial detection consists of 4443 pedestrians for the different
temperature ranges.
Figure 5.4 shows the mean true positive rate of the initial detection routines.
At 35 °C
Figure 5.6 shows the calculation times of the initial detection routines
for FIR images. These values do not include the time required
for image pre-processing. All values are measured on a computer
with a 1470 MHz AMD Athlon processor.
An example is shown in figure 5.7: in figure 5.7(b), which is recorded
6 frames later, the sudden appearance of sunlight causes the image
to be much brighter than the previous image.
Figure 5.9 shows the calculation times of the initial detection routines
for grayscale images. These values include the time required
for image pre-processing. All values are measured on a computer
with a 1470 MHz AMD Athlon processor.
5.2
Classification
The performance of the classification is measured by calculating the
true positive rate and the false positive rate of a classifier/image-feature
combination. This is usually visualized with a so-called ROC (Receiver
Operating Characteristic) curve. A ROC curve displays the true positive
rate of the classifier at a certain false positive rate. An example of a ROC
curve is shown in figure 5.10. For neural networks, the ROC curve is
created by varying the threshold (normally 0.5) at the output neuron.
For support vector machines, the ROC curve is created by varying the
bias b in the decision function

f(x) = sign( Σ_{i=1}^{l} α_i y_i k(x, x_i) + b ).
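The construction of ROC points by sweeping a threshold over raw classifier outputs can be sketched as follows. The function name and input format are illustrative assumptions; the same sweep applies whether the score is a neural network output or the SVM value before the sign function:

```python
def roc_points(scores, labels):
    """True/false positive rate pairs obtained by sweeping a threshold
    over raw classifier outputs (labels: 1 = pedestrian, 0 = not)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        # classify as pedestrian everything scoring at or above threshold t
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

Each threshold yields one (false positive rate, true positive rate) point; plotting all of them gives the ROC curve described above.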
the datasets of 5 °C, 15 °C
Figures 5.11, 5.12, and 5.13 show the ROC curves for the support
vector machine classification on a test set where the training data
is generated by the initial detection routines from sections 2.2 and
2.3 for different image features.
Figures 5.14, 5.15, and 5.16 show the ROC curves for the neural
network classification on a test set where the training data is
generated by the initial detection routines from section 2.3.
Figure 5.17 shows the processing times of the feature vector calculation
for forward propagation through the classifier.
Figure 5.18 shows the classification times of the different classifier/feature
combinations.
The train-
Figures 5.19, 5.20, and 5.21 show the support vector machine and
neural network classification results on a test set of grayscale images
where the training data is generated using the bootstrapping method
from section 3.4.1.
Figure 5.18 shows the classification times of the different classifier/feature
combinations for FIR and grayscale images.
data from 5 °C.
All data is
(b) Adaboost
Figure 5.26: Classification times of optimization algorithms.
5.3
Tracking
The
is allowed.
A deviation in the horizontal and vertical direction from the width
and height of the ground-truth region of interest is allowed.
classification.
Figure 5.29 shows the processing times for the different tracking
algorithms in FIR images.
Chapter 6
Discussion
6.1
true positive detection rate is quite low and the false positive rate
is high.
The true positive rates in figures 5.23, 5.24 and 5.25 show that the
component classifier and the optimization methods for classification
lead to better classification performance.
Figure 5.28 and section 5.3.2 show that, aside from the tracking
method which integrates the output of the classifier scanned through
the whole image at every position and scale, the tracking results are
very disappointing.
CHAPTER 6. DISCUSSION
The
20 °C,
satisfactory anymore.
Figure 5.4 shows that for FIR images, the vertical-gradient-based
initial detection routine and the initial detection routine which
scans a vertical edge detector through the image at every position
and scale perform best at low outside temperatures.
There
Because of the many vertical structures in the background in urban
scenarios, the initial detection routines segment the pedestrian together
with a background structure, or do not segment the pedestrian at all,
because these background structures are vertical structures with a
stronger gradient magnitude in the image. Examples are shown in
figure 6.1 for FIR images. An example of the many vertical structures
in grayscale images and the difficulties this creates for the initial
detection routines from section 2.2 is shown in figure 6.2. What may
happen is that the initial detection routines only find a part of the
pedestrian. The match with the ground-truth data, which is used for
evaluating the initial detection routines, can for this reason also
be unsuccessful.
So there is
This
Even more complications arise when there are groups of pedestrians present in the image.
The
6.1.2.
The threshold parameters applied in algorithms 2.5, 3, and 4 adapt
the initial detection routines for FIR images to a specific temperature
range.
6.1.2 Classification
Figures 5.11, 5.12, 5.13 and 5.14, 5.15, 5.16 for FIR images, and
figures 5.19, 5.20, 5.21 for grayscale images show that the classification
results are very good in general.
The classifier/image
Better means that the ROC curve lies higher and more to the left. The reason for
at 5 °C.
Figures 5.11, 5.12, 5.13 and figures 5.14, 5.15, 5.16 show that for
FIR images, the classification results of the histograms of gradients
and gradient orientations features are generally the best for both
15 °C
and
the gradient in combination with the orientations of the gradient
perform better than the orientation of the gradient alone. In addition,
a feature set as large as the rectangle feature set (1140 features) is not
required for successful classification; the smaller feature set of
histograms of gradient magnitudes and gradient orientations (448
features) performs better.
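A minimal sketch of a histogram of gradient orientations weighted by gradient magnitude is shown below. It computes one global histogram with central differences, whereas the thesis features are computed per region, so the layout and bin count here are assumptions:

```python
import math

def gradient_histograms(image, n_bins=8):
    """Histogram of unsigned gradient orientations, weighted by gradient
    magnitude, over a grayscale image given as a list of rows."""
    h, w = len(image), len(image[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # central differences approximate the image gradient
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
            b = min(int(ang / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    return hist
```

Concatenating such histograms over a grid of cells gives a compact feature vector, which is why far fewer features are needed than with the rectangle feature set.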
By looking at figures 5.19, 5.20, and 5.21 it becomes clear that in the
case of grayscale images, the classification results of the histograms of
orientations features are comparable to the results of the rectangle
features.
By comparing figures 5.11, 5.12, 5.13 to figures 5.14, 5.15, 5.16
and by looking at figures 5.19, 5.20, 5.21 it becomes clear that
the results of the support vector machine classification are generally
a bit better than the results of the neural network classification.
However, the forward propagation speed of neural networks is much
higher than that of support vector machines. This can be seen in
figure 5.18.
Comparing figures 5.11, 5.12, 5.13 and figures 5.14, 5.15, 5.16 to
figures 5.19, 5.20, 5.21 shows that the classification results on FIR
images are comparable to the classification results on grayscale
images. These figures should not be compared directly, because the
training data for the FIR data is generated from initial detection
routines while the training data for the grayscale data is generated
from ground-truth data using a bootstrapping method.
The classification results on front/back views of pedestrians are
comparable to the results on side views of pedestrians. This becomes
clear by looking at figures 5.11, 5.12, 5.13 and figures 5.14, 5.15, 5.16
for FIR images and figures 5.19, 5.20, 5.21 for grayscale images. In
some figures, the front view results are better, while in other figures,
the side view results are better. This is remarkable because the class
of side views of pedestrians is less well defined than the class of
front/back views of pedestrians. The side view class contains
pedestrians oriented to the left and to the right, while the front view
class contains pedestrians of a homogeneous orientation. Apparently,
for classification, the classifier has enough representational capability
to learn both orientations.
Section 5.2 mentions that a minimum region of interest size of
20x40 pixels is used for FIR images and a minimum size of 24x48
pixels is used for grayscale images. At smaller image sizes there is
too little feature information for calculating the histograms of
gradient orientations and gradient magnitudes, and the classification
performance strongly decreases. In general, it is better to use higher
resolution images for object detection.
For
The values for c and the kernel parameter were those used when
training the support vector machine. These are not the optimal values
for the reduced network and resulted in a large set of support
vectors.
From the feature selection methods, the PCA feature selection
method gives the best classification results. It is also by far the
easiest and fastest feature selection method to apply. PCA is
therefore strongly preferable over Adaboost and multi-objective
optimization.
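The core of the PCA feature selection, projecting feature vectors onto the directions of largest variance, can be sketched with a power iteration on the covariance matrix. This stdlib-only version finds only the first component and is an illustration, not the evaluated implementation:

```python
def first_principal_component(data, iters=100):
    """Power iteration on the covariance matrix of `data` (a list of
    equal-length feature vectors) to find the unit direction of largest
    variance; projecting features onto the top components is the
    PCA-based dimensionality reduction done before classification."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # sample covariance matrix (normalized by n for simplicity)
    cov = [[sum(centered[k][i] * centered[k][j] for k in range(n)) / n
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

Repeating the iteration on the deflated covariance matrix yields further components; keeping only the first few projections is what shrinks the feature vector.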
One general limitation of neural networks and support vector machines is that both types of classifiers are black boxes.
It can
6.1.3 Tracking
It becomes clear by looking at figure 5.28 and section 5.3.2 that
the tracking methods which use image features for tracking: the
the shape of the object should not change much from one
frame to the other so the distribution of image features does
not change much from frame to frame,
(a) On a positive classification, the tracker is started. (b) Tracking locks on a part of the pedestrian
Figure 6.6: Example of Hausdorff tracker in FIR images.
(a) frame 1
If there is
(a) frame 1
The disadvantages of the trackers which integrate initial detection
and classification outputs over time are that they cannot operate
autonomously: they require the initial detection routines and the
classification routines, respectively, to run.
The results of the tracker which integrates initial detections over
time depend on the performance of the initial detection routines.
As figure 5.4 and figure 5.8 show, the performance of the initial
detection routines applied to FIR images is reasonable, while the
performance of the initial detection routines applied to grayscale
images is unacceptable. This severely limits the performance of the
tracker which integrates initial detections over time. The tracker
which integrates classification outputs over time performs much
better.
humans have no problem assigning object parts or object features to an object. In image processing, simple heuristics or
exhaustive search is required to assign object parts or object
features to an object.
6.2
Further work
the camera than background structures. This can be used in
segmentation. Distance information is available from stereo cameras
or from distance measurement sensors. Much work has been done
on the use of stereo cameras in pedestrian detection [42], [45], and
[20]. In addition, the use of distance information makes classification
easier because the class of negative examples is limited to
non-pedestrian objects at a certain distance from the camera.
When the car is not moving because it is waiting at a traffic
light, the number of pedestrians crossing the road in front of the
car in the image may be large. In this case, motion information
from optic flow or background subtraction can be used for initial
detection.
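A minimal frame-differencing sketch of this idea is shown below. Real background subtraction maintains a background model over many frames, so the single-frame difference and the threshold value here are simplifying assumptions:

```python
def frame_difference(prev, curr, threshold=20):
    """Binary motion mask from the absolute difference of two grayscale
    frames (lists of rows); pixels that changed by more than `threshold`
    are marked 1 and can seed initial detection of moving pedestrians."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]
```

Connected regions of 1-pixels in the mask would then be proposed as regions of interest; this only works while the camera itself is stationary.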
For a human, pedestrians are easier to detect in color images than
in grayscale images.
tion images used in this work. The largest performance improvement
can be achieved by using high resolution, high quality (with respect
to contrast, reflections, overblending) image data. Another possibility
would be to test other image features for classification, for example
Gabor features, although they may be difficult to apply in real-time.
In order to reliably track moving pedestrians from a fast moving
camera in real-time, a tracker is required which is flexible with
respect to shape change, large translation, and scaling of the object
it tracks. The most promising seems to be a shape-adaptive method
like the one applied in [2]. This method has the disadvantages that
it cannot be used for very small pedestrians because of aliasing
problems, and that it is difficult to apply in real-time. The use of
higher resolution images makes a method like this more feasible.
The problem of detecting occluded pedestrians, or pedestrians in a
group occluding each other, is even more difficult than detecting
single pedestrians.
6.3
This work describes a system for detecting pedestrians in FIR images
and grayscale images. The pedestrian detection system consists of
three main components:
a classification component which classifies the regions of interest
from the initial detection routines as pedestrians or
non-pedestrians,
rectangle features,
Adaboost
multi-objective optimization
a Condensation tracker,
the systematic evaluation of classifier/image-feature combinations
for pedestrian classification in FIR and grayscale images,
an evaluation of the following classifier optimization algorithms on
classification performance and forward propagation speed of the
classifiers:
the evaluation of three existing tracking algorithms for tracking
pedestrians in real-time in FIR and grayscale images:
The initial detection methods proposed in this work give reasonable
results on FIR images. The results on grayscale images are not
acceptable.
The support vector machine classification results are generally a
bit better than the neural network classification results.
With approximately 5000 classifications per image, this method is
not applicable in real-time and its false positive rate is too high for
use in a real system.
Integrating detections over time for the stabilization of positive
detections does not improve detection results, because false positives
are to a large degree persistent in time.
Bibliography
[1] A. Baumberg and D. Hogg.
image sequences. In
Technical report.
Pro-
IEEE Conference on Computer Vision and Pattern Recognition, pages 495-501, 1997.
[6] C.M. Bishop.
Shape-
In-
An Introduction to
IEEE Trans-
IEEE Transactions
A mul-
Proceedings of the
E-
International Journal
Proceedings of the
Snakes:
Active
Proceedings
C. Papageorgiou,
and T. Poggio.
Example-
IEEE
symposium, 2002.
[31] M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio. Pedestrian detection using wavelet templates. Proceedings of the conference on computer vision and pattern recognition, pages 193-199, 1997.
[32] C. Papageorgiou. A trainable system for object detection in images and video sequences. PhD thesis.
Rotation invari-
IEEE Conference
Proceed-
Proceedings of the
Curriculum Vitae
20.02.1977
Born in Drachten
09.1989 - 06.1997
09.1997 - 11.2002
02.2003 - 04.2006
Research assistant (Wissenschaftlicher Mitarbeiter),
Institut für Neuroinformatik,
Ruhr-Universität Bochum