SYS843
D. Metaheuristics and Evolutionary Optimization
Part 2: Particle Swarm Optimization
Eric Granger
Ismail Ben Ayed
SYS843: Neural Networks and Fuzzy Systems

D2-2
COURSE CONTENT
D.2 Particle Swarm Optimization
1) Swarm intelligence
2) Canonical PSO algorithm
3) PSO variants
− Dynamic environments
− Multi-objective optimization
4) Application – Evolutionary optimization of ANNs

D2-3
1) Swarm intelligence
Definitions
A family of techniques in AI:
− Systems are typically made up of a population of simple agents that interact with each other and with their environment
− There is no centralized control, but the interaction between agents produces a global behavior
− Inspired by natural phenomena: ant colonies, fish schools, bird flocking, etc.
D2-4
Application example
Distributed robotics:
Swarm-bots: Swarms of self-assembling artifacts
− A population of autonomous mobile robots (s-bots) that can self-organize to navigate, perceive and manipulate
Swarmanoid: Towards Humanoid Robotic Swarms
− Distributed robotic systems, built from small autonomous robots that are dynamically interconnected
− https://youtu.be/Hyk3D6j1DsU
D2-5
Definitions
Particle Swarm Optimization (PSO):
− A population-based stochastic optimization technique
− Developed by Eberhart and Kennedy in 1995
− PSO is initialized with a random population of candidate solutions (a swarm of particles) in the search space, and explores that space for a global optimum
− The particles fly over the search space, guided by the locations of the particles with the best fitness

D2-6
General concept
A PSO algorithm imitates the social behavior of animals and insects
− Members of the population interact with each other while learning from their own experience
− They gradually move into better regions of the solution space

D2-7
General concept
PSO evolves a swarm of particles:
− each one resides at a location in the search space
− the cost (or fitness) value associated with each particle indicates the quality of its position in the space
The particles fly over the search space with a certain velocity (direction and speed) that is influenced by:
1. its current speed and direction,
2. the best position it has found so far, and
3. the best solution found so far by its neighbors
The swarm eventually converges to ‘optimal’ positions
D2-8
General concept
The velocity of the particles (direction and speed) is guided by two components:
1. cognitive – its own previous exploration experience, and
2. social – the exploration experience in its neighborhood
Advantages:
− can converge quickly toward good solutions
− simple implementations, with few parameters
− versatility: can solve many different problems
Applications to problems with:
− a continuous, discrete or mixed search space
− dynamic and multi-objective optimization, with 1+ local minima
D2-9
PSO versus GA
Link with evolutionary computation techniques (e.g., genetic algorithms, GA)
− Like a GA, PSO also exploits populations of solutions to search for the optimum, but…
− PSO follows no concept of ‘survival of the fittest’, even though it uses the concept of fitness
− PSO does not use evolutionary operators such as mutation and crossover
− each particle evolves according to its previous experience and its relations with the other particles of the swarm
…however, some hybrid algorithms integrate concepts from both GA and PSO
D2-10
COURSE CONTENT
3) PSO variants

D2-11
Particle Swarm Optimization

LIACS Natural Computing Group Leiden University
presentation based on the article

R. Poli, J. Kennedy, T. Blackwell,
‘Particle Swarm Optimization: An Overview,’
Swarm Intelligence, 1:1, 33-57, 2007.

D2-12

D2-13
Original PSO - Algorithm
1. Randomly initialize particle positions and velocities
2. While not terminated:
• For each particle i:
- Evaluate fitness f(xi) at current position xi
- If f(xi) is better than pbesti, then update pbesti and pi
- If f(xi) is better than gbesti, then update gbesti and gi
• For each particle i:
- Update velocity vi and position xi using:
$v_i \leftarrow v_i + U(0, \phi_1) \otimes (p_i - x_i) + U(0, \phi_2) \otimes (g_i - x_i)$
$x_i \leftarrow x_i + v_i$
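The two steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `pso_original`, the box bounds, the default parameter values and the use of the whole swarm as every particle's neighborhood (a gbest topology) are assumptions made for the example, and the objective is minimized.

```python
import numpy as np

def pso_original(f, lower, upper, n_particles=20, n_iter=50,
                 phi1=2.0, phi2=2.0, seed=0):
    """Minimize f with the original PSO update; returns (g, gbest, history)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    # 1. Randomly initialize particle positions and velocities.
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = rng.uniform(-(upper - lower), upper - lower, size=(n_particles, dim))
    p = x.copy()                          # best position found by each particle (p_i)
    pbest = np.full(n_particles, np.inf)  # fitness of p_i
    g, gbest = x[0].copy(), np.inf        # best position in the neighborhood (whole swarm here)
    history = []
    # 2. Main loop (assumed termination rule: a fixed number of iterations).
    for _ in range(n_iter):
        y = np.array([f(xi) for xi in x])         # evaluate f(x_i)
        improved = y < pbest                      # update pbest_i and p_i
        p[improved] = x[improved]
        pbest[improved] = y[improved]
        best = int(np.argmin(pbest))              # update gbest and g
        if pbest[best] < gbest:
            g, gbest = p[best].copy(), pbest[best]
        history.append(gbest)
        # Velocity and position update with element-wise random vectors U(0, phi).
        u1 = rng.uniform(0.0, phi1, size=x.shape)
        u2 = rng.uniform(0.0, phi2, size=x.shape)
        v = v + u1 * (p - x) + u2 * (g - x)
        x = x + v
    return g, gbest, history

def sphere(z):
    """Illustrative objective: sum of squares, minimum at the origin."""
    return float(np.sum(np.asarray(z) ** 2))

g, gbest, history = pso_original(sphere, lower=[-5, -5], upper=[5, 5])
```

Because nothing limits the velocity in this original form, the positions may grow large; the returned `history` only records the best fitness found so far, which never gets worse.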
D2-14
Notation
For each particle i :
• xi : a vector denoting its position
• vi : the vector denoting its velocity
• f(xi ) : denotes the fitness score of xi
• pi : the best position that it has found so far

• pbesti : denotes the fitness of pi
• gi : the best position that has been found so far in its

neighborhood
• gbesti : denotes the fitness of gi
D2-15
Notation
Velocity update:
• U( 0, ϕi ) : a random vector uniformly distributed in [ 0, ϕi ]
regenerated at every generation for each particle
• ϕ1 : the acceleration coefficient determining the scale of the force in the direction of pi
• ϕ2 : the acceleration coefficient determining the scale of the force in the direction of gi
• ⊗ : denotes the element-wise multiplication operator

D2-16
Original PSO - Velocity update
$v_i \leftarrow v_i + U(0, \phi_1) \otimes (p_i - x_i) + U(0, \phi_2) \otimes (g_i - x_i)$
• Momentum: the force pulling the particle to continue in its current direction
• Cognitive component, $U(0, \phi_1) \otimes (p_i - x_i)$: the force emerging from the tendency to return to its own best solution found so far (pbest)
• Social component, $U(0, \phi_2) \otimes (g_i - x_i)$: the force emerging from the attraction of the best solution found so far in its neighborhood (l/gbest)
[Figure: vector diagram showing xi(t), vi(t), pi(t), gi(t), vi(t+1) and xi(t+1)]
D2-17
Neighborhood Topologies
For the social component, the neighborhood of each particle is
defined by its communication structure (its social network):
1. Geographical neighborhood topologies:
• Based on Euclidean proximity in the search space
• Close to the real-world paradigm but computationally expensive
[Figure: neighborhood topologies: ring (local best), global best, random graph, star]

D2-18
[Figure panels: geographical neighborhoods vs. communication network topologies]
2. Communication network topologies:
• Communication networks based on some connection graph architecture (e.g., rings, stars, von Neumann networks)
• Favored over geographical neighborhoods because of better convergence properties and fewer computations
• gbest: each particle is influenced by the best found from the entire swarm
• lbest: each particle is influenced only by the particles in its local neighborhood
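As a hedged illustration of the gbest and lbest definitions, the sketch below returns, for each particle, the index of the best particle it is influenced by. The function name `neighborhood_best`, the ring of radius 1 (each particle plus its two index neighbors) and the minimization convention are assumptions chosen for the example.

```python
import numpy as np

def neighborhood_best(pbest_fitness, topology="gbest"):
    """Index of the best informant of each particle (lower fitness is better)."""
    fit = np.asarray(pbest_fitness, dtype=float)
    n = fit.size
    if topology == "gbest":
        # Every particle is influenced by the best particle of the entire swarm.
        return np.full(n, int(np.argmin(fit)))
    if topology == "ring":
        # lbest on a ring: particle i listens to i-1, i and i+1 (with wrap-around).
        idx = np.arange(n)
        neighbors = np.stack([(idx - 1) % n, idx, (idx + 1) % n], axis=1)
        return neighbors[idx, np.argmin(fit[neighbors], axis=1)]
    raise ValueError("unknown topology: " + topology)

fitness = [5.0, 1.0, 4.0, 3.0, 2.0]
print(neighborhood_best(fitness, "gbest"))  # all particles follow particle 1
print(neighborhood_best(fitness, "ring"))   # particles 3 and 4 follow particle 4
```

With the ring, the information about particle 1 needs several iterations to reach particles 3 and 4, which is why lbest topologies spread information more slowly than gbest.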

D2-19
• There is no clear way of selecting the best topology for a given problem
• Compromise between exploration and exploitation
– some neighborhood topologies are better for local search, while others are better for global search
- lbest topologies seem better for the global distributed search of an optimum
- gbest topologies seem better for the local search of an optimum (they propagate information fastest to the entire population)

D2-20
Synchronous vs. Asynchronous
• Synchronous updates:
– Personal best and neighborhood bests are updated separately
from position and velocity vectors
– Slower feedback about best positions
– Better for gbest PSO
• Asynchronous updates:
– New best positions updated after each particle position update
– Immediate feedback about best regions of the search space
– Better for lbest PSO
D2-21
Acceleration Coefficients
• The boxes show the distribution of the random vectors of the attracting forces of the local best and global best
• The acceleration coefficients determine the scale distribution of the random cognitive component vector and the social component vector
[Figure: two panels, ϕ1 = ϕ2 = 1 and ϕ1, ϕ2 > 1, each showing xi(t), vi(t), pi(t) and gi(t)]
D2-22
Original PSO - Problems
• The acceleration coefficients should be set sufficiently high
• Higher acceleration coefficients result in less stable systems in which the velocity has a tendency to explode
• To fix this, the velocity vi is usually kept within the range [-vmax, +vmax]
• However, limiting the velocity does not necessarily prevent particles from leaving the search space, nor does it guarantee convergence
D2-23
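A small numerical sketch of velocity clamping, under the assumption of a search space [-1, 1] per dimension and vmax = 1 (both values, and the helper name `clamp_velocity`, are illustrative only). It also shows why clamping the velocity does not by itself keep a particle inside the search space.

```python
import numpy as np

def clamp_velocity(v, v_max):
    """Keep every velocity component within [-v_max, +v_max]."""
    return np.clip(v, -v_max, v_max)

x = np.array([0.9, -0.9, 0.0])   # current position, inside [-1, 1]
v = np.array([3.0, -5.0, 0.5])   # velocity before clamping
v_clamped = clamp_velocity(v, v_max=1.0)
x_new = x + v_clamped            # the first two components still leave [-1, 1]
```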
Inertia weighted PSO
• An inertia weight ω was introduced to control the velocity explosion:
$v_i \leftarrow \omega v_i + U(0, \phi_1) \otimes (p_i - x_i) + U(0, \phi_2) \otimes (g_i - x_i)$
• If ω, ϕ1 and ϕ2 are set correctly, this update rule allows for convergence without the use of vmax
• The inertia weight can be used to control the balance between exploration and exploitation:
– ω ≥ 1: velocities increase over time, swarm diverges
– 0 < ω < 1: particles decelerate, convergence depends on ϕ1 and ϕ2
• Rule-of-thumb settings: ω = 0.7298 and ϕ1 = ϕ2 = 1.49618
Shi, Y., Eberhart, R., 'A modified particle swarm optimizer', in Evolutionary Computation Proceedings, 1998. IEEE World Congress on Computational Intelligence, The 1998 IEEE International Conference on, pp. 69-73 (1998).
D2-24
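The inertia-weighted rule can be written as a single update function. The sketch below uses the rule-of-thumb values quoted on the slide as defaults; the function name `inertia_update` and its argument layout are assumptions for illustration.

```python
import numpy as np

OMEGA, PHI1, PHI2 = 0.7298, 1.49618, 1.49618  # rule-of-thumb settings from the slide

def inertia_update(x, v, p, g, rng, omega=OMEGA, phi1=PHI1, phi2=PHI2):
    """One inertia-weighted PSO step for a single particle (or a whole swarm array)."""
    u1 = rng.uniform(0.0, phi1, size=x.shape)  # random vector for the cognitive term
    u2 = rng.uniform(0.0, phi2, size=x.shape)  # random vector for the social term
    v_new = omega * v + u1 * (p - x) + u2 * (g - x)
    return x + v_new, v_new

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
v = np.array([1.0, -2.0])
# When the particle already sits on both attractors, only the momentum term remains,
# so the velocity simply shrinks by the factor omega at every step.
x_new, v_new = inertia_update(x, v, p=x.copy(), g=x.copy(), rng=rng)
```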

D2-25
Time decreasing inertia weight
• Eberhart and Shi suggested decreasing ω over time (typically from 0.9 to 0.4), thereby gradually changing from exploration to exploitation
• Other schemes for dynamically changing the inertia weight have also been proposed
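A linear schedule is one simple way to decrease the inertia weight from 0.9 to 0.4. The sketch below assumes a linear decay over a known maximum number of iterations; the function name `linear_inertia` is hypothetical, and other decay shapes are equally possible.

```python
def linear_inertia(k, k_max, w_start=0.9, w_end=0.4):
    """Inertia weight at iteration k, decreasing linearly from w_start to w_end."""
    frac = min(max(k / k_max, 0.0), 1.0)   # progress through the run, clipped to [0, 1]
    return w_start + (w_end - w_start) * frac

weights = [linear_inertia(k, 100) for k in (0, 50, 100)]  # start, middle and end of a run
```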
D2-26
Eberhart, R. C. Shi, Y., 'Comparing inertia weights and constriction
Examples of benchmark functions
Rastrigin Griewank
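For reference, the two benchmark functions named above can be written with their standard textbook definitions, both of which have a global minimum of 0 at the origin. The slide only shows their plots, so the formulas below are supplied from the standard definitions rather than from the slide itself.

```python
import numpy as np

def rastrigin(x):
    """Rastrigin function: many regularly spaced local minima, global minimum 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def griewank(x):
    """Griewank function: global minimum 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))
```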

D2-27
COURSE CONTENT
3) PSO variants

D2-28
D.2(3) PSO variants
Particle Swarm Optimization

An Introduction and its Recent Developments
X. Li and A. P. Engelbrecht
(Tutorial prepared for GECCO 2007)

D2-29

D2-30
Many other PSO variants
• Binary/discrete particle swarms

• Constricted coefficients PSO
• PSO for noisy fitness functions
• PSO for dynamical problems
• PSO for multi-objective optimization problems
• Adaptive particle swarms
• PSO with diversity control
• Hybrids (e.g. with evolutionary algorithms)

D2-31
Dynamic Optimisation Problems
• Originally developed for static optimization problems, the PSO algorithm has been adapted for the dynamic case by adding mechanisms to:
1) modify the social influence to maintain diversity in the optimization space and detect several optima;
2) detect changes in the objective function by using the memory of each particle; and
3) adapt the memory of its population if changes occur in the optimization environment

D2-32
Dynamic Niching PSO [Nickabadi, 2008]:
• maintains diversity among subswarms
• neighborhood topology: dynamically creates subswarms around local best particle positions
̶ the size of neighborhoods is defined by the distance among particles
• free particles (not in a subswarm): explore independently
̶ re-initialized if they converge to non-optimal solutions
33
Modified Speciation PSO [Blackwell, 2008]:
• maintains diversity among and within subswarms
• neighborhood topology: groups subswarms around local best particle positions
̶ ranks particles by fitness
̶ lbest is defined as the particle with the best fitness outside the range of other subswarms
• anti-convergence: re-initializes particles from the least fit subswarm
• quantum cloud re-sampling procedure: randomly repositions particles around the center of their subswarm
34/
22
Dynamical Niching PSO (DNPSO) [Nickabadi, 2008]
• This algorithm maintains diversity in the search space by:
1) using a local neighborhood topology, where sub-swarms are dynamically created around masters (particles that are their own local best in their neighborhood),
2) defining a minimal distance within which two masters cannot co-exist,
3) allowing free particles that do not belong to a sub-swarm to move independently, and
4) reinitializing those free particles that exhibit low velocities, meaning that they have converged on a non-optimal position.
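Step 4 above can be sketched as follows. This is a simplified illustration, not the published DNPSO: the speed threshold `v_min`, the uniform re-sampling inside a box [lower, upper] and the function name `reinit_stalled_free_particles` are all assumptions made for the example.

```python
import numpy as np

def reinit_stalled_free_particles(x, v, is_free, v_min, lower, upper, rng):
    """Re-sample free particles whose speed fell below v_min; others are left unchanged."""
    speed = np.linalg.norm(v, axis=1)          # Euclidean speed of every particle
    stalled = is_free & (speed < v_min)        # free AND almost motionless
    x, v = x.copy(), v.copy()
    n, dim = int(stalled.sum()), x.shape[1]
    x[stalled] = rng.uniform(lower, upper, size=(n, dim))
    v[stalled] = rng.uniform(-(upper - lower), upper - lower, size=(n, dim))
    return x, v, stalled

rng = np.random.default_rng(0)
x = np.zeros((3, 2))
v = np.array([[1e-6, 0.0], [1e-6, 0.0], [1.0, 1.0]])
is_free = np.array([True, False, True])        # particle 1 belongs to a sub-swarm
x_new, v_new, stalled = reinit_stalled_free_particles(
    x, v, is_free, v_min=1e-3, lower=-5.0, upper=5.0, rng=rng)
```

Only particle 0 is both free and slow, so it is the only one that gets a new position and velocity.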
D2-35
Multi-objective optimization
Originally developed for static mono-objective optimization, the PSO algorithm is adapted for multi-objective optimization problems with mechanisms to:
1) select and update leaders
2) promote diversity in the creation of new solutions using PSO position update and mutation operators
Algorithms for MOPSO are classified as: aggregating, lexicographic ordering, sub-population, Pareto-based, or combined approaches.

D2-36
Algorithms for multi-objective optimization aim to generate and
select a set of non-dominated solutions (that belong to a Pareto
front), instead of a single solution as in global optimization:
In multi-objective PSO problems:
• Each particle may have a different set of leaders from which just
one can be selected to update its position. The set of leaders is
stored in an external archive of non-dominated solutions.
• The solutions contained in the archive are used as leaders to
update particle positions, and are also reported as the final output
of the algorithm.
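The notions of dominance and of an external archive of non-dominated solutions can be sketched as below. All objectives are assumed to be minimized, and the objective pairs used in the example (an error rate and a network size) are made-up illustrative values; the helper names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (every objective minimized)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better

def update_archive(archive, candidate):
    """Return the archive after offering it a candidate objective vector."""
    if any(dominates(m, candidate) or tuple(m) == tuple(candidate) for m in archive):
        return list(archive)                       # candidate is rejected
    # Candidate enters; every member it dominates is removed.
    return [m for m in archive if not dominates(candidate, m)] + [candidate]

archive = []
for point in [(0.3, 10), (0.2, 20), (0.4, 30), (0.1, 20), (0.3, 5)]:
    archive = update_archive(archive, point)
print(sorted(archive))  # [(0.1, 20), (0.3, 5)]
```

The members left in the archive are the candidate leaders, and they are also what the algorithm reports as its final output.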
D2-37
MOPSO algorithms:
1) The swarm is first initialized. A set of leaders is also initialized
with the non-dominated particles from the swarm, and stored in
an external archive.
2) During each generation and for each particle, a leader is
selected, and the particle position is updated.
3) The particle’s fitness is then evaluated and its corresponding
pbest value is updated. A new particle usually replaces its pbest
particle when this particle is dominated or if both are non-
dominated with respect to each other.
D2-38
MOPSO algorithms:
• Pareto-based approaches use leader selection techniques
based on Pareto dominance [Coello Coello, 2008].
• Leaders are defined as particles that are non-dominated with
respect to the swarm.
• Most authors adopt additional information (e.g., information
provided by a density estimator) in order to avoid a random
selection of a leader from the current set of non-dominated
solutions.

D2-39
COURSE CONTENT
3) PSO variants
4) Application – Evolutionary optimization of ANNs

D2-40
D.2(4) FAM optimization
Face recognition in video surveillance
Generic system for spatio-temporal face recognition in video surveillance

D2-41
COX-S2V database: individuals walk through a circuit of cameras [Huang et al., ACCV 2012]

D2-42
COX-S2V database: individuals walk through a circuit of cameras [Huang et al., ACCV 2012]
[Figure panels: still, video]

D2-43
Challenges – computing resources: video surveillance networks comprise many cameras
Challenges – real environments are complex and change dynamically:
̶ compression, low quality and low resolution of videos
̶ camera interoperability
̶ acquisition conditions: variations in pose, expression, occlusion, illumination, scale, blur, etc.
̶ face models with limited robustness: designed a priori (at enrollment) with a limited number of reference ROIs, so they may be poor representatives of the biometric trait to be recognized
[Figure: classification module, with feature vectors a and track numbers as inputs, facial (biometric) models, and classification scores or tags as outputs]

D2-44
ARTMAP classification network
Versatility: capable of fast, on-line, supervised, unsupervised and incremental learning
Constructive: the weights and the architecture (F2 neurons) can adapt according to new data

D2-45
Fuzzy ARTMAP classification network
Simplified structure of an ARTMAP network:

D2-46
Algorithm – training mode:
1. Weight initialization: set all weights Wab = 0
2. Encoding of the next input: (a, t)
3. Reset of the vigilance threshold ρ
4. Category choice
5. Apply the vigilance criterion
6. Class prediction:
§ the desired response code t is presented to Fab
§ prediction function: the pattern y activates the Fab layer via the weights Wab
D2-47
6. Class prediction: (continued)
§ prediction: $K = \max \{ S_k^{ab}(y) : k = 1, 2, \ldots, L \}$, i.e., K is the Fab neuron with the largest activation
the binary code yab is active for the neuron K corresponding to the prediction ($y_K^{ab} = 1$ and $y_k^{ab} = 0$ for $k \neq K$)
§ if the prediction K matches the desired response, proceed to learning (step 7); otherwise perform ‘match tracking’

D2-48
• ‘match tracking’:
raises ρ of the fuzzy ART just enough to induce a new search in order to either:
• find another committed F2 neuron that predicts the desired class (step 4)
• initiate an uncommitted F2 neuron to learn the desired class (step 7)

D2-49
7. Learning:
• update of the prototype of J: the prototype vector wJ of neuron J is adapted according to:
• creation of a new associative link: if J has just been committed, set $w_{JK}^{ab} = 1$, where k = K is the desired response
Return to step 2 to take another input

D2-50
Algorithm – test mode:
To predict the class associated with each input pattern:
1.
2. Encoding of an input pattern a
3.
4. Category choice
5.
6. Prediction of a class K (without tests)
7.
D2-51
A taxonomy of ANNs of the ARTMAP family

D2-52
Taxonomy of the ARTMAP architecture
(based on the internal matching process)
1. Fuzzy category activation:
• a class is represented by one or more fuzzy set hyper-rectangles
• category activation using the Weber law choice function
• EX: fuzzy ARTMAP, ART-EMAP, distributed ARTMAP, ARTMAP-IC…
2. Probabilistic category activation:

• a class is represented by one or more normal density functions
• estimate the posterior probability of each class in order to apply the Bayes
decision procedure
• EX: PROBART, PFAM, MLANS, Gaussian ARTMAP, Ellipsoid
ARTMAP, boosted ARTMAP
D2-53
Fuzzy ARTMAP
• category (F2 node) ≡ a fuzzy set
hyper-rectangle
• class (Fab node) ≡ 1+ fuzzy set
hyper-rectangles

D2-54
Gaussian ARTMAP
• category (F2 node) ≡ a normal
distribution
• class (Fab node) ≡ 1+ normal
distributions

D2-55

Common training strategies
1. one epoch (1EP):
learning is completed after one epoch
2. convergence based on training set classifications (CONVp):
learning ends once no training patterns are misclassified
3. convergence based on weight values (CONVw):
learning ends once weights remain constant for 2 successive
epochs
4. hold-out validation between epochs (HV):
learning ends once the Egen is minimized on an independent
validation subset
57/35
Training dynamics are governed by 4 inter-related hyperparameters: h = (β, α, ρ, ε)
̶ Example: decision boundaries on P2 data for FAM trained with different hyperparameter values h:
h1 = (0.7; 0.7; 0.8; 0.85) h2 = (0.13; 0.41; 0.08; 0.86) h3 = (0.67; 0.73; 0.68; 0.89)

58
PSO learning strategy
[Granger et al., JPRR, 2007]
Mono-objective: maximize the FAM classification rate
in the hyperparameter space, h
− PSO: population-based evolutionary optimization technique
− swarm ≡ a population or pool of N particles, each one
corresponding to a FAM network evolving in the h space
− PSO training strategy: jointly determines all parameters of a FAM network (weights + architecture + h) such that Egen is minimized
59
Inspired by the synchronous parallel version of PSO, with exchange of information using:
1. pbest: $p_i^k$ – the best previously-visited position of particle i
2. gbest: $p_g^k$ – the best particle position for the swarm
PSO update of $h_i^k$ in 2-D space for iteration k+1:
− particles move through the search space by following the particle with the best fitness value
$h_i^k = (\beta_i^k, \alpha_i^k, \rho_i^k, \varepsilon_i^k)$
60/
35
A. Initialization: N, $k_{max}$, $r_1$, $r_2$, $c_1$, $c_2$, $w^k$, $h_i^0$, etc.
B. Iterations:
while $k \le k_{max}$ or $E_{gen}(p_g^k) - E_{gen}(p_g^{k-1}) < \varphi$ do
• for i = 1, 2, . . . , N particles
− train FAM network using parameters of $h_i^k$ on Dtrain
− compute fitness value of network $E_{gen}(h_i^k)$ on Dval
− if $E_{gen}(h_i^k) \le E_{gen}(p_i^k)$, update pbest ($p_i^k = h_i^k$)
• select the gbest particle, $g = \arg\min \{E_{gen}(h_i^k)\}$
• for i = 1, 2, . . . , N particles
− update particle velocity $v_i^{k+1}$ and position $h_i^{k+1}$
• update particle inertia $w^k$, and increment k = k + 1
61/35
Experimental Methodology:
4 independent replications: retain network with best $p_g^k$
each replication is performed with N = 15 particles
− particle vectors are initialized randomly according to a uniform distribution
− except $h_i^0$ is set to minimize resources
a trial ends if:
− $k_{max}$ = 100 iterations are reached
− $E_{gen}(p_g^k)$ is constant for 10 consecutive iterations
c1 = c2 = 2
$w^k$ decreased linearly from 0.9 to 0.4 over $k_{max}$
r1 and r2 = random numbers from a uniform distribution
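The two end-of-trial conditions above can be checked with a small helper. The sketch assumes that "constant for 10 consecutive iterations" means the last 10 recorded gbest error values are identical; the function name `should_stop` and the list-based history are illustrative choices.

```python
def should_stop(gbest_history, k_max=100, patience=10):
    """gbest_history[k] holds Egen of the gbest particle after iteration k."""
    k = len(gbest_history)
    if k >= k_max:                       # maximum number of iterations reached
        return True
    if k >= patience:                    # gbest error unchanged over the last `patience` values
        recent = gbest_history[-patience:]
        return all(e == recent[0] for e in recent)
    return False

improving = [0.30, 0.25] + [0.20] * 9    # only 9 identical values so far: keep going
stalled = [0.30, 0.25] + [0.20] * 10     # 10 identical values: stop the trial
```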
62/
35
Synthetic data sets Dµ(ξtot) and Dσ(ξ tot) – linear decision
bounds where class distributions overlap:
Dµ(ξtot) Dσ(ξtot)

63/
35
Synthetic data sets DCIS and DP2 – non-linear decision
bounds where class distributions do not overlap:
DCIS DP2

64/
35
Average Egen vs. training set size
Dµ(9%) DCIS

65/35
Average Egen vs. network compression
DCIS Dµ(9%)

66/
35

Average Egen at 5000 patterns per class

67
Evolving ensembles with DPSO
Enrollment and adaptation of facial models according to new reference data
Enrollment: new individuals are added to the system.
Update: individuals already in the system are updated.

Supervised incremental learning
• new training data Di is acquired from the environment at
different instants in time ti for i = 1, 2, ..., n
Di : block of labeled training data available to the classifier at
discrete time ti
hi : hypothesis of classifier based on hi-1 and training with Di

Survey – Incremental Learning Techniques

1. Classifiers designed for incremental learning: such as
ARTMAP and Growing Self-Organizing families of neural
networks
2. Adaptations of popular classifiers: such as the SVM, and
the MLP and RBF neural networks
3. Ensemble of Classifiers: such as Learn++ with MLPs

Challenge of Adaptation
Knowledge corruption – common during incremental learning of
new reference data [Khreich et al., Information Sciences 2012]

11
Challenge of Adaptation
Variations in acquisition conditions (e.g., illumination and pose) make it possible to define different concepts:

72
Data sets with complex decision boundaries and
overlapping class distributions:
D2N(13%) DXOR(13%) DCIS DXOR-U DP2


Protocol for each trial:
• divide each data set into LEARN and TEST subsets, each one
with 5,000 patterns/class
• subdivide LEARN into b blocks of data Di (i = 1, 2, …, b),
each one with an equal number of patterns per class
– large blocks: b =10 blocks with |Di| = 1000 patterns
– small blocks: b =100 blocks with |Di| = 100 patterns
• in each block, 2/3 of patterns from each class are used for
training, and the rest for validation
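The block protocol above can be sketched as an index-splitting routine. It is a hedged illustration: the function name `split_into_blocks`, the random shuffling within each class and the rounding of the 2/3 training share are assumptions, and a small toy label vector stands in for the 5,000 patterns per class.

```python
import numpy as np

def split_into_blocks(labels, b, train_fraction=2 / 3, seed=0):
    """Split pattern indices into b blocks with equal class counts, then train/validation."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    blocks = [{"train": [], "val": []} for _ in range(b)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))    # shuffle one class
        for block, part in zip(blocks, np.array_split(idx, b)):
            n_train = int(round(train_fraction * len(part)))  # 2/3 of the class share
            block["train"].extend(part[:n_train].tolist())
            block["val"].extend(part[n_train:].tolist())
    return blocks

labels = [0] * 36 + [1] * 36              # toy LEARN set: 2 classes, 36 patterns each
blocks = split_into_blocks(labels, b=3)   # 3 blocks of 24 patterns (12 per class)
```

Each block then holds 16 training and 8 validation indices, split evenly between the two classes.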
Batch learning process:
t1 : ARTMAP0(D1) → ARTMAP1
t2 : ARTMAP0(D1 U D2) → ARTMAP2
•••
tb : ARTMAP0(D1 U D2 U … U Db) → ARTMAPb
Incremental learning process:

•••
tb : ARTMAPb-1(Db) → ARTMAPb
Average error rate of fuzzy and Gaussian
ARTMAP networks with all data sets

Approach to adapt h = (α, β, ε, ρ): incremental learning strategy based on DPSO, in order to evolve a heterogeneous ensemble of classifiers
[Figure: fuzzy ARTMAP networks with layers F1, F2 and Fab, weights W and Wab, and hyperparameters α, β, ε, ρ, whose outputs feed a fusion module]

Incremental learning of new data corresponds to a dynamic optimization problem such that:
[Connolly et al., Information Sciences, 2010]

78
An adaptive system for access control (1:N):
• Comprises an LTM, a swarm of incremental classifiers and a dynamic optimization module [Connolly et al., PR 2012]
– Long Term Memory: stores reference samples for validation
– Swarm (Pool) of Incremental Learning Classifiers: guided by a dynamic PSO algorithm
– EoC Selection and Fusion of base classifiers from the swarm

79
DPSO Strategy for Incremental Learning – given a new block of data Dt, evolve an EoFAMs as follows:
- DNPSO algorithm to re-optimize the parameters of N classifiers such that f(h,t) is maximized
- direct selection of classifiers using particle swarm properties (optima: local best particles)
- basic majority voting for output predictions
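Majority voting over the selected classifiers can be sketched with the standard library. The helper names and the toy prediction lists below are illustrative; ties are resolved in favor of the label encountered first, which is an arbitrary choice made for this example.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(votes_per_classifier):
    """votes_per_classifier[c][n] is the label given by classifier c to sample n."""
    return [majority_vote(sample_votes) for sample_votes in zip(*votes_per_classifier)]

votes = [
    [1, 2, 3, 1],   # classifier A
    [1, 2, 2, 3],   # classifier B
    [2, 2, 3, 3],   # classifier C
]
print(ensemble_predict(votes))  # [1, 2, 3, 3]
```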

80
Dynamic PSO (DPSO) algorithms make it possible to:
− maintain diversity in the h space
− detect and track several optima in h space over time
Since h governs learning dynamics, diversity among particles also ensures diversity among the corresponding classifiers
[Figure: mapping between the classification environment and the optimization environment. Feature space: input feature vectors a = (a1, ..., aI). Decision space: predefined class labels Ω = {C1, ..., CK}. Search space: hyperparameters h = (h1, ..., hD). Objective space: objectives o = (f1(h), …, fO(h))]

81
Particle properties are used for direct selection
1. Initial selection: classifiers associated with positions of the local best particles
2. Second selection: greedy search among remaining particles that seeks to increase the average diversity among particles
Low cost: avoids computing classifier diversity indicators in the input feature space
82
NRC-IIT data [Gorodnichy, NRC-48216, 2005]
Task: bio-login, recognize the user of a PC
11 individuals: 2 video sequences per individual, one dedicated
for training and the other for testing
challenging conditions: changes in pose, expression and
proximity, motion blur, low resolution and partial occlusion

CMU-MoBo data [Gross et al., CMU-RI-TR-01-18, 2001]
Task: recognize subjects walking
25 individuals: several indoor videos per individual, 6 cameras
and 4 motions
challenging conditions: changes in pose, expression, blur, low
resolution and partial occlusion

Evolving Ensembles with DPSO
Performance for the update scenarios:
[Connolly et al., PR 2012]

85
Cumulative Match Curves:

86
MOPSO learning strategy
[Prieur et al., CEC2010]
Determines FAM weights + architecture + parameters such that the error rate and resources are minimized:
− inspired by an approach by Coello Coello et al. (2004) – the leader selection technique is based on Pareto dominance
− external archive: the algorithm outputs a set of non-dominated FAM networks (solutions on the Pareto front)
− makes it possible to trade off accuracy vs. resources, and to discover more cost-effective FAM networks
87
Dcis data: average Egen vs. training set size
(Granger et al., JPRR 2007)

88
A. Initialization:
− set MOPSO parameters and counters
− initialize particle positions
B. Iterations:
while q ≤ qmax iterations do
1. for i = 1, 2, . . . , P particles
– train FAM network using parameters of hiq
– compute fitness value F(hiq) of network on validation data
– update particle pbest pi if F(hiq) dominates F(pi)
– management (add/remove) of leader particles in archive
2. for i = 1, 2, . . . , P particles
− select a leader particle from archive
− compute particle velocity viq+1 and update its position hiq+1
3. increment iteration counter q = q + 1
89
Protocol: k-fold cross validation:
divide data for learning into k = 10 folds
fixed parameters: β = 1, α = 0.001, ρ = 0, ε = 0.001
stop training epochs: CR(VAL1)e – CR(VAL1)e-1 < 0.001
Trials with the PSO/MOPSO strategies:

each replication is performed with P = 32 particles
− particle positions are initialized randomly, except si0
is set to minimize resources
select a second VAL2 set for fitness evaluation
a trial ends if: qmax = 25 iterations
90
(4) Simulation Results
Batch learning of NRC data: error rate vs. compression

91/
33
Multi-criteria evolving ensembles
ADNPSO: mimetic approach for multi-objective optimization (to minimize error rate and complexity)
[Connolly et al., ASC2013]
[Figure: system diagram with data block Dt, LTM, ADNPSO module (hyperparameters, fitness), swarm of incremental classifiers (pool of classifiers), archive, selection and fusion, and hypotheses hypt-1 and hypt]
– A Long Term Memory that captures data from each class
– A swarm of incremental classifiers guided by a dynamic particle swarm optimization algorithm
– An archive that selects non-dominated classifiers
– An ensemble selection and fusion module

92
ADNPSO: mimetic approach for multi-objective
optimization (to minimize error rate and complexity)

93
Position of local Pareto fronts for 2 search spaces:
̶ MOPSO: seeks to find true Pareto-optimal solutions [dark]
̶ ADNPSO: seeks to search both spaces to find diverse (locally Pareto-optimal) solutions [light]
̶ detect local Pareto front to find solutions between the local optima
94
Equation to define PSO motion
[Equation figure not recovered; annotated terms: final position, initial position, inertia, a sum of influences]
Illustration
Equation to define ADNPSO motion
ADNPSO: Aggregated dynamical niching PSO
• Cognitive influence: personal best position
• Social influence: local best position

With sub-swarms defined dynamically!
Particles Influences

Requires a specialized archive and ensemble selection
[Figure: error rate (%) vs. FAM network size (number of F2 nodes)]

Selection:
1. most accurate FAMs per network size domain (phenotype diversity)
2. additional greedy search selection to maximize genotype diversity
98
Protocol
Performance is assessed for AMCSs designed and updated according to 3 different IL strategies:
1. DNPSO: Dynamic Niching PSO
selects local best DNPSO particles + greedy search in swarm
2. MOPSO: Multi-Objective PSO
uses archive and notion of dominance to guide particles
towards Pareto-optimal front
3. ADNPSO: Aggregated DNPSO
uses archive with phenotype local best particles + greedy
search to maximize genotype diversity

99
Error rate and complexity indicators of approaches:
̶ after incremental learning of all 12 blocks
̶ complexity: ensemble size, average and total number of F2
nodes for entire ensemble

100
Examples of results in the objective space
̶ circles: show evolution of the swarm after learning of all 12 blocks
̶ light and dark circles: position of each particle at the start and end of
optimization process.
̶ squares: solutions found by DNPSO or stored in archive (MOPSO and ADNPSO)
DNPSO MOPSO ADNPSO

101
Error rates versus the number of ROIs used to
identify individuals over time:
̶ in video surveillance, predictions are accumulated, and
several predictions are used for FR
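Accumulating predictions over several ROIs can be sketched as below. The slide does not state the fusion rule, so the running sum of per-ROI class scores used here is an assumption, as is the function name `accumulate_predictions`.

```python
import numpy as np

def accumulate_predictions(scores):
    """Predicted class after each ROI of one trajectory (hypothetical helper).

    scores : array of shape (n_rois, n_classes), one score vector per ROI.
    """
    running = np.cumsum(np.asarray(scores, dtype=float), axis=0)
    # The decision after k ROIs uses the evidence summed over the first k ROIs.
    return running.argmax(axis=1)

decisions = accumulate_predictions([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
```

In this toy trajectory the first ROI alone favours class 0, but the accumulated evidence switches to class 1 once the second ROI is included.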

102
Cumulative match curves over time:
̶ in video surveillance, predictions are accumulated, and
several predictions are used for FR
̶ performance when 15 ROIs are used to perform recognition
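A cumulative match curve reports, for each rank r, the fraction of test samples whose true identity appears among the r highest-scoring classes. The sketch below shows one way to compute it from a score matrix; the function name `cumulative_match_curve` and the tie-handling rule (rank = 1 + number of classes scoring strictly higher) are assumptions.

```python
import numpy as np

def cumulative_match_curve(scores, true_labels):
    """Fraction of samples whose true class is within the top r ranks, r = 1..n_classes."""
    scores = np.asarray(scores, dtype=float)
    true_labels = np.asarray(true_labels)
    true_scores = scores[np.arange(len(true_labels)), true_labels]
    # Rank of the true class: 1 + number of classes with a strictly higher score.
    ranks = 1 + (scores > true_scores[:, None]).sum(axis=1)
    n_classes = scores.shape[1]
    return np.array([(ranks <= r).mean() for r in range(1, n_classes + 1)])

cmc = cumulative_match_curve(
    [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6], [0.4, 0.35, 0.25]],
    [0, 0, 2, 2],
)
```

The first value of the curve is the usual rank-1 recognition rate, and the last value is always 1 because every true class is found by the final rank.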

103
Incremental learning strategy based on ADNPSO allows EoCs
to evolve in response to new data
̶ ADNPSO algorithm: allows multi-objective minimization, where
both genotype and phenotype diversity are maintained
̶ specialized archive: local Pareto-optimal solutions found by
ADNPSO can be stored and combined
̶ EoCs are selected based on accuracy + genotype and phenotype diversity
Results: ADNPSO yields a high level of accuracy comparable
to mono-objective DNPSO (higher than MOPSO), but with a
fraction of the complexity
104
Impact of incremental learning on video-based
classification rate:
• EoCs formed using DNPSO on IIT-NRC update scenario
[Figure: classification rate (%) versus time and number of detected ROIs used for classification]

105
Accuracy: number of regions of interest to reach an
error rate comparable to 0%

106
Computational complexity: network size

107

